Beignet-1.3.2-Source/intel-beignet.icd.in

@BEIGNET_INSTALL_DIR@/libcl.so

Beignet-1.3.2-Source/COPYING

GNU LESSER GENERAL PUBLIC LICENSE
Version 2.1, February 1999

Copyright (C) 1991, 1999 Free Software Foundation, Inc.
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

[This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.]

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users.

This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below.

When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things.

To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it.

For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights.

We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library.

To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others.

Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license.
Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances. For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library. The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run. GNU LESSER GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. The "Library", below, refers to any such software library or work which has been distributed under these terms. 
A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. 
But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library.

In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.

3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices.

Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy.

This option is useful when you wish to copy part of the code of the Library into a program that is not a library.

4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange.

If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code.

5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License.

However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables.

When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law.
If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. 
It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Libraries

If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License).

To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.

    <one line to give the library's name and a brief idea of what it does.>
    Copyright (C) <year>  <name of author>

    This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

Also add information on how to contact you by electronic and paper mail.

You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names:

    Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker.

    <signature of Ty Coon>, 1 April 1990
    Ty Coon, President of Vice

That's all there is to it!

Beignet-1.3.2-Source/backend/
Beignet-1.3.2-Source/backend/src/
Beignet-1.3.2-Source/backend/src/gbe_bin_generater.cpp

/*
 * Copyright © 2013 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or modify it
 * under the terms of the GNU Lesser General Public License as published by the
 * Free Software Foundation; either version 2.1 of the License, or (at your
 * option) any later version.
 *
 * This library is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
 * for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

/*******************************************************************************
 This file is used to generate the gbe kernel binaries. These binaries may be
 used by the CL API, such as the enqueue memory built-ins. We generate the
 binaries at build time to improve performance.
 *******************************************************************************/

/* The angle-bracket header names were lost in extraction; this set is inferred
 * from what the file uses (open/mmap/getopt, STL streams and containers). */
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <cstdio>
#include <cstdlib>
#include <cassert>
#include <string>
#include <fstream>
#include <sstream>
#include <iostream>
#include <vector>
#include <deque>
#include <algorithm>

#include "backend/program.h"
#include "backend/program.hpp"
#include "backend/src/sys/platform.hpp"
#include "src/cl_device_data.h"

using namespace std;

#define FILE_NOT_FIND_ERR 1
#define FILE_MAP_ERR 2
#define FILE_BUILD_FAILED 3
#define FILE_SERIALIZATION_FAILED 4

static uint32_t gen_pci_id = 0;

class program_build_instance {
protected:
  string prog_path;
  string build_opt;
  static string bin_path;
  static bool str_fmt_out;
  int fd;
  int file_len;
  const char* code;
  gbe::Program* gbe_prog;

public:
  program_build_instance (void) : fd(-1), file_len(0), code(NULL), gbe_prog(NULL) { }
  explicit program_build_instance (const char* file_path, const char* option = NULL)
    : prog_path(file_path), build_opt(option), fd(-1), file_len(0),
      code(NULL), gbe_prog(NULL) { }

  ~program_build_instance () {
    if (code) {
      munmap((void *)(code), file_len);
      code = NULL;
    }
    if (fd >= 0)
      close(fd);
    if (gbe_prog)
      gbe_program_delete(reinterpret_cast<gbe_program>(gbe_prog));
  }

  program_build_instance(program_build_instance&& other) = default;
#if 0
  {
#define SWAP(ELT) \
    do { \
      auto elt = this->ELT; \
      this->ELT = other.ELT; \
      other.ELT = elt; \
    } while(0)
    SWAP(fd);
    SWAP(code);
    SWAP(file_len);
    SWAP(prog_path);
    SWAP(build_opt);
#undef SWAP
  }
#endif

  explicit program_build_instance(const program_build_instance& other) = delete;
  program_build_instance& operator= (const program_build_instance& other) {
    /* We do not want to be lvalue-copied, but the operator is needed to
       instantiate the vector template. */
    assert(1);
    return *this;
  }
  const char* file_map_open (void) throw (int);
  const char* get_code (void) { return code; }
  const string& get_program_path (void) { return prog_path; }
  int get_size (void) { return file_len; }
  void print_file (void) { cout << code << endl; }
  void dump (void) {
    cout << "program path: " << prog_path << endl;
    cout << "Build option: " << build_opt << endl;
    print_file();
  }
  static void set_str_fmt_out (bool flag) { str_fmt_out = flag; }
  static int set_bin_path (const char* path) {
    if (bin_path.size())
      return 0;
    bin_path = path;
    return 1;
  }
  void build_program(void) throw(int);
  void serialize_program(void) throw(int);
};

string program_build_instance::bin_path;
bool program_build_instance::str_fmt_out = false;

#define OUTS_UPDATE_SZ(elt) SERIALIZE_OUT(elt, oss, header_sz)
#define OUTF_UPDATE_SZ(elt) SERIALIZE_OUT(elt, ofs, header_sz)

void program_build_instance::serialize_program(void) throw(int)
{
  ofstream ofs;
  ostringstream oss;
  size_t sz = 0, header_sz = 0;
  ofs.open(bin_path, ofstream::out | ofstream::trunc | ofstream::binary);

  char src_hw_info[4] = "";
  if (IS_IVYBRIDGE(gen_pci_id)) {
    src_hw_info[0]='I'; src_hw_info[1]='V'; src_hw_info[2]='B';
    if (IS_BAYTRAIL_T(gen_pci_id)) {
      src_hw_info[0]='B'; src_hw_info[1]='Y'; src_hw_info[2]='T';
    }
  } else if (IS_HASWELL(gen_pci_id)) {
    src_hw_info[0]='H'; src_hw_info[1]='S'; src_hw_info[2]='W';
  } else if (IS_BROADWELL(gen_pci_id)) {
    src_hw_info[0]='B'; src_hw_info[1]='D'; src_hw_info[2]='W';
  } else if (IS_CHERRYVIEW(gen_pci_id)) {
    src_hw_info[0]='C'; src_hw_info[1]='H'; src_hw_info[2]='V';
  } else if (IS_SKYLAKE(gen_pci_id)) {
    src_hw_info[0]='S'; src_hw_info[1]='K'; src_hw_info[2]='L';
  } else if (IS_BROXTON(gen_pci_id)) {
    src_hw_info[0]='B'; src_hw_info[1]='X'; src_hw_info[2]='T';
  }

  if (str_fmt_out) {
    if (gen_pci_id) {
      // Add a header to differentiate from an llvm bitcode binary.
      // (5 bytes: 1 byte for the binary version, 4 bytes for the magic code;
      //  'GENC' marks a gen binary.)
      char gen_header[6] = "\1GENC";
      OUTS_UPDATE_SZ(gen_header[0]);
      OUTS_UPDATE_SZ(gen_header[1]);
      OUTS_UPDATE_SZ(gen_header[2]);
      OUTS_UPDATE_SZ(gen_header[3]);
      OUTS_UPDATE_SZ(gen_header[4]);
      OUTS_UPDATE_SZ(src_hw_info[0]);
      OUTS_UPDATE_SZ(src_hw_info[1]);
      OUTS_UPDATE_SZ(src_hw_info[2]);
    }

    string array_name = "Unknown_name_array";
    unsigned long last_slash = bin_path.rfind("/");
    unsigned long last_dot = bin_path.rfind(".");
    if (last_slash != string::npos && last_dot != string::npos)
      array_name = bin_path.substr(last_slash + 1, last_dot - 1 - last_slash);

    ofs << "#include <stddef.h>" << "\n"; /* header name inferred: the array size below uses size_t */
    ofs << "char " << array_name << "[] = {" << "\n";

    if (gen_pci_id) {
      sz = gbe_prog->serializeToBin(oss);
      sz += header_sz;
    } else {
      char *llvm_binary;
      size_t bin_length = gbe_program_serialize_to_binary((gbe_program)gbe_prog, &llvm_binary, 1);
      oss.write(llvm_binary, bin_length);
      sz += bin_length;
      free(llvm_binary);
    }

    for (size_t i = 0; i < sz; i++) {
      unsigned char c = oss.str().c_str()[i];
      char asic_str[9];
      sprintf(asic_str, "%2.2x", c);
      ofs << "0x";
      ofs << asic_str << ((i == sz - 1) ? "" : ", ");
    }
    ofs << "};\n";

    string array_size = array_name + "_size";
    ofs << "size_t " << array_size << " = " << sz << ";" << "\n";
  } else {
    if (gen_pci_id) {
      // Add a header to differentiate from an llvm bitcode binary.
      // (5 bytes: 1 byte for the binary version, 4 bytes for the magic code;
      //  'GENC' marks a gen binary.)
      char gen_header[6] = "\1GENC";
      OUTF_UPDATE_SZ(gen_header[0]);
      OUTF_UPDATE_SZ(gen_header[1]);
      OUTF_UPDATE_SZ(gen_header[2]);
      OUTF_UPDATE_SZ(gen_header[3]);
      OUTF_UPDATE_SZ(gen_header[4]);
      OUTF_UPDATE_SZ(src_hw_info[0]);
      OUTF_UPDATE_SZ(src_hw_info[1]);
      OUTF_UPDATE_SZ(src_hw_info[2]);
      sz = gbe_prog->serializeToBin(ofs);
    } else {
      char *llvm_binary;
      size_t bin_length = gbe_program_serialize_to_binary((gbe_program)gbe_prog, &llvm_binary, 1);
      ofs.write(llvm_binary, bin_length);
      sz += bin_length;
      free(llvm_binary);
    }
  }

  ofs.close();
  if (!sz) {
    throw FILE_SERIALIZATION_FAILED;
  }
}
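/* For illustration only (not part of the original source): with the -s flag,
 * serialize_program() emits a C file of roughly this shape, assuming a
 * hypothetical output path of "kernels/foo.bin":
 *
 *   #include <stddef.h>
 *   char foo[] = {
 *     0x01, 0x47, 0x45, 0x4e, 0x43,   // "\1GENC" version byte plus magic
 *     0x53, 0x4b, 0x4c,               // "SKL" hardware tag from src_hw_info
 *     ...                             // serialized kernel payload
 *   };
 *   size_t foo_size = <total size>;   // header size + payload size
 */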
void program_build_instance::build_program(void) throw(int)
{
  gbe_program opaque = NULL;
  if (gen_pci_id) {
    opaque = gbe_program_new_from_source(gen_pci_id, code, 0, build_opt.c_str(), NULL, NULL);
  } else {
    opaque = gbe_program_compile_from_source(0, code, NULL, 0, build_opt.c_str(), NULL, NULL);
  }

  if (!opaque)
    throw FILE_BUILD_FAILED;

  gbe_prog = reinterpret_cast<gbe::Program*>(opaque);

  if (gen_pci_id) {
    assert(gbe_program_get_kernel_num(opaque));
  }
}

const char* program_build_instance::file_map_open(void) throw(int)
{
  void * address;

  /* Open the file */
  fd = ::open(prog_path.c_str(), O_RDONLY);
  if (fd < 0) {
    throw FILE_NOT_FIND_ERR;
  }

  /* Map it */
  file_len = lseek(fd, 0, SEEK_END);
  lseek(fd, 0, SEEK_SET);
  address = mmap(0, file_len, PROT_READ, MAP_SHARED, fd, 0);
  if (address == MAP_FAILED) { /* mmap reports failure with MAP_FAILED, not NULL */
    throw FILE_MAP_ERR;
  }

  code = reinterpret_cast<const char*>(address);
  return code;
}

typedef vector<program_build_instance> prog_vector;

int main (int argc, const char **argv)
{
  prog_vector prog_insts;
  vector<string> argv_saved;
  const char* build_opt;
  const char* file_path;
  int i;
  int oc;
  deque<int> used_index;

  if (argc < 2) {
    cout << "Usage: kernel_path [-pbuild_parameter] [-obin_path] [-tgen_pci_id]" << endl;
    return 0;
  }

  used_index.assign(argc, 0);

  /* getopt re-orders argv, so save a copy here. */
  for (i = 0; i < argc; i++) {
    argv_saved.push_back(string(argv[i]));
  }

  while ( (oc = getopt(argc, (char * const *)argv, "t:o:p:s")) != -1 ) {
    switch (oc) {
    case 'p':
    {
      int opt_index;

      if (argv[optind-1][0] == '-') { // -pXXX form
        opt_index = optind - 1;
      } else { // must be the "-p XXXX" form
        opt_index = optind - 2;
        used_index[opt_index + 1] = 1;
      }

      /* The option must follow the file name. */
      if ((opt_index < 2 ) || argv[opt_index-1][0] == '-') {
        cout << "Usage note: Building option must follow file name" << endl;
        return 1;
      }

      file_path = argv[opt_index - 1];
      build_opt = optarg;
      prog_insts.push_back(program_build_instance(file_path, build_opt));
      break;
    }
    case 'o':
      if (!program_build_instance::set_bin_path(optarg)) {
        cout << "Can not specify the bin path more than once." << endl;
        return 1;
      }
      used_index[optind-1] = 1;
      break;
    case 't':
    {
      char *s = optarg;
      if (optarg[0] == '0' && (optarg[1] == 'x' || optarg[1] == 'X'))
        s += 2;
      if (s[0] < '0' || s[0] > '9') {
        cout << "Invalid target option argument" << endl;
        return 1;
      }
      std::stringstream str(s);
      str >> std::hex >> gen_pci_id;
      used_index[optind-1] = 1;
      break;
    }
    case 's':
      program_build_instance::set_str_fmt_out(true);
      used_index[optind-1] = 1;
      break;
    case ':':
      cout << "Missing the file option argument" << endl;
      return 1;
    default:
      cout << "Unknown option" << endl;
    }
  }

  for (i = 1; i < argc; i++) {
    //cout << argv_saved[i] << endl;
    if (argv_saved[i].size() && argv_saved[i][0] != '-') {
      if (used_index[i])
        continue;
      string file_name = argv_saved[i];
      prog_vector::iterator result = find_if(prog_insts.begin(), prog_insts.end(),
        [&](program_build_instance & prog_inst) -> bool {
          bool result = false;
          if (prog_inst.get_program_path() == file_name)
            result = true;
          return result;
        });

      if (result == prog_insts.end()) {
        prog_insts.push_back(program_build_instance(file_name.c_str(), ""));
      }
    }
  }

  for (auto& inst : prog_insts) {
    try {
      inst.file_map_open();
      inst.build_program();
      inst.serialize_program();
    } catch (int & err_no) {
      if (err_no == FILE_NOT_FIND_ERR) {
        cout << "can not open the file " << inst.get_program_path() << endl;
      } else if (err_no == FILE_MAP_ERR) {
        cout << "map the file " << inst.get_program_path() << " failed" << endl;
      } else if (err_no == FILE_BUILD_FAILED) {
        cout << "build the file " << inst.get_program_path() << " failed" << endl;
      } else if (err_no == FILE_SERIALIZATION_FAILED) {
        cout << "Serialize the file " << inst.get_program_path() << " failed" << endl;
      }
      return -1;
    }
  }

  //for (auto& inst : prog_insts) {
  //  inst.dump();
  //}

  return 0;
}
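Read together, the getopt loop above implies a build-time invocation of roughly the following shape (the file names and PCI id here are illustrative, not taken from the source):

    gbe_bin_generater kernels/foo.cl -pcl-fast-relaxed-math -okernels/foo.bin -t0x0162 -s

A -p option attaches build parameters to the kernel file immediately before it, -o sets the single output path, -t selects the target GEN device by PCI id (a 0x prefix is accepted), and -s switches the output from a raw binary to the C-array text form sketched earlier. Without -t, the kernel is only compiled to the LLVM-level binary.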
Beignet-1.3.2-Source/backend/src/sys/
Beignet-1.3.2-Source/backend/src/sys/intrusive_list.hpp

/*
 * Copyright (c) 2007 Maciej Sinilo
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

#ifndef __GBE_INTRUSIVE_LIST_HPP__
#define __GBE_INTRUSIVE_LIST_HPP__

#include "sys/platform.hpp"

namespace gbe
{
  /*! List elements must inherit from it */
  struct intrusive_list_node
  {
    INLINE intrusive_list_node(void) { next = prev = this; }
    INLINE bool in_list(void) const { return this != next; }
    intrusive_list_node *next;
    intrusive_list_node *prev;
  };

  /*! Insert node such that prev -> node */
  void append(intrusive_list_node *node, intrusive_list_node *prev);
  /*! Insert node such that node -> next */
  void prepend(intrusive_list_node *node, intrusive_list_node *next);
  /*! Same as prepend */
  void link(intrusive_list_node* node, intrusive_list_node* nextNode);
  /*! Remove the node from its current list */
  void unlink(intrusive_list_node* node);

  template<typename Pointer, typename Reference>
  class intrusive_list_iterator
  {
  public:
    typedef Pointer pointer;
    typedef Reference reference;

    INLINE intrusive_list_iterator(void): m_node(0) {}
    INLINE intrusive_list_iterator(Pointer iterNode) : m_node(iterNode) {}

    INLINE Reference operator*(void) const {
      GBE_ASSERT(m_node);
      return *m_node;
    }
    INLINE Pointer operator->(void) const { return m_node; }
    INLINE Pointer node(void) const { return m_node; }

    INLINE intrusive_list_iterator& operator++(void) {
      m_node = static_cast<Pointer>(m_node->next);
      return *this;
    }
    INLINE intrusive_list_iterator& operator--(void) {
      m_node = static_cast<Pointer>(m_node->prev);
      return *this;
    }
    INLINE intrusive_list_iterator operator++(int) {
      intrusive_list_iterator copy(*this);
      ++(*this);
      return copy;
    }
    INLINE intrusive_list_iterator operator--(int) {
      intrusive_list_iterator copy(*this);
      --(*this);
      return copy;
    }

    INLINE bool operator== (const intrusive_list_iterator& rhs) const {
      return rhs.m_node == m_node;
    }
    INLINE bool operator!= (const intrusive_list_iterator& rhs) const {
      return !(rhs == *this);
    }
  private:
    Pointer m_node;
  };

  class intrusive_list_base
  {
  public:
    typedef size_t size_type;

    INLINE void pop_back(void) { unlink(m_root.prev); }
    INLINE void pop_front(void) { unlink(m_root.next); }
    INLINE bool empty(void) const { return !m_root.in_list(); }
    size_type size(void) const;

  protected:
    intrusive_list_base(void);
    INLINE ~intrusive_list_base(void) {}

    intrusive_list_node m_root;

  private:
    intrusive_list_base(const intrusive_list_base&);
    intrusive_list_base& operator=(const intrusive_list_base&);
  };

  template<typename T>
  class intrusive_list : public intrusive_list_base
  {
  public:
    typedef T node_type;
    typedef T value_type;
    typedef intrusive_list_iterator<T*, T&> iterator;
    typedef intrusive_list_iterator<const T*, const T&> const_iterator;

    intrusive_list(void) : intrusive_list_base() {
      /* Compile-time check that T* converts to intrusive_list_node* */
      intrusive_list_node* testNode((T*)0);
      static_cast<void>(sizeof(testNode));
    }

    void push_back(value_type* v) { link(v, &m_root); }
    void push_front(value_type* v) { link(v, m_root.next); }

    iterator begin(void)  { return iterator(upcast(m_root.next)); }
    iterator end(void)    { return iterator(upcast(&m_root)); }
    iterator rbegin(void) { return iterator(upcast(m_root.prev)); }
    iterator rend(void)   { return iterator(upcast(&m_root)); }
    const_iterator begin(void) const  { return const_iterator(upcast(m_root.next)); }
    const_iterator end(void) const    { return const_iterator(upcast(&m_root)); }
    const_iterator rbegin(void) const { return const_iterator(upcast(m_root.prev)); }
    const_iterator rend(void) const   { return const_iterator(upcast(&m_root)); }

    INLINE value_type* front(void) { return upcast(m_root.next); }
    INLINE value_type* back(void)  { return upcast(m_root.prev); }
    INLINE const value_type* front(void) const { return upcast(m_root.next); }
    INLINE const value_type* back(void) const  { return upcast(m_root.prev); }

    iterator insert(iterator pos, value_type* v) {
      link(v, pos.node());
      return iterator(v);
    }
    iterator erase(iterator it) {
      iterator itErase(it);
      ++it;
      unlink(itErase.node());
      return it;
    }
    iterator erase(iterator first, iterator last) {
      while (first != last) first = erase(first);
      return first;
    }

    void clear(void) { erase(begin(), end()); }
    void fast_clear(void) { m_root.next = m_root.prev = &m_root; }
    static void remove(value_type* v) { unlink(v); }

  private:
    static INLINE node_type* upcast(intrusive_list_node* n) {
      return static_cast<node_type*>(n);
    }
    static INLINE const node_type* upcast(const intrusive_list_node* n) {
      return static_cast<const node_type*>(n);
    }
  };
} /* namespace gbe */

#endif /* __GBE_INTRUSIVE_LIST_HPP__ */

Beignet-1.3.2-Source/backend/src/sys/intrusive_list.cpp

/*
 * Copyright (c) 2007 Maciej Sinilo
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

#include "intrusive_list.hpp"

namespace gbe
{
  intrusive_list_base::intrusive_list_base() : m_root() {}

  intrusive_list_base::size_type intrusive_list_base::size() const {
    size_type numNodes(0);
    const intrusive_list_node* iter = &m_root;
    do {
      iter = iter->next;
      ++numNodes;
    } while (iter != &m_root);
    return numNodes - 1;
  }

  void append(intrusive_list_node *node, intrusive_list_node *prev) {
    GBE_ASSERT(!node->in_list());
    node->next = prev->next;
    node->next->prev = node;
    prev->next = node;
    node->prev = prev;
  }

  void prepend(intrusive_list_node *node, intrusive_list_node *next) {
    GBE_ASSERT(!node->in_list());
    node->prev = next->prev;
    node->prev->next = node;
    next->prev = node;
    node->next = next;
  }

  void link(intrusive_list_node* node, intrusive_list_node* nextNode) {
    prepend(node, nextNode);
  }

  void unlink(intrusive_list_node* node) {
    GBE_ASSERT(node->in_list());
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node->prev = node;
  }
} /* namespace gbe */
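A minimal usage sketch for the intrusive list (the Insn node type below is hypothetical, purely for illustration):

    #include "sys/intrusive_list.hpp"

    struct Insn : public gbe::intrusive_list_node {
      int opcode;                         // payload lives in the node itself
    };

    void example(void) {
      gbe::intrusive_list<Insn> insns;    // holds only links, allocates nothing
      Insn a, b;
      a.opcode = 1; b.opcode = 2;
      insns.push_back(&a);                // links a just before the sentinel root
      insns.push_back(&b);
      int sum = 0;
      for (gbe::intrusive_list<Insn>::iterator it = insns.begin(); it != insns.end(); ++it)
        sum += it->opcode;                // sum == 3
      insns.clear();                      // unlinks every node; nodes are not freed
    }

Because the links are embedded in the element, insertion and removal never allocate, but a node can belong to only one such list at a time.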
Beignet-1.3.2-Source/backend/src/sys/mutex.cpp

/*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

#include "sys/mutex.hpp"

#if defined(__WIN32__)
#define WIN32_LEAN_AND_MEAN
#include <windows.h> /* header name inferred; it was lost in extraction */

namespace gbe
{
  /*! system mutex using windows API */
  MutexSys::MutexSys( void ) {
    mutex = new CRITICAL_SECTION;
    InitializeCriticalSection((CRITICAL_SECTION*)mutex);
  }
  MutexSys::~MutexSys( void ) {
    DeleteCriticalSection((CRITICAL_SECTION*)mutex);
    delete ((CRITICAL_SECTION*)mutex);
  }
  void MutexSys::lock( void ) { EnterCriticalSection((CRITICAL_SECTION*)mutex); }
  void MutexSys::unlock( void ) { LeaveCriticalSection((CRITICAL_SECTION*)mutex); }
}
#endif

#if defined(__UNIX__)
#include <pthread.h> /* header name inferred; it was lost in extraction */

namespace gbe
{
  /*! system mutex using pthreads */
  MutexSys::MutexSys( void ) {
    mutex = new pthread_mutex_t;
    pthread_mutex_init((pthread_mutex_t*)mutex, NULL);
  }
  MutexSys::~MutexSys( void ) {
    pthread_mutex_destroy((pthread_mutex_t*)mutex);
    delete ((pthread_mutex_t*)mutex);
  }
  void MutexSys::lock( void ) { pthread_mutex_lock((pthread_mutex_t*)mutex); }
  void MutexSys::unlock( void ) { pthread_mutex_unlock((pthread_mutex_t*)mutex); }
}
#endif

Beignet-1.3.2-Source/backend/src/sys/cvar.cpp

/*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file cvar.cpp
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */

#include "sys/cvar.hpp"
#include <algorithm> /* header name inferred (std::min/std::max); it was lost in extraction */

namespace gbe
{
  CVarInit::CVarInit(const char *name, int32_t *addr, int32_t imin, int32_t i, int32_t imax) :
    varType(CVarInit::INTEGER)
  {
    this->i.min = imin;
    this->i.max = imax;
    const char *env = getenv(name);
    if (env != NULL) {
      sscanf(env, "%i", &i);
      i = std::min(imax, std::max(imin, i));
    }
    *addr = i;
  }

  CVarInit::CVarInit(const char *name, float *addr, float fmin, float f, float fmax) :
    varType(CVarInit::FLOAT)
  {
    this->f.min = fmin;
    this->f.max = fmax;
    const char *env = getenv(name);
    if (env != NULL) {
      sscanf(env, "%f", &f);
      f = std::min(fmax, std::max(fmin, f));
    }
    *addr = f;
  }

  CVarInit::CVarInit(const char *name, std::string *str, const std::string &v) :
    varType(CVarInit::STRING)
  {
    const char *env = getenv(name);
    *str = env != NULL ? env : v;
  }
} /* namespace gbe */
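A sketch of how CVarInit is typically wired to a console variable. The declaration macros (such as the BVAR used later in assert.cpp) live in sys/cvar.hpp, which is not part of this listing, so the expansion below is an assumption:

    #include "sys/cvar.hpp"

    namespace gbe {
      int32_t OCL_EXAMPLE_LEVEL;                    // hypothetical variable
      static CVarInit OCL_EXAMPLE_LEVEL_initializer
        ("OCL_EXAMPLE_LEVEL", &OCL_EXAMPLE_LEVEL,   // env var name + target
         0 /*min*/, 1 /*default*/, 4 /*max*/);      // clamped to [0,4]
    }

Running with OCL_EXAMPLE_LEVEL=3 in the environment sets the variable to 3; a value of 9 would be clamped to the declared maximum of 4.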
Beignet-1.3.2-Source/backend/src/sys/list.hpp

/*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file list.hpp
 *
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */

#ifndef __GBE_LIST_HPP__
#define __GBE_LIST_HPP__

#include "sys/platform.hpp"
#include <list> /* header name inferred; it was lost in extraction */

namespace gbe
{
  /*! Use custom allocator instead of std one */
  template<typename T>
  class list : public std::list<T, Allocator<T>>
  {
  public:
    // Typedefs
    typedef T value_type;
    typedef Allocator<T> allocator_type;
    typedef std::list<T, Allocator<T>> parent_type;
    typedef typename allocator_type::size_type size_type;

    /*! Default constructor */
    INLINE explicit list(const allocator_type &a = allocator_type()) :
      parent_type(a) {}
    /*! Repetitive constructor */
    INLINE explicit list(size_type n,
                         const T &value = T(),
                         const allocator_type &a = allocator_type()) :
      parent_type(n, value, a) {}
    /*! Iteration constructor */
    template <class InputIterator>
    INLINE list(InputIterator first,
                InputIterator last,
                const allocator_type &a = allocator_type()) :
      parent_type(first, last, a) {}
    /*! Copy constructor */
    INLINE list(const list &x) : parent_type(x) {}
    GBE_CLASS(list);
  };
} /* namespace gbe */

#endif /* __GBE_LIST_HPP__ */

Beignet-1.3.2-Source/backend/src/sys/assert.cpp

/*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file assert.cpp
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */

#if GBE_COMPILE_UTESTS

#include "sys/assert.hpp"
#include "sys/exception.hpp"
#include "sys/cvar.hpp"
/* Header names inferred from usage (sprintf, exit); they were lost in extraction. */
#include <cstdio>
#include <cstdlib>

namespace gbe
{
  BVAR(OCL_BREAK_POINT_IN_ASSERTION, false);
  BVAR(OCL_ABORT_IN_ASSERTION, false);

  void onFailedAssertion(const char *msg, const char *file, const char *fn, int line)
  {
    char lineString[256];
    sprintf(lineString, "%i", line);
    assert(msg != NULL && file != NULL && fn != NULL);
    const std::string str = "Compiler error: " + std::string(msg) + "\n at file " +
                            std::string(file) + ", function " + std::string(fn) +
                            ", line " + std::string(lineString);
    if (OCL_BREAK_POINT_IN_ASSERTION) DEBUGBREAK();
    if (OCL_ABORT_IN_ASSERTION) {
      assert(false);
      exit(-1);
    }
    throw Exception(str);
  }
} /* namespace gbe */

#else

#include "sys/assert.hpp"
#include "sys/exception.hpp"
#include "sys/platform.hpp"
/* Header names inferred from usage (assert, fprintf, _exit); they were lost in extraction. */
#include <cassert>
#include <cstdio>
#include <unistd.h>

namespace gbe
{
  void onFailedAssertion(const char *msg, const char *file, const char *fn, int32_t line)
  {
    assert(msg != NULL && file != NULL && fn != NULL);
    fprintf(stderr, "ASSERTION FAILED: %s\n"
                    " at file %s, function %s, line %i\n",
            msg, file, fn, line);
    fflush(stdout);
    DEBUGBREAK();
    _exit(-1);
  }
} /* namespace gbe */

#endif /* GBE_COMPILE_UTESTS */
Beignet-1.3.2-Source/backend/src/sys/set.hpp

/*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file set.hpp
 *
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */

#ifndef __GBE_SET_HPP__
#define __GBE_SET_HPP__

#include "sys/platform.hpp"
#include <set>

namespace gbe
{
  /*! Add our custom allocator to std::set */
  template<class Key, class Pred = std::less<Key>, class Allocator = Allocator<Key>>
  class set : public std::set<Key, Pred, Allocator>, public NonCopyable
  {
  public:
    // Typedefs
    typedef Key value_type;
    typedef Allocator allocator_type;
    typedef std::set<Key, Pred, Allocator> parent_type;
    typedef Key key_type;
    typedef Pred key_compare;

    /*! Default constructor */
    INLINE set(const key_compare &comp = key_compare(),
               const allocator_type &a = allocator_type()) :
      parent_type(comp, a) {}
    /*! Iteration constructor */
    template<class InputIterator>
    INLINE set(InputIterator first,
               InputIterator last,
               const key_compare &comp = key_compare(),
               const allocator_type& a = allocator_type()) :
      parent_type(first, last, comp, a) {}
#if 0
    /*! Copy constructor */
    INLINE set(const set& x) : parent_type(x) {}
#endif
    /*! Better than using find if we do not care about the iterator itself */
    INLINE bool contains(const Key &key) const {
      return this->find(key) != this->end();
    }
    GBE_CLASS(set);
  };
} /* namespace gbe */

#endif /* __GBE_SET_HPP__ */
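The main convenience over std::set is contains(); a small usage sketch (the function and variable names are illustrative):

    #include "sys/set.hpp"

    static bool isReserved(const gbe::set<int> &reserved, int reg) {
      return reserved.contains(reg);  // one lookup, no iterator comparison at the call site
    }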
Beignet-1.3.2-Source/backend/src/sys/mutex.hpp

/*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

#ifndef __GBE_MUTEX_HPP__
#define __GBE_MUTEX_HPP__

#include "platform.hpp"
#include "atomic.hpp"
#include <xmmintrin.h> /* header name inferred (_mm_pause); it was lost in extraction */

namespace gbe
{
  class MutexSys {
    friend class ConditionSys;
  public:
    MutexSys(void);
    ~MutexSys(void);
    void lock(void);
    void unlock(void);
  protected:
    void* mutex;
    MutexSys(const MutexSys&);             // don't implement
    MutexSys& operator= (const MutexSys&); // don't implement
    GBE_CLASS(MutexSys);
  };

  /*! active mutex */
  class MutexActive {
  public:
    INLINE MutexActive(void) : _lock(LOCK_IS_FREE) {}
    INLINE void lock(void) {
      GBE_COMPILER_READ_BARRIER;
      while (cmpxchg(_lock, LOCK_IS_TAKEN, LOCK_IS_FREE) != LOCK_IS_FREE)
        _mm_pause();
      GBE_COMPILER_READ_BARRIER;
    }
    INLINE void unlock(void) { _lock.storeRelease(LOCK_IS_FREE); }
  protected:
    enum { LOCK_IS_FREE = 0, LOCK_IS_TAKEN = 1 };
    Atomic _lock;
    MutexActive(const MutexActive&);            // don't implement
    MutexActive& operator=(const MutexActive&); // don't implement
    GBE_CLASS(MutexActive);
  };

  /*! safe mutex lock and unlock helper */
  template<typename Mutex> class Lock {
  public:
    Lock (Mutex& mutex) : mutex(mutex) { mutex.lock(); }
    ~Lock() { mutex.unlock(); }
  protected:
    Mutex& mutex;
    Lock(const Lock&);             // don't implement
    Lock& operator= (const Lock&); // don't implement
    GBE_CLASS(Lock);
  };
}

#endif /* __GBE_MUTEX_HPP__ */

Beignet-1.3.2-Source/backend/src/sys/exception.hpp

/*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file exception.hpp
 *
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */

#ifndef __GBE_EXCEPTION_HPP__
#define __GBE_EXCEPTION_HPP__

#if GBE_COMPILE_UTESTS

#include <exception>
#include <string>

namespace gbe
{
  /*! Exception are only used while using unit tests */
  class Exception : public std::exception
  {
  public:
    Exception(const std::string &msg) throw() : msg(msg) {}
    Exception(const Exception &other) throw() : msg(other.msg) {}
    ~Exception(void) throw() {}
    Exception &operator= (const Exception &other) throw() {
      this->msg = other.msg;
      return *this;
    }
    const char *what(void) const throw() { return msg.c_str(); }
  private:
    std::string msg; //!< String message
  };
} /* namespace gbe */

#endif /* GBE_COMPILE_UTESTS */
#endif /* __GBE_EXCEPTION_HPP__ */

Beignet-1.3.2-Source/backend/src/sys/assert.hpp

/*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file assert.hpp
 *
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */

#ifndef __GBE_ASSERT_HPP__
#define __GBE_ASSERT_HPP__

namespace gbe
{
  /*! Reports a failed assertion. An optional message is supported */
  void onFailedAssertion(const char *msg, const char *file, const char *fn, int line);
} /* namespace gbe */

#endif /* __GBE_ASSERT_HPP__ */
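A sketch of the RAII locking idiom these classes support, as used later in alloc.cpp (the function and mutex names here are hypothetical):

    #include "sys/mutex.hpp"

    static gbe::MutexSys registryMutex;

    void updateRegistry(void) {
      gbe::Lock<gbe::MutexSys> lock(registryMutex); // locks in the constructor...
      /* ... touch shared state ... */
    }                                               // ...unlocks on every exit path

The same Lock template works with MutexActive for short critical sections, since both expose the lock()/unlock() pair.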
Beignet-1.3.2-Source/backend/src/sys/alloc.cpp000664 001750 001750 00000026547 13161142102 020257 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file alloc.cpp
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 *
 * Provides facilities to track allocations and pre-initialize memory at
 * memory allocation and memory free time
 */
#include "sys/alloc.hpp"
#include "sys/atomic.hpp"
#include "sys/mutex.hpp"

#if GBE_DEBUG_MEMORY
#include <tr1/unordered_map>
#include <vector>
#endif /* GBE_DEBUG_MEMORY */

#if defined(__ICC__)
#include <stdint.h>
#endif /* __ICC__ */
#include <map>
#include <iostream>
#include <iomanip>

////////////////////////////////////////////////////////////////////////////////
/// Memory debugger
////////////////////////////////////////////////////////////////////////////////
#if GBE_DEBUG_MEMORY
namespace gbe
{
  /*! Store each allocation data */
  struct AllocData {
    INLINE AllocData(void) {}
    INLINE AllocData(int fileName_, int functionName_, int line_, intptr_t alloc_) :
      fileName(fileName_), functionName(functionName_), line(line_), alloc(alloc_) {}
    int fileName, functionName, line;
    intptr_t alloc;
  };

  /*! Store allocation information */
  struct MemDebugger {
    MemDebugger(void) : unfreedNum(0), allocNum(0) {}
    ~MemDebugger(void) { this->dumpAlloc(); }
    void* insertAlloc(void *ptr, const char *file, const char *function, int line);
    void removeAlloc(void *ptr);
    void dumpAlloc(void);
    void dumpData(const AllocData &data);
    /*! Count the still unfreed allocations */
    volatile intptr_t unfreedNum;
    /*! Total number of allocations done */
    volatile intptr_t allocNum;
    /*! Map the file name and function name strings to small indices */
    std::tr1::unordered_map<const char*, int> staticStringMap;
    /*! Each element contains the actual string */
    std::vector<const char*> staticStringVector;
    std::map<uintptr_t, AllocData> allocMap;
    /*! Protect the memory debugger accesses */
    MutexSys mutex;
  };
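  /* [Editor's note, not part of the original source] staticStringMap interns
   * the __FILE__ / __FUNCTION__ strings by pointer: the key is the const char*
   * itself, which works because the GBE_NEW / GBE_MALLOC macros always pass
   * string literals, whose addresses are stable for the whole process (at
   * worst, identical text coming from two translation units is interned
   * twice). Each distinct pointer gets an index into staticStringVector, so an
   * AllocData stores two small ints instead of two strings. The idea in
   * isolation (intern is a hypothetical helper):
   *
   *   int intern(const char *s) {
   *     auto it = staticStringMap.find(s);  // pointer identity, not strcmp
   *     if (it != staticStringMap.end()) return it->second;
   *     staticStringVector.push_back(s);
   *     return staticStringMap[s] = int(staticStringVector.size()) - 1;
   *   }
   */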
  void* MemDebugger::insertAlloc(void *ptr, const char *file, const char *function, int line)
  {
    if (ptr == NULL) return ptr;
    Lock<MutexSys> lock(mutex);
    const uintptr_t iptr = (uintptr_t) ptr;
    if (UNLIKELY(allocMap.find(iptr) != allocMap.end())) {
      this->dumpData(allocMap.find(iptr)->second);
      FATAL("Pointer already in map");
    }
    const auto fileIt = staticStringMap.find(file);
    const auto functionIt = staticStringMap.find(function);
    int fileName, functionName;
    if (fileIt == staticStringMap.end()) {
      staticStringVector.push_back(file);
      staticStringMap[file] = fileName = int(staticStringVector.size()) - 1;
    } else
      fileName = staticStringMap[file];
    if (functionIt == staticStringMap.end()) {
      staticStringVector.push_back(function);
      staticStringMap[function] = functionName = int(staticStringVector.size()) - 1;
    } else
      functionName = staticStringMap[function];
    allocMap[iptr] = AllocData(fileName, functionName, line, allocNum);
    unfreedNum++;
    allocNum++;
    return ptr;
  }

  void MemDebugger::removeAlloc(void *ptr)
  {
    if (ptr == NULL) return;
    Lock<MutexSys> lock(mutex);
    const uintptr_t iptr = (uintptr_t) ptr;
    FATAL_IF(allocMap.find(iptr) == allocMap.end(), "Pointer not referenced");
    allocMap.erase(iptr);
    unfreedNum--;
  }

  void MemDebugger::dumpData(const AllocData &data) {
    std::cerr << "ALLOC " << data.alloc << ": "
              << "file " << staticStringVector[data.fileName] << ", "
              << "function " << staticStringVector[data.functionName] << ", "
              << "line " << data.line << std::endl;
  }

  void MemDebugger::dumpAlloc(void) {
    std::cerr << "MemDebugger: Unfreed number: " << unfreedNum << std::endl;
    for (const auto &alloc : allocMap)
      this->dumpData(alloc.second);
    std::cerr << "MemDebugger: " << staticStringVector.size()
              << " allocated static strings" << std::endl;
  }

  /*! The user can deactivate the memory initialization */
  static bool memoryInitializationEnabled = true;

  /*! Declare C like interface functions here */
  static MemDebugger *memDebugger = NULL;

  /*! Monitor maximum memory requirement in the compiler */
  static MutexSys *sizeMutex = NULL;
  static bool isMutexInitializing = false; //!< Set once the lazy mutex creation has started
  static size_t memDebuggerCurrSize(0u);
  static size_t memDebuggerMaxSize(0u);
  static void SizeMutexDeallocate(void) { if (sizeMutex) delete sizeMutex; }
  static void SizeMutexAllocate(void) {
    if (sizeMutex == NULL && isMutexInitializing == false) {
      isMutexInitializing = true;
      sizeMutex = new MutexSys;
      atexit(SizeMutexDeallocate);
    }
  }

  /*! Stop the memory debugger */
  static void MemDebuggerEnd(void) {
    MemDebugger *_debug = memDebugger;
    memDebugger = NULL;
    std::cout << "Maximum memory consumption: "
              << std::setprecision(2) << std::fixed
              << float(memDebuggerMaxSize) / 1024. << "KB" << std::endl;
    delete _debug;
    GBE_ASSERT(memDebuggerCurrSize == 0);
  }

  /*! Bring up the debugger at pre-main */
  static struct ForceMemDebugger {
    ForceMemDebugger(void) {
      doesnotmatter = GBE_NEW(int);
      GBE_DELETE(doesnotmatter);
    }
    int *doesnotmatter;
  } forceMemDebugger;
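  /* [Editor's note, not part of the original source] ForceMemDebugger is a
   * pre-main trick: constructing this file-scope instance performs one
   * GBE_NEW / GBE_DELETE pair during static initialization, which funnels into
   * MemDebuggerInsertAlloc and therefore into MemDebuggerStart before any
   * other translation unit can allocate; since atexit handlers run in reverse
   * registration order, registering MemDebuggerEnd that early makes the leak
   * report come out near the very end of the process. The same idiom in
   * isolation:
   *
   *   struct RunEarly { RunEarly() { / * runs during static init * / } };
   *   static RunEarly runEarly; // constructed before main() is entered
   */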
  /*! Start the memory debugger */
  static void MemDebuggerStart(void) {
    if (memDebugger == NULL) {
      atexit(MemDebuggerEnd);
      memDebugger = new MemDebugger;
    }
  }

  void* MemDebuggerInsertAlloc(void *ptr, const char *file, const char *function, int line) {
    if (memDebugger == NULL) MemDebuggerStart();
    return memDebugger->insertAlloc(ptr, file, function, line);
  }
  void MemDebuggerRemoveAlloc(void *ptr) {
    if (memDebugger == NULL) MemDebuggerStart();
    memDebugger->removeAlloc(ptr);
  }
  void MemDebuggerDumpAlloc(void) {
    if (memDebugger == NULL) MemDebuggerStart();
    memDebugger->dumpAlloc();
  }
  void MemDebuggerEnableMemoryInitialization(bool enabled) {
    memoryInitializationEnabled = enabled;
  }
  void MemDebuggerInitializeMem(void *mem, size_t sz) {
    if (memoryInitializationEnabled)
      std::memset(mem, 0xcd, sz);
  }
} /* namespace gbe */
#endif /* GBE_DEBUG_MEMORY */

namespace gbe
{
#if GBE_DEBUG_MEMORY
  void* memAlloc(size_t size) {
    void *ptr = std::malloc(size + sizeof(size_t));
    *(size_t *) ptr = size;
    MemDebuggerInitializeMem((char*) ptr + sizeof(size_t), size);
    SizeMutexAllocate();
    if (sizeMutex) sizeMutex->lock();
    memDebuggerCurrSize += size;
    memDebuggerMaxSize = std::max(memDebuggerCurrSize, memDebuggerMaxSize);
    if (sizeMutex) sizeMutex->unlock();
    return (char *) ptr + sizeof(size_t);
  }
  void memFree(void *ptr) {
    if (ptr != NULL) {
      char *toFree = (char*) ptr - sizeof(size_t);
      const size_t size = *(size_t *) toFree;
      MemDebuggerInitializeMem(ptr, size);
      SizeMutexAllocate();
      if (sizeMutex) sizeMutex->lock();
      memDebuggerCurrSize -= size;
      if (sizeMutex) sizeMutex->unlock();
      std::free(toFree);
    }
  }
#else
  void* memAlloc(size_t size) { return std::malloc(size); }
  void memFree(void *ptr) { if (ptr != NULL) std::free(ptr); }
#endif /* GBE_DEBUG_MEMORY */
} /* namespace gbe */

#if GBE_DEBUG_MEMORY
namespace gbe
{
  void* alignedMalloc(size_t size, size_t align) {
    void* mem = malloc(size + align + sizeof(uintptr_t) + sizeof(void*));
    FATAL_IF (!mem && size, "memory allocation failed");
    char* aligned = (char*) mem + sizeof(uintptr_t) + sizeof(void*);
    aligned += align - ((uintptr_t)aligned & (align - 1));
    ((void**)aligned)[-1] = mem;
    ((uintptr_t*)aligned)[-2] = uintptr_t(size);
    MemDebuggerInitializeMem(aligned, size);
    SizeMutexAllocate();
    if (sizeMutex) sizeMutex->lock();
    memDebuggerCurrSize += size;
    memDebuggerMaxSize = std::max(memDebuggerCurrSize, memDebuggerMaxSize);
    if (sizeMutex) sizeMutex->unlock();
    return aligned;
  }

  void alignedFree(void* ptr) {
    if (ptr) {
      const size_t size = ((uintptr_t*)ptr)[-2];
      MemDebuggerInitializeMem(ptr, size);
      free(((void**)ptr)[-1]);
      SizeMutexAllocate();
      if (sizeMutex) sizeMutex->lock();
      memDebuggerCurrSize -= size;
      if (sizeMutex) sizeMutex->unlock();
    }
  }
} /* namespace gbe */
#else /* GBE_DEBUG_MEMORY */

////////////////////////////////////////////////////////////////////////////////
/// Linux Platform
////////////////////////////////////////////////////////////////////////////////
#if defined(__LINUX__) || defined(__GLIBC__)
#include <unistd.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <string.h>
#include <malloc.h>
namespace gbe
{
  void* alignedMalloc(size_t size, size_t align) {
    void* ptr = memalign(align, size);
    FATAL_IF (!ptr && size, "memory allocation failed");
    MemDebuggerInitializeMem(ptr, size);
    return ptr;
  }
  void alignedFree(void *ptr) { if (ptr) std::free(ptr); }
} /* namespace gbe */
#else
#error "Unsupported platform"
#endif /* __LINUX__ */
#endif

////////////////////////////////////////////////////////////////////////////////
// Linear allocator
////////////////////////////////////////////////////////////////////////////////
namespace gbe
{
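  /* [Editor's note, not part of the original source] The debug-mode allocators
   * above all use the same "header before the pointer" layout: memAlloc()
   * stores the size_t byte count just below the address it hands out, and the
   * debug alignedMalloc() stores the raw malloc pointer at aligned[-1] and the
   * size at aligned[-2], so the matching free function can recover both with
   * negative indexing:
   *
   *   | raw malloc block ... | header(s) | user data ............ |
   *                                      ^-- pointer returned to the caller
   *
   * This is also why the two families must never be mixed: memory from
   * memAlloc() has to go back through memFree(), never alignedFree().
   */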
  LinearAllocator::Segment::Segment(size_t size) :
    size(size), offset(0u), data(alignedMalloc(size, CACHE_LINE)), next(NULL) {}

  LinearAllocator::Segment::~Segment(void) {
    alignedFree(data);
    if (this->next) GBE_DELETE(this->next);
  }

  LinearAllocator::LinearAllocator(size_t minSize, size_t maxSize) :
    maxSize(std::max(maxSize, size_t(CACHE_LINE)))
  {
    this->curr = GBE_NEW(LinearAllocator::Segment, std::max(minSize, size_t(1)));
  }

  LinearAllocator::~LinearAllocator(void) {
    if (this->curr) GBE_DELETE(this->curr);
  }

  void *LinearAllocator::allocate(size_t size)
  {
#if GBE_DEBUG_SPECIAL_ALLOCATOR
    return GBE_ALIGNED_MALLOC(size, sizeof(void*));
#else
    // Try to use the current segment. This is the most likely condition here
    this->curr->offset = ALIGN(this->curr->offset, sizeof(void*));
    if (this->curr->offset + size <= this->curr->size) {
      char *ptr = (char*) curr->data + this->curr->offset;
      this->curr->offset += size;
      return (void*) ptr;
    }

    // Well not really a use case in this code base
    if (UNLIKELY(size > maxSize)) {
      // This is really bad since we do two allocations
      Segment *unfortunate = GBE_NEW(Segment, size);
      GBE_ASSERT(this->curr);
      Segment *next = this->curr->next;
      this->curr->next = unfortunate;
      unfortunate->next = next;
      return unfortunate->data;
    }

    // OK. We need a new segment
    const size_t segmentSize = std::max(size, 2*this->curr->size);
    Segment *next = GBE_NEW(Segment, segmentSize);
    next->next = curr;
    this->curr = next;
    char *ptr = (char*) curr->data;
    this->curr->offset += size;
    return ptr;
#endif
  }
} /* namespace gbe */
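/* [Editor's illustration, not part of the original source] A bump allocator
 * like LinearAllocator has no per-object free: allocate() just advances an
 * offset inside the current Segment, doubles the segment size when it runs
 * out, and releases every segment at once in the destructor. Minimal usage
 * sketch (the sizes below are arbitrary):
 *
 *   gbe::LinearAllocator scratch(4*gbe::KB, 64*gbe::KB);
 *   int  *a = (int*)  scratch.allocate(16 * sizeof(int)); // cheap pointer bump
 *   char *b = (char*) scratch.allocate(100);              // same segment if it fits
 *   // ... no deallocate calls needed; ~LinearAllocator frees all segments
 */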
Beignet-1.3.2-Source/backend/src/sys/atomic.hpp000664 001750 001750 00000004155 13161142102 020435 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */
#ifndef __GBE_ATOMIC_HPP__
#define __GBE_ATOMIC_HPP__

#include "sys/intrinsics.hpp"

namespace gbe
{
  template <typename T>
  struct AtomicInternal {
  protected:
    AtomicInternal(const AtomicInternal&);             // don't implement
    AtomicInternal& operator= (const AtomicInternal&); // don't implement
  public:
    INLINE AtomicInternal(void) {}
    INLINE AtomicInternal(T data) : data(data) {}
    INLINE AtomicInternal& operator =(const T input) { data = input; return *this; }
    INLINE operator T() const { return data; }
    INLINE void storeRelease(T x) { __store_release(&data, x); }
  public:
    INLINE friend T operator+= (AtomicInternal& value, T input) { return atomic_add(&value.data, input) + input; }
    INLINE friend T operator++ (AtomicInternal& value) { return atomic_add(&value.data, 1) + 1; }
    INLINE friend T operator-- (AtomicInternal& value) { return atomic_add(&value.data, -1) - 1; }
    INLINE friend T operator++ (AtomicInternal& value, int) { return atomic_add(&value.data, 1); }
    INLINE friend T operator-- (AtomicInternal& value, int) { return atomic_add(&value.data, -1); }
    INLINE friend T cmpxchg (AtomicInternal& value, const T v, const T c) { return atomic_cmpxchg(&value.data, v, c); }
  private:
    volatile T data;
    GBE_STRUCT(AtomicInternal);
  };

  typedef AtomicInternal<atomic32_t> Atomic32;
  typedef AtomicInternal<atomic_t> Atomic;
}

#endif /* __GBE_ATOMIC_HPP__ */
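/* [Editor's note, not part of the original source] Two conventions above are
 * easy to misread: atomic_add() returns the PREVIOUS value (hence the "+ 1"
 * in pre-increment), and cmpxchg() returns the old value rather than a
 * success flag. storeRelease() pairs with __load_acquire / __store_release
 * from intrinsics.hpp (included above). A small sketch of both:
 *
 *   gbe::Atomic32 n(0);
 *   int32_t a = ++n;               // a == 1 (new value)
 *   int32_t b = n++;               // b == 1 (old value), n is now 2
 *   int32_t c = cmpxchg(n, 5, 2);  // n becomes 5 iff it was 2; returns 2
 *
 *   // release/acquire publication (payload and flag are hypothetical):
 *   static int payload; static volatile int32_t flag = 0;
 *   void produce(void) { payload = 42; __store_release(&flag, int32_t(1)); }
 *   int  consume(void) { return __load_acquire(&flag) ? payload : -1; }
 */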
Beignet-1.3.2-Source/backend/src/sys/vector.hpp000664 001750 001750 00000005052 13161142102 020460 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file vector.hpp
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */
#ifndef __GBE_VECTOR_HPP__
#define __GBE_VECTOR_HPP__

#include "sys/platform.hpp"
#include <vector>

namespace gbe
{
  /*! Add bound checks to the standard vector class and use the internal
   *  allocator */
  template <class T, class Allocator = Allocator<T>>
  class vector : public std::vector<T, Allocator>
  {
  public:
    // Typedefs
    typedef std::vector<T, Allocator> parent_type;
    typedef Allocator allocator_type;
    typedef typename allocator_type::size_type size_type;
    typedef typename parent_type::iterator iterator;

    /*! Default constructor */
    INLINE explicit vector(const allocator_type &a = allocator_type()) :
      parent_type(a) {}
#if 0
    /*! Copy constructor */
    INLINE vector(const vector &x) : parent_type(x) {}
#endif
    /*! Repetitive sequence constructor */
    INLINE explicit vector(size_type n,
                           const T& value = T(),
                           const allocator_type &a = allocator_type()) :
      parent_type(n, value, a) {}
    /*! Iteration constructor */
    template <class InputIterator>
    INLINE vector(InputIterator first,
                  InputIterator last,
                  const allocator_type &a = allocator_type()) :
      parent_type(first, last, a) {}
    /*! Get element at position index (with a bound check) */
    T &operator[] (size_t index) {
      GBE_ASSERT(index < this->size());
      return parent_type::operator[] (index);
    }
    /*! Get element at position index (with a bound check) */
    const T &operator[] (size_t index) const {
      GBE_ASSERT(index < this->size());
      return parent_type::operator[] (index);
    }
    GBE_CLASS(vector);
  };
} /* namespace gbe */

#endif /* __GBE_VECTOR_HPP__ */
Beignet-1.3.2-Source/backend/src/sys/intrinsics.hpp000664 001750 001750 00000013466 13161142102 021353 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */
#ifndef __GBE_INTRINSICS_HPP__
#define __GBE_INTRINSICS_HPP__

#include "sys/platform.hpp"
#include <xmmintrin.h>
#include <emmintrin.h>

#if defined(__MSVC__)

#include <intrin.h>
#define GBE_COMPILER_WRITE_BARRIER      _WriteBarrier()
#define GBE_COMPILER_READ_WRITE_BARRIER _ReadWriteBarrier()

#if _MSC_VER >= 1400
#pragma intrinsic(_ReadBarrier)
#define GBE_COMPILER_READ_BARRIER _ReadBarrier()
#else
#define GBE_COMPILER_READ_BARRIER _ReadWriteBarrier()
#endif /* _MSC_VER >= 1400 */

INLINE int __bsf(int v) { unsigned long r = 0; _BitScanForward(&r,v); return r; }
INLINE int __bsr(int v) { unsigned long r = 0; _BitScanReverse(&r,v); return r; }
INLINE int __btc(int v, int i) { long r = v; _bittestandcomplement(&r,i); return r; }
INLINE int __bts(int v, int i) { long r = v; _bittestandset(&r,i); return r; }
INLINE int __btr(int v, int i) { long r = v; _bittestandreset(&r,i); return r; }
INLINE void memoryFence(void) { _mm_mfence(); }

#if defined(__X86_64__) && !defined(__INTEL_COMPILER)
INLINE size_t __bsf(size_t v) { unsigned long r = 0; _BitScanForward64(&r,v); return r; }
INLINE size_t __bsr(size_t v) { unsigned long r = 0; _BitScanReverse64(&r,v); return r; }
INLINE size_t __btc(size_t v, size_t i) { __int64_t r = v; _bittestandcomplement64(&r,i); return r; }
INLINE size_t __bts(size_t v, size_t i) { __int64_t r = v; _bittestandset64(&r,i); return r; }
INLINE size_t __btr(size_t v, size_t i) { __int64_t r = v; _bittestandreset64(&r,i); return r; }
#endif /* defined(__X86_64__) && !defined(__INTEL_COMPILER) */

typedef int32_t atomic32_t;
INLINE int32_t atomic_add(volatile int32_t* m, const int32_t v) { return _InterlockedExchangeAdd((volatile long*)m,v); }
INLINE int32_t atomic_cmpxchg(volatile int32_t* m, const int32_t v, const int32_t c) { return _InterlockedCompareExchange((volatile long*)m,v,c); }

#if defined(__X86_64__)
typedef int64_t atomic_t;
INLINE int64_t atomic_add(volatile int64_t* m, const int64_t v) { return _InterlockedExchangeAdd64(m,v); }
INLINE int64_t atomic_cmpxchg(volatile int64_t* m, const int64_t v, const int64_t c) { return _InterlockedCompareExchange64(m,v,c); }
#else
typedef int32_t atomic_t;
#endif /* defined(__X86_64__) */

#else

INLINE unsigned int __popcnt(unsigned int in) {
  int r = 0; asm ("popcnt %1,%0" : "=r"(r) : "r"(in)); return r;
}
INLINE int __bsf(int v) { int r = 0; asm ("bsf %1,%0" : "=r"(r) : "r"(v)); return r; }
INLINE int __bsr(int v) { int r = 0; asm ("bsr %1,%0" : "=r"(r) : "r"(v)); return r; }
INLINE int __btc(int v, int i) { int r = 0; asm ("btc %1,%0" : "=r"(r) : "r"(i), "0"(v) : "flags"); return r; }
INLINE int __bts(int v, int i) { int r = 0; asm ("bts %1,%0" : "=r"(r) : "r"(i), "0"(v) : "flags"); return r; }
INLINE int __btr(int v, int i) { int r = 0; asm ("btr %1,%0" : "=r"(r) : "r"(i), "0"(v) : "flags"); return r; }
INLINE size_t __bsf(size_t v) { size_t r = 0; asm ("bsf %1,%0" : "=r"(r) : "r"(v)); return r; }
INLINE size_t __bsr(size_t v) { size_t r = 0; asm ("bsr %1,%0" : "=r"(r) : "r"(v)); return r; }
INLINE size_t __btc(size_t v, size_t i) { size_t r = 0; asm ("btc %1,%0" : "=r"(r) : "r"(i), "0"(v) : "flags"); return r; }
INLINE size_t __bts(size_t v, size_t i) { size_t r = 0; asm ("bts %1,%0" : "=r"(r) : "r"(i), "0"(v) : "flags"); return r; }
INLINE size_t __btr(size_t v, size_t i) { size_t r = 0; asm ("btr %1,%0" : "=r"(r) : "r"(i), "0"(v) : "flags"); return r; }
INLINE void memoryFence(void) { _mm_mfence(); }

typedef int32_t atomic32_t;
INLINE int32_t atomic_add(int32_t volatile* value, int32_t input) {
  asm volatile("lock xadd %0,%1" : "+r"(input), "+m"(*value) : "r"(input), "m"(*value));
  return input;
}
INLINE int32_t atomic_cmpxchg(int32_t volatile* value, const int32_t input, int32_t comparand) {
  asm volatile("lock cmpxchg %2,%0" : "=m"(*value), "=a"(comparand) : "r"(input), "m"(*value), "a"(comparand) : "flags");
  return comparand;
}

#if defined(__X86_64__)
typedef int64_t atomic_t;
INLINE int64_t atomic_add(int64_t volatile* value, int64_t input) {
  asm volatile("lock xaddq %0,%1" : "+r"(input), "+m"(*value) : "r"(input), "m"(*value));
  return input;
}
INLINE int64_t atomic_cmpxchg(int64_t volatile* value, const int64_t input, int64_t comparand) {
  asm volatile("lock cmpxchgq %2,%0" : "+m"(*value), "+a"(comparand) : "r"(input), "m"(*value), "r"(comparand) : "flags");
  return comparand;
}
#else
typedef int32_t atomic_t;
#endif /* defined(__X86_64__) */

#define GBE_COMPILER_READ_WRITE_BARRIER asm volatile("" ::: "memory");
#define GBE_COMPILER_WRITE_BARRIER GBE_COMPILER_READ_WRITE_BARRIER
#define GBE_COMPILER_READ_BARRIER GBE_COMPILER_READ_WRITE_BARRIER

#endif /* __MSVC__ */

template <typename T>
INLINE T __load_acquire(volatile T *ptr)
{
  GBE_COMPILER_READ_WRITE_BARRIER;
  T x = *ptr; // for x86, load == load_acquire
  GBE_COMPILER_READ_WRITE_BARRIER;
  return x;
}

template <typename T>
INLINE void __store_release(volatile T *ptr, T x)
{
  GBE_COMPILER_READ_WRITE_BARRIER;
  *ptr = x; // for x86, store == store_release
  GBE_COMPILER_READ_WRITE_BARRIER;
}

#endif /* __GBE_INTRINSICS_HPP__ */
Beignet-1.3.2-Source/backend/src/sys/fixed_array.hpp000664 001750 001750 00000005354 13161142102 021460 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file fixed_array.hpp
 *
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */
#ifndef __GBE_FIXED_ARRAY_HPP__
#define __GBE_FIXED_ARRAY_HPP__

#include "platform.hpp"
#include <cstring>

namespace gbe
{
  /*! Regular C array but with bound checks */
  template <typename T, size_t N>
  class fixed_array
  {
  public:
    /*! Do not initialize the data */
    fixed_array(void) {}
    /*! Copy the input array */
    fixed_array(const T array[N]) { std::memcpy(elem, array, N * sizeof(T)); }
    /*! First element (non const) */
    T* begin(void) { return &elem[0]; }
    /*! First non-valid element (non const) */
    T* end(void) { return begin() + N; }
    /*! First element (const) */
    const T* begin(void) const { return &elem[0]; }
    /*! First non-valid element (const) */
    const T* end(void) const { return begin() + N; }
    /*! Number of elements in the array */
    size_t size(void) const { return N; }
    /*! Get the pointer to the data (non-const) */
    T* data(void) { return &elem[0]; }
    /*! Get the pointer to the data (const) */
    const T* data(void) const { return &elem[0]; }
    /*! First element (const) */
    const T& front(void) const { return *begin(); }
    /*! Last element (const) */
    const T& back(void) const { return *(end() - 1); }
    /*! First element (non-const) */
    T& front(void) { return *begin(); }
    /*! Last element (non-const) */
    T& back(void) { return *(end() - 1); }
    /*! Get element at position index (with bound check) */
    INLINE T& operator[] (size_t index) {
      GBE_ASSERT(index < size());
      return elem[index];
    }
    /*! Get element at position index (with bound check) */
    INLINE const T& operator[] (size_t index) const {
      GBE_ASSERT(index < size());
      return elem[index];
    }
  private:
    T elem[N];             //!< Store the elements
    STATIC_ASSERT(N > 0);  //!< zero element is not allowed
    GBE_CLASS(fixed_array);
  };
} /* namespace gbe */

#endif /* __GBE_FIXED_ARRAY_HPP__ */
Beignet-1.3.2-Source/backend/src/sys/map.hpp000664 001750 001750 00000004750 13161142102 017737 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file map.hpp
 *
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */
#ifndef __GBE_MAP_HPP__
#define __GBE_MAP_HPP__

#include "sys/platform.hpp"
#include <map>

namespace gbe
{
  /*! Use custom allocator instead of std one */
  template <class Key, class T, class Pred = std::less<Key>,
            class Allocator = Allocator<std::pair<const Key, T>>>
  class map : public std::map<Key, T, Pred, Allocator>, public NonCopyable
  {
  public:
    // Typedefs
    typedef std::pair<const Key, T> value_type;
    typedef Allocator allocator_type;
    typedef std::map<Key, T, Pred, Allocator> parent_type;
    typedef Key key_type;
    typedef T mapped_type;
    typedef Pred key_compare;
    typedef typename allocator_type::pointer pointer;
    typedef typename allocator_type::const_pointer const_pointer;
    typedef typename allocator_type::reference reference;
    typedef typename allocator_type::const_reference const_reference;

    /*! Default constructor */
    INLINE map(const key_compare &comp = key_compare(),
               const allocator_type &a = allocator_type()) : parent_type(comp, a) {}
    /*! Iteration constructor */
    template <class InputIterator>
    INLINE map(InputIterator first,
               InputIterator last,
               const key_compare &comp = key_compare(),
               const allocator_type& a = allocator_type()) :
      parent_type(first, last, comp, a) {}
#if 0
    /*! Copy constructor */
    INLINE map(const map& x) : parent_type(x) {}
#endif
    /*! Better than using find if we do not care about the iterator itself */
    INLINE bool contains(const Key &key) const {
      return this->find(key) != this->end();
    }
    GBE_CLASS(map);
  };
} /* namespace gbe */

#endif /* __GBE_MAP_HPP__ */
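/* [Editor's illustration, not part of the original source] The containers in
 * this directory (vector, fixed_array, set, map) deliberately keep the STL
 * interface and only add two things: debug-build bound checks through
 * GBE_ASSERT, and the contains() shortcut on the associative containers.
 * Sketch:
 *
 *   gbe::fixed_array<int, 4> arr;
 *   arr[0] = 1;                    // checked: GBE_ASSERT(index < size())
 *   gbe::map<int, int> m;
 *   m.insert(std::make_pair(1, 2));
 *   bool seen = m.contains(1);     // find() != end(), no iterator needed
 */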
Beignet-1.3.2-Source/backend/src/sys/platform.hpp000664 001750 001750 00000030411 13161142102 020777 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */
#ifndef __GBE_PLATFORM_HPP__
#define __GBE_PLATFORM_HPP__

#include <cstddef>
#include <cstdlib>
#include <cstdio>
#include <cstring>
#include <cassert>
#include <iostream>
#include <limits>
#include <vector>
#include <new>

////////////////////////////////////////////////////////////////////////////////
/// CPU architecture
////////////////////////////////////////////////////////////////////////////////

/* detect 32 or 64 platform */
#if defined(__x86_64__) || defined(__ia64__) || defined(_M_X64)
#define __X86_64__
#else
#define __X86__
#endif

/* We require SSE ... */
#ifndef __SSE__
#define __SSE__
#endif

/* ... and SSE2 */
#ifndef __SSE2__
#define __SSE2__
#endif

#if defined(_INCLUDED_IMM)
// #define __AVX__
#endif

#if defined(_MSC_VER) && (_MSC_VER < 1600) && !defined(__INTEL_COMPILER) || defined(_DEBUG) && defined(_WIN32)
#define __NO_AVX__
#endif

#if defined(_MSC_VER) && !defined(__SSE4_2__)
// #define __SSE4_2__  //! activates SSE4.2 support
#endif

////////////////////////////////////////////////////////////////////////////////
/// Operating system
////////////////////////////////////////////////////////////////////////////////

/* detect Linux platform */
#if defined(linux) || defined(__linux__) || defined(__LINUX__)
#  if !defined(__LINUX__)
#     define __LINUX__
#  endif
#  if !defined(__UNIX__)
#     define __UNIX__
#  endif
#endif

/* detect FreeBSD platform */
#if defined(__FreeBSD__) || defined(__FREEBSD__)
#  if !defined(__FREEBSD__)
#     define __FREEBSD__
#  endif
#  if !defined(__UNIX__)
#     define __UNIX__
#  endif
#endif

/* detect Windows 95/98/NT/2000/XP/Vista/7 platform */
#if (defined(WIN32) || defined(_WIN32) || defined(__WIN32__) || defined(__NT__)) && !defined(__CYGWIN__)
#  if !defined(__WIN32__)
#     define __WIN32__
#  endif
#endif

/* detect Cygwin platform */
#if defined(__CYGWIN__)
#  if !defined(__UNIX__)
#     define __UNIX__
#  endif
#endif

/* detect MAC OS X platform */
#if defined(__APPLE__) || defined(MACOSX) || defined(__MACOSX__)
#  if !defined(__MACOSX__)
#     define __MACOSX__
#  endif
#  if !defined(__UNIX__)
#     define __UNIX__
#  endif
#endif

/* try to detect other Unix systems */
#if defined(__unix__) || defined (unix) || defined(__unix) || defined(_unix)
#  if !defined(__UNIX__)
#     define __UNIX__
#  endif
#endif

////////////////////////////////////////////////////////////////////////////////
/// Compiler
////////////////////////////////////////////////////////////////////////////////

/*! GCC compiler */
#ifdef __GNUC__
// #define __GNUC__
#endif

/*! Intel compiler */
#ifdef __INTEL_COMPILER
#define __ICC__
#endif

/*!
Visual C compiler */ #ifdef _MSC_VER #define __MSVC__ #endif //////////////////////////////////////////////////////////////////////////////// /// Makros //////////////////////////////////////////////////////////////////////////////// #ifdef __WIN32__ #define __dllexport extern "C" __declspec(dllexport) #define __dllimport extern "C" __declspec(dllimport) #else #define __dllexport extern "C" #define __dllimport extern "C" #endif #ifdef __MSVC__ #undef NOINLINE #define NOINLINE __declspec(noinline) #define INLINE __forceinline #define RESTRICT __restrict #define THREAD __declspec(thread) #define ALIGNED(...) __declspec(align(__VA_ARGS__)) //#define __FUNCTION__ __FUNCTION__ #define DEBUGBREAK() __debugbreak() #else #undef NOINLINE #undef INLINE #define NOINLINE __attribute__((noinline)) #define INLINE inline __attribute__((always_inline)) #define RESTRICT __restrict #define THREAD __thread #define ALIGNED(...) __attribute__((aligned(__VA_ARGS__))) #define __FUNCTION__ __PRETTY_FUNCTION__ #define DEBUGBREAK() asm ("int $3") #endif /*! Modern x86 processors */ #define CACHE_LINE 64 #define CACHE_LINE_ALIGNED ALIGNED(CACHE_LINE) #ifdef __GNUC__ #define MAYBE_UNUSED __attribute__((used)) #else #define MAYBE_UNUSED #endif #if defined(_MSC_VER) #define __builtin_expect(expr,b) expr #endif /*! Debug syntactic sugar */ #if GBE_DEBUG #define IF_DEBUG(EXPR) EXPR #else #define IF_DEBUG(EXPR) #endif /* GBE_DEBUG */ /*! Debug printing macros */ #define STRING(x) #x #define PING std::cout << __FILE__ << " (" << __LINE__ << "): " << __FUNCTION__ << std::endl #define PRINT(x) std::cout << STRING(x) << " = " << (x) << std::endl /*! Branch hint */ #define LIKELY(x) __builtin_expect(!!(x),1) #define UNLIKELY(x) __builtin_expect((x),0) /*! Stringify macros */ #define JOIN(X, Y) _DO_JOIN(X, Y) #define _DO_JOIN(X, Y) _DO_JOIN2(X, Y) #define _DO_JOIN2(X, Y) X##Y /*! Run-time assertion */ #if GBE_DEBUG #define GBE_ASSERT(EXPR) do { \ if (UNLIKELY(!(EXPR))) \ gbe::onFailedAssertion(#EXPR, __FILE__, __FUNCTION__, __LINE__); \ } while (0) #define GBE_ASSERTM(EXPR, MSG) do { \ if (UNLIKELY(!(EXPR))) \ gbe::onFailedAssertion(MSG, __FILE__, __FUNCTION__, __LINE__); \ } while (0) #else #define GBE_ASSERT(EXPR) do { } while (0) #define GBE_ASSERTM(EXPR, MSG) do { } while (0) #endif /* GBE_DEBUG */ #define NOT_IMPLEMENTED GBE_ASSERTM (false, "Not implemented") #define NOT_SUPPORTED GBE_ASSERTM (false, "Not supported") /*! Fatal error macros */ #define FATAL_IF(COND, MSG) \ do { \ if(UNLIKELY(COND)) FATAL(MSG); \ } while (0) /* Safe deletion macros */ #define GBE_SAFE_DELETE_ARRAY(x) do { if (x != NULL) GBE_DELETE_ARRAY(x); } while (0) #define GBE_SAFE_DELETE(x) do { if (x != NULL) GBE_DELETE(x); } while (0) /* Number of elements in an array */ #define ARRAY_ELEM_NUM(x) (sizeof(x) / sizeof(x[0])) /* Align X on A */ #define ALIGN(X,A) (((X) % (A)) ? ((X) + (A) - ((X) % (A))) : (X)) /*! Produce a string from the macro locatiom */ #define HERE (STRING(__LINE__) "@" __FILE__) /*! Typesafe encapusalation of a type (mostly for integers) */ #define TYPE_SAFE(SAFE, UNSAFE) \ class SAFE \ { \ public: \ INLINE SAFE(void) {} \ explicit INLINE SAFE(UNSAFE unsafe) : unsafe(unsafe) {} \ INLINE operator UNSAFE (void) const { return unsafe; } \ UNSAFE value(void) const { return unsafe; } \ private: \ UNSAFE unsafe; \ }; /*! Default alignment for the platform */ #define GBE_DEFAULT_ALIGNMENT 16 namespace gbe { /*! Useful constants */ enum { KB = 1024, MB = (KB*KB), }; } /*! 
Portable AlignOf */ template struct AlignOf { struct Helper { char x; T t; }; enum { value = offsetof(Helper, t) }; }; //gcc 4.8+ support C++11 alignof keyword #if (__GNUC__ >= 4 && __GNUC_MINOR__ >= 8) #define ALIGNOF(T) (alignof(T)) #else #define ALIGNOF(T) (AlignOf::value) #endif //////////////////////////////////////////////////////////////////////////////// /// Visibility parameters (DLL export and so on) //////////////////////////////////////////////////////////////////////////////// #if defined __WIN32__ #if defined __GNUC__ #define GBE_EXPORT_SYMBOL __attribute__ ((dllexport)) #define GBE_IMPORT_SYMBOL __attribute__ ((dllimport)) #else #define GBE_IMPORT_SYMBOL __declspec(dllimport) #define GBE_EXPORT_SYMBOL __declspec(dllexport) #endif /* __GNUC__ */ #else #define GBE_EXPORT_SYMBOL __attribute__ ((visibility ("default"))) #define GBE_IMPORT_SYMBOL #endif /* __WIN32__ */ //////////////////////////////////////////////////////////////////////////////// /// Basic Types //////////////////////////////////////////////////////////////////////////////// #if defined(__MSVC__) typedef __int64_t int64_t; typedef unsigned __int64_t uint64_t; typedef __int32_t int32_t; typedef unsigned __int32_t uint32_t; typedef __int16_t int16_t; typedef unsigned __int16_t uint16_t; typedef __int8_t int8_t; typedef unsigned __int8_t uint8_t; #else #include #endif #if defined(__X86_64__) typedef int64_t index_t; #else typedef int32_t index_t; #endif /*! To protect some classes from being copied */ class NonCopyable { protected: INLINE NonCopyable(void) {} INLINE ~NonCopyable(void) {} private: INLINE NonCopyable(const NonCopyable&) {} INLINE NonCopyable& operator= (const NonCopyable&) {return *this;} }; #define TO_MAGIC(A, B, C, D) (A<<24 | B<<16 | C<<8 | D) class Serializable { public: INLINE Serializable(void) = default; INLINE Serializable(const Serializable&) = default; INLINE Serializable& operator= (const Serializable&) = default; virtual uint32_t serializeToBin(std::ostream& outs) = 0; virtual uint32_t deserializeFromBin(std::istream& ins) = 0; /* These two will follow LLVM's ABI. */ virtual uint32_t serializeToLLVM(void) { return 0;/* not implemented now. */} virtual uint32_t deserializeFromLLVM(void) { return 0;/* not implemented now. */} virtual void printStatus(int indent = 0, std::ostream& outs = std::cout) { } virtual ~Serializable(void) { } protected: static std::string indent_to_str(int indent) { std::string ind(indent, ' '); return ind; } }; /* Help Macro for serialization. 
*/ #define SERIALIZE_OUT(elt, out, sz) \ do { \ auto tmp_val = elt; \ out.write((char *)(&tmp_val), sizeof(elt)); \ sz += sizeof(elt); \ } while(0) #define DESERIALIZE_IN(elt, in, sz) \ do { \ in.read((char *)(&(elt)), sizeof(elt)); \ sz += sizeof(elt); \ } while(0) //////////////////////////////////////////////////////////////////////////////// /// Disable some compiler warnings //////////////////////////////////////////////////////////////////////////////// #ifdef __ICC__ #pragma warning(disable:265) // floating-point operation result is out of range #pragma warning(disable:383) // value copied to temporary, reference to temporary used #pragma warning(disable:869) // parameter was never referenced #pragma warning(disable:981) // operands are evaluated in unspecified order #pragma warning(disable:1418) // external function definition with no prior declaration #pragma warning(disable:1419) // external declaration in primary source file #pragma warning(disable:1572) // floating-point equality and inequality comparisons are unreliable #pragma warning(disable:1125) // virtual function override intended? #endif /* __ICC__ */ //////////////////////////////////////////////////////////////////////////////// /// Default Includes and Functions //////////////////////////////////////////////////////////////////////////////// #include "sys/alloc.hpp" namespace gbe { /*! selects */ INLINE bool select(bool s, bool t , bool f) { return s ? t : f; } INLINE int select(bool s, int t, int f) { return s ? t : f; } INLINE float select(bool s, float t, float f) { return s ? t : f; } /*! Fatal error function */ void FATAL(const std::string&); /*! Return the next power of 2 */ INLINE uint32_t nextHighestPowerOf2(uint32_t x) { x--; x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16; return ++x; } INLINE uint32_t logi2(uint32_t x) { uint32_t r = 0; while(x >>= 1) r++; return r; } template INLINE uint32_t isPowerOf(uint32_t i) { while (i > 1) { if (i%N) return false; i = i/N; } return true; } template<> INLINE uint32_t isPowerOf<2>(uint32_t i) { return ((i-1)&i) == 0; } /*! random functions */ template T random() { return T(0); } template<> INLINE int32_t random() { return int(rand()); } template<> INLINE uint32_t random() { return uint32_t(rand()); } template<> INLINE float random() { return random()/float(RAND_MAX); } template<> INLINE double random() { return random()/double(RAND_MAX); } /** returns performance counter in seconds */ double getSeconds(); } /* namespace gbe */ #endif /* __GBE_PLATFORM_HPP__ */ Beignet-1.3.2-Source/backend/src/sys/platform.cpp000664 001750 001750 00000004066 13161142102 021001 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #include "sys/platform.hpp" #include "sys/intrinsics.hpp" #include //////////////////////////////////////////////////////////////////////////////// /// Windows Platform //////////////////////////////////////////////////////////////////////////////// #ifdef __WIN32__ #define WIN32_LEAN_AND_MEAN #include namespace gbe { double getSeconds() { LARGE_INTEGER freq, val; QueryPerformanceFrequency(&freq); QueryPerformanceCounter(&val); return (double)val.QuadPart / (double)freq.QuadPart; } void FATAL(const std::string &msg) { std::cerr << msg << std::endl; MessageBox(NULL, msg.c_str(), "Fatal Error", MB_OK | MB_ICONEXCLAMATION); GBE_ASSERT(0); #ifdef __GNUC__ exit(-1); #else _exit(-1); #endif /* __GNUC__ */ } } /* namespace gbe */ #endif /* __WIN32__ */ //////////////////////////////////////////////////////////////////////////////// /// Unix Platform //////////////////////////////////////////////////////////////////////////////// #if defined(__UNIX__) #include #include namespace gbe { double getSeconds() { struct timeval tp; gettimeofday(&tp,NULL); return double(tp.tv_sec) + double(tp.tv_usec)/1E6; } void FATAL(const std::string &msg) { std::cerr << msg << std::endl; GBE_ASSERT(0); _exit(-1); } } /* namespace gbe */ #endif /* __UNIX__ */ Beignet-1.3.2-Source/backend/src/sys/alloc.hpp000664 001750 001750 00000025542 13161142102 020256 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file alloc.hpp * \author Benjamin Segovia */ #ifndef __GBE_ALLOC_HPP__ #define __GBE_ALLOC_HPP__ #include "sys/platform.hpp" #include "sys/assert.hpp" #include #include namespace gbe { /*! regular allocation */ void* memAlloc(size_t size); void memFree(void *ptr); /*! Aligned allocation */ void* alignedMalloc(size_t size, size_t align = 64); void alignedFree(void* ptr); /*! Monitor memory allocations */ #if GBE_DEBUG_MEMORY void* MemDebuggerInsertAlloc(void*, const char*, const char*, int); void MemDebuggerRemoveAlloc(void *ptr); void MemDebuggerDumpAlloc(void); void MemDebuggerInitializeMem(void *mem, size_t sz); void MemDebuggerEnableMemoryInitialization(bool enabled); #else INLINE void* MemDebuggerInsertAlloc(void *ptr, const char*, const char*, int) {return ptr;} INLINE void MemDebuggerRemoveAlloc(void *ptr) {} INLINE void MemDebuggerDumpAlloc(void) {} INLINE void MemDebuggerInitializeMem(void *mem, size_t sz) {} INLINE void MemDebuggerEnableMemoryInitialization(bool enabled) {} #endif /* GBE_DEBUG_MEMORY */ /*! Properly handle the allocated type */ template T* _MemDebuggerInsertAlloc(T *ptr, const char *file, const char *function, int line) { MemDebuggerInsertAlloc(ptr, file, function, line); return ptr; } } /* namespace gbe */ /*! Declare a class with custom allocators */ #define GBE_CLASS(TYPE) \ GBE_STRUCT(TYPE) \ private: /*! 
Declare a structure with custom allocators */
#define GBE_STRUCT(TYPE) \
public: \
  void* operator new(size_t size) { \
    return gbe::alignedMalloc(size, GBE_DEFAULT_ALIGNMENT); \
  } \
  void operator delete(void* ptr) { return gbe::alignedFree(ptr); } \
  void* operator new[](size_t size) { \
    return gbe::alignedMalloc(size, GBE_DEFAULT_ALIGNMENT); \
  } \
  void operator delete[](void* ptr) { return gbe::alignedFree(ptr); } \
  void* operator new(size_t size, void *p) { return p; } \
  void operator delete(void* ptr, void *p) { /*do nothing*/ } \
  void* operator new[](size_t size, void *p) { return p; } \
  void operator delete[](void* ptr, void *p) { /*do nothing*/ }

/*! Macros to handle allocation position */
#define GBE_NEW(T,...) \
  gbe::_MemDebuggerInsertAlloc(new T(__VA_ARGS__), __FILE__, __FUNCTION__, __LINE__)
#define GBE_NEW_NO_ARG(T) \
  gbe::_MemDebuggerInsertAlloc(new T, __FILE__, __FUNCTION__, __LINE__)
#define GBE_NEW_ARRAY(T,N,...) \
  gbe::_MemDebuggerInsertAlloc(new T[N](__VA_ARGS__), __FILE__, __FUNCTION__, __LINE__)
#define GBE_NEW_ARRAY_NO_ARG(T,N) \
  gbe::_MemDebuggerInsertAlloc(new T[N], __FILE__, __FUNCTION__, __LINE__)
#define GBE_NEW_P(T,X,...) \
  gbe::_MemDebuggerInsertAlloc(new (X) T(__VA_ARGS__), __FILE__, __FUNCTION__, __LINE__)
#define GBE_DELETE(X) \
  do { gbe::MemDebuggerRemoveAlloc(X); delete X; } while (0)
#define GBE_DELETE_ARRAY(X) \
  do { gbe::MemDebuggerRemoveAlloc(X); delete[] X; } while (0)
#define GBE_MALLOC(SZ) \
  gbe::MemDebuggerInsertAlloc(gbe::memAlloc(SZ), __FILE__, __FUNCTION__, __LINE__)
#define GBE_FREE(X) \
  do { gbe::MemDebuggerRemoveAlloc(X); gbe::memFree(X); } while (0)
#define GBE_ALIGNED_FREE(X) \
  do { gbe::MemDebuggerRemoveAlloc(X); gbe::alignedFree(X); } while (0)
#define GBE_ALIGNED_MALLOC(SZ,ALIGN) \
  gbe::MemDebuggerInsertAlloc(gbe::alignedMalloc(SZ,ALIGN), __FILE__, __FUNCTION__, __LINE__)

namespace gbe
{
  /*! STL compliant allocator to intercept all memory allocations */
  template <typename T>
  class Allocator : public std::allocator<T>
  {
  public:
    typedef T value_type;
    typedef value_type* pointer;
    typedef const value_type* const_pointer;
    typedef value_type& reference;
    typedef const value_type& const_reference;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;
    typedef typename std::allocator<void>::const_pointer void_allocator_ptr;
    template <typename U>
    struct rebind { typedef Allocator<U> other; };
    INLINE Allocator(void) {}
    INLINE ~Allocator(void) {}
    INLINE Allocator(Allocator const&) {}
    template <typename U>
    INLINE Allocator(Allocator<U> const&) {}
    INLINE pointer address(reference r) { return &r; }
    INLINE const_pointer address(const_reference r) { return &r; }
    INLINE pointer allocate(size_type n, void_allocator_ptr = 0) {
      if (ALIGNOF(T) > sizeof(uintptr_t))
        return (pointer) GBE_ALIGNED_MALLOC(n * sizeof(T), ALIGNOF(T));
      else
        return (pointer) GBE_MALLOC(n * sizeof(T));
    }
    INLINE void deallocate(pointer p, size_type) {
      if (ALIGNOF(T) > sizeof(uintptr_t))
        GBE_ALIGNED_FREE(p);
      else
        GBE_FREE(p);
    }
    INLINE size_type max_size(void) const {
      return std::numeric_limits<size_type>::max() / sizeof(T);
    }
    INLINE void destroy(pointer p) { p->~T(); }
    INLINE bool operator==(Allocator const&) { return true; }
    INLINE bool operator!=(Allocator const& a) { return !operator==(a); }
  };

// Deactivate fast allocators
#ifndef GBE_DEBUG_SPECIAL_ALLOCATOR
#define GBE_DEBUG_SPECIAL_ALLOCATOR 0
#endif
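  /* [Editor's illustration, not part of the original source] The macro pairs
   * above must stay matched: GBE_NEW registers the pointer with the memory
   * debugger and GBE_DELETE unregisters it, so handing GBE_DELETE a pointer
   * that was never registered trips the "Pointer not referenced" fatal in
   * debug builds, while a plain delete leaks the record and shows up in the
   * exit report. Sketch (Node is a hypothetical type):
   *
   *   struct Node { int v; GBE_STRUCT(Node); }; // aligned operator new/delete
   *   Node *n = GBE_NEW(Node);                  // records file/function/line
   *   GBE_DELETE(n);                            // removes the record, then deletes
   */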
  /*! A growing pool never gives memory back to the system; instead it chains
   *  freed elements together so that deallocation can be done quickly */
  template <typename T>
  class GrowingPool
  {
  public:
    GrowingPool(uint32_t elemNum = 1) :
      curr(GBE_NEW(GrowingPoolElem, elemNum <= 1 ? 1 : elemNum)),
      free(NULL), full(NULL), freeList(NULL) {}
    ~GrowingPool(void) {
      GBE_SAFE_DELETE(curr);
      GBE_SAFE_DELETE(free);
      GBE_SAFE_DELETE(full);
    }
    void *allocate(void) {
#if GBE_DEBUG_SPECIAL_ALLOCATOR
      return GBE_ALIGNED_MALLOC(sizeof(T), ALIGNOF(T));
#else
      // Pick up an element from the free list
      if (this->freeList != NULL) {
        void *data = (void*) freeList;
        this->freeList = *(void**) freeList;
        return data;
      }
      // Pick up an element from the current block (if not full)
      if (this->curr->allocated < this->curr->maxElemNum) {
        void *data = (T*) curr->data + curr->allocated++;
        return data;
      }
      // Block is full
      this->curr->next = this->full;
      this->full = this->curr;
      // Try to pick up a free block
      if (this->free)
        this->getFreeBlock();
      // No free block we must allocate a new one
      else
        this->curr = GBE_NEW(GrowingPoolElem, 2 * this->curr->maxElemNum);
      void *data = (T*) curr->data + curr->allocated++;
      return data;
#endif /* GBE_DEBUG_SPECIAL_ALLOCATOR */
    }

    void deallocate(void *t) {
      if (t == NULL) return;
#if GBE_DEBUG_SPECIAL_ALLOCATOR
      GBE_ALIGNED_FREE(t);
#else
      *(void**) t = this->freeList;
      this->freeList = t;
#endif /* GBE_DEBUG_SPECIAL_ALLOCATOR */
    }

    void rewind(void) {
#if GBE_DEBUG_SPECIAL_ALLOCATOR == 0
      // All free elements return to their blocks
      this->freeList = NULL;
      // Put back current block in full list
      if (this->curr) {
        this->curr->next = this->full;
        this->full = this->curr;
        this->curr = NULL;
      }
      // Reverse the chain list and mark all blocks as empty
      while (this->full) {
        GrowingPoolElem *next = this->full->next;
        this->full->allocated = 0;
        this->full->next = this->free;
        this->free = this->full;
        this->full = next;
      }
      // Provide a valid current block
      this->getFreeBlock();
#endif /* GBE_DEBUG_SPECIAL_ALLOCATOR */
    }

  private:
    /*! Pick-up a free block */
    INLINE void getFreeBlock(void) {
      GBE_ASSERT(this->free);
      this->curr = this->free;
      this->free = this->free->next;
      this->curr->next = NULL;
    }
    /*! Chunk of elements to allocate */
    class GrowingPoolElem
    {
      friend class GrowingPool;
      GrowingPoolElem(size_t elemNum) {
        const size_t sz = std::max(sizeof(T), sizeof(void*));
        this->data = (T*) GBE_ALIGNED_MALLOC(elemNum * sz, ALIGNOF(T));
        this->next = NULL;
        this->maxElemNum = elemNum;
        this->allocated = 0;
      }
      ~GrowingPoolElem(void) {
        GBE_ALIGNED_FREE(this->data);
        if (this->next) GBE_DELETE(this->next);
      }
      T *data;
      GrowingPoolElem *next;
      size_t allocated, maxElemNum;
    };
    GrowingPoolElem *curr; //!< To get new element from
    GrowingPoolElem *free; //!< Blocks that can be reused (after rewind)
    GrowingPoolElem *full; //!< Blocks fully used
    void *freeList;        //!< Elements that have been deallocated
    GBE_CLASS(GrowingPool);
  };

/*! Helper macros to build and destroy objects with a growing pool */
#define DECL_POOL(TYPE, POOL) \
  GrowingPool<TYPE> POOL; \
  template <typename... Args> \
  TYPE *new##TYPE(Args&&... args) { \
    return new (POOL.allocate()) TYPE(args...); \
  } \
  void delete##TYPE(TYPE *ptr) { \
    ptr->~TYPE(); \
    POOL.deallocate(ptr); \
  }

  /*! A linear allocator just grows and does not reuse freed memory. It can
   *  however allocate objects of any size */
  class LinearAllocator
  {
  public:
    /*! Initiate the linear allocator (one segment is allocated) */
    LinearAllocator(size_t minSize = CACHE_LINE, size_t maxSize = 64*KB);
    /*! Free up everything */
    ~LinearAllocator(void);
    /*!
Allocate size bytes */ void *allocate(size_t size); /*! Nothing here */ INLINE void deallocate(void *ptr) { #if GBE_DEBUG_SPECIAL_ALLOCATOR if (ptr) GBE_ALIGNED_FREE(ptr); #endif /* GBE_DEBUG_SPECIAL_ALLOCATOR */ } private: /*! Helds an allocated segment of memory */ struct Segment { /*! Allocate a new segment */ Segment(size_t size); /*! Destroy the segment and the next ones */ ~Segment(void); /* Size of the segment */ size_t size; /*! Offset to the next free bytes (if any left) */ size_t offset; /*! Pointer to valid data */ void *data; /*! Pointer to the next segment */ Segment *next; /*! Use internal allocator */ GBE_STRUCT(Segment); }; /*! Points to the current segment we can allocate from */ Segment *curr; /*! Maximum segment size */ size_t maxSize; /*! Use internal allocator */ GBE_CLASS(LinearAllocator); }; } /* namespace gbe */ #endif /* __GBE_ALLOC_HPP__ */ Beignet-1.3.2-Source/backend/src/sys/cvar.hpp000664 001750 001750 00000005320 13161142102 020107 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file cvar.hpp * \author Benjamin Segovia * * Quake like console variable system. Just use the environment variables from * the console to change their value */ #ifndef __GBE_CVAR_HPP__ #define __GBE_CVAR_HPP__ #include "sys/platform.hpp" namespace gbe { /*! A CVar is either a float, an integer or a string value. CVarInit is only * here to set the global variable in pre-main */ class CVarInit { public: enum { STRING = 0, INTEGER = 1, FLOAT = 2 }; /*! Build a CVar from an integer environment variable */ explicit CVarInit(const char *name, int32_t *addr, int32_t imin, int32_t i, int32_t imax); /*! Build a CVar from a float environment variable */ explicit CVarInit(const char *name, float *addr, float fmin, float f, float fmax); /*! Build a CVar from a string environment variable */ explicit CVarInit(const char *name, std::string *str, const std::string &v); int varType; //!< STRING, INTEGER or FLOAT std::string *str; //!< string variable union { struct { int32_t min, *curr, max; } i; //!< integer variables with bounds struct { float min, *curr, max; } f; //!< float variables with bounds }; }; } /* namespace gbe */ /*! Declare an integer console variable */ #define IVAR(NAME, MIN, CURR, MAX) \ int32_t NAME; \ static gbe::CVarInit __CVAR##NAME##__LINE__##__(#NAME, &NAME, int32_t(MIN), int32_t(CURR), int32_t(MAX)); /*! Declare a float console variable */ #define FVAR(NAME, MIN, CURR, MAX) \ float NAME; \ static gbe::CVarInit __CVAR##NAME##__LINE__##__(#NAME, &NAME, float(MIN), float(CURR), float(MAX)); /*! Declare a string console variable */ #define SVAR(NAME, STR) \ std::string NAME; \ static gbe::CVarInit __CVAR##NAME##__LINE__##__(#NAME, &NAME, STR); /*! Declare a Boolean variable (just an integer in {0,1}) */ #define BVAR(NAME, CURR) IVAR(NAME, 0, CURR ? 
1 : 0, 1) #endif /* __GBE_CVAR_HPP__ */ Beignet-1.3.2-Source/backend/src/gbe_bin_interpreter.cpp000664 001750 001750 00000007456 13161142102 022355 0ustar00yryr000000 000000 /* * Copyright © 2014 Intel Corporation * * This library is free software; you can redistribute it and/or modify it * under the terms of the GNU Lesser General Public License as published by the * Free Software Foundation; either version 2.1 of the License, or (at your * option) any later version. * * This library is distributed in the hope that it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License * for more details. * * You should have received a copy of the GNU Lesser General Public License * along with this library. If not, see . * */ #include "sys/alloc.cpp" #include "sys/cvar.cpp" #include "sys/assert.cpp" #include "sys/platform.cpp" #include "ir/constant.cpp" #include "ir/printf.cpp" #include "ir/profiling.cpp" #include "ir/reloc.cpp" #pragma GCC diagnostic ignored "-Wunused-function" #pragma GCC diagnostic ignored "-Wunused-variable" #undef GBE_COMPILER_AVAILABLE #include "backend/program.cpp" #include "backend/gen_program.cpp" #include "ir/sampler.cpp" #include "ir/image.cpp" struct BinInterpCallBackInitializer { BinInterpCallBackInitializer() { gbe_program_new_from_binary = gbe::genProgramNewFromBinary; gbe_program_get_kernel_num = gbe::programGetKernelNum; gbe_program_get_kernel_by_name = gbe::programGetKernelByName; gbe_program_get_kernel = gbe::programGetKernel; gbe_program_get_device_enqueue_kernel_name = gbe::programGetDeviceEnqueueKernelName; gbe_kernel_get_code_size = gbe::kernelGetCodeSize; gbe_kernel_get_code = gbe::kernelGetCode; gbe_kernel_get_arg_num = gbe::kernelGetArgNum; gbe_kernel_get_curbe_size = gbe::kernelGetCurbeSize; gbe_kernel_get_sampler_size = gbe::kernelGetSamplerSize; gbe_kernel_get_compile_wg_size = gbe::kernelGetCompileWorkGroupSize; gbe_kernel_get_stack_size = gbe::kernelGetStackSize; gbe_kernel_get_image_size = gbe::kernelGetImageSize; gbe_kernel_get_name = gbe::kernelGetName; gbe_kernel_get_attributes = gbe::kernelGetAttributes; gbe_kernel_get_arg_type = gbe::kernelGetArgType; gbe_kernel_get_arg_size = gbe::kernelGetArgSize; gbe_kernel_get_arg_bti = gbe::kernelGetArgBTI; gbe_kernel_get_simd_width = gbe::kernelGetSIMDWidth; gbe_kernel_get_scratch_size = gbe::kernelGetScratchSize; gbe_kernel_use_slm = gbe::kernelUseSLM; gbe_kernel_get_required_work_group_size = gbe::kernelGetRequiredWorkGroupSize; gbe_kernel_get_curbe_offset = gbe::kernelGetCurbeOffset; gbe_kernel_get_slm_size = gbe::kernelGetSLMSize; gbe_kernel_get_arg_align = gbe::kernelGetArgAlign; gbe_program_get_global_constant_size = gbe::programGetGlobalConstantSize; gbe_program_delete = gbe::programDelete; gbe_program_get_global_constant_data = gbe::programGetGlobalConstantData; gbe_program_get_global_reloc_count = gbe::programGetGlobalRelocCount; gbe_program_get_global_reloc_table = gbe::programGetGlobalRelocTable; gbe_kernel_get_sampler_data = gbe::kernelGetSamplerData; gbe_kernel_get_image_data = gbe::kernelGetImageData; gbe_kernel_get_ocl_version = gbe::kernelGetOclVersion; gbe_kernel_get_arg_info = gbe::kernelGetArgInfo; gbe_get_profiling_bti = gbe::kernelGetProfilingBTI; gbe_dup_profiling = gbe::kernelDupProfiling; gbe_output_profiling = gbe::kernelOutputProfiling; gbe_get_printf_num = gbe::kernelGetPrintfNum; gbe_get_printf_buf_bti = gbe::kernelGetPrintfBufBTI; gbe_dup_printfset = 
gbe::kernelDupPrintfSet; gbe_release_printf_info = gbe::kernelReleasePrintfSet; gbe_output_printf = gbe::kernelOutputPrintf; gbe_kernel_use_device_enqueue = gbe::kernelUseDeviceEnqueue; } ~BinInterpCallBackInitializer() { } }; static struct BinInterpCallBackInitializer binInterpCB; Beignet-1.3.2-Source/backend/src/backend/000775 001750 001750 00000000000 13174334761 017236 5ustar00yryr000000 000000 Beignet-1.3.2-Source/backend/src/backend/gen_context.cpp000664 001750 001750 00000431074 13173554000 022255 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporatin * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file gen_context.cpp * \author Benjamin Segovia */ #include "backend/gen_context.hpp" #include "backend/gen_program.hpp" #include "backend/gen_defs.hpp" #include "backend/gen_encoder.hpp" #include "backend/gen_insn_selection.hpp" #include "backend/gen_insn_scheduling.hpp" #include "backend/gen_insn_selection_output.hpp" #include "backend/gen_reg_allocation.hpp" #include "backend/gen/gen_mesa_disasm.h" #include "ir/function.hpp" #include "ir/value.hpp" #include "ir/profiling.hpp" #include "sys/cvar.hpp" #include #include #include namespace gbe { /////////////////////////////////////////////////////////////////////////// // GenContext implementation /////////////////////////////////////////////////////////////////////////// GenContext::GenContext(const ir::Unit &unit, const std::string &name, uint32_t deviceID, bool relaxMath) : Context(unit, name), deviceID(deviceID), relaxMath(relaxMath) { this->p = NULL; this->sel = NULL; this->ra = NULL; this->asmFileName = NULL; this->ifEndifFix = false; this->regSpillTick = 0; this->inProfilingMode = false; } GenContext::~GenContext(void) { GBE_DELETE(this->ra); GBE_DELETE(this->sel); GBE_DELETE(this->p); } void GenContext::startNewCG(uint32_t simdWidth, uint32_t reservedSpillRegs, bool limitRegisterPressure) { this->limitRegisterPressure = limitRegisterPressure; this->reservedSpillRegs = reservedSpillRegs; Context::startNewCG(simdWidth); GBE_SAFE_DELETE(ra); GBE_SAFE_DELETE(sel); GBE_SAFE_DELETE(p); this->p = generateEncoder(); this->newSelection(); this->ra = GBE_NEW(GenRegAllocator, *this); this->branchPos2.clear(); this->branchPos3.clear(); this->labelPos.clear(); this->errCode = NO_ERROR; this->regSpillTick = 0; } void GenContext::setASMFileName(const char* asmFname) { this->asmFileName = asmFname; } void GenContext::newSelection(void) { this->sel = GBE_NEW(Selection, *this); } uint32_t GenContext::alignScratchSize(uint32_t size){ uint32_t i = 0; while(i < size) i+=1024; return i; } extern bool OCL_DEBUGINFO; // first defined by calling BVAR in program.cpp #define SET_GENINSN_DBGINFO(I) \ if(OCL_DEBUGINFO) p->DBGInfo = I.DBGInfo; void GenContext::emitInstructionStream(void) { // Emit Gen ISA for (auto &block : *sel->blockList) for (auto &insn : block.insnList) { const uint32_t opcode = insn.opcode; p->push(); // no more virtual 
register here in that part of the code generation GBE_ASSERT(insn.state.physicalFlag); p->curr = insn.state; SET_GENINSN_DBGINFO(insn); switch (opcode) { #define DECL_SELECTION_IR(OPCODE, FAMILY) \ case SEL_OP_##OPCODE: this->emit##FAMILY(insn); break; #include "backend/gen_insn_selection.hxx" #undef DECL_SELECTION_IR } p->pop(); } /* per spec, pad the instruction stream with 8 NOPs so the instruction prefetcher cannot prefetch into an invalid page */ for(int i = 0; i < 8; i++) p->NOP(); } #undef SET_GENINSN_DBGINFO bool GenContext::patchBranches(void) { using namespace ir; for (auto pair : branchPos2) { const LabelIndex label = pair.first; const int32_t insnID = pair.second; const int32_t targetID = labelPos.find(label)->second; p->patchJMPI(insnID, (targetID - insnID), 0); } for (auto pair : branchPos3) { const LabelPair labelPair = pair.first; const int32_t insnID = pair.second; const int32_t jip = labelPos.find(labelPair.l0)->second; const int32_t uip = labelPos.find(labelPair.l1)->second; if (((jip - insnID) > 32767 || (jip - insnID) < -32768) || ((uip - insnID) > 32767 || (uip - insnID) < -32768)) { // The only possible error instruction is if/endif here. errCode = OUT_OF_RANGE_IF_ENDIF; return false; } p->patchJMPI(insnID, jip - insnID, uip - insnID); } return true; } /* Get the proper block ip register according to the current label width. */ GenRegister GenContext::getBlockIP(void) { GenRegister blockip; if (!isDWLabel()) blockip = ra->genReg(GenRegister::uw8grf(ir::ocl::blockip)); else blockip = ra->genReg(GenRegister::ud8grf(ir::ocl::dwblockip)); return blockip; } /* Set current block ip register to a specified constant label value. */ void GenContext::setBlockIP(GenRegister blockip, uint32_t label) { if (!isDWLabel()) p->MOV(blockip, GenRegister::immuw(label)); else p->MOV(blockip, GenRegister::immud(label)); } void GenContext::clearFlagRegister(void) { // when the group size is not aligned to simdWidth, the flag register needs clearing to // make predication (any8/16h) work correctly const GenRegister blockip = getBlockIP(); p->push(); p->curr.noMask = 1; p->curr.predicate = GEN_PREDICATE_NONE; setBlockIP(blockip, getMaxLabel()); p->curr.noMask = 0; setBlockIP(blockip, 0); p->curr.execWidth = 1; if (ra->isAllocated(ir::ocl::zero)) p->MOV(ra->genReg(GenRegister::uw1grf(ir::ocl::zero)), GenRegister::immuw(0)); if (ra->isAllocated(ir::ocl::one)) p->MOV(ra->genReg(GenRegister::uw1grf(ir::ocl::one)), GenRegister::immw(-1)); p->pop(); } void GenContext::loadLaneID(GenRegister dst) { const GenRegister laneID = GenRegister::immv(0x76543210); GenRegister dst_; if (dst.type == GEN_TYPE_UW) dst_ = dst; else if (dst.type == GEN_TYPE_UD) dst_ = GenRegister::retype(dst, GEN_TYPE_UW); p->push(); uint32_t execWidth = p->curr.execWidth; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; if (execWidth == 8) p->MOV(dst_, laneID); else { p->curr.execWidth = 8; p->MOV(dst_, laneID); //Packed Unsigned Half-Byte Integer Vector does not work //have to emulate it by adding 8 to the signed vector const GenRegister eight = GenRegister::immuw(8); p->ADD(GenRegister::offset(dst_, 0, 16), dst_, eight); p->curr.execWidth = 16; } if (dst.type != GEN_TYPE_UW) p->MOV(dst, dst_); p->pop(); } void GenContext::emitStackPointer(void) { using namespace ir; // Only emit stack pointer computation if we use a stack if (kernel->getStackSize() == 0) return; // Check that everything is consistent in the kernel code const uint32_t perLaneSize = kernel->getStackSize(); GBE_ASSERT(perLaneSize > 0); const GenRegister selStackPtr = this->simdWidth == 8 ? 
GenRegister::ud8grf(ir::ocl::stackptr) : GenRegister::ud16grf(ir::ocl::stackptr); const GenRegister stackptr = ra->genReg(selStackPtr); // borrow block ip as a temporary register as we will // initialize block ip later. const GenRegister tmpReg = GenRegister::retype(GenRegister::vec1(getBlockIP()), GEN_TYPE_UW); const GenRegister tmpReg_ud = GenRegister::retype(tmpReg, GEN_TYPE_UD); loadLaneID(stackptr); // We compute the per-lane stack pointer here // threadId * perThreadSize + laneId*perLaneSize or // (threadId * simdWidth + laneId)*perLaneSize // let private address start from zero //p->MOV(stackptr, GenRegister::immud(0)); p->push(); p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->AND(tmpReg, GenRegister::ud1grf(0,5), GenRegister::immuw(0x1ff)); //threadId p->MUL(tmpReg, tmpReg, GenRegister::immuw(this->simdWidth)); //threadId * simdWidth p->curr.execWidth = this->simdWidth; p->ADD(stackptr, GenRegister::unpacked_uw(stackptr), tmpReg); //threadId * simdWidth + laneId, must be < 64K p->curr.execWidth = 1; p->MOV(tmpReg_ud, GenRegister::immud(perLaneSize)); p->curr.execWidth = this->simdWidth; p->MUL(stackptr, tmpReg_ud, stackptr); // (threadId * simdWidth + laneId)*perLaneSize if (fn.getPointerFamily() == ir::FAMILY_QWORD) { const GenRegister selStackPtr2 = this->simdWidth == 8 ? GenRegister::ul8grf(ir::ocl::stackptr) : GenRegister::ul16grf(ir::ocl::stackptr); const GenRegister stackptr2 = ra->genReg(selStackPtr2); int simdWidth = p->curr.execWidth; if (simdWidth == 16) { // we need to do the second quarter first, because the dst type is QW, // while the src is DW. If we did the first quarter first, the 1st // quarter's dst would contain the 2nd quarter's src. p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q2; p->MOV(GenRegister::Qn(stackptr2, 1), GenRegister::Qn(stackptr,1)); } p->curr.quarterControl = GEN_COMPRESSION_Q1; p->MOV(stackptr2, stackptr); } p->pop(); } void GenContext::emitLabelInstruction(const SelectionInstruction &insn) { const ir::LabelIndex label(insn.index); this->labelPos.insert(std::make_pair(label, p->store.size())); } void GenContext::emitUnaryInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src = ra->genReg(insn.src(0)); switch (insn.opcode) { case SEL_OP_MOV: p->MOV(dst, src, insn.extra.function); break; case SEL_OP_READ_ARF: p->MOV(dst, src); break; case SEL_OP_FBH: p->FBH(dst, src); break; case SEL_OP_FBL: p->FBL(dst, src); break; case SEL_OP_CBIT: p->CBIT(dst, src); break; case SEL_OP_LZD: p->LZD(dst, src); break; case SEL_OP_NOT: p->NOT(dst, src); break; case SEL_OP_RNDD: p->RNDD(dst, src); break; case SEL_OP_RNDU: p->RNDU(dst, src); break; case SEL_OP_RNDE: p->RNDE(dst, src); break; case SEL_OP_RNDZ: p->RNDZ(dst, src); break; case SEL_OP_F16TO32: p->F16TO32(dst, src); break; case SEL_OP_F32TO16: p->F32TO16(dst, src); break; case SEL_OP_LOAD_INT64_IMM: p->LOAD_INT64_IMM(dst, src); break; case SEL_OP_BFREV: p->BFREV(dst, src); break; case SEL_OP_CONVI64_TO_I: { p->MOV(dst, src.bottom_half()); break; } case SEL_OP_BRC: { const ir::LabelIndex label0(insn.index), label1(insn.index1); const LabelPair labelPair(label0, label1); const GenRegister src = ra->genReg(insn.src(0)); this->branchPos3.push_back(std::make_pair(labelPair, p->store.size())); p->BRC(src); } break; case SEL_OP_BRD: insertJumpPos(insn); p->BRD(src); break; case SEL_OP_ENDIF: insertJumpPos(insn); p->ENDIF(src); break; case SEL_OP_IF: { const ir::LabelIndex label0(insn.index), label1(insn.index1); const 
LabelPair labelPair(label0, label1); const GenRegister src = ra->genReg(insn.src(0)); this->branchPos3.push_back(std::make_pair(labelPair, p->store.size())); p->IF(src); } break; case SEL_OP_ELSE: { insertJumpPos(insn); /* const ir::LabelIndex label(insn.index), label1(insn.index); const LabelPair labelPair(label, label1); const GenRegister src = ra->genReg(insn.src(0)); this->branchPos3.push_back(std::make_pair(labelPair, p->store.size()));*/ p->ELSE(src); } break; case SEL_OP_WHILE: { /*const ir::LabelIndex label0(insn.index), label1(insn.index1); const LabelPair labelPair(label0, label1); const GenRegister src = ra->genReg(insn.src(0)); this->branchPos3.push_back(std::make_pair(labelPair, p->store.size()));*/ insertJumpPos(insn); p->WHILE(src); } break; default: NOT_IMPLEMENTED; } } void GenContext::emitUnaryWithTempInstruction(const SelectionInstruction &insn) { GenRegister dst = ra->genReg(insn.dst(0)); GenRegister src = ra->genReg(insn.src(0)); GenRegister tmp = ra->genReg(insn.dst(1)); switch (insn.opcode) { case SEL_OP_CONVI_TO_I64: { GenRegister middle = src; if(src.type == GEN_TYPE_B || src.type == GEN_TYPE_W) { middle = tmp; middle.type = GEN_TYPE_D; p->MOV(middle, src); } p->MOV(dst.bottom_half(), middle); if(src.is_signed_int()) p->ASR(dst.top_half(this->simdWidth), middle, GenRegister::immud(31)); else p->MOV(dst.top_half(this->simdWidth), GenRegister::immud(0)); break; } case SEL_OP_BSWAP: { uint32_t simd = p->curr.execWidth; GBE_ASSERT(simd == 8 || simd == 16 || simd == 1); uint16_t new_a0[16]; memset(new_a0, 0, sizeof(new_a0)); GBE_ASSERT(src.type == dst.type); uint32_t start_addr = src.nr*32 + src.subnr; if (simd == 1) { GBE_ASSERT(src.hstride == GEN_HORIZONTAL_STRIDE_0 && dst.hstride == GEN_HORIZONTAL_STRIDE_0); if (src.type == GEN_TYPE_UD || src.type == GEN_TYPE_D) { GBE_ASSERT(start_addr >= 0); new_a0[0] = start_addr + 3; new_a0[1] = start_addr + 2; new_a0[2] = start_addr + 1; new_a0[3] = start_addr; this->setA0Content(new_a0, 0, 4); p->push(); p->curr.execWidth = 4; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister ind_src = GenRegister::to_indirect1xN(GenRegister::retype(src, GEN_TYPE_UB), new_a0[0], 0); GenRegister dst_ = dst; dst_.type = GEN_TYPE_UB; dst_.hstride = GEN_HORIZONTAL_STRIDE_1; dst_.width = GEN_WIDTH_4; dst_.vstride = GEN_VERTICAL_STRIDE_4; p->MOV(dst_, ind_src); p->pop(); } else if (src.type == GEN_TYPE_UW || src.type == GEN_TYPE_W) { p->MOV(GenRegister::retype(dst, GEN_TYPE_UB), GenRegister::retype(GenRegister::offset(src, 0, 1), GEN_TYPE_UB)); p->MOV(GenRegister::retype(GenRegister::offset(dst, 0, 1), GEN_TYPE_UB), GenRegister::retype(src, GEN_TYPE_UB)); } else { GBE_ASSERT(0); } } else { if (src.type == GEN_TYPE_UD || src.type == GEN_TYPE_D) { bool uniform_src = (src.hstride == GEN_HORIZONTAL_STRIDE_0); GBE_ASSERT(uniform_src || src.subnr == 0); GBE_ASSERT(dst.subnr == 0); GBE_ASSERT(tmp.subnr == 0); GBE_ASSERT(start_addr >= 0); new_a0[0] = start_addr + 3; new_a0[1] = start_addr + 2; new_a0[2] = start_addr + 1; new_a0[3] = start_addr; if (!uniform_src) { new_a0[4] = start_addr + 7; new_a0[5] = start_addr + 6; new_a0[6] = start_addr + 5; new_a0[7] = start_addr + 4; } else { new_a0[4] = start_addr + 3; new_a0[5] = start_addr + 2; new_a0[6] = start_addr + 1; new_a0[7] = start_addr; } this->setA0Content(new_a0, 56); p->push(); p->curr.execWidth = 8; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister ind_src = GenRegister::to_indirect1xN(GenRegister::retype(src, GEN_TYPE_UB), new_a0[0], 0); 
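// At this point a0 holds, for each dword of the source, its four byte
// addresses in reverse order (base+3, base+2, base+1, base+0), so the
// register-indirect MOVs below gather a byte-swapped copy into tmp eight
// bytes per MOV; e.g. 0x12345678 reads back as 0x78563412.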
p->MOV(GenRegister::retype(tmp, GEN_TYPE_UB), ind_src); for (int i = 1; i < 4; i++) { if (!uniform_src) ind_src.addr_imm += 8; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 0, 8*i), ind_src); } if (simd == 16) { for (int i = 0; i < 4; i++) { if (!uniform_src) ind_src.addr_imm += 8; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 1, 8*i), ind_src); } } p->pop(); p->MOV(dst, tmp); } else if (src.type == GEN_TYPE_UW || src.type == GEN_TYPE_W) { bool uniform_src = (src.hstride == GEN_HORIZONTAL_STRIDE_0); GBE_ASSERT(uniform_src || src.subnr == 0 || src.subnr == 16); GBE_ASSERT(dst.subnr == 0 || dst.subnr == 16); GBE_ASSERT(tmp.subnr == 0 || tmp.subnr == 16); GBE_ASSERT(start_addr >= 0); new_a0[0] = start_addr + 1; new_a0[1] = start_addr; if (!uniform_src) { new_a0[2] = start_addr + 3; new_a0[3] = start_addr + 2; new_a0[4] = start_addr + 5; new_a0[5] = start_addr + 4; new_a0[6] = start_addr + 7; new_a0[7] = start_addr + 6; } else { new_a0[2] = start_addr + 1; new_a0[3] = start_addr; new_a0[4] = start_addr + 1; new_a0[5] = start_addr; new_a0[6] = start_addr + 1; new_a0[7] = start_addr; } this->setA0Content(new_a0, 56); p->push(); p->curr.execWidth = 8; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister ind_src = GenRegister::to_indirect1xN(GenRegister::retype(src, GEN_TYPE_UB), new_a0[0], 0); p->MOV(GenRegister::retype(tmp, GEN_TYPE_UB), ind_src); for (int i = 1; i < (simd == 8 ? 2 : 4); i++) { if (!uniform_src) ind_src.addr_imm += 8; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 0, 8*i), ind_src); } p->pop(); p->MOV(dst, tmp); }else if (src.type == GEN_TYPE_UL || src.type == GEN_TYPE_L) { bool uniform_src = (src.hstride == GEN_HORIZONTAL_STRIDE_0); GBE_ASSERT(uniform_src || src.subnr == 0); GBE_ASSERT(dst.subnr == 0); GBE_ASSERT(tmp.subnr == 0); GBE_ASSERT(start_addr >= 0); if (!uniform_src) { new_a0[0] = start_addr + 3; new_a0[1] = start_addr + 2; new_a0[2] = start_addr + 1; new_a0[3] = start_addr; new_a0[4] = start_addr + 7; new_a0[5] = start_addr + 6; new_a0[6] = start_addr + 5; new_a0[7] = start_addr + 4; } else { new_a0[0] = start_addr + 7; new_a0[1] = start_addr + 6; new_a0[2] = start_addr + 5; new_a0[3] = start_addr + 4; new_a0[4] = start_addr + 3; new_a0[5] = start_addr + 2; new_a0[6] = start_addr + 1; new_a0[7] = start_addr; } this->setA0Content(new_a0, 56); if (!uniform_src) { p->push(); p->curr.execWidth = 8; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister ind_src = GenRegister::to_indirect1xN(GenRegister::retype(src, GEN_TYPE_UB), new_a0[0], 0); p->MOV(GenRegister::retype(tmp, GEN_TYPE_UB), ind_src); for (int i = 1; i < 4; i++) { if (!uniform_src) ind_src.addr_imm += 8; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 0, 8*i), ind_src); } for (int i = 0; i < 4; i++) { if (!uniform_src) ind_src.addr_imm += 8; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 1, 8*i), ind_src); } if (simd == 16) { for (int i = 0; i < 4; i++) { if (!uniform_src) ind_src.addr_imm += 8; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 2, 8*i), ind_src); } for (int i = 0; i < 4; i++) { if (!uniform_src) ind_src.addr_imm += 8; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 3, 8*i), ind_src); } } p->pop(); p->push(); p->curr.execWidth = 8; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; if (simd == 8) { p->MOV(GenRegister::offset(GenRegister::retype(dst, GEN_TYPE_D), 1, 0), GenRegister::offset(GenRegister::retype(tmp, 
GEN_TYPE_D), 0, 0)); p->MOV(GenRegister::offset(GenRegister::retype(dst, GEN_TYPE_D), 0, 0), GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_D), 1, 0)); }else if(simd == 16) { p->MOV(GenRegister::offset(GenRegister::retype(dst, GEN_TYPE_D), 2, 0), GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_D), 0, 0)); p->MOV(GenRegister::offset(GenRegister::retype(dst, GEN_TYPE_D), 3, 0), GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_D), 1, 0)); p->MOV(GenRegister::offset(GenRegister::retype(dst, GEN_TYPE_D), 0, 0), GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_D), 2, 0)); p->MOV(GenRegister::offset(GenRegister::retype(dst, GEN_TYPE_D), 1, 0), GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_D), 3, 0)); } p->pop(); } else { p->push(); p->curr.execWidth = 8; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister ind_src = GenRegister::to_indirect1xN(GenRegister::retype(src, GEN_TYPE_UB), new_a0[0], 0); p->MOV(GenRegister::retype(tmp, GEN_TYPE_UB), ind_src); p->pop(); p->push(); p->curr.execWidth = 8; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister x = GenRegister::ud1grf(tmp.nr, 0); GenRegister y = GenRegister::ud1grf(tmp.nr, 1); GenRegister dst_ = dst; dst_.type = GEN_TYPE_UD; dst_.hstride = GEN_HORIZONTAL_STRIDE_1; dst_.width = GEN_WIDTH_8; dst_.vstride = GEN_VERTICAL_STRIDE_8; if (simd == 8) { p->MOV(GenRegister::offset(GenRegister::retype(dst_, GEN_TYPE_D), 0, 0), x); p->MOV(GenRegister::offset(GenRegister::retype(dst_, GEN_TYPE_D), 1, 0), y); }else if(simd == 16) { p->MOV(GenRegister::offset(GenRegister::retype(dst_, GEN_TYPE_D), 0, 0), x); p->MOV(GenRegister::offset(GenRegister::retype(dst_, GEN_TYPE_D), 1, 0), x); p->MOV(GenRegister::offset(GenRegister::retype(dst_, GEN_TYPE_D), 2, 0), y); p->MOV(GenRegister::offset(GenRegister::retype(dst_, GEN_TYPE_D), 3, 0), y); } p->pop(); } } else { GBE_ASSERT(0); } } } break; default: NOT_IMPLEMENTED; } } void GenContext::emitBinaryWithTempInstruction(const SelectionInstruction &insn) { GenRegister dst = ra->genReg(insn.dst(0)); GenRegister src0 = ra->genReg(insn.src(0)); GenRegister src1 = ra->genReg(insn.src(1)); GenRegister tmp = ra->genReg(insn.dst(1)); switch (insn.opcode) { case SEL_OP_I64ADD: { tmp = GenRegister::retype(tmp, GEN_TYPE_UL); GenRegister x = tmp.bottom_half(); GenRegister y = tmp.top_half(this->simdWidth); loadBottomHalf(x, src0); loadBottomHalf(y, src1); addWithCarry(x, x, y); storeBottomHalf(dst, x); loadTopHalf(x, src0); p->ADD(x, x, y); loadTopHalf(y, src1); p->ADD(x, x, y); storeTopHalf(dst, x); break; } case SEL_OP_I64SUB: { tmp = GenRegister::retype(tmp, GEN_TYPE_UL); GenRegister x = tmp.bottom_half(); GenRegister y = tmp.top_half(this->simdWidth); loadBottomHalf(x, src0); loadBottomHalf(y, src1); subWithBorrow(x, x, y); storeBottomHalf(dst, x); loadTopHalf(x, src0); subWithBorrow(x, x, y); loadTopHalf(y, src1); subWithBorrow(x, x, y); storeTopHalf(dst, x); break; } case SEL_OP_MUL_HI: { int w = p->curr.execWidth; p->push(); p->curr.execWidth = 8; for (int i = 0; i < w / 8; i ++) { p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->MUL(GenRegister::retype(GenRegister::acc(), GEN_TYPE_UD), src0, GenRegister::h2(GenRegister::retype(src1, GEN_TYPE_UW))); p->curr.accWrEnable = 1; p->MACH(tmp, src0, src1); p->pop(); p->curr.quarterControl = i; p->MOV(dst, tmp); dst = GenRegister::Qn(dst, 1); src0 = GenRegister::Qn(src0, 1); src1 = GenRegister::Qn(src1, 1); } p->pop(); break; } case SEL_OP_HADD: { int w = p->curr.execWidth; p->push(); 
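// hadd(x, y) = (x + y) >> 1 without losing the 33rd bit of the sum: ADDC
// leaves the carry in the accumulator, and OR-ing (carry << 31) into the
// shifted sum restores it, e.g. hadd(0xFFFFFFFF, 3) = 0x80000001.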
p->curr.execWidth = 8; for (int i = 0; i < w / 8; i ++) { p->curr.quarterControl = i; p->ADDC(dst, src0, src1); p->SHR(dst, dst, GenRegister::immud(1)); p->SHL(tmp, GenRegister::retype(GenRegister::acc(), GEN_TYPE_D), GenRegister::immud(31)); p->OR(dst, dst, tmp); dst = GenRegister::Qn(dst, 1); src0 = GenRegister::Qn(src0, 1); src1 = GenRegister::Qn(src1, 1); } p->pop(); break; } case SEL_OP_RHADD: { int w = p->curr.execWidth; p->push(); p->curr.execWidth = 8; for (int i = 0; i < w / 8; i ++) { p->curr.quarterControl = i; p->ADDC(dst, src0, src1); p->ADD(dst, dst, GenRegister::immud(1)); p->SHR(dst, dst, GenRegister::immud(1)); p->SHL(tmp, GenRegister::retype(GenRegister::acc(), GEN_TYPE_D), GenRegister::immud(31)); p->OR(dst, dst, tmp); dst = GenRegister::Qn(dst, 1); src0 = GenRegister::Qn(src0, 1); src1 = GenRegister::Qn(src1, 1); } p->pop(); break; } default: NOT_IMPLEMENTED; } } void GenContext::emitSimdShuffleInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src0 = ra->genReg(insn.src(0)); const GenRegister src1 = ra->genReg(insn.src(1)); assert(insn.opcode == SEL_OP_SIMD_SHUFFLE); assert (src1.file != GEN_IMMEDIATE_VALUE); uint32_t base = src0.nr * 32 + src0.subnr; GenRegister baseReg = GenRegister::immuw(base); const GenRegister a0 = GenRegister::addr8(0); uint32_t simd = p->curr.execWidth; p->push(); if (simd == 8) { p->ADD(a0, GenRegister::unpacked_uw(src1.nr, src1.subnr / typeSize(GEN_TYPE_UW)), baseReg); GenRegister indirect = GenRegister::to_indirect1xN(src0, 0, 0); p->MOV(dst, indirect); } else if (simd == 16) { p->curr.execWidth = 8; p->ADD(a0, GenRegister::unpacked_uw(src1.nr, src1.subnr / typeSize(GEN_TYPE_UW)), baseReg); GenRegister indirect = GenRegister::to_indirect1xN(src0, 0, 0); p->MOV(dst, indirect); p->curr.quarterControl = 1; p->ADD(a0, GenRegister::unpacked_uw(src1.nr+1, src1.subnr / typeSize(GEN_TYPE_UW)), baseReg); p->MOV(GenRegister::offset(dst, 0, 8 * typeSize(src0.type)), indirect); } else NOT_IMPLEMENTED; p->pop(); } void GenContext::emitBinaryInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src0 = ra->genReg(insn.src(0)); const GenRegister src1 = ra->genReg(insn.src(1)); switch (insn.opcode) { case SEL_OP_SEL: p->SEL(dst, src0, src1); break; case SEL_OP_SEL_INT64: { p->SEL(dst.bottom_half(), src0.bottom_half(), src1.bottom_half()); p->SEL(dst.top_half(this->simdWidth), src0.top_half(this->simdWidth), src1.top_half(this->simdWidth)); } break; case SEL_OP_AND: p->AND(dst, src0, src1, insn.extra.function); break; case SEL_OP_OR: p->OR (dst, src0, src1, insn.extra.function); break; case SEL_OP_XOR: p->XOR(dst, src0, src1, insn.extra.function); break; case SEL_OP_I64AND: { p->AND(dst.bottom_half(), src0.bottom_half(), src1.bottom_half()); p->AND(dst.top_half(this->simdWidth), src0.top_half(this->simdWidth), src1.top_half(this->simdWidth)); } break; case SEL_OP_I64OR: { p->OR(dst.bottom_half(), src0.bottom_half(), src1.bottom_half()); p->OR(dst.top_half(this->simdWidth), src0.top_half(this->simdWidth), src1.top_half(this->simdWidth)); } break; case SEL_OP_I64XOR: { p->XOR(dst.bottom_half(), src0.bottom_half(), src1.bottom_half()); p->XOR(dst.top_half(this->simdWidth), src0.top_half(this->simdWidth), src1.top_half(this->simdWidth)); } break; case SEL_OP_SHR: p->SHR(dst, src0, src1); break; case SEL_OP_SHL: p->SHL(dst, src0, src1); break; case SEL_OP_RSR: p->RSR(dst, src0, src1); break; case SEL_OP_RSL: p->RSL(dst, src0, src1); break; 
case SEL_OP_ASR: p->ASR(dst, src0, src1); break; case SEL_OP_ADD: p->ADD(dst, src0, src1); break; case SEL_OP_MUL: p->MUL(dst, src0, src1); break; case SEL_OP_MACH: p->MACH(dst, src0, src1); break; case SEL_OP_UPSAMPLE_LONG: { GenRegister xdst = GenRegister::retype(dst, GEN_TYPE_UL), xsrc0 = GenRegister::retype(src0, GEN_TYPE_UL), xsrc1 = GenRegister::retype(src1, GEN_TYPE_UL); p->MOV(xdst.top_half(this->simdWidth), xsrc0.bottom_half()); p->MOV(xdst.bottom_half(), xsrc1.bottom_half()); } break; default: NOT_IMPLEMENTED; } } void GenContext::collectShifter(GenRegister dest, GenRegister src) { p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->AND(dest, src.bottom_half(), GenRegister::immud(63)); p->pop(); } void GenContext::I64FullAdd(GenRegister high1, GenRegister low1, GenRegister high2, GenRegister low2) { addWithCarry(low1, low1, low2); addWithCarry(high1, high1, high2); p->ADD(high1, high1, low2); } void GenContext::I64FullMult(GenRegister dst1, GenRegister dst2, GenRegister dst3, GenRegister dst4, GenRegister x_high, GenRegister x_low, GenRegister y_high, GenRegister y_low) { GenRegister &e = dst1, &f = dst2, &g = dst3, &h = dst4, &a = x_high, &b = x_low, &c = y_high, &d = y_low; I32FullMult(e, h, b, d); I32FullMult(f, g, a, d); addWithCarry(g, g, e); addWithCarry(f, f, e); I32FullMult(e, d, b, c); I64FullAdd(f, g, e, d); I32FullMult(b, d, a, c); I64FullAdd(e, f, b, d); } void GenContext::I64Neg(GenRegister high, GenRegister low, GenRegister tmp) { p->NOT(high, high); p->NOT(low, low); p->MOV(tmp, GenRegister::immud(1)); addWithCarry(low, low, tmp); p->ADD(high, high, tmp); } void GenContext::I64ABS(GenRegister sign, GenRegister high, GenRegister low, GenRegister tmp, GenRegister flagReg) { p->SHR(sign, high, GenRegister::immud(31)); p->push(); p->curr.noMask = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_NZ, sign, GenRegister::immud(0)); p->curr.predicate = GEN_PREDICATE_NORMAL; I64Neg(high, low, tmp); p->pop(); } void GenContext::emitI64MULHIInstruction(const SelectionInstruction &insn) { GenRegister dest = ra->genReg(insn.dst(0)); GenRegister x = ra->genReg(insn.src(0)); GenRegister y = ra->genReg(insn.src(1)); GenRegister a = ra->genReg(insn.dst(1)); GenRegister b = ra->genReg(insn.dst(2)); GenRegister c = ra->genReg(insn.dst(3)); GenRegister d = ra->genReg(insn.dst(4)); GenRegister e = ra->genReg(insn.dst(5)); GenRegister f = ra->genReg(insn.dst(6)); GenRegister g = ra->genReg(insn.dst(7)); GenRegister h = ra->genReg(insn.dst(8)); GenRegister i = ra->genReg(insn.dst(9)); GBE_ASSERT(insn.state.flag == 0 && insn.state.subFlag == 1); GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); loadTopHalf(a, x); loadBottomHalf(b, x); loadTopHalf(c, y); loadBottomHalf(d, y); if(x.type == GEN_TYPE_UL) { I64FullMult(e, f, g, h, a, b, c, d); } else { I64ABS(e, a, b, i, flagReg); I64ABS(f, c, d, i, flagReg); p->XOR(i, e, f); I64FullMult(e, f, g, h, a, b, c, d); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_NZ, i, GenRegister::immud(0)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->NOT(e, e); p->NOT(f, f); p->NOT(g, g); p->NOT(h, h); p->MOV(i, GenRegister::immud(1)); addWithCarry(h, h, i); addWithCarry(g, g, i); addWithCarry(f, f, i); p->ADD(e, e, i); p->pop(); } storeTopHalf(dest, e); storeBottomHalf(dest, f); } void GenContext::emitI64MADSATInstruction(const 
SelectionInstruction &insn) { GenRegister dest = ra->genReg(insn.dst(0)); GenRegister x = ra->genReg(insn.src(0)); GenRegister y = ra->genReg(insn.src(1)); GenRegister z = ra->genReg(insn.src(2)); GenRegister a = ra->genReg(insn.dst(1)); GenRegister b = ra->genReg(insn.dst(2)); GenRegister c = ra->genReg(insn.dst(3)); GenRegister d = ra->genReg(insn.dst(4)); GenRegister e = ra->genReg(insn.dst(5)); GenRegister f = ra->genReg(insn.dst(6)); GenRegister g = ra->genReg(insn.dst(7)); GenRegister h = ra->genReg(insn.dst(8)); GenRegister i = ra->genReg(insn.dst(9)); GBE_ASSERT(insn.state.flag == 0 && insn.state.subFlag == 1); GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); GenRegister zero = GenRegister::immud(0), one = GenRegister::immud(1); loadTopHalf(a, x); loadBottomHalf(b, x); loadTopHalf(c, y); loadBottomHalf(d, y); if(x.type == GEN_TYPE_UL) { I64FullMult(e, f, g, h, a, b, c, d); loadTopHalf(c, z); loadBottomHalf(d, z); addWithCarry(h, h, d); addWithCarry(g, g, d); addWithCarry(f, f, d); p->ADD(e, e, d); addWithCarry(g, g, c); addWithCarry(f, f, c); p->ADD(e, e, c); p->OR(a, e, f); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_NZ, a, zero); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(g, GenRegister::immd(-1)); p->MOV(h, GenRegister::immd(-1)); p->pop(); } else { I64ABS(e, a, b, i, flagReg); I64ABS(f, c, d, i, flagReg); p->XOR(i, e, f); I64FullMult(e, f, g, h, a, b, c, d); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_NZ, i, zero); p->curr.predicate = GEN_PREDICATE_NORMAL; p->NOT(e, e); p->NOT(f, f); p->NOT(g, g); p->NOT(h, h); p->MOV(i, one); addWithCarry(h, h, i); addWithCarry(g, g, i); addWithCarry(f, f, i); p->ADD(e, e, i); p->pop(); loadTopHalf(c, z); loadBottomHalf(d, z); p->ASR(GenRegister::retype(b, GEN_TYPE_D), GenRegister::retype(c, GEN_TYPE_D), GenRegister::immd(31)); p->MOV(a, b); addWithCarry(h, h, d); addWithCarry(g, g, d); addWithCarry(f, f, d); p->ADD(e, e, d); addWithCarry(g, g, c); addWithCarry(f, f, c); p->ADD(e, e, c); addWithCarry(f, f, b); p->ADD(e, e, b); p->ADD(e, e, a); p->MOV(b, zero); p->push(); p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->CMP(GEN_CONDITIONAL_NZ, e, zero); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(b, one); p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_NZ, f, zero); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(b, one); p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_G, g, GenRegister::immud(0x7FFFFFFF)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(b, one); p->curr.predicate = GEN_PREDICATE_NONE; p->SHR(a, e, GenRegister::immud(31)); p->CMP(GEN_CONDITIONAL_NZ, a, zero); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(b, zero); p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_NZ, b, zero); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(g, GenRegister::immud(0x7FFFFFFF)); p->MOV(h, GenRegister::immud(0xFFFFFFFFu)); p->curr.predicate = GEN_PREDICATE_NONE; p->MOV(b, zero); p->CMP(GEN_CONDITIONAL_NEQ, e, GenRegister::immud(0xFFFFFFFFu)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(b, one); p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_NEQ, f, GenRegister::immud(0xFFFFFFFFu)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(b, one); p->curr.predicate = 
GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_LE, g, GenRegister::immud(0x7FFFFFFF)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(b, one); p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_Z, a, zero); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(b, zero); p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_NZ, b, zero); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(g, GenRegister::immud(0x80000000u)); p->MOV(h, zero); p->pop(); } storeTopHalf(dest, g); storeBottomHalf(dest, h); } void GenContext::emitI64HADDInstruction(const SelectionInstruction &insn) { GenRegister dest = ra->genReg(insn.dst(0)); GenRegister x = ra->genReg(insn.src(0)); GenRegister y = ra->genReg(insn.src(1)); GenRegister a = ra->genReg(insn.dst(1)); GenRegister b = ra->genReg(insn.dst(2)); GenRegister c = ra->genReg(insn.dst(3)); GenRegister d = ra->genReg(insn.dst(4)); a.type = b.type = c.type = d.type = GEN_TYPE_UD; loadBottomHalf(a, x); loadBottomHalf(b, y); loadTopHalf(c, x); loadTopHalf(d, y); addWithCarry(a, a, b); addWithCarry(c, c, b); addWithCarry(c, c, d); p->ADD(b, b, d); p->SHR(a, a, GenRegister::immud(1)); p->SHL(d, c, GenRegister::immud(31)); p->OR(a, a, d); p->SHR(c, c, GenRegister::immud(1)); p->SHL(d, b, GenRegister::immud(31)); p->OR(c, c, d); storeBottomHalf(dest, a); storeTopHalf(dest, c); } void GenContext::emitI64RHADDInstruction(const SelectionInstruction &insn) { GenRegister dest = ra->genReg(insn.dst(0)); GenRegister x = ra->genReg(insn.src(0)); GenRegister y = ra->genReg(insn.src(1)); GenRegister a = ra->genReg(insn.dst(1)); GenRegister b = ra->genReg(insn.dst(2)); GenRegister c = ra->genReg(insn.dst(3)); GenRegister d = ra->genReg(insn.dst(4)); a.type = b.type = c.type = d.type = GEN_TYPE_UD; loadBottomHalf(a, x); loadBottomHalf(b, y); addWithCarry(a, a, b); p->MOV(c, GenRegister::immud(1)); addWithCarry(a, a, c); p->ADD(b, b, c); loadTopHalf(c, x); loadTopHalf(d, y); addWithCarry(c, c, b); addWithCarry(c, c, d); p->ADD(b, b, d); p->SHR(a, a, GenRegister::immud(1)); p->SHL(d, c, GenRegister::immud(31)); p->OR(a, a, d); p->SHR(c, c, GenRegister::immud(1)); p->SHL(d, b, GenRegister::immud(31)); p->OR(c, c, d); storeBottomHalf(dest, a); storeTopHalf(dest, c); } void GenContext::emitI64ShiftInstruction(const SelectionInstruction &insn) { GenRegister dest = ra->genReg(insn.dst(0)); GenRegister x = ra->genReg(insn.src(0)); GenRegister y = ra->genReg(insn.src(1)); GenRegister a = ra->genReg(insn.dst(1)); GenRegister b = ra->genReg(insn.dst(2)); GenRegister c = ra->genReg(insn.dst(3)); GenRegister d = ra->genReg(insn.dst(4)); GenRegister e = ra->genReg(insn.dst(5)); GenRegister f = ra->genReg(insn.dst(6)); a.type = b.type = c.type = d.type = e.type = f.type = GEN_TYPE_UD; GBE_ASSERT(insn.state.flag == 0 && insn.state.subFlag == 1); GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); GenRegister zero = GenRegister::immud(0); switch(insn.opcode) { case SEL_OP_I64SHL: p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; collectShifter(a, y); loadBottomHalf(e, x); loadTopHalf(f, x); p->SHR(b, e, GenRegister::negate(a)); p->SHL(c, e, a); p->SHL(d, f, a); p->OR(e, d, b); setFlag(flagReg, GenRegister::immuw(0xFFFF)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_Z, a, zero); p->SEL(d, d, e); p->curr.predicate = GEN_PREDICATE_NONE; p->AND(a, a, GenRegister::immud(32)); setFlag(flagReg, GenRegister::immuw(0xFFFF)); p->curr.predicate = GEN_PREDICATE_NORMAL; 
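// Second stage of the 64-bit left shift: a & 32 is zero iff the shift
// count stays within one dword. Where it is set, the predicated SELs
// below route low << (a & 31) into the high half and zero into the low half.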
p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_Z, a, zero); p->SEL(d, d, c); p->SEL(c, c, zero); p->pop(); storeBottomHalf(dest, c); storeTopHalf(dest, d); break; case SEL_OP_I64SHR: p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; collectShifter(a, y); loadBottomHalf(e, x); loadTopHalf(f, x); p->SHL(b, f, GenRegister::negate(a)); p->SHR(c, f, a); p->SHR(d, e, a); p->OR(e, d, b); setFlag(flagReg, GenRegister::immuw(0xFFFF)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_Z, a, zero); p->SEL(d, d, e); p->curr.predicate = GEN_PREDICATE_NONE; p->AND(a, a, GenRegister::immud(32)); setFlag(flagReg, GenRegister::immuw(0xFFFF)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_Z, a, zero); p->SEL(d, d, c); p->SEL(c, c, zero); p->pop(); storeBottomHalf(dest, d); storeTopHalf(dest, c); break; case SEL_OP_I64ASR: f.type = GEN_TYPE_D; p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; collectShifter(a, y); loadBottomHalf(e, x); loadTopHalf(f, x); p->SHL(b, f, GenRegister::negate(a)); p->ASR(c, f, a); p->SHR(d, e, a); p->OR(e, d, b); setFlag(flagReg, GenRegister::immuw(0xFFFF)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_Z, a, zero); p->SEL(d, d, e); p->curr.predicate = GEN_PREDICATE_NONE; p->AND(a, a, GenRegister::immud(32)); p->ASR(f, f, GenRegister::immd(31)); setFlag(flagReg, GenRegister::immuw(0xFFFF)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_Z, a, zero); p->SEL(d, d, c); p->SEL(c, c, f); p->pop(); storeBottomHalf(dest, d); storeTopHalf(dest, c); break; default: NOT_IMPLEMENTED; } } void GenContext::setFlag(GenRegister flagReg, GenRegister src) { p->push(); p->curr.noMask = 1; p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->MOV(flagReg, src); p->pop(); } void GenContext::saveFlag(GenRegister dest, int flag, int subFlag) { p->push(); p->curr.execWidth = 1; p->MOV(dest, GenRegister::flag(flag, subFlag)); p->pop(); } void GenContext::UnsignedI64ToFloat(GenRegister dst, GenRegister high, GenRegister low, GenRegister exp, GenRegister mantissa, GenRegister tmp, GenRegister flag) { uint32_t jip0, jip1; GenRegister dst_ud = GenRegister::retype(dst, GEN_TYPE_UD); p->push(); p->curr.noMask = 1; p->MOV(exp, GenRegister::immud(32)); // make sure the flag bit of inactive lanes is 1 when the ALL8H/ALL16H condition is checked later.
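// Software u64 -> float conversion: FBH locates the most significant bit,
// the top 24 bits of the value form the mantissa, the exponent is biased by
// 127 + 32 = 159 (exp is measured within the high dword), and the trailing
// CMP/ADD/AND sequence implements round-to-nearest-even from the bits
// shifted out.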
p->pop(); p->FBH(exp, high); p->ADD(exp, GenRegister::negate(exp), GenRegister::immud(31)); //exp = 32 when high == 0 p->push(); p->curr.useFlag(flag.flag_nr(), flag.flag_subnr()); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->CMP(GEN_CONDITIONAL_EQ, exp, GenRegister::immud(32)); //high == 0 p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.noMask = 0; p->MOV(dst, low); p->push(); if (simdWidth == 8) p->curr.predicate = GEN_PREDICATE_ALIGN1_ALL8H; else if (simdWidth == 16) p->curr.predicate = GEN_PREDICATE_ALIGN1_ALL16H; else NOT_IMPLEMENTED; p->curr.execWidth = 1; p->curr.noMask = 1; jip0 = p->n_instruction(); p->JMPI(GenRegister::immud(0)); p->pop(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->CMP(GEN_CONDITIONAL_G, exp, GenRegister::immud(23)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->CMP(GEN_CONDITIONAL_L, exp, GenRegister::immud(32)); //exp>23 && high!=0 p->ADD(tmp, exp, GenRegister::immud(-23)); p->SHR(mantissa, high, tmp); p->AND(mantissa, mantissa, GenRegister::immud(0x7fffff)); p->SHR(dst_ud, low, tmp); //dst is a temp register here p->ADD(tmp, GenRegister::negate(tmp), GenRegister::immud(32)); p->SHL(high, high, tmp); p->OR(high, high, dst_ud); p->SHL(low, low, tmp); p->push(); if (simdWidth == 8) p->curr.predicate = GEN_PREDICATE_ALIGN1_ALL8H; else if (simdWidth == 16) p->curr.predicate = GEN_PREDICATE_ALIGN1_ALL16H; else NOT_IMPLEMENTED; p->curr.execWidth = 1; p->curr.noMask = 1; jip1 = p->n_instruction(); p->JMPI(GenRegister::immud(0)); p->pop(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->CMP(GEN_CONDITIONAL_EQ, exp, GenRegister::immud(23)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(dst_ud, GenRegister::immud(0)); //exp==23: the SHR count (exp + 9) would be 32, which the HW masks to 0, so force the result to 0 p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_L, exp, GenRegister::immud(23)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->ADD(tmp, exp, GenRegister::immud(9)); p->SHR(dst_ud, low, tmp); //dst is a temp register here p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_LE, exp, GenRegister::immud(23)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->ADD(tmp, GenRegister::negate(exp), GenRegister::immud(23)); p->SHL(mantissa, high, tmp); p->OR(mantissa, mantissa, dst_ud); p->AND(mantissa, mantissa, GenRegister::immud(0x7fffff)); p->SHL(high, low, tmp); p->MOV(low, GenRegister::immud(0)); p->patchJMPI(jip1, (p->n_instruction() - jip1), 0); p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_LE, exp, GenRegister::immud(31)); //update dst where high != 0 p->curr.predicate = GEN_PREDICATE_NORMAL; p->ADD(exp, exp, GenRegister::immud(159)); p->SHL(exp, exp, GenRegister::immud(23)); p->OR(dst_ud, exp, mantissa); p->CMP(GEN_CONDITIONAL_GE, high, GenRegister::immud(0x80000000)); p->ADD(dst_ud, dst_ud, GenRegister::immud(1)); p->CMP(GEN_CONDITIONAL_EQ, high, GenRegister::immud(0x80000000)); p->CMP(GEN_CONDITIONAL_EQ, low, GenRegister::immud(0x0)); p->AND(dst_ud, dst_ud, GenRegister::immud(0xfffffffe)); p->patchJMPI(jip0, (p->n_instruction() - jip0), 0); p->pop(); } void GenContext::emitI64ToFloatInstruction(const SelectionInstruction &insn) { GenRegister src = ra->genReg(insn.src(0)); GenRegister dest = ra->genReg(insn.dst(0)); GenRegister high = ra->genReg(insn.dst(1)); GenRegister low = ra->genReg(insn.dst(2)); GenRegister exp = ra->genReg(insn.dst(3)); GenRegister mantissa = ra->genReg(insn.dst(4)); GenRegister tmp = ra->genReg(insn.dst(5)); GenRegister tmp_high = ra->genReg(insn.dst(6)); GBE_ASSERT(insn.state.flag == 0 && insn.state.subFlag == 1); 
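// Signed conversion reuses the unsigned helper: negate the value when the
// sign bit of the high dword is set, convert, then OR the sign back into
// bit 31 of the float result.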
GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); loadTopHalf(high, src); loadBottomHalf(low, src); if(!src.is_signed_int()) { UnsignedI64ToFloat(dest, high, low, exp, mantissa, tmp, flagReg); } else { p->MOV(tmp_high, high); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_GE, tmp_high, GenRegister::immud(0x80000000)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->NOT(high, high); p->NOT(low, low); p->MOV(tmp, GenRegister::immud(1)); addWithCarry(low, low, tmp); p->ADD(high, high, tmp); p->pop(); UnsignedI64ToFloat(dest, high, low, exp, mantissa, tmp, flagReg); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_GE, tmp_high, GenRegister::immud(0x80000000)); p->curr.predicate = GEN_PREDICATE_NORMAL; dest.type = GEN_TYPE_UD; p->OR(dest, dest, GenRegister::immud(0x80000000)); p->pop(); } } void GenContext::emitFloatToI64Instruction(const SelectionInstruction &insn) { GenRegister src = ra->genReg(insn.src(0)); GenRegister dst = ra->genReg(insn.dst(0)); GenRegister high = ra->genReg(insn.dst(1)); GenRegister tmp = ra->genReg(insn.dst(2)); GBE_ASSERT(insn.state.flag == 0 && insn.state.subFlag == 1); GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); if(dst.is_signed_int()) high = GenRegister::retype(high, GEN_TYPE_D); GenRegister low = GenRegister::retype(tmp, GEN_TYPE_UD); float c = (1.f / 65536.f) * (1.f / 65536.f); p->MUL(tmp, src, GenRegister::immf(c)); p->RNDZ(tmp, tmp); p->MOV(high, tmp); c = 65536.f * 65536.f; p->MOV(tmp, high); //result may not equal to tmp //mov float to int/uint is sat, so must sub high*0xffffffff p->MUL(tmp, tmp, GenRegister::immf(c)); p->ADD(tmp, src, GenRegister::negate(tmp)); p->MOV(low, GenRegister::abs(tmp)); if(dst.is_signed_int()) { p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_L, src, GenRegister::immf(0x0)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->CMP(GEN_CONDITIONAL_NEQ, low, GenRegister::immud(0x0)); p->ADD(high, high, GenRegister::immd(-1)); p->NOT(low, low); p->ADD(low, low, GenRegister::immud(1)); p->pop(); } storeTopHalf(dst, high); storeBottomHalf(dst, low); } void GenContext::emitI64CompareInstruction(const SelectionInstruction &insn) { GenRegister src0 = ra->genReg(insn.src(0)); GenRegister src1 = ra->genReg(insn.src(1)); GenRegister tmp0 = ra->genReg(insn.dst(0)); GenRegister tmp1 = ra->genReg(insn.dst(1)); GenRegister tmp2 = ra->genReg(insn.dst(2)); tmp0.type = (src0.type == GEN_TYPE_L) ? GEN_TYPE_D : GEN_TYPE_UD; tmp1.type = (src1.type == GEN_TYPE_L) ? 
GEN_TYPE_D : GEN_TYPE_UD; int flag = p->curr.flag, subFlag = p->curr.subFlag; GenRegister f1 = GenRegister::retype(tmp2, GEN_TYPE_UW); f1.width = GEN_WIDTH_1; GenRegister f2 = GenRegister::suboffset(f1, 1); GenRegister f3 = GenRegister::suboffset(f1, 2); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; loadTopHalf(tmp0, src0); loadTopHalf(tmp1, src1); switch(insn.extra.function) { case GEN_CONDITIONAL_L: case GEN_CONDITIONAL_LE: case GEN_CONDITIONAL_G: case GEN_CONDITIONAL_GE: { int cmpTopHalf = insn.extra.function; if(insn.extra.function == GEN_CONDITIONAL_LE) cmpTopHalf = GEN_CONDITIONAL_L; if(insn.extra.function == GEN_CONDITIONAL_GE) cmpTopHalf = GEN_CONDITIONAL_G; p->CMP(cmpTopHalf, tmp0, tmp1); } saveFlag(f1, flag, subFlag); p->CMP(GEN_CONDITIONAL_EQ, tmp0, tmp1); saveFlag(f2, flag, subFlag); tmp0.type = tmp1.type = GEN_TYPE_UD; loadBottomHalf(tmp0, src0); loadBottomHalf(tmp1, src1); p->CMP(insn.extra.function, tmp0, tmp1); saveFlag(f3, flag, subFlag); p->push(); p->curr.execWidth = 1; p->AND(f2, f2, f3); p->OR(f1, f1, f2); p->pop(); break; case GEN_CONDITIONAL_EQ: p->CMP(GEN_CONDITIONAL_EQ, tmp0, tmp1); saveFlag(f1, flag, subFlag); tmp0.type = tmp1.type = GEN_TYPE_UD; loadBottomHalf(tmp0, src0); loadBottomHalf(tmp1, src1); p->CMP(GEN_CONDITIONAL_EQ, tmp0, tmp1); saveFlag(f2, flag, subFlag); p->push(); p->curr.execWidth = 1; p->AND(f1, f1, f2); p->pop(); break; case GEN_CONDITIONAL_NEQ: p->CMP(GEN_CONDITIONAL_NEQ, tmp0, tmp1); saveFlag(f1, flag, subFlag); tmp0.type = tmp1.type = GEN_TYPE_UD; loadBottomHalf(tmp0, src0); loadBottomHalf(tmp1, src1); p->CMP(GEN_CONDITIONAL_NEQ, tmp0, tmp1); saveFlag(f2, flag, subFlag); p->push(); p->curr.execWidth = 1; p->OR(f1, f1, f2); p->pop(); break; default: NOT_IMPLEMENTED; } p->curr.execWidth = 1; p->MOV(GenRegister::flag(flag, subFlag), f1); p->pop(); } void GenContext::emitI64SATADDInstruction(const SelectionInstruction &insn) { GenRegister x = ra->genReg(insn.src(0)); GenRegister y = ra->genReg(insn.src(1)); GenRegister dst = ra->genReg(insn.dst(0)); GenRegister a = ra->genReg(insn.dst(1)); GenRegister b = ra->genReg(insn.dst(2)); GenRegister c = ra->genReg(insn.dst(3)); GenRegister d = ra->genReg(insn.dst(4)); GenRegister e = ra->genReg(insn.dst(5)); GBE_ASSERT(insn.state.flag == 0 && insn.state.subFlag == 1); GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); loadTopHalf(a, x); loadBottomHalf(b, x); loadTopHalf(c, y); loadBottomHalf(d, y); if(dst.is_signed_int()) p->SHR(e, a, GenRegister::immud(31)); addWithCarry(b, b, d); addWithCarry(a, a, d); addWithCarry(a, a, c); p->ADD(c, c, d); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); if(! 
dst.is_signed_int()) { p->CMP(GEN_CONDITIONAL_NZ, c, GenRegister::immud(0)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(a, GenRegister::immud(0xFFFFFFFFu)); p->MOV(b, GenRegister::immud(0xFFFFFFFFu)); } else { p->CMP(GEN_CONDITIONAL_EQ, e, GenRegister::immud(1)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->CMP(GEN_CONDITIONAL_L, a, GenRegister::immud(0x80000000u)); p->MOV(a, GenRegister::immud(0x80000000u)); p->MOV(b, GenRegister::immud(0)); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->CMP(GEN_CONDITIONAL_EQ, e, GenRegister::immud(0)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->CMP(GEN_CONDITIONAL_GE, a, GenRegister::immud(0x80000000u)); p->MOV(a, GenRegister::immud(0x7FFFFFFFu)); p->MOV(b, GenRegister::immud(0xFFFFFFFFu)); } p->pop(); storeTopHalf(dst, a); storeBottomHalf(dst, b); } void GenContext::emitI64SATSUBInstruction(const SelectionInstruction &insn) { GenRegister x = ra->genReg(insn.src(0)); GenRegister y = ra->genReg(insn.src(1)); GenRegister dst = ra->genReg(insn.dst(0)); GenRegister a = ra->genReg(insn.dst(1)); GenRegister b = ra->genReg(insn.dst(2)); GenRegister c = ra->genReg(insn.dst(3)); GenRegister d = ra->genReg(insn.dst(4)); GenRegister e = ra->genReg(insn.dst(5)); GBE_ASSERT(insn.state.flag == 0 && insn.state.subFlag == 1); GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); loadTopHalf(a, x); loadBottomHalf(b, x); loadTopHalf(c, y); loadBottomHalf(d, y); if(dst.is_signed_int()) p->SHR(e, a, GenRegister::immud(31)); subWithBorrow(b, b, d); subWithBorrow(a, a, d); subWithBorrow(a, a, c); p->ADD(c, c, d); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); if(! dst.is_signed_int()) { p->CMP(GEN_CONDITIONAL_NZ, c, GenRegister::immud(0)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(a, GenRegister::immud(0)); p->MOV(b, GenRegister::immud(0)); } else { p->CMP(GEN_CONDITIONAL_EQ, e, GenRegister::immud(1)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->CMP(GEN_CONDITIONAL_L, a, GenRegister::immud(0x80000000u)); p->MOV(a, GenRegister::immud(0x80000000u)); p->MOV(b, GenRegister::immud(0)); p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_EQ, e, GenRegister::immud(0)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->CMP(GEN_CONDITIONAL_GE, a, GenRegister::immud(0x80000000u)); p->MOV(a, GenRegister::immud(0x7FFFFFFFu)); p->MOV(b, GenRegister::immud(0xFFFFFFFFu)); } p->pop(); storeTopHalf(dst, a); storeBottomHalf(dst, b); } void GenContext::loadTopHalf(GenRegister dest, GenRegister src) { p->MOV(dest, src.top_half(this->simdWidth)); } void GenContext::storeTopHalf(GenRegister dest, GenRegister src) { p->MOV(dest.top_half(this->simdWidth), src); } void GenContext::loadBottomHalf(GenRegister dest, GenRegister src) { p->MOV(dest, src.bottom_half()); } void GenContext::storeBottomHalf(GenRegister dest, GenRegister src) { p->MOV(dest.bottom_half(), src); } void GenContext::addWithCarry(GenRegister dest, GenRegister src0, GenRegister src1) { int execWidth = p->curr.execWidth; GenRegister acc0 = GenRegister::retype(GenRegister::acc(), GEN_TYPE_D); p->push(); p->curr.execWidth = 8; p->ADDC(dest, src0, src1); p->MOV(src1, acc0); if (execWidth == 16) { p->curr.quarterControl = 1; p->ADDC(GenRegister::suboffset(dest, 8), GenRegister::suboffset(src0, 8), GenRegister::suboffset(src1, 8)); p->MOV(GenRegister::suboffset(src1, 8), acc0); } p->pop(); } void GenContext::subWithBorrow(GenRegister dest, GenRegister src0, GenRegister src1) { int execWidth = 
p->curr.execWidth; GenRegister acc0 = GenRegister::retype(GenRegister::acc(), GEN_TYPE_D); p->push(); p->curr.execWidth = 8; p->SUBB(dest, src0, src1); p->MOV(src1, acc0); if (execWidth == 16) { p->curr.quarterControl = 1; p->SUBB(GenRegister::suboffset(dest, 8), GenRegister::suboffset(src0, 8), GenRegister::suboffset(src1, 8)); p->MOV(GenRegister::suboffset(src1, 8), acc0); } p->pop(); } void GenContext::I32FullMult(GenRegister high, GenRegister low, GenRegister src0, GenRegister src1) { GenRegister acc = GenRegister::retype(GenRegister::acc(), GEN_TYPE_UD); int execWidth = p->curr.execWidth; p->push(); p->curr.execWidth = 8; for(int i = 0; i < execWidth; i += 8) { p->MUL(acc, src0, GenRegister::h2(GenRegister::retype(src1, GEN_TYPE_UW))); p->curr.accWrEnable = 1; p->MACH(high, src0, src1); p->curr.accWrEnable = 0; p->MOV(low, acc); src0 = GenRegister::suboffset(src0, 8); src1 = GenRegister::suboffset(src1, 8); high = GenRegister::suboffset(high, 8); low = GenRegister::suboffset(low, 8); } p->pop(); } void GenContext::emitI64MULInstruction(const SelectionInstruction &insn) { GenRegister dest = ra->genReg(insn.dst(0)); GenRegister x = ra->genReg(insn.src(0)); GenRegister y = ra->genReg(insn.src(1)); GenRegister a = ra->genReg(insn.dst(1)); GenRegister b = ra->genReg(insn.dst(2)); GenRegister c = ra->genReg(insn.dst(3)); GenRegister d = ra->genReg(insn.dst(4)); GenRegister e = ra->genReg(insn.dst(5)); GenRegister f = ra->genReg(insn.dst(6)); a.type = b.type = c.type = d.type = e.type = f.type = GEN_TYPE_UD; loadTopHalf(a, x); loadBottomHalf(b, x); loadTopHalf(c, y); loadBottomHalf(d, y); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; I32FullMult(GenRegister::retype(GenRegister::null(), GEN_TYPE_D), e, b, c); I32FullMult(GenRegister::retype(GenRegister::null(), GEN_TYPE_D), f, a, d); p->ADD(e, e, f); I32FullMult(f, a, b, d); p->ADD(e, e, f); p->pop(); storeTopHalf(dest, e); storeBottomHalf(dest, a); } void GenContext::emitI64DIVREMInstruction(const SelectionInstruction &insn) { GenRegister dest = ra->genReg(insn.dst(0)); GenRegister x = ra->genReg(insn.src(0)); GenRegister y = ra->genReg(insn.src(1)); GenRegister a = ra->genReg(insn.dst(1)); GenRegister b = ra->genReg(insn.dst(2)); GenRegister c = ra->genReg(insn.dst(3)); GenRegister d = ra->genReg(insn.dst(4)); GenRegister e = ra->genReg(insn.dst(5)); GenRegister f = ra->genReg(insn.dst(6)); GenRegister g = ra->genReg(insn.dst(7)); GenRegister h = ra->genReg(insn.dst(8)); GenRegister i = ra->genReg(insn.dst(9)); GenRegister j = ra->genReg(insn.dst(10)); GenRegister k = ra->genReg(insn.dst(11)); GenRegister l = ra->genReg(insn.dst(12)); GenRegister m = ra->genReg(insn.dst(13)); GBE_ASSERT(insn.state.flag == 0 && insn.state.subFlag == 1); GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); GenRegister zero = GenRegister::immud(0), one = GenRegister::immud(1), imm31 = GenRegister::immud(31); uint32_t jip0; // (a,b) <- x loadTopHalf(a, x); loadBottomHalf(b, x); // (c,d) <- y loadTopHalf(c, y); loadBottomHalf(d, y); // k <- sign_of_result if(x.is_signed_int()) { GBE_ASSERT(y.is_signed_int()); GBE_ASSERT(dest.is_signed_int()); I64ABS(k, a, b, e, flagReg); I64ABS(l, c, d, e, flagReg); if(insn.opcode == SEL_OP_I64DIV) p->XOR(k, k, l); } // (e,f) <- 0 p->MOV(e, zero); p->MOV(f, zero); // (g,h) <- 2**63 p->MOV(g, GenRegister::immud(0x80000000)); p->MOV(h, zero); // (i,j) <- 0 p->MOV(i, zero); p->MOV(j, zero); // m <- 0 p->MOV(m, zero); { uint32_t loop_start = p->n_instruction(); // (c,d,e,f) <- 
(c,d,e,f) / 2 p->SHR(f, f, one); p->SHL(l, e, imm31); p->OR(f, f, l); p->SHR(e, e, one); p->SHL(l, d, imm31); p->OR(e, e, l); p->SHR(d, d, one); p->SHL(l, c, imm31); p->OR(d, d, l); p->SHR(c, c, one); // condition <- (c,d)==0 && (a,b)>=(e,f) p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->MOV(l, zero); p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_EQ, a, e); p->curr.predicate = GEN_PREDICATE_NORMAL; p->CMP(GEN_CONDITIONAL_GE, b, f); p->MOV(l, one); p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_G, a, e); p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(l, one); p->curr.predicate = GEN_PREDICATE_NONE; p->CMP(GEN_CONDITIONAL_NEQ, l, zero); p->curr.predicate = GEN_PREDICATE_NORMAL; p->CMP(GEN_CONDITIONAL_EQ, c, zero); p->CMP(GEN_CONDITIONAL_EQ, d, zero); // under condition, (a,b) <- (a,b) - (e,f) p->MOV(l, f); subWithBorrow(b, b, l); subWithBorrow(a, a, l); p->MOV(l, e); subWithBorrow(a, a, l); // under condition, (i,j) <- (i,j) | (g,h) p->OR(i, i, g); p->OR(j, j, h); p->pop(); // (g,h) /= 2 p->SHR(h, h, one); p->SHL(l, g, imm31); p->OR(h, h, l); p->SHR(g, g, one); // condition: m < 64 p->ADD(m, m, one); p->push(); p->curr.noMask = 1; p->curr.execWidth = 1; p->MOV(flagReg, zero); p->pop(); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 0; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_L, m, GenRegister::immud(64)); p->curr.execWidth = 1; p->curr.noMask = 1; // under condition, jump back to start point if (simdWidth == 8) p->curr.predicate = GEN_PREDICATE_ALIGN1_ANY8H; else if (simdWidth == 16) p->curr.predicate = GEN_PREDICATE_ALIGN1_ANY16H; else NOT_IMPLEMENTED; int distance = -(int)(p->n_instruction() - loop_start ); p->curr.noMask = 1; jip0 = p->n_instruction(); p->JMPI(zero); p->patchJMPI(jip0, distance, 0); p->pop(); // end of loop } // adjust sign of result if(x.is_signed_int()) { p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_NEQ, k, zero); p->curr.predicate = GEN_PREDICATE_NORMAL; if(insn.opcode == SEL_OP_I64DIV) I64Neg(i, j, l); else I64Neg(a, b, l); p->pop(); } // write dest if(insn.opcode == SEL_OP_I64DIV) { storeTopHalf(dest, i); storeBottomHalf(dest, j); } else { GBE_ASSERT(insn.opcode == SEL_OP_I64REM); storeTopHalf(dest, a); storeBottomHalf(dest, b); } } void GenContext::emitF64DIVInstruction(const SelectionInstruction &insn) { GBE_ASSERT(0); // No support for double on Gen7 } void GenContext::emitTernaryInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src0 = ra->genReg(insn.src(0)); const GenRegister src1 = ra->genReg(insn.src(1)); const GenRegister src2 = ra->genReg(insn.src(2)); switch (insn.opcode) { case SEL_OP_MAD: p->MAD(dst, src0, src1, src2); break; case SEL_OP_LRP: p->LRP(dst, src0, src1, src2); break; default: NOT_IMPLEMENTED; } } void GenContext::emitNoOpInstruction(const SelectionInstruction &insn) { p->NOP(); } void GenContext::emitWaitInstruction(const SelectionInstruction &insn) { p->WAIT(insn.extra.waitType); } void GenContext::emitBarrierInstruction(const SelectionInstruction &insn) { const GenRegister src = ra->genReg(insn.src(0)); const GenRegister fenceDst = ra->genReg(insn.dst(0)); uint32_t barrierType = insn.extra.barrierType; const GenRegister barrierId = ra->genReg(GenRegister::ud1grf(ir::ocl::barrierid)); bool imageFence = barrierType & 
ir::SYNC_IMAGE_FENCE; if (barrierType & ir::SYNC_GLOBAL_READ_FENCE) { p->FENCE(fenceDst, imageFence); p->MOV(fenceDst, fenceDst); } p->push(); // As only the payload.2 is used and all the other regions are ignored // SIMD8 mode here is safe. p->curr.execWidth = 8; p->curr.physicalFlag = 0; p->curr.noMask = 1; // Copy barrier id from r0. p->AND(src, barrierId, GenRegister::immud(0x0f000000)); // A barrier is OK to start the thread synchronization *and* SLM fence p->BARRIER(src); p->curr.execWidth = 1; // Now we wait for the other threads p->curr.predicate = GEN_PREDICATE_NONE; p->WAIT(); p->pop(); if (imageFence) { p->FLUSH_SAMPLERCACHE(fenceDst); p->MOV(fenceDst, fenceDst); } } void GenContext::emitFenceInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); p->FENCE(dst, false); p->MOV(dst, dst); } void GenContext::emitMathInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src0 = ra->genReg(insn.src(0)); const uint32_t function = insn.extra.function; if (insn.srcNum == 2) { const GenRegister src1 = ra->genReg(insn.src(1)); p->MATH(dst, function, src0, src1); } else p->MATH(dst, function, src0); } void GenContext::emitCompareInstruction(const SelectionInstruction &insn) { const GenRegister src0 = ra->genReg(insn.src(0)); const GenRegister src1 = ra->genReg(insn.src(1)); const GenRegister dst = ra->genReg(insn.dst(0)); if (insn.opcode == SEL_OP_CMP) p->CMP(insn.extra.function, src0, src1, dst); else { GBE_ASSERT(insn.opcode == SEL_OP_SEL_CMP); const GenRegister dst = ra->genReg(insn.dst(0)); p->SEL_CMP(insn.extra.function, dst, src0, src1); } } void GenContext::emitAtomicInstruction(const SelectionInstruction &insn) { const GenRegister addr = ra->genReg(insn.src(0)); const GenRegister dst = ra->genReg(insn.dst(0)); const uint32_t function = insn.extra.function; unsigned srcNum = insn.extra.elem; GenRegister data = addr; if (srcNum > 1) data = ra->genReg(insn.src(1)); const GenRegister bti = ra->genReg(insn.src(srcNum)); if (bti.file == GEN_IMMEDIATE_VALUE) { p->ATOMIC(dst, function, addr, data, bti, srcNum, insn.extra.splitSend); } else { GenRegister flagTemp = ra->genReg(insn.dst(1)); GenRegister btiTmp = ra->genReg(insn.dst(2)); unsigned desc = 0; if (insn.extra.splitSend) desc = p->generateAtomicMessageDesc(function, 0, 1); else desc = p->generateAtomicMessageDesc(function, 0, srcNum); unsigned jip0 = beforeMessage(insn, bti, flagTemp, btiTmp, desc); p->push(); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->ATOMIC(dst, function, addr, data, GenRegister::addr1(0), srcNum, insn.extra.splitSend); p->pop(); afterMessage(insn, bti, flagTemp, btiTmp, jip0); } } void GenContext::emitIndirectMoveInstruction(const SelectionInstruction &insn) { GenRegister baseReg = ra->genReg(insn.src(0)); GenRegister offset = ra->genReg(insn.src(1)); uint32_t immoffset = insn.extra.indirect_offset; const GenRegister dst = ra->genReg(insn.dst(0)); GenRegister tmp = ra->genReg(insn.dst(1)); const GenRegister a0 = GenRegister::addr8(0); uint32_t simdWidth = p->curr.execWidth; GenRegister indirect_src; if(sel->isScalarReg(offset.reg())) offset = GenRegister::retype(offset, GEN_TYPE_UW); else offset = GenRegister::unpacked_uw(offset); uint32_t baseRegOffset = GenRegister::grfOffset(baseReg); //There is a restrict that: lower 5 bits indirect reg SubRegNum and //the lower 5 bits of indirect imm SubRegNum cannot exceed 5 bits. 
//So can't use AddrImm field, need a add. p->ADD(tmp, offset, GenRegister::immuw(baseRegOffset + immoffset)); indirect_src = GenRegister::indirect(dst.type, 0, GEN_WIDTH_1, GEN_VERTICAL_STRIDE_ONE_DIMENSIONAL, GEN_HORIZONTAL_STRIDE_0); if (sel->isScalarReg(dst.reg())) { p->push(); p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->MOV(a0, tmp); p->MOV(dst, indirect_src); p->pop(); } else { p->push(); p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q1; p->MOV(a0, tmp); p->MOV(dst, indirect_src); p->pop(); if (simdWidth == 16) { p->push(); p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q2; const GenRegister nextDst = GenRegister::Qn(dst, 1); const GenRegister nextOffset = GenRegister::Qn(tmp, 1); p->MOV(a0, nextOffset); p->MOV(nextDst, indirect_src); p->pop(); } } } void GenContext::insertJumpPos(const SelectionInstruction &insn) { const ir::LabelIndex label(insn.index); this->branchPos2.push_back(std::make_pair(label, p->store.size())); } void GenContext::emitJumpInstruction(const SelectionInstruction &insn) { insertJumpPos(insn); const GenRegister src = ra->genReg(insn.src(0)); p->JMPI(src, insn.extra.longjmp); } void GenContext::emitEotInstruction(const SelectionInstruction &insn) { p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->MOV(GenRegister::ud8grf(112, 0), GenRegister::ud8grf(0, 0)); p->curr.execWidth = 8; p->EOT(112); p->pop(); } void GenContext::emitSpillRegInstruction(const SelectionInstruction &insn) { uint32_t simdWidth = p->curr.execWidth; uint32_t scratchOffset = insn.extra.scratchOffset; const uint32_t header = insn.extra.scratchMsgHeader; p->push(); const GenRegister msg = GenRegister::ud8grf(header, 0); const GenRegister src = ra->genReg(insn.src(0)); GenRegister payload = src; payload.nr = header + 1; payload.subnr = 0; GBE_ASSERT(src.subnr == 0); uint32_t regType = insn.src(0).type; uint32_t size = typeSize(regType); uint32_t regSize = stride(src.hstride)*size; GBE_ASSERT(regSize == 4 || regSize == 8); if(regSize == 4) { if (payload.nr != src.nr) p->MOV(payload, src); uint32_t regNum = (regSize*simdWidth) > 32 ? 2 : 1; this->scratchWrite(msg, scratchOffset, regNum, GEN_TYPE_UD, GEN_SCRATCH_CHANNEL_MODE_DWORD); } else { //size == 8 payload.type = GEN_TYPE_UD; GBE_ASSERT(payload.hstride == GEN_HORIZONTAL_STRIDE_1); loadBottomHalf(payload, src.isdf()? GenRegister::retype(src, GEN_TYPE_UL) : src ); uint32_t regNum = (regSize/2*simdWidth) > 32 ? 2 : 1; this->scratchWrite(msg, scratchOffset, regNum, GEN_TYPE_UD, GEN_SCRATCH_CHANNEL_MODE_DWORD); loadTopHalf(payload, src.isdf() ? GenRegister::retype(src, GEN_TYPE_UL) : src); this->scratchWrite(msg, scratchOffset + 4*simdWidth, regNum, GEN_TYPE_UD, GEN_SCRATCH_CHANNEL_MODE_DWORD); } p->pop(); } void GenContext::emitUnSpillRegInstruction(const SelectionInstruction &insn) { uint32_t scratchOffset = insn.extra.scratchOffset; const GenRegister dst = insn.dst(0); uint32_t regType = dst.type; uint32_t simdWidth = p->curr.execWidth; const uint32_t header = insn.extra.scratchMsgHeader; uint32_t size = typeSize(regType); uint32_t regSize = stride(dst.hstride)*size; const GenRegister msg = GenRegister::ud8grf(header, 0); GenRegister payload = msg; payload.nr = header + 1; p->push(); assert(regSize == 4 || regSize == 8); if(regSize == 4) { uint32_t regNum = (regSize*simdWidth) > 32 ? 
2 : 1; this->scratchRead(GenRegister::ud8grf(dst.nr, dst.subnr), msg, scratchOffset, regNum, GEN_TYPE_UD, GEN_SCRATCH_CHANNEL_MODE_DWORD); } else { uint32_t regNum = (regSize/2*simdWidth) > 32 ? 2 : 1; this->scratchRead(payload, msg, scratchOffset, regNum, GEN_TYPE_UD, GEN_SCRATCH_CHANNEL_MODE_DWORD); storeBottomHalf(GenRegister::ul8grf(dst.nr, dst.subnr), payload); this->scratchRead(payload, msg, scratchOffset + 4*simdWidth, regNum, GEN_TYPE_UD, GEN_SCRATCH_CHANNEL_MODE_DWORD); storeTopHalf(GenRegister::ul8grf(dst.nr, dst.subnr), payload); } p->pop(); } void GenContext::emitRead64Instruction(const SelectionInstruction &insn) { const uint32_t elemNum = insn.extra.elem * 2; const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src = ra->genReg(insn.src(0)); const GenRegister bti = ra->genReg(insn.src(1)); if (bti.file == GEN_IMMEDIATE_VALUE) { p->UNTYPED_READ(dst, src, bti, elemNum); } else { const GenRegister tmp = ra->genReg(insn.dst(insn.extra.elem)); const GenRegister btiTmp = ra->genReg(insn.dst(insn.extra.elem + 1)); unsigned desc = p->generateUntypedReadMessageDesc(0, elemNum); unsigned jip0 = beforeMessage(insn, bti, tmp, btiTmp, desc); //predicated load p->push(); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->UNTYPED_READ(dst, src, GenRegister::retype(GenRegister::addr1(0), GEN_TYPE_UD), elemNum); p->pop(); afterMessage(insn, bti, tmp, btiTmp, jip0); } } unsigned GenContext::beforeMessage(const SelectionInstruction &insn, GenRegister bti, GenRegister tmp, GenRegister btiTmp, unsigned desc) { const GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); setFlag(flagReg, GenRegister::immuw(0)); p->CMP(GEN_CONDITIONAL_NZ, flagReg, GenRegister::immuw(1)); GenRegister btiUD = GenRegister::retype(btiTmp, GEN_TYPE_UD); GenRegister btiUW = GenRegister::retype(btiTmp, GEN_TYPE_UW); GenRegister btiUB = GenRegister::retype(btiTmp, GEN_TYPE_UB); unsigned jip0 = p->n_instruction(); p->push(); p->curr.execWidth = 1; p->curr.noMask = 1; p->AND(btiUD, flagReg, GenRegister::immud(0xffffffff)); p->LZD(btiUD, btiUD); p->ADD(btiUW, GenRegister::negate(btiUW), GenRegister::immuw(0x1f)); p->MUL(btiUW, btiUW, GenRegister::immuw(0x4)); p->ADD(GenRegister::addr1(0), btiUW, GenRegister::immud(bti.nr*32)); p->MOV(btiUD, GenRegister::indirect(GEN_TYPE_UD, 0, GEN_WIDTH_1, GEN_VERTICAL_STRIDE_ONE_DIMENSIONAL, GEN_HORIZONTAL_STRIDE_0)); //save flag p->MOV(tmp, flagReg); p->pop(); p->CMP(GEN_CONDITIONAL_Z, bti, btiUD); p->push(); p->curr.execWidth = 1; p->curr.noMask = 1; p->OR(GenRegister::retype(GenRegister::addr1(0), GEN_TYPE_UD), btiUB, GenRegister::immud(desc)); p->pop(); return jip0; } void GenContext::afterMessage(const SelectionInstruction &insn, GenRegister bti, GenRegister tmp, GenRegister btiTmp, unsigned jip0) { const GenRegister btiUD = GenRegister::retype(btiTmp, GEN_TYPE_UD); //restore flag setFlag(GenRegister::flag(insn.state.flag, insn.state.subFlag), tmp); // get active channel p->push(); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->CMP(GEN_CONDITIONAL_NZ, bti, btiUD); unsigned jip1 = p->n_instruction(); p->WHILE(GenRegister::immud(0)); p->pop(); p->patchJMPI(jip1, jip0 - jip1, 0); } void GenContext::emitUntypedReadInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src = ra->genReg(insn.src(0)); const GenRegister bti = ra->genReg(insn.src(1)); const uint32_t elemNum = insn.extra.elem; if 
(bti.file == GEN_IMMEDIATE_VALUE) { p->UNTYPED_READ(dst, src, bti, elemNum); } else { const GenRegister tmp = ra->genReg(insn.dst(elemNum)); const GenRegister btiTmp = ra->genReg(insn.dst(elemNum + 1)); unsigned desc = p->generateUntypedReadMessageDesc(0, elemNum); unsigned jip0 = beforeMessage(insn, bti, tmp, btiTmp, desc); //predicated load p->push(); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->UNTYPED_READ(dst, src, GenRegister::retype(GenRegister::addr1(0), GEN_TYPE_UD), elemNum); p->pop(); afterMessage(insn, bti, tmp, btiTmp, jip0); } } void GenContext::emitWrite64Instruction(const SelectionInstruction &insn) { const GenRegister src = ra->genReg(insn.dst(0)); const uint32_t elemNum = insn.extra.elem; const GenRegister bti = ra->genReg(insn.src(elemNum+1)); if (bti.file == GEN_IMMEDIATE_VALUE) { p->UNTYPED_WRITE(src, src, bti, elemNum*2, false); } else { const GenRegister tmp = ra->genReg(insn.dst(0)); const GenRegister btiTmp = ra->genReg(insn.dst(1)); unsigned desc = p->generateUntypedWriteMessageDesc(0, elemNum*2); unsigned jip0 = beforeMessage(insn, bti, tmp, btiTmp, desc); //predicated load p->push(); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->UNTYPED_WRITE(src, src, GenRegister::addr1(0), elemNum*2, false); p->pop(); afterMessage(insn, bti, tmp, btiTmp, jip0); } } void GenContext::emitUntypedWriteInstruction(const SelectionInstruction &insn) { const GenRegister addr = ra->genReg(insn.src(0)); GenRegister data = ra->genReg(insn.src(1)); const uint32_t elemNum = insn.extra.elem; const GenRegister bti = ra->genReg(insn.src(elemNum+1)); if (bti.file == GEN_IMMEDIATE_VALUE) { p->UNTYPED_WRITE(addr, data, bti, elemNum, insn.extra.splitSend); } else { const GenRegister tmp = ra->genReg(insn.dst(0)); const GenRegister btiTmp = ra->genReg(insn.dst(1)); unsigned desc = 0; if (insn.extra.splitSend) desc = p->generateUntypedWriteSendsMessageDesc(0, elemNum); else desc = p->generateUntypedWriteMessageDesc(0, elemNum); unsigned jip0 = beforeMessage(insn, bti, tmp, btiTmp, desc); //predicated load p->push(); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->UNTYPED_WRITE(addr, data, GenRegister::addr1(0), elemNum, insn.extra.splitSend); p->pop(); afterMessage(insn, bti, tmp, btiTmp, jip0); } } void GenContext::emitByteGatherInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src = ra->genReg(insn.src(0)); const GenRegister bti = ra->genReg(insn.src(1)); const uint32_t elemSize = insn.extra.elem; if (bti.file == GEN_IMMEDIATE_VALUE) { p->BYTE_GATHER(dst, src, bti, elemSize); } else { const GenRegister tmp = ra->genReg(insn.dst(1)); const GenRegister btiTmp = ra->genReg(insn.dst(2)); unsigned desc = p->generateByteGatherMessageDesc(0, elemSize); unsigned jip0 = beforeMessage(insn, bti, tmp, btiTmp, desc); //predicated load p->push(); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->BYTE_GATHER(dst, src, GenRegister::addr1(0), elemSize); p->pop(); afterMessage(insn, bti, tmp, btiTmp, jip0); } } void GenContext::emitByteScatterInstruction(const SelectionInstruction &insn) { const GenRegister addr = ra->genReg(insn.src(0)); GenRegister data = ra->genReg(insn.src(1)); const uint32_t elemSize = insn.extra.elem; const GenRegister bti = ra->genReg(insn.src(2)); if (bti.file == GEN_IMMEDIATE_VALUE) { p->BYTE_SCATTER(addr, data, bti, 
elemSize, insn.extra.splitSend); } else { const GenRegister tmp = ra->genReg(insn.dst(0)); const GenRegister btiTmp = ra->genReg(insn.dst(1)); unsigned desc = 0; if (insn.extra.splitSend) desc = p->generateByteScatterSendsMessageDesc(0, elemSize); else desc = p->generateByteScatterMessageDesc(0, elemSize); unsigned jip0 = beforeMessage(insn, bti, tmp, btiTmp, desc); //predicated load p->push(); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->BYTE_SCATTER(addr, data, GenRegister::addr1(0), elemSize, insn.extra.splitSend); p->pop(); afterMessage(insn, bti, tmp, btiTmp, jip0); } } void GenContext::emitUntypedReadA64Instruction(const SelectionInstruction &insn) { assert(0); } void GenContext::emitUntypedWriteA64Instruction(const SelectionInstruction &insn) { assert(0); } void GenContext::emitByteGatherA64Instruction(const SelectionInstruction &insn) { assert(0); } void GenContext::emitByteScatterA64Instruction(const SelectionInstruction &insn) { assert(0); } void GenContext::emitRead64A64Instruction(const SelectionInstruction &insn) { assert(0); } void GenContext::emitWrite64A64Instruction(const SelectionInstruction &insn) { assert(0); } void GenContext::emitAtomicA64Instruction(const SelectionInstruction &insn) { assert(0); } void GenContext::emitUnpackByteInstruction(const SelectionInstruction &insn) { const GenRegister src = ra->genReg(insn.src(0)); for(uint32_t i = 0; i < insn.dstNum; i++) { p->MOV(ra->genReg(insn.dst(i)), GenRegister::splitReg(src, insn.extra.elem, i)); } } void GenContext::emitPackByteInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); p->push(); if(simdWidth == 8) { for(uint32_t i = 0; i < insn.srcNum; i++) p->MOV(GenRegister::splitReg(dst, insn.extra.elem, i), ra->genReg(insn.src(i))); } else { // when destination expands two registers, the source must span two registers. 
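// (For instance, at SIMD16 the destination's lanes 0-7 live in its first
// GRF and lanes 8-15 in the next one, so the Q1/Q2 pair below moves each
// half at execWidth 8 and no single MOV crosses the register boundary;
// GenRegister::Qn(reg, 1) selects the second-quarter view.)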
p->curr.execWidth = 8; for(uint32_t i = 0; i < insn.srcNum; i++) { GenRegister dsti = GenRegister::splitReg(dst, insn.extra.elem, i); GenRegister src = ra->genReg(insn.src(i)); p->curr.quarterControl = 0; p->MOV(dsti, src); p->curr.quarterControl = 1; p->MOV(GenRegister::Qn(dsti,1), GenRegister::Qn(src, 1)); } } p->pop(); } void GenContext::emitUnpackLongInstruction(const SelectionInstruction &insn) { GBE_ASSERT(0); } void GenContext::emitPackLongInstruction(const SelectionInstruction &insn) { GBE_ASSERT(0); } void GenContext::emitDWordGatherInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src = ra->genReg(insn.src(0)); const uint32_t bti = insn.getbti(); p->DWORD_GATHER(dst, src, bti); } void GenContext::emitSampleInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister msgPayload = GenRegister::retype(ra->genReg(insn.src(0)), GEN_TYPE_F); const unsigned char bti = insn.getbti(); const unsigned char sampler = insn.extra.sampler; const unsigned int msgLen = insn.extra.rdmsglen; uint32_t simdWidth = p->curr.execWidth; p->SAMPLE(dst, msgPayload, msgLen, false, bti, sampler, simdWidth, -1, 0, insn.extra.isLD, insn.extra.isUniform); } void GenContext::emitVmeInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const unsigned int msg_type = insn.extra.msg_type; GBE_ASSERT(msg_type == 1); int rsp_len; if(msg_type == 1) rsp_len = 6; uint32_t execWidth_org = p->curr.execWidth; p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.execWidth = 1; /* Use MOV to Setup bits of payload: mov payload value stored in insn.src(x) to * 5 consecutive payload grf. * In simd8 mode, one virtual grf register map to one physical grf register. But * in simd16 mode, one virtual grf register map to two physical grf registers. * So we should treat them differently. 
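 * For example (register numbers are only illustrative): if the payload
 * virtual reg sits at r20, the SIMD8 path writes its 8 dwords at subnr
 * 28, 24, ..., 0 of r20, while the SIMD16 path covers the physical pair
 * r20/r21 through the payload_grf.nr += k step below.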
* */ if(execWidth_org == 8){ for(int i=0; i < 5; i++){ GenRegister payload_grf = ra->genReg(insn.dst(rsp_len+i)); payload_grf.vstride = GEN_VERTICAL_STRIDE_0; payload_grf.width = GEN_WIDTH_1; payload_grf.hstride = GEN_HORIZONTAL_STRIDE_0; payload_grf.subphysical = 1; for(int j=0; j < 8; j++){ payload_grf.subnr = (7 - j) * typeSize(GEN_TYPE_UD); GenRegister payload_val = ra->genReg(insn.src(i*8+j)); payload_val.vstride = GEN_VERTICAL_STRIDE_0; payload_val.width = GEN_WIDTH_1; payload_val.hstride = GEN_HORIZONTAL_STRIDE_0; p->MOV(payload_grf, payload_val); } } } else if(execWidth_org == 16){ for(int i=0; i < 2; i++){ for(int k = 0; k < 2; k++){ GenRegister payload_grf = ra->genReg(insn.dst(rsp_len+i)); payload_grf.nr += k; payload_grf.vstride = GEN_VERTICAL_STRIDE_0; payload_grf.width = GEN_WIDTH_1; payload_grf.hstride = GEN_HORIZONTAL_STRIDE_0; payload_grf.subphysical = 1; for(int j=0; j < 8; j++){ payload_grf.subnr = (7 - j) * typeSize(GEN_TYPE_UD); GenRegister payload_val = ra->genReg(insn.src(i*16+k*8+j)); payload_val.vstride = GEN_VERTICAL_STRIDE_0; payload_val.width = GEN_WIDTH_1; payload_val.hstride = GEN_HORIZONTAL_STRIDE_0; p->MOV(payload_grf, payload_val); } } } { int i = 2; GenRegister payload_grf = ra->genReg(insn.dst(rsp_len+i)); payload_grf.vstride = GEN_VERTICAL_STRIDE_0; payload_grf.width = GEN_WIDTH_1; payload_grf.hstride = GEN_HORIZONTAL_STRIDE_0; payload_grf.subphysical = 1; for(int j=0; j < 8; j++){ payload_grf.subnr = (7 - j) * typeSize(GEN_TYPE_UD); GenRegister payload_val = ra->genReg(insn.src(i*16+j)); payload_val.vstride = GEN_VERTICAL_STRIDE_0; payload_val.width = GEN_WIDTH_1; payload_val.hstride = GEN_HORIZONTAL_STRIDE_0; p->MOV(payload_grf, payload_val); } } } p->pop(); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.execWidth = 1; GenRegister payload_did = GenRegister::retype(ra->genReg(insn.dst(rsp_len)), GEN_TYPE_UB); payload_did.vstride = GEN_VERTICAL_STRIDE_0; payload_did.width = GEN_WIDTH_1; payload_did.hstride = GEN_HORIZONTAL_STRIDE_0; payload_did.subphysical = 1; payload_did.subnr = 20 * typeSize(GEN_TYPE_UB); GenRegister grf0 = GenRegister::ub1grf(0, 20); p->MOV(payload_did, grf0); p->pop(); const GenRegister msgPayload = ra->genReg(insn.dst(rsp_len)); const unsigned char bti = insn.getbti(); const unsigned int vme_search_path_lut = insn.extra.vme_search_path_lut; const unsigned int lut_sub = insn.extra.lut_sub; p->VME(bti, dst, msgPayload, msg_type, vme_search_path_lut, lut_sub); } void GenContext::scratchWrite(const GenRegister header, uint32_t offset, uint32_t reg_num, uint32_t reg_type, uint32_t channel_mode) { p->push(); uint32_t simdWidth = p->curr.execWidth; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.execWidth = 8; p->MOV(header, GenRegister::ud8grf(0,0)); p->pop(); int size = typeSize(reg_type)*simdWidth; p->push(); p->SCRATCH_WRITE(header, offset/32, size, reg_num, channel_mode); p->pop(); } void GenContext::scratchRead(const GenRegister dst, const GenRegister header, uint32_t offset, uint32_t reg_num, uint32_t reg_type, uint32_t channel_mode) { p->push(); uint32_t simdWidth = p->curr.execWidth; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.execWidth = 8; p->MOV(header, GenRegister::ud8grf(0,0)); p->pop(); int size = typeSize(reg_type)*simdWidth; p->push(); p->SCRATCH_READ(dst, header, offset/32, size, reg_num, channel_mode); p->pop(); } void GenContext::emitTypedWriteInstruction(const SelectionInstruction &insn) { const GenRegister header = 
GenRegister::retype(ra->genReg(insn.src(0)), GEN_TYPE_UD); GenRegister data = ra->genReg(insn.src(5)); const uint32_t bti = insn.getbti(); p->TYPED_WRITE(header, data, true, bti, insn.extra.typedWriteSplitSend); } static void calcGID(GenRegister& reg, GenRegister& tmp, int flag, int subFlag, int dim, GenContext *gc) { GenRegister flagReg = GenRegister::flag(flag, subFlag); GenRegister gstart = GenRegister::offset(reg, 0, 8 + dim*8); GenRegister gend = GenRegister::offset(gstart, 0, 4); GenRegister lid, localsz, gid, goffset; if (dim == 0) { lid = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud16grf(ir::ocl::lid0)), GEN_TYPE_UD); localsz = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud1grf(ir::ocl::lsize0)), GEN_TYPE_UD); gid = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud1grf(ir::ocl::groupid0)), GEN_TYPE_UD); goffset = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud1grf(ir::ocl::goffset0)), GEN_TYPE_UD); } else if (dim == 1) { lid = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud16grf(ir::ocl::lid1)), GEN_TYPE_UD); localsz = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud1grf(ir::ocl::lsize1)), GEN_TYPE_UD); gid = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud1grf(ir::ocl::groupid1)), GEN_TYPE_UD); goffset = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud1grf(ir::ocl::goffset1)), GEN_TYPE_UD); } else { lid = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud16grf(ir::ocl::lid2)), GEN_TYPE_UD); localsz = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud1grf(ir::ocl::lsize2)), GEN_TYPE_UD); gid = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud1grf(ir::ocl::groupid2)), GEN_TYPE_UD); goffset = GenRegister::toUniform(gc->ra->genReg(GenRegister::ud1grf(ir::ocl::goffset2)), GEN_TYPE_UD); } gc->p->MUL(gstart, localsz, gid); gc->p->ADD(gstart, gstart, lid); gc->p->ADD(gstart, gstart, goffset); GenRegister ip; gc->p->MOV(flagReg, GenRegister::immuw(0x0)); gc->p->curr.useFlag(flag, subFlag); gc->p->curr.predicate = GEN_PREDICATE_NONE; if (gc->getSimdWidth() == 16) gc->p->curr.execWidth = 16; else gc->p->curr.execWidth = 8; if (!gc->isDWLabel()) { ip = gc->ra->genReg(GenRegister::uw16grf(ir::ocl::blockip)); gc->p->CMP(GEN_CONDITIONAL_EQ, ip, GenRegister::immuw(0xffff)); } else { ip = gc->ra->genReg(GenRegister::ud16grf(ir::ocl::dwblockip)); gc->p->CMP(GEN_CONDITIONAL_EQ, ip, GenRegister::immud(0xffffffff)); } gc->p->curr.execWidth = 1; gc->p->MOV(GenRegister::retype(tmp, GEN_TYPE_UW), flagReg); if (gc->getSimdWidth() == 16) gc->p->OR(tmp, tmp, GenRegister::immud(0xffff0000)); else gc->p->OR(tmp, tmp, GenRegister::immud(0xffffff00)); gc->p->FBL(tmp, tmp); gc->p->ADD(tmp, tmp, GenRegister::negate(GenRegister::immud(0x1))); gc->p->MUL(tmp, tmp, GenRegister::immud(4)); gc->p->ADD(tmp, tmp, GenRegister::immud(lid.nr*32)); gc->p->MOV(GenRegister::addr1(0), GenRegister::retype(tmp, GEN_TYPE_UW)); GenRegister dimEnd = GenRegister::to_indirect1xN(lid, 0, 0); gc->p->MOV(tmp, dimEnd); gc->p->MUL(gend, localsz, gid); gc->p->ADD(gend, gend, tmp); gc->p->ADD(gend, gend, goffset); } void GenContext::calcGlobalXYZRange(GenRegister& reg, GenRegister& tmp, int flag, int subFlag) { p->push(); { p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; calcGID(reg, tmp, flag, subFlag, 0, this); calcGID(reg, tmp, flag, subFlag, 1, this); calcGID(reg, tmp, flag, subFlag, 2, this); } p->pop(); } void GenContext::profilingProlog(void) { // record the prolog, globalXYZ and lasttimestamp at the very beginning. 
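// (calcGlobalXYZRange, invoked below, computes for each dimension
//    gstart = group_id * local_size + lid + global_offset
//  using lane 0's lid for the range start and the last active lane's lid
//  for the range end, filling the gX/gY/gZ slots of profilingReg4 shown
//  in the layout table further down.)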
GenRegister profilingReg2, profilingReg3, profilingReg4; GenRegister tmArf = GenRegister::tm0(); if (this->simdWidth == 16) { profilingReg2 = ra->genReg(GenRegister::ud16grf(ir::ocl::profilingts1)); profilingReg3 = GenRegister::offset(profilingReg2, 1); profilingReg4 = ra->genReg(GenRegister::ud16grf(ir::ocl::profilingts2)); } else { GBE_ASSERT(this->simdWidth == 8); profilingReg2 = ra->genReg(GenRegister::ud8grf(ir::ocl::profilingts2)); profilingReg3 = ra->genReg(GenRegister::ud8grf(ir::ocl::profilingts3)); profilingReg4 = ra->genReg(GenRegister::ud8grf(ir::ocl::profilingts4)); } /* MOV(4) prolog<1>:UW arf_tm<4,4,1>:UW */ /* MOV(4) lastTsReg<1>:UW prolog<4,4,1>:UW */ GenRegister prolog = profilingReg2; prolog.type = GEN_TYPE_UW; prolog.hstride = GEN_HORIZONTAL_STRIDE_1; prolog.vstride = GEN_VERTICAL_STRIDE_4; prolog.width = GEN_WIDTH_4; prolog = GenRegister::offset(prolog, 0, 4*sizeof(uint32_t)); GenRegister lastTsReg = GenRegister::toUniform(profilingReg3, GEN_TYPE_UL); lastTsReg = GenRegister::offset(lastTsReg, 0, 2*sizeof(uint64_t)); lastTsReg.type = GEN_TYPE_UW; lastTsReg.hstride = GEN_HORIZONTAL_STRIDE_1; lastTsReg.vstride = GEN_VERTICAL_STRIDE_4; lastTsReg.width = GEN_WIDTH_4; GenRegister gids = GenRegister::toUniform(profilingReg4, GEN_TYPE_UD); GenRegister tmp = GenRegister::toUniform(profilingReg4, GEN_TYPE_UD); // X Y and Z this->calcGlobalXYZRange(gids, tmp, 0, 1); p->push(); { p->curr.execWidth = 4; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->MOV(prolog, tmArf); p->MOV(lastTsReg, tmArf); } p->pop(); p->NOP(); p->NOP(); return; } void GenContext::subTimestamps(GenRegister& t0, GenRegister& t1, GenRegister& tmp) { p->push(); { p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->SUBB(GenRegister::retype(t0, GEN_TYPE_UD), GenRegister::retype(t0, GEN_TYPE_UD), GenRegister::retype(t1, GEN_TYPE_UD)); /* FIXME We can not get the acc register's value correctly by set simd = 1. 
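   The workaround below therefore reads acc at execWidth 8, copying every
   accumulator lane into tmp, and the borrow produced by the scalar SUBB
   is consumed from tmp as a uniform; e.g. a 64-bit t0 - t1 becomes
   t0.lo = t0.lo - t1.lo (borrow lands in acc), then
   t0.hi = t0.hi - borrow - t1.hi.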
*/ p->curr.execWidth = 8; p->MOV(tmp, GenRegister::retype(GenRegister::acc(), GEN_TYPE_UD)); p->curr.execWidth = 1; p->ADD(GenRegister::retype(GenRegister::offset(t0, 0, sizeof(uint32_t)), GEN_TYPE_UD), GenRegister::retype(GenRegister::offset(t0, 0, sizeof(uint32_t)), GEN_TYPE_UD), GenRegister::negate(GenRegister::toUniform(tmp, GEN_TYPE_UD))); p->ADD(GenRegister::retype(GenRegister::offset(t0, 0, sizeof(uint32_t)), GEN_TYPE_UD), GenRegister::retype(GenRegister::offset(t0, 0, sizeof(uint32_t)), GEN_TYPE_UD), GenRegister::negate(GenRegister::retype(GenRegister::offset(t1, 0, sizeof(uint32_t)), GEN_TYPE_UD))); } p->pop(); } void GenContext::addTimestamps(GenRegister& t0, GenRegister& t1, GenRegister& tmp) { p->push(); { p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->ADDC(GenRegister::retype(t0, GEN_TYPE_UD), GenRegister::retype(t0, GEN_TYPE_UD), GenRegister::retype(t1, GEN_TYPE_UD)); p->curr.execWidth = 8; p->MOV(tmp, GenRegister::retype(GenRegister::acc(), GEN_TYPE_UD)); p->curr.execWidth = 1; p->ADD(GenRegister::retype(GenRegister::offset(t0, 0, sizeof(uint32_t)), GEN_TYPE_UD), GenRegister::retype(GenRegister::offset(t0, 0, sizeof(uint32_t)), GEN_TYPE_UD), GenRegister::offset(GenRegister::toUniform(tmp, GEN_TYPE_UD), 0, 6*sizeof(uint32_t))); p->ADD(GenRegister::retype(GenRegister::offset(t0, 0, sizeof(uint32_t)), GEN_TYPE_UD), GenRegister::retype(GenRegister::offset(t0, 0, sizeof(uint32_t)), GEN_TYPE_UD), GenRegister::retype(GenRegister::offset(t1, 0, sizeof(uint32_t)), GEN_TYPE_UD)); } p->pop(); } /* We will record at most 20 timestamps, each one is 16bits. We also will record the prolog and epilog timestamps in 64 bits. So the format of the curbe timestamp reg is: --------------------------------------------------------- | ts0 | ts1 | ts2 | ts3 | ts4 | ts5 | ts6 | ts7 | profilingReg0 | ts8 | ts9 | ts10 | ts11 | ts12 | ts13 | ts14 | ts15 | profilingReg1 | ts16 | ts17 | ts18 | ts19 | prolog | epilog | profilingReg2 --------------------------------------------------------- | tmp0 | tmp1 |lasttimestamp| real clock | profilingReg3 --------------------------------------------------------- | | gX s | gX e | gY s | gY e | gZ s | gZ e | profilingReg4 --------------------------------------------------------- */ void GenContext::emitCalcTimestampInstruction(const SelectionInstruction &insn) { uint32_t pointNum = insn.extra.pointNum; uint32_t tsType = insn.extra.timestampType; GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); (void) tsType; GBE_ASSERT(tsType == 1); GenRegister tmArf = GenRegister::tm0(); GenRegister profilingReg[5]; GenRegister tmp; if (p->curr.execWidth == 16) { profilingReg[0] = GenRegister::retype(ra->genReg(insn.src(0)), GEN_TYPE_UD); profilingReg[1] = GenRegister::offset(profilingReg[0], 1); profilingReg[2] = GenRegister::retype(ra->genReg(insn.src(1)), GEN_TYPE_UD); profilingReg[3] = GenRegister::offset(profilingReg[2], 1); profilingReg[4] = GenRegister::retype(ra->genReg(insn.src(2)), GEN_TYPE_UD); if (insn.dstNum == 4) { tmp = GenRegister::retype(ra->genReg(insn.dst(3)), GEN_TYPE_UD); } else { GBE_ASSERT(insn.dstNum == 3); tmp = GenRegister::toUniform(profilingReg[4], GEN_TYPE_UL); } } else { GBE_ASSERT(p->curr.execWidth == 8); profilingReg[0] = GenRegister::retype(ra->genReg(insn.src(0)), GEN_TYPE_UD); profilingReg[1] = GenRegister::retype(ra->genReg(insn.src(1)), GEN_TYPE_UD); profilingReg[2] = GenRegister::retype(ra->genReg(insn.src(2)), GEN_TYPE_UD); profilingReg[3] = 
GenRegister::retype(ra->genReg(insn.src(3)), GEN_TYPE_UD); profilingReg[4] = GenRegister::retype(ra->genReg(insn.src(4)), GEN_TYPE_UD); if (insn.dstNum == 6) { tmp = GenRegister::retype(ra->genReg(insn.dst(5)), GEN_TYPE_UD); } else { GBE_ASSERT(insn.dstNum == 5); tmp = GenRegister::toUniform(profilingReg[4], GEN_TYPE_UL); } }
GenRegister tmp0 = GenRegister::toUniform(profilingReg[3], GEN_TYPE_UL); GenRegister lastTsReg = GenRegister::toUniform(profilingReg[3], GEN_TYPE_UL); lastTsReg = GenRegister::offset(lastTsReg, 0, 2*sizeof(uint64_t)); GenRegister realClock = GenRegister::offset(lastTsReg, 0, sizeof(uint64_t));
/* MOV(4) tmp0<1>:UW arf_tm<4,4,1>:UW */ p->push(); { p->curr.execWidth = 4; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister _tmp0 = tmp0; _tmp0.type = GEN_TYPE_UW; _tmp0.hstride = GEN_HORIZONTAL_STRIDE_1; _tmp0.vstride = GEN_VERTICAL_STRIDE_4; _tmp0.width = GEN_WIDTH_4; p->MOV(_tmp0, tmArf); } p->pop();
/* Calc the time elapsed. */ // SUB(1) tmp0<1>:UL tmp0<1>:UL lastTS<0,1,0>
subTimestamps(tmp0, lastTsReg, tmp);
/* Update the real clock: ADD(1) realclock<1>:UL realclock<1>:UL tmp0<1>:UL */ addTimestamps(realClock, tmp0, tmp);
/* We only record the timestamp of the first time this point is reached. If this point is inside a loop it can be reached many times, but we do not record the later timestamps. A 32-bit timestamp can represent about 3.2s, and each kernel's execution time should never exceed 3s, so we just record the low 32 bits.
CMP.EQ(1) flag0.1 NULL tsReg_n<1>:UD 0x0
(+flag0.1) MOV(1) tsReg_n<1>:UD realclock<1>:UD Just record the low 32bits */
GenRegister tsReg = GenRegister::toUniform(profilingReg[pointNum/8], GEN_TYPE_UD); tsReg = GenRegister::offset(tsReg, 0, (pointNum%8)*sizeof(uint32_t));
p->push(); { p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_EQ, tsReg, GenRegister::immud(0)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.inversePredicate = 0; p->MOV(tsReg, GenRegister::retype(GenRegister::retype(realClock, GEN_TYPE_UD), GEN_TYPE_UD)); } p->pop();
/* Store the timestamp for the next point's use.
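   (tm0 is re-read here so that the next point's delta is measured from
    this store rather than from this point's entry.)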
MOV(4) lastTS<1>:UW arf_tm<4,4,1>:UW */ p->push(); { p->curr.execWidth = 4; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister _lastTsReg = lastTsReg; _lastTsReg.type = GEN_TYPE_UW; _lastTsReg.hstride = GEN_HORIZONTAL_STRIDE_1; _lastTsReg.vstride = GEN_VERTICAL_STRIDE_4; _lastTsReg.width = GEN_WIDTH_4; p->MOV(_lastTsReg, tmArf); } p->pop(); } void GenContext::emitStoreProfilingInstruction(const SelectionInstruction &insn) { uint32_t simdType; if (this->simdWidth == 16) { simdType = ir::ProfilingInfo::ProfilingSimdType16; } else if (this->simdWidth == 8) { simdType = ir::ProfilingInfo::ProfilingSimdType8; } else { simdType = ir::ProfilingInfo::ProfilingSimdType1; GBE_ASSERT(0); } p->NOP(); p->NOP(); GenRegister tmArf = GenRegister::tm0(); GenRegister profilingReg[5]; if (p->curr.execWidth == 16) { profilingReg[0] = GenRegister::retype(ra->genReg(insn.src(0)), GEN_TYPE_UD); profilingReg[1] = GenRegister::offset(profilingReg[0], 1); profilingReg[2] = GenRegister::retype(ra->genReg(insn.src(1)), GEN_TYPE_UD); profilingReg[3] = GenRegister::offset(profilingReg[2], 1); profilingReg[4] = GenRegister::retype(ra->genReg(insn.src(2)), GEN_TYPE_UD); } else { GBE_ASSERT(p->curr.execWidth == 8); profilingReg[0] = GenRegister::retype(ra->genReg(insn.src(0)), GEN_TYPE_UD); profilingReg[1] = GenRegister::retype(ra->genReg(insn.src(1)), GEN_TYPE_UD); profilingReg[2] = GenRegister::retype(ra->genReg(insn.src(2)), GEN_TYPE_UD); profilingReg[3] = GenRegister::retype(ra->genReg(insn.src(3)), GEN_TYPE_UD); profilingReg[4] = GenRegister::retype(ra->genReg(insn.src(4)), GEN_TYPE_UD); } GenRegister tmp = ra->genReg(insn.dst(0)); uint32_t profilingType = insn.extra.profilingType; uint32_t bti = insn.extra.profilingBTI; (void) profilingType; GBE_ASSERT(profilingType == 1); GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); GenRegister lastTsReg = GenRegister::toUniform(profilingReg[3], GEN_TYPE_UL); lastTsReg = GenRegister::offset(lastTsReg, 0, 2*sizeof(uint64_t)); GenRegister realClock = GenRegister::offset(lastTsReg, 0, sizeof(uint64_t)); GenRegister tmp0 = GenRegister::toUniform(profilingReg[3], GEN_TYPE_UL); /* MOV(4) tmp0<1>:UW arf_tm<4,4,1>:UW */ p->push(); { p->curr.execWidth = 4; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister _tmp0 = tmp0; _tmp0.type = GEN_TYPE_UW; _tmp0.hstride = GEN_HORIZONTAL_STRIDE_1; _tmp0.vstride = GEN_VERTICAL_STRIDE_4; _tmp0.width = GEN_WIDTH_4; p->MOV(_tmp0, tmArf); } p->pop(); /* Calc the time elapsed. */ subTimestamps(tmp0, lastTsReg, tmp); /* Update the real clock */ addTimestamps(realClock, tmp0, tmp); //the epilog, record the last timestamp and return. /* MOV(1) epilog<1>:UL realclock<0,1,0>:UL */ /* ADD(1) epilog<1>:UL prolog<0,1,0>:UL */ GenRegister prolog = GenRegister::toUniform(profilingReg[2], GEN_TYPE_UD); prolog = GenRegister::offset(prolog, 0, 4*sizeof(uint32_t)); GenRegister epilog = GenRegister::offset(prolog, 0, 2*sizeof(uint32_t)); p->push(); { p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->MOV(epilog, GenRegister::retype(realClock, GEN_TYPE_UD)); p->MOV(GenRegister::offset(epilog, 0, sizeof(uint32_t)), GenRegister::offset(GenRegister::retype(realClock, GEN_TYPE_UD), 0, sizeof(uint32_t))); addTimestamps(epilog, prolog, tmp); } p->pop(); /* Now, begin to write the results out. */ // Inc the log items number. p->push(); { //ptr[0] is the total count of the log items. 
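// (The atomic INC below returns the previous count in incRes; e.g. when
//  this is the third item logged, incRes = 2 and the slot address becomes
//  4 + 2*sizeof(ir::ProfilingInfo::ProfilingReportItem), the leading 4
//  skipping the dword counter itself.)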
GenRegister sndMsg = GenRegister::retype(tmp, GEN_TYPE_UD); sndMsg.width = GEN_WIDTH_8; sndMsg.hstride = GEN_HORIZONTAL_STRIDE_1; sndMsg.vstride = GEN_VERTICAL_STRIDE_8; p->curr.execWidth = 8; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->MOV(sndMsg, GenRegister::immud(0x0)); GenRegister incRes = GenRegister::offset(sndMsg, 1); p->push(); { p->curr.execWidth = 1; p->MOV(flagReg, GenRegister::immuw(0x01)); } p->pop(); p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->curr.predicate = GEN_PREDICATE_NORMAL; p->ATOMIC(incRes, GEN_ATOMIC_OP_INC, sndMsg, sndMsg, GenRegister::immud(bti), 1, false); } p->pop(); // Calculate the final addr GenRegister addr = GenRegister::retype(tmp, GEN_TYPE_UD); addr.width = GEN_WIDTH_8; addr.hstride = GEN_HORIZONTAL_STRIDE_1; addr.vstride = GEN_VERTICAL_STRIDE_8; p->push(); { GenRegister offset = GenRegister::offset(addr, 1); p->curr.execWidth = 8; p->curr.noMask = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->MUL(addr, GenRegister::toUniform(offset, GEN_TYPE_UD), GenRegister::immud(sizeof(ir::ProfilingInfo::ProfilingReportItem))); p->ADD(addr, addr, GenRegister::immud(4)); // for the counter p->curr.execWidth = 1; for (int i = 1; i < 8; i++) { p->ADD(GenRegister::toUniform(GenRegister::offset(addr, 0, i*sizeof(uint32_t)), GEN_TYPE_UD), GenRegister::toUniform(GenRegister::offset(addr, 0, i*sizeof(uint32_t)), GEN_TYPE_UD), GenRegister::immud(i*sizeof(uint32_t))); } } p->pop(); GenRegister data = GenRegister::offset(addr, 1); p->push(); { p->curr.execWidth = 8; p->curr.noMask = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->MOV(data, profilingReg[4]); } p->pop(); // Write the result out p->push(); { GenRegister ffid = GenRegister::toUniform(data, GEN_TYPE_UD); GenRegister tmp = GenRegister::toUniform(profilingReg[3], GEN_TYPE_UD); GenRegister stateReg = GenRegister::sr(0, 0); p->curr.noMask = 1; p->curr.execWidth = 1; p->MOV(ffid, stateReg); p->SHR(ffid, ffid, GenRegister::immud(24)); p->AND(ffid, ffid, GenRegister::immud(0x0ff)); p->OR(ffid, ffid, GenRegister::immud(simdType << 4)); GenRegister genInfo = GenRegister::offset(ffid, 0, 4); p->MOV(genInfo, stateReg); p->AND(genInfo, genInfo, GenRegister::immud(0x0ff07)); //The dispatch mask stateReg = GenRegister::sr(0, 2); p->MOV(tmp, stateReg); p->AND(tmp, tmp, GenRegister::immud(0x0000ffff)); p->SHL(tmp, tmp, GenRegister::immud(16)); p->OR(genInfo, genInfo, tmp); // Write it out. 
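// (Four 8-dword rows follow: first the id row assembled in `data` —
//  ffid, genInfo with the dispatch mask, and the gX/gY/gZ ranges kept
//  from profilingReg[4] — then profilingReg[0..2] with the twenty point
//  timestamps plus the prolog/epilog words; addr advances 32 bytes per
//  row.)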
p->curr.execWidth = 8; p->curr.noMask = 1; p->UNTYPED_WRITE(addr, addr, GenRegister::immud(bti), 1, false); p->ADD(addr, addr, GenRegister::immud(32)); // time stamps for (int i = 0; i < 3; i++) { p->curr.execWidth = 8; p->MOV(data, GenRegister::retype(profilingReg[i], GEN_TYPE_UD)); p->UNTYPED_WRITE(addr, addr, GenRegister::immud(bti), 1, false); p->ADD(addr, addr, GenRegister::immud(32)); } } p->pop(); } /* Init value according to WORKGROUP OP * Emit assert is invalid combination operation - datatype */ static void wgOpInitValue(GenEncoder *p, GenRegister dataReg, uint32_t wg_op) { if (wg_op == ir::WORKGROUP_OP_ALL) { if (dataReg.type == GEN_TYPE_D || dataReg.type == GEN_TYPE_UD) p->MOV(dataReg, GenRegister::immd(0xFFFFFFFF)); else if(dataReg.type == GEN_TYPE_L || dataReg.type == GEN_TYPE_UL) p->MOV(dataReg, GenRegister::immint64(0xFFFFFFFFFFFFFFFFL)); else GBE_ASSERT(0); /* unsupported data-type */ } else if(wg_op == ir::WORKGROUP_OP_ANY || wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD) { if (dataReg.type == GEN_TYPE_D) p->MOV(dataReg, GenRegister::immd(0x0)); else if (dataReg.type == GEN_TYPE_UD) p->MOV(dataReg, GenRegister::immud(0x0)); else if (dataReg.type == GEN_TYPE_F) p->MOV(dataReg, GenRegister::immf(0x0)); else if (dataReg.type == GEN_TYPE_L) p->MOV(dataReg, GenRegister::immint64(0x0)); else if (dataReg.type == GEN_TYPE_UL) p->MOV(dataReg, GenRegister::immuint64(0x0)); else if (dataReg.type == GEN_TYPE_W) p->MOV(dataReg, GenRegister::immw(0x0)); else if (dataReg.type == GEN_TYPE_UW) p->MOV(dataReg, GenRegister::immuw(0x0)); else GBE_ASSERT(0); /* unsupported data-type */ } else if(wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN) { if (dataReg.type == GEN_TYPE_D) p->MOV(dataReg, GenRegister::immd(0x7FFFFFFF)); else if (dataReg.type == GEN_TYPE_UD) p->MOV(dataReg, GenRegister::immud(0xFFFFFFFF)); else if (dataReg.type == GEN_TYPE_F) p->MOV(GenRegister::retype(dataReg, GEN_TYPE_UD), GenRegister::immud(0x7F800000)); else if (dataReg.type == GEN_TYPE_L) p->MOV(dataReg, GenRegister::immint64(0x7FFFFFFFFFFFFFFFL)); else if (dataReg.type == GEN_TYPE_UL) p->MOV(dataReg, GenRegister::immuint64(0xFFFFFFFFFFFFFFFFL)); else if (dataReg.type == GEN_TYPE_W) p->MOV(dataReg, GenRegister::immw(0x7FFF)); else if (dataReg.type == GEN_TYPE_UW) p->MOV(dataReg, GenRegister::immuw(0xFFFF)); else GBE_ASSERT(0); /* unsupported data-type */ } else if(wg_op == ir::WORKGROUP_OP_REDUCE_MAX || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { if (dataReg.type == GEN_TYPE_D) p->MOV(dataReg, GenRegister::immd(0x80000000)); else if (dataReg.type == GEN_TYPE_UD) p->MOV(dataReg, GenRegister::immud(0x0)); else if (dataReg.type == GEN_TYPE_F) p->MOV(GenRegister::retype(dataReg, GEN_TYPE_UD), GenRegister::immud(0xFF800000)); else if (dataReg.type == GEN_TYPE_L) p->MOV(dataReg, GenRegister::immint64(0x8000000000000000L)); else if (dataReg.type == GEN_TYPE_UL) p->MOV(dataReg, GenRegister::immuint64(0x0)); else if (dataReg.type == GEN_TYPE_W) p->MOV(dataReg, GenRegister::immw(0x8000)); else if (dataReg.type == GEN_TYPE_UW) p->MOV(dataReg, GenRegister::immuw(0x0)); else GBE_ASSERT(0); /* unsupported data-type */ } /* unsupported operation */ else GBE_ASSERT(0); } /* Perform WORKGROUP OP on 2 input elements (registers) */ static void wgOpPerform(GenRegister dst, GenRegister src1, GenRegister src2, uint32_t wg_op, GenEncoder *p) { /* 
perform OP REDUCE on 2 elements */ if (wg_op == ir::WORKGROUP_OP_ANY) p->OR(dst, src1, src2); else if (wg_op == ir::WORKGROUP_OP_ALL) p->AND(dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_REDUCE_ADD) p->ADD(dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_REDUCE_MIN) p->SEL_CMP(GEN_CONDITIONAL_LE, dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_REDUCE_MAX) p->SEL_CMP(GEN_CONDITIONAL_GE, dst, src1, src2);
/* perform OP SCAN INCLUSIVE on 2 elements */ else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD) p->ADD(dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN) p->SEL_CMP(GEN_CONDITIONAL_LE, dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX) p->SEL_CMP(GEN_CONDITIONAL_GE, dst, src1, src2);
/* perform OP SCAN EXCLUSIVE on 2 elements */ else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD) p->ADD(dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN) p->SEL_CMP(GEN_CONDITIONAL_LE, dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) p->SEL_CMP(GEN_CONDITIONAL_GE, dst, src1, src2); else GBE_ASSERT(0); }
static void wgOpPerformThread(GenRegister threadDst, GenRegister inputVal, GenRegister threadExchangeData, GenRegister resultVal, uint32_t simd, uint32_t wg_op, GenEncoder *p) { p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.execWidth = 1;
/* setting the type */ resultVal = GenRegister::retype(resultVal, inputVal.type); threadDst = GenRegister::retype(threadDst, inputVal.type); threadExchangeData = GenRegister::retype(threadExchangeData, inputVal.type); vector<GenRegister> input; vector<GenRegister> result;
/* for workgroup all and any we can use simd_all/any for each thread */ if (wg_op == ir::WORKGROUP_OP_ALL || wg_op == ir::WORKGROUP_OP_ANY) { GenRegister constZero = GenRegister::immuw(0); GenRegister flag01 = GenRegister::flag(0, 1); p->push(); { p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.execWidth = simd; p->MOV(resultVal, GenRegister::immud(1)); p->curr.execWidth = 1; if (wg_op == ir::WORKGROUP_OP_ALL) p->MOV(flag01, GenRegister::immw(-1)); else p->MOV(flag01, constZero); p->curr.execWidth = simd; p->curr.noMask = 0; p->curr.flag = 0; p->curr.subFlag = 1; p->CMP(GEN_CONDITIONAL_NEQ, inputVal, constZero); if (p->curr.execWidth == 16) if (wg_op == ir::WORKGROUP_OP_ALL) p->curr.predicate = GEN_PREDICATE_ALIGN1_ALL16H; else p->curr.predicate = GEN_PREDICATE_ALIGN1_ANY16H; else if (p->curr.execWidth == 8) if (wg_op == ir::WORKGROUP_OP_ALL) p->curr.predicate = GEN_PREDICATE_ALIGN1_ALL8H; else p->curr.predicate = GEN_PREDICATE_ALIGN1_ANY8H; else NOT_IMPLEMENTED; p->SEL(threadDst, resultVal, constZero); p->SEL(threadExchangeData, resultVal, constZero); } p->pop(); } else { if (inputVal.hstride == GEN_HORIZONTAL_STRIDE_0) { p->MOV(threadExchangeData, inputVal); p->pop(); return; }
/* init thread data to min/max/null values */ p->push(); { p->curr.execWidth = simd; wgOpInitValue(p, threadExchangeData, wg_op); p->MOV(resultVal, inputVal); } p->pop();
GenRegister resultValSingle = resultVal; resultValSingle.hstride = GEN_HORIZONTAL_STRIDE_0; resultValSingle.vstride = GEN_VERTICAL_STRIDE_0; resultValSingle.width = GEN_WIDTH_1; GenRegister inputValSingle = inputVal; inputValSingle.hstride = GEN_HORIZONTAL_STRIDE_0; inputValSingle.vstride = GEN_VERTICAL_STRIDE_0; inputValSingle.width = GEN_WIDTH_1;
/* make an array of registers for easy accessing */ for(uint32_t i = 0; i < simd; i++){ /* add each lane's resultVal/inputVal position to the lists */ result.push_back(resultValSingle); input.push_back(inputValSingle);
/* move to the next position */ resultValSingle.subnr += typeSize(resultValSingle.type); if (resultValSingle.subnr == 32) { resultValSingle.subnr = 0; resultValSingle.nr++; }
/* move to the next position */ inputValSingle.subnr += typeSize(inputValSingle.type); if (inputValSingle.subnr == 32) { inputValSingle.subnr = 0; inputValSingle.nr++; } }
uint32_t start_i = 0; if( wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_REDUCE_MAX || wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX) { p->MOV(result[0], input[0]); start_i = 1; } else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { p->MOV(result[1], input[0]); start_i = 2; }
/* algorithm workgroup */ for (uint32_t i = start_i; i < simd; i++) { if( wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_REDUCE_MAX) wgOpPerform(result[0], result[0], input[i], wg_op, p); else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX) wgOpPerform(result[i], result[i - 1], input[i], wg_op, p); else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) wgOpPerform(result[i], result[i - 1], input[i - 1], wg_op, p); else GBE_ASSERT(0); } }
if( wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_REDUCE_MAX) { p->curr.execWidth = simd; /* value exchanged with other threads */ p->MOV(threadExchangeData, result[0]); /* partial result thread */ p->MOV(threadDst, result[0]); } else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX) { p->curr.execWidth = simd; /* value exchanged with other threads */ p->MOV(threadExchangeData, result[simd - 1]); /* partial result thread */ p->MOV(threadDst, resultVal); } else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { p->curr.execWidth = 1; /* set result[0] to min/max/null */ wgOpInitValue(p, result[0], wg_op); p->curr.execWidth = simd; /* value exchanged with other threads */ wgOpPerform(threadExchangeData, result[simd - 1], input[simd - 1], wg_op, p); /* partial result thread */ p->MOV(threadDst, resultVal); } p->pop(); }
/**
 * WORKGROUP OP: ALL, ANY, REDUCE, SCAN INCLUSIVE, SCAN EXCLUSIVE
 *
 * Implementation:
 * 1. Each thread first performs the workgroup op over its own allocated
 * work-items (SIMD16 => 16 work-items allocated for each thread).
 * 2. Each thread writes its partial result to shared local memory, indexed by threadId.
 * 3. After a barrier, each thread reads back the shared local memory region
 * in chunks of 1-4 elements, using a loop based on the thread count (threadN).
 * 4. Each thread computes the final value individually.
 *
 * Optimizations:
 * Performance is dominated by the chunked reads: reading in chunks of 4
 * elements is 2-3x faster than reading one element at a time.
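 *
 * Example (illustrative sizes): a 128 work-item group at SIMD16 runs as
 * threadN = 8 hardware threads. Thread t stores its partial at SLM offset
 * msgSlmOff + t*4 (t*8 for 64-bit types); after the barrier every thread
 * counts threadLoop down from threadN, reading each stored partial and
 * folding it into partialData before the final per-lane combine.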
*/ void GenContext::emitWorkGroupOpInstruction(const SelectionInstruction &insn){ const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister tmp = GenRegister::retype(ra->genReg(insn.dst(1)), dst.type); const GenRegister theVal = GenRegister::retype(ra->genReg(insn.src(2)), dst.type); GenRegister threadData = ra->genReg(insn.src(3)); GenRegister partialData = GenRegister::toUniform(threadData, dst.type); GenRegister threadId = ra->genReg(insn.src(0)); GenRegister threadLoop = ra->genReg(insn.src(1)); GenRegister barrierId = ra->genReg(GenRegister::ud1grf(ir::ocl::barrierid)); GenRegister localBarrier = ra->genReg(insn.src(5)); uint32_t wg_op = insn.extra.wgop.workgroupOp; uint32_t simd = p->curr.execWidth; int32_t jip0, jip1; /* masked elements should be properly set to init value */ p->push(); { p->curr.noMask = 1; wgOpInitValue(p, tmp, wg_op); p->curr.noMask = 0; p->MOV(tmp, theVal); p->curr.noMask = 1; p->MOV(theVal, tmp); } p->pop(); threadId = GenRegister::toUniform(threadId, GEN_TYPE_UD); /* use of continuous GRF allocation from insn selection */ GenRegister msg = GenRegister::retype(ra->genReg(insn.dst(2)), dst.type); GenRegister msgSlmOff = GenRegister::retype(ra->genReg(insn.src(4)), GEN_TYPE_UD); GenRegister msgAddr = GenRegister::retype(msg, GEN_TYPE_UD); GenRegister msgData = GenRegister::retype(ra->genReg(insn.dst(3)), dst.type); /* do some calculation within each thread */ wgOpPerformThread(dst, theVal, threadData, tmp, simd, wg_op, p); p->curr.execWidth = simd; p->MOV(theVal, dst); threadData = GenRegister::toUniform(threadData, dst.type); /* store thread count for future use on read/write to SLM */ if (wg_op == ir::WORKGROUP_OP_ANY || wg_op == ir::WORKGROUP_OP_ALL || wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_REDUCE_MAX) { threadLoop = GenRegister::retype(tmp, GEN_TYPE_D); p->MOV(threadLoop, ra->genReg(GenRegister::ud1grf(ir::ocl::threadn))); } else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { threadLoop = GenRegister::retype(tmp, GEN_TYPE_D); p->MOV(threadLoop, ra->genReg(GenRegister::ud1grf(ir::ocl::threadid))); } /* all threads write the partial results to SLM memory */ if(dst.type == GEN_TYPE_UL || dst.type == GEN_TYPE_L) { GenRegister threadDataL = GenRegister::retype(threadData, GEN_TYPE_D); GenRegister threadDataH = threadDataL.offset(threadDataL, 0, 4); GenRegister msgDataL = GenRegister::retype(msgData, GEN_TYPE_D); GenRegister msgDataH = msgDataL.offset(msgDataL, 1); p->curr.execWidth = 8; p->MOV(msgDataL, threadDataL); p->MOV(msgDataH, threadDataH); p->MUL(msgAddr, threadId, GenRegister::immd(0x8)); p->ADD(msgAddr, msgAddr, msgSlmOff); p->UNTYPED_WRITE(msg, msg, GenRegister::immw(0xFE), 2, false); } else { p->curr.execWidth = 8; p->MOV(msgData, threadData); p->MUL(msgAddr, threadId, GenRegister::immd(0x4)); p->ADD(msgAddr, msgAddr, msgSlmOff); p->UNTYPED_WRITE(msg, msg, GenRegister::immw(0xFE), 1, false); } /* init partialData register, it will hold the final result */ wgOpInitValue(p, partialData, wg_op); /* add call to barrier */ p->push(); p->curr.execWidth = 8; p->curr.physicalFlag = 0; p->curr.noMask = 1; p->AND(localBarrier, barrierId, GenRegister::immud(0x0f000000)); p->BARRIER(localBarrier); p->curr.execWidth = 1; p->WAIT(); p->pop(); /* perform a loop, based on thread 
count (which is now multiple of 4) */ p->push();{ jip0 = p->n_instruction(); /* read in chunks of 4 to optimize SLM reads and reduce SEND messages */ if(dst.type == GEN_TYPE_UL || dst.type == GEN_TYPE_L) { p->curr.execWidth = 8; p->curr.predicate = GEN_PREDICATE_NONE; p->ADD(threadLoop, threadLoop, GenRegister::immd(-1)); p->MUL(msgAddr, threadLoop, GenRegister::immd(0x8)); p->ADD(msgAddr, msgAddr, msgSlmOff); p->UNTYPED_READ(msgData, msgAddr, GenRegister::immw(0xFE), 2); GenRegister msgDataL = msgData.retype(msgData.offset(msgData, 0, 4), GEN_TYPE_D); GenRegister msgDataH = msgData.retype(msgData.offset(msgData, 1, 4), GEN_TYPE_D); msgDataL.hstride = 2; msgDataH.hstride = 2; p->MOV(msgDataL, msgDataH); /* perform operation, partialData will hold result */ wgOpPerform(partialData, partialData, msgData.offset(msgData, 0), wg_op, p); } else { p->curr.execWidth = 8; p->curr.predicate = GEN_PREDICATE_NONE; p->ADD(threadLoop, threadLoop, GenRegister::immd(-1)); p->MUL(msgAddr, threadLoop, GenRegister::immd(0x4)); p->ADD(msgAddr, msgAddr, msgSlmOff); p->UNTYPED_READ(msgData, msgAddr, GenRegister::immw(0xFE), 1); /* perform operation, partialData will hold result */ wgOpPerform(partialData, partialData, msgData.offset(msgData, 0), wg_op, p); } /* while threadN is not 0, cycle read SLM / update value */ p->curr.noMask = 1; p->curr.flag = 0; p->curr.subFlag = 1; p->CMP(GEN_CONDITIONAL_G, threadLoop, GenRegister::immd(0x0)); p->curr.predicate = GEN_PREDICATE_NORMAL; jip1 = p->n_instruction(); p->JMPI(GenRegister::immud(0)); p->patchJMPI(jip1, jip0 - jip1, 0); } p->pop(); if(wg_op == ir::WORKGROUP_OP_ANY || wg_op == ir::WORKGROUP_OP_ALL || wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_REDUCE_MAX) { /* save result to final register location dst */ p->curr.execWidth = simd; p->MOV(dst, partialData); } else { /* save result to final register location dst */ p->curr.execWidth = simd; if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD) p->ADD(dst, dst, partialData); else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN) { /* workaround QW datatype on CMP */ if(dst.type == GEN_TYPE_UL || dst.type == GEN_TYPE_L){ p->push(); p->curr.execWidth = 8; p->SEL_CMP(GEN_CONDITIONAL_LE, dst, dst, partialData); if (simd == 16) { p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q2; p->SEL_CMP(GEN_CONDITIONAL_LE, GenRegister::Qn(dst, 1), GenRegister::Qn(dst, 1), GenRegister::Qn(partialData, 1)); } p->pop(); } else p->SEL_CMP(GEN_CONDITIONAL_LE, dst, dst, partialData); } else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { /* workaround QW datatype on CMP */ if(dst.type == GEN_TYPE_UL || dst.type == GEN_TYPE_L){ p->push(); p->curr.execWidth = 8; p->SEL_CMP(GEN_CONDITIONAL_GE, dst, dst, partialData); if (simd == 16) { p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q2; p->SEL_CMP(GEN_CONDITIONAL_GE, GenRegister::Qn(dst, 1), GenRegister::Qn(dst, 1), GenRegister::Qn(partialData, 1)); } p->pop(); } else p->SEL_CMP(GEN_CONDITIONAL_GE, dst, dst, partialData); } } /* corner cases for threads 0 */ if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { p->push();{ p->curr.flag = 0; p->curr.subFlag = 1; 
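// (Thread 0 has no preceding partials to fold in, so under this flag its
//  lanes keep the per-thread scan result already computed into theVal.)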
p->CMP(GEN_CONDITIONAL_EQ, threadId, GenRegister::immd(0x0)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.execWidth = simd; p->MOV(dst, theVal); } p->pop(); } } void GenContext::emitSubGroupOpInstruction(const SelectionInstruction &insn){ const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister tmp = GenRegister::retype(ra->genReg(insn.dst(1)), dst.type); const GenRegister theVal = GenRegister::retype(ra->genReg(insn.src(0)), dst.type); GenRegister threadData = ra->genReg(insn.src(1)); uint32_t wg_op = insn.extra.wgop.workgroupOp; uint32_t simd = p->curr.execWidth; /* masked elements should be properly set to init value */ p->push(); { p->curr.noMask = 1; wgOpInitValue(p, tmp, wg_op); p->curr.noMask = 0; p->MOV(tmp, theVal); p->curr.noMask = 1; p->MOV(theVal, tmp); } p->pop(); /* do some calculation within each thread */ wgOpPerformThread(dst, theVal, threadData, tmp, simd, wg_op, p); } void GenContext::emitPrintfLongInstruction(GenRegister& addr, GenRegister& data, GenRegister& src, uint32_t bti, bool useSends) { p->MOV(GenRegister::retype(data, GEN_TYPE_UD), src.bottom_half()); p->UNTYPED_WRITE(addr, data, GenRegister::immud(bti), 1, useSends); p->ADD(addr, addr, GenRegister::immud(sizeof(uint32_t))); p->MOV(GenRegister::retype(data, GEN_TYPE_UD), src.top_half(this->simdWidth)); p->UNTYPED_WRITE(addr, data, GenRegister::immud(bti), 1, useSends); p->ADD(addr, addr, GenRegister::immud(sizeof(uint32_t))); } void GenContext::emitPrintfInstruction(const SelectionInstruction &insn) { const GenRegister tmp0 = ra->genReg(insn.dst(0)); const GenRegister tmp1 = ra->genReg(insn.dst(1)); GenRegister src; uint32_t srcNum = insn.srcNum; GenRegister addr = GenRegister::retype(tmp0, GEN_TYPE_UD); GenRegister data = GenRegister::retype(tmp1, GEN_TYPE_UD); bool useSends = insn.extra.printfSplitSend; if (!insn.extra.continueFlag) { p->push(); { p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; //ptr[0] is the total count of the log size. p->MOV(addr, GenRegister::immud(0)); p->MOV(data, GenRegister::immud(insn.extra.printfSize + 12)); } p->pop(); p->ATOMIC(addr, GEN_ATOMIC_OP_ADD, addr, data, GenRegister::immud(insn.extra.printfBTI), 2, useSends); /* Write out the header. */ p->MOV(data, GenRegister::immud(0xAABBCCDD)); p->UNTYPED_WRITE(addr, data, GenRegister::immud(insn.extra.printfBTI), 1, useSends); p->ADD(addr, addr, GenRegister::immud(sizeof(uint32_t))); p->MOV(data, GenRegister::immud(insn.extra.printfSize + 12)); p->UNTYPED_WRITE(addr, data, GenRegister::immud(insn.extra.printfBTI), 1, useSends); p->ADD(addr, addr, GenRegister::immud(sizeof(uint32_t))); p->MOV(data, GenRegister::immud(insn.extra.printfNum)); p->UNTYPED_WRITE(addr, data, GenRegister::immud(insn.extra.printfBTI), 1, useSends); p->ADD(addr, addr, GenRegister::immud(sizeof(uint32_t))); } // Now, store out every parameter. 
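// (Record layout as emitted above: dword magic 0xAABBCCDD, dword total
//  size = insn.extra.printfSize + 12, dword format id insn.extra.printfNum,
//  then the arguments: one dword each for UD/D/F and widened B/UB, two
//  dwords for L/UL via emitPrintfLongInstruction.)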
for(uint32_t i = 0; i < srcNum; i++) { src = ra->genReg(insn.src(i)); if (src.type == GEN_TYPE_UD || src.type == GEN_TYPE_D || src.type == GEN_TYPE_F) { p->MOV(GenRegister::retype(data, src.type), src); p->UNTYPED_WRITE(addr, data, GenRegister::immud(insn.extra.printfBTI), 1, useSends); p->ADD(addr, addr, GenRegister::immud(sizeof(uint32_t))); } else if (src.type == GEN_TYPE_B || src.type == GEN_TYPE_UB ) { p->MOV(GenRegister::retype(data, GEN_TYPE_UD), src); p->UNTYPED_WRITE(addr, data, GenRegister::immud(insn.extra.printfBTI), 1, useSends); p->ADD(addr, addr, GenRegister::immud(sizeof(uint32_t))); } else if (src.type == GEN_TYPE_L || src.type == GEN_TYPE_UL ) { emitPrintfLongInstruction(addr, data, src, insn.extra.printfBTI, useSends); } } } void GenContext::setA0Content(uint16_t new_a0[16], uint16_t max_offset, int sz) { if (sz == 0) sz = 8; GBE_ASSERT(sz%4 == 0); GBE_ASSERT(new_a0[0] >= 0 && new_a0[0] < 4096); p->push(); p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; for (int i = 0; i < sz/2; i++) { p->MOV(GenRegister::retype(GenRegister::addr1(i*2), GEN_TYPE_UD), GenRegister::immud(new_a0[i*2 + 1] << 16 | new_a0[i*2])); } p->pop(); } void GenContext::emitOBReadInstruction(const SelectionInstruction &insn) { const GenRegister header = ra->genReg(insn.src(0)); const GenRegister tmp = ra->genReg(insn.dst(0)); const uint32_t bti = insn.getbti(); const uint32_t ow_size = insn.extra.elem; bool isA64 = bti == 255; if (isA64) p->OBREADA64(tmp, header, bti, ow_size); else p->OBREAD(tmp, header, bti, ow_size); } void GenContext::emitOBWriteInstruction(const SelectionInstruction &insn) { const GenRegister header = ra->genReg(insn.src(0)); const GenRegister data = ra->genReg(insn.src(1)); const uint32_t bti = insn.getbti(); const uint32_t ow_size = insn.extra.elem; bool isA64 = bti == 255; if (isA64) p->OBWRITEA64(header, bti, ow_size); else p->OBWRITE(header, data, bti, ow_size, insn.extra.splitSend); } void GenContext::emitMBReadInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister header = ra->genReg(insn.src(0)); const size_t response_size = insn.extra.elem; p->MBREAD(dst, header, insn.getbti(), response_size); } void GenContext::emitMBWriteInstruction(const SelectionInstruction &insn) { const GenRegister header = ra->genReg(insn.dst(0)); const GenRegister data = ra->genReg(insn.dst(1)); const size_t data_size = insn.extra.elem; p->MBWRITE(header, data, insn.getbti(), data_size, insn.extra.splitSend); } BVAR(OCL_OUTPUT_REG_ALLOC, false); BVAR(OCL_OUTPUT_ASM, false); void GenContext::allocCurbeReg(ir::Register reg) { uint32_t regSize; gbe_curbe_type curbeType; int subType; this->getRegPayloadType(reg, curbeType, subType); regSize = this->ra->getRegSize(reg); insertCurbeReg(reg, newCurbeEntry(curbeType, subType, regSize)); /* Need to patch the image information registers. */ if (curbeType == GBE_CURBE_IMAGE_INFO) { std::sort(kernel->patches.begin(), kernel->patches.end()); uint32_t offset = kernel->getCurbeOffset(GBE_CURBE_IMAGE_INFO, subType); fn.getImageSet()->appendInfo(static_cast(subType), offset); } } void GenContext::buildPatchList() { // After this point the vector is immutable. 
      // Sorting it will make searching faster.
      std::sort(kernel->patches.begin(), kernel->patches.end());
      kernel->curbeSize = ALIGN(kernel->curbeSize, GEN_REG_SIZE);
    }

    BVAR(OCL_OUTPUT_SEL_IR, false);
    BVAR(OCL_OPTIMIZE_SEL_IR, true);

    bool GenContext::emitCode(void) {
      GenKernel *genKernel = static_cast<GenKernel*>(this->kernel);
      sel->select();
      if (OCL_OPTIMIZE_SEL_IR)
        sel->optimize();
      sel->addID();
      if (OCL_OUTPUT_SEL_IR)
        outputSelectionIR(*this, this->sel, genKernel->getName());
      schedulePreRegAllocation(*this, *this->sel);
      sel->addID();
      if (UNLIKELY(ra->allocate(*this->sel) == false))
        return false;
      schedulePostRegAllocation(*this, *this->sel);
      if (OCL_OUTPUT_REG_ALLOC)
        ra->outputAllocation();
      if (inProfilingMode) {
        // Add the profiling prolog before doing anything else.
        this->profilingProlog();
      }
      this->emitStackPointer();
      this->clearFlagRegister();
      this->emitSLMOffset();
      this->emitInstructionStream();
      if (this->patchBranches() == false)
        return false;
      genKernel->insnNum = p->store.size();
      genKernel->insns = GBE_NEW_ARRAY_NO_ARG(GenInstruction, genKernel->insnNum);
      std::memcpy(genKernel->insns, &p->store[0], genKernel->insnNum * sizeof(GenInstruction));
      if (OCL_OUTPUT_ASM)
        outputAssembly(stdout, genKernel);
      if (OCL_DEBUGINFO)
        outputAssembly(stdout, genKernel);
      if (this->asmFileName) {
        FILE *asmDumpStream = fopen(this->asmFileName, "a");
        if (asmDumpStream) {
          outputAssembly(asmDumpStream, genKernel);
          fclose(asmDumpStream);
        }
      }
      return true;
    }

    Kernel *GenContext::allocateKernel(void) {
      return GBE_NEW(GenKernel, name, deviceID);
    }

    void GenContext::outputAssembly(FILE *file, GenKernel* genKernel) {
      /* Get the gen version for the instruction compaction. */
      uint32_t insn_version = 0;
      if (IS_GEN7(deviceID) || IS_GEN75(deviceID))
        insn_version = 7;
      else if (IS_GEN8(deviceID) || IS_GEN9(deviceID))
        insn_version = 8;
      fprintf(file, "%s's disassemble begin:\n", genKernel->getName());
      ir::LabelIndex curLabel = (ir::LabelIndex)0;
      GenCompactInstruction *pCom = NULL;
      GenInstruction insn[2];
      fprintf(file, " L0:\n");
      for (uint32_t insnID = 0; insnID < genKernel->insnNum; ) {
        if (labelPos.find((ir::LabelIndex)(curLabel + 1))->second == insnID &&
            curLabel < this->getFunction().labelNum()) {
          fprintf(file, " L%i:\n", curLabel + 1);
          curLabel = (ir::LabelIndex)(curLabel + 1);
          while (labelPos.find((ir::LabelIndex)(curLabel + 1))->second == insnID) {
            fprintf(file, " L%i:\n", curLabel + 1);
            curLabel = (ir::LabelIndex)(curLabel + 1);
          }
        }
        if (OCL_DEBUGINFO)
          fprintf(file, "[%3i,%3i]", p->storedbg[insnID].line, p->storedbg[insnID].col);
        fprintf(file, " (%8i) ", insnID);
        pCom = (GenCompactInstruction*)&p->store[insnID];
        if (pCom->bits1.cmpt_control == 1) {
          decompactInstruction(pCom, &insn, insn_version);
          gen_disasm(file, &insn, deviceID, 1);
          insnID++;
        } else {
          gen_disasm(file, &p->store[insnID], deviceID, 0);
          insnID = insnID + 2;
        }
      }
      fprintf(file, "%s's disassemble end.\n", genKernel->getName());
    }
  } /* namespace gbe */
Beignet-1.3.2-Source/backend/src/backend/gen9_context.hpp000664 001750 001750 00000006627 13173554000 022355 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file gen9_context.hpp */ #ifndef __GBE_gen9_CONTEXT_HPP__ #define __GBE_gen9_CONTEXT_HPP__ #include "backend/gen8_context.hpp" #include "backend/gen9_encoder.hpp" namespace gbe { /* This class is used to implement the skylake specific logic for context. */ class Gen9Context : public Gen8Context { public: virtual ~Gen9Context(void) { }; Gen9Context(const ir::Unit &unit, const std::string &name, uint32_t deviceID, bool relaxMath = false) : Gen8Context(unit, name, deviceID, relaxMath) { }; virtual void emitBarrierInstruction(const SelectionInstruction &insn); protected: virtual GenEncoder* generateEncoder(void) { return GBE_NEW(Gen9Encoder, this->simdWidth, 9, deviceID); } private: virtual void newSelection(void); }; //most code of BxtContext are copied from ChvContext, it results in two physical copy of the same code. //there are two possible ways to resolve it: 1) virtual inheritance 2) class template //but either way makes BxtContext and ChvContext tied closely, it might impact the flexibility of future changes //so, choose the method of two physical copies. class BxtContext : public Gen9Context { public: virtual ~BxtContext(void) { } BxtContext(const ir::Unit &unit, const std::string &name, uint32_t deviceID, bool relaxMath = false) : Gen9Context(unit, name, deviceID, relaxMath) { }; virtual void emitI64MULInstruction(const SelectionInstruction &insn); protected: virtual void setA0Content(uint16_t new_a0[16], uint16_t max_offset = 0, int sz = 0); private: virtual void newSelection(void); virtual void calculateFullU64MUL(GenRegister src0, GenRegister src1, GenRegister dst_h, GenRegister dst_l, GenRegister s0l_s1h, GenRegister s0h_s1l); virtual void emitStackPointer(void); }; /* This class is used to implement the kabylake specific logic for context. */ class KblContext : public Gen9Context { public: virtual ~KblContext(void) { }; KblContext(const ir::Unit &unit, const std::string &name, uint32_t deviceID, bool relaxMath = false) : Gen9Context(unit, name, deviceID, relaxMath) { }; private: virtual void newSelection(void); }; /* This class is used to implement the geminilake specific logic for context. */ class GlkContext : public BxtContext { public: virtual ~GlkContext(void) { }; GlkContext(const ir::Unit &unit, const std::string &name, uint32_t deviceID, bool relaxMath = false) : BxtContext(unit, name, deviceID, relaxMath) { }; private: virtual void newSelection(void); }; } #endif /* __GBE_GEN9_CONTEXT_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen9_encoder.cpp000664 001750 001750 00000025511 13161142102 022265 0ustar00yryr000000 000000 /* Copyright (C) Intel Corp. 2006. All Rights Reserved. Intel funded Tungsten Graphics (http://www.tungstengraphics.com) to develop this 3D driver. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.
**********************************************************************/

#include "backend/gen9_encoder.hpp"
#include "backend/gen9_instruction.hpp"

static const uint32_t untypedRWMask[] = {
  GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE|GEN_UNTYPED_GREEN|GEN_UNTYPED_RED,
  GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE|GEN_UNTYPED_GREEN,
  GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE,
  GEN_UNTYPED_ALPHA,
  0
};

namespace gbe {
  void Gen9Encoder::SAMPLE(GenRegister dest, GenRegister msg, unsigned int msg_len,
                           bool header_present, unsigned char bti, unsigned char sampler,
                           uint32_t simdWidth, uint32_t writemask,
                           uint32_t return_format, bool isLD, bool isUniform) {
    if (writemask == 0) return;
    uint32_t msg_type = isLD ? GEN_SAMPLER_MESSAGE_SIMD8_LD : GEN_SAMPLER_MESSAGE_SIMD8_SAMPLE;
    uint32_t response_length = (4 * (simdWidth / 8));
    uint32_t msg_length = (msg_len * (simdWidth / 8));
    if (header_present)
      msg_length++;
    uint32_t simd_mode = (simdWidth == 16) ? GEN_SAMPLER_SIMD_MODE_SIMD16 : GEN_SAMPLER_SIMD_MODE_SIMD8;
    if (isUniform) {
      response_length = 1;
      msg_type = GEN_SAMPLER_MESSAGE_SIMD4X2_LD;
      msg_length = 1;
      simd_mode = GEN_SAMPLER_SIMD_MODE_SIMD8;
    }
    GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND);
    this->setHeader(insn);
    this->setDst(insn, dest);
    this->setSrc0(insn, msg);
    this->setSrc1(insn, GenRegister::immud(0));
    setSamplerMessage(insn, bti, sampler, msg_type,
                      response_length, msg_length,
                      header_present, simd_mode, return_format);
  }

  void Gen9Encoder::setSendsOperands(Gen9NativeInstruction *gen9_insn,
                                     GenRegister dst, GenRegister src0, GenRegister src1) {
    assert(dst.subnr == 0 && src0.subnr == 0 && src1.subnr == 0);

    if (dst.file == GEN_ARCHITECTURE_REGISTER_FILE)
      gen9_insn->bits1.sends.dest_reg_file_0 = 0;
    else if (dst.file == GEN_GENERAL_REGISTER_FILE)
      gen9_insn->bits1.sends.dest_reg_file_0 = 1;
    else
      NOT_SUPPORTED;

    gen9_insn->bits1.sends.src1_reg_file_0 = 1;
    gen9_insn->bits1.sends.src1_reg_nr = src1.nr;
    gen9_insn->bits1.sends.dest_subreg_nr = 0;
    gen9_insn->bits1.sends.dest_reg_nr = dst.nr;
    gen9_insn->bits1.sends.dest_address_mode = 0; // direct mode
    gen9_insn->bits2.sends.src0_subreg_nr = 0;
    gen9_insn->bits2.sends.src0_reg_nr = src0.nr;
    gen9_insn->bits2.sends.src0_address_mode = 0;
  }

  unsigned Gen9Encoder::setUntypedWriteSendsMessageDesc(GenNativeInstruction *insn,
                                                        unsigned bti, unsigned elemNum) {
    uint32_t msg_length = 0;
    uint32_t response_length = 0;
    if (this->curr.execWidth == 8) {
      msg_length = 1;
    } else if (this->curr.execWidth == 16) {
      msg_length = 2;
    } else
      NOT_IMPLEMENTED;
    setDPUntypedRW(insn, bti, untypedRWMask[elemNum],
                   GEN75_P1_UNTYPED_SURFACE_WRITE, msg_length, response_length);
    return insn->bits3.ud;
  }

  void Gen9Encoder::UNTYPED_WRITE(GenRegister addr, GenRegister data, GenRegister bti,
                                  uint32_t elemNum, bool useSends) {
    if (!useSends)
      Gen8Encoder::UNTYPED_WRITE(addr, data, bti, elemNum, false);
    else {
      GBE_ASSERT(addr.reg() != data.reg());
      GenNativeInstruction *insn = this->next(GEN_OPCODE_SENDS);
      Gen9NativeInstruction *gen9_insn = &insn->gen9_insn;
      assert(elemNum >= 1 && elemNum <= 4);
      this->setHeader(insn);
      insn->header.destreg_or_condmod =
GEN_SFID_DATAPORT1_DATA; setSendsOperands(gen9_insn, GenRegister::null(), addr, data); if (this->curr.execWidth == 8) gen9_insn->bits2.sends.src1_length = elemNum; else if (this->curr.execWidth == 16) gen9_insn->bits2.sends.src1_length = 2 * elemNum; else NOT_SUPPORTED; if (bti.file == GEN_IMMEDIATE_VALUE) { gen9_insn->bits2.sends.sel_reg32_desc = 0; setUntypedWriteSendsMessageDesc(insn, bti.value.ud, elemNum); } else gen9_insn->bits2.sends.sel_reg32_desc = 1; } } void Gen9Encoder::TYPED_WRITE(GenRegister header, GenRegister data, bool header_present, unsigned char bti, bool useSends) { if (!useSends) Gen8Encoder::TYPED_WRITE(header, data, header_present, bti, false); else { GBE_ASSERT(header.reg() != data.reg()); GenNativeInstruction *insn = this->next(GEN_OPCODE_SENDS); Gen9NativeInstruction *gen9_insn = &insn->gen9_insn; assert(header_present); this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA; setSendsOperands(gen9_insn, GenRegister::null(), header, data); gen9_insn->bits2.sends.src1_length = 4; //src0_length: 5(header+u+v+w+lod), src1_length: 4(data) gen9_insn->bits2.sends.sel_reg32_desc = 0; setTypedWriteMessage(insn, bti, GEN_TYPED_WRITE, 5, header_present); } } unsigned Gen9Encoder::setByteScatterSendsMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize) { uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = 1; } else if (this->curr.execWidth == 16) { msg_length = 2; } else NOT_IMPLEMENTED; setDPByteScatterGather(insn, bti, elemSize, GEN7_BYTE_SCATTER, msg_length, response_length); return insn->bits3.ud; } void Gen9Encoder::BYTE_SCATTER(GenRegister addr, GenRegister data, GenRegister bti, uint32_t elemSize, bool useSends) { if (!useSends) Gen8Encoder::BYTE_SCATTER(addr, data, bti, elemSize, false); else { GBE_ASSERT(addr.reg() != data.reg()); GenNativeInstruction *insn = this->next(GEN_OPCODE_SENDS); Gen9NativeInstruction *gen9_insn = &insn->gen9_insn; this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA; setSendsOperands(gen9_insn, GenRegister::null(), addr, data); if (this->curr.execWidth == 8) gen9_insn->bits2.sends.src1_length = 1; else if (this->curr.execWidth == 16) gen9_insn->bits2.sends.src1_length = 2; else NOT_SUPPORTED; if (bti.file == GEN_IMMEDIATE_VALUE) { gen9_insn->bits2.sends.sel_reg32_desc = 0; setByteScatterSendsMessageDesc(insn, bti.value.ud, elemSize); } else gen9_insn->bits2.sends.sel_reg32_desc = 1; } } void Gen9Encoder::ATOMIC(GenRegister dst, uint32_t function, GenRegister addr, GenRegister data, GenRegister bti, uint32_t srcNum, bool useSends) { if (!useSends) Gen8Encoder::ATOMIC(dst, function, addr, data, bti, srcNum, false); else { GBE_ASSERT(addr.reg() != data.reg()); GenNativeInstruction *insn = this->next(GEN_OPCODE_SENDS); Gen9NativeInstruction *gen9_insn = &insn->gen9_insn; this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA; setSendsOperands(gen9_insn, dst, addr, data); if (this->curr.execWidth == 8) gen9_insn->bits2.sends.src1_length = srcNum - 1; else if (this->curr.execWidth == 16) gen9_insn->bits2.sends.src1_length = 2 * (srcNum - 1); else NOT_SUPPORTED; if (bti.file == GEN_IMMEDIATE_VALUE) { gen9_insn->bits2.sends.sel_reg32_desc = 0; setAtomicMessageDesc(insn, function, bti.value.ud, 1); } else gen9_insn->bits2.sends.sel_reg32_desc = 1; } } void Gen9Encoder::OBWRITE(GenRegister header, GenRegister data, uint32_t bti, uint32_t ow_size, bool useSends) { if (!useSends) 
Gen8Encoder::OBWRITE(header, data, bti, ow_size, false); else { GBE_ASSERT(data.reg() != header.reg()); GenNativeInstruction *insn = this->next(GEN_OPCODE_SENDS); Gen9NativeInstruction *gen9_insn = &insn->gen9_insn; this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA; setSendsOperands(gen9_insn, GenRegister::null(), header, data); uint32_t dataRegs = ow_size / 2; // half reg should also have size 1 if (dataRegs == 0) dataRegs = 1; gen9_insn->bits2.sends.src1_length = dataRegs; const uint32_t block_size = getOBlockSize(ow_size); const uint32_t msg_length = 1; const uint32_t response_length = 0; setOBlockRW(insn, bti, block_size, GEN7_OBLOCK_WRITE, msg_length, response_length); } } void Gen9Encoder::MBWRITE(GenRegister header, GenRegister data, uint32_t bti, uint32_t data_size, bool useSends) { if (!useSends) Gen8Encoder::MBWRITE(header, data, bti, data_size, false); else { GBE_ASSERT(data.reg() != header.reg()); GenNativeInstruction *insn = this->next(GEN_OPCODE_SENDS); Gen9NativeInstruction *gen9_insn = &insn->gen9_insn; this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA; setSendsOperands(gen9_insn, GenRegister::null(), header, data); gen9_insn->bits2.sends.src1_length = data_size; const uint32_t msg_length = 1; const uint32_t response_length = 0; setMBlockRW(insn, bti, GEN75_P1_MEDIA_TYPED_BWRITE, msg_length, response_length); } } } /* End of the name space. */ Beignet-1.3.2-Source/backend/src/backend/gen_insn_gen7_schedule_info.hxx000664 001750 001750 00000007303 13161142102 025357 0ustar00yryr000000 000000 // Family Latency SIMD16 SIMD8 DECL_GEN7_SCHEDULE(Label, 0, 0, 0) DECL_GEN7_SCHEDULE(Unary, 20, 4, 2) DECL_GEN7_SCHEDULE(UnaryWithTemp, 20, 40, 20) DECL_GEN7_SCHEDULE(Binary, 20, 4, 2) DECL_GEN7_SCHEDULE(SimdShuffle, 20, 4, 2) DECL_GEN7_SCHEDULE(BinaryWithTemp, 20, 40, 20) DECL_GEN7_SCHEDULE(Ternary, 20, 4, 2) DECL_GEN7_SCHEDULE(I64Shift, 20, 40, 20) DECL_GEN7_SCHEDULE(I64HADD, 20, 40, 20) DECL_GEN7_SCHEDULE(I64RHADD, 20, 40, 20) DECL_GEN7_SCHEDULE(I64ToFloat, 20, 40, 20) DECL_GEN7_SCHEDULE(FloatToI64, 20, 40, 20) DECL_GEN7_SCHEDULE(I64MULHI, 20, 40, 20) DECL_GEN7_SCHEDULE(I64MADSAT, 20, 40, 20) DECL_GEN7_SCHEDULE(Compare, 20, 4, 2) DECL_GEN7_SCHEDULE(I64Compare, 20, 80, 20) DECL_GEN7_SCHEDULE(I64DIVREM, 20, 80, 20) DECL_GEN7_SCHEDULE(Jump, 14, 1, 1) DECL_GEN7_SCHEDULE(IndirectMove, 20, 2, 2) DECL_GEN7_SCHEDULE(Eot, 20, 1, 1) DECL_GEN7_SCHEDULE(NoOp, 20, 2, 2) DECL_GEN7_SCHEDULE(Wait, 20, 2, 2) DECL_GEN7_SCHEDULE(Math, 20, 4, 2) DECL_GEN7_SCHEDULE(Barrier, 80, 1, 1) DECL_GEN7_SCHEDULE(Fence, 80, 1, 1) DECL_GEN7_SCHEDULE(Read64, 80, 1, 1) DECL_GEN7_SCHEDULE(Write64, 80, 1, 1) DECL_GEN7_SCHEDULE(Read64A64, 80, 1, 1) DECL_GEN7_SCHEDULE(Write64A64, 80, 1, 1) DECL_GEN7_SCHEDULE(UntypedRead, 160, 1, 1) DECL_GEN7_SCHEDULE(UntypedWrite, 160, 1, 1) DECL_GEN7_SCHEDULE(UntypedReadA64, 160, 1, 1) DECL_GEN7_SCHEDULE(UntypedWriteA64, 160, 1, 1) DECL_GEN7_SCHEDULE(ByteGatherA64, 160, 1, 1) DECL_GEN7_SCHEDULE(ByteScatterA64, 160, 1, 1) DECL_GEN7_SCHEDULE(ByteGather, 160, 1, 1) DECL_GEN7_SCHEDULE(ByteScatter, 160, 1, 1) DECL_GEN7_SCHEDULE(DWordGather, 160, 1, 1) DECL_GEN7_SCHEDULE(PackByte, 40, 1, 1) DECL_GEN7_SCHEDULE(UnpackByte, 40, 1, 1) DECL_GEN7_SCHEDULE(PackLong, 40, 1, 1) DECL_GEN7_SCHEDULE(UnpackLong, 40, 1, 1) DECL_GEN7_SCHEDULE(Sample, 160, 1, 1) DECL_GEN7_SCHEDULE(Vme, 320, 1, 1) DECL_GEN7_SCHEDULE(TypedWrite, 80, 1, 1) DECL_GEN7_SCHEDULE(SpillReg, 20, 1, 1) DECL_GEN7_SCHEDULE(UnSpillReg, 160, 1, 1) DECL_GEN7_SCHEDULE(Atomic, 80, 1, 
1)
DECL_GEN7_SCHEDULE(AtomicA64,      80,        1,        1)
DECL_GEN7_SCHEDULE(I64MUL,         20,        40,       20)
DECL_GEN7_SCHEDULE(I64SATADD,      20,        40,       20)
DECL_GEN7_SCHEDULE(I64SATSUB,      20,        40,       20)
DECL_GEN7_SCHEDULE(F64DIV,         20,        40,       20)
DECL_GEN7_SCHEDULE(CalcTimestamp,  80,        1,        1)
DECL_GEN7_SCHEDULE(StoreProfiling, 80,        1,        1)
DECL_GEN7_SCHEDULE(WorkGroupOp,    80,        1,        1)
DECL_GEN7_SCHEDULE(SubGroupOp,     80,        1,        1)
DECL_GEN7_SCHEDULE(Printf,         80,        1,        1)
DECL_GEN7_SCHEDULE(OBRead,         80,        1,        1)
DECL_GEN7_SCHEDULE(OBWrite,        80,        1,        1)
DECL_GEN7_SCHEDULE(MBRead,         80,        1,        1)
DECL_GEN7_SCHEDULE(MBWrite,        80,        1,        1)
Beignet-1.3.2-Source/backend/src/backend/context.cpp000664 001750 001750 00000055715 13173554000 021420 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia
 */

/**
 * \file context.cpp
 * \author Benjamin Segovia
 */
#include "backend/context.hpp"
#include "backend/program.hpp"
#include "backend/gen_encoder.hpp"
#include "ir/unit.hpp"
#include "ir/function.hpp"
#include "ir/profile.hpp"
#include "ir/liveness.hpp"
#include "ir/value.hpp"
#include "ir/image.hpp"
#include "sys/cvar.hpp"
#include <algorithm>

namespace gbe {
  class SimpleAllocator {
  public:
    SimpleAllocator(int32_t startOffset, int32_t size);
    ~SimpleAllocator(void);
    /*! Allocate some memory from the pool. */
    int32_t allocate(int32_t size, int32_t alignment, bool bFwd = false);
    /*! Free the given register file piece */
    void deallocate(int32_t offset);
    /*! Check whether a super register is in the free list. A super register
     * means a 32-byte register; Gen typically has 128 super registers. */
    bool isSuperRegisterFree(int32_t offset);
    /*! Split a block into 2 blocks */
    void splitBlock(int32_t offset, int32_t subOffset);
  protected:
    /*! Double chained list of free spaces */
    struct Block {
      Block(int32_t offset, int32_t size) :
        prev(NULL), next(NULL), offset(offset), size(size) {}
      Block *prev, *next; //!< Previous and next free blocks
      int32_t offset;     //!< Where the free block starts
      int32_t size;       //!< Size of the free block
    };
    /*! Try to coalesce two blocks (left and right). They must be in that order.
     * If the coalescing was done, the left block is deleted */
    void coalesce(Block *left, Block *right);
    void dumpFreeList();
    /*! The maximum offset */
    int32_t maxOffset;
    /*! Head and tail of the free list */
    Block *head;
    Block *tail;
    /*! Handle free list element allocation */
    DECL_POOL(Block, blockPool);
    /*! Track allocated memory blocks */
    map<int32_t, int32_t> allocatedBlocks;
    /*! Use custom allocators */
    GBE_CLASS(SimpleAllocator);
  };
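  /* [Illustrative sketch, not part of the original Beignet source] The free
   * list above hands out offsets first-fit, e.g.:
   *
   *   SimpleAllocator alloc(GEN_REG_SIZE, 4*KB - GEN_REG_SIZE);
   *   int32_t a = alloc.allocate(64, 32);        // 64 bytes, 32-byte aligned
   *   int32_t b = alloc.allocate(128, 32, true); // allocate forward
   *   alloc.deallocate(a);                       // returns the block and
   *                                              // coalesces free neighbours
   *
   * The two concrete users are defined right below.
   */
  /*! Structure that keeps track of allocation in the register file.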
This is * actually needed by Context (and not only by GenContext) because both * simulator and hardware have to deal with constant pushing which uses the * register file * * Since Gen is pretty flexible, we just reuse the Simpleallocator */ class RegisterAllocator: public SimpleAllocator { public: RegisterAllocator(int32_t offset, int32_t size): SimpleAllocator(offset, size) {} GBE_CLASS(RegisterAllocator); }; /*! * an allocator for scratch memory allocation. Scratch memory are used for register spilling. * You can query how much scratch memory needed through getMaxScatchMemUsed(). */ class ScratchAllocator: public SimpleAllocator { public: ScratchAllocator(int32_t size): SimpleAllocator(0, size) {} int32_t getMaxScatchMemUsed() { return maxOffset; } GBE_CLASS(ScratchAllocator); }; SimpleAllocator::SimpleAllocator(int32_t startOffset, int32_t size) : maxOffset(0) { tail = head = this->newBlock(startOffset, size); } SimpleAllocator::~SimpleAllocator(void) { while (this->head) { Block *next = this->head->next; this->deleteBlock(this->head); this->head = next; } } void SimpleAllocator::dumpFreeList() { Block *s = head; printf("register free list:\n"); while (s) { printf("blk: %d(r%d.%d) (%d)\n", s->offset, s->offset/GEN_REG_SIZE, s->offset % GEN_REG_SIZE, s->size); s = s->next; } printf("free list end\n"); } bool SimpleAllocator::isSuperRegisterFree(int32_t offset) { assert((offset % GEN_REG_SIZE) == 0); Block *s = head; while (s) { if (s->offset <= offset && (s->offset+s->size) >= offset+GEN_REG_SIZE) { return true; } if (s->offset > offset) return false; s = s->next; } return false; } int32_t SimpleAllocator::allocate(int32_t size, int32_t alignment, bool bFwd) { // Make it simple and just use the first block we find Block *list = bFwd ? head : tail; while (list) { int32_t aligned; int32_t spaceOnLeft; int32_t spaceOnRight; if(bFwd) { aligned = ALIGN(list->offset, alignment); spaceOnLeft = aligned - list->offset; spaceOnRight = list->size - size - spaceOnLeft; // Not enough space in this block if (spaceOnRight < 0) { list = list->next; continue; } } else { int32_t unaligned = list->offset + list->size - size - (alignment-1); if(unaligned < 0) { list = list->prev; continue; } aligned = ALIGN(unaligned, alignment); //alloc from block's tail spaceOnLeft = aligned - list->offset; spaceOnRight = list->size - size - spaceOnLeft; // Not enough space in this block if (spaceOnLeft < 0) { list = list->prev; continue; } } // Cool we can use this block Block *left = list->prev; Block *right = list->next; // If we left a hole on the left, create a new block if (spaceOnLeft) { Block *newBlock = this->newBlock(list->offset, spaceOnLeft); if (left) { left->next = newBlock; newBlock->prev = left; } if (right) { newBlock->next = right; right->prev = newBlock; } left = newBlock; } // If we left a hole on the right, create a new block as well if (spaceOnRight) { Block *newBlock = this->newBlock(aligned + size, spaceOnRight); if (left) { left->next = newBlock; newBlock->prev = left; } if (right) { right->prev = newBlock; newBlock->next = right; } right = newBlock; } // Chain both successors and predecessors when the entire block was // allocated if (spaceOnLeft == 0 && spaceOnRight == 0) { if (left) left->next = right; if (right) right->prev = left; } // Update the head of the free blocks if (list == head) { if (left) head = left; else if (right) head = right; else head = NULL; } // Update the tail of the free blocks if (list == tail) { if (right) tail = right; else if (left) tail = left; else tail = NULL; } // 
Free the block and check the consistency this->deleteBlock(list); if (head && head->next) GBE_ASSERT(head->next->prev == head); if (tail && tail->prev) GBE_ASSERT(tail->prev->next == tail); // Track the allocation to retrieve the size later allocatedBlocks.insert(std::make_pair(aligned, size)); // update max offset if(aligned + size > maxOffset) maxOffset = aligned + size; // We have a valid offset now return aligned; } return -1; } void SimpleAllocator::deallocate(int32_t offset) { // Retrieve the size in the allocation map auto it = allocatedBlocks.find(offset); GBE_ASSERT(it != allocatedBlocks.end()); const int32_t size = it->second; // Find the two blocks where to insert the new block Block *list = tail, *next = NULL; while (list != NULL) { if (list->offset < offset) break; next = list; list = list->prev; } // Create the block and insert it Block *newBlock = this->newBlock(offset, size); if (list) { GBE_ASSERT(list->offset + list->size <= offset); list->next = newBlock; newBlock->prev = list; } else this->head = newBlock; // list is NULL means newBlock should be the head. if (next) { GBE_ASSERT(offset + size <= next->offset); next->prev = newBlock; newBlock->next = next; } else this->tail = newBlock; // next is NULL means newBlock should be the tail. if (list != NULL || next != NULL) { // Coalesce the blocks if possible this->coalesce(list, newBlock); this->coalesce(newBlock, next); } // Do not track this allocation anymore allocatedBlocks.erase(it); } void SimpleAllocator::coalesce(Block *left, Block *right) { if (left == NULL || right == NULL) return; GBE_ASSERT(left->offset < right->offset); GBE_ASSERT(left->next == right); GBE_ASSERT(right->prev == left); if (left->offset + left->size == right->offset) { right->offset = left->offset; right->size += left->size; if (left->prev) left->prev->next = right; right->prev = left->prev; if (left == this->head) this->head = right; this->deleteBlock(left); } } void SimpleAllocator::splitBlock(int32_t offset, int32_t subOffset) { // Retrieve the size in the allocation map auto it = allocatedBlocks.find(offset); GBE_ASSERT(it != allocatedBlocks.end()); while(subOffset > it->second) { subOffset -= it->second; offset += it->second; it = allocatedBlocks.find(offset); GBE_ASSERT(it != allocatedBlocks.end()); } if(subOffset == 0) return; int32_t size = it->second; allocatedBlocks.erase(it); // Track the allocation to retrieve the size later allocatedBlocks.insert(std::make_pair(offset, subOffset)); allocatedBlocks.insert(std::make_pair(offset + subOffset, size - subOffset)); } /////////////////////////////////////////////////////////////////////////// // Generic Context (shared by the simulator and the HW context) /////////////////////////////////////////////////////////////////////////// Context::Context(const ir::Unit &unit, const std::string &name) : unit(unit), fn(*unit.getFunction(name)), name(name), liveness(NULL), dag(NULL), useDWLabel(false) { GBE_ASSERT(unit.getPointerSize() == ir::POINTER_32_BITS || unit.getPointerSize() == ir::POINTER_64_BITS); this->liveness = GBE_NEW(ir::Liveness, const_cast(fn), true); this->dag = GBE_NEW(ir::FunctionDAG, *this->liveness); // r0 (GEN_REG_SIZE) is always set by the HW and used at the end by EOT this->registerAllocator = NULL; //GBE_NEW(RegisterAllocator, GEN_REG_SIZE, 4*KB - GEN_REG_SIZE); this->scratchAllocator = NULL; //GBE_NEW(ScratchAllocator, 12*KB); } Context::~Context(void) { GBE_SAFE_DELETE(this->registerAllocator); GBE_SAFE_DELETE(this->scratchAllocator); GBE_SAFE_DELETE(this->dag); 
GBE_SAFE_DELETE(this->liveness); } void Context::startNewCG(uint32_t simdWidth) { this->simdWidth = simdWidth; GBE_SAFE_DELETE(this->registerAllocator); GBE_SAFE_DELETE(this->scratchAllocator); GBE_ASSERT(dag != NULL && liveness != NULL); this->registerAllocator = GBE_NEW(RegisterAllocator, GEN_REG_SIZE, 4*KB - GEN_REG_SIZE); this->scratchAllocator = GBE_NEW(ScratchAllocator, this->getScratchSize()); this->curbeRegs.clear(); this->JIPs.clear(); } Kernel *Context::compileKernel(void) { this->kernel = this->allocateKernel(); this->kernel->simdWidth = this->simdWidth; this->buildArgList(); if (fn.labelNum() > 0xffff) this->useDWLabel = true; if (usedLabels.size() == 0) this->buildUsedLabels(); if (JIPs.size() == 0) this->buildJIPs(); this->buildStack(); this->handleSLM(); if (this->emitCode() == false) { GBE_DELETE(this->kernel); this->kernel = NULL; } if(this->kernel != NULL) { this->kernel->scratchSize = this->alignScratchSize(scratchAllocator->getMaxScatchMemUsed()); this->kernel->ctx = this; this->kernel->setUseDeviceEnqueue(fn.getUseDeviceEnqueue()); } return this->kernel; } int32_t Context::allocate(int32_t size, int32_t alignment, bool bFwd) { return registerAllocator->allocate(size, alignment, bFwd); } bool Context::isSuperRegisterFree(int offset) { return registerAllocator->isSuperRegisterFree(offset); } void Context::deallocate(int32_t offset) { registerAllocator->deallocate(offset); } void Context::splitBlock(int32_t offset, int32_t subOffset) { registerAllocator->splitBlock(offset, subOffset); } // FIXME TODO as we optimize scratch memory usage using the register interval. // we need to add some dependency in post_reg_alloc scheduler, to keep scratch // memory that are reused still keep the order int32_t Context::allocateScratchMem(uint32_t size) { return scratchAllocator->allocate(size, 32, true); } void Context::deallocateScratchMem(int32_t offset) { scratchAllocator->deallocate(offset); } void Context::buildStack(void) { const auto &stackUse = dag->getUse(ir::ocl::stackptr); if (stackUse.size() == 0) { // no stack is used if stackptr is unused this->kernel->stackSize = 0; return; } uint32_t stackSize = 128; while (stackSize < fn.getStackSize()) { stackSize *= 3; //GBE_ASSERT(stackSize <= 64*KB); } this->kernel->stackSize = stackSize; } uint32_t Context::newCurbeEntry(gbe_curbe_type value, uint32_t subValue, uint32_t size, uint32_t alignment) { alignment = alignment == 0 ? 
size : alignment;
      const uint32_t offset = registerAllocator->allocate(size, alignment, 1);
      GBE_ASSERT(offset >= GEN_REG_SIZE);
      kernel->patches.push_back(PatchInfo(value, subValue, offset - GEN_REG_SIZE));
      kernel->curbeSize = std::max(kernel->curbeSize, offset + size - GEN_REG_SIZE);
      return offset;
    }

    void Context::insertCurbeReg(ir::Register reg, uint32_t offset) {
      curbeRegs.insert(std::make_pair(reg, offset));
    }

    ir::Register Context::getSurfaceBaseReg(unsigned char bti) {
      return fn.getSurfaceBaseReg(bti);
    }

    void Context::buildArgList(void) {
      kernel->argNum = fn.argNum();
      if (kernel->argNum)
        kernel->args = GBE_NEW_ARRAY_NO_ARG(KernelArgument, kernel->argNum);
      else
        kernel->args = NULL;
      for (uint32_t argID = 0; argID < kernel->argNum; ++argID) {
        const auto &arg = fn.getArg(argID);
        kernel->args[argID].align = arg.align;
        kernel->args[argID].info.addrSpace = arg.info.addrSpace;
        kernel->args[argID].info.typeName = arg.info.typeName;
        kernel->args[argID].info.accessQual = arg.info.accessQual;
        kernel->args[argID].info.typeQual = arg.info.typeQual;
        kernel->args[argID].info.argName = arg.info.argName;
        kernel->args[argID].info.typeSize = arg.info.typeSize;
        switch (arg.type) {
          case ir::FunctionArgument::VALUE:
          case ir::FunctionArgument::STRUCTURE:
            kernel->args[argID].type = GBE_ARG_VALUE;
            kernel->args[argID].size = arg.size;
            break;
          case ir::FunctionArgument::GLOBAL_POINTER:
            kernel->args[argID].type = GBE_ARG_GLOBAL_PTR;
            kernel->args[argID].size = sizeof(void*);
            kernel->args[argID].bti = arg.bti;
            break;
          case ir::FunctionArgument::CONSTANT_POINTER:
            kernel->args[argID].type = GBE_ARG_CONSTANT_PTR;
            kernel->args[argID].size = sizeof(void*);
            break;
          case ir::FunctionArgument::LOCAL_POINTER:
            kernel->args[argID].type = GBE_ARG_LOCAL_PTR;
            kernel->args[argID].size = 0;
            break;
          case ir::FunctionArgument::IMAGE:
            kernel->args[argID].type = GBE_ARG_IMAGE;
            kernel->args[argID].size = sizeof(void*);
            break;
          case ir::FunctionArgument::SAMPLER:
            kernel->args[argID].type = GBE_ARG_SAMPLER;
            kernel->args[argID].size = sizeof(void*);
            break;
          case ir::FunctionArgument::PIPE:
            kernel->args[argID].type = GBE_ARG_PIPE;
            kernel->args[argID].size = sizeof(void*);
            kernel->args[argID].bti = arg.bti;
            break;
        }
      }
    }

    void Context::buildUsedLabels(void) {
      usedLabels.clear();
      fn.foreachInstruction([this](const ir::Instruction &insn) {
        using namespace ir;
        if (insn.getOpcode() != OP_BRA) return;
        const LabelIndex index = cast<BranchInstruction>(insn).getLabelIndex();
        usedLabels.insert(index);
      });
    }

    /* Because of the structural analysis, the control flow of blocks inside a
     * structure is driven by if, else and endif, so these blocks do not need
     * JIPs. Here we treat all the blocks belonging to the same structure as a
     * whole. */
    void Context::buildJIPs(void) {
      using namespace ir;
      // Linearly store the branch target for each block and its own label
      const LabelIndex noTarget(fn.labelNum());
      vector<std::pair<LabelIndex, LabelIndex>> braTargets;
      int32_t curr = 0;
      // If some blocks are unused we mark them as such by setting their own
      // label as "invalid" (== noTarget)
      int blockCount = 0;
      // Because some blocks may belong to the same structure, the number of
      // blocks we are dealing with may be less than the number of basic
      // blocks. Here we calculate the actual number of blocks to handle.
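      /* [Illustrative note, not part of the original Beignet source] For a
       * small kernel with blocks L1..L3 and one backward branch L3 -> L1 (a
       * do { } while), the pass below would record:
       *
       *   braTargets = { (L1, noTarget), (L2, noTarget), (L3, L1) }
       *
       * The backward branch always jumps (its JIP is L1), while the labels
       * reached between L1 and L3 get a JIP pointing past the loop so that
       * inactive lanes can skip forward. Blocks inside an if/else/endif
       * structure are collapsed into a single entry keyed by the structure's
       * entry label. */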
      fn.foreachBlock([&](const BasicBlock &bb) {
        if (bb.belongToStructure && bb.isStructureExit)
          blockCount++;
        else if (!bb.belongToStructure)
          blockCount++;
      });
      braTargets.resize(blockCount);
      LabelIndex structureExitLabel;
      LabelIndex structureEntryLabel;
      bool flag;
      set<int32_t> pos;
      map<int32_t, LabelIndex> exitMap;
      map<int32_t, LabelIndex> entryMap;
      for (auto &bb : braTargets)
        bb = std::make_pair(noTarget, noTarget);
      fn.foreachBlock([&](const BasicBlock &bb) {
        LabelIndex ownLabel;
        Instruction *last;
        flag = false;
        // bb belongs to a structure and it is not the structure's exit: simply
        // insert the target of the bra into the JIPs.
        if (bb.belongToStructure && !bb.isStructureExit) {
          last = bb.getLastInstruction();
          if (last->getOpcode() == OP_BRA) {
            BranchInstruction *bra = cast<BranchInstruction>(last);
            JIPs.insert(std::make_pair(bra, bra->getLabelIndex()));
          }
          return;
        } else {
          // bb belongs to a structure and it is the structure's exit: treat
          // this bb as the structure it belongs to, using the label of the
          // structure's entry as this structure's label and the last
          // instruction of the structure's exit as its last instruction.
          if (bb.belongToStructure && bb.isStructureExit) {
            ownLabel = (bb.matchingStructureEntry)->getLabelIndex();
            last = bb.getLastInstruction();
            structureExitLabel = bb.getLabelIndex();
            structureEntryLabel = ownLabel;
            flag = true;
          }
          // bb belongs to no structure.
          else {
            ownLabel = bb.getLabelIndex();
            last = bb.getLastInstruction();
          }
          if (last->getOpcode() != OP_BRA) {
            braTargets[curr++] = std::make_pair(ownLabel, noTarget);
            if (flag) {
              pos.insert(curr - 1);
              exitMap[curr - 1] = structureExitLabel;
              entryMap[curr - 1] = structureEntryLabel;
            }
          } else {
            const BranchInstruction *bra = cast<BranchInstruction>(last);
            braTargets[curr++] = std::make_pair(ownLabel, bra->getLabelIndex());
            if (flag) {
              exitMap[curr - 1] = structureExitLabel;
              entryMap[curr - 1] = structureEntryLabel;
              pos.insert(curr - 1);
            }
          }
        }
      });
      // Backward jumps are special. We must insert the label of the next block
      // when we hit the "DO", i.e. the target label of the backward branch (as
      // in do { } while).
      // So, we store the bwd jumps per target.
      // XXX does not use a custom allocator
      std::multimap<LabelIndex, LabelIndex> bwdTargets;
      for (int32_t blockID = 0; blockID < blockCount; ++blockID) {
        const LabelIndex ownLabel = braTargets[blockID].first;
        const LabelIndex target = braTargets[blockID].second;
        if (ownLabel == noTarget || target == noTarget) continue;
        if (target <= ownLabel) { // This is a backward jump
          // The last block is just a "ret", so there must be a block after it
          GBE_ASSERT(blockID != int32_t(blockCount) - 1);
          const LabelIndex fallThrough = braTargets[blockID + 1].first;
          bwdTargets.insert(std::make_pair(target, fallThrough));
        }
      }

      // Stores the outstanding forward targets
      set<LabelIndex> fwdTargets;

      // Now retraverse the blocks and figure out all JIPs
      for (int32_t blockID = 0; blockID < blockCount; ++blockID) {
        const LabelIndex ownLabel = braTargets[blockID].first;
        if (ownLabel == noTarget) continue; // unused block
        const LabelIndex target = braTargets[blockID].second;

        // Use the structure entry/exit blocks when this "block" stands for a
        // whole structure
        const Instruction *label, *bra;
        if (pos.find(blockID) != pos.end()) {
          label = fn.getBlock(entryMap[blockID]).getFirstInstruction();
          bra = fn.getBlock(exitMap[blockID]).getLastInstruction();
        } else {
          const BasicBlock &bb = fn.getBlock(ownLabel);
          label = bb.getFirstInstruction();
          bra = bb.getLastInstruction();
        }

        // The forward targets that point to us are now expired; backward
        // branches that target us open their fall-through block as a new goal
        fwdTargets.erase(ownLabel);
        auto range = bwdTargets.equal_range(ownLabel);
        for (auto it = range.first; it != range.second; ++it)
          fwdTargets.insert(it->second);

        // If there is an outstanding forward branch, compute a JIP for the label
        auto lower = fwdTargets.lower_bound(LabelIndex(0));
        GBE_ASSERT(label->isMemberOf<LabelInstruction>() == true);
        if (lower != fwdTargets.end())
          JIPs.insert(std::make_pair(label, *lower));

        // Handle special cases and backward branches first
        if (target == noTarget) continue; // no branch at all
        GBE_ASSERT(bra->isMemberOf<BranchInstruction>() == true);
        if (target <= ownLabel) { // bwd branch: we always jump
          JIPs.insert(std::make_pair(bra, LabelIndex(target)));
          continue;
        }

        // This is a forward jump, register it and get the JIP
        fwdTargets.insert(target);
        auto jip = fwdTargets.lower_bound(LabelIndex(0));
        JIPs.insert(std::make_pair(bra, *jip));
      }
    }

    void Context::handleSLM(void) {
      const bool useSLM = fn.getUseSLM();
      kernel->useSLM = useSLM;
      kernel->slmSize = fn.getSLMSize();
    }
  } /* namespace gbe */
Beignet-1.3.2-Source/backend/src/backend/gen9_encoder.hpp000664 001750 001750 00000005251 13161142102 022271 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

/**
 * \file gen9_encoder.hpp
 */
#ifndef __GBE_GEN9_ENCODER_HPP__
#define __GBE_GEN9_ENCODER_HPP__

#include "backend/gen8_encoder.hpp"

namespace gbe {
  /* This class is used to implement the SKL specific logic for the encoder. */
  class Gen9Encoder : public Gen8Encoder {
  public:
    virtual ~Gen9Encoder(void) { }
    Gen9Encoder(uint32_t simdWidth, uint32_t gen, uint32_t deviceID)
      : Gen8Encoder(simdWidth, gen, deviceID) { }
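    /* [Illustrative note, not part of the original Beignet source] Most of the
     * overrides below exist to emit the Gen9 split-send ("sends") form: the
     * address payload goes in src0 and the data payload in src1, so the two
     * register blocks no longer have to be contiguous in the GRF. For example,
     * assuming addr and data were allocated to distinct registers:
     *
     *   // SIMD8/16 untyped write of one element through an immediate BTI
     *   p->UNTYPED_WRITE(addr, data, GenRegister::immud(bti), 1, true);
     *
     * lands in Gen9Encoder::UNTYPED_WRITE (gen9_encoder.cpp), which derives
     * src1_length from elemNum and the execution width.
     */
    /*!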
Send instruction for the sampler */ virtual void SAMPLE(GenRegister dest, GenRegister msg, unsigned int msg_len, bool header_present, unsigned char bti, unsigned char sampler, unsigned int simdWidth, uint32_t writemask, uint32_t return_format, bool isLD, bool isUniform); void setSendsOperands(Gen9NativeInstruction *gen9_insn, GenRegister dst, GenRegister src0, GenRegister src1); virtual void UNTYPED_WRITE(GenRegister addr, GenRegister data, GenRegister bti, uint32_t elemNum, bool useSends); virtual void TYPED_WRITE(GenRegister header, GenRegister data, bool header_present, unsigned char bti, bool useSends); virtual unsigned setUntypedWriteSendsMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum); virtual void BYTE_SCATTER(GenRegister addr, GenRegister data, GenRegister bti, uint32_t elemSize, bool useSends); virtual unsigned setByteScatterSendsMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize); virtual void ATOMIC(GenRegister dst, uint32_t function, GenRegister addr, GenRegister data, GenRegister bti, uint32_t srcNum, bool useSends); virtual void OBWRITE(GenRegister header, GenRegister data, uint32_t bti, uint32_t ow_size, bool useSends); virtual void MBWRITE(GenRegister header, GenRegister data, uint32_t bti, uint32_t data_size, bool useSends); }; } #endif /* __GBE_GEN9_ENCODER_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen8_encoder.hpp000664 001750 001750 00000011515 13161142102 022270 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file gen8_context.hpp */ #ifndef __GBE_GEN8_ENCODER_HPP__ #define __GBE_GEN8_ENCODER_HPP__ #include "backend/gen_encoder.hpp" namespace gbe { /* This class is used to implement the HSW specific logic for encoder. */ class Gen8Encoder : public GenEncoder { public: virtual ~Gen8Encoder(void) { } Gen8Encoder(uint32_t simdWidth, uint32_t gen, uint32_t deviceID) : GenEncoder(simdWidth, gen, deviceID) { } /*! Jump indexed instruction */ virtual void JMPI(GenRegister src, bool longjmp = false); virtual void FENCE(GenRegister dst, bool flushRWCache); /*! 
Patch JMPI/BRC/BRD (located at index insnID) with the given jump distance */ virtual void patchJMPI(uint32_t insnID, int32_t jip, int32_t uip); virtual void F16TO32(GenRegister dest, GenRegister src0); virtual void F32TO16(GenRegister dest, GenRegister src0); virtual void LOAD_INT64_IMM(GenRegister dest, GenRegister value); virtual void ATOMIC(GenRegister dst, uint32_t function, GenRegister addr, GenRegister data, GenRegister bti, uint32_t srcNum, bool useSends); virtual void ATOMICA64(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum); virtual void UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum); virtual void UNTYPED_WRITE(GenRegister src, GenRegister data, GenRegister bti, uint32_t elemNum, bool useSends); virtual void UNTYPED_READA64(GenRegister dst, GenRegister src, uint32_t elemNum); virtual void UNTYPED_WRITEA64(GenRegister src, uint32_t elemNum); virtual void BYTE_GATHERA64(GenRegister dst, GenRegister src, uint32_t elemSize); virtual void BYTE_SCATTERA64(GenRegister src, uint32_t elemSize); virtual void setHeader(GenNativeInstruction *insn); virtual void setDPUntypedRW(GenNativeInstruction *insn, uint32_t bti, uint32_t rgba, uint32_t msg_type, uint32_t msg_length, uint32_t response_length); virtual void setTypedWriteMessage(GenNativeInstruction *insn, unsigned char bti, unsigned char msg_type, uint32_t msg_length, bool header_present); virtual void FLUSH_SAMPLERCACHE(GenRegister dst); virtual void setDst(GenNativeInstruction *insn, GenRegister dest); virtual void setSrc0(GenNativeInstruction *insn, GenRegister reg); virtual void setSrc1(GenNativeInstruction *insn, GenRegister reg); virtual uint32_t getCompactVersion() { return 8; } virtual void alu3(uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1, GenRegister src2); virtual bool canHandleLong(uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1 = GenRegister::null()); virtual void handleDouble(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1 = GenRegister::null()); virtual unsigned setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum); virtual unsigned setAtomicA64MessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum, int type_long); virtual unsigned setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum); virtual unsigned setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum); void setSrc0WithAcc(GenNativeInstruction *insn, GenRegister reg, uint32_t accN); void setSrc1WithAcc(GenNativeInstruction *insn, GenRegister reg, uint32_t accN); void MATH_WITH_ACC(GenRegister dst, uint32_t function, GenRegister src0, GenRegister src1, uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc); void MADM(GenRegister dst, GenRegister src0, GenRegister src1, GenRegister src2, uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t src2Acc); /*! A64 OBlock read */ virtual void OBREADA64(GenRegister dst, GenRegister header, uint32_t bti, uint32_t elemSize); /*! 
A64 OBlock write */ virtual void OBWRITEA64(GenRegister header, uint32_t bti, uint32_t elemSize); }; } #endif /* __GBE_GEN8_ENCODER_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/program.h000664 001750 001750 00000041247 13173554000 021053 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file program.h * \author Benjamin Segovia * * C interface for the Gen kernels and programs (either real Gen ISA or Gen * simulator). This is the only thing the run-time can see from the compiler */ #ifndef __GBE_PROGRAM_H__ #define __GBE_PROGRAM_H__ #include #include #include #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ typedef struct _DebugInfo { uint32_t line; uint32_t col; } DebugInfo; /*! Opaque structure that interfaces a GBE program */ typedef struct _gbe_program *gbe_program; /*! Opaque structure that interfaces a GBE kernel (ie one OCL function) */ typedef struct _gbe_kernel *gbe_kernel; /*! Argument type for each function call */ enum gbe_arg_type { GBE_ARG_VALUE = 0, // int, float and so on GBE_ARG_GLOBAL_PTR = 1, // __global GBE_ARG_CONSTANT_PTR = 2, // __constant GBE_ARG_LOCAL_PTR = 3, // __local GBE_ARG_IMAGE = 4, // image2d_t, image3d_t GBE_ARG_SAMPLER = 5, // sampler_t GBE_ARG_PIPE = 6, // pipe GBE_ARG_INVALID = 0xffffffff }; /*! Get argument info values */ enum gbe_get_arg_info_value { GBE_GET_ARG_INFO_ADDRSPACE = 0, GBE_GET_ARG_INFO_ACCESS = 1, GBE_GET_ARG_INFO_TYPE = 2, GBE_GET_ARG_INFO_TYPEQUAL = 3, GBE_GET_ARG_INFO_NAME = 4, GBE_GET_ARG_INFO_TYPESIZE = 5, GBE_GET_ARG_INFO_INVALID = 0xffffffff }; // BTI magic number #define BTI_CONSTANT 0 #define BTI_PRIVATE 1 #define BTI_RESERVED_NUM 2 #define BTI_MAX_READ_IMAGE_ARGS 128 #define BTI_MAX_WRITE_IMAGE_ARGS 8 #define BTI_WORKAROUND_IMAGE_OFFSET 128 #define BTI_MAX_ID 253 #define BTI_LOCAL 0xfe /*! Constant buffer values (ie values to setup in the constant buffer) */ enum gbe_curbe_type { GBE_CURBE_LOCAL_ID_X = 0, GBE_CURBE_LOCAL_ID_Y, GBE_CURBE_LOCAL_ID_Z, GBE_CURBE_LOCAL_SIZE_X, GBE_CURBE_LOCAL_SIZE_Y, GBE_CURBE_LOCAL_SIZE_Z, GBE_CURBE_ENQUEUED_LOCAL_SIZE_X, GBE_CURBE_ENQUEUED_LOCAL_SIZE_Y, GBE_CURBE_ENQUEUED_LOCAL_SIZE_Z, GBE_CURBE_GLOBAL_SIZE_X, GBE_CURBE_GLOBAL_SIZE_Y, GBE_CURBE_GLOBAL_SIZE_Z, GBE_CURBE_GLOBAL_OFFSET_X, GBE_CURBE_GLOBAL_OFFSET_Y, GBE_CURBE_GLOBAL_OFFSET_Z, GBE_CURBE_GROUP_NUM_X, GBE_CURBE_GROUP_NUM_Y, GBE_CURBE_GROUP_NUM_Z, GBE_CURBE_WORK_DIM, GBE_CURBE_IMAGE_INFO, GBE_CURBE_KERNEL_ARGUMENT, GBE_CURBE_EXTRA_ARGUMENT, GBE_CURBE_BLOCK_IP, GBE_CURBE_DW_BLOCK_IP, GBE_CURBE_THREAD_NUM, GBE_CURBE_PROFILING_BUF_POINTER, GBE_CURBE_PROFILING_TIMESTAMP0, GBE_CURBE_PROFILING_TIMESTAMP1, GBE_CURBE_PROFILING_TIMESTAMP2, GBE_CURBE_PROFILING_TIMESTAMP3, GBE_CURBE_PROFILING_TIMESTAMP4, GBE_CURBE_THREAD_ID, GBE_CURBE_CONSTANT_ADDRSPACE, GBE_CURBE_STACK_SIZE, GBE_CURBE_ENQUEUE_BUF_POINTER, GBE_GEN_REG, }; /*! 
Extra arguments use the negative range of sub-values */ enum gbe_extra_argument { GBE_STACK_BUFFER = 0, /* Give stack location in curbe */ }; typedef struct ImageInfo { int32_t arg_idx; int32_t idx; int32_t wSlot; int32_t hSlot; int32_t depthSlot; int32_t dataTypeSlot; int32_t channelOrderSlot; int32_t dimOrderSlot; } ImageInfo; typedef void (gbe_set_image_base_index_cb)(uint32_t base_idx); extern gbe_set_image_base_index_cb *gbe_set_image_base_index; typedef uint32_t (gbe_get_image_base_index_cb)(); extern gbe_get_image_base_index_cb *gbe_get_image_base_index; /*! Get the size of defined images */ typedef size_t (gbe_kernel_get_image_size_cb)(gbe_kernel gbeKernel); extern gbe_kernel_get_image_size_cb *gbe_kernel_get_image_size; /*! Get the content of defined images */ typedef void (gbe_kernel_get_image_data_cb)(gbe_kernel gbeKernel, ImageInfo *images); extern gbe_kernel_get_image_data_cb *gbe_kernel_get_image_data; /*! Get whether we are in the code profiling mode */ typedef void (gbe_output_profiling_cb)(void* profiling_info, void* buf); extern gbe_output_profiling_cb *gbe_output_profiling; /*! Get the profiling bti */ typedef uint32_t (gbe_get_profiling_bti_cb)(gbe_kernel gbeKernel); extern gbe_get_profiling_bti_cb *gbe_get_profiling_bti; typedef void* (gbe_dup_profiling_cb)(gbe_kernel gbeKernel); extern gbe_dup_profiling_cb *gbe_dup_profiling; /*! Get the printf number */ typedef uint32_t (gbe_get_printf_num_cb)(void* printf_info); extern gbe_get_printf_num_cb *gbe_get_printf_num; /*! Get the printf buffer bti */ typedef uint8_t (gbe_get_printf_buf_bti_cb)(void* printf_info); extern gbe_get_printf_buf_bti_cb *gbe_get_printf_buf_bti; /*! Release the printfset */ typedef void (gbe_release_printf_info_cb)(void* printf_info); extern gbe_release_printf_info_cb *gbe_release_printf_info; /*! Dup the printf set */ typedef void* (gbe_dup_printfset_cb)(gbe_kernel gbeKernel); extern gbe_dup_printfset_cb *gbe_dup_printfset; typedef void (gbe_output_printf_cb) (void* printf_info, void* buf_addr); extern gbe_output_printf_cb* gbe_output_printf; /*! Create a new program from the llvm file (zero terminated string) */ typedef gbe_program (gbe_program_new_from_llvm_file_cb)(uint32_t deviceID, const char *fileName, size_t stringSize, char *err, size_t *err_size); extern gbe_program_new_from_llvm_file_cb *gbe_program_new_from_llvm_file; /*! Create a new program from the given source code (zero terminated string) */ typedef gbe_program (gbe_program_new_from_source_cb)(uint32_t deviceID, const char *source, size_t stringSize, const char *options, char *err, size_t *err_size); extern gbe_program_new_from_source_cb *gbe_program_new_from_source; /*! Create a new program from the given source code and compile it (zero terminated string) */ typedef gbe_program (gbe_program_compile_from_source_cb)(uint32_t deviceID, const char *source, const char *temp_header_path, size_t stringSize, const char *options, char *err, size_t *err_size); extern gbe_program_compile_from_source_cb *gbe_program_compile_from_source; /*! link the programs. */ typedef bool (gbe_program_link_program_cb)(gbe_program dst_program, gbe_program src_program, size_t stringSize, char * err, size_t * errSize); extern gbe_program_link_program_cb *gbe_program_link_program; /*! check link option. */ typedef bool (gbe_program_check_opt_cb)(const char *option); extern gbe_program_check_opt_cb *gbe_program_check_opt; /*! create s new genprogram for link. 
*/ typedef gbe_program (gbe_program_new_gen_program_cb)(uint32_t deviceID, const void *module, const void *act, const char *asm_file_name); extern gbe_program_new_gen_program_cb *gbe_program_new_gen_program; /*! Create a new program from the given blob */ typedef gbe_program (gbe_program_new_from_binary_cb)(uint32_t deviceID, const char *binary, size_t size); extern gbe_program_new_from_binary_cb *gbe_program_new_from_binary; /*! Create a new program from the llvm bitcode*/ typedef gbe_program (gbe_program_new_from_llvm_binary_cb)(uint32_t deviceID, const char *binary, size_t size); extern gbe_program_new_from_llvm_binary_cb *gbe_program_new_from_llvm_binary; /*! Serialize a program to a bin, 0 means executable, 1 means llvm bitcode*/ typedef size_t (gbe_program_serialize_to_binary_cb)(gbe_program program, char **binary, int binary_type); extern gbe_program_serialize_to_binary_cb *gbe_program_serialize_to_binary; /*! Create a new program from the given LLVM file */ typedef gbe_program (gbe_program_new_from_llvm_cb)(uint32_t deviceID, const void *module, const void *llvm_ctx, const char *asm_file_name, size_t string_size, char *err, size_t *err_size, int optLevel, const char* options); extern gbe_program_new_from_llvm_cb *gbe_program_new_from_llvm; /*! link the programs from llvm level. */ typedef bool (gbe_program_link_from_llvm_cb)(gbe_program dst_program, gbe_program src_program, size_t stringSize, char * err, size_t * errSize); extern gbe_program_link_from_llvm_cb *gbe_program_link_from_llvm; /* build the program to gen binary */ typedef void gbe_program_build_from_llvm_cb(gbe_program program, size_t stringSize, char *err, size_t *errSize, const char * options); extern gbe_program_build_from_llvm_cb *gbe_program_build_from_llvm; /*! Get the size of global constants */ typedef size_t (gbe_program_get_global_constant_size_cb)(gbe_program gbeProgram); extern gbe_program_get_global_constant_size_cb *gbe_program_get_global_constant_size; /*! Get the content of global constants */ typedef void (gbe_program_get_global_constant_data_cb)(gbe_program gbeProgram, char *mem); extern gbe_program_get_global_constant_data_cb *gbe_program_get_global_constant_data; typedef size_t (gbe_program_get_global_reloc_count_cb)(gbe_program gbeProgram); extern gbe_program_get_global_reloc_count_cb *gbe_program_get_global_reloc_count; typedef void (gbe_program_get_global_reloc_table_cb)(gbe_program gbeProgram, char *mem); extern gbe_program_get_global_reloc_table_cb *gbe_program_get_global_reloc_table; /*! Get the size of defined samplers */ typedef size_t (gbe_kernel_get_sampler_size_cb)(gbe_kernel gbeKernel); extern gbe_kernel_get_sampler_size_cb *gbe_kernel_get_sampler_size; /*! Get the content of defined samplers */ typedef void (gbe_kernel_get_sampler_data_cb)(gbe_kernel gbeKernel, uint32_t *samplers); extern gbe_kernel_get_sampler_data_cb *gbe_kernel_get_sampler_data; /*! Get the content of defined samplers */ typedef void (gbe_kernel_get_compile_wg_size_cb)(gbe_kernel gbeKernel, size_t wg_sz[3]); extern gbe_kernel_get_compile_wg_size_cb *gbe_kernel_get_compile_wg_size; /*! Clean LLVM resource of the given program */ typedef void (gbe_program_clean_llvm_resource_cb)(gbe_program); extern gbe_program_clean_llvm_resource_cb *gbe_program_clean_llvm_resource; /*! Destroy and deallocate the given program */ typedef void (gbe_program_delete_cb)(gbe_program); extern gbe_program_delete_cb *gbe_program_delete; /*! 
Get the number of functions in the program */ typedef uint32_t (gbe_program_get_kernel_num_cb)(gbe_program); extern gbe_program_get_kernel_num_cb *gbe_program_get_kernel_num; /*! Get the kernel from its name */ typedef gbe_kernel (gbe_program_get_kernel_by_name_cb)(gbe_program, const char *name); extern gbe_program_get_kernel_by_name_cb *gbe_program_get_kernel_by_name; /*! Get the kernel from its ID */ typedef gbe_kernel (gbe_program_get_kernel_cb)(gbe_program, uint32_t ID); extern gbe_program_get_kernel_cb *gbe_program_get_kernel; typedef const char* (gbe_program_get_device_enqueue_kernel_name_cb)(gbe_program, uint32_t ID); extern gbe_program_get_device_enqueue_kernel_name_cb *gbe_program_get_device_enqueue_kernel_name; /*! Get the kernel name */ typedef const char *(gbe_kernel_get_name_cb)(gbe_kernel); extern gbe_kernel_get_name_cb *gbe_kernel_get_name; /*! Get the kernel attributes*/ typedef const char *(gbe_kernel_get_attributes_cb)(gbe_kernel); extern gbe_kernel_get_attributes_cb *gbe_kernel_get_attributes; /*! Get the kernel source code */ typedef const char *(gbe_kernel_get_code_cb)(gbe_kernel); extern gbe_kernel_get_code_cb *gbe_kernel_get_code; /*! Get the size of the source code */ typedef size_t (gbe_kernel_get_code_size_cb)(gbe_kernel); extern gbe_kernel_get_code_size_cb *gbe_kernel_get_code_size; /*! Get the total number of arguments */ typedef uint32_t (gbe_kernel_get_arg_num_cb)(gbe_kernel); extern gbe_kernel_get_arg_num_cb *gbe_kernel_get_arg_num; /*! Get the argument info */ typedef void* (gbe_kernel_get_arg_info_cb)(gbe_kernel, uint32_t argID, uint32_t value); extern gbe_kernel_get_arg_info_cb *gbe_kernel_get_arg_info; /*! Get the size of the given argument */ typedef uint32_t (gbe_kernel_get_arg_size_cb)(gbe_kernel, uint32_t argID); extern gbe_kernel_get_arg_size_cb *gbe_kernel_get_arg_size; /*! Get the the bti of a __global buffer */ typedef uint8_t (gbe_kernel_get_arg_bti_cb)(gbe_kernel, uint32_t argID); extern gbe_kernel_get_arg_bti_cb *gbe_kernel_get_arg_bti; /*! Get the type of the given argument */ typedef enum gbe_arg_type (gbe_kernel_get_arg_type_cb)(gbe_kernel, uint32_t argID); extern gbe_kernel_get_arg_type_cb *gbe_kernel_get_arg_type; /*! Get the align of the given argument */ typedef uint32_t (gbe_kernel_get_arg_align_cb)(gbe_kernel, uint32_t argID); extern gbe_kernel_get_arg_align_cb *gbe_kernel_get_arg_align; /*! Get the simd width for the kernel */ typedef uint32_t (gbe_kernel_get_simd_width_cb)(gbe_kernel); extern gbe_kernel_get_simd_width_cb *gbe_kernel_get_simd_width; /*! Get the curbe size required by the kernel */ typedef int32_t (gbe_kernel_get_curbe_size_cb)(gbe_kernel); extern gbe_kernel_get_curbe_size_cb *gbe_kernel_get_curbe_size; /*! Get the stack size (zero if no stack is required) */ typedef int32_t (gbe_kernel_get_stack_size_cb)(gbe_kernel); extern gbe_kernel_get_stack_size_cb *gbe_kernel_get_stack_size; /*! Get the scratch size (zero if no scratch is required) */ typedef int32_t (gbe_kernel_get_scratch_size_cb)(gbe_kernel); extern gbe_kernel_get_scratch_size_cb *gbe_kernel_get_scratch_size; /*! Get the curbe offset where to put the data. Returns -1 if not required */ typedef int32_t (gbe_kernel_get_curbe_offset_cb)(gbe_kernel, enum gbe_curbe_type type, uint32_t sub_type); extern gbe_kernel_get_curbe_offset_cb *gbe_kernel_get_curbe_offset; /*! Indicates if a work group size is required. 
Return the required width or 0 * if none */ typedef uint32_t (gbe_kernel_get_required_work_group_size_cb)(gbe_kernel, uint32_t dim); extern gbe_kernel_get_required_work_group_size_cb *gbe_kernel_get_required_work_group_size; /*! Says if SLM is used. Required to reconfigure the L3 complex */ typedef int32_t (gbe_kernel_use_slm_cb)(gbe_kernel); extern gbe_kernel_use_slm_cb *gbe_kernel_use_slm; /*! Get slm size needed for kernel local variables */ typedef int32_t (gbe_kernel_get_slm_size_cb)(gbe_kernel); extern gbe_kernel_get_slm_size_cb *gbe_kernel_get_slm_size; /*! Get the kernel's OpenCL version. */ typedef uint32_t (gbe_kernel_get_ocl_version_cb)(gbe_kernel); extern gbe_kernel_get_ocl_version_cb *gbe_kernel_get_ocl_version; /* Whether the kernel uses device enqueue or not. */ typedef uint32_t (gbe_kernel_use_device_enqueue_cb)(gbe_kernel); extern gbe_kernel_use_device_enqueue_cb *gbe_kernel_use_device_enqueue; /* Mutex to lock global LLVMContext access. */ extern void acquireLLVMContextLock(); extern void releaseLLVMContextLock(); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* __GBE_PROGRAM_H__ */ Beignet-1.3.2-Source/backend/src/backend/gen_reg_allocation.hpp000664 001750 001750 00000004555 13161142102 023551 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ /** * \file gen_reg_allocation.hpp * \author Benjamin Segovia */ #ifndef __GBE_GEN_REG_ALLOCATION_HPP__ #define __GBE_GEN_REG_ALLOCATION_HPP__ #include "ir/register.hpp" #include "backend/gen_register.hpp" namespace gbe { class Selection; // Pre-register allocation code generation class GenRegister; // Pre-register allocation Gen register struct GenRegInterval; // Liveness interval for each register class GenContext; // Gen specific context typedef struct SpillRegTag { bool isTmpReg; int32_t addr; } SpillRegTag; typedef map<ir::Register, SpillRegTag> SpilledRegs; /*! Register allocate (i.e. virtual to physical register mapping) */ class GenRegAllocator { public: /*! Initialize the register allocator */ GenRegAllocator(GenContext &ctx); /*! Release all taken resources */ ~GenRegAllocator(void); /*! Perform the register allocation */ bool allocate(Selection &selection); /*! Virtual to physical translation */ GenRegister genReg(const GenRegister &reg); /*! Check whether a register is allocated. */ bool isAllocated(const ir::Register &reg); /*! Output the register allocation */ void outputAllocation(void); /*! Get the register's actual size in bytes. */ uint32_t getRegSize(ir::Register reg); private: /*! Actual implementation of the register allocator (use Pimpl) */ class Opaque; /*! Created and destroyed in cpp */ Opaque *opaque; /*!
Use custom allocator */ GBE_CLASS(GenRegAllocator); }; } /* namespace gbe */ #endif /* __GBE_GEN_REG_ALLOCATION_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen8_context.cpp000664 001750 001750 00000227075 13161142102 022332 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * */ /** * \file gen8_context.cpp */ #include "backend/gen8_context.hpp" #include "backend/gen8_encoder.hpp" #include "backend/gen_program.hpp" #include "backend/gen_defs.hpp" #include "backend/gen_encoder.hpp" #include "backend/gen_insn_selection.hpp" #include "backend/gen_insn_scheduling.hpp" #include "backend/gen_reg_allocation.hpp" #include "sys/cvar.hpp" #include "ir/function.hpp" #include "ir/value.hpp" #include <cstring> namespace gbe { void Gen8Context::emitSLMOffset(void) { return; } uint32_t Gen8Context::alignScratchSize(uint32_t size) { if(size == 0) return 0; uint32_t i = 1024; while(i < size) i *= 2; return i; } void Gen8Context::newSelection(void) { this->sel = GBE_NEW(Selection8, *this); } bool Gen8Context::patchBranches(void) { using namespace ir; for (auto pair : branchPos2) { const LabelIndex label = pair.first; const int32_t insnID = pair.second; const int32_t targetID = labelPos.find(label)->second; p->patchJMPI(insnID, (targetID - insnID), 0); } for (auto pair : branchPos3) { const LabelPair labelPair = pair.first; const int32_t insnID = pair.second; const int32_t jip = labelPos.find(labelPair.l0)->second; const int32_t uip = labelPos.find(labelPair.l1)->second; p->patchJMPI(insnID, jip - insnID, uip - insnID); } return true; } void Gen8Context::emitUnaryInstruction(const SelectionInstruction &insn) { switch (insn.opcode) { case SEL_OP_CONVI64_TO_I: /* Should never get here; just use the common OPCODE. */ GBE_ASSERT(0); break; default: GenContext::emitUnaryInstruction(insn); } } void Gen8Context::emitUnaryWithTempInstruction(const SelectionInstruction &insn) { GenRegister dst = ra->genReg(insn.dst(0)); GenRegister src = ra->genReg(insn.src(0)); GenRegister tmp = ra->genReg(insn.dst(1)); switch (insn.opcode) { case SEL_OP_CONVI_TO_I64: /* Should never get here; just use the common OPCODE.
*/ GBE_ASSERT(0); break; case SEL_OP_BSWAP: { uint32_t simd = p->curr.execWidth; GBE_ASSERT(simd == 8 || simd == 16 || simd == 1); uint16_t new_a0[16]; memset(new_a0, 0, sizeof(new_a0)); GBE_ASSERT(src.type == dst.type); uint32_t start_addr = src.nr*32 + src.subnr; if (simd == 1) { GBE_ASSERT(src.hstride == GEN_HORIZONTAL_STRIDE_0 && dst.hstride == GEN_HORIZONTAL_STRIDE_0); if (src.type == GEN_TYPE_UD || src.type == GEN_TYPE_D) { GBE_ASSERT(start_addr >= 0); new_a0[0] = start_addr + 3; new_a0[1] = start_addr + 2; new_a0[2] = start_addr + 1; new_a0[3] = start_addr; this->setA0Content(new_a0, 0, 4); p->push(); p->curr.execWidth = 4; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister ind_src = GenRegister::to_indirect1xN(GenRegister::retype(src, GEN_TYPE_UB), new_a0[0], 0); GenRegister dst_ = dst; dst_.type = GEN_TYPE_UB; dst_.hstride = GEN_HORIZONTAL_STRIDE_1; dst_.width = GEN_WIDTH_4; dst_.vstride = GEN_VERTICAL_STRIDE_4; p->MOV(dst_, ind_src); p->pop(); } else if (src.type == GEN_TYPE_UW || src.type == GEN_TYPE_W) { p->MOV(GenRegister::retype(dst, GEN_TYPE_UB), GenRegister::retype(GenRegister::offset(src, 0, 1), GEN_TYPE_UB)); p->MOV(GenRegister::retype(GenRegister::offset(dst, 0, 1), GEN_TYPE_UB), GenRegister::retype(src, GEN_TYPE_UB)); } else { GBE_ASSERT(0); } } else { if (src.type == GEN_TYPE_UD || src.type == GEN_TYPE_D) { bool uniform_src = (src.hstride == GEN_HORIZONTAL_STRIDE_0); GBE_ASSERT(uniform_src || src.subnr == 0); GBE_ASSERT(dst.subnr == 0); GBE_ASSERT(tmp.subnr == 0); GBE_ASSERT(start_addr >= 0); new_a0[0] = start_addr + 3; new_a0[1] = start_addr + 2; new_a0[2] = start_addr + 1; new_a0[3] = start_addr; if (!uniform_src) { new_a0[4] = start_addr + 7; new_a0[5] = start_addr + 6; new_a0[6] = start_addr + 5; new_a0[7] = start_addr + 4; new_a0[8] = start_addr + 11; new_a0[9] = start_addr + 10; new_a0[10] = start_addr + 9; new_a0[11] = start_addr + 8; new_a0[12] = start_addr + 15; new_a0[13] = start_addr + 14; new_a0[14] = start_addr + 13; new_a0[15] = start_addr + 12; } else { new_a0[4] = start_addr + 3; new_a0[5] = start_addr + 2; new_a0[6] = start_addr + 1; new_a0[7] = start_addr; new_a0[8] = start_addr + 3; new_a0[9] = start_addr + 2; new_a0[10] = start_addr + 1; new_a0[11] = start_addr; new_a0[12] = start_addr + 3; new_a0[13] = start_addr + 2; new_a0[14] = start_addr + 1; new_a0[15] = start_addr; } this->setA0Content(new_a0, 48); p->push(); p->curr.execWidth = 16; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister ind_src = GenRegister::to_indirect1xN(GenRegister::retype(src, GEN_TYPE_UB), new_a0[0], 0); p->MOV(GenRegister::retype(tmp, GEN_TYPE_UB), ind_src); if(!uniform_src) ind_src.addr_imm += 16; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 0, 16), ind_src); if (simd == 16) { for (int i = 0; i < 2; i++) { if(!uniform_src) ind_src.addr_imm += 16; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 1, 16*i), ind_src); } } p->pop(); p->MOV(dst, tmp); } else if (src.type == GEN_TYPE_UW || src.type == GEN_TYPE_W) { bool uniform_src = (src.hstride == GEN_HORIZONTAL_STRIDE_0); GBE_ASSERT(uniform_src || src.subnr == 0 || src.subnr == 16); GBE_ASSERT(dst.subnr == 0 || dst.subnr == 16); GBE_ASSERT(tmp.subnr == 0 || tmp.subnr == 16); GBE_ASSERT(start_addr >= 0); new_a0[0] = start_addr + 1; new_a0[1] = start_addr; if (!uniform_src) { new_a0[2] = start_addr + 3; new_a0[3] = start_addr + 2; new_a0[4] = start_addr + 5; new_a0[5] = start_addr + 4; new_a0[6] = start_addr + 7; new_a0[7] = start_addr + 6; 
new_a0[8] = start_addr + 9; new_a0[9] = start_addr + 8; new_a0[10] = start_addr + 11; new_a0[11] = start_addr + 10; new_a0[12] = start_addr + 13; new_a0[13] = start_addr + 12; new_a0[14] = start_addr + 15; new_a0[15] = start_addr + 14; } else { new_a0[2] = start_addr + 1; new_a0[3] = start_addr; new_a0[4] = start_addr + 1; new_a0[5] = start_addr; new_a0[6] = start_addr + 1; new_a0[7] = start_addr; new_a0[8] = start_addr + 1; new_a0[9] = start_addr; new_a0[10] = start_addr + 1; new_a0[11] = start_addr; new_a0[12] = start_addr + 1; new_a0[13] = start_addr; new_a0[14] = start_addr + 1; new_a0[15] = start_addr; } this->setA0Content(new_a0, 48); p->push(); p->curr.execWidth = 16; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister ind_src = GenRegister::to_indirect1xN(GenRegister::retype(src, GEN_TYPE_UB), new_a0[0], 0); p->MOV(GenRegister::retype(tmp, GEN_TYPE_UB), ind_src); if (simd == 16) { if(!uniform_src) ind_src.addr_imm += 16; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 0, 16), ind_src); } p->pop(); p->MOV(dst, tmp); }else if (src.type == GEN_TYPE_UL || src.type == GEN_TYPE_L) { bool uniform_src = (src.hstride == GEN_HORIZONTAL_STRIDE_0); GBE_ASSERT(uniform_src || src.subnr == 0); GBE_ASSERT(dst.subnr == 0); GBE_ASSERT(tmp.subnr == 0); GBE_ASSERT(start_addr >= 0); new_a0[0] = start_addr + 7; new_a0[1] = start_addr + 6; new_a0[2] = start_addr + 5; new_a0[3] = start_addr + 4; new_a0[4] = start_addr + 3; new_a0[5] = start_addr + 2; new_a0[6] = start_addr + 1; new_a0[7] = start_addr; if(!uniform_src) { new_a0[8] = start_addr + 15; new_a0[9] = start_addr + 14; new_a0[10] = start_addr + 13; new_a0[11] = start_addr + 12; new_a0[12] = start_addr + 11; new_a0[13] = start_addr + 10; new_a0[14] = start_addr + 9; new_a0[15] = start_addr + 8; } else { new_a0[8] = start_addr + 7; new_a0[9] = start_addr + 6; new_a0[10] = start_addr + 5; new_a0[11] = start_addr + 4; new_a0[12] = start_addr + 3; new_a0[13] = start_addr + 2; new_a0[14] = start_addr + 1; new_a0[15] = start_addr; } this->setA0Content(new_a0, 56); p->push(); p->curr.execWidth = 16; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; GenRegister ind_src = GenRegister::to_indirect1xN(GenRegister::retype(src, GEN_TYPE_UB), new_a0[0], 0); p->MOV(GenRegister::retype(tmp, GEN_TYPE_UB), ind_src); if(!uniform_src) ind_src.addr_imm += 16; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 0, 16), ind_src); for (int i = 0; i < 2; i++) { if(!uniform_src) ind_src.addr_imm += 16; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 1, 16*i), ind_src); } if (simd == 16) { for (int i = 0; i < 2; i++) { if(!uniform_src) ind_src.addr_imm += 16; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 2, 16*i), ind_src); } for (int i = 0; i < 2; i++) { if(!uniform_src) ind_src.addr_imm += 16; p->MOV(GenRegister::offset(GenRegister::retype(tmp, GEN_TYPE_UB), 3, 16*i), ind_src); } } p->pop(); p->MOV(dst, tmp); } else { GBE_ASSERT(0); } } } break; default: GenContext::emitUnaryWithTempInstruction(insn); } } void Gen8Context::emitSimdShuffleInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src0 = ra->genReg(insn.src(0)); const GenRegister src1 = ra->genReg(insn.src(1)); assert(insn.opcode == SEL_OP_SIMD_SHUFFLE); assert (src1.file != GEN_IMMEDIATE_VALUE); uint32_t base = src0.nr * 32 + src0.subnr; GenRegister baseReg = GenRegister::immuw(base); const GenRegister a0 = GenRegister::addr8(0); p->ADD(a0, 
GenRegister::unpacked_uw(src1.nr, src1.subnr / typeSize(GEN_TYPE_UW)), baseReg); GenRegister indirect = GenRegister::to_indirect1xN(src0, 0, 0); p->MOV(dst, indirect); } void Gen8Context::emitBinaryInstruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src0 = ra->genReg(insn.src(0)); const GenRegister src1 = ra->genReg(insn.src(1)); switch (insn.opcode) { case SEL_OP_SEL_INT64: case SEL_OP_I64AND: case SEL_OP_I64OR: case SEL_OP_I64XOR: /* Should never get here; just use the common OPCODE. */ GBE_ASSERT(0); break; case SEL_OP_UPSAMPLE_LONG: { p->MOV(dst, src0); p->SHL(dst, dst, GenRegister::immud(32)); p->ADD(dst, dst, src1); break; } default: GenContext::emitBinaryInstruction(insn); } } void Gen8Context::emitBinaryWithTempInstruction(const SelectionInstruction &insn) { switch (insn.opcode) { case SEL_OP_I64ADD: case SEL_OP_I64SUB: /* Should never get here; just use the common OPCODE. */ GBE_ASSERT(0); break; default: GenContext::emitBinaryWithTempInstruction(insn); } } void Gen8Context::emitI64ShiftInstruction(const SelectionInstruction &insn) { switch (insn.opcode) { case SEL_OP_I64SHL: case SEL_OP_I64SHR: case SEL_OP_I64ASR: /* Should never get here; just use the common OPCODE. */ GBE_ASSERT(0); break; default: GenContext::emitI64ShiftInstruction(insn); } } void Gen8Context::emitI64CompareInstruction(const SelectionInstruction &insn) { /* Should never get here; just use the common OPCODE. */ GBE_ASSERT(0); } void Gen8Context::emitI64SATADDInstruction(const SelectionInstruction &insn) { /* Should never get here; just use the common OPCODE. */ GBE_ASSERT(0); } void Gen8Context::emitI64SATSUBInstruction(const SelectionInstruction &insn) { /* Should never get here; just use the common OPCODE. */ GBE_ASSERT(0); } void Gen8Context::emitI64ToFloatInstruction(const SelectionInstruction &insn) { /* Should never get here; just use the common OPCODE. */ GBE_ASSERT(0); } void Gen8Context::emitFloatToI64Instruction(const SelectionInstruction &insn) { /* Should never get here; just use the common OPCODE. */ GBE_ASSERT(0); } GenRegister Gen8Context::unpacked_ud(GenRegister reg, uint32_t offset) { if(reg.hstride == GEN_HORIZONTAL_STRIDE_0) { if(offset == 0) return GenRegister::retype(reg, GEN_TYPE_UD); else return GenRegister::retype(GenRegister::offset(reg, 0, typeSize(GEN_TYPE_UD)*offset), GEN_TYPE_UD); } else return GenRegister::unpacked_ud(reg.nr, reg.subnr + offset); } void Gen8Context::calculateFullU64MUL(GenRegister src0, GenRegister src1, GenRegister dst_h, GenRegister dst_l, GenRegister s0l_s1h, GenRegister s0h_s1l) { src0.type = src1.type = GEN_TYPE_UD; dst_h.type = dst_l.type = GEN_TYPE_UL; s0l_s1h.type = s0h_s1l.type = GEN_TYPE_UL; GenRegister s0l = unpacked_ud(src0); GenRegister s1l = unpacked_ud(src1); GenRegister s0h = GenRegister::offset(s0l, 0, 4); GenRegister s1h = GenRegister::offset(s1l, 0, 4); /* Low 32 bits X low 32 bits. */ p->MUL(dst_l, s0l, s1l); /* High 32 bits X High 32 bits. */ p->MUL(dst_h, s0h, s1h); /* Low 32 bits X high 32 bits. */ p->MUL(s0l_s1h, s0l, s1h); /* High 32 bits X low 32 bits. */ p->MUL(s0h_s1l, s0h, s1l); /* Because the max product of s0l*s1h is (2^N - 1) * (2^N - 1) = 2^2N - 2^(N+1) + 1, with N = 32. The max of adding two 32-bit integers to it is 2^2N - 2^(N+1) + 1 + 2*(2^N - 1) = 2^2N - 1, which means that adding dst_l's high 32 bits and then s0l_s1h's low 32 bits to the product s0h_s1l will not overflow and produces no carry.
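For reference, the same decomposition in a scalar C++ sketch (illustrative only; it is not part of the emitted code, the helper name mul64x64 is ours, and the local names mirror the GRF registers used here):

  static inline void mul64x64(uint64_t a, uint64_t b, uint64_t &hi, uint64_t &lo) {
    uint64_t al = (uint32_t)a, ah = a >> 32;   // s0l, s0h
    uint64_t bl = (uint32_t)b, bh = b >> 32;   // s1l, s1h
    lo = al * bl;                              // dst_l: low x low
    hi = ah * bh;                              // dst_h: high x high
    uint64_t t1 = al * bh;                     // s0l_s1h: low x high
    uint64_t t2 = ah * bl;                     // s0h_s1l: high x low
    t2 += (lo >> 32) + (uint32_t)t1;           // bounded by 2^64 - 1, no carry
    hi += (t1 >> 32) + (t2 >> 32);
    lo = (uint32_t)lo | (t2 << 32);
  }
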
In this manner, we can avoid using the acc register, which has a lot of restrictions. */ GenRegister dst_l_h = unpacked_ud(dst_l, 1); p->ADD(s0h_s1l, s0h_s1l, dst_l_h); GenRegister s0l_s1h_l = unpacked_ud(s0l_s1h); p->ADD(s0h_s1l, s0h_s1l, s0l_s1h_l); GenRegister s0l_s1h_h = unpacked_ud(s0l_s1h, 1); p->ADD(dst_h, dst_h, s0l_s1h_h); // No longer need s0l_s1h GenRegister tmp = s0l_s1h; p->SHL(tmp, s0h_s1l, GenRegister::immud(32)); GenRegister tmp_unpacked = unpacked_ud(tmp, 1); p->MOV(dst_l_h, tmp_unpacked); p->SHR(tmp, s0h_s1l, GenRegister::immud(32)); p->ADD(dst_h, dst_h, tmp); } void Gen8Context::calculateFullS64MUL(GenRegister src0, GenRegister src1, GenRegister dst_h, GenRegister dst_l, GenRegister s0_abs, GenRegister s1_abs, GenRegister tmp0, GenRegister tmp1, GenRegister sign, GenRegister flagReg) { tmp0.type = tmp1.type = GEN_TYPE_UL; sign.type = GEN_TYPE_UL; src0.type = src1.type = GEN_TYPE_UL; /* First, need to get the sign. */ p->SHR(tmp0, src0, GenRegister::immud(63)); p->SHR(tmp1, src1, GenRegister::immud(63)); p->XOR(sign, tmp0, tmp1); src0.type = src1.type = GEN_TYPE_L; tmp0.type = tmp1.type = GEN_TYPE_UL; s0_abs.type = s1_abs.type = GEN_TYPE_L; p->MOV(s0_abs, GenRegister::abs(src0)); p->MOV(s1_abs, GenRegister::abs(src1)); calculateFullU64MUL(s0_abs, s1_abs, dst_h, dst_l, tmp0, tmp1); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_NZ, sign, GenRegister::immud(0), tmp0); p->curr.noMask = 0; p->curr.predicate = GEN_PREDICATE_NORMAL; /* Calculate the negation of the whole 128 bits. */ dst_l.type = GEN_TYPE_UL; dst_h.type = GEN_TYPE_L; p->NOT(dst_l, dst_l); p->NOT(dst_h, dst_h); p->ADD(dst_l, dst_l, GenRegister::immud(0x01)); p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_Z, dst_l, GenRegister::immud(0), tmp0); p->ADD(dst_h, dst_h, GenRegister::immud(0x01)); p->pop(); } void Gen8Context::emitI64MULHIInstruction(const SelectionInstruction &insn) { GenRegister src0 = ra->genReg(insn.src(0)); GenRegister src1 = ra->genReg(insn.src(1)); GenRegister dst_h = ra->genReg(insn.dst(0)); GenRegister dst_l = ra->genReg(insn.dst(1)); GenRegister s0_abs = ra->genReg(insn.dst(2)); GenRegister s1_abs = ra->genReg(insn.dst(3)); GenRegister tmp0 = ra->genReg(insn.dst(4)); GenRegister tmp1 = ra->genReg(insn.dst(5)); GenRegister sign = ra->genReg(insn.dst(6)); GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); if(src0.type == GEN_TYPE_UL) { GBE_ASSERT(src1.type == GEN_TYPE_UL); calculateFullU64MUL(src0, src1, dst_h, dst_l, tmp0, tmp1); } else { GBE_ASSERT(src0.type == GEN_TYPE_L); GBE_ASSERT(src1.type == GEN_TYPE_L); calculateFullS64MUL(src0, src1, dst_h, dst_l, s0_abs, s1_abs, tmp0, tmp1, sign, flagReg); } } void Gen8Context::emitI64MADSATInstruction(const SelectionInstruction &insn) { GenRegister src0 = ra->genReg(insn.src(0)); GenRegister src1 = ra->genReg(insn.src(1)); GenRegister src2 = ra->genReg(insn.src(2)); GenRegister dst_l = ra->genReg(insn.dst(0)); GenRegister dst_h = ra->genReg(insn.dst(1)); GenRegister s0_abs = ra->genReg(insn.dst(2)); GenRegister s1_abs = ra->genReg(insn.dst(3)); GenRegister tmp0 = ra->genReg(insn.dst(4)); GenRegister tmp1 = ra->genReg(insn.dst(5)); GenRegister sign = ra->genReg(insn.dst(6)); GenRegister flagReg = GenRegister::flag(insn.state.flag, insn.state.subFlag); if (src0.type == GEN_TYPE_UL) { /* The operands should always be the same long type.
*/ GBE_ASSERT(src1.type == GEN_TYPE_UL); GBE_ASSERT(src2.type == GEN_TYPE_UL); dst_l.type = dst_h.type = GEN_TYPE_UL; tmp0.type = tmp1.type = GEN_TYPE_UL; calculateFullU64MUL(src0, src1, dst_h, dst_l, tmp0, tmp1); /* Implement the logic: dst_l += src2; if (dst_h) dst_l = 0xFFFFFFFFFFFFFFFFULL; if (dst_l < src2) // carry if overflow dst_l = 0xFFFFFFFFFFFFFFFFULL; */ p->ADD(dst_l, dst_l, src2); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_NZ, dst_h, GenRegister::immud(0), tmp0); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.noMask = 0; p->MOV(dst_l, GenRegister::immuint64(0xFFFFFFFFFFFFFFFF)); p->pop(); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_L, dst_l, src2, tmp0); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.noMask = 0; p->MOV(dst_l, GenRegister::immuint64(0xFFFFFFFFFFFFFFFF)); p->pop(); } else { GBE_ASSERT(src0.type == GEN_TYPE_L); GBE_ASSERT(src1.type == GEN_TYPE_L); GBE_ASSERT(src2.type == GEN_TYPE_L); calculateFullS64MUL(src0, src1, dst_h, dst_l, s0_abs, s1_abs, tmp0, tmp1, sign, flagReg); GenRegister sum = sign; sum.type = GEN_TYPE_UL; src2.type = GEN_TYPE_L; dst_l.type = GEN_TYPE_UL; p->ADD(sum, src2, dst_l); /* Implement this logic: if(src2 >= 0) { if(dst_l > sum) { dst_h++; if(CL_LONG_MIN == dst_h) { dst_h = CL_LONG_MAX; sum = CL_ULONG_MAX; } } } */ p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_GE, src2, GenRegister::immud(0), tmp1); p->curr.noMask = 0; p->curr.predicate = GEN_PREDICATE_NORMAL; p->CMP(GEN_CONDITIONAL_G, dst_l, sum, tmp1); p->ADD(dst_h, dst_h, GenRegister::immud(1)); p->MOV(tmp0, GenRegister::immint64(-0x7FFFFFFFFFFFFFFFLL - 1LL)); p->CMP(GEN_CONDITIONAL_EQ, dst_h, tmp0, tmp1); p->MOV(dst_h, GenRegister::immint64(0x7FFFFFFFFFFFFFFFLL)); p->MOV(sum, GenRegister::immuint64(0xFFFFFFFFFFFFFFFFULL)); p->pop(); /* Implement this logic: else { if(dst_l < sum) { dst_h--; if(CL_LONG_MAX == dst_h) { dst_h = CL_LONG_MIN; sum = 0; } } } */ p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_L, src2, GenRegister::immud(0), tmp1); p->curr.noMask = 0; p->curr.predicate = GEN_PREDICATE_NORMAL; p->CMP(GEN_CONDITIONAL_L, dst_l, sum, tmp1); p->ADD(dst_h, dst_h, GenRegister::immd(-1)); p->MOV(tmp0, GenRegister::immint64(0x7FFFFFFFFFFFFFFFLL)); p->CMP(GEN_CONDITIONAL_EQ, dst_h, tmp0, tmp1); p->MOV(dst_h, GenRegister::immint64(-0x7FFFFFFFFFFFFFFFLL - 1LL)); p->MOV(sum, GenRegister::immud(0)); p->pop(); /* Saturate logic: if (dst_h > 0) sum = CL_LONG_MAX; else if (dst_h == 0 && sum > 0x7FFFFFFFFFFFFFFFLL) sum = CL_LONG_MAX; else if (dst_h == -1 && sum < 0x8000000000000000) sum = CL_LONG_MIN; else if (dst_h < -1) sum = CL_LONG_MIN; cl_long result = (cl_long) sum; */ p->MOV(dst_l, sum); tmp0.type = GEN_TYPE_UL; dst_h.type = GEN_TYPE_L; p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_G, dst_h, GenRegister::immud(0), tmp1); p->curr.noMask = 0; p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(dst_l, GenRegister::immint64(0x7FFFFFFFFFFFFFFFLL)); p->pop(); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(),
flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_EQ, dst_h, GenRegister::immd(0x0L), tmp1); p->curr.noMask = 0; p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(tmp0, GenRegister::immuint64(0x7FFFFFFFFFFFFFFFUL)); p->CMP(GEN_CONDITIONAL_G, dst_l, tmp0, tmp1); p->MOV(dst_l, GenRegister::immint64(0x7FFFFFFFFFFFFFFFLL)); p->pop(); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); /* Fixme: HW bug ? 0xFFFFFFFFFFFFFFFF != 0xFFFFFFFFFFFFFFFF */ p->ADD(tmp0, dst_h, GenRegister::immud(1)); p->CMP(GEN_CONDITIONAL_EQ, tmp0, GenRegister::immud(0), tmp1); p->curr.noMask = 0; p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(tmp0, GenRegister::immuint64(0x8000000000000000UL)); p->CMP(GEN_CONDITIONAL_L, dst_l, tmp0, tmp1); p->MOV(dst_l, GenRegister::immint64(-0x7FFFFFFFFFFFFFFFLL - 1LL)); p->pop(); p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.useFlag(flagReg.flag_nr(), flagReg.flag_subnr()); p->CMP(GEN_CONDITIONAL_L, dst_h, GenRegister::immd(-1), tmp1); p->curr.noMask = 0; p->curr.predicate = GEN_PREDICATE_NORMAL; p->MOV(dst_l, GenRegister::immint64(-0x7FFFFFFFFFFFFFFFLL - 1LL)); p->pop(); } } void Gen8Context::emitI64MULInstruction(const SelectionInstruction &insn) { GenRegister src0 = ra->genReg(insn.src(0)); GenRegister src1 = ra->genReg(insn.src(1)); GenRegister dst = ra->genReg(insn.dst(0)); GenRegister res = ra->genReg(insn.dst(1)); src0.type = src1.type = GEN_TYPE_UD; dst.type = GEN_TYPE_UL; res.type = GEN_TYPE_UL; /* Low 32 bits X low 32 bits. */ GenRegister s0l = unpacked_ud(src0); GenRegister s1l = unpacked_ud(src1); p->MUL(dst, s0l, s1l); /* Low 32 bits X high 32 bits. */ GenRegister s1h = GenRegister::offset(s1l, 0, 4); p->MUL(res, s0l, s1h); p->SHL(res, res, GenRegister::immud(32)); p->ADD(dst, dst, res); /* High 32 bits X low 32 bits. 
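(Together with the two partial products above, this completes the truncated 64-bit result: dst = (s0l*s1l + ((s0l*s1h + s0h*s1l) << 32)) mod 2^64; the s0h*s1h term only affects bits 64 and up and is dropped.)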
*/ GenRegister s0h = GenRegister::offset(s0l, 0, 4); p->MUL(res, s0h, s1l); p->SHL(res, res, GenRegister::immud(32)); p->ADD(dst, dst, res); } void Gen8Context::emitI64HADDInstruction(const SelectionInstruction &insn) { GenRegister src0 = ra->genReg(insn.src(0)); GenRegister src1 = ra->genReg(insn.src(1)); GenRegister dst = ra->genReg(insn.dst(0)); GenRegister tmp0 = ra->genReg(insn.dst(1)); GenRegister tmp1 = ra->genReg(insn.dst(2)); /* Src0 and Src1 are always unsigned long type. */ GBE_ASSERT(src0.type == GEN_TYPE_UL && src1.type == GEN_TYPE_UL); dst.type = src0.type; tmp0.type = tmp1.type = GEN_TYPE_UL; //hadd = (src0>>1) + (src1>>1) + ((src0&0x1) & (src1&0x1)) p->AND(tmp0, src0, GenRegister::immud(1)); p->AND(dst, src1, tmp0); p->SHR(tmp0, src0, GenRegister::immud(1)); p->SHR(tmp1, src1, GenRegister::immud(1)); p->ADD(dst, dst, tmp0); p->ADD(dst, dst, tmp1); } void Gen8Context::emitI64RHADDInstruction(const SelectionInstruction &insn) { GenRegister src0 = ra->genReg(insn.src(0)); GenRegister src1 = ra->genReg(insn.src(1)); GenRegister dst = ra->genReg(insn.dst(0)); GenRegister tmp0 = ra->genReg(insn.dst(1)); GenRegister tmp1 = ra->genReg(insn.dst(2)); /* Src0 and Src1 are always unsigned long type. */ GBE_ASSERT(src0.type == GEN_TYPE_UL && src1.type == GEN_TYPE_UL); dst.type = src0.type; tmp0.type = tmp1.type = GEN_TYPE_UL; //rhadd = (src0>>1) + (src1>>1) + ((src0&0x1) | (src1&0x1)) p->AND(tmp0, src0, GenRegister::immud(1)); p->AND(tmp1, src1, GenRegister::immud(1)); p->OR(dst, tmp0, tmp1); p->SHR(tmp0, src0, GenRegister::immud(1)); p->SHR(tmp1, src1, GenRegister::immud(1)); p->ADD(dst, dst, tmp0); p->ADD(dst, dst, tmp1); } void Gen8Context::emitI64DIVREMInstruction(const SelectionInstruction &cnst_insn) { SelectionInstruction* insn = const_cast<SelectionInstruction *>(&cnst_insn); GenRegister packed_src0 = ra->genReg(insn->src(0)); GenRegister packed_src1 = ra->genReg(insn->src(1)); GenRegister dst = ra->genReg(insn->dst(0)); int tmp_reg_n = 14; if (packed_src0.hstride != GEN_HORIZONTAL_STRIDE_0) { GenRegister unpacked_src0 = ra->genReg(insn->dst(tmp_reg_n)); unpackLongVec(packed_src0, unpacked_src0, p->curr.execWidth); tmp_reg_n++; insn->src(0) = unpacked_src0; } if (packed_src1.hstride != GEN_HORIZONTAL_STRIDE_0) { GenRegister unpacked_src1 = ra->genReg(insn->dst(tmp_reg_n)); unpackLongVec(packed_src1, unpacked_src1, p->curr.execWidth); tmp_reg_n++; insn->src(1) = unpacked_src1; } GBE_ASSERT(tmp_reg_n <= insn->dstNum); GenContext::emitI64DIVREMInstruction(*insn); if (dst.hstride != GEN_HORIZONTAL_STRIDE_0) { GenRegister dst_packed = ra->genReg(insn->dst(14)); packLongVec(dst, dst_packed, p->curr.execWidth); p->MOV(dst, dst_packed); } } void Gen8Context::packLongVec(GenRegister unpacked, GenRegister packed, uint32_t simd) { bool isScalar = false; if (unpacked.hstride == GEN_HORIZONTAL_STRIDE_0) isScalar = true; GBE_ASSERT(packed.subnr == 0); GBE_ASSERT(packed.hstride != GEN_HORIZONTAL_STRIDE_0); GBE_ASSERT(unpacked.subnr == 0 || isScalar); unpacked = GenRegister::retype(unpacked, GEN_TYPE_UD); packed = GenRegister::retype(packed, GEN_TYPE_UD); if (isScalar) { p->MOV(packed, unpacked); } else { if (simd == 16) { p->push(); p->curr.execWidth = 8; p->MOV(GenRegister::h2(packed), unpacked); p->MOV(GenRegister::h2(GenRegister::offset(packed, 0, typeSize(GEN_TYPE_UD))), GenRegister::offset(unpacked, 2)); p->curr.quarterControl = 1; p->MOV(GenRegister::h2(GenRegister::offset(packed, 2, 0)), GenRegister::offset(unpacked, 1)); p->MOV(GenRegister::h2(GenRegister::offset(packed, 2, typeSize(GEN_TYPE_UD))),
GenRegister::offset(unpacked, 3)); p->pop(); } else { GBE_ASSERT(simd == 8); p->MOV(GenRegister::h2(packed), unpacked); p->MOV(GenRegister::h2(GenRegister::offset(packed, 0, typeSize(GEN_TYPE_UD))), GenRegister::offset(unpacked, 1)); } } } void Gen8Context::unpackLongVec(GenRegister packed, GenRegister unpacked, uint32_t simd) { bool isScalar = false; if (packed.hstride == GEN_HORIZONTAL_STRIDE_0) isScalar = true; GBE_ASSERT(packed.subnr == 0 || isScalar); GBE_ASSERT(unpacked.hstride != GEN_HORIZONTAL_STRIDE_0); GBE_ASSERT(unpacked.subnr == 0); unpacked = GenRegister::retype(unpacked, GEN_TYPE_UD); packed = GenRegister::retype(packed, GEN_TYPE_UD); if (isScalar) { p->MOV(unpacked, packed); if (simd == 16) { p->MOV(GenRegister::offset(unpacked, 2), GenRegister::offset(packed, 0, typeSize(GEN_TYPE_UD))); } else { p->MOV(GenRegister::offset(unpacked, 1), GenRegister::offset(packed, 0, typeSize(GEN_TYPE_UD))); } } else { packed.vstride = GEN_VERTICAL_STRIDE_8; packed.width = GEN_WIDTH_4; p->push(); p->curr.execWidth = 8; if (simd == 16) { p->MOV(unpacked, GenRegister::h2(packed)); p->MOV(GenRegister::offset(unpacked, 2), GenRegister::h2(GenRegister::offset(packed, 0, typeSize(GEN_TYPE_UD)))); p->curr.quarterControl = 1; p->MOV(GenRegister::offset(unpacked, 1), GenRegister::h2(GenRegister::offset(packed, 2))); p->MOV(GenRegister::offset(unpacked, 3), GenRegister::h2(GenRegister::offset(packed, 2, typeSize(GEN_TYPE_UD)))); } else { GBE_ASSERT(simd == 8); p->MOV(unpacked, GenRegister::h2(packed)); p->MOV(GenRegister::offset(unpacked, 1), GenRegister::h2(GenRegister::offset(packed, 0, typeSize(GEN_TYPE_UD)))); } p->pop(); } } void Gen8Context::emitUntypedReadA64Instruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src = ra->genReg(insn.src(0)); const uint32_t elemNum = insn.extra.elem; p->UNTYPED_READA64(dst, src, elemNum); } void Gen8Context::emitUntypedWriteA64Instruction(const SelectionInstruction &insn) { const GenRegister src = ra->genReg(insn.src(0)); const uint32_t elemNum = insn.extra.elem; p->UNTYPED_WRITEA64(src, elemNum); } void Gen8Context::emitByteGatherA64Instruction(const SelectionInstruction &insn) { const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src = ra->genReg(insn.src(0)); const uint32_t elemSize = insn.extra.elem; p->BYTE_GATHERA64(dst, src, elemSize); } void Gen8Context::emitByteScatterA64Instruction(const SelectionInstruction &insn) { const GenRegister src = ra->genReg(insn.src(0)); const uint32_t elemSize = insn.extra.elem; p->BYTE_SCATTERA64(src, elemSize); } void Gen8Context::emitRead64Instruction(const SelectionInstruction &insn) { const uint32_t elemNum = insn.extra.elem; GBE_ASSERT(elemNum == 1); const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src = ra->genReg(insn.src(0)); const GenRegister bti = ra->genReg(insn.src(1)); /* Because BDW's store and load send instructions for 64 bits require the bti to be surfaceless, which we cannot accept, we just fall back to 2 DW untyped reads here.
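Each 64-bit element therefore travels as two DWs, and the loop after the read re-packs the DW pairs into QW layout with packLongVec().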
*/ if (bti.file == GEN_IMMEDIATE_VALUE) { p->UNTYPED_READ(dst, src, bti, 2*elemNum); } else { const GenRegister tmp = ra->genReg(insn.dst(2*elemNum)); const GenRegister btiTmp = ra->genReg(insn.dst(2*elemNum + 1)); unsigned desc = p->generateUntypedReadMessageDesc(0, 2*elemNum); unsigned jip0 = beforeMessage(insn, bti, tmp, btiTmp, desc); //predicated load p->push(); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->UNTYPED_READ(dst, src, GenRegister::retype(GenRegister::addr1(0), GEN_TYPE_UD), 2*elemNum); p->pop(); afterMessage(insn, bti, tmp, btiTmp, jip0); } for (uint32_t elemID = 0; elemID < elemNum; elemID++) { GenRegister long_tmp = ra->genReg(insn.dst(elemID)); GenRegister the_long = ra->genReg(insn.dst(elemID + elemNum)); this->packLongVec(long_tmp, the_long, p->curr.execWidth); } } void Gen8Context::emitWrite64Instruction(const SelectionInstruction &insn) { const uint32_t elemNum = insn.extra.elem; GBE_ASSERT(elemNum == 1); const GenRegister addr = ra->genReg(insn.src(elemNum)); const GenRegister bti = ra->genReg(insn.src(elemNum*2+1)); GenRegister data = ra->genReg(insn.src(elemNum+1)); /* Because BDW's store and load send instructions for 64 bits require the bti to be surfaceless, which we cannot accept, we just fall back to 2 DW untyped writes here. */ for (uint32_t elemID = 0; elemID < elemNum; elemID++) { GenRegister the_long = ra->genReg(insn.src(elemID)); GenRegister long_tmp = ra->genReg(insn.src(elemNum + 1 + elemID)); this->unpackLongVec(the_long, long_tmp, p->curr.execWidth); } if (bti.file == GEN_IMMEDIATE_VALUE) { p->UNTYPED_WRITE(addr, data, bti, elemNum*2, insn.extra.splitSend); } else { const GenRegister tmp = ra->genReg(insn.dst(elemNum)); const GenRegister btiTmp = ra->genReg(insn.dst(elemNum + 1)); unsigned desc = 0; if (insn.extra.splitSend) desc = p->generateUntypedWriteSendsMessageDesc(0, elemNum*2); else desc = p->generateUntypedWriteMessageDesc(0, elemNum*2); unsigned jip0 = beforeMessage(insn, bti, tmp, btiTmp, desc); //predicated store p->push(); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->UNTYPED_WRITE(addr, data, GenRegister::addr1(0), elemNum*2, insn.extra.splitSend); p->pop(); afterMessage(insn, bti, tmp, btiTmp, jip0); } } void Gen8Context::emitRead64A64Instruction(const SelectionInstruction &insn) { const uint32_t elemNum = insn.extra.elem; GBE_ASSERT(elemNum == 1); const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister src = ra->genReg(insn.src(0)); /* Because BDW's store and load send instructions for 64 bits require the bti to be surfaceless, which we cannot accept, we just fall back to 2 DW untyped reads here. */ p->UNTYPED_READA64(dst, src, 2*elemNum); for (uint32_t elemID = 0; elemID < elemNum; elemID++) { GenRegister long_tmp = ra->genReg(insn.dst(elemID)); GenRegister the_long = ra->genReg(insn.dst(elemID + elemNum)); this->packLongVec(long_tmp, the_long, p->curr.execWidth); } } void Gen8Context::emitWrite64A64Instruction(const SelectionInstruction &insn) { const uint32_t elemNum = insn.extra.elem; GBE_ASSERT(elemNum == 1); const GenRegister addr = ra->genReg(insn.src(elemNum)); /* Because BDW's store and load send instructions for 64 bits require the bti to be surfaceless, which we cannot accept, we just fall back to 2 DW untyped writes here.
*/ for (uint32_t elemID = 0; elemID < elemNum; elemID++) { GenRegister the_long = ra->genReg(insn.src(elemID)); GenRegister long_tmp = ra->genReg(insn.src(elemNum + 1 + elemID)); this->unpackLongVec(the_long, long_tmp, p->curr.execWidth); } p->UNTYPED_WRITEA64(addr, elemNum*2); } void Gen8Context::emitAtomicA64Instruction(const SelectionInstruction &insn) { const GenRegister src = ra->genReg(insn.src(0)); const GenRegister dst = ra->genReg(insn.dst(0)); const uint32_t function = insn.extra.function; unsigned srcNum = insn.extra.elem; const GenRegister bti = ra->genReg(insn.src(srcNum)); GBE_ASSERT(bti.value.ud == 0xff); p->ATOMICA64(dst, function, src, bti, srcNum); } void Gen8Context::emitPackLongInstruction(const SelectionInstruction &insn) { const GenRegister src = ra->genReg(insn.src(0)); const GenRegister dst = ra->genReg(insn.dst(0)); /* A scalar register needs no conversion. */ GBE_ASSERT(dst.hstride != GEN_HORIZONTAL_STRIDE_0 && src.hstride != GEN_HORIZONTAL_STRIDE_0); this->packLongVec(src, dst, p->curr.execWidth); } void Gen8Context::emitUnpackLongInstruction(const SelectionInstruction &insn) { const GenRegister src = ra->genReg(insn.src(0)); const GenRegister dst = ra->genReg(insn.dst(0)); /* A scalar register needs no conversion. */ GBE_ASSERT(dst.hstride != GEN_HORIZONTAL_STRIDE_0); this->unpackLongVec(src, dst, p->curr.execWidth); } void Gen8Context::emitF64DIVInstruction(const SelectionInstruction &insn) { /* Macro for Double Precision IEEE Compliant fdiv Set Rounding Mode in CR to RNE GRF are initialized: r0 = 0, r6 = a, r7 = b, r1 = 1 The default data type for the macro is :df math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE (-f0.0) if madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2 // Step(1), q0=a*y0 madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2 // Step(2), e0=(1-b*y0) madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3 // Step(3), r0=a-b*q0 madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2 // Step(4), y1=y0+e0*y0 madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6 // Step(5), e1=(1-b*y1) madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6 // Step(6), y2=y0+e0*y1 madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6 // Step(7), q1=q0+r0*y1 madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7 // Step(8), y3=y1+e1*y2 madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9 // Step(9), r1=a-b*q1 Change Rounding Mode in CR if required Implicit Accumulator for destination is NULL madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2 // Step(10), q=q1+r1*y3 endif */ GenRegister src0 = GenRegister::retype(ra->genReg(insn.src(0)), GEN_TYPE_DF); GenRegister src1 = GenRegister::retype(ra->genReg(insn.src(1)), GEN_TYPE_DF); GenRegister dst = GenRegister::retype(ra->genReg(insn.dst(0)), GEN_TYPE_DF); GenRegister r6, r7, r8; int src0Stride = 1; int src1Stride = 1; int tmpNum = 7; int loopNum = 0; if (dst.hstride == GEN_HORIZONTAL_STRIDE_0) { // dst is uniform loopNum = 1; } else if (p->curr.execWidth == 4) { loopNum = 1; } else if (p->curr.execWidth == 8) { loopNum = 2; } else if (p->curr.execWidth == 16) { loopNum = 4; } else GBE_ASSERT(0); r8 = GenRegister::retype(ra->genReg(insn.dst(tmpNum + 1)), GEN_TYPE_DF); tmpNum++; if (src0.vstride == GEN_HORIZONTAL_STRIDE_0) { r6 = GenRegister::retype(ra->genReg(insn.dst(tmpNum + 1)), GEN_TYPE_DF); tmpNum++; src0Stride = 0; p->push(); { p->curr.execWidth = 4; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->MOV(r6, src0); } p->pop(); } else { r6 = src0; } if (src1.vstride == GEN_HORIZONTAL_STRIDE_0) { r7 = GenRegister::retype(ra->genReg(insn.dst(tmpNum + 1)), GEN_TYPE_DF); tmpNum++; src1Stride = 0; p->push();
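/* src1 is uniform here: broadcast the scalar divisor into the 4-wide temporary r7 so that each 4-wide INVM/MADM iteration below sees a full vector operand. */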
{ p->curr.execWidth = 4; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->MOV(r7, src1); } p->pop(); } else { r7 = src1; } const GenRegister r0 = GenRegister::retype(ra->genReg(insn.dst(1)), GEN_TYPE_DF); const GenRegister r1 = GenRegister::retype(ra->genReg(insn.dst(2)), GEN_TYPE_DF); const GenRegister r9 = GenRegister::retype(ra->genReg(insn.dst(3)), GEN_TYPE_DF); const GenRegister r10 = GenRegister::retype(ra->genReg(insn.dst(4)), GEN_TYPE_DF); const GenRegister r11 = GenRegister::retype(ra->genReg(insn.dst(5)), GEN_TYPE_DF); const GenRegister r12 = GenRegister::retype(ra->genReg(insn.dst(6)), GEN_TYPE_DF); const GenRegister r13 = GenRegister::retype(ra->genReg(insn.dst(7)), GEN_TYPE_DF); Gen8Encoder *p8 = reinterpret_cast<Gen8Encoder *>(p); p->push(); { p->curr.execWidth = 4; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->MOV(r1, GenRegister::immdf(1.0)); p->MOV(r0, GenRegister::immdf(0.0)); } p->pop(); for (int i = 0; i < loopNum; i++) { p->push(); { p->curr.noMask = 1; p->curr.execWidth = 4; p->curr.predicate = GEN_PREDICATE_NONE; p8->MATH_WITH_ACC(r8, GEN8_MATH_FUNCTION_INVM, r6, r7, GEN8_INSN_ACC2, GEN8_INSN_NOACC, GEN8_INSN_NOACC); p->curr.useFlag(insn.state.flag, insn.state.subFlag); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.inversePredicate = 1; p8->MADM(r9, r0, r6, r8, GEN8_INSN_ACC3, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2); p8->MADM(r10, r1, GenRegister::negate(r7), r8, GEN8_INSN_ACC4, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2); p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC5, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC3); p8->MADM(r12, r8, r10, r8, GEN8_INSN_ACC6, GEN8_INSN_ACC2, GEN8_INSN_ACC4, GEN8_INSN_ACC2); p8->MADM(r13, r1, GenRegister::negate(r7), r12, GEN8_INSN_ACC7, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC6); p8->MADM(r8, r8, r10, r12, GEN8_INSN_ACC8, GEN8_INSN_ACC2, GEN8_INSN_ACC4, GEN8_INSN_ACC6); p8->MADM(r9, r9, r11, r12, GEN8_INSN_ACC9, GEN8_INSN_ACC3, GEN8_INSN_ACC5, GEN8_INSN_ACC6); p8->MADM(r12, r12, r8, r13, GEN8_INSN_ACC2, GEN8_INSN_ACC6, GEN8_INSN_ACC8, GEN8_INSN_ACC7); p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC3, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC9); p8->MADM(r8, r9, r11, r12, GEN8_INSN_NOACC, GEN8_INSN_ACC9, GEN8_INSN_ACC3, GEN8_INSN_ACC2); } p->pop(); r6 = GenRegister::offset(r6, src0Stride); r7 = GenRegister::offset(r7, src1Stride); /* Move back the result.
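Each 4-wide iteration i writes lanes 4*i .. 4*i+3 of dst: quarterControl selects the 8-lane quarter (Q1 for i < 2, Q2 otherwise) and nibControl selects which 4-lane half of that quarter receives r8.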
*/ if (dst.hstride == GEN_HORIZONTAL_STRIDE_0) { // dst is uniform p->push(); { p->curr.execWidth = 1; r8.hstride = GEN_HORIZONTAL_STRIDE_0; r8.vstride = GEN_VERTICAL_STRIDE_0; r8.width = GEN_WIDTH_1; p->MOV(dst, r8); } p->pop(); break; } else { p->push(); { p->curr.execWidth = 4; if (i % 2 == 0) p->curr.nibControl = 0; else p->curr.nibControl = 1; if (i < 2) p->curr.quarterControl = GEN_COMPRESSION_Q1; else p->curr.quarterControl = GEN_COMPRESSION_Q2; p->MOV(GenRegister::offset(dst, i), r8); } p->pop(); } } } void Gen8Context::setA0Content(uint16_t new_a0[16], uint16_t max_offset, int sz) { if (sz == 0) sz = 16; GBE_ASSERT(sz%4 == 0); GBE_ASSERT(new_a0[0] >= 0 && new_a0[0] < 4096); p->push(); p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; for (int i = 0; i < sz/4; i++) { uint64_t addr = (new_a0[i*4 + 3] << 16) | (new_a0[i*4 + 2]); addr = addr << 32; addr = addr | (new_a0[i*4 + 1] << 16) | (new_a0[i*4]); p->MOV(GenRegister::retype(GenRegister::addr1(i*4), GEN_TYPE_UL), GenRegister::immuint64(addr)); } p->pop(); } void Gen8Context::subTimestamps(GenRegister& t0, GenRegister& t1, GenRegister& tmp) { p->push(); { p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->ADD(GenRegister::retype(t0, GEN_TYPE_UL), GenRegister::retype(t0, GEN_TYPE_UL), GenRegister::negate(GenRegister::retype(t1, GEN_TYPE_UL))); } p->pop(); } void Gen8Context::addTimestamps(GenRegister& t0, GenRegister& t1, GenRegister& tmp) { p->push(); { p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->ADD(GenRegister::retype(t0, GEN_TYPE_UL), GenRegister::retype(t0, GEN_TYPE_UL), GenRegister::retype(t1, GEN_TYPE_UL)); } p->pop(); } void ChvContext::newSelection(void) { this->sel = GBE_NEW(SelectionChv, *this); } void ChvContext::calculateFullU64MUL(GenRegister src0, GenRegister src1, GenRegister dst_h, GenRegister dst_l, GenRegister s0l_s1h, GenRegister s0h_s1l) { src0.type = src1.type = GEN_TYPE_UD; dst_h.type = dst_l.type = GEN_TYPE_UL; s0l_s1h.type = s0h_s1l.type = GEN_TYPE_UL; //GenRegister tmp; GenRegister s0l = unpacked_ud(src0); GenRegister s1l = unpacked_ud(src1); GenRegister s0h = unpacked_ud(s0l_s1h); //s0h only used before s0l_s1h, reuse s0l_s1h GenRegister s1h = unpacked_ud(dst_l); //s1h only used before dst_l, reuse dst_l p->MOV(s0h, GenRegister::offset(s0l, 0, 4)); p->MOV(s1h, GenRegister::offset(s1l, 0, 4)); /* High 32 bits X High 32 bits. */ p->MUL(dst_h, s0h, s1h); /* High 32 bits X low 32 bits. */ p->MUL(s0h_s1l, s0h, s1l); /* Low 32 bits X high 32 bits. */ p->MUL(s0l_s1h, s0l, s1h); /* Low 32 bits X low 32 bits. */ p->MUL(dst_l, s0l, s1l); /* Because the max product of s0l*s1h is (2^N - 1) * (2^N - 1) = 2^2N - 2^(N+1) + 1, with N = 32. The max of adding two 32-bit integers to it is 2^2N - 2^(N+1) + 1 + 2*(2^N - 1) = 2^2N - 1, which means that adding dst_l's high 32 bits and then s0l_s1h's low 32 bits to the product s0h_s1l will not overflow and produces no carry. In this manner, we can avoid using the acc register, which has a lot of restrictions.
*/ GenRegister s0l_s1h_l = unpacked_ud(s0l_s1h); p->ADD(s0h_s1l, s0h_s1l, s0l_s1h_l); p->SHR(s0l_s1h, s0l_s1h, GenRegister::immud(32)); GenRegister s0l_s1h_h = unpacked_ud(s0l_s1h); p->ADD(dst_h, dst_h, s0l_s1h_h); GenRegister dst_l_h = unpacked_ud(s0l_s1h); p->MOV(dst_l_h, unpacked_ud(dst_l, 1)); p->ADD(s0h_s1l, s0h_s1l, dst_l_h); // No longer need s0l_s1h GenRegister tmp = s0l_s1h; p->SHL(tmp, s0h_s1l, GenRegister::immud(32)); GenRegister tmp_unpacked = unpacked_ud(tmp, 1); p->MOV(unpacked_ud(dst_l, 1), tmp_unpacked); p->SHR(tmp, s0h_s1l, GenRegister::immud(32)); p->ADD(dst_h, dst_h, tmp); } void ChvContext::emitI64MULInstruction(const SelectionInstruction &insn) { GenRegister src0 = ra->genReg(insn.src(0)); GenRegister src1 = ra->genReg(insn.src(1)); GenRegister dst = ra->genReg(insn.dst(0)); GenRegister res = ra->genReg(insn.dst(1)); src0.type = src1.type = GEN_TYPE_UD; dst.type = GEN_TYPE_UL; res.type = GEN_TYPE_UL; /* Low 32 bits X low 32 bits. */ GenRegister s0l = unpacked_ud(src0); GenRegister s1l = unpacked_ud(src1); p->MUL(dst, s0l, s1l); /* Low 32 bits X high 32 bits. */ GenRegister s1h = unpacked_ud(res); p->MOV(s1h, unpacked_ud(src1, 1)); p->MUL(res, s0l, s1h); p->SHL(res, res, GenRegister::immud(32)); p->ADD(dst, dst, res); /* High 32 bits X low 32 bits. */ GenRegister s0h = unpacked_ud(res); p->MOV(s0h, unpacked_ud(src0, 1)); p->MUL(res, s0h, s1l); p->SHL(res, res, GenRegister::immud(32)); p->ADD(dst, dst, res); } void Gen8Context::emitPrintfLongInstruction(GenRegister& addr, GenRegister& data, GenRegister& src, uint32_t bti) { GenRegister tempSrc, tempDst; GenRegister nextSrc, nextDst; p->push(); tempSrc = GenRegister::h2(GenRegister::retype(src, GEN_TYPE_UD)); tempDst = GenRegister::retype(data, GEN_TYPE_UD); p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q1; p->MOV(tempDst, tempSrc); p->curr.quarterControl = GEN_COMPRESSION_Q2; nextSrc = GenRegister::Qn(tempSrc, 1); nextDst = GenRegister::Qn(tempDst, 1); p->MOV(nextDst, nextSrc); p->pop(); p->UNTYPED_WRITE(addr, addr, GenRegister::immud(bti), 1, false); p->ADD(addr, addr, GenRegister::immud(sizeof(uint32_t))); p->push(); tempSrc = GenRegister::h2( GenRegister::retype(GenRegister::offset(src, 0, 4), GEN_TYPE_UD)); tempDst = GenRegister::retype(data, GEN_TYPE_UD); p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q1; p->MOV(tempDst, tempSrc); p->curr.quarterControl = GEN_COMPRESSION_Q2; nextSrc = GenRegister::Qn(tempSrc, 1); nextDst = GenRegister::Qn(tempDst, 1); p->MOV(nextDst, nextSrc); p->pop(); p->UNTYPED_WRITE(addr, addr, GenRegister::immud(bti), 1, false); p->ADD(addr, addr, GenRegister::immud(sizeof(uint32_t))); } void ChvContext::setA0Content(uint16_t new_a0[16], uint16_t max_offset, int sz) { if (sz == 0) sz = 16; GBE_ASSERT(sz%4 == 0); GBE_ASSERT(new_a0[0] >= 0 && new_a0[0] < 4096); p->push(); p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; for (int i = 0; i < sz/2; i++) { p->MOV(GenRegister::retype(GenRegister::addr1(i*2), GEN_TYPE_UD), GenRegister::immud(new_a0[i*2 + 1] << 16 | new_a0[i*2])); } p->pop(); } void ChvContext::emitStackPointer(void) { using namespace ir; // Only emit stack pointer computation if we use a stack if (kernel->getStackSize() == 0) return; // Check that everything is consistent in the kernel code const uint32_t perLaneSize = kernel->getStackSize(); GBE_ASSERT(perLaneSize > 0); const GenRegister selStatckPtr = this->simdWidth == 8 ? 
GenRegister::ud8grf(ir::ocl::stackptr) : GenRegister::ud16grf(ir::ocl::stackptr); const GenRegister stackptr = ra->genReg(selStatckPtr); // borrow block ip as temporary register as we will // initialize block ip later. const GenRegister tmpReg = GenRegister::retype(GenRegister::vec1(getBlockIP()), GEN_TYPE_UW); const GenRegister tmpReg_ud = GenRegister::retype(tmpReg, GEN_TYPE_UD); loadLaneID(stackptr); // We compute the per-lane stack pointer here // threadId * perThreadSize + laneId*perLaneSize or // (threadId * simdWidth + laneId)*perLaneSize // let private address start from zero //p->MOV(stackptr, GenRegister::immud(0)); p->push(); p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->AND(tmpReg, GenRegister::ud1grf(0,5), GenRegister::immuw(0x1ff)); //threadId p->MUL(tmpReg, tmpReg, GenRegister::immuw(this->simdWidth)); //threadId * simdWidth p->curr.execWidth = this->simdWidth; p->ADD(stackptr, GenRegister::unpacked_uw(stackptr), tmpReg); //threadId * simdWidth + laneId, must < 64K p->curr.execWidth = 1; p->MOV(tmpReg_ud, GenRegister::immud(perLaneSize)); p->curr.execWidth = this->simdWidth; p->MUL(stackptr, tmpReg_ud, GenRegister::unpacked_uw(stackptr)); // (threadId * simdWidth + laneId)*perLaneSize if (fn.getPointerFamily() == ir::FAMILY_QWORD) { const GenRegister selStatckPtr2 = this->simdWidth == 8 ? GenRegister::ul8grf(ir::ocl::stackptr) : GenRegister::ul16grf(ir::ocl::stackptr); GenRegister stackptr2 = ra->genReg(selStatckPtr2); GenRegister sp = GenRegister::unpacked_ud(stackptr2.nr, stackptr2.subnr); int simdWidth = p->curr.execWidth; if (simdWidth == 16) { // we need to do the second quarter first, because the dst type is QW, // while the src is DW. If we do the first quarter first, the 1st // quarter's dst would contain the 2nd quarter's src.
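// (Illustrative note: stackptr, the DW view, and stackptr2, the QW view, are
// both derived from ir::ocl::stackptr and so alias the same GRF range, so
// widening Q1 in place first would overwrite the DWs that Q2 still has to read.)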
p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q2; p->MOV(GenRegister::Qn(sp, 1), GenRegister::Qn(stackptr,1)); p->MOV(GenRegister::Qn(stackptr2, 1), GenRegister::Qn(sp,1)); } p->curr.quarterControl = GEN_COMPRESSION_Q1; p->MOV(sp, stackptr); p->MOV(stackptr2, sp); } p->pop(); } /* Init value according to WORKGROUP OP * Emit an assert if the operation/datatype combination is invalid */ static void wgOpInitValue(GenEncoder *p, GenRegister dataReg, uint32_t wg_op) { if (wg_op == ir::WORKGROUP_OP_ALL) { if (dataReg.type == GEN_TYPE_D || dataReg.type == GEN_TYPE_UD) p->MOV(dataReg, GenRegister::immd(0xFFFFFFFF)); else if(dataReg.type == GEN_TYPE_L || dataReg.type == GEN_TYPE_UL) p->MOV(dataReg, GenRegister::immint64(0xFFFFFFFFFFFFFFFFL)); else GBE_ASSERT(0); /* unsupported data-type */ } else if(wg_op == ir::WORKGROUP_OP_ANY || wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD) { if (dataReg.type == GEN_TYPE_D) p->MOV(dataReg, GenRegister::immd(0x0)); else if (dataReg.type == GEN_TYPE_UD) p->MOV(dataReg, GenRegister::immud(0x0)); else if (dataReg.type == GEN_TYPE_HF) p->MOV(dataReg, GenRegister::immh(0x0)); else if (dataReg.type == GEN_TYPE_F) p->MOV(dataReg, GenRegister::immf(0x0)); else if (dataReg.type == GEN_TYPE_L) p->MOV(dataReg, GenRegister::immint64(0x0)); else if (dataReg.type == GEN_TYPE_UL) p->MOV(dataReg, GenRegister::immuint64(0x0)); else if (dataReg.type == GEN_TYPE_W) p->MOV(dataReg, GenRegister::immw(0x0)); else if (dataReg.type == GEN_TYPE_UW) p->MOV(dataReg, GenRegister::immuw(0x0)); else GBE_ASSERT(0); /* unsupported data-type */ } else if(wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN) { if (dataReg.type == GEN_TYPE_D) p->MOV(dataReg, GenRegister::immd(0x7FFFFFFF)); else if (dataReg.type == GEN_TYPE_UD) p->MOV(dataReg, GenRegister::immud(0xFFFFFFFF)); else if (dataReg.type == GEN_TYPE_HF) p->MOV(GenRegister::retype(dataReg, GEN_TYPE_UW), GenRegister::immuw(0x7C00)); else if (dataReg.type == GEN_TYPE_F) p->MOV(GenRegister::retype(dataReg, GEN_TYPE_UD), GenRegister::immud(0x7F800000)); else if (dataReg.type == GEN_TYPE_L) p->MOV(dataReg, GenRegister::immint64(0x7FFFFFFFFFFFFFFFL)); else if (dataReg.type == GEN_TYPE_UL) p->MOV(dataReg, GenRegister::immuint64(0xFFFFFFFFFFFFFFFFL)); else if (dataReg.type == GEN_TYPE_W) p->MOV(dataReg, GenRegister::immw(0x7FFF)); else if (dataReg.type == GEN_TYPE_UW) p->MOV(dataReg, GenRegister::immuw(0xFFFF)); else GBE_ASSERT(0); /* unsupported data-type */ } else if(wg_op == ir::WORKGROUP_OP_REDUCE_MAX || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { if (dataReg.type == GEN_TYPE_D) p->MOV(dataReg, GenRegister::immd(0x80000000)); else if (dataReg.type == GEN_TYPE_UD) p->MOV(dataReg, GenRegister::immud(0x0)); else if (dataReg.type == GEN_TYPE_HF) p->MOV(GenRegister::retype(dataReg, GEN_TYPE_UW), GenRegister::immuw(0xFC00)); else if (dataReg.type == GEN_TYPE_F) p->MOV(GenRegister::retype(dataReg, GEN_TYPE_UD), GenRegister::immud(0xFF800000)); else if (dataReg.type == GEN_TYPE_L) p->MOV(dataReg, GenRegister::immint64(0x8000000000000000L)); else if (dataReg.type == GEN_TYPE_UL) p->MOV(dataReg, GenRegister::immuint64(0x0)); else if (dataReg.type == GEN_TYPE_W) p->MOV(dataReg, GenRegister::immw(0x8000)); else if (dataReg.type == GEN_TYPE_UW) p->MOV(dataReg, GenRegister::immuw(0x0)); else GBE_ASSERT(0); /* unsupported data-type */ } /* unsupported operation
*/ else GBE_ASSERT(0); } /* Perform WORKGROUP OP on 2 input elements (registers) */ static void wgOpPerform(GenRegister dst, GenRegister src1, GenRegister src2, uint32_t wg_op, GenEncoder *p) { /* perform OP REDUCE on 2 elements */ if (wg_op == ir::WORKGROUP_OP_ANY) p->OR(dst, src1, src2); else if (wg_op == ir::WORKGROUP_OP_ALL) p->AND(dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_REDUCE_ADD) p->ADD(dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_REDUCE_MIN) p->SEL_CMP(GEN_CONDITIONAL_LE, dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_REDUCE_MAX) p->SEL_CMP(GEN_CONDITIONAL_GE, dst, src1, src2); /* perform OP SCAN INCLUSIVE on 2 elements */ else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD) p->ADD(dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN) p->SEL_CMP(GEN_CONDITIONAL_LE, dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX) p->SEL_CMP(GEN_CONDITIONAL_GE, dst, src1, src2); /* perform OP SCAN EXCLUSIVE on 2 elements */ else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD) p->ADD(dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN) p->SEL_CMP(GEN_CONDITIONAL_LE, dst, src1, src2); else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) p->SEL_CMP(GEN_CONDITIONAL_GE, dst, src1, src2); else GBE_ASSERT(0); } static void wgOpPerformThread(GenRegister threadDst, GenRegister inputVal, GenRegister threadExchangeData, GenRegister resultVal, uint32_t simd, uint32_t wg_op, GenEncoder *p) { p->push(); p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.execWidth = 1; /* setting the type */ resultVal = GenRegister::retype(resultVal, inputVal.type); threadDst = GenRegister::retype(threadDst, inputVal.type); threadExchangeData = GenRegister::retype(threadExchangeData, inputVal.type); vector<GenRegister> input; vector<GenRegister> result; /* for workgroup all and any we can use simd_all/any for each thread */ if (wg_op == ir::WORKGROUP_OP_ALL || wg_op == ir::WORKGROUP_OP_ANY) { GenRegister constZero = GenRegister::immuw(0); GenRegister flag01 = GenRegister::flag(0, 1); p->push(); { p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; p->curr.execWidth = simd; p->MOV(resultVal, GenRegister::immud(1)); p->curr.execWidth = 1; if (wg_op == ir::WORKGROUP_OP_ALL) p->MOV(flag01, GenRegister::immw(-1)); else p->MOV(flag01, constZero); p->curr.execWidth = simd; p->curr.noMask = 0; p->curr.flag = 0; p->curr.subFlag = 1; p->CMP(GEN_CONDITIONAL_NEQ, inputVal, constZero); if (p->curr.execWidth == 16) if (wg_op == ir::WORKGROUP_OP_ALL) p->curr.predicate = GEN_PREDICATE_ALIGN1_ALL16H; else p->curr.predicate = GEN_PREDICATE_ALIGN1_ANY16H; else if (p->curr.execWidth == 8) if (wg_op == ir::WORKGROUP_OP_ALL) p->curr.predicate = GEN_PREDICATE_ALIGN1_ALL8H; else p->curr.predicate = GEN_PREDICATE_ALIGN1_ANY8H; else NOT_IMPLEMENTED; p->SEL(threadDst, resultVal, constZero); p->SEL(threadExchangeData, resultVal, constZero); } p->pop(); } else { if (inputVal.hstride == GEN_HORIZONTAL_STRIDE_0) { p->MOV(threadExchangeData, inputVal); p->pop(); return; } /* init thread data to min/max/null values */ p->push(); { p->curr.execWidth = simd; wgOpInitValue(p, threadExchangeData, wg_op); p->MOV(resultVal, inputVal); } p->pop(); GenRegister resultValSingle = resultVal; resultValSingle.hstride = GEN_HORIZONTAL_STRIDE_0; resultValSingle.vstride = GEN_VERTICAL_STRIDE_0; resultValSingle.width = GEN_WIDTH_1; GenRegister inputValSingle = inputVal; inputValSingle.hstride = GEN_HORIZONTAL_STRIDE_0; inputValSingle.vstride = GEN_VERTICAL_STRIDE_0; inputValSingle.width = GEN_WIDTH_1; /* make an array of
registers for easy accessing */ for(uint32_t i = 0; i < simd; i++){ /* add all resultVal offset reg positions from list */ result.push_back(resultValSingle); input.push_back(inputValSingle); /* move to next position */ resultValSingle.subnr += typeSize(resultValSingle.type); if (resultValSingle.subnr == 32) { resultValSingle.subnr = 0; resultValSingle.nr++; } /* move to next position */ inputValSingle.subnr += typeSize(inputValSingle.type); if (inputValSingle.subnr == 32) { inputValSingle.subnr = 0; inputValSingle.nr++; } } uint32_t start_i = 0; if( wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_REDUCE_MAX || wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX) { p->MOV(result[0], input[0]); start_i = 1; } else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { p->MOV(result[1], input[0]); start_i = 2; } /* algorithm workgroup */ for (uint32_t i = start_i; i < simd; i++) { if( wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_REDUCE_MAX) wgOpPerform(result[0], result[0], input[i], wg_op, p); else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX) wgOpPerform(result[i], result[i - 1], input[i], wg_op, p); else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) wgOpPerform(result[i], result[i - 1], input[i - 1], wg_op, p); else GBE_ASSERT(0); } } if( wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_REDUCE_MAX) { p->curr.execWidth = simd; /* value exchanged with other threads */ p->MOV(threadExchangeData, result[0]); /* partial result thread */ p->MOV(threadDst, result[0]); } else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX) { p->curr.execWidth = simd; /* value exchanged with other threads */ p->MOV(threadExchangeData, result[simd - 1]); /* partial result thread */ p->MOV(threadDst, resultVal); } else if(wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { p->curr.execWidth = 1; /* set result[0] to min/max/null */ wgOpInitValue(p, result[0], wg_op); p->curr.execWidth = simd; /* value exchanged with other threads */ wgOpPerform(threadExchangeData, result[simd - 1], input[simd - 1], wg_op, p); /* partial result thread */ p->MOV(threadDst, resultVal); } p->pop(); } /** * WORKGROUP OP: ALL, ANY, REDUCE, SCAN INCLUSIVE, SCAN EXCLUSIVE * * Implementation: * 1. All the threads first perform the workgroup op value for the * allocated work-items. SIMD16 => 16 work-items allocated for each thread * 2. Each thread writes its partial result to shared local memory using threadId * 3. After a barrier, each thread reads back the shared local memory region * in chunks of 1-4 elements, using a loop based on the thread count (threadN) * 4. Each thread computes the final value individually * * Optimizations: * Performance is driven by the chunk reads: if threads read in chunks of 4 * elements, performance increases 2-3x compared to chunks of 1 element.
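 *
 * Editor's sketch (illustrative, not part of the original source): in scalar
 * form, steps 3-4 for a REDUCE amount to the following, assuming a
 * hypothetical combine() that applies wg_op and identity() that matches
 * wgOpInitValue:
 *
 *   partial = identity(wg_op);
 *   for (int t = threadN - 1; t >= 0; t--)       // the JMPI loop below
 *     partial = combine(partial, slm[t], wg_op); // wgOpPerform equivalent
 *   dst = partial;                               // broadcast to all lanes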
*/ void Gen8Context::emitWorkGroupOpInstruction(const SelectionInstruction &insn){ const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister tmp = GenRegister::retype(ra->genReg(insn.dst(1)), dst.type); const GenRegister theVal = GenRegister::retype(ra->genReg(insn.src(2)), dst.type); GenRegister threadData = ra->genReg(insn.src(3)); GenRegister partialData = GenRegister::toUniform(threadData, dst.type); GenRegister threadId = ra->genReg(insn.src(0)); GenRegister threadLoop = ra->genReg(insn.src(1)); GenRegister barrierId = ra->genReg(GenRegister::ud1grf(ir::ocl::barrierid)); GenRegister localBarrier = ra->genReg(insn.src(5)); uint32_t wg_op = insn.extra.wgop.workgroupOp; uint32_t simd = p->curr.execWidth; int32_t jip0, jip1; /* masked elements should be properly set to init value */ p->push(); { p->curr.noMask = 1; wgOpInitValue(p, tmp, wg_op); p->curr.noMask = 0; p->MOV(tmp, theVal); p->curr.noMask = 1; p->MOV(theVal, tmp); } p->pop(); threadId = GenRegister::toUniform(threadId, GEN_TYPE_UD); /* use of continuous GRF allocation from insn selection */ GenRegister msg = GenRegister::retype(ra->genReg(insn.dst(2)), dst.type); GenRegister msgSlmOff = GenRegister::retype(ra->genReg(insn.src(4)), GEN_TYPE_UD); GenRegister msgAddr = GenRegister::retype(msg, GEN_TYPE_UD); GenRegister msgData = GenRegister::retype(ra->genReg(insn.dst(3)), dst.type); /* do some calculation within each thread */ wgOpPerformThread(dst, theVal, threadData, tmp, simd, wg_op, p); p->curr.execWidth = simd; p->MOV(theVal, dst); threadData = GenRegister::toUniform(threadData, dst.type); /* store thread count for future use on read/write to SLM */ if (wg_op == ir::WORKGROUP_OP_ANY || wg_op == ir::WORKGROUP_OP_ALL || wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_REDUCE_MAX) { threadLoop = GenRegister::retype(tmp, GEN_TYPE_D); p->MOV(threadLoop, ra->genReg(GenRegister::ud1grf(ir::ocl::threadn))); } else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { threadLoop = GenRegister::retype(tmp, GEN_TYPE_D); p->MOV(threadLoop, ra->genReg(GenRegister::ud1grf(ir::ocl::threadid))); } /* all threads write the partial results to SLM memory */ if(dst.type == GEN_TYPE_UL || dst.type == GEN_TYPE_L) { GenRegister threadDataL = GenRegister::retype(threadData, GEN_TYPE_D); GenRegister threadDataH = threadDataL.offset(threadDataL, 0, 4); GenRegister msgDataL = GenRegister::retype(msgData, GEN_TYPE_D); GenRegister msgDataH = msgDataL.offset(msgDataL, 1); p->curr.execWidth = 8; p->MOV(msgDataL, threadDataL); p->MOV(msgDataH, threadDataH); p->MUL(msgAddr, threadId, GenRegister::immd(0x8)); p->ADD(msgAddr, msgAddr, msgSlmOff); p->UNTYPED_WRITE(msgAddr, msgData, GenRegister::immw(0xFE), 2, insn.extra.wgop.splitSend); } else { p->curr.execWidth = 8; p->MOV(msgData, threadData); p->MUL(msgAddr, threadId, GenRegister::immd(0x4)); p->ADD(msgAddr, msgAddr, msgSlmOff); p->UNTYPED_WRITE(msgAddr, msgData, GenRegister::immw(0xFE), 1, insn.extra.wgop.splitSend); } /* init partialData register, it will hold the final result */ wgOpInitValue(p, partialData, wg_op); /* add call to barrier */ p->push(); p->curr.execWidth = 8; p->curr.physicalFlag = 0; p->curr.noMask = 1; p->AND(localBarrier, barrierId, GenRegister::immud(0x0f000000)); p->BARRIER(localBarrier); p->curr.execWidth = 1; 
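/* Editor's note (added for clarity, not in the original source): each thread
 * has just written its partial result to SLM at
 *   addr = threadId * elemSize + msgSlmOff   // elemSize: 8 for L/UL, else 4
 * 64-bit values are split into low/high dwords so the untyped surface write
 * can carry them as a 2-element message. The barrier/WAIT pair below makes
 * every partial visible before the cross-thread loop reads them back. */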
p->WAIT(); p->pop(); /* perform a loop, based on thread count (which is now multiple of 4) */ p->push();{ jip0 = p->n_instruction(); /* read in chunks of 4 to optimize SLM reads and reduce SEND messages */ if(dst.type == GEN_TYPE_UL || dst.type == GEN_TYPE_L) { p->curr.execWidth = 8; p->curr.predicate = GEN_PREDICATE_NONE; p->ADD(threadLoop, threadLoop, GenRegister::immd(-1)); p->MUL(msgAddr, threadLoop, GenRegister::immd(0x8)); p->ADD(msgAddr, msgAddr, msgSlmOff); p->UNTYPED_READ(msgData, msgAddr, GenRegister::immw(0xFE), 2); GenRegister msgDataL = msgData.retype(msgData.offset(msgData, 0, 4), GEN_TYPE_D); GenRegister msgDataH = msgData.retype(msgData.offset(msgData, 1, 4), GEN_TYPE_D); msgDataL.hstride = 2; msgDataH.hstride = 2; p->MOV(msgDataL, msgDataH); /* perform operation, partialData will hold result */ wgOpPerform(partialData, partialData, msgData.offset(msgData, 0), wg_op, p); } else { p->curr.execWidth = 8; p->curr.predicate = GEN_PREDICATE_NONE; p->ADD(threadLoop, threadLoop, GenRegister::immd(-1)); p->MUL(msgAddr, threadLoop, GenRegister::immd(0x4)); p->ADD(msgAddr, msgAddr, msgSlmOff); p->UNTYPED_READ(msgData, msgAddr, GenRegister::immw(0xFE), 1); /* perform operation, partialData will hold result */ wgOpPerform(partialData, partialData, msgData.offset(msgData, 0), wg_op, p); } /* while threadN is not 0, cycle read SLM / update value */ p->curr.noMask = 1; p->curr.flag = 0; p->curr.subFlag = 1; p->CMP(GEN_CONDITIONAL_G, threadLoop, GenRegister::immd(0x0)); p->curr.predicate = GEN_PREDICATE_NORMAL; jip1 = p->n_instruction(); p->JMPI(GenRegister::immud(0)); p->patchJMPI(jip1, jip0 - jip1, 0); } p->pop(); if(wg_op == ir::WORKGROUP_OP_ANY || wg_op == ir::WORKGROUP_OP_ALL || wg_op == ir::WORKGROUP_OP_REDUCE_ADD || wg_op == ir::WORKGROUP_OP_REDUCE_MIN || wg_op == ir::WORKGROUP_OP_REDUCE_MAX) { /* save result to final register location dst */ p->curr.execWidth = simd; p->MOV(dst, partialData); } else { /* save result to final register location dst */ p->curr.execWidth = simd; if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD) p->ADD(dst, dst, partialData); else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN) { /* workaround QW datatype on CMP */ if(dst.type == GEN_TYPE_UL || dst.type == GEN_TYPE_L){ p->push(); p->curr.execWidth = 8; p->SEL_CMP(GEN_CONDITIONAL_LE, dst, dst, partialData); if (simd == 16) { p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q2; p->SEL_CMP(GEN_CONDITIONAL_LE, GenRegister::Qn(dst, 1), GenRegister::Qn(dst, 1), GenRegister::Qn(partialData, 1)); } p->pop(); } else p->SEL_CMP(GEN_CONDITIONAL_LE, dst, dst, partialData); } else if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { /* workaround QW datatype on CMP */ if(dst.type == GEN_TYPE_UL || dst.type == GEN_TYPE_L){ p->push(); p->curr.execWidth = 8; p->SEL_CMP(GEN_CONDITIONAL_GE, dst, dst, partialData); if (simd == 16) { p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q2; p->SEL_CMP(GEN_CONDITIONAL_GE, GenRegister::Qn(dst, 1), GenRegister::Qn(dst, 1), GenRegister::Qn(partialData, 1)); } p->pop(); } else p->SEL_CMP(GEN_CONDITIONAL_GE, dst, dst, partialData); } } /* corner cases for threads 0 */ if(wg_op == ir::WORKGROUP_OP_INCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_INCLUSIVE_MAX || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_ADD || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MIN || wg_op == ir::WORKGROUP_OP_EXCLUSIVE_MAX) { p->push();{ 
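/* Editor's note (added for clarity, not in the original source): for the
 * scan variants, thread 0 has no preceding partials in SLM, so its lanes
 * must keep the thread-local scan result computed earlier; the predicated
 * MOV below overwrites dst with theVal only where threadId == 0. */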
p->curr.flag = 0; p->curr.subFlag = 1; p->CMP(GEN_CONDITIONAL_EQ, threadId, GenRegister::immd(0x0)); p->curr.predicate = GEN_PREDICATE_NORMAL; p->curr.execWidth = simd; p->MOV(dst, theVal); } p->pop(); } } void Gen8Context::emitSubGroupOpInstruction(const SelectionInstruction &insn){ const GenRegister dst = ra->genReg(insn.dst(0)); const GenRegister tmp = GenRegister::retype(ra->genReg(insn.dst(1)), dst.type); const GenRegister theVal = GenRegister::retype(ra->genReg(insn.src(0)), dst.type); GenRegister threadData = ra->genReg(insn.src(1)); uint32_t wg_op = insn.extra.wgop.workgroupOp; uint32_t simd = p->curr.execWidth; /* masked elements should be properly set to init value */ p->push(); { p->curr.noMask = 1; wgOpInitValue(p, tmp, wg_op); p->curr.noMask = 0; p->MOV(tmp, theVal); p->curr.noMask = 1; p->MOV(theVal, tmp); } p->pop(); /* do some calculation within each thread */ wgOpPerformThread(dst, theVal, threadData, tmp, simd, wg_op, p); } } Beignet-1.3.2-Source/backend/src/backend/gen75_context.cpp000664 001750 001750 00000007470 13161142102 022421 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * */ /** * \file gen75_context.cpp */ #include "backend/gen75_context.hpp" #include "backend/gen75_encoder.hpp" #include "backend/gen_program.hpp" #include "backend/gen_defs.hpp" #include "backend/gen_encoder.hpp" #include "backend/gen_insn_selection.hpp" #include "backend/gen_insn_scheduling.hpp" #include "backend/gen_reg_allocation.hpp" #include "sys/cvar.hpp" #include "ir/function.hpp" #include "ir/value.hpp" #include <cstring> /* editor's note: the angle-bracket include target was lost in extraction; <cstring> is assumed */ namespace gbe { void Gen75Context::emitSLMOffset(void) { if(kernel->getUseSLM() == false) return; const GenRegister slm_index = GenRegister::ud1grf(0, 0); //the slm index is held in r0.0 bits 24-27, in 4K units; move it to sr0.1's bits 8-11. p->push(); p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; GenRegister sr0 = GenRegister::sr(0, 1); p->SHR(sr0, slm_index, GenRegister::immud(16)); p->pop(); } uint32_t Gen75Context::alignScratchSize(uint32_t size){ if(size == 0) return 0; uint32_t i = 2048; while(i < size) i *= 2; return i; } void Gen75Context::emitStackPointer(void) { using namespace ir; // Only emit stack pointer computation if we use a stack if (kernel->getStackSize() == 0) return; // Check that everything is consistent in the kernel code const uint32_t perLaneSize = kernel->getStackSize(); GBE_ASSERT(perLaneSize > 0); const GenRegister selStatckPtr = this->simdWidth == 8 ? GenRegister::ud8grf(ir::ocl::stackptr) : GenRegister::ud16grf(ir::ocl::stackptr); const GenRegister stackptr = ra->genReg(selStatckPtr); // borrow block ip as temporary register as we will // initialize block ip later.
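/* Editor's sketch (illustrative, not part of the original source): the
 * sequence below reassembles a thread id from two bit-fields of r0.5 and
 * builds the per-lane stack offset, roughly:
 *   threadId = ((r0_5 & 0x7f) << 2) | ((r0_5 & 0x180) >> 7);
 *   stackptr[lane] = (threadId * simdWidth + lane) * perLaneSize;
 * The exact r0.5 field split is hardware-specific (HSW here). */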
const GenRegister tmpReg = GenRegister::retype(GenRegister::vec1(getBlockIP()), GEN_TYPE_UW); const GenRegister tmpReg_ud = GenRegister::retype(GenRegister::vec1(getBlockIP()), GEN_TYPE_UD); // We compute the per-lane stack pointer here // threadId * perThreadSize + laneId*perLaneSize or // (threadId * simdWidth + laneId)*perLaneSize p->push(); p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; //p->AND(GenRegister::ud1grf(126,0), GenRegister::ud1grf(0,5), GenRegister::immud(0x1ff)); p->AND(tmpReg, GenRegister::ud1grf(0,5), GenRegister::immud(0x7f)); p->AND(stackptr, GenRegister::ud1grf(0,5), GenRegister::immud(0x180)); p->SHR(stackptr, stackptr, GenRegister::immud(7)); p->SHL(tmpReg, tmpReg, GenRegister::immud(2)); p->ADD(tmpReg, tmpReg, stackptr); //threadId p->MUL(tmpReg, tmpReg, GenRegister::immuw(this->simdWidth)); //threadId * simdWidth p->curr.execWidth = this->simdWidth; loadLaneID(stackptr); p->ADD(stackptr, GenRegister::unpacked_uw(stackptr), tmpReg); //threadId * simdWidth + laneId, must be < 64K p->curr.execWidth = 1; p->MOV(tmpReg_ud, GenRegister::immud(perLaneSize)); p->curr.execWidth = this->simdWidth; p->MUL(stackptr, tmpReg_ud, stackptr); // (threadId * simdWidth + laneId)*perLaneSize p->pop(); } void Gen75Context::newSelection(void) { this->sel = GBE_NEW(Selection75, *this); } } Beignet-1.3.2-Source/backend/src/backend/gen_insn_selection.hpp000664 001750 001750 00000031306 13173554000 023604 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ /** * \file gen_insn_selection.hpp * \author Benjamin Segovia */ #ifndef __GEN_INSN_SELECTION_HPP__ #define __GEN_INSN_SELECTION_HPP__ #include "ir/register.hpp" #include "ir/instruction.hpp" #include "backend/gen_register.hpp" #include "backend/gen_encoder.hpp" #include "backend/gen_context.hpp" #include "backend/gen_reg_allocation.hpp" #include "sys/vector.hpp" #include "sys/intrusive_list.hpp" namespace gbe { /*! Translate IR type to Gen type */ uint32_t getGenType(ir::Type type); /*! Translate Gen type to IR type */ ir::Type getIRType(uint32_t genType); /*! Translate IR compare to Gen compare */ uint32_t getGenCompare(ir::Opcode opcode); /*! Selection opcodes properly encoded from 0 to n for fast jump table * generation */ enum SelectionOpcode { #define DECL_SELECTION_IR(OP, FN) SEL_OP_##OP, #include "backend/gen_insn_selection.hxx" #undef DECL_SELECTION_IR }; // Owns and Allocates selection instructions class Selection; // List of SelectionInstruction forms a block class SelectionBlock; /*! A selection instruction is also almost a Gen instruction but *before* the * register allocation */ class SelectionInstruction : public NonCopyable, public intrusive_list_node { public: /*! Owns the instruction */ SelectionBlock *parent; /*! Append an instruction before this one */ void prepend(SelectionInstruction &insn); /*!
Append an instruction after this one */ void append(SelectionInstruction &insn); /*! Does it read memory? */ bool isRead(void) const; /*! Does it write memory? */ bool isWrite(void) const; /*! Does it modify the acc register. */ bool modAcc(void) const; /*! Is it a branch instruction (i.e. modify control flow) */ bool isBranch(void) const; /*! Is it a label instruction (i.e. change the implicit mask) */ bool isLabel(void) const; /*! Is the src's gen register region the same as all dest regs' regions */ bool sameAsDstRegion(uint32_t srcID); /*! Is it a simple native instruction (i.e. will map to one simple ISA instruction) */ bool isNative(void) const; /*! Get the destination register */ GenRegister &dst(uint32_t dstID) { return regs[dstID]; } /*! Get the source register */ GenRegister &src(uint32_t srcID) { return regs[dstNum+srcID]; } /*! Damn C++ */ const GenRegister &dst(uint32_t dstID) const { return regs[dstID]; } /*! Damn C++ */ const GenRegister &src(uint32_t srcID) const { return regs[dstNum+srcID]; } /*! Set debug information to selection */ void setDBGInfo(DebugInfo in) { DBGInfo = in; } /*! No more than 40 sources (40 sources are used by vme for payload passing and setting) */ enum { MAX_SRC_NUM = 40 }; /*! No more than 17 destinations (17 used by image block read8) */ enum { MAX_DST_NUM = 17 }; /*! State of the instruction (extra fields needed for the encoding) */ GenInstructionState state; union { struct { /*! Store bti for loads/stores and function for math, atomic and compares */ uint16_t function:8; /*! elemSize for byte scatters / gathers, elemNum for untyped msg, operand number for atomic */ uint16_t elem:8; uint16_t splitSend:1; }; struct { /*! Number of sources in the tuple */ uint16_t width:4; /*! vertical stride (0,1,2,4,8 or 16) */ uint16_t vstride:5; /*! horizontal stride (0,1,2,4,8 or 16) */ uint16_t hstride:5; /*! offset (0 to 7) */ uint16_t offset:5; }; struct { uint16_t scratchOffset; uint16_t scratchMsgHeader; }; struct { uint16_t bti:8; uint16_t msglen:5; uint16_t is3DWrite:1; uint16_t typedWriteSplitSend:1; }; struct { uint16_t rdbti:8; uint16_t sampler:5; uint16_t rdmsglen:3; bool isLD; // is this a ld message? bool isUniform; }; struct { uint16_t vme_bti:8; uint16_t msg_type:2; uint16_t vme_search_path_lut:3; uint16_t lut_sub:2; }; uint32_t barrierType; uint32_t waitType; bool longjmp; uint32_t indirect_offset; struct { uint32_t pointNum:16; uint32_t timestampType:16; }; struct { uint32_t profilingType:16; uint32_t profilingBTI:16; }; struct { uint32_t printfNum:16; uint32_t printfBTI:8; uint32_t continueFlag:8; uint16_t printfSize; uint16_t printfSplitSend:1; }; struct { uint16_t workgroupOp; uint16_t splitSend:1; }wgop; } extra; /*! Gen opcode */ uint8_t opcode; /*! Number of destinations */ uint8_t dstNum:5; /*! Number of sources */ uint8_t srcNum:6; /*! To store various indices */ uint32_t index; /*! For BRC/IF to store the UIP */ uint32_t index1; /*! instruction ID used for vector allocation. */ uint32_t ID; DebugInfo DBGInfo; /*! Variable sized.
Destinations and sources go here */ GenRegister regs[0]; INLINE uint32_t getbti() const { GBE_ASSERT(isRead() || isWrite()); switch (opcode) { case SEL_OP_OBREAD: case SEL_OP_OBWRITE: case SEL_OP_MBREAD: case SEL_OP_MBWRITE: case SEL_OP_DWORD_GATHER: return extra.function; case SEL_OP_SAMPLE: return extra.rdbti; case SEL_OP_VME: return extra.vme_bti; case SEL_OP_TYPED_WRITE: return extra.bti; default: GBE_ASSERT(0); } return 0; } private: INLINE void setbti(uint32_t bti) { GBE_ASSERT(isRead() || isWrite()); switch (opcode) { case SEL_OP_OBREAD: case SEL_OP_OBWRITE: case SEL_OP_MBREAD: case SEL_OP_MBWRITE: case SEL_OP_DWORD_GATHER: extra.function = bti; return; case SEL_OP_SAMPLE: extra.rdbti = bti; return; case SEL_OP_VME: extra.vme_bti = bti; return; case SEL_OP_TYPED_WRITE: extra.bti = bti; return; default: GBE_ASSERT(0); } } /*! Only the Selection class can create a SelectionInstruction */ SelectionInstruction(SelectionOpcode, uint32_t dstNum, uint32_t srcNum); // Allocates (with a linear allocator) and owns SelectionInstruction friend class Selection; }; void outputSelectionInst(SelectionInstruction &insn); /*! Instructions like sends require registers to be contiguous in the GRF */ class SelectionVector : public NonCopyable, public intrusive_list_node { public: SelectionVector(void); /*! The instruction that requires the vector of registers */ SelectionInstruction *insn; /*! Directly points to the selection instruction registers */ GenRegister *reg; /*! Number of registers in the vector */ uint16_t regNum; /*! offset in insn src() or dst() */ uint16_t offsetID; /*! Indicate if this is a destination or a source vector */ uint16_t isSrc; }; // Owns the selection block class Selection; /*! A selection block is the counterpart of the IR Basic block. It contains * the instructions generated from an IR basic block */ class SelectionBlock : public NonCopyable, public intrusive_list_node { public: SelectionBlock(const ir::BasicBlock *bb); /*! All the emitted instructions in the block */ intrusive_list<SelectionInstruction> insnList; /*! The vectors that may be required by some instructions of the block */ intrusive_list<SelectionVector> vectorList; /*! Extra registers needed by the block (only live in the block) */ gbe::vector<ir::Register> tmp; /*! Associated IR basic block */ const ir::BasicBlock *bb; /*! Append a new temporary register */ void append(ir::Register reg); /*! Append a new selection vector in the block */ void append(SelectionVector *vec); /*! Append a new selection instruction at the end of the block */ void append(SelectionInstruction *insn); /*! Append a new selection instruction at the beginning of the block */ void prepend(SelectionInstruction *insn); ir::LabelIndex endifLabel; int endifOffset; bool hasBarrier; bool hasBranch; bool removeSimpleIfEndif; }; enum SEL_IR_OPT_FEATURE { //for OP_AND/NOT/OR/XOR, on BDW+, SrcMod value indicates a logical source modifier // on PRE-BDW, SrcMod value indicates a numeric source modifier SIOF_LOGICAL_SRCMOD = 1 << 0, //for OP_MOV, on BSW, for long data type, src and dst hstride must be aligned to the same qword SIOF_OP_MOV_LONG_REG_RESTRICT = 1 << 1, }; /*! Owns the selection engine */ class GenContext; /*! Selection engine produces the pre-ISA instruction blocks */ class Selection { public: /*! Initialize internal structures used for the selection */ Selection(GenContext &ctx); /*! Release everything */ ~Selection(void); /*! Implements the instruction selection itself */ void select(void); /*! Get the number of instructions of the largest block */ uint32_t getLargestBlockSize(void) const; /*!
Number of register vectors in the selection */ uint32_t getVectorNum(void) const; /*! Number of registers (temporaries are created during selection) */ uint32_t getRegNum(void) const; /*! Get the family for the given register */ ir::RegisterFamily getRegisterFamily(ir::Register reg) const; /*! Get the data for the given register */ ir::RegisterData getRegisterData(ir::Register reg) const; /*! Replace a source with the returned temporary register */ ir::Register replaceSrc(SelectionInstruction *insn, uint32_t regID, ir::Type type = ir::TYPE_FLOAT, bool needMov = true); /*! Replace a destination with the returned temporary register */ ir::Register replaceDst(SelectionInstruction *insn, uint32_t regID, ir::Type type = ir::TYPE_FLOAT, bool needMov = true); /*! spill a register (insert spill/unspill instructions) */ bool spillRegs(const SpilledRegs &spilledRegs, uint32_t registerPool); /*! Indicate if a register is scalar or not */ bool isScalarReg(const ir::Register &reg) const; /*! Is this register a partially written register. */ bool isPartialWrite(const ir::Register &reg) const; /*! Create a new selection instruction */ SelectionInstruction *create(SelectionOpcode, uint32_t dstNum, uint32_t srcNum); /*! List of emitted blocks */ intrusive_list<SelectionBlock> *blockList; /*! Actual implementation of the register allocator (use Pimpl) */ class Opaque; /*! Created and destroyed in cpp */ Opaque *opaque; /* optimize at selection IR level */ void optimize(void); uint32_t opt_features; /* Add insn ID for sel IR */ void addID(void); const GenContext &getCtx(); /*! Use custom allocators */ GBE_CLASS(Selection); }; class Selection75: public Selection { public: /*! Initialize internal structures used for the selection */ Selection75(GenContext &ctx); }; class Selection8: public Selection { public: /*! Initialize internal structures used for the selection */ Selection8(GenContext &ctx); }; class SelectionChv: public Selection { public: /*! Initialize internal structures used for the selection */ SelectionChv(GenContext &ctx); }; class Selection9: public Selection { public: /*! Initialize internal structures used for the selection */ Selection9(GenContext &ctx); }; class SelectionBxt: public Selection { public: /*! Initialize internal structures used for the selection */ SelectionBxt(GenContext &ctx); }; class SelectionKbl : public Selection { public: /*! Initialize internal structures used for the selection */ SelectionKbl(GenContext &ctx); }; class SelectionGlk: public Selection { public: /*! Initialize internal structures used for the selection */ SelectionGlk(GenContext &ctx); }; } /* namespace gbe */ #endif /* __GEN_INSN_SELECTION_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen_defs.hpp000664 001750 001750 00000065542 13161142102 021513 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ /* Copyright (C) Intel Corp. 2006. All Rights Reserved.
Intel funded Tungsten Graphics (http://www.tungstengraphics.com) to develop this 3D driver. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. **********************************************************************/ /* * Authors: * Keith Whitwell <keith@tungstengraphics.com> */ #ifndef __GEN_DEFS_HPP__ #define __GEN_DEFS_HPP__ #include <stdint.h> /* editor's note: the angle-bracket include target was lost in extraction; <stdint.h> assumed */ #include "backend/gen7_instruction.hpp" #include "backend/gen8_instruction.hpp" #include "backend/gen9_instruction.hpp" ///////////////////////////////////////////////////////////////////////////// // Gen EU defines ///////////////////////////////////////////////////////////////////////////// /* Execution Unit (EU) defines */ #define GEN_ALIGN_1 0 #define GEN_ALIGN_16 1 #define GEN_REG_SIZE 32 #define GEN_ADDRESS_DIRECT 0 #define GEN_ADDRESS_REGISTER_INDIRECT_REGISTER 1 #define GEN_CHANNEL_X 0 #define GEN_CHANNEL_Y 1 #define GEN_CHANNEL_Z 2 #define GEN_CHANNEL_W 3 #define GEN_COMPRESSION_Q1 0 #define GEN_COMPRESSION_Q2 1 #define GEN_COMPRESSION_Q3 2 #define GEN_COMPRESSION_Q4 3 #define GEN_COMPRESSION_H1 0 #define GEN_COMPRESSION_H2 2 #define GEN_CONDITIONAL_NONE 0 #define GEN_CONDITIONAL_Z 1 #define GEN_CONDITIONAL_NZ 2 #define GEN_CONDITIONAL_EQ 1 /* Z */ #define GEN_CONDITIONAL_NEQ 2 /* NZ */ #define GEN_CONDITIONAL_G 3 #define GEN_CONDITIONAL_GE 4 #define GEN_CONDITIONAL_L 5 #define GEN_CONDITIONAL_LE 6 #define GEN_CONDITIONAL_R 7 #define GEN_CONDITIONAL_O 8 #define GEN_CONDITIONAL_U 9 #define GEN_DEBUG_NONE 0 #define GEN_DEBUG_BREAKPOINT 1 #define GEN_DEPENDENCY_NORMAL 0 #define GEN_DEPENDENCY_NOTCLEARED 1 #define GEN_DEPENDENCY_NOTCHECKED 2 #define GEN_DEPENDENCY_DISABLE 3 #define GEN_HORIZONTAL_STRIDE_0 0 #define GEN_HORIZONTAL_STRIDE_1 1 #define GEN_HORIZONTAL_STRIDE_2 2 #define GEN_HORIZONTAL_STRIDE_4 3 #define GEN_INSTRUCTION_NORMAL 0 #define GEN_INSTRUCTION_SATURATE 1 #define GEN_MASK_ENABLE 0 #define GEN_MASK_DISABLE 1 /*!
Gen opcode */ enum opcode { GEN_OPCODE_MOV = 1, GEN_OPCODE_SEL = 2, GEN_OPCODE_NOT = 4, GEN_OPCODE_AND = 5, GEN_OPCODE_OR = 6, GEN_OPCODE_XOR = 7, GEN_OPCODE_SHR = 8, GEN_OPCODE_SHL = 9, GEN_OPCODE_RSR = 10, GEN_OPCODE_RSL = 11, GEN_OPCODE_ASR = 12, GEN_OPCODE_CMP = 16, GEN_OPCODE_CMPN = 17, GEN_OPCODE_F32TO16 = 19, GEN_OPCODE_F16TO32 = 20, GEN_OPCODE_BFREV = 23, GEN_OPCODE_JMPI = 32, GEN_OPCODE_BRD = 33, GEN_OPCODE_IF = 34, GEN_OPCODE_BRC = 35, GEN_OPCODE_ELSE = 36, GEN_OPCODE_ENDIF = 37, GEN_OPCODE_DO = 38, GEN_OPCODE_WHILE = 39, GEN_OPCODE_BREAK = 40, GEN_OPCODE_CONTINUE = 41, GEN_OPCODE_HALT = 42, GEN_OPCODE_MSAVE = 44, GEN_OPCODE_MRESTORE = 45, GEN_OPCODE_PUSH = 46, GEN_OPCODE_POP = 47, GEN_OPCODE_WAIT = 48, GEN_OPCODE_SEND = 49, GEN_OPCODE_SENDC = 50, GEN_OPCODE_SENDS = 51, GEN_OPCODE_MATH = 56, GEN_OPCODE_ADD = 64, GEN_OPCODE_MUL = 65, GEN_OPCODE_AVG = 66, GEN_OPCODE_FRC = 67, GEN_OPCODE_RNDU = 68, GEN_OPCODE_RNDD = 69, GEN_OPCODE_RNDE = 70, GEN_OPCODE_RNDZ = 71, GEN_OPCODE_MAC = 72, GEN_OPCODE_MACH = 73, GEN_OPCODE_LZD = 74, GEN_OPCODE_FBH = 75, GEN_OPCODE_FBL = 76, GEN_OPCODE_CBIT = 77, GEN_OPCODE_ADDC = 78, GEN_OPCODE_SUBB = 79, GEN_OPCODE_SAD2 = 80, GEN_OPCODE_SADA2 = 81, GEN_OPCODE_DP4 = 84, GEN_OPCODE_DPH = 85, GEN_OPCODE_DP3 = 86, GEN_OPCODE_DP2 = 87, GEN_OPCODE_DPA2 = 88, GEN_OPCODE_LINE = 89, GEN_OPCODE_PLN = 90, GEN_OPCODE_MAD = 91, GEN_OPCODE_LRP = 92, GEN_OPCODE_MADM = 93, GEN_OPCODE_NOP = 126, }; #define GEN_ATOMIC_SIMD16 0 #define GEN_ATOMIC_SIMD8 1 enum GenAtomicOpCode { GEN_ATOMIC_OP_CMPWR8B = 0, GEN_ATOMIC_OP_AND = 1, GEN_ATOMIC_OP_OR = 2, GEN_ATOMIC_OP_XOR = 3, GEN_ATOMIC_OP_MOV = 4, GEN_ATOMIC_OP_INC = 5, GEN_ATOMIC_OP_DEC = 6, GEN_ATOMIC_OP_ADD = 7, GEN_ATOMIC_OP_SUB = 8, GEN_ATOMIC_OP_REVSUB = 9, GEN_ATOMIC_OP_IMAX = 10, GEN_ATOMIC_OP_IMIN = 11, GEN_ATOMIC_OP_UMAX = 12, GEN_ATOMIC_OP_UMIN = 13, GEN_ATOMIC_OP_CMPWR = 14, GEN_ATOMIC_OP_PREDEC = 15 }; /*! Gen SFID */ enum GenMessageTarget { GEN_SFID_NULL = 0, GEN_SFID_RESERVED = 1, GEN_SFID_SAMPLER = 2, GEN_SFID_MESSAGE_GATEWAY = 3, GEN_SFID_DATAPORT_SAMPLER = 4, GEN_SFID_DATAPORT_RENDER = 5, GEN_SFID_URB = 6, GEN_SFID_THREAD_SPAWNER = 7, GEN_SFID_VIDEO_MOTION_EST = 8, GEN_SFID_DATAPORT_CONSTANT = 9, GEN_SFID_DATAPORT_DATA = 10, GEN_SFID_PIXEL_INTERPOLATOR = 11, GEN_SFID_DATAPORT1_DATA = 12, /* New for HSW and BDW. */ }; #define GEN_PREDICATE_NONE 0 #define GEN_PREDICATE_NORMAL 1 #define GEN_PREDICATE_ALIGN1_ANYV 2 #define GEN_PREDICATE_ALIGN1_ALLV 3 #define GEN_PREDICATE_ALIGN1_ANY2H 4 #define GEN_PREDICATE_ALIGN1_ALL2H 5 #define GEN_PREDICATE_ALIGN1_ANY4H 6 #define GEN_PREDICATE_ALIGN1_ALL4H 7 #define GEN_PREDICATE_ALIGN1_ANY8H 8 #define GEN_PREDICATE_ALIGN1_ALL8H 9 #define GEN_PREDICATE_ALIGN1_ANY16H 10 #define GEN_PREDICATE_ALIGN1_ALL16H 11 #define GEN_PREDICATE_ALIGN16_REPLICATE_X 2 #define GEN_PREDICATE_ALIGN16_REPLICATE_Y 3 #define GEN_PREDICATE_ALIGN16_REPLICATE_Z 4 #define GEN_PREDICATE_ALIGN16_REPLICATE_W 5 #define GEN_PREDICATE_ALIGN16_ANY4H 6 #define GEN_PREDICATE_ALIGN16_ALL4H 7 #define GEN_ARCHITECTURE_REGISTER_FILE 0 #define GEN_GENERAL_REGISTER_FILE 1 #define GEN_IMMEDIATE_VALUE 3 #define GEN_TYPE_UD 0 #define GEN_TYPE_D 1 #define GEN_TYPE_UW 2 #define GEN_TYPE_W 3 #define GEN_TYPE_UB 4 #define GEN_TYPE_B 5 #define GEN_TYPE_VF 5 /* packed float vector, immediates only? 
*/ #define GEN_TYPE_V 6 /* packed int vector, immediates only, uword dest only */ #define GEN_TYPE_DF 6 #define GEN_TYPE_F 7 #define GEN_TYPE_UL 8 #define GEN_TYPE_L 9 #define GEN_TYPE_HF 10 #define GEN_TYPE_DF_IMM 10 /* For the double float in imm. */ #define GEN_TYPE_HF_IMM 11 /* For the half float in imm. */ #define GEN_ARF_NULL 0x00 #define GEN_ARF_ADDRESS 0x10 #define GEN_ARF_ACCUMULATOR 0x20 #define GEN_ARF_FLAG 0x30 #define GEN_ARF_MASK 0x40 #define GEN_ARF_MASK_STACK 0x50 #define GEN_ARF_MASK_STACK_DEPTH 0x60 #define GEN_ARF_STATE 0x70 #define GEN_ARF_CONTROL 0x80 #define GEN_ARF_NOTIFICATION_COUNT 0x90 #define GEN_ARF_IP 0xA0 #define GEN_ARF_TM 0xC0 #define GEN_MRF_COMPR4 (1 << 7) #define GEN_AMASK 0 #define GEN_IMASK 1 #define GEN_LMASK 2 #define GEN_CMASK 3 #define GEN_THREAD_NORMAL 0 #define GEN_THREAD_ATOMIC 1 #define GEN_THREAD_SWITCH 2 #define GEN_VERTICAL_STRIDE_0 0 #define GEN_VERTICAL_STRIDE_1 1 #define GEN_VERTICAL_STRIDE_2 2 #define GEN_VERTICAL_STRIDE_4 3 #define GEN_VERTICAL_STRIDE_8 4 #define GEN_VERTICAL_STRIDE_16 5 #define GEN_VERTICAL_STRIDE_32 6 #define GEN_VERTICAL_STRIDE_64 7 #define GEN_VERTICAL_STRIDE_128 8 #define GEN_VERTICAL_STRIDE_256 9 #define GEN_VERTICAL_STRIDE_ONE_DIMENSIONAL 0xF /* Execution width */ #define GEN_WIDTH_1 0 #define GEN_WIDTH_2 1 #define GEN_WIDTH_4 2 #define GEN_WIDTH_8 3 #define GEN_WIDTH_16 4 #define GEN_WIDTH_32 5 /* Channels to enable for the untyped reads and writes */ #define GEN_UNTYPED_RED (1 << 0) #define GEN_UNTYPED_GREEN (1 << 1) #define GEN_UNTYPED_BLUE (1 << 2) #define GEN_UNTYPED_ALPHA (1 << 3) /* SIMD mode for untyped reads and writes */ #define GEN_UNTYPED_SIMD4x2 0 #define GEN_UNTYPED_SIMD16 1 #define GEN_UNTYPED_SIMD8 2 /* SIMD mode for byte scatters / gathers */ #define GEN_BYTE_SCATTER_SIMD8 0 #define GEN_BYTE_SCATTER_SIMD16 1 /* Data port message type for gen7*/ #define GEN7_OBLOCK_READ 0 //0000: OWord Block Read #define GEN7_UNALIGNED_OBLOCK_READ 1 //0001: Unaligned OWord Block Read #define GEN7_ODBLOCK_READ 2 //0010: OWord Dual Block Read #define GEN7_DWORD_GATHER 3 //0011: DWord Scattered Read #define GEN7_BYTE_GATHER 4 //0100: Byte Scattered Read #define GEN7_UNTYPED_READ 5 //0101: Untyped Surface Read #define GEN7_UNTYPED_ATOMIC_READ 6 //0110: Untyped Atomic Operation #define GEN7_MEMORY_FENCE 7 //0111: Memory Fence #define GEN7_OBLOCK_WRITE 8 //1000: OWord Block Write #define GEN7_ODBLOCK_WRITE 10//1010: OWord Dual Block Write #define GEN7_DWORD_SCATTER 11//1011: DWord Scattered Write #define GEN7_BYTE_SCATTER 12//1100: Byte Scattered Write #define GEN7_UNTYPED_WRITE 13//1101: Untyped Surface Write /* Data port0 message type for Gen75*/ #define GEN75_P0_OBLOCK_READ 0 //0000: OWord Block Read #define GEN75_P0_UNALIGNED_OBLOCK_READ 1 //0001: Unaligned OWord Block Read #define GEN75_P0_ODBLOCK_READ 2 //0010: OWord Dual Block Read #define GEN75_P0_DWORD_GATHER 3 //0011: DWord Scattered Read #define GEN75_P0_BYTE_GATHER 4 //0100: Byte Scattered Read #define GEN75_P0_MEMORY_FENCE 7 //0111: Memory Fence #define GEN75_P0_OBLOCK_WRITE 8 //1000: OWord Block Write #define GEN75_P0_ODBLOCK_WRITE 10 //1010: OWord Dual Block Write #define GEN75_P0_DWORD_SCATTER 11 //1011: DWord Scattered Write #define GEN75_P0_BYTE_SCATTER 12 //1100: Byte Scattered Write /* Data port1 message type for Gen75*/ #define GEN75_P1_UNTYPED_READ 1 //0001: Untyped Surface Read #define GEN75_P1_UNTYPED_ATOMIC_OP 2 //0010: Untyped Atomic Operation #define GEN75_P1_UNTYPED_ATOMIC_OP_4X2 3 //0011: Untyped Atomic Operation SIMD4x2 #define 
GEN75_P1_MEDIA_BREAD 4 //0100: Media Block Read #define GEN75_P1_TYPED_SURFACE_READ 5 //0101: Typed Surface Read #define GEN75_P1_TYPED_ATOMIC_OP 6 //0110: Typed Atomic Operation #define GEN75_P1_TYPED_ATOMIC_OP_4X2 7 //0111: Typed Atomic Operation SIMD4x2 #define GEN75_P1_UNTYPED_SURFACE_WRITE 9 //1001: Untyped Surface Write #define GEN75_P1_MEDIA_TYPED_BWRITE 10 //1010: Media Block Write #define GEN75_P1_ATOMIC_COUNTER 11 //1011: Atomic Counter Operation #define GEN75_P1_ATOMIC_COUNTER_4X2 12 //1100: Atomic Counter Operation 4X2 #define GEN75_P1_TYPED_SURFACE_WRITE 13 //1101: Typed Surface Write #define GEN8_P1_BLOCK_READ_A64 20 //10100 #define GEN8_P1_BLOCK_WRITE_A64 21 //10101 #define GEN8_P1_BYTE_GATHER_A64 16 //10000 #define GEN8_P1_UNTYPED_READ_A64 17 //10001 #define GEN8_P1_UNTYPED_ATOMIC_A64 18 //10010 #define GEN8_P1_UNTYPED_WRITE_A64 25 //11001 #define GEN8_P1_BYTE_SCATTER_A64 26 //11010 /* Data port data cache scratch messages*/ #define GEN_SCRATCH_READ 0 #define GEN_SCRATCH_WRITE 1 #define GEN_SCRATCH_CHANNEL_MODE_OWORD 0 #define GEN_SCRATCH_CHANNEL_MODE_DWORD 1 #define GEN_SCRATCH_BLOCK_SIZE_1 0 #define GEN_SCRATCH_BLOCK_SIZE_2 1 #define GEN_SCRATCH_BLOCK_SIZE_4 3 /* Data port render cache Message Type*/ #define GEN_MBLOCK_READ 4 //0100: Media Block Read #define GEN_TYPED_READ 5 //0101: Typed Surface Read #define GEN_TYPED_ATOMIC 6 //0110: Typed Atomic Operation #define GEN_MEM_FENCE 7 //0111: Memory Fence #define GEN_MBLOCK_WRITE 10 //1010: Media Block Write #define GEN_RENDER_WRITE 12 //1100: Render Target Write #define GEN_TYPED_WRITE 13 //1101: Typed Surface Write /* For byte scatters and gathers, the element to write */ #define GEN_BYTE_SCATTER_BYTE 0 #define GEN_BYTE_SCATTER_WORD 1 #define GEN_BYTE_SCATTER_DWORD 2 #define GEN_BYTE_SCATTER_QWORD 3 /* dword scattered rw */ #define GEN_DWORD_SCATTER_8_DWORDS 2 #define GEN_DWORD_SCATTER_16_DWORDS 3 #define GEN_SAMPLER_RETURN_FORMAT_FLOAT32 0 #define GEN_SAMPLER_RETURN_FORMAT_UINT32 2 #define GEN_SAMPLER_RETURN_FORMAT_SINT32 3 #define GEN_SAMPLER_MESSAGE_SIMD8_SAMPLE 0 #define GEN_SAMPLER_MESSAGE_SIMD16_SAMPLE 0 #define GEN_SAMPLER_MESSAGE_SIMD16_SAMPLE_BIAS 0 #define GEN_SAMPLER_MESSAGE_SIMD8_KILLPIX 1 #define GEN_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_LOD 1 #define GEN_SAMPLER_MESSAGE_SIMD16_SAMPLE_LOD 1 #define GEN_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_GRADIENTS 2 #define GEN_SAMPLER_MESSAGE_SIMD8_SAMPLE_GRADIENTS 2 #define GEN_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_COMPARE 0 #define GEN_SAMPLER_MESSAGE_SIMD16_SAMPLE_COMPARE 2 #define GEN_SAMPLER_MESSAGE_SIMD8_SAMPLE_BIAS_COMPARE 0 #define GEN_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_LOD_COMPARE 1 #define GEN_SAMPLER_MESSAGE_SIMD8_SAMPLE_LOD_COMPARE 1 #define GEN_SAMPLER_MESSAGE_SIMD4X2_RESINFO 2 #define GEN_SAMPLER_MESSAGE_SIMD16_RESINFO 2 #define GEN_SAMPLER_MESSAGE_SIMD4X2_LD 7 #define GEN_SAMPLER_MESSAGE_SIMD8_LD 7 #define GEN_SAMPLER_MESSAGE_SIMD16_LD 7 #define GEN5_SAMPLER_MESSAGE_SAMPLE 0 #define GEN5_SAMPLER_MESSAGE_SAMPLE_BIAS 1 #define GEN5_SAMPLER_MESSAGE_SAMPLE_LOD 2 #define GEN5_SAMPLER_MESSAGE_SAMPLE_COMPARE 3 #define GEN5_SAMPLER_MESSAGE_SAMPLE_DERIVS 4 #define GEN5_SAMPLER_MESSAGE_SAMPLE_BIAS_COMPARE 5 #define GEN5_SAMPLER_MESSAGE_SAMPLE_LOD_COMPARE 6 #define GEN5_SAMPLER_MESSAGE_SAMPLE_LD 7 #define GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO 10 #define GEN_SAMPLER_MESSAGE_CACHE_FLUSH 0x1f /* for GEN5 only */ #define GEN_SAMPLER_SIMD_MODE_SIMD4X2 0 #define GEN_SAMPLER_SIMD_MODE_SIMD8 1 #define GEN_SAMPLER_SIMD_MODE_SIMD16 2 #define GEN_SAMPLER_SIMD_MODE_SIMD32_64 3 #define GEN_MATH_FUNCTION_INV 1 
#define GEN_MATH_FUNCTION_LOG 2 #define GEN_MATH_FUNCTION_EXP 3 #define GEN_MATH_FUNCTION_SQRT 4 #define GEN_MATH_FUNCTION_RSQ 5 #define GEN_MATH_FUNCTION_SIN 6 /* was 7 */ #define GEN_MATH_FUNCTION_COS 7 /* was 8 */ #define GEN_MATH_FUNCTION_FDIV 9 /* gen6+ */ #define GEN_MATH_FUNCTION_POW 10 #define GEN_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER 11 #define GEN_MATH_FUNCTION_INT_DIV_QUOTIENT 12 #define GEN_MATH_FUNCTION_INT_DIV_REMAINDER 13 #define GEN8_MATH_FUNCTION_INVM 14 #define GEN8_MATH_FUNCTION_RSQRTM 15 #define GEN_MATH_INTEGER_UNSIGNED 0 #define GEN_MATH_INTEGER_SIGNED 1 #define GEN_MATH_PRECISION_FULL 0 #define GEN_MATH_PRECISION_PARTIAL 1 #define GEN_MATH_SATURATE_NONE 0 #define GEN_MATH_SATURATE_SATURATE 1 #define GEN_MATH_DATA_VECTOR 0 #define GEN_MATH_DATA_SCALAR 1 #define GEN_DEREFERENCE_URB 0 #define GEN_DO_NOT_DEREFERENCE_URB 1 #define GEN_MAX_NUM_BUFFER_ENTRIES (1 << 27) /* Message gateway */ #define GEN_OPEN_GATEWAY 0b000 #define GEN_CLOSE_GATEWAY 0b001 #define GEN_FORWARD_MSG 0b010 #define GEN_GET_TIME_STAMP 0b011 #define GEN_BARRIER_MSG 0b100 #define GEN_UPDATE_GATEWAT_STATE 0b101 #define GEN_MMIO_READ_WRITE 0b110 /* Accumulator acc2~acc9 in instruction */ #define GEN8_INSN_ACC2 0 #define GEN8_INSN_ACC3 1 #define GEN8_INSN_ACC4 2 #define GEN8_INSN_ACC5 3 #define GEN8_INSN_ACC6 4 #define GEN8_INSN_ACC7 5 #define GEN8_INSN_ACC8 6 #define GEN8_INSN_ACC9 7 #define GEN8_INSN_NOACC 8 ///////////////////////////////////////////////////////////////////////////// // Gen EU structures ///////////////////////////////////////////////////////////////////////////// /** Number of general purpose registers (VS, WM, etc) */ #define GEN_MAX_GRF 128 /* Instruction format for the execution units */ struct GenInstruction { uint32_t low; uint32_t high; }; union GenCompactInstruction { struct GenInstruction low; /* Gen8+ src3 compact inst */ struct { struct { uint32_t opcode:7; uint32_t pad:1; uint32_t control_index:2; uint32_t src_index:2; uint32_t dst_reg_nr:7; uint32_t pad1:9; uint32_t src0_rep_ctrl:1; uint32_t compact_control:1; uint32_t debug_control:1; uint32_t saturate:1; } bits1; struct { uint32_t src1_rep_ctrl:1; uint32_t src2_rep_ctrl:1; uint32_t src0_subnr:3; uint32_t src1_subnr:3; uint32_t src2_subnr:3; uint32_t src0_reg_nr:7; uint32_t src1_reg_nr:7; uint32_t src2_reg_nr:7; } bits2; } src3Insn; /* Normal src2 compact inst */ struct { struct { uint32_t opcode:7; uint32_t debug_control:1; uint32_t control_index:5; uint32_t data_type_index:5; uint32_t sub_reg_index:5; uint32_t acc_wr_control:1; uint32_t destreg_or_condmod:4; uint32_t pad:1; uint32_t cmpt_control:1; uint32_t src0_index_lo:2; } bits1; struct { uint32_t src0_index_hi:3; uint32_t src1_index:5; uint32_t dest_reg_nr:8; uint32_t src0_reg_nr:8; uint32_t src1_reg_nr:8; } bits2; }; }; union GenNativeInstruction { struct { struct GenInstruction low; struct GenInstruction high; }; union Gen7NativeInstruction gen7_insn; union Gen8NativeInstruction gen8_insn; union Gen9NativeInstruction gen9_insn; //Gen7 & Gen8 common field struct { struct { uint32_t opcode:7; uint32_t pad:1; uint32_t access_mode:1; uint32_t pad1:3; uint32_t quarter_control:2; uint32_t thread_control:2; uint32_t predicate_control:4; uint32_t predicate_inverse:1; uint32_t execution_size:3; uint32_t destreg_or_condmod:4; uint32_t acc_wr_control:1; uint32_t cmpt_control:1; uint32_t debug_control:1; uint32_t saturate:1; } header; struct { uint32_t pad1:32; } bits1; struct { uint32_t pad2:32; } bits2; union { struct { uint32_t function_control:19; uint32_t 
header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } generic_gen5; struct { uint32_t sub_function_id:3; uint32_t pad0:11; uint32_t ack_req:1; uint32_t notify:2; uint32_t pad1:2; uint32_t header:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } msg_gateway; struct { uint32_t opcode:1; uint32_t request:1; uint32_t pad0:2; uint32_t resource:1; uint32_t pad1:14; uint32_t header:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } spawner_gen5; /** Ironlake PRM, Volume 4 Part 1, Section 6.1.1.1 */ struct { uint32_t function:4; uint32_t int_type:1; uint32_t precision:1; uint32_t saturate:1; uint32_t data_type:1; uint32_t snapshot:1; uint32_t pad0:10; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } math_gen5; struct { uint32_t bti:8; uint32_t sampler:4; uint32_t msg_type:5; uint32_t simd_mode:2; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } sampler_gen7; struct { uint32_t bti:8; uint32_t vme_search_path_lut:3; uint32_t lut_sub:2; uint32_t msg_type:2; uint32_t stream_in:1; uint32_t stream_out:1; uint32_t reserved_mbz:2; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } vme_gen7; /** * Message for the Sandybridge Sampler Cache or Constant Cache Data Port. * * See the Sandybridge PRM, Volume 4 Part 1, Section 3.9.2.1.1. **/ struct { uint32_t bti:8; uint32_t msg_control:5; uint32_t msg_type:3; uint32_t pad0:3; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } gen6_dp_sampler_const_cache; /*! Data port untyped read / write messages */ struct { uint32_t bti:8; uint32_t rgba:4; uint32_t simd_mode:2; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_untyped_rw; /*! Data port byte scatter / gather */ struct { uint32_t bti:8; uint32_t simd_mode:1; uint32_t ignored0:1; uint32_t data_size:2; uint32_t ignored1:2; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_byte_rw; /*! Data port Scratch Read/ write */ struct { uint32_t offset:12; uint32_t block_size:2; uint32_t ignored0:1; uint32_t invalidate_after_read:1; uint32_t channel_mode:1; uint32_t msg_type:1; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_scratch_rw; /*! Data port OBlock read / write */ struct { uint32_t bti:8; uint32_t block_size:3; uint32_t ignored:2; uint32_t invalidate_after_read:1; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_oblock_rw; /*! Data port dword scatter / gather */ struct { uint32_t bti:8; uint32_t block_size:2; uint32_t ignored0:3; uint32_t invalidate_after_read:1; uint32_t msg_type:4; uint32_t ignored1:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_dword_rw; /*! 
Data port typed read / write messages */ struct { uint32_t bti:8; uint32_t chan_mask:4; uint32_t slot:2; uint32_t msg_type:4; uint32_t pad2:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_typed_rw; /*! Memory fence */ struct { uint32_t bti:8; uint32_t pad:5; uint32_t commit_enable:1; uint32_t msg_type:4; uint32_t pad2:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_memory_fence; /*! atomic messages */ struct { uint32_t bti:8; uint32_t aop_type:4; uint32_t simd_mode:1; uint32_t return_data:1; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_atomic_op; /*! Message gateway */ struct { uint32_t subfunc:3; uint32_t pad:11; uint32_t ackreq:1; uint32_t notify:2; uint32_t pad2:2; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_msg_gw; struct { uint32_t jip:32; } gen8_branch; /*! Data port Media block read / write */ struct { uint32_t bti:8; uint32_t ver_line_stride_offset:1; uint32_t ver_line_stride:1; uint32_t ver_line_stride_override:1; uint32_t ignored:3; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_mblock_rw; int d; uint32_t ud; float f; } bits3; }; }; #endif /* __GEN_DEFS_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen8_encoder.cpp000664 001750 001750 00000107471 13161142102 022272 0ustar00yryr000000 000000 /* Copyright (C) Intel Corp. 2006. All Rights Reserved. Intel funded Tungsten Graphics (http://www.tungstengraphics.com) to develop this 3D driver. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
**********************************************************************/ #include "backend/gen8_encoder.hpp" static const uint32_t untypedRWMask[] = { GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE|GEN_UNTYPED_GREEN|GEN_UNTYPED_RED, GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE|GEN_UNTYPED_GREEN, GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE, GEN_UNTYPED_ALPHA, 0 }; namespace gbe { extern bool compactAlu3(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1, GenRegister src2); void Gen8Encoder::setHeader(GenNativeInstruction *insn) { Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; if (this->curr.execWidth == 8) gen8_insn->header.execution_size = GEN_WIDTH_8; else if (this->curr.execWidth == 16) gen8_insn->header.execution_size = GEN_WIDTH_16; else if (this->curr.execWidth == 1) gen8_insn->header.execution_size = GEN_WIDTH_1; else if (this->curr.execWidth == 4) gen8_insn->header.execution_size = GEN_WIDTH_4; else NOT_IMPLEMENTED; gen8_insn->header.acc_wr_control = this->curr.accWrEnable; gen8_insn->header.quarter_control = this->curr.quarterControl; gen8_insn->header.nib_ctrl = this->curr.nibControl; gen8_insn->bits1.ia1.mask_control = this->curr.noMask; gen8_insn->bits1.ia1.flag_reg_nr = this->curr.flag; gen8_insn->bits1.ia1.flag_sub_reg_nr = this->curr.subFlag; if (this->curr.predicate != GEN_PREDICATE_NONE) { gen8_insn->header.predicate_control = this->curr.predicate; gen8_insn->header.predicate_inverse = this->curr.inversePredicate; } gen8_insn->header.saturate = this->curr.saturate; } void Gen8Encoder::setDPUntypedRW(GenNativeInstruction *insn, uint32_t bti, uint32_t rgba, uint32_t msg_type, uint32_t msg_length, uint32_t response_length) { Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA; setMessageDescriptor(insn, sfid, msg_length, response_length); gen8_insn->bits3.gen8_untyped_rw_a64.msg_type = msg_type; gen8_insn->bits3.gen8_untyped_rw_a64.bti = bti; gen8_insn->bits3.gen8_untyped_rw_a64.rgba = rgba; if (curr.execWidth == 8) gen8_insn->bits3.gen8_untyped_rw_a64.simd_mode = GEN_UNTYPED_SIMD8; else if (curr.execWidth == 16) gen8_insn->bits3.gen8_untyped_rw_a64.simd_mode = GEN_UNTYPED_SIMD16; else NOT_SUPPORTED; } static void setDPByteScatterGatherA64(GenEncoder *p, GenNativeInstruction *insn, uint32_t bti, uint32_t block_size, uint32_t data_size, uint32_t msg_type, uint32_t msg_length, uint32_t response_length) { const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA; Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; p->setMessageDescriptor(insn, sfid, msg_length, response_length); gen8_insn->bits3.gen8_scatter_rw_a64.msg_type = msg_type; gen8_insn->bits3.gen8_scatter_rw_a64.bti = bti; gen8_insn->bits3.gen8_scatter_rw_a64.data_sz = data_size; gen8_insn->bits3.gen8_scatter_rw_a64.block_sz = block_size; GBE_ASSERT(p->curr.execWidth == 8); } void Gen8Encoder::setTypedWriteMessage(GenNativeInstruction *insn, unsigned char bti, unsigned char msg_type, uint32_t msg_length, bool header_present) { Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA; setMessageDescriptor(insn, sfid, msg_length, 0, header_present); gen8_insn->bits3.gen7_typed_rw.bti = bti; gen8_insn->bits3.gen7_typed_rw.msg_type = msg_type; /* Always using the low 8 slots here. 
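 * (Editor's note, added for clarity, not in the original source: the "slot"
 * field of the typed-message descriptor selects the slot group; the value 1
 * written below selects the low 8 slots, i.e. SIMD8 lanes 0-7, matching the
 * convention stated above.)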
*/ gen8_insn->bits3.gen7_typed_rw.slot = 1; } void Gen8Encoder::F16TO32(GenRegister dest, GenRegister src0) { MOV(GenRegister::retype(dest, GEN_TYPE_F), GenRegister::retype(src0, GEN_TYPE_HF)); } void Gen8Encoder::F32TO16(GenRegister dest, GenRegister src0) { MOV(GenRegister::retype(dest, GEN_TYPE_HF), GenRegister::retype(src0, GEN_TYPE_F)); } unsigned Gen8Encoder::setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum) { Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = srcNum; response_length = 1; } else if (this->curr.execWidth == 16) { msg_length = 2 * srcNum; response_length = 2; } else NOT_IMPLEMENTED; const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA; setMessageDescriptor(insn, sfid, msg_length, response_length); gen8_insn->bits3.gen7_atomic_op.msg_type = GEN75_P1_UNTYPED_ATOMIC_OP; gen8_insn->bits3.gen7_atomic_op.bti = bti; gen8_insn->bits3.gen7_atomic_op.return_data = 1; gen8_insn->bits3.gen7_atomic_op.aop_type = function; if (this->curr.execWidth == 8) gen8_insn->bits3.gen7_atomic_op.simd_mode = GEN_ATOMIC_SIMD8; else if (this->curr.execWidth == 16) gen8_insn->bits3.gen7_atomic_op.simd_mode = GEN_ATOMIC_SIMD16; else NOT_SUPPORTED; return gen8_insn->bits3.ud; } void Gen8Encoder::ATOMIC(GenRegister dst, uint32_t function, GenRegister src, GenRegister data, GenRegister bti, uint32_t srcNum, bool useSends) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA; this->setDst(insn, GenRegister::uw16grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0)); if (bti.file == GEN_IMMEDIATE_VALUE) { this->setSrc1(insn, GenRegister::immud(0)); setAtomicMessageDesc(insn, function, bti.value.ud, srcNum); } else { this->setSrc1(insn, bti); } } unsigned Gen8Encoder::setAtomicA64MessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum, int type_long) { Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; uint32_t msg_length = 0; uint32_t response_length = 0; assert(srcNum <= 3); if (this->curr.execWidth == 8) { msg_length = srcNum + 1 + type_long; if(srcNum == 3 && type_long) msg_length++; response_length = 1 + type_long; } else if (this->curr.execWidth == 16) { msg_length = 2 * (srcNum + 1); response_length = 2; } else NOT_IMPLEMENTED; const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA; setMessageDescriptor(insn, sfid, msg_length, response_length); gen8_insn->bits3.gen8_atomic_a64.msg_type = GEN8_P1_UNTYPED_ATOMIC_A64; gen8_insn->bits3.gen8_atomic_a64.bti = bti; gen8_insn->bits3.gen8_atomic_a64.return_data = 1; gen8_insn->bits3.gen8_atomic_a64.aop_type = function; gen8_insn->bits3.gen8_atomic_a64.data_size = type_long; return gen8_insn->bits3.ud; } void Gen8Encoder::ATOMICA64(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA; this->setDst(insn, GenRegister::uw16grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); int type_long = (dst.type == GEN_TYPE_UL || dst.type == GEN_TYPE_L) ? 
1: 0; setAtomicA64MessageDesc(insn, function, bti.value.ud, srcNum, type_long); } unsigned Gen8Encoder::setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) { uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = 1; response_length = elemNum; } else if (this->curr.execWidth == 16) { msg_length = 2; response_length = 2 * elemNum; } else NOT_IMPLEMENTED; setDPUntypedRW(insn, bti, untypedRWMask[elemNum], GEN75_P1_UNTYPED_READ, msg_length, response_length); return insn->bits3.ud; } void Gen8Encoder::UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); assert(elemNum >= 1 && elemNum <= 4); this->setHeader(insn); this->setDst(insn, GenRegister::uw16grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA; if (bti.file == GEN_IMMEDIATE_VALUE) { this->setSrc1(insn, GenRegister::immud(0)); setUntypedReadMessageDesc(insn, bti.value.ud, elemNum); } else { this->setSrc1(insn, bti); } } unsigned Gen8Encoder::setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) { uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = 1 + elemNum; } else if (this->curr.execWidth == 16) { msg_length = 2 * (1 + elemNum); } else NOT_IMPLEMENTED; setDPUntypedRW(insn, bti, untypedRWMask[elemNum], GEN75_P1_UNTYPED_SURFACE_WRITE, msg_length, response_length); return insn->bits3.ud; } void Gen8Encoder::UNTYPED_WRITE(GenRegister msg, GenRegister data, GenRegister bti, uint32_t elemNum, bool useSends) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); assert(elemNum >= 1 && elemNum <= 4); this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA; if (this->curr.execWidth == 8) { this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD)); } else if (this->curr.execWidth == 16) { this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW)); } else NOT_IMPLEMENTED; this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0)); if (bti.file == GEN_IMMEDIATE_VALUE) { this->setSrc1(insn, GenRegister::immud(0)); setUntypedWriteMessageDesc(insn, bti.value.ud, elemNum); } else { this->setSrc1(insn, bti); } } void Gen8Encoder::UNTYPED_READA64(GenRegister dst, GenRegister src, uint32_t elemNum) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); assert(elemNum >= 1 && elemNum <= 4); uint32_t msg_length = 0; uint32_t response_length = 0; assert(this->curr.execWidth == 8); if (this->curr.execWidth == 8) { msg_length = 2; response_length = elemNum; } else NOT_IMPLEMENTED; this->setHeader(insn); this->setDst(insn, GenRegister::uw16grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); setDPUntypedRW(insn, 255, // stateless bti untypedRWMask[elemNum], GEN8_P1_UNTYPED_READ_A64, msg_length, response_length); } void Gen8Encoder::UNTYPED_WRITEA64(GenRegister msg, uint32_t elemNum) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); assert(elemNum >= 1 && elemNum <= 4); uint32_t msg_length = 0; uint32_t response_length = 0; this->setHeader(insn); if (this->curr.execWidth == 8) { this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD)); msg_length = 2 + elemNum; } else NOT_IMPLEMENTED; this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0));
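// Editor's note (not in the original source): the binding table index 255
// passed to setDPUntypedRW below is the stateless BTI. A64 ("stateless")
// messages carry the full 64-bit address in the message payload instead of
// going through a binding-table surface, which is why these helpers take no
// surface index from the caller.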
this->setSrc1(insn, GenRegister::immud(0)); setDPUntypedRW(insn, 255, // stateless bti untypedRWMask[elemNum], GEN8_P1_UNTYPED_WRITE_A64, msg_length, response_length); } void Gen8Encoder::BYTE_GATHERA64(GenRegister dst, GenRegister src, uint32_t elemSize) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA; this->setDst(insn, GenRegister::uw16grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); //setByteGatherMessageDesc(insn, bti.value.ud, elemSize); GBE_ASSERT(this->curr.execWidth == 8); const uint32_t msg_length = 2; const uint32_t response_length = 1; setDPByteScatterGatherA64(this, insn, 0xff, 0x0, elemSize, GEN8_P1_BYTE_GATHER_A64, msg_length, response_length); } void Gen8Encoder::BYTE_SCATTERA64(GenRegister msg, uint32_t elemSize) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA; // Only SIMD8 is supported GBE_ASSERT(this->curr.execWidth == 8); this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD)); this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); const uint32_t msg_length = 3; const uint32_t response_length = 0; setDPByteScatterGatherA64(this, insn, 0xff, 0x0, elemSize, GEN8_P1_BYTE_SCATTER_A64, msg_length, response_length); } void Gen8Encoder::LOAD_INT64_IMM(GenRegister dest, GenRegister value) { MOV(dest, value); } void Gen8Encoder::JMPI(GenRegister src, bool longjmp) { alu2(this, GEN_OPCODE_JMPI, GenRegister::ip(), GenRegister::ip(), src); } void Gen8Encoder::patchJMPI(uint32_t insnID, int32_t jip, int32_t uip) { GBE_ASSERT(insnID < this->store.size()); GenNativeInstruction &insn = *(GenNativeInstruction *)&this->store[insnID]; GBE_ASSERT(insn.header.opcode == GEN_OPCODE_JMPI || insn.header.opcode == GEN_OPCODE_BRD || insn.header.opcode == GEN_OPCODE_ENDIF || insn.header.opcode == GEN_OPCODE_IF || insn.header.opcode == GEN_OPCODE_BRC || insn.header.opcode == GEN_OPCODE_WHILE || insn.header.opcode == GEN_OPCODE_ELSE); if (insn.header.opcode == GEN_OPCODE_WHILE) { // If this WHILE instruction jumps back to an ELSE instruction, we need to // add some distance so that execution resumes at the instruction after the ELSE. GenNativeInstruction &insn_else = *(GenNativeInstruction *)&this->store[insnID+jip]; if (insn_else.header.opcode == GEN_OPCODE_ELSE) { jip += 2; } } if (insn.header.opcode == GEN_OPCODE_ELSE) uip = jip; if (insn.header.opcode == GEN_OPCODE_IF || insn.header.opcode == GEN_OPCODE_ELSE) { Gen8NativeInstruction *gen8_insn = &insn.gen8_insn; this->setSrc0(&insn, GenRegister::immud(0)); gen8_insn->bits2.gen8_branch.uip = uip*8; gen8_insn->bits3.gen8_branch.jip = jip*8; return; } else if (insn.header.opcode == GEN_OPCODE_JMPI) { // The jump distance is counted in QWords while the JMPI offset is encoded in // bytes, hence the multiplication by 8 below; subtract 2 QWords (one full // instruction) because the offset is relative to the next instruction. jip = (jip - 2); } this->setSrc1(&insn, GenRegister::immd(jip*8)); } void Gen8Encoder::FENCE(GenRegister dst, bool flushRWCache) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; this->setHeader(insn); this->setDst(insn, dst); this->setSrc0(insn, dst); setMessageDescriptor(insn, GEN_SFID_DATAPORT_DATA, 1, 1, 1); gen8_insn->bits3.gen7_memory_fence.msg_type = GEN_MEM_FENCE; gen8_insn->bits3.gen7_memory_fence.commit_enable = 0x1; gen8_insn->bits3.gen7_memory_fence.flush_rw = flushRWCache ?
1 : 0; } void Gen8Encoder::FLUSH_SAMPLERCACHE(GenRegister dst) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); this->setDst(insn, dst); this->setSrc0(insn, GenRegister::ud8grf(0,0)); unsigned msg_type = GEN_SAMPLER_MESSAGE_CACHE_FLUSH; unsigned simd_mode = GEN_SAMPLER_SIMD_MODE_SIMD32_64; setSamplerMessage(insn, 0, 0, msg_type, 1, 1, true, simd_mode, 0); } void Gen8Encoder::setDst(GenNativeInstruction *insn, GenRegister dest) { Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; if (dest.file != GEN_ARCHITECTURE_REGISTER_FILE) assert(dest.nr < 128); gen8_insn->bits1.da1.dest_reg_file = dest.file; gen8_insn->bits1.da1.dest_reg_type = dest.type; gen8_insn->bits1.da1.dest_address_mode = dest.address_mode; gen8_insn->bits1.da1.dest_reg_nr = dest.nr; gen8_insn->bits1.da1.dest_subreg_nr = dest.subnr; if (dest.hstride == GEN_HORIZONTAL_STRIDE_0) { if (dest.type == GEN_TYPE_UB || dest.type == GEN_TYPE_B) dest.hstride = GEN_HORIZONTAL_STRIDE_4; else if (dest.type == GEN_TYPE_UW || dest.type == GEN_TYPE_W) dest.hstride = GEN_HORIZONTAL_STRIDE_2; else dest.hstride = GEN_HORIZONTAL_STRIDE_1; } gen8_insn->bits1.da1.dest_horiz_stride = dest.hstride; } void Gen8Encoder::setSrc0WithAcc(GenNativeInstruction *insn, GenRegister reg, uint32_t accN) { Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; assert(reg.file == GEN_GENERAL_REGISTER_FILE); assert(reg.nr < 128); assert(gen8_insn->header.access_mode == GEN_ALIGN_16); assert(reg.subnr == 0); assert(gen8_insn->header.execution_size >= GEN_WIDTH_4); gen8_insn->bits1.da16acc.src0_reg_file = reg.file; gen8_insn->bits1.da16acc.src0_reg_type = reg.type; gen8_insn->bits2.da16acc.src0_abs = reg.absolute; gen8_insn->bits2.da16acc.src0_negate = reg.negation; gen8_insn->bits2.da16acc.src0_address_mode = reg.address_mode; gen8_insn->bits2.da16acc.src0_subreg_nr = reg.subnr / 16; gen8_insn->bits2.da16acc.src0_reg_nr = reg.nr; gen8_insn->bits2.da16acc.src0_special_acc_lo = accN; gen8_insn->bits2.da16acc.src0_special_acc_hi = 0; gen8_insn->bits2.da16acc.src0_vert_stride = reg.vstride; } void Gen8Encoder::setSrc1WithAcc(GenNativeInstruction *insn, GenRegister reg, uint32_t accN) { Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; assert(reg.file == GEN_GENERAL_REGISTER_FILE); assert(reg.nr < 128); assert(gen8_insn->header.access_mode == GEN_ALIGN_16); assert(reg.subnr == 0); assert(gen8_insn->header.execution_size >= GEN_WIDTH_4); gen8_insn->bits2.da16acc.src1_reg_file = reg.file; gen8_insn->bits2.da16acc.src1_reg_type = reg.type; gen8_insn->bits3.da16acc.src1_abs = reg.absolute; gen8_insn->bits3.da16acc.src1_negate = reg.negation; gen8_insn->bits3.da16acc.src1_address_mode = reg.address_mode; gen8_insn->bits3.da16acc.src1_subreg_nr = reg.subnr / 16; gen8_insn->bits3.da16acc.src1_reg_nr = reg.nr; gen8_insn->bits3.da16acc.src1_special_acc_lo = accN; gen8_insn->bits3.da16acc.src1_special_acc_hi = 0; gen8_insn->bits3.da16acc.src1_vert_stride = reg.vstride; } void Gen8Encoder::setSrc0(GenNativeInstruction *insn, GenRegister reg) { Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; if (reg.file != GEN_ARCHITECTURE_REGISTER_FILE) assert(reg.nr < 128); if (reg.address_mode == GEN_ADDRESS_DIRECT) { gen8_insn->bits1.da1.src0_reg_file = reg.file; gen8_insn->bits1.da1.src0_reg_type = reg.type; gen8_insn->bits2.da1.src0_abs = reg.absolute; gen8_insn->bits2.da1.src0_negate = reg.negation; gen8_insn->bits2.da1.src0_address_mode = reg.address_mode; if (reg.file == GEN_IMMEDIATE_VALUE) { if (reg.type == GEN_TYPE_L || reg.type == GEN_TYPE_UL || 
reg.type == GEN_TYPE_DF_IMM) { gen8_insn->bits3.ud = (uint32_t)(reg.value.i64 >> 32); gen8_insn->bits2.ud = (uint32_t)(reg.value.i64); } else { gen8_insn->bits3.ud = reg.value.ud; /* Required to set some fields in src1 as well: */ gen8_insn->bits2.da1.src1_reg_file = 0; /* arf */ gen8_insn->bits2.da1.src1_reg_type = reg.type; } } else { if (gen8_insn->header.access_mode == GEN_ALIGN_1) { gen8_insn->bits2.da1.src0_subreg_nr = reg.subnr; gen8_insn->bits2.da1.src0_reg_nr = reg.nr; } else { gen8_insn->bits2.da16.src0_subreg_nr = reg.subnr / 16; gen8_insn->bits2.da16.src0_reg_nr = reg.nr; } if (reg.width == GEN_WIDTH_1 && gen8_insn->header.execution_size == GEN_WIDTH_1) { gen8_insn->bits2.da1.src0_horiz_stride = GEN_HORIZONTAL_STRIDE_0; gen8_insn->bits2.da1.src0_width = GEN_WIDTH_1; gen8_insn->bits2.da1.src0_vert_stride = GEN_VERTICAL_STRIDE_0; } else { gen8_insn->bits2.da1.src0_horiz_stride = reg.hstride; gen8_insn->bits2.da1.src0_width = reg.width; gen8_insn->bits2.da1.src0_vert_stride = reg.vstride; } } } else { gen8_insn->bits1.ia1.src0_reg_file = GEN_GENERAL_REGISTER_FILE; gen8_insn->bits1.ia1.src0_reg_type = reg.type; gen8_insn->bits2.ia1.src0_subreg_nr = reg.a0_subnr; gen8_insn->bits2.ia1.src0_indirect_offset = (reg.addr_imm & 0x1ff); gen8_insn->bits2.ia1.src0_abs = reg.absolute; gen8_insn->bits2.ia1.src0_negate = reg.negation; gen8_insn->bits2.ia1.src0_address_mode = reg.address_mode; gen8_insn->bits2.ia1.src0_horiz_stride = reg.hstride; gen8_insn->bits2.ia1.src0_width = reg.width; gen8_insn->bits2.ia1.src0_vert_stride = reg.vstride; gen8_insn->bits2.ia1.src0_indirect_offset_9 = (reg.addr_imm & 0x200) >> 9; /* bit 9 of the 10-bit indirect offset */ } } void Gen8Encoder::setSrc1(GenNativeInstruction *insn, GenRegister reg) { Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; assert(reg.nr < 128); gen8_insn->bits2.da1.src1_reg_file = reg.file; gen8_insn->bits2.da1.src1_reg_type = reg.type; gen8_insn->bits3.da1.src1_abs = reg.absolute; gen8_insn->bits3.da1.src1_negate = reg.negation; assert(gen8_insn->bits1.da1.src0_reg_file != GEN_IMMEDIATE_VALUE); if (reg.file == GEN_IMMEDIATE_VALUE) { assert(!((reg.type == GEN_TYPE_L || reg.type == GEN_TYPE_UL || reg.type == GEN_TYPE_DF_IMM) && reg.value.u64 > 0xFFFFFFFFl)); gen8_insn->bits3.ud = reg.value.ud; } else { assert(reg.address_mode == GEN_ADDRESS_DIRECT); if (gen8_insn->header.access_mode == GEN_ALIGN_1) { gen8_insn->bits3.da1.src1_subreg_nr = reg.subnr; gen8_insn->bits3.da1.src1_reg_nr = reg.nr; } else { gen8_insn->bits3.da16.src1_subreg_nr = reg.subnr / 16; gen8_insn->bits3.da16.src1_reg_nr = reg.nr; } if (reg.width == GEN_WIDTH_1 && gen8_insn->header.execution_size == GEN_WIDTH_1) { gen8_insn->bits3.da1.src1_horiz_stride = GEN_HORIZONTAL_STRIDE_0; gen8_insn->bits3.da1.src1_width = GEN_WIDTH_1; gen8_insn->bits3.da1.src1_vert_stride = GEN_VERTICAL_STRIDE_0; } else { gen8_insn->bits3.da1.src1_horiz_stride = reg.hstride; gen8_insn->bits3.da1.src1_width = reg.width; gen8_insn->bits3.da1.src1_vert_stride = reg.vstride; } } } bool Gen8Encoder::canHandleLong(uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1) { return false; } void Gen8Encoder::handleDouble(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1) { uint32_t w = p->curr.execWidth; GenNativeInstruction *insn = NULL; if (w <= 8) { insn = p->next(opcode); p->setHeader(insn); p->setDst(insn, dst); p->setSrc0(insn, src0); if (!GenRegister::isNull(src1)) p->setSrc1(insn, src1); return; } else { GBE_ASSERT(w == 16); GBE_ASSERT(dst.hstride != GEN_HORIZONTAL_STRIDE_0); //Should not be
a uniform. p->push(); { p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q1; insn = p->next(opcode); p->setHeader(insn); p->setDst(insn, dst); p->setSrc0(insn, src0); if (!GenRegister::isNull(src1)) p->setSrc1(insn, src1); // second half p->curr.quarterControl = GEN_COMPRESSION_Q2; insn = p->next(opcode); p->setHeader(insn); p->setDst(insn, GenRegister::offset(dst, 2)); if (src0.hstride != GEN_HORIZONTAL_STRIDE_0) p->setSrc0(insn, GenRegister::offset(src0, 2)); else p->setSrc0(insn, src0); if (!GenRegister::isNull(src1)) { if (src1.hstride != GEN_HORIZONTAL_STRIDE_0) p->setSrc1(insn, GenRegister::offset(src1, 2)); else p->setSrc1(insn, src1); } } p->pop(); } } #define NO_SWIZZLE ((0<<0) | (1<<2) | (2<<4) | (3<<6)) void Gen8Encoder::alu3(uint32_t opcode, GenRegister dest, GenRegister src0, GenRegister src1, GenRegister src2) { if(compactAlu3(this, opcode, dest, src0, src1, src2)) return; GenNativeInstruction *insn = this->next(opcode); Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; int execution_size = 0; if (this->curr.execWidth == 1) { execution_size = GEN_WIDTH_1; }else if(this->curr.execWidth == 8) { execution_size = GEN_WIDTH_8; } else if(this->curr.execWidth == 16) { execution_size = GEN_WIDTH_16; }else NOT_IMPLEMENTED; assert(dest.file == GEN_GENERAL_REGISTER_FILE); assert(dest.nr < 128); assert(dest.address_mode == GEN_ADDRESS_DIRECT); assert(src0.type == GEN_TYPE_HF || src0.type == GEN_TYPE_F || src0.type == GEN_TYPE_DF); assert(src0.type == dest.type); assert(src0.type == src1.type); assert(src0.type == src2.type); int32_t dataType = src0.type == GEN_TYPE_DF ? 3 : (src0.type == GEN_TYPE_HF ? 4 : 0); //gen8_insn->bits1.da3src.dest_reg_file = 0; gen8_insn->bits1.da3src.dest_reg_nr = dest.nr; gen8_insn->bits1.da3src.dest_subreg_nr = dest.subnr / 4; gen8_insn->bits1.da3src.dest_writemask = 0xf; gen8_insn->bits1.da3src.dest_type = dataType; gen8_insn->bits1.da3src.src_type = dataType; gen8_insn->bits1.da3src.src1_type = src1.type == GEN_TYPE_HF; gen8_insn->bits1.da3src.src2_type = src2.type == GEN_TYPE_HF; this->setHeader(insn); gen8_insn->header.access_mode = GEN_ALIGN_16; gen8_insn->header.execution_size = execution_size; assert(src0.file == GEN_GENERAL_REGISTER_FILE); assert(src0.address_mode == GEN_ADDRESS_DIRECT); assert(src0.nr < 128); gen8_insn->bits2.da3src.src0_swizzle = NO_SWIZZLE; gen8_insn->bits2.da3src.src0_subreg_nr = src0.subnr / 4 ; gen8_insn->bits2.da3src.src0_reg_nr = src0.nr; gen8_insn->bits1.da3src.src0_abs = src0.absolute; gen8_insn->bits1.da3src.src0_negate = src0.negation; gen8_insn->bits2.da3src.src0_rep_ctrl = src0.vstride == GEN_VERTICAL_STRIDE_0; assert(src1.file == GEN_GENERAL_REGISTER_FILE); assert(src1.address_mode == GEN_ADDRESS_DIRECT); assert(src1.nr < 128); gen8_insn->bits2.da3src.src1_swizzle = NO_SWIZZLE; gen8_insn->bits2.da3src.src1_subreg_nr_low = (src1.subnr / 4) & 0x3; gen8_insn->bits3.da3src.src1_subreg_nr_high = (src1.subnr / 4) >> 2; gen8_insn->bits2.da3src.src1_rep_ctrl = src1.vstride == GEN_VERTICAL_STRIDE_0; gen8_insn->bits3.da3src.src1_reg_nr = src1.nr; gen8_insn->bits1.da3src.src1_abs = src1.absolute; gen8_insn->bits1.da3src.src1_negate = src1.negation; assert(src2.file == GEN_GENERAL_REGISTER_FILE); assert(src2.address_mode == GEN_ADDRESS_DIRECT); assert(src2.nr < 128); gen8_insn->bits3.da3src.src2_swizzle = NO_SWIZZLE; gen8_insn->bits3.da3src.src2_subreg_nr = src2.subnr / 4; gen8_insn->bits3.da3src.src2_rep_ctrl = src2.vstride == GEN_VERTICAL_STRIDE_0; gen8_insn->bits3.da3src.src2_reg_nr = src2.nr; 
gen8_insn->bits1.da3src.src2_abs = src2.absolute; gen8_insn->bits1.da3src.src2_negate = src2.negation; } void Gen8Encoder::MATH_WITH_ACC(GenRegister dst, uint32_t function, GenRegister src0, GenRegister src1, uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc) { GenNativeInstruction *insn = this->next(GEN_OPCODE_MATH); Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; assert(dst.file == GEN_GENERAL_REGISTER_FILE); assert(src0.file == GEN_GENERAL_REGISTER_FILE); assert(src1.file == GEN_GENERAL_REGISTER_FILE); assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == GEN_HORIZONTAL_STRIDE_0); gen8_insn->header.access_mode = GEN_ALIGN_16; insn->header.destreg_or_condmod = function; this->setHeader(insn); this->setDst(insn, dst); gen8_insn->bits1.da16acc.dst_special_acc = dstAcc; this->setSrc0WithAcc(insn, src0, src0Acc); this->setSrc1WithAcc(insn, src1, src1Acc); } void Gen8Encoder::MADM(GenRegister dst, GenRegister src0, GenRegister src1, GenRegister src2, uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t src2Acc) { GenNativeInstruction *insn = this->next(GEN_OPCODE_MADM); Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; assert(dst.file == GEN_GENERAL_REGISTER_FILE); assert(src0.file == GEN_GENERAL_REGISTER_FILE); assert(src1.file == GEN_GENERAL_REGISTER_FILE); assert(src2.file == GEN_GENERAL_REGISTER_FILE); assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == GEN_HORIZONTAL_STRIDE_0); assert(src0.type == GEN_TYPE_DF || src0.type == GEN_TYPE_F); assert(src0.type == dst.type); assert(src0.type == src1.type); assert(src0.type == src2.type); // If in double, width should be less than 4 assert((src0.type == GEN_TYPE_DF && this->curr.execWidth <= 4) // If in float, width should be less than 8 || (src0.type == GEN_TYPE_F && this->curr.execWidth <= 8)); int32_t dataType = src0.type == GEN_TYPE_DF ? 
3 : 0; this->setHeader(insn); gen8_insn->bits1.da3srcacc.dest_reg_nr = dst.nr; gen8_insn->bits1.da3srcacc.dest_subreg_nr = dst.subnr / 16; gen8_insn->bits1.da3srcacc.dst_special_acc = dstAcc; gen8_insn->bits1.da3srcacc.src_type = dataType; gen8_insn->bits1.da3srcacc.dest_type = dataType; gen8_insn->header.access_mode = GEN_ALIGN_16; assert(src0.file == GEN_GENERAL_REGISTER_FILE); assert(src0.address_mode == GEN_ADDRESS_DIRECT); assert(src0.nr < 128); gen8_insn->bits2.da3srcacc.src0_special_acc = src0Acc; gen8_insn->bits2.da3srcacc.src0_subreg_nr = src0.subnr / 4 ; gen8_insn->bits2.da3srcacc.src0_reg_nr = src0.nr; gen8_insn->bits1.da3srcacc.src0_abs = src0.absolute; gen8_insn->bits1.da3srcacc.src0_negate = src0.negation; gen8_insn->bits2.da3srcacc.src0_rep_ctrl = src0.vstride == GEN_VERTICAL_STRIDE_0; assert(src1.file == GEN_GENERAL_REGISTER_FILE); assert(src1.address_mode == GEN_ADDRESS_DIRECT); assert(src1.nr < 128); gen8_insn->bits2.da3srcacc.src1_special_acc = src1Acc; gen8_insn->bits2.da3srcacc.src1_subreg_nr_low = (src1.subnr / 4) & 0x3; gen8_insn->bits3.da3srcacc.src1_subreg_nr_high = (src1.subnr / 4) >> 2; gen8_insn->bits2.da3srcacc.src1_rep_ctrl = src1.vstride == GEN_VERTICAL_STRIDE_0; gen8_insn->bits3.da3srcacc.src1_reg_nr = src1.nr; gen8_insn->bits1.da3srcacc.src1_abs = src1.absolute; gen8_insn->bits1.da3srcacc.src1_negate = src1.negation; assert(src2.file == GEN_GENERAL_REGISTER_FILE); assert(src2.address_mode == GEN_ADDRESS_DIRECT); assert(src2.nr < 128); gen8_insn->bits3.da3srcacc.src2_special_acc = src2Acc; gen8_insn->bits3.da3srcacc.src2_subreg_nr = src2.subnr / 4; gen8_insn->bits3.da3srcacc.src2_rep_ctrl = src2.vstride == GEN_VERTICAL_STRIDE_0; gen8_insn->bits3.da3srcacc.src2_reg_nr = src2.nr; gen8_insn->bits1.da3srcacc.src2_abs = src2.absolute; gen8_insn->bits1.da3srcacc.src2_negate = src2.negation; } static void setOBlockRWA64(GenEncoder *p, GenNativeInstruction *insn, uint32_t bti, uint32_t size, uint32_t msg_type, uint32_t msg_length, uint32_t response_length) { const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA; p->setMessageDescriptor(insn, sfid, msg_length, response_length); Gen8NativeInstruction *gen8_insn = &insn->gen8_insn; gen8_insn->bits3.gen8_block_rw_a64.msg_type = msg_type; gen8_insn->bits3.gen8_block_rw_a64.bti = bti; // For OWord Block read, we use unaligned read gen8_insn->bits3.gen8_block_rw_a64.msg_sub_type = msg_type == GEN8_P1_BLOCK_READ_A64 ? 1 : 0; gen8_insn->bits3.gen8_block_rw_a64.block_size = size; gen8_insn->bits3.gen8_block_rw_a64.header_present = 1; } void Gen8Encoder::OBREADA64(GenRegister dst, GenRegister header, uint32_t bti, uint32_t ow_size) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); const uint32_t msg_length = 1; uint32_t sizeinreg = ow_size / 2; // half reg should also have size 1 sizeinreg = sizeinreg == 0 ? 1 : sizeinreg; const uint32_t block_size = getOBlockSize(ow_size, dst.subnr == 0); const uint32_t response_length = sizeinreg; // Size is in reg this->setHeader(insn); this->setDst(insn, GenRegister::uw16grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(header.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); setOBlockRWA64(this, insn, bti, block_size, GEN8_P1_BLOCK_READ_A64, msg_length, response_length); } void Gen8Encoder::OBWRITEA64(GenRegister header, uint32_t bti, uint32_t ow_size) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); uint32_t sizeinreg = ow_size / 2; // half reg should also have size 1 sizeinreg = sizeinreg == 0 ? 
1 : sizeinreg; const uint32_t msg_length = 1 + sizeinreg; // Size is in reg and header const uint32_t response_length = 0; const uint32_t block_size = getOBlockSize(ow_size); this->setHeader(insn); this->setSrc0(insn, GenRegister::ud8grf(header.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW)); setOBlockRWA64(this, insn, bti, block_size, GEN8_P1_BLOCK_WRITE_A64, msg_length, response_length); } } /* End of the namespace. */ Beignet-1.3.2-Source/backend/src/backend/gen_insn_selection.cpp000664 001750 001750 00001161151 13173554000 023602 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ /** * \file gen_insn_selection.cpp * \author Benjamin Segovia */ /* This is the instruction selection code. First of all, this is a bunch of C++ * crap. Sorry if this is not that readable. Anyway, the goal here is to take * GenIR code (i.e. the very regular, very RISC IR) and to produce GenISA with * virtual registers (i.e. regular GenIR registers). * * Overall idea: * ============= * * There are a lot of papers and research about this, but I tried to keep it * simple. No dynamic programming, nothing like that. Just a recursive maximal * munch. * * Basically, the code is executed per basic block from bottom to top. Patterns * of GenIR instructions are defined, and each instruction is matched against the * best pattern, i.e. the pattern that catches the largest number of * instructions. Once matched, a sequence of instructions is output. * * Each instruction the match depends on is then marked as a "root", i.e. we * indicate that each of these instructions must be generated: we need their * destinations for the next instructions (remember that we generate the code in * reverse order). * * Patterns: * ========= * * There are a lot of patterns, and I obviously did not implement all of them. I * just quickly gathered the code needed to make pattern implementation kind of * easy. Adding a pattern is pretty verbose, but it should not be too hard * to add new ones. * * To create and register patterns, I just abused C++ pre-main (static constructors). A bunch of * patterns is then created and sorted per opcode (i.e. the opcode of the root * of the pattern): this creates a library of patterns that may be used at * run-time. * * Predication / Masking and CFG linearization * =========================================== * * The current version is based on an unfortunate choice. Basically, the problem * to solve is how to map unstructured branches (i.e. regular gotos) onto Gen. * Gen has native support for structured branches (if/else/endif/while...) but * nothing really native for unstructured branches. * * The idea we implemented is simple.
We stole one flag register (here f0.0) to * mask all the instructions (and only activate the proper SIMD lanes) and we * use the CFG linearization technique to properly handle the control flow. This * is not really good for one particular reason: Gen instructions must use the * *same* flag register for the predicates (used for masking) and the * conditional modifier (used as a destination for CMP). This leads to extra * complications with compare instructions and select instructions. Basically, * we need to insert extra MOVs. * * Also, there is some extra kludge to handle the predicates for JMPI. * * TODO: * ===== * * Sadly, I recreated here a new DAG class. This is just a bad idea since we * already have the DAG per basic block with the Function graph i.e. the * complete graph of uses and definitions. I think we should be able to save a * lot of code here if we can simply reuse the code from UD / DU chains. * * Finally, cross-block instruction selection is quite possible with this simple * approach. Basically, instructions from dominating blocks could be merged and * matched with other instructions in the dominated block. This leads to the * interesting approach which consists in traversing the dominator tree in post * order. * * We already use if/endif to enclose each basic block. We will continue to identify * those blocks which could map to structured branching and use purely structured * instructions to handle them completely. */ #include "backend/gen_insn_selection.hpp" #include "backend/gen_context.hpp" #include "ir/function.hpp" #include "ir/liveness.hpp" #include "ir/profile.hpp" #include "sys/cvar.hpp" #include "sys/vector.hpp" #include <algorithm> #include <climits> namespace gbe { /////////////////////////////////////////////////////////////////////////// // Helper functions /////////////////////////////////////////////////////////////////////////// uint32_t getGenType(ir::Type type) { using namespace ir; switch (type) { case TYPE_BOOL: return GEN_TYPE_W; case TYPE_S8: return GEN_TYPE_B; case TYPE_U8: return GEN_TYPE_UB; case TYPE_S16: return GEN_TYPE_W; case TYPE_U16: return GEN_TYPE_UW; case TYPE_S32: return GEN_TYPE_D; case TYPE_U32: return GEN_TYPE_UD; case TYPE_S64: return GEN_TYPE_L; case TYPE_U64: return GEN_TYPE_UL; case TYPE_FLOAT: return GEN_TYPE_F; case TYPE_DOUBLE: return GEN_TYPE_DF; case TYPE_HALF: return GEN_TYPE_HF; default: NOT_SUPPORTED; return GEN_TYPE_F; } } ir::Type getIRType(uint32_t genType) { using namespace ir; switch (genType) { case GEN_TYPE_B: return TYPE_S8; case GEN_TYPE_UB: return TYPE_U8; case GEN_TYPE_W: return TYPE_S16; case GEN_TYPE_UW: return TYPE_U16; case GEN_TYPE_D: return TYPE_S32; case GEN_TYPE_UD: return TYPE_U32; case GEN_TYPE_L: return TYPE_S64; case GEN_TYPE_UL: return TYPE_U64; case GEN_TYPE_F: return TYPE_FLOAT; case GEN_TYPE_DF: return TYPE_DOUBLE; case GEN_TYPE_HF: return TYPE_HALF; default: NOT_SUPPORTED; return TYPE_FLOAT; } } uint32_t getGenCompare(ir::Opcode opcode, bool inverse = false) { using namespace ir; switch (opcode) { case OP_LE: return (!inverse) ? GEN_CONDITIONAL_LE : GEN_CONDITIONAL_G; case OP_LT: return (!inverse) ? GEN_CONDITIONAL_L : GEN_CONDITIONAL_GE; case OP_GE: return (!inverse) ? GEN_CONDITIONAL_GE : GEN_CONDITIONAL_L; case OP_GT: return (!inverse) ? GEN_CONDITIONAL_G : GEN_CONDITIONAL_LE; case OP_EQ: return (!inverse) ? GEN_CONDITIONAL_EQ : GEN_CONDITIONAL_NEQ; case OP_NE: return (!inverse) ?
GEN_CONDITIONAL_NEQ : GEN_CONDITIONAL_EQ; default: NOT_SUPPORTED; return 0u; }; } /////////////////////////////////////////////////////////////////////////// // SelectionInstruction /////////////////////////////////////////////////////////////////////////// SelectionInstruction::SelectionInstruction(SelectionOpcode op, uint32_t dst, uint32_t src) : parent(NULL), opcode(op), dstNum(dst), srcNum(src) { extra = { 0 }; } void SelectionInstruction::prepend(SelectionInstruction &other) { gbe::prepend(&other, this); other.parent = this->parent; } void SelectionInstruction::append(SelectionInstruction &other) { gbe::append(&other, this); other.parent = this->parent; } bool SelectionInstruction::isRead(void) const { return this->opcode == SEL_OP_UNTYPED_READ || this->opcode == SEL_OP_UNTYPED_READA64 || this->opcode == SEL_OP_READ64 || this->opcode == SEL_OP_READ64A64 || this->opcode == SEL_OP_ATOMIC || this->opcode == SEL_OP_ATOMICA64 || this->opcode == SEL_OP_BYTE_GATHER || this->opcode == SEL_OP_BYTE_GATHERA64 || this->opcode == SEL_OP_SAMPLE || this->opcode == SEL_OP_VME || this->opcode == SEL_OP_DWORD_GATHER || this->opcode == SEL_OP_OBREAD || this->opcode == SEL_OP_MBREAD; } bool SelectionInstruction::modAcc(void) const { return this->opcode == SEL_OP_I64SUB || this->opcode == SEL_OP_I64ADD || this->opcode == SEL_OP_MUL_HI || this->opcode == SEL_OP_HADD || this->opcode == SEL_OP_RHADD || this->opcode == SEL_OP_I64MUL || this->opcode == SEL_OP_I64_MUL_HI || this->opcode == SEL_OP_I64MADSAT || this->opcode == SEL_OP_I64DIV || this->opcode == SEL_OP_I64REM || this->opcode == SEL_OP_MACH; } bool SelectionInstruction::isWrite(void) const { return this->opcode == SEL_OP_UNTYPED_WRITE || this->opcode == SEL_OP_UNTYPED_WRITEA64 || this->opcode == SEL_OP_WRITE64 || this->opcode == SEL_OP_WRITE64A64 || this->opcode == SEL_OP_ATOMIC || this->opcode == SEL_OP_ATOMICA64 || this->opcode == SEL_OP_BYTE_SCATTER || this->opcode == SEL_OP_BYTE_SCATTERA64 || this->opcode == SEL_OP_TYPED_WRITE || this->opcode == SEL_OP_OBWRITE || this->opcode == SEL_OP_MBWRITE; } bool SelectionInstruction::isBranch(void) const { return this->opcode == SEL_OP_JMPI; } bool SelectionInstruction::isLabel(void) const { return this->opcode == SEL_OP_LABEL; } bool SelectionInstruction::sameAsDstRegion(uint32_t srcID) { assert(srcID < srcNum); if (dstNum == 0) return true; GenRegister &srcReg = this->src(srcID); for (uint32_t dstID = 0; dstID < dstNum; ++dstID) { const GenRegister &dstReg = this->dst(dstID); if (!dstReg.isSameRegion(srcReg)) return false; } return true; } bool SelectionInstruction::isNative(void) const { return this->opcode == SEL_OP_NOT || /* ALU1 */ this->opcode == SEL_OP_LZD || this->opcode == SEL_OP_RNDZ || this->opcode == SEL_OP_RNDE || this->opcode == SEL_OP_RNDD || this->opcode == SEL_OP_RNDU || this->opcode == SEL_OP_FRC || this->opcode == SEL_OP_F16TO32 || this->opcode == SEL_OP_F32TO16 || this->opcode == SEL_OP_CBIT || this->opcode == SEL_OP_SEL || /* ALU2 */ this->opcode == SEL_OP_AND || this->opcode == SEL_OP_OR || this->opcode == SEL_OP_XOR || this->opcode == SEL_OP_SHR || this->opcode == SEL_OP_SHL || this->opcode == SEL_OP_RSR || this->opcode == SEL_OP_RSL || this->opcode == SEL_OP_ASR || this->opcode == SEL_OP_SEL || this->opcode == SEL_OP_ADD || this->opcode == SEL_OP_MUL || this->opcode == SEL_OP_FBH || this->opcode == SEL_OP_FBL || this->opcode == SEL_OP_MACH || this->opcode == SEL_OP_MATH || this->opcode == SEL_OP_LRP || /* ALU3 */ this->opcode == SEL_OP_MAD; } 
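/* Editor's illustration (added; not part of the original Beignet source): the
 * "maximal munch" matching described in the file header means that a GenIR
 * sequence such as
 *   t = MUL a, b
 *   d = ADD t, c
 * can be swallowed by a single pattern when t has no other use: instead of
 * marking the MUL as a root and emitting two native instructions, the pattern
 * merges the MUL node into the ADD and emits one SEL_OP_MAD selection
 * instruction for d. The register names and exact operand order here are
 * hypothetical; the real patterns are registered in the SelectionLibrary
 * further below and sorted by cost per root opcode. */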
/////////////////////////////////////////////////////////////////////////// // SelectionVector /////////////////////////////////////////////////////////////////////////// SelectionVector::SelectionVector(void) : insn(NULL), reg(NULL), regNum(0), isSrc(0) {} /////////////////////////////////////////////////////////////////////////// // SelectionBlock /////////////////////////////////////////////////////////////////////////// SelectionBlock::SelectionBlock(const ir::BasicBlock *bb) : bb(bb), endifLabel( (ir::LabelIndex) 0), removeSimpleIfEndif(false){} void SelectionBlock::append(ir::Register reg) { tmp.push_back(reg); } void SelectionBlock::append(SelectionInstruction *insn) { this->insnList.push_back(insn); insn->parent = this; } void SelectionBlock::prepend(SelectionInstruction *insn) { this->insnList.push_front(insn); insn->parent = this; } void SelectionBlock::append(SelectionVector *vec) { this->vectorList.push_back(vec); } #define LD_MSG_ORDER_IVB 7 #define LD_MSG_ORDER_SKL 9 /////////////////////////////////////////////////////////////////////////// // Maximal munch selection on DAG /////////////////////////////////////////////////////////////////////////// /*! All instructions in a block are organized into a DAG */ class SelectionDAG { public: INLINE SelectionDAG(const ir::Instruction &insn) : insn(insn), mergeable(0), childNum(insn.getSrcNum()), isRoot(0) { GBE_ASSERT(insn.getSrcNum() <= ir::Instruction::MAX_SRC_NUM); for (uint32_t childID = 0; childID < childNum; ++childID) this->child[childID] = NULL; computeBool = false; isUsed = false; } /*! Mergeable are non-root instructions with valid sources */ INLINE void setAsMergeable(uint32_t which) { mergeable|=(1< opcodes; /*! Number of instruction generated */ uint32_t insnNum; /*! Cost of the pattern */ uint32_t cost; }; /*! Store and sort all the patterns. This is our global library we use for the * code selection */ class SelectionLibrary { public: /*! Will register all the patterns */ SelectionLibrary(void); /*! Release and destroy all the registered patterns */ ~SelectionLibrary(void); /*! Insert the given pattern for all associated opcodes */ template void insert(void); /*! One list of pattern per opcode */ typedef vector PatternList; /*! All lists of patterns properly sorted per opcode */ PatternList patterns[ir::OP_INVALID]; /*! All patterns to free */ vector toFree; }; /////////////////////////////////////////////////////////////////////////// // Code selection internal implementation /////////////////////////////////////////////////////////////////////////// /*! Actual implementation of the instruction selection engine */ class Selection::Opaque { public: /*! simdWidth is the default width for the instructions */ Opaque(GenContext &ctx); /*! Release everything */ virtual ~Opaque(void); /*! Implements the instruction selection itself */ void select(void); /*! Start a backward generation (from the end of the block) */ void startBackwardGeneration(void); /*! End backward code generation and output the code in the block */ void endBackwardGeneration(void); /*! Implement public class */ uint32_t getLargestBlockSize(void) const; /*! Implement public class */ INLINE uint32_t getVectorNum(void) const { return this->vectorNum; } /*! Implement public class */ INLINE ir::Register replaceSrc(SelectionInstruction *insn, uint32_t regID, ir::Type type, bool needMov); /*! Implement public class */ INLINE ir::Register replaceDst(SelectionInstruction *insn, uint32_t regID, ir::Type type, bool needMov); /*! 
spill a register (insert spill/unspill instructions) */ INLINE bool spillRegs(const SpilledRegs &spilledRegs, uint32_t registerPool); bool has32X32Mul() const { return bHas32X32Mul; } bool hasSends() const { return bHasSends; } void setHas32X32Mul(bool b) { bHas32X32Mul = b; } void setHasSends(bool b) { bHasSends = b; } bool hasLongType() const { return bHasLongType; } bool hasDoubleType() const { return bHasDoubleType; } bool hasHalfType() const { return bHasHalfType; } void setHasLongType(bool b) { bHasLongType = b; } void setHasDoubleType(bool b) { bHasDoubleType = b; } void setHasHalfType(bool b) { bHasHalfType = b; } bool hasLongRegRestrict() { return bLongRegRestrict; } void setLongRegRestrict(bool b) { bLongRegRestrict = b; } void setLdMsgOrder(uint32_t type) { ldMsgOrder = type; } uint32_t getLdMsgOrder() const { return ldMsgOrder; } void setSlowByteGather(bool b) { slowByteGather = b; } bool getSlowByteGather() { return slowByteGather; } /*! indicate whether a register is a scalar/uniform register. */ INLINE bool isPartialWrite(const ir::Register ®) const { return partialWriteRegs.find(reg.value()) != partialWriteRegs.end(); } INLINE bool isScalarReg(const ir::Register ®) const { const ir::RegisterData ®Data = getRegisterData(reg); return regData.isUniform(); } INLINE bool isLongReg(const ir::Register ®) const { const ir::RegisterData ®Data = getRegisterData(reg); return regData.family == ir::FAMILY_QWORD; } INLINE GenRegister unpacked_ud(const ir::Register ®) const { return GenRegister::unpacked_ud(reg, isScalarReg(reg)); } INLINE GenRegister unpacked_uw(const ir::Register ®) const { return GenRegister::unpacked_uw(reg, isScalarReg(reg), isLongReg(reg)); } INLINE GenRegister unpacked_ub(const ir::Register ®) const { return GenRegister::unpacked_ub(reg, isScalarReg(reg)); } INLINE GenRegister getOffsetReg(GenRegister reg, int nr, int subnr, bool isDst = true) { if (isDst) partialWriteRegs.insert(reg.value.reg); return GenRegister::offset(reg, nr, subnr); } GenRegister getLaneIDReg(); /*! Implement public class */ INLINE uint32_t getRegNum(void) const { return file.regNum(); } /*! Implements public interface */ INLINE ir::RegisterData getRegisterData(ir::Register reg) const { return file.get(reg); } /*! Implement public class */ INLINE ir::RegisterFamily getRegisterFamily(ir::Register reg) const { return file.get(reg).family; } /*! Implement public class */ SelectionInstruction *create(SelectionOpcode, uint32_t dstNum, uint32_t srcNum); /*! Return the selection register from the GenIR one */ GenRegister selReg(ir::Register, ir::Type type = ir::TYPE_FLOAT) const; /*! Compute the nth register part when using SIMD8 with Qn (n in 2,3,4) */ GenRegister selRegQn(ir::Register, uint32_t quarter, ir::Type type = ir::TYPE_FLOAT) const; /*! Size of the stack (should be large enough) */ enum { MAX_STATE_NUM = 16 }; /*! Push the current instruction state */ INLINE void push(void) { assert(stateNum < MAX_STATE_NUM); stack[stateNum++] = curr; } /*! Pop the latest pushed state */ INLINE void pop(void) { assert(stateNum > 0); curr = stack[--stateNum]; } /*! Create a new register in the register file and append it in the * temporary list of the current block */ INLINE ir::Register reg(ir::RegisterFamily family, bool scalar = false) { GBE_ASSERT(block != NULL); const ir::Register reg = file.append(family, scalar); block->append(reg); return reg; } /*! Append a block at the block stream tail. It becomes the current block */ void appendBlock(const ir::BasicBlock &bb); /*! 
Append an instruction in the current block */ SelectionInstruction *appendInsn(SelectionOpcode, uint32_t dstNum, uint32_t srcNum); /*! Append a new vector of registers in the current block */ SelectionVector *appendVector(void); /*! Build a DAG for the basic block (return number of instructions) */ uint32_t buildBasicBlockDAG(const ir::BasicBlock &bb); /*! Perform the selection on the basic block */ void matchBasicBlock(const ir::BasicBlock &bb, uint32_t insnNum); /*! a simple block can use predication instead of if/endif*/ bool isSimpleBlock(const ir::BasicBlock &bb, uint32_t insnNum); /*! an instruction has a QWORD family src or dst operand. */ bool hasQWord(const ir::Instruction &insn); /*! A root instruction needs to be generated */ bool isRoot(const ir::Instruction &insn) const; /*! Set debug infomation to Selection */ void setDBGInfo_SEL(DebugInfo in) { DBGInfo = in; } /*! To handle selection block allocation */ DECL_POOL(SelectionBlock, blockPool); /*! To handle selection instruction allocation */ LinearAllocator insnAllocator; /*! To handle selection vector allocation */ DECL_POOL(SelectionVector, vecPool); /*! Per register information used with top-down block sweeping */ vector regDAG; /*! Store one DAG per instruction */ vector insnDAG; /*! Owns this structure */ GenContext &ctx; /*! Tail of the code fragment for backward code generation */ intrusive_list bwdList; /*! List of emitted blocks */ intrusive_list blockList; /*! Currently processed block */ SelectionBlock *block; /*! Current instruction state to use */ GenInstructionState curr; /*! We append new registers so we duplicate the function register file */ ir::RegisterFile file; /*! State used to encode the instructions */ GenInstructionState stack[MAX_STATE_NUM]; /*! Maximum number of instructions in the basic blocks */ uint32_t maxInsnNum; /*! Speed up instruction dag allocation */ DECL_POOL(SelectionDAG, dagPool); /*! Total number of registers in the function we encode */ uint32_t regNum; /*! Number of states currently pushed */ uint32_t stateNum; /*! Number of vector allocated */ uint32_t vectorNum; /*! If true, generate code backward */ bool bwdCodeGeneration; DebugInfo DBGInfo; /*! To make function prototypes more readable */ typedef const GenRegister &Reg; /*! If true, the thread map has already been stored */ bool storeThreadMap; /*! Check for destination register. Major purpose is to find out partially updated dst registers. These registers will be unspillable. 
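(Editor's note, added: spilling such a register would require reading the old
value back from scratch space and merging in the lanes the instruction does
not write; the simple scratch read/write sequences emitted by spillRegs do no
such read-modify-write, so partially written registers must stay in the
register file.)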
*/ set partialWriteRegs; #define ALU1(OP) \ INLINE void OP(Reg dst, Reg src) { ALU1(SEL_OP_##OP, dst, src); } #define ALU1WithTemp(OP) \ INLINE void OP(Reg dst, Reg src, Reg temp) { ALU1WithTemp(SEL_OP_##OP, dst, src, temp); } #define ALU2(OP) \ INLINE void OP(Reg dst, Reg src0, Reg src1) { ALU2(SEL_OP_##OP, dst, src0, src1); } #define ALU2WithTemp(OP) \ INLINE void OP(Reg dst, Reg src0, Reg src1, Reg temp) { ALU2WithTemp(SEL_OP_##OP, dst, src0, src1, temp); } #define ALU3(OP) \ INLINE void OP(Reg dst, Reg src0, Reg src1, Reg src2) { ALU3(SEL_OP_##OP, dst, src0, src1, src2); } #define I64Shift(OP) \ INLINE void OP(Reg dst, Reg src0, Reg src1, GenRegister tmp[6]) { I64Shift(SEL_OP_##OP, dst, src0, src1, tmp); } ALU1(MOV) ALU1(READ_ARF) ALU1(LOAD_INT64_IMM) ALU1(RNDZ) ALU1(RNDE) ALU1(F16TO32) ALU1(F32TO16) ALU1WithTemp(BSWAP) ALU2(SEL) ALU2(SEL_INT64) ALU1(NOT) ALU2(AND) ALU2(OR) ALU2(XOR) ALU2(I64AND) ALU2(I64OR) ALU2(I64XOR) ALU2(SHR) ALU2(SHL) ALU2(RSR) ALU2(RSL) ALU2(ASR) ALU2(ADD) ALU2WithTemp(I64ADD) ALU2WithTemp(I64SUB) ALU2(MUL) ALU1(FRC) ALU1(RNDD) ALU1(RNDU) ALU2(MACH) ALU1(LZD) ALU3(MAD) ALU3(LRP) ALU2WithTemp(MUL_HI) ALU1(FBH) ALU1(FBL) ALU1(CBIT) ALU2WithTemp(HADD) ALU2WithTemp(RHADD) ALU2(UPSAMPLE_LONG) ALU1WithTemp(CONVI_TO_I64) ALU1WithTemp(CONVF_TO_I64) ALU1(CONVI64_TO_I) I64Shift(I64SHL) I64Shift(I64SHR) I64Shift(I64ASR) ALU1(BFREV) #undef ALU1 #undef ALU1WithTemp #undef ALU2 #undef ALU2WithTemp #undef ALU3 #undef I64Shift /*! simd shuffle */ void SIMD_SHUFFLE(Reg dst, Reg src0, Reg src1); /*! Convert 64-bit integer to 32-bit float */ void CONVI64_TO_F(Reg dst, Reg src, GenRegister tmp[6]); /*! Convert 64-bit integer to 32-bit float */ void CONVF_TO_I64(Reg dst, Reg src, GenRegister tmp[2]); /*! Saturated 64bit x*y + z */ void I64MADSAT(Reg dst, Reg src0, Reg src1, Reg src2, GenRegister* tmp, int tmp_num); /*! High 64bit of x*y */ void I64_MUL_HI(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_num); /*! (x+y)>>1 without mod. overflow */ void I64HADD(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_num); /*! (x+y+1)>>1 without mod. overflow */ void I64RHADD(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_num); /*! Shift a 64-bit integer */ void I64Shift(SelectionOpcode opcode, Reg dst, Reg src0, Reg src1, GenRegister tmp[7]); /*! Compare 64-bit integer */ void I64CMP(uint32_t conditional, Reg src0, Reg src1, GenRegister tmp[3]); /*! Saturated addition of 64-bit integer */ void I64SATADD(Reg dst, Reg src0, Reg src1, GenRegister tmp[5]); /*! Saturated subtraction of 64-bit integer */ void I64SATSUB(Reg dst, Reg src0, Reg src1, GenRegister tmp[5]); /*! Encode a barrier instruction */ void BARRIER(GenRegister src, GenRegister fence, uint32_t barrierType); /*! Encode a barrier instruction */ void FENCE(GenRegister dst); /*! Encode a label instruction */ void LABEL(ir::LabelIndex label); /*! Jump indexed instruction, return the encoded instruction count according to jump distance. */ int JMPI(Reg src, ir::LabelIndex target, ir::LabelIndex origin); /*! IF indexed instruction */ void IF(Reg src, ir::LabelIndex jip, ir::LabelIndex uip); /*! ELSE indexed instruction */ void ELSE(Reg src, ir::LabelIndex jip, ir::LabelIndex elseLabel); /*! ENDIF indexed instruction */ void ENDIF(Reg src, ir::LabelIndex jip, ir::LabelIndex endifLabel = ir::LabelIndex(0)); /*! WHILE indexed instruction */ void WHILE(Reg src, ir::LabelIndex jip); /*! BRD indexed instruction */ void BRD(Reg src, ir::LabelIndex jip); /*! 
BRC indexed instruction */ void BRC(Reg src, ir::LabelIndex jip, ir::LabelIndex uip); /*! Compare instructions */ void CMP(uint32_t conditional, Reg src0, Reg src1, Reg dst = GenRegister::null()); /*! Select instruction with embedded comparison */ void SEL_CMP(uint32_t conditional, Reg dst, Reg src0, Reg src1); /* Constant buffer move instruction */ void INDIRECT_MOVE(Reg dst, Reg tmp, Reg base, Reg regOffset, uint32_t immOffset); /*! EOT is used to finish GPGPU threads */ void EOT(void); /*! No-op */ void NOP(void); /*! Wait instruction (used for the barrier) */ void WAIT(uint32_t n = 0); /*! Atomic instruction */ void ATOMIC(Reg dst, uint32_t function, uint32_t srcNum, Reg src0, Reg src1, Reg src2, GenRegister bti, vector temps); /*! AtomicA64 instruction */ void ATOMICA64(Reg dst, uint32_t function, uint32_t srcNum, vector src, GenRegister bti, vector temps); /*! Read 64 bits float/int array */ void READ64(Reg addr, const GenRegister *dst, const GenRegister *tmp, uint32_t elemNum, const GenRegister bti, bool native_long, vector temps); /*! Write 64 bits float/int array */ void WRITE64(Reg addr, const GenRegister *src, const GenRegister *tmp, uint32_t srcNum, GenRegister bti, bool native_long, vector temps); /*! Read64 A64 */ void READ64A64(Reg addr, const GenRegister *dst, const GenRegister *tmp, uint32_t elemNum); /*! write64 a64 */ void WRITE64A64(Reg addr, const GenRegister *src, const GenRegister *tmp, uint32_t srcNum); /*! Untyped read (up to 4 elements) */ void UNTYPED_READ(Reg addr, const GenRegister *dst, uint32_t elemNum, GenRegister bti, vector temps); /*! Untyped write (up to 4 elements) */ void UNTYPED_WRITE(Reg addr, const GenRegister *src, uint32_t elemNum, GenRegister bti, vector temps); /*! Byte gather (for unaligned bytes, shorts and ints) */ void BYTE_GATHER(Reg dst, Reg addr, uint32_t elemSize, GenRegister bti, vector temps); /*! Byte scatter (for unaligned bytes, shorts and ints) */ void BYTE_SCATTER(Reg addr, Reg src, uint32_t elemSize, GenRegister bti, vector temps); /*! Byte gather a64 (for unaligned bytes, shorts and ints) */ void BYTE_GATHERA64(Reg dst, Reg addr, uint32_t elemSize); /*! Byte scatter (for unaligned bytes, shorts and ints) */ void BYTE_SCATTERA64(GenRegister *msg, unsigned msgNum, uint32_t elemSize); /*! Untyped read (up to 4 elements) */ void UNTYPED_READA64(Reg addr, const GenRegister *dst, uint32_t dstNum, uint32_t elemNum); /*! Untyped write (up to 4 elements) */ void UNTYPED_WRITEA64(const GenRegister *msgs, uint32_t msgNum, uint32_t elemNum); /*! DWord scatter (for constant cache read) */ void DWORD_GATHER(Reg dst, Reg addr, uint32_t bti); /*! Unpack the uint to charN */ void UNPACK_BYTE(const GenRegister *dst, const GenRegister src, uint32_t elemSize, uint32_t elemNum); /*! pack the charN to uint */ void PACK_BYTE(const GenRegister dst, const GenRegister *src, uint32_t elemSize, uint32_t elemNum); /*! Unpack the uint to charN */ void UNPACK_LONG(const GenRegister dst, const GenRegister src); /*! pack the charN to uint */ void PACK_LONG(const GenRegister dst, const GenRegister src); /*! Extended math function (2 arguments) */ void MATH(Reg dst, uint32_t function, Reg src0, Reg src1); /*! Extended math function (1 argument) */ void MATH(Reg dst, uint32_t function, Reg src); /*! Encode unary instructions */ void ALU1(SelectionOpcode opcode, Reg dst, Reg src); /*! Encode unary with temp reg instructions */ void ALU1WithTemp(SelectionOpcode opcode, Reg dst, Reg src0, Reg temp); /*! 
Encode binary instructions */ void ALU2(SelectionOpcode opcode, Reg dst, Reg src0, Reg src1); /*! Encode binary with temp reg instructions */ void ALU2WithTemp(SelectionOpcode opcode, Reg dst, Reg src0, Reg src1, Reg temp); /*! Encode ternary instructions */ void ALU3(SelectionOpcode opcode, Reg dst, Reg src0, Reg src1, Reg src2); /*! Encode sample instructions */ void SAMPLE(GenRegister *dst, uint32_t dstNum, GenRegister *msgPayloads, uint32_t msgNum, uint32_t bti, uint32_t sampler, bool isLD, bool isUniform); /*! Encode vme instructions */ void VME(uint32_t bti, GenRegister *dst, GenRegister *payloadVal, uint32_t dstNum, uint32_t srcNum, uint32_t msg_type, uint32_t vme_search_path_lut, uint32_t lut_sub); /*! Encode typed write instructions */ void TYPED_WRITE(GenRegister *msgs, uint32_t msgNum, uint32_t bti, bool is3D); /*! Get image information */ void GET_IMAGE_INFO(uint32_t type, GenRegister *dst, uint32_t dst_num, uint32_t bti); /*! Calculate the timestamp */ void CALC_TIMESTAMP(GenRegister ts[5], int tsN, GenRegister tmp, uint32_t pointNum, uint32_t tsType); /*! Store the profiling info */ void STORE_PROFILING(uint32_t profilingType, uint32_t bti, GenRegister tmp0, GenRegister tmp1, GenRegister ts[5], int tsNum); /*! Printf */ void PRINTF(uint8_t bti, GenRegister tmp0, GenRegister tmp1, GenRegister src[8], int srcNum, uint16_t num, bool isContinue, uint32_t totalSize); /*! Multiply 64-bit integers */ void I64MUL(Reg dst, Reg src0, Reg src1, GenRegister *tmp, bool native_long); /*! 64-bit integer division */ void I64DIV(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_int); /*! 64-bit integer remainder of division */ void I64REM(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_int); /*! double division */ void F64DIV(Reg dst, Reg src0, Reg src1, GenRegister* tmp, int tmpNum); /*! Work Group Operations */ void WORKGROUP_OP(uint32_t wg_op, Reg dst, GenRegister src, GenRegister tmpData1, GenRegister localThreadID, GenRegister localThreadNUM, GenRegister tmpData2, GenRegister slmOff, vector msg, GenRegister localBarrier); /*! Sub Group Operations */ void SUBGROUP_OP(uint32_t wg_op, Reg dst, GenRegister src, GenRegister tmpData1, GenRegister tmpData2); /*! Oblock read */ void OBREAD(GenRegister* dsts, uint32_t tmp_size, GenRegister header, uint32_t bti, uint32_t ow_size); /*! Oblock write */ void OBWRITE(GenRegister header, GenRegister* values, uint32_t tmp_size, uint32_t bti, uint32_t ow_size); /*! Media block read */ void MBREAD(GenRegister* dsts, uint32_t tmp_size, GenRegister header, uint32_t bti, uint32_t response_size); /*! Media block write */ void MBWRITE(GenRegister header, GenRegister* values, uint32_t tmp_size, uint32_t bti, uint32_t data_size); /* common functions for both binary instruction and sel_cmp and compare instruction. It will handle the IMM or normal register assignment, and will try to avoid LOADI as much as possible. */ void getSrcGenRegImm(SelectionDAG &dag, GenRegister &src0, GenRegister &src1, ir::Type type, bool &inverse); void getSrcGenRegImm(SelectionDAG &dag, SelectionDAG *dag0, SelectionDAG *dag1, GenRegister &src0, GenRegister &src1, ir::Type type, bool &inverse); /* Get current block IP register according to label width. */ GenRegister getBlockIP() { return ctx.isDWLabel() ? selReg(ir::ocl::dwblockip) : selReg(ir::ocl::blockip); } /* Get proper label immediate gen register from label value. */ GenRegister getLabelImmReg(uint32_t labelValue) { return ctx.isDWLabel() ? 
GenRegister::immud(labelValue) : GenRegister::immuw(labelValue); } /* Get proper label immediate gen register from label. */ GenRegister getLabelImmReg(ir::LabelIndex label) { return getLabelImmReg(label.value()); } /* Set current label register to a label value. */ void setBlockIP(GenRegister blockip, uint32_t labelValue) { if (!ctx.isDWLabel()) MOV(GenRegister::retype(blockip, GEN_TYPE_UW), GenRegister::immuw(labelValue)); else MOV(GenRegister::retype(blockip, GEN_TYPE_UD), GenRegister::immud(labelValue)); } /* Generate comparison instruction to compare block ip address and specified label register. */ void cmpBlockIP(uint32_t cond, GenRegister blockip, GenRegister labelReg) { if (!ctx.isDWLabel()) CMP(cond, GenRegister::retype(blockip, GEN_TYPE_UW), labelReg, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW)); else CMP(cond, GenRegister::retype(blockip, GEN_TYPE_UD), labelReg, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD)); } void cmpBlockIP(uint32_t cond, GenRegister blockip, uint32_t labelValue) { if (!ctx.isDWLabel()) CMP(cond, GenRegister::retype(blockip, GEN_TYPE_UW), GenRegister::immuw(labelValue), GenRegister::retype(GenRegister::null(), GEN_TYPE_UW)); else CMP(cond, GenRegister::retype(blockip, GEN_TYPE_UD), GenRegister::immud(labelValue), GenRegister::retype(GenRegister::null(), GEN_TYPE_UD)); } INLINE vector<GenRegister> getBTITemps(const ir::AddressMode &AM) { vector<GenRegister> temps; if (AM == ir::AM_DynamicBti) { temps.push_back(selReg(reg(ir::FAMILY_WORD, true), ir::TYPE_U16)); temps.push_back(selReg(reg(ir::FAMILY_DWORD, true), ir::TYPE_U32)); } return temps; } /*! Use custom allocators */ GBE_CLASS(Opaque); friend class SelectionBlock; friend class SelectionInstruction; private: /*! Auxiliary label for if/endif. */ uint32_t currAuxLabel; bool bHas32X32Mul; bool bHasLongType; bool bHasDoubleType; bool bHasHalfType; bool bLongRegRestrict; bool bHasSends; uint32_t ldMsgOrder; bool slowByteGather; INLINE ir::LabelIndex newAuxLabel() { currAuxLabel++; return (ir::LabelIndex)currAuxLabel; } }; /////////////////////////////////////////////////////////////////////////// // Helper function /////////////////////////////////////////////////////////////////////////// /*! Directly mark all sources as root (when no match is found) */ static void markAllChildren(SelectionDAG &dag) { // Do not merge anything, so all sources become roots for (uint32_t childID = 0; childID < dag.childNum; ++childID) if (dag.child[childID]) dag.child[childID]->isRoot = 1; } /*!
Helper function to figure if two sources are the same */ static bool sourceMatch(SelectionDAG *src0DAG, uint32_t src0ID, SelectionDAG *src1DAG, uint32_t src1ID) { GBE_ASSERT(src0DAG && src1DAG); // Ensure they are the same physical registers const ir::Register src0 = src0DAG->insn.getSrc(src0ID); const ir::Register src1 = src1DAG->insn.getSrc(src1ID); if (src0 != src1) return false; // Ensure they contain the same values return src0DAG->child[src0ID] == src1DAG->child[src1ID]; } Selection::Opaque::Opaque(GenContext &ctx) : ctx(ctx), block(NULL), curr(ctx.getSimdWidth()), file(ctx.getFunction().getRegisterFile()), maxInsnNum(ctx.getFunction().getLargestBlockSize()), dagPool(maxInsnNum), stateNum(0), vectorNum(0), bwdCodeGeneration(false), storeThreadMap(false), currAuxLabel(ctx.getFunction().labelNum()), bHas32X32Mul(false), bHasLongType(false), bHasDoubleType(false), bHasHalfType(false), bLongRegRestrict(false), bHasSends(false), ldMsgOrder(LD_MSG_ORDER_IVB), slowByteGather(false) { const ir::Function &fn = ctx.getFunction(); this->regNum = fn.regNum(); this->regDAG.resize(regNum); this->insnDAG.resize(maxInsnNum); } Selection::Opaque::~Opaque(void) { for (auto it = blockList.begin(); it != blockList.end();) { SelectionBlock &block = *it; ++it; this->deleteSelectionBlock(&block); } } SelectionInstruction* Selection::Opaque::create(SelectionOpcode opcode, uint32_t dstNum, uint32_t srcNum) { const size_t regSize = (dstNum+srcNum)*sizeof(GenRegister); const size_t size = sizeof(SelectionInstruction) + regSize; void *ptr = insnAllocator.allocate(size); return new (ptr) SelectionInstruction(opcode, dstNum, srcNum); } void Selection::Opaque::startBackwardGeneration(void) { this->bwdCodeGeneration = true; } void Selection::Opaque::endBackwardGeneration(void) { for (auto it = bwdList.rbegin(); it != bwdList.rend();) { SelectionInstruction &insn = *it; auto toRemoveIt = it--; bwdList.erase(toRemoveIt); this->block->prepend(&insn); } this->bwdCodeGeneration = false; } uint32_t Selection::Opaque::getLargestBlockSize(void) const { size_t maxInsnNum = 0; for (const auto &bb : blockList) maxInsnNum = std::max(maxInsnNum, bb.insnList.size()); return uint32_t(maxInsnNum); } void Selection::Opaque::appendBlock(const ir::BasicBlock &bb) { this->block = this->newSelectionBlock(&bb); this->blockList.push_back(this->block); } SelectionInstruction *Selection::Opaque::appendInsn(SelectionOpcode opcode, uint32_t dstNum, uint32_t srcNum) { GBE_ASSERT(dstNum <= SelectionInstruction::MAX_DST_NUM && srcNum <= SelectionInstruction::MAX_SRC_NUM); GBE_ASSERT(this->block != NULL); SelectionInstruction *insn = this->create(opcode, dstNum, srcNum); insn->setDBGInfo(DBGInfo); if (this->bwdCodeGeneration) this->bwdList.push_back(insn); else this->block->append(insn); insn->state = this->curr; return insn; } SelectionVector *Selection::Opaque::appendVector(void) { GBE_ASSERT(this->block != NULL); SelectionVector *vector = this->newSelectionVector(); if (this->bwdCodeGeneration) vector->insn = this->bwdList.back(); else vector->insn = this->block->insnList.back(); this->block->append(vector); this->vectorNum++; return vector; } bool Selection::Opaque::spillRegs(const SpilledRegs &spilledRegs, uint32_t registerPool) { GBE_ASSERT(registerPool != 0); for (auto &block : blockList) for (auto &insn : block.insnList) { // spill / unspill insn should be skipped when do spilling if(insn.opcode == SEL_OP_SPILL_REG || insn.opcode == SEL_OP_UNSPILL_REG) continue; const int simdWidth = insn.state.execWidth; const uint32_t srcNum = 
                             insn.srcNum, dstNum = insn.dstNum;
        struct RegSlot {
          RegSlot(ir::Register _reg, uint8_t _srcID, uint8_t _poolOffset, bool _isTmp, uint32_t _addr)
            : reg(_reg), srcID(_srcID), poolOffset(_poolOffset), isTmpReg(_isTmp), addr(_addr) {}
          ir::Register reg;
          union {
            uint8_t srcID;
            uint8_t dstID;
          };
          uint8_t poolOffset;
          bool isTmpReg;
          int32_t addr;
        };
        uint8_t poolOffset = 1; // keep one for the scratch message header
        vector<RegSlot> regSet;
        for (uint32_t srcID = 0; srcID < srcNum; ++srcID) {
          const GenRegister selReg = insn.src(srcID);
          const ir::Register reg = selReg.reg();
          auto it = spilledRegs.find(reg);
          if (it != spilledRegs.end() && selReg.file == GEN_GENERAL_REGISTER_FILE && selReg.physical == 0) {
            ir::RegisterFamily family = getRegisterFamily(reg);
            if (family == ir::FAMILY_QWORD && poolOffset == 1) {
              poolOffset += simdWidth / 8; // a qword register fill cannot share the scratch read message payload register
            }
            struct RegSlot regSlot(reg, srcID, poolOffset, it->second.isTmpReg, it->second.addr);
            if (family == ir::FAMILY_QWORD) {
              poolOffset += 2 * simdWidth / 8;
            } else {
              poolOffset += simdWidth / 8;
            }
            regSet.push_back(regSlot);
          }
        }
        if (poolOffset > ctx.reservedSpillRegs)
          return false;
        // FIXME, to support post register allocation scheduling,
        // put all the reserved registers into the spill/unspill's destination registers.
        // This is not the best way. We need to refine the spill/unspill instruction to
        // only use the passed-in registers and not access a hard-coded offset in the future.
        while (!regSet.empty()) {
          struct RegSlot regSlot = regSet.back();
          regSet.pop_back();
          const GenRegister selReg = insn.src(regSlot.srcID);
          if (!regSlot.isTmpReg) {
            /* For temporary registers, we don't need to unspill. */
            SelectionInstruction *unspill = this->create(SEL_OP_UNSPILL_REG,
                                                         1 + (ctx.reservedSpillRegs * 8) / ctx.getSimdWidth(), 0);
            unspill->state = GenInstructionState(simdWidth);
            unspill->state.noMask = 1;
            unspill->dst(0) = GenRegister(GEN_GENERAL_REGISTER_FILE,
                                          registerPool + regSlot.poolOffset, 0,
                                          selReg.type, selReg.vstride, selReg.width, selReg.hstride);
            for (uint32_t i = 1; i < 1 + (ctx.reservedSpillRegs * 8) / ctx.getSimdWidth(); i++)
              unspill->dst(i) = ctx.getSimdWidth() == 8 ?
                GenRegister::vec8(GEN_GENERAL_REGISTER_FILE, registerPool + (i - 1), 0) :
                GenRegister::vec16(GEN_GENERAL_REGISTER_FILE, registerPool + (i - 1) * 2, 0);
            unspill->extra.scratchOffset = regSlot.addr + selReg.quarter * 4 * simdWidth;
            unspill->extra.scratchMsgHeader = registerPool;
            insn.prepend(*unspill);
          }

          GenRegister src = insn.src(regSlot.srcID);
          // change nr/subnr, keep other register settings
          src.nr = registerPool + regSlot.poolOffset;
          src.subnr = 0;
          src.physical = 1;
          insn.src(regSlot.srcID) = src;
        };

        /* To save one register, registerPool + 1 is used both by src0 as a
           source and by the other operands as payload. To avoid side effects,
           we use a stack model: push all operand registers and spill the 0th
           dest last. As all the spills are appended to the current
           instruction, the last spill instruction becomes the first
           instruction after the current one. Thus registerPool + 1 still
           contains valid data. */
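        /* Illustration (editorial, not part of the build): the accounting
           above reserves pool offset 0 for the scratch message header, hands
           out simdWidth/8 GRFs per spilled DWORD-family register and twice
           that per QWORD-family register, and never lets a QWORD payload
           share the GRF at offset 1. A minimal stand-alone sketch of the same
           arithmetic (poolSlots is a hypothetical name):

             #include <cstdint>
             static uint32_t poolSlots(bool isQword, uint32_t simdWidth,
                                       uint32_t &poolOffset) {
               if (isQword && poolOffset == 1)
                 poolOffset += simdWidth / 8;  // skip the shared payload GRF
               const uint32_t base = poolOffset;
               poolOffset += (isQword ? 2u : 1u) * simdWidth / 8;
               return base;
             }
             // SIMD16 example: header at 0, a DWORD reg at 1..2, next at 3.
        */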
        for (uint32_t dstID = 0; dstID < dstNum; ++dstID) {
          const GenRegister selReg = insn.dst(dstID);
          const ir::Register reg = selReg.reg();
          auto it = spilledRegs.find(reg);
          if (it != spilledRegs.end() && selReg.file == GEN_GENERAL_REGISTER_FILE && selReg.physical == 0) {
            ir::RegisterFamily family = getRegisterFamily(reg);
            if (family == ir::FAMILY_QWORD && poolOffset == 1) {
              poolOffset += simdWidth / 8; // a qword register spill cannot share the scratch write message payload register
            }
            struct RegSlot regSlot(reg, dstID, poolOffset, it->second.isTmpReg, it->second.addr);
            if (family == ir::FAMILY_QWORD)
              poolOffset += 2 * simdWidth / 8;
            else
              poolOffset += simdWidth / 8;
            regSet.push_back(regSlot);
          }
        }
        if (poolOffset > ctx.reservedSpillRegs)
          return false;
        while (!regSet.empty()) {
          struct RegSlot regSlot = regSet.back();
          regSet.pop_back();
          const GenRegister selReg = insn.dst(regSlot.dstID);
          if (!regSlot.isTmpReg) {
            /* For temporary registers, we don't need to spill. */
            SelectionInstruction *spill = this->create(SEL_OP_SPILL_REG,
                                                       (ctx.reservedSpillRegs * 8) / ctx.getSimdWidth(), 1);
            spill->state = insn.state; // GenInstructionState(simdWidth);
            spill->state.accWrEnable = 0;
            spill->state.saturate = 0;
            if (insn.opcode == SEL_OP_SEL)
              spill->state.predicate = GEN_PREDICATE_NONE;
            spill->src(0) = GenRegister(GEN_GENERAL_REGISTER_FILE,
                                        registerPool + regSlot.poolOffset, 0,
                                        selReg.type, selReg.vstride, selReg.width, selReg.hstride);
            spill->extra.scratchOffset = regSlot.addr + selReg.quarter * 4 * simdWidth;
            spill->extra.scratchMsgHeader = registerPool;
            for (uint32_t i = 0; i < 0 + (ctx.reservedSpillRegs * 8) / ctx.getSimdWidth(); i++)
              spill->dst(i) = ctx.getSimdWidth() == 8 ?
                GenRegister::vec8(GEN_GENERAL_REGISTER_FILE, registerPool + (i), 0) :
                GenRegister::vec16(GEN_GENERAL_REGISTER_FILE, registerPool + (i) * 2, 0);
            insn.append(*spill);
          }

          GenRegister dst = insn.dst(regSlot.dstID);
          // change nr/subnr, keep other register settings
          dst.physical = 1;
          dst.nr = registerPool + regSlot.poolOffset;
          dst.subnr = 0;
          insn.dst(regSlot.dstID) = dst;
        }
      }
    return true;
  }

  ir::Register Selection::Opaque::replaceSrc(SelectionInstruction *insn, uint32_t regID, ir::Type type, bool needMov) {
    SelectionBlock *block = insn->parent;
    const uint32_t simdWidth = insn->state.execWidth;
    ir::Register tmp;
    GenRegister gr;

    // This will append the temporary register in the instruction block
    this->block = block;
    tmp = this->reg(ir::getFamily(type), simdWidth == 1);
    gr = this->selReg(tmp, type);
    if (needMov) {
      // Generate the MOV instruction and replace the register in the instruction
      SelectionInstruction *mov = this->create(SEL_OP_MOV, 1, 1);
      mov->src(0) = GenRegister::retype(insn->src(regID), gr.type);
      mov->state = GenInstructionState(simdWidth);
      if (this->block->removeSimpleIfEndif) {
        mov->state.predicate = GEN_PREDICATE_NORMAL;
        mov->state.flag = 0;
        mov->state.subFlag = 1;
      }
      if (this->isScalarReg(insn->src(regID).reg()))
        mov->state.noMask = 1;
      mov->dst(0) = gr;
      insn->prepend(*mov);
    }
    insn->src(regID) = gr;
    return tmp;
  }

  ir::Register Selection::Opaque::replaceDst(SelectionInstruction *insn, uint32_t regID, ir::Type type, bool needMov) {
    SelectionBlock *block = insn->parent;
    uint32_t simdWidth;
    if (!GenRegister::isNull(insn->dst(regID)))
      simdWidth = this->isScalarReg(insn->dst(regID).reg()) ?
1 : insn->state.execWidth; else { GBE_ASSERT(needMov == false); simdWidth = insn->state.execWidth; } ir::Register tmp; GenRegister gr; this->block = block; tmp = this->reg(ir::getFamily(type)); gr = this->selReg(tmp, type); if (needMov) { // Generate the MOV instruction and replace the register in the instruction SelectionInstruction *mov = this->create(SEL_OP_MOV, 1, 1); mov->dst(0) = GenRegister::retype(insn->dst(regID), gr.type); mov->state = GenInstructionState(simdWidth); if(this->block->removeSimpleIfEndif){ mov->state.predicate = GEN_PREDICATE_NORMAL; mov->state.flag = 0; mov->state.subFlag = 1; } if (simdWidth == 1) { mov->state.noMask = 1; mov->src(0) = GenRegister::retype(GenRegister::vec1(GEN_GENERAL_REGISTER_FILE, gr.reg()), gr.type); } else mov->src(0) = gr; insn->append(*mov); } insn->dst(regID) = gr; return tmp; } #define SEL_REG(SIMD16, SIMD8, SIMD1) \ if (ctx.sel->isScalarReg(reg) == true) \ return GenRegister::retype(GenRegister::SIMD1(reg), genType); \ else if (simdWidth == 8) \ return GenRegister::retype(GenRegister::SIMD8(reg), genType); \ else { \ GBE_ASSERT (simdWidth == 16); \ return GenRegister::retype(GenRegister::SIMD16(reg), genType); \ } GenRegister Selection::Opaque::selReg(ir::Register reg, ir::Type type) const { using namespace ir; const uint32_t genType = getGenType(type); const uint32_t simdWidth = ctx.getSimdWidth(); const RegisterData data = file.get(reg); const RegisterFamily family = data.family; switch (family) { case FAMILY_BOOL: SEL_REG(uw16grf, uw8grf, uw1grf); break; case FAMILY_WORD: SEL_REG(uw16grf, uw8grf, uw1grf); break; case FAMILY_BYTE: SEL_REG(ub16grf, ub8grf, ub1grf); break; case FAMILY_DWORD: SEL_REG(f16grf, f8grf, f1grf); break; case FAMILY_QWORD: if (!this->hasLongType()) { SEL_REG(ud16grf, ud8grf, ud1grf); } else { SEL_REG(ul16grf, ul8grf, ul1grf); } break; default: NOT_SUPPORTED; } GBE_ASSERT(false); return GenRegister(); } #undef SEL_REG GenRegister Selection::Opaque::selRegQn(ir::Register reg, uint32_t q, ir::Type type) const { GenRegister sreg = this->selReg(reg, type); sreg.quarter = q; return sreg; } /*! Syntactic sugar for method declaration */ typedef const GenRegister &Reg; void Selection::Opaque::LABEL(ir::LabelIndex index) { SelectionInstruction *insn = this->appendInsn(SEL_OP_LABEL, 0, 0); insn->index = index.value(); } void Selection::Opaque::BARRIER(GenRegister src, GenRegister fence, uint32_t barrierType) { SelectionInstruction *insn = this->appendInsn(SEL_OP_BARRIER, 1, 1); insn->src(0) = src; insn->dst(0) = fence; insn->extra.barrierType = barrierType; } void Selection::Opaque::FENCE(GenRegister dst) { SelectionInstruction *insn = this->appendInsn(SEL_OP_FENCE, 1, 0); insn->dst(0) = dst; } int Selection::Opaque::JMPI(Reg src, ir::LabelIndex index, ir::LabelIndex origin) { SelectionInstruction *insn = this->appendInsn(SEL_OP_JMPI, 0, 1); insn->src(0) = src; insn->index = index.value(); ir::LabelIndex start, end; if (origin.value() < index.value()) { // Forward Jump, need to exclude the target BB. Because we // need to jump to the beginning of it. start = origin; end = ir::LabelIndex(index.value() - 1); } else { start = index; end = origin; } // FIXME, this longjmp check is too hacky. We need to support instruction // insertion at code emission stage in the future. insn->extra.longjmp = ctx.getFunction().getDistance(start, end) > 3000; return insn->extra.longjmp ? 
      2 : 1;
  }

  void Selection::Opaque::BRD(Reg src, ir::LabelIndex jip) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_BRD, 0, 1);
    insn->src(0) = src;
    insn->index = jip.value();
  }

  void Selection::Opaque::BRC(Reg src, ir::LabelIndex jip, ir::LabelIndex uip) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_BRC, 0, 1);
    insn->src(0) = src;
    insn->index = jip.value();
    insn->index1 = uip.value();
  }

  void Selection::Opaque::IF(Reg src, ir::LabelIndex jip, ir::LabelIndex uip) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_IF, 0, 1);
    insn->src(0) = src;
    insn->index = jip.value();
    insn->index1 = uip.value();
  }

  void Selection::Opaque::ELSE(Reg src, ir::LabelIndex jip, ir::LabelIndex elseLabel) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_ELSE, 0, 1);
    insn->src(0) = src;
    insn->index = jip.value();
    this->LABEL(elseLabel);
  }

  void Selection::Opaque::ENDIF(Reg src, ir::LabelIndex jip, ir::LabelIndex endifLabel) {
    if (endifLabel == 0)
      this->block->endifLabel = this->newAuxLabel();
    else
      this->block->endifLabel = endifLabel;
    this->LABEL(this->block->endifLabel);
    SelectionInstruction *insn = this->appendInsn(SEL_OP_ENDIF, 0, 1);
    insn->src(0) = src;
    insn->index = this->block->endifLabel.value();
  }

  void Selection::Opaque::WHILE(Reg src, ir::LabelIndex jip) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_WHILE, 0, 1);
    insn->src(0) = src;
    insn->index = jip.value();
  }

  void Selection::Opaque::CMP(uint32_t conditional, Reg src0, Reg src1, Reg dst) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_CMP, 1, 2);
    insn->src(0) = src0;
    insn->src(1) = src1;
    insn->dst(0) = dst;
    insn->extra.function = conditional;
  }

  void Selection::Opaque::SEL_CMP(uint32_t conditional, Reg dst, Reg src0, Reg src1) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_SEL_CMP, 1, 2);
    insn->dst(0) = dst;
    insn->src(0) = src0;
    insn->src(1) = src1;
    insn->extra.function = conditional;
  }

  void Selection::Opaque::INDIRECT_MOVE(Reg dst, Reg tmp, Reg base, Reg regOffset, uint32_t immOffset) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_INDIRECT_MOVE, 2, 2);
    insn->dst(0) = dst;
    insn->dst(1) = tmp;
    insn->src(0) = base;
    insn->src(1) = regOffset;
    insn->extra.indirect_offset = immOffset;
  }

  void Selection::Opaque::ATOMIC(Reg dst, uint32_t function,
                                 uint32_t msgPayload, Reg src0,
                                 Reg src1, Reg src2, GenRegister bti,
                                 vector<GenRegister> temps) {
    unsigned dstNum = 1 + temps.size();
    SelectionInstruction *insn = this->appendInsn(SEL_OP_ATOMIC, dstNum, msgPayload + 1);

    if (bti.file != GEN_IMMEDIATE_VALUE) {
      insn->state.flag = 0;
      insn->state.subFlag = 1;
    }

    insn->dst(0) = dst;
    if (temps.size()) {
      insn->dst(1) = temps[0];
      insn->dst(2) = temps[1];
    }

    insn->src(0) = src0;
    if (msgPayload > 1) insn->src(1) = src1;
    if (msgPayload > 2) insn->src(2) = src2;
    insn->src(msgPayload) = bti;

    insn->extra.function = function;
    insn->extra.elem = msgPayload;

    if (hasSends() && msgPayload > 1) {
      insn->extra.splitSend = 1;
      SelectionVector *vector = this->appendVector();
      vector->regNum = 1;
      vector->offsetID = 0;
      vector->reg = &insn->src(0);
      vector->isSrc = 1;

      vector = this->appendVector();
      vector->regNum = msgPayload - 1;
      vector->offsetID = 1;
      vector->reg = &insn->src(1);
      vector->isSrc = 1;
    } else {
      SelectionVector *vector = this->appendVector();
      vector->regNum = msgPayload; // bti not included in the SelectionVector
      vector->offsetID = 0;
      vector->reg = &insn->src(0);
      vector->isSrc = 1;
    }
  }

  void Selection::Opaque::ATOMICA64(Reg dst, uint32_t function,
                                    uint32_t msgPayload, vector<GenRegister> src,
                                    GenRegister bti,
                                    vector<GenRegister> temps) {
    unsigned dstNum = 1 + temps.size();
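    // Editorial note: as in ATOMIC() above, the destination count is the real
    // dst plus the optional pair of flag temporaries that getBTITemps()
    // returns for dynamic-bti accesses; listing the temps as extra dsts keeps
    // them alive across the send for the register allocator.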
    SelectionInstruction *insn = this->appendInsn(SEL_OP_ATOMICA64, dstNum, msgPayload + 1);

    insn->dst(0) = dst;
    if (temps.size()) {
      insn->dst(1) = temps[0];
      insn->dst(2) = temps[1];
    }

    for (uint32_t elemID = 0; elemID < msgPayload; ++elemID)
      insn->src(elemID) = src[elemID];
    insn->src(msgPayload) = bti;

    insn->extra.function = function;
    insn->extra.elem = msgPayload;

    SelectionVector *vector = this->appendVector();
    vector->regNum = msgPayload; // bti not included in the SelectionVector
    vector->offsetID = 0;
    vector->reg = &insn->src(0);
    vector->isSrc = 1;
  }

  void Selection::Opaque::EOT(void) { this->appendInsn(SEL_OP_EOT, 0, 0); }
  void Selection::Opaque::NOP(void) { this->appendInsn(SEL_OP_NOP, 0, 0); }
  void Selection::Opaque::WAIT(uint32_t n) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_WAIT, 0, 0);
    insn->extra.waitType = n;
  }

  void Selection::Opaque::READ64(Reg addr,
                                 const GenRegister *dst,
                                 const GenRegister *tmp,
                                 uint32_t elemNum,
                                 const GenRegister bti,
                                 bool native_long,
                                 vector<GenRegister> temps) {
    SelectionInstruction *insn = NULL;
    SelectionVector *srcVector = NULL;
    SelectionVector *dstVector = NULL;

    if (!native_long) {
      unsigned dstNum = elemNum + temps.size();
      insn = this->appendInsn(SEL_OP_READ64, dstNum, 2);
      srcVector = this->appendVector();
      dstVector = this->appendVector();
      // Regular instruction to encode
      for (uint32_t elemID = 0; elemID < elemNum; ++elemID)
        insn->dst(elemID) = dst[elemID];
      // flagTemp doesn't need to be put in the SelectionVector
      if (temps.size()) {
        insn->dst(elemNum) = temps[0];
        insn->dst(elemNum + 1) = temps[1];
      }
    } else {
      unsigned dstNum = elemNum * 2 + temps.size();
      insn = this->appendInsn(SEL_OP_READ64, dstNum, 2);
      srcVector = this->appendVector();
      dstVector = this->appendVector();
      for (uint32_t elemID = 0; elemID < elemNum; ++elemID)
        insn->dst(elemID) = tmp[elemID];
      for (uint32_t elemID = 0; elemID < elemNum; ++elemID)
        insn->dst(elemID + elemNum) = dst[elemID];
      // flagTemp doesn't need to be put in the SelectionVector
      if (temps.size()) {
        insn->dst(2 * elemNum) = temps[0];
        insn->dst(2 * elemNum + 1) = temps[1];
      }
    }

    if (bti.file != GEN_IMMEDIATE_VALUE) {
      insn->state.flag = 0;
      insn->state.subFlag = 1;
    }

    insn->src(0) = addr;
    insn->src(1) = bti;
    insn->extra.elem = elemNum;

    dstVector->regNum = elemNum;
    dstVector->isSrc = 0;
    dstVector->offsetID = 0;
    dstVector->reg = &insn->dst(0);

    srcVector->regNum = 1;
    srcVector->offsetID = 0;
    srcVector->isSrc = 1;
    srcVector->reg = &insn->src(0);
  }

  void Selection::Opaque::READ64A64(Reg addr,
                                    const GenRegister *dst,
                                    const GenRegister *tmp,
                                    uint32_t elemNum) {
    SelectionInstruction *insn = NULL;
    SelectionVector *srcVector = NULL;
    SelectionVector *dstVector = NULL;

    insn = this->appendInsn(SEL_OP_READ64A64, elemNum * 2, 1);
    srcVector = this->appendVector();
    dstVector = this->appendVector();
    for (uint32_t elemID = 0; elemID < elemNum; ++elemID)
      insn->dst(elemID) = tmp[elemID];
    for (uint32_t elemID = 0; elemID < elemNum; ++elemID)
      insn->dst(elemID + elemNum) = dst[elemID];

    insn->src(0) = addr;
    insn->extra.elem = elemNum;

    dstVector->regNum = elemNum;
    dstVector->isSrc = 0;
    dstVector->offsetID = 0;
    dstVector->reg = &insn->dst(0);
    srcVector->regNum = 1;
    srcVector->offsetID = 0;
    srcVector->isSrc = 1;
    srcVector->reg = &insn->src(0);
  }

  void Selection::Opaque::UNTYPED_READ(Reg addr,
                                       const GenRegister *dst,
                                       uint32_t elemNum,
                                       GenRegister bti,
                                       vector<GenRegister> temps) {
    unsigned dstNum = elemNum + temps.size();
    SelectionInstruction *insn = this->appendInsn(SEL_OP_UNTYPED_READ, dstNum, 2);
    SelectionVector *srcVector = this->appendVector();
    SelectionVector *dstVector = this->appendVector();
    if (this->isScalarReg(dst[0].reg()))
      insn->state.noMask = 1;
    // Regular instruction to encode
    for (uint32_t elemID = 0; elemID < elemNum; ++elemID)
      insn->dst(elemID) = dst[elemID];
    if (temps.size()) {
      insn->dst(elemNum) = temps[0];
      insn->dst(elemNum + 1) = temps[1];
    }

    insn->src(0) = addr;
    insn->src(1) = bti;
    if (bti.file != GEN_IMMEDIATE_VALUE) {
      insn->state.flag = 0;
      insn->state.subFlag = 1;
    }

    insn->extra.elem = elemNum;

    // Sends require contiguous allocation
    dstVector->regNum = elemNum;
    dstVector->isSrc = 0;
    dstVector->offsetID = 0;
    dstVector->reg = &insn->dst(0);

    srcVector->regNum = 1;
    srcVector->isSrc = 1;
    srcVector->offsetID = 0;
    srcVector->reg = &insn->src(0);
  }

  void Selection::Opaque::UNTYPED_READA64(Reg addr,
                                          const GenRegister *dst,
                                          uint32_t dstNum,
                                          uint32_t elemNum) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_UNTYPED_READA64, dstNum, 1);
    SelectionVector *srcVector = this->appendVector();
    SelectionVector *dstVector = this->appendVector();

    if (this->isScalarReg(dst[0].reg()))
      insn->state.noMask = 1;
    // Regular instruction to encode
    for (uint32_t id = 0; id < dstNum; ++id)
      insn->dst(id) = dst[id];

    insn->src(0) = addr;
    insn->extra.elem = elemNum;

    // Sends require contiguous allocation
    dstVector->regNum = dstNum;
    dstVector->isSrc = 0;
    dstVector->offsetID = 0;
    dstVector->reg = &insn->dst(0);

    srcVector->regNum = 1;
    srcVector->isSrc = 1;
    srcVector->offsetID = 0;
    srcVector->reg = &insn->src(0);
  }

  void Selection::Opaque::WRITE64(Reg addr,
                                  const GenRegister *src,
                                  const GenRegister *tmp,
                                  uint32_t srcNum,
                                  GenRegister bti,
                                  bool native_long,
                                  vector<GenRegister> temps) {
    SelectionVector *vector = NULL;
    SelectionInstruction *insn = NULL;

    if (!native_long) {
      unsigned dstNum = temps.size();
      insn = this->appendInsn(SEL_OP_WRITE64, dstNum, srcNum + 2);
      vector = this->appendVector();
      // Register layout:
      // dst: (flagTemp)
      // src: addr, srcNum, bti
      insn->src(0) = addr;
      for (uint32_t elemID = 0; elemID < srcNum; ++elemID)
        insn->src(elemID + 1) = src[elemID];
      insn->src(srcNum + 1) = bti;
      if (temps.size()) {
        insn->dst(0) = temps[0];
        insn->dst(1) = temps[1];
      }
      insn->extra.elem = srcNum;

      vector->regNum = srcNum + 1;
      vector->offsetID = 0;
      vector->reg = &insn->src(0);
      vector->isSrc = 1;
    } else { // handle the native long case
      unsigned dstNum = srcNum + temps.size();
      // Register layout:
      // dst: srcNum, (flagTemp)
      // src: srcNum, addr, srcNum, bti.
      insn = this->appendInsn(SEL_OP_WRITE64, dstNum, srcNum * 2 + 2);
      for (uint32_t elemID = 0; elemID < srcNum; ++elemID)
        insn->src(elemID) = src[elemID];

      insn->src(srcNum) = addr;
      for (uint32_t elemID = 0; elemID < srcNum; ++elemID)
        insn->src(srcNum + 1 + elemID) = tmp[0];
      insn->src(srcNum * 2 + 1) = bti;

      /* We also need to add the tmp registers to dst,
         in order to avoid post-scheduling errors. */
      for (uint32_t elemID = 0; elemID < srcNum; ++elemID)
        insn->dst(elemID) = tmp[0];
      if (temps.size()) {
        insn->dst(srcNum) = temps[0];
        insn->dst(srcNum + 1) = temps[1];
      }
      insn->extra.elem = srcNum;

      if (hasSends()) {
        insn->extra.splitSend = 1;
        // addr regs
        vector = this->appendVector();
        vector->regNum = 1;
        vector->offsetID = srcNum;
        vector->reg = &insn->src(srcNum);
        vector->isSrc = 1;
        // data regs
        vector = this->appendVector();
        vector->regNum = srcNum;
        vector->offsetID = srcNum + 1;
        vector->reg = &insn->src(srcNum + 1);
        vector->isSrc = 1;
      } else {
        vector = this->appendVector();
        vector->regNum = srcNum + 1;
        vector->offsetID = srcNum;
        vector->reg = &insn->src(srcNum);
        vector->isSrc = 1;
      }
    }

    if (bti.file != GEN_IMMEDIATE_VALUE) {
      insn->state.flag = 0;
      insn->state.subFlag = 1;
    }
  }

  void Selection::Opaque::WRITE64A64(Reg addr,
                                     const GenRegister *src,
                                     const GenRegister *tmp,
                                     uint32_t srcNum) {
    SelectionVector *vector = NULL;
    SelectionInstruction *insn = NULL;

    const uint32_t dstNum = srcNum;
    insn = this->appendInsn(SEL_OP_WRITE64A64, dstNum, srcNum * 2 + 1);
    vector = this->appendVector();

    for (uint32_t elemID = 0; elemID < srcNum; ++elemID)
      insn->src(elemID) = src[elemID];

    insn->src(srcNum) = addr;
    for (uint32_t elemID = 0; elemID < srcNum; ++elemID)
      insn->src(srcNum + 1 + elemID) = tmp[elemID];

    /* We also need to add the tmp registers to dst,
       in order to avoid post-scheduling errors. */
    for (uint32_t elemID = 0; elemID < srcNum; ++elemID)
      insn->dst(elemID) = tmp[elemID];

    insn->extra.elem = srcNum;

    vector->regNum = srcNum + 1;
    vector->offsetID = srcNum;
    vector->reg = &insn->src(srcNum);
    vector->isSrc = 1;
  }

  void Selection::Opaque::UNTYPED_WRITE(Reg addr,
                                        const GenRegister *src,
                                        uint32_t elemNum,
                                        GenRegister bti,
                                        vector<GenRegister> temps) {
    unsigned dstNum = temps.size();
    unsigned srcNum = elemNum + 2 + temps.size();
    SelectionInstruction *insn = this->appendInsn(SEL_OP_UNTYPED_WRITE, dstNum, srcNum);

    if (bti.file != GEN_IMMEDIATE_VALUE) {
      insn->state.flag = 0;
      insn->state.subFlag = 1;
    }

    // Regular instruction to encode
    insn->src(0) = addr;
    for (uint32_t elemID = 0; elemID < elemNum; ++elemID)
      insn->src(elemID + 1) = src[elemID];
    insn->src(elemNum + 1) = bti;
    if (temps.size()) {
      insn->dst(0) = temps[0];
      insn->dst(1) = temps[1];
      insn->src(elemNum + 2) = temps[0];
      insn->src(elemNum + 3) = temps[1];
    }
    insn->extra.elem = elemNum;

    if (hasSends()) {
      insn->extra.splitSend = 1;
      SelectionVector *vector = this->appendVector();
      vector->regNum = elemNum;
      vector->reg = &insn->src(1);
      vector->offsetID = 1;
      vector->isSrc = 1;

      vector = this->appendVector();
      vector->regNum = 1;
      vector->reg = &insn->src(0);
      vector->offsetID = 0;
      vector->isSrc = 1;
    } else {
      // Sends require contiguous allocation for the sources
      SelectionVector *vector = this->appendVector();
      vector->regNum = elemNum + 1;
      vector->reg = &insn->src(0);
      vector->offsetID = 0;
      vector->isSrc = 1;
    }
  }

  void Selection::Opaque::UNTYPED_WRITEA64(const GenRegister *src,
                                           uint32_t msgNum,
                                           uint32_t elemNum) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_UNTYPED_WRITEA64, 0, msgNum);
    SelectionVector *vector = this->appendVector();

    // Regular instruction to encode
    for (uint32_t id = 0; id < msgNum; ++id)
      insn->src(id) = src[id];
    insn->extra.elem = elemNum;

    // Sends require contiguous allocation for the sources
    vector->regNum = msgNum;
    vector->reg = &insn->src(0);
    vector->offsetID = 0;
    vector->isSrc = 1;
  }

  void Selection::Opaque::BYTE_GATHER(Reg dst, Reg addr,
                                      uint32_t elemSize,
                                      GenRegister bti,
                                      vector<GenRegister> temps) {
    unsigned dstNum = 1 + temps.size();
    SelectionInstruction *insn = this->appendInsn(SEL_OP_BYTE_GATHER, dstNum, 2);
    SelectionVector *srcVector = this->appendVector();
    SelectionVector *dstVector = this->appendVector();

    if (bti.file != GEN_IMMEDIATE_VALUE) {
      insn->state.flag = 0;
      insn->state.subFlag = 1;
    }
    if (this->isScalarReg(dst.reg()))
      insn->state.noMask = 1;

    // Instruction to encode
    insn->src(0) = addr;
    insn->src(1) = bti;
    insn->dst(0) = dst;
    if (temps.size()) {
      insn->dst(1) = temps[0];
      insn->dst(2) = temps[1];
    }
    insn->extra.elem = elemSize;

    // byte gather requires a vector in the sense that scalars are not allowed
    // (yet)
    dstVector->regNum = 1;
    dstVector->isSrc = 0;
    dstVector->offsetID = 0;
    dstVector->reg = &insn->dst(0);
    srcVector->regNum = 1;
    srcVector->isSrc = 1;
    srcVector->offsetID = 0;
    srcVector->reg = &insn->src(0);
  }

  void Selection::Opaque::BYTE_SCATTER(Reg addr, Reg src,
                                       uint32_t elemSize,
                                       GenRegister bti,
                                       vector<GenRegister> temps) {
    unsigned dstNum = temps.size();
    SelectionInstruction *insn = this->appendInsn(SEL_OP_BYTE_SCATTER, dstNum, 3);

    if (bti.file != GEN_IMMEDIATE_VALUE) {
      insn->state.flag = 0;
      insn->state.subFlag = 1;
    }

    if (temps.size()) {
      insn->dst(0) = temps[0];
      insn->dst(1) = temps[1];
    }

    // Instruction to encode
    insn->src(0) = addr;
    insn->src(1) = src;
    insn->src(2) = bti;
    insn->extra.elem = elemSize;

    if (hasSends()) {
      insn->extra.splitSend = 1;
      SelectionVector *vector = this->appendVector();
      vector->regNum = 1;
      vector->isSrc = 1;
      vector->offsetID = 0;
      vector->reg = &insn->src(0);

      vector = this->appendVector();
      vector->regNum = 1;
      vector->isSrc = 1;
      vector->offsetID = 1;
      vector->reg = &insn->src(1);
    } else {
      SelectionVector *vector = this->appendVector();
      vector->regNum = 2;
      vector->isSrc = 1;
      vector->offsetID = 0;
      vector->reg = &insn->src(0);
    }
  }

  void Selection::Opaque::BYTE_GATHERA64(Reg dst, Reg addr, uint32_t elemSize) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_BYTE_GATHERA64, 1, 1);
    SelectionVector *srcVector = this->appendVector();
    SelectionVector *dstVector = this->appendVector();

    if (this->isScalarReg(dst.reg()))
      insn->state.noMask = 1;

    insn->src(0) = addr;
    insn->dst(0) = dst;
    insn->extra.elem = elemSize;

    dstVector->regNum = 1;
    dstVector->isSrc = 0;
    dstVector->offsetID = 0;
    dstVector->reg = &insn->dst(0);
    srcVector->regNum = 1;
    srcVector->isSrc = 1;
    srcVector->offsetID = 0;
    srcVector->reg = &insn->src(0);
  }

  void Selection::Opaque::BYTE_SCATTERA64(GenRegister *msg, uint32_t msgNum, uint32_t elemSize) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_BYTE_SCATTERA64, 0, msgNum);
    SelectionVector *vector = this->appendVector();
    for (unsigned i = 0; i < msgNum; i++)
      insn->src(i) = msg[i];

    insn->extra.elem = elemSize;

    vector->regNum = msgNum;
    vector->isSrc = 1;
    vector->offsetID = 0;
    vector->reg = &insn->src(0);
  }

  void Selection::Opaque::DWORD_GATHER(Reg dst, Reg addr, uint32_t bti) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_DWORD_GATHER, 1, 1);
    SelectionVector *vector = this->appendVector();
    SelectionVector *srcVector = this->appendVector();

    if (this->isScalarReg(dst.reg()))
      insn->state.noMask = 1;
    insn->src(0) = addr;
    insn->dst(0) = dst;
    insn->setbti(bti);
    vector->regNum = 1;
    vector->isSrc = 0;
    vector->offsetID = 0;
    vector->reg = &insn->dst(0);
    srcVector->regNum = 1;
    srcVector->isSrc = 1;
    srcVector->offsetID = 0;
    srcVector->reg = &insn->src(0);
  }

  void Selection::Opaque::UNPACK_BYTE(const GenRegister *dst, const GenRegister src, uint32_t elemSize, uint32_t elemNum) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_UNPACK_BYTE, elemNum, 1);
    insn->src(0) = src;
    insn->extra.elem = 4 / elemSize;
    for(uint32_t i = 0;
i < elemNum; i++) insn->dst(i) = dst[i]; } void Selection::Opaque::PACK_BYTE(const GenRegister dst, const GenRegister *src, uint32_t elemSize, uint32_t elemNum) { SelectionInstruction *insn = this->appendInsn(SEL_OP_PACK_BYTE, 1, elemNum); for(uint32_t i = 0; i < elemNum; i++) insn->src(i) = src[i]; insn->extra.elem = 4 / elemSize; insn->dst(0) = dst; } void Selection::Opaque::UNPACK_LONG(const GenRegister dst, const GenRegister src) { SelectionInstruction *insn = this->appendInsn(SEL_OP_UNPACK_LONG, 1, 1); insn->src(0) = src; insn->dst(0) = dst; } void Selection::Opaque::PACK_LONG(const GenRegister dst, const GenRegister src) { SelectionInstruction *insn = this->appendInsn(SEL_OP_PACK_LONG, 1, 1); insn->src(0) = src; insn->dst(0) = dst; } void Selection::Opaque::MATH(Reg dst, uint32_t function, Reg src0, Reg src1) { SelectionInstruction *insn = this->appendInsn(SEL_OP_MATH, 1, 2); insn->dst(0) = dst; insn->src(0) = src0; insn->src(1) = src1; insn->extra.function = function; } void Selection::Opaque::MATH(Reg dst, uint32_t function, Reg src) { SelectionInstruction *insn = this->appendInsn(SEL_OP_MATH, 1, 1); insn->dst(0) = dst; insn->src(0) = src; insn->extra.function = function; } void Selection::Opaque::I64MUL(Reg dst, Reg src0, Reg src1, GenRegister *tmp, bool native_long) { SelectionInstruction *insn = NULL; if (native_long) insn = this->appendInsn(SEL_OP_I64MUL, 2, 2); else insn = this->appendInsn(SEL_OP_I64MUL, 7, 2); insn->dst(0) = dst; insn->src(0) = src0; insn->src(1) = src1; if (native_long) { insn->dst(1) = tmp[0]; } else { for (int i = 0; i < 6; i++) insn->dst(i + 1) = tmp[i]; } } void Selection::Opaque::I64DIV(Reg dst, Reg src0, Reg src1, GenRegister* tmp, int tmp_num) { SelectionInstruction *insn = this->appendInsn(SEL_OP_I64DIV, tmp_num + 1, 2); insn->dst(0) = dst; insn->src(0) = src0; insn->src(1) = src1; for(int i = 0; i < tmp_num; i++) insn->dst(i + 1) = tmp[i]; } void Selection::Opaque::I64REM(Reg dst, Reg src0, Reg src1, GenRegister* tmp, int tmp_num) { SelectionInstruction *insn = this->appendInsn(SEL_OP_I64REM, tmp_num + 1, 2); insn->dst(0) = dst; insn->src(0) = src0; insn->src(1) = src1; for(int i = 0; i < tmp_num; i++) insn->dst(i + 1) = tmp[i]; } void Selection::Opaque::F64DIV(Reg dst, Reg src0, Reg src1, GenRegister* tmp, int tmpNum) { SelectionInstruction *insn = this->appendInsn(SEL_OP_F64DIV, tmpNum + 1, 2); insn->dst(0) = dst; insn->src(0) = src0; insn->src(1) = src1; for(int i = 0; i < tmpNum; i++) insn->dst(i + 1) = tmp[i]; } void Selection::Opaque::ALU1(SelectionOpcode opcode, Reg dst, Reg src) { SelectionInstruction *insn = this->appendInsn(opcode, 1, 1); insn->dst(0) = dst; insn->src(0) = src; } void Selection::Opaque::ALU1WithTemp(SelectionOpcode opcode, Reg dst, Reg src, Reg temp) { SelectionInstruction *insn = this->appendInsn(opcode, 2, 1); insn->dst(0) = dst; insn->src(0) = src; insn->dst(1) = temp; } void Selection::Opaque::ALU2(SelectionOpcode opcode, Reg dst, Reg src0, Reg src1) { SelectionInstruction *insn = this->appendInsn(opcode, 1, 2); insn->dst(0) = dst; insn->src(0) = src0; insn->src(1) = src1; } void Selection::Opaque::ALU2WithTemp(SelectionOpcode opcode, Reg dst, Reg src0, Reg src1, Reg temp) { SelectionInstruction *insn = this->appendInsn(opcode, 2, 2); insn->dst(0) = dst; insn->src(0) = src0; insn->src(1) = src1; insn->dst(1) = temp; } void Selection::Opaque::ALU3(SelectionOpcode opcode, Reg dst, Reg src0, Reg src1, Reg src2) { SelectionInstruction *insn = this->appendInsn(opcode, 1, 3); insn->dst(0) = dst; insn->src(0) = src0; 
    insn->src(1) = src1;
    insn->src(2) = src2;
  }

  void Selection::Opaque::SIMD_SHUFFLE(Reg dst, Reg src0, Reg src1) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_SIMD_SHUFFLE, 1, 2);
    insn->dst(0) = dst;
    insn->src(0) = src0;
    insn->src(1) = src1;
  }

  GenRegister Selection::Opaque::getLaneIDReg() {
    const GenRegister laneID = GenRegister::immv(0x76543210);
    GenRegister dst;

    uint32_t execWidth = curr.execWidth;
    if (execWidth == 8) {
      // Work around to force the register 32 alignment
      dst = selReg(reg(ir::RegisterFamily::FAMILY_DWORD), ir::TYPE_U16);
      MOV(dst, laneID);
    } else {
      dst = selReg(reg(ir::RegisterFamily::FAMILY_WORD), ir::TYPE_U16);
      push();
      curr.execWidth = 8;
      curr.noMask = 1;
      MOV(dst, laneID);
      // A packed unsigned half-byte integer vector immediate does not work;
      // mock one by adding 8 to the signed vector.
      const GenRegister eight = GenRegister::immuw(8);
      ADD(GenRegister::offset(dst, 0, 16), dst, eight);
      pop();
    }
    return dst;
  }

  void Selection::Opaque::I64CMP(uint32_t conditional, Reg src0, Reg src1, GenRegister tmp[3]) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_I64CMP, 3, 2);
    insn->src(0) = src0;
    insn->src(1) = src1;
    for (int i = 0; i < 3; i++)
      insn->dst(i) = tmp[i];
    insn->extra.function = conditional;
  }

  void Selection::Opaque::I64SATADD(Reg dst, Reg src0, Reg src1, GenRegister tmp[5]) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_I64SATADD, 6, 2);
    insn->dst(0) = dst;
    insn->src(0) = src0;
    insn->src(1) = src1;
    for (int i = 0; i < 5; i++)
      insn->dst(i + 1) = tmp[i];
  }

  void Selection::Opaque::I64SATSUB(Reg dst, Reg src0, Reg src1, GenRegister tmp[5]) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_I64SATSUB, 6, 2);
    insn->dst(0) = dst;
    insn->src(0) = src0;
    insn->src(1) = src1;
    for (int i = 0; i < 5; i++)
      insn->dst(i + 1) = tmp[i];
  }

  void Selection::Opaque::CONVI64_TO_F(Reg dst, Reg src, GenRegister tmp[6]) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_CONVI64_TO_F, 7, 1);
    insn->dst(0) = dst;
    insn->src(0) = src;
    for (int i = 0; i < 6; i++)
      insn->dst(i + 1) = tmp[i];
  }

  void Selection::Opaque::CONVF_TO_I64(Reg dst, Reg src, GenRegister tmp[2]) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_CONVF_TO_I64, 3, 1);
    insn->dst(0) = dst;
    insn->src(0) = src;
    for (int i = 0; i < 2; i++)
      insn->dst(i + 1) = tmp[i];
  }

  void Selection::Opaque::I64MADSAT(Reg dst, Reg src0, Reg src1, Reg src2, GenRegister *tmp, int tmp_num) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_I64MADSAT, tmp_num + 1, 3);
    insn->dst(0) = dst;
    insn->src(0) = src0;
    insn->src(1) = src1;
    insn->src(2) = src2;
    for (int i = 0; i < tmp_num; i++)
      insn->dst(i + 1) = tmp[i];
  }

  void Selection::Opaque::I64_MUL_HI(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_num) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_I64_MUL_HI, tmp_num + 1, 2);
    insn->dst(0) = dst;
    insn->src(0) = src0;
    insn->src(1) = src1;
    for (int i = 0; i < tmp_num; i++)
      insn->dst(i + 1) = tmp[i];
  }

  void Selection::Opaque::I64HADD(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_num) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_I64HADD, tmp_num + 1, 2);
    insn->dst(0) = dst;
    insn->src(0) = src0;
    insn->src(1) = src1;
    for (int i = 0; i < tmp_num; i++)
      insn->dst(i + 1) = tmp[i];
  }

  void Selection::Opaque::I64RHADD(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_num) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_I64RHADD, tmp_num + 1, 2);
    insn->dst(0) = dst;
    insn->src(0) = src0;
    insn->src(1) = src1;
    for (int i = 0; i < tmp_num; i++)
      insn->dst(i + 1) = tmp[i];
  }
  void Selection::Opaque::I64Shift(SelectionOpcode opcode, Reg dst, Reg src0, Reg src1, GenRegister tmp[6]) {
    SelectionInstruction *insn = this->appendInsn(opcode, 7, 2);
    insn->dst(0) = dst;
    insn->src(0) = src0;
    insn->src(1) = src1;
    for (int i = 0; i < 6; i++)
      insn->dst(i + 1) = tmp[i];
  }

  void Selection::Opaque::CALC_TIMESTAMP(GenRegister ts[5], int tsN, GenRegister tmp, uint32_t pointNum, uint32_t tsType) {
    SelectionInstruction *insn = NULL;
    if (!this->hasLongType()) {
      insn = this->appendInsn(SEL_OP_CALC_TIMESTAMP, tsN + 1, tsN);
    } else { // No need for tmp
      insn = this->appendInsn(SEL_OP_CALC_TIMESTAMP, tsN, tsN);
    }
    for (int i = 0; i < tsN; i++) {
      insn->src(i) = ts[i];
      insn->dst(i) = ts[i];
    }
    if (!this->hasLongType())
      insn->dst(tsN) = tmp;

    insn->extra.pointNum = static_cast<uint16_t>(pointNum);
    insn->extra.timestampType = static_cast<uint16_t>(tsType);
  }

  void Selection::Opaque::STORE_PROFILING(uint32_t profilingType, uint32_t bti,
                                          GenRegister tmp0, GenRegister tmp1,
                                          GenRegister ts[5], int tsNum) {
    if (tsNum == 3) { // SIMD16 mode
      SelectionInstruction *insn = this->appendInsn(SEL_OP_STORE_PROFILING, 1, 3);
      for (int i = 0; i < 3; i++)
        insn->src(i) = ts[i];
      insn->dst(0) = tmp0;

      insn->extra.profilingType = static_cast<uint16_t>(profilingType);
      insn->extra.profilingBTI = static_cast<uint16_t>(bti);
    } else { // SIMD8 mode
      GBE_ASSERT(tsNum == 5);
      SelectionInstruction *insn = this->appendInsn(SEL_OP_STORE_PROFILING, 2, 5);
      SelectionVector *dstVector = this->appendVector();
      for (int i = 0; i < 5; i++)
        insn->src(i) = ts[i];
      insn->dst(0) = tmp0;
      insn->dst(1) = tmp1;

      dstVector->regNum = 2;
      dstVector->isSrc = 0;
      dstVector->offsetID = 0;
      dstVector->reg = &insn->dst(0);

      insn->extra.profilingType = static_cast<uint16_t>(profilingType);
      insn->extra.profilingBTI = static_cast<uint16_t>(bti);
    }
  }

  void Selection::Opaque::PRINTF(uint8_t bti, GenRegister tmp0, GenRegister tmp1,
                                 GenRegister src[8], int srcNum, uint16_t num,
                                 bool isContinue, uint32_t totalSize) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_PRINTF, 2, srcNum);
    for (int i = 0; i < srcNum; i++)
      insn->src(i) = src[i];
    insn->dst(0) = tmp0;
    insn->dst(1) = tmp1;

    if (hasSends()) {
      insn->extra.printfSplitSend = 1;
      SelectionVector *vector = this->appendVector();
      vector->regNum = 1;
      vector->reg = &insn->dst(0);
      vector->offsetID = 0;
      vector->isSrc = 0;

      vector = this->appendVector();
      vector->regNum = 1;
      vector->reg = &insn->dst(1);
      vector->offsetID = 1;
      vector->isSrc = 0;
    } else {
      SelectionVector *vector = this->appendVector();
      vector->regNum = 2;
      vector->reg = &insn->dst(0);
      vector->offsetID = 0;
      vector->isSrc = 0;
    }

    insn->extra.printfSize = static_cast<uint16_t>(totalSize);
    insn->extra.continueFlag = isContinue;
    insn->extra.printfBTI = bti;
    insn->extra.printfNum = num;
  }

  void Selection::Opaque::WORKGROUP_OP(uint32_t wg_op,
                                       Reg dst,
                                       GenRegister src,
                                       GenRegister tmpData1,
                                       GenRegister localThreadID,
                                       GenRegister localThreadNUM,
                                       GenRegister tmpData2,
                                       GenRegister slmOff,
                                       vector<GenRegister> msg,
                                       GenRegister localBarrier) {
    SelectionInstruction *insn = this->appendInsn(SEL_OP_WORKGROUP_OP, 2 + msg.size(), 6);

    insn->extra.wgop.workgroupOp = wg_op;

    insn->dst(0) = dst;
    insn->dst(1) = tmpData1;
    for (uint32_t i = 0; i < msg.size(); i++)
      insn->dst(2 + i) = msg[i];

    insn->src(0) = localThreadID;
    insn->src(1) = localThreadNUM;
    insn->src(2) = src;
    insn->src(3) = tmpData2;
    insn->src(4) = slmOff;
    insn->src(5) = localBarrier;

    if (hasSends()) {
      insn->extra.wgop.splitSend = 1;
      SelectionVector *vector = this->appendVector();
      vector->regNum = 1;
      vector->offsetID = 2;
      vector->reg = &insn->dst(2);
      vector->isSrc = 0;

      vector = this->appendVector();
      vector->regNum = msg.size() - 1;
      vector->offsetID = 3;
      vector->reg =
&insn->dst(3); vector->isSrc = 0; } else { /* allocate continuous GRF registers for READ/WRITE to SLM */ SelectionVector *vector = this->appendVector(); vector->regNum = msg.size(); vector->offsetID = 2; vector->reg = &insn->dst(2); vector->isSrc = 0; } } void Selection::Opaque::SUBGROUP_OP(uint32_t wg_op, Reg dst, GenRegister src, GenRegister tmpData1, GenRegister tmpData2) { SelectionInstruction *insn = this->appendInsn(SEL_OP_SUBGROUP_OP, 2, 2); insn->extra.wgop.workgroupOp = wg_op; insn->dst(0) = dst; insn->dst(1) = tmpData1; insn->src(0) = src; insn->src(1) = tmpData2; } void Selection::Opaque::OBREAD(GenRegister* dsts, uint32_t vec_size, GenRegister header, uint32_t bti, uint32_t ow_size) { SelectionInstruction *insn = this->appendInsn(SEL_OP_OBREAD, vec_size, 1); SelectionVector *vector = this->appendVector(); insn->src(0) = header; for (uint32_t i = 0; i < vec_size; ++i) insn->dst(i) = dsts[i]; insn->setbti(bti); insn->extra.elem = ow_size; // number of OWord size // tmp regs for OWORD read dst vector->regNum = vec_size; vector->reg = &insn->dst(0); vector->offsetID = 0; vector->isSrc = 0; } void Selection::Opaque::OBWRITE(GenRegister header, GenRegister* values, uint32_t vec_size, uint32_t bti, uint32_t ow_size) { SelectionInstruction *insn = this->appendInsn(SEL_OP_OBWRITE, 0, vec_size + 1); insn->src(0) = header; for (uint32_t i = 0; i < vec_size; ++i) insn->src(i + 1) = values[i]; insn->setbti(bti); insn->extra.elem = ow_size; // number of OWord_size // For A64 write, we did not add sends support yet. if (hasSends() && bti != 255) { insn->extra.splitSend = 1; SelectionVector *vector = this->appendVector(); vector->regNum = 1; vector->reg = &insn->src(0); vector->offsetID = 0; vector->isSrc = 1; vector = this->appendVector(); vector->regNum = vec_size; vector->reg = &insn->src(1); vector->offsetID = 1; vector->isSrc = 1; } else { // tmp regs for OWORD write header and values SelectionVector *vector = this->appendVector(); vector->regNum = vec_size + 1; vector->reg = &insn->src(0); vector->offsetID = 0; vector->isSrc = 1; } } void Selection::Opaque::MBREAD(GenRegister* dsts, uint32_t tmp_size, GenRegister header, uint32_t bti, uint32_t response_size) { SelectionInstruction *insn = this->appendInsn(SEL_OP_MBREAD, tmp_size, 1); insn->src(0) = header; insn->setbti(bti); insn->extra.elem = response_size; // send response length for (uint32_t i = 0; i < tmp_size; ++i) { insn->dst(i) = dsts[i]; } SelectionVector *vector = this->appendVector(); vector->regNum = tmp_size; vector->reg = &insn->dst(0); vector->offsetID = 0; vector->isSrc = 0; } void Selection::Opaque::MBWRITE(GenRegister header, GenRegister* values, uint32_t tmp_size, uint32_t bti, uint32_t data_size) { SelectionInstruction *insn = this->appendInsn(SEL_OP_MBWRITE, 0, 1 + tmp_size); insn->src(0) = header; for (uint32_t i = 0; i < tmp_size; ++i) insn->src(1 + i) = values[i]; insn->setbti(bti); insn->extra.elem = data_size; // msg data part size if (hasSends()) { insn->extra.splitSend = 1; SelectionVector *vector = this->appendVector(); vector->regNum = 1; vector->reg = &insn->src(0); vector->offsetID = 0; vector->isSrc = 1; vector = this->appendVector(); vector->regNum = tmp_size; vector->reg = &insn->src(1); vector->offsetID = 1; vector->isSrc = 1; } else { // We need to put the header and the data together SelectionVector *vector = this->appendVector(); vector->regNum = 1 + tmp_size; vector->reg = &insn->src(0); vector->offsetID = 0; vector->isSrc = 1; } } // Boiler plate to initialize the selection library at c++ pre-main 
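  /* Illustration (editorial, not part of the build): the code below is the
   * standard C++ pre-main initializer idiom: a file-scope object whose
   * constructor allocates the library and registers an atexit() cleanup.
   * Minimal stand-alone sketch with hypothetical names:
   *
   *   #include <cstdlib>
   *   struct Registry { };
   *   static Registry *registry = nullptr;
   *   static void destroyRegistry(void) { delete registry; }
   *   static struct RegistryInitializer {
   *     RegistryInitializer(void) {
   *       registry = new Registry();
   *       std::atexit(destroyRegistry);
   *     }
   *   } registryInitializer;
   */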
  static SelectionLibrary *selLib = NULL;
  static void destroySelectionLibrary(void) { GBE_DELETE(selLib); }
  static struct SelectionLibraryInitializer {
    SelectionLibraryInitializer(void) {
      selLib = GBE_NEW_NO_ARG(SelectionLibrary);
      atexit(destroySelectionLibrary);
    }
  } selectionLibraryInitializer;

  bool Selection::Opaque::isRoot(const ir::Instruction &insn) const {
    if (insn.hasSideEffect() ||
        insn.isMemberOf<ir::BranchInstruction>() ||
        insn.isMemberOf<ir::LabelInstruction>())
      return true;

    // No side effect, not a branch and no destination? Impossible
    GBE_ASSERT(insn.getDstNum() >= 1);

    // Root if alive outside the block.
    // XXX we should use Value and not registers in liveness info
    const ir::BasicBlock *insnBlock = insn.getParent();
    const ir::Liveness &liveness = this->ctx.getLiveness();
    const ir::Liveness::LiveOut &liveOut = liveness.getLiveOut(insnBlock);
    for (uint32_t i = 0; i < insn.getDstNum(); i++) {
      const ir::Register reg = insn.getDst(i);
      if (liveOut.contains(reg))
        return true;
    }

    // The instruction is only used in the current basic block
    return false;
  }

  bool Selection::Opaque::hasQWord(const ir::Instruction &insn) {
    for (uint32_t i = 0; i < insn.getSrcNum(); i++) {
      const ir::Register reg = insn.getSrc(i);
      if (getRegisterFamily(reg) == ir::FAMILY_QWORD)
        return true;
    }
    for (uint32_t i = 0; i < insn.getDstNum(); i++) {
      const ir::Register reg = insn.getDst(i);
      if (getRegisterFamily(reg) == ir::FAMILY_QWORD)
        return true;
    }
    return false;
  }

  bool Selection::Opaque::isSimpleBlock(const ir::BasicBlock &bb, uint32_t insnNum) {
    // FIXME should include structured innermost if/else/endif
    if (bb.belongToStructure)
      return false;

    // FIXME scalar regs should not be excluded; they just need some special handling.
    for (int32_t insnID = insnNum - 1; insnID >= 0; --insnID) {
      SelectionDAG &dag = *insnDAG[insnID];
      const ir::Instruction &insn = dag.insn;
      if ((insn.getDstNum() && this->isScalarReg(insn.getDst(0)) == true) ||
          insn.isMemberOf<ir::CompareInstruction>() ||
          insn.isMemberOf<ir::SelectInstruction>() ||
          insn.getOpcode() == ir::OP_SIMD_ANY ||
          insn.getOpcode() == ir::OP_SIMD_ALL ||
          insn.getOpcode() == ir::OP_ELSE)
        return false;

      // Most QWord (long) related instructions introduce a CMP or more than
      // 10 actual instructions at a later stage.
      if (hasQWord(insn))
        return false;

      // An unaligned load may introduce a CMP instruction.
      if (insn.isMemberOf<ir::LoadInstruction>()) {
        const ir::LoadInstruction &ld = ir::cast<ir::LoadInstruction>(insn);
        if (!ld.isAligned())
          return false;
      }

      // If dst is a bool reg, the insn may modify the flag, so we can't use
      // that flag as the predication and can't remove the if/endif.
      // For example, the ir:
      //   %or.cond1244 = or i1 %cmp.i338, %cmp2.i403
      //   %or.cond1245 = or i1 %or.cond1244, %cmp3.i405
      // becomes the asm:
      //   (+f1.0) or.ne(16)      g20<1>:W g9<8,8,1>:W g1<8,8,1>:W
      //   (+f1.1) or.ne.f1.1(16) g21<1>:W g20<8,8,1>:W g30<8,8,1>:W
      // The second insn is wrong.
      if (insn.getDstNum() && getRegisterFamily(insn.getDst(0)) == ir::FAMILY_BOOL)
        return false;
    }

    // An extra CMP instruction would be generated for a predicated BRA with an
    // external flag, so return false to keep the if/endif.
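    // Editorial note: concretely, the check below fires when the block ends
    // with a predicated OP_BRA whose flag register is produced outside this
    // block (child[0] == NULL), which is exactly the "external flag" case
    // described above.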
    if (insnDAG[insnNum - 1]->insn.isMemberOf<ir::BranchInstruction>()) {
      if (insnDAG[insnNum - 1]->insn.getOpcode() == ir::OP_BRA) {
        const ir::BranchInstruction &insn = ir::cast<ir::BranchInstruction>(insnDAG[insnNum - 1]->insn);
        if (insn.isPredicated() && insnDAG[insnNum - 1]->child[0] == NULL) {
          return false;
        }
      }
    }
    return true;
  }

  uint32_t Selection::Opaque::buildBasicBlockDAG(const ir::BasicBlock &bb) {
    using namespace ir;

    // Clear all registers
    for (uint32_t regID = 0; regID < this->regNum; ++regID)
      this->regDAG[regID] = NULL;

    this->block->hasBarrier = false;
    this->block->hasBranch = bb.getLastInstruction()->getOpcode() == OP_BRA ||
                             bb.getLastInstruction()->getOpcode() == OP_RET;
    if (!this->block->hasBranch)
      this->block->endifOffset = -1;

    // Build the DAG on the fly
    uint32_t insnNum = 0;
    const_cast<BasicBlock&>(bb).foreach([&](const Instruction &insn) {
      if (insn.getOpcode() == OP_SYNC)
        this->block->hasBarrier = true;

      // Build a selectionDAG node for the instruction
      SelectionDAG *dag = this->newSelectionDAG(insn);

      // Point to non-root children
      const uint32_t srcNum = insn.getSrcNum();
      for (uint32_t srcID = 0; srcID < srcNum; ++srcID) {
        const ir::Register reg = insn.getSrc(srcID);
        SelectionDAG *child = this->regDAG[reg];
        if (child) {
          const ir::Instruction &childInsn = child->insn;
          const uint32_t childSrcNum = childInsn.getSrcNum();

          // We can merge a child only if its sources are still valid
          bool mergeable = true;
          for (uint32_t otherID = 0; otherID < childSrcNum; ++otherID) {
            const SelectionDAG *srcDAG = child->child[otherID];
            const ir::Register srcReg = childInsn.getSrc(otherID);
            SelectionDAG *currDAG = this->regDAG[srcReg];
            if (srcDAG != currDAG) {
              mergeable = false;
              break;
            }
          }
          if (mergeable)
            dag->setAsMergeable(srcID);
          dag->child[srcID] = child;
          // Check whether this bool is used as a normal source
          // operand other than BRA/SEL.
if (getRegisterFamily(reg) == FAMILY_BOOL) { if ((insn.getOpcode() != OP_BRA && (insn.getOpcode() != OP_SEL || (insn.getOpcode() == OP_SEL && srcID != 0))) || (isScalarReg(reg))) child->computeBool = true; } child->isUsed = true; } else dag->child[srcID] = NULL; } // Make it a root if we must if (this->isRoot(insn)) dag->isRoot = 1; // Save the DAG <-> instruction mapping this->insnDAG[insnNum++] = dag; // Associate all output registers to this instruction const uint32_t dstNum = insn.getDstNum(); for (uint32_t dstID = 0; dstID < dstNum; ++dstID) { const ir::Register reg = insn.getDst(dstID); this->regDAG[reg] = dag; } }); return insnNum; } extern bool OCL_DEBUGINFO; // first defined by calling BVAR in program.cpp #define SET_SEL_DBGINFO(I) \ if(OCL_DEBUGINFO) \ this->setDBGInfo_SEL(I.DBGInfo) void Selection::Opaque::matchBasicBlock(const ir::BasicBlock &bb, uint32_t insnNum) { // Bottom up code generation bool needEndif = this->block->hasBranch == false && !this->block->hasBarrier; needEndif = needEndif && bb.needEndif; this->block->removeSimpleIfEndif = insnNum < 10 && isSimpleBlock(bb, insnNum); if (needEndif && !this->block->removeSimpleIfEndif) { if(!bb.needIf) // this basic block is the exit of a structure this->ENDIF(GenRegister::immd(0), bb.endifLabel, bb.endifLabel); else { const ir::BasicBlock *next = bb.getNextBlock(); this->ENDIF(GenRegister::immd(0), next->getLabelIndex()); needEndif = false; } } for (int32_t insnID = insnNum-1; insnID >= 0; --insnID) { // Process all possible patterns for this instruction SelectionDAG &dag = *insnDAG[insnID]; SET_SEL_DBGINFO(dag.insn); if (dag.isRoot) { const ir::Instruction &insn = dag.insn; const ir::Opcode opcode = insn.getOpcode(); auto it = selLib->patterns[opcode].begin(); const auto end = selLib->patterns[opcode].end(); // Start a new code fragment this->startBackwardGeneration(); if(this->block->removeSimpleIfEndif){ this->push(); this->curr.predicate = GEN_PREDICATE_NORMAL; this->curr.flag = 0; this->curr.subFlag = 1; } // If there is no branch at the end of this block. // Try all the patterns from best to worst do { if ((*it)->emit(*this, dag)) break; ++it; } while (it != end); GBE_ASSERT(it != end); if(this->block->removeSimpleIfEndif){ this->curr.predicate = GEN_PREDICATE_NONE; this->curr.flag = 0; this->curr.subFlag = 1; this->pop(); } // If we are in if/endif fix mode, and this block is // large enough, we need to insert endif/if pair to eliminate // the too long if/endif block. 
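          // Editorial illustration of the re-split below: each time the block
          // grows by another 1000 selection instructions, the open conditional
          // is closed and immediately reopened against the same join label,
          // roughly:
          //   if (count != 0 && count % 1000 == 0) { ENDIF(jip); IF(jip, jip); }
          // so no single if/endif span becomes too long to encode.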
if (this->ctx.getIFENDIFFix() && this->block->insnList.size() != 0 && this->block->insnList.size() % 1000 == 0 && this->block->endifLabel.value() != 0) { this->curr.flag = 0; this->curr.subFlag = 1; ir::LabelIndex jip = this->block->endifLabel; this->ENDIF(GenRegister::immd(0), jip); this->push(); this->curr.predicate = GEN_PREDICATE_NORMAL; this->IF(GenRegister::immd(0), jip, jip); this->pop(); } // Output the code in the current basic block this->endBackwardGeneration(); } } } #undef SET_SEL_DBGINFO void Selection::Opaque::select(void) { using namespace ir; const Function &fn = ctx.getFunction(); // Perform the selection per basic block fn.foreachBlock([&](const BasicBlock &bb) { this->dagPool.rewind(); this->appendBlock(bb); const uint32_t insnNum = this->buildBasicBlockDAG(bb); this->matchBasicBlock(bb, insnNum); }); } void Selection::Opaque::SAMPLE(GenRegister *dst, uint32_t dstNum, GenRegister *msgPayloads, uint32_t msgNum, uint32_t bti, uint32_t sampler, bool isLD, bool isUniform) { SelectionInstruction *insn = this->appendInsn(SEL_OP_SAMPLE, dstNum, msgNum); SelectionVector *dstVector = this->appendVector(); SelectionVector *msgVector = this->appendVector(); // Regular instruction to encode for (uint32_t elemID = 0; elemID < dstNum; ++elemID) insn->dst(elemID) = dst[elemID]; for (uint32_t elemID = 0; elemID < msgNum; ++elemID) insn->src(elemID) = msgPayloads[elemID]; // Sends require contiguous allocation dstVector->regNum = dstNum; dstVector->isSrc = 0; dstVector->offsetID = 0; dstVector->reg = &insn->dst(0); // Only the messages require contiguous registers. msgVector->regNum = msgNum; msgVector->isSrc = 1; msgVector->offsetID = 0; msgVector->reg = &insn->src(0); insn->setbti(bti); insn->extra.sampler = sampler; insn->extra.rdmsglen = msgNum; insn->extra.isLD = isLD; insn->extra.isUniform = isUniform; } void Selection::Opaque::VME(uint32_t bti, GenRegister *dst, GenRegister *payloadVal, uint32_t dstNum, uint32_t srcNum, uint32_t msg_type, uint32_t vme_search_path_lut, uint32_t lut_sub) { SelectionInstruction *insn = this->appendInsn(SEL_OP_VME, dstNum, srcNum); SelectionVector *dstVector = this->appendVector(); SelectionVector *msgVector = this->appendVector(); for (uint32_t elemID = 0; elemID < dstNum; ++elemID) insn->dst(elemID) = dst[elemID]; for (uint32_t elemID = 0; elemID < srcNum; ++elemID) insn->src(elemID) = payloadVal[elemID]; dstVector->regNum = dstNum; dstVector->isSrc = 0; dstVector->offsetID = 0; dstVector->reg = &insn->dst(0); msgVector->regNum = srcNum; msgVector->isSrc = 1; msgVector->offsetID = 0; msgVector->reg = &insn->src(0); insn->setbti(bti); insn->extra.msg_type = msg_type; insn->extra.vme_search_path_lut = vme_search_path_lut; insn->extra.lut_sub = lut_sub; } /////////////////////////////////////////////////////////////////////////// // Code selection public implementation /////////////////////////////////////////////////////////////////////////// const GenContext& Selection::getCtx() { return this->opaque->ctx; } Selection::Selection(GenContext &ctx) { this->blockList = NULL; this->opaque = GBE_NEW(Selection::Opaque, ctx); this->opaque->setSlowByteGather(true); opt_features = 0; } Selection75::Selection75(GenContext &ctx) : Selection(ctx) { this->opaque->setSlowByteGather(false); opt_features = 0; } Selection8::Selection8(GenContext &ctx) : Selection(ctx) { this->opaque->setHas32X32Mul(true); this->opaque->setHasLongType(true); this->opaque->setHasDoubleType(true); this->opaque->setSlowByteGather(false); this->opaque->setHasHalfType(true); opt_features = 
SIOF_LOGICAL_SRCMOD; } SelectionChv::SelectionChv(GenContext &ctx) : Selection(ctx) { this->opaque->setHas32X32Mul(true); this->opaque->setHasLongType(true); this->opaque->setHasDoubleType(true); this->opaque->setLongRegRestrict(true); this->opaque->setSlowByteGather(false); this->opaque->setHasHalfType(true); opt_features = SIOF_LOGICAL_SRCMOD | SIOF_OP_MOV_LONG_REG_RESTRICT; } Selection9::Selection9(GenContext &ctx) : Selection(ctx) { this->opaque->setHas32X32Mul(true); this->opaque->setHasLongType(true); this->opaque->setHasDoubleType(true); this->opaque->setLdMsgOrder(LD_MSG_ORDER_SKL); this->opaque->setSlowByteGather(false); this->opaque->setHasHalfType(true); this->opaque->setHasSends(true); opt_features = SIOF_LOGICAL_SRCMOD; } SelectionBxt::SelectionBxt(GenContext &ctx) : Selection(ctx) { this->opaque->setHas32X32Mul(true); this->opaque->setHasLongType(true); this->opaque->setLongRegRestrict(true); this->opaque->setHasDoubleType(true); this->opaque->setLdMsgOrder(LD_MSG_ORDER_SKL); this->opaque->setSlowByteGather(false); this->opaque->setHasHalfType(true); opt_features = SIOF_LOGICAL_SRCMOD | SIOF_OP_MOV_LONG_REG_RESTRICT; } SelectionKbl::SelectionKbl(GenContext &ctx) : Selection(ctx) { this->opaque->setHas32X32Mul(true); this->opaque->setHasLongType(true); this->opaque->setHasDoubleType(true); this->opaque->setLdMsgOrder(LD_MSG_ORDER_SKL); this->opaque->setSlowByteGather(false); this->opaque->setHasHalfType(true); this->opaque->setHasSends(true); opt_features = SIOF_LOGICAL_SRCMOD; } SelectionGlk::SelectionGlk(GenContext &ctx) : Selection(ctx) { this->opaque->setHas32X32Mul(true); this->opaque->setHasLongType(true); this->opaque->setLongRegRestrict(true); this->opaque->setHasDoubleType(true); this->opaque->setLdMsgOrder(LD_MSG_ORDER_SKL); this->opaque->setSlowByteGather(false); this->opaque->setHasHalfType(true); opt_features = SIOF_LOGICAL_SRCMOD | SIOF_OP_MOV_LONG_REG_RESTRICT; } void Selection::Opaque::TYPED_WRITE(GenRegister *msgs, uint32_t msgNum, uint32_t bti, bool is3D) { uint32_t elemID = 0; uint32_t i; SelectionInstruction *insn = this->appendInsn(SEL_OP_TYPED_WRITE, 0, msgNum); for( i = 0; i < msgNum; ++i, ++elemID) insn->src(elemID) = msgs[i]; insn->setbti(bti); insn->extra.msglen = msgNum; insn->extra.is3DWrite = is3D; if (hasSends()) { assert(msgNum == 9); insn->extra.typedWriteSplitSend = 1; //header + coords SelectionVector *msgVector = this->appendVector(); msgVector->regNum = 5; msgVector->isSrc = 1; msgVector->offsetID = 0; msgVector->reg = &insn->src(0); //data msgVector = this->appendVector(); msgVector->regNum = 4; msgVector->isSrc = 1; msgVector->offsetID = 5; msgVector->reg = &insn->src(5); } else { // Send require contiguous allocation SelectionVector *msgVector = this->appendVector(); msgVector->regNum = msgNum; msgVector->isSrc = 1; msgVector->offsetID = 0; msgVector->reg = &insn->src(0); } } Selection::~Selection(void) { GBE_DELETE(this->opaque); } void Selection::select(void) { this->opaque->select(); this->blockList = &this->opaque->blockList; } uint32_t Selection::getLargestBlockSize(void) const { return this->opaque->getLargestBlockSize(); } uint32_t Selection::getVectorNum(void) const { return this->opaque->getVectorNum(); } uint32_t Selection::getRegNum(void) const { return this->opaque->getRegNum(); } ir::RegisterFamily Selection::getRegisterFamily(ir::Register reg) const { return this->opaque->getRegisterFamily(reg); } ir::RegisterData Selection::getRegisterData(ir::Register reg) const { return this->opaque->getRegisterData(reg); } ir::Register 
  Selection::replaceSrc(SelectionInstruction *insn, uint32_t regID, ir::Type type, bool needMov) {
    return this->opaque->replaceSrc(insn, regID, type, needMov);
  }

  ir::Register Selection::replaceDst(SelectionInstruction *insn, uint32_t regID, ir::Type type, bool needMov) {
    return this->opaque->replaceDst(insn, regID, type, needMov);
  }

  bool Selection::spillRegs(const SpilledRegs &spilledRegs, uint32_t registerPool) {
    return this->opaque->spillRegs(spilledRegs, registerPool);
  }

  bool Selection::isScalarReg(const ir::Register &reg) const {
    return this->opaque->isScalarReg(reg);
  }

  bool Selection::isPartialWrite(const ir::Register &reg) const {
    return this->opaque->isPartialWrite(reg);
  }

  SelectionInstruction *Selection::create(SelectionOpcode opcode, uint32_t dstNum, uint32_t srcNum) {
    return this->opaque->create(opcode, dstNum, srcNum);
  }

  ///////////////////////////////////////////////////////////////////////////
  // Implementation of all patterns
  ///////////////////////////////////////////////////////////////////////////

  bool canGetRegisterFromImmediate(const ir::Instruction &insn) {
    using namespace ir;
    const auto &childInsn = cast<LoadImmInstruction>(insn);
    const auto &imm = childInsn.getImmediate();
    if (imm.getType() != TYPE_DOUBLE && imm.getType() != TYPE_S64 && imm.getType() != TYPE_U64)
      return true;
    return false;
  }

  GenRegister getRegisterFromImmediate(ir::Immediate imm, ir::Type type, bool negate = false) {
    using namespace ir;
    int sign = negate ? -1 : 1;
    switch (type) {
      case TYPE_U32:    return GenRegister::immud(imm.getIntegerValue() * sign);
      case TYPE_S32:    return GenRegister::immd(imm.getIntegerValue() * sign);
      case TYPE_FLOAT:  return GenRegister::immf(imm.getFloatValue() * sign);
      case TYPE_U16:    return GenRegister::immuw(imm.getIntegerValue() * sign);
      case TYPE_S16:    return GenRegister::immw((int16_t)imm.getIntegerValue() * sign);
      case TYPE_U8:     return GenRegister::immuw(imm.getIntegerValue() * sign);
      case TYPE_S8:     return GenRegister::immw((int8_t)imm.getIntegerValue() * sign);
      case TYPE_DOUBLE: return GenRegister::immdf(imm.getDoubleValue() * sign);
      case TYPE_BOOL:   return GenRegister::immw((imm.getIntegerValue() == 0) ? 0 : -1); // return 0xffff when true
      case TYPE_HALF: {
        ir::half hf = imm.getHalfValue();
        int16_t _sign = negate ? -1 : 1;
        ir::half hfSign = ir::half::convToHalf(_sign);
        hf = hf * hfSign;
        return GenRegister::immh(hf.getVal());
      }
      default: NOT_SUPPORTED; return GenRegister::immuw(0);
    }
  }

  BVAR(OCL_OPTIMIZE_IMMEDIATE, true);
  void Selection::Opaque::getSrcGenRegImm(SelectionDAG &dag,
                                          SelectionDAG *dag0, SelectionDAG *dag1,
                                          GenRegister &src0, GenRegister &src1,
                                          ir::Type type, bool &inverse) {
    using namespace ir;
    inverse = false;
    // Right source can always be an immediate
    const int src0Index = dag.insn.isMemberOf<SelectInstruction>() ? SelectInstruction::src0Index : 0;
    const int src1Index = dag.insn.isMemberOf<SelectInstruction>() ? SelectInstruction::src1Index : 1;
    if (OCL_OPTIMIZE_IMMEDIATE && dag1 != NULL && dag1->insn.getOpcode() == OP_LOADI &&
        canGetRegisterFromImmediate(dag1->insn)) {
      const auto &childInsn = cast<LoadImmInstruction>(dag1->insn);
      src0 = this->selReg(dag.insn.getSrc(src0Index), type);
      src1 = getRegisterFromImmediate(childInsn.getImmediate(), type);
      if (dag0) dag0->isRoot = 1;
    }
    // Left source cannot be an immediate but it is OK if we can commute
    else if (OCL_OPTIMIZE_IMMEDIATE && dag0 != NULL &&
             dag.insn.isMemberOf<BinaryInstruction>() &&
             ((cast<BinaryInstruction>(dag.insn)).commutes() || dag.insn.getOpcode() == OP_SUB) &&
             dag0->insn.getOpcode() == OP_LOADI &&
             canGetRegisterFromImmediate(dag0->insn)) {
      const auto &childInsn = cast<LoadImmInstruction>(dag0->insn);
      src0 = dag.insn.getOpcode() != OP_SUB ?
             this->selReg(dag.insn.getSrc(src1Index), type) :
             GenRegister::negate(this->selReg(dag.insn.getSrc(src1Index), type));
      Immediate imm = childInsn.getImmediate();
      src1 = getRegisterFromImmediate(imm, type, dag.insn.getOpcode() == OP_SUB);
      if (dag1) dag1->isRoot = 1;
    }
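    /* Illustration (editorial, not part of the build): for OP_SUB with the
       immediate on the left, the branch above negates both operands so the
       immediate can move into src1, the only operand slot that accepts
       immediates on Gen: k - b == (-b) - (-k). Stand-alone sketch:

         static int subImmLeft(int k, int b) {
           return (-b) - (-k);  // == k - b, immediate now on the right
         }
         // subImmLeft(10, 3) == 7
    */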
this->selReg(dag.insn.getSrc(src1Index), type) : GenRegister::negate(this->selReg(dag.insn.getSrc(src1Index), type)); Immediate imm = childInsn.getImmediate(); src1 = getRegisterFromImmediate(imm, type, dag.insn.getOpcode() == OP_SUB); if (dag1) dag1->isRoot = 1; } // If it's a compare instruction, theoretically, we can easily reverse the condition code to // switch the two operands. But we can't do that for floats because NaNs exist. // For a normal select instruction, we can always invert the predication to switch the two // operands' position. else if (OCL_OPTIMIZE_IMMEDIATE && dag0 != NULL && dag0->insn.getOpcode() == OP_LOADI && canGetRegisterFromImmediate(dag0->insn) && ((dag.insn.isMemberOf() && type != TYPE_FLOAT && type != TYPE_DOUBLE) || (dag.insn.isMemberOf()))) { const auto &childInsn = cast(dag0->insn); src0 = this->selReg(dag.insn.getSrc(src1Index), type); src1 = getRegisterFromImmediate(childInsn.getImmediate(), type); inverse = true; if (dag1) dag1->isRoot = 1; } // Just grab the two sources else { src0 = this->selReg(dag.insn.getSrc(src0Index), type); src1 = this->selReg(dag.insn.getSrc(src1Index), type); markAllChildren(dag); } } void Selection::Opaque::getSrcGenRegImm(SelectionDAG &dag, GenRegister &src0, GenRegister &src1, ir::Type type, bool &inverse) { SelectionDAG *dag0 = dag.child[0]; SelectionDAG *dag1 = dag.child[1]; getSrcGenRegImm(dag, dag0, dag1, src0, src1, type, inverse); } /*! Template for the one-to-many instruction patterns */ template class OneToManyPattern : public SelectionPattern { public: /*! Register the pattern for all opcodes of the family */ OneToManyPattern(uint32_t insnNum, uint32_t cost) : SelectionPattern(insnNum, cost) { for (uint32_t op = 0; op < ir::OP_INVALID; ++op) if (ir::isOpcodeFrom(ir::Opcode(op)) == true) this->opcodes.push_back(ir::Opcode(op)); } /*! Call the child method with the proper prototype */ virtual bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { bool markChildren = true; if (static_cast(this)->emitOne(sel, ir::cast(dag.insn), markChildren)) { if (markChildren) markAllChildren(dag); return true; } return false; } }; /*! Declare a naive one-to-many pattern */ #define DECL_PATTERN(FAMILY) \ struct FAMILY##Pattern : public OneToManyPattern #define DECL_CTOR(FAMILY, INSN_NUM, COST) \ FAMILY##Pattern(void) : OneToManyPattern(INSN_NUM, COST) {} /*! Nullary instruction patterns */ class NullaryInstructionPattern : public SelectionPattern { public: NullaryInstructionPattern(void) : SelectionPattern(1,1) { for (uint32_t op = 0; op < ir::OP_INVALID; ++op) if (ir::isOpcodeFrom(ir::Opcode(op)) == true) this->opcodes.push_back(ir::Opcode(op)); } INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; const ir::NullaryInstruction &insn = cast(dag.insn); const Opcode opcode = insn.getOpcode(); const Type type = insn.getType(); GenRegister dst = sel.selReg(insn.getDst(0), type); sel.push(); if (sel.isScalarReg(insn.getDst(0))) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } switch (opcode) { case ir::OP_SIMD_SIZE: { const GenRegister src = GenRegister::immud(sel.ctx.getSimdWidth()); sel.MOV(dst, src); } break; case ir::OP_SIMD_ID: { GenRegister laneID = sel.getLaneIDReg(); sel.MOV(dst, laneID); } break; default: NOT_SUPPORTED; } sel.pop(); return true; } }; /*!
Unary instruction patterns */ DECL_PATTERN(UnaryInstruction) { static ir::Type getType(const ir::Opcode opcode, const ir::Type insnType, bool isSrc = false) { if (opcode == ir::OP_CBIT) return isSrc ? insnType : ir::TYPE_U32; if (insnType == ir::TYPE_BOOL) return ir::TYPE_U16; else if (opcode == ir::OP_MOV && (insnType == ir::TYPE_U32 || insnType == ir::TYPE_S32)) return ir::TYPE_FLOAT; else return insnType; } INLINE bool emitOne(Selection::Opaque &sel, const ir::UnaryInstruction &insn, bool &markChildren) const { const ir::Opcode opcode = insn.getOpcode(); const ir::Type insnType = insn.getType(); const GenRegister dst = sel.selReg(insn.getDst(0), getType(opcode, insnType, false)); const GenRegister src = sel.selReg(insn.getSrc(0), getType(opcode, insnType, true)); sel.push(); if (sel.isScalarReg(insn.getDst(0)) == true) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } switch (opcode) { case ir::OP_ABS: { const GenRegister src_ = GenRegister::retype(src, getGenType(insnType)); const GenRegister dst_ = GenRegister::retype(dst, getGenType(insnType)); sel.MOV(dst_, GenRegister::abs(src_)); } break; case ir::OP_MOV: { sel.push(); auto dag = sel.regDAG[insn.getDst(0)]; if (sel.getRegisterFamily(insn.getDst(0)) == ir::FAMILY_BOOL && dag->isUsed) { sel.curr.physicalFlag = 0; sel.curr.flagIndex = insn.getDst(0).value(); sel.curr.modFlag = 1; } sel.MOV(dst, src); sel.pop(); } break; case ir::OP_RNDD: sel.RNDD(dst, src); break; case ir::OP_RNDE: sel.RNDE(dst, src); break; case ir::OP_RNDU: sel.RNDU(dst, src); break; case ir::OP_RNDZ: sel.RNDZ(dst, src); break; case ir::OP_FBH: sel.FBH(dst, src); break; case ir::OP_FBL: sel.FBL(dst, src); break; case ir::OP_CBIT: sel.CBIT(dst, src); break; case ir::OP_LZD: sel.LZD(dst, src); break; case ir::OP_BFREV: sel.BFREV(dst, src); break; case ir::OP_COS: sel.MATH(dst, GEN_MATH_FUNCTION_COS, src); break; case ir::OP_SIN: sel.MATH(dst, GEN_MATH_FUNCTION_SIN, src); break; case ir::OP_LOG: sel.MATH(dst, GEN_MATH_FUNCTION_LOG, src); break; case ir::OP_EXP: sel.MATH(dst, GEN_MATH_FUNCTION_EXP, src); break; case ir::OP_SQR: sel.MATH(dst, GEN_MATH_FUNCTION_SQRT, src); break; case ir::OP_RSQ: sel.MATH(dst, GEN_MATH_FUNCTION_RSQ, src); break; case ir::OP_RCP: sel.MATH(dst, GEN_MATH_FUNCTION_INV, src); break; case ir::OP_BSWAP: { ir::Register tmp = sel.reg(getFamily(insnType)); const GenRegister src_ = GenRegister::retype(src, getGenType(insnType)); const GenRegister dst_ = GenRegister::retype(dst, getGenType(insnType)); sel.BSWAP(dst_, src_, sel.selReg(tmp, insnType)); break; } case ir::OP_SIMD_ANY: { const GenRegister constZero = GenRegister::immuw(0);; const GenRegister constOne = GenRegister::retype(sel.selReg(sel.reg(ir::FAMILY_DWORD)), GEN_TYPE_UD); const GenRegister flag01 = GenRegister::flag(0, 1); sel.push(); int simdWidth = sel.curr.execWidth; sel.curr.predicate = GEN_PREDICATE_NONE; sel.MOV(constOne, GenRegister::immud(1)); sel.curr.execWidth = 1; sel.curr.noMask = 1; sel.MOV(flag01, constZero); sel.curr.execWidth = simdWidth; sel.curr.noMask = 0; sel.curr.flag = 0; sel.curr.subFlag = 1; sel.CMP(GEN_CONDITIONAL_NEQ, src, constZero); if (sel.curr.execWidth == 16) sel.curr.predicate = GEN_PREDICATE_ALIGN1_ANY16H; else if (sel.curr.execWidth == 8) sel.curr.predicate = GEN_PREDICATE_ALIGN1_ANY8H; else NOT_IMPLEMENTED; sel.SEL(dst, constOne, constZero); sel.pop(); } break; case ir::OP_SIMD_ALL: { const GenRegister constZero = GenRegister::immuw(0); const GenRegister constOne = 
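/* NOTE (editorial sketch; not part of the original source): OP_SIMD_ANY above
 * seeds flag0.1 with zero, compares every lane against zero with
 * GEN_CONDITIONAL_NEQ, and then predicates the SEL with the horizontal
 * ANY8H/ANY16H mode so that all lanes receive 1 as soon as any lane's source
 * is non-zero. OP_SIMD_ALL below is symmetric: the flag is seeded with
 * all-ones and the ALL8H/ALL16H predicate only passes if every lane compared
 * non-zero. */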
GenRegister::retype(sel.selReg(sel.reg(ir::FAMILY_DWORD)), GEN_TYPE_UD); const GenRegister regOne = GenRegister::uw1grf(ir::ocl::one); const GenRegister flag01 = GenRegister::flag(0, 1); sel.push(); int simdWidth = sel.curr.execWidth; sel.curr.predicate = GEN_PREDICATE_NONE; sel.MOV(constOne, GenRegister::immud(1)); sel.curr.execWidth = 1; sel.curr.noMask = 1; sel.MOV(flag01, regOne); sel.curr.execWidth = simdWidth; sel.curr.noMask = 0; sel.curr.flag = 0; sel.curr.subFlag = 1; sel.CMP(GEN_CONDITIONAL_NEQ, src, constZero); if (sel.curr.execWidth == 16) sel.curr.predicate = GEN_PREDICATE_ALIGN1_ALL16H; else if (sel.curr.execWidth == 8) sel.curr.predicate = GEN_PREDICATE_ALIGN1_ALL8H; else NOT_IMPLEMENTED; sel.SEL(dst, constOne, constZero); sel.pop(); } break; default: NOT_SUPPORTED; } sel.pop(); return true; } DECL_CTOR(UnaryInstruction, 1, 1) }; /*! Binary regular instruction pattern */ class BinaryInstructionPattern : public SelectionPattern { public: BinaryInstructionPattern(void) : SelectionPattern(1,1) { for (uint32_t op = 0; op < ir::OP_INVALID; ++op) if (ir::isOpcodeFrom(ir::Opcode(op)) == true) this->opcodes.push_back(ir::Opcode(op)); } bool emitDivRemInst(Selection::Opaque &sel, SelectionDAG &dag, ir::Opcode op) const { using namespace ir; const ir::BinaryInstruction &insn = cast(dag.insn); const Type type = insn.getType(); GenRegister dst = sel.selReg(insn.getDst(0), type); GenRegister src0 = sel.selReg(insn.getSrc(0), type); GenRegister src1 = sel.selReg(insn.getSrc(1), type); const uint32_t simdWidth = sel.curr.execWidth; const bool isUniform = simdWidth == 1; const RegisterFamily family = getFamily(type); uint32_t function = (op == OP_DIV)? GEN_MATH_FUNCTION_INT_DIV_QUOTIENT : GEN_MATH_FUNCTION_INT_DIV_REMAINDER; //bytes and shorts must be converted to int for DIV and REM per GEN restriction if((family == FAMILY_WORD || family == FAMILY_BYTE) && (type != TYPE_HALF)) { GenRegister tmp0, tmp1; ir::Register reg = sel.reg(FAMILY_DWORD, isUniform); tmp0 = sel.selReg(reg, ir::TYPE_S32); sel.MOV(tmp0, src0); tmp1 = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_S32); sel.MOV(tmp1, src1); sel.MATH(tmp0, function, tmp0, tmp1); GenRegister unpacked; if(family == FAMILY_WORD) { unpacked = sel.unpacked_uw(reg); } else { unpacked = sel.unpacked_ub(reg); } unpacked = GenRegister::retype(unpacked, getGenType(type)); sel.MOV(dst, unpacked); } else if (type == TYPE_HALF) { ir::Register reg = sel.reg(FAMILY_DWORD, isUniform); GenRegister tmp0 = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_FLOAT); GenRegister tmp1 = sel.selReg(reg, ir::TYPE_FLOAT); sel.MOV(tmp0, src0); sel.MOV(tmp1, src1); GBE_ASSERT(op != OP_REM); sel.MATH(tmp0, GEN_MATH_FUNCTION_FDIV, tmp0, tmp1); GenRegister unpacked = GenRegister::retype(sel.unpacked_uw(reg), GEN_TYPE_HF); sel.MOV(unpacked, tmp0); sel.MOV(dst, unpacked); } else if (type == TYPE_S32 || type == TYPE_U32 ) { sel.MATH(dst, function, src0, src1); } else if(type == TYPE_FLOAT) { GBE_ASSERT(op != OP_REM); sel.MATH(dst, GEN_MATH_FUNCTION_FDIV, src0, src1); } else if (type == TYPE_S64 || type == TYPE_U64) { GenRegister tmp[15]; int tmp_num = 13; for(int i=0; i < 13; i++) { tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD)); tmp[i].type = GEN_TYPE_UD; } if (sel.hasLongType()) { if (!sel.isScalarReg(insn.getSrc(0))) { tmp[tmp_num] = GenRegister::retype(sel.selReg(sel.reg(FAMILY_QWORD)), src0.type); tmp_num++; } if (!sel.isScalarReg(insn.getSrc(1))) { tmp[tmp_num] = GenRegister::retype(sel.selReg(sel.reg(FAMILY_QWORD)), src1.type); tmp_num++; } /* We at least one 
tmp register to convert if dst is not scalar. */ if (!sel.isScalarReg(insn.getDst(0)) && sel.isScalarReg(insn.getSrc(0)) && sel.isScalarReg(insn.getSrc(1))) { GBE_ASSERT(tmp_num == 13); tmp[tmp_num] = sel.selReg(sel.reg(FAMILY_QWORD), ir::TYPE_U64); tmp_num++; } } sel.push(); sel.curr.flag = 0; sel.curr.subFlag = 1; if(op == OP_DIV) sel.I64DIV(dst, src0, src1, tmp, tmp_num); else sel.I64REM(dst, src0, src1, tmp, tmp_num); sel.pop(); } else if (type == TYPE_DOUBLE) { if (!sel.hasDoubleType()) GBE_ASSERT(0); GenRegister tmp[10]; int tmpNum = 7; ir::RegisterFamily fm; if (sel.ctx.getSimdWidth() == 16) { fm = FAMILY_WORD; } else { fm = FAMILY_DWORD; } /* madm and invm need special accumulator support, which requires us to be in align16 mode. If any src is uniform, we need another tmp register and MOV the uniform one to it. Because the madm and invm will work in align16 mode, the channel mask is different from the align1 mode. So we cannot directly write the result to the dst and need a tmp register to hold the result and MOV it to dst later. */ tmpNum++; //For the dst. if (src0.hstride == GEN_HORIZONTAL_STRIDE_0) tmpNum++; if (src1.hstride == GEN_HORIZONTAL_STRIDE_0) tmpNum++; for (int i = 0; i < tmpNum; i++) tmp[i] = GenRegister::df8grf(sel.reg(fm)); sel.push(); sel.curr.flag = 0; sel.curr.subFlag = 1; sel.F64DIV(dst, src0, src1, tmp, tmpNum); sel.pop(); } else { GBE_ASSERT(0); } markAllChildren(dag); return true; } INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; const ir::BinaryInstruction &insn = cast(dag.insn); const Opcode opcode = insn.getOpcode(); const Type type = insn.getType(); GenRegister dst = sel.selReg(insn.getDst(0), type); sel.push(); // Boolean values use scalars if (sel.isScalarReg(insn.getDst(0)) == true) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } if(opcode == OP_DIV || opcode == OP_REM) { bool ret = this->emitDivRemInst(sel, dag, opcode); sel.pop(); return ret; } // Immediates are not supported here if (opcode == OP_POW) { GenRegister src0 = sel.selReg(insn.getSrc(0), type); GenRegister src1 = sel.selReg(insn.getSrc(1), type); if(type == TYPE_FLOAT) { sel.MATH(dst, GEN_MATH_FUNCTION_POW, src0, src1); } else { NOT_IMPLEMENTED; } markAllChildren(dag); sel.pop(); return true; } // Look for immediate values GenRegister src0, src1; bool inverse = false; sel.getSrcGenRegImm(dag, src0, src1, type, inverse); // Output the binary instruction if (sel.getRegisterFamily(insn.getDst(0)) == ir::FAMILY_BOOL && dag.isUsed) { GBE_ASSERT(insn.getOpcode() == OP_AND || insn.getOpcode() == OP_OR || insn.getOpcode() == OP_XOR); sel.curr.physicalFlag = 0; sel.curr.flagIndex = insn.getDst(0).value(); sel.curr.modFlag = 1; } switch (opcode) { case OP_ADD: if ((type == Type::TYPE_U64 || type == Type::TYPE_S64) && !sel.hasLongType()) { GenRegister t = sel.selReg(sel.reg(RegisterFamily::FAMILY_QWORD), Type::TYPE_S64); sel.I64ADD(dst, src0, src1, t); } else sel.ADD(dst, src0, src1); break; case OP_ADDSAT: if ((type == Type::TYPE_U64 || type == Type::TYPE_S64) && !sel.hasLongType()) { GenRegister tmp[5]; for(int i=0; i<5; i++) { tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD)); tmp[i].type = GEN_TYPE_UD; } sel.push(); sel.curr.flag = 0; sel.curr.subFlag = 1; sel.I64SATADD(dst, src0, src1, tmp); sel.pop(); break; } sel.push(); sel.curr.saturate = GEN_MATH_SATURATE_SATURATE; sel.ADD(dst, src0, src1); sel.pop(); break; case OP_XOR: if ((type == Type::TYPE_U64 || type == Type::TYPE_S64) && !sel.hasLongType()) sel.I64XOR(dst, src0, src1); else
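/* NOTE (editorial sketch; not part of the original source): on hardware
 * without a native 64-bit integer type, sel.hasLongType() is false and the
 * 64-bit cases in this switch fall back to the I64* helpers (I64ADD, I64XOR,
 * I64SHL, ...), each of which expands a single 64-bit operation into a
 * sequence over the dword temporaries allocated in tmp[]. */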
sel.XOR(dst, src0, src1); break; case OP_OR: if ((type == Type::TYPE_U64 || type == Type::TYPE_S64) && !sel.hasLongType()) sel.I64OR(dst, src0, src1); else sel.OR(dst, src0, src1); break; case OP_AND: if ((type == Type::TYPE_U64 || type == Type::TYPE_S64) && !sel.hasLongType()) sel.I64AND(dst, src0, src1); else sel.AND(dst, src0, src1); break; case OP_SUB: if ((type == Type::TYPE_U64 || type == Type::TYPE_S64) && !sel.hasLongType()) { GenRegister t = sel.selReg(sel.reg(RegisterFamily::FAMILY_QWORD), Type::TYPE_S64); sel.I64SUB(dst, src0, src1, t); } else sel.ADD(dst, src0, GenRegister::negate(src1)); break; case OP_SUBSAT: if ((type == Type::TYPE_U64 || type == Type::TYPE_S64) && !sel.hasLongType()) { GenRegister tmp[5]; for(int i=0; i<5; i++) { tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD)); tmp[i].type = GEN_TYPE_UD; } sel.push(); sel.curr.flag = 0; sel.curr.subFlag = 1; sel.I64SATSUB(dst, src0, src1, tmp); sel.pop(); break; } sel.push(); sel.curr.saturate = GEN_MATH_SATURATE_SATURATE; sel.ADD(dst, src0, GenRegister::negate(src1)); sel.pop(); break; case OP_SHL: if ((type == Type::TYPE_U64 || type == Type::TYPE_S64) && !sel.hasLongType()) { GenRegister tmp[6]; for(int i = 0; i < 6; i ++) tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD)); sel.push(); sel.curr.flag = 0; sel.curr.subFlag = 1; sel.I64SHL(dst, src0, src1, tmp); sel.pop(); } else sel.SHL(dst, src0, src1); break; case OP_SHR: if ((type == Type::TYPE_U64 || type == Type::TYPE_S64) && !sel.hasLongType()) { GenRegister tmp[6]; for(int i = 0; i < 6; i ++) tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD)); sel.push(); sel.curr.flag = 0; sel.curr.subFlag = 1; sel.I64SHR(dst, src0, src1, tmp); sel.pop(); } else sel.SHR(dst, src0, src1); break; case OP_ASR: if ((type == Type::TYPE_U64 || type == Type::TYPE_S64) && !sel.hasLongType()) { GenRegister tmp[6]; for(int i = 0; i < 6; i ++) tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD)); sel.push(); sel.curr.flag = 0; sel.curr.subFlag = 1; sel.I64ASR(dst, src0, src1, tmp); sel.pop(); } else sel.ASR(dst, src0, src1); break; case OP_MUL_HI: { GenRegister temp = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD)), GEN_TYPE_UD); sel.MUL_HI(dst, src0, src1, temp); break; } case OP_I64_MUL_HI: { int tmp_num; GenRegister temp[9]; if (sel.hasLongType()) { for(int i=0; i<9; i++) { temp[i] = sel.selReg(sel.reg(FAMILY_QWORD), ir::TYPE_U64); } tmp_num = 6; } else { for(int i=0; i<9; i++) { temp[i] = sel.selReg(sel.reg(FAMILY_DWORD)); temp[i].type = GEN_TYPE_UD; } tmp_num = 9; } sel.push(); sel.curr.flag = 0; sel.curr.subFlag = 1; sel.I64_MUL_HI(dst, src0, src1, temp, tmp_num); sel.pop(); break; } case OP_MUL: if (type == TYPE_U32 || type == TYPE_S32) { sel.pop(); return false; } else if (type == TYPE_S64 || type == TYPE_U64) { if (sel.hasLongType()) { GenRegister tmp; tmp = sel.selReg(sel.reg(FAMILY_QWORD), ir::TYPE_U64); sel.I64MUL(dst, src0, src1, &tmp, true); } else { GenRegister tmp[6]; for(int i = 0; i < 6; i++) tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD)); sel.I64MUL(dst, src0, src1, tmp, false); } } else sel.MUL(dst, src0, src1); break; case OP_HADD: { GenRegister temp = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD)), GEN_TYPE_D); sel.HADD(dst, src0, src1, temp); break; } case OP_RHADD: { GenRegister temp = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD)), GEN_TYPE_D); sel.RHADD(dst, src0, src1, temp); break; } case OP_I64HADD: { GenRegister tmp[4]; if (!sel.hasLongType()) { for(int i=0; i<4; i++) tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD)); sel.I64HADD(dst, src0, src1, tmp, 4); } else { tmp[0] = 
sel.selReg(sel.reg(FAMILY_QWORD), ir::TYPE_U64); tmp[1] = sel.selReg(sel.reg(FAMILY_QWORD), ir::TYPE_U64); sel.I64HADD(dst, src0, src1, tmp, 2); } break; } case OP_I64RHADD: { GenRegister tmp[4]; if (!sel.hasLongType()) { for(int i=0; i<4; i++) tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD)); sel.I64RHADD(dst, src0, src1, tmp, 4); } else { tmp[0] = sel.selReg(sel.reg(FAMILY_QWORD), ir::TYPE_U64); tmp[1] = sel.selReg(sel.reg(FAMILY_QWORD), ir::TYPE_U64); sel.I64RHADD(dst, src0, src1, tmp, 2); } break; } case OP_UPSAMPLE_SHORT: { dst = GenRegister::retype(sel.unpacked_uw(dst.reg()), GEN_TYPE_B); src0 = GenRegister::retype(sel.unpacked_uw(src0.reg()), GEN_TYPE_B); src1 = GenRegister::retype(sel.unpacked_uw(src1.reg()), GEN_TYPE_B); sel.MOV(dst, src1); dst = sel.getOffsetReg(dst, 0, typeSize(GEN_TYPE_B)); sel.MOV(dst, src0); break; } case OP_UPSAMPLE_INT: { dst = sel.unpacked_uw(dst.reg()); src0 = sel.unpacked_uw(src0.reg()); src1 = sel.unpacked_uw(src1.reg()); sel.MOV(dst, src1); dst = sel.getOffsetReg(dst, 0, typeSize(GEN_TYPE_W)); sel.MOV(dst, src0); break; } case OP_UPSAMPLE_LONG: sel.UPSAMPLE_LONG(dst, src0, src1); break; default: NOT_IMPLEMENTED; } sel.pop(); return true; } }; /*! MAD pattern */ class MulAddInstructionPattern : public SelectionPattern { public: /*! Register the pattern for all opcodes of the family */ MulAddInstructionPattern(void) : SelectionPattern(2, 1) { this->opcodes.push_back(ir::OP_ADD); this->opcodes.push_back(ir::OP_SUB); } /*! Implements base class */ virtual bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; // XXX TODO: we need a clean support of FP_CONTRACT to remove below line 'return false' // if 'pragma FP_CONTRACT OFF' is used in cl kernel, we should not do mad optimization. if (!sel.ctx.relaxMath) return false; // MAD tend to increase liveness of the sources (since there are three of // them). TODO refine this strategy. Well, we should be able at least to // evaluate per basic block register pressure and selectively enable // disable MADs if (sel.ctx.limitRegisterPressure) return false; // We are good to try. We need a MUL for one of the two sources const ir::BinaryInstruction &insn = cast(dag.insn); if (insn.getType() != TYPE_FLOAT) return false; SelectionDAG *child0 = dag.child[0]; SelectionDAG *child1 = dag.child[1]; const GenRegister dst = sel.selReg(insn.getDst(0), TYPE_FLOAT); if (child0 && child0->insn.getOpcode() == OP_MUL) { GBE_ASSERT(cast(child0->insn).getType() == TYPE_FLOAT); const GenRegister src0 = sel.selReg(child0->insn.getSrc(0), TYPE_FLOAT); const GenRegister src1 = sel.selReg(child0->insn.getSrc(1), TYPE_FLOAT); GenRegister src2 = sel.selReg(insn.getSrc(1), TYPE_FLOAT); if(insn.getOpcode() == ir::OP_SUB) src2 = GenRegister::negate(src2); sel.push(); if (sel.isScalarReg(insn.getDst(0))) sel.curr.execWidth = 1; sel.MAD(dst, src2, src0, src1); // order different on HW! 
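/* NOTE (editorial sketch; not part of the original source): "order different
 * on HW" because the Gen three-source MAD computes dst = src0 + src1 * src2,
 * i.e. the addend comes first. That is why the addend (named src2 in this
 * pattern) is passed as the first source of sel.MAD():
 *   mul t, a, b ; add d, t, c   ===>   mad d, c, a, b  */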
sel.pop(); if (child0->child[0]) child0->child[0]->isRoot = 1; if (child0->child[1]) child0->child[1]->isRoot = 1; if (child1) child1->isRoot = 1; return true; } if (child1 && child1->insn.getOpcode() == OP_MUL) { GBE_ASSERT(cast(child1->insn).getType() == TYPE_FLOAT); GenRegister src0 = sel.selReg(child1->insn.getSrc(0), TYPE_FLOAT); const GenRegister src1 = sel.selReg(child1->insn.getSrc(1), TYPE_FLOAT); const GenRegister src2 = sel.selReg(insn.getSrc(0), TYPE_FLOAT); if(insn.getOpcode() == ir::OP_SUB) src0 = GenRegister::negate(src0); sel.push(); if (sel.isScalarReg(insn.getDst(0))) sel.curr.execWidth = 1; sel.MAD(dst, src2, src0, src1); // order different on HW! sel.pop(); if (child1->child[0]) child1->child[0]->isRoot = 1; if (child1->child[1]) child1->child[1]->isRoot = 1; if (child0) child0->isRoot = 1; return true; } return false; } }; /*! There are some patterns like: sqrt r1, r2; load r4, 1.0; div r3, r4, r1; ===> rsqrt r3, r2 */ class SqrtDivInstructionPattern : public SelectionPattern { public: /*! Register the pattern for all opcodes of the family */ SqrtDivInstructionPattern(void) : SelectionPattern(1, 1) { this->opcodes.push_back(ir::OP_DIV); } /*! Implements base class */ virtual bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; // We are good to try. We need a SQRT for one of the two sources const ir::BinaryInstruction &insn = cast(dag.insn); if (insn.getType() != TYPE_FLOAT) return false; SelectionDAG *child0 = dag.child[0]; SelectionDAG *child1 = dag.child[1]; const GenRegister dst = sel.selReg(insn.getDst(0), TYPE_FLOAT); if (child1 && child1->insn.getOpcode() == OP_SQR) { GBE_ASSERT(cast(child1->insn).getType() == TYPE_FLOAT); GenRegister srcSQR = sel.selReg(child1->insn.getSrc(0), TYPE_FLOAT); const GenRegister tmp = sel.selReg(sel.reg(ir::FAMILY_DWORD), ir::TYPE_FLOAT); const GenRegister src0 = sel.selReg(insn.getSrc(0), TYPE_FLOAT); float immVal = 0.0f; if (child0 && child0->insn.getOpcode() == OP_LOADI) { const auto &loadimm = cast(child0->insn); const Immediate imm = loadimm.getImmediate(); const Type type = imm.getType(); if (type == TYPE_FLOAT) immVal = imm.getFloatValue(); else if (type == TYPE_S32 || type == TYPE_U32) immVal = imm.getIntegerValue(); } sel.push(); if (sel.isScalarReg(insn.getDst(0))) sel.curr.execWidth = 1; if (immVal == 1.0f) { sel.MATH(dst, GEN_MATH_FUNCTION_RSQ, srcSQR); if (child1->child[0]) child1->child[0]->isRoot = 1; } else { sel.MATH(tmp, GEN_MATH_FUNCTION_RSQ, srcSQR); if (immVal != 0.0f) { GenRegister isrc = GenRegister::immf(immVal); sel.MUL(dst, tmp, isrc); } else { sel.MUL(dst, src0, tmp); if (child0) child0->isRoot = 1; } if (child1->child[0]) child1->child[0]->isRoot = 1; } sel.pop(); return true; } return false; } }; /*! sel.{le,l,ge...} like patterns */ class SelectModifierInstructionPattern : public SelectionPattern { public: /*! Register the pattern for all opcodes of the family */ SelectModifierInstructionPattern(void) : SelectionPattern(2, 1) { this->opcodes.push_back(ir::OP_SEL); } /*!
Implements base class */ virtual bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; SelectionDAG *cmp = dag.child[0]; const SelectInstruction &insn = cast(dag.insn); if (insn.getType() == TYPE_S64 || insn.getType() == TYPE_U64) // not support return false; // Not in this block if (cmp == NULL) return false; // We need to match a compare if (cmp->insn.isMemberOf() == false) return false; // We look for something like that: // cmp.{le,ge...} flag src0 src1 // sel dst flag src0 src1 // So both sources must match if (sourceMatch(cmp, 0, &dag, 1) == false) return false; if (sourceMatch(cmp, 1, &dag, 2) == false) return false; // OK, we merge the instructions const ir::CompareInstruction &cmpInsn = cast(cmp->insn); const ir::Opcode opcode = cmpInsn.getOpcode(); if(opcode == OP_ORD) return false; GenRegister src0, src1; const ir::Type type = cmpInsn.getType(); bool inverse = false; sel.getSrcGenRegImm(*cmp, src0, src1, type, inverse); const uint32_t genCmp = getGenCompare(opcode, inverse); sel.push(); if (sel.isScalarReg(insn.getDst(0)) == true) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } // Like for regular selects, we need a temporary since we cannot predicate // properly const uint32_t simdWidth = sel.curr.execWidth; const GenRegister dst = sel.selReg(insn.getDst(0), type); sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.execWidth = simdWidth; sel.SEL_CMP(genCmp, dst, src0, src1); sel.pop(); return true; } }; /*! 32 bits integer multiply needs more instructions */ class Int32x32MulInstructionPattern : public SelectionPattern { public: /*! Register the pattern for all opcodes of the family */ Int32x32MulInstructionPattern(void) : SelectionPattern(1, 4) { this->opcodes.push_back(ir::OP_MUL); } /*! 
Implements base class */ virtual bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; const ir::BinaryInstruction &insn = cast(dag.insn); const Type type = insn.getType(); if (type != TYPE_U32 && type != TYPE_S32) return false; GenRegister dst = sel.selReg(insn.getDst(0), type); GenRegister src0 = sel.selReg(insn.getSrc(0), type); GenRegister src1 = sel.selReg(insn.getSrc(1), type); sel.push(); if (sel.has32X32Mul()) { if (sel.isScalarReg(insn.getDst(0)) == true) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } sel.MUL(dst, src0, src1); } else { if (sel.isScalarReg(insn.getDst(0)) == true) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } const int simdWidth = sel.curr.execWidth; // Either left part of the 16-wide register or just a simd 8 register dst = GenRegister::retype(dst, GEN_TYPE_D); src0 = GenRegister::retype(src0, GEN_TYPE_D); src1 = GenRegister::retype(src1, GEN_TYPE_D); sel.curr.execWidth = 8; sel.curr.quarterControl = GEN_COMPRESSION_Q1; sel.MUL(GenRegister::retype(GenRegister::acc(), GEN_TYPE_D), src0, src1); sel.curr.accWrEnable = 1; sel.MACH(GenRegister::retype(GenRegister::null(), GEN_TYPE_D), src0, src1); sel.curr.accWrEnable = 0; if (simdWidth == 1) { sel.curr.execWidth = 1; sel.MOV(GenRegister::retype(dst, GEN_TYPE_F), GenRegister::vec1(GenRegister::acc())); } else { sel.curr.execWidth = 8; sel.MOV(GenRegister::retype(dst, GEN_TYPE_F), GenRegister::acc()); } // Right part of the 16-wide register now if (simdWidth == 16) { int predicate = sel.curr.predicate; int noMask = sel.curr.noMask; sel.curr.noMask = 1; sel.curr.predicate = GEN_PREDICATE_NONE; const GenRegister nextSrc0 = sel.selRegQn(insn.getSrc(0), 1, TYPE_S32); const GenRegister nextSrc1 = sel.selRegQn(insn.getSrc(1), 1, TYPE_S32); sel.MUL(GenRegister::retype(GenRegister::acc(), GEN_TYPE_D), nextSrc0, nextSrc1); sel.curr.accWrEnable = 1; sel.MACH(GenRegister::retype(GenRegister::null(), GEN_TYPE_D), nextSrc0, nextSrc1); sel.curr.accWrEnable = 0; sel.curr.quarterControl = GEN_COMPRESSION_Q2; if (predicate != GEN_PREDICATE_NONE || noMask != 1) { const ir::Register reg = sel.reg(FAMILY_DWORD); sel.MOV(GenRegister::f8grf(reg), GenRegister::acc()); sel.curr.noMask = noMask;; sel.curr.predicate = predicate; sel.MOV(GenRegister::retype(GenRegister::next(dst), GEN_TYPE_F), GenRegister::f8grf(reg)); } else sel.MOV(GenRegister::retype(GenRegister::next(dst), GEN_TYPE_F), GenRegister::acc()); } } sel.pop(); // All children are marked as root markAllChildren(dag); return true; } }; /*! 32x16 bits integer can be done in one instruction */ class Int32x16MulInstructionPattern : public SelectionPattern { public: /*! Register the pattern for all opcodes of the family */ Int32x16MulInstructionPattern(void) : SelectionPattern(1, 1) { this->opcodes.push_back(ir::OP_MUL); } bool is16BitSpecialReg(ir::Register reg) const { if (reg == ir::ocl::lid0 || reg == ir::ocl::lid1 || reg == ir::ocl::lid2 || reg == ir::ocl::lsize0 || reg == ir::ocl::lsize1 || reg == ir::ocl::lsize2 || reg == ir::ocl::enqlsize0 || reg == ir::ocl::enqlsize1 || reg == ir::ocl::enqlsize2) return true; else return false; } /*! 
Try to emit a multiply where child childID is a 16 immediate */ bool emitMulImmediate(Selection::Opaque &sel, SelectionDAG &dag, uint32_t childID) const { using namespace ir; const ir::BinaryInstruction &insn = cast(dag.insn); const Register dst = insn.getDst(0); const Register src1 = insn.getSrc(childID ^ 1); const SelectionDAG *src0DAG = dag.child[childID]; if (src0DAG != NULL) { if (src0DAG->insn.getOpcode() == OP_LOADI) { const auto &loadimm = cast(src0DAG->insn); const Immediate imm = loadimm.getImmediate(); const Type type = imm.getType(); GBE_ASSERT(type == TYPE_U32 || type == TYPE_S32); if (type == TYPE_U32 && imm.getIntegerValue() <= 0xffff) { sel.push(); if (sel.isScalarReg(insn.getDst(0)) == true) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } sel.MUL(sel.selReg(dst, type), sel.selReg(src1, type), GenRegister::immuw(imm.getIntegerValue())); sel.pop(); if (dag.child[childID ^ 1] != NULL) dag.child[childID ^ 1]->isRoot = 1; return true; } if (type == TYPE_S32 && (imm.getIntegerValue() >= -32768 && imm.getIntegerValue() <= 32767)) { sel.push(); if (sel.isScalarReg(insn.getDst(0)) == true) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } sel.MUL(sel.selReg(dst, type), sel.selReg(src1, type), GenRegister::immw(imm.getIntegerValue())); sel.pop(); if (dag.child[childID ^ 1] != NULL) dag.child[childID ^ 1]->isRoot = 1; return true; } } } return false; } /*! Try to emit a multiply with a 16 bit special register */ bool emitMulSpecialReg(Selection::Opaque &sel, SelectionDAG &dag, uint32_t childID) const { using namespace ir; const BinaryInstruction &insn = cast(dag.insn); const Type type = insn.getType(); const Register dst = insn.getDst(0); const Register src0 = insn.getSrc(childID); const Register src1 = insn.getSrc(childID ^ 1); if (is16BitSpecialReg(src0)) { sel.push(); if (sel.isScalarReg(insn.getDst(0)) == true) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } sel.MUL(sel.selReg(dst, type), sel.selReg(src1, type), sel.selReg(src0, TYPE_U32)); sel.pop(); markAllChildren(dag); return true; } return false; } virtual bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; const BinaryInstruction &insn = cast(dag.insn); const Type type = insn.getType(); if (type == TYPE_U32 || type == TYPE_S32) { if (this->emitMulSpecialReg(sel, dag, 0)) return true; if (this->emitMulSpecialReg(sel, dag, 1)) return true; if (this->emitMulImmediate(sel, dag, 0)) return true; if (this->emitMulImmediate(sel, dag, 1)) return true; } return false; } }; #define DECL_NOT_IMPLEMENTED_ONE_TO_MANY(FAMILY) \ struct FAMILY##Pattern : public OneToManyPattern\ {\ INLINE bool emitOne(Selection::Opaque &sel, const ir::FAMILY &insn, bool &markChildren) const {\ NOT_IMPLEMENTED;\ return false;\ }\ DECL_CTOR(FAMILY, 1, 1); \ } #undef DECL_NOT_IMPLEMENTED_ONE_TO_MANY /*! 
Load immediate pattern */ DECL_PATTERN(LoadImmInstruction) { INLINE bool emitOne(Selection::Opaque &sel, const ir::LoadImmInstruction &insn, bool &markChildren) const { using namespace ir; const Type type = insn.getType(); const Immediate imm = insn.getImmediate(); const GenRegister dst = sel.selReg(insn.getDst(0), type); sel.push(); if (sel.isScalarReg(insn.getDst(0)) == true) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } switch (type) { case TYPE_BOOL: if (!sel.isScalarReg(insn.getDst(0)) && sel.regDAG[insn.getDst(0)]->isUsed) { sel.curr.modFlag = 1; sel.curr.physicalFlag = 0; sel.curr.flagIndex = insn.getDst(0).value(); } sel.MOV(dst, imm.getIntegerValue() ? GenRegister::immuw(0xffff) : GenRegister::immuw(0)); break; case TYPE_U32: sel.MOV(dst, GenRegister::immud(imm.getIntegerValue())); break; case TYPE_S32: sel.MOV(dst, GenRegister::immd(imm.getIntegerValue())); break; case TYPE_FLOAT: sel.MOV(GenRegister::retype(dst, GEN_TYPE_F), GenRegister::immf(imm.asFloatValue())); break; case TYPE_HALF: { ir::half hf = imm.getHalfValue(); sel.MOV(GenRegister::retype(dst, GEN_TYPE_HF), GenRegister::immh(hf.getVal())); break; } case TYPE_U16: sel.MOV(dst, GenRegister::immuw(imm.getIntegerValue())); break; case TYPE_S16: sel.MOV(dst, GenRegister::immw(imm.getIntegerValue())); break; case TYPE_U8: sel.MOV(dst, GenRegister::immuw(imm.getIntegerValue())); break; case TYPE_S8: sel.MOV(dst, GenRegister::immw(imm.getIntegerValue())); break; case TYPE_DOUBLE: sel.MOV(dst, GenRegister::immdf(imm.getDoubleValue())); break; case TYPE_S64: sel.LOAD_INT64_IMM(dst, GenRegister::immint64(imm.getIntegerValue())); break; case TYPE_U64: sel.LOAD_INT64_IMM(dst, GenRegister::immuint64(imm.getIntegerValue())); break; default: NOT_SUPPORTED; } sel.pop(); return true; } DECL_CTOR(LoadImmInstruction, 1,1); }; /*! Sync instruction */ DECL_PATTERN(SyncInstruction) { INLINE bool emitOne(Selection::Opaque &sel, const ir::SyncInstruction &insn, bool &markChildren) const { using namespace ir; const ir::Register reg = sel.reg(FAMILY_DWORD); const uint32_t params = insn.getParameters(); // A barrier is OK to start the thread synchronization *and* SLM fence sel.BARRIER(GenRegister::ud8grf(reg), sel.selReg(sel.reg(FAMILY_DWORD)), params); return true; } DECL_CTOR(SyncInstruction, 1,1); }; /*! 
Wait instruction */ DECL_PATTERN(WaitInstruction) { INLINE bool emitOne(Selection::Opaque &sel, const ir::WaitInstruction &insn, bool &markChildren) const { using namespace ir; // Debugwait will use reg 1, which is different from barrier sel.push(); sel.curr.noMask = 1; sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.WAIT(1); sel.pop(); return true; } DECL_CTOR(WaitInstruction, 1,1); }; INLINE uint32_t getByteScatterGatherSize(Selection::Opaque &sel, ir::Type type) { using namespace ir; switch (type) { case TYPE_DOUBLE: case TYPE_S64: case TYPE_U64: return GEN_BYTE_SCATTER_QWORD; case TYPE_FLOAT: case TYPE_U32: case TYPE_S32: return GEN_BYTE_SCATTER_DWORD; case TYPE_BOOL: case TYPE_U16: case TYPE_S16: return GEN_BYTE_SCATTER_WORD; case TYPE_U8: case TYPE_S8: return GEN_BYTE_SCATTER_BYTE; case TYPE_HALF: if (sel.hasHalfType()) return GEN_BYTE_SCATTER_WORD; default: NOT_SUPPORTED; return GEN_BYTE_SCATTER_BYTE; } } ir::Register generateLocalMask(Selection::Opaque &sel, GenRegister addr) { sel.push(); ir::Register localMask = sel.reg(ir::FAMILY_BOOL); sel.curr.physicalFlag = 0; sel.curr.modFlag = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.flagIndex = localMask; sel.CMP(GEN_CONDITIONAL_L, addr, GenRegister::immud(64*1024)); sel.pop(); return localMask; } class LoadInstructionPattern : public SelectionPattern { public: /*! Register the pattern for all opcodes of the family */ LoadInstructionPattern(void) : SelectionPattern(1, 1) { this->opcodes.push_back(ir::OP_LOAD); } bool isReadConstantLegacy(const ir::LoadInstruction &load) const { ir::AddressMode AM = load.getAddressMode(); ir::AddressSpace AS = load.getAddressSpace(); if (AM != ir::AM_Stateless && AS == ir::MEM_CONSTANT) return true; return false; } void untypedReadStateless(Selection::Opaque &sel, GenRegister addr, vector &dst ) const { using namespace ir; GenRegister addrQ; unsigned simdWidth = sel.curr.execWidth; unsigned addrBytes = typeSize(addr.type); unsigned valueNum = dst.size(); bool isUniform = sel.isScalarReg(dst[0].reg()); if (addrBytes == 4) { addrQ = sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64); sel.MOV(addrQ, addr); } else if (addrBytes == 8) { addrQ = addr; } else NOT_IMPLEMENTED; if (simdWidth == 8) { sel.UNTYPED_READA64(addrQ, dst.data(), valueNum, valueNum); } else if (simdWidth == 16) { std::vector tmpData; for (unsigned i = 0; i < (valueNum+1)/2; i++) { tmpData.push_back(sel.selReg(sel.reg(ir::FAMILY_DWORD), ir::TYPE_U32)); } sel.push(); /* first quarter */ sel.curr.execWidth = 8; sel.curr.quarterControl = GEN_COMPRESSION_Q1; sel.UNTYPED_READA64(GenRegister::Qn(addrQ, 0), tmpData.data(), (valueNum+1)/2, valueNum); sel.push(); if (isUniform) sel.curr.execWidth = 1; for (unsigned k = 0; k < valueNum; k++) { sel.MOV(GenRegister::Qn(dst[k], 0), GenRegister::Qn(tmpData[k/2], k%2)); } sel.pop(); /* second quarter */ sel.curr.execWidth = 8; sel.curr.quarterControl = GEN_COMPRESSION_Q2; sel.UNTYPED_READA64(GenRegister::Qn(addrQ, 1), tmpData.data(), (valueNum+1)/2, valueNum); if (isUniform) sel.curr.execWidth = 1; for (unsigned k = 0; k < valueNum; k++) { sel.MOV(GenRegister::Qn(dst[k], 1), GenRegister::Qn(tmpData[k/2], k%2)); } sel.pop(); } } void shootUntypedReadMsg(Selection::Opaque &sel, const ir::LoadInstruction &insn, vector &dst, GenRegister addr, uint32_t valueNum, ir::AddressSpace addrSpace) const { using namespace ir; unsigned addrBytes = typeSize(addr.type); AddressMode AM = insn.getAddressMode(); /* Notes on uniform of LoadInstruction, all-lanes-active(noMask,noPredicate) * 
property should only need be taken care when the value is UNIFORM, if the * value is not uniform, just do things under predication or mask */ bool isUniform = sel.isScalarReg(dst[0].reg()); sel.push(); if (isUniform) { sel.curr.noMask = 1; sel.curr.predicate = GEN_PREDICATE_NONE; } vector btiTemp = sel.getBTITemps(AM); if (AM == AM_DynamicBti || AM == AM_StaticBti) { if (AM == AM_DynamicBti) { Register btiReg = insn.getBtiReg(); sel.UNTYPED_READ(addr, dst.data(), valueNum, sel.selReg(btiReg, TYPE_U32), btiTemp); } else { unsigned SI = insn.getSurfaceIndex(); sel.UNTYPED_READ(addr, dst.data(), valueNum, GenRegister::immud(SI), btiTemp); } } else if (addrSpace == ir::MEM_LOCAL || isReadConstantLegacy(insn) ) { // stateless mode, local/constant still use bti access unsigned bti = addrSpace == ir::MEM_CONSTANT ? BTI_CONSTANT : 0xfe; GenRegister addrDW = addr; if (addrBytes == 8) addrDW = convertU64ToU32(sel, addr); sel.UNTYPED_READ(addrDW, dst.data(), valueNum, GenRegister::immud(bti), btiTemp); } else if (addrSpace == ir::MEM_GENERIC) { Register localMask = generateLocalMask(sel, addr); sel.push(); sel.curr.useVirtualFlag(localMask, GEN_PREDICATE_NORMAL); GenRegister addrDW = addr; if (addrBytes == 8) addrDW = convertU64ToU32(sel, addr); sel.UNTYPED_READ(addrDW, dst.data(), valueNum, GenRegister::immud(0xfe), btiTemp); sel.curr.inversePredicate = 1; untypedReadStateless(sel, addr, dst); sel.pop(); } else { untypedReadStateless(sel, addr, dst); } sel.pop(); } void emitUntypedRead(Selection::Opaque &sel, const ir::LoadInstruction &insn, GenRegister addr, ir::AddressSpace addrSpace) const { using namespace ir; const uint32_t valueNum = insn.getValueNum(); vector dst(valueNum); for (uint32_t dstID = 0; dstID < valueNum; ++dstID) dst[dstID] = sel.selReg(insn.getValue(dstID), TYPE_U32); shootUntypedReadMsg(sel, insn, dst, addr, valueNum, addrSpace); } void emitDWordGather(Selection::Opaque &sel, const ir::LoadInstruction &insn, GenRegister addr, ir::AddressSpace addrSpace) const { using namespace ir; GBE_ASSERT(insn.getValueNum() == 1); const uint32_t isUniform = sel.isScalarReg(insn.getValue(0)); if(isUniform) { GenRegister dst = sel.selReg(insn.getValue(0), ir::TYPE_U32); sel.push(); sel.curr.noMask = 1; sel.SAMPLE(&dst, 1, &addr, 1, BTI_CONSTANT, 0, true, true); sel.pop(); return; } GenRegister dst = GenRegister::retype(sel.selReg(insn.getValue(0)), GEN_TYPE_F); // get dword based address GenRegister addrDW = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_U32); sel.push(); if (sel.isScalarReg(addr.reg())) { sel.curr.noMask = 1; } if (sel.getRegisterFamily(addr.reg()) == FAMILY_QWORD) { // as we still use offset instead of absolut graphics address, // it is safe to convert from u64 to u32 GenRegister t = convertU64ToU32(sel, addr); sel.SHR(addrDW, t, GenRegister::immud(2)); } else sel.SHR(addrDW, GenRegister::retype(addr, GEN_TYPE_UD), GenRegister::immud(2)); sel.pop(); sel.DWORD_GATHER(dst, addrDW, BTI_CONSTANT); } void read64Legacy(Selection::Opaque &sel, GenRegister addr, vector &dst, GenRegister bti, vector &btiTemp) const { const uint32_t valueNum = dst.size(); if (sel.hasLongType()) { vector tmp(valueNum); for (uint32_t valueID = 0; valueID < valueNum; ++valueID) { tmp[valueID] = GenRegister::retype(sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64), GEN_TYPE_UL); } sel.READ64(addr, dst.data(), tmp.data(), valueNum, bti, true, btiTemp); } else { sel.READ64(addr, dst.data(), NULL, valueNum, bti, false, btiTemp); } } void read64Stateless(Selection::Opaque &sel, const GenRegister addr, 
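/* NOTE (editorial sketch; not part of the original source): the "stateless"
 * paths in this pattern bypass the binding table and use full 64-bit virtual
 * addresses (A64 messages). A 32-bit address is first widened into a qword
 * register with a MOV; at SIMD16 the A64 message is split into two SIMD8
 * halves (quarter controls Q1/Q2) because the payload would otherwise exceed
 * the maximum message length. */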
vector dst) const { using namespace ir; unsigned simdWidth = sel.ctx.getSimdWidth(); unsigned valueNum = dst.size(); vector tmp(valueNum); for (uint32_t valueID = 0; valueID < valueNum; ++valueID) { tmp[valueID] = GenRegister::retype(sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64), GEN_TYPE_UL); } unsigned addrBytes = typeSize(addr.type); GenRegister addrQ; sel.push(); if (addrBytes == 4) { addrQ = sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64); sel.MOV(addrQ, addr); } else { addrQ = addr; } if (simdWidth == 8) { sel.READ64A64(addrQ, dst.data(), tmp.data(), valueNum); } else { assert(valueNum == 1); GenRegister tmpAddr, tmpDst; tmpAddr = GenRegister::Qn(addrQ, 0); tmpDst = GenRegister::Qn(dst[0], 0); sel.curr.execWidth = 8; sel.curr.quarterControl = GEN_COMPRESSION_Q1; sel.READ64A64(tmpAddr, &tmpDst, tmp.data(), valueNum); tmpAddr = GenRegister::Qn(addrQ, 1); tmpDst = GenRegister::Qn(dst[0], 1); sel.curr.quarterControl = GEN_COMPRESSION_Q2; sel.READ64A64(tmpAddr, &tmpDst, tmp.data(), valueNum); } sel.pop(); } void emitRead64(Selection::Opaque &sel, const ir::LoadInstruction &insn, GenRegister addr, ir::AddressSpace addrSpace) const { using namespace ir; const uint32_t valueNum = insn.getValueNum(); /* XXX support scalar only right now. */ GBE_ASSERT(valueNum == 1); vector dst(valueNum); for ( uint32_t dstID = 0; dstID < valueNum; ++dstID) dst[dstID] = sel.selReg(insn.getValue(dstID), ir::TYPE_U64); bool isUniform = sel.isScalarReg(insn.getValue(0)); unsigned addrBytes = typeSize(addr.type); AddressMode AM = insn.getAddressMode(); vector btiTemp = sel.getBTITemps(AM); sel.push(); if (isUniform) { sel.curr.noMask = 1; sel.curr.predicate = GEN_PREDICATE_NONE; } if (AM != AM_Stateless) { GenRegister b; if (AM == AM_DynamicBti) { b = sel.selReg(insn.getBtiReg(), TYPE_U32); } else { b = GenRegister::immud(insn.getSurfaceIndex()); } read64Legacy(sel, addr, dst, b, btiTemp); } else if (addrSpace == MEM_LOCAL || isReadConstantLegacy(insn)) { GenRegister b = GenRegister::immud(addrSpace == MEM_LOCAL? 
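/* NOTE (editorial sketch; not part of the original source): 0xfe is the
 * surface index Beignet reserves for shared local memory, while BTI_CONSTANT
 * selects the constant cache surface; as the earlier comment notes, local and
 * constant accesses keep using legacy binding-table messages even when the
 * rest of the kernel runs in stateless mode. */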
0xfe : BTI_CONSTANT); GenRegister addrDW = addr; if (addrBytes == 8) addrDW = convertU64ToU32(sel, addr); read64Legacy(sel, addrDW, dst, b, btiTemp); } else if (addrSpace == ir::MEM_GENERIC) { Register localMask = generateLocalMask(sel, addr); sel.push(); sel.curr.useVirtualFlag(localMask, GEN_PREDICATE_NORMAL); GenRegister addrDW = addr; if (addrBytes == 8) addrDW = convertU64ToU32(sel, addr); read64Legacy(sel, addrDW, dst, GenRegister::immud(0xfe), btiTemp); sel.curr.inversePredicate = 1; read64Stateless(sel, addr, dst); sel.pop(); } else { read64Stateless(sel, addr, dst); } sel.pop(); } void readByteAsDWord(Selection::Opaque &sel, const ir::LoadInstruction &insn, const uint32_t elemSize, GenRegister address, GenRegister dst, bool isUniform, ir::AddressSpace addrSpace) const { using namespace ir; RegisterFamily addrFamily = sel.getRegisterFamily(address.reg()); Type addrType = getType(addrFamily); Register tmpReg = sel.reg(FAMILY_DWORD, isUniform); GenRegister tmpAddr = sel.selReg(sel.reg(addrFamily, isUniform), addrType); GenRegister tmpData = sel.selReg(tmpReg, ir::TYPE_U32); GenRegister addrOffset = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_U32); // Get dword aligned addr sel.push(); if (isUniform) { sel.curr.noMask = 1; sel.curr.execWidth = 1; } if (addrFamily == FAMILY_DWORD) sel.AND(tmpAddr, GenRegister::retype(address,GEN_TYPE_UD), GenRegister::immud(0xfffffffc)); else { sel.MOV(tmpAddr, GenRegister::immuint64(0xfffffffffffffffc)); sel.AND(tmpAddr, GenRegister::retype(address,GEN_TYPE_UL), tmpAddr); } sel.pop(); sel.push(); vector tmp; tmp.push_back(tmpData); shootUntypedReadMsg(sel, insn, tmp, tmpAddr, 1, addrSpace); if (isUniform) sel.curr.noMask = 1; if (isUniform) sel.curr.execWidth = 1; // Get the remaining offset from aligned addr if (addrFamily == FAMILY_QWORD) { sel.AND(addrOffset, sel.unpacked_ud(address.reg()), GenRegister::immud(0x3)); } else { sel.AND(addrOffset, GenRegister::retype(address,GEN_TYPE_UD), GenRegister::immud(0x3)); } sel.SHL(addrOffset, addrOffset, GenRegister::immud(0x3)); sel.SHR(tmpData, tmpData, addrOffset); if (elemSize == GEN_BYTE_SCATTER_WORD) sel.MOV(GenRegister::retype(dst, GEN_TYPE_UW), GenRegister::unpacked_uw(tmpReg, isUniform, sel.isLongReg(tmpReg))); else if (elemSize == GEN_BYTE_SCATTER_BYTE) sel.MOV(GenRegister::retype(dst, GEN_TYPE_UB), GenRegister::unpacked_ub(tmpReg, isUniform)); sel.pop(); } // The address is dw aligned. void emitAlignedByteGather(Selection::Opaque &sel, const ir::LoadInstruction &insn, const uint32_t elemSize, GenRegister address, ir::AddressSpace addrSpace) const { using namespace ir; const uint32_t valueNum = insn.getValueNum(); const bool isUniform = sel.isScalarReg(insn.getValue(0)); RegisterFamily family = getFamily(insn.getValueType()); vector dst(valueNum); const uint32_t typeSize = getFamilySize(family); for(uint32_t i = 0; i < valueNum; i++) dst[i] = sel.selReg(insn.getValue(i), getType(family)); uint32_t tmpRegNum = (typeSize*valueNum + 3) / 4; vector tmp(tmpRegNum); for(uint32_t i = 0; i < tmpRegNum; i++) { tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_U32); } shootUntypedReadMsg(sel, insn, tmp, address, tmpRegNum, addrSpace); for(uint32_t i = 0; i < tmpRegNum; i++) { unsigned int elemNum = (valueNum - i * (4 / typeSize)) > 4/typeSize ? 4/typeSize : (valueNum - i * (4 / typeSize)); sel.UNPACK_BYTE(dst.data() + i * 4/typeSize, tmp[i], typeSize, elemNum); } } // Gather effect data to the effectData vector from the tmp vector. // x x d0 d1 | d2 d3 d4 d5 | ... 
==> d0 d1 d2 d3 | d4 d5 ... void getEffectByteData(Selection::Opaque &sel, vector &effectData, vector &tmp, uint32_t effectDataNum, const GenRegister &address, bool isUniform) const { using namespace ir; GBE_ASSERT(effectData.size() == effectDataNum); GBE_ASSERT(tmp.size() == effectDataNum + 1); RegisterFamily addrFamily = sel.getRegisterFamily(address.reg()); sel.push(); Register alignedFlag = sel.reg(FAMILY_BOOL, isUniform); GenRegister shiftL = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_U32); Register shiftHReg = sel.reg(FAMILY_DWORD, isUniform); GenRegister shiftH = sel.selReg(shiftHReg, ir::TYPE_U32); sel.push(); if (isUniform) sel.curr.noMask = 1; if (addrFamily == FAMILY_QWORD) { GenRegister t = convertU64ToU32(sel, address); sel.AND(shiftL, t, GenRegister::immud(0x3)); } else { sel.AND(shiftL, GenRegister::retype(address,GEN_TYPE_UD), GenRegister::immud(0x3)); } sel.SHL(shiftL, shiftL, GenRegister::immud(0x3)); sel.ADD(shiftH, GenRegister::negate(shiftL), GenRegister::immud(32)); sel.curr.physicalFlag = 0; sel.curr.modFlag = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.flagIndex = alignedFlag.value(); sel.CMP(GEN_CONDITIONAL_NEQ, GenRegister::unpacked_uw(shiftHReg), GenRegister::immuw(32)); sel.pop(); sel.curr.noMask = 1; for(uint32_t i = 0; i < effectDataNum; i++) { GenRegister tmpH = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_U32); GenRegister tmpL = effectData[i]; sel.SHR(tmpL, tmp[i], shiftL); sel.push(); // Only need to consider the tmpH when the addr is not aligned. sel.curr.modFlag = 0; sel.curr.physicalFlag = 0; sel.curr.flagIndex = alignedFlag.value(); sel.curr.predicate = GEN_PREDICATE_NORMAL; sel.SHL(tmpH, tmp[i + 1], shiftH); sel.OR(effectData[i], tmpL, tmpH); sel.pop(); } sel.pop(); } /* Used to transform address from 64bit to 32bit, note as dataport messages * cannot accept scalar register, so here to convert to non-uniform * register here. 
*/ GenRegister convertU64ToU32(Selection::Opaque &sel, GenRegister addr) const { GenRegister unpacked = GenRegister::retype(sel.unpacked_ud(addr.reg()), GEN_TYPE_UD); GenRegister dst = sel.selReg(sel.reg(ir::FAMILY_DWORD), ir::TYPE_U32); sel.MOV(dst, unpacked); return dst; } void byteGatherStateless(Selection::Opaque &sel, GenRegister addr, GenRegister dst, unsigned elemSize) const { using namespace ir; GenRegister addrQ; unsigned simdWidth = sel.ctx.getSimdWidth(); unsigned addrBytes = typeSize(addr.type); if (addrBytes == 4) { addrQ = sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64); sel.MOV(addrQ, addr); } else { addrQ = addr; } sel.push(); if (simdWidth == 8) { sel.BYTE_GATHERA64(dst, addrQ, elemSize); } else if (simdWidth == 16) { sel.curr.execWidth = 8; sel.curr.quarterControl = GEN_COMPRESSION_Q1; sel.BYTE_GATHERA64(GenRegister::Qn(dst, 0), GenRegister::Qn(addrQ, 0), elemSize); sel.curr.quarterControl = GEN_COMPRESSION_Q2; sel.BYTE_GATHERA64(GenRegister::Qn(dst, 1), GenRegister::Qn(addrQ, 1), elemSize); } sel.pop(); } void shootByteGatherMsg(Selection::Opaque &sel, const ir::LoadInstruction &insn, GenRegister dst, GenRegister addr, unsigned elemSize, bool isUniform, ir::AddressSpace addrSpace) const { using namespace ir; unsigned addrBytes = typeSize(addr.type); AddressMode AM = insn.getAddressMode(); vector btiTemp = sel.getBTITemps(AM); if (AM == AM_DynamicBti || AM == AM_StaticBti) { if (AM == AM_DynamicBti) { Register btiReg = insn.getBtiReg(); sel.BYTE_GATHER(dst, addr, elemSize, sel.selReg(btiReg, TYPE_U32), btiTemp); } else { unsigned SI = insn.getSurfaceIndex(); sel.BYTE_GATHER(dst, addr, elemSize, GenRegister::immud(SI), btiTemp); } } else if (addrSpace == ir::MEM_LOCAL || isReadConstantLegacy(insn)) { unsigned bti = addrSpace == ir::MEM_CONSTANT ? BTI_CONSTANT : 0xfe; GenRegister addrDW = addr; if (addrBytes == 8) { addrDW = convertU64ToU32(sel, addr); } sel.BYTE_GATHER(dst, addrDW, elemSize, GenRegister::immud(bti), btiTemp); } else if (addrSpace == ir::MEM_GENERIC) { Register localMask = generateLocalMask(sel, addr); sel.push(); sel.curr.useVirtualFlag(localMask, GEN_PREDICATE_NORMAL); GenRegister addrDW = addr; if (addrBytes == 8) addrDW = convertU64ToU32(sel, addr); sel.BYTE_GATHER(dst, addrDW, elemSize, GenRegister::immud(0xfe), btiTemp); sel.curr.inversePredicate = 1; byteGatherStateless(sel, addr, dst, elemSize); sel.pop(); } else { byteGatherStateless(sel, addr, dst, elemSize); } } void emitUnalignedByteGather(Selection::Opaque &sel, const ir::LoadInstruction &insn, const uint32_t elemSize, GenRegister address, ir::AddressSpace addrSpace) const { using namespace ir; const uint32_t valueNum = insn.getValueNum(); const uint32_t simdWidth = sel.isScalarReg(insn.getValue(0)) ? 1 : sel.ctx.getSimdWidth(); const bool isUniform = simdWidth == 1; RegisterFamily family = getFamily(insn.getValueType()); RegisterFamily addrFamily = sel.getRegisterFamily(address.reg()); Type addrType = getType(addrFamily); if(valueNum > 1) { GBE_ASSERT(!isUniform && "vector load should not be uniform. Something went wrong."); //need to investigate the case of GEN_BYTE_SCATTER_WORD later //for GEN_BYTE_SCATTER_BYTE, if the pointer is not aligned to 4, using byte gather, // on BDW, vec8 and vec16 are worse. on SKL/BXT, vec16 is worse. 
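/* NOTE (editorial sketch; not part of the original source): the slow path
 * below handles an unaligned vector load by reading effectDataNum + 1
 * dwords from the rounded-down address and merging neighbouring pairs,
 * conceptually:
 *
 *   aligned = addr & ~3;  shift = (addr & 3) * 8;
 *   out[i]  = (tmp[i] >> shift) | (tmp[i + 1] << (32 - shift));
 *
 * getEffectByteData() implements exactly this with SHR/SHL/OR, under a flag
 * that skips the high-half OR when the address is already dword aligned. */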
if(sel.getSlowByteGather() || elemSize == GEN_BYTE_SCATTER_WORD || (elemSize == GEN_BYTE_SCATTER_BYTE && (valueNum == 16 || valueNum == 8))) { vector dst(valueNum); const uint32_t typeSize = getFamilySize(family); for(uint32_t i = 0; i < valueNum; i++) dst[i] = sel.selReg(insn.getValue(i), getType(family)); uint32_t effectDataNum = (typeSize*valueNum + 3) / 4; vector tmp(effectDataNum + 1); vector effectData(effectDataNum); for(uint32_t i = 0; i < effectDataNum + 1; i++) tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_U32); GenRegister alignedAddr = sel.selReg(sel.reg(addrFamily, isUniform), addrType); sel.push(); if (isUniform) sel.curr.noMask = 1; if (addrFamily == FAMILY_DWORD) sel.AND(alignedAddr, GenRegister::retype(address, GEN_TYPE_UD), GenRegister::immud(~0x3)); else { sel.MOV(alignedAddr, GenRegister::immuint64(~0x3ul)); sel.AND(alignedAddr, GenRegister::retype(address, GEN_TYPE_UL), alignedAddr); } sel.pop(); uint32_t remainedReg = effectDataNum + 1; uint32_t pos = 0; do { uint32_t width = remainedReg > 4 ? 4 : remainedReg; vector t1(tmp.begin() + pos, tmp.begin() + pos + width); if (pos != 0) { sel.push(); if (isUniform) sel.curr.noMask = 1; if (addrFamily == FAMILY_DWORD) sel.ADD(alignedAddr, alignedAddr, GenRegister::immud(pos * 4)); else sel.ADD(alignedAddr, alignedAddr, GenRegister::immuint64(pos * 4)); sel.pop(); } shootUntypedReadMsg(sel, insn, t1, alignedAddr, width, addrSpace); remainedReg -= width; pos += width; } while(remainedReg); for(uint32_t i = 0; i < effectDataNum; i++) effectData[i] = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_U32); getEffectByteData(sel, effectData, tmp, effectDataNum, address, isUniform); for(uint32_t i = 0; i < effectDataNum; i++) { unsigned int elemNum = (valueNum - i * (4 / typeSize)) > 4/typeSize ? 4/typeSize : (valueNum - i * (4 / typeSize)); sel.UNPACK_BYTE(dst.data() + i * 4/typeSize, effectData[i], typeSize, elemNum); } } else { GBE_ASSERT(elemSize == GEN_BYTE_SCATTER_BYTE); vector dst(valueNum); for(uint32_t i = 0; i < valueNum; i++) dst[i] = sel.selReg(insn.getValue(i), getType(family)); GenRegister readDst = sel.selReg(sel.reg(FAMILY_DWORD), ir::TYPE_U32); uint32_t valueIndex = 0; uint32_t loopCount = (valueNum + 3) / 4; GenRegister addressForLoop = address; sel.push(); if (loopCount > 1) { addressForLoop = sel.selReg(sel.reg(FAMILY_DWORD), ir::TYPE_U32); sel.MOV(addressForLoop, address); } for (uint32_t i = 0; i < loopCount; ++i) { uint32_t valueLeft = valueNum - valueIndex; GBE_ASSERT(valueLeft > 1); uint32_t dataSize = 0; if (valueLeft == 2) dataSize = GEN_BYTE_SCATTER_WORD; else dataSize = GEN_BYTE_SCATTER_DWORD; shootByteGatherMsg(sel, insn, readDst, addressForLoop, dataSize, isUniform, addrSpace); // only 4 bytes is gathered even if valueLeft >= 4 sel.UNPACK_BYTE(dst.data(), readDst, getFamilySize(FAMILY_BYTE), (valueLeft < 4 ? 
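/* NOTE (editorial sketch; not part of the original source): each gather in
 * the loop above fetches at most one dword (4 bytes) per lane, so a vecN
 * char load issues ceil(N / 4) messages, unpacks up to 4 bytes per
 * iteration, and advances the address by 4 between messages. */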
valueLeft : 4)); valueIndex += 4; //calculate the new address to read if (valueIndex < valueNum) sel.ADD(addressForLoop, addressForLoop, GenRegister::immud(4)); } sel.pop(); } } else { GBE_ASSERT(insn.getValueNum() == 1); const GenRegister value = sel.selReg(insn.getValue(0), insn.getValueType()); GBE_ASSERT(elemSize == GEN_BYTE_SCATTER_WORD || elemSize == GEN_BYTE_SCATTER_BYTE); if (sel.getSlowByteGather()) readByteAsDWord(sel, insn, elemSize, address, value, isUniform, addrSpace); else { // We need a temporary register if we read bytes or words Register dst = sel.reg(FAMILY_DWORD); sel.push(); if (isUniform) sel.curr.noMask = 1; shootByteGatherMsg(sel, insn, sel.selReg(dst, ir::TYPE_U32), address, elemSize, isUniform, addrSpace); sel.pop(); sel.push(); if (isUniform) { sel.curr.noMask = 1; sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; } if (elemSize == GEN_BYTE_SCATTER_WORD) sel.MOV(GenRegister::retype(value, GEN_TYPE_UW), GenRegister::unpacked_uw(dst, isUniform)); else if (elemSize == GEN_BYTE_SCATTER_BYTE) sel.MOV(GenRegister::retype(value, GEN_TYPE_UB), GenRegister::unpacked_ub(dst, isUniform)); sel.pop(); } } } void emitOWordRead(Selection::Opaque &sel, const ir::LoadInstruction &insn, GenRegister address, ir::AddressSpace addrSpace) const { using namespace ir; uint32_t SI = insn.getSurfaceIndex(); const uint32_t vec_size = insn.getValueNum(); const uint32_t simdWidth = sel.ctx.getSimdWidth(); const Type type = insn.getValueType(); const uint32_t typeSize = type == TYPE_U32 ? 4 : 2; const uint32_t genType = type == TYPE_U32 ? GEN_TYPE_UD : GEN_TYPE_UW; const RegisterFamily family = getFamily(type); bool isA64 = SI == 255; const GenRegister header = GenRegister::ud8grf(sel.reg(FAMILY_REG)); vector valuesVec; vector tmpVec; for(uint32_t i = 0; i < vec_size; i++) valuesVec.push_back(sel.selReg(insn.getValue(i), type)); GenRegister headeraddr; if (isA64) headeraddr = GenRegister::toUniform(sel.getOffsetReg(header, 0, 0), GEN_TYPE_UL); else headeraddr = GenRegister::toUniform(sel.getOffsetReg(header, 0, 2 * 4), GEN_TYPE_UD); // Make header sel.push(); { // Copy r0 into the header first sel.curr.execWidth = 8; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; sel.MOV(header, GenRegister::ud8grf(0, 0)); // Update the header with the current address sel.curr.execWidth = 1; // Put zero in the general state base address if (isA64) sel.MOV(headeraddr, GenRegister::toUniform(address, GEN_TYPE_UL)); else { sel.MOV(headeraddr, GenRegister::toUniform(address, GEN_TYPE_UD)); sel.MOV(sel.getOffsetReg(header, 0, 5 * 4), GenRegister::immud(0)); } } sel.pop(); /* For block read we need to unpack the block data into values, and for different * simd width and vector size with different type size, we may need to split the * block read send message. * We can only get a send message with 5 reg length, * so for different combinations we have different message lengths and tmp vector sizes * | simd8 | simd16 | simd8 | simd16 * r0 |header | | | | * r1 |data | w0,w1 | w0 | dw0 | dw0 * r2 |data | w2,w3 | w1 | dw1 | dw0 * r3 |data | ...... | ...... | ...... | dw1 * r4 |data | ....... | ...... | ...... | dw1 */ uint32_t totalSize = simdWidth * typeSize * vec_size; uint32_t valueSize = simdWidth * typeSize; uint32_t tmp_size = totalSize > 128 ? (128 / valueSize) : vec_size; uint32_t msg_num = vec_size / tmp_size; uint32_t ow_size = msg_num > 1 ?
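/* NOTE (editorial sketch; not part of the original source): worked example of
 * the sizing arithmetic here. An OWord is 16 bytes and one block read can
 * carry at most 4 data GRFs (128 bytes) plus the header. A SIMD16 uint vec4
 * load moves totalSize = 16 * 4 * 4 = 256 bytes with valueSize = 64 bytes
 * per value, so tmp_size = 128 / 64 = 2 values per message, msg_num =
 * 4 / 2 = 2 messages, and each message reads ow_size = 8 OWords (128 bytes). */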
      uint32_t totalSize = simdWidth * typeSize * vec_size;
      uint32_t valueSize = simdWidth * typeSize;
      uint32_t tmp_size = totalSize > 128 ? (128 / valueSize) : vec_size;
      uint32_t msg_num = vec_size / tmp_size;
      uint32_t ow_size = msg_num > 1 ? 8 : (totalSize / 16);
      for (uint32_t i = 0; i < tmp_size; i++)
        tmpVec.push_back(GenRegister::retype(GenRegister::f8grf(sel.reg(family)), genType));
      for (uint32_t i = 0; i < msg_num; i++) {
        if (i > 0) {
          sel.push();
          {
            // Update the address in the header
            sel.curr.execWidth = 1;
            sel.ADD(headeraddr, headeraddr, GenRegister::immud(128));
          }
          sel.pop();
        }
        sel.OBREAD(&tmpVec[0], tmp_size, header, SI, ow_size);
        for (uint32_t j = 0; j < tmp_size; j++)
          sel.MOV(valuesVec[j + i * tmp_size], tmpVec[j]);
      }
    }

    // check whether all bound table indices point to constant memory
    INLINE bool isAllConstant(const ir::BTI &bti) const {
      if (bti.isConst && bti.imm == BTI_CONSTANT)
        return true;
      return false;
    }

    /*! Implements base class */
    virtual bool emit(Selection::Opaque &sel, SelectionDAG &dag) const
    {
      using namespace ir;
      const ir::LoadInstruction &insn = cast<LoadInstruction>(dag.insn);
      Register reg = insn.getAddressRegister();
      GenRegister address = sel.selReg(reg, getType(sel.getRegisterFamily(reg)));
      GBE_ASSERT(insn.getAddressSpace() == MEM_GLOBAL ||
                 insn.getAddressSpace() == MEM_CONSTANT ||
                 insn.getAddressSpace() == MEM_PRIVATE ||
                 insn.getAddressSpace() == MEM_LOCAL ||
                 insn.getAddressSpace() == MEM_GENERIC ||
                 insn.getAddressSpace() == MEM_MIXED);
      //GBE_ASSERT(sel.isScalarReg(insn.getValue(0)) == false);

      AddressSpace addrSpace = insn.getAddressSpace();
      const Type type = insn.getValueType();
      const uint32_t elemSize = getByteScatterGatherSize(sel, type);

      if (insn.isBlock())
        this->emitOWordRead(sel, insn, address, addrSpace);
      else if (isReadConstantLegacy(insn)) {
        // XXX TODO read 64bit constant through constant cache
        // Per HW Spec, constant cache messages can read at least DWORD data.
        // So for the byte/short data types we have to read through the data cache.
        if (insn.isAligned() == true && elemSize == GEN_BYTE_SCATTER_QWORD)
          this->emitRead64(sel, insn, address, addrSpace);
        else if (insn.isAligned() == true && elemSize == GEN_BYTE_SCATTER_DWORD)
          this->emitDWordGather(sel, insn, address, addrSpace);
        else if (insn.isAligned() == true)
          this->emitAlignedByteGather(sel, insn, elemSize, address, addrSpace);
        else
          this->emitUnalignedByteGather(sel, insn, elemSize, address, addrSpace);
      } else {
        if (insn.isAligned() == true && elemSize == GEN_BYTE_SCATTER_QWORD)
          this->emitRead64(sel, insn, address, addrSpace);
        else if (insn.isAligned() == true && elemSize == GEN_BYTE_SCATTER_DWORD)
          this->emitUntypedRead(sel, insn, address, addrSpace);
        else if (insn.isAligned())
          this->emitAlignedByteGather(sel, insn, elemSize, address, addrSpace);
        else
          this->emitUnalignedByteGather(sel, insn, elemSize, address, addrSpace);
      }
      markAllChildren(dag);
      return true;
    }
  };

  class StoreInstructionPattern : public SelectionPattern
  {
  public:
    /*! Register the pattern for all opcodes of the family */
    StoreInstructionPattern(void) : SelectionPattern(1, 1) {
      this->opcodes.push_back(ir::OP_STORE);
    }

    GenRegister convertU64ToU32(Selection::Opaque &sel, GenRegister addr) const {
      GenRegister unpacked = GenRegister::retype(sel.unpacked_ud(addr.reg()), GEN_TYPE_UD);
      GenRegister dst = sel.selReg(sel.reg(ir::FAMILY_DWORD), ir::TYPE_U32);
      sel.MOV(dst, unpacked);
      return dst;
    }

    void untypedWriteStateless(Selection::Opaque &sel, GenRegister address, vector<GenRegister> &value) const
    {
      using namespace ir;
      unsigned simdWidth = sel.ctx.getSimdWidth();
      unsigned int addrBytes = typeSize(address.type);
      unsigned valueNum = value.size();
      GenRegister addrQ;
      if (addrBytes == 4) {
        if (simdWidth == 8) {
          addrQ = sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64);
          sel.MOV(addrQ, address);
        } else if (simdWidth == 16) {
          addrQ = address;
        }
      } else if (addrBytes == 8) {
        addrQ = address;
      }

      if (simdWidth == 8) {
        vector<GenRegister> msg;
        msg.push_back(addrQ);
        for (unsigned k = 0; k < valueNum; k++)
          msg.push_back(value[k]);
        sel.UNTYPED_WRITEA64(msg.data(), valueNum + 1, valueNum);
      } else if (simdWidth == 16) {
        vector<GenRegister> msgs;
        for (unsigned k = 0; k < (valueNum + 1) / 2 + 1; k++) {
          msgs.push_back(sel.selReg(sel.reg(ir::FAMILY_DWORD), ir::TYPE_U32));
        }
        sel.push();
        /* do first quarter */
        sel.curr.execWidth = 8;
        sel.curr.quarterControl = GEN_COMPRESSION_Q1;
        sel.MOV(GenRegister::retype(msgs[0], GEN_TYPE_UL), GenRegister::Qn(addrQ, 0));
        for (unsigned k = 0; k < valueNum; k++) {
          sel.MOV(GenRegister::Qn(msgs[k / 2 + 1], k % 2), GenRegister::Qn(value[k], 0));
        }
        sel.UNTYPED_WRITEA64(msgs.data(), (valueNum + 1) / 2 + 1, valueNum);

        /* do second quarter */
        sel.curr.execWidth = 8;
        sel.curr.quarterControl = GEN_COMPRESSION_Q2;
        sel.MOV(GenRegister::retype(msgs[0], GEN_TYPE_UL), GenRegister::Qn(addrQ, 1));
        for (unsigned k = 0; k < valueNum; k++)
          sel.MOV(GenRegister::Qn(msgs[k / 2 + 1], k % 2), GenRegister::Qn(value[k], 1));
        sel.UNTYPED_WRITEA64(msgs.data(), (valueNum + 1) / 2 + 1, valueNum);
        sel.pop();
      }
    }
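    /* Illustrative note (not in the original source): for a simd16 A64 write of
     * valueNum = 2 dword values, the loop above builds (2 + 1) / 2 + 1 = 2
     * message registers per half: msgs[0] holds the eight 64-bit addresses of
     * the quarter, and msgs[1] packs the two dword values of those eight lanes
     * side by side. The write is then issued twice, once per quarter (Q1/Q2),
     * because a single A64 untyped message only covers eight lanes here. */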
    void shootUntypedWriteMsg(Selection::Opaque &sel,
                              const ir::StoreInstruction &insn,
                              GenRegister &address,
                              vector<GenRegister> &value,
                              ir::AddressSpace addrSpace) const
    {
      using namespace ir;
      unsigned int addrBytes = typeSize(address.type);
      unsigned valueNum = value.size();
      AddressMode AM = insn.getAddressMode();
      vector<GenRegister> btiTemp = sel.getBTITemps(AM);

      if (AM == AM_DynamicBti || AM == AM_StaticBti) {
        if (AM == AM_DynamicBti) {
          Register btiReg = insn.getBtiReg();
          sel.UNTYPED_WRITE(address, value.data(), valueNum, sel.selReg(btiReg, TYPE_U32), btiTemp);
        } else {
          unsigned SI = insn.getSurfaceIndex();
          sel.UNTYPED_WRITE(address, value.data(), valueNum, GenRegister::immud(SI), btiTemp);
        }
      } else if (addrSpace == ir::MEM_LOCAL) {
        GenRegister addr = address;
        if (addrBytes == 8) {
          addr = convertU64ToU32(sel, address);
        }
        sel.UNTYPED_WRITE(addr, value.data(), valueNum, GenRegister::immud(0xfe), btiTemp);
      } else if (addrSpace == ir::MEM_GENERIC) {
        Register localMask = generateLocalMask(sel, address);
        sel.push();
        sel.curr.useVirtualFlag(localMask, GEN_PREDICATE_NORMAL);
        GenRegister addrDW = address;
        if (addrBytes == 8)
          addrDW = convertU64ToU32(sel, address);
        sel.UNTYPED_WRITE(addrDW, value.data(), valueNum, GenRegister::immud(0xfe), btiTemp);

        sel.curr.inversePredicate = 1;
        untypedWriteStateless(sel, address, value);
        sel.pop();
      } else {
        untypedWriteStateless(sel, address, value);
      }
    }
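    /* Illustrative note (not in the original source): MEM_GENERIC pointers are
     * resolved at run time rather than at selection time. The pattern above is,
     * in pseudocode:
     *
     *   flag = addressIsInLocalRange(addr);   // generateLocalMask
     *   (+flag) untyped write via the SLM bti 0xfe with a 32-bit address;
     *   (-flag) stateless A64 write with the full address;
     *
     * so each lane takes exactly one of the two paths under its predicate. The
     * same structure recurs in the 64-bit write, byte scatter, and atomic
     * emitters below. */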
    void emitUntypedWrite(Selection::Opaque &sel,
                          const ir::StoreInstruction &insn,
                          GenRegister address,
                          ir::AddressSpace addrSpace) const
    {
      using namespace ir;
      const uint32_t valueNum = insn.getValueNum();
      vector<GenRegister> value(valueNum);

      for (uint32_t valueID = 0; valueID < valueNum; ++valueID)
        value[valueID] = GenRegister::retype(sel.selReg(insn.getValue(valueID)), GEN_TYPE_UD);

      shootUntypedWriteMsg(sel, insn, address, value, addrSpace);
    }

    void write64Legacy(Selection::Opaque &sel,
                       GenRegister address,
                       vector<GenRegister> &value,
                       GenRegister bti,
                       vector<GenRegister> &btiTemp) const
    {
      using namespace ir;
      const uint32_t valueNum = value.size();
      if (sel.hasLongType()) {
        vector<GenRegister> tmp(valueNum);
        for (uint32_t valueID = 0; valueID < valueNum; ++valueID) {
          tmp[valueID] = GenRegister::retype(sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64), GEN_TYPE_UL);
        }
        sel.WRITE64(address, value.data(), tmp.data(), valueNum, bti, true, btiTemp);
      } else {
        sel.WRITE64(address, value.data(), NULL, valueNum, bti, false, btiTemp);
      }
    }

    void write64Stateless(Selection::Opaque &sel, GenRegister address, vector<GenRegister> &value) const
    {
      using namespace ir;
      unsigned simdWidth = sel.ctx.getSimdWidth();
      unsigned int addrBytes = typeSize(address.type);
      unsigned valueNum = value.size();
      vector<GenRegister> tmp(valueNum);
      for (uint32_t valueID = 0; valueID < valueNum; ++valueID) {
        tmp[valueID] = GenRegister::retype(sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64), GEN_TYPE_UL);
      }
      GenRegister addrQ;
      if (addrBytes == 4) {
        addrQ = sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64);
        sel.MOV(addrQ, address);
      } else {
        addrQ = address;
      }

      sel.push();
      if (simdWidth == 8) {
        sel.WRITE64A64(addrQ, value.data(), tmp.data(), valueNum);
      } else {
        GenRegister tmpAddr, tmpSrc;
        tmpAddr = GenRegister::Qn(addrQ, 0);
        tmpSrc = GenRegister::Qn(value[0], 0);
        GenRegister tmp = sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64);

        /* A SIMD16 long register is just enough for (SIMD8 A64 addr + SIMD8 long) */
        sel.curr.execWidth = 8;
        sel.curr.quarterControl = GEN_COMPRESSION_Q1;
        sel.MOV(GenRegister::Qn(tmp, 0), tmpAddr);
        sel.UNPACK_LONG(GenRegister::Qn(tmp, 1), tmpSrc);
        sel.UNTYPED_WRITEA64(&tmp, 1, 2);

        tmpAddr = GenRegister::Qn(addrQ, 1);
        tmpSrc = GenRegister::Qn(value[0], 1);
        sel.curr.quarterControl = GEN_COMPRESSION_Q2;
        sel.MOV(GenRegister::Qn(tmp, 0), tmpAddr);
        sel.UNPACK_LONG(GenRegister::Qn(tmp, 1), tmpSrc);
        sel.UNTYPED_WRITEA64(&tmp, 1, 2);
      }
      sel.pop();
    }
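    /* Illustrative note (not in the original source): in the simd16 path above a
     * single simd16 qword register is reused as the whole message payload per
     * quarter: its first half (Qn 0) receives the eight A64 addresses of that
     * quarter and its second half (Qn 1) the eight unpacked long values, which
     * is what the comment about the SIMD16 long register being "just enough"
     * refers to. */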
    void emitWrite64(Selection::Opaque &sel,
                     const ir::StoreInstruction &insn,
                     GenRegister address,
                     ir::AddressSpace addrSpace) const
    {
      using namespace ir;
      const uint32_t valueNum = insn.getValueNum();
      /* XXX we support scalar only right now. */
      GBE_ASSERT(valueNum == 1);
      vector<GenRegister> src(valueNum);

      for (uint32_t valueID = 0; valueID < valueNum; ++valueID)
        src[valueID] = sel.selReg(insn.getValue(valueID), ir::TYPE_U64);

      AddressMode AM = insn.getAddressMode();
      unsigned int addrBytes = typeSize(address.type);
      vector<GenRegister> btiTemp = sel.getBTITemps(AM);
      if (AM != AM_Stateless) {
        GenRegister b;
        if (AM == AM_DynamicBti) {
          b = sel.selReg(insn.getBtiReg(), TYPE_U32);
        } else {
          b = GenRegister::immud(insn.getSurfaceIndex());
        }
        write64Legacy(sel, address, src, b, btiTemp);
      } else if (addrSpace == MEM_LOCAL) {
        GenRegister b = GenRegister::immud(0xfe);
        GenRegister addr = address;
        if (addrBytes == 8) {
          addr = convertU64ToU32(sel, address);
        }
        write64Legacy(sel, addr, src, b, btiTemp);
      } else if (addrSpace == ir::MEM_GENERIC) {
        Register localMask = generateLocalMask(sel, address);
        sel.push();
        sel.curr.useVirtualFlag(localMask, GEN_PREDICATE_NORMAL);
        GenRegister addrDW = address;
        if (addrBytes == 8)
          addrDW = convertU64ToU32(sel, address);
        write64Legacy(sel, addrDW, src, GenRegister::immud(0xfe), btiTemp);

        sel.curr.inversePredicate = 1;
        write64Stateless(sel, address, src);
        sel.pop();
      } else {
        GBE_ASSERT(sel.hasLongType());
        write64Stateless(sel, address, src);
      }
    }

    void byteScatterStateless(Selection::Opaque &sel,
                              GenRegister address,
                              GenRegister data,
                              unsigned elemSize) const
    {
      using namespace ir;
      unsigned addrBytes = typeSize(address.type);
      unsigned simdWidth = sel.ctx.getSimdWidth();
      GenRegister addrQ;
      if (addrBytes == 4) {
        addrQ = sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64);
        sel.MOV(addrQ, address);
      } else {
        addrQ = address;
      }
      if (simdWidth == 8) {
        GenRegister msg[2];
        msg[0] = addrQ;
        msg[1] = data;
        sel.BYTE_SCATTERA64(msg, 2, elemSize);
      } else if (simdWidth == 16) {
        GenRegister msgs[2];
        msgs[0] = sel.selReg(sel.reg(ir::FAMILY_DWORD), ir::TYPE_U32);
        msgs[1] = sel.selReg(sel.reg(ir::FAMILY_DWORD), ir::TYPE_U32);
        sel.push();
        sel.curr.execWidth = 8;
        /* do first quarter */
        sel.curr.quarterControl = GEN_COMPRESSION_Q1;
        sel.MOV(GenRegister::retype(msgs[0], GEN_TYPE_UL), GenRegister::Qn(addrQ, 0));
        sel.MOV(GenRegister::Qn(msgs[1], 0), GenRegister::Qn(data, 0));
        sel.BYTE_SCATTERA64(msgs, 2, elemSize);
        /* do second quarter */
        sel.curr.quarterControl = GEN_COMPRESSION_Q2;
        sel.MOV(GenRegister::retype(msgs[0], GEN_TYPE_UL), GenRegister::Qn(addrQ, 1));
        sel.MOV(GenRegister::Qn(msgs[1], 0), GenRegister::Qn(data, 1));
        sel.BYTE_SCATTERA64(msgs, 2, elemSize);
        sel.pop();
      }
    }

    void shootByteScatterMsg(Selection::Opaque &sel,
                             const ir::StoreInstruction &insn,
                             GenRegister address,
                             GenRegister data,
                             unsigned elemSize,
                             ir::AddressSpace addrSpace) const
    {
      using namespace ir;
      unsigned addrBytes = typeSize(address.type);
      AddressMode AM = insn.getAddressMode();
      vector<GenRegister> btiTemp = sel.getBTITemps(AM);
      if (AM != AM_Stateless) {
        if (AM == AM_DynamicBti) {
          Register btiReg = insn.getBtiReg();
          sel.BYTE_SCATTER(address, data, elemSize, sel.selReg(btiReg, TYPE_U32), btiTemp);
        } else {
          unsigned SI = insn.getSurfaceIndex();
          sel.BYTE_SCATTER(address, data, elemSize, GenRegister::immud(SI), btiTemp);
        }
      } else if (addrSpace == ir::MEM_LOCAL) {
        GenRegister addr = address;
        if (addrBytes == 8) {
          addr = convertU64ToU32(sel, address);
        }
        sel.BYTE_SCATTER(addr, data, elemSize, GenRegister::immud(0xfe), btiTemp);
      } else if (addrSpace == ir::MEM_GENERIC) {
        Register localMask = generateLocalMask(sel, address);
        sel.push();
        sel.curr.useVirtualFlag(localMask, GEN_PREDICATE_NORMAL);
        GenRegister addrDW = address;
        if (addrBytes == 8)
          addrDW = convertU64ToU32(sel, address);
        sel.BYTE_SCATTER(addrDW, data, elemSize, GenRegister::immud(0xfe), btiTemp);

        sel.curr.inversePredicate = 1;
        byteScatterStateless(sel, address, data, elemSize);
        sel.pop();
      } else {
        byteScatterStateless(sel, address, data, elemSize);
      }
    }

    void emitByteScatter(Selection::Opaque &sel,
                         const ir::StoreInstruction &insn,
                         const uint32_t elemSize,
                         GenRegister address,
                         ir::AddressSpace addrSpace,
                         bool isUniform) const
    {
      using namespace ir;
      uint32_t valueNum = insn.getValueNum();

      if (valueNum > 1) {
        const uint32_t typeSize = getFamilySize(getFamily(insn.getValueType()));
        vector<GenRegister> value(valueNum);

        if (elemSize == GEN_BYTE_SCATTER_WORD) {
          for (uint32_t i = 0; i < valueNum; i++)
            value[i] = sel.selReg(insn.getValue(i), ir::TYPE_U16);
        } else if (elemSize == GEN_BYTE_SCATTER_BYTE) {
          for (uint32_t i = 0; i < valueNum; i++)
            value[i] = sel.selReg(insn.getValue(i), ir::TYPE_U8);
        }

        uint32_t tmpRegNum = typeSize * valueNum / 4;
        vector<GenRegister> tmp(tmpRegNum);
        for (uint32_t i = 0; i < tmpRegNum; i++) {
          tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD, isUniform), ir::TYPE_U32);
          sel.PACK_BYTE(tmp[i], value.data() + i * 4 / typeSize, typeSize, 4 / typeSize);
        }

        shootUntypedWriteMsg(sel, insn, address, tmp, addrSpace);
      } else {
        const GenRegister value = sel.selReg(insn.getValue(0));
        GBE_ASSERT(insn.getValueNum() == 1);
        const GenRegister tmp = sel.selReg(sel.reg(FAMILY_DWORD), ir::TYPE_U32);
        if (elemSize == GEN_BYTE_SCATTER_WORD)
          sel.MOV(tmp, GenRegister::retype(value, GEN_TYPE_UW));
        else if (elemSize == GEN_BYTE_SCATTER_BYTE)
          sel.MOV(tmp, GenRegister::retype(value, GEN_TYPE_UB));
        shootByteScatterMsg(sel, insn, address, tmp, elemSize, addrSpace);
      }
    }
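    /* Illustrative note (not in the original source): a worked example of the
     * packing above for a short4 store (typeSize = 2, valueNum = 4):
     *   tmpRegNum = 2 * 4 / 4 = 2 dword registers,
     *   tmp[0] packs value[0..1] and tmp[1] packs value[2..3]
     *   (4 / typeSize = 2 elements per PACK_BYTE),
     * and the two dwords then go out as a single untyped write. */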
    void emitOWordWrite(Selection::Opaque &sel,
                        const ir::StoreInstruction &insn,
                        GenRegister address,
                        ir::AddressSpace addrSpace) const
    {
      using namespace ir;
      uint32_t SI = insn.getSurfaceIndex();
      const uint32_t vec_size = insn.getValueNum();
      const uint32_t simdWidth = sel.ctx.getSimdWidth();
      const Type type = insn.getValueType();
      const uint32_t typeSize = type == TYPE_U32 ? 4 : 2;
      const uint32_t genType = type == TYPE_U32 ? GEN_TYPE_UD : GEN_TYPE_UW;
      const RegisterFamily family = getFamily(type);
      bool isA64 = SI == 255;
      uint32_t offset_size = isA64 ? 128 : 8;
      const GenRegister header = GenRegister::ud8grf(sel.reg(FAMILY_REG));
      vector<GenRegister> valuesVec;
      vector<GenRegister> tmpVec;
      for (uint32_t i = 0; i < vec_size; i++)
        valuesVec.push_back(sel.selReg(insn.getValue(i), type));

      GenRegister headeraddr;
      if (isA64)
        headeraddr = GenRegister::toUniform(sel.getOffsetReg(header, 0, 0), GEN_TYPE_UL);
      else
        headeraddr = GenRegister::toUniform(sel.getOffsetReg(header, 0, 2 * 4), GEN_TYPE_UD);

      // Make header
      sel.push();
      {
        // Copy r0 into the header first
        sel.curr.execWidth = 8;
        sel.curr.predicate = GEN_PREDICATE_NONE;
        sel.curr.noMask = 1;
        sel.MOV(header, GenRegister::ud8grf(0, 0));
        // Update the header with the current address
        sel.curr.execWidth = 1;
        // Put zero in the general state base address
        if (isA64)
          sel.MOV(headeraddr, GenRegister::toUniform(address, GEN_TYPE_UL));
        else {
          sel.SHR(headeraddr, GenRegister::toUniform(address, GEN_TYPE_UD), GenRegister::immud(4));
          sel.MOV(sel.getOffsetReg(header, 0, 5 * 4), GenRegister::immud(0));
        }
      }
      sel.pop();

      /* For a block write we need to pack the block data into tmp, and for the
       * different simd widths and vector sizes with different type sizes we may
       * need to split the block write into several send messages. A send message
       * can carry at most 5 registers, so for the different combinations we get
       * different message lengths and tmp vector sizes:
       *             | simd8  | simd16 | simd8  | simd16
       * r0 | header |        |        |        |
       * r1 | data   | w0,w1  | w0     | dw0    | dw0
       * r2 | data   | w2,w3  | w1     | dw1    | dw0
       * r3 | data   | ...... | ...... | ...... | dw1
       * r4 | data   | ...... | ...... | ...... | dw1
       */
      uint32_t totalSize = simdWidth * typeSize * vec_size;
      uint32_t valueSize = simdWidth * typeSize;
      uint32_t tmp_size = totalSize > 128 ? (128 / valueSize) : vec_size;
      uint32_t msg_num = vec_size / tmp_size;
      uint32_t ow_size = msg_num > 1 ? 8 : (totalSize / 16);
      for (uint32_t i = 0; i < tmp_size; i++)
        tmpVec.push_back(GenRegister::retype(GenRegister::f8grf(sel.reg(family)), genType));
      for (uint32_t i = 0; i < msg_num; i++) {
        for (uint32_t j = 0; j < tmp_size; j++)
          sel.MOV(tmpVec[j], valuesVec[j + i * tmp_size]);
        if (i > 0) {
          sel.push();
          {
            // Update the address in the header
            sel.curr.execWidth = 1;
            sel.ADD(headeraddr, headeraddr, GenRegister::immud(offset_size));
          }
          sel.pop();
        }
        sel.push();
        // In simd8 mode, when the data spans more than 1 register, execWidth 8
        // would give a wrong result, so set the execWidth to 16.
        sel.curr.execWidth = 16;
        sel.curr.predicate = GEN_PREDICATE_NONE;
        sel.curr.noMask = 1;
        sel.OBWRITE(header, &tmpVec[0], tmp_size, SI, ow_size);
        sel.pop();
      }
    }
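    /* Illustrative note (not in the original source): the per-message address
     * step differs between the two paths because of the units involved. The A64
     * header carries a byte address, so it advances by offset_size = 128 bytes;
     * the BTI header carries an OWord-aligned offset (hence the SHR by 4 above),
     * so it advances by offset_size = 8 OWords, which is the same 128 bytes. */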
    virtual bool emit(Selection::Opaque &sel, SelectionDAG &dag) const
    {
      using namespace ir;
      const ir::StoreInstruction &insn = cast<StoreInstruction>(dag.insn);
      Register reg = insn.getAddressRegister();
      GenRegister address = sel.selReg(reg, getType(sel.getRegisterFamily(reg)));
      AddressSpace addrSpace = insn.getAddressSpace();
      const Type type = insn.getValueType();
      const uint32_t elemSize = getByteScatterGatherSize(sel, type);

      const bool isUniform = sel.isScalarReg(insn.getAddressRegister()) && sel.isScalarReg(insn.getValue(0));

      if (insn.isBlock())
        this->emitOWordWrite(sel, insn, address, addrSpace);
      else if (insn.isAligned() == true && elemSize == GEN_BYTE_SCATTER_QWORD)
        this->emitWrite64(sel, insn, address, addrSpace);
      else if (insn.isAligned() == true && elemSize == GEN_BYTE_SCATTER_DWORD)
        this->emitUntypedWrite(sel, insn, address, addrSpace);
      else {
        this->emitByteScatter(sel, insn, elemSize, address, addrSpace, isUniform);
      }

      markAllChildren(dag);
      return true;
    }
  };

  /*! Compare instruction pattern */
  class CompareInstructionPattern : public SelectionPattern
  {
  public:
    CompareInstructionPattern(void) : SelectionPattern(1, 1) {
      for (uint32_t op = 0; op < ir::OP_INVALID; ++op)
        if (ir::isOpcodeFrom<ir::CompareInstruction>(ir::Opcode(op)) == true)
          this->opcodes.push_back(ir::Opcode(op));
    }

    INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const
    {
      using namespace ir;
      const ir::CompareInstruction &insn = cast<CompareInstruction>(dag.insn);
      const Opcode opcode = insn.getOpcode();
      const Type type = insn.getType();
      const Register dst = insn.getDst(0);
      GenRegister tmpDst;
      const BasicBlock *curr = insn.getParent();
      const ir::Liveness &liveness = sel.ctx.getLiveness();
      const ir::Liveness::LiveOut &liveOut = liveness.getLiveOut(curr);
      bool needStoreBool = false;
      if (liveOut.contains(dst) || dag.computeBool)
        needStoreBool = true;

      // Why do we set tmpDst to null? Because for the types listed below the
      // compare instruction cannot generate the bool (uw) result to the grf
      // directly; we need an extra select to generate the bool value to the grf.
      if (type == TYPE_S64 || type == TYPE_U64 ||
          type == TYPE_DOUBLE || type == TYPE_FLOAT ||
          type == TYPE_U32 || type == TYPE_S32 || type == TYPE_HALF /*|| (!needStoreBool)*/)
        tmpDst = GenRegister::retype(GenRegister::null(), GEN_TYPE_F);
      else
        tmpDst = sel.selReg(dst, TYPE_BOOL);

      // Look for immediate values for the right source
      GenRegister src0, src1;
      bool inverseCmp = false;
      sel.getSrcGenRegImm(dag, src0, src1, type, inverseCmp);
      sel.push();
      if (sel.isScalarReg(dst))
        sel.curr.noMask = 1;
      sel.curr.physicalFlag = 0;
      sel.curr.modFlag = 1;
      sel.curr.flagIndex = dst.value();
      sel.curr.grfFlag = needStoreBool; // indicate whether we need to allocate a grf to store this boolean.
      if ((type == TYPE_S64 || type == TYPE_U64) && !sel.hasLongType()) {
        GenRegister tmp[3];
        for (int i = 0; i < 3; i++)
          tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD));
        sel.curr.flagGen = 1;
        sel.I64CMP(getGenCompare(opcode, inverseCmp), src0, src1, tmp);
      } else if (opcode == OP_ORD) {
        sel.push();
        sel.CMP(GEN_CONDITIONAL_EQ, src0, src0, tmpDst);
        sel.curr.predicate = GEN_PREDICATE_NORMAL;
        sel.curr.flagGen = 1;
        sel.CMP(GEN_CONDITIONAL_EQ, src1, src1, tmpDst);
        sel.pop();
      } else {
        if ((type == TYPE_S64 || type == TYPE_U64 ||
             type == TYPE_DOUBLE || type == TYPE_FLOAT ||
             type == TYPE_U32 || type == TYPE_S32 || type == TYPE_HALF))
          sel.curr.flagGen = 1;
        else if (sel.isScalarReg(dst)) {
          // If the dest reg is a scalar bool, we can't set it as the dst
          // register, as the execution width is still 8 or 16. Instead, we
          // propagate needStoreBool to flagGen and change the dst to the null
          // register, and let the flag register allocation generate the flag
          // grf on demand correctly later.
          sel.curr.flagGen = needStoreBool;
          tmpDst = GenRegister::retype(GenRegister::null(), GEN_TYPE_UW);
        }
        sel.CMP(getGenCompare(opcode, inverseCmp), src0, src1, tmpDst);
      }
      sel.pop();
      return true;
    }
  };

  /*! Bit cast instruction pattern */
  DECL_PATTERN(BitCastInstruction)
  {
    INLINE bool emitOne(Selection::Opaque &sel, const ir::BitCastInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const uint32_t dstNum = insn.getDstNum();
      const uint32_t srcNum = insn.getSrcNum();
      int index = 0, multiple, narrowNum, wideNum;
      bool narrowDst;
      Type narrowType;
      bool wideScalar = false;

      if (dstNum > srcNum) {
        multiple = dstNum / srcNum;
        narrowType = dstType;
        narrowNum = dstNum;
        wideNum = srcNum;
        narrowDst = 1;
        wideScalar = sel.isScalarReg(insn.getSrc(0));
      } else {
        multiple = srcNum / dstNum;
        narrowType = srcType;
        narrowNum = srcNum;
        wideNum = dstNum;
        narrowDst = 0;
        wideScalar = sel.isScalarReg(insn.getDst(0));
      }

      sel.push();
      if (sel.isScalarReg(insn.getDst(0)) == true) {
        sel.curr.execWidth = 1;
        sel.curr.predicate = GEN_PREDICATE_NONE;
        sel.curr.noMask = 1;
      }

      // As we store the long/ulong low/high parts separately, we need to deal
      // with them separately here; we should change this back again once the
      // hardware supports the native long type.
      const bool isInt64 = (srcType == TYPE_S64 || srcType == TYPE_U64 || dstType == TYPE_S64 || dstType == TYPE_U64);
      const int simdWidth = sel.curr.execWidth;

      /* Because we do not have hstride = 8 here, we need to separate the long
         into its top half and bottom half. */
      vector<GenRegister> tmp(wideNum);
      if (multiple == 8 && sel.hasLongType() && !wideScalar) {
        GBE_ASSERT(isInt64); // Must relate to long and char conversion.
        if (narrowDst) {
          for (int i = 0; i < wideNum; i++) {
            tmp[i] = sel.selReg(sel.reg(FAMILY_QWORD), ir::TYPE_U64);
            sel.UNPACK_LONG(tmp[i], sel.selReg(insn.getSrc(i), srcType));
          }
        } else {
          for (int i = 0; i < wideNum; i++) {
            tmp[i] = sel.selReg(sel.reg(FAMILY_QWORD), ir::TYPE_U64);
          }
        }
      }

      for (int i = 0; i < narrowNum; i++, index++) {
        GenRegister narrowReg, wideReg;
        if (multiple == 8 && sel.hasLongType() && !wideScalar) {
          if (narrowDst) {
            narrowReg = sel.selReg(insn.getDst(i), narrowType);
            wideReg = GenRegister::retype(tmp[index / multiple], narrowType); // retype to the narrow type
          } else {
            wideReg = GenRegister::retype(tmp[index / multiple], narrowType);
            narrowReg = sel.selReg(insn.getSrc(i), narrowType); // retype to the narrow type
          }
        } else {
          if (narrowDst) {
            narrowReg = sel.selReg(insn.getDst(i), narrowType);
            wideReg = sel.selReg(insn.getSrc(index / multiple), narrowType); // retype to the narrow type
          } else {
            wideReg = sel.selReg(insn.getDst(index / multiple), narrowType);
            narrowReg = sel.selReg(insn.getSrc(i), narrowType); // retype to the narrow type
          }
        }

        // set the correct horizontal stride
        if (wideReg.hstride != GEN_HORIZONTAL_STRIDE_0) {
          if (multiple == 2) {
            if (sel.hasLongType() && isInt64) {
              // long to int or int to long
              wideReg = sel.unpacked_ud(wideReg.reg());
              wideReg = GenRegister::retype(wideReg, getGenType(narrowType));
            } else {
              wideReg = sel.unpacked_uw(wideReg.reg());
              wideReg = GenRegister::retype(wideReg, getGenType(narrowType));
              if (isInt64) {
                wideReg.width = GEN_WIDTH_8;
                wideReg.hstride = GEN_HORIZONTAL_STRIDE_1;
                wideReg.vstride = GEN_VERTICAL_STRIDE_8;
              }
            }
          } else if (multiple == 4) {
            if (sel.hasLongType() && isInt64) {
              // long to short or short to long
              wideReg = sel.unpacked_uw(wideReg.reg());
              wideReg = GenRegister::retype(wideReg, getGenType(narrowType));
            } else {
              wideReg = sel.unpacked_ub(wideReg.reg());
              wideReg = GenRegister::retype(wideReg, getGenType(narrowType));
              if (isInt64) {
                wideReg.hstride = GEN_HORIZONTAL_STRIDE_2;
                wideReg.vstride = GEN_VERTICAL_STRIDE_16;
              }
            }
          } else if (multiple == 8) {
            // We currently store the high/low 32-bit halves separately in
            // registers, so the hstride is 4 here.
            wideReg = sel.unpacked_ub(wideReg.reg());
            wideReg = GenRegister::retype(wideReg, getGenType(narrowType));
          } else {
            GBE_ASSERT(0);
          }
        }

        if ((!isInt64 || (sel.hasLongType() && multiple != 8)) && index % multiple)
          wideReg = sel.getOffsetReg(wideReg, 0, (index % multiple) * typeSize(wideReg.type));
        if (isInt64 && (multiple == 8 || !sel.hasLongType())) {
          // Offset to the next half
          if ((i % multiple) >= multiple / 2)
            wideReg = sel.getOffsetReg(wideReg, 0, sel.isScalarReg(wideReg.reg()) ? 4 : simdWidth * 4);
          // Offset to the desired narrow element in wideReg
          if (index % (multiple / 2))
            wideReg = sel.getOffsetReg(wideReg, 0, (index % (multiple / 2)) * typeSize(wideReg.type));
        }

        GenRegister xdst = narrowDst ? narrowReg : wideReg;
        GenRegister xsrc = narrowDst ? wideReg : narrowReg;

        if (isInt64) {
          sel.MOV(xdst, xsrc);
        } else if (srcType == TYPE_DOUBLE || dstType == TYPE_DOUBLE) {
          sel.push();
          sel.curr.execWidth = 8;
          for (int i = 0; i < simdWidth / 4; i++) {
            sel.curr.chooseNib(i);
            sel.MOV(xdst, xsrc);
            xdst = sel.getOffsetReg(xdst, 0, 4 * typeSize(getGenType(dstType)));
            xsrc = sel.getOffsetReg(xsrc, 0, 4 * typeSize(getGenType(srcType)));
          }
          sel.pop();
        } else
          sel.MOV(xdst, xsrc);
      }

      if (multiple == 8 && sel.hasLongType() && !wideScalar && !narrowDst) {
        for (int i = 0; i < wideNum; i++) {
          sel.PACK_LONG(sel.selReg(insn.getDst(i), dstType), tmp[i]);
        }
      }

      sel.pop();
      return true;
    }
    DECL_CTOR(BitCastInstruction, 1, 1);
  };
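  /* Illustrative note (not in the original source): a worked example of the
   * index arithmetic above for a long2 -> char16 bitcast (srcNum = 2,
   * dstNum = 16): multiple = 8, so index / multiple selects the wide source
   * element, and because the long is kept as two separate 32-bit halves the
   * access first jumps to the high half once (i % 8) >= 4, then offsets by
   * (index % 4) bytes within that half. */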
  /*! Convert instruction pattern */
  DECL_PATTERN(ConvertInstruction)
  {
    INLINE bool lowerI64Reg(Selection::Opaque &sel, SelectionDAG *dag, GenRegister &src, uint32_t type) const
    {
      using namespace ir;
      GBE_ASSERT(type == GEN_TYPE_UD || type == GEN_TYPE_F);
      if (dag->insn.getOpcode() == OP_LOADI) {
        const auto &immInsn = cast<LoadImmInstruction>(dag->insn);
        const auto imm = immInsn.getImmediate();
        const Type immType = immInsn.getType();
        if (immType == TYPE_S64 &&
            imm.getIntegerValue() <= INT_MAX &&
            imm.getIntegerValue() >= INT_MIN) {
          src = GenRegister::immd((int32_t)imm.getIntegerValue());
          return true;
        } else if (immType == TYPE_U64 && imm.getIntegerValue() <= UINT_MAX) {
          src = GenRegister::immud((uint32_t)imm.getIntegerValue());
          return true;
        }
      } else if (dag->insn.getOpcode() == OP_CVT) {
        const auto &cvtInsn = cast<ConvertInstruction>(dag->insn);
        auto srcType = cvtInsn.getSrcType();
        if (((srcType == TYPE_U32 || srcType == TYPE_S32) &&
             (type == GEN_TYPE_UD || type == GEN_TYPE_D)) ||
            ((srcType == TYPE_FLOAT) && type == GEN_TYPE_F)) {
          src = GenRegister::retype(sel.selReg(cvtInsn.getSrc(0), srcType), type);
          dag->isRoot = 1;
          return true;
        } else if (srcType == TYPE_FLOAT ||
                   srcType == TYPE_U16 || srcType == TYPE_S16 ||
                   srcType == TYPE_U32 || srcType == TYPE_S32) {
          src = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD), TYPE_U32), type);
          dag->isRoot = 1;
          sel.MOV(src, sel.selReg(cvtInsn.getSrc(0), srcType));
          return true;
        }
      }
      return false;
    }
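    /* Illustrative note (not in the original source): lowerI64Reg folds the
     * 64-bit source of a long -> float conversion back to 32 bits when its
     * producer is known to fit. For instance, in
     *     long a = (long)someInt;        // OP_CVT from i32
     *     float f = (float)(a & b);
     * the AND can be done directly on 32-bit registers, so the helper returns
     * the i32 register (or a 32-bit immediate for a small OP_LOADI constant)
     * and marks the producer as a root so it is still emitted. It is used by
     * convertI64ToFloat below. */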
    INLINE void convertBetweenHalfFloat(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);
      const Opcode opcode = insn.getOpcode();

      if (opcode == OP_F16TO32) {
        sel.F16TO32(dst, src);
      } else if (opcode == OP_F32TO16) {
        // We need two instructions to make the conversion
        GenRegister unpacked;
        unpacked = sel.unpacked_uw(sel.reg(FAMILY_DWORD, sel.isScalarReg(insn.getSrc(0))));
        sel.push();
        if (sel.isScalarReg(insn.getSrc(0))) {
          sel.curr.execWidth = 1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.curr.noMask = 1;
        }
        sel.F32TO16(unpacked, src);
        sel.pop();
        sel.MOV(dst, unpacked);
      } else {
        GBE_ASSERT(0 && "Not a conversion between float and half");
      }
    }

    INLINE void convert32bitsToSmall(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);
      GenRegister unpacked;
      const RegisterFamily dstFamily = getFamily(dstType);

      if (dstFamily == FAMILY_WORD) {
        uint32_t type = dstType == TYPE_U16 ? GEN_TYPE_UW : GEN_TYPE_W;
        /* The special case: when dst is half, float->word->half would lose accuracy. */
        if (dstType == TYPE_HALF) {
          GBE_ASSERT(sel.hasHalfType());
          type = GEN_TYPE_HF;
        }
        if (!sel.isScalarReg(dst.reg())) {
          unpacked = sel.unpacked_uw(sel.reg(FAMILY_DWORD, sel.isScalarReg(insn.getSrc(0))));
          unpacked = GenRegister::retype(unpacked, type);
        } else
          unpacked = GenRegister::retype(sel.unpacked_uw(dst.reg()), type);
      } else {
        const uint32_t type = dstType == TYPE_U8 ? GEN_TYPE_UB : GEN_TYPE_B;
        if (!sel.isScalarReg(dst.reg())) {
          unpacked = sel.unpacked_ub(sel.reg(FAMILY_DWORD, sel.isScalarReg(insn.getSrc(0))));
          unpacked = GenRegister::retype(unpacked, type);
        } else
          unpacked = GenRegister::retype(sel.unpacked_ub(dst.reg()), type);
      }

      sel.push();
      if (sel.isScalarReg(insn.getSrc(0))) {
        sel.curr.execWidth = 1;
        sel.curr.predicate = GEN_PREDICATE_NONE;
        sel.curr.noMask = 1;
      }
      sel.MOV(unpacked, src);
      sel.pop();

      if (unpacked.reg() != dst.reg())
        sel.MOV(dst, unpacked);
    }

    INLINE void convertI64To16bits(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);

      if (dstType == TYPE_HALF) {
        /* There is no MOV for Long <---> Half, so we go Long --> Float --> Half. */
        GBE_ASSERT(sel.hasLongType());
        GBE_ASSERT(sel.hasHalfType());
        sel.push();
        if (sel.isScalarReg(insn.getSrc(0))) {
          sel.curr.execWidth = 1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.curr.noMask = 1;
        }

        GenRegister funpacked = sel.unpacked_ud(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0))));
        funpacked = GenRegister::retype(funpacked, GEN_TYPE_F);
        sel.MOV(funpacked, src);
        GenRegister ftmp = sel.selReg(sel.reg(FAMILY_DWORD, sel.isScalarReg(insn.getSrc(0))));
        ftmp = GenRegister::retype(ftmp, GEN_TYPE_F);
        sel.MOV(ftmp, funpacked);
        GenRegister unpacked = sel.unpacked_uw(sel.reg(FAMILY_DWORD, sel.isScalarReg(insn.getSrc(0))));
        unpacked = GenRegister::retype(unpacked, GEN_TYPE_HF);
        sel.MOV(unpacked, ftmp);
        sel.pop();
        sel.MOV(dst, unpacked);
      } else {
        uint32_t type = dstType == TYPE_U16 ? GEN_TYPE_UW : GEN_TYPE_W;
        GenRegister unpacked;
        if (!sel.isScalarReg(dst.reg())) {
          if (sel.hasLongType()) {
            unpacked = sel.unpacked_uw(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0))));
          } else {
            unpacked = sel.unpacked_uw(sel.reg(FAMILY_DWORD, sel.isScalarReg(insn.getSrc(0))));
          }
          unpacked = GenRegister::retype(unpacked, type);
        } else {
          unpacked = GenRegister::retype(sel.unpacked_uw(dst.reg()), type);
        }

        if (!sel.hasLongType()) {
          GenRegister tmp = sel.selReg(sel.reg(FAMILY_DWORD));
          tmp.type = GEN_TYPE_D;
          sel.CONVI64_TO_I(tmp, src);
          sel.MOV(unpacked, tmp);
        } else {
          sel.push();
          if (sel.isScalarReg(insn.getSrc(0))) {
            sel.curr.execWidth = 1;
            sel.curr.predicate = GEN_PREDICATE_NONE;
            sel.curr.noMask = 1;
          }
          sel.MOV(unpacked, src);
          sel.pop();
        }

        if (unpacked.reg() != dst.reg()) {
          sel.MOV(dst, unpacked);
        }
      }
    }

    INLINE void convertI64ToI8(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);
      GenRegister unpacked;
      const uint32_t type = dstType == TYPE_U8 ? GEN_TYPE_UB : GEN_TYPE_B;

      if (sel.hasLongType()) {
        // handle the native long logic.
        if (!sel.isScalarReg(dst.reg())) {
          /* When converting i64 to i8 the hstride would need to be 8, but
             hstride does not support more than 4, so we need to split this
             into 2 steps. */
          unpacked = sel.unpacked_uw(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0))));
          unpacked = GenRegister::retype(unpacked, dstType == TYPE_U8 ? GEN_TYPE_UW : GEN_TYPE_W);
        } else {
          unpacked = GenRegister::retype(sel.unpacked_ub(dst.reg()), type);
        }

        sel.push();
        if (sel.isScalarReg(insn.getSrc(0))) {
          sel.curr.execWidth = 1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.curr.noMask = 1;
        }
        sel.MOV(unpacked, src);
        sel.pop();

        if (unpacked.reg() != dst.reg()) {
          sel.MOV(dst, unpacked);
        }
      } else {
        // Do not have native long
        if (!sel.isScalarReg(dst.reg())) {
          unpacked = sel.unpacked_ub(sel.reg(FAMILY_DWORD, sel.isScalarReg(insn.getSrc(0))));
          unpacked = GenRegister::retype(unpacked, type);
        } else {
          unpacked = GenRegister::retype(sel.unpacked_ub(dst.reg()), type);
        }

        GenRegister tmp = sel.selReg(sel.reg(FAMILY_DWORD));
        tmp.type = GEN_TYPE_D;
        sel.CONVI64_TO_I(tmp, src);
        sel.MOV(unpacked, tmp);
        if (unpacked.reg() != dst.reg()) {
          sel.MOV(dst, unpacked);
        }
      }
    }

    INLINE void convertI64ToI32(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);
      if (sel.hasLongType()) {
        GenRegister unpacked;
        const uint32_t type = dstType == TYPE_U32 ? GEN_TYPE_UD : GEN_TYPE_D;
        if (!sel.isScalarReg(dst.reg())) {
          unpacked = sel.unpacked_ud(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0))));
          unpacked = GenRegister::retype(unpacked, dstType == TYPE_U32 ? GEN_TYPE_UD : GEN_TYPE_D);
        } else {
          unpacked = GenRegister::retype(sel.unpacked_ud(dst.reg()), type);
        }

        sel.push();
        if (sel.isScalarReg(insn.getSrc(0))) {
          sel.curr.execWidth = 1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.curr.noMask = 1;
        }
        sel.MOV(unpacked, src);
        sel.pop();

        if (unpacked.reg() != dst.reg()) {
          sel.MOV(dst, unpacked);
        }
      } else {
        sel.CONVI64_TO_I(dst, src);
      }
    }

    INLINE void convertI64ToFloat(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);
      auto dag = sel.regDAG[src.reg()];

      // FIXME: in the future we need to do a common I64-to-I32 lowering analysis
      // at the llvm IR layer, which could cover more cases than just this one.
      SelectionDAG *dag0, *dag1;
      if (dag && dag->child[0] && dag->child[1]) {
        if (dag->child[0]->insn.getOpcode() == OP_LOADI) {
          dag0 = dag->child[1];
          dag1 = dag->child[0];
        } else {
          dag0 = dag->child[0];
          dag1 = dag->child[1];
        }
        GBE_ASSERT(!(dag->child[0]->insn.getOpcode() == OP_LOADI &&
                     dag->child[1]->insn.getOpcode() == OP_LOADI));

        if (dag->insn.getOpcode() == OP_AND ||
            dag->insn.getOpcode() == OP_OR ||
            dag->insn.getOpcode() == OP_XOR) {
          GenRegister src0;
          GenRegister src1;
          if (lowerI64Reg(sel, dag0, src0, GEN_TYPE_UD) &&
              lowerI64Reg(sel, dag1, src1, GEN_TYPE_UD)) {
            switch (dag->insn.getOpcode()) {
              default:
              case OP_AND: sel.AND(GenRegister::retype(dst, GEN_TYPE_UD), src0, src1); break;
              case OP_OR:  sel.OR(GenRegister::retype(dst, GEN_TYPE_UD), src0, src1); break;
              case OP_XOR: sel.XOR(GenRegister::retype(dst, GEN_TYPE_UD), src0, src1); break;
            }
            sel.MOV(dst, GenRegister::retype(dst, GEN_TYPE_UD));
            markChildren = false;
            return;
          }
        }
      }

      if (!sel.hasLongType()) {
        GenRegister tmp[6];
        for (int i = 0; i < 6; i++) {
          tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD), TYPE_U32);
        }
        sel.push();
        sel.curr.flag = 0;
        sel.curr.subFlag = 1;
        sel.CONVI64_TO_F(dst, src, tmp);
        sel.pop();
      } else {
        GenRegister unpacked;
        const uint32_t type = GEN_TYPE_F;
        unpacked = sel.unpacked_ud(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0))));
        unpacked = GenRegister::retype(unpacked, type);
        sel.push();
        if (sel.isScalarReg(insn.getSrc(0))) {
          sel.curr.execWidth = 1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.curr.noMask = 1;
        }
        sel.MOV(unpacked, src);
        sel.pop();
        if (unpacked.reg() != dst.reg()) {
          sel.MOV(dst, unpacked);
        }
      }
    }
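    /* Illustrative note (not in the original source): the recurring pattern in
     * the converters above and below is
     *   1. allocate a temporary of the *wider* register family (e.g. FAMILY_QWORD),
     *   2. take a strided narrow view of it (sel.unpacked_uw / unpacked_ub /
     *      unpacked_ud), so each lane's result lands at the alignment the
     *      hardware requires for the conversion MOV,
     *   3. MOV the strided view into the tightly packed destination register.
     * The extra final MOV disappears whenever the destination itself can be
     * viewed with the required stride (the scalar cases above). */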
GEN_TYPE_UW : GEN_TYPE_W); sel.MOV(tmp, src); unpacked_src = tmp; } else GBE_ASSERT(0); sel.MOV(unpacked, unpacked_src); sel.pop(); sel.MOV(dst, unpacked); } else if (sel.hasLongType()) { sel.MOV(dst, src); } else { sel.CONVI_TO_I64(dst, src, sel.selReg(sel.reg(FAMILY_DWORD))); } } INLINE void convertFToI64(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const { using namespace ir; const Type dstType = insn.getDstType(); const Type srcType = insn.getSrcType(); const GenRegister dst = sel.selReg(insn.getDst(0), dstType); const GenRegister src = sel.selReg(insn.getSrc(0), srcType); if (sel.hasLongType() && sel.hasLongRegRestrict() && srcType == ir::TYPE_FLOAT) { // typical bsw float->long case // Convert float to i64 if hasLongRegRestrict(src and dst hstride must be aligned to the same qword). GenRegister unpacked; GenRegister unpacked_src = src; sel.push(); if (sel.isScalarReg(insn.getSrc(0))) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } unpacked = sel.unpacked_ud(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0)))); unpacked = GenRegister::retype(unpacked, GEN_TYPE_F); sel.MOV(unpacked, unpacked_src); sel.pop(); sel.MOV(dst, unpacked); } else if (srcType == ir::TYPE_FLOAT) { if (sel.hasLongType()) { // typical bdw float->long case sel.MOV(dst, src); } else { // typical old platform float->long case GenRegister tmp[2]; tmp[0] = sel.selReg(sel.reg(FAMILY_DWORD), TYPE_U32); tmp[1] = sel.selReg(sel.reg(FAMILY_DWORD), TYPE_FLOAT); sel.push(); sel.curr.flag = 0; sel.curr.subFlag = 1; sel.CONVF_TO_I64(dst, src, tmp); sel.pop(); } } else if (srcType == ir::TYPE_HALF) { /* No need to consider old platform. if we support half, we must have native long. */ GBE_ASSERT(sel.hasLongType()); GBE_ASSERT(sel.hasHalfType()); uint32_t type = dstType == TYPE_U64 ? GEN_TYPE_UD : GEN_TYPE_D; GenRegister tmp = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD, sel.isScalarReg(insn.getSrc(0))), TYPE_U32), type); sel.push(); if (sel.isScalarReg(insn.getSrc(0))) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } sel.MOV(tmp, src); if (sel.hasLongRegRestrict()) { // special for BSW case. GenRegister unpacked = sel.unpacked_ud(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0)))); unpacked = GenRegister::retype(unpacked, type); sel.MOV(unpacked, tmp); sel.pop(); sel.MOV(dst, unpacked); } else { sel.pop(); sel.MOV(dst, tmp); } } else if (src.type == GEN_TYPE_DF) { GBE_ASSERT(sel.hasDoubleType()); GBE_ASSERT(sel.hasLongType()); //So far, if we support double, we support native long. // Just Mov sel.MOV(dst, src); } else { /* Invalid case. */ GBE_ASSERT(0); } } INLINE void convertBetweenFloatDouble(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const { using namespace ir; const Type dstType = insn.getDstType(); const Type srcType = insn.getSrcType(); const GenRegister dst = sel.selReg(insn.getDst(0), dstType); const GenRegister src = sel.selReg(insn.getSrc(0), srcType); GBE_ASSERT(sel.hasDoubleType()); if (sel.isScalarReg(insn.getDst(0))) { // dst is scalar, just MOV and nothing more. 
GBE_ASSERT(sel.isScalarReg(insn.getSrc(0))); sel.MOV(dst, src); } else if (srcType == ir::TYPE_DOUBLE) { // double to float GBE_ASSERT(dstType == ir::TYPE_FLOAT); GenRegister unpacked; unpacked = sel.unpacked_ud(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0)))); unpacked = GenRegister::retype(unpacked, GEN_TYPE_F); sel.push(); if (sel.isScalarReg(insn.getSrc(0))) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } sel.MOV(unpacked, src); sel.pop(); sel.MOV(dst, unpacked); } else { // float to double, just mov sel.MOV(dst, src); } return; } INLINE void convertBetweenHalfDouble(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const { using namespace ir; const Type dstType = insn.getDstType(); const Type srcType = insn.getSrcType(); const GenRegister dst = sel.selReg(insn.getDst(0), dstType); const GenRegister src = sel.selReg(insn.getSrc(0), srcType); GBE_ASSERT(sel.hasDoubleType()); GBE_ASSERT(sel.hasHalfType()); //So far, if we support double, we support half. if (sel.isScalarReg(insn.getDst(0))) { // uniform case. GBE_ASSERT(sel.isScalarReg(insn.getSrc(0))); GBE_ASSERT(sel.curr.execWidth == 1); GenRegister tmpFloat = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD)), GEN_TYPE_F); sel.MOV(tmpFloat, src); sel.MOV(dst, tmpFloat); return; } if (dstType == ir::TYPE_DOUBLE) { // half to double. There is no direct double to half MOV, need tmp float. GBE_ASSERT(srcType == ir::TYPE_HALF); GenRegister tmpFloat = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD)), GEN_TYPE_F); sel.push(); if (sel.isScalarReg(insn.getSrc(0))) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } sel.MOV(tmpFloat, src); sel.pop(); sel.MOV(dst, tmpFloat); } else { // double to half. No direct MOV from double to half, so double->float->half GBE_ASSERT(srcType == ir::TYPE_DOUBLE); GBE_ASSERT(dstType == ir::TYPE_HALF); sel.push(); if (sel.isScalarReg(insn.getSrc(0))) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } // double to float GenRegister unpackedFloat = sel.unpacked_ud(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0)))); unpackedFloat = GenRegister::retype(unpackedFloat, GEN_TYPE_F); sel.MOV(unpackedFloat, src); // float to half GenRegister unpackedHalf = sel.unpacked_uw(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0)))); unpackedHalf = GenRegister::retype(unpackedHalf, GEN_TYPE_HF); sel.MOV(unpackedHalf, unpackedFloat); sel.pop(); sel.MOV(dst, unpackedHalf); } } INLINE void convertHalfToSmallInts(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const { using namespace ir; const Type dstType = insn.getDstType(); const Type srcType = insn.getSrcType(); const GenRegister dst = sel.selReg(insn.getDst(0), dstType); const GenRegister src = sel.selReg(insn.getSrc(0), srcType); const RegisterFamily dstFamily = getFamily(dstType); // Special case, half -> char/short. /* [DevBDW+]: Format conversion to or from HF (Half Float) must be DWord-aligned and strided by a DWord on the destination. */ GBE_ASSERT(sel.hasHalfType()); GenRegister tmp; sel.push(); if (sel.isScalarReg(insn.getSrc(0))) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } if (dstFamily == FAMILY_BYTE) { const uint32_t type = dstType == TYPE_U8 ? 
    INLINE void convertHalfToSmallInts(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);
      const RegisterFamily dstFamily = getFamily(dstType);

      // Special case, half -> char/short.
      /* [DevBDW+]: Format conversion to or from HF (Half Float) must be
         DWord-aligned and strided by a DWord on the destination. */
      GBE_ASSERT(sel.hasHalfType());
      GenRegister tmp;
      sel.push();
      if (sel.isScalarReg(insn.getSrc(0))) {
        sel.curr.execWidth = 1;
        sel.curr.predicate = GEN_PREDICATE_NONE;
        sel.curr.noMask = 1;
      }
      if (dstFamily == FAMILY_BYTE) {
        const uint32_t type = dstType == TYPE_U8 ? GEN_TYPE_UB : GEN_TYPE_B;
        tmp = GenRegister::retype(sel.unpacked_ub(sel.reg(FAMILY_DWORD, sel.isScalarReg(insn.getSrc(0)))), type);
        sel.MOV(tmp, src);
      } else {
        const uint32_t type = dstType == TYPE_U16 ? GEN_TYPE_UW : GEN_TYPE_W;
        tmp = GenRegister::retype(sel.unpacked_uw(sel.reg(FAMILY_DWORD, sel.isScalarReg(insn.getSrc(0)))), type);
        sel.MOV(tmp, src);
      }
      sel.pop();
      sel.MOV(dst, tmp);
    }

    INLINE void convertSmallIntsToHalf(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);

      // Special case, char/uchar -> half.
      /* [DevBDW+]: Format conversion to or from HF (Half Float) must be
         DWord-aligned and strided by a DWord on the destination. */
      GBE_ASSERT(sel.hasHalfType());
      GenRegister tmp = GenRegister::retype(sel.unpacked_uw(sel.reg(FAMILY_DWORD, sel.isScalarReg(insn.getSrc(0)))), GEN_TYPE_HF);
      sel.push();
      if (sel.isScalarReg(insn.getSrc(0))) {
        sel.curr.execWidth = 1;
        sel.curr.predicate = GEN_PREDICATE_NONE;
        sel.curr.noMask = 1;
      }
      sel.MOV(tmp, src);
      sel.pop();
      sel.MOV(dst, tmp);
    }

    INLINE void convertDoubleToSmallInts(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);
      const RegisterFamily dstFamily = getFamily(dstType);
      GBE_ASSERT(sel.hasDoubleType());
      GBE_ASSERT(sel.hasHalfType()); // So far, if we support double, we support half.

      if (sel.isScalarReg(insn.getDst(0))) {
        // dst is scalar, just MOV and nothing more.
        GBE_ASSERT(sel.isScalarReg(insn.getSrc(0)));
        sel.MOV(dst, src);
      } else {
        GenRegister unpacked;
        if (dstFamily == FAMILY_DWORD) {
          // double to int
          unpacked = sel.unpacked_ud(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0))));
          unpacked = GenRegister::retype(unpacked, dstType == TYPE_U32 ? GEN_TYPE_UD : GEN_TYPE_D);
        } else if (dstFamily == FAMILY_WORD) {
          // double to short
          unpacked = sel.unpacked_uw(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0))));
          unpacked = GenRegister::retype(unpacked, dstType == TYPE_U16 ? GEN_TYPE_UW : GEN_TYPE_W);
        } else {
          GBE_ASSERT(dstFamily == FAMILY_BYTE);
          // double to char
          unpacked = sel.unpacked_uw(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0))));
          unpacked = GenRegister::retype(unpacked, dstType == TYPE_U8 ? GEN_TYPE_UW : GEN_TYPE_W);
        }

        sel.push();
        if (sel.isScalarReg(insn.getSrc(0))) {
          sel.curr.execWidth = 1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.curr.noMask = 1;
        }
        sel.MOV(unpacked, src);
        sel.pop();
        sel.MOV(dst, unpacked);
      }
    }
    INLINE void convertI64ToDouble(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);
      GBE_ASSERT(sel.hasDoubleType());
      GBE_ASSERT(sel.hasLongType()); // So far, if we support double, we support native long.
      // Just MOV
      sel.MOV(dst, src);
    }

    INLINE void convertSmallIntsToDouble(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);
      const RegisterFamily srcFamily = getFamily(srcType);
      GBE_ASSERT(sel.hasDoubleType());
      GBE_ASSERT(sel.hasLongType()); // So far, if we support double, we support native long.

      if (sel.hasLongType() && sel.hasLongRegRestrict()) {
        // Convert i32/i16/i8 to i64 if hasLongRegRestrict (src and dst hstride
        // must be aligned to the same qword).
        GenRegister unpacked;
        GenRegister unpacked_src = src;

        sel.push();
        if (sel.isScalarReg(insn.getSrc(0))) {
          sel.curr.execWidth = 1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.curr.noMask = 1;
        }
        if (srcFamily == FAMILY_DWORD) {
          unpacked = sel.unpacked_ud(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0))));
          unpacked = GenRegister::retype(unpacked, srcType == TYPE_U32 ? GEN_TYPE_UD : GEN_TYPE_D);
        } else if (srcFamily == FAMILY_WORD) {
          unpacked = sel.unpacked_uw(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0))));
          unpacked = GenRegister::retype(unpacked, srcType == TYPE_U16 ? GEN_TYPE_UW : GEN_TYPE_W);
        } else if (srcFamily == FAMILY_BYTE) {
          GenRegister tmp = sel.selReg(sel.reg(FAMILY_WORD, sel.isScalarReg(insn.getSrc(0))));
          tmp = GenRegister::retype(tmp, srcType == TYPE_U8 ? GEN_TYPE_UW : GEN_TYPE_W);
          unpacked = sel.unpacked_uw(sel.reg(FAMILY_QWORD, sel.isScalarReg(insn.getSrc(0))));
          unpacked = GenRegister::retype(unpacked, srcType == TYPE_U8 ? GEN_TYPE_UW : GEN_TYPE_W);
          sel.MOV(tmp, src);
          unpacked_src = tmp;
        } else
          GBE_ASSERT(0);
        sel.MOV(unpacked, unpacked_src);
        sel.pop();
        sel.MOV(dst, unpacked);
      } else if (sel.hasLongType()) {
        sel.MOV(dst, src);
      }
    }

    INLINE bool emitOne(Selection::Opaque &sel, const ir::ConvertInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const Type dstType = insn.getDstType();
      const Type srcType = insn.getSrcType();
      const RegisterFamily dstFamily = getFamily(dstType);
      const RegisterFamily srcFamily = getFamily(srcType);
      const GenRegister dst = sel.selReg(insn.getDst(0), dstType);
      const GenRegister src = sel.selReg(insn.getSrc(0), srcType);
      const Opcode opcode = insn.getOpcode();
      sel.push();
      if (sel.isScalarReg(insn.getDst(0)) == true) {
        sel.curr.execWidth = 1;
        sel.curr.predicate = GEN_PREDICATE_NONE;
        sel.curr.noMask = 1;
      }
      if (opcode == ir::OP_SAT_CVT)
        sel.curr.saturate = 1;

      if (opcode == OP_F16TO32 || opcode == OP_F32TO16) {
        /* Conversion between float and half. */
        convertBetweenHalfFloat(sel, insn, markChildren);
      } else if (dstFamily != FAMILY_DWORD && dstFamily != FAMILY_QWORD && srcFamily == FAMILY_DWORD) {
        // convert i32/float to small int/half
        convert32bitsToSmall(sel, insn, markChildren);
      } else if (dstFamily == FAMILY_WORD && srcFamily == FAMILY_QWORD && srcType != ir::TYPE_DOUBLE) {
        // convert i64 to i16 and half.
        convertI64To16bits(sel, insn, markChildren);
      } else if (dstFamily == FAMILY_BYTE && srcFamily == FAMILY_QWORD && srcType != ir::TYPE_DOUBLE) {
        // convert i64 to i8
        convertI64ToI8(sel, insn, markChildren);
      } else if ((dstType == ir::TYPE_S32 || dstType == ir::TYPE_U32) &&
                 (srcType == ir::TYPE_U64 || srcType == ir::TYPE_S64)) {
        // Convert i64 to i32
        convertI64ToI32(sel, insn, markChildren);
      } else if (dstType == ir::TYPE_FLOAT && (srcType == ir::TYPE_U64 || srcType == ir::TYPE_S64)) {
        // long -> float
        convertI64ToFloat(sel, insn, markChildren);
      } else if (dstType == ir::TYPE_DOUBLE && (srcType == ir::TYPE_U64 || srcType == ir::TYPE_S64)) {
        // long -> double
        convertI64ToDouble(sel, insn, markChildren);
      } else if ((dstType == ir::TYPE_U64 || dstType == ir::TYPE_S64) &&
                 (srcFamily != FAMILY_QWORD && srcType != ir::TYPE_FLOAT && srcType != ir::TYPE_HALF)) {
        // int/short/char to long
        convertSmallIntsToI64(sel, insn, markChildren);
      } else if ((dstType == ir::TYPE_DOUBLE) &&
                 (srcFamily != FAMILY_QWORD && srcType != ir::TYPE_FLOAT && srcType != ir::TYPE_HALF)) {
        // int/short/char to double
        convertSmallIntsToDouble(sel, insn, markChildren);
      } else if ((dstType == ir::TYPE_U64 || dstType == ir::TYPE_S64) &&
                 (srcType == ir::TYPE_FLOAT || srcType == ir::TYPE_HALF || srcType == ir::TYPE_DOUBLE)) {
        // All float types to long
        convertFToI64(sel, insn, markChildren);
      } else if ((srcType == ir::TYPE_FLOAT && dstType == ir::TYPE_DOUBLE) ||
                 (dstType == ir::TYPE_FLOAT && srcType == ir::TYPE_DOUBLE)) {
        // float <-> double conversion
        convertBetweenFloatDouble(sel, insn, markChildren);
      } else if ((srcType == ir::TYPE_HALF && dstType == ir::TYPE_DOUBLE) ||
                 (dstType == ir::TYPE_HALF && srcType == ir::TYPE_DOUBLE)) {
        // half <-> double conversion
        convertBetweenHalfDouble(sel, insn, markChildren);
      } else if (srcType == ir::TYPE_DOUBLE && dstType != ir::TYPE_FLOAT &&
                 dstType != ir::TYPE_HALF && dstFamily != FAMILY_QWORD) {
        // double to int/short/char
        convertDoubleToSmallInts(sel, insn, markChildren);
      } else if (srcType == ir::TYPE_HALF && (dstFamily == FAMILY_BYTE || dstFamily == FAMILY_WORD)) {
        // Convert half to small int
        convertHalfToSmallInts(sel, insn, markChildren);
      } else if (dstType == ir::TYPE_HALF && (srcFamily == FAMILY_BYTE || srcFamily == FAMILY_WORD)) {
        // Convert small int to half
        convertSmallIntsToHalf(sel, insn, markChildren);
      } else {
        /* All special cases have been handled, just MOV. */
        sel.MOV(dst, src);
      }
      sel.pop();
      return true;
    }
    DECL_CTOR(ConvertInstruction, 1, 1);
  };

  /*! Atomic instruction pattern */
  class AtomicInstructionPattern : public SelectionPattern
  {
  public:
    AtomicInstructionPattern(void) : SelectionPattern(1, 1) {
      for (uint32_t op = 0; op < ir::OP_INVALID; ++op)
        if (ir::isOpcodeFrom<ir::AtomicInstruction>(ir::Opcode(op)) == true)
          this->opcodes.push_back(ir::Opcode(op));
    }
    /* Used to transform an address from 64-bit to 32-bit; note that since
     * dataport messages cannot accept scalar registers, we also convert to a
     * non-uniform register here. */
    GenRegister convertU64ToU32(Selection::Opaque &sel, GenRegister addr) const {
      GenRegister unpacked = GenRegister::retype(sel.unpacked_ud(addr.reg()), GEN_TYPE_UD);
      GenRegister dst = sel.selReg(sel.reg(ir::FAMILY_DWORD), ir::TYPE_U32);
      sel.MOV(dst, unpacked);
      return dst;
    }

    void untypedAtomicA64Stateless(Selection::Opaque &sel,
                                   const ir::AtomicInstruction &insn,
                                   unsigned msgPayload,
                                   GenRegister dst,
                                   GenRegister addr,
                                   GenRegister src1,
                                   GenRegister src2,
                                   GenRegister bti) const
    {
      using namespace ir;
      GenRegister addrQ;
      const AtomicOps atomicOp = insn.getAtomicOpcode();
      GenAtomicOpCode genAtomicOp = (GenAtomicOpCode)atomicOp;
      unsigned addrBytes = typeSize(addr.type);
      GBE_ASSERT(msgPayload <= 3);

      unsigned simdWidth = sel.curr.execWidth;
      AddressMode AM = insn.getAddressMode();
      if (addrBytes == 4) {
        addrQ = sel.selReg(sel.reg(ir::FAMILY_QWORD), ir::TYPE_U64);
        sel.MOV(addrQ, addr);
      } else {
        addrQ = addr;
      }

      if (simdWidth == 8) {
        vector<GenRegister> msgs;
        msgs.push_back(addr);
        msgs.push_back(src1);
        msgs.push_back(src2);
        sel.ATOMICA64(dst, genAtomicOp, msgPayload, msgs, bti, sel.getBTITemps(AM));
      } else if (simdWidth == 16) {
        vector<GenRegister> msgs;
        RegisterFamily family = sel.getRegisterFamily(insn.getDst(0));
        Type type = getType(family);
        for (unsigned k = 0; k < msgPayload; k++) {
          msgs.push_back(sel.selReg(sel.reg(family), type));
        }
        sel.push();
        /* first quarter */
        sel.curr.execWidth = 8;
        sel.curr.quarterControl = GEN_COMPRESSION_Q1;
        sel.MOV(GenRegister::retype(msgs[0], GEN_TYPE_UL), GenRegister::Qn(addrQ, 0));
        if (msgPayload > 1) {
          if (family == ir::FAMILY_QWORD)
            sel.MOV(GenRegister::Qn(msgs[0], 1), GenRegister::Qn(src1, 0));
          else
            sel.MOV(GenRegister::Qn(msgs[1], 0), GenRegister::Qn(src1, 0));
        }
        if (msgPayload > 2) {
          if (family == ir::FAMILY_QWORD)
            sel.MOV(GenRegister::Qn(msgs[1], 0), GenRegister::Qn(src2, 0));
          else
            sel.MOV(GenRegister::Qn(msgs[1], 1), GenRegister::Qn(src2, 0));
        }
        sel.ATOMICA64(GenRegister::Qn(dst, 0), genAtomicOp, msgPayload, msgs, bti, sel.getBTITemps(AM));

        /* second quarter */
        sel.curr.execWidth = 8;
        sel.curr.quarterControl = GEN_COMPRESSION_Q2;
        sel.MOV(GenRegister::retype(msgs[0], GEN_TYPE_UL), GenRegister::Qn(addrQ, 1));
        if (msgPayload > 1) {
          if (family == ir::FAMILY_QWORD)
            sel.MOV(GenRegister::Qn(msgs[0], 1), GenRegister::Qn(src1, 1));
          else
            sel.MOV(GenRegister::Qn(msgs[1], 0), GenRegister::Qn(src1, 1));
        }
        if (msgPayload > 2) {
          if (family == ir::FAMILY_QWORD)
            sel.MOV(GenRegister::Qn(msgs[1], 0), GenRegister::Qn(src2, 1));
          else
            sel.MOV(GenRegister::Qn(msgs[1], 1), GenRegister::Qn(src2, 1));
        }
        sel.ATOMICA64(GenRegister::Qn(dst, 1), genAtomicOp, msgPayload, msgs, bti, sel.getBTITemps(AM));
        sel.pop();
      }
    }
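    /* Illustrative note (not in the original source): as with the stateless
     * writes above, a simd16 A64 atomic is split into two simd8 quarters. For a
     * dword atomic with msgPayload = 3 (e.g. compare-and-swap), each quarter
     * sends msgs[0] = eight 64-bit addresses and packs src1/src2 into the two
     * halves of msgs[1]; for qword operands the sources spill into msgs[0..1]
     * instead, which is what the FAMILY_QWORD special cases select. */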
sel.selReg(insn.getAddressRegister(), type);
    GenRegister src1 = src0, src2 = src0;
    if(msgPayload > 1) src1 = sel.selReg(insn.getSrc(1), type);
    if(msgPayload > 2) src2 = sel.selReg(insn.getSrc(2), type);
    GenAtomicOpCode genAtomicOp = (GenAtomicOpCode)atomicOp;
    if (AM == AM_DynamicBti || AM == AM_StaticBti) {
      if (AM == AM_DynamicBti) {
        Register btiReg = insn.getBtiReg();
        sel.ATOMIC(dst, genAtomicOp, msgPayload, address, src1, src2, sel.selReg(btiReg, type), sel.getBTITemps(AM));
      } else {
        unsigned SI = insn.getSurfaceIndex();
        sel.ATOMIC(dst, genAtomicOp, msgPayload, address, src1, src2, GenRegister::immud(SI), sel.getBTITemps(AM));
      }
    } else if (addrSpace == ir::MEM_LOCAL) {
      // stateless mode, but local memory still uses bti access
      GenRegister addrDW = address;
      if (addrBytes == 8) addrDW = convertU64ToU32(sel, address);
      sel.ATOMIC(dst, genAtomicOp, msgPayload, addrDW, src1, src2, GenRegister::immud(0xfe), sel.getBTITemps(AM));
    } else if (addrSpace == ir::MEM_GENERIC) {
      Register localMask = generateLocalMask(sel, address);
      sel.push();
        sel.curr.useVirtualFlag(localMask, GEN_PREDICATE_NORMAL);
        GenRegister addrDW = address;
        if (addrBytes == 8) addrDW = convertU64ToU32(sel, address);
        sel.ATOMIC(dst, genAtomicOp, msgPayload, addrDW, src1, src2, GenRegister::immud(0xfe), sel.getBTITemps(AM));
        sel.curr.inversePredicate = 1;
        untypedAtomicA64Stateless(sel, insn, msgPayload, dst, address, src1, src2, GenRegister::immud(0xff));
      sel.pop();
    } else
      untypedAtomicA64Stateless(sel, insn, msgPayload, dst, address, src1, src2, GenRegister::immud(0xff));
    markAllChildren(dag);
    return true;
  }
  };

  /*! Select instruction pattern */
  class SelectInstructionPattern : public SelectionPattern
  {
  public:
    SelectInstructionPattern(void) : SelectionPattern(1,1) {
      for (uint32_t op = 0; op < ir::OP_INVALID; ++op)
        if (ir::isOpcodeFrom<ir::SelectInstruction>(ir::Opcode(op)) == true)
          this->opcodes.push_back(ir::Opcode(op));
    }

    INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const
    {
      using namespace ir;
      const ir::SelectInstruction &insn = cast<ir::SelectInstruction>(dag.insn);

      // Get all registers for the instruction
      const Type type = insn.getType();
      const GenRegister dst  = sel.selReg(insn.getDst(0), type);

      // Look for immediate values for the right source
      GenRegister src0, src1;
      SelectionDAG *dag0 = dag.child[0]; // source 0 is the predicate!
      SelectionDAG *dag1 = dag.child[1];
      SelectionDAG *dag2 = dag.child[2];
      if (dag0) dag0->isRoot = 1;
      bool inverse = false;
      sel.getSrcGenRegImm(dag, dag1, dag2, src0, src1, type, inverse);
      const Register pred = insn.getPredicate();
      sel.push();
        if (sel.isScalarReg(insn.getDst(0)) == true) {
          sel.curr.execWidth = 1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.curr.noMask = 1;
        }
        sel.curr.inversePredicate ^= inverse;
        sel.curr.physicalFlag = 0;
        sel.curr.flagIndex = pred.value();
        sel.curr.predicate = GEN_PREDICATE_NORMAL;
        // FIXME: in general, if the flag is a uniform flag, we should treat it as an
        // extern flag, since we never generate a uniform physical flag and can never
        // tell which channel is active when such a flag is used.
        // We need to concentrate this logic in the modFlag bit: if an instruction has
        // that bit set, it will generate a physical flag, otherwise it will not. But
        // the current modFlag is just a hint. We need to fix it in the future.
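        /* For reference, a minimal scalar sketch of the per-lane semantics being
         * selected here (illustration only; selLane is a hypothetical helper, not
         * part of the backend). The inversePredicate bit folds the "inverse" result
         * of getSrcGenRegImm into the predicate instead of swapping the sources:
         *
         *   #include <cstdint>
         *   static inline int32_t selLane(bool pred, bool inverse,
         *                                 int32_t src0, int32_t src1) {
         *     if (inverse) pred = !pred;   // sel.curr.inversePredicate ^= inverse
         *     return pred ? src0 : src1;   // what SEL does in each enabled lane
         *   }
         */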
if (!dag0 || (sel.isScalarReg(dag0->insn.getDst(0))))
          sel.curr.externFlag = 1;
        if((type == ir::TYPE_S64 || type == ir::TYPE_U64) && !sel.hasLongType())
          sel.SEL_INT64(dst, src0, src1);
        else
          sel.SEL(dst, src0, src1);
      sel.pop();
      return true;
    }
  };

  DECL_PATTERN(TernaryInstruction)
  {
    INLINE bool emitOne(Selection::Opaque &sel, const ir::TernaryInstruction &insn, bool &markChildren) const {
      using namespace ir;
      const Type type = insn.getType();
      const GenRegister dst = sel.selReg(insn.getDst(0), type),
                        src0 = sel.selReg(insn.getSrc(0), type),
                        src1 = sel.selReg(insn.getSrc(1), type),
                        src2 = sel.selReg(insn.getSrc(2), type);
      switch(insn.getOpcode()) {
        case OP_I64MADSAT:
         {
          GenRegister tmp[9];
          int tmp_num;
          if (!sel.hasLongType()) {
            tmp_num = 9;
            for(int i=0; i<9; i++) {
              tmp[i] = sel.selReg(sel.reg(FAMILY_DWORD));
              tmp[i].type = GEN_TYPE_UD;
            }
          } else {
            tmp_num = 6;
            for(int i=0; i<6; i++) {
              tmp[i] = sel.selReg(sel.reg(FAMILY_QWORD), ir::TYPE_U64);
              tmp[i].type = GEN_TYPE_UL;
            }
          }
          sel.push();
            sel.curr.flag = 0;
            sel.curr.subFlag = 1;
            sel.I64MADSAT(dst, src0, src1, src2, tmp, tmp_num);
          sel.pop();
          break;
         }
        case OP_MAD:
         {
          sel.push();
            if (sel.isScalarReg(insn.getDst(0)))
              sel.curr.execWidth = 1;
            sel.MAD(dst, src2, src0, src1);
          sel.pop();
          break;
         }
        case OP_LRP:
         {
          sel.LRP(dst, src0, src1, src2);
          break;
         }
        default:
          NOT_IMPLEMENTED;
      }
      return true;
    }
    DECL_CTOR(TernaryInstruction, 1, 1);
  };

  /*! Label instruction pattern */
  DECL_PATTERN(LabelInstruction)
  {
    INLINE bool emitOne(Selection::Opaque &sel, const ir::LabelInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const LabelIndex label = insn.getLabelIndex();
      const GenRegister src0 = sel.getBlockIP();
      const GenRegister src1 = sel.getLabelImmReg(label);
      const uint32_t simdWidth = sel.ctx.getSimdWidth();
      GBE_ASSERTM(label < sel.ctx.getMaxLabel(), "We reached the maximum label number which is reserved for barrier handling");
      sel.LABEL(label);

      if(!insn.getParent()->needIf)
        return true;

      // Do not emit any code for the "returning" block. There is no need for it
      if (insn.getParent() == &sel.ctx.getFunction().getBottomBlock())
        return true;
      LabelIndex jip;
      const LabelIndex nextLabel = insn.getParent()->getNextBlock()->getLabelIndex();
      if (sel.ctx.hasJIP(&insn))
        jip = sel.ctx.getLabelIndex(&insn);
      else
        jip = nextLabel;

      // Emit the mask computation at the head of each basic block
      sel.push();
        sel.curr.noMask = 1;
        sel.curr.predicate = GEN_PREDICATE_NONE;
        sel.curr.flag = 0;
        sel.curr.subFlag = 1;
        sel.cmpBlockIP(GEN_CONDITIONAL_LE, src0, src1);
      sel.pop();

      if (sel.block->hasBarrier) {
        // If this block has a barrier, we don't execute the block until all lanes
        // have arrived. Mark each lane that reached this label, then check all lanes:
        // if any lane has not arrived, we jump to jip. There is no need to issue
        // if/endif for this block, as it always executes with all lanes active.
        sel.push();
          sel.curr.predicate = GEN_PREDICATE_NORMAL;
          sel.curr.flag = 0;
          sel.curr.subFlag = 1;
          sel.setBlockIP(src0, sel.ctx.getMaxLabel());
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.curr.noMask = 1;
          sel.cmpBlockIP(GEN_CONDITIONAL_EQ, src0, sel.ctx.getMaxLabel());
          if (simdWidth == 8)
            sel.curr.predicate = GEN_PREDICATE_ALIGN1_ALL8H;
          else if (simdWidth == 16)
            sel.curr.predicate = GEN_PREDICATE_ALIGN1_ALL16H;
          else
            NOT_IMPLEMENTED;
          sel.curr.noMask = 1;
          sel.curr.execWidth = 1;
          sel.curr.inversePredicate = 1;
          sel.JMPI(GenRegister::immd(0), jip, label);
        sel.pop();
        // FIXME: if the last BRA is an unconditional jump, we don't need to update the label here.
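        /* A scalar model of the ALL8H/ALL16H predicate used by the JMPI above
         * (illustration only; mustJumpToJip is a hypothetical helper). Lanes that
         * reached the barrier label had their blockIP set to maxLabel; with
         * inversePredicate == 1 the jump to jip is taken unless *all* lanes match:
         *
         *   #include <cstdint>
         *   static inline bool mustJumpToJip(const uint16_t *blockIP,
         *                                    uint32_t simdWidth, uint16_t maxLabel) {
         *     for (uint32_t lane = 0; lane < simdWidth; ++lane)
         *       if (blockIP[lane] != maxLabel)
         *         return true;   // some lane has not arrived: branch to jip
         *     return false;      // all lanes arrived: fall through into the block
         *   }
         */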
sel.push(); sel.curr.predicate = GEN_PREDICATE_NORMAL; sel.curr.flag = 0; sel.curr.subFlag = 1; sel.setBlockIP(src0, label.value()); sel.pop(); } else { if (sel.ctx.hasJIP(&insn) && // If jump to next label and the endif offset is -1, then // We don't need to add a jmpi here, as the following IF will do the same // thing if all channels are disabled. (jip != nextLabel || sel.block->endifOffset != -1)) { // If it is required, insert a JUMP to bypass the block sel.push(); sel.curr.flag = 0; sel.curr.subFlag = 1; if (simdWidth == 8) sel.curr.predicate = GEN_PREDICATE_ALIGN1_ANY8H; else if (simdWidth == 16) sel.curr.predicate = GEN_PREDICATE_ALIGN1_ANY16H; else NOT_IMPLEMENTED; sel.curr.noMask = 1; sel.curr.execWidth = 1; sel.curr.inversePredicate = 1; sel.JMPI(GenRegister::immd(0), jip, label); sel.pop(); } if(!sel.block->removeSimpleIfEndif){ sel.push(); sel.curr.flag = 0; sel.curr.subFlag = 1; sel.curr.predicate = GEN_PREDICATE_NORMAL; if(!insn.getParent()->needEndif && insn.getParent()->needIf) { ir::LabelIndex label = insn.getParent()->endifLabel; sel.IF(GenRegister::immd(0), label, label); } else sel.IF(GenRegister::immd(0), sel.block->endifLabel, sel.block->endifLabel); sel.pop(); } } return true; } DECL_CTOR(LabelInstruction, 1, 1); }; DECL_PATTERN(SampleInstruction) { INLINE void emitLd_ivb(Selection::Opaque &sel, const ir::SampleInstruction &insn, GenRegister msgPayloads[4], uint32_t &msgLen) const { // pre SKL: U, lod, [V], [W] using namespace ir; GBE_ASSERT(insn.getSrcType() != TYPE_FLOAT); uint32_t srcNum = insn.getSrcNum(); msgPayloads[0] = sel.selReg(insn.getSrc(0), insn.getSrcType()); msgPayloads[1] = sel.selReg(sel.reg(FAMILY_DWORD), TYPE_U32); sel.MOV(msgPayloads[1], GenRegister::immud(0)); if (srcNum > 1) msgPayloads[2] = sel.selReg(insn.getSrc(1), insn.getSrcType()); if (srcNum > 2) msgPayloads[3] = sel.selReg(insn.getSrc(2), insn.getSrcType()); // Clear the lod to zero. msgLen = srcNum + 1; } INLINE void emitLd_skl(Selection::Opaque &sel, const ir::SampleInstruction &insn, GenRegister msgPayloads[4], uint32_t &msgLen) const { // SKL: U, [V], [lod], [W] using namespace ir; GBE_ASSERT(insn.getSrcType() != TYPE_FLOAT); uint32_t srcNum = msgLen = insn.getSrcNum(); msgPayloads[0] = sel.selReg(insn.getSrc(0), insn.getSrcType()); if (srcNum > 1) msgPayloads[1] = sel.selReg(insn.getSrc(1), insn.getSrcType()); if (srcNum > 2) { // Clear the lod to zero. msgPayloads[2] = sel.selReg(sel.reg(FAMILY_DWORD), TYPE_U32); sel.MOV(msgPayloads[2], GenRegister::immud(0)); msgLen += 1; msgPayloads[3] = sel.selReg(insn.getSrc(2), insn.getSrcType()); } } INLINE bool emitOne(Selection::Opaque &sel, const ir::SampleInstruction &insn, bool &markChildren) const { using namespace ir; GenRegister msgPayloads[4]; vector dst(insn.getDstNum()); uint32_t srcNum = insn.getSrcNum(); uint32_t valueID = 0; uint32_t msgLen = 0; for (valueID = 0; valueID < insn.getDstNum(); ++valueID) dst[valueID] = sel.selReg(insn.getDst(valueID), insn.getDstType()); if (insn.getSamplerOffset() != 0) { if(sel.getLdMsgOrder() < LD_MSG_ORDER_SKL) this->emitLd_ivb(sel, insn, msgPayloads, msgLen); else this->emitLd_skl(sel, insn, msgPayloads, msgLen); } else { // U, V, [W] GBE_ASSERT(insn.getSrcType() == TYPE_FLOAT); for (valueID = 0; valueID < srcNum; ++valueID) msgPayloads[valueID] = sel.selReg(insn.getSrc(valueID), insn.getSrcType()); msgLen = srcNum; } // We switch to a fixup bti for linear filter on a image1d array sampling. uint32_t bti = insn.getImageIndex() + (insn.getSamplerOffset() == 2 ? 
BTI_WORKAROUND_IMAGE_OFFSET : 0); if (bti > BTI_MAX_ID) { std::cerr << "Too large bti " << bti; return false; } uint32_t sampler = insn.getSamplerIndex(); sel.SAMPLE(dst.data(), insn.getDstNum(), msgPayloads, msgLen, bti, sampler, insn.getSamplerOffset() != 0, false); return true; } DECL_CTOR(SampleInstruction, 1, 1); }; DECL_PATTERN(VmeInstruction) { INLINE bool emitOne(Selection::Opaque &sel, const ir::VmeInstruction &insn, bool &markChildren) const { using namespace ir; uint32_t msg_type, vme_search_path_lut, lut_sub; msg_type = insn.getMsgType(); vme_search_path_lut = 0; lut_sub = 0; GBE_ASSERT(msg_type == 1); uint32_t payloadLen = 0; //We allocate 5 virtual payload grfs to selection dst register. if(msg_type == 1){ payloadLen = 5; } uint32_t selDstNum = insn.getDstNum() + payloadLen; uint32_t srcNum = insn.getSrcNum(); vector dst(selDstNum); vector payloadVal(srcNum); uint32_t valueID = 0; for (valueID = 0; valueID < insn.getDstNum(); ++valueID) dst[valueID] = sel.selReg(insn.getDst(valueID), insn.getDstType()); for (valueID = insn.getDstNum(); valueID < selDstNum; ++valueID) dst[valueID] = sel.selReg(sel.reg(FAMILY_DWORD), TYPE_U32); for (valueID = 0; valueID < srcNum; ++valueID) payloadVal[valueID] = sel.selReg(insn.getSrc(valueID), insn.getSrcType()); uint32_t bti = insn.getImageIndex(); if (bti > BTI_MAX_ID) { std::cerr << "Too large bti " << bti; return false; } sel.VME(bti, dst.data(), payloadVal.data(), selDstNum, srcNum, msg_type, vme_search_path_lut, lut_sub); return true; } DECL_CTOR(VmeInstruction, 1, 1); }; /*! Typed write instruction pattern. */ DECL_PATTERN(TypedWriteInstruction) { INLINE bool emitOne(Selection::Opaque &sel, const ir::TypedWriteInstruction &insn, bool &markChildren) const { const GenRegister header = GenRegister::ud8grf(sel.reg(ir::FAMILY_REG)); sel.push(); sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; sel.MOV(header, GenRegister::immud(0)); sel.curr.execWidth = 1; GenRegister channelEn = sel.getOffsetReg(header, 0, 7*4); // Enable all channels. sel.MOV(channelEn, GenRegister::immud(0xffff)); sel.pop(); const uint32_t simdWidth = sel.ctx.getSimdWidth(); if (simdWidth == 16) emitWithSimd16(sel, insn, markChildren, header); else if (simdWidth == 8) emitWithSimd8(sel, insn, markChildren, header); else NOT_SUPPORTED; return true; } INLINE bool emitWithSimd16(Selection::Opaque &sel, const ir::TypedWriteInstruction &insn, bool &markChildren, const GenRegister& header) const { using namespace ir; GenRegister msgs[9]; // (header + U + V + W + LOD + 4) msgs[0] = header; for (uint32_t i = 1; i < 9; ++i) { //SIMD16 will be split into two SIMD8, //each virtual reg in msgs requires one physical reg with 8 DWORDs (32 bytes), //so, declare with FAMILY_WORD, and the allocated size will be sizeof(WORD)*SIMD16 = 32 bytes msgs[i] = sel.selReg(sel.reg(FAMILY_WORD), TYPE_U32); } const uint32_t dims = insn.getSrcNum() - 4; uint32_t bti = insn.getImageIndex(); sel.push(); sel.curr.execWidth = 8; for (uint32_t i = 0; i < 2; ++i) { //SIMD16 split to two SIMD8 sel.curr.quarterControl = (i == 0) ? 
GEN_COMPRESSION_Q1 : GEN_COMPRESSION_Q2; uint32_t msgid = 1; for (uint32_t dim = 0; dim < dims; ++dim) { //the coords GenRegister coord = sel.selReg(insn.getSrc(dim), insn.getCoordType()); sel.MOV(GenRegister::retype(msgs[msgid++], coord.type), GenRegister::Qn(coord, i)); } while (msgid < 5) //fill fake coords sel.MOV(msgs[msgid++], GenRegister::immud(0)); for (uint32_t j = 0; j < 4; ++j) { //the data GenRegister data = sel.selReg(insn.getSrc(j + dims), insn.getSrcType()); sel.MOV(GenRegister::retype(msgs[msgid++], data.type), GenRegister::Qn(data, i)); } sel.TYPED_WRITE(msgs, 9, bti, dims == 3); } sel.pop(); return true; } INLINE bool emitWithSimd8(Selection::Opaque &sel, const ir::TypedWriteInstruction &insn, bool &markChildren, const GenRegister& header) const { using namespace ir; GenRegister msgs[9]; // (header + U + V + W + LOD + 4) msgs[0] = header; const uint32_t dims = insn.getSrcNum() - 4; uint32_t bti = insn.getImageIndex(); uint32_t msgid = 1; for (uint32_t dim = 0; dim < dims; ++dim) { //the coords GenRegister coord = sel.selReg(insn.getSrc(dim), insn.getCoordType()); msgs[msgid++] = coord; } while (msgid < 5) { //fill fake coords GenRegister fake = sel.selReg(sel.reg(FAMILY_DWORD), TYPE_U32); sel.MOV(fake, GenRegister::immud(0)); msgs[msgid++] = fake; } for (uint32_t j = 0; j < 4; ++j) { //the data GenRegister data = sel.selReg(insn.getSrc(j + dims), insn.getSrcType()); msgs[msgid++] = data; } sel.TYPED_WRITE(msgs, 9, bti, dims == 3); return true; } DECL_CTOR(TypedWriteInstruction, 1, 1); }; /*! get image info instruction pattern. */ DECL_PATTERN(GetImageInfoInstruction) { INLINE bool emitOne(Selection::Opaque &sel, const ir::GetImageInfoInstruction &insn, bool &markChildren) const { using namespace ir; GenRegister dst; dst = sel.selReg(insn.getDst(0), TYPE_U32); GenRegister imageInfoReg = GenRegister::ud1grf(insn.getSrc(0)); sel.MOV(dst, imageInfoReg); return true; } DECL_CTOR(GetImageInfoInstruction, 1, 1); }; class ReadARFInstructionPattern : public SelectionPattern { public: ReadARFInstructionPattern(void) : SelectionPattern(1,1) { this->opcodes.push_back(ir::OP_READ_ARF); } INLINE uint32_t getRegNum(ir::ARFRegister arf) const { if (arf == ir::ARF_TM) { return 0xc0; } else { GBE_ASSERT(0); return 0; } } INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; const ir::ReadARFInstruction &insn = cast(dag.insn); GenRegister dst; dst = sel.selReg(insn.getDst(0), insn.getType()); sel.push(); sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; sel.curr.execWidth = 8; sel.READ_ARF(dst, GenRegister(GEN_ARCHITECTURE_REGISTER_FILE, getRegNum(insn.getARFRegister()), 0, getGenType(insn.getType()), GEN_VERTICAL_STRIDE_8, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_1)); sel.pop(); return true; } }; class SimdShuffleInstructionPattern : public SelectionPattern { public: SimdShuffleInstructionPattern(void) : SelectionPattern(1,1) { this->opcodes.push_back(ir::OP_SIMD_SHUFFLE); } INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; const ir::SimdShuffleInstruction &insn = cast(dag.insn); assert(insn.getOpcode() == OP_SIMD_SHUFFLE); const Type type = insn.getType(); GenRegister dst = sel.selReg(insn.getDst(0), type); GenRegister src0 = sel.selReg(insn.getSrc(0), type); GenRegister src1; SelectionDAG *dag0 = dag.child[0]; SelectionDAG *dag1 = dag.child[1]; if (dag1 != NULL && dag1->insn.getOpcode() == OP_LOADI && canGetRegisterFromImmediate(dag1->insn)) { const auto &childInsn = cast(dag1->insn); src1 = 
getRegisterFromImmediate(childInsn.getImmediate(), TYPE_U32); if (dag0) dag0->isRoot = 1; } else { markAllChildren(dag); src1 = sel.selReg(insn.getSrc(1), TYPE_U32); } sel.push(); if (sel.isScalarReg(insn.getSrc(0))) { if (sel.isScalarReg(insn.getDst(0))) { sel.curr.execWidth = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; } sel.MOV(dst, src0); //no matter what src1 is } else { if (src1.file == GEN_IMMEDIATE_VALUE) { uint32_t offset = src1.value.ud % sel.curr.execWidth; GenRegister reg = GenRegister::subphysicaloffset(src0, offset); reg.vstride = GEN_VERTICAL_STRIDE_0; reg.hstride = GEN_HORIZONTAL_STRIDE_0; reg.width = GEN_WIDTH_1; sel.MOV(dst, reg); } else { GenRegister shiftL = sel.selReg(sel.reg(FAMILY_DWORD), TYPE_U32); uint32_t SHLimm = typeSize(getGenType(type)) == 2 ? 1 : (typeSize(getGenType(type)) == 4 ? 2 : 3); sel.SHL(shiftL, src1, GenRegister::immud(SHLimm)); sel.SIMD_SHUFFLE(dst, src0, shiftL); } } sel.pop(); return true; } }; /*! Get a region of a register */ class RegionInstructionPattern : public SelectionPattern { public: RegionInstructionPattern(void) : SelectionPattern(1,1) { this->opcodes.push_back(ir::OP_REGION); } INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; const ir::RegionInstruction &insn = cast(dag.insn); GenRegister dst, src; dst = sel.selReg(insn.getDst(0), ir::TYPE_U32); src = GenRegister::ud1grf(insn.getSrc(0)); src = sel.getOffsetReg(src, 0, insn.getOffset()*4); sel.push(); sel.curr.noMask = 1; sel.curr.predicate = GEN_PREDICATE_NONE; sel.MOV(dst, src); sel.pop(); markAllChildren(dag); return true; } }; /*! Get a region of a register */ class IndirectMovInstructionPattern : public SelectionPattern { public: IndirectMovInstructionPattern(void) : SelectionPattern(1,1) { this->opcodes.push_back(ir::OP_INDIRECT_MOV); } INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; const ir::IndirectMovInstruction &insn = cast(dag.insn); GenRegister dst, src0, src1; uint32_t offset = insn.getOffset(); dst = sel.selReg(insn.getDst(0), insn.getType()); src0 = sel.selReg(insn.getSrc(0), TYPE_U32); src1 = sel.selReg(insn.getSrc(1), TYPE_U32); GenRegister tmp = sel.selReg(sel.reg(FAMILY_WORD), TYPE_U16); sel.INDIRECT_MOVE(dst, tmp, src0, src1, offset); markAllChildren(dag); return true; } }; class CalcTimestampInstructionPattern : public SelectionPattern { public: CalcTimestampInstructionPattern(void) : SelectionPattern(1,1) { this->opcodes.push_back(ir::OP_CALC_TIMESTAMP); } INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; const ir::CalcTimestampInstruction &insn = cast(dag.insn); uint32_t pointNum = insn.getPointNum(); uint32_t tsType = insn.getTimestamptType(); GBE_ASSERT(sel.ctx.getSimdWidth() == 16 || sel.ctx.getSimdWidth() == 8); GenRegister tmp; GenRegister ts[5]; int tsNum; if (sel.ctx.getSimdWidth() == 16) { if (!sel.hasLongType()) tmp = GenRegister::retype(sel.selReg(sel.reg(FAMILY_WORD)), GEN_TYPE_UD); ts[0] = GenRegister::retype(sel.selReg(ir::ocl::profilingts0, ir::TYPE_U32), GEN_TYPE_UD); ts[1] = GenRegister::retype(sel.selReg(ir::ocl::profilingts1, ir::TYPE_U32), GEN_TYPE_UD); ts[2] = GenRegister::retype(sel.selReg(ir::ocl::profilingts2, ir::TYPE_U32), GEN_TYPE_UW); tsNum = 3; } else { if (!sel.hasLongType()) tmp = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD)), GEN_TYPE_UD); ts[0] = GenRegister::retype(sel.selReg(ir::ocl::profilingts0, ir::TYPE_U32), GEN_TYPE_UD); ts[1] = 
GenRegister::retype(sel.selReg(ir::ocl::profilingts1, ir::TYPE_U32), GEN_TYPE_UD); ts[2] = GenRegister::retype(sel.selReg(ir::ocl::profilingts2, ir::TYPE_U32), GEN_TYPE_UD); ts[3] = GenRegister::retype(sel.selReg(ir::ocl::profilingts3, ir::TYPE_U32), GEN_TYPE_UD); ts[4] = GenRegister::retype(sel.selReg(ir::ocl::profilingts4, ir::TYPE_U32), GEN_TYPE_UD); tsNum = 5; } sel.push(); { sel.curr.flag = 0; sel.curr.subFlag = 1; sel.CALC_TIMESTAMP(ts, tsNum, tmp, pointNum, tsType); } sel.pop(); markAllChildren(dag); return true; } }; class StoreProfilingInstructionPattern : public SelectionPattern { public: StoreProfilingInstructionPattern(void) : SelectionPattern(1,1) { this->opcodes.push_back(ir::OP_STORE_PROFILING); } INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; const ir::StoreProfilingInstruction &insn = cast(dag.insn); uint32_t profilingType = insn.getProfilingType(); uint32_t BTI = insn.getBTI(); GBE_ASSERT(sel.ctx.getSimdWidth() == 16 || sel.ctx.getSimdWidth() == 8); GenRegister tmp0; GenRegister tmp1; GenRegister ts[5]; int tsNum; if (sel.ctx.getSimdWidth() == 16) { tmp0 = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD)), GEN_TYPE_UD); ts[0] = GenRegister::retype(sel.selReg(ir::ocl::profilingts0, ir::TYPE_U32), GEN_TYPE_UD); ts[1] = GenRegister::retype(sel.selReg(ir::ocl::profilingts1, ir::TYPE_U32), GEN_TYPE_UD); ts[2] = GenRegister::retype(sel.selReg(ir::ocl::profilingts2, ir::TYPE_U32), GEN_TYPE_UW); tsNum = 3; } else { tmp0 = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD)), GEN_TYPE_UD); tmp1 = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD)), GEN_TYPE_UD); ts[0] = GenRegister::retype(sel.selReg(ir::ocl::profilingts0, ir::TYPE_U32), GEN_TYPE_UD); ts[1] = GenRegister::retype(sel.selReg(ir::ocl::profilingts1, ir::TYPE_U32), GEN_TYPE_UD); ts[2] = GenRegister::retype(sel.selReg(ir::ocl::profilingts2, ir::TYPE_U32), GEN_TYPE_UD); ts[3] = GenRegister::retype(sel.selReg(ir::ocl::profilingts3, ir::TYPE_U32), GEN_TYPE_UD); ts[4] = GenRegister::retype(sel.selReg(ir::ocl::profilingts4, ir::TYPE_U32), GEN_TYPE_UD); tsNum = 5; } sel.push(); { sel.curr.flag = 0; sel.curr.subFlag = 1; sel.STORE_PROFILING(profilingType, BTI, tmp0, tmp1, ts, tsNum); } sel.pop(); markAllChildren(dag); return true; } }; class PrintfInstructionPattern : public SelectionPattern { public: PrintfInstructionPattern(void) : SelectionPattern(1,1) { this->opcodes.push_back(ir::OP_PRINTF); } INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const { using namespace ir; const ir::PrintfInstruction &insn = cast(dag.insn); uint16_t num = insn.getNum(); uint8_t BTI = insn.getBti(); GenRegister tmp0, tmp1; uint32_t srcNum = insn.getSrcNum(); uint32_t i = 0; uint32_t totalSize = 0; bool isContinue = false; GBE_ASSERT(sel.ctx.getSimdWidth() == 16 || sel.ctx.getSimdWidth() == 8); tmp0 = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD)), GEN_TYPE_UD); tmp1 = GenRegister::retype(sel.selReg(sel.reg(FAMILY_DWORD)), GEN_TYPE_UD); /* Get the total size for one printf statement. */ for (i = 0; i < srcNum; i++) { Type type = insn.getType(i); if (type == TYPE_DOUBLE || type == TYPE_S64 || type == TYPE_U64) { totalSize += 8; } else { totalSize += 4; // Make sure always align to 4. } } i = 0; GenRegister regs[8]; if (srcNum == 0) { sel.PRINTF(BTI, tmp0, tmp1, regs, srcNum, num, isContinue, totalSize); } else { do { uint32_t s = srcNum < 8 ? 
srcNum : 8;
          for (uint32_t j = 0; j < s; j++) {
            regs[j] = sel.selReg(insn.getSrc(i + j), insn.getType(i + j));
          }
          sel.PRINTF(BTI, tmp0, tmp1, regs, s, num, isContinue, totalSize);
          if (srcNum > 8) {
            srcNum -= 8;
            i += 8;
          } else {
            srcNum = 0;
          }
          isContinue = true;
        } while(srcNum);
      }
      markAllChildren(dag);
      return true;
    }
  };

  /*! Branch instruction pattern */
  class BranchInstructionPattern : public SelectionPattern
  {
  public:
    BranchInstructionPattern(void) : SelectionPattern(1,1) {
      for (uint32_t op = 0; op < ir::OP_INVALID; ++op)
        if (ir::isOpcodeFrom<ir::BranchInstruction>(ir::Opcode(op)) == true)
          this->opcodes.push_back(ir::Opcode(op));
    }

    void emitForwardBranch(Selection::Opaque &sel, const ir::BranchInstruction &insn, ir::LabelIndex dst, ir::LabelIndex src) const
    {
      using namespace ir;
      const GenRegister ip = sel.getBlockIP();

      // We will not emit any jump if we must go to the next block anyway
      const BasicBlock *curr = insn.getParent();
      const BasicBlock *next = curr->getNextBlock();
      const LabelIndex nextLabel = next->getLabelIndex();
      if (insn.isPredicated() == true) {
        const Register pred = insn.getPredicateIndex();
        sel.push();
          // We don't need to set the next label to the pcip: if there is no backward
          // jump later, everything will obviously work fine; if there is a backward
          // jump later, all the pcips will be updated correctly there.
          sel.curr.physicalFlag = 0;
          sel.curr.flagIndex = pred.value();
          sel.curr.predicate = GEN_PREDICATE_NORMAL;
          sel.setBlockIP(ip, dst.value());
          sel.curr.predicate = GEN_PREDICATE_NONE;
          if (!sel.block->hasBarrier && !sel.block->removeSimpleIfEndif)
            sel.ENDIF(GenRegister::immd(0), nextLabel);
          sel.block->endifOffset = -1;
        sel.pop();
      } else {
        // Update the PcIPs
        const LabelIndex jip = sel.ctx.getLabelIndex(&insn);
        sel.curr.flag = 0;
        sel.curr.subFlag = 1;
        if(insn.getParent()->needEndif)
          sel.setBlockIP(ip, dst.value());

        if (!sel.block->hasBarrier && !sel.block->removeSimpleIfEndif) {
          if(insn.getParent()->needEndif && !insn.getParent()->needIf)
            sel.ENDIF(GenRegister::immd(0), insn.getParent()->endifLabel, insn.getParent()->endifLabel);
          else if(insn.getParent()->needEndif)
            sel.ENDIF(GenRegister::immd(0), nextLabel);
        }
        sel.block->endifOffset = -1;
        if (nextLabel == jip) return;
        // Branch to the jump target
        sel.push();
          sel.curr.execWidth = 1;
          sel.curr.noMask = 1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          // Actually, the origin of this JMPI should be the beginning of the next BB.
          sel.block->endifOffset -= sel.JMPI(GenRegister::immd(0), jip, ir::LabelIndex(curr->getLabelIndex().value() + 1));
        sel.pop();
      }
    }

    void emitBackwardBranch(Selection::Opaque &sel, const ir::BranchInstruction &insn, ir::LabelIndex dst, ir::LabelIndex src) const
    {
      using namespace ir;
      //const GenRegister ip = sel.selReg(ocl::blockip, TYPE_U16);
      const GenRegister ip = sel.getBlockIP();
      const Function &fn = sel.ctx.getFunction();
      const BasicBlock &bb = fn.getBlock(src);
      const LabelIndex jip = sel.ctx.getLabelIndex(&insn);
      const LabelIndex label = bb.getLabelIndex();
      const uint32_t simdWidth = sel.ctx.getSimdWidth();
      GBE_ASSERT(bb.getNextBlock() != NULL);

      if (insn.isPredicated() == true) {
        const Register pred = insn.getPredicateIndex();
        // Update the PcIPs for all the branches. Just put the IPs of the next
        // block. The next instruction will properly update the IPs of the lanes
        // that actually take the branch
        const LabelIndex next = bb.getNextBlock()->getLabelIndex();
        sel.setBlockIP(ip, next.value());
        GBE_ASSERT(jip == dst);
        sel.push();
          sel.curr.physicalFlag = 0;
          sel.curr.flagIndex = pred.value();
          sel.curr.predicate = GEN_PREDICATE_NORMAL;
          sel.setBlockIP(ip, dst.value());
          sel.block->endifOffset = -1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          if (!sel.block->hasBarrier && !sel.block->removeSimpleIfEndif)
            sel.ENDIF(GenRegister::immd(0), next);
          sel.curr.execWidth = 1;
          if (simdWidth == 16)
            sel.curr.predicate = GEN_PREDICATE_ALIGN1_ANY16H;
          else
            sel.curr.predicate = GEN_PREDICATE_ALIGN1_ANY8H;
          sel.curr.noMask = 1;
          sel.block->endifOffset -= sel.JMPI(GenRegister::immd(0), jip, label);
        sel.pop();
      } else {
        const LabelIndex next = bb.getNextBlock()->getLabelIndex();
        // Update the PcIPs
        sel.curr.flag = 0;
        sel.curr.subFlag = 1;
        if(insn.getParent()->needEndif)
          sel.setBlockIP(ip, dst.value());
        sel.block->endifOffset = -1;
        if (!sel.block->hasBarrier && !sel.block->removeSimpleIfEndif) {
          if(insn.getParent()->needEndif && !insn.getParent()->needIf)
            sel.ENDIF(GenRegister::immd(0), insn.getParent()->endifLabel, insn.getParent()->endifLabel);
          else if(insn.getParent()->needEndif)
            sel.ENDIF(GenRegister::immd(0), next);
        }
        // Branch to the jump target
        sel.push();
          sel.curr.execWidth = 1;
          sel.curr.noMask = 1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.block->endifOffset -= sel.JMPI(GenRegister::immd(0), jip, label);
        sel.pop();
      }
    }

    INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const
    {
      using namespace ir;
      const ir::BranchInstruction &insn = cast<ir::BranchInstruction>(dag.insn);
      const Opcode opcode = insn.getOpcode();
      if (opcode == OP_RET)
        sel.EOT();
      else if (opcode == OP_BRA) {
        const LabelIndex dst = insn.getLabelIndex();
        const LabelIndex src = insn.getParent()->getLabelIndex();
        sel.push();
        if (insn.isPredicated() == true) {
          if (dag.child[0] == NULL)
            sel.curr.externFlag = 1;
        }
        // We handle forward and backward branches differently
        if (uint32_t(dst) <= uint32_t(src))
          this->emitBackwardBranch(sel, insn, dst, src);
        else
          this->emitForwardBranch(sel, insn, dst, src);
        sel.pop();
      } else if(opcode == OP_IF) {
        const Register pred = insn.getPredicateIndex();
        const LabelIndex jip = insn.getLabelIndex();
        LabelIndex uip;
        if(insn.getParent()->matchingEndifLabel != 0)
          uip = insn.getParent()->matchingEndifLabel;
        else
          uip = jip;
        sel.push();
          sel.curr.physicalFlag = 0;
          sel.curr.flagIndex = (uint64_t)pred;
          sel.curr.externFlag = 1;
          sel.curr.inversePredicate = insn.getInversePredicated();
          sel.curr.predicate = GEN_PREDICATE_NORMAL;
          sel.IF(GenRegister::immd(0), jip, uip);
          sel.curr.inversePredicate = 0;
        sel.pop();
      } else if(opcode == OP_ENDIF) {
        const LabelIndex label = insn.getLabelIndex();
        sel.push();
          sel.curr.noMask = 1;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.ENDIF(GenRegister::immd(0), label, label);
        sel.pop();
      } else if(opcode == OP_ELSE) {
        const LabelIndex label = insn.getLabelIndex();
        sel.ELSE(GenRegister::immd(0), label, insn.getParent()->thisElseLabel);
      } else if(opcode == OP_WHILE) {
        const Register pred = insn.getPredicateIndex();
        const LabelIndex jip = insn.getLabelIndex();
        sel.push();
          sel.curr.physicalFlag = 0;
          sel.curr.flagIndex = (uint64_t)pred;
          sel.curr.externFlag = 1;
          sel.curr.inversePredicate = insn.getInversePredicated();
          sel.curr.predicate = GEN_PREDICATE_NORMAL;
          sel.WHILE(GenRegister::immd(0), jip);
          sel.curr.inversePredicate = 0;
        sel.pop();
      } else
        NOT_IMPLEMENTED;
      markAllChildren(dag);
      return true;
    }
  };

  /*!
WorkGroup instruction pattern */
  DECL_PATTERN(WorkGroupInstruction)
  {
    /* WORKGROUP OP: ALL, ANY, REDUCE, SCAN INCLUSIVE, SCAN EXCLUSIVE
     * Shared-local-memory based communication between threads;
     * this prepares the operands for the workgroup op in the gen context.
     * The algorithm logic itself lives in the gen context. */
    INLINE bool emitWGReduce(Selection::Opaque &sel, const ir::WorkGroupInstruction &insn) const
    {
      using namespace ir;

      GBE_ASSERT(insn.getSrcNum() == 3);
      GBE_ASSERT(insn.getSrc(0) == ocl::threadn);
      GBE_ASSERT(insn.getSrc(1) == ocl::threadid);

      const WorkGroupOps workGroupOp = insn.getWorkGroupOpcode();
      const Type type = insn.getType();
      GenRegister dst = sel.selReg(insn.getDst(0), type);
      GenRegister src = sel.selReg(insn.getSrc(2), type);
      GenRegister tmpData1 = GenRegister::retype(sel.selReg(sel.reg(FAMILY_QWORD)), type);
      GenRegister tmpData2 = GenRegister::retype(sel.selReg(sel.reg(FAMILY_QWORD)), type);
      GenRegister slmOff = sel.selReg(sel.reg(FAMILY_QWORD), TYPE_U32);
      GenRegister localThreadID = sel.selReg(ocl::threadid, TYPE_U32);
      GenRegister localThreadNUM = sel.selReg(ocl::threadn, TYPE_U32);
      GenRegister localBarrier = GenRegister::ud8grf(sel.reg(FAMILY_DWORD));

      /* Allocate registers for message sending
       * (read/write to shared local memory);
       * only one data item (ud/ul) is needed for thread communication,
       * and we always use SIMD8 to do the read/write */
      vector<GenRegister> msg;
      msg.push_back(GenRegister::ud8grf(sel.reg(ir::FAMILY_REG))); //address
      msg.push_back(GenRegister::ud8grf(sel.reg(ir::FAMILY_REG))); //data
      if(dst.type == GEN_TYPE_UL || dst.type == GEN_TYPE_L)
        msg.push_back(GenRegister::ud8grf(sel.reg(ir::FAMILY_REG))); //data

      /* Insert a barrier to make sure all the vars we are interested in
         have been assigned their final value. */
      sel.BARRIER(GenRegister::ud8grf(sel.reg(FAMILY_DWORD)), sel.selReg(sel.reg(FAMILY_DWORD)), syncLocalBarrier);

      /* Pass the shared local memory offset */
      sel.MOV(slmOff, GenRegister::immud(insn.getSlmAddr()));

      /* Perform workgroup op */
      sel.WORKGROUP_OP(workGroupOp, dst, src, tmpData1, localThreadID, localThreadNUM, tmpData2, slmOff, msg, localBarrier);

      return true;
    }

    /* WORKGROUP OP: BROADCAST
     * 1. BARRIER: ensure all the threads have set the correct value for the var to broadcast.
     * 2. CMP IDs: compare the local IDs with the ones specified in the function call.
     * 3. STORE:   use the flag to control the store; only the specified item executes it.
     * 4. BARRIER: ensure the specified value has been stored.
     * 5. LOAD:    load the stored value into the dst of every item, so all items
     *             end up with the same value: broadcasted. */
    INLINE bool emitWGBroadcast(Selection::Opaque &sel, const ir::WorkGroupInstruction &insn) const
    {
      using namespace ir;

      const uint32_t srcNum = insn.getSrcNum();
      GBE_ASSERT(srcNum >= 2);

      const Type type = insn.getType();
      const GenRegister src = sel.selReg(insn.getSrc(0), type);
      const GenRegister dst = sel.selReg(insn.getDst(0), type);
      const uint32_t slmAddr = insn.getSlmAddr();
      GenRegister addr = sel.selReg(sel.reg(FAMILY_DWORD), TYPE_U32);
      vector<GenRegister> fakeTemps;
      fakeTemps.push_back(sel.selReg(sel.reg(FAMILY_DWORD), type));
      fakeTemps.push_back(sel.selReg(sel.reg(FAMILY_DWORD), type));

      GenRegister coords[3];
      for (uint32_t i = 1; i < srcNum; i++)
        coords[i - 1] = GenRegister::toUniform(sel.selReg(insn.getSrc(i), TYPE_U32), GEN_TYPE_UD);

      sel.push(); {
        sel.curr.predicate = GEN_PREDICATE_NONE;
        sel.curr.noMask = 1;
        sel.MOV(addr, GenRegister::immud(slmAddr));
      } sel.pop();

      /* Insert a barrier to make sure all the vars we are interested in
         have been assigned their final value. */
      sel.BARRIER(GenRegister::ud8grf(sel.reg(FAMILY_DWORD)), sel.selReg(sel.reg(FAMILY_DWORD)), syncLocalBarrier);

      sel.push(); {
        sel.curr.flag = 0;
        sel.curr.subFlag = 1;
        sel.curr.predicate = GEN_PREDICATE_NONE;
        sel.curr.noMask = 1;

        GenRegister lid0, lid1, lid2;
        uint32_t dim = srcNum - 1;
        lid0 = GenRegister::retype(sel.selReg(ocl::lid0, TYPE_U32), GEN_TYPE_UD);
        lid1 = GenRegister::retype(sel.selReg(ocl::lid1, TYPE_U32), GEN_TYPE_UD);
        lid2 = GenRegister::retype(sel.selReg(ocl::lid2, TYPE_U32), GEN_TYPE_UD);

        sel.CMP(GEN_CONDITIONAL_EQ, coords[0], lid0, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD));
        sel.curr.predicate = GEN_PREDICATE_NORMAL;
        if (dim >= 2)
          sel.CMP(GEN_CONDITIONAL_EQ, coords[1], lid1, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD));
        if (dim >= 3)
          sel.CMP(GEN_CONDITIONAL_EQ, coords[2], lid2, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD));

        /* write to shared local memory for BYTE/WORD/DWORD types */
        if (typeSize(src.type) <= 4) {
          GenRegister _addr = GenRegister::retype(addr, GEN_TYPE_UD);
          GenRegister _src = GenRegister::retype(src, GEN_TYPE_UD);
          sel.UNTYPED_WRITE(_addr, &_src, 1, GenRegister::immw(0xfe), fakeTemps);
        }
        /* write to shared local memory for QWORD types */
        else if (typeSize(src.type) == 8) {
          sel.push(); {
            /* arrange data in QWORD */
            GenRegister _addr = GenRegister::retype(addr, GEN_TYPE_UD);
            GenRegister srcQW = sel.selReg(sel.reg(FAMILY_QWORD), ir::TYPE_U64);
            GenRegister srcQW_p1 = src.retype(srcQW, GEN_TYPE_UD);
            GenRegister srcQW_p2 = src.retype(src.offset(srcQW, 2, 0), GEN_TYPE_UD);
            vector<GenRegister> srcVec;
            srcVec.push_back(srcQW_p1);
            srcVec.push_back(srcQW_p2);
            /* unpack into 2 DWORDs */
            sel.UNPACK_LONG(srcQW, src);
            /* emit write through SEND */
            sel.UNTYPED_WRITE(_addr, srcVec.data(), 2, GenRegister::immw(0xfe), fakeTemps);
          } sel.pop();
        }
        else
          GBE_ASSERT(0);
      } sel.pop();

      /* make sure the slm vars hold valid values now */
      sel.BARRIER(GenRegister::ud8grf(sel.reg(FAMILY_DWORD)), sel.selReg(sel.reg(FAMILY_DWORD)), syncLocalBarrier);

      /* read from shared local memory for BYTE/WORD/DWORD types */
      if (typeSize(src.type) <= 4) {
        GenRegister _addr = GenRegister::retype(addr, GEN_TYPE_UD);
        GenRegister _dst = GenRegister::retype(dst, GEN_TYPE_UD);
        sel.UNTYPED_READ(_addr, &_dst, 1, GenRegister::immw(0xfe), fakeTemps);
      }
      /* read from shared local memory for QWORD types */
      else if (typeSize(src.type) == 8) {
        GenRegister _addr = GenRegister::retype(addr, GEN_TYPE_UD);
        vector<GenRegister> _dst;
        _dst.push_back(sel.selReg(sel.reg(FAMILY_WORD), ir::TYPE_U32));
        _dst.push_back(sel.selReg(sel.reg(FAMILY_WORD), ir::TYPE_U32));
        GenRegister _dstQ = dst.toUniform(_dst[0], GEN_TYPE_UL);

        sel.push(); {
          /* emit read through SEND */
          sel.curr.execWidth = 8;
          sel.UNTYPED_READ(_addr, _dst.data(), 2, GenRegister::immw(0xfe), fakeTemps);

          /* reconstruct the QWORD type */
          _dst[0] = dst.toUniform(dst.offset(_dst[0], 0, 4), GEN_TYPE_UD);
          _dst[1] = dst.toUniform(_dst[1], GEN_TYPE_UD);
          sel.curr.execWidth = 1;
          sel.MOV(_dst[0], _dst[1]);
        } sel.pop();

        /* set all elements assigned to the thread */
        sel.MOV(dst, _dstQ);
      }
      else
        GBE_ASSERT(0);

      return true;
    }

    INLINE bool emitOne(Selection::Opaque &sel, const ir::WorkGroupInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      const WorkGroupOps workGroupOp = insn.getWorkGroupOpcode();

      if (workGroupOp == WORKGROUP_OP_BROADCAST){
        return emitWGBroadcast(sel, insn);
      }
      else if (workGroupOp >= WORKGROUP_OP_ANY && workGroupOp <= WORKGROUP_OP_EXCLUSIVE_MAX){
        return emitWGReduce(sel, insn);
      }
      else
        GBE_ASSERT(0);

      return true;
    }
    DECL_CTOR(WorkGroupInstruction, 1, 1);
  };

  /*!
SubGroup instruction pattern */
  class SubGroupInstructionPattern : public SelectionPattern
  {
  public:
    SubGroupInstructionPattern(void) : SelectionPattern(1,1) {
      for (uint32_t op = 0; op < ir::OP_INVALID; ++op)
        if (ir::isOpcodeFrom<ir::SubGroupInstruction>(ir::Opcode(op)) == true)
          this->opcodes.push_back(ir::Opcode(op));
    }

    /* SUBGROUP OP: ALL, ANY, REDUCE, SCAN INCLUSIVE, SCAN EXCLUSIVE
     * Shares the algorithm with the workgroup in-thread path */
    INLINE bool emitSGReduce(Selection::Opaque &sel, const ir::SubGroupInstruction &insn) const
    {
      using namespace ir;

      GBE_ASSERT(insn.getSrcNum() == 1);

      const WorkGroupOps workGroupOp = insn.getWorkGroupOpcode();
      const Type type = insn.getType();
      GenRegister dst = sel.selReg(insn.getDst(0), type);
      GenRegister src = sel.selReg(insn.getSrc(0), type);
      GenRegister tmpData1 = GenRegister::retype(sel.selReg(sel.reg(FAMILY_QWORD)), type);
      GenRegister tmpData2 = GenRegister::retype(sel.selReg(sel.reg(FAMILY_QWORD)), type);

      /* Perform workgroup op */
      sel.SUBGROUP_OP(workGroupOp, dst, src, tmpData1, tmpData2);

      return true;
    }

    /* SUBGROUP OP: BROADCAST
     * Shares the algorithm with simd shuffle */
    INLINE bool emitSGBroadcast(Selection::Opaque &sel, const ir::SubGroupInstruction &insn, SelectionDAG &dag) const
    {
      using namespace ir;

      GBE_ASSERT(insn.getSrcNum() == 2);

      const Type type = insn.getType();
      const GenRegister src0 = sel.selReg(insn.getSrc(0), type);
      const GenRegister dst = sel.selReg(insn.getDst(0), type);

      GenRegister src1;
      SelectionDAG *dag0 = dag.child[0];
      SelectionDAG *dag1 = dag.child[1];
      if (dag1 != NULL && dag1->insn.getOpcode() == OP_LOADI && canGetRegisterFromImmediate(dag1->insn)) {
        const auto &childInsn = cast<ir::LoadImmInstruction>(dag1->insn);
        src1 = getRegisterFromImmediate(childInsn.getImmediate(), TYPE_U32);
        if (dag0) dag0->isRoot = 1;
      } else {
        markAllChildren(dag);
        src1 = sel.selReg(insn.getSrc(1), TYPE_U32);
      }

      sel.push(); {
        if (src1.file == GEN_IMMEDIATE_VALUE) {
          uint32_t offset = src1.value.ud % sel.curr.execWidth;
          GenRegister reg = GenRegister::subphysicaloffset(src0, offset);
          reg.vstride = GEN_VERTICAL_STRIDE_0;
          reg.hstride = GEN_HORIZONTAL_STRIDE_0;
          reg.width = GEN_WIDTH_1;
          sel.MOV(dst, reg);
        } else {
          GenRegister shiftL = sel.selReg(sel.reg(FAMILY_DWORD), TYPE_U32);
          uint32_t SHLimm = typeSize(getGenType(type)) == 2 ? 1 : (typeSize(getGenType(type)) == 4 ? 2 : 3);
          sel.SHL(shiftL, src1, GenRegister::immud(SHLimm));
          sel.SIMD_SHUFFLE(dst, src0, shiftL);
        }
      } sel.pop();

      return true;
    }

    INLINE bool emit(Selection::Opaque &sel, SelectionDAG &dag) const
    {
      using namespace ir;
      const ir::SubGroupInstruction &insn = cast<ir::SubGroupInstruction>(dag.insn);
      const WorkGroupOps workGroupOp = insn.getWorkGroupOpcode();

      if (workGroupOp == WORKGROUP_OP_BROADCAST){
        return emitSGBroadcast(sel, insn, dag);
      }
      else if (workGroupOp >= WORKGROUP_OP_ANY && workGroupOp <= WORKGROUP_OP_EXCLUSIVE_MAX){
        if(emitSGReduce(sel, insn))
          markAllChildren(dag);
        else
          return false;
      }
      else
        GBE_ASSERT(0);

      return true;
    }
  };

  /*! Media Block Read pattern */
  DECL_PATTERN(MediaBlockReadInstruction)
  {
    bool emitOne(Selection::Opaque &sel, const ir::MediaBlockReadInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      uint32_t vec_size = insn.getVectorSize();
      uint32_t simdWidth = sel.curr.execWidth;
      const Type type = insn.getType();
      const uint32_t typeSize = type == TYPE_U32 ? 4 : 2;
      uint32_t response_size = simdWidth * vec_size * typeSize / 32;
      // A ushort in simd8 only fills half a reg (0.5 reg size), but the response length is still 1
      response_size = response_size ? response_size : 1;
      uint32_t block_width = typeSize * simdWidth;
      uint32_t blocksize = (block_width - 1) % 32 | (vec_size - 1) << 16;

      vector<GenRegister> valuesVec;
      vector<GenRegister> tmpVec;
      for (uint32_t i = 0; i < vec_size; ++i) {
        valuesVec.push_back(sel.selReg(insn.getDst(i), type));
        if(simdWidth == 16 && typeSize == 4)
          tmpVec.push_back(GenRegister::ud8grf(sel.reg(FAMILY_REG)));
      }
      const GenRegister coordx = GenRegister::toUniform(sel.selReg(insn.getSrc(0), TYPE_U32), GEN_TYPE_UD);
      const GenRegister coordy = GenRegister::toUniform(sel.selReg(insn.getSrc(1), TYPE_U32), GEN_TYPE_UD);
      const GenRegister header = GenRegister::ud8grf(sel.reg(FAMILY_REG));
      const GenRegister offsetx = GenRegister::toUniform(sel.getOffsetReg(header, 0, 0 * 4), GEN_TYPE_UD);
      const GenRegister offsety = GenRegister::toUniform(sel.getOffsetReg(header, 0, 1 * 4), GEN_TYPE_UD);
      const GenRegister blocksizereg = sel.getOffsetReg(header, 0, 2 * 4);

      // Make header
      sel.push();
        // Copy r0 into the header first
        sel.curr.execWidth = 8;
        sel.curr.predicate = GEN_PREDICATE_NONE;
        sel.curr.noMask = 1;
        sel.MOV(header, GenRegister::ud8grf(0, 0));
        // Update the header with the coord
        sel.curr.execWidth = 1;
        sel.MOV(offsetx, coordx);
        sel.MOV(offsety, coordy);
        // Update block width and height
        sel.MOV(blocksizereg, GenRegister::immud(blocksize));
      sel.pop();

      if (simdWidth * typeSize < 64) {
        sel.push();
          sel.curr.execWidth = 8;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.curr.noMask = 1;
          // Now read the data
          sel.MBREAD(&valuesVec[0], vec_size, header, insn.getImageIndex(), response_size);
        sel.pop();
      } else if (simdWidth * typeSize == 64) {
        sel.push();
          sel.curr.execWidth = 8;
          sel.curr.predicate = GEN_PREDICATE_NONE;
          sel.curr.noMask = 1;
          sel.MBREAD(&tmpVec[0], vec_size, header, insn.getImageIndex(), vec_size);
          for (uint32_t i = 0; i < vec_size; i++)
            sel.MOV(valuesVec[i], tmpVec[i]);

          // Second half
          // Update the header with the coord
          sel.curr.execWidth = 1;
          sel.ADD(offsetx, offsetx, GenRegister::immud(32));

          // Now read the data
          sel.curr.execWidth = 8;
          sel.MBREAD(&tmpVec[0], vec_size, header, insn.getImageIndex(), vec_size);

          // Move the regs to fit the vector rule.
          for (uint32_t i = 0; i < vec_size; i++)
            sel.MOV(sel.getOffsetReg(valuesVec[i], 0, 32), tmpVec[i]);
        sel.pop();
      } else NOT_IMPLEMENTED;

      return true;
    }
    DECL_CTOR(MediaBlockReadInstruction, 1, 1);
  };

  /*! Media Block Write pattern */
  DECL_PATTERN(MediaBlockWriteInstruction)
  {
    bool emitOne(Selection::Opaque &sel, const ir::MediaBlockWriteInstruction &insn, bool &markChildren) const
    {
      using namespace ir;
      uint32_t vec_size = insn.getVectorSize();
      const Type type = insn.getType();
      uint32_t simdWidth = sel.curr.execWidth;
      const uint32_t genType = type == TYPE_U32 ? GEN_TYPE_UD : GEN_TYPE_UW;
      const RegisterFamily family = getFamily(type);
      const uint32_t typeSize = type == TYPE_U32 ? 4 : 2;
      // A ushort in simd8 only fills half a reg, but the data length is still 1
      uint32_t data_size = simdWidth * vec_size * typeSize / 32;
      data_size = data_size ?
data_size : 1; uint32_t block_width = typeSize * simdWidth; uint32_t blocksize = (block_width - 1) % 32 | (vec_size - 1) << 16; vector valuesVec; vector tmpVec; for (uint32_t i = 0; i < vec_size; ++i) { valuesVec.push_back(sel.selReg(insn.getSrc(2 + i), type)); if(simdWidth == 16 && typeSize == 4) tmpVec.push_back(GenRegister::ud8grf(sel.reg(FAMILY_REG))); else tmpVec.push_back(GenRegister::retype(GenRegister::f8grf(sel.reg(family)), genType)); } const GenRegister coordx = GenRegister::toUniform(sel.selReg(insn.getSrc(0), TYPE_U32), GEN_TYPE_UD); const GenRegister coordy = GenRegister::toUniform(sel.selReg(insn.getSrc(1), TYPE_U32), GEN_TYPE_UD); const GenRegister header = GenRegister::ud8grf(sel.reg(FAMILY_REG)); const GenRegister offsetx = GenRegister::toUniform(sel.getOffsetReg(header, 0, 0*4), GEN_TYPE_UD); const GenRegister offsety = GenRegister::toUniform(sel.getOffsetReg(header, 0, 1*4), GEN_TYPE_UD); const GenRegister blocksizereg = sel.getOffsetReg(header, 0, 2*4); // Make header sel.push(); // Copy r0 into the header first sel.curr.execWidth = 8; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; sel.MOV(header, GenRegister::ud8grf(0, 0)); // Update the header with the coord sel.curr.execWidth = 1; sel.MOV(offsetx, coordx); sel.MOV(offsety, coordy); // Update block width and height sel.MOV(blocksizereg, GenRegister::immud(blocksize)); sel.pop(); if (simdWidth * typeSize < 64) { for (uint32_t i = 0; i < vec_size; ++i) { sel.MOV(tmpVec[i], valuesVec[i]); } sel.push(); sel.curr.execWidth = 8; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; // Now write the data sel.MBWRITE(header, &tmpVec[0], vec_size, insn.getImageIndex(), data_size); sel.pop(); } else if (simdWidth * typeSize == 64) { sel.push(); sel.curr.execWidth = 8; sel.curr.predicate = GEN_PREDICATE_NONE; sel.curr.noMask = 1; for (uint32_t i = 0; i < vec_size; i++) sel.MOV(tmpVec[i], valuesVec[i]); sel.MBWRITE(header, &tmpVec[0], vec_size, insn.getImageIndex(), vec_size); // Second half // Update the header with the coord sel.curr.execWidth = 1; sel.ADD(offsetx, offsetx, GenRegister::immud(32)); sel.curr.execWidth = 8; for (uint32_t i = 0; i < vec_size; i++) sel.MOV(tmpVec[i], sel.getOffsetReg(valuesVec[i], 0, 32)); // Now write the data sel.MBWRITE(header, &tmpVec[0], vec_size, insn.getImageIndex(), vec_size); // Move the reg to fit vector rule. sel.pop(); } else NOT_IMPLEMENTED; return true; } DECL_CTOR(MediaBlockWriteInstruction, 1, 1); }; /*! 
Sort patterns */
  INLINE bool cmp(const SelectionPattern *p0, const SelectionPattern *p1) {
    if (p0->insnNum != p1->insnNum)
      return p0->insnNum > p1->insnNum;
    return p0->cost < p1->cost;
  }

  SelectionLibrary::SelectionLibrary(void) {
    this->insert(); this->insert(); this->insert(); this->insert();
    this->insert(); this->insert(); this->insert(); this->insert();
    this->insert(); this->insert(); this->insert(); this->insert();
    this->insert(); this->insert(); this->insert(); this->insert();
    this->insert(); this->insert(); this->insert(); this->insert();
    this->insert(); this->insert(); this->insert(); this->insert();
    this->insert(); this->insert(); this->insert(); this->insert();
    this->insert(); this->insert(); this->insert(); this->insert();
    this->insert(); this->insert(); this->insert(); this->insert();
    // Sort all the patterns by the number of instructions they output
    for (uint32_t op = 0; op < ir::OP_INVALID; ++op)
      std::sort(this->patterns[op].begin(), this->patterns[op].end(), cmp);
  }

  SelectionLibrary::~SelectionLibrary(void) {
    for (auto pattern : this->toFree)
      GBE_DELETE(const_cast<SelectionPattern*>(pattern));
  }

  template <typename PatternType>
  void SelectionLibrary::insert(void) {
    const SelectionPattern *pattern = GBE_NEW_NO_ARG(PatternType);
    this->toFree.push_back(pattern);
    for (auto opcode : pattern->opcodes)
      this->patterns[opcode].push_back(pattern);
  }

} /* namespace gbe */

Beignet-1.3.2-Source/backend/src/backend/gen_context.hpp000664 001750 001750 00000032236 13161142102 022250 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file gen_context.hpp
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */
#ifndef __GBE_GEN_CONTEXT_HPP__
#define __GBE_GEN_CONTEXT_HPP__

#include "backend/context.hpp"
#include "backend/gen7_encoder.hpp"
#include "backend/program.h"
#include "backend/gen_register.hpp"
#include "ir/function.hpp"
#include "ir/liveness.hpp"
#include "sys/map.hpp"
#include <string>

namespace gbe
{
  class Kernel;               // We build this structure
  class GenEncoder;           // Helps emitting Gen ISA
  class GenRegAllocator;      // Handle the register allocation
  class Selection;            // Performs instruction selection
  class SelectionInstruction; // Pre-RA Gen instruction
  class SelectionReg;         // Pre-RA Gen register
  class GenRegister;
  class GenKernel;
  typedef enum {
    NO_ERROR,
    REGISTER_ALLOCATION_FAIL,
    REGISTER_SPILL_EXCEED_THRESHOLD,
    REGISTER_SPILL_FAIL,
    REGISTER_SPILL_NO_SPACE,
    OUT_OF_RANGE_IF_ENDIF,
  } CompileErrorCode;

  /*! Context is the helper structure to build the Gen ISA or simulation code
   *  from GenIR
   */
  class GenContext : public Context
  {
  public:
    /*! Create a new context. name is the name of the function we want to
     *  compile
     */
    GenContext(const ir::Unit &unit, const std::string &name, uint32_t deviceID, bool relaxMath = false);
    /*! Release everything needed */
    virtual ~GenContext(void);
    /*! device's max scratch buffer size */
    #define GEN7_SCRATCH_SIZE (12 * KB)
    /*!
Final Gen ISA emission helper functions */ void emitLabelInstruction(const SelectionInstruction &insn); virtual void emitUnaryInstruction(const SelectionInstruction &insn); virtual void emitUnaryWithTempInstruction(const SelectionInstruction &insn); virtual void emitBinaryInstruction(const SelectionInstruction &insn); virtual void emitSimdShuffleInstruction(const SelectionInstruction &insn); virtual void emitBinaryWithTempInstruction(const SelectionInstruction &insn); void emitTernaryInstruction(const SelectionInstruction &insn); virtual void emitI64MULHIInstruction(const SelectionInstruction &insn); virtual void emitI64MADSATInstruction(const SelectionInstruction &insn); virtual void emitI64HADDInstruction(const SelectionInstruction &insn); virtual void emitI64RHADDInstruction(const SelectionInstruction &insn); virtual void emitI64ShiftInstruction(const SelectionInstruction &insn); virtual void emitI64CompareInstruction(const SelectionInstruction &insn); virtual void emitI64SATADDInstruction(const SelectionInstruction &insn); virtual void emitI64SATSUBInstruction(const SelectionInstruction &insn); virtual void emitI64ToFloatInstruction(const SelectionInstruction &insn); virtual void emitFloatToI64Instruction(const SelectionInstruction &insn); void emitCompareInstruction(const SelectionInstruction &insn); void emitJumpInstruction(const SelectionInstruction &insn); void emitIndirectMoveInstruction(const SelectionInstruction &insn); void emitEotInstruction(const SelectionInstruction &insn); void emitNoOpInstruction(const SelectionInstruction &insn); void emitWaitInstruction(const SelectionInstruction &insn); virtual void emitBarrierInstruction(const SelectionInstruction &insn); void emitFenceInstruction(const SelectionInstruction &insn); void emitMathInstruction(const SelectionInstruction &insn); virtual void emitRead64Instruction(const SelectionInstruction &insn); virtual void emitWrite64Instruction(const SelectionInstruction &insn); virtual void emitRead64A64Instruction(const SelectionInstruction &insn); virtual void emitWrite64A64Instruction(const SelectionInstruction &insn); virtual void emitAtomicA64Instruction(const SelectionInstruction &insn); void emitUntypedReadInstruction(const SelectionInstruction &insn); void emitUntypedWriteInstruction(const SelectionInstruction &insn); virtual void emitUntypedReadA64Instruction(const SelectionInstruction &insn); virtual void emitUntypedWriteA64Instruction(const SelectionInstruction &insn); virtual void emitByteGatherA64Instruction(const SelectionInstruction &insn); virtual void emitByteScatterA64Instruction(const SelectionInstruction &insn); void emitAtomicInstruction(const SelectionInstruction &insn); void emitByteGatherInstruction(const SelectionInstruction &insn); void emitByteScatterInstruction(const SelectionInstruction &insn); void emitPackByteInstruction(const SelectionInstruction &insn); void emitUnpackByteInstruction(const SelectionInstruction &insn); virtual void emitPackLongInstruction(const SelectionInstruction &insn); virtual void emitUnpackLongInstruction(const SelectionInstruction &insn); void emitDWordGatherInstruction(const SelectionInstruction &insn); void emitSampleInstruction(const SelectionInstruction &insn); void emitVmeInstruction(const SelectionInstruction &insn); void emitTypedWriteInstruction(const SelectionInstruction &insn); void emitSpillRegInstruction(const SelectionInstruction &insn); void emitUnSpillRegInstruction(const SelectionInstruction &insn); void emitGetImageInfoInstruction(const SelectionInstruction 
&insn);
    virtual void emitI64MULInstruction(const SelectionInstruction &insn);
    virtual void emitI64DIVREMInstruction(const SelectionInstruction &insn);
    virtual void emitF64DIVInstruction(const SelectionInstruction &insn);
    void emitCalcTimestampInstruction(const SelectionInstruction &insn);
    void emitStoreProfilingInstruction(const SelectionInstruction &insn);
    virtual void emitWorkGroupOpInstruction(const SelectionInstruction &insn);
    virtual void emitSubGroupOpInstruction(const SelectionInstruction &insn);
    void emitPrintfInstruction(const SelectionInstruction &insn);
    void scratchWrite(const GenRegister header, uint32_t offset, uint32_t reg_num, uint32_t reg_type, uint32_t channel_mode);
    void scratchRead(const GenRegister dst, const GenRegister header, uint32_t offset, uint32_t reg_num, uint32_t reg_type, uint32_t channel_mode);
    unsigned beforeMessage(const SelectionInstruction &insn, GenRegister bti, GenRegister flagTemp, GenRegister btiTmp, unsigned desc);
    void afterMessage(const SelectionInstruction &insn, GenRegister bti, GenRegister flagTemp, GenRegister btiTmp, unsigned jip0);
    void emitOBReadInstruction(const SelectionInstruction &insn);
    void emitOBWriteInstruction(const SelectionInstruction &insn);
    void emitMBReadInstruction(const SelectionInstruction &insn);
    void emitMBWriteInstruction(const SelectionInstruction &insn);

    /*! Implements base class */
    virtual Kernel *allocateKernel(void);
    /*! Store the position of each label instruction in the Gen ISA stream */
    map<ir::LabelIndex, uint32_t> labelPos;
    typedef struct LabelPair {
      LabelPair(ir::LabelIndex l0, ir::LabelIndex l1) :
                l0(l0), l1(l1){};
      ir::LabelIndex l0;
      ir::LabelIndex l1;
    } LabelPair;
    /*! Store the Gen instructions to patch */
    vector<std::pair<LabelPair, uint32_t>> branchPos3;
    vector<std::pair<ir::LabelIndex, uint32_t>> branchPos2;

    void insertJumpPos(const SelectionInstruction &insn);
    /*! Encode Gen ISA */
    GenEncoder *p;
    /*! Instruction selection on Gen ISA (pre-register allocation) */
    Selection *sel;
    /*! Perform the register allocation */
    GenRegAllocator *ra;
    /*! Indicate if we need to tackle a register pressure issue when
     * regenerating the code
     */
    uint32_t reservedSpillRegs;
    bool limitRegisterPressure;
    bool relaxMath;
    bool getIFENDIFFix(void) const { return ifEndifFix; }
    void setIFENDIFFix(bool fix) { ifEndifFix = fix; }
    bool getProfilingMode(void) const { return inProfilingMode; }
    void setProfilingMode(bool b) { inProfilingMode = b; }
    CompileErrorCode getErrCode() { return errCode; }

  protected:
    virtual GenEncoder* generateEncoder(void) {
      return GBE_NEW(Gen7Encoder, this->simdWidth, 7, deviceID);
    }
    /*! allocate a new curbe register and insert to curbe pool. */
    void allocCurbeReg(ir::Register reg);
    virtual void setA0Content(uint16_t new_a0[16], uint16_t max_offset = 0, int sz = 0);
    void calcGlobalXYZRange(GenRegister& reg, GenRegister& tmp, int flag, int subFlag);
    virtual void subTimestamps(GenRegister& t0, GenRegister& t1, GenRegister& tmp);
    virtual void addTimestamps(GenRegister& t0, GenRegister& t1, GenRegister& tmp);
    virtual void emitPrintfLongInstruction(GenRegister& addr, GenRegister& data, GenRegister& src, uint32_t bti, bool useSends);

  private:
    CompileErrorCode errCode;
    bool ifEndifFix;
    bool inProfilingMode;
    uint32_t regSpillTick;
    const char* asmFileName;
    /*! Build the curbe patch list for the given kernel */
    void buildPatchList(void);
    /* Helper for printing the assembly */
    void outputAssembly(FILE *file, GenKernel* genKernel);
    /*! Calc the group's slm offset from R0.0, to work around HSW SLM bug */
    virtual void emitSLMOffset(void) { };
    /*!
new selection of device */ virtual void newSelection(void); friend class GenRegAllocator; //!< need to access errCode directly. }; } /* namespace gbe */ #endif /* __GBE_GEN_CONTEXT_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen75_encoder.cpp000664 001750 001750 00000024417 13173554000 022363 0ustar00yryr000000 000000 /* Copyright (C) Intel Corp. 2006. All Rights Reserved. Intel funded Tungsten Graphics (http://www.tungstengraphics.com) to develop this 3D driver. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. **********************************************************************/ #include "backend/gen75_encoder.hpp" static const uint32_t untypedRWMask[] = { GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE|GEN_UNTYPED_GREEN|GEN_UNTYPED_RED, GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE|GEN_UNTYPED_GREEN, GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE, GEN_UNTYPED_ALPHA, 0 }; namespace gbe { void Gen75Encoder::setHeader(GenNativeInstruction *insn) { Gen7NativeInstruction *gen7_insn = &insn->gen7_insn; if (this->curr.execWidth == 8) gen7_insn->header.execution_size = GEN_WIDTH_8; else if (this->curr.execWidth == 16) gen7_insn->header.execution_size = GEN_WIDTH_16; else if (this->curr.execWidth == 1) gen7_insn->header.execution_size = GEN_WIDTH_1; else if (this->curr.execWidth == 4) gen7_insn->header.execution_size = GEN_WIDTH_4; else NOT_IMPLEMENTED; gen7_insn->header.acc_wr_control = this->curr.accWrEnable; gen7_insn->header.quarter_control = this->curr.quarterControl; gen7_insn->bits1.ia1.nib_ctrl = this->curr.nibControl; gen7_insn->header.mask_control = this->curr.noMask; if (insn->header.opcode == GEN_OPCODE_MAD || insn->header.opcode == GEN_OPCODE_LRP) { gen7_insn->bits1.da3src.flag_reg_nr = this->curr.flag; gen7_insn->bits1.da3src.flag_sub_reg_nr = this->curr.subFlag; } else { gen7_insn->bits2.ia1.flag_reg_nr = this->curr.flag; gen7_insn->bits2.ia1.flag_sub_reg_nr = this->curr.subFlag; } if (this->curr.predicate != GEN_PREDICATE_NONE) { gen7_insn->header.predicate_control = this->curr.predicate; gen7_insn->header.predicate_inverse = this->curr.inversePredicate; } gen7_insn->header.saturate = this->curr.saturate; } void Gen75Encoder::setDPUntypedRW(GenNativeInstruction *insn, uint32_t bti, uint32_t rgba, uint32_t msg_type, uint32_t msg_length, uint32_t response_length) { Gen7NativeInstruction *gen7_insn = &insn->gen7_insn; const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA; setMessageDescriptor(insn, sfid, msg_length, response_length); gen7_insn->bits3.gen7_untyped_rw.msg_type = msg_type; 
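/* Added note (not part of the original source): a sketch of how the callers
 * below fill this descriptor. For a SIMD16 untyped read of elemNum = 4 dwords
 * per lane, setUntypedReadMessageDesc() ends up calling, roughly:
 *
 *   setDPUntypedRW(insn, bti, untypedRWMask[4], GEN75_P1_UNTYPED_READ,
 *                  2,   // msg_length: two GRFs of dword addresses at SIMD16
 *                  8);  // response_length: two GRFs per element * 4 elements
 *
 * untypedRWMask[4] is 0, i.e. no RGBA channel is masked out, so all four
 * elements come back per lane. The remaining descriptor fields (bti, rgba,
 * simd_mode) are filled just below. */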
gen7_insn->bits3.gen7_untyped_rw.bti = bti; gen7_insn->bits3.gen7_untyped_rw.rgba = rgba; if (curr.execWidth == 8) gen7_insn->bits3.gen7_untyped_rw.simd_mode = GEN_UNTYPED_SIMD8; else if (curr.execWidth == 16) gen7_insn->bits3.gen7_untyped_rw.simd_mode = GEN_UNTYPED_SIMD16; else NOT_SUPPORTED; } void Gen75Encoder::setTypedWriteMessage(GenNativeInstruction *insn, unsigned char bti, unsigned char msg_type, uint32_t msg_length, bool header_present) { Gen7NativeInstruction *gen7_insn = &insn->gen7_insn; const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA; setMessageDescriptor(insn, sfid, msg_length, 0, header_present); gen7_insn->bits3.gen7_typed_rw.bti = bti; gen7_insn->bits3.gen7_typed_rw.msg_type = msg_type; /* Always using the low 8 slots here. */ gen7_insn->bits3.gen7_typed_rw.slot = 1; } unsigned Gen75Encoder::setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum) { Gen7NativeInstruction *gen7_insn = &insn->gen7_insn; uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = srcNum; response_length = 1; } else if (this->curr.execWidth == 16) { msg_length = 2 * srcNum; response_length = 2; } else NOT_IMPLEMENTED; const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA; setMessageDescriptor(insn, sfid, msg_length, response_length); gen7_insn->bits3.gen7_atomic_op.msg_type = GEN75_P1_UNTYPED_ATOMIC_OP; gen7_insn->bits3.gen7_atomic_op.bti = bti; gen7_insn->bits3.gen7_atomic_op.return_data = 1; gen7_insn->bits3.gen7_atomic_op.aop_type = function; if (this->curr.execWidth == 8) gen7_insn->bits3.gen7_atomic_op.simd_mode = GEN_ATOMIC_SIMD8; else if (this->curr.execWidth == 16) gen7_insn->bits3.gen7_atomic_op.simd_mode = GEN_ATOMIC_SIMD16; else NOT_SUPPORTED; return gen7_insn->bits3.ud; } void Gen75Encoder::ATOMIC(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum, bool useSends) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA; this->setDst(insn, GenRegister::uw16grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0)); if (bti.file == GEN_IMMEDIATE_VALUE) { this->setSrc1(insn, GenRegister::immud(0)); setAtomicMessageDesc(insn, function, bti.value.ud, srcNum); } else { this->setSrc1(insn, bti); } } unsigned Gen75Encoder::setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) { uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = 1; response_length = elemNum; } else if (this->curr.execWidth == 16) { msg_length = 2; response_length = 2 * elemNum; } else NOT_IMPLEMENTED; setDPUntypedRW(insn, bti, untypedRWMask[elemNum], GEN75_P1_UNTYPED_READ, msg_length, response_length); return insn->bits3.ud; } void Gen75Encoder::UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); assert(elemNum >= 1 && elemNum <= 4); this->setHeader(insn); this->setDst(insn, GenRegister::uw16grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA; if (bti.file == GEN_IMMEDIATE_VALUE) { this->setSrc1(insn, GenRegister::immud(0)); setUntypedReadMessageDesc(insn, bti.value.ud, elemNum); } else { this->setSrc1(insn, bti); } } unsigned Gen75Encoder::setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned
elemNum) { uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = 1 + elemNum; } else if (this->curr.execWidth == 16) { msg_length = 2 * (1 + elemNum); } else NOT_IMPLEMENTED; setDPUntypedRW(insn, bti, untypedRWMask[elemNum], GEN75_P1_UNTYPED_SURFACE_WRITE, msg_length, response_length); return insn->bits3.ud; } void Gen75Encoder::UNTYPED_WRITE(GenRegister msg, GenRegister data, GenRegister bti, uint32_t elemNum, bool useSends) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); assert(elemNum >= 1 && elemNum <= 4); this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT1_DATA; if (this->curr.execWidth == 8) { this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD)); } else if (this->curr.execWidth == 16) { this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW)); } else NOT_IMPLEMENTED; this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0)); if (bti.file == GEN_IMMEDIATE_VALUE) { this->setSrc1(insn, GenRegister::immud(0)); setUntypedWriteMessageDesc(insn, bti.value.ud, elemNum); } else { this->setSrc1(insn, bti); } } void Gen75Encoder::JMPI(GenRegister src, bool longjmp) { alu2(this, GEN_OPCODE_JMPI, GenRegister::ip(), GenRegister::ip(), src); } void Gen75Encoder::patchJMPI(uint32_t insnID, int32_t jip, int32_t uip) { GBE_ASSERT(insnID < this->store.size()); GenNativeInstruction &insn = *(GenNativeInstruction *)&this->store[insnID]; GBE_ASSERT(insn.header.opcode == GEN_OPCODE_JMPI || insn.header.opcode == GEN_OPCODE_BRD || insn.header.opcode == GEN_OPCODE_ENDIF || insn.header.opcode == GEN_OPCODE_IF || insn.header.opcode == GEN_OPCODE_BRC || insn.header.opcode == GEN_OPCODE_WHILE || insn.header.opcode == GEN_OPCODE_ELSE); if( insn.header.opcode == GEN_OPCODE_WHILE ){ /* If this WHILE instruction jumps back to an ELSE instruction, we need to add the distance to reach the next instruction. */ GenNativeInstruction & insn_else = *(GenNativeInstruction *)&this->store[insnID+jip]; if(insn_else.header.opcode == GEN_OPCODE_ELSE){ jip += 2; } } if (insn.header.opcode != GEN_OPCODE_JMPI) this->setSrc1(&insn, GenRegister::immd((jip & 0xffff) | (uip<<16))); else { /* The jump distance's unit is QWord, but HSW's JMPI offset is expressed in bytes, so multiply by 8. */ jip = (jip - 2) * 8; this->setSrc1(&insn, GenRegister::immd(jip)); } return; } } /* End of the namespace. */ Beignet-1.3.2-Source/backend/src/backend/gen8_context.hpp000664 001750 001750 00000014344 13161142102 022340 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * */ /** * \file gen8_context.hpp */ #ifndef __GBE_GEN8_CONTEXT_HPP__ #define __GBE_GEN8_CONTEXT_HPP__ #include "backend/gen_context.hpp" #include "backend/gen8_encoder.hpp" namespace gbe { /* This class is used to implement the Gen8 (Broadwell) specific logic for context.
*/ class Gen8Context : public GenContext { public: virtual ~Gen8Context(void) { } Gen8Context(const ir::Unit &unit, const std::string &name, uint32_t deviceID, bool relaxMath = false) : GenContext(unit, name, deviceID, relaxMath) { }; /*! device's max scratch buffer size */ #define GEN8_SCRATCH_SIZE (2 * KB * KB) /*! Align the scratch size to the device's scratch unit size */ virtual uint32_t alignScratchSize(uint32_t size); /*! Get the device's max scratch size */ virtual uint32_t getScratchSize(void) { /* Because the allocator uses a uint16_t offset, clamp the size here; this needs refinement. */ return std::min(GEN8_SCRATCH_SIZE, 0x7fff); } /*! Get the pointer argument size for curbe alloc */ virtual uint32_t getPointerSize(void) { return 8; } /*! Set the correct target values for the branches */ virtual bool patchBranches(void); virtual void emitUnaryInstruction(const SelectionInstruction &insn); virtual void emitUnaryWithTempInstruction(const SelectionInstruction &insn); virtual void emitSimdShuffleInstruction(const SelectionInstruction &insn); virtual void emitBinaryInstruction(const SelectionInstruction &insn); virtual void emitBinaryWithTempInstruction(const SelectionInstruction &insn); virtual void emitI64MULHIInstruction(const SelectionInstruction &insn); virtual void emitI64RHADDInstruction(const SelectionInstruction &insn); virtual void emitI64HADDInstruction(const SelectionInstruction &insn); virtual void emitI64ShiftInstruction(const SelectionInstruction &insn); virtual void emitI64CompareInstruction(const SelectionInstruction &insn); virtual void emitI64SATADDInstruction(const SelectionInstruction &insn); virtual void emitI64SATSUBInstruction(const SelectionInstruction &insn); virtual void emitI64ToFloatInstruction(const SelectionInstruction &insn); virtual void emitFloatToI64Instruction(const SelectionInstruction &insn); virtual void emitI64MADSATInstruction(const SelectionInstruction &insn); virtual void emitUntypedWriteA64Instruction(const SelectionInstruction &insn); virtual void emitUntypedReadA64Instruction(const SelectionInstruction &insn); virtual void emitByteGatherA64Instruction(const SelectionInstruction &insn); virtual void emitByteScatterA64Instruction(const SelectionInstruction &insn); virtual void emitWrite64Instruction(const SelectionInstruction &insn); virtual void emitRead64Instruction(const SelectionInstruction &insn); virtual void emitWrite64A64Instruction(const SelectionInstruction &insn); virtual void emitRead64A64Instruction(const SelectionInstruction &insn); virtual void emitAtomicA64Instruction(const SelectionInstruction &insn); virtual void emitI64MULInstruction(const SelectionInstruction &insn); virtual void emitI64DIVREMInstruction(const SelectionInstruction &insn); virtual void emitPackLongInstruction(const SelectionInstruction &insn); virtual void emitUnpackLongInstruction(const SelectionInstruction &insn); virtual void emitF64DIVInstruction(const SelectionInstruction &insn); virtual void emitWorkGroupOpInstruction(const SelectionInstruction &insn); virtual void emitSubGroupOpInstruction(const SelectionInstruction &insn); static GenRegister unpacked_ud(GenRegister reg, uint32_t offset = 0); protected: virtual void setA0Content(uint16_t new_a0[16], uint16_t max_offset = 0, int sz = 0); virtual void subTimestamps(GenRegister& t0, GenRegister& t1, GenRegister& tmp); virtual void addTimestamps(GenRegister& t0, GenRegister& t1, GenRegister& tmp); virtual void emitPrintfLongInstruction(GenRegister& addr, GenRegister& data, GenRegister& src, uint32_t bti); virtual GenEncoder*
generateEncoder(void) { return GBE_NEW(Gen8Encoder, this->simdWidth, 8, deviceID); } private: virtual void emitSLMOffset(void); virtual void newSelection(void); void packLongVec(GenRegister unpacked, GenRegister packed, uint32_t simd); void unpackLongVec(GenRegister packed, GenRegister unpacked, uint32_t simd); void calculateFullS64MUL(GenRegister src0, GenRegister src1, GenRegister dst_h, GenRegister dst_l, GenRegister s0_abs, GenRegister s1_abs, GenRegister tmp0, GenRegister tmp1, GenRegister sign, GenRegister flagReg); virtual void calculateFullU64MUL(GenRegister src0, GenRegister src1, GenRegister dst_h, GenRegister dst_l, GenRegister s0l_s1h, GenRegister s0h_s1l); }; class ChvContext : public Gen8Context { public: virtual ~ChvContext(void) { } ChvContext(const ir::Unit &unit, const std::string &name, uint32_t deviceID, bool relaxMath = false) : Gen8Context(unit, name, deviceID, relaxMath) { }; virtual void emitI64MULInstruction(const SelectionInstruction &insn); protected: virtual void setA0Content(uint16_t new_a0[16], uint16_t max_offset = 0, int sz = 0); private: virtual void newSelection(void); virtual void calculateFullU64MUL(GenRegister src0, GenRegister src1, GenRegister dst_h, GenRegister dst_l, GenRegister s0l_s1h, GenRegister s0h_s1l); virtual void emitStackPointer(void); }; } #endif /* __GBE_GEN8_CONTEXT_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen/000775 001750 001750 00000000000 13174334761 020007 5ustar00yryr000000 000000 Beignet-1.3.2-Source/backend/src/backend/gen/gen_mesa_disasm.h000664 001750 001750 00000002422 13161142102 023256 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ /** * \file gen_mesa_disasm.h * \author Benjamin Segovia * * To decode and print one Gen ISA instruction. The code is directly taken * from Mesa */ #ifndef __GBE_GEN_MESA_DISASM_H__ #define __GBE_GEN_MESA_DISASM_H__ #include <stdio.h> #include <stdint.h> #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ extern int gen_disasm(FILE *file, const void *opaque_insn, uint32_t deviceID, uint32_t compacted); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* __GBE_GEN_MESA_DISASM_H__ */ Beignet-1.3.2-Source/backend/src/backend/gen/gen_mesa_disasm.c000664 001750 001750 00000163633 13161142102 023263 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details.
* * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* * Copyright 2008 Keith Packard * * Permission to use, copy, modify, distribute, and sell this software and its * documentation for any purpose is hereby granted without fee, provided that * the above copyright notice appear in all copies and that both that copyright * notice and this permission notice appear in supporting documentation, and * that the name of the copyright holders not be used in advertising or * publicity pertaining to distribution of the software without specific, * written prior permission. The copyright holders make no representations * about the suitability of this software for any purpose. It is provided "as * is" without express or implied warranty. * * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, * INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO * EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR * CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, * DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER * TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE * OF THIS SOFTWARE. */ #include #include #include #include #include #include #include #include #include "backend/gen_defs.hpp" #include "backend/gen7_instruction.hpp" #include "backend/gen9_instruction.hpp" #include "src/cl_device_data.h" static const struct { const char *name; int nsrc; int ndst; } opcode[128] = { [GEN_OPCODE_MOV] = { .name = "mov", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_FRC] = { .name = "frc", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_RNDU] = { .name = "rndu", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_RNDD] = { .name = "rndd", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_RNDE] = { .name = "rnde", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_RNDZ] = { .name = "rndz", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_NOT] = { .name = "not", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_LZD] = { .name = "lzd", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_FBH] = { .name = "fbh", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_FBL] = { .name = "fbl", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_CBIT] = { .name = "cbit", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_F16TO32] = { .name = "f16to32", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_F32TO16] = { .name = "f32to16", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_BFREV] = { .name = "bfrev", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_MUL] = { .name = "mul", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_MAC] = { .name = "mac", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_MACH] = { .name = "mach", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_LINE] = { .name = "line", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_PLN] = { .name = "pln", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_MAD] = { .name = "mad", .nsrc = 3, .ndst = 1 }, [GEN_OPCODE_LRP] = { .name = "lrp", .nsrc = 3, .ndst = 1 }, [GEN_OPCODE_SAD2] = { .name = "sad2", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_SADA2] = { .name = "sada2", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_DP4] = { .name = "dp4", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_DPH] = { .name = "dph", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_DP3] = { .name = "dp3", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_DP2] = { .name = "dp2", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_MATH] = { .name = "math", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_MADM] = { .name = "madm", .nsrc = 3, .ndst = 1 }, [GEN_OPCODE_AVG] = { .name = "avg", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_ADD] = { .name = "add", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_ADDC] = { .name = "addc", 
.nsrc = 2, .ndst = 1 }, [GEN_OPCODE_SUBB] = { .name = "subb", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_SEL] = { .name = "sel", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_AND] = { .name = "and", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_OR] = { .name = "or", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_XOR] = { .name = "xor", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_SHR] = { .name = "shr", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_SHL] = { .name = "shl", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_ASR] = { .name = "asr", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_CMP] = { .name = "cmp", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_CMPN] = { .name = "cmpn", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_SEND] = { .name = "send", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_SENDC] = { .name = "sendc", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_SENDS] = { .name = "sends", .nsrc = 2, .ndst = 1 }, [GEN_OPCODE_NOP] = { .name = "nop", .nsrc = 0, .ndst = 0 }, [GEN_OPCODE_JMPI] = { .name = "jmpi", .nsrc = 0, .ndst = 0 }, [GEN_OPCODE_BRD] = { .name = "brd", .nsrc = 0, .ndst = 0 }, [GEN_OPCODE_IF] = { .name = "if", .nsrc = 0, .ndst = 0 }, [GEN_OPCODE_BRC] = { .name = "brc", .nsrc = 0, .ndst = 0 }, [GEN_OPCODE_WHILE] = { .name = "while", .nsrc = 0, .ndst = 0 }, [GEN_OPCODE_ELSE] = { .name = "else", .nsrc = 0, .ndst = 0 }, [GEN_OPCODE_BREAK] = { .name = "break", .nsrc = 0, .ndst = 0 }, [GEN_OPCODE_CONTINUE] = { .name = "cont", .nsrc = 0, .ndst = 0 }, [GEN_OPCODE_HALT] = { .name = "halt", .nsrc = 1, .ndst = 0 }, [GEN_OPCODE_MSAVE] = { .name = "msave", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_PUSH] = { .name = "push", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_MRESTORE] = { .name = "mrest", .nsrc = 1, .ndst = 1 }, [GEN_OPCODE_POP] = { .name = "pop", .nsrc = 2, .ndst = 0 }, [GEN_OPCODE_WAIT] = { .name = "wait", .nsrc = 1, .ndst = 0 }, [GEN_OPCODE_DO] = { .name = "do", .nsrc = 0, .ndst = 0 }, [GEN_OPCODE_ENDIF] = { .name = "endif", .nsrc = 1, .ndst = 0 }, }; static const char *conditional_modifier[16] = { [GEN_CONDITIONAL_NONE] = "", [GEN_CONDITIONAL_Z] = ".e", [GEN_CONDITIONAL_NZ] = ".ne", [GEN_CONDITIONAL_G] = ".g", [GEN_CONDITIONAL_GE] = ".ge", [GEN_CONDITIONAL_L] = ".l", [GEN_CONDITIONAL_LE] = ".le", [GEN_CONDITIONAL_R] = ".r", [GEN_CONDITIONAL_O] = ".o", [GEN_CONDITIONAL_U] = ".u", }; static const char *negate[2] = { [0] = "", [1] = "-", }; static const char *_abs[2] = { [0] = "", [1] = "(abs)", }; static const char *vert_stride[16] = { [0] = "0", [1] = "1", [2] = "2", [3] = "4", [4] = "8", [5] = "16", [6] = "32", [15] = "VxH", }; static const char *width[8] = { [0] = "1", [1] = "2", [2] = "4", [3] = "8", [4] = "16", }; static const char *horiz_stride[4] = { [0] = "0", [1] = "1", [2] = "2", [3] = "4" }; static const char *chan_sel[4] = { [0] = "x", [1] = "y", [2] = "z", [3] = "w", }; static const char *debug_ctrl[2] = { [0] = "", [1] = ".breakpoint" }; static const char *saturate[2] = { [0] = "", [1] = ".sat" }; static const char *accwr[2] = { [0] = "", [1] = "AccWrEnable" }; static const char *wectrl[2] = { [0] = "WE_normal", [1] = "WE_all" }; static const char *exec_size[8] = { [0] = "1", [1] = "2", [2] = "4", [3] = "8", [4] = "16", [5] = "32" }; static const char *pred_inv[2] = { [0] = "+", [1] = "-" }; static const char *pred_ctrl_align16[16] = { [1] = "", [2] = ".x", [3] = ".y", [4] = ".z", [5] = ".w", [6] = ".any4h", [7] = ".all4h", }; static const char *pred_ctrl_align1[16] = { [1] = "", [2] = ".anyv", [3] = ".allv", [4] = ".any2h", [5] = ".all2h", [6] = ".any4h", [7] = ".all4h", [8] = ".any8h", [9] = ".all8h", [10] = ".any16h", [11] = ".all16h", }; static const char *thread_ctrl_gen7[4] = { 
[0] = "", [2] = "switch" }; static const char *thread_ctrl_gen8[4] = { [0] = "", [1] = "atomic", [2] = "switch" }; static const char *dep_ctrl[4] = { [0] = "", [1] = "NoDDClr", [2] = "NoDDChk", [3] = "NoDDClr,NoDDChk", }; static const char *access_mode[2] = { [0] = "align1", [1] = "align16", }; static const char *reg_encoding[11] = { [0] = ":UD", [1] = ":D", [2] = ":UW", [3] = ":W", [4] = ":UB", [5] = ":B", [6] = ":DF", [7] = ":F", [8] = ":UQ", [9] = ":Q", [10] = ":HF" }; static const char *reg_encoding_3src[5] = { [0] = ":F", [1] = ":D", [2] = ":UD", [3] = ":DF", [4] = ":HF", }; int reg_type_size[11] = { [0] = 4, [1] = 4, [2] = 2, [3] = 2, [4] = 1, [5] = 1, [6] = 8, [7] = 4, [8] = 8, [9] = 8, [10] = 2, }; static const char *reg_file[4] = { [0] = "A", [1] = "g", [2] = "m", [3] = "imm", }; static const char *writemask[16] = { [0x0] = ".", [0x1] = ".x", [0x2] = ".y", [0x3] = ".xy", [0x4] = ".z", [0x5] = ".xz", [0x6] = ".yz", [0x7] = ".xyz", [0x8] = ".w", [0x9] = ".xw", [0xa] = ".yw", [0xb] = ".xyw", [0xc] = ".zw", [0xd] = ".xzw", [0xe] = ".yzw", [0xf] = "", }; static const char *special_acc[9] = { [0x0] = ".acc2", [0x1] = ".acc3", [0x2] = ".acc4", [0x3] = ".acc5", [0x4] = ".acc6", [0x5] = ".acc7", [0x6] = ".acc8", [0x7] = ".acc9", [0x8] = ".noacc", }; static const char *end_of_thread[2] = { [0] = "", [1] = "EOT" }; static const char *target_function_gen7[16] = { [GEN_SFID_NULL] = "null", [GEN_SFID_RESERVED] = NULL, [GEN_SFID_SAMPLER] = "sampler", [GEN_SFID_MESSAGE_GATEWAY] = "gateway", [GEN_SFID_DATAPORT_SAMPLER] = "dataport_sampler", [GEN_SFID_DATAPORT_RENDER] = "render", [GEN_SFID_URB] = "urb", [GEN_SFID_THREAD_SPAWNER] = "thread_spawner", [GEN_SFID_VIDEO_MOTION_EST] = "video_motion_estimation", [GEN_SFID_DATAPORT_CONSTANT] = "const", [GEN_SFID_DATAPORT_DATA] = "data", [GEN_SFID_PIXEL_INTERPOLATOR] = "pix_interpolator", }; static const char *target_function_gen75[16] = { [GEN_SFID_NULL] = "null", [GEN_SFID_RESERVED] = NULL, [GEN_SFID_SAMPLER] = "sampler", [GEN_SFID_MESSAGE_GATEWAY] = "gateway", [GEN_SFID_DATAPORT_SAMPLER] = "dataport_sampler", [GEN_SFID_DATAPORT_RENDER] = "render", [GEN_SFID_URB] = "urb", [GEN_SFID_THREAD_SPAWNER] = "thread_spawner", [GEN_SFID_VIDEO_MOTION_EST] = "video_motion_estimation", [GEN_SFID_DATAPORT_CONSTANT] = "const", [GEN_SFID_DATAPORT_DATA] = "data (0)", [GEN_SFID_PIXEL_INTERPOLATOR] = "pix_interpolator", [GEN_SFID_DATAPORT1_DATA] = "data (1)", }; static const char *gateway_sub_function[8] = { [0] = "open gateway", [1] = "close gateway", [2] = "forward gateway", [3] = "get time stamp", [4] = "barrier", [5] = "update gateway state", [6] = "MMIO R/W", [7] = "reserved" }; static const char *math_function_gen7[16] = { [GEN_MATH_FUNCTION_INV] = "inv", [GEN_MATH_FUNCTION_LOG] = "log", [GEN_MATH_FUNCTION_EXP] = "exp", [GEN_MATH_FUNCTION_SQRT] = "sqrt", [GEN_MATH_FUNCTION_RSQ] = "rsq", [GEN_MATH_FUNCTION_SIN] = "sin", [GEN_MATH_FUNCTION_COS] = "cos", [GEN_MATH_FUNCTION_FDIV] = "fdiv", [GEN_MATH_FUNCTION_POW] = "pow", [GEN_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER] = "intdivmod", [GEN_MATH_FUNCTION_INT_DIV_QUOTIENT] = "intdiv", [GEN_MATH_FUNCTION_INT_DIV_REMAINDER] = "intmod", }; static const char *math_function_gen8[16] = { [GEN_MATH_FUNCTION_INV] = "inv", [GEN_MATH_FUNCTION_LOG] = "log", [GEN_MATH_FUNCTION_EXP] = "exp", [GEN_MATH_FUNCTION_SQRT] = "sqrt", [GEN_MATH_FUNCTION_RSQ] = "rsq", [GEN_MATH_FUNCTION_SIN] = "sin", [GEN_MATH_FUNCTION_COS] = "cos", [GEN_MATH_FUNCTION_FDIV] = "fdiv", [GEN_MATH_FUNCTION_POW] = "pow", 
[GEN_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER] = "intdivmod", [GEN_MATH_FUNCTION_INT_DIV_QUOTIENT] = "intdiv", [GEN_MATH_FUNCTION_INT_DIV_REMAINDER] = "intmod", [GEN8_MATH_FUNCTION_INVM] = "invm", [GEN8_MATH_FUNCTION_RSQRTM] = "rsqrtm", }; static const char *data_port_data_cache_data_size[] = { "1 byte", "2 bytes", "4 bytes", "Reserved" }; static const char *data_port_data_cache_byte_scattered_simd_mode[] = { "SIMD8", "SIMD16", }; static const char *data_port_data_cache_simd_mode[] = { "SIMD4x2", "SIMD16", "SIMD8", }; static const char *data_port_data_cache_category[] = { "legacy", "scratch", }; static const char *data_port_data_cache_block_size[] = { "1 OWORD LOW", "1 OWORD HIGH", "2 OWORD", "4 OWORD", "8 OWORD", }; static const char *data_port_scratch_block_size[] = { "1 register", "2 registers", "Reserve", "4 registers", }; static const char *data_port_scratch_invalidate[] = { "no invalidate", "invalidate cache line", }; static const char *data_port_scratch_channel_mode[] = { "Oword", "Dword", }; static const char *data_port_scratch_msg_type[] = { "Scratch Read", "Scratch Write", }; static const char *data_port_data_cache_msg_type[] = { [0] = "OWord Block Read", [1] = "Unaligned OWord Block Read", [2] = "OWord Dual Block Read", [3] = "DWord Scattered Read", [4] = "Byte Scattered Read", [5] = "Untyped Surface Read", [6] = "Untyped Atomic Operation", [7] = "Memory Fence", [8] = "OWord Block Write", [10] = "OWord Dual Block Write", [11] = "DWord Scattered Write", [12] = "Byte Scattered Write", [13] = "Untyped Surface Write", }; static const char *data_port1_data_cache_msg_type[] = { [1] = "Untyped Surface Read", [2] = "Untyped Atomic Operation", [3] = "Untyped Atomic Operation SIMD4x2", [4] = "Media Block Read", [5] = "Typed Surface Read", [6] = "Typed Atomic Operation", [7] = "Typed Atomic Operation SIMD4x2", [9] = "Untyped Surface Write", [10] = "Media Block Write", [11] = "Atomic Counter Operation", [12] = "Atomic Counter Operation 4X2", [13] = "Typed Surface Write", }; static const char *atomic_opration_type[] = { [1] = "and", [2] = "or", [3] = "xor", [4] = "xchg", [5] = "inc", [6] = "dec", [7] = "add", [8] = "sub", [9] = "rsub", [10] = "imax", [11] = "imin", [12] = "umax", [13] = "umin", [14] = "cmpxchg", [15] = "invalid" }; static int column; static int gen_version; #define GEN7_BITS_FIELD(inst, gen7) \ ({ \ int bits; \ bits = ((const union Gen7NativeInstruction *)inst)->gen7; \ bits; \ }) #define GEN_BITS_FIELD(inst, gen) \ ({ \ int bits; \ if (gen_version < 80) \ bits = ((const union Gen7NativeInstruction *)inst)->gen; \ else \ bits = ((const union Gen8NativeInstruction *)inst)->gen; \ bits; \ }) #define GEN_BITS_FIELD_WITH_TYPE(inst, gen, TYPE) \ ({ \ TYPE bits; \ if (gen_version < 80) \ bits = ((const union Gen7NativeInstruction *)inst)->gen; \ else \ bits = ((const union Gen8NativeInstruction *)inst)->gen; \ bits; \ }) #define GEN_BITS_FIELD2(inst, gen7, gen8) \ ({ \ int bits; \ if (gen_version < 80) \ bits = ((const union Gen7NativeInstruction *)inst)->gen7; \ else \ bits = ((const union Gen8NativeInstruction *)inst)->gen8; \ bits; \ }) #define PRED_CTRL(inst) GEN_BITS_FIELD(inst, header.predicate_control) #define PRED_INV(inst) GEN_BITS_FIELD(inst, header.predicate_inverse) #define FLAG_REG_NR(inst) GEN_BITS_FIELD2(inst, bits2.da1.flag_reg_nr, bits1.da1.flag_reg_nr) #define FLAG_SUB_REG_NR(inst) GEN_BITS_FIELD2(inst, bits2.da1.flag_sub_reg_nr, bits1.da1.flag_sub_reg_nr) #define ACCESS_MODE(inst) GEN_BITS_FIELD(inst, header.access_mode) #define MASK_CONTROL(inst) 
GEN_BITS_FIELD2(inst, header.mask_control, bits1.da1.mask_control) #define DEPENDENCY_CONTROL(inst) GEN_BITS_FIELD(inst, header.dependency_control) #define THREAD_CONTROL(inst) GEN_BITS_FIELD(inst, header.thread_control) #define ACC_WR_CONTROL(inst) GEN_BITS_FIELD(inst, header.acc_wr_control) #define QUARTER_CONTROL(inst) GEN_BITS_FIELD(inst, header.quarter_control) #define END_OF_THREAD(inst) GEN_BITS_FIELD(inst, bits3.generic_gen5.end_of_thread) #define OPCODE(inst) GEN_BITS_FIELD(inst, header.opcode) #define SATURATE(inst) GEN_BITS_FIELD(inst, header.saturate) #define DEBUG_CONTROL(inst) GEN_BITS_FIELD(inst, header.debug_control) #define MATH_FUNCTION(inst) GEN_BITS_FIELD(inst, header.destreg_or_condmod) #define MATH_SATURATE(inst) GEN_BITS_FIELD(inst, bits3.math_gen5.saturate) #define MATH_SIGNED(inst) GEN_BITS_FIELD(inst, bits3.math_gen5.int_type) #define MATH_SCALAR(inst) GEN_BITS_FIELD(inst, bits3.math_gen5.data_type) #define MATH_PRECISION(inst) GEN_BITS_FIELD(inst, bits3.math_gen5.precision) #define COND_DST_OR_MODIFIER(inst) GEN_BITS_FIELD(inst, header.destreg_or_condmod) #define EXECUTION_SIZE(inst) GEN_BITS_FIELD(inst, header.execution_size) #define BRANCH_JIP(inst) GEN_BITS_FIELD2(inst, bits3.gen7_branch.jip, bits3.gen8_branch.jip/8) #define BRANCH_UIP(inst) GEN_BITS_FIELD2(inst, bits3.gen7_branch.uip, bits2.gen8_branch.uip/8) #define VME_BTI(inst) GEN7_BITS_FIELD(inst, bits3.vme_gen7.bti) #define VME_MSG_TYPE(inst) GEN7_BITS_FIELD(inst, bits3.vme_gen7.msg_type) #define SAMPLE_BTI(inst) GEN_BITS_FIELD(inst, bits3.sampler_gen7.bti) #define SAMPLER(inst) GEN_BITS_FIELD(inst, bits3.sampler_gen7.sampler) #define SAMPLER_MSG_TYPE(inst) GEN_BITS_FIELD(inst, bits3.sampler_gen7.msg_type) #define SAMPLER_SIMD_MODE(inst) GEN_BITS_FIELD(inst, bits3.sampler_gen7.simd_mode) #define UNTYPED_RW_BTI(inst) GEN_BITS_FIELD(inst, bits3.gen7_untyped_rw.bti) #define UNTYPED_RW_RGBA(inst) GEN_BITS_FIELD(inst, bits3.gen7_untyped_rw.rgba) #define UNTYPED_RW_SIMD_MODE(inst) GEN_BITS_FIELD(inst, bits3.gen7_untyped_rw.simd_mode) #define UNTYPED_RW_CATEGORY(inst) GEN_BITS_FIELD(inst, bits3.gen7_untyped_rw.category) #define UNTYPED_RW_MSG_TYPE(inst) GEN_BITS_FIELD(inst, bits3.gen7_untyped_rw.msg_type) #define BYTE_RW_SIMD_MODE(inst) GEN_BITS_FIELD(inst, bits3.gen7_byte_rw.simd_mode) #define BYTE_RW_DATA_SIZE(inst) GEN_BITS_FIELD(inst, bits3.gen7_byte_rw.data_size) #define UNTYPED_RW_AOP_TYPE(inst) GEN_BITS_FIELD2(inst, bits3.gen7_atomic_op.aop_type, bits3.gen8_atomic_a64.aop_type) #define SCRATCH_RW_OFFSET(inst) GEN_BITS_FIELD(inst, bits3.gen7_scratch_rw.offset) #define SCRATCH_RW_BLOCK_SIZE(inst) GEN_BITS_FIELD(inst, bits3.gen7_scratch_rw.block_size) #define SCRATCH_RW_INVALIDATE_AFTER_READ(inst) GEN_BITS_FIELD(inst, bits3.gen7_scratch_rw.invalidate_after_read) #define SCRATCH_RW_BLOCK_SIZE(inst) GEN_BITS_FIELD(inst, bits3.gen7_scratch_rw.block_size) #define SCRATCH_RW_CHANNEL_MODE(inst) GEN_BITS_FIELD(inst, bits3.gen7_scratch_rw.channel_mode) #define SCRATCH_RW_MSG_TYPE(inst) GEN_BITS_FIELD(inst, bits3.gen7_scratch_rw.msg_type) #define DWORD_RW_BTI(inst) GEN_BITS_FIELD(inst, bits3.gen7_dword_rw.bti) #define DWORD_RW_MSG_TYPE(inst) GEN_BITS_FIELD(inst, bits3.gen7_dword_rw.msg_type) #define MSG_GW_SUBFUNC(inst) GEN_BITS_FIELD(inst, bits3.gen7_msg_gw.subfunc) #define MSG_GW_NOTIFY(inst) GEN_BITS_FIELD(inst, bits3.gen7_msg_gw.notify) #define MSG_GW_ACKREQ(inst) GEN_BITS_FIELD(inst, bits3.gen7_msg_gw.ackreq) #define GENERIC_MSG_LENGTH(inst) GEN_BITS_FIELD(inst, bits3.generic_gen5.msg_length) #define 
GENERIC_RESPONSE_LENGTH(inst) GEN_BITS_FIELD(inst, bits3.generic_gen5.response_length) #define OWORD_RW_BLOCK_SIZE(inst) GEN_BITS_FIELD(inst, bits3.gen7_oblock_rw.block_size) static int is_special_acc(const void* inst) { if (gen_version < 80) return 0; if (OPCODE(inst) != GEN_OPCODE_MADM && OPCODE(inst) != GEN_OPCODE_MATH) return 0; if (OPCODE(inst) == GEN_OPCODE_MATH && (MATH_FUNCTION(inst) != GEN8_MATH_FUNCTION_INVM && MATH_FUNCTION(inst) != GEN8_MATH_FUNCTION_RSQRTM)) return 0; if (ACCESS_MODE(inst) != GEN_ALIGN_16) return 0; return 1; } static int string(FILE *file, const char *string) { fputs (string, file); column += strlen (string); return 0; } static int format(FILE *f, const char *format, ...) { char buf[1024]; va_list args; va_start (args, format); vsnprintf (buf, sizeof (buf) - 1, format, args); va_end (args); string(f, buf); return 0; } static int newline(FILE *f) { putc ('\n', f); column = 0; return 0; } static int pad(FILE *f, int c) { do string(f, " "); while (column < c); return 0; } static int flag_reg(FILE *file, const int flag_nr, const int flag_sub_reg_nr) { if (flag_nr || flag_sub_reg_nr) return format(file, ".f%d.%d", flag_nr, flag_sub_reg_nr); return 0; } static int control(FILE *file, const char *name, const char *ctrl[], uint32_t id, int *space) { if (!ctrl[id]) { fprintf (file, "*** invalid %s value %d ", name, id); return 1; } if (ctrl[id][0]) { if (space && *space) string(file, " "); string(file, ctrl[id]); if (space) *space = 1; } return 0; } static int print_opcode(FILE *file, int id) { if (!opcode[id].name) { format(file, "*** invalid opcode value %d ", id); return 1; } string(file, opcode[id].name); return 0; } static int reg(FILE *file, uint32_t _reg_file, uint32_t _reg_nr) { int err = 0; if (_reg_file == GEN_ARCHITECTURE_REGISTER_FILE) { switch (_reg_nr & 0xf0) { case GEN_ARF_NULL: string(file, "null"); return -1; case GEN_ARF_ADDRESS: format(file, "a%d", _reg_nr & 0x0f); break; case GEN_ARF_ACCUMULATOR: format(file, "acc%d", _reg_nr & 0x0f); break; case GEN_ARF_FLAG: format(file, "f%d", _reg_nr & 0x0f); break; case GEN_ARF_MASK: format(file, "mask%d", _reg_nr & 0x0f); break; case GEN_ARF_MASK_STACK: format(file, "msd%d", _reg_nr & 0x0f); break; case GEN_ARF_STATE: format(file, "sr%d", _reg_nr & 0x0f); break; case GEN_ARF_CONTROL: format(file, "cr%d", _reg_nr & 0x0f); break; case GEN_ARF_NOTIFICATION_COUNT: format(file, "n%d", _reg_nr & 0x0f); break; case GEN_ARF_IP: string(file, "ip"); return -1; break; case GEN_ARF_TM: format(file, "tm%d", _reg_nr & 0x0f); break; default: format(file, "ARF%d", _reg_nr); break; } } else { err |= control(file, "src reg file", reg_file, _reg_file, NULL); format(file, "%d", _reg_nr); } return err; } static int dest(FILE *file, const void* inst) { int err = 0; if (ACCESS_MODE(inst) == GEN_ALIGN_1) { if (GEN_BITS_FIELD(inst, bits1.da1.dest_address_mode) == GEN_ADDRESS_DIRECT) { err |= reg(file, GEN_BITS_FIELD(inst, bits1.da1.dest_reg_file), GEN_BITS_FIELD(inst, bits1.da1.dest_reg_nr)); if (err == -1) { control(file, "dest reg encoding", reg_encoding, GEN_BITS_FIELD(inst, bits1.da1.dest_reg_type), NULL); return 0; } if (GEN_BITS_FIELD(inst, bits1.da1.dest_subreg_nr)) format(file, ".%d", GEN_BITS_FIELD(inst, bits1.da1.dest_subreg_nr) / reg_type_size[GEN_BITS_FIELD(inst, bits1.da1.dest_reg_type)]); format(file, "<%s>", horiz_stride[GEN_BITS_FIELD(inst, bits1.da1.dest_horiz_stride)]); err |= control(file, "dest reg encoding", reg_encoding, GEN_BITS_FIELD(inst, bits1.da1.dest_reg_type), NULL); } else { string(file, "g[a0"); if 
(GEN_BITS_FIELD(inst, bits1.ia1.dest_subreg_nr)) format(file, ".%d", GEN_BITS_FIELD(inst, bits1.ia1.dest_subreg_nr) / reg_type_size[GEN_BITS_FIELD(inst, bits1.ia1.dest_reg_type)]); if (GEN_BITS_FIELD(inst, bits1.ia1.dest_indirect_offset)) format(file, " %d", GEN_BITS_FIELD(inst, bits1.ia1.dest_indirect_offset)); string(file, "]"); format(file, "<%s>", horiz_stride[GEN_BITS_FIELD(inst, bits1.ia1.dest_horiz_stride)]); err |= control(file, "dest reg encoding", reg_encoding, GEN_BITS_FIELD(inst, bits1.ia1.dest_reg_type), NULL); } } else { if (GEN_BITS_FIELD(inst, bits1.da16.dest_address_mode) == GEN_ADDRESS_DIRECT) { err |= reg(file, GEN_BITS_FIELD(inst, bits1.da16.dest_reg_file), GEN_BITS_FIELD(inst, bits1.da16.dest_reg_nr)); if (err == -1) return 0; if (GEN_BITS_FIELD(inst, bits1.da16.dest_subreg_nr)) format(file, ".%d", GEN_BITS_FIELD(inst, bits1.da16.dest_subreg_nr) / reg_type_size[GEN_BITS_FIELD(inst, bits1.da16.dest_reg_type)]); string(file, "<1>"); if (is_special_acc(inst)) { err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits1.da16acc.dst_special_acc, NULL); } else { err |= control(file, "writemask", writemask, GEN_BITS_FIELD(inst, bits1.da16.dest_writemask), NULL); } err |= control(file, "dest reg encoding", reg_encoding, GEN_BITS_FIELD(inst, bits1.da16.dest_reg_type), NULL); } else { err = 1; string(file, "Indirect align16 address mode not supported"); } } return 0; } static int dest_3src(FILE *file, const void *inst) { int err = 0; const uint32_t reg_file = GEN_GENERAL_REGISTER_FILE; err |= reg(file, reg_file, GEN_BITS_FIELD(inst, bits1.da3src.dest_reg_nr)); if (err == -1) return 0; if (GEN_BITS_FIELD(inst, bits1.da3src.dest_subreg_nr)) format(file, ".%d", GEN_BITS_FIELD(inst, bits1.da3src.dest_subreg_nr)); string(file, "<1>"); if (is_special_acc(inst)) { err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits1.da3srcacc.dst_special_acc, NULL); } else { err |= control(file, "writemask", writemask, GEN_BITS_FIELD(inst, bits1.da3src.dest_writemask), NULL); } if (gen_version < 80) { err |= control(file, "dest reg encoding", reg_encoding, GEN_TYPE_F, NULL); } else { err |= control(file, "dest reg encoding", reg_encoding_3src, ((const union Gen8NativeInstruction *)inst)->bits1.da3src.dest_type, NULL); } return 0; } static int src_align1_region(FILE *file, uint32_t _vert_stride, uint32_t _width, uint32_t _horiz_stride) { int err = 0; string(file, "<"); err |= control(file, "vert stride", vert_stride, _vert_stride, NULL); string(file, ","); err |= control(file, "width", width, _width, NULL); string(file, ","); err |= control(file, "horiz_stride", horiz_stride, _horiz_stride, NULL); string(file, ">"); return err; } static int src_da1(FILE *file, uint32_t type, uint32_t _reg_file, uint32_t _vert_stride, uint32_t _width, uint32_t _horiz_stride, uint32_t reg_num, uint32_t sub_reg_num, uint32_t __abs, uint32_t _negate) { int err = 0; err |= control(file, "negate", negate, _negate, NULL); err |= control(file, "abs", _abs, __abs, NULL); err |= reg(file, _reg_file, reg_num); if (err == -1) return 0; if (sub_reg_num) format(file, ".%d", sub_reg_num / reg_type_size[type]); /* use formal style like spec */ src_align1_region(file, _vert_stride, _width, _horiz_stride); err |= control(file, "src reg encoding", reg_encoding, type, NULL); return err; } static int src_ia1(FILE *file, uint32_t type, uint32_t _reg_file, int32_t _addr_imm, uint32_t _addr_subreg_nr, uint32_t _negate, uint32_t __abs, uint32_t _addr_mode, 
uint32_t _horiz_stride, uint32_t _width, uint32_t _vert_stride) { int err = 0; err |= control(file, "negate", negate, _negate, NULL); err |= control(file, "abs", _abs, __abs, NULL); string(file, "g[a0"); if (_addr_subreg_nr) format(file, ".%d", _addr_subreg_nr); if (_addr_imm) format(file, " %d", _addr_imm); string(file, "]"); src_align1_region(file, _vert_stride, _width, _horiz_stride); err |= control(file, "src reg encoding", reg_encoding, type, NULL); return err; } static int src_da16(FILE *file, const void* inst, int src_num, uint32_t _reg_type, uint32_t _reg_file, uint32_t _vert_stride, uint32_t _reg_nr, uint32_t _subreg_nr, uint32_t __abs, uint32_t _negate, uint32_t swz_x, uint32_t swz_y, uint32_t swz_z, uint32_t swz_w) { int err = 0; err |= control(file, "negate", negate, _negate, NULL); err |= control(file, "abs", _abs, __abs, NULL); err |= reg(file, _reg_file, _reg_nr); if (err == -1) return 0; if (_subreg_nr) /* bit4 for subreg number byte addressing. Make this same meaning as in da1 case, so output looks consistent. */ format(file, ".%d", 16 / reg_type_size[_reg_type]); string(file, "<"); err |= control(file, "vert stride", vert_stride, _vert_stride, NULL); string(file, ",4,1>"); if (is_special_acc(inst)) { if (src_num == 0) { err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits2.da16acc.src0_special_acc_lo, NULL); } else { assert(src_num == 1); err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits3.da16acc.src1_special_acc_lo, NULL); } return err; } /* * Three kinds of swizzle display: * identity - nothing printed * 1->all - print the single channel * 1->1 - print the mapping */ if (swz_x == GEN_CHANNEL_X && swz_y == GEN_CHANNEL_Y && swz_z == GEN_CHANNEL_Z && swz_w == GEN_CHANNEL_W) { ; } else if (swz_x == swz_y && swz_x == swz_z && swz_x == swz_w) { string(file, "."); err |= control(file, "channel select", chan_sel, swz_x, NULL); } else { string(file, "."); err |= control(file, "channel select", chan_sel, swz_x, NULL); err |= control(file, "channel select", chan_sel, swz_y, NULL); err |= control(file, "channel select", chan_sel, swz_z, NULL); err |= control(file, "channel select", chan_sel, swz_w, NULL); } err |= control(file, "src da16 reg type", reg_encoding, _reg_type, NULL); return err; } static int src0_3src(FILE *file, const void* inst) { int err = 0; uint32_t swz_x = (GEN_BITS_FIELD(inst, bits2.da3src.src0_swizzle) >> 0) & 0x3; uint32_t swz_y = (GEN_BITS_FIELD(inst, bits2.da3src.src0_swizzle) >> 2) & 0x3; uint32_t swz_z = (GEN_BITS_FIELD(inst, bits2.da3src.src0_swizzle) >> 4) & 0x3; uint32_t swz_w = (GEN_BITS_FIELD(inst, bits2.da3src.src0_swizzle) >> 6) & 0x3; err |= control(file, "negate", negate, GEN_BITS_FIELD(inst, bits1.da3src.src0_negate), NULL); err |= control(file, "abs", _abs, GEN_BITS_FIELD(inst, bits1.da3src.src0_abs), NULL); err |= reg(file, GEN_GENERAL_REGISTER_FILE, GEN_BITS_FIELD(inst, bits2.da3src.src0_reg_nr)); if (err == -1) return 0; if (GEN_BITS_FIELD(inst, bits2.da3src.src0_subreg_nr)) format(file, ".%d", GEN_BITS_FIELD(inst, bits2.da3src.src0_subreg_nr)); if (GEN_BITS_FIELD(inst, bits2.da3src.src0_rep_ctrl)) string(file, "<0,1,0>"); if (gen_version < 80) { err |= control(file, "src da16 reg type", reg_encoding, GEN_TYPE_F, NULL); } else { err |= control(file, "src da16 reg type", reg_encoding_3src, ((const union Gen8NativeInstruction *)inst)->bits1.da3src.src_type, NULL); } if (is_special_acc(inst)) { err |= control(file, "specialacc", special_acc, ((const 
union Gen8NativeInstruction *)inst)->bits2.da3srcacc.src0_special_acc, NULL); return err; } /* * Three kinds of swizzle display: * identity - nothing printed * 1->all - print the single channel * 1->1 - print the mapping */ if (swz_x == GEN_CHANNEL_X && swz_y == GEN_CHANNEL_Y && swz_z == GEN_CHANNEL_Z && swz_w == GEN_CHANNEL_W) { ; } else if (swz_x == swz_y && swz_x == swz_z && swz_x == swz_w) { string(file, "."); err |= control(file, "channel select", chan_sel, swz_x, NULL); } else { string(file, "."); err |= control(file, "channel select", chan_sel, swz_x, NULL); err |= control(file, "channel select", chan_sel, swz_y, NULL); err |= control(file, "channel select", chan_sel, swz_z, NULL); err |= control(file, "channel select", chan_sel, swz_w, NULL); } return err; } static int src1_3src(FILE *file, const void* inst) { int err = 0; uint32_t swz_x = (GEN_BITS_FIELD(inst, bits2.da3src.src1_swizzle) >> 0) & 0x3; uint32_t swz_y = (GEN_BITS_FIELD(inst, bits2.da3src.src1_swizzle) >> 2) & 0x3; uint32_t swz_z = (GEN_BITS_FIELD(inst, bits2.da3src.src1_swizzle) >> 4) & 0x3; uint32_t swz_w = (GEN_BITS_FIELD(inst, bits2.da3src.src1_swizzle) >> 6) & 0x3; uint32_t src1_subreg_nr = (GEN_BITS_FIELD(inst, bits2.da3src.src1_subreg_nr_low) | (GEN_BITS_FIELD(inst, bits3.da3src.src1_subreg_nr_high) << 2)); err |= control(file, "negate", negate, GEN_BITS_FIELD(inst, bits1.da3src.src1_negate), NULL); err |= control(file, "abs", _abs, GEN_BITS_FIELD(inst, bits1.da3src.src1_abs), NULL); err |= reg(file, GEN_GENERAL_REGISTER_FILE, GEN_BITS_FIELD(inst, bits3.da3src.src1_reg_nr)); if (err == -1) return 0; if (src1_subreg_nr) format(file, ".%d", src1_subreg_nr); if (GEN_BITS_FIELD(inst, bits2.da3src.src1_rep_ctrl)) string(file, "<0,1,0>"); if (gen_version < 80) { err |= control(file, "src da16 reg type", reg_encoding, GEN_TYPE_F, NULL); } else { err |= control(file, "src da16 reg type", reg_encoding_3src, ((const union Gen8NativeInstruction *)inst)->bits1.da3src.src_type, NULL); } if (is_special_acc(inst)) { err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits2.da3srcacc.src1_special_acc, NULL); return err; } /* * Three kinds of swizzle display: * identity - nothing printed * 1->all - print the single channel * 1->1 - print the mapping */ if (swz_x == GEN_CHANNEL_X && swz_y == GEN_CHANNEL_Y && swz_z == GEN_CHANNEL_Z && swz_w == GEN_CHANNEL_W) { ; } else if (swz_x == swz_y && swz_x == swz_z && swz_x == swz_w) { string(file, "."); err |= control(file, "channel select", chan_sel, swz_x, NULL); } else { string(file, "."); err |= control(file, "channel select", chan_sel, swz_x, NULL); err |= control(file, "channel select", chan_sel, swz_y, NULL); err |= control(file, "channel select", chan_sel, swz_z, NULL); err |= control(file, "channel select", chan_sel, swz_w, NULL); } return err; } static int src2_3src(FILE *file, const void* inst) { int err = 0; uint32_t swz_x = (GEN_BITS_FIELD(inst, bits3.da3src.src2_swizzle) >> 0) & 0x3; uint32_t swz_y = (GEN_BITS_FIELD(inst, bits3.da3src.src2_swizzle) >> 2) & 0x3; uint32_t swz_z = (GEN_BITS_FIELD(inst, bits3.da3src.src2_swizzle) >> 4) & 0x3; uint32_t swz_w = (GEN_BITS_FIELD(inst, bits3.da3src.src2_swizzle) >> 6) & 0x3; err |= control(file, "negate", negate, GEN_BITS_FIELD(inst, bits1.da3src.src2_negate), NULL); err |= control(file, "abs", _abs, GEN_BITS_FIELD(inst, bits1.da3src.src2_abs), NULL); err |= reg(file, GEN_GENERAL_REGISTER_FILE, GEN_BITS_FIELD(inst, bits3.da3src.src2_reg_nr)); if (err == -1) return 0; if (GEN_BITS_FIELD(inst, 
bits3.da3src.src2_subreg_nr)) format(file, ".%d", GEN_BITS_FIELD(inst, bits3.da3src.src2_subreg_nr)); if (GEN_BITS_FIELD(inst, bits3.da3src.src2_rep_ctrl)) string(file, "<0,1,0>"); if (gen_version < 80) { err |= control(file, "src da16 reg type", reg_encoding, GEN_TYPE_F, NULL); } else { err |= control(file, "src da16 reg type", reg_encoding_3src, ((const union Gen8NativeInstruction *)inst)->bits1.da3src.src_type, NULL); } if (is_special_acc(inst)) { err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits3.da3srcacc.src2_special_acc, NULL); return err; } /* * Three kinds of swizzle display: * identity - nothing printed * 1->all - print the single channel * 1->1 - print the mapping */ if (swz_x == GEN_CHANNEL_X && swz_y == GEN_CHANNEL_Y && swz_z == GEN_CHANNEL_Z && swz_w == GEN_CHANNEL_W) { ; } else if (swz_x == swz_y && swz_x == swz_z && swz_x == swz_w) { string(file, "."); err |= control(file, "channel select", chan_sel, swz_x, NULL); } else { string(file, "."); err |= control(file, "channel select", chan_sel, swz_x, NULL); err |= control(file, "channel select", chan_sel, swz_y, NULL); err |= control(file, "channel select", chan_sel, swz_z, NULL); err |= control(file, "channel select", chan_sel, swz_w, NULL); } return err; } static uint32_t __conv_half_to_float(uint16_t h) { struct __FP32 { uint32_t mantissa:23; uint32_t exponent:8; uint32_t sign:1; }; struct __FP16 { uint32_t mantissa:10; uint32_t exponent:5; uint32_t sign:1; }; uint32_t f; struct __FP32 o; memset(&o, 0, sizeof(o)); struct __FP16 i; memcpy(&i, &h, sizeof(uint16_t)); if (i.exponent == 0 && i.mantissa == 0) // (Signed) zero o.sign = i.sign; else { if (i.exponent == 0) { // Denormal (converts to normalized) // Adjust mantissa so it's normalized (and keep // track of exponent adjustment) int e = -1; uint m = i.mantissa; do { e++; m <<= 1; } while ((m & 0x400) == 0); o.mantissa = (m & 0x3ff) << 13; o.exponent = 127 - 15 - e; o.sign = i.sign; } else if (i.exponent == 0x1f) { // Inf/NaN // NOTE: Both can be handled with same code path // since we just pass through mantissa bits. 
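/* Added note (not part of the original source): this function widens IEEE
 * half to single precision. The 10-bit half mantissa is shifted left by 13
 * so it lands at the top of the 23-bit float mantissa field, and exponents
 * are re-biased from the FP16 bias of 15 to the FP32 bias of 127, hence the
 * "127 - 15" terms in the denormal and normalized branches. Worked example:
 * half 0x3C00 (sign 0, exponent 15, mantissa 0) maps to exponent
 * 127 - 15 + 15 = 127 with mantissa 0, i.e. 0x3F800000 = 1.0f. */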
o.mantissa = i.mantissa << 13; o.exponent = 255; o.sign = i.sign; } else { /* Normalized number */ o.mantissa = i.mantissa << 13; o.exponent = 127 - 15 + i.exponent; o.sign = i.sign; } } memcpy(&f, &o, sizeof(uint32_t)); return f; } static int imm(FILE *file, uint32_t type, const void* inst) { switch (type) { case GEN_TYPE_UD: format(file, "0x%xUD", GEN_BITS_FIELD(inst, bits3.ud)); break; case GEN_TYPE_D: format(file, "%dD", GEN_BITS_FIELD(inst, bits3.d)); break; case GEN_TYPE_UW: format(file, "0x%xUW", (uint16_t) GEN_BITS_FIELD(inst, bits3.ud)); break; case GEN_TYPE_W: format(file, "%dW", (int16_t) GEN_BITS_FIELD(inst, bits3.d)); break; case GEN_TYPE_UB: format(file, "0x%xUB", (uint8_t) GEN_BITS_FIELD(inst, bits3.ud)); break; case GEN_TYPE_VF: format(file, "Vector Float"); break; case GEN_TYPE_V: format(file, "0x%xV", GEN_BITS_FIELD(inst, bits3.ud)); break; case GEN_TYPE_F: format(file, "%-gF", GEN_BITS_FIELD_WITH_TYPE(inst, bits3.f, float)); break; case GEN_TYPE_UL: assert(!(gen_version < 80)); format(file, "0x%.8x %.8xUQ", (((const union Gen8NativeInstruction *)inst)->bits3).ud, (((const union Gen8NativeInstruction *)inst)->bits2).ud); break; case GEN_TYPE_L: { assert(!(gen_version < 80)); uint64_t val = (((const union Gen8NativeInstruction *)inst)->bits3).ud; val = (val << 32) + ((((const union Gen8NativeInstruction *)inst)->bits2).ud); format(file, "0x%llxQ", (unsigned long long) val); break; } case GEN_TYPE_HF_IMM: { uint16_t h = GEN_BITS_FIELD_WITH_TYPE(inst, bits3.d, uint16_t); uint32_t uf = __conv_half_to_float(h); float f; memcpy(&f, &uf, sizeof(float)); format(file, "%-gHF", f); break; } case GEN_TYPE_DF_IMM: { assert(!(gen_version < 80)); double val; uint32_t hi = (((const union Gen8NativeInstruction *)inst)->bits3).ud; uint32_t lo = (((const union Gen8NativeInstruction *)inst)->bits2).ud; memcpy((void *)(&val), &lo, sizeof(uint32_t)); memcpy(((void *)(&val) + sizeof(uint32_t)), &hi, sizeof(uint32_t)); format(file, "%f", val); break; } } return 0; } static int src0(FILE *file, const void* inst) { if (GEN_BITS_FIELD(inst, bits1.da1.src0_reg_file) == GEN_IMMEDIATE_VALUE) return imm(file, GEN_BITS_FIELD(inst, bits1.da1.src0_reg_type), inst); else if (ACCESS_MODE(inst) == GEN_ALIGN_1) { if (GEN_BITS_FIELD(inst, bits2.da1.src0_address_mode) == GEN_ADDRESS_DIRECT) { return src_da1(file, GEN_BITS_FIELD(inst, bits1.da1.src0_reg_type), GEN_BITS_FIELD(inst, bits1.da1.src0_reg_file), GEN_BITS_FIELD(inst, bits2.da1.src0_vert_stride), GEN_BITS_FIELD(inst, bits2.da1.src0_width), GEN_BITS_FIELD(inst, bits2.da1.src0_horiz_stride), GEN_BITS_FIELD(inst, bits2.da1.src0_reg_nr), GEN_BITS_FIELD(inst, bits2.da1.src0_subreg_nr), GEN_BITS_FIELD(inst, bits2.da1.src0_abs), GEN_BITS_FIELD(inst, bits2.da1.src0_negate)); } else { int32_t imm_off = GEN_BITS_FIELD(inst, bits2.ia1.src0_indirect_offset); if (gen_version >= 80) { imm_off = imm_off + ((((const union Gen8NativeInstruction *)inst)->bits2.ia1.src0_indirect_offset_9) << 9); } return src_ia1(file, GEN_BITS_FIELD(inst, bits1.ia1.src0_reg_type), GEN_BITS_FIELD(inst, bits1.ia1.src0_reg_file), imm_off, GEN_BITS_FIELD(inst, bits2.ia1.src0_subreg_nr), GEN_BITS_FIELD(inst, bits2.ia1.src0_negate), GEN_BITS_FIELD(inst, bits2.ia1.src0_abs), GEN_BITS_FIELD(inst, bits2.ia1.src0_address_mode), GEN_BITS_FIELD(inst, bits2.ia1.src0_horiz_stride), GEN_BITS_FIELD(inst, bits2.ia1.src0_width), GEN_BITS_FIELD(inst, bits2.ia1.src0_vert_stride)); } } else { if (GEN_BITS_FIELD(inst, bits2.da16.src0_address_mode) == GEN_ADDRESS_DIRECT) { return src_da16(file, inst, 0, GEN_BITS_FIELD(inst,
bits1.da16.src0_reg_type), GEN_BITS_FIELD(inst, bits1.da16.src0_reg_file), GEN_BITS_FIELD(inst, bits2.da16.src0_vert_stride), GEN_BITS_FIELD(inst, bits2.da16.src0_reg_nr), GEN_BITS_FIELD(inst, bits2.da16.src0_subreg_nr), GEN_BITS_FIELD(inst, bits2.da16.src0_abs), GEN_BITS_FIELD(inst, bits2.da16.src0_negate), GEN_BITS_FIELD(inst, bits2.da16.src0_swz_x), GEN_BITS_FIELD(inst, bits2.da16.src0_swz_y), GEN_BITS_FIELD(inst, bits2.da16.src0_swz_z), GEN_BITS_FIELD(inst, bits2.da16.src0_swz_w)); } else { string(file, "Indirect align16 address mode not supported"); return 1; } } } static int src1(FILE *file, const void* inst) { if (GEN_BITS_FIELD2(inst, bits1.da1.src1_reg_file, bits2.da1.src1_reg_file) == GEN_IMMEDIATE_VALUE) return imm(file, GEN_BITS_FIELD2(inst, bits1.da1.src1_reg_type, bits2.da1.src1_reg_type), inst); else if (ACCESS_MODE(inst) == GEN_ALIGN_1) { if (GEN_BITS_FIELD(inst, bits3.da1.src1_address_mode) == GEN_ADDRESS_DIRECT) { return src_da1(file, GEN_BITS_FIELD2(inst, bits1.da1.src1_reg_type, bits2.da1.src1_reg_type), GEN_BITS_FIELD2(inst, bits1.da1.src1_reg_file, bits2.da1.src1_reg_file), GEN_BITS_FIELD(inst, bits3.da1.src1_vert_stride), GEN_BITS_FIELD(inst, bits3.da1.src1_width), GEN_BITS_FIELD(inst, bits3.da1.src1_horiz_stride), GEN_BITS_FIELD(inst, bits3.da1.src1_reg_nr), GEN_BITS_FIELD(inst, bits3.da1.src1_subreg_nr), GEN_BITS_FIELD(inst, bits3.da1.src1_abs), GEN_BITS_FIELD(inst, bits3.da1.src1_negate)); } else { return src_ia1(file, GEN_BITS_FIELD2(inst, bits1.ia1.src1_reg_type, bits2.ia1.src1_reg_type), GEN_BITS_FIELD2(inst, bits1.ia1.src1_reg_file, bits2.ia1.src1_reg_file), GEN_BITS_FIELD(inst, bits3.ia1.src1_indirect_offset), GEN_BITS_FIELD(inst, bits3.ia1.src1_subreg_nr), GEN_BITS_FIELD(inst, bits3.ia1.src1_negate), GEN_BITS_FIELD(inst, bits3.ia1.src1_abs), GEN_BITS_FIELD(inst, bits3.ia1.src1_address_mode), GEN_BITS_FIELD(inst, bits3.ia1.src1_horiz_stride), GEN_BITS_FIELD(inst, bits3.ia1.src1_width), GEN_BITS_FIELD(inst, bits3.ia1.src1_vert_stride)); } } else { if (GEN_BITS_FIELD(inst, bits3.da16.src1_address_mode) == GEN_ADDRESS_DIRECT) { return src_da16(file, inst, 1, GEN_BITS_FIELD2(inst, bits1.da16.src1_reg_type, bits2.da16.src1_reg_type), GEN_BITS_FIELD2(inst, bits1.da16.src1_reg_file, bits2.da16.src1_reg_file), GEN_BITS_FIELD(inst, bits3.da16.src1_vert_stride), GEN_BITS_FIELD(inst, bits3.da16.src1_reg_nr), GEN_BITS_FIELD(inst, bits3.da16.src1_subreg_nr), GEN_BITS_FIELD(inst, bits3.da16.src1_abs), GEN_BITS_FIELD(inst, bits3.da16.src1_negate), GEN_BITS_FIELD(inst, bits3.da16.src1_swz_x), GEN_BITS_FIELD(inst, bits3.da16.src1_swz_y), GEN_BITS_FIELD(inst, bits3.da16.src1_swz_z), GEN_BITS_FIELD(inst, bits3.da16.src1_swz_w)); } else { string(file, "Indirect align16 address mode not supported"); return 1; } } } static const int esize[6] = { [0] = 1, [1] = 2, [2] = 4, [3] = 8, [4] = 16, [5] = 32, }; static int qtr_ctrl(FILE *file, const void* inst) { int qtr_ctl = QUARTER_CONTROL(inst); int exec_size = esize[EXECUTION_SIZE(inst)]; if (exec_size == 8) { switch (qtr_ctl) { case 0: string(file, " 1Q"); break; case 1: string(file, " 2Q"); break; case 2: string(file, " 3Q"); break; case 3: string(file, " 4Q"); break; } } else if (exec_size == 16) { if (qtr_ctl < 2) string(file, " 1H"); else string(file, " 2H"); } return 0; } int gen_disasm (FILE *file, const void *inst, uint32_t deviceID, uint32_t compacted) { int err = 0; int space = 0; if (IS_GEN7(deviceID)) { gen_version = 70; } else if (IS_GEN75(deviceID)) { gen_version = 75; } else if (IS_GEN8(deviceID)) { gen_version = 80; } 
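/* Added note (not part of the original source): gen_version encodes the GPU
 * generation times ten (70, 75, 80, 90). The GEN_BITS_FIELD* macros defined
 * above compare it against 80 to choose between the Gen7 and Gen8 native
 * instruction layouts when extracting fields. */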
else if (IS_GEN9(deviceID)) { gen_version = 90; } if (PRED_CTRL(inst)) { string(file, "("); err |= control(file, "predicate inverse", pred_inv, PRED_INV(inst), NULL); format(file, "f%d", FLAG_REG_NR(inst)); if (FLAG_SUB_REG_NR(inst)) format(file, ".%d", FLAG_SUB_REG_NR(inst)); if (ACCESS_MODE(inst) == GEN_ALIGN_1) err |= control(file, "predicate control align1", pred_ctrl_align1, PRED_CTRL(inst), NULL); else err |= control(file, "predicate control align16", pred_ctrl_align16, PRED_CTRL(inst), NULL); string(file, ") "); } err |= print_opcode(file, OPCODE(inst)); err |= control(file, "saturate", saturate, SATURATE(inst), NULL); err |= control(file, "debug control", debug_ctrl, DEBUG_CONTROL(inst), NULL); if (OPCODE(inst) == GEN_OPCODE_MATH) { string(file, " "); if (gen_version < 80) { err |= control(file, "function", math_function_gen7, MATH_FUNCTION(inst), &space); } else { err |= control(file, "function", math_function_gen8, MATH_FUNCTION(inst), &space); } } else if (OPCODE(inst) != GEN_OPCODE_SEND && OPCODE(inst) != GEN_OPCODE_SENDC && OPCODE(inst) != GEN_OPCODE_SENDS) { err |= control(file, "conditional modifier", conditional_modifier, COND_DST_OR_MODIFIER(inst), NULL); if (COND_DST_OR_MODIFIER(inst)) err |= flag_reg(file, FLAG_REG_NR(inst), FLAG_SUB_REG_NR(inst)); } if (OPCODE(inst) != GEN_OPCODE_NOP) { string(file, "("); err |= control(file, "execution size", exec_size, EXECUTION_SIZE(inst), NULL); string(file, ")"); } if (OPCODE(inst) == GEN_OPCODE_SENDS) { const union Gen9NativeInstruction *gen9_insn = (const union Gen9NativeInstruction *)inst; pad(file, 16); if (gen9_insn->bits1.sends.dest_reg_file_0 == 0) reg(file, GEN_ARCHITECTURE_REGISTER_FILE, gen9_insn->bits1.sends.dest_reg_nr); else format(file, "g%d", gen9_insn->bits1.sends.dest_reg_nr); pad(file, 32); format(file, "g%d(addLen:%d)", gen9_insn->bits2.sends.src0_reg_nr, GENERIC_MSG_LENGTH(inst)); pad(file, 48); format(file, "g%d(dataLen:%d)", gen9_insn->bits1.sends.src1_reg_nr, gen9_insn->bits2.sends.src1_length); pad(file, 64); format(file, "0x%08x", gen9_insn->bits3.ud); } else if (opcode[OPCODE(inst)].nsrc == 3) { pad(file, 16); err |= dest_3src(file, inst); pad(file, 32); err |= src0_3src(file, inst); pad(file, 48); err |= src1_3src(file, inst); pad(file, 64); err |= src2_3src(file, inst); } else { if (opcode[OPCODE(inst)].ndst > 0) { pad(file, 16); err |= dest(file, inst); } else if (OPCODE(inst) == GEN_OPCODE_IF || OPCODE(inst) == GEN_OPCODE_ELSE || OPCODE(inst) == GEN_OPCODE_ENDIF || OPCODE(inst) == GEN_OPCODE_WHILE || OPCODE(inst) == GEN_OPCODE_BRD || OPCODE(inst) == GEN_OPCODE_JMPI) { format(file, " %d", (int16_t)BRANCH_JIP(inst)); } else if (OPCODE(inst) == GEN_OPCODE_BREAK || OPCODE(inst) == GEN_OPCODE_CONTINUE || OPCODE(inst) == GEN_OPCODE_HALT || OPCODE(inst) == GEN_OPCODE_BRC) { format(file, " %d %d", BRANCH_JIP(inst), BRANCH_UIP(inst)); }/* else if (inst->header.opcode == GEN_OPCODE_JMPI) { format(file, " %d", inst->bits3.d); }*/ if (opcode[OPCODE(inst)].nsrc > 0) { pad(file, 32); err |= src0(file, inst); } if (opcode[OPCODE(inst)].nsrc > 1) { pad(file, 48); err |= src1(file, inst); } } if (OPCODE(inst) == GEN_OPCODE_SEND || OPCODE(inst) == GEN_OPCODE_SENDC || OPCODE(inst) == GEN_OPCODE_SENDS) { enum GenMessageTarget target = COND_DST_OR_MODIFIER(inst); newline(file); pad(file, 16); space = 0; if(gen_version >= 75) { err |= control(file, "target function", target_function_gen75, target, &space); } else { err |= control(file, "target function", target_function_gen7, target, &space); } int immbti = 0; if 
(OPCODE(inst) == GEN_OPCODE_SENDS) { const union Gen9NativeInstruction *gen9_insn = (const union Gen9NativeInstruction *)inst; immbti = !(gen9_insn->bits2.sends.sel_reg32_desc); } else immbti = (GEN_BITS_FIELD2(inst, bits1.da1.src1_reg_file, bits2.da1.src1_reg_file) == GEN_IMMEDIATE_VALUE); if (immbti) { switch (target) { case GEN_SFID_VIDEO_MOTION_EST: format(file, " (bti: %d, msg_type: %d)", VME_BTI(inst), VME_MSG_TYPE(inst)); break; case GEN_SFID_SAMPLER: format(file, " (%d, %d, %d, %d)", SAMPLE_BTI(inst), SAMPLER(inst), SAMPLER_MSG_TYPE(inst), SAMPLER_SIMD_MODE(inst)); break; case GEN_SFID_DATAPORT_RENDER: if(UNTYPED_RW_MSG_TYPE(inst) == 4 || UNTYPED_RW_MSG_TYPE(inst) == 10) format(file, " (bti: %d, %s, %s)", UNTYPED_RW_BTI(inst), data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)], data_port1_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)]); else format(file, " not implemented"); break; case GEN_SFID_DATAPORT_DATA: if(UNTYPED_RW_CATEGORY(inst) == 0) { if(UNTYPED_RW_MSG_TYPE(inst) == 5 || UNTYPED_RW_MSG_TYPE(inst) == 13) format(file, " (bti: %d, rgba: %d, %s, %s, %s)", UNTYPED_RW_BTI(inst), UNTYPED_RW_RGBA(inst), data_port_data_cache_simd_mode[UNTYPED_RW_SIMD_MODE(inst)], data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)], data_port_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)]); else if(UNTYPED_RW_MSG_TYPE(inst) == 4 || UNTYPED_RW_MSG_TYPE(inst) == 12) format(file, " (bti: %d, data size: %s, %s, %s, %s)", UNTYPED_RW_BTI(inst), data_port_data_cache_data_size[BYTE_RW_DATA_SIZE(inst)], data_port_data_cache_byte_scattered_simd_mode[BYTE_RW_SIMD_MODE(inst)], data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)], data_port_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)]); else if(UNTYPED_RW_MSG_TYPE(inst) == 0 || UNTYPED_RW_MSG_TYPE(inst) == 8) format(file, " (bti: %d, data size: %s, %s, %s)", UNTYPED_RW_BTI(inst), data_port_data_cache_block_size[OWORD_RW_BLOCK_SIZE(inst)], data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)], data_port_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)]); else if(UNTYPED_RW_MSG_TYPE(inst) == 6) format(file, " (bti: %d, rgba: %d, %s, %s, %s, %s)", UNTYPED_RW_BTI(inst), UNTYPED_RW_RGBA(inst), data_port_data_cache_simd_mode[UNTYPED_RW_SIMD_MODE(inst)], data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)], data_port_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)], atomic_opration_type[UNTYPED_RW_AOP_TYPE(inst)]); else format(file, " not implemented"); } else { format(file, " (addr: %d, blocks: %s, %s, mode: %s, %s)", SCRATCH_RW_OFFSET(inst), data_port_scratch_block_size[SCRATCH_RW_BLOCK_SIZE(inst)], data_port_scratch_invalidate[SCRATCH_RW_INVALIDATE_AFTER_READ(inst)], data_port_scratch_channel_mode[SCRATCH_RW_CHANNEL_MODE(inst)], data_port_scratch_msg_type[SCRATCH_RW_MSG_TYPE(inst)]); } break; case GEN_SFID_DATAPORT1_DATA: if(UNTYPED_RW_MSG_TYPE(inst) == 4 || UNTYPED_RW_MSG_TYPE(inst) == 10) format(file, " (bti: %d, %s, %s)", UNTYPED_RW_BTI(inst), data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)], data_port1_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)]); else if(UNTYPED_RW_MSG_TYPE(inst) == 2) format(file, " (bti: %d, rgba: %d, %s, %s, %s, %s)", UNTYPED_RW_BTI(inst), UNTYPED_RW_RGBA(inst), data_port_data_cache_simd_mode[UNTYPED_RW_SIMD_MODE(inst)], data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)], data_port1_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)], atomic_opration_type[UNTYPED_RW_AOP_TYPE(inst)]); else format(file, " (bti: %d, rgba: %d, %s, %s, %s)", UNTYPED_RW_BTI(inst), UNTYPED_RW_RGBA(inst), 
data_port_data_cache_simd_mode[UNTYPED_RW_SIMD_MODE(inst)], data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)], data_port1_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)]); break; case GEN_SFID_DATAPORT_CONSTANT: format(file, " (bti: %d, %s)", DWORD_RW_BTI(inst), data_port_data_cache_msg_type[DWORD_RW_MSG_TYPE(inst)]); break; case GEN_SFID_MESSAGE_GATEWAY: format(file, " (subfunc: %s, notify: %d, ackreq: %d)", gateway_sub_function[MSG_GW_SUBFUNC(inst)], MSG_GW_NOTIFY(inst), MSG_GW_ACKREQ(inst)); break; default: format(file, "unsupported target %d", target); break; } if (space) string(file, " "); format(file, "mlen %d", GENERIC_MSG_LENGTH(inst)); format(file, " rlen %d", GENERIC_RESPONSE_LENGTH(inst)); } } pad(file, 64); if (OPCODE(inst) != GEN_OPCODE_NOP) { string(file, "{"); space = 1; err |= control(file, "access mode", access_mode, ACCESS_MODE(inst), &space); err |= control(file, "write enable control", wectrl, MASK_CONTROL(inst), &space); err |= control(file, "dependency control", dep_ctrl, DEPENDENCY_CONTROL(inst), &space); err |= qtr_ctrl(file, inst); if (gen_version < 80) { err |= control(file, "thread control", thread_ctrl_gen7, THREAD_CONTROL(inst), &space); } else { err |= control(file, "thread control", thread_ctrl_gen8, THREAD_CONTROL(inst), &space); } err |= control(file, "acc write control", accwr, ACC_WR_CONTROL(inst), &space); if (OPCODE(inst) == GEN_OPCODE_SEND || OPCODE(inst) == GEN_OPCODE_SENDC) err |= control(file, "end of thread", end_of_thread, END_OF_THREAD(inst), &space); if(compacted) { string(file, " Compacted"); } if (space) string(file, " "); string(file, "}"); } string(file, ";"); newline(file); return err; } Beignet-1.3.2-Source/backend/src/backend/gen_encoder.hpp000664 001750 001750 00000035650 13161142102 022206 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* Copyright (C) Intel Corp. 2006. All Rights Reserved. Intel funded Tungsten Graphics (http://www.tungstengraphics.com) to develop this 3D driver. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. **********************************************************************/ /* * Authors: * Keith Whitwell */ #ifndef __GBE_GEN_ENCODER_HPP__ #define __GBE_GEN_ENCODER_HPP__ #include "backend/gen_defs.hpp" #include "backend/gen_register.hpp" #include "sys/platform.hpp" #include "sys/vector.hpp" #include #include "src/cl_device_data.h" namespace gbe { /*! Helper structure to emit Gen instructions */ class GenEncoder { public: /*! simdWidth is the default width for the instructions */ GenEncoder(uint32_t simdWidth, uint32_t gen, uint32_t deviceID); virtual ~GenEncoder(void) { } /*! Size of the stack (should be large enough) */ enum { MAX_STATE_NUM = 16 }; /*! Push the current instruction state */ void push(void); /*! Pop the latest pushed state */ void pop(void); /*! The instruction stream we are building */ vector store; /*! Current instruction state to use */ GenInstructionState curr; /*! State used to encode the instructions */ GenInstructionState stack[MAX_STATE_NUM]; /*! Number of states currently pushed */ uint32_t stateNum; /*! Gen generation to encode */ uint32_t gen; /*! Device ID */ uint32_t deviceID; /*! simd width for this codegen */ uint32_t simdWidth; DebugInfo DBGInfo; vector storedbg; void setDBGInfo(DebugInfo in, bool hasHigh); //////////////////////////////////////////////////////////////////////// // Encoding functions //////////////////////////////////////////////////////////////////////// #define ALU1(OP) void OP(GenRegister dest, GenRegister src0, uint32_t condition = 0); #define ALU2(OP) void OP(GenRegister dest, GenRegister src0, GenRegister src1); #define ALU2_MOD(OP) void OP(GenRegister dest, GenRegister src0, GenRegister src1, uint32_t condition = 0); #define ALU3(OP) void OP(GenRegister dest, GenRegister src0, GenRegister src1, GenRegister src2); ALU1(MOV) ALU1(FBH) ALU1(FBL) ALU1(CBIT) ALU2(SUBB) ALU1(RNDZ) ALU1(RNDE) ALU1(RNDD) ALU1(RNDU) ALU2(SEL) ALU1(NOT) ALU2_MOD(AND) ALU2_MOD(OR) ALU2_MOD(XOR) ALU2(SHR) ALU2(SHL) ALU2(RSR) ALU2(RSL) ALU2(ASR) ALU2(ADD) ALU2(ADDC) ALU2(MUL) ALU1(FRC) ALU2(MAC) ALU2(MACH) ALU1(LZD) ALU2(LINE) ALU2(PLN) ALU3(MAD) ALU3(LRP) ALU2(BRC) ALU1(BRD) ALU1(BFREV) #undef ALU1 #undef ALU2 #undef ALU2_MOD #undef ALU3 virtual void F16TO32(GenRegister dest, GenRegister src0); virtual void F32TO16(GenRegister dest, GenRegister src0); virtual void LOAD_INT64_IMM(GenRegister dest, GenRegister value); /*! Barrier message (to synchronize threads of a workgroup) */ void BARRIER(GenRegister src); /*! Forward the gateway message. */ void FWD_GATEWAY_MSG(GenRegister src, uint32_t notifyN = 0); /*! Memory fence message (to order loads and stores between threads) */ virtual void FENCE(GenRegister dst, bool flushRWCache); /*! Jump indexed instruction */ virtual void JMPI(GenRegister src, bool longjmp = false); /*! IF indexed instruction */ void IF(GenRegister src); /*! ELSE indexed instruction */ void ELSE(GenRegister src); /*! ENDIF indexed instruction */ void ENDIF(GenRegister src); /*! WHILE indexed instruction */ void WHILE(GenRegister src); /*! BRC indexed instruction */ void BRC(GenRegister src); /*! BRD indexed instruction */ void BRD(GenRegister src); /*! 
Compare instructions */ void CMP(uint32_t conditional, GenRegister src0, GenRegister src1, GenRegister dst = GenRegister::null()); /*! Select with embedded compare (like sel.le ...) */ void SEL_CMP(uint32_t conditional, GenRegister dst, GenRegister src0, GenRegister src1); /*! EOT is used to finish GPGPU threads */ void EOT(uint32_t msg_nr); /*! No-op */ void NOP(void); /*! Wait instruction (used for the barrier) */ void WAIT(uint32_t n = 0); /*! Atomic instructions */ virtual void ATOMIC(GenRegister dst, uint32_t function, GenRegister addr, GenRegister data, GenRegister bti, uint32_t srcNum, bool useSends); /*! AtomicA64 instructions */ virtual void ATOMICA64(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum); /*! Untyped read (upto 4 channels) */ virtual void UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum); /*! Untyped write (upto 4 channels) */ virtual void UNTYPED_WRITE(GenRegister addr, GenRegister data, GenRegister bti, uint32_t elemNum, bool useSends); /*! Untyped read A64(upto 4 channels) */ virtual void UNTYPED_READA64(GenRegister dst, GenRegister src, uint32_t elemNum); /*! Untyped write (upto 4 channels) */ virtual void UNTYPED_WRITEA64(GenRegister src, uint32_t elemNum); /*! Byte gather (for unaligned bytes, shorts and ints) */ void BYTE_GATHER(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemSize); /*! Byte scatter (for unaligned bytes, shorts and ints) */ virtual void BYTE_SCATTER(GenRegister addr, GenRegister data, GenRegister bti, uint32_t elemSize, bool useSends); /*! Byte gather a64 (for unaligned bytes, shorts and ints) */ virtual void BYTE_GATHERA64(GenRegister dst, GenRegister src, uint32_t elemSize); /*! Byte scatter a64 (for unaligned bytes, shorts and ints) */ virtual void BYTE_SCATTERA64(GenRegister src, uint32_t elemSize); /*! DWord gather (for constant cache read) */ void DWORD_GATHER(GenRegister dst, GenRegister src, uint32_t bti); /*! for scratch memory read */ void SCRATCH_READ(GenRegister msg, GenRegister dst, uint32_t offset, uint32_t size, uint32_t dst_num, uint32_t channel_mode); /*! for scratch memory write */ void SCRATCH_WRITE(GenRegister msg, uint32_t offset, uint32_t size, uint32_t src_num, uint32_t channel_mode); /*! Send instruction for the sampler */ virtual void SAMPLE(GenRegister dest, GenRegister msg, unsigned int msg_len, bool header_present, unsigned char bti, unsigned char sampler, unsigned int simdWidth, uint32_t writemask, uint32_t return_format, bool isLD, bool isUniform); void setSamplerMessage(GenNativeInstruction *insn, unsigned char bti, unsigned char sampler, uint32_t msg_type, uint32_t response_length, uint32_t msg_length, bool header_present, uint32_t simd_mode, uint32_t return_format); virtual void VME(unsigned char bti, GenRegister dest, GenRegister msg, uint32_t msg_type, uint32_t vme_search_path_lut, uint32_t lut_sub); void setVmeMessage(GenNativeInstruction *insn, unsigned char bti, uint32_t response_length, uint32_t msg_length, uint32_t msg_type, unsigned char vme_search_path_lut, unsigned char lut_sub); virtual void FLUSH_SAMPLERCACHE(GenRegister dst); /*! TypedWrite instruction for texture */ virtual void TYPED_WRITE(GenRegister header, GenRegister data, bool header_present, unsigned char bti, bool useSends); /*! Extended math function (2 sources) */ void MATH(GenRegister dst, uint32_t function, GenRegister src0, GenRegister src1); /*! Extended math function (1 source) */ void MATH(GenRegister dst, uint32_t function, GenRegister src); /*! 
Patch JMPI/BRC/BRD (located at index insnID) with the given jump distance */ virtual void patchJMPI(uint32_t insnID, int32_t jip, int32_t uip); //////////////////////////////////////////////////////////////////////// // Helper functions to encode //////////////////////////////////////////////////////////////////////// void setDPByteScatterGather(GenNativeInstruction *insn, uint32_t bti, uint32_t elem_size, uint32_t msg_type, uint32_t msg_length, uint32_t response_length); virtual void setDPUntypedRW(GenNativeInstruction *insn, uint32_t bti, uint32_t rgba, uint32_t msg_type, uint32_t msg_length, uint32_t response_length); virtual void setTypedWriteMessage(GenNativeInstruction *insn, unsigned char bti, unsigned char msg_type, uint32_t msg_length, bool header_present); void setMessageDescriptor(GenNativeInstruction *inst, enum GenMessageTarget sfid, unsigned msg_length, unsigned response_length, bool header_present = false, bool end_of_thread = false); virtual unsigned setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum); virtual unsigned setAtomicA64MessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum, int type_long); virtual unsigned setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum); virtual unsigned setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum); virtual unsigned setUntypedWriteSendsMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum); unsigned setByteGatherMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize); unsigned setByteScatterMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize); virtual unsigned setByteScatterSendsMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize); unsigned generateAtomicMessageDesc(unsigned function, unsigned bti, unsigned srcNum); unsigned generateUntypedReadMessageDesc(unsigned bti, unsigned elemNum); unsigned generateUntypedWriteMessageDesc(unsigned bti, unsigned elemNum); unsigned generateUntypedWriteSendsMessageDesc(unsigned bti, unsigned elemNum); unsigned generateByteGatherMessageDesc(unsigned bti, unsigned elemSize); unsigned generateByteScatterMessageDesc(unsigned bti, unsigned elemSize); unsigned generateByteScatterSendsMessageDesc(unsigned bti, unsigned elemSize); virtual void setHeader(GenNativeInstruction *insn) = 0; virtual void setDst(GenNativeInstruction *insn, GenRegister dest) = 0; virtual void setSrc0(GenNativeInstruction *insn, GenRegister reg) = 0; virtual void setSrc1(GenNativeInstruction *insn, GenRegister reg) = 0; GenCompactInstruction *nextCompact(uint32_t opcode); virtual uint32_t getCompactVersion() { return 7; } GenNativeInstruction *next(uint32_t opcode); uint32_t n_instruction(void) const { return store.size(); } virtual bool canHandleLong(uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1 = GenRegister::null()); virtual void handleDouble(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1 = GenRegister::null()); /*! OBlock helper function */ uint32_t getOBlockSize(uint32_t oword_size, bool low_half = true); void setMBlockRW(GenNativeInstruction *insn, uint32_t bti, uint32_t msg_type, uint32_t msg_length, uint32_t response_length); void setOBlockRW(GenNativeInstruction *insn, uint32_t bti, uint32_t block_size, uint32_t msg_type, uint32_t msg_length, uint32_t response_lengtha); /*! 
OBlock read */ void OBREAD(GenRegister dst, GenRegister header, uint32_t bti, uint32_t ow_size); /*! OBlock write */ virtual void OBWRITE(GenRegister header, GenRegister data, uint32_t bti, uint32_t ow_size, bool useSends); /*! MBlock read */ virtual void MBREAD(GenRegister dst, GenRegister header, uint32_t bti, uint32_t response_size); /*! MBlock write */ virtual void MBWRITE(GenRegister header, GenRegister data, uint32_t bti, uint32_t data_size, bool useSends); /*! A64 OBlock read */ virtual void OBREADA64(GenRegister dst, GenRegister header, uint32_t bti, uint32_t ow_size); /*! A64 OBlock write */ virtual void OBWRITEA64(GenRegister header, uint32_t bti, uint32_t ow_size); GBE_CLASS(GenEncoder); //!< Use custom allocators virtual void alu3(uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1, GenRegister src2) = 0; }; void alu1(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src, uint32_t condition = 0); void alu2(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1, uint32_t condition = 0); } /* namespace gbe */ #endif /* __GBE_GEN_ENCODER_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen_insn_selection_optimize.cpp000664 001750 001750 00000027536 13173554000 025531 0ustar00yryr000000 000000 #include "backend/gen_insn_selection.hpp" #include "backend/gen_context.hpp" #include "ir/function.hpp" #include "ir/liveness.hpp" #include "ir/profile.hpp" #include "sys/cvar.hpp" #include "sys/vector.hpp" #include #include #include #include namespace gbe { //helper functions static uint32_t CalculateElements(const GenRegister& reg, uint32_t execWidth) { uint32_t elements = 0; uint32_t elementSize = typeSize(reg.type); uint32_t width = GenRegister::width_size(reg); // reg may be other insn's source, this insn's width don't force large then execWidth. //assert(execWidth >= width); uint32_t height = execWidth / width; uint32_t vstride = GenRegister::vstride_size(reg); uint32_t hstride = GenRegister::hstride_size(reg); uint32_t base = reg.nr * GEN_REG_SIZE + reg.subnr; for (uint32_t i = 0; i < height; ++i) { uint32_t offsetInByte = base; for (uint32_t j = 0; j < width; ++j) { uint32_t offsetInType = offsetInByte / elementSize; //it is possible that offsetInType > 32, it doesn't matter even elements is 32 bit. 
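/* (Editor's worked example for this function, applying the formula above: a direct SIMD8 float region r2.0<8;8,1>:f has base = 2 * GEN_REG_SIZE + 0 = 64 bytes and elementSize = 4, so with width 8 and hstride 1 the inner loop visits offsetInType = 16, 17, ..., 23 and `elements` ends up as 0x00ff0000 -- one bit per dword slot the region touches.) */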
//the reason offsetInType > 32 doesn't matter is that if one instruction spans several registers,
//the other registers' visit pattern is the same as the first register's when the vstride is normal (width * hstride)
assert(vstride == width * hstride); elements |= (1 << offsetInType); offsetInByte += hstride * elementSize; } base += vstride * elementSize; } return elements; }
class SelOptimizer { public: SelOptimizer(const GenContext& ctx, uint32_t features) : ctx(ctx), features(features) {} virtual void run() = 0; virtual ~SelOptimizer() {} protected: const GenContext &ctx; //in case we need it
uint32_t features; };
class SelBasicBlockOptimizer : public SelOptimizer { public: SelBasicBlockOptimizer(const GenContext& ctx, const ir::Liveness::LiveOut& liveout, uint32_t features, SelectionBlock &bb) : SelOptimizer(ctx, features), bb(bb), liveout(liveout), optimized(false) { } ~SelBasicBlockOptimizer() {} virtual void run(); private: // local copy propagation
class ReplaceInfo { public: ReplaceInfo(SelectionInstruction& insn, const GenRegister& intermedia, const GenRegister& replacement) : insn(insn), intermedia(intermedia), replacement(replacement) { assert(insn.opcode == SEL_OP_MOV || insn.opcode == SEL_OP_ADD); assert(&(insn.dst(0)) == &intermedia); this->elements = CalculateElements(intermedia, insn.state.execWidth); replacementOverwritten = false; } ~ReplaceInfo() { this->toBeReplaceds.clear(); } SelectionInstruction& insn; const GenRegister& intermedia; uint32_t elements; const GenRegister& replacement; set<GenRegister*> toBeReplaceds; bool replacementOverwritten; GBE_CLASS(ReplaceInfo); };
typedef map<ir::Register, ReplaceInfo*> ReplaceInfoMap; ReplaceInfoMap replaceInfoMap;
void doLocalCopyPropagation(); void addToReplaceInfoMap(SelectionInstruction& insn); void changeInsideReplaceInfoMap(const SelectionInstruction& insn, GenRegister& var); void removeFromReplaceInfoMap(const SelectionInstruction& insn, const GenRegister& var); void doReplacement(ReplaceInfo* info); bool CanBeReplaced(const ReplaceInfo* info, const SelectionInstruction& insn, const GenRegister& var); void cleanReplaceInfoMap(); void doNegAddOptimization(SelectionInstruction &insn); SelectionBlock &bb; const ir::Liveness::LiveOut& liveout; bool optimized; static const size_t MaxTries = 1; //the max number of optimization rounds to try
};
void SelBasicBlockOptimizer::doReplacement(ReplaceInfo* info) { for (GenRegister* reg : info->toBeReplaceds) { GenRegister::propagateRegister(*reg, info->replacement); } bb.insnList.erase(&(info->insn)); optimized = true; }
void SelBasicBlockOptimizer::cleanReplaceInfoMap() { for (auto& pair : replaceInfoMap) { ReplaceInfo* info = pair.second; doReplacement(info); delete info; } replaceInfoMap.clear(); }
void SelBasicBlockOptimizer::removeFromReplaceInfoMap(const SelectionInstruction& insn, const GenRegister& var) { for (ReplaceInfoMap::iterator pos = replaceInfoMap.begin(); pos != replaceInfoMap.end(); ++pos) { ReplaceInfo* info = pos->second; if (info->intermedia.reg() == var.reg()) { //intermedia is overwritten
if (info->intermedia.quarter == var.quarter && info->intermedia.subnr == var.subnr && info->intermedia.nr == var.nr) { // We need to check whether the intermedia is fully overwritten; it may be in some predication state.
if (CanBeReplaced(info, insn, var)) doReplacement(info); } replaceInfoMap.erase(pos); delete info; return; } if (info->replacement.reg() == var.reg()) { //replacement is overwritten
//there could be more than one replacement (with different physical subnr) overwritten,
//so do not break here; we need to scan the whole map.
//here is an example:
// mov %10, %9.0
// mov %11, %9.1
// ...
// mov %9, %8
//both %9.0 and %9.1 are collected into replacement in the ReplaceInfoMap after the first two insts are scanned.
//when scanning the last inst, where %9 is overwritten, we should flag both %9.0 and %9.1 in the map.
info->replacementOverwritten = true; } } }
void SelBasicBlockOptimizer::addToReplaceInfoMap(SelectionInstruction& insn) { assert(insn.opcode == SEL_OP_MOV || insn.opcode == SEL_OP_ADD); GenRegister &src = insn.src(0); if (insn.opcode == SEL_OP_ADD) { if (src.file == GEN_IMMEDIATE_VALUE) src = insn.src(1); } const GenRegister& dst = insn.dst(0); if (src.type != dst.type || src.file != dst.file) return; if (src.hstride != GEN_HORIZONTAL_STRIDE_0 && src.hstride != dst.hstride ) return; if (liveout.find(dst.reg()) != liveout.end()) return; ReplaceInfo* info = new ReplaceInfo(insn, dst, src); replaceInfoMap[dst.reg()] = info; }
bool SelBasicBlockOptimizer::CanBeReplaced(const ReplaceInfo* info, const SelectionInstruction& insn, const GenRegister& var) { //some conditions here are very strict, while others are quite loose;
//the reason is that I'm unable to find a perfect condition in this first version,
//so the conditions need refining while debugging/optimizing real kernels
if (insn.opcode == SEL_OP_BSWAP) //should be removed once the bswap issue is fixed
return false;
//the src modifier is not supported by the following instructions
if(info->replacement.negation || info->replacement.absolute) { switch(insn.opcode) { case SEL_OP_MATH: { switch(insn.extra.function) { case GEN_MATH_FUNCTION_INT_DIV_QUOTIENT: case GEN_MATH_FUNCTION_INT_DIV_REMAINDER: case GEN_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER: return false; default: break; } break; } case SEL_OP_CBIT: case SEL_OP_FBH: case SEL_OP_FBL: case SEL_OP_BRC: case SEL_OP_BRD: case SEL_OP_BFREV: case SEL_OP_LZD: case SEL_OP_HADD: case SEL_OP_RHADD: return false; default: break; } }
if (insn.isWrite() || insn.isRead()) //register in selection vector
return false;
if (features & SIOF_LOGICAL_SRCMOD) if ((insn.opcode == SEL_OP_AND || insn.opcode == SEL_OP_NOT || insn.opcode == SEL_OP_OR || insn.opcode == SEL_OP_XOR) && (info->replacement.absolute || info->replacement.negation)) return false;
if (features & SIOF_OP_MOV_LONG_REG_RESTRICT && insn.opcode == SEL_OP_MOV) { const GenRegister& dst = insn.dst(0); if (dst.isint64() && !info->replacement.isint64() && info->elements != CalculateElements(info->replacement, insn.state.execWidth)) return false; }
if (info->replacementOverwritten) return false;
if (info->insn.state.noMask == 0 && insn.state.noMask == 1) return false;
// If insn is in no predication state, it will fully overwrite the info insn.
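/* (Editor's sketch of the predication hazard checked next, with hypothetical virtual registers: a copy collected from "(+f0.0) mov %10, %9" must not feed a use guarded by a different flag, e.g. "(+f0.1) add %11, %10, %12" -- channels where f0.0 was clear never received %9. So the two predicates must match, unless the later instruction is unpredicated and fully overwrites the intermedia, as the comment above notes.) */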
if (info->insn.state.predicate != insn.state.predicate && insn.state.predicate != GEN_PREDICATE_NONE) return false;
if (info->insn.state.inversePredicate != insn.state.inversePredicate) return false;
if (info->intermedia.type == var.type && info->intermedia.quarter == var.quarter && info->intermedia.subnr == var.subnr && info->intermedia.nr == var.nr) { uint32_t elements = CalculateElements(var, insn.state.execWidth); //considering width, hstride, vstride and execWidth
if (info->elements == elements) return true; } return false; }
void SelBasicBlockOptimizer::changeInsideReplaceInfoMap(const SelectionInstruction& insn, GenRegister& var) { ReplaceInfoMap::iterator it = replaceInfoMap.find(var.reg()); if (it != replaceInfoMap.end()) { //same ir register
ReplaceInfo* info = it->second; if (CanBeReplaced(info, insn, var)) { info->toBeReplaceds.insert(&var); } else { //if it is the same ir register but it could not be replaced for some reason,
//that means we could not remove the MOV instruction, so there is no replacement
//and we remove the info for this case.
replaceInfoMap.erase(it); delete info; } } }
void SelBasicBlockOptimizer::doLocalCopyPropagation() { for (SelectionInstruction &insn : bb.insnList) { for (uint8_t i = 0; i < insn.srcNum; ++i) changeInsideReplaceInfoMap(insn, insn.src(i)); for (uint8_t i = 0; i < insn.dstNum; ++i) removeFromReplaceInfoMap(insn, insn.dst(i)); if (insn.opcode == SEL_OP_MOV) addToReplaceInfoMap(insn); doNegAddOptimization(insn); } cleanReplaceInfoMap(); }
/* LLVM transforms mad(a, -b, c) into: add b, -b, 0 mad val, a, b, c Since Gen supports the negation source modifier, mad(a, -b, c) is natively supported, and the same applies to similar instruction sequences. Treat the add just like: mov b, -b so it is a MOV operation handled like local copy propagation. */
void SelBasicBlockOptimizer::doNegAddOptimization(SelectionInstruction &insn) { if (insn.opcode == SEL_OP_ADD) { GenRegister src0 = insn.src(0); GenRegister src1 = insn.src(1); if ((src0.negation && src1.file == GEN_IMMEDIATE_VALUE && src1.value.f == 0.0f) || (src1.negation && src0.file == GEN_IMMEDIATE_VALUE && src0.value.f == 0.0f)) addToReplaceInfoMap(insn); } }
void SelBasicBlockOptimizer::run() { for (size_t i = 0; i < MaxTries; ++i) { optimized = false; doLocalCopyPropagation(); //doOtherLocalOptimization();
if (!optimized) break; //stop, since no optimization was found in this round
} }
class SelGlobalOptimizer : public SelOptimizer { public: SelGlobalOptimizer(const GenContext& ctx, uint32_t features) : SelOptimizer(ctx, features) {} ~SelGlobalOptimizer() {} virtual void run(); };
void SelGlobalOptimizer::run() { }
void Selection::optimize() { //do basic block level optimization
for (SelectionBlock &block : *blockList) { SelBasicBlockOptimizer bbopt(getCtx(), getCtx().getLiveOut(block.bb), opt_features, block); bbopt.run(); } //do global optimization
}
void Selection::addID() { uint32_t insnID = 0; for (auto &block : *blockList) for (auto &insn : block.insnList) { insn.ID = insnID; insnID += 2; } } } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/backend/gen75_encoder.hpp000664 001750 001750 00000005135 13161142102 022355 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version.
* * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file gen75_context.hpp */ #ifndef __GBE_GEN75_ENCODER_HPP__ #define __GBE_GEN75_ENCODER_HPP__ #include "backend/gen_encoder.hpp" #include "backend/gen7_encoder.hpp" namespace gbe { /* This class is used to implement the HSW specific logic for encoder. */ class Gen75Encoder : public Gen7Encoder { public: virtual ~Gen75Encoder(void) { } Gen75Encoder(uint32_t simdWidth, uint32_t gen, uint32_t deviceID) : Gen7Encoder(simdWidth, gen, deviceID) { } /*! Jump indexed instruction */ virtual void JMPI(GenRegister src, bool longjmp = false); /*! Patch JMPI/BRC/BRD (located at index insnID) with the given jump distance */ virtual void patchJMPI(uint32_t insnID, int32_t jip, int32_t uip); virtual void ATOMIC(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum, bool useSends); virtual void UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum); virtual void UNTYPED_WRITE(GenRegister src, GenRegister data, GenRegister bti, uint32_t elemNum, bool useSends); virtual void setHeader(GenNativeInstruction *insn); virtual void setDPUntypedRW(GenNativeInstruction *insn, uint32_t bti, uint32_t rgba, uint32_t msg_type, uint32_t msg_length, uint32_t response_length); virtual void setTypedWriteMessage(GenNativeInstruction *insn, unsigned char bti, unsigned char msg_type, uint32_t msg_length, bool header_present); virtual unsigned setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum); virtual unsigned setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum); virtual unsigned setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum); }; } #endif /* __GBE_GEN75_ENCODER_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen_program.cpp000664 001750 001750 00000057516 13173554000 022245 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ /** * \file program.cpp * \author Benjamin Segovia */ #ifdef GBE_COMPILER_AVAILABLE #include "llvm/Config/llvm-config.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IR/Module.h" #include "llvm/IR/DataLayout.h" #include "llvm-c/Linker.h" #include "llvm-c/BitReader.h" #include "llvm-c/BitWriter.h" #include "llvm/Transforms/Utils/Cloning.h" #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 #include "llvm/Bitcode/BitcodeWriter.h" #else #include "llvm/Bitcode/ReaderWriter.h" #endif /* LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 */ #include "llvm/Support/raw_ostream.h" #include "llvm/ADT/StringRef.h" #include "llvm/Support/MemoryBuffer.h" #include "llvm/Support/SourceMgr.h" #include "llvm/IRReader/IRReader.h" #endif #include "backend/program.h" #include "backend/gen_program.h" #include "backend/gen_program.hpp" #include "backend/gen_context.hpp" #include "backend/gen75_context.hpp" #include "backend/gen8_context.hpp" #include "backend/gen9_context.hpp" #include "backend/gen_defs.hpp" #include "backend/gen/gen_mesa_disasm.h" #include "backend/gen_reg_allocation.hpp" #include "ir/unit.hpp" #ifdef GBE_COMPILER_AVAILABLE #include "llvm/llvm_to_gen.hpp" #include "llvm/llvm_gen_backend.hpp" #include #endif #include "sys/cvar.hpp" #include #include #include #include #include #include #include namespace gbe { GenKernel::GenKernel(const std::string &name, uint32_t deviceID) : Kernel(name), deviceID(deviceID), insns(NULL), insnNum(0) {} GenKernel::~GenKernel(void) { GBE_SAFE_DELETE_ARRAY(insns); } const char *GenKernel::getCode(void) const { return (const char*) insns; } void GenKernel::setCode(const char * ins, size_t size) { insns = (GenInstruction *)ins; insnNum = size / sizeof(GenInstruction); } uint32_t GenKernel::getCodeSize(void) const { return insnNum * sizeof(GenInstruction); } void GenKernel::printStatus(int indent, std::ostream& outs) { #ifdef GBE_COMPILER_AVAILABLE Kernel::printStatus(indent, outs); FILE *f = fopen("/dev/null", "w"); if(!f) { outs << "could not open /dev/null !"; return; } char *buf = new char[4096]; setbuffer(f, buf, 4096); GenCompactInstruction * pCom = NULL; GenInstruction insn[2]; uint32_t insn_version = 0; if (IS_GEN7(deviceID) || IS_GEN75(deviceID)) insn_version = 7; else if (IS_GEN8(deviceID) || IS_GEN9(deviceID)) insn_version = 8; for (uint32_t i = 0; i < insnNum;) { pCom = (GenCompactInstruction*)(insns+i); if(pCom->bits1.cmpt_control == 1) { decompactInstruction(pCom, &insn, insn_version); gen_disasm(f, &insn, deviceID, 1); i++; } else { gen_disasm(f, insns+i, deviceID, 0); i = i + 2; } outs << buf; fflush(f); setbuffer(f, NULL, 0); setbuffer(f, buf, 4096); } setbuffer(f, NULL, 0); delete [] buf; fclose(f); #endif } void GenProgram::CleanLlvmResource(void){ #ifdef GBE_COMPILER_AVAILABLE llvm::LLVMContext* ctx = NULL; if(module){ ctx = &((llvm::Module*)module)->getContext(); (void)ctx; delete (llvm::Module*)module; module = NULL; } //llvm's version < 3.9, ctx is global ctx, can't be deleted. #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 //each module's context is individual, just delete it, ignaor llvm_ctx. if (ctx != NULL) delete ctx; #else if(llvm_ctx){ delete (llvm::LLVMContext*)llvm_ctx; llvm_ctx = NULL; } #endif #endif } /*! 
We must avoid spilling at all cost with Gen */
struct CodeGenStrategy { uint32_t simdWidth; uint32_t reservedSpillRegs; bool limitRegisterPressure; };
static const struct CodeGenStrategy codeGenStrategyDefault[] = { {16, 0, false}, {8, 0, false}, {8, 8, false}, {8, 16, false}, };
static const struct CodeGenStrategy codeGenStrategySimd16[] = { {16, 0, false}, {16, 8, false}, {16, 16, false}, };
IVAR(OCL_SIMD_WIDTH, 8, 15, 16);
Kernel *GenProgram::compileKernel(const ir::Unit &unit, const std::string &name, bool relaxMath, int profiling) { #ifdef GBE_COMPILER_AVAILABLE
// Be careful when the simdWidth is forced by the programmer. We can see it
// when the function already provides the simd width we need to use (i.e.
// non-zero)
const ir::Function *fn = unit.getFunction(name); const struct CodeGenStrategy* codeGenStrategy = codeGenStrategyDefault; if(fn == NULL) GBE_ASSERT(0); uint32_t codeGenNum = sizeof(codeGenStrategyDefault) / sizeof(codeGenStrategyDefault[0]); uint32_t codeGen = 0; GenContext *ctx = NULL;
if ( fn->getSimdWidth() != 0 && OCL_SIMD_WIDTH != 15) { GBE_ASSERTM(0, "unsupported SIMD width!"); }else if (fn->getSimdWidth() == 8 || OCL_SIMD_WIDTH == 8) { codeGen = 1; } else if (fn->getSimdWidth() == 16 || OCL_SIMD_WIDTH == 16){ codeGenStrategy = codeGenStrategySimd16; codeGenNum = sizeof(codeGenStrategySimd16) / sizeof(codeGenStrategySimd16[0]); } else if (fn->getSimdWidth() == 0 && OCL_SIMD_WIDTH == 15) { codeGen = 0; } else GBE_ASSERTM(0, "unsupported SIMD width!");
Kernel *kernel = NULL;
// Stop when compilation is successful
if (IS_IVYBRIDGE(deviceID)) { ctx = GBE_NEW(GenContext, unit, name, deviceID, relaxMath); } else if (IS_HASWELL(deviceID)) { ctx = GBE_NEW(Gen75Context, unit, name, deviceID, relaxMath); } else if (IS_BROADWELL(deviceID)) { ctx = GBE_NEW(Gen8Context, unit, name, deviceID, relaxMath); } else if (IS_CHERRYVIEW(deviceID)) { ctx = GBE_NEW(ChvContext, unit, name, deviceID, relaxMath); } else if (IS_SKYLAKE(deviceID)) { ctx = GBE_NEW(Gen9Context, unit, name, deviceID, relaxMath); } else if (IS_BROXTON(deviceID)) { ctx = GBE_NEW(BxtContext, unit, name, deviceID, relaxMath); } else if (IS_KABYLAKE(deviceID)) { ctx = GBE_NEW(KblContext, unit, name, deviceID, relaxMath); } else if (IS_GEMINILAKE(deviceID)) { ctx = GBE_NEW(GlkContext, unit, name, deviceID, relaxMath); }
GBE_ASSERTM(ctx != NULL, "Failed to create the gen context\n");
if (profiling) { ctx->setProfilingMode(true); unit.getProfilingInfo()->setDeviceID(deviceID); }
ctx->setASMFileName(this->asm_file_name);
for (; codeGen < codeGenNum; ++codeGen) { const uint32_t simdWidth = codeGenStrategy[codeGen].simdWidth; const bool limitRegisterPressure = codeGenStrategy[codeGen].limitRegisterPressure; const uint32_t reservedSpillRegs = codeGenStrategy[codeGen].reservedSpillRegs;
// Force the SIMD width now and try to compile
ir::Function *simdFn = unit.getFunction(name); if(simdFn == NULL) GBE_ASSERT(0); simdFn->setSimdWidth(simdWidth); ctx->startNewCG(simdWidth, reservedSpillRegs, limitRegisterPressure); kernel = ctx->compileKernel();
if (kernel != NULL) { GBE_ASSERT(ctx->getErrCode() == NO_ERROR); kernel->setOclVersion(unit.getOclVersion()); break; }
simdFn->getImageSet()->clearInfo();
// If we get an out-of-range if/endif error,
// we need to set the context to if/endif fix mode and restart the previous compile.
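/* (Editor's note on the retry loop above: codeGen indexes the strategy table, so the default walk is {SIMD16, 0 spill regs} -> {SIMD8, 0} -> {SIMD8, 8} -> {SIMD8, 16}; the codeGen-- below replays the entry that just failed, this time with the IF/ENDIF fix enabled, and the assert guards against a second failure with the fix already on.) */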
if ( ctx->getErrCode() == OUT_OF_RANGE_IF_ENDIF && !ctx->getIFENDIFFix() ) { ctx->setIFENDIFFix(true); codeGen--; } else GBE_ASSERT(!(ctx->getErrCode() == OUT_OF_RANGE_IF_ENDIF && ctx->getIFENDIFFix())); }
//GBE_ASSERTM(kernel != NULL, "Fail to compile kernel, may need to increase reserved registers for spilling.");
return kernel; #else return NULL; #endif }
#define GEN_BINARY_HEADER_LENGTH 8
enum GEN_BINARY_HEADER_INDEX { GBHI_BYT = 0, GBHI_IVB = 1, GBHI_HSW = 2, GBHI_CHV = 3, GBHI_BDW = 4, GBHI_SKL = 5, GBHI_BXT = 6, GBHI_KBL = 7, GBHI_GLK = 8, GBHI_MAX, };
#define GEN_BINARY_VERSION 1
static const unsigned char gen_binary_header[GBHI_MAX][GEN_BINARY_HEADER_LENGTH]= \ {{GEN_BINARY_VERSION, 'G','E', 'N', 'C', 'B', 'Y', 'T'}, {GEN_BINARY_VERSION, 'G','E', 'N', 'C', 'I', 'V', 'B'}, {GEN_BINARY_VERSION, 'G','E', 'N', 'C', 'H', 'S', 'W'}, {GEN_BINARY_VERSION, 'G','E', 'N', 'C', 'C', 'H', 'V'}, {GEN_BINARY_VERSION, 'G','E', 'N', 'C', 'B', 'D', 'W'}, {GEN_BINARY_VERSION, 'G','E', 'N', 'C', 'S', 'K', 'L'}, {GEN_BINARY_VERSION, 'G','E', 'N', 'C', 'B', 'X', 'T'}, {GEN_BINARY_VERSION, 'G','E', 'N', 'C', 'K', 'B', 'T'}, {GEN_BINARY_VERSION, 'G','E', 'N', 'C', 'G', 'L', 'K'} };
#define FILL_GEN_HEADER(binary, index) do {int i = 0; do {*(binary+i) = gen_binary_header[index][i]; i++; }while(i < GEN_BINARY_HEADER_LENGTH);}while(0)
#define FILL_BYT_HEADER(binary) FILL_GEN_HEADER(binary, GBHI_BYT)
#define FILL_IVB_HEADER(binary) FILL_GEN_HEADER(binary, GBHI_IVB)
#define FILL_HSW_HEADER(binary) FILL_GEN_HEADER(binary, GBHI_HSW)
#define FILL_CHV_HEADER(binary) FILL_GEN_HEADER(binary, GBHI_CHV)
#define FILL_BDW_HEADER(binary) FILL_GEN_HEADER(binary, GBHI_BDW)
#define FILL_SKL_HEADER(binary) FILL_GEN_HEADER(binary, GBHI_SKL)
#define FILL_BXT_HEADER(binary) FILL_GEN_HEADER(binary, GBHI_BXT)
#define FILL_KBL_HEADER(binary) FILL_GEN_HEADER(binary, GBHI_KBL)
#define FILL_GLK_HEADER(binary) FILL_GEN_HEADER(binary, GBHI_GLK)
static bool genHeaderCompare(const unsigned char *BufPtr, GEN_BINARY_HEADER_INDEX index) { bool matched = true; for (int i = 1; i < GEN_BINARY_HEADER_LENGTH; ++i) { matched = matched && (BufPtr[i] == gen_binary_header[index][i]); } if(matched) { if(BufPtr[0] != gen_binary_header[index][0]) { std::cout << "Beignet binary format has been changed, please generate binary again.\n"; matched = false; } } return matched; }
#define MATCH_BYT_HEADER(binary) genHeaderCompare(binary, GBHI_BYT)
#define MATCH_IVB_HEADER(binary) genHeaderCompare(binary, GBHI_IVB)
#define MATCH_HSW_HEADER(binary) genHeaderCompare(binary, GBHI_HSW)
#define MATCH_CHV_HEADER(binary) genHeaderCompare(binary, GBHI_CHV)
#define MATCH_BDW_HEADER(binary) genHeaderCompare(binary, GBHI_BDW)
#define MATCH_SKL_HEADER(binary) genHeaderCompare(binary, GBHI_SKL)
#define MATCH_BXT_HEADER(binary) genHeaderCompare(binary, GBHI_BXT)
#define MATCH_KBL_HEADER(binary) genHeaderCompare(binary, GBHI_KBL)
#define MATCH_GLK_HEADER(binary) genHeaderCompare(binary, GBHI_GLK)
#define MATCH_DEVICE(deviceID, binary) ((IS_IVYBRIDGE(deviceID) && MATCH_IVB_HEADER(binary)) || \ (IS_BAYTRAIL_T(deviceID) && MATCH_BYT_HEADER(binary)) || \ (IS_HASWELL(deviceID) && MATCH_HSW_HEADER(binary)) || \ (IS_BROADWELL(deviceID) && MATCH_BDW_HEADER(binary)) || \ (IS_CHERRYVIEW(deviceID) && MATCH_CHV_HEADER(binary)) || \ (IS_SKYLAKE(deviceID) && MATCH_SKL_HEADER(binary)) || \ (IS_BROXTON(deviceID) && MATCH_BXT_HEADER(binary)) || \ (IS_KABYLAKE(deviceID) && MATCH_KBL_HEADER(binary)) || \ (IS_GEMINILAKE(deviceID) &&
MATCH_GLK_HEADER(binary)) \ )
static gbe_program genProgramNewFromBinary(uint32_t deviceID, const char *binary, size_t size) { using namespace gbe; std::string binary_content;
if(size < GEN_BINARY_HEADER_LENGTH) return NULL; //the header length is 8 bytes: 1 byte is binary type, 4 bytes are bitcode header, 3 bytes are hw info.
if(!MATCH_DEVICE(deviceID, (unsigned char*)binary)){ return NULL; }
binary_content.assign(binary+GEN_BINARY_HEADER_LENGTH, size-GEN_BINARY_HEADER_LENGTH); GenProgram *program = GBE_NEW(GenProgram, deviceID); std::istringstream ifs(binary_content, std::istringstream::binary);
if (!program->deserializeFromBin(ifs)) { delete program; return NULL; }
//program->printStatus(0, std::cout);
return reinterpret_cast<gbe_program>(program); }
static gbe_program genProgramNewFromLLVMBinary(uint32_t deviceID, const char *binary, size_t size) { #ifdef GBE_COMPILER_AVAILABLE
using namespace gbe; std::string binary_content;
//the first byte stands for binary_type.
binary_content.assign(binary+1, size-1); llvm::StringRef llvm_bin_str(binary_content);
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 llvm::LLVMContext *c = new llvm::LLVMContext; #else llvm::LLVMContext *c = &llvm::getGlobalContext(); #endif
llvm::SMDiagnostic Err;
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 36 std::unique_ptr<llvm::MemoryBuffer> memory_buffer = llvm::MemoryBuffer::getMemBuffer(llvm_bin_str, "llvm_bin_str"); acquireLLVMContextLock(); llvm::Module* module = llvm::parseIR(memory_buffer->getMemBufferRef(), Err, *c).release(); #else llvm::MemoryBuffer* memory_buffer = llvm::MemoryBuffer::getMemBuffer(llvm_bin_str, "llvm_bin_str"); acquireLLVMContextLock(); llvm::Module* module = llvm::ParseIR(memory_buffer, Err, *c); #endif
// if we load a 32-bit SPIR binary, the triple should be spir-unknown-unknown.
llvm::Triple triple(module->getTargetTriple()); if (triple.getArchName() == "spir" && triple.getVendorName() == "unknown" && triple.getOSName() == "unknown"){ module->setTargetTriple("spir"); } else if (triple.getArchName() == "spir64" && triple.getVendorName() == "unknown" && triple.getOSName() == "unknown"){ module->setTargetTriple("spir64"); }
releaseLLVMContextLock();
if(module == NULL){ GBE_ASSERT(0); }
GenProgram *program = GBE_NEW(GenProgram, deviceID, module);
//program->printStatus(0, std::cout);
return reinterpret_cast<gbe_program>(program); #else return NULL; #endif }
static size_t genProgramSerializeToBinary(gbe_program program, char **binary, int binary_type) { using namespace gbe; size_t sz; std::ostringstream oss; GenProgram *prog = (GenProgram*)program;
//0 means GEN binary, 1 means LLVM bitcode compiled object, 2 means LLVM bitcode library
if(binary_type == 0){ if ((sz = prog->serializeToBin(oss)) == 0) { *binary = NULL; return 0; }
//add a header to differentiate it from an llvm bitcode binary.
//the header length is 8 bytes: 1 byte is binary type, 4 bytes are bitcode header, 3 bytes are hw info.
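/* (Editor's sketch of the layout built below, for a hypothetical Skylake binary: byte 0 = GEN_BINARY_VERSION, bytes 1..7 = 'G','E','N','C','S','K','L', and the serialized kernel blob follows at offset 8. Note that genHeaderCompare() above checks bytes 1..7 first and only then byte 0, so a device mismatch stays silent while a stale version byte prints the "generate binary again" warning.) */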
*binary = (char *)malloc(sizeof(char) * (sz+GEN_BINARY_HEADER_LENGTH) ); if(*binary == NULL) return 0; memset(*binary, 0, sizeof(char) * (sz+GEN_BINARY_HEADER_LENGTH) ); if(IS_IVYBRIDGE(prog->deviceID)){ FILL_IVB_HEADER(*binary); if(IS_BAYTRAIL_T(prog->deviceID)){ FILL_BYT_HEADER(*binary); } }else if(IS_HASWELL(prog->deviceID)){ FILL_HSW_HEADER(*binary); }else if(IS_BROADWELL(prog->deviceID)){ FILL_BDW_HEADER(*binary); }else if(IS_CHERRYVIEW(prog->deviceID)){ FILL_CHV_HEADER(*binary); }else if(IS_SKYLAKE(prog->deviceID)){ FILL_SKL_HEADER(*binary); }else if(IS_BROXTON(prog->deviceID)){ FILL_BXT_HEADER(*binary); }else if(IS_KABYLAKE(prog->deviceID)){ FILL_KBL_HEADER(*binary); }else if(IS_GEMINILAKE(prog->deviceID)){ FILL_GLK_HEADER(*binary); }else { free(*binary); *binary = NULL; return 0; } memcpy(*binary+GEN_BINARY_HEADER_LENGTH, oss.str().c_str(), sz*sizeof(char)); return sz+GEN_BINARY_HEADER_LENGTH; }else{ #ifdef GBE_COMPILER_AVAILABLE std::string str; llvm::raw_string_ostream OS(str); llvm::WriteBitcodeToFile((llvm::Module*)prog->module, OS); std::string& bin_str = OS.str(); int llsz = bin_str.size(); *binary = (char *)malloc(sizeof(char) * (llsz+1) ); if(*binary == NULL) return 0; *(*binary) = binary_type; memcpy(*binary+1, bin_str.c_str(), llsz); return llsz+1; #else return 0; #endif } } static gbe_program genProgramNewFromLLVM(uint32_t deviceID, const void* module, const void* llvm_ctx, const char* asm_file_name, size_t stringSize, char *err, size_t *errSize, int optLevel, const char* options) { using namespace gbe; uint32_t fast_relaxed_math = 0; if (options != NULL) if (strstr(options, "-cl-fast-relaxed-math") != NULL) fast_relaxed_math = 1; GenProgram *program = GBE_NEW(GenProgram, deviceID, module, llvm_ctx, asm_file_name, fast_relaxed_math); #ifdef GBE_COMPILER_AVAILABLE std::string error; // Try to compile the program if (program->buildFromLLVMModule(module, error, optLevel) == false) { if (err != NULL && errSize != NULL && stringSize > 0u) { const size_t msgSize = std::min(error.size(), stringSize-1u); std::memcpy(err, error.c_str(), msgSize); *errSize = error.size(); } GBE_DELETE(program); return NULL; } #endif // Everything run fine return (gbe_program) program; } static gbe_program genProgramNewGenProgram(uint32_t deviceID, const void* module, const void* llvm_ctx,const char* asm_file_name) { using namespace gbe; GenProgram *program = GBE_NEW(GenProgram, deviceID, module, llvm_ctx, asm_file_name); // Everything run fine return (gbe_program) program; } static bool genProgramLinkFromLLVM(gbe_program dst_program, gbe_program src_program, size_t stringSize, char * err, size_t * errSize) { #ifdef GBE_COMPILER_AVAILABLE using namespace gbe; char* errMsg = NULL; if(((GenProgram*)dst_program)->module == NULL){ #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 LLVMModuleRef modRef; LLVMParseBitcodeInContext2(wrap(new llvm::LLVMContext()), LLVMWriteBitcodeToMemoryBuffer(wrap((llvm::Module*)((GenProgram*)src_program)->module)), &modRef); ((GenProgram*)dst_program)->module = llvm::unwrap(modRef); #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 ((GenProgram*)dst_program)->module = llvm::CloneModule((llvm::Module*)((GenProgram*)src_program)->module).release(); #else ((GenProgram*)dst_program)->module = llvm::CloneModule((llvm::Module*)((GenProgram*)src_program)->module); #endif errSize = 0; } else { llvm::Module* src = (llvm::Module*)((GenProgram*)src_program)->module; llvm::Module* dst = (llvm::Module*)((GenProgram*)dst_program)->module; #if LLVM_VERSION_MAJOR * 10 + 
LLVM_VERSION_MINOR >= 39 if (&src->getContext() != &dst->getContext()) { LLVMModuleRef modRef; LLVMParseBitcodeInContext2(wrap(&dst->getContext()), LLVMWriteBitcodeToMemoryBuffer(wrap(src)), &modRef); src = llvm::unwrap(modRef); } llvm::Module* clone = llvm::CloneModule(src).release(); if (LLVMLinkModules2(wrap(dst), wrap(clone))) { #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 if (LLVMLinkModules(wrap(dst), wrap(src), LLVMLinkerPreserveSource_Removed, &errMsg)) { #else if (LLVMLinkModules(wrap(dst), wrap(src), LLVMLinkerPreserveSource, &errMsg)) { #endif if (err != NULL && errSize != NULL && stringSize > 0u && errMsg) { strncpy(err, errMsg, stringSize-1); err[stringSize-1] = '\0'; *errSize = strlen(err); } return true; } } #endif return false; } static void genProgramBuildFromLLVM(gbe_program program, size_t stringSize, char *err, size_t *errSize, const char * options) { #ifdef GBE_COMPILER_AVAILABLE using namespace gbe; std::string error; int optLevel = 1; std::string dumpASMFileName; size_t start = 0, end = 0; uint32_t fast_relaxed_math = 0; if(options) { char *p; p = strstr(const_cast(options), "-cl-opt-disable"); if (p) optLevel = 0; if (options != NULL) if (strstr(options, "-cl-fast-relaxed-math") != NULL) fast_relaxed_math = 1; char *options_str = (char *)malloc(sizeof(char) * (strlen(options) + 1)); if (options_str == NULL) return; memcpy(options_str, options, strlen(options) + 1); std::string optionStr(options_str); while (end != std::string::npos) { end = optionStr.find(' ', start); std::string str = optionStr.substr(start, end - start); start = end + 1; if(str.size() == 0) continue; if(str.find("-dump-opt-asm=") != std::string::npos) { dumpASMFileName = str.substr(str.find("=") + 1); continue; // Don't push this str back; ignore it. } } free(options_str); } GenProgram* p = (GenProgram*) program; p->fast_relaxed_math = fast_relaxed_math; if (!dumpASMFileName.empty()) { p->asm_file_name = dumpASMFileName.c_str(); FILE *asmDumpStream = fopen(dumpASMFileName.c_str(), "w"); if (asmDumpStream) fclose(asmDumpStream); } // Try to compile the program acquireLLVMContextLock(); llvm::Module* module = (llvm::Module*)p->module; if (p->buildFromLLVMModule(module, error, optLevel) == false) { if (err != NULL && errSize != NULL && stringSize > 0u) { const size_t msgSize = std::min(error.size(), stringSize-1u); std::memcpy(err, error.c_str(), msgSize); *errSize = error.size(); } } releaseLLVMContextLock(); #endif } } /* namespace gbe */ void genSetupCallBacks(void) { gbe_program_new_from_binary = gbe::genProgramNewFromBinary; gbe_program_new_from_llvm_binary = gbe::genProgramNewFromLLVMBinary; gbe_program_serialize_to_binary = gbe::genProgramSerializeToBinary; gbe_program_new_from_llvm = gbe::genProgramNewFromLLVM; gbe_program_new_gen_program = gbe::genProgramNewGenProgram; gbe_program_link_from_llvm = gbe::genProgramLinkFromLLVM; gbe_program_build_from_llvm = gbe::genProgramBuildFromLLVM; } Beignet-1.3.2-Source/backend/src/backend/gen_insn_compact.cpp000664 001750 001750 00000071636 13161142102 023243 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. 
* * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Ruiling Song */ #include "backend/gen_defs.hpp" #include "backend/gen_encoder.hpp" #include namespace gbe { struct compact_table_entry { uint32_t bit_pattern; uint32_t index; }; static compact_table_entry control_table[] = { {0b0000000000000000010, 0}, {0b0000100000000000000, 1}, {0b0000100000000000001, 2}, {0b0000100000000000010, 3}, {0b0000100000000000011, 4}, {0b0000100000000000100, 5}, {0b0000100000000000101, 6}, {0b0000100000000000111, 7}, {0b0000100000000001000, 8}, {0b0000100000000001001, 9}, {0b0000100000000001101, 10}, {0b0000110000000000000, 11}, {0b0000110000000000001, 12}, {0b0000110000000000010, 13}, {0b0000110000000000011, 14}, {0b0000110000000000100, 15}, {0b0000110000000000101, 16}, {0b0000110000000000111, 17}, {0b0000110000000001001, 18}, {0b0000110000000001101, 19}, {0b0000110000000010000, 20}, {0b0000110000100000000, 21}, {0b0001000000000000000, 22}, {0b0001000000000000010, 23}, {0b0001000000000000100, 24}, {0b0001000000100000000, 25}, {0b0010110000000000000, 26}, {0b0010110000000010000, 27}, {0b0011000000000000000, 28}, {0b0011000000100000000, 29}, {0b0101000000000000000, 30}, {0b0101000000100000000, 31}, }; static compact_table_entry src3_control_table[] = { {0b100000000110000000000001, 0}, {0b000000000110000000000001, 1}, {0b000000001000000000000001, 2}, {0b000000001000000000100001, 3}, }; static compact_table_entry data_type_table[] = { {0b000000001000001100, 20}, {0b001000000000000001, 0}, {0b001000000000100000, 1}, {0b001000000000100001, 2}, {0b001000000000111101, 21}, {0b001000000001100001, 3}, {0b001000000010100101, 22}, {0b001000000010111101, 4}, {0b001000001011111101, 5}, {0b001000001110100001, 6}, {0b001000001110100101, 7}, {0b001000001110111101, 8}, {0b001000010000100000, 23}, {0b001000010000100001, 9}, {0b001000110000100000, 10}, {0b001000110000100001, 11}, {0b001001010010100100, 24}, {0b001001010010100101, 12}, {0b001001110010000100, 25}, {0b001001110010100100, 13}, {0b001001110010100101, 14}, {0b001010010100001001, 26}, {0b001010010100101000, 30}, {0b001010110100101000, 31}, {0b001011110110101100, 29}, {0b001101111110111101, 27}, {0b001111001110111101, 15}, {0b001111011110011101, 16}, {0b001111011110111100, 17}, {0b001111011110111101, 18}, {0b001111111110111100, 19}, {0b001111111110111101, 28}, }; static compact_table_entry gen8_data_type_table[] = { {0b001000000000000000001, 0}, {0b001000000000001000000, 1}, {0b001000000000001000001, 2}, {0b001000000000011000001, 3}, {0b001000000000101011101, 4}, {0b001000000010111011101, 5}, {0b001000000011101000001, 6}, {0b001000000011101000101, 7}, {0b001000000011101011101, 8}, {0b001000001000001000001, 9}, {0b001000011000001000000, 10}, {0b001000011000001000001, 11}, {0b001000101000101000101, 12}, {0b001000111000101000100, 13}, {0b001000111000101000101, 14}, {0b001011100011101011101, 15}, {0b001011101011100011101, 16}, {0b001011101011101011100, 17}, {0b001011101011101011101, 18}, {0b001011111011101011100, 19}, {0b000000000010000001100, 20}, {0b001000000000001011101, 21}, {0b001000000000101000101, 22}, {0b001000001000001000000, 23}, {0b001000101000101000100, 24}, {0b001000111000100000100, 25}, {0b001001001001000001001, 26}, 
{0b001010111011101011101, 27}, {0b001011111011101011101, 28}, {0b001001111001101001100, 29}, {0b001001001001001001000, 30}, {0b001001011001001001000, 31}, }; static compact_table_entry data_type_decompact[] = { {0b001000000000000001, 0}, {0b001000000000100000, 1}, {0b001000000000100001, 2}, {0b001000000001100001, 3}, {0b001000000010111101, 4}, {0b001000001011111101, 5}, {0b001000001110100001, 6}, {0b001000001110100101, 7}, {0b001000001110111101, 8}, {0b001000010000100001, 9}, {0b001000110000100000, 10}, {0b001000110000100001, 11}, {0b001001010010100101, 12}, {0b001001110010100100, 13}, {0b001001110010100101, 14}, {0b001111001110111101, 15}, {0b001111011110011101, 16}, {0b001111011110111100, 17}, {0b001111011110111101, 18}, {0b001111111110111100, 19}, {0b000000001000001100, 20}, {0b001000000000111101, 21}, {0b001000000010100101, 22}, {0b001000010000100000, 23}, {0b001001010010100100, 24}, {0b001001110010000100, 25}, {0b001010010100001001, 26}, {0b001101111110111101, 27}, {0b001111111110111101, 28}, {0b001011110110101100, 29}, {0b001010010100101000, 30}, {0b001010110100101000, 31}, }; static compact_table_entry subreg_table[] = { {0b000000000000000, 0}, {0b000000000000001, 1}, {0b000000000001000, 2}, {0b000000000001111, 3}, {0b000000000010000, 4}, {0b000000010000000, 5}, {0b000000100000000, 6}, {0b000000110000000, 7}, {0b000001000000000, 8}, {0b000001000010000, 9}, {0b000001010000000, 10}, {0b001000000000000, 11}, {0b001000000000001, 12}, {0b001000010000001, 13}, {0b001000010000010, 14}, {0b001000010000011, 15}, {0b001000010000100, 16}, {0b001000010000111, 17}, {0b001000010001000, 18}, {0b001000010001110, 19}, {0b001000010001111, 20}, {0b001000110000000, 21}, {0b001000111101000, 22}, {0b010000000000000, 23}, {0b010000110000000, 24}, {0b011000000000000, 25}, {0b011110010000111, 26}, {0b100000000000000, 27}, {0b101000000000000, 28}, {0b110000000000000, 29}, {0b111000000000000, 30}, {0b111000000011100, 31}, }; static compact_table_entry srcreg_table[] = { {0b000000000000, 0}, {0b000000000010, 1}, {0b000000010000, 2}, {0b000000010010, 3}, {0b000000011000, 4}, {0b000000100000, 5}, {0b000000101000, 6}, {0b000001001000, 7}, {0b000001010000, 8}, {0b000001110000, 9}, {0b000001111000, 10}, {0b001100000000, 11}, {0b001100000010, 12}, {0b001100001000, 13}, {0b001100010000, 14}, {0b001100010010, 15}, {0b001100100000, 16}, {0b001100101000, 17}, {0b001100111000, 18}, {0b001101000000, 19}, {0b001101000010, 20}, {0b001101001000, 21}, {0b001101010000, 22}, {0b001101100000, 23}, {0b001101101000, 24}, {0b001101110000, 25}, {0b001101110001, 26}, {0b001101111000, 27}, {0b010001101000, 28}, {0b010001101001, 29}, {0b010001101010, 30}, {0b010110001000, 31}, }; static int cmp_key(const void *p1, const void*p2) { const compact_table_entry * px = (compact_table_entry *)p1; const compact_table_entry * py = (compact_table_entry *)p2; return (px->bit_pattern) - py->bit_pattern; } union ControlBits{ struct { uint32_t access_mode:1; uint32_t mask_control:1; uint32_t dependency_control:2; uint32_t quarter_control:2; uint32_t thread_control:2; uint32_t predicate_control:4; uint32_t predicate_inverse:1; uint32_t execution_size:3; uint32_t saturate:1; uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t pad:23; }; uint32_t data; }; union Src3ControlBits{ struct { uint32_t access_mode:1; uint32_t dependency_control:2; uint32_t nibble_control:1; uint32_t quarter_control:2; uint32_t thread_control:2; uint32_t predicate_control:4; uint32_t predicate_inverse:1; uint32_t execution_size:3; uint32_t conditional_modifier:4; 
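// Note (added for clarity): these bit-fields mirror the native 3-src instruction
// header; packed together they form the 24-bit word held in Src3ControlBits::data,
// and that word must match one of the four bit_pattern entries of
// src3_control_table exactly, otherwise compactControlBitsSrc3() fails its
// bsearch and the instruction stays in native (uncompacted) form.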
uint32_t acc_wr_control:1; uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t mask_control:1; }; uint32_t data; }; union DataTypeBits{ struct { uint32_t dest_reg_file:2; uint32_t dest_reg_type:3; uint32_t src0_reg_file:2; uint32_t src0_reg_type:3; uint32_t src1_reg_file:2; uint32_t src1_reg_type:3; uint32_t dest_horiz_stride:2; uint32_t dest_address_mode:1; uint32_t pad:14; }; uint32_t data; }; union Gen8DataTypeBits{ struct { uint32_t dest_reg_file:2; uint32_t dest_reg_type:4; uint32_t src0_reg_file:2; uint32_t src0_reg_type:4; uint32_t src1_reg_file:2; uint32_t src1_reg_type:4; uint32_t dest_horiz_stride:2; uint32_t dest_address_mode:1; uint32_t pad:11; }; uint32_t data; }; union SubRegBits { struct { uint32_t dest_subreg_nr:5; uint32_t src0_subreg_nr:5; uint32_t src1_subreg_nr:5; uint32_t pad:17; }; uint32_t data; }; union SrcRegBits { struct { uint32_t src_abs:1; uint32_t src_negate:1; uint32_t src_address_mode:1; uint32_t src_horiz_stride:2; uint32_t src_width:3; uint32_t src_vert_stride:4; uint32_t pad:20; }; uint32_t data; }; void decompactInstruction(GenCompactInstruction * p, void *insn, uint32_t insn_version) { GenNativeInstruction *pNative = (union GenNativeInstruction *) insn; Gen7NativeInstruction *pOut = (union Gen7NativeInstruction *) insn; /* src3 compact insn */ if(p->bits1.opcode == GEN_OPCODE_MAD || p->bits1.opcode == GEN_OPCODE_LRP) { #define NO_SWIZZLE ((0<<0) | (1<<2) | (2<<4) | (3<<6)) assert(insn_version == 8); Gen8NativeInstruction *pOut = (union Gen8NativeInstruction *) insn; memset(pOut, 0, sizeof(Gen8NativeInstruction)); union Src3ControlBits control_bits; control_bits.data = src3_control_table[(uint32_t)p->src3Insn.bits1.control_index].bit_pattern; pOut->header.opcode = p->bits1.opcode; pOut->bits1.da1.flag_sub_reg_nr = control_bits.flag_sub_reg_nr; pOut->bits1.da1.flag_reg_nr = control_bits.flag_reg_nr; pOut->header.nib_ctrl = control_bits.nibble_control; pOut->header.execution_size = control_bits.execution_size; pOut->header.predicate_control = control_bits.predicate_control; pOut->header.predicate_inverse = control_bits.predicate_inverse; pOut->header.thread_control = control_bits.thread_control; pOut->header.quarter_control = control_bits.quarter_control; pOut->header.dependency_control = control_bits.dependency_control; pOut->header.access_mode = control_bits.access_mode; pOut->header.acc_wr_control = control_bits.acc_wr_control; pOut->header.destreg_or_condmod = control_bits.conditional_modifier; pOut->bits1.da1.mask_control= control_bits.mask_control; pOut->header.cmpt_control = p->bits1.cmpt_control; pOut->header.debug_control = p->bits1.debug_control; pOut->header.saturate = p->src3Insn.bits1.saturate; /* dst */ pOut->bits1.da3src.dest_reg_nr = p->src3Insn.bits1.dst_reg_nr; pOut->bits1.da3src.dest_writemask = 0xf; pOut->bits2.da3src.src0_swizzle = NO_SWIZZLE; pOut->bits2.da3src.src0_subreg_nr = p->src3Insn.bits2.src0_subnr; pOut->bits2.da3src.src0_reg_nr = p->src3Insn.bits2.src0_reg_nr; pOut->bits1.da3src.src0_negate = p->src3Insn.bits1.src_index == 1; pOut->bits2.da3src.src0_rep_ctrl = p->src3Insn.bits1.src0_rep_ctrl; pOut->bits2.da3src.src1_swizzle = NO_SWIZZLE; pOut->bits2.da3src.src1_subreg_nr_low = (p->src3Insn.bits2.src1_subnr) & 0x3; pOut->bits3.da3src.src1_subreg_nr_high = (p->src3Insn.bits2.src1_subnr) >> 2; pOut->bits2.da3src.src1_rep_ctrl = p->src3Insn.bits2.src1_rep_ctrl; pOut->bits3.da3src.src1_reg_nr = p->src3Insn.bits2.src1_reg_nr; pOut->bits1.da3src.src1_negate = p->src3Insn.bits1.src_index == 2; 
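// Note (added for clarity): the compact 3-src format has no per-source negate
// bits. Instead, src_index records the single negated source (0 = none,
// 1 = src0, 2 = src1, 3 = src2), which is why compactAlu3() below refuses to
// compact an instruction with more than one negated source.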
pOut->bits3.da3src.src2_swizzle = NO_SWIZZLE; pOut->bits3.da3src.src2_subreg_nr = p->src3Insn.bits2.src2_subnr; pOut->bits3.da3src.src2_rep_ctrl = p->src3Insn.bits2.src2_rep_ctrl; pOut->bits3.da3src.src2_reg_nr = p->src3Insn.bits2.src2_reg_nr; pOut->bits1.da3src.src2_negate = p->src3Insn.bits1.src_index == 3; #undef NO_SWIZZLE } else { if (insn_version == 7) { memset(pOut, 0, sizeof(Gen7NativeInstruction)); union ControlBits control_bits; control_bits.data = control_table[(uint32_t)p->bits1.control_index].bit_pattern; pNative->low.low = (uint32_t)p->bits1.opcode | ((control_bits.data & 0xffff) << 8); pOut->header.destreg_or_condmod = p->bits1.destreg_or_condmod; pOut->header.saturate = control_bits.saturate; pOut->header.acc_wr_control = p->bits1.acc_wr_control; pOut->header.cmpt_control = p->bits1.cmpt_control; pOut->header.debug_control = p->bits1.debug_control; union DataTypeBits data_type_bits; union SubRegBits subreg_bits; union SrcRegBits src0_bits; data_type_bits.data = data_type_decompact[(uint32_t)p->bits1.data_type_index].bit_pattern; subreg_bits.data = subreg_table[(uint32_t)p->bits1.sub_reg_index].bit_pattern; src0_bits.data = srcreg_table[p->bits1.src0_index_lo | p->bits2.src0_index_hi << 2].bit_pattern; pNative->low.high |= data_type_bits.data & 0x7fff; pOut->bits1.da1.dest_horiz_stride = data_type_bits.dest_horiz_stride; pOut->bits1.da1.dest_address_mode = data_type_bits.dest_address_mode; pOut->bits1.da1.dest_reg_nr = p->bits2.dest_reg_nr; pOut->bits1.da1.dest_subreg_nr = subreg_bits.dest_subreg_nr; pOut->bits2.da1.src0_subreg_nr = subreg_bits.src0_subreg_nr; pOut->bits2.da1.src0_reg_nr = p->bits2.src0_reg_nr; pNative->high.low |= (src0_bits.data << 13); pOut->bits2.da1.flag_sub_reg_nr = control_bits.flag_sub_reg_nr; pOut->bits2.da1.flag_reg_nr = control_bits.flag_reg_nr; if(data_type_bits.src1_reg_file == GEN_IMMEDIATE_VALUE) { uint32_t imm = (uint32_t)p->bits2.src1_reg_nr | (p->bits2.src1_index<<8); pOut->bits3.ud = imm & 0x1000 ? 
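// Note (added for clarity): src1 here is a 13-bit signed immediate
// (compactAlu2() below only accepts -4096..4095). Bit 12 (0x1000) is the sign
// bit; when it is set, OR-ing with 0xfffff000 sign-extends the value to
// 32 bits (bit 12 itself is already 1, so the mask may safely cover it).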
(imm | 0xfffff000) : imm; } else { union SrcRegBits src1_bits; src1_bits.data = srcreg_table[p->bits2.src1_index].bit_pattern; pOut->bits3.da1.src1_subreg_nr = subreg_bits.src1_subreg_nr; pOut->bits3.da1.src1_reg_nr = p->bits2.src1_reg_nr; pNative->high.high |= (src1_bits.data << 13); } } else if (insn_version == 8) { Gen8NativeInstruction *pOut = (union Gen8NativeInstruction *) insn; memset(pOut, 0, sizeof(Gen8NativeInstruction)); union ControlBits control_bits; control_bits.data = control_table[(uint32_t)p->bits1.control_index].bit_pattern; pOut->header.opcode = p->bits1.opcode; pOut->bits1.da1.flag_sub_reg_nr = control_bits.flag_sub_reg_nr; pOut->bits1.da1.flag_reg_nr = control_bits.flag_reg_nr; pOut->header.saturate = control_bits.saturate; pOut->header.execution_size= control_bits.execution_size; pOut->header.predicate_control= control_bits.predicate_control; pOut->header.predicate_inverse= control_bits.predicate_inverse; pOut->header.thread_control= control_bits.thread_control; pOut->header.quarter_control= control_bits.quarter_control; pOut->header.dependency_control = control_bits.dependency_control; pOut->header.access_mode= control_bits.access_mode; pOut->bits1.da1.mask_control= control_bits.mask_control; pOut->header.destreg_or_condmod = p->bits1.destreg_or_condmod; pOut->header.acc_wr_control = p->bits1.acc_wr_control; pOut->header.cmpt_control = p->bits1.cmpt_control; pOut->header.debug_control = p->bits1.debug_control; union Gen8DataTypeBits data_type_bits; union SubRegBits subreg_bits; union SrcRegBits src0_bits; data_type_bits.data = gen8_data_type_table[(uint32_t)p->bits1.data_type_index].bit_pattern; subreg_bits.data = subreg_table[(uint32_t)p->bits1.sub_reg_index].bit_pattern; src0_bits.data = srcreg_table[p->bits1.src0_index_lo | p->bits2.src0_index_hi << 2].bit_pattern; pOut->bits1.da1.dest_reg_file = data_type_bits.dest_reg_file; pOut->bits1.da1.dest_reg_type = data_type_bits.dest_reg_type; pOut->bits1.da1.dest_horiz_stride = data_type_bits.dest_horiz_stride; pOut->bits1.da1.dest_address_mode = data_type_bits.dest_address_mode; pOut->bits1.da1.dest_reg_nr = p->bits2.dest_reg_nr; pOut->bits1.da1.dest_subreg_nr = subreg_bits.dest_subreg_nr; pOut->bits1.da1.src0_reg_file = data_type_bits.src0_reg_file; pOut->bits1.da1.src0_reg_type = data_type_bits.src0_reg_type; pOut->bits2.da1.src0_subreg_nr = subreg_bits.src0_subreg_nr; pOut->bits2.da1.src0_reg_nr = p->bits2.src0_reg_nr; pNative->high.low |= (src0_bits.data << 13); pOut->bits2.da1.src1_reg_file = data_type_bits.src1_reg_file; pOut->bits2.da1.src1_reg_type = data_type_bits.src1_reg_type; if(data_type_bits.src1_reg_file == GEN_IMMEDIATE_VALUE) { uint32_t imm = (uint32_t)p->bits2.src1_reg_nr | (p->bits2.src1_index<<8); pOut->bits3.ud = imm & 0x1000 ? 
(imm | 0xfffff000) : imm; } else { union SrcRegBits src1_bits; src1_bits.data = srcreg_table[p->bits2.src1_index].bit_pattern; pOut->bits3.da1.src1_subreg_nr = subreg_bits.src1_subreg_nr; pOut->bits3.da1.src1_reg_nr = p->bits2.src1_reg_nr; pNative->high.high |= (src1_bits.data << 13); } } } } int compactControlBits(GenEncoder *p, uint32_t quarter, uint32_t execWidth) { const GenInstructionState *s = &p->curr; // some quick check if(s->nibControl != 0) return -1; if(s->predicate > GEN_PREDICATE_NORMAL) return -1; if(s->flag == 1) return -1; ControlBits b; b.data = 0; if (execWidth == 8) b.execution_size = GEN_WIDTH_8; else if (execWidth == 16) b.execution_size = GEN_WIDTH_16; else if (execWidth == 4) b.execution_size = GEN_WIDTH_4; else if (execWidth == 1) b.execution_size = GEN_WIDTH_1; else NOT_IMPLEMENTED; b.mask_control = s->noMask; b.quarter_control = quarter; b.predicate_control = s->predicate; b.predicate_inverse = s->inversePredicate; b.saturate = s->saturate; b.flag_sub_reg_nr = s->subFlag; b.flag_reg_nr = s->flag; compact_table_entry key; key.bit_pattern = b.data; compact_table_entry *r = (compact_table_entry *)bsearch(&key, control_table, sizeof(control_table)/sizeof(compact_table_entry), sizeof(compact_table_entry), cmp_key); if (r == NULL) return -1; return r->index; } int compactControlBitsSrc3(GenEncoder *p, uint32_t quarter, uint32_t execWidth) { const GenInstructionState *s = &p->curr; // some quick check if(s->nibControl != 0) return -1; if(s->predicate != GEN_PREDICATE_NONE) return -1; if(s->inversePredicate != 0) return -1; if(s->flag == 1) return -1; if(s->subFlag != 0) return -1; Src3ControlBits b; b.data = 0; if (execWidth == 8) b.execution_size = GEN_WIDTH_8; else if (execWidth == 16) b.execution_size = GEN_WIDTH_16; else if (execWidth == 4) return -1; else if (execWidth == 1) return -1; else NOT_IMPLEMENTED; b.mask_control = s->noMask; b.quarter_control = quarter; b.access_mode = 1; compact_table_entry key; key.bit_pattern = b.data; compact_table_entry *r = (compact_table_entry *)bsearch(&key, src3_control_table, sizeof(src3_control_table)/sizeof(compact_table_entry), sizeof(compact_table_entry), cmp_key); if (r == NULL) return -1; return r->index; } int compactDataTypeBits(GenEncoder *p, GenRegister *dst, GenRegister *src0, GenRegister *src1) { // compact does not support any indirect acess if(dst->address_mode != GEN_ADDRESS_DIRECT) return -1; if(src0->file == GEN_IMMEDIATE_VALUE) return -1; compact_table_entry *r = NULL; if(p->getCompactVersion() == 7) { DataTypeBits b; b.data = 0; b.dest_horiz_stride = dst->hstride == GEN_HORIZONTAL_STRIDE_0 ? GEN_HORIZONTAL_STRIDE_1 : dst->hstride; b.dest_address_mode = dst->address_mode; b.dest_reg_file = dst->file; b.dest_reg_type = dst->type; b.src0_reg_file = src0->file; b.src0_reg_type = src0->type; if(src1) { b.src1_reg_type = src1->type; b.src1_reg_file = src1->file; } else { // default to zero b.src1_reg_type = 0; b.src1_reg_file = 0; } compact_table_entry key; key.bit_pattern = b.data; r = (compact_table_entry *)bsearch(&key, data_type_table, sizeof(data_type_table)/sizeof(compact_table_entry), sizeof(compact_table_entry), cmp_key); } else if(p->getCompactVersion() == 8) { Gen8DataTypeBits b; b.data = 0; b.dest_horiz_stride = dst->hstride == GEN_HORIZONTAL_STRIDE_0 ? 
GEN_HORIZONTAL_STRIDE_1 : dst->hstride; b.dest_address_mode = dst->address_mode; b.dest_reg_file = dst->file; b.dest_reg_type = dst->type; b.src0_reg_file = src0->file; b.src0_reg_type = src0->type; if(src1) { b.src1_reg_type = src1->type; b.src1_reg_file = src1->file; } else { // default to zero b.src1_reg_type = 0; b.src1_reg_file = 0; } compact_table_entry key; key.bit_pattern = b.data; r = (compact_table_entry *)bsearch(&key, gen8_data_type_table, sizeof(gen8_data_type_table)/sizeof(compact_table_entry), sizeof(compact_table_entry), cmp_key); } if (r == NULL) return -1; return r->index; } int compactSubRegBits(GenEncoder *p, GenRegister *dst, GenRegister *src0, GenRegister *src1) { SubRegBits b; b.data = 0; b.dest_subreg_nr = dst->subnr; b.src0_subreg_nr = src0->subnr; if(src1) b.src1_subreg_nr = src1->subnr; else b.src1_subreg_nr = 0; compact_table_entry key; key.bit_pattern = b.data; compact_table_entry *r = (compact_table_entry *)bsearch(&key, subreg_table, sizeof(subreg_table)/sizeof(compact_table_entry), sizeof(compact_table_entry), cmp_key); if (r == NULL) return -1; return r->index; } int compactSrcRegBits(GenEncoder *p, GenRegister *src) { // As we only use GEN_ALIGN_1 and compact only support direct register access, // we only need to verify [hstride, width, vstride] if(src->file == GEN_IMMEDIATE_VALUE) return -1; if(src->address_mode != GEN_ADDRESS_DIRECT) return -1; SrcRegBits b; b.data = 0; b.src_abs = src->absolute; b.src_negate = src->negation; b.src_address_mode = src->address_mode; if(p->curr.execWidth == 1 && src->width == GEN_WIDTH_1) { b.src_width = src->width; b.src_horiz_stride = GEN_HORIZONTAL_STRIDE_0; b.src_vert_stride = GEN_VERTICAL_STRIDE_0; } else { b.src_horiz_stride = src->hstride; b.src_width = src->width; b.src_vert_stride = src->vstride; } compact_table_entry key; key.bit_pattern = b.data; compact_table_entry *r = (compact_table_entry *)bsearch(&key, srcreg_table, sizeof(srcreg_table)/sizeof(compact_table_entry), sizeof(compact_table_entry), cmp_key); if (r == NULL) return -1; return r->index; } bool compactAlu1(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src, uint32_t condition, bool split) { if(split) { // TODO support it return false; } else { int control_index = compactControlBits(p, p->curr.quarterControl, p->curr.execWidth); if(control_index == -1) return false; int data_type_index = compactDataTypeBits(p, &dst, &src, NULL); if(data_type_index == -1) return false; int sub_reg_index = compactSubRegBits(p, &dst, &src, NULL); if(sub_reg_index == -1) return false; int src_reg_index = compactSrcRegBits(p, &src); if(src_reg_index == -1) return false; GenCompactInstruction * insn = p->nextCompact(opcode); insn->bits1.control_index = control_index; insn->bits1.data_type_index = data_type_index; insn->bits1.sub_reg_index = sub_reg_index; insn->bits1.acc_wr_control = p->curr.accWrEnable; insn->bits1.destreg_or_condmod = condition; insn->bits1.cmpt_control = 1; insn->bits1.src0_index_lo = src_reg_index & 3; insn->bits2.src0_index_hi = src_reg_index >> 2; insn->bits2.src1_index = 0; insn->bits2.dest_reg_nr = dst.nr; insn->bits2.src0_reg_nr = src.nr; insn->bits2.src1_reg_nr = 0; return true; } } bool compactAlu2(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1, uint32_t condition, bool split) { if(split) { // TODO support it return false; } else { if(opcode == GEN_OPCODE_IF || opcode == GEN_OPCODE_ENDIF || opcode == GEN_OPCODE_JMPI) return false; int control_index = compactControlBits(p, 
p->curr.quarterControl, p->curr.execWidth); if(control_index == -1) return false; int data_type_index = compactDataTypeBits(p, &dst, &src0, &src1); if(data_type_index == -1) return false; int sub_reg_index = compactSubRegBits(p, &dst, &src0, &src1); if(sub_reg_index == -1) return false; int src0_reg_index = compactSrcRegBits(p, &src0); if(src0_reg_index == -1) return false; bool src1_imm = false; int src1_reg_index; if(src1.file == GEN_IMMEDIATE_VALUE) { if(src1.absolute != 0 || src1.negation != 0 || src1.type == GEN_TYPE_F) return false; if(src1.value.d < -4096 || src1.value.d > 4095) // 13bit signed imm return false; src1_imm = true; } else { src1_reg_index = compactSrcRegBits(p, &src1); if(src1_reg_index == -1) return false; } GenCompactInstruction * insn = p->nextCompact(opcode); insn->bits1.control_index = control_index; insn->bits1.data_type_index = data_type_index; insn->bits1.sub_reg_index = sub_reg_index; insn->bits1.acc_wr_control = p->curr.accWrEnable; insn->bits1.destreg_or_condmod = condition; insn->bits1.cmpt_control = 1; insn->bits1.src0_index_lo = src0_reg_index & 3; insn->bits2.src0_index_hi = src0_reg_index >> 2; insn->bits2.src1_index = src1_imm ? (src1.value.ud & 8191)>> 8 : src1_reg_index; insn->bits2.dest_reg_nr = dst.nr; insn->bits2.src0_reg_nr = src0.nr; insn->bits2.src1_reg_nr = src1_imm ? (src1.value.ud & 0xff): src1.nr; return true; } } bool compactAlu3(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1, GenRegister src2) { if(p->getCompactVersion() < 8) return false; if(opcode != GEN_OPCODE_MAD && opcode != GEN_OPCODE_LRP) return false; if(src0.type != GEN_TYPE_F) return false; assert(src0.file == GEN_GENERAL_REGISTER_FILE); assert(src0.address_mode == GEN_ADDRESS_DIRECT); assert(src0.nr < 128); assert(src1.file == GEN_GENERAL_REGISTER_FILE); assert(src1.address_mode == GEN_ADDRESS_DIRECT); assert(src1.nr < 128); assert(src2.file == GEN_GENERAL_REGISTER_FILE); assert(src2.address_mode == GEN_ADDRESS_DIRECT); assert(src2.nr < 128); int control_index = compactControlBitsSrc3(p, p->curr.quarterControl, p->curr.execWidth); if( control_index == -1) return false; if( src0.negation + src1.negation + src2.negation > 1) return false; if( src0.absolute + src1.absolute + src2.absolute > 0) return false; GenCompactInstruction *insn = p->nextCompact(opcode); insn->src3Insn.bits1.control_index = control_index; insn->src3Insn.bits1.compact_control = 1; insn->src3Insn.bits1.src_index = src0.negation ? 1 : (src1.negation ? 2: (src2.negation ? 
3 : 0)); insn->src3Insn.bits1.dst_reg_nr = dst.nr ; insn->src3Insn.bits1.src0_rep_ctrl = src0.vstride == GEN_VERTICAL_STRIDE_0; insn->src3Insn.bits1.saturate = p->curr.saturate; /* bits2 */ insn->src3Insn.bits2.src1_rep_ctrl = src1.vstride == GEN_VERTICAL_STRIDE_0; insn->src3Insn.bits2.src2_rep_ctrl = src2.vstride == GEN_VERTICAL_STRIDE_0; insn->src3Insn.bits2.src0_subnr = src0.subnr/4; insn->src3Insn.bits2.src1_subnr = src1.subnr/4; insn->src3Insn.bits2.src2_subnr = src2.subnr/4; insn->src3Insn.bits2.src0_reg_nr = src0.nr; insn->src3Insn.bits2.src1_reg_nr = src1.nr; insn->src3Insn.bits2.src2_reg_nr = src2.nr; return true; } }; Beignet-1.3.2-Source/backend/src/backend/gen7_encoder.hpp000664 001750 001750 00000003515 13161142102 022270 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file gen7_context.hpp */ #ifndef __GBE_GEN7_ENCODER_HPP__ #define __GBE_GEN7_ENCODER_HPP__ #include "backend/gen_encoder.hpp" namespace gbe { /* This class is used to implement the HSW specific logic for encoder. */ class Gen7Encoder : public GenEncoder { public: virtual ~Gen7Encoder(void) { } Gen7Encoder(uint32_t simdWidth, uint32_t gen, uint32_t deviceID) : GenEncoder(simdWidth, gen, deviceID) { } virtual void setHeader(GenNativeInstruction *insn); virtual void setDst(GenNativeInstruction *insn, GenRegister dest); virtual void setSrc0(GenNativeInstruction *insn, GenRegister reg); virtual void setSrc1(GenNativeInstruction *insn, GenRegister reg); virtual void alu3(uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1, GenRegister src2); /*! MBlock read */ virtual void MBREAD(GenRegister dst, GenRegister header, uint32_t bti, uint32_t elemSize); /*! MBlock write */ virtual void MBWRITE(GenRegister header, GenRegister data, uint32_t bti, uint32_t elemSize, bool useSends); }; } #endif /* __GBE_GEN7_ENCODER_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/program.hpp000664 001750 001750 00000032671 13173554000 021414 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ /** * \file program.hpp * \author Benjamin Segovia */ #ifndef __GBE_PROGRAM_HPP__ #define __GBE_PROGRAM_HPP__ #include "backend/program.h" #include "backend/context.hpp" #include "ir/constant.hpp" #include "ir/unit.hpp" #include "ir/function.hpp" #include "ir/printf.hpp" #include "ir/sampler.hpp" #include "sys/vector.hpp" #include namespace gbe { namespace ir { class Unit; // Compilation unit. Contains the program to compile } /* namespace ir */ } /* namespace gbe */ namespace gbe { /*! Info for the kernel argument */ struct KernelArgument { gbe_arg_type type; //!< Pointer, structure, image, regular value? uint32_t size; //!< Size of the argument uint32_t align; //!< addr alignment of the argument uint8_t bti; //!< binding table index for __global buffer // Strings for arg info. struct ArgInfo { uint32_t addrSpace; std::string typeName; std::string accessQual; std::string typeQual; std::string argName; uint32_t typeSize; }; ArgInfo info; }; /*! Stores the offset where to patch where to patch */ struct PatchInfo { INLINE PatchInfo(gbe_curbe_type type, uint32_t subType = 0u, uint32_t offset = 0u) : type(uint32_t(type)), subType(subType), offset(offset) {} INLINE PatchInfo(void) {} uint64_t type : 16; //!< Type of the patch (see program.h for the list) uint64_t subType : 32; //!< Optional sub-type of the patch (see program.h) uint64_t offset : 16; //!< Optional offset to encode }; /*! We will sort PatchInfo to make binary search */ INLINE bool operator< (PatchInfo i0, PatchInfo i1) { if (i0.type != i1.type) return i0.type < i1.type; return i0.subType < i1.subType; } /*! Describe a compiled kernel */ class Kernel : public NonCopyable, public Serializable { public: /*! Create an empty kernel with the given name */ Kernel(const std::string &name); /*! Destroy it */ virtual ~Kernel(void); /*! Return the instruction stream (to be implemented) */ virtual const char *getCode(void) const = 0; /*! Set the instruction stream.*/ virtual void setCode(const char *, size_t size) = 0; /*! Return the instruction stream size (to be implemented) */ virtual uint32_t getCodeSize(void) const = 0; /*! Get the kernel name */ INLINE const char *getName(void) const { return name.c_str(); } /*! Return the number of arguments for the kernel call */ INLINE uint32_t getArgNum(void) const { return argNum; } /*! Return the size of the given argument */ INLINE uint32_t getArgSize(uint32_t argID) const { return argID >= argNum ? 0u : args[argID].size; } /*! Return the bti for __global buffer */ INLINE uint8_t getArgBTI(uint32_t argID) const { return argID >= argNum ? 0u : args[argID].bti; } /*! Return the alignment of buffer argument */ INLINE uint32_t getArgAlign(uint32_t argID) const { return argID >= argNum ? 0u : args[argID].align; } /*! Return the type of the given argument */ INLINE gbe_arg_type getArgType(uint32_t argID) const { return argID >= argNum ? GBE_ARG_INVALID : args[argID].type; } /*! Get the offset where to patch. Returns -1 if no patch needed */ int32_t getCurbeOffset(gbe_curbe_type type, uint32_t subType) const; /*! Get the curbe size required by the kernel */ INLINE uint32_t getCurbeSize(void) const { return this->curbeSize; } /*! Return the size of the stack (zero if none) */ INLINE uint32_t getStackSize(void) const { return this->stackSize; } /*! Return the size of the scratch memory needed (zero if none) */ INLINE uint32_t getScratchSize(void) const { return this->scratchSize; } /*! 
Get the SIMD width for the kernel */ INLINE uint32_t getSIMDWidth(void) const { return this->simdWidth; } /*! Says if SLM is needed for it */ INLINE bool getUseSLM(void) const { return this->useSLM; } /*! get slm size for kernel local variable */ INLINE uint32_t getSLMSize(void) const { return this->slmSize; } /*! Return the OpenCL version */ INLINE void setOclVersion(uint32_t version) { this->oclVersion = version; } INLINE uint32_t getOclVersion(void) const { return this->oclVersion; } /*! Set sampler set. */ void setSamplerSet(ir::SamplerSet *from) { samplerSet = from; } /*! Get defined sampler size */ size_t getSamplerSize(void) const { return (samplerSet == NULL ? 0 : samplerSet->getDataSize()); } /*! Get defined sampler value array */ void getSamplerData(uint32_t *samplers) const { samplerSet->getData(samplers); } /*! Set image set. */ void setImageSet(ir::ImageSet * from) { imageSet = from; } /*! Set profiling info. */ void setProfilingInfo(ir::ProfilingInfo * from) { profilingInfo = from; } void * dupProfilingInfo() const { void* ptr = profilingInfo ? (void *)(new ir::ProfilingInfo(*profilingInfo)) : NULL; return ptr; } uint32_t getProfilingBTI(void) const { return profilingInfo ? profilingInfo->getBTI() : 0; } /*! Set printf set. */ void setPrintfSet(ir::PrintfSet * from) { printfSet = from; } uint32_t getPrintfNum() const { return printfSet ? printfSet->getPrintfNum() : 0; } void * dupPrintfSet() const { void* ptr = printfSet ? (void *)(new ir::PrintfSet(*printfSet)) : NULL; return ptr; } uint8_t getPrintfBufBTI() const { GBE_ASSERT(printfSet); return printfSet->getBufBTI(); } uint32_t getProfilingBufBTI() const { GBE_ASSERT(profilingInfo); return profilingInfo->getBTI(); } void outputProfilingInfo(void* buf) { if(profilingInfo) profilingInfo->outputProfilingInfo(buf); } KernelArgument::ArgInfo* getArgInfo(uint32_t id) const { return &args[id].info; } /*! Set compile work group size */ void setCompileWorkGroupSize(const size_t wg_sz[3]) { compileWgSize[0] = wg_sz[0]; compileWgSize[1] = wg_sz[1]; compileWgSize[2] = wg_sz[2]; } /*! Get compile work group size */ void getCompileWorkGroupSize (size_t wg_sz[3]) const { wg_sz[0] = compileWgSize[0]; wg_sz[1] = compileWgSize[1]; wg_sz[2] = compileWgSize[2]; } /*! Set function attributes string. */ void setFunctionAttributes(const std::string& functionAttributes) { this->functionAttributes= functionAttributes; } /*! Get function attributes string. */ const char* getFunctionAttributes(void) const {return this->functionAttributes.c_str();} /*! Get defined image size */ size_t getImageSize(void) const { return (imageSet == NULL ? 0 : imageSet->getDataSize()); } /*! Get defined image value array */ void getImageData(ImageInfo *images) const { imageSet->getData(images); } static const uint32_t magic_begin = TO_MAGIC('K', 'E', 'R', 'N'); static const uint32_t magic_end = TO_MAGIC('N', 'R', 'E', 'K'); /* format: magic_begin | name_size | name | arg_num | args | PatchInfo_num | PatchInfo | curbeSize | simdWidth | stackSize | scratchSize | useSLM | slmSize | samplers | images | code_size | code | magic_end */ /*! Implements the serialization. */ virtual uint32_t serializeToBin(std::ostream& outs); virtual uint32_t deserializeFromBin(std::istream& ins); virtual void printStatus(int indent, std::ostream& outs); /*! Does kernel use device enqueue */ INLINE bool getUseDeviceEnqueue(void) const { return this->useDeviceEnqueue; } /*! 
Change the device enqueue info of the function */ INLINE bool setUseDeviceEnqueue(bool useDeviceEnqueue) { return this->useDeviceEnqueue = useDeviceEnqueue; } protected: friend class Context; //!< Owns the kernels friend class GenContext; std::string name; //!< Kernel name KernelArgument *args; //!< Each argument vector<PatchInfo> patches; //!< Indicates how to build the curbe uint32_t argNum; //!< Number of function arguments uint32_t curbeSize; //!< Size of the data to push uint32_t simdWidth; //!< SIMD size for the kernel (lane number) uint32_t stackSize; //!< Stack size (0 if unused) uint32_t scratchSize; //!< Scratch memory size (may be 0 if unused) uint32_t oclVersion; //!< OpenCL version (120 for 1.2, 200 for 2.0) bool useSLM; //!< SLM requires a special HW config uint32_t slmSize; //!< slm size for kernel variable Context *ctx; //!< Save context after compiler to alloc constant buffer curbe ir::SamplerSet *samplerSet;//!< Copy from the corresponding function. ir::ImageSet *imageSet; //!< Copy from the corresponding function. ir::PrintfSet *printfSet; //!< Copy from the corresponding function. ir::ProfilingInfo *profilingInfo; //!< Copy from the corresponding function. uint32_t compileWgSize[3]; //!< required work group size by kernel attribute. std::string functionAttributes; //!< function attribute qualifiers combined. bool useDeviceEnqueue; //!< Has device enqueue? GBE_CLASS(Kernel); //!< Use custom allocators }; /*! Describe a compiled program */ class Program : public NonCopyable, public Serializable { public: /*! Create an empty program */ Program(uint32_t fast_relaxed_math); /*! Destroy the program */ virtual ~Program(void); /*! Clean LLVM resource of the program */ virtual void CleanLlvmResource() = 0; /*! Get the number of kernels in the program */ uint32_t getKernelNum(void) const { return kernels.size(); } /*! Get the kernel from its name */ Kernel *getKernel(const std::string &name) const { map<std::string, Kernel*>::const_iterator it = kernels.find(name); if (it == kernels.end()) return NULL; else return it->second; } /*! Get the kernel from its ID */ Kernel *getKernel(uint32_t ID) const { uint32_t currID = 0; Kernel *kernel = NULL; for (map<std::string, Kernel*>::const_iterator it = kernels.begin(); it != kernels.end(); ++it) { if (currID == ID) { kernel = it->second; break; } currID++; } return kernel; } const char *getDeviceEnqueueKernelName(uint32_t index) const { if(index >= blockFuncs.size()) return NULL; return blockFuncs[index].c_str(); } /*! Build a program from an ir::Unit */ bool buildFromUnit(const ir::Unit &unit, std::string &error); /*! Builds a program from an LLVM Module */ bool buildFromLLVMModule(const void* module, std::string &error, int optLevel); /*! Builds a program from an OCL string */ bool buildFromSource(const char *source, std::string &error); /*! Get size of the global constant arrays */ size_t getGlobalConstantSize(void) const { return constantSet->getDataSize(); } /*! Get the content of global constant arrays */ void getGlobalConstantData(char *mem) const { constantSet->getData(mem); } uint32_t getGlobalRelocCount(void) const { return relocTable->getCount(); } void getGlobalRelocTable(char *p) const { relocTable->getData(p); } static const uint32_t magic_begin = TO_MAGIC('P', 'R', 'O', 'G'); static const uint32_t magic_end = TO_MAGIC('G', 'O', 'R', 'P'); /* format: magic_begin | constantSet_flag | constSet_data | kernel_num | kernel_1 | ........ | kernel_n | magic_end | total_size */ /*! Implements the serialization.
*/ virtual uint32_t serializeToBin(std::ostream& outs); virtual uint32_t deserializeFromBin(std::istream& ins); virtual void printStatus(int indent, std::ostream& outs); uint32_t fast_relaxed_math : 1; protected: /*! Compile a kernel */ virtual Kernel *compileKernel(const ir::Unit &unit, const std::string &name, bool relaxMath, int profiling) = 0; /*! Allocate an empty kernel. */ virtual Kernel *allocateKernel(const std::string &name) = 0; /*! Kernels sorted by their name */ map<std::string, Kernel*> kernels; /*! Global (constants) outside any kernel */ ir::ConstantSet *constantSet; /*! relocation table */ ir::RelocTable *relocTable; /*! device enqueue functions */ vector<std::string> blockFuncs; /*! Use custom allocators */ GBE_CLASS(Program); }; } /* namespace gbe */ #endif /* __GBE_PROGRAM_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen_program.h000664 001750 001750 00000002226 13161142102 021667 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ /** * \file program.h * \author Benjamin Segovia * * C-like interface for the gen kernels and programs */ #ifndef __GBE_GEN_PROGRAM_H__ #define __GBE_GEN_PROGRAM_H__ #include #include #include /*! This will make the compiler output Gen ISA code */ extern void genSetupCallBacks(void); #endif /* __GBE_GEN_PROGRAM_H__ */ Beignet-1.3.2-Source/backend/src/backend/gen75_context.hpp000664 001750 001750 00000004005 13161142102 022415 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * */ /** * \file gen75_context.hpp */ #ifndef __GBE_GEN75_CONTEXT_HPP__ #define __GBE_GEN75_CONTEXT_HPP__ #include "backend/gen_context.hpp" #include "backend/gen75_encoder.hpp" namespace gbe { /* This class is used to implement the HSW specific logic for context. */ class Gen75Context : public GenContext { public: virtual ~Gen75Context(void) { } Gen75Context(const ir::Unit &unit, const std::string &name, uint32_t deviceID, bool relaxMath = false) : GenContext(unit, name, deviceID, relaxMath) { }; /*! device's max scratch buffer size */ #define GEN75_SCRATCH_SIZE (2 * KB * KB) /*! Emit the per-lane stack pointer computation */ virtual void emitStackPointer(void); /*! Align the scratch size to the device's scratch unit size */ virtual uint32_t alignScratchSize(uint32_t size); /*!
Get the device's max scratch size */ virtual uint32_t getScratchSize(void) { // The allocator uses uint16_t offsets, so clamp it; this needs refining return std::min(GEN75_SCRATCH_SIZE, 0x7fff); } protected: virtual GenEncoder* generateEncoder(void) { return GBE_NEW(Gen75Encoder, this->simdWidth, 75, deviceID); } private: virtual void emitSLMOffset(void); virtual void newSelection(void); }; } #endif /* __GBE_GEN75_CONTEXT_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen9_context.cpp000664 001750 001750 00000021126 13173554000 022337 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * */ /** * \file gen9_context.cpp */ #include "backend/gen9_context.hpp" #include "backend/gen_insn_selection.hpp" #include "backend/gen_program.hpp" namespace gbe { void Gen9Context::newSelection(void) { this->sel = GBE_NEW(Selection9, *this); } void Gen9Context::emitBarrierInstruction(const SelectionInstruction &insn) { const GenRegister src = ra->genReg(insn.src(0)); const GenRegister fenceDst = ra->genReg(insn.dst(0)); uint32_t barrierType = insn.extra.barrierType; const GenRegister barrierId = ra->genReg(GenRegister::ud1grf(ir::ocl::barrierid)); bool imageFence = barrierType & ir::SYNC_IMAGE_FENCE; if (barrierType & ir::SYNC_GLOBAL_READ_FENCE) { p->FENCE(fenceDst, imageFence); p->MOV(fenceDst, fenceDst); } p->push(); // As only the payload.2 is used and all the other regions are ignored // SIMD8 mode here is safe. p->curr.execWidth = 8; p->curr.physicalFlag = 0; p->curr.noMask = 1; // Copy barrier id from r0. p->AND(src, barrierId, GenRegister::immud(0x8f000000)); // A barrier is OK to start the thread synchronization *and* SLM fence p->BARRIER(src); p->curr.execWidth = 1; // Now we wait for the other threads p->curr.predicate = GEN_PREDICATE_NONE; p->WAIT(); p->pop(); if (imageFence) { p->FLUSH_SAMPLERCACHE(fenceDst); p->MOV(fenceDst, fenceDst); } } void BxtContext::newSelection(void) { this->sel = GBE_NEW(SelectionBxt, *this); } void BxtContext::calculateFullU64MUL(GenRegister src0, GenRegister src1, GenRegister dst_h, GenRegister dst_l, GenRegister s0l_s1h, GenRegister s0h_s1l) { src0.type = src1.type = GEN_TYPE_UD; dst_h.type = dst_l.type = GEN_TYPE_UL; s0l_s1h.type = s0h_s1l.type = GEN_TYPE_UL; //GenRegister tmp; GenRegister s0l = unpacked_ud(src0); GenRegister s1l = unpacked_ud(src1); GenRegister s0h = unpacked_ud(s0l_s1h); //s0h only used before s0l_s1h, reuse s0l_s1h GenRegister s1h = unpacked_ud(dst_l); //s1h only used before dst_l, reuse dst_l p->MOV(s0h, GenRegister::offset(s0l, 0, 4)); p->MOV(s1h, GenRegister::offset(s1l, 0, 4)); /* High 32 bits X High 32 bits. */ p->MUL(dst_h, s0h, s1h); /* High 32 bits X low 32 bits. */ p->MUL(s0h_s1l, s0h, s1l); /* Low 32 bits X high 32 bits. */ p->MUL(s0l_s1h, s0l, s1h); /* Low 32 bits X low 32 bits.
*/ p->MUL(dst_l, s0l, s1l); /* Because the max product of s0l*s1h is (2^N - 1) * (2^N - 1) = 2^(2N) - 2^(N+1) + 1, here N = 32. The max of adding two 32-bit integers to it is 2^(2N) - 2^(N+1) + 1 + 2*(2^N - 1) = 2^(2N) - 1, which means adding dst_l's high 32 bits and then s0l_s1h's low 32 bits to the product s0h_s1l will not overflow and produces no carry. In this manner we can avoid using the acc register, which has a lot of restrictions. */ GenRegister s0l_s1h_l = unpacked_ud(s0l_s1h); p->ADD(s0h_s1l, s0h_s1l, s0l_s1h_l); p->SHR(s0l_s1h, s0l_s1h, GenRegister::immud(32)); GenRegister s0l_s1h_h = unpacked_ud(s0l_s1h); p->ADD(dst_h, dst_h, s0l_s1h_h); GenRegister dst_l_h = unpacked_ud(s0l_s1h); p->MOV(dst_l_h, unpacked_ud(dst_l, 1)); p->ADD(s0h_s1l, s0h_s1l, dst_l_h); // No longer need s0l_s1h GenRegister tmp = s0l_s1h; p->SHL(tmp, s0h_s1l, GenRegister::immud(32)); GenRegister tmp_unpacked = unpacked_ud(tmp, 1); p->MOV(unpacked_ud(dst_l, 1), tmp_unpacked); p->SHR(tmp, s0h_s1l, GenRegister::immud(32)); p->ADD(dst_h, dst_h, tmp); } void BxtContext::emitI64MULInstruction(const SelectionInstruction &insn) { GenRegister src0 = ra->genReg(insn.src(0)); GenRegister src1 = ra->genReg(insn.src(1)); GenRegister dst = ra->genReg(insn.dst(0)); GenRegister res = ra->genReg(insn.dst(1)); src0.type = src1.type = GEN_TYPE_UD; dst.type = GEN_TYPE_UL; res.type = GEN_TYPE_UL; /* Low 32 bits X low 32 bits. */ GenRegister s0l = unpacked_ud(src0); GenRegister s1l = unpacked_ud(src1); p->MUL(dst, s0l, s1l); /* Low 32 bits X high 32 bits. */ GenRegister s1h = unpacked_ud(res); p->MOV(s1h, unpacked_ud(src1, 1)); p->MUL(res, s0l, s1h); p->SHL(res, res, GenRegister::immud(32)); p->ADD(dst, dst, res); /* High 32 bits X low 32 bits. */ GenRegister s0h = unpacked_ud(res); p->MOV(s0h, unpacked_ud(src0, 1)); p->MUL(res, s0h, s1l); p->SHL(res, res, GenRegister::immud(32)); p->ADD(dst, dst, res); } void BxtContext::setA0Content(uint16_t new_a0[16], uint16_t max_offset, int sz) { if (sz == 0) sz = 16; GBE_ASSERT(sz%4 == 0); GBE_ASSERT(new_a0[0] >= 0 && new_a0[0] < 4096); p->push(); p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->curr.noMask = 1; for (int i = 0; i < sz/2; i++) { p->MOV(GenRegister::retype(GenRegister::addr1(i*2), GEN_TYPE_UD), GenRegister::immud(new_a0[i*2 + 1] << 16 | new_a0[i*2])); } p->pop(); } void BxtContext::emitStackPointer(void) { using namespace ir; // Only emit stack pointer computation if we use a stack if (kernel->getStackSize() == 0) return; // Check that everything is consistent in the kernel code const uint32_t perLaneSize = kernel->getStackSize(); GBE_ASSERT(perLaneSize > 0); const GenRegister selStatckPtr = this->simdWidth == 8 ? GenRegister::ud8grf(ir::ocl::stackptr) : GenRegister::ud16grf(ir::ocl::stackptr); const GenRegister stackptr = ra->genReg(selStatckPtr); // borrow block ip as temporary register as we will // initialize block ip later.
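// A minimal worked example of the computation below, with assumed (not
// source-provided) numbers: for simdWidth = 16 and perLaneSize = 1024 bytes,
// thread 2 / lane 5 ends up with stackptr = (2 * 16 + 5) * 1024 = 37888, i.e.
// every lane owns its own contiguous perLaneSize-byte slice of the stack.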
const GenRegister tmpReg = GenRegister::retype(GenRegister::vec1(getBlockIP()), GEN_TYPE_UW); const GenRegister tmpReg_ud = GenRegister::retype(tmpReg, GEN_TYPE_UD); loadLaneID(stackptr); // We compute the per-lane stack pointer here // threadId * perThreadSize + laneId*perLaneSize or // (threadId * simdWidth + laneId)*perLaneSize // let private address start from zero //p->MOV(stackptr, GenRegister::immud(0)); p->push(); p->curr.execWidth = 1; p->curr.predicate = GEN_PREDICATE_NONE; p->AND(tmpReg, GenRegister::ud1grf(0,5), GenRegister::immuw(0x1ff)); //threadId p->MUL(tmpReg, tmpReg, GenRegister::immuw(this->simdWidth)); //threadId * simdWidth p->curr.execWidth = this->simdWidth; p->ADD(stackptr, GenRegister::unpacked_uw(stackptr), tmpReg); //threadId * simdWidth + laneId, must < 64K p->curr.execWidth = 1; p->MOV(tmpReg_ud, GenRegister::immud(perLaneSize)); p->curr.execWidth = this->simdWidth; p->MUL(stackptr, tmpReg_ud, GenRegister::unpacked_uw(stackptr)); // (threadId * simdWidth + laneId)*perLaneSize if (fn.getPointerFamily() == ir::FAMILY_QWORD) { const GenRegister selStatckPtr2 = this->simdWidth == 8 ? GenRegister::ul8grf(ir::ocl::stackptr) : GenRegister::ul16grf(ir::ocl::stackptr); GenRegister stackptr2 = ra->genReg(selStatckPtr2); GenRegister sp = GenRegister::unpacked_ud(stackptr2.nr, stackptr2.subnr); int simdWidth = p->curr.execWidth; if (simdWidth == 16) { // we need do second quarter first, because the dst type is QW, // while the src is DW. If we do first quater first, the 1st // quarter's dst would contain the 2nd quarter's src. p->curr.execWidth = 8; p->curr.quarterControl = GEN_COMPRESSION_Q2; p->MOV(GenRegister::Qn(sp, 1), GenRegister::Qn(stackptr,1)); p->MOV(GenRegister::Qn(stackptr2, 1), GenRegister::Qn(sp,1)); } p->curr.quarterControl = GEN_COMPRESSION_Q1; p->MOV(sp, stackptr); p->MOV(stackptr2, sp); } p->pop(); } void KblContext::newSelection(void) { this->sel = GBE_NEW(SelectionKbl, *this); } void GlkContext::newSelection(void) { this->sel = GBE_NEW(SelectionGlk, *this); } } Beignet-1.3.2-Source/backend/src/backend/gen9_instruction.hpp000664 001750 001750 00000004345 13161142102 023236 0ustar00yryr000000 000000 /* * Copyright © 2016 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Guo, Yejun */ #ifndef __GEN9_INSTRUCTION_HPP__ #define __GEN9_INSTRUCTION_HPP__ union Gen9NativeInstruction { struct { struct { uint32_t opcode:7; uint32_t pad:1; uint32_t access_mode:1; uint32_t dependency_control:2; uint32_t nib_ctrl:1; uint32_t quarter_control:2; uint32_t thread_control:2; uint32_t predicate_control:4; uint32_t predicate_inverse:1; uint32_t execution_size:3; uint32_t destreg_or_condmod:4; uint32_t acc_wr_control:1; uint32_t cmpt_control:1; uint32_t debug_control:1; uint32_t saturate:1; } header; union { struct { uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t mask_control:1; uint32_t dest_reg_file_0:1; uint32_t src1_reg_file_0:1; uint32_t dest_reg_type:4; uint32_t pad0:3; uint32_t src1_reg_nr:8; uint32_t dest_subreg_nr:1; uint32_t dest_reg_nr:8; uint32_t pad1:1; uint32_t pad2:1; //direct mode is used uint32_t dest_address_mode:1; } sends; uint32_t ud; }bits1; union { struct { uint32_t src1_length:4; //exdesc_9_6 uint32_t src0_subreg_nr:1; uint32_t src0_reg_nr:8; uint32_t sel_reg32_desc:1; uint32_t pad0:1; uint32_t src0_address_mode:1; uint32_t exdesc_31_16:16; } sends; uint32_t ud; } bits2; union { uint32_t ud; } bits3; }; }; #endif Beignet-1.3.2-Source/backend/src/backend/gen_program.hpp000664 001750 001750 00000006124 13161142102 022230 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file program.hpp * \author Benjamin Segovia */ #ifndef __GBE_GEN_PROGRAM_HPP__ #define __GBE_GEN_PROGRAM_HPP__ #include "backend/program.h" #include "backend/program.hpp" #include "backend/gen_defs.hpp" // Gen ISA instruction struct GenInstruction; namespace gbe { /*! Describe a compiled kernel */ class GenKernel : public Kernel { public: /*! Create an empty kernel with the given name */ GenKernel(const std::string &name, uint32_t deviceID); /*! Destroy it */ virtual ~GenKernel(void); /*! Implements base class */ virtual const char *getCode(void) const; /*! Set the instruction stream (to be implemented) */ virtual void setCode(const char *, size_t size); /*! Implements get the code size */ virtual uint32_t getCodeSize(void) const; /*! Implements printStatus*/ virtual void printStatus(int indent, std::ostream& outs); uint32_t deviceID; //!< Current device ID GenInstruction *insns; //!< Instruction stream uint32_t insnNum; //!< Number of instructions GBE_CLASS(GenKernel); //!< Use custom allocators }; /*! Describe a compiled program */ class GenProgram : public Program { public: /*! Create an empty program */ GenProgram(uint32_t deviceID, const void* mod = NULL, const void* ctx = NULL, const char* asm_fname = NULL, uint32_t fast_relaxed_math = 0) : Program(fast_relaxed_math), deviceID(deviceID),module((void*)mod), llvm_ctx((void*)ctx), asm_file_name(asm_fname) {} /*! Current device ID*/ uint32_t deviceID; /*! Destroy the program */ virtual ~GenProgram(void) {}; /*! 
Clean LLVM resource */ virtual void CleanLlvmResource(void); /*! Implements base class */ virtual Kernel *compileKernel(const ir::Unit &unit, const std::string &name, bool relaxMath, int profiling); /*! Allocate an empty kernel. */ virtual Kernel *allocateKernel(const std::string &name) { return GBE_NEW(GenKernel, name, deviceID); } void* module; void* llvm_ctx; const char* asm_file_name; /*! Use custom allocators */ GBE_CLASS(GenProgram); }; /*! decompact GEN ASM if it is in compacted format */ extern void decompactInstruction(union GenCompactInstruction *p, void *insn, uint32_t insn_version); } /* namespace gbe */ #endif /* __GBE_GEN_PROGRAM_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen_reg_allocation.cpp000664 001750 001750 00000175016 13161142102 023545 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file gen_reg_allocation.cpp * \author Benjamin Segovia */ #include "ir/profile.hpp" #include "ir/function.hpp" #include "backend/gen_insn_selection.hpp" #include "backend/gen_reg_allocation.hpp" #include "backend/gen_register.hpp" #include "backend/program.hpp" #include "sys/exception.hpp" #include "sys/cvar.hpp" #include #include #include #include #define HALF_REGISTER_FILE_OFFSET (32*64) namespace gbe { ///////////////////////////////////////////////////////////////////////////// // Register allocator internal implementation ///////////////////////////////////////////////////////////////////////////// /*! Provides the location of a register in a vector */ typedef std::pair VectorLocation; /*! Interval as used in linear scan allocator. Basically, stores the first and * the last instruction where the register is alive */ struct GenRegInterval { INLINE GenRegInterval(ir::Register reg) : reg(reg), minID(INT_MAX), maxID(-INT_MAX), accessCount(0), conflictReg(0), b3OpAlign(0) {} ir::Register reg; //!< (virtual) register of the interval int32_t minID, maxID; //!< Starting and ending points int32_t accessCount; ir::Register conflictReg; // < has banck conflict with this register bool b3OpAlign; }; struct SpillInterval { SpillInterval(const ir::Register r, float c): reg(r), cost(c) {} ir::Register reg; float cost; }; typedef std::vector::iterator SpillIntervalIter; /*! Implements the register allocation */ class GenRegAllocator::Opaque { public: /*! Initialize the register allocator */ Opaque(GenContext &ctx); /*! Release all taken resources */ ~Opaque(void); /*! Perform the register allocation. Return true if success */ bool allocate(Selection &selection); /*! Return the Gen register from the selection register */ GenRegister genReg(const GenRegister ®); INLINE bool isAllocated(const ir::Register ®) { return RA.contains(reg); } /*! 
Output the register allocation */ void outputAllocation(void); INLINE void getRegAttrib(ir::Register reg, uint32_t ®Size, ir::RegisterFamily *regFamily = NULL) const { // Note that byte vector registers use two bytes per byte (and can be // interleaved) static const size_t familyVectorSize[] = {2,2,2,4,8,16,32}; static const size_t familyScalarSize[] = {2,2,2,4,8,16,32}; using namespace ir; const bool isScalar = ctx.sel->isScalarReg(reg); const RegisterData regData = ctx.sel->getRegisterData(reg); const RegisterFamily family = regData.family; if (family == ir::FAMILY_REG) regSize = 32; else { const uint32_t typeSize = isScalar ? familyScalarSize[family] : familyVectorSize[family]; regSize = isScalar ? typeSize : ctx.getSimdWidth() * typeSize; } if (regFamily != NULL) *regFamily = family; } private: /*! Expire one GRF interval. Return true if one was successfully expired */ bool expireGRF(const GenRegInterval &limit); /*! Expire a flag register. Return true if one was successfully expired */ bool expireFlag(const GenRegInterval &limit); /*! Allocate the virtual boolean (== flags) registers */ void allocateFlags(Selection &selection); /*! calculate the spill cost, what we store here is 'use count', * we use [use count]/[live range] as spill cost */ void calculateSpillCost(Selection &selection); /*! validated flags which contains valid value in the physical flag register */ set validatedFlags; /*! validated temp flag register which indicate the flag 0,1 contains which virtual flag register. */ uint32_t validTempFlagReg; /*! validate flag for the current flag user instruction */ void validateFlag(Selection &selection, SelectionInstruction &insn); /*! Allocate the GRF registers */ bool allocateGRFs(Selection &selection); /*! Create gen registers for all preallocated special registers. */ void allocateSpecialRegs(void); /*! Create a Gen register from a register set in the payload */ void allocatePayloadReg(ir::Register, uint32_t offset, uint32_t subOffset = 0); /*! Create the intervals for each register */ /*! Allocate the vectors detected in the instruction selection pass */ void allocateVector(Selection &selection); /*! Allocate the given interval. Return true if success */ bool createGenReg(const Selection &selection, const GenRegInterval &interval); /*! Indicate if the registers are already allocated in vectors */ bool isAllocated(const SelectionVector *vector) const; /*! Reallocate registers if needed to make the registers in the vector * contigous in memory */ void coalesce(Selection &selection, SelectionVector *vector); /*! The context owns the register allocator */ GenContext &ctx; /*! Map virtual registers to offset in the (physical) register file */ map RA; /*! Map offset to virtual registers. */ map offsetReg; /*! Provides the position of each register in a vector */ map vectorMap; /*! All vectors used in the selection */ vector vectors; /*! The set of booleans that will go to GRF (cannot be kept into flags) */ set grfBooleans; /*! The set of booleans which be held in flags, don't need to allocate grf */ set flagBooleans; /*! All the register intervals */ vector intervals; /*! All the boolean register intervals on the corresponding BB*/ typedef map RegIntervalMap; map boolIntervalsMap; /*! Intervals sorting based on starting point positions */ vector starting; /*! Intervals sorting based on ending point positions */ vector ending; /*! registers that are spilled */ SpilledRegs spilledRegs; /*! register which could be spilled.*/ std::set spillCandidate; /*! 
BBs last instruction ID map */ map bbLastInsnIDMap; /* reserved registers for register spill/reload */ uint32_t reservedReg; /*! Current vector to expire */ uint32_t expiringID; INLINE void insertNewReg(const Selection &selection, ir::Register reg, uint32_t grfOffset, bool isVector = false); INLINE bool expireReg(ir::Register reg); INLINE bool spillAtInterval(GenRegInterval interval, int size, uint32_t alignment); INLINE bool findNextSpillCandidate(std::vector &candidate, int &remainSize, int &offset, SpillIntervalIter &nextCand); INLINE uint32_t allocateReg(GenRegInterval interval, uint32_t size, uint32_t alignment); INLINE bool spillReg(GenRegInterval interval, bool isAllocated = false); INLINE bool spillReg(ir::Register reg, bool isAllocated = false); INLINE bool vectorCanSpill(SelectionVector *vector); INLINE bool allocateScratchForSpilled(); void allocateCurbePayload(void); /*! replace specified source/dst register with temporary register and update interval */ INLINE ir::Register replaceReg(Selection &sel, SelectionInstruction *insn, uint32_t regID, bool isSrc, ir::Type type = ir::TYPE_FLOAT, bool needMov = true) { ir::Register reg; if (isSrc) { reg = sel.replaceSrc(insn, regID, type, needMov); assert(reg == intervals.size()); intervals.push_back(reg); intervals[reg].minID = insn->ID - 1; intervals[reg].maxID = insn->ID; } else { reg = sel.replaceDst(insn, regID, type, needMov); assert(reg == intervals.size()); intervals.push_back(reg); intervals[reg].minID = insn->ID; intervals[reg].maxID = insn->ID + 1; } return reg; } /*! Use custom allocator */ friend GenRegAllocator; GBE_CLASS(Opaque); }; GenRegAllocator::Opaque::Opaque(GenContext &ctx) : ctx(ctx) {} GenRegAllocator::Opaque::~Opaque(void) {} void GenRegAllocator::Opaque::allocatePayloadReg(ir::Register reg, uint32_t offset, uint32_t subOffset) { using namespace ir; assert(offset >= GEN_REG_SIZE); offset += subOffset; RA.insert(std::make_pair(reg, offset)); //GBE_ASSERT(reg != ocl::blockip || (offset % GEN_REG_SIZE == 0)); //this->intervals[reg].minID = 0; //this->intervals[reg].maxID = 0; } INLINE void GenRegAllocator::Opaque::allocateSpecialRegs(void) { using namespace ir; for(auto &it : this->ctx.curbeRegs) allocatePayloadReg(it.first, it.second); // Allocate all pushed registers (i.e. 
structure kernel arguments) const Function &fn = ctx.getFunction(); GBE_ASSERT(fn.getProfile() == PROFILE_OCL); const Function::PushMap &pushMap = fn.getPushMap(); for (auto rit = pushMap.rbegin(); rit != pushMap.rend(); ++rit) { const uint32_t argID = rit->second.argID; const FunctionArgument arg = fn.getArg(argID); const uint32_t subOffset = rit->second.offset; const Register reg = rit->second.getRegister(); if (intervals[reg].maxID == - INT_MAX) continue; auto it = this->ctx.curbeRegs.find(arg.reg); assert(it != ctx.curbeRegs.end()); allocatePayloadReg(reg, it->second, subOffset); ctx.splitBlock(it->second, subOffset); } // Group and barrier IDs are always allocated by the hardware in r0 RA.insert(std::make_pair(ocl::groupid0, 1*sizeof(float))); // r0.1 RA.insert(std::make_pair(ocl::groupid1, 6*sizeof(float))); // r0.6 RA.insert(std::make_pair(ocl::groupid2, 7*sizeof(float))); // r0.7 RA.insert(std::make_pair(ocl::barrierid, 2*sizeof(float))); // r0.2 } template inline bool cmp(const GenRegInterval *i0, const GenRegInterval *i1) { if (sortStartingPoint) { if (i0->minID == i1->minID) return (i0->maxID < i1->maxID); return i0->minID < i1->minID; } else { if (i0->maxID == i1->maxID) return (i0->minID < i1->minID); return i0->maxID < i1->maxID; } } void GenRegAllocator::Opaque::allocateCurbePayload(void) { vector payloadInterval; for (auto interval : starting) { if (!ctx.isPayloadReg(interval->reg)) continue; if (interval->minID > 0) break; payloadInterval.push_back(interval); } std::sort(payloadInterval.begin(), payloadInterval.end(), cmp); for(auto interval : payloadInterval) { if (interval->maxID < 0) continue; ctx.allocCurbeReg(interval->reg); } } bool GenRegAllocator::Opaque::createGenReg(const Selection &selection, const GenRegInterval &interval) { using namespace ir; const ir::Register reg = interval.reg; if (RA.contains(reg) == true) return true; // already allocated uint32_t regSize; ir::RegisterFamily family; getRegAttrib(reg, regSize, &family); uint32_t grfOffset = allocateReg(interval, regSize, regSize); if (grfOffset == 0) { return false; } insertNewReg(selection, reg, grfOffset); return true; } bool GenRegAllocator::Opaque::isAllocated(const SelectionVector *vector) const { const ir::Register first = vector->reg[0].reg(); const auto it = vectorMap.find(first); // If the first register is not allocated we are done if (it == vectorMap.end()) return false; // If there are more left registers than in the found vector, there are // still registers to allocate const SelectionVector *other = it->second.first; const uint32_t otherFirst = it->second.second; const uint32_t leftNum = other->regNum - otherFirst; if (leftNum < vector->regNum) return false; // Now check that all the registers in the already allocated vector match // the current vector for (uint32_t regID = 1; regID < vector->regNum; ++regID) { const ir::Register from = vector->reg[regID].reg(); const ir::Register to = other->reg[regID + otherFirst].reg(); if (from != to) return false; } return true; } void GenRegAllocator::Opaque::coalesce(Selection &selection, SelectionVector *vector) { for (uint32_t regID = 0; regID < vector->regNum; ++regID) { const ir::Register reg = vector->reg[regID].reg(); const auto it = this->vectorMap.find(reg); // case 1: the register is not already in a vector, so it can stay in this // vector. 
Note that local IDs are *non-scalar* special registers but will // require a MOV anyway since pre-allocated in the CURBE // for dst SelectionVector, we can always try to allocate them even under // spilling, reason is that its components can be expired separately, so, // it does not introduce too much register pressure. if (it == vectorMap.end() && ctx.sel->isScalarReg(reg) == false && ctx.isSpecialReg(reg) == false && (ctx.reservedSpillRegs == 0 || !vector->isSrc) ) { const VectorLocation location = std::make_pair(vector, regID); this->vectorMap.insert(std::make_pair(reg, location)); } // case 2: the register is already in another vector, so we need to move // it to a temporary register. // TODO: we can do better than that if we analyze the liveness of the // already allocated registers in the vector. If there is no inteference // and the order is maintained, we can reuse the previous vector and avoid // the MOVs else { ir::Register tmp; ir::Type type = getIRType(vector->reg[regID].type); tmp = this->replaceReg(selection, vector->insn, regID + vector->offsetID, vector->isSrc, type); const VectorLocation location = std::make_pair(vector, regID); this->vectorMap.insert(std::make_pair(tmp, location)); } } } /*! Will sort vector in decreasing order */ inline bool cmpVec(const SelectionVector *v0, const SelectionVector *v1) { return v0->regNum > v1->regNum; } void GenRegAllocator::Opaque::allocateVector(Selection &selection) { const uint32_t vectorNum = selection.getVectorNum(); this->vectors.resize(vectorNum); // First we find and store all vectors uint32_t vectorID = 0; for (auto &block : *selection.blockList) for (auto &v : block.vectorList) this->vectors[vectorID++] = &v; GBE_ASSERT(vectorID == vectorNum); // Heuristic (really simple...): sort them by the number of registers they // contain std::sort(this->vectors.begin(), this->vectors.end(), cmpVec); // Insert MOVs when this is required for (vectorID = 0; vectorID < vectorNum; ++vectorID) { SelectionVector *vector = this->vectors[vectorID]; if (this->isAllocated(vector)) continue; this->coalesce(selection, vector); } } bool GenRegAllocator::Opaque::expireGRF(const GenRegInterval &limit) { bool ret = false; while (this->expiringID != ending.size()) { const GenRegInterval *toExpire = this->ending[this->expiringID]; const ir::Register reg = toExpire->reg; // Dead code produced by the insn selection -> we skip it if (toExpire->minID > toExpire->maxID) { this->expiringID++; continue; } //ignore register that already spilled if(spilledRegs.find(reg) != spilledRegs.end()) { this->expiringID++; continue; } if (toExpire->maxID >= limit.minID) break; if (expireReg(reg)) ret = true; this->expiringID++; } // We were not able to expire anything return ret; } #define IS_IMPLICITLY_MOD_FLAG(insn) (insn.state.modFlag == 1 && \ (insn.opcode == SEL_OP_MOV || \ insn.opcode == SEL_OP_AND || \ insn.opcode == SEL_OP_OR || \ insn.opcode == SEL_OP_XOR)) #define IS_SCALAR_FLAG(insn) selection.isScalarReg(ir::Register(insn.state.flagIndex)) #define GET_FLAG_REG(insn) GenRegister::uwxgrf(IS_SCALAR_FLAG(insn) ? 1 : 8,\ ir::Register(insn.state.flagIndex)); #define IS_TEMP_FLAG(insn) (insn.state.flag == 0 && insn.state.subFlag == 1) #define NEED_DST_GRF_TYPE_FIX(ty) \ (ty == GEN_TYPE_F || \ ty == GEN_TYPE_HF || \ ty == GEN_TYPE_DF || \ ty == GEN_TYPE_UL || \ ty == GEN_TYPE_L) // Flag is a virtual flag, this function is to validate the virtual flag // to a physical flag. It is used to validate both temporary flag and the // non-temporary flag registers. 
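// Conceptually, validation just regenerates the flag bits from the UW GRF
// copy of the boolean. An illustrative Gen assembly form of the CMP that
// gets prepended (register name is a made-up stand-in):
//   cmp.ne.f0.1 (16)  null<1>:UW  bool_grf<8,8,1>:UW  0x0:UW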
// We track the last temporary validated register; if it is the same as the
// current one, we can avoid the revalidation.
void GenRegAllocator::Opaque::validateFlag(Selection &selection, SelectionInstruction &insn) { GBE_ASSERT(insn.state.physicalFlag == 1); if (!IS_TEMP_FLAG(insn) && validatedFlags.find(insn.state.flagIndex) != validatedFlags.end()) return; else if (IS_TEMP_FLAG(insn) && validTempFlagReg == insn.state.flagIndex) return; SelectionInstruction *cmp0 = selection.create(SEL_OP_CMP, 1, 2); cmp0->state = GenInstructionState(ctx.getSimdWidth()); cmp0->state.flag = insn.state.flag; cmp0->state.subFlag = insn.state.subFlag; if (IS_SCALAR_FLAG(insn)) cmp0->state.noMask = 1; cmp0->src(0) = GET_FLAG_REG(insn); cmp0->src(1) = GenRegister::immuw(0); cmp0->dst(0) = GenRegister::retype(GenRegister::null(), GEN_TYPE_UW); cmp0->extra.function = GEN_CONDITIONAL_NEQ; insn.prepend(*cmp0); if (!IS_TEMP_FLAG(insn)) validatedFlags.insert(insn.state.flagIndex); else { if (insn.state.modFlag == 0) validTempFlagReg = insn.state.flagIndex; else validTempFlagReg = 0; } } void GenRegAllocator::Opaque::allocateFlags(Selection &selection) { // Previously, we had a global flag allocation implementation. // After some analysis, I found that global flag allocation is not // the best solution here. // As for cross-block references of bool values, we have to // combine them with the current emask anyway, so there is no obvious advantage // to allocating a dedicated physical flag register for such cross-block usage. // We just need to allocate physical flags within each BB. We need to handle // the following cases: // // 1. The bool's liveness never extends beyond this BB, and the bool is only used as // a dst register or a pred register. Such a bool value can be // allocated to a physical flag only if there are enough physical flags. // We already identified those bools at the instruction selection stage and // put them in the flagBooleans set. // 2. The bool is defined in another BB and used in this BB; then we need // to prepend an instruction at the position where we use it. // 3. The bool is defined in this BB but is also used as some instruction's // source register rather than the pred register. We have to keep the normal // grf (UW8/UW16) register for this bool. For some CMP instructions, we need to // append a SEL instruction to convert the flag to the grf register. // 4. Even for spilled flags, if there is only one spilled flag, we will also // try to reuse the temporary flag register later. This requires that all // instructions get their flag at the instruction selection stage and do // not use the physical flag number directly at the gen_context stage. Otherwise, // that may break the algorithm here. // We will track all the validated bool values to avoid any redundant // validation for the same flag. But if there are not enough physical flags, // we have to spill a previously allocated physical flag, and the spilling // policy is to spill the allocated flag that lives to the furthest end point // (see the illustrative sketch below).
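// A minimal, self-contained sketch of that per-BB policy (illustrative
// only, NOT the implementation below; "FlagInterval" and the helper are
// made-up stand-ins), kept out of compilation:
#if 0
#include <vector>
struct FlagInterval { int minID, maxID, flag; }; // flag == -1: falls back to GRF
// 'byStart' must be sorted by increasing minID; flagNum would be 3 here
// (f0.0, f1.0 and f1.1).
static void assignFlags(std::vector<FlagInterval*> &byStart, int flagNum) {
  std::vector<int> freeFlags;
  for (int i = 0; i < flagNum; ++i) freeFlags.push_back(i);
  std::vector<FlagInterval*> active; // intervals currently holding a flag
  for (FlagInterval *cur : byStart) {
    // Expire every active interval that dies before 'cur' starts.
    for (size_t i = 0; i < active.size();) {
      if (active[i]->maxID < cur->minID) {
        freeFlags.push_back(active[i]->flag);
        active.erase(active.begin() + i);
      } else
        ++i;
    }
    if (freeFlags.empty()) {
      // Spill policy: evict the active interval with the furthest end point.
      size_t victim = 0;
      for (size_t i = 1; i < active.size(); ++i)
        if (active[i]->maxID > active[victim]->maxID) victim = i;
      freeFlags.push_back(active[victim]->flag);
      active[victim]->flag = -1; // the victim becomes a GRF boolean
      active.erase(active.begin() + victim);
    }
    cur->flag = freeFlags.back();
    freeFlags.pop_back();
    active.push_back(cur);
  }
}
#endif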
// we have three flags we use for booleans f0.0 , f1.0 and f1.1 set liveInSet01; for (auto &block : *selection.blockList) { // Store the registers allocated in the map map allocatedFlags; map allocatedFlagIntervals; const uint32_t flagNum = 3; uint32_t freeFlags[] = {2, 3, 0}; uint32_t freeNum = flagNum; if (boolIntervalsMap.find(&block) == boolIntervalsMap.end()) continue; const auto boolsMap = boolIntervalsMap[&block]; vector flagStarting; vector flagEnding; GBE_ASSERT(boolsMap->size() > 0); uint32_t regNum = boolsMap->size(); flagStarting.resize(regNum); flagEnding.resize(regNum); uint32_t id = 0; for (auto &interval : *boolsMap) { flagStarting[id] = flagEnding[id] = &interval.second; id++; } std::sort(flagStarting.begin(), flagStarting.end(), cmp); std::sort(flagEnding.begin(), flagEnding.end(), cmp); uint32_t endID = 0; // interval to expire for (uint32_t startID = 0; startID < regNum; ++startID) { const GenRegInterval *interval = flagStarting[startID]; const ir::Register reg = interval->reg; GBE_ASSERT(ctx.sel->getRegisterFamily(reg) == ir::FAMILY_BOOL); if (freeNum != 0) { allocatedFlags.insert(std::make_pair(reg, freeFlags[--freeNum])); allocatedFlagIntervals.insert(std::make_pair(interval, freeFlags[freeNum])); } else { // Try to expire one register while (endID != flagEnding.size()) { const GenRegInterval *toExpire = flagEnding[endID]; // Dead code produced by the insn selection -> we skip it if (toExpire->minID > toExpire->maxID) { endID++; continue; } // We cannot expire this interval and the next ones if (toExpire->maxID >= interval->minID) break; // We reuse a flag from a previous interval (the oldest one) auto it = allocatedFlags.find(toExpire->reg); if (it == allocatedFlags.end()) { endID++; continue; } freeFlags[freeNum++] = it->second; endID++; break; } if (freeNum != 0) { allocatedFlags.insert(std::make_pair(reg, freeFlags[--freeNum])); allocatedFlagIntervals.insert(std::make_pair(interval, freeFlags[freeNum])); } else { // FIXME we may sort the allocated flags before do the spilling in the furture. int32_t spill = -1; const GenRegInterval *spillInterval = NULL; int32_t maxID = 0; for (auto &it : allocatedFlagIntervals) { if (it.first->maxID <= interval->minID) continue; if (it.first->maxID > maxID && it.second != 0) { maxID = it.first->maxID; spill = it.second; spillInterval = it.first; } } if (spill != -1) { allocatedFlags.insert(std::make_pair(reg, spill)); allocatedFlagIntervals.insert(std::make_pair(interval, spill)); allocatedFlags.erase(spillInterval->reg); allocatedFlagIntervals.erase(spillInterval); // We spill this flag booleans register, so erase it from the flag boolean set. if (flagBooleans.contains(spillInterval->reg)) flagBooleans.erase(spillInterval->reg); } else { GBE_ASSERT(0); } } } } delete boolsMap; // Now, we traverse all the selection instructions and we patch them to make // them use flag registers validTempFlagReg = 0; validatedFlags.clear(); for (auto &insn : block.insnList) { // Patch the predicate now. Note that only compares actually modify it (it // is called a "conditional modifier"). 
The other instructions just read // it if (insn.state.physicalFlag == 0) { // SEL.bool instruction, the dst register should be stored in GRF // the pred flag is used by flag register if (insn.opcode == SEL_OP_SEL) { ir::Register dst = insn.dst(0).reg(); if (ctx.sel->getRegisterFamily(dst) == ir::FAMILY_BOOL && allocatedFlags.find(dst) != allocatedFlags.end()) allocatedFlags.erase(dst); } auto it = allocatedFlags.find(ir::Register(insn.state.flagIndex)); if (it != allocatedFlags.end()) { insn.state.physicalFlag = 1; insn.state.flag = it->second / 2; insn.state.subFlag = it->second & 1; // modFlag is for the LOADI/MOV/AND/OR/XOR instructions which will modify a // flag register. We set the condition for them to save one instruction if possible. if (IS_IMPLICITLY_MOD_FLAG(insn)) { // If this is a modFlag on a scalar bool, we need to remove it // from the allocated flags map. Then latter, the user could // validate the flag from the scalar value correctly. // The reason is we can not predicate the active channel when we // need to use this flag. if (IS_SCALAR_FLAG(insn)) { allocatedFlags.erase(ir::Register(insn.state.flagIndex)); continue; } insn.extra.function = GEN_CONDITIONAL_NEQ; } // If this is an external bool, we need to validate it if it is not validated yet. if ((insn.state.externFlag && insn.state.predicate != GEN_PREDICATE_NONE)) validateFlag(selection, insn); } else { insn.state.physicalFlag = 1; insn.state.flag = 0; insn.state.subFlag = 1; // If this is for MOV/AND/OR/... we don't need to waste an extra instruction // to generate the flag here, just continue to next instruction. And the validTempFlagReg // will not be destroyed. if (IS_IMPLICITLY_MOD_FLAG(insn)) continue; // This bool doesn't have a deadicated flag, we use temporary flag here. // each time we need to validate it from the grf register. if (insn.state.predicate != GEN_PREDICATE_NONE) validateFlag(selection, insn); } if (insn.opcode == SEL_OP_CMP && (flagBooleans.contains(insn.dst(0).reg()) || GenRegister::isNull(insn.dst(0)))) { // This is a CMP for a pure flag booleans, we don't need to write result to // the grf. And latter, we will not allocate grf for it. // set a temporary register to avoid switch in this block. bool isSrc = false; bool needMov = false; ir::Type ir_type = ir::TYPE_FLOAT; // below (src : dst) type mapping for 'cmp' // is allowed by hardware // B,W,D,F : F // HF : HF // DF : DF // Q : Q if (NEED_DST_GRF_TYPE_FIX(insn.src(0).type)) ir_type = getIRType(insn.src(0).type); this->replaceReg(selection, &insn, 0, isSrc, ir_type, needMov); } // If the instruction requires to generate (CMP for long/int/float..) // the flag value to the register, and it's not a pure flag boolean, // we need to use SEL instruction to generate the flag value to the UW8 // register. if (insn.state.flagGen == 1 && !flagBooleans.contains((ir::Register)(insn.state.flagIndex))) { SelectionInstruction *sel0 = selection.create(SEL_OP_SEL, 1, 2); uint32_t simdWidth; simdWidth = IS_SCALAR_FLAG(insn) ? 1 : ctx.getSimdWidth(); sel0->state = GenInstructionState(simdWidth); if (IS_SCALAR_FLAG(insn)) sel0->state.noMask = 1; sel0->state.flag = insn.state.flag; sel0->state.subFlag = insn.state.subFlag; sel0->state.predicate = GEN_PREDICATE_NORMAL; sel0->src(0) = GenRegister::uw1grf(ir::ocl::one); sel0->src(1) = GenRegister::uw1grf(ir::ocl::zero); sel0->dst(0) = GET_FLAG_REG(insn); liveInSet01.insert(insn.parent->bb); insn.append(*sel0); // We use the zero one after the liveness analysis, we have to update // the liveness data manually here. 
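// (For reference, an illustrative encoding of the SEL just appended, with
//  made-up register names:
//    (+f0.1) sel (16)  bool_grf<1>:UW  one<8,8,1>:UW  zero<8,8,1>:UW
//  active channels whose flag bit is set receive 1, the others 0.)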
GenRegInterval &interval0 = intervals[ir::ocl::zero]; GenRegInterval &interval1 = intervals[ir::ocl::one]; interval0.minID = std::min(interval0.minID, (int32_t)insn.ID); interval0.maxID = std::max(interval0.maxID, (int32_t)insn.ID); interval1.minID = std::min(interval1.minID, (int32_t)insn.ID); interval1.maxID = std::max(interval1.maxID, (int32_t)insn.ID); } } else { // If the instruction use the temporary flag register manually, // we should invalidate the temp flag reg here. if (insn.state.flag == 0 && insn.state.subFlag == 1) validTempFlagReg = 0; } } } // As we introduce two global variables zero and one, we have to // recompute its liveness information here! if (liveInSet01.size()) { set liveOutSet01; set workSet(liveInSet01.begin(), liveInSet01.end()); while(workSet.size()) { for (auto bb = workSet.begin(); bb != workSet.end(); ) { for(auto predBB : (*bb)->getPredecessorSet()) { liveOutSet01.insert(predBB); if (liveInSet01.find(predBB) != liveInSet01.end()) continue; liveInSet01.insert(predBB); workSet.insert(predBB); } bb = workSet.erase(bb); } } int32_t maxID = 0; for(auto bb : liveOutSet01) maxID = std::max(maxID, bbLastInsnIDMap.find(bb)->second); intervals[ir::ocl::zero].maxID = std::max(intervals[ir::ocl::zero].maxID, maxID); intervals[ir::ocl::one].maxID = std::max(intervals[ir::ocl::one].maxID, maxID); } } IVAR(OCL_SIMD16_SPILL_THRESHOLD, 0, 16, 256); bool GenRegAllocator::Opaque::allocateGRFs(Selection &selection) { // Perform the linear scan allocator ctx.errCode = REGISTER_ALLOCATION_FAIL; const uint32_t regNum = ctx.sel->getRegNum(); for (uint32_t startID = 0; startID < regNum; ++startID) { const GenRegInterval &interval = *this->starting[startID]; const ir::Register reg = interval.reg; if (interval.maxID == -INT_MAX) continue; // Unused register if (RA.contains(reg)) continue; // already allocated if (flagBooleans.contains(reg)) continue; // Case 1: the register belongs to a vector, allocate all the registers in // one piece auto it = vectorMap.find(reg); if (it != vectorMap.end()) { const SelectionVector *vector = it->second.first; // all the reg in the SelectionVector are spilled if(spilledRegs.find(vector->reg[0].reg()) != spilledRegs.end()) continue; uint32_t alignment; uint32_t size = 0; for (uint32_t regID = 0; regID < vector->regNum; ++regID) { getRegAttrib(vector->reg[regID].reg(), alignment, NULL); size += alignment; } // FIXME this is workaround for scheduling limitation, which requires 2*GEN_REG_SIZE under SIMD16. 
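// (Worked example with illustrative numbers: at SIMD16 a DWORD register
//  occupies 16/8 * GEN_REG_SIZE = 2 * 32 = 64 bytes, so a 3-register
//  DWORD vector needs size = 3 * 64 = 192 bytes, allocated below with
//  maxAlignment = 64.)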
const uint32_t maxAlignment = ctx.getSimdWidth()/8*GEN_REG_SIZE; const uint32_t grfOffset = allocateReg(interval, size, maxAlignment); if(grfOffset == 0) { for(int i = vector->regNum-1; i >= 0; i--) { if (!spillReg(vector->reg[i].reg())) return false; } continue; } uint32_t subOffset = 0; for (uint32_t regID = 0; regID < vector->regNum; ++regID) { const ir::Register reg = vector->reg[regID].reg(); GBE_ASSERT(RA.contains(reg) == false); getRegAttrib(reg, alignment, NULL); // check all sub registers aligned correctly GBE_ASSERT((grfOffset + subOffset) % alignment == 0 || (grfOffset + subOffset) % GEN_REG_SIZE == 0); insertNewReg(selection, reg, grfOffset + subOffset, true); ctx.splitBlock(grfOffset, subOffset); //splitBlock will not split if regID == 0 subOffset += alignment; } } // Case 2: This is a regular scalar register, allocate it alone else if (this->createGenReg(selection, interval) == false) { if (!spillReg(interval)) return false; } } if (!spilledRegs.empty()) { GBE_ASSERT(reservedReg != 0); if (ctx.getSimdWidth() == 16) { if (spilledRegs.size() > (unsigned int)OCL_SIMD16_SPILL_THRESHOLD) { ctx.errCode = REGISTER_SPILL_EXCEED_THRESHOLD; return false; } } if (!allocateScratchForSpilled()) { ctx.errCode = REGISTER_SPILL_NO_SPACE; return false; } bool success = selection.spillRegs(spilledRegs, reservedReg); if (!success) { ctx.errCode = REGISTER_SPILL_FAIL; return false; } } ctx.errCode = NO_ERROR; return true; } INLINE bool GenRegAllocator::Opaque::allocateScratchForSpilled() { const uint32_t regNum = spilledRegs.size(); this->starting.resize(regNum); this->ending.resize(regNum); uint32_t regID = 0; for(auto it = spilledRegs.begin(); it != spilledRegs.end(); ++it) { this->starting[regID] = this->ending[regID] = &intervals[it->first]; regID++; } std::sort(this->starting.begin(), this->starting.end(), cmp); std::sort(this->ending.begin(), this->ending.end(), cmp); int toExpire = 0; for(uint32_t i = 0; i < regNum; i++) { const GenRegInterval * cur = starting[i]; const GenRegInterval * exp = ending[toExpire]; if (exp->maxID < cur->minID) { auto it = spilledRegs.find(exp->reg); GBE_ASSERT(it != spilledRegs.end()); if(it->second.addr != -1) { ctx.deallocateScratchMem(it->second.addr); } toExpire++; } auto it = spilledRegs.find(cur->reg); GBE_ASSERT(it != spilledRegs.end()); if(cur->minID == cur->maxID) { it->second.addr = -1; continue; } ir::RegisterFamily family = ctx.sel->getRegisterFamily(cur->reg); it->second.addr = ctx.allocateScratchMem(getFamilySize(family) * ctx.getSimdWidth()); if (it->second.addr == -1) return false; } return true; } INLINE bool GenRegAllocator::Opaque::expireReg(ir::Register reg) { auto it = RA.find(reg); if (flagBooleans.contains(reg)) return false; GBE_ASSERT(it != RA.end()); // offset less than 32 means it is not managed by our reg allocator. if (it->second < 32) return false; ctx.deallocate(it->second); if (reservedReg != 0 && (spillCandidate.find(&intervals[reg]) != spillCandidate.end())) { spillCandidate.erase(&intervals[reg]); /* offset --> reg map should keep updated. */ offsetReg.erase(it->second); } return true; } // insert a new register with allocated offset, // put it to the RA map and the spill map if it could be spilled. 
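// (The candidacy test below boils down to: the register must be a full
//  SIMD-width DWORD register (e.g. 16/8 * 32 = 64 bytes, i.e. two whole
//  GRFs, at SIMD16) or a QWORD register of twice that size, and it must
//  never be partially written.)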
INLINE void GenRegAllocator::Opaque::insertNewReg(const Selection &selection, ir::Register reg, uint32_t grfOffset, bool isVector) { RA.insert(std::make_pair(reg, grfOffset)); if (reservedReg != 0) { uint32_t regSize; ir::RegisterFamily family; getRegAttrib(reg, regSize, &family); // In simd16 mode, we may introduce some simd8 registers in the instruction selection stage. // Spilling those simd8 temporary registers would introduce unnecessary complexity, so we // simply avoid spilling those temporary registers here. if (ctx.getSimdWidth() == 16 && reg.value() >= ctx.getFunction().getRegisterFile().regNum()) return; if (((regSize == ctx.getSimdWidth()/8 * GEN_REG_SIZE && family == ir::FAMILY_DWORD) || (regSize == 2 * ctx.getSimdWidth()/8 * GEN_REG_SIZE && family == ir::FAMILY_QWORD)) && !selection.isPartialWrite(reg)) { GBE_ASSERT(offsetReg.find(grfOffset) == offsetReg.end()); offsetReg.insert(std::make_pair(grfOffset, reg)); spillCandidate.insert(&intervals[reg]); } } } INLINE bool GenRegAllocator::Opaque::spillReg(ir::Register reg, bool isAllocated) { return spillReg(intervals[reg], isAllocated); } INLINE bool GenRegAllocator::Opaque::spillReg(GenRegInterval interval, bool isAllocated) { if (reservedReg == 0) return false; if (interval.reg.value() >= ctx.getFunction().getRegisterFile().regNum() && ctx.getSimdWidth() == 16) return false; ir::RegisterFamily family = ctx.sel->getRegisterFamily(interval.reg); // we currently only support DWORD/QWORD spills if(family != ir::FAMILY_DWORD && family != ir::FAMILY_QWORD) return false; SpillRegTag spillTag; spillTag.isTmpReg = interval.maxID == interval.minID; spillTag.addr = -1; if (isAllocated) { // If this register is allocated, we need to expire it and erase it // from the RA map. bool success = expireReg(interval.reg); GBE_ASSERT(success); if(!success) return success; RA.erase(interval.reg); } spilledRegs.insert(std::make_pair(interval.reg, spillTag)); return true; } // Check whether an allocated vector can be spilled out. // If part of a vector has already expired, the vector is currently unspillable. // FIXME we may need to fix those unspillable vectors in the future. INLINE bool GenRegAllocator::Opaque::vectorCanSpill(SelectionVector *vector) { for(uint32_t id = 0; id < vector->regNum; id++) if (spillCandidate.find(&intervals[(ir::Register)(vector->reg[id].value.reg)]) == spillCandidate.end()) return false; return true; } INLINE float getSpillCost(const GenRegInterval &v) { // sanity check the minID/maxID values assert(v.maxID >= v.minID); if (v.maxID == v.minID) return 1.0f; // FIXME some registers may get an access count of 0; this needs to be fixed. float count = v.accessCount == 0 ? (float)2 : (float)v.accessCount; return count / (float)(v.maxID - v.minID); } bool spillinterval_cmp(const SpillInterval &v1, const SpillInterval &v2) { return v1.cost < v2.cost; } INLINE SpillIntervalIter findRegisterInSpillQueue( std::vector<SpillInterval> &cand, ir::Register reg) { for (SpillIntervalIter it = cand.begin(); it != cand.end(); ++it) { if (it->reg == reg) return it; } return cand.end(); }
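// (Spill cost worked example, illustrative numbers: a register with
//  accessCount = 8 that is live over [minID = 10, maxID = 90] costs
//  8 / 80 = 0.1. Short, heavily used live ranges get a high cost and are
//  therefore picked for spilling last.)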
// The function searches both the free physical registers and the spill // candidates, so the result may be one of three possible situations: // 1. search completed, and we found the next valid iterator to a candidate. // 2. search ended because we met an unspillable register, so we have to drop the iteration. // 3. search completed, and there are enough free physical registers. // // return value: should we break? That is: // 1. search ended, we found enough free registers, or // 2. search ended because we met an unspillable register. INLINE bool GenRegAllocator::Opaque::findNextSpillCandidate( std::vector<SpillInterval> &candidate, int &remainSize, int &offset, SpillIntervalIter &nextCand) { bool isFree = false; bool shouldBreak = false; do { // check whether the super register at this offset is free isFree = ctx.isSuperRegisterFree(offset); if (isFree) { remainSize -= GEN_REG_SIZE; offset += GEN_REG_SIZE; } } while(isFree && remainSize > 0); // done if (remainSize <= 0) return true; auto registerIter = offsetReg.find(offset); shouldBreak = registerIter == offsetReg.end(); if (!shouldBreak) { ir::Register reg = registerIter->second; nextCand = findRegisterInSpillQueue(candidate, reg); } // if shouldBreak is false, it means we need to go on return shouldBreak; } INLINE bool GenRegAllocator::Opaque::spillAtInterval(GenRegInterval interval, int size, uint32_t alignment) { if (reservedReg == 0) return false; if (spillCandidate.empty()) return false; // push the spill candidates into a vector sorted in ascending order of spill cost. std::vector<SpillInterval> candQ; for (auto &p : spillCandidate) { float cost = getSpillCost(*p); candQ.push_back(SpillInterval(p->reg, cost)); } std::sort(candQ.begin(), candQ.end(), spillinterval_cmp); bool scalarAllocationFail = (vectorMap.find(interval.reg) == vectorMap.end()); int remainSize = size; float spillCostTotal = 0.0f; std::set<ir::Register> spillSet; // Searching the whole register file would take a lot of time, so we cap // the number of spill groups to keep compile time from growing too much, // although this method may not find the truly lowest // spill cost candidates. const int spillGroupMax = 8; int spillGroupID = 0; std::vector<std::set<ir::Register>> spillGroups; std::vector<float> spillGroupCost; auto searchBegin = candQ.begin(); while (searchBegin != candQ.end() && spillGroupID < spillGroupMax) { auto contiguousIter = searchBegin; while (contiguousIter != candQ.end()) { ir::Register reg = contiguousIter->reg; auto vectorIt = vectorMap.find(reg); bool spillVector = (vectorIt != vectorMap.end()); int32_t nextOffset = -1; // Did register allocation fail for a scalar register? // If so, don't try to spill a vector register, // which is obviously of no benefit. if (scalarAllocationFail && spillVector) break; if (spillVector) { if (vectorCanSpill(vectorIt->second.first)) { const SelectionVector *vector = vectorIt->second.first; for (uint32_t id = 0; id < vector->regNum; id++) { GBE_ASSERT(spilledRegs.find(vector->reg[id].reg()) == spilledRegs.end()); spillSet.insert(vector->reg[id].reg()); reg = vector->reg[id].reg(); uint32_t s; getRegAttrib(reg, s); remainSize -= s; spillCostTotal += contiguousIter->cost; } } else { break; } } else { spillSet.insert(reg); uint32_t s; getRegAttrib(reg, s); spillCostTotal += contiguousIter->cost; remainSize -= s; } if (remainSize <= 0) break; uint32_t offset = RA.find(reg)->second; uint32_t s; getRegAttrib(reg, s); nextOffset = offset + s; SpillIntervalIter nextValid = candQ.end(); bool shouldBreak = findNextSpillCandidate(candQ, remainSize, nextOffset, nextValid); contiguousIter = nextValid; if (shouldBreak) break; } if (remainSize <= 0) { if (scalarAllocationFail) { // Done break; } else { // Add as one spillGroup spillGroups.push_back(spillSet); spillGroupCost.push_back(spillCostTotal); ++spillGroupID; } } ++searchBegin; // restore state remainSize = size; spillCostTotal = 0.0f; spillSet.clear(); } // failed to spill if (scalarAllocationFail && remainSize > 0) return false; if (!scalarAllocationFail && spillGroups.size() == 0) return false; if (!scalarAllocationFail) { // pick the minimum spill-cost group into spillSet int minIndex = std::distance(spillGroupCost.begin(), std::min_element(spillGroupCost.begin(), spillGroupCost.end())); spillSet.swap(spillGroups[minIndex]); } for(auto spillreg : spillSet) { spillReg(spillreg, true); } return true; } INLINE uint32_t GenRegAllocator::Opaque::allocateReg(GenRegInterval interval, uint32_t size, uint32_t alignment) { int32_t grfOffset; // Calling expireGRF too frequently makes post-register-allocation // scheduling very hard, as it causes a very high register conflict rate. // The tradeoff is to reduce the frequency here. And if we are spilling, // there is no need to reduce that frequency, as register pressure is then // the most important factor. if (ctx.regSpillTick % 12 == 0 || ctx.reservedSpillRegs != 0) this->expireGRF(interval); ctx.regSpillTick++; // A scalar byte register may be used as a destination register while the // source is a scalar DWord. In that case, the byte register must get a // 4-byte aligned register offset. alignment = (alignment + 3) & ~3; bool direction = true; if (interval.conflictReg != 0) { // try to allocate conflicting registers in the top/bottom halves.
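// (HALF_REGISTER_FILE_OFFSET is 32*64 bytes, i.e. GRF 64 out of 128: if
//  the conflicting 3-source operand already sits in the bottom half, we
//  allocate this register backwards from the top so the two operands end
//  up in different halves of the register file.)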
if (RA.contains(interval.conflictReg)) { if (RA.find(interval.conflictReg)->second < HALF_REGISTER_FILE_OFFSET) { direction = false; } } } if (interval.b3OpAlign != 0) { alignment = (alignment + 15) & ~15; } while ((grfOffset = ctx.allocate(size, alignment, direction)) == -1) { const bool success = this->expireGRF(interval); if (success == false) { if (spillAtInterval(interval, size, alignment) == false) return 0; } } return grfOffset; } int UseCountApproximate(int loopDepth) { int ret = 1; for (int i = 0; i < loopDepth; i++) { ret = ret * 10; } return ret; } void GenRegAllocator::Opaque::calculateSpillCost(Selection &selection) { int BlockIndex = 0; for (auto &block : *selection.blockList) { int LoopDepth = ctx.fn.getLoopDepth(ir::LabelIndex(BlockIndex)); for (auto &insn : block.insnList) { const uint32_t srcNum = insn.srcNum, dstNum = insn.dstNum; for (uint32_t srcID = 0; srcID < srcNum; ++srcID) { const GenRegister &selReg = insn.src(srcID); const ir::Register reg = selReg.reg(); if (selReg.file == GEN_GENERAL_REGISTER_FILE) this->intervals[reg].accessCount += UseCountApproximate(LoopDepth); } for (uint32_t dstID = 0; dstID < dstNum; ++dstID) { const GenRegister &selReg = insn.dst(dstID); const ir::Register reg = selReg.reg(); if (selReg.file == GEN_GENERAL_REGISTER_FILE) this->intervals[reg].accessCount += UseCountApproximate(LoopDepth); } } BlockIndex++; } } INLINE bool GenRegAllocator::Opaque::allocate(Selection &selection) { using namespace ir; const Function::PushMap &pushMap = ctx.fn.getPushMap(); if (ctx.reservedSpillRegs != 0) { reservedReg = ctx.allocate(ctx.reservedSpillRegs * GEN_REG_SIZE, GEN_REG_SIZE, false); reservedReg /= GEN_REG_SIZE; } else { reservedReg = 0; } // Now start the linear scan allocation for (uint32_t regID = 0; regID < ctx.sel->getRegNum(); ++regID) { this->intervals.push_back(ir::Register(regID)); // Set all payload register's liveness minID to 0. gbe_curbe_type curbeType; int subType; ctx.getRegPayloadType(ir::Register(regID), curbeType, subType); if (curbeType != GBE_GEN_REG) { intervals[regID].minID = 0; // FIXME stack buffer is not used, we may need to remove it in the furture. if (curbeType == GBE_CURBE_EXTRA_ARGUMENT && subType == GBE_STACK_BUFFER) intervals[regID].maxID = 1; } if (regID == ir::ocl::zero.value() || regID == ir::ocl::one.value()) intervals[regID].minID = 0; } // Compute the intervals int32_t insnID = 0; for (auto &block : *selection.blockList) { int32_t lastID = insnID; int32_t firstID = insnID; // Update the intervals of each used register. 
Note that we do not // register allocate R0, so we skip all sub-registers in r0 RegIntervalMap *boolsMap = new RegIntervalMap; for (auto &insn : block.insnList) { const uint32_t srcNum = insn.srcNum, dstNum = insn.dstNum; assert(insnID == (int32_t)insn.ID); bool is3SrcOp = insn.opcode == SEL_OP_MAD; for (uint32_t srcID = 0; srcID < srcNum; ++srcID) { const GenRegister &selReg = insn.src(srcID); const ir::Register reg = selReg.reg(); if (selReg.file != GEN_GENERAL_REGISTER_FILE || reg == ir::ocl::barrierid || reg == ir::ocl::groupid0 || reg == ir::ocl::groupid1 || reg == ir::ocl::groupid2) continue; ir::Register conflictReg = ir::Register(0); if (is3SrcOp) { if (srcID == 1) conflictReg = insn.src(2).reg(); else if (srcID == 2) conflictReg = insn.src(1).reg(); } // We only let it conflict with one register, the one with the smaller // register number: smaller virtual registers usually come first, and the // linear scan allocator allocates from smaller to larger registers, so // a conflict with a larger register number would have no effect. if (this->intervals[reg].conflictReg == 0 || this->intervals[reg].conflictReg > conflictReg) this->intervals[reg].conflictReg = conflictReg; int insnsrcID = insnID; // If the instruction is simple, src and dst can be reused and they will // have different IDs. The insn may be split in the encoder; if the // register regions are not the same, the register can't be reused. // Because it is hard to check here whether the insn will be split, we // only check the register region. if (insn.isNative() && insn.sameAsDstRegion(srcID)) insnsrcID -= 1; this->intervals[reg].minID = std::min(this->intervals[reg].minID, insnsrcID); this->intervals[reg].maxID = std::max(this->intervals[reg].maxID, insnsrcID); } for (uint32_t dstID = 0; dstID < dstNum; ++dstID) { const GenRegister &selReg = insn.dst(dstID); const ir::Register reg = selReg.reg(); if (selReg.file != GEN_GENERAL_REGISTER_FILE || reg == ir::ocl::barrierid || reg == ir::ocl::groupid0 || reg == ir::ocl::groupid1 || reg == ir::ocl::groupid2) continue; if (is3SrcOp) { this->intervals[reg].b3OpAlign = 1; } this->intervals[reg].minID = std::min(this->intervals[reg].minID, insnID); this->intervals[reg].maxID = std::max(this->intervals[reg].maxID, insnID); } // OK, a flag is used as a predicate or a conditional modifier if (insn.state.physicalFlag == 0) { const ir::Register reg = ir::Register(insn.state.flagIndex); this->intervals[reg].minID = std::min(this->intervals[reg].minID, insnID); this->intervals[reg].maxID = std::max(this->intervals[reg].maxID, insnID); // Check whether this is a pure flag boolean candidate. if (insn.state.grfFlag == 0) flagBooleans.insert(reg); GBE_ASSERT(ctx.sel->getRegisterFamily(reg) == ir::FAMILY_BOOL); // update the bool register's per-BB interval data if (boolsMap->find(reg) == boolsMap->end()) { GenRegInterval boolInterval(reg); boolsMap->insert(std::make_pair(reg, boolInterval)); } boolsMap->find(reg)->second.minID = std::min(boolsMap->find(reg)->second.minID, insnID); boolsMap->find(reg)->second.maxID = std::max(boolsMap->find(reg)->second.maxID, insnID); if (&insn == block.insnList.back() && insn.opcode == SEL_OP_JMPI && insn.state.predicate != GEN_PREDICATE_NONE) { // If this is the last instruction and it is a predicated JMPI, // we must extend its liveness before any other instruction, // as we need to allocate f0 to it and keep f0 // unchanged during the block. The root cause is that this instruction // is outside the if/endif region, so we have to borrow the f0 // to get correct bits for all channels.
boolsMap->find(reg)->second.minID = 0; } } else { // Make sure that instruction selection stage didn't use physiacl flags incorrectly. GBE_ASSERT ((insn.opcode == SEL_OP_LABEL || insn.opcode == SEL_OP_IF || insn.opcode == SEL_OP_JMPI || insn.state.predicate == GEN_PREDICATE_NONE || (block.hasBarrier && insn.opcode == SEL_OP_MOV) || (insn.state.flag == 0 && insn.state.subFlag == 1) )); } lastID = insnID; insnID += 2; } // All registers alive at the begining of the block must update their intervals. const ir::BasicBlock *bb = block.bb; bbLastInsnIDMap.insert(std::make_pair(bb, lastID)); for (auto reg : ctx.getLiveIn(bb)) this->intervals[reg].minID = std::min(this->intervals[reg].minID, firstID); // All registers alive at the end of the block must have their intervals // updated as well for (auto reg : ctx.getLiveOut(bb)) this->intervals[reg].maxID = std::max(this->intervals[reg].maxID, lastID); if (boolsMap->size() > 0) boolIntervalsMap.insert(std::make_pair(&block, boolsMap)); else delete boolsMap; } for (auto &it : this->intervals) { if (it.maxID == -INT_MAX) continue; if(pushMap.find(it.reg) != pushMap.end()) { uint32_t argID = ctx.fn.getPushLocation(it.reg)->argID; ir::Register argReg = ctx.fn.getArg(argID).reg; intervals[argReg].maxID = std::max(intervals[argReg].maxID, 1); } } if (ctx.inProfilingMode) { /* If we are in profiling mode, we always need xyz dim info and timestamp curbes. xyz dim info related curbe registers just live for the first INSN, but timestamp curbes will live the whole execution life. */ #define ADD_CURB_REG_FOR_PROFILING(REG_NAME, LIFE_START, LIFE_END) \ do { \ bool hasIt = false; \ for (auto& itv : this->intervals) { \ if (itv.reg == REG_NAME) { \ hasIt = true; \ if (itv.minID > LIFE_START) itv.minID = LIFE_START; \ if (itv.maxID < LIFE_END) itv.maxID = LIFE_END; \ break; \ } \ } \ if (!hasIt) { \ GenRegInterval regInv(REG_NAME); \ regInv.minID = LIFE_START; \ regInv.maxID = LIFE_END; \ this->intervals.push_back(regInv); \ } \ } while(0) ADD_CURB_REG_FOR_PROFILING(ocl::lsize0, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::lsize1, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::lsize2, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::goffset0, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::goffset1, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::goffset2, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::groupid0, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::groupid1, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::groupid2, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::lid0, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::lid1, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::lid2, 0, 1); ADD_CURB_REG_FOR_PROFILING(ocl::profilingbptr, 0, INT_MAX); ADD_CURB_REG_FOR_PROFILING(ocl::profilingts0, 0, INT_MAX); ADD_CURB_REG_FOR_PROFILING(ocl::profilingts1, 0, INT_MAX); ADD_CURB_REG_FOR_PROFILING(ocl::profilingts2, 0, INT_MAX); if (ctx.simdWidth == 8) { ADD_CURB_REG_FOR_PROFILING(ocl::profilingts3, 0, INT_MAX); ADD_CURB_REG_FOR_PROFILING(ocl::profilingts4, 0, INT_MAX); } } #undef ADD_CURB_REG_FOR_PROFILING this->intervals[ocl::retVal].minID = INT_MAX; this->intervals[ocl::retVal].maxID = -INT_MAX; // Allocate all the vectors first since they need to be contiguous this->allocateVector(selection); // First we try to put all booleans registers into flags this->allocateFlags(selection); this->calculateSpillCost(selection); // Sort both intervals in starting point and ending point increasing orders const uint32_t regNum = ctx.sel->getRegNum(); this->starting.resize(regNum); this->ending.resize(regNum); for (uint32_t regID = 0; regID < regNum; ++regID) 
this->starting[regID] = this->ending[regID] = &intervals[regID]; std::sort(this->starting.begin(), this->starting.end(), cmp); std::sort(this->ending.begin(), this->ending.end(), cmp); // Remove the registers that were not allocated this->expiringID = 0; while (this->expiringID < regNum) { const GenRegInterval *interval = ending[this->expiringID]; if (interval->maxID == -INT_MAX) this->expiringID++; else break; } this->allocateCurbePayload(); ctx.buildPatchList(); // Allocate the special registers (only those which are actually used) this->allocateSpecialRegs(); // Allocate all the GRFs now (regular register and boolean that are not in // flag registers) return this->allocateGRFs(selection); } INLINE void GenRegAllocator::Opaque::outputAllocation(void) { using namespace std; cout << "## register allocation ##" << endl; for(auto &i : RA) { ir::Register vReg = (ir::Register)i.first; ir::RegisterFamily family; uint32_t regSize; getRegAttrib(vReg, regSize, &family); int offst = (int)i.second;// / sizeof(float); int reg = offst / 32; int subreg = (offst % 32) / regSize; cout << "%" << setiosflags(ios::left) << setw(8) << vReg << "g" << setiosflags(ios::left) << setw(3) << reg << "." << setiosflags(ios::left) << setw(3) << subreg << ir::getFamilyName(family) << " " << setw(-3) << regSize << "B\t" << "[ " << setw(8) << this->intervals[(uint)vReg].minID << " -> " << setw(8) << this->intervals[(uint)vReg].maxID << "]" << setw(8) << "use count: " << this->intervals[(uint)vReg].accessCount << endl; } if (!spilledRegs.empty()) cout << "## spilled registers: " << spilledRegs.size() << endl; for(auto it = spilledRegs.begin(); it != spilledRegs.end(); it++) { ir::Register vReg = it->first; ir::RegisterFamily family; uint32_t regSize; getRegAttrib(vReg, regSize, &family); cout << "%" << setiosflags(ios::left) << setw(8) << vReg << "@" << setw(8) << it->second.addr << " " << ir::getFamilyName(family) << " " << setw(-3) << regSize << "B\t" << "[ " << setw(8) << this->intervals[(uint)vReg].minID << " -> " << setw(8) << this->intervals[(uint)vReg].maxID << "]" << setw(8) << "use count: " << this->intervals[(uint)vReg].accessCount << endl; } cout << endl; } INLINE GenRegister setGenReg(const GenRegister &src, uint32_t grfOffset) { GenRegister dst; dst = src; dst.physical = 1; dst.nr = grfOffset / GEN_REG_SIZE; dst.subnr = grfOffset % GEN_REG_SIZE; return dst; } INLINE GenRegister GenRegAllocator::Opaque::genReg(const GenRegister ®) { if (reg.file == GEN_GENERAL_REGISTER_FILE) { if(reg.physical == 1) { return reg; } GBE_ASSERT(RA.contains(reg.reg()) != false); const uint32_t grfOffset = RA.find(reg.reg())->second; const uint32_t suboffset = reg.subphysical ? 
reg.nr * GEN_REG_SIZE + reg.subnr : 0; const GenRegister dst = setGenReg(reg, grfOffset + suboffset); if (reg.quarter != 0) return GenRegister::Qn(dst, reg.quarter); else return dst; } else return reg; } ///////////////////////////////////////////////////////////////////////////// // Register allocator public implementation ///////////////////////////////////////////////////////////////////////////// GenRegAllocator::GenRegAllocator(GenContext &ctx) { this->opaque = GBE_NEW(GenRegAllocator::Opaque, ctx); } GenRegAllocator::~GenRegAllocator(void) { GBE_DELETE(this->opaque); } bool GenRegAllocator::allocate(Selection &selection) { return this->opaque->allocate(selection); } GenRegister GenRegAllocator::genReg(const GenRegister ®) { return this->opaque->genReg(reg); } bool GenRegAllocator::isAllocated(const ir::Register ®) { return this->opaque->isAllocated(reg); } void GenRegAllocator::outputAllocation(void) { this->opaque->outputAllocation(); } uint32_t GenRegAllocator::getRegSize(ir::Register reg) { uint32_t regSize; gbe_curbe_type curbeType = GBE_GEN_REG; int subType = 0; this->opaque->ctx.getRegPayloadType(reg, curbeType, subType); if (curbeType == GBE_CURBE_IMAGE_INFO) regSize = 4; else if (curbeType == GBE_CURBE_KERNEL_ARGUMENT) { const ir::FunctionArgument &arg = this->opaque->ctx.getFunction().getArg(subType); if (arg.type == ir::FunctionArgument::GLOBAL_POINTER || arg.type == ir::FunctionArgument::LOCAL_POINTER || arg.type == ir::FunctionArgument::CONSTANT_POINTER|| arg.type == ir::FunctionArgument::PIPE) regSize = this->opaque->ctx.getPointerSize(); else regSize = arg.size; GBE_ASSERT(arg.reg == reg); } else this->opaque->getRegAttrib(reg, regSize); return regSize; } } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/backend/context.hpp000664 001750 001750 00000016167 13161142102 021424 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #ifndef __GBE_CONTEXT_HPP__ #define __GBE_CONTEXT_HPP__ #include "ir/instruction.hpp" #include "ir/function.hpp" #include "backend/program.h" #include "sys/set.hpp" #include "sys/map.hpp" #include "sys/platform.hpp" #include namespace gbe { namespace ir { class Unit; // Contains the complete program class Function; // We compile a function into a kernel class Liveness; // Describes liveness of each ir function register class FunctionDAG; // Describes the instruction dependencies } /* namespace ir */ } /* namespace gbe */ namespace gbe { class Kernel; // context creates Kernel class RegisterAllocator; // allocator for physical register allocation class ScratchAllocator; // allocator for scratch memory allocation /*! Context is the helper structure to build the Gen ISA or simulation code * from GenIR */ class Context : public NonCopyable { public: /*! Create a new context. 
name is the name of the function we want to * compile */ Context(const ir::Unit &unit, const std::string &name); /*! Release everything needed */ virtual ~Context(void); /*! Start new code generation with a specific simd width. */ void startNewCG(uint32_t simdWidth); /*! Compile the code */ Kernel *compileKernel(void); /*! Tells if the label is used */ INLINE bool isLabelUsed(ir::LabelIndex index) const { return usedLabels.contains(index); } /*! Get the function graph */ INLINE const ir::FunctionDAG &getFunctionDAG(void) const { return *dag; } /*! Get the liveness information */ INLINE const ir::Liveness &getLiveness(void) const { return *liveness; } /*! Tells if the register is used */ bool isRegUsed(const ir::Register &reg) const; /*! Get the kernel we are currently compiling */ INLINE Kernel *getKernel(void) const { return this->kernel; } /*! Get the function we are currently compiling */ INLINE const ir::Function &getFunction(void) const { return this->fn; } /*! Get the target label index for the given instruction */ INLINE ir::LabelIndex getLabelIndex(const ir::Instruction *insn) const { GBE_ASSERT(JIPs.find(insn) != JIPs.end()); return JIPs.find(insn)->second; } /*! Only GOTO and some LABEL instructions may have JIPs */ INLINE bool hasJIP(const ir::Instruction *insn) const { return JIPs.find(insn) != JIPs.end(); } /*! Allocate some memory in the register file */ int32_t allocate(int32_t size, int32_t alignment, bool bFwd = true); bool isSuperRegisterFree(int offset); /*! Deallocate previously allocated memory */ void deallocate(int32_t offset); /*! Split a block into 2 blocks, for registers that are allocated together but deallocated separately */ void splitBlock(int32_t offset, int32_t subOffset); /*! Allocate 'size' bytes of scratch memory and return the start address */ int32_t allocateScratchMem(uint32_t size); /*! Deallocate the scratch memory at 'offset' */ void deallocateScratchMem(int32_t offset); /*! Preallocated curbe register set including special registers. */ map<ir::Register, uint32_t> curbeRegs; ir::Register getSurfaceBaseReg(unsigned char bti); /* Indicate whether we should use a DW label or a W label in the backend. */ bool isDWLabel(void) const { return useDWLabel; } uint32_t getMaxLabel(void) const { return this->isDWLabel() ? 0xffffffff : 0xffff; } /*! Get a register's payload type. */ INLINE void getRegPayloadType(ir::Register reg, gbe_curbe_type &curbeType, int &subType) const { if (reg.value() >= fn.getRegisterFile().regNum()) { curbeType = GBE_GEN_REG; subType = 0; return; } fn.getRegPayloadType(reg, curbeType, subType); } /*! Check whether a register is a payload register */ INLINE bool isPayloadReg(ir::Register reg) const { if (reg.value() >= fn.getRegisterFile().regNum()) return false; return fn.isPayloadReg(reg); } protected: /*! Build the instruction stream. Return false if failed */ virtual bool emitCode(void) = 0; /*! Align the scratch size to the device's scratch unit size */ virtual uint32_t alignScratchSize(uint32_t) = 0; /*! Get the device's max scratch size */ virtual uint32_t getScratchSize(void) = 0; /*! Allocate a new empty kernel (to be implemented) */ virtual Kernel *allocateKernel(void) = 0; /*! Look if a stack is needed and allocate it */ void buildStack(void); /*! Build the list of arguments to set to launch the kernel */ void buildArgList(void); /*! Build the sets of used labels */ void buildUsedLabels(void); /*! Build JIPs for each branch and possibly labels. Can be different from * the branch target due to unstructured branches */ void buildJIPs(void); /*!
Configure SLM use if needed */ void handleSLM(void); /*! Insert a new entry with the given size in the Curbe. Return the offset * of the entry */ void insertCurbeReg(ir::Register, uint32_t grfOffset); /*! allocate a curbe entry. */ uint32_t newCurbeEntry(gbe_curbe_type value, uint32_t subValue, uint32_t size, uint32_t alignment = 0); /*! Provide for each branch and label the label index target */ typedef map JIPMap; const ir::Unit &unit; //!< Unit that contains the kernel const ir::Function &fn; //!< Function to compile std::string name; //!< Name of the kernel to compile Kernel *kernel; //!< Kernel we are building ir::Liveness *liveness; //!< Liveness info for the variables ir::FunctionDAG *dag; //!< Graph of values on the function RegisterAllocator *registerAllocator; //!< physical register allocation ScratchAllocator *scratchAllocator; //!< scratch memory allocator set usedLabels; //!< Set of all used labels JIPMap JIPs; //!< Where to jump all labels/branches uint32_t simdWidth; //!< Number of lanes per HW threads bool useDWLabel; //!< false means using u16 label, true means using u32 label. map btiRegMap; GBE_CLASS(Context); //!< Use custom allocators }; } /* namespace gbe */ #endif /* __GBE_CONTEXT_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen_insn_selection_output.hpp000664 001750 001750 00000000466 13161142102 025220 0ustar00yryr000000 000000 #ifndef __GBE_GEN_INSN_SELECTION_OUTPUT_HPP__ #define __GBE_GEN_INSN_SELECTION_OUTPUT_HPP__ namespace gbe { class Selection; // Pre ISA code class GenContext; // Handle compilation for Gen void outputSelectionIR(GenContext &ctx, Selection* sel, const char* KernelName); } /* namespace gbe */ #endif Beignet-1.3.2-Source/backend/src/backend/gen8_instruction.hpp000664 001750 001750 00000047475 13161142102 023250 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Rong Yang */ /* Copyright (C) Intel Corp. 2006. All Rights Reserved. Intel funded Tungsten Graphics (http://www.tungstengraphics.com) to develop this 3D driver. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. **********************************************************************/ /* * Authors: * Keith Whitwell */ #ifndef __GEN8_INSTRUCTION_HPP__ #define __GEN8_INSTRUCTION_HPP__ union Gen8NativeInstruction { struct { struct { uint32_t opcode:7; uint32_t pad:1; uint32_t access_mode:1; uint32_t dependency_control:2; uint32_t nib_ctrl:1; uint32_t quarter_control:2; uint32_t thread_control:2; uint32_t predicate_control:4; uint32_t predicate_inverse:1; uint32_t execution_size:3; uint32_t destreg_or_condmod:4; uint32_t acc_wr_control:1; uint32_t cmpt_control:1; uint32_t debug_control:1; uint32_t saturate:1; } header; union { struct { uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t mask_control:1; uint32_t dest_reg_file:2; uint32_t dest_reg_type:4; uint32_t src0_reg_file:2; uint32_t src0_reg_type:4; uint32_t pad:1; uint32_t dest_subreg_nr:5; uint32_t dest_reg_nr:8; uint32_t dest_horiz_stride:2; uint32_t dest_address_mode:1; } da1; struct { uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t mask_control:1; uint32_t dest_reg_file:2; uint32_t dest_reg_type:4; uint32_t src0_reg_file:2; uint32_t src0_reg_type:4; int dest_indirect_offset_9:1; /* offset against the deref'd address reg bit9 */ int dest_indirect_offset:9; /* offset against the deref'd address reg bit0-8 */ uint32_t dest_subreg_nr:4; /* subnr for the address reg a0.x */ uint32_t dest_horiz_stride:2; uint32_t dest_address_mode:1; } ia1; struct { uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t mask_control:1; uint32_t dest_reg_file:2; uint32_t dest_reg_type:4; uint32_t src0_reg_file:2; uint32_t src0_reg_type:4; uint32_t pad:1; uint32_t dest_writemask:4; uint32_t dest_subreg_nr:1; uint32_t dest_reg_nr:8; uint32_t dest_horiz_stride:2; uint32_t dest_address_mode:1; } da16; struct { uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t mask_control:1; uint32_t dest_reg_file:2; uint32_t dest_reg_type:4; uint32_t src0_reg_file:2; uint32_t src0_reg_type:4; int dest_indirect_offset_9:1; /* offset against the deref'd address reg bit9 */ uint32_t dest_writemask:4; int dest_indirect_offset:5; uint32_t dest_subreg_nr:3; uint32_t dest_horiz_stride:2; uint32_t dest_address_mode:1; } ia16; struct { // The sub reg field is reinterpreted as accumulator selector. 
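// (In this da16acc variant, the four dst_special_acc bits below sit where
//  the plain da16 layout keeps dest_writemask.)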
uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t mask_control:1; uint32_t dest_reg_file:2; uint32_t dest_reg_type:4; uint32_t src0_reg_file:2; uint32_t src0_reg_type:4; uint32_t pad:1; uint32_t dst_special_acc:4; uint32_t dest_subreg_nr:1; uint32_t dest_reg_nr:8; uint32_t reserved:2; uint32_t dest_address_mode:1; } da16acc; struct { uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t mask_control:1; uint32_t src1_type:1; uint32_t src2_type:1; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t src2_abs:1; uint32_t src2_negate:1; uint32_t src_type:3; uint32_t dest_type:3; uint32_t dest_writemask:4; uint32_t dest_subreg_nr:3; uint32_t dest_reg_nr:8; } da3src; struct { uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t mask_control:1; uint32_t src1_type:1; uint32_t src2_type:1; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t src2_abs:1; uint32_t src2_negate:1; uint32_t src_type:3; uint32_t dest_type:3; uint32_t dst_special_acc:4; uint32_t dest_subreg_nr:3; uint32_t dest_reg_nr:8; } da3srcacc; }bits1; union { struct { uint32_t src0_subreg_nr:5; uint32_t src0_reg_nr:8; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src0_address_mode:1; uint32_t src0_horiz_stride:2; uint32_t src0_width:3; uint32_t src0_vert_stride:4; uint32_t src1_reg_file:2; uint32_t src1_reg_type:4; uint32_t pad:1; } da1; struct { int src0_indirect_offset:9; uint32_t src0_subreg_nr:4; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src0_address_mode:1; uint32_t src0_horiz_stride:2; uint32_t src0_width:3; uint32_t src0_vert_stride:4; uint32_t src1_reg_file:2; uint32_t src1_reg_type:4; uint32_t src0_indirect_offset_9:1; } ia1; struct { uint32_t src0_swz_x:2; uint32_t src0_swz_y:2; uint32_t src0_subreg_nr:1; uint32_t src0_reg_nr:8; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src0_address_mode:1; uint32_t src0_swz_z:2; uint32_t src0_swz_w:2; uint32_t pad0:1; uint32_t src0_vert_stride:4; uint32_t src1_reg_file:2; uint32_t src1_reg_type:4; uint32_t pad:1; } da16; struct { uint32_t src0_swz_x:2; uint32_t src0_swz_y:2; int src0_indirect_offset:5; uint32_t src0_subreg_nr:4; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src0_address_mode:1; uint32_t src0_swz_z:2; uint32_t src0_swz_w:2; uint32_t pad0:1; uint32_t src0_vert_stride:4; uint32_t src1_reg_file:2; uint32_t src1_reg_type:4; uint32_t src0_indirect_offset_9:1; } ia16; struct { uint32_t src0_special_acc_lo:4; uint32_t src0_subreg_nr:1; uint32_t src0_reg_nr:8; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src0_address_mode:1; uint32_t src0_special_acc_hi:4; uint32_t pad0:1; uint32_t src0_vert_stride:4; uint32_t src1_reg_file:2; uint32_t src1_reg_type:4; uint32_t pad:1; } da16acc; struct { uint32_t src0_rep_ctrl:1; uint32_t src0_swizzle:8; uint32_t src0_subreg_nr:3; uint32_t src0_reg_nr:8; uint32_t src0_subreg_nr_w:1; uint32_t src1_rep_ctrl:1; uint32_t src1_swizzle:8; uint32_t src1_subreg_nr_low:2; } da3src; struct { uint32_t src0_rep_ctrl:1; uint32_t src0_special_acc:8; uint32_t src0_subreg_nr:3; uint32_t src0_reg_nr:8; uint32_t src0_subreg_nr_w:1; uint32_t src1_rep_ctrl:1; uint32_t src1_special_acc:8; uint32_t src1_subreg_nr_low:2; } da3srcacc; struct { uint32_t uip:32; } gen8_branch; uint32_t ud; } bits2; union { struct { uint32_t src1_subreg_nr:5; uint32_t src1_reg_nr:8; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t src1_address_mode:1; uint32_t src1_horiz_stride:2; uint32_t src1_width:3; uint32_t 
src1_vert_stride:4; uint32_t pad0:7; } da1; struct { uint32_t src1_swz_x:2; uint32_t src1_swz_y:2; uint32_t src1_subreg_nr:1; uint32_t src1_reg_nr:8; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t src1_address_mode:1; uint32_t src1_swz_z:2; uint32_t src1_swz_w:2; uint32_t pad1:1; uint32_t src1_vert_stride:4; uint32_t pad2:7; } da16; struct { int src1_indirect_offset:9; uint32_t src1_subreg_nr:4; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t src1_address_mode:1; uint32_t src1_horiz_stride:2; uint32_t src1_width:3; uint32_t src1_vert_stride:4; int src1_indirect_offset_9:1; uint32_t pad1:6; } ia1; struct { uint32_t src1_swz_x:2; uint32_t src1_swz_y:2; int src1_indirect_offset:5; uint32_t src1_subreg_nr:4; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t src1_address_mode:1; uint32_t src1_swz_z:2; uint32_t src1_swz_w:2; uint32_t pad1:1; uint32_t src1_vert_stride:4; int src1_indirect_offset_9:1; uint32_t pad2:6; } ia16; struct { uint32_t src1_special_acc_lo:4; uint32_t src1_subreg_nr:1; uint32_t src1_reg_nr:8; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t src1_address_mode:1; uint32_t src1_special_acc_hi:4; uint32_t pad1:1; uint32_t src1_vert_stride:4; uint32_t pad2:7; } da16acc; struct { uint32_t function_control:19; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } generic_gen5; struct { uint32_t sub_function_id:3; uint32_t pad0:11; uint32_t ack_req:1; uint32_t notify:2; uint32_t pad1:2; uint32_t header:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } msg_gateway; struct { uint32_t opcode:1; uint32_t request:1; uint32_t pad0:2; uint32_t resource:1; uint32_t pad1:14; uint32_t header:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } spawner_gen5; /** Ironlake PRM, Volume 4 Part 1, Section 6.1.1.1 */ struct { uint32_t function:4; uint32_t int_type:1; uint32_t precision:1; uint32_t saturate:1; uint32_t data_type:1; uint32_t snapshot:1; uint32_t pad0:10; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } math_gen5; struct { uint32_t bti:8; uint32_t sampler:4; uint32_t msg_type:5; uint32_t simd_mode:2; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } sampler_gen7; /** * Message for the Sandybridge Sampler Cache or Constant Cache Data Port. * * See the Sandybridge PRM, Volume 4 Part 1, Section 3.9.2.1.1. **/ struct { uint32_t bti:8; uint32_t msg_control:5; uint32_t msg_type:3; uint32_t pad0:3; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } gen6_dp_sampler_const_cache; /*! Data port untyped read / write messages */ struct { uint32_t bti:8; uint32_t rgba:4; uint32_t simd_mode:2; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_untyped_rw; /*! Data port byte scatter / gather */ struct { uint32_t bti:8; uint32_t simd_mode:1; uint32_t ignored0:1; uint32_t data_size:2; uint32_t ignored1:2; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_byte_rw; /*! 
Data port scratch read / write */ struct { uint32_t offset:12; uint32_t block_size:2; uint32_t ignored0:1; uint32_t invalidate_after_read:1; uint32_t channel_mode:1; uint32_t msg_type:1; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_scratch_rw; /*! Data port OBlock read / write */ struct { uint32_t bti:8; uint32_t block_size:3; uint32_t ignored:2; uint32_t invalidate_after_read:1; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_oblock_rw; /*! Data port dword scatter / gather */ struct { uint32_t bti:8; uint32_t block_size:2; uint32_t ignored0:3; uint32_t invalidate_after_read:1; uint32_t msg_type:4; uint32_t ignored1:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_dword_rw; /*! Data port typed read / write messages */ struct { uint32_t bti:8; uint32_t chan_mask:4; uint32_t slot:2; uint32_t msg_type:4; uint32_t pad2:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_typed_rw; /*! Memory fence */ struct { uint32_t bti:8; uint32_t pad:1; uint32_t flush_instruction:1; uint32_t flush_texture:1; uint32_t flush_constant:1; uint32_t flush_rw:1; uint32_t commit_enable:1; uint32_t msg_type:4; uint32_t pad2:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_memory_fence; /*! atomic messages */ struct { uint32_t bti:8; uint32_t aop_type:4; uint32_t simd_mode:1; uint32_t return_data:1; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_atomic_op; /*! atomic a64 messages */ struct { uint32_t bti:8; uint32_t aop_type:4; uint32_t data_size:1; uint32_t return_data:1; uint32_t msg_type:5; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen8_atomic_a64; // gen8 a64 untyped read/write struct { uint32_t bti:8; uint32_t rgba:4; uint32_t simd_mode:2; uint32_t msg_type:5; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen8_untyped_rw_a64; struct { uint32_t bti:8; uint32_t block_sz:2; // 00: byte, 01: dword uint32_t data_sz:2; // 0: 1 block, 1: 2 blocks uint32_t ignored:2; uint32_t msg_type:5; // 10000: scatter read, 11010: scatter write, 11001: a64 untyped write uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen8_scatter_rw_a64; struct { uint32_t src1_subreg_nr_high:1; uint32_t src1_reg_nr:8; uint32_t src1_subreg_nr_w:1; uint32_t src2_rep_ctrl:1; uint32_t src2_swizzle:8; uint32_t src2_subreg_nr:3; uint32_t src2_reg_nr:8; uint32_t src2_subreg_nr_w:1; uint32_t pad:1; } da3src; struct { uint32_t src1_subreg_nr_high:1; uint32_t src1_reg_nr:8; uint32_t src1_subreg_nr_w:1; uint32_t src2_rep_ctrl:1; uint32_t src2_special_acc:8; uint32_t src2_subreg_nr:3; uint32_t src2_reg_nr:8; uint32_t src2_subreg_nr_w:1; uint32_t pad:1; } da3srcacc; /*!
Message gateway */ struct { uint32_t subfunc:3; uint32_t pad:11; uint32_t ackreq:1; uint32_t notify:2; uint32_t pad2:2; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_msg_gw; struct { uint32_t bti:8; uint32_t block_size:3; // oword size uint32_t msg_sub_type:2; // 00: OWord block R/W, 01: unaligned OWord block read, 10: OWord dual block R/W, 11: HWord block R/W uint32_t ignored:1; uint32_t msg_type:5; // 10100: A64 block read, 10101: A64 block write uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen8_block_rw_a64; struct { uint32_t jip:32; } gen8_branch; /*! Data port Media block read / write */ struct { uint32_t bti:8; uint32_t ver_line_stride_offset:1; uint32_t ver_line_stride:1; uint32_t ver_line_stride_override:1; uint32_t ignored:3; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_mblock_rw; int d; uint32_t ud; float f; } bits3; }; }; #endif Beignet-1.3.2-Source/backend/src/backend/gen_encoder.cpp000664 001750 001750 00000151415 13161142102 022177 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ /* Copyright (C) Intel Corp. 2006. All Rights Reserved. Intel funded Tungsten Graphics (http://www.tungstengraphics.com) to develop this 3D driver. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
**********************************************************************/ /* * Authors: * Keith Whitwell */ #include "backend/gen_encoder.hpp" #include <cstring> namespace gbe { extern bool compactAlu2(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1, uint32_t condition, bool split); extern bool compactAlu1(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src, uint32_t condition, bool split); ////////////////////////////////////////////////////////////////////////// // Some helper functions to encode ////////////////////////////////////////////////////////////////////////// INLINE bool isVectorOfBytes(GenRegister reg) { if (reg.hstride != GEN_HORIZONTAL_STRIDE_0 && (reg.type == GEN_TYPE_UB || reg.type == GEN_TYPE_B)) return true; else return false; } INLINE bool isVectorOfLongs(GenRegister reg) { if (reg.hstride != GEN_HORIZONTAL_STRIDE_0 && (reg.type == GEN_TYPE_UL || reg.type == GEN_TYPE_L)) return true; else return false; } INLINE bool isCrossMoreThan2(GenRegister reg) { if (reg.hstride == GEN_HORIZONTAL_STRIDE_0) return false; const uint32_t typeSz = typeSize(reg.type); const uint32_t horizontal = stride(reg.hstride); if (horizontal * typeSz * 16 > GEN_REG_SIZE * 2) { return true; } return false; } INLINE bool isSrcDstDiffSpan(GenRegister dst, GenRegister src) { if (src.hstride == GEN_HORIZONTAL_STRIDE_0) return false; GBE_ASSERT(dst.hstride != GEN_HORIZONTAL_STRIDE_0 && "dst register is uniform but src is not."); uint32_t typeSz = typeSize(dst.type); uint32_t horizontal = stride(dst.hstride); uint32_t spans = (dst.subnr / (horizontal * typeSz)) * (horizontal * typeSz) + horizontal * typeSz * 16; uint32_t dstSpan = spans / GEN_REG_SIZE; dstSpan = dstSpan + (spans % GEN_REG_SIZE == 0 ? 0 : 1); if (dstSpan < 2) return false; typeSz = typeSize(src.type); horizontal = stride(src.hstride); spans = (src.subnr / (horizontal * typeSz)) * (horizontal * typeSz) + horizontal * typeSz * 16; uint32_t srcSpan = (horizontal * typeSz * 16) / GEN_REG_SIZE; srcSpan = srcSpan + (spans % GEN_REG_SIZE == 0 ? 0 : 1); GBE_ASSERT(srcSpan <= 2); GBE_ASSERT(dstSpan == 2); if (srcSpan == dstSpan) return false; /* Special case: dst is DW and src is W. The case mov (16) r10.0<1>:d r12<8;8,1>:w is allowed.
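In that widening move the W source stays within a single register while the D destination spans two; this is the span mismatch the code below explicitly permits, so no instruction split is required.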
*/ if ((dst.type == GEN_TYPE_UD || dst.type == GEN_TYPE_D) && (src.type == GEN_TYPE_UW || src.type == GEN_TYPE_W) && dstSpan == 2 && srcSpan == 1 && dst.subnr == 0 && src.subnr == 0) return false; return true; } INLINE bool needToSplitAlu1(GenEncoder *p, GenRegister dst, GenRegister src) { if (p->curr.execWidth != 16) return false; if (isVectorOfLongs(dst) == true) return true; if (isCrossMoreThan2(dst) == true) return true; if (src.hstride == GEN_HORIZONTAL_STRIDE_0) return false; if (isCrossMoreThan2(src) == true) return true; if (isVectorOfLongs(src) == true) return true; if (isSrcDstDiffSpan(dst, src) == true) return true; if (isVectorOfBytes(dst) == true && ((isVectorOfBytes(src) == true && src.hstride == dst.hstride) || src.hstride == GEN_HORIZONTAL_STRIDE_0)) return false; if (isVectorOfBytes(dst) == true) return true; if (isVectorOfBytes(src) == true) return true; return false; } INLINE bool needToSplitAlu2(GenEncoder *p, GenRegister dst, GenRegister src0, GenRegister src1) { if (p->curr.execWidth != 16) return false; if (isVectorOfLongs(dst) == true) return true; if (isCrossMoreThan2(dst) == true) return true; if (src0.hstride == GEN_HORIZONTAL_STRIDE_0 && src1.hstride == GEN_HORIZONTAL_STRIDE_0) return false; if (isVectorOfLongs(src0) == true) return true; if (isVectorOfLongs(src1) == true) return true; if (isCrossMoreThan2(src0) == true) return true; if (isCrossMoreThan2(src1) == true) return true; if (isSrcDstDiffSpan(dst, src0) == true) return true; if (isSrcDstDiffSpan(dst, src1) == true) return true; if (isVectorOfBytes(dst) == true && ((isVectorOfBytes(src0) == true && src0.hstride == dst.hstride) || src0.hstride == GEN_HORIZONTAL_STRIDE_0) && ((isVectorOfBytes(src1) == true && src1.hstride == dst.hstride) || src1.hstride == GEN_HORIZONTAL_STRIDE_0)) return false; if (isVectorOfBytes(dst) == true ) return true; if (isVectorOfBytes(src0) == true) return true; if (isVectorOfBytes(src1) == true) return true; return false; } INLINE bool needToSplitCmp(GenEncoder *p, GenRegister src0, GenRegister src1, GenRegister dst) { if (p->curr.execWidth != 16) return false; if (isVectorOfLongs(dst) == true) return true; if (isCrossMoreThan2(dst) == true) return true; if (src0.hstride == GEN_HORIZONTAL_STRIDE_0 && src1.hstride == GEN_HORIZONTAL_STRIDE_0) return false; if (isVectorOfBytes(src0) == true) return true; if (isVectorOfBytes(src1) == true) return true; if (isVectorOfLongs(src0) == true) return true; if (isVectorOfLongs(src1) == true) return true; if (isCrossMoreThan2(src0) == true) return true; if (isCrossMoreThan2(src1) == true) return true; if (isSrcDstDiffSpan(dst, src0) == true) return true; if (isSrcDstDiffSpan(dst, src1) == true) return true; if (src0.type == GEN_TYPE_D || src0.type == GEN_TYPE_UD || src0.type == GEN_TYPE_F) return true; if (src1.type == GEN_TYPE_D || src1.type == GEN_TYPE_UD || src1.type == GEN_TYPE_F) return true; return false; } void GenEncoder::setMessageDescriptor(GenNativeInstruction *inst, enum GenMessageTarget sfid, unsigned msg_length, unsigned response_length, bool header_present, bool end_of_thread) { inst->bits3.generic_gen5.header_present = header_present; inst->bits3.generic_gen5.response_length = response_length; inst->bits3.generic_gen5.msg_length = msg_length; inst->bits3.generic_gen5.end_of_thread = end_of_thread; inst->header.destreg_or_condmod = sfid; } void GenEncoder::setTypedWriteMessage(GenNativeInstruction *insn, unsigned char bti, unsigned char msg_type, uint32_t msg_length, bool header_present) { const GenMessageTarget sfid = 
GEN_SFID_DATAPORT_RENDER; setMessageDescriptor(insn, sfid, msg_length, 0, header_present); insn->bits3.gen7_typed_rw.bti = bti; insn->bits3.gen7_typed_rw.msg_type = msg_type; } void GenEncoder::setDPUntypedRW(GenNativeInstruction *insn, uint32_t bti, uint32_t rgba, uint32_t msg_type, uint32_t msg_length, uint32_t response_length) { const GenMessageTarget sfid = GEN_SFID_DATAPORT_DATA; setMessageDescriptor(insn, sfid, msg_length, response_length); insn->bits3.gen7_untyped_rw.msg_type = msg_type; insn->bits3.gen7_untyped_rw.bti = bti; insn->bits3.gen7_untyped_rw.rgba = rgba; if (curr.execWidth == 8) insn->bits3.gen7_untyped_rw.simd_mode = GEN_UNTYPED_SIMD8; else if (curr.execWidth == 16) insn->bits3.gen7_untyped_rw.simd_mode = GEN_UNTYPED_SIMD16; else NOT_SUPPORTED; } void GenEncoder::setDPByteScatterGather(GenNativeInstruction *insn, uint32_t bti, uint32_t elem_size, uint32_t msg_type, uint32_t msg_length, uint32_t response_length) { const GenMessageTarget sfid = GEN_SFID_DATAPORT_DATA; setMessageDescriptor(insn, sfid, msg_length, response_length); insn->bits3.gen7_byte_rw.msg_type = msg_type; insn->bits3.gen7_byte_rw.bti = bti; insn->bits3.gen7_byte_rw.data_size = elem_size; if (curr.execWidth == 8) insn->bits3.gen7_byte_rw.simd_mode = GEN_BYTE_SCATTER_SIMD8; else if (curr.execWidth == 16) insn->bits3.gen7_byte_rw.simd_mode = GEN_BYTE_SCATTER_SIMD16; else NOT_SUPPORTED; } void GenEncoder::setOBlockRW(GenNativeInstruction *insn, uint32_t bti, uint32_t block_size, uint32_t msg_type, uint32_t msg_length, uint32_t response_length) { const GenMessageTarget sfid = GEN_SFID_DATAPORT_DATA; setMessageDescriptor(insn, sfid, msg_length, response_length); insn->bits3.gen7_oblock_rw.msg_type = msg_type; insn->bits3.gen7_oblock_rw.bti = bti; insn->bits3.gen7_oblock_rw.block_size = block_size; insn->bits3.gen7_oblock_rw.header_present = 1; } uint32_t GenEncoder::getOBlockSize(uint32_t oword_size, bool low_half) { /* 000: 1 OWord, read into or written from the low 128 bits of the destination register. * 001: 1 OWord, read into or written from the high 128 bits of the destination register. * 010: 2 OWords * 011: 4 OWords * 100: 8 OWords */ switch(oword_size) { case 1: return low_half ? 0 : 1; case 2: return 2; case 4: return 3; case 8: return 4; default: NOT_SUPPORTED; } return 0; } void GenEncoder::setMBlockRW(GenNativeInstruction *insn, uint32_t bti, uint32_t msg_type, uint32_t msg_length, uint32_t response_length) { const GenMessageTarget sfid = GEN_SFID_DATAPORT1_DATA; setMessageDescriptor(insn, sfid, msg_length, response_length); insn->bits3.gen7_mblock_rw.msg_type = msg_type; insn->bits3.gen7_mblock_rw.bti = bti; insn->bits3.gen7_mblock_rw.header_present = 1; } static void setDWordScatterMessage(GenEncoder *p, GenNativeInstruction *insn, uint32_t bti, uint32_t block_size, uint32_t msg_type, uint32_t msg_length, uint32_t response_length) { // FIXME: there is an unknown issue with the baytrail-t platform; the DWORD scatter // message causes a hang in the unit test case compiler_global_constant. // We work around it by using the DATA CACHE instead. const GenMessageTarget sfid = (p->deviceID == PCI_CHIP_BAYTRAIL_T) ?
GEN_SFID_DATAPORT_DATA : GEN_SFID_DATAPORT_CONSTANT; p->setMessageDescriptor(insn, sfid, msg_length, response_length); insn->bits3.gen7_dword_rw.msg_type = msg_type; insn->bits3.gen7_dword_rw.bti = bti; insn->bits3.gen7_dword_rw.block_size = block_size; insn->bits3.gen7_dword_rw.invalidate_after_read = 0; } ////////////////////////////////////////////////////////////////////////// // Gen Emitter encoding class ////////////////////////////////////////////////////////////////////////// GenEncoder::GenEncoder(uint32_t simdWidth, uint32_t gen, uint32_t deviceID) : stateNum(0), gen(gen), deviceID(deviceID) { this->simdWidth = simdWidth; this->curr.execWidth = simdWidth; this->curr.quarterControl = GEN_COMPRESSION_Q1; this->curr.noMask = 0; this->curr.flag = 0; this->curr.subFlag = 0; this->curr.predicate = GEN_PREDICATE_NORMAL; this->curr.inversePredicate = 0; } void GenEncoder::push(void) { assert(stateNum < MAX_STATE_NUM); stack[stateNum++] = curr; } void GenEncoder::pop(void) { assert(stateNum > 0); curr = stack[--stateNum]; } static const uint32_t untypedRWMask[] = { GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE|GEN_UNTYPED_GREEN|GEN_UNTYPED_RED, GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE|GEN_UNTYPED_GREEN, GEN_UNTYPED_ALPHA|GEN_UNTYPED_BLUE, GEN_UNTYPED_ALPHA, 0 }; unsigned GenEncoder::generateUntypedReadMessageDesc(unsigned bti, unsigned elemNum) { GenNativeInstruction insn; memset(&insn, 0, sizeof(GenNativeInstruction)); return setUntypedReadMessageDesc(&insn, bti, elemNum); } unsigned GenEncoder::setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) { uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = 1; response_length = elemNum; } else if (this->curr.execWidth == 16) { msg_length = 2; response_length = 2 * elemNum; } else NOT_IMPLEMENTED; setDPUntypedRW(insn, bti, untypedRWMask[elemNum], GEN7_UNTYPED_READ, msg_length, response_length); return insn->bits3.ud; } void GenEncoder::UNTYPED_READ(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemNum) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); assert(elemNum >= 1 && elemNum <= 4); this->setHeader(insn); this->setDst(insn, GenRegister::uw16grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0)); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA; if (bti.file == GEN_IMMEDIATE_VALUE) { this->setSrc1(insn, GenRegister::immud(0)); setUntypedReadMessageDesc(insn, bti.value.ud, elemNum); } else { this->setSrc1(insn, bti); } } unsigned GenEncoder::generateUntypedWriteMessageDesc(unsigned bti, unsigned elemNum) { GenNativeInstruction insn; memset(&insn, 0, sizeof(GenNativeInstruction)); return setUntypedWriteMessageDesc(&insn, bti, elemNum); } unsigned GenEncoder::setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) { uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = 1 + elemNum; } else if (this->curr.execWidth == 16) { msg_length = 2 * (1 + elemNum); } else NOT_IMPLEMENTED; setDPUntypedRW(insn, bti, untypedRWMask[elemNum], GEN7_UNTYPED_WRITE, msg_length, response_length); return insn->bits3.ud; } unsigned GenEncoder::generateUntypedWriteSendsMessageDesc(unsigned bti, unsigned elemNum) { GenNativeInstruction insn; memset(&insn, 0, sizeof(GenNativeInstruction)); return setUntypedWriteSendsMessageDesc(&insn, bti, elemNum); } unsigned GenEncoder::setUntypedWriteSendsMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum) { assert(0); return 0;
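// The split-send (sends) variant of the untyped write is not implemented in
// this base encoder; encoders for newer hardware are presumably expected to
// override this hook.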
} void GenEncoder::UNTYPED_READA64(GenRegister dst, GenRegister src, uint32_t elemNum) { assert(0); } void GenEncoder::UNTYPED_WRITEA64(GenRegister src, uint32_t elemNum){ assert(0); } void GenEncoder::ATOMICA64(GenRegister dst, uint32_t function, GenRegister src, GenRegister bti, uint32_t srcNum) { assert(0); } void GenEncoder::UNTYPED_WRITE(GenRegister msg, GenRegister data, GenRegister bti, uint32_t elemNum, bool useSends) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); assert(elemNum >= 1 && elemNum <= 4); this->setHeader(insn); if (this->curr.execWidth == 8) { this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD)); } else if (this->curr.execWidth == 16) { this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW)); } else NOT_IMPLEMENTED; this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0)); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA; if (bti.file == GEN_IMMEDIATE_VALUE) { this->setSrc1(insn, GenRegister::immud(0)); setUntypedWriteMessageDesc(insn, bti.value.ud, elemNum); } else { this->setSrc1(insn, bti); } } unsigned GenEncoder::generateByteGatherMessageDesc(unsigned bti, unsigned elemSize) { GenNativeInstruction insn; memset(&insn, 0, sizeof(GenNativeInstruction)); return setByteGatherMessageDesc(&insn, bti, elemSize); } unsigned GenEncoder::setByteGatherMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize) { uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = 1; response_length = 1; } else if (this->curr.execWidth == 16) { msg_length = 2; response_length = 2; } else NOT_IMPLEMENTED; setDPByteScatterGather(insn, bti, elemSize, GEN7_BYTE_GATHER, msg_length, response_length); return insn->bits3.ud; } void GenEncoder::BYTE_GATHER(GenRegister dst, GenRegister src, GenRegister bti, uint32_t elemSize) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA; this->setDst(insn, GenRegister::ud8grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(src.nr, 0)); if (bti.file == GEN_IMMEDIATE_VALUE) { this->setSrc1(insn, GenRegister::immud(0)); setByteGatherMessageDesc(insn, bti.value.ud, elemSize); } else { this->setSrc1(insn, bti); } } unsigned GenEncoder::generateByteScatterMessageDesc(unsigned bti, unsigned elemSize) { GenNativeInstruction insn; memset(&insn, 0, sizeof(GenNativeInstruction)); return setByteScatterMessageDesc(&insn, bti, elemSize); } unsigned GenEncoder::generateByteScatterSendsMessageDesc(unsigned bti, unsigned elemSize) { GenNativeInstruction insn; memset(&insn, 0, sizeof(GenNativeInstruction)); return setByteScatterSendsMessageDesc(&insn, bti, elemSize); } unsigned GenEncoder::setByteScatterSendsMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize) { assert(0); return 0; } unsigned GenEncoder::setByteScatterMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemSize) { uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = 2; } else if (this->curr.execWidth == 16) { msg_length = 4; } else NOT_IMPLEMENTED; setDPByteScatterGather(insn, bti, elemSize, GEN7_BYTE_SCATTER, msg_length, response_length); return insn->bits3.ud; } void GenEncoder::BYTE_SCATTER(GenRegister msg, GenRegister data, GenRegister bti, uint32_t elemSize, bool useSends) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA; if
(this->curr.execWidth == 8) { this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD)); } else if (this->curr.execWidth == 16) { this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW)); } else NOT_IMPLEMENTED; this->setSrc0(insn, GenRegister::ud8grf(msg.nr, 0)); if (bti.file == GEN_IMMEDIATE_VALUE) { this->setSrc1(insn, GenRegister::immud(0)); setByteScatterMessageDesc(insn, bti.value.ud, elemSize); } else { this->setSrc1(insn, bti); } } void GenEncoder::BYTE_GATHERA64(GenRegister dst, GenRegister src, uint32_t elemSize) { assert(0); } void GenEncoder::BYTE_SCATTERA64(GenRegister src, uint32_t elemSize){ assert(0); } void GenEncoder::DWORD_GATHER(GenRegister dst, GenRegister src, uint32_t bti) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); uint32_t msg_length = 0; uint32_t response_length = 0; uint32_t block_size = 0; if (this->curr.execWidth == 8) { msg_length = 1; response_length = 1; block_size = GEN_DWORD_SCATTER_8_DWORDS; } else if (this->curr.execWidth == 16) { msg_length = 2; response_length = 2; block_size = GEN_DWORD_SCATTER_16_DWORDS; } else NOT_IMPLEMENTED; this->setHeader(insn); this->setDst(insn, dst); this->setSrc0(insn, src); this->setSrc1(insn, GenRegister::immud(0)); setDWordScatterMessage(this, insn, bti, block_size, GEN7_DWORD_GATHER, msg_length, response_length); } unsigned GenEncoder::generateAtomicMessageDesc(unsigned function, unsigned bti, unsigned srcNum) { GenNativeInstruction insn; memset(&insn, 0, sizeof(GenNativeInstruction)); return setAtomicMessageDesc(&insn, function, bti, srcNum); } unsigned GenEncoder::setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum) { uint32_t msg_length = 0; uint32_t response_length = 0; if (this->curr.execWidth == 8) { msg_length = srcNum; response_length = 1; } else if (this->curr.execWidth == 16) { msg_length = 2 * srcNum; response_length = 2; } else NOT_IMPLEMENTED; const GenMessageTarget sfid = GEN_SFID_DATAPORT_DATA; setMessageDescriptor(insn, sfid, msg_length, response_length); insn->bits3.gen7_atomic_op.msg_type = GEN7_UNTYPED_ATOMIC_READ; insn->bits3.gen7_atomic_op.bti = bti; insn->bits3.gen7_atomic_op.return_data = 1; insn->bits3.gen7_atomic_op.aop_type = function; if (this->curr.execWidth == 8) insn->bits3.gen7_atomic_op.simd_mode = GEN_ATOMIC_SIMD8; else if (this->curr.execWidth == 16) insn->bits3.gen7_atomic_op.simd_mode = GEN_ATOMIC_SIMD16; else NOT_SUPPORTED; return insn->bits3.ud; } unsigned GenEncoder::setAtomicA64MessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum, int type_long) { GBE_ASSERT(0); return 0; } void GenEncoder::ATOMIC(GenRegister dst, uint32_t function, GenRegister addr, GenRegister data, GenRegister bti, uint32_t srcNum, bool useSends) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); insn->header.destreg_or_condmod = GEN_SFID_DATAPORT_DATA; this->setDst(insn, GenRegister::uw16grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(addr.nr, 0)); if (bti.file == GEN_IMMEDIATE_VALUE) { this->setSrc1(insn, GenRegister::immud(0)); setAtomicMessageDesc(insn, function, bti.value.ud, srcNum); } else { this->setSrc1(insn, bti); } } extern bool OCL_DEBUGINFO; // first defined by calling BVAR in program.cpp void GenEncoder::setDBGInfo(DebugInfo in, bool hasHigh) { if(OCL_DEBUGINFO) { storedbg.push_back(in); if(hasHigh) storedbg.push_back(in); } } GenCompactInstruction *GenEncoder::nextCompact(uint32_t opcode) { GenCompactInstruction insn;
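// A compact instruction is half the size of a native one, so only the low
// 64-bit word is pushed to the store here; next() below pushes both the low
// and the high words.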
std::memset(&insn, 0, sizeof(GenCompactInstruction)); insn.bits1.opcode = opcode; this->store.push_back(insn.low); setDBGInfo(DBGInfo, false); return (GenCompactInstruction *)&this->store.back(); } GenNativeInstruction *GenEncoder::next(uint32_t opcode) { GenNativeInstruction insn; std::memset(&insn, 0, sizeof(GenNativeInstruction)); insn.header.opcode = opcode; this->store.push_back(insn.low); this->store.push_back(insn.high); setDBGInfo(DBGInfo, true); return (GenNativeInstruction *)(&this->store.back()-1); } bool GenEncoder::canHandleLong(uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1) { /* For now, only alu1 instructions come here, so a MOV is all we need. */ this->MOV(dst.bottom_half(), src0.bottom_half()); this->MOV(dst.top_half(this->simdWidth), src0.top_half(this->simdWidth)); return true; } void GenEncoder::handleDouble(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1) { /* Platforms before Gen8 do not support double, so we cannot get here. */ GBE_ASSERT(0); } void alu1(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src, uint32_t condition) { if (dst.isdf() && src.isdf()) { p->handleDouble(p, opcode, dst, src); } else if (dst.isint64() && src.isint64() && p->canHandleLong(opcode, dst, src)) { // handle int64 return; } else if (needToSplitAlu1(p, dst, src) == false) { if(compactAlu1(p, opcode, dst, src, condition, false)) return; GenNativeInstruction *insn = p->next(opcode); if (condition != 0) { GBE_ASSERT(opcode == GEN_OPCODE_MOV || opcode == GEN_OPCODE_NOT); insn->header.destreg_or_condmod = condition; } p->setHeader(insn); p->setDst(insn, dst); p->setSrc0(insn, src); } else { GenNativeInstruction *insnQ1, *insnQ2; // Instruction for the first quarter insnQ1 = p->next(opcode); p->setHeader(insnQ1); insnQ1->header.quarter_control = GEN_COMPRESSION_Q1; insnQ1->header.execution_size = GEN_WIDTH_8; p->setDst(insnQ1, dst); p->setSrc0(insnQ1, src); // Instruction for the second quarter insnQ2 = p->next(opcode); p->setHeader(insnQ2); insnQ2->header.quarter_control = GEN_COMPRESSION_Q2; insnQ2->header.execution_size = GEN_WIDTH_8; p->setDst(insnQ2, GenRegister::Qn(dst, 1)); p->setSrc0(insnQ2, GenRegister::Qn(src, 1)); } } void alu2(GenEncoder *p, uint32_t opcode, GenRegister dst, GenRegister src0, GenRegister src1, uint32_t condition) { if (dst.isdf() && src0.isdf() && src1.isdf()) { p->handleDouble(p, opcode, dst, src0, src1); } else if (needToSplitAlu2(p, dst, src0, src1) == false) { if(compactAlu2(p, opcode, dst, src0, src1, condition, false)) return; GenNativeInstruction *insn = p->next(opcode); if (condition != 0) { GBE_ASSERT(opcode == GEN_OPCODE_OR || opcode == GEN_OPCODE_XOR || opcode == GEN_OPCODE_AND); insn->header.destreg_or_condmod = condition; } p->setHeader(insn); p->setDst(insn, dst); p->setSrc0(insn, src0); p->setSrc1(insn, src1); } else { GenNativeInstruction *insnQ1, *insnQ2; // Instruction for the first quarter insnQ1 = p->next(opcode); p->setHeader(insnQ1); insnQ1->header.quarter_control = GEN_COMPRESSION_Q1; insnQ1->header.execution_size = GEN_WIDTH_8; p->setDst(insnQ1, dst); p->setSrc0(insnQ1, src0); p->setSrc1(insnQ1, src1); // Instruction for the second quarter insnQ2 = p->next(opcode); p->setHeader(insnQ2); insnQ2->header.quarter_control = GEN_COMPRESSION_Q2; insnQ2->header.execution_size = GEN_WIDTH_8; p->setDst(insnQ2, GenRegister::Qn(dst, 1)); p->setSrc0(insnQ2, GenRegister::Qn(src0, 1)); p->setSrc1(insnQ2, GenRegister::Qn(src1, 1)); } } #define ALU1(OP) \ void GenEncoder::OP(GenRegister dest,
GenRegister src0, uint32_t condition) { \ alu1(this, GEN_OPCODE_##OP, dest, src0, condition); \ } #define ALU2(OP) \ void GenEncoder::OP(GenRegister dest, GenRegister src0, GenRegister src1) { \ alu2(this, GEN_OPCODE_##OP, dest, src0, src1, 0); \ } #define ALU2_MOD(OP) \ void GenEncoder::OP(GenRegister dest, GenRegister src0, GenRegister src1, uint32_t condition) { \ alu2(this, GEN_OPCODE_##OP, dest, src0, src1, condition); \ } #define ALU3(OP) \ void GenEncoder::OP(GenRegister dest, GenRegister src0, GenRegister src1, GenRegister src2) { \ this->alu3(GEN_OPCODE_##OP, dest, src0, src1, src2); \ } void GenEncoder::LOAD_INT64_IMM(GenRegister dest, GenRegister value) { GenRegister u0 = GenRegister::immd((int)value.value.i64), u1 = GenRegister::immd(value.value.i64 >> 32); MOV(dest.bottom_half(), u0); MOV(dest.top_half(this->simdWidth), u1); } void GenEncoder::F16TO32(GenRegister dest, GenRegister src0) { alu1(this, GEN_OPCODE_F16TO32, dest, src0); } void GenEncoder::F32TO16(GenRegister dest, GenRegister src0) { alu1(this, GEN_OPCODE_F32TO16, dest, src0); } ALU1(MOV) ALU1(RNDZ) ALU1(RNDE) ALU1(RNDD) ALU1(RNDU) ALU1(FBH) ALU1(FBL) ALU1(CBIT) ALU2(SEL) ALU1(NOT) ALU2_MOD(AND) ALU2_MOD(OR) ALU2_MOD(XOR) ALU2(SHR) ALU2(SHL) ALU2(RSR) ALU2(RSL) ALU2(ASR) ALU1(FRC) ALU2(MAC) ALU1(LZD) ALU2(LINE) ALU2(PLN) ALU2(MACH) ALU3(MAD) ALU3(LRP) ALU1(BFREV) // ALU2(BRC) // ALU1(ENDIF) // ALU1(IF) void GenEncoder::SUBB(GenRegister dest, GenRegister src0, GenRegister src1) { push(); curr.accWrEnable = 1; alu2(this, GEN_OPCODE_SUBB, dest, src0, src1); pop(); } void GenEncoder::ADDC(GenRegister dest, GenRegister src0, GenRegister src1) { push(); curr.accWrEnable = 1; alu2(this, GEN_OPCODE_ADDC, dest, src0, src1); pop(); } void GenEncoder::ADD(GenRegister dest, GenRegister src0, GenRegister src1) { if (src0.type == GEN_TYPE_F || (src0.file == GEN_IMMEDIATE_VALUE && src0.type == GEN_TYPE_VF)) { assert(src1.type != GEN_TYPE_UD); assert(src1.type != GEN_TYPE_D); } if (src1.type == GEN_TYPE_F || (src1.file == GEN_IMMEDIATE_VALUE && src1.type == GEN_TYPE_VF)) { assert(src0.type != GEN_TYPE_UD); assert(src0.type != GEN_TYPE_D); } alu2(this, GEN_OPCODE_ADD, dest, src0, src1); } void GenEncoder::MUL(GenRegister dest, GenRegister src0, GenRegister src1) { if (src0.type == GEN_TYPE_D || src0.type == GEN_TYPE_UD || src1.type == GEN_TYPE_D || src1.type == GEN_TYPE_UD) assert(dest.type != GEN_TYPE_F); if (src0.type == GEN_TYPE_F || (src0.file == GEN_IMMEDIATE_VALUE && src0.type == GEN_TYPE_VF)) { assert(src1.type != GEN_TYPE_UD); assert(src1.type != GEN_TYPE_D); } if (src1.type == GEN_TYPE_F || (src1.file == GEN_IMMEDIATE_VALUE && src1.type == GEN_TYPE_VF)) { assert(src0.type != GEN_TYPE_UD); assert(src0.type != GEN_TYPE_D); } assert(src0.file != GEN_ARCHITECTURE_REGISTER_FILE || src0.nr != GEN_ARF_ACCUMULATOR); assert(src1.file != GEN_ARCHITECTURE_REGISTER_FILE || src1.nr != GEN_ARF_ACCUMULATOR); alu2(this, GEN_OPCODE_MUL, dest, src0, src1); } void GenEncoder::NOP(void) { GenNativeInstruction *insn = this->next(GEN_OPCODE_NOP); this->setDst(insn, GenRegister::retype(GenRegister::f4grf(0,0), GEN_TYPE_UD)); this->setSrc0(insn, GenRegister::retype(GenRegister::f4grf(0,0), GEN_TYPE_UD)); this->setSrc1(insn, GenRegister::immud(0x0)); } void GenEncoder::BARRIER(GenRegister src) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); this->setDst(insn, GenRegister::null()); this->setSrc0(insn, src); this->setSrc1(insn, GenRegister::immud(0)); setMessageDescriptor(insn, GEN_SFID_MESSAGE_GATEWAY, 1, 0); 
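// The barrier goes through the message gateway as a one-register payload with
// no response (msg_length = 1, response_length = 0); the fields set below
// select the barrier sub-function and raise the notify bit.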
insn->bits3.msg_gateway.sub_function_id = GEN_BARRIER_MSG; insn->bits3.msg_gateway.notify = 0x1; } void GenEncoder::FWD_GATEWAY_MSG(GenRegister src, uint32_t notifyN) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); this->setDst(insn, GenRegister::null()); this->setSrc0(insn, src); this->setSrc1(insn, GenRegister::immud(0)); setMessageDescriptor(insn, GEN_SFID_MESSAGE_GATEWAY, 1, 0); insn->bits3.msg_gateway.sub_function_id = GEN_FORWARD_MSG; GBE_ASSERT(notifyN <= 2); insn->bits3.msg_gateway.notify = notifyN; } void GenEncoder::FENCE(GenRegister dst, bool flushRWCache) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); this->setDst(insn, dst); this->setSrc0(insn, dst); this->setSrc1(insn, GenRegister::immud(0)); setMessageDescriptor(insn, GEN_SFID_DATAPORT_DATA, 1, 1, 1); insn->bits3.gen7_memory_fence.msg_type = GEN_MEM_FENCE; insn->bits3.gen7_memory_fence.commit_enable = 0x1; } void GenEncoder::JMPI(GenRegister src, bool longjmp) { alu2(this, GEN_OPCODE_JMPI, GenRegister::ip(), GenRegister::ip(), src); if (longjmp) NOP(); } #define ALU2_BRA(OP) \ void GenEncoder::OP(GenRegister src) { \ alu2(this, GEN_OPCODE_##OP, GenRegister::nullud(), GenRegister::nullud(), src); \ } ALU2_BRA(IF) ALU2_BRA(ELSE) ALU2_BRA(ENDIF) ALU2_BRA(WHILE) ALU2_BRA(BRD) ALU2_BRA(BRC) // jip is the distance between the jump instruction and the jump target; the // pre/post-increment is handled in the body of patchJMPI(). void GenEncoder::patchJMPI(uint32_t insnID, int32_t jip, int32_t uip) { GenNativeInstruction &insn = *(GenNativeInstruction *)&this->store[insnID]; GBE_ASSERT(insnID < this->store.size()); GBE_ASSERT(insn.header.opcode == GEN_OPCODE_JMPI || insn.header.opcode == GEN_OPCODE_BRD || insn.header.opcode == GEN_OPCODE_ENDIF || insn.header.opcode == GEN_OPCODE_IF || insn.header.opcode == GEN_OPCODE_BRC || insn.header.opcode == GEN_OPCODE_WHILE || insn.header.opcode == GEN_OPCODE_ELSE); if( insn.header.opcode == GEN_OPCODE_WHILE ){ // If this WHILE instruction jumps back to an ELSE instruction, we need to // add the distance needed to reach the next instruction. GenNativeInstruction & insn_else = *(GenNativeInstruction *)&this->store[insnID+jip]; if(insn_else.header.opcode == GEN_OPCODE_ELSE){ jip += 2; } } if (insn.header.opcode != GEN_OPCODE_JMPI || (jip > -32769 && jip < 32768)) { if (insn.header.opcode == GEN_OPCODE_IF) { this->setSrc1(&insn, GenRegister::immd((jip & 0xffff) | uip<<16)); return; } else if (insn.header.opcode == GEN_OPCODE_JMPI) { jip = jip - 2; } else if(insn.header.opcode == GEN_OPCODE_ENDIF) jip += 2; this->setSrc1(&insn, GenRegister::immd((jip & 0xffff) | uip<<16)); } else if ( insn.header.predicate_control == GEN_PREDICATE_NONE ) { // For a conditional jump whose distance is outside the S15 range, we use an // inverted jmp followed by an add ip, ip, distance to implement it. This is // a little hacky, as we have to turn the reserved nop instruction into that // add instruction manually. // If this is an unconditional jump, we just add the distance to the IP directly. // FIXME there is a possible optimization where we insert an ADD instruction // on demand instead. But that would need some extra analysis of all the // branching instructions, and the distance would have to be adjusted for // every branch whose start and end points contain this instruction.
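// Illustrative example (hypothetical numbers): for a predicated jump with
// jip = 5000, which does not fit in the S15 immediate, the predicate is
// inverted and given a jump of 2 (one native instruction, i.e. over the add),
// and the reserved NOP at insnID+2 becomes add ip, ip, (5000-2)*8, since IP
// offsets are in bytes and each store slot is 8 bytes wide.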
GBE_ASSERT(((GenNativeInstruction *)&this->store[insnID+2])->header.opcode == GEN_OPCODE_NOP); insn.header.opcode = GEN_OPCODE_ADD; this->setDst(&insn, GenRegister::ip()); this->setSrc0(&insn, GenRegister::ip()); this->setSrc1(&insn, GenRegister::immd(jip * 8)); } else { GenNativeInstruction &insn2 = *(GenNativeInstruction *)&this->store[insnID+2]; insn.header.predicate_inverse ^= 1; this->setSrc1(&insn, GenRegister::immd(2)); GBE_ASSERT(insn2.header.opcode == GEN_OPCODE_NOP); GBE_ASSERT(insnID < this->store.size()); insn2.header.predicate_control = GEN_PREDICATE_NONE; insn2.header.opcode = GEN_OPCODE_ADD; this->setDst(&insn2, GenRegister::ip()); this->setSrc0(&insn2, GenRegister::ip()); this->setSrc1(&insn2, GenRegister::immd((jip - 2) * 8)); } } void GenEncoder::CMP(uint32_t conditional, GenRegister src0, GenRegister src1, GenRegister dst) { if (needToSplitCmp(this, src0, src1, dst) == false) { if(!GenRegister::isNull(dst) && compactAlu2(this, GEN_OPCODE_CMP, dst, src0, src1, conditional, false)) { return; } GenNativeInstruction *insn = this->next(GEN_OPCODE_CMP); this->setHeader(insn); insn->header.destreg_or_condmod = conditional; if (GenRegister::isNull(dst)) insn->header.thread_control = GEN_THREAD_SWITCH; this->setDst(insn, dst); this->setSrc0(insn, src0); this->setSrc1(insn, src1); } else { GenNativeInstruction *insnQ1, *insnQ2; // Instruction for the first quarter insnQ1 = this->next(GEN_OPCODE_CMP); this->setHeader(insnQ1); if (GenRegister::isNull(dst)) insnQ1->header.thread_control = GEN_THREAD_SWITCH; insnQ1->header.quarter_control = GEN_COMPRESSION_Q1; insnQ1->header.execution_size = GEN_WIDTH_8; insnQ1->header.destreg_or_condmod = conditional; this->setDst(insnQ1, dst); this->setSrc0(insnQ1, src0); this->setSrc1(insnQ1, src1); // Instruction for the second quarter insnQ2 = this->next(GEN_OPCODE_CMP); this->setHeader(insnQ2); if (GenRegister::isNull(dst)) insnQ2->header.thread_control = GEN_THREAD_SWITCH; insnQ2->header.quarter_control = GEN_COMPRESSION_Q2; insnQ2->header.execution_size = GEN_WIDTH_8; insnQ2->header.destreg_or_condmod = conditional; this->setDst(insnQ2, GenRegister::Qn(dst, 1)); this->setSrc0(insnQ2, GenRegister::Qn(src0, 1)); this->setSrc1(insnQ2, GenRegister::Qn(src1, 1)); } } void GenEncoder::SEL_CMP(uint32_t conditional, GenRegister dst, GenRegister src0, GenRegister src1) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEL); GBE_ASSERT(curr.predicate == GEN_PREDICATE_NONE); this->setHeader(insn); insn->header.destreg_or_condmod = conditional; this->setDst(insn, dst); this->setSrc0(insn, src0); this->setSrc1(insn, src1); } void GenEncoder::WAIT(uint32_t n) { GenNativeInstruction *insn = this->next(GEN_OPCODE_WAIT); GBE_ASSERT(curr.predicate == GEN_PREDICATE_NONE); GenRegister src = GenRegister::notification0(n); this->setDst(insn, GenRegister::null()); this->setSrc0(insn, src); this->setSrc1(insn, GenRegister::null()); insn->header.execution_size = 0; /* must */ insn->header.predicate_control = 0; insn->header.quarter_control = 0; } void GenEncoder::MATH(GenRegister dst, uint32_t function, GenRegister src0, GenRegister src1) { GenNativeInstruction *insn = this->next(GEN_OPCODE_MATH); assert(dst.file == GEN_GENERAL_REGISTER_FILE); assert(src0.file == GEN_GENERAL_REGISTER_FILE); assert(src1.file == GEN_GENERAL_REGISTER_FILE); assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == GEN_HORIZONTAL_STRIDE_0); if (function == GEN_MATH_FUNCTION_INT_DIV_QUOTIENT || function == GEN_MATH_FUNCTION_INT_DIV_REMAINDER || function == 
GEN_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER) { assert(src0.type != GEN_TYPE_F); assert(src1.type != GEN_TYPE_F); } else { assert(src0.type == GEN_TYPE_F); assert(src1.type == GEN_TYPE_F); } insn->header.destreg_or_condmod = function; this->setHeader(insn); this->setDst(insn, dst); this->setSrc0(insn, src0); this->setSrc1(insn, src1); if (function == GEN_MATH_FUNCTION_INT_DIV_QUOTIENT || function == GEN_MATH_FUNCTION_INT_DIV_REMAINDER) { insn->header.execution_size = this->curr.execWidth == 1 ? GEN_WIDTH_1 : GEN_WIDTH_8; insn->header.quarter_control = GEN_COMPRESSION_Q1; if(this->curr.execWidth == 16) { GenNativeInstruction *insn2 = this->next(GEN_OPCODE_MATH); GenRegister new_dest, new_src0, new_src1; new_dest = GenRegister::QnPhysical(dst, 1); new_src0 = GenRegister::QnPhysical(src0, 1); new_src1 = GenRegister::QnPhysical(src1, 1); insn2->header.destreg_or_condmod = function; this->setHeader(insn2); insn2->header.execution_size = GEN_WIDTH_8; insn2->header.quarter_control = GEN_COMPRESSION_Q2; this->setDst(insn2, new_dest); this->setSrc0(insn2, new_src0); this->setSrc1(insn2, new_src1); } } } void GenEncoder::MATH(GenRegister dst, uint32_t function, GenRegister src) { GenNativeInstruction *insn = this->next(GEN_OPCODE_MATH); assert(dst.file == GEN_GENERAL_REGISTER_FILE); assert(src.file == GEN_GENERAL_REGISTER_FILE); assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == GEN_HORIZONTAL_STRIDE_0); assert(src.type == GEN_TYPE_F); insn->header.destreg_or_condmod = function; this->setHeader(insn); this->setDst(insn, dst); this->setSrc0(insn, src); } void GenEncoder::setSamplerMessage(GenNativeInstruction *insn, unsigned char bti, unsigned char sampler, uint32_t msg_type, uint32_t response_length, uint32_t msg_length, bool header_present, uint32_t simd_mode, uint32_t return_format) { const GenMessageTarget sfid = GEN_SFID_SAMPLER; setMessageDescriptor(insn, sfid, msg_length, response_length); insn->bits3.sampler_gen7.bti = bti; insn->bits3.sampler_gen7.sampler = sampler; insn->bits3.sampler_gen7.msg_type = msg_type; insn->bits3.sampler_gen7.simd_mode = simd_mode; } void GenEncoder::SAMPLE(GenRegister dest, GenRegister msg, unsigned int msg_len, bool header_present, unsigned char bti, unsigned char sampler, uint32_t simdWidth, uint32_t writemask, uint32_t return_format, bool isLD, bool isUniform) { if (writemask == 0) return; uint32_t msg_type = isLD ? GEN_SAMPLER_MESSAGE_SIMD8_LD : GEN_SAMPLER_MESSAGE_SIMD8_SAMPLE; uint32_t response_length = (4 * (simdWidth / 8)); uint32_t msg_length = (msg_len * (simdWidth / 8)); if (header_present) msg_length++; uint32_t simd_mode = (simdWidth == 16) ? 
GEN_SAMPLER_SIMD_MODE_SIMD16 : GEN_SAMPLER_SIMD_MODE_SIMD8; if(isUniform) { response_length = 1; msg_type = GEN_SAMPLER_MESSAGE_SIMD4X2_LD; msg_length = 1; simd_mode = GEN_SAMPLER_SIMD_MODE_SIMD4X2; } GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); this->setDst(insn, dest); this->setSrc0(insn, msg); this->setSrc1(insn, GenRegister::immud(0)); setSamplerMessage(insn, bti, sampler, msg_type, response_length, msg_length, header_present, simd_mode, return_format); } void GenEncoder::FLUSH_SAMPLERCACHE(GenRegister dst) { // only Gen8+ supports flushing the sampler cache assert(0); } void GenEncoder::setVmeMessage(GenNativeInstruction *insn, unsigned char bti, uint32_t response_length, uint32_t msg_length, uint32_t msg_type, unsigned char vme_search_path_lut, unsigned char lut_sub) { const GenMessageTarget sfid = GEN_SFID_VIDEO_MOTION_EST; setMessageDescriptor(insn, sfid, msg_length, response_length, true); insn->bits3.vme_gen7.bti = bti; insn->bits3.vme_gen7.vme_search_path_lut = vme_search_path_lut; insn->bits3.vme_gen7.lut_sub = lut_sub; insn->bits3.vme_gen7.msg_type = msg_type; insn->bits3.vme_gen7.stream_in = 0; insn->bits3.vme_gen7.stream_out = 0; insn->bits3.vme_gen7.reserved_mbz = 0; } void GenEncoder::VME(unsigned char bti, GenRegister dest, GenRegister msg, uint32_t msg_type, uint32_t vme_search_path_lut, uint32_t lut_sub) { /* Currently we support inter search only; other modes will be supported * in the future. */ GBE_ASSERT(msg_type == 1); uint32_t msg_length, response_length; if(msg_type == 1){ msg_length = 5; response_length = 6; } GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); this->setDst(insn, dest); this->setSrc0(insn, msg); this->setSrc1(insn, GenRegister::immud(0)); setVmeMessage(insn, bti, response_length, msg_length, msg_type, vme_search_path_lut, lut_sub); } void GenEncoder::TYPED_WRITE(GenRegister msg, GenRegister data, bool header_present, unsigned char bti, bool useSends) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); uint32_t msg_type = GEN_TYPED_WRITE; uint32_t msg_length = header_present ? 9 : 8; this->setHeader(insn); this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD)); this->setSrc0(insn, msg); this->setSrc1(insn, GenRegister::immud(0)); setTypedWriteMessage(insn, bti, msg_type, msg_length, header_present); } static void setScratchMessage(GenEncoder *p, GenNativeInstruction *insn, uint32_t offset, uint32_t block_size, uint32_t channel_mode, uint32_t msg_type, uint32_t msg_length, uint32_t response_length) { const GenMessageTarget sfid = GEN_SFID_DATAPORT_DATA; p->setMessageDescriptor(insn, sfid, msg_length, response_length, true); insn->bits3.gen7_scratch_rw.block_size = block_size; insn->bits3.gen7_scratch_rw.msg_type = msg_type; insn->bits3.gen7_scratch_rw.channel_mode = channel_mode; insn->bits3.gen7_scratch_rw.offset = offset; insn->bits3.gen7_scratch_rw.category = 1; } void GenEncoder::SCRATCH_WRITE(GenRegister msg, uint32_t offset, uint32_t size, uint32_t src_num, uint32_t channel_mode) { assert(src_num == 1 || src_num == 2); uint32_t block_size = src_num == 1 ?
GEN_SCRATCH_BLOCK_SIZE_1 : GEN_SCRATCH_BLOCK_SIZE_2; GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD)); this->setSrc0(insn, msg); this->setSrc1(insn, GenRegister::immud(0)); // here src_num is the number of registers to be written out, in terms of 32-byte registers setScratchMessage(this, insn, offset, block_size, channel_mode, GEN_SCRATCH_WRITE, src_num+1, 0); } void GenEncoder::SCRATCH_READ(GenRegister dst, GenRegister src, uint32_t offset, uint32_t size, uint32_t dst_num, uint32_t channel_mode) { assert(dst_num == 1 || dst_num == 2); uint32_t block_size = dst_num == 1 ? GEN_SCRATCH_BLOCK_SIZE_1 : GEN_SCRATCH_BLOCK_SIZE_2; GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setHeader(insn); this->setDst(insn, dst); this->setSrc0(insn, src); this->setSrc1(insn, GenRegister::immud(0)); // here dst_num is the number of registers to be written back, in terms of 32-byte registers setScratchMessage(this, insn, offset, block_size, channel_mode, GEN_SCRATCH_READ, 1, dst_num); } void GenEncoder::OBREAD(GenRegister dst, GenRegister header, uint32_t bti, uint32_t ow_size) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); const uint32_t msg_length = 1; uint32_t sizeinreg = ow_size / 2; // a half reg should also have size 1 sizeinreg = sizeinreg == 0 ? 1 : sizeinreg; const uint32_t block_size = getOBlockSize(ow_size, dst.subnr == 0); const uint32_t response_length = sizeinreg; // Size is in reg this->setHeader(insn); this->setDst(insn, GenRegister::uw16grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(header.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); setOBlockRW(insn, bti, block_size, GEN7_UNALIGNED_OBLOCK_READ, msg_length, response_length); } void GenEncoder::OBWRITE(GenRegister header, GenRegister data, uint32_t bti, uint32_t ow_size, bool useSends) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); uint32_t sizeinreg = ow_size / 2; // a half reg should also have size 1 sizeinreg = sizeinreg == 0 ?
1 : sizeinreg; const uint32_t msg_length = 1 + sizeinreg; // Size is in reg and header const uint32_t response_length = 0; const uint32_t block_size = getOBlockSize(ow_size); this->setHeader(insn); this->setSrc0(insn, GenRegister::ud8grf(header.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW)); setOBlockRW(insn, bti, block_size, GEN7_OBLOCK_WRITE, msg_length, response_length); } void GenEncoder::MBREAD(GenRegister dst, GenRegister header, uint32_t bti, uint32_t response_size) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); const uint32_t msg_length = 1; const uint32_t response_length = response_size; // Size of registers this->setHeader(insn); this->setDst(insn, GenRegister::ud8grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(header.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); setMBlockRW(insn, bti, GEN75_P1_MEDIA_BREAD, msg_length, response_length); } void GenEncoder::MBWRITE(GenRegister header, GenRegister data, uint32_t bti, uint32_t data_size, bool useSends) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); const uint32_t msg_length = 1 + data_size; const uint32_t response_length = 0; // Size of registers this->setHeader(insn); this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW)); this->setSrc0(insn, GenRegister::ud8grf(header.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); setMBlockRW(insn, bti, GEN75_P1_MEDIA_TYPED_BWRITE, msg_length, response_length); } void GenEncoder::OBREADA64(GenRegister dst, GenRegister header, uint32_t bti, uint32_t elemSize) { NOT_SUPPORTED; } void GenEncoder::OBWRITEA64(GenRegister header, uint32_t bti, uint32_t elemSize) { NOT_SUPPORTED; } void GenEncoder::EOT(uint32_t msg) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UD)); this->setSrc0(insn, GenRegister::ud8grf(msg,0)); this->setSrc1(insn, GenRegister::immud(0)); insn->header.execution_size = GEN_WIDTH_8; insn->bits3.spawner_gen5.resource = GEN_DO_NOT_DEREFERENCE_URB; insn->bits3.spawner_gen5.msg_length = 1; insn->bits3.spawner_gen5.end_of_thread = 1; insn->header.destreg_or_condmod = GEN_SFID_THREAD_SPAWNER; } } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/backend/gen7_instruction.hpp000664 001750 001750 00000037250 13173554000 023244 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Rong Yang */ /* Copyright (C) Intel Corp. 2006. All Rights Reserved. Intel funded Tungsten Graphics (http://www.tungstengraphics.com) to develop this 3D driver.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. **********************************************************************/ /* * Authors: * Keith Whitwell */ #ifndef __GEN7_INSTRUCTION_HPP__ #define __GEN7_INSTRUCTION_HPP__ union Gen7NativeInstruction { struct { struct { uint32_t opcode:7; uint32_t pad:1; uint32_t access_mode:1; uint32_t mask_control:1; uint32_t dependency_control:2; uint32_t quarter_control:2; uint32_t thread_control:2; uint32_t predicate_control:4; uint32_t predicate_inverse:1; uint32_t execution_size:3; uint32_t destreg_or_condmod:4; uint32_t acc_wr_control:1; uint32_t cmpt_control:1; uint32_t debug_control:1; uint32_t saturate:1; } header; union { struct { uint32_t dest_reg_file:2; uint32_t dest_reg_type:3; uint32_t src0_reg_file:2; uint32_t src0_reg_type:3; uint32_t src1_reg_file:2; uint32_t src1_reg_type:3; uint32_t nib_ctrl:1; uint32_t dest_subreg_nr:5; uint32_t dest_reg_nr:8; uint32_t dest_horiz_stride:2; uint32_t dest_address_mode:1; } da1; struct { uint32_t dest_reg_file:2; uint32_t dest_reg_type:3; uint32_t src0_reg_file:2; uint32_t src0_reg_type:3; uint32_t src1_reg_file:2; /* 0x00000c00 */ uint32_t src1_reg_type:3; /* 0x00007000 */ uint32_t nib_ctrl:1; int dest_indirect_offset:10; /* offset against the deref'd address reg */ uint32_t dest_subreg_nr:3; /* subnr for the address reg a0.x */ uint32_t dest_horiz_stride:2; uint32_t dest_address_mode:1; } ia1; struct { uint32_t dest_reg_file:2; uint32_t dest_reg_type:3; uint32_t src0_reg_file:2; uint32_t src0_reg_type:3; uint32_t src1_reg_file:2; uint32_t src1_reg_type:3; uint32_t nib_ctrl:1; uint32_t dest_writemask:4; uint32_t dest_subreg_nr:1; uint32_t dest_reg_nr:8; uint32_t dest_horiz_stride:2; uint32_t dest_address_mode:1; } da16; struct { uint32_t dest_reg_file:2; uint32_t dest_reg_type:3; uint32_t src0_reg_file:2; uint32_t src0_reg_type:3; uint32_t nib_ctrl:1; uint32_t dest_writemask:4; int dest_indirect_offset:6; uint32_t dest_subreg_nr:3; uint32_t dest_horiz_stride:2; uint32_t dest_address_mode:1; } ia16; struct { uint32_t dest_reg_file:2; uint32_t dest_reg_type:3; uint32_t src0_reg_file:2; uint32_t src0_reg_type:3; uint32_t src1_reg_file:2; uint32_t src1_reg_type:3; uint32_t pad:1; int jump_count:16; } branch_gen6; struct { uint32_t dest_reg_file:1; uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t pad0:1; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t src2_abs:1; uint32_t src2_negate:1; uint32_t pad1:7; uint32_t dest_writemask:4; uint32_t 
dest_subreg_nr:3; uint32_t dest_reg_nr:8; } da3src; } bits1; union { struct { uint32_t src0_subreg_nr:5; uint32_t src0_reg_nr:8; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src0_address_mode:1; uint32_t src0_horiz_stride:2; uint32_t src0_width:3; uint32_t src0_vert_stride:4; uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t pad:5; } da1; struct { int src0_indirect_offset:10; uint32_t src0_subreg_nr:3; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src0_address_mode:1; uint32_t src0_horiz_stride:2; uint32_t src0_width:3; uint32_t src0_vert_stride:4; uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t pad:5; } ia1; struct { uint32_t src0_swz_x:2; uint32_t src0_swz_y:2; uint32_t src0_subreg_nr:1; uint32_t src0_reg_nr:8; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src0_address_mode:1; uint32_t src0_swz_z:2; uint32_t src0_swz_w:2; uint32_t pad0:1; uint32_t src0_vert_stride:4; uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t pad:5; } da16; struct { uint32_t src0_swz_x:2; uint32_t src0_swz_y:2; int src0_indirect_offset:6; uint32_t src0_subreg_nr:3; uint32_t src0_abs:1; uint32_t src0_negate:1; uint32_t src0_address_mode:1; uint32_t src0_swz_z:2; uint32_t src0_swz_w:2; uint32_t pad0:1; uint32_t src0_vert_stride:4; uint32_t flag_sub_reg_nr:1; uint32_t flag_reg_nr:1; uint32_t pad:5; } ia16; struct { uint32_t src0_rep_ctrl:1; uint32_t src0_swizzle:8; uint32_t src0_subreg_nr:3; uint32_t src0_reg_nr:8; uint32_t pad0:1; uint32_t src1_rep_ctrl:1; uint32_t src1_swizzle:8; uint32_t src1_subreg_nr_low:2; } da3src; } bits2; union { struct { uint32_t src1_subreg_nr:5; uint32_t src1_reg_nr:8; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t src1_address_mode:1; uint32_t src1_horiz_stride:2; uint32_t src1_width:3; uint32_t src1_vert_stride:4; uint32_t pad0:7; } da1; struct { uint32_t src1_swz_x:2; uint32_t src1_swz_y:2; uint32_t src1_subreg_nr:1; uint32_t src1_reg_nr:8; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t src1_address_mode:1; uint32_t src1_swz_z:2; uint32_t src1_swz_w:2; uint32_t pad1:1; uint32_t src1_vert_stride:4; uint32_t pad2:7; } da16; struct { int src1_indirect_offset:10; uint32_t src1_subreg_nr:3; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t src1_address_mode:1; uint32_t src1_horiz_stride:2; uint32_t src1_width:3; uint32_t src1_vert_stride:4; uint32_t pad1:7; } ia1; struct { uint32_t src1_swz_x:2; uint32_t src1_swz_y:2; int src1_indirect_offset:6; uint32_t src1_subreg_nr:3; uint32_t src1_abs:1; uint32_t src1_negate:1; uint32_t pad0:1; uint32_t src1_swz_z:2; uint32_t src1_swz_w:2; uint32_t pad1:1; uint32_t src1_vert_stride:4; uint32_t pad2:7; } ia16; struct { uint32_t function_control:19; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } generic_gen5; struct { uint32_t sub_function_id:3; uint32_t pad0:11; uint32_t ack_req:1; uint32_t notify:2; uint32_t pad1:2; uint32_t header:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } msg_gateway; struct { uint32_t opcode:1; uint32_t request:1; uint32_t pad0:2; uint32_t resource:1; uint32_t pad1:14; uint32_t header:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } spawner_gen5; /** Ironlake PRM, Volume 4 Part 1, Section 6.1.1.1 */ struct { uint32_t function:4; uint32_t int_type:1; uint32_t precision:1; uint32_t saturate:1; uint32_t data_type:1; uint32_t snapshot:1; uint32_t pad0:10; uint32_t header_present:1; 
uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } math_gen5; struct { uint32_t bti:8; uint32_t sampler:4; uint32_t msg_type:5; uint32_t simd_mode:2; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } sampler_gen7; struct { uint32_t bti:8; uint32_t vme_search_path_lut:3; uint32_t lut_sub:2; uint32_t msg_type:2; uint32_t stream_in:1; uint32_t stream_out:1; uint32_t reserved_mbz:2; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } vme_gen7; /** * Message for the Sandybridge Sampler Cache or Constant Cache Data Port. * * See the Sandybridge PRM, Volume 4 Part 1, Section 3.9.2.1.1. **/ struct { uint32_t bti:8; uint32_t msg_control:5; uint32_t msg_type:3; uint32_t pad0:3; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad1:2; uint32_t end_of_thread:1; } gen6_dp_sampler_const_cache; /*! Data port untyped read / write messages */ struct { uint32_t bti:8; uint32_t rgba:4; uint32_t simd_mode:2; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_untyped_rw; /*! Data port byte scatter / gather */ struct { uint32_t bti:8; uint32_t simd_mode:1; uint32_t ignored0:1; uint32_t data_size:2; uint32_t ignored1:2; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_byte_rw; /*! Data port Scratch Read/ write */ struct { uint32_t offset:12; uint32_t block_size:2; uint32_t ignored0:1; uint32_t invalidate_after_read:1; uint32_t channel_mode:1; uint32_t msg_type:1; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_scratch_rw; /*! Data port OBlock read / write */ struct { uint32_t bti:8; uint32_t block_size:3; uint32_t ignored:2; uint32_t invalidate_after_read:1; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_oblock_rw; /*! Data port dword scatter / gather */ struct { uint32_t bti:8; uint32_t block_size:2; uint32_t ignored0:3; uint32_t invalidate_after_read:1; uint32_t msg_type:4; uint32_t ignored1:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_dword_rw; /*! Data port typed read / write messages */ struct { uint32_t bti:8; uint32_t chan_mask:4; uint32_t slot:2; uint32_t msg_type:4; uint32_t pad2:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_typed_rw; /*! Memory fence */ struct { uint32_t bti:8; uint32_t pad:5; uint32_t commit_enable:1; uint32_t msg_type:4; uint32_t pad2:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_memory_fence; /*! 
atomic messages */ struct { uint32_t bti:8; uint32_t aop_type:4; uint32_t simd_mode:1; uint32_t return_data:1; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_atomic_op; struct { uint32_t src1_subreg_nr_high:1; uint32_t src1_reg_nr:8; uint32_t pad0:1; uint32_t src2_rep_ctrl:1; uint32_t src2_swizzle:8; uint32_t src2_subreg_nr:3; uint32_t src2_reg_nr:8; uint32_t pad1:2; } da3src; /*! Message gateway */ struct { uint32_t subfunc:3; uint32_t pad:11; uint32_t ackreq:1; uint32_t notify:2; uint32_t pad2:2; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad3:2; uint32_t end_of_thread:1; } gen7_msg_gw; struct { uint32_t jip:16; uint32_t uip:16; } gen7_branch; /*! Data port Media block read / write */ struct { uint32_t bti:8; uint32_t ver_line_stride_offset:1; uint32_t ver_line_stride:1; uint32_t ver_line_stride_override:1; uint32_t ignored:3; uint32_t msg_type:4; uint32_t category:1; uint32_t header_present:1; uint32_t response_length:5; uint32_t msg_length:4; uint32_t pad2:2; uint32_t end_of_thread:1; } gen7_mblock_rw; int d; uint32_t ud; float f; } bits3; }; }; #endif Beignet-1.3.2-Source/backend/src/backend/gen_insn_selection.hxx000664 001750 001750 00000012166 13161142102 023620 0ustar00yryr000000 000000 DECL_SELECTION_IR(LABEL, LabelInstruction) DECL_SELECTION_IR(MOV, UnaryInstruction) DECL_SELECTION_IR(BSWAP, UnaryWithTempInstruction) DECL_SELECTION_IR(LOAD_INT64_IMM, UnaryInstruction) DECL_SELECTION_IR(NOT, UnaryInstruction) DECL_SELECTION_IR(LZD, UnaryInstruction) DECL_SELECTION_IR(RNDZ, UnaryInstruction) DECL_SELECTION_IR(RNDE, UnaryInstruction) DECL_SELECTION_IR(RNDD, UnaryInstruction) DECL_SELECTION_IR(RNDU, UnaryInstruction) DECL_SELECTION_IR(FRC, UnaryInstruction) DECL_SELECTION_IR(F16TO32, UnaryInstruction) DECL_SELECTION_IR(F32TO16, UnaryInstruction) DECL_SELECTION_IR(SEL, BinaryInstruction) DECL_SELECTION_IR(SEL_INT64, BinaryInstruction) DECL_SELECTION_IR(AND, BinaryInstruction) DECL_SELECTION_IR(OR, BinaryInstruction) DECL_SELECTION_IR(XOR, BinaryInstruction) DECL_SELECTION_IR(I64AND, BinaryInstruction) DECL_SELECTION_IR(I64OR, BinaryInstruction) DECL_SELECTION_IR(I64XOR, BinaryInstruction) DECL_SELECTION_IR(SHR, BinaryInstruction) DECL_SELECTION_IR(SHL, BinaryInstruction) DECL_SELECTION_IR(RSR, BinaryInstruction) DECL_SELECTION_IR(RSL, BinaryInstruction) DECL_SELECTION_IR(ASR, BinaryInstruction) DECL_SELECTION_IR(SIMD_SHUFFLE, SimdShuffleInstruction) DECL_SELECTION_IR(I64SHR, I64ShiftInstruction) DECL_SELECTION_IR(I64SHL, I64ShiftInstruction) DECL_SELECTION_IR(I64ASR, I64ShiftInstruction) DECL_SELECTION_IR(ADD, BinaryInstruction) DECL_SELECTION_IR(I64ADD, BinaryWithTempInstruction) DECL_SELECTION_IR(I64SATADD, I64SATADDInstruction) DECL_SELECTION_IR(I64SUB, BinaryWithTempInstruction) DECL_SELECTION_IR(I64SATSUB, I64SATSUBInstruction) DECL_SELECTION_IR(MUL, BinaryInstruction) DECL_SELECTION_IR(I64MUL, I64MULInstruction) DECL_SELECTION_IR(I64DIV, I64DIVREMInstruction) DECL_SELECTION_IR(I64REM, I64DIVREMInstruction) DECL_SELECTION_IR(ATOMIC, AtomicInstruction) DECL_SELECTION_IR(ATOMICA64, AtomicA64Instruction) DECL_SELECTION_IR(MACH, BinaryInstruction) DECL_SELECTION_IR(CMP, CompareInstruction) DECL_SELECTION_IR(I64CMP, I64CompareInstruction) DECL_SELECTION_IR(SEL_CMP, CompareInstruction) DECL_SELECTION_IR(MAD, TernaryInstruction) DECL_SELECTION_IR(LRP, TernaryInstruction) DECL_SELECTION_IR(JMPI, JumpInstruction) 
DECL_SELECTION_IR(EOT, EotInstruction) DECL_SELECTION_IR(INDIRECT_MOVE, IndirectMoveInstruction) DECL_SELECTION_IR(NOP, NoOpInstruction) DECL_SELECTION_IR(WAIT, WaitInstruction) DECL_SELECTION_IR(MATH, MathInstruction) DECL_SELECTION_IR(BARRIER, BarrierInstruction) DECL_SELECTION_IR(FENCE, FenceInstruction) DECL_SELECTION_IR(UNTYPED_READ, UntypedReadInstruction) DECL_SELECTION_IR(UNTYPED_WRITE, UntypedWriteInstruction) DECL_SELECTION_IR(UNTYPED_READA64, UntypedReadA64Instruction) DECL_SELECTION_IR(UNTYPED_WRITEA64, UntypedWriteA64Instruction) DECL_SELECTION_IR(READ64, Read64Instruction) DECL_SELECTION_IR(WRITE64, Write64Instruction) DECL_SELECTION_IR(READ64A64, Read64A64Instruction) DECL_SELECTION_IR(WRITE64A64, Write64A64Instruction) DECL_SELECTION_IR(BYTE_GATHER, ByteGatherInstruction) DECL_SELECTION_IR(BYTE_SCATTER, ByteScatterInstruction) DECL_SELECTION_IR(BYTE_GATHERA64, ByteGatherA64Instruction) DECL_SELECTION_IR(BYTE_SCATTERA64, ByteScatterA64Instruction) DECL_SELECTION_IR(DWORD_GATHER, DWordGatherInstruction) DECL_SELECTION_IR(PACK_BYTE, PackByteInstruction) DECL_SELECTION_IR(UNPACK_BYTE, UnpackByteInstruction) DECL_SELECTION_IR(PACK_LONG, PackLongInstruction) DECL_SELECTION_IR(UNPACK_LONG, UnpackLongInstruction) DECL_SELECTION_IR(SAMPLE, SampleInstruction) DECL_SELECTION_IR(VME, VmeInstruction) DECL_SELECTION_IR(TYPED_WRITE, TypedWriteInstruction) DECL_SELECTION_IR(SPILL_REG, SpillRegInstruction) DECL_SELECTION_IR(UNSPILL_REG, UnSpillRegInstruction) DECL_SELECTION_IR(MUL_HI, BinaryWithTempInstruction) DECL_SELECTION_IR(I64_MUL_HI, I64MULHIInstruction) DECL_SELECTION_IR(FBH, UnaryInstruction) DECL_SELECTION_IR(FBL, UnaryInstruction) DECL_SELECTION_IR(CBIT, UnaryInstruction) DECL_SELECTION_IR(HADD, BinaryWithTempInstruction) DECL_SELECTION_IR(RHADD, BinaryWithTempInstruction) DECL_SELECTION_IR(I64HADD, I64HADDInstruction) DECL_SELECTION_IR(I64RHADD, I64RHADDInstruction) DECL_SELECTION_IR(UPSAMPLE_LONG, BinaryInstruction) DECL_SELECTION_IR(CONVI_TO_I64, UnaryWithTempInstruction) DECL_SELECTION_IR(CONVI64_TO_I, UnaryInstruction) DECL_SELECTION_IR(CONVI64_TO_F, I64ToFloatInstruction) DECL_SELECTION_IR(CONVF_TO_I64, FloatToI64Instruction) DECL_SELECTION_IR(I64MADSAT, I64MADSATInstruction) DECL_SELECTION_IR(BRC, UnaryInstruction) DECL_SELECTION_IR(BRD, UnaryInstruction) DECL_SELECTION_IR(IF, UnaryInstruction) DECL_SELECTION_IR(ENDIF, UnaryInstruction) DECL_SELECTION_IR(ELSE, UnaryInstruction) DECL_SELECTION_IR(READ_ARF, UnaryInstruction) DECL_SELECTION_IR(WHILE, UnaryInstruction) DECL_SELECTION_IR(F64DIV, F64DIVInstruction) DECL_SELECTION_IR(CALC_TIMESTAMP, CalcTimestampInstruction) DECL_SELECTION_IR(STORE_PROFILING, StoreProfilingInstruction) DECL_SELECTION_IR(WORKGROUP_OP, WorkGroupOpInstruction) DECL_SELECTION_IR(SUBGROUP_OP, SubGroupOpInstruction) DECL_SELECTION_IR(PRINTF, PrintfInstruction) DECL_SELECTION_IR(OBREAD, OBReadInstruction) DECL_SELECTION_IR(OBWRITE, OBWriteInstruction) DECL_SELECTION_IR(MBREAD, MBReadInstruction) DECL_SELECTION_IR(MBWRITE, MBWriteInstruction) DECL_SELECTION_IR(BFREV, UnaryInstruction) Beignet-1.3.2-Source/backend/src/backend/program.cpp000664 001750 001750 00000160405 13173554000 021404 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. 
* * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ /** * \file callback interface for the compiler * \author Benjamin Segovia */ #include "program.h" #include "program.hpp" #include "gen_program.h" #include "sys/platform.hpp" #include "sys/cvar.hpp" #include "ir/liveness.hpp" #include "ir/value.hpp" #include "ir/unit.hpp" #include "ir/printf.hpp" #include "src/cl_device_data.h" #ifdef GBE_COMPILER_AVAILABLE #include "llvm/llvm_to_gen.hpp" #include "llvm/Config/llvm-config.h" #include "llvm/Support/Threading.h" #include "llvm/Support/ManagedStatic.h" #include "llvm/Transforms/Utils/Cloning.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IRReader/IRReader.h" #endif #include #include #include #include #include #include #include #include #ifdef GBE_COMPILER_AVAILABLE #include #include #include #include #include #include #include #include #include #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 #include #include #else #include #endif #include #endif #include "src/GBEConfig.h" namespace gbe { Kernel::Kernel(const std::string &name) : name(name), args(NULL), argNum(0), curbeSize(0), stackSize(0), useSLM(false), slmSize(0), ctx(NULL), samplerSet(NULL), imageSet(NULL), printfSet(NULL), profilingInfo(NULL), useDeviceEnqueue(false) {} Kernel::~Kernel(void) { if(ctx) GBE_DELETE(ctx); if(samplerSet) GBE_DELETE(samplerSet); if(imageSet) GBE_DELETE(imageSet); if(printfSet) GBE_DELETE(printfSet); if(profilingInfo) GBE_DELETE(profilingInfo); GBE_SAFE_DELETE_ARRAY(args); } int32_t Kernel::getCurbeOffset(gbe_curbe_type type, uint32_t subType) const { const PatchInfo patch(type, subType); const auto it = std::lower_bound(patches.begin(), patches.end(), patch); if (it == patches.end()) return -1; // nothing found
if (patch < *it) return -1; // they are not equal
return it->offset; // we found it!
} Program::Program(uint32_t fast_relaxed_math) : fast_relaxed_math(fast_relaxed_math), constantSet(NULL), relocTable(NULL) {} Program::~Program(void) { for (map<std::string, Kernel*>::iterator it = kernels.begin(); it != kernels.end(); ++it) GBE_DELETE(it->second); if (constantSet) delete constantSet; if (relocTable) delete relocTable; } #ifdef GBE_COMPILER_AVAILABLE BVAR(OCL_OUTPUT_GEN_IR, false); BVAR(OCL_STRICT_CONFORMANCE, true); IVAR(OCL_PROFILING_LOG, 0, 0, 1); // Int for the different profiling types.
BVAR(OCL_OUTPUT_BUILD_LOG, false); bool Program::buildFromLLVMModule(const void* module, std::string &error, int optLevel) { ir::Unit *unit = new ir::Unit(); bool ret = false; bool strictMath = true; if (fast_relaxed_math || !OCL_STRICT_CONFORMANCE) strictMath = false; if (llvmToGen(*unit, module, optLevel, strictMath, OCL_PROFILING_LOG, error) == false) { delete unit; return false; }
// If the unit is not valid, the backend may not support something that one of the passes introduced; use optLevel 0 to try again.
if(!unit->getValid()) { delete unit; // clear unit
unit = new ir::Unit();
// Assume the input is well formed, so llvmToGen will not return false this time.
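/* Retry sketch: the call below re-runs the IR lowering with optLevel 0, i.e. with
 * optimizations off. If even that yields an invalid unit, buildFromUnit() is
 * skipped below and the build fails with whatever 'error' llvmToGen reported. */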
llvmToGen(*unit, module, 0, strictMath, OCL_PROFILING_LOG, error); } if(unit->getValid()){ std::string error2; if (this->buildFromUnit(*unit, error2)){ ret = true; } error = error + error2; } delete unit; return ret; } bool Program::buildFromUnit(const ir::Unit &unit, std::string &error) { constantSet = new ir::ConstantSet(unit.getConstantSet()); relocTable = new ir::RelocTable(unit.getRelocTable()); blockFuncs = unit.blockFuncs; const auto &set = unit.getFunctionSet(); const uint32_t kernelNum = set.size(); if (OCL_OUTPUT_GEN_IR) std::cout << unit; if (kernelNum == 0) return true; bool strictMath = true; if (fast_relaxed_math || !OCL_STRICT_CONFORMANCE) strictMath = false; for (const auto &pair : set) { const std::string &name = pair.first; Kernel *kernel = this->compileKernel(unit, name, !strictMath, OCL_PROFILING_LOG); if (!kernel) { error += name; error += ":(GBE): error: failed in Gen backend.\n"; if (OCL_OUTPUT_BUILD_LOG) llvm::errs() << error; return false; } kernel->setSamplerSet(pair.second->getSamplerSet()); kernel->setProfilingInfo(new ir::ProfilingInfo(*unit.getProfilingInfo())); kernel->setImageSet(pair.second->getImageSet()); kernel->setPrintfSet(pair.second->getPrintfSet()); kernel->setCompileWorkGroupSize(pair.second->getCompileWorkGroupSize()); kernel->setFunctionAttributes(pair.second->getFunctionAttributes()); kernels.insert(std::make_pair(name, kernel)); } return true; } #endif #define OUT_UPDATE_SZ(elt) SERIALIZE_OUT(elt, outs, ret_size) #define IN_UPDATE_SZ(elt) DESERIALIZE_IN(elt, ins, total_size) uint32_t Program::serializeToBin(std::ostream& outs) { uint32_t ret_size = 0; uint32_t ker_num = kernels.size(); uint32_t has_constset = 0; uint32_t has_relocTable = 0; OUT_UPDATE_SZ(magic_begin); if (constantSet) { has_constset = 1; OUT_UPDATE_SZ(has_constset); uint32_t sz = constantSet->serializeToBin(outs); if (!sz) return 0; ret_size += sz; } else { OUT_UPDATE_SZ(has_constset); } if(relocTable) { has_relocTable = 1; OUT_UPDATE_SZ(has_relocTable); uint32_t sz = relocTable->serializeToBin(outs); if (!sz) return 0; ret_size += sz; } else { OUT_UPDATE_SZ(has_relocTable); } OUT_UPDATE_SZ(ker_num); for (map<std::string, Kernel*>::iterator it = kernels.begin(); it != kernels.end(); ++it) { uint32_t sz = it->second->serializeToBin(outs); if (!sz) return 0; ret_size += sz; } OUT_UPDATE_SZ(magic_end); OUT_UPDATE_SZ(ret_size); return ret_size; } uint32_t Program::deserializeFromBin(std::istream& ins) { uint32_t total_size = 0; int has_constset = 0; uint32_t ker_num; uint32_t magic; uint32_t has_relocTable = 0; IN_UPDATE_SZ(magic); if (magic != magic_begin) return 0; IN_UPDATE_SZ(has_constset); if(has_constset) { constantSet = new ir::ConstantSet; uint32_t sz = constantSet->deserializeFromBin(ins); if (sz == 0) return 0; total_size += sz; } IN_UPDATE_SZ(has_relocTable); if(has_relocTable) { relocTable = new ir::RelocTable; uint32_t sz = relocTable->deserializeFromBin(ins); if (sz == 0) return 0; total_size += sz; } IN_UPDATE_SZ(ker_num); for (uint32_t i = 0; i < ker_num; i++) { uint32_t ker_serial_sz; std::string ker_name; // Just an empty name here; the real name is read back while deserializing the kernel.
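/* For orientation, the container layout read here mirrors what serializeToBin()
 * above emits (field names as used in the code):
 *   magic_begin | has_constset [ConstantSet blob] | has_relocTable [RelocTable blob]
 *   | ker_num | ker_num serialized kernels | magic_end | ret_size
 * where ret_size is the payload byte count, cross-checked at the end of this function. */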
Kernel* ker = allocateKernel(ker_name); if(!(ker_serial_sz = ker->deserializeFromBin(ins))) return 0; kernels.insert(std::make_pair(ker->getName(), ker)); total_size += ker_serial_sz; } IN_UPDATE_SZ(magic); if (magic != magic_end) return 0; uint32_t total_bytes; IN_UPDATE_SZ(total_bytes); if (total_bytes + sizeof(total_size) != total_size) return 0; return total_size; } uint32_t Kernel::serializeToBin(std::ostream& outs) { unsigned int i; uint32_t ret_size = 0; int has_samplerset = 0; int has_imageset = 0; uint32_t sz = 0; OUT_UPDATE_SZ(magic_begin); sz = name.size(); OUT_UPDATE_SZ(sz); outs.write(name.c_str(), name.size()); ret_size += sizeof(char)*name.size(); OUT_UPDATE_SZ(oclVersion); OUT_UPDATE_SZ(argNum); for (i = 0; i < argNum; i++) { KernelArgument& arg = args[i]; OUT_UPDATE_SZ(arg.type); OUT_UPDATE_SZ(arg.size); OUT_UPDATE_SZ(arg.align); OUT_UPDATE_SZ(arg.bti); OUT_UPDATE_SZ(arg.info.addrSpace); sz = arg.info.typeName.size(); OUT_UPDATE_SZ(sz); outs.write(arg.info.typeName.c_str(), arg.info.typeName.size()); ret_size += sizeof(char)*arg.info.typeName.size(); sz = arg.info.accessQual.size(); OUT_UPDATE_SZ(sz); outs.write(arg.info.accessQual.c_str(), arg.info.accessQual.size()); ret_size += sizeof(char)*arg.info.accessQual.size(); sz = arg.info.typeQual.size(); OUT_UPDATE_SZ(sz); outs.write(arg.info.typeQual.c_str(), arg.info.typeQual.size()); ret_size += sizeof(char)*arg.info.typeQual.size(); sz = arg.info.argName.size(); OUT_UPDATE_SZ(sz); outs.write(arg.info.argName.c_str(), arg.info.argName.size()); ret_size += sizeof(char)*arg.info.argName.size(); } sz = patches.size(); OUT_UPDATE_SZ(sz); for (uint32_t i = 0; i < patches.size(); ++i) { const PatchInfo& patch = patches[i]; unsigned int tmp; tmp = patch.type; OUT_UPDATE_SZ(tmp); tmp = patch.subType; OUT_UPDATE_SZ(tmp); tmp = patch.offset; OUT_UPDATE_SZ(tmp); } OUT_UPDATE_SZ(curbeSize); OUT_UPDATE_SZ(simdWidth); OUT_UPDATE_SZ(stackSize); OUT_UPDATE_SZ(scratchSize); OUT_UPDATE_SZ(useSLM); OUT_UPDATE_SZ(slmSize); OUT_UPDATE_SZ(compileWgSize[0]); OUT_UPDATE_SZ(compileWgSize[1]); OUT_UPDATE_SZ(compileWgSize[2]); /* samplers. */ if (!samplerSet->empty()) { //samplerSet is always valid, allocated in Function::Function has_samplerset = 1; OUT_UPDATE_SZ(has_samplerset); uint32_t sz = samplerSet->serializeToBin(outs); if (!sz) return 0; ret_size += sz; } else { OUT_UPDATE_SZ(has_samplerset); } /* images. */ if (!imageSet->empty()) { //imageSet is always valid, allocated in Function::Function has_imageset = 1; OUT_UPDATE_SZ(has_imageset); uint32_t sz = imageSet->serializeToBin(outs); if (!sz) return 0; ret_size += sz; } else { OUT_UPDATE_SZ(has_imageset); } /* Code. 
*/ const char * code = getCode(); OUT_UPDATE_SZ(getCodeSize()); outs.write(code, getCodeSize()*sizeof(char)); ret_size += getCodeSize()*sizeof(char); OUT_UPDATE_SZ(magic_end); OUT_UPDATE_SZ(ret_size); return ret_size; } uint32_t Kernel::deserializeFromBin(std::istream& ins) { uint32_t total_size = 0; int has_samplerset = 0; int has_imageset = 0; uint32_t code_size = 0; uint32_t magic = 0; uint32_t patch_num = 0; IN_UPDATE_SZ(magic); if (magic != magic_begin) return 0; uint32_t name_len; IN_UPDATE_SZ(name_len); char* c_name = new char[name_len+1]; ins.read(c_name, name_len*sizeof(char)); total_size += sizeof(char)*name_len; c_name[name_len] = 0; name = c_name; delete[] c_name; IN_UPDATE_SZ(oclVersion); IN_UPDATE_SZ(argNum); args = GBE_NEW_ARRAY_NO_ARG(KernelArgument, argNum); for (uint32_t i = 0; i < argNum; i++) { KernelArgument& arg = args[i]; IN_UPDATE_SZ(arg.type); IN_UPDATE_SZ(arg.size); IN_UPDATE_SZ(arg.align); IN_UPDATE_SZ(arg.bti); IN_UPDATE_SZ(arg.info.addrSpace);
// typeName, accessQual, typeQual and argName are stored as length-prefixed strings.
uint32_t len; char* a_name = NULL; IN_UPDATE_SZ(len); a_name = new char[len+1]; ins.read(a_name, len*sizeof(char)); total_size += sizeof(char)*len; a_name[len] = 0; arg.info.typeName = a_name; delete[] a_name; IN_UPDATE_SZ(len); a_name = new char[len+1]; ins.read(a_name, len*sizeof(char)); total_size += sizeof(char)*len; a_name[len] = 0; arg.info.accessQual = a_name; delete[] a_name; IN_UPDATE_SZ(len); a_name = new char[len+1]; ins.read(a_name, len*sizeof(char)); total_size += sizeof(char)*len; a_name[len] = 0; arg.info.typeQual = a_name; delete[] a_name; IN_UPDATE_SZ(len); a_name = new char[len+1]; ins.read(a_name, len*sizeof(char)); total_size += sizeof(char)*len; a_name[len] = 0; arg.info.argName = a_name; delete[] a_name; } IN_UPDATE_SZ(patch_num); for (uint32_t i = 0; i < patch_num; i++) { unsigned int tmp; PatchInfo patch; IN_UPDATE_SZ(tmp); patch.type = tmp; IN_UPDATE_SZ(tmp); patch.subType = tmp; IN_UPDATE_SZ(tmp); patch.offset = tmp; patches.push_back(patch); } IN_UPDATE_SZ(curbeSize); IN_UPDATE_SZ(simdWidth); IN_UPDATE_SZ(stackSize); IN_UPDATE_SZ(scratchSize); IN_UPDATE_SZ(useSLM); IN_UPDATE_SZ(slmSize); IN_UPDATE_SZ(compileWgSize[0]); IN_UPDATE_SZ(compileWgSize[1]); IN_UPDATE_SZ(compileWgSize[2]); IN_UPDATE_SZ(has_samplerset); if (has_samplerset) { samplerSet = GBE_NEW(ir::SamplerSet); uint32_t sz = samplerSet->deserializeFromBin(ins); if (sz == 0) { return 0; } total_size += sz; } else samplerSet = NULL; IN_UPDATE_SZ(has_imageset); if (has_imageset) { imageSet = GBE_NEW(ir::ImageSet); uint32_t sz = imageSet->deserializeFromBin(ins); if (sz == 0) { return 0; } total_size += sz; } else imageSet = NULL; IN_UPDATE_SZ(code_size); if (code_size) { char* code = GBE_NEW_ARRAY_NO_ARG(char, code_size); ins.read(code, code_size*sizeof(char)); total_size += sizeof(char)*code_size; setCode(code, code_size); } IN_UPDATE_SZ(magic); if (magic != magic_end) return 0; uint32_t total_bytes; IN_UPDATE_SZ(total_bytes); if (total_bytes + sizeof(total_size) != total_size) return 0; return total_size; } #undef OUT_UPDATE_SZ #undef IN_UPDATE_SZ void Program::printStatus(int indent, std::ostream& outs) { using namespace std; string spaces = indent_to_str(indent); outs << spaces << "=============== Begin Program ===============" << "\n"; if (constantSet) { constantSet->printStatus(indent + 4, outs); } for (map<std::string, Kernel*>::iterator it = kernels.begin(); it != kernels.end(); ++it) { it->second->printStatus(indent + 4, outs); } outs << spaces << "================ End Program ================" << "\n"; } void Kernel::printStatus(int indent,
std::ostream& outs) { using namespace std; string spaces = indent_to_str(indent); string spaces_nl = indent_to_str(indent + 4); int num; outs << spaces << "+++++++++++ Begin Kernel +++++++++++" << "\n"; outs << spaces_nl << "Kernel Name: " << name << "\n"; outs << spaces_nl << " curbeSize: " << curbeSize << "\n"; outs << spaces_nl << " simdWidth: " << simdWidth << "\n"; outs << spaces_nl << " stackSize: " << stackSize << "\n"; outs << spaces_nl << " scratchSize: " << scratchSize << "\n"; outs << spaces_nl << " useSLM: " << useSLM << "\n"; outs << spaces_nl << " slmSize: " << slmSize << "\n"; outs << spaces_nl << " compileWgSize: " << compileWgSize[0] << compileWgSize[1] << compileWgSize[2] << "\n"; outs << spaces_nl << " Argument Number is " << argNum << "\n"; for (uint32_t i = 0; i < argNum; i++) { KernelArgument& arg = args[i]; outs << spaces_nl << " Arg " << i << ":\n"; outs << spaces_nl << " type value: "<< arg.type << "\n"; outs << spaces_nl << " size: "<< arg.size << "\n"; outs << spaces_nl << " align: "<< arg.align << "\n"; outs << spaces_nl << " bti: "<< arg.bti << "\n"; } outs << spaces_nl << " Patches Number is " << patches.size() << "\n"; num = 0; for (size_t i = 0; i < patches.size(); ++i) { PatchInfo& patch = patches[i]; num++; outs << spaces_nl << " patch " << num << ":\n"; outs << spaces_nl << " type value: "<< patch.type << "\n"; outs << spaces_nl << " subtype value: "<< patch.subType << "\n"; outs << spaces_nl << " offset: "<< patch.offset << "\n"; } if (samplerSet) { samplerSet->printStatus(indent + 4, outs); } if (imageSet) { imageSet->printStatus(indent + 4, outs); } outs << spaces << "++++++++++++ End Kernel ++++++++++++" << "\n"; } /*********************** End of Program class member function *************************/ static void programDelete(gbe_program gbeProgram) { gbe::Program *program = (gbe::Program*)(gbeProgram); GBE_SAFE_DELETE(program); } static void programCleanLlvmResource(gbe_program gbeProgram) { gbe::Program *program = (gbe::Program*)(gbeProgram); program->CleanLlvmResource(); } BVAR(OCL_DEBUGINFO, false);
#ifdef GBE_COMPILER_AVAILABLE
static bool buildModuleFromSource(const char *source, llvm::Module** out_module, llvm::LLVMContext* llvm_ctx, std::string dumpLLVMFileName, std::string dumpSPIRBinaryName, std::vector<std::string>& options, size_t stringSize, char *err, size_t *errSize, uint32_t oclVersion) {
// Arguments to pass to the clang frontend.
vector<const char*> args; bool bFastMath = false; for (auto &s : options) { args.push_back(s.c_str()); } args.push_back("-cl-kernel-arg-info");
// ParseCommandLineOptions(), used for the -mllvm args, cannot be called from multiple threads, and GVN now has a 100-instruction limit on block scans. Pass a bigger limit only once per context; doing it only once also avoids the multithreading bug.
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38
static bool ifsetllvm = false; if(!ifsetllvm) { args.push_back("-mllvm"); args.push_back("-memdep-block-scan-limit=200"); ifsetllvm = true; }
#endif
#ifdef GEN7_SAMPLER_CLAMP_BORDER_WORKAROUND
args.push_back("-DGEN7_SAMPLER_CLAMP_BORDER_WORKAROUND");
#endif
args.push_back("-emit-llvm");
// FIXME: we haven't implemented those builtin functions yet, so disable them for now.
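/* For example, without -fno-builtin the toolchain may recognize a plain copy loop
 * such as
 *   for (i = 0; i < n; i++) dst[i] = src[i];
 * as memcpy() and emit a libcall that this backend has no implementation for.
 * (Illustrative sketch; the exact transformation depends on the LLVM version.) */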
args.push_back("-fno-builtin"); args.push_back("-disable-llvm-optzns"); if(bFastMath) args.push_back("-D __FAST_RELAXED_MATH__=1"); args.push_back("-x"); args.push_back("cl"); args.push_back("-triple"); if (oclVersion >= 200) { args.push_back("spir64"); args.push_back("-fblocks"); } else args.push_back("spir"); args.push_back("stringInput.cl"); args.push_back("-ffp-contract=on"); if(OCL_DEBUGINFO) args.push_back("-g"); // The compiler invocation needs a DiagnosticsEngine so it can report problems std::string ErrorString; llvm::raw_string_ostream ErrorInfo(ErrorString); llvm::IntrusiveRefCntPtr DiagOpts = new clang::DiagnosticOptions(); DiagOpts->ShowCarets = false; DiagOpts->ShowPresumedLoc = true; clang::TextDiagnosticPrinter *DiagClient = new clang::TextDiagnosticPrinter(ErrorInfo, &*DiagOpts); llvm::IntrusiveRefCntPtr DiagID(new clang::DiagnosticIDs()); clang::DiagnosticsEngine Diags(DiagID, &*DiagOpts, DiagClient); llvm::StringRef srcString(source); // Create the compiler invocation #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 auto CI = std::make_shared(); CI->getPreprocessorOpts().addRemappedFile("stringInput.cl", #else std::unique_ptr CI(new clang::CompilerInvocation); (*CI).getPreprocessorOpts().addRemappedFile("stringInput.cl", #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR <= 35 llvm::MemoryBuffer::getMemBuffer(srcString) #else llvm::MemoryBuffer::getMemBuffer(srcString).release() #endif ); clang::CompilerInvocation::CreateFromArgs(*CI, &args[0], &args[0] + args.size(), Diags); // Create the compiler instance clang::CompilerInstance Clang; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 Clang.setInvocation(std::move(CI)); #else Clang.setInvocation(CI.release()); #endif // Get ready to report problems Clang.createDiagnostics(DiagClient, false); Clang.getDiagnosticOpts().ShowCarets = false; if (!Clang.hasDiagnostics()) return false; // Set Language clang::LangOptions & lang_opts = Clang.getLangOpts(); lang_opts.OpenCL = 1; //llvm flags need command line parsing to take effect if (!Clang.getFrontendOpts().LLVMArgs.empty()) { unsigned NumArgs = Clang.getFrontendOpts().LLVMArgs.size(); const char **Args = new const char*[NumArgs + 2]; Args[0] = "clang (LLVM option parsing)"; for (unsigned i = 0; i != NumArgs; ++i){ Args[i + 1] = Clang.getFrontendOpts().LLVMArgs[i].c_str(); } Args[NumArgs + 1] = 0; llvm::cl::ParseCommandLineOptions(NumArgs + 1, Args); delete [] Args; } // Create an action and make the compiler instance carry it out std::unique_ptr Act(new clang::EmitLLVMOnlyAction(llvm_ctx)); auto retVal = Clang.ExecuteAction(*Act); if (err != NULL) { GBE_ASSERT(errSize != NULL); *errSize = ErrorString.copy(err, stringSize - 1, 0); } if (err == NULL || OCL_OUTPUT_BUILD_LOG) { // flush the error messages to the errs() if there is no // error string buffer. llvm::errs() << ErrorString; } ErrorString.clear(); if (!retVal) return false; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR <= 35 llvm::Module *module = Act->takeModule(); #else llvm::Module *module = Act->takeModule().release(); #endif *out_module = module; // Dump the LLVM if requested. #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 36 if (!dumpLLVMFileName.empty()) { std::string err; llvm::raw_fd_ostream ostream (dumpLLVMFileName.c_str(), err, llvm::sys::fs::F_None ); if (err.empty()) { (*out_module)->print(ostream, 0); } //Otherwise, you'll have to make do without the dump. 
} if (!dumpSPIRBinaryName.empty()) { std::string err; llvm::raw_fd_ostream ostream (dumpSPIRBinaryName.c_str(), err, llvm::sys::fs::F_None ); if (err.empty()) llvm::WriteBitcodeToFile(*out_module, ostream); }
#else
if (!dumpLLVMFileName.empty()) { std::error_code err; llvm::raw_fd_ostream ostream (dumpLLVMFileName.c_str(), err, llvm::sys::fs::F_None); if (!err) { (*out_module)->print(ostream, 0); }
// Otherwise, you'll have to make do without the dump.
} if (!dumpSPIRBinaryName.empty()) { std::error_code err; llvm::raw_fd_ostream ostream (dumpSPIRBinaryName.c_str(), err, llvm::sys::fs::F_None); if (!err) llvm::WriteBitcodeToFile(*out_module, ostream); }
#endif
return true; } SVAR(OCL_PCH_PATH, OCL_PCH_OBJECT); SVAR(OCL_PCH_20_PATH, OCL_PCH_OBJECT_20); SVAR(OCL_HEADER_FILE_DIR, OCL_HEADER_DIR); BVAR(OCL_OUTPUT_KERNEL_SOURCE, false); static bool processSourceAndOption(const char *source, const char *options, const char *temp_header_path, std::vector<std::string>& clOpt, std::string& dumpLLVMFileName, std::string& dumpASMFileName, std::string& dumpSPIRBinaryName, int& optLevel, size_t stringSize, char *err, size_t *errSize, uint32_t &oclVersion) { uint32_t maxoclVersion = oclVersion; std::string pchFileName; bool findPCH = false;
#if defined(__ANDROID__)
bool invalidPCH = true;
#else
bool invalidPCH = false;
#endif
size_t start = 0, end = 0; std::string hdirs = OCL_HEADER_FILE_DIR; if(hdirs == "") hdirs = OCL_HEADER_DIR; std::istringstream hidirs(hdirs); std::string headerFilePath; bool findOcl = false; while (getline(hidirs, headerFilePath, ':')) { std::string oclDotHName = headerFilePath + "/ocl.h"; if(access(oclDotHName.c_str(), R_OK) == 0) { findOcl = true; break; } } (void) findOcl; assert(findOcl); if (OCL_OUTPUT_KERNEL_SOURCE) { if(options) { std::cout << "Build options:" << std::endl; std::cout << options << std::endl; } std::cout << "CL kernel source:" << std::endl; std::cout << source << std::endl; } std::string includePath = "-I" + headerFilePath; clOpt.push_back(includePath); bool useDefaultCLCVersion = true; if (options) { char *c_str = (char *)malloc(sizeof(char) * (strlen(options) + 1)); if (c_str == NULL) return false; memcpy(c_str, options, strlen(options) + 1); std::string optionStr(c_str); const std::string unsupportedOptions("-cl-denorms-are-zero, -cl-strict-aliasing, -cl-opt-disable," "-cl-no-signed-zeros, -cl-fp32-correctly-rounded-divide-sqrt"); const std::string uncompatiblePCHOptions = ("-cl-single-precision-constant, -cl-fast-relaxed-math, -cl-std=CL1.1, -cl-finite-math-only, -cl-unsafe-math-optimizations"); const std::string fastMathOption = ("-cl-fast-relaxed-math"); while (end != std::string::npos) { end = optionStr.find(' ', start); std::string str = optionStr.substr(start, end - start); if(str.size() == 0) { start = end + 1; continue; } EXTEND_QUOTE:
/* Find the quote characters: if there is an odd number of '"' within this token, extend the token to the matching '"' of the last one. */
int quoteNum = 0; for (size_t i = 0; i < str.size(); i++) { if (str[i] == '"') { quoteNum++; } } if (quoteNum % 2) { // Odd number of '"'; the token needs to be extended.
/* Find the closing '"'. */
while (end < optionStr.size() && optionStr[end] != '"') end++; if (end == optionStr.size()) { printf("Warning: unmatched \" in build options\n"); free(c_str); return false; } GBE_ASSERT(optionStr[end] == '"'); end++; if (end < optionStr.size() && optionStr[end] != ' ') { // The -I"CC AAA"BBDDDD case: extend further, to the next space.
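/* Worked example: for options = '-I"my dir"x -DFOO' the first space-split token
 * is '-I"my' (one '"', an odd count), so the scan above extends it to the closing
 * '"'; because 'x' follows with no space, the token is extended again up to the
 * next space below, giving '-I"my dir"x' before the quote count is re-checked. */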
end = optionStr.find(' ', end); str = optionStr.substr(start, end - start); goto EXTEND_QUOTE; } else { str = optionStr.substr(start, end - start); } } start = end + 1; if(unsupportedOptions.find(str) != std::string::npos) { continue; }
/* For a separate -I option, extract "path" to path, i.e. strip the quotes. */
if (clOpt.back() == "-I") { if (str[0] == '"') { GBE_ASSERT(str[str.size() - 1] == '"'); if (str.size() > 2) { clOpt.push_back(str.substr(1, str.size() - 2)); } else { clOpt.push_back(""); } continue; } }
// The -I"YYYY" like case.
if (str.size() > 4 && str[0] == '-' && str[1] == 'I' && str[2] == '"') { GBE_ASSERT(str[str.size() - 1] == '"'); clOpt.push_back("-I"); if (str.size() > 4) { clOpt.push_back(str.substr(3, str.size() - 4)); } else { clOpt.push_back(""); } continue; } if(str.find("-cl-std=") != std::string::npos) { useDefaultCLCVersion = false; if (str == "-cl-std=CL1.1") { clOpt.push_back("-D__OPENCL_C_VERSION__=110"); oclVersion = 110; } else if (str == "-cl-std=CL1.2") { clOpt.push_back("-D__OPENCL_C_VERSION__=120"); oclVersion = 120; } else if (str == "-cl-std=CL2.0") { clOpt.push_back("-D__OPENCL_C_VERSION__=200"); oclVersion = 200; } else { if (err && stringSize > 0 && errSize) *errSize = snprintf(err, stringSize, "Invalid build option: %s\n", str.c_str()); return false; } } if (uncompatiblePCHOptions.find(str) != std::string::npos) invalidPCH = true; if (fastMathOption.find(str) != std::string::npos) { clOpt.push_back("-D"); clOpt.push_back("__FAST_RELAXED_MATH__=1"); } if(str.find("-dump-opt-llvm=") != std::string::npos) { dumpLLVMFileName = str.substr(str.find("=") + 1); continue; // Don't push this str back; ignore it.
} if(str.find("-dump-opt-asm=") != std::string::npos) { dumpASMFileName = str.substr(str.find("=") + 1); continue; // Don't push this str back; ignore it.
} if(str.find("-dump-spir-binary=") != std::string::npos) { dumpSPIRBinaryName = str.substr(str.find("=") + 1); continue; // Don't push this str back; ignore it.
} clOpt.push_back(str); } free(c_str); } if (useDefaultCLCVersion) { clOpt.push_back("-D__OPENCL_C_VERSION__=120"); clOpt.push_back("-cl-std=CL1.2"); oclVersion = 120; }
// For clCompileProgram usage.
if(temp_header_path){ clOpt.push_back("-I"); clOpt.push_back(temp_header_path); } std::string dirs = OCL_PCH_PATH; if(oclVersion >= 200) dirs = OCL_PCH_20_PATH; if(dirs == "") { dirs = oclVersion >= 200 ?
OCL_PCH_OBJECT_20 : OCL_PCH_OBJECT; } std::istringstream idirs(dirs); while (getline(idirs, pchFileName, ':')) { if(access(pchFileName.c_str(), R_OK) == 0) { findPCH = true; break; } } if (!findPCH || invalidPCH) { clOpt.push_back("-include"); clOpt.push_back("ocl.h"); } else { clOpt.push_back("-fno-validate-pch"); clOpt.push_back("-include-pch"); clOpt.push_back(pchFileName); } if (oclVersion > maxoclVersion){ if (err && stringSize > 0 && errSize) { *errSize = snprintf(err, stringSize, "Requested OpenCL version %lf is higher than maximum supported version %lf\n", (float)oclVersion/100.0,(float)maxoclVersion/100.0); } return false; } return true; } static gbe_program programNewFromSource(uint32_t deviceID, const char *source, size_t stringSize, const char *options, char *err, size_t *errSize) { int optLevel = 1; std::vector<std::string> clOpt; std::string dumpLLVMFileName, dumpASMFileName; std::string dumpSPIRBinaryName; uint32_t oclVersion = MAX_OCLVERSION(deviceID); if (!processSourceAndOption(source, options, NULL, clOpt, dumpLLVMFileName, dumpASMFileName, dumpSPIRBinaryName, optLevel, stringSize, err, errSize, oclVersion)) return NULL; gbe_program p;
// The module and the action will be deleted in GenProgram::CleanLlvmResource().
llvm::Module * out_module; llvm::LLVMContext* llvm_ctx = new llvm::LLVMContext; static std::mutex llvm_mutex; if (!llvm::llvm_is_multithreaded()) llvm_mutex.lock(); if (buildModuleFromSource(source, &out_module, llvm_ctx, dumpLLVMFileName, dumpSPIRBinaryName, clOpt, stringSize, err, errSize, oclVersion)) {
// Now build the program from llvm.
size_t clangErrSize = 0; if (err != NULL && *errSize != 0) { GBE_ASSERT(errSize != NULL); stringSize = stringSize - *errSize; err = err + *errSize; clangErrSize = *errSize; } if (!dumpASMFileName.empty()) { FILE *asmDumpStream = fopen(dumpASMFileName.c_str(), "w"); if (asmDumpStream) fclose(asmDumpStream); } p = gbe_program_new_from_llvm(deviceID, out_module, llvm_ctx, dumpASMFileName.empty() ?
NULL : dumpASMFileName.c_str(), stringSize, err, errSize, optLevel, options); if (err != NULL) *errSize += clangErrSize; if (OCL_OUTPUT_BUILD_LOG && options) llvm::errs() << "options:" << options << "\n"; if (OCL_OUTPUT_BUILD_LOG && err && *errSize) llvm::errs() << err << "\n"; } else p = NULL; if (!llvm::llvm_is_multithreaded()) llvm_mutex.unlock(); return p; }
#endif
#ifdef GBE_COMPILER_AVAILABLE
static gbe_program programNewFromLLVMFile(uint32_t deviceID, const char *fileName, size_t string_size, char *err, size_t *err_size) { gbe_program p = NULL; if (fileName == NULL) return NULL;
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39
llvm::LLVMContext *c = new llvm::LLVMContext;
#else
llvm::LLVMContext *c = &llvm::getGlobalContext();
#endif
// Get the module from its file.
llvm::SMDiagnostic errDiag;
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 36
llvm::Module *module = parseIRFile(fileName, errDiag, *c).release();
#else
llvm::Module *module = ParseIRFile(fileName, errDiag, *c);
#endif
int optLevel = 1;
// The module will be deleted in programCleanLlvmResource().
p = gbe_program_new_from_llvm(deviceID, module, c, NULL, string_size, err, err_size, optLevel, NULL); if (OCL_OUTPUT_BUILD_LOG && err && *err_size) llvm::errs() << err << "\n"; return p; }
#endif
#ifdef GBE_COMPILER_AVAILABLE
static gbe_program programCompileFromSource(uint32_t deviceID, const char *source, const char *temp_header_path, size_t stringSize, const char *options, char *err, size_t *errSize) { int optLevel = 1; std::vector<std::string> clOpt; std::string dumpLLVMFileName, dumpASMFileName; std::string dumpSPIRBinaryName; uint32_t oclVersion = MAX_OCLVERSION(deviceID); if (!processSourceAndOption(source, options, temp_header_path, clOpt, dumpLLVMFileName, dumpASMFileName, dumpSPIRBinaryName, optLevel, stringSize, err, errSize, oclVersion)) return NULL; gbe_program p; acquireLLVMContextLock(); llvm::Module * out_module;
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39
llvm::LLVMContext* llvm_ctx = new llvm::LLVMContext;
#else
llvm::LLVMContext* llvm_ctx = &llvm::getGlobalContext();
#endif
if (buildModuleFromSource(source, &out_module, llvm_ctx, dumpLLVMFileName, dumpSPIRBinaryName, clOpt, stringSize, err, errSize, oclVersion)) {
// Now build the program from llvm.
if (err != NULL) { GBE_ASSERT(errSize != NULL); stringSize -= *errSize; err += *errSize; } p = gbe_program_new_gen_program(deviceID, out_module, NULL, NULL); if (OCL_OUTPUT_BUILD_LOG && options) llvm::errs() << "options:" << options << "\n"; if (OCL_OUTPUT_BUILD_LOG && err && *errSize) llvm::errs() << err << "\n"; } else p = NULL; releaseLLVMContextLock(); return p; }
#endif
#ifdef GBE_COMPILER_AVAILABLE
static bool programLinkProgram(gbe_program dst_program, gbe_program src_program, size_t stringSize, char * err, size_t * errSize) { bool ret = false; acquireLLVMContextLock(); ret = gbe_program_link_from_llvm(dst_program, src_program, stringSize, err, errSize); releaseLLVMContextLock(); if (OCL_OUTPUT_BUILD_LOG && err) llvm::errs() << err; return ret; }
#endif
#ifdef GBE_COMPILER_AVAILABLE
static bool programCheckOption(const char * option) { vector<const char*> args; if (option == NULL) return true; // if NULL, return OK
std::string s(option);
// clang doesn't accept -create-library and -enable-link-options; erase them.
size_t pos = s.find("-create-library"); if(pos != std::string::npos) { s.erase(pos, strlen("-create-library")); } pos = s.find("-enable-link-options"); if(pos != std::string::npos) { s.erase(pos, strlen("-enable-link-options")); } pos = s.find("-dump-opt-asm"); if(pos !=
std::string::npos) { s.erase(pos, strlen("-dump-opt-asm")); } args.push_back(s.c_str());
// The compiler invocation needs a DiagnosticsEngine so it can report problems.
std::string ErrorString; llvm::raw_string_ostream ErrorInfo(ErrorString); llvm::IntrusiveRefCntPtr<clang::DiagnosticOptions> DiagOpts = new clang::DiagnosticOptions(); DiagOpts->ShowCarets = false; DiagOpts->ShowPresumedLoc = true; clang::TextDiagnosticPrinter *DiagClient = new clang::TextDiagnosticPrinter(ErrorInfo, &*DiagOpts); llvm::IntrusiveRefCntPtr<clang::DiagnosticIDs> DiagID(new clang::DiagnosticIDs()); clang::DiagnosticsEngine Diags(DiagID, &*DiagOpts, DiagClient);
// Create the compiler invocation.
std::unique_ptr<clang::CompilerInvocation> CI(new clang::CompilerInvocation); return clang::CompilerInvocation::CreateFromArgs(*CI, &args[0], &args[0] + args.size(), Diags); }
#endif
static size_t programGetGlobalConstantSize(gbe_program gbeProgram) { if (gbeProgram == NULL) return 0; const gbe::Program *program = (const gbe::Program*) gbeProgram; return program->getGlobalConstantSize(); } static void programGetGlobalConstantData(gbe_program gbeProgram, char *mem) { if (gbeProgram == NULL) return; const gbe::Program *program = (const gbe::Program*) gbeProgram; program->getGlobalConstantData(mem); } static size_t programGetGlobalRelocCount(gbe_program gbeProgram) { if (gbeProgram == NULL) return 0; const gbe::Program *program = (const gbe::Program*) gbeProgram; return program->getGlobalRelocCount(); } static void programGetGlobalRelocTable(gbe_program gbeProgram, char *mem) { if (gbeProgram == NULL) return; const gbe::Program *program = (const gbe::Program*) gbeProgram; program->getGlobalRelocTable(mem); } static uint32_t programGetKernelNum(gbe_program gbeProgram) { if (gbeProgram == NULL) return 0; const gbe::Program *program = (const gbe::Program*) gbeProgram; return program->getKernelNum(); } const static char* programGetDeviceEnqueueKernelName(gbe_program gbeProgram, uint32_t index) { if (gbeProgram == NULL) return 0; const gbe::Program *program = (const gbe::Program*) gbeProgram; return program->getDeviceEnqueueKernelName(index); } static gbe_kernel programGetKernelByName(gbe_program gbeProgram, const char *name) { if (gbeProgram == NULL) return NULL; const gbe::Program *program = (gbe::Program*) gbeProgram; return (gbe_kernel) program->getKernel(std::string(name)); } static gbe_kernel programGetKernel(const gbe_program gbeProgram, uint32_t ID) { if (gbeProgram == NULL) return NULL; const gbe::Program *program = (gbe::Program*) gbeProgram; return (gbe_kernel) program->getKernel(ID); } static const char *kernelGetName(gbe_kernel genKernel) { if (genKernel == NULL) return NULL; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getName(); } static const char *kernelGetAttributes(gbe_kernel genKernel) { if (genKernel == NULL) return NULL; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getFunctionAttributes(); } static const char *kernelGetCode(gbe_kernel genKernel) { if (genKernel == NULL) return NULL; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getCode(); } static size_t kernelGetCodeSize(gbe_kernel genKernel) { if (genKernel == NULL) return 0u; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getCodeSize(); } static uint32_t kernelGetArgNum(gbe_kernel genKernel) { if (genKernel == NULL) return 0u; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getArgNum(); } static void *kernelGetArgInfo(gbe_kernel genKernel, uint32_t argID, uint32_t value) { if (genKernel
== NULL) return NULL; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; KernelArgument::ArgInfo* info = kernel->getArgInfo(argID); switch (value) { case GBE_GET_ARG_INFO_ADDRSPACE: return (void*)((unsigned long)info->addrSpace); case GBE_GET_ARG_INFO_TYPE: return (void *)(info->typeName.c_str()); case GBE_GET_ARG_INFO_ACCESS: return (void *)(info->accessQual.c_str()); case GBE_GET_ARG_INFO_TYPEQUAL: return (void *)(info->typeQual.c_str()); case GBE_GET_ARG_INFO_NAME: return (void *)(info->argName.c_str()); case GBE_GET_ARG_INFO_TYPESIZE: return (void *)((size_t)info->typeSize); default: assert(0); } return NULL; } static uint32_t kernelGetArgSize(gbe_kernel genKernel, uint32_t argID) { if (genKernel == NULL) return 0u; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getArgSize(argID); } static uint8_t kernelGetArgBTI(gbe_kernel genKernel, uint32_t argID) { if (genKernel == NULL) return 0u; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getArgBTI(argID); } static uint32_t kernelGetArgAlign(gbe_kernel genKernel, uint32_t argID) { if (genKernel == NULL) return 0u; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getArgAlign(argID); } static gbe_arg_type kernelGetArgType(gbe_kernel genKernel, uint32_t argID) { if (genKernel == NULL) return GBE_ARG_INVALID; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getArgType(argID); } static uint32_t kernelGetSIMDWidth(gbe_kernel genKernel) { if (genKernel == NULL) return GBE_ARG_INVALID; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getSIMDWidth(); } static int32_t kernelGetCurbeOffset(gbe_kernel genKernel, gbe_curbe_type type, uint32_t subType) { if (genKernel == NULL) return 0; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getCurbeOffset(type, subType); } static int32_t kernelGetCurbeSize(gbe_kernel genKernel) { if (genKernel == NULL) return 0; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getCurbeSize(); } static int32_t kernelGetStackSize(gbe_kernel genKernel) { if (genKernel == NULL) return 0; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getStackSize(); } static int32_t kernelGetScratchSize(gbe_kernel genKernel) { if (genKernel == NULL) return 0; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getScratchSize(); } static int32_t kernelUseSLM(gbe_kernel genKernel) { if (genKernel == NULL) return 0; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getUseSLM() ? 
1 : 0; } static int32_t kernelGetSLMSize(gbe_kernel genKernel) { if (genKernel == NULL) return 0; const gbe::Kernel *kernel = (const gbe::Kernel*) genKernel; return kernel->getSLMSize(); } static size_t kernelGetSamplerSize(gbe_kernel gbeKernel) { if (gbeKernel == NULL) return 0; const gbe::Kernel *kernel = (const gbe::Kernel*) gbeKernel; return kernel->getSamplerSize(); } static void kernelGetSamplerData(gbe_kernel gbeKernel, uint32_t *samplers) { if (gbeKernel == NULL) return; const gbe::Kernel *kernel = (const gbe::Kernel*) gbeKernel; kernel->getSamplerData(samplers); } static void* kernelDupProfiling(gbe_kernel gbeKernel) { if (gbeKernel == NULL) return NULL; const gbe::Kernel *kernel = (const gbe::Kernel*) gbeKernel; return kernel->dupProfilingInfo(); } static uint32_t kernelGetProfilingBTI(gbe_kernel gbeKernel) { if (gbeKernel == NULL) return 0; const gbe::Kernel *kernel = (const gbe::Kernel*) gbeKernel; return kernel->getProfilingBTI(); } static void kernelOutputProfiling(void *profiling_info, void* buf) { if (profiling_info == NULL) return; ir::ProfilingInfo *pi = (ir::ProfilingInfo *)profiling_info; return pi->outputProfilingInfo(buf); } static uint32_t kernelGetPrintfNum(void * printf_info) { if (printf_info == NULL) return 0; const ir::PrintfSet *ps = (ir::PrintfSet *)printf_info; return ps->getPrintfNum(); } static uint32_t kernelUseDeviceEnqueue(gbe_kernel gbeKernel) { if (gbeKernel == NULL) return 0; const gbe::Kernel *kernel = (const gbe::Kernel*) gbeKernel; return kernel->getUseDeviceEnqueue(); } static void* kernelDupPrintfSet(gbe_kernel gbeKernel) { if (gbeKernel == NULL) return NULL; const gbe::Kernel *kernel = (const gbe::Kernel*) gbeKernel; return kernel->dupPrintfSet(); } static uint8_t kernelGetPrintfBufBTI(void * printf_info) { if (printf_info == NULL) return 0; const ir::PrintfSet *ps = (ir::PrintfSet *)printf_info; return ps->getBufBTI(); } static void kernelReleasePrintfSet(void * printf_info) { if (printf_info == NULL) return; ir::PrintfSet *ps = (ir::PrintfSet *)printf_info; delete ps; } static void kernelOutputPrintf(void * printf_info, void* buf_addr) { if (printf_info == NULL) return; ir::PrintfSet *ps = (ir::PrintfSet *)printf_info; ps->outputPrintf(buf_addr); } static void kernelGetCompileWorkGroupSize(gbe_kernel gbeKernel, size_t wg_size[3]) { if (gbeKernel == NULL) return; const gbe::Kernel *kernel = (const gbe::Kernel*) gbeKernel; kernel->getCompileWorkGroupSize(wg_size); } static size_t kernelGetImageSize(gbe_kernel gbeKernel) { if (gbeKernel == NULL) return 0; const gbe::Kernel *kernel = (const gbe::Kernel*) gbeKernel; return kernel->getImageSize(); } static void kernelGetImageData(gbe_kernel gbeKernel, ImageInfo *images) { if (gbeKernel == NULL) return; const gbe::Kernel *kernel = (const gbe::Kernel*) gbeKernel; kernel->getImageData(images); } static uint32_t kernelGetOclVersion(gbe_kernel gbeKernel) { if (gbeKernel == NULL) return 0; const gbe::Kernel *kernel = (const gbe::Kernel*) gbeKernel; return kernel->getOclVersion(); } static uint32_t kernelGetRequiredWorkGroupSize(gbe_kernel kernel, uint32_t dim) { return 0u; } } /* namespace gbe */ std::mutex llvm_ctx_mutex; void acquireLLVMContextLock() { llvm_ctx_mutex.lock(); } void releaseLLVMContextLock() { llvm_ctx_mutex.unlock(); } GBE_EXPORT_SYMBOL gbe_program_new_from_source_cb *gbe_program_new_from_source = NULL; GBE_EXPORT_SYMBOL gbe_program_new_from_llvm_file_cb *gbe_program_new_from_llvm_file = NULL; GBE_EXPORT_SYMBOL gbe_program_compile_from_source_cb *gbe_program_compile_from_source = NULL; 
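/* Note: the gbe_* globals initialized to NULL here and below form the dispatch
 * table between the OpenCL runtime and this compiler backend; they are filled
 * in by the CallBackInitializer at the end of this file when
 * GBE_COMPILER_AVAILABLE is defined. A minimal (hypothetical) caller-side
 * guard looks like:
 *   if (gbe_program_new_from_source != NULL)
 *     p = gbe_program_new_from_source(deviceID, source, errBufSize, options, err, &errSize);
 */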
GBE_EXPORT_SYMBOL gbe_program_link_program_cb *gbe_program_link_program = NULL; GBE_EXPORT_SYMBOL gbe_program_check_opt_cb *gbe_program_check_opt = NULL; GBE_EXPORT_SYMBOL gbe_program_new_from_binary_cb *gbe_program_new_from_binary = NULL; GBE_EXPORT_SYMBOL gbe_program_new_from_llvm_binary_cb *gbe_program_new_from_llvm_binary = NULL; GBE_EXPORT_SYMBOL gbe_program_serialize_to_binary_cb *gbe_program_serialize_to_binary = NULL; GBE_EXPORT_SYMBOL gbe_program_new_from_llvm_cb *gbe_program_new_from_llvm = NULL; GBE_EXPORT_SYMBOL gbe_program_new_gen_program_cb *gbe_program_new_gen_program = NULL; GBE_EXPORT_SYMBOL gbe_program_link_from_llvm_cb *gbe_program_link_from_llvm = NULL; GBE_EXPORT_SYMBOL gbe_program_build_from_llvm_cb *gbe_program_build_from_llvm = NULL; GBE_EXPORT_SYMBOL gbe_program_get_global_constant_size_cb *gbe_program_get_global_constant_size = NULL; GBE_EXPORT_SYMBOL gbe_program_get_global_constant_data_cb *gbe_program_get_global_constant_data = NULL; GBE_EXPORT_SYMBOL gbe_program_get_global_reloc_count_cb *gbe_program_get_global_reloc_count = NULL; GBE_EXPORT_SYMBOL gbe_program_get_global_reloc_table_cb *gbe_program_get_global_reloc_table = NULL; GBE_EXPORT_SYMBOL gbe_program_clean_llvm_resource_cb *gbe_program_clean_llvm_resource = NULL; GBE_EXPORT_SYMBOL gbe_program_delete_cb *gbe_program_delete = NULL; GBE_EXPORT_SYMBOL gbe_program_get_kernel_num_cb *gbe_program_get_kernel_num = NULL; GBE_EXPORT_SYMBOL gbe_program_get_kernel_by_name_cb *gbe_program_get_kernel_by_name = NULL; GBE_EXPORT_SYMBOL gbe_program_get_kernel_cb *gbe_program_get_kernel = NULL; GBE_EXPORT_SYMBOL gbe_program_get_device_enqueue_kernel_name_cb *gbe_program_get_device_enqueue_kernel_name = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_name_cb *gbe_kernel_get_name = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_attributes_cb *gbe_kernel_get_attributes = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_code_cb *gbe_kernel_get_code = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_code_size_cb *gbe_kernel_get_code_size = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_arg_num_cb *gbe_kernel_get_arg_num = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_arg_info_cb *gbe_kernel_get_arg_info = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_arg_size_cb *gbe_kernel_get_arg_size = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_arg_bti_cb *gbe_kernel_get_arg_bti = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_arg_type_cb *gbe_kernel_get_arg_type = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_arg_align_cb *gbe_kernel_get_arg_align = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_simd_width_cb *gbe_kernel_get_simd_width = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_curbe_offset_cb *gbe_kernel_get_curbe_offset = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_curbe_size_cb *gbe_kernel_get_curbe_size = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_stack_size_cb *gbe_kernel_get_stack_size = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_scratch_size_cb *gbe_kernel_get_scratch_size = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_required_work_group_size_cb *gbe_kernel_get_required_work_group_size = NULL; GBE_EXPORT_SYMBOL gbe_kernel_use_slm_cb *gbe_kernel_use_slm = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_slm_size_cb *gbe_kernel_get_slm_size = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_sampler_size_cb *gbe_kernel_get_sampler_size = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_sampler_data_cb *gbe_kernel_get_sampler_data = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_compile_wg_size_cb *gbe_kernel_get_compile_wg_size = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_image_size_cb *gbe_kernel_get_image_size = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_image_data_cb 
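/* all of these callback pointers start out NULL and are populated before main()
 * by the CallBackInitializer below (which also calls genSetupCallBacks()) when
 * GBE_COMPILER_AVAILABLE is defined */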
*gbe_kernel_get_image_data = NULL; GBE_EXPORT_SYMBOL gbe_kernel_get_ocl_version_cb *gbe_kernel_get_ocl_version = NULL; GBE_EXPORT_SYMBOL gbe_output_profiling_cb *gbe_output_profiling = NULL; GBE_EXPORT_SYMBOL gbe_dup_profiling_cb *gbe_dup_profiling = NULL; GBE_EXPORT_SYMBOL gbe_get_profiling_bti_cb *gbe_get_profiling_bti = NULL; GBE_EXPORT_SYMBOL gbe_get_printf_num_cb *gbe_get_printf_num = NULL; GBE_EXPORT_SYMBOL gbe_dup_printfset_cb *gbe_dup_printfset = NULL; GBE_EXPORT_SYMBOL gbe_get_printf_buf_bti_cb *gbe_get_printf_buf_bti = NULL; GBE_EXPORT_SYMBOL gbe_release_printf_info_cb *gbe_release_printf_info = NULL; GBE_EXPORT_SYMBOL gbe_output_printf_cb *gbe_output_printf = NULL; GBE_EXPORT_SYMBOL gbe_kernel_use_device_enqueue_cb *gbe_kernel_use_device_enqueue = NULL; #ifdef GBE_COMPILER_AVAILABLE namespace gbe { /* Use pre-main to setup the call backs */ struct CallBackInitializer { CallBackInitializer(void) { gbe_program_new_from_source = gbe::programNewFromSource; gbe_program_new_from_llvm_file = gbe::programNewFromLLVMFile; gbe_program_compile_from_source = gbe::programCompileFromSource; gbe_program_link_program = gbe::programLinkProgram; gbe_program_check_opt = gbe::programCheckOption; gbe_program_get_global_constant_size = gbe::programGetGlobalConstantSize; gbe_program_get_global_constant_data = gbe::programGetGlobalConstantData; gbe_program_get_global_reloc_count = gbe::programGetGlobalRelocCount; gbe_program_get_global_reloc_table = gbe::programGetGlobalRelocTable; gbe_program_clean_llvm_resource = gbe::programCleanLlvmResource; gbe_program_delete = gbe::programDelete; gbe_program_get_kernel_num = gbe::programGetKernelNum; gbe_program_get_device_enqueue_kernel_name = gbe::programGetDeviceEnqueueKernelName; gbe_program_get_kernel_by_name = gbe::programGetKernelByName; gbe_program_get_kernel = gbe::programGetKernel; gbe_kernel_get_name = gbe::kernelGetName; gbe_kernel_get_attributes = gbe::kernelGetAttributes; gbe_kernel_get_code = gbe::kernelGetCode; gbe_kernel_get_code_size = gbe::kernelGetCodeSize; gbe_kernel_get_arg_num = gbe::kernelGetArgNum; gbe_kernel_get_arg_info = gbe::kernelGetArgInfo; gbe_kernel_get_arg_size = gbe::kernelGetArgSize; gbe_kernel_get_arg_bti = gbe::kernelGetArgBTI; gbe_kernel_get_arg_type = gbe::kernelGetArgType; gbe_kernel_get_arg_align = gbe::kernelGetArgAlign; gbe_kernel_get_simd_width = gbe::kernelGetSIMDWidth; gbe_kernel_get_curbe_offset = gbe::kernelGetCurbeOffset; gbe_kernel_get_curbe_size = gbe::kernelGetCurbeSize; gbe_kernel_get_stack_size = gbe::kernelGetStackSize; gbe_kernel_get_scratch_size = gbe::kernelGetScratchSize; gbe_kernel_get_required_work_group_size = gbe::kernelGetRequiredWorkGroupSize; gbe_kernel_use_slm = gbe::kernelUseSLM; gbe_kernel_get_slm_size = gbe::kernelGetSLMSize; gbe_kernel_get_sampler_size = gbe::kernelGetSamplerSize; gbe_kernel_get_sampler_data = gbe::kernelGetSamplerData; gbe_kernel_get_compile_wg_size = gbe::kernelGetCompileWorkGroupSize; gbe_kernel_get_image_size = gbe::kernelGetImageSize; gbe_kernel_get_image_data = gbe::kernelGetImageData; gbe_kernel_get_ocl_version = gbe::kernelGetOclVersion; gbe_get_profiling_bti = gbe::kernelGetProfilingBTI; gbe_get_printf_num = gbe::kernelGetPrintfNum; gbe_dup_profiling = gbe::kernelDupProfiling; gbe_output_profiling = gbe::kernelOutputProfiling; gbe_get_printf_buf_bti = gbe::kernelGetPrintfBufBTI; gbe_dup_printfset = gbe::kernelDupPrintfSet; gbe_release_printf_info = gbe::kernelReleasePrintfSet; gbe_output_printf = gbe::kernelOutputPrintf; gbe_kernel_use_device_enqueue = 
gbe::kernelUseDeviceEnqueue; genSetupCallBacks(); }
~CallBackInitializer() {
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 34
llvm::llvm_shutdown();
#endif
} }; static CallBackInitializer cbInitializer; } /* namespace gbe */
#endif
Beignet-1.3.2-Source/backend/src/backend/gen_insn_selection_output.cpp000664 001750 001750 00000013776 13173554000 025232 0ustar00yryr000000 000000 #include "backend/gen_insn_selection.hpp"
#include "backend/gen_insn_selection_output.hpp"
#include "sys/cvar.hpp"
#include "sys/intrusive_list.hpp"
#include <iostream>
#include <iomanip>
#include <cstdio>
using namespace std;
namespace gbe { static void outputGenReg(GenRegister& reg, bool dst) { if (reg.file == GEN_IMMEDIATE_VALUE || reg.file == GEN_GENERAL_REGISTER_FILE) { if (reg.file == GEN_IMMEDIATE_VALUE) { switch (reg.type) { case GEN_TYPE_UD: case GEN_TYPE_UW: case GEN_TYPE_UB: case GEN_TYPE_HF_IMM: cout << hex << "0x" << reg.value.ud << dec; break; case GEN_TYPE_D: case GEN_TYPE_W: case GEN_TYPE_B: cout << reg.value.d; break; case GEN_TYPE_V: cout << hex << "0x" << reg.value.ud << dec; break; case GEN_TYPE_UL: cout << reg.value.u64; break; case GEN_TYPE_L: cout << reg.value.i64; break; case GEN_TYPE_F: cout << reg.value.f; break; } } else { if (reg.negation) cout << "-"; if (reg.absolute) cout << "(abs)"; cout << "%" << reg.value.reg; if (reg.subphysical) cout << "." << reg.subnr + reg.nr * GEN_REG_SIZE; if (dst) cout << "<" << GenRegister::hstride_size(reg) << ">"; else cout << "<" << GenRegister::vstride_size(reg) << "," << GenRegister::width_size(reg) << "," << GenRegister::hstride_size(reg) << ">"; } cout << ":"; switch (reg.type) { case GEN_TYPE_UD: cout << "UD"; break; case GEN_TYPE_UW: cout << "UW"; break; case GEN_TYPE_UB: cout << "UB"; break; case GEN_TYPE_HF_IMM: cout << "HF"; break; case GEN_TYPE_D: cout << "D"; break; case GEN_TYPE_W: cout << "W"; break; case GEN_TYPE_B: cout << "B"; break; case GEN_TYPE_V: cout << "V"; break; case GEN_TYPE_UL: cout << "UL"; break; case GEN_TYPE_L: cout << "L"; break; case GEN_TYPE_F: cout << "F"; break; } } else if (reg.file == GEN_ARCHITECTURE_REGISTER_FILE) { cout << setw(8) << "arf"; } else assert(!"should not reach here"); }
#define OP_NAME_LENGTH 512
void outputSelectionInst(SelectionInstruction &insn) { cout << "[" << insn.ID << "]"; char opname[OP_NAME_LENGTH]; int n = 0; /* the statements that formatted the opcode name into opname (and left its length in n) were lost in this dump */ if (n >= OP_NAME_LENGTH - 20) { cout << "opname too long: " << opname << endl; return; } sprintf(&opname[n], "(%d)", insn.state.execWidth); cout << left << setw(20) << opname; for (int i = 0; i < insn.dstNum; ++i) { GenRegister dst = insn.dst(i); outputGenReg(dst, true); cout << "\t"; } cout << ":\t"; for (int i = 0; i < insn.srcNum; ++i) { GenRegister src = insn.src(i); outputGenReg(src, false); cout << "\t"; } cout << endl; }
void outputSelectionIR(GenContext &ctx, Selection* sel, const char* KernelName) { cout << KernelName << "'s SELECTION IR begin:" << endl; cout << "WARNING: not completed yet, welcome for the FIX!" << endl; for (SelectionBlock &block : *sel->blockList) { for (SelectionInstruction &insn : block.insnList) { outputSelectionInst(insn); } cout << endl; } cout << KernelName << "'s SELECTION IR end." << endl; } } /* namespace gbe */
Beignet-1.3.2-Source/backend/src/backend/gen_register.hpp /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia <benjamin.segovia@intel.com> */ /** * \file gen_register.hpp * \author Benjamin Segovia */
#ifndef __GEN_REGISTER_HPP__
#define __GEN_REGISTER_HPP__
#include "backend/gen_defs.hpp" #include "ir/register.hpp" #include "sys/platform.hpp" namespace gbe { /*!
Type size in bytes for each Gen type */ INLINE int typeSize(uint32_t type) { switch(type) { case GEN_TYPE_DF: case GEN_TYPE_UL: case GEN_TYPE_L: return 8; case GEN_TYPE_UD: case GEN_TYPE_D: case GEN_TYPE_F: return 4; case GEN_TYPE_UW: case GEN_TYPE_W: case GEN_TYPE_HF: case GEN_TYPE_HF_IMM: return 2; case GEN_TYPE_UB: case GEN_TYPE_B: return 1; default: assert(0); return 0; } } /*! Convert a hstride to a number of element */ INLINE uint32_t stride(uint32_t stride) { switch (stride) { case 0: return 0; case 1: return 1; case 2: return 2; case 3: return 4; case 4: return 8; case 5: return 16; default: assert(0); return 0; } } /*! Encode the instruction state. Note that the flag register can be either * physical (i.e. a real Gen flag) or a virtual boolean register. The flag * register allocation will turn all virtual boolean registers into flag * registers */ class GenInstructionState { public: INLINE GenInstructionState(uint32_t simdWidth = 8) { this->execWidth = simdWidth; this->quarterControl = GEN_COMPRESSION_Q1; this->nibControl = 0; this->accWrEnable = 0; this->noMask = 0; this->flag = 0; this->subFlag = 0; this->grfFlag = 1; this->externFlag = 0; this->modFlag = 0; this->flagGen = 0; this->predicate = GEN_PREDICATE_NONE; this->inversePredicate = 0; this->physicalFlag = 1; this->flagIndex = 0; this->saturate = GEN_MATH_SATURATE_NONE; } uint32_t physicalFlag:1; //!< Physical or virtual flag register uint32_t flag:1; //!< Only if physical flag, uint32_t subFlag:1; //!< Only if physical flag uint32_t grfFlag:1; //!< Only if virtual flag, 0 means we do not need to allocate GRF. uint32_t externFlag:1; //!< Only if virtual flag, 1 means this flag is from external BB. uint32_t modFlag:1; //!< Only if virtual flag, 1 means will modify flag. uint32_t flagGen:1; //!< Only if virtual flag, 1 means the gen_context stage may need to //!< generate the flag. uint32_t execWidth:5; uint32_t quarterControl:1; uint32_t nibControl:1; uint32_t accWrEnable:1; uint32_t noMask:1; uint32_t predicate:4; uint32_t inversePredicate:1; uint32_t saturate:1; uint32_t flagIndex; //!< Only if virtual flag (index of the register) void chooseNib(int nib) { switch (nib) { case 0: quarterControl = 0; nibControl = 0; break; case 1: quarterControl = 0; nibControl = 1; break; case 2: quarterControl = 1; nibControl = 0; break; case 3: quarterControl = 1; nibControl = 1; break; default: NOT_IMPLEMENTED; } } void useVirtualFlag(ir::Register flag, unsigned pred) { modFlag = 0; physicalFlag = 0; flagIndex = flag; predicate = pred; } void useFlag(int nr, int subnr) { flag = nr; subFlag = subnr; physicalFlag = 1; } }; /*! This is a book-keeping structure used to encode both virtual and physical * registers */ class GenRegister { public: /*! Empty constructor */ INLINE GenRegister(void) {} /*! General constructor */ INLINE GenRegister(uint32_t file, ir::Register reg, uint32_t type, uint32_t vstride, uint32_t width, uint32_t hstride) { this->type = type; this->file = file; this->physical = 0; this->subphysical = 0; this->value.reg = reg; this->negation = 0; this->absolute = 0; this->vstride = vstride; this->width = width; this->hstride = hstride; this->quarter = 0; this->nr = this->subnr = 0; this->address_mode = GEN_ADDRESS_DIRECT; this->a0_subnr = 0; this->addr_imm = 0; } /*! 
For specific physical registers only */ INLINE GenRegister(uint32_t file, uint32_t nr, uint32_t subnr, uint32_t type, uint32_t vstride, uint32_t width, uint32_t hstride) { this->type = type; this->file = file; this->nr = nr; this->physical = 1; this->subphysical = 1; this->subnr = subnr * typeSize(type); this->negation = 0; this->absolute = 0; this->vstride = vstride; this->width = width; this->hstride = hstride; this->quarter = 0; this->address_mode = GEN_ADDRESS_DIRECT; this->a0_subnr = 0; this->addr_imm = 0; } /*! Return the IR virtual register */ INLINE ir::Register reg(void) const { return ir::Register(value.reg); } /*! For immediates or virtual register */ union { double df; float f; int32_t d; uint32_t ud; uint32_t reg; int64_t i64; uint64_t u64; } value; uint32_t nr:8; //!< Just for some physical registers (acc, null) uint32_t subnr:8; //!< Idem uint32_t physical:1; //!< 1 if physical, 0 otherwise uint32_t subphysical:1;//!< 1 if subnr is physical, 0 otherwise uint32_t type:4; //!< Gen type uint32_t file:2; //!< Register file uint32_t negation:1; //!< For source uint32_t absolute:1; //!< For source uint32_t vstride:4; //!< Vertical stride uint32_t width:3; //!< Width uint32_t hstride:2; //!< Horizontal stride uint32_t quarter:1; //!< To choose which part we want (Q1 / Q2) uint32_t address_mode:1; //!< direct or indirect uint32_t a0_subnr:4; //!< In indirect mode, use a0.nr as the base. int32_t addr_imm:10; //!< In indirect mode, the imm as address offset from a0. static INLINE GenRegister offset(GenRegister reg, int nr, int subnr = 0) { GenRegister r = reg; if(subnr >= 32){ nr += subnr / 32; subnr = subnr % 32; } r.nr += nr; r.subnr += subnr; r.subphysical = 1; return r; } static INLINE GenRegister toUniform(GenRegister reg, uint32_t type) { GenRegister r = reg; r.type = type; r.hstride = GEN_HORIZONTAL_STRIDE_0; r.vstride = GEN_VERTICAL_STRIDE_0; r.width = GEN_WIDTH_1; return r; } INLINE bool isSameRegion(GenRegister reg) const { return reg.file == file && typeSize(reg.type) == typeSize(type) && reg.vstride == vstride && reg.width == width && reg.hstride == hstride; } static INLINE uint32_t grfOffset(GenRegister reg) { return reg.nr * GEN_REG_SIZE + reg.subnr; } // split a DWORD register into unpacked Byte or Short register static INLINE GenRegister splitReg(GenRegister reg, uint32_t count, uint32_t sub_part) { GenRegister r = reg; GBE_ASSERT(count == 4 || count == 2); GBE_ASSERT(reg.type == GEN_TYPE_UD || reg.type == GEN_TYPE_D); if(reg.hstride != GEN_HORIZONTAL_STRIDE_0) { GBE_ASSERT(reg.hstride == GEN_HORIZONTAL_STRIDE_1); r.hstride = count == 4 ? GEN_HORIZONTAL_STRIDE_4 : GEN_HORIZONTAL_STRIDE_2; } if(count == 4) { r.type = reg.type == GEN_TYPE_UD ? GEN_TYPE_UB : GEN_TYPE_B; r.vstride = GEN_VERTICAL_STRIDE_32; } else { r.type = reg.type == GEN_TYPE_UD ? GEN_TYPE_UW : GEN_TYPE_W; r.vstride = GEN_VERTICAL_STRIDE_16; } r.subnr += sub_part*typeSize(r.type); r.nr += r.subnr / 32; r.subnr %= 32; return r; } INLINE bool isint64(void) const { if ((type == GEN_TYPE_UL || type == GEN_TYPE_L) && file == GEN_GENERAL_REGISTER_FILE) return true; return false; } /* Besides long and double, there are also some cases which can also stride several registers, eg. 
unpacked ud for long<8,4:2> and unpacked uw for long<16,4:4> */ INLINE bool is_unpacked_long(void) const { if (file != GEN_GENERAL_REGISTER_FILE) return false; if (width == GEN_WIDTH_4 && hstride > GEN_HORIZONTAL_STRIDE_1) return true; return false; } INLINE bool isimmdf(void) const { if (type == GEN_TYPE_DF && file == GEN_IMMEDIATE_VALUE) return true; return false; } INLINE GenRegister top_half(int simdWidth) const { GBE_ASSERT(isint64()); GenRegister reg = retype(*this, type == GEN_TYPE_UL ? GEN_TYPE_UD : GEN_TYPE_D); if (reg.hstride != GEN_HORIZONTAL_STRIDE_0) { reg.subnr += simdWidth * typeSize(reg.type) * hstride_size(reg); reg.nr += reg.subnr / 32; reg.subnr %= 32; } else { reg.subnr += typeSize(reg.type); reg.nr += reg.subnr/32; reg.subnr %= 32; } return reg; } INLINE GenRegister bottom_half(void) const { GBE_ASSERT(isint64()); GenRegister r = retype(*this, type == GEN_TYPE_UL ? GEN_TYPE_UD : GEN_TYPE_D); return r; } INLINE bool is_signed_int(void) const { if ((type == GEN_TYPE_B || type == GEN_TYPE_W || type == GEN_TYPE_D || type == GEN_TYPE_L) && file == GEN_GENERAL_REGISTER_FILE) return true; return false; } INLINE bool isdf(void) const { if (type == GEN_TYPE_DF && file == GEN_GENERAL_REGISTER_FILE) return true; return false; } INLINE int flag_nr(void) const { assert(file == GEN_ARCHITECTURE_REGISTER_FILE); assert(nr >= GEN_ARF_FLAG && nr < GEN_ARF_FLAG + 2); return nr & 15; } INLINE int flag_subnr(void) const { return subnr / typeSize(type); } static INLINE GenRegister h2(GenRegister reg) { GenRegister r = reg; if(r.hstride != GEN_HORIZONTAL_STRIDE_0) r.hstride = GEN_HORIZONTAL_STRIDE_2; return r; } static INLINE GenRegister QnVirtual(GenRegister reg, uint32_t quarter) { GBE_ASSERT(reg.physical == 0); if (reg.hstride == GEN_HORIZONTAL_STRIDE_0) // scalar register return reg; else { reg.quarter = quarter; return reg; } } static INLINE GenRegister QnPhysical(GenRegister reg, uint32_t quarter) { GBE_ASSERT(reg.physical); if (reg.hstride == GEN_HORIZONTAL_STRIDE_0) // scalar register return reg; else { const uint32_t typeSz = typeSize(reg.type); const uint32_t horizontal = stride(reg.hstride); const uint32_t grfOffset = reg.nr*GEN_REG_SIZE + reg.subnr; const uint32_t nextOffset = grfOffset + 8*quarter*horizontal*typeSz; reg.nr = nextOffset / GEN_REG_SIZE; reg.subnr = (nextOffset % GEN_REG_SIZE); return reg; } } static INLINE GenRegister Qn(GenRegister reg, uint32_t quarter) { if (reg.physical) return QnPhysical(reg, quarter); else return QnVirtual(reg, quarter); } static INLINE GenRegister vec16(uint32_t file, ir::Register reg) { return GenRegister(file, reg, GEN_TYPE_F, GEN_VERTICAL_STRIDE_8, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister vec8(uint32_t file, ir::Register reg) { return GenRegister(file, reg, GEN_TYPE_F, GEN_VERTICAL_STRIDE_8, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister vec4(uint32_t file, ir::Register reg) { return GenRegister(file, reg, GEN_TYPE_F, GEN_VERTICAL_STRIDE_4, GEN_WIDTH_4, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister vec2(uint32_t file, ir::Register reg) { return GenRegister(file, reg, GEN_TYPE_F, GEN_VERTICAL_STRIDE_2, GEN_WIDTH_2, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister vec1(uint32_t file, ir::Register reg) { return GenRegister(file, reg, GEN_TYPE_F, GEN_VERTICAL_STRIDE_0, GEN_WIDTH_1, GEN_HORIZONTAL_STRIDE_0); } static INLINE GenRegister retype(GenRegister reg, uint32_t type) { reg.type = type; return reg; } static INLINE GenRegister df16(uint32_t file, ir::Register reg) { return 
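/* DF (and, below, UL) builders are built on a vec4 <4;4,1> region: a 64-bit
   element occupies two channels, so a wider region would cross the GRF row */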
retype(vec4(file, reg), GEN_TYPE_DF); } static INLINE GenRegister df8(uint32_t file, ir::Register reg) { return retype(vec4(file, reg), GEN_TYPE_DF); } static INLINE GenRegister df1(uint32_t file, ir::Register reg) { return retype(vec1(file, reg), GEN_TYPE_DF); } /* Because we can not crossing row with horizontal stride, so for long type, we need to set it to <4,4:1>:UQ */ static INLINE GenRegister ul16(uint32_t file, ir::Register reg) { return retype(vec4(file, reg), GEN_TYPE_UL); } static INLINE GenRegister ul8(uint32_t file, ir::Register reg) { return retype(vec4(file, reg), GEN_TYPE_UL); } static INLINE GenRegister ul1(uint32_t file, ir::Register reg) { return retype(vec1(file, reg), GEN_TYPE_UL); } static INLINE GenRegister ud16(uint32_t file, ir::Register reg) { return retype(vec16(file, reg), GEN_TYPE_UD); } static INLINE GenRegister ud8(uint32_t file, ir::Register reg) { return retype(vec8(file, reg), GEN_TYPE_UD); } static INLINE GenRegister ud1(uint32_t file, ir::Register reg) { return retype(vec1(file, reg), GEN_TYPE_UD); } static INLINE GenRegister d8(uint32_t file, ir::Register reg) { return retype(vec8(file, reg), GEN_TYPE_D); } static INLINE GenRegister uw16(uint32_t file, ir::Register reg) { return retype(vec16(file, reg), GEN_TYPE_UW); } static INLINE GenRegister uw8(uint32_t file, ir::Register reg) { return retype(vec8(file, reg), GEN_TYPE_UW); } static INLINE GenRegister uw1(uint32_t file, ir::Register reg) { return retype(vec1(file, reg), GEN_TYPE_UW); } static INLINE GenRegister ub16(uint32_t file, ir::Register reg) { return GenRegister(file, reg, GEN_TYPE_UB, GEN_VERTICAL_STRIDE_16, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_2); } static INLINE GenRegister ub8(uint32_t file, ir::Register reg) { return GenRegister(file, reg, GEN_TYPE_UB, GEN_VERTICAL_STRIDE_16, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_2); } static INLINE GenRegister ub1(uint32_t file, ir::Register reg) { return retype(vec1(file, reg), GEN_TYPE_UB); } static INLINE GenRegister unpacked_ud(ir::Register reg, bool uniform = false) { uint32_t width; uint32_t vstride; uint32_t hstride; if (uniform) { width = GEN_WIDTH_1; vstride = GEN_VERTICAL_STRIDE_0; hstride = GEN_HORIZONTAL_STRIDE_0; } else { width = GEN_WIDTH_4; vstride = GEN_VERTICAL_STRIDE_8; hstride = GEN_HORIZONTAL_STRIDE_2; } return GenRegister(GEN_GENERAL_REGISTER_FILE, reg, GEN_TYPE_UD, vstride, width, hstride); } static INLINE GenRegister unpacked_uw(ir::Register reg, bool uniform = false, bool islong = false) { uint32_t width; uint32_t vstride; uint32_t hstride; if (uniform) { width = GEN_WIDTH_1; vstride = GEN_VERTICAL_STRIDE_0; hstride = GEN_HORIZONTAL_STRIDE_0; } else if (islong) { width = GEN_WIDTH_4; vstride = GEN_VERTICAL_STRIDE_16; hstride = GEN_HORIZONTAL_STRIDE_4; } else { width = GEN_WIDTH_8; vstride = GEN_VERTICAL_STRIDE_16; hstride = GEN_HORIZONTAL_STRIDE_2; } return GenRegister(GEN_GENERAL_REGISTER_FILE, reg, GEN_TYPE_UW, vstride, width, hstride); } static INLINE GenRegister unpacked_ub(ir::Register reg, bool uniform = false) { return GenRegister(GEN_GENERAL_REGISTER_FILE, reg, GEN_TYPE_UB, uniform ? GEN_VERTICAL_STRIDE_0 : GEN_VERTICAL_STRIDE_32, uniform ? GEN_WIDTH_1 : GEN_WIDTH_8, uniform ? 
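/* a uniform collapses to a scalar <0;1,0> region */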
GEN_HORIZONTAL_STRIDE_0 : GEN_HORIZONTAL_STRIDE_4); } static INLINE GenRegister imm(uint32_t type) { return GenRegister(GEN_IMMEDIATE_VALUE, 0, 0, type, GEN_VERTICAL_STRIDE_0, GEN_WIDTH_1, GEN_HORIZONTAL_STRIDE_0); } static INLINE GenRegister immuint64(uint64_t i) { GenRegister immediate = imm(GEN_TYPE_UL); immediate.value.u64 = i; return immediate; } static INLINE GenRegister immint64(int64_t i) { GenRegister immediate = imm(GEN_TYPE_L); immediate.value.i64 = i; return immediate; } static INLINE GenRegister immdf(double df) { GenRegister immediate = imm(GEN_TYPE_DF_IMM); immediate.value.df = df; return immediate; } static INLINE GenRegister immf(float f) { GenRegister immediate = imm(GEN_TYPE_F); immediate.value.f = f; return immediate; } static INLINE GenRegister immd(int d) { GenRegister immediate = imm(GEN_TYPE_D); immediate.value.d = d; return immediate; } static INLINE GenRegister immud(uint32_t ud) { GenRegister immediate = imm(GEN_TYPE_UD); immediate.value.ud = ud; return immediate; } static INLINE GenRegister immuw(uint16_t uw) { GenRegister immediate = imm(GEN_TYPE_UW); immediate.value.ud = uw; return immediate; } static INLINE GenRegister immw(int16_t w) { GenRegister immediate = imm(GEN_TYPE_W); immediate.value.d = w; return immediate; } static INLINE GenRegister immh(uint16_t uw) { GenRegister immediate = imm(GEN_TYPE_HF_IMM); immediate.value.ud = uw; return immediate; } static INLINE GenRegister immv(uint32_t v) { GenRegister immediate = imm(GEN_TYPE_V); immediate.vstride = GEN_VERTICAL_STRIDE_0; immediate.width = GEN_WIDTH_8; immediate.hstride = GEN_HORIZONTAL_STRIDE_1; immediate.value.ud = v; return immediate; } static INLINE GenRegister immvf(uint32_t v) { GenRegister immediate = imm(GEN_TYPE_VF); immediate.vstride = GEN_VERTICAL_STRIDE_0; immediate.width = GEN_WIDTH_4; immediate.hstride = GEN_HORIZONTAL_STRIDE_1; immediate.value.ud = v; return immediate; } static INLINE GenRegister immvf4(uint32_t v0, uint32_t v1, uint32_t v2, uint32_t v3) { GenRegister immediate = imm(GEN_TYPE_VF); immediate.vstride = GEN_VERTICAL_STRIDE_0; immediate.width = GEN_WIDTH_4; immediate.hstride = GEN_HORIZONTAL_STRIDE_1; immediate.value.ud = ((v0 << 0) | (v1 << 8) | (v2 << 16) | (v3 << 24)); return immediate; } static INLINE GenRegister f1grf(ir::Register reg) { return vec1(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister f2grf(ir::Register reg) { return vec2(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister f4grf(ir::Register reg) { return vec4(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister f8grf(ir::Register reg) { return vec8(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister f16grf(ir::Register reg) { return vec16(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister df1grf(ir::Register reg) { return df1(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister df8grf(ir::Register reg) { return df8(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister df16grf(ir::Register reg) { return df16(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister ul16grf(ir::Register reg) { return ul16(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister ul8grf(ir::Register reg) { return ul8(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister ul1grf(ir::Register reg) { return ul1(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister ud16grf(ir::Register reg) { return ud16(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister ud8grf(ir::Register reg) { return ud8(GEN_GENERAL_REGISTER_FILE, reg); } static 
INLINE GenRegister ud1grf(ir::Register reg) { return ud1(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister uw1grf(ir::Register reg) { return uw1(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister uw8grf(ir::Register reg) { return uw8(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister uw16grf(ir::Register reg) { return uw16(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister ub1grf(ir::Register reg) { return ub1(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister ub8grf(ir::Register reg) { return ub8(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister ub16grf(ir::Register reg) { return ub16(GEN_GENERAL_REGISTER_FILE, reg); } static INLINE GenRegister null(void) { return GenRegister(GEN_ARCHITECTURE_REGISTER_FILE, GEN_ARF_NULL, 0, GEN_TYPE_F, GEN_VERTICAL_STRIDE_8, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister nullud(void) { return GenRegister(GEN_ARCHITECTURE_REGISTER_FILE, GEN_ARF_NULL, 0, GEN_TYPE_UD, GEN_VERTICAL_STRIDE_8, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_1); } static INLINE bool isNull(GenRegister reg) { return (reg.file == GEN_ARCHITECTURE_REGISTER_FILE && reg.nr == GEN_ARF_NULL); } static INLINE GenRegister vec1(GenRegister reg) { reg.width = GEN_WIDTH_1; reg.hstride = GEN_HORIZONTAL_STRIDE_0; reg.vstride = GEN_VERTICAL_STRIDE_0; return reg; } static INLINE GenRegister tm0(void) { return GenRegister(GEN_ARCHITECTURE_REGISTER_FILE, 0xc0, 0, GEN_TYPE_UW, GEN_VERTICAL_STRIDE_4, GEN_WIDTH_4, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister acc(void) { return GenRegister(GEN_ARCHITECTURE_REGISTER_FILE, GEN_ARF_ACCUMULATOR, 0, GEN_TYPE_F, GEN_VERTICAL_STRIDE_8, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister ip(void) { return GenRegister(GEN_ARCHITECTURE_REGISTER_FILE, GEN_ARF_IP, 0, GEN_TYPE_D, GEN_VERTICAL_STRIDE_4, GEN_WIDTH_1, GEN_HORIZONTAL_STRIDE_0); } static INLINE GenRegister sr(uint32_t nr, uint32_t subnr = 0) { return GenRegister(GEN_ARCHITECTURE_REGISTER_FILE, GEN_ARF_STATE | nr, subnr, GEN_TYPE_UD, GEN_VERTICAL_STRIDE_8, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister notification0(uint32_t subnr) { return GenRegister(GEN_ARCHITECTURE_REGISTER_FILE, GEN_ARF_NOTIFICATION_COUNT, subnr, GEN_TYPE_UD, GEN_VERTICAL_STRIDE_0, GEN_WIDTH_1, GEN_HORIZONTAL_STRIDE_0); } static INLINE GenRegister flag(uint32_t nr, uint32_t subnr) { return GenRegister(GEN_ARCHITECTURE_REGISTER_FILE, GEN_ARF_FLAG | nr, subnr, GEN_TYPE_UW, GEN_VERTICAL_STRIDE_0, GEN_WIDTH_1, GEN_HORIZONTAL_STRIDE_0); } static INLINE GenRegister next(GenRegister reg) { if (reg.physical) reg.nr++; else reg.quarter++; return reg; } /*! Build an indirectly addressed source */ static INLINE GenRegister indirect(uint32_t type, uint32_t subnr, uint32_t width, uint32_t vstride, uint32_t hstride) { GenRegister reg; reg.type = type; reg.file = GEN_GENERAL_REGISTER_FILE; reg.address_mode = GEN_ADDRESS_REGISTER_INDIRECT_REGISTER; reg.width = width; reg.a0_subnr = subnr; reg.nr = 0; reg.addr_imm = 0; reg.negation = 0; reg.absolute = 0; reg.vstride = vstride; reg.hstride = hstride; return reg; } /*! 
convert one register to indirectly mode */ static INLINE GenRegister to_indirect1xN(GenRegister reg, uint32_t base_addr, int32_t imm_off = 4096, int a0_subnr = 0) { GenRegister r = reg; int32_t offset; if (imm_off > 4095) { offset = (r.nr*32 + r.subnr) - base_addr; } else { offset = imm_off; } GBE_ASSERT(offset <= 511 && offset>=-512); r.a0_subnr = a0_subnr; r.addr_imm = offset; r.address_mode = GEN_ADDRESS_REGISTER_INDIRECT_REGISTER; r.width = GEN_WIDTH_1; r.vstride = GEN_VERTICAL_STRIDE_ONE_DIMENSIONAL; r.hstride = GEN_HORIZONTAL_STRIDE_0; return r; } static INLINE GenRegister vec16(uint32_t file, uint32_t nr, uint32_t subnr) { return GenRegister(file, nr, subnr, GEN_TYPE_F, GEN_VERTICAL_STRIDE_8, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister vec8(uint32_t file, uint32_t nr, uint32_t subnr) { return GenRegister(file, nr, subnr, GEN_TYPE_F, GEN_VERTICAL_STRIDE_8, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister vec4(uint32_t file, uint32_t nr, uint32_t subnr) { return GenRegister(file, nr, subnr, GEN_TYPE_F, GEN_VERTICAL_STRIDE_4, GEN_WIDTH_4, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister vec2(uint32_t file, uint32_t nr, uint32_t subnr) { return GenRegister(file, nr, subnr, GEN_TYPE_F, GEN_VERTICAL_STRIDE_2, GEN_WIDTH_2, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister vec1(uint32_t file, uint32_t nr, uint32_t subnr) { return GenRegister(file, nr, subnr, GEN_TYPE_F, GEN_VERTICAL_STRIDE_0, GEN_WIDTH_1, GEN_HORIZONTAL_STRIDE_0); } static INLINE uint32_t hstrideFromSize(int size) { switch (size) { case 0: return GEN_HORIZONTAL_STRIDE_0; case 1: return GEN_HORIZONTAL_STRIDE_1; case 2: return GEN_HORIZONTAL_STRIDE_2; case 4: return GEN_HORIZONTAL_STRIDE_4; default: NOT_IMPLEMENTED; return GEN_HORIZONTAL_STRIDE_0; } } static INLINE int hstride_size(GenRegister reg) { switch (reg.hstride) { case GEN_HORIZONTAL_STRIDE_0: return 0; case GEN_HORIZONTAL_STRIDE_1: return 1; case GEN_HORIZONTAL_STRIDE_2: return 2; case GEN_HORIZONTAL_STRIDE_4: return 4; default: NOT_IMPLEMENTED; return 0; } } static INLINE int vstride_size(GenRegister reg) { switch (reg.vstride) { case GEN_VERTICAL_STRIDE_0: return 0; case GEN_VERTICAL_STRIDE_1: return 1; case GEN_VERTICAL_STRIDE_2: return 2; case GEN_VERTICAL_STRIDE_4: return 4; case GEN_VERTICAL_STRIDE_8: return 8; case GEN_VERTICAL_STRIDE_16: return 16; case GEN_VERTICAL_STRIDE_32: return 32; case GEN_VERTICAL_STRIDE_64: return 64; case GEN_VERTICAL_STRIDE_128: return 128; case GEN_VERTICAL_STRIDE_256: return 256; default: NOT_IMPLEMENTED; return 0; } } static INLINE int width_size(GenRegister reg) { switch (reg.width) { case GEN_WIDTH_1: return 1; case GEN_WIDTH_2: return 2; case GEN_WIDTH_4: return 4; case GEN_WIDTH_8: return 8; case GEN_WIDTH_16: return 16; case GEN_WIDTH_32: return 32; default: NOT_IMPLEMENTED; return 0; } } static INLINE GenRegister suboffset(GenRegister reg, uint32_t delta) { if (reg.hstride != GEN_HORIZONTAL_STRIDE_0) { reg.subnr += delta * typeSize(reg.type) * hstride_size(reg); reg.nr += reg.subnr / 32; reg.subnr %= 32; } return reg; } static INLINE GenRegister subphysicaloffset(GenRegister reg, uint32_t delta) { if (reg.hstride != GEN_HORIZONTAL_STRIDE_0) { reg.subnr += delta * typeSize(reg.type) * hstride_size(reg); reg.subphysical = 1; } return reg; } static INLINE GenRegister df16(uint32_t file, uint32_t nr, uint32_t subnr) { return retype(vec16(file, nr, subnr), GEN_TYPE_DF); } static INLINE GenRegister df8(uint32_t file, uint32_t nr, uint32_t subnr) { return retype(vec8(file, nr, 
subnr), GEN_TYPE_DF); } static INLINE GenRegister df1(uint32_t file, uint32_t nr, uint32_t subnr) { return retype(vec1(file, nr, subnr), GEN_TYPE_DF); } static INLINE GenRegister ul16(uint32_t file, uint32_t nr, uint32_t subnr) { return retype(vec4(file, nr, subnr), GEN_TYPE_UL); } static INLINE GenRegister ul8(uint32_t file, uint32_t nr, uint32_t subnr) { return retype(vec4(file, nr, subnr), GEN_TYPE_UL); } static INLINE GenRegister ul1(uint32_t file, uint32_t nr, uint32_t subnr) { return retype(vec1(file, nr, subnr), GEN_TYPE_UL); } static INLINE GenRegister ud16(uint32_t file, uint32_t nr, uint32_t subnr) { return retype(vec16(file, nr, subnr), GEN_TYPE_UD); } static INLINE GenRegister ud8(uint32_t file, uint32_t nr, uint32_t subnr) { return retype(vec8(file, nr, subnr), GEN_TYPE_UD); } static INLINE GenRegister ud1(uint32_t file, uint32_t nr, uint32_t subnr) { return retype(vec1(file, nr, subnr), GEN_TYPE_UD); } static INLINE GenRegister d8(uint32_t file, uint32_t nr, uint32_t subnr) { return retype(vec8(file, nr, subnr), GEN_TYPE_D); } static INLINE GenRegister uw16(uint32_t file, uint32_t nr, uint32_t subnr) { return suboffset(retype(vec16(file, nr, 0), GEN_TYPE_UW), subnr); } static INLINE GenRegister uw8(uint32_t file, uint32_t nr, uint32_t subnr) { return suboffset(retype(vec8(file, nr, 0), GEN_TYPE_UW), subnr); } static INLINE GenRegister uw1(uint32_t file, uint32_t nr, uint32_t subnr) { return GenRegister(file, nr, subnr, GEN_TYPE_UW, GEN_VERTICAL_STRIDE_0, GEN_WIDTH_1, GEN_HORIZONTAL_STRIDE_0); } static INLINE GenRegister ub16(uint32_t file, uint32_t nr, uint32_t subnr) { return GenRegister(file, nr, subnr, GEN_TYPE_UB, GEN_VERTICAL_STRIDE_16, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_2); } static INLINE GenRegister ub8(uint32_t file, uint32_t nr, uint32_t subnr) { return GenRegister(file, nr, subnr, GEN_TYPE_UB, GEN_VERTICAL_STRIDE_16, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_2); } static INLINE GenRegister ub1(uint32_t file, uint32_t nr, uint32_t subnr) { return GenRegister(file, nr, subnr, GEN_TYPE_UB, GEN_VERTICAL_STRIDE_0, GEN_WIDTH_1, GEN_HORIZONTAL_STRIDE_0); } static INLINE GenRegister f1grf(uint32_t nr, uint32_t subnr) { return vec1(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister f2grf(uint32_t nr, uint32_t subnr) { return vec2(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister f4grf(uint32_t nr, uint32_t subnr) { return vec4(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister f8grf(uint32_t nr, uint32_t subnr) { return vec8(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister f16grf(uint32_t nr, uint32_t subnr) { return vec16(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister df16grf(uint32_t nr, uint32_t subnr) { return df16(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister df8grf(uint32_t nr, uint32_t subnr) { return df8(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister df1grf(uint32_t nr, uint32_t subnr) { return df1(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister ul16grf(uint32_t nr, uint32_t subnr) { return ul16(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister ul8grf(uint32_t nr, uint32_t subnr) { return ul8(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister ul1grf(uint32_t nr, uint32_t subnr) { return ul1(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister ud16grf(uint32_t nr, uint32_t subnr) { return ud16(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister ud8grf(uint32_t nr, 
uint32_t subnr) { return ud8(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister ud1grf(uint32_t nr, uint32_t subnr) { return ud1(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister ud1arf(uint32_t nr, uint32_t subnr) { return ud1(GEN_ARCHITECTURE_REGISTER_FILE, nr, subnr); } static INLINE GenRegister uw1grf(uint32_t nr, uint32_t subnr) { return uw1(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister uw8grf(uint32_t nr, uint32_t subnr) { return uw8(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister uw16grf(uint32_t nr, uint32_t subnr) { return uw16(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister ub1grf(uint32_t nr, uint32_t subnr) { return ub1(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister ub8grf(uint32_t nr, uint32_t subnr) { return ub8(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister ub16grf(uint32_t nr, uint32_t subnr) { return ub16(GEN_GENERAL_REGISTER_FILE, nr, subnr); } static INLINE GenRegister unpacked_uw(uint32_t nr, uint32_t subnr) { return GenRegister(GEN_GENERAL_REGISTER_FILE, nr, subnr, GEN_TYPE_UW, GEN_VERTICAL_STRIDE_16, GEN_WIDTH_8, GEN_HORIZONTAL_STRIDE_2); } static INLINE GenRegister unpacked_uw(const GenRegister& reg) { uint32_t nr = reg.nr; uint32_t subnr = reg.subnr / typeSize(GEN_TYPE_UW); uint32_t width = reg.width; int hstrideSize = GenRegister::hstride_size(reg) * typeSize(reg.type) / typeSize(GEN_TYPE_UW); uint32_t hstride = GenRegister::hstrideFromSize(hstrideSize); return GenRegister(GEN_GENERAL_REGISTER_FILE, nr, subnr, GEN_TYPE_UW, GEN_VERTICAL_STRIDE_16, width, hstride); } static INLINE GenRegister packed_ud(uint32_t nr, uint32_t subnr) { return GenRegister(GEN_GENERAL_REGISTER_FILE, nr, subnr, GEN_TYPE_UD, GEN_VERTICAL_STRIDE_8, GEN_WIDTH_4, GEN_HORIZONTAL_STRIDE_1); } static INLINE GenRegister unpacked_ud(uint32_t nr, uint32_t subnr) { return GenRegister(GEN_GENERAL_REGISTER_FILE, nr, subnr, GEN_TYPE_UD, GEN_VERTICAL_STRIDE_8, GEN_WIDTH_4, GEN_HORIZONTAL_STRIDE_2); } static INLINE GenRegister mask(uint32_t subnr) { return uw1(GEN_ARCHITECTURE_REGISTER_FILE, GEN_ARF_MASK, subnr); } static INLINE GenRegister addr1(uint32_t subnr) { return uw1(GEN_ARCHITECTURE_REGISTER_FILE, GEN_ARF_ADDRESS, subnr); } static INLINE GenRegister addr8(uint32_t subnr) { return uw8(GEN_ARCHITECTURE_REGISTER_FILE, GEN_ARF_ADDRESS, subnr); } static INLINE GenRegister negate(GenRegister reg) { if (reg.file != GEN_IMMEDIATE_VALUE) reg.negation ^= 1; else { if (reg.type == GEN_TYPE_F) reg.value.f = -reg.value.f; else if (reg.type == GEN_TYPE_UD) reg.value.ud = -reg.value.ud; else if (reg.type == GEN_TYPE_D) reg.value.d = -reg.value.d; else if (reg.type == GEN_TYPE_UW) { const uint16_t uw = reg.value.ud & 0xffff; reg = GenRegister::immuw(-uw); } else if (reg.type == GEN_TYPE_W) { const uint16_t uw = reg.value.ud & 0xffff; reg = GenRegister::immw(-(int16_t)uw); } else if (reg.type == GEN_TYPE_HF_IMM) { const uint16_t uw = reg.value.ud & 0xffff; reg = GenRegister::immh(uw ^ 0x8000); } else if (reg.type == GEN_TYPE_DF_IMM) { reg.value.df = -reg.value.df; } else NOT_SUPPORTED; } return reg; } static INLINE GenRegister abs(GenRegister reg) { reg.absolute = 1; reg.negation = 0; return reg; } static INLINE void propagateRegister(GenRegister& dst, const GenRegister& src) { dst.type = src.type; dst.file = src.file; dst.physical = src.physical; dst.subphysical = src.subphysical; dst.value.reg = src.value.reg; dst.vstride = src.vstride; dst.width = src.width; dst.hstride = 
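/* note: just below, the source's negation/absolute modifiers merge into the
   destination's (XOR / OR) rather than overwrite them */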
src.hstride; dst.quarter = src.quarter; dst.nr = src.nr; dst.subnr = src.subnr; dst.address_mode = src.address_mode; dst.a0_subnr = src.a0_subnr; dst.addr_imm = src.addr_imm; dst.negation = dst.negation ^ src.negation; dst.absolute = dst.absolute | src.absolute; }
/*! Generate register encoding with run-time simdWidth */
#define DECL_REG_ENCODER(NAME, SIMD16, SIMD8, SIMD1) \
template <typename... Args> \
static INLINE GenRegister NAME(uint32_t simdWidth, Args... values) { \
if (simdWidth == 16) \
return SIMD16(values...); \
else if (simdWidth == 8) \
return SIMD8(values...); \
else if (simdWidth == 1) \
return SIMD1(values...); \
else { \
NOT_IMPLEMENTED; \
return SIMD1(values...); \
} \
}
// TODO: Should add native long type here.
DECL_REG_ENCODER(dfxgrf, df16grf, df8grf, df1grf); DECL_REG_ENCODER(fxgrf, f16grf, f8grf, f1grf); DECL_REG_ENCODER(uwxgrf, uw16grf, uw8grf, uw1grf); DECL_REG_ENCODER(udxgrf, ud16grf, ud8grf, ud1grf);
#undef DECL_REG_ENCODER
}; } /* namespace gbe */
#endif /* __GEN_REGISTER_HPP__ */
Beignet-1.3.2-Source/backend/src/backend/gen_insn_scheduling.cpp000664 001750 001750 00000077546 13161142102 023750 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia <benjamin.segovia@intel.com> */ /** * \file gen_insn_scheduling.cpp * \author Benjamin Segovia */
/*
 * Overall idea:
 * =============
 *
 * This is the instruction scheduling part of the code. With Gen, we actually
 * have a simple strategy to follow. Indeed, here are the constraints:
 *
 * 1 - the number of registers per HW thread is constant and given (128 32-byte
 * GRFs per thread). So, we can use all these registers with no penalty
 * 2 - spilling is super bad. Instruction latency matters, but the top priority
 * is to avoid as much spilling as possible
 *
 * We schedule twice, each time with a local forward list scheduler
 *
 * Before the register allocation
 * ==============================
 *
 * We try to limit the register pressure.
 *
 * Finding an instruction scheduling policy that achieves the theoretical
 * minimum number of registers required in a basic block is an NP problem. We
 * have to use a heuristic to simplify the algorithm. Much research indicates
 * that bottom-up list scheduling is much better than the top-down method in
 * terms of register pressure. We follow one such research paper:
 *
 * "Register-Sensitive Selection, Duplication, and Sequencing of Instructions"
 *
 * It uses bottom-up list scheduling with a Sethi-Ullman label as the
 * heuristic number. As we do cycle-aware scheduling after the register
 * allocation, we do not need to bother with a cycle-related heuristic here,
 * so the EST computation and its uses are skipped in our version of the
 * algorithm.
 *
 * It turns out this algorithm works well. It reduced the register spilling
 * in clBlas's sgemmBlock kernel from 83+ to only 20.
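 *
 * To make the Sethi-Ullman label concrete, the short sketch that follows
 * shows the classic labeling on a binary expression tree. It is an
 * illustration for this comment only: ExprNode and suLabel are invented
 * for the example and are not used by the scheduler below.
 */
#include <cstdint>
namespace su_sketch {
  // A node of a binary expression tree; leaves are operands already in registers.
  struct ExprNode {
    ExprNode *left = nullptr, *right = nullptr;
    uint32_t label = 0; // registers needed to evaluate this subtree
  };
  // Bottom-up labeling: evaluating the higher-labeled child first lets its
  // scratch registers be reused for the sibling, so a node needs max(l, r)
  // registers when the children's labels differ and l + 1 when they tie.
  static inline uint32_t suLabel(ExprNode *n) {
    if (n == nullptr) return 0;
    if (n->left == nullptr && n->right == nullptr) return n->label = 1;
    const uint32_t l = suLabel(n->left);
    const uint32_t r = suLabel(n->right);
    return n->label = (l == r) ? l + 1 : (l > r ? l : r);
  }
} // namespace su_sketch
// The pre-alloc scheduler below applies the same idea to the selection DAG
// rather than to a tree, using the label as its priority heuristic.
/*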
 * Although this scheduling method seems to lower the ILP (instruction-level
 * parallelism), that is not a big issue: we allocate as many different
 * registers as possible in the following register allocation stage, and we
 * then run an after-allocation instruction scheduling pass which tries to
 * recover as much ILP as possible.
 *
 * FIXME: we only need to do this scheduling when a BB is really under high
 * register pressure.
 *
 * After the register allocation
 * ==============================
 *
 * This is a pretty simple strategy based on a regular forward list
 * scheduling. Since Gen is a co-issue based machine, it is useless to take
 * really precise timings into account, since instruction issue happens
 * out-of-order depending on what the other threads execute.
 *
 * Note that we over-simplify the problem. Indeed, the Gen register file is
 * flexible and we are able to use sub-registers of a GRF, in particular when
 * we handle uniforms or mask registers which are spilled in GRFs. The thing
 * is that two uniforms may not interfere even if they belong to the same GRF
 * (i.e. they use two different sub-registers). This means that the
 * interference relation is not transitive for Gen. To simplify everything, we
 * just consider full GRFs (in SIMD8) or double full GRFs (in SIMD16)
 * regardless of whether this is a uniform, a mask or a regular GRF.
 *
 * Obviously, this leads to extra dependencies in the code.
 */
#include "backend/gen_insn_selection.hpp"
#include "backend/gen_reg_allocation.hpp"
#include "sys/cvar.hpp"
#include "sys/intrusive_list.hpp"
namespace gbe {
// Helper structure to schedule the basic blocks
struct SelectionScheduler;
// Node for the schedule DAG
struct ScheduleDAGNode;
typedef enum { WRITE_AFTER_WRITE, WRITE_AFTER_READ, READ_AFTER_WRITE, READ_AFTER_WRITE_MEMORY } DepMode;
/*! We need to chain together the nodes we point to */
struct ScheduleListNode : public intrusive_list_node { INLINE ScheduleListNode(ScheduleDAGNode *node, DepMode m = READ_AFTER_WRITE) : node(node), depMode(m) {} ScheduleDAGNode *node; DepMode depMode; };
/*! Node of the DAG */
struct ScheduleDAGNode { INLINE ScheduleDAGNode(SelectionInstruction &insn) : insn(insn), refNum(0), depNum(0), retiredCycle(0), preRetired(false), readDistance(0x7fffffff) {} bool dependsOn(ScheduleDAGNode *node) const { GBE_ASSERT(node != NULL); for (auto child : node->children) if (child.node == this) return true; return false; } /*! Children that depend on us */ intrusive_list<ScheduleListNode> children; /*! Instruction after code selection */ SelectionInstruction &insn; /*! Number of nodes that point to us (i.e. nodes we depend on) */ uint32_t refNum; /*! Number of nodes that depend on us. */ uint32_t depNum; /*! Register pressure. */ uint32_t regNum; /*! Cycle when the instruction is retired */ uint32_t retiredCycle; bool preRetired; uint32_t readDistance; };
/*! To track loads and stores */
enum GenMemory : uint8_t { GLOBAL_MEMORY = 0, LOCAL_MEMORY, SCRATCH_MEMORY, MAX_MEM_SYSTEM };
/*! Do we schedule before or after the register allocation? */
enum SchedulePolicy {
  PRE_ALLOC = 0, // LIFO scheduling (tends to limit register pressure)
  POST_ALLOC     // FIFO scheduling (limits latency problems)
};
/*! Helper structure to handle dependencies while scheduling. Takes into
 * account virtual and physical registers and memory sub-systems */
struct DependencyTracker : public NonCopyable { DependencyTracker(const Selection &selection, SelectionScheduler &scheduler); /*! Reset it before scheduling a new block */ void clear(bool fullClear = false); /*!
Get an index in the node array for the given register */ uint32_t getIndex(GenRegister reg) const; /*! Get an index in the node array for the given memory system */ uint32_t getMemoryIndex() const; /*! Add a new dependency "node0 depends on node1" */ void addDependency(ScheduleDAGNode *node0, ScheduleDAGNode *node1, DepMode m); /*! Add a new dependency "node0 depends on node located at index" */ void addDependency(ScheduleDAGNode *node0, uint32_t index, DepMode m); /*! Add a new dependency "node located at index depends on node0" */ void addDependency(uint32_t index, ScheduleDAGNode *node0, DepMode m); /*! No dependency for null registers and immediates */ INLINE bool ignoreDependency(GenRegister reg) const { if (reg.file == GEN_IMMEDIATE_VALUE) return true; else if (reg.file == GEN_ARCHITECTURE_REGISTER_FILE) { if ((reg.nr & 0xf0) == GEN_ARF_NULL) return true; } return false; } /*! The scheduler that owns this tracker */ SelectionScheduler &scheduler; /*! Add a new dependency "node0 depends on node set for register reg" */ void addDependency(ScheduleDAGNode *node0, GenRegister reg, DepMode m); /*! Add a new dependency "node set for register reg depends on node0" */ void addDependency(GenRegister reg, ScheduleDAGNode *node0, DepMode m); /*! Make the node located at insnID a barrier */ void makeBarrier(int32_t insnID, int32_t insnNum); /*! Update all the writes (memory, predicates, registers) */ void updateWrites(ScheduleDAGNode *node); /*! Maximum number of *physical* flag registers */ static const uint32_t MAX_FLAG_REGISTER = 8u; /*! Maximum number of *physical* accumulator registers */ static const uint32_t MAX_ACC_REGISTER = 1u; /*! Maximum number of *physical* tm registers */ static const uint32_t MAX_TM_REGISTER = 1u; /*! Maximum number of state registers */ static const uint32_t MAX_ST_REGISTER = 2u; /*! Maximum number of *physical* arf registers */ static const uint32_t MAX_ARF_REGISTER = MAX_FLAG_REGISTER + MAX_ACC_REGISTER + MAX_TM_REGISTER + MAX_ST_REGISTER; /*! Stores the last node that wrote to a register / memory ... */ vector<ScheduleDAGNode *> nodes; /*! Stores the nodes each node depends on */ map<ScheduleDAGNode *, vector<ScheduleDAGNode *>> deps; /*! Stores the nodes per instruction */ vector<ScheduleDAGNode *> insnNodes; /*! Number of virtual registers in the selection */ uint32_t grfNum; };
/*! Perform the instruction scheduling */
struct SelectionScheduler : public NonCopyable { /*! Init the book-keeping structures */ SelectionScheduler(GenContext &ctx, Selection &selection, SchedulePolicy policy); /*! Make all lists empty */ void clearLists(void); /*! Return the number of instructions to schedule in the DAG */ int32_t buildDAG(SelectionBlock &bb); /*! Traverse a read node and update the read distance for all its children. */ void traverseReadNode(ScheduleDAGNode *node, uint32_t degree = 0); /*! Schedule the DAG, pre register allocation and post register allocation. */ void preScheduleDAG(SelectionBlock &bb, int32_t insnNum); void postScheduleDAG(SelectionBlock &bb, int32_t insnNum); void computeRegPressure(ScheduleDAGNode *node, map<ScheduleDAGNode *, uint32_t> &regPressureMap); /*! To limit register pressure or limit insn latency problems */ SchedulePolicy policy; /*! Make ScheduleListNode allocation faster */ DECL_POOL(ScheduleListNode, listPool); /*! Make ScheduleDAGNode allocation faster */ DECL_POOL(ScheduleDAGNode, nodePool); /*! The ready list holds instructions that can be scheduled */ intrusive_list<ScheduleListNode> ready; /*! The active list holds instructions that are executing */ intrusive_list<ScheduleListNode> active; /*! Handle complete compilation */ GenContext &ctx; /*! Code to schedule */ Selection &selection; /*!
To help tracking dependencies */ DependencyTracker tracker; };
DependencyTracker::DependencyTracker(const Selection &selection, SelectionScheduler &scheduler) : scheduler(scheduler) { if (scheduler.policy == PRE_ALLOC) { this->grfNum = selection.getRegNum(); nodes.resize(grfNum + MAX_ARF_REGISTER + MAX_MEM_SYSTEM); } else { const uint32_t simdWidth = scheduler.ctx.getSimdWidth(); GBE_ASSERT(simdWidth == 8 || simdWidth == 16); this->grfNum = simdWidth == 8 ? 128 : 64; nodes.resize(grfNum + MAX_ARF_REGISTER + MAX_MEM_SYSTEM); } insnNodes.resize(selection.getLargestBlockSize()); }
void DependencyTracker::clear(bool fullClear) { for (auto &x : nodes) x = NULL; if (fullClear) deps.clear(); }
void DependencyTracker::addDependency(ScheduleDAGNode *node0, GenRegister reg, DepMode m) { if (this->ignoreDependency(reg) == false) { const uint32_t index = this->getIndex(reg); this->addDependency(node0, index, m); if (scheduler.policy == POST_ALLOC && (reg.isdf() || reg.isint64() || reg.is_unpacked_long())) this->addDependency(node0, index + 1, m); } }
void DependencyTracker::addDependency(GenRegister reg, ScheduleDAGNode *node0, DepMode m) { if (this->ignoreDependency(reg) == false) { const uint32_t index = this->getIndex(reg); this->addDependency(index, node0, m); if (scheduler.policy == POST_ALLOC && (reg.isdf() || reg.isint64() || reg.is_unpacked_long())) this->addDependency(index + 1, node0, m); } }
void DependencyTracker::addDependency(ScheduleDAGNode *node0, ScheduleDAGNode *node1, DepMode depMode) { if (node0 != NULL && node1 != NULL && node0 != node1 && node0->dependsOn(node1) == false) { if (node1->insn.isRead()) depMode = depMode == READ_AFTER_WRITE ? READ_AFTER_WRITE_MEMORY : depMode; ScheduleListNode *dep = scheduler.newScheduleListNode(node0, depMode); node0->refNum++; node1->children.push_back(dep); node1->depNum++; auto it = deps.find(node0); if (it != deps.end()) { it->second.push_back(node1); } else { vector<ScheduleDAGNode *> vn; vn.push_back(node1); deps.insert(std::make_pair(node0, vn)); } } }
void DependencyTracker::addDependency(ScheduleDAGNode *node, uint32_t index, DepMode m) { this->addDependency(node, this->nodes[index], m); }
void DependencyTracker::addDependency(uint32_t index, ScheduleDAGNode *node, DepMode m) { this->addDependency(this->nodes[index], node, m); }
void DependencyTracker::makeBarrier(int32_t barrierID, int32_t insnNum) { ScheduleDAGNode *barrier = this->insnNodes[barrierID];
// The barrier depends on all nodes before it
for (int32_t insnID = 0; insnID < barrierID; ++insnID) this->addDependency(barrier, this->insnNodes[insnID], WRITE_AFTER_WRITE);
// All nodes after the barrier depend on the barrier
for (int32_t insnID = barrierID + 1; insnID < insnNum; ++insnID) this->addDependency(this->insnNodes[insnID], barrier, WRITE_AFTER_WRITE); }
static GenRegister getFlag(const SelectionInstruction &insn) { if (insn.state.physicalFlag) { const uint32_t nr = insn.state.flag; const uint32_t subnr = insn.state.subFlag; return GenRegister::flag(nr, subnr); } else return GenRegister::uw1grf(ir::Register(insn.state.flagIndex)); }
uint32_t DependencyTracker::getIndex(GenRegister reg) const {
// Non GRF physical register
if (reg.physical) {
//GBE_ASSERT (reg.file == GEN_ARCHITECTURE_REGISTER_FILE);
if (reg.file == GEN_ARCHITECTURE_REGISTER_FILE) { const uint32_t file = reg.nr & 0xf0; const uint32_t nr = reg.nr & 0x0f; if (file == GEN_ARF_FLAG) { const uint32_t subnr = reg.subnr / sizeof(uint16_t); GBE_ASSERT(nr < MAX_FLAG_REGISTER && (subnr == 0 || subnr == 1)); return grfNum + 2*nr + subnr; } else
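/* the remaining ARF slots follow the flag slots: accumulator, then timestamp, then state registers */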
if (file == GEN_ARF_ACCUMULATOR) { GBE_ASSERT(nr < MAX_ACC_REGISTER); return grfNum + MAX_FLAG_REGISTER + nr; } else if (file == GEN_ARF_TM) { return grfNum + MAX_FLAG_REGISTER + MAX_ACC_REGISTER; } else if (file == GEN_ARF_STATE) { GBE_ASSERT(nr < MAX_ST_REGISTER); return grfNum + MAX_FLAG_REGISTER + MAX_ACC_REGISTER + MAX_TM_REGISTER + nr; } else { NOT_SUPPORTED; return 0; } } else { const uint32_t simdWidth = scheduler.ctx.getSimdWidth(); return simdWidth == 8 ? reg.nr : reg.nr / 2; } }
// We directly manipulate physical GRFs here
else if (scheduler.policy == POST_ALLOC) { const GenRegister physical = scheduler.ctx.ra->genReg(reg); const uint32_t simdWidth = scheduler.ctx.getSimdWidth(); return simdWidth == 8 ? physical.nr : physical.nr / 2; }
// We use virtual registers since allocation is not done yet
else return reg.value.reg; }
uint32_t DependencyTracker::getMemoryIndex() const { const uint32_t memDelta = grfNum + MAX_ARF_REGISTER; return memDelta; }
void DependencyTracker::updateWrites(ScheduleDAGNode *node) { const SelectionInstruction &insn = node->insn;
// Track writes in registers
for (uint32_t dstID = 0; dstID < insn.dstNum; ++dstID) { const GenRegister dst = insn.dst(dstID); if (this->ignoreDependency(dst) == false) { const uint32_t index = this->getIndex(dst); this->nodes[index] = node; if (scheduler.policy == POST_ALLOC && (dst.isdf() || dst.isint64() || dst.is_unpacked_long())) this->nodes[index + 1] = node; } }
// Track writes in predicates
if (insn.opcode == SEL_OP_CMP || insn.opcode == SEL_OP_I64CMP || insn.state.modFlag) { const uint32_t index = this->getIndex(getFlag(insn)); this->nodes[index] = node; }
// Track writes in accumulators
if (insn.modAcc()) { const uint32_t index = this->getIndex(GenRegister::acc()); this->nodes[index] = node; }
// Track writes in memory
if (insn.isWrite()) { const uint32_t index = this->getMemoryIndex(); this->nodes[index] = node; }
// Track writes in scratch memory
if (insn.opcode == SEL_OP_SPILL_REG) { const uint32_t index = this->getMemoryIndex(); this->nodes[index] = node; }
// Consider that barriers and waits write to memory
if (insn.opcode == SEL_OP_BARRIER || insn.opcode == SEL_OP_FENCE || insn.opcode == SEL_OP_WAIT) { const uint32_t memIndex = this->getMemoryIndex(); this->nodes[memIndex] = node; } }
/*! Kind-of roughly estimated latency. Nothing real here */
static uint32_t getLatencyGen7(const SelectionInstruction &insn) {
#define DECL_GEN7_SCHEDULE(FAMILY, LATENCY, SIMD16, SIMD8)\
const uint32_t FAMILY##InstructionLatency = LATENCY;
#include "gen_insn_gen7_schedule_info.hxx"
#undef DECL_GEN7_SCHEDULE
switch (insn.opcode) {
#define DECL_SELECTION_IR(OP, FAMILY) case SEL_OP_##OP: return FAMILY##Latency;
#include "backend/gen_insn_selection.hxx"
#undef DECL_SELECTION_IR
}; return 0; }
/*! Throughput in cycles for SIMD8 or SIMD16 */
static uint32_t getThroughputGen7(const SelectionInstruction &insn, bool isSIMD8) {
#define DECL_GEN7_SCHEDULE(FAMILY, LATENCY, SIMD16, SIMD8)\
const uint32_t FAMILY##InstructionThroughput = isSIMD8 ? \
SIMD8 : SIMD16; #include "gen_insn_gen7_schedule_info.hxx" #undef DECL_GEN7_SCHEDULE switch (insn.opcode) { #define DECL_SELECTION_IR(OP, FAMILY) case SEL_OP_##OP: return FAMILY##Throughput; #include "backend/gen_insn_selection.hxx" #undef DECL_SELECTION_IR }; return 0; } SelectionScheduler::SelectionScheduler(GenContext &ctx, Selection &selection, SchedulePolicy policy) : policy(policy), listPool(nextHighestPowerOf2(selection.getLargestBlockSize())), ctx(ctx), selection(selection), tracker(selection, *this) { this->clearLists(); } void SelectionScheduler::clearLists(void) { this->ready.fast_clear(); this->active.fast_clear(); } void SelectionScheduler::traverseReadNode(ScheduleDAGNode *node, uint32_t degree) { GBE_ASSERT(degree != 0 || node->insn.isRead()); if (node->readDistance != 0x7FFFFFFF) return; node->readDistance = degree; if (degree > 5) return; //printf("node id %d op %d degree %d \n", node->insn.ID, node->insn.opcode, degree); auto it = tracker.deps.find(node); if (it != tracker.deps.end()) { for (auto &depNode : it->second) { if (depNode && !depNode->insn.isRead()) traverseReadNode(depNode, degree + 1); } } } int32_t SelectionScheduler::buildDAG(SelectionBlock &bb) { nodePool.rewind(); listPool.rewind(); tracker.clear(true); this->clearLists(); // Track write-after-write and read-after-write dependencies int32_t insnNum = 0; for (auto &insn : bb.insnList) { // Create a new node for this instruction ScheduleDAGNode *node = this->newScheduleDAGNode(insn); tracker.insnNodes[insnNum++] = node; // read-after-write in registers for (uint32_t srcID = 0; srcID < insn.srcNum; ++srcID) tracker.addDependency(node, insn.src(srcID), READ_AFTER_WRITE); // read-after-write for predicate if (insn.state.predicate != GEN_PREDICATE_NONE) tracker.addDependency(node, getFlag(insn), READ_AFTER_WRITE); // read-after-write in memory if (insn.isRead()) { const uint32_t index = tracker.getMemoryIndex(); tracker.addDependency(node, index, READ_AFTER_WRITE); } //read-after-write of scratch memory if (insn.opcode == SEL_OP_UNSPILL_REG) { const uint32_t index = tracker.getMemoryIndex(); tracker.addDependency(node, index, READ_AFTER_WRITE); } // Consider barriers and wait are reading memory (local and global) if (insn.opcode == SEL_OP_BARRIER || insn.opcode == SEL_OP_FENCE || insn.opcode == SEL_OP_WAIT || insn.opcode == SEL_OP_WORKGROUP_OP) { const uint32_t memIndex = tracker.getMemoryIndex(); tracker.addDependency(node, memIndex, READ_AFTER_WRITE); } // write-after-write in registers for (uint32_t dstID = 0; dstID < insn.dstNum; ++dstID) tracker.addDependency(node, insn.dst(dstID), WRITE_AFTER_WRITE); // write-after-write for predicate if (insn.opcode == SEL_OP_CMP || insn.opcode == SEL_OP_I64CMP || insn.state.modFlag) tracker.addDependency(node, getFlag(insn), WRITE_AFTER_WRITE); // write-after-write for accumulators if (insn.modAcc()) tracker.addDependency(node, GenRegister::acc(), WRITE_AFTER_WRITE); // write-after-write in memory if (insn.isWrite()) { const uint32_t index = tracker.getMemoryIndex(); tracker.addDependency(node, index, WRITE_AFTER_WRITE); } // write-after-write in scratch memory if (insn.opcode == SEL_OP_SPILL_REG) { const uint32_t index = tracker.getMemoryIndex(); tracker.addDependency(node, index, WRITE_AFTER_WRITE); } // Track all writes done by the instruction tracker.updateWrites(node); } // Track write-after-read dependencies tracker.clear(); for (int32_t insnID = insnNum-1; insnID >= 0; --insnID) { ScheduleDAGNode *node = tracker.insnNodes[insnID]; const SelectionInstruction &insn 
= node->insn; // write-after-read in registers for (uint32_t srcID = 0; srcID < insn.srcNum; ++srcID) tracker.addDependency(insn.src(srcID), node, WRITE_AFTER_READ); // write-after-read for predicate if (insn.state.predicate != GEN_PREDICATE_NONE) tracker.addDependency(getFlag(insn), node, WRITE_AFTER_READ); // write-after-read in memory if (insn.isRead()) { const uint32_t index = tracker.getMemoryIndex(); tracker.addDependency(index, node, WRITE_AFTER_READ); } // write-after-read in scratch memory if (insn.opcode == SEL_OP_UNSPILL_REG) { const uint32_t index = tracker.getMemoryIndex(); tracker.addDependency(index, node, WRITE_AFTER_READ); } // Consider barriers and wait are reading memory (local and global) if (insn.opcode == SEL_OP_BARRIER || insn.opcode == SEL_OP_FENCE || insn.opcode == SEL_OP_WAIT || insn.opcode == SEL_OP_WORKGROUP_OP) { const uint32_t memIndex = tracker.getMemoryIndex(); tracker.addDependency(memIndex, node, WRITE_AFTER_READ); } // Track all writes done by the instruction tracker.updateWrites(node); } // Update distance to read for each read node. for (int32_t insnID = 0; insnID < insnNum; ++insnID) { ScheduleDAGNode *node = tracker.insnNodes[insnID]; const SelectionInstruction &insn = node->insn; if (insn.isRead()) traverseReadNode(node); } // Make labels and branches non-schedulable (i.e. they act as barriers) for (int32_t insnID = 0; insnID < insnNum; ++insnID) { ScheduleDAGNode *node = tracker.insnNodes[insnID]; if (node->insn.isBranch() || node->insn.isLabel() || node->insn.opcode == SEL_OP_EOT || node->insn.opcode == SEL_OP_IF || node->insn.opcode == SEL_OP_ELSE || node->insn.opcode == SEL_OP_ENDIF || node->insn.opcode == SEL_OP_WHILE || node->insn.opcode == SEL_OP_READ_ARF || node->insn.opcode == SEL_OP_BARRIER || node->insn.opcode == SEL_OP_CALC_TIMESTAMP || node->insn.opcode == SEL_OP_STORE_PROFILING || node->insn.opcode == SEL_OP_WAIT || node->insn.opcode == SEL_OP_WORKGROUP_OP) tracker.makeBarrier(insnID, insnNum); } // Build the initial ready list (should only be the label actually) for (int32_t insnID = 0; insnID < insnNum; ++insnID) { ScheduleDAGNode *node = tracker.insnNodes[insnID]; if (node->refNum == 0) { ScheduleListNode *listNode = this->newScheduleListNode(node); this->ready.push_back(listNode); } } return insnNum; } /*! Will sort child in register pressure in increasing order */ inline bool cmp(const ScheduleDAGNode *v0, const ScheduleDAGNode *v1) { return v0->regNum < v1->regNum; } /* Recursively compute heuristic Sethi-Ullman number for each node. 
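   * The heuristic follows the Sethi-Ullman argument: with the children sorted
   * by increasing pressure r_0 <= ... <= r_{n-1}, the siblings evaluated after
   * child i each keep one extra value live, so the node is estimated to need
   * max_i(r_i + n - i) registers, where n is the number of children.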
*/
  void SelectionScheduler::computeRegPressure(ScheduleDAGNode *node, map<ScheduleDAGNode *, int32_t> &regPressureMap) {
    if (regPressureMap.find(node) != regPressureMap.end()) {
      GBE_ASSERT(node->regNum == (uint32_t)regPressureMap.find(node)->second);
      return;
    }
    if (node->refNum == 0) {
      node->regNum = 0;
      regPressureMap.insert(std::make_pair(node, 0));
      return;
    }
    auto &children = tracker.deps.find(node)->second;
    for (auto child : children) {
      computeRegPressure(child, regPressureMap);
    }
    std::sort(children.begin(), children.end(), cmp);
    uint32_t maxRegNum = 0;
    int32_t i = 0;
    for (auto &child : children) {
      if (child->regNum + children.size() - i > maxRegNum)
        maxRegNum = child->regNum + children.size() - i;
      ++i;
    }
    node->regNum = maxRegNum;
    regPressureMap.insert(std::make_pair(node, maxRegNum));
    return;
  }

  void SelectionScheduler::preScheduleDAG(SelectionBlock &bb, int32_t insnNum) {
    set<ScheduleDAGNode *> rootNodes;
    for (int32_t i = 0; i < insnNum; i++) {
      ScheduleDAGNode *node = tracker.insnNodes[i];
      if (node->depNum == 0)
        rootNodes.insert(node);
    }

    map<ScheduleDAGNode *, int32_t> regPressureMap;
    map<ScheduleDAGNode *, int32_t> parentIndexMap;
    for (auto node : rootNodes) {
      computeRegPressure(node, regPressureMap);
      parentIndexMap.insert(std::make_pair(node, INT_MAX));
    }

    set<ScheduleDAGNode *> readySet(rootNodes.begin(), rootNodes.end());
    set<ScheduleDAGNode *> scheduledSet;
    int32_t j = insnNum;
    // Now, start the scheduling. Each iteration picks the node with the
    // lexicographically smallest pair (parentIndex[node], regPressure[node])
    // as the best node to schedule.
    while (readySet.size()) {
      ScheduleDAGNode *bestNode = NULL;
      int32_t minRegNum = INT_MAX;
      int32_t minParentIndex = INT_MAX;
      for (auto node : readySet) {
        GBE_ASSERT(scheduledSet.contains(node) == false);
        if (parentIndexMap.find(node)->second < minParentIndex) {
          bestNode = node;
          minParentIndex = parentIndexMap.find(node)->second;
          minRegNum = regPressureMap.find(node)->second;
        } else if (parentIndexMap.find(node)->second == minParentIndex) {
          if (regPressureMap.find(node)->second < minRegNum) {
            bestNode = node;
            minRegNum = regPressureMap.find(node)->second;
          }
        }
      }
      for (auto node : tracker.deps.find(bestNode)->second) {
        if (node == NULL)
          continue;
        node->depNum--;
        if (parentIndexMap.find(node) != parentIndexMap.end())
          parentIndexMap.find(node)->second = j;
        else
          parentIndexMap.insert(std::make_pair(node, j));
        if (node->depNum == 0 && scheduledSet.contains(node) == false)
          readySet.insert(node);
      }
      bb.prepend(&bestNode->insn);
      readySet.erase(bestNode);
      scheduledSet.insert(bestNode);
      --j;
    }
    GBE_ASSERT(insnNum == (int32_t)bb.insnList.size());
  }

  void SelectionScheduler::postScheduleDAG(SelectionBlock &bb, int32_t insnNum) {
    uint32_t cycle = 0;
    const bool isSIMD8 = this->ctx.getSimdWidth() == 8;
    vector<ScheduleDAGNode *> scheduledNodes;
    while (insnNum) {
      // Retire all the instructions that finished
      //printf("cycle = %d \n", cycle);
      for (auto toRetireIt = active.begin(); toRetireIt != active.end();) {
        ScheduleDAGNode *toRetireNode = toRetireIt.node()->node;
        // First, move all write-after-read children to the ready list.
        if (toRetireNode->preRetired == false) {
          auto &children = toRetireNode->children;
          toRetireNode->preRetired = true;
          //printf("id %d pre retired \n", toRetireNode->insn.ID);
          for (auto it = children.begin(); it != children.end();) {
            ScheduleListNode *listNode = it.node();
            if (listNode->depMode != WRITE_AFTER_READ) { ++it; continue; }
            if (--it->node->refNum == 0) {
              //printf("pre push id %d to ready list.
\n", listNode->node->insn.ID); it = children.erase(it); this->ready.push_back(listNode); } else ++it; } if (children.size() == 0) { toRetireIt = this->active.erase(toRetireIt); continue; } } // Instruction is now complete if (toRetireNode->retiredCycle <= cycle) { toRetireIt = this->active.erase(toRetireIt); //printf("id %d retired \n", toRetireNode->insn.ID); // Traverse all children and make them ready if no more dependency auto &children = toRetireNode->children; for (auto it = children.begin(); it != children.end();) { ScheduleListNode *listNode = it.node(); if (listNode->depMode == WRITE_AFTER_READ) { ++it; continue; } if (--it->node->refNum == 0) { it = children.erase(it); if (listNode->depMode != WRITE_AFTER_READ) this->ready.push_back(listNode); //printf("push id %d to ready list. \n", listNode->node->insn.ID); } else ++it; } } else ++toRetireIt; } // Try to schedule something from the ready list intrusive_list::iterator toSchedule; toSchedule = this->ready.begin(); float minCost = 1000; for(auto it = this->ready.begin(); it != this->ready.end(); ++it) { float cost = (it->depMode == WRITE_AFTER_READ) ? 0 : ((it->depMode == WRITE_AFTER_WRITE) ? 5 : 10) - 5.0 / (it->node->readDistance == 0 ? 0.1 : it->node->readDistance); if (cost < minCost) { toSchedule = it; minCost = cost; } } if (toSchedule != this->ready.end()) { //printf("get id %d op %d to schedule \n", toSchedule->node->insn.ID, toSchedule->node->insn.opcode); // The instruction is instantaneously issued to simulate zero cycle // scheduling cycle += getThroughputGen7(toSchedule->node->insn, isSIMD8); this->ready.erase(toSchedule); this->active.push_back(toSchedule.node()); // When we schedule before allocation, instruction is instantaneously // ready. This allows to have a real LIFO strategy toSchedule->node->retiredCycle = cycle + getLatencyGen7(toSchedule->node->insn); bb.append(&toSchedule->node->insn); scheduledNodes.push_back(toSchedule->node); insnNum--; } else cycle++; } } BVAR(OCL_POST_ALLOC_INSN_SCHEDULE, true); BVAR(OCL_PRE_ALLOC_INSN_SCHEDULE, false); void schedulePostRegAllocation(GenContext &ctx, Selection &selection) { if (OCL_POST_ALLOC_INSN_SCHEDULE) { SelectionScheduler scheduler(ctx, selection, POST_ALLOC); for (auto &bb : *selection.blockList) { const int32_t insnNum = scheduler.buildDAG(bb); bb.insnList.clear(); scheduler.postScheduleDAG(bb, insnNum); } } } void schedulePreRegAllocation(GenContext &ctx, Selection &selection) { if (OCL_PRE_ALLOC_INSN_SCHEDULE) { SelectionScheduler scheduler(ctx, selection, PRE_ALLOC); for (auto &bb : *selection.blockList) { const int32_t insnNum = scheduler.buildDAG(bb); bb.insnList.clear(); scheduler.preScheduleDAG(bb, insnNum); } } } } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/backend/gen_insn_scheduling.hpp000664 001750 001750 00000002627 13161142102 023741 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ /** * \file gen_insn_scheduling.hpp * \author Benjamin Segovia */ #ifndef __GBE_GEN_INSN_SCHEDULING_HPP__ #define __GBE_GEN_INSN_SCHEDULING_HPP__ namespace gbe { class Selection; // Pre ISA code class GenContext; // Handle compilation for Gen /*! Schedule the code per basic block (tends to limit register number) */ void schedulePreRegAllocation(GenContext &ctx, Selection &selection); /*! Schedule the code per basic block (tends to deal with insn latency) */ void schedulePostRegAllocation(GenContext &ctx, Selection &selection); } /* namespace gbe */ #endif /* __GBE_GEN_INSN_SCHEDULING_HPP__ */ Beignet-1.3.2-Source/backend/src/backend/gen7_encoder.cpp000664 001750 001750 00000031104 13173554000 022265 0ustar00yryr000000 000000 /* Copyright (C) Intel Corp. 2006. All Rights Reserved. Intel funded Tungsten Graphics (http://www.tungstengraphics.com) to develop this 3D driver. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
**********************************************************************/ #include "backend/gen7_encoder.hpp" namespace gbe { void Gen7Encoder::setHeader(GenNativeInstruction *insn) { Gen7NativeInstruction *gen7_insn = &insn->gen7_insn; if (this->curr.execWidth == 8) gen7_insn->header.execution_size = GEN_WIDTH_8; else if (this->curr.execWidth == 16) gen7_insn->header.execution_size = GEN_WIDTH_16; else if (this->curr.execWidth == 4) gen7_insn->header.execution_size = GEN_WIDTH_4; else if (this->curr.execWidth == 1) gen7_insn->header.execution_size = GEN_WIDTH_1; else NOT_IMPLEMENTED; gen7_insn->header.acc_wr_control = this->curr.accWrEnable; gen7_insn->header.quarter_control = this->curr.quarterControl; gen7_insn->bits1.ia1.nib_ctrl = this->curr.nibControl; gen7_insn->header.mask_control = this->curr.noMask; if (insn->header.opcode == GEN_OPCODE_MAD || insn->header.opcode == GEN_OPCODE_LRP) { gen7_insn->bits1.da3src.flag_reg_nr = this->curr.flag; gen7_insn->bits1.da3src.flag_sub_reg_nr = this->curr.subFlag; } else { gen7_insn->bits2.ia1.flag_reg_nr = this->curr.flag; gen7_insn->bits2.ia1.flag_sub_reg_nr = this->curr.subFlag; } if (this->curr.predicate != GEN_PREDICATE_NONE) { gen7_insn->header.predicate_control = this->curr.predicate; gen7_insn->header.predicate_inverse = this->curr.inversePredicate; } gen7_insn->header.saturate = this->curr.saturate; } void Gen7Encoder::setDst(GenNativeInstruction *insn, GenRegister dest) { Gen7NativeInstruction *gen7_insn = &insn->gen7_insn; if (dest.file != GEN_ARCHITECTURE_REGISTER_FILE) assert(dest.nr < 128); gen7_insn->bits1.da1.dest_reg_file = dest.file; gen7_insn->bits1.da1.dest_reg_type = dest.type; gen7_insn->bits1.da1.dest_address_mode = dest.address_mode; gen7_insn->bits1.da1.dest_reg_nr = dest.nr; gen7_insn->bits1.da1.dest_subreg_nr = dest.subnr; if (dest.hstride == GEN_HORIZONTAL_STRIDE_0) { if (dest.type == GEN_TYPE_UB || dest.type == GEN_TYPE_B) dest.hstride = GEN_HORIZONTAL_STRIDE_4; else if (dest.type == GEN_TYPE_UW || dest.type == GEN_TYPE_W) dest.hstride = GEN_HORIZONTAL_STRIDE_2; else dest.hstride = GEN_HORIZONTAL_STRIDE_1; } gen7_insn->bits1.da1.dest_horiz_stride = dest.hstride; } void Gen7Encoder::setSrc0(GenNativeInstruction *insn, GenRegister reg) { Gen7NativeInstruction *gen7_insn = &insn->gen7_insn; if (reg.file != GEN_ARCHITECTURE_REGISTER_FILE) assert(reg.nr < 128); if (reg.address_mode == GEN_ADDRESS_DIRECT) { gen7_insn->bits1.da1.src0_reg_file = reg.file; gen7_insn->bits1.da1.src0_reg_type = reg.type; gen7_insn->bits2.da1.src0_abs = reg.absolute; gen7_insn->bits2.da1.src0_negate = reg.negation; gen7_insn->bits2.da1.src0_address_mode = reg.address_mode; if (reg.file == GEN_IMMEDIATE_VALUE) { gen7_insn->bits3.ud = reg.value.ud; /* Required to set some fields in src1 as well: */ gen7_insn->bits1.da1.src1_reg_file = 0; /* arf */ gen7_insn->bits1.da1.src1_reg_type = reg.type; } else { if (gen7_insn->header.access_mode == GEN_ALIGN_1) { gen7_insn->bits2.da1.src0_subreg_nr = reg.subnr; gen7_insn->bits2.da1.src0_reg_nr = reg.nr; } else { gen7_insn->bits2.da16.src0_subreg_nr = reg.subnr / 16; gen7_insn->bits2.da16.src0_reg_nr = reg.nr; } if (reg.width == GEN_WIDTH_1 && gen7_insn->header.execution_size == GEN_WIDTH_1) { gen7_insn->bits2.da1.src0_horiz_stride = GEN_HORIZONTAL_STRIDE_0; gen7_insn->bits2.da1.src0_width = GEN_WIDTH_1; gen7_insn->bits2.da1.src0_vert_stride = GEN_VERTICAL_STRIDE_0; } else { gen7_insn->bits2.da1.src0_horiz_stride = reg.hstride; gen7_insn->bits2.da1.src0_width = reg.width; gen7_insn->bits2.da1.src0_vert_stride 
= reg.vstride;
        }
      }
    } else {
      gen7_insn->bits1.ia1.src0_reg_file = GEN_GENERAL_REGISTER_FILE;
      gen7_insn->bits1.ia1.src0_reg_type = reg.type;
      gen7_insn->bits2.ia1.src0_subreg_nr = reg.a0_subnr;
      gen7_insn->bits2.ia1.src0_indirect_offset = reg.addr_imm;
      gen7_insn->bits2.ia1.src0_abs = reg.absolute;
      gen7_insn->bits2.ia1.src0_negate = reg.negation;
      gen7_insn->bits2.ia1.src0_address_mode = reg.address_mode;
      gen7_insn->bits2.ia1.src0_horiz_stride = reg.hstride;
      gen7_insn->bits2.ia1.src0_width = reg.width;
      gen7_insn->bits2.ia1.src0_vert_stride = reg.vstride;
    }
  }

  void Gen7Encoder::setSrc1(GenNativeInstruction *insn, GenRegister reg) {
    Gen7NativeInstruction *gen7_insn = &insn->gen7_insn;
    assert(reg.nr < 128);
    gen7_insn->bits1.da1.src1_reg_file = reg.file;
    gen7_insn->bits1.da1.src1_reg_type = reg.type;
    gen7_insn->bits3.da1.src1_abs = reg.absolute;
    gen7_insn->bits3.da1.src1_negate = reg.negation;
    assert(gen7_insn->bits1.da1.src0_reg_file != GEN_IMMEDIATE_VALUE);
    if (reg.file == GEN_IMMEDIATE_VALUE)
      gen7_insn->bits3.ud = reg.value.ud;
    else {
      assert(reg.address_mode == GEN_ADDRESS_DIRECT);
      if (gen7_insn->header.access_mode == GEN_ALIGN_1) {
        gen7_insn->bits3.da1.src1_subreg_nr = reg.subnr;
        gen7_insn->bits3.da1.src1_reg_nr = reg.nr;
      } else {
        gen7_insn->bits3.da16.src1_subreg_nr = reg.subnr / 16;
        gen7_insn->bits3.da16.src1_reg_nr = reg.nr;
      }
      if (reg.width == GEN_WIDTH_1 && gen7_insn->header.execution_size == GEN_WIDTH_1) {
        gen7_insn->bits3.da1.src1_horiz_stride = GEN_HORIZONTAL_STRIDE_0;
        gen7_insn->bits3.da1.src1_width = GEN_WIDTH_1;
        gen7_insn->bits3.da1.src1_vert_stride = GEN_VERTICAL_STRIDE_0;
      } else {
        gen7_insn->bits3.da1.src1_horiz_stride = reg.hstride;
        gen7_insn->bits3.da1.src1_width = reg.width;
        gen7_insn->bits3.da1.src1_vert_stride = reg.vstride;
      }
    }
  }

#define NO_SWIZZLE ((0<<0) | (1<<2) | (2<<4) | (3<<6))

  void Gen7Encoder::alu3(uint32_t opcode, GenRegister dest,
                         GenRegister src0, GenRegister src1, GenRegister src2) {
    GenNativeInstruction *insn = this->next(opcode);
    Gen7NativeInstruction *gen7_insn = &insn->gen7_insn;
    int execution_size = 0;
    if (this->curr.execWidth == 1) {
      execution_size = GEN_WIDTH_1;
    } else if (this->curr.execWidth == 8) {
      execution_size = GEN_WIDTH_8;
    } else if (this->curr.execWidth == 16) {
      // Gen7 does not support SIMD16 alu3; emit two SIMD8 halves instead
      execution_size = GEN_WIDTH_8;
    } else
      NOT_IMPLEMENTED;

    assert(dest.file == GEN_GENERAL_REGISTER_FILE);
    assert(dest.nr < 128);
    assert(dest.address_mode == GEN_ADDRESS_DIRECT);
    assert(dest.type == GEN_TYPE_F);
    gen7_insn->bits1.da3src.dest_reg_file = 0;
    gen7_insn->bits1.da3src.dest_reg_nr = dest.nr;
    gen7_insn->bits1.da3src.dest_subreg_nr = dest.subnr / 4;
    gen7_insn->bits1.da3src.dest_writemask = 0xf;
    this->setHeader(insn);
    gen7_insn->header.access_mode = GEN_ALIGN_16;
    gen7_insn->header.execution_size = execution_size;

    assert(src0.file == GEN_GENERAL_REGISTER_FILE);
    assert(src0.address_mode == GEN_ADDRESS_DIRECT);
    assert(src0.nr < 128);
    assert(src0.type == GEN_TYPE_F);
    gen7_insn->bits2.da3src.src0_swizzle = NO_SWIZZLE;
    gen7_insn->bits2.da3src.src0_subreg_nr = src0.subnr / 4;
    gen7_insn->bits2.da3src.src0_reg_nr = src0.nr;
    gen7_insn->bits1.da3src.src0_abs = src0.absolute;
    gen7_insn->bits1.da3src.src0_negate = src0.negation;
    gen7_insn->bits2.da3src.src0_rep_ctrl = src0.vstride == GEN_VERTICAL_STRIDE_0;

    assert(src1.file == GEN_GENERAL_REGISTER_FILE);
    assert(src1.address_mode == GEN_ADDRESS_DIRECT);
    assert(src1.nr < 128);
    assert(src1.type == GEN_TYPE_F);
    gen7_insn->bits2.da3src.src1_swizzle = NO_SWIZZLE;
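    // The 3-source encoding has no single field wide enough for src1's
    // sub-register number, so it is split across the instruction words:
    // the low two bits go into bits2 and the remaining bits into bits3.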
gen7_insn->bits2.da3src.src1_subreg_nr_low = (src1.subnr / 4) & 0x3; gen7_insn->bits3.da3src.src1_subreg_nr_high = (src1.subnr / 4) >> 2; gen7_insn->bits2.da3src.src1_rep_ctrl = src1.vstride == GEN_VERTICAL_STRIDE_0; gen7_insn->bits3.da3src.src1_reg_nr = src1.nr; gen7_insn->bits1.da3src.src1_abs = src1.absolute; gen7_insn->bits1.da3src.src1_negate = src1.negation; assert(src2.file == GEN_GENERAL_REGISTER_FILE); assert(src2.address_mode == GEN_ADDRESS_DIRECT); assert(src2.nr < 128); assert(src2.type == GEN_TYPE_F); gen7_insn->bits3.da3src.src2_swizzle = NO_SWIZZLE; gen7_insn->bits3.da3src.src2_subreg_nr = src2.subnr / 4; gen7_insn->bits3.da3src.src2_rep_ctrl = src2.vstride == GEN_VERTICAL_STRIDE_0; gen7_insn->bits3.da3src.src2_reg_nr = src2.nr; gen7_insn->bits1.da3src.src2_abs = src2.absolute; gen7_insn->bits1.da3src.src2_negate = src2.negation; // Emit second half of the instruction if (this->curr.execWidth == 16) { GenNativeInstruction q1Insn = *insn; insn = this->next(opcode); *insn = q1Insn; gen7_insn = &insn->gen7_insn; gen7_insn->header.quarter_control = GEN_COMPRESSION_Q2; gen7_insn->bits1.da3src.dest_reg_nr++; if (gen7_insn->bits2.da3src.src0_rep_ctrl == 0) gen7_insn->bits2.da3src.src0_reg_nr++; if (gen7_insn->bits2.da3src.src1_rep_ctrl == 0) gen7_insn->bits3.da3src.src1_reg_nr++; if (gen7_insn->bits3.da3src.src2_rep_ctrl == 0) gen7_insn->bits3.da3src.src2_reg_nr++; } } static void setMBlockRWGEN7(GenEncoder *p, GenNativeInstruction *insn, uint32_t bti, uint32_t msg_type, uint32_t msg_length, uint32_t response_length) { const GenMessageTarget sfid = GEN_SFID_DATAPORT_RENDER; p->setMessageDescriptor(insn, sfid, msg_length, response_length); insn->bits3.gen7_mblock_rw.msg_type = msg_type; insn->bits3.gen7_mblock_rw.bti = bti; insn->bits3.gen7_mblock_rw.header_present = 1; } void Gen7Encoder::MBREAD(GenRegister dst, GenRegister header, uint32_t bti, uint32_t size) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); const uint32_t msg_length = 1; const uint32_t response_length = size; // Size of registers this->setHeader(insn); this->setDst(insn, GenRegister::ud8grf(dst.nr, 0)); this->setSrc0(insn, GenRegister::ud8grf(header.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); setMBlockRWGEN7(this, insn, bti, GEN75_P1_MEDIA_BREAD, msg_length, response_length); } void Gen7Encoder::MBWRITE(GenRegister header, GenRegister data, uint32_t bti, uint32_t size, bool useSends) { GenNativeInstruction *insn = this->next(GEN_OPCODE_SEND); const uint32_t msg_length = 1 + size; const uint32_t response_length = 0; // Size of registers this->setHeader(insn); this->setDst(insn, GenRegister::retype(GenRegister::null(), GEN_TYPE_UW)); this->setSrc0(insn, GenRegister::ud8grf(header.nr, 0)); this->setSrc1(insn, GenRegister::immud(0)); setMBlockRWGEN7(this, insn, bti, GEN75_P1_MEDIA_TYPED_BWRITE, msg_length, response_length); } #undef NO_SWIZZLE } Beignet-1.3.2-Source/backend/src/CMakeLists.txt000664 001750 001750 00000015366 13161142102 020400 0ustar00yryr000000 000000 set (OCL_BITCODE_BIN "${BEIGNET_INSTALL_DIR}/beignet.bc") set (OCL_HEADER_DIR "${BEIGNET_INSTALL_DIR}/include") set (OCL_PCH_OBJECT "${BEIGNET_INSTALL_DIR}/beignet.pch") set (GBE_OBJECT_DIR "${BEIGNET_INSTALL_DIR}/libgbe.so") set (INTERP_OBJECT_DIR "${BEIGNET_INSTALL_DIR}/libgbeinterp.so") if (ENABLE_OPENCL_20) set (OCL_BITCODE_BIN_20 "${BEIGNET_INSTALL_DIR}/beignet_20.bc") set (OCL_PCH_OBJECT_20 "${BEIGNET_INSTALL_DIR}/beignet_20.pch") endif (ENABLE_OPENCL_20) configure_file ( "GBEConfig.h.in" "GBEConfig.h" ) #do not involve libocl 
# if the standalone compiler is given
if (NOT (USE_STANDALONE_GBE_COMPILER STREQUAL "true"))
  add_subdirectory(libocl)
endif ()

set (LOCAL_GBE_OBJECT_DIR "${CMAKE_CURRENT_BINARY_DIR}/libgbe.so" PARENT_SCOPE)
set (LOCAL_INTERP_OBJECT_DIR "${CMAKE_CURRENT_BINARY_DIR}/libgbeinterp.so" PARENT_SCOPE)
set (LOCAL_OCL_BITCODE_BIN "${OCL_OBJECT_DIR}/beignet.bc" PARENT_SCOPE)
set (LOCAL_OCL_HEADER_DIR "${OCL_OBJECT_DIR}/include/" PARENT_SCOPE)
set (LOCAL_OCL_PCH_OBJECT "${OCL_OBJECT_DIR}/beignet.local.pch" PARENT_SCOPE)
if (ENABLE_OPENCL_20)
  set (LOCAL_OCL_BITCODE_BIN_20 "${OCL_OBJECT_DIR}/beignet_20.bc" PARENT_SCOPE)
  set (LOCAL_OCL_PCH_OBJECT_20 "${OCL_OBJECT_DIR}/beignet_20.local.pch" PARENT_SCOPE)
endif (ENABLE_OPENCL_20)

set (GBE_SRC
  ${ocl_blob_file}
  sys/vector.hpp sys/map.hpp sys/set.hpp
  sys/intrusive_list.hpp sys/intrusive_list.cpp
  sys/exception.hpp sys/assert.cpp sys/assert.hpp
  sys/alloc.cpp sys/alloc.hpp sys/mutex.cpp sys/mutex.hpp
  sys/platform.cpp sys/platform.hpp sys/cvar.cpp sys/cvar.hpp
  ir/context.cpp ir/context.hpp ir/profile.cpp ir/profile.hpp
  ir/type.cpp ir/type.hpp ir/unit.cpp ir/unit.hpp
  ir/constant.cpp ir/constant.hpp ir/sampler.cpp ir/sampler.hpp
  ir/image.cpp ir/image.hpp ir/half.cpp ir/half.hpp
  ir/instruction.cpp ir/instruction.hpp ir/liveness.cpp
  ir/register.cpp ir/register.hpp ir/function.cpp ir/function.hpp
  ir/value.cpp ir/value.hpp ir/lowering.cpp ir/lowering.hpp
  ir/profiling.cpp ir/profiling.hpp ir/printf.cpp ir/printf.hpp
  ir/immediate.hpp ir/immediate.cpp
  ir/structurizer.hpp ir/structurizer.cpp
  ir/reloc.hpp ir/reloc.cpp
  backend/context.cpp backend/context.hpp
  backend/program.cpp backend/program.hpp backend/program.h
  llvm/llvm_sampler_fix.cpp llvm/llvm_bitcode_link.cpp
  llvm/llvm_gen_backend.cpp llvm/llvm_passes.cpp
  llvm/llvm_scalarize.cpp llvm/llvm_intrinsic_lowering.cpp
  llvm/llvm_barrier_nodup.cpp llvm/llvm_printf_parser.cpp
  llvm/llvm_profiling.cpp llvm/ExpandConstantExpr.cpp
  llvm/ExpandUtils.cpp llvm/PromoteIntegers.cpp
  llvm/ExpandLargeIntegers.cpp llvm/llvm_device_enqueue.cpp
  llvm/StripAttributes.cpp llvm/llvm_to_gen.cpp
  llvm/llvm_loadstore_optimization.cpp llvm/llvm_gen_backend.hpp
  llvm/llvm_gen_ocl_function.hxx llvm/llvm_unroll.cpp llvm/llvm_to_gen.hpp
  backend/gen/gen_mesa_disasm.c
  backend/gen_insn_selection.cpp backend/gen_insn_selection.hpp
  backend/gen_insn_selection_optimize.cpp
  backend/gen_insn_scheduling.cpp backend/gen_insn_scheduling.hpp
  backend/gen_insn_selection_output.cpp backend/gen_insn_selection_output.hpp
  backend/gen_reg_allocation.cpp backend/gen_reg_allocation.hpp
  backend/gen_context.cpp backend/gen_context.hpp
  backend/gen75_context.hpp backend/gen75_context.cpp
  backend/gen8_context.hpp backend/gen8_context.cpp
  backend/gen9_context.hpp backend/gen9_context.cpp
  backend/gen_program.cpp backend/gen_program.hpp backend/gen_program.h
  backend/gen7_instruction.hpp backend/gen8_instruction.hpp
  backend/gen_defs.hpp backend/gen_insn_compact.cpp
  backend/gen_encoder.hpp backend/gen_encoder.cpp
  backend/gen7_encoder.hpp backend/gen7_encoder.cpp
  backend/gen75_encoder.hpp backend/gen75_encoder.cpp
  backend/gen8_encoder.hpp backend/gen8_encoder.cpp
  backend/gen9_encoder.hpp backend/gen9_encoder.cpp
  )

set (GBE_LINK_LIBRARIES
  ${DRM_INTEL_LIBRARIES} ${DRM_LIBRARIES}
  ${CLANG_LIBRARIES} ${LLVM_MODULE_LIBS} ${LLVM_SYSTEM_LIBS}
  ${CMAKE_THREAD_LIBS_INIT} ${CMAKE_DL_LIBS}
  )
include_directories (.)
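# LLVM/Clang and libdrm supply the compiler framework and the kernel-mode
# interface linked above; the next lines make their directories visible.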
link_directories (${LLVM_LIBRARY_DIRS} ${DRM_LIBDIR}) include_directories(${LLVM_INCLUDE_DIRS}) #do not build libgbe.so if the standalone compiler is given if (NOT (USE_STANDALONE_GBE_COMPILER STREQUAL "true")) add_library (gbe SHARED ${GBE_SRC}) target_link_libraries(gbe ${GBE_LINK_LIBRARIES}) add_dependencies(gbe beignet_bitcode) endif (NOT (USE_STANDALONE_GBE_COMPILER STREQUAL "true")) add_library(gbeinterp SHARED gbe_bin_interpreter.cpp) if (LLVM_VERSION_NODOT VERSION_EQUAL 34) find_library(TERMINFO NAMES tinfo ncurses) if (${TERMINFO} STREQUAL TERMINFO-NOTFOUND) message(FATAL_ERROR "no libtinfo or libncurses is found in system") else (${TERMINFO} STREQUAL TERMINFO-NOTFOUND) target_link_libraries(gbe ${TERMINFO}) message(STATUS "use ${TERMINFO} as terminal control library") endif (${TERMINFO} STREQUAL TERMINFO-NOTFOUND) endif(LLVM_VERSION_NODOT VERSION_EQUAL 34) link_directories (${LLVM_LIBRARY_DIR} ${DRM_LIBDIR}) #do not build nor install if the standalone compiler is given if (NOT (USE_STANDALONE_GBE_COMPILER STREQUAL "true")) if (BUILD_STANDALONE_GBE_COMPILER STREQUAL "true") macro(remove_cxx_flag flag) string(REPLACE "${flag}" "" CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}") endmacro() remove_cxx_flag("-Wl,-E") ADD_EXECUTABLE(gbe_bin_generater gbe_bin_generater.cpp ${GBE_SRC}) set_target_properties(gbe_bin_generater PROPERTIES LINK_FLAGS "-static") TARGET_LINK_LIBRARIES(gbe_bin_generater ${GBE_LINK_LIBRARIES}) ADD_CUSTOM_TARGET(gbecompiler.tgz ALL COMMAND tar zcf ${OCL_OBJECT_DIR}/gbecompiler.tgz gbe_bin_generater -C ${OCL_OBJECT_DIR} beignet.bc -C ${OCL_OBJECT_DIR} beignet.pch -C ${OCL_OBJECT_DIR} include DEPENDS gbe_bin_generater beignet_bitcode ) else () ADD_EXECUTABLE(gbe_bin_generater gbe_bin_generater.cpp) TARGET_LINK_LIBRARIES(gbe_bin_generater gbe) endif () install (TARGETS gbe LIBRARY DESTINATION ${BEIGNET_INSTALL_DIR}) install (FILES ${OCL_OBJECT_DIR}/beignet.bc DESTINATION ${BEIGNET_INSTALL_DIR}) install (FILES ${OCL_OBJECT_DIR}/beignet.pch DESTINATION ${BEIGNET_INSTALL_DIR}) if (ENABLE_OPENCL_20) install (FILES ${OCL_OBJECT_DIR}/beignet_20.bc DESTINATION ${BEIGNET_INSTALL_DIR}) install (FILES ${OCL_OBJECT_DIR}/beignet_20.pch DESTINATION ${BEIGNET_INSTALL_DIR}) endif (ENABLE_OPENCL_20) install (FILES ${OCL_HEADER_FILES} DESTINATION ${BEIGNET_INSTALL_DIR}/include) endif (NOT (USE_STANDALONE_GBE_COMPILER STREQUAL "true")) install (TARGETS gbeinterp LIBRARY DESTINATION ${BEIGNET_INSTALL_DIR}) Beignet-1.3.2-Source/backend/src/.gitignore000664 001750 001750 00000000157 13161142102 017620 0ustar00yryr000000 000000 GBEConfig.h libgbe.so ocl_common_defines_str.cpp ocl_stdlib.h ocl_stdlib.h.pch ocl_stdlib_str.cpp ocl_vector.h Beignet-1.3.2-Source/backend/src/ir/000775 001750 001750 00000000000 13174334761 016261 5ustar00yryr000000 000000 Beignet-1.3.2-Source/backend/src/ir/function.cpp000664 001750 001750 00000035214 13161142102 020575 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. 
* * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file function.cpp * \author Benjamin Segovia */ #include "ir/function.hpp" #include "ir/unit.hpp" #include "sys/map.hpp" namespace gbe { namespace ir { /////////////////////////////////////////////////////////////////////////// // PushLocation /////////////////////////////////////////////////////////////////////////// Register PushLocation::getRegister(void) const { const Function::LocationMap &locationMap = fn.getLocationMap(); GBE_ASSERT(locationMap.contains(*this) == true); return locationMap.find(*this)->second; } /////////////////////////////////////////////////////////////////////////// // Function /////////////////////////////////////////////////////////////////////////// Function::Function(const std::string &name, const Unit &unit, Profile profile) : name(name), unit(unit), profile(profile), simdWidth(0), useSLM(false), slmSize(0), stackSize(0), wgBroadcastSLM(-1), tidMapSLM(-1), useDeviceEnqueue(false) { initProfile(*this); samplerSet = GBE_NEW(SamplerSet); imageSet = GBE_NEW(ImageSet); printfSet = GBE_NEW(PrintfSet); } Function::~Function(void) { for (auto block : blocks) GBE_DELETE(block); for (auto loop : loops) GBE_DELETE(loop); for (auto arg : args) GBE_DELETE(arg); } RegisterFamily Function::getPointerFamily(void) const { return unit.getPointerFamily(); } uint32_t Function::getOclVersion(void) const { return unit.getOclVersion(); } void Function::addLoop(LabelIndex preheader, int parent, const vector &bbs, const vector> &exits) { loops.push_back(GBE_NEW(Loop, preheader, parent, bbs, exits)); } int Function::getLoopDepth(LabelIndex Block) const{ if (loops.size() == 0) return 0; int LoopIndex = -1; int LoopDepth = 0; // get innermost loop for (int Idx = loops.size()-1; Idx >= 0; Idx--) { Loop *lp = loops[Idx]; vector &Blocks = lp->bbs; bool Found = (std::find(Blocks.begin(), Blocks.end(), Block) != Blocks.end()); if (Found) { LoopIndex = Idx; break; } } if (LoopIndex != -1) { int LoopId = LoopIndex; do { LoopId = loops[LoopId]->parent; LoopDepth++; } while(LoopId != -1); } return LoopDepth; } void Function::checkEmptyLabels(void) { // Empty label map, we map the removed label to the next label. map labelMap; map revLabelMap; foreachBlock([&](BasicBlock &BB) { Instruction * insn = BB.getLastInstruction(); if (insn->getOpcode() == OP_LABEL) { GBE_ASSERTM(0, "Found empty block. 
"); } }); } void Function::sortLabels(void) { uint32_t last = 0; // Compute the new labels and patch the label instruction map labelMap; foreachInstruction([&](Instruction &insn) { if (insn.getOpcode() != OP_LABEL) return; // Create the new label const Instruction newLabel = LABEL(LabelIndex(last)); // Replace the previous label instruction LabelInstruction &label = cast(insn); const LabelIndex index = label.getLabelIndex(); labelMap.insert(std::make_pair(index, LabelIndex(last++))); newLabel.replace(&insn); }); // Patch all branch instructions with the new labels foreachInstruction([&](Instruction &insn) { if (insn.getOpcode() != OP_BRA) return; // Get the current branch instruction BranchInstruction &bra = cast(insn); const LabelIndex index = bra.getLabelIndex(); const LabelIndex newIndex = labelMap.find(index)->second; // Insert the patched branch instruction if (bra.isPredicated() == true) { const Instruction newBra = BRA(newIndex, bra.getPredicateIndex()); newBra.replace(&insn); } else { const Instruction newBra = BRA(newIndex); newBra.replace(&insn); } }); // fix labels for loops for (auto &x : loops) { for (auto &y : x->bbs) y = labelMap[y]; x->preheader = labelMap[x->preheader]; for (auto &z : x->exits) { z.first = labelMap[z.first]; z.second = labelMap[z.second]; } } // Reset the label to block mapping //this->labels.resize(last); foreachBlock([&](BasicBlock &bb) { const Instruction *first = bb.getFirstInstruction(); const LabelInstruction *label = cast(first); const LabelIndex index = label->getLabelIndex(); this->labels[index] = &bb; }); } LabelIndex Function::newLabel(void) { GBE_ASSERTM(labels.size() < 0xffffffffull, "Too many labels are defined (4G only are supported)"); const LabelIndex index(labels.size()); labels.push_back(NULL); return index; } void Function::outImmediate(std::ostream &out, ImmediateIndex index) const { GBE_ASSERT(index < immediates.size()); const Immediate imm = immediates[index]; switch (imm.getType()) { case TYPE_BOOL: out << !!imm.getIntegerValue(); break; case TYPE_S8: case TYPE_U8: case TYPE_S16: case TYPE_U16: case TYPE_S32: case TYPE_U32: case TYPE_S64: out << imm.getIntegerValue(); break; case TYPE_U64: out << (uint64_t)imm.getIntegerValue(); break; case TYPE_HALF: out << "half(" << (float)imm.getHalfValue() << ")"; break; case TYPE_FLOAT: out << imm.getFloatValue(); break; case TYPE_DOUBLE: out << imm.getDoubleValue(); break; default: GBE_ASSERT(0 && "unsupported imm type.\n"); } } uint32_t Function::getLargestBlockSize(void) const { uint32_t insnNum = 0; foreachBlock([&insnNum](const ir::BasicBlock &bb) { insnNum = std::max(insnNum, uint32_t(bb.size())); }); return insnNum; } uint32_t Function::getFirstSpecialReg(void) const { return this->profile == PROFILE_OCL ? 0u : ~0u; } uint32_t Function::getSpecialRegNum(void) const { return this->profile == PROFILE_OCL ? 
ocl::regNum : ~0u;
  }

  bool Function::isEntryBlock(const BasicBlock &bb) const {
    if (this->blockNum() == 0)
      return false;
    else
      return &bb == this->blocks[0];
  }

  BasicBlock &Function::getTopBlock(void) const {
    GBE_ASSERT(blockNum() > 0 && blocks[0] != NULL);
    return *blocks[0];
  }

  const BasicBlock &Function::getBottomBlock(void) const {
    const uint32_t n = blockNum();
    GBE_ASSERT(n > 0 && blocks[n-1] != NULL);
    return *blocks[n-1];
  }

  BasicBlock &Function::getBottomBlock(void) {
    const uint32_t n = blockNum();
    GBE_ASSERT(n > 0 && blocks[n-1] != NULL);
    return *blocks[n-1];
  }

  BasicBlock &Function::getBlock(LabelIndex label) const {
    GBE_ASSERT(label < labelNum() && labels[label] != NULL);
    return *labels[label];
  }

  const LabelInstruction *Function::getLabelInstruction(LabelIndex index) const {
    const BasicBlock *bb = this->labels[index];
    const Instruction *first = bb->getFirstInstruction();
    return cast<LabelInstruction>(first);
  }

  /*! Indicate if the given register is a special one (like localID in OCL) */
  bool Function::isSpecialReg(const Register &reg) const {
    const uint32_t ID = uint32_t(reg);
    const uint32_t firstID = this->getFirstSpecialReg();
    const uint32_t specialNum = this->getSpecialRegNum();
    return ID >= firstID && ID < firstID + specialNum;
  }

  Register Function::getSurfaceBaseReg(uint8_t bti) const {
    map<uint8_t, Register>::const_iterator iter = btiRegMap.find(bti);
    GBE_ASSERT(iter != btiRegMap.end());
    return iter->second;
  }

  void Function::appendSurface(uint8_t bti, Register reg) {
    btiRegMap.insert(std::make_pair(bti, reg));
  }

  void Function::computeCFG(void) {
    // Clear possible previously computed CFG and compute the direct
    // predecessors and successors
    BasicBlock *prev = NULL;
    this->foreachBlock([this, &prev](BasicBlock &bb) {
      bb.successors.clear();
      bb.predecessors.clear();
      if (prev != NULL) {
        prev->nextBlock = &bb;
        bb.prevBlock = prev;
      }
      prev = &bb;
    });
    // Update it.
Do not forget that a branch can also jump to the next block BasicBlock *jumpToNext = NULL; this->foreachBlock([this, &jumpToNext](BasicBlock &bb) { if (jumpToNext) { jumpToNext->successors.insert(&bb); bb.predecessors.insert(jumpToNext); jumpToNext = NULL; } if (bb.size() == 0) return; Instruction *last = bb.getLastInstruction(); if (last->isMemberOf() == false || last->getOpcode() == OP_ENDIF || last->getOpcode() == OP_ELSE) { jumpToNext = &bb; return; } ir::BasicBlock::iterator it = --bb.end(); uint32_t handledInsns = 0; while ((handledInsns < 2 && it != bb.end()) && static_cast(&*it)->getOpcode() == OP_BRA) { const BranchInstruction &insn = cast(*it); if (insn.getOpcode() != OP_BRA) break; const LabelIndex label = insn.getLabelIndex(); BasicBlock *target = this->blocks[label]; GBE_ASSERT(target != NULL); target->predecessors.insert(&bb); bb.successors.insert(target); if (insn.isPredicated() == true) jumpToNext = &bb; // If we are going to handle the second bra, this bra must be a predicated bra GBE_ASSERT(handledInsns == 0 || insn.isPredicated() == true); --it; ++handledInsns; } }); } void Function::outputCFG(void) { std::string fileName = getName() + std::string(".dot"); ::FILE *fp = fopen(fileName.c_str(), "w"); if (fp == NULL) return; printf("writing Gen IR CFG to %s\n", fileName.c_str()); fprintf(fp, "digraph \"%s\" {\n", getName().c_str()); this->foreachBlock([this, fp](BasicBlock &bb) { uint32_t lid = bb.getLabelIndex(); fprintf(fp, "Node%d [shape=record, label=\"{%d}\"];\n", lid, lid); set &succ = bb.successors; for (auto x : succ) { uint32_t next = x->getLabelIndex(); fprintf(fp, "Node%d -> Node%d\n", lid, next); } }); fprintf(fp, "}\n"); fclose(fp); } std::ostream &operator<< (std::ostream &out, const Function &fn) { out << ".decl_function " << fn.getName() << std::endl; out << fn.getRegisterFile(); out << "## " << fn.argNum() << " input register" << (fn.argNum() ? "s" : "") << " ##" << std::endl; for (uint32_t i = 0; i < fn.argNum(); ++i) { const FunctionArgument &input = fn.getArg(i); out << "decl_input."; switch (input.type) { case FunctionArgument::GLOBAL_POINTER: out << "global"; break; case FunctionArgument::LOCAL_POINTER: out << "local"; break; case FunctionArgument::CONSTANT_POINTER: out << "constant"; break; case FunctionArgument::VALUE: out << "value"; break; case FunctionArgument::STRUCTURE: out << "structure." << input.size; break; case FunctionArgument::IMAGE: out << "image"; break; case FunctionArgument::PIPE: out << "pipe"; break; default: break; } out << " %" << input.reg << " " << input.name << std::endl; } out << "## " << fn.outputNum() << " output register" << (fn.outputNum() ? "s" : "") << " ##" << std::endl; for (uint32_t i = 0; i < fn.outputNum(); ++i) out << "decl_output %" << fn.getOutput(i) << std::endl; out << "## " << fn.pushedNum() << " pushed register" << std::endl; const Function::PushMap &pushMap = fn.getPushMap(); for (const auto &pushed : pushMap) { out << "decl_pushed %" << pushed.first << " @{" << pushed.second.argID << "," << pushed.second.offset << "}" << std::endl; } out << "## " << fn.blockNum() << " block" << (fn.blockNum() ? 
"s" : "") << " ##" << std::endl; fn.foreachBlock([&](const BasicBlock &bb) { const_cast(bb).foreach([&out] (const Instruction &insn) { out << insn << std::endl; }); out << std::endl; }); out << ".end_function" << std::endl; return out; } /////////////////////////////////////////////////////////////////////////// // Basic Block /////////////////////////////////////////////////////////////////////////// BasicBlock::BasicBlock(Function &fn) : needEndif(true), needIf(true), endifLabel(0), matchingEndifLabel(0), matchingElseLabel(0), thisElseLabel(0), belongToStructure(false), isStructureExit(false), isLoopExit(false), hasExtraBra(false), matchingStructureEntry(NULL), fn(fn) { this->nextBlock = this->prevBlock = NULL; } BasicBlock::~BasicBlock(void) { this->foreach([this] (Instruction &insn) { this->fn.deleteInstruction(&insn); }); } void BasicBlock::append(Instruction &insn) { insn.setParent(this); this->push_back(&insn); } void BasicBlock::insertAt(iterator pos, Instruction &insn) { insn.setParent(this); this->insert(pos, &insn); } Instruction *BasicBlock::getFirstInstruction(void) const { GBE_ASSERT(this->begin() != this->end()); const Instruction &insn = *this->begin(); return const_cast(&insn); } Instruction *BasicBlock::getLastInstruction(void) const { GBE_ASSERT(this->begin() != this->end()); const Instruction &insn = *(--this->end()); return const_cast(&insn); } LabelIndex BasicBlock::getLabelIndex(void) const { const Instruction *first = this->getFirstInstruction(); const LabelInstruction *label = cast(first); return label?label->getLabelIndex():LabelIndex(-1); } } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/structurizer.cpp000664 001750 001750 00000074715 13161142102 021546 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "structurizer.hpp" #include "sys/cvar.hpp" using namespace llvm; namespace gbe { namespace ir { CFGStructurizer::~CFGStructurizer() { BlockVector::iterator iter = blocks.begin(); BlockVector::iterator iter_end = blocks.end(); while(iter != iter_end) { delete *iter; iter++; } } void CFGStructurizer::handleSelfLoopBlock(Block *loopblock, LabelIndex& whileLabel) { //BlockList::iterator child_iter = (*it)->children.begin(); BasicBlock *pbb = loopblock->getExit(); GBE_ASSERT(pbb->isLoopExit); BasicBlock::iterator it = pbb->end(); it--; if (pbb->hasExtraBra) it--; BranchInstruction* pinsn = static_cast(&*it); if(!pinsn->isPredicated()){ std::cout << "WARNING:" << "endless loop detected!" 
<< std::endl;
      return;
    }
    Register reg = pinsn->getPredicateIndex();
    /* since this block is a while block, we remove the BRA instruction at the
     * bottom of the exit BB of 'block' and insert a WHILE instead */
    whileLabel = pinsn->getLabelIndex();
    Instruction insn = WHILE(whileLabel, reg);
    Instruction* p_new_insn = pbb->getParent().newInstruction(insn);
    pbb->insertAt(it, *p_new_insn);
    pbb->whileLabel = whileLabel;
    it->remove();
  }

  /* recursively mark the BBs' needIf flag */
  void CFGStructurizer::markNeedIf(Block *block, bool status) {
    if(block->type() == SingleBlockType) {
      BasicBlock* bb = ((SimpleBlock*)block)->getBasicBlock();
      bb->needIf = status;
      return;
    }
    BlockList::iterator it = block->children.begin();
    while(it != block->children.end()) {
      markNeedIf(*it, status);
      it++;
    }
  }

  /* recursively mark the BBs' needEndif flag */
  void CFGStructurizer::markNeedEndif(Block *block, bool status) {
    if(block->type() == SingleBlockType) {
      BasicBlock* bb = ((SimpleBlock*)block)->getBasicBlock();
      bb->needEndif = status;
      return;
    }
    BlockList::iterator it = block->children.begin();
    while(it != block->children.end()) {
      markNeedEndif(*it, status);
      it++;
    }
  }

  /* recursively mark the blocks' mark flag */
  void CFGStructurizer::markStructuredBlocks(Block *block, bool status) {
    if(block->type() == SingleBlockType) {
      SimpleBlock* pbb = static_cast<SimpleBlock*>(block);
      pbb->getBasicBlock()->belongToStructure = true;
    }
    block->mark = status;
    BlockList::iterator it = block->children.begin();
    while(it != block->children.end()) {
      markStructuredBlocks(*it, status);
      it++;
    }
  }

  void CFGStructurizer::handleIfBlock(Block *block, LabelIndex& matchingEndifLabel, LabelIndex& matchingElseLabel) {
    BasicBlock *pbb = block->getExit();
    BranchInstruction* pinsn = static_cast<BranchInstruction*>(pbb->getLastInstruction());
    Register reg = pinsn->getPredicateIndex();
    BasicBlock::iterator it = pbb->end();
    it--;
    /* since this block is an if block, we remove the BRA instruction at the
     * bottom of the exit BB of 'block' and insert an IF instead */
    it->remove();
    Instruction insn = IF(matchingElseLabel, reg, block->inversePredicate);
    Instruction* p_new_insn = pbb->getParent().newInstruction(insn);
    pbb->append(*p_new_insn);
    pbb->matchingEndifLabel = matchingEndifLabel;
    pbb->matchingElseLabel = matchingElseLabel;
  }

  void CFGStructurizer::handleThenBlock(Block * block, LabelIndex& endiflabel) {
    BasicBlock *pbb = block->getExit();
    BasicBlock::iterator it = pbb->end();
    it--;
    Instruction *p_last_insn = pbb->getLastInstruction();
    endiflabel = fn->newLabel();
    //pbb->thisEndifLabel = endiflabel;
    Instruction insn = ENDIF(endiflabel);
    Instruction* p_new_insn = pbb->getParent().newInstruction(insn);
    // we need to insert ENDIF before the BRA (if it exists).
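    // A branch must stay the last instruction of its block, so when the exit
    // BB already ends with a BRA we pop it, append the ENDIF, then re-append
    // the branch.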
bool append_bra = false; if((*it).getOpcode() == OP_BRA) { pbb->erase(it); append_bra = true; } pbb->append(*p_new_insn); if(append_bra) pbb->append(*p_last_insn); } void CFGStructurizer::handleThenBlock2(Block *block, Block *elseblock, LabelIndex elseBBLabel) { BasicBlock *pbb = block->getExit(); BasicBlock::iterator it = pbb->end(); it--; if((*it).getOpcode() == OP_BRA) it->remove(); if(block->getExit()->getNextBlock() == elseblock->getEntry()) return; // Add an unconditional jump to 'else' block Instruction insn = BRA(elseBBLabel); Instruction* p_new_insn = pbb->getParent().newInstruction(insn); pbb->append(*p_new_insn); } void CFGStructurizer::handleElseBlock(Block * block, LabelIndex& elselabel, LabelIndex& endiflabel) { // to insert ENDIF properly handleThenBlock(block, endiflabel); BasicBlock *pbb = block->getEntry(); BasicBlock::iterator it = pbb->begin(); it++; elselabel = fn->newLabel(); pbb->thisElseLabel = elselabel; // insert ELSE properly Instruction insn = ELSE(endiflabel); Instruction* p_new_insn = pbb->getParent().newInstruction(insn); pbb->insertAt(it, *p_new_insn); } void CFGStructurizer::handleStructuredBlocks() { BlockVector::iterator it; BlockVector::iterator end = blocks.end(); BlockVector::iterator begin = blocks.begin(); it = end; it--; BlockVector::reverse_iterator rit = blocks.rbegin(); /* structured bbs only need if and endif insn to handle the execution * in structure entry and exit BasicBlock, so we process the blocks backward, since * the block at the back of blocks is always a 'not smaller' structure then * the ones before it. we mark the blocks which are sub-blocks of the block * we are dealing with, in order to ensure we are always handling the 'biggest' * structures */ while(rit != blocks.rend()) { if((*rit)->type() == IfThenType || (*rit)->type() == IfElseType|| (*rit)->type() == SelfLoopType) { if(false == (*rit)->mark && (*rit)->canBeHandled) { markStructuredBlocks(*rit, true); /* only the entry bb of this structure needs 'if' at backend and * only the exit bb of this structure needs 'endif' at backend * see comment about needEndif and needIf at function.hpp for detail. */ markNeedEndif(*rit, false); markNeedIf(*rit, false); BasicBlock* entry = (*rit)->getEntry(); BasicBlock* eexit = (*rit)->getExit(); entry->needIf = true; eexit->needEndif = true; entry->endifLabel = fn->newLabel(); eexit->endifLabel = entry->endifLabel; eexit->isStructureExit = true; eexit->matchingStructureEntry = entry; } } rit++; } rit = blocks.rbegin(); gbe::vector &bblocks = fn->getBlocks(); std::vector bbs; bbs.resize(bblocks.size()); /* here insert the bras to the BBs, which would * simplify the reorder of basic blocks */ for(size_t i = 0; i < bblocks.size(); ++i) { bbs[i] = bblocks[i]; if(i != bblocks.size() -1 && (bbs[i]->getLastInstruction()->getOpcode() != OP_BRA || (bbs[i]->isStructureExit && bbs[i]->isLoopExit))) { Instruction insn = BRA(bbs[i]->getNextBlock()->getLabelIndex()); Instruction* pNewInsn = bbs[i]->getParent().newInstruction(insn); bbs[i]->append(*pNewInsn); if (bbs[i]->isStructureExit && bbs[i]->isLoopExit) bbs[i]->hasExtraBra = true; } } /* now, reorder the basic blocks to reduce the unconditional jump we inserted whose * targets are the 'else' blocks. the algorithm is quite simple, just put the unstructured * BBs(maybe belong to another structure, but not this one) in front of the entry BB of * this structure in front of all the others and put the other unstructured BBs at the * back of the others. 
* the sequence of structured blocks is obtained through getStructureSequence. */
    while(rit != blocks.rend()) {
      if(((*rit)->type() == IfThenType || (*rit)->type() == IfElseType ||
          (*rit)->type() == SerialBlockType || (*rit)->type() == SelfLoopType) &&
         (*rit)->canBeHandled && (*rit)->mark == true) {
        markStructuredBlocks(*rit, false);
        std::set<int> ns = getStructureBasicBlocksIndex(*rit, bbs);
        BasicBlock *entry = (*rit)->getEntry();

        int entryIndex = *(ns.begin());
        for(size_t i = 0; i < bbs.size(); ++i) {
          if(bbs[i] == entry)
            entryIndex = i;
        }

        std::set<int>::iterator iter = ns.begin();
        int index = *iter;

        std::vector<BasicBlock *> unstruSeqHead;
        std::vector<BasicBlock *> unstruSeqTail;

        iter = ns.begin();
        while(iter != ns.end()) {
          if(index != *iter) {
            if(index < entryIndex)
              unstruSeqHead.push_back(bbs[index]);
            else
              unstruSeqTail.push_back(bbs[index]);
            index++;
          } else {
            index++;
            iter++;
          }
        }

        std::vector<BasicBlock *> struSeq;
        getStructureSequence(*rit, struSeq);

        int firstindex = *(ns.begin());
        for(size_t i = 0; i < unstruSeqHead.size(); ++i)
          bbs[firstindex++] = unstruSeqHead[i];
        for(size_t i = 0; i < struSeq.size(); ++i)
          bbs[firstindex++] = struSeq[i];
        for(size_t i = 0; i < unstruSeqTail.size(); ++i)
          bbs[firstindex++] = unstruSeqTail[i];
      }
      rit++;
    }

    /* now, erase the BRAs inserted before whose targets are their fallthrough blocks */
    for(size_t i = 0; i < bbs.size()-1; ++i) {
      if(bbs[i]->getLastInstruction()->getOpcode() == OP_BRA &&
         !((BranchInstruction*)(bbs[i]->getLastInstruction()))->isPredicated()) {
        if(((BranchInstruction *)bbs[i]->getLastInstruction())->getLabelIndex() == bbs[i+1]->getLabelIndex()) {
          BasicBlock::iterator it = bbs[i]->end();
          it--;
          it->remove();
          if (bbs[i]->hasExtraBra)
            bbs[i]->hasExtraBra = false;
        }
      }
    }

    for(size_t i = 0; i < bbs.size(); ++i)
      bblocks[i] = bbs[i];

    fn->sortLabels();
    fn->computeCFG();

    it = begin;
    while(it != end) {
      if((*it)->canBeHandled) {
        switch((*it)->type()) {
          case IfThenType:
            {
              BlockList::iterator child_iter = (*it)->children.end();
              LabelIndex endiflabel;
              child_iter--;
              handleThenBlock(*child_iter, endiflabel); // this call passes out the proper endiflabel for handleIfBlock's use.
              child_iter--;
              handleIfBlock(*child_iter, endiflabel, endiflabel);
            }
            break;

          case IfElseType:
            {
              BlockList::iterator child_iter = (*it)->children.end();
              LabelIndex endiflabel;
              LabelIndex elselabel;
              BlockList::iterator else_block;
              child_iter--;
              else_block = child_iter;
              handleElseBlock(*child_iter, elselabel, endiflabel);
              LabelIndex elseBBLabel = (*child_iter)->getEntry()->getLabelIndex();
              child_iter--;
              handleThenBlock2(*child_iter, *else_block, elseBBLabel);
              child_iter--;
              handleIfBlock(*child_iter, endiflabel, elselabel);
            }
            break;

          case SelfLoopType:
            {
              LabelIndex whilelabel;
              handleSelfLoopBlock(*it, whilelabel);
            }
            break;

          default:
            break;
        }
      }
      it++;
    }
  }

  void CFGStructurizer::getStructureSequence(Block *block, std::vector<BasicBlock*> &seq) {
    /* in the control tree, for if-then, the if block comes before the then block;
     * for if-else, the stored sequence is if-then-else; for a serial block structure,
     * the stored sequence is just the executed sequence. So we can get the structure
     * sequence simply by recursive calls of getStructureSequence on all the elements
     * in children, one by one.
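     * For example, an IfElseType node yields its blocks in [if, then..., else...]
     * order, which is the order in which they must be laid out.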
*/ if(block->type() == SingleBlockType) { seq.push_back(((SimpleBlock*)block)->getBasicBlock()); return; } BlockList::iterator iter = block->children.begin(); while(iter != block->children.end()) { getStructureSequence(*iter, seq); iter++; } } std::set CFGStructurizer::getStructureBasicBlocksIndex(Block* block, std::vector &bbs) { std::set result; if(block->type() == SingleBlockType) { for(size_t i=0; igetBasicBlock()) { result.insert(i); break; } } return result; } BlockList::iterator iter = (block->children).begin(); BlockList::iterator end = (block->children).end(); while(iter != end) { std::set ret = getStructureBasicBlocksIndex(*iter, bbs); result.insert(ret.begin(), ret.end()); iter++; } return result; } std::set CFGStructurizer::getStructureBasicBlocks(Block *block) { std::set result; if(block->type() == SingleBlockType) { result.insert(((SimpleBlock*)block)->getBasicBlock()); return result; } BlockList::iterator iter = (block->children).begin(); BlockList::iterator end = (block->children).end(); while(iter != end) { std::set ret = getStructureBasicBlocks(*iter); result.insert(ret.begin(), ret.end()); iter++; } return result; } Block* CFGStructurizer::insertBlock(Block *p_block) { blocks.push_back(p_block); return p_block; } void CFGStructurizer::collectInsnNum(Block* block, const BasicBlock* bb) { BasicBlock::const_iterator iter = bb->begin(); BasicBlock::const_iterator iter_end = bb->end(); while(iter != iter_end) { block->insnNum++; iter++; } } bool CFGStructurizer::checkForBarrier(const BasicBlock* bb) { BasicBlock::const_iterator iter = bb->begin(); BasicBlock::const_iterator iter_end = bb->end(); while(iter != iter_end) { if((*iter).getOpcode() == OP_SYNC) return true; iter++; } return false; } void CFGStructurizer::getLiveIn(BasicBlock& bb, std::set& livein) { BasicBlock::iterator iter = bb.begin(); std::set varKill; while(iter != bb.end()) { Instruction& insn = *iter; const uint32_t srcNum = insn.getSrcNum(); const uint32_t dstNum = insn.getDstNum(); for(uint32_t srcID = 0; srcID < srcNum; ++srcID) { const Register reg = insn.getSrc(srcID); if(varKill.find(reg) == varKill.end()) livein.insert(reg); } for(uint32_t dstID = 0; dstID < dstNum; ++dstID) { const Register reg = insn.getDst(dstID); varKill.insert(reg); } iter++; } } void CFGStructurizer::calculateNecessaryLiveout() { BlockVector::iterator iter = blocks.begin(); while(iter != blocks.end()) { switch((*iter)->type()) { case IfElseType: { std::set bbs; BlockList::iterator thenIter = (*iter)->children.begin(); thenIter++; bbs = getStructureBasicBlocks(*thenIter); Block *elseblock = *((*iter)->children.rbegin()); std::set livein; getLiveIn(*(elseblock->getEntry()), livein); std::set::iterator bbiter = bbs.begin(); while(bbiter != bbs.end()) { (*bbiter)->liveout.insert(livein.begin(), livein.end()); bbiter++; } } default: break; } iter++; } } void CFGStructurizer::initializeBlocks() { BasicBlock& tmp_bb = fn->getTopBlock(); BasicBlock* p_tmp_bb = &tmp_bb; Block* p = NULL; if(NULL != p_tmp_bb) { Block *p_tmp_block = new SimpleBlock(p_tmp_bb); p_tmp_block->label = p_tmp_bb->getLabelIndex(); if(checkForBarrier(p_tmp_bb)) p_tmp_block->hasBarrier() = true; blocks.push_back(p_tmp_block); bbmap[p_tmp_bb] = p_tmp_block; bTobbmap[p_tmp_block] = p_tmp_bb; p_tmp_bb = p_tmp_bb->getNextBlock(); p = p_tmp_block; } while(p_tmp_bb != NULL) { Block *p_tmp_block = new SimpleBlock(p_tmp_bb); p_tmp_block->label = p_tmp_bb->getLabelIndex(); if(checkForBarrier(p_tmp_bb)) p_tmp_block->hasBarrier() = true; p->fallthrough() = p_tmp_block; p = 
  void CFGStructurizer::initializeBlocks() {
    BasicBlock& tmp_bb = fn->getTopBlock();
    BasicBlock* p_tmp_bb = &tmp_bb;
    Block* p = NULL;

    if(NULL != p_tmp_bb) {
      Block *p_tmp_block = new SimpleBlock(p_tmp_bb);
      p_tmp_block->label = p_tmp_bb->getLabelIndex();

      if(checkForBarrier(p_tmp_bb))
        p_tmp_block->hasBarrier() = true;

      blocks.push_back(p_tmp_block);
      bbmap[p_tmp_bb] = p_tmp_block;
      bTobbmap[p_tmp_block] = p_tmp_bb;
      p_tmp_bb = p_tmp_bb->getNextBlock();
      p = p_tmp_block;
    }

    while(p_tmp_bb != NULL) {
      Block *p_tmp_block = new SimpleBlock(p_tmp_bb);
      p_tmp_block->label = p_tmp_bb->getLabelIndex();

      if(checkForBarrier(p_tmp_bb))
        p_tmp_block->hasBarrier() = true;

      p->fallthrough() = p_tmp_block;
      p = p_tmp_block;
      blocks.push_back(p_tmp_block);
      bbmap[p_tmp_bb] = p_tmp_block;
      bTobbmap[p_tmp_block] = p_tmp_bb;
      p_tmp_bb = p_tmp_bb->getNextBlock();
    }

    if(NULL != p)
      p->fallthrough() = NULL;

    p_tmp_bb = &tmp_bb;
    this->blocks_entry = bbmap[p_tmp_bb];

    while(p_tmp_bb != NULL) {
      BlockSet::const_iterator iter_begin = p_tmp_bb->getPredecessorSet().begin();
      BlockSet::const_iterator iter_end = p_tmp_bb->getPredecessorSet().end();
      while(iter_begin != iter_end) {
        bbmap[p_tmp_bb]->predecessors().insert(bbmap[*iter_begin]);
        iter_begin++;
      }

      iter_begin = p_tmp_bb->getSuccessorSet().begin();
      iter_end = p_tmp_bb->getSuccessorSet().end();
      while(iter_begin != iter_end) {
        bbmap[p_tmp_bb]->successors().insert(bbmap[*iter_begin]);
        iter_begin++;
      }

      p_tmp_bb = p_tmp_bb->getNextBlock();
    }

    //copy the sequenced blocks to orderedBlks.
    loops = fn->getLoops();
    fn->foreachBlock([&](ir::BasicBlock &bb){
      orderedBlks.push_back(bbmap[&bb]);
      collectInsnNum(bbmap[&bb], &bb);
    });
  }

  void CFGStructurizer::outBlockTypes(BlockType type) {
    if(type == SerialBlockType)
      std::cout << " T:[" << "Serial" << "]" << std::endl;
    else if(type == IfThenType)
      std::cout << " T:[" << "IfThen" << "]" << std::endl;
    else if(type == IfElseType)
      std::cout << " T:[" << "IfElse" << "]" << std::endl;
    else if(type == SelfLoopType)
      std::cout << " T:[" << "SelfLoop" << "]" << std::endl;
    else
      std::cout << " T:[" << "BasicBlock" << "]" << std::endl;
  }

  /* dump the block info for debug use; only SingleBlockType has a label. */
  void CFGStructurizer::printOrderedBlocks() {
    size_t i = 0;
    std::cout << "\n ordered Blocks -> BasicBlocks -> Current BB: " << *orderIter << std::endl;
    for (auto iterBlk = orderedBlks.begin(), iterBlkEnd = orderedBlks.end(); iterBlk != iterBlkEnd; ++iterBlk, ++i) {
      std::cout << "B:" << *iterBlk << " BB:" << bTobbmap[*iterBlk];
      if((*iterBlk)->type() == SingleBlockType)
        std::cout << " L:" << bTobbmap[*iterBlk]->getLabelIndex() << std::endl;
      else
        outBlockTypes((*iterBlk)->type());
    }
  }

  /* transfer the predecessors and successors from the matched blocks to the new mergedBB.
   * if the blocks contain a back edge, a successor to itself should be added to make a self loop. */
  void CFGStructurizer::cfgUpdate(Block* mergedBB, const BlockSets& blockBBs) {
    for(auto iter = blockBBs.begin(); iter != blockBBs.end(); iter++) {
      for(auto p = (*iter)->pred_begin(); p != (*iter)->pred_end(); p++) {
        if(blockBBs.find(*p) != blockBBs.end())
          continue;
        (*p)->successors().erase(*iter);
        (*p)->successors().insert(mergedBB);
        mergedBB->predecessors().insert(*p);
        if((*p)->fallthrough() == *iter)
          (*p)->fallthrough() = mergedBB;
      }

      for(auto s = (*iter)->succ_begin(); s != (*iter)->succ_end(); s++) {
        if(blockBBs.find(*s) != blockBBs.end())
          continue;
        (*s)->predecessors().erase(*iter);
        (*s)->predecessors().insert(mergedBB);
        mergedBB->successors().insert(*s);
        if((*iter)->fallthrough() == *s)
          mergedBB->fallthrough() = *s;
      }
    }

    if(mergedBB->type() != SelfLoopType) {
      for(auto iter = blockBBs.begin(); iter != blockBBs.end(); iter++) {
        for(auto s = (*iter)->succ_begin(); s != (*iter)->succ_end(); s++) {
          if(blockBBs.find(*s) == blockBBs.end())
            continue;
          LabelIndex l_iter = (*iter)->getEntry()->getLabelIndex();
          LabelIndex l_succ = (*s)->getEntry()->getLabelIndex();
          if(l_iter > l_succ) {
            mergedBB->predecessors().insert(mergedBB);
            mergedBB->successors().insert(mergedBB);
            return;
          }
        }
      }
    }
  }
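  /* Editorial sketch, not part of the original source: cfgUpdate() above is
   * ordinary CFG node contraction -- every edge crossing the boundary of the
   * matched region is redirected to the merged node, and any back edge that
   * stays inside the region becomes a self edge. A minimal standalone model:
   *
   *   #include <set>
   *   struct Node { std::set<Node*> preds, succs; };
   *   inline void contract(Node *merged, const std::set<Node*> &region) {
   *     for (Node *n : region) {
   *       for (Node *p : n->preds)
   *         if (!region.count(p)) {
   *           p->succs.erase(n); p->succs.insert(merged);  // redirect incoming edge
   *           merged->preds.insert(p);
   *         }
   *       for (Node *s : n->succs)
   *         if (!region.count(s)) {
   *           s->preds.erase(n); s->preds.insert(merged);  // redirect outgoing edge
   *           merged->succs.insert(s);
   *         }
   *     }
   *   }
   */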
  /* delete the matched blocks and replace them with mergedBB to reduce the CFG.
   * mergedBB should be inserted at the entry block's position. */
  void CFGStructurizer::replace(Block* mergedBB, BlockSets blockBBs) {
    lIterator iter, iterRep;
    bool flag = false;
    for(iter = orderedBlks.begin(); iter != orderedBlks.end() && !blockBBs.empty();) {
      if(!blockBBs.erase(*iter)) {
        iter++;
        continue;
      }

      if(flag == false) {
        iter = orderedBlks.erase(iter);
        iterRep = iter;
        orderIter = orderedBlks.insert(iterRep, mergedBB);
        flag = true;
      } else {
        iter = orderedBlks.erase(iter);
      }
    }
  }

  Block* CFGStructurizer::mergeSerialBlock(BlockList& serialBBs) {
    Block* p = new SerialBlock(serialBBs);
    BlockList::iterator iter = serialBBs.begin();
    while(iter != serialBBs.end()) {
      if((*iter)->canBeHandled == false) {
        p->canBeHandled = false;
        break;
      }
      p->insnNum += (*iter)->insnNum;
      iter++;
    }
    return insertBlock(p);
  }

  BVAR(OCL_OUTPUT_STRUCTURIZE, false);

  /* if the block has only one successor, and its successor has only one predecessor
   * and one successor, then the block and the childBlk can be merged into a serial block. */
  int CFGStructurizer::serialPatternMatch(Block *block) {
    if (block->succ_size() != 1)
      return 0;
    if(block->hasBarrier())
      return 0;

    Block *childBlk = *block->succ_begin();
    //FIXME, As our barrier implementation doesn't support structured barrier
    //operation, exclude all the barrier blocks from serialPatternMatch.
    if (childBlk->pred_size() != 1 || childBlk->hasBarrier())
      return 0;

    BlockList serialBBs; //childBBs
    BlockSets serialSets;
    serialBBs.push_back(block);
    serialBBs.push_back(childBlk);
    serialSets.insert(block);
    serialSets.insert(childBlk);

    Block* mergedBB = mergeSerialBlock(serialBBs);
    if(mergedBB == NULL)
      return 0;
    cfgUpdate(mergedBB, serialSets);
    replace(mergedBB, serialSets);

    if(OCL_OUTPUT_STRUCTURIZE)
      printOrderedBlocks();
    ++numSerialPatternMatch;
    if(serialSets.find(blocks_entry) != serialSets.end())
      blocks_entry = mergedBB;
    return 1;
  }

  Block* CFGStructurizer::mergeLoopBlock(BlockList& loopSets) {
    if(loopSets.size() == 1) {
      Block* p = new SelfLoopBlock(*loopSets.begin());
      p->insnNum = (*loopSets.begin())->insnNum;
      p->canBeHandled = true;
      (*loopSets.begin())->getExit()->isLoopExit = true;
      return insertBlock(p);
    }
    return NULL;
  }
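  /* Editorial note, not in the original source: after repeated contraction, a
   * natural loop shows up as a single node with an edge to itself, which is what
   * loopPatternMatch() below tests for on compacted nodes:
   *
   *   #include <set>
   *   struct Node { std::set<Node*> succs; };
   *   inline bool isSelfLoop(Node *n) { return n->succs.count(n) != 0; }
   */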
  /* match the self-loop pattern with llvm loop info, or check whether the compacted
   * node has a back edge to itself. */
  int CFGStructurizer::loopPatternMatch(Block *block) {
    Block* loop_header = NULL;
    Block* b = block;
    BlockSets loopSets;
    BlockList loopBBs;

    // if b is a basic block, query the llvm loop info to find the loop whose loop header is b;
    if(block->type() == SingleBlockType) {
      for (auto l : loops) {
        BasicBlock &a = fn->getBlock(l->bbs[0]);
        loop_header = bbmap.find(&a)->second;

        if(loop_header == b) {
          for (auto bb : l->bbs) {
            BasicBlock &tmp = fn->getBlock(bb);
            Block* block_ = bbmap.find(&tmp)->second;
            loopBBs.push_front(block_);
            loopSets.insert(block_);
          }
          break;
        }
      }
    } else {
      // b is a compacted node; for a self loop it would have a successor pointing to itself.
      if(block->successors().find(b) != block->successors().end()) {
        loopBBs.push_front(b);
        loopSets.insert(b);
      }
    }

    if(loopBBs.empty())
      return 0;

    if(loopSets.size() == 1) {
      // a self-loop header should have a successor to itself; check this before merging.
      Block* lblock = *loopSets.begin();
      if(lblock->successors().find(lblock) == lblock->successors().end())
        return 0;
    }

    Block* mergedBB = mergeLoopBlock(loopBBs);
    if(mergedBB == NULL)
      return 0;
    cfgUpdate(mergedBB, loopSets);
    replace(mergedBB, loopSets);

    if(OCL_OUTPUT_STRUCTURIZE)
      printOrderedBlocks();
    ++numLoopPatternMatch;
    if(loopSets.find(blocks_entry) != loopSets.end())
      blocks_entry = mergedBB;
    return 1;
  }

  /* match the if pattern (E: entry block; T: True block; F: False block; C: Converged block):
   * for the if-else pattern:
   **     E
   **    / \
   **   T   F
   **    \ /
   **     C
   ** E has two edges to T and F; T and F each have only one predecessor and one successor,
   ** and the successor of T and F must be the same. E's fallthrough needs to be treated as the true edge.
   *
   * for the if-then pattern E-T-C:
   **     E
   **    /|
   **   T |
   **    \|
   **     C
   ** E has two edges to T and C; T should have only one predecessor and one successor, and the successor
   ** of T must be C. if E's fallthrough is C, the predicate needs to be inverted.
   *
   * for the if-then pattern E-F-C:
   **     E
   **    |\
   **    | F
   **    |/
   **     C
   ** E has two edges to C and F; F should have only one predecessor and one successor, and the successor
   ** of F must be C. if E's fallthrough is C, the predicate needs to be inverted.
   */
  int CFGStructurizer::ifPatternMatch(Block *block) {
    //two edges
    if (block->succ_size() != 2)
      return 0;
    if(block->hasBarrier())
      return 0;

    int NumMatch = 0;
    Block *TrueBB = *block->succ_begin();
    Block *FalseBB = *(++block->succ_begin());
    Block *mergedBB = NULL;
    BlockSets ifSets;

    assert (!TrueBB->succ_empty() || !FalseBB->succ_empty());
    if (TrueBB->succ_size() == 1 && FalseBB->succ_size() == 1 &&
        TrueBB->pred_size() == 1 && FalseBB->pred_size() == 1 &&
        *TrueBB->succ_begin() == *FalseBB->succ_begin() &&
        !TrueBB->hasBarrier() && !FalseBB->hasBarrier() &&
        TrueBB->insnNum < 1000 && FalseBB->insnNum < 1000) {
      // if-else pattern
      ifSets.insert(block);
      if(block->fallthrough() == TrueBB) {
        ifSets.insert(TrueBB);
        ifSets.insert(FalseBB);
        mergedBB = new IfElseBlock(block, TrueBB, FalseBB);
      } else if(block->fallthrough() == FalseBB) {
        ifSets.insert(FalseBB);
        ifSets.insert(TrueBB);
        mergedBB = new IfElseBlock(block, FalseBB, TrueBB);
      } else {
        GBE_ASSERT(0);
      }
      mergedBB->insnNum = block->insnNum + TrueBB->insnNum + FalseBB->insnNum;
      if(block->canBeHandled == false || TrueBB->canBeHandled == false || FalseBB->canBeHandled == false)
        block->canBeHandled = false;
      insertBlock(mergedBB);
    } else if (TrueBB->succ_size() == 1 && TrueBB->pred_size() == 1 &&
               *TrueBB->succ_begin() == FalseBB &&
               !TrueBB->hasBarrier() && TrueBB->insnNum < 1000) {
      // if-then pattern, false is empty
      ifSets.insert(block);
      ifSets.insert(TrueBB);
      mergedBB = new IfThenBlock(block, TrueBB);
      mergedBB->insnNum = block->insnNum + TrueBB->insnNum;
      if(block->fallthrough() == FalseBB)
        block->inversePredicate = false;
      if(block->canBeHandled == false || TrueBB->canBeHandled == false)
        block->canBeHandled = false;
      insertBlock(mergedBB);
    } else if (FalseBB->succ_size() == 1 && FalseBB->pred_size() == 1 &&
               *FalseBB->succ_begin() == TrueBB &&
               !FalseBB->hasBarrier() && FalseBB->insnNum < 1000) {
      // if-then pattern, true is empty
      ifSets.insert(block);
      ifSets.insert(FalseBB);
      mergedBB = new IfThenBlock(block, FalseBB);
      mergedBB->insnNum = block->insnNum + FalseBB->insnNum;
      if(block->fallthrough() == TrueBB)
        block->inversePredicate = false;
      if(block->canBeHandled == false || FalseBB->canBeHandled == false)
        block->canBeHandled = false;
      insertBlock(mergedBB);
    } else {
      return 0;
    }

    if(ifSets.empty())
      return 0;
    if(mergedBB == NULL)
      return 0;
    cfgUpdate(mergedBB, ifSets);
    replace(mergedBB, ifSets);

    if(OCL_OUTPUT_STRUCTURIZE)
      printOrderedBlocks();
    ++numIfPatternMatch;
    if(ifSets.find(blocks_entry) != ifSets.end())
      blocks_entry = mergedBB;
    return NumMatch + 1;
  }
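  /* Editorial example, not in the original source: on a diamond CFG
   *   A -> {B, C},  B -> D,  C -> D
   * ifPatternMatch(A) above folds {A, B, C} into one IfElseBlock node, leaving
   * IfElse -> D. A later serialPatternMatch sweep then folds that node with D,
   * so a fully reducible CFG eventually collapses to a single block, which is
   * the termination condition blockPatternMatch() below drives toward. */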
  /* match the loop pattern, serial pattern and if pattern in that order; update the
   * CFG and replace the matched blocks internally once a pattern matches. */
  int CFGStructurizer::patternMatch(Block *block) {
    int NumMatch = 0;
    NumMatch += loopPatternMatch(block);
    NumMatch += serialPatternMatch(block);
    NumMatch += ifPatternMatch(block);
    return NumMatch;
  }

  void CFGStructurizer::blockPatternMatch() {
    int increased = 0;
    do {
      increased = numSerialPatternMatch + numLoopPatternMatch + numIfPatternMatch;
      orderIter = orderedBlks.begin();
      while(orderedBlks.size() > 1 && orderIter != orderedBlks.end()) {
        if(OCL_OUTPUT_STRUCTURIZE)
          printOrderedBlocks();
        patternMatch(*orderIter);
        orderIter++;
      }
      if(OCL_OUTPUT_STRUCTURIZE)
        printOrderedBlocks();
      if(increased == numSerialPatternMatch + numLoopPatternMatch + numIfPatternMatch)
        break;
    } while(orderedBlks.size() > 1);

    if(OCL_OUTPUT_STRUCTURIZE)
      std::cout << "Serial:" << numSerialPatternMatch << "Loop:" << numLoopPatternMatch << "If:" << numIfPatternMatch << std::endl;
  }

  void CFGStructurizer::StructurizeBlocks() {
    initializeBlocks();
    blockPatternMatch();
    handleStructuredBlocks();
    calculateNecessaryLiveout();
  }

} /* namespace ir */
} /* namespace gbe */
Beignet-1.3.2-Source/backend/src/ir/profiling.hpp000664 001750 001750 00000006630 13161142102 020746 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

/**
 * \file profiling.hpp
 *
 */
#ifndef __GBE_IR_PROFILING_HPP__
#define __GBE_IR_PROFILING_HPP__

#include <pthread.h>
#include "sys/map.hpp"
#include "sys/vector.hpp"
#include "unit.hpp"

namespace gbe {
namespace ir {
  class Context;
  class ProfilingInfo //: public Serializable
  {
  public:
    const static uint32_t MaxTimestampProfilingPoints = 20;
    enum {
      ProfilingSimdType1,
      ProfilingSimdType8,
      ProfilingSimdType16,
    };
    typedef struct {
      uint32_t fixedFunctionID:4;
      uint32_t simdType:4;
      uint32_t kernelID:24;
      union GenInfo {
        struct Gen7Info {
          uint16_t thread_id:3;
          uint16_t reserved1:5;
          uint16_t eu_id:4;
          uint16_t half_slice_id:1;
          uint16_t slice_id:2;
          uint16_t reserved0:1;
        } gen7;
        struct Gen8Info {
          uint16_t thread_id:3;
          uint16_t reserved1:5;
          uint16_t eu_id:4;
          uint16_t subslice_id:2;
          uint16_t slice_id:2;
        } gen8;
      } genInfo;
      uint16_t dispatchMask;
      uint32_t gidXStart;
      uint32_t gidXEnd;
      uint32_t gidYStart;
      uint32_t gidYEnd;
      uint32_t gidZStart;
      uint32_t gidZEnd;
      uint32_t userTimestamp[MaxTimestampProfilingPoints];
      uint32_t timestampPrologLo;
      uint32_t timestampPrologHi;
      uint32_t timestampEpilogLo;
      uint32_t timestampEpilogHi;
    } ProfilingReportItem;

    ProfilingInfo(const ProfilingInfo& other) {
      this->bti = other.bti;
      this->profilingType = other.profilingType;
      this->deviceID = other.deviceID;
    }
    ProfilingInfo(void) {
      this->bti = 0;
      this->profilingType = 0;
      this->deviceID = 0;
    }
    struct LockOutput {
      LockOutput(void) {
        pthread_mutex_lock(&lock);
      }
      ~LockOutput(void) {
        pthread_mutex_unlock(&lock);
      }
    };
    void setBTI(uint32_t b) { bti = b; }
    uint32_t getBTI() const { return bti; }
    void setProfilingType(uint32_t t) { profilingType = t; }
    uint32_t getProfilingType() const { return profilingType; }
    void setDeviceID(uint32_t id) { deviceID = id; }
    uint32_t getDeviceID() const { return deviceID; }
    void outputProfilingInfo(void* logBuf);

  private:
    uint32_t bti;
    uint32_t profilingType;
    uint32_t deviceID;
    friend struct LockOutput;
    static pthread_mutex_t lock;
    GBE_CLASS(ProfilingInfo);
  };
} /* namespace ir */
} /* namespace gbe */
#endif /* __GBE_IR_PROFILING_HPP__ */
Beignet-1.3.2-Source/backend/src/ir/liveness.hpp000664 001750 001750 00000012510 13161142102 020577 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file liveness.hpp
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */
#ifndef __GBE_IR_LIVENESS_HPP__
#define __GBE_IR_LIVENESS_HPP__

#include
#include "sys/map.hpp"
#include "sys/set.hpp"
#include "ir/register.hpp"
#include "ir/function.hpp"

namespace gbe {
namespace ir {

  // Liveness is computed per function
  class Function;

  /*! To choose the iteration direction, we either look at predecessors or
   *  successors
   */
  enum DataFlowDirection {
    DF_PRED = 0,
    DF_SUCC = 1
  };
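  /* Editorial note, not in the original header: liveness is the classic backward
   * dataflow problem over the per-block sets declared below. In equations:
   *
   *   liveOut(B) = union over all successors S of B of liveIn(S)
   *   liveIn(B)  = upwardUsed(B) union (liveOut(B) minus varKill(B))
   *
   * which is why the solver walks successor sets (DF_SUCC) even though the
   * information flows backward through the CFG. */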
  /*! Compute liveness of each register */
  class Liveness : public NonCopyable
  {
  public:
    Liveness(Function &fn, bool isInGenBackend = false);
    ~Liveness(void);
    /*! Set of variables used upwards in the block (before a definition) */
    typedef set<Register> UEVar;
    /*! Set of variables alive at the exit of the block */
    typedef set<Register> LiveOut;
    /*! Set of variables actually killed in each block */
    typedef set<Register> VarKill;
    /*! Per-block info */
    struct BlockInfo : public NonCopyable {
      BlockInfo(const BasicBlock &bb) : bb(bb) {}
      const BasicBlock &bb;
      INLINE bool inUpwardUsed(Register reg) const {
        return upwardUsed.contains(reg);
      }
      INLINE bool inLiveOut(Register reg) const {
        return liveOut.contains(reg);
      }
      INLINE bool inVarKill(Register reg) const {
        return varKill.contains(reg);
      }
      UEVar upwardUsed;
      LiveOut liveOut;
      VarKill varKill;
    };
    /*! Gives for each block the variables alive at entry / exit */
    typedef map<const BasicBlock*, BlockInfo*> Info;
    /*! Return the complete liveness info */
    INLINE const Info &getLivenessInfo(void) const { return liveness; }
    /*! Return the complete block info */
    INLINE const BlockInfo &getBlockInfo(const BasicBlock *bb) const {
      auto it = liveness.find(bb);
      GBE_ASSERT(it != liveness.end() && it->second != NULL);
      return *it->second;
    }
    /*! Get the set of registers alive at the end of the block */
    const LiveOut &getLiveOut(const BasicBlock *bb) const {
      const BlockInfo &info = this->getBlockInfo(bb);
      return info.liveOut;
    }
    /*! Get the set of registers alive at the beginning of the block */
    const UEVar &getLiveIn(const BasicBlock *bb) const {
      const BlockInfo &info = this->getBlockInfo(bb);
      return info.upwardUsed;
    }
    /*! Return the function the liveness was computed on */
    INLINE const Function &getFunction(void) const { return fn; }
    /*! Actually do something for each successor / predecessor of *all* blocks */
    template <DataFlowDirection dir, typename T>
    void foreach(const T &functor) {
      // Iterate on all blocks
      for (Info::iterator pair = liveness.begin(); pair != liveness.end(); ++pair) {
        BlockInfo &info = *(pair->second);
        const BasicBlock &bb = info.bb;
        const BlockSet *set = NULL;
        if (dir == DF_SUCC)
          set = &bb.getSuccessorSet();
        else
          set = &bb.getPredecessorSet();
        // Iterate over all successors
        for (BlockSet::iterator other = (*set).begin(); other != (*set).end(); ++other) {
          Info::iterator otherInfo = liveness.find(*other);
          GBE_ASSERT(otherInfo != liveness.end() && otherInfo->second != NULL);
          functor(info, *otherInfo->second);
        }
      }
    }
    // remove some registers from the liveness information.
    void removeRegs(const set<Register> &removes);
    // replace some registers according to the (from, to) register map.
    void replaceRegs(const map<Register, Register> &replaceMap);
  private:
    /*! Store the liveness of all blocks */
    Info liveness;
    /*! Compute the liveness for this function */
    Function &fn;
    /*! Initialize UEVar and VarKill per block */
    void initBlock(const BasicBlock &bb);
    /*! Initialize UEVar and VarKill per instruction */
    void initInstruction(BlockInfo &info, const Instruction &insn);
    /*! Now really compute LiveOut based on UEVar and VarKill */
    void computeLiveInOut(void);
    void computeExtraLiveInOut(set<Register> &extentRegs);
    void analyzeUniform(set<Register> *extentRegs);
    /*! Set of work list blocks which have an exit (return) instruction */
    typedef set<struct BlockInfo*> WorkSet;
    WorkSet workSet;
    WorkSet unvisitBlocks;
    /*! Use custom allocators */
    GBE_CLASS(Liveness);
  };
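  /* Editorial sketch, not part of the original header: liveOut is the union of
   * the successors' live-in sets, iterated to a fixed point. A standalone model
   * with integer register ids, assuming ueVar/varKill are precomputed per block:
   *
   *   #include <set>
   *   #include <vector>
   *   struct BInfo {
   *     std::set<int> ueVar, varKill, liveOut;
   *     std::vector<BInfo*> succs;
   *   };
   *   inline void solveLiveness(std::vector<BInfo*> &blocks) {
   *     for (bool changed = true; changed; ) {
   *       changed = false;
   *       for (BInfo *b : blocks) {
   *         std::set<int> out;
   *         for (BInfo *s : b->succs) {
   *           // liveIn(S) = ueVar(S) U (liveOut(S) - varKill(S))
   *           out.insert(s->ueVar.begin(), s->ueVar.end());
   *           for (int r : s->liveOut)
   *             if (!s->varKill.count(r)) out.insert(r);
   *         }
   *         if (out != b->liveOut) { b->liveOut = std::move(out); changed = true; }
   *       }
   *     }
   *   }
   */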
  /*! Output a nice ASCII representation of the liveness */
  std::ostream &operator<< (std::ostream &out, const Liveness &liveness);

} /* namespace ir */
} /* namespace gbe */

#endif /* __GBE_IR_LIVENESS_HPP__ */
Beignet-1.3.2-Source/backend/src/ir/profile.hpp000664 001750 001750 00000011216 13161142102 020411 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file profile.hpp
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */
#ifndef __GBE_IR_PROFILE_HPP__
#define __GBE_IR_PROFILE_HPP__

#include "ir/register.hpp"

namespace gbe {
namespace ir {

  /*! Profile is defined *per-function* and mostly predefined registers */
  enum Profile : uint32_t {
    PROFILE_C = 0,  // Not used now
    PROFILE_OCL = 1
  };

  // Will be pre-initialized based on its profile
  class Function;

  /*! Registers used for ocl */
  namespace ocl {
    static const Register lid0 = Register(0);        // get_local_id(0)
    static const Register lid1 = Register(1);        // get_local_id(1)
    static const Register lid2 = Register(2);        // get_local_id(2)
    static const Register groupid0 = Register(3);    // get_group_id(0)
    static const Register groupid1 = Register(4);    // get_group_id(1)
    static const Register groupid2 = Register(5);    // get_group_id(2)
    static const Register numgroup0 = Register(6);   // get_num_groups(0)
    static const Register numgroup1 = Register(7);   // get_num_groups(1)
    static const Register numgroup2 = Register(8);   // get_num_groups(2)
    static const Register lsize0 = Register(9);      // get_local_size(0)
    static const Register lsize1 = Register(10);     // get_local_size(1)
    static const Register lsize2 = Register(11);     // get_local_size(2)
    static const Register enqlsize0 = Register(12);  // get_enqueued_local_size(0)
    static const Register enqlsize1 = Register(13);  // get_enqueued_local_size(1)
    static const Register enqlsize2 = Register(14);  // get_enqueued_local_size(2)
    static const Register gsize0 = Register(15);     // get_global_size(0)
    static const Register gsize1 = Register(16);     // get_global_size(1)
    static const Register gsize2 = Register(17);     // get_global_size(2)
    static const Register goffset0 = Register(18);   // get_global_offset(0)
    static const Register goffset1 = Register(19);   // get_global_offset(1)
    static const Register goffset2 = Register(20);   // get_global_offset(2)
    static const Register stackptr = Register(21);   // stack pointer
    static const Register stackbuffer = Register(22); // stack buffer base address.
    static const Register blockip = Register(23);    // blockip
    static const Register barrierid = Register(24);  // barrierid
    static const Register threadn = Register(25);    // number of threads
    static const Register workdim = Register(26);    // work dimension.
    static const Register zero = Register(27);       // scalar register holds zero.
    static const Register one = Register(28);        // scalar register holds one.
    static const Register retVal = Register(29);     // helper register to do data flow analysis.
static const Register dwblockip = Register(30); // blockip static const Register profilingbptr = Register(31); // buffer addr for profiling. static const Register profilingts0 = Register(32); // timestamp for profiling. static const Register profilingts1 = Register(33); // timestamp for profiling. static const Register profilingts2 = Register(34); // timestamp for profiling. static const Register profilingts3 = Register(35); // timestamp for profiling. static const Register profilingts4 = Register(36); // timestamp for profiling. static const Register threadid = Register(37); // the thread id of this thread. static const Register constant_addrspace = Register(38); // starting address of program-scope constant static const Register stacksize = Register(39); // stack buffer total size static const Register enqueuebufptr = Register(40); // enqueue buffer address . static const uint32_t regNum = 41; // number of special registers extern const char *specialRegMean[]; // special register name. } /* namespace ocl */ /*! Initialize the profile of the given function */ void initProfile(Function &fn); } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_PROFILE_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/context.cpp000664 001750 001750 00000014501 13161142102 020430 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ /** * \file context.cpp * \author Benjamin Segovia */ #include "ir/context.hpp" #include "ir/unit.hpp" #include "ir/lowering.hpp" namespace gbe { namespace ir { Context::Context(Unit &unit) : unit(unit), fn(NULL), bb(NULL), usedLabels(NULL) {} Context::~Context(void) { for (const auto &elem : fnStack) GBE_SAFE_DELETE(elem.usedLabels); GBE_SAFE_DELETE(usedLabels); } Function &Context::getFunction(void) { GBE_ASSERTM(fn != NULL, "No function currently defined"); return *fn; } void Context::appendPushedConstant(Register reg, const PushLocation &pushed) { GBE_ASSERTM(fn != NULL, "No function currently defined"); GBE_ASSERTM(fn->pushMap.contains(reg) == false, "Register already pushed"); fn->pushMap.insert(std::make_pair(reg, pushed)); fn->locationMap.insert(std::make_pair(pushed, reg)); } void Context::startFunction(const std::string &name) { fnStack.push_back(StackElem(fn,bb,usedLabels)); fn = unit.newFunction(name); usedLabels = GBE_NEW_NO_ARG(vector); bb = NULL; } void Context::endFunction(void) { GBE_ASSERTM(fn != NULL, "No function to end"); GBE_ASSERT(fnStack.size() != 0); GBE_ASSERT(usedLabels != NULL); // Empty function -> append a return if (fn->blockNum() == 0) this->RET(); // Check first that all branch instructions point to valid labels GBE_ASSERT(usedLabels); #if GBE_DEBUG for (auto usage : *usedLabels) GBE_ASSERTM(usage != LABEL_IS_POINTED, "A label is used and not defined"); #endif /* GBE_DEBUG */ GBE_DELETE(usedLabels); // Remove all returns and insert one unique return block at the end of the // function lowerReturn(unit, fn->getName()); // check if there is empty labels at first // FIXME: I don't find a way to elimimate all empty blocks. temporary disable this check //fn->checkEmptyLabels(); // Properly order labels and compute the CFG, it's needed by FunctionArgumentLower fn->sortLabels(); fn->computeCFG(); // Spill function argument to the stack if required and identify which // function arguments can use constant push lowerFunctionArguments(unit, fn->getName()); const StackElem elem = fnStack.back(); fnStack.pop_back(); fn = elem.fn; bb = elem.bb; usedLabels = elem.usedLabels; } Register Context::reg(RegisterFamily family, bool uniform, gbe_curbe_type curbeType, int subType) { GBE_ASSERTM(fn != NULL, "No function currently defined"); return fn->newRegister(family, uniform, curbeType, subType); } LabelIndex Context::label(void) { GBE_ASSERTM(fn != NULL, "No function currently defined"); const LabelIndex index = fn->newLabel(); if (index >= usedLabels->size()) { usedLabels->resize(index + 1); (*usedLabels)[index] = 0; } return index; } void Context::input(const std::string &name, FunctionArgument::Type type, Register reg, FunctionArgument::InfoFromLLVM& info, uint32_t elementSize, uint32_t align, unsigned char bti) { GBE_ASSERTM(fn != NULL, "No function currently defined"); GBE_ASSERTM(reg < fn->file.regNum(), "Out-of-bound register"); FunctionArgument *arg = GBE_NEW(FunctionArgument, type, reg, elementSize, name, align, info, bti); fn->setRegPayloadType(arg->reg, GBE_CURBE_KERNEL_ARGUMENT, fn->args.size()); fn->args.push_back(arg); } void Context::output(Register reg) { GBE_ASSERTM(fn != NULL, "No function currently defined"); GBE_ASSERTM(reg < fn->file.regNum(), "Out-of-bound register"); fn->outputs.push_back(reg); } void Context::startBlock(void) { GBE_ASSERTM(fn != NULL, "No function currently defined"); this->bb = GBE_NEW(BasicBlock, *fn); fn->blocks.push_back(bb); } void Context::endBlock(void) { this->bb = NULL; } void 
Context::append(const Instruction &insn) {
    GBE_ASSERTM(fn != NULL, "No function currently defined");

    // Start a new block if this is a label
    if (insn.isMemberOf<LabelInstruction>() == true) {
      this->endBlock();
      this->startBlock();

      const LabelIndex index = cast<LabelInstruction>(insn).getLabelIndex();
      GBE_ASSERTM(index < fn->labelNum(), "Out-of-bound label");
      GBE_ASSERTM(fn->labels[index] == NULL, "Label used in a previous block");
      fn->labels[index] = bb;

      // Now the label index is properly defined
      GBE_ASSERT(index < usedLabels->size());
      (*usedLabels)[index] |= LABEL_IS_DEFINED;
    }
    // We create a new label for a new block if the user did not do it
    else if (bb == NULL) {
      // this->startBlock();
      const LabelIndex index = this->label();
      const Instruction insn = ir::LABEL(index);
      this->append(insn);
    }

    // Append the instruction in the stream
    Instruction *insnPtr = fn->newInstruction(insn);
    bb->append(*insnPtr);
    insnPtr->setDBGInfo(this->DBGInfo);
#if GBE_DEBUG
    std::string whyNot;
    if(getUnit().getValid())
      GBE_ASSERTM(insnPtr->wellFormed(whyNot), whyNot.c_str());
#endif /* GBE_DEBUG */

    // Close the current block if this is a branch
    if (insn.isMemberOf<BranchInstruction>() == true) {
      // We must book keep the fact that the label is used
      if (insn.getOpcode() == OP_BRA) {
        const BranchInstruction &branch = cast<BranchInstruction>(insn);
        const LabelIndex index = branch.getLabelIndex();
        GBE_ASSERT(index < usedLabels->size());
        (*usedLabels)[index] |= LABEL_IS_POINTED;
      }
      this->endBlock();
    }
  }
} /* namespace ir */
} /* namespace gbe */
Beignet-1.3.2-Source/backend/src/ir/unit.hpp000664 001750 001750 00000010331 13173554000 017734 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia <benjamin.segovia@intel.com>
 */

/**
 * \file unit.hpp
 * \author Benjamin Segovia <benjamin.segovia@intel.com>
 */
#ifndef __GBE_IR_UNIT_HPP__
#define __GBE_IR_UNIT_HPP__

#include "ir/constant.hpp"
#include "ir/register.hpp"
#include "ir/profiling.hpp"
#include "ir/printf.hpp"
#include "ir/reloc.hpp"
#include "sys/map.hpp"
#include

namespace gbe {
namespace ir {

  // A unit contains a set of functions
  class Function;
  class ProfilingInfo;

  class Unit : public NonCopyable
  {
  public:
    typedef map<std::string, Function*> FunctionSet;
    /*! Moved from printf pass */
    map<llvm::CallInst*, PrintfSet::PrintfFmt*> printfs;
    vector<std::string> blockFuncs;
    /*! Create an empty unit */
    Unit(PointerSize pointerSize = POINTER_32_BITS);
    /*! Release everything (*including* the function pointers) */
    ~Unit(void);
    /*! Get the set of functions defined in the unit */
    const FunctionSet &getFunctionSet(void) const { return functions; }
    /*! Retrieve the function by its name */
    Function *getFunction(const std::string &name) const;
    /*! Return NULL if the function already exists */
    Function *newFunction(const std::string &name);
    /*! Create a new constant in the constant set */
    void newConstant(const std::string&, uint32_t size, uint32_t alignment);
    /*!
Apply the given functor on all the functions */ template INLINE void apply(const T &functor) const { for (FunctionSet::const_iterator it = functions.begin(); it != functions.end(); ++it) functor(*(it->second)); } /*! Return the size of the pointers manipulated */ INLINE PointerSize getPointerSize(void) const { return pointerSize; } INLINE void setPointerSize(PointerSize size) { pointerSize = size; } /*! Return the family of registers that contain pointer */ INLINE RegisterFamily getPointerFamily(void) const { if (this->getPointerSize() == POINTER_32_BITS) return FAMILY_DWORD; else return FAMILY_QWORD; } /*! Return the constant set */ ConstantSet& getConstantSet(void) { return constantSet; } const RelocTable& getRelocTable(void) const { return relocTable; } RelocTable& getRelocTable(void) { return relocTable; } /*! Return the constant set */ const ConstantSet& getConstantSet(void) const { return constantSet; } /*! Get profiling info in this function */ ProfilingInfo* getProfilingInfo(void) const { return profilingInfo; } /*! Set in profiling mode */ void setInProfilingMode(bool b) { inProfilingMode = b; } /*! Get in profiling mode */ bool getInProfilingMode(void) const { return inProfilingMode; } void setValid(bool value) { valid = value; } bool getValid() { return valid; } void setOclVersion(uint32_t version) { oclVersion = version; } uint32_t getOclVersion() const { return oclVersion; } private: friend class ContextInterface; //!< Can free modify the unit FunctionSet functions; //!< All the defined functions ConstantSet constantSet; //!< All the constants defined in the unit RelocTable relocTable; PointerSize pointerSize; //!< Size shared by all pointers ProfilingInfo *profilingInfo; //!< profilingInfo store the information for profiling. GBE_CLASS(Unit); uint32_t oclVersion; bool valid; bool inProfilingMode; }; /*! Output the unit string in the given stream */ std::ostream &operator<< (std::ostream &out, const Unit &unit); } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_UNIT_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/image.cpp000664 001750 001750 00000023227 13161142102 020033 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ /** * \file image.cpp * */ #include "image.hpp" #include "context.hpp" #include "ocl_common_defines.h" #include "backend/program.h" namespace gbe { namespace ir { static uint32_t getInfoOffset4Type(struct ImageInfo *imageInfo, int type) { switch (type) { case GetImageInfoInstruction::WIDTH: return imageInfo->wSlot; case GetImageInfoInstruction::HEIGHT: return imageInfo->hSlot; case GetImageInfoInstruction::DEPTH: return imageInfo->depthSlot; case GetImageInfoInstruction::CHANNEL_DATA_TYPE: return imageInfo->dataTypeSlot; case GetImageInfoInstruction::CHANNEL_ORDER: return imageInfo->channelOrderSlot; default: NOT_IMPLEMENTED; } return 0; } static uint32_t setInfoOffset4Type(struct ImageInfo *imageInfo, int type, uint32_t offset) { switch (type) { case GetImageInfoInstruction::WIDTH: imageInfo->wSlot = offset; break; case GetImageInfoInstruction::HEIGHT: imageInfo->hSlot = offset; break; case GetImageInfoInstruction::DEPTH: imageInfo->depthSlot = offset; break; case GetImageInfoInstruction::CHANNEL_DATA_TYPE: imageInfo->dataTypeSlot = offset; break; case GetImageInfoInstruction::CHANNEL_ORDER: imageInfo->channelOrderSlot = offset; break; default: NOT_IMPLEMENTED; } return 0; } void ImageSet::appendInfo(ImageInfoKey key, uint32_t offset) { map::iterator it = indexMap.find(key.index); assert(it != indexMap.end()); struct ImageInfo *imageInfo = it->second; setInfoOffset4Type(imageInfo, key.type, offset); } void ImageSet::clearInfo() { struct ImageInfo *imageInfo; for (map::iterator it = indexMap.begin(); it != indexMap.end(); ++it) { imageInfo = it->second; imageInfo->wSlot = -1; imageInfo->hSlot = -1; imageInfo->depthSlot = -1; imageInfo->dataTypeSlot = -1; imageInfo->channelOrderSlot = -1; } } int32_t ImageSet::getInfoOffset(ImageInfoKey key) const { map::const_iterator it = indexMap.find(key.index); if (it == indexMap.end()) return -1; struct ImageInfo *imageInfo = it->second; return getInfoOffset4Type(imageInfo, key.type); } uint32_t ImageSet::getIdx(const Register imageReg) const { map::const_iterator it = regMap.find(imageReg); GBE_ASSERT(it != regMap.end()); return it->second->idx; } void ImageSet::getData(struct ImageInfo *imageInfos) const { int id = 0; for (map::const_iterator it = regMap.begin(); it != regMap.end(); ++it) imageInfos[id++] = *(it->second); } ImageSet::~ImageSet() { for (map::const_iterator it = regMap.begin(); it != regMap.end(); ++it) GBE_DELETE(it->second); } #define OUT_UPDATE_SZ(elt) SERIALIZE_OUT(elt, outs, ret_size) #define IN_UPDATE_SZ(elt) DESERIALIZE_IN(elt, ins, total_size) /*! Implements the serialization. 
*/ uint32_t ImageSet::serializeToBin(std::ostream& outs) { uint32_t ret_size = 0; uint32_t sz = 0; OUT_UPDATE_SZ(magic_begin); sz = regMap.size(); OUT_UPDATE_SZ(sz); for (map::const_iterator it = regMap.begin(); it != regMap.end(); ++it) { OUT_UPDATE_SZ(it->first); OUT_UPDATE_SZ(it->second->arg_idx); OUT_UPDATE_SZ(it->second->idx); OUT_UPDATE_SZ(it->second->wSlot); OUT_UPDATE_SZ(it->second->hSlot); OUT_UPDATE_SZ(it->second->depthSlot); OUT_UPDATE_SZ(it->second->dataTypeSlot); OUT_UPDATE_SZ(it->second->channelOrderSlot); OUT_UPDATE_SZ(it->second->dimOrderSlot); } sz = indexMap.size(); OUT_UPDATE_SZ(sz); for (map::iterator it = indexMap.begin(); it != indexMap.end(); ++it) { OUT_UPDATE_SZ(it->first); OUT_UPDATE_SZ(it->second->arg_idx); OUT_UPDATE_SZ(it->second->idx); OUT_UPDATE_SZ(it->second->wSlot); OUT_UPDATE_SZ(it->second->hSlot); OUT_UPDATE_SZ(it->second->depthSlot); OUT_UPDATE_SZ(it->second->dataTypeSlot); OUT_UPDATE_SZ(it->second->channelOrderSlot); OUT_UPDATE_SZ(it->second->dimOrderSlot); } OUT_UPDATE_SZ(magic_end); OUT_UPDATE_SZ(ret_size); return ret_size; } uint32_t ImageSet::deserializeFromBin(std::istream& ins) { uint32_t total_size = 0; uint32_t magic; uint32_t image_map_sz = 0; IN_UPDATE_SZ(magic); if (magic != magic_begin) return 0; IN_UPDATE_SZ(image_map_sz); //regMap for (uint32_t i = 0; i < image_map_sz; i++) { ir::Register reg; ImageInfo *img_info = GBE_NEW(struct ImageInfo);; IN_UPDATE_SZ(reg); IN_UPDATE_SZ(img_info->arg_idx); IN_UPDATE_SZ(img_info->idx); IN_UPDATE_SZ(img_info->wSlot); IN_UPDATE_SZ(img_info->hSlot); IN_UPDATE_SZ(img_info->depthSlot); IN_UPDATE_SZ(img_info->dataTypeSlot); IN_UPDATE_SZ(img_info->channelOrderSlot); IN_UPDATE_SZ(img_info->dimOrderSlot); regMap.insert(std::make_pair(reg, img_info)); } IN_UPDATE_SZ(image_map_sz); //indexMap for (uint32_t i = 0; i < image_map_sz; i++) { uint32_t index; ImageInfo *img_info = GBE_NEW(struct ImageInfo);; IN_UPDATE_SZ(index); IN_UPDATE_SZ(img_info->arg_idx); IN_UPDATE_SZ(img_info->idx); IN_UPDATE_SZ(img_info->wSlot); IN_UPDATE_SZ(img_info->hSlot); IN_UPDATE_SZ(img_info->depthSlot); IN_UPDATE_SZ(img_info->dataTypeSlot); IN_UPDATE_SZ(img_info->channelOrderSlot); IN_UPDATE_SZ(img_info->dimOrderSlot); indexMap.insert(std::make_pair(img_info->idx, img_info)); } IN_UPDATE_SZ(magic); if (magic != magic_end) return 0; uint32_t total_bytes; IN_UPDATE_SZ(total_bytes); if (total_bytes + sizeof(total_size) != total_size) return 0; return total_size; } void ImageSet::printStatus(int indent, std::ostream& outs) { using namespace std; string spaces = indent_to_str(indent); string spaces_nl = indent_to_str(indent + 4); outs << spaces << "------------ Begin ImageSet ------------" << "\n"; outs << spaces_nl << " ImageSet Map: [reg, arg_idx, idx, wSlot, hSlot, depthSlot, " "dataTypeSlot, channelOrderSlot, dimOrderSlot]\n"; outs << spaces_nl << " regMap size: " << regMap.size() << "\n"; for (map::const_iterator it = regMap.begin(); it != regMap.end(); ++it) { outs << spaces_nl << " [" << it->first << ", " << it->second->arg_idx << ", " << it->second->idx << ", " << it->second->wSlot << ", " << it->second->hSlot << ", " << it->second->depthSlot << ", " << it->second->dataTypeSlot << ", " << it->second->channelOrderSlot << ", " << it->second->dimOrderSlot << "]" << "\n"; } outs << spaces_nl << " ImageSet Map: [index, arg_idx, idx, wSlot, hSlot, depthSlot, " "dataTypeSlot, channelOrderSlot, dimOrderSlot]\n"; outs << spaces_nl << " regMap size: " << indexMap.size() << "\n"; for (map::iterator it = indexMap.begin(); it != indexMap.end(); 
++it) { outs << spaces_nl << " [" << it->first << ", " << it->second->arg_idx << ", " << it->second->idx << ", " << it->second->wSlot << ", " << it->second->hSlot << ", " << it->second->depthSlot << ", " << it->second->dataTypeSlot << ", " << it->second->channelOrderSlot << ", " << it->second->dimOrderSlot << ", " << "\n"; } outs << spaces << "------------- End ImageSet -------------" << "\n"; } #ifdef GBE_COMPILER_AVAILABLE Register ImageSet::appendInfo(ImageInfoKey key, Context *ctx) { auto it = infoRegMap.find(key.data); if (it != infoRegMap.end()) return it->second; Register reg = ctx->reg(FAMILY_DWORD, false, GBE_CURBE_IMAGE_INFO, key.data); infoRegMap.insert(std::make_pair(key.data, reg)); return reg; } void ImageSet::append(Register imageReg, Context *ctx, uint8_t bti) { ir::FunctionArgument *arg = ctx->getFunction().getArg(imageReg); GBE_ASSERTM(arg && arg->type == ir::FunctionArgument::IMAGE, "Append an invalid reg to image set."); GBE_ASSERTM(regMap.find(imageReg) == regMap.end(), "Append the same image reg twice."); int32_t id = ctx->getFunction().getArgID(arg); struct ImageInfo *imageInfo = GBE_NEW(struct ImageInfo); imageInfo->arg_idx = id; imageInfo->idx = bti; imageInfo->wSlot = -1; imageInfo->hSlot = -1; imageInfo->depthSlot = -1; imageInfo->dataTypeSlot = -1; imageInfo->channelOrderSlot = -1; imageInfo->dimOrderSlot = -1; regMap.insert(std::make_pair(imageReg, imageInfo)); indexMap.insert(std::make_pair(imageInfo->idx, imageInfo)); } #endif } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/immediate.cpp000664 001750 001750 00000033362 13161142102 020710 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #include "immediate.hpp" using namespace gbe; using namespace ir; #define SCALAR_SAME_TYPE_ASSERT() \ GBE_ASSERT(this->getType() == right.getType() && \ this->getElemNum() == right.getElemNum() && \ this->getElemNum() == 1) #define DECLAR_BINARY_ALL_TYPE_OP(OP) \ Immediate Immediate::operator OP (const Immediate &right) const { \ /*SCALAR_SAME_TYPE_ASSERT();*/ \ switch (this->getType()) { \ default: \ GBE_ASSERT(0); \ case TYPE_BOOL: return Immediate(*this->data.b OP *right.data.b); \ case TYPE_S8: return Immediate(*this->data.s8 OP *right.data.s8); \ case TYPE_U8: return Immediate(*this->data.u8 OP *right.data.u8); \ case TYPE_S16: return Immediate(*this->data.s16 OP *right.data.s16); \ case TYPE_U16: return Immediate(*this->data.u16 OP *right.data.u16); \ case TYPE_S32: return Immediate(*this->data.s32 OP *right.data.s32); \ case TYPE_U32: return Immediate(*this->data.u32 OP *right.data.u32); \ case TYPE_S64: return Immediate(*this->data.s64 OP *right.data.s64); \ case TYPE_U64: return Immediate(*this->data.u64 OP *right.data.u64); \ case TYPE_FLOAT: return Immediate(*this->data.f32 OP *right.data.f32); \ case TYPE_HALF: return Immediate(*this->data.f16 OP *right.data.f16); \ case TYPE_DOUBLE: return Immediate(*this->data.f64 OP *right.data.f64); \ }\ return *this;\ } DECLAR_BINARY_ALL_TYPE_OP(+) DECLAR_BINARY_ALL_TYPE_OP(-) DECLAR_BINARY_ALL_TYPE_OP(*) DECLAR_BINARY_ALL_TYPE_OP(/) DECLAR_BINARY_ALL_TYPE_OP(>) //DECLAR_BINARY_ALL_TYPE_OP(<) DECLAR_BINARY_ALL_TYPE_OP(==) DECLAR_BINARY_ALL_TYPE_OP(!=) DECLAR_BINARY_ALL_TYPE_OP(>=) DECLAR_BINARY_ALL_TYPE_OP(<=) DECLAR_BINARY_ALL_TYPE_OP(&&) #undef DECLAR_BINARY_ALL_TYPE_OP #define DECLAR_BINARY_INT_TYPE_OP(OP) \ Immediate Immediate::operator OP (const Immediate &right) const { \ /*SCALAR_SAME_TYPE_ASSERT();*/ \ switch (this->getType()) { \ default: \ GBE_ASSERT(0); \ case TYPE_BOOL: return Immediate(*this->data.b OP *right.data.b); \ case TYPE_S8: return Immediate(*this->data.s8 OP *right.data.s8); \ case TYPE_U8: return Immediate(*this->data.u8 OP *right.data.u8); \ case TYPE_S16: return Immediate(*this->data.s16 OP *right.data.s16); \ case TYPE_U16: return Immediate(*this->data.u16 OP *right.data.u16); \ case TYPE_S32: return Immediate(*this->data.s32 OP *right.data.s32); \ case TYPE_U32: return Immediate(*this->data.u32 OP *right.data.u32); \ case TYPE_S64: return Immediate(*this->data.s64 OP *right.data.s64); \ case TYPE_U64: return Immediate(*this->data.u64 OP *right.data.u64); \ }\ return *this;\ } DECLAR_BINARY_INT_TYPE_OP(%) DECLAR_BINARY_INT_TYPE_OP(&) DECLAR_BINARY_INT_TYPE_OP(|) DECLAR_BINARY_INT_TYPE_OP(^) #undef DECLAR_BINARY_INT_TYPE_OP #define DECLAR_BINARY_ASHIFT_OP(OP) \ Immediate Immediate::operator OP (const Immediate &right) const { \ GBE_ASSERT(this->getType() > TYPE_BOOL && this->getType() <= TYPE_U64); \ int32_t shift = right.getIntegerValue(); \ if (shift == 0) \ return *this; \ else \ switch (this->getType()) { \ default: \ GBE_ASSERT(0); \ case TYPE_S8: return Immediate((*this->data.s8 OP shift)); \ case TYPE_U8: return Immediate((*this->data.u8 OP shift)); \ case TYPE_S16: return Immediate((*this->data.s16 OP shift)); \ case TYPE_U16: return Immediate((*this->data.u16 OP shift)); \ case TYPE_S32: return Immediate((*this->data.s32 OP shift)); \ case TYPE_U32: return Immediate((*this->data.u32 OP shift)); \ case TYPE_S64: return Immediate((*this->data.s64 OP shift)); \ case TYPE_U64: return Immediate((*this->data.u64 OP shift)); \ } \ } DECLAR_BINARY_ASHIFT_OP(>>) DECLAR_BINARY_ASHIFT_OP(<<) #undef 
DECLAR_BINARY_ASHIFT_OP Immediate Immediate::lshr (const Immediate &left, const Immediate &right) { GBE_ASSERT(left.getType() > TYPE_BOOL && left.getType() <= TYPE_U64); int32_t shift = right.getIntegerValue(); if (shift == 0) return left; else switch (left.getType()) { default: GBE_ASSERT(0); case TYPE_S8: case TYPE_U8: return Immediate((*left.data.u8 >> shift)); case TYPE_S16: case TYPE_U16: return Immediate((*left.data.u16 >> shift)); case TYPE_S32: case TYPE_U32: return Immediate((*left.data.u32 >> shift)); case TYPE_S64: case TYPE_U64: return Immediate((*left.data.u64 >> shift)); } } Immediate Immediate::less (const Immediate &left, const Immediate &right) { GBE_ASSERT(left.getType() > TYPE_BOOL && left.getType() <= TYPE_DOUBLE); switch (left.getType()) { default: GBE_ASSERT(0); case TYPE_S8: return Immediate(*left.data.s8 < *right.data.s8); case TYPE_U8: return Immediate(*left.data.u8 < *right.data.u8); case TYPE_S16: return Immediate(*left.data.s16 < *right.data.s16); case TYPE_U16: return Immediate(*left.data.u16 < *right.data.u16); case TYPE_S32: return Immediate(*left.data.s32 < *right.data.s32); case TYPE_U32: return Immediate(*left.data.u32 < *right.data.u32); case TYPE_S64: return Immediate(*left.data.s64 < *right.data.s64); case TYPE_U64: return Immediate(*left.data.u64 < *right.data.u64); case TYPE_FLOAT: return Immediate(*left.data.f32 < *right.data.f32); case TYPE_HALF: return Immediate(*left.data.f16 < *right.data.f16); case TYPE_DOUBLE: return Immediate(*left.data.f64 < *right.data.f64); } } Immediate Immediate::extract (const Immediate &left, const Immediate &right, Type dstType) { GBE_ASSERT(left.getType() > TYPE_BOOL && left.getType() <= TYPE_DOUBLE); GBE_ASSERT(dstType == left.getType()); uint32_t index = right.getIntegerValue(); GBE_ASSERT(index >= 0 && index < left.getElemNum()); if (left.type != IMM_TYPE_COMP) { switch (left.getType()) { default: GBE_ASSERT(0); case TYPE_BOOL: return Immediate(left.data.b[index]); case TYPE_S8: return Immediate(left.data.s8[index]); case TYPE_U8: return Immediate(left.data.u8[index]); case TYPE_S16: return Immediate(left.data.s16[index]); case TYPE_U16: return Immediate(left.data.u16[index]); case TYPE_S32: return Immediate(left.data.s32[index]); case TYPE_U32: return Immediate(left.data.u32[index]); case TYPE_S64: return Immediate(left.data.s64[index]); case TYPE_U64: return Immediate(left.data.u64[index]); case TYPE_FLOAT: return Immediate(left.data.f32[index]); case TYPE_HALF: return Immediate(left.data.f16[index]); case TYPE_DOUBLE: return Immediate(left.data.f64[index]); } } else return *left.data.immVec[index]; } Immediate::Immediate(ImmOpCode op, const Immediate &left, const Immediate &right, Type dstType) { switch (op) { default: GBE_ASSERT(0 && "unsupported imm op\n"); case IMM_ADD: *this = left + right; break; case IMM_SUB: *this = left - right; break; case IMM_MUL: *this = left * right; break; case IMM_DIV: *this = left / right; break; case IMM_AND: *this = left & right; break; case IMM_OR: *this = left | right; break; case IMM_XOR: *this = left ^ right; break; case IMM_REM: { if (left.getType() > TYPE_BOOL && left.getType() <= TYPE_HALF) *this = left % right; else if (left.getType() == TYPE_FLOAT && right.getType() == TYPE_FLOAT) { *this = Immediate(left); *this->data.f32 = fmodf(left.getFloatValue(), right.getFloatValue()); } else if (left.getType() == TYPE_DOUBLE && right.getType() == TYPE_DOUBLE) { *this = Immediate(left); *this->data.f64 += fmod(left.getDoubleValue(), right.getDoubleValue()); } else GBE_ASSERT(0); 
break; } case IMM_LSHR: { if (left.getElemNum() == 1) *this = lshr(left, right); else { GBE_ASSERT(right.getIntegerValue() <= (left.getElemNum() * left.getTypeSize() * 8)); GBE_ASSERT(right.getIntegerValue() % (left.getTypeSize() * 8) == 0); copy(left, right.getIntegerValue() / (left.getTypeSize() * 8), left.getElemNum()); } break; } case IMM_ASHR: { if (left.getElemNum() == 1) *this = left >> right; else { GBE_ASSERT(0 && "Doesn't support ashr on array constant."); copy(left, right.getIntegerValue() / (left.getTypeSize() * 8), left.getElemNum()); } break; } case IMM_SHL: { if (left.getElemNum() == 1) *this = left << right; else { GBE_ASSERT(right.getIntegerValue() <= (left.getElemNum() * left.getTypeSize() * 8)); GBE_ASSERT(right.getIntegerValue() % (left.getTypeSize() * 8) == 0); copy(left, -right.getIntegerValue() / (left.getTypeSize() * 8), left.getElemNum()); } break; } case IMM_OEQ: *this = left == right; break; case IMM_ONE: *this = left != right; break; case IMM_OLE: *this = left <= right; break; case IMM_OGE: *this = left >= right; break; case IMM_OLT: *this = less(left, right); break; case IMM_OGT: *this = left > right; break; case IMM_ORD: *this = (left == left) && (right == right); break; case IMM_EXTRACT: *this = extract(left, right, dstType); break; } // If the dst type is large int, we will not change the imm type to large int. GBE_ASSERT(type == (ImmType)dstType || dstType == TYPE_LARGE_INT || dstType == TYPE_BOOL); } Immediate::Immediate(const vector immVec, Type dstType) { if (immVec.size() == 1) { *this = *immVec[0]; } else if (!(immVec[0]->isCompType()) && immVec[0]->elemNum == 1) { this->type = (ImmType)dstType; this->elemNum = immVec.size(); if (immVec[0]->getTypeSize() * immVec.size() < 8) this->data.p = &this->defaultData; else this->data.p = malloc(immVec[0]->getTypeSize() * immVec.size()); uint8_t *p = (uint8_t*)this->data.p; for(uint32_t i = 0; i < immVec.size(); i++) { GBE_ASSERT(immVec[i]->type == immVec[0]->type && immVec[i]->elemNum == 1); memcpy(p, immVec[i]->data.p, immVec[i]->getTypeSize()); p += immVec[i]->getTypeSize(); } } else { GBE_ASSERT(0); this->type = IMM_TYPE_COMP; if (immVec.size() * sizeof(Immediate*) < 8) this->data.p = &this->defaultData; else this->data.p = malloc(immVec.size() * sizeof(Immediate*)); this->elemNum = immVec.size(); for(uint32_t i = 0; i < immVec.size(); i++) this->data.immVec[i] = immVec[i]; } defaultData = 0ull; } // operator = and copy() are only called from constructor functions // which this never hold a memory pointer, we don't need to bother // to check the data.p before assignment. Immediate & Immediate::operator= (const Immediate & other) { if (this != &other) { type = other.type; elemNum = other.elemNum; if (other.data.p != &other.defaultData) { data.p = malloc(other.elemNum * other.getTypeSize()); memcpy(data.p, other.data.p, other.elemNum * other.getTypeSize()); } else { defaultData = other.defaultData; data.p = &defaultData; } } return *this; } void Immediate::copy(const Immediate &other, int32_t offset, uint32_t num) { if (this != &other) { if (other.type == IMM_TYPE_COMP && num == 1) { GBE_ASSERT(offset >= 0 && offset <= (int32_t)other.elemNum); *this = *other.data.immVec[offset]; return; } type = other.type; elemNum = num; if (num * other.getTypeSize() < 8) data.p = &defaultData; else data.p = malloc(num * other.getTypeSize()); uint8_t* datap = (uint8_t*)data.p; memset(datap, 0, num * other.getTypeSize()); if (offset < 0) { datap += (-offset) * other.getTypeSize(); num -= num < (uint32_t)(-offset) ? 
num : (-offset); offset = 0; } else if (offset > 0 && num > 1) { GBE_ASSERT((int32_t)num > offset); num -= offset; } memcpy(datap, (uint8_t*)other.data.p + offset * other.getTypeSize(), num * other.getTypeSize()); } } Beignet-1.3.2-Source/backend/src/ir/half.hpp000664 001750 001750 00000004154 13161142102 017666 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file half.hpp * */ #ifndef __GBE_IR_HALF_HPP__ #define __GBE_IR_HALF_HPP__ #include "llvm/ADT/APFloat.h" namespace gbe { namespace ir { /* Because there is no builtin half float data type for GCC on X86 platform, we need to generate a half class to implement all the OP and CONV for half float using LLVM's APFloat ADT. */ class half { private: uint16_t val; public: half(uint16_t v) : val(v) {}; static half convToHalf(uint16_t u16); static half convToHalf(int16_t v16); half(const half& other) { this->val = other.val; }; uint16_t getVal(void) { return val; }; operator float (void) const; operator double (void) const; operator uint16_t (void) const; operator int16_t (void) const; half operator+ (const half &) const; half operator- (const half &) const; half operator* (const half &) const; half operator/ (const half &) const; half operator% (const half &) const; bool operator> (const half &) const; bool operator< (const half &) const; bool operator== (const half &) const; bool operator!= (const half &) const; bool operator>= (const half &) const; bool operator<= (const half &) const; bool operator&& (const half &) const; bool operator|| (const half &) const; }; } /* namespace ir */ } /* namespace gbe */ #endif /* End of __GBE_IR_HALF_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/reloc.cpp000664 001750 001750 00000004301 13161142102 020045 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file constant.hpp * * \author Benjamin Segovia */ #include "reloc.hpp" namespace gbe { namespace ir { #define OUT_UPDATE_SZ(elt) SERIALIZE_OUT(elt, outs, ret_size) #define IN_UPDATE_SZ(elt) DESERIALIZE_IN(elt, ins, total_size) /*! Implements the serialization. 
*/ uint32_t RelocTable::serializeToBin(std::ostream& outs) { uint32_t ret_size = 0; uint32_t sz = 0; OUT_UPDATE_SZ(magic_begin); sz = getCount(); OUT_UPDATE_SZ(sz); RelocEntry entry(0, 0); for (uint32_t i = 0; i < sz; ++i) { entry = entries[i]; OUT_UPDATE_SZ(entry.refOffset); OUT_UPDATE_SZ(entry.defOffset); } OUT_UPDATE_SZ(magic_end); OUT_UPDATE_SZ(ret_size); return ret_size; } uint32_t RelocTable::deserializeFromBin(std::istream& ins) { uint32_t total_size = 0; uint32_t magic; uint32_t refOffset; uint32_t defOffset; uint32_t sz = 0; IN_UPDATE_SZ(magic); if (magic != magic_begin) return 0; IN_UPDATE_SZ(sz); //regMap for (uint32_t i = 0; i < sz; i++) { IN_UPDATE_SZ(refOffset); IN_UPDATE_SZ(defOffset); addEntry(refOffset, defOffset); } IN_UPDATE_SZ(magic); if (magic != magic_end) return 0; uint32_t total_bytes; IN_UPDATE_SZ(total_bytes); if (total_bytes + sizeof(total_size) != total_size) return 0; return total_size; } } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/profile.cpp000664 001750 001750 00000012570 13161142102 020410 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file profile.hpp * \author Benjamin Segovia */ #include "ir/profile.hpp" #include "ir/function.hpp" #include "sys/platform.hpp" namespace gbe { namespace ir { namespace ocl { const char *specialRegMean[] = { "local_id_0", "local_id_1", "local_id_2", "group_id_0", "group_id_1", "group_id_2", "num_groups_0", "num_groups_1", "num_groups_2", "local_size_0", "local_size_1", "local_size_2", "enqueued_local_size_0", "enqueued_local_size_1", "enqueued_local_size_2", "global_size_0", "global_size_1", "global_size_2", "global_offset_0", "global_offset_1", "global_offset_2", "stack_pointer", "stack_buffer", "block_ip", "barrier_id", "thread_number", "work_dimension", "zero", "one", "retVal", "dwblockip", "profiling_buffer_pointer", "profiling_timestamps0", "profiling_timestamps1", "profiling_timestamps2", "profiling_timestamps3", "profiling_timestamps4", "threadid", "constant_addrspace_start", "stack_size", "enqueue_buffer_pointer", }; #if GBE_DEBUG #define DECL_NEW_REG(FAMILY, REG, ...) \ r = fn.newRegister(FAMILY, __VA_ARGS__); \ GBE_ASSERT(r == REG); #else #define DECL_NEW_REG(FAMILY, REG, ...) 
\ fn.newRegister(FAMILY, __VA_ARGS__); #endif /* GBE_DEBUG */ static void init(Function &fn) { IF_DEBUG(Register r); DECL_NEW_REG(FAMILY_DWORD, lid0, 0, GBE_CURBE_LOCAL_ID_X); DECL_NEW_REG(FAMILY_DWORD, lid1, 0, GBE_CURBE_LOCAL_ID_Y); DECL_NEW_REG(FAMILY_DWORD, lid2, 0, GBE_CURBE_LOCAL_ID_Z); DECL_NEW_REG(FAMILY_DWORD, groupid0, 1); DECL_NEW_REG(FAMILY_DWORD, groupid1, 1); DECL_NEW_REG(FAMILY_DWORD, groupid2, 1); DECL_NEW_REG(FAMILY_DWORD, numgroup0, 1, GBE_CURBE_GROUP_NUM_X); DECL_NEW_REG(FAMILY_DWORD, numgroup1, 1, GBE_CURBE_GROUP_NUM_Y); DECL_NEW_REG(FAMILY_DWORD, numgroup2, 1, GBE_CURBE_GROUP_NUM_Z); DECL_NEW_REG(FAMILY_DWORD, lsize0, 1, GBE_CURBE_LOCAL_SIZE_X); DECL_NEW_REG(FAMILY_DWORD, lsize1, 1, GBE_CURBE_LOCAL_SIZE_Y); DECL_NEW_REG(FAMILY_DWORD, lsize2, 1, GBE_CURBE_LOCAL_SIZE_Z); DECL_NEW_REG(FAMILY_DWORD, enqlsize0, 1, GBE_CURBE_ENQUEUED_LOCAL_SIZE_X); DECL_NEW_REG(FAMILY_DWORD, enqlsize1, 1, GBE_CURBE_ENQUEUED_LOCAL_SIZE_Y); DECL_NEW_REG(FAMILY_DWORD, enqlsize2, 1, GBE_CURBE_ENQUEUED_LOCAL_SIZE_Z); DECL_NEW_REG(FAMILY_DWORD, gsize0, 1, GBE_CURBE_GLOBAL_SIZE_X); DECL_NEW_REG(FAMILY_DWORD, gsize1, 1, GBE_CURBE_GLOBAL_SIZE_Y); DECL_NEW_REG(FAMILY_DWORD, gsize2, 1, GBE_CURBE_GLOBAL_SIZE_Z); DECL_NEW_REG(FAMILY_DWORD, goffset0, 1, GBE_CURBE_GLOBAL_OFFSET_X); DECL_NEW_REG(FAMILY_DWORD, goffset1, 1, GBE_CURBE_GLOBAL_OFFSET_Y); DECL_NEW_REG(FAMILY_DWORD, goffset2, 1, GBE_CURBE_GLOBAL_OFFSET_Z); if(fn.getOclVersion() >= 200) { DECL_NEW_REG(FAMILY_QWORD, stackptr, 0); } else { DECL_NEW_REG(FAMILY_DWORD, stackptr, 0); } DECL_NEW_REG(FAMILY_QWORD, stackbuffer, 1, GBE_CURBE_EXTRA_ARGUMENT, GBE_STACK_BUFFER); DECL_NEW_REG(FAMILY_WORD, blockip, 0, GBE_CURBE_BLOCK_IP); DECL_NEW_REG(FAMILY_DWORD, barrierid, 1); DECL_NEW_REG(FAMILY_DWORD, threadn, 1, GBE_CURBE_THREAD_NUM); DECL_NEW_REG(FAMILY_DWORD, workdim, 1, GBE_CURBE_WORK_DIM); DECL_NEW_REG(FAMILY_DWORD, zero, 1); DECL_NEW_REG(FAMILY_DWORD, one, 1); DECL_NEW_REG(FAMILY_WORD, retVal, 1); DECL_NEW_REG(FAMILY_DWORD, dwblockip, 0, GBE_CURBE_DW_BLOCK_IP); DECL_NEW_REG(FAMILY_QWORD, profilingbptr, 1, GBE_CURBE_PROFILING_BUF_POINTER); DECL_NEW_REG(FAMILY_DWORD, profilingts0, 0, GBE_CURBE_PROFILING_TIMESTAMP0); DECL_NEW_REG(FAMILY_DWORD, profilingts1, 0, GBE_CURBE_PROFILING_TIMESTAMP1); DECL_NEW_REG(FAMILY_DWORD, profilingts2, 0, GBE_CURBE_PROFILING_TIMESTAMP2); DECL_NEW_REG(FAMILY_DWORD, profilingts3, 0, GBE_CURBE_PROFILING_TIMESTAMP3); DECL_NEW_REG(FAMILY_DWORD, profilingts4, 0, GBE_CURBE_PROFILING_TIMESTAMP4); DECL_NEW_REG(FAMILY_DWORD, threadid, 1, GBE_CURBE_THREAD_ID); DECL_NEW_REG(FAMILY_QWORD, constant_addrspace, 1, GBE_CURBE_CONSTANT_ADDRSPACE); DECL_NEW_REG(FAMILY_QWORD, stacksize, 1, GBE_CURBE_STACK_SIZE); DECL_NEW_REG(FAMILY_QWORD, enqueuebufptr, 1, GBE_CURBE_ENQUEUE_BUF_POINTER); } #undef DECL_NEW_REG } /* namespace ocl */ void initProfile(Function &fn) { const Profile profile = fn.getProfile(); switch (profile) { case PROFILE_C: GBE_ASSERTM(false, "Unsupported profile"); break; case PROFILE_OCL: ocl::init(fn); }; } } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/value.hpp000664 001750 001750 00000027144 13161142102 020074 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. 
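The specialRegMean[] table and the DECL_NEW_REG calls in init() are kept in lockstep: in debug builds the macro asserts that each freshly allocated register matches its predefined index, so the special registers always occupy the first slots of an OCL function's register file in table order. A sketch of that invariant, assuming the ocl::lid0/ocl::workdim register constants declared in ir/profile.hpp (compiles only inside the Beignet backend tree):

    #include "ir/function.hpp"
    #include "ir/profile.hpp"

    static void checkSpecialRegs(const gbe::ir::Function &fn) {
      using namespace gbe::ir;
      GBE_ASSERT(fn.isSpecialReg(ocl::lid0));    // "local_id_0", register 0
      GBE_ASSERT(fn.isSpecialReg(ocl::workdim)); // "work_dimension"
    }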
* * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file value.hpp * \author Benjamin Segovia */ #ifndef __GBE_IR_VALUE_HPP__ #define __GBE_IR_VALUE_HPP__ #include "ir/instruction.hpp" #include "ir/function.hpp" #include "sys/set.hpp" #include "sys/map.hpp" namespace gbe { namespace ir { // Make UD-Chain and DU-Chain computations faster and easier class Liveness; /*! A value definition is a destination of an instruction or a function * argument. Since we support multiple destinations, we also add the * destination ID. */ class ValueDef { public: /*! Discriminates the kind of values */ enum Type : uint32_t { DEF_FN_ARG = 0, DEF_FN_PUSHED = 1, DEF_INSN_DST = 2, DEF_SPECIAL_REG = 3 }; /*! Build a value from an instruction destination */ explicit ValueDef(const Instruction *insn, uint32_t dstID = 0u) : type(DEF_INSN_DST) { this->data.insn = insn; this->data.dstID = dstID; } /*! Build a value from a function argument */ explicit ValueDef(const FunctionArgument *arg) : type(DEF_FN_ARG) { this->data.arg = arg; } /*! Build a value from a pushed register */ explicit ValueDef(const PushLocation *pushed) : type(DEF_FN_PUSHED) { this->data.pushed = pushed; } /*! Build a value from a special register */ explicit ValueDef(const Register ®) : type(DEF_SPECIAL_REG) { this->data.regID = uint32_t(reg); } /*! Get the type of the value */ INLINE Type getType(void) const { return type; } /*! Get the instruction (only if this is a instruction value) */ INLINE const Instruction *getInstruction(void) const { GBE_ASSERT(type == DEF_INSN_DST); return data.insn; } /*! Get the destination ID (only if this is a instruction value) */ INLINE uint32_t getDstID(void) const { GBE_ASSERT(type == DEF_INSN_DST); return data.dstID; } /*! Get the function input (only if this is a function argument) */ INLINE const FunctionArgument *getFunctionArgument(void) const { GBE_ASSERT(type == DEF_FN_ARG); return data.arg; } /*! Get the pushed location */ INLINE const PushLocation *getPushLocation(void) const { GBE_ASSERT(type == DEF_FN_PUSHED); return data.pushed; } /*! Get the special register */ INLINE Register getSpecialReg(void) const { GBE_ASSERT(type == DEF_SPECIAL_REG); return Register(data.regID); } /*! Retrieve the register associated to the definition */ INLINE Register getRegister(void) const { if (type == DEF_SPECIAL_REG) return Register(data.regID); else if (type == DEF_FN_ARG) return data.arg->reg; else if (type == DEF_FN_PUSHED) return data.pushed->getRegister(); else return data.insn->getDst(data.dstID); } private: /*! Instruction or function argument */ union Data { /*! Instruction destination or ... */ struct { const Instruction *insn; //getSrc(srcID); } private: const Instruction *insn; //!< Instruction where the value is used uint32_t srcID; //!< Index of the source in the instruction GBE_CLASS(ValueUse); // Use gbe allocators }; /*! 
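ValueDef is a hand-rolled tagged union: a Type discriminant next to a union, with getRegister() dispatching on the tag. The same pattern in a self-contained modern-C++ analogue (illustrative only, not Beignet code):

    #include <cstdint>
    #include <cstdio>
    #include <variant>

    // A definition is either an instruction destination or a special register.
    struct InsnDst    { uintptr_t insn; uint32_t dstID; };
    struct SpecialReg { uint32_t regID; };
    using Def = std::variant<InsnDst, SpecialReg>;

    int main() {
      Def d = SpecialReg{7};
      if (auto *s = std::get_if<SpecialReg>(&d))
        std::printf("special reg %u\n", s->regID); // prints "special reg 7"
    }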
Compare two value uses (used in maps) */ INLINE bool operator< (const ValueUse &use0, const ValueUse &use1) { const Instruction *insn0 = use0.getInstruction(); const Instruction *insn1 = use1.getInstruction(); if (insn0 != insn1) return uintptr_t(insn0) < uintptr_t(insn1); const uint32_t src0 = use0.getSrcID(); const uint32_t src1 = use1.getSrcID(); return src0 < src1; } /*! All uses of a definition */ typedef set UseSet; /*! All possible definitions for a use */ typedef set DefSet; /*! Get the chains (in both directions) for the complete program. This data * structure is unfortunately way too brutal. Using std::sets all over the * place just burns a huge amount of memory. There is work to do to decrease * the memory footprint */ class FunctionDAG : public NonCopyable { public: /*! Build the complete DU/UD graphs for the program included in liveness */ FunctionDAG(Liveness &liveness); /*! Free all the resources */ ~FunctionDAG(void); /*! Get the du-chain for the definition */ const UseSet &getUse(const ValueDef &def) const; /*! Get the du-chain for the given instruction and destination */ const UseSet &getUse(const Instruction *insn, uint32_t dstID) const; /*! Get the du-chain for the given function input */ const UseSet &getUse(const FunctionArgument *arg) const; /*! Get the du-chain for the given pushed location */ const UseSet &getUse(const PushLocation *pushed) const; /*! Get the du-chain for the given special register */ const UseSet &getUse(const Register ®) const; /*! Get the ud-chain for the given use */ const DefSet &getDef(const ValueUse &use) const; /*! Get the ud-chain for the instruction and source */ const DefSet &getDef(const Instruction *insn, uint32_t srcID) const; /*! Get the pointer to the definition *as stored in the DAG* */ const ValueDef *getDefAddress(const ValueDef &def) const; /*! Get the pointer to the definition *as stored in the DAG* */ const ValueDef *getDefAddress(const PushLocation *pushed) const; /*! Get the pointer to the definition *as stored in the DAG* */ const ValueDef *getDefAddress(const Instruction *insn, uint32_t dstID) const; /*! Get the pointer to the definition *as stored in the DAG* */ const ValueDef *getDefAddress(const FunctionArgument *input) const; /*! Get the pointer to the definition *as stored in the DAG* */ const ValueDef *getDefAddress(const Register ®) const; /*! Get the pointer to the use *as stored in the DAG* */ const ValueUse *getUseAddress(const Instruction *insn, uint32_t srcID) const; /*! Get the set of all uses for the register */ const UseSet *getRegUse(const Register ®) const; /*! Get the set of all definitions for the register */ const DefSet *getRegDef(const Register ®) const; /*! Get the function we have the graph for */ INLINE const Function &getFunction(void) const { return fn; } /*! The DefSet for each definition use */ typedef map UDGraph; /*! The UseSet for each definition */ typedef map DUGraph; /*! get register's use and define BB set */ void getRegUDBBs(Register r, set &BBs) const; // check whether two register interering in the specific BB. // This function must be called at the following conditions: // 1. The outReg is in the BB's liveout set and not in the livein set. // 2. The inReg is in the BB's livein set but not in the livout set. bool interfere(const BasicBlock *bb, Register inReg, Register outReg) const; // check whether two register interfering to each other. 
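FunctionDAG keeps both directions at once: duGraph maps each definition to every use it reaches, and udGraph maps each use back to its possible definitions. A toy, self-contained illustration of building a def-to-uses map over straight-line code (a hypothetical mini-IR, not the Beignet classes):

    #include <cstdio>
    #include <map>
    #include <set>
    #include <vector>

    struct Insn { int dst; std::vector<int> srcs; }; // registers as ints

    int main() {
      // r2 = r0 + r1 ; r3 = r2 * r2
      std::vector<Insn> code = {{2, {0, 1}}, {3, {2, 2}}};
      std::map<int, std::set<size_t>> duChain; // reg -> insns that read it
      for (size_t i = 0; i < code.size(); ++i)
        for (int s : code[i].srcs) duChain[s].insert(i);
      std::printf("r2 used by %zu insn(s)\n", duChain[2].size()); // 1
    }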
// This function must be called at the following conditions: // r0 and r1 are both not a local variable which means they have information // in the liveness object. bool interfere(const Liveness &liveness, Register r0, Register r1) const; /*! check whether two registers which are both in liveout set interfering in the current BB. */ bool interfereLiveout(const BasicBlock *bb, Register r0, Register r1) const; /*! check whether two registers which are both in livein set interfering in the current BB. */ bool interfereLivein(const BasicBlock *bb, Register r0, Register r1) const; private: UDGraph udGraph; //!< All the UD chains DUGraph duGraph; //!< All the DU chains DefSet *udEmpty; //!< Void use set UseSet *duEmpty; //!< Void def set ValueDef *undefined; //!< Undefined value map useName; //!< Get the ValueUse pointer from the value map defName; //!< Get the ValueDef pointer from the value map regUse; //!< All uses of registers map regDef; //!< All defs of registers DECL_POOL(ValueDef, valueDefPool); //!< Fast ValueDef allocation DECL_POOL(ValueUse, valueUsePool); //!< Fast ValueUse allocation DECL_POOL(DefSet, udChainPool); //!< Fast DefSet allocation DECL_POOL(UseSet, duChainPool); //!< Fast UseSet allocation const Function &fn; //!< Function we are referring to GBE_CLASS(FunctionDAG); // Use internal allocators }; /*! Pretty print of the function DAG */ std::ostream &operator<< (std::ostream &out, const FunctionDAG &dag); } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_VALUE_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/instruction.hpp000664 001750 001750 00000105057 13161142102 021341 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file instruction.hpp * \author Benjamin Segovia */ #ifndef __GBE_IR_INSTRUCTION_HPP__ #define __GBE_IR_INSTRUCTION_HPP__ #include "ir/register.hpp" #include "ir/immediate.hpp" #include "ir/type.hpp" #include "sys/platform.hpp" #include "sys/intrusive_list.hpp" #include #define MAX_MIXED_POINTER 4 namespace gbe { namespace ir { struct BTI { uint8_t isConst; // whether fixed bti union { Register reg; // mixed reg unsigned short imm; // fixed bti }; BTI() : isConst(0) { } ~BTI() {} }; /*! All opcodes */ enum Opcode : uint8_t { #define DECL_INSN(INSN, FAMILY) OP_##INSN, #include "ir/instruction.hxx" #undef DECL_INSN OP_INVALID }; /*! Different memory spaces */ enum AddressSpace : uint8_t { MEM_GLOBAL = 0, //!< Global memory (a la OCL) MEM_LOCAL, //!< Local memory (thread group memory) MEM_CONSTANT, //!< Immutable global memory MEM_PRIVATE, //!< Per thread private memory MEM_MIXED, //!< mixed address space pointer. MEM_GENERIC, //!< mixed address space pointer. 
MEM_INVALID }; enum AddressMode : uint8_t { AM_DynamicBti = 0, AM_Stateless, AM_StaticBti, AM_INVALID }; enum AtomicOps { ATOMIC_OP_AND = 1, ATOMIC_OP_OR = 2, ATOMIC_OP_XOR = 3, ATOMIC_OP_XCHG = 4, ATOMIC_OP_INC = 5, ATOMIC_OP_DEC = 6, ATOMIC_OP_ADD = 7, ATOMIC_OP_SUB = 8, ATOMIC_OP_IMAX = 10, ATOMIC_OP_IMIN = 11, ATOMIC_OP_UMAX = 12, ATOMIC_OP_UMIN = 13, ATOMIC_OP_CMPXCHG = 14, ATOMIC_OP_INVALID }; enum WorkGroupOps { WORKGROUP_OP_ANY = 1, WORKGROUP_OP_ALL = 2, WORKGROUP_OP_BROADCAST = 3, WORKGROUP_OP_REDUCE_ADD = 4, WORKGROUP_OP_REDUCE_MIN = 5, WORKGROUP_OP_REDUCE_MAX = 6, WORKGROUP_OP_INCLUSIVE_ADD = 7, WORKGROUP_OP_INCLUSIVE_MIN = 8, WORKGROUP_OP_INCLUSIVE_MAX = 9, WORKGROUP_OP_EXCLUSIVE_ADD = 10, WORKGROUP_OP_EXCLUSIVE_MIN = 11, WORKGROUP_OP_EXCLUSIVE_MAX = 12, WORKGROUP_OP_INVALID }; /* Vote function per hardware thread */ enum VotePredicate : uint8_t { VOTE_ALL = 0, VOTE_ANY }; /*! Output the memory space */ std::ostream &operator<< (std::ostream &out, AddressSpace addrSpace); /*! A label is identified with an unsigned short */ TYPE_SAFE(LabelIndex, uint32_t) /*! Function class contains the register file and the register tuple. Any * information related to the registers may therefore require a function */ class Function; /*! Contains the stream of instructions */ class BasicBlock; /////////////////////////////////////////////////////////////////////////// /// All public instruction classes as manipulated by all public classes /////////////////////////////////////////////////////////////////////////// /*! Stores instruction internal data and opcode */ class ALIGNED(sizeof(uint64_t)*4) InstructionBase { public: /*! Initialize the instruction from a 8 bytes stream */ INLINE InstructionBase(Opcode op, const char* opaque) { opcode = op; for (uint32_t byte = 0; byte < opaqueSize; ++byte) this->opaque[byte] = opaque[byte]; } /*! Uninitialized instruction */ INLINE InstructionBase(void) {} /*! Get the instruction opcode */ INLINE Opcode getOpcode(void) const { return opcode; } protected: enum { opaqueSize = sizeof(uint64_t)*4-sizeof(uint8_t) }; Opcode opcode; //!< Idendifies the instruction char opaque[opaqueSize]; //!< Remainder of it GBE_CLASS(InstructionBase); //!< Use internal allocators }; /*! Store the instruction description in 32 bytes */ class Instruction : public InstructionBase, public intrusive_list_node { public: /*! Initialize the instruction from a 8 bytes stream */ INLINE Instruction(const char *stream) : InstructionBase(Opcode(stream[0]), &stream[1]) { parent = NULL; } /*! Copy the private fields and give it the same parent */ INLINE Instruction(const Instruction &other) : InstructionBase(other.opcode, other.opaque) { parent = other.parent; } private: /*! To be consistant with copy constructor */ INLINE Instruction &operator= (const Instruction &other) { return *this; } public: /*! Nothing to do here */ INLINE ~Instruction(void) {} /*! Uninitialized instruction */ INLINE Instruction(void) {} /*! Get the number of sources for this instruction */ uint32_t getSrcNum(void) const; /*! Get the number of destination for this instruction */ uint32_t getDstNum(void) const; /*! Get the register index of the given source */ Register getSrc(uint32_t ID = 0u) const; /*! Get the register index of the given destination */ Register getDst(uint32_t ID = 0u) const; /*! Get the register of the given source */ RegisterData getDstData(uint32_t ID = 0u) const; /*! Get the register of the given destination */ RegisterData getSrcData(uint32_t ID = 0u) const; /*! 
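The arithmetic behind InstructionBase is worth spelling out: ALIGNED(sizeof(uint64_t)*4) pins the object to a 32-byte slot, one byte holds the opcode, and opaqueSize = 32 - 1 = 31 bytes remain for the per-family payload that the concrete instruction classes reinterpret. The same trick in standalone form:

    #include <cstdint>

    struct alignas(32) MiniInsnBase {
      uint8_t opcode;
      char opaque[32 - sizeof(uint8_t)]; // 31 payload bytes, as in Beignet
    };
    static_assert(sizeof(MiniInsnBase) == 32,
                  "opcode plus payload must fill exactly one 32-byte slot");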
Set a register in src srcID */ void setSrc(uint32_t srcID, Register reg); /*! Set a register in dst dstID */ void setDst(uint32_t dstID, Register reg); /*! Is there any side effect in the memory sub-system? */ bool hasSideEffect(void) const; /*! Get / set the parent basic block */ BasicBlock *getParent(void) { return parent; } const BasicBlock *getParent(void) const { return parent; } void setParent(BasicBlock *block) { this->parent = block; } /*! Get the function from the parent basic block */ const Function &getFunction(void) const; Function &getFunction(void); /*! Check that the instruction is well formed (type properly match, * registers not of bound and so on). If not well formed, provide a reason * in string why */ bool wellFormed(std::string &why) const; /*! Replace other by this instruction */ void replace(Instruction *other) const; /*! Remove the instruction from the instruction stream */ void remove(void); /* Insert the instruction after the previous one. */ void insert(Instruction *prev, Instruction ** new_ins = NULL); void setDBGInfo(DebugInfo in) { DBGInfo = in; } /*! Indicates if the instruction belongs to instruction type T. Typically, T * can be BinaryInstruction, UnaryInstruction, LoadInstruction and so on */ template INLINE bool isMemberOf(void) const { return T::isClassOf(*this); } /*! max_src used by vme for payload passing and setting */ static const uint32_t MAX_SRC_NUM = 40; static const uint32_t MAX_DST_NUM = 32; DebugInfo DBGInfo; protected: BasicBlock *parent; //!< The basic block containing the instruction GBE_CLASS(Instruction); //!< Use internal allocators }; /*! Output the instruction string in the given stream */ std::ostream &operator<< (std::ostream &out, const Instruction &proxy); /*! Nullary instruction instructions are typed. */ class NullaryInstruction : public Instruction { public: /*! Get the type manipulated by the instruction */ Type getType(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Unary instructions are typed. dst and sources share the same type */ class UnaryInstruction : public Instruction { public: /*! Get the type manipulated by the instruction */ Type getType(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Binary instructions are typed. dst and sources share the same type */ class BinaryInstruction : public Instruction { public: /*! Get the type manipulated by the instruction */ Type getType(void) const; /*! Commutative instructions can allow better optimizations */ bool commutes(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Ternary instructions are typed. dst and sources share the same type */ class TernaryInstruction : public Instruction { public: Type getType(void) const; static bool isClassOf(const Instruction &insn); }; /*! Select instructions writes src0 to dst if cond is true. Otherwise, it * writes src1 */ class SelectInstruction : public Instruction { public: /*! Predicate is in slot 0. So first source to selec is in slot 1 */ static const uint32_t src0Index = 1; /*! Second source to select is in slot 2 */ static const uint32_t src1Index = 2; /*! Get the predicate of the selection instruction */ INLINE Register getPredicate(void) const { return this->getSrc(0); } /*! Get the type of both sources */ Type getType(void) const; /*! 
Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Compare instructions compare anything from the same type and return a * boolean value */ class CompareInstruction : public Instruction { public: /*! Get the type of the source registers */ Type getType(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! BitCast instruction converts from one type to another */ class BitCastInstruction : public Instruction { public: /*! Get the type of the source */ Type getSrcType(void) const; /*! Get the type of the destination */ Type getDstType(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Conversion instruction converts from one type to another */ class ConvertInstruction : public Instruction { public: /*! Get the type of the source */ Type getSrcType(void) const; /*! Get the type of the destination */ Type getDstType(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; class MemInstruction : public Instruction { public: unsigned getSurfaceIndex() const; unsigned getAddressIndex() const; /*! Address space that is manipulated here */ AddressMode getAddressMode() const; Register getBtiReg() const; /*! Return the register that contains the addresses */ Register getAddressRegister() const; AddressSpace getAddressSpace() const; /*! Return the types of the values */ Type getValueType() const; bool isAligned(void) const; void setBtiReg(Register reg); void setSurfaceIndex(unsigned idx); }; /*! Atomic instruction */ class AtomicInstruction : public MemInstruction { public: /*! Where the address register goes */ static const uint32_t addressIndex = 0; /*! Return the atomic function code */ AtomicOps getAtomicOpcode(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Store instruction. First source is the address. Next sources are the * values to store contiguously at the given address */ class StoreInstruction : public MemInstruction { public: /*! Where the address register goes */ static const uint32_t addressIndex = 0; uint32_t getValueNum(void) const; /*! Return the register that contain value valueID */ INLINE Register getValue(uint32_t valueID) const { GBE_ASSERT(valueID < this->getValueNum()); return this->getSrc(valueID + 1u); } /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); /*! Return true if the given instruction is block write */ bool isBlock() const; }; /*! Load instruction. The source is simply the address where to get the data. * The multiple destinations are the contiguous values loaded at the given * address */ class LoadInstruction : public MemInstruction { public: /*! Number of values loaded (ie number of destinations) */ uint32_t getValueNum(void) const; /*! Return the register that contain value valueID */ INLINE Register getValue(uint32_t valueID) const { return this->getDst(valueID); } /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); /*! Return true if the given instruction is block read */ bool isBlock() const; }; /*! Load immediate instruction loads an typed immediate value into the given * register. 
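The memory instructions follow an asymmetric operand convention: a store keeps its address in src 0 and its values in src 1..valueNum (getValue(i) is just getSrc(i + 1)), while a load's values are its destinations. A small helper sketch against the declared API (compiles only inside the Beignet backend tree):

    #include "ir/instruction.hpp"

    // Visit the value registers of a store, skipping the address in src 0.
    static void forEachStoredValue(const gbe::ir::StoreInstruction &st,
                                   void (*visit)(gbe::ir::Register)) {
      for (uint32_t i = 0; i < st.getValueNum(); ++i)
        visit(st.getValue(i));
    }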
Since double and uint64_t values will not fit into an * instruction, the immediate themselves are stored in the function core. * Contrary to regular load instructions, there is only one destination * possible */ class LoadImmInstruction : public Instruction { public: /*! Return the value stored in the instruction */ Immediate getImmediate(void) const; /*! Return the type of the stored value */ Type getType(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Store data in an texture */ class TypedWriteInstruction : public Instruction { public: /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); uint8_t getImageIndex() const; Type getSrcType(void) const; Type getCoordType(void) const; }; /*! Load texels from a texture */ class SampleInstruction : public Instruction { public: uint8_t getImageIndex() const; uint8_t getSamplerIndex(void) const; uint8_t getSamplerOffset(void) const; Type getSrcType(void) const; Type getDstType(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Video motion estimation */ class VmeInstruction : public Instruction { public: uint8_t getImageIndex() const; uint8_t getMsgType() const; Type getSrcType(void) const; Type getDstType(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; typedef union _ImageInfoKey{ _ImageInfoKey(uint8_t i, uint8_t t) : index(i), type(t) {}; _ImageInfoKey(int key) : data(key) {}; struct { uint8_t index; /*! the allocated image index */ uint8_t type; /*! the information type */ }; uint16_t data; } ImageInfoKey; /*! Get image information */ class GetImageInfoInstruction : public Instruction { public: enum { WIDTH = 0, HEIGHT = 1, DEPTH = 2, CHANNEL_DATA_TYPE = 3, CHANNEL_ORDER = 4, }; static INLINE uint32_t getDstNum4Type(int infoType) { switch (infoType) { case WIDTH: case HEIGHT: case DEPTH: case CHANNEL_DATA_TYPE: case CHANNEL_ORDER: return 1; break; default: GBE_ASSERT(0); } return 0; } uint8_t getImageIndex() const; uint32_t getInfoType() const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! calculate the exec time and store it. */ class CalcTimestampInstruction : public Instruction { public: /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); /*! Get the point number of timestamp point */ uint32_t getPointNum(void) const; /*! Get the timestamp type */ uint32_t getTimestamptType(void) const; }; /*! store the profiling information. */ class StoreProfilingInstruction : public Instruction { public: /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); /*! Get the profiling info type */ uint32_t getProfilingType(void) const; /*! Get the BTI index*/ uint32_t getBTI(void) const; }; /*! Branch instruction is the unified way to branch (with or without * predicate) */ class BranchInstruction : public Instruction { public: /*! Indicate if the branch is predicated */ bool isPredicated(void) const; /*! Indicate if the branch is inverse predicated */ bool getInversePredicated(void) const; /*! 
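ImageInfoKey leans on union layout to pack the image index and info type into one 16-bit key. On the little-endian x86 targets Beignet supports, the index lands in the low byte; a standalone miniature (reading data after writing index/type is type punning, a compiler extension in C++, but it is exactly what the source relies on):

    #include <cstdint>
    #include <cstdio>

    typedef union _MiniKey {
      _MiniKey(uint8_t i, uint8_t t) : index(i), type(t) {}
      struct { uint8_t index, type; };
      uint16_t data;
    } MiniKey;

    int main() {
      std::printf("0x%04x\n", MiniKey(3, 1).data); // little-endian: 0x0103
    }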
Return the predicate register (if predicated) */ RegisterData getPredicate(void) const { GBE_ASSERTM(this->isPredicated() == true, "Branch is not predicated"); return this->getSrcData(0); } /*! Return the predicate register index (if predicated) */ Register getPredicateIndex(void) const { GBE_ASSERTM(this->isPredicated() == true, "Branch is not predicated"); return this->getSrc(0); } /*! Return the label index pointed by the branch */ LabelIndex getLabelIndex(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Label instruction are actual no-op but are referenced by branches as their * targets */ class LabelInstruction : public Instruction { public: /*! Return the label index of the instruction */ LabelIndex getLabelIndex(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Texture instruction are used for any texture mapping requests */ class TextureInstruction : public Instruction { public: /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Mapped to OpenCL (mem_fence, read_mem_fence, write_mem_fence, barrier) */ enum { SYNC_WORKGROUP_EXEC = 1<<0, SYNC_LOCAL_READ_FENCE = 1<<1, SYNC_LOCAL_WRITE_FENCE = 1<<2, SYNC_GLOBAL_READ_FENCE = 1<<3, SYNC_GLOBAL_WRITE_FENCE = 1<<4, SYNC_IMAGE_FENCE = 1<<5, SYNC_INVALID = 1<<6 }; /*! 5 bits to encode all possible synchronization capablities */ static const uint32_t syncFieldNum = 6u; /*! When barrier(CLK_LOCAL_MEM_FENCE) is issued */ static const uint32_t syncLocalBarrier = SYNC_WORKGROUP_EXEC |SYNC_LOCAL_WRITE_FENCE | SYNC_LOCAL_READ_FENCE; /*! When barrier(CLK_GLOBAL_MEM_FENCE) is issued */ static const uint32_t syncGlobalBarrier = SYNC_WORKGROUP_EXEC | SYNC_GLOBAL_WRITE_FENCE | SYNC_GLOBAL_READ_FENCE; static const uint32_t syncImageBarrier = SYNC_WORKGROUP_EXEC | SYNC_GLOBAL_WRITE_FENCE | SYNC_GLOBAL_READ_FENCE | SYNC_IMAGE_FENCE; /*! Sync instructions are used to order loads and stores for a given memory * space and/or to serialize threads at a given point in the program */ class SyncInstruction : public Instruction { public: /*! Get the parameters (bitfields) of the sync instructions (see above) */ uint32_t getParameters(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Read one register (8 DWORD) in arf */ class ReadARFInstruction : public Instruction { public: Type getType() const; ir::ARFRegister getARFRegister() const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! simd shuffle */ class SimdShuffleInstruction : public Instruction { public: Type getType(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! return a region of a register, make sure the offset does not exceed the register size */ class RegionInstruction : public Instruction { public: uint32_t getOffset(void) const; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Indirect Move instruction */ class IndirectMovInstruction : public Instruction { public: Type getType(void) const; uint32_t getOffset(void) const; /*! 
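Written out, the barrier masks above are tiny bitfields: a CLK_LOCAL_MEM_FENCE barrier encodes 1|4|2 = 0x07, a global barrier 1|16|8 = 0x19, and the image barrier adds bit 5 for 0x39. A standalone check of that arithmetic:

    #include <cstdint>

    enum : uint32_t {
      EXEC = 1 << 0, LRD = 1 << 1, LWR = 1 << 2,
      GRD = 1 << 3, GWR = 1 << 4, IMG = 1 << 5
    };
    static_assert((EXEC | LWR | LRD) == 0x07, "local barrier mask");
    static_assert((EXEC | GWR | GRD) == 0x19, "global barrier mask");
    static_assert((EXEC | GWR | GRD | IMG) == 0x39, "image barrier mask");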
Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Indirect Move instruction */ class WaitInstruction : public Instruction { public: /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Related to Work Group. */ class WorkGroupInstruction : public Instruction { public: /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); Type getType(void) const; WorkGroupOps getWorkGroupOpcode(void) const; uint32_t getSlmAddr(void) const; }; /*! Related to Sub Group. */ class SubGroupInstruction : public Instruction { public: /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); Type getType(void) const; WorkGroupOps getWorkGroupOpcode(void) const; }; /*! Printf instruction. */ class PrintfInstruction : public Instruction { public: uint32_t getNum(void) const; uint32_t getBti(void) const; Type getType(const Function& fn, uint32_t ID) const; Type getType(uint32_t ID) const { return this->getType(this->getFunction(), ID); }; /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); }; /*! Media Block Read. */ class MediaBlockReadInstruction : public Instruction { public: /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); uint8_t getImageIndex() const; uint8_t getVectorSize() const; Type getType(void) const; }; /*! Media Block Write. */ class MediaBlockWriteInstruction : public Instruction { public: /*! Return true if the given instruction is an instance of this class */ static bool isClassOf(const Instruction &insn); uint8_t getImageIndex() const; uint8_t getVectorSize() const; Type getType(void) const; }; /*! Specialize the instruction. Also performs typechecking first based on the * opcode. Crashes if it fails */ template INLINE T *cast(Instruction *insn) { if(insn->isMemberOf()) return reinterpret_cast(insn); else return NULL; } template INLINE const T *cast(const Instruction *insn) { if(insn->isMemberOf()) return reinterpret_cast(insn); else return NULL; } template INLINE T &cast(Instruction &insn) { GBE_ASSERTM(insn.isMemberOf() == true, "Invalid instruction type"); return reinterpret_cast(insn); } template INLINE const T &cast(const Instruction &insn) { GBE_ASSERTM(insn.isMemberOf() == true, "Invalid instruction type"); return reinterpret_cast(insn); } /*! Indicates if the given opcode belongs the given instruction family */ template struct EqualType {enum {value = false};}; template struct EqualType { enum {value = true};}; template INLINE bool isOpcodeFrom(Opcode op) { switch (op) { #define DECL_INSN(OPCODE, FAMILY) \ case OP_##OPCODE: return EqualType::value; #include "instruction.hxx" #undef DECL_INSN default: NOT_SUPPORTED; return false; } } /////////////////////////////////////////////////////////////////////////// /// All emission functions /////////////////////////////////////////////////////////////////////////// /*! alu0.type dst */ Instruction ALU0(Opcode opcode, Type type, Register dst); /*! simd_size.type dst */ Instruction SIMD_SIZE(Type type, Register dst); /*! simd_id.type dst */ Instruction SIMD_ID(Type type, Register dst); /*! alu1.type dst src */ Instruction ALU1(Opcode opcode, Type type, Register dst, Register src); /*! 
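Both the Opcode enum near the top of this header and isOpcodeFrom() above are stamped out of the same instruction.hxx list with the X-macro idiom: define DECL_INSN, include the list, undefine it, then repeat with a different expansion. A self-contained miniature of the technique:

    #include <cstdio>

    #define MINI_INSN_LIST(X) X(MOV) X(ADD) X(MUL) // normally a shared .hxx

    enum MiniOpcode {
    #define DECL_INSN(INSN) OP_##INSN,
      MINI_INSN_LIST(DECL_INSN)
    #undef DECL_INSN
      OP_INVALID
    };

    static const char *opcodeName(MiniOpcode op) {
      switch (op) {
    #define DECL_INSN(INSN) case OP_##INSN: return #INSN;
        MINI_INSN_LIST(DECL_INSN)
    #undef DECL_INSN
        default: return "invalid";
      }
    }

    int main() { std::printf("%s\n", opcodeName(OP_ADD)); } // prints "ADD"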
mov.type dst src */ Instruction MOV(Type type, Register dst, Register src); /*! cos.type dst src */ Instruction COS(Type type, Register dst, Register src); /*! sin.type dst src */ Instruction SIN(Type type, Register dst, Register src); /*! mul_hi.type dst src */ Instruction MUL_HI(Type type, Register dst, Register src0, Register src1); /*! i64_mul_hi.type dst src */ Instruction I64_MUL_HI(Type type, Register dst, Register src0, Register src1); /*! i64madsat.type dst src */ Instruction I64MADSAT(Type type, Register dst, Tuple src); /*! mad.type dst src */ Instruction MAD(Type type, Register dst, Tuple src); /*! lrp.type dst src */ Instruction LRP(Type type, Register dst, Tuple src); /*! upsample_short.type dst src */ Instruction UPSAMPLE_SHORT(Type type, Register dst, Register src0, Register src1); /*! upsample_int.type dst src */ Instruction UPSAMPLE_INT(Type type, Register dst, Register src0, Register src1); /*! upsample_long.type dst src */ Instruction UPSAMPLE_LONG(Type type, Register dst, Register src0, Register src1); /*! fbh.type dst src */ Instruction FBH(Type type, Register dst, Register src); /*! fbl.type dst src */ Instruction FBL(Type type, Register dst, Register src); /*! cbit.type dst src */ Instruction CBIT(Type type, Register dst, Register src); /*! lzd.type dst src */ Instruction LZD(Type type, Register dst, Register src); /*! hadd.type dst src */ Instruction HADD(Type type, Register dst, Register src0, Register src1); /*! rhadd.type dst src */ Instruction RHADD(Type type, Register dst, Register src0, Register src1); /*! i64hadd.type dst src */ Instruction I64HADD(Type type, Register dst, Register src0, Register src1); /*! i64rhadd.type dst src */ Instruction I64RHADD(Type type, Register dst, Register src0, Register src1); /*! tan.type dst src */ Instruction RCP(Type type, Register dst, Register src); /*! abs.type dst src */ Instruction ABS(Type type, Register dst, Register src); /*! simd_all.type dst src */ Instruction SIMD_ALL(Type type, Register dst, Register src); /*! simd_any.type dst src */ Instruction SIMD_ANY(Type type, Register dst, Register src); /*! log.type dst src */ Instruction LOG(Type type, Register dst, Register src); /*! exp.type dst src */ Instruction EXP(Type type, Register dst, Register src); /*! sqr.type dst src */ Instruction SQR(Type type, Register dst, Register src); /*! rsq.type dst src */ Instruction RSQ(Type type, Register dst, Register src); /*! rndd.type dst src */ Instruction RNDD(Type type, Register dst, Register src); /*! rnde.type dst src */ Instruction RNDE(Type type, Register dst, Register src); /*! rndu.type dst src */ Instruction RNDU(Type type, Register dst, Register src); /*! rndz.type dst src */ Instruction RNDZ(Type type, Register dst, Register src); /*! bswap.type dst src */ Instruction BSWAP(Type type, Register dst, Register src); /*! bfrev.type dst src */ Instruction BFREV(Type type, Register dst, Register src); /*! pow.type dst src0 src1 */ Instruction POW(Type type, Register dst, Register src0, Register src1); /*! mul.type dst src0 src1 */ Instruction MUL(Type type, Register dst, Register src0, Register src1); /*! add.type dst src0 src1 */ Instruction ADD(Type type, Register dst, Register src0, Register src1); /*! addsat.type dst src0 src1 */ Instruction ADDSAT(Type type, Register dst, Register src0, Register src1); /*! sub.type dst src0 src1 */ Instruction SUB(Type type, Register dst, Register src0, Register src1); /*! subsat.type dst src0 src1 */ Instruction SUBSAT(Type type, Register dst, Register src0, Register src1); /*! 
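All of these emission helpers are free factory functions: they only build the Instruction value, and the caller decides placement, typically via BasicBlock::append. A hypothetical sketch (compiles only inside the Beignet backend tree; the register operands would come from Function::newRegister):

    #include "ir/instruction.hpp"

    // Build dst = src0 * src1 as a float multiply; placement is the caller's.
    static gbe::ir::Instruction makeMul(gbe::ir::Register dst,
                                        gbe::ir::Register src0,
                                        gbe::ir::Register src1) {
      return gbe::ir::MUL(gbe::ir::TYPE_FLOAT, dst, src0, src1);
    }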
div.type dst src0 src1 */ Instruction DIV(Type type, Register dst, Register src0, Register src1); /*! rem.type dst src0 src1 */ Instruction REM(Type type, Register dst, Register src0, Register src1); /*! shl.type dst src0 src1 */ Instruction SHL(Type type, Register dst, Register src0, Register src1); /*! shr.type dst src0 src1 */ Instruction SHR(Type type, Register dst, Register src0, Register src1); /*! asr.type dst src0 src1 */ Instruction ASR(Type type, Register dst, Register src0, Register src1); /*! bsf.type dst src0 src1 */ Instruction BSF(Type type, Register dst, Register src0, Register src1); /*! bsb.type dst src0 src1 */ Instruction BSB(Type type, Register dst, Register src0, Register src1); /*! or.type dst src0 src1 */ Instruction OR(Type type, Register dst, Register src0, Register src1); /*! xor.type dst src0 src1 */ Instruction XOR(Type type, Register dst, Register src0, Register src1); /*! and.type dst src0 src1 */ Instruction AND(Type type, Register dst, Register src0, Register src1); /*! sel.type dst {cond, src0, src1} (== src) */ Instruction SEL(Type type, Register dst, Tuple src); /*! eq.type dst src0 src1 */ Instruction EQ(Type type, Register dst, Register src0, Register src1); /*! ne.type dst src0 src1 */ Instruction NE(Type type, Register dst, Register src0, Register src1); /*! lt.type dst src0 src1 */ Instruction LE(Type type, Register dst, Register src0, Register src1); /*! le.type dst src0 src1 */ Instruction LT(Type type, Register dst, Register src0, Register src1); /*! gt.type dst src0 src1 */ Instruction GE(Type type, Register dst, Register src0, Register src1); /*! ge.type dst src0 src1 */ Instruction GT(Type type, Register dst, Register src0, Register src1); /*! ord.type dst src0 src1 */ Instruction ORD(Type type, Register dst, Register src0, Register src1); /*! sub_group_shuffle.type dst src0 src1 */ Instruction SIMD_SHUFFLE(Type type, Register dst, Register src0, Register src1); /*! BITCAST.{dstType <- srcType} dst src */ Instruction BITCAST(Type dstType, Type srcType, Tuple dst, Tuple src, uint8_t dstNum, uint8_t srcNum); /*! cvt.{dstType <- srcType} dst src */ Instruction CVT(Type dstType, Type srcType, Register dst, Register src); /*! sat_cvt.{dstType <- srcType} dst src */ Instruction SAT_CVT(Type dstType, Type srcType, Register dst, Register src); /*! F16TO32.{dstType <- srcType} dst src */ Instruction F16TO32(Type dstType, Type srcType, Register dst, Register src); /*! F32TO16.{dstType <- srcType} dst src */ Instruction F32TO16(Type dstType, Type srcType, Register dst, Register src); /*! atomic dst addr.space {src1 {src2}} */ Instruction ATOMIC(AtomicOps opcode, Type, Register dst, AddressSpace space, Register ptr, Tuple payload, AddressMode, unsigned); Instruction ATOMIC(AtomicOps opcode, Type, Register dst, AddressSpace space, Register ptr, Tuple src, AddressMode, Register); /*! bra labelIndex */ Instruction BRA(LabelIndex labelIndex); /*! (pred) bra labelIndex */ Instruction BRA(LabelIndex labelIndex, Register pred); /*! (pred) if labelIndex */ Instruction IF(LabelIndex labelIndex, Register pred, bool inv_pred=true); /*! else labelIndex */ Instruction ELSE(LabelIndex labelIndex); /*! endif */ Instruction ENDIF(LabelIndex labelIndex); /*! (pred) while labelIndex */ Instruction WHILE(LabelIndex labelIndex, Register pred); /*! ret */ Instruction RET(void); /*! 
load.type.space {dst1,...,dst_valueNum} offset value, {bti} */ Instruction LOAD(Type type, Tuple dst, Register offset, AddressSpace space, uint32_t valueNum, bool dwAligned, AddressMode, unsigned SurfaceIndex, bool isBlock = false); Instruction LOAD(Type type, Tuple dst, Register offset, AddressSpace space, uint32_t valueNum, bool dwAligned, AddressMode, Register bti); /*! store.type.space offset {src1,...,src_valueNum} value {bti}*/ Instruction STORE(Type type, Tuple src, Register offset, AddressSpace space, uint32_t valueNum, bool dwAligned, AddressMode, unsigned SurfaceIndex, bool isBlock = false); Instruction STORE(Type type, Tuple src, Register offset, AddressSpace space, uint32_t valueNum, bool dwAligned, AddressMode, Register bti); /*! loadi.type dst value */ Instruction LOADI(Type type, Register dst, ImmediateIndex value); /*! sync.params... (see Sync instruction) */ Instruction SYNC(uint32_t parameters); Instruction READ_ARF(Type type, Register dst, ARFRegister arf); Instruction REGION(Register dst, Register src, uint32_t offset); Instruction INDIRECT_MOV(Type type, Register dst, Register src0, Register src1, uint32_t offset); /*! typed write */ Instruction TYPED_WRITE(uint8_t imageIndex, Tuple src, uint8_t srcNum, Type srcType, Type coordType); /*! sample textures */ Instruction SAMPLE(uint8_t imageIndex, Tuple dst, Tuple src, uint8_t srcNum, bool dstIsFloat, bool srcIsFloat, uint8_t sampler, uint8_t samplerOffset); /*! video motion estimation */ Instruction VME(uint8_t imageIndex, Tuple dst, Tuple src, uint32_t dstNum, uint32_t srcNum, int msg_type, int vme_search_path_lut, int lut_sub); /*! get image information , such as width/height/depth/... */ Instruction GET_IMAGE_INFO(int infoType, Register dst, uint8_t imageIndex, Register infoReg); /*! label labelIndex */ Instruction LABEL(LabelIndex labelIndex); /*! calculate the execute timestamp for profiling */ Instruction CALC_TIMESTAMP(uint32_t pointNum, uint32_t tsType); /*! calculate the execute timestamp for profiling */ Instruction STORE_PROFILING(uint32_t bti, uint32_t Type); /*! wait */ Instruction WAIT(void); /*! work group */ Instruction WORKGROUP(WorkGroupOps opcode, uint32_t slmAddr, Register dst, Tuple srcTuple, uint8_t srcNum, Type type); /*! sub group */ Instruction SUBGROUP(WorkGroupOps opcode, Register dst, Tuple srcTuple, uint8_t srcNum, Type type); /*! printf */ Instruction PRINTF(Register dst, Tuple srcTuple, Tuple typeTuple, uint8_t srcNum, uint8_t bti, uint16_t num); /*! media block read */ Instruction MBREAD(uint8_t imageIndex, Tuple dst, uint8_t vec_size, Tuple coord, uint8_t srcNum, Type type); /*! media block write */ Instruction MBWRITE(uint8_t imageIndex, Tuple srcTuple, uint8_t srcNum, uint8_t vec_size, Type type); } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_INSTRUCTION_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/sampler.hpp000664 001750 001750 00000006135 13161142102 020420 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. 
* * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file sampler.hpp * * \author Benjamin Segovia */ #ifndef __GBE_IR_SAMPLER_HPP__ #define __GBE_IR_SAMPLER_HPP__ #include "ir/register.hpp" #include "sys/map.hpp" namespace gbe { namespace ir { /*! A sampler set is a set of global samplers which are defined as constant global * sampler or defined in the outermost kernel scope variables. According to the spec * all the variable should have a initialized integer value and can't be modified. */ class Context; class SamplerSet : public Serializable { public: /*! Append the specified sampler and return the allocated offset. * If the speficied sampler is exist, only return the previous offset and * don't append it again. Return -1, if failed.*/ uint8_t append(uint32_t clkSamplerValue, Context *ctx); /*! Append a sampler defined in kernel args. */ uint8_t append(Register samplerArg, Context *ctx); size_t getDataSize(void) { return samplerMap.size(); } size_t getDataSize(void) const { return samplerMap.size(); } void getData(uint32_t *samplers) const { for (map::const_iterator it = samplerMap.begin(); it != samplerMap.end(); ++it) samplers[it->second] = it->first; } void operator = (const SamplerSet& other) { samplerMap.insert(other.samplerMap.begin(), other.samplerMap.end()); } bool empty() const { return samplerMap.empty(); } SamplerSet(const SamplerSet& other) : samplerMap(other.samplerMap.begin(), other.samplerMap.end()) { } SamplerSet() {} static const uint32_t magic_begin = TO_MAGIC('S', 'A', 'M', 'P'); static const uint32_t magic_end = TO_MAGIC('P', 'M', 'A', 'S'); /* format: magic_begin | samplerMap_size | element_1 | ........ | element_n | regMap_size | element_1 | ........ | element_n | magic_end | total_size */ /*! Implements the serialization. */ virtual uint32_t serializeToBin(std::ostream& outs); virtual uint32_t deserializeFromBin(std::istream& ins); virtual void printStatus(int indent, std::ostream& outs); private: uint8_t appendReg(uint32_t key, Context *ctx); map samplerMap; GBE_CLASS(SamplerSet); }; } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_SAMPLER_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/function.hpp000664 001750 001750 00000062645 13173554000 020621 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file function.hpp * \author Benjamin Segovia */ #ifndef __GBE_IR_FUNCTION_HPP__ #define __GBE_IR_FUNCTION_HPP__ #include "ir/immediate.hpp" #include "ir/register.hpp" #include "ir/instruction.hpp" #include "ir/profile.hpp" #include "ir/sampler.hpp" #include "ir/printf.hpp" #include "ir/image.hpp" #include "sys/vector.hpp" #include "sys/set.hpp" #include "sys/map.hpp" #include "sys/alloc.hpp" #include namespace gbe { namespace ir { /*! 
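SamplerSet deduplicates: appending a sampler value that is already present hands back the previously allocated slot instead of growing the map. A usage sketch against the declared API, starting from an empty set (ctx is a placeholder; compiles only inside the Beignet backend tree):

    #include "ir/sampler.hpp"
    #include "sys/platform.hpp" // for GBE_ASSERT

    static void registerTwice(gbe::ir::SamplerSet &set, gbe::ir::Context *ctx,
                              uint32_t clkSamplerValue) {
      uint8_t first  = set.append(clkSamplerValue, ctx);
      uint8_t second = set.append(clkSamplerValue, ctx); // same slot again
      GBE_ASSERT(first == second && set.getDataSize() == 1);
    }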
Commonly used in the CFG */ typedef set BlockSet; class Unit; // Function belongs to a unit /*! Function basic blocks really belong to a function since: * 1 - registers used in the basic blocks belongs to the function register * file * 2 - branches point to basic blocks of the same function */ class BasicBlock : public NonCopyable, public intrusive_list { public: /*! Empty basic block */ BasicBlock(Function &fn); /*! Releases all the instructions */ ~BasicBlock(void); /*! Append a new instruction at the end of the stream */ void append(Instruction &insn); void insertAt(iterator pos, Instruction &insn); /*! Get the parent function */ Function &getParent(void) { return fn; } const Function &getParent(void) const { return fn; } /*! Get the next and previous allocated block */ BasicBlock *getNextBlock(void) const { return this->nextBlock; } BasicBlock *getPrevBlock(void) const { return this->prevBlock; } /*! Get / set the first and last instructions */ Instruction *getFirstInstruction(void) const; Instruction *getLastInstruction(void) const; /*! Get successors and predecessors */ const BlockSet &getSuccessorSet(void) const { return successors; } const BlockSet &getPredecessorSet(void) const { return predecessors; } /*! Get the label index of this block */ LabelIndex getLabelIndex(void) const; /*! Apply the given functor on all instructions */ template INLINE void foreach(const T &functor) { auto it = this->begin(); while (it != this->end()) { auto curr = it++; functor(*curr); } } set undefPhiRegs; set definedPhiRegs; /* these three are used by structure transforming */ public: /* if needEndif is true, it means that this bb is the exit of an * outermost structure, so this block needs another endif to match * the if inserted at the entry of this structure, otherwise this * block is in the middle of a structure, there's no need to insert * extra endif. */ bool needEndif; /* if needIf is true, it means that this bb is the entry of an * outermost structure, so this block needs an if instruction just * like other unstructured bbs. otherwise this block is in the * middle of a structure, there's no need to insert an if. */ bool needIf; /* since we need to insert an if and endif at the entry and exit * bb of an outermost structure respectively, so the endif is not * in the same bb with if, in order to get the endif's position, * we need to store the endif label in the entry bb. */ LabelIndex endifLabel; /* the identified if-then and if-else structure contains more than * one bbs, in order to insert if, else and endif properly, we give * all the IF ELSE and ENDIF a label for convenience. matchingEndifLabel * is used when inserts instruction if and else, and matchingElseLabel * is used when inserts instruction if. */ LabelIndex matchingEndifLabel; LabelIndex matchingElseLabel; /* IR ELSE's target is the matching ENDIF's LabelIndex, thisElseLabel * is used to store the virtual label of the instruction just below * ELSE. */ LabelIndex thisElseLabel; /* betongToStructure is used as a mark of wether this bb belongs to an * identified structure. */ bool belongToStructure; /* isStructureExit and matchingStructureEntry is used for buildJIPs at * backend, isStructureExit is true means the bb is an identified structure's * exit bb, while matchingStructureEntry means the entry bb of the same * identified structure. so if isStructureExit is false then matchingStructureEntry * is meaningless. */ bool isStructureExit; /* This block is an exit point of a loop block. 
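BasicBlock::foreach advances the iterator before invoking the functor, so a visitor may safely remove the instruction it is currently looking at. The same erase-safe iteration pattern over a std::list (illustrative analogue, not Beignet code):

    #include <cstdio>
    #include <list>

    int main() {
      std::list<int> insns = {1, 2, 3, 4};
      for (auto it = insns.begin(); it != insns.end();) {
        auto curr = it++;      // step first...
        if (*curr % 2 == 0)
          insns.erase(curr);   // ...then the visit may erase safely
      }
      std::printf("%zu left\n", insns.size()); // prints "2 left"
    }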
It may not be exit point of the large structure block. */ bool isLoopExit; /* This block has an extra branch in the end of the block. */ bool hasExtraBra; BasicBlock *matchingStructureEntry; /* variable liveout is for if-else structure liveness analysis. eg. we have an sequence of * bbs of 0, 1, 2, 3, 4 and the CFG is as below: * 0 * |\ * 1 \ * | 2 * 4 | * \ / * 3 * we would identify 1 and 4 an sequence structure and 0 1 4 2 an if-else structure. * since we will insert an else instruction at the top of bb 2, we have to add an * unconditional jump at the bottom of bb 4 to bb 2 for executing the inserted else. this * would cause a change of CFG. at origin, bb 2 always executes before bb 4, but after * this insertion, bb 2 may executes after bb 4 which leads to bb 2's livein(i.e. part of * bb 0's liveout) may be destroyed by bb 4. so we inserted the livein of the entry of * else node into all the basic blocks belong to 'then' part while the liveout is * calculated in structural_analysis.cpp:calculateNecessaryLiveout(); */ std::set liveout; /* selfLoop's label. * */ LabelIndex whileLabel; private: friend class Function; //!< Owns the basic blocks BlockSet predecessors; //!< Incoming blocks BlockSet successors; //!< Outgoing blocks BasicBlock *nextBlock; //!< Block allocated just after this one BasicBlock *prevBlock; //!< Block allocated just before this one Function &fn; //!< Function the block belongs to GBE_CLASS(BasicBlock); }; /*! In fine, function input arguments can be pushed from the constant * buffer if they are structures. Other arguments can be images (textures) * and will also require special treatment. */ struct FunctionArgument { enum Type { GLOBAL_POINTER = 0, // __global CONSTANT_POINTER = 1, // __constant LOCAL_POINTER = 2, // __local VALUE = 3, // int, float STRUCTURE = 4, // struct foo IMAGE = 5, // image*d_t SAMPLER = 6, PIPE = 7 // pipe }; struct InfoFromLLVM { // All the info about passed by llvm, using -cl-kernel-arg-info uint32_t addrSpace; std::string typeName; std::string typeBaseName; std::string accessQual; std::string typeQual; std::string argName; // My different from arg->getName() uint32_t typeSize; // only llvm-3.6 or later has kernel_arg_base_type in metadata. 
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR <= 35 bool isImage1dT() const { return typeName.compare("image1d_t") == 0; } bool isImage1dArrayT() const { return typeName.compare("image1d_array_t") == 0; } bool isImage1dBufferT() const { return typeName.compare("image1d_buffer_t") == 0; } bool isImage2dT() const { return typeName.compare("image2d_t") == 0; } bool isImage2dArrayT() const { return typeName.compare("image2d_array_t") == 0; } bool isImage3dT() const { return typeName.compare("image3d_t") == 0; } bool isSamplerType() const { return typeName.compare("sampler_t") == 0; } #else bool isImage1dT() const { return typeBaseName.find("image1d_t") !=std::string::npos; } bool isImage1dArrayT() const { return typeBaseName.find("image1d_array_t") !=std::string::npos; } bool isImage1dBufferT() const { return typeBaseName.find("image1d_buffer_t") !=std::string::npos; } bool isImage2dT() const { return typeBaseName.find("image2d_t") !=std::string::npos; } bool isImage2dArrayT() const { return typeBaseName.find("image2d_array_t") !=std::string::npos; } bool isImage3dT() const { return typeBaseName.find("image3d_t") !=std::string::npos; } bool isSamplerType() const { return typeBaseName.compare("sampler_t") == 0; } #endif bool isImageType() const { return isImage1dT() || isImage1dArrayT() || isImage1dBufferT() || isImage2dT() || isImage2dArrayT() || isImage3dT(); } bool isPipeType() const { return typeQual.compare("pipe") == 0; } }; /*! Create a function input argument */ INLINE FunctionArgument(Type type, Register reg, uint32_t size, const std::string &name, uint32_t align, InfoFromLLVM& info, uint8_t bti) : type(type), reg(reg), size(size), align(align), name(name), info(info), bti(bti) { } Type type; //!< Gives the type of argument we have Register reg; //!< Holds the argument uint32_t size; //!< == sizeof(void*) for ptr, sizeof(elem) for the rest uint32_t align; //!< address alignment for the argument const std::string name; //!< Holds the function name for IR output InfoFromLLVM info; //!< Holds the llvm passed info uint8_t bti; //!< binding table index GBE_STRUCT(FunctionArgument); // Use custom allocator }; /*! Maps the pushed register to the function argument */ struct PushLocation { INLINE PushLocation(const Function &fn, uint32_t argID, uint32_t offset) : fn(fn), argID(argID), offset(offset) {} /*! Get the pushed virtual register */ Register getRegister(void) const; const Function &fn; //!< Function it belongs to uint32_t argID; //!< Function argument uint32_t offset; //!< Offset in the function argument GBE_STRUCT(PushLocation); // Use custom allocator }; /*! For maps and sets */ INLINE bool operator< (const PushLocation &arg0, const PushLocation &arg1) { if (arg0.argID != arg1.argID) return arg0.argID < arg1.argID; return arg0.offset < arg1.offset; } /*! CFG loops */ struct Loop : public NonCopyable { public: Loop(LabelIndex pre, int paren, const vector &in, const vector> &exit) : preheader(pre), parent(paren), bbs(in), exits(exit) {} LabelIndex preheader; int parent; vector bbs; vector> exits; GBE_STRUCT(Loop); }; /*! A function is : * - a register file * - a set of basic block layout into a CGF * - input arguments */ class Function : public NonCopyable { public: /*! Map of all pushed registers */ typedef map PushMap; /*! Map of all pushed location (i.e. part of function argument) */ typedef map LocationMap; /*! Create an empty function */ Function(const std::string &name, const Unit &unit, Profile profile = PROFILE_OCL); /*! 
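The two InfoFromLLVM branches differ in matching strategy: before LLVM 3.6 the exact type name is compared, while later versions search kernel_arg_base_type as a substring because it may carry extra qualifiers. A standalone miniature of the post-3.6 style:

    #include <cstdio>
    #include <string>

    static bool isImage2d(const std::string &typeBaseName) {
      return typeBaseName.find("image2d_t") != std::string::npos;
    }

    int main() {
      std::printf("%d %d\n", isImage2d("image2d_t"), isImage2d("int*")); // 1 0
    }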
  /*! A function is:
   *  - a register file
   *  - a set of basic blocks laid out as a CFG
   *  - input arguments
   */
  class Function : public NonCopyable {
  public:
    /*! Map of all pushed registers */
    typedef map<Register, PushLocation> PushMap;
    /*! Map of all pushed locations (i.e. parts of function arguments) */
    typedef map<PushLocation, Register> LocationMap;
    /*! Create an empty function */
    Function(const std::string &name, const Unit &unit, Profile profile = PROFILE_OCL);
    /*! Release everything *including* the basic block pointers */
    ~Function(void);
    /*! Get the function profile */
    INLINE Profile getProfile(void) const { return profile; }
    /*! Get a new valid register */
    INLINE Register newRegister(RegisterFamily family, bool uniform = false,
                                gbe_curbe_type curbeType = GBE_GEN_REG, int subType = 0) {
      return this->file.append(family, uniform, curbeType, subType);
    }
    /*! Get the function name */
    const std::string &getName(void) const { return name; }
    /*! When set, the back end no longer has any choice for it */
    INLINE void setSimdWidth(uint32_t width) { simdWidth = width; }
    /*! Get the SIMD width (0 if not forced) */
    uint32_t getSimdWidth(void) const { return simdWidth; }
    /*! Extract the register from the register file */
    INLINE RegisterData getRegisterData(Register reg) const { return file.get(reg); }
    /*! Set a register to uniform or non-uniform type. */
    INLINE void setRegisterUniform(Register reg, bool uniform) { file.setUniform(reg, uniform); }
    /*! Return true if the specified register is of uniform type */
    INLINE bool isUniformRegister(Register reg) { return file.isUniform(reg); }
    /*! Set a register as the specified payload type */
    INLINE void setRegPayloadType(Register reg, gbe_curbe_type curbeType, int subType) {
      file.setPayloadType(reg, curbeType, subType);
    }
    /*! Get a register's payload type. */
    INLINE void getRegPayloadType(Register reg, gbe_curbe_type &curbeType, int &subType) const {
      file.getPayloadType(reg, curbeType, subType);
    }
    /*! Check whether a register is a payload register */
    INLINE bool isPayloadReg(Register reg) const { return file.isPayloadReg(reg); }
    /*! Get the register family from the register itself */
    INLINE RegisterFamily getRegisterFamily(Register reg) const {
      return this->getRegisterData(reg).family;
    }
    /*! Get the register from the tuple vector */
    INLINE Register getRegister(Tuple ID, uint32_t which) const { return file.get(ID, which); }
    /*! Set the register in the tuple vector */
    INLINE void setRegister(Tuple ID, uint32_t which, Register reg) { file.set(ID, which, reg); }
    /*! Get the type from the tuple vector */
    INLINE uint8_t getType(Tuple ID, uint32_t which) const { return file.getType(ID, which); }
    /*! Set the type into the tuple vector */
    INLINE void setType(Tuple ID, uint32_t which, uint8_t type) { file.setType(ID, which, type); }
    /*! Get the register file */
    INLINE const RegisterFile &getRegisterFile(void) const { return file; }
    /*! Get the given value, i.e. immediate, from the function */
    INLINE const Immediate &getImmediate(ImmediateIndex ID) const { return immediates[ID]; }
    /*! Create a new immediate and return its index */
    INLINE ImmediateIndex newImmediate(const Immediate &imm) {
      const ImmediateIndex index(this->immediateNum());
      this->immediates.push_back(imm);
      return index;
    }
    /*! Fast allocation / deallocation of instructions */
    DECL_POOL(Instruction, insnPool);
    /*! Get input argument */
    INLINE const FunctionArgument &getArg(uint32_t ID) const {
      GBE_ASSERT(args[ID] != NULL);
      return *args[ID];
    }
    INLINE FunctionArgument &getArg(uint32_t ID) {
      GBE_ASSERT(args[ID] != NULL);
      return *args[ID];
    }
    /*! Get arg ID. */
    INLINE int32_t getArgID(FunctionArgument *requestArg) {
      for (uint32_t ID = 0; ID < args.size(); ID++)
        if (args[ID] == requestArg) return ID;
      GBE_ASSERTM(0, "Failed to get a valid argument ID.");
      return -1;
    }
    /*! Get the number of pushed registers */
    INLINE uint32_t pushedNum(void) const { return pushMap.size(); }
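    /*! Usage sketch (illustrative only, not from the original source; assumes
     *  a valid Unit as declared in ir/unit.hpp):
     *  \code
     *    Function *fn = unit.newFunction("my_kernel");      // NULL if the name exists
     *    Register r0 = fn->newRegister(FAMILY_DWORD);       // 32-bit virtual reg
     *    Register r1 = fn->newRegister(FAMILY_DWORD, true); // uniform variant
     *    fn->setSimdWidth(16);      // force SIMD16 (0 = backend's choice)
     *    fn->isUniformRegister(r1); // true, as allocated above
     *  \endcode
     */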
    /*! Get the pushed data location for the given register */
    INLINE const PushLocation *getPushLocation(Register reg) const {
      auto it = pushMap.find(reg);
      if (it == pushMap.end())
        return NULL;
      else
        return &it->second;
    }
    /*! Get the map of pushed registers */
    const PushMap &getPushMap(void) const { return this->pushMap; }
    /*! Get the map of pushed locations */
    const LocationMap &getLocationMap(void) const { return this->locationMap; }
    /*! Get the input argument from the register (linear search). Return NULL
     *  if this is not an input argument */
    INLINE const FunctionArgument *getArg(const Register &reg) const {
      for (size_t i = 0; i < args.size(); ++i) {
        const FunctionArgument *arg = args[i];
        if (arg->reg == reg) return arg;
      }
      return NULL;
    }
    INLINE FunctionArgument *getArg(const Register &reg) {
      for (size_t i = 0; i < args.size(); ++i) {
        FunctionArgument *arg = args[i];
        if (arg->reg == reg) return arg;
      }
      return NULL;
    }
    /*! Get output register */
    INLINE Register getOutput(uint32_t ID) const { return outputs[ID]; }
    /*! Get the argument location for the pushed register */
    INLINE const PushLocation &getPushLocation(Register reg) {
      GBE_ASSERT(pushMap.contains(reg) == true);
      return pushMap.find(reg)->second;
    }
    /*! Says if this is the top basic block (entry point) */
    bool isEntryBlock(const BasicBlock &bb) const;
    /*! Get the function entry point block */
    BasicBlock &getTopBlock(void) const;
    /*! Get the last block */
    const BasicBlock &getBottomBlock(void) const;
    /*! Get the last block */
    BasicBlock &getBottomBlock(void);
    /*! Get a block from its label */
    BasicBlock &getBlock(LabelIndex label) const;
    /*! Get the label instruction from its label index */
    const LabelInstruction *getLabelInstruction(LabelIndex index) const;
    /*! Return the number of instructions of the largest basic block */
    uint32_t getLargestBlockSize(void) const;
    /*! Get the first index of the special registers and the number of them */
    uint32_t getFirstSpecialReg(void) const;
    uint32_t getSpecialRegNum(void) const;
    /*! Indicate if the given register is a special one (like localID in OCL) */
    bool isSpecialReg(const Register &reg) const;
    /*! Create a new label (still not bound to a basic block) */
    LabelIndex newLabel(void);
    /*! Create the control flow graph */
    void computeCFG(void);
    /*! Sort labels in increasing order (the top block has the smallest label) */
    void sortLabels(void);
    /*! Check for empty labels. */
    void checkEmptyLabels(void);
    /*! Get the pointer family */
    RegisterFamily getPointerFamily(void) const;
    /*! Number of registers in the register file */
    INLINE uint32_t regNum(void) const { return file.regNum(); }
    /*! Number of register tuples in the register file */
    INLINE uint32_t tupleNum(void) const { return file.tupleNum(); }
    /*! Number of labels in the function */
    INLINE uint32_t labelNum(void) const { return labels.size(); }
    /*! Number of immediate values in the function */
    INLINE uint32_t immediateNum(void) const { return immediates.size(); }
    /*! Get the number of argument registers */
    INLINE uint32_t argNum(void) const { return args.size(); }
    /*! Get the number of output registers */
    INLINE uint32_t outputNum(void) const { return outputs.size(); }
    /*! Number of blocks in the function */
    INLINE uint32_t blockNum(void) const { return blocks.size(); }
    /*! Output an immediate value in a stream */
    void outImmediate(std::ostream &out, ImmediateIndex index) const;
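    /*! Usage sketch (illustrative only): walking a pushed register back to
     *  the argument bytes it mirrors, using only the accessors declared above.
     *  \code
     *    const PushLocation *loc = fn.getPushLocation(reg);
     *    if (loc != NULL) {
     *      const FunctionArgument &arg = fn.getArg(loc->argID);
     *      // reg holds arg's bytes starting at byte offset loc->offset
     *    }
     *  \endcode
     */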
    /*! Apply the given functor on all basic blocks */
    template <typename T>
    INLINE void foreachBlock(const T &functor) const {
      for (size_t i = 0; i < blocks.size(); ++i) {
        BasicBlock *block = blocks[i];
        functor(*block);
      }
    }
    /*! Apply the given functor on all instructions */
    template <typename T>
    INLINE void foreachInstruction(const T &functor) const {
      for (size_t i = 0; i < blocks.size(); ++i) {
        BasicBlock *block = blocks[i];
        block->foreach(functor);
      }
    }
    /*! Get wgBroadcastSLM in this function */
    int32_t getwgBroadcastSLM(void) const { return wgBroadcastSLM; }
    /*! Set wgBroadcastSLM for this function */
    void setwgBroadcastSLM(int32_t v) { wgBroadcastSLM = v; }
    /*! Get tidMapSLM in this function */
    int32_t gettidMapSLM(void) const { return tidMapSLM; }
    /*! Set tidMapSLM for this function */
    void settidMapSLM(int32_t v) { tidMapSLM = v; }
    /*! Does it use SLM? */
    INLINE bool getUseSLM(void) const { return this->useSLM; }
    /*! Change the SLM config for the function */
    INLINE bool setUseSLM(bool useSLM) { return this->useSLM = useSLM; }
    /*! Get the SLM size needed for local variables inside the kernel function */
    INLINE uint32_t getSLMSize(void) const { return this->slmSize; }
    /*! Set the SLM size needed for local variables inside the kernel function */
    INLINE void setSLMSize(uint32_t size) { this->slmSize = size; }
    /*! Get the sampler set in this function */
    SamplerSet *getSamplerSet(void) const { return samplerSet; }
    /*! Get the image set in this function */
    ImageSet *getImageSet(void) const { return imageSet; }
    /*! Get the printf set in this function */
    PrintfSet *getPrintfSet(void) const { return printfSet; }
    /*! Set the required work group size. */
    void setCompileWorkGroupSize(size_t x, size_t y, size_t z) {
      compileWgSize[0] = x;
      compileWgSize[1] = y;
      compileWgSize[2] = z;
    }
    /*! Get the required work group size. */
    const size_t *getCompileWorkGroupSize(void) const { return compileWgSize; }
    /*! Set the function attributes string. */
    void setFunctionAttributes(const std::string &functionAttributes) {
      this->functionAttributes = functionAttributes;
    }
    /*! Get the function attributes string. */
    const std::string &getFunctionAttributes(void) const { return this->functionAttributes; }
    /*! Get the stack size. */
    INLINE uint32_t getStackSize(void) const { return this->stackSize; }
    /*! Grow the stack size by the given step. */
    INLINE void pushStackSize(uint32_t step) { this->stackSize += step; }
    /*! Add the loop info for later liveness analysis */
    void addLoop(LabelIndex preheader, int parent, const vector<LabelIndex> &bbs,
                 const vector<std::pair<LabelIndex, LabelIndex>> &exits);
    INLINE const vector<Loop *> &getLoops() { return loops; }
    int getLoopDepth(LabelIndex Block) const;
    vector<BasicBlock *> &getBlocks() { return blocks; }
    /*! Get the surface starting address register from its bti */
    Register getSurfaceBaseReg(uint8_t bti) const;
    void appendSurface(uint8_t bti, Register reg);
    /*! Get the instruction distance between two BBs, including both b0 and b1;
     *  b0 must not be greater than b1. */
    INLINE uint32_t getDistance(LabelIndex b0, LabelIndex b1) const {
      uint32_t insnNum = 0;
      GBE_ASSERT(b0.value() <= b1.value());
      for (uint32_t i = b0.value(); i <= b1.value(); i++) {
        BasicBlock &bb = getBlock(LabelIndex(i));
        insnNum += bb.size();
      }
      return insnNum;
    }
    /*! Output the control flow graph to a .dot file */
    void outputCFG();
    uint32_t getOclVersion(void) const;
    /*! Does it use device enqueue? */
    INLINE bool getUseDeviceEnqueue(void) const { return this->useDeviceEnqueue; }
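    /*! Usage sketch (illustrative only; the Instruction& parameter type is
     *  what BasicBlock::foreach is assumed to pass through):
     *  \code
     *    uint32_t insnNum = 0;
     *    fn.foreachInstruction([&insnNum](const Instruction &) { insnNum++; });
     *    // insnNum now counts every instruction of every basic block
     *  \endcode
     */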
    /*! Change the device enqueue info of the function */
    INLINE bool setUseDeviceEnqueue(bool useDeviceEnqueue) {
      return this->useDeviceEnqueue = useDeviceEnqueue;
    }
  private:
    friend class Context;            //!< Can freely modify a function
    std::string name;                //!< Function name
    const Unit &unit;                //!< Function belongs to this unit
    vector<FunctionArgument *> args; //!< Input registers of the function
    vector<Register> outputs;        //!< Output registers of the function
    vector<BasicBlock *> labels;     //!< Each label points to a basic block
    vector<Immediate> immediates;    //!< All immediate values in the function
    vector<BasicBlock *> blocks;     //!< All chained basic blocks
    vector<Loop *> loops;            //!< Loops info of the function
    map<uint8_t, Register> btiRegMap;//!< Map from bti to surface base address
    RegisterFile file;               //!< RegisterDatas used by the instructions
    Profile profile;                 //!< Current function profile
    PushMap pushMap;                 //!< Pushed function arguments (reg->loc)
    LocationMap locationMap;         //!< Pushed function arguments (loc->reg)
    uint32_t simdWidth;              //!< 8 or 16 if forced, 0 otherwise
    bool useSLM;                     //!< Is SLM required?
    uint32_t slmSize;                //!< Local variable size inside the kernel function
    uint32_t stackSize;              //!< Stack size for private memory.
    SamplerSet *samplerSet;          //!< Samplers used in this function.
    ImageSet *imageSet;              //!< Image set in this function's arguments.
    PrintfSet *printfSet;            //!< printfSet stores the printf info.
    size_t compileWgSize[3];         //!< Required work group size specified by
                                     //   __attribute__((reqd_work_group_size(X, Y, Z))).
    std::string functionAttributes;  //!< Function attribute qualifiers combined.
    int32_t wgBroadcastSLM;          //!< Used to broadcast the workgroup value.
    int32_t tidMapSLM;               //!< Used to store the map between groupid and hw thread.
    bool useDeviceEnqueue;           //!< Has device enqueue?
    GBE_CLASS(Function);             //!< Use custom allocator
  };

  /*! Output the function string in the given stream */
  std::ostream &operator<< (std::ostream &out, const Function &fn);

} /* namespace ir */
} /* namespace gbe */

#endif /* __GBE_IR_FUNCTION_HPP__ */

Beignet-1.3.2-Source/backend/src/ir/unit.cpp000664 001750 001750 00000004077 13161142102 017732 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
* * Author: Benjamin Segovia */ /** * \file unit.cpp * \author Benjamin Segovia */ #include "ir/unit.hpp" #include "ir/function.hpp" namespace gbe { namespace ir { Unit::Unit(PointerSize pointerSize) : pointerSize(pointerSize), valid(true) { profilingInfo = GBE_NEW(ProfilingInfo); inProfilingMode = false; oclVersion = 120; } Unit::~Unit(void) { for (const auto &pair : functions) GBE_DELETE(pair.second); for (const auto &pair : printfs) GBE_DELETE(pair.second); delete profilingInfo; } Function *Unit::getFunction(const std::string &name) const { auto it = functions.find(name); if (it == functions.end()) return NULL; return it->second; } Function *Unit::newFunction(const std::string &name) { auto it = functions.find(name); if (it != functions.end()) return NULL; Function *fn = GBE_NEW(Function, name, *this); functions[name] = fn; return fn; } void Unit::newConstant(const std::string &name, uint32_t size, uint32_t alignment) { constantSet.append(name, size, alignment); } std::ostream &operator<< (std::ostream &out, const Unit &unit) { unit.apply([&out] (const Function &fn) { out << fn << std::endl; }); return out; } } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/type.hpp000664 001750 001750 00000005510 13161142102 017732 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file type.hpp * * \author Benjamin Segovia */ #ifndef __GBE_IR_TYPE_HPP__ #define __GBE_IR_TYPE_HPP__ #include "sys/platform.hpp" #include "ir/register.hpp" #include namespace gbe { namespace ir { /*! All types possibly supported by the instruction */ enum Type : uint8_t { TYPE_BOOL = 0, //!< boolean value TYPE_S8, //!< signed 8 bits integer TYPE_U8, //!< unsigned 8 bits integer TYPE_S16, //!< signed 16 bits integer TYPE_U16, //!< unsigned 16 bits integer TYPE_S32, //!< signed 32 bits integer TYPE_U32, //!< unsigned 32 bits integer TYPE_S64, //!< signed 64 bits integer TYPE_U64, //!< unsigned 64 bits integer TYPE_HALF, //!< 16 bits floating point value TYPE_FLOAT, //!< 32 bits floating point value TYPE_DOUBLE, //!< 64 bits floating point value TYPE_LARGE_INT //!< integer larger than 64 bits. }; /*! Output a string for the type in the given stream */ std::ostream &operator<< (std::ostream &out, const Type &type); /*! Get the register family for each type */ INLINE RegisterFamily getFamily(Type type) { switch (type) { case TYPE_BOOL: return FAMILY_BOOL; case TYPE_S8: case TYPE_U8: return FAMILY_BYTE; case TYPE_S16: case TYPE_U16: case TYPE_HALF: return FAMILY_WORD; case TYPE_S32: case TYPE_U32: case TYPE_FLOAT: return FAMILY_DWORD; case TYPE_S64: case TYPE_U64: case TYPE_DOUBLE: return FAMILY_QWORD; default: return FAMILY_DWORD; }; } /*! 
Return a type for each register family */ INLINE Type getType(RegisterFamily family) { switch (family) { case FAMILY_BOOL: return TYPE_BOOL; case FAMILY_BYTE: return TYPE_U8; case FAMILY_WORD: return TYPE_U16; case FAMILY_DWORD: return TYPE_U32; case FAMILY_QWORD: return TYPE_U64; default: return TYPE_U32; } } } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_TYPE_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/constant.hpp000664 001750 001750 00000011041 13161142102 020576 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file constant.cpp * * \author Benjamin Segovia */ #ifndef __GBE_IR_CONSTANT_HPP__ #define __GBE_IR_CONSTANT_HPP__ #include "sys/vector.hpp" namespace gbe { namespace ir { /*! Describe one constant (may be a scalar or an array) */ class Constant { public: /*! Build a constant description */ INLINE Constant(const std::string &name, uint32_t size, uint32_t alignment, uint32_t offset) : name(name), size(size), alignment(alignment), offset(offset) {} /*! Copy constructor */ INLINE Constant(const Constant &other) : name(other.name), size(other.size), alignment(other.alignment), offset(other.offset) {} /*! Copy operator */ INLINE Constant& operator= (const Constant &other) { this->name = other.name; this->size = other.size; this->alignment = other.alignment; this->offset = other.offset; return *this; } /*! Nothing happens here */ INLINE ~Constant(void) {} const std::string& getName(void) const { return name; } uint32_t getSize (void) const { return size; } uint32_t getAlignment (void) const { return alignment; } uint32_t getOffset(void) const { return offset; } private: std::string name; //!< Optional name of the constant uint32_t size; //!< Size of the constant uint32_t alignment; //!< Alignment required for each constant uint32_t offset; //!< Offset of the constant in the data segment GBE_CLASS(Constant); }; /*! A constant set is a set of immutable data associated to a compilation * unit */ class ConstantSet : public Serializable { public: /*! Append a new constant in the constant set */ void append(const std::string&, uint32_t size, uint32_t alignment); /*! Number of constants */ size_t getConstantNum(void) const { return constants.size(); } /*! Get a special constant */ Constant& getConstant(size_t i) { return constants[i]; } /*! Get a special constant */ Constant& getConstant(const std::string & name) { for (size_t i = 0; i < constants.size(); ++i) { Constant& c = constants[i]; if (c.getName() == name) return c; } GBE_ASSERT(false); return *(Constant *)NULL; } /*! Number of bytes of serialized constant data */ size_t getDataSize(void) const { return data.size(); } /*! 
Store serialized constant data into an array */ void getData(char *mem) const { for (size_t i = 0; i < data.size(); i ++) mem[i] = data[i]; } void setData(char *mem, int offset, int size) { for (int i = 0; i < size; i++) { data[i+offset] = mem[i]; } } ConstantSet() {} ConstantSet(const ConstantSet& other) : Serializable(other), data(other.data), constants(other.constants) {} ConstantSet & operator = (const ConstantSet& other) { if (&other != this) { data = other.data; constants = other.constants; } return *this; } static const uint32_t magic_begin = TO_MAGIC('C', 'N', 'S', 'T'); static const uint32_t magic_end = TO_MAGIC('T', 'S', 'N', 'C'); /* format: magic_begin | const_data_size | const_data | constant_1_size | constant_1 | ........ | constant_n_size | constant_n | magic_end | total_size */ /*! Implements the serialization. */ virtual uint32_t serializeToBin(std::ostream& outs); virtual uint32_t deserializeFromBin(std::istream& ins); private: vector data; //!< The constant data serialized in one array vector constants;//!< Each constant description GBE_CLASS(ConstantSet); }; } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_CONSTANT_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/register.cpp000664 001750 001750 00000004747 13161142102 020603 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file register.cpp * \author Benjamin Segovia */ #include "ir/profile.hpp" #include "ir/register.hpp" namespace gbe { namespace ir { std::ostream &operator<< (std::ostream &out, const RegisterData ®Data) { switch (regData.family) { case FAMILY_BOOL: return out << "bool"; case FAMILY_BYTE: return out << "byte"; case FAMILY_WORD: return out << "word"; case FAMILY_DWORD: return out << "dword"; case FAMILY_QWORD: return out << "qword"; case FAMILY_OWORD: return out << "oword"; case FAMILY_HWORD: return out << "hword"; case FAMILY_REG: return out << "reg"; }; return out; } std::ostream &operator<< (std::ostream &out, const RegisterFile &file) { out << "## " << file.regNum() << " register" << (file.regNum() ? "s" : "") << " ##" << std::endl; for (uint32_t i = 0; i < file.regNum(); ++i) { const RegisterData reg = file.get(Register(i)); out << ".decl." 
<< reg << " %" << i; if (i < ocl::regNum) out << " " << ocl::specialRegMean[i]; out << std::endl; } return out; } Tuple RegisterFile::appendArrayTuple(const Register *reg, uint32_t regNum) { const Tuple index = Tuple(regTuples.size()); for (uint32_t regID = 0; regID < regNum; ++regID) { GBE_ASSERTM(reg[regID] < this->regNum(), "Out-of-bound register"); regTuples.push_back(reg[regID]); } return index; } Tuple RegisterFile::appendArrayTypeTuple(const uint8_t *types, uint32_t num) { const Tuple index = Tuple(typeTuples.size()); for (uint32_t id = 0; id < num; id++) { typeTuples.push_back(types[id]); } return index; } } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/instruction.cpp000664 001750 001750 00000304300 13161142102 021324 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file instruction.cpp * \author Benjamin Segovia */ #include "ir/instruction.hpp" #include "ir/function.hpp" namespace gbe { namespace ir { /////////////////////////////////////////////////////////////////////////// // Implements the concrete implementations of the instruction classes. We // cast an instruction to an internal class to run the given member function /////////////////////////////////////////////////////////////////////////// namespace internal { #define ALIGNED_INSTRUCTION ALIGNED(ALIGNOF(Instruction)) /*! Policy shared by all the internal instructions */ struct BasePolicy { /*! Create an instruction from its internal representation */ Instruction convert(void) const { return Instruction(reinterpret_cast(&this->opcode)); } /*! Output the opcode in the given stream */ INLINE void outOpcode(std::ostream &out) const { switch (opcode) { #define DECL_INSN(OPCODE, CLASS) case OP_##OPCODE: out << #OPCODE; break; #include "instruction.hxx" #undef DECL_INSN case OP_INVALID: NOT_SUPPORTED; break; }; } /*! Instruction opcode */ Opcode opcode; }; /*! For regular n source instructions */ template struct NSrcPolicy { INLINE uint32_t getSrcNum(void) const { return srcNum; } INLINE Register getSrc(const Function &fn, uint32_t ID) const { GBE_ASSERTM((int) ID < (int) srcNum, "Out-of-bound source"); return static_cast(this)->src[ID]; } INLINE void setSrc(Function &fn, uint32_t ID, Register reg) { GBE_ASSERTM((int) ID < (int) srcNum, "Out-of-bound source"); static_cast(this)->src[ID] = reg; } }; /*! For regular n destinations instructions */ template struct NDstPolicy { INLINE uint32_t getDstNum(void) const { return dstNum; } INLINE Register getDst(const Function &fn, uint32_t ID) const { GBE_ASSERTM((int) ID < (int) dstNum, "Out-of-bound destination"); return static_cast(this)->dst[ID]; } INLINE void setDst(Function &fn, uint32_t ID, Register reg) { GBE_ASSERTM((int) ID < (int) dstNum, "Out-of-bound destination"); static_cast(this)->dst[ID] = reg; } }; /*! 
For instructions that use a tuple for source */ template struct TupleSrcPolicy { INLINE uint32_t getSrcNum(void) const { return static_cast(this)->srcNum; } INLINE Register getSrc(const Function &fn, uint32_t ID) const { GBE_ASSERTM(ID < static_cast(this)->srcNum, "Out-of-bound source register"); return fn.getRegister(static_cast(this)->src, ID); } INLINE void setSrc(Function &fn, uint32_t ID, Register reg) { GBE_ASSERTM(ID < static_cast(this)->srcNum, "Out-of-bound source register"); return fn.setRegister(static_cast(this)->src, ID, reg); } }; /*! For instructions that use a tuple for destination */ template struct TupleDstPolicy { INLINE uint32_t getDstNum(void) const { return static_cast(this)->dstNum; } INLINE Register getDst(const Function &fn, uint32_t ID) const { GBE_ASSERTM(ID < static_cast(this)->dstNum, "Out-of-bound source register"); return fn.getRegister(static_cast(this)->dst, ID); } INLINE void setDst(Function &fn, uint32_t ID, Register reg) { GBE_ASSERTM(ID < static_cast(this)->dstNum, "Out-of-bound source register"); return fn.setRegister(static_cast(this)->dst, ID, reg); } }; /*! All unary and binary arithmetic instructions */ template // 1 or 2 class ALIGNED_INSTRUCTION NaryInstruction : public BasePolicy, public NSrcPolicy, srcNum>, public NDstPolicy, 1> { public: INLINE Type getType(void) const { return this->type; } INLINE bool wellFormed(const Function &fn, std::string &whyNot) const; INLINE void out(std::ostream &out, const Function &fn) const; Type type; //!< Type of the instruction Register dst[1]; //!< Index of the register in the register file Register src[srcNum]; //!< Indices of the sources }; /*! All 0-source arithmetic instructions */ class ALIGNED_INSTRUCTION NullaryInstruction : public NaryInstruction<0> { public: NullaryInstruction(Opcode opcode, Type type, Register dst) { this->opcode = opcode; this->type = type; this->dst[0] = dst; } }; /*! All 1-source arithmetic instructions */ class ALIGNED_INSTRUCTION UnaryInstruction : public NaryInstruction<1> { public: UnaryInstruction(Opcode opcode, Type type, Register dst, Register src) { this->opcode = opcode; this->type = type; this->dst[0] = dst; this->src[0] = src; } }; /*! All 2-source arithmetic instructions */ class ALIGNED_INSTRUCTION BinaryInstruction : public NaryInstruction<2> { public: BinaryInstruction(Opcode opcode, Type type, Register dst, Register src0, Register src1) { this->opcode = opcode; this->type = type; this->dst[0] = dst; this->src[0] = src0; this->src[1] = src1; } INLINE bool commutes(void) const { switch (opcode) { case OP_ADD: case OP_ADDSAT: case OP_XOR: case OP_OR: case OP_AND: case OP_MUL: return true; default: return false; } } }; class ALIGNED_INSTRUCTION TernaryInstruction : public BasePolicy, public NDstPolicy, public TupleSrcPolicy { public: TernaryInstruction(Opcode opcode, Type type, Register dst, Tuple src) { this->opcode = opcode; this->type = type; this->dst[0] = dst; this->src = src; } Type getType(void) const { return type; } bool wellFormed(const Function &fn, std::string &whyNot) const; INLINE void out(std::ostream &out, const Function &fn) const; Type type; Register dst[1]; Tuple src; static const uint32_t srcNum = 3; }; /*! 
Three sources mean we need a tuple to encode it */
  class ALIGNED_INSTRUCTION SelectInstruction :
    public BasePolicy,
    public NDstPolicy<SelectInstruction, 1>,
    public TupleSrcPolicy<SelectInstruction>
  {
  public:
    SelectInstruction(Type type, Register dst, Tuple src) {
      this->opcode = OP_SEL;
      this->type = type;
      this->dst[0] = dst;
      this->src = src;
    }
    INLINE Type getType(void) const { return this->type; }
    INLINE bool wellFormed(const Function &fn, std::string &whyNot) const;
    INLINE void out(std::ostream &out, const Function &fn) const;
    Type type;       //!< Type of the instruction
    Register dst[1]; //!< Dst is the register index
    Tuple src;       //!< 3 sources do not fit in 8 bytes -> use a tuple
    static const uint32_t srcNum = 3;
  };

  /*! Comparison instructions take two sources of the same type and return a
   *  boolean value. Since it is pretty similar to a binary instruction, we
   *  steal all the methods from it, except wellFormed (the dst register is
   *  always a boolean value) */
  class ALIGNED_INSTRUCTION CompareInstruction : public NaryInstruction<2>
  {
  public:
    CompareInstruction(Opcode opcode, Type type, Register dst, Register src0, Register src1) {
      this->opcode = opcode;
      this->type = type;
      this->dst[0] = dst;
      this->src[0] = src0;
      this->src[1] = src1;
    }
    INLINE bool wellFormed(const Function &fn, std::string &whyNot) const;
  };

  class ALIGNED_INSTRUCTION BitCastInstruction :
    public BasePolicy,
    public TupleSrcPolicy<BitCastInstruction>,
    public TupleDstPolicy<BitCastInstruction>
  {
  public:
    BitCastInstruction(Type dstType, Type srcType, Tuple dst, Tuple src,
                       uint8_t dstNum, uint8_t srcNum) {
      this->opcode = OP_BITCAST;
      this->dst = dst;
      this->src = src;
      this->dstFamily = getFamily(dstType);
      this->srcFamily = getFamily(srcType);
      GBE_ASSERT(srcNum <= Instruction::MAX_SRC_NUM &&
                 dstNum <= Instruction::MAX_DST_NUM);
      this->dstNum = dstNum;
      this->srcNum = srcNum;
    }
    INLINE Type getSrcType(void) const { return getType((RegisterFamily)srcFamily); }
    INLINE Type getDstType(void) const { return getType((RegisterFamily)dstFamily); }
    INLINE bool wellFormed(const Function &fn, std::string &whyNot) const;
    INLINE void out(std::ostream &out, const Function &fn) const;
    uint8_t dstFamily:4; //!< family to cast to
    uint8_t srcFamily:4; //!< family to cast from
    Tuple dst;
    Tuple src;
    uint8_t dstNum;      //!< Number of destinations
    uint8_t srcNum;      //!< Number of sources
  };

  class ALIGNED_INSTRUCTION ConvertInstruction :
    public BasePolicy,
    public NDstPolicy<ConvertInstruction, 1>,
    public NSrcPolicy<ConvertInstruction, 1>
  {
  public:
    ConvertInstruction(Opcode opcode, Type dstType, Type srcType, Register dst, Register src) {
      this->opcode = opcode;
      this->dst[0] = dst;
      this->src[0] = src;
      this->dstType = dstType;
      this->srcType = srcType;
    }
    INLINE Type getSrcType(void) const { return this->srcType; }
    INLINE Type getDstType(void) const { return this->dstType; }
    INLINE bool wellFormed(const Function &fn, std::string &whyNot) const;
    INLINE void out(std::ostream &out, const Function &fn) const;
    Register dst[1];
    Register src[1];
    Type dstType; //!< Type to convert to
    Type srcType; //!< Type to convert from
  };
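  /* Worked example (illustrative, not part of the original source) of the
   * BITCAST size rule enforced by BitCastInstruction::wellFormed() further
   * below: the total byte count must match on both sides. */
  static_assert(4 * sizeof(uint8_t) == 1 * sizeof(uint32_t),
                "4 x u8 <-> 1 x u32 is size-preserving: wellFormed accepts it");
  static_assert(2 * sizeof(uint8_t) != 1 * sizeof(uint32_t),
                "2 x u8 <-> 1 x u32 differs in size: wellFormed rejects it");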
  class ALIGNED_INSTRUCTION MemInstruction : public BasePolicy {
  public:
    MemInstruction(AddressMode _AM, AddressSpace _AS, bool _dwAligned, Type _type, Register _offset) :
      AM(_AM), AS(_AS), dwAligned(_dwAligned), type(_type), SurfaceIndex(0), offset(_offset) {}
    AddressMode getAddressMode() const { return AM; }
    AddressSpace getAddressSpace() const { return AS; }
    /*! A MemInstruction may have one optional btiReg */
    Register getBtiReg() const {
      assert(AM == AM_DynamicBti);
      return BtiReg;
    }
    unsigned getSurfaceIndex() const {
      assert(AM != AM_DynamicBti);
      return SurfaceIndex;
    }
    Register getAddressRegister() const { return offset; }
    unsigned getAddressIndex() const { return 0; }
    Type getValueType() const { return type; }
    INLINE bool isAligned(void) const { return !!dwAligned; }
    void setSurfaceIndex(unsigned id) { SurfaceIndex = id; }
    void setBtiReg(Register reg) { BtiReg = reg; }
  protected:
    /*! Including the address reg + the optional bti reg */
    int getBaseSrcNum() const { return AM == AM_DynamicBti ? 2 : 1; }
    bool hasExtraBtiReg() const { return AM == AM_DynamicBti; }
    AddressMode AM;
    AddressSpace AS;
    uint8_t dwAligned : 1;
    Type type;
    union {
      Register BtiReg;
      unsigned SurfaceIndex;
    };
    Register offset;
  };

  class ALIGNED_INSTRUCTION AtomicInstruction :
    public MemInstruction,
    public NDstPolicy<AtomicInstruction, 1>
  {
  public:
    AtomicInstruction(AtomicOps atomicOp, Type type, Register dst,
                      AddressSpace addrSpace, Register address, Tuple payload, AddressMode AM) :
      MemInstruction(AM, addrSpace, true, type, address)
    {
      this->opcode = OP_ATOMIC;
      this->atomicOp = atomicOp;
      this->dst[0] = dst;
      this->payload = payload;
      int payloadNum = 1;
      if ((atomicOp == ATOMIC_OP_INC) || (atomicOp == ATOMIC_OP_DEC))
        payloadNum = 0;
      if (atomicOp == ATOMIC_OP_CMPXCHG)
        payloadNum = 2;
      srcNum = payloadNum + getBaseSrcNum();
    }
    INLINE Register getSrc(const Function &fn, uint32_t ID) const {
      GBE_ASSERTM((int)ID < (int)srcNum, "Out-of-bound source register for atomic");
      if (ID == 0) {
        return offset;
      } else if (hasExtraBtiReg() && (int)ID == (int)srcNum - 1) {
        return getBtiReg();
      } else {
        return fn.getRegister(payload, ID - 1);
      }
    }
    INLINE void setSrc(Function &fn, uint32_t ID, Register reg) {
      GBE_ASSERTM((int)ID < (int)srcNum, "Out-of-bound source register for atomic");
      if (ID == 0) {
        offset = reg;
      } else if (hasExtraBtiReg() && (int)ID == (int)srcNum - 1) {
        setBtiReg(reg);
      } else {
        fn.setRegister(payload, ID - 1, reg);
      }
    }
    INLINE uint32_t getSrcNum(void) const { return srcNum; }
    INLINE AtomicOps getAtomicOpcode(void) const { return this->atomicOp; }
    INLINE bool wellFormed(const Function &fn, std::string &whyNot) const;
    INLINE void out(std::ostream &out, const Function &fn) const;
    Register dst[1];
    Tuple payload;
    uint8_t srcNum:3;     //!< Source Number
    AtomicOps atomicOp:6; //!< The atomic operation (declaration implied by the
                          //   ctor and getAtomicOpcode(); lost in extraction)
  };

  class ALIGNED_INSTRUCTION BranchInstruction :
    public BasePolicy, public NDstPolicy<BranchInstruction, 0>
  {
  public:
    INLINE BranchInstruction(Opcode op, LabelIndex labelIndex, Register predicate, bool inv_pred = false) {
      GBE_ASSERT(op == OP_BRA || op == OP_IF || op == OP_WHILE);
      this->opcode = op;
      this->predicate = predicate;
      this->labelIndex = labelIndex;
      this->hasPredicate = true;
      this->hasLabel = true;
      this->inversePredicate = inv_pred;
    }
    INLINE BranchInstruction(Opcode op, LabelIndex labelIndex) {
      GBE_ASSERT(op == OP_BRA || op == OP_ELSE || op == OP_ENDIF);
      this->opcode = op;
      this->labelIndex = labelIndex;
      this->hasPredicate = false;
      this->hasLabel = true;
    }
    INLINE BranchInstruction(Opcode op) {
      GBE_ASSERT(op == OP_RET);
      this->opcode = op;
      this->hasPredicate = false;
      this->hasLabel = false;
    }
    INLINE LabelIndex getLabelIndex(void) const {
      GBE_ASSERTM(hasLabel, "No target label for this branch instruction");
      return labelIndex;
    }
    INLINE uint32_t getSrcNum(void) const { return hasPredicate ?
1 : 0; } INLINE Register getSrc(const Function &fn, uint32_t ID) const { GBE_ASSERTM(hasPredicate, "No source for unpredicated branches"); GBE_ASSERTM(ID == 0, "Only one source for the branch instruction"); return predicate; } INLINE void setSrc(Function &fn, uint32_t ID, Register reg) { GBE_ASSERTM(hasPredicate, "No source for unpredicated branches"); GBE_ASSERTM(ID == 0, "Only one source for the branch instruction"); predicate = reg; } INLINE bool isPredicated(void) const { return hasPredicate; } INLINE bool getInversePredicated(void) const { return inversePredicate; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const; Register predicate; //!< Predication means conditional branch LabelIndex labelIndex; //!< Index of the label the branch targets bool hasPredicate:1; //!< Is it predicated? bool inversePredicate:1; //!< Is it inverse predicated? bool hasLabel:1; //!< Is there any target label? Register dst[0]; //!< No destination }; class ALIGNED_INSTRUCTION LoadInstruction : public MemInstruction { public: LoadInstruction(Type type, Tuple dstValues, Register offset, AddressSpace AS, uint32_t _valueNum, bool dwAligned, AddressMode AM, bool ifBlock = false) : MemInstruction(AM, AS, dwAligned, type, offset), valueNum(_valueNum), values(dstValues), ifBlock(ifBlock) { this->opcode = OP_LOAD; } INLINE unsigned getSrcNum() const { return getBaseSrcNum(); } INLINE Register getSrc(const Function &fn, unsigned id) const { if (id == 0) return offset; if (hasExtraBtiReg() && id == 1) return BtiReg; assert(0 && "LoadInstruction::getSrc() out-of-range"); return ir::Register(0); } INLINE void setSrc(Function &fn, unsigned id, Register reg) { assert(id < getSrcNum()); if (id == 0) { offset = reg; return; } if (id == 1) { setBtiReg(reg); return; } } INLINE unsigned getDstNum() const { return valueNum; } INLINE Register getDst(const Function &fn, unsigned id) const { assert(id < valueNum); return fn.getRegister(values, id); } INLINE void setDst(Function &fn, unsigned id, Register reg) { assert(id < getDstNum()); fn.setRegister(values, id, reg); } INLINE uint32_t getValueNum(void) const { return valueNum; } INLINE Register getValue(const Function &fn, unsigned id) const { assert(id < valueNum); return fn.getRegister(values, id); } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const; INLINE bool isBlock() const { return ifBlock; } uint8_t valueNum; Tuple values; bool ifBlock; }; class ALIGNED_INSTRUCTION StoreInstruction : public MemInstruction, public NDstPolicy { public: StoreInstruction(Type type, Tuple values, Register offset, AddressSpace addrSpace, uint32_t valueNum, bool dwAligned, AddressMode AM, bool ifBlock = false) : MemInstruction(AM, addrSpace, dwAligned, type, offset) { this->opcode = OP_STORE; this->values = values; this->valueNum = valueNum; this->ifBlock = ifBlock; } INLINE unsigned getValueNum() const { return valueNum; } INLINE Register getValue(const Function &fn, unsigned id) const { return fn.getRegister(values, id); } INLINE unsigned getSrcNum() const { return getBaseSrcNum() + valueNum; } INLINE Register getSrc(const Function &fn, unsigned id) const { if (id == 0) return offset; if (id <= valueNum) return fn.getRegister(values, id-1); if (hasExtraBtiReg() && (int)id == (int)valueNum+1) return getBtiReg(); assert(0 && "StoreInstruction::getSrc() out-of-range"); return Register(0); } INLINE void setSrc(Function &fn, unsigned id, 
Register reg) { if (id == 0) { offset = reg; return; } if (id > 0 && id <= valueNum) { fn.setRegister(values, id-1, reg); return; } if (hasExtraBtiReg() && (int)id == (int)valueNum + 1) { setBtiReg(reg); return; } assert(0 && "StoreInstruction::setSrc() index out-of-range"); } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const; INLINE bool isBlock() const { return ifBlock; } Register dst[0]; uint8_t valueNum; Tuple values; bool ifBlock; }; class ALIGNED_INSTRUCTION SampleInstruction : // TODO public BasePolicy, public TupleSrcPolicy, public TupleDstPolicy { public: SampleInstruction(uint8_t imageIdx, Tuple dstTuple, Tuple srcTuple, uint8_t srcNum, bool dstIsFloat, bool srcIsFloat, uint8_t sampler, uint8_t samplerOffset) { this->opcode = OP_SAMPLE; this->dst = dstTuple; this->src = srcTuple; this->srcNum = srcNum; this->dstIsFloat = dstIsFloat; this->srcIsFloat = srcIsFloat; this->samplerIdx = sampler; this->imageIdx = imageIdx; this->samplerOffset = samplerOffset; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << "." << this->getDstType() << "." << this->getSrcType() << " surface id " << (int)this->getImageIndex(); out << " coord u %" << this->getSrc(fn, 0); if (srcNum >= 2) out << " coord v %" << this->getSrc(fn, 1); if (srcNum >= 3) out << " coord w %" << this->getSrc(fn, 2); out << " %" << this->getDst(fn, 0) << " %" << this->getDst(fn, 1) << " %" << this->getDst(fn, 2) << " %" << this->getDst(fn, 3) << " sampler idx " << (int)this->getSamplerIndex(); } Tuple src; Tuple dst; INLINE uint8_t getImageIndex(void) const { return this->imageIdx; } INLINE Type getSrcType(void) const { return this->srcIsFloat ? TYPE_FLOAT : TYPE_S32; } INLINE Type getDstType(void) const { return this->dstIsFloat ? 
TYPE_FLOAT : TYPE_U32; } INLINE uint8_t getSamplerIndex(void) const { return this->samplerIdx; } INLINE uint8_t getSamplerOffset(void) const { return this->samplerOffset; } uint8_t srcIsFloat:1; uint8_t dstIsFloat:1; uint8_t samplerIdx:4; uint8_t samplerOffset:2; uint8_t imageIdx; uint8_t srcNum; static const uint32_t dstNum = 4; }; class ALIGNED_INSTRUCTION VmeInstruction : public BasePolicy, public TupleSrcPolicy, public TupleDstPolicy { public: VmeInstruction(uint8_t imageIdx, Tuple dstTuple, Tuple srcTuple, uint32_t dstNum, uint32_t srcNum, int msg_type, int vme_search_path_lut, int lut_sub) { this->opcode = OP_VME; this->dst = dstTuple; this->src = srcTuple; this->dstNum = dstNum; this->srcNum = srcNum; this->imageIdx = imageIdx; this->msg_type = msg_type; this->vme_search_path_lut = vme_search_path_lut; this->lut_sub = lut_sub; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << " src_surface id " << (int)this->getImageIndex() << " ref_surface id " << (int)this->getImageIndex() + 1; for(uint32_t i = 0; i < dstNum; i++){ out<< " %" << this->getDst(fn, i); } for(uint32_t i = 0; i < srcNum; i++){ out<< " %" << this->getSrc(fn, i); } out << " msg_type " << (int)this->getMsgType() << " vme_search_path_lut " << (int)this->vme_search_path_lut << " lut_sub " << (int)this->lut_sub; } Tuple src; Tuple dst; INLINE uint8_t getImageIndex(void) const { return this->imageIdx; } INLINE uint8_t getMsgType(void) const { return this->msg_type; } INLINE Type getSrcType(void) const { return TYPE_U32; } INLINE Type getDstType(void) const { return TYPE_U32; } uint8_t imageIdx; uint8_t msg_type; uint8_t vme_search_path_lut; uint8_t lut_sub; uint32_t srcNum; uint32_t dstNum; }; class ALIGNED_INSTRUCTION TypedWriteInstruction : // TODO public BasePolicy, public TupleSrcPolicy, public NDstPolicy { public: INLINE TypedWriteInstruction(uint8_t imageIdx, Tuple srcTuple, uint8_t srcNum, Type srcType, Type coordType) { this->opcode = OP_TYPED_WRITE; this->src = srcTuple; this->srcNum = srcNum; this->coordType = coordType; this->srcType = srcType; this->imageIdx = imageIdx; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const { this->outOpcode(out); uint32_t srcID = 0; out << "." 
<< this->getSrcType() << " surface id " << (int)this->getImageIndex() << " coord u %" << this->getSrc(fn, srcID++); if (srcNum >= 6) out << " coord v %" << this->getSrc(fn, srcID++); if (srcNum >= 7) out << " coord w %" << this->getSrc(fn, srcID++); out << " %" << this->getSrc(fn, srcID++); out << " %" << this->getSrc(fn, srcID++); out << " %" << this->getSrc(fn, srcID++); out << " %" << this->getSrc(fn, srcID++); } Tuple src; uint8_t srcType; uint8_t coordType; uint8_t imageIdx; // bti, u, [v], [w], 4 data elements uint8_t srcNum; INLINE uint8_t getImageIndex(void) const { return this->imageIdx; } INLINE Type getSrcType(void) const { return (Type)this->srcType; } INLINE Type getCoordType(void) const { return (Type)this->coordType; } Register dst[0]; //!< No dest register }; class ALIGNED_INSTRUCTION GetImageInfoInstruction : public BasePolicy, public NSrcPolicy, public NDstPolicy { public: GetImageInfoInstruction( int type, Register dst, uint8_t imageIdx, Register infoReg) { this->opcode = OP_GET_IMAGE_INFO; this->infoType = type; this->dst[0] = dst; this->src[0] = infoReg; this->imageIdx = imageIdx; } INLINE uint32_t getInfoType(void) const { return infoType; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << "." << this->getInfoType() << " %" << this->getDst(fn, 0) << " surface id " << (int)this->getImageIndex() << " info reg %" << this->getSrc(fn, 0); } INLINE uint8_t getImageIndex(void) const { return imageIdx; } uint8_t infoType; //!< Type of the requested information. uint8_t imageIdx; //!< surface index. Register src[1]; //!< surface info register. Register dst[1]; //!< dest register to put the information. static const uint32_t dstNum = 1; }; class ALIGNED_INSTRUCTION CalcTimestampInstruction : public BasePolicy, public NSrcPolicy, public NDstPolicy { public: CalcTimestampInstruction(uint32_t pointNum, uint32_t timestampType) { this->opcode = OP_CALC_TIMESTAMP; this->timestampType = static_cast(timestampType); this->pointNum = static_cast(pointNum); } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << "TimeStamp pointer " << static_cast(pointNum) << " (Type " << static_cast(timestampType) << ")"; } uint32_t getPointNum(void) const { return this->pointNum; } uint32_t getTimestamptType(void) const { return this->timestampType; } uint8_t timestampType; //!< Type of the time stamp, 16bits or 32bits, eg. uint8_t pointNum; //!< The insert point number. Register dst[0], src[0]; }; class ALIGNED_INSTRUCTION StoreProfilingInstruction : public BasePolicy, public NSrcPolicy, public NDstPolicy { public: StoreProfilingInstruction(uint32_t bti, uint32_t profilingType) { this->opcode = OP_STORE_PROFILING; this->profilingType = static_cast(profilingType); this->bti = static_cast(bti); } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << " BTI " << static_cast(this->bti) << " (Type " << static_cast(this->profilingType) << ")"; } uint32_t getProfilingType(void) const { return this->profilingType; } uint32_t getBTI(void) const { return this->bti; } uint8_t profilingType; //!< Type format of profiling, 16bits or 32bits, eg. 
uint8_t bti; Register src[0]; Register dst[0]; }; class ALIGNED_INSTRUCTION LoadImmInstruction : public BasePolicy, public NSrcPolicy, public NDstPolicy { public: INLINE LoadImmInstruction(Type type, Register dst, ImmediateIndex index) { this->dst[0] = dst; this->opcode = OP_LOADI; this->immediateIndex = index; this->type = type; } INLINE Immediate getImmediate(const Function &fn) const { return fn.getImmediate(immediateIndex); } INLINE Type getType(void) const { return this->type; } bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const; Register dst[1]; //!< RegisterData to store into Register src[0]; //!< No source register ImmediateIndex immediateIndex; //!< Index in the vector of immediates Type type; //!< Type of the immediate }; class ALIGNED_INSTRUCTION SyncInstruction : public BasePolicy, public NSrcPolicy, public NDstPolicy { public: INLINE SyncInstruction(uint32_t parameters) { this->opcode = OP_SYNC; this->parameters = parameters; } INLINE uint32_t getParameters(void) const { return this->parameters; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const; uint32_t parameters; Register dst[0], src[0]; }; class ALIGNED_INSTRUCTION ReadARFInstruction : public BasePolicy, public NSrcPolicy, public NDstPolicy { public: INLINE ReadARFInstruction(Type type, Register dst, ARFRegister arf) { this->type = type; this->dst[0] = dst; this->opcode = OP_READ_ARF; this->arf = arf; } INLINE ir::ARFRegister getARFRegister(void) const { return this->arf; } INLINE Type getType(void) const { return this->type; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const; Type type; ARFRegister arf; Register dst[1]; Register src[0]; }; class ALIGNED_INSTRUCTION SimdShuffleInstruction : public NaryInstruction<2> { public: SimdShuffleInstruction(Type type, Register dst, Register src0, Register src1) { this->opcode = OP_SIMD_SHUFFLE; this->type = type; this->dst[0] = dst; this->src[0] = src0; this->src[1] = src1; } INLINE bool wellFormed(const Function &fn, std::string &why) const; }; class ALIGNED_INSTRUCTION RegionInstruction : public BasePolicy, public NSrcPolicy, public NDstPolicy { public: INLINE RegionInstruction(Register dst, Register src, uint32_t offset) { this->offset = offset; this->dst[0] = dst; this->src[0] = src; this->opcode = OP_REGION; } INLINE uint32_t getOffset(void) const { return this->offset; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const; uint32_t offset; Register dst[1]; Register src[1]; }; class ALIGNED_INSTRUCTION IndirectMovInstruction : public BasePolicy, public NSrcPolicy, public NDstPolicy { public: INLINE IndirectMovInstruction(Type type, Register dst, Register src0, Register src1, uint32_t offset) { this->type = type; this->offset = offset; this->dst[0] = dst; this->src[0] = src0; this->src[1] = src1; this->opcode = OP_INDIRECT_MOV; } INLINE Type getType(void) const { return this->type; } INLINE uint32_t getOffset(void) const { return this->offset; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const; Type type; uint32_t offset; Register dst[1]; Register src[2]; }; class ALIGNED_INSTRUCTION LabelInstruction : public BasePolicy, public NSrcPolicy, public NDstPolicy { public: INLINE 
LabelInstruction(LabelIndex labelIndex) { this->opcode = OP_LABEL; this->labelIndex = labelIndex; } INLINE LabelIndex getLabelIndex(void) const { return labelIndex; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const; LabelIndex labelIndex; //!< Index of the label Register dst[0], src[0]; }; /*! Wait instructions */ class ALIGNED_INSTRUCTION WaitInstruction : public BasePolicy, public NSrcPolicy, public NDstPolicy { public: INLINE WaitInstruction() { this->opcode = OP_WAIT; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const; Register dst[0], src[0]; }; class ALIGNED_INSTRUCTION WorkGroupInstruction : public BasePolicy, public TupleSrcPolicy, public NDstPolicy { public: INLINE WorkGroupInstruction(WorkGroupOps opcode, uint32_t slmAddr, Register dst, Tuple srcTuple, uint8_t srcNum, Type type) { this->opcode = OP_WORKGROUP; this->workGroupOp = opcode; this->type = type; this->dst[0] = dst; this->src = srcTuple; this->srcNum = srcNum; this->slmAddr = slmAddr; } INLINE Type getType(void) const { return this->type; } INLINE bool wellFormed(const Function &fn, std::string &whyNot) const; INLINE void out(std::ostream &out, const Function &fn) const; INLINE WorkGroupOps getWorkGroupOpcode(void) const { return this->workGroupOp; } uint32_t getSlmAddr(void) const { return this->slmAddr; } WorkGroupOps workGroupOp:5; uint32_t srcNum:3; //!< Source Number uint32_t slmAddr:24; //!< Thread Map in SLM. Type type; //!< Type of the instruction Tuple src; Register dst[1]; }; class ALIGNED_INSTRUCTION SubGroupInstruction : public BasePolicy, public TupleSrcPolicy, public NDstPolicy { public: INLINE SubGroupInstruction(WorkGroupOps opcode, Register dst, Tuple srcTuple, uint8_t srcNum, Type type) { this->opcode = OP_SUBGROUP; this->workGroupOp = opcode; this->type = type; this->dst[0] = dst; this->src = srcTuple; this->srcNum = srcNum; } INLINE Type getType(void) const { return this->type; } INLINE bool wellFormed(const Function &fn, std::string &whyNot) const; INLINE void out(std::ostream &out, const Function &fn) const; INLINE WorkGroupOps getWorkGroupOpcode(void) const { return this->workGroupOp; } WorkGroupOps workGroupOp:5; uint32_t srcNum:3; //!< Source Number Type type; //!< Type of the instruction Tuple src; Register dst[1]; }; class ALIGNED_INSTRUCTION PrintfInstruction : public BasePolicy, public TupleSrcPolicy, public NDstPolicy { public: INLINE PrintfInstruction(Register dst, Tuple srcTuple, Tuple typeTuple, uint8_t srcNum, uint8_t bti, uint16_t num) { this->opcode = OP_PRINTF; this->dst[0] = dst; this->src = srcTuple; this->type = typeTuple; this->srcNum = srcNum; this->bti = bti; this->num = num; } INLINE bool wellFormed(const Function &fn, std::string &whyNot) const; INLINE void out(std::ostream &out, const Function &fn) const; uint32_t getNum(void) const { return this->num; } uint32_t getBti(void) const { return this->bti; } Type getType(const Function& fn, uint32_t ID) const { GBE_ASSERTM(ID < this->srcNum, "Out-of-bound types"); return (Type)fn.getType(type, ID); } uint32_t srcNum:8; //!< Source Number uint32_t bti:8; //!< The BTI uint32_t num:16; //!< The printf statement number of one kernel. 
Tuple src; Tuple type; Register dst[1]; }; class ALIGNED_INSTRUCTION MediaBlockReadInstruction : public BasePolicy, public TupleSrcPolicy, public TupleDstPolicy { public: INLINE MediaBlockReadInstruction(uint8_t imageIdx, Tuple dst, uint8_t vec_size, Tuple srcTuple, uint8_t srcNum, Type type) { this->opcode = OP_MBREAD; this->dst = dst; this->dstNum = vec_size; this->src = srcTuple; this->srcNum = srcNum; this->imageIdx = imageIdx; this->type = type; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << "." << type << "." << (int)this->getVectorSize(); out << " {"; for (uint32_t i = 0; i < dstNum; ++i) out << "%" << this->getDst(fn, i) << (i != (dstNum-1u) ? " " : ""); out << "}"; out << " 2D surface id " << (int)this->getImageIndex() << " byte coord x %" << this->getSrc(fn, 0) << " row coord y %" << this->getSrc(fn, 1); } INLINE uint8_t getImageIndex(void) const { return this->imageIdx; } INLINE uint8_t getVectorSize(void) const { return this->dstNum; } INLINE Type getType(void) const { return this->type; } Tuple src; Tuple dst; uint8_t imageIdx; uint8_t srcNum; uint8_t dstNum; Type type; }; class ALIGNED_INSTRUCTION MediaBlockWriteInstruction : public BasePolicy, public TupleSrcPolicy, public NDstPolicy { public: INLINE MediaBlockWriteInstruction(uint8_t imageIdx, Tuple srcTuple, uint8_t srcNum, uint8_t vec_size, Type type) { this->opcode = OP_MBWRITE; this->src = srcTuple; this->srcNum = srcNum; this->imageIdx = imageIdx; this->vec_size = vec_size; this->type = type; } INLINE bool wellFormed(const Function &fn, std::string &why) const; INLINE void out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << "." << type << "." << (int)this->getVectorSize() << " 2D surface id " << (int)this->getImageIndex() << " byte coord x %" << this->getSrc(fn, 0) << " row coord y %" << this->getSrc(fn, 1); out << " {"; for (uint32_t i = 0; i < vec_size; ++i) out << "%" << this->getSrc(fn, i + 2) << (i != (vec_size-1u) ? " " : ""); out << "}"; } INLINE uint8_t getImageIndex(void) const { return this->imageIdx; } INLINE uint8_t getVectorSize(void) const { return this->vec_size; } INLINE Type getType(void) const { return this->type; } Tuple src; Register dst[0]; uint8_t imageIdx; uint8_t srcNum; uint8_t vec_size; Type type; }; #undef ALIGNED_INSTRUCTION ///////////////////////////////////////////////////////////////////////// // Implements all the wellFormed methods ///////////////////////////////////////////////////////////////////////// /*! All Nary instruction registers must be of the same family and properly * defined (i.e. not out-of-bound) */ static INLINE bool checkRegisterData(RegisterFamily family, const Register &ID, const Function &fn, std::string &whyNot) { if (UNLIKELY(ID.value() >= fn.regNum())) { whyNot = "Out-of-bound destination register index"; return false; } const RegisterData reg = fn.getRegisterData(ID); if (UNLIKELY(reg.family != family)) { whyNot = "Destination family does not match instruction type"; return false; } return true; } /*! Special registers are *not* writeable */ static INLINE bool checkSpecialRegForWrite(const Register ®, const Function &fn, std::string &whyNot) { if (fn.isSpecialReg(reg) == true && reg != ir::ocl::stackptr) { whyNot = "Non stack pointer special registers are not writeable"; return false; } return true; } /*! 
We check that the given type belongs to the provided type family */
  static INLINE bool checkTypeFamily(const Type &type,
                                     const Type *family,
                                     uint32_t typeNum,
                                     std::string &whyNot)
  {
    uint32_t typeID = 0;
    for (; typeID < typeNum; ++typeID)
      if (family[typeID] == type)
        break;
    if (typeID == typeNum) {
      whyNot = "Type is not supported by the instruction";
      return false;
    }
    return true;
  }

#define CHECK_TYPE(TYPE, FAMILY) \
  do { \
    if (UNLIKELY(checkTypeFamily(TYPE, FAMILY, FAMILY##Num, whyNot)) == false) \
      return false; \
  } while (0)

  static const Type madType[] = {TYPE_FLOAT};
  static const uint32_t madTypeNum = ARRAY_ELEM_NUM(madType);
  // TODO add support for 64 bits values
  static const Type allButBool[] = {TYPE_S8,  TYPE_U8,
                                    TYPE_S16, TYPE_U16,
                                    TYPE_S32, TYPE_U32,
                                    TYPE_S64, TYPE_U64,
                                    TYPE_HALF, TYPE_FLOAT, TYPE_DOUBLE};
  static const uint32_t allButBoolNum = ARRAY_ELEM_NUM(allButBool);
  // TODO add support for 64 bits values
  static const Type logicalType[] = {TYPE_S8,  TYPE_U8,
                                     TYPE_S16, TYPE_U16,
                                     TYPE_S32, TYPE_U32,
                                     TYPE_S64, TYPE_U64,
                                     TYPE_BOOL};
  static const uint32_t logicalTypeNum = ARRAY_ELEM_NUM(logicalType);

  // Unary and binary instructions share the same rules
  template <uint32_t srcNum>
  INLINE bool NaryInstruction<srcNum>::wellFormed(const Function &fn, std::string &whyNot) const
  {
    const RegisterFamily family = getFamily(this->type);
    if (UNLIKELY(checkSpecialRegForWrite(dst[0], fn, whyNot) == false))
      return false;
    if (opcode != OP_CBIT &&
        UNLIKELY(checkRegisterData(family, dst[0], fn, whyNot) == false))
      return false;
    for (uint32_t srcID = 0; srcID < srcNum; ++srcID)
      if (UNLIKELY(checkRegisterData(family, src[srcID], fn, whyNot) == false))
        return false;
    // We actually support logical operations on boolean values for AND, OR,
    // and XOR
    switch (this->opcode) {
      case OP_OR:
      case OP_XOR:
      case OP_AND:
        CHECK_TYPE(this->type, logicalType);
        break;
      default:
        CHECK_TYPE(this->type, allButBool);
        break;
      case OP_MOV:
        break;
      case OP_POW:
      case OP_COS:
      case OP_SIN:
      case OP_RCP:
      case OP_ABS:
      case OP_RSQ:
      case OP_SQR:
      case OP_RNDD:
      case OP_RNDE:
      case OP_RNDU:
      case OP_RNDZ:
        const Type fp = TYPE_FLOAT;
        if (UNLIKELY(checkTypeFamily(TYPE_FLOAT, &fp, 1, whyNot)) == false)
          return false;
        break;
    }
    return true;
  }

  // The first source must be a boolean. The others must match the destination type
  INLINE bool SelectInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    const RegisterFamily family = getFamily(this->type);
    if (UNLIKELY(checkSpecialRegForWrite(dst[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(checkRegisterData(family, dst[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(src + 3u > fn.tupleNum())) {
      whyNot = "Out-of-bound index for ternary instruction";
      return false;
    }
    const Register regID = fn.getRegister(src, 0);
    if (UNLIKELY(checkRegisterData(FAMILY_BOOL, regID, fn, whyNot) == false))
      return false;
    for (uint32_t srcID = 1; srcID < 3; ++srcID) {
      const Register regID = fn.getRegister(src, srcID);
      if (UNLIKELY(checkRegisterData(family, regID, fn, whyNot) == false))
        return false;
    }
    return true;
  }
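  /* A hypothetical consistency pass built on these checks: walk every
   * instruction of a function and report the first ill-formed one. The
   * Function::foreachInstruction visitor, the public Instruction::wellFormed
   * dispatcher and the instruction operator<< used below all exist in this
   * code base; only the pass itself and its error reporting are a sketch.
   *
   *   void verify(const Function &fn) {
   *     fn.foreachInstruction([&](const Instruction &insn) {
   *       std::string whyNot;
   *       if (insn.wellFormed(whyNot) == false)
   *         std::cerr << insn << ": " << whyNot << std::endl;
   *     });
   *   }
   */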
  // Pretty similar to binary instructions. Only the destination is of
  // boolean type
  INLINE bool CompareInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (UNLIKELY(checkSpecialRegForWrite(dst[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(checkRegisterData(FAMILY_BOOL, dst[0], fn, whyNot) == false))
      return false;
    const RegisterFamily family = getFamily(this->type);
    for (uint32_t srcID = 0; srcID < 2; ++srcID)
      if (UNLIKELY(checkRegisterData(family, src[srcID], fn, whyNot) == false))
        return false;
    return true;
  }

  // The bit sizes of src and dst must be identical. Booleans are not
  // supported for now; they would need a double check.
  INLINE bool BitCastInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    for (uint32_t dstID = 0; dstID < dstNum; ++dstID) {
      if (UNLIKELY(checkSpecialRegForWrite(getDst(fn, dstID), fn, whyNot) == false))
        return false;
      if (UNLIKELY(checkRegisterData((RegisterFamily)dstFamily, getDst(fn, dstID), fn, whyNot) == false))
        return false;
    }
    for (uint32_t srcID = 0; srcID < srcNum; ++srcID) {
      if (UNLIKELY(checkRegisterData((RegisterFamily)srcFamily, getSrc(fn, srcID), fn, whyNot) == false))
        return false;
    }
    CHECK_TYPE(getType((RegisterFamily)dstFamily), allButBool);
    CHECK_TYPE(getType((RegisterFamily)srcFamily), allButBool);

    uint32_t dstBytes = 0, srcBytes = 0;
    dstBytes = dstNum * getFamilySize((RegisterFamily)dstFamily);
    srcBytes = srcNum * getFamilySize((RegisterFamily)srcFamily);
    if (dstBytes != srcBytes) {
      whyNot = "The bit sizes of src and dst are not identical.";
      return false;
    }
    return true;
  }

  // We can convert anything to anything, but types and families must match
  INLINE bool ConvertInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    const RegisterFamily dstFamily = getFamily(dstType);
    const RegisterFamily srcFamily = getFamily(srcType);
    if (UNLIKELY(checkSpecialRegForWrite(dst[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(checkRegisterData(dstFamily, dst[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(checkRegisterData(srcFamily, src[0], fn, whyNot) == false))
      return false;
    CHECK_TYPE(this->dstType, allButBool);
    CHECK_TYPE(this->srcType, allButBool);
    return true;
  }

  // The destination and the payload sources must match the instruction type
  INLINE bool AtomicInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (UNLIKELY(checkSpecialRegForWrite(dst[0], fn, whyNot) == false))
      return false;
    const RegisterFamily family = getFamily(this->type);
    if (UNLIKELY(checkRegisterData(family, dst[0], fn, whyNot) == false))
      return false;
    for (uint32_t srcID = 0; srcID < srcNum-1u; ++srcID)
      if (UNLIKELY(checkRegisterData(family, getSrc(fn, srcID+1u), fn, whyNot) == false))
        return false;
    return true;
  }

  INLINE bool TernaryInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    const RegisterFamily family = getFamily(this->type);
    if (UNLIKELY(checkSpecialRegForWrite(dst[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(checkRegisterData(family, dst[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(src + 3u > fn.tupleNum())) {
      whyNot = "Out-of-bound index for ternary instruction";
      return false;
    }
    for (uint32_t srcID = 0; srcID < 3; ++srcID) {
      const Register regID = fn.getRegister(src, srcID);
      if (UNLIKELY(checkRegisterData(family, regID, fn, whyNot) == false))
        return false;
    }
    return true;
  }
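  /* To make the BitCastInstruction byte-count rule concrete: a bitcast may
   * regroup registers but never change the total number of bytes. Assuming
   * FAMILY_WORD registers are 2 bytes and FAMILY_DWORD registers are 4 bytes,
   * casting one 32-bit source to two 16-bit destinations is well formed,
   * while one 32-bit source to a single 16-bit destination is rejected.
   * With the BITCAST helper defined later in this file and hypothetical
   * tuples this reads:
   *
   *   BITCAST(TYPE_U16, TYPE_U32, dstTuple, srcTuple, 2, 1); // ok: 2*2 == 1*4 bytes
   *   BITCAST(TYPE_U16, TYPE_U32, dstTuple, srcTuple, 1, 1); // ill-formed: 2 != 4
   */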
  /*! Loads and stores follow the same restrictions */
  template <typename T>
  INLINE bool wellFormedLoadStore(const T &insn, const Function &fn, std::string &whyNot)
  {
    if (UNLIKELY(insn.getAddressRegister() >= fn.regNum())) {
      whyNot = "Out-of-bound offset register index";
      return false;
    }
    if (UNLIKELY(insn.values + insn.valueNum > fn.tupleNum())) {
      whyNot = "Out-of-bound tuple index";
      return false;
    }
    // Check all registers
    const RegisterFamily family = getFamily(insn.getValueType());
    for (uint32_t valueID = 0; valueID < insn.getValueNum(); ++valueID) {
      const Register regID = insn.getValue(fn, valueID);
      if (UNLIKELY(checkRegisterData(family, regID, fn, whyNot) == false))
        return false;
    }
    return true;
  }

  INLINE bool LoadInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    const uint32_t dstNum = this->getDstNum();
    for (uint32_t dstID = 0; dstID < dstNum; ++dstID) {
      const Register reg = this->getDst(fn, dstID);
      const bool isOK = checkSpecialRegForWrite(reg, fn, whyNot);
      if (UNLIKELY(isOK == false)) return false;
    }
    if (UNLIKELY(dstNum > Instruction::MAX_DST_NUM)) {
      whyNot = "Too many destinations for load instruction";
      return false;
    }
    return wellFormedLoadStore(*this, fn, whyNot);
  }

  INLINE bool StoreInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    const uint32_t srcNum = this->getSrcNum();
    if (UNLIKELY(srcNum > Instruction::MAX_SRC_NUM)) {
      whyNot = "Too many sources for store instruction";
      return false;
    }
    return wellFormedLoadStore(*this, fn, whyNot);
  }

  // TODO
  INLINE bool SampleInstruction::wellFormed(const Function &fn, std::string &why) const
  { return true; }
  INLINE bool VmeInstruction::wellFormed(const Function &fn, std::string &why) const
  { return true; }
  INLINE bool TypedWriteInstruction::wellFormed(const Function &fn, std::string &why) const
  { return true; }
  INLINE bool GetImageInfoInstruction::wellFormed(const Function &fn, std::string &why) const
  { return true; }
  INLINE bool WaitInstruction::wellFormed(const Function &fn, std::string &why) const
  { return true; }

  // Ensure that types and register family match
  INLINE bool LoadImmInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (UNLIKELY(immediateIndex >= fn.immediateNum())) {
      whyNot = "Out-of-bound immediate value index";
      return false;
    }
    const ir::Type immType = fn.getImmediate(immediateIndex).getType();
    if (UNLIKELY(type != immType)) {
      whyNot = "Inconsistent type for the immediate value to load";
      return false;
    }
    const RegisterFamily family = getFamily(type);
    if (UNLIKELY(checkSpecialRegForWrite(dst[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(checkRegisterData(family, dst[0], fn, whyNot) == false))
      return false;
    // Support all type IMM, disable check
    //CHECK_TYPE(this->type, allButBool);
    return true;
  }

  INLINE bool SyncInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    const uint32_t maxParams = SYNC_WORKGROUP_EXEC |
                               SYNC_LOCAL_READ_FENCE |
                               SYNC_LOCAL_WRITE_FENCE |
                               SYNC_GLOBAL_READ_FENCE |
                               SYNC_GLOBAL_WRITE_FENCE |
                               SYNC_IMAGE_FENCE;
    if (UNLIKELY(this->parameters > maxParams)) {
      whyNot = "Invalid parameters for sync instruction";
      return false;
    } else if (UNLIKELY(this->parameters == 0)) {
      whyNot = "Missing parameters for sync instruction";
      return false;
    }
    return true;
  }
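  /* The sync parameters form a bitmask, so the check above accepts any
   * non-empty combination of the execution and fence flags and nothing
   * outside of them. A work-group barrier that also fences local memory
   * would be emitted through the SYNC helper defined near the end of this
   * file, roughly as:
   *
   *   SYNC(SYNC_WORKGROUP_EXEC | SYNC_LOCAL_READ_FENCE | SYNC_LOCAL_WRITE_FENCE);
   *
   * whereas SYNC(0) produces an instruction that wellFormed rejects with
   * "Missing parameters for sync instruction".
   */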
  INLINE bool ReadARFInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (UNLIKELY(this->type != TYPE_U32 && this->type != TYPE_S32)) {
      whyNot = "Only support S32/U32 type";
      return false;
    }
    const RegisterFamily family = getFamily(this->type);
    if (UNLIKELY(checkRegisterData(family, dst[0], fn, whyNot) == false))
      return false;
    return true;
  }

  INLINE bool SimdShuffleInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (UNLIKELY(this->type != TYPE_U32 && this->type != TYPE_S32 && this->type != TYPE_FLOAT &&
                 this->type != TYPE_U16 && this->type != TYPE_S16)) {
      whyNot = "Only support S16/U16/S32/U32/FLOAT type";
      return false;
    }
    if (UNLIKELY(checkRegisterData(FAMILY_DWORD, src[1], fn, whyNot) == false))
      return false;
    return true;
  }

  INLINE bool RegionInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (UNLIKELY(checkRegisterData(FAMILY_DWORD, src[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(checkRegisterData(FAMILY_DWORD, dst[0], fn, whyNot) == false))
      return false;
    return true;
  }

  INLINE bool IndirectMovInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    const RegisterFamily family = getFamily(this->type);
    if (UNLIKELY(checkSpecialRegForWrite(dst[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(checkRegisterData(family, dst[0], fn, whyNot) == false))
      return false;
    return true;
  }

  // Only a label index is required
  INLINE bool LabelInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (UNLIKELY(labelIndex >= fn.labelNum())) {
      whyNot = "Out-of-bound label index";
      return false;
    }
    return true;
  }

  // The label must exist and the register must be of boolean family
  INLINE bool BranchInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (hasLabel)
      if (UNLIKELY(labelIndex >= fn.labelNum())) {
        whyNot = "Out-of-bound label index";
        return false;
      }
    if (hasPredicate)
      if (UNLIKELY(checkRegisterData(FAMILY_BOOL, predicate, fn, whyNot) == false))
        return false;
    return true;
  }

  INLINE bool CalcTimestampInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (UNLIKELY(this->timestampType != 1)) {
      whyNot = "Wrong timestamp type";
      return false;
    }
    if (UNLIKELY(this->pointNum >= 20 && this->pointNum != 0xff && this->pointNum != 0xfe)) {
      whyNot = "Too many insert points";
      return false;
    }
    return true;
  }

  INLINE bool StoreProfilingInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (UNLIKELY(this->profilingType != 1)) {
      whyNot = "Wrong profiling format";
      return false;
    }
    return true;
  }

  INLINE bool WorkGroupInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    const RegisterFamily family = getFamily(this->type);
    if (UNLIKELY(checkSpecialRegForWrite(dst[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(checkRegisterData(family, dst[0], fn, whyNot) == false))
      return false;

    switch (this->workGroupOp) {
      case WORKGROUP_OP_ANY:
      case WORKGROUP_OP_ALL:
      case WORKGROUP_OP_REDUCE_ADD:
      case WORKGROUP_OP_REDUCE_MIN:
      case WORKGROUP_OP_REDUCE_MAX:
      case WORKGROUP_OP_INCLUSIVE_ADD:
      case WORKGROUP_OP_INCLUSIVE_MIN:
      case WORKGROUP_OP_INCLUSIVE_MAX:
      case WORKGROUP_OP_EXCLUSIVE_ADD:
      case WORKGROUP_OP_EXCLUSIVE_MIN:
      case WORKGROUP_OP_EXCLUSIVE_MAX:
        if (this->srcNum != 3) {
          whyNot = "Wrong number of sources.";
          return false;
        }
        break;
      case WORKGROUP_OP_BROADCAST:
        if (this->srcNum <= 1) {
          whyNot = "Wrong number of sources.";
          return false;
        } else {
          const RegisterFamily fam = fn.getPointerFamily();
          for (uint32_t srcID = 1; srcID < this->srcNum; ++srcID) {
            const Register regID = fn.getRegister(src, srcID);
            if (UNLIKELY(checkRegisterData(fam, regID, fn, whyNot) == false))
              return false;
          }
        }
        break;
      default:
        whyNot = "No such work group function.";
        return false;
    }
    return true;
  }
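  /* The source layout enforced above for WORKGROUP_OP_BROADCAST is: src[0]
   * holds the value to broadcast and src[1..srcNum-1] hold the local ids
   * selecting the work item to read from, one per dimension, checked against
   * the pointer family. A 2D broadcast would thus be emitted with a
   * three-entry tuple, sketched below; the register names are hypothetical
   * and fn.appendTuple is assumed to be the tuple-building helper:
   *
   *   const Tuple srcTuple = fn.appendTuple(valueReg, localXReg, localYReg);
   *   WORKGROUP(WORKGROUP_OP_BROADCAST, slmAddr, dstReg, srcTuple, 3, TYPE_U32);
   */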
  INLINE bool SubGroupInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    const RegisterFamily family = getFamily(this->type);
    if (UNLIKELY(checkSpecialRegForWrite(dst[0], fn, whyNot) == false))
      return false;
    if (UNLIKELY(checkRegisterData(family, dst[0], fn, whyNot) == false))
      return false;

    switch (this->workGroupOp) {
      case WORKGROUP_OP_ANY:
      case WORKGROUP_OP_ALL:
      case WORKGROUP_OP_REDUCE_ADD:
      case WORKGROUP_OP_REDUCE_MIN:
      case WORKGROUP_OP_REDUCE_MAX:
      case WORKGROUP_OP_INCLUSIVE_ADD:
      case WORKGROUP_OP_INCLUSIVE_MIN:
      case WORKGROUP_OP_INCLUSIVE_MAX:
      case WORKGROUP_OP_EXCLUSIVE_ADD:
      case WORKGROUP_OP_EXCLUSIVE_MIN:
      case WORKGROUP_OP_EXCLUSIVE_MAX:
        if (this->srcNum != 1) {
          whyNot = "Wrong number of sources.";
          return false;
        }
        break;
      case WORKGROUP_OP_BROADCAST:
        if (this->srcNum != 2) {
          whyNot = "Wrong number of sources.";
          return false;
        } else {
          if (UNLIKELY(checkRegisterData(FAMILY_DWORD, fn.getRegister(src, 1), fn, whyNot) == false))
            return false;
        }
        break;
      default:
        whyNot = "No such sub group function.";
        return false;
    }
    return true;
  }

  INLINE bool PrintfInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  { return true; }

  INLINE bool MediaBlockReadInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (this->srcNum != 2) {
      whyNot = "Wrong number of sources.";
      return false;
    }
    return true;
  }

  INLINE bool MediaBlockWriteInstruction::wellFormed(const Function &fn, std::string &whyNot) const
  {
    if (this->srcNum != 2 + this->vec_size) {
      whyNot = "Wrong number of sources.";
      return false;
    }
    return true;
  }

#undef CHECK_TYPE

  /////////////////////////////////////////////////////////////////////////
  // Implements all the output stream methods
  /////////////////////////////////////////////////////////////////////////
  template <uint32_t srcNum>
  INLINE void NaryInstruction<srcNum>::out(std::ostream &out, const Function &fn) const
  {
    this->outOpcode(out);
    out << "." << this->getType()
        << " %" << this->getDst(fn, 0);
    for (uint32_t i = 0; i < srcNum; ++i)
      out << " %" << this->getSrc(fn, i);
  }

  template <typename T>
  static void ternaryOrSelectOut(const T &insn, std::ostream &out, const Function &fn)
  {
    insn.outOpcode(out);
    out << "." << insn.getType()
        << " %" << insn.getDst(fn, 0)
        << " %" << insn.getSrc(fn, 0)
        << " %" << insn.getSrc(fn, 1)
        << " %" << insn.getSrc(fn, 2);
  }

  INLINE void SelectInstruction::out(std::ostream &out, const Function &fn) const
  { ternaryOrSelectOut(*this, out, fn); }

  INLINE void TernaryInstruction::out(std::ostream &out, const Function &fn) const
  { ternaryOrSelectOut(*this, out, fn); }

  INLINE void AtomicInstruction::out(std::ostream &out, const Function &fn) const
  {
    this->outOpcode(out);
    out << "." << AS;
#define OUT_ATOMIC_OP(TYPE) \
    case ATOMIC_OP_##TYPE: \
    { out << "." << #TYPE; \
      break; \
    }
    switch (atomicOp) {
      OUT_ATOMIC_OP(AND)
      OUT_ATOMIC_OP(OR)
      OUT_ATOMIC_OP(XOR)
      OUT_ATOMIC_OP(XCHG)
      OUT_ATOMIC_OP(INC)
      OUT_ATOMIC_OP(DEC)
      OUT_ATOMIC_OP(ADD)
      OUT_ATOMIC_OP(SUB)
      OUT_ATOMIC_OP(IMAX)
      OUT_ATOMIC_OP(IMIN)
      OUT_ATOMIC_OP(UMAX)
      OUT_ATOMIC_OP(UMIN)
      OUT_ATOMIC_OP(CMPXCHG)
      default: out << "." << "INVALID"; assert(0);
    };
    out << " %" << this->getDst(fn, 0);
    out << " {" << "%" << this->getSrc(fn, 0) << "}";
    for (uint32_t i = 1; i < srcNum; ++i)
      out << " %" << this->getSrc(fn, i);
    AddressMode am = this->getAddressMode();
    out << " bti:";
    if (am == AM_DynamicBti)
      out << " %" << this->getBtiReg();
    else
      out << this->getSurfaceIndex();
  }

  INLINE void BitCastInstruction::out(std::ostream &out, const Function &fn) const
  {
    this->outOpcode(out);
    out << "." << this->getDstType() << "."
<< this->getSrcType(); out << " {"; for (uint32_t i = 0; i < dstNum; ++i) out << "%" << this->getDst(fn, i) << (i != (dstNum-1u) ? " " : ""); out << "}"; out << " {"; for (uint32_t i = 0; i < srcNum; ++i) out << "%" << this->getSrc(fn, i) << (i != (srcNum-1u) ? " " : ""); out << "}"; } INLINE void ConvertInstruction::out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << "." << this->getDstType() << "." << this->getSrcType() << " %" << this->getDst(fn, 0) << " %" << this->getSrc(fn, 0); } INLINE void LoadInstruction::out(std::ostream &out, const Function &fn) const { if(ifBlock) out<< "BLOCK"; this->outOpcode(out); out << "." << type << "." << AS << (dwAligned ? "." : ".un") << "aligned"; out << " {"; for (uint32_t i = 0; i < valueNum; ++i) out << "%" << this->getDst(fn, i) << (i != (valueNum-1u) ? " " : ""); out << "}"; out << " %" << this->getSrc(fn, 0); AddressMode am = this->getAddressMode(); out << " bti:"; if ( am == AM_DynamicBti) { out << " %" << this->getBtiReg(); } else { out << this->getSurfaceIndex(); } } INLINE void StoreInstruction::out(std::ostream &out, const Function &fn) const { if(ifBlock) out<< "BLOCK"; this->outOpcode(out); out << "." << type << "." << AS << (dwAligned ? "." : ".un") << "aligned"; out << " %" << this->getSrc(fn, 0) << " {"; for (uint32_t i = 0; i < valueNum; ++i) out << "%" << this->getSrc(fn, i+1) << (i != (valueNum-1u) ? " " : ""); out << "}"; AddressMode am = this->getAddressMode(); out << " bti:"; if ( am == AM_DynamicBti) { out << " %" << this->getBtiReg(); } else { out << this->getSurfaceIndex(); } } INLINE void ReadARFInstruction::out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << " %" << this->getDst(fn, 0) << " arf:" << arf; } INLINE void RegionInstruction::out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << " %" << this->getDst(fn, 0) << " %" << this->getSrc(fn, 0) << " offset: " << this->offset; } INLINE void IndirectMovInstruction::out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << "." << type << " %" << this->getDst(fn, 0) << " %" << this->getSrc(fn, 0); out << " %" << this->getSrc(fn, 1) << " offset: " << this->offset; } INLINE void LabelInstruction::out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << " $" << labelIndex; } INLINE void BranchInstruction::out(std::ostream &out, const Function &fn) const { this->outOpcode(out); if(opcode == OP_IF && inversePredicate) out << " !"; if (hasPredicate) out << "<%" << this->getSrc(fn, 0) << ">"; if (hasLabel) out << " -> label$" << labelIndex; } INLINE void LoadImmInstruction::out(std::ostream &out, const Function &fn) const { this->outOpcode(out); out << "." << type; out << " %" << this->getDst(fn,0) << " "; fn.outImmediate(out, immediateIndex); } static const char *syncStr[syncFieldNum] = { "workgroup", "local_read", "local_write", "global_read", "global_write", "image" }; INLINE void SyncInstruction::out(std::ostream &out, const Function &fn) const { this->outOpcode(out); for (uint32_t field = 0; field < syncFieldNum; ++field) if (this->parameters & (1 << field)) out << "." 
          << syncStr[field];
  }

  INLINE void WaitInstruction::out(std::ostream &out, const Function &fn) const
  {
    this->outOpcode(out);
  }

  INLINE void WorkGroupInstruction::out(std::ostream &out, const Function &fn) const
  {
    this->outOpcode(out);
    switch (this->workGroupOp) {
      case WORKGROUP_OP_ANY: out << "_" << "ANY"; break;
      case WORKGROUP_OP_ALL: out << "_" << "ALL"; break;
      case WORKGROUP_OP_REDUCE_ADD: out << "_" << "REDUCE_ADD"; break;
      case WORKGROUP_OP_REDUCE_MIN: out << "_" << "REDUCE_MIN"; break;
      case WORKGROUP_OP_REDUCE_MAX: out << "_" << "REDUCE_MAX"; break;
      case WORKGROUP_OP_INCLUSIVE_ADD: out << "_" << "INCLUSIVE_ADD"; break;
      case WORKGROUP_OP_INCLUSIVE_MIN: out << "_" << "INCLUSIVE_MIN"; break;
      case WORKGROUP_OP_INCLUSIVE_MAX: out << "_" << "INCLUSIVE_MAX"; break;
      case WORKGROUP_OP_EXCLUSIVE_ADD: out << "_" << "EXCLUSIVE_ADD"; break;
      case WORKGROUP_OP_EXCLUSIVE_MIN: out << "_" << "EXCLUSIVE_MIN"; break;
      case WORKGROUP_OP_EXCLUSIVE_MAX: out << "_" << "EXCLUSIVE_MAX"; break;
      case WORKGROUP_OP_BROADCAST: out << "_" << "BROADCAST"; break;
      default: GBE_ASSERT(0);
    }
    out << " %" << this->getDst(fn, 0);
    for (uint32_t i = 0; i < this->getSrcNum(); ++i)
      out << " %" << this->getSrc(fn, i);

    if (this->workGroupOp == WORKGROUP_OP_BROADCAST) {
      do {
        int localN = srcNum - 1;
        GBE_ASSERT(localN);
        out << " Local X:";
        out << " %" << this->getSrc(fn, 1);
        localN--;
        if (!localN) break;
        out << " Local Y:";
        out << " %" << this->getSrc(fn, 2);
        localN--;
        if (!localN) break;
        out << " Local Z:";
        out << " %" << this->getSrc(fn, 3);
        localN--;
        GBE_ASSERT(!localN);
      } while(0);
    }
    out << " (ThreadID Map at SLM: " << this->slmAddr << ")";
  }

  INLINE void SubGroupInstruction::out(std::ostream &out, const Function &fn) const
  {
    this->outOpcode(out);
    switch (this->workGroupOp) {
      case WORKGROUP_OP_ANY: out << "_" << "ANY"; break;
      case WORKGROUP_OP_ALL: out << "_" << "ALL"; break;
      case WORKGROUP_OP_REDUCE_ADD: out << "_" << "REDUCE_ADD"; break;
      case WORKGROUP_OP_REDUCE_MIN: out << "_" << "REDUCE_MIN"; break;
      case WORKGROUP_OP_REDUCE_MAX: out << "_" << "REDUCE_MAX"; break;
      case WORKGROUP_OP_INCLUSIVE_ADD: out << "_" << "INCLUSIVE_ADD"; break;
      case WORKGROUP_OP_INCLUSIVE_MIN: out << "_" << "INCLUSIVE_MIN"; break;
      case WORKGROUP_OP_INCLUSIVE_MAX: out << "_" << "INCLUSIVE_MAX"; break;
      case WORKGROUP_OP_EXCLUSIVE_ADD: out << "_" << "EXCLUSIVE_ADD"; break;
      case WORKGROUP_OP_EXCLUSIVE_MIN: out << "_" << "EXCLUSIVE_MIN"; break;
      case WORKGROUP_OP_EXCLUSIVE_MAX: out << "_" << "EXCLUSIVE_MAX"; break;
      case WORKGROUP_OP_BROADCAST: out << "_" << "BROADCAST"; break;
      default: GBE_ASSERT(0);
    }
    out << " %" << this->getDst(fn, 0);
    out << " %" << this->getSrc(fn, 0);
    if (this->workGroupOp == WORKGROUP_OP_BROADCAST) {
      do {
        int localN = srcNum - 1;
        GBE_ASSERT(localN);
        out << " Local ID:";
        out << " %" << this->getSrc(fn, 1);
        localN--;
        if (!localN) break;
      } while(0);
    }
  }

  INLINE void PrintfInstruction::out(std::ostream &out, const Function &fn) const
  {
    this->outOpcode(out);
  }
} /* namespace internal */

  std::ostream &operator<< (std::ostream &out, AddressSpace addrSpace) {
    switch (addrSpace) {
      case MEM_GLOBAL: return out << "global";
      case MEM_LOCAL: return out << "local";
      case MEM_CONSTANT: return out << "constant";
      case MEM_PRIVATE: return out << "private";
      case MEM_MIXED: return out << "mixed";
      case MEM_GENERIC: return out << "generic";
      case MEM_INVALID: return out << "invalid";
    };
    return out;
  }
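  /* For reference, the out() methods above produce the compact textual IR
   * dump used for debugging. For instance, following LoadInstruction::out,
   * an aligned global load of two dword values with a fixed surface index
   * would print roughly as
   *
   *   LOAD.int32.global.aligned {%10 %11} %9 bti:1
   *
   * that is: opcode, type, address space, alignment, the destination tuple,
   * the address register, then the bti. The exact type spelling comes from
   * the Type operator<< and the register numbers here are illustrative.
   */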
  ///////////////////////////////////////////////////////////////////////////
  // Implements the various introspection functions
  ///////////////////////////////////////////////////////////////////////////
  template <typename T, typename U> struct HelperIntrospection {
    enum { value = 0 };
  };
  template <typename T> struct HelperIntrospection<T, T> {
    enum { value = 1 };
  };

  RegisterData Instruction::getDstData(uint32_t ID) const {
    const Function &fn = this->getFunction();
    return fn.getRegisterData(this->getDst(ID));
  }
  RegisterData Instruction::getSrcData(uint32_t ID) const {
    const Function &fn = this->getFunction();
    return fn.getRegisterData(this->getSrc(ID));
  }

#define DECL_INSN(OPCODE, CLASS) \
  case OP_##OPCODE: \
    return HelperIntrospection<CLASS, RefClass>::value == 1;

#define START_INTROSPECTION(CLASS) \
  static_assert(sizeof(internal::CLASS) == (sizeof(uint64_t)*4), \
                "Bad instruction size"); \
  static_assert(offsetof(internal::CLASS, opcode) == 0, \
                "Bad opcode offset"); \
  bool CLASS::isClassOf(const Instruction &insn) { \
    const Opcode op = insn.getOpcode(); \
    typedef CLASS RefClass; \
    switch (op) {

#define END_INTROSPECTION(CLASS) \
      default: return false; \
    }; \
  }

START_INTROSPECTION(NullaryInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(NullaryInstruction)

START_INTROSPECTION(UnaryInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(UnaryInstruction)

START_INTROSPECTION(BinaryInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(BinaryInstruction)

START_INTROSPECTION(CompareInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(CompareInstruction)

START_INTROSPECTION(BitCastInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(BitCastInstruction)

START_INTROSPECTION(ConvertInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(ConvertInstruction)

START_INTROSPECTION(AtomicInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(AtomicInstruction)

START_INTROSPECTION(SelectInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(SelectInstruction)

START_INTROSPECTION(TernaryInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(TernaryInstruction)

START_INTROSPECTION(BranchInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(BranchInstruction)

START_INTROSPECTION(SampleInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(SampleInstruction)

START_INTROSPECTION(TypedWriteInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(TypedWriteInstruction)

START_INTROSPECTION(GetImageInfoInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(GetImageInfoInstruction)

START_INTROSPECTION(CalcTimestampInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(CalcTimestampInstruction)

START_INTROSPECTION(StoreProfilingInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(StoreProfilingInstruction)

START_INTROSPECTION(LoadImmInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(LoadImmInstruction)

START_INTROSPECTION(LoadInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(LoadInstruction)

START_INTROSPECTION(StoreInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(StoreInstruction)

START_INTROSPECTION(SyncInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(SyncInstruction)

START_INTROSPECTION(ReadARFInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(ReadARFInstruction)

START_INTROSPECTION(RegionInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(RegionInstruction)

START_INTROSPECTION(SimdShuffleInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(SimdShuffleInstruction)

START_INTROSPECTION(IndirectMovInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(IndirectMovInstruction)

START_INTROSPECTION(LabelInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(LabelInstruction)

START_INTROSPECTION(WaitInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(WaitInstruction)

START_INTROSPECTION(VmeInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(VmeInstruction)

START_INTROSPECTION(WorkGroupInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(WorkGroupInstruction)

START_INTROSPECTION(SubGroupInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(SubGroupInstruction)

START_INTROSPECTION(PrintfInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(PrintfInstruction)

START_INTROSPECTION(MediaBlockReadInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(MediaBlockReadInstruction)

START_INTROSPECTION(MediaBlockWriteInstruction)
#include "ir/instruction.hxx"
END_INTROSPECTION(MediaBlockWriteInstruction)

#undef END_INTROSPECTION
#undef START_INTROSPECTION
#undef DECL_INSN

  ///////////////////////////////////////////////////////////////////////////
  // Implements the function dispatching from public to internal with some
  // macro horrors
  ///////////////////////////////////////////////////////////////////////////
#define DECL_INSN(OPCODE, CLASS) \
  case OP_##OPCODE: return reinterpret_cast<const internal::CLASS*>(this)->CALL;

#define START_FUNCTION(CLASS, RET, PROTOTYPE) \
  RET CLASS::PROTOTYPE const { \
    const Opcode op = this->getOpcode(); \
    switch (op) {

#define END_FUNCTION(CLASS, RET) \
      case OP_INVALID: return RET(); \
    }; \
    return RET(); \
  }

#define CALL getSrcNum()
START_FUNCTION(Instruction, uint32_t, getSrcNum(void))
#include "ir/instruction.hxx"
END_FUNCTION(Instruction, uint32_t)
#undef CALL

#define CALL getDstNum()
START_FUNCTION(Instruction, uint32_t, getDstNum(void))
#include "ir/instruction.hxx"
END_FUNCTION(Instruction, uint32_t)
#undef CALL
#undef DECL_INSN

#define DECL_INSN(OPCODE, CLASS) \
  case OP_##OPCODE: \
  { \
    const Function &fn = this->getFunction(); \
    return reinterpret_cast<const internal::CLASS*>(this)->CALL; \
  }

#define CALL wellFormed(fn, whyNot)
START_FUNCTION(Instruction, bool, wellFormed(std::string &whyNot))
#include "ir/instruction.hxx"
END_FUNCTION(Instruction, bool)
#undef CALL

#define CALL getDst(fn, ID)
START_FUNCTION(Instruction, Register, getDst(uint32_t ID))
#include "ir/instruction.hxx"
END_FUNCTION(Instruction, Register)
#undef CALL

#define CALL getSrc(fn, ID)
START_FUNCTION(Instruction, Register, getSrc(uint32_t ID))
#include "ir/instruction.hxx"
END_FUNCTION(Instruction, Register)
#undef CALL
#undef DECL_INSN
#undef END_FUNCTION
#undef START_FUNCTION

  void Instruction::setSrc(uint32_t srcID, Register reg) {
    Function &fn = this->getFunction();
#if GBE_DEBUG
    const RegisterData oldData = this->getSrcData(srcID);
    const RegisterData newData = fn.getRegisterData(reg);
    GBE_ASSERT(oldData.family == newData.family);
#endif /* GBE_DEBUG */
    const Opcode op = this->getOpcode();
    switch (op) {
#define DECL_INSN(OP, FAMILY) \
      case OP_##OP: \
        reinterpret_cast<internal::FAMILY*>(this)->setSrc(fn, srcID, reg); \
        break;
#include "instruction.hxx"
#undef DECL_INSN
      case OP_INVALID: NOT_SUPPORTED; break;
    };
  }

  void Instruction::setDst(uint32_t dstID, Register reg) {
    Function &fn = this->getFunction();
#if GBE_DEBUG
    const RegisterData oldData = this->getDstData(dstID);
    const RegisterData newData = fn.getRegisterData(reg);
    GBE_ASSERT(oldData.family == newData.family);
#endif /* GBE_DEBUG */
    const Opcode op = this->getOpcode();
    switch (op) {
#define DECL_INSN(OP, FAMILY) \
      case OP_##OP: \
        reinterpret_cast<internal::FAMILY*>(this)->setDst(fn, dstID, reg); \
        break;
#include "instruction.hxx"
#undef DECL_INSN
      case OP_INVALID: NOT_SUPPORTED; break;
    };
  }
const Function &Instruction::getFunction(void) const { const BasicBlock *bb = this->getParent(); GBE_ASSERT(bb != NULL); return bb->getParent(); } Function &Instruction::getFunction(void) { BasicBlock *bb = this->getParent(); GBE_ASSERT(bb != NULL); return bb->getParent(); } void Instruction::replace(Instruction *other) const { Function &fn = other->getFunction(); Instruction *insn = fn.newInstruction(*this); intrusive_list_node *prev = other->prev; insn->parent = other->parent; other->remove(); append(insn, prev); } void Instruction::remove(void) { Function &fn = this->getFunction(); unlink(this); fn.deleteInstruction(this); } void Instruction::insert(Instruction *prev, Instruction ** new_ins) { Function &fn = prev->getFunction(); Instruction *insn = fn.newInstruction(*this); insn->parent = prev->parent; append(insn, prev); if (new_ins) *new_ins = insn; } bool Instruction::hasSideEffect(void) const { return opcode == OP_STORE || opcode == OP_TYPED_WRITE || opcode == OP_SYNC || opcode == OP_ATOMIC || opcode == OP_CALC_TIMESTAMP || opcode == OP_STORE_PROFILING || opcode == OP_WAIT || opcode == OP_PRINTF || opcode == OP_MBWRITE; } #define DECL_MEM_FN(CLASS, RET, PROTOTYPE, CALL) \ RET CLASS::PROTOTYPE const { \ return reinterpret_cast(this)->CALL; \ } DECL_MEM_FN(NullaryInstruction, Type, getType(void), getType()) DECL_MEM_FN(UnaryInstruction, Type, getType(void), getType()) DECL_MEM_FN(BinaryInstruction, Type, getType(void), getType()) DECL_MEM_FN(BinaryInstruction, bool, commutes(void), commutes()) DECL_MEM_FN(SelectInstruction, Type, getType(void), getType()) DECL_MEM_FN(TernaryInstruction, Type, getType(void), getType()) DECL_MEM_FN(CompareInstruction, Type, getType(void), getType()) DECL_MEM_FN(BitCastInstruction, Type, getSrcType(void), getSrcType()) DECL_MEM_FN(BitCastInstruction, Type, getDstType(void), getDstType()) DECL_MEM_FN(ConvertInstruction, Type, getSrcType(void), getSrcType()) DECL_MEM_FN(ConvertInstruction, Type, getDstType(void), getDstType()) DECL_MEM_FN(MemInstruction, AddressSpace, getAddressSpace(void), getAddressSpace()) DECL_MEM_FN(MemInstruction, AddressMode, getAddressMode(void), getAddressMode()) DECL_MEM_FN(MemInstruction, Register, getAddressRegister(void), getAddressRegister()) DECL_MEM_FN(MemInstruction, Register, getBtiReg(void), getBtiReg()) DECL_MEM_FN(MemInstruction, unsigned, getSurfaceIndex(void), getSurfaceIndex()) DECL_MEM_FN(MemInstruction, Type, getValueType(void), getValueType()) DECL_MEM_FN(MemInstruction, bool, isAligned(void), isAligned()) DECL_MEM_FN(MemInstruction, unsigned, getAddressIndex(void), getAddressIndex()) DECL_MEM_FN(AtomicInstruction, AtomicOps, getAtomicOpcode(void), getAtomicOpcode()) DECL_MEM_FN(StoreInstruction, uint32_t, getValueNum(void), getValueNum()) DECL_MEM_FN(StoreInstruction, bool, isBlock(void), isBlock()) DECL_MEM_FN(LoadInstruction, uint32_t, getValueNum(void), getValueNum()) DECL_MEM_FN(LoadInstruction, bool, isBlock(void), isBlock()) DECL_MEM_FN(LoadImmInstruction, Type, getType(void), getType()) DECL_MEM_FN(LabelInstruction, LabelIndex, getLabelIndex(void), getLabelIndex()) DECL_MEM_FN(BranchInstruction, bool, isPredicated(void), isPredicated()) DECL_MEM_FN(BranchInstruction, bool, getInversePredicated(void), getInversePredicated()) DECL_MEM_FN(BranchInstruction, LabelIndex, getLabelIndex(void), getLabelIndex()) DECL_MEM_FN(SyncInstruction, uint32_t, getParameters(void), getParameters()) DECL_MEM_FN(ReadARFInstruction, Type, getType(void), getType()) DECL_MEM_FN(ReadARFInstruction, ARFRegister, 
getARFRegister(void), getARFRegister()) DECL_MEM_FN(SimdShuffleInstruction, Type, getType(void), getType()) DECL_MEM_FN(RegionInstruction, uint32_t, getOffset(void), getOffset()) DECL_MEM_FN(IndirectMovInstruction, uint32_t, getOffset(void), getOffset()) DECL_MEM_FN(IndirectMovInstruction, Type, getType(void), getType()) DECL_MEM_FN(SampleInstruction, Type, getSrcType(void), getSrcType()) DECL_MEM_FN(SampleInstruction, Type, getDstType(void), getDstType()) DECL_MEM_FN(SampleInstruction, uint8_t, getSamplerIndex(void), getSamplerIndex()) DECL_MEM_FN(SampleInstruction, uint8_t, getSamplerOffset(void), getSamplerOffset()) DECL_MEM_FN(SampleInstruction, uint8_t, getImageIndex(void), getImageIndex()) DECL_MEM_FN(VmeInstruction, Type, getSrcType(void), getSrcType()) DECL_MEM_FN(VmeInstruction, Type, getDstType(void), getDstType()) DECL_MEM_FN(VmeInstruction, uint8_t, getImageIndex(void), getImageIndex()) DECL_MEM_FN(VmeInstruction, uint8_t, getMsgType(void), getMsgType()) DECL_MEM_FN(TypedWriteInstruction, Type, getSrcType(void), getSrcType()) DECL_MEM_FN(TypedWriteInstruction, Type, getCoordType(void), getCoordType()) DECL_MEM_FN(TypedWriteInstruction, uint8_t, getImageIndex(void), getImageIndex()) DECL_MEM_FN(GetImageInfoInstruction, uint32_t, getInfoType(void), getInfoType()) DECL_MEM_FN(GetImageInfoInstruction, uint8_t, getImageIndex(void), getImageIndex()) DECL_MEM_FN(CalcTimestampInstruction, uint32_t, getPointNum(void), getPointNum()) DECL_MEM_FN(CalcTimestampInstruction, uint32_t, getTimestamptType(void), getTimestamptType()) DECL_MEM_FN(StoreProfilingInstruction, uint32_t, getProfilingType(void), getProfilingType()) DECL_MEM_FN(StoreProfilingInstruction, uint32_t, getBTI(void), getBTI()) DECL_MEM_FN(WorkGroupInstruction, Type, getType(void), getType()) DECL_MEM_FN(WorkGroupInstruction, WorkGroupOps, getWorkGroupOpcode(void), getWorkGroupOpcode()) DECL_MEM_FN(WorkGroupInstruction, uint32_t, getSlmAddr(void), getSlmAddr()) DECL_MEM_FN(SubGroupInstruction, Type, getType(void), getType()) DECL_MEM_FN(SubGroupInstruction, WorkGroupOps, getWorkGroupOpcode(void), getWorkGroupOpcode()) DECL_MEM_FN(PrintfInstruction, uint32_t, getNum(void), getNum()) DECL_MEM_FN(PrintfInstruction, uint32_t, getBti(void), getBti()) DECL_MEM_FN(PrintfInstruction, Type, getType(const Function& fn, uint32_t ID), getType(fn, ID)) DECL_MEM_FN(MediaBlockReadInstruction, uint8_t, getImageIndex(void), getImageIndex()) DECL_MEM_FN(MediaBlockReadInstruction, uint8_t, getVectorSize(void), getVectorSize()) DECL_MEM_FN(MediaBlockReadInstruction, Type, getType(void), getType()) DECL_MEM_FN(MediaBlockWriteInstruction, uint8_t, getImageIndex(void), getImageIndex()) DECL_MEM_FN(MediaBlockWriteInstruction, uint8_t, getVectorSize(void), getVectorSize()) DECL_MEM_FN(MediaBlockWriteInstruction, Type, getType(void), getType()) #undef DECL_MEM_FN #define DECL_MEM_FN(CLASS, RET, PROTOTYPE, CALL) \ RET CLASS::PROTOTYPE { \ return reinterpret_cast(this)->CALL; \ } DECL_MEM_FN(MemInstruction, void, setSurfaceIndex(unsigned id), setSurfaceIndex(id)) DECL_MEM_FN(MemInstruction, void, setBtiReg(Register reg), setBtiReg(reg)) #undef DECL_MEM_FN Immediate LoadImmInstruction::getImmediate(void) const { const Function &fn = this->getFunction(); return reinterpret_cast(this)->getImmediate(fn); } /////////////////////////////////////////////////////////////////////////// // Implements the emission functions /////////////////////////////////////////////////////////////////////////// // For all nullary functions with given opcode Instruction 
ALU0(Opcode opcode, Type type, Register dst) { return internal::NullaryInstruction(opcode, type, dst).convert(); } // All nullary functions #define DECL_EMIT_FUNCTION(NAME) \ Instruction NAME(Type type, Register dst) { \ return ALU0(OP_##NAME, type, dst);\ } DECL_EMIT_FUNCTION(SIMD_SIZE) DECL_EMIT_FUNCTION(SIMD_ID) #undef DECL_EMIT_FUNCTION // For all unary functions with given opcode Instruction ALU1(Opcode opcode, Type type, Register dst, Register src) { return internal::UnaryInstruction(opcode, type, dst, src).convert(); } // All unary functions #define DECL_EMIT_FUNCTION(NAME) \ Instruction NAME(Type type, Register dst, Register src) { \ return ALU1(OP_##NAME, type, dst, src);\ } DECL_EMIT_FUNCTION(MOV) DECL_EMIT_FUNCTION(FBH) DECL_EMIT_FUNCTION(FBL) DECL_EMIT_FUNCTION(CBIT) DECL_EMIT_FUNCTION(LZD) DECL_EMIT_FUNCTION(COS) DECL_EMIT_FUNCTION(SIN) DECL_EMIT_FUNCTION(LOG) DECL_EMIT_FUNCTION(SQR) DECL_EMIT_FUNCTION(RSQ) DECL_EMIT_FUNCTION(RNDD) DECL_EMIT_FUNCTION(RNDE) DECL_EMIT_FUNCTION(RNDU) DECL_EMIT_FUNCTION(RNDZ) DECL_EMIT_FUNCTION(BFREV) #undef DECL_EMIT_FUNCTION // All binary functions #define DECL_EMIT_FUNCTION(NAME) \ Instruction NAME(Type type, Register dst, Register src0, Register src1) { \ return internal::BinaryInstruction(OP_##NAME, type, dst, src0, src1).convert(); \ } DECL_EMIT_FUNCTION(POW) DECL_EMIT_FUNCTION(MUL) DECL_EMIT_FUNCTION(ADD) DECL_EMIT_FUNCTION(ADDSAT) DECL_EMIT_FUNCTION(SUB) DECL_EMIT_FUNCTION(SUBSAT) DECL_EMIT_FUNCTION(MUL_HI) DECL_EMIT_FUNCTION(I64_MUL_HI) DECL_EMIT_FUNCTION(UPSAMPLE_SHORT) DECL_EMIT_FUNCTION(UPSAMPLE_INT) DECL_EMIT_FUNCTION(UPSAMPLE_LONG) DECL_EMIT_FUNCTION(DIV) DECL_EMIT_FUNCTION(REM) DECL_EMIT_FUNCTION(SHL) DECL_EMIT_FUNCTION(SHR) DECL_EMIT_FUNCTION(ASR) DECL_EMIT_FUNCTION(BSF) DECL_EMIT_FUNCTION(BSB) DECL_EMIT_FUNCTION(OR) DECL_EMIT_FUNCTION(XOR) DECL_EMIT_FUNCTION(AND) DECL_EMIT_FUNCTION(HADD) DECL_EMIT_FUNCTION(RHADD) DECL_EMIT_FUNCTION(I64HADD) DECL_EMIT_FUNCTION(I64RHADD) #undef DECL_EMIT_FUNCTION // SEL Instruction SEL(Type type, Register dst, Tuple src) { return internal::SelectInstruction(type, dst, src).convert(); } Instruction I64MADSAT(Type type, Register dst, Tuple src) { return internal::TernaryInstruction(OP_I64MADSAT, type, dst, src).convert(); } Instruction MAD(Type type, Register dst, Tuple src) { return internal::TernaryInstruction(OP_MAD, type, dst, src).convert(); } Instruction LRP(Type type, Register dst, Tuple src) { return internal::TernaryInstruction(OP_LRP, type, dst, src).convert(); } // All compare functions #define DECL_EMIT_FUNCTION(NAME) \ Instruction NAME(Type type, Register dst, Register src0, Register src1) { \ const internal::CompareInstruction insn(OP_##NAME, type, dst, src0, src1); \ return insn.convert(); \ } DECL_EMIT_FUNCTION(EQ) DECL_EMIT_FUNCTION(NE) DECL_EMIT_FUNCTION(LE) DECL_EMIT_FUNCTION(LT) DECL_EMIT_FUNCTION(GE) DECL_EMIT_FUNCTION(GT) DECL_EMIT_FUNCTION(ORD) #undef DECL_EMIT_FUNCTION // BITCAST Instruction BITCAST(Type dstType, Type srcType, Tuple dst, Tuple src, uint8_t dstNum, uint8_t srcNum) { return internal::BitCastInstruction(dstType, srcType, dst, src, dstNum, srcNum).convert(); } // CVT Instruction CVT(Type dstType, Type srcType, Register dst, Register src) { return internal::ConvertInstruction(OP_CVT, dstType, srcType, dst, src).convert(); } // saturated convert Instruction SAT_CVT(Type dstType, Type srcType, Register dst, Register src) { return internal::ConvertInstruction(OP_SAT_CVT, dstType, srcType, dst, src).convert(); } // CVT Instruction F16TO32(Type dstType, Type srcType, 
Register dst, Register src) { return internal::ConvertInstruction(OP_F16TO32, dstType, srcType, dst, src).convert(); } // saturated convert Instruction F32TO16(Type dstType, Type srcType, Register dst, Register src) { return internal::ConvertInstruction(OP_F32TO16, dstType, srcType, dst, src).convert(); } // For all unary functions with given opcode Instruction ATOMIC(AtomicOps atomicOp, Type type, Register dst, AddressSpace space, Register address, Tuple payload, AddressMode AM, Register bti) { internal::AtomicInstruction insn = internal::AtomicInstruction(atomicOp, type, dst, space, address, payload, AM); insn.setBtiReg(bti); return insn.convert(); } Instruction ATOMIC(AtomicOps atomicOp, Type type, Register dst, AddressSpace space, Register address, Tuple payload, AddressMode AM, unsigned SurfaceIndex) { internal::AtomicInstruction insn = internal::AtomicInstruction(atomicOp, type, dst, space, address, payload, AM); insn.setSurfaceIndex(SurfaceIndex); return insn.convert(); } // BRA Instruction BRA(LabelIndex labelIndex) { return internal::BranchInstruction(OP_BRA, labelIndex).convert(); } Instruction BRA(LabelIndex labelIndex, Register pred) { return internal::BranchInstruction(OP_BRA, labelIndex, pred).convert(); } // IF Instruction IF(LabelIndex labelIndex, Register pred, bool inv_pred) { return internal::BranchInstruction(OP_IF, labelIndex, pred, inv_pred).convert(); } // ELSE Instruction ELSE(LabelIndex labelIndex) { return internal::BranchInstruction(OP_ELSE, labelIndex).convert(); } // ENDIF Instruction ENDIF(LabelIndex labelIndex) { return internal::BranchInstruction(OP_ENDIF, labelIndex).convert(); } // WHILE Instruction WHILE(LabelIndex labelIndex, Register pred) { return internal::BranchInstruction(OP_WHILE, labelIndex, pred).convert(); } // RET Instruction RET(void) { return internal::BranchInstruction(OP_RET).convert(); } // LOADI Instruction LOADI(Type type, Register dst, ImmediateIndex value) { return internal::LoadImmInstruction(type, dst, value).convert(); } // LOAD and STORE #define DECL_EMIT_FUNCTION(NAME, CLASS) \ Instruction NAME(Type type, \ Tuple tuple, \ Register offset, \ AddressSpace space, \ uint32_t valueNum, \ bool dwAligned, \ AddressMode AM, \ unsigned SurfaceIndex, \ bool isBlock) \ { \ internal::CLASS insn = internal::CLASS(type,tuple,offset,space,valueNum,dwAligned,AM, isBlock); \ insn.setSurfaceIndex(SurfaceIndex);\ return insn.convert(); \ } \ Instruction NAME(Type type, \ Tuple tuple, \ Register offset, \ AddressSpace space, \ uint32_t valueNum, \ bool dwAligned, \ AddressMode AM, \ Register bti) \ { \ internal::CLASS insn = internal::CLASS(type,tuple,offset,space,valueNum,dwAligned,AM); \ insn.setBtiReg(bti); \ return insn.convert(); \ } DECL_EMIT_FUNCTION(LOAD, LoadInstruction) DECL_EMIT_FUNCTION(STORE, StoreInstruction) #undef DECL_EMIT_FUNCTION // FENCE Instruction SYNC(uint32_t parameters) { return internal::SyncInstruction(parameters).convert(); } Instruction READ_ARF(Type type, Register dst, ARFRegister arf) { return internal::ReadARFInstruction(type, dst, arf).convert(); } Instruction REGION(Register dst, Register src, uint32_t offset) { return internal::RegionInstruction(dst, src, offset).convert(); } Instruction SIMD_SHUFFLE(Type type, Register dst, Register src0, Register src1) { return internal::SimdShuffleInstruction(type, dst, src0, src1).convert(); } Instruction INDIRECT_MOV(Type type, Register dst, Register src0, Register src1, uint32_t offset) { return internal::IndirectMovInstruction(type, dst, src0, src1, offset).convert(); } // 
LABEL Instruction LABEL(LabelIndex labelIndex) { return internal::LabelInstruction(labelIndex).convert(); } // SAMPLE Instruction SAMPLE(uint8_t imageIndex, Tuple dst, Tuple src, uint8_t srcNum, bool dstIsFloat, bool srcIsFloat, uint8_t sampler, uint8_t samplerOffset) { return internal::SampleInstruction(imageIndex, dst, src, srcNum, dstIsFloat, srcIsFloat, sampler, samplerOffset).convert(); } Instruction VME(uint8_t imageIndex, Tuple dst, Tuple src, uint32_t dstNum, uint32_t srcNum, int msg_type, int vme_search_path_lut, int lut_sub) { return internal::VmeInstruction(imageIndex, dst, src, dstNum, srcNum, msg_type, vme_search_path_lut, lut_sub).convert(); } Instruction TYPED_WRITE(uint8_t imageIndex, Tuple src, uint8_t srcNum, Type srcType, Type coordType) { return internal::TypedWriteInstruction(imageIndex, src, srcNum, srcType, coordType).convert(); } Instruction GET_IMAGE_INFO(int infoType, Register dst, uint8_t imageIndex, Register infoReg) { return internal::GetImageInfoInstruction(infoType, dst, imageIndex, infoReg).convert(); } Instruction CALC_TIMESTAMP(uint32_t pointNum, uint32_t tsType) { return internal::CalcTimestampInstruction(pointNum, tsType).convert(); } Instruction STORE_PROFILING(uint32_t bti, uint32_t profilingType) { return internal::StoreProfilingInstruction(bti, profilingType).convert(); } // WAIT Instruction WAIT(void) { return internal::WaitInstruction().convert(); } Instruction WORKGROUP(WorkGroupOps opcode, uint32_t slmAddr, Register dst, Tuple srcTuple, uint8_t srcNum, Type type) { return internal::WorkGroupInstruction(opcode, slmAddr, dst, srcTuple, srcNum, type).convert(); } Instruction SUBGROUP(WorkGroupOps opcode, Register dst, Tuple srcTuple, uint8_t srcNum, Type type) { return internal::SubGroupInstruction(opcode, dst, srcTuple, srcNum, type).convert(); } Instruction PRINTF(Register dst, Tuple srcTuple, Tuple typeTuple, uint8_t srcNum, uint8_t bti, uint16_t num) { return internal::PrintfInstruction(dst, srcTuple, typeTuple, srcNum, bti, num).convert(); } Instruction MBREAD(uint8_t imageIndex, Tuple dst, uint8_t vec_size, Tuple coord, uint8_t srcNum, Type type) { return internal::MediaBlockReadInstruction(imageIndex, dst, vec_size, coord, srcNum, type).convert(); } Instruction MBWRITE(uint8_t imageIndex, Tuple srcTuple, uint8_t srcNum, uint8_t vec_size, Type type) { return internal::MediaBlockWriteInstruction(imageIndex, srcTuple, srcNum, vec_size, type).convert(); } std::ostream &operator<< (std::ostream &out, const Instruction &insn) { const Function &fn = insn.getFunction(); const BasicBlock *bb = insn.getParent(); switch (insn.getOpcode()) { #define DECL_INSN(OPCODE, CLASS) \ case OP_##OPCODE: \ if(OP_##OPCODE == OP_ELSE) \ { \ reinterpret_cast(insn).out(out, fn); \ out << " <**>label: " << bb->thisElseLabel; \ break; \ } \ reinterpret_cast(insn).out(out, fn); \ break; #include "instruction.hxx" #undef DECL_INSN case OP_INVALID: NOT_SUPPORTED; break; }; return out; } } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/lowering.hpp000664 001750 001750 00000006174 13161142102 020606 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. 
* * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file lowering.hpp * \author Benjamin Segovia * Lower instructions that are not supported properly. Typical example is * handling returns or unsupported vector scatters / gathers */ #ifndef __GBE_IR_LOWERING_HPP__ #define __GBE_IR_LOWERING_HPP__ namespace gbe { namespace ir { // Structure to update class Unit; /*! Remove all return instructions and replace them to forward branches that * point to the only return instruction in a dedicated basic block and the end * of the function. * Typically this code: * * dst[x] = 1; * if (x > 4) return; * dst[x] = 3; * * will be replaced by: * * dst[x] = 1; * if (x > 4) goto end; * dst[x] = 3; * end: * return; * * There will be only one return at the end of the function. This return will * be simply encoded as a End-of-thread instruction (EOT) */ void lowerReturn(Unit &unit, const std::string &functionName); /*! Function arguments are a bit tricky since we must implement the proper C * semantic: we can therefore address the function arguments as we want and * we can even modify them. This leads to interesting challenges. We identify * several cases: * * case 1: * int f (__global int *dst, int x[16], int y) { * dst[get_global_id(0)] = x[16] + y; * } * Here x and y will be pushed to registers using the Curbe. No problem, we * can directly used the pushed registers * * case 2: * int f (__global int *dst, int x[16], int y) { * dst[get_global_id(0)] = x[get_local_id(0)] + y; * } * Here x is indirectly accessed. We need to perform a gather from memory. We * can simply gather it from the curbe in memory * * case 3: * int f (__global int *dst, int x[16], int y) { * x[get_local_id(0)] = y + 1; * int *ptr = get_local_id(0) % 2 ? x[0] : x[1]; * dst[get_global_id(0)] = *ptr; * } * Here we modify the function argument since it is valid C. Problem is that * we are running in SIMD mode while the data are scalar (in both memory and * registers). In that case, we just spill everything to memory (using the * stack) and reload it from here when needed. */ void lowerFunctionArguments(Unit &unit, const std::string &functionName); } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_LOWERING_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/constant.cpp000664 001750 001750 00000007713 13161142102 020604 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
 *
 * Author: Benjamin Segovia
 */

/**
 * \file constant.cpp
 *
 * \author Benjamin Segovia
 */
#include "constant.hpp"

namespace gbe {
namespace ir {

  void ConstantSet::append(const std::string &name,
                           uint32_t size,
                           uint32_t alignment)
  {
    const uint32_t offset = ALIGN(this->data.size(), alignment);
    const uint32_t padding = offset - this->data.size();
    const Constant constant(name, size, alignment, offset);
    constants.push_back(constant);
    this->data.resize(padding + size + this->data.size());
  }

#define OUT_UPDATE_SZ(elt) SERIALIZE_OUT(elt, outs, ret_size)
#define IN_UPDATE_SZ(elt) DESERIALIZE_IN(elt, ins, total_size)

  uint32_t ConstantSet::serializeToBin(std::ostream& outs) {
    uint32_t ret_size = 0;

    OUT_UPDATE_SZ(magic_begin);

    /* output the const data. */
    uint32_t sz = data.size()*sizeof(char);
    OUT_UPDATE_SZ(sz);
    if (data.size() > 0) {
      outs.write(data.data(), sz);
      ret_size += sz;
    }

    sz = constants.size();
    OUT_UPDATE_SZ(sz);
    for (uint32_t i = 0; i < constants.size(); ++i) {
      Constant& cnst = constants[i];
      sz = cnst.getName().size()*sizeof(char);
      uint32_t bytes = sizeof(sz)                    //name length self
                       + sz                          //name
                       + sizeof(cnst.getSize())      //size
                       + sizeof(cnst.getAlignment()) //alignment
                       + sizeof(cnst.getOffset());   //offset
      OUT_UPDATE_SZ(bytes);
      OUT_UPDATE_SZ(sz);
      outs.write(cnst.getName().c_str(), sz);
      ret_size += sz;
      OUT_UPDATE_SZ(cnst.getSize());
      OUT_UPDATE_SZ(cnst.getAlignment());
      OUT_UPDATE_SZ(cnst.getOffset());
    }

    OUT_UPDATE_SZ(magic_end);
    OUT_UPDATE_SZ(ret_size);
    return ret_size;
  }

  uint32_t ConstantSet::deserializeFromBin(std::istream& ins) {
    uint32_t total_size = 0;
    uint32_t global_data_sz = 0;
    uint32_t const_num;
    uint32_t magic;

    IN_UPDATE_SZ(magic);
    if (magic != magic_begin)
      return 0;

    IN_UPDATE_SZ(global_data_sz);
    for (uint32_t i = 0; i < global_data_sz; i++) {
      char elt;
      IN_UPDATE_SZ(elt);
      data.push_back(elt);
    }

    IN_UPDATE_SZ(const_num);
    for (uint32_t i = 0; i < const_num; i++) {
      uint32_t bytes;
      IN_UPDATE_SZ(bytes);

      uint32_t name_len;
      IN_UPDATE_SZ(name_len);
      char* c_name = new char[name_len+1];
      ins.read(c_name, name_len);
      total_size += sizeof(char)*name_len;
      c_name[name_len] = 0;

      uint32_t size, align, offset;
      IN_UPDATE_SZ(size);
      IN_UPDATE_SZ(align);
      IN_UPDATE_SZ(offset);

      ir::Constant constant(c_name, size, align, offset);
      constants.push_back(constant);
      delete[] c_name;

      /* Sanity check */
      if (bytes != sizeof(name_len) + sizeof(char)*name_len +
                   sizeof(size) + sizeof(align) + sizeof(offset))
        return 0;
    }

    IN_UPDATE_SZ(magic);
    if (magic != magic_end)
      return 0;

    uint32_t total_bytes;
    IN_UPDATE_SZ(total_bytes);
    if (total_bytes + sizeof(total_size) != total_size)
      return 0;

    return total_size;
  }

} /* namespace ir */
} /* namespace gbe */
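/* For reference, the byte stream produced by ConstantSet::serializeToBin
 * above (and consumed by deserializeFromBin) is laid out as:
 *
 *   [magic_begin]
 *   [data size][raw constant data]
 *   [constant count]
 *   then per constant:
 *     [entry byte count][name length][name][size][alignment][offset]
 *   [magic_end][total size]
 *
 * The per-entry byte count and the trailing total size are redundant on
 * purpose: they are what the sanity checks in deserializeFromBin use to
 * reject truncated or corrupted streams.
 */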
Beignet-1.3.2-Source/backend/src/ir/value.cpp000664 001750 001750 00000072376 13161142102 020076 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia
 */

/**
 * \file value.cpp
 * \author Benjamin Segovia
 */
#include "ir/value.hpp"
#include "ir/liveness.hpp"

namespace gbe {
namespace ir {

  /*! To build the chains (i.e. basically the graph of values), we are going to
   *  iterate on liveout definitions: for each block and for each variable
   *  (ir::Register) alive at the end of the block (in Block::LiveOut), we are
   *  computing the set of all possible value definitions. Using these value
   *  definitions, we will finally transfer these sets to the successors to get
   *  the ud / du chains
   *
   *  LiveOutSet contains the set of definitions for each basic block
   */
  class LiveOutSet
  {
  public:
    LiveOutSet(Liveness &liveness, const FunctionDAG &dag);
    ~LiveOutSet(void);
    /*! One set per register */
    typedef set<ValueDef*> RegDefSet;
    /*! We have one map of liveout register per block */
    typedef map<Register, RegDefSet*> BlockDefMap;
    /*! All the block definitions map in the functions */
    typedef map<const BasicBlock*, BlockDefMap*> FunctionDefMap;
    /*! Performs the double look-up to get the set of defs per register */
    RegDefSet &getDefSet(const BasicBlock *bb, Register reg);
    /*! Build a UD-chain as the union of the predecessor chains */
    void makeDefSet(DefSet &udChain, const BasicBlock &bb, Register reg);
    /*! Fast per register definition set allocation */
    DECL_POOL(RegDefSet, regDefSetPool);
    /*! Fast register sets allocation */
    DECL_POOL(BlockDefMap, blockDefMapPool);
    FunctionDefMap defMap;    //!< All per-block data
    Liveness &liveness;       //!< Contains LiveOut information
    const FunctionDAG &dag;   //!< Structure we are building
  private:
    /*! Initialize liveOut with the instruction destination values */
    void initializeInstructionDef(void);
    /*! Initialize liveOut with the function argument, special and pushed
     *  registers */
    void initializeOtherDef(void);
    /*! Iterate to completely transfer the liveness and get the def sets */
    void iterateLiveOut(void);
    /*! Use custom allocators */
    GBE_CLASS(LiveOutSet);
  };

  /*! Debug print of the liveout set */
  std::ostream &operator<< (std::ostream &out, LiveOutSet &set);

  LiveOutSet::LiveOutSet(Liveness &liveness, const FunctionDAG &dag) :
    liveness(liveness), dag(dag)
  {
    this->initializeInstructionDef();
    this->initializeOtherDef();
    this->iterateLiveOut();
  }

  LiveOutSet::RegDefSet &LiveOutSet::getDefSet(const BasicBlock *bb, Register reg)
  {
    auto bbIt = defMap.find(bb);
    GBE_ASSERT(bbIt != defMap.end());
    auto defIt = bbIt->second->find(reg);
    GBE_ASSERT(defIt != bbIt->second->end() && defIt->second != NULL);
    return *defIt->second;
  }
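  /* A small example of what these sets mean. Assume this hypothetical CFG
   * where register %r is written in both predecessors of B3:
   *
   *   B1: MOV %r ...  --->  B3: ... uses %r
   *   B2: ADD %r ...  --->  B3
   *
   * The UD-chain of the use of %r in B3 is the union of the definitions
   * flowing out of B1 and B2, i.e. {the MOV in B1, the ADD in B2}:
   * getDefSet above returns each per-block definition set and makeDefSet
   * below builds exactly that union over the predecessors.
   */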
void LiveOutSet::initializeInstructionDef(void) { const Function &fn = liveness.getFunction(); // Iterate over each block and initialize the liveOut data fn.foreachBlock([&](const BasicBlock &bb) { GBE_ASSERT(defMap.find(&bb) == defMap.end()); // Allocate a map of register definitions auto blockDefMap = this->newBlockDefMap(); defMap.insert(std::make_pair(&bb, blockDefMap)); // We only consider liveout registers const auto &info = this->liveness.getBlockInfo(&bb); const auto &liveOut = info.liveOut; for (auto reg : liveOut) { GBE_ASSERT(blockDefMap->find(reg) == blockDefMap->end()); auto regDefSet = this->newRegDefSet(); blockDefMap->insert(std::make_pair(reg, regDefSet)); } // Now traverse the block backwards and find the definition of each // liveOut register set<Register> defined; for (auto it = --bb.end(); it != bb.end(); --it) { const Instruction &insn = *it; const uint32_t dstNum = insn.getDstNum(); for (uint32_t dstID = 0; dstID < dstNum; ++dstID) { const Register reg = insn.getDst(dstID); // We only take the most recent definition if (defined.contains(reg) == true) continue; // Not in LiveOut, so does not matter if (info.inLiveOut(reg) == false) continue; defined.insert(reg); // Insert the outgoing definition for this register auto regDefSet = blockDefMap->find(reg); ValueDef *def = const_cast<ValueDef*>(this->dag.getDefAddress(&insn, dstID)); GBE_ASSERT(regDefSet != blockDefMap->end() && def != NULL); regDefSet->second->insert(def); } } }); } void LiveOutSet::initializeOtherDef(void) { const Function &fn = liveness.getFunction(); const uint32_t argNum = fn.argNum(); // The first block must also transfer the function arguments const BasicBlock &top = fn.getTopBlock(); const Liveness::BlockInfo &info = this->liveness.getBlockInfo(&top); GBE_ASSERT(defMap.contains(&top) == true); auto blockDefMap = defMap.find(&top)->second; // Insert all the values that are not overwritten in the block and alive at // the end of it for (uint32_t argID = 0; argID < argNum; ++argID) { const FunctionArgument &arg = fn.getArg(argID); const Register reg = arg.reg; // Do not transfer dead values if (info.inLiveOut(reg) == false) continue; // If we overwrite it, do not transfer the initial value if (info.inVarKill(reg) == true) continue; ValueDef *def = const_cast<ValueDef*>(this->dag.getDefAddress(&arg)); auto it = blockDefMap->find(reg); GBE_ASSERT(it != blockDefMap->end()); it->second->insert(def); } // Now transfer the special registers that are not over-written const uint32_t firstID = fn.getFirstSpecialReg(); const uint32_t specialNum = fn.getSpecialRegNum(); for (uint32_t regID = firstID; regID < firstID + specialNum; ++regID) { const Register reg(regID); // Do not transfer dead values if (info.inLiveOut(reg) == false) continue; // If we overwrite it, do not transfer the initial value if (info.inVarKill(reg) == true) continue; ValueDef *def = const_cast<ValueDef*>(this->dag.getDefAddress(reg)); auto it = blockDefMap->find(reg); GBE_ASSERT(it != blockDefMap->end()); it->second->insert(def); } // Finally do the same thing with pushed registers const Function::PushMap &pushMap = fn.getPushMap(); for (const auto &pushed : pushMap) { const Register reg = pushed.first; // Do not transfer dead values if (info.inLiveOut(reg) == false) continue; // If we overwrite it, do not transfer the initial value if (info.inVarKill(reg) == true) continue; ValueDef *def = const_cast<ValueDef*>(this->dag.getDefAddress(&pushed.second)); auto it = blockDefMap->find(reg); GBE_ASSERT(it != blockDefMap->end()); it->second->insert(def); } }
void LiveOutSet::iterateLiveOut(void) { bool changed = true; while (changed) { changed = false; // Compute the union of the current liveout definitions with the previous // ones. Do not take into account the killed values though liveness.foreach([&](Liveness::BlockInfo &curr, const Liveness::BlockInfo &pred) { const BasicBlock &bb = curr.bb; const BasicBlock &pbb = pred.bb; for (auto reg : curr.liveOut) { if (pred.inLiveOut(reg) == false) continue; if (curr.inVarKill(reg) == true) continue; RegDefSet &currSet = this->getDefSet(&bb, reg); RegDefSet &predSet = this->getDefSet(&pbb, reg); // Transfer the values for (auto def : predSet) { if (currSet.contains(def)) continue; changed = true; currSet.insert(def); } } }); } } LiveOutSet::~LiveOutSet(void) { for (const auto pair : defMap) { BlockDefMap *block = pair.second; for (auto regSet : *block) this->deleteRegDefSet(regSet.second); this->deleteBlockDefMap(block); } } std::ostream &operator<< (std::ostream &out, LiveOutSet &set) { for (const auto &pair : set.defMap) { // To recognize the block, just print its instructions out << "Block:" << std::endl; for (const auto &insn : *pair.first) out << insn << std::endl; // Iterate over all alive registers to get their definitions const LiveOutSet::BlockDefMap *defMap = pair.second; if (defMap->size() > 0) out << "LiveSet:" << std::endl; for (const auto &pair : *defMap) { const Register reg = pair.first; const LiveOutSet::RegDefSet *set = pair.second; for (auto def : *set) { const ValueDef::Type type = def->getType(); if (type == ValueDef::DEF_FN_ARG) out << "%" << reg << ": " << "function input" << std::endl; else if (type == ValueDef::DEF_FN_PUSHED) out << "%" << reg << ": " << "pushed register" << std::endl; else if (type == ValueDef::DEF_SPECIAL_REG) out << "%" << reg << ": " << "special register" << std::endl; else { const Instruction *insn = def->getInstruction(); out << "%" << reg << ": " << insn << " " << *insn << std::endl; } } } out << std::endl; } return out; }
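// Editor's note (sketch): the constructor below registers one ValueDef per
// definition point, in the four flavors mirrored by the getDefAddress()
// overloads further down:
//   ValueDef(&insn, dstID)  - an instruction destination
//   ValueDef(&arg)          - a function argument
//   ValueDef(reg)           - a special register
//   ValueDef(&pushed)       - a pushed register location
// Every definition starts out mapped to the shared empty du-chain (duEmpty)
// until a real use is attached.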
FunctionDAG::FunctionDAG(Liveness &liveness) : fn(liveness.getFunction()) { // We first start with empty chains udEmpty = this->newDefSet(); duEmpty = this->newUseSet(); // First create the chains and insert them in their respective maps fn.foreachInstruction([this](const Instruction &insn) { // sources == value uses const uint32_t srcNum = insn.getSrcNum(); for (uint32_t srcID = 0; srcID < srcNum; ++srcID) { ValueUse *valueUse = this->newValueUse(&insn, srcID); useName.insert(std::make_pair(*valueUse, valueUse)); udGraph.insert(std::make_pair(*valueUse, udEmpty)); } // destinations == value defs const uint32_t dstNum = insn.getDstNum(); for (uint32_t dstID = 0; dstID < dstNum; ++dstID) { ValueDef *valueDef = this->newValueDef(&insn, dstID); defName.insert(std::make_pair(*valueDef, valueDef)); duGraph.insert(std::make_pair(*valueDef, duEmpty)); } }); // Function arguments are also value definitions const uint32_t argNum = fn.argNum(); for (uint32_t argID = 0; argID < argNum; ++argID) { const FunctionArgument &arg = fn.getArg(argID); ValueDef *valueDef = this->newValueDef(&arg); defName.insert(std::make_pair(*valueDef, valueDef)); duGraph.insert(std::make_pair(*valueDef, duEmpty)); } // Special registers are also definitions const uint32_t firstID = fn.getFirstSpecialReg(); const uint32_t specialNum = fn.getSpecialRegNum(); for (uint32_t regID = firstID; regID < firstID + specialNum; ++regID) { const Register reg(regID); ValueDef *valueDef = this->newValueDef(reg); defName.insert(std::make_pair(*valueDef, valueDef)); duGraph.insert(std::make_pair(*valueDef, duEmpty)); } // Pushed registers are also definitions const Function::PushMap &pushMap = fn.getPushMap(); for (const auto &pushed : pushMap) { ValueDef *valueDef = this->newValueDef(&pushed.second); defName.insert(std::make_pair(*valueDef, valueDef)); duGraph.insert(std::make_pair(*valueDef, duEmpty)); } // We create the liveOutSet to help us transfer the definitions LiveOutSet liveOutSet(liveness, *this); // Build UD chains traversing the blocks top to bottom fn.foreachBlock([&](const BasicBlock &bb) { // Track the allocated chains to be able to reuse them map<Register, DefSet*> allocated; // Some chains may not be used (i.e. they are dead). We track them to be // able to deallocate them later set<DefSet*> unused; // For each instruction build the UD chains const_cast<BasicBlock&>(bb).foreach([&](const Instruction &insn) { // Instruction sources consume definitions const uint32_t srcNum = insn.getSrcNum(); for (uint32_t srcID = 0; srcID < srcNum; ++srcID) { const Register src = insn.getSrc(srcID); const ValueUse use(&insn, srcID); auto ud = udGraph.find(use); GBE_ASSERT(ud != udGraph.end()); // We already allocated the ud chain for this register auto it = allocated.find(src); if (it != allocated.end()) { udGraph.erase(ud); udGraph.insert(std::make_pair(use, it->second)); if (unused.contains(it->second)) unused.erase(it->second); } // Create a new one from the predecessor chains (upward used value) else { DefSet *udChain = this->newDefSet(); liveOutSet.makeDefSet(*udChain, bb, src); allocated.insert(std::make_pair(src, udChain)); ud->second = udChain; } } // Instruction destinations create new chains const uint32_t dstNum = insn.getDstNum(); for (uint32_t dstID = 0; dstID < dstNum; ++dstID) { const Register dst = insn.getDst(dstID); ValueDef *def = const_cast<ValueDef*>(this->getDefAddress(&insn, dstID)); DefSet *udChain = this->newDefSet(); udChain->insert(def); unused.insert(udChain); // Remove the previous definition if any if (allocated.contains(dst) == true) allocated.erase(dst); allocated.insert(std::make_pair(dst, udChain)); } }); // Deallocate unused chains for (auto set : unused) this->deleteDefSet(set); }); // Build the DU chains from the UD ones fn.foreachInstruction([&](const Instruction &insn) { // For each value definition of each source, we push back this use const uint32_t srcNum = insn.getSrcNum(); for (uint32_t srcID = 0; srcID < srcNum; ++srcID) { ValueUse *use = const_cast<ValueUse*>(getUseAddress(&insn, srcID)); // Find all definitions for this source const auto &defs = this->getDef(&insn, srcID); for (auto def : defs) { auto uses = duGraph.find(*def); GBE_ASSERT(uses != duGraph.end()); UseSet *du = uses->second; if (du == duEmpty) { duGraph.erase(*def); du = this->newUseSet(); duGraph.insert(std::make_pair(*def, du)); } du->insert(use); } } }); // Allocate the set of uses and defs per register const uint32_t regNum = fn.regNum(); for (uint32_t regID = 0; regID < regNum; ++regID) { const Register reg(regID); UseSet *useSet = GBE_NEW_NO_ARG(UseSet); DefSet *defSet = GBE_NEW_NO_ARG(DefSet); regUse.insert(std::make_pair(reg, useSet)); regDef.insert(std::make_pair(reg, defSet)); } // Fill use sets (one per register) for (auto &useSet : duGraph) { for (auto use : *useSet.second) { const Register reg = use->getRegister(); auto it = regUse.find(reg); GBE_ASSERT(it != regUse.end() && it->second != NULL); it->second->insert(use); } } // Fill def sets (one per register) for (auto &defSet : udGraph) { for (auto def : *defSet.second) { const Register reg = def->getRegister(); auto it = regDef.find(reg); GBE_ASSERT(it != regDef.end() && it->second != NULL); it->second->insert(def); } } }
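// Editor's sketch (hypothetical IR, not produced by this file): the chains
// the constructor above builds for a two-instruction block
//   i0: LOADI %1 0
//   i1: ADD %2 %1 %1
// ud chains: use(i1, src0) -> { def(i0, %1) } and use(i1, src1) -> { def(i0, %1) }
// du chain:  def(i0, %1)   -> { use(i1, src0), use(i1, src1) }
// so regDef[%1] holds the single definition and regUse[%1] both uses.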
/*! Helper to deallocate objects */ #define PTR_RELEASE(TYPE, VAR) \ do { \ if (VAR && destroyed.contains(VAR) == false) { \ destroyed.insert(VAR); \ delete##TYPE(VAR); \ } \ } while (0) FunctionDAG::~FunctionDAG(void) { // We track the already destroyed pointers set<void*> destroyed; // Release the empty ud-chains and du-chains PTR_RELEASE(DefSet, udEmpty); PTR_RELEASE(UseSet, duEmpty); // We free all the ud-chains for (const auto &pair : udGraph) { auto defs = pair.second; if (destroyed.contains(defs)) continue; for (auto def : *defs) PTR_RELEASE(ValueDef, def); PTR_RELEASE(DefSet, defs); } // We free all the du-chains for (const auto &pair : duGraph) { auto uses = pair.second; if (destroyed.contains(uses)) continue; for (auto use : *uses) PTR_RELEASE(ValueUse, use); PTR_RELEASE(UseSet, uses); } // Release all the use and definition sets per register for (const auto &pair : regUse) GBE_SAFE_DELETE(pair.second); for (const auto &pair : regDef) GBE_SAFE_DELETE(pair.second); } #undef PTR_RELEASE const UseSet &FunctionDAG::getUse(const ValueDef &def) const { auto it = duGraph.find(def); GBE_ASSERT(it != duGraph.end()); return *it->second; } const UseSet &FunctionDAG::getUse(const Instruction *insn, uint32_t dstID) const { return this->getUse(ValueDef(insn, dstID)); } const UseSet &FunctionDAG::getUse(const FunctionArgument *arg) const { return this->getUse(ValueDef(arg)); } const UseSet &FunctionDAG::getUse(const Register &reg) const { return this->getUse(ValueDef(reg)); } const DefSet &FunctionDAG::getDef(const ValueUse &use) const { auto it = udGraph.find(use); GBE_ASSERT(it != udGraph.end()); return *it->second; } const DefSet &FunctionDAG::getDef(const Instruction *insn, uint32_t srcID) const { return this->getDef(ValueUse(insn, srcID)); } const UseSet *FunctionDAG::getRegUse(const Register &reg) const { auto it = regUse.find(reg); GBE_ASSERT(it != regUse.end()); return it->second; } const DefSet *FunctionDAG::getRegDef(const Register &reg) const { auto it = regDef.find(reg); GBE_ASSERT(it != regDef.end()); return it->second; } const ValueDef *FunctionDAG::getDefAddress(const ValueDef &def) const { auto it = defName.find(def); GBE_ASSERT(it != defName.end() && it->second != NULL); return it->second; } const ValueDef *FunctionDAG::getDefAddress(const PushLocation *pushed) const { return this->getDefAddress(ValueDef(pushed)); } const ValueDef *FunctionDAG::getDefAddress(const Instruction *insn, uint32_t dstID) const { return this->getDefAddress(ValueDef(insn, dstID)); } const ValueDef *FunctionDAG::getDefAddress(const FunctionArgument *arg) const { return this->getDefAddress(ValueDef(arg)); } const ValueDef *FunctionDAG::getDefAddress(const Register &reg) const { return this->getDefAddress(ValueDef(reg)); } const ValueUse *FunctionDAG::getUseAddress(const Instruction *insn, uint32_t srcID) const { const ValueUse use(insn, srcID); auto it = useName.find(use); GBE_ASSERT(it != useName.end() && it->second != NULL); return it->second; } void FunctionDAG::getRegUDBBs(Register r, set<const BasicBlock*> &BBs) const { auto dSet = getRegDef(r); for (auto &def : *dSet) BBs.insert(def->getInstruction()->getParent()); auto uSet = getRegUse(r); for (auto &use : *uSet) BBs.insert(use->getInstruction()->getParent()); } static void getLivenessBBs(const Liveness &liveness, Register r, const set<const BasicBlock*> &useDefSet, set<const BasicBlock*> &liveInSet, set<const BasicBlock*> &liveOutSet) { for (auto bb : useDefSet) { if (liveness.getLiveOut(bb).contains(r)) liveOutSet.insert(bb); if (liveness.getLiveIn(bb).contains(r)) liveInSet.insert(bb); } }
static void getBlockDefInsns(const BasicBlock *bb, const DefSet *dSet, Register r, set<const Instruction*> &defInsns) { for (auto def : *dSet) { auto defInsn = def->getInstruction(); if (defInsn->getParent() == bb) defInsns.insert(defInsn); } } static bool liveinInterfere(const BasicBlock *bb, const Instruction *defInsn, Register r1) { BasicBlock::const_iterator iter = BasicBlock::const_iterator(defInsn); BasicBlock::const_iterator iterE = bb->end(); if (defInsn->getOpcode() == OP_MOV && defInsn->getSrc(0) == r1) return false; while (iter != iterE) { const Instruction *insn = iter.node(); for (unsigned i = 0; i < insn->getDstNum(); i++) { Register dst = insn->getDst(i); if (dst == r1) return false; } for (unsigned i = 0; i < insn->getSrcNum(); i++) { ir::Register src = insn->getSrc(i); if (src == r1) return true; } ++iter; } return false; } // r0 and r1 are both in the LiveIn set. They interfere only if r0/r1 is used after r1/r0 has been modified. bool FunctionDAG::interfereLivein(const BasicBlock *bb, Register r0, Register r1) const { set<const Instruction*> defInsns0, defInsns1; auto defSet0 = getRegDef(r0); auto defSet1 = getRegDef(r1); getBlockDefInsns(bb, defSet0, r0, defInsns0); getBlockDefInsns(bb, defSet1, r1, defInsns1); if (defInsns0.size() == 0 && defInsns1.size() == 0) return false; for (auto insn : defInsns0) { if (liveinInterfere(bb, insn, r1)) return true; } for (auto insn : defInsns1) { if (liveinInterfere(bb, insn, r0)) return true; } return false; } // r0 and r1 are both in the LiveOut set. Only if the last definition of r0/r1 is a MOV r0, r1 or MOV r1, r0 // does it avoid introducing interference in this BB. bool FunctionDAG::interfereLiveout(const BasicBlock *bb, Register r0, Register r1) const { set<const Instruction*> defInsns0, defInsns1; auto defSet0 = getRegDef(r0); auto defSet1 = getRegDef(r1); getBlockDefInsns(bb, defSet0, r0, defInsns0); getBlockDefInsns(bb, defSet1, r1, defInsns1); if (defInsns0.size() == 0 && defInsns1.size() == 0) return false; BasicBlock::const_iterator iter = --bb->end(); BasicBlock::const_iterator iterE = bb->begin(); do { const Instruction *insn = iter.node(); for (unsigned i = 0; i < insn->getDstNum(); i++) { Register dst = insn->getDst(i); if (dst == r0 || dst == r1) { if (insn->getOpcode() != OP_MOV) return true; if (dst == r0 && insn->getSrc(0) != r1) return true; if (dst == r1 && insn->getSrc(0) != r0) return true; return false; } } --iter; } while (iter != iterE); return false; }
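// Editor's sketch (illustration only): the MOV special case in
// interfereLiveout() above. With %r0 and %r1 both live out of a block ending
//   ...
//   MOV %r0 %r1    <- last write to either register in this BB
// both registers hold the same value from the MOV onwards, so they may still
// share storage and no interference is reported. Any other defining opcode,
// or a MOV from a third register, makes the function return true.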
// Check the instructions after the def of r0: if there is any def of r1 in that range, there is no // interference; otherwise, if there is any use of r1, return true. bool FunctionDAG::interfere(const BasicBlock *bb, Register inReg, Register outReg) const { auto dSet = getRegDef(outReg); for (auto &def : *dSet) { auto defInsn = def->getInstruction(); if (defInsn->getParent() == bb) { if (defInsn->getOpcode() == OP_MOV && defInsn->getSrc(0) == inReg) continue; BasicBlock::const_iterator iter = BasicBlock::const_iterator(defInsn); BasicBlock::const_iterator iterE = bb->end(); iter++; // check no use of phi in this basicblock between [phiCopySrc def, bb end] while (iter != iterE) { const ir::Instruction *insn = iter.node(); // check phiUse for (unsigned i = 0; i < insn->getSrcNum(); i++) { ir::Register src = insn->getSrc(i); if (src == inReg) return true; } ++iter; } } } return false; } bool FunctionDAG::interfere(const Liveness &liveness, Register r0, Register r1) const { // If there is no intersecting BB, the registers do not interfere with each other. // Three interference cases need further checking: // 1. Both registers are in the LiveIn register set. // 2. Both registers are in the LiveOut register set. // 3. One is in the LiveIn set and the other is in the LiveOut set. // Each of these cases needs its own way of checking whether the registers really interfere. set<const BasicBlock*> bbSet0; set<const BasicBlock*> bbSet1; getRegUDBBs(r0, bbSet0); getRegUDBBs(r1, bbSet1); set<const BasicBlock*> liveInBBSet0, liveInBBSet1; set<const BasicBlock*> liveOutBBSet0, liveOutBBSet1; getLivenessBBs(liveness, r0, bbSet0, liveInBBSet0, liveOutBBSet0); getLivenessBBs(liveness, r1, bbSet1, liveInBBSet1, liveOutBBSet1); GBE_ASSERT(liveInBBSet0.size() + liveOutBBSet0.size() > 0); GBE_ASSERT(liveInBBSet1.size() + liveOutBBSet1.size() > 0); set<const BasicBlock*> intersect; set_intersection(liveInBBSet0.begin(), liveInBBSet0.end(), liveInBBSet1.begin(), liveInBBSet1.end(), std::inserter(intersect, intersect.begin())); for (auto bb : intersect) { if (interfereLivein(bb, r0, r1)) return true; } intersect.clear(); for (auto &bb: liveOutBBSet0) { if (liveness.getBlockInfo(bb).inLiveOut(r1)) intersect.insert(bb); } for (auto bb: liveOutBBSet1) { if (liveness.getBlockInfo(bb).inLiveOut(r0)) intersect.insert(bb); } for (auto bb : intersect) { if (interfereLiveout(bb, r0, r1)) return true; } set<const BasicBlock*> OIIntersect, IOIntersect; set_intersection(liveOutBBSet0.begin(), liveOutBBSet0.end(), liveInBBSet1.begin(), liveInBBSet1.end(), std::inserter(OIIntersect, OIIntersect.begin())); for (auto bb : OIIntersect) { if (interfere(bb, r1, r0)) return true; } set_intersection(liveInBBSet0.begin(), liveInBBSet0.end(), liveOutBBSet1.begin(), liveOutBBSet1.end(), std::inserter(IOIntersect, IOIntersect.begin())); for (auto bb : IOIntersect) { if (interfere(bb, r0, r1)) return true; } return false; } std::ostream &operator<< (std::ostream &out, const FunctionDAG &dag) { const Function &fn = dag.getFunction(); // Print all uses for the definitions and all definitions for each use fn.foreachInstruction([&](const Instruction &insn) { out << &insn << ": " << insn << std::endl; // Display the set of uses for each destination const uint32_t dstNum = insn.getDstNum(); if (dstNum > 0) out << "USES:" << std::endl; for (uint32_t dstID = 0; dstID < dstNum; ++dstID) { const Register reg = insn.getDst(dstID); const auto &uses = dag.getUse(&insn, dstID); for (auto use : uses) { const Instruction *other = use->getInstruction(); out << " %" << reg << " " << other << ": " << *other << std::endl; } } // Display the set of definitions for each source const uint32_t srcNum = insn.getSrcNum(); if (srcNum > 0) out << "DEFS:" << std::endl; for (uint32_t srcID = 0; srcID < 
srcNum; ++srcID) { const Register reg = insn.getSrc(srcID); const auto &defs = dag.getDef(&insn, srcID); for (auto def : defs) { if (def->getType() == ValueDef::DEF_FN_PUSHED) out << " %" << reg << " # pushed register" << std::endl; else if (def->getType() == ValueDef::DEF_FN_ARG) out << " %" << reg << " # function argument" << std::endl; else if (def->getType() == ValueDef::DEF_SPECIAL_REG) out << " %" << reg << " # special register" << std::endl; else { const Instruction *other = def->getInstruction(); out << " %" << reg << " " << other << ": " << *other << std::endl; } } } out << std::endl; }); return out; } } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/printf.hpp000664 001750 001750 00000015213 13161142102 020254 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file printf.hpp * */ #ifndef __GBE_IR_PRINTF_HPP__ #define __GBE_IR_PRINTF_HPP__ #include #include "sys/map.hpp" #include "sys/vector.hpp" namespace gbe { namespace ir { class Unit; /* Things about printf info. */ enum { PRINTF_LM_NONE, PRINTF_LM_HH, PRINTF_LM_H, PRINTF_LM_L, PRINTF_LM_HL, }; enum { PRINTF_CONVERSION_INVALID, PRINTF_CONVERSION_D, PRINTF_CONVERSION_I, PRINTF_CONVERSION_O, PRINTF_CONVERSION_U, PRINTF_CONVERSION_X, PRINTF_CONVERSION_x, PRINTF_CONVERSION_F, PRINTF_CONVERSION_f, PRINTF_CONVERSION_E, PRINTF_CONVERSION_e, PRINTF_CONVERSION_G, PRINTF_CONVERSION_g, PRINTF_CONVERSION_A, PRINTF_CONVERSION_a, PRINTF_CONVERSION_C, PRINTF_CONVERSION_S, PRINTF_CONVERSION_P }; struct PrintfState { char left_justified; char sign_symbol; //0 for nothing, 1 for sign, 2 for space. char alter_form; char zero_padding; char vector_n; int min_width; int precision; int length_modifier; char conversion_specifier; int out_buf_sizeof_offset; // Should *global_total_size to get the full offset. std::string str; //if %s, the string store here. 
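// Editor's example (hypothetical values, not upstream documentation): for a
// directive such as "%-8.3f" the format parser would fill this state roughly as
//   left_justified = 1, min_width = 8, precision = 3,
//   length_modifier = PRINTF_LM_NONE, conversion_specifier = PRINTF_CONVERSION_f
// and a vector directive such as "%v4d" would set vector_n = 4 with
// conversion_specifier = PRINTF_CONVERSION_D.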
PrintfState(void) { left_justified = 0; sign_symbol = 0; alter_form = 0; zero_padding = 0; vector_n = 0; min_width = 0; precision = 0; length_modifier = 0; conversion_specifier = 0; out_buf_sizeof_offset = 0; } PrintfState(const PrintfState & other) { left_justified = other.left_justified; sign_symbol = other.sign_symbol; alter_form = other.alter_form; zero_padding = other.zero_padding; vector_n = other.vector_n; min_width = other.min_width; precision = other.precision; length_modifier = other.length_modifier; conversion_specifier = other.conversion_specifier; out_buf_sizeof_offset = other.out_buf_sizeof_offset; str = other.str; } }; enum { PRINTF_SLOT_TYPE_NONE, PRINTF_SLOT_TYPE_STRING, PRINTF_SLOT_TYPE_STATE }; struct PrintfSlot { uint32_t type; std::string str; PrintfState state; PrintfSlot(void) { type = PRINTF_SLOT_TYPE_NONE; } PrintfSlot(std::string& s) : str(s) { type = PRINTF_SLOT_TYPE_STRING; } PrintfSlot(PrintfState& st) { type = PRINTF_SLOT_TYPE_STATE; state = st; } PrintfSlot(const PrintfSlot & other) { if (other.type == PRINTF_SLOT_TYPE_STRING) { type = PRINTF_SLOT_TYPE_STRING; str = other.str; } else if (other.type == PRINTF_SLOT_TYPE_STATE) { type = PRINTF_SLOT_TYPE_STATE; state = other.state; } else { type = PRINTF_SLOT_TYPE_NONE; } } ~PrintfSlot(void) { } }; struct PrintfLog { uint32_t magic; // 0xAABBCCDD as magic for ASSERT. uint32_t size; // Size of this printf log, including the header. uint32_t statementNum; // Which printf statement within the kernel. const char* content; PrintfLog(const char* p) { GBE_ASSERT(*((uint32_t *)p) == 0xAABBCCDD); magic = *((uint32_t *)p); p += sizeof(uint32_t); size = *((uint32_t *)p); p += sizeof(uint32_t); statementNum = *((uint32_t *)p); p += sizeof(uint32_t); content = p; } template <typename T> T getData(void) { T D = *((T *)content); content += sizeof(T); return D; } }; class Context; class PrintfSet /*: public Serializable*/ { public: PrintfSet(const PrintfSet& other) { fmts = other.fmts; btiBuf = other.btiBuf; } PrintfSet(void) = default; struct LockOutput { LockOutput(void) { pthread_mutex_lock(&lock); } ~LockOutput(void) { pthread_mutex_unlock(&lock); } }; typedef vector<PrintfSlot> PrintfFmt; void append(uint32_t num, PrintfFmt* fmt) { GBE_ASSERT(fmts.find(num) == fmts.end()); fmts.insert(std::make_pair(num, *fmt)); } uint32_t getPrintfNum(void) const { return fmts.size(); } void setBufBTI(uint8_t b) { btiBuf = b; } uint8_t getBufBTI() const { return btiBuf; } uint32_t getPrintfBufferElementSize(uint32_t i) { PrintfSlot slot; int vec_num = 1; if (slot.state.vector_n > 0) { vec_num = slot.state.vector_n; } assert(vec_num > 0 && vec_num <= 16); switch (slot.state.conversion_specifier) { case PRINTF_CONVERSION_I: case PRINTF_CONVERSION_D: case PRINTF_CONVERSION_O: case PRINTF_CONVERSION_U: case PRINTF_CONVERSION_X: case PRINTF_CONVERSION_x: case PRINTF_CONVERSION_P: /* Char will be aligned to sizeof(int) here. */
case PRINTF_CONVERSION_C: return (uint32_t)(sizeof(int) * vec_num); case PRINTF_CONVERSION_E: case PRINTF_CONVERSION_e: case PRINTF_CONVERSION_F: case PRINTF_CONVERSION_f: case PRINTF_CONVERSION_G: case PRINTF_CONVERSION_g: case PRINTF_CONVERSION_A: case PRINTF_CONVERSION_a: return (uint32_t)(sizeof(float) * vec_num); case PRINTF_CONVERSION_S: return (uint32_t)0; default: break; } assert(0); return 0; } void outputPrintf(void* buf_addr); private: std::map<uint32_t, PrintfFmt> fmts; friend struct LockOutput; uint8_t btiBuf; static pthread_mutex_t lock; GBE_CLASS(PrintfSet); }; } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_PRINTF_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/instruction.hxx000664 001750 001750 00000011022 13161142102 021345 0ustar00yryr000000 000000 /* * Copyright 2012 Intel Corporation * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice (including the next * paragraph) shall be included in all copies or substantial portions of the * Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER * DEALINGS IN THE SOFTWARE. 
*/ /** * \file instruction.hxx * \author Benjamin Segovia */ DECL_INSN(SIMD_SIZE, NullaryInstruction) DECL_INSN(SIMD_ID, NullaryInstruction) DECL_INSN(MOV, UnaryInstruction) DECL_INSN(COS, UnaryInstruction) DECL_INSN(SIN, UnaryInstruction) DECL_INSN(LOG, UnaryInstruction) DECL_INSN(EXP, UnaryInstruction) DECL_INSN(SQR, UnaryInstruction) DECL_INSN(RSQ, UnaryInstruction) DECL_INSN(RCP, UnaryInstruction) DECL_INSN(ABS, UnaryInstruction) DECL_INSN(RNDD, UnaryInstruction) DECL_INSN(RNDE, UnaryInstruction) DECL_INSN(RNDU, UnaryInstruction) DECL_INSN(RNDZ, UnaryInstruction) DECL_INSN(SIMD_ANY, UnaryInstruction) DECL_INSN(SIMD_ALL, UnaryInstruction) DECL_INSN(BSWAP, UnaryInstruction) DECL_INSN(POW, BinaryInstruction) DECL_INSN(MUL, BinaryInstruction) DECL_INSN(ADD, BinaryInstruction) DECL_INSN(ADDSAT, BinaryInstruction) DECL_INSN(SUB, BinaryInstruction) DECL_INSN(SUBSAT, BinaryInstruction) DECL_INSN(DIV, BinaryInstruction) DECL_INSN(REM, BinaryInstruction) DECL_INSN(SHL, BinaryInstruction) DECL_INSN(SHR, BinaryInstruction) DECL_INSN(ASR, BinaryInstruction) DECL_INSN(BSF, BinaryInstruction) DECL_INSN(BSB, BinaryInstruction) DECL_INSN(OR, BinaryInstruction) DECL_INSN(XOR, BinaryInstruction) DECL_INSN(AND, BinaryInstruction) DECL_INSN(SIMD_SHUFFLE, SimdShuffleInstruction) DECL_INSN(SEL, SelectInstruction) DECL_INSN(EQ, CompareInstruction) DECL_INSN(NE, CompareInstruction) DECL_INSN(LE, CompareInstruction) DECL_INSN(LT, CompareInstruction) DECL_INSN(GE, CompareInstruction) DECL_INSN(GT, CompareInstruction) DECL_INSN(ORD, CompareInstruction) DECL_INSN(BITCAST, BitCastInstruction) DECL_INSN(CVT, ConvertInstruction) DECL_INSN(SAT_CVT, ConvertInstruction) DECL_INSN(F16TO32, ConvertInstruction) DECL_INSN(F32TO16, ConvertInstruction) DECL_INSN(ATOMIC, AtomicInstruction) DECL_INSN(BRA, BranchInstruction) DECL_INSN(RET, BranchInstruction) DECL_INSN(LOADI, LoadImmInstruction) DECL_INSN(LOAD, LoadInstruction) DECL_INSN(STORE, StoreInstruction) DECL_INSN(TYPED_WRITE, TypedWriteInstruction) DECL_INSN(SAMPLE, SampleInstruction) DECL_INSN(SYNC, SyncInstruction) DECL_INSN(LABEL, LabelInstruction) DECL_INSN(READ_ARF, ReadARFInstruction) DECL_INSN(REGION, RegionInstruction) DECL_INSN(VME, VmeInstruction) DECL_INSN(INDIRECT_MOV, IndirectMovInstruction) DECL_INSN(GET_IMAGE_INFO, GetImageInfoInstruction) DECL_INSN(MUL_HI, BinaryInstruction) DECL_INSN(I64_MUL_HI, BinaryInstruction) DECL_INSN(FBH, UnaryInstruction) DECL_INSN(FBL, UnaryInstruction) DECL_INSN(CBIT, UnaryInstruction) DECL_INSN(LZD, UnaryInstruction) DECL_INSN(HADD, BinaryInstruction) DECL_INSN(RHADD, BinaryInstruction) DECL_INSN(I64HADD, BinaryInstruction) DECL_INSN(I64RHADD, BinaryInstruction) DECL_INSN(UPSAMPLE_SHORT, BinaryInstruction) DECL_INSN(UPSAMPLE_INT, BinaryInstruction) DECL_INSN(UPSAMPLE_LONG, BinaryInstruction) DECL_INSN(I64MADSAT, TernaryInstruction) DECL_INSN(MAD, TernaryInstruction) DECL_INSN(LRP, TernaryInstruction) DECL_INSN(IF, BranchInstruction) DECL_INSN(ENDIF, BranchInstruction) DECL_INSN(ELSE, BranchInstruction) DECL_INSN(WHILE, BranchInstruction) DECL_INSN(CALC_TIMESTAMP, CalcTimestampInstruction) DECL_INSN(STORE_PROFILING, StoreProfilingInstruction) DECL_INSN(WAIT, WaitInstruction) DECL_INSN(WORKGROUP, WorkGroupInstruction) DECL_INSN(SUBGROUP, SubGroupInstruction) DECL_INSN(PRINTF, PrintfInstruction) DECL_INSN(MBREAD, MediaBlockReadInstruction) DECL_INSN(MBWRITE, MediaBlockWriteInstruction) DECL_INSN(BFREV, UnaryInstruction) Beignet-1.3.2-Source/backend/src/ir/half.cpp000664 001750 001750 00000017205 13173554000 017671 
0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file half.cpp * */ #include "llvm/ADT/APSInt.h" #include "half.hpp" namespace gbe { namespace ir { static llvm::APFloat convU16ToAPFloat(const uint16_t v) { uint64_t v64 = static_cast(v); llvm::APInt apInt(16, v64, false); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 return llvm::APFloat(llvm::APFloat::IEEEhalf(), apInt); #else return llvm::APFloat(llvm::APFloat::IEEEhalf, apInt); #endif } static uint16_t convAPFloatToU16(const llvm::APFloat& apf) { llvm::APInt api = apf.bitcastToAPInt(); uint64_t v64 = api.getZExtValue(); return static_cast(v64); } half::operator float(void) const { bool loseInfo; llvm::APFloat apf_self = convU16ToAPFloat(this->val); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 apf_self.convert(llvm::APFloat::IEEEsingle(), llvm::APFloat::rmNearestTiesToEven, &loseInfo); #else apf_self.convert(llvm::APFloat::IEEEsingle, llvm::APFloat::rmNearestTiesToEven, &loseInfo); #endif return apf_self.convertToFloat(); } half::operator double(void) const { bool loseInfo; llvm::APFloat apf_self = convU16ToAPFloat(this->val); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 apf_self.convert(llvm::APFloat::IEEEdouble(), llvm::APFloat::rmNearestTiesToEven, &loseInfo); #else apf_self.convert(llvm::APFloat::IEEEdouble, llvm::APFloat::rmNearestTiesToEven, &loseInfo); #endif return apf_self.convertToDouble(); } half::operator uint16_t(void) const { llvm::APSInt apsInt(16, false); bool isExact; llvm::APFloat apf_self = convU16ToAPFloat(this->val); apf_self.convertToInteger(apsInt, llvm::APFloat::rmNearestTiesToEven, &isExact); return static_cast(apsInt.getZExtValue()); } half::operator int16_t(void) const { llvm::APSInt apsInt(16, true); bool isExact; llvm::APFloat apf_self = convU16ToAPFloat(this->val); apf_self.convertToInteger(apsInt, llvm::APFloat::rmNearestTiesToEven, &isExact); return static_cast(apsInt.getZExtValue()); } half half::convToHalf(uint16_t u16) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 llvm::APFloat res(llvm::APFloat::IEEEhalf(), llvm::APInt(16, 0, false)); #else llvm::APFloat res(llvm::APFloat::IEEEhalf, llvm::APInt(16, 0, false)); #endif uint64_t u64 = static_cast(u16); llvm::APInt apInt(16, u64, false); res.convertFromAPInt(apInt, false, llvm::APFloat::rmNearestTiesToEven); return half(convAPFloatToU16(res)); } half half::convToHalf(int16_t v16) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 llvm::APFloat res(llvm::APFloat::IEEEhalf(), llvm::APInt(16, 0, true)); #else llvm::APFloat res(llvm::APFloat::IEEEhalf, llvm::APInt(16, 0, true)); #endif uint64_t u64 = static_cast(v16); llvm::APInt apInt(16, u64, true); res.convertFromAPInt(apInt, true, llvm::APFloat::rmNearestTiesToEven); return half(convAPFloatToU16(res)); } half half::operator +(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat 
apf_other = convU16ToAPFloat(other.val); apf_self.add(apf_other, llvm::APFloat::rmNearestTiesToEven); uint16_t ret = convAPFloatToU16(apf_self); return half(ret); } half half::operator -(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); apf_self.subtract(apf_other, llvm::APFloat::rmNearestTiesToEven); uint16_t ret = convAPFloatToU16(apf_self); return half(ret); } half half::operator *(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); apf_self.multiply(apf_other, llvm::APFloat::rmNearestTiesToEven); uint16_t ret = convAPFloatToU16(apf_self); return half(ret); } half half::operator /(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); apf_self.divide(apf_other, llvm::APFloat::rmNearestTiesToEven); uint16_t ret = convAPFloatToU16(apf_self); return half(ret); } half half::operator %(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); apf_self.remainder(apf_other); uint16_t ret = convAPFloatToU16(apf_self); return half(ret); } bool half::operator ==(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); llvm::APFloat::cmpResult res = apf_self.compare(apf_other); if (res == llvm::APFloat::cmpEqual) return true; return false; } bool half::operator !=(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); llvm::APFloat::cmpResult res = apf_self.compare(apf_other); if (res == llvm::APFloat::cmpEqual) return false; return true; } bool half::operator <(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); llvm::APFloat::cmpResult res = apf_self.compare(apf_other); if (res == llvm::APFloat::cmpLessThan) return true; return false; } bool half::operator >(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); llvm::APFloat::cmpResult res = apf_self.compare(apf_other); if (res == llvm::APFloat::cmpGreaterThan) return true; return false; } bool half::operator <=(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); llvm::APFloat::cmpResult res = apf_self.compare(apf_other); if (res == llvm::APFloat::cmpLessThan || res == llvm::APFloat::cmpEqual) return true; return false; } bool half::operator >=(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); llvm::APFloat::cmpResult res = apf_self.compare(apf_other); if (res == llvm::APFloat::cmpGreaterThan || res == llvm::APFloat::cmpEqual) return true; return false; } bool half::operator &&(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); if (apf_self.isZero() || apf_other.isZero()) return false; return true; } bool half::operator ||(const half& other) const { llvm::APFloat apf_self = convU16ToAPFloat(this->val); llvm::APFloat apf_other = convU16ToAPFloat(other.val); if (apf_self.isZero() && apf_other.isZero()) return false; return true; } } /* namespace ir */ } 
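// Editor's sketch (usage illustration, not part of half.cpp): every operator
// above round-trips through llvm::APFloat in IEEE half precision, so results
// are rounded to fp16 before being returned:
//   using gbe::ir::half;
//   half a = half::convToHalf((int16_t)3);  // 3.0 in fp16
//   half b = half::convToHalf((int16_t)2);  // 2.0 in fp16
//   float f = (float)(a / b);               // 1.5f, rounded as fp16 first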
/* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/register.hpp000664 001750 001750 00000021312 13161142102 020573 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file register.hpp * \author Benjamin Segovia */ #ifndef __GBE_IR_REGISTER_HPP__ #define __GBE_IR_REGISTER_HPP__ #include "sys/vector.hpp" #include "sys/platform.hpp" #include "../backend/program.h" namespace gbe { namespace ir { /*! Defines the size of the pointers. All the functions from the unit will * use the same pointer size as the unit they belong to */ enum PointerSize { POINTER_32_BITS = 32, POINTER_64_BITS = 64 }; /*! Basically provides the size of the register */ enum RegisterFamily : uint8_t { FAMILY_BOOL = 0, FAMILY_BYTE = 1, FAMILY_WORD = 2, FAMILY_DWORD = 3, FAMILY_QWORD = 4, FAMILY_OWORD = 5, FAMILY_HWORD = 6, FAMILY_REG = 7 }; INLINE char getFamilyName(RegisterFamily family) { static char registerFamilyName[] = {'b', 'B', 'W', 'D', 'Q', 'O', 'H', 'R'}; return registerFamilyName[family]; } INLINE uint32_t getFamilySize(RegisterFamily family) { switch (family) { case FAMILY_BYTE: return 1; case FAMILY_WORD: return 2; case FAMILY_DWORD: return 4; case FAMILY_QWORD: return 8; case FAMILY_REG: return 32; default: NOT_SUPPORTED; }; return 0; } enum ARFRegister { ARF_NULL = 0, ARF_ADDRESS, ARF_ACCUMULATOR, ARF_FLAG, ARF_MASK, ARF_MASK_STACK, ARF_MASK_STACK_DEPTH, ARF_STATE, ARF_CONTROL, ARF_NOTIFICATION_COUNT, ARF_IP, ARF_TM }; /*! Register is the position of the index of the register data in the register * file. We enforce type safety with this class */ TYPE_SAFE(Register, uint32_t) /*! A register can be either a byte, a word, a dword or a qword. We store this * value into a register data (which makes the register file) */ class RegisterData { public: struct PayloadRegisterData { gbe_curbe_type curbeType; int subType; }; /*! Build a register. All fields will be immutable */ INLINE RegisterData(RegisterFamily family, bool uniform, gbe_curbe_type curbeType, int subType) : family(family), uniform(uniform) { payloadData.curbeType = curbeType; payloadData.subType = subType; } /*! Copy constructor */ INLINE RegisterData(const RegisterData &other) : family(other.family), uniform(other.uniform), payloadData(other.payloadData) {} /*! Copy operator */ INLINE RegisterData &operator= (const RegisterData &other) { this->family = other.family; this->uniform = other.uniform; this->payloadData = other.payloadData; return *this; } /*! 
Nothing really happens here */ INLINE ~RegisterData(void) {} RegisterFamily family; //!< Register size or if it is a flag INLINE bool isUniform() const { return uniform; } INLINE void setUniform(bool uni) { uniform = uni; } INLINE void setPayloadType(gbe_curbe_type curbeType, int subType) { payloadData.curbeType = curbeType; payloadData.subType = subType; } INLINE void getPayloadType(gbe_curbe_type &curbeType, int &subType) const { curbeType = payloadData.curbeType; subType = payloadData.subType; } INLINE bool isPayloadType(void) const { return payloadData.curbeType != GBE_GEN_REG; } private: bool uniform; PayloadRegisterData payloadData; GBE_CLASS(RegisterData); }; /*! Output the register file string in the given stream */ std::ostream &operator<< (std::ostream &out, const RegisterData ®Data); INLINE bool operator< (const Register &r0, const Register &r1) { return r0.value() < r1.value(); } /*! Tuple is the position of the first register in the tuple vector. We * enforce type safety with this class */ TYPE_SAFE(Tuple, uint32_t) /*! A register file allocates and destroys registers. Basically, we will have * one register file per function */ class RegisterFile { public: /*! Return the index of a newly allocated register */ INLINE Register append(RegisterFamily family, bool uniform = false, gbe_curbe_type curbeType = GBE_GEN_REG, int subType = 0) { GBE_ASSERTM((uint64_t)regNum() < MAX_INDEX, "Too many defined registers (only 4G are supported)"); const uint32_t index = regNum(); const RegisterData reg(family, uniform, curbeType, subType); regs.push_back(reg); return Register(index); } /*! Make a tuple from an array of register */ Tuple appendArrayTuple(const Register *reg, uint32_t regNum); /*! Make a tuple and return the index to the first element of the tuple */ template INLINE Tuple appendTuple(First first, Rest... rest) { const Tuple index = Tuple(regTuples.size()); GBE_ASSERTM(first < regNum(), "Out-of-bound register"); regTuples.push_back(first); appendTuple(rest...); return index; } /*! To terminate variadic recursion */ INLINE void appendTuple(void) {} /*! Make a tuple from an array of Type */ Tuple appendArrayTypeTuple(const uint8_t *types, uint32_t num); /*! Make a tuple and return the index to the first element of the tuple */ template INLINE Tuple appendTypeTuple(First first, Rest... rest) { const Tuple index = Tuple(typeTuples.size()); typeTuples.push_back(first); appendTuple(rest...); return index; } /*! To terminate variadic recursion */ INLINE void appendTypeTuple(void) {} /*! Return a copy of the register at index */ INLINE RegisterData get(Register index) const { return regs[index]; } /*! Return true if the specified register is uniform type. */ INLINE bool isUniform(Register index) { return regs[index].isUniform(); } /*! Set a register to uniform or varying data type*/ INLINE void setUniform(Register index, bool uniform) { regs[index].setUniform(uniform); } /*! Set payload type of a register */ INLINE void setPayloadType(Register index, gbe_curbe_type curbeType, int subType) { regs[index].setPayloadType(curbeType, subType); } /*! Get payload type of a register */ INLINE void getPayloadType(Register index, gbe_curbe_type &curbeType, int &subType) const { regs[index].getPayloadType(curbeType, subType); } /*! Check whether the register is a payload register */ INLINE bool isPayloadReg(Register index) const { return regs[index].isPayloadType(); } /*! 
Get the register index from the tuple */ INLINE Register get(Tuple index, uint32_t which) const { return regTuples[index.value() + which]; } /*! Set the register index from the tuple */ INLINE void set(Tuple index, uint32_t which, Register reg) { regTuples[index.value() + which] = reg; } /*! Get the type from the tuple */ INLINE uint8_t getType(Tuple index, uint32_t which) const { return typeTuples[index.value() + which]; } /*! Set the type to the tuple */ INLINE void setType(Tuple index, uint32_t which, uint8_t type) { typeTuples[index.value() + which] = type; } /*! Number of registers in the register file */ INLINE uint32_t regNum(void) const { return regs.size(); } /*! Number of tuples in the register file */ INLINE uint32_t tupleNum(void) const { return regTuples.size(); } /*! register and tuple indices are short */ enum { MAX_INDEX = 0xffffffff }; private: vector regs; //!< All the registers together vector regTuples; //!< Tuples are used for many src / dst vector typeTuples; //!< Tuples are used for one instruction has multi src/dst types. GBE_CLASS(RegisterFile); }; /*! Output the register file string in the given stream */ std::ostream &operator<< (std::ostream &out, const RegisterFile &file); } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_REGISTER_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/printf.cpp000664 001750 001750 00000013372 13161142102 020253 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file printf.cpp * */ #include #include "printf.hpp" #include "ir/unit.hpp" namespace gbe { namespace ir { pthread_mutex_t PrintfSet::lock = PTHREAD_MUTEX_INITIALIZER; static void generatePrintfFmtString(PrintfState& state, std::string& str) { char num_str[16]; str = "%"; if (state.left_justified) { str += "-"; } if (state.sign_symbol == 1) { str += "+"; } else if (state.sign_symbol == 2) { str += " "; } if (state.alter_form) { str += "#"; } if (state.zero_padding) { str += "0"; } if (state.min_width >= 0) { snprintf(num_str, 16, "%d", state.min_width); str += num_str; } if (state.precision >= 0) { str += "."; snprintf(num_str, 16, "%d", state.precision); str += num_str; } switch (state.length_modifier) { case PRINTF_LM_HH: str += "hh"; break; case PRINTF_LM_H: str += "h"; break; case PRINTF_LM_L: str += "l"; break; case PRINTF_LM_HL: str += ""; break; default: assert(state.length_modifier == PRINTF_LM_NONE); } } #define PRINT_SOMETHING(target_ty, conv) do { \ if (!vec_i) \ pf_str = pf_str + std::string(#conv); \ printf(pf_str.c_str(), log.getData()); \ } while (0) static void printOutOneStatement(PrintfSet::PrintfFmt& fmt, PrintfLog& log) { std::string pf_str = ""; for (auto& slot : fmt) { if (slot.type == PRINTF_SLOT_TYPE_STRING) { printf("%s", slot.str.c_str()); continue; } assert(slot.type == PRINTF_SLOT_TYPE_STATE); generatePrintfFmtString(slot.state, pf_str); int vec_num; vec_num = slot.state.vector_n > 0 ? 
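// Editor's note (buffer layout inferred from PrintfLog; an editorial sketch,
// not upstream documentation): the buffer that outputPrintf() walks below is
// laid out as
//   [uint32_t totalSZ][log: 0xAABBCCDD, size, statementNum, args...][log: ...]
// where each log's size field includes its own 12-byte header, so the parse
// loop advances by log.size bytes per printf statement.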
slot.state.vector_n : 1; for (int vec_i = 0; vec_i < vec_num; vec_i++) { if (vec_i) printf(","); switch (slot.state.conversion_specifier) { case PRINTF_CONVERSION_D: case PRINTF_CONVERSION_I: if (slot.state.length_modifier == PRINTF_LM_L) PRINT_SOMETHING(uint64_t, d); else PRINT_SOMETHING(int, d); break; case PRINTF_CONVERSION_O: if (slot.state.length_modifier == PRINTF_LM_L) PRINT_SOMETHING(uint64_t, o); else PRINT_SOMETHING(int, o); break; case PRINTF_CONVERSION_U: if (slot.state.length_modifier == PRINTF_LM_L) PRINT_SOMETHING(uint64_t, u); else PRINT_SOMETHING(int, u); break; case PRINTF_CONVERSION_X: if (slot.state.length_modifier == PRINTF_LM_L) PRINT_SOMETHING(uint64_t, X); else PRINT_SOMETHING(int, X); break; case PRINTF_CONVERSION_x: if (slot.state.length_modifier == PRINTF_LM_L) PRINT_SOMETHING(uint64_t, x); else PRINT_SOMETHING(int, x); break; case PRINTF_CONVERSION_C: PRINT_SOMETHING(char, c); break; case PRINTF_CONVERSION_F: PRINT_SOMETHING(float, F); break; case PRINTF_CONVERSION_f: PRINT_SOMETHING(float, f); break; case PRINTF_CONVERSION_E: PRINT_SOMETHING(float, E); break; case PRINTF_CONVERSION_e: PRINT_SOMETHING(float, e); break; case PRINTF_CONVERSION_G: PRINT_SOMETHING(float, G); break; case PRINTF_CONVERSION_g: PRINT_SOMETHING(float, g); break; case PRINTF_CONVERSION_A: PRINT_SOMETHING(float, A); break; case PRINTF_CONVERSION_a: PRINT_SOMETHING(float, a); break; case PRINTF_CONVERSION_P: PRINT_SOMETHING(int, p); break; case PRINTF_CONVERSION_S: pf_str = pf_str + "s"; printf(pf_str.c_str(), slot.state.str.c_str()); break; default: assert(0); return; } } } } void PrintfSet::outputPrintf(void* buf_addr) { LockOutput lock; uint32_t totalSZ = ((uint32_t *)buf_addr)[0]; char* p = (char*)buf_addr + sizeof(uint32_t); for (uint32_t parsed = 4; parsed < totalSZ; ) { PrintfLog log(p); GBE_ASSERT(fmts.find(log.statementNum) != fmts.end()); printOutOneStatement(fmts[log.statementNum], log); parsed += log.size; p += log.size; } } } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/type.cpp000664 001750 001750 00000003221 13161142102 017722 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ /** * \file instruction.cpp * \author Benjamin Segovia */ #include "ir/type.hpp" namespace gbe { namespace ir { std::ostream &operator<< (std::ostream &out, const Type &type) { switch (type) { case TYPE_BOOL: return out << "bool"; case TYPE_S8: return out << "int8"; case TYPE_U8: return out << "uint8"; case TYPE_S16: return out << "int16"; case TYPE_U16: return out << "uint16"; case TYPE_S32: return out << "int32"; case TYPE_U32: return out << "uint32"; case TYPE_S64: return out << "int64"; case TYPE_U64: return out << "uint64"; case TYPE_HALF: return out << "half"; case TYPE_FLOAT: return out << "float"; case TYPE_DOUBLE: return out << "double"; default : GBE_ASSERT(0 && "Unsupported type\n"); }; return out; } } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/structurizer.hpp000664 001750 001750 00000016547 13161142102 021552 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __STRUCTURIZER_HPP__ #define __STRUCTURIZER_HPP__ #include "llvm/ADT/SmallVector.h" #include "ir/unit.hpp" #include "ir/function.hpp" #include "ir/instruction.hpp" #include #include #include #include #include #include namespace gbe { namespace ir { using namespace llvm; enum BlockType { SingleBlockType = 0, SerialBlockType, IfThenType, IfElseType, SelfLoopType }; /* Block*/ class Block; typedef std::set BlockSets; typedef std::list BlockList; typedef std::vector BlockVector; typedef std::set::iterator sIterator; typedef std::list::iterator lIterator; class Block { public: Block(BlockType type, const BlockList& children): has_barrier(false), mark(false), canBeHandled(true), inversePredicate(true), insnNum(0) { this->btype = type; this->children = children; } virtual ~Block() {} Block*& fallthrough() { return fall_through; } BlockSets& successors() { return successor; } size_t succ_size() { return successor.size(); } sIterator succ_begin() { return successor.begin(); } sIterator succ_end() { return successor.end(); } bool succ_empty() { return successor.empty(); } BlockSets& predecessors() { return predecessor; } size_t pred_size() { return predecessor.size(); } sIterator pred_begin() { return predecessor.begin(); } sIterator pred_end() { return predecessor.end(); } bool& hasBarrier() { return has_barrier; } BlockType type() { return btype; } virtual BasicBlock* getEntry() { return (*(children.begin()))->getEntry(); } virtual BasicBlock* getExit() { return (*(children.rbegin()))->getExit(); } public: BlockType btype; Block* fall_through; BlockSets predecessor; BlockSets successor; BlockList children; bool has_barrier; bool mark; bool canBeHandled; //label is for debug int label; /* inversePredicate should be false under two circumstance, * fallthrough is the same with succs: * (1) n->succs == m && block->fallthrough == m * block * | \ * | \ * m<--n * (2) m->succs == n && block->fallthrough == n * block * | \ * | \ * 
m-->n * */ bool inversePredicate; int insnNum; }; /* represents basic block */ class SimpleBlock: public Block { public: SimpleBlock(BasicBlock *p_bb) : Block(SingleBlockType, BlockList()) { this->p_bb = p_bb; } virtual ~SimpleBlock() {} BasicBlock* getBasicBlock() { return p_bb; } virtual BasicBlock* getEntry() { return p_bb; } virtual BasicBlock* getExit() { return p_bb; } virtual BasicBlock* getFirstBB() { return p_bb; } private: BasicBlock *p_bb; }; /* a serial of Blocks*/ class SerialBlock : public Block { public: SerialBlock(BlockList& children) : Block(SerialBlockType, children) {} virtual ~SerialBlock(){} }; /* If-Then Block*/ class IfThenBlock : public Block { public: IfThenBlock(Block* pred, Block* trueBlock) : Block(IfThenType, InitChildren(pred, trueBlock)) {} virtual ~IfThenBlock() {} private: const BlockList InitChildren(Block* pred, Block* trueBlock) { BlockList children; children.push_back(pred); children.push_back(trueBlock); return children; } }; /* If-Else Block*/ class IfElseBlock: public Block { public: IfElseBlock(Block* pred, Block* trueBlock, Block* falseBlock) : Block(IfElseType, InitChildren(pred, trueBlock, falseBlock)) {} virtual ~IfElseBlock() {} private: const BlockList InitChildren(Block* pred, Block* trueBlock, Block* falseBlock) { BlockList children; children.push_back(pred); children.push_back(trueBlock); children.push_back(falseBlock); return children; } }; /* Self loop Block*/ class SelfLoopBlock: public Block { public: SelfLoopBlock(Block* block) : Block(SelfLoopType, InitChildren(block)) {} virtual ~SelfLoopBlock() {} virtual BasicBlock* getEntry() { return (*(children.begin()))->getEntry(); } virtual BasicBlock* getExit() { return (*(children.begin()))->getExit(); } private: const BlockList InitChildren(Block * block) { BlockList children; children.push_back(block); return children; } }; class CFGStructurizer{ public: CFGStructurizer(Function* fn) { this->fn = fn; numSerialPatternMatch = 0; numLoopPatternMatch = 0; numIfPatternMatch = 0;} ~CFGStructurizer(); void StructurizeBlocks(); private: int numSerialPatternMatch; int numLoopPatternMatch; int numIfPatternMatch; void outBlockTypes(BlockType type); void printOrderedBlocks(); void blockPatternMatch(); int serialPatternMatch(Block *block); Block* mergeSerialBlock(BlockList& serialBB); void cfgUpdate(Block* mergedBB, const BlockSets& blockBBs); void replace(Block* mergedBB, BlockSets serialSets); int loopPatternMatch(Block *block); Block* mergeLoopBlock(BlockList& loopSets); int ifPatternMatch(Block *block); int patternMatch(Block *block); void collectInsnNum(Block* block, const BasicBlock* bb); private: void handleSelfLoopBlock(Block *loopblock, LabelIndex& whileLabel); void markNeedIf(Block *block, bool status); void markNeedEndif(Block *block, bool status); void markStructuredBlocks(Block *block, bool status); void handleIfBlock(Block *block, LabelIndex& matchingEndifLabel, LabelIndex& matchingElseLabel); void handleThenBlock(Block * block, LabelIndex& endiflabel); void handleThenBlock2(Block *block, Block *elseblock, LabelIndex elseBBLabel); void handleElseBlock(Block * block, LabelIndex& elselabel, LabelIndex& endiflabel); void handleStructuredBlocks(); void getStructureSequence(Block *block, std::vector &seq); std::set getStructureBasicBlocksIndex(Block* block, std::vector &bbs); std::set getStructureBasicBlocks(Block *block); Block* insertBlock(Block *p_block); bool checkForBarrier(const BasicBlock* bb); void getLiveIn(BasicBlock& bb, std::set& livein); void initializeBlocks(); void 
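// Editor's sketch (illustration only, inferred from the Block types above):
// how blockPatternMatch() is meant to reduce a CFG. Straight-line blocks
// A -> B -> C with no other edges fold into one SerialBlock; a block whose
// only back edge targets itself folds into a SelfLoopBlock; and
// cond -> then -> merge with a cond -> merge edge folds into an IfThenBlock.
// Matching repeats until no pattern applies or a single Block remains.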
calculateNecessaryLiveout(); private: Function *fn; std::map bbmap; std::map bTobbmap; BlockVector blocks; Block* blocks_entry; gbe::vector loops; BlockList orderedBlks; BlockList::iterator orderIter; }; } /* namespace ir */ } /* namespace gbe */ #endif Beignet-1.3.2-Source/backend/src/ir/immediate.hpp000664 001750 001750 00000031266 13161142102 020716 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file Immediate.hpp * * \author Benjamin Segovia */ #ifndef __GBE_IR_IMMEDIATE_HPP__ #define __GBE_IR_IMMEDIATE_HPP__ #include #include "ir/type.hpp" #include "ir/half.hpp" #include "sys/platform.hpp" namespace gbe { namespace ir { typedef enum { IMM_TRUNC = 0, IMM_BITCAST, IMM_ADD, IMM_SUB, IMM_MUL, IMM_DIV, IMM_REM, IMM_SHL, IMM_ASHR, IMM_LSHR, IMM_AND, IMM_OR, IMM_XOR, IMM_OEQ, IMM_ONE, IMM_OLE, IMM_OGE, IMM_OLT, IMM_OGT, IMM_ORD, IMM_FPTOUI, IMM_FPTOSI, IMM_SITOFP, IMM_UITOFP, IMM_HFTOUS, IMM_HFTOSS, IMM_SSTOHF, IMM_USTOHF, IMM_EXTRACT, IMM_SEXT, IMM_ZEXT, IMM_FPEXT } ImmOpCode; typedef enum { IMM_TYPE_BOOL = TYPE_BOOL, IMM_TYPE_S8 = TYPE_S8, IMM_TYPE_U8 = TYPE_U8, IMM_TYPE_S16 = TYPE_S16, IMM_TYPE_U16 = TYPE_U16, IMM_TYPE_S32 = TYPE_S32, IMM_TYPE_U32 = TYPE_U32, IMM_TYPE_S64 = TYPE_S64, IMM_TYPE_U64 = TYPE_U64, IMM_TYPE_FLOAT = TYPE_FLOAT, IMM_TYPE_HALF = TYPE_HALF, IMM_TYPE_DOUBLE = TYPE_DOUBLE, IMM_TYPE_COMP // compond immediate which consist many immediates. } ImmType; /*! 
The value as stored in the instruction */
  class Immediate {
  public:
    INLINE Immediate(void) { }
    Immediate & operator= (const Immediate &);
    INLINE Type getType(void) const { return (Type)type; }
    INLINE bool isCompType(void) const { return type == IMM_TYPE_COMP; }
    INLINE uint32_t getElemNum(void) const { return elemNum; }
    uint32_t getTypeSize(void) const {
      switch (type) {
        default: GBE_ASSERT(0 && "Invalid immediate type.\n");
        case TYPE_BOOL:
        case TYPE_S8:
        case TYPE_U8:       return 1;
        case TYPE_S16:
        case TYPE_HALF:
        case TYPE_U16:      return 2;
        case TYPE_FLOAT:
        case TYPE_S32:
        case TYPE_U32:      return 4;
        case TYPE_DOUBLE:
        case TYPE_S64:
        case TYPE_U64:      return 8;
        case IMM_TYPE_COMP: return sizeof(Immediate*);
      }
    }

#define DECL_CONSTRUCTOR(TYPE, FIELD, IR_TYPE) \
    Immediate(TYPE FIELD) { \
      this->type = (ImmType)IR_TYPE; \
      this->elemNum = 1; \
      this->data.p = &defaultData; \
      defaultData = 0ull; \
      *this->data.FIELD = FIELD; \
    }

    DECL_CONSTRUCTOR(bool, b, TYPE_BOOL)
    DECL_CONSTRUCTOR(int8_t, s8, TYPE_S8)
    DECL_CONSTRUCTOR(uint8_t, u8, TYPE_U8)
    DECL_CONSTRUCTOR(int16_t, s16, TYPE_S16)
    DECL_CONSTRUCTOR(uint16_t, u16, TYPE_U16)
    DECL_CONSTRUCTOR(int32_t, s32, TYPE_S32)
    DECL_CONSTRUCTOR(uint32_t, u32, TYPE_U32)
    DECL_CONSTRUCTOR(int64_t, s64, TYPE_S64)
    DECL_CONSTRUCTOR(uint64_t, u64, TYPE_U64)
    DECL_CONSTRUCTOR(float, f32, TYPE_FLOAT)
    DECL_CONSTRUCTOR(half, f16, TYPE_HALF)
    DECL_CONSTRUCTOR(double, f64, TYPE_DOUBLE)
#undef DECL_CONSTRUCTOR

#define DECL_CONSTRUCTOR(TYPE, FIELD, IR_TYPE, ELEMNUM) \
    Immediate(TYPE *FIELD, uint32_t ELEMNUM) { \
      this->type = (ImmType)IR_TYPE; \
      this->elemNum = ELEMNUM; \
      /* malloc only when the payload exceeds the inline 8-byte buffer */ \
      if (ELEMNUM * getTypeSize() > 8) \
        this->data.p = malloc(ELEMNUM * getTypeSize()); \
      else \
        this->data.p = &defaultData; \
      defaultData = 0ull; \
      memcpy(this->data.FIELD, FIELD, ELEMNUM * getTypeSize()); \
    }

    DECL_CONSTRUCTOR(bool, b, TYPE_BOOL, elemNum)
    DECL_CONSTRUCTOR(int8_t, s8, TYPE_S8, elemNum)
    DECL_CONSTRUCTOR(uint8_t, u8, TYPE_U8, elemNum)
    DECL_CONSTRUCTOR(int16_t, s16, TYPE_S16, elemNum)
    DECL_CONSTRUCTOR(uint16_t, u16, TYPE_U16, elemNum)
    DECL_CONSTRUCTOR(int32_t, s32, TYPE_S32, elemNum)
    DECL_CONSTRUCTOR(uint32_t, u32, TYPE_U32, elemNum)
    DECL_CONSTRUCTOR(int64_t, s64, TYPE_S64, elemNum)
    DECL_CONSTRUCTOR(uint64_t, u64, TYPE_U64, elemNum)
    DECL_CONSTRUCTOR(float, f32, TYPE_FLOAT, elemNum)
    DECL_CONSTRUCTOR(half, f16, TYPE_HALF, elemNum)
    DECL_CONSTRUCTOR(double, f64, TYPE_DOUBLE, elemNum)
#undef DECL_CONSTRUCTOR

    Immediate(const vector<const Immediate*> immVec, Type dstType);

    INLINE int64_t getIntegerValue(void) const {
      switch (type) {
        default: GBE_ASSERT(0 && "Invalid immediate type.\n");
        case TYPE_BOOL: return *data.b;
        case TYPE_S8:   return *data.s8;
        case TYPE_U8:   return *data.u8;
        case TYPE_S16:  return *data.s16;
        case TYPE_U16:  return *data.u16;
        case TYPE_S32:  return *data.s32;
        case TYPE_U32:  return *data.u32;
        case TYPE_S64:  return *data.s64;
        case TYPE_U64:  return *data.u64;
      }
    }
    INLINE uint64_t getUnsignedIntegerValue(void) const {
      switch (type) {
        default: GBE_ASSERT(0 && "Invalid immediate type.\n");
        case TYPE_BOOL: return *data.b;
        case TYPE_S8:   return *data.s8;
        case TYPE_U8:   return *data.u8;
        case TYPE_S16:  return *data.s16;
        case TYPE_U16:  return *data.u16;
        case TYPE_S32:  return *data.s32;
        case TYPE_U32:  return *data.u32;
        case TYPE_S64:  return *data.s64;
        case TYPE_U64:  return *data.u64;
      }
    }
    INLINE float getFloatValue(void) const {
      // we allow bitcast from u32/s32 immediate to float
      GBE_ASSERT(type == IMM_TYPE_FLOAT || type == IMM_TYPE_U32 || type == IMM_TYPE_S32);
      return *data.f32;
    }
    INLINE float asFloatValue(void) const {
      GBE_ASSERT(type == 
IMM_TYPE_FLOAT || type == IMM_TYPE_U32 || type == IMM_TYPE_S32); return *data.f32; } INLINE half getHalfValue(void) const { GBE_ASSERT(type == IMM_TYPE_HALF); return *data.f16; } INLINE half asHalfValue(void) const { // we allow bitcast from u32/s32 immediate to float GBE_ASSERT(type == IMM_TYPE_HALF || type == IMM_TYPE_U16 || type == IMM_TYPE_S16); return *data.f16; } INLINE int64_t asIntegerValue(void) const { GBE_ASSERT(elemNum == 1); return *data.s64; } INLINE double getDoubleValue(void) const { GBE_ASSERT(type == IMM_TYPE_DOUBLE); return *data.f64; } INLINE Immediate(const Immediate & other) { *this = other; } Immediate(ImmOpCode op, const Immediate &other, Type dstType) { switch (op) { default: GBE_ASSERT(0); case IMM_TRUNC: copy(other, 0, 1); break; case IMM_BITCAST: if (other.type != IMM_TYPE_COMP) { *this = other; type = (ImmType)dstType; } else { vector immVec; for(uint32_t i = 0; i < other.getElemNum(); i++) immVec.push_back(other.data.immVec[i]); *this = Immediate(immVec, dstType); } break; case IMM_FPTOUI: *this = Immediate((uint32_t)*other.data.f32); break; case IMM_FPTOSI: *this = Immediate((int32_t)*other.data.f32); break; case IMM_UITOFP: *this = Immediate((float)*other.data.u32); break; case IMM_SITOFP: *this = Immediate((float)*other.data.s32); break; case IMM_HFTOUS: *this = Immediate((uint16_t)*other.data.f16); break; case IMM_HFTOSS: *this = Immediate((int16_t)*other.data.f16); break; case IMM_USTOHF: *this = Immediate(half::convToHalf(*other.data.u16)); break; case IMM_SSTOHF: *this = Immediate(half::convToHalf(*other.data.s16)); break; case IMM_SEXT: { int64_t value = other.getIntegerValue(); if (other.getType() == TYPE_BOOL) value = -value; switch (dstType) { default: GBE_ASSERT(0 && "Illegal sext constant expression"); case TYPE_S8: *this = Immediate((int8_t)value); break; case TYPE_S16: *this = Immediate((int16_t)value); break; case TYPE_S32: *this = Immediate((int32_t)value); break; case TYPE_S64: *this = Immediate((int64_t)value); break; } } case IMM_ZEXT: { uint64_t value = other.getUnsignedIntegerValue(); switch (dstType) { default: GBE_ASSERT(0 && "Illegal sext constant expression"); case TYPE_U8: *this = Immediate((uint8_t)value); break; case TYPE_U16: *this = Immediate((uint16_t)value); break; case TYPE_U32: *this = Immediate((uint32_t)value); break; case TYPE_U64: *this = Immediate((uint64_t)value); break; } break; } case IMM_FPEXT: { if (other.getType() == TYPE_FLOAT) { GBE_ASSERT(dstType == TYPE_DOUBLE); double value = other.getFloatValue(); *this = Immediate(value); } else if (other.getType() == TYPE_HALF) { GBE_ASSERT(dstType == TYPE_DOUBLE || dstType == TYPE_FLOAT); if (dstType == TYPE_FLOAT) { float value = other.getHalfValue(); *this = Immediate(value); } else { double value = other.getHalfValue(); *this = Immediate(value); } } break; } } } Immediate(ImmOpCode op, const Immediate &left, const Immediate &right, Type dstType); ~Immediate() { if (data.p != &defaultData) { free(data.p); data.p = NULL; } } private: ImmType type; //!< Type of the value uint32_t elemNum; //!< vector imm data type uint64_t defaultData; union { bool *b; int8_t *s8; uint8_t *u8; int16_t *s16; uint16_t *u16; int32_t *s32; uint32_t *u32; int64_t *s64; uint64_t *u64; float *f32; double *f64; half *f16; const Immediate **immVec; void *p; } data; //!< Value to store Immediate operator+ (const Immediate &) const; Immediate operator- (const Immediate &) const; Immediate operator* (const Immediate &) const; Immediate operator/ (const Immediate &) const; Immediate operator> (const 
Immediate &) const; Immediate operator== (const Immediate &) const; Immediate operator!= (const Immediate &) const; Immediate operator>= (const Immediate &) const; Immediate operator<= (const Immediate &) const; Immediate operator&& (const Immediate &) const; Immediate operator% (const Immediate &) const; Immediate operator& (const Immediate &) const; Immediate operator| (const Immediate &) const; Immediate operator^ (const Immediate &) const; Immediate operator<< (const Immediate &) const; Immediate operator>> (const Immediate &) const; static Immediate lshr (const Immediate &left, const Immediate &right); static Immediate less (const Immediate &left, const Immediate &right); static Immediate extract (const Immediate &left, const Immediate &right, Type dstType); void copy(const Immediate &other, int32_t offset, uint32_t num); GBE_CLASS(Immediate); }; /*! Compare two immediates */ INLINE bool operator< (const Immediate &imm0, const Immediate &imm1) { if (imm0.getType() != imm1.getType()) return uint32_t(imm0.getType()) < uint32_t(imm1.getType()); else if (imm0.getType() == TYPE_FLOAT || imm0.getType() == TYPE_DOUBLE || imm0.getType() == TYPE_HALF) return imm0.asIntegerValue() < imm1.asIntegerValue(); else return imm0.getIntegerValue() < imm1.getIntegerValue(); GBE_ASSERT(0); } /*! A value is stored in a per-function vector. This is the index to it */ TYPE_SAFE(ImmediateIndex, uint32_t) } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_IMMEDIATE_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/lowering.cpp000664 001750 001750 00000047036 13173554000 020612 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file lowering.cpp * \author Benjamin Segovia */ #include "ir/context.hpp" #include "ir/value.hpp" #include "ir/liveness.hpp" #include "sys/set.hpp" namespace gbe { namespace ir { /*! Small helper class to lower return instructions */ class ContextReturn : public Context { public: /*! Initialize a context dedicated to return instruction lowering */ ContextReturn(Unit &unit) : Context(unit) { this->usedLabels = GBE_NEW_NO_ARG(vector); } /*! Lower the return instruction to gotos for the given function */ void lower(const std::string &functionName); }; void ContextReturn::lower(const std::string &functionName) { if ((this->fn = unit.getFunction(functionName)) == NULL) return; // Append a new block at the end of the function with a return instruction: // the only one we are going to have this->bb = &this->fn->getBottomBlock(); const LabelIndex index = this->label(); this->LABEL(index); const BasicBlock *lastBlock = this->bb; /* Append the STORE_PROFILING just before return. 
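       This keeps the profiling epilogue unique: every OP_RET below is
       rewritten into a branch to this final block, so the profiling store
       and the RET execute exactly once per thread, at the single exit of
       the kernel.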
*/
      if (unit.getInProfilingMode() == true) {
        this->STORE_PROFILING(this->getUnit().getProfilingInfo()->getBTI(),
                              this->getUnit().getProfilingInfo()->getProfilingType());
      }
      this->RET();

      // Now traverse all instructions and replace all returns by GOTO index
      fn->foreachInstruction([&](Instruction &insn) {
        if (insn.getParent() == lastBlock) return; // This is the last block
        if (insn.getOpcode() != OP_RET) return;
        const Instruction bra = ir::BRA(index);
        bra.replace(&insn);
      });
    }

    void lowerReturn(Unit &unit, const std::string &functionName) {
      ContextReturn ctx(unit);
      ctx.lower(functionName);
    }

    /*! Characterizes how the argument is used (directly read, indirectly read,
     *  written)
     */
    enum ArgUse {
      ARG_DIRECT_READ   = 0,
      ARG_INDIRECT_READ = 1,
      ARG_WRITTEN       = 2
    };

    /*! Just to book-keep the sequence of instructions that directly load an
     *  input argument
     */
    struct LoadAddImm {
      Instruction *load;    //!< Load from the argument
      Instruction *add;     //!< Can be NULL if we only have load(arg)
      Instruction *loadImm; //!< Can also be NULL
      uint64_t offset;      //!< Offset where to load in the structure
      uint32_t argID;       //!< Associated function argument
    };

    struct IndirectLoad {
      Instruction *load;          //!< Load from the argument
      vector<Instruction *> adds; //!< Can be empty if we only have load(arg)
      uint32_t argID;             //!< Associated function argument
    };

    /*! List of direct loads */
    typedef vector<LoadAddImm> LoadAddImmSeq;
    typedef vector<IndirectLoad> IndirectLoadSeq;

    /*! Helper class to lower function arguments if required */
    class FunctionArgumentLowerer : public Context {
    public:
      /*! Build the helper structure */
      FunctionArgumentLowerer(Unit &unit);
      /*! Free everything we needed */
      virtual ~FunctionArgumentLowerer(void);
      /*! Perform all function arguments substitution if needed */
      void lower(const std::string &name);
      /*! Lower the given function argument accesses */
      ArgUse lower(uint32_t argID);
      /*! Build the constant push for the function */
      void buildConstantPush(void);
      /* Lower indirect reads to indirect MOVs */
      void lowerIndirectRead(uint32_t argID);
      /* Convert each collected IndirectLoad into indirect MOVs */
      void ReplaceIndirectLoad(void);
      /*! Inspect the given function argument to see how it is used. If it is
       *  direct loads only, we also output the list of instructions used for
       *  each load
       */
      ArgUse getArgUse(uint32_t argID);
      /*! Recursively look if there is a store in the given use */
      bool useStore(const ValueDef &def, set<const Instruction*> &visited);
      /*! Look if the pointer is used only by loads with immediate offsets */
      bool matchLoadAddImm(uint32_t argID);
      Liveness *liveness;          //!< To compute the function graph
      FunctionDAG *dag;            //!< Contains complete dependency information
      LoadAddImmSeq seq;           //!< All the direct loads
      IndirectLoadSeq indirectSeq; //!< All the indirect loads
    };

    INLINE uint64_t getOffsetFromImm(const Immediate &imm) {
      switch (imm.getType()) {
        // bit-cast these ones
        case TYPE_DOUBLE:
        case TYPE_FLOAT:
          NOT_SUPPORTED;
          return 0;
        case TYPE_S64: case TYPE_U64: case TYPE_U32: case TYPE_U16: case TYPE_U8:
        // sign extend these ones
        case TYPE_S32: case TYPE_S16: case TYPE_S8:
          return imm.getIntegerValue();
        case TYPE_BOOL: case TYPE_HALF:
          NOT_SUPPORTED;
          return 0;
        default:
          GBE_ASSERT(0 && "Unsupported imm type.\n");
      }
      return 0;
    }

    bool matchLoad(Instruction *insn,
                   Instruction *add,
                   Instruction *loadImm,
                   uint64_t offset,
                   uint32_t argID,
                   LoadAddImm &loadAddImm)
    {
      const Opcode opcode = insn->getOpcode();
      if (opcode == OP_LOAD) {
        LoadInstruction *load = cast<LoadInstruction>(insn);
        if (!load) return false;
        if (load->getAddressSpace() != MEM_PRIVATE) return false;
        loadAddImm.load = insn;
        loadAddImm.add = add;
        loadAddImm.loadImm = loadImm;
        loadAddImm.offset = offset;
        loadAddImm.argID = argID;
        return true;
      } else
        return false;
    }

    FunctionArgumentLowerer::FunctionArgumentLowerer(Unit &unit) :
      Context(unit), liveness(NULL), dag(NULL) {}
    FunctionArgumentLowerer::~FunctionArgumentLowerer(void) {
      GBE_SAFE_DELETE(dag);
      GBE_SAFE_DELETE(liveness);
    }

    void FunctionArgumentLowerer::lower(const std::string &functionName) {
      if ((this->fn = unit.getFunction(functionName)) == NULL) return;
      GBE_SAFE_DELETE(dag);
      GBE_SAFE_DELETE(liveness);
      this->liveness = GBE_NEW(ir::Liveness, *fn);
      this->dag = GBE_NEW(ir::FunctionDAG, *this->liveness);
      bool needRefreshDag = false;
      // Process all structure arguments and find all the direct loads we can
      // replace
      const uint32_t argNum = fn->argNum();
      vector<uint32_t> indirectReadArgs;
      for (uint32_t argID = 0; argID < argNum; ++argID) {
        FunctionArgument &arg = fn->getArg(argID);
        if (arg.type != FunctionArgument::STRUCTURE) continue;
        if (this->lower(argID) == ARG_INDIRECT_READ) {
          indirectReadArgs.push_back(argID);
          // even when the result is ARG_INDIRECT_READ, it is still possible
          // that some IRs read the argument directly; those are handled in
          // buildConstantPush(), so the dag must be refreshed after it runs
          for (const auto &loadAddImm : seq) {
            if (loadAddImm.argID == argID)
              needRefreshDag = true;
          }
        }
      }
      // Build the constant push description and remove the instructions that
      // therefore become useless
      this->buildConstantPush();
      if (needRefreshDag) {
        GBE_SAFE_DELETE(dag);
        GBE_SAFE_DELETE(liveness);
        this->liveness = GBE_NEW(ir::Liveness, *fn);
        this->dag = GBE_NEW(ir::FunctionDAG, *this->liveness);
      }
      for (uint32_t i = 0; i < indirectReadArgs.size(); ++i) {
        lowerIndirectRead(indirectReadArgs[i]);
      }
      ReplaceIndirectLoad();
    }

// Remove all the given instructions from the stream (if dead)
#define REMOVE_INSN(WHICH) \
    for (const auto &loadAddImm : seq) { \
      Instruction *WHICH = loadAddImm.WHICH; \
      if (WHICH == NULL) continue; \
      const UseSet &useSet = dag->getUse(WHICH, 0); \
      bool isDead = true; \
      for (auto use : useSet) { \
        if (dead.contains(use->getInstruction()) == false) { \
          isDead = false; \
          break; \
        } \
      } \
      if (isDead && !dead.contains(WHICH)) { \
        dead.insert(WHICH); \
        WHICH->remove(); \
      } \
    }

    void FunctionArgumentLowerer::buildConstantPush(void)
    {
      if (seq.size() == 0) return;
      // Track instructions we remove to recursively kill them properly
      set<const Instruction*> dead;
      // The argument locations we already pushed (since the same argument
      // location can be used several times)
      set<PushLocation> inserted;
      for (const auto &loadAddImm : seq) {
        LoadInstruction *load = cast<LoadInstruction>(loadAddImm.load);
        if (!load) continue;
        const uint32_t valueNum = load->getValueNum();
        bool replaced = false;
        Instruction *ins_after = load; // the instruction to insert after.
        for (uint32_t valueID = 0; valueID < valueNum; ++valueID) {
          const Type type = load->getValueType();
          const RegisterFamily family = getFamily(type);
          const uint32_t size = getFamilySize(family);
          const uint32_t offset = loadAddImm.offset + valueID * size;
          const PushLocation argLocation(*fn, loadAddImm.argID, offset);
          Register pushed;
          const Register reg = load->getValue(valueID);
          if (offset != 0) {
            if (inserted.contains(argLocation)) {
              pushed = argLocation.getRegister();
            } else {
              // pushed register should be a uniform register.
              pushed = fn->newRegister(family, true);
              this->appendPushedConstant(pushed, argLocation);
              inserted.insert(argLocation);
            }
          } else {
            pushed = fn->getArg(loadAddImm.argID).reg;
          }
          // TODO the MOV instruction can most of the time be avoided if the
          // register is never written. We must however support register
          // replacement in the instruction interface to be able to patch all
          // the instructions that use "reg"
          Instruction mov = ir::MOV(type, reg, pushed);
          mov.insert(ins_after, &ins_after);
          replaced = true;
        }
        if (replaced) {
          dead.insert(load);
          load->remove();
        }
      }
      REMOVE_INSN(add)
      REMOVE_INSN(loadImm)
    }
#undef REMOVE_INSN

    void FunctionArgumentLowerer::lowerIndirectRead(uint32_t argID)
    {
      FunctionArgument &arg = fn->getArg(argID);
      vector<Register> derivedRegs;
      map<Register, vector<Instruction *>> addPtrInsns;
      derivedRegs.push_back(arg.reg);

      // Collect all loads from this argument.
      for (uint32_t i = 0; i < derivedRegs.size(); i++) {
        const UseSet *useSet = dag->getRegUse(derivedRegs[i]);
        for (const auto &use : *useSet) {
          Instruction *insn = const_cast<Instruction *>(use->getInstruction());
          const Opcode opcode = insn->getOpcode();
          const uint32_t dstNum = insn->getDstNum();
          (void) dstNum;
          GBE_ASSERT(dstNum == 1 || opcode == OP_LOAD);
          const Register dst = insn->getDst();
          auto it = addPtrInsns.find(derivedRegs[i]);
          if ((opcode == OP_ADD) && (derivedRegs[i] == arg.reg)) {
            GBE_ASSERT(it == addPtrInsns.end());
            vector<Instruction *> addInsns;
            addInsns.push_back(insn);
            addPtrInsns.insert(std::make_pair(dst, addInsns));
            derivedRegs.push_back(dst);
          } else if (opcode == OP_LOAD) {
            LoadInstruction *load = cast<LoadInstruction>(insn);
            if (!load) continue;
            if (load->getAddressSpace() != MEM_PRIVATE) continue;
            IndirectLoad indirectLoad;
            Register addr = load->getAddressRegister();
            indirectLoad.argID = argID;
            indirectLoad.load = insn;
            auto addrIt = addPtrInsns.find(addr);
            GBE_ASSERT(addrIt != addPtrInsns.end());
            indirectLoad.adds = addrIt->second;
            indirectSeq.push_back(indirectLoad);
          } else {
            if (it == addPtrInsns.end()) continue; // use arg as phi or selection, no add, skip it.
            auto dstIt = addPtrInsns.find(dst);
            if (dstIt == addPtrInsns.end())
              addPtrInsns.insert(std::make_pair(dst, it->second));
            else {
              // multiple sources come from the argument (e.g. a select or a
              // phi): merge the vectors
              dstIt->second.insert(dstIt->second.end(), it->second.begin(), it->second.end());
            }
            derivedRegs.push_back(dst);
          }
        }
      }
    }

    void FunctionArgumentLowerer::ReplaceIndirectLoad(void)
    {
      if (indirectSeq.size() == 0) return;
      // Track instructions we remove to recursively kill them properly
      set<const Instruction*> dead;
      set<PushLocation> inserted;
      for (const auto &indirectLoad : indirectSeq) {
        const Register arg = fn->getArg(indirectLoad.argID).reg;
        if (dead.contains(indirectLoad.load)) continue; // repetitive load in the indirectSeq, skip.
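        // A sketch of the rewrite performed below (register names are
        // illustrative, not taken from any real kernel):
        //   %addr = ADD %arg, %idx           ; pointer into the struct argument
        //   %v    = LOAD.private {%addr}
        // becomes, on a 64-bit pointer target,
        //   %tmp  = CVT.u32.u64 %addr
        //   %v    = INDIRECT_MOV %arg, %tmp, +elemOffset
        // while each ADD is demoted to '%addr = MOV %idx' and the dead LOAD
        // is removed.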
LoadInstruction *load = cast(indirectLoad.load); const uint32_t valueNum = load ? load->getValueNum() : 0; bool replaced = false; Instruction *ins_after = load; // the instruction to insert after. for (uint32_t valueID = 0; valueID < valueNum; ++valueID) { const Type type = load->getValueType(); const RegisterFamily family = getFamily(type); const uint32_t size = getFamilySize(family); const uint32_t offset = valueID * size; const Register reg = load->getValue(valueID); Register addressReg = load->getAddressRegister(); if (fn->getPointerFamily() == FAMILY_QWORD) { Register tmp = fn->newRegister(FAMILY_DWORD); Instruction cvt = ir::CVT(ir::TYPE_U32, ir::TYPE_U64, tmp, load->getAddressRegister()); cvt.insert(ins_after, &ins_after); addressReg = tmp; } Instruction mov = ir::INDIRECT_MOV(type, reg, arg, addressReg, offset); mov.insert(ins_after, &ins_after); replaced = true; } if (replaced && !dead.contains(load)) { dead.insert(load); load->remove(); } vector adds = indirectLoad.adds; for (uint32_t i=0; i(adds[i]); if (add && !dead.contains(add)) { Register dst = add->getDst(); const Register src0 = add->getSrc(0); const Register src1 = add->getSrc(1); GBE_ASSERT(src0 == arg || src1 == arg); Register src = (src0 == arg) ? src1 : src0; Instruction mov = ir::MOV(add->getType(), dst, src); //MOV instruction could optimize if the dst don't write later mov.replace(add); dead.insert(add); } } } } bool FunctionArgumentLowerer::useStore(const ValueDef &def, set &visited) { const UseSet &useSet = dag->getUse(def); for (const auto &use : useSet) { const Instruction *insn = use->getInstruction(); const uint32_t srcID = use->getSrcID(); const Opcode opcode = insn->getOpcode(); if (visited.contains(insn)) continue; visited.insert(insn); if (opcode == OP_STORE && srcID == StoreInstruction::addressIndex) return true; if (insn->isMemberOf() == false && insn->isMemberOf() == false) continue; else { const uint32_t dstNum = insn->getDstNum(); for (uint32_t dstID = 0; dstID < dstNum; ++dstID) if (this->useStore(ValueDef(insn, dstID), visited) == true) return true; } } return false; } bool FunctionArgumentLowerer::matchLoadAddImm(uint32_t argID) { const FunctionArgument &arg = fn->getArg(argID); LoadAddImmSeq tmpSeq; bool match = true; // Inspect all uses of the function argument pointer const UseSet &useSet = dag->getUse(&arg); for (auto use : useSet) { Instruction *insn = const_cast(use->getInstruction()); const Opcode opcode = insn->getOpcode(); // load dst arg LoadAddImm loadAddImm; if (matchLoad(insn, NULL, NULL, 0, argID, loadAddImm)) { tmpSeq.push_back(loadAddImm); continue; } // add.ptr_type dst ptr other if (opcode != OP_ADD) return false; BinaryInstruction *add = cast(insn); if(!add) return false; const Type addType = add->getType(); const RegisterFamily family = getFamily(addType); if (family != unit.getPointerFamily()) return false; if (addType == TYPE_FLOAT) return false; // step 1 -> check that the other source comes from a load immediate const uint32_t srcID = use->getSrcID(); const uint32_t otherID = srcID ^ 1; const DefSet &defSet = dag->getDef(insn, otherID); const uint32_t defNum = defSet.size(); if (defNum == 0 || defNum > 1) continue; // undefined or more than one def const ValueDef *otherDef = *defSet.begin(); if (otherDef->getType() != ValueDef::DEF_INSN_DST) return false; Instruction *otherInsn = const_cast(otherDef->getInstruction()); if (otherInsn->getOpcode() != OP_LOADI) return false; LoadImmInstruction *loadImm = cast(otherInsn); if(!loadImm) return false; const Immediate imm = 
loadImm->getImmediate(); const uint64_t offset = getOffsetFromImm(imm); // step 2 -> check that the results of the add are loads from private // memory const UseSet &addUseSet = dag->getUse(add, 0); for (auto addUse : addUseSet) { Instruction *insn = const_cast(addUse->getInstruction()); // We finally find something like load dst arg+imm LoadAddImm loadAddImm; if (matchLoad(insn, add, loadImm, offset, argID, loadAddImm)) { tmpSeq.push_back(loadAddImm); continue; } else match = false; } } // OK, the argument only need direct loads. We can now append all the // direct load definitions we found for (const auto &loadImmSeq : tmpSeq) seq.push_back(loadImmSeq); return match; } ArgUse FunctionArgumentLowerer::getArgUse(uint32_t argID) { FunctionArgument &arg = fn->getArg(argID); // case 1 - we may store something to the structure argument set visited; if (this->useStore(ValueDef(&arg), visited)) return ARG_WRITTEN; // case 2 - we look for the patterns: LOAD(ptr) or LOAD(ptr+imm) if (this->matchLoadAddImm(argID)) return ARG_DIRECT_READ; // case 3 - LOAD(ptr+runtime_value) return ARG_INDIRECT_READ; } ArgUse FunctionArgumentLowerer::lower(uint32_t argID) { const ArgUse argUse = this->getArgUse(argID); #if GBE_DEBUG GBE_ASSERTM(argUse != ARG_WRITTEN, "TODO A store to a structure argument " "(i.e. not a char/short/int/float argument) has been found. " "This is not supported yet"); //GBE_ASSERTM(argUse != ARG_INDIRECT_READ, // "TODO Only direct loads of structure arguments are " // "supported now"); #endif /* GBE_DEBUG */ return argUse; } void lowerFunctionArguments(Unit &unit, const std::string &functionName) { FunctionArgumentLowerer lowerer(unit); lowerer.lower(functionName); } } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/context.hpp000664 001750 001750 00000022617 13161142102 020444 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file context.hpp * \author Benjamin Segovia */ #ifndef __GBE_IR_CONTEXT_HPP__ #define __GBE_IR_CONTEXT_HPP__ #include "ir/instruction.hpp" #include "ir/function.hpp" #include "ir/register.hpp" #include "ir/immediate.hpp" #include "ir/unit.hpp" #include "sys/vector.hpp" #include namespace gbe { namespace ir { /*! A context allows an easy creation of the functions (instruction stream and * the set of immediates and registers needed for it) and constant arrays */ class Context { public: /*! Create a new context for this unit */ Context(Unit &unit); /*! Free resources needed by context */ virtual ~Context(void); /*! Create a new function "name" */ void startFunction(const std::string &name); /*! Close the function */ void endFunction(void); /*! Get the current processed unit */ INLINE Unit &getUnit(void) { return unit; } /*! Get the current processed function */ Function &getFunction(void); /*! 
Get the current processed block */ BasicBlock *getBlock(void) { return bb; } /*! Set the SIMD width of the function */ void setSimdWidth(uint32_t width) const { GBE_ASSERT(width == 8 || width == 16); fn->simdWidth = width; } /*! Append a new pushed constant */ void appendPushedConstant(Register reg, const PushLocation &pushed); /*! Create a new register with the given family for the current function */ Register reg(RegisterFamily family, bool uniform = false, gbe_curbe_type curbeType = GBE_GEN_REG, int subType = 0); /*! Create a new immediate value */ template INLINE ImmediateIndex newImmediate(T value) { const Immediate imm(value); return fn->newImmediate(imm); } template INLINE ImmediateIndex newImmediate(T value, uint32_t num) { const Immediate imm(value, num); return fn->newImmediate(imm); } /*! Create a new immediate value */ INLINE ImmediateIndex newImmediate(vectorindexVector, Type dstType) { vector immVector; for( uint32_t i = 0; i < indexVector.size(); i++) immVector.push_back(&fn->getImmediate(indexVector[i])); const Immediate imm(immVector, dstType); return fn->newImmediate(imm); } /*! Create an integer immediate value */ INLINE ImmediateIndex newIntegerImmediate(int64_t x, Type type) { switch (type) { case TYPE_S8: return this->newImmediate(int8_t(x)); case TYPE_U8: return this->newImmediate(uint8_t(x)); case TYPE_S16: return this->newImmediate(int16_t(x)); case TYPE_U16: return this->newImmediate(uint16_t(x)); case TYPE_S32: return this->newImmediate(int32_t(x)); case TYPE_U32: return this->newImmediate(uint32_t(x)); case TYPE_S64: return this->newImmediate(int64_t(x)); case TYPE_U64: return this->newImmediate(uint64_t(x)); default: NOT_SUPPORTED; return ImmediateIndex(0); } return ImmediateIndex(0); } INLINE ImmediateIndex newFloatImmediate(float x) { return this->newImmediate(x); } INLINE ImmediateIndex newDoubleImmediate(double x) { return this->newImmediate(x); } INLINE ImmediateIndex processImm(ImmOpCode op, ImmediateIndex src, Type type) { const Immediate &imm = fn->getImmediate(src); const Immediate &dstImm = Immediate(op, imm, type); return fn->newImmediate(dstImm); } INLINE ImmediateIndex processImm(ImmOpCode op, ImmediateIndex src0, ImmediateIndex src1, Type type) { const Immediate &imm0 = fn->getImmediate(src0); const Immediate &imm1 = fn->getImmediate(src1); const Immediate &dstImm = Immediate(op, imm0, imm1, type); return fn->newImmediate(dstImm); } /*! Create a new register holding the given value. A LOADI is pushed */ template INLINE Register immReg(T value) { GBE_ASSERTM(fn != NULL, "No function currently defined"); const Immediate imm(value); const ImmediateIndex index = fn->newImmediate(imm); const RegisterFamily family = getFamily(imm.getType()); const Register reg = this->reg(family); this->LOADI(imm.getType(), reg, index); return reg; } /*! Create a new label for the current function */ LabelIndex label(void); /*! Append a new input register for the function */ void input(const std::string &name, FunctionArgument::Type type, Register reg, FunctionArgument::InfoFromLLVM& info, uint32_t elemSz = 0u, uint32_t align = 0, uint8_t bti = 0); /*! Append a new output register for the function */ void output(Register reg); /*! Get the immediate value */ INLINE Immediate getImmediate(ImmediateIndex index) const { return fn->getImmediate(index); } /*! Append a new tuple */ template INLINE Tuple tuple(Args...args) { GBE_ASSERTM(fn != NULL, "No function currently defined"); return fn->file.appendTuple(args...); } /*! 
Make a tuple from an array of register */ INLINE Tuple arrayTuple(const Register *reg, uint32_t regNum) { GBE_ASSERTM(fn != NULL, "No function currently defined"); return fn->file.appendArrayTuple(reg, regNum); } /*! Make a tuple from an array of types */ INLINE Tuple arrayTypeTuple(const ir::Type *type, uint32_t num) { GBE_ASSERTM(fn != NULL, "No function currently defined"); return fn->file.appendArrayTypeTuple((uint8_t*)type, num); } /*! We just use variadic templates to forward instruction functions */ #define DECL_INSN(NAME, FAMILY) \ template INLINE void NAME(Args...args); #include "ir/instruction.hxx" #undef DECL_INSN /*! Return the pointer size handled by the unit */ INLINE PointerSize getPointerSize(void) const { return unit.getPointerSize(); } /*! Return the family of registers that contain pointer */ INLINE RegisterFamily getPointerFamily(void) const { return unit.getPointerFamily(); } #define DECL_THREE_SRC_INSN(NAME) \ INLINE void NAME(Type type, \ Register dst, \ Register src0, \ Register src1, \ Register src2) \ { \ const Tuple index = this->tuple(src0, src1, src2); \ this->NAME(type, dst, index); \ } DECL_THREE_SRC_INSN(SEL); DECL_THREE_SRC_INSN(I64MADSAT); DECL_THREE_SRC_INSN(MAD); DECL_THREE_SRC_INSN(LRP); #undef DECL_THREE_SRC_INSN /*! For all nullary functions */ void ALU0(Opcode opcode, Type type, Register dst) { const Instruction insn = gbe::ir::ALU0(opcode, type, dst); this->append(insn); } /*! For all unary functions */ void ALU1(Opcode opcode, Type type, Register dst, Register src) { const Instruction insn = gbe::ir::ALU1(opcode, type, dst, src); this->append(insn); } void appendSurface(uint8_t bti, Register reg) { fn->appendSurface(bti, reg); } void setDBGInfo(DebugInfo in) { DBGInfo = in; } protected: /*! A block must be started with a label */ void startBlock(void); /*! A block must be ended with a branch */ void endBlock(void); /*! Append the instruction in the current basic block */ void append(const Instruction &insn); Unit &unit; //!< A unit is associated to a contect Function *fn; //!< Current function we are processing BasicBlock *bb; //!< Current basic block we are filling static const uint8_t LABEL_IS_POINTED = 1 << 0; //!< Branch is using it static const uint8_t LABEL_IS_DEFINED = 1 << 1; //!< Label is defining it vector *usedLabels; /*! 
Functions can be defined recursiely */ struct StackElem { INLINE StackElem(Function *fn, BasicBlock *bb, vector *usedLabels) : fn(fn), bb(bb), usedLabels(usedLabels) {} Function *fn; //!< Function to process BasicBlock *bb; //!< Basic block currently processed vector *usedLabels; //!< Store all labels that are defined }; vector fnStack; //!< Stack of functions still to finish DebugInfo DBGInfo; GBE_CLASS(Context); }; // Use argument checker to assert argument value correctness #define DECL_INSN(NAME, FAMILY) \ template \ INLINE void Context::NAME(Args...args) { \ GBE_ASSERTM(fn != NULL, "No function currently defined"); \ const Instruction insn = gbe::ir::NAME(args...); \ this->append(insn); \ } #include "ir/instruction.hxx" #undef DECL_INSN } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_CONTEXT_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/liveness.cpp000664 001750 001750 00000026663 13161142102 020610 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file liveness.cpp * \author Benjamin Segovia */ #include "ir/liveness.hpp" #include namespace gbe { namespace ir { Liveness::Liveness(Function &fn, bool isInGenBackend) : fn(fn) { // Initialize UEVar and VarKill for each block fn.foreachBlock([this](const BasicBlock &bb) { this->initBlock(bb); // If the bb has ret instruction, add it to the work list set. const Instruction *lastInsn = bb.getLastInstruction(); const ir::Opcode op = lastInsn->getOpcode(); struct BlockInfo * info = liveness[&bb]; if (op == OP_RET) { workSet.insert(info); info->liveOut.insert(ocl::retVal); } }); // Now with iterative analysis, we compute liveout and livein sets while (unvisitBlocks.size()) { if (workSet.size() == 0) workSet.insert(--unvisitBlocks.end(), unvisitBlocks.end()); this->computeLiveInOut(); } // extend register (def in loop, use out-of-loop) liveness to the whole loop set extentRegs; // Only in Gen backend we need to take care of extra live out analysis. if (isInGenBackend) { this->computeExtraLiveInOut(extentRegs); // analyze uniform values. The extentRegs contains all the values which is // defined in a loop and use out-of-loop which could not be a uniform. The reason // is that when it reenter the second time, it may active different lanes. So // reenter many times may cause it has different values in different lanes. 
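      // An illustrative (hypothetical) case:
      //   for (i = 0; i < n; ++i) { v = f(i); if (cond(i)) break; }  ... use(v);
      // 'v' is written under a different execution mask on every iteration,
      // so lanes that left the loop early keep stale values; 'v' therefore
      // cannot be proven uniform even if f(i) is the same for all active lanes.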
this->analyzeUniform(&extentRegs); } } void Liveness::removeRegs(const set &removes) { for (auto &pair : liveness) { BlockInfo &info = *(pair.second); for (auto reg : removes) { if (info.liveOut.contains(reg)) info.liveOut.erase(reg); if (info.upwardUsed.contains(reg)) info.upwardUsed.erase(reg); } } } void Liveness::replaceRegs(const map &replaceMap) { for (auto &pair : liveness) { BlockInfo &info = *pair.second; BasicBlock *bb = const_cast(&info.bb); for (auto &pair : replaceMap) { Register from = pair.first; Register to = pair.second; if (info.liveOut.contains(from)) { info.liveOut.erase(from); info.liveOut.insert(to); // FIXME, a hack method to avoid the "to" register be treated as // uniform value. bb->definedPhiRegs.insert(to); } if (info.upwardUsed.contains(from)) { info.upwardUsed.erase(from); info.upwardUsed.insert(to); } if (info.varKill.contains(from)) { info.varKill.erase(from); info.varKill.insert(to); } if (bb->undefPhiRegs.contains(from)) { bb->undefPhiRegs.erase(from); bb->undefPhiRegs.insert(to); } } } } Liveness::~Liveness(void) { for (auto &pair : liveness) GBE_SAFE_DELETE(pair.second); } void Liveness::analyzeUniform(set *extentRegs) { fn.foreachBlock([this, extentRegs](const BasicBlock &bb) { const_cast(bb).foreach([this, extentRegs](const Instruction &insn) { const uint32_t srcNum = insn.getSrcNum(); const uint32_t dstNum = insn.getDstNum(); bool uniform = true; //do not change dst uniform for simd id if (insn.getOpcode() == ir::OP_SIMD_ID) uniform = false; // do not change dst uniform for block read if ((insn.getOpcode() == ir::OP_LOAD && ir::cast(insn).isBlock()) || insn.getOpcode() == ir::OP_MBREAD) uniform = false; for (uint32_t srcID = 0; srcID < srcNum; ++srcID) { const Register reg = insn.getSrc(srcID); if (!fn.isUniformRegister(reg)) uniform = false; } // A destination is a killed value for (uint32_t dstID = 0; dstID < dstNum; ++dstID) { const Register reg = insn.getDst(dstID); int opCode = insn.getOpcode(); // FIXME, ADDSAT and uniform vector should be supported. if (uniform && fn.getRegisterFamily(reg) != ir::FAMILY_QWORD && !insn.getParent()->definedPhiRegs.contains(reg) && opCode != ir::OP_ATOMIC && opCode != ir::OP_MUL_HI && opCode != ir::OP_HADD && opCode != ir::OP_RHADD && opCode != ir::OP_READ_ARF && opCode != ir::OP_ADDSAT && (dstNum == 1 || insn.getOpcode() != ir::OP_LOAD) && !extentRegs->contains(reg) ) fn.setRegisterUniform(reg, true); } }); }); } void Liveness::initBlock(const BasicBlock &bb) { GBE_ASSERT(liveness.contains(&bb) == false); BlockInfo *info = GBE_NEW(BlockInfo, bb); // Traverse all instructions to handle UEVar and VarKill const_cast(bb).foreach([this, info](const Instruction &insn) { this->initInstruction(*info, insn); }); liveness[&bb] = info; unvisitBlocks.insert(info); if(!bb.liveout.empty()) info->liveOut.insert(bb.liveout.begin(), bb.liveout.end()); } void Liveness::initInstruction(BlockInfo &info, const Instruction &insn) { const uint32_t srcNum = insn.getSrcNum(); const uint32_t dstNum = insn.getDstNum(); // First look for used before killed for (uint32_t srcID = 0; srcID < srcNum; ++srcID) { const Register reg = insn.getSrc(srcID); // Not killed -> it is really an upward use if (info.varKill.contains(reg) == false) info.upwardUsed.insert(reg); } // A destination is a killed value for (uint32_t dstID = 0; dstID < dstNum; ++dstID) { const Register reg = insn.getDst(dstID); info.varKill.insert(reg); } } // Use simple backward data flow analysis to solve the liveness problem. 
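  // Restated as the textbook dataflow equations (a sketch; the names match
  // the BlockInfo fields used in this file, with upwardUsed playing the
  // liveIn role):
  //   upwardUsed(B) = UEVar(B) U (liveOut(B) - varKill(B))
  //   liveOut(B)    = union of upwardUsed(S) over all successors S of B
  // The work list is seeded with the OP_RET blocks in the constructor and
  // iterates until no liveOut set grows anymore.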
  void Liveness::computeLiveInOut(void) {
    while (!workSet.empty()) {
      auto currInfo = *workSet.begin();
      workSet.erase(currInfo);
      if (unvisitBlocks.find(currInfo) != unvisitBlocks.end())
        unvisitBlocks.erase(currInfo);
      for (auto currOutVar : currInfo->liveOut)
        if (!currInfo->varKill.contains(currOutVar))
          currInfo->upwardUsed.insert(currOutVar);
      bool isChanged = false;
      for (auto prev : currInfo->bb.getPredecessorSet()) {
        BlockInfo *prevInfo = liveness[prev];
        if (unvisitBlocks.find(currInfo) != unvisitBlocks.end())
          unvisitBlocks.erase(currInfo);
        for (auto currInVar : currInfo->upwardUsed) {
          if (!prevInfo->bb.undefPhiRegs.contains(currInVar)) {
            auto changed = prevInfo->liveOut.insert(currInVar);
            if (changed.second) isChanged = true;
          }
        }
        if (isChanged)
          workSet.insert(prevInfo);
      }
    }
  }

  /*
    As we run in SIMD mode with a predication mask to indicate active lanes,
    if a vreg is defined in a loop and there are some uses of that vreg out
    of the loop, the definition point may run several times under *different*
    predication masks. For such vregs we must extend the vreg's liveness to
    the whole loop. If we do not, its liveness is killed before the
    definition point inside the loop. If the vreg's corresponding physical
    register is assigned to another vreg during the killed period, and the
    instructions before the kill point are re-executed with a different
    predication, the inactive lanes of the vreg may be overwritten. The
    out-of-loop use would then read wrong data.
  */
  void Liveness::computeExtraLiveInOut(set<Register> &extentRegs) {
    const vector<Loop *> &loops = fn.getLoops();
    extentRegs.clear();
    if (loops.size() == 0) return;

    for (auto l : loops) {
      const BasicBlock &preheader = fn.getBlock(l->preheader);
      BlockInfo *preheaderInfo = liveness[&preheader];
      for (auto x : l->exits) {
        const BasicBlock &a = fn.getBlock(x.first);
        const BasicBlock &b = fn.getBlock(x.second);
        BlockInfo *exiting = liveness[&a];
        BlockInfo *exit = liveness[&b];
        std::vector<Register> toExtend;
        std::vector<Register> toExtendCand;

        if (b.getPredecessorSet().size() <= 1) {
          // the exit has only one predecessor
          for (auto p : exit->upwardUsed)
            toExtendCand.push_back(p);
        } else {
          // the exit has more than one predecessor
          std::set_intersection(exiting->liveOut.begin(),
                                exiting->liveOut.end(),
                                exit->upwardUsed.begin(),
                                exit->upwardUsed.end(),
                                std::back_inserter(toExtendCand));
        }
        // toExtendCand may contain some virtual registers defined before the
        // loop, which need to be excluded, because what we need is registers
        // defined inside the loop. Such registers must be in the live-out of
        // the loop's preheader. So we do the subtraction here.
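        // In set terms (a sketch with made-up register numbers):
        //   toExtendCand = liveOut(exiting) & liveIn(exit)       e.g. {r3, r7, r9}
        //   toExtend     = toExtendCand - liveOut(preheader)     e.g. {r7, r9}
        // r3 drops out because it is live out of the preheader, i.e. it was
        // defined before the loop rather than inside it.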
std::set_difference(toExtendCand.begin(), toExtendCand.end(), preheaderInfo->liveOut.begin(), preheaderInfo->liveOut.end(), std::back_inserter(toExtend)); if (toExtend.size() == 0) continue; for(auto r : toExtend) extentRegs.insert(r); for (auto bb : l->bbs) { BlockInfo * bI = liveness[&fn.getBlock(bb)]; for(auto r : toExtend) { if(!bI->upwardUsed.contains(r)) bI->upwardUsed.insert(r); bI->liveOut.insert(r); } } } } } std::ostream &operator<< (std::ostream &out, const Liveness &live) { const Function &fn = live.getFunction(); fn.foreachBlock([&] (const BasicBlock &bb) { out << std::endl; out << "Label $" << bb.getLabelIndex() << std::endl; const Liveness::BlockInfo &bbInfo = live.getBlockInfo(&bb); out << "liveIn:" << std::endl; for (auto &x: bbInfo.upwardUsed) { out << x << " "; } out << std::endl << "liveOut:" << std::endl; for (auto &x : bbInfo.liveOut) out << x << " "; out << std::endl << "varKill:" << std::endl; for (auto &x : bbInfo.varKill) out << x << " "; out << std::endl; }); return out; } /*! To pretty print the livfeness info */ static const uint32_t prettyInsnStrSize = 48; static const uint32_t prettyRegStrSize = 5; /*! Describe how the register is used */ static const uint32_t USE_NONE = 0; static const uint32_t USE_READ = 1 << 0; static const uint32_t USE_WRITTEN = 1 << 1; enum UsePosition { POS_BEFORE = 0, POS_HERE = 1, POS_AFTER = 2 }; } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/profiling.cpp000664 001750 001750 00000006426 13173554000 020753 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file profiling.cpp * */ #include #include #include "ir/profiling.hpp" #include "src/cl_device_data.h" #include namespace gbe { namespace ir { pthread_mutex_t ProfilingInfo::lock = PTHREAD_MUTEX_INITIALIZER; void ProfilingInfo::outputProfilingInfo(void * logBuf) { LockOutput lock; uint32_t logNum = *reinterpret_cast(logBuf); printf("Total log number is %u\n", logNum); ProfilingReportItem* log = reinterpret_cast((char*)logBuf + 4); for (int i = 0; i < (int)logNum; i++) { GBE_ASSERT(log->simdType == ProfilingSimdType8 || log->simdType == ProfilingSimdType16); uint32_t simd = log->simdType == ProfilingSimdType16 ? 
16 : 8; printf(" ------------------------ Log %-6d -----------------------\n", i); printf(" | fix functions id:%4d simd: %4d kernel id: %4d |\n", log->fixedFunctionID, simd, log->kernelID); if (IS_IVYBRIDGE(deviceID)) { printf(" | thread id: %4d EU id:%4d half slice id:%2d |\n", log->genInfo.gen7.thread_id, log->genInfo.gen7.eu_id, log->genInfo.gen7.half_slice_id); } else if (IS_HASWELL(deviceID)) { printf(" | thread id: %4d EU id:%4d half slice id:%2d slice id%2d |\n", log->genInfo.gen7.thread_id, log->genInfo.gen7.eu_id, log->genInfo.gen7.half_slice_id, log->genInfo.gen7.slice_id); } else if (IS_BROADWELL(deviceID)) { printf(" | thread id: %4d EU id:%4d sub slice id:%2d slice id%2d |\n", log->genInfo.gen8.thread_id, log->genInfo.gen8.eu_id, log->genInfo.gen8.subslice_id, log->genInfo.gen8.slice_id); } uint64_t proLog = log->timestampPrologHi; proLog = ((proLog << 32) & 0xffffffff00000000) + log->timestampPrologLo; uint64_t epiLog = log->timestampEpilogHi; epiLog = ((epiLog << 32) & 0xffffffff00000000) + log->timestampEpilogLo; printf(" | dispatch Mask:%4x prolog:%10" PRIu64 " epilog:%10" PRIu64 " |\n", log->dispatchMask, proLog, epiLog); printf(" | globalX:%4d~%4d globalY:%4d~%4d globalZ:%4d~%4d |\n", log->gidXStart, log->gidXEnd, log->gidYStart, log->gidYEnd, log->gidZStart, log->gidZEnd); for (uint32_t i = 0; i < MaxTimestampProfilingPoints - 2; i += 3) { printf(" | ts%-2d:%10u | ts%-2d:%10u | ts%-2d:%10u |\n", i, log->userTimestamp[i], i + 1, log->userTimestamp[i + 1], i + 2, log->userTimestamp[i + 2]); } printf(" | ts18:%10u | ts19:%10u | |\n", log->userTimestamp[18], log->userTimestamp[19]); log++; } } } } Beignet-1.3.2-Source/backend/src/ir/sampler.cpp000664 001750 001750 00000007654 13161142102 020422 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file sampler.cpp * */ #include "sampler.hpp" #include "context.hpp" #include "ocl_common_defines.h" namespace gbe { namespace ir { #ifdef GBE_COMPILER_AVAILABLE uint8_t SamplerSet::appendReg(uint32_t key, Context *ctx) { uint8_t samplerSlot = samplerMap.size(); samplerMap.insert(std::make_pair(key, samplerSlot)); return samplerSlot; } uint8_t SamplerSet::append(uint32_t samplerValue, Context *ctx) { auto it = samplerMap.find(samplerValue); if (it != samplerMap.end()) return it->second; // This register is just used as a key. 
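      // (Illustration with made-up values: append(0x11), append(0x19),
      // append(0x11) leaves samplerMap = {0x11 -> slot 0, 0x19 -> slot 1};
      // the third call finds the existing key above and never reaches this
      // point.)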
return appendReg(samplerValue, ctx); } #define SAMPLER_ID(id) ((id << __CLK_SAMPLER_ARG_BASE) | __CLK_SAMPLER_ARG_KEY_BIT) uint8_t SamplerSet::append(Register samplerReg, Context *ctx) { ir::FunctionArgument *arg = ctx->getFunction().getArg(samplerReg); GBE_ASSERT(arg != NULL); GBE_ASSERT(arg->type == ir::FunctionArgument::SAMPLER); int32_t id = ctx->getFunction().getArgID(arg); GBE_ASSERT(id < (1 << __CLK_SAMPLER_ARG_BITS)); map::iterator it = samplerMap.find(SAMPLER_ID(id)); if (it != samplerMap.end()) { return it->second; } return appendReg(SAMPLER_ID(id), ctx); } #endif #define OUT_UPDATE_SZ(elt) SERIALIZE_OUT(elt, outs, ret_size) #define IN_UPDATE_SZ(elt) DESERIALIZE_IN(elt, ins, total_size) /*! Implements the serialization. */ uint32_t SamplerSet::serializeToBin(std::ostream& outs) { uint32_t ret_size = 0; uint32_t sz = 0; OUT_UPDATE_SZ(magic_begin); sz = samplerMap.size(); OUT_UPDATE_SZ(sz); for (map::iterator it = samplerMap.begin(); it != samplerMap.end(); ++it) { OUT_UPDATE_SZ(it->first); OUT_UPDATE_SZ(it->second); } OUT_UPDATE_SZ(magic_end); OUT_UPDATE_SZ(ret_size); return ret_size; } uint32_t SamplerSet::deserializeFromBin(std::istream& ins) { uint32_t total_size = 0; uint32_t magic; uint32_t sampler_map_sz = 0; IN_UPDATE_SZ(magic); if (magic != magic_begin) return 0; IN_UPDATE_SZ(sampler_map_sz); for (size_t i = 0; i < sampler_map_sz; i++) { uint32_t key; uint32_t slot; IN_UPDATE_SZ(key); IN_UPDATE_SZ(slot); samplerMap.insert(std::make_pair(key, slot)); } IN_UPDATE_SZ(magic); if (magic != magic_end) return 0; uint32_t total_bytes; IN_UPDATE_SZ(total_bytes); if (total_bytes + sizeof(total_size) != total_size) return 0; return total_size; } void SamplerSet::printStatus(int indent, std::ostream& outs) { using namespace std; string spaces = indent_to_str(indent); string spaces_nl = indent_to_str(indent + 4); outs << spaces << "------------ Begin SamplerSet ------------" << "\n"; outs << spaces_nl << " SamplerSet Map: [index, sampler_reg, sampler_slot]\n"; outs << spaces_nl << " samplerMap size: " << samplerMap.size() << "\n"; for (map::iterator it = samplerMap.begin(); it != samplerMap.end(); ++it) { outs << spaces_nl << " [" << it->first << ", " << it->second << "]\n"; } outs << spaces << "------------- End SamplerSet -------------" << "\n"; } } /* namespace ir */ } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/ir/reloc.hpp000664 001750 001750 00000005147 13161142102 020063 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file reloc.cpp * * \author Benjamin Segovia */ #ifndef __GBE_IR_RELOC_HPP__ #define __GBE_IR_RELOC_HPP__ #include "sys/vector.hpp" #include namespace gbe { namespace ir { /*! Complete unit of compilation. It contains a set of functions and a set of * RelocEntry the functions may refer to. 
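 *  Each entry pairs the offset of a reference with the offset of the
 *  definition it refers to; e.g. (hypothetical numbers) RelocEntry(120, 64)
 *  marks the slot at byte offset 120 to be patched so that it points at the
 *  object defined at byte offset 64.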
*/ struct RelocEntry { RelocEntry(unsigned int rO, unsigned int dO): refOffset(rO), defOffset(dO) {} unsigned int refOffset; unsigned int defOffset; }; class RelocTable : public NonCopyable, public Serializable { public: void addEntry(unsigned refOffset, unsigned defOffset) { entries.push_back(RelocEntry(refOffset, defOffset)); } RelocTable() : Serializable() {} RelocTable(const RelocTable& other) : Serializable(other), entries(other.entries) {} uint32_t getCount() { return entries.size(); } void getData(char *p) { if (entries.size() > 0 && p) memcpy(p, entries.data(), entries.size()*sizeof(RelocEntry)); } static const uint32_t magic_begin = TO_MAGIC('R', 'E', 'L', 'C'); static const uint32_t magic_end = TO_MAGIC('C', 'L', 'E', 'R'); /* format: magic_begin | reloc_table_size | entry_0_refOffset | entry_0_defOffset | entry_1_refOffset | entry_1_defOffset | ........ | entry_n_refOffset | entry_n_defOffset | magic_end | total_size */ /*! Implements the serialization. */ virtual uint32_t serializeToBin(std::ostream& outs); virtual uint32_t deserializeFromBin(std::istream& ins); private: vector entries; GBE_CLASS(RelocTable); }; } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_RELOC_HPP__ */ Beignet-1.3.2-Source/backend/src/ir/image.hpp000664 001750 001750 00000006167 13161142102 020044 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file image.hpp * */ #ifndef __GBE_IR_IMAGE_HPP__ #define __GBE_IR_IMAGE_HPP__ #include "ir/register.hpp" #include "ir/instruction.hpp" // for ImageInfoKey #include "sys/map.hpp" extern "C" { struct ImageInfo; } namespace gbe { namespace ir { class Context; /*! An image set is a set of images which are defined in kernel args. * We use this set to gather the images here and allocate a unique index * for each individual image. And that individual image could be used * at backend to identify this image's location. */ class ImageSet : public Serializable { public: /*! Append an image argument. */ void append(Register imageReg, Context *ctx, uint8_t bti); /*! Append an image info slot. */ void appendInfo(ImageInfoKey key, uint32_t offset); /*! Append an image info register. */ Register appendInfo(ImageInfoKey, Context *ctx); /*! clear image info. */ void clearInfo(); /*! Get the image's index(actual location). 
*/ uint32_t getIdx(const Register imageReg) const; size_t getDataSize(void) { return regMap.size(); } size_t getDataSize(void) const { return regMap.size(); } int32_t getInfoOffset(ImageInfoKey key) const; void getData(struct ImageInfo *imageInfos) const; void operator = (const ImageSet& other) { regMap.insert(other.regMap.begin(), other.regMap.end()); } bool empty() const { return regMap.empty(); } ImageSet(const ImageSet& other) : regMap(other.regMap.begin(), other.regMap.end()) { } ImageSet() {} ~ImageSet(); static const uint32_t magic_begin = TO_MAGIC('I', 'M', 'A', 'G'); static const uint32_t magic_end = TO_MAGIC('G', 'A', 'M', 'I'); /* format: magic_begin | regMap_size | element_1 | ........ | element_n | indexMap_size | element_1 | ........ | element_n | magic_end | total_size */ /*! Implements the serialization. */ virtual uint32_t serializeToBin(std::ostream& outs); virtual uint32_t deserializeFromBin(std::istream& ins); virtual void printStatus(int indent, std::ostream& outs); private: map regMap; map indexMap; map infoRegMap; GBE_CLASS(ImageSet); }; } /* namespace ir */ } /* namespace gbe */ #endif /* __GBE_IR_IMAGE_HPP__ */ Beignet-1.3.2-Source/backend/src/Android.mk000664 001750 001750 00000024004 13161142102 017536 0ustar00yryr000000 000000 LOCAL_PATH:= $(call my-dir) include $(LOCAL_PATH)/../../Android.common.mk include $(CLEAR_VARS) include $(CLEAR_TBLGEN_VARS) LLVM_ROOT_PATH := external/llvm CLANG_ROOT_PATH := external/clang include $(CLANG_ROOT_PATH)/clang.mk BACKEND_SRC_FILES:= \ ${ocl_blob_file} \ sys/vector.hpp \ sys/map.hpp \ sys/set.hpp \ sys/intrusive_list.hpp \ sys/intrusive_list.cpp \ sys/exception.hpp \ sys/assert.cpp \ sys/assert.hpp \ sys/alloc.cpp \ sys/alloc.hpp \ sys/mutex.cpp \ sys/mutex.hpp \ sys/platform.cpp \ sys/platform.hpp \ sys/cvar.cpp \ sys/cvar.hpp \ ir/context.cpp \ ir/context.hpp \ ir/profile.cpp \ ir/profile.hpp \ ir/type.cpp \ ir/type.hpp \ ir/unit.cpp \ ir/unit.hpp \ ir/constant.cpp \ ir/constant.hpp \ ir/sampler.cpp \ ir/sampler.hpp \ ir/image.cpp \ ir/image.hpp \ ir/half.cpp \ ir/half.hpp \ ir/instruction.cpp \ ir/instruction.hpp \ ir/liveness.cpp \ ir/register.cpp \ ir/register.hpp \ ir/function.cpp \ ir/function.hpp \ ir/profiling.cpp \ ir/profiling.hpp \ ir/value.cpp \ ir/value.hpp \ ir/lowering.cpp \ ir/lowering.hpp \ ir/printf.cpp \ ir/printf.hpp \ ir/immediate.hpp \ ir/immediate.cpp \ ir/structurizer.hpp \ ir/structurizer.cpp \ ir/reloc.hpp \ ir/reloc.cpp \ backend/context.cpp \ backend/context.hpp \ backend/program.cpp \ backend/program.hpp \ backend/program.h \ llvm/llvm_sampler_fix.cpp \ llvm/llvm_bitcode_link.cpp \ llvm/llvm_gen_backend.cpp \ llvm/llvm_passes.cpp \ llvm/llvm_scalarize.cpp \ llvm/llvm_intrinsic_lowering.cpp \ llvm/llvm_barrier_nodup.cpp \ llvm/llvm_printf_parser.cpp \ llvm/ExpandConstantExpr.cpp \ llvm/ExpandUtils.cpp \ llvm/PromoteIntegers.cpp \ llvm/ExpandLargeIntegers.cpp \ llvm/StripAttributes.cpp \ llvm/llvm_device_enqueue.cpp \ llvm/llvm_to_gen.cpp \ llvm/llvm_loadstore_optimization.cpp \ llvm/llvm_gen_backend.hpp \ llvm/llvm_gen_ocl_function.hxx \ llvm/llvm_unroll.cpp \ llvm/llvm_to_gen.hpp \ llvm/llvm_profiling.cpp \ backend/gen/gen_mesa_disasm.c \ backend/gen_insn_selection.cpp \ backend/gen_insn_selection.hpp \ backend/gen_insn_selection_optimize.cpp \ backend/gen_insn_scheduling.cpp \ backend/gen_insn_scheduling.hpp \ backend/gen_insn_selection_output.cpp \ backend/gen_insn_selection_output.hpp \ backend/gen_reg_allocation.cpp \ backend/gen_reg_allocation.hpp \ backend/gen_context.cpp \ 
backend/gen_context.hpp \ backend/gen75_context.hpp \ backend/gen75_context.cpp \ backend/gen8_context.hpp \ backend/gen8_context.cpp \ backend/gen9_context.hpp \ backend/gen9_context.cpp \ backend/gen_program.cpp \ backend/gen_program.hpp \ backend/gen_program.h \ backend/gen7_instruction.hpp \ backend/gen8_instruction.hpp \ backend/gen_defs.hpp \ backend/gen_insn_compact.cpp \ backend/gen_encoder.hpp \ backend/gen_encoder.cpp \ backend/gen7_encoder.hpp \ backend/gen7_encoder.cpp \ backend/gen75_encoder.hpp \ backend/gen75_encoder.cpp \ backend/gen8_encoder.hpp \ backend/gen8_encoder.cpp \ backend/gen9_encoder.hpp \ backend/gen9_encoder.cpp #Generate GBEConfig for android LOCAL_MODULE := libgbe LOCAL_MODULE_TAGS := optional LOCAL_MODULE_CLASS := SHARED_LIBRARIES generated_path := $(call local-generated-sources-dir) gbe_config_file = $(LOCAL_PATH)/GBEConfig.h $(shell echo "// the configured options and settings for LIBGBE" > $(gbe_config_file)) $(shell echo "#define LIBGBE_VERSION_MAJOR 0" >> $(gbe_config_file)) $(shell echo "#define LIBGBE_VERSION_MINOR 2" >> $(gbe_config_file)) $(shell echo "#if defined(__ANDROID__)" >> $(gbe_config_file)) $(shell echo "#if __x86_64__" >> $(gbe_config_file)) $(shell echo " #define GBE_OBJECT_DIR \"/system/lib64/libgbe.so\"" >> $(gbe_config_file)) $(shell echo " #define INTERP_OBJECT_DIR \"/system/lib64/libgbeinterp.so\"" >> $(gbe_config_file)) $(shell echo " #define OCL_BITCODE_BIN \"/system/lib/ocl/beignet.bc\"" >> $(gbe_config_file)) $(shell echo " #define OCL_HEADER_DIR \"/system/lib/ocl/include\"" >> $(gbe_config_file)) $(shell echo " #define OCL_PCH_OBJECT \"/system/lib/ocl/beignet.pch\"" >> $(gbe_config_file)) $(shell echo " #define OCL_BITCODE_BIN_20 \"/system/lib/ocl/beignet_20.bc\"" >> $(gbe_config_file)) $(shell echo " #define OCL_PCH_OBJECT_20 \"/system/lib/ocl/beigneti_20.pch\"" >> $(gbe_config_file)) $(shell echo "#else /*__x86_64__*/" >> $(gbe_config_file)) $(shell echo " #define GBE_OBJECT_DIR \"/system/lib/libgbe.so\"" >> $(gbe_config_file)) $(shell echo " #define INTERP_OBJECT_DIR \"/system/lib/libgbeinterp.so\"" >> $(gbe_config_file)) $(shell echo " #define OCL_BITCODE_BIN \"/system/lib/ocl/beignet.bc\"" >> $(gbe_config_file)) $(shell echo " #define OCL_HEADER_DIR \"/system/lib/ocl/include\"" >> $(gbe_config_file)) $(shell echo " #define OCL_PCH_OBJECT \"/system/lib/ocl/beignet.pch\"" >> $(gbe_config_file)) $(shell echo " #define OCL_BITCODE_BIN_20 \"/system/lib/ocl/beignet_20.bc\"" >> $(gbe_config_file)) $(shell echo " #define OCL_PCH_OBJECT_20 \"/system/lib/ocl/beigneti_20.pch\"" >> $(gbe_config_file)) $(shell echo "#endif" >> $(gbe_config_file)) $(shell echo "#else /*__ANDROID__*/" >> $(gbe_config_file)) $(shell echo " #define GBE_OBJECT_DIR \"\"" >> $(gbe_config_file)) $(shell echo " #define INTERP_OBJECT_DIR \"\"" >> $(gbe_config_file)) $(shell echo " #define OCL_BITCODE_BIN \"`pwd $(TOP)`/$(generated_path)\"" >> $(gbe_config_file)) $(shell echo " #define OCL_HEADER_DIR \"`pwd $(TOP)`/$(generated_path)/libocl/include\"" >> $(gbe_config_file)) $(shell echo " #define OCL_PCH_OBJECT \"`pwd $(TOP)`/$(generated_path)\"" >> $(gbe_config_file)) $(shell echo " #define OCL_BITCODE_BIN_20 \"`pwd $(TOP)`/$(generated_path)\"" >> $(gbe_config_file)) $(shell echo " #define OCL_PCH_OBJECT_20 \"`pwd $(TOP)`/$(generated_path)\"" >> $(gbe_config_file)) $(shell echo "#endif" >> $(gbe_config_file)) #Build HOST libgbe.so LOCAL_C_INCLUDES := $(TOP_C_INCLUDE) \ $(BEIGNET_ROOT_PATH) \ $(LOCAL_PATH)/../ \ $(LLVM_INCLUDE_DIRS) LOCAL_CPPFLAGS += 
$(LLVM_CFLAGS) -std=c++11 -fexceptions -DGBE_DEBUG=0 -DGBE_COMPILER_AVAILABLE=1 -DGEN7_SAMPLER_CLAMP_BORDER_WORKAROUND LOCAL_CFLAGS += $(LLVM_CFLAGS) -fexceptions -DGBE_DEBUG=0 -DGBE_COMPILER_AVAILABLE=1 -DGEN7_SAMPLER_CLAMP_BORDER_WORKAROUND LOCAL_CPPFLAGS += -Wno-extra-semi -Wno-gnu-anonymous-struct -Wno-nested-anon-types LOCAL_CFLAGS += -Wno-extra-semi -Wno-gnu-anonymous-struct -Wno-nested-anon-types LOCAL_LDLIBS += -lpthread -lm -ldl -lLLVM -lclang #LOCAL_STATIC_LIBRARIES := $(CLANG_MODULE_LIBS) LOCAL_SHARED_LIBRARIES := libclang TBLGEN_TABLES := \ AttrList.inc \ Attrs.inc \ CommentCommandList.inc \ CommentNodes.inc \ DeclNodes.inc \ DiagnosticCommonKinds.inc \ DiagnosticDriverKinds.inc \ DiagnosticFrontendKinds.inc \ DiagnosticSemaKinds.inc LOCAL_SRC_FILES = $(BACKEND_SRC_FILES) include $(CLANG_HOST_BUILD_MK) include $(CLANG_TBLGEN_RULES_MK) include $(LLVM_GEN_INTRINSICS_MK) include $(BUILD_HOST_SHARED_LIBRARY) #Build gbe_bin_generater include $(CLEAR_VARS) LOCAL_SRC_FILES := gbe_bin_generater.cpp LOCAL_C_INCLUDES := $(TOP_C_INCLUDE) \ $(BEIGNET_ROOT_PATH) \ $(LOCAL_PATH)/ \ $(LLVM_INCLUDE_DIRS) LOCAL_CLANG := true LOCAL_MODULE := gbe_bin_generater LOCAL_MODULE_TAGS := optional LOCAL_CFLAGS = $(LLVM_CFLAGS) -std=gnu++11 -fexceptions LOCAL_SHARED_LIBRARIES := libgbe LOCAL_LDLIBS += -lpthread -lm -ldl include $(BUILD_HOST_EXECUTABLE) #Build libgbeinterp.so include $(CLEAR_VARS) LLVM_ROOT_PATH := external/llvm include $(LLVM_ROOT_PATH)/llvm.mk LOCAL_C_INCLUDES := $(TOP_C_INCLUDE) \ $(BEIGNET_ROOT_PATH) \ $(LOCAL_PATH)/../ \ $(LLVM_INCLUDE_DIRS) LOCAL_LDFLAGS := -Wl,--no-undefined LOCAL_CFLAGS += $(SUBDIR_C_CXX_FLAGS) LOCAL_CPPFLAGS += -Wl,-E -std=c++11 -DGBE_COMPILER_AVAILABLE=1 LOCAL_MODULE := libgbeinterp LOCAL_MODULE_TAGS := optional LOCAL_SRC_FILES := gbe_bin_interpreter.cpp LOCAL_SHARED_LIBRARIES := \ libcutils \ $(DRM_INTEL_LIBRARY) \ $(DRM_LIBRARY) include $(LLVM_DEVICE_BUILD_MK) include $(BUILD_SHARED_LIBRARY) #Build targe libgbe.so include $(CLEAR_VARS) include $(CLEAR_TBLGEN_VARS) LOCAL_C_INCLUDES := $(TOP_C_INCLUDE) \ $(BEIGNET_ROOT_PATH) \ $(LOCAL_PATH)/../ \ $(LLVM_INCLUDE_DIRS) SUBDIR_C_CXX_FLAGS := -fvisibility=hidden SUBDIR_C_CXX_FLAGS += -funroll-loops -fstrict-aliasing -msse2 -msse3 -mssse3 -msse4.1 -fPIC -Wall SUBDIR_C_CXX_FLAGS += $(LLVM_CFLAGS) LOCAL_CPPFLAGS := $(SUBDIR_C_CXX_FLAGS) LOCAL_CPPFLAGS += -fno-rtti -std=c++11 -DGBE_DEBUG=1 -DGBE_COMPILER_AVAILABLE=1 -DGEN7_SAMPLER_CLAMP_BORDER_WORKAROUND LOCAL_CPPFLAGS += -Wl,-E #LOCAL_SDK_VERSION := 19 #LOCAL_NDK_STL_VARIANT := gnustl_static LOCAL_CFLAGS := $(SUBDIR_C_CXX_FLAGS) LOCAL_CFLAGS += -Wl,-E LOCAL_LDFLAGS := -Wl,--no-undefined LOCAL_LDLIBS := $(LLVM_LFLAGS) LOCAL_MODULE := libgbe LOCAL_MODULE_TAGS := optional LOCAL_MODULE_CLASS := SHARED_LIBRARIES LOCAL_SHARED_LIBRARIES := \ libcutils \ $(DRM_INTEL_LIBRARY) \ $(DRM_LIBRARY) \ libclang libLLVM #$(THREAD_LIBS_INIT) #$(DL_LIBS) #LOCAL_STATIC_LIBRARIES := $(CLANG_MODULE_LIBS) TBLGEN_TABLES := \ AttrList.inc \ Attrs.inc \ CommentCommandList.inc \ CommentNodes.inc \ DeclNodes.inc \ DiagnosticCommonKinds.inc \ DiagnosticDriverKinds.inc \ DiagnosticFrontendKinds.inc \ DiagnosticSemaKinds.inc LOCAL_SRC_FILES := $(BACKEND_SRC_FILES) include $(CLANG_DEVICE_BUILD_MK) include $(CLANG_TBLGEN_RULES_MK) include $(LLVM_GEN_INTRINSICS_MK) include $(BUILD_SHARED_LIBRARY) Beignet-1.3.2-Source/backend/src/ocl_common_defines.h000664 001750 001750 00000012077 13161142102 021627 0ustar00yryr000000 000000 // This file includes defines that are common to both kernel code and // the 
NVPTX back-end. #ifndef __OCL_COMMON_DEFINES__ #define __OCL_COMMON_DEFINES__ // // Common defines for Image intrinsics // Channel order #define CLK_HAS_ALPHA(color) (color == CLK_A || color == CLK_RA || color == CLK_RGBA || color == CLK_BGRA || color == CLK_ARGB || color == CLK_sRGBA || color == CLK_sBGRA) enum { CLK_R = 0x10B0, CLK_A = 0x10B1, CLK_RG = 0x10B2, CLK_RA = 0x10B3, CLK_RGB = 0x10B4, CLK_RGBA = 0x10B5, CLK_BGRA = 0x10B6, CLK_ARGB = 0x10B7, #if (__NV_CL_C_VERSION == __NV_CL_C_VERSION_1_0) CLK_xRGB = 0x10B7, #endif CLK_INTENSITY = 0x10B8, CLK_LUMINANCE = 0x10B9 #if (__NV_CL_C_VERSION >= __NV_CL_C_VERSION_1_1) , CLK_Rx = 0x10BA, CLK_RGx = 0x10BB, CLK_RGBx = 0x10BC #endif #if (__NV_CL_C_VERSION >= __NV_CL_C_VERSION_2_0) , CLK_sRGBA = 0x10C1, CLK_sBGRA = 0x10C2 #endif }; typedef enum clk_channel_type { // valid formats for float return types CLK_SNORM_INT8 = 0x10D0, // four channel RGBA unorm8 CLK_SNORM_INT16 = 0x10D1, // four channel RGBA unorm16 CLK_UNORM_INT8 = 0x10D2, // four channel RGBA unorm8 CLK_UNORM_INT16 = 0x10D3, // four channel RGBA unorm16 CLK_HALF_FLOAT = 0x10DD, // four channel RGBA half CLK_FLOAT = 0x10DE, // four channel RGBA float #if (__NV_CL_C_VERSION >= __NV_CL_C_VERSION_1_1) CLK_UNORM_SHORT_565 = 0x10D4, CLK_UNORM_SHORT_555 = 0x10D5, CLK_UNORM_INT_101010 = 0x10D6, #endif // valid only for integer return types CLK_SIGNED_INT8 = 0x10D7, CLK_SIGNED_INT16 = 0x10D8, CLK_SIGNED_INT32 = 0x10D9, CLK_UNSIGNED_INT8 = 0x10DA, CLK_UNSIGNED_INT16 = 0x10DB, CLK_UNSIGNED_INT32 = 0x10DC, // CI SPI for CPU __CLK_UNORM_INT8888 , // four channel ARGB unorm8 __CLK_UNORM_INT8888R, // four channel BGRA unorm8 __CLK_VALID_IMAGE_TYPE_COUNT, __CLK_INVALID_IMAGE_TYPE = __CLK_VALID_IMAGE_TYPE_COUNT, __CLK_VALID_IMAGE_TYPE_MASK_BITS = 4, // number of bits required to // represent any image type __CLK_VALID_IMAGE_TYPE_MASK = ( 1 << __CLK_VALID_IMAGE_TYPE_MASK_BITS ) - 1 }clk_channel_type; typedef enum clk_sampler_type { __CLK_NORMALIZED_BASE = 0, CLK_NORMALIZED_COORDS_FALSE = 0, CLK_NORMALIZED_COORDS_TRUE = (1 << __CLK_NORMALIZED_BASE), __CLK_NORMALIZED_MASK = (CLK_NORMALIZED_COORDS_FALSE | CLK_NORMALIZED_COORDS_TRUE), __CLK_NORMALIZED_BITS = 1, // number of bits required to // represent normalization __CLK_ADDRESS_BASE = 0, CLK_ADDRESS_NONE = (0 << __CLK_ADDRESS_BASE), CLK_ADDRESS_CLAMP_TO_EDGE = (2 << __CLK_ADDRESS_BASE), CLK_ADDRESS_CLAMP = (4 << __CLK_ADDRESS_BASE), CLK_ADDRESS_REPEAT = (6 << __CLK_ADDRESS_BASE), CLK_ADDRESS_MIRROR = (8 << __CLK_ADDRESS_BASE), #if (__NV_CL_C_VERSION >= __NV_CL_C_VERSION_1_1) CLK_ADDRESS_MIRRORED_REPEAT = CLK_ADDRESS_MIRROR, #endif __CLK_ADDRESS_MASK = (CLK_ADDRESS_NONE | CLK_ADDRESS_CLAMP | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_ADDRESS_REPEAT | CLK_ADDRESS_MIRROR), __CLK_ADDRESS_BITS = 4, // number of bits required to // represent address info __CLK_FILTER_BASE = (__CLK_ADDRESS_BASE + __CLK_ADDRESS_BITS), CLK_FILTER_ANISOTROPIC = (0 << __CLK_FILTER_BASE), CLK_FILTER_NEAREST = (1 << __CLK_FILTER_BASE), CLK_FILTER_LINEAR = (2 << __CLK_FILTER_BASE), __CLK_FILTER_MASK = (CLK_FILTER_NEAREST | CLK_FILTER_LINEAR | CLK_FILTER_ANISOTROPIC), __CLK_FILTER_BITS = 2, // number of bits required to // represent address info __CLK_MIP_BASE = (__CLK_FILTER_BASE + __CLK_FILTER_BITS), CLK_MIP_NEAREST = (0 << __CLK_MIP_BASE), CLK_MIP_LINEAR = (1 << __CLK_MIP_BASE), CLK_MIP_ANISOTROPIC = (2 << __CLK_MIP_BASE), __CLK_MIP_MASK = (CLK_MIP_NEAREST | CLK_MIP_LINEAR | CLK_MIP_ANISOTROPIC), __CLK_MIP_BITS = 2, __CLK_SAMPLER_BITS = (__CLK_MIP_BASE + __CLK_MIP_BITS), __CLK_SAMPLER_MASK 
= (__CLK_MIP_MASK | __CLK_FILTER_MASK | __CLK_NORMALIZED_MASK | __CLK_ADDRESS_MASK), __CLK_SAMPLER_ARG_BASE = (__CLK_MIP_BASE + __CLK_SAMPLER_BITS), __CLK_SAMPLER_ARG_BITS = 8, __CLK_SAMPLER_ARG_MASK = (((1 << __CLK_SAMPLER_ARG_BITS) - 1) << __CLK_SAMPLER_ARG_BASE), __CLK_SAMPLER_ARG_KEY_BIT = (1 << (__CLK_SAMPLER_ARG_BASE + __CLK_SAMPLER_ARG_BITS)), __CLK_SAMPLER_ARG_KEY_BITS = 1, } clk_sampler_type; #endif /* __OCL_COMMON_DEFINES__ */ Beignet-1.3.2-Source/backend/src/llvm/000775 001750 001750 00000000000 13174334761 016621 5ustar00yryr000000 000000 Beignet-1.3.2-Source/backend/src/llvm/llvm_bitcode_link.cpp000664 001750 001750 00000032752 13173554000 023003 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . */ #include #include #include #include #include #include #include "sys/cvar.hpp" #include "src/GBEConfig.h" #include "llvm_includes.hpp" #include "llvm/llvm_gen_backend.hpp" #include "ir/unit.hpp" using namespace llvm; SVAR(OCL_BITCODE_LIB_PATH, OCL_BITCODE_BIN); SVAR(OCL_BITCODE_LIB_20_PATH, OCL_BITCODE_BIN_20); namespace gbe { static Module* createOclBitCodeModule(LLVMContext& ctx, bool strictMath, uint32_t oclVersion) { std::string bitCodeFiles = oclVersion >= 200 ? OCL_BITCODE_LIB_20_PATH : OCL_BITCODE_LIB_PATH; if(bitCodeFiles == "") bitCodeFiles = oclVersion >= 200 ? OCL_BITCODE_BIN_20 : OCL_BITCODE_BIN; std::istringstream bitCodeFilePath(bitCodeFiles); std::string FilePath; bool findBC = false; Module* oclLib = NULL; SMDiagnostic Err; while (std::getline(bitCodeFilePath, FilePath, ':')) { if(access(FilePath.c_str(), R_OK) == 0) { findBC = true; break; } } if (!findBC) { printf("Fatal Error: ocl lib %s does not exist\n", bitCodeFiles.c_str()); return NULL; } #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR <= 35 oclLib = getLazyIRFileModule(FilePath, Err, ctx); #else oclLib = getLazyIRFileModule(FilePath, Err, ctx).release(); #endif if (!oclLib) { printf("Fatal Error: ocl lib can not be opened\n"); return NULL; } llvm::GlobalVariable* mathFastFlag = oclLib->getGlobalVariable("__ocl_math_fastpath_flag"); assert(mathFastFlag); Type* intTy = IntegerType::get(ctx, 32); mathFastFlag->setInitializer(ConstantInt::get(intTy, strictMath ? 
0 : 1)); return oclLib; } static bool materializedFuncCall(Module& src, Module& lib, llvm::Function& KF, std::set& MFS, std::vector&Gvs) { bool fromSrc = false; for (llvm::Function::iterator B = KF.begin(), BE = KF.end(); B != BE; B++) { for (BasicBlock::iterator instI = B->begin(), instE = B->end(); instI != instE; ++instI) { llvm::CallInst* call = dyn_cast(instI); if (!call) { continue; } llvm::Function * callFunc = call->getCalledFunction(); //if(!callFunc) { // continue; //} if (callFunc && callFunc->getIntrinsicID() != 0) continue; std::string fnName = call->getCalledValue()->stripPointerCasts()->getName(); if (!MFS.insert(fnName).second) { continue; } fromSrc = false; llvm::Function *newMF = lib.getFunction(fnName); if (!newMF) { newMF = src.getFunction(fnName); if (!newMF) { printf("Can not find the lib: %s\n", fnName.c_str()); return false; } fromSrc = true; } std::string ErrInfo;// = "Not Materializable"; if (!fromSrc && newMF->isMaterializable()) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 if (llvm::Error EC = newMF->materialize()) { std::string Msg; handleAllErrors(std::move(EC), [&](ErrorInfoBase &EIB) { Msg = EIB.message(); }); printf("Can not materialize the function: %s, because %s\n", fnName.c_str(), Msg.c_str()); return false; } Gvs.push_back((GlobalValue *)newMF); #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 36 if (std::error_code EC = newMF->materialize()) { printf("Can not materialize the function: %s, because %s\n", fnName.c_str(), EC.message().c_str()); return false; } Gvs.push_back((GlobalValue *)newMF); #else if (newMF->Materialize(&ErrInfo)) { printf("Can not materialize the function: %s, because %s\n", fnName.c_str(), ErrInfo.c_str()); return false; } #endif } if (!materializedFuncCall(src, lib, *newMF, MFS, Gvs)) return false; } } return true; } Module* runBitCodeLinker(Module *mod, bool strictMath, ir::Unit &unit) { LLVMContext& ctx = mod->getContext(); std::set materializedFuncs; std::vector Gvs; uint32_t oclVersion = getModuleOclVersion(mod); ir::PointerSize size = oclVersion >= 200 ? ir::POINTER_64_BITS : ir::POINTER_32_BITS; unit.setPointerSize(size); Module* clonedLib = createOclBitCodeModule(ctx, strictMath, oclVersion); if (clonedLib == NULL) return NULL; std::vector kernels; std::vector kerneltmp; std::vector builtinFuncs; /* Add the memset and memcpy functions here. 
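These helpers are usually not referenced yet at this point: llvm.memcpy/llvm.memset
are only rewritten into __gen_memcpy_*/__gen_memset_* calls by the intrinsic
lowering pass after linking, so the whole family has to be force-materialized
here or the definitions would be dropped as unused.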
*/ builtinFuncs.push_back("__gen_memcpy_gg"); builtinFuncs.push_back("__gen_memcpy_gp"); builtinFuncs.push_back("__gen_memcpy_gl"); builtinFuncs.push_back("__gen_memcpy_pg"); builtinFuncs.push_back("__gen_memcpy_pp"); builtinFuncs.push_back("__gen_memcpy_pl"); builtinFuncs.push_back("__gen_memcpy_lg"); builtinFuncs.push_back("__gen_memcpy_lp"); builtinFuncs.push_back("__gen_memcpy_ll"); builtinFuncs.push_back("__gen_memset_p"); builtinFuncs.push_back("__gen_memset_g"); builtinFuncs.push_back("__gen_memset_l"); builtinFuncs.push_back("__gen_memcpy_gg_align"); builtinFuncs.push_back("__gen_memcpy_gp_align"); builtinFuncs.push_back("__gen_memcpy_gl_align"); builtinFuncs.push_back("__gen_memcpy_pg_align"); builtinFuncs.push_back("__gen_memcpy_pp_align"); builtinFuncs.push_back("__gen_memcpy_pl_align"); builtinFuncs.push_back("__gen_memcpy_lg_align"); builtinFuncs.push_back("__gen_memcpy_lp_align"); builtinFuncs.push_back("__gen_memcpy_ll_align"); builtinFuncs.push_back("__gen_memset_p_align"); builtinFuncs.push_back("__gen_memset_g_align"); builtinFuncs.push_back("__gen_memset_l_align"); builtinFuncs.push_back("__gen_memcpy_pc"); builtinFuncs.push_back("__gen_memcpy_gc"); builtinFuncs.push_back("__gen_memcpy_lc"); builtinFuncs.push_back("__gen_memcpy_pc_align"); builtinFuncs.push_back("__gen_memcpy_gc_align"); builtinFuncs.push_back("__gen_memcpy_lc_align"); if (oclVersion >= 200) { builtinFuncs.push_back("__gen_memcpy_gn"); builtinFuncs.push_back("__gen_memcpy_pn"); builtinFuncs.push_back("__gen_memcpy_ln"); builtinFuncs.push_back("__gen_memcpy_ng"); builtinFuncs.push_back("__gen_memcpy_np"); builtinFuncs.push_back("__gen_memcpy_nl"); builtinFuncs.push_back("__gen_memcpy_nc"); builtinFuncs.push_back("__gen_memcpy_nn"); builtinFuncs.push_back("__gen_memset_n"); builtinFuncs.push_back("__gen_memcpy_gn_align"); builtinFuncs.push_back("__gen_memcpy_pn_align"); builtinFuncs.push_back("__gen_memcpy_ln_align"); builtinFuncs.push_back("__gen_memcpy_ng_align"); builtinFuncs.push_back("__gen_memcpy_np_align"); builtinFuncs.push_back("__gen_memcpy_nl_align"); builtinFuncs.push_back("__gen_memcpy_nc_align"); builtinFuncs.push_back("__gen_memcpy_nn_align"); builtinFuncs.push_back("__gen_memset_n_align"); } for (Module::iterator SF = mod->begin(), E = mod->end(); SF != E; ++SF) { if (SF->isDeclaration()) continue; if (!isKernelFunction(*SF)) continue; // mod will be deleted after link, copy the names. 
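// getName() returns a StringRef pointing into storage owned by mod, and mod
// is destroyed by the link step below, so each name is deep-copied into a
// fresh buffer here; kerneltmp keeps the allocations so they can be freed
// with delete[] after the internalize pass has consumed the name list.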
const char *funcName = SF->getName().data(); char * tmp = new char[strlen(funcName)+1]; strcpy(tmp,funcName); kernels.push_back(tmp); kerneltmp.push_back(tmp); if (!materializedFuncCall(*mod, *clonedLib, *SF, materializedFuncs, Gvs)) { delete clonedLib; return NULL; } Gvs.push_back((GlobalValue *)&*SF); } if (kernels.empty()) { printf("One module without kernel function!\n"); delete clonedLib; return NULL; } for (auto &f : builtinFuncs) { const std::string fnName(f); if (!materializedFuncs.insert(fnName).second) { continue; } llvm::Function *newMF = clonedLib->getFunction(fnName); if (!newMF) { printf("Can not find the function: %s\n", fnName.c_str()); delete clonedLib; return NULL; } std::string ErrInfo;// = "Not Materializable"; if (newMF->isMaterializable()) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 if (llvm::Error EC = newMF->materialize()) { std::string Msg; handleAllErrors(std::move(EC), [&](ErrorInfoBase &EIB) { Msg = EIB.message(); }); printf("Can not materialize the function: %s, because %s\n", fnName.c_str(), Msg.c_str()); delete clonedLib; return NULL; } #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 36 if (std::error_code EC = newMF->materialize()) { printf("Can not materialize the function: %s, because %s\n", fnName.c_str(), EC.message().c_str()); delete clonedLib; return NULL; } #else if (newMF->Materialize(&ErrInfo)) { printf("Can not materialize the function: %s, because %s\n", fnName.c_str(), ErrInfo.c_str()); delete clonedLib; return NULL; } #endif } if (!materializedFuncCall(*mod, *clonedLib, *newMF, materializedFuncs, Gvs)) { delete clonedLib; return NULL; } Gvs.push_back((GlobalValue *)newMF); kernels.push_back(f); } /* The llvm 3.8 now has a strict materialized check for all value by checking * module is materialized. If we want to use library as old style that just * materialize what we need, we need to remove what we did not need before * materialize all of the module. To do this, we need all of the builtin * funcitons and what are needed from the kernel functions, these functions * are materalized and are recorded in Gvs, the GlobalValue like PI are also * needed and are added. Now we could not use use_empty to check if the GVs * are needed before the module is marked as all materialized, so we just * materialize all of them as there are only 7 GVs. Then we use GVExtraction * pass to extract the functions and values in Gvs from the library module. * After extract what we need and remove what we do not need, we use * materializeAll to mark the module as materialized. */ #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 /* Get all GlobalValue from module. */ Module::GlobalListType &GVlist = clonedLib->getGlobalList(); for(Module::global_iterator GVitr = GVlist.begin();GVitr != GVlist.end();++GVitr) { GlobalValue * GV = &*GVitr; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 ExitOnError ExitOnErr("Can not materialize the clonedLib: "); ExitOnErr(clonedLib->materialize(GV)); #else clonedLib->materialize(GV); #endif Gvs.push_back(GV); } llvm::legacy::PassManager Extract; /* Extract all values we need using GVExtractionPass. */ Extract.add(createGVExtractionPass(Gvs, false)); Extract.run(*clonedLib); /* Mark the library module as materialized for later use. 
*/ #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 ExitOnError ExitOnErr("Can not materialize the clonedLib: "); ExitOnErr(clonedLib->materializeAll()); #else clonedLib->materializeAll(); #endif #endif /* the SPIR binary datalayout maybe different with beignet's bitcode */ if(clonedLib->getDataLayout() != mod->getDataLayout()) mod->setDataLayout(clonedLib->getDataLayout()); /* We use beignet's bitcode as dst because it will have a lot of lazy functions which will not be loaded. */ #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 llvm::Module * linked_module = llvm::CloneModule((llvm::Module*)mod).release(); if(LLVMLinkModules2(wrap(clonedLib), wrap(linked_module))) { #else char* errorMsg; if(LLVMLinkModules(wrap(clonedLib), wrap(mod), LLVMLinkerDestroySource, &errorMsg)) { printf("Fatal Error: link the bitcode error:\n%s\n", errorMsg); #endif delete clonedLib; return NULL; } #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 llvm::legacy::PassManager passes; #else llvm::PassManager passes; #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 auto PreserveKernel = [=](const GlobalValue &GV) { for(size_t i = 0;i < kernels.size(); ++i) if(strcmp(GV.getName().data(), kernels[i])) return true; return false; }; passes.add(createInternalizePass(PreserveKernel)); #else passes.add(createInternalizePass(kernels)); #endif passes.add(createGlobalDCEPass()); passes.run(*clonedLib); for(size_t i = 0;i < kerneltmp.size(); i++) delete[] kerneltmp[i]; return clonedLib; } } // end namespace Beignet-1.3.2-Source/backend/src/llvm/llvm_loadstore_optimization.cpp000664 001750 001750 00000030040 13173554000 025143 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Ruiling, Song * * The Idea is that: As GEN support at most 4 successive DWORD load/store, * then merge successive load/store that are compatible is beneficial. * The method of checking whether two load/store is compatible are borrowed * from Vectorize passes in llvm. */ #include "llvm_includes.hpp" using namespace llvm; namespace gbe { class GenLoadStoreOptimization : public BasicBlockPass { public: static char ID; ScalarEvolution *SE; const DataLayout *TD; GenLoadStoreOptimization() : BasicBlockPass(ID) {} void getAnalysisUsage(AnalysisUsage &AU) const { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 AU.addRequired(); AU.addPreserved(); #else AU.addRequired(); AU.addPreserved(); #endif AU.setPreservesCFG(); } virtual bool runOnBasicBlock(BasicBlock &BB) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 SE = &getAnalysis().getSE(); #else SE = &getAnalysis(); #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 TD = &BB.getModule()->getDataLayout(); #elif LLVM_VERSION_MINOR >= 5 DataLayoutPass *DLP = getAnalysisIfAvailable(); TD = DLP ? 
&DLP->getDataLayout() : nullptr; #else TD = getAnalysisIfAvailable<DataLayout>(); #endif return optimizeLoadStore(BB); } Type *getValueType(Value *insn); Value *getPointerOperand(Value *I); unsigned getAddressSpace(Value *I); bool isSimpleLoadStore(Value *I); bool optimizeLoadStore(BasicBlock &BB); bool isLoadStoreCompatible(Value *A, Value *B); void mergeLoad(BasicBlock &BB, SmallVector<Instruction *, 16> &merged); void mergeStore(BasicBlock &BB, SmallVector<Instruction *, 16> &merged); bool findConsecutiveAccess(BasicBlock &BB, SmallVector<Instruction *, 16> &merged, const BasicBlock::iterator &start, unsigned maxVecSize, bool isLoad); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 virtual StringRef getPassName() const #else virtual const char *getPassName() const #endif { return "Merge compatible Load/stores for Gen"; } }; char GenLoadStoreOptimization::ID = 0; Value *GenLoadStoreOptimization::getPointerOperand(Value *I) { if (LoadInst *LI = dyn_cast<LoadInst>(I)) return LI->getPointerOperand(); if (StoreInst *SI = dyn_cast<StoreInst>(I)) return SI->getPointerOperand(); return NULL; } unsigned GenLoadStoreOptimization::getAddressSpace(Value *I) { if (LoadInst *L = dyn_cast<LoadInst>(I)) return L->getPointerAddressSpace(); if (StoreInst *S = dyn_cast<StoreInst>(I)) return S->getPointerAddressSpace(); return -1; } bool GenLoadStoreOptimization::isSimpleLoadStore(Value *I) { if (LoadInst *L = dyn_cast<LoadInst>(I)) return L->isSimple(); if (StoreInst *S = dyn_cast<StoreInst>(I)) return S->isSimple(); return false; } Type *GenLoadStoreOptimization::getValueType(Value *insn) { if(LoadInst *ld = dyn_cast<LoadInst>(insn)) return ld->getType(); if(StoreInst *st = dyn_cast<StoreInst>(insn)) return st->getValueOperand()->getType(); return NULL; } bool GenLoadStoreOptimization::isLoadStoreCompatible(Value *A, Value *B) { Value *ptrA = getPointerOperand(A); Value *ptrB = getPointerOperand(B); unsigned ASA = getAddressSpace(A); unsigned ASB = getAddressSpace(B); // Check that the address spaces match and that the pointers are valid. if (!ptrA || !ptrB || (ASA != ASB)) return false; if(!isSimpleLoadStore(A) || !isSimpleLoadStore(B)) return false; // Check that A and B are of the same type. if (ptrA->getType() != ptrB->getType()) return false; // Calculate the distance. const SCEV *ptrSCEVA = SE->getSCEV(ptrA); const SCEV *ptrSCEVB = SE->getSCEV(ptrB); const SCEV *offsetSCEV = SE->getMinusSCEV(ptrSCEVA, ptrSCEVB); const SCEVConstant *constOffSCEV = dyn_cast<SCEVConstant>(offsetSCEV); // Non-constant distance. if (!constOffSCEV) return false; int64_t offset = constOffSCEV->getValue()->getSExtValue(); Type *Ty = cast<PointerType>(ptrA->getType())->getElementType(); // The instructions are consecutive if the size of the first load/store is // the same as the offset.
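// For example, for two i32 loads A at p and B at p+4 bytes,
// getMinusSCEV(SCEV(A), SCEV(B)) evaluates to -4 while the i32 store size is
// 4, so (-offset) == sz holds and B is accepted as the element directly
// after A; any other constant distance (gap, overlap or reversed order)
// rejects the pair.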
int64_t sz = TD->getTypeStoreSize(Ty); return ((-offset) == sz); } void GenLoadStoreOptimization::mergeLoad(BasicBlock &BB, SmallVector<Instruction *, 16> &merged) { IRBuilder<> Builder(&BB); unsigned size = merged.size(); SmallVector<Value *, 16> values; for(unsigned i = 0; i < size; i++) { values.push_back(merged[i]); } LoadInst *ld = cast<LoadInst>(merged[0]); unsigned align = ld->getAlignment(); unsigned addrSpace = ld->getPointerAddressSpace(); // insert before first load Builder.SetInsertPoint(ld); VectorType *vecTy = VectorType::get(ld->getType(), size); Value *vecPtr = Builder.CreateBitCast(ld->getPointerOperand(), PointerType::get(vecTy, addrSpace)); LoadInst *vecValue = Builder.CreateLoad(vecPtr); vecValue->setAlignment(align); for (unsigned i = 0; i < size; ++i) { Value *S = Builder.CreateExtractElement(vecValue, Builder.getInt32(i)); values[i]->replaceAllUsesWith(S); } } // When searching for consecutive memory accesses, we do it in a small window; // if the window is too large, it would take up too much compile time. // An important rule we follow is to never change the load/store order. // The exception is a load and a store that touch different address spaces. // The return value indicates whether such a reorder happened. bool GenLoadStoreOptimization::findConsecutiveAccess(BasicBlock &BB, SmallVector<Instruction *, 16> &merged, const BasicBlock::iterator &start, unsigned maxVecSize, bool isLoad) { if(!isSimpleLoadStore(&*start)) return false; merged.push_back(&*start); unsigned targetAddrSpace = getAddressSpace(&*start); BasicBlock::iterator E = BB.end(); BasicBlock::iterator J = start; ++J; unsigned maxLimit = maxVecSize * 8; bool reordered = false; for(unsigned ss = 0; J != E && ss <= maxLimit; ++ss, ++J) { if((isLoad && isa<LoadInst>(*J)) || (!isLoad && isa<StoreInst>(*J))) { if(isLoadStoreCompatible(merged[merged.size()-1], &*J)) { merged.push_back(&*J); } } else if(isLoad && isa<StoreInst>(*J)) { // simple stop to keep read/write order StoreInst *st = cast<StoreInst>(&*J); unsigned addrSpace = st->getPointerAddressSpace(); if (addrSpace != targetAddrSpace) { reordered = true; } else { break; } } else if (!isLoad && isa<LoadInst>(*J)) { LoadInst *ld = cast<LoadInst>(&*J); unsigned addrSpace = ld->getPointerAddressSpace(); if (addrSpace != targetAddrSpace) { reordered = true; } else { break; } } if(merged.size() >= maxVecSize) break; } return reordered; } void GenLoadStoreOptimization::mergeStore(BasicBlock &BB, SmallVector<Instruction *, 16> &merged) { IRBuilder<> Builder(&BB); unsigned size = merged.size(); SmallVector<Value *, 16> values; for(unsigned i = 0; i < size; i++) { values.push_back(cast<StoreInst>(merged[i])->getValueOperand()); } StoreInst *st = cast<StoreInst>(merged[0]); if(!st) return; unsigned addrSpace = st->getPointerAddressSpace(); unsigned align = st->getAlignment(); // insert before the last store Builder.SetInsertPoint(merged[size-1]); Type *dataTy = st->getValueOperand()->getType(); VectorType *vecTy = VectorType::get(dataTy, size); Value *parent = UndefValue::get(vecTy); for(unsigned i = 0; i < size; i++) { parent = Builder.CreateInsertElement(parent, values[i], ConstantInt::get(IntegerType::get(st->getContext(), 32), i)); } Value *stPointer = st->getPointerOperand(); if(!stPointer) return; Value *newPtr = Builder.CreateBitCast(stPointer, PointerType::get(vecTy, addrSpace)); StoreInst *newST = Builder.CreateStore(parent, newPtr); newST->setAlignment(align); } // Find the safe iterator we can point to. If reorder happens, we need to // point to the instruction after the first of toBeDeleted.
If no reorder, // we are safe to point to the instruction after the last of toBeDeleted static BasicBlock::iterator findSafeInstruction(SmallVector &toBeDeleted, const BasicBlock::iterator ¤t, bool reorder) { BasicBlock::iterator safe = current; unsigned size = toBeDeleted.size(); if (reorder) { unsigned i = 0; while (i < size && toBeDeleted[i] == &*safe) { ++i; ++safe; } } else { safe = BasicBlock::iterator(toBeDeleted[size - 1]); ++safe; } return safe; } bool GenLoadStoreOptimization::optimizeLoadStore(BasicBlock &BB) { bool changed = false; SmallVector merged; for (BasicBlock::iterator BBI = BB.begin(), E = BB.end(); BBI != E;++BBI) { if(isa(*BBI) || isa(*BBI)) { bool isLoad = isa(*BBI) ? true: false; Type *ty = getValueType(&*BBI); if(!ty) continue; if(ty->isVectorTy()) continue; // TODO Support DWORD/WORD/BYTE LOAD for store support DWORD only now. if (!(ty->isFloatTy() || ty->isIntegerTy(32) || ((ty->isIntegerTy(8) || ty->isIntegerTy(16)) && isLoad))) continue; unsigned maxVecSize = (ty->isFloatTy() || ty->isIntegerTy(32)) ? 4 : (ty->isIntegerTy(16) ? 8 : 16); bool reorder = findConsecutiveAccess(BB, merged, BBI, maxVecSize, isLoad); uint32_t size = merged.size(); uint32_t pos = 0; bool doDeleting = size > 1; if (doDeleting) { // choose next undeleted instruction BBI = findSafeInstruction(merged, BBI, reorder); } while(size > 1) { unsigned vecSize = (size >= 16) ? 16 : (size >= 8 ? 8 : (size >= 4 ? 4 : size)); SmallVector mergedVec(merged.begin() + pos, merged.begin() + pos + vecSize); if(isLoad) mergeLoad(BB, mergedVec); else mergeStore(BB, mergedVec); // remove merged insn for(uint32_t i = 0; i < mergedVec.size(); i++) mergedVec[i]->eraseFromParent(); changed = true; pos += vecSize; size -= vecSize; } if (doDeleting) { //adjust the BBI back by one, as we would increase it in for loop //don't do this if BBI points to the very first instruction. if (BBI != BB.begin()) --BBI; } merged.clear(); } } return changed; } BasicBlockPass *createLoadStoreOptimizationPass() { return new GenLoadStoreOptimization(); } }; Beignet-1.3.2-Source/backend/src/llvm/ExpandLargeIntegers.cpp000664 001750 001750 00000077321 13173554000 023217 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ // Copyright (c) 2003-2014 University of Illinois at Urbana-Champaign. // All rights reserved. 
// // Developed by: // // LLVM Team // // University of Illinois at Urbana-Champaign // // http://llvm.org // // Permission is hereby granted, free of charge, to any person obtaining a copy of // this software and associated documentation files (the "Software"), to deal with // the Software without restriction, including without limitation the rights to // use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies // of the Software, and to permit persons to whom the Software is furnished to do // so, subject to the following conditions: // // * Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // // * Redistributions in binary form must reproduce the above copyright notice, // this list of conditions and the following disclaimers in the // documentation and/or other materials provided with the distribution. // // * Neither the names of the LLVM Team, University of Illinois at // Urbana-Champaign, nor the names of its contributors may be used to // endorse or promote products derived from this Software without specific // prior written permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS // FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE // CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER // LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE // SOFTWARE. //===- ExpandLargeIntegers.cpp - Expand illegal integers for PNaCl ABI ----===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. // // A limited set of transformations to expand illegal-sized int types. // //===----------------------------------------------------------------------===// // // Legal sizes for the purposes of expansion are anything 64 bits or less. // Operations on large integers are split into operations on smaller-sized // integers. The low parts should always be powers of 2, but the high parts may // not be. A subsequent pass can promote those. For now this pass only intends // to support the uses generated by clang, which is basically just for large // bitfields. // // Limitations: // 1) It can't change function signatures or global variables. // 3) Doesn't support mul, div/rem, switch. // 4) Doesn't handle arrays or structs (or GEPs) with illegal types. // 5) Doesn't handle constant expressions (it also doesn't produce them, so it // can run after ExpandConstantExpr). // // The PNaCl version does not handle bitcast between vector and large integer. // So I develop the bitcast from/to vector logic. // TODO: 1. When we do lshr/trunc, and we know it is cast from a vector, we can // optimize it to extractElement. // 2. OR x, 0 can be optimized as x. And x, 0 can be optimized as 0. //===----------------------------------------------------------------------===// #include "llvm_includes.hpp" #include "llvm_gen_backend.hpp" using namespace llvm; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35 #define DEBUG_TYPE "nacl-expand-ints" #endif #ifdef DEBUG #undef DEBUG #define DEBUG(...) #endif // Break instructions up into no larger than 64-bit chunks. 
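// For example, per getExpandedIntTypes() below, an i96 value travels as a
// (lo, hi) pair typed {i64, i32}: the low part is always one full 64-bit
// chunk and the high part holds the remaining bits. Types wider than i128
// are handled iteratively, because convertInstruction() sets the insert
// point after the current instruction so newly emitted halves are visited
// again.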
static const unsigned kChunkBits = 64; static const unsigned kChunkBytes = kChunkBits / CHAR_BIT; namespace { class ExpandLargeIntegers : public FunctionPass { public: static char ID; ExpandLargeIntegers() : FunctionPass(ID) { } bool runOnFunction(Function &F) override; }; template struct LoHiPair { T Lo, Hi; LoHiPair() : Lo(), Hi() {} LoHiPair(T Lo, T Hi) : Lo(Lo), Hi(Hi) {} }; typedef LoHiPair TypePair; typedef LoHiPair ValuePair; typedef LoHiPair AlignPair; struct VectorElement { Value *parent; unsigned childId; VectorElement() : parent(NULL), childId(0) {} VectorElement(Value *p, unsigned i) : parent(p), childId(i) {} }; // Information needed to patch a phi node which forward-references a value. struct ForwardPHI { Value *Val; PHINode *Lo, *Hi; unsigned ValueNumber; ForwardPHI(Value *Val, PHINode *Lo, PHINode *Hi, unsigned ValueNumber) : Val(Val), Lo(Lo), Hi(Hi), ValueNumber(ValueNumber) {} }; } char ExpandLargeIntegers::ID = 0; static bool isLegalBitSize(unsigned Bits) { assert(Bits && "Can't have zero-size integers"); return Bits <= kChunkBits; } static TypePair getExpandedIntTypes(Type *Ty) { unsigned BitWidth = Ty->getIntegerBitWidth(); assert(!isLegalBitSize(BitWidth)); return TypePair(IntegerType::get(Ty->getContext(), kChunkBits), IntegerType::get(Ty->getContext(), BitWidth - kChunkBits)); } // Return true if Val is an int which should be converted. static bool shouldConvert(const Value *Val) { Type *Ty = Val ? Val->getType() : NULL; if (IntegerType *ITy = dyn_cast(Ty)) return !isLegalBitSize(ITy->getBitWidth()); return false; } // Return a pair of constants expanded from C. static ValuePair expandConstant(Constant *C) { assert(shouldConvert(C)); TypePair ExpandedTypes = getExpandedIntTypes(C->getType()); if (isa(C)) { return ValuePair(UndefValue::get(ExpandedTypes.Lo), UndefValue::get(ExpandedTypes.Hi)); } else if (ConstantInt *CInt = dyn_cast(C)) { Constant *ShiftAmt = ConstantInt::get( CInt->getType(), ExpandedTypes.Lo->getBitWidth(), false); return ValuePair( ConstantExpr::getTrunc(CInt, ExpandedTypes.Lo), ConstantExpr::getTrunc(ConstantExpr::getLShr(CInt, ShiftAmt), ExpandedTypes.Hi)); } errs() << "Value: " << *C << "\n"; report_fatal_error("Unexpected constant value"); } template static AlignPair getAlign(const DataLayout &DL, T *I, Type *PrefAlignTy) { unsigned LoAlign = I->getAlignment(); if (LoAlign == 0) LoAlign = DL.getPrefTypeAlignment(PrefAlignTy); unsigned HiAlign = MinAlign(LoAlign, kChunkBytes); return AlignPair(LoAlign, HiAlign); } namespace { // Holds the state for converting/replacing values. We visit instructions in // reverse post-order, phis are therefore the only instructions which can be // visited before the value they use. class ConversionState { public: // Return the expanded values for Val. ValuePair getConverted(Value *Val) { assert(shouldConvert(Val)); // Directly convert constants. if (Constant *C = dyn_cast(Val)) return expandConstant(C); if (RewrittenIllegals.count(Val)) { ValuePair Found = RewrittenIllegals[Val]; if (RewrittenLegals.count(Found.Lo)) Found.Lo = RewrittenLegals[Found.Lo]; if (RewrittenLegals.count(Found.Hi)) Found.Hi = RewrittenLegals[Found.Hi]; return Found; } errs() << "Value: " << *Val << "\n"; report_fatal_error("Expanded value not found in map"); } // Returns whether a converted value has been recorded. This is only useful // for phi instructions: they can be encountered before the incoming // instruction, whereas RPO order guarantees that other instructions always // use converted values. 
bool hasConverted(Value *Val) { assert(shouldConvert(Val)); return dyn_cast(Val) || RewrittenIllegals.count(Val); } // Record a forward phi, temporarily setting it to use Undef. This will be // patched up at the end of RPO. ValuePair recordForwardPHI(Value *Val, PHINode *Lo, PHINode *Hi, unsigned ValueNumber) { DEBUG(dbgs() << "\tRecording as forward PHI\n"); ForwardPHIs.push_back(ForwardPHI(Val, Lo, Hi, ValueNumber)); return ValuePair(UndefValue::get(Lo->getType()), UndefValue::get(Hi->getType())); } void recordConverted(Instruction *From, const ValuePair &To) { DEBUG(dbgs() << "\tTo: " << *To.Lo << "\n"); DEBUG(dbgs() << "\tAnd: " << *To.Hi << "\n"); ToErase.push_back(From); RewrittenIllegals[From] = To; } // Replace the uses of From with To, give From's name to To, and mark To for // deletion. void recordConverted(Instruction *From, Value *To) { assert(!shouldConvert(From)); DEBUG(dbgs() << "\tTo: " << *To << "\n"); ToErase.push_back(From); // From does not produce an illegal value, update its users in place. From->replaceAllUsesWith(To); To->takeName(From); RewrittenLegals[From] = To; } void patchForwardPHIs() { DEBUG(if (!ForwardPHIs.empty()) dbgs() << "Patching forward PHIs:\n"); for (ForwardPHI &F : ForwardPHIs) { ValuePair Ops = getConverted(F.Val); F.Lo->setIncomingValue(F.ValueNumber, Ops.Lo); F.Hi->setIncomingValue(F.ValueNumber, Ops.Hi); DEBUG(dbgs() << "\t" << *F.Lo << "\n\t" << *F.Hi << "\n"); } } void eraseReplacedInstructions() { for (Instruction *I : ToErase) I->dropAllReferences(); for (Instruction *I : ToErase) I->eraseFromParent(); } void addEraseCandidate(Instruction *c) { ToErase.push_back(c); } void appendElement(Value *v, Value *e) { if (ExtractElement.count(v) == 0) { SmallVector tmp; tmp.push_back(e); ExtractElement[v] = tmp; } else ExtractElement[v].push_back(e); } Value *getElement(Value *v, unsigned id) { return (ExtractElement[v])[id]; } VectorElement &getVectorMap(Value *child) { return VectorIllegals[child]; } bool convertedVector(Value *vector) { return VectorIllegals.count(vector) > 0 ? true : false; } void recordVectorMap(Value *child, VectorElement elem) { VectorIllegals[child] = elem; } private: // Maps illegal values to their new converted lo/hi values. DenseMap RewrittenIllegals; // Maps legal values to their new converted value. DenseMap RewrittenLegals; // Illegal values which have already been converted, will be erased. SmallVector ToErase; // PHIs which were encountered but had forward references. They need to get // patched up after RPO traversal. SmallVector ForwardPHIs; // helpers to solve bitcasting from vector to illegal integer types // Maps a Value to its original Vector and elemId DenseMap VectorIllegals; // cache the ExtractElement Values DenseMap> ExtractElement; }; } // Anonymous namespace static Value *buildVectorOrScalar(ConversionState &State, IRBuilder<> &IRB, SmallVector Elements) { assert(!Elements.empty()); Type *IntTy = IntegerType::get(IRB.getContext(), 32); if (Elements.size() > 1) { Value * vec = NULL; unsigned ElemNo = Elements.size(); Type *ElemTy = Elements[0]->getType(); // if it is illegal integer type, these instructions will be further // splited, that's why these temporary values should be erased. bool KeepInsert = isLegalBitSize(ElemTy->getPrimitiveSizeInBits() * ElemNo); for (unsigned i = 0; i < ElemNo; ++i) { Value *tmp = vec ? 
vec : UndefValue::get(VectorType::get(ElemTy, ElemNo)); Value *idx = ConstantInt::get(IntTy, i); vec = IRB.CreateInsertElement(tmp, Elements[i], idx); if (!KeepInsert && !isa(vec)) { State.addEraseCandidate(cast(vec)); } } return vec; } else { return Elements[0]; } } static void getSplitedValue(ConversionState &State, Value *Val, SmallVector &Result) { while (shouldConvert(Val)) { ValuePair Convert = State.getConverted(Val); Result.push_back(Convert.Lo); Val = Convert.Hi; } Result.push_back(Val); } // make all the elements in Src use the same llvm::Type, and return them in Dst static void unifyElementType(IRBuilder<> &IRB, SmallVector &Src, SmallVector &Dst) { unsigned MinWidth = Src[0]->getType()->getPrimitiveSizeInBits(); bool Unified = true; for (unsigned i = 0; i < Src.size(); i++) { Type *Ty = Src[i]->getType(); unsigned BitWidth = Ty->getPrimitiveSizeInBits(); if(BitWidth != MinWidth) Unified = false; if(BitWidth < MinWidth) MinWidth = BitWidth; } if (Unified) { for (unsigned i = 0; i < Src.size(); i++) Dst.push_back(Src[i]); } else { Type *IntTy = IntegerType::get(IRB.getContext(), 32); Type *ElemTy = IntegerType::get(IRB.getContext(), MinWidth); for (unsigned i = 0; i < Src.size(); i++) { Type *Ty = Src[i]->getType(); unsigned Size = Ty->getPrimitiveSizeInBits(); assert((Size % MinWidth) == 0); if (Size > MinWidth) { VectorType *VecTy = VectorType::get(ElemTy, Size/MinWidth); Value *Casted = IRB.CreateBitCast(Src[i], VecTy); for (unsigned j = 0; j < Size/MinWidth; j++) Dst.push_back(IRB.CreateExtractElement(Casted, ConstantInt::get(IntTy, j))); } else { Dst.push_back(Src[i]); } } } } static void convertInstruction(Instruction *Inst, ConversionState &State, const DataLayout &DL) { DEBUG(dbgs() << "Expanding Large Integer: " << *Inst << "\n"); // Set the insert point *after* Inst, so that any instructions inserted here // will be visited again. That allows iterative expansion of types > i128. BasicBlock::iterator InsertPos(Inst); IRBuilder<> IRB(&*++InsertPos); StringRef Name = Inst->getName(); if (PHINode *Phi = dyn_cast(Inst)) { unsigned N = Phi->getNumIncomingValues(); TypePair OpTys = getExpandedIntTypes(Phi->getIncomingValue(0)->getType()); PHINode *Lo = IRB.CreatePHI(OpTys.Lo, N, Twine(Name + ".lo")); PHINode *Hi = IRB.CreatePHI(OpTys.Hi, N, Twine(Name + ".hi")); for (unsigned I = 0; I != N; ++I) { Value *InVal = Phi->getIncomingValue(I); if(!InVal) continue; BasicBlock *InBB = Phi->getIncomingBlock(I); // If the value hasn't already been converted then this is a // forward-reference PHI which needs to be patched up after RPO traversal. ValuePair Ops = State.hasConverted(InVal) ? 
State.getConverted(InVal) : State.recordForwardPHI(InVal, Lo, Hi, I); Lo->addIncoming(Ops.Lo, InBB); Hi->addIncoming(Ops.Hi, InBB); } State.recordConverted(Phi, ValuePair(Lo, Hi)); } else if (ZExtInst *ZExt = dyn_cast(Inst)) { Value *Operand = ZExt->getOperand(0); Type *OpTy = Operand->getType(); TypePair Tys = getExpandedIntTypes(Inst->getType()); Value *Lo, *Hi; if (OpTy->getIntegerBitWidth() <= kChunkBits) { Lo = IRB.CreateZExt(Operand, Tys.Lo, Twine(Name, ".lo")); Hi = ConstantInt::get(Tys.Hi, 0); } else { ValuePair Ops = State.getConverted(Operand); Lo = Ops.Lo; Hi = IRB.CreateZExt(Ops.Hi, Tys.Hi, Twine(Name, ".hi")); } State.recordConverted(ZExt, ValuePair(Lo, Hi)); } else if (TruncInst *Trunc = dyn_cast(Inst)) { Value *Operand = Trunc->getOperand(0); assert(shouldConvert(Operand) && "TruncInst is expandable but not its op"); TypePair OpTys = getExpandedIntTypes(Operand->getType()); ValuePair Ops = State.getConverted(Operand); if (!shouldConvert(Inst)) { Value *NewInst = IRB.CreateTrunc(Ops.Lo, Trunc->getType(), Name); State.recordConverted(Trunc, NewInst); } else { TypePair Tys = getExpandedIntTypes(Trunc->getType()); (void) OpTys; assert(Tys.Lo == OpTys.Lo); Value *Lo = Ops.Lo; Value *Hi = IRB.CreateTrunc(Ops.Hi, Tys.Hi, Twine(Name, ".hi")); State.recordConverted(Trunc, ValuePair(Lo, Hi)); } } else if (BitCastInst *Cast = dyn_cast(Inst)) { Value *Operand = Cast->getOperand(0); bool DstVec = Inst->getType()->isVectorTy(); Type *IntTy = IntegerType::get(Cast->getContext(), 32); if (DstVec) { // integer to vector, get all children and bitcast SmallVector Split; SmallVector Unified; getSplitedValue(State, Operand, Split); // unify element type, this is required by insertelement unifyElementType(IRB, Split, Unified); Value *vec = NULL; unsigned ElemNo = Unified.size(); Type *ElemTy = Unified[0]->getType(); for (unsigned i = 0; i < ElemNo; ++i) { Value *tmp = vec ? 
vec : UndefValue::get(VectorType::get(ElemTy, ElemNo)); Value *idx = ConstantInt::get(IntTy, i); vec = IRB.CreateInsertElement(tmp, Unified[i], idx); } if (vec->getType() != Cast->getType()) vec = IRB.CreateBitCast(vec, Cast->getType()); State.recordConverted(Cast, vec); } else { // vector to integer assert(Operand->getType()->isVectorTy()); VectorType *VecTy = cast(Operand->getType()); Type *LargeTy = Inst->getType(); Type *ElemTy = VecTy->getElementType(); unsigned ElemNo = VecTy->getNumElements(); Value * VectorRoot = NULL; unsigned ChildIndex = 0; if (State.convertedVector(Operand)) { VectorElement VE = State.getVectorMap(Operand); VectorRoot = VE.parent; ChildIndex = VE.childId; } else { for (unsigned i =0; i < ElemNo; i++) State.appendElement(Operand, IRB.CreateExtractElement(Operand, ConstantInt::get(IntTy, i)) ); VectorRoot = Operand; } TypePair OpTys = getExpandedIntTypes(LargeTy); Value *Lo, *Hi; unsigned LowNo = OpTys.Lo->getIntegerBitWidth() / ElemTy->getPrimitiveSizeInBits(); unsigned HighNo = OpTys.Hi->getIntegerBitWidth() / ElemTy->getPrimitiveSizeInBits(); SmallVector LoElems; for (unsigned i = 0; i < LowNo; ++i) LoElems.push_back(State.getElement(VectorRoot, i+ChildIndex)); Lo = IRB.CreateBitCast(buildVectorOrScalar(State, IRB, LoElems), OpTys.Lo, Twine(Name, ".lo")); SmallVector HiElem; for (unsigned i = 0; i < HighNo; ++i) HiElem.push_back(State.getElement(VectorRoot, i+LowNo+ChildIndex)); Value *NewVec = buildVectorOrScalar(State, IRB, HiElem); Hi = IRB.CreateBitCast(NewVec, OpTys.Hi); State.recordVectorMap(NewVec, VectorElement(VectorRoot, LowNo + ChildIndex)); State.recordConverted(Cast, ValuePair(Lo, Hi)); } } else if (BinaryOperator *Binop = dyn_cast(Inst)) { ValuePair Lhs = State.getConverted(Binop->getOperand(0)); ValuePair Rhs = State.getConverted(Binop->getOperand(1)); TypePair Tys = getExpandedIntTypes(Binop->getType()); Instruction::BinaryOps Op = Binop->getOpcode(); switch (Op) { case Instruction::And: case Instruction::Or: case Instruction::Xor: { Value *Lo = IRB.CreateBinOp(Op, Lhs.Lo, Rhs.Lo, Twine(Name, ".lo")); Value *Hi = IRB.CreateBinOp(Op, Lhs.Hi, Rhs.Hi, Twine(Name, ".hi")); State.recordConverted(Binop, ValuePair(Lo, Hi)); break; } case Instruction::Shl: { ConstantInt *ShlAmount = dyn_cast(Rhs.Lo); // TODO(dschuff): Expansion of variable-sized shifts isn't supported // because the behavior depends on whether the shift amount is less than // the size of the low part of the expanded type, and I haven't yet // figured out a way to do it for variable-sized shifts without splitting // the basic block. I don't believe it's actually necessary for // bitfields. Likewise for LShr below. if (!ShlAmount) { errs() << "Shift: " << *Binop << "\n"; report_fatal_error("Expansion of variable-sized shifts of > 64-bit-" "wide values is not supported"); } unsigned ShiftAmount = ShlAmount->getZExtValue(); if (ShiftAmount >= Binop->getType()->getIntegerBitWidth()) ShiftAmount = 0; // Undefined behavior. unsigned HiBits = Tys.Hi->getIntegerBitWidth(); // |<------------Hi---------->|<-------Lo------>| // | | | // +--------+--------+--------+--------+--------+ // |abcdefghijklmnopqrstuvwxyz|ABCDEFGHIJKLMNOPQ| // +--------+--------+--------+--------+--------+ // Possible shifts: // |efghijklmnopqrstuvwxyzABCD|EFGHIJKLMNOPQ0000| Some Lo into Hi. // |vwxyzABCDEFGHIJKLMNOPQ0000|00000000000000000| Lo is 0, keep some Hi. // |DEFGHIJKLMNOPQ000000000000|00000000000000000| Lo is 0, no Hi left. 
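// Concrete case for an i96 (Lo:i64, Hi:i32) shifted left by 8:
//   Lo' = Lo << 8
//   Hi' = (Hi << 8) | trunc(Lo >> 56)
// i.e. the top 8 bits of Lo migrate into the low bits of Hi, which is the
// first "Possible shifts" row above.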
Value *Lo, *Hi; if (ShiftAmount < kChunkBits) { Lo = IRB.CreateShl(Lhs.Lo, ShiftAmount, Twine(Name, ".lo")); Hi = IRB.CreateZExtOrTrunc(IRB.CreateLShr(Lhs.Lo, kChunkBits - ShiftAmount, Twine(Name, ".lo.shr")), Tys.Hi, Twine(Name, ".lo.ext")); } else { Lo = ConstantInt::get(Tys.Lo, 0); if (ShiftAmount == kChunkBits) { // Hi will be from Lo Hi = IRB.CreateZExtOrTrunc(Lhs.Lo, Tys.Hi, Twine(Name, ".lo.ext")); } else { Hi = IRB.CreateShl( IRB.CreateZExtOrTrunc(Lhs.Lo, Tys.Hi, Twine(Name, ".lo.ext")), ShiftAmount - kChunkBits, Twine(Name, ".lo.shl")); } } if (ShiftAmount < HiBits) Hi = IRB.CreateOr( Hi, IRB.CreateShl(Lhs.Hi, ShiftAmount, Twine(Name, ".hi.shl")), Twine(Name, ".or")); State.recordConverted(Binop, ValuePair(Lo, Hi)); break; } case Instruction::AShr: case Instruction::LShr: { ConstantInt *ShrAmount = dyn_cast(Rhs.Lo); // TODO(dschuff): Expansion of variable-sized shifts isn't supported // because the behavior depends on whether the shift amount is less than // the size of the low part of the expanded type, and I haven't yet // figured out a way to do it for variable-sized shifts without splitting // the basic block. I don't believe it's actually necessary for bitfields. if (!ShrAmount) { errs() << "Shift: " << *Binop << "\n"; report_fatal_error("Expansion of variable-sized shifts of > 64-bit-" "wide values is not supported"); } bool IsArith = Op == Instruction::AShr; unsigned ShiftAmount = ShrAmount->getZExtValue(); if (ShiftAmount >= Binop->getType()->getIntegerBitWidth()) ShiftAmount = 0; // Undefined behavior. unsigned HiBitWidth = Tys.Hi->getIntegerBitWidth(); // |<--Hi-->|<-------Lo------>| // | | | // +--------+--------+--------+ // |abcdefgh|ABCDEFGHIJKLMNOPQ| // +--------+--------+--------+ // Possible shifts (0 is sign when doing AShr): // |0000abcd|defgABCDEFGHIJKLM| Some Hi into Lo. // |00000000|00abcdefgABCDEFGH| Hi is 0, keep some Lo. // |00000000|000000000000abcde| Hi is 0, no Lo left. Value *Lo, *Hi; if (ShiftAmount == 0) { Lo = Lhs.Lo; Hi = Lhs.Hi; } else { if (ShiftAmount < kChunkBits) { Lo = IRB.CreateShl( IsArith ? IRB.CreateSExtOrTrunc(Lhs.Hi, Tys.Lo, Twine(Name, ".hi.ext")) : IRB.CreateZExtOrTrunc(Lhs.Hi, Tys.Lo, Twine(Name, ".hi.ext")), kChunkBits - ShiftAmount, Twine(Name, ".hi.shl")); Lo = IRB.CreateOr( Lo, IRB.CreateLShr(Lhs.Lo, ShiftAmount, Twine(Name, ".lo.shr")), Twine(Name, ".lo")); } else if (ShiftAmount == kChunkBits) { Lo = IsArith ? IRB.CreateSExtOrTrunc(Lhs.Hi, Tys.Lo, Twine(Name, ".hi.ext")) : IRB.CreateZExtOrTrunc(Lhs.Hi, Tys.Lo, Twine(Name, ".hi.ext")); } else { Lo = IRB.CreateBinOp(Op, Lhs.Hi, ConstantInt::get(Tys.Hi, ShiftAmount - kChunkBits), Twine(Name, ".hi.shr")); Lo = IsArith ? IRB.CreateSExtOrTrunc(Lo, Tys.Lo, Twine(Name, ".lo.ext")) : IRB.CreateZExtOrTrunc(Lo, Tys.Lo, Twine(Name, ".lo.ext")); } if (ShiftAmount < HiBitWidth) { Hi = IRB.CreateBinOp(Op, Lhs.Hi, ConstantInt::get(Tys.Hi, ShiftAmount), Twine(Name, ".hi")); } else { Hi = IsArith ? IRB.CreateAShr(Lhs.Hi, HiBitWidth - 1, Twine(Name, ".hi")) : ConstantInt::get(Tys.Hi, 0); } } State.recordConverted(Binop, ValuePair(Lo, Hi)); break; } case Instruction::Add: case Instruction::Sub: { Value *Lo, *Hi; if (Op == Instruction::Add) { Value *Limit = IRB.CreateSelect( IRB.CreateICmpULT(Lhs.Lo, Rhs.Lo, Twine(Name, ".cmp")), Rhs.Lo, Lhs.Lo, Twine(Name, ".limit")); // Don't propagate NUW/NSW to the lo operation: it can overflow. 
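// The carry itself: an unsigned add wraps exactly when the truncated sum is
// smaller than its operands, so comparing Lo against the larger input
// (Limit above) extracts the carry bit that is then added into the high
// half.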
Lo = IRB.CreateBinOp(Op, Lhs.Lo, Rhs.Lo, Twine(Name, ".lo")); Value *Carry = IRB.CreateZExt( IRB.CreateICmpULT(Lo, Limit, Twine(Name, ".overflowed")), Tys.Hi, Twine(Name, ".carry")); // TODO(jfb) The hi operation could be tagged with NUW/NSW. Hi = IRB.CreateBinOp( Op, IRB.CreateBinOp(Op, Lhs.Hi, Rhs.Hi, Twine(Name, ".hi")), Carry, Twine(Name, ".carried")); } else { Value *Borrowed = IRB.CreateSExt( IRB.CreateICmpULT(Lhs.Lo, Rhs.Lo, Twine(Name, ".borrow")), Tys.Hi, Twine(Name, ".borrowing")); Lo = IRB.CreateBinOp(Op, Lhs.Lo, Rhs.Lo, Twine(Name, ".lo")); Hi = IRB.CreateBinOp( Instruction::Add, IRB.CreateBinOp(Op, Lhs.Hi, Rhs.Hi, Twine(Name, ".hi")), Borrowed, Twine(Name, ".borrowed")); } State.recordConverted(Binop, ValuePair(Lo, Hi)); break; } default: errs() << "Operation: " << *Binop << "\n"; report_fatal_error("Unhandled BinaryOperator type in " "ExpandLargeIntegers"); } } else if (LoadInst *Load = dyn_cast(Inst)) { Value *Op = Load->getPointerOperand(); unsigned AddrSpace = Op->getType()->getPointerAddressSpace(); TypePair Tys = getExpandedIntTypes(Load->getType()); AlignPair Align = getAlign(DL, Load, Load->getType()); Value *Loty = IRB.CreateBitCast(Op, Tys.Lo->getPointerTo(AddrSpace), Twine(Op->getName(), ".loty")); Value *Lo = IRB.CreateAlignedLoad(Loty, Align.Lo, Twine(Load->getName(), ".lo")); Value *HiAddr = IRB.CreateConstGEP1_32(Loty, 1, Twine(Op->getName(), ".hi.gep")); Value *HiTy = IRB.CreateBitCast(HiAddr, Tys.Hi->getPointerTo(AddrSpace), Twine(Op->getName(), ".hity")); Value *Hi = IRB.CreateAlignedLoad(HiTy, Align.Hi, Twine(Load->getName(), ".hi")); State.recordConverted(Load, ValuePair(Lo, Hi)); } else if (StoreInst *Store = dyn_cast(Inst)) { Value *Ptr = Store->getPointerOperand(); unsigned AddrSpace = Ptr->getType()->getPointerAddressSpace(); TypePair Tys = getExpandedIntTypes(Store->getValueOperand()->getType()); ValuePair StoreVals = State.getConverted(Store->getValueOperand()); AlignPair Align = getAlign(DL, Store, Store->getValueOperand()->getType()); Value *Loty = IRB.CreateBitCast(Ptr, Tys.Lo->getPointerTo(AddrSpace), Twine(Ptr->getName(), ".loty")); Value *Lo = IRB.CreateAlignedStore(StoreVals.Lo, Loty, Align.Lo); Value *HiAddr = IRB.CreateConstGEP1_32(Loty, 1, Twine(Ptr->getName(), ".hi.gep")); Value *HiTy = IRB.CreateBitCast(HiAddr, Tys.Hi->getPointerTo(AddrSpace), Twine(Ptr->getName(), ".hity")); Value *Hi = IRB.CreateAlignedStore(StoreVals.Hi, HiTy, Align.Hi); State.recordConverted(Store, ValuePair(Lo, Hi)); } else if (ICmpInst *Icmp = dyn_cast(Inst)) { ValuePair Lhs = State.getConverted(Icmp->getOperand(0)); ValuePair Rhs = State.getConverted(Icmp->getOperand(1)); switch (Icmp->getPredicate()) { case CmpInst::ICMP_EQ: case CmpInst::ICMP_NE: { Value *Lo = IRB.CreateICmp(Icmp->getUnsignedPredicate(), Lhs.Lo, Rhs.Lo, Twine(Name, ".lo")); Value *Hi = IRB.CreateICmp(Icmp->getUnsignedPredicate(), Lhs.Hi, Rhs.Hi, Twine(Name, ".hi")); Value *Result = IRB.CreateBinOp(Instruction::And, Lo, Hi, Twine(Name, ".result")); State.recordConverted(Icmp, Result); break; } // TODO(jfb): Implement the following cases. 
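// One possible expansion for ICMP_ULT, if it were ever implemented:
//   lt = (Hi_a u< Hi_b) | ((Hi_a == Hi_b) & (Lo_a u< Lo_b))
// compare the high halves first and break ties with an unsigned compare of
// the low halves; signed predicates would differ only in the high-half
// comparison.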
case CmpInst::ICMP_UGT: case CmpInst::ICMP_UGE: case CmpInst::ICMP_ULT: case CmpInst::ICMP_ULE: case CmpInst::ICMP_SGT: case CmpInst::ICMP_SGE: case CmpInst::ICMP_SLT: case CmpInst::ICMP_SLE: errs() << "Comparison: " << *Icmp << "\n"; report_fatal_error("Comparisons other than equality not supported for " "integer types larger than 64 bit"); default: llvm_unreachable("Invalid integer comparison"); } } else if (SelectInst *Select = dyn_cast<SelectInst>(Inst)) { Value *Cond = Select->getCondition(); ValuePair True = State.getConverted(Select->getTrueValue()); ValuePair False = State.getConverted(Select->getFalseValue()); Value *Lo = IRB.CreateSelect(Cond, True.Lo, False.Lo, Twine(Name, ".lo")); Value *Hi = IRB.CreateSelect(Cond, True.Hi, False.Hi, Twine(Name, ".hi")); State.recordConverted(Select, ValuePair(Lo, Hi)); } else { errs() << "Instruction: " << *Inst << "\n"; report_fatal_error("Unhandled large integer expansion"); } } bool ExpandLargeIntegers::runOnFunction(Function &F) { // Don't support changing the function arguments. Illegal function arguments // should not be generated by clang. #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35 for (const Argument &Arg : F.args()) #else for (const Argument &Arg : F.getArgumentList()) #endif if (shouldConvert(&Arg)) report_fatal_error("Function " + F.getName() + " has illegal integer argument"); // TODO(jfb) This should loop to handle nested forward PHIs. ConversionState State; DataLayout DL(F.getParent()); bool Modified = false; ReversePostOrderTraversal<Function *> RPOT(&F); for (ReversePostOrderTraversal<Function *>::rpo_iterator FI = RPOT.begin(), FE = RPOT.end(); FI != FE; ++FI) { BasicBlock *BB = *FI; for (Instruction &I : *BB) { // Only attempt to convert an instruction if its result or any of its // operands are illegal. bool ShouldConvert = shouldConvert(&I); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35 for (Value *Op : I.operands()) ShouldConvert |= shouldConvert(Op); #else for (auto it = I.op_begin(); it != I.op_end(); it++) ShouldConvert |= shouldConvert(*it); #endif if (ShouldConvert) { convertInstruction(&I, State, DL); Modified = true; } } } State.patchForwardPHIs(); State.eraseReplacedInstructions(); return Modified; } FunctionPass *llvm::createExpandLargeIntegersPass() { return new ExpandLargeIntegers(); } Beignet-1.3.2-Source/backend/src/llvm/llvm_passes.cpp000664 001750 001750 00000030567 13173554000 021655 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia * Heldge RHodin */ /** * \file llvm_passes.cpp * \author Benjamin Segovia * \author Heldge RHodin */ /* THIS CODE IS DERIVED FROM GPL LLVM PTX BACKEND.
CODE IS HERE: * http://sourceforge.net/scm/?type=git&group_id=319085 * Note that however, the original author, Heldge Rhodin, granted me (Benjamin * Segovia) the right to use another license for it (MIT here) */ #include "llvm_includes.hpp" #include "llvm/llvm_gen_backend.hpp" #include "ir/unit.hpp" #include "sys/map.hpp" using namespace llvm; namespace gbe { bool isKernelFunction(const llvm::Function &F) { bool bKernel = false; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 bKernel = F.getMetadata("kernel_arg_name") != NULL; #else const Module *module = F.getParent(); const Module::NamedMDListType& globalMD = module->getNamedMDList(); for(auto i = globalMD.begin(); i != globalMD.end(); i++) { const NamedMDNode &md = *i; if(strcmp(md.getName().data(), "opencl.kernels") != 0) continue; uint32_t ops = md.getNumOperands(); for(uint32_t x = 0; x < ops; x++) { MDNode* node = md.getOperand(x); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR <= 35 Value * op = node->getOperand(0); #else Value * op = cast(node->getOperand(0))->getValue(); #endif if(op == &F) bKernel = true; } } #endif return bKernel; } uint32_t getModuleOclVersion(const llvm::Module *M) { uint32_t oclVersion = 120; NamedMDNode *version = M->getNamedMetadata("opencl.ocl.version"); if (version == NULL) return oclVersion; uint32_t ops = version->getNumOperands(); if(ops > 0) { uint32_t major = 0, minor = 0; MDNode* node = version->getOperand(0); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 36 major = mdconst::extract(node->getOperand(0))->getZExtValue(); minor = mdconst::extract(node->getOperand(1))->getZExtValue(); #else major = cast(node->getOperand(0))->getZExtValue(); minor = cast(node->getOperand(1))->getZExtValue(); #endif oclVersion = major * 100 + minor * 10; } return oclVersion; } int32_t getPadding(int32_t offset, int32_t align) { return (align - (offset % align)) % align; } uint32_t getAlignmentByte(const ir::Unit &unit, Type* Ty) { switch (Ty->getTypeID()) { case Type::VoidTyID: NOT_SUPPORTED; case Type::VectorTyID: { const VectorType* VecTy = cast(Ty); uint32_t elemNum = VecTy->getNumElements(); if (elemNum == 3) elemNum = 4; // OCL spec return elemNum * getTypeByteSize(unit, VecTy->getElementType()); } case Type::PointerTyID: case Type::IntegerTyID: case Type::FloatTyID: case Type::DoubleTyID: case Type::HalfTyID: return getTypeBitSize(unit, Ty)/8; case Type::ArrayTyID: return getAlignmentByte(unit, cast(Ty)->getElementType()); case Type::StructTyID: { const StructType* StrTy = cast(Ty); uint32_t maxa = 0; for(uint32_t subtype = 0; subtype < StrTy->getNumElements(); subtype++) { maxa = std::max(getAlignmentByte(unit, StrTy->getElementType(subtype)), maxa); } return maxa; } default: NOT_SUPPORTED; } return 0u; } uint32_t getTypeBitSize(const ir::Unit &unit, Type* Ty) { switch (Ty->getTypeID()) { case Type::VoidTyID: NOT_SUPPORTED; case Type::PointerTyID: return unit.getPointerSize(); case Type::IntegerTyID: { // use S16 to represent SLM bool variables. int bitWidth = cast(Ty)->getBitWidth(); return (bitWidth == 1) ? 
16 : bitWidth; } case Type::HalfTyID: return 16; case Type::FloatTyID: return 32; case Type::DoubleTyID: return 64; case Type::VectorTyID: { const VectorType* VecTy = cast(Ty); uint32_t numElem = VecTy->getNumElements(); if(numElem == 3) numElem = 4; // OCL spec return numElem * getTypeBitSize(unit, VecTy->getElementType()); } case Type::ArrayTyID: { const ArrayType* ArrTy = cast(Ty); Type* elementType = ArrTy->getElementType(); uint32_t size_element = getTypeBitSize(unit, elementType); uint32_t size = ArrTy->getNumElements() * size_element; uint32_t align = 8 * getAlignmentByte(unit, elementType); size += (ArrTy->getNumElements()-1) * getPadding(size_element, align); return size; } case Type::StructTyID: { const StructType* StrTy = cast(Ty); uint32_t size = 0; for(uint32_t subtype=0; subtype < StrTy->getNumElements(); subtype++) { Type* elementType = StrTy->getElementType(subtype); uint32_t align = 8 * getAlignmentByte(unit, elementType); size += getPadding(size, align); size += getTypeBitSize(unit, elementType); } return size; } default: NOT_SUPPORTED; } return 0u; } uint32_t getTypeByteSize(const ir::Unit &unit, Type* Ty) { uint32_t size_bit = getTypeBitSize(unit, Ty); assert((size_bit%8==0) && "no multiple of 8"); return size_bit/8; } Type* getEltType(Type* eltTy, uint32_t index) { Type *elementType = NULL; if (PointerType* ptrType = dyn_cast(eltTy)) elementType = ptrType->getElementType(); else if(SequentialType * seqType = dyn_cast(eltTy)) elementType = seqType->getElementType(); else if(CompositeType * compTy= dyn_cast(eltTy)) elementType = compTy->getTypeAtIndex(index); GBE_ASSERT(elementType); return elementType; } int32_t getGEPConstOffset(const ir::Unit &unit, Type *eltTy, int32_t TypeIndex) { int32_t offset = 0; if (!eltTy->isStructTy()) { if (TypeIndex != 0) { Type *elementType = getEltType(eltTy); uint32_t elementSize = getTypeByteSize(unit, elementType); uint32_t align = getAlignmentByte(unit, elementType); elementSize += getPadding(elementSize, align); offset = elementSize * TypeIndex; } } else { int32_t step = TypeIndex > 0 ? 
1 : -1; for(int32_t ty_i=0; ty_i != TypeIndex; ty_i += step) { Type* elementType = getEltType(eltTy, ty_i); uint32_t align = getAlignmentByte(unit, elementType); offset += getPadding(offset, align * step); offset += getTypeByteSize(unit, elementType) * step; } //add padding for the accessed type const uint32_t align = getAlignmentByte(unit, getEltType(eltTy, TypeIndex)); offset += getPadding(offset, align * step); } return offset; } class GenRemoveGEPPasss : public BasicBlockPass { public: static char ID; GenRemoveGEPPasss(const ir::Unit &unit) : BasicBlockPass(ID), unit(unit) {} const ir::Unit &unit; void getAnalysisUsage(AnalysisUsage &AU) const { AU.setPreservesCFG(); } #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 virtual StringRef getPassName() const { #else virtual const char *getPassName() const { #endif return "SPIR backend: insert special spir instructions"; } bool simplifyGEPInstructions(GetElementPtrInst* GEPInst); virtual bool runOnBasicBlock(BasicBlock &BB) { bool changedBlock = false; iplist<Instruction>::iterator I = BB.getInstList().begin(); for (auto nextI = I, E = --BB.getInstList().end(); I != E; I = nextI) { iplist<Instruction>::iterator I = nextI++; if(GetElementPtrInst* gep = dyn_cast<GetElementPtrInst>(&*I)) changedBlock = (simplifyGEPInstructions(gep) || changedBlock); } return changedBlock; } }; char GenRemoveGEPPasss::ID = 0; bool GenRemoveGEPPasss::simplifyGEPInstructions(GetElementPtrInst* GEPInst) { const uint32_t ptrSize = unit.getPointerSize(); Value* parentPointer = GEPInst->getOperand(0); Type* eltTy = parentPointer ? parentPointer->getType() : NULL; if(!eltTy) return false; Value* currentAddrInst = new PtrToIntInst(parentPointer, IntegerType::get(GEPInst->getContext(), ptrSize), "", GEPInst); int32_t constantOffset = 0; for(uint32_t op=1; op < GEPInst->getNumOperands(); ++op) { int32_t TypeIndex; ConstantInt* ConstOP = dyn_cast<ConstantInt>(GEPInst->getOperand(op)); if (ConstOP != NULL) { TypeIndex = ConstOP->getZExtValue(); constantOffset += getGEPConstOffset(unit, eltTy, TypeIndex); } else { // we only have array/vectors here, // therefore all elements have the same size TypeIndex = 0; Type* elementType = getEltType(eltTy); uint32_t size = getTypeByteSize(unit, elementType); //add padding uint32_t align = getAlignmentByte(unit, elementType); size += getPadding(size, align); Value *operand = GEPInst->getOperand(op); if(!operand) continue; #if 0 //HACK TODO: Inserted by type replacement.. this code could break something???? if(getTypeByteSize(unit, operand->getType())>4) { GBE_ASSERTM(false, "CHECK IT"); operand->dump(); //previous instruction is sext or zext instr.
ignore it CastInst *cast = dyn_cast(operand); if(cast && (isa(operand) || isa(operand))) { //hope that CastInst is a s/zext operand = cast->getOperand(0); } else { //trunctate operand = new TruncInst(operand, IntegerType::get(GEPInst->getContext(), ptrSize), "", GEPInst); } } #endif Value* tmpOffset = operand; if (size != 1) { if (isPowerOf<2>(size)) { Constant* shiftAmnt = ConstantInt::get(IntegerType::get(GEPInst->getContext(), ptrSize), logi2(size)); tmpOffset = BinaryOperator::Create(Instruction::Shl, operand, shiftAmnt, "", GEPInst); } else{ Constant* sizeConst = ConstantInt::get(IntegerType::get(GEPInst->getContext(), ptrSize), size); tmpOffset = BinaryOperator::Create(Instruction::Mul, sizeConst, operand, "", GEPInst); } } currentAddrInst = BinaryOperator::Create(Instruction::Add, currentAddrInst, tmpOffset, "", GEPInst); } //step down in type hirachy eltTy = getEltType(eltTy, TypeIndex); } //insert addition of new offset before GEPInst when it is not zero if (constantOffset != 0) { Constant* newConstOffset = ConstantInt::get(IntegerType::get(GEPInst->getContext(), ptrSize), constantOffset); currentAddrInst = BinaryOperator::Create(Instruction::Add, currentAddrInst, newConstOffset, "", GEPInst); } //convert offset to ptr type (nop) IntToPtrInst* intToPtrInst = new IntToPtrInst(currentAddrInst,GEPInst->getType(),"", GEPInst); //replace uses of the GEP instruction with the newly calculated pointer GEPInst->replaceAllUsesWith(intToPtrInst); GEPInst->dropAllReferences(); GEPInst->eraseFromParent(); return true; } BasicBlockPass *createRemoveGEPPass(const ir::Unit &unit) { return new GenRemoveGEPPasss(unit); } } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/llvm/ExpandUtils.cpp000664 001750 001750 00000011147 13173554000 021556 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ // Imported from pNaCl project // Copyright (c) 2003-2014 University of Illinois at Urbana-Champaign. // All rights reserved. // // Developed by: // // LLVM Team // // University of Illinois at Urbana-Champaign // // http://llvm.org // // Permission is hereby granted, free of charge, to any person obtaining a copy of // this software and associated documentation files (the "Software"), to deal with // the Software without restriction, including without limitation the rights to // use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies // of the Software, and to permit persons to whom the Software is furnished to do // so, subject to the following conditions: // // * Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // // * Redistributions in binary form must reproduce the above copyright notice, // this list of conditions and the following disclaimers in the // documentation and/or other materials provided with the distribution. 
// // * Neither the names of the LLVM Team, University of Illinois at // Urbana-Champaign, nor the names of its contributors may be used to // endorse or promote products derived from this Software without specific // prior written permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS // FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE // CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER // LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE // SOFTWARE. //===-- ExpandUtils.cpp - Helper functions for expansion passes -----------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. // //===----------------------------------------------------------------------===// #include "llvm_includes.hpp" #include "llvm_gen_backend.hpp" using namespace llvm; namespace llvm { Instruction *PhiSafeInsertPt(Use *U) { Instruction *InsertPt = cast(U->getUser()); if (PHINode *PN = dyn_cast(InsertPt)) { // We cannot insert instructions before a PHI node, so insert // before the incoming block's terminator. This could be // suboptimal if the terminator is a conditional. InsertPt = PN->getIncomingBlock(*U)->getTerminator(); } return InsertPt; } void PhiSafeReplaceUses(Use *U, Value *NewVal) { User *UR = U->getUser(); if (PHINode *PN = dyn_cast(UR)) { // A PHI node can have multiple incoming edges from the same // block, in which case all these edges must have the same // incoming value. BasicBlock *BB = PN->getIncomingBlock(*U); for (unsigned I = 0; I < PN->getNumIncomingValues(); ++I) { if (PN->getIncomingBlock(I) == BB) PN->setIncomingValue(I, NewVal); } } else { UR->replaceUsesOfWith(U->get(), NewVal); } } Function *RecreateFunction(Function *Func, FunctionType *NewType) { Function *NewFunc = Function::Create(NewType, Func->getLinkage()); NewFunc->copyAttributesFrom(Func); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 Func->getParent()->getFunctionList().insert(Func->getIterator(), NewFunc); #else Func->getParent()->getFunctionList().insert(ilist_iterator(Func), NewFunc); #endif NewFunc->takeName(Func); NewFunc->getBasicBlockList().splice(NewFunc->begin(), Func->getBasicBlockList()); Func->replaceAllUsesWith( ConstantExpr::getBitCast(NewFunc, Func->getFunctionType()->getPointerTo())); return NewFunc; } } Beignet-1.3.2-Source/backend/src/llvm/PromoteIntegers.cpp000664 001750 001750 00000063633 13173554000 022453 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ // Copyright (c) 2003-2014 University of Illinois at Urbana-Champaign. // All rights reserved. 
// // Developed by: // // LLVM Team // // University of Illinois at Urbana-Champaign // // http://llvm.org // // Permission is hereby granted, free of charge, to any person obtaining a copy of // this software and associated documentation files (the "Software"), to deal with // the Software without restriction, including without limitation the rights to // use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies // of the Software, and to permit persons to whom the Software is furnished to do // so, subject to the following conditions: // // * Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // // * Redistributions in binary form must reproduce the above copyright notice, // this list of conditions and the following disclaimers in the // documentation and/or other materials provided with the distribution. // // * Neither the names of the LLVM Team, University of Illinois at // Urbana-Champaign, nor the names of its contributors may be used to // endorse or promote products derived from this Software without specific // prior written permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS // FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE // CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER // LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE // SOFTWARE. //===- PromoteIntegers.cpp - Promote illegal integers for PNaCl ABI -------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. // // A limited set of transformations to promote illegal-sized int types. // //===----------------------------------------------------------------------===// // // Legal sizes are currently 1, 8, 16, 32, 64 (and higher, see note below). // Operations on illegal integers are changed to operate on the next-higher // legal size. // It maintains no invariants about the upper bits (above the size of the // original type); therefore before operations which can be affected by the // value of these bits (e.g. cmp, select, lshr), the upper bits of the operands // are cleared. // // Limitations: // 1) It can't change function signatures or global variables // 2) It won't promote (and can't expand) types larger than i64 // 3) Doesn't support div operators // 4) Doesn't handle arrays or structs with illegal types // 5) Doesn't handle constant expressions (it also doesn't produce them, so it // can run after ExpandConstantExpr) // //===----------------------------------------------------------------------===// #include "llvm_includes.hpp" #include "llvm_gen_backend.hpp" using namespace llvm; namespace { class PromoteIntegers : public FunctionPass { public: static char ID; PromoteIntegers() : FunctionPass(ID) { } virtual bool runOnFunction(Function &F); }; } char PromoteIntegers::ID = 0; // Legal sizes are currently 1, 8, 16, 32, and 64. // We can't yet expand types above 64 bit, so don't try to touch them for now. // TODO(dschuff): expand >64bit types or disallow >64bit packed bitfields. // There are currently none in our tests that use the ABI checker. 
// See https://code.google.com/p/nativeclient/issues/detail?id=3360 static bool isLegalSize(unsigned Size) { if (Size > 64) return true; return Size == 1 || Size == 8 || Size == 16 || Size == 32 || Size == 64; } static Type *getPromotedIntType(IntegerType *Ty) { unsigned Width = Ty->getBitWidth(); assert(Width <= 64 && "Don't know how to legalize >64 bit types yet"); if (isLegalSize(Width)) return Ty; return IntegerType::get(Ty->getContext(), Width < 8 ? 8 : NextPowerOf2(Width)); } // Return a legal integer type, promoting to a larger size if necessary. static Type *getPromotedType(Type *Ty) { assert(isa<IntegerType>(Ty) && "Trying to convert a non-integer type"); return getPromotedIntType(cast<IntegerType>(Ty)); } // Return true if Val is an int which should be converted. static bool shouldConvert(Value *Val) { Type *Ty = Val ? Val->getType() : NULL; if (IntegerType *ITy = dyn_cast<IntegerType>(Ty)) { if (!isLegalSize(ITy->getBitWidth())) { return true; } } return false; } // Return a constant which has been promoted to a legal size. static Value *convertConstant(Constant *C, bool SignExt=false) { assert(shouldConvert(C)); if (isa<UndefValue>(C)) { return UndefValue::get(getPromotedType(C->getType())); } else if (ConstantInt *CInt = dyn_cast<ConstantInt>(C)) { return ConstantInt::get( getPromotedType(C->getType()), SignExt ? CInt->getSExtValue() : CInt->getZExtValue(), /*isSigned=*/SignExt); } else { errs() << "Value: " << *C << "\n"; report_fatal_error("Unexpected constant value"); return NULL; } } namespace { // Holds the state for converting/replacing values. Conversion is done in one // pass, with each value requiring conversion possibly having two stages. When // an instruction needs to be replaced (i.e. it has illegal operands or result) // a new instruction is created, and the pass calls getConverted to get its // operands. If the original operand has already been converted, the new value // is returned. Otherwise, a placeholder is created and used in the new // instruction. After a new instruction is created to replace an illegal one, // recordConverted is called to register the replacement. All users are updated, // and if there is a placeholder, its users are also updated. // recordConverted also queues the old value for deletion. // This strategy avoids the need for recursion or worklists for conversion. class ConversionState { public: // Return the promoted value for Val. If Val has not yet been converted, // return a placeholder, which will be converted later. Value *getConverted(Value *Val) { if (!shouldConvert(Val)) return Val; if (isa<GlobalVariable>(Val)) report_fatal_error("Can't convert illegal GlobalVariables"); if (RewrittenMap.count(Val)) return RewrittenMap[Val]; // Directly convert constants. if (Constant *C = dyn_cast<Constant>(Val)) return convertConstant(C, /*SignExt=*/false); // No converted value available yet, so create a placeholder. Value *P = new Argument(getPromotedType(Val->getType())); RewrittenMap[Val] = P; Placeholders[Val] = P; return P; } // Replace the uses of From with To, replace the uses of any // placeholders for From, and optionally give From's name to To. // Also mark To for deletion. void recordConverted(Instruction *From, Value *To, bool TakeName=true) { ToErase.push_back(From); if (!shouldConvert(From)) { // From does not produce an illegal value, update its users in place. From->replaceAllUsesWith(To); } else { // From produces an illegal value, so its users will be replaced. When // replacements are created they will use values returned by getConverted.
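// Illustrative flow (hypothetical values): if "%b = add i48 %a, 1" is // visited before %a has a replacement, getConverted(%a) hands out a // placeholder Argument for %b's replacement to use; once %a is converted, // the branch below redirects the placeholder's uses to the real value.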
if (Placeholders.count(From)) { // Users of the placeholder can be updated in place. Placeholders[From]->replaceAllUsesWith(To); Placeholders.erase(From); } RewrittenMap[From] = To; } if (TakeName) { To->takeName(From); } } void eraseReplacedInstructions() { for (SmallVectorImpl<Instruction *>::iterator I = ToErase.begin(), E = ToErase.end(); I != E; ++I) (*I)->dropAllReferences(); for (SmallVectorImpl<Instruction *>::iterator I = ToErase.begin(), E = ToErase.end(); I != E; ++I) (*I)->eraseFromParent(); } private: // Maps illegal values to their new converted values (or placeholders // if no new value is available yet) DenseMap<Value *, Value *> RewrittenMap; // Maps illegal values with no conversion available yet to their placeholders DenseMap<Value *, Value *> Placeholders; // Illegal values which have already been converted, will be erased. SmallVector<Instruction *, 8> ToErase; }; } // anonymous namespace // Split an illegal load into multiple legal loads and return the resulting // promoted value. The size of the load is assumed to be a multiple of 8. static Value *splitLoad(LoadInst *Inst, ConversionState &State) { if (Inst->isVolatile() || Inst->isAtomic()) report_fatal_error("Can't split volatile/atomic loads"); if (cast<IntegerType>(Inst->getType())->getBitWidth() % 8 != 0) report_fatal_error("Loads must be a multiple of 8 bits"); unsigned AddrSpace = Inst->getPointerAddressSpace(); Value *OrigPtr = State.getConverted(Inst->getPointerOperand()); // OrigPtr is a placeholder in recursive calls, and so has no name if (OrigPtr->getName().empty()) OrigPtr->setName(Inst->getPointerOperand()->getName()); unsigned Width = cast<IntegerType>(Inst->getType())->getBitWidth(); Type *NewType = getPromotedType(Inst->getType()); unsigned LoWidth = Width; while (!isLegalSize(LoWidth)) LoWidth -= 8; IntegerType *LoType = IntegerType::get(Inst->getContext(), LoWidth); IntegerType *HiType = IntegerType::get(Inst->getContext(), Width - LoWidth); IRBuilder<> IRB(Inst); Value *BCLo = IRB.CreateBitCast( OrigPtr, LoType->getPointerTo(AddrSpace), OrigPtr->getName() + ".loty"); Value *LoadLo = IRB.CreateAlignedLoad( BCLo, Inst->getAlignment(), Inst->getName() + ".lo"); Value *LoExt = IRB.CreateZExt(LoadLo, NewType, LoadLo->getName() + ".ext"); Value *GEPHi = IRB.CreateConstGEP1_32(BCLo, 1, OrigPtr->getName() + ".hi"); Value *BCHi = IRB.CreateBitCast( GEPHi, HiType->getPointerTo(AddrSpace), OrigPtr->getName() + ".hity"); Value *LoadHi = IRB.CreateLoad(BCHi, Inst->getName() + ".hi"); if (!isLegalSize(Width - LoWidth)) { LoadHi = splitLoad(cast<LoadInst>(LoadHi), State); } Value *HiExt = IRB.CreateZExt(LoadHi, NewType, LoadHi->getName() + ".ext"); Value *HiShift = IRB.CreateShl(HiExt, LoWidth, HiExt->getName() + ".sh"); Value *Result = IRB.CreateOr(LoExt, HiShift); State.recordConverted(Inst, Result); return Result; } static Value *splitStore(StoreInst *Inst, ConversionState &State) { if (Inst->isVolatile() || Inst->isAtomic()) report_fatal_error("Can't split volatile/atomic stores"); if (cast<IntegerType>(Inst->getValueOperand()->getType())->getBitWidth() % 8 != 0) report_fatal_error("Stores must be a multiple of 8 bits"); unsigned AddrSpace = Inst->getPointerAddressSpace(); Value *OrigPtr = State.getConverted(Inst->getPointerOperand()); // OrigPtr is now a placeholder in recursive calls, and so has no name.
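// Illustrative example (assuming an i24 store promoted to i32): this emits // an i16 store of the low bits at byte offset 0 and an i8 store of // (value lshr 16) at byte offset 2; the recursive call below is only // needed while the remaining high part is itself an illegal size.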
if (OrigPtr->getName().empty()) OrigPtr->setName(Inst->getPointerOperand()->getName()); Value *OrigVal = State.getConverted(Inst->getValueOperand()); unsigned Width = cast( Inst->getValueOperand()->getType())->getBitWidth(); unsigned LoWidth = Width; while (!isLegalSize(LoWidth)) LoWidth -= 8; IntegerType *LoType = IntegerType::get(Inst->getContext(), LoWidth); IntegerType *HiType = IntegerType::get(Inst->getContext(), Width - LoWidth); IRBuilder<> IRB(Inst); Value *BCLo = IRB.CreateBitCast( OrigPtr, LoType->getPointerTo(AddrSpace), OrigPtr->getName() + ".loty"); Value *LoTrunc = IRB.CreateTrunc( OrigVal, LoType, OrigVal->getName() + ".lo"); IRB.CreateAlignedStore(LoTrunc, BCLo, Inst->getAlignment()); Value *HiLShr = IRB.CreateLShr( OrigVal, LoWidth, OrigVal->getName() + ".hi.sh"); Value *GEPHi = IRB.CreateConstGEP1_32(BCLo, 1, OrigPtr->getName() + ".hi"); Value *HiTrunc = IRB.CreateTrunc( HiLShr, HiType, OrigVal->getName() + ".hi"); Value *BCHi = IRB.CreateBitCast( GEPHi, HiType->getPointerTo(AddrSpace), OrigPtr->getName() + ".hity"); Value *StoreHi = IRB.CreateStore(HiTrunc, BCHi); if (!isLegalSize(Width - LoWidth)) { // HiTrunc is still illegal, and is redundant with the truncate in the // recursive call, so just get rid of it. State.recordConverted(cast(HiTrunc), HiLShr, /*TakeName=*/false); StoreHi = splitStore(cast(StoreHi), State); } State.recordConverted(Inst, StoreHi, /*TakeName=*/false); return StoreHi; } // Return a converted value with the bits of the operand above the size of the // original type cleared. static Value *getClearConverted(Value *Operand, Instruction *InsertPt, ConversionState &State) { if(!Operand) return Operand; Type *OrigType = Operand->getType(); Instruction *OrigInst = dyn_cast(Operand); Operand = State.getConverted(Operand); // If the operand is a constant, it will have been created by // ConversionState.getConverted, which zero-extends by default. if (isa(Operand)) return Operand; Instruction *NewInst = BinaryOperator::Create( Instruction::And, Operand, ConstantInt::get( getPromotedType(OrigType), APInt::getLowBitsSet(getPromotedType(OrigType)->getIntegerBitWidth(), OrigType->getIntegerBitWidth())), Operand->getName() + ".clear", InsertPt); if (OrigInst) CopyDebug(NewInst, OrigInst); return NewInst; } // Return a value with the bits of the operand above the size of the original // type equal to the sign bit of the original operand. The new operand is // assumed to have been legalized already. // This is done by shifting the sign bit of the smaller value up to the MSB // position in the larger size, and then arithmetic-shifting it back down. static Value *getSignExtend(Value *Operand, Value *OrigOperand, Instruction *InsertPt) { // If OrigOperand was a constant, NewOperand will have been created by // ConversionState.getConverted, which zero-extends by default. But that is // wrong here, so replace it with a sign-extended constant. 
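// Worked example of the shl/ashr trick used below (illustrative, i24 held // in an i32): shl 8 moves bit 23 into bit 31, ashr 8 replicates it back // down, e.g. 0x00800000 -> 0x80000000 -> 0xFF800000.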
if (Constant *C = dyn_cast<Constant>(OrigOperand)) return convertConstant(C, /*SignExt=*/true); Type *OrigType = OrigOperand->getType(); ConstantInt *ShiftAmt = ConstantInt::getSigned( cast<IntegerType>(getPromotedType(OrigType)), getPromotedType(OrigType)->getIntegerBitWidth() - OrigType->getIntegerBitWidth()); BinaryOperator *Shl = BinaryOperator::Create( Instruction::Shl, Operand, ShiftAmt, Operand->getName() + ".getsign", InsertPt); if (Instruction *Inst = dyn_cast<Instruction>(OrigOperand)) CopyDebug(Shl, Inst); return CopyDebug(BinaryOperator::Create( Instruction::AShr, Shl, ShiftAmt, Operand->getName() + ".signed", InsertPt), Shl); } static void convertInstruction(Instruction *Inst, ConversionState &State) { if (SExtInst *Sext = dyn_cast<SExtInst>(Inst)) { Value *Op = Sext->getOperand(0); Value *NewInst = NULL; // If the operand to be extended is illegal, we first need to fill its // upper bits with its sign bit. if (shouldConvert(Op)) { NewInst = getSignExtend(State.getConverted(Op), Op, Sext); } // If the converted type of the operand is the same as the converted // type of the result, we won't actually be changing the type of the // variable, just its value. if (getPromotedType(Op->getType()) != getPromotedType(Sext->getType())) { NewInst = CopyDebug(new SExtInst( NewInst ? NewInst : State.getConverted(Op), getPromotedType(cast<IntegerType>(Sext->getType())), Sext->getName() + ".sext", Sext), Sext); } assert(NewInst && "Failed to convert sign extension"); State.recordConverted(Sext, NewInst); } else if (ZExtInst *Zext = dyn_cast<ZExtInst>(Inst)) { Value *Op = Zext->getOperand(0); Value *NewInst = NULL; if (shouldConvert(Op)) { NewInst = getClearConverted(Op, Zext, State); } // If the converted type of the operand is the same as the converted // type of the result, we won't actually be changing the type of the // variable, just its value. if (getPromotedType(Op->getType()) != getPromotedType(Zext->getType())) { NewInst = CopyDebug(CastInst::CreateZExtOrBitCast( NewInst ? NewInst : State.getConverted(Op), getPromotedType(cast<IntegerType>(Zext->getType())), "", Zext), Zext); } assert(NewInst); State.recordConverted(Zext, NewInst); } else if (TruncInst *Trunc = dyn_cast<TruncInst>(Inst)) { Value *Op = Trunc->getOperand(0); Value *NewInst; // If the converted type of the operand is the same as the converted // type of the result, we don't actually need to change the type of the // variable, just its value. However, because we don't care about the values // of the upper bits until they are consumed, truncation can be a no-op. if (getPromotedType(Op->getType()) != getPromotedType(Trunc->getType())) { NewInst = CopyDebug(new TruncInst( State.getConverted(Op), getPromotedType(cast<IntegerType>(Trunc->getType())), State.getConverted(Op)->getName() + ".trunc", Trunc), Trunc); } else { NewInst = State.getConverted(Op); } State.recordConverted(Trunc, NewInst); } else if (LoadInst *Load = dyn_cast<LoadInst>(Inst)) { if (shouldConvert(Load)) { splitLoad(Load, State); } } else if (StoreInst *Store = dyn_cast<StoreInst>(Inst)) { if (shouldConvert(Store->getValueOperand())) { splitStore(Store, State); } } else if (isa<CallInst>(Inst)) { report_fatal_error("can't convert calls with illegal types"); } else if (BinaryOperator *Binop = dyn_cast<BinaryOperator>(Inst)) { Value *NewInst = NULL; switch (Binop->getOpcode()) { case Instruction::AShr: { // The AShr operand needs to be sign-extended to the promoted size // before shifting. Because the sign-extension is implemented // with AShr, it can be combined with the original operation.
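// Illustrative example (i24 promoted to i32): "ashr i24 %x, 3" becomes // "shl i32 %x, 8" followed by a single "ashr i32 %t, 11", folding the // sign-extension shift (8) into the requested shift amount (3).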
Value *Op = Binop->getOperand(0); Value *ShiftAmount = NULL; APInt SignShiftAmt = APInt( getPromotedType(Op->getType())->getIntegerBitWidth(), getPromotedType(Op->getType())->getIntegerBitWidth() - Op->getType()->getIntegerBitWidth()); NewInst = CopyDebug(BinaryOperator::Create( Instruction::Shl, State.getConverted(Op), ConstantInt::get(getPromotedType(Op->getType()), SignShiftAmt), State.getConverted(Op)->getName() + ".getsign", Binop), Binop); if (ConstantInt *C = dyn_cast( State.getConverted(Binop->getOperand(1)))) { ShiftAmount = ConstantInt::get(getPromotedType(Op->getType()), SignShiftAmt + C->getValue()); } else { // Clear the upper bits of the original shift amount, and add back the // amount we shifted to get the sign bit. ShiftAmount = getClearConverted(Binop->getOperand(1), Binop, State); ShiftAmount = CopyDebug(BinaryOperator::Create( Instruction::Add, ShiftAmount, ConstantInt::get( getPromotedType(Binop->getOperand(1)->getType()), SignShiftAmt), State.getConverted(Op)->getName() + ".shamt", Binop), Binop); } NewInst = CopyDebug(BinaryOperator::Create( Instruction::AShr, NewInst, ShiftAmount, Binop->getName() + ".result", Binop), Binop); break; } case Instruction::LShr: case Instruction::Shl: { // For LShr, clear the upper bits of the operand before shifting them // down into the valid part of the value. Value *Op = Binop->getOpcode() == Instruction::LShr ? getClearConverted(Binop->getOperand(0), Binop, State) : State.getConverted(Binop->getOperand(0)); NewInst = BinaryOperator::Create( Binop->getOpcode(), Op, // Clear the upper bits of the shift amount. getClearConverted(Binop->getOperand(1), Binop, State), Binop->getName() + ".result", Binop); break; } case Instruction::Add: case Instruction::Sub: case Instruction::Mul: case Instruction::And: case Instruction::Or: case Instruction::Xor: // These operations don't care about the state of the upper bits. NewInst = CopyDebug(BinaryOperator::Create( Binop->getOpcode(), State.getConverted(Binop->getOperand(0)), State.getConverted(Binop->getOperand(1)), Binop->getName() + ".result", Binop), Binop); break; case Instruction::FAdd: case Instruction::FSub: case Instruction::FMul: case Instruction::UDiv: case Instruction::SDiv: case Instruction::FDiv: case Instruction::URem: case Instruction::SRem: case Instruction::FRem: case Instruction::BinaryOpsEnd: // We should not see FP operators here. // We don't handle div. errs() << *Inst << "\n"; llvm_unreachable("Cannot handle binary operator"); break; } if (isa(NewInst)) { cast(NewInst)->setHasNoUnsignedWrap( Binop->hasNoUnsignedWrap()); cast(NewInst)->setHasNoSignedWrap( Binop->hasNoSignedWrap()); } State.recordConverted(Binop, NewInst); } else if (ICmpInst *Cmp = dyn_cast(Inst)) { Value *Op0, *Op1; // For signed compares, operands are sign-extended to their // promoted type. For unsigned or equality compares, the upper bits are // cleared. 
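// Illustrative example (i4 promoted to i8): for the i4 value -1, a signed // compare must see 0xFF (sign bits filled in) while an unsigned or // equality compare must see 0x0F (upper bits cleared), so that whatever // garbage the promoted upper bits hold cannot affect the result.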
if (Cmp->isSigned()) { Op0 = getSignExtend(State.getConverted(Cmp->getOperand(0)), Cmp->getOperand(0), Cmp); Op1 = getSignExtend(State.getConverted(Cmp->getOperand(1)), Cmp->getOperand(1), Cmp); } else { Op0 = getClearConverted(Cmp->getOperand(0), Cmp, State); Op1 = getClearConverted(Cmp->getOperand(1), Cmp, State); } Instruction *NewInst = CopyDebug(new ICmpInst( Cmp, Cmp->getPredicate(), Op0, Op1, ""), Cmp); State.recordConverted(Cmp, NewInst); } else if (SelectInst *Select = dyn_cast(Inst)) { Instruction *NewInst = CopyDebug(SelectInst::Create( Select->getCondition(), State.getConverted(Select->getTrueValue()), State.getConverted(Select->getFalseValue()), "", Select), Select); State.recordConverted(Select, NewInst); } else if (PHINode *Phi = dyn_cast(Inst)) { PHINode *NewPhi = PHINode::Create( getPromotedType(Phi->getType()), Phi->getNumIncomingValues(), "", Phi); CopyDebug(NewPhi, Phi); for (unsigned I = 0, E = Phi->getNumIncomingValues(); I < E; ++I) { NewPhi->addIncoming(State.getConverted(Phi->getIncomingValue(I)), Phi->getIncomingBlock(I)); } State.recordConverted(Phi, NewPhi); } else if (SwitchInst *Switch = dyn_cast(Inst)) { Value *Condition = getClearConverted(Switch->getCondition(), Switch, State); SwitchInst *NewInst = SwitchInst::Create( Condition, Switch->getDefaultDest(), Switch->getNumCases(), Switch); CopyDebug(NewInst, Switch); for (SwitchInst::CaseIt I = Switch->case_begin(), E = Switch->case_end(); I != E; ++I) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 50 NewInst->addCase(cast(convertConstant(I->getCaseValue())), I->getCaseSuccessor()); #else NewInst->addCase(cast(convertConstant(I.getCaseValue())), I.getCaseSuccessor()); #endif } Switch->eraseFromParent(); } else { errs() << *Inst<<"\n"; llvm_unreachable("unhandled instruction"); } } bool PromoteIntegers::runOnFunction(Function &F) { // Don't support changing the function arguments. This should not be // generated by clang. for (Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I) { Value *Arg = &*I; if (shouldConvert(Arg)) { errs() << "Function " << F.getName() << ": " << *Arg << "\n"; llvm_unreachable("Function has illegal integer/pointer argument"); } } ConversionState State; bool Modified = false; for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE; ++FI) { for (BasicBlock::iterator BBI = FI->begin(), BBE = FI->end(); BBI != BBE;) { Instruction *Inst = &*BBI++; // Only attempt to convert an instruction if its result or any of its // operands are illegal. bool ShouldConvert = shouldConvert(Inst); for (User::op_iterator OI = Inst->op_begin(), OE = Inst->op_end(); OI != OE; ++OI) ShouldConvert |= shouldConvert(cast(OI)); if (ShouldConvert) { convertInstruction(Inst, State); Modified = true; } } } State.eraseReplacedInstructions(); return Modified; } FunctionPass *llvm::createPromoteIntegersPass() { return new PromoteIntegers(); } Beignet-1.3.2-Source/backend/src/llvm/llvm_scalarize.cpp000664 001750 001750 00000077315 13173554000 022336 0ustar00yryr000000 000000 /** * \file llvm_scalarize.cpp * * This file is derived from: * https://code.google.com/p/lunarglass/source/browse/trunk/Core/Passes/Transforms/Scalarize.cpp?r=903 */ //===- Scalarize.cpp - Scalarize LunarGLASS IR ----------------------------===// // // LunarGLASS: An Open Modular Shader Compiler Architecture // Copyright (C) 2010-2014 LunarG, Inc. 
// // Redistribution and use in source and binary forms, with or without // modification, are permitted provided that the following conditions // are met: // // Redistributions of source code must retain the above copyright // notice, this list of conditions and the following disclaimer. // // Redistributions in binary form must reproduce the above // copyright notice, this list of conditions and the following // disclaimer in the documentation and/or other materials provided // with the distribution. // // Neither the name of LunarG Inc. nor the names of its // contributors may be used to endorse or promote products derived // from this software without specific prior written permission. // // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS // FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE // COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, // INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, // BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; // LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER // CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT // LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN // ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE // POSSIBILITY OF SUCH DAMAGE. // //===----------------------------------------------------------------------===// // // Author: Michael Ilseman, LunarG // //===----------------------------------------------------------------------===// // // Scalarize the IR. // * Loads of uniforms become multiple loadComponent calls // // * Reads/writes become read/writeComponent calls // // * Component-wise operations become multiple ops over each component // // * Texture call become recomponsed texture calls // // * Vector ops disappear, with their users referring to the scalarized // * components // //===----------------------------------------------------------------------===// #include "llvm_includes.hpp" #include "llvm/llvm_gen_backend.hpp" #include "sys/map.hpp" using namespace llvm; namespace gbe { struct VectorValues { VectorValues() : vals() { } void setComponent(int c, llvm::Value* val) { assert(c >= 0 && c < 32 && "Out of bounds component"); vals[c] = val; } llvm::Value* getComponent(int c) { assert(c >= 0 && c < 32 && "Out of bounds component"); assert(vals[c] && "Requesting non-existing component"); return vals[c]; } // {Value* x, Value* y, Value* z, Value* w} llvm::Value* vals[32]; }; class Scalarize : public FunctionPass { public: // Standard pass stuff static char ID; Scalarize() : FunctionPass(ID) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35 initializeDominatorTreeWrapperPassPass(*PassRegistry::getPassRegistry()); #else initializeDominatorTreePass(*PassRegistry::getPassRegistry()); #endif } virtual bool runOnFunction(Function&); void print(raw_ostream&, const Module* = 0) const; virtual void getAnalysisUsage(AnalysisUsage&) const; protected: // An instruction is valid post-scalarization iff it is fully scalar or it // is a gla_loadn bool isValid(const Instruction*); // Take an instruction that produces a vector, and scalarize it bool scalarize(Instruction*); bool scalarizePerComponent(Instruction*); bool scalarizeBitCast(BitCastInst *); bool scalarizeFuncCall(CallInst *); bool scalarizeLoad(LoadInst*); bool scalarizeStore(StoreInst*); 
//bool scalarizeIntrinsic(IntrinsicInst*); bool scalarizeExtract(ExtractElementInst*); bool scalarizeInsert(InsertElementInst*); bool scalarizeShuffleVector(ShuffleVectorInst*); bool scalarizePHI(PHINode*); void scalarizeArgs(Function& F); // ... // Helpers to make the actual multiple scalar calls, one per // component. Updates the given VectorValues's components with the new // Values. void makeScalarizedCalls(Function*, ArrayRef, int numComponents, VectorValues&); void makePerComponentScalarizedCalls(Instruction*, ArrayRef); // Makes a scalar form of the given instruction: replaces the operands // and chooses a correct return type Instruction* createScalarInstruction(Instruction* inst, ArrayRef); // Gather the specified components in the given values. Returns the // component if the given value is a vector, or the scalar itself. void gatherComponents(int component, ArrayRef args, SmallVectorImpl& componentArgs); // Get the assigned component for that value. If the value is a scalar, // returns the scalar. If it's a constant, returns that component. If // it's an instruction, returns the vectorValues of that instruction for // that component Value* getComponent(int component, Value*); // Used for assertion purposes. Whether we can get the component out with // a getComponent call bool canGetComponent(Value*); // Used for assertion purposes. Whether for every operand we can get // components with a getComponent call bool canGetComponentArgs(User*); // Delete the instruction in the deadList void dce(); int GetConstantInt(const Value* value); bool IsPerComponentOp(const Instruction* inst); bool IsPerComponentOp(const Value* value); //these function used to add extract and insert instructions when load/store etc. void extractFromVector(Value* insn); Value* InsertToVector(Value* insn, Value* vecValue); Type* GetBasicType(Value* value) { return GetBasicType(value->getType()); } Type* GetBasicType(Type* type) { if(!type) return type; switch(type->getTypeID()) { case Type::VectorTyID: case Type::ArrayTyID: return GetBasicType(type->getContainedType(0)); default: break; } return type; } int GetComponentCount(const Type* type) { if (type && type->getTypeID() == Type::VectorTyID) return llvm::dyn_cast(type)->getNumElements(); else return 1; } int GetComponentCount(const Value* value) { return GetComponentCount(value ? value->getType() : NULL); } /* set to insert new instructions after the specified instruction.*/ void setAppendPoint(Instruction *insn) { BasicBlock::iterator next(insn); builder->SetInsertPoint(&*++next); } DenseMap vectorVals; struct VecValElement{ VecValElement(VectorValues *v, uint32_t i) : vecVals(v), id(i) {} VectorValues *vecVals; uint32_t id; }; DenseMap> usedVecVals; void setComponent(VectorValues &vecVals, uint32_t c, llvm::Value* val) { vecVals.setComponent(c, val); usedVecVals[val].push_back(VecValElement(&vecVals, c)); } void replaceAllUsesOfWith(Instruction* from, Instruction* to); Module* module; IRBuilder<>* builder; Type* intTy; Type* floatTy; std::vector deadList; // List of vector phis that were not completely scalarized because some // of their operands hadn't before been visited (i.e. 
loop variant // variables) SmallVector incompletePhis; // Map for alloca vec uesd for Extractelememt < vec, alloca > std::map vectorAlloca; }; Value* Scalarize::getComponent(int component, Value* v) { assert(canGetComponent(v) && "getComponent called on unhandled vector"); if (v && v->getType() && v->getType()->isVectorTy()) { if (ConstantDataVector* c = dyn_cast(v)) { return c->getElementAsConstant(component); } else if (ConstantVector* c = dyn_cast(v)) { return c->getOperand(component); } else if (isa(v)) { return Constant::getNullValue(GetBasicType(v)); } else if (isa(v)) { return UndefValue::get(GetBasicType(v)); } else { return vectorVals[v].getComponent(component); } } else { return v; } } bool IsPerComponentOp(const llvm::Value* value) { const llvm::Instruction* inst = llvm::dyn_cast(value); return inst && IsPerComponentOp(inst); } bool Scalarize::IsPerComponentOp(const Instruction* inst) { if (const IntrinsicInst* intr = dyn_cast(inst)) { const Intrinsic::ID intrinsicID = (Intrinsic::ID) intr->getIntrinsicID(); switch (intrinsicID) { default: return false; case Intrinsic::sqrt: case Intrinsic::ceil: case Intrinsic::trunc: case Intrinsic::fmuladd: return true; } } if (inst->isTerminator()) return false; switch (inst->getOpcode()) { // Cast ops are only per-component if they cast back to the same vector // width case Instruction::Trunc: case Instruction::ZExt: case Instruction::SExt: case Instruction::FPToUI: case Instruction::FPToSI: case Instruction::UIToFP: case Instruction::SIToFP: case Instruction::FPTrunc: case Instruction::FPExt: case Instruction::PtrToInt: case Instruction::IntToPtr: case Instruction::BitCast: return GetComponentCount(inst->getOperand(0)) == GetComponentCount(inst); // Vector ops case Instruction::InsertElement: case Instruction::ExtractElement: case Instruction::ShuffleVector: // Ways of accessing/loading/storing vectors case Instruction::ExtractValue: case Instruction::InsertValue: // Memory ops case Instruction::Alloca: case Instruction::Load: case Instruction::Store: case Instruction::GetElementPtr: // Phis are a little special. We consider them not to be per-component // because the mechanism of choice is a single value (what path we took to // get here), and doesn't choose per-component (as select would). The caller // should know to handle phis specially case Instruction::PHI: // Call insts, conservatively are no per-component case Instruction::Call: // Misc case Instruction::LandingPad: //--- 3.0 case Instruction::VAArg: return false; } // end of switch (inst->getOpcode()) return true; } int Scalarize::GetConstantInt(const Value* value) { const ConstantInt *constantInt = dyn_cast(value); // this might still be a constant expression, rather than a numeric constant, // e.g., expression with undef's in it, so it was not folded if (! 
constantInt) NOT_IMPLEMENTED; //gla::UnsupportedFunctionality("non-simple constant"); return constantInt->getValue().getSExtValue(); } bool Scalarize::canGetComponent(Value* v) { if (v && v->getType() && v->getType()->isVectorTy()) { if (isa(v) || isa(v) || isa(v) || isa(v)) { return true; } else { assert((isa(v) || isa(v)) && "Non-constant non-instuction?"); return vectorVals.count(v); } } else { return true; } } bool Scalarize::canGetComponentArgs(User* u) { if (PHINode* phi = dyn_cast(u)) { for (unsigned int i = 0; i < phi->getNumIncomingValues(); ++i) if (!canGetComponent(phi->getIncomingValue(i))) return false; } else { for (User::op_iterator i = u->op_begin(), e = u->op_end(); i != e; ++i) if (!canGetComponent(*i)) return false; } return true; } void Scalarize::gatherComponents(int component, ArrayRef args, SmallVectorImpl& componentArgs) { componentArgs.clear(); for (ArrayRef::iterator i = args.begin(), e = args.end(); i != e; ++i) componentArgs.push_back(getComponent(component, *i)); } Instruction* Scalarize::createScalarInstruction(Instruction* inst, ArrayRef args) { // TODO: Refine the below into one large switch unsigned op = inst->getOpcode(); if (inst->isCast()) { assert(args.size() == 1 && "incorrect number of arguments for cast op"); return CastInst::Create((Instruction::CastOps)op, args[0], GetBasicType(inst)); } if (inst->isBinaryOp()) { assert(args.size() == 2 && "incorrect number of arguments for binary op"); return BinaryOperator::Create((Instruction::BinaryOps)op, args[0], args[1]); } if (PHINode* phi = dyn_cast(inst)) { PHINode* res = PHINode::Create(GetBasicType(inst), phi->getNumIncomingValues()); // Loop over pairs of operands: [Value*, BasicBlock*] for (unsigned int i = 0; i < args.size(); i++) { BasicBlock* bb = phi->getIncomingBlock(i); //dyn_cast(args[i+1]); //assert(bb && "Non-basic block incoming block?"); res->addIncoming(args[i], bb); } return res; } if (CmpInst* cmpInst = dyn_cast(inst)) { assert(args.size() == 2 && "incorrect number of arguments for comparison"); return CmpInst::Create(cmpInst->getOpcode(), cmpInst->getPredicate(), args[0], args[1]); } if (isa(inst)) { assert(args.size() == 3 && "incorrect number of arguments for select"); return SelectInst::Create(args[0], args[1], args[2]); } if (IntrinsicInst* intr = dyn_cast(inst)) { if (! IsPerComponentOp(inst)) NOT_IMPLEMENTED; //gla::UnsupportedFunctionality("Scalarize instruction on a non-per-component intrinsic"); // TODO: Assumption is that all per-component intrinsics have all their // arguments be overloadable. Need to find some way to assert on this // assumption. This is due to how getDeclaration operates; it only takes // a list of types that fit overloadable slots. 
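// Illustrative example: a per-component llvm.sqrt call on a <4 x float> // value is re-created here as four scalar calls against the llvm.sqrt.f32 // declaration, one per gathered component.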
SmallVector tys(1, GetBasicType(inst->getType())); // Call instructions have the decl as a last argument, so skip it SmallVector _args; for (ArrayRef::iterator i = args.begin(), e = args.end() - 1; i != e; ++i) { tys.push_back(GetBasicType((*i)->getType())); _args.push_back(*i); } Function* f = Intrinsic::getDeclaration(module, intr->getIntrinsicID(), tys); return CallInst::Create(f, _args); } NOT_IMPLEMENTED; //gla::UnsupportedFunctionality("Currently unsupported instruction: ", inst->getOpcode(), // inst->getOpcodeName()); return 0; } void Scalarize::makeScalarizedCalls(Function* f, ArrayRef args, int count, VectorValues& vVals) { assert(count > 0 && count <= 32 && "invalid number of vector components"); for (int i = 0; i < count; ++i) { Value* res; SmallVector callArgs(args.begin(), args.end()); callArgs.push_back(ConstantInt::get(intTy, i)); res = builder->CreateCall(f, callArgs); setComponent(vVals, i, res); } } void Scalarize::makePerComponentScalarizedCalls(Instruction* inst, ArrayRef args) { int count = GetComponentCount(inst); assert(count > 0 && count <= 32 && "invalid number of vector components"); assert((inst->getNumOperands() == args.size() || isa(inst)) && "not enough arguments passed for instruction"); VectorValues& vVals = vectorVals[inst]; for (int i = 0; i < count; ++i) { // Set this component of each arg SmallVector callArgs(args.size(), 0); gatherComponents(i, args, callArgs); Instruction* res = createScalarInstruction(inst, callArgs); setComponent(vVals, i, res); builder->Insert(res); } } bool Scalarize::isValid(const Instruction* inst) { // The result if (inst->getType()->isVectorTy()) return false; // The arguments for (Instruction::const_op_iterator i = inst->op_begin(), e = inst->op_end(); i != e; ++i) { const Value* v = (*i); assert(v); if (v->getType()->isVectorTy()) return false; } return true; } bool Scalarize::scalarize(Instruction* inst) { if (isValid(inst)) return false; assert(! 
vectorVals.count(inst) && "We've already scalarized this somehow?"); assert((canGetComponentArgs(inst) || isa(inst)) && "Scalarizing an op whose arguments haven't been scalarized "); builder->SetInsertPoint(inst); if (IsPerComponentOp(inst)) return scalarizePerComponent(inst); //not Per Component bitcast, for example <2 * i8> -> i16, handle it in backend if (BitCastInst* bt = dyn_cast(inst)) return scalarizeBitCast(bt); if (LoadInst* ld = dyn_cast(inst)) return scalarizeLoad(ld); if (CallInst* call = dyn_cast(inst)) return scalarizeFuncCall(call); if (ExtractElementInst* extr = dyn_cast(inst)) return scalarizeExtract(extr); if (InsertElementInst* ins = dyn_cast(inst)) return scalarizeInsert(ins); if (ShuffleVectorInst* sv = dyn_cast(inst)) return scalarizeShuffleVector(sv); if (PHINode* phi = dyn_cast(inst)) return scalarizePHI(phi); if (isa(inst) || isa(inst)) // TODO: need to come up with a struct/array model for scalarization NOT_IMPLEMENTED; //gla::UnsupportedFunctionality("Scalarizing struct/array ops"); if (StoreInst* st = dyn_cast(inst)) return scalarizeStore(st); NOT_IMPLEMENTED; //gla::UnsupportedFunctionality("Currently unhandled instruction ", inst->getOpcode(), inst->getOpcodeName()); return false; } bool Scalarize::scalarizeShuffleVector(ShuffleVectorInst* sv) { // %res = shuffleVector %foo, bar, <...> // ==> nothing (just make a new VectorValues with the new components) VectorValues& vVals = vectorVals[sv]; int size = GetComponentCount(sv); Value* Op0 = sv->getOperand(0); if(!Op0) return false; int srcSize = GetComponentCount(Op0->getType()); for (int i = 0; i < size; ++i) { int select = sv->getMaskValue(i); if (select < 0) { setComponent(vVals, i, UndefValue::get(GetBasicType(Op0))); continue; } // Otherwise look up the corresponding component from the correct // source. Value* selectee; if (select < srcSize) { selectee = sv->getOperand(0); } else { // Choose from the second operand select -= srcSize; selectee = sv->getOperand(1); } setComponent(vVals, i, getComponent(select, selectee)); } return true; } bool Scalarize::scalarizePerComponent(Instruction* inst) { // dst = op %foo, %bar // ==> dstx = op ty %foox, ty %barx // dsty = op ty %fooy, ty %bary // ... SmallVector args(inst->op_begin(), inst->op_end()); makePerComponentScalarizedCalls(inst, args); return true; } bool Scalarize::scalarizePHI(PHINode* phi) { // dst = phi [ %foo, %bb1 ], [ %bar, %bb2], ... // ==> dstx = phi ty [ %foox, %bb1 ], [ %barx, %bb2], ... // dsty = phi ty [ %fooy, %bb1 ], [ %bary, %bb2], ... // If the scalar values are all known up-front, then just make the full // phinode now. If they are not yet known (phinode for a loop variant // variable), then deferr the arguments until later if (canGetComponentArgs(phi)) { SmallVector args(phi->op_begin(), phi->op_end()); makePerComponentScalarizedCalls(phi, args); } else { makePerComponentScalarizedCalls(phi, ArrayRef()); incompletePhis.push_back(phi); } return true; } void Scalarize::extractFromVector(Value* insn) { VectorValues& vVals = vectorVals[insn]; for (int i = 0; i < GetComponentCount(insn); ++i) { Value *cv = ConstantInt::get(intTy, i); Value *EI = builder->CreateExtractElement(insn, cv); setComponent(vVals, i, EI); } } Value* Scalarize::InsertToVector(Value * insn, Value* vecValue) { //VectorValues& vVals = vectorVals[writeValue]; //add fake insert instructions to avoid removed Value *II = NULL; for (int i = 0; i < GetComponentCount(vecValue); ++i) { Value *vec = II ? 
II : UndefValue::get(vecValue->getType()); Value *cv = ConstantInt::get(intTy, i); II = builder->CreateInsertElement(vec, getComponent(i, vecValue), cv); } return II; } bool Scalarize::scalarizeFuncCall(CallInst* call) { if (Function *F = call->getCalledFunction()) { if (F->getIntrinsicID() != 0) { //Intrinsic functions const Intrinsic::ID intrinsicID = (Intrinsic::ID) F->getIntrinsicID(); switch (intrinsicID) { default: GBE_ASSERTM(false, "Unsupported Intrinsic"); case Intrinsic::sqrt: case Intrinsic::ceil: case Intrinsic::trunc: { scalarizePerComponent(call); } break; } } else { Value *Callee = call->getCalledValue(); const std::string fnName = Callee->getName(); auto genIntrinsicID = intrinsicMap.find(fnName); // Get the function arguments CallSite CS(call); CallSite::arg_iterator CI = CS.arg_begin() + 1; switch (genIntrinsicID) { case GEN_OCL_NOT_FOUND: default: break; case GEN_OCL_READ_IMAGE_I: case GEN_OCL_READ_IMAGE_UI: case GEN_OCL_READ_IMAGE_F: { ++CI; if ((*CI)->getType()->isVectorTy()) *CI = InsertToVector(call, *CI); setAppendPoint(call); extractFromVector(call); break; } case GEN_OCL_WRITE_IMAGE_I: case GEN_OCL_WRITE_IMAGE_UI: case GEN_OCL_WRITE_IMAGE_F: { if ((*CI)->getType()->isVectorTy()) *CI = InsertToVector(call, *CI); ++CI; *CI = InsertToVector(call, *CI); break; } case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE2: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE4: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE8: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE2: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE4: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE8: { ++CI; ++CI; if ((*CI)->getType()->isVectorTy()) *CI = InsertToVector(call, *CI); break; } case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM2: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM4: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM8: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM2: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM4: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM8: { if ((*CI)->getType()->isVectorTy()) *CI = InsertToVector(call, *CI); break; } case GEN_OCL_VME: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_MEM2: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_MEM4: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_MEM8: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_IMAGE2: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_IMAGE4: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_IMAGE8: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_MEM2: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_MEM4: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_MEM8: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_IMAGE2: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_IMAGE4: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_IMAGE8: setAppendPoint(call); extractFromVector(call); break; case GEN_OCL_PRINTF: for (; CI != CS.arg_end(); ++CI) if ((*CI)->getType()->isVectorTy()) *CI = InsertToVector(call, *CI); break; } } } return false; } bool Scalarize::scalarizeBitCast(BitCastInst* bt) { if(bt->getOperand(0)->getType()->isVectorTy()) bt->setOperand(0, InsertToVector(bt, bt->getOperand(0))); if(bt->getType()->isVectorTy()) { setAppendPoint(bt); extractFromVector(bt); } return false; } bool Scalarize::scalarizeLoad(LoadInst* ld) { setAppendPoint(ld); extractFromVector(ld); return false; } bool Scalarize::scalarizeStore(StoreInst* st) { st->setOperand(0, InsertToVector(st, st->getValueOperand())); return false; } void Scalarize::replaceAllUsesOfWith(Instruction* from, Instruction* to) { GBE_ASSERT(from != NULL); if (from == to) return; 
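// Re-point every recorded (VectorValues, component) slot that still refers to
// 'from' at 'to', then fall through to the ordinary replaceAllUsesWith.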
for (auto &it : usedVecVals[from])
    setComponent(*(it.vecVals), it.id, to);
  usedVecVals[from].clear();
  from->replaceAllUsesWith(to);
}

bool Scalarize::scalarizeExtract(ExtractElementInst* extr)
{
  // %res = extractelement %foo, %i
  // ==> nothing (just use %foo's %ith component instead of %res)
  if (! isa<Constant>(extr->getOperand(1))) {
    // TODO: Variably referenced components. Probably handle/emulate through
    //       a series of selects.
    //NOT_IMPLEMENTED; //gla::UnsupportedFunctionality("Variably referenced vector components");
    // TODO: This is an implementation for the non-constant index: we use a
    // newly allocated vector to store the needed vector elements.
    Value* foo = extr->getOperand(0);
    Type* fooTy = foo ? foo->getType() : NULL;
    Value* Alloc;
    if (vectorAlloca.find(foo) == vectorAlloca.end()) {
      BasicBlock &entry = extr->getParent()->getParent()->getEntryBlock();
      BasicBlock::iterator bbIter = entry.begin();
      while (isa<AllocaInst>(bbIter))
        ++bbIter;
      IRBuilder<> allocBuilder(&entry);
      allocBuilder.SetInsertPoint(&*bbIter);
      Alloc = allocBuilder.CreateAlloca(fooTy, nullptr, "");
      for (int i = 0; i < GetComponentCount(foo); ++i) {
        Value* foo_i = getComponent(i, foo);
        assert(foo_i && "There is unhandled vector component");
        Value* idxs_i[] = {ConstantInt::get(intTy, 0), ConstantInt::get(intTy, i)};
        Value* storePtr_i = builder->CreateGEP(Alloc, idxs_i);
        builder->CreateStore(foo_i, storePtr_i);
      }
      vectorAlloca[foo] = Alloc;
    } else
      Alloc = vectorAlloca[foo];
    Value* Idxs[] = {ConstantInt::get(intTy, 0), extr->getOperand(1)};
    Value* getPtr = builder->CreateGEP(Alloc, Idxs);
    Value* loadComp = builder->CreateLoad(getPtr);
    extr->replaceAllUsesWith(loadComp);
    return true;
  }
  //if (isa(extr->getOperand(0)))
  //  return false;
  else {
    int component = GetConstantInt(extr->getOperand(1));
    Value* v = getComponent(component, extr->getOperand(0));
    if (extr == v)
      return false;
    replaceAllUsesOfWith(dyn_cast<Instruction>(extr), dyn_cast<Instruction>(v));
    return true;
  }
}

bool Scalarize::scalarizeInsert(InsertElementInst* ins)
{
  // %res = insertValue %foo, %i
  // ==> nothing (just make a new VectorValues with the new component)
  if (! isa<Constant>(ins->getOperand(2))) {
    // TODO: Variably referenced components. Probably handle/emulate through
    //       a series of selects.
    NOT_IMPLEMENTED; //gla::UnsupportedFunctionality("Variably referenced vector components");
  }
  int component = GetConstantInt(ins->getOperand(2));
  VectorValues& vVals = vectorVals[ins];
  for (int i = 0; i < GetComponentCount(ins); ++i) {
    setComponent(vVals, i, i == component ? ins->getOperand(1)
                                          : getComponent(i, ins->getOperand(0)));
  }
  return true;
}

void Scalarize::scalarizeArgs(Function& F)
{
  if (F.arg_empty())
    return;
  ReversePostOrderTraversal<Function*> rpot(&F);
  BasicBlock::iterator instI = (*rpot.begin())->begin();
  Instruction* instVal = &*instI;
  if (instVal == nullptr)
    return;
  builder->SetInsertPoint(instVal);
  Function::arg_iterator I = F.arg_begin(), E = F.arg_end();
  for (; I != E; ++I) {
    Type *type = I->getType();
    if (type->isVectorTy())
      extractFromVector(&*I);
  }
  return;
}

bool Scalarize::runOnFunction(Function& F)
{
  switch (F.getCallingConv()) {
  case CallingConv::C:
  case CallingConv::Fast:
  case CallingConv::SPIR_KERNEL:
    break;
  default:
    GBE_ASSERTM(false, "Unsupported calling convention");
  }

  // As we inline all function calls, we can simply skip non-kernel functions
  bool bKernel = isKernelFunction(F);
  if (!bKernel)
    return false;

  bool changed = false;
  module = F.getParent();
  intTy = IntegerType::get(module->getContext(), 32);
  floatTy = Type::getFloatTy(module->getContext());
  builder = new IRBuilder<>(module->getContext());

  scalarizeArgs(F);
  typedef ReversePostOrderTraversal<Function*> RPOTType;
  RPOTType rpot(&F);
  for (RPOTType::rpo_iterator bbI = rpot.begin(), bbE = rpot.end(); bbI != bbE; ++bbI) {
    for (BasicBlock::iterator instI = (*bbI)->begin(), instE = (*bbI)->end(); instI != instE; ++instI) {
      bool scalarized = scalarize(&*instI);
      if (scalarized) {
        changed = true;
        // TODO: uncomment when done
        deadList.push_back(&*instI);
      }
    }
  }

  // Fill in the incomplete phis
  for (SmallVectorImpl<PHINode*>::iterator phiI = incompletePhis.begin(), phiE = incompletePhis.end();
       phiI != phiE; ++phiI) {
    assert(canGetComponentArgs(*phiI) && "Phi's operands never scalarized");
    // Fill in each component of this phi
    VectorValues& vVals = vectorVals[*phiI];
    for (int c = 0; c < GetComponentCount(*phiI); ++c) {
      PHINode* compPhi = dyn_cast<PHINode>(vVals.getComponent(c));
      assert(compPhi && "Vector phi got scalarized to non-phis?");
      // Loop over pairs of operands: [Value*, BasicBlock*]
      for (unsigned int i = 0; i < (*phiI)->getNumOperands(); i++) {
        BasicBlock* bb = (*phiI)->getIncomingBlock(i);
        assert(bb && "Non-basic block incoming block?");
        compPhi->addIncoming(getComponent(c, (*phiI)->getOperand(i)), bb);
      }
    }
  }

  dce();
  incompletePhis.clear();
  vectorVals.clear();
  usedVecVals.clear();
  vectorAlloca.clear();
  delete builder;
  builder = 0;
  return changed;
}

void Scalarize::dce()
{
  // two deletion passes, needed to fully drop some phi nodes
  for (std::vector<Instruction*>::reverse_iterator i = deadList.rbegin(), e = deadList.rend(); i != e; ++i) {
    (*i)->dropAllReferences();
    if ((*i)->use_empty()) {
      (*i)->eraseFromParent();
      (*i) = NULL;
    }
  }
  for (std::vector<Instruction*>::reverse_iterator i = deadList.rbegin(), e = deadList.rend(); i != e; ++i) {
    if ((*i) && (*i)->getParent())
      (*i)->eraseFromParent();
  }
  deadList.clear();
}

void Scalarize::getAnalysisUsage(AnalysisUsage& AU) const
{
}

void Scalarize::print(raw_ostream&, const Module*) const
{
  return;
}

FunctionPass* createScalarizePass()
{
  return new Scalarize();
}

char Scalarize::ID = 0;
} // end namespace
Beignet-1.3.2-Source/backend/src/llvm/llvm_barrier_nodup.cpp000664 001750 001750 00000006153 13173554000 023204 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 */

/**
 * \file llvm_barrier_nodup.cpp
 *
 * This pass removes or restores the NoDuplicate function attribute on the
 * barrier functions. Basically, we want NoDuplicate set on the
 * __gen_ocl_barrier_xxx functions. But if a sub-function calls one of those
 * barrier functions, the sub-function will not be inlined by llvm's inlining
 * pass, which is not what we want: inlining such a function into its caller
 * is safe, we only want to prevent duplication of the call. So we introduce
 * this pass to remove the NoDuplicate function attribute before the inlining
 * pass and to restore it afterwards.
 *
 */

#include "llvm_includes.hpp"
#include "llvm/llvm_gen_backend.hpp"
#include "sys/map.hpp"

using namespace llvm;

namespace gbe {
    class BarrierNodup : public ModulePass
    {
    public:
      static char ID;
      BarrierNodup(bool nodup) :
        ModulePass(ID), nodup(nodup) {}

      void getAnalysisUsage(AnalysisUsage &AU) const { }

#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40
      virtual StringRef getPassName() const
#else
      virtual const char *getPassName() const
#endif
      {
        return "SPIR backend: set barrier no duplicate attr";
      }

      virtual bool runOnModule(Module &M)
      {
        using namespace llvm;
        bool changed = false;
        for (auto &F : M) {
          if (F.getName() == "__gen_ocl_barrier_local_and_global" ||
              F.getName() == "__gen_ocl_barrier_local"            ||
              F.getName() == "__gen_ocl_barrier_global") {
            if (nodup) {
              if (!F.hasFnAttribute(Attribute::NoDuplicate)) {
                F.addFnAttr(Attribute::NoDuplicate);
                changed = true;
              }
            } else {
              if (F.hasFnAttribute(Attribute::NoDuplicate)) {
                auto attrs = F.getAttributes();
                F.setAttributes(attrs.removeAttribute(M.getContext(),
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 50
                                AttributeList::FunctionIndex,
#else
                                AttributeSet::FunctionIndex,
#endif
                                Attribute::NoDuplicate));
                changed = true;
              }
            }
          }
        }
        return changed;
      }

    private:
      bool nodup;
    };

    ModulePass *createBarrierNodupPass(bool Nodup)
    {
      return new BarrierNodup(Nodup);
    }

    char BarrierNodup::ID = 0;
} // end namespace
Beignet-1.3.2-Source/backend/src/llvm/llvm_sampler_fix.cpp000664 001750 001750 00000015155 13173554000 022664 0ustar00yryr000000 000000 /*
 * Copyright © 2014 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * This pass resolves __gen_ocl_sampler_need_fix() and
 * __gen_ocl_sampler_need_rounding_fix(): for some special
 * sampler types we need extra workaround operations to make
 * sure we get the correct pixel values back, while for other
 * samplers no workaround code is needed.
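 *
 * For example (an illustration, not taken from the source): a sampler built
 * from the compile-time constant
 *   CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST
 * lets __gen_ocl_sampler_need_fix() fold to a constant 'true' below, whereas
 * a sampler passed in as a kernel argument keeps the runtime mask-and-compare
 * sequence that this pass emits instead.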
 */

#include "llvm_includes.hpp"
#include "llvm_gen_backend.hpp"
#include "ocl_common_defines.h"

using namespace llvm;

namespace gbe {

  class SamplerFix : public FunctionPass {
  public:
    SamplerFix() : FunctionPass(ID) {
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35
      initializeDominatorTreeWrapperPassPass(*PassRegistry::getPassRegistry());
#else
      initializeDominatorTreePass(*PassRegistry::getPassRegistry());
#endif
    }

    bool visitCallInst(CallInst *I) {
      if (!I)
        return false;
      Value *Callee = I->getCalledValue();
      const std::string fnName = Callee->getName();
      bool changed = false;
      Type *boolTy = IntegerType::get(I->getContext(), 1);
      Type *i32Ty = IntegerType::get(I->getContext(), 32);

      if (fnName.compare("__gen_ocl_sampler_need_fix") == 0) {
        //  return (((sampler & __CLK_ADDRESS_MASK) == CLK_ADDRESS_CLAMP) &&
        //          ((sampler & __CLK_FILTER_MASK) == CLK_FILTER_NEAREST));
        bool needFix = true;
        Value *needFixVal;
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40
        CallInst *init = dyn_cast<CallInst>(I->getOperand(0));
        if (init && init->getCalledValue()->getName().compare("__translate_sampler_initializer")) {
          const ConstantInt *ci = dyn_cast<ConstantInt>(init->getOperand(0));
          uint32_t samplerInt = ci->getZExtValue();
#else
        if (dyn_cast<ConstantInt>(I->getOperand(0))) {
          const ConstantInt *ci = dyn_cast<ConstantInt>(I->getOperand(0));
          uint32_t samplerInt = ci->getZExtValue();
#endif
          needFix = ((samplerInt & __CLK_ADDRESS_MASK) == CLK_ADDRESS_CLAMP &&
                     (samplerInt & __CLK_FILTER_MASK) == CLK_FILTER_NEAREST);
          needFixVal = ConstantInt::get(boolTy, needFix);
        } else {
          IRBuilder<> Builder(I->getParent());
          Builder.SetInsertPoint(I);
          Value *addressMask = ConstantInt::get(i32Ty, __CLK_ADDRESS_MASK);
          Value *clampInt = ConstantInt::get(i32Ty, CLK_ADDRESS_CLAMP);
          Value *filterMask = ConstantInt::get(i32Ty, __CLK_FILTER_MASK);
          Value *nearestInt = ConstantInt::get(i32Ty, CLK_FILTER_NEAREST);
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40
          Module *M = I->getParent()->getParent()->getParent();
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 50
          Value* samplerCvt = M->getOrInsertFunction("__gen_ocl_sampler_to_int", i32Ty, I->getOperand(0)->getType());
#else
          Value* samplerCvt = M->getOrInsertFunction("__gen_ocl_sampler_to_int", i32Ty, I->getOperand(0)->getType(), nullptr);
#endif
          Value *samplerVal = Builder.CreateCall(samplerCvt, {I->getOperand(0)});
#else
          Value *samplerVal = I->getOperand(0);
#endif
          Value *addressMode = Builder.CreateAnd(samplerVal, addressMask);
          Value *isClampMode = Builder.CreateICmpEQ(addressMode, clampInt);
          Value *filterMode = Builder.CreateAnd(samplerVal, filterMask);
          Value *isNearestMode = Builder.CreateICmpEQ(filterMode, nearestInt);
          needFixVal = Builder.CreateAnd(isClampMode, isNearestMode);
        }
        I->replaceAllUsesWith(needFixVal);
        changed = true;
      } else if (fnName.compare("__gen_ocl_sampler_need_rounding_fix") == 0) {
        //  return ((sampler & CLK_NORMALIZED_COORDS_TRUE) == 0);
        bool needFix = true;
        Value *needFixVal;
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40
        CallInst *init = dyn_cast<CallInst>(I->getOperand(0));
        if (init && init->getCalledValue()->getName().compare("__translate_sampler_initializer")) {
          const ConstantInt *ci = dyn_cast<ConstantInt>(init->getOperand(0));
          uint32_t samplerInt = ci->getZExtValue();
#else
        if (dyn_cast<ConstantInt>(I->getOperand(0))) {
          const ConstantInt *ci = dyn_cast<ConstantInt>(I->getOperand(0));
          uint32_t samplerInt = ci->getZExtValue();
#endif
          needFix = samplerInt & CLK_NORMALIZED_COORDS_TRUE;
          needFixVal = ConstantInt::get(boolTy, needFix);
        } else {
          IRBuilder<> Builder(I->getParent());
          Builder.SetInsertPoint(I);
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40
          Module *M = I->getParent()->getParent()->getParent();
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 50
          Value* samplerCvt = M->getOrInsertFunction("__gen_ocl_sampler_to_int", i32Ty, I->getOperand(0)->getType());
#else
          Value* samplerCvt = M->getOrInsertFunction("__gen_ocl_sampler_to_int", i32Ty, I->getOperand(0)->getType(), nullptr);
#endif
          Value *samplerVal = Builder.CreateCall(samplerCvt, {I->getOperand(0)});
#else
          Value *samplerVal = I->getOperand(0);
#endif
          Value *normalizeMask = ConstantInt::get(i32Ty, CLK_NORMALIZED_COORDS_TRUE);
          Value *normalizeMode = Builder.CreateAnd(samplerVal, normalizeMask);
          needFixVal = Builder.CreateICmpEQ(normalizeMode, ConstantInt::get(i32Ty, 0));
        }
        I->replaceAllUsesWith(needFixVal);
        changed = true;
      }
      return changed;
    }

    bool runOnFunction(Function& F) {
      bool changed = false;
      std::set<Instruction*> deadInsnSet;
      for (inst_iterator I = inst_begin(&F), E = inst_end(&F); I != E; ++I) {
        if (dyn_cast<CallInst>(&*I)) {
          if (visitCallInst(dyn_cast<CallInst>(&*I))) {
            changed = true;
            deadInsnSet.insert(&*I);
          }
        }
      }
      for (auto it : deadInsnSet)
        it->eraseFromParent();
      return changed;
    }

    static char ID;
  };

  FunctionPass* createSamplerFixPass() {
    return new SamplerFix();
  }
  char SamplerFix::ID = 0;
};
Beignet-1.3.2-Source/backend/src/llvm/llvm_to_gen.cpp000664 001750 001750 00000037160 13173554000 021626 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
* * Author: Benjamin Segovia */ /** * \file llvm_to_gen.cpp * \author Benjamin Segovia */ #include "llvm_includes.hpp" #include "llvm/llvm_gen_backend.hpp" #include "llvm/llvm_to_gen.hpp" #include #include #include "sys/cvar.hpp" #include "sys/platform.hpp" #include "ir/unit.hpp" #include "ir/function.hpp" #include "ir/structurizer.hpp" #include #include #include #include namespace gbe { BVAR(OCL_OUTPUT_CFG, false); BVAR(OCL_OUTPUT_CFG_ONLY, false); BVAR(OCL_OUTPUT_CFG_GEN_IR, false); using namespace llvm; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 #define TARGETLIBRARY TargetLibraryInfoImpl #else #define TARGETLIBRARY TargetLibraryInfo #endif void runFuntionPass(Module &mod, TARGETLIBRARY *libraryInfo, const DataLayout &DL) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 legacy::FunctionPassManager FPM(&mod); #else FunctionPassManager FPM(&mod); #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 36 FPM.add(new DataLayoutPass()); #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR == 35 FPM.add(new DataLayoutPass(DL)); #else FPM.add(new DataLayout(DL)); #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35 FPM.add(createVerifierPass(true)); #else FPM.add(createVerifierPass()); #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 FPM.add(new TargetLibraryInfoWrapperPass(*libraryInfo)); #else FPM.add(new TargetLibraryInfo(*libraryInfo)); #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 FPM.add(createTypeBasedAAWrapperPass()); FPM.add(createBasicAAWrapperPass()); #else FPM.add(createTypeBasedAliasAnalysisPass()); FPM.add(createBasicAliasAnalysisPass()); #endif FPM.add(createCFGSimplificationPass()); FPM.add(createSROAPass()); FPM.add(createEarlyCSEPass()); FPM.add(createLowerExpectIntrinsicPass()); FPM.doInitialization(); for (Module::iterator I = mod.begin(), E = mod.end(); I != E; ++I) if (!I->isDeclaration()) FPM.run(*I); FPM.doFinalization(); } void runModulePass(Module &mod, TARGETLIBRARY *libraryInfo, const DataLayout &DL, int optLevel, bool strictMath) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 legacy::PassManager MPM; #else PassManager MPM; #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 36 MPM.add(new DataLayoutPass()); #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR == 35 MPM.add(new DataLayoutPass(DL)); #else MPM.add(new DataLayout(DL)); #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 MPM.add(new TargetLibraryInfoWrapperPass(*libraryInfo)); #else MPM.add(new TargetLibraryInfo(*libraryInfo)); #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 MPM.add(createTypeBasedAAWrapperPass()); MPM.add(createBasicAAWrapperPass()); #else MPM.add(createTypeBasedAliasAnalysisPass()); MPM.add(createBasicAliasAnalysisPass()); #endif MPM.add(createIntrinsicLoweringPass()); MPM.add(createBarrierNodupPass(false)); // remove noduplicate fnAttr before inlining. MPM.add(createFunctionInliningPass(20000)); MPM.add(createBarrierNodupPass(true)); // restore noduplicate fnAttr after inlining. MPM.add(createStripAttributesPass(false)); // Strip unsupported attributes and calling conventions. 
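// (Note: from here on this is, roughly, a standard -O2-style mid-level scalar
// pipeline; by this point the inliner above has already flattened all user
// function calls into the kernels.)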
MPM.add(createSamplerFixPass()); MPM.add(createGlobalOptimizerPass()); // Optimize out global vars MPM.add(createIPSCCPPass()); // IP SCCP MPM.add(createDeadArgEliminationPass()); // Dead argument elimination MPM.add(createInstructionCombiningPass());// Clean up after IPCP & DAE MPM.add(createCFGSimplificationPass()); // Clean up after IPCP & DAE MPM.add(createPruneEHPass()); // Remove dead EH info #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 MPM.add(createPostOrderFunctionAttrsLegacyPass()); #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 MPM.add(createPostOrderFunctionAttrsPass()); // Set readonly/readnone attrs #else MPM.add(createFunctionAttrsPass()); // Set readonly/readnone attrs #endif //MPM.add(createScalarReplAggregatesPass(64, true, -1, -1, 64)) if(optLevel > 0) #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 MPM.add(createSROAPass()); #else MPM.add(createSROAPass(/*RequiresDomTree*/ false)); #endif MPM.add(createEarlyCSEPass()); // Catch trivial redundancies MPM.add(createJumpThreadingPass()); // Thread jumps. MPM.add(createCorrelatedValuePropagationPass()); // Propagate conditionals MPM.add(createCFGSimplificationPass()); // Merge & remove BBs MPM.add(createInstructionCombiningPass()); // Combine silly seq's MPM.add(createTailCallEliminationPass()); // Eliminate tail calls MPM.add(createCFGSimplificationPass()); // Merge & remove BBs MPM.add(createReassociatePass()); // Reassociate expressions MPM.add(createLoopRotatePass()); // Rotate Loop MPM.add(createLICMPass()); // Hoist loop invariants MPM.add(createLoopUnswitchPass(true)); MPM.add(createInstructionCombiningPass()); MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars MPM.add(createLoopIdiomPass()); // Recognize idioms like memset. MPM.add(createLoopDeletionPass()); // Delete dead loops MPM.add(createLoopUnrollPass(640)); //1024, 32, 1024, 512)); //Unroll loops if(optLevel > 0) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 MPM.add(createSROAPass()); #else MPM.add(createSROAPass(/*RequiresDomTree*/ false)); #endif MPM.add(createGVNPass()); // Remove redundancies } #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35 // FIXME Workaround: we find that CustomLoopUnroll may increase register pressure greatly, // and it may even make som cl kernel cannot compile because of limited scratch memory for spill. // As we observe this under strict math. So we disable CustomLoopUnroll if strict math is enabled. if (!strictMath) { #if !defined(__ANDROID__) MPM.add(createCustomLoopUnrollPass()); //1024, 32, 1024, 512)); //Unroll loops #endif MPM.add(createLoopUnrollPass()); //1024, 32, 1024, 512)); //Unroll loops if(optLevel > 0) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 MPM.add(createSROAPass()); #else MPM.add(createSROAPass(/*RequiresDomTree*/ false)); #endif MPM.add(createGVNPass()); // Remove redundancies } } #endif MPM.add(createMemCpyOptPass()); // Remove memcpy / form memset MPM.add(createSCCPPass()); // Constant prop with SCCP // Run instcombine after redundancy elimination to exploit opportunities // opened up by them. MPM.add(createInstructionCombiningPass()); MPM.add(createJumpThreadingPass()); // Thread jumps MPM.add(createCorrelatedValuePropagationPass()); MPM.add(createDeadStoreEliminationPass()); // Delete dead stores MPM.add(createAggressiveDCEPass()); // Delete dead instructions MPM.add(createCFGSimplificationPass()); // Merge & remove BBs MPM.add(createInstructionCombiningPass()); // Clean up after everything. 
MPM.add(createStripDeadPrototypesPass()); // Get rid of dead prototypes if(optLevel > 0) { MPM.add(createGlobalDCEPass()); // Remove dead fns and globals. MPM.add(createConstantMergePass()); // Merge dup global constants } MPM.run(mod); } #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 #define OUTPUT_BITCODE(STAGE, MOD) do { \ legacy::PassManager passes__; \ if (OCL_OUTPUT_LLVM_##STAGE) { \ passes__.add(createPrintModulePass(*o)); \ passes__.run(MOD); \ } \ }while(0) #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35 #define OUTPUT_BITCODE(STAGE, MOD) do { \ PassManager passes__; \ if (OCL_OUTPUT_LLVM_##STAGE) { \ passes__.add(createPrintModulePass(*o)); \ passes__.run(MOD); \ } \ }while(0) #else #define OUTPUT_BITCODE(STAGE, MOD) do { \ PassManager passes__; \ if (OCL_OUTPUT_LLVM_##STAGE) { \ passes__.add(createPrintModulePass(&*o)); \ passes__.run(MOD); \ } \ }while(0) #endif BVAR(OCL_OUTPUT_LLVM_BEFORE_LINK, false); BVAR(OCL_OUTPUT_LLVM_AFTER_LINK, false); BVAR(OCL_OUTPUT_LLVM_AFTER_GEN, false); class gbeDiagnosticContext { public: gbeDiagnosticContext() : _str(""), messages(_str), printer(messages), _has_errors(false) {} void process(const llvm::DiagnosticInfo &diagnostic) { if (diagnostic.getSeverity() != DS_Remark) { // avoid noise from function inlining remarks diagnostic.print(printer); } if (diagnostic.getSeverity() == DS_Error) { _has_errors = true; } } std::string str(){return messages.str();} bool has_errors(){return _has_errors;} private: std::string _str; llvm::raw_string_ostream messages; llvm::DiagnosticPrinterRawOStream printer; bool _has_errors; }; void gbeDiagnosticHandler(const llvm::DiagnosticInfo &diagnostic, void *context) { gbeDiagnosticContext *dc = reinterpret_cast(context); dc->process(diagnostic); } bool llvmToGen(ir::Unit &unit, const void* module, int optLevel, bool strictMath, int profiling, std::string &errors) { std::string errInfo; std::unique_ptr o = NULL; if (OCL_OUTPUT_LLVM_BEFORE_LINK || OCL_OUTPUT_LLVM_AFTER_LINK || OCL_OUTPUT_LLVM_AFTER_GEN) o = std::unique_ptr(new llvm::raw_fd_ostream(fileno(stdout), false)); Module* cl_mod = NULL; if (module) { cl_mod = reinterpret_cast(const_cast(module)); } if (!cl_mod) return false; OUTPUT_BITCODE(BEFORE_LINK, (*cl_mod)); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 legacy::PassManager passes__; #else PassManager passes__; #endif //run ExpandConstantExprPass before collectDeviceEnqueueInfo //to simplify the analyze of block. passes__.add(createExpandConstantExprPass()); // constant prop may generate ConstantExpr passes__.run(*cl_mod); /* Must call before materialize when link */ collectDeviceEnqueueInfo(cl_mod, unit); std::unique_ptr M; /* Before do any thing, we first filter in all CL functions in bitcode. 
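       (Presumably runBitCodeLinker below does this filtering: it links the
       kernels against the OpenCL built-in bitcode library, so only functions
       that the kernels actually reference survive into M.)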
*/ /* Also set unit's pointer size in runBitCodeLinker */ M.reset(runBitCodeLinker(cl_mod, strictMath, unit)); if (M.get() == 0) return true; Module &mod = *M.get(); DataLayout DL(&mod); gbeDiagnosticContext dc; mod.getContext().setDiagnosticHandler(&gbeDiagnosticHandler,&dc); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 mod.setDataLayout(DL); #endif Triple TargetTriple(mod.getTargetTriple()); TARGETLIBRARY *libraryInfo = new TARGETLIBRARY(TargetTriple); libraryInfo->disableAllFunctions(); OUTPUT_BITCODE(AFTER_LINK, mod); runFuntionPass(mod, libraryInfo, DL); runModulePass(mod, libraryInfo, DL, optLevel, strictMath); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 legacy::PassManager passes; #else PassManager passes; #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 36 passes.add(new DataLayoutPass()); #elif LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR == 35 passes.add(new DataLayoutPass(DL)); #else passes.add(new DataLayout(DL)); #endif // Print the code before further optimizations passes.add(createIntrinsicLoweringPass()); passes.add(createStripAttributesPass(true)); // Strip unsupported attributes and calling conventions. passes.add(createFunctionInliningPass(20000)); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 passes.add(createSROAPass()); #else passes.add(createScalarReplAggregatesPass(64, true, -1, -1, 64)); #endif passes.add(createLoadStoreOptimizationPass()); passes.add(createConstantPropagationPass()); passes.add(createPromoteMemoryToRegisterPass()); if(optLevel > 0) passes.add(createGVNPass()); // Remove redundancies passes.add(createPrintfParserPass(unit)); passes.add(createExpandConstantExprPass()); // expand ConstantExpr passes.add(createScalarizePass()); // Expand all vector ops passes.add(createExpandLargeIntegersPass()); // legalize large integer operation passes.add(createInstructionCombiningPass()); // legalize will generate some silly instructions passes.add(createConstantPropagationPass()); // propagate constant after scalarize/legalize passes.add(createExpandConstantExprPass()); // constant prop may generate ConstantExpr passes.add(createPromoteIntegersPass()); // align integer size to power of two passes.add(createRemoveGEPPass(unit)); // Constant prop may generate gep passes.add(createDeadInstEliminationPass()); // Remove simplified instructions passes.add(createCFGSimplificationPass()); // Merge & remove BBs passes.add(createLowerSwitchPass()); // simplify cfg will generate switch-case instruction if (profiling) { passes.add(createProfilingInserterPass(profiling, unit)); // insert the time stamp for profiling. 
} passes.add(createScalarizePass()); // Expand all vector ops if(OCL_OUTPUT_CFG) #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 passes.add(createCFGPrinterLegacyPassPass()); #else passes.add(createCFGPrinterPass()); #endif if(OCL_OUTPUT_CFG_ONLY) #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 passes.add(createCFGOnlyPrinterLegacyPassPass()); #else passes.add(createCFGOnlyPrinterPass()); #endif passes.add(createGenPass(unit)); passes.run(mod); errors = dc.str(); if(dc.has_errors()){ unit.setValid(false); delete libraryInfo; return true; } // Print the code extra optimization passes OUTPUT_BITCODE(AFTER_GEN, mod); const ir::Unit::FunctionSet& fs = unit.getFunctionSet(); ir::Unit::FunctionSet::const_iterator iter = fs.begin(); while(iter != fs.end()) { ir::CFGStructurizer *structurizer = new ir::CFGStructurizer(iter->second); structurizer->StructurizeBlocks(); delete structurizer; if (OCL_OUTPUT_CFG_GEN_IR) iter->second->outputCFG(); iter++; } delete libraryInfo; return true; } } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/llvm/ExpandConstantExpr.cpp000664 001750 001750 00000015417 13161142102 023103 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ // Imported from pNaCl project // Copyright (c) 2003-2014 University of Illinois at Urbana-Champaign. // All rights reserved. // // Developed by: // // LLVM Team // // University of Illinois at Urbana-Champaign // // http://llvm.org // // Permission is hereby granted, free of charge, to any person obtaining a copy of // this software and associated documentation files (the "Software"), to deal with // the Software without restriction, including without limitation the rights to // use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies // of the Software, and to permit persons to whom the Software is furnished to do // so, subject to the following conditions: // // * Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // // * Redistributions in binary form must reproduce the above copyright notice, // this list of conditions and the following disclaimers in the // documentation and/or other materials provided with the distribution. // // * Neither the names of the LLVM Team, University of Illinois at // Urbana-Champaign, nor the names of its contributors may be used to // endorse or promote products derived from this Software without specific // prior written permission. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS // FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE
// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE
// SOFTWARE.

//===- ExpandConstantExpr.cpp - Convert ConstantExprs to Instructions------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License.
//
//===----------------------------------------------------------------------===//
//
// This pass expands out ConstantExprs into Instructions.
//
// Note that this only converts ConstantExprs that are referenced by
// Instructions. It does not convert ConstantExprs that are used as
// initializers for global variables.
//
// This simplifies the language so that the PNaCl translator does not
// need to handle ConstantExprs as part of a stable wire format for
// PNaCl.
//
//===----------------------------------------------------------------------===//

#include
#include "llvm_includes.hpp"
#include "llvm_gen_backend.hpp"

using namespace llvm;

static bool expandInstruction(Instruction *Inst);

namespace {
  // This is a FunctionPass because our handling of PHI nodes means
  // that our modifications may cross BasicBlocks.
  struct ExpandConstantExpr : public FunctionPass {
    static char ID; // Pass identification, replacement for typeid
    ExpandConstantExpr() : FunctionPass(ID) {
    }
    virtual bool runOnFunction(Function &Func);
  };
}

char ExpandConstantExpr::ID = 0;

static Value *expandConstantExpr(Instruction *InsertPt, ConstantExpr *Expr) {
  Instruction *NewInst = Expr->getAsInstruction();
  NewInst->insertBefore(InsertPt);
  NewInst->setName("expanded");
  expandInstruction(NewInst);
  return NewInst;
}

// A constant vector may contain constant expressions. We need to expand
// each such expression and then recreate the vector with InsertElement
// instructions. This way we can eliminate all the constant expressions.
static Value *expandConstantVector(Instruction *InsertPt, ConstantVector *CV) {
  int elemNum = CV->getType()->getNumElements();
  Type *IntTy = IntegerType::get(CV->getContext(), 32);
  BasicBlock::iterator InsertPos(InsertPt);
  IRBuilder<> IRB(&*InsertPos);
  Value *vec = UndefValue::get(CV->getType());
  for (int i = 0; i < elemNum; i++) {
    Value *idx = ConstantInt::get(IntTy, i);
    if (dyn_cast<ConstantVector>(CV->getOperand(i)))
      vec = IRB.CreateInsertElement(vec, expandConstantVector(InsertPt, dyn_cast<ConstantVector>(CV->getOperand(i))), idx);
    else if (dyn_cast<ConstantExpr>(CV->getOperand(i)))
      vec = IRB.CreateInsertElement(vec, expandConstantExpr(InsertPt, dyn_cast<ConstantExpr>(CV->getOperand(i))), idx);
    else
      vec = IRB.CreateInsertElement(vec, CV->getOperand(i), idx);
  }
  return vec;
}

// Whether a constant vector contains constant expressions which need to be expanded.
static bool needExpand(ConstantVector *CV) {
  int elemNum = CV->getType()->getNumElements();
  for (int i = 0; i < elemNum; i++) {
    Constant *C = CV->getOperand(i);
    if (dyn_cast<ConstantExpr>(C))
      return true;
    if (dyn_cast<ConstantVector>(C))
      if (needExpand(dyn_cast<ConstantVector>(C)))
        return true;
  }
  return false;
}

static bool expandInstruction(Instruction *Inst) {
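  // (Illustrative example, not taken from the source: a ConstantExpr operand such as
  //    store i32 1, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @g, i32 0, i32 2)
  //  is rewritten into
  //    %expanded = getelementptr inbounds [4 x i32], [4 x i32]* @g, i32 0, i32 2
  //    store i32 1, i32* %expanded
  //  so that later stages only ever see Instructions.)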
  // A landingpad can only accept ConstantExprs, so it should remain
  // unmodified.
  if (isa<LandingPadInst>(Inst))
    return false;
  bool Modified = false;
  for (unsigned OpNum = 0; OpNum < Inst->getNumOperands(); OpNum++) {
    if (ConstantExpr *Expr = dyn_cast<ConstantExpr>(Inst->getOperand(OpNum))) {
      Modified = true;
      Use *U = &Inst->getOperandUse(OpNum);
      PhiSafeReplaceUses(U, expandConstantExpr(PhiSafeInsertPt(U), Expr));
    } else {
      ConstantVector *CV = dyn_cast<ConstantVector>(Inst->getOperand(OpNum));
      if (CV && needExpand(CV)) {
        Modified = true;
        Use *U = &Inst->getOperandUse(OpNum);
        PhiSafeReplaceUses(U, expandConstantVector(PhiSafeInsertPt(U), CV));
      }
    }
  }
  return Modified;
}

bool ExpandConstantExpr::runOnFunction(Function &Func) {
  bool Modified = false;
  for (llvm::Function::iterator BB = Func.begin(), E = Func.end(); BB != E; ++BB) {
    for (BasicBlock::InstListType::iterator Inst = BB->begin(), E = BB->end(); Inst != E; ++Inst) {
      Modified |= expandInstruction(&*Inst);
    }
  }
  return Modified;
}

FunctionPass *llvm::createExpandConstantExprPass() {
  return new ExpandConstantExpr();
}
Beignet-1.3.2-Source/backend/src/llvm/llvm_unroll.cpp000664 001750 001750 00000022440 13173554000 021661 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 */

#include "llvm/Config/llvm-config.h"
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35
#include
#include "llvm_includes.hpp"
#include "llvm/llvm_gen_backend.hpp"
#include "sys/map.hpp"

using namespace llvm;

namespace gbe {
  class CustomLoopUnroll : public LoopPass
  {
  public:
    static char ID;
    CustomLoopUnroll() : LoopPass(ID) {}

    void getAnalysisUsage(AnalysisUsage &AU) const {
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37
      AU.addRequired<LoopInfoWrapperPass>();
      AU.addPreserved<LoopInfoWrapperPass>();
#else
      AU.addRequired<LoopInfo>();
      AU.addPreserved<LoopInfo>();
#endif
      AU.addRequiredID(LoopSimplifyID);
      AU.addPreservedID(LoopSimplifyID);
      AU.addRequiredID(LCSSAID);
      AU.addPreservedID(LCSSAID);
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38
      AU.addRequired<ScalarEvolutionWrapperPass>();
      AU.addPreserved<ScalarEvolutionWrapperPass>();
#else
      AU.addRequired<ScalarEvolution>();
      AU.addPreserved<ScalarEvolution>();
#endif
      // FIXME: Loop unroll requires LCSSA. And LCSSA requires dom info.
      // If loop unroll does not preserve dom info then LCSSA pass on next
      // loop will receive invalid dom info.
      // For now, recreate dom info, if loop is unrolled.
      AU.addPreserved<DominatorTreeWrapperPass>();
    }

    // Returns the value associated with the given metadata node name (for
    // example, "llvm.loop.unroll.count"). If no such named metadata node
    // exists, then nullptr is returned.
    static const MDNode *GetUnrollMetadataValue(const Loop *L, StringRef Name) {
      MDNode *LoopID = L->getLoopID();
      if (!LoopID)
        return nullptr;
      // First operand should refer to the loop id itself.
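      // (Hypothetical example: for a loop latch carrying
      //    br i1 %exit, label %end, label %header, !llvm.loop !0
      //    !0 = distinct !{!0, !1}
      //    !1 = !{!"llvm.loop.unroll.count", i32 4}
      //  GetUnrollMetadataValue(L, "llvm.loop.unroll.count") would return !1.)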
      assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
      assert(LoopID->getOperand(0) == LoopID && "invalid loop id");

      for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {
        const MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));
        if (!MD)
          continue;
        const MDString *S = dyn_cast<MDString>(MD->getOperand(0));
        if (!S)
          continue;
        if (Name.equals(S->getString())) {
          return MD;
        }
      }
      return nullptr;
    }

    static unsigned GetUnrollCount(const Loop *L, StringRef Name) {
      const MDNode *MD = GetUnrollMetadataValue(L, "llvm.loop.unroll.count");
      if (MD) {
        assert(MD->getNumOperands() == 2 &&
               "Unroll count hint metadata should have two operands.");
        unsigned Count;
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 36
        Count = mdconst::extract<ConstantInt>(MD->getOperand(1))->getZExtValue();
#else
        Count = cast<ConstantInt>(MD->getOperand(1))->getZExtValue();
#endif
        assert(Count >= 1 && "Unroll count must be positive.");
        return Count;
      }
      return 0;
    }

    void setUnrollID(Loop *L, bool enable) {
      assert(enable);
      LLVMContext &Context = L->getHeader()->getContext();
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 36
      SmallVector<Metadata *, 2> forceUnroll;
      forceUnroll.push_back(MDString::get(Context, "llvm.loop.unroll.enable"));
      MDNode *forceUnrollNode = MDNode::get(Context, forceUnroll);
      SmallVector<Metadata *, 2> Vals;
      Vals.push_back(NULL);
      Vals.push_back(forceUnrollNode);
#else
      SmallVector<Value *, 2> forceUnroll;
      forceUnroll.push_back(MDString::get(Context, "llvm.loop.unroll.enable"));
      forceUnroll.push_back(ConstantInt::get(Type::getInt1Ty(Context), enable));
      MDNode *forceUnrollNode = MDNode::get(Context, forceUnroll);
      SmallVector<Value *, 2> Vals;
      Vals.push_back(NULL);
      Vals.push_back(forceUnrollNode);
#endif
      MDNode *NewLoopID = MDNode::get(Context, Vals);
      // Set operand 0 to refer to the loop id itself.
      NewLoopID->replaceOperandWith(0, NewLoopID);
      L->setLoopID(NewLoopID);
    }

    static bool hasPrivateLoadStore(Loop *L) {
      const std::vector<Loop*> subLoops = L->getSubLoops();
      std::set<BasicBlock*> subBlocks, blocks;

      for (auto l : subLoops)
        for (auto bb : l->getBlocks())
          subBlocks.insert(bb);
      for (auto bb : L->getBlocks())
        if (subBlocks.find(bb) == subBlocks.end())
          blocks.insert(bb);
      for (auto bb : blocks) {
        for (BasicBlock::iterator inst = bb->begin(), instE = bb->end(); inst != instE; ++inst) {
          unsigned addrSpace = -1;
          if (isa<LoadInst>(*inst)) {
            LoadInst *ld = cast<LoadInst>(&*inst);
            addrSpace = ld->getPointerAddressSpace();
          } else if (isa<StoreInst>(*inst)) {
            StoreInst *st = cast<StoreInst>(&*inst);
            addrSpace = st->getPointerAddressSpace();
          }
          if (addrSpace == 0)
            return true;
        }
      }
      return false;
    }

    // If one loop has a very large self trip count, we don't want to unroll it.
    // Self trip count means the trip count divided by the parent's trip count. For example:
    //   for (int i = 0; i < 16; i++) {
    //     for (int j = 0; j < 4; j++) {
    //       for (int k = 0; k < 2; k++) {
    //         ...
    //       }
    //       ...
    //     }
    //   }
    // The inner loops j and k could be unrolled, but the loop i will not be unrolled.
    // The return value true means that L could be unrolled; otherwise, it could not
    // be unrolled.
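    // (Worked example under the 32-iteration budget used in handleParentLoops below:
    //  k has self trip count 2 and j accumulates 4 * 2 = 8, both within the budget,
    //  while i would accumulate 16 * 4 * 2 = 128 > 32, so i stays rolled.)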
bool handleParentLoops(Loop *L, LPPassManager &LPM) { Loop *currL = L; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 ScalarEvolution *SE = &getAnalysis().getSE(); LoopInfo &loopInfo = getAnalysis().getLoopInfo(); #else ScalarEvolution *SE = &getAnalysis(); #endif BasicBlock *ExitBlock = currL->getLoopLatch(); if (!ExitBlock || !L->isLoopExiting(ExitBlock)) ExitBlock = currL->getExitingBlock(); unsigned currTripCount = 0; bool shouldUnroll = true; if (ExitBlock) currTripCount = SE->getSmallConstantTripCount(L, ExitBlock); if (currTripCount > 32) { shouldUnroll = false; //Don't change the unrollID if doesn't force unroll. //setUnrollID(currL, false); return shouldUnroll; } while(currL) { Loop *parentL = currL->getParentLoop(); unsigned parentTripCount = 0; if (parentL) { BasicBlock *parentExitBlock = parentL->getLoopLatch(); if (!parentExitBlock || !parentL->isLoopExiting(parentExitBlock)) parentExitBlock = parentL->getExitingBlock(); if (parentExitBlock) parentTripCount = SE->getSmallConstantTripCount(parentL, parentExitBlock); } if (parentTripCount != 0 && currTripCount * parentTripCount > 32) { //Don't change the unrollID if doesn't force unroll. //setUnrollID(parentL, false); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 loopInfo.markAsRemoved(parentL); #else LPM.deleteLoopFromQueue(parentL); #endif return shouldUnroll; } currL = parentL; currTripCount = parentTripCount * currTripCount; } return shouldUnroll; } // Analyze the outermost BBs of this loop, if there are // some private load or store, we change it's loop meta data // to indicate more aggresive unrolling on it. virtual bool runOnLoop(Loop *L, LPPassManager &LPM) { const MDNode *Enable = GetUnrollMetadataValue(L, "llvm.loop.unroll.enable"); if (Enable) return false; const unsigned Count = GetUnrollCount(L, "llvm.loop.unroll.count"); if (Count > 0) return false; if (!handleParentLoops(L, LPM)) return false; if (!hasPrivateLoadStore(L)) return false; setUnrollID(L, true); return true; } #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 virtual StringRef getPassName() const #else virtual const char *getPassName() const #endif { return "SPIR backend: custom loop unrolling pass"; } }; char CustomLoopUnroll::ID = 0; LoopPass *createCustomLoopUnrollPass() { return new CustomLoopUnroll(); } } // end namespace #endif Beignet-1.3.2-Source/backend/src/llvm/llvm_printf_parser.cpp000664 001750 001750 00000053454 13173554000 023235 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . */ /** * \file llvm_printf_parser.cpp * * When there are printf functions existing, we have something to do here. * Because the GPU's feature, it is relatively hard to parse and caculate the * printf's format string. OpenCL 1.2 restrict the format string to be a * constant string and can be decided at compiling time. So we add a pass here * to parse the format string and check whether the parameters is valid. 
* If all are valid, we will generate the according instruction to store the * parameter content into the printf buffer. And if something is invalid, a * warning is generated and the printf instruction is skipped in order to avoid * GPU error. We also keep the relationship between the printf format and printf * content in GPU's printf buffer here, and use the system's C standard printf to * print the content after kernel executed. */ #include #include #include "llvm_includes.hpp" #include "llvm/llvm_gen_backend.hpp" #include "sys/map.hpp" #include "ir/printf.hpp" #include "ir/unit.hpp" using namespace llvm; namespace gbe { using namespace ir; /* Return the conversion_specifier if succeed, -1 if failed. */ static char __parse_printf_state(char *begin, char *end, char** rend, PrintfState * state) { const char *fmt; state->left_justified = 0; state->sign_symbol = 0; //0 for nothing, 1 for sign, 2 for space. state->alter_form = 0; state->zero_padding = 0; state->vector_n = 0; state->min_width = -1; state->precision = -1; state->length_modifier = 0; state->conversion_specifier = PRINTF_CONVERSION_INVALID; state->out_buf_sizeof_offset = -1; fmt = begin; if (*fmt != '%') return -1; #define FMT_PLUS_PLUS do { \ if (fmt + 1 <= end) fmt++; \ else { \ printf("Error, line: %d, fmt > end\n", __LINE__); \ return -1; \ } \ } while(0) FMT_PLUS_PLUS; // parse the flags. while (*fmt == '-' || *fmt == '+' || *fmt == ' ' || *fmt == '#' || *fmt == '0') switch (*fmt) { case '-': /* The result of the conversion is left-justified within the field. */ state->left_justified = 1; FMT_PLUS_PLUS; break; case '+': /* The result of a signed conversion always begins with a plus or minus sign. */ state->sign_symbol = 1; FMT_PLUS_PLUS; break; case ' ': /* If the first character of a signed conversion is not a sign, or if a signed conversion results in no characters, a space is prefixed to the result. If the space and + flags both appear,the space flag is ignored. */ if (state->sign_symbol == 0) state->sign_symbol = 2; FMT_PLUS_PLUS; break; case '#': /*The result is converted to an alternative form. */ state->alter_form = 1; FMT_PLUS_PLUS; break; case '0': if (!state->left_justified) state->zero_padding = 1; FMT_PLUS_PLUS; break; default: break; } // The minimum field width while ((*fmt >= '0') && (*fmt <= '9')) { if (state->min_width < 0) state->min_width = 0; state->min_width = state->min_width * 10 + (*fmt - '0'); FMT_PLUS_PLUS; } // The precision if (*fmt == '.') { FMT_PLUS_PLUS; state->precision = 0; while (*fmt >= '0' && *fmt <= '9') { state->precision = state->precision * 10 + (*fmt - '0'); FMT_PLUS_PLUS; } } // handle the vector specifier. if (*fmt == 'v') { FMT_PLUS_PLUS; switch (*fmt) { case '2': case '3': case '4': case '8': state->vector_n = *fmt - '0'; FMT_PLUS_PLUS; break; case '1': FMT_PLUS_PLUS; if (*fmt == '6') { state->vector_n = 16; FMT_PLUS_PLUS; } else return -1; break; default: //Wrong vector, error. 
return -1; } } // length modifiers if (*fmt == 'h') { FMT_PLUS_PLUS; if (*fmt == 'h') { //hh state->length_modifier = PRINTF_LM_HH; FMT_PLUS_PLUS; } else if (*fmt == 'l') { //hl state->length_modifier = PRINTF_LM_HL; FMT_PLUS_PLUS; } else { //h state->length_modifier = PRINTF_LM_H; } } else if (*fmt == 'l') { state->length_modifier = PRINTF_LM_L; FMT_PLUS_PLUS; } #define CONVERSION_SPEC_AND_RET(XXX, xxx) \ case XXX: \ state->conversion_specifier = PRINTF_CONVERSION_##xxx; \ FMT_PLUS_PLUS; \ *rend = (char *)fmt; \ return XXX; \ break; // conversion specifiers switch (*fmt) { CONVERSION_SPEC_AND_RET('d', D) CONVERSION_SPEC_AND_RET('i', I) CONVERSION_SPEC_AND_RET('o', O) CONVERSION_SPEC_AND_RET('u', U) CONVERSION_SPEC_AND_RET('x', x) CONVERSION_SPEC_AND_RET('X', X) CONVERSION_SPEC_AND_RET('f', f) CONVERSION_SPEC_AND_RET('F', F) CONVERSION_SPEC_AND_RET('e', e) CONVERSION_SPEC_AND_RET('E', E) CONVERSION_SPEC_AND_RET('g', g) CONVERSION_SPEC_AND_RET('G', G) CONVERSION_SPEC_AND_RET('a', a) CONVERSION_SPEC_AND_RET('A', A) CONVERSION_SPEC_AND_RET('c', C) CONVERSION_SPEC_AND_RET('s', S) CONVERSION_SPEC_AND_RET('p', P) // %% has been handled default: return -1; } } static PrintfSet::PrintfFmt* parser_printf_fmt(char* format, int& num) { char* begin; char* end; char* p; char ret_char; char* rend; PrintfState state; PrintfSet::PrintfFmt* printf_fmt = new PrintfSet::PrintfFmt(); p = format; begin = format; end = format + strlen(format); /* Now parse it. */ while (*begin) { p = begin; again: while (p < end && *p != '%') { p++; } if (p < end && p + 1 == end) { // String with % at end. printf("string end with %%\n"); goto error; } if (p + 1 < end && *(p + 1) == '%') { // %% p += 2; goto again; } if (p != begin) { std::string s(begin, size_t(p - begin)); printf_fmt->push_back(PrintfSlot(s)); } if (p == end) // finish break; /* Now parse the % start conversion_specifier. 
*/ ret_char = __parse_printf_state(p, end, &rend, &state); if (ret_char < 0) goto error; printf_fmt->push_back(state); num++; if (rend == end) break; begin = rend; } #if 0 { int j = 0; for (auto &s : printf_fmt->first) { j++; if (s.type == PRINTF_SLOT_TYPE_STATE) { fprintf(stderr, "---- %d ---: state : \n", j); fprintf(stderr, " left_justified : %d\n", s.state->left_justified); fprintf(stderr, " sign_symbol: %d\n", s.state->sign_symbol); fprintf(stderr, " alter_form : %d\n", s.state->alter_form); fprintf(stderr, " zero_padding : %d\n", s.state->zero_padding); fprintf(stderr, " vector_n : %d\n", s.state->vector_n); fprintf(stderr, " min_width : %d\n", s.state->min_width); fprintf(stderr, " precision : %d\n", s.state->precision); fprintf(stderr, " length_modifier : %d\n", s.state->length_modifier); fprintf(stderr, " conversion_specifier : %d\n", s.state->conversion_specifier); } else if (s.type == PRINTF_SLOT_TYPE_STRING) { fprintf(stderr, "---- %d ---: string : %s\n", j, s.str); } } } #endif return printf_fmt; error: printf("error format string.\n"); delete printf_fmt; return NULL; } class PrintfParser : public FunctionPass { public: static char ID; typedef std::pair PrintfInst; Module* module; IRBuilder<>* builder; Type* intTy; ir::Unit &unit; PrintfParser(ir::Unit &unit) : FunctionPass(ID), unit(unit) { module = NULL; builder = NULL; intTy = NULL; } bool parseOnePrintfInstruction(CallInst * call); bool generateOneParameterInst(PrintfSlot& slot, Value* arg, Value*& new_arg); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 virtual StringRef getPassName() const #else virtual const char *getPassName() const #endif { return "Printf Parser"; } virtual bool runOnFunction(llvm::Function &F); }; bool PrintfParser::parseOnePrintfInstruction(CallInst * call) { CallSite CS(call); CallSite::arg_iterator CI_FMT = CS.arg_begin(); int param_num = 0; llvm::Constant* arg0 = dyn_cast(*CI_FMT); if(!arg0) { return false; } llvm::Constant* arg0_ptr = dyn_cast(arg0->getOperand(0)); if (!arg0_ptr) { return false; } ConstantDataSequential* fmt_arg = dyn_cast(arg0_ptr->getOperand(0)); if (!fmt_arg || !fmt_arg->isCString()) { return false; } std::string fmt = fmt_arg->getAsCString(); if (fmt.size() == 0) return false; PrintfSet::PrintfFmt* printf_fmt = NULL; if (!(printf_fmt = parser_printf_fmt((char *)fmt.c_str(), param_num))) {//at lease print something printf("Warning: Parse the printf inst %s failed, no output for it\n", fmt.c_str()); return false; } /* iff parameter more than %, error. */ /* str_fmt arg0 arg1 ... NULL */ if (param_num + 2 != static_cast(call->getNumOperands())) { delete printf_fmt; printf("Warning: Parse the printf inst %s failed, parameters do not match the %% number, no output for it\n", fmt.c_str()); return false; } /* Insert some conversion if types do not match. */ builder->SetInsertPoint(call); int i = 1; for (auto &s : *printf_fmt) { if (s.type == PRINTF_SLOT_TYPE_STRING) continue; assert(i < static_cast(call->getNumOperands()) - 1); Value* new_arg = NULL; Value *arg = call->getOperand(i); if (generateOneParameterInst(s, arg, new_arg) == false) { delete printf_fmt; printf("Warning: Parse the printf inst %s failed, the %d parameter format is wrong, no output for it\n", fmt.c_str(), i); return false; } if (new_arg) { // replace the according argument. 
call->setArgOperand(i, new_arg); } ++i; } GBE_ASSERT(unit.printfs.find(call) == unit.printfs.end()); unit.printfs.insert(std::pair((void *)call, printf_fmt)); return true; } bool PrintfParser::runOnFunction(llvm::Function &F) { bool hasPrintf = false; switch (F.getCallingConv()) { case CallingConv::C: case CallingConv::Fast: case CallingConv::SPIR_KERNEL: break; default: GBE_ASSERTM(false, "Unsupported calling convention"); } module = F.getParent(); intTy = IntegerType::get(module->getContext(), 32); // As we inline all function calls, so skip non-kernel functions bool bKernel = isKernelFunction(F); if(!bKernel) return false; builder = new IRBuilder<>(module->getContext()); llvm::GlobalValue* gFun = module->getNamedValue("printf"); if(gFun) { gFun->setName("__gen_ocl_printf_stub"); } llvm::GlobalValue* gFun2 = module->getNamedValue("puts"); if(gFun2 ) { gFun2->setName("__gen_ocl_puts_stub"); } /* First find printfs and caculate all slots size of one loop. */ for (llvm::Function::iterator B = F.begin(), BE = F.end(); B != BE; B++) { for (BasicBlock::iterator instI = B->begin(), instE = B->end(); instI != instE; ++instI) { llvm::CallInst* call = dyn_cast(instI); if (!call) { continue; } llvm::Function * callFunc = call->getCalledFunction(); if(!callFunc) { continue; } if ( callFunc->getIntrinsicID() != 0) continue; Value *Callee = call->getCalledValue(); const std::string fnName = Callee->getName(); if (fnName != "__gen_ocl_printf_stub" && fnName != "__gen_ocl_puts_stub") continue; if (!parseOnePrintfInstruction(call)) { // Just skip this printf instruction. continue; } hasPrintf = true; } } delete builder; return hasPrintf; } bool PrintfParser::generateOneParameterInst(PrintfSlot& slot, Value* arg, Value*& new_arg) { assert(slot.type == PRINTF_SLOT_TYPE_STATE); assert(builder); /* Check whether the arg match the format specifer. If needed, some conversion need to be applied. */ switch (arg->getType()->getTypeID()) { case Type::IntegerTyID: { bool sign = false; switch (slot.state.conversion_specifier) { case PRINTF_CONVERSION_I: case PRINTF_CONVERSION_D: sign = true; case PRINTF_CONVERSION_O: case PRINTF_CONVERSION_U: case PRINTF_CONVERSION_x: case PRINTF_CONVERSION_X: if (slot.state.length_modifier == PRINTF_LM_L) { /* we would rather print long. */ if (arg->getType() != Type::getInt64Ty(module->getContext())) { new_arg = builder->CreateIntCast(arg, Type::getInt64Ty(module->getContext()), sign); } } else { /* If the bits change, we need to consider the signed. */ if (arg->getType() != Type::getInt32Ty(module->getContext())) { new_arg = builder->CreateIntCast(arg, Type::getInt32Ty(module->getContext()), sign); } } return true; case PRINTF_CONVERSION_C: /* Int to Char, add a conversion. */ new_arg = builder->CreateIntCast(arg, Type::getInt8Ty(module->getContext()), false); return true; case PRINTF_CONVERSION_F: case PRINTF_CONVERSION_f: case PRINTF_CONVERSION_E: case PRINTF_CONVERSION_e: case PRINTF_CONVERSION_G: case PRINTF_CONVERSION_g: case PRINTF_CONVERSION_A: case PRINTF_CONVERSION_a: printf("Warning: Have a float parameter for %%d like specifier, take care of it\n"); new_arg = builder->CreateSIToFP(arg, Type::getFloatTy(module->getContext())); return true; case PRINTF_CONVERSION_S: /* Here, the case is printf("xxx%s", 0); we should output the null. */ slot.state.str = "(null)"; return true; default: return false; } break; } case Type::DoubleTyID: case Type::FloatTyID: { /* llvm 3.6 will give a undef value for NAN. 
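         In that case the code below substitutes an explicit quiet NaN constant,
         so the printf buffer still receives a well-defined value.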
      case Type::DoubleTyID:
      case Type::FloatTyID: {
        /* LLVM 3.6 will give an undef value for NaN. */
        if (dyn_cast<UndefValue>(arg)) {
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40
          APFloat nan = APFloat::getNaN(APFloat::IEEEsingle(), false);
#else
          APFloat nan = APFloat::getNaN(APFloat::IEEEsingle, false);
#endif
          new_arg = ConstantFP::get(module->getContext(), nan);
        }

        /* Because printf is a variadic function with no prototype for its
           extra arguments, the compiler always promotes a float argument to
           double, so the argument we find here is always a double. */
        switch (slot.state.conversion_specifier) {
          case PRINTF_CONVERSION_I:
          case PRINTF_CONVERSION_D:
            /* Float to Int, add a conversion. */
            printf("Warning: a float parameter is passed for a %%d-like specifier, take care of it\n");
            new_arg = builder->CreateFPToSI(arg, Type::getInt32Ty(module->getContext()));
            return true;

          case PRINTF_CONVERSION_O:
          case PRINTF_CONVERSION_U:
          case PRINTF_CONVERSION_x:
          case PRINTF_CONVERSION_X:
            /* Float to uint, add a conversion. */
            printf("Warning: a float parameter is passed for a %%u-like specifier, take care of it\n");
            new_arg = builder->CreateFPToUI(arg, Type::getInt32Ty(module->getContext()));
            return true;

          case PRINTF_CONVERSION_F:
          case PRINTF_CONVERSION_f:
          case PRINTF_CONVERSION_E:
          case PRINTF_CONVERSION_e:
          case PRINTF_CONVERSION_G:
          case PRINTF_CONVERSION_g:
          case PRINTF_CONVERSION_A:
          case PRINTF_CONVERSION_a:
            new_arg = builder->CreateFPCast(arg, Type::getFloatTy(module->getContext()));
            return true;

          default:
            return false;
        }
        break;
      }

      /* %p and %s */
      case Type::PointerTyID:
        switch (slot.state.conversion_specifier) {
          case PRINTF_CONVERSION_S: {
            llvm::Constant* arg0 = dyn_cast<llvm::ConstantExpr>(arg);
            if (!arg0) {
              return false;
            }
            llvm::Constant* arg0_ptr = dyn_cast<llvm::Constant>(arg0->getOperand(0));
            if (!arg0_ptr) {
              return false;
            }
            ConstantDataSequential* fmt_arg = dyn_cast<ConstantDataSequential>(arg0_ptr->getOperand(0));
            if (!fmt_arg || !fmt_arg->isCString()) {
              return false;
            }
            slot.state.str = fmt_arg->getAsCString();
            return true;
          }
          case PRINTF_CONVERSION_P: {
            new_arg = builder->CreatePtrToInt(arg, Type::getInt32Ty(module->getContext()));
            return true;
          }
          default:
            return false;
        }
        break;

      case Type::VectorTyID: {
        Type* vect_type = arg->getType();
        Type* elt_type = vect_type->getVectorElementType();
        int vec_num = vect_type->getVectorNumElements();
        bool sign = false;

        if (vec_num != slot.state.vector_n) {
          printf("Error: the vector length of the printf argument does not match the format!\n");
          return false;
        }

        switch (slot.state.conversion_specifier) {
          case PRINTF_CONVERSION_I:
          case PRINTF_CONVERSION_D:
            sign = true;
            /* fall through */
          case PRINTF_CONVERSION_O:
          case PRINTF_CONVERSION_U:
          case PRINTF_CONVERSION_x:
          case PRINTF_CONVERSION_X: {
            if (elt_type->getTypeID() != Type::IntegerTyID) {
              printf("Type conversion between float and int is not supported in vector printf!\n");
              return false;
            }

            Type* elt_dst_type = NULL;
            if (slot.state.length_modifier == PRINTF_LM_L) {
              elt_dst_type = Type::getInt64Ty(elt_type->getContext());
            } else {
              elt_dst_type = Type::getInt32Ty(elt_type->getContext());
            }

            /* If the bit width changes, we must respect signedness. */
            if (elt_type != elt_dst_type) {
              Value *II = NULL;
              for (int i = 0; i < vec_num; i++) {
                Value *vec = II ?
II : UndefValue::get(VectorType::get(elt_dst_type, vec_num)); Value *cv = ConstantInt::get(Type::getInt32Ty(elt_type->getContext()), i); Value *org = builder->CreateExtractElement(arg, cv); Value *cvt = builder->CreateIntCast(org, elt_dst_type, sign); II = builder->CreateInsertElement(vec, cvt, cv); } new_arg = II; } return true; } case PRINTF_CONVERSION_F: case PRINTF_CONVERSION_f: case PRINTF_CONVERSION_E: case PRINTF_CONVERSION_e: case PRINTF_CONVERSION_G: case PRINTF_CONVERSION_g: case PRINTF_CONVERSION_A: case PRINTF_CONVERSION_a: if (elt_type->getTypeID() != Type::DoubleTyID && elt_type->getTypeID() != Type::FloatTyID) { printf("Do not support type conversion between float and int in vector printf!\n"); return false; } if (elt_type->getTypeID() != Type::FloatTyID) { Value *II = NULL; for (int i = 0; i < vec_num; i++) { Value *vec = II ? II : UndefValue::get(VectorType::get(Type::getFloatTy(elt_type->getContext()), vec_num)); Value *cv = ConstantInt::get(Type::getInt32Ty(elt_type->getContext()), i); Value *org = builder->CreateExtractElement(arg, cv); Value* cvt = builder->CreateFPCast(org, Type::getFloatTy(module->getContext())); II = builder->CreateInsertElement(vec, cvt, cv); } new_arg = II; } return true; default: return false; } } default: return false; } return false; } FunctionPass* createPrintfParserPass(ir::Unit &unit) { return new PrintfParser(unit); } char PrintfParser::ID = 0; } // end namespace Beignet-1.3.2-Source/backend/src/llvm/StripAttributes.cpp000664 001750 001750 00000010450 13173554000 022462 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ // Imported from pNaCl project // Copyright (c) 2003-2014 University of Illinois at Urbana-Champaign. // All rights reserved. // // Developed by: // // LLVM Team // // University of Illinois at Urbana-Champaign // // http://llvm.org // // Permission is hereby granted, free of charge, to any person obtaining a copy of // this software and associated documentation files (the "Software"), to deal with // the Software without restriction, including without limitation the rights to // use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies // of the Software, and to permit persons to whom the Software is furnished to do // so, subject to the following conditions: // // * Redistributions of source code must retain the above copyright notice, // this list of conditions and the following disclaimers. // // * Redistributions in binary form must reproduce the above copyright notice, // this list of conditions and the following disclaimers in the // documentation and/or other materials provided with the distribution. // // * Neither the names of the LLVM Team, University of Illinois at // Urbana-Champaign, nor the names of its contributors may be used to // endorse or promote products derived from this Software without specific // prior written permission. 
// // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS // FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE // CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER // LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE // SOFTWARE. // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This pass strips out attributes that are not supported by Beignet. // Currently, this strips out: // // * Calling conventions from functions and function calls. // #include "llvm_includes.hpp" #include "llvm_gen_backend.hpp" using namespace llvm; namespace { class StripAttributes : public FunctionPass { public: static char ID; // Pass identification, replacement for typeid StripAttributes(bool lastTime) : FunctionPass(ID), lastTime(lastTime) { } virtual bool runOnFunction(Function &Func); private: bool lastTime; //last time all StripAttributes }; } char StripAttributes::ID = 0; bool StripAttributes::runOnFunction(Function &Func) { Func.setCallingConv(CallingConv::C); Func.setLinkage(GlobalValue::ExternalLinkage); if (!gbe::isKernelFunction(Func)) { Func.addFnAttr(Attribute::AlwaysInline); if (lastTime || (Func.getName().find("__gen_mem") == std::string::npos)) // Memcpy and memset functions could be deleted at last inline. // Delete memcpy and memset functions for output llvm ir friendly. Func.setLinkage(GlobalValue::LinkOnceAnyLinkage); } for (Function::iterator BB = Func.begin(), E = Func.end(); BB != E; ++BB) { for (BasicBlock::iterator Inst = BB->begin(), E = BB->end(); Inst != E; ++Inst) { CallSite Call(&*Inst); if (Call) Call.setCallingConv(CallingConv::C); } } return true; } FunctionPass *llvm::createStripAttributesPass(bool lastTime) { return new StripAttributes(lastTime); } Beignet-1.3.2-Source/backend/src/llvm/llvm_intrinsic_lowering.cpp000664 001750 001750 00000014227 13173554000 024262 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
 */

/**
 * \file llvm_intrinsic_lowering.cpp
 * \author Yang Rong
 */

#include "llvm_includes.hpp"
#include "llvm/llvm_gen_backend.hpp"
#include "sys/map.hpp"

using namespace llvm;

namespace gbe {
    class InstrinsicLowering : public BasicBlockPass
    {
    public:
      static char ID;
      InstrinsicLowering() :
        BasicBlockPass(ID) {}

      void getAnalysisUsage(AnalysisUsage &AU) const
      {
      }

#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40
      virtual StringRef getPassName() const
#else
      virtual const char *getPassName() const
#endif
      {
        return "SPIR backend: lowering intrinsics";
      }

      static char convertSpaceToName(Value *val)
      {
        const uint32_t space = val->getType()->getPointerAddressSpace();
        switch (space) {
          case 0: return 'p';
          case 1: return 'g';
          case 2: return 'c';
          case 3: return 'l';
          case 4: return 'n';
          default:
            assert(0 && "Unsupported address space");
            return '\0';
        }
      }

      static CallInst *replaceCallWith(const char *NewFn, CallInst *CI,
                                       Value **ArgBegin, Value **ArgEnd,
                                       Type *RetTy)
      {
        // If we haven't already looked up this function, check to see if the
        // program already contains a function with this name.
        Module *M = CI->getParent()->getParent()->getParent();
        // Get or insert the definition now.
        std::vector<Type *> ParamTys;
        for (Value** I = ArgBegin; I != ArgEnd; ++I)
          ParamTys.push_back((*I)->getType());
        Constant* FCache = M->getOrInsertFunction(NewFn,
                                                  FunctionType::get(RetTy, ParamTys, false));

        IRBuilder<> Builder(CI->getParent(), BasicBlock::iterator(CI));
        SmallVector<Value *, 8> Args(ArgBegin, ArgEnd);
        CallInst *NewCI = Builder.CreateCall(FCache, Args);
        NewCI->setName(CI->getName());
        if (!CI->use_empty())
          CI->replaceAllUsesWith(NewCI);
        CI->eraseFromParent();
        return NewCI;
      }

      virtual bool runOnBasicBlock(BasicBlock &BB)
      {
        bool changedBlock = false;
        Module *M = BB.getParent()->getParent();

        DataLayout TD(M);
        LLVMContext &Context = BB.getContext();
        for (BasicBlock::iterator DI = BB.begin(); DI != BB.end(); ) {
          Instruction *Inst = &*DI++;
          CallInst* CI = dyn_cast<CallInst>(Inst);
          if (CI == NULL)
            continue;

          IRBuilder<> Builder(&BB, BasicBlock::iterator(CI));
          // only memcpy and memset are supported
          if (Function *F = CI->getCalledFunction()) {
            const Intrinsic::ID intrinsicID = (Intrinsic::ID) F->getIntrinsicID();
            if (intrinsicID == 0)
              continue;
            switch (intrinsicID) {
              case Intrinsic::memcpy: {
                Type *IntPtr = TD.getIntPtrType(Context);
                Value *Size = Builder.CreateIntCast(CI->getArgOperand(2), IntPtr,
                                                    /* isSigned */ false);
                Value *align = Builder.CreateIntCast(CI->getArgOperand(3), IntPtr,
                                                     /* isSigned */ false);
                ConstantInt *ci = dyn_cast<ConstantInt>(align);
                Value *Ops[3];
                Ops[0] = CI->getArgOperand(0);
                Ops[1] = CI->getArgOperand(1);
                Ops[2] = Size;
                char name[24] = "__gen_memcpy_xx";
                name[13] = convertSpaceToName(Ops[0]);
                name[14] = convertSpaceToName(Ops[1]);
                if (ci && (ci->getZExtValue() % 4 == 0)) // alignment is constant and 4-byte aligned
                  strcat(name, "_align");

                replaceCallWith(name, CI, Ops, Ops+3, Type::getVoidTy(Context));
                break;
              }
              case Intrinsic::memset: {
                Value *Op0 = CI->getArgOperand(0);
                Value *val = Builder.CreateIntCast(CI->getArgOperand(1), IntegerType::getInt8Ty(Context),
                                                   /* isSigned */ false);
                Type *IntPtr = TD.getIntPtrType(Op0->getType());
                Value *Size = Builder.CreateIntCast(CI->getArgOperand(2), IntPtr,
                                                    /* isSigned */ false);
                Value *align = Builder.CreateIntCast(CI->getArgOperand(3), IntPtr,
                                                     /* isSigned */ false);
                ConstantInt *ci = dyn_cast<ConstantInt>(align);
                Value *Ops[3];
                Ops[0] = Op0;
                // The value operand was narrowed to i8 above.
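                // A hedged illustration of the name mangling used by this
                // pass (assuming libocl provides matching definitions): a
                // constant, 4-byte-aligned memset on a __global pointer becomes
                //     __gen_memset_g_align(dst, val, size)
                // and a memcpy from __constant into __private becomes
                //     __gen_memcpy_pc(dst, src, size)
                // where the placeholder characters come from
                // convertSpaceToName(): p/g/c/l/n for
                // private/global/constant/local/generic.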
Ops[1] = val; Ops[2] = Size; char name[24] = "__gen_memset_x"; name[13] = convertSpaceToName(Ops[0]); if(ci && (ci->getZExtValue() % 4 == 0)) //alignment is constant and 4 byte align strcat(name, "_align"); replaceCallWith(name, CI, Ops, Ops+3, Type::getVoidTy(Context)); break; } default: continue; } } } return changedBlock; } }; char InstrinsicLowering::ID = 0; BasicBlockPass *createIntrinsicLoweringPass() { return new InstrinsicLowering(); } } // end namespace Beignet-1.3.2-Source/backend/src/llvm/llvm_device_enqueue.cpp000664 001750 001750 00000040267 13173554000 023343 0ustar00yryr000000 000000 /* * Copyright © 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include #include "llvm_includes.hpp" #include "ir/unit.hpp" #include "llvm_gen_backend.hpp" #include "ocl_common_defines.h" using namespace llvm; namespace gbe { BitCastInst *isInvokeBitcast(Instruction *I) { BitCastInst* bt = dyn_cast(I); if (bt == NULL) return NULL; //bt->dump(); Type* type = bt->getOperand(0)->getType(); if(!type->isPointerTy()) return NULL; PointerType *pointerType = dyn_cast(type); Type *pointed = pointerType->getElementType(); if(!pointed->isFunctionTy()) return NULL; Function *Fn = dyn_cast(bt->getOperand(0)); if(Fn == NULL) return NULL; /* This is a fake, to check the function bitcast is for block or not */ std::string fnName = Fn->getName(); if(fnName.find("_invoke") == std::string::npos) return NULL; return bt; } void mutateArgAddressSpace(Argument *arg) { std::listWorkList; WorkList.push_back(arg); while(!WorkList.empty()) { Value *v = WorkList.front(); for (Value::use_iterator iter = v->use_begin(); iter != v->use_end(); ++iter) { // After LLVM 3.5, use_iterator points to 'Use' instead of 'User', // which is more straightforward. #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 35 User *theUser = *iter; #else User *theUser = iter->getUser(); #endif // becareful with sub operation if (isa(theUser) || isa(theUser)) continue; WorkList.push_back(theUser); } PointerType *ty = dyn_cast(v->getType()); if(ty == NULL) continue; //should only one argument, private pointer type ty = PointerType::get(ty->getPointerElementType(), 1); v->mutateType(ty); WorkList.pop_front(); } } Function* setFunctionAsKernel(Module *mod, Function *Fn) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 LLVMContext &Context = mod->getContext(); Type *intTy = IntegerType::get(mod->getContext(), 32); SmallVector kernelMDArgs; // MDNode for the kernel argument address space qualifiers. SmallVector addressQuals; // MDNode for the kernel argument access qualifiers (images only). SmallVector accessQuals; // MDNode for the kernel argument type names. SmallVector argTypeNames; // MDNode for the kernel argument base type names. SmallVector argBaseTypeNames; // MDNode for the kernel argument type qualifiers. SmallVector argTypeQuals; // MDNode for the kernel argument names. 
SmallVector argNames; //Because paramter type changed, so must re-create the invoke function and replace the old one std::vector ParamTys; ValueToValueMapTy VMap; for (Function::arg_iterator I = Fn->arg_begin(), E = Fn->arg_end(); I != E; ++I) { PointerType *ty = dyn_cast(I->getType()); //Foce set the address space to global if(ty && (ty->getAddressSpace() == 0 || ty->getAddressSpace() == 4)) ty = PointerType::get(ty->getPointerElementType(), 1); ParamTys.push_back(ty); } FunctionType* NewFT = FunctionType::get(Fn->getReturnType(), ParamTys, false); Function* NewFn = Function::Create(NewFT, Function::ExternalLinkage, Fn->getName()); SmallVector Returns; Function::arg_iterator NewFnArgIt = NewFn->arg_begin(); for (Function::arg_iterator I = Fn->arg_begin(), E = Fn->arg_end(); I != E; ++I) { std::string ArgName = I->getName(); NewFnArgIt->setName(ArgName); VMap[&*I] = &(*NewFnArgIt++); } CloneFunctionInto(NewFn, Fn, VMap, /*ModuleLevelChanges=*/true, Returns); Fn->setName("__d" + Fn->getName()); mod->getFunctionList().push_back(NewFn); //mod->getOrInsertFunction(NewFn->getName(), NewFn->getFunctionType(), // NewFn->getAttributes()); for (Function::arg_iterator I = NewFn->arg_begin(), E = NewFn->arg_end(); I != E; ++I) { PointerType *ty = dyn_cast(I->getType()); //mutate the address space of all pointer derive from the argmument from private to global if(ty && ty->getAddressSpace() == 1) mutateArgAddressSpace(&*I); //ty = dyn_cast(I->getType()); addressQuals.push_back(llvm::ConstantAsMetadata::get(ConstantInt::get(intTy, ty->getAddressSpace()))); accessQuals.push_back(llvm::MDString::get(Context, "none")); argTypeNames.push_back(llvm::MDString::get(Context, "char*")); argBaseTypeNames.push_back(llvm::MDString::get(Context, "char*")); argTypeQuals.push_back(llvm::MDString::get(Context, "")); argNames.push_back(llvm::MDString::get(Context, I->getName())); } //If run to here, llvm version always > 3.9, add the version check just for build. NewFn->setMetadata("kernel_arg_addr_space", llvm::MDNode::get(Context, addressQuals)); NewFn->setMetadata("kernel_arg_access_qual", llvm::MDNode::get(Context, accessQuals)); NewFn->setMetadata("kernel_arg_type", llvm::MDNode::get(Context, argTypeNames)); NewFn->setMetadata("kernel_arg_base_type", llvm::MDNode::get(Context, argBaseTypeNames)); NewFn->setMetadata("kernel_arg_type_qual", llvm::MDNode::get(Context, argTypeQuals)); NewFn->setMetadata("kernel_arg_name", llvm::MDNode::get(Context, argNames)); return NewFn; #else assert(0); //only opencl 2.0 could reach hear. 
return Fn; #endif } Instruction* replaceInst(Instruction *I, Value *v) { //The bitcast is instruction if(BitCastInst *bt = dyn_cast(&*I)) { bt->replaceAllUsesWith(v); return bt; } return NULL; } void collectDeviceEnqueueInfo(Module *mod, ir::Unit &unit) { std::set deadInsnSet; std::set deadFunctionSet; std::map blocks; if (getModuleOclVersion(mod) < 200) return; for (Module::iterator SF = mod->begin(), E = mod->end(); SF != E; ++SF) { Function *f = &*SF; if (f->isDeclaration()) continue; for (inst_iterator I = inst_begin(f), E = inst_end(f); I != E; ++I) { if (BitCastInst* bt = isInvokeBitcast(&*I)) { /* handle block description, convert the instruction that store block * invoke pointer to store the index in the unit's block functions index.*/ Function *Fn = dyn_cast(bt->getOperand(0)); std::string fnName = Fn->getName(); int index = -1; for(size_t i=0; iuse_begin(); iter != bt->use_end(); ++iter) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 35 User *theUser = *iter; #else User *theUser = iter->getUser(); #endif if(StoreInst *st = dyn_cast(theUser)) { GetElementPtrInst * gep = dyn_cast(st->getPointerOperand()); if(gep) blocks[gep->getOperand(0)] = fnName; } } if(StoreInst* st = dyn_cast(&*I)) { GetElementPtrInst * gep = dyn_cast(st->getPointerOperand()); if(gep) blocks[gep->getOperand(0)] = fnName; } Value *v = Constant::getIntegerValue(bt->getType(), APInt(unit.getPointerSize(), index)); bt->replaceAllUsesWith(v); deadInsnSet.insert(bt); } if(CallInst *CI = dyn_cast(&*I)) { IRBuilder<> builder(CI->getParent(), BasicBlock::iterator(CI)); if(CI->getCalledFunction() == NULL) { //unnamed call function, parse the use to find the define of called function SmallVector args(CI->op_begin(), CI->op_end()-1); Value *v = CI->getCalledValue(); BitCastInst* bt = dyn_cast(v); if(bt == NULL) continue; LoadInst* ld = dyn_cast(bt->getOperand(0)); if(ld == NULL) continue; GetElementPtrInst * gep = dyn_cast(ld->getPointerOperand()); if(gep == NULL) continue; Value *fnPointer = gep->getOperand(0)->stripPointerCasts(); if(fnPointer == gep->getOperand(0)) continue; if(blocks.find(fnPointer) != blocks.end()) { std::string fnName = blocks[fnPointer]; Function* f = mod->getFunction(fnName); CallInst *newCI = builder.CreateCall(f, args); CI->replaceAllUsesWith(newCI); deadInsnSet.insert(CI); continue; } //the function is global variable if(GlobalVariable* gv = dyn_cast(fnPointer)) { Constant *c = gv->getInitializer(); ConstantExpr *expr = dyn_cast(c->getOperand(3)); BitCastInst *bt = dyn_cast(expr->getAsInstruction()); Function* f = dyn_cast(bt->getOperand(0)); CallInst *newCI = builder.CreateCall(f, args); CI->replaceAllUsesWith(newCI); deadInsnSet.insert(CI); continue; } ld = dyn_cast(fnPointer); if(ld == NULL) continue; if(GlobalVariable *gv = dyn_cast(ld->getPointerOperand())) { ConstantExpr *expr = dyn_cast(gv->getInitializer()); BitCastInst *bt = dyn_cast(expr->getAsInstruction()); GlobalVariable *block_literal = dyn_cast(bt->getOperand(0)); Constant *v = block_literal->getInitializer(); expr = dyn_cast(v->getOperand(3)); bt = dyn_cast(expr->getAsInstruction()); Function* f = dyn_cast(bt->getOperand(0)); CallInst *newCI = builder.CreateCall(f, args); CI->replaceAllUsesWith(newCI); deadInsnSet.insert(CI); continue; } if(AllocaInst *ai = dyn_cast(ld->getPointerOperand())) { Value *v = NULL; for (Value::use_iterator iter = ai->use_begin(); iter != ai->use_end(); ++iter) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 35 User *theUser = *iter; #else User *theUser = iter->getUser(); #endif 
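          /* Illustrative IR shape for this branch (names invented): the block
           * pointer was spilled through the alloca before the indirect call:
           *     store i8* bitcast (@__block_literal_global to i8*), i8** %blk
           * so the store's value operand, stripped of pointer casts, is the
           * key used below to look the block's _invoke function up in 'blocks'. */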
if(StoreInst *st = dyn_cast(theUser)) { v = st->getValueOperand()->stripPointerCasts(); } } if(blocks.find(v) == blocks.end()) { if(GlobalVariable *gv = dyn_cast(v)) { Constant *c = gv->getInitializer(); ConstantExpr *expr = dyn_cast(c->getOperand(3)); BitCastInst *bt = dyn_cast(expr->getAsInstruction()); Function* f = dyn_cast(bt->getOperand(0)); blocks[v] = f->getName(); } } std::string fnName = blocks[v]; Function* f = mod->getFunction(fnName); CallInst *newCI = builder.CreateCall(f, args); CI->replaceAllUsesWith(newCI); deadInsnSet.insert(CI); continue; } //can't find the function's define assert(0); } else { //handle enqueue_kernel function call Function *fn = CI->getCalledFunction(); if (fn->getName().find("enqueue_kernel") == std::string::npos) continue; //block parameter's index, 3 or 6 int block_index = 3; Type *type = CI->getArgOperand(block_index)->getType(); if(type->isIntegerTy()) block_index = 6; Value *block = CI->getArgOperand(block_index)->stripPointerCasts(); LoadInst *ld = dyn_cast(block); Value *v = NULL; if(ld) { Value *block = ld->getPointerOperand(); for (Value::use_iterator iter = block->use_begin(); iter != block->use_end(); ++iter) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 35 User *theUser = *iter; #else User *theUser = iter->getUser(); #endif if(StoreInst *st = dyn_cast(theUser)) { v = st->getValueOperand()->stripPointerCasts(); } } if(blocks.find(v) == blocks.end()) { if(GlobalVariable *gv = dyn_cast(v)) { Constant *c = gv->getInitializer(); ConstantExpr *expr = dyn_cast(c->getOperand(3)); BitCastInst *bt = dyn_cast(expr->getAsInstruction()); Function* f = dyn_cast(bt->getOperand(0)); blocks[v] = f->getName(); } } } else if(isa(block)) { v = block; } std::string fnName = blocks[v]; Function* f = mod->getFunction(fnName); deadFunctionSet.insert(f); f = setFunctionAsKernel(mod, f); if( fn->isVarArg() ) { //enqueue function with slm, convert to __gen_enqueue_kernel_slm call //store the slm information to a alloca address. 
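          // Hedged example of the rewrite performed below (argument names
          // invented): a variadic OpenCL 2.0 call carrying two local-size
          // arguments, such as
          //     enqueue_kernel(q, flags, ndrange, block, 2, sz0, sz1)
          // has sz0/sz1 truncated to i32 and stored into a fresh alloca, and
          // the call is replaced by
          //     __gen_enqueue_kernel_slm(q, flags, ndrange, block, 2, %slm_sizes)
          // so the runtime can read all SLM sizes from one buffer.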
int start = block_index + 1 + 1; //the first is count, skip int count = CI->getNumArgOperands() - start; Type *intTy = IntegerType::get(mod->getContext(), 32); Type *int64Ty = IntegerType::get(mod->getContext(), 64); AllocaInst *AI = builder.CreateAlloca(intTy, ConstantInt::get(intTy, count)); for(uint32_t i = start; i < CI->getNumArgOperands(); i++) { Value *ptr = builder.CreateGEP(AI, ConstantInt::get(intTy, i-start)); Value *argSize = CI->getArgOperand(i); if (argSize->getType() == int64Ty) { argSize = builder.CreateTrunc(argSize, intTy); } builder.CreateStore(argSize, ptr); } SmallVector args(CI->op_begin(), CI->op_begin() + 3); args.push_back(CI->getArgOperand(block_index)); args.push_back(ConstantInt::get(intTy, count)); args.push_back(AI); std::vector ParamTys; for (Value** iter = args.begin(); iter != args.end(); ++iter) ParamTys.push_back((*iter)->getType()); CallInst* newCI = builder.CreateCall(cast(mod->getOrInsertFunction( "__gen_enqueue_kernel_slm", FunctionType::get(intTy, ParamTys, false))), args); CI->replaceAllUsesWith(newCI); deadInsnSet.insert(CI); } } } } } for (auto it: deadInsnSet) { it->eraseFromParent(); } for (auto it: deadFunctionSet) { it->eraseFromParent(); } } }; Beignet-1.3.2-Source/backend/src/llvm/llvm_includes.hpp000664 001750 001750 00000010672 13173554000 022165 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Yang Rong */ /** * \file llvm_includes.hpp * \author Yang Rong */ #ifndef __GBE_IR_LLVM_INCLUDES_HPP__ #define __GBE_IR_LLVM_INCLUDES_HPP__ #ifdef GBE_COMPILER_AVAILABLE #include "llvm/Config/llvm-config.h" #include "llvm/IR/BasicBlock.h" #include "llvm/IR/Constants.h" #include "llvm/IR/Function.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/Module.h" #include "llvm/IR/IRBuilder.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/InstrTypes.h" #include "llvm/IR/IntrinsicInst.h" #include "llvm/IR/Attributes.h" #include "llvm/IR/CallingConv.h" #include "llvm/IR/Intrinsics.h" #include "llvm/IR/InlineAsm.h" #include "llvm/IR/LLVMContext.h" #include "llvm_includes.hpp" #include "llvm/Pass.h" #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/SmallString.h" #include "llvm/Analysis/ScalarEvolution.h" #include "llvm/Analysis/ScalarEvolutionExpressions.h" #include "llvm/Analysis/CFGPrinter.h" #include "llvm/Analysis/LoopPass.h" #include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/Analysis/LoopInfo.h" #include "llvm/Analysis/ValueTracking.h" #include "llvm/Analysis/Passes.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Support/Debug.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/FileSystem.h" #include "llvm/Support/MemoryBuffer.h" #include "llvm/Support/SourceMgr.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/FormattedStream.h" #include "llvm/Support/TargetRegistry.h" #include "llvm/Support/Host.h" #include "llvm/Support/ToolOutputFile.h" #include "llvm-c/Linker.h" #include "llvm/IRReader/IRReader.h" #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 #include //#include #else #include "llvm/Bitcode/ReaderWriter.h" #endif #include "llvm/Transforms/IPO.h" #include "llvm/Transforms/Utils/Cloning.h" #include "llvm/CodeGen/Passes.h" #include "llvm/CodeGen/IntrinsicLowering.h" #include "llvm/Transforms/Scalar.h" #include "llvm/MC/MCAsmInfo.h" #include "llvm/MC/MCContext.h" #include "llvm/MC/MCInstrInfo.h" #include "llvm/MC/MCObjectFileInfo.h" #include "llvm/MC/MCRegisterInfo.h" #include "llvm/MC/MCSubtargetInfo.h" #include "llvm/MC/MCSymbol.h" #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35 #include "llvm/IR/Mangler.h" #include "llvm/IR/CallSite.h" #include "llvm/IR/CFG.h" #include "llvm/IR/InstVisitor.h" #include "llvm/IR/IRPrintingPasses.h" #include "llvm/IR/Verifier.h" #include "llvm/IR/InstIterator.h" #include "llvm/IR/Dominators.h" #else #include "llvm/Support/CallSite.h" #include "llvm/Support/CFG.h" #include "llvm/Support/InstIterator.h" #include "llvm/InstVisitor.h" #include "llvm/Analysis/Verifier.h" #include "llvm/Analysis/Dominators.h" #include "llvm/Assembly/PrintModulePass.h" #include "llvm/Target/Mangler.h" #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 #include "llvm/Analysis/TargetLibraryInfo.h" #include "llvm/IR/LegacyPassManager.h" #else #include "llvm/Target/TargetLibraryInfo.h" #include "llvm/PassManager.h" #endif #include "llvm/ADT/Triple.h" #include #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38 #include "llvm/Analysis/BasicAliasAnalysis.h" #include "llvm/Analysis/TypeBasedAliasAnalysis.h" #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 #include "llvm/Transforms/IPO/FunctionAttrs.h" #include "llvm/Transforms/Scalar/GVN.h" #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 #include 
"llvm/Support/Error.h" #endif #endif /*GBE_COMPILER_AVAILABLE */ #endif /* __GBE_IR_LLVM_INCLUDES_HPP__ */ Beignet-1.3.2-Source/backend/src/llvm/llvm_gen_backend.cpp000664 001750 001750 00000714571 13173554000 022603 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file llvm_gen_backend.cpp * \author Benjamin Segovia */ /* Transform the LLVM IR code into Gen IR code i.e. our temporary representation * for programs running on Gen. * * Overview * ======== * * This code is mostly inspired by the (now defunct and replaced by CppBackend) * CBackend. Basically, there are two ways to transform LLVM code into machine * code (or anything else) * - You write a complete LLVM backend by the book. LLVM proposes a lot of * useful tools to do so. This is obviously the path chosen by all CPU guys * but also by AMD and nVidia which both use the backend infrastructure to * output their own intermediate language. The good point is that you can * reuse a lot of tools (like proper PHI elimination with phi congruence and * global copy propagation a la Chaitin). Bad points are: * 1/ It is a *long* journey to generate anything. * 2/ More importantly, the code is hugely biased towards CPUs. Typically, * the way registers are defined do not fit well Gen register file (which * is really more like a regular piece of memory). Same issue apply for * predicated instructions with mask which is a bit boring to use with * SSA. Indeed, since DAGSelection still manipulates SSA values, anything * predicated requires to insert extra sources * - You write function passes to do the translation yourself. Obviously, you * reinvent the wheel. However, it is easy to do and easier to maintain * (somehow) * * So, the code here just traverses LLVM asm and generates our own ISA. The * generated code is OK even if a global copy propagation pass is still overdue. * Right now, it is pretty straighforward and simplistic in that regard * * About Clang and the ABI / target * ================================ * * A major question is: how did we actually generate this LLVM code from OpenCL? * Well, thing is that there is no generic target in LLVM since there are many * dependencies on endianness or ABIs. Fortunately, the ptx (and nvptx for LLVM * 3.2) profile is pretty well adapted to our needs since NV and Gen GPU are * kind of similar, or at least they are similar enough to share the same front * end. * * Problems * ======== * * - Several things regarding constants like ConstantExpr are not properly handled. * - ptx front end generates function calls. Since we do not support them yet, * the user needs to force the inlining of all functions. 
If a function call * is intercepted, we just abort */ #include "llvm_includes.hpp" #include "llvm/llvm_gen_backend.hpp" #include "ir/context.hpp" #include "ir/unit.hpp" #include "ir/half.hpp" #include "ir/liveness.hpp" #include "ir/value.hpp" #include "sys/set.hpp" #include "sys/cvar.hpp" #include "backend/program.h" #include #include "llvm/IR/DebugLoc.h" #include "llvm/IR/DebugInfo.h" /* Not defined for LLVM 3.0 */ #if !defined(LLVM_VERSION_MAJOR) #define LLVM_VERSION_MAJOR 3 #endif /* !defined(LLVM_VERSION_MAJOR) */ #if !defined(LLVM_VERSION_MINOR) #define LLVM_VERSION_MINOR 0 #endif /* !defined(LLVM_VERSION_MINOR) */ #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 33 #error "Only LLVM 3.3 and newer are supported" #endif using namespace llvm; namespace gbe { extern bool OCL_DEBUGINFO; // first defined by calling BVAR in program.cpp /*! Gen IR manipulates only scalar types */ static bool isScalarType(const Type *type) { return type->isFloatTy() || type->isHalfTy() || type->isIntegerTy() || type->isDoubleTy() || type->isPointerTy(); } static std::string getTypeName(ir::Context &ctx, const Type *type, int sign) { GBE_ASSERT(isScalarType(type)); if (type->isFloatTy() == true) return "float"; if (type->isHalfTy() == true) return "half"; if (type->isDoubleTy() == true) return "double"; GBE_ASSERT(type->isIntegerTy() == true); if(sign) { if (type == Type::getInt1Ty(type->getContext())) return "char"; if (type == Type::getInt8Ty(type->getContext())) return "char"; if (type == Type::getInt16Ty(type->getContext())) return "short"; if (type == Type::getInt32Ty(type->getContext())) return "int"; if (type == Type::getInt64Ty(type->getContext())) return "long"; } else { if (type == Type::getInt1Ty(type->getContext())) return "uchar"; if (type == Type::getInt8Ty(type->getContext())) return "uchar"; if (type == Type::getInt16Ty(type->getContext())) return "ushort"; if (type == Type::getInt32Ty(type->getContext())) return "uint"; if (type == Type::getInt64Ty(type->getContext())) return "ulong"; } GBE_ASSERTM(false, "Unsupported type."); return ""; } /*! LLVM IR Type to Gen IR type translation */ static ir::Type getType(ir::Context &ctx, const Type *type) { GBE_ASSERT(isScalarType(type)); if (type->isFloatTy() == true) return ir::TYPE_FLOAT; if (type->isHalfTy() == true) return ir::TYPE_HALF; if (type->isDoubleTy() == true) return ir::TYPE_DOUBLE; if (type->isPointerTy() == true) { if (ctx.getPointerSize() == ir::POINTER_32_BITS) return ir::TYPE_U32; else return ir::TYPE_U64; } GBE_ASSERT(type->isIntegerTy() == true); if (type == Type::getInt1Ty(type->getContext())) return ir::TYPE_BOOL; if (type == Type::getInt8Ty(type->getContext())) return ir::TYPE_S8; if (type == Type::getInt16Ty(type->getContext())) return ir::TYPE_S16; if (type == Type::getInt32Ty(type->getContext())) return ir::TYPE_S32; if (type == Type::getInt64Ty(type->getContext())) return ir::TYPE_S64; return ir::TYPE_LARGE_INT; } /*! LLVM IR Type to Gen IR unsigned type translation */ static ir::Type getUnsignedType(ir::Context &ctx, const Type *type) { GBE_ASSERT(type->isIntegerTy() == true); if (type == Type::getInt1Ty(type->getContext())) return ir::TYPE_BOOL; if (type == Type::getInt8Ty(type->getContext())) return ir::TYPE_U8; if (type == Type::getInt16Ty(type->getContext())) return ir::TYPE_U16; if (type == Type::getInt32Ty(type->getContext())) return ir::TYPE_U32; if (type == Type::getInt64Ty(type->getContext())) return ir::TYPE_U64; ctx.getUnit().setValid(false); return ir::TYPE_U64; } /*! 
Type to register family translation */ static ir::RegisterFamily getFamily(ir::Context &ctx, const Type *type) { GBE_ASSERT(isScalarType(type) == true); if (type == Type::getInt1Ty(type->getContext())) return ir::FAMILY_BOOL; if (type == Type::getInt8Ty(type->getContext())) return ir::FAMILY_BYTE; if (type == Type::getInt16Ty(type->getContext()) || type->isHalfTy()) return ir::FAMILY_WORD; if (type == Type::getInt32Ty(type->getContext()) || type->isFloatTy()) return ir::FAMILY_DWORD; if (type == Type::getInt64Ty(type->getContext()) || type->isDoubleTy()) return ir::FAMILY_QWORD; if (type->isPointerTy()) return ctx.getPointerFamily(); ctx.getUnit().setValid(false); return ir::FAMILY_BOOL; } /*! Get number of element to process dealing either with a vector or a scalar * value */ static ir::Type getVectorInfo(ir::Context &ctx, Value *value, uint32_t &elemNum, bool useUnsigned = false) { ir::Type type; Type *llvmType = value->getType(); if (llvmType->isVectorTy() == true) { VectorType *vectorType = cast(llvmType); Type *elementType = vectorType->getElementType(); elemNum = vectorType->getNumElements(); if (useUnsigned) type = getUnsignedType(ctx, elementType); else type = getType(ctx, elementType); } else { elemNum = 1; if (useUnsigned) type = getUnsignedType(ctx, llvmType); else type = getType(ctx, llvmType); } return type; } /*! OCL to Gen-IR address type */ static INLINE ir::AddressSpace addressSpaceLLVMToGen(unsigned llvmMemSpace) { switch (llvmMemSpace) { case 0: return ir::MEM_PRIVATE; case 1: return ir::MEM_GLOBAL; case 2: return ir::MEM_CONSTANT; case 3: return ir::MEM_LOCAL; case 4: return ir::MEM_GENERIC; } GBE_ASSERT(false); return ir::MEM_GLOBAL; } static INLINE ir::AddressSpace btiToGen(const unsigned bti) { switch (bti) { case BTI_CONSTANT: return ir::MEM_CONSTANT; case BTI_PRIVATE: return ir::MEM_PRIVATE; case BTI_LOCAL: return ir::MEM_LOCAL; default: return ir::MEM_GLOBAL; } return ir::MEM_GLOBAL; } static Constant *extractConstantElem(Constant *CPV, uint32_t index) { ConstantVector *CV = dyn_cast(CPV); GBE_ASSERT(CV != NULL); #if GBE_DEBUG const uint32_t elemNum = CV->getNumOperands(); GBE_ASSERTM(index < elemNum, "Out-of-bound constant vector access"); #endif /* GBE_DEBUG */ CPV = cast(CV->getOperand(index)); return CPV; } #define TYPESIZE(TYPE,VECT,SZ) else if( name == std::string(#TYPE).append(" __attribute__((ext_vector_type("#VECT")))") ) return VECT*SZ; #define TYPESIZEVEC(TYPE,SZ)\ else if(name == #TYPE) return SZ;\ TYPESIZE(TYPE,2,SZ)\ TYPESIZE(TYPE,3,SZ)\ TYPESIZE(TYPE,4,SZ)\ TYPESIZE(TYPE,8,SZ)\ TYPESIZE(TYPE,16,SZ) static uint32_t getTypeSize(Module* M, const ir::Unit &unit, std::string& name) { if(name == "size_t") return sizeof(size_t); TYPESIZEVEC(char,1) TYPESIZEVEC(unsigned char,1) TYPESIZEVEC(short,2) TYPESIZEVEC(unsigned short,2) TYPESIZEVEC(half,2) TYPESIZEVEC(int,4) TYPESIZEVEC(unsigned int,4) TYPESIZEVEC(float,4) TYPESIZEVEC(double,8) TYPESIZEVEC(long,8) TYPESIZEVEC(unsigned long,8) else{ StructType *StrTy = M->getTypeByName("struct."+name); if(StrTy) return getTypeByteSize(unit,StrTy); } GBE_ASSERTM(false, "Unsupported type name"); return 0; } #undef TYPESIZEVEC #undef TYPESIZE /*! Handle the LLVM IR Value to Gen IR register translation. This has 2 roles: * - Split the LLVM vector into several scalar values * - Handle the transparent copies (bitcast or use of intrincics functions * like get_local_id / get_global_id */ class RegisterTranslator { public: /*! 
Indices will be zero for scalar values */ typedef std::pair ValueIndex; RegisterTranslator(ir::Context &ctx) : ctx(ctx) {} /*! Empty the maps */ void clear(void) { valueMap.clear(); scalarMap.clear(); } /*! Some values will not be allocated. For example, a bit-cast destination * like: %fake = bitcast %real or a vector insertion since we do not have * vectors in Gen-IR */ void newValueProxy(Value *real, Value *fake, uint32_t realIndex = 0u, uint32_t fakeIndex = 0u) { const ValueIndex key(fake, fakeIndex); const ValueIndex value(real, realIndex); GBE_ASSERT(valueMap.find(key) == valueMap.end()); // Do not insert twice valueMap[key] = value; } /*! Mostly used for the preallocated registers (lids, gids) */ void newScalarProxy(ir::Register reg, Value *value, uint32_t index = 0u) { const ValueIndex key(value, index); GBE_ASSERT(scalarMap.find(key) == scalarMap.end()); scalarMap[key] = reg; } /*! Allocate a new scalar register */ ir::Register newScalar(Value *value, Value *key = NULL, uint32_t index = 0u, bool uniform = false) { // we don't allow normal constant, but GlobalValue is a special case, // it needs a register to store its address GBE_ASSERT(! (isa(value) && !isa(value))); Type *type = value->getType(); auto typeID = type->getTypeID(); if (typeID == Type::PointerTyID) { Type *eltTy = dyn_cast(type)->getElementType(); if (eltTy->isStructTy()) { StructType *strTy = dyn_cast(eltTy); if (!strTy->isLiteral() && strTy->getName().data() && strstr(strTy->getName().data(), "sampler")) type = Type::getInt32Ty(value->getContext()); } } switch (typeID) { case Type::IntegerTyID: case Type::FloatTyID: case Type::HalfTyID: case Type::DoubleTyID: case Type::PointerTyID: GBE_ASSERT(index == 0); return this->_newScalar(value, key, type, index, uniform); break; case Type::VectorTyID: { auto vectorType = cast(type); auto elementType = vectorType->getElementType(); auto elementTypeID = elementType->getTypeID(); if (elementTypeID != Type::IntegerTyID && elementTypeID != Type::FloatTyID && elementTypeID != Type::HalfTyID && elementTypeID != Type::DoubleTyID) GBE_ASSERTM(false, "Vectors of elements are not supported"); return this->_newScalar(value, key, elementType, index, uniform); break; } case Type::StructTyID: { auto structType = cast(type); auto elementType = structType->getElementType(index); auto elementTypeID = elementType->getTypeID(); if (elementTypeID != Type::IntegerTyID && elementTypeID != Type::FloatTyID && elementTypeID != Type::HalfTyID && elementTypeID != Type::DoubleTyID) GBE_ASSERTM(false, "Strcuts of elements are not supported"); return this->_newScalar(value, key, elementType, index, uniform); break; } default: NOT_SUPPORTED; }; return ir::Register(); } /*! iterating in the value map to get the final real register */ void getRealValue(Value* &value, uint32_t& index) { auto end = valueMap.end(); for (;;) { auto it = valueMap.find(std::make_pair(value, index)); if (it == end) break; else { value = it->second.first; index = it->second.second; } } } /*! Get the register from the given value at given index possibly iterating * in the value map to get the final real register */ ir::Register getScalar(Value *value, uint32_t index = 0u) { getRealValue(value, index); const auto key = std::make_pair(value, index); GBE_ASSERT(scalarMap.find(key) != scalarMap.end()); return scalarMap[key]; } /*! 
Insert a given register at given Value position */ void insertRegister(const ir::Register ®, Value *value, uint32_t index) { const auto key = std::make_pair(value, index); GBE_ASSERT(scalarMap.find(key) == scalarMap.end()); scalarMap[key] = reg; } /*! Says if the value exists. Otherwise, it is undefined */ bool valueExists(Value *value, uint32_t index) { getRealValue(value, index); const auto key = std::make_pair(value, index); return scalarMap.find(key) != scalarMap.end(); } /*! if it's a undef const value, return true. Otherwise, return false. */ bool isUndefConst(Value *value, uint32_t index) { getRealValue(value, index); Constant *CPV = dyn_cast(value); if(CPV && dyn_cast(CPV)) CPV = extractConstantElem(CPV, index); return (CPV && (isa(CPV))); } private: /*! This creates a scalar register for a Value (index is the vector index when * the value is a vector of scalars) */ ir::Register _newScalar(Value *value, Value *key, Type *type, uint32_t index, bool uniform) { const ir::RegisterFamily family = getFamily(ctx, type); const ir::Register reg = ctx.reg(family, uniform); key = key == NULL ? value : key; this->insertRegister(reg, key, index); return reg; } /*! Map value to ir::Register */ map scalarMap; /*! Map values to values when this is only a translation (eq bitcast) */ map valueMap; /*! Actually allocates the registers */ ir::Context &ctx; }; class GenWriter; class MemoryInstHelper { public: MemoryInstHelper(ir::Context &c, ir::Unit &u, GenWriter *w, bool l) : ctx(c), unit(u), writer(w), legacyMode(l) { } void emitUnalignedDQLoadStore(Value *llvmValues); ir::Tuple getValueTuple(llvm::Value *llvmValues, llvm::Type *elemType, unsigned start, unsigned elemNum); void emitBatchLoadOrStore(const ir::Type type, const uint32_t elemNum, Value *llvmValues, Type * elemType); ir::Register getOffsetAddress(ir::Register basePtr, unsigned offset); void shootMessage(ir::Type type, ir::Register offset, ir::Tuple value, unsigned elemNum); template void emitLoadOrStore(T &I); private: ir::Context &ctx; ir::Unit &unit; GenWriter *writer; bool legacyMode; ir::AddressSpace addrSpace; ir::Register mBTI; ir::Register mPtr; ir::AddressMode mAddressMode; unsigned SurfaceIndex; bool isLoad; bool dwAligned; }; /*! Translate LLVM IR code to Gen IR code */ class GenWriter : public FunctionPass, public InstVisitor { /*! Unit to compute */ ir::Unit &unit; /*! Helper structure to compute the unit */ ir::Context ctx; /*! Make the LLVM-to-Gen translation */ RegisterTranslator regTranslator; /*! Map target basic block to its ir::LabelIndex */ map labelMap; /*! Condition inversion can simplify branch code. We store here all the * compare instructions we need to invert to decrease branch complexity */ set conditionSet; map globalPointer; typedef map::iterator GlobalPtrIter; /*! * node information for later optimization */ map phiMap; map> pointerOrigMap; typedef map>::iterator PtrOrigMapIter; // map pointer source to bti map BtiMap; // map printf pointer source to bti int printfBti; uint32_t printfNum; // map ptr to its bti register map BtiValueMap; // map ptr to it's base map pointerBaseMap; std::set addrStoreInst; typedef map::iterator PtrBaseMapIter; /*! We visit each function twice. Once to allocate the registers and once to * emit the Gen IR instructions */ enum Pass { PASS_EMIT_REGISTERS = 0, PASS_EMIT_INSTRUCTIONS = 1 } pass; typedef enum { CONST_INT, CONST_FLOAT, CONST_DOUBLE } ConstTypeId; LoopInfo *LI; Function *Func; const Module *TheModule; int btiBase; bool has_errors; /*! 
legacyMode is for hardware before BDW, * which do not support stateless memory access */ bool legacyMode; public: static char ID; explicit GenWriter(ir::Unit &unit) : FunctionPass(ID), unit(unit), ctx(unit), regTranslator(ctx), printfBti(-1), printfNum(0), LI(0), TheModule(0), btiBase(BTI_RESERVED_NUM), has_errors(false), legacyMode(true) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 initializeLoopInfoWrapperPassPass(*PassRegistry::getPassRegistry()); #else initializeLoopInfoPass(*PassRegistry::getPassRegistry()); #endif pass = PASS_EMIT_REGISTERS; } #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 virtual llvm::StringRef getPassName() const { return "Gen Back-End"; } #else virtual const char *getPassName() const { return "Gen Back-End"; } #endif void getAnalysisUsage(AnalysisUsage &AU) const { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 AU.addRequired(); #else AU.addRequired(); #endif AU.setPreservesAll(); } virtual bool doInitialization(Module &M); /*! helper function for parsing global constant data */ void getConstantData(const Constant * c, void* mem, uint32_t& offset, vector &) const; void collectGlobalConstant(void) const; ir::ImmediateIndex processConstantImmIndex(Constant *CPV, int32_t index = 0u); const ir::Immediate &processConstantImm(Constant *CPV, int32_t index = 0u); uint32_t incBtiBase() { GBE_ASSERT(btiBase <= BTI_MAX_ID); return btiBase++; } bool runOnFunction(Function &F) { // Do not codegen any 'available_externally' functions at all, they have // definitions outside the translation unit. if (F.hasAvailableExternallyLinkage()) return false; // As we inline all function calls, so skip non-kernel functions bool bKernel = isKernelFunction(F); if(!bKernel) return false; Func = &F; assignBti(F); if (legacyMode) analyzePointerOrigin(F); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 37 LI = &getAnalysis().getLoopInfo(); #else LI = &getAnalysis(); #endif emitFunction(F); phiMap.clear(); globalPointer.clear(); pointerOrigMap.clear(); BtiMap.clear(); BtiValueMap.clear(); pointerBaseMap.clear(); addrStoreInst.clear(); // Reset for next function btiBase = BTI_RESERVED_NUM; printfBti = -1; return false; } /*! Given a possible pointer value, find out the interested escape like load/store or atomic instruction */ void findPointerEscape(Value *ptr, std::set &mixedPtr, bool recordMixed, std::vector &revisit); /*! For all possible pointers, GlobalVariable, function pointer argument, alloca instruction, find their pointer escape points */ void analyzePointerOrigin(Function &F); unsigned getNewBti(Value *origin, bool force); void assignBti(Function &F); bool isSingleBti(Value *Val); Value *getBtiRegister(Value *v); /*! get the pointer origin */ Value *getSinglePointerOrigin(Value *ptr); /*! get the bti base address */ Value *getPointerBase(Value *ptr); void processPointerArray(Value *ptr, Value *bti, Value *base); void handleStoreLoadAddress(Function &F); MDNode *getKernelFunctionMetadata(Function *F); virtual bool doFinalization(Module &M) { return false; } /*! handle global variable register allocation (local, constant space) */ void allocateGlobalVariableRegister(Function &F); /*! gather all the loops in the function and add them to ir::Function */ void gatherLoopInfo(ir::Function &fn); /*! do topological sorting of basicblocks */ void sortBasicBlock(Function &F); /*! Emit the complete function code and declaration */ void emitFunction(Function &F); /*! Handle input and output function parameters */ void emitFunctionPrototype(Function &F); /*! 
Emit the code for a basic block */ void emitBasicBlock(BasicBlock *BB); /*! Each block end may require to emit MOVs for further PHIs */ void emitMovForPHI(BasicBlock *curr, BasicBlock *succ); /*! Alocate one or several registers (if vector) for the value */ INLINE void newRegister(Value *value, Value *key = NULL, bool uniform = false); /*! get the register for a llvm::Constant */ ir::Register getConstantRegister(Constant *c, uint32_t index = 0); /*! Return a valid register from an operand (can use LOADI to make one) */ INLINE ir::Register getRegister(Value *value, uint32_t index = 0); /*! Create a new immediate from a constant */ ir::ImmediateIndex newImmediate(Constant *CPV, uint32_t index = 0); /*! Insert a new label index when this is a scalar value */ INLINE void newLabelIndex(const BasicBlock *bb); /*! Inspect the terminator instruction and try to see if we should invert * the value to simplify the code */ INLINE void simplifyTerminator(BasicBlock *bb); /*! Helper function to emit loads and stores */ template void emitLoadOrStore(T &I); /*! Will try to remove MOVs due to PHI resolution */ void removeMOVs(const ir::Liveness &liveness, ir::Function &fn); /*! Optimize phi move based on liveness information */ void optimizePhiCopy(ir::Liveness &liveness, ir::Function &fn, map &replaceMap, map &redundantPhiCopyMap); /*! further optimization after phi copy optimization. * Global liveness interefering checking based redundant phy value * elimination. */ void postPhiCopyOptimization(ir::Liveness &liveness, ir::Function &fn, map &replaceMap, map &redundantPhiCopyMap); /*! Will try to remove redundants LOADI in basic blocks */ void removeLOADIs(const ir::Liveness &liveness, ir::Function &fn); /*! To avoid lost copy, we need two values for PHI. This function create a * fake value for the copy (basically ptr+1) */ INLINE Value *getPHICopy(Value *PHI); // Currently supported instructions #define DECL_VISIT_FN(NAME, TYPE) \ void regAllocate##NAME(TYPE &I); \ void emit##NAME(TYPE &I); \ void visit##NAME(TYPE &I) { \ if (pass == PASS_EMIT_INSTRUCTIONS) \ emit##NAME(I); \ else \ regAllocate##NAME(I); \ } DECL_VISIT_FN(BinaryOperator, Instruction); DECL_VISIT_FN(CastInst, CastInst); DECL_VISIT_FN(ReturnInst, ReturnInst); DECL_VISIT_FN(LoadInst, LoadInst); DECL_VISIT_FN(StoreInst, StoreInst); DECL_VISIT_FN(CallInst, CallInst); DECL_VISIT_FN(ICmpInst, ICmpInst); DECL_VISIT_FN(FCmpInst, FCmpInst); DECL_VISIT_FN(InsertElement, InsertElementInst); DECL_VISIT_FN(ExtractElement, ExtractElementInst); DECL_VISIT_FN(ExtractValue, ExtractValueInst); DECL_VISIT_FN(ShuffleVectorInst, ShuffleVectorInst); DECL_VISIT_FN(SelectInst, SelectInst); DECL_VISIT_FN(BranchInst, BranchInst); DECL_VISIT_FN(PHINode, PHINode); DECL_VISIT_FN(AllocaInst, AllocaInst); DECL_VISIT_FN(AtomicRMWInst, AtomicRMWInst); DECL_VISIT_FN(AtomicCmpXchgInst, AtomicCmpXchgInst); #undef DECL_VISIT_FN // Emit rounding instructions from gen native function void emitRoundingCallInst(CallInst &I, CallSite &CS, ir::Opcode opcode); // Emit unary instructions from gen native function void emitUnaryCallInst(CallInst &I, CallSite &CS, ir::Opcode opcode, ir::Type = ir::TYPE_FLOAT); // Emit unary instructions from gen native function void emitAtomicInst(CallInst &I, CallSite &CS, ir::AtomicOps opcode); // Emit workgroup instructions void emitWorkGroupInst(CallInst &I, CallSite &CS, ir::WorkGroupOps opcode); // Emit subgroup instructions void emitSubGroupInst(CallInst &I, CallSite &CS, ir::WorkGroupOps opcode); // Emit subgroup instructions void 
emitBlockReadWriteMemInst(CallInst &I, CallSite &CS, bool isWrite, uint8_t vec_size, ir::Type = ir::TYPE_U32); void emitBlockReadWriteImageInst(CallInst &I, CallSite &CS, bool isWrite, uint8_t vec_size, ir::Type = ir::TYPE_U32); uint8_t appendSampler(CallSite::arg_iterator AI); uint8_t getImageID(CallInst &I); // These instructions are not supported at all void visitVAArgInst(VAArgInst &I) {NOT_SUPPORTED;} void visitSwitchInst(SwitchInst &I) {NOT_SUPPORTED;} void visitInvokeInst(InvokeInst &I) {NOT_SUPPORTED;} void visitResumeInst(ResumeInst &I) {NOT_SUPPORTED;} void visitInlineAsm(CallInst &I) {NOT_SUPPORTED;} void visitIndirectBrInst(IndirectBrInst &I) {NOT_SUPPORTED;} void visitUnreachableInst(UnreachableInst &I) {;} void visitGetElementPtrInst(GetElementPtrInst &I) {NOT_SUPPORTED;} void visitInsertValueInst(InsertValueInst &I) {NOT_SUPPORTED;} template void visitLoadOrStore(T &I); INLINE void gatherBTI(Value *pointer, ir::BTI &bti); // batch vec4/8/16 load/store INLINE void emitBatchLoadOrStore(const ir::Type type, const uint32_t elemNum, Value *llvmValue, const ir::Register ptr, const ir::AddressSpace addrSpace, Type * elemType, bool isLoad, ir::Register bti, bool dwAligned, bool fixedBTI); // handle load of dword/qword with unaligned address void emitUnalignedDQLoadStore(ir::Register ptr, Value *llvmValues, ir::AddressSpace addrSpace, ir::Register bti, bool isLoad, bool dwAligned, bool fixedBTI); void visitInstruction(Instruction &I) {NOT_SUPPORTED;} ir::PrintfSet::PrintfFmt* getPrintfInfo(CallInst* inst) { if (unit.printfs.find((void *)inst) == unit.printfs.end()) return NULL; return unit.printfs[inst]; } void emitAtomicInstHelper(const ir::AtomicOps opcode,const ir::Type type, const ir::Register dst, llvm::Value* llvmPtr, const ir::Tuple payloadTuple); private: void setDebugInfo_CTX(llvm::Instruction * insn); // store the debug infomation in context for subsequently passing to Gen insn ir::ImmediateIndex processConstantImmIndexImpl(Constant *CPV, int32_t index = 0u); template ir::ImmediateIndex processSeqConstant(ConstantDataSequential *seq, int index, ConstTypeId tid); ir::ImmediateIndex processConstantVector(ConstantVector *cv, int index); friend class MemoryInstHelper; }; char GenWriter::ID = 0; static void updatePointerSource(Value *parent, Value *theUser, Value *source, SmallVector &pointers) { if (isa(theUser)) { SelectInst *si = dyn_cast(theUser); if (si && si->getTrueValue() == parent) pointers[0] = source; else pointers[1] = source; } else if (isa(theUser)) { PHINode *phi = dyn_cast(theUser); unsigned opNum = phi ? 
phi->getNumIncomingValues() : 0;
    for (unsigned j = 0; j < opNum; j++) {
      if (phi->getIncomingValue(j) == parent) {
        pointers[j] = source;
      }
    }
  } else {
    pointers[0] = source;
  }
}

bool isMixedPoint(Value *val, SmallVector<Value *, 4> &pointers) {
  Value *validSrc = NULL;
  unsigned i = 0;
  if (pointers.size() < 2) return false;
  while (i < pointers.size()) {
    if (pointers[i] != NULL && validSrc != NULL && pointers[i] != validSrc)
      return true;
    // when a source is the same as the value itself, we don't treat it as a
    // new source; this often occurs for PHINodes
    if (pointers[i] != NULL && validSrc == NULL && pointers[i] != val) {
      validSrc = pointers[i];
    }
    i++;
  }
  return false;
}

void GenWriter::findPointerEscape(Value *ptr, std::set<Value *> &mixedPtr,
                                  bool bFirstPass, std::vector<Value *> &revisit) {
  std::vector<Value *> workList;
  std::set<Value *> visited;
  // a loadInst result may be used as a pointer
  std::set<LoadInst *> ptrCandidate;
  bool isPointerArray = false;
  if (ptr->use_empty()) return;
  workList.push_back(ptr);

  for (unsigned i = 0; i < workList.size(); i++) {
    Value *work = workList[i];
    if (work->use_empty()) continue;

    for (Value::use_iterator iter = work->use_begin(); iter != work->use_end(); ++iter) {
      // After LLVM 3.5, use_iterator points to 'Use' instead of 'User',
      // which is more straightforward.
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 35
      User *theUser = *iter;
#else
      User *theUser = iter->getUser();
#endif
      // be careful with sub operations
      if (isa<Instruction>(theUser) &&
          dyn_cast<Instruction>(theUser)->getOpcode() == Instruction::Sub) {
        // if both operands come from ptrtoint this is a ptrdiff; no need to traverse it
        Value *op0 = theUser->getOperand(0);
        Value *op1 = theUser->getOperand(1);
        if ((isa<Instruction>(op0) &&
             dyn_cast<Instruction>(op0)->getOpcode() == Instruction::PtrToInt) &&
            (isa<Instruction>(op1) &&
             dyn_cast<Instruction>(op1)->getOpcode() == Instruction::PtrToInt)) {
          continue;
        }
      }
      if (isa<Instruction>(theUser)) {
        // some GlobalVariables may be used in a function other than the one
        // currently being processed; such users should be skipped
        if (dyn_cast<Instruction>(theUser)->getParent()->getParent() != Func)
          continue;
      }
      bool visitedInThisSource = visited.find(theUser) != visited.end();

      if (isa<SelectInst>(theUser) || isa<PHINode>(theUser)) {
        // reached from another source, update the pointer source
        PtrOrigMapIter ptrIter = pointerOrigMap.find(theUser);
        if (ptrIter == pointerOrigMap.end()) {
          // create a new one
          unsigned capacity = 1;
          if (isa<SelectInst>(theUser)) capacity = 2;
          if (isa<PHINode>(theUser)) {
            PHINode *phi = dyn_cast<PHINode>(theUser);
            capacity = phi ? phi->getNumIncomingValues() : 1;
          }
          SmallVector<Value *, 4> pointers;
          unsigned k = 0;
          while (k++ < capacity) {
            pointers.push_back(NULL);
          }
          updatePointerSource(work, theUser, ptr, pointers);
          pointerOrigMap.insert(std::make_pair(theUser, pointers));
        } else {
          // update the pointer source
          updatePointerSource(work, theUser, ptr, (*ptrIter).second);
        }
        ptrIter = pointerOrigMap.find(theUser);

        if (isMixedPoint(theUser, (*ptrIter).second)) {
          // in the first pass, we need to record the mixed-point instruction.
          // in the second pass, we don't need to go further; the reason is:
          // we always use an instruction's 'direct mixed pointer parent' as its
          // origin, so if we didn't stop here, we might set a wrong pointer origin.
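          // e.g. (illustrative IR)
          //   %p = select i1 %c, i32 addrspace(1)* %a, i32 addrspace(1)* %b
          // gives %p two distinct sources, so %p is a mixed point, and every
          // pointer later derived from %p keeps %p as its origin.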
          if (bFirstPass)
            mixedPtr.insert(theUser);
          else
            continue;
        }
        // don't fall into a dead loop
        if (visitedInThisSource || theUser == ptr) {
          continue;
        }
      }
      // a pointer used as the ValueOperand of a store instruction should be skipped
      if (StoreInst *store = dyn_cast<StoreInst>(theUser)) {
        if (store->getValueOperand() == work) {
          addrStoreInst.insert(store);
          Value *pointerOperand = store->getPointerOperand();
          // check whether the pointerOperand has already been visited or not.
          // if not visited, we need to record all the loadInsts
          // on the origin of pointerOperand.
          // if visited, that is, the origin of the pointerOperand has already
          // been traversed, we need to traverse again to record all the LoadInsts
          PtrOrigMapIter pointerOpIter = pointerOrigMap.find(pointerOperand);
          bool pointerVisited = pointerOpIter != pointerOrigMap.end();
          if (pointerVisited) {
            revisit.push_back((*pointerOpIter).second[0]);
          }

          PtrOrigMapIter ptrIter = pointerOrigMap.find(work);
          if (ptrIter == pointerOrigMap.end()) {
            // create a new one
            SmallVector<Value *, 4> pointers;
            pointers.push_back(ptr);
            pointerOrigMap.insert(std::make_pair(work, pointers));
          } else {
            // update the pointer source here
            if ((!isa<SelectInst>(work) && !isa<PHINode>(work)))
              (*ptrIter).second[0] = ptr;
          }
          continue;
        }
      }

      visited.insert(theUser);

      if (isa<LoadInst>(theUser) || isa<StoreInst>(theUser) || isa<CallInst>(theUser)) {
        if (isa<CallInst>(theUser)) {
          Function *F = dyn_cast<CallInst>(theUser)->getCalledFunction();
          if (!F || F->getIntrinsicID() != 0) continue;
        }
        Value *pointer = NULL;
        if (isa<LoadInst>(theUser)) {
          ptrCandidate.insert(cast<LoadInst>(theUser));
          pointer = dyn_cast<LoadInst>(theUser)->getPointerOperand();
        } else if (isa<StoreInst>(theUser)) {
          pointer = dyn_cast<StoreInst>(theUser)->getPointerOperand();
          // check whether we have stored an address to this pointer;
          // if yes, we need to traverse the ptrCandidates, as they are loaded pointers
          if (addrStoreInst.find(theUser) != addrStoreInst.end()) {
            isPointerArray = true;
          }
        } else if (isa<CallInst>(theUser)) {
          // atomic/read(write)image
          CallInst *ci = dyn_cast<CallInst>(theUser);
          pointer = ci ?
ci->getArgOperand(0) : NULL; } else { //theUser->dump(); GBE_ASSERT(0 && "Unknown instruction operating on pointers\n"); } // the pointer operand is same as pointer origin, don't add to pointerOrigMap if (ptr == pointer) continue; // load/store/atomic instruction, we have reached the end, stop further traversing PtrOrigMapIter ptrIter = pointerOrigMap.find(pointer); if (ptrIter == pointerOrigMap.end()) { // create new one SmallVector pointers; pointers.push_back(ptr); pointerOrigMap.insert(std::make_pair(pointer, pointers)); } else { // update the pointer source here, if ((!isa(pointer) && !isa(pointer))) (*ptrIter).second[0] = ptr; } } else { workList.push_back(theUser); } } } if (isPointerArray) { GBE_ASSERT((isa(ptr) || ptrCandidate.empty()) && "storing/loading pointers only support private array"); for (auto x : ptrCandidate) { revisit.push_back(x); } } ptrCandidate.clear(); } bool GenWriter::isSingleBti(Value *Val) { // self + others same --> single // all same ---> single if (!isa(Val) && !isa(Val)) { return true; } else { PtrOrigMapIter iter = pointerOrigMap.find(Val); SmallVector &pointers = (*iter).second; unsigned srcNum = pointers.size(); Value *source = NULL; for (unsigned x = 0; x < srcNum; x++) { // often happend in phiNode where one source is same as PHINode itself, skip it if (pointers[x] == Val) continue; if (source == NULL) source = pointers[x]; else { if (source != pointers[x]) return false; } } return true; } } Value *GenWriter::getPointerBase(Value *ptr) { PtrBaseMapIter baseIter = pointerBaseMap.find(ptr); if (baseIter != pointerBaseMap.end()) { return baseIter->second; } if (isa(ptr)) { PointerType *ty = PointerType::get(ptr->getType(), 0); return ConstantPointerNull::get(ty); } typedef std::map::iterator BtiIter; // for pointers that already assigned a bti, it is the base pointer, BtiIter found = BtiMap.find(ptr); if (found != BtiMap.end()) { if (isa(ptr->getType())) { PointerType *ty = cast(ptr->getType()); // only global pointer will have starting address if (ty->getAddressSpace() == 1) { return ptr; } else { return ConstantPointerNull::get(ty); } } else { PointerType *ty = PointerType::get(ptr->getType(), 0); return ConstantPointerNull::get(ty); } } PtrOrigMapIter iter = pointerOrigMap.find(ptr); // we may not find the ptr, as it may be uninitialized if (iter == pointerOrigMap.end()) { PointerType *ty = PointerType::get(ptr->getType(), 0); return ConstantPointerNull::get(ty); } SmallVector &pointers = (*iter).second; if (isSingleBti(ptr)) { Value *base = getPointerBase(pointers[0]); pointerBaseMap.insert(std::make_pair(ptr, base)); return base; } else { if (isa(ptr)) { SelectInst *si = dyn_cast(ptr); IRBuilder<> Builder(si->getParent()); Value *trueVal = getPointerBase((*iter).second[0]); Value *falseVal = getPointerBase((*iter).second[1]); Builder.SetInsertPoint(si); Value *base = Builder.CreateSelect(si->getCondition(), trueVal, falseVal); pointerBaseMap.insert(std::make_pair(ptr, base)); return base; } else if (isa(ptr)) { PHINode *phi = dyn_cast(ptr); IRBuilder<> Builder(phi->getParent()); Builder.SetInsertPoint(phi); PHINode *basePhi = Builder.CreatePHI(ptr->getType(), phi->getNumIncomingValues()); unsigned srcNum = pointers.size(); for (unsigned x = 0; x < srcNum; x++) { Value *base = NULL; if (pointers[x] != ptr) { base = getPointerBase(pointers[x]); } else { base = basePhi; } IRBuilder<> Builder2(phi->getIncomingBlock(x)); BasicBlock *predBB = phi->getIncomingBlock(x); if (predBB->getTerminator()) Builder2.SetInsertPoint(predBB->getTerminator()); #if 
LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 36 // llvm 3.5 and older version don't have CreateBitOrPointerCast() define Type *srcTy = base->getType(); Type *dstTy = ptr->getType(); if (srcTy->isPointerTy() && dstTy->isIntegerTy()) base = Builder2.CreatePtrToInt(base, dstTy); else if (srcTy->isIntegerTy() && dstTy->isPointerTy()) base = Builder2.CreateIntToPtr(base, dstTy); else if (srcTy != dstTy) base = Builder2.CreateBitCast(base, dstTy); #else base = Builder2.CreateBitOrPointerCast(base, ptr->getType()); #endif basePhi->addIncoming(base, phi->getIncomingBlock(x)); } pointerBaseMap.insert(std::make_pair(ptr, basePhi)); return basePhi; } else { //ptr->dump(); GBE_ASSERT(0 && "Unhandled instruction in getPointerBase\n"); return ptr; } } } Value *GenWriter::getSinglePointerOrigin(Value *ptr) { typedef std::map::iterator BtiIter; // for pointers that already assigned a bti, it is the pointer origin, BtiIter found = BtiMap.find(ptr); if (found != BtiMap.end()) return ptr; PtrOrigMapIter iter = pointerOrigMap.find(ptr); GBE_ASSERT(iter != pointerOrigMap.end()); return iter->second[0]; } Value *GenWriter::getBtiRegister(Value *Val) { typedef std::map::iterator BtiIter; typedef std::map::iterator BtiValueIter; BtiIter found = BtiMap.find(Val); BtiValueIter valueIter = BtiValueMap.find(Val); if (valueIter != BtiValueMap.end()) return valueIter->second; if (isa(Val)) { return ConstantInt::get(Type::getInt32Ty(Val->getContext()), BTI_PRIVATE); } if (found != BtiMap.end()) { // the Val already got assigned an BTI, return it Value *bti = ConstantInt::get(IntegerType::get(Val->getContext(), 32), found->second); BtiValueMap.insert(std::make_pair(Val, bti)); return bti; } else { PtrOrigMapIter iter = pointerOrigMap.find(Val); // the pointer may access an uninitialized pointer, // in this case, we will not find it in pointerOrigMap if (iter == pointerOrigMap.end()) return ConstantInt::get(Type::getInt32Ty(Val->getContext()), BTI_PRIVATE); if (isSingleBti(Val)) { Value * bti = getBtiRegister((*iter).second[0]); BtiValueMap.insert(std::make_pair(Val, bti)); return bti; } else { if (isa(Val)) { SelectInst *si = dyn_cast(Val); IRBuilder<> Builder(si->getParent()); Value *trueVal = getBtiRegister((*iter).second[0]); Value *falseVal = getBtiRegister((*iter).second[1]); Builder.SetInsertPoint(si); Value *bti = Builder.CreateSelect(si->getCondition(), trueVal, falseVal); BtiValueMap.insert(std::make_pair(Val, bti)); return bti; } else if (isa(Val)) { PHINode *phi = dyn_cast(Val); IRBuilder<> Builder(phi->getParent()); Builder.SetInsertPoint(phi); PHINode *btiPhi = Builder.CreatePHI( IntegerType::get(Val->getContext(), 32), phi->getNumIncomingValues()); SmallVector &pointers = (*iter).second; unsigned srcNum = pointers.size(); for (unsigned x = 0; x < srcNum; x++) { Value *bti = NULL; if (pointers[x] != Val) { bti = getBtiRegister(pointers[x]); } else { bti = btiPhi; } btiPhi->addIncoming(bti, phi->getIncomingBlock(x)); } BtiValueMap.insert(std::make_pair(Val, btiPhi)); return btiPhi; } else { //Val->dump(); GBE_ASSERT(0 && "Unhandled instruction in getBtiRegister\n"); return Val; } } } } unsigned GenWriter::getNewBti(Value *origin, bool force) { unsigned new_bti = 0; if (force) { new_bti = btiBase; incBtiBase(); return new_bti; } if (origin->getName().equals(StringRef("__gen_ocl_profiling_buf"))) { new_bti = btiBase; incBtiBase(); } else if (isa(origin) && dyn_cast(origin)->isConstant()) { new_bti = BTI_CONSTANT; } else { unsigned space = origin->getType()->getPointerAddressSpace(); switch (space) { case 0: 
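      // address space 0 is the private address space: it maps to the fixed
      // BTI_PRIVATE surface, so no new binding table index is allocated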
new_bti = BTI_PRIVATE; break; case 1: { new_bti = btiBase; incBtiBase(); break; } case 2: // ocl 2.0, constant pointer use separate bti if(legacyMode) new_bti = BTI_CONSTANT;//btiBase; else { new_bti = btiBase;//btiBase; incBtiBase(); } break; case 3: new_bti = BTI_LOCAL; break; default: GBE_ASSERT(0); break; } } return new_bti; } MDNode *GenWriter::getKernelFunctionMetadata(Function *F) { NamedMDNode *clKernels = TheModule->getNamedMetadata("opencl.kernels"); uint32_t ops = clKernels->getNumOperands(); for(uint32_t x = 0; x < ops; x++) { MDNode* node = clKernels->getOperand(x); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR <= 35 Value * op = node->getOperand(0); #else auto *V = cast(node->getOperand(0)); Value *op = V ? V->getValue() : NULL; #endif if(op == F) { return node; } } return NULL; } void GenWriter::assignBti(Function &F) { Module::GlobalListType &globalList = const_cast (TheModule->getGlobalList()); for(auto i = globalList.begin(); i != globalList.end(); i ++) { GlobalVariable &v = *i; if(!v.isConstantUsed()) continue; BtiMap.insert(std::make_pair(&v, getNewBti(&v, false))); } MDNode *typeNameNode = NULL; MDNode *typeBaseNameNode = NULL; MDNode *typeQualNode = NULL; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 typeNameNode = F.getMetadata("kernel_arg_type"); typeBaseNameNode = F.getMetadata("kernel_arg_base_type"); typeQualNode = F.getMetadata("kernel_arg_type_qual"); #else MDNode *node = getKernelFunctionMetadata(&F); for(uint j = 0;node && j < node->getNumOperands() - 1; j++) { MDNode *attrNode = dyn_cast_or_null(node->getOperand(1 + j)); if (attrNode == NULL) break; MDString *attrName = dyn_cast_or_null(attrNode->getOperand(0)); if (!attrName) continue; if (attrName->getString() == "kernel_arg_type") { typeNameNode = attrNode; } else if (attrName->getString() == "kernel_arg_type_qual") { typeQualNode = attrNode; } if (attrName->getString() == "kernel_arg_base_type") { typeBaseNameNode = attrNode; } } #endif unsigned argID = 0; ir::FunctionArgument::InfoFromLLVM llvmInfo; for (Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I, argID++) { unsigned opID = argID; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 39 opID += 1; #endif if(typeNameNode) { llvmInfo.typeName= (cast(typeNameNode->getOperand(opID)))->getString(); } if(typeBaseNameNode) { llvmInfo.typeBaseName= (cast(typeBaseNameNode->getOperand(opID)))->getString(); } llvmInfo.typeName= (cast(typeNameNode->getOperand(opID)))->getString(); llvmInfo.typeQual = (cast(typeQualNode->getOperand(opID)))->getString(); bool isImage = llvmInfo.isImageType(); bool isPipe = llvmInfo.isPipeType(); if (I->getType()->isPointerTy() || isImage || isPipe) { BtiMap.insert(std::make_pair(&*I, getNewBti(&*I, isImage || isPipe))); } } BasicBlock &bb = F.getEntryBlock(); for (BasicBlock::iterator iter = bb.begin(), iterE = bb.end(); iter != iterE; ++iter) { if (AllocaInst *ai = dyn_cast(iter)) { BtiMap.insert(std::make_pair(ai, BTI_PRIVATE)); } } } void GenWriter::processPointerArray(Value *ptr, Value *bti, Value *base) { std::vector workList; std::set visited; if (ptr->use_empty()) return; workList.push_back(ptr); for (unsigned i = 0; i < workList.size(); i++) { Value *work = workList[i]; if (work->use_empty()) continue; for (Value::use_iterator iter = work->use_begin(); iter != work->use_end(); ++iter) { // After LLVM 3.5, use_iterator points to 'Use' instead of 'User', // which is more straightforward. 
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 35 User *theUser = *iter; #else User *theUser = iter->getUser(); #endif if(visited.find(theUser) != visited.end()) continue; visited.insert(theUser); if (isa(theUser) || isa(theUser) || isa(theUser)) { if (isa(theUser)) { Function *F = dyn_cast(theUser)->getCalledFunction(); if (!F || F->getIntrinsicID() != 0) continue; } bool isLoad; Value *pointerOp; IRBuilder<> Builder(cast(theUser)->getParent()); if (isa(theUser)) { pointerOp = dyn_cast(theUser)->getPointerOperand(); isLoad = true; } else { pointerOp = dyn_cast(theUser)->getPointerOperand(); isLoad = false; } Builder.SetInsertPoint(cast(theUser)); Type *ptyTy = IntegerType::get(ptr->getContext(), getTypeBitSize(unit, ptr->getType())); Value *v1 = Builder.CreatePtrToInt(pointerOp, ptyTy); Value *v2 = Builder.CreatePtrToInt(getSinglePointerOrigin(pointerOp), ptyTy); Value *v3 = Builder.CreatePtrToInt(base, ptyTy); Value *v4 = Builder.CreatePtrToInt(bti, ptyTy); // newLocBase = (pointer - origin) + base_start Value *diff = Builder.CreateSub(v1, v2); Value *newLocBase = Builder.CreateAdd(v3, diff); newLocBase = Builder.CreateIntToPtr(newLocBase, Type::getInt32PtrTy(ptr->getContext())); // newLocBti = (pointer - origin) + bti_start Value *newLocBti = Builder.CreateAdd(v4, diff); newLocBti = Builder.CreateIntToPtr(newLocBti, Type::getInt32PtrTy(ptr->getContext())); // later GenWriter instruction translation needs this map info BtiValueMap.insert(std::make_pair(newLocBti, ConstantInt::get(Type::getInt32Ty(ptr->getContext()), BTI_PRIVATE))); pointerBaseMap.insert(std::make_pair(newLocBti, ConstantPointerNull::get(cast(pointerOp->getType())))); BtiValueMap.insert(std::make_pair(newLocBase, ConstantInt::get(Type::getInt32Ty(ptr->getContext()), BTI_PRIVATE))); pointerBaseMap.insert(std::make_pair(newLocBase, ConstantPointerNull::get(cast(pointerOp->getType())))); if (isLoad) { Value *loadedBase = Builder.CreateLoad(newLocBase); Value *loadedBti = Builder.CreateLoad(newLocBti); BtiValueMap.insert(std::make_pair(theUser, loadedBti)); pointerBaseMap.insert(std::make_pair(theUser, loadedBase)); } else { Value *valueOp = cast(theUser)->getValueOperand(); Value *tmp = Builder.CreatePtrToInt(getPointerBase(valueOp), Type::getInt32Ty(ptr->getContext())); Builder.CreateStore(tmp, newLocBase); Builder.CreateStore(getBtiRegister(valueOp), newLocBti); } } else { workList.push_back(theUser); } } } } void GenWriter::analyzePointerOrigin(Function &F) { // used to record where the pointers get mixed (i.e. select or phi instruction) std::set mixedPtr; // This is a two-pass algorithm, the 1st pass will try to update the pointer sources for // every instruction reachable from pointers and record mix-point in this pass. // The second pass will start from really mixed-pointer instruction like select or phinode. // and update the sources correctly. For pointers reachable from mixed-pointer, we will set // its direct mixed-pointer parent as it's pointer origin. 
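  // For example (illustrative OpenCL):
  //   __kernel void k(__global int *a, __global int *b, int c) {
  //     __global int *p = c ? a : b;   // 'p' mixes two binding table indices
  //     *p = 1;
  //   }
  // The store through 'p' cannot be tied to a single BTI at compile time, so
  // the select is a mixed point and the BTI itself is computed with a select
  // (see getBtiRegister and getPointerBase above).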
  std::vector<Value *> revisit;
  // GlobalVariables
  Module::GlobalListType &globalList =
      const_cast<Module::GlobalListType &>(TheModule->getGlobalList());
  for (auto i = globalList.begin(); i != globalList.end(); i++) {
    GlobalVariable &v = *i;
    if (!v.isConstantUsed()) continue;
    findPointerEscape(&v, mixedPtr, true, revisit);
  }
  // function arguments
  for (Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I) {
    if (I->getType()->isPointerTy()) {
      findPointerEscape(&*I, mixedPtr, true, revisit);
    }
  }
  // allocas
  BasicBlock &bb = F.getEntryBlock();
  for (BasicBlock::iterator iter = bb.begin(), iterE = bb.end(); iter != iterE; ++iter) {
    if (AllocaInst *ai = dyn_cast<AllocaInst>(iter)) {
      findPointerEscape(ai, mixedPtr, true, revisit);
    }
  }
  // storing/loading a pointer introduces revisits
  for (size_t i = 0; i < revisit.size(); ++i) {
    findPointerEscape(revisit[i], mixedPtr, true, revisit);
  }

  // the second pass starts from the mixed pointers
  for (std::set<Value *>::iterator iter = mixedPtr.begin(); iter != mixedPtr.end(); ++iter) {
    findPointerEscape(*iter, mixedPtr, false, revisit);
  }
  for (std::set<Value *>::iterator iter = mixedPtr.begin(); iter != mixedPtr.end(); ++iter) {
    getBtiRegister(*iter);
  }
  for (std::set<Value *>::iterator iter = mixedPtr.begin(); iter != mixedPtr.end(); ++iter) {
    getPointerBase(*iter);
  }
  handleStoreLoadAddress(F);
}

void GenWriter::handleStoreLoadAddress(Function &F) {
  std::set<Value *> processed;
  for (std::set<Value *>::iterator iter = addrStoreInst.begin(); iter != addrStoreInst.end(); ++iter) {
    StoreInst *store = cast<StoreInst>(*iter);
    Value *pointerOp = store->getPointerOperand();
    Value *base = getSinglePointerOrigin(pointerOp);
    if (processed.find(base) != processed.end()) {
      continue;
    }
    processed.insert(base);

    if (!isa<AllocaInst>(base)) continue;
    Value *ArraySize = cast<AllocaInst>(base)->getArraySize();

    BasicBlock &entry = F.getEntryBlock();
    BasicBlock::iterator bbIter = entry.begin();
    while (isa<AllocaInst>(bbIter)) ++bbIter;

    IRBuilder<> Builder(&entry);
    Builder.SetInsertPoint(&*bbIter);

    PointerType *AITy = cast<AllocaInst>(base)->getType();
    Value *btiArray = Builder.CreateAlloca(AITy->getElementType(), ArraySize,
                                           base->getName() + ".bti");
    Value *pointerBaseArray = Builder.CreateAlloca(AITy->getElementType(), ArraySize,
                                                   base->getName() + ".pointer-base");

    processPointerArray(base, btiArray, pointerBaseArray);
  }
}

void getSequentialData(const ConstantDataSequential *cda, void *ptr, uint32_t &offset) {
  StringRef data = cda->getRawDataValues();
  memcpy((char *)ptr + offset, data.data(), data.size());
  offset += data.size();
  return;
}

void GenWriter::getConstantData(const Constant *c, void *mem, uint32_t &offset,
                                vector<ir::RelocEntry> &relocs) const {
  Type *type = c->getType();
  Type::TypeID id = type->getTypeID();

  GBE_ASSERT(c);
  if (isa<ConstantExpr>(c)) {
    const ConstantExpr *expr = dyn_cast<ConstantExpr>(c);
    Value *pointer = expr->getOperand(0);
    if (expr->getOpcode() == Instruction::GetElementPtr) {
      uint32_t constantOffset = 0;
      Type *EltTy = pointer->getType();
      for (uint32_t op = 1; op < expr->getNumOperands(); ++op) {
        int32_t TypeIndex;
        ConstantInt *ConstOP = dyn_cast<ConstantInt>(expr->getOperand(op));
        GBE_ASSERTM(ConstOP != NULL, "must be constant index");
        TypeIndex = ConstOP->getZExtValue();
        GBE_ASSERT(TypeIndex >= 0);
        constantOffset += getGEPConstOffset(unit, pointer->getType(), TypeIndex);
        EltTy = getEltType(EltTy, TypeIndex);
      }

      ir::Constant cc = unit.getConstantSet().getConstant(pointer->getName());
      unsigned int defOffset = cc.getOffset();
      relocs.push_back(ir::RelocEntry(offset, defOffset + constantOffset));

      uint32_t size = getTypeByteSize(unit, type);
      memset((char *)mem + offset, 0, size);
      offset += size;
    } else if (expr->isCast()) {
      Constant *constPtr = cast<Constant>(pointer);
getConstantData(constPtr, mem, offset, relocs); offset += getTypeByteSize(unit, type); } return; } if (isa(c)) { ir::Constant cc = unit.getConstantSet().getConstant(c->getName()); unsigned int defOffset = cc.getOffset(); relocs.push_back(ir::RelocEntry(offset, defOffset)); uint32_t size = getTypeByteSize(unit, type); memset((char*)mem+offset, 0, size); offset += size; return; } if(isa(c)) { uint32_t size = getTypeByteSize(unit, type); offset += size; return; } else if(isa(c) || isa(c)) { uint32_t size = getTypeByteSize(unit, type); memset((char*)mem+offset, 0, size); offset += size; return; } switch(id) { case Type::TypeID::StructTyID: { const StructType * strTy = cast(c->getType()); uint32_t size = 0; for(uint32_t op=0; strTy && op < strTy->getNumElements(); op++) { Type* elementType = strTy->getElementType(op); uint32_t align = 8 * getAlignmentByte(unit, elementType); uint32_t padding = getPadding(size, align); size += padding; size += getTypeBitSize(unit, elementType); offset += padding/8; const Constant* sub = cast(c->getOperand(op)); GBE_ASSERT(sub); getConstantData(sub, mem, offset, relocs); } break; } case Type::TypeID::ArrayTyID: { const ConstantDataSequential *cds = dyn_cast(c); if(cds) getSequentialData(cds, mem, offset); else { const ConstantArray *ca = dyn_cast(c); if(!ca) return; const ArrayType *arrTy = ca->getType(); Type* elemTy = arrTy->getElementType(); uint32_t elemSize = getTypeBitSize(unit, elemTy); uint32_t padding = getPadding(elemSize, 8 * getAlignmentByte(unit, elemTy)); padding /= 8; uint32_t ops = c->getNumOperands(); for(uint32_t op = 0; op < ops; ++op) { Constant * ca = dyn_cast(c->getOperand(op)); getConstantData(ca, mem, offset, relocs); offset += padding; } } break; } case Type::TypeID::VectorTyID: { const ConstantDataSequential *cds = dyn_cast(c); const VectorType *vecTy = cast(type); GBE_ASSERT(cds); getSequentialData(cds, mem, offset); if(vecTy->getNumElements() == 3) // OCL spec require align to vec4 offset += getTypeByteSize(unit, vecTy->getElementType()); break; } case Type::TypeID::IntegerTyID: { const ConstantInt *ci = dyn_cast(c); uint32_t size = ci->getBitWidth() / 8; uint64_t data = ci->isNegative() ? 
ci->getSExtValue() : ci->getZExtValue(); memcpy((char*)mem+offset, &data, size); offset += size; break; } case Type::TypeID::FloatTyID: { const ConstantFP *cf = dyn_cast(c); *(float *)((char*)mem + offset) = cf->getValueAPF().convertToFloat(); offset += sizeof(float); break; } case Type::TypeID::DoubleTyID: { const ConstantFP *cf = dyn_cast(c); *(double *)((char*)mem + offset) = cf->getValueAPF().convertToDouble(); offset += sizeof(double); break; } case Type::TypeID::HalfTyID: { const ConstantFP *cf = dyn_cast(c); llvm::APFloat apf = cf->getValueAPF(); llvm::APInt api = apf.bitcastToAPInt(); uint64_t v64 = api.getZExtValue(); uint16_t v16 = static_cast(v64); *(unsigned short *)((char*)mem+offset) = v16; offset += sizeof(short); break; } case Type::TypeID::PointerTyID: { break; } default: { //c->dump(); NOT_IMPLEMENTED; } } } static bool isProgramGlobal(const GlobalVariable &v) { unsigned addrSpace = v.getType()->getAddressSpace(); // private/global/constant return (addrSpace == 2 || addrSpace == 1 || addrSpace == 0); } void GenWriter::collectGlobalConstant(void) const { const Module::GlobalListType &globalList = TheModule->getGlobalList(); // The first pass just create the global variable constants for(auto i = globalList.begin(); i != globalList.end(); i ++) { const GlobalVariable &v = *i; const char *name = v.getName().data(); vector relocs; if(isProgramGlobal(v)) { Type * type = v.getType()->getPointerElementType(); uint32_t size = getTypeByteSize(unit, type); uint32_t alignment = getAlignmentByte(unit, type); unit.newConstant(name, size, alignment); } } // the second pass to initialize the data for(auto i = globalList.begin(); i != globalList.end(); i ++) { const GlobalVariable &v = *i; const char *name = v.getName().data(); if(isProgramGlobal(v)) { if (v.hasInitializer()) { vector relocs; uint32_t offset = 0; ir::Constant &con = unit.getConstantSet().getConstant(name); void* mem = malloc(con.getSize()); const Constant *c = v.getInitializer(); getConstantData(c, mem, offset, relocs); unit.getConstantSet().setData((char*)mem, con.getOffset(), con.getSize()); free(mem); if (!legacyMode) { uint32_t refOffset = unit.getConstantSet().getConstant(name).getOffset(); for (uint32_t k = 0; k < relocs.size(); k++) { unit.getRelocTable().addEntry( refOffset + relocs[k].refOffset, relocs[k].defOffset ); } } } } } } bool GenWriter::doInitialization(Module &M) { FunctionPass::doInitialization(M); // Initialize TheModule = &M; uint32_t oclVersion = getModuleOclVersion(TheModule); legacyMode = oclVersion >= 200 ? false : true; unit.setOclVersion(oclVersion); collectGlobalConstant(); return false; } #define GET_EFFECT_DATA(_seq, _index, _tid) \ ((_tid == CONST_INT) ? _seq->getElementAsInteger(_index) : \ ((_tid == CONST_FLOAT) ? _seq->getElementAsFloat(_index) : \ _seq->getElementAsDouble(_index))) // typename P is for bool only, as c++ set the &vector ir::ImmediateIndex GenWriter::processSeqConstant(ConstantDataSequential *seq, int index, ConstTypeId tid) { if (index >= 0) { const T data = GET_EFFECT_DATA(seq, index, tid); return ctx.newImmediate(data); } else { vector

array; for(uint32_t i = 0; i < seq->getNumElements(); i++) array.push_back(GET_EFFECT_DATA(seq, i, tid)); return ctx.newImmediate((T*)&array[0], array.size()); } } ir::ImmediateIndex GenWriter::processConstantVector(ConstantVector *cv, int index) { if (index >= 0) { Constant *c = cv->getOperand(index); return processConstantImmIndex(c, -1); } else { vector immVector; for (uint32_t i = 0; i < cv->getNumOperands(); i++) immVector.push_back(processConstantImmIndex(cv->getOperand(i))); return ctx.newImmediate(immVector, getType(ctx, cv->getType()->getElementType())); } } ir::ImmediateIndex GenWriter::processConstantImmIndexImpl(Constant *CPV, int32_t index) { GBE_ASSERT(dyn_cast(CPV) == NULL); ConstantDataSequential *seq = dyn_cast(CPV); if (seq) { Type *Ty = seq->getElementType(); if (Ty == Type::getInt1Ty(CPV->getContext())) { return processSeqConstant(seq, index, CONST_INT); } else if (Ty == Type::getInt8Ty(CPV->getContext())) { return processSeqConstant(seq, index, CONST_INT); } else if (Ty == Type::getInt16Ty(CPV->getContext())) { return processSeqConstant(seq, index, CONST_INT); } else if (Ty == Type::getInt32Ty(CPV->getContext())) { return processSeqConstant(seq, index, CONST_INT); } else if (Ty == Type::getInt64Ty(CPV->getContext())) { return processSeqConstant(seq, index, CONST_INT); } else if (Ty == Type::getFloatTy(CPV->getContext())) { return processSeqConstant(seq, index, CONST_FLOAT); } else if (Ty == Type::getDoubleTy(CPV->getContext())) { return processSeqConstant(seq, index, CONST_DOUBLE); } else if (Ty == Type::getHalfTy(CPV->getContext())) { GBE_ASSERTM(0, "Const data array never be half float\n"); } } else if (dyn_cast(CPV)) { Type* Ty = CPV->getType(); if(Ty->isVectorTy()) Ty = (cast(Ty))->getElementType(); if (Ty == Type::getInt1Ty(CPV->getContext())) { const bool b = 0; return ctx.newImmediate(b); } else if (Ty == Type::getInt8Ty(CPV->getContext())) { const uint8_t u8 = 0; return ctx.newImmediate(u8); } else if (Ty == Type::getInt16Ty(CPV->getContext())) { const uint16_t u16 = 0; return ctx.newImmediate(u16); } else if (Ty == Type::getInt32Ty(CPV->getContext())) { const uint32_t u32 = 0; return ctx.newImmediate(u32); } else if (Ty == Type::getInt64Ty(CPV->getContext())) { const uint64_t u64 = 0; return ctx.newImmediate(u64); } else if (Ty == Type::getFloatTy(CPV->getContext())) { const float f32 = 0; return ctx.newImmediate(f32); } else if (Ty == Type::getHalfTy(CPV->getContext())) { const ir::half f16 = 0; return ctx.newImmediate(f16); } else if (Ty == Type::getDoubleTy(CPV->getContext())) { const double f64 = 0; return ctx.newImmediate(f64); } else { GBE_ASSERTM(false, "Unsupporte aggregate zero type."); return ctx.newImmediate(uint32_t(0)); } } else { if (dyn_cast(CPV)) return processConstantVector(dyn_cast(CPV), index); GBE_ASSERTM(dyn_cast(CPV) == NULL, "Unsupported constant expression"); // Integers if (ConstantInt *CI = dyn_cast(CPV)) { Type* Ty = CI->getType(); if (Ty == Type::getInt1Ty(CPV->getContext())) { const bool b = CI->getZExtValue(); return ctx.newImmediate(b); } else if (Ty == Type::getInt8Ty(CPV->getContext())) { const uint8_t u8 = CI->getZExtValue(); return ctx.newImmediate(u8); } else if (Ty == Type::getInt16Ty(CPV->getContext())) { const uint16_t u16 = CI->getZExtValue(); return ctx.newImmediate(u16); } else if (Ty == Type::getInt32Ty(CPV->getContext())) { const uint32_t u32 = CI->getZExtValue(); return ctx.newImmediate(u32); } else if (Ty == Type::getInt64Ty(CPV->getContext())) { const uint64_t u64 = CI->getZExtValue(); return 
ctx.newImmediate(u64); } else { if (CI->getValue().getActiveBits() > 64) { ctx.getUnit().setValid(false); return ctx.newImmediate(uint64_t(0)); } return ctx.newImmediate(uint64_t(CI->getZExtValue())); } } // NULL pointers if(isa(CPV)) { if (ctx.getPointerFamily() == ir::FAMILY_QWORD) return ctx.newImmediate(uint64_t(0)); else return ctx.newImmediate(uint32_t(0)); } const Type::TypeID typeID = CPV->getType()->getTypeID(); if (isa(CPV)) { Type* Ty = CPV->getType(); if (Ty == Type::getInt1Ty(CPV->getContext())) return ctx.newImmediate(false); if (Ty == Type::getInt8Ty(CPV->getContext())) return ctx.newImmediate((uint8_t)0); if (Ty == Type::getInt16Ty(CPV->getContext())) return ctx.newImmediate((uint16_t)0); if (Ty == Type::getInt32Ty(CPV->getContext())) return ctx.newImmediate((uint32_t)0); if (Ty == Type::getInt64Ty(CPV->getContext())) return ctx.newImmediate((uint64_t)0); if (Ty == Type::getFloatTy(CPV->getContext())) return ctx.newImmediate((float)0); if (Ty == Type::getHalfTy(CPV->getContext())) return ctx.newImmediate((ir::half)0); if (Ty == Type::getDoubleTy(CPV->getContext())) return ctx.newImmediate((double)0); GBE_ASSERT(0 && "Unsupported undef value type.\n"); } // Floats and doubles switch (typeID) { case Type::FloatTyID: case Type::HalfTyID: case Type::DoubleTyID: { ConstantFP *FPC = cast(CPV); GBE_ASSERT(isa(CPV) == false); if (FPC->getType() == Type::getFloatTy(CPV->getContext())) { const float f32 = FPC->getValueAPF().convertToFloat(); return ctx.newImmediate(f32); } else if (FPC->getType() == Type::getDoubleTy(CPV->getContext())) { const double f64 = FPC->getValueAPF().convertToDouble(); return ctx.newImmediate(f64); } else { llvm::APFloat apf = FPC->getValueAPF(); llvm::APInt api = apf.bitcastToAPInt(); uint64_t v64 = api.getZExtValue(); uint16_t v16 = static_cast(v64); const ir::half f16(v16); return ctx.newImmediate(f16); } } break; default: GBE_ASSERTM(false, "Unsupported constant type"); break; } } GBE_ASSERTM(false, "Unsupported constant type"); return ctx.newImmediate(uint64_t(0)); } ir::ImmediateIndex GenWriter::processConstantImmIndex(Constant *CPV, int32_t index) { if (dyn_cast(CPV) == NULL) return processConstantImmIndexImpl(CPV, index); //CPV->dump(); GBE_ASSERT(0 && "unsupported constant.\n"); return ctx.newImmediate((uint32_t)0); } const ir::Immediate &GenWriter::processConstantImm(Constant *CPV, int32_t index) { ir::ImmediateIndex immIndex = processConstantImmIndex(CPV, index); return ctx.getFunction().getImmediate(immIndex); } ir::ImmediateIndex GenWriter::newImmediate(Constant *CPV, uint32_t index) { return processConstantImmIndex(CPV, index); } void GenWriter::newRegister(Value *value, Value *key, bool uniform) { auto type = value->getType(); auto typeID = type->getTypeID(); switch (typeID) { case Type::IntegerTyID: case Type::FloatTyID: case Type::HalfTyID: case Type::DoubleTyID: case Type::PointerTyID: regTranslator.newScalar(value, key, 0, uniform); break; case Type::VectorTyID: { auto vectorType = cast(type); const uint32_t elemNum = vectorType->getNumElements(); for (uint32_t elemID = 0; elemID < elemNum; ++elemID) regTranslator.newScalar(value, key, elemID, uniform); break; } case Type::StructTyID: { auto structType = cast(type); const uint32_t elemNum = structType->getNumElements(); for (uint32_t elemID = 0; elemID < elemNum; ++elemID) regTranslator.newScalar(value, key, elemID, uniform); break; } default: NOT_SUPPORTED; }; } ir::Register GenWriter::getConstantRegister(Constant *c, uint32_t elemID) { GBE_ASSERT(c != NULL); if(isa(c)) { return 
regTranslator.getScalar(c, elemID); } if(isa(c)) { Type* llvmType = c->getType(); ir::Type dstType = getType(ctx, llvmType); ir::Register reg = ctx.reg(getFamily(dstType)); ir::ImmediateIndex immIndex; if(llvmType->isIntegerTy()) immIndex = ctx.newIntegerImmediate(0, dstType); else if(llvmType->isFloatTy()) { immIndex = ctx.newFloatImmediate((float)0.0); } else { immIndex = ctx.newDoubleImmediate((double)0.0); } ctx.LOADI(dstType, reg, immIndex); return reg; } const ir::ImmediateIndex immIndex = this->newImmediate(c, elemID); const ir::Immediate imm = ctx.getImmediate(immIndex); const ir::Register reg = ctx.reg(getFamily(imm.getType())); ctx.LOADI(imm.getType(), reg, immIndex); return reg; } ir::Register GenWriter::getRegister(Value *value, uint32_t elemID) { //the real value may be constant, so get real value before constant check regTranslator.getRealValue(value, elemID); if(isa(value)) { Constant *c = dyn_cast(value); return getConstantRegister(c, elemID); } else return regTranslator.getScalar(value, elemID); } INLINE Value *GenWriter::getPHICopy(Value *PHI) { const uintptr_t ptr = (uintptr_t) PHI; return (Value*) (ptr+1); } void GenWriter::newLabelIndex(const BasicBlock *bb) { if (labelMap.find(bb) == labelMap.end()) { const ir::LabelIndex label = ctx.label(); labelMap[bb] = label; } } void GenWriter::simplifyTerminator(BasicBlock *bb) { Value *value = bb->getTerminator(); BranchInst *I = NULL; if ((I = dyn_cast(value)) != NULL) { if (I->isConditional() == false) return; // If the "taken" successor is the next block, we try to invert the // branch. BasicBlock *succ = I->getSuccessor(0); if (std::next(Function::iterator(bb)) != Function::iterator(succ)) return; // More than one use is too complicated: we skip it Value *condition = I->getCondition(); if (condition->hasOneUse() == false) return; // Right now, we only invert comparison instruction ICmpInst *CI = dyn_cast(condition); if (CI != NULL) { GBE_ASSERT(conditionSet.find(CI) == conditionSet.end()); conditionSet.insert(CI); return; } } } void GenWriter::emitBasicBlock(BasicBlock *BB) { GBE_ASSERT(labelMap.find(BB) != labelMap.end()); ctx.LABEL(labelMap[BB]); for (auto II = BB->begin(), E = BB->end(); II != E; ++II) { if(OCL_DEBUGINFO) { llvm::Instruction * It = dyn_cast(II); setDebugInfo_CTX(It); } visit(*II); } } void GenWriter::emitMovForPHI(BasicBlock *curr, BasicBlock *succ) { for (BasicBlock::iterator I = succ->begin(); isa(I); ++I) { PHINode *PN = cast(I); Value *IV = PN->getIncomingValueForBlock(curr); Type *llvmType = PN->getType(); const ir::Type type = getType(ctx, llvmType); Value *PHICopy = this->getPHICopy(PN); const ir::Register dst = this->getRegister(PHICopy); if (!isa(IV)) { // Emit the MOV required by the PHI function. We do it simple and do not // try to optimize them. A next data flow analysis pass on the Gen IR // will remove them Constant *CP = dyn_cast(IV); if (CP) { GBE_ASSERT(isa(CP) == false); ConstantVector *CPV = dyn_cast(CP); if (CPV && dyn_cast(CPV) && isa(extractConstantElem(CPV, 0))) continue; ctx.MOV(type, dst, getRegister(CP)); } else if (regTranslator.valueExists(IV,0) || dyn_cast(IV)) { const ir::Register src = this->getRegister(IV); ctx.MOV(type, dst, src); } assert(!ctx.getBlock()->undefPhiRegs.contains(dst)); ctx.getBlock()->definedPhiRegs.insert(dst); } else { // If this is an undefined value, we don't need emit phi copy here. // But we need to record it. As latter, at liveness's backward analysis, // we don't need to pass the phi value/register to this BB which the phi // value is undefined. 
Otherwise, the phi value's liveness will be extent // incorrectly and may be extent to the basic block zero which is really bad. ctx.getBlock()->undefPhiRegs.insert(dst); } } } /*! To track read image args and write args */ struct ImageArgsInfo{ uint32_t readImageArgs; uint32_t writeImageArgs; }; static void collectImageArgs(std::string& accessQual, ImageArgsInfo& imageArgsInfo) { if(accessQual.find("read") != std::string::npos) { imageArgsInfo.readImageArgs++; GBE_ASSERT(imageArgsInfo.readImageArgs <= BTI_MAX_READ_IMAGE_ARGS); } else if(accessQual.find("write") != std::string::npos) { imageArgsInfo.writeImageArgs++; GBE_ASSERT(imageArgsInfo.writeImageArgs <= BTI_MAX_WRITE_IMAGE_ARGS); } else { //default is read_only per spec. imageArgsInfo.readImageArgs++; GBE_ASSERT(imageArgsInfo.readImageArgs <= BTI_MAX_READ_IMAGE_ARGS); } } void GenWriter::setDebugInfo_CTX(llvm::Instruction * insn) { llvm::DebugLoc dg = insn->getDebugLoc(); DebugInfo dbginfo; dbginfo.line = dg.getLine(); dbginfo.col = dg.getCol(); ctx.setDBGInfo(dbginfo); } void GenWriter::emitFunctionPrototype(Function &F) { GBE_ASSERTM(F.hasStructRetAttr() == false, "Returned value for kernel functions is forbidden"); // Loop over the kernel metadatas to set the required work group size. size_t reqd_wg_sz[3] = {0, 0, 0}; size_t hint_wg_sz[3] = {0, 0, 0}; size_t reqd_sg_sz = 0; ir::FunctionArgument::InfoFromLLVM llvmInfo; MDNode *addrSpaceNode = NULL; MDNode *typeNameNode = NULL; MDNode *typeBaseNameNode = NULL; MDNode *accessQualNode = NULL; MDNode *typeQualNode = NULL; MDNode *argNameNode = NULL; std::string functionAttributes; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39 /* LLVM 3.9 change kernel arg info as function metadata */ addrSpaceNode = F.getMetadata("kernel_arg_addr_space"); accessQualNode = F.getMetadata("kernel_arg_access_qual"); typeNameNode = F.getMetadata("kernel_arg_type"); typeBaseNameNode = F.getMetadata("kernel_arg_base_type"); typeQualNode = F.getMetadata("kernel_arg_type_qual"); argNameNode = F.getMetadata("kernel_arg_name"); MDNode *attrNode; if ((attrNode = F.getMetadata("vec_type_hint"))) { GBE_ASSERT(attrNode->getNumOperands() == 2); functionAttributes += "vec_type_hint"; auto *Op1 = cast(attrNode->getOperand(0)); Value *V = Op1 ? 
Op1->getValue() : NULL; ConstantInt *sign = mdconst::extract(attrNode->getOperand(1)); size_t signValue = sign->getZExtValue(); Type *vtype = V->getType(); Type *stype = vtype; uint32_t elemNum = 0; if (vtype->isVectorTy()) { VectorType *vectorType = cast(vtype); stype = vectorType->getElementType(); elemNum = vectorType->getNumElements(); } std::string typeName = getTypeName(ctx, stype, signValue); std::stringstream param; char buffer[100] = {0}; param << "("; param << typeName; if (vtype->isVectorTy()) param << elemNum; param << ")"; param >> buffer; functionAttributes += buffer; functionAttributes += " "; } if ((attrNode = F.getMetadata("reqd_work_group_size"))) { GBE_ASSERT(attrNode->getNumOperands() == 3); ConstantInt *x = mdconst::extract(attrNode->getOperand(0)); ConstantInt *y = mdconst::extract(attrNode->getOperand(1)); ConstantInt *z = mdconst::extract(attrNode->getOperand(2)); GBE_ASSERT(x && y && z); reqd_wg_sz[0] = x->getZExtValue(); reqd_wg_sz[1] = y->getZExtValue(); reqd_wg_sz[2] = z->getZExtValue(); functionAttributes += "reqd_work_group_size"; std::stringstream param; char buffer[100] = {0}; param << "("; param << reqd_wg_sz[0]; param << ","; param << reqd_wg_sz[1]; param << ","; param << reqd_wg_sz[2]; param << ")"; param >> buffer; functionAttributes += buffer; functionAttributes += " "; } if ((attrNode = F.getMetadata("work_group_size_hint"))) { GBE_ASSERT(attrNode->getNumOperands() == 3); ConstantInt *x = mdconst::extract(attrNode->getOperand(0)); ConstantInt *y = mdconst::extract(attrNode->getOperand(1)); ConstantInt *z = mdconst::extract(attrNode->getOperand(2)); GBE_ASSERT(x && y && z); hint_wg_sz[0] = x->getZExtValue(); hint_wg_sz[1] = y->getZExtValue(); hint_wg_sz[2] = z->getZExtValue(); functionAttributes += "work_group_size_hint"; std::stringstream param; char buffer[100] = {0}; param << "("; param << hint_wg_sz[0]; param << ","; param << hint_wg_sz[1]; param << ","; param << hint_wg_sz[2]; param << ")"; param >> buffer; functionAttributes += buffer; functionAttributes += " "; } if ((attrNode = F.getMetadata("intel_reqd_sub_group_size"))) { GBE_ASSERT(attrNode->getNumOperands() == 1); ConstantInt *sz = mdconst::extract(attrNode->getOperand(0)); GBE_ASSERT(sz); reqd_sg_sz = sz->getZExtValue(); if(!(reqd_sg_sz == 8 || reqd_sg_sz == 16)){ F.getContext().emitError("Required sub group size is illegal!"); ctx.getUnit().setValid(false); return; } functionAttributes += "intel_reqd_sub_group_size"; std::stringstream param; char buffer[100] = {0}; param << "("; param << reqd_sg_sz; param << ")"; param >> buffer; functionAttributes += buffer; functionAttributes += " "; } #else /* First find the meta data belong to this function. */ MDNode *node = getKernelFunctionMetadata(&F); /* because "-cl-kernel-arg-info", should always have meta data. 
*/ if (!F.arg_empty()) assert(node); for(uint j = 0; node && j < node->getNumOperands() - 1; j++) { MDNode *attrNode = dyn_cast_or_null(node->getOperand(1 + j)); if (attrNode == NULL) break; MDString *attrName = dyn_cast_or_null(attrNode->getOperand(0)); if (!attrName) continue; if (attrName->getString() == "reqd_work_group_size") { GBE_ASSERT(attrNode->getNumOperands() == 4); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR <= 35 ConstantInt *x = dyn_cast(attrNode->getOperand(1)); ConstantInt *y = dyn_cast(attrNode->getOperand(2)); ConstantInt *z = dyn_cast(attrNode->getOperand(3)); #else ConstantInt *x = mdconst::extract(attrNode->getOperand(1)); ConstantInt *y = mdconst::extract(attrNode->getOperand(2)); ConstantInt *z = mdconst::extract(attrNode->getOperand(3)); #endif GBE_ASSERT(x && y && z); reqd_wg_sz[0] = x->getZExtValue(); reqd_wg_sz[1] = y->getZExtValue(); reqd_wg_sz[2] = z->getZExtValue(); functionAttributes += attrName->getString(); std::stringstream param; char buffer[100] = {0}; param <<"("; param << reqd_wg_sz[0]; param << ","; param << reqd_wg_sz[1]; param << ","; param << reqd_wg_sz[2]; param <<")"; param >> buffer; functionAttributes += buffer; functionAttributes += " "; break; } else if (attrName->getString() == "kernel_arg_addr_space") { addrSpaceNode = attrNode; } else if (attrName->getString() == "kernel_arg_access_qual") { accessQualNode = attrNode; } else if (attrName->getString() == "kernel_arg_type") { typeNameNode = attrNode; } else if (attrName->getString() == "kernel_arg_base_type") { typeBaseNameNode = attrNode; } else if (attrName->getString() == "kernel_arg_type_qual") { typeQualNode = attrNode; } else if (attrName->getString() == "kernel_arg_name") { argNameNode = attrNode; } else if (attrName->getString() == "vec_type_hint") { GBE_ASSERT(attrNode->getNumOperands() == 3); functionAttributes += attrName->getString(); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR <= 35 Value* V = attrNode->getOperand(1); #else auto *Op1 = cast(attrNode->getOperand(1)); Value *V = Op1 ? 
Op1->getValue() : NULL; #endif #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR <= 35 ConstantInt *sign = dyn_cast(attrNode->getOperand(2)); #else ConstantInt *sign = mdconst::extract(attrNode->getOperand(2)); #endif size_t signValue = sign->getZExtValue(); Type* vtype = V->getType(); Type* stype = vtype; uint32_t elemNum = 0; if(vtype->isVectorTy()) { VectorType *vectorType = cast(vtype); stype = vectorType->getElementType(); elemNum = vectorType->getNumElements(); } std::string typeName = getTypeName(ctx, stype, signValue); std::stringstream param; char buffer[100] = {0}; param <<"("; param << typeName; if(vtype->isVectorTy()) param << elemNum; param <<")"; param >> buffer; functionAttributes += buffer; functionAttributes += " "; } else if (attrName->getString() == "work_group_size_hint") { GBE_ASSERT(attrNode->getNumOperands() == 4); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR <= 35 ConstantInt *x = dyn_cast(attrNode->getOperand(1)); ConstantInt *y = dyn_cast(attrNode->getOperand(2)); ConstantInt *z = dyn_cast(attrNode->getOperand(3)); #else ConstantInt *x = mdconst::extract(attrNode->getOperand(1)); ConstantInt *y = mdconst::extract(attrNode->getOperand(2)); ConstantInt *z = mdconst::extract(attrNode->getOperand(3)); #endif GBE_ASSERT(x && y && z); hint_wg_sz[0] = x->getZExtValue(); hint_wg_sz[1] = y->getZExtValue(); hint_wg_sz[2] = z->getZExtValue(); functionAttributes += attrName->getString(); std::stringstream param; char buffer[100] = {0}; param <<"("; param << hint_wg_sz[0]; param << ","; param << hint_wg_sz[1]; param << ","; param << hint_wg_sz[2]; param <<")"; param >> buffer; functionAttributes += buffer; functionAttributes += " "; } } #endif /* LLVM 3.9 Function metadata */ ctx.getFunction().setCompileWorkGroupSize(reqd_wg_sz[0], reqd_wg_sz[1], reqd_wg_sz[2]); if (reqd_sg_sz) ctx.setSimdWidth(reqd_sg_sz); ctx.getFunction().setFunctionAttributes(functionAttributes); // Loop over the arguments and output registers for them if (!F.arg_empty()) { uint32_t argID = 0; ImageArgsInfo imageArgsInfo = {}; Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); // Insert a new register for each function argument for (; I != E; ++I, ++argID) { uint32_t opID = argID; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 39 opID += 1; #endif const std::string &argName = I->getName().str(); Type *type = I->getType(); if(addrSpaceNode) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR <= 35 llvmInfo.addrSpace = (cast(addrSpaceNode->getOperand(opID)))->getZExtValue(); #else llvmInfo.addrSpace = (mdconst::extract(addrSpaceNode->getOperand(opID)))->getZExtValue(); #endif } if(typeNameNode) { llvmInfo.typeName = (cast(typeNameNode->getOperand(opID)))->getString(); //LLVM 3.9 image's type name include access qual, don't match OpenCL spec, erase them. std::vector filters = {"__read_only ", "__write_only "}; for (uint32_t i = 0; i < filters.size(); i++) { size_t pos = llvmInfo.typeName.find(filters[i]); if (pos != std::string::npos) { llvmInfo.typeName = llvmInfo.typeName.erase(pos, filters[i].length()); } } } if(typeBaseNameNode){ llvmInfo.typeBaseName = (cast(typeBaseNameNode->getOperand(opID)))->getString(); } if(accessQualNode) { llvmInfo.accessQual = (cast(accessQualNode->getOperand(opID)))->getString(); } if(typeQualNode) { llvmInfo.typeQual = (cast(typeQualNode->getOperand(opID)))->getString(); } if(argNameNode){ llvmInfo.argName = (cast(argNameNode->getOperand(opID)))->getString(); } // function arguments are uniform values. 
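      // (a kernel argument has the same value in every work item, hence the
      //  'uniform' flag passed to newRegister below)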
this->newRegister(&*I, NULL, true); // add support for vector argument. if(type->isVectorTy()) { VectorType *vectorType = cast(type); ir::Register reg = getRegister(&*I, 0); Type *elemType = vectorType->getElementType(); const uint32_t elemSize = getTypeByteSize(unit, elemType); const uint32_t elemNum = vectorType->getNumElements(); //vector's elemType always scalar type ctx.input(argName, ir::FunctionArgument::VALUE, reg, llvmInfo, getTypeByteSize(unit, type), getAlignmentByte(unit, type), 0); ir::Function& fn = ctx.getFunction(); for(uint32_t i=1; i < elemNum; i++) { ir::PushLocation argLocation(fn, argID, elemSize*i); reg = getRegister(&*I, i); ctx.appendPushedConstant(reg, argLocation); //add to push map for reg alloc } continue; } GBE_ASSERTM(isScalarType(type) == true, "vector type in the function argument is not supported yet"); const ir::Register reg = getRegister(&*I); if (llvmInfo.isImageType()) { ctx.input(argName, ir::FunctionArgument::IMAGE, reg, llvmInfo, 4, 4, 0); ctx.getFunction().getImageSet()->append(reg, &ctx, BtiMap.find(&*I)->second); collectImageArgs(llvmInfo.accessQual, imageArgsInfo); continue; } if (llvmInfo.isSamplerType()) { ctx.input(argName, ir::FunctionArgument::SAMPLER, reg, llvmInfo, 4, 4, 0); (void)ctx.getFunction().getSamplerSet()->append(reg, &ctx); continue; } if(llvmInfo.isPipeType()) { llvmInfo.typeSize = getTypeSize(F.getParent(),unit,llvmInfo.typeName); ctx.input(argName, ir::FunctionArgument::PIPE, reg, llvmInfo, getTypeByteSize(unit, type), getAlignmentByte(unit, type), BtiMap.find(&*I)->second); continue; } if (type->isPointerTy() == false) ctx.input(argName, ir::FunctionArgument::VALUE, reg, llvmInfo, getTypeByteSize(unit, type), getAlignmentByte(unit, type), 0); else { PointerType *pointerType = dyn_cast(type); if(!pointerType) continue; Type *pointed = pointerType->getElementType(); // By value structure if (I->hasByValAttr()) { const size_t structSize = getTypeByteSize(unit, pointed); ctx.input(argName, ir::FunctionArgument::STRUCTURE, reg, llvmInfo, structSize, getAlignmentByte(unit, type), 0); } // Regular user provided pointer (global, local or constant) else { const uint32_t addr = pointerType->getAddressSpace(); const ir::AddressSpace addrSpace = addressSpaceLLVMToGen(addr); const uint32_t ptrSize = getTypeByteSize(unit, type); const uint32_t align = getAlignmentByte(unit, pointed); switch (addrSpace) { case ir::MEM_GLOBAL: ctx.input(argName, ir::FunctionArgument::GLOBAL_POINTER, reg, llvmInfo, ptrSize, align, BtiMap.find(&*I)->second); break; case ir::MEM_LOCAL: ctx.input(argName, ir::FunctionArgument::LOCAL_POINTER, reg, llvmInfo, ptrSize, align, BTI_LOCAL); ctx.getFunction().setUseSLM(true); break; case ir::MEM_CONSTANT: ctx.input(argName, ir::FunctionArgument::CONSTANT_POINTER, reg, llvmInfo, ptrSize, align, 0x2); break; default: GBE_ASSERT(addrSpace != ir::MEM_PRIVATE); } } } } } // When returning a structure, first input register is the pointer to the // structure #if GBE_DEBUG const Type *type = F.getReturnType(); GBE_ASSERTM(type->isVoidTy() == true, "Returned value for kernel functions is forbidden"); // Variable number of arguments is not supported FunctionType *FT = cast(F.getFunctionType()); GBE_ASSERT(FT->isVarArg() == false); #endif /* GBE_DEBUG */ } static inline bool isFPIntBitCast(const Instruction &I) { if (!isa(I)) return false; Type *SrcTy = I.getOperand(0)->getType(); Type *DstTy = I.getType(); return (SrcTy->isFloatingPointTy() && DstTy->isIntegerTy()) || (DstTy->isFloatingPointTy() && SrcTy->isIntegerTy()); } /*! 
To track the last read and write of the registers */
struct RegInfoForMov {
  ir::Instruction *lastWriteInsn;
  ir::Instruction *lastReadInsn;
  uint32_t lastWrite;
  uint32_t lastRead;
};

/*! Replace register "from" by register "to" in the destination(s) */
static void replaceDst(ir::Instruction *insn, ir::Register from, ir::Register to) {
  const uint32_t dstNum = insn->getDstNum();
  for (uint32_t dstID = 0; dstID < dstNum; ++dstID)
    if (insn->getDst(dstID) == from)
      insn->setDst(dstID, to);
}

/*! Replace register "from" by register "to" in the source(s) */
static void replaceSrc(ir::Instruction *insn, ir::Register from, ir::Register to) {
  const uint32_t srcNum = insn->getSrcNum();
  for (uint32_t srcID = 0; srcID < srcNum; ++srcID)
    if (insn->getSrc(srcID) == from)
      insn->setSrc(srcID, to);
}

/*! lastUse maintains data about the last uses (reads/writes) of each
 *  ir::Register
 */
static void buildRegInfo(ir::BasicBlock &bb, vector<RegInfoForMov> &lastUse)
{
  // Clear the register usages
  for (auto &x : lastUse) {
    x.lastWrite = x.lastRead = 0;
    x.lastWriteInsn = x.lastReadInsn = NULL;
  }

  // Find use intervals for all registers (distinguish sources and
  // destinations)
  uint32_t insnID = 2;
  bb.foreach([&](ir::Instruction &insn) {
    if (insn.getOpcode() == ir::OP_MOV &&
        insn.getDst(0) == insn.getSrc(0)) {
      insn.remove();
      return;
    }
    const uint32_t dstNum = insn.getDstNum();
    const uint32_t srcNum = insn.getSrcNum();
    for (uint32_t srcID = 0; srcID < srcNum; ++srcID) {
      const ir::Register reg = insn.getSrc(srcID);
      lastUse[reg].lastRead = insnID;
      lastUse[reg].lastReadInsn = &insn;
    }
    for (uint32_t dstID = 0; dstID < dstNum; ++dstID) {
      const ir::Register reg = insn.getDst(dstID);
      lastUse[reg].lastWrite = insnID + 1;
      lastUse[reg].lastWriteInsn = &insn;
    }
    insnID += 2;
  });
}

void GenWriter::optimizePhiCopy(ir::Liveness &liveness, ir::Function &fn,
       map<ir::Register, ir::Register> &replaceMap,
       map<ir::Register, ir::Register> &redundantPhiCopyMap) {
  // The overall idea is that we check whether there is any interference
  // between the phi and phiCopy live ranges. If there is no point where
  // phi & phiCopy are both alive, then we can optimize away the move
  // from phiCopy to phi, and use phiCopy directly instead of phi.
  // Right now, the algorithm is still very conservative; we need to do
  // aggressive coalescing for the moves added during phi elimination.
  using namespace ir;
  ir::FunctionDAG *dag = new ir::FunctionDAG(liveness);

  for (auto &it : phiMap) {
    const Register phi = it.first;
    const Register phiCopy = it.second;

    const ir::DefSet *phiCopyDef = dag->getRegDef(phiCopy);
    const ir::UseSet *phiUse = dag->getRegUse(phi);
    const DefSet *phiDef = dag->getRegDef(phi);
    bool isOpt = true;

    // FIXME: under some situations the phiDef may be empty; this seems to be
    // a bug when building the FunctionDAG and needs to be fixed there.
    if (phiDef->empty()) continue;

    const ir::BasicBlock *phiDefBB = (*phiDef->begin())->getInstruction()->getParent();

    for (auto &x : *phiCopyDef) {
      const ir::Instruction *phiCopyDefInsn = x->getInstruction();
      const ir::BasicBlock *bb = phiCopyDefInsn->getParent();
      const Liveness::LiveOut &out = liveness.getLiveOut(bb);
      // phi & phiCopy are both alive at the endpoint of bb,
      // thus cannot be optimized.
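      // i.e. if phi is in bb's liveOut set, its value is still needed after
      // the MOV that defines phiCopy, so the two registers really do
      // interfere and must stay distinct.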
      if (out.contains(phi)) {
        isOpt = false;
        break;
      }

      const ir::Register phiCopySrc = phiCopyDefInsn->getSrc(0);
      const ir::UseSet *phiCopySrcUse = dag->getRegUse(phiCopySrc);
      const ir::DefSet *phiCopySrcDef = dag->getRegDef(phiCopySrc);

      // we should only do coalescing on instruction-def and ssa-value
      if (phiCopySrcDef->size() == 1 &&
          (*(phiCopySrcDef->begin()))->getType() == ValueDef::DEF_INSN_DST) {
        const ir::Instruction *phiCopySrcDefInsn = (*(phiCopySrcDef->begin()))->getInstruction();
        if (bb == phiDefBB && bb == phiCopySrcDefInsn->getParent()) {
          // phiCopy and phiCopySrc are defined in the same basic block as phi;
          // try to coalesce phiCopy and phiCopySrc first.
          // consider the situation below:
          //  bb1:
          //    ...
          //  bb2:
          //    x = phi [x1, bb1], [x2, bb2]
          //    x2 = x+1;
          // after de-ssa:
          //  bb2:
          //    mov x, x-copy
          //    add x2, x, 1
          //    mov x-copy, x2
          // obviously x, x-copy and x2 can be mapped to the same virtual register

          ir::BasicBlock::const_iterator iter = ir::BasicBlock::const_iterator(phiCopySrcDefInsn);
          ir::BasicBlock::const_iterator iterE = bb->end();
          iter++;
          // check that there is no use of phi in this basic block between
          // [phiCopySrc def, bb end]
          bool phiPhiCopySrcInterfere = false;
          while (iter != iterE) {
            const ir::Instruction *insn = iter.node();
            // check phiUse
            for (unsigned i = 0; i < insn->getSrcNum(); i++) {
              ir::Register src = insn->getSrc(i);
              if (src == phi) {
                phiPhiCopySrcInterfere = true;
                break;
              }
            }
            ++iter;
          }
          if (!phiPhiCopySrcInterfere) {
            replaceSrc(const_cast<ir::Instruction *>(phiCopyDefInsn), phiCopySrc, phiCopy);

            for (auto &s : *phiCopySrcDef) {
              const Instruction *phiSrcDefInsn = s->getInstruction();
              replaceDst(const_cast<Instruction *>(phiSrcDefInsn), phiCopySrc, phiCopy);
            }

            for (auto &s : *phiCopySrcUse) {
              const Instruction *phiSrcUseInsn = s->getInstruction();
              replaceSrc(const_cast<Instruction *>(phiSrcUseInsn), phiCopySrc, phiCopy);
            }
            replaceMap.insert(std::make_pair(phiCopySrc, phiCopy));
          }
        }
      } else {
        // FIXME: if phiCopySrc is a phi value and has been used for more than
        // one phiCopy, this 1:1 map will ignore all but the first one.
        if (((*(phiCopySrcDef->begin()))->getType() == ValueDef::DEF_INSN_DST) &&
            redundantPhiCopyMap.find(phiCopySrc) == redundantPhiCopyMap.end())
          redundantPhiCopyMap.insert(std::make_pair(phiCopySrc, phiCopy));
      }

      // If phi is used in the same BB that defines the phiCopy,
      // we need to carefully check the liveness of phi & phiCopy.
      // Make sure their live ranges do not interfere.
        bool phiUsedInSameBB = false;
        for (auto &y : *phiUse) {
          const ir::Instruction *phiUseInsn = y->getInstruction();
          const ir::BasicBlock *bb2 = phiUseInsn->getParent();
          if (bb2 == bb) {
            phiUsedInSameBB = true;
          }
        }
        // Check that phi is not used between the phiCopy def point and bb's
        // end point. This is often referred to as the 'phi swap issue', as in:
        //   MOV phiCopy_1, x;
        //   MOV phiCopy_2, phi_1;
        if (phiUsedInSameBB) {
          for (auto it = --bb->end(); it != bb->end(); --it) {
            const Instruction &p = *it;

            if (&p == phiCopyDefInsn) break;
            // We only care about MOV here
            if (p.getSrcNum() == 1 && p.getSrc(0) == phi) {
              isOpt = false;
              break;
            }
          }
        }
      }

      // Coalesce phi and phiCopy
      if (isOpt) {
        for (auto &x : *phiDef) {
          replaceDst(const_cast<Instruction *>(x->getInstruction()), phi, phiCopy);
        }
        for (auto &x : *phiUse) {
          const Instruction *phiUseInsn = x->getInstruction();
          replaceSrc(const_cast<Instruction *>(phiUseInsn), phi, phiCopy);
          replaceMap.insert(std::make_pair(phi, phiCopy));
        }
      }
    }
    delete dag;
  }

  void GenWriter::postPhiCopyOptimization(ir::Liveness &liveness,
          ir::Function &fn, map<ir::Register, ir::Register> &replaceMap,
          map<ir::Register, ir::Register> &redundantPhiCopyMap)
  {
    // When doing the first-pass phi copy optimization, we skip all the phi
    // src MOV cases whose phiSrcDefs are also a phi value. We handle them
    // here, after all the other phi copy optimizations have been done, so we
    // don't need to worry about reducible phi copies remaining. We only need
    // to check whether those possibly redundant phi copy pairs interfere
    // with each other globally, by leveraging the DAG information.
    using namespace ir;

    // First, validate all possible redundant phi copy pairs and update the
    // liveness information accordingly.
    if (replaceMap.size() != 0) {
      for (auto pair : replaceMap) {
        if (redundantPhiCopyMap.find(pair.first) != redundantPhiCopyMap.end()) {
          auto it = redundantPhiCopyMap.find(pair.first);
          Register phiCopy = it->second;
          Register newPhiCopySrc = pair.second;
          redundantPhiCopyMap.erase(it);
          redundantPhiCopyMap.insert(std::make_pair(newPhiCopySrc, phiCopy));
        }
      }
      liveness.replaceRegs(replaceMap);
      replaceMap.clear();
    }

    if (redundantPhiCopyMap.size() == 0)
      return;
    auto dag = new FunctionDAG(liveness);

    map<Register, Register> newRedundant;
    map<Register, Register> *curRedundant = &redundantPhiCopyMap;
    map<Register, Register> *nextRedundant = &newRedundant, tmp;
    map<Register, Register> replacedRegs, revReplacedRegs;
    // Do multi-pass redundant phi copy elimination based on the global
    // interference information.
    // FIXME: we don't need to re-compute the whole DAG for each pass.
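    // Illustrative sketch (not from the original source): the map may hold
    // chained pairs such as {a -> b} and {b -> c}. One pass may coalesce a
    // into b; the {b -> c} pair must then be re-checked against the updated
    // liveness, which is why the loop below swaps curRedundant and
    // nextRedundant and iterates until nothing changes instead of doing a
    // single sweep.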
while (curRedundant->size() > 0) { //for (auto &pair = *curRedundant) { for (auto pair = curRedundant->begin(); pair != curRedundant->end(); ) { auto phiCopySrc = pair->first; auto phiCopy = pair->second; if (replacedRegs.find(phiCopy) != replacedRegs.end() || revReplacedRegs.find(phiCopy) != revReplacedRegs.end() || revReplacedRegs.find(phiCopySrc) != revReplacedRegs.end()) { pair++; continue; } if (!dag->interfere(liveness, phiCopySrc, phiCopy)) { const ir::DefSet *phiCopySrcDef = dag->getRegDef(phiCopySrc); const ir::UseSet *phiCopySrcUse = dag->getRegUse(phiCopySrc); for (auto &s : *phiCopySrcDef) { const Instruction *phiSrcDefInsn = s->getInstruction(); replaceDst(const_cast(phiSrcDefInsn), phiCopySrc, phiCopy); } for (auto &s : *phiCopySrcUse) { const Instruction *phiSrcUseInsn = s->getInstruction(); replaceSrc(const_cast(phiSrcUseInsn), phiCopySrc, phiCopy); } replacedRegs.insert(std::make_pair(phiCopySrc, phiCopy)); revReplacedRegs.insert(std::make_pair(phiCopy, phiCopySrc)); curRedundant->erase(pair++); } else pair++; } if (replacedRegs.size() != 0) { liveness.replaceRegs(replacedRegs); for (auto &pair : *curRedundant) { auto from = pair.first; auto to = pair.second; bool revisit = false; if (replacedRegs.find(pair.second) != replacedRegs.end()) { to = replacedRegs.find(to)->second; revisit = true; } if (revReplacedRegs.find(from) != revReplacedRegs.end() || revReplacedRegs.find(to) != revReplacedRegs.end()) revisit = true; if (revisit) nextRedundant->insert(std::make_pair(from, to)); } std::swap(curRedundant, nextRedundant); } else break; nextRedundant->clear(); replacedRegs.clear(); revReplacedRegs.clear(); delete dag; dag = new ir::FunctionDAG(liveness); } delete dag; } void GenWriter::removeMOVs(const ir::Liveness &liveness, ir::Function &fn) { // We store the last write and last read for each register const uint32_t regNum = fn.regNum(); vector lastUse; lastUse.resize(regNum); // Remove the MOVs per block (local analysis only) Note that we do not try // to remove MOV for variables that outlives the block. So we use liveness // information to figure out which variable is alive fn.foreachBlock([&](ir::BasicBlock &bb) { // We need to know when each register will be read or written buildRegInfo(bb, lastUse); // Liveinfo helps us to know if the source outlives the block const ir::Liveness::BlockInfo &info = liveness.getBlockInfo(&bb); auto it = --bb.end(); if (it->isMemberOf() == true) --it; for (auto it = --bb.end(); it != bb.end();) { ir::Instruction *insn = &*it; it--; const ir::Opcode op = insn->getOpcode(); if (op == ir::OP_MOV) { const ir::Register dst = insn->getDst(0); const ir::Register src = insn->getSrc(0); // Outlives the block. We do not do anything if (info.inLiveOut(src)) continue; const RegInfoForMov &dstInfo = lastUse[dst]; const RegInfoForMov &srcInfo = lastUse[src]; // The source is not computed in this block if (srcInfo.lastWrite == 0) continue; // dst is read after src is written. We cannot overwrite dst if (dstInfo.lastRead > srcInfo.lastWrite) continue; // We are good. 
We first patch the destination then all the sources replaceDst(srcInfo.lastWriteInsn, src, dst); // Then we patch all subsequent uses of the source ir::Instruction *next = static_cast(srcInfo.lastWriteInsn->next); while (next != insn) { replaceSrc(next, src, dst); next = static_cast(next->next); } insn->remove(); } else if (op == ir::OP_LOADI) continue; else break; } }); } void GenWriter::removeLOADIs(const ir::Liveness &liveness, ir::Function &fn) { // We store the last write and last read for each register const uint32_t regNum = fn.regNum(); vector lastUse; lastUse.resize(regNum); // Traverse all blocks and remove redundant immediates. Do *not* remove // immediates that outlive the block fn.foreachBlock([&](ir::BasicBlock &bb) { // Each immediate that is already loaded in the block map loadedImm; // Immediate to immediate translation map immTranslate; // Liveinfo helps us to know if the loaded immediate outlives the block const ir::Liveness::BlockInfo &info = liveness.getBlockInfo(&bb); // We need to know when each register will be read or written buildRegInfo(bb, lastUse); // Top bottom traversal -> remove useless LOADIs uint32_t insnID = 2; bb.foreach([&](ir::Instruction &insn) { // We either try to remove the LOADI or we will try to use it as a // replacement for the next same LOADIs if (insn.isMemberOf()) { ir::LoadImmInstruction &loadImm = cast(insn); const ir::Immediate imm = loadImm.getImmediate(); const ir::Register dst = loadImm.getDst(0); // Not here: cool, we put it in the map if the register is not // overwritten. If it is, we just ignore it for simplicity. Note that // it should not happen with the way we "unSSA" the code auto it = loadedImm.find(imm); auto end = loadedImm.end(); if (it == end && lastUse[dst].lastWrite == insnID+1) loadedImm.insert(std::make_pair(imm, dst)); // We already pushed the same immediate and we do not outlive the // block. We are good to replace this immediate by the previous one else if (it != end && info.inLiveOut(dst) == false) { immTranslate.insert(std::make_pair(dst, it->second)); insn.remove(); } } // Traverse all the destinations and sources and perform the // substitutions (if any) else { const uint32_t srcNum = insn.getSrcNum(); const uint32_t dstNum = insn.getDstNum(); for (uint32_t srcID = 0; srcID < srcNum; ++srcID) { const ir::Register src = insn.getSrc(srcID); auto it = immTranslate.find(src); if (it != immTranslate.end()) insn.setSrc(srcID, it->second); } for (uint32_t dstID = 0; dstID < dstNum; ++dstID) { const ir::Register dst = insn.getDst(dstID); auto it = immTranslate.find(dst); if (it != immTranslate.end()) insn.setDst(dstID, it->second); } } insnID += 2; }); }); } BVAR(OCL_OPTIMIZE_PHI_MOVES, true); BVAR(OCL_OPTIMIZE_LOADI, true); static const Instruction *getInstructionUseLocal(const Value *v) { // Local variable can only be used in one kernel function. So, if we find // one instruction that use the local variable, simply return. const Instruction *insn = NULL; for(Value::const_use_iterator iter = v->use_begin(); iter != v->use_end(); ++iter) { // After LLVM 3.5, use_iterator points to 'Use' instead of 'User', which is more straightforward. 
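    // For example (illustrative only), with LLVM >= 3.5 one would write
    //   for (const Use &u : v->uses()) { const User *user = u.getUser(); ... }
    // whereas older releases dereference the iterator straight to the User,
    // which is what the version guard below selects between.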
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR < 35 const User *theUser = *iter; #else const User *theUser = iter->getUser(); #endif if(isa(theUser)) return cast(theUser); insn = getInstructionUseLocal(theUser); if(insn != NULL) break; } return insn; } void GenWriter::allocateGlobalVariableRegister(Function &F) { // Allocate a address register for each global variable const Module::GlobalListType &globalList = TheModule->getGlobalList(); for(auto i = globalList.begin(); i != globalList.end(); i ++) { const GlobalVariable &v = *i; if(!v.isConstantUsed()) continue; ir::AddressSpace addrSpace = addressSpaceLLVMToGen(v.getType()->getAddressSpace()); if(addrSpace == ir::MEM_LOCAL) { const Value * val = cast(&v); const Instruction *insn = getInstructionUseLocal(val); GBE_ASSERT(insn && "Can't find a valid reference instruction for local variable."); const BasicBlock * bb = insn->getParent(); const Function * func = bb->getParent(); if(func != &F) continue; ir::Function &f = ctx.getFunction(); f.setUseSLM(true); const Constant *c = v.getInitializer(); Type *ty = c->getType(); uint32_t oldSlm = f.getSLMSize(); // FIXME temporary reserve 4 bytes to avoid 0 address if (oldSlm == 0) oldSlm = 4; uint32_t align = 8 * getAlignmentByte(unit, ty); uint32_t padding = getPadding(oldSlm*8, align); f.setSLMSize(oldSlm + padding/8 + getTypeByteSize(unit, ty)); this->newRegister(const_cast(&v)); ir::Register reg = regTranslator.getScalar(const_cast(&v), 0); ctx.LOADI(getType(ctx, v.getType()), reg, ctx.newIntegerImmediate(oldSlm + padding/8, getType(ctx, v.getType()))); } else if(addrSpace == ir::MEM_CONSTANT || addrSpace == ir::MEM_GLOBAL || v.isConstant()) { if(v.getName().equals(StringRef("__gen_ocl_profiling_buf"))) { ctx.getUnit().getProfilingInfo()->setBTI(BtiMap.find(const_cast(&v))->second); regTranslator.newScalarProxy(ir::ocl::profilingbptr, const_cast(&v)); } else { this->newRegister(const_cast(&v)); ir::Register reg = regTranslator.getScalar(const_cast(&v), 0); ir::Constant &con = unit.getConstantSet().getConstant(v.getName()); if (!legacyMode) { ir::Register regload = ctx.reg(getFamily(getType(ctx, v.getType()))); ctx.LOADI(getType(ctx, v.getType()), regload, ctx.newIntegerImmediate(con.getOffset(), getType(ctx, v.getType()))); ctx.ADD(getType(ctx, v.getType()), reg, ir::ocl::constant_addrspace, regload); } else ctx.LOADI(getType(ctx, v.getType()), reg, ctx.newIntegerImmediate(con.getOffset(), getType(ctx, v.getType()))); } } else if(addrSpace == ir::MEM_PRIVATE) { this->newRegister(const_cast(&v)); } } } static INLINE void findAllLoops(LoopInfo * LI, std::vector> &lp) { for (Loop::reverse_iterator I = LI->rbegin(), E = LI->rend(); I != E; ++I) { lp.push_back(std::make_pair(*I, -1)); } if (lp.size() == 0) return; uint32_t i = 0; do { const std::vector subLoops = lp[i].first->getSubLoops(); for(auto sub : subLoops) lp.push_back(std::make_pair(sub, i)); i++; } while(i < lp.size()); } void GenWriter::gatherLoopInfo(ir::Function &fn) { vector loopBBs; vector> loopExits; std::vector> lp; findAllLoops(LI, lp); for (auto loop : lp) { loopBBs.clear(); loopExits.clear(); const std::vector &inBBs = loop.first->getBlocks(); for (auto b : inBBs) { GBE_ASSERT(labelMap.find(b) != labelMap.end()); loopBBs.push_back(labelMap[b]); } BasicBlock *preheader = loop.first->getLoopPredecessor(); ir::LabelIndex preheaderBB(0); if (preheader) { preheaderBB = labelMap[preheader]; } SmallVector exitBBs; loop.first->getExitEdges(exitBBs); for(auto b : exitBBs){ GBE_ASSERT(labelMap.find(b.first) != labelMap.end()); 
GBE_ASSERT(labelMap.find(b.second) != labelMap.end());
        loopExits.push_back(std::make_pair(labelMap[b.first], labelMap[b.second]));
      }
      fn.addLoop(preheaderBB, loop.second, loopBBs, loopExits);
    }
  }

  static unsigned getChildNo(BasicBlock *bb) {
    TerminatorInst *term = bb->getTerminator();
    return term->getNumSuccessors();
  }

  // Return NULL if the index is out of range of the children number
  static BasicBlock *getChildPossible(BasicBlock *bb, unsigned index) {
    TerminatorInst *term = bb->getTerminator();
    unsigned childNo = term->getNumSuccessors();
    BasicBlock *child = NULL;
    if (index < childNo) {
      child = term->getSuccessor(index);
    }
    return child;
  }

  /*! Sorting basic blocks is mainly used to solve a register liveness issue;
    take a look at the CFG below:

         -<--1--
         |      |
         ->2 -- 3 <---
            |   ^    |
         -->4--  |   |
         |  |    |   |
         |  -----5<---
         |       |
         --------6<----
                 |
              -->7

    1.) A register %10 is defined in bb4 and used in bb5 & bb6. In normal
    liveness analysis, %10 is not alive in bb3. But under the SIMD execution
    model, after executing bb4 some channels jump through bb5 to bb3 while
    other channels may jump to bb6; we must execute bb3 first, then bb6, to
    avoid missing instructions. The physical register of %10 was assigned
    some value in bb4, but when executing bb3 its content may be overwritten,
    as %10 is dead in bb3. When jumping back to execute bb6, it will read
    polluted data. What a disaster! What we do here is a topological sorting
    of the basic blocks; for this case bb3 will be placed after bb5 & bb6.
    The liveness calculation then proceeds as normal and will be correct.

    2.) Another advantage of sorting basic blocks is reduced register
    pressure. In the above CFG, a register defined in bb3 and used in bb7
    would be alive through 3, 4, 5, 6 and 7, even though it really only needs
    to be alive in bb3 and bb7. After topological sorting, this kind of
    register is only alive in bb3 and bb7, and register pressure in 4, 5 and
    6 is reduced.

    3.) A classical post-order traversal will automatically choose an order
    for the successors of a basic block, but this order may be hard to
    handle; take a look at the CFG below:

        1 <-----
       /        |
      2 --> 4 --
      |
      3
      |
      5

    In the post-order traversal this may be 5->4->3->2->1, as 4 and 3 have no
    strict order. This is a serious issue: a value defined in bb3 and used in
    bb5 may be overwritten in bb1. Remember the SIMD execution model? Some
    lanes may execute bb4 after other lanes finish bb3 and then jump to bb1,
    but the live range of the register does not cover bb1. What we do here is
    that, for a loop exit (here bb3), we always make sure it is visited first
    in the post-order traversal; for this graph that means 5->3->4->2->1.
    Then a value defined in bb3 and used in bb5 will not interfere with any
    other value defined in the loop.
    FIXME: For an irreducible graph, we need to identify it and convert it to
    a reducible graph. */
  void GenWriter::sortBasicBlock(Function &F) {
    BasicBlock &entry = F.getEntryBlock();
    std::vector<BasicBlock *> visitStack;
    std::vector<BasicBlock *> sorted;
    std::set<BasicBlock *> visited;

    visitStack.push_back(&entry);
    visited.insert(&entry);

    while (!visitStack.empty()) {
      BasicBlock *top = visitStack.back();
      unsigned childNo = getChildNo(top);
      GBE_ASSERT(childNo <= 2);

      BasicBlock *child0 = getChildPossible(top, 0);
      BasicBlock *child1 = getChildPossible(top, 1);
      if (childNo == 2) {
        Loop *loop = LI->getLoopFor(top);
        // Visit the loop exit node first, so the loop exit block will be
        // placed after the blocks in the loop in the 'reverse post-order'
        // list.
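        // Illustration (not from the original source): in the second CFG
        // above, top == 2 has child0 == 4 (inside the loop) and child1 == 3
        // (the loop exit). The swap below makes the DFS visit 3 first,
        // yielding the post-order 5->3->4->2->1 described above rather than
        // 5->4->3->2->1.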
if (loop && loop->contains(child0) && !loop->contains(child1)) { BasicBlock *tmp = child0; child0 = child1; child1 = tmp; } } if (child0 != NULL && visited.find(child0) == visited.end()) { visitStack.push_back(child0); visited.insert(child0); } else if (child1 != NULL && visited.find(child1) == visited.end()) { visitStack.push_back(child1); visited.insert(child1); } else { sorted.push_back(visitStack.back()); visitStack.pop_back(); } } Function::BasicBlockListType &bbList = F.getBasicBlockList(); for (std::vector::iterator iter = sorted.begin(); iter != sorted.end(); ++iter) { (*iter)->removeFromParent(); } for (std::vector::reverse_iterator iter = sorted.rbegin(); iter != sorted.rend(); ++iter) { bbList.push_back(*iter); } } void GenWriter::emitFunction(Function &F) { switch (F.getCallingConv()) { case CallingConv::C: case CallingConv::Fast: case CallingConv::SPIR_KERNEL: break; default: GBE_ASSERTM(false, "Unsupported calling convention"); } ctx.startFunction(F.getName()); ir::Function &fn = ctx.getFunction(); this->regTranslator.clear(); this->labelMap.clear(); this->emitFunctionPrototype(F); this->allocateGlobalVariableRegister(F); sortBasicBlock(F); // Visit all the instructions and emit the IR registers or the value to // value mapping when a new register is not needed pass = PASS_EMIT_REGISTERS; for (inst_iterator I = inst_begin(&F), E = inst_end(&F); I != E; ++I) visit(*I); // Abort if this found an error (otherwise emitBasicBlock will assert) if(has_errors){return;} // First create all the labels (one per block) ... for (Function::iterator BB = F.begin(), E = F.end(); BB != E; ++BB) this->newLabelIndex(&*BB); // Then, for all branch instructions that have conditions, see if we can // simplify the code by inverting condition code for (Function::iterator BB = F.begin(), E = F.end(); BB != E; ++BB) this->simplifyTerminator(&*BB); // gather loop info, which is useful for liveness analysis gatherLoopInfo(fn); // ... 
then, emit the instructions for all basic blocks pass = PASS_EMIT_INSTRUCTIONS; for (Function::iterator BB = F.begin(), E = F.end(); BB != E; ++BB) emitBasicBlock(&*BB); ctx.endFunction(); // Liveness can be shared when we optimized the immediates and the MOVs ir::Liveness liveness(fn); if (OCL_OPTIMIZE_LOADI) this->removeLOADIs(liveness, fn); if (OCL_OPTIMIZE_PHI_MOVES) { map replaceMap, redundantPhiCopyMap; this->optimizePhiCopy(liveness, fn, replaceMap, redundantPhiCopyMap); this->postPhiCopyOptimization(liveness, fn, replaceMap, redundantPhiCopyMap); this->removeMOVs(liveness, fn); } } void GenWriter::regAllocateReturnInst(ReturnInst &I) {} void GenWriter::emitReturnInst(ReturnInst &I) { const ir::Function &fn = ctx.getFunction(); GBE_ASSERTM(fn.outputNum() <= 1, "no more than one value can be returned"); if (fn.outputNum() == 1 && I.getNumOperands() > 0) { const ir::Register dst = fn.getOutput(0); const ir::Register src = this->getRegister(I.getOperand(0)); const ir::RegisterFamily family = fn.getRegisterFamily(dst); ctx.MOV(ir::getType(family), dst, src); } ctx.RET(); } void GenWriter::regAllocateBinaryOperator(Instruction &I) { this->newRegister(&I); } void GenWriter::emitBinaryOperator(Instruction &I) { #if GBE_DEBUG GBE_ASSERT(I.getType()->isPointerTy() == false); // We accept logical operations on booleans switch (I.getOpcode()) { case Instruction::And: case Instruction::Or: case Instruction::Xor: break; default: GBE_ASSERT(I.getType() != Type::getInt1Ty(I.getContext())); } #endif /* GBE_DEBUG */ // Get the element type for a vector const ir::Type type = getType(ctx, I.getType()); // Emit the instructions in a row const ir::Register dst = this->getRegister(&I); const ir::Register src0 = this->getRegister(I.getOperand(0)); const ir::Register src1 = this->getRegister(I.getOperand(1)); switch (I.getOpcode()) { case Instruction::Add: case Instruction::FAdd: ctx.ADD(type, dst, src0, src1); break; case Instruction::Sub: case Instruction::FSub: ctx.SUB(type, dst, src0, src1); break; case Instruction::Mul: { //LLVM always put constant to src1, but also add the src0 constant check. 
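        // Worked example (illustrative): for "%y = mul i32 %x, 8" the
        // constant 8 is a power of two, so the code below emits
        // SHL %y, %x, 3 (logi2(8) == 3) instead of an integer multiply.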
        ConstantInt *c = dyn_cast<ConstantInt>(I.getOperand(0));
        int index = 0;
        if (c == NULL) {
          c = dyn_cast<ConstantInt>(I.getOperand(1));
          index = 1;
        }
        if (c != NULL && isPowerOf<2>(c->getSExtValue())) {
          c = ConstantInt::get(c->getType(), logi2(c->getZExtValue()));
          if (index == 0)
            ctx.SHL(type, dst, src1, this->getRegister(c));
          else
            ctx.SHL(type, dst, src0, this->getRegister(c));
        } else {
          ctx.MUL(type, dst, src0, src1);
        }
        break;
      }
      case Instruction::FMul: ctx.MUL(type, dst, src0, src1); break;
      case Instruction::URem: ctx.REM(getUnsignedType(ctx, I.getType()), dst, src0, src1); break;
      case Instruction::SRem:
      case Instruction::FRem: ctx.REM(type, dst, src0, src1); break;
      case Instruction::UDiv:
      {
        // Only check the divisor for DIV
        ConstantInt *c = dyn_cast<ConstantInt>(I.getOperand(1));
        if (c != NULL && isPowerOf<2>(c->getZExtValue())) {
          c = ConstantInt::get(c->getType(), logi2(c->getZExtValue()));
          ctx.SHR(getUnsignedType(ctx, I.getType()), dst, src0, this->getRegister(c));
        } else {
          ctx.DIV(getUnsignedType(ctx, I.getType()), dst, src0, src1);
        }
        break;
      }
      case Instruction::SDiv:
      case Instruction::FDiv: ctx.DIV(type, dst, src0, src1); break;
      case Instruction::And:  ctx.AND(type, dst, src0, src1); break;
      case Instruction::Or:   ctx.OR(type, dst, src0, src1); break;
      case Instruction::Xor:  ctx.XOR(type, dst, src0, src1); break;
      case Instruction::Shl:  ctx.SHL(type, dst, src0, src1); break;
      case Instruction::LShr: ctx.SHR(getUnsignedType(ctx, I.getType()), dst, src0, src1); break;
      case Instruction::AShr: ctx.ASR(type, dst, src0, src1); break;
      default: NOT_SUPPORTED;
    }
  }

  void GenWriter::regAllocateICmpInst(ICmpInst &I) {
    this->newRegister(&I);
  }

  static ir::Type makeTypeSigned(const ir::Type &type) {
    if (type == ir::TYPE_U8) return ir::TYPE_S8;
    else if (type == ir::TYPE_U16) return ir::TYPE_S16;
    else if (type == ir::TYPE_U32) return ir::TYPE_S32;
    else if (type == ir::TYPE_U64) return ir::TYPE_S64;
    return type;
  }

  static ir::Type makeTypeUnsigned(const ir::Type &type) {
    if (type == ir::TYPE_S8) return ir::TYPE_U8;
    else if (type == ir::TYPE_S16) return ir::TYPE_U16;
    else if (type == ir::TYPE_S32) return ir::TYPE_U32;
    else if (type == ir::TYPE_S64) return ir::TYPE_U64;
    return type;
  }

  void GenWriter::emitICmpInst(ICmpInst &I) {
    // Get the element type and the number of elements
    Type *operandType = I.getOperand(0)->getType();
    const ir::Type type = getType(ctx, operandType);
    const ir::Type signedType = makeTypeSigned(type);
    const ir::Type unsignedType = makeTypeUnsigned(type);

    // Emit the instructions in a row
    const ir::Register dst = this->getRegister(&I);
    const ir::Register src0 = this->getRegister(I.getOperand(0));
    const ir::Register src1 = this->getRegister(I.getOperand(1));

    // We must invert the condition to simplify the branch code
    if (conditionSet.find(&I) != conditionSet.end()) {
      switch (I.getPredicate()) {
        case ICmpInst::ICMP_EQ:  ctx.NE(type, dst, src0, src1); break;
        case ICmpInst::ICMP_NE:  ctx.EQ(type, dst, src0, src1); break;
        case ICmpInst::ICMP_ULE: ctx.GT(unsignedType, dst, src0, src1); break;
        case ICmpInst::ICMP_SLE: ctx.GT(signedType, dst, src0, src1); break;
        case ICmpInst::ICMP_UGE: ctx.LT(unsignedType, dst, src0, src1); break;
        case ICmpInst::ICMP_SGE: ctx.LT(signedType, dst, src0, src1); break;
        case ICmpInst::ICMP_ULT: ctx.GE(unsignedType, dst, src0, src1); break;
        case ICmpInst::ICMP_SLT: ctx.GE(signedType, dst, src0, src1); break;
        case ICmpInst::ICMP_UGT: ctx.LE(unsignedType, dst, src0, src1); break;
        case ICmpInst::ICMP_SGT: ctx.LE(signedType, dst, src0, src1); break;
        default: NOT_SUPPORTED;
      }
    }
    // Nothing special to do
    else {
      switch (I.getPredicate()) {
        case ICmpInst::ICMP_EQ:  ctx.EQ(type, dst, src0, src1); break;
        case ICmpInst::ICMP_NE:  ctx.NE(type, dst, src0, src1); break;
        case ICmpInst::ICMP_ULE: ctx.LE(unsignedType, dst, src0, src1); break;
        case ICmpInst::ICMP_SLE: ctx.LE(signedType, dst, src0, src1); break;
        case ICmpInst::ICMP_UGE: ctx.GE(unsignedType, dst, src0, src1); break;
        case ICmpInst::ICMP_SGE: ctx.GE(signedType, dst, src0, src1); break;
        case ICmpInst::ICMP_ULT: ctx.LT(unsignedType, dst, src0, src1); break;
        case ICmpInst::ICMP_SLT: ctx.LT(signedType, dst, src0, src1); break;
        case ICmpInst::ICMP_UGT: ctx.GT(unsignedType, dst, src0, src1); break;
        case ICmpInst::ICMP_SGT: ctx.GT(signedType, dst, src0, src1); break;
        default: NOT_SUPPORTED;
      }
    }
  }

  void GenWriter::regAllocateFCmpInst(FCmpInst &I) {
    this->newRegister(&I);
  }

  void GenWriter::emitFCmpInst(FCmpInst &I) {
    // Get the element type and the number of elements
    Type *operandType = I.getOperand(0)->getType();
    const ir::Type type = getType(ctx, operandType);
    const ir::Type insnType = getType(ctx, I.getType());

    // Emit the instructions in a row
    const ir::Register dst = this->getRegister(&I);
    const ir::Register src0 = this->getRegister(I.getOperand(0));
    const ir::Register src1 = this->getRegister(I.getOperand(1));
    const ir::Register tmp = ctx.reg(getFamily(ctx, I.getType()));
    const ir::Register tmp1 = ctx.reg(getFamily(ctx, I.getType()));
    Value *cv = ConstantInt::get(I.getType(), 1);

    switch (I.getPredicate()) {
      case ICmpInst::FCMP_OEQ: ctx.EQ(type, dst, src0, src1); break;
      case ICmpInst::FCMP_UNE: ctx.NE(type, dst, src0, src1); break;
      case ICmpInst::FCMP_OLE: ctx.LE(type, dst, src0, src1); break;
      case ICmpInst::FCMP_OGE: ctx.GE(type, dst, src0, src1); break;
      case ICmpInst::FCMP_OLT: ctx.LT(type, dst, src0, src1); break;
      case ICmpInst::FCMP_OGT: ctx.GT(type, dst, src0, src1); break;
      case ICmpInst::FCMP_ORD:
        // If src0 or src1 is a constant, that constant value must be
        // ordered; otherwise LLVM would already have optimized this
        // instruction to true. So discard the constant value and only
        // compare the other src against itself.
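        // Worked example (illustrative): for "fcmp ord float %x, 2.0" the
        // literal 2.0 can never be NaN, so ORD(%x, 2.0) == (%x == %x); the
        // code below therefore compares the non-constant operand to itself.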
        if (isa<ConstantFP>(I.getOperand(0)))
          ctx.EQ(type, dst, src1, src1);
        else if (isa<ConstantFP>(I.getOperand(1)))
          ctx.EQ(type, dst, src0, src0);
        else
          ctx.ORD(type, dst, src0, src1);
        break;
      case ICmpInst::FCMP_UNO:
        if (isa<ConstantFP>(I.getOperand(0)))
          ctx.NE(type, dst, src1, src1);
        else if (isa<ConstantFP>(I.getOperand(1)))
          ctx.NE(type, dst, src0, src0);
        else {
          ctx.ORD(type, tmp, src0, src1);
          ctx.XOR(insnType, dst, tmp, getRegister(cv));  //TODO: Use NOT directly
        }
        break;
      case ICmpInst::FCMP_UEQ:
        ctx.NE(type, tmp, src0, src1);
        ctx.XOR(insnType, dst, tmp, getRegister(cv));
        break;
      case ICmpInst::FCMP_UGT:
        ctx.LE(type, tmp, src0, src1);
        ctx.XOR(insnType, dst, tmp, getRegister(cv));
        break;
      case ICmpInst::FCMP_UGE:
        ctx.LT(type, tmp, src0, src1);
        ctx.XOR(insnType, dst, tmp, getRegister(cv));
        break;
      case ICmpInst::FCMP_ULT:
        ctx.GE(type, tmp, src0, src1);
        ctx.XOR(insnType, dst, tmp, getRegister(cv));
        break;
      case ICmpInst::FCMP_ULE:
        ctx.GT(type, tmp, src0, src1);
        ctx.XOR(insnType, dst, tmp, getRegister(cv));
        break;
      case ICmpInst::FCMP_ONE:
        ctx.LT(type, tmp, src0, src1);
        ctx.GT(type, tmp1, src0, src1);
        ctx.OR(insnType, dst, tmp, tmp1);
        break;
      case ICmpInst::FCMP_TRUE:
        ctx.MOV(insnType, dst, getRegister(cv));
        break;
      default: NOT_SUPPORTED;
    }
  }

  void GenWriter::regAllocateCastInst(CastInst &I) {
    Value *dstValue = &I;
    Value *srcValue = I.getOperand(0);
    const auto op = I.getOpcode();

    switch (op) {
      // When casting between pointers and integers, watch out for the
      // respective integer sizes
      case Instruction::PtrToInt:
      case Instruction::IntToPtr:
      {
        Type *dstType = dstValue->getType();
        Type *srcType = srcValue->getType();

        if (getTypeByteSize(unit, dstType) == getTypeByteSize(unit, srcType)) {
#if GBE_DEBUG
#endif /* GBE_DEBUG */
          regTranslator.newValueProxy(srcValue, dstValue);
        } else
          this->newRegister(dstValue);
      }
      break;
      // Bitcast just forwards registers
      case Instruction::BitCast:
      {
        Type *srcType = srcValue->getType();
        Type *dstType = dstValue->getType();

        if (srcType->isVectorTy() || dstType->isVectorTy())
          this->newRegister(dstValue);
        else
          regTranslator.newValueProxy(srcValue, dstValue);
      }
      break;
      // Various conversion operations -> just allocate registers for them
      case Instruction::FPToUI:
      case Instruction::FPToSI:
      case Instruction::SIToFP:
      case Instruction::UIToFP:
      case Instruction::SExt:
      case Instruction::ZExt:
      case Instruction::FPExt:
      case Instruction::FPTrunc:
      case Instruction::Trunc:
        this->newRegister(&I);
      break;
      case Instruction::AddrSpaceCast:
        regTranslator.newValueProxy(srcValue, dstValue);
      break;
      default: NOT_SUPPORTED;
    }
  }

  void GenWriter::emitCastInst(CastInst &I) {
    switch (I.getOpcode()) {
      case Instruction::AddrSpaceCast:
        break;
      case Instruction::PtrToInt:
      case Instruction::IntToPtr:
      {
        Value *dstValue = &I;
        Value *srcValue = I.getOperand(0);
        Type *dstType = dstValue->getType();
        Type *srcType = srcValue->getType();

        if (getTypeByteSize(unit, dstType) != getTypeByteSize(unit, srcType)) {
          const ir::Register dst = this->getRegister(&I);
          const ir::Register src = this->getRegister(srcValue);
          ctx.CVT(getType(ctx, dstType), getType(ctx, srcType), dst, src);
        }
      }
      break;
      case Instruction::BitCast:
      {
        Value *srcValue = I.getOperand(0);
        Value *dstValue = &I;
        uint32_t srcElemNum = 0, dstElemNum = 0;
        ir::Type srcType = getVectorInfo(ctx, srcValue, srcElemNum);
        ir::Type dstType = getVectorInfo(ctx, dstValue, dstElemNum);

        // As long and double are not compatible in register storage
        // and we do not support double yet, simply put an assert here
        GBE_ASSERT(!(srcType == ir::TYPE_S64 && dstType == ir::TYPE_DOUBLE));
        GBE_ASSERT(!(dstType == ir::TYPE_S64 && srcType == ir::TYPE_DOUBLE));

        if (srcElemNum > 1 ||
dstElemNum > 1) { // Build the tuple data in the vector vector srcTupleData; vector dstTupleData; uint32_t elemID = 0; for (elemID = 0; elemID < srcElemNum; ++elemID) { ir::Register reg; reg = this->getRegister(srcValue, elemID); srcTupleData.push_back(reg); } for (elemID = 0; elemID < dstElemNum; ++elemID) { ir::Register reg; reg = this->getRegister(dstValue, elemID); dstTupleData.push_back(reg); } const ir::Tuple srcTuple = ctx.arrayTuple(&srcTupleData[0], srcElemNum); const ir::Tuple dstTuple = ctx.arrayTuple(&dstTupleData[0], dstElemNum); ctx.BITCAST(dstType, srcType, dstTuple, srcTuple, dstElemNum, srcElemNum); } } break; // nothing to emit here case Instruction::FPToUI: case Instruction::FPToSI: case Instruction::SIToFP: case Instruction::UIToFP: case Instruction::SExt: case Instruction::ZExt: case Instruction::FPExt: case Instruction::FPTrunc: case Instruction::Trunc: { // Get the element type for a vector Type *llvmDstType = I.getType(); Type *llvmSrcType = I.getOperand(0)->getType(); ir::Type dstType; if (I.getOpcode() == Instruction::FPToUI) dstType = getUnsignedType(ctx, llvmDstType); else dstType = getType(ctx, llvmDstType); ir::Type srcType; if (I.getOpcode() == Instruction::ZExt || I.getOpcode() == Instruction::UIToFP) { srcType = getUnsignedType(ctx, llvmSrcType); } else { srcType = getType(ctx, llvmSrcType); } // We use a select (0,1) not a convert when the destination is a boolean if (srcType == ir::TYPE_BOOL) { const ir::RegisterFamily family = getFamily(dstType); ir::ImmediateIndex zero; if(dstType == ir::TYPE_FLOAT) zero = ctx.newFloatImmediate(0); else if(dstType == ir::TYPE_DOUBLE) zero = ctx.newDoubleImmediate(0); else zero = ctx.newIntegerImmediate(0, dstType); ir::ImmediateIndex one; if (I.getOpcode() == Instruction::SExt && (dstType == ir::TYPE_S8 || dstType == ir::TYPE_S16 || dstType == ir::TYPE_S32 || dstType == ir::TYPE_S64)) one = ctx.newIntegerImmediate(-1, dstType); else if(dstType == ir::TYPE_FLOAT) one = ctx.newFloatImmediate(1); else if(dstType == ir::TYPE_DOUBLE) one = ctx.newDoubleImmediate(1); else one = ctx.newIntegerImmediate(1, dstType); const ir::Register zeroReg = ctx.reg(family); const ir::Register oneReg = ctx.reg(family); ctx.LOADI(dstType, zeroReg, zero); ctx.LOADI(dstType, oneReg, one); const ir::Register dst = this->getRegister(&I); const ir::Register src = this->getRegister(I.getOperand(0)); ctx.SEL(dstType, dst, src, oneReg, zeroReg); } /* For half <---> float conversion, we use F16TO32 or F32TO16, make the code path same. */ else if (srcType == ir::TYPE_HALF && dstType == ir::TYPE_FLOAT) { ctx.F16TO32(ir::TYPE_FLOAT, ir::TYPE_U16, getRegister(&I), getRegister(I.getOperand(0))); } else if (srcType == ir::TYPE_FLOAT && dstType == ir::TYPE_HALF) { ctx.F32TO16(ir::TYPE_U16, ir::TYPE_FLOAT, getRegister(&I), getRegister(I.getOperand(0))); } // Use a convert for the other cases else { const ir::Register dst = this->getRegister(&I); const ir::Register src = this->getRegister(I.getOperand(0)); ctx.CVT(dstType, srcType, dst, src); } } break; default: NOT_SUPPORTED; } } /*! 
Because there are still fake insert/extract instructions for
   *  load/store, keep these functions empty here */
  void GenWriter::regAllocateInsertElement(InsertElementInst &I) {}
  void GenWriter::emitInsertElement(InsertElementInst &I) {
    const VectorType *type = dyn_cast<VectorType>(I.getType());
    GBE_ASSERT(type);
    const int elemNum = type->getNumElements();

    Value *vec = I.getOperand(0);
    Value *value = I.getOperand(1);
    const Value *index = I.getOperand(2);
    const ConstantInt *c = dyn_cast<ConstantInt>(index);
    int i = c->getValue().getSExtValue();

    // Map every untouched element of the source vector to the corresponding
    // element of the result, then map the inserted value to element i
    for (int j = 0; j < elemNum; ++j) {
      if (j == i)
        continue;
      regTranslator.newValueProxy(vec, &I, j, j);
    }
    regTranslator.newValueProxy(value, &I, 0, i);
  }

  void GenWriter::regAllocateExtractElement(ExtractElementInst &I) {
    Value *vec = I.getVectorOperand();
    const Value *index = I.getIndexOperand();
    const ConstantInt *c = dyn_cast<ConstantInt>(index);
    GBE_ASSERT(c);
    int i = c->getValue().getSExtValue();
    regTranslator.newValueProxy(vec, &I, i, 0);
  }

  void GenWriter::emitExtractElement(ExtractElementInst &I) {
  }

  void GenWriter::regAllocateExtractValue(ExtractValueInst &I) {
    Value *agg = I.getAggregateOperand();
    for (const unsigned *i = I.idx_begin(), *e = I.idx_end(); i != e; i++)
      regTranslator.newValueProxy(agg, &I, *i, 0);
  }

  void GenWriter::emitExtractValue(ExtractValueInst &I) {
  }

  void GenWriter::regAllocateShuffleVectorInst(ShuffleVectorInst &I) {}
  void GenWriter::emitShuffleVectorInst(ShuffleVectorInst &I) {}

  void GenWriter::regAllocateSelectInst(SelectInst &I) {
    this->newRegister(&I);
  }

  void GenWriter::emitSelectInst(SelectInst &I) {
    // Get the element type for a vector
    const ir::Type type = getType(ctx, I.getType());

    // Emit the instructions in a row
    const ir::Register dst = this->getRegister(&I);
    const ir::Register cond = this->getRegister(I.getOperand(0));
    const ir::Register src0 = this->getRegister(I.getOperand(1));
    const ir::Register src1 = this->getRegister(I.getOperand(2));
    ctx.SEL(type, dst, cond, src0, src1);
  }

  void GenWriter::regAllocatePHINode(PHINode &I) {
    // Copy 1 for the PHI
    this->newRegister(&I);
    // Copy 2 to avoid the lost copy issue
    Value *copy = this->getPHICopy(&I);
    this->newRegister(&I, copy);
  }

  void GenWriter::emitPHINode(PHINode &I) {
    Value *copy = this->getPHICopy(&I);
    const ir::Type type = getType(ctx, I.getType());
    const ir::Register dst = this->getRegister(&I);
    const ir::Register src = this->getRegister(copy);
    ctx.MOV(type, dst, src);
    phiMap.insert(std::make_pair(dst, src));
  }

  void GenWriter::regAllocateBranchInst(BranchInst &I) {}

  void GenWriter::emitBranchInst(BranchInst &I) {
    // Emit MOVs if required
    BasicBlock *bb = I.getParent();
    this->emitMovForPHI(bb, I.getSuccessor(0));
    if (I.isConditional())
      this->emitMovForPHI(bb, I.getSuccessor(1));

    // Unconditional branch. Only emit a jump when the target block is not
    // our fall-through successor
    if (I.isConditional() == false) {
      BasicBlock *target = I.getSuccessor(0);
      if (std::next(Function::iterator(bb)) != Function::iterator(target)) {
        GBE_ASSERT(labelMap.find(target) != labelMap.end());
        const ir::LabelIndex labelIndex = labelMap[target];
        ctx.BRA(labelIndex);
      }
    }
    // The LLVM branch has two targets
    else {
      BasicBlock *taken = NULL, *nonTaken = NULL;
      Value *condition = I.getCondition();

      // We may have inverted the branch condition to simplify the branching
      // code
      const bool inverted = conditionSet.find(condition) != conditionSet.end();
      taken = inverted ? I.getSuccessor(1) : I.getSuccessor(0);
      nonTaken = inverted ?
I.getSuccessor(0) : I.getSuccessor(1); // Get both taken label and predicate register GBE_ASSERT(labelMap.find(taken) != labelMap.end()); const ir::LabelIndex index = labelMap[taken]; const ir::Register reg = this->getRegister(condition); ctx.BRA(index, reg); // If non-taken target is the next block, there is nothing to do BasicBlock *bb = I.getParent(); if (std::next(Function::iterator(bb)) == Function::iterator(nonTaken)) return; // This is slightly more complicated here. We need to issue one more // branch for the non-taken condition. GBE_ASSERT(labelMap.find(nonTaken) != labelMap.end()); const ir::LabelIndex untakenIndex = ctx.label(); ctx.LABEL(untakenIndex); ctx.BRA(labelMap[nonTaken]); } } void GenWriter::regAllocateCallInst(CallInst &I) { Value *dst = &I; Value *Callee = I.getCalledValue(); GBE_ASSERT(ctx.getFunction().getProfile() == ir::PROFILE_OCL); GBE_ASSERT(isa(I.getCalledValue()) == false); if(I.getNumArgOperands()) GBE_ASSERT(I.hasStructRetAttr() == false); // We only support a small number of intrinsics right now if (Function *F = I.getCalledFunction()) { const Intrinsic::ID intrinsicID = (Intrinsic::ID) F->getIntrinsicID(); if (intrinsicID != 0) { switch (F->getIntrinsicID()) { case Intrinsic::stacksave: this->newRegister(&I); break; case Intrinsic::stackrestore: break; case Intrinsic::lifetime_start: case Intrinsic::lifetime_end: break; case Intrinsic::fmuladd: this->newRegister(&I); break; case Intrinsic::debugtrap: case Intrinsic::trap: case Intrinsic::dbg_value: case Intrinsic::dbg_declare: break; case Intrinsic::sadd_with_overflow: case Intrinsic::uadd_with_overflow: case Intrinsic::ssub_with_overflow: case Intrinsic::usub_with_overflow: case Intrinsic::smul_with_overflow: case Intrinsic::umul_with_overflow: this->newRegister(&I); break; case Intrinsic::ctlz: case Intrinsic::cttz: case Intrinsic::bswap: this->newRegister(&I); break; case Intrinsic::fabs: case Intrinsic::sqrt: case Intrinsic::ceil: case Intrinsic::fma: case Intrinsic::trunc: case Intrinsic::rint: case Intrinsic::floor: case Intrinsic::sin: case Intrinsic::cos: case Intrinsic::log2: case Intrinsic::exp2: case Intrinsic::pow: this->newRegister(&I); break; default: GBE_ASSERTM(false, "Unsupported intrinsics"); } return; } } // Get the name of the called function and handle it const std::string fnName = Callee->stripPointerCasts()->getName(); auto genIntrinsicID = intrinsicMap.find(fnName); switch (genIntrinsicID) { case GEN_OCL_GET_GROUP_ID0: regTranslator.newScalarProxy(ir::ocl::groupid0, dst); break; case GEN_OCL_GET_GROUP_ID1: regTranslator.newScalarProxy(ir::ocl::groupid1, dst); break; case GEN_OCL_GET_GROUP_ID2: regTranslator.newScalarProxy(ir::ocl::groupid2, dst); break; case GEN_OCL_GET_LOCAL_ID0: regTranslator.newScalarProxy(ir::ocl::lid0, dst); break; case GEN_OCL_GET_LOCAL_ID1: regTranslator.newScalarProxy(ir::ocl::lid1, dst); break; case GEN_OCL_GET_LOCAL_ID2: regTranslator.newScalarProxy(ir::ocl::lid2, dst); break; case GEN_OCL_GET_NUM_GROUPS0: regTranslator.newScalarProxy(ir::ocl::numgroup0, dst); break; case GEN_OCL_GET_NUM_GROUPS1: regTranslator.newScalarProxy(ir::ocl::numgroup1, dst); break; case GEN_OCL_GET_NUM_GROUPS2: regTranslator.newScalarProxy(ir::ocl::numgroup2, dst); break; case GEN_OCL_GET_LOCAL_SIZE0: regTranslator.newScalarProxy(ir::ocl::lsize0, dst); break; case GEN_OCL_GET_LOCAL_SIZE1: regTranslator.newScalarProxy(ir::ocl::lsize1, dst); break; case GEN_OCL_GET_LOCAL_SIZE2: regTranslator.newScalarProxy(ir::ocl::lsize2, dst); break; case GEN_OCL_GET_ENQUEUED_LOCAL_SIZE0: 
regTranslator.newScalarProxy(ir::ocl::enqlsize0, dst); break; case GEN_OCL_GET_ENQUEUED_LOCAL_SIZE1: regTranslator.newScalarProxy(ir::ocl::enqlsize1, dst); break; case GEN_OCL_GET_ENQUEUED_LOCAL_SIZE2: regTranslator.newScalarProxy(ir::ocl::enqlsize2, dst); break; case GEN_OCL_GET_GLOBAL_SIZE0: regTranslator.newScalarProxy(ir::ocl::gsize0, dst); break; case GEN_OCL_GET_GLOBAL_SIZE1: regTranslator.newScalarProxy(ir::ocl::gsize1, dst); break; case GEN_OCL_GET_GLOBAL_SIZE2: regTranslator.newScalarProxy(ir::ocl::gsize2, dst); break; case GEN_OCL_GET_GLOBAL_OFFSET0: regTranslator.newScalarProxy(ir::ocl::goffset0, dst); break; case GEN_OCL_GET_GLOBAL_OFFSET1: regTranslator.newScalarProxy(ir::ocl::goffset1, dst); break; case GEN_OCL_GET_GLOBAL_OFFSET2: regTranslator.newScalarProxy(ir::ocl::goffset2, dst); break; case GEN_OCL_GET_THREAD_NUM: regTranslator.newScalarProxy(ir::ocl::threadn, dst); break; case GEN_OCL_GET_THREAD_ID: regTranslator.newScalarProxy(ir::ocl::threadid, dst); break; case GEN_OCL_GET_WORK_DIM: regTranslator.newScalarProxy(ir::ocl::workdim, dst); break; case GEN_OCL_FBH: case GEN_OCL_FBL: case GEN_OCL_CBIT: case GEN_OCL_RSQ: case GEN_OCL_RCP: case GEN_OCL_ABS: case GEN_OCL_GET_IMAGE_WIDTH: case GEN_OCL_GET_IMAGE_HEIGHT: case GEN_OCL_GET_IMAGE_CHANNEL_DATA_TYPE: case GEN_OCL_GET_IMAGE_CHANNEL_ORDER: case GEN_OCL_GET_IMAGE_DEPTH: case GEN_OCL_ATOMIC_ADD0: case GEN_OCL_ATOMIC_ADD1: case GEN_OCL_ATOMIC_SUB0: case GEN_OCL_ATOMIC_SUB1: case GEN_OCL_ATOMIC_AND0: case GEN_OCL_ATOMIC_AND1: case GEN_OCL_ATOMIC_OR0: case GEN_OCL_ATOMIC_OR1: case GEN_OCL_ATOMIC_XOR0: case GEN_OCL_ATOMIC_XOR1: case GEN_OCL_ATOMIC_XCHG0: case GEN_OCL_ATOMIC_XCHG1: case GEN_OCL_ATOMIC_UMAX0: case GEN_OCL_ATOMIC_UMAX1: case GEN_OCL_ATOMIC_UMIN0: case GEN_OCL_ATOMIC_UMIN1: case GEN_OCL_ATOMIC_IMAX0: case GEN_OCL_ATOMIC_IMAX1: case GEN_OCL_ATOMIC_IMIN0: case GEN_OCL_ATOMIC_IMIN1: case GEN_OCL_ATOMIC_INC0: case GEN_OCL_ATOMIC_INC1: case GEN_OCL_ATOMIC_DEC0: case GEN_OCL_ATOMIC_DEC1: case GEN_OCL_ATOMIC_CMPXCHG0: case GEN_OCL_ATOMIC_CMPXCHG1: // No structure can be returned this->newRegister(&I); break; case GEN_OCL_FORCE_SIMD8: case GEN_OCL_FORCE_SIMD16: case GEN_OCL_LBARRIER: case GEN_OCL_GBARRIER: case GEN_OCL_BARRIER: ctx.getFunction().setUseSLM(true); break; case GEN_OCL_WRITE_IMAGE_I: case GEN_OCL_WRITE_IMAGE_UI: case GEN_OCL_WRITE_IMAGE_F: break; case GEN_OCL_READ_IMAGE_I: case GEN_OCL_READ_IMAGE_UI: case GEN_OCL_READ_IMAGE_F: { // dst is a 4 elements vector. We allocate all 4 registers here. 
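      // For instance (illustrative): "float4 v = read_imagef(img, smp, xy);"
      // yields one virtual register per component v.x..v.w, which is why
      // elemNum is asserted to be 4 just below.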
uint32_t elemNum; (void)getVectorInfo(ctx, &I, elemNum); GBE_ASSERT(elemNum == 4); this->newRegister(&I); break; } case GEN_OCL_MUL_HI_INT: case GEN_OCL_MUL_HI_UINT: case GEN_OCL_MUL_HI_I64: case GEN_OCL_MUL_HI_UI64: case GEN_OCL_UPSAMPLE_SHORT: case GEN_OCL_UPSAMPLE_INT: case GEN_OCL_UPSAMPLE_LONG: case GEN_OCL_FMAX: case GEN_OCL_FMIN: case GEN_OCL_SADD_SAT_CHAR: case GEN_OCL_SADD_SAT_SHORT: case GEN_OCL_SADD_SAT_INT: case GEN_OCL_SADD_SAT_LONG: case GEN_OCL_UADD_SAT_CHAR: case GEN_OCL_UADD_SAT_SHORT: case GEN_OCL_UADD_SAT_INT: case GEN_OCL_UADD_SAT_LONG: case GEN_OCL_SSUB_SAT_CHAR: case GEN_OCL_SSUB_SAT_SHORT: case GEN_OCL_SSUB_SAT_INT: case GEN_OCL_SSUB_SAT_LONG: case GEN_OCL_USUB_SAT_CHAR: case GEN_OCL_USUB_SAT_SHORT: case GEN_OCL_USUB_SAT_INT: case GEN_OCL_USUB_SAT_LONG: case GEN_OCL_HADD: case GEN_OCL_RHADD: case GEN_OCL_I64HADD: case GEN_OCL_I64RHADD: case GEN_OCL_I64_MAD_SAT: case GEN_OCL_I64_MAD_SATU: case GEN_OCL_SAT_CONV_U8_TO_I8: case GEN_OCL_SAT_CONV_I16_TO_I8: case GEN_OCL_SAT_CONV_U16_TO_I8: case GEN_OCL_SAT_CONV_I32_TO_I8: case GEN_OCL_SAT_CONV_U32_TO_I8: case GEN_OCL_SAT_CONV_F32_TO_I8: case GEN_OCL_SAT_CONV_I8_TO_U8: case GEN_OCL_SAT_CONV_I16_TO_U8: case GEN_OCL_SAT_CONV_U16_TO_U8: case GEN_OCL_SAT_CONV_I32_TO_U8: case GEN_OCL_SAT_CONV_U32_TO_U8: case GEN_OCL_SAT_CONV_F32_TO_U8: case GEN_OCL_SAT_CONV_U16_TO_I16: case GEN_OCL_SAT_CONV_I32_TO_I16: case GEN_OCL_SAT_CONV_U32_TO_I16: case GEN_OCL_SAT_CONV_F32_TO_I16: case GEN_OCL_SAT_CONV_I16_TO_U16: case GEN_OCL_SAT_CONV_I32_TO_U16: case GEN_OCL_SAT_CONV_U32_TO_U16: case GEN_OCL_SAT_CONV_F32_TO_U16: case GEN_OCL_SAT_CONV_U32_TO_I32: case GEN_OCL_SAT_CONV_F32_TO_I32: case GEN_OCL_SAT_CONV_I32_TO_U32: case GEN_OCL_SAT_CONV_F32_TO_U32: case GEN_OCL_SAT_CONV_F16_TO_I8: case GEN_OCL_SAT_CONV_F16_TO_U8: case GEN_OCL_SAT_CONV_F16_TO_I16: case GEN_OCL_SAT_CONV_F16_TO_U16: case GEN_OCL_SAT_CONV_F16_TO_I32: case GEN_OCL_SAT_CONV_F16_TO_U32: case GEN_OCL_CONV_F16_TO_F32: case GEN_OCL_CONV_F32_TO_F16: case GEN_OCL_SIMD_ANY: case GEN_OCL_SIMD_ALL: case GEN_OCL_SIMD_SIZE: case GEN_OCL_READ_TM: case GEN_OCL_REGION: case GEN_OCL_IN_PRIVATE: case GEN_OCL_SIMD_ID: case GEN_OCL_SIMD_SHUFFLE: case GEN_OCL_VME: case GEN_OCL_WORK_GROUP_ALL: case GEN_OCL_WORK_GROUP_ANY: case GEN_OCL_WORK_GROUP_BROADCAST: case GEN_OCL_WORK_GROUP_REDUCE_ADD: case GEN_OCL_WORK_GROUP_REDUCE_MAX: case GEN_OCL_WORK_GROUP_REDUCE_MIN: case GEN_OCL_WORK_GROUP_SCAN_EXCLUSIVE_ADD: case GEN_OCL_WORK_GROUP_SCAN_EXCLUSIVE_MAX: case GEN_OCL_WORK_GROUP_SCAN_EXCLUSIVE_MIN: case GEN_OCL_WORK_GROUP_SCAN_INCLUSIVE_ADD: case GEN_OCL_WORK_GROUP_SCAN_INCLUSIVE_MAX: case GEN_OCL_WORK_GROUP_SCAN_INCLUSIVE_MIN: case GEN_OCL_SUB_GROUP_BROADCAST: case GEN_OCL_SUB_GROUP_REDUCE_ADD: case GEN_OCL_SUB_GROUP_REDUCE_MAX: case GEN_OCL_SUB_GROUP_REDUCE_MIN: case GEN_OCL_SUB_GROUP_SCAN_EXCLUSIVE_ADD: case GEN_OCL_SUB_GROUP_SCAN_EXCLUSIVE_MAX: case GEN_OCL_SUB_GROUP_SCAN_EXCLUSIVE_MIN: case GEN_OCL_SUB_GROUP_SCAN_INCLUSIVE_ADD: case GEN_OCL_SUB_GROUP_SCAN_INCLUSIVE_MAX: case GEN_OCL_SUB_GROUP_SCAN_INCLUSIVE_MIN: case GEN_OCL_LRP: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_MEM: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_MEM2: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_MEM4: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_MEM8: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_IMAGE: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_IMAGE2: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_IMAGE4: case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_IMAGE8: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_MEM: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_MEM2: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_MEM4: 
case GEN_OCL_SUB_GROUP_BLOCK_READ_US_MEM8: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_IMAGE: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_IMAGE2: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_IMAGE4: case GEN_OCL_SUB_GROUP_BLOCK_READ_US_IMAGE8: case GEN_OCL_ENQUEUE_SET_NDRANGE_INFO: case GEN_OCL_ENQUEUE_GET_NDRANGE_INFO: this->newRegister(&I); break; case GEN_OCL_GET_PIPE: { Value *srcValue = I.getOperand(0); if( BtiMap.find(dst) == BtiMap.end()) { unsigned tranBti = BtiMap.find(srcValue)->second; BtiMap.insert(std::make_pair(dst, tranBti)); } regTranslator.newValueProxy(srcValue, dst); break; } case GEN_OCL_MAKE_RID: case GEN_OCL_GET_RID: { Value *srcValue = I.getOperand(0); regTranslator.newValueProxy(srcValue, dst); break; } case GEN_OCL_INT_TO_SAMPLER: case GEN_OCL_SAMPLER_TO_INT: { Value *srcValue = I.getOperand(0); //srcValue->dump(); //dst->dump(); regTranslator.newValueProxy(srcValue, dst); break; } case GEN_OCL_ENQUEUE_GET_ENQUEUE_INFO_ADDR: regTranslator.newScalarProxy(ir::ocl::enqueuebufptr, dst); break; case GEN_OCL_PRINTF: this->newRegister(&I); // fall through case GEN_OCL_PUTS: { // We need a new BTI as printf output. if (printfBti < 0) { printfBti = this->getNewBti(&I, true); ctx.getFunction().getPrintfSet()->setBufBTI(printfBti); } break; } case GEN_OCL_CALC_TIMESTAMP: case GEN_OCL_STORE_PROFILING: case GEN_OCL_DEBUGWAIT: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM2: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM4: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM8: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE2: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE4: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE8: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM2: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM4: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM8: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE2: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE4: case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE8: break; case GEN_OCL_NOT_FOUND: default: has_errors = true; Func->getContext().emitError(&I,"function '" + fnName + "' not found or cannot be inlined"); }; } void GenWriter::emitRoundingCallInst(CallInst &I, CallSite &CS, ir::Opcode opcode) { if (I.getType()->isHalfTy()) { const ir::Register src = this->getRegister(I.getOperand(0)); const ir::Register srcFloat = ctx.reg(ir::FAMILY_DWORD); const ir::Register dstFloat = ctx.reg(ir::FAMILY_DWORD); const ir::Register dst = this->getRegister(&I); ctx.F16TO32(ir::TYPE_FLOAT, ir::TYPE_U16, srcFloat, src); ctx.ALU1(opcode, ir::TYPE_FLOAT, dstFloat, srcFloat); ctx.F32TO16(ir::TYPE_U16, ir::TYPE_FLOAT, dst, dstFloat); } else { GBE_ASSERT(I.getType()->isFloatTy()); this->emitUnaryCallInst(I,CS,opcode); } } void GenWriter::emitUnaryCallInst(CallInst &I, CallSite &CS, ir::Opcode opcode, ir::Type type) { CallSite::arg_iterator AI = CS.arg_begin(); #if GBE_DEBUG CallSite::arg_iterator AE = CS.arg_end(); #endif /* GBE_DEBUG */ GBE_ASSERT(AI != AE); const ir::Register src = this->getRegister(*AI); const ir::Register dst = this->getRegister(&I); ctx.ALU1(opcode, type, dst, src); } void GenWriter::regAllocateAtomicCmpXchgInst(AtomicCmpXchgInst &I) { this->newRegister(&I); } void GenWriter::emitAtomicInstHelper(const ir::AtomicOps opcode,const ir::Type type, const ir::Register dst, llvm::Value* llvmPtr, const ir::Tuple payloadTuple) { ir::Register pointer = this->getRegister(llvmPtr); ir::AddressSpace addrSpace = 
addressSpaceLLVMToGen(llvmPtr->getType()->getPointerAddressSpace()); // Get the function arguments ir::Register ptr; ir::Register btiReg; unsigned SurfaceIndex = 0xff; ir::AddressMode AM; if (legacyMode) { Value *bti = getBtiRegister(llvmPtr); Value *ptrBase = getPointerBase(llvmPtr); ir::Register baseReg = this->getRegister(ptrBase); if (isa(bti)) { AM = ir::AM_StaticBti; SurfaceIndex = cast(bti)->getZExtValue(); addrSpace = btiToGen(SurfaceIndex); } else { AM = ir::AM_DynamicBti; addrSpace = ir::MEM_MIXED; btiReg = this->getRegister(bti); } const ir::RegisterFamily pointerFamily = ctx.getPointerFamily(); ptr = ctx.reg(pointerFamily); ctx.SUB(ir::TYPE_U32, ptr, pointer, baseReg); } else { AM = ir::AM_Stateless; ptr = pointer; } ctx.ATOMIC(opcode, type, dst, addrSpace, ptr, payloadTuple, AM, SurfaceIndex); } void GenWriter::emitAtomicCmpXchgInst(AtomicCmpXchgInst &I) { // Get the function arguments Value *llvmPtr = I.getPointerOperand(); ir::AtomicOps opcode = ir::ATOMIC_OP_CMPXCHG; uint32_t payloadNum = 0; vector payload; const ir::Register oldValue = this->getRegister(&I, 0); const ir::Register compareRet = this->getRegister(&I, 1); const ir::Register expected = this->getRegister(I.getCompareOperand()); payload.push_back(this->getRegister(I.getCompareOperand())); payloadNum++; payload.push_back(this->getRegister(I.getNewValOperand())); payloadNum++; ir::Type type = getType(ctx, llvmPtr->getType()->getPointerElementType()); const ir::Tuple payloadTuple = payloadNum == 0 ? ir::Tuple(0) : ctx.arrayTuple(&payload[0], payloadNum); this->emitAtomicInstHelper(opcode, type, oldValue, llvmPtr, payloadTuple); ctx.EQ(type, compareRet, oldValue, expected); } void GenWriter::regAllocateAtomicRMWInst(AtomicRMWInst &I) { this->newRegister(&I); } static INLINE ir::AtomicOps atomicOpsLLVMToGen(llvm::AtomicRMWInst::BinOp llvmOp) { switch(llvmOp) { case llvm::AtomicRMWInst::Xchg: return ir::ATOMIC_OP_XCHG; case llvm::AtomicRMWInst::Add: return ir::ATOMIC_OP_ADD; case llvm::AtomicRMWInst::Sub: return ir::ATOMIC_OP_SUB; case llvm::AtomicRMWInst::And: return ir::ATOMIC_OP_AND; case llvm::AtomicRMWInst::Or: return ir::ATOMIC_OP_OR; case llvm::AtomicRMWInst::Xor: return ir::ATOMIC_OP_XOR; case llvm::AtomicRMWInst::Max: return ir::ATOMIC_OP_IMAX; case llvm::AtomicRMWInst::Min: return ir::ATOMIC_OP_IMIN; case llvm::AtomicRMWInst::UMax: return ir::ATOMIC_OP_UMAX; case llvm::AtomicRMWInst::UMin: return ir::ATOMIC_OP_UMIN; case llvm::AtomicRMWInst::Nand: case llvm::AtomicRMWInst::BAD_BINOP: break; } GBE_ASSERT(false); return ir::ATOMIC_OP_INVALID; } void GenWriter::emitAtomicRMWInst(AtomicRMWInst &I) { // Get the function arguments llvm::AtomicRMWInst::BinOp llvmOpcode = I.getOperation(); Value *llvmPtr = I.getOperand(0); ir::AtomicOps opcode = atomicOpsLLVMToGen(llvmOpcode); const ir::Register dst = this->getRegister(&I); uint32_t payloadNum = 0; vector payload; payload.push_back(this->getRegister(I.getOperand(1))); payloadNum++; ir::Type type = getType(ctx, llvmPtr->getType()->getPointerElementType()); const ir::Tuple payloadTuple = payloadNum == 0 ? 
ir::Tuple(0) : ctx.arrayTuple(&payload[0], payloadNum); this->emitAtomicInstHelper(opcode, type, dst, llvmPtr, payloadTuple); } void GenWriter::emitAtomicInst(CallInst &I, CallSite &CS, ir::AtomicOps opcode) { CallSite::arg_iterator AI = CS.arg_begin(); CallSite::arg_iterator AE = CS.arg_end(); GBE_ASSERT(AI != AE); Value *llvmPtr = *AI; ir::AddressSpace addrSpace = addressSpaceLLVMToGen(llvmPtr->getType()->getPointerAddressSpace()); ir::Register pointer = this->getRegister(llvmPtr); ir::Register ptr; ir::Register btiReg; unsigned SurfaceIndex = 0xff;; ir::AddressMode AM; if (legacyMode) { Value *bti = getBtiRegister(llvmPtr); Value *ptrBase = getPointerBase(llvmPtr); ir::Register baseReg = this->getRegister(ptrBase); if (isa(bti)) { AM = ir::AM_StaticBti; SurfaceIndex = cast(bti)->getZExtValue(); addrSpace = btiToGen(SurfaceIndex); } else { AM = ir::AM_DynamicBti; addrSpace = ir::MEM_MIXED; btiReg = this->getRegister(bti); } const ir::RegisterFamily pointerFamily = ctx.getPointerFamily(); ptr = ctx.reg(pointerFamily); ctx.SUB(ir::TYPE_U32, ptr, pointer, baseReg); } else { AM = ir::AM_Stateless; ptr = pointer; } const ir::Register dst = this->getRegister(&I); uint32_t payloadNum = 0; vector payload; AI++; while(AI != AE) { payload.push_back(this->getRegister(*(AI++))); payloadNum++; } ir::Type type = getType(ctx, llvmPtr->getType()->getPointerElementType()); const ir::Tuple payloadTuple = payloadNum == 0 ? ir::Tuple(0) : ctx.arrayTuple(&payload[0], payloadNum); if (AM == ir::AM_DynamicBti) { ctx.ATOMIC(opcode, type, dst, addrSpace, ptr, payloadTuple, AM, btiReg); } else { ctx.ATOMIC(opcode, type, dst, addrSpace, ptr, payloadTuple, AM, SurfaceIndex); } } void GenWriter::emitWorkGroupInst(CallInst &I, CallSite &CS, ir::WorkGroupOps opcode) { ir::Function &f = ctx.getFunction(); if (f.getwgBroadcastSLM() < 0 && opcode == ir::WORKGROUP_OP_BROADCAST) { uint32_t mapSize = 8; f.setUseSLM(true); uint32_t oldSlm = f.getSLMSize(); f.setSLMSize(oldSlm + mapSize); f.setwgBroadcastSLM(oldSlm); GBE_ASSERT(f.getwgBroadcastSLM() >= 0); } else if (f.gettidMapSLM() < 0 && opcode >= ir::WORKGROUP_OP_ANY && opcode <= ir::WORKGROUP_OP_EXCLUSIVE_MAX) { /* 1. For thread SLM based communication (default): * Threads will use SLM to write partial results computed individually and then read the whole set. Because the read is done in chunks of 4 extra padding is required. When we come to here, the global thread local vars should have all been allocated, so it's safe for us to steal a piece of SLM for this usage. 
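       *
       * Worked example (illustrative): with the sizing used below, the map
       * occupies sizeof(uint32_t) * (64 + 4) = 272 bytes of SLM: 64 dword
       * slots for the at-most-64 threads of a subslice, plus 4 extra dwords
       * of padding because the reads back are done in chunks of 4.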
*/ // at most 64 thread for one subslice, along with extra padding uint32_t mapSize = sizeof(uint32_t) * (64 + 4); f.setUseSLM(true); uint32_t oldSlm = f.getSLMSize(); f.setSLMSize(oldSlm + mapSize); f.settidMapSLM(oldSlm); GBE_ASSERT(f.gettidMapSLM() >= 0); } CallSite::arg_iterator AI = CS.arg_begin(); CallSite::arg_iterator AE = CS.arg_end(); GBE_ASSERT(AI != AE); if (opcode == ir::WORKGROUP_OP_ALL || opcode == ir::WORKGROUP_OP_ANY) { GBE_ASSERT(getType(ctx, (*AI)->getType()) == ir::TYPE_S32); ir::Register src[3]; src[0] = ir::ocl::threadn; src[1] = ir::ocl::threadid; src[2] = this->getRegister(*(AI++)); const ir::Tuple srcTuple = ctx.arrayTuple(&src[0], 3); ctx.WORKGROUP(opcode, (uint32_t)f.gettidMapSLM(), getRegister(&I), srcTuple, 3, ir::TYPE_S32); } else if (opcode == ir::WORKGROUP_OP_BROADCAST) { int argNum = CS.arg_size(); std::vector src(argNum); for (int i = 0; i < argNum; i++) { src[i] = this->getRegister(*(AI++)); } const ir::Tuple srcTuple = ctx.arrayTuple(&src[0], argNum); ctx.WORKGROUP(ir::WORKGROUP_OP_BROADCAST, (uint32_t)f.getwgBroadcastSLM(), getRegister(&I), srcTuple, argNum, getType(ctx, (*CS.arg_begin())->getType())); } else { ConstantInt *sign = dyn_cast(AI); GBE_ASSERT(sign); bool isSign = sign->getZExtValue(); AI++; ir::Type ty; if (isSign) { ty = getType(ctx, (*AI)->getType()); } else { ty = getUnsignedType(ctx, (*AI)->getType()); } ir::Register src[3]; src[0] = ir::ocl::threadn; src[1] = ir::ocl::threadid; src[2] = this->getRegister(*(AI++)); const ir::Tuple srcTuple = ctx.arrayTuple(&src[0], 3); ctx.WORKGROUP(opcode, (uint32_t)f.gettidMapSLM(), getRegister(&I), srcTuple, 3, ty); } GBE_ASSERT(AI == AE); } void GenWriter::emitSubGroupInst(CallInst &I, CallSite &CS, ir::WorkGroupOps opcode) { CallSite::arg_iterator AI = CS.arg_begin(); CallSite::arg_iterator AE = CS.arg_end(); GBE_ASSERT(AI != AE); if (opcode == ir::WORKGROUP_OP_ALL || opcode == ir::WORKGROUP_OP_ANY) { GBE_ASSERT(getType(ctx, (*AI)->getType()) == ir::TYPE_S32); ir::Register src[3]; src[0] = this->getRegister(*(AI++)); const ir::Tuple srcTuple = ctx.arrayTuple(&src[0], 1); ctx.SUBGROUP(opcode, getRegister(&I), srcTuple, 1, ir::TYPE_S32); } else if (opcode == ir::WORKGROUP_OP_BROADCAST) { int argNum = CS.arg_size(); GBE_ASSERT(argNum == 2); std::vector src(argNum); for (int i = 0; i < argNum; i++) { src[i] = this->getRegister(*(AI++)); } const ir::Tuple srcTuple = ctx.arrayTuple(&src[0], argNum); ctx.SUBGROUP(ir::WORKGROUP_OP_BROADCAST, getRegister(&I), srcTuple, argNum, getType(ctx, (*CS.arg_begin())->getType())); } else { ConstantInt *sign = dyn_cast(AI); GBE_ASSERT(sign); bool isSign = sign->getZExtValue(); AI++; ir::Type ty; if (isSign) { ty = getType(ctx, (*AI)->getType()); } else { ty = getUnsignedType(ctx, (*AI)->getType()); } ir::Register src[3]; src[0] = this->getRegister(*(AI++)); const ir::Tuple srcTuple = ctx.arrayTuple(&src[0], 1); ctx.SUBGROUP(opcode, getRegister(&I), srcTuple, 1, ty); } GBE_ASSERT(AI == AE); } void GenWriter::emitBlockReadWriteMemInst(CallInst &I, CallSite &CS, bool isWrite, uint8_t vec_size, ir::Type type) { CallSite::arg_iterator AI = CS.arg_begin(); CallSite::arg_iterator AE = CS.arg_end(); GBE_ASSERT(AI != AE); Value *llvmPtr = *(AI++); ir::AddressSpace addrSpace = addressSpaceLLVMToGen(llvmPtr->getType()->getPointerAddressSpace()); GBE_ASSERT(addrSpace == ir::MEM_GLOBAL); ir::Register pointer = this->getRegister(llvmPtr); ir::Register ptr; ir::Register btiReg; unsigned SurfaceIndex = 0xff; ir::AddressMode AM; if (legacyMode) { Value *bti = getBtiRegister(llvmPtr); 
Value *ptrBase = getPointerBase(llvmPtr); ir::Register baseReg = this->getRegister(ptrBase); if (isa<ConstantInt>(bti)) { AM = ir::AM_StaticBti; SurfaceIndex = cast<ConstantInt>(bti)->getZExtValue(); addrSpace = btiToGen(SurfaceIndex); } else { AM = ir::AM_DynamicBti; addrSpace = ir::MEM_MIXED; btiReg = this->getRegister(bti); } const ir::RegisterFamily pointerFamily = ctx.getPointerFamily(); ptr = ctx.reg(pointerFamily); ctx.SUB(ir::TYPE_U32, ptr, pointer, baseReg); } else { AM = ir::AM_Stateless; ptr = pointer; } GBE_ASSERT(AM != ir::AM_DynamicBti); if(isWrite){ Value *llvmValues = *(AI++); vector<ir::Register> srcTupleData; for(int i = 0;i < vec_size; i++) srcTupleData.push_back(getRegister(llvmValues, i)); const ir::Tuple tuple = ctx.arrayTuple(&srcTupleData[0], vec_size); ctx.STORE(type, tuple, ptr, addrSpace, vec_size, true, AM, SurfaceIndex, true); } else { vector<ir::Register> dstTupleData; for(int i = 0;i < vec_size; i++) dstTupleData.push_back(getRegister(&I, i)); const ir::Tuple tuple = ctx.arrayTuple(&dstTupleData[0], vec_size); ctx.LOAD(type, tuple, ptr, addrSpace, vec_size, true, AM, SurfaceIndex, true); } GBE_ASSERT(AI == AE); } void GenWriter::emitBlockReadWriteImageInst(CallInst &I, CallSite &CS, bool isWrite, uint8_t vec_size, ir::Type type) { CallSite::arg_iterator AI = CS.arg_begin(); CallSite::arg_iterator AE = CS.arg_end(); GBE_ASSERT(AI != AE); const uint8_t imageID = getImageID(I); AI++; if(isWrite){ vector<ir::Register> srcTupleData; srcTupleData.push_back(getRegister(*(AI++))); srcTupleData.push_back(getRegister(*(AI++))); for(int i = 0;i < vec_size; i++) srcTupleData.push_back(getRegister(*(AI), i)); AI++; const ir::Tuple srctuple = ctx.arrayTuple(&srcTupleData[0], 2 + vec_size); ctx.MBWRITE(imageID, srctuple, 2 + vec_size, vec_size, type); } else { ir::Register src[2]; src[0] = getRegister(*(AI++)); src[1] = getRegister(*(AI++)); vector<ir::Register> dstTupleData; for(int i = 0;i < vec_size; i++) dstTupleData.push_back(getRegister(&I, i)); const ir::Tuple srctuple = ctx.arrayTuple(src, 2); const ir::Tuple dsttuple = ctx.arrayTuple(&dstTupleData[0], vec_size); ctx.MBREAD(imageID, dsttuple, vec_size, srctuple, 2, type); } GBE_ASSERT(AI == AE); } /* Append a new sampler. Should be called before any reference to * a sampler_t value. */ uint8_t GenWriter::appendSampler(CallSite::arg_iterator AI) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 CallInst *TC = dyn_cast<CallInst>(*AI); Constant *CPV = TC ? dyn_cast<Constant>(TC->getOperand(0)) : NULL; #else Constant *CPV = dyn_cast<Constant>(*AI); #endif uint8_t index; if (CPV != NULL) { #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 // Check that the callee is the sampler conversion function GBE_ASSERT(TC->getCalledFunction()->getName().str() == "__gen_ocl_int_to_sampler"); #endif // This is not a kernel argument sampler; we need to append it to the sampler set // and allocate a sampler slot for it.
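/* A literal sampler (e.g. CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST) reaches this
   point as an integer constant and is interned into the kernel's sampler slot
   table; the returned slot index is what the SAMPLE instruction later carries.
   A minimal sketch of such interning, with a hypothetical slot vector standing
   in for the real SamplerSet:

     #include <cstdint>
     #include <vector>

     static std::vector<uint32_t> slots;          // hypothetical slot table

     uint8_t appendSlot(uint32_t samplerValue) {
       for (size_t i = 0; i < slots.size(); ++i)
         if (slots[i] == samplerValue)
           return (uint8_t)i;                     // reuse an existing slot
       slots.push_back(samplerValue);
       return (uint8_t)(slots.size() - 1);        // allocate a new slot
     }
*/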
const ir::Immediate &x = processConstantImm(CPV); GBE_ASSERTM(x.getType() == ir::TYPE_U32 || x.getType() == ir::TYPE_S32, "Invalid sampler type"); index = ctx.getFunction().getSamplerSet()->append(x.getIntegerValue(), &ctx); } else { const ir::Register samplerReg = this->getRegister(*AI); index = ctx.getFunction().getSamplerSet()->append(samplerReg, &ctx); } return index; } uint8_t GenWriter::getImageID(CallInst &I) { const ir::Register imageReg = this->getRegister(I.getOperand(0)); return ctx.getFunction().getImageSet()->getIdx(imageReg); } void GenWriter::emitCallInst(CallInst &I) { if (Function *F = I.getCalledFunction()) { if (F->getIntrinsicID() != 0) { const ir::Function &fn = ctx.getFunction(); // Get the function arguments CallSite CS(&I); CallSite::arg_iterator AI = CS.arg_begin(); #if GBE_DEBUG CallSite::arg_iterator AE = CS.arg_end(); #endif /* GBE_DEBUG */ switch (F->getIntrinsicID()) { case Intrinsic::stacksave: { const ir::Register dst = this->getRegister(&I); const ir::Register src = ir::ocl::stackptr; const ir::RegisterFamily family = fn.getRegisterFamily(dst); ctx.MOV(ir::getType(family), dst, src); } break; case Intrinsic::stackrestore: { const ir::Register dst = ir::ocl::stackptr; const ir::Register src = this->getRegister(I.getOperand(0)); const ir::RegisterFamily family = fn.getRegisterFamily(dst); ctx.MOV(ir::getType(family), dst, src); } break; case Intrinsic::lifetime_start: case Intrinsic::lifetime_end: break; case Intrinsic::debugtrap: case Intrinsic::trap: case Intrinsic::dbg_value: case Intrinsic::dbg_declare: break; case Intrinsic::uadd_with_overflow: { Type *llvmDstType = I.getType(); GBE_ASSERT(llvmDstType->isStructTy()); ir::Type dst0Type = getType(ctx, llvmDstType->getStructElementType(0)); const ir::Register dst0 = this->getRegister(&I, 0); const ir::Register src0 = this->getRegister(I.getOperand(0)); const ir::Register src1 = this->getRegister(I.getOperand(1)); ctx.ADD(dst0Type, dst0, src0, src1); ir::Register overflow = this->getRegister(&I, 1); const ir::Type unsignedType = makeTypeUnsigned(dst0Type); ctx.LT(unsignedType, overflow, dst0, src1); } break; case Intrinsic::usub_with_overflow: { Type *llvmDstType = I.getType(); GBE_ASSERT(llvmDstType->isStructTy()); ir::Type dst0Type = getType(ctx, llvmDstType->getStructElementType(0)); const ir::Register dst0 = this->getRegister(&I, 0); const ir::Register src0 = this->getRegister(I.getOperand(0)); const ir::Register src1 = this->getRegister(I.getOperand(1)); ctx.SUB(dst0Type, dst0, src0, src1); ir::Register overflow = this->getRegister(&I, 1); const ir::Type unsignedType = makeTypeUnsigned(dst0Type); ctx.GT(unsignedType, overflow, dst0, src0); } break; case Intrinsic::sadd_with_overflow: case Intrinsic::ssub_with_overflow: case Intrinsic::smul_with_overflow: case Intrinsic::umul_with_overflow: NOT_IMPLEMENTED; break; case Intrinsic::ctlz: { Type *llvmDstType = I.getType(); ir::Type dstType = getType(ctx, llvmDstType); Type *llvmSrcType = I.getOperand(0)->getType(); ir::Type srcType = getUnsignedType(ctx, llvmSrcType); //the llvm.ctlz.i64 is lowered to two llvm.ctlz.i32 call in ocl_clz.ll GBE_ASSERT(srcType != ir::TYPE_U64); const ir::Register dst = this->getRegister(&I); const ir::Register src = this->getRegister(I.getOperand(0)); int imm_value = 0; if(srcType == ir::TYPE_U16) { imm_value = 16; }else if(srcType == ir::TYPE_U8) { imm_value = 24; } if(srcType == ir::TYPE_U16 || srcType == ir::TYPE_U8) { ir::ImmediateIndex imm; ir::Type tmpType = ir::TYPE_S32; imm = ctx.newIntegerImmediate(imm_value, tmpType); 
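/* Gen has no native clz/ctz instruction; this path widens narrow types to
   32 bits, uses LZD (leading-zero detect), and subtracts a bias (16 for u16,
   24 for u8). The cttz case further below has no "trailing-zero detect"
   either, so it bit-reverses (BFREV) and reuses LZD, padding the high bits so
   ctz(0) still yields the type width. A host-side model of both tricks, where
   lzd32/rev32 are stand-ins for the hardware instructions:

     #include <cstdint>

     static uint32_t lzd32(uint32_t v) {          // models Gen LZD; lzd32(0) == 32
       uint32_t n = 0;
       for (uint32_t m = 0x80000000u; m != 0 && (v & m) == 0; m >>= 1) ++n;
       return n;
     }
     static uint32_t rev32(uint32_t v) {          // models Gen BFREV
       uint32_t r = 0;
       for (int i = 0; i < 32; ++i) { r = (r << 1) | (v & 1u); v >>= 1; }
       return r;
     }
     uint32_t clz16(uint16_t x) { return lzd32(x) - 16; }  // widen, count, un-bias
     uint32_t ctz16(uint16_t x) {
       // 0xFFFF0000 marks the high half so LZD finds a set bit even when x == 0
       return lzd32(rev32((uint32_t)x + 0xFFFF0000u));
     }
*/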
const ir::RegisterFamily family = getFamily(tmpType); const ir::Register immReg = ctx.reg(family); ctx.LOADI(ir::TYPE_S32, immReg, imm); ir::Register tmp0 = ctx.reg(getFamily(tmpType)); ir::Register tmp1 = ctx.reg(getFamily(tmpType)); ir::Register tmp2 = ctx.reg(getFamily(tmpType)); ctx.CVT(tmpType, srcType, tmp0, src); ctx.ALU1(ir::OP_LZD, ir::TYPE_U32, tmp1, tmp0); ctx.SUB(tmpType, tmp2, tmp1, immReg); ctx.CVT(dstType, tmpType, dst, tmp2); } else { GBE_ASSERT(srcType == ir::TYPE_U32); ctx.ALU1(ir::OP_LZD, srcType, dst, src); } } break; case Intrinsic::cttz: { Type *llvmDstType = I.getType(); ir::Type dstType = getType(ctx, llvmDstType); Type *llvmSrcType = I.getOperand(0)->getType(); ir::Type srcType = getUnsignedType(ctx, llvmSrcType); //the llvm.ctlz.i64 is lowered to two llvm.cttz.i32 call in ocl_ctz.ll GBE_ASSERT(srcType != ir::TYPE_U64); const ir::Register dst = this->getRegister(&I); const ir::Register src = this->getRegister(I.getOperand(0)); uint32_t imm_value = 0; if(srcType == ir::TYPE_U16) { imm_value = 0xFFFF0000; }else if(srcType == ir::TYPE_U8) { imm_value = 0xFFFFFF00; } if(srcType == ir::TYPE_U16 || srcType == ir::TYPE_U8) { ir::ImmediateIndex imm; ir::Type tmpType = ir::TYPE_S32; ir::Type revType = ir::TYPE_U32; imm = ctx.newIntegerImmediate(imm_value, revType); const ir::RegisterFamily family = getFamily(revType); const ir::Register immReg = ctx.reg(family); ctx.LOADI(ir::TYPE_U32, immReg, imm); ir::Register tmp0 = ctx.reg(getFamily(tmpType)); ir::Register tmp1 = ctx.reg(getFamily(revType)); ir::Register tmp2 = ctx.reg(getFamily(revType)); ir::Register revTmp = ctx.reg(getFamily(revType)); ctx.CVT(tmpType, srcType, tmp0, src); //gen does not have 'tzd', so reverse first ctx.ADD(revType, tmp1, tmp0, immReg); ctx.ALU1(ir::OP_BFREV, revType, revTmp, tmp1); ctx.ALU1(ir::OP_LZD, ir::TYPE_U32, tmp2, revTmp); ctx.CVT(dstType, tmpType, dst, tmp2); } else { GBE_ASSERT(srcType == ir::TYPE_U32); ir::Type revType = ir::TYPE_U32; ir::Register revTmp = ctx.reg(getFamily(revType)); ctx.ALU1(ir::OP_BFREV, revType, revTmp, src); ctx.ALU1(ir::OP_LZD, ir::TYPE_U32, dst, revTmp); } } break; case Intrinsic::fma: case Intrinsic::fmuladd: { ir::Type srcType = getType(ctx, I.getType()); const ir::Register dst = this->getRegister(&I); const ir::Register src0 = this->getRegister(I.getOperand(0)); const ir::Register src1 = this->getRegister(I.getOperand(1)); const ir::Register src2 = this->getRegister(I.getOperand(2)); ctx.MAD(srcType, dst, src0, src1, src2); } break; case Intrinsic::sqrt: this->emitUnaryCallInst(I,CS,ir::OP_SQR); break; case Intrinsic::ceil: this->emitRoundingCallInst(I,CS,ir::OP_RNDU); break; case Intrinsic::trunc: this->emitRoundingCallInst(I,CS,ir::OP_RNDZ); break; case Intrinsic::rint: this->emitRoundingCallInst(I,CS,ir::OP_RNDE); break; case Intrinsic::floor: this->emitRoundingCallInst(I,CS,ir::OP_RNDD); break; case Intrinsic::sin: this->emitUnaryCallInst(I,CS,ir::OP_SIN); break; case Intrinsic::cos: this->emitUnaryCallInst(I,CS,ir::OP_COS); break; case Intrinsic::log2: this->emitUnaryCallInst(I,CS,ir::OP_LOG); break; case Intrinsic::exp2: this->emitUnaryCallInst(I,CS,ir::OP_EXP); break; case Intrinsic::bswap: this->emitUnaryCallInst(I,CS,ir::OP_BSWAP, getUnsignedType(ctx, I.getType())); break; case Intrinsic::pow: { const ir::Register src0 = this->getRegister(*AI); ++AI; const ir::Register src1 = this->getRegister(*AI); const ir::Register dst = this->getRegister(&I); ctx.POW(ir::TYPE_FLOAT, dst, src0, src1); break; } case Intrinsic::fabs: { const ir::Register src = 
this->getRegister(*AI); const ir::Register dst = this->getRegister(&I); ctx.ALU1(ir::OP_ABS, getType(ctx, (*AI)->getType()), dst, src); break; } default: NOT_IMPLEMENTED; } } else { // Get the name of the called function and handle it Value *Callee = I.getCalledValue(); const std::string fnName = Callee->stripPointerCasts()->getName(); auto genIntrinsicID = intrinsicMap.find(fnName); // Get the function arguments CallSite CS(&I); CallSite::arg_iterator AI = CS.arg_begin(); #if GBE_DEBUG CallSite::arg_iterator AE = CS.arg_end(); #endif /* GBE_DEBUG */ switch (genIntrinsicID) { case GEN_OCL_FBH: this->emitUnaryCallInst(I,CS,ir::OP_FBH, ir::TYPE_U32); break; case GEN_OCL_FBL: this->emitUnaryCallInst(I,CS,ir::OP_FBL, ir::TYPE_U32); break; case GEN_OCL_CBIT: this->emitUnaryCallInst(I,CS,ir::OP_CBIT, getUnsignedType(ctx, (*AI)->getType())); break; case GEN_OCL_ABS: { const ir::Register src = this->getRegister(*AI); const ir::Register dst = this->getRegister(&I); ctx.ALU1(ir::OP_ABS, getType(ctx, (*AI)->getType()), dst, src); break; } case GEN_OCL_SIMD_ALL: { const ir::Register src = this->getRegister(*AI); const ir::Register dst = this->getRegister(&I); ctx.ALU1(ir::OP_SIMD_ALL, ir::TYPE_S32, dst, src); break; } case GEN_OCL_SIMD_ANY: { const ir::Register src = this->getRegister(*AI); const ir::Register dst = this->getRegister(&I); ctx.ALU1(ir::OP_SIMD_ANY, ir::TYPE_S32, dst, src); break; } case GEN_OCL_READ_TM: { const ir::Register dst = this->getRegister(&I); ctx.READ_ARF(ir::TYPE_U32, dst, ir::ARF_TM); break; } case GEN_OCL_VME: { const uint8_t imageID = getImageID(I); AI++; AI++; uint32_t src_length = 40; vector dstTupleData, srcTupleData; for (uint32_t i = 0; i < src_length; i++, AI++){ srcTupleData.push_back(this->getRegister(*AI)); } const ir::Tuple srcTuple = ctx.arrayTuple(&srcTupleData[0], src_length); Constant *msg_type_cpv = dyn_cast(*AI); assert(msg_type_cpv); const ir::Immediate &msg_type_x = processConstantImm(msg_type_cpv); int msg_type = msg_type_x.getIntegerValue(); uint32_t dst_length; //msy_type =1 indicate inter search only of gen vme shared function GBE_ASSERT(msg_type == 1); if(msg_type == 1) dst_length = 6; for (uint32_t elemID = 0; elemID < dst_length; ++elemID) { const ir::Register reg = this->getRegister(&I, elemID); dstTupleData.push_back(reg); } const ir::Tuple dstTuple = ctx.arrayTuple(&dstTupleData[0], dst_length); ++AI; Constant *vme_search_path_lut_cpv = dyn_cast(*AI); assert(vme_search_path_lut_cpv); const ir::Immediate &vme_search_path_lut_x = processConstantImm(vme_search_path_lut_cpv); ++AI; Constant *lut_sub_cpv = dyn_cast(*AI); assert(lut_sub_cpv); const ir::Immediate &lut_sub_x = processConstantImm(lut_sub_cpv); ctx.VME(imageID, dstTuple, srcTuple, dst_length, src_length, msg_type, vme_search_path_lut_x.getIntegerValue(), lut_sub_x.getIntegerValue()); break; } case GEN_OCL_IN_PRIVATE: { const ir::Register dst = this->getRegister(&I); uint32_t stackSize = ctx.getFunction().getStackSize(); if (stackSize == 0) { ir::ImmediateIndex imm = ctx.newImmediate((bool)0); ctx.LOADI(ir::TYPE_BOOL, dst, imm); } else { ir::Register cmp0 = ctx.reg(ir::FAMILY_BOOL); ir::Register cmp1 = ctx.reg(ir::FAMILY_BOOL); const ir::Register src0 = this->getRegister(*AI); ir::Register tmp = ctx.reg(ir::FAMILY_QWORD); ctx.GE(ir::TYPE_U64, cmp0, src0, ir::ocl::stackbuffer); ctx.ADD(ir::TYPE_U64, tmp, ir::ocl::stackbuffer, ir::ocl::stacksize); ctx.LT(ir::TYPE_U64, cmp1, src0, tmp); ctx.AND(ir::TYPE_BOOL, dst, cmp0, cmp1); } break; } case GEN_OCL_REGION: { const ir::Register dst = 
this->getRegister(&I); // offset must be immediate GBE_ASSERT(AI != AE); Constant *CPV = dyn_cast(*AI); assert(CPV); const ir::Immediate &x = processConstantImm(CPV); AI++; const ir::Register src = this->getRegister(*AI); ctx.REGION(dst, src, x.getIntegerValue()); break; } case GEN_OCL_RSQ: this->emitUnaryCallInst(I,CS,ir::OP_RSQ); break; case GEN_OCL_RCP: this->emitUnaryCallInst(I,CS,ir::OP_RCP); break; case GEN_OCL_FORCE_SIMD8: ctx.setSimdWidth(8); break; case GEN_OCL_FORCE_SIMD16: ctx.setSimdWidth(16); break; case GEN_OCL_LBARRIER: ctx.SYNC(ir::syncLocalBarrier); break; case GEN_OCL_GBARRIER: ctx.SYNC(ir::syncGlobalBarrier); break; case GEN_OCL_BARRIER: { Constant *CPV = dyn_cast(*AI); unsigned syncFlag = 0; if (CPV) { const ir::Immediate &x = processConstantImm(CPV); unsigned barrierArg = x.getIntegerValue(); if (barrierArg & 0x1) { syncFlag |= ir::syncLocalBarrier; } if (barrierArg & 0x2) { syncFlag |= ir::syncGlobalBarrier; } if (barrierArg & 0x4) { syncFlag |= ir::syncImageBarrier; } } else { // FIXME we default it to do global fence and barrier. // we need to do runtime check here. syncFlag = ir::syncLocalBarrier | ir::syncGlobalBarrier; } ctx.SYNC(syncFlag); break; } case GEN_OCL_ATOMIC_ADD0: case GEN_OCL_ATOMIC_ADD1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_ADD); break; case GEN_OCL_ATOMIC_SUB0: case GEN_OCL_ATOMIC_SUB1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_SUB); break; case GEN_OCL_ATOMIC_AND0: case GEN_OCL_ATOMIC_AND1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_AND); break; case GEN_OCL_ATOMIC_OR0: case GEN_OCL_ATOMIC_OR1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_OR); break; case GEN_OCL_ATOMIC_XOR0: case GEN_OCL_ATOMIC_XOR1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_XOR); break; case GEN_OCL_ATOMIC_XCHG0: case GEN_OCL_ATOMIC_XCHG1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_XCHG); break; case GEN_OCL_ATOMIC_INC0: case GEN_OCL_ATOMIC_INC1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_INC); break; case GEN_OCL_ATOMIC_DEC0: case GEN_OCL_ATOMIC_DEC1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_DEC); break; case GEN_OCL_ATOMIC_UMIN0: case GEN_OCL_ATOMIC_UMIN1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_UMIN); break; case GEN_OCL_ATOMIC_UMAX0: case GEN_OCL_ATOMIC_UMAX1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_UMAX); break; case GEN_OCL_ATOMIC_IMIN0: case GEN_OCL_ATOMIC_IMIN1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_IMIN); break; case GEN_OCL_ATOMIC_IMAX0: case GEN_OCL_ATOMIC_IMAX1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_IMAX); break; case GEN_OCL_ATOMIC_CMPXCHG0: case GEN_OCL_ATOMIC_CMPXCHG1: this->emitAtomicInst(I,CS,ir::ATOMIC_OP_CMPXCHG); break; case GEN_OCL_GET_IMAGE_WIDTH: case GEN_OCL_GET_IMAGE_HEIGHT: case GEN_OCL_GET_IMAGE_DEPTH: case GEN_OCL_GET_IMAGE_CHANNEL_DATA_TYPE: case GEN_OCL_GET_IMAGE_CHANNEL_ORDER: { const uint8_t imageID = getImageID(I); GBE_ASSERT(AI != AE); ++AI; const ir::Register reg = this->getRegister(&I, 0); int infoType = genIntrinsicID - GEN_OCL_GET_IMAGE_WIDTH; ir::ImageInfoKey key(imageID, infoType); const ir::Register infoReg = ctx.getFunction().getImageSet()->appendInfo(key, &ctx); ctx.GET_IMAGE_INFO(infoType, reg, imageID, infoReg); break; } case GEN_OCL_READ_IMAGE_I: case GEN_OCL_READ_IMAGE_UI: case GEN_OCL_READ_IMAGE_F: { const uint8_t imageID = getImageID(I); GBE_ASSERT(AI != AE); ++AI; GBE_ASSERT(AI != AE); const uint8_t sampler = this->appendSampler(AI); ++AI; GBE_ASSERT(AI != AE); uint32_t coordNum; const ir::Type coordType = getVectorInfo(ctx, *AI, coordNum); if (coordNum == 4) coordNum = 3; const uint32_t imageDim = coordNum; GBE_ASSERT(imageDim >= 1 && 
imageDim <= 3); uint8_t samplerOffset = 0; Value *coordVal = *AI; ++AI; GBE_ASSERT(AI != AE); Value *samplerOffsetVal = *AI; #ifdef GEN7_SAMPLER_CLAMP_BORDER_WORKAROUND Constant *CPV = dyn_cast(samplerOffsetVal); assert(CPV); const ir::Immediate &x = processConstantImm(CPV); GBE_ASSERTM(x.getType() == ir::TYPE_U32 || x.getType() == ir::TYPE_S32, "Invalid sampler type"); samplerOffset = x.getIntegerValue(); #endif bool isFloatCoord = coordType == ir::TYPE_FLOAT; bool requiredFloatCoord = samplerOffset == 0; (void) isFloatCoord; GBE_ASSERT(isFloatCoord == requiredFloatCoord); vector dstTupleData, srcTupleData; for (uint32_t elemID = 0; elemID < imageDim; elemID++) srcTupleData.push_back(this->getRegister(coordVal, elemID)); uint32_t elemNum; ir::Type dstType = getVectorInfo(ctx, &I, elemNum); GBE_ASSERT(elemNum == 4); for (uint32_t elemID = 0; elemID < elemNum; ++elemID) { const ir::Register reg = this->getRegister(&I, elemID); dstTupleData.push_back(reg); } const ir::Tuple dstTuple = ctx.arrayTuple(&dstTupleData[0], elemNum); const ir::Tuple srcTuple = ctx.arrayTuple(&srcTupleData[0], imageDim); ctx.SAMPLE(imageID, dstTuple, srcTuple, imageDim, dstType == ir::TYPE_FLOAT, requiredFloatCoord, sampler, samplerOffset); break; } case GEN_OCL_WRITE_IMAGE_I: case GEN_OCL_WRITE_IMAGE_UI: case GEN_OCL_WRITE_IMAGE_F: { const uint8_t imageID = getImageID(I); GBE_ASSERT(AI != AE); ++AI; GBE_ASSERT(AI != AE); uint32_t coordNum; (void)getVectorInfo(ctx, *AI, coordNum); if (coordNum == 4) coordNum = 3; const uint32_t imageDim = coordNum; vector srcTupleData; GBE_ASSERT(imageDim >= 1 && imageDim <= 3); for (uint32_t elemID = 0; elemID < imageDim; elemID++) srcTupleData.push_back(this->getRegister(*AI, elemID)); ++AI; GBE_ASSERT(AI != AE); uint32_t elemNum; ir::Type srcType = getVectorInfo(ctx, *AI, elemNum); GBE_ASSERT(elemNum == 4); for (uint32_t elemID = 0; elemID < elemNum; ++elemID) { const ir::Register reg = this->getRegister(*AI, elemID); srcTupleData.push_back(reg); } const ir::Tuple srcTuple = ctx.arrayTuple(&srcTupleData[0], imageDim + 4); ctx.TYPED_WRITE(imageID, srcTuple, imageDim + 4, srcType, ir::TYPE_U32); break; } case GEN_OCL_MUL_HI_INT: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.MUL_HI(getType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_MUL_HI_UINT: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.MUL_HI(getUnsignedType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_MUL_HI_I64: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.I64_MUL_HI(getType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_MUL_HI_UI64: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.I64_MUL_HI(getUnsignedType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_UPSAMPLE_SHORT: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const 
ir::Register dst = this->getRegister(&I); ctx.UPSAMPLE_SHORT(getType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_UPSAMPLE_INT: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.UPSAMPLE_INT(getType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_UPSAMPLE_LONG: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.UPSAMPLE_LONG(getType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_SADD_SAT_CHAR: case GEN_OCL_SADD_SAT_SHORT: case GEN_OCL_SADD_SAT_INT: case GEN_OCL_SADD_SAT_LONG: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.ADDSAT(getType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_UADD_SAT_CHAR: case GEN_OCL_UADD_SAT_SHORT: case GEN_OCL_UADD_SAT_INT: case GEN_OCL_UADD_SAT_LONG: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.ADDSAT(getUnsignedType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_SSUB_SAT_CHAR: case GEN_OCL_SSUB_SAT_SHORT: case GEN_OCL_SSUB_SAT_INT: case GEN_OCL_SSUB_SAT_LONG: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.SUBSAT(getType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_USUB_SAT_CHAR: case GEN_OCL_USUB_SAT_SHORT: case GEN_OCL_USUB_SAT_INT: case GEN_OCL_USUB_SAT_LONG: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.SUBSAT(getUnsignedType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_I64_MAD_SAT: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src2 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.I64MADSAT(getType(ctx, I.getType()), dst, src0, src1, src2); break; } case GEN_OCL_I64_MAD_SATU: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src2 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.I64MADSAT(getUnsignedType(ctx, I.getType()), dst, src0, src1, src2); break; } case GEN_OCL_FMAX: case GEN_OCL_FMIN:{ GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); const ir::Register cmp = ctx.reg(ir::FAMILY_BOOL); // Because cmp's sources are the same as sel's sources, the cmp and sel // instructions will be merged into one sel_cmp instruction during Gen // instruction selection. Emit the two instructions here for simplicity.
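/* The compare-then-select pair maps directly onto scalar C++; a one-line model
   of what the GE/LT + SEL sequence below computes per lane (the backend later
   fuses the pair as the comment above describes):

     float genFmax(float a, float b) { bool cmp = (a >= b); return cmp ? a : b; }
     float genFmin(float a, float b) { bool cmp = (a <  b); return cmp ? a : b; }
*/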
if(genIntrinsicID == GEN_OCL_FMAX) ctx.GE(getType(ctx, I.getType()), cmp, src0, src1); else ctx.LT(getType(ctx, I.getType()), cmp, src0, src1); ctx.SEL(getType(ctx, I.getType()), dst, cmp, src0, src1); break; } case GEN_OCL_HADD: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.HADD(getUnsignedType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_I64HADD: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*(AI++)); GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*(AI++)); const ir::Register dst = this->getRegister(&I); ctx.I64HADD(ir::TYPE_U64, dst, src0, src1); break; } case GEN_OCL_RHADD: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*AI); ++AI; GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.RHADD(getUnsignedType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_I64RHADD: { GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*(AI++)); GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*(AI++)); const ir::Register dst = this->getRegister(&I); ctx.I64RHADD(ir::TYPE_U64, dst, src0, src1); break; } #define DEF(DST_TYPE, SRC_TYPE) \ { ctx.SAT_CVT(DST_TYPE, SRC_TYPE, getRegister(&I), getRegister(I.getOperand(0))); break; } case GEN_OCL_SAT_CONV_U8_TO_I8: DEF(ir::TYPE_S8, ir::TYPE_U8); case GEN_OCL_SAT_CONV_I16_TO_I8: DEF(ir::TYPE_S8, ir::TYPE_S16); case GEN_OCL_SAT_CONV_U16_TO_I8: DEF(ir::TYPE_S8, ir::TYPE_U16); case GEN_OCL_SAT_CONV_I32_TO_I8: DEF(ir::TYPE_S8, ir::TYPE_S32); case GEN_OCL_SAT_CONV_U32_TO_I8: DEF(ir::TYPE_S8, ir::TYPE_U32); case GEN_OCL_SAT_CONV_F32_TO_I8: DEF(ir::TYPE_S8, ir::TYPE_FLOAT); case GEN_OCL_SAT_CONV_I8_TO_U8: DEF(ir::TYPE_U8, ir::TYPE_S8); case GEN_OCL_SAT_CONV_I16_TO_U8: DEF(ir::TYPE_U8, ir::TYPE_S16); case GEN_OCL_SAT_CONV_U16_TO_U8: DEF(ir::TYPE_U8, ir::TYPE_U16); case GEN_OCL_SAT_CONV_I32_TO_U8: DEF(ir::TYPE_U8, ir::TYPE_S32); case GEN_OCL_SAT_CONV_U32_TO_U8: DEF(ir::TYPE_U8, ir::TYPE_U32); case GEN_OCL_SAT_CONV_F32_TO_U8: DEF(ir::TYPE_U8, ir::TYPE_FLOAT); case GEN_OCL_SAT_CONV_U16_TO_I16: DEF(ir::TYPE_S16, ir::TYPE_U16); case GEN_OCL_SAT_CONV_I32_TO_I16: DEF(ir::TYPE_S16, ir::TYPE_S32); case GEN_OCL_SAT_CONV_U32_TO_I16: DEF(ir::TYPE_S16, ir::TYPE_U32); case GEN_OCL_SAT_CONV_F32_TO_I16: DEF(ir::TYPE_S16, ir::TYPE_FLOAT); case GEN_OCL_SAT_CONV_I16_TO_U16: DEF(ir::TYPE_U16, ir::TYPE_S16); case GEN_OCL_SAT_CONV_I32_TO_U16: DEF(ir::TYPE_U16, ir::TYPE_S32); case GEN_OCL_SAT_CONV_U32_TO_U16: DEF(ir::TYPE_U16, ir::TYPE_U32); case GEN_OCL_SAT_CONV_F32_TO_U16: DEF(ir::TYPE_U16, ir::TYPE_FLOAT); case GEN_OCL_SAT_CONV_U32_TO_I32: DEF(ir::TYPE_S32, ir::TYPE_U32); case GEN_OCL_SAT_CONV_F32_TO_I32: DEF(ir::TYPE_S32, ir::TYPE_FLOAT); case GEN_OCL_SAT_CONV_I32_TO_U32: DEF(ir::TYPE_U32, ir::TYPE_S32); case GEN_OCL_SAT_CONV_F32_TO_U32: DEF(ir::TYPE_U32, ir::TYPE_FLOAT); case GEN_OCL_SAT_CONV_F16_TO_I8: DEF(ir::TYPE_S8, ir::TYPE_HALF); case GEN_OCL_SAT_CONV_F16_TO_U8: DEF(ir::TYPE_U8, ir::TYPE_HALF); case GEN_OCL_SAT_CONV_F16_TO_I16: DEF(ir::TYPE_S16, ir::TYPE_HALF); case GEN_OCL_SAT_CONV_F16_TO_U16: DEF(ir::TYPE_U16, ir::TYPE_HALF); case GEN_OCL_SAT_CONV_F16_TO_I32: DEF(ir::TYPE_S32, ir::TYPE_HALF); case GEN_OCL_SAT_CONV_F16_TO_U32: DEF(ir::TYPE_U32, ir::TYPE_HALF); case GEN_OCL_CONV_F16_TO_F32: ctx.F16TO32(ir::TYPE_FLOAT, ir::TYPE_U16, 
getRegister(&I), getRegister(I.getOperand(0))); break; case GEN_OCL_CONV_F32_TO_F16: ctx.F32TO16(ir::TYPE_U16, ir::TYPE_FLOAT, getRegister(&I), getRegister(I.getOperand(0))); break; #undef DEF case GEN_OCL_PRINTF: { ir::PrintfSet::PrintfFmt* fmt = getPrintfInfo(&I); if (fmt == NULL) break; ctx.getFunction().getPrintfSet()->append(printfNum, fmt); vector tupleData; vector tupleTypeData; int argNum = static_cast(I.getNumOperands()); argNum -= 2; // no fmt and last NULL. int realArgNum = argNum; for (int n = 0; n < argNum; n++) { /* First, ignore %s, the strings are recorded and not passed to GPU. */ llvm::Constant* args = dyn_cast(I.getOperand(n + 1)); llvm::Constant* args_ptr = NULL; if (args) args_ptr = dyn_cast(args->getOperand(0)); if (args_ptr) { ConstantDataSequential* fmt_arg = dyn_cast(args_ptr->getOperand(0)); if (fmt_arg && fmt_arg->isCString()) { realArgNum--; continue; } } Type * type = I.getOperand(n + 1)->getType(); if (type->isVectorTy()) { uint32_t srcElemNum = 0; Value *srcValue = I.getOperand(n + 1); ir::Type srcType = getVectorInfo(ctx, srcValue, srcElemNum); GBE_ASSERT(!(srcType == ir::TYPE_DOUBLE)); uint32_t elemID = 0; for (elemID = 0; elemID < srcElemNum; ++elemID) { ir::Register reg = getRegister(srcValue, elemID); tupleData.push_back(reg); tupleTypeData.push_back(srcType); } realArgNum += srcElemNum - 1; } else { ir::Register reg = getRegister(I.getOperand(n + 1)); tupleData.push_back(reg); tupleTypeData.push_back(getType(ctx, I.getOperand(n + 1)->getType())); } } ir::Tuple tuple; ir::Tuple typeTuple; if (realArgNum > 0) { tuple = ctx.arrayTuple(&tupleData[0], realArgNum); typeTuple = ctx.arrayTypeTuple(&tupleTypeData[0], realArgNum); } ctx.PRINTF(getRegister(&I), tuple, typeTuple, realArgNum, printfBti, printfNum); printfNum++; break; } case GEN_OCL_CALC_TIMESTAMP: { GBE_ASSERT(AI != AE); ConstantInt *CI = dyn_cast(*AI); GBE_ASSERT(CI); uint32_t pointNum = CI->getZExtValue(); AI++; GBE_ASSERT(AI != AE); CI = dyn_cast(*AI); GBE_ASSERT(CI); uint32_t tsType = CI->getZExtValue(); ctx.CALC_TIMESTAMP(pointNum, tsType); break; } case GEN_OCL_STORE_PROFILING: { /* The profiling log always begin at 0 offset, so we never need the buffer ptr value and ptrBase, and no need for SUB to calculate the real address, neither. We just pass down the BTI value to the instruction. */ GBE_ASSERT(AI != AE); Value* llvmPtr = *AI; Value *bti = getBtiRegister(llvmPtr); GBE_ASSERT(isa(bti)); //Should never be mixed pointer. 
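/* In the GEN_OCL_PRINTF case above, realArgNum counts the scalar slots that
   actually travel to the GPU: literal %s strings are recorded at compile time
   and skipped, while vector arguments expand to one slot per element. A sketch
   of that counting with a hypothetical argument descriptor:

     #include <vector>

     struct PrintfArg { bool isLiteralString; unsigned vecWidth; }; // hypothetical

     int printfSlots(const std::vector<PrintfArg> &args) {
       int n = 0;
       for (const PrintfArg &a : args) {
         if (a.isLiteralString) continue;       // stays host-side
         n += a.vecWidth ? a.vecWidth : 1;      // vectors flatten element-wise
       }
       return n;
     }
*/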
uint32_t index = cast(bti)->getZExtValue(); (void) index; GBE_ASSERT(btiToGen(index) == ir::MEM_GLOBAL); ++AI; GBE_ASSERT(AI != AE); ConstantInt *CI = dyn_cast(*AI); GBE_ASSERT(CI); uint32_t ptype = CI->getZExtValue(); ctx.getUnit().getProfilingInfo()->setProfilingType(ptype); break; } case GEN_OCL_SIMD_SIZE: { const ir::Register dst = this->getRegister(&I); ctx.ALU0(ir::OP_SIMD_SIZE, getType(ctx, I.getType()), dst); break; } case GEN_OCL_SIMD_ID: { const ir::Register dst = this->getRegister(&I); ctx.ALU0(ir::OP_SIMD_ID, getType(ctx, I.getType()), dst); break; } case GEN_OCL_SIMD_SHUFFLE: { const ir::Register src0 = this->getRegister(*AI); ++AI; const ir::Register src1 = this->getRegister(*AI); ++AI; const ir::Register dst = this->getRegister(&I); ctx.SIMD_SHUFFLE(getType(ctx, I.getType()), dst, src0, src1); break; } case GEN_OCL_DEBUGWAIT: { ctx.WAIT(); break; } case GEN_OCL_WORK_GROUP_ALL: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_ALL); break; case GEN_OCL_WORK_GROUP_ANY: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_ANY); break; case GEN_OCL_WORK_GROUP_BROADCAST: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_BROADCAST); break; case GEN_OCL_WORK_GROUP_REDUCE_ADD: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_REDUCE_ADD); break; case GEN_OCL_WORK_GROUP_REDUCE_MAX: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_REDUCE_MAX); break; case GEN_OCL_WORK_GROUP_REDUCE_MIN: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_REDUCE_MIN); break; case GEN_OCL_WORK_GROUP_SCAN_EXCLUSIVE_ADD: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_EXCLUSIVE_ADD); break; case GEN_OCL_WORK_GROUP_SCAN_EXCLUSIVE_MAX: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_EXCLUSIVE_MAX); break; case GEN_OCL_WORK_GROUP_SCAN_EXCLUSIVE_MIN: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_EXCLUSIVE_MIN); break; case GEN_OCL_WORK_GROUP_SCAN_INCLUSIVE_ADD: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_INCLUSIVE_ADD); break; case GEN_OCL_WORK_GROUP_SCAN_INCLUSIVE_MAX: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_INCLUSIVE_MAX); break; case GEN_OCL_WORK_GROUP_SCAN_INCLUSIVE_MIN: this->emitWorkGroupInst(I, CS, ir::WORKGROUP_OP_INCLUSIVE_MIN); break; case GEN_OCL_SUB_GROUP_BROADCAST: this->emitSubGroupInst(I, CS, ir::WORKGROUP_OP_BROADCAST); break; case GEN_OCL_SUB_GROUP_REDUCE_ADD: this->emitSubGroupInst(I, CS, ir::WORKGROUP_OP_REDUCE_ADD); break; case GEN_OCL_SUB_GROUP_REDUCE_MAX: this->emitSubGroupInst(I, CS, ir::WORKGROUP_OP_REDUCE_MAX); break; case GEN_OCL_SUB_GROUP_REDUCE_MIN: this->emitSubGroupInst(I, CS, ir::WORKGROUP_OP_REDUCE_MIN); break; case GEN_OCL_SUB_GROUP_SCAN_EXCLUSIVE_ADD: this->emitSubGroupInst(I, CS, ir::WORKGROUP_OP_EXCLUSIVE_ADD); break; case GEN_OCL_SUB_GROUP_SCAN_EXCLUSIVE_MAX: this->emitSubGroupInst(I, CS, ir::WORKGROUP_OP_EXCLUSIVE_MAX); break; case GEN_OCL_SUB_GROUP_SCAN_EXCLUSIVE_MIN: this->emitSubGroupInst(I, CS, ir::WORKGROUP_OP_EXCLUSIVE_MIN); break; case GEN_OCL_SUB_GROUP_SCAN_INCLUSIVE_ADD: this->emitSubGroupInst(I, CS, ir::WORKGROUP_OP_INCLUSIVE_ADD); break; case GEN_OCL_SUB_GROUP_SCAN_INCLUSIVE_MAX: this->emitSubGroupInst(I, CS, ir::WORKGROUP_OP_INCLUSIVE_MAX); break; case GEN_OCL_SUB_GROUP_SCAN_INCLUSIVE_MIN: this->emitSubGroupInst(I, CS, ir::WORKGROUP_OP_INCLUSIVE_MIN); break; case GEN_OCL_LRP: { const ir::Register dst = this->getRegister(&I); GBE_ASSERT(AI != AE); const ir::Register src0 = this->getRegister(*(AI++)); GBE_ASSERT(AI != AE); const ir::Register src1 = this->getRegister(*(AI++)); GBE_ASSERT(AI != AE); const ir::Register src2 = this->getRegister(*(AI++)); 
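/* LRP is the Gen linear-interpolate instruction; per component it evaluates
   a * b + (1 - a) * c. A scalar model (the operand convention is taken from
   the Gen ISA definition, so treat the exact ordering as an assumption):

     float lrp(float a, float b, float c) { return a * b + (1.0f - a) * c; }
*/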
ctx.LRP(ir::TYPE_FLOAT, dst, src0, src1, src2); break; } case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_MEM: this->emitBlockReadWriteMemInst(I, CS, false, 1); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_MEM2: this->emitBlockReadWriteMemInst(I, CS, false, 2); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_MEM4: this->emitBlockReadWriteMemInst(I, CS, false, 4); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_MEM8: this->emitBlockReadWriteMemInst(I, CS, false, 8); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM: this->emitBlockReadWriteMemInst(I, CS, true, 1); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM2: this->emitBlockReadWriteMemInst(I, CS, true, 2); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM4: this->emitBlockReadWriteMemInst(I, CS, true, 4); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_MEM8: this->emitBlockReadWriteMemInst(I, CS, true, 8); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_IMAGE: this->emitBlockReadWriteImageInst(I, CS, false, 1); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_IMAGE2: this->emitBlockReadWriteImageInst(I, CS, false, 2); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_IMAGE4: this->emitBlockReadWriteImageInst(I, CS, false, 4); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_UI_IMAGE8: this->emitBlockReadWriteImageInst(I, CS, false, 8); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE: this->emitBlockReadWriteImageInst(I, CS, true, 1); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE2: this->emitBlockReadWriteImageInst(I, CS, true, 2); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE4: this->emitBlockReadWriteImageInst(I, CS, true, 4); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_UI_IMAGE8: this->emitBlockReadWriteImageInst(I, CS, true, 8); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_US_MEM: this->emitBlockReadWriteMemInst(I, CS, false, 1, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_US_MEM2: this->emitBlockReadWriteMemInst(I, CS, false, 2, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_US_MEM4: this->emitBlockReadWriteMemInst(I, CS, false, 4, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_US_MEM8: this->emitBlockReadWriteMemInst(I, CS, false, 8, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM: this->emitBlockReadWriteMemInst(I, CS, true, 1, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM2: this->emitBlockReadWriteMemInst(I, CS, true, 2, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM4: this->emitBlockReadWriteMemInst(I, CS, true, 4, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_MEM8: this->emitBlockReadWriteMemInst(I, CS, true, 8, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_US_IMAGE: this->emitBlockReadWriteImageInst(I, CS, false, 1, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_US_IMAGE2: this->emitBlockReadWriteImageInst(I, CS, false, 2, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_US_IMAGE4: this->emitBlockReadWriteImageInst(I, CS, false, 4, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_READ_US_IMAGE8: this->emitBlockReadWriteImageInst(I, CS, false, 8, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE: this->emitBlockReadWriteImageInst(I, CS, true, 1, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE2: this->emitBlockReadWriteImageInst(I, CS, true, 2, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE4: this->emitBlockReadWriteImageInst(I, CS, true, 4, ir::TYPE_U16); break; case GEN_OCL_SUB_GROUP_BLOCK_WRITE_US_IMAGE8: this->emitBlockReadWriteImageInst(I, CS, true, 8, ir::TYPE_U16); break; case 
GEN_OCL_GET_PIPE: case GEN_OCL_MAKE_RID: case GEN_OCL_GET_RID: case GEN_OCL_INT_TO_SAMPLER: case GEN_OCL_SAMPLER_TO_INT: { break; } case GEN_OCL_ENQUEUE_SET_NDRANGE_INFO: { GBE_ASSERT(AI != AE); Value *dstValue; if(I.hasStructRetAttr()) dstValue = *AI++; else dstValue = &I; Value *srcValue = *AI; ++AI; regTranslator.newValueProxy(srcValue, dstValue); break; } case GEN_OCL_ENQUEUE_GET_NDRANGE_INFO: { GBE_ASSERT(AI != AE); Value *srcValue = *AI; ++AI; Value *dstValue = &I; regTranslator.newValueProxy(srcValue, dstValue); break; } case GEN_OCL_ENQUEUE_GET_ENQUEUE_INFO_ADDR: { ctx.getFunction().setUseDeviceEnqueue(true); break; } default: break; } } } } void GenWriter::regAllocateAllocaInst(AllocaInst &I) { this->newRegister(&I); } void GenWriter::emitAllocaInst(AllocaInst &I) { Value *src = I.getOperand(0); Type *elemType = I.getType()->getElementType(); ir::ImmediateIndex immIndex; uint32_t elementSize = getTypeByteSize(unit, elemType); // Be aware, we manipulate pointers if (ctx.getPointerSize() == ir::POINTER_32_BITS) immIndex = ctx.newImmediate(uint32_t(elementSize)); else immIndex = ctx.newImmediate(uint64_t(elementSize)); // OK, we try to see if we know compile time the size we need to allocate if (I.isArrayAllocation() == true) { Constant *CPV = dyn_cast(src); GBE_ASSERT(CPV); const ir::Immediate &imm = processConstantImm(CPV); const uint64_t elemNum = imm.getIntegerValue(); elementSize *= elemNum; if (ctx.getPointerSize() == ir::POINTER_32_BITS) immIndex = ctx.newImmediate(uint32_t(ALIGN(elementSize, 4))); else immIndex = ctx.newImmediate(uint64_t(ALIGN(elementSize, 4))); } // Now emit the stream of instructions to get the allocated pointer const ir::RegisterFamily pointerFamily = ctx.getPointerFamily(); const ir::Register dst = this->getRegister(&I); const ir::Register stack = ir::ocl::stackptr; const ir::Register reg = ctx.reg(pointerFamily); const ir::Immediate imm = ctx.getImmediate(immIndex); uint32_t align = getAlignmentByte(unit, elemType); // below code assume align is power of 2 GBE_ASSERT(align && (align & (align-1)) == 0); // align the stack pointer according to data alignment if(align > 1) { uint32_t prevStackPtr = ctx.getFunction().getStackSize(); uint32_t step = ((prevStackPtr + (align - 1)) & ~(align - 1)) - prevStackPtr; if (step != 0) { ir::ImmediateIndex stepImm; ir::Type pointerTy = getType(pointerFamily); if (ctx.getPointerSize() == ir::POINTER_32_BITS) stepImm = ctx.newImmediate(uint32_t(step)); else stepImm = ctx.newImmediate(uint64_t(step)); ir::Register stepReg = ctx.reg(ctx.getPointerFamily()); ctx.LOADI(pointerTy, stepReg, stepImm); ctx.ADD(pointerTy, stack, stack, stepReg); ctx.getFunction().pushStackSize(step); } } // Set the destination register properly if (legacyMode) ctx.MOV(imm.getType(), dst, stack); else ctx.ADD(imm.getType(), dst, stack, ir::ocl::stackbuffer); ctx.LOADI(imm.getType(), reg, immIndex); ctx.ADD(imm.getType(), stack, stack, reg); ctx.getFunction().pushStackSize(elementSize); } static INLINE Value *getLoadOrStoreValue(LoadInst &I) { return &I; } static INLINE Value *getLoadOrStoreValue(StoreInst &I) { return I.getValueOperand(); } void GenWriter::regAllocateLoadInst(LoadInst &I) { this->newRegister(&I); } void GenWriter::regAllocateStoreInst(StoreInst &I) {} void GenWriter::emitLoadInst(LoadInst &I) { MemoryInstHelper *h = new MemoryInstHelper(ctx, unit, this, legacyMode); h->emitLoadOrStore(I); delete h; } void GenWriter::emitStoreInst(StoreInst &I) { MemoryInstHelper *h = new MemoryInstHelper(ctx, unit, this, legacyMode); 
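/* emitAllocaInst above keeps the stack pointer aligned by rounding it up to
   the next multiple of a power-of-two alignment; the same mask trick in
   isolation:

     #include <cassert>
     #include <cstdint>

     uint32_t alignUp(uint32_t stackPtr, uint32_t align) {
       assert(align != 0 && (align & (align - 1)) == 0);  // power of two only
       return (stackPtr + (align - 1)) & ~(align - 1);
     }
     // the step pushed onto the stack size is:
     //   alignUp(prevStackPtr, align) - prevStackPtr
*/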
h->emitLoadOrStore(I); delete h; } llvm::FunctionPass *createGenPass(ir::Unit &unit) { return new GenWriter(unit); } ir::Tuple MemoryInstHelper::getValueTuple(llvm::Value *llvmValues, llvm::Type *elemType, unsigned start, unsigned elemNum) { vector tupleData; // put registers here for (uint32_t elemID = 0; elemID < elemNum; ++elemID) { ir::Register reg; if(writer->regTranslator.isUndefConst(llvmValues, elemID)) { Value *v = Constant::getNullValue(elemType); reg = writer->getRegister(v); } else reg = writer->getRegister(llvmValues, start + elemID); tupleData.push_back(reg); } const ir::Tuple tuple = ctx.arrayTuple(&tupleData[0], elemNum); return tuple; } void MemoryInstHelper::emitBatchLoadOrStore(const ir::Type type, const uint32_t elemNum, Value *llvmValues, Type * elemType) { uint32_t totalSize = elemNum * getFamilySize(getFamily(type)); uint32_t msgNum = totalSize > 16 ? totalSize / 16 : 1; const uint32_t perMsgNum = elemNum / msgNum; for (uint32_t msg = 0; msg < msgNum; ++msg) { // Build the tuple data in the vector ir::Tuple tuple = getValueTuple(llvmValues, elemType, perMsgNum*msg, perMsgNum); // each message can read/write 16 byte const int32_t stride = 16; ir::Register addr = getOffsetAddress(mPtr, msg*stride); shootMessage(type, addr, tuple, perMsgNum); } } ir::Register MemoryInstHelper::getOffsetAddress(ir::Register basePtr, unsigned offset) { const ir::RegisterFamily pointerFamily = ctx.getPointerFamily(); ir::Register addr; if (offset == 0) addr = basePtr; else { const ir::Register offsetReg = ctx.reg(pointerFamily); ir::ImmediateIndex immIndex; ir::Type immType; if (pointerFamily == ir::FAMILY_DWORD) { immIndex = ctx.newImmediate(int32_t(offset)); immType = ir::TYPE_S32; } else { immIndex = ctx.newImmediate(int64_t(offset)); immType = ir::TYPE_S64; } addr = ctx.reg(pointerFamily); ctx.LOADI(immType, offsetReg, immIndex); ctx.ADD(immType, addr, basePtr, offsetReg); } return addr; } // handle load of dword/qword with unaligned address void MemoryInstHelper::emitUnalignedDQLoadStore(Value *llvmValues) { Type *llvmType = llvmValues->getType(); unsigned byteSize = getTypeByteSize(unit, llvmType); Type *elemType = llvmType; unsigned elemNum = 1; if (!isScalarType(llvmType)) { VectorType *vectorType = cast(llvmType); elemType = vectorType->getElementType(); elemNum = vectorType->getNumElements(); } const ir::Type type = getType(ctx, elemType); ir::Tuple tuple = getValueTuple(llvmValues, elemType, 0, elemNum); vector byteTupleData; for (uint32_t elemID = 0; elemID < byteSize; ++elemID) { byteTupleData.push_back(ctx.reg(ir::FAMILY_BYTE)); } const ir::Tuple byteTuple = ctx.arrayTuple(&byteTupleData[0], byteSize); if (isLoad) { shootMessage(ir::TYPE_U8, mPtr, byteTuple, byteSize); ctx.BITCAST(type, ir::TYPE_U8, tuple, byteTuple, elemNum, byteSize); } else { ctx.BITCAST(ir::TYPE_U8, type, byteTuple, tuple, byteSize, elemNum); // FIXME: byte scatter does not handle correctly vector store, after fix that, // we can directly use on store instruction like: // ctx.STORE(ir::TYPE_U8, byteTuple, ptr, addrSpace, byteSize, dwAligned, fixedBTI, bti); for (uint32_t elemID = 0; elemID < byteSize; elemID++) { const ir::Register addr = getOffsetAddress(mPtr, elemID); const ir::Tuple value = ctx.arrayTuple(&byteTupleData[elemID], 1); shootMessage(ir::TYPE_U8, addr, value, 1); } } } template void MemoryInstHelper::emitLoadOrStore(T &I) { Value *llvmPtr = I.getPointerOperand(); Value *llvmValues = getLoadOrStoreValue(I); Type *llvmType = llvmValues->getType(); dwAligned = (I.getAlignment() % 4) == 0; 
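/* When dwAligned is false for a 32/64-bit element, emitUnalignedDQLoadStore
   above falls back to byte-granular messages and reassembles the value with a
   BITCAST. The equivalent host-side reassembly of one unaligned little-endian
   32-bit load from four byte loads:

     #include <cstdint>

     uint32_t loadUnaligned32(const uint8_t *p) {
       return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
            | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
     }
*/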
addrSpace = addressSpaceLLVMToGen(llvmPtr->getType()->getPointerAddressSpace()); const ir::Register pointer = writer->getRegister(llvmPtr); const ir::RegisterFamily pointerFamily = ctx.getPointerFamily(); this->isLoad = IsLoad; Type *scalarType = llvmType; if (!isScalarType(llvmType)) { VectorType *vectorType = cast<VectorType>(llvmType); scalarType = vectorType->getElementType(); } // calculate bti and pointer operand if (legacyMode) { Value *bti = writer->getBtiRegister(llvmPtr); Value *ptrBase = writer->getPointerBase(llvmPtr); ir::Register baseReg = writer->getRegister(ptrBase); bool zeroBase = isa<ConstantPointerNull>(ptrBase) ? true : false; if (isa<ConstantInt>(bti)) { SurfaceIndex = cast<ConstantInt>(bti)->getZExtValue(); addrSpace = btiToGen(SurfaceIndex); mAddressMode = ir::AM_StaticBti; } else { addrSpace = ir::MEM_MIXED; mBTI = writer->getRegister(bti); mAddressMode = ir::AM_DynamicBti; } mPtr = ctx.reg(pointerFamily); // FIXME: skipping the subtraction of a zero base at this stage is not ideal, // but the later ArgumentLower pass needs to match an exact load/addImm pattern, // so we avoid subtracting a zero base to satisfy that pass. if (!zeroBase) ctx.SUB(getType(ctx, llvmPtr->getType()), mPtr, pointer, baseReg); else mPtr = pointer; } else { mPtr = pointer; SurfaceIndex = 0xff; mAddressMode = ir::AM_Stateless; } unsigned primitiveBits = scalarType->getPrimitiveSizeInBits(); if (!dwAligned && (primitiveBits == 64 || primitiveBits == 32) ) { emitUnalignedDQLoadStore(llvmValues); return; } // Scalar is easy. We need not build register tuples if (isScalarType(llvmType) == true) { const ir::Type type = getType(ctx, llvmType); const ir::Register values = writer->getRegister(llvmValues); const ir::Tuple tuple = ctx.arrayTuple(&values, 1); shootMessage(type, mPtr, tuple, 1); } // A vector type requires us to build a tuple else { VectorType *vectorType = cast<VectorType>(llvmType); Type *elemType = vectorType->getElementType(); // We follow the OCL spec and support 2,3,4,8,16 elements only uint32_t elemNum = vectorType->getNumElements(); GBE_ASSERTM(elemNum == 2 || elemNum == 3 || elemNum == 4 || elemNum == 8 || elemNum == 16, "Only vectors of 2,3,4,8 or 16 elements are supported"); // The code is going to differ from type to type (based on // the size of each vector element) const ir::Type type = getType(ctx, elemType); const ir::RegisterFamily dataFamily = getFamily(type); if(dataFamily == ir::FAMILY_DWORD && addrSpace != ir::MEM_CONSTANT) { // One message is enough here.
Nothing special to do if (elemNum <= 4) { ir::Tuple tuple = getValueTuple(llvmValues, elemType, 0, elemNum); shootMessage(type, mPtr, tuple, elemNum); } else { emitBatchLoadOrStore(type, elemNum, llvmValues, elemType); } } else if((dataFamily == ir::FAMILY_WORD && (isLoad || elemNum % 2 == 0)) || (dataFamily == ir::FAMILY_BYTE && (isLoad || elemNum % 4 == 0))) { emitBatchLoadOrStore(type, elemNum, llvmValues, elemType); } else { for (uint32_t elemID = 0; elemID < elemNum; elemID++) { if(writer->regTranslator.isUndefConst(llvmValues, elemID)) continue; const ir::Register reg = writer->getRegister(llvmValues, elemID); int elemSize = getTypeByteSize(unit, elemType); ir::Register addr = getOffsetAddress(mPtr, elemID*elemSize); const ir::Tuple tuple = ctx.arrayTuple(®, 1); shootMessage(type, addr, tuple, 1); } } } } void MemoryInstHelper::shootMessage(ir::Type type, ir::Register offset, ir::Tuple value, unsigned elemNum) { if (mAddressMode == ir::AM_DynamicBti) { if (isLoad) ctx.LOAD(type, value, offset, addrSpace, elemNum, dwAligned, mAddressMode, mBTI); else ctx.STORE(type, value, offset, addrSpace, elemNum, dwAligned, mAddressMode, mBTI); } else { if (isLoad) ctx.LOAD(type, value, offset, addrSpace, elemNum, dwAligned, mAddressMode, SurfaceIndex); else ctx.STORE(type, value, offset, addrSpace, elemNum, dwAligned, mAddressMode, SurfaceIndex); } } } /* namespace gbe */ Beignet-1.3.2-Source/backend/src/llvm/llvm_profiling.cpp000664 001750 001750 00000014316 13173554000 022342 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ /** * \file llvm_profiling.cpp * This file will insert some instructions for each profiling point. 
* */ #include #include #include "llvm/Config/llvm-config.h" #include "llvm/IR/Function.h" #include "llvm/IR/InstrTypes.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/IntrinsicInst.h" #include "llvm/IR/Module.h" #include "llvm/Pass.h" #include "llvm/IR/IRBuilder.h" #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35 #include "llvm/IR/CallSite.h" #include "llvm/IR/CFG.h" #else #include "llvm/Support/CallSite.h" #include "llvm/Support/CFG.h" #endif #include "llvm/Support/raw_ostream.h" #include "llvm/IR/Attributes.h" #include "llvm/llvm_gen_backend.hpp" #include "sys/map.hpp" #include "ir/unit.hpp" #include #include using namespace llvm; using std::vector; namespace gbe { using namespace ir; class ProfilingInserter : public FunctionPass { public: static char ID; Module* module; IRBuilder<>* builder; Type* intTy; Type *ptrTy; int profilingType; ProfilingInserter(int profiling) : FunctionPass(ID), profilingType(profiling) { module = NULL; builder = NULL; intTy = NULL; ptrTy = NULL; } ~ProfilingInserter(void) { } #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40 virtual StringRef getPassName() const #else virtual const char *getPassName() const #endif { return "Timestamp Parser"; } virtual bool runOnFunction(llvm::Function &F); }; bool ProfilingInserter::runOnFunction(llvm::Function &F) { bool changed = false; int pointNum = 0; switch (F.getCallingConv()) { case CallingConv::C: case CallingConv::Fast: case CallingConv::SPIR_KERNEL: break; default: GBE_ASSERTM(false, "Unsupported calling convention"); } // Since all function calls are inlined, skip non-kernel functions bool bKernel = isKernelFunction(F); if (!bKernel) return changed; module = F.getParent(); intTy = IntegerType::get(module->getContext(), 32); ptrTy = Type::getInt32PtrTy(module->getContext(), 1); builder = new IRBuilder<>(module->getContext()); /* allocate a new buffer ptr to collect the timestamps. */ builder->SetInsertPoint(&*F.begin()->begin()); llvm::Constant *profilingBuf = module->getGlobalVariable("__gen_ocl_profiling_buf"); if (!profilingBuf) { profilingBuf = new GlobalVariable(*module, intTy, false, GlobalVariable::ExternalLinkage, nullptr, StringRef("__gen_ocl_profiling_buf"), nullptr, GlobalVariable::NotThreadLocal, 1); } changed = true; for (llvm::Function::iterator B = F.begin(), BE = F.end(); B != BE; B++) { /* Skip the empty blocks. */ if (B->empty()) continue; BasicBlock::iterator instI = B->begin(); for ( ; instI != B->end(); instI++) { if (dyn_cast<llvm::PHINode>(instI)) continue; if (dyn_cast<llvm::ReturnInst>(instI)) { instI++; GBE_ASSERT(instI == B->end()); break; } if (dyn_cast<llvm::BranchInst>(instI)) { instI++; GBE_ASSERT(instI == B->end()); break; } break; } if (instI == B->end()) continue; if (pointNum >= 20) // Too many timestamps. continue; // Insert at the first non-PHI instruction. builder->SetInsertPoint(&*instI); /* Add the timestamp store function call.
*/ // __gen_ocl_store_timestamp(int nth, int type); Value *Args[2] = {ConstantInt::get(intTy, pointNum++), ConstantInt::get(intTy, profilingType)}; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 50 builder->CreateCall(cast(module->getOrInsertFunction( "__gen_ocl_calc_timestamp", Type::getVoidTy(module->getContext()), IntegerType::getInt32Ty(module->getContext()), IntegerType::getInt32Ty(module->getContext()))), ArrayRef(Args)); #else builder->CreateCall(cast(module->getOrInsertFunction( "__gen_ocl_calc_timestamp", Type::getVoidTy(module->getContext()), IntegerType::getInt32Ty(module->getContext()), IntegerType::getInt32Ty(module->getContext()), nullptr)), ArrayRef(Args)); #endif } /* We insert one store_profiling at the end of the last block to hold the place. */ llvm::Function::iterator BE = F.end(); BE--; BasicBlock::iterator retInst = BE->end(); retInst--; builder->SetInsertPoint(&*retInst); Value *Args2[2] = {profilingBuf, ConstantInt::get(intTy, profilingType)}; #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 50 builder->CreateCall(cast(module->getOrInsertFunction( "__gen_ocl_store_profiling", Type::getVoidTy(module->getContext()), ptrTy, IntegerType::getInt32Ty(module->getContext()))), ArrayRef(Args2)); #else builder->CreateCall(cast(module->getOrInsertFunction( "__gen_ocl_store_profiling", Type::getVoidTy(module->getContext()), ptrTy, IntegerType::getInt32Ty(module->getContext()), nullptr)), ArrayRef(Args2)); #endif delete builder; return changed; } FunctionPass* createProfilingInserterPass(int profilingType, ir::Unit &unit) { unit.setInProfilingMode(true); return new ProfilingInserter(profilingType); } char ProfilingInserter::ID = 0; } // end namespace Beignet-1.3.2-Source/backend/src/llvm/llvm_gen_backend.hpp000664 001750 001750 00000013050 13173554000 022570 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file llvm_gen_backend.hpp * \author Benjamin Segovia * * Pass generation functions */ #ifndef __GBE_LLVM_GEN_BACKEND_HPP__ #define __GBE_LLVM_GEN_BACKEND_HPP__ #include #include "llvm/Config/llvm-config.h" #include "llvm/Pass.h" #include "llvm/Analysis/LoopPass.h" #include "llvm/IR/Instructions.h" #include "sys/platform.hpp" #include "sys/map.hpp" #include // LLVM Type namespace llvm { class Type; /* Imported from pNaCl */ llvm::Instruction *PhiSafeInsertPt(llvm::Use *U); void PhiSafeReplaceUses(llvm::Use *U, llvm::Value *NewVal); FunctionPass *createExpandConstantExprPass(); FunctionPass *createExpandLargeIntegersPass(); FunctionPass *createPromoteIntegersPass(); FunctionPass *createStripAttributesPass(bool lastTime); // Copy debug information from Original to New, and return New. template T *CopyDebug(T *New, llvm::Instruction *Original) { New->setDebugLoc(Original->getDebugLoc()); return New; } } namespace gbe { // Final target of the Gen backend namespace ir { class Unit; } /*! 
All intrinsic Gen functions */ enum OCLInstrinsic { #define DECL_LLVM_GEN_FUNCTION(ID, NAME) GEN_OCL_##ID, #include "llvm_gen_ocl_function.hxx" GEN_OCL_NOT_FOUND, #undef DECL_LLVM_GEN_FUNCTION }; /*! Build the hash map for OCL functions on Gen */ struct OCLIntrinsicMap { /*! Build the intrinsic map */ OCLIntrinsicMap(void) { #define DECL_LLVM_GEN_FUNCTION(ID, NAME) \ map.insert(std::make_pair(#NAME, GEN_OCL_##ID)); #include "llvm_gen_ocl_function.hxx" #undef DECL_LLVM_GEN_FUNCTION } /*! Sort intrinsics with their names */ gbe::map<std::string, OCLInstrinsic> map; OCLInstrinsic find(const std::string symbol) const { auto it = map.find(symbol); if (it == map.end()) { int status = 0; /* set for libcxxrt */ char *realName = abi::__cxa_demangle(symbol.c_str(), NULL, NULL, &status); if (realName) { std::string realFnName(realName), stripName; stripName = realFnName.substr(0, realFnName.find("(")); it = map.find(stripName); } free(realName); } // FIXME: we should create a complete error reporting mechanism // for errors found in Beignet-managed passes, including the Gen pass. if (it == map.end()) { std::cerr << "Unresolved symbol: " << symbol << std::endl; std::cerr << "Aborting..." << std::endl; return GEN_OCL_NOT_FOUND; } return it->second; } }; /*! Sort the OCL Gen intrinsic functions (built on pre-main) */ static const OCLIntrinsicMap intrinsicMap; /*! Pad the offset */ int32_t getPadding(int32_t offset, int32_t align); /*! Get the type alignment in bytes */ uint32_t getAlignmentByte(const ir::Unit &unit, llvm::Type* Ty); /*! Get the type size in bits */ uint32_t getTypeBitSize(const ir::Unit &unit, llvm::Type* Ty); /*! Get the type size in bytes */ uint32_t getTypeByteSize(const ir::Unit &unit, llvm::Type* Ty); /*! Get GEP constant offset for the specified operand.*/ int32_t getGEPConstOffset(const ir::Unit &unit, llvm::Type *eltTy, int32_t TypeIndex); /*! Get element type for a type.*/ llvm::Type* getEltType(llvm::Type *eltTy, uint32_t index = 0); /*! whether this is a kernel function */ bool isKernelFunction(const llvm::Function &f); /*! Create a Gen-IR unit */ llvm::FunctionPass *createGenPass(ir::Unit &unit); /*! Remove the GEP instructions */ llvm::BasicBlockPass *createRemoveGEPPass(const ir::Unit &unit); /*! Merge load/store if possible */ llvm::BasicBlockPass *createLoadStoreOptimizationPass(); /*! Scalarize all vector op instructions */ llvm::FunctionPass* createScalarizePass(); /*! Remove/add NoDuplicate function attribute for barrier functions. */ llvm::ModulePass* createBarrierNodupPass(bool); /*! Convert the Intrinsic call to gen function */ llvm::BasicBlockPass *createIntrinsicLoweringPass(); /*! Parse the printf function calls. */ llvm::FunctionPass* createPrintfParserPass(ir::Unit &unit); /*! Insert the time stamp for profiling. */ llvm::FunctionPass* createProfilingInserterPass(int profilingType, ir::Unit &unit); #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35 /* customized loop unrolling pass. */ llvm::LoopPass *createCustomLoopUnrollPass(); #endif llvm::FunctionPass* createSamplerFixPass(); /*! Add all the ocl function calls to our bitcode. */ llvm::Module* runBitCodeLinker(llvm::Module *mod, bool strictMath, ir::Unit &unit); /*! Get the module's OpenCL version from metadata.
*/ uint32_t getModuleOclVersion(const llvm::Module *M); void collectDeviceEnqueueInfo(llvm::Module *mod, ir::Unit &unit); void* getPrintfInfo(llvm::CallInst* inst); } /* namespace gbe */ #endif /* __GBE_LLVM_GEN_BACKEND_HPP__ */ Beignet-1.3.2-Source/backend/src/llvm/llvm_gen_ocl_function.hxx000664 001750 001750 00000036307 13173554000 023715 0ustar00yryr000000 000000 DECL_LLVM_GEN_FUNCTION(GET_GROUP_ID0, __gen_ocl_get_group_id0) DECL_LLVM_GEN_FUNCTION(GET_GROUP_ID1, __gen_ocl_get_group_id1) DECL_LLVM_GEN_FUNCTION(GET_GROUP_ID2, __gen_ocl_get_group_id2) DECL_LLVM_GEN_FUNCTION(GET_LOCAL_ID0, __gen_ocl_get_local_id0) DECL_LLVM_GEN_FUNCTION(GET_LOCAL_ID1, __gen_ocl_get_local_id1) DECL_LLVM_GEN_FUNCTION(GET_LOCAL_ID2, __gen_ocl_get_local_id2) DECL_LLVM_GEN_FUNCTION(GET_NUM_GROUPS0, __gen_ocl_get_num_groups0) DECL_LLVM_GEN_FUNCTION(GET_NUM_GROUPS1, __gen_ocl_get_num_groups1) DECL_LLVM_GEN_FUNCTION(GET_NUM_GROUPS2, __gen_ocl_get_num_groups2) DECL_LLVM_GEN_FUNCTION(GET_LOCAL_SIZE0, __gen_ocl_get_local_size0) DECL_LLVM_GEN_FUNCTION(GET_LOCAL_SIZE1, __gen_ocl_get_local_size1) DECL_LLVM_GEN_FUNCTION(GET_LOCAL_SIZE2, __gen_ocl_get_local_size2) DECL_LLVM_GEN_FUNCTION(GET_ENQUEUED_LOCAL_SIZE0, __gen_ocl_get_enqueued_local_size0) DECL_LLVM_GEN_FUNCTION(GET_ENQUEUED_LOCAL_SIZE1, __gen_ocl_get_enqueued_local_size1) DECL_LLVM_GEN_FUNCTION(GET_ENQUEUED_LOCAL_SIZE2, __gen_ocl_get_enqueued_local_size2) DECL_LLVM_GEN_FUNCTION(GET_GLOBAL_SIZE0, __gen_ocl_get_global_size0) DECL_LLVM_GEN_FUNCTION(GET_GLOBAL_SIZE1, __gen_ocl_get_global_size1) DECL_LLVM_GEN_FUNCTION(GET_GLOBAL_SIZE2, __gen_ocl_get_global_size2) DECL_LLVM_GEN_FUNCTION(GET_GLOBAL_OFFSET0, __gen_ocl_get_global_offset0) DECL_LLVM_GEN_FUNCTION(GET_GLOBAL_OFFSET1, __gen_ocl_get_global_offset1) DECL_LLVM_GEN_FUNCTION(GET_GLOBAL_OFFSET2, __gen_ocl_get_global_offset2) DECL_LLVM_GEN_FUNCTION(GET_WORK_DIM, __gen_ocl_get_work_dim) // Math function DECL_LLVM_GEN_FUNCTION(RSQ, __gen_ocl_rsqrt) DECL_LLVM_GEN_FUNCTION(RCP, __gen_ocl_rcp) DECL_LLVM_GEN_FUNCTION(FMAX, __gen_ocl_fmax) DECL_LLVM_GEN_FUNCTION(FMIN, __gen_ocl_fmin) // Barrier function DECL_LLVM_GEN_FUNCTION(LBARRIER, __gen_ocl_barrier_local) DECL_LLVM_GEN_FUNCTION(GBARRIER, __gen_ocl_barrier_global) DECL_LLVM_GEN_FUNCTION(BARRIER, __gen_ocl_barrier) // To force SIMD8/16 compilation DECL_LLVM_GEN_FUNCTION(FORCE_SIMD8, __gen_ocl_force_simd8) DECL_LLVM_GEN_FUNCTION(FORCE_SIMD16, __gen_ocl_force_simd16) // To read_image functions. DECL_LLVM_GEN_FUNCTION(READ_IMAGE_I, __gen_ocl_read_imagei) DECL_LLVM_GEN_FUNCTION(READ_IMAGE_UI, __gen_ocl_read_imageui) DECL_LLVM_GEN_FUNCTION(READ_IMAGE_F, __gen_ocl_read_imagef) // To write_image functions. DECL_LLVM_GEN_FUNCTION(WRITE_IMAGE_I, __gen_ocl_write_imagei) DECL_LLVM_GEN_FUNCTION(WRITE_IMAGE_UI, __gen_ocl_write_imageui) DECL_LLVM_GEN_FUNCTION(WRITE_IMAGE_F, __gen_ocl_write_imagef) // To get image info function DECL_LLVM_GEN_FUNCTION(GET_IMAGE_WIDTH, __gen_ocl_get_image_width) DECL_LLVM_GEN_FUNCTION(GET_IMAGE_HEIGHT, __gen_ocl_get_image_height) DECL_LLVM_GEN_FUNCTION(GET_IMAGE_DEPTH, __gen_ocl_get_image_depth) DECL_LLVM_GEN_FUNCTION(GET_IMAGE_CHANNEL_DATA_TYPE, __gen_ocl_get_image_channel_data_type) DECL_LLVM_GEN_FUNCTION(GET_IMAGE_CHANNEL_ORDER, __gen_ocl_get_image_channel_order) // atomic related functions. 
DECL_LLVM_GEN_FUNCTION(ATOMIC_ADD0, _Z20__gen_ocl_atomic_addPU3AS1jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_ADD1, _Z20__gen_ocl_atomic_addPU3AS3jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_SUB0, _Z20__gen_ocl_atomic_subPU3AS1jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_SUB1, _Z20__gen_ocl_atomic_subPU3AS3jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_AND0, _Z20__gen_ocl_atomic_andPU3AS1jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_AND1, _Z20__gen_ocl_atomic_andPU3AS3jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_OR0, _Z19__gen_ocl_atomic_orPU3AS1jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_OR1, _Z19__gen_ocl_atomic_orPU3AS3jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_XOR0, _Z20__gen_ocl_atomic_xorPU3AS1jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_XOR1, _Z20__gen_ocl_atomic_xorPU3AS3jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_UMIN0, _Z21__gen_ocl_atomic_uminPU3AS1jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_UMIN1, _Z21__gen_ocl_atomic_uminPU3AS3jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_UMAX0, _Z21__gen_ocl_atomic_umaxPU3AS1jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_UMAX1, _Z21__gen_ocl_atomic_umaxPU3AS3jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_IMIN0, _Z21__gen_ocl_atomic_iminPU3AS1jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_IMIN1, _Z21__gen_ocl_atomic_iminPU3AS3jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_IMAX0, _Z21__gen_ocl_atomic_imaxPU3AS1jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_IMAX1, _Z21__gen_ocl_atomic_imaxPU3AS3jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_XCHG0, _Z21__gen_ocl_atomic_xchgPU3AS1jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_XCHG1, _Z21__gen_ocl_atomic_xchgPU3AS3jj) DECL_LLVM_GEN_FUNCTION(ATOMIC_INC0, _Z20__gen_ocl_atomic_incPU3AS1j) DECL_LLVM_GEN_FUNCTION(ATOMIC_INC1, _Z20__gen_ocl_atomic_incPU3AS3j) DECL_LLVM_GEN_FUNCTION(ATOMIC_DEC0, _Z20__gen_ocl_atomic_decPU3AS1j) DECL_LLVM_GEN_FUNCTION(ATOMIC_DEC1, _Z20__gen_ocl_atomic_decPU3AS3j) DECL_LLVM_GEN_FUNCTION(ATOMIC_CMPXCHG0, _Z24__gen_ocl_atomic_cmpxchgPU3AS1jjj) DECL_LLVM_GEN_FUNCTION(ATOMIC_CMPXCHG1, _Z24__gen_ocl_atomic_cmpxchgPU3AS3jjj) // saturation related functions. 
DECL_LLVM_GEN_FUNCTION(SADD_SAT_CHAR, _Z12ocl_sadd_satcc) DECL_LLVM_GEN_FUNCTION(SADD_SAT_SHORT, _Z12ocl_sadd_satss) DECL_LLVM_GEN_FUNCTION(SADD_SAT_INT, _Z12ocl_sadd_satii) DECL_LLVM_GEN_FUNCTION(SADD_SAT_LONG, _Z12ocl_sadd_satll) DECL_LLVM_GEN_FUNCTION(UADD_SAT_CHAR, _Z12ocl_uadd_sathh) DECL_LLVM_GEN_FUNCTION(UADD_SAT_SHORT, _Z12ocl_uadd_sattt) DECL_LLVM_GEN_FUNCTION(UADD_SAT_INT, _Z12ocl_uadd_satjj) DECL_LLVM_GEN_FUNCTION(UADD_SAT_LONG, _Z12ocl_uadd_satmm) DECL_LLVM_GEN_FUNCTION(SSUB_SAT_CHAR, _Z12ocl_ssub_satcc) DECL_LLVM_GEN_FUNCTION(SSUB_SAT_SHORT, _Z12ocl_ssub_satss) DECL_LLVM_GEN_FUNCTION(SSUB_SAT_INT, _Z12ocl_ssub_satii) DECL_LLVM_GEN_FUNCTION(SSUB_SAT_LONG, _Z12ocl_ssub_satll) DECL_LLVM_GEN_FUNCTION(USUB_SAT_CHAR, _Z12ocl_usub_sathh) DECL_LLVM_GEN_FUNCTION(USUB_SAT_SHORT, _Z12ocl_usub_sattt) DECL_LLVM_GEN_FUNCTION(USUB_SAT_INT, _Z12ocl_usub_satjj) DECL_LLVM_GEN_FUNCTION(USUB_SAT_LONG, _Z12ocl_usub_satmm) DECL_LLVM_GEN_FUNCTION(I64_MAD_SAT, _Z17__gen_ocl_mad_satlll) DECL_LLVM_GEN_FUNCTION(I64_MAD_SATU, _Z17__gen_ocl_mad_satmmm) // integer built-in functions DECL_LLVM_GEN_FUNCTION(MUL_HI_INT, _Z16__gen_ocl_mul_hiii) DECL_LLVM_GEN_FUNCTION(MUL_HI_UINT, _Z16__gen_ocl_mul_hijj) DECL_LLVM_GEN_FUNCTION(MUL_HI_I64, _Z16__gen_ocl_mul_hill) DECL_LLVM_GEN_FUNCTION(MUL_HI_UI64, _Z16__gen_ocl_mul_himm) DECL_LLVM_GEN_FUNCTION(FBH, __gen_ocl_fbh) DECL_LLVM_GEN_FUNCTION(FBL, __gen_ocl_fbl) DECL_LLVM_GEN_FUNCTION(ABS, __gen_ocl_abs) DECL_LLVM_GEN_FUNCTION(HADD, _Z14__gen_ocl_haddjj) DECL_LLVM_GEN_FUNCTION(RHADD, _Z15__gen_ocl_rhaddjj) DECL_LLVM_GEN_FUNCTION(I64HADD, _Z14__gen_ocl_haddmm) DECL_LLVM_GEN_FUNCTION(I64RHADD, _Z15__gen_ocl_rhaddmm) DECL_LLVM_GEN_FUNCTION(UPSAMPLE_SHORT, _Z18__gen_ocl_upsampless) DECL_LLVM_GEN_FUNCTION(UPSAMPLE_INT, _Z18__gen_ocl_upsampleii) DECL_LLVM_GEN_FUNCTION(UPSAMPLE_LONG, _Z18__gen_ocl_upsamplell) DECL_LLVM_GEN_FUNCTION(CBIT, __gen_ocl_cbit) // saturate convert DECL_LLVM_GEN_FUNCTION(SAT_CONV_U8_TO_I8, _Z16convert_char_sath) DECL_LLVM_GEN_FUNCTION(SAT_CONV_I16_TO_I8, _Z16convert_char_sats) DECL_LLVM_GEN_FUNCTION(SAT_CONV_U16_TO_I8, _Z16convert_char_satt) DECL_LLVM_GEN_FUNCTION(SAT_CONV_I32_TO_I8, _Z16convert_char_sati) DECL_LLVM_GEN_FUNCTION(SAT_CONV_U32_TO_I8, _Z16convert_char_satj) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F32_TO_I8, _Z16convert_char_satf) DECL_LLVM_GEN_FUNCTION(SAT_CONV_I8_TO_U8, _Z17convert_uchar_satc) DECL_LLVM_GEN_FUNCTION(SAT_CONV_I16_TO_U8, _Z17convert_uchar_sats) DECL_LLVM_GEN_FUNCTION(SAT_CONV_U16_TO_U8, _Z17convert_uchar_satt) DECL_LLVM_GEN_FUNCTION(SAT_CONV_I32_TO_U8, _Z17convert_uchar_sati) DECL_LLVM_GEN_FUNCTION(SAT_CONV_U32_TO_U8, _Z17convert_uchar_satj) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F32_TO_U8, _Z17convert_uchar_satf) DECL_LLVM_GEN_FUNCTION(SAT_CONV_U16_TO_I16, _Z17convert_short_satt) DECL_LLVM_GEN_FUNCTION(SAT_CONV_I32_TO_I16, _Z17convert_short_sati) DECL_LLVM_GEN_FUNCTION(SAT_CONV_U32_TO_I16, _Z17convert_short_satj) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F32_TO_I16, _Z17convert_short_satf) DECL_LLVM_GEN_FUNCTION(SAT_CONV_I16_TO_U16, _Z18convert_ushort_sats) DECL_LLVM_GEN_FUNCTION(SAT_CONV_I32_TO_U16, _Z18convert_ushort_sati) DECL_LLVM_GEN_FUNCTION(SAT_CONV_U32_TO_U16, _Z18convert_ushort_satj) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F32_TO_U16, _Z18convert_ushort_satf) DECL_LLVM_GEN_FUNCTION(SAT_CONV_U32_TO_I32, _Z15convert_int_satj) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F32_TO_I32, _Z15convert_int_satf) DECL_LLVM_GEN_FUNCTION(SAT_CONV_I32_TO_U32, _Z16convert_uint_sati) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F32_TO_U32, _Z16convert_uint_satf) 
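// The trailing letters in the mangled names encode the parameter types
// (Itanium ABI): c = char, h = uchar, s = short, t = ushort, i = int, j = uint,
// l = long, m = ulong, f = float, and Dh = half. So _Z16convert_char_satf is
// convert_char_sat(float), and _Z12ocl_usub_sattt is ocl_usub_sat(ushort, ushort).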
DECL_LLVM_GEN_FUNCTION(CONV_F16_TO_F32, __gen_ocl_f16to32) DECL_LLVM_GEN_FUNCTION(CONV_F32_TO_F16, __gen_ocl_f32to16) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F16_TO_I8, _Z16convert_char_satDh) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F16_TO_U8, _Z17convert_uchar_satDh) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F16_TO_I16, _Z17convert_short_satDh) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F16_TO_U16, _Z18convert_ushort_satDh) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F16_TO_I32, _Z15convert_int_satDh) DECL_LLVM_GEN_FUNCTION(SAT_CONV_F16_TO_U32, _Z16convert_uint_satDh) // SIMD level function for internal usage DECL_LLVM_GEN_FUNCTION(SIMD_ANY, sub_group_any) DECL_LLVM_GEN_FUNCTION(SIMD_ALL, sub_group_all) DECL_LLVM_GEN_FUNCTION(SIMD_SIZE, get_simd_size) DECL_LLVM_GEN_FUNCTION(SIMD_ID, get_sub_group_local_id) DECL_LLVM_GEN_FUNCTION(GET_THREAD_NUM, get_num_sub_groups) DECL_LLVM_GEN_FUNCTION(GET_THREAD_ID, get_sub_group_id) DECL_LLVM_GEN_FUNCTION(SIMD_SHUFFLE, intel_sub_group_shuffle) DECL_LLVM_GEN_FUNCTION(READ_TM, __gen_ocl_read_tm) DECL_LLVM_GEN_FUNCTION(REGION, __gen_ocl_region) DECL_LLVM_GEN_FUNCTION(IN_PRIVATE, __gen_ocl_in_private) DECL_LLVM_GEN_FUNCTION(VME, __gen_ocl_vme) // printf function DECL_LLVM_GEN_FUNCTION(PRINTF, __gen_ocl_printf_stub) DECL_LLVM_GEN_FUNCTION(PUTS, __gen_ocl_puts_stub) // store timestamp function DECL_LLVM_GEN_FUNCTION(CALC_TIMESTAMP, __gen_ocl_calc_timestamp) // store profiling info to the mem. DECL_LLVM_GEN_FUNCTION(STORE_PROFILING, __gen_ocl_store_profiling) // debug wait function DECL_LLVM_GEN_FUNCTION(DEBUGWAIT, __gen_ocl_debugwait) // work group function DECL_LLVM_GEN_FUNCTION(WORK_GROUP_BROADCAST, __gen_ocl_work_group_broadcast) DECL_LLVM_GEN_FUNCTION(WORK_GROUP_REDUCE_ADD, __gen_ocl_work_group_reduce_add) DECL_LLVM_GEN_FUNCTION(WORK_GROUP_REDUCE_MAX, __gen_ocl_work_group_reduce_max) DECL_LLVM_GEN_FUNCTION(WORK_GROUP_REDUCE_MIN, __gen_ocl_work_group_reduce_min) DECL_LLVM_GEN_FUNCTION(WORK_GROUP_SCAN_EXCLUSIVE_ADD, __gen_ocl_work_group_scan_exclusive_add) DECL_LLVM_GEN_FUNCTION(WORK_GROUP_SCAN_EXCLUSIVE_MAX, __gen_ocl_work_group_scan_exclusive_max) DECL_LLVM_GEN_FUNCTION(WORK_GROUP_SCAN_EXCLUSIVE_MIN, __gen_ocl_work_group_scan_exclusive_min) DECL_LLVM_GEN_FUNCTION(WORK_GROUP_SCAN_INCLUSIVE_ADD, __gen_ocl_work_group_scan_inclusive_add) DECL_LLVM_GEN_FUNCTION(WORK_GROUP_SCAN_INCLUSIVE_MAX, __gen_ocl_work_group_scan_inclusive_max) DECL_LLVM_GEN_FUNCTION(WORK_GROUP_SCAN_INCLUSIVE_MIN, __gen_ocl_work_group_scan_inclusive_min) DECL_LLVM_GEN_FUNCTION(WORK_GROUP_ALL, __gen_ocl_work_group_all) DECL_LLVM_GEN_FUNCTION(WORK_GROUP_ANY, __gen_ocl_work_group_any) // sub group function DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BROADCAST, __gen_ocl_sub_group_broadcast) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_REDUCE_ADD, __gen_ocl_sub_group_reduce_add) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_REDUCE_MAX, __gen_ocl_sub_group_reduce_max) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_REDUCE_MIN, __gen_ocl_sub_group_reduce_min) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_SCAN_EXCLUSIVE_ADD, __gen_ocl_sub_group_scan_exclusive_add) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_SCAN_EXCLUSIVE_MAX, __gen_ocl_sub_group_scan_exclusive_max) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_SCAN_EXCLUSIVE_MIN, __gen_ocl_sub_group_scan_exclusive_min) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_SCAN_INCLUSIVE_ADD, __gen_ocl_sub_group_scan_inclusive_add) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_SCAN_INCLUSIVE_MAX, __gen_ocl_sub_group_scan_inclusive_max) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_SCAN_INCLUSIVE_MIN, __gen_ocl_sub_group_scan_inclusive_min) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_UI_MEM, 
__gen_ocl_sub_group_block_read_ui_mem) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_UI_MEM2, __gen_ocl_sub_group_block_read_ui_mem2) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_UI_MEM4, __gen_ocl_sub_group_block_read_ui_mem4) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_UI_MEM8, __gen_ocl_sub_group_block_read_ui_mem8) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_UI_MEM, __gen_ocl_sub_group_block_write_ui_mem) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_UI_MEM2, __gen_ocl_sub_group_block_write_ui_mem2) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_UI_MEM4, __gen_ocl_sub_group_block_write_ui_mem4) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_UI_MEM8, __gen_ocl_sub_group_block_write_ui_mem8) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_UI_IMAGE, __gen_ocl_sub_group_block_read_ui_image) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_UI_IMAGE2, __gen_ocl_sub_group_block_read_ui_image2) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_UI_IMAGE4, __gen_ocl_sub_group_block_read_ui_image4) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_UI_IMAGE8, __gen_ocl_sub_group_block_read_ui_image8) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_UI_IMAGE, __gen_ocl_sub_group_block_write_ui_image) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_UI_IMAGE2, __gen_ocl_sub_group_block_write_ui_image2) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_UI_IMAGE4, __gen_ocl_sub_group_block_write_ui_image4) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_UI_IMAGE8, __gen_ocl_sub_group_block_write_ui_image8) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_US_MEM, __gen_ocl_sub_group_block_read_us_mem) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_US_MEM2, __gen_ocl_sub_group_block_read_us_mem2) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_US_MEM4, __gen_ocl_sub_group_block_read_us_mem4) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_US_MEM8, __gen_ocl_sub_group_block_read_us_mem8) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_US_MEM, __gen_ocl_sub_group_block_write_us_mem) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_US_MEM2, __gen_ocl_sub_group_block_write_us_mem2) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_US_MEM4, __gen_ocl_sub_group_block_write_us_mem4) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_US_MEM8, __gen_ocl_sub_group_block_write_us_mem8) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_US_IMAGE, __gen_ocl_sub_group_block_read_us_image) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_US_IMAGE2, __gen_ocl_sub_group_block_read_us_image2) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_US_IMAGE4, __gen_ocl_sub_group_block_read_us_image4) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_READ_US_IMAGE8, __gen_ocl_sub_group_block_read_us_image8) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_US_IMAGE, __gen_ocl_sub_group_block_write_us_image) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_US_IMAGE2, __gen_ocl_sub_group_block_write_us_image2) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_US_IMAGE4, __gen_ocl_sub_group_block_write_us_image4) DECL_LLVM_GEN_FUNCTION(SUB_GROUP_BLOCK_WRITE_US_IMAGE8, __gen_ocl_sub_group_block_write_us_image8) // common function DECL_LLVM_GEN_FUNCTION(LRP, __gen_ocl_lrp) // pipe function DECL_LLVM_GEN_FUNCTION(GET_PIPE, __gen_ocl_get_pipe) DECL_LLVM_GEN_FUNCTION(GET_RID, __gen_ocl_get_rid) DECL_LLVM_GEN_FUNCTION(MAKE_RID, __gen_ocl_make_rid) //Enqueue function DECL_LLVM_GEN_FUNCTION(ENQUEUE_SET_NDRANGE_INFO, __gen_ocl_set_ndrange_info) DECL_LLVM_GEN_FUNCTION(ENQUEUE_GET_NDRANGE_INFO, __gen_ocl_get_ndrange_info) DECL_LLVM_GEN_FUNCTION(ENQUEUE_GET_ENQUEUE_INFO_ADDR, __gen_ocl_get_enqueue_info_addr) // sampler helper functions 
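// The two helpers below move between sampler_t and its raw 32-bit handle;
// judging by the names, they exist so passes such as the sampler fix pass can
// manipulate samplers as plain integers (an inference, not documented behavior).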
DECL_LLVM_GEN_FUNCTION(SAMPLER_TO_INT, __gen_ocl_sampler_to_int)
DECL_LLVM_GEN_FUNCTION(INT_TO_SAMPLER, __gen_ocl_int_to_sampler)
Beignet-1.3.2-Source/backend/src/llvm/llvm_to_gen.hpp000664 001750 001750 00000002666 13173554000 021636 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia
 */

/**
 * \file llvm_to_gen.hpp
 * \author Benjamin Segovia
 */
#ifndef __GBE_IR_LLVM_TO_GEN_HPP__
#define __GBE_IR_LLVM_TO_GEN_HPP__

#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 39
#include "llvm/IR/LLVMContext.h"
#endif

namespace gbe {
  namespace ir {
    // The code is output into an IR unit
    class Unit;
  } /* namespace ir */

  /*! Convert the LLVM IR code to a GEN IR code; optLevel 0 corresponds to
      clang -O1, and 1 to clang -O2 */
  bool llvmToGen(ir::Unit &unit, const void* module, int optLevel,
                 bool strictMath, int profiling, std::string &errors);
} /* namespace gbe */

#endif /* __GBE_IR_LLVM_TO_GEN_HPP__ */
Beignet-1.3.2-Source/backend/src/GBEConfig.h.in000664 001750 001750 00000000726 13161142102 020133 0ustar00yryr000000 000000 // the configured options and settings for LIBGBE
#define LIBGBE_VERSION_MAJOR @LIBGBE_VERSION_MAJOR@
#define LIBGBE_VERSION_MINOR @LIBGBE_VERSION_MINOR@
#define GBE_OBJECT_DIR "@GBE_OBJECT_DIR@"
#define INTERP_OBJECT_DIR "@INTERP_OBJECT_DIR@"
#define OCL_BITCODE_BIN "@OCL_BITCODE_BIN@"
#define OCL_HEADER_DIR "@OCL_HEADER_DIR@"
#define OCL_PCH_OBJECT "@OCL_PCH_OBJECT@"
#define OCL_BITCODE_BIN_20 "@OCL_BITCODE_BIN_20@"
#define OCL_PCH_OBJECT_20 "@OCL_PCH_OBJECT_20@"
Beignet-1.3.2-Source/backend/src/libocl/000775 001750 001750 00000000000 13174334761 017113 5ustar00yryr000000 000000 Beignet-1.3.2-Source/backend/src/libocl/src/000775 001750 001750 00000000000 13174334761 017702 5ustar00yryr000000 000000 Beignet-1.3.2-Source/backend/src/libocl/src/ocl_geometric.cl000664 001750 001750 00000010043 13161142102 023013 0ustar00yryr000000 000000 /*
 * Copyright © 2012 - 2014 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

#include "ocl_geometric.h"
#include "ocl_common.h"
#include "ocl_relational.h"
#if (__OPENCL_C_VERSION__ >= 200)
#include "ocl_math_20.h"
#else
#include "ocl_math.h"
#endif
#include "ocl_float.h"

CONST float __gen_ocl_fabs(float x) __asm("llvm.fabs" ".f32");

OVERLOADABLE float dot(float p0, float p1) { return p0 * p1; }
OVERLOADABLE float dot(float2 p0, float2 p1) { return p0.x * p1.x + p0.y * p1.y; }
OVERLOADABLE float dot(float3 p0, float3 p1) { return p0.x * p1.x + p0.y * p1.y + p0.z * p1.z; }
OVERLOADABLE float dot(float4 p0, float4 p1) { return p0.x * p1.x + p0.y * p1.y + p0.z * p1.z + p0.w * p1.w; }
OVERLOADABLE half dot(half p0, half p1) { return p0 * p1; }
OVERLOADABLE half dot(half2 p0, half2 p1) { return p0.x * p1.x + p0.y * p1.y; }
OVERLOADABLE half dot(half3 p0, half3 p1) { return p0.x * p1.x + p0.y * p1.y + p0.z * p1.z; }
OVERLOADABLE half dot(half4 p0, half4 p1) { return p0.x * p1.x + p0.y * p1.y + p0.z * p1.z + p0.w * p1.w; }
OVERLOADABLE float length(float x) { return __gen_ocl_fabs(x); }

/* Scale by the largest component magnitude m before squaring, so that dot(x,x)
   neither overflows nor underflows for extreme inputs; m is multiplied back in
   at the end. */
#define BODY \
  m = m==0.0f ? 1.0f : m; \
  m = isinf(m) ? 1.0f : m; \
  x = x/m; \
  return m * sqrt(dot(x,x));
OVERLOADABLE float length(float2 x) {
  float m = max(__gen_ocl_fabs(x.s0), __gen_ocl_fabs(x.s1));
  BODY;
}
OVERLOADABLE float length(float3 x) {
  float m = max(__gen_ocl_fabs(x.s0), max(__gen_ocl_fabs(x.s1), __gen_ocl_fabs(x.s2)));
  BODY;
}
OVERLOADABLE float length(float4 x) {
  float m = max(__gen_ocl_fabs(x.s0), max(__gen_ocl_fabs(x.s1), max(__gen_ocl_fabs(x.s2), __gen_ocl_fabs(x.s3))));
  BODY;
}
#undef BODY
OVERLOADABLE float distance(float x, float y) { return length(x-y); }
OVERLOADABLE float distance(float2 x, float2 y) { return length(x-y); }
OVERLOADABLE float distance(float3 x, float3 y) { return length(x-y); }
OVERLOADABLE float distance(float4 x, float4 y) { return length(x-y); }
OVERLOADABLE float normalize(float x) {
  float m = length(x);
  m = m == 0.0f ? 1.0f : m;
  return x / m;
}
OVERLOADABLE float2 normalize(float2 x) {
  float m = length(x);
  m = m == 0.0f ? 1.0f : m;
  return x / m;
}
OVERLOADABLE float3 normalize(float3 x) {
  float m = length(x);
  m = m == 0.0f ? 1.0f : m;
  return x / m;
}
OVERLOADABLE float4 normalize(float4 x) {
  float m = length(x);
  m = m == 0.0f ? 1.0f : m;
  return x / m;
}
OVERLOADABLE float fast_length(float x) { return __gen_ocl_fabs(x); }
OVERLOADABLE float fast_length(float2 x) { return sqrt(dot(x,x)); }
OVERLOADABLE float fast_length(float3 x) { return sqrt(dot(x,x)); }
OVERLOADABLE float fast_length(float4 x) { return sqrt(dot(x,x)); }
OVERLOADABLE float fast_distance(float x, float y) { return length(x-y); }
OVERLOADABLE float fast_distance(float2 x, float2 y) { return length(x-y); }
OVERLOADABLE float fast_distance(float3 x, float3 y) { return length(x-y); }
OVERLOADABLE float fast_distance(float4 x, float4 y) { return length(x-y); }
/* fast_normalize on a scalar reduces to the sign of x: */
OVERLOADABLE float fast_normalize(float x) { return x > 0 ? 1.f : (x < 0 ?
-1.f : 0.f); } OVERLOADABLE float2 fast_normalize(float2 x) { return x * rsqrt(dot(x, x)); } OVERLOADABLE float3 fast_normalize(float3 x) { return x * rsqrt(dot(x, x)); } OVERLOADABLE float4 fast_normalize(float4 x) { return x * rsqrt(dot(x, x)); } OVERLOADABLE float3 cross(float3 v0, float3 v1) { return v0.yzx*v1.zxy-v0.zxy*v1.yzx; } OVERLOADABLE float4 cross(float4 v0, float4 v1) { return (float4)(v0.yzx*v1.zxy-v0.zxy*v1.yzx, 0.f); } Beignet-1.3.2-Source/backend/src/libocl/src/ocl_misc.cl000664 001750 001750 00000023257 13173554000 022010 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "ocl_misc.h" #define DEC2(TYPE, XTYPE, MASKTYPE) \ OVERLOADABLE TYPE##2 shuffle(XTYPE x, MASKTYPE##2 mask) { \ TYPE##2 y; \ y.s0 = ((TYPE *) &x)[mask.s0 & (vec_step(x) - 1)]; \ y.s1 = ((TYPE *) &x)[mask.s1 & (vec_step(x) - 1)]; \ return y; \ } #define DEC4(TYPE, XTYPE, MASKTYPE) \ OVERLOADABLE TYPE##4 shuffle(XTYPE x, MASKTYPE##4 mask) { \ TYPE##4 y; \ y.s0 = ((TYPE *) &x)[mask.s0 & (vec_step(x) - 1)]; \ y.s1 = ((TYPE *) &x)[mask.s1 & (vec_step(x) - 1)]; \ y.s2 = ((TYPE *) &x)[mask.s2 & (vec_step(x) - 1)]; \ y.s3 = ((TYPE *) &x)[mask.s3 & (vec_step(x) - 1)]; \ return y; \ } #define DEC8(TYPE, XTYPE, MASKTYPE) \ OVERLOADABLE TYPE##8 shuffle(XTYPE x, MASKTYPE##8 mask) { \ TYPE##8 y; \ y.s0 = ((TYPE *) &x)[mask.s0 & (vec_step(x) - 1)]; \ y.s1 = ((TYPE *) &x)[mask.s1 & (vec_step(x) - 1)]; \ y.s2 = ((TYPE *) &x)[mask.s2 & (vec_step(x) - 1)]; \ y.s3 = ((TYPE *) &x)[mask.s3 & (vec_step(x) - 1)]; \ y.s4 = ((TYPE *) &x)[mask.s4 & (vec_step(x) - 1)]; \ y.s5 = ((TYPE *) &x)[mask.s5 & (vec_step(x) - 1)]; \ y.s6 = ((TYPE *) &x)[mask.s6 & (vec_step(x) - 1)]; \ y.s7 = ((TYPE *) &x)[mask.s7 & (vec_step(x) - 1)]; \ return y; \ } #define DEC16(TYPE, XTYPE, MASKTYPE) \ OVERLOADABLE TYPE##16 shuffle(XTYPE x, MASKTYPE##16 mask) { \ TYPE##16 y; \ y.s0 = ((TYPE *) &x)[mask.s0 & (vec_step(x) - 1)]; \ y.s1 = ((TYPE *) &x)[mask.s1 & (vec_step(x) - 1)]; \ y.s2 = ((TYPE *) &x)[mask.s2 & (vec_step(x) - 1)]; \ y.s3 = ((TYPE *) &x)[mask.s3 & (vec_step(x) - 1)]; \ y.s4 = ((TYPE *) &x)[mask.s4 & (vec_step(x) - 1)]; \ y.s5 = ((TYPE *) &x)[mask.s5 & (vec_step(x) - 1)]; \ y.s6 = ((TYPE *) &x)[mask.s6 & (vec_step(x) - 1)]; \ y.s7 = ((TYPE *) &x)[mask.s7 & (vec_step(x) - 1)]; \ y.s8 = ((TYPE *) &x)[mask.s8 & (vec_step(x) - 1)]; \ y.s9 = ((TYPE *) &x)[mask.s9 & (vec_step(x) - 1)]; \ y.sA = ((TYPE *) &x)[mask.sA & (vec_step(x) - 1)]; \ y.sB = ((TYPE *) &x)[mask.sB & (vec_step(x) - 1)]; \ y.sC = ((TYPE *) &x)[mask.sC & (vec_step(x) - 1)]; \ y.sD = ((TYPE *) &x)[mask.sD & (vec_step(x) - 1)]; \ y.sE = ((TYPE *) &x)[mask.sE & (vec_step(x) - 1)]; \ y.sF = ((TYPE *) &x)[mask.sF & (vec_step(x) - 1)]; \ return y; \ } #define DEFMASK(TYPE, MASKTYPE) \ DEC2(TYPE, TYPE##2, MASKTYPE); DEC2(TYPE, TYPE##4, MASKTYPE); DEC2(TYPE, TYPE##8, MASKTYPE); DEC2(TYPE, TYPE##16, MASKTYPE) \ DEC4(TYPE, TYPE##2, 
MASKTYPE); DEC4(TYPE, TYPE##4, MASKTYPE); DEC4(TYPE, TYPE##8, MASKTYPE); DEC4(TYPE, TYPE##16, MASKTYPE) \ DEC8(TYPE, TYPE##2, MASKTYPE); DEC8(TYPE, TYPE##4, MASKTYPE); DEC8(TYPE, TYPE##8, MASKTYPE); DEC8(TYPE, TYPE##16, MASKTYPE) \ DEC16(TYPE, TYPE##2, MASKTYPE); DEC16(TYPE, TYPE##4, MASKTYPE); DEC16(TYPE, TYPE##8, MASKTYPE); DEC16(TYPE, TYPE##16, MASKTYPE) #define DEF(TYPE) \ DEFMASK(TYPE, uchar) \ DEFMASK(TYPE, ushort) \ DEFMASK(TYPE, uint) \ DEFMASK(TYPE, ulong) DEF(char) DEF(uchar) DEF(short) DEF(ushort) DEF(half) DEF(int) DEF(uint) DEF(float) DEF(long) DEF(ulong) #undef DEF #undef DEFMASK #undef DEC2 #undef DEC4 #undef DEC8 #undef DEC16 #define DEC2(TYPE, ARGTYPE, TEMPTYPE, MASKTYPE) \ OVERLOADABLE TYPE##2 shuffle2(ARGTYPE x, ARGTYPE y, MASKTYPE##2 mask) { \ return shuffle((TEMPTYPE)(x, y), mask); \ } #define DEC2X(TYPE, MASKTYPE) \ OVERLOADABLE TYPE##2 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##2 mask) { \ TYPE##2 z; \ z.s0 = (mask.s0 & 31) < 16 ? ((TYPE *)&x)[mask.s0 & 31] : ((TYPE *)&y)[mask.s0 & 15]; \ z.s1 = (mask.s1 & 31) < 16 ? ((TYPE *)&x)[mask.s1 & 31] : ((TYPE *)&y)[mask.s1 & 15]; \ return z; \ } #define DEC4(TYPE, ARGTYPE, TEMPTYPE, MASKTYPE) \ OVERLOADABLE TYPE##4 shuffle2(ARGTYPE x, ARGTYPE y, MASKTYPE##4 mask) { \ return shuffle((TEMPTYPE)(x, y), mask); \ } #define DEC4X(TYPE, MASKTYPE) \ OVERLOADABLE TYPE##4 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##4 mask) { \ TYPE##4 z; \ z.s0 = (mask.s0 & 31) < 16 ? ((TYPE *)&x)[mask.s0 & 31] : ((TYPE *)&y)[mask.s0 & 15]; \ z.s1 = (mask.s1 & 31) < 16 ? ((TYPE *)&x)[mask.s1 & 31] : ((TYPE *)&y)[mask.s1 & 15]; \ z.s2 = (mask.s2 & 31) < 16 ? ((TYPE *)&x)[mask.s2 & 31] : ((TYPE *)&y)[mask.s2 & 15]; \ z.s3 = (mask.s3 & 31) < 16 ? ((TYPE *)&x)[mask.s3 & 31] : ((TYPE *)&y)[mask.s3 & 15]; \ return z; \ } #define DEC8(TYPE, ARGTYPE, TEMPTYPE, MASKTYPE) \ OVERLOADABLE TYPE##8 shuffle2(ARGTYPE x, ARGTYPE y, MASKTYPE##8 mask) { \ return shuffle((TEMPTYPE)(x, y), mask); \ } #define DEC8X(TYPE, MASKTYPE) \ OVERLOADABLE TYPE##8 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##8 mask) { \ TYPE##8 z; \ z.s0 = (mask.s0 & 31) < 16 ? ((TYPE *)&x)[mask.s0 & 31] : ((TYPE *)&y)[mask.s0 & 15]; \ z.s1 = (mask.s1 & 31) < 16 ? ((TYPE *)&x)[mask.s1 & 31] : ((TYPE *)&y)[mask.s1 & 15]; \ z.s2 = (mask.s2 & 31) < 16 ? ((TYPE *)&x)[mask.s2 & 31] : ((TYPE *)&y)[mask.s2 & 15]; \ z.s3 = (mask.s3 & 31) < 16 ? ((TYPE *)&x)[mask.s3 & 31] : ((TYPE *)&y)[mask.s3 & 15]; \ z.s4 = (mask.s4 & 31) < 16 ? ((TYPE *)&x)[mask.s4 & 31] : ((TYPE *)&y)[mask.s4 & 15]; \ z.s5 = (mask.s5 & 31) < 16 ? ((TYPE *)&x)[mask.s5 & 31] : ((TYPE *)&y)[mask.s5 & 15]; \ z.s6 = (mask.s6 & 31) < 16 ? ((TYPE *)&x)[mask.s6 & 31] : ((TYPE *)&y)[mask.s6 & 15]; \ z.s7 = (mask.s7 & 31) < 16 ? ((TYPE *)&x)[mask.s7 & 31] : ((TYPE *)&y)[mask.s7 & 15]; \ return z; \ } #define DEC16(TYPE, ARGTYPE, TEMPTYPE, MASKTYPE) \ OVERLOADABLE TYPE##16 shuffle2(ARGTYPE x, ARGTYPE y, MASKTYPE##16 mask) { \ return shuffle((TEMPTYPE)(x, y), mask); \ } #define DEC16X(TYPE, MASKTYPE) \ OVERLOADABLE TYPE##16 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##16 mask) { \ TYPE##16 z; \ z.s0 = (mask.s0 & 31) < 16 ? ((TYPE *)&x)[mask.s0 & 31] : ((TYPE *)&y)[mask.s0 & 15]; \ z.s1 = (mask.s1 & 31) < 16 ? ((TYPE *)&x)[mask.s1 & 31] : ((TYPE *)&y)[mask.s1 & 15]; \ z.s2 = (mask.s2 & 31) < 16 ? ((TYPE *)&x)[mask.s2 & 31] : ((TYPE *)&y)[mask.s2 & 15]; \ z.s3 = (mask.s3 & 31) < 16 ? ((TYPE *)&x)[mask.s3 & 31] : ((TYPE *)&y)[mask.s3 & 15]; \ z.s4 = (mask.s4 & 31) < 16 ? 
((TYPE *)&x)[mask.s4 & 31] : ((TYPE *)&y)[mask.s4 & 15]; \ z.s5 = (mask.s5 & 31) < 16 ? ((TYPE *)&x)[mask.s5 & 31] : ((TYPE *)&y)[mask.s5 & 15]; \ z.s6 = (mask.s6 & 31) < 16 ? ((TYPE *)&x)[mask.s6 & 31] : ((TYPE *)&y)[mask.s6 & 15]; \ z.s7 = (mask.s7 & 31) < 16 ? ((TYPE *)&x)[mask.s7 & 31] : ((TYPE *)&y)[mask.s7 & 15]; \ z.s8 = (mask.s8 & 31) < 16 ? ((TYPE *)&x)[mask.s8 & 31] : ((TYPE *)&y)[mask.s8 & 15]; \ z.s9 = (mask.s9 & 31) < 16 ? ((TYPE *)&x)[mask.s9 & 31] : ((TYPE *)&y)[mask.s9 & 15]; \ z.sA = (mask.sA & 31) < 16 ? ((TYPE *)&x)[mask.sA & 31] : ((TYPE *)&y)[mask.sA & 15]; \ z.sB = (mask.sB & 31) < 16 ? ((TYPE *)&x)[mask.sB & 31] : ((TYPE *)&y)[mask.sB & 15]; \ z.sC = (mask.sC & 31) < 16 ? ((TYPE *)&x)[mask.sC & 31] : ((TYPE *)&y)[mask.sC & 15]; \ z.sD = (mask.sD & 31) < 16 ? ((TYPE *)&x)[mask.sD & 31] : ((TYPE *)&y)[mask.sD & 15]; \ z.sE = (mask.sE & 31) < 16 ? ((TYPE *)&x)[mask.sE & 31] : ((TYPE *)&y)[mask.sE & 15]; \ z.sF = (mask.sF & 31) < 16 ? ((TYPE *)&x)[mask.sF & 31] : ((TYPE *)&y)[mask.sF & 15]; \ return z; \ } #define DEFMASK(TYPE, MASKTYPE) \ DEC2(TYPE, TYPE##2, TYPE##4, MASKTYPE) \ DEC2(TYPE, TYPE##4, TYPE##8, MASKTYPE) \ DEC2(TYPE, TYPE##8, TYPE##16, MASKTYPE) \ DEC2X(TYPE, MASKTYPE) \ DEC4(TYPE, TYPE##2, TYPE##4, MASKTYPE) \ DEC4(TYPE, TYPE##4, TYPE##8, MASKTYPE) \ DEC4(TYPE, TYPE##8, TYPE##16, MASKTYPE) \ DEC4X(TYPE, MASKTYPE) \ DEC8(TYPE, TYPE##2, TYPE##4, MASKTYPE) \ DEC8(TYPE, TYPE##4, TYPE##8, MASKTYPE) \ DEC8(TYPE, TYPE##8, TYPE##16, MASKTYPE) \ DEC8X(TYPE, MASKTYPE) \ DEC16(TYPE, TYPE##2, TYPE##4, MASKTYPE) \ DEC16(TYPE, TYPE##4, TYPE##8, MASKTYPE) \ DEC16(TYPE, TYPE##8, TYPE##16, MASKTYPE) \ DEC16X(TYPE, MASKTYPE) #define DEF(TYPE) \ DEFMASK(TYPE, uchar) \ DEFMASK(TYPE, ushort) \ DEFMASK(TYPE, uint) \ DEFMASK(TYPE, ulong) DEF(char) DEF(uchar) DEF(short) DEF(ushort) DEF(half) DEF(int) DEF(uint) DEF(float) DEF(long) DEF(ulong) #undef DEF #undef DEFMASK #undef DEC2 #undef DEC2X #undef DEC4 #undef DEC4X #undef DEC8 #undef DEC8X #undef DEC16 #undef DEC16X uint __gen_ocl_read_tm(void); uint __gen_ocl_region(ushort offset, uint data); struct time_stamp __gen_ocl_get_timestamp(void) { struct time_stamp val; uint tm = __gen_ocl_read_tm(); val.tick = ((ulong)__gen_ocl_region(1, tm) << 32) | __gen_ocl_region(0, tm); val.event = __gen_ocl_region(2, tm); return val; }; bool __gen_ocl_in_local(size_t p) { bool cond1 = p > 0; bool cond2 = p < 64*1024; return cond1 && cond2; } #if (__OPENCL_C_VERSION__ >= 200) local void *__to_local(generic void *p) { bool cond = __gen_ocl_in_local((size_t)p); return cond ? (local void*)p : NULL; } private void *__to_private(generic void *p) { bool cond = __gen_ocl_in_private((size_t)p); return cond ? (private void*)p : NULL; } global void *__to_global(generic void *p) { bool cond1 = __gen_ocl_in_local((size_t)p); bool cond2 = __gen_ocl_in_private((size_t)p); bool cond = cond1 || cond2; return !cond ? (global void*)p : NULL; } #endif Beignet-1.3.2-Source/backend/src/libocl/src/ocl_pipe.cl000664 001750 001750 00000021434 13161142102 021776 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. 
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

#include "ocl_pipe.h"
#include "ocl_atom.h"
#include "ocl_workitem.h"

#define PIPE_SUCCESS 0
#define PIPE_EMPTY -2
#define PIPE_FULL -3
#define PIPE_HEADER_SZ 128
#define PIPE_INDEX_OUTRANGE -4
#define PIPE_RESERVE_FAIL -5
#define RID_MAGIC 0xDE
#define RIDT ushort
#define DEAD_PTR 0xFFFFFFFF

/* The pipe backing store begins with a PIPE_HEADER_SZ-byte header that the
   helpers below share: pheader[0] = max packet count, pheader[1] = packet size
   in bytes, pheader[2] = write index, pheader[3] = read index, and
   pheader[6] = current number of packets; the payload starts at PIPE_HEADER_SZ. */

PURE CONST __global void* __gen_ocl_get_pipe(pipe int p);
PURE CONST ulong __gen_ocl_get_rid(reserve_id_t rid);
PURE CONST reserve_id_t __gen_ocl_make_rid(ulong rid);

int __read_pipe_2(pipe int p, __generic void* dst)
{
  __global int* pheader = (__global int*)__gen_ocl_get_pipe(p);
  int data_size = atomic_sub(pheader + 6, 1);
  if(data_size < 0){
    atomic_add(pheader + 6, 1);
    return PIPE_EMPTY; // check whether an element exists
  }
  __global char* psrc = (__global char*)pheader + PIPE_HEADER_SZ;
  int pack_num = pheader[0];
  int pack_size = pheader[1];
  int pipe_size = pack_num * pack_size;
  int read_ptr = atomic_add(pheader + 3, 1);
  if(read_ptr == pack_num - 1)
    atomic_sub(pheader + 3, pack_num);
  read_ptr = read_ptr % pack_num;
  for(int i = 0; i < pack_size ; i++)
    ((char*)dst)[i] = psrc[i + read_ptr*pack_size];
  return 0;
}

int __read_pipe_4(pipe int p, reserve_id_t id, uint index, void* dst)
{
  __global int* pheader = (__global int*)__gen_ocl_get_pipe(p);
  __global char* psrc = (__global char*)pheader + PIPE_HEADER_SZ;
  ulong uid = __gen_ocl_get_rid(id);
  RIDT* pid = (RIDT*)&uid;
  RIDT start_pt = pid[0];
  RIDT reserve_size = pid[1];
  if(index > reserve_size) return PIPE_INDEX_OUTRANGE;
  int pack_num = pheader[0];
  int pack_size = pheader[1];
  int read_ptr = (start_pt + index) % pack_num;
  int offset = read_ptr * pack_size;
  for(int i = 0; i < pack_size ; i++)
    ((char*)dst)[i] = psrc[i + offset];
  return 0;
}

int __write_pipe_2(pipe int p, __generic void* src)
{
  __global int* pheader = (__global int*)__gen_ocl_get_pipe(p);
  int pack_num = pheader[0];
  int data_size = atomic_add(pheader + 6, 1);
  if(data_size >= pack_num){
    atomic_sub(pheader + 6, 1);
    return PIPE_FULL; // check whether the pipe is already full
  }
  __global char* psrc = (__global char*)pheader + PIPE_HEADER_SZ;
  int pack_size = pheader[1];
  int pipe_size = pack_num * pack_size;
  int write_ptr = atomic_add(pheader + 2, 1);
  if(write_ptr == pack_num - 1)
    atomic_sub(pheader + 2, pack_num);
  write_ptr = write_ptr % pack_num;
  for(int i = 0; i < pack_size ; i++)
    psrc[i + write_ptr * pack_size] = ((char*)src)[i];
  return 0;
}

int __write_pipe_4(pipe int p, reserve_id_t id, uint index, void* src)
{
  __global int* pheader = __gen_ocl_get_pipe(p);
  __global char* psrc = (__global char*)pheader + PIPE_HEADER_SZ;
  ulong uid = __gen_ocl_get_rid(id);
  RIDT* pid = (RIDT*)&uid;
  RIDT start_pt = pid[0];
  RIDT reserve_size = pid[1];
  if(index > reserve_size) return PIPE_INDEX_OUTRANGE;
  int pack_num = pheader[0];
  int pack_size = pheader[1];
  int write_ptr = (start_pt + index) % pack_num;
  int offset = write_ptr * pack_size;
  for(int i = 0; i < pack_size ; i++)
    psrc[i + offset] = ((char*)src)[i];
  return pack_size;
}

reserve_id_t __reserve_read_pipe(pipe int p, uint num)
{
  __global int* pheader = (__global int*)__gen_ocl_get_pipe(p);
  int data_size = atomic_sub(pheader + 6, num);
  if(data_size < num){
    atomic_add(pheader + 6, num);
    return __gen_ocl_make_rid(0l);
  }
  int
pack_num = pheader[0];
  int pack_size = pheader[1];
  int pipe_size = pack_num * pack_size;
  int read_ptr = atomic_add(pheader + 3, num);
  if(read_ptr == pack_num - num)
    atomic_sub(pheader + 3, pack_num);
  ulong uid = 0l;
  RIDT* pid = (RIDT*)&uid;
  pid[0] = read_ptr % pack_num;
  pid[1] = num;
  pid[2] = RID_MAGIC;
  return __gen_ocl_make_rid(uid);
}

void __commit_read_pipe(pipe int p, reserve_id_t rid) {}

reserve_id_t __work_group_reserve_read_pipe(pipe int p, uint num)
{
  uint rid_ptr = DEAD_PTR;
  int ret0 = 0;
  if(get_local_linear_id()==0){
    __global int* pheader = (__global int*)__gen_ocl_get_pipe(p);
    int data_size = atomic_sub(pheader + 6, num);
    if(data_size < num){
      atomic_add(pheader + 6, num);
      ret0 = 1;
    }
    int pack_num = pheader[0];
    int pack_size = pheader[1];
    int pipe_size = pack_num * pack_size;
    int read_ptr = atomic_add(pheader + 3, num);
    if(read_ptr == pack_num - num && !ret0)
      atomic_sub(pheader + 3, pack_num);
    if(!ret0)
      rid_ptr = read_ptr % pack_num;
  }
  ulong uid = 0l;
  RIDT* pid = (RIDT*)&uid;
  rid_ptr = work_group_broadcast(rid_ptr,0,0,0);
  pid[0] = rid_ptr;
  pid[1] = num;
  pid[2] = RID_MAGIC;
  if(rid_ptr == DEAD_PTR) uid = 0l;
  return __gen_ocl_make_rid(uid);
}

void __work_group_commit_read_pipe(pipe int p, reserve_id_t rid) {}

reserve_id_t __sub_group_reserve_read_pipe(pipe int p, uint num)
{
  __global int* pheader = (__global int*)__gen_ocl_get_pipe(p);
  int data_size = atomic_sub(pheader + 6, num);
  if(data_size < num){
    atomic_add(pheader + 6, num);
    return __gen_ocl_make_rid(0l);
  }
  int pack_num = pheader[0];
  int pack_size = pheader[1];
  int pipe_size = pack_num * pack_size;
  int read_ptr = atomic_add(pheader + 3, num);
  if(read_ptr == pack_num - num)
    atomic_sub(pheader + 3, pack_num);
  ulong uid = 0l;
  RIDT* pid = (RIDT*)&uid;
  pid[0] = read_ptr % pack_num;
  pid[1] = num;
  pid[2] = RID_MAGIC;
  return __gen_ocl_make_rid(uid);
}

void __sub_group_commit_read_pipe(pipe int p, reserve_id_t rid) {}

reserve_id_t __reserve_write_pipe(pipe int p, uint num)
{
  __global int* pheader = (__global int*)__gen_ocl_get_pipe(p);
  int pack_num = pheader[0];
  int data_size = atomic_add(pheader + 6, num);
  if(data_size > pack_num - num){
    atomic_sub(pheader + 6, num);
    return __gen_ocl_make_rid(0l);
  }
  int pack_size = pheader[1];
  int pipe_size = pack_num * pack_size;
  int write_ptr = atomic_add(pheader + 2, num);
  if(write_ptr == pack_num - num)
    atomic_sub(pheader + 2, pack_num);
  ulong uid = 0l;
  RIDT* pid = (RIDT*)&uid;
  pid[0] = write_ptr % pack_num;
  pid[1] = num;
  pid[2] = RID_MAGIC;
  return __gen_ocl_make_rid(uid);
}

void __commit_write_pipe(pipe int p, reserve_id_t rid) {}

reserve_id_t __work_group_reserve_write_pipe(pipe int p, uint num)
{
  uint rid_ptr = DEAD_PTR;
  int ret0 = 0;
  if(get_local_linear_id()==0){
    __global int* pheader = (__global int*)__gen_ocl_get_pipe(p);
    int pack_num = pheader[0];
    int data_size = atomic_add(pheader + 6, num);
    if(data_size > pack_num - num){
      atomic_sub(pheader + 6, num);
      ret0 = 1;
    }
    int pack_size = pheader[1];
    int pipe_size = pack_num * pack_size;
    int write_ptr = atomic_add(pheader + 2, num);
    if(write_ptr == pack_num - num && !ret0)
      atomic_sub(pheader + 2, pack_num);
    if(!ret0)
      rid_ptr = write_ptr % pack_num;
  }
  ulong uid = 0l;
  RIDT* pid = (RIDT*)&uid;
  rid_ptr = work_group_broadcast(rid_ptr,0,0,0);
  pid[0] = rid_ptr;
  pid[1] = num;
  pid[2] = RID_MAGIC;
  if(rid_ptr == DEAD_PTR) uid = 0l;
  return __gen_ocl_make_rid(uid);
}

void __work_group_commit_write_pipe(pipe int p, reserve_id_t rid) {}

reserve_id_t __sub_group_reserve_write_pipe(pipe int p, uint num)
{
  __global int* pheader = (__global
int*)__gen_ocl_get_pipe(p);
  int pack_num = pheader[0];
  int data_size = atomic_add(pheader + 6, num);
  if(data_size > pack_num - num){
    atomic_sub(pheader + 6, num);
    return __gen_ocl_make_rid(0l);
  }
  int pack_size = pheader[1];
  int pipe_size = pack_num * pack_size;
  int write_ptr = atomic_add(pheader + 2, num);
  if(write_ptr == pack_num - num)
    atomic_sub(pheader + 2, pack_num);
  ulong uid = 0l;
  RIDT* pid = (RIDT*)&uid;
  pid[0] = write_ptr % pack_num;
  pid[1] = num;
  pid[2] = RID_MAGIC;
  return __gen_ocl_make_rid(uid);
}

void __sub_group_commit_write_pipe(pipe int p, reserve_id_t rid) {}

bool is_valid_reserve_id(reserve_id_t rid)
{
  ulong uid = __gen_ocl_get_rid(rid);
  RIDT* pid = (RIDT*)&uid;
  if(pid[1] == 0) return false;
  if(pid[2] != RID_MAGIC) return false;
  return true;
}

/* Query Functions */
uint __get_pipe_max_packets(pipe int p)
{
  __global int* pheader = __gen_ocl_get_pipe(p);
  return pheader[0];
}

uint __get_pipe_num_packets(pipe int p)
{
  __global int* pheader = __gen_ocl_get_pipe(p);
  return pheader[6];
}
Beignet-1.3.2-Source/backend/src/libocl/src/ocl_sync.cl000664 001750 001750 00000002300 13161142102 022014 0ustar00yryr000000 000000 /*
 * Copyright © 2012 - 2014 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

#include "ocl_sync.h"
#include "ocl_misc.h"

void __gen_ocl_barrier_local(void);
void __gen_ocl_barrier_global(void);
void __gen_ocl_debugwait(void);

OVERLOADABLE void mem_fence(cl_mem_fence_flags flags) {
}
OVERLOADABLE void read_mem_fence(cl_mem_fence_flags flags) {
}
OVERLOADABLE void write_mem_fence(cl_mem_fence_flags flags) {
}

/* Beignet keeps __local allocations in a fixed low address window (see
   __gen_ocl_in_local in ocl_misc.cl), so a simple range test picks the flag. */
cl_mem_fence_flags get_fence(void *ptr)
{
  bool cond = __gen_ocl_in_local((size_t)ptr);
  return cond ? CLK_LOCAL_MEM_FENCE : CLK_GLOBAL_MEM_FENCE;
}
Beignet-1.3.2-Source/backend/src/libocl/src/ocl_vload_20.cl000664 001750 001750 00000024376 13161142102 022457 0ustar00yryr000000 000000 /*
 * Copyright © 2012 - 2014 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#include "ocl_vload_20.h"
#include "ocl_relational.h"

// These loads and stores will use untyped reads and writes, so we can just
// cast to vector loads / stores. This is not C99-compliant (it breaks strict
// aliasing), but we do not care: TBAA is not activated in the compiler.
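/* Usage sketch (illustrative, not part of libocl): vloadN(offset, p) reads the
   N elements starting at p + N*offset as one vector, and vstoreN mirrors it:

     __kernel void copy4(__global const float *in, __global float *out) {
       size_t i = get_global_id(0);
       float4 v = vload4(i, in);   // reads in[4*i] .. in[4*i+3]
       vstore4(v, i, out);         // writes out[4*i] .. out[4*i+3]
     }

   The 3-element forms below are assembled element by element instead,
   presumably because a vector-typed float3 access would touch 16 bytes, not 12. */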
#define DECL_UNTYPED_RW_SPACE_N(TYPE, DIM, SPACE) \
OVERLOADABLE TYPE##DIM vload##DIM(size_t offset, const SPACE TYPE *p) { \
  return *(SPACE TYPE##DIM *) (p + DIM * offset); \
} \
OVERLOADABLE void vstore##DIM(TYPE##DIM v, size_t offset, SPACE TYPE *p) { \
  *(SPACE TYPE##DIM *) (p + DIM * offset) = v; \
}

#define DECL_UNTYPED_RD_SPACE_N(TYPE, DIM, SPACE) \
OVERLOADABLE TYPE##DIM vload##DIM(size_t offset, const SPACE TYPE *p) { \
  return *(SPACE TYPE##DIM *) (p + DIM * offset); \
}

#define DECL_UNTYPED_V3_SPACE(TYPE, SPACE) \
OVERLOADABLE void vstore3(TYPE##3 v, size_t offset, SPACE TYPE *p) {\
  *(p + 3 * offset) = v.s0; \
  *(p + 3 * offset + 1) = v.s1; \
  *(p + 3 * offset + 2) = v.s2; \
} \
OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p) { \
  return (TYPE##3)(*(p + 3 * offset), *(p+ 3 * offset + 1), *(p + 3 * offset + 2));\
}

#define DECL_UNTYPED_RDV3_SPACE(TYPE, SPACE) \
OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p) { \
  return (TYPE##3)(*(p + 3 * offset), *(p+ 3 * offset + 1), *(p + 3 * offset + 2));\
}

#define DECL_UNTYPED_RW_ALL_SPACE(TYPE, SPACE) \
  DECL_UNTYPED_RW_SPACE_N(TYPE, 2, SPACE) \
  DECL_UNTYPED_V3_SPACE(TYPE, SPACE) \
  DECL_UNTYPED_RW_SPACE_N(TYPE, 4, SPACE) \
  DECL_UNTYPED_RW_SPACE_N(TYPE, 8, SPACE) \
  DECL_UNTYPED_RW_SPACE_N(TYPE, 16, SPACE)

#define DECL_UNTYPED_RD_ALL_SPACE(TYPE, SPACE) \
  DECL_UNTYPED_RD_SPACE_N(TYPE, 2, SPACE) \
  DECL_UNTYPED_RDV3_SPACE(TYPE, SPACE) \
  DECL_UNTYPED_RD_SPACE_N(TYPE, 4, SPACE) \
  DECL_UNTYPED_RD_SPACE_N(TYPE, 8, SPACE) \
  DECL_UNTYPED_RD_SPACE_N(TYPE, 16, SPACE)

#define DECL_UNTYPED_RW_ALL(TYPE) \
  DECL_UNTYPED_RD_ALL_SPACE(TYPE, __constant) \
  DECL_UNTYPED_RW_ALL_SPACE(TYPE, __generic)

#define DECL_BYTE_RD_SPACE(TYPE, SPACE) \
OVERLOADABLE TYPE##2 vload2(size_t offset, const SPACE TYPE *p) { \
  return (TYPE##2)(*(p+2*offset), *(p+2*offset+1)); \
} \
OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p) { \
  return (TYPE##3)(*(p+3*offset), *(p+3*offset+1), *(p+3*offset+2)); \
} \
OVERLOADABLE TYPE##4 vload4(size_t offset, const SPACE TYPE *p) { \
  return (TYPE##4)(vload2(2*offset, p), vload2(2*offset, p+2)); \
} \
OVERLOADABLE TYPE##8 vload8(size_t offset, const SPACE TYPE *p) { \
  return (TYPE##8)(vload4(2*offset, p), vload4(2*offset, p+4)); \
} \
OVERLOADABLE TYPE##16 vload16(size_t offset, const SPACE TYPE *p) { \
  return (TYPE##16)(vload8(2*offset, p), vload8(2*offset, p+8)); \
}

#define DECL_BYTE_WR_SPACE(TYPE, SPACE) \
OVERLOADABLE void vstore2(TYPE##2 v, size_t offset, SPACE TYPE *p) {\
  *(p + 2 * offset) = v.s0; \
  *(p + 2 * offset + 1) = v.s1; \
} \
OVERLOADABLE void vstore3(TYPE##3 v, size_t offset, SPACE TYPE *p) {\
  *(p + 3 * offset) = v.s0; \
  *(p + 3 * offset + 1) = v.s1; \
  *(p + 3 * offset + 2) = v.s2; \
} \
OVERLOADABLE void vstore4(TYPE##4 v, size_t offset, SPACE TYPE *p) { \
  vstore2(v.lo, 2*offset, p); \
  vstore2(v.hi, 2*offset, p+2); \
} \
OVERLOADABLE void vstore8(TYPE##8 v, size_t offset, SPACE TYPE *p) { \
  vstore4(v.lo, 2*offset, p); \
  vstore4(v.hi, 2*offset, p+4); \
} \
OVERLOADABLE void vstore16(TYPE##16 v, size_t offset, SPACE TYPE *p) { \
  vstore8(v.lo, 2*offset, p); \
  vstore8(v.hi, 2*offset, p+8); \
}

#define DECL_BYTE_RW_ALL(TYPE) \
  DECL_BYTE_RD_SPACE(TYPE, __generic) \
  DECL_BYTE_RD_SPACE(TYPE, __constant) \
  DECL_BYTE_WR_SPACE(TYPE, __generic)

DECL_BYTE_RW_ALL(char)
DECL_BYTE_RW_ALL(half)
DECL_BYTE_RW_ALL(uchar)
DECL_BYTE_RW_ALL(short)
DECL_BYTE_RW_ALL(ushort)
DECL_UNTYPED_RW_ALL(int)
DECL_UNTYPED_RW_ALL(uint)
DECL_UNTYPED_RW_ALL(long)
DECL_UNTYPED_RW_ALL(ulong)
DECL_UNTYPED_RW_ALL(float)
DECL_UNTYPED_RW_ALL(double)

#undef DECL_UNTYPED_RW_ALL
#undef DECL_UNTYPED_RW_ALL_SPACE
#undef DECL_UNTYPED_RD_ALL_SPACE
#undef DECL_UNTYPED_RW_SPACE_N
#undef DECL_UNTYPED_RD_SPACE_N
#undef DECL_UNTYPED_V3_SPACE
#undef DECL_UNTYPED_RDV3_SPACE
#undef DECL_BYTE_RD_SPACE
#undef DECL_BYTE_WR_SPACE
#undef DECL_BYTE_RW_ALL

PURE CONST float __gen_ocl_f16to32(short h);
PURE CONST short __gen_ocl_f32to16(float f);

/* The hardware float-to-half convert rounds to nearest-even; the directed
   rounding variants below emulate _rtp/_rtn/_rtz by converting back to float
   and nudging the half bit pattern by one ulp in the needed direction (half is
   sign-magnitude, so stepping the magnitude bits moves away from zero). */
OVERLOADABLE short f32to16_rtp(float f)
{
  short s = __gen_ocl_f32to16(f);
  float con = __gen_ocl_f16to32(s);
  //if(isinf(con)) return s;
  if (f > con)
    return s - signbit(f) * 2 + 1;
  else
    return s;
}

OVERLOADABLE short f32to16_rtn(float f)
{
  short s = __gen_ocl_f32to16(f);
  float con = __gen_ocl_f16to32(s);
  //if(isinf(con)) return s;
  if (con > f)
    return s + signbit(f) * 2 - 1;
  else
    return s;
}

OVERLOADABLE short f32to16_rtz(float f)
{
  short s = __gen_ocl_f32to16(f);
  float con = __gen_ocl_f16to32(s);
  //if(isinf(con)) return s;
  if (((con > f) && !signbit(f)) ||
      ((con < f) && signbit(f)))
    return s - 1;
  else
    return s;
}

#define DECL_HALF_LD_SPACE(SPACE) \
OVERLOADABLE float vload_half(size_t offset, const SPACE half *p) { \
  return __gen_ocl_f16to32(*(SPACE short *)(p + offset)); \
} \
OVERLOADABLE float vloada_half(size_t offset, const SPACE half *p) { \
  return vload_half(offset, p); \
} \
OVERLOADABLE float2 vload_half2(size_t offset, const SPACE half *p) { \
  return (float2)(vload_half(offset*2, p), \
                  vload_half(offset*2 + 1, p)); \
} \
OVERLOADABLE float2 vloada_half2(size_t offset, const SPACE half *p) { \
  return (float2)(vloada_half(offset*2, p), \
                  vloada_half(offset*2 + 1, p)); \
} \
OVERLOADABLE float3 vload_half3(size_t offset, const SPACE half *p) { \
  return (float3)(vload_half(offset*3, p), \
                  vload_half(offset*3 + 1, p), \
                  vload_half(offset*3 + 2, p)); \
} \
OVERLOADABLE float3 vloada_half3(size_t offset, const SPACE half *p) { \
  return (float3)(vload_half(offset*4, p), \
                  vload_half(offset*4 + 1, p), \
                  vload_half(offset*4 + 2, p)); \
} \
OVERLOADABLE float4 vload_half4(size_t offset, const SPACE half *p) { \
  return (float4)(vload_half2(offset*2, p), \
                  vload_half2(offset*2 + 1, p)); \
} \
OVERLOADABLE float4 vloada_half4(size_t offset, const SPACE half *p) { \
  return (float4)(vloada_half2(offset*2, p), \
                  vloada_half2(offset*2 + 1, p)); \
} \
OVERLOADABLE float8 vload_half8(size_t offset, const SPACE half *p) { \
  return (float8)(vload_half4(offset*2, p), \
                  vload_half4(offset*2 + 1, p)); \
} \
OVERLOADABLE float8 vloada_half8(size_t offset, const SPACE half *p) { \
  return (float8)(vloada_half4(offset*2, p), \
                  vloada_half4(offset*2 + 1, p)); \
} \
OVERLOADABLE float16 vload_half16(size_t offset, const SPACE half *p) { \
  return (float16)(vload_half8(offset*2, p), \
                   vload_half8(offset*2 + 1, p)); \
}\
OVERLOADABLE float16 vloada_half16(size_t offset, const SPACE half *p) { \
  return (float16)(vloada_half8(offset*2, p), \
                   vloada_half8(offset*2 + 1, p)); \
}\

#define DECL_HALF_ST_SPACE_ROUND(SPACE, ROUND, FUNC) \
OVERLOADABLE void vstore_half##ROUND(float data, size_t offset, SPACE half *p) { \
  *(SPACE short *)(p + offset) = FUNC(data); \
} \
OVERLOADABLE void vstorea_half##ROUND(float data, size_t offset, SPACE half *p) { \
  vstore_half##ROUND(data, offset, p); \
} \
OVERLOADABLE void vstore_half2##ROUND(float2 data, size_t offset, SPACE half *p) { \
  vstore_half##ROUND(data.lo, offset*2, p); \
  vstore_half##ROUND(data.hi, offset*2 + 1, p); \
} \
OVERLOADABLE void vstorea_half2##ROUND(float2 data, size_t offset, SPACE half *p) { \
  vstore_half2##ROUND(data, offset, p); \
} \
OVERLOADABLE void vstore_half3##ROUND(float3 data, size_t offset, SPACE half *p) { \
  vstore_half##ROUND(data.s0, offset*3, p); \
  vstore_half##ROUND(data.s1, offset*3 + 1, p); \
  vstore_half##ROUND(data.s2, offset*3 + 2, p); \
} \
OVERLOADABLE void vstorea_half3##ROUND(float3 data, size_t offset, SPACE half *p) { \
  vstore_half##ROUND(data.s0, offset*4, p); \
  vstore_half##ROUND(data.s1, offset*4 + 1, p); \
  vstore_half##ROUND(data.s2, offset*4 + 2, p); \
} \
OVERLOADABLE void vstore_half4##ROUND(float4 data, size_t offset, SPACE half *p) { \
  vstore_half2##ROUND(data.lo, offset*2, p); \
  vstore_half2##ROUND(data.hi, offset*2 + 1, p); \
} \
OVERLOADABLE void vstorea_half4##ROUND(float4 data, size_t offset, SPACE half *p) { \
  vstore_half4##ROUND(data, offset, p); \
} \
OVERLOADABLE void vstore_half8##ROUND(float8 data, size_t offset, SPACE half *p) { \
  vstore_half4##ROUND(data.lo, offset*2, p); \
  vstore_half4##ROUND(data.hi, offset*2 + 1, p); \
} \
OVERLOADABLE void vstorea_half8##ROUND(float8 data, size_t offset, SPACE half *p) { \
  vstore_half8##ROUND(data, offset, p); \
} \
OVERLOADABLE void vstore_half16##ROUND(float16 data, size_t offset, SPACE half *p) { \
  vstore_half8##ROUND(data.lo, offset*2, p); \
  vstore_half8##ROUND(data.hi, offset*2 + 1, p); \
} \
OVERLOADABLE void vstorea_half16##ROUND(float16 data, size_t offset, SPACE half *p) { \
  vstore_half16##ROUND(data, offset, p); \
}

#define DECL_HALF_ST_SPACE(SPACE) \
  DECL_HALF_ST_SPACE_ROUND(SPACE, , __gen_ocl_f32to16) \
  DECL_HALF_ST_SPACE_ROUND(SPACE, _rte, __gen_ocl_f32to16) \
  DECL_HALF_ST_SPACE_ROUND(SPACE, _rtz, f32to16_rtz) \
  DECL_HALF_ST_SPACE_ROUND(SPACE, _rtp, f32to16_rtp) \
  DECL_HALF_ST_SPACE_ROUND(SPACE, _rtn, f32to16_rtn) \

DECL_HALF_LD_SPACE(__constant)
DECL_HALF_LD_SPACE(__generic)
DECL_HALF_ST_SPACE(__generic)

//#undef DECL_UNTYPED_RW_ALL_SPACE
#undef DECL_HALF_LD_SPACE
#undef DECL_HALF_ST_SPACE
#undef DECL_HALF_ST_SPACE_ROUND
Beignet-1.3.2-Source/backend/src/libocl/src/ocl_barrier.ll000664 001750 001750 00000001757 13161142102 022506 0ustar00yryr000000 000000 ;XXX FIXME: since LLVM assembly cannot use macros, we hardcode 3, 1, 2 here;
;we may need a more graceful way to handle these values later.
;#define CLK_LOCAL_MEM_FENCE (1 << 0)
;#define CLK_GLOBAL_MEM_FENCE (1 << 1)

target datalayout = "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"
target triple = "spir"

declare i32 @_get_local_mem_fence() nounwind alwaysinline
declare i32 @_get_global_mem_fence() nounwind alwaysinline
declare void @__gen_ocl_barrier_local() nounwind alwaysinline noduplicate
declare void @__gen_ocl_barrier_global() nounwind alwaysinline noduplicate
declare void @__gen_ocl_debugwait() nounwind alwaysinline noduplicate
declare void @__gen_ocl_barrier(i32) nounwind alwaysinline noduplicate

define void @_Z7barrierj(i32 %flags) nounwind noduplicate alwaysinline {
  call void @__gen_ocl_barrier(i32 %flags)
  ret void
}

define void @_Z9debugwaitv() nounwind noduplicate alwaysinline {
  call void @__gen_ocl_debugwait()
  ret void
}
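; Illustrative lowering (not part of the original file): an OpenCL call such as
;   barrier(CLK_LOCAL_MEM_FENCE);
; reaches this module as @_Z7barrierj(i32 1) -- the mangled form of barrier(uint) --
; and is forwarded to @__gen_ocl_barrier(i32 1), which the Gen backend then turns
; into its hardware barrier with the requested fence.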
Beignet-1.3.2-Source/backend/src/libocl/src/ocl_async.cl000664 001750 001750 00000005327 13161142102 022161 0ustar00yryr000000 000000 /*
 * Copyright © 2012 - 2014 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#include "ocl_async.h"
#include "ocl_sync.h"
#include "ocl_workitem.h"

#define BODY(SRC_STRIDE, DST_STRIDE) \
  uint size = get_local_size(2) * get_local_size(1) * get_local_size(0); \
  uint count = num / size; \
  uint offset = get_local_id(2) * get_local_size(1) + get_local_id(1); \
  offset = offset * get_local_size(0) + get_local_id(0); \
  for(uint i=0; i. * */
#include "ocl_types.h"
#include "ocl_enqueue.h"
#include "ocl_workitem.h"
#include "ocl_atom.h"

queue_t get_default_queue(void)
{
  queue_t queue;
  return queue; //return NULL queue
}

ndrange_t __gen_ocl_set_ndrange_info(__private struct ndrange_info_t *info);
__private struct ndrange_info_t* __gen_ocl_get_ndrange_info(ndrange_t info);
__global int* __gen_ocl_get_enqueue_info_addr(void);

int __enqueue_kernel_basic(queue_t q, int flag, ndrange_t ndrange, BLOCK_TYPE block)
{
  int i;
  __private struct Block_literal *literal = (__private struct Block_literal *)block;
  __private uchar *data = (__private uchar *)block;
  int size = literal->descriptor->size;
  literal->descriptor->reserved = 0;
  __global int* start_addr = __gen_ocl_get_enqueue_info_addr();
  int offset = atomic_add(start_addr, size + sizeof(struct ndrange_info_t));
  __global uchar* addr = (__global uchar*)start_addr + offset + sizeof(int);
#if __clang_major__*10 + __clang_minor__ >= 50
  __private struct ndrange_info_t *info = to_private(&ndrange);
#else
  __private struct ndrange_info_t *info = __gen_ocl_get_ndrange_info(ndrange);
#endif
  *((__global struct ndrange_info_t *)addr) = *info;
  addr += sizeof(*info);
  for(i=0; i< size; i++) {
    addr[i] = data[i];
  }
  return 0;
}

int __enqueue_kernel_basic_events(queue_t q, int flag, ndrange_t ndrange,
                                  uint num_events_in_wait_list,
                                  const EVENT_TYPE event_wait_list,
                                  EVENT_TYPE event_ret, BLOCK_TYPE block)
{
  return __enqueue_kernel_basic(q, flag, ndrange, block);
}

int __gen_enqueue_kernel_slm(queue_t q, int flag, ndrange_t ndrange, BLOCK_TYPE block,
                             int count, __private int* slm_sizes)
{
  int i;
  __private struct Block_literal* literal = (__private struct Block_literal *)block;
  __private uchar* data = (__private uchar *)block;
  int size = literal->descriptor->size;
  int slm_size = count * sizeof(int);
  literal->descriptor->reserved = slm_size;
  __global int* start_addr = __gen_ocl_get_enqueue_info_addr();
  int offset = atomic_add(start_addr, size + sizeof(struct ndrange_info_t) + slm_size);
  __global uchar* addr = (__global uchar*)start_addr + offset + sizeof(int);
#if __clang_major__*10 + __clang_minor__ >= 50
  __private struct ndrange_info_t *info = to_private(&ndrange);
#else
  __private struct ndrange_info_t *info = __gen_ocl_get_ndrange_info(ndrange);
#endif
  *((__global struct ndrange_info_t *)addr) = *info;
  addr += sizeof(*info);
  for(i=0; i < size; i++) {
    addr[i] = data[i];
  }
  addr += size;
  for(i=0; i < count; i++) {
    ((__global int *)addr)[i] = slm_sizes[i];
  }
  return 0;
}

clk_event_t create_user_event(void)
{
  clk_event_t e;
  return e;
}

void retain_event(clk_event_t event) { return; }
void release_event(clk_event_t event) { return; }
void set_user_event_status(clk_event_t event, int status) { return; }

bool is_valid_event(clk_event_t event)
{
  return 1;
}
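/* Sketch of the record format used above (inferred from the code, not from
   documentation): start_addr[0] holds a running byte count, reserved atomically;
   each record then lives at (uchar*)start_addr + old_count + sizeof(int) and is
   laid out as

     struct ndrange_info_t | block data (descriptor->size bytes) | int slm_sizes[count]

   where the slm array is present only for the __gen_enqueue_kernel_slm variant,
   whose total size is recorded in descriptor->reserved; the runtime presumably
   walks the records using those two descriptor fields. */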
} /* The device-side enqueue helpers below are stubs: the work-group size queries return fixed values and the event profiling counters are faked. */ uint __get_kernel_work_group_size_impl(BLOCK_TYPE block) { return 256; } uint __get_kernel_preferred_work_group_multiple_impl(BLOCK_TYPE block) { return 16; } void capture_event_profiling_info(clk_event_t event, int name, global void *value) { //fake profiling data ((__global ulong *)value)[0] = 0x3000; ((__global ulong *)value)[1] = 0x6000; } #if __clang_major__*10 + __clang_minor__ >= 50 #define RET_INFO return info; #else #define RET_INFO return __gen_ocl_set_ndrange_info(&info); #endif /* info.type encodes the dimensionality in its high nibble (0/1/2 for 1D/2D/3D) and, in its low nibble, which fields are valid: 1 = global size only, 2 = global + local size, 3 = global + local size + global offset. */ OVERLOADABLE ndrange_t ndrange_1D(size_t global_work_size) { struct ndrange_info_t info; info.type = 0x1; info.global_work_size[0] = global_work_size; RET_INFO; } OVERLOADABLE ndrange_t ndrange_1D(size_t global_work_size, size_t local_work_size) { struct ndrange_info_t info; info.type = 0x2; info.global_work_size[0] = global_work_size; info.local_work_size[0] = local_work_size; RET_INFO; } OVERLOADABLE ndrange_t ndrange_1D(size_t global_work_offset, size_t global_work_size, size_t local_work_size) { struct ndrange_info_t info; info.type = 0x3; info.global_work_size[0] = global_work_size; info.local_work_size[0] = local_work_size; info.global_work_offset[0] = global_work_offset; RET_INFO; } OVERLOADABLE ndrange_t ndrange_2D(const size_t global_work_size[2]) { struct ndrange_info_t info; info.type = 0x11; info.global_work_size[0] = global_work_size[0]; info.global_work_size[1] = global_work_size[1]; RET_INFO; } OVERLOADABLE ndrange_t ndrange_2D(const size_t global_work_size[2], const size_t local_work_size[2]) { struct ndrange_info_t info; info.type = 0x12; info.global_work_size[0] = global_work_size[0]; info.global_work_size[1] = global_work_size[1]; info.local_work_size[0] = local_work_size[0]; info.local_work_size[1] = local_work_size[1]; RET_INFO; } OVERLOADABLE ndrange_t ndrange_2D(const size_t global_work_offset[2], const size_t global_work_size[2], const size_t local_work_size[2]) { struct ndrange_info_t info; info.type = 0x13; info.global_work_size[0] = global_work_size[0]; info.global_work_size[1] = global_work_size[1]; info.local_work_size[0] = local_work_size[0]; info.local_work_size[1] = local_work_size[1]; info.global_work_offset[0] = global_work_offset[0]; info.global_work_offset[1] = global_work_offset[1]; RET_INFO; } OVERLOADABLE ndrange_t ndrange_3D(const size_t global_work_size[3]) { struct ndrange_info_t info; info.type = 0x21; info.global_work_size[0] = global_work_size[0]; info.global_work_size[1] = global_work_size[1]; info.global_work_size[2] = global_work_size[2]; RET_INFO; } OVERLOADABLE ndrange_t ndrange_3D(const size_t global_work_size[3], const size_t local_work_size[3]) { struct ndrange_info_t info; info.type = 0x22; info.global_work_size[0] = global_work_size[0]; info.global_work_size[1] = global_work_size[1]; info.global_work_size[2] = global_work_size[2]; info.local_work_size[0] = local_work_size[0]; info.local_work_size[1] = local_work_size[1]; info.local_work_size[2] = local_work_size[2]; RET_INFO; } OVERLOADABLE ndrange_t ndrange_3D(const size_t global_work_offset[3], const size_t global_work_size[3], const size_t local_work_size[3]) { struct ndrange_info_t info; info.type = 0x23; info.global_work_size[0] = global_work_size[0]; info.global_work_size[1] = global_work_size[1]; info.global_work_size[2] = global_work_size[2]; info.local_work_size[0] = local_work_size[0]; info.local_work_size[1] = local_work_size[1]; info.local_work_size[2] = local_work_size[2]; info.global_work_offset[0] = global_work_offset[0]; info.global_work_offset[1] = global_work_offset[1];
info.global_work_offset[2] = global_work_offset[2]; RET_INFO; } int enqueue_marker (queue_t queue, uint num_events_in_wait_list, const clk_event_t *event_wait_list, clk_event_t *event_ret) { return 0; } Beignet-1.3.2-Source/backend/src/libocl/src/ocl_memcpy.cl000664 001750 001750 00000004505 13174270571 022353 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "ocl_memcpy.h" typedef int __attribute__((may_alias)) AI; #define DECL_TWO_SPACE_MEMCOPY_FN(NAME, DST_SPACE, SRC_SPACE) \ void __gen_memcpy_ ##NAME## _align (DST_SPACE uchar* dst, SRC_SPACE uchar* src, size_t size) { \ size_t index = 0; \ while((index + 4) <= size) { \ *((DST_SPACE AI *)(dst + index)) = *((SRC_SPACE AI *)(src + index)); \ index += 4; \ } \ while(index < size) { \ dst[index] = src[index]; \ index++; \ } \ } \ void __gen_memcpy_ ##NAME (DST_SPACE uchar* dst, SRC_SPACE uchar* src, size_t size) { \ size_t index = 0; \ while(index < size) { \ dst[index] = src[index]; \ index++; \ } \ } #if (__OPENCL_C_VERSION__ >= 200) #define DECL_ONE_SPACE_MEMCOPY_FN(NAME, DST_SPACE) \ DECL_TWO_SPACE_MEMCOPY_FN( NAME## g, DST_SPACE, __global) \ DECL_TWO_SPACE_MEMCOPY_FN( NAME## l, DST_SPACE, __local) \ DECL_TWO_SPACE_MEMCOPY_FN( NAME## p, DST_SPACE, __private) \ DECL_TWO_SPACE_MEMCOPY_FN( NAME## n, DST_SPACE, __generic) \ DECL_TWO_SPACE_MEMCOPY_FN( NAME## c, DST_SPACE, __constant) DECL_ONE_SPACE_MEMCOPY_FN(g, __global) DECL_ONE_SPACE_MEMCOPY_FN(l, __local) DECL_ONE_SPACE_MEMCOPY_FN(p, __private) DECL_ONE_SPACE_MEMCOPY_FN(n, __generic) #else #define DECL_ONE_SPACE_MEMCOPY_FN(NAME, DST_SPACE) \ DECL_TWO_SPACE_MEMCOPY_FN( NAME## g, DST_SPACE, __global) \ DECL_TWO_SPACE_MEMCOPY_FN( NAME## l, DST_SPACE, __local) \ DECL_TWO_SPACE_MEMCOPY_FN( NAME## p, DST_SPACE, __private) \ DECL_TWO_SPACE_MEMCOPY_FN( NAME## c, DST_SPACE, __constant) DECL_ONE_SPACE_MEMCOPY_FN(g, __global) DECL_ONE_SPACE_MEMCOPY_FN(l, __local) DECL_ONE_SPACE_MEMCOPY_FN(p, __private) #endif Beignet-1.3.2-Source/backend/src/libocl/src/ocl_vload.cl000664 001750 001750 00000025142 13161142102 022146 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #pragma OPENCL EXTENSION cl_khr_fp64 : enable #include "ocl_vload.h" #include "ocl_relational.h" // These loads and stores will use untyped reads and writes, so we can just // cast to vector loads / stores. Not C99 compliant BTW due to aliasing issue. // Well we do not care, we do not activate TBAA in the compiler #define DECL_UNTYPED_RW_SPACE_N(TYPE, DIM, SPACE) \ OVERLOADABLE TYPE##DIM vload##DIM(size_t offset, const SPACE TYPE *p) { \ return *(SPACE TYPE##DIM *) (p + DIM * offset); \ } \ OVERLOADABLE void vstore##DIM(TYPE##DIM v, size_t offset, SPACE TYPE *p) { \ *(SPACE TYPE##DIM *) (p + DIM * offset) = v; \ } #define DECL_UNTYPED_RD_SPACE_N(TYPE, DIM, SPACE) \ OVERLOADABLE TYPE##DIM vload##DIM(size_t offset, const SPACE TYPE *p) { \ return *(SPACE TYPE##DIM *) (p + DIM * offset); \ } #define DECL_UNTYPED_V3_SPACE(TYPE, SPACE) \ OVERLOADABLE void vstore3(TYPE##3 v, size_t offset, SPACE TYPE *p) {\ *(p + 3 * offset) = v.s0; \ *(p + 3 * offset + 1) = v.s1; \ *(p + 3 * offset + 2) = v.s2; \ } \ OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p) { \ return (TYPE##3)(*(p + 3 * offset), *(p+ 3 * offset + 1), *(p + 3 * offset + 2));\ } #define DECL_UNTYPED_RDV3_SPACE(TYPE, SPACE) \ OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p) { \ return (TYPE##3)(*(p + 3 * offset), *(p+ 3 * offset + 1), *(p + 3 * offset + 2));\ } #define DECL_UNTYPED_RW_ALL_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 2, SPACE) \ DECL_UNTYPED_V3_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 4, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 8, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 16, SPACE) #define DECL_UNTYPED_RD_ALL_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 2, SPACE) \ DECL_UNTYPED_RDV3_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 4, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 8, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 16, SPACE) #define DECL_UNTYPED_RW_ALL(TYPE) \ DECL_UNTYPED_RW_ALL_SPACE(TYPE, __global) \ DECL_UNTYPED_RW_ALL_SPACE(TYPE, __local) \ DECL_UNTYPED_RD_ALL_SPACE(TYPE, __constant) \ DECL_UNTYPED_RW_ALL_SPACE(TYPE, __private) #define DECL_BYTE_RD_SPACE(TYPE, SPACE) \ OVERLOADABLE TYPE##2 vload2(size_t offset, const SPACE TYPE *p) { \ return (TYPE##2)(*(p+2*offset), *(p+2*offset+1)); \ } \ OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p) { \ return (TYPE##3)(*(p+3*offset), *(p+3*offset+1), *(p+3*offset+2)); \ } \ OVERLOADABLE TYPE##4 vload4(size_t offset, const SPACE TYPE *p) { \ return (TYPE##4)(vload2(2*offset, p), vload2(2*offset, p+2)); \ } \ OVERLOADABLE TYPE##8 vload8(size_t offset, const SPACE TYPE *p) { \ return (TYPE##8)(vload4(2*offset, p), vload4(2*offset, p+4)); \ } \ OVERLOADABLE TYPE##16 vload16(size_t offset, const SPACE TYPE *p) { \ return (TYPE##16)(vload8(2*offset, p), vload8(2*offset, p+8)); \ } #define DECL_BYTE_WR_SPACE(TYPE, SPACE) \ OVERLOADABLE void vstore2(TYPE##2 v, size_t offset, SPACE TYPE *p) {\ *(p + 2 * offset) = v.s0; \ *(p + 2 * offset + 1) = v.s1; \ } \ OVERLOADABLE void vstore3(TYPE##3 v, size_t offset, SPACE TYPE *p) {\ *(p + 3 * offset) = v.s0; \ *(p + 3 * offset + 1) = v.s1; \ *(p + 3 * offset + 2) = v.s2; \ } \ OVERLOADABLE void vstore4(TYPE##4 v, size_t offset, SPACE TYPE *p) { \ vstore2(v.lo, 2*offset, p); \ vstore2(v.hi, 2*offset, p+2); \ } \ OVERLOADABLE void vstore8(TYPE##8 v, size_t offset, SPACE TYPE *p) { \ vstore4(v.lo, 2*offset, p); \ vstore4(v.hi, 2*offset, p+4); \ } \ OVERLOADABLE void vstore16(TYPE##16 v, size_t offset, SPACE TYPE *p) { \ vstore8(v.lo, 2*offset, p); \ vstore8(v.hi, 2*offset, p+8); 
\ } #define DECL_BYTE_RW_ALL(TYPE) \ DECL_BYTE_RD_SPACE(TYPE, __global) \ DECL_BYTE_RD_SPACE(TYPE, __local) \ DECL_BYTE_RD_SPACE(TYPE, __private) \ DECL_BYTE_RD_SPACE(TYPE, __constant) \ DECL_BYTE_WR_SPACE(TYPE, __global) \ DECL_BYTE_WR_SPACE(TYPE, __local) \ DECL_BYTE_WR_SPACE(TYPE, __private) DECL_BYTE_RW_ALL(char) DECL_BYTE_RW_ALL(half) DECL_BYTE_RW_ALL(uchar) DECL_BYTE_RW_ALL(short) DECL_BYTE_RW_ALL(ushort) DECL_UNTYPED_RW_ALL(int) DECL_UNTYPED_RW_ALL(uint) DECL_UNTYPED_RW_ALL(long) DECL_UNTYPED_RW_ALL(ulong) DECL_UNTYPED_RW_ALL(float) DECL_UNTYPED_RW_ALL(double) #undef DECL_UNTYPED_RW_ALL #undef DECL_UNTYPED_RW_ALL_SPACE #undef DECL_UNTYPED_RD_ALL_SPACE #undef DECL_UNTYPED_RW_SPACE_N #undef DECL_UNTYPED_RD_SPACE_N #undef DECL_UNTYPED_V3_SPACE #undef DECL_UNTYPED_RDV3_SPACE #undef DECL_BYTE_RD_SPACE #undef DECL_BYTE_WR_SPACE #undef DECL_BYTE_RW_ALL PURE CONST float __gen_ocl_f16to32(short h); PURE CONST short __gen_ocl_f32to16(float f); OVERLOADABLE short f32to16_rtp(float f) { short s = __gen_ocl_f32to16(f); float con = __gen_ocl_f16to32(s); //if(isinf(con)) return s; if (f > con) return s - signbit(f) * 2 + 1; else return s; } OVERLOADABLE short f32to16_rtn(float f) { short s = __gen_ocl_f32to16(f); float con = __gen_ocl_f16to32(s); //if(isinf(con)) return s; if (con > f) return s + signbit(f) * 2 - 1; else return s; } OVERLOADABLE short f32to16_rtz(float f) { short s = __gen_ocl_f32to16(f); float con = __gen_ocl_f16to32(s); //if(isinf(con)) return s; if (((con > f) && !signbit(f)) || ((con < f) && signbit(f))) return s - 1; else return s; } #define DECL_HALF_LD_SPACE(SPACE) \ OVERLOADABLE float vload_half(size_t offset, const SPACE half *p) { \ return __gen_ocl_f16to32(*(SPACE short *)(p + offset)); \ } \ OVERLOADABLE float vloada_half(size_t offset, const SPACE half *p) { \ return vload_half(offset, p); \ } \ OVERLOADABLE float2 vload_half2(size_t offset, const SPACE half *p) { \ return (float2)(vload_half(offset*2, p), \ vload_half(offset*2 + 1, p)); \ } \ OVERLOADABLE float2 vloada_half2(size_t offset, const SPACE half *p) { \ return (float2)(vloada_half(offset*2, p), \ vloada_half(offset*2 + 1, p)); \ } \ OVERLOADABLE float3 vload_half3(size_t offset, const SPACE half *p) { \ return (float3)(vload_half(offset*3, p), \ vload_half(offset*3 + 1, p), \ vload_half(offset*3 + 2, p)); \ } \ OVERLOADABLE float3 vloada_half3(size_t offset, const SPACE half *p) { \ return (float3)(vload_half(offset*4, p), \ vload_half(offset*4 + 1, p), \ vload_half(offset*4 + 2, p)); \ } \ OVERLOADABLE float4 vload_half4(size_t offset, const SPACE half *p) { \ return (float4)(vload_half2(offset*2, p), \ vload_half2(offset*2 + 1, p)); \ } \ OVERLOADABLE float4 vloada_half4(size_t offset, const SPACE half *p) { \ return (float4)(vloada_half2(offset*2, p), \ vloada_half2(offset*2 + 1, p)); \ } \ OVERLOADABLE float8 vload_half8(size_t offset, const SPACE half *p) { \ return (float8)(vload_half4(offset*2, p), \ vload_half4(offset*2 + 1, p)); \ } \ OVERLOADABLE float8 vloada_half8(size_t offset, const SPACE half *p) { \ return (float8)(vloada_half4(offset*2, p), \ vloada_half4(offset*2 + 1, p)); \ } \ OVERLOADABLE float16 vload_half16(size_t offset, const SPACE half *p) { \ return (float16)(vload_half8(offset*2, p), \ vload_half8(offset*2 + 1, p)); \ }\ OVERLOADABLE float16 vloada_half16(size_t offset, const SPACE half *p) { \ return (float16)(vloada_half8(offset*2, p), \ vloada_half8(offset*2 + 1, p)); \ }\ #define DECL_HALF_ST_SPACE_ROUND(SPACE, ROUND, FUNC) \ OVERLOADABLE void vstore_half##ROUND(float data, 
size_t offset, SPACE half *p) { \ *(SPACE short *)(p + offset) = FUNC(data); \ } \ OVERLOADABLE void vstorea_half##ROUND(float data, size_t offset, SPACE half *p) { \ vstore_half##ROUND(data, offset, p); \ } \ OVERLOADABLE void vstore_half2##ROUND(float2 data, size_t offset, SPACE half *p) { \ vstore_half##ROUND(data.lo, offset*2, p); \ vstore_half##ROUND(data.hi, offset*2 + 1, p); \ } \ OVERLOADABLE void vstorea_half2##ROUND(float2 data, size_t offset, SPACE half *p) { \ vstore_half2##ROUND(data, offset, p); \ } \ OVERLOADABLE void vstore_half3##ROUND(float3 data, size_t offset, SPACE half *p) { \ vstore_half##ROUND(data.s0, offset*3, p); \ vstore_half##ROUND(data.s1, offset*3 + 1, p); \ vstore_half##ROUND(data.s2, offset*3 + 2, p); \ } \ OVERLOADABLE void vstorea_half3##ROUND(float3 data, size_t offset, SPACE half *p) { \ vstore_half##ROUND(data.s0, offset*4, p); \ vstore_half##ROUND(data.s1, offset*4 + 1, p); \ vstore_half##ROUND(data.s2, offset*4 + 2, p); \ } \ OVERLOADABLE void vstore_half4##ROUND(float4 data, size_t offset, SPACE half *p) { \ vstore_half2##ROUND(data.lo, offset*2, p); \ vstore_half2##ROUND(data.hi, offset*2 + 1, p); \ } \ OVERLOADABLE void vstorea_half4##ROUND(float4 data, size_t offset, SPACE half *p) { \ vstore_half4##ROUND(data, offset, p); \ } \ OVERLOADABLE void vstore_half8##ROUND(float8 data, size_t offset, SPACE half *p) { \ vstore_half4##ROUND(data.lo, offset*2, p); \ vstore_half4##ROUND(data.hi, offset*2 + 1, p); \ } \ OVERLOADABLE void vstorea_half8##ROUND(float8 data, size_t offset, SPACE half *p) { \ vstore_half8##ROUND(data, offset, p); \ } \ OVERLOADABLE void vstore_half16##ROUND(float16 data, size_t offset, SPACE half *p) { \ vstore_half8##ROUND(data.lo, offset*2, p); \ vstore_half8##ROUND(data.hi, offset*2 + 1, p); \ } \ OVERLOADABLE void vstorea_half16##ROUND(float16 data, size_t offset, SPACE half *p) { \ vstore_half16##ROUND(data, offset, p); \ } #define DECL_HALF_ST_SPACE(SPACE) \ DECL_HALF_ST_SPACE_ROUND(SPACE, , __gen_ocl_f32to16) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rte, __gen_ocl_f32to16) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rtz, f32to16_rtz) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rtp, f32to16_rtp) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rtn, f32to16_rtn) \ DECL_HALF_LD_SPACE(__global) DECL_HALF_LD_SPACE(__local) DECL_HALF_LD_SPACE(__constant) DECL_HALF_LD_SPACE(__private) DECL_HALF_ST_SPACE(__global) DECL_HALF_ST_SPACE(__local) DECL_HALF_ST_SPACE(__private) //#undef DECL_UNTYPED_RW_ALL_SPACE #undef DECL_HALF_LD_SPACE #undef DECL_HALF_ST_SPACE #undef DECL_HALF_ST_SPACE_ROUND Beignet-1.3.2-Source/backend/src/libocl/src/ocl_sampler.ll000664 001750 001750 00000000677 13173554000 022532 0ustar00yryr000000 000000 target datalayout = "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" target triple = "spir" %opencl.sampler_t = type opaque declare %opencl.sampler_t addrspace(2)*@__gen_ocl_int_to_sampler(i32) define %opencl.sampler_t addrspace(2)*@__translate_sampler_initializer(i32 %s) { %call = call %opencl.sampler_t addrspace(2)*@__gen_ocl_int_to_sampler(i32 %s) ret %opencl.sampler_t addrspace(2)* %call } Beignet-1.3.2-Source/backend/src/libocl/src/ocl_atomic_20.ll000664 001750 001750 00000014775 13161142102 022641 0ustar00yryr000000 000000 target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" target triple = "spir64" ;32bit version. 
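; Every helper below lowers one OpenCL 2.0 atomic builtin to a single LLVM
; atomicrmw/cmpxchg instruction. The %order and %scope parameters are part of
; the builtin ABI but are ignored in these bodies: all operations are emitted
; with seq_cst ordering, so even a memory_order_relaxed request maps onto the
; seq_cst form. The float variants (exchangef, fetch_addf) operate on the i32
; bit pattern of the value; fetch_addf is only ever invoked with a zero
; operand (see DECL_ATOMIC_LOAD_TYPE in ocl_atom_20.cl), which makes it an
; atomic load rather than a floating-point addition.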
define i32 @__gen_ocl_atomic_exchange32(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile xchg i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_exchangef(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile xchg i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_fetch_add32(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile add i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_fetch_addf(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile add i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_fetch_sub32(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile sub i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_fetch_or32(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile or i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_fetch_xor32(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile xor i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_fetch_and32(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile and i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_fetch_imin32(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile min i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_fetch_imax32(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile max i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_fetch_umin32(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile umin i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_fetch_umax32(i32 addrspace(4)* nocapture %ptr, i32 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile umax i32 addrspace(4)* %ptr, i32 %value seq_cst ret i32 %0 } define i32 @__gen_ocl_atomic_compare_exchange_strong32(i32 addrspace(4)* nocapture %ptr,i32 %compare, i32 %value, i32 %success, i32 %failure, i32 %scope) nounwind alwaysinline { entry: %0 = cmpxchg volatile i32 addrspace(4)* %ptr, i32 %compare, i32 %value seq_cst seq_cst %1 = extractvalue { i32, i1 } %0, 0 ret i32 %1 } define i32 @__gen_ocl_atomic_compare_exchange_weak32(i32 addrspace(4)* nocapture %ptr,i32 %compare, i32 %value, i32 %success, i32 %failure, i32 %scope) nounwind alwaysinline { entry: %0 = cmpxchg weak volatile i32 addrspace(4)* %ptr, i32 %compare, i32 %value seq_cst seq_cst %1 = extractvalue { i32, i1 } %0, 0 ret i32 %1 } ;64bit version define i64 @__gen_ocl_atomic_exchange64(i64 addrspace(4)* nocapture %ptr, i64 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile xchg i64
addrspace(4)* %ptr, i64 %value seq_cst ret i64 %0 } define i64 @__gen_ocl_atomic_fetch_add64(i64 addrspace(4)* nocapture %ptr, i64 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile add i64 addrspace(4)* %ptr, i64 %value seq_cst ret i64 %0 } define i64 @__gen_ocl_atomic_fetch_sub64(i64 addrspace(4)* nocapture %ptr, i64 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile sub i64 addrspace(4)* %ptr, i64 %value seq_cst ret i64 %0 } define i64 @__gen_ocl_atomic_fetch_or64(i64 addrspace(4)* nocapture %ptr, i64 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile or i64 addrspace(4)* %ptr, i64 %value seq_cst ret i64 %0 } define i64 @__gen_ocl_atomic_fetch_xor64(i64 addrspace(4)* nocapture %ptr, i64 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile xor i64 addrspace(4)* %ptr, i64 %value seq_cst ret i64 %0 } define i64 @__gen_ocl_atomic_fetch_and64(i64 addrspace(4)* nocapture %ptr, i64 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile and i64 addrspace(4)* %ptr, i64 %value seq_cst ret i64 %0 } define i64 @__gen_ocl_atomic_fetch_imin64(i64 addrspace(4)* nocapture %ptr, i64 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile min i64 addrspace(4)* %ptr, i64 %value seq_cst ret i64 %0 } define i64 @__gen_ocl_atomic_fetch_imax64(i64 addrspace(4)* nocapture %ptr, i64 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile max i64 addrspace(4)* %ptr, i64 %value seq_cst ret i64 %0 } define i64 @__gen_ocl_atomic_fetch_umin64(i64 addrspace(4)* nocapture %ptr, i64 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile umin i64 addrspace(4)* %ptr, i64 %value seq_cst ret i64 %0 } define i64 @__gen_ocl_atomic_fetch_umax64(i64 addrspace(4)* nocapture %ptr, i64 %value, i32 %order, i32 %scope) nounwind alwaysinline { entry: %0 = atomicrmw volatile umax i64 addrspace(4)* %ptr, i64 %value seq_cst ret i64 %0 } define i64 @__gen_ocl_atomic_compare_exchange_strong64(i64 addrspace(4)* nocapture %ptr,i64 %compare, i64 %value, i32 %success, i32 %failure, i32 %scope) nounwind alwaysinline { entry: %0 = cmpxchg volatile i64 addrspace(4)* %ptr, i64 %compare, i64 %value seq_cst seq_cst %1 = extractvalue { i64, i1 } %0, 0 ret i64 %1 } define i64 @__gen_ocl_atomic_compare_exchange_weak64(i64 addrspace(4)* nocapture %ptr,i64 %compare, i64 %value, i32 %success, i32 %failure, i32 %scope) nounwind alwaysinline { entry: %0 = cmpxchg weak volatile i64 addrspace(4)* %ptr, i64 %compare, i64 %value seq_cst seq_cst %1 = extractvalue { i64, i1 } %0, 0 ret i64 %1 } Beignet-1.3.2-Source/backend/src/libocl/src/ocl_atom.cl000664 001750 001750 00000012314 13161142102 021776 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>.
* */ #include "ocl_atom.h" #include "ocl_as.h" OVERLOADABLE uint __gen_ocl_atomic_add(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_add(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_sub(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_sub(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_and(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_and(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_or(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_or(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_xor(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_xor(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_xchg(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_xchg(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_inc(__global uint *p); OVERLOADABLE uint __gen_ocl_atomic_inc(__local uint *p); OVERLOADABLE uint __gen_ocl_atomic_dec(__global uint *p); OVERLOADABLE uint __gen_ocl_atomic_dec(__local uint *p); OVERLOADABLE uint __gen_ocl_atomic_cmpxchg(__global uint *p, uint cmp, uint val); OVERLOADABLE uint __gen_ocl_atomic_cmpxchg(__local uint *p, uint cmp, uint val); OVERLOADABLE uint __gen_ocl_atomic_imin(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_imin(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_imax(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_imax(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_umin(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_umin(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_umax(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_umax(__local uint *p, uint val); #define DECL_ATOMIC_OP_SPACE(NAME, TYPE, SPACE, PREFIX) \ OVERLOADABLE TYPE atomic_##NAME (volatile SPACE TYPE *p, TYPE val) { \ return (TYPE)__gen_ocl_##PREFIX##NAME((SPACE uint *)p, val); \ } #define DECL_ATOMIC_OP_TYPE(NAME, TYPE, PREFIX) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __global, PREFIX) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __local, PREFIX) #define DECL_ATOMIC_OP(NAME) \ DECL_ATOMIC_OP_TYPE(NAME, uint, atomic_) \ DECL_ATOMIC_OP_TYPE(NAME, int, atomic_) DECL_ATOMIC_OP(add) DECL_ATOMIC_OP(sub) DECL_ATOMIC_OP(and) DECL_ATOMIC_OP(or) DECL_ATOMIC_OP(xor) DECL_ATOMIC_OP(xchg) DECL_ATOMIC_OP_TYPE(min, int, atomic_i) DECL_ATOMIC_OP_TYPE(max, int, atomic_i) DECL_ATOMIC_OP_TYPE(min, uint, atomic_u) DECL_ATOMIC_OP_TYPE(max, uint, atomic_u) #undef DECL_ATOMIC_OP_SPACE #define DECL_ATOMIC_OP_SPACE(NAME, TYPE, SPACE, PREFIX) \ OVERLOADABLE TYPE atomic_##NAME (volatile SPACE TYPE *p, TYPE val) { \ return as_float(__gen_ocl_##PREFIX##NAME((SPACE uint *)p, as_uint(val))); \ } DECL_ATOMIC_OP_SPACE(xchg, float, __global, atomic_) DECL_ATOMIC_OP_SPACE(xchg, float, __local, atomic_) #undef DECL_ATOMIC_OP #undef DECL_ATOMIC_OP_TYPE #undef DECL_ATOMIC_OP_SPACE #define DECL_ATOMIC_OP_SPACE(NAME, TYPE, SPACE) \ OVERLOADABLE TYPE atomic_##NAME (volatile SPACE TYPE *p) { \ return (TYPE)__gen_ocl_atomic_##NAME((SPACE uint *)p); \ } #define DECL_ATOMIC_OP_TYPE(NAME, TYPE) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __global) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __local) #define DECL_ATOMIC_OP(NAME) \ DECL_ATOMIC_OP_TYPE(NAME, uint) \ DECL_ATOMIC_OP_TYPE(NAME, int) DECL_ATOMIC_OP(inc) DECL_ATOMIC_OP(dec) #undef DECL_ATOMIC_OP #undef DECL_ATOMIC_OP_TYPE #undef DECL_ATOMIC_OP_SPACE #define DECL_ATOMIC_OP_SPACE(NAME, TYPE, SPACE) \ OVERLOADABLE TYPE atomic_##NAME 
(volatile SPACE TYPE *p, TYPE cmp, TYPE val) { \ return (TYPE)__gen_ocl_atomic_##NAME((SPACE uint *)p, (uint)cmp, (uint)val); \ } #define DECL_ATOMIC_OP_TYPE(NAME, TYPE) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __global) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __local) #define DECL_ATOMIC_OP(NAME) \ DECL_ATOMIC_OP_TYPE(NAME, uint) \ DECL_ATOMIC_OP_TYPE(NAME, int) DECL_ATOMIC_OP(cmpxchg) #undef DECL_ATOMIC_OP #undef DECL_ATOMIC_OP_TYPE #undef DECL_ATOMIC_OP_SPACE // XXX for conformance test // The following atom_xxx api is on OpenCL spec 1.0. // But the conformance test suite will test them anyway. #define atom_add atomic_add #define atom_sub atomic_sub #define atom_and atomic_and #define atom_or atomic_or #define atom_xor atomic_xor #define atom_xchg atomic_xchg #define atom_min atomic_min #define atom_max atomic_max #define atom_inc atomic_inc #define atom_dec atomic_dec #define atom_cmpxchg atomic_cmpxchg Beignet-1.3.2-Source/backend/src/libocl/src/ocl_clz.ll000664 001750 001750 00000003730 13161142102 021641 0ustar00yryr000000 000000 target datalayout = "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" target triple = "spir" declare i8 @llvm.ctlz.i8(i8, i1) declare i16 @llvm.ctlz.i16(i16, i1) declare i32 @llvm.ctlz.i32(i32, i1) declare i64 @llvm.ctlz.i64(i64, i1) define i8 @clz_s8(i8 %x) nounwind readnone alwaysinline { %call = call i8 @llvm.ctlz.i8(i8 %x, i1 0) ret i8 %call } define i8 @clz_u8(i8 %x) nounwind readnone alwaysinline { %call = call i8 @llvm.ctlz.i8(i8 %x, i1 0) ret i8 %call } define i16 @clz_s16(i16 %x) nounwind readnone alwaysinline { %call = call i16 @llvm.ctlz.i16(i16 %x, i1 0) ret i16 %call } define i16 @clz_u16(i16 %x) nounwind readnone alwaysinline { %call = call i16 @llvm.ctlz.i16(i16 %x, i1 0) ret i16 %call } define i32 @clz_s32(i32 %x) nounwind readnone alwaysinline { %call = call i32 @llvm.ctlz.i32(i32 %x, i1 0) ret i32 %call } define i32 @clz_u32(i32 %x) nounwind readnone alwaysinline { %call = call i32 @llvm.ctlz.i32(i32 %x, i1 0) ret i32 %call } define i64 @clz_s64(i64 %x) nounwind readnone alwaysinline { %1 = bitcast i64 %x to <2 x i32> %2 = extractelement <2 x i32> %1, i32 0 %3 = extractelement <2 x i32> %1, i32 1 %call1 = call i32 @llvm.ctlz.i32(i32 %2, i1 0) %call2 = call i32 @llvm.ctlz.i32(i32 %3, i1 0) %cmp = icmp ult i32 %call2, 32 %4 = add i32 %call1, 32 %5 = select i1 %cmp, i32 %call2, i32 %4 %6 = insertelement <2 x i32> undef, i32 %5, i32 0 %call = bitcast <2 x i32> %6 to i64 ret i64 %call } define i64 @clz_u64(i64 %x) nounwind readnone alwaysinline { %1 = bitcast i64 %x to <2 x i32> %2 = extractelement <2 x i32> %1, i32 0 %3 = extractelement <2 x i32> %1, i32 1 %call1 = call i32 @llvm.ctlz.i32(i32 %2, i1 0) %call2 = call i32 @llvm.ctlz.i32(i32 %3, i1 0) %cmp = icmp ult i32 %call2, 32 %4 = add i32 %call1, 32 %5 = select i1 %cmp, i32 %call2, i32 %4 %6 = insertelement <2 x i32> undef, i32 %5, i32 0 %call = bitcast <2 x i32> %6 to i64 ret i64 %call } Beignet-1.3.2-Source/backend/src/libocl/src/ocl_sampler_20.ll000664 001750 001750 00000000671 13173554000 023025 0ustar00yryr000000 000000 target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" target triple = "spir64" %opencl.sampler_t = type opaque declare %opencl.sampler_t addrspace(2)*@__gen_ocl_int_to_sampler(i32) define %opencl.sampler_t addrspace(2)*@__translate_sampler_initializer(i32 %s) { %call = call %opencl.sampler_t addrspace(2)*@__gen_ocl_int_to_sampler(i32 %s) ret %opencl.sampler_t addrspace(2)* %call } 
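; ocl_sampler.ll above targets the 32-bit "spir" triple and this file targets
; "spir64" (the OpenCL 2.0 build); both exist because clang (3.9 and later)
; lowers a literal sampler initializer in kernel code into a call to
; @__translate_sampler_initializer, which simply forwards the packed i32
; descriptor to the backend intrinsic @__gen_ocl_int_to_sampler. A minimal
; kernel that exercises this path (an illustrative sketch; the kernel and
; argument names are arbitrary, not part of the library):
;
;   __kernel void sample_once(__read_only image2d_t img, __global float4 *out) {
;     const sampler_t s = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP |
;                         CLK_FILTER_NEAREST; /* packed into an i32 descriptor */
;     out[0] = read_imagef(img, s, (int2)(0, 0));
;   }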
Beignet-1.3.2-Source/backend/src/libocl/src/ocl_ctz_20.ll000664 001750 001750 00000003722 13161142102 022153 0ustar00yryr000000 000000 target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" target triple = "spir64" declare i8 @llvm.cttz.i8(i8, i1) declare i16 @llvm.cttz.i16(i16, i1) declare i32 @llvm.cttz.i32(i32, i1) declare i64 @llvm.cttz.i64(i64, i1) define i8 @ctz_s8(i8 %x) nounwind readnone alwaysinline { %call = call i8 @llvm.cttz.i8(i8 %x, i1 0) ret i8 %call } define i8 @ctz_u8(i8 %x) nounwind readnone alwaysinline { %call = call i8 @llvm.cttz.i8(i8 %x, i1 0) ret i8 %call } define i16 @ctz_s16(i16 %x) nounwind readnone alwaysinline { %call = call i16 @llvm.cttz.i16(i16 %x, i1 0) ret i16 %call } define i16 @ctz_u16(i16 %x) nounwind readnone alwaysinline { %call = call i16 @llvm.cttz.i16(i16 %x, i1 0) ret i16 %call } define i32 @ctz_s32(i32 %x) nounwind readnone alwaysinline { %call = call i32 @llvm.cttz.i32(i32 %x, i1 0) ret i32 %call } define i32 @ctz_u32(i32 %x) nounwind readnone alwaysinline { %call = call i32 @llvm.cttz.i32(i32 %x, i1 0) ret i32 %call } define i64 @ctz_s64(i64 %x) nounwind readnone alwaysinline { %1 = bitcast i64 %x to <2 x i32> %2 = extractelement <2 x i32> %1, i32 0 %3 = extractelement <2 x i32> %1, i32 1 %call1 = call i32 @llvm.cttz.i32(i32 %2, i1 0) %call2 = call i32 @llvm.cttz.i32(i32 %3, i1 0) %cmp = icmp ult i32 %call1, 32 %4 = add i32 %call2, 32 %5 = select i1 %cmp, i32 %call1, i32 %4 %6 = insertelement <2 x i32> undef, i32 %5, i32 0 %call = bitcast <2 x i32> %6 to i64 ret i64 %call } define i64 @ctz_u64(i64 %x) nounwind readnone alwaysinline { %1 = bitcast i64 %x to <2 x i32> %2 = extractelement <2 x i32> %1, i32 0 %3 = extractelement <2 x i32> %1, i32 1 %call1 = call i32 @llvm.cttz.i32(i32 %2, i1 0) %call2 = call i32 @llvm.cttz.i32(i32 %3, i1 0) %cmp = icmp ult i32 %call1, 32 %4 = add i32 %call2, 32 %5 = select i1 %cmp, i32 %call1, i32 %4 %6 = insertelement <2 x i32> undef, i32 %5, i32 0 %call = bitcast <2 x i32> %6 to i64 ret i64 %call } Beignet-1.3.2-Source/backend/src/libocl/src/ocl_workitem.cl000664 001750 001750 00000006414 13161142102 022703 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #include "ocl_workitem.h" PURE CONST uint __gen_ocl_get_work_dim(void); OVERLOADABLE uint get_work_dim(void) { return __gen_ocl_get_work_dim(); } #define DECL_INTERNAL_WORK_ITEM_FN(NAME) \ PURE CONST unsigned int __gen_ocl_##NAME##0(void); \ PURE CONST unsigned int __gen_ocl_##NAME##1(void); \ PURE CONST unsigned int __gen_ocl_##NAME##2(void); DECL_INTERNAL_WORK_ITEM_FN(get_group_id) DECL_INTERNAL_WORK_ITEM_FN(get_local_id) DECL_INTERNAL_WORK_ITEM_FN(get_enqueued_local_size) DECL_INTERNAL_WORK_ITEM_FN(get_local_size) DECL_INTERNAL_WORK_ITEM_FN(get_global_size) DECL_INTERNAL_WORK_ITEM_FN(get_global_offset) DECL_INTERNAL_WORK_ITEM_FN(get_num_groups) #undef DECL_INTERNAL_WORK_ITEM_FN #define DECL_PUBLIC_WORK_ITEM_FN(NAME, OTHER_RET) \ OVERLOADABLE size_t NAME(unsigned int dim) { \ if (dim == 0) return __gen_ocl_##NAME##0(); \ else if (dim == 1) return __gen_ocl_##NAME##1(); \ else if (dim == 2) return __gen_ocl_##NAME##2(); \ else return OTHER_RET; \ } DECL_PUBLIC_WORK_ITEM_FN(get_group_id, 0) DECL_PUBLIC_WORK_ITEM_FN(get_local_id, 0) DECL_PUBLIC_WORK_ITEM_FN(get_enqueued_local_size, 1) DECL_PUBLIC_WORK_ITEM_FN(get_local_size, 1) DECL_PUBLIC_WORK_ITEM_FN(get_global_size, 1) DECL_PUBLIC_WORK_ITEM_FN(get_global_offset, 0) DECL_PUBLIC_WORK_ITEM_FN(get_num_groups, 1) #undef DECL_PUBLIC_WORK_ITEM_FN OVERLOADABLE size_t get_global_id(uint dim) { return get_local_id(dim) + get_enqueued_local_size(dim) * get_group_id(dim) + get_global_offset(dim); } OVERLOADABLE size_t get_global_linear_id(void) { uint dim = __gen_ocl_get_work_dim(); if (dim == 1) return get_global_id(0) - get_global_offset(0); else if (dim == 2) return (get_global_id(1) - get_global_offset(1)) * get_global_size(0) + get_global_id(0) -get_global_offset(0); else if (dim == 3) return ((get_global_id(2) - get_global_offset(2)) * get_global_size(1) * get_global_size(0)) + ((get_global_id(1) - get_global_offset(1)) * get_global_size (0)) + (get_global_id(0) - get_global_offset(0)); else return 0; } OVERLOADABLE size_t get_local_linear_id(void) { uint dim = __gen_ocl_get_work_dim(); if (dim == 1) return get_local_id(0); else if (dim == 2) return get_local_id(1) * get_enqueued_local_size(0) + get_local_id(0); else if (dim == 3) return (get_local_id(2) * get_enqueued_local_size(1) * get_local_size(0)) + (get_local_id(1) * get_enqueued_local_size(0)) + get_local_id(0); else return 0; } Beignet-1.3.2-Source/backend/src/libocl/src/ocl_work_group.cl000664 001750 001750 00000011175 13161142102 023240 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #include "ocl_work_group.h" int __gen_ocl_work_group_all(int); int work_group_all(int predicate) { return __gen_ocl_work_group_all(predicate); } int __gen_ocl_work_group_any(int); int work_group_any(int predicate) { return __gen_ocl_work_group_any(predicate); } /* broadcast */ #define BROADCAST_IMPL(GEN_TYPE) \ OVERLOADABLE GEN_TYPE __gen_ocl_work_group_broadcast(GEN_TYPE a, size_t local_id); \ OVERLOADABLE GEN_TYPE work_group_broadcast(GEN_TYPE a, size_t local_id) { \ return __gen_ocl_work_group_broadcast(a, local_id); \ } \ OVERLOADABLE GEN_TYPE __gen_ocl_work_group_broadcast(GEN_TYPE a, size_t local_id_x, size_t local_id_y); \ OVERLOADABLE GEN_TYPE work_group_broadcast(GEN_TYPE a, size_t local_id_x, size_t local_id_y) { \ return __gen_ocl_work_group_broadcast(a, local_id_x, local_id_y); \ } \ OVERLOADABLE GEN_TYPE __gen_ocl_work_group_broadcast(GEN_TYPE a, size_t local_id_x, size_t local_id_y, size_t local_id_z); \ OVERLOADABLE GEN_TYPE work_group_broadcast(GEN_TYPE a, size_t local_id_x, size_t local_id_y, size_t local_id_z) { \ return __gen_ocl_work_group_broadcast(a, local_id_x, local_id_y, local_id_z); \ } BROADCAST_IMPL(int) BROADCAST_IMPL(uint) BROADCAST_IMPL(long) BROADCAST_IMPL(ulong) BROADCAST_IMPL(float) BROADCAST_IMPL(double) #undef BROADCAST_IMPL #define RANGE_OP(RANGE, OP, GEN_TYPE, SIGN) \ OVERLOADABLE GEN_TYPE __gen_ocl_work_group_##RANGE##_##OP(bool sign, GEN_TYPE x); \ OVERLOADABLE GEN_TYPE work_group_##RANGE##_##OP(GEN_TYPE x) { \ return __gen_ocl_work_group_##RANGE##_##OP(SIGN, x); \ } /* reduce add */ RANGE_OP(reduce, add, int, true) RANGE_OP(reduce, add, uint, false) RANGE_OP(reduce, add, long, true) RANGE_OP(reduce, add, ulong, false) RANGE_OP(reduce, add, float, true) RANGE_OP(reduce, add, double, true) /* reduce min */ RANGE_OP(reduce, min, int, true) RANGE_OP(reduce, min, uint, false) RANGE_OP(reduce, min, long, true) RANGE_OP(reduce, min, ulong, false) RANGE_OP(reduce, min, float, true) RANGE_OP(reduce, min, double, true) /* reduce max */ RANGE_OP(reduce, max, int, true) RANGE_OP(reduce, max, uint, false) RANGE_OP(reduce, max, long, true) RANGE_OP(reduce, max, ulong, false) RANGE_OP(reduce, max, float, true) RANGE_OP(reduce, max, double, true) /* scan_inclusive add */ RANGE_OP(scan_inclusive, add, int, true) RANGE_OP(scan_inclusive, add, uint, false) RANGE_OP(scan_inclusive, add, long, true) RANGE_OP(scan_inclusive, add, ulong, false) RANGE_OP(scan_inclusive, add, float, true) RANGE_OP(scan_inclusive, add, double, true) /* scan_inclusive min */ RANGE_OP(scan_inclusive, min, int, true) RANGE_OP(scan_inclusive, min, uint, false) RANGE_OP(scan_inclusive, min, long, true) RANGE_OP(scan_inclusive, min, ulong, false) RANGE_OP(scan_inclusive, min, float, true) RANGE_OP(scan_inclusive, min, double, true) /* scan_inclusive max */ RANGE_OP(scan_inclusive, max, int, true) RANGE_OP(scan_inclusive, max, uint, false) RANGE_OP(scan_inclusive, max, long, true) RANGE_OP(scan_inclusive, max, ulong, false) RANGE_OP(scan_inclusive, max, float, true) RANGE_OP(scan_inclusive, max, double, true) /* scan_exclusive add */ RANGE_OP(scan_exclusive, add, int, true) RANGE_OP(scan_exclusive, add, uint, false) RANGE_OP(scan_exclusive, add, long, true) RANGE_OP(scan_exclusive, add, ulong, false) RANGE_OP(scan_exclusive, add, float, true) RANGE_OP(scan_exclusive, add, double, true) /* scan_exclusive min */ RANGE_OP(scan_exclusive, min, int, true) RANGE_OP(scan_exclusive, min, uint, false) RANGE_OP(scan_exclusive, min, long, true) RANGE_OP(scan_exclusive, min, ulong, false) 
RANGE_OP(scan_exclusive, min, float, true) RANGE_OP(scan_exclusive, min, double, true) /* scan_exclusive max */ RANGE_OP(scan_exclusive, max, int, true) RANGE_OP(scan_exclusive, max, uint, false) RANGE_OP(scan_exclusive, max, long, true) RANGE_OP(scan_exclusive, max, ulong, false) RANGE_OP(scan_exclusive, max, float, true) RANGE_OP(scan_exclusive, max, double, true) #undef RANGE_OP Beignet-1.3.2-Source/backend/src/libocl/src/ocl_ctz.ll000664 001750 001750 00000003730 13161142102 021651 0ustar00yryr000000 000000 target datalayout = "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" target triple = "spir" declare i8 @llvm.cttz.i8(i8, i1) declare i16 @llvm.cttz.i16(i16, i1) declare i32 @llvm.cttz.i32(i32, i1) declare i64 @llvm.cttz.i64(i64, i1) define i8 @ctz_s8(i8 %x) nounwind readnone alwaysinline { %call = call i8 @llvm.cttz.i8(i8 %x, i1 0) ret i8 %call } define i8 @ctz_u8(i8 %x) nounwind readnone alwaysinline { %call = call i8 @llvm.cttz.i8(i8 %x, i1 0) ret i8 %call } define i16 @ctz_s16(i16 %x) nounwind readnone alwaysinline { %call = call i16 @llvm.cttz.i16(i16 %x, i1 0) ret i16 %call } define i16 @ctz_u16(i16 %x) nounwind readnone alwaysinline { %call = call i16 @llvm.cttz.i16(i16 %x, i1 0) ret i16 %call } define i32 @ctz_s32(i32 %x) nounwind readnone alwaysinline { %call = call i32 @llvm.cttz.i32(i32 %x, i1 0) ret i32 %call } define i32 @ctz_u32(i32 %x) nounwind readnone alwaysinline { %call = call i32 @llvm.cttz.i32(i32 %x, i1 0) ret i32 %call } define i64 @ctz_s64(i64 %x) nounwind readnone alwaysinline { %1 = bitcast i64 %x to <2 x i32> %2 = extractelement <2 x i32> %1, i32 0 %3 = extractelement <2 x i32> %1, i32 1 %call1 = call i32 @llvm.cttz.i32(i32 %2, i1 0) %call2 = call i32 @llvm.cttz.i32(i32 %3, i1 0) %cmp = icmp ult i32 %call1, 32 %4 = add i32 %call2, 32 %5 = select i1 %cmp, i32 %call1, i32 %4 %6 = insertelement <2 x i32> undef, i32 %5, i32 0 %call = bitcast <2 x i32> %6 to i64 ret i64 %call } define i64 @ctz_u64(i64 %x) nounwind readnone alwaysinline { %1 = bitcast i64 %x to <2 x i32> %2 = extractelement <2 x i32> %1, i32 0 %3 = extractelement <2 x i32> %1, i32 1 %call1 = call i32 @llvm.cttz.i32(i32 %2, i1 0) %call2 = call i32 @llvm.cttz.i32(i32 %3, i1 0) %cmp = icmp ult i32 %call1, 32 %4 = add i32 %call2, 32 %5 = select i1 %cmp, i32 %call1, i32 %4 %6 = insertelement <2 x i32> undef, i32 %5, i32 0 %call = bitcast <2 x i32> %6 to i64 ret i64 %call } Beignet-1.3.2-Source/backend/src/libocl/src/ocl_memset.cl000664 001750 001750 00000002671 13161142102 022335 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #include "ocl_memset.h" #define DECL_MEMSET_FN(NAME, DST_SPACE) \ void __gen_memset_ ##NAME## _align (DST_SPACE uchar* dst, uchar val, size_t size) { \ size_t index = 0; \ uint v = (val << 24) | (val << 16) | (val << 8) | val; \ while((index + 4) <= size) { \ *((DST_SPACE uint *)(dst + index)) = v; \ index += 4; \ } \ while(index < size) { \ dst[index] = val; \ index++; \ } \ } \ void __gen_memset_ ##NAME (DST_SPACE uchar* dst, uchar val, size_t size) { \ size_t index = 0; \ while(index < size) { \ dst[index] = val; \ index++; \ } \ } DECL_MEMSET_FN(g, __global) DECL_MEMSET_FN(l, __local) DECL_MEMSET_FN(p, __private) #if (__OPENCL_C_VERSION__ >= 200) DECL_MEMSET_FN(n, __generic) #endif Beignet-1.3.2-Source/backend/src/libocl/src/ocl_clz_20.ll000664 001750 001750 00000003722 13161142102 022143 0ustar00yryr000000 000000 target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" target triple = "spir64" declare i8 @llvm.ctlz.i8(i8, i1) declare i16 @llvm.ctlz.i16(i16, i1) declare i32 @llvm.ctlz.i32(i32, i1) declare i64 @llvm.ctlz.i64(i64, i1) define i8 @clz_s8(i8 %x) nounwind readnone alwaysinline { %call = call i8 @llvm.ctlz.i8(i8 %x, i1 0) ret i8 %call } define i8 @clz_u8(i8 %x) nounwind readnone alwaysinline { %call = call i8 @llvm.ctlz.i8(i8 %x, i1 0) ret i8 %call } define i16 @clz_s16(i16 %x) nounwind readnone alwaysinline { %call = call i16 @llvm.ctlz.i16(i16 %x, i1 0) ret i16 %call } define i16 @clz_u16(i16 %x) nounwind readnone alwaysinline { %call = call i16 @llvm.ctlz.i16(i16 %x, i1 0) ret i16 %call } define i32 @clz_s32(i32 %x) nounwind readnone alwaysinline { %call = call i32 @llvm.ctlz.i32(i32 %x, i1 0) ret i32 %call } define i32 @clz_u32(i32 %x) nounwind readnone alwaysinline { %call = call i32 @llvm.ctlz.i32(i32 %x, i1 0) ret i32 %call } define i64 @clz_s64(i64 %x) nounwind readnone alwaysinline { %1 = bitcast i64 %x to <2 x i32> %2 = extractelement <2 x i32> %1, i32 0 %3 = extractelement <2 x i32> %1, i32 1 %call1 = call i32 @llvm.ctlz.i32(i32 %2, i1 0) %call2 = call i32 @llvm.ctlz.i32(i32 %3, i1 0) %cmp = icmp ult i32 %call2, 32 %4 = add i32 %call1, 32 %5 = select i1 %cmp, i32 %call2, i32 %4 %6 = insertelement <2 x i32> undef, i32 %5, i32 0 %call = bitcast <2 x i32> %6 to i64 ret i64 %call } define i64 @clz_u64(i64 %x) nounwind readnone alwaysinline { %1 = bitcast i64 %x to <2 x i32> %2 = extractelement <2 x i32> %1, i32 0 %3 = extractelement <2 x i32> %1, i32 1 %call1 = call i32 @llvm.ctlz.i32(i32 %2, i1 0) %call2 = call i32 @llvm.ctlz.i32(i32 %3, i1 0) %cmp = icmp ult i32 %call2, 32 %4 = add i32 %call1, 32 %5 = select i1 %cmp, i32 %call2, i32 %4 %6 = insertelement <2 x i32> undef, i32 %5, i32 0 %call = bitcast <2 x i32> %6 to i64 ret i64 %call } Beignet-1.3.2-Source/backend/src/libocl/src/ocl_atom_20.cl000664 001750 001750 00000042710 13161142102 022302 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. 
* * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "ocl_atom_20.h" #include "ocl_as.h" #include "ocl_sync.h" OVERLOADABLE uint __gen_ocl_atomic_add(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_add(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_sub(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_sub(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_and(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_and(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_or(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_or(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_xor(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_xor(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_xchg(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_xchg(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_inc(__global uint *p); OVERLOADABLE uint __gen_ocl_atomic_inc(__local uint *p); OVERLOADABLE uint __gen_ocl_atomic_dec(__global uint *p); OVERLOADABLE uint __gen_ocl_atomic_dec(__local uint *p); OVERLOADABLE uint __gen_ocl_atomic_cmpxchg(__global uint *p, uint cmp, uint val); OVERLOADABLE uint __gen_ocl_atomic_cmpxchg(__local uint *p, uint cmp, uint val); OVERLOADABLE uint __gen_ocl_atomic_imin(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_imin(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_imax(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_imax(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_umin(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_umin(__local uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_umax(__global uint *p, uint val); OVERLOADABLE uint __gen_ocl_atomic_umax(__local uint *p, uint val); #define DECL_ATOMIC_OP_SPACE(NAME, TYPE, SPACE, PREFIX) \ OVERLOADABLE TYPE atomic_##NAME (volatile SPACE TYPE *p, TYPE val) { \ return (TYPE)__gen_ocl_##PREFIX##NAME((SPACE uint *)p, val); \ } #define DECL_ATOMIC_OP_TYPE(NAME, TYPE, PREFIX) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __global, PREFIX) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __local, PREFIX) #define DECL_ATOMIC_OP(NAME) \ DECL_ATOMIC_OP_TYPE(NAME, uint, atomic_) \ DECL_ATOMIC_OP_TYPE(NAME, int, atomic_) DECL_ATOMIC_OP(add) DECL_ATOMIC_OP(sub) DECL_ATOMIC_OP(and) DECL_ATOMIC_OP(or) DECL_ATOMIC_OP(xor) DECL_ATOMIC_OP(xchg) DECL_ATOMIC_OP_TYPE(min, int, atomic_i) DECL_ATOMIC_OP_TYPE(max, int, atomic_i) DECL_ATOMIC_OP_TYPE(min, uint, atomic_u) DECL_ATOMIC_OP_TYPE(max, uint, atomic_u) #undef DECL_ATOMIC_OP_SPACE #define DECL_ATOMIC_OP_SPACE(NAME, TYPE, SPACE, PREFIX) \ OVERLOADABLE TYPE atomic_##NAME (volatile SPACE TYPE *p, TYPE val) { \ return as_float(__gen_ocl_##PREFIX##NAME((SPACE uint *)p, as_uint(val))); \ } DECL_ATOMIC_OP_SPACE(xchg, float, __global, atomic_) DECL_ATOMIC_OP_SPACE(xchg, float, __local, atomic_) #undef DECL_ATOMIC_OP #undef DECL_ATOMIC_OP_TYPE #undef DECL_ATOMIC_OP_SPACE #define DECL_ATOMIC_OP_SPACE(NAME, TYPE, SPACE) \ OVERLOADABLE TYPE atomic_##NAME (volatile SPACE TYPE *p) { \ return (TYPE)__gen_ocl_atomic_##NAME((SPACE uint *)p); \ } #define DECL_ATOMIC_OP_TYPE(NAME, TYPE) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __global) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __local) #define DECL_ATOMIC_OP(NAME) \ DECL_ATOMIC_OP_TYPE(NAME, uint) \ DECL_ATOMIC_OP_TYPE(NAME, int) DECL_ATOMIC_OP(inc) DECL_ATOMIC_OP(dec) #undef 
DECL_ATOMIC_OP #undef DECL_ATOMIC_OP_TYPE #undef DECL_ATOMIC_OP_SPACE #define DECL_ATOMIC_OP_SPACE(NAME, TYPE, SPACE) \ OVERLOADABLE TYPE atomic_##NAME (volatile SPACE TYPE *p, TYPE cmp, TYPE val) { \ return (TYPE)__gen_ocl_atomic_##NAME((SPACE uint *)p, (uint)cmp, (uint)val); \ } #define DECL_ATOMIC_OP_TYPE(NAME, TYPE) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __global) \ DECL_ATOMIC_OP_SPACE(NAME, TYPE, __local) #define DECL_ATOMIC_OP(NAME) \ DECL_ATOMIC_OP_TYPE(NAME, uint) \ DECL_ATOMIC_OP_TYPE(NAME, int) DECL_ATOMIC_OP(cmpxchg) #undef DECL_ATOMIC_OP #undef DECL_ATOMIC_OP_TYPE #undef DECL_ATOMIC_OP_SPACE // XXX for conformance test // The following atom_xxx api is on OpenCL spec 1.0. // But the conformance test suite will test them anyway. #define atom_add atomic_add #define atom_sub atomic_sub #define atom_and atomic_and #define atom_or atomic_or #define atom_xor atomic_xor #define atom_xchg atomic_xchg #define atom_min atomic_min #define atom_max atomic_max #define atom_inc atomic_inc #define atom_dec atomic_dec #define atom_cmpxchg atomic_cmpxchg // OpenCL 2.0 features. #define DECL_ATOMIC_OP_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE CTYPE atomic_##NAME (volatile ATYPE *p, CTYPE val) { \ return (CTYPE)__gen_ocl_atomic_##PREFIX((STYPE*)p, val, memory_order_seq_cst, memory_scope_device); \ } #define DECL_ATOMIC_COMPARE_EXCHANGE_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE bool atomic_##NAME (volatile ATYPE *p, CTYPE* expected, CTYPE val) { \ CTYPE oldValue = __gen_ocl_atomic_##PREFIX((STYPE*)p, *expected, val, memory_order_seq_cst, memory_order_seq_cst, memory_scope_device); \ bool ret = oldValue == *expected; \ *expected = oldValue; \ return ret; \ } #define DECL_ATOMIC_LOAD_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE CTYPE atomic_##NAME (volatile ATYPE *p) { \ return (CTYPE)__gen_ocl_atomic_##PREFIX((STYPE*)p, 0, memory_order_seq_cst, memory_scope_device); \ } #define DECL_ATOMIC_NO_RET_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE void atomic_##NAME (volatile ATYPE *p, CTYPE val) { \ __gen_ocl_atomic_##PREFIX((STYPE*)p, val, memory_order_seq_cst, memory_scope_device); \ } #define DECL_ATOMIC_OP(NAME, PREFIX) \ DECL_ATOMIC_OP_TYPE(NAME, PREFIX##32, atomic_uint, atomic_int, uint) \ DECL_ATOMIC_OP_TYPE(NAME, PREFIX##32, atomic_int, atomic_int, int) \ //DECL_ATOMIC_OP_TYPE(NAME, PREFIX##64, atomic_ulong, atomic_long, ulong) \ DECL_ATOMIC_OP_TYPE(NAME, PREFIX##64, atomic_long, atomic_long, long) \ #define DECL_ATOMIC_COMPARE_EXCHANGE_OP(NAME, PREFIX) \ DECL_ATOMIC_COMPARE_EXCHANGE_TYPE(NAME, PREFIX##32, atomic_uint, atomic_int, uint) \ DECL_ATOMIC_COMPARE_EXCHANGE_TYPE(NAME, PREFIX##32, atomic_int, atomic_int, int) \ //DECL_ATOMIC_COMPARE_EXCHANGE_TYPE(NAME, PREFIX##64, atomic_ulong, atomic_long, ulong) \ DECL_ATOMIC_COMPARE_EXCHANGE_TYPE(NAME, PREFIX##64, atomic_long, atomic_long, long) \ #define DECL_ATOMIC_LOAD_OP(NAME, PREFIX) \ DECL_ATOMIC_LOAD_TYPE(NAME, PREFIX##32, atomic_uint, atomic_int, uint) \ DECL_ATOMIC_LOAD_TYPE(NAME, PREFIX##32, atomic_int, atomic_int, int) \ //DECL_ATOMIC_LOAD_TYPE(NAME, PREFIX##64, atomic_ulong, atomic_long, ulong) \ DECL_ATOMIC_LOAD_TYPE(NAME, PREFIX##64, atomic_long, atomic_long, long) \ #define DECL_ATOMIC_NO_RET_OP(NAME, PREFIX) \ DECL_ATOMIC_NO_RET_TYPE(NAME, PREFIX##32, atomic_uint, atomic_int, uint) \ DECL_ATOMIC_NO_RET_TYPE(NAME, PREFIX##32, atomic_int, atomic_int, int) \ //DECL_ATOMIC_NO_RET_TYPE(NAME, PREFIX##64, atomic_ulong, atomic_long, ulong) \ DECL_ATOMIC_NO_RET_TYPE(NAME, PREFIX##64, atomic_long, 
atomic_long, long) \ DECL_ATOMIC_OP(exchange, exchange) DECL_ATOMIC_OP(fetch_add, fetch_add) DECL_ATOMIC_OP(fetch_sub, fetch_sub) DECL_ATOMIC_OP(fetch_and, fetch_and) DECL_ATOMIC_OP(fetch_or, fetch_or) DECL_ATOMIC_OP(fetch_xor, fetch_xor) DECL_ATOMIC_LOAD_OP(load, fetch_add) DECL_ATOMIC_NO_RET_OP(init, exchange) DECL_ATOMIC_NO_RET_OP(store, exchange) DECL_ATOMIC_COMPARE_EXCHANGE_OP(compare_exchange_strong, compare_exchange_strong) DECL_ATOMIC_COMPARE_EXCHANGE_OP(compare_exchange_weak, compare_exchange_weak) DECL_ATOMIC_OP_TYPE(fetch_min, fetch_imin32, atomic_int, atomic_int, int) DECL_ATOMIC_OP_TYPE(fetch_min, fetch_umin32, atomic_uint, atomic_int, uint) DECL_ATOMIC_OP_TYPE(fetch_max, fetch_imax32, atomic_int, atomic_int, int) DECL_ATOMIC_OP_TYPE(fetch_max, fetch_umax32, atomic_uint, atomic_int, uint) #ifndef DISABLE_ATOMIC_INT64 DECL_ATOMIC_OP_TYPE(fetch_min, fetch_imin64, atomic_long, atomic_long, long) DECL_ATOMIC_OP_TYPE(fetch_min, fetch_umin64, atomic_ulong, atomic_long, ulong) DECL_ATOMIC_OP_TYPE(fetch_max, fetch_imax64, atomic_long, atomic_long, long) DECL_ATOMIC_OP_TYPE(fetch_max, fetch_umax64, atomic_ulong, atomic_long, ulong) #endif DECL_ATOMIC_OP_TYPE(exchange, exchangef, atomic_float, atomic_int, float) DECL_ATOMIC_NO_RET_TYPE(init, exchangef, atomic_float, atomic_int, float) DECL_ATOMIC_NO_RET_TYPE(store, exchangef, atomic_float, atomic_int, float) DECL_ATOMIC_LOAD_TYPE(load, fetch_addf, atomic_float, atomic_int, float) #undef DECL_ATOMIC_OP_TYPE #undef DECL_ATOMIC_LOAD_TYPE #undef DECL_ATOMIC_NO_RET_TYPE #undef DECL_ATOMIC_COMPARE_EXCHANGE_TYPE // with memory_order. #define DECL_ATOMIC_OP_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE CTYPE atomic_##NAME (volatile ATYPE *p, CTYPE val, memory_order order) { \ return (CTYPE)__gen_ocl_atomic_##PREFIX((STYPE*)p, val, order, memory_scope_device); \ } #define DECL_ATOMIC_COMPARE_EXCHANGE_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE bool atomic_##NAME (volatile ATYPE *p, CTYPE* expected, CTYPE val, memory_order success, memory_order failure) { \ CTYPE oldValue = __gen_ocl_atomic_##PREFIX((STYPE*)p, *expected, val, success, failure, memory_scope_device); \ bool ret = oldValue == *expected; \ *expected = oldValue; \ return ret; \ } #define DECL_ATOMIC_LOAD_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE CTYPE atomic_##NAME (volatile ATYPE *p, memory_order order) { \ return (CTYPE)__gen_ocl_atomic_##PREFIX((STYPE*)p, 0, order, memory_scope_device); \ } #define DECL_ATOMIC_NO_RET_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE void atomic_##NAME (volatile ATYPE *p, CTYPE val, memory_order order) { \ __gen_ocl_atomic_##PREFIX((STYPE*)p, val, order, memory_scope_device); \ } DECL_ATOMIC_OP(exchange_explicit, exchange) DECL_ATOMIC_OP(fetch_add_explicit, fetch_add) DECL_ATOMIC_OP(fetch_sub_explicit, fetch_sub) DECL_ATOMIC_OP(fetch_and_explicit, fetch_and) DECL_ATOMIC_OP(fetch_or_explicit, fetch_or) DECL_ATOMIC_OP(fetch_xor_explicit, fetch_xor) DECL_ATOMIC_LOAD_OP(load_explicit, fetch_add) DECL_ATOMIC_NO_RET_OP(store_explicit, exchange) DECL_ATOMIC_COMPARE_EXCHANGE_OP(compare_exchange_strong_explicit, compare_exchange_strong) DECL_ATOMIC_COMPARE_EXCHANGE_OP(compare_exchange_weak_explicit, compare_exchange_weak) DECL_ATOMIC_OP_TYPE(fetch_min_explicit, fetch_imin32, atomic_int, atomic_int, int) DECL_ATOMIC_OP_TYPE(fetch_min_explicit, fetch_umin32, atomic_uint, atomic_int, uint) DECL_ATOMIC_OP_TYPE(fetch_max_explicit, fetch_imax32, atomic_int, atomic_int, int) DECL_ATOMIC_OP_TYPE(fetch_max_explicit, fetch_umax32, 
atomic_uint, atomic_int, uint) #ifndef DISABLE_ATOMIC_INT64 DECL_ATOMIC_OP_TYPE(fetch_min_explicit, fetch_imin64, atomic_long, atomic_long, long) DECL_ATOMIC_OP_TYPE(fetch_min_explicit, fetch_umin64, atomic_ulong, atomic_long, ulong) DECL_ATOMIC_OP_TYPE(fetch_max_explicit, fetch_imax64, atomic_long, atomic_long, long) DECL_ATOMIC_OP_TYPE(fetch_max_explicit, fetch_umax64, atomic_ulong, atomic_long, ulong) #endif DECL_ATOMIC_OP_TYPE(exchange_explicit, exchangef, atomic_float, atomic_int, float) DECL_ATOMIC_NO_RET_TYPE(init_explicit, exchangef, atomic_float, atomic_int, float) DECL_ATOMIC_NO_RET_TYPE(store_explicit, exchangef, atomic_float, atomic_int, float) DECL_ATOMIC_LOAD_TYPE(load_explicit, fetch_addf, atomic_float, atomic_int, float) #undef DECL_ATOMIC_OP_TYPE #undef DECL_ATOMIC_LOAD_TYPE #undef DECL_ATOMIC_NO_RET_TYPE #undef DECL_ATOMIC_COMPARE_EXCHANGE_TYPE // with memory_order and memory_scope #define DECL_ATOMIC_OP_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE CTYPE atomic_##NAME (volatile ATYPE *p, CTYPE val, memory_order order, memory_scope scope) { \ return (CTYPE)__gen_ocl_atomic_##PREFIX((STYPE*)p, val, order, scope); \ } #define DECL_ATOMIC_COMPARE_EXCHANGE_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE bool atomic_##NAME (volatile ATYPE *p, CTYPE* expected, CTYPE val, memory_order success, memory_order failure, memory_scope scope) { \ CTYPE oldValue = __gen_ocl_atomic_##PREFIX((STYPE*)p, *expected, val, success, failure, scope); \ bool ret = oldValue == *expected; \ *expected = oldValue; \ return ret; \ } #define DECL_ATOMIC_LOAD_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE CTYPE atomic_##NAME (volatile ATYPE *p, memory_order order, memory_scope scope) { \ return (CTYPE)__gen_ocl_atomic_##PREFIX((STYPE*)p, 0, order, scope); \ } #define DECL_ATOMIC_NO_RET_TYPE(NAME, PREFIX, ATYPE, STYPE, CTYPE) \ OVERLOADABLE void atomic_##NAME (volatile ATYPE *p, CTYPE val, memory_order order, memory_scope scope) { \ __gen_ocl_atomic_##PREFIX((STYPE*)p, val, order, scope); \ } DECL_ATOMIC_OP(exchange_explicit, exchange) DECL_ATOMIC_OP(fetch_add_explicit, fetch_add) DECL_ATOMIC_OP(fetch_sub_explicit, fetch_sub) DECL_ATOMIC_OP(fetch_and_explicit, fetch_and) DECL_ATOMIC_OP(fetch_or_explicit, fetch_or) DECL_ATOMIC_OP(fetch_xor_explicit, fetch_xor) DECL_ATOMIC_LOAD_OP(load_explicit, fetch_add) DECL_ATOMIC_NO_RET_OP(store_explicit, exchange) DECL_ATOMIC_COMPARE_EXCHANGE_OP(compare_exchange_strong_explicit, compare_exchange_strong) DECL_ATOMIC_COMPARE_EXCHANGE_OP(compare_exchange_weak_explicit, compare_exchange_weak) DECL_ATOMIC_OP_TYPE(fetch_min_explicit, fetch_imin32, atomic_int, atomic_int, int) DECL_ATOMIC_OP_TYPE(fetch_min_explicit, fetch_umin32, atomic_uint, atomic_int, uint) DECL_ATOMIC_OP_TYPE(fetch_max_explicit, fetch_imax32, atomic_int, atomic_int, int) DECL_ATOMIC_OP_TYPE(fetch_max_explicit, fetch_umax32, atomic_uint, atomic_int, uint) #ifndef DISABLE_ATOMIC_INT64 DECL_ATOMIC_OP_TYPE(fetch_min_explicit, fetch_imin64, atomic_long, atomic_long, long) DECL_ATOMIC_OP_TYPE(fetch_min_explicit, fetch_umin64, atomic_ulong, atomic_long, ulong) DECL_ATOMIC_OP_TYPE(fetch_max_explicit, fetch_imax64, atomic_long, atomic_long, long) DECL_ATOMIC_OP_TYPE(fetch_max_explicit, fetch_umax64, atomic_ulong, atomic_long, ulong) #endif DECL_ATOMIC_OP_TYPE(exchange_explicit, exchangef, atomic_float, atomic_int, float) DECL_ATOMIC_NO_RET_TYPE(init_explicit, exchangef, atomic_float, atomic_int, float) DECL_ATOMIC_NO_RET_TYPE(store_explicit, exchangef, atomic_float, atomic_int, float) 
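/* Editorial usage sketch (hypothetical kernel, not part of libocl): the
 * surrounding blocks instantiate each C11-style atomic built-in three
 * times -- with implicit seq_cst/device-scope semantics, with an explicit
 * memory_order, and with an explicit memory_order plus memory_scope -- all
 * of which funnel into the same __gen_ocl_atomic_* intrinsics:
 *
 *   kernel void histo(global atomic_uint *bins, global const uint *idx) {
 *     size_t i = get_global_id(0);
 *     atomic_fetch_add(&bins[idx[i]], 1u);
 *     atomic_fetch_add_explicit(&bins[idx[i]], 1u, memory_order_relaxed);
 *     atomic_fetch_add_explicit(&bins[idx[i]], 1u, memory_order_relaxed,
 *                               memory_scope_work_group);
 *   }
 */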
DECL_ATOMIC_LOAD_TYPE(load_explicit, fetch_addf, atomic_float, atomic_int, float) #undef DECL_ATOMIC_OP_TYPE #undef DECL_ATOMIC_LOAD_TYPE #undef DECL_ATOMIC_NO_RET_TYPE #undef DECL_ATOMIC_COMPARE_EXCHANGE_TYPE #undef DECL_ATOMIC_OP #undef DECL_ATOMIC_LOAD_OP #undef DECL_ATOMIC_NO_RET_OP #undef DECL_ATOMIC_COMPARE_EXCHANGE_OP OVERLOADABLE bool atomic_flag_test_and_set(volatile atomic_flag *object) { atomic_int * temp = (atomic_int*)object; int expected = 0; int new_value = 1; int oldValue = __gen_ocl_atomic_compare_exchange_strong32(temp, expected, new_value, memory_order_seq_cst, memory_order_seq_cst, memory_scope_device); if(oldValue == new_value) return true; else return false; } OVERLOADABLE bool atomic_flag_test_and_set_explicit(volatile atomic_flag *object, memory_order order) { atomic_int * temp = (atomic_int*)object; int expected = 0; int new_value = 1; int oldValue = __gen_ocl_atomic_compare_exchange_strong32(temp, expected, new_value, memory_order_seq_cst, memory_order_seq_cst, memory_scope_device); if(oldValue == new_value) return true; else return false; } OVERLOADABLE bool atomic_flag_test_and_set_explicit(volatile atomic_flag *object, memory_order order, memory_scope scope){ atomic_int * temp = (atomic_int*)object; int expected = 0; int new_value = 1; int oldValue = __gen_ocl_atomic_compare_exchange_strong32(temp, expected, new_value, memory_order_seq_cst, memory_order_seq_cst, memory_scope_device); if(oldValue == new_value) return true; else return false; } OVERLOADABLE void atomic_flag_clear(volatile atomic_flag *object){ atomic_int * temp = (atomic_int*)object; __gen_ocl_atomic_exchange32(temp, 0, memory_order_seq_cst, memory_scope_device); } OVERLOADABLE void atomic_flag_clear_explicit(volatile atomic_flag *object, memory_order order){ atomic_int * temp = (atomic_int*)object; __gen_ocl_atomic_exchange32(temp, 0, memory_order_seq_cst, memory_scope_device); } OVERLOADABLE void atomic_flag_clear_explicit(volatile atomic_flag *object, memory_order order, memory_scope scope){ atomic_int * temp = (atomic_int*)object; __gen_ocl_atomic_exchange32(temp, 0, memory_order_seq_cst, memory_scope_device); } OVERLOADABLE void atomic_work_item_fence(cl_mem_fence_flags flags, memory_order order, memory_scope scope){ } Beignet-1.3.2-Source/backend/src/libocl/src/ocl_image.cl000664 001750 001750 00000112377 13161145234 022143 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "ocl_image.h" #if (__OPENCL_C_VERSION__ >= 200) #include "ocl_math_20.h" #else #include "ocl_math.h" #endif #include "ocl_integer.h" #include "ocl_common.h" #include "ocl_convert.h" #define int1 int #define float1 float /////////////////////////////////////////////////////////////////////////////// // Beignet builtin functions. 
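// The __gen_ocl_read_image* / __gen_ocl_write_image* declarations below are
// Beignet compiler built-ins (declared here, implemented by the backend).
// The trailing sampler_offset argument selects the hardware message: 0
// issues a normal sample message, while a non-zero value makes the backend
// emit an LD message, which the coordinate-fixup helpers further down rely on.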
/////////////////////////////////////////////////////////////////////////////// #define DECL_GEN_OCL_RW_IMAGE(image_type, n) \ OVERLOADABLE int4 __gen_ocl_read_imagei(read_only image_type image, sampler_t sampler, \ float ##n coord, uint sampler_offset); \ OVERLOADABLE int4 __gen_ocl_read_imagei(read_only image_type image, sampler_t sampler, \ int ##n coord, uint sampler_offset); \ OVERLOADABLE uint4 __gen_ocl_read_imageui(read_only image_type image, sampler_t sampler, \ float ##n coord, uint sampler_offset); \ OVERLOADABLE uint4 __gen_ocl_read_imageui(read_only image_type image, sampler_t sampler, \ int ##n coord, uint sampler_offset); \ OVERLOADABLE float4 __gen_ocl_read_imagef(read_only image_type image, sampler_t sampler, \ float ##n coord, uint sampler_offset); \ OVERLOADABLE float4 __gen_ocl_read_imagef(read_only image_type image, sampler_t sampler, \ int ##n coord, uint sampler_offset); \ OVERLOADABLE void __gen_ocl_write_imagei(write_only image_type image, int ##n coord , int4 color); \ OVERLOADABLE void __gen_ocl_write_imageui(write_only image_type image, int ##n coord, uint4 color);\ OVERLOADABLE void __gen_ocl_write_imagef(write_only image_type image, int ##n coord, float4 color); #define DECL_GEN_OCL_QUERY_IMAGE(image_type) \ OVERLOADABLE int __gen_ocl_get_image_width(image_type image); \ OVERLOADABLE int __gen_ocl_get_image_height(image_type image); \ OVERLOADABLE int __gen_ocl_get_image_channel_data_type(image_type image); \ OVERLOADABLE int __gen_ocl_get_image_channel_order(image_type image); \ OVERLOADABLE int __gen_ocl_get_image_depth(image_type image); \ DECL_GEN_OCL_RW_IMAGE(image1d_t, 1) DECL_GEN_OCL_RW_IMAGE(image1d_buffer_t, 2) DECL_GEN_OCL_RW_IMAGE(image1d_array_t, 2) DECL_GEN_OCL_RW_IMAGE(image1d_array_t, 4) DECL_GEN_OCL_RW_IMAGE(image2d_t, 2) DECL_GEN_OCL_RW_IMAGE(image2d_array_t, 3) DECL_GEN_OCL_RW_IMAGE(image3d_t, 3) DECL_GEN_OCL_RW_IMAGE(image2d_array_t, 4) DECL_GEN_OCL_RW_IMAGE(image3d_t, 4) DECL_GEN_OCL_QUERY_IMAGE(read_only image1d_t) DECL_GEN_OCL_QUERY_IMAGE(read_only image1d_buffer_t) DECL_GEN_OCL_QUERY_IMAGE(read_only image1d_array_t) DECL_GEN_OCL_QUERY_IMAGE(read_only image2d_t) DECL_GEN_OCL_QUERY_IMAGE(read_only image2d_array_t) DECL_GEN_OCL_QUERY_IMAGE(read_only image3d_t) #if __clang_major__*10 + __clang_minor__ >= 39 DECL_GEN_OCL_QUERY_IMAGE(write_only image1d_t) DECL_GEN_OCL_QUERY_IMAGE(write_only image1d_buffer_t) DECL_GEN_OCL_QUERY_IMAGE(write_only image1d_array_t) DECL_GEN_OCL_QUERY_IMAGE(write_only image2d_t) DECL_GEN_OCL_QUERY_IMAGE(write_only image2d_array_t) DECL_GEN_OCL_QUERY_IMAGE(write_only image3d_t) #endif #if (__OPENCL_C_VERSION__ >= 200) #define DECL_GEN_OCL_RW_IMAGE_WR(image_type, n) \ OVERLOADABLE int4 __gen_ocl_read_imagei(read_write image_type image, sampler_t sampler, \ float ##n coord, uint sampler_offset); \ OVERLOADABLE int4 __gen_ocl_read_imagei(read_write image_type image, sampler_t sampler, \ int ##n coord, uint sampler_offset); \ OVERLOADABLE uint4 __gen_ocl_read_imageui(read_write image_type image, sampler_t sampler, \ float ##n coord, uint sampler_offset); \ OVERLOADABLE uint4 __gen_ocl_read_imageui(read_write image_type image, sampler_t sampler, \ int ##n coord, uint sampler_offset); \ OVERLOADABLE float4 __gen_ocl_read_imagef(read_write image_type image, sampler_t sampler, \ float ##n coord, uint sampler_offset); \ OVERLOADABLE float4 __gen_ocl_read_imagef(read_write image_type image, sampler_t sampler, \ int ##n coord, uint sampler_offset); \ OVERLOADABLE void __gen_ocl_write_imagei(read_write image_type image, int ##n 
coord , int4 color); \ OVERLOADABLE void __gen_ocl_write_imageui(read_write image_type image, int ##n coord, uint4 color);\ OVERLOADABLE void __gen_ocl_write_imagef(read_write image_type image, int ##n coord, float4 color); DECL_GEN_OCL_RW_IMAGE_WR(image1d_t, 1) DECL_GEN_OCL_RW_IMAGE_WR(image1d_buffer_t, 2) DECL_GEN_OCL_RW_IMAGE_WR(image1d_array_t, 2) DECL_GEN_OCL_RW_IMAGE_WR(image1d_array_t, 4) DECL_GEN_OCL_RW_IMAGE_WR(image2d_t, 2) DECL_GEN_OCL_RW_IMAGE_WR(image2d_array_t, 3) DECL_GEN_OCL_RW_IMAGE_WR(image3d_t, 3) DECL_GEN_OCL_RW_IMAGE_WR(image2d_array_t, 4) DECL_GEN_OCL_RW_IMAGE_WR(image3d_t, 4) DECL_GEN_OCL_QUERY_IMAGE(read_write image1d_t) DECL_GEN_OCL_QUERY_IMAGE(read_write image1d_buffer_t) DECL_GEN_OCL_QUERY_IMAGE(read_write image1d_array_t) DECL_GEN_OCL_QUERY_IMAGE(read_write image2d_t) DECL_GEN_OCL_QUERY_IMAGE(read_write image2d_array_t) DECL_GEN_OCL_QUERY_IMAGE(read_write image3d_t) #endif /////////////////////////////////////////////////////////////////////////////// // helper functions to validate array index. /////////////////////////////////////////////////////////////////////////////// INLINE_OVERLOADABLE float2 __gen_validate_array_index(float2 coord, read_only image1d_array_t image) { float array_size = __gen_ocl_get_image_depth(image); coord.s1 = clamp(rint(coord.s1), 0.f, array_size - 1.f); return coord; } INLINE_OVERLOADABLE float4 __gen_validate_array_index(float4 coord, read_only image2d_array_t image) { float array_size = __gen_ocl_get_image_depth(image); coord.s2 = clamp(rint(coord.s2), 0.f, array_size - 1.f); return coord; } INLINE_OVERLOADABLE float3 __gen_validate_array_index(float3 coord, read_only image2d_array_t image) { float array_size = __gen_ocl_get_image_depth(image); coord.s2 = clamp(rint(coord.s2), 0.f, array_size - 1.f); return coord; } INLINE_OVERLOADABLE int2 __gen_validate_array_index(int2 coord, read_only image1d_array_t image) { int array_size = __gen_ocl_get_image_depth(image); coord.s1 = clamp(coord.s1, 0, array_size - 1); return coord; } INLINE_OVERLOADABLE int4 __gen_validate_array_index(int4 coord, read_only image2d_array_t image) { int array_size = __gen_ocl_get_image_depth(image); coord.s2 = clamp(coord.s2, 0, array_size - 1); return coord; } INLINE_OVERLOADABLE int3 __gen_validate_array_index(int3 coord, read_only image2d_array_t image) { int array_size = __gen_ocl_get_image_depth(image); coord.s2 = clamp(coord.s2, 0, array_size - 1); return coord; } #if __clang_major__*10 + __clang_minor__ >= 39 INLINE_OVERLOADABLE float2 __gen_validate_array_index(float2 coord, write_only image1d_array_t image) { float array_size = __gen_ocl_get_image_depth(image); coord.s1 = clamp(rint(coord.s1), 0.f, array_size - 1.f); return coord; } INLINE_OVERLOADABLE float4 __gen_validate_array_index(float4 coord, write_only image2d_array_t image) { float array_size = __gen_ocl_get_image_depth(image); coord.s2 = clamp(rint(coord.s2), 0.f, array_size - 1.f); return coord; } INLINE_OVERLOADABLE float3 __gen_validate_array_index(float3 coord, write_only image2d_array_t image) { float array_size = __gen_ocl_get_image_depth(image); coord.s2 = clamp(rint(coord.s2), 0.f, array_size - 1.f); return coord; } INLINE_OVERLOADABLE int2 __gen_validate_array_index(int2 coord, write_only image1d_array_t image) { int array_size = __gen_ocl_get_image_depth(image); coord.s1 = clamp(coord.s1, 0, array_size - 1); return coord; } INLINE_OVERLOADABLE int4 __gen_validate_array_index(int4 coord, write_only image2d_array_t image) { int array_size = __gen_ocl_get_image_depth(image); 
coord.s2 = clamp(coord.s2, 0, array_size - 1); return coord; } INLINE_OVERLOADABLE int3 __gen_validate_array_index(int3 coord, write_only image2d_array_t image) { int array_size = __gen_ocl_get_image_depth(image); coord.s2 = clamp(coord.s2, 0, array_size - 1); return coord; } #endif #if (__OPENCL_C_VERSION__ >= 200) INLINE_OVERLOADABLE float2 __gen_validate_array_index(float2 coord, read_write image1d_array_t image) { float array_size = __gen_ocl_get_image_depth(image); coord.s1 = clamp(rint(coord.s1), 0.f, array_size - 1.f); return coord; } INLINE_OVERLOADABLE float4 __gen_validate_array_index(float4 coord, read_write image2d_array_t image) { float array_size = __gen_ocl_get_image_depth(image); coord.s2 = clamp(rint(coord.s2), 0.f, array_size - 1.f); return coord; } INLINE_OVERLOADABLE float3 __gen_validate_array_index(float3 coord, read_write image2d_array_t image) { float array_size = __gen_ocl_get_image_depth(image); coord.s2 = clamp(rint(coord.s2), 0.f, array_size - 1.f); return coord; } INLINE_OVERLOADABLE int2 __gen_validate_array_index(int2 coord, read_write image1d_array_t image) { int array_size = __gen_ocl_get_image_depth(image); coord.s1 = clamp(coord.s1, 0, array_size - 1); return coord; } INLINE_OVERLOADABLE int4 __gen_validate_array_index(int4 coord, read_write image2d_array_t image) { int array_size = __gen_ocl_get_image_depth(image); coord.s2 = clamp(coord.s2, 0, array_size - 1); return coord; } INLINE_OVERLOADABLE int3 __gen_validate_array_index(int3 coord, read_write image2d_array_t image) { int array_size = __gen_ocl_get_image_depth(image); coord.s2 = clamp(coord.s2, 0, array_size - 1); return coord; } #endif // For non array image type, we need to do nothing. #define GEN_VALIDATE_ARRAY_INDEX(coord_type, image_type) \ INLINE_OVERLOADABLE coord_type __gen_validate_array_index(coord_type coord, image_type image) \ { \ return coord; \ } GEN_VALIDATE_ARRAY_INDEX(float, read_only image1d_t) GEN_VALIDATE_ARRAY_INDEX(int, read_only image1d_t) GEN_VALIDATE_ARRAY_INDEX(float2, read_only image2d_t) GEN_VALIDATE_ARRAY_INDEX(int2, read_only image2d_t) GEN_VALIDATE_ARRAY_INDEX(float4, read_only image3d_t) GEN_VALIDATE_ARRAY_INDEX(int4, read_only image3d_t) GEN_VALIDATE_ARRAY_INDEX(float3, read_only image3d_t) GEN_VALIDATE_ARRAY_INDEX(int3, read_only image3d_t) GEN_VALIDATE_ARRAY_INDEX(float, read_only image1d_buffer_t) GEN_VALIDATE_ARRAY_INDEX(int, read_only image1d_buffer_t) #if __clang_major__*10 + __clang_minor__ >= 39 GEN_VALIDATE_ARRAY_INDEX(float, write_only image1d_t) GEN_VALIDATE_ARRAY_INDEX(int, write_only image1d_t) GEN_VALIDATE_ARRAY_INDEX(float2, write_only image2d_t) GEN_VALIDATE_ARRAY_INDEX(int2, write_only image2d_t) GEN_VALIDATE_ARRAY_INDEX(float4, write_only image3d_t) GEN_VALIDATE_ARRAY_INDEX(int4, write_only image3d_t) GEN_VALIDATE_ARRAY_INDEX(float3, write_only image3d_t) GEN_VALIDATE_ARRAY_INDEX(int3, write_only image3d_t) GEN_VALIDATE_ARRAY_INDEX(float, write_only image1d_buffer_t) GEN_VALIDATE_ARRAY_INDEX(int, write_only image1d_buffer_t) #endif #if (__OPENCL_C_VERSION__ >= 200) GEN_VALIDATE_ARRAY_INDEX(float, read_write image1d_t) GEN_VALIDATE_ARRAY_INDEX(int, read_write image1d_t) GEN_VALIDATE_ARRAY_INDEX(float2, read_write image2d_t) GEN_VALIDATE_ARRAY_INDEX(int2, read_write image2d_t) GEN_VALIDATE_ARRAY_INDEX(float4, read_write image3d_t) GEN_VALIDATE_ARRAY_INDEX(int4, read_write image3d_t) GEN_VALIDATE_ARRAY_INDEX(float3, read_write image3d_t) GEN_VALIDATE_ARRAY_INDEX(int3, read_write image3d_t) GEN_VALIDATE_ARRAY_INDEX(float, read_write image1d_buffer_t) 
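/* Illustrative effect (editorial example): the array-image overloads above
 * clamp the layer index into [0, array_size - 1] before the access is built,
 * e.g. for an image2d_array_t img with array_size == 4:
 *
 *   int4 c = (int4)(x, y, 7, 0);
 *   c = __gen_validate_array_index(c, img);  // c.s2 becomes 3
 */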
GEN_VALIDATE_ARRAY_INDEX(int, read_write image1d_buffer_t)
#endif

///////////////////////////////////////////////////////////////////////////////
// Helper functions to work around some coordinate boundary issues.
// The major issue on Gen7/Gen7.5 is that the sample message cannot sample
// integer-type surfaces correctly with CLK_ADDRESS_CLAMP and
// CLK_FILTER_NEAREST. The workaround is to use an LD message instead of the
// normal sample message.
///////////////////////////////////////////////////////////////////////////////
bool __gen_ocl_sampler_need_fix(sampler_t);
bool __gen_ocl_sampler_need_rounding_fix(sampler_t);

bool __gen_sampler_need_fix(const sampler_t sampler)
{
  return __gen_ocl_sampler_need_fix(sampler);
}

bool __gen_sampler_need_rounding_fix(const sampler_t sampler)
{
  return __gen_ocl_sampler_need_rounding_fix(sampler);
}

// Nudge near-zero negative coordinates further negative (by about 2^-9) so
// that the sampler's rounding does not pull them back to 0.
INLINE_OVERLOADABLE float __gen_fixup_float_coord(float tmpCoord)
{
  if (tmpCoord < 0 && tmpCoord > -0x1p-20f)
    tmpCoord += -0x1p-9f;
  return tmpCoord;
}

INLINE_OVERLOADABLE float2 __gen_fixup_float_coord(float2 tmpCoord)
{
  if (tmpCoord.s0 < 0 && tmpCoord.s0 > -0x1p-20f)
    tmpCoord.s0 += -0x1p-9f;
  if (tmpCoord.s1 < 0 && tmpCoord.s1 > -0x1p-20f)
    tmpCoord.s1 += -0x1p-9f;
  return tmpCoord;
}

INLINE_OVERLOADABLE float3 __gen_fixup_float_coord(float3 tmpCoord)
{
  if (tmpCoord.s0 < 0 && tmpCoord.s0 > -0x1p-20f)
    tmpCoord.s0 += -0x1p-9f;
  if (tmpCoord.s1 < 0 && tmpCoord.s1 > -0x1p-20f)
    tmpCoord.s1 += -0x1p-9f;
  if (tmpCoord.s2 < 0 && tmpCoord.s2 > -0x1p-20f)
    tmpCoord.s2 += -0x1p-9f;
  return tmpCoord;
}

INLINE_OVERLOADABLE float4 __gen_fixup_float_coord(float4 tmpCoord)
{
  if (tmpCoord.s0 < 0 && tmpCoord.s0 > -0x1p-20f)
    tmpCoord.s0 += -0x1p-9f;
  if (tmpCoord.s1 < 0 && tmpCoord.s1 > -0x1p-20f)
    tmpCoord.s1 += -0x1p-9f;
  if (tmpCoord.s2 < 0 && tmpCoord.s2 > -0x1p-20f)
    tmpCoord.s2 += -0x1p-9f;
  return tmpCoord;
}

// Functions to denormalize coordinates. They are needed when we have to use
// an LD message (sampler offset is non-zero) and the incoming coordinates
// are normalized.
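// For example (editorial illustration): on a 640x480 image2d_t, the
// normalized coordinate (0.5f, 0.25f) denormalizes to (320.0f, 120.0f)
// before being converted for the LD message.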
INLINE_OVERLOADABLE float __gen_denormalize_coord(const image1d_t image, float srcCoord) { return srcCoord * __gen_ocl_get_image_width(image); } INLINE_OVERLOADABLE float2 __gen_denormalize_coord(const image1d_array_t image, float2 srcCoord) { srcCoord.s0 = srcCoord.s0 * __gen_ocl_get_image_width(image); return srcCoord; } INLINE_OVERLOADABLE float __gen_denormalize_coord(const image1d_buffer_t image, float srcCoord) { return srcCoord * __gen_ocl_get_image_width(image); } INLINE_OVERLOADABLE float2 __gen_denormalize_coord(const image2d_t image, float2 srcCoord) { srcCoord.s0 = srcCoord.s0 * __gen_ocl_get_image_width(image); srcCoord.s1 = srcCoord.s1 * __gen_ocl_get_image_height(image); return srcCoord; } INLINE_OVERLOADABLE float3 __gen_denormalize_coord(const image2d_array_t image, float3 srcCoord) { srcCoord.s0 = srcCoord.s0 * __gen_ocl_get_image_width(image); srcCoord.s1 = srcCoord.s1 * __gen_ocl_get_image_height(image); return srcCoord; } INLINE_OVERLOADABLE float3 __gen_denormalize_coord(const image3d_t image, float3 srcCoord) { srcCoord.s0 = srcCoord.s0 * __gen_ocl_get_image_width(image); srcCoord.s1 = srcCoord.s1 * __gen_ocl_get_image_height(image); srcCoord.s2 = srcCoord.s2 * __gen_ocl_get_image_depth(image); return srcCoord; } INLINE_OVERLOADABLE float4 __gen_denormalize_coord(const image2d_array_t image, float4 srcCoord) { srcCoord.s0 = srcCoord.s0 * __gen_ocl_get_image_width(image); srcCoord.s1 = srcCoord.s1 * __gen_ocl_get_image_height(image); return srcCoord; } INLINE_OVERLOADABLE float4 __gen_denormalize_coord(const image3d_t image, float4 srcCoord) { srcCoord.s0 = srcCoord.s0 * __gen_ocl_get_image_width(image); srcCoord.s1 = srcCoord.s1 * __gen_ocl_get_image_height(image); srcCoord.s2 = srcCoord.s2 * __gen_ocl_get_image_depth(image); return srcCoord; } // After denormalize, we have to fixup the negative boundary. INLINE_OVERLOADABLE float __gen_fixup_neg_boundary(float coord) { return coord < 0 ? -1 : coord; } INLINE_OVERLOADABLE float2 __gen_fixup_neg_boundary(float2 coord) { coord.s0 = coord.s0 < 0 ? -1 : coord.s0; coord.s1 = coord.s1 < 0 ? -1 : coord.s1; return coord; } INLINE_OVERLOADABLE float4 __gen_fixup_neg_boundary(float4 coord) { coord.s0 = coord.s0 < 0 ? -1 : coord.s0; coord.s1 = coord.s1 < 0 ? -1 : coord.s1; coord.s2 = coord.s2 < 0 ? -1 : coord.s2; return coord; } INLINE_OVERLOADABLE float3 __gen_fixup_neg_boundary(float3 coord) { coord.s0 = coord.s0 < 0 ? -1 : coord.s0; coord.s1 = coord.s1 < 0 ? -1 : coord.s1; coord.s2 = coord.s2 < 0 ? 
-1 : coord.s2; return coord; } /////////////////////////////////////////////////////////////////////////////// // Built-in Image Read/Write Functions /////////////////////////////////////////////////////////////////////////////// // 2D 3D Image Common Macro #ifdef GEN7_SAMPLER_CLAMP_BORDER_WORKAROUND #define GEN_FIX_FLOAT_ROUNDING 1 #define GEN_FIX_INT_CLAMPING 1 #else #define GEN_FIX_FLOAT_ROUNDING 0 #define GEN_FIX_INT_CLAMPING 0 #endif #define convert_float1 convert_float #define convert_int1 convert_int // For integer coordinates #define DECL_READ_IMAGE0(int_clamping_fix, image_type, \ image_data_type, suffix, coord_type, n) \ OVERLOADABLE image_data_type read_image ##suffix(read_only image_type cl_image, \ const sampler_t sampler, \ coord_type coord) \ { \ coord = __gen_validate_array_index(coord, cl_image); \ if (int_clamping_fix && __gen_sampler_need_fix(sampler)) \ return __gen_ocl_read_image ##suffix(cl_image, sampler, \ convert_int ##n(coord), 1); \ return __gen_ocl_read_image ##suffix(cl_image, sampler, \ convert_float ##n (coord), 0); \ } // For float coordinates #define DECL_READ_IMAGE1(int_clamping_fix, image_type, \ image_data_type, suffix, coord_type, n) \ OVERLOADABLE image_data_type read_image ##suffix(read_only image_type cl_image, \ const sampler_t sampler, \ coord_type coord) \ { \ coord_type tmpCoord = __gen_validate_array_index(coord, cl_image); \ if (GEN_FIX_FLOAT_ROUNDING | int_clamping_fix) { \ if (__gen_sampler_need_fix(sampler)) { \ if (GEN_FIX_FLOAT_ROUNDING && \ __gen_sampler_need_rounding_fix(sampler)) \ tmpCoord = __gen_fixup_float_coord(tmpCoord); \ if (int_clamping_fix) { \ if (!__gen_sampler_need_rounding_fix(sampler)) \ tmpCoord = __gen_denormalize_coord(cl_image, tmpCoord); \ tmpCoord = __gen_fixup_neg_boundary(tmpCoord); \ return __gen_ocl_read_image ##suffix( \ cl_image, sampler, convert_int ##n(tmpCoord), 1); \ } \ } \ } \ return __gen_ocl_read_image ##suffix(cl_image, sampler, \ convert_float ##n (tmpCoord), 0); \ } #define DECL_READ_IMAGE_NOSAMPLER(access_qual, image_type, image_data_type, \ suffix, coord_type, n) \ OVERLOADABLE image_data_type read_image ##suffix(access_qual image_type cl_image, \ coord_type coord) \ { \ coord = __gen_validate_array_index(coord, cl_image); \ sampler_t defaultSampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE \ | CLK_FILTER_NEAREST; \ return __gen_ocl_read_image ##suffix( \ cl_image, defaultSampler, convert_float ##n (coord), 0); \ } #define DECL_WRITE_IMAGE(access_qual, image_type, image_data_type, suffix, coord_type) \ OVERLOADABLE void write_image ##suffix(access_qual image_type cl_image, \ coord_type coord, \ image_data_type color) \ { \ coord_type fixedCoord = __gen_validate_array_index(coord, cl_image); \ __gen_ocl_write_image ##suffix(cl_image, fixedCoord, color); \ } #if (__OPENCL_C_VERSION__ >= 200) #define DECL_IMAGE(int_clamping_fix, image_type, image_data_type, suffix, n) \ DECL_READ_IMAGE0(int_clamping_fix, image_type, \ image_data_type, suffix, int ##n, n) \ DECL_READ_IMAGE1(int_clamping_fix, image_type, \ image_data_type, suffix, float ##n, n) \ DECL_READ_IMAGE_NOSAMPLER(read_only, image_type, image_data_type, suffix, int ##n, n) \ DECL_READ_IMAGE_NOSAMPLER(read_write, image_type, image_data_type, suffix, int ##n, n) \ DECL_WRITE_IMAGE(write_only, image_type, image_data_type, suffix, int ##n) \ DECL_WRITE_IMAGE(read_write, image_type, image_data_type, suffix, int ##n) #else #define DECL_IMAGE(int_clamping_fix, image_type, image_data_type, suffix, n) \ DECL_READ_IMAGE0(int_clamping_fix, 
image_type, \
                   image_data_type, suffix, int ##n, n) \
  DECL_READ_IMAGE1(int_clamping_fix, image_type, \
                   image_data_type, suffix, float ##n, n) \
  DECL_READ_IMAGE_NOSAMPLER(read_only, image_type, image_data_type, suffix, int ##n, n) \
  DECL_WRITE_IMAGE(write_only, image_type, image_data_type, suffix, int ##n)
#endif

// 1D
#define DECL_IMAGE_TYPE(image_type, n) \
  DECL_IMAGE(GEN_FIX_INT_CLAMPING, image_type, int4, i, n) \
  DECL_IMAGE(GEN_FIX_INT_CLAMPING, image_type, uint4, ui, n) \
  DECL_IMAGE(0, image_type, float4, f, n)

DECL_IMAGE_TYPE(image1d_t, 1)
DECL_IMAGE_TYPE(image2d_t, 2)
DECL_IMAGE_TYPE(image3d_t, 4)
DECL_IMAGE_TYPE(image3d_t, 3)
DECL_IMAGE_TYPE(image2d_array_t, 4)
DECL_IMAGE_TYPE(image2d_array_t, 3)

// A 1D buffer image is addressed as an 8192-wide 2D surface: the linear
// coordinate is split into (coord % 8192, coord / 8192).
#define DECL_READ_IMAGE1D_BUFFER_NOSAMPLER(access_qual, image_type, image_data_type, \
                                           suffix, coord_type) \
  OVERLOADABLE image_data_type read_image ##suffix(access_qual image_type cl_image, \
                                                   coord_type coord) \
  { \
    sampler_t defaultSampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE \
                               | CLK_FILTER_NEAREST; \
    int2 effectCoord; \
    effectCoord.s0 = coord % 8192; \
    effectCoord.s1 = coord / 8192; \
    return __gen_ocl_read_image ##suffix( \
             cl_image, defaultSampler, convert_float2(effectCoord), 0); \
  }

#define DECL_WRITE_IMAGE1D_BUFFER(access_qual, image_type, image_data_type, suffix, coord_type) \
  OVERLOADABLE void write_image ##suffix(access_qual image_type cl_image, \
                                         coord_type coord, \
                                         image_data_type color) \
  { \
    int2 effectCoord; \
    effectCoord.s0 = coord % 8192; \
    effectCoord.s1 = coord / 8192; \
    __gen_ocl_write_image ##suffix(cl_image, effectCoord, color); \
  }

#if (__OPENCL_C_VERSION__ >= 200)
#define DECL_IMAGE_1DBuffer(int_clamping_fix, image_data_type, suffix) \
  DECL_READ_IMAGE1D_BUFFER_NOSAMPLER(read_only, image1d_buffer_t, image_data_type, \
                                     suffix, int) \
  DECL_READ_IMAGE1D_BUFFER_NOSAMPLER(read_write, image1d_buffer_t, image_data_type, \
                                     suffix, int) \
  DECL_WRITE_IMAGE1D_BUFFER(write_only, image1d_buffer_t, image_data_type, suffix, int) \
  DECL_WRITE_IMAGE1D_BUFFER(read_write, image1d_buffer_t, image_data_type, suffix, int)
#else
#define DECL_IMAGE_1DBuffer(int_clamping_fix, image_data_type, suffix) \
  DECL_READ_IMAGE1D_BUFFER_NOSAMPLER(read_only, image1d_buffer_t, image_data_type, \
                                     suffix, int) \
  DECL_WRITE_IMAGE1D_BUFFER(write_only, image1d_buffer_t, image_data_type, suffix, int)
#endif

DECL_IMAGE_1DBuffer(GEN_FIX_INT_CLAMPING, int4, i)
DECL_IMAGE_1DBuffer(GEN_FIX_INT_CLAMPING, uint4, ui)
DECL_IMAGE_1DBuffer(0, float4, f)

// For 1D arrays:
// The fixup_1darray_coord functions convert a 1D array coordinate into a 2D
// array coordinate; the caller must then set the sampler offset to 2 when
// using the converted coordinate. This works around a 1D array image
// restriction: the ai (array index) cannot be set in the LD message. We
// solve it by treating the image as a 2D array and accessing it via an LD
// message as a 3D surface, using the ai as the w coordinate.
INLINE_OVERLOADABLE float4 __gen_fixup_1darray_coord(float2 coord, image1d_array_t image)
{
  float4 newCoord;
  newCoord.s0 = coord.s0 < 0 ? -1 : coord.s0;
  newCoord.s1 = 0;
  newCoord.s2 = coord.s1;
  newCoord.s3 = 0;
  return newCoord;
}

INLINE_OVERLOADABLE int4 __gen_fixup_1darray_coord(int2 coord, image1d_array_t image)
{
  int4 newCoord;
  newCoord.s0 = coord.s0;
  newCoord.s1 = 0;
  newCoord.s2 = coord.s1;
  newCoord.s3 = 0;
  return newCoord;
}

// For integer coordinates
#define DECL_READ_IMAGE0_1DArray(int_clamping_fix, \
                                 image_data_type, suffix, coord_type) \
  OVERLOADABLE image_data_type read_image ##suffix(image1d_array_t cl_image, \
                                                   const sampler_t sampler, \
                                                   coord_type coord) \
  { \
    coord = __gen_validate_array_index(coord, cl_image); \
    if (int_clamping_fix && __gen_sampler_need_fix(sampler)) { \
      int4 newCoord = __gen_fixup_1darray_coord(coord, cl_image); \
      return __gen_ocl_read_image ##suffix(cl_image, sampler, newCoord, 2); \
    } \
    return __gen_ocl_read_image ##suffix(cl_image, sampler, \
                                         convert_float2(coord), 0); \
  }

// For float coordinates
#define DECL_READ_IMAGE1_1DArray(int_clamping_fix, image_data_type, \
                                 suffix, coord_type) \
  OVERLOADABLE image_data_type read_image ##suffix(image1d_array_t cl_image, \
                                                   const sampler_t sampler, \
                                                   coord_type coord) \
  { \
    coord_type tmpCoord = __gen_validate_array_index(coord, cl_image); \
    if (GEN_FIX_FLOAT_ROUNDING | int_clamping_fix) { \
      if (__gen_sampler_need_fix(sampler)) { \
        if (GEN_FIX_FLOAT_ROUNDING && \
            __gen_sampler_need_rounding_fix(sampler)) \
          tmpCoord = __gen_fixup_float_coord(tmpCoord); \
        if (int_clamping_fix) { \
          if (!__gen_sampler_need_rounding_fix(sampler)) \
            tmpCoord = __gen_denormalize_coord(cl_image, tmpCoord); \
          float4 newCoord = __gen_fixup_1darray_coord(tmpCoord, cl_image); \
          return __gen_ocl_read_image ##suffix( \
                   cl_image, sampler, convert_int4(newCoord), 2); \
        } \
      } \
    } \
    return __gen_ocl_read_image ##suffix(cl_image, sampler, \
                                         convert_float2(tmpCoord), 0); \
  }

#if (__OPENCL_C_VERSION__ >= 200)
#define DECL_IMAGE_1DArray(int_clamping_fix, image_data_type, suffix) \
  DECL_READ_IMAGE0_1DArray(int_clamping_fix, image_data_type, suffix, int2) \
  DECL_READ_IMAGE1_1DArray(int_clamping_fix, image_data_type, \
                           suffix, float2) \
  DECL_READ_IMAGE_NOSAMPLER(read_only, image1d_array_t, image_data_type, suffix, int2, 2) \
  DECL_READ_IMAGE_NOSAMPLER(read_write, image1d_array_t, image_data_type, suffix, int2, 2) \
  DECL_WRITE_IMAGE(write_only, image1d_array_t, image_data_type, suffix, int2) \
  DECL_WRITE_IMAGE(read_write, image1d_array_t, image_data_type, suffix, int2)
#else
#define DECL_IMAGE_1DArray(int_clamping_fix, image_data_type, suffix) \
  DECL_READ_IMAGE0_1DArray(int_clamping_fix, image_data_type, suffix, int2) \
  DECL_READ_IMAGE1_1DArray(int_clamping_fix, image_data_type, \
                           suffix, float2) \
  DECL_READ_IMAGE_NOSAMPLER(read_only, image1d_array_t, image_data_type, suffix, int2, 2) \
  DECL_WRITE_IMAGE(write_only, image1d_array_t, image_data_type, suffix, int2)
#endif

DECL_IMAGE_1DArray(GEN_FIX_INT_CLAMPING, int4, i)
DECL_IMAGE_1DArray(GEN_FIX_INT_CLAMPING, uint4, ui)
DECL_IMAGE_1DArray(0, float4, f)

///////////////////////////////////////////////////////////////////////////////
// Built-in Image Query Functions
///////////////////////////////////////////////////////////////////////////////
#define DECL_IMAGE_INFO_COMMON(image_type) \
  OVERLOADABLE int get_image_channel_data_type(image_type image) \
  { \
    return __gen_ocl_get_image_channel_data_type(image); \
  } \
  OVERLOADABLE int get_image_channel_order(image_type image) \
  { \
    return __gen_ocl_get_image_channel_order(image); \
  } \
  OVERLOADABLE int get_image_width(image_type image) \
  { \
    return __gen_ocl_get_image_width(image); \
  }
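/* Usage sketch (editorial, hypothetical kernel): the query built-ins
 * declared by DECL_IMAGE_INFO_COMMON and the per-type extras below simply
 * forward to the __gen_ocl_get_image_* intrinsics, e.g.:
 *
 *   kernel void dims(read_only image2d_t img, global int2 *out) {
 *     *out = get_image_dim(img);   // (width, height)
 *   }
 */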
DECL_IMAGE_INFO_COMMON(read_only image1d_t) DECL_IMAGE_INFO_COMMON(read_only image1d_buffer_t) DECL_IMAGE_INFO_COMMON(read_only image1d_array_t) DECL_IMAGE_INFO_COMMON(read_only image2d_t) DECL_IMAGE_INFO_COMMON(read_only image3d_t) DECL_IMAGE_INFO_COMMON(read_only image2d_array_t) #if __clang_major__*10 + __clang_minor__ >= 39 DECL_IMAGE_INFO_COMMON(write_only image1d_t) DECL_IMAGE_INFO_COMMON(write_only image1d_buffer_t) DECL_IMAGE_INFO_COMMON(write_only image1d_array_t) DECL_IMAGE_INFO_COMMON(write_only image2d_t) DECL_IMAGE_INFO_COMMON(write_only image3d_t) DECL_IMAGE_INFO_COMMON(write_only image2d_array_t) #endif #if (__OPENCL_C_VERSION__ >= 200) DECL_IMAGE_INFO_COMMON(read_write image1d_t) DECL_IMAGE_INFO_COMMON(read_write image1d_buffer_t) DECL_IMAGE_INFO_COMMON(read_write image1d_array_t) DECL_IMAGE_INFO_COMMON(read_write image2d_t) DECL_IMAGE_INFO_COMMON(read_write image3d_t) DECL_IMAGE_INFO_COMMON(read_write image2d_array_t) #endif // 2D extra Info OVERLOADABLE int get_image_height(read_only image2d_t image) { return __gen_ocl_get_image_height(image); } OVERLOADABLE int2 get_image_dim(read_only image2d_t image) { return (int2){get_image_width(image), get_image_height(image)}; } #if __clang_major__*10 + __clang_minor__ >= 39 OVERLOADABLE int get_image_height(write_only image2d_t image) { return __gen_ocl_get_image_height(image); } OVERLOADABLE int2 get_image_dim(write_only image2d_t image) { return (int2){get_image_width(image), get_image_height(image)}; } #endif #if (__OPENCL_C_VERSION__ >= 200) OVERLOADABLE int get_image_height(read_write image2d_t image) { return __gen_ocl_get_image_height(image); } OVERLOADABLE int2 get_image_dim(read_write image2d_t image) { return (int2){get_image_width(image), get_image_height(image)}; } #endif // End of 2D // 3D extra Info OVERLOADABLE int get_image_height(read_only image3d_t image) { return __gen_ocl_get_image_height(image); } OVERLOADABLE int get_image_depth(read_only image3d_t image) { return __gen_ocl_get_image_depth(image); } OVERLOADABLE int4 get_image_dim(read_only image3d_t image) { return (int4) (get_image_width(image), get_image_height(image), get_image_depth(image), 0); } #if __clang_major__*10 + __clang_minor__ >= 39 OVERLOADABLE int get_image_height(write_only image3d_t image) { return __gen_ocl_get_image_height(image); } OVERLOADABLE int get_image_depth(write_only image3d_t image) { return __gen_ocl_get_image_depth(image); } OVERLOADABLE int4 get_image_dim(write_only image3d_t image) { return (int4) (get_image_width(image), get_image_height(image), get_image_depth(image), 0); } #endif #if (__OPENCL_C_VERSION__ >= 200) OVERLOADABLE int get_image_height(read_write image3d_t image) { return __gen_ocl_get_image_height(image); } OVERLOADABLE int get_image_depth(read_write image3d_t image) { return __gen_ocl_get_image_depth(image); } OVERLOADABLE int4 get_image_dim(read_write image3d_t image) { return (int4) (get_image_width(image), get_image_height(image), get_image_depth(image), 0); } #endif // 2D Array extra Info OVERLOADABLE int get_image_height(read_only image2d_array_t image) { return __gen_ocl_get_image_height(image); } OVERLOADABLE int2 get_image_dim(read_only image2d_array_t image) { return (int2){get_image_width(image), get_image_height(image)}; } OVERLOADABLE size_t get_image_array_size(read_only image2d_array_t image) { return __gen_ocl_get_image_depth(image); } #if __clang_major__*10 + __clang_minor__ >= 39 OVERLOADABLE int get_image_height(write_only image2d_array_t image) { return __gen_ocl_get_image_height(image); } 
OVERLOADABLE int2 get_image_dim(write_only image2d_array_t image) { return (int2){get_image_width(image), get_image_height(image)}; } OVERLOADABLE size_t get_image_array_size(write_only image2d_array_t image) { return __gen_ocl_get_image_depth(image); } #endif #if (__OPENCL_C_VERSION__ >= 200) OVERLOADABLE int get_image_height(read_write image2d_array_t image) { return __gen_ocl_get_image_height(image); } OVERLOADABLE int2 get_image_dim(read_write image2d_array_t image) { return (int2){get_image_width(image), get_image_height(image)}; } OVERLOADABLE size_t get_image_array_size(read_write image2d_array_t image) { return __gen_ocl_get_image_depth(image); } #endif // 1D Array info OVERLOADABLE size_t get_image_array_size(read_only image1d_array_t image) { return __gen_ocl_get_image_depth(image); } #if __clang_major__*10 + __clang_minor__ >= 39 OVERLOADABLE size_t get_image_array_size(write_only image1d_array_t image) { return __gen_ocl_get_image_depth(image); } #endif #if (__OPENCL_C_VERSION__ >= 200) OVERLOADABLE size_t get_image_array_size(read_write image1d_array_t image) { return __gen_ocl_get_image_depth(image); } #endif // End of 1DArray Beignet-1.3.2-Source/backend/src/libocl/src/ocl_barrier_20.ll000664 001750 001750 00000001751 13161142102 023001 0ustar00yryr000000 000000 ;XXX FIXME as llvm can't use macros, we hardcoded 3, 1, 2 ;here, we may need to use a more grace way to handle this type ;of values latter. ;#define CLK_LOCAL_MEM_FENCE (1 << 0) ;#define CLK_GLOBAL_MEM_FENCE (1 << 1) target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" target triple = "spir64" declare i32 @_get_local_mem_fence() nounwind alwaysinline declare i32 @_get_global_mem_fence() nounwind alwaysinline declare void @__gen_ocl_barrier_local() nounwind alwaysinline noduplicate declare void @__gen_ocl_barrier_global() nounwind alwaysinline noduplicate declare void @__gen_ocl_debugwait() nounwind alwaysinline noduplicate declare void @__gen_ocl_barrier(i32) nounwind alwaysinline noduplicate define void @_Z7barrierj(i32 %flags) nounwind noduplicate alwaysinline { call void @__gen_ocl_barrier(i32 %flags) ret void } define void @_Z9debugwaitv() nounwind noduplicate alwaysinline { call void @__gen_ocl_debugwait() ret void } Beignet-1.3.2-Source/backend/src/libocl/CMakeLists.txt000664 001750 001750 00000032701 13173554000 021643 0ustar00yryr000000 000000 PROJECT(LIBOCL) SET (OCL_OBJECT_DIR ${LIBOCL_BINARY_DIR}/${BEIGNET_INSTALL_DIR}) SET (OCL_HEADER_FILES ${OCL_OBJECT_DIR}/include/ocl_defines.h) SET (OCL_SOURCE_FILES "") SET (OCL_SOURCE_FILES_12 "") SET (OCL_SOURCE_FILES_20 "") ADD_CUSTOM_COMMAND(OUTPUT ${OCL_OBJECT_DIR}/include/ocl_defines.h COMMAND mkdir -p ${OCL_OBJECT_DIR}/include/ # COMMAND echo "cat ${LIBOCL_SOURCE_DIR}/tmpl/ocl_defines.tmpl.h \\> ${LIBOCL_BINARY_DIR}/include/ocl_defines.h" COMMAND cat ${LIBOCL_SOURCE_DIR}/tmpl/ocl_defines.tmpl.h > ${OCL_OBJECT_DIR}/include/ocl_defines.h # COMMAND echo "cat ${LIBOCL_SOURCE_DIR}/../ocl_common_defines.h \\>\\> ${LIBOCL_BINARY_DIR}/include/ocl_defines.h" COMMAND cat ${LIBOCL_SOURCE_DIR}/../ocl_common_defines.h >> ${OCL_OBJECT_DIR}/include/ocl_defines.h DEPENDS ${LIBOCL_SOURCE_DIR}/tmpl/ocl_defines.tmpl.h ${LIBOCL_SOURCE_DIR}/../ocl_common_defines.h COMMENT "Generate the header: ${LIBOCL_BINARY_DIR}/include/ocl_defines.h" ) #other module just copy. MACRO(COPY_THE_HEADER _mod) # Use the python script to generate the header files. 
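# COPY_THE_HEADER just copies a finished header from the source include/
# directory into the object directory; headers that must be generated are
# produced below by the GENERATE_HEADER_PY and GENERATE_HEADER_BASH macros.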
STRING(REGEX REPLACE "\(o.*\)" "${OCL_OBJECT_DIR}/include/\\1.h" output_name ${_mod}) STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_SOURCE_DIR}/include/\\1.h" orgin_name ${_mod}) SET(OCL_HEADER_FILES ${OCL_HEADER_FILES} ${output_name}) IF(orgin_name STREQUAL output_name) ELSE(orgin_name STREQUAL output_name) ADD_CUSTOM_COMMAND(OUTPUT ${output_name} COMMAND mkdir -p ${OCL_OBJECT_DIR}/include/ #COMMAND echo "cp ${orgin_name} ${output_name}" COMMAND cp ${orgin_name} ${output_name} DEPENDS ${orgin_name} COMMENT "Copy the header: ${output_name}" ) ENDIF(orgin_name STREQUAL output_name) ENDMACRO(COPY_THE_HEADER) MACRO(COPY_THE_SOURCE _source _mod) # Use the python script to generate the header files. STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_BINARY_DIR}/src/\\1.cl" output_name ${_mod}) STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_SOURCE_DIR}/src/\\1.cl" orgin_name ${_mod}) SET(${_source} ${${_source}} ${output_name}) IF(orgin_name STREQUAL output_name) ELSE(orgin_name STREQUAL output_name) ADD_CUSTOM_COMMAND(OUTPUT ${output_name} COMMAND mkdir -p ${LIBOCL_BINARY_DIR}/src/ #COMMAND echo "cp ${orgin_name} ${output_name}" COMMAND cp ${orgin_name} ${output_name} DEPENDS ${orgin_name} COMMENT "Copy the source: ${output_name}" ) ENDIF(orgin_name STREQUAL output_name) ENDMACRO(COPY_THE_SOURCE) SET (OCL_COPY_HEADERS ocl ocl_types ocl_float ocl_printf) FOREACH(M ${OCL_COPY_HEADERS}) COPY_THE_HEADER(${M}) ENDFOREACH(M) SET (OCL_COPY_MODULES ocl_workitem ocl_async ocl_sync ocl_memcpy ocl_memset ocl_misc ocl_geometric ocl_image ocl_work_group) FOREACH(M ${OCL_COPY_MODULES}) COPY_THE_HEADER(${M}) COPY_THE_SOURCE(OCL_SOURCE_FILES ${M}) ENDFOREACH(M) SET (OCL_COPY_MODULES_12 ocl_vload ocl_atom) FOREACH(M ${OCL_COPY_MODULES_12}) COPY_THE_HEADER(${M}) COPY_THE_SOURCE(OCL_SOURCE_FILES_12 ${M}) ENDFOREACH(M) SET (OCL_COPY_MODULES_20 ocl_vload_20 ocl_atom_20 ocl_pipe ocl_enqueue) FOREACH(M ${OCL_COPY_MODULES_20}) COPY_THE_HEADER(${M}) COPY_THE_SOURCE(OCL_SOURCE_FILES_20 ${M}) ENDFOREACH(M) MACRO(GENERATE_HEADER_PY _mod) STRING(REGEX REPLACE "\(o.*\)" "${OCL_OBJECT_DIR}/include/\\1.h" output_name ${_mod}) STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_SOURCE_DIR}/tmpl/\\1.tmpl.h" tmpl_name ${_mod}) STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_SOURCE_DIR}/script/\\1.def" def_name ${_mod}) SET(OCL_HEADER_FILES ${OCL_HEADER_FILES} ${output_name}) ADD_CUSTOM_COMMAND(OUTPUT ${output_name} COMMAND mkdir -p ${OCL_OBJECT_DIR}/include/ #COMMAND echo "cat ${tmpl_name} \\> ${output_name}" COMMAND cat ${tmpl_name} > ${output_name} #COMMAND echo "${LIBOCL_SOURCE_DIR}/script/gen_vector.py ${def_name} ${output_name} 1" COMMAND ${PYTHON_EXECUTABLE} ${LIBOCL_SOURCE_DIR}/script/gen_vector.py ${def_name} ${output_name} 1 #COMMAND echo "echo \\#endif \\>\\> ${output_name}" COMMAND echo "\\#endif" >> ${output_name} DEPENDS ${tmpl_name} ${def_name} ${LIBOCL_SOURCE_DIR}/script/gen_vector.py COMMENT "Generate the header by python: ${output_name}" ) ENDMACRO(GENERATE_HEADER_PY) MACRO(GENERATE_SOURCE_PY _source _mod) STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_BINARY_DIR}/src/\\1.cl" output_name ${_mod}) STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_SOURCE_DIR}/tmpl/\\1.tmpl.cl" tmpl_name ${_mod}) STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_SOURCE_DIR}/script/\\1.def" def_name ${_mod}) SET(${_source} ${${_source}} ${output_name}) ADD_CUSTOM_COMMAND(OUTPUT ${output_name} COMMAND mkdir -p ${LIBOCL_BINARY_DIR}/src/ COMMAND cat ${tmpl_name} > ${output_name} COMMAND ${PYTHON_EXECUTABLE} ${LIBOCL_SOURCE_DIR}/script/gen_vector.py ${def_name} ${output_name} 0 DEPENDS ${tmpl_name} 
${def_name} ${LIBOCL_SOURCE_DIR}/script/gen_vector.py COMMENT "Generate the source by python: ${output_name}" ) ENDMACRO(GENERATE_SOURCE_PY) SET (OCL_PY_GENERATED_MODULES ocl_common ocl_relational ocl_integer ocl_simd) FOREACH(M ${OCL_PY_GENERATED_MODULES}) GENERATE_HEADER_PY(${M}) GENERATE_SOURCE_PY(OCL_SOURCE_FILES ${M}) ENDFOREACH(M) SET (OCL_PY_GENERATED_MODULES_12 ocl_math) FOREACH(M ${OCL_PY_GENERATED_MODULES_12}) GENERATE_HEADER_PY(${M}) GENERATE_SOURCE_PY(OCL_SOURCE_FILES_12 ${M}) ENDFOREACH(M) SET (OCL_PY_GENERATED_MODULES_20 ocl_math_20) FOREACH(M ${OCL_PY_GENERATED_MODULES_20}) GENERATE_HEADER_PY(${M}) GENERATE_SOURCE_PY(OCL_SOURCE_FILES_20 ${M}) ENDFOREACH(M) MACRO(GENERATE_HEADER_BASH _mod) # Use the python script to generate the header files. STRING(REGEX REPLACE "\(o.*\)" "${OCL_OBJECT_DIR}/include/\\1.h" output_name ${_mod}) STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_SOURCE_DIR}/script/\\1.sh" sh_name ${_mod}) SET(OCL_HEADER_FILES ${OCL_HEADER_FILES} ${output_name}) ADD_CUSTOM_COMMAND(OUTPUT ${output_name} COMMAND mkdir -p ${OCL_OBJECT_DIR}/include/ COMMAND ${sh_name} -p > ${output_name} DEPENDS ${sh_name} COMMENT "Generate the header by script: ${output_name}" ) ENDMACRO(GENERATE_HEADER_BASH) MACRO(GENERATE_SOURCE_BASH _mod) # Use the python script to generate the header files. STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_BINARY_DIR}/src/\\1.cl" output_name ${_mod}) STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_SOURCE_DIR}/script/\\1.sh" def_name ${_mod}) SET(OCL_SOURCE_FILES ${OCL_SOURCE_FILES} ${output_name}) ADD_CUSTOM_COMMAND(OUTPUT ${output_name} COMMAND mkdir -p ${LIBOCL_BINARY_DIR}/src/ COMMAND ${sh_name} > ${output_name} DEPENDS ${sh_name} COMMENT "Generate the source by script: ${output_name}" ) ENDMACRO(GENERATE_SOURCE_BASH) SET (OCL_BASH_GENERATED_MODULES ocl_as ocl_convert) FOREACH(M ${OCL_BASH_GENERATED_MODULES}) GENERATE_HEADER_BASH(${M}) GENERATE_SOURCE_BASH(${M}) ENDFOREACH(M) SET (CLANG_OCL_FLAGS -fno-builtin -ffp-contract=off -triple spir -cl-kernel-arg-info -DGEN7_SAMPLER_CLAMP_BORDER_WORKAROUND "-cl-std=CL1.2" -D__OPENCL_C_VERSION__=120) SET (CLANG_OCL_FLAGS_20 -fno-builtin -ffp-contract=off -triple spir64 -cl-kernel-arg-info -fblocks -DGEN7_SAMPLER_CLAMP_BORDER_WORKAROUND "-cl-std=CL2.0" -D__OPENCL_C_VERSION__=200) MACRO(ADD_CL_TO_BC_TARGET _file _output _clang_flag) # CMake seems can not add pattern rule, use MACRO to replace. ADD_CUSTOM_COMMAND(OUTPUT ${_output} COMMAND mkdir -p ${OCL_OBJECT_DIR}/ #COMMAND echo ${LLVM_INSTALL_DIR}clang -cc1 ${CLANG_OCL_FLAGS} -I ${LIBOCL_BINARY_DIR}/include/ -emit-llvm-bc -o ${output_name} -x cl ${_file} COMMAND ${CLANG_EXECUTABLE} -cc1 ${_clang_flag} -I ${OCL_OBJECT_DIR}/include/ -emit-llvm-bc -o ${_output} -x cl ${_file} DEPENDS ${_file} ${OCL_HEADER_FILES} COMMENT "Compiling ${_file}" ) ENDMACRO(ADD_CL_TO_BC_TARGET) FOREACH(f ${OCL_SOURCE_FILES}) STRING(REGEX REPLACE "${LIBOCL_BINARY_DIR}/src/\(o.*\)\\.cl" "${OCL_OBJECT_DIR}/\\1.bc" bc_name ${f}) SET(OCL_BC_FILES_12 ${OCL_BC_FILES_12} ${bc_name}) ADD_CL_TO_BC_TARGET(${f} ${bc_name} "${CLANG_OCL_FLAGS}") ENDFOREACH(f) FOREACH(f ${OCL_SOURCE_FILES_12}) STRING(REGEX REPLACE "${LIBOCL_BINARY_DIR}/src/\(o.*\)\\.cl" "${OCL_OBJECT_DIR}/\\1.bc" bc_name ${f}) SET(OCL_BC_FILES_12 ${OCL_BC_FILES_12} ${bc_name}) ADD_CL_TO_BC_TARGET(${f} ${bc_name} "${CLANG_OCL_FLAGS}") ENDFOREACH(f) # handle the ll files MACRO(COPY_THE_LL _mod) # Use the python script to generate the header files. 
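# COPY_THE_LL mirrors COPY_THE_SOURCE for the hand-written LLVM IR modules
# (ocl_barrier, ocl_clz, ocl_ctz, ocl_sampler, ...); the copied .ll files
# are then assembled into bitcode with llvm-as by ADD_LL_TO_BC_TARGET below.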
STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_BINARY_DIR}/src/\\1.ll" output_name ${_mod}) STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_SOURCE_DIR}/src/\\1.ll" orgin_name ${_mod}) IF(orgin_name STREQUAL output_name) ELSE(orgin_name STREQUAL output_name) ADD_CUSTOM_COMMAND(OUTPUT ${output_name} COMMAND mkdir -p ${LIBOCL_BINARY_DIR}/src/ #COMMAND echo "cp ${orgin_name} ${output_name}" COMMAND cp ${orgin_name} ${output_name} DEPENDS ${orgin_name} COMMENT "Copy the LL file: ${output_name}" ) ENDIF(orgin_name STREQUAL output_name) ENDMACRO(COPY_THE_LL) MACRO(ADD_LL_TO_BC_TARGET M) STRING(REGEX REPLACE "\(o.*\)" "${OCL_OBJECT_DIR}/\\1.bc" output_name ${M}) STRING(REGEX REPLACE "\(o.*\)" "${LIBOCL_BINARY_DIR}/src/\\1.ll" srcll_name ${M}) ADD_CUSTOM_COMMAND(OUTPUT ${output_name} COMMAND mkdir -p ${OCL_OBJECT_DIR}/ #COMMAND echo ${LLVM_INSTALL_DIR}llvm-as -o ${output_name} ${srcll_name} COMMAND ${LLVM_AS_EXECUTABLE} -o ${output_name} ${srcll_name} DEPENDS ${srcll_name} COMMENT "Compiling ${output_name}" ) ENDMACRO(ADD_LL_TO_BC_TARGET) SET (OCL_LL_MODULES_12 ocl_barrier ocl_clz ocl_ctz ocl_sampler) FOREACH(f ${OCL_LL_MODULES_12}) COPY_THE_LL(${f}) ADD_LL_TO_BC_TARGET(${f}) STRING(REGEX REPLACE "\(o.*\)" "${OCL_OBJECT_DIR}/\\1.bc" bc_name ${f}) SET(OCL_BC_FILES_12 ${OCL_BC_FILES_12} ${bc_name}) ENDFOREACH(f) ADD_CUSTOM_COMMAND(OUTPUT ${OCL_OBJECT_DIR}/beignet.bc COMMAND mkdir -p ${LIBOCL_BINARY_DIR}/lib/ #COMMAND echo llvm-link -o ${LIBOCL_BINARY_DIR}/lib/beignet.bc ${OCL_BC_FILES12} COMMAND ${LLVM_LINK_EXECUTABLE} -o ${OCL_OBJECT_DIR}/beignet.bc ${OCL_BC_FILES_12} DEPENDS ${OCL_BC_FILES_12} COMMENT "Generate the bitcode file: ${OCL_OBJECT_DIR}/beignet.bc" ) ADD_CUSTOM_COMMAND(OUTPUT ${OCL_OBJECT_DIR}/beignet.local.pch COMMAND mkdir -p ${OCL_OBJECT_DIR} COMMAND ${CLANG_EXECUTABLE} -cc1 ${CLANG_OCL_FLAGS} -I ${OCL_OBJECT_DIR}/include/ -emit-pch -x cl ${OCL_OBJECT_DIR}/include/ocl.h -o ${OCL_OBJECT_DIR}/beignet.local.pch DEPENDS ${OCL_HEADER_FILES} COMMENT "Generate the pch file: ${OCL_OBJECT_DIR}/beignet.local.pch" ) ADD_CUSTOM_COMMAND(OUTPUT ${OCL_OBJECT_DIR}/beignet.pch COMMAND mkdir -p ${OCL_OBJECT_DIR} COMMAND ${CLANG_EXECUTABLE} -cc1 ${CLANG_OCL_FLAGS} -I ${OCL_OBJECT_DIR}/include/ --relocatable-pch -emit-pch -isysroot ${LIBOCL_BINARY_DIR} -x cl ${OCL_OBJECT_DIR}/include/ocl.h -o ${OCL_OBJECT_DIR}/beignet.pch DEPENDS ${OCL_HEADER_FILES} COMMENT "Generate the pch file: ${OCL_OBJECT_DIR}/beignet.pch" ) if (ENABLE_OPENCL_20) FOREACH(f ${OCL_SOURCE_FILES}) STRING(REGEX REPLACE "${LIBOCL_BINARY_DIR}/src/\(o.*\)\\.cl" "${OCL_OBJECT_DIR}/\\1_20.bc" bc_name ${f}) SET(OCL_BC_FILES_20 ${OCL_BC_FILES_20} ${bc_name}) ADD_CL_TO_BC_TARGET(${f} ${bc_name} "${CLANG_OCL_FLAGS_20}") ENDFOREACH(f) FOREACH(f ${OCL_SOURCE_FILES_20}) STRING(REGEX REPLACE "${LIBOCL_BINARY_DIR}/src/\(o.*\)\\.cl" "${OCL_OBJECT_DIR}/\\1.bc" bc_name ${f}) SET(OCL_BC_FILES_20 ${OCL_BC_FILES_20} ${bc_name}) ADD_CL_TO_BC_TARGET(${f} ${bc_name} "${CLANG_OCL_FLAGS_20}") ENDFOREACH(f) SET (OCL_LL_MODULES_20 ocl_barrier_20 ocl_clz_20 ocl_ctz_20 ocl_atomic_20 ocl_sampler_20) FOREACH(f ${OCL_LL_MODULES_20}) COPY_THE_LL(${f}) ADD_LL_TO_BC_TARGET(${f}) STRING(REGEX REPLACE "\(o.*\)" "${OCL_OBJECT_DIR}/\\1.bc" bc_name ${f}) SET(OCL_BC_FILES_20 ${OCL_BC_FILES_20} ${bc_name}) ENDFOREACH(f) ADD_CUSTOM_COMMAND(OUTPUT ${OCL_OBJECT_DIR}/beignet_20.bc COMMAND mkdir -p ${LIBOCL_BINARY_DIR}/lib/ #COMMAND echo llvm-link -o ${LIBOCL_BINARY_DIR}/lib/beignet.bc ${OCL_BC_FILES} COMMAND ${LLVM_LINK_EXECUTABLE} -o ${OCL_OBJECT_DIR}/beignet_20.bc ${OCL_BC_FILES_20} DEPENDS 
${OCL_BC_FILES_20} COMMENT "Generate the bitcode file: ${OCL_OBJECT_DIR}/beignet_20.bc" ) ADD_CUSTOM_COMMAND(OUTPUT ${OCL_OBJECT_DIR}/beignet_20.local.pch COMMAND mkdir -p ${OCL_OBJECT_DIR} COMMAND ${CLANG_EXECUTABLE} -cc1 ${CLANG_OCL_FLAGS_20} -I ${OCL_OBJECT_DIR}/include/ -emit-pch -x cl ${OCL_OBJECT_DIR}/include/ocl.h -o ${OCL_OBJECT_DIR}/beignet_20.local.pch DEPENDS ${OCL_HEADER_FILES} COMMENT "Generate the pch file: ${OCL_OBJECT_DIR}/beignet_20.local.pch" ) ADD_CUSTOM_COMMAND(OUTPUT ${OCL_OBJECT_DIR}/beignet_20.pch COMMAND mkdir -p ${OCL_OBJECT_DIR} COMMAND ${CLANG_EXECUTABLE} -cc1 ${CLANG_OCL_FLAGS_20} -I ${OCL_OBJECT_DIR}/include/ --relocatable-pch -emit-pch -isysroot ${LIBOCL_BINARY_DIR} -x cl ${OCL_OBJECT_DIR}/include/ocl.h -o ${OCL_OBJECT_DIR}/beignet_20.pch DEPENDS ${OCL_HEADER_FILES} COMMENT "Generate the pch file: ${OCL_OBJECT_DIR}/beignet_20.pch" ) endif (ENABLE_OPENCL_20) if (ENABLE_OPENCL_20) add_custom_target(beignet_bitcode ALL DEPENDS ${OCL_OBJECT_DIR}/beignet.bc ${OCL_OBJECT_DIR}/beignet_20.bc ${OCL_OBJECT_DIR}/beignet.pch ${OCL_OBJECT_DIR}/beignet_20.pch ${OCL_OBJECT_DIR}/beignet.local.pch ${OCL_OBJECT_DIR}/beignet_20.local.pch) else(ENABLE_OPENCL_20) add_custom_target(beignet_bitcode ALL DEPENDS ${OCL_OBJECT_DIR}/beignet.bc ${OCL_OBJECT_DIR}/beignet.pch ${OCL_OBJECT_DIR}/beignet.local.pch) endif (ENABLE_OPENCL_20) SET (OCL_OBJECT_DIR ${OCL_OBJECT_DIR} PARENT_SCOPE) SET (OCL_HEADER_FILES ${OCL_HEADER_FILES} PARENT_SCOPE) Beignet-1.3.2-Source/backend/src/libocl/include/000775 001750 001750 00000000000 13174334761 020536 5ustar00yryr000000 000000 Beignet-1.3.2-Source/backend/src/libocl/include/ocl_async.h000664 001750 001750 00000003752 13161142102 022646 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_ASYNC_H__ #define __OCL_ASYNC_H__ #include "ocl_types.h" #define DEFN(TYPE) \ OVERLOADABLE event_t async_work_group_copy (local TYPE *dst, const global TYPE *src, \ size_t num, event_t event); \ OVERLOADABLE event_t async_work_group_copy (global TYPE *dst, const local TYPE *src, \ size_t num, event_t event); \ OVERLOADABLE event_t async_work_group_strided_copy (local TYPE *dst, const global TYPE *src, \ size_t num, size_t src_stride, event_t event); \ OVERLOADABLE event_t async_work_group_strided_copy (global TYPE *dst, const local TYPE *src, \ size_t num, size_t dst_stride, event_t event); \ #define DEF(TYPE) \ DEFN(TYPE); DEFN(TYPE##2); DEFN(TYPE##3); DEFN(TYPE##4); DEFN(TYPE##8); DEFN(TYPE##16); DEF(char) DEF(uchar) DEF(short) DEF(ushort) DEF(int) DEF(uint) DEF(long) DEF(ulong) DEF(float) DEF(double) #undef DEFN #undef DEF OVERLOADABLE void wait_group_events (int num_events, event_t *event_list); #define DEFN(TYPE) \ OVERLOADABLE void prefetch(const global TYPE *p, size_t num); #define DEF(TYPE) \ DEFN(TYPE); DEFN(TYPE##2); DEFN(TYPE##3); DEFN(TYPE##4); DEFN(TYPE##8); DEFN(TYPE##16) DEF(char); DEF(uchar); DEF(short); DEF(ushort); DEF(int); DEF(uint); DEF(long); DEF(ulong); DEF(float); #undef DEFN #undef DEF #endif Beignet-1.3.2-Source/backend/src/libocl/include/ocl_types.h000664 001750 001750 00000010653 13161142102 022673 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_TYPES_H__ #define __OCL_TYPES_H__ #pragma OPENCL EXTENSION cl_khr_fp64 : enable #pragma OPENCL EXTENSION cl_khr_fp16 : enable #define DISABLE_ATOMIC_INT64 #ifndef DISABLE_ATOMIC_INT64 #pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable #pragma OPENCL EXTENSION cl_khr_int64_extended_atomics : enable #endif #include "ocl_defines.h" #define NULL 0 ///////////////////////////////////////////////////////////////////////////// // OpenCL Common Defines ///////////////////////////////////////////////////////////////////////////// #define INLINE inline __attribute__((always_inline)) #define OVERLOADABLE __attribute__((overloadable)) #define PURE __attribute__((pure)) #define CONST __attribute__((const)) #define INLINE_OVERLOADABLE inline __attribute__((overloadable,always_inline)) ///////////////////////////////////////////////////////////////////////////// // OpenCL built-in scalar data types ///////////////////////////////////////////////////////////////////////////// typedef unsigned char uchar; typedef unsigned short ushort; typedef unsigned int uint; typedef unsigned long ulong; typedef __typeof__(sizeof(int)) size_t; typedef __typeof__((int *)0-(int *)0) ptrdiff_t; #define __int_t_type(a,b,c) a##b##c #define __int_type(type,n) __int_t_type(type,n,_TYPE__) typedef __int_type(__INT,__INTPTR_WIDTH__) intptr_t; typedef __int_type(__UINT,__INTPTR_WIDTH__) uintptr_t; #undef __int_type #undef __int_t_type ///////////////////////////////////////////////////////////////////////////// // OpenCL address space ///////////////////////////////////////////////////////////////////////////// // These are built-ins in LLVM 3.3. #if 100*__clang_major__ + __clang_minor__ <= 302 #define __private __attribute__((address_space(0))) #define __global __attribute__((address_space(1))) #define __constant __attribute__((address_space(2))) #define __local __attribute__((address_space(3))) #define global __global #define local __local #define constant __constant #define private __private #endif ///////////////////////////////////////////////////////////////////////////// // OpenCL built-in vector data types ///////////////////////////////////////////////////////////////////////////// #define DEF(type) typedef type type##2 __attribute__((ext_vector_type(2)));\ typedef type type##3 __attribute__((ext_vector_type(3)));\ typedef type type##4 __attribute__((ext_vector_type(4)));\ typedef type type##8 __attribute__((ext_vector_type(8)));\ typedef type type##16 __attribute__((ext_vector_type(16))); DEF(char); DEF(uchar); DEF(short); DEF(ushort); DEF(int); DEF(uint); DEF(long); DEF(ulong); DEF(float); DEF(double); DEF(half); #undef DEF ///////////////////////////////////////////////////////////////////////////// // OpenCL atomic related types ///////////////////////////////////////////////////////////////////////////// //atomic flags #define CLK_LOCAL_MEM_FENCE (1 << 0) #define CLK_GLOBAL_MEM_FENCE (1 << 1) #define CLK_IMAGE_MEM_FENCE (1 << 2) typedef uint cl_mem_fence_flags; //memory order typedef enum { memory_order_relaxed, memory_order_acquire, memory_order_release, memory_order_acq_rel, memory_order_seq_cst } memory_order; //memory scope typedef enum { memory_scope_work_item, memory_scope_work_group, memory_scope_device, memory_scope_all_svm_devices, memory_scope_sub_group, } memory_scope; ///////////////////////////////////////////////////////////////////////////// // OpenCL built-in event types ///////////////////////////////////////////////////////////////////////////// // FIXME: // This is a 
transitional hack to bypass the LLVM 3.3 built-in types. // See the Khronos SPIR specification for handling of these types. #endif /* __OCL_TYPES_H__ */ Beignet-1.3.2-Source/backend/src/libocl/include/ocl_image.h000664 001750 001750 00000022136 13173554000 022617 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_IMAGE_H__ #define __OCL_IMAGE_H__ #include "ocl_types.h" #define int1 int #define float1 float #pragma OPENCL EXTENSION cl_khr_3d_image_writes : enable #define DECL_IMAGE_READ_SAMPLE_RETTYPE(IMG_TYPE, DATA_TYPE, SUFFIX, N) \ OVERLOADABLE DATA_TYPE read_image ## SUFFIX(IMG_TYPE cl_image, const sampler_t sampler, int##N coord); \ OVERLOADABLE DATA_TYPE read_image ## SUFFIX(IMG_TYPE cl_image, const sampler_t sampler, float##N coord); #define DECL_IMAGE_READ_NO_SAMPLE_RETTYPE(IMG_TYPE, DATA_TYPE, SUFFIX, N) \ OVERLOADABLE DATA_TYPE read_image ## SUFFIX(IMG_TYPE cl_image, int##N coord); #define DECL_IMAGE_WRITE_RETTYPE(IMG_TYPE, DATA_TYPE, SUFFIX, N) \ OVERLOADABLE void write_image ## SUFFIX(IMG_TYPE cl_image, int##N coord, DATA_TYPE color); #define DECL_IMAGE_TYPE_READ_NO_SAMPLE(IMG_TYPE, N)\ DECL_IMAGE_READ_NO_SAMPLE_RETTYPE(IMG_TYPE, int4, i, N) \ DECL_IMAGE_READ_NO_SAMPLE_RETTYPE(IMG_TYPE, uint4, ui, N) \ DECL_IMAGE_READ_NO_SAMPLE_RETTYPE(IMG_TYPE, float4, f, N) #define DECL_IMAGE_TYPE_READ_SAMPLE(IMG_TYPE, N)\ DECL_IMAGE_READ_SAMPLE_RETTYPE(IMG_TYPE, int4, i, N) \ DECL_IMAGE_READ_SAMPLE_RETTYPE(IMG_TYPE, uint4, ui, N) \ DECL_IMAGE_READ_SAMPLE_RETTYPE(IMG_TYPE, float4, f, N) #define DECL_IMAGE_TYPE_WRITE(IMG_TYPE, N)\ DECL_IMAGE_WRITE_RETTYPE(IMG_TYPE, int4, i, N) \ DECL_IMAGE_WRITE_RETTYPE(IMG_TYPE, uint4, ui, N) \ DECL_IMAGE_WRITE_RETTYPE(IMG_TYPE, float4, f, N) #if (__OPENCL_C_VERSION__ >= 200) #define DECL_IMAGE(IMG_TYPE, N) \ DECL_IMAGE_TYPE_READ_NO_SAMPLE(read_only IMG_TYPE, N) \ DECL_IMAGE_TYPE_READ_NO_SAMPLE(read_write IMG_TYPE, N) \ DECL_IMAGE_TYPE_READ_SAMPLE(read_only IMG_TYPE, N) \ DECL_IMAGE_TYPE_WRITE(write_only IMG_TYPE, N) \ DECL_IMAGE_TYPE_WRITE(read_write IMG_TYPE, N) #else #define DECL_IMAGE(IMG_TYPE, N) \ DECL_IMAGE_TYPE_READ_NO_SAMPLE(read_only IMG_TYPE, N) \ DECL_IMAGE_TYPE_READ_SAMPLE(read_only IMG_TYPE, N) \ DECL_IMAGE_TYPE_WRITE(write_only IMG_TYPE, N) #endif DECL_IMAGE(image1d_t, 1) DECL_IMAGE(image2d_t, 2) DECL_IMAGE(image1d_array_t, 2) DECL_IMAGE(image3d_t, 3) DECL_IMAGE(image3d_t, 4) DECL_IMAGE(image2d_array_t, 3) DECL_IMAGE(image2d_array_t, 4) #undef DECL_IMAGE #if (__OPENCL_C_VERSION__ >= 200) #define DECL_IMAGE(IMG_TYPE, N) \ DECL_IMAGE_TYPE_READ_NO_SAMPLE(read_only IMG_TYPE, N) \ DECL_IMAGE_TYPE_READ_NO_SAMPLE(read_write IMG_TYPE, N) \ DECL_IMAGE_TYPE_WRITE(write_only IMG_TYPE, N) \ DECL_IMAGE_TYPE_WRITE(read_write IMG_TYPE, N) #else #define DECL_IMAGE(IMG_TYPE, N) \ DECL_IMAGE_TYPE_READ_NO_SAMPLE(read_only IMG_TYPE, N) \ DECL_IMAGE_TYPE_WRITE(write_only IMG_TYPE, N) #endif
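/* Illustrative usage sketch (editorial addition, not part of the original
 * header): the macros above expand into overloaded built-in declarations such
 * as float4 read_imagef(read_only image2d_t, const sampler_t, int2) and
 * void write_imagef(write_only image2d_t, int2, float4). A kernel built on
 * them could look like the following; the kernel and argument names are
 * hypothetical.
 *
 * __kernel void invert_pixels(read_only image2d_t src,
 *                             write_only image2d_t dst,
 *                             sampler_t smp) {
 *   int2 xy = (int2)(get_global_id(0), get_global_id(1));
 *   float4 px = read_imagef(src, smp, xy);   // sampled read, float4 result
 *   write_imagef(dst, xy, (float4)(1.0f) - px); // write inverted color
 * }
 */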
DECL_IMAGE(image1d_buffer_t, 1) #undef int1 #undef float1 #undef DECL_IMAGE_TYPE_READ_NO_SAMPLE #undef DECL_IMAGE_TYPE_WRITE #undef DECL_IMAGE OVERLOADABLE int get_image_channel_data_type(read_only image1d_t image); OVERLOADABLE int get_image_channel_order(read_only image1d_t image); OVERLOADABLE int get_image_width(read_only image1d_t image); OVERLOADABLE int get_image_channel_data_type(read_only image1d_buffer_t image); OVERLOADABLE int get_image_channel_order(read_only image1d_buffer_t image); OVERLOADABLE int get_image_width(read_only image1d_buffer_t image); OVERLOADABLE int get_image_channel_data_type(read_only image2d_t image); OVERLOADABLE int get_image_channel_order(read_only image2d_t image); OVERLOADABLE int get_image_width(read_only image2d_t image); OVERLOADABLE int get_image_height(read_only image2d_t image); OVERLOADABLE int2 get_image_dim(read_only image2d_t image); OVERLOADABLE int get_image_channel_data_type(read_only image1d_array_t image); OVERLOADABLE int get_image_channel_order(read_only image1d_array_t image); OVERLOADABLE int get_image_width(read_only image1d_array_t image); OVERLOADABLE size_t get_image_array_size(read_only image1d_array_t image); OVERLOADABLE int get_image_channel_data_type(read_only image3d_t image); OVERLOADABLE int get_image_channel_order(read_only image3d_t image); OVERLOADABLE int get_image_width(read_only image3d_t image); OVERLOADABLE int get_image_height(read_only image3d_t image); OVERLOADABLE int get_image_depth(read_only image3d_t image); OVERLOADABLE int4 get_image_dim(read_only image3d_t image); OVERLOADABLE int get_image_channel_data_type(read_only image2d_array_t image); OVERLOADABLE int get_image_channel_order(read_only image2d_array_t image); OVERLOADABLE int get_image_width(read_only image2d_array_t image); OVERLOADABLE int get_image_height(read_only image2d_array_t image); OVERLOADABLE int2 get_image_dim(read_only image2d_array_t image); OVERLOADABLE size_t get_image_array_size(read_only image2d_array_t image); #if __clang_major__*10 + __clang_minor__ >= 39 OVERLOADABLE int get_image_channel_data_type(write_only image1d_t image); OVERLOADABLE int get_image_channel_order(write_only image1d_t image); OVERLOADABLE int get_image_width(write_only image1d_t image); OVERLOADABLE int get_image_channel_data_type(write_only image1d_buffer_t image); OVERLOADABLE int get_image_channel_order(write_only image1d_buffer_t image); OVERLOADABLE int get_image_width(write_only image1d_buffer_t image); OVERLOADABLE int get_image_channel_data_type(write_only image2d_t image); OVERLOADABLE int get_image_channel_order(write_only image2d_t image); OVERLOADABLE int get_image_width(write_only image2d_t image); OVERLOADABLE int get_image_height(write_only image2d_t image); OVERLOADABLE int2 get_image_dim(write_only image2d_t image); OVERLOADABLE int get_image_channel_data_type(write_only image1d_array_t image); OVERLOADABLE int get_image_channel_order(write_only image1d_array_t image); OVERLOADABLE int get_image_width(write_only image1d_array_t image); OVERLOADABLE size_t get_image_array_size(write_only image1d_array_t image); OVERLOADABLE int get_image_channel_data_type(write_only image3d_t image); OVERLOADABLE int get_image_channel_order(write_only image3d_t image); OVERLOADABLE int get_image_width(write_only image3d_t image); OVERLOADABLE int get_image_height(write_only image3d_t image); OVERLOADABLE int get_image_depth(write_only image3d_t image); OVERLOADABLE int4 get_image_dim(write_only image3d_t image); OVERLOADABLE int 
get_image_channel_data_type(write_only image2d_array_t image); OVERLOADABLE int get_image_channel_order(write_only image2d_array_t image); OVERLOADABLE int get_image_width(write_only image2d_array_t image); OVERLOADABLE int get_image_height(write_only image2d_array_t image); OVERLOADABLE int2 get_image_dim(write_only image2d_array_t image); OVERLOADABLE size_t get_image_array_size(write_only image2d_array_t image); #endif #if (__OPENCL_C_VERSION__ >= 200) OVERLOADABLE int get_image_channel_data_type(read_write image1d_t image); OVERLOADABLE int get_image_channel_order(read_write image1d_t image); OVERLOADABLE int get_image_width(read_write image1d_t image); OVERLOADABLE int get_image_channel_data_type(read_write image1d_buffer_t image); OVERLOADABLE int get_image_channel_order(read_write image1d_buffer_t image); OVERLOADABLE int get_image_width(read_write image1d_buffer_t image); OVERLOADABLE int get_image_channel_data_type(read_write image2d_t image); OVERLOADABLE int get_image_channel_order(read_write image2d_t image); OVERLOADABLE int get_image_width(read_write image2d_t image); OVERLOADABLE int get_image_height(read_write image2d_t image); OVERLOADABLE int2 get_image_dim(read_write image2d_t image); OVERLOADABLE int get_image_channel_data_type(read_write image1d_array_t image); OVERLOADABLE int get_image_channel_order(read_write image1d_array_t image); OVERLOADABLE int get_image_width(read_write image1d_array_t image); OVERLOADABLE size_t get_image_array_size(read_write image1d_array_t image); OVERLOADABLE int get_image_channel_data_type(read_write image3d_t image); OVERLOADABLE int get_image_channel_order(read_write image3d_t image); OVERLOADABLE int get_image_width(read_write image3d_t image); OVERLOADABLE int get_image_height(read_write image3d_t image); OVERLOADABLE int get_image_depth(read_write image3d_t image); OVERLOADABLE int4 get_image_dim(read_write image3d_t image); OVERLOADABLE int get_image_channel_data_type(read_write image2d_array_t image); OVERLOADABLE int get_image_channel_order(read_write image2d_array_t image); OVERLOADABLE int get_image_width(read_write image2d_array_t image); OVERLOADABLE int get_image_height(read_write image2d_array_t image); OVERLOADABLE int2 get_image_dim(read_write image2d_array_t image); OVERLOADABLE size_t get_image_array_size(read_write image2d_array_t image); #endif #endif Beignet-1.3.2-Source/backend/src/libocl/include/ocl_work_group.h000664 001750 001750 00000013065 13161142102 023725 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_WORK_GROUP_H__ #define __OCL_WORK_GROUP_H__ #include "ocl_types.h" int work_group_all(int predicate); int work_group_any(int predicate); /* broadcast */ OVERLOADABLE int work_group_broadcast(int a, size_t local_id); OVERLOADABLE uint work_group_broadcast(uint a, size_t local_id); OVERLOADABLE long work_group_broadcast(long a, size_t local_id); OVERLOADABLE ulong work_group_broadcast(ulong a, size_t local_id); OVERLOADABLE float work_group_broadcast(float a, size_t local_id); OVERLOADABLE double work_group_broadcast(double a, size_t local_id); OVERLOADABLE int work_group_broadcast(int a, size_t local_id_x, size_t local_id_y); OVERLOADABLE uint work_group_broadcast(uint a, size_t local_id_x, size_t local_id_y); OVERLOADABLE long work_group_broadcast(long a, size_t local_id_x, size_t local_id_y); OVERLOADABLE ulong work_group_broadcast(ulong a, size_t local_id_x, size_t local_id_y); OVERLOADABLE float work_group_broadcast(float a, size_t local_id_x, size_t local_id_y); OVERLOADABLE double work_group_broadcast(double a, size_t local_id_x, size_t local_id_y); OVERLOADABLE int work_group_broadcast(int a, size_t local_id_x, size_t local_id_y, size_t local_id_z); OVERLOADABLE uint work_group_broadcast(uint a, size_t local_id_x, size_t local_id_y, size_t local_id_z); OVERLOADABLE long work_group_broadcast(long a, size_t local_id_x, size_t local_id_y, size_t local_id_z); OVERLOADABLE ulong work_group_broadcast(ulong a, size_t local_id_x, size_t local_id_y, size_t local_id_z); OVERLOADABLE float work_group_broadcast(float a, size_t local_id_x, size_t local_id_y, size_t local_id_z); OVERLOADABLE double work_group_broadcast(double a, size_t local_id_x, size_t local_id_y, size_t local_id_z); /* reduce add */ OVERLOADABLE int work_group_reduce_add(int x); OVERLOADABLE uint work_group_reduce_add(uint x); OVERLOADABLE long work_group_reduce_add(long x); OVERLOADABLE ulong work_group_reduce_add(ulong x); OVERLOADABLE float work_group_reduce_add(float x); OVERLOADABLE double work_group_reduce_add(double x); /* reduce min */ OVERLOADABLE int work_group_reduce_min(int x); OVERLOADABLE uint work_group_reduce_min(uint x); OVERLOADABLE long work_group_reduce_min(long x); OVERLOADABLE ulong work_group_reduce_min(ulong x); OVERLOADABLE float work_group_reduce_min(float x); OVERLOADABLE double work_group_reduce_min(double x); /* reduce max */ OVERLOADABLE int work_group_reduce_max(int x); OVERLOADABLE uint work_group_reduce_max(uint x); OVERLOADABLE long work_group_reduce_max(long x); OVERLOADABLE ulong work_group_reduce_max(ulong x); OVERLOADABLE float work_group_reduce_max(float x); OVERLOADABLE double work_group_reduce_max(double x); /* scan_inclusive add */ OVERLOADABLE int work_group_scan_inclusive_add(int x); OVERLOADABLE uint work_group_scan_inclusive_add(uint x); OVERLOADABLE long work_group_scan_inclusive_add(long x); OVERLOADABLE ulong work_group_scan_inclusive_add(ulong x); OVERLOADABLE float work_group_scan_inclusive_add(float x); OVERLOADABLE double work_group_scan_inclusive_add(double x); /* scan_inclusive min */ OVERLOADABLE int work_group_scan_inclusive_min(int x); OVERLOADABLE uint work_group_scan_inclusive_min(uint x); OVERLOADABLE long work_group_scan_inclusive_min(long x); OVERLOADABLE ulong work_group_scan_inclusive_min(ulong x); OVERLOADABLE float work_group_scan_inclusive_min(float x); OVERLOADABLE double work_group_scan_inclusive_min(double x); /* scan_inclusive max */ OVERLOADABLE int work_group_scan_inclusive_max(int x); OVERLOADABLE uint 
work_group_scan_inclusive_max(uint x); OVERLOADABLE long work_group_scan_inclusive_max(long x); OVERLOADABLE ulong work_group_scan_inclusive_max(ulong x); OVERLOADABLE float work_group_scan_inclusive_max(float x); OVERLOADABLE double work_group_scan_inclusive_max(double x); /* scan_exclusive add */ OVERLOADABLE int work_group_scan_exclusive_add(int x); OVERLOADABLE uint work_group_scan_exclusive_add(uint x); OVERLOADABLE long work_group_scan_exclusive_add(long x); OVERLOADABLE ulong work_group_scan_exclusive_add(ulong x); OVERLOADABLE float work_group_scan_exclusive_add(float x); OVERLOADABLE double work_group_scan_exclusive_add(double x); /* scan_exclusive min */ OVERLOADABLE int work_group_scan_exclusive_min(int x); OVERLOADABLE uint work_group_scan_exclusive_min(uint x); OVERLOADABLE long work_group_scan_exclusive_min(long x); OVERLOADABLE ulong work_group_scan_exclusive_min(ulong x); OVERLOADABLE float work_group_scan_exclusive_min(float x); OVERLOADABLE double work_group_scan_exclusive_min(double x); /* scan_exclusive max */ OVERLOADABLE int work_group_scan_exclusive_max(int x); OVERLOADABLE uint work_group_scan_exclusive_max(uint x); OVERLOADABLE long work_group_scan_exclusive_max(long x); OVERLOADABLE ulong work_group_scan_exclusive_max(ulong x); OVERLOADABLE float work_group_scan_exclusive_max(float x); OVERLOADABLE double work_group_scan_exclusive_max(double x); #endif /* __OCL_WORK_GROUP_H__ */ Beignet-1.3.2-Source/backend/src/libocl/include/ocl_enqueue.h000664 001750 001750 00000007642 13173554000 023211 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_ENQUEUE_H__ #define __OCL_ENQUEUE_H__ #include "ocl_types.h" #define CLK_ENQUEUE_FLAGS_WAIT_KERNEL 0 #define CLK_ENQUEUE_FLAGS_NO_WAIT 1 #define CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP 2 #define CLK_SUCCESS 0 #define CL_COMPLETE 0 #define CLK_PROFILING_COMMAND_EXEC_TIME 0 struct ndrange_info_t { int type; int global_work_size[3]; int local_work_size[3]; int global_work_offset[3]; }; struct Block_literal { void *isa; // initialized to &_NSConcreteStackBlock or &_NSConcreteGlobalBlock int flags; int reserved; __global void* invoke; struct Block_descriptor_1 { unsigned long int reserved; // NULL unsigned long int size; // sizeof(struct Block_literal_1) // optional helper functions void *copy_helper; // IFF (1<<25) void *dispose_helper; // IFF (1<<25) // required ABI.2010.3.16 const char *signature; // IFF (1<<30) } *descriptor; // imported variables }; #if __clang_major__*10 + __clang_minor__ >= 50 typedef struct ndrange_info_t ndrange_t; #endif #if __clang_major__*10 + __clang_minor__ >= 50 #define BLOCK_TYPE void* #else #define BLOCK_TYPE __private void* #endif #if __clang_major__*10 + __clang_minor__ >= 40 #define EVENT_TYPE clk_event_t* #else #define EVENT_TYPE __private clk_event_t* #endif clk_event_t create_user_event(void); void retain_event(clk_event_t event); void release_event(clk_event_t event); void set_user_event_status(clk_event_t event, int status); bool is_valid_event(clk_event_t event); void capture_event_profiling_info(clk_event_t event, int name, global void *value); uint __get_kernel_work_group_size_impl(BLOCK_TYPE block); uint __get_kernel_preferred_work_group_multiple_impl(BLOCK_TYPE block); int __enqueue_kernel_basic(queue_t q, int flag, ndrange_t ndrange, BLOCK_TYPE block); int __enqueue_kernel_basic_events(queue_t q, int flag, ndrange_t ndrange, uint num_events_in_wait_list, const EVENT_TYPE event_wait_list, EVENT_TYPE event_ret, BLOCK_TYPE block); queue_t get_default_queue(void); int __gen_enqueue_kernel_slm(queue_t q, int flag, ndrange_t ndrange, BLOCK_TYPE block, int count, __private int* slm_sizes); OVERLOADABLE ndrange_t ndrange_1D(size_t global_work_size); OVERLOADABLE ndrange_t ndrange_1D(size_t global_work_size, size_t local_work_size); OVERLOADABLE ndrange_t ndrange_1D(size_t global_work_offset, size_t global_work_size, size_t local_work_size); OVERLOADABLE ndrange_t ndrange_2D(const size_t global_work_size[2]); OVERLOADABLE ndrange_t ndrange_2D(const size_t global_work_size[2], const size_t local_work_size[2]); OVERLOADABLE ndrange_t ndrange_2D(const size_t global_work_offset[2], const size_t global_work_size[2], const size_t local_work_size[2]); OVERLOADABLE ndrange_t ndrange_3D(const size_t global_work_size[3]); OVERLOADABLE ndrange_t ndrange_3D(const size_t global_work_size[3], const size_t local_work_size[3]); OVERLOADABLE ndrange_t ndrange_3D(const size_t global_work_offset[3], const size_t global_work_size[3], const size_t local_work_size[3]); int enqueue_marker (queue_t queue, uint num_events_in_wait_list, const clk_event_t *event_wait_list, clk_event_t *event_ret); #endif Beignet-1.3.2-Source/backend/src/libocl/include/ocl_geometric.h000664 001750 001750 00000004436 13161142102 023507 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. 
* * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_GEOMETRIC_H__ #define __OCL_GEOMETRIC_H__ #include "ocl_types.h" OVERLOADABLE float dot(float p0, float p1); OVERLOADABLE float dot(float2 p0, float2 p1); OVERLOADABLE float dot(float3 p0, float3 p1); OVERLOADABLE float dot(float4 p0, float4 p1); OVERLOADABLE half dot(half p0, half p1); OVERLOADABLE half dot(half2 p0, half2 p1); OVERLOADABLE half dot(half3 p0, half3 p1); OVERLOADABLE half dot(half4 p0, half4 p1); OVERLOADABLE float length(float x); OVERLOADABLE float length(float2 x); OVERLOADABLE float length(float3 x); OVERLOADABLE float length(float4 x); OVERLOADABLE float distance(float x, float y); OVERLOADABLE float distance(float2 x, float2 y); OVERLOADABLE float distance(float3 x, float3 y); OVERLOADABLE float distance(float4 x, float4 y); OVERLOADABLE float normalize(float x); OVERLOADABLE float2 normalize(float2 x); OVERLOADABLE float3 normalize(float3 x); OVERLOADABLE float4 normalize(float4 x); OVERLOADABLE float fast_length(float x); OVERLOADABLE float fast_length(float2 x); OVERLOADABLE float fast_length(float3 x); OVERLOADABLE float fast_length(float4 x); OVERLOADABLE float fast_distance(float x, float y); OVERLOADABLE float fast_distance(float2 x, float2 y); OVERLOADABLE float fast_distance(float3 x, float3 y); OVERLOADABLE float fast_distance(float4 x, float4 y); OVERLOADABLE float fast_normalize(float x); OVERLOADABLE float2 fast_normalize(float2 x); OVERLOADABLE float3 fast_normalize(float3 x); OVERLOADABLE float4 fast_normalize(float4 x); OVERLOADABLE float3 cross(float3 v0, float3 v1); OVERLOADABLE float4 cross(float4 v0, float4 v1); #endif Beignet-1.3.2-Source/backend/src/libocl/include/ocl_memcpy.h000664 001750 001750 00000005577 13161142102 023032 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_MEMCPY_H__ #define __OCL_MEMCPY_H__ #include "ocl_types.h" ///////////////////////////////////////////////////////////////////////////// // memcopy functions ///////////////////////////////////////////////////////////////////////////// void __gen_memcpy_gg_align(__global uchar* dst, __global uchar* src, size_t size); void __gen_memcpy_gp_align(__global uchar* dst, __private uchar* src, size_t size); void __gen_memcpy_gl_align(__global uchar* dst, __local uchar* src, size_t size); void __gen_memcpy_gc_align(__global uchar* dst, __constant uchar* src, size_t size); void __gen_memcpy_pg_align(__private uchar* dst, __global uchar* src, size_t size); void __gen_memcpy_pp_align(__private uchar* dst, __private uchar* src, size_t size); void __gen_memcpy_pl_align(__private uchar* dst, __local uchar* src, size_t size); void __gen_memcpy_pc_align(__private uchar* dst, __constant uchar* src, size_t size); void __gen_memcpy_lg_align(__local uchar* dst, __global uchar* src, size_t size); void __gen_memcpy_lp_align(__local uchar* dst, __private uchar* src, size_t size); void __gen_memcpy_ll_align(__local uchar* dst, __local uchar* src, size_t size); void __gen_memcpy_lc_align(__local uchar* dst, __constant uchar* src, size_t size); void __gen_memcpy_gg(__global uchar* dst, __global uchar* src, size_t size); void __gen_memcpy_gp(__global uchar* dst, __private uchar* src, size_t size); void __gen_memcpy_gl(__global uchar* dst, __local uchar* src, size_t size); void __gen_memcpy_gc(__global uchar* dst, __constant uchar* src, size_t size); void __gen_memcpy_pg(__private uchar* dst, __global uchar* src, size_t size); void __gen_memcpy_pp(__private uchar* dst, __private uchar* src, size_t size); void __gen_memcpy_pl(__private uchar* dst, __local uchar* src, size_t size); void __gen_memcpy_pc(__private uchar* dst, __constant uchar* src, size_t size); void __gen_memcpy_lg(__local uchar* dst, __global uchar* src, size_t size); void __gen_memcpy_lp(__local uchar* dst, __private uchar* src, size_t size); void __gen_memcpy_ll(__local uchar* dst, __local uchar* src, size_t size); void __gen_memcpy_lc(__local uchar* dst, __constant uchar* src, size_t size); #endif /* __OCL_MEMCPY_H__ */ Beignet-1.3.2-Source/backend/src/libocl/include/ocl_sync.h000664 001750 001750 00000002510 13161142102 022474 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_SYNC_H__ #define __OCL_SYNC_H__ #include "ocl_types.h" ///////////////////////////////////////////////////////////////////////////// // Synchronization functions ///////////////////////////////////////////////////////////////////////////// OVERLOADABLE void barrier(cl_mem_fence_flags flags); OVERLOADABLE void debugwait(void); OVERLOADABLE void mem_fence(cl_mem_fence_flags flags); OVERLOADABLE void read_mem_fence(cl_mem_fence_flags flags); OVERLOADABLE void write_mem_fence(cl_mem_fence_flags flags); #define work_group_barrier barrier cl_mem_fence_flags get_fence(void *ptr); #endif /* __OCL_SYNC_H__ */ Beignet-1.3.2-Source/backend/src/libocl/include/ocl_atom.h000664 001750 001750 00000010472 13161142102 022466 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_ATOM_H__ #define __OCL_ATOM_H__ #include "ocl_types.h" ///////////////////////////////////////////////////////////////////////////// // Atomic functions ///////////////////////////////////////////////////////////////////////////// OVERLOADABLE uint atomic_add(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_add(volatile __local uint *p, uint val); OVERLOADABLE int atomic_add(volatile __global int *p, int val); OVERLOADABLE int atomic_add(volatile __local int *p, int val); OVERLOADABLE uint atomic_sub(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_sub(volatile __local uint *p, uint val); OVERLOADABLE int atomic_sub(volatile __global int *p, int val); OVERLOADABLE int atomic_sub(volatile __local int *p, int val); OVERLOADABLE uint atomic_and(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_and(volatile __local uint *p, uint val); OVERLOADABLE int atomic_and(volatile __global int *p, int val); OVERLOADABLE int atomic_and(volatile __local int *p, int val); OVERLOADABLE uint atomic_or(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_or(volatile __local uint *p, uint val); OVERLOADABLE int atomic_or(volatile __global int *p, int val); OVERLOADABLE int atomic_or(volatile __local int *p, int val); OVERLOADABLE uint atomic_xor(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_xor(volatile __local uint *p, uint val); OVERLOADABLE int atomic_xor(volatile __global int *p, int val); OVERLOADABLE int atomic_xor(volatile __local int *p, int val); OVERLOADABLE uint atomic_xchg(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_xchg(volatile __local uint *p, uint val); OVERLOADABLE int atomic_xchg(volatile __global int *p, int val); OVERLOADABLE int atomic_xchg(volatile __local int *p, int val); OVERLOADABLE int atomic_min(volatile __global int *p, int val); OVERLOADABLE int atomic_min(volatile __local int *p, int val); OVERLOADABLE int atomic_max(volatile __global int *p, int val); OVERLOADABLE int atomic_max(volatile __local int *p, int val); OVERLOADABLE uint atomic_min(volatile __global uint 
*p, uint val); OVERLOADABLE uint atomic_min(volatile __local uint *p, uint val); OVERLOADABLE uint atomic_max(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_max(volatile __local uint *p, uint val); OVERLOADABLE float atomic_xchg (volatile __global float *p, float val); OVERLOADABLE float atomic_xchg (volatile __local float *p, float val); OVERLOADABLE uint atomic_inc (volatile __global uint *p); OVERLOADABLE uint atomic_inc (volatile __local uint *p); OVERLOADABLE int atomic_inc (volatile __global int *p); OVERLOADABLE int atomic_inc (volatile __local int *p); OVERLOADABLE uint atomic_dec (volatile __global uint *p); OVERLOADABLE uint atomic_dec (volatile __local uint *p); OVERLOADABLE int atomic_dec (volatile __global int *p); OVERLOADABLE int atomic_dec (volatile __local int *p); OVERLOADABLE uint atomic_cmpxchg (volatile __global uint *p, uint cmp, uint val); OVERLOADABLE uint atomic_cmpxchg (volatile __local uint *p, uint cmp, uint val); OVERLOADABLE int atomic_cmpxchg (volatile __global int *p, int cmp, int val); OVERLOADABLE int atomic_cmpxchg (volatile __local int *p, int cmp, int val); // XXX for conformance test // The following atom_xxx api is on OpenCL spec 1.0. #define atom_add atomic_add #define atom_sub atomic_sub #define atom_and atomic_and #define atom_or atomic_or #define atom_xor atomic_xor #define atom_xchg atomic_xchg #define atom_min atomic_min #define atom_max atomic_max #define atom_inc atomic_inc #define atom_dec atomic_dec #define atom_cmpxchg atomic_cmpxchg #endif /* __OCL_ATOM_H__ */ Beignet-1.3.2-Source/backend/src/libocl/include/ocl_vload_20.h000664 001750 001750 00000014523 13161142102 023135 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_VLOAD_20_H__ #define __OCL_VLOAD_20_H__ #include "ocl_types.h" ///////////////////////////////////////////////////////////////////////////// // Vector loads and stores ///////////////////////////////////////////////////////////////////////////// // These loads and stores will use untyped reads and writes, so we can just // cast to vector loads / stores. Not C99 compliant BTW due to aliasing issue. 
// Well we do not care, we do not activate TBAA in the compiler #define DECL_UNTYPED_RW_SPACE_N(TYPE, DIM, SPACE) \ OVERLOADABLE TYPE##DIM vload##DIM(size_t offset, const SPACE TYPE *p); \ OVERLOADABLE void vstore##DIM(TYPE##DIM v, size_t offset, SPACE TYPE *p); #define DECL_UNTYPED_RD_SPACE_N(TYPE, DIM, SPACE) \ OVERLOADABLE TYPE##DIM vload##DIM(size_t offset, const SPACE TYPE *p); #define DECL_UNTYPED_V3_SPACE(TYPE, SPACE) \ OVERLOADABLE void vstore3(TYPE##3 v, size_t offset, SPACE TYPE *p); \ OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p); #define DECL_UNTYPED_RDV3_SPACE(TYPE, SPACE) \ OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p); #define DECL_UNTYPED_RW_ALL_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 2, SPACE) \ DECL_UNTYPED_V3_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 4, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 8, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 16, SPACE) #define DECL_UNTYPED_RD_ALL_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 2, SPACE) \ DECL_UNTYPED_RDV3_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 4, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 8, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 16, SPACE) #define DECL_UNTYPED_RW_ALL(TYPE) \ DECL_UNTYPED_RD_ALL_SPACE(TYPE, __constant) \ DECL_UNTYPED_RW_ALL_SPACE(TYPE, __generic) #define DECL_BYTE_RD_SPACE(TYPE, SPACE) \ OVERLOADABLE TYPE##2 vload2(size_t offset, const SPACE TYPE *p); \ OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p); \ OVERLOADABLE TYPE##4 vload4(size_t offset, const SPACE TYPE *p); \ OVERLOADABLE TYPE##8 vload8(size_t offset, const SPACE TYPE *p); \ OVERLOADABLE TYPE##16 vload16(size_t offset, const SPACE TYPE *p); #define DECL_BYTE_WR_SPACE(TYPE, SPACE) \ OVERLOADABLE void vstore2(TYPE##2 v, size_t offset, SPACE TYPE *p); \ OVERLOADABLE void vstore3(TYPE##3 v, size_t offset, SPACE TYPE *p); \ OVERLOADABLE void vstore4(TYPE##4 v, size_t offset, SPACE TYPE *p); \ OVERLOADABLE void vstore8(TYPE##8 v, size_t offset, SPACE TYPE *p); \ OVERLOADABLE void vstore16(TYPE##16 v, size_t offset, SPACE TYPE *p); #define DECL_BYTE_RW_ALL(TYPE) \ DECL_BYTE_RD_SPACE(TYPE, __generic) \ DECL_BYTE_WR_SPACE(TYPE, __generic) \ DECL_BYTE_RD_SPACE(TYPE, __constant) DECL_BYTE_RW_ALL(char) DECL_BYTE_RW_ALL(uchar) DECL_BYTE_RW_ALL(short) DECL_BYTE_RW_ALL(ushort) DECL_BYTE_RW_ALL(half) DECL_UNTYPED_RW_ALL(int) DECL_UNTYPED_RW_ALL(uint) DECL_UNTYPED_RW_ALL(long) DECL_UNTYPED_RW_ALL(ulong) DECL_UNTYPED_RW_ALL(float) DECL_UNTYPED_RW_ALL(double) #undef DECL_UNTYPED_RW_ALL #undef DECL_UNTYPED_RW_ALL_SPACE #undef DECL_UNTYPED_RD_ALL_SPACE #undef DECL_UNTYPED_RW_SPACE_N #undef DECL_UNTYPED_RD_SPACE_N #undef DECL_UNTYPED_V3_SPACE #undef DECL_UNTYPED_RDV3_SPACE #undef DECL_BYTE_RD_SPACE #undef DECL_BYTE_WR_SPACE #undef DECL_BYTE_RW_ALL #define DECL_HALF_LD_SPACE(SPACE) \ OVERLOADABLE float vload_half(size_t offset, const SPACE half *p); \ OVERLOADABLE float vloada_half(size_t offset, const SPACE half *p); \ OVERLOADABLE float2 vload_half2(size_t offset, const SPACE half *p); \ OVERLOADABLE float2 vloada_half2(size_t offset, const SPACE half *p); \ OVERLOADABLE float3 vload_half3(size_t offset, const SPACE half *p); \ OVERLOADABLE float3 vloada_half3(size_t offset, const SPACE half *p); \ OVERLOADABLE float4 vload_half4(size_t offset, const SPACE half *p); \ OVERLOADABLE float4 vloada_half4(size_t offset, const SPACE half *p); \ OVERLOADABLE float8 vload_half8(size_t offset, const SPACE half *p); \ OVERLOADABLE float8 vloada_half8(size_t offset, const SPACE half *p); \ OVERLOADABLE 
float16 vload_half16(size_t offset, const SPACE half *p); \ OVERLOADABLE float16 vloada_half16(size_t offset, const SPACE half *p); \ #define DECL_HALF_ST_SPACE_ROUND(SPACE, ROUND, FUNC) \ OVERLOADABLE void vstore_half##ROUND(float data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half##ROUND(float data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstore_half2##ROUND(float2 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half2##ROUND(float2 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstore_half3##ROUND(float3 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half3##ROUND(float3 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstore_half4##ROUND(float4 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half4##ROUND(float4 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstore_half8##ROUND(float8 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half8##ROUND(float8 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstore_half16##ROUND(float16 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half16##ROUND(float16 data, size_t offset, SPACE half *p); #define DECL_HALF_ST_SPACE(SPACE) \ DECL_HALF_ST_SPACE_ROUND(SPACE, , dummy) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rte, dummy) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rtz, dummy) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rtp, dummy) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rtn, dummy) \ DECL_HALF_LD_SPACE(__constant) DECL_HALF_LD_SPACE(__generic) DECL_HALF_ST_SPACE(__generic) //#undef DECL_UNTYPED_RW_ALL_SPACE #undef DECL_HALF_LD_SPACE #undef DECL_HALF_ST_SPACE #undef DECL_HALF_ST_SPACE_ROUND #endif /* __OCL_VLOAD_20_H__ */ Beignet-1.3.2-Source/backend/src/libocl/include/ocl_vload.h000664 001750 001750 00000015261 13161142102 022634 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_VLOAD_H__ #define __OCL_VLOAD_H__ #include "ocl_types.h" ///////////////////////////////////////////////////////////////////////////// // Vector loads and stores ///////////////////////////////////////////////////////////////////////////// // These loads and stores will use untyped reads and writes, so we can just // cast to vector loads / stores. Not C99 compliant BTW due to aliasing issue. 
// Well we do not care, we do not activate TBAA in the compiler #define DECL_UNTYPED_RW_SPACE_N(TYPE, DIM, SPACE) \ OVERLOADABLE TYPE##DIM vload##DIM(size_t offset, const SPACE TYPE *p); \ OVERLOADABLE void vstore##DIM(TYPE##DIM v, size_t offset, SPACE TYPE *p); #define DECL_UNTYPED_RD_SPACE_N(TYPE, DIM, SPACE) \ OVERLOADABLE TYPE##DIM vload##DIM(size_t offset, const SPACE TYPE *p); #define DECL_UNTYPED_V3_SPACE(TYPE, SPACE) \ OVERLOADABLE void vstore3(TYPE##3 v, size_t offset, SPACE TYPE *p); \ OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p); #define DECL_UNTYPED_RDV3_SPACE(TYPE, SPACE) \ OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p); #define DECL_UNTYPED_RW_ALL_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 2, SPACE) \ DECL_UNTYPED_V3_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 4, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 8, SPACE) \ DECL_UNTYPED_RW_SPACE_N(TYPE, 16, SPACE) #define DECL_UNTYPED_RD_ALL_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 2, SPACE) \ DECL_UNTYPED_RDV3_SPACE(TYPE, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 4, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 8, SPACE) \ DECL_UNTYPED_RD_SPACE_N(TYPE, 16, SPACE) #define DECL_UNTYPED_RW_ALL(TYPE) \ DECL_UNTYPED_RW_ALL_SPACE(TYPE, __global) \ DECL_UNTYPED_RW_ALL_SPACE(TYPE, __local) \ DECL_UNTYPED_RD_ALL_SPACE(TYPE, __constant) \ DECL_UNTYPED_RW_ALL_SPACE(TYPE, __private) #define DECL_BYTE_RD_SPACE(TYPE, SPACE) \ OVERLOADABLE TYPE##2 vload2(size_t offset, const SPACE TYPE *p); \ OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p); \ OVERLOADABLE TYPE##4 vload4(size_t offset, const SPACE TYPE *p); \ OVERLOADABLE TYPE##8 vload8(size_t offset, const SPACE TYPE *p); \ OVERLOADABLE TYPE##16 vload16(size_t offset, const SPACE TYPE *p); #define DECL_BYTE_WR_SPACE(TYPE, SPACE) \ OVERLOADABLE void vstore2(TYPE##2 v, size_t offset, SPACE TYPE *p); \ OVERLOADABLE void vstore3(TYPE##3 v, size_t offset, SPACE TYPE *p); \ OVERLOADABLE void vstore4(TYPE##4 v, size_t offset, SPACE TYPE *p); \ OVERLOADABLE void vstore8(TYPE##8 v, size_t offset, SPACE TYPE *p); \ OVERLOADABLE void vstore16(TYPE##16 v, size_t offset, SPACE TYPE *p); #define DECL_BYTE_RW_ALL(TYPE) \ DECL_BYTE_RD_SPACE(TYPE, __global) \ DECL_BYTE_RD_SPACE(TYPE, __local) \ DECL_BYTE_RD_SPACE(TYPE, __private) \ DECL_BYTE_RD_SPACE(TYPE, __constant) \ DECL_BYTE_WR_SPACE(TYPE, __global) \ DECL_BYTE_WR_SPACE(TYPE, __local) \ DECL_BYTE_WR_SPACE(TYPE, __private) DECL_BYTE_RW_ALL(char) DECL_BYTE_RW_ALL(uchar) DECL_BYTE_RW_ALL(short) DECL_BYTE_RW_ALL(ushort) DECL_BYTE_RW_ALL(half) DECL_UNTYPED_RW_ALL(int) DECL_UNTYPED_RW_ALL(uint) DECL_UNTYPED_RW_ALL(long) DECL_UNTYPED_RW_ALL(ulong) DECL_UNTYPED_RW_ALL(float) DECL_UNTYPED_RW_ALL(double) #undef DECL_UNTYPED_RW_ALL #undef DECL_UNTYPED_RW_ALL_SPACE #undef DECL_UNTYPED_RD_ALL_SPACE #undef DECL_UNTYPED_RW_SPACE_N #undef DECL_UNTYPED_RD_SPACE_N #undef DECL_UNTYPED_V3_SPACE #undef DECL_UNTYPED_RDV3_SPACE #undef DECL_BYTE_RD_SPACE #undef DECL_BYTE_WR_SPACE #undef DECL_BYTE_RW_ALL #define DECL_HALF_LD_SPACE(SPACE) \ OVERLOADABLE float vload_half(size_t offset, const SPACE half *p); \ OVERLOADABLE float vloada_half(size_t offset, const SPACE half *p); \ OVERLOADABLE float2 vload_half2(size_t offset, const SPACE half *p); \ OVERLOADABLE float2 vloada_half2(size_t offset, const SPACE half *p); \ OVERLOADABLE float3 vload_half3(size_t offset, const SPACE half *p); \ OVERLOADABLE float3 vloada_half3(size_t offset, const SPACE half *p); \ OVERLOADABLE float4 vload_half4(size_t offset, const SPACE half 
*p); \ OVERLOADABLE float4 vloada_half4(size_t offset, const SPACE half *p); \ OVERLOADABLE float8 vload_half8(size_t offset, const SPACE half *p); \ OVERLOADABLE float8 vloada_half8(size_t offset, const SPACE half *p); \ OVERLOADABLE float16 vload_half16(size_t offset, const SPACE half *p); \ OVERLOADABLE float16 vloada_half16(size_t offset, const SPACE half *p); \ #define DECL_HALF_ST_SPACE_ROUND(SPACE, ROUND, FUNC) \ OVERLOADABLE void vstore_half##ROUND(float data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half##ROUND(float data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstore_half2##ROUND(float2 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half2##ROUND(float2 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstore_half3##ROUND(float3 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half3##ROUND(float3 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstore_half4##ROUND(float4 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half4##ROUND(float4 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstore_half8##ROUND(float8 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half8##ROUND(float8 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstore_half16##ROUND(float16 data, size_t offset, SPACE half *p); \ OVERLOADABLE void vstorea_half16##ROUND(float16 data, size_t offset, SPACE half *p); #define DECL_HALF_ST_SPACE(SPACE) \ DECL_HALF_ST_SPACE_ROUND(SPACE, , dummy) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rte, dummy) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rtz, dummy) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rtp, dummy) \ DECL_HALF_ST_SPACE_ROUND(SPACE, _rtn, dummy) \ DECL_HALF_LD_SPACE(__global) DECL_HALF_LD_SPACE(__local) DECL_HALF_LD_SPACE(__constant) DECL_HALF_LD_SPACE(__private) DECL_HALF_ST_SPACE(__global) DECL_HALF_ST_SPACE(__local) DECL_HALF_ST_SPACE(__private) //#undef DECL_UNTYPED_RW_ALL_SPACE #undef DECL_HALF_LD_SPACE #undef DECL_HALF_ST_SPACE #undef DECL_HALF_ST_SPACE_ROUND #endif /* __OCL_VLOAD_H__ */ Beignet-1.3.2-Source/backend/src/libocl/include/ocl_atom_20.h000664 001750 001750 00000026165 13161142102 022775 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_ATOM20_H__ #define __OCL_ATOM20_H__ #include "ocl_types.h" ///////////////////////////////////////////////////////////////////////////// // Atomic functions ///////////////////////////////////////////////////////////////////////////// OVERLOADABLE uint atomic_add(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_add(volatile __local uint *p, uint val); OVERLOADABLE int atomic_add(volatile __global int *p, int val); OVERLOADABLE int atomic_add(volatile __local int *p, int val); OVERLOADABLE uint atomic_sub(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_sub(volatile __local uint *p, uint val); OVERLOADABLE int atomic_sub(volatile __global int *p, int val); OVERLOADABLE int atomic_sub(volatile __local int *p, int val); OVERLOADABLE uint atomic_and(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_and(volatile __local uint *p, uint val); OVERLOADABLE int atomic_and(volatile __global int *p, int val); OVERLOADABLE int atomic_and(volatile __local int *p, int val); OVERLOADABLE uint atomic_or(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_or(volatile __local uint *p, uint val); OVERLOADABLE int atomic_or(volatile __global int *p, int val); OVERLOADABLE int atomic_or(volatile __local int *p, int val); OVERLOADABLE uint atomic_xor(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_xor(volatile __local uint *p, uint val); OVERLOADABLE int atomic_xor(volatile __global int *p, int val); OVERLOADABLE int atomic_xor(volatile __local int *p, int val); OVERLOADABLE uint atomic_xchg(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_xchg(volatile __local uint *p, uint val); OVERLOADABLE int atomic_xchg(volatile __global int *p, int val); OVERLOADABLE int atomic_xchg(volatile __local int *p, int val); OVERLOADABLE int atomic_min(volatile __global int *p, int val); OVERLOADABLE int atomic_min(volatile __local int *p, int val); OVERLOADABLE int atomic_max(volatile __global int *p, int val); OVERLOADABLE int atomic_max(volatile __local int *p, int val); OVERLOADABLE uint atomic_min(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_min(volatile __local uint *p, uint val); OVERLOADABLE uint atomic_max(volatile __global uint *p, uint val); OVERLOADABLE uint atomic_max(volatile __local uint *p, uint val); OVERLOADABLE float atomic_xchg (volatile __global float *p, float val); OVERLOADABLE float atomic_xchg (volatile __local float *p, float val); OVERLOADABLE uint atomic_inc (volatile __global uint *p); OVERLOADABLE uint atomic_inc (volatile __local uint *p); OVERLOADABLE int atomic_inc (volatile __global int *p); OVERLOADABLE int atomic_inc (volatile __local int *p); OVERLOADABLE uint atomic_dec (volatile __global uint *p); OVERLOADABLE uint atomic_dec (volatile __local uint *p); OVERLOADABLE int atomic_dec (volatile __global int *p); OVERLOADABLE int atomic_dec (volatile __local int *p); OVERLOADABLE uint atomic_cmpxchg (volatile __global uint *p, uint cmp, uint val); OVERLOADABLE uint atomic_cmpxchg (volatile __local uint *p, uint cmp, uint val); OVERLOADABLE int atomic_cmpxchg (volatile __global int *p, int cmp, int val); OVERLOADABLE int atomic_cmpxchg (volatile __local int *p, int cmp, int val); // XXX for conformance test // The following atom_xxx api is on OpenCL spec 1.0. 
#define atom_add atomic_add #define atom_sub atomic_sub #define atom_and atomic_and #define atom_or atomic_or #define atom_xor atomic_xor #define atom_xchg atomic_xchg #define atom_min atomic_min #define atom_max atomic_max #define atom_inc atomic_inc #define atom_dec atomic_dec #define atom_cmpxchg atomic_cmpxchg //OpenCL 2.0 features #define ATOMIC_GEN_FUNCTIONS(ATYPE, CTYPE, POSTFIX) \ CTYPE __gen_ocl_atomic_exchange##POSTFIX(volatile ATYPE *p, CTYPE val, int order, int scope); \ CTYPE __gen_ocl_atomic_fetch_add##POSTFIX(volatile ATYPE *p, CTYPE val, int order, int scope); \ CTYPE __gen_ocl_atomic_fetch_sub##POSTFIX(volatile ATYPE *p, CTYPE val, int order, int scope); \ CTYPE __gen_ocl_atomic_fetch_or##POSTFIX(volatile ATYPE *p, CTYPE val, int order, int scope); \ CTYPE __gen_ocl_atomic_fetch_xor##POSTFIX(volatile ATYPE *p, CTYPE val, int order, int scope); \ CTYPE __gen_ocl_atomic_fetch_and##POSTFIX(volatile ATYPE *p, CTYPE val, int order, int scope); \ CTYPE __gen_ocl_atomic_fetch_imin##POSTFIX(volatile ATYPE *p, CTYPE val, int order, int scope); \ CTYPE __gen_ocl_atomic_fetch_umin##POSTFIX(volatile ATYPE *p, CTYPE val, int order, int scope); \ CTYPE __gen_ocl_atomic_fetch_imax##POSTFIX(volatile ATYPE *p, CTYPE val, int order, int scope); \ CTYPE __gen_ocl_atomic_fetch_umax##POSTFIX(volatile ATYPE *p, CTYPE val, int order, int scope);\ CTYPE __gen_ocl_atomic_compare_exchange_strong##POSTFIX(volatile ATYPE* object, CTYPE expected, CTYPE desired, int success, int failure, int scope); \ CTYPE __gen_ocl_atomic_compare_exchange_weak##POSTFIX(volatile ATYPE* object, CTYPE expected, CTYPE desired, int success, int failure, int scope); ATOMIC_GEN_FUNCTIONS(atomic_int, int, 32) #ifndef DISABLE_ATOMIC_INT64 ATOMIC_GEN_FUNCTIONS(atomic_long, long, 64) #endif float __gen_ocl_atomic_exchangef(volatile atomic_int *p, float val, int order, int scope); float __gen_ocl_atomic_fetch_addf(volatile atomic_int *p, float val, int order, int scope); #undef ATOMIC_GEN_FUNCTIONS /* only used to initialize global address space */ //#define ATOMIC_VAR_INIT(C value) #define ATOMIC_VAR_INIT #define ATOMIC_FLAG_INIT 0 //store #define ATOMIC_FUNCTIONS(ATYPE, CTYPE, MTYPE1, MTYPE2) \ OVERLOADABLE void atomic_init(volatile ATYPE *object, CTYPE desired); \ OVERLOADABLE void atomic_store(volatile ATYPE *object, CTYPE desired); \ OVERLOADABLE void atomic_store_explicit(volatile ATYPE *object, CTYPE desired, memory_order order); \ OVERLOADABLE void atomic_store_explicit(volatile ATYPE *object, CTYPE desired, memory_order order, memory_scope scope); \ OVERLOADABLE CTYPE atomic_load(volatile ATYPE *object); \ OVERLOADABLE CTYPE atomic_load_explicit(volatile ATYPE *object, memory_order order); \ OVERLOADABLE CTYPE atomic_load_explicit(volatile ATYPE *object, memory_order order, memory_scope scope); \ OVERLOADABLE CTYPE atomic_exchange(volatile ATYPE *object, CTYPE desired); \ OVERLOADABLE CTYPE atomic_exchange_explicit(volatile ATYPE *object, CTYPE desired, memory_order order); \ OVERLOADABLE CTYPE atomic_exchange_explicit(volatile ATYPE *object, CTYPE desired, memory_order order, memory_scope scope); \ OVERLOADABLE bool atomic_compare_exchange_strong(volatile ATYPE *object, CTYPE *expected, CTYPE desired); \ OVERLOADABLE bool atomic_compare_exchange_strong_explicit(volatile ATYPE *object, CTYPE *expected, CTYPE desired, memory_order success, memory_order failure); \ OVERLOADABLE bool atomic_compare_exchange_strong_explicit(volatile ATYPE *object, CTYPE *expected, CTYPE desired, memory_order success, memory_order failure,
memory_scope scope); \ OVERLOADABLE bool atomic_compare_exchange_weak(volatile ATYPE *object, CTYPE *expected, CTYPE desired); \ OVERLOADABLE bool atomic_compare_exchange_weak_explicit(volatile ATYPE *object, CTYPE *expected, CTYPE desired, memory_order success, memory_order failure); \ OVERLOADABLE bool atomic_compare_exchange_weak_explicit(volatile ATYPE *object, CTYPE *expected, CTYPE desired, memory_order success, memory_order failure, memory_scope scope); \ OVERLOADABLE CTYPE atomic_fetch_add(volatile ATYPE *object, MTYPE1 desired); \ OVERLOADABLE CTYPE atomic_fetch_add_explicit(volatile ATYPE *object, MTYPE1 desired, memory_order order); \ OVERLOADABLE CTYPE atomic_fetch_add_explicit(volatile ATYPE *object, MTYPE1 desired, memory_order order, memory_scope scope); \ OVERLOADABLE CTYPE atomic_fetch_sub(volatile ATYPE *object, MTYPE1 desired); \ OVERLOADABLE CTYPE atomic_fetch_sub_explicit(volatile ATYPE *object, MTYPE1 desired, memory_order order); \ OVERLOADABLE CTYPE atomic_fetch_sub_explicit(volatile ATYPE *object, MTYPE1 desired, memory_order order, memory_scope scope); \ OVERLOADABLE CTYPE atomic_fetch_or(volatile ATYPE *object, MTYPE2 desired); \ OVERLOADABLE CTYPE atomic_fetch_or_explicit(volatile ATYPE *object, MTYPE2 desired, memory_order order); \ OVERLOADABLE CTYPE atomic_fetch_or_explicit(volatile ATYPE *object, MTYPE2 desired, memory_order order, memory_scope scope); \ OVERLOADABLE CTYPE atomic_fetch_xor(volatile ATYPE *object, MTYPE2 desired); \ OVERLOADABLE CTYPE atomic_fetch_xor_explicit(volatile ATYPE *object, MTYPE2 desired, memory_order order); \ OVERLOADABLE CTYPE atomic_fetch_xor_explicit(volatile ATYPE *object, MTYPE2 desired, memory_order order, memory_scope scope); \ OVERLOADABLE CTYPE atomic_fetch_and(volatile ATYPE *object, MTYPE2 desired); \ OVERLOADABLE CTYPE atomic_fetch_and_explicit(volatile ATYPE *object, MTYPE2 desired, memory_order order); \ OVERLOADABLE CTYPE atomic_fetch_and_explicit(volatile ATYPE *object, MTYPE2 desired, memory_order order, memory_scope scope); \ OVERLOADABLE CTYPE atomic_fetch_min(volatile ATYPE *object, MTYPE2 desired); \ OVERLOADABLE CTYPE atomic_fetch_min_explicit(volatile ATYPE *object, MTYPE2 desired, memory_order order); \ OVERLOADABLE CTYPE atomic_fetch_min_explicit(volatile ATYPE *object, MTYPE2 desired, memory_order order, memory_scope scope); \ OVERLOADABLE CTYPE atomic_fetch_max(volatile ATYPE *object, MTYPE2 desired); \ OVERLOADABLE CTYPE atomic_fetch_max_explicit(volatile ATYPE *object, MTYPE2 desired, memory_order order); \ OVERLOADABLE CTYPE atomic_fetch_max_explicit(volatile ATYPE *object, MTYPE2 desired, memory_order order, memory_scope scope); ATOMIC_FUNCTIONS(atomic_int, int, int, int) ATOMIC_FUNCTIONS(atomic_uint, uint, uint, uint) #ifndef DISABLE_ATOMIC_INT64 ATOMIC_FUNCTIONS(atomic_long, long, long, long) ATOMIC_FUNCTIONS(atomic_ulong, ulong, ulong, ulong) #endif ATOMIC_FUNCTIONS(atomic_float, float, float, float) #undef ATOMIC_FUNCTIONS OVERLOADABLE bool atomic_flag_test_and_set(volatile atomic_flag *object); OVERLOADABLE bool atomic_flag_test_and_set_explicit(volatile atomic_flag *object, memory_order order); OVERLOADABLE bool atomic_flag_test_and_set_explicit(volatile atomic_flag *object, memory_order order, memory_scope scope); OVERLOADABLE void atomic_flag_clear(volatile atomic_flag *object); OVERLOADABLE void atomic_flag_clear_explicit(volatile atomic_flag *object, memory_order order); OVERLOADABLE void atomic_flag_clear_explicit(volatile atomic_flag *object, memory_order order, memory_scope scope); 
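/* Illustrative usage sketch (editorial addition, not part of the original
 * header): the OpenCL 2.0 atomic_fetch_* declarations above can be used to
 * build a simple device-wide counter. The kernel and argument names below
 * are hypothetical.
 *
 * __kernel void count_matches(__global atomic_uint *counter,
 *                             __global const uint *data, uint key) {
 *   if (data[get_global_id(0)] == key)
 *     atomic_fetch_add_explicit(counter, 1u,          // relaxed is enough
 *                               memory_order_relaxed, // for a pure counter
 *                               memory_scope_device);
 * }
 */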
OVERLOADABLE void atomic_work_item_fence(cl_mem_fence_flags flags, memory_order order, memory_scope scope); #endif /* __OCL_ATOM20_H__ */ Beignet-1.3.2-Source/backend/src/libocl/include/ocl_float.h000664 001750 001750 00000006126 13161142102 022634 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_FLOAT_H__ #define __OCL_FLOAT_H__ ///////////////////////////////////////////////////////////////////////////// // OpenCL floating-point macros and pragmas ///////////////////////////////////////////////////////////////////////////// #define FLT_DIG 6 #define FLT_MANT_DIG 24 #define FLT_MAX_10_EXP +38 #define FLT_MAX_EXP +128 #define FLT_MIN_10_EXP -37 #define FLT_MIN_EXP -125 #define FLT_RADIX 2 #define FLT_ONE 1.0000000000e+00 /* 0x3F800000 */ #define FLT_MAX 0x1.fffffep127f #define FLT_MIN 0x1.0p-126f #define FLT_EPSILON 0x1.0p-23f #define MAXFLOAT 3.40282347e38F INLINE_OVERLOADABLE float __ocl_inff(void) { union { uint u; float f; } u; u.u = 0x7F800000; return u.f; } INLINE_OVERLOADABLE float __ocl_nanf(void) { union { uint u; float f; } u; u.u = 0x7F800001; return u.f; } typedef union { float value; uint word; } float_shape_type; /* Get a 32 bit int from a float. */ #ifndef GEN_OCL_GET_FLOAT_WORD # define GEN_OCL_GET_FLOAT_WORD(i,d) \ do { \ float_shape_type gf_u; \ gf_u.value = (d); \ (i) = gf_u.word; \ } while (0) #endif /* Set a float from a 32 bit int. */ #ifndef GEN_OCL_SET_FLOAT_WORD # define GEN_OCL_SET_FLOAT_WORD(d,i) \ do { \ float_shape_type sf_u; \ sf_u.word = (i); \ (d) = sf_u.value; \ } while (0) #endif INLINE_OVERLOADABLE int __ocl_finitef (float x){ unsigned ix; GEN_OCL_GET_FLOAT_WORD (ix, x); return (ix & 0x7fffffff) < 0x7f800000; } #define HUGE_VALF (__ocl_inff()) #define INFINITY (__ocl_inff()) #define NAN (__ocl_nanf()) #define M_E_F 2.718281828459045F #define M_LOG2E_F 1.4426950408889634F #define M_LOG10E_F 0.43429448190325176F #define M_LOG210_F 3.3219280948873626F #define M_LN2_F 0.6931471805599453F #define M_LN10_F 2.302585092994046F #define M_PI_F 3.141592653589793F #define M_PI_2_F 1.5707963267948966F #define M_PI_4_F 0.7853981633974483F #define M_1_PI_F 0.3183098861837907F #define M_2_PI_F 0.6366197723675814F #define M_180_PI_F 57.295779513082321F #define M_2_SQRTPI_F 1.1283791670955126F #define M_SQRT2_F 1.4142135623730951F #define M_SQRT1_2_F 0.7071067811865476F #define FP_ILOGB0 (-0x7FFFFFFF-1) #define FP_ILOGBNAN FP_ILOGB0 #endif /* __OCL_FLOAT_H__ */ Beignet-1.3.2-Source/backend/src/libocl/include/ocl_workitem.h000664 001750 001750 00000002545 13161142102 023371 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. 
* * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_WORKITEM_H__ #define __OCL_WORKITEM_H__ #include "ocl_types.h" OVERLOADABLE uint get_work_dim(void); OVERLOADABLE size_t get_global_size(uint dimindx); OVERLOADABLE size_t get_global_id(uint dimindx); OVERLOADABLE size_t get_local_size(uint dimindx); OVERLOADABLE size_t get_enqueued_local_size(uint dimindx); OVERLOADABLE size_t get_local_id(uint dimindx); OVERLOADABLE size_t get_num_groups(uint dimindx); OVERLOADABLE size_t get_group_id(uint dimindx); OVERLOADABLE size_t get_global_offset(uint dimindx); OVERLOADABLE size_t get_global_linear_id(void); OVERLOADABLE size_t get_local_linear_id(void); #endif /* __OCL_WORKITEM_H__ */ Beignet-1.3.2-Source/backend/src/libocl/include/ocl_printf.h000664 001750 001750 00000002324 13161142102 023025 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_PRINTF_H__ #define __OCL_PRINTF_H__ #include "ocl_types.h" /* The printf function. */ /* From LLVM 3.4, c string are all in constant address space */ #if 100*__clang_major__ + __clang_minor__ < 304 int __gen_ocl_printf_stub(const char * format, ...); int __gen_ocl_puts_stub(const char * format); #else int __gen_ocl_printf_stub(constant char * format, ...); int __gen_ocl_puts_stub(constant char * format); #endif #define printf __gen_ocl_printf_stub #define puts __gen_ocl_puts_stub #endif Beignet-1.3.2-Source/backend/src/libocl/include/ocl_memset.h000664 001750 001750 00000002621 13161142102 023015 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_MEMSET_H__ #define __OCL_MEMSET_H__ #include "ocl_types.h" ///////////////////////////////////////////////////////////////////////////// // memset functions ///////////////////////////////////////////////////////////////////////////// void __gen_memset_g_align(__global uchar* dst, uchar val, size_t size); void __gen_memset_p_align(__private uchar* dst, uchar val, size_t size); void __gen_memset_l_align(__local uchar* dst, uchar val, size_t size); void __gen_memset_g(__global uchar* dst, uchar val, size_t size); void __gen_memset_p(__private uchar* dst, uchar val, size_t size); void __gen_memset_l(__local uchar* dst, uchar val, size_t size); #endif /* __OCL_MEMSET_H__ */ Beignet-1.3.2-Source/backend/src/libocl/include/ocl.h000664 001750 001750 00000006550 13173554000 021457 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_H__ #define __OCL_H__ /* LLVM 3.9 has these pre-defined; undef them first */ #ifdef cl_khr_3d_image_writes #undef cl_khr_3d_image_writes #endif #ifdef cl_khr_byte_addressable_store #undef cl_khr_byte_addressable_store #endif #ifdef cl_khr_fp16 #undef cl_khr_fp16 #endif #ifdef cl_khr_fp64 #undef cl_khr_fp64 #endif #ifdef cl_khr_global_int32_base_atomics #undef cl_khr_global_int32_base_atomics #endif #ifdef cl_khr_global_int32_extended_atomics #undef cl_khr_global_int32_extended_atomics #endif #ifdef cl_khr_gl_sharing #undef cl_khr_gl_sharing #endif #ifdef cl_khr_icd #undef cl_khr_icd #endif #ifdef cl_khr_local_int32_base_atomics #undef cl_khr_local_int32_base_atomics #endif #ifdef cl_khr_local_int32_extended_atomics #undef cl_khr_local_int32_extended_atomics #endif #ifdef cl_khr_d3d10_sharing #undef cl_khr_d3d10_sharing #endif #ifdef cl_khr_gl_event #undef cl_khr_gl_event #endif #ifdef cl_khr_int64_base_atomics #undef cl_khr_int64_base_atomics #endif #ifdef cl_khr_int64_extended_atomics #undef cl_khr_int64_extended_atomics #endif #ifdef cl_khr_d3d11_sharing #undef cl_khr_d3d11_sharing #endif #ifdef cl_khr_depth_images #undef cl_khr_depth_images #endif #ifdef cl_khr_dx9_media_sharing #undef cl_khr_dx9_media_sharing #endif #ifdef cl_khr_gl_depth_images #undef cl_khr_gl_depth_images #endif #ifdef cl_khr_spir #undef cl_khr_spir #endif #include "ocl_defines.h" #include "ocl_types.h" #include "ocl_as.h" #include "ocl_async.h" #include "ocl_common.h" #include "ocl_convert.h" #include "ocl_float.h" #include "ocl_geometric.h" #include "ocl_image.h" #include "ocl_integer.h" #include "ocl_memcpy.h" #include "ocl_memset.h" #include "ocl_misc.h" #include "ocl_printf.h" #include "ocl_relational.h" #include "ocl_sync.h" #if (__OPENCL_C_VERSION__ >= 200) #include "ocl_vload_20.h" #include "ocl_atom_20.h" #include "ocl_pipe.h" #include "ocl_math_20.h" #include "ocl_enqueue.h" #else #include "ocl_vload.h" #include "ocl_atom.h" #include "ocl_math.h" #endif #include "ocl_workitem.h" #include "ocl_simd.h" #include 
"ocl_work_group.h" /* Move these out from ocl_defines.h for only one define */ #define cl_khr_global_int32_base_atomics #define cl_khr_global_int32_extended_atomics #define cl_khr_local_int32_base_atomics #define cl_khr_local_int32_extended_atomics #define cl_khr_byte_addressable_store #define cl_khr_icd #define cl_khr_gl_sharing #define cl_khr_spir #define cl_khr_fp16 #define cl_khr_3d_image_writes #define cl_intel_subgroups #define cl_intel_subgroups_short #if __clang_major__*10 + __clang_minor__ > 40 #define cl_intel_required_subgroup_size #endif #pragma OPENCL EXTENSION cl_khr_fp64 : disable #pragma OPENCL EXTENSION cl_khr_fp16 : disable #endif Beignet-1.3.2-Source/backend/src/libocl/include/ocl_misc.h000664 001750 001750 00000011665 13173554000 022475 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_MISC_H__ #define __OCL_MISC_H__ #include "ocl_types.h" #define DEC2(TYPE, XTYPE, MASKTYPE) \ OVERLOADABLE TYPE##2 shuffle(XTYPE x, MASKTYPE##2 mask); #define DEC4(TYPE, XTYPE, MASKTYPE) \ OVERLOADABLE TYPE##4 shuffle(XTYPE x, MASKTYPE##4 mask); #define DEC8(TYPE, XTYPE, MASKTYPE) \ OVERLOADABLE TYPE##8 shuffle(XTYPE x, MASKTYPE##8 mask); #define DEC16(TYPE, XTYPE, MASKTYPE) \ OVERLOADABLE TYPE##16 shuffle(XTYPE x, MASKTYPE##16 mask); #define DEFMASK(TYPE, MASKTYPE) \ DEC2(TYPE, TYPE##2, MASKTYPE); DEC2(TYPE, TYPE##4, MASKTYPE); DEC2(TYPE, TYPE##8, MASKTYPE); DEC2(TYPE, TYPE##16, MASKTYPE) \ DEC4(TYPE, TYPE##2, MASKTYPE); DEC4(TYPE, TYPE##4, MASKTYPE); DEC4(TYPE, TYPE##8, MASKTYPE); DEC4(TYPE, TYPE##16, MASKTYPE) \ DEC8(TYPE, TYPE##2, MASKTYPE); DEC8(TYPE, TYPE##4, MASKTYPE); DEC8(TYPE, TYPE##8, MASKTYPE); DEC8(TYPE, TYPE##16, MASKTYPE) \ DEC16(TYPE, TYPE##2, MASKTYPE); DEC16(TYPE, TYPE##4, MASKTYPE); DEC16(TYPE, TYPE##8, MASKTYPE); DEC16(TYPE, TYPE##16, MASKTYPE) #define DEF(TYPE) \ DEFMASK(TYPE, uchar) \ DEFMASK(TYPE, ushort) \ DEFMASK(TYPE, uint) \ DEFMASK(TYPE, ulong) DEF(char) DEF(uchar) DEF(short) DEF(ushort) DEF(half) DEF(int) DEF(uint) DEF(float) DEF(long) DEF(ulong) #undef DEF #undef DEFMASK #undef DEC2 #undef DEC4 #undef DEC8 #undef DEC16 #define DEC2(TYPE, ARGTYPE, TEMPTYPE, MASKTYPE) \ OVERLOADABLE TYPE##2 shuffle2(ARGTYPE x, ARGTYPE y, MASKTYPE##2 mask); #define DEC2X(TYPE, MASKTYPE) \ OVERLOADABLE TYPE##2 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##2 mask); #define DEC4(TYPE, ARGTYPE, TEMPTYPE, MASKTYPE) \ OVERLOADABLE TYPE##4 shuffle2(ARGTYPE x, ARGTYPE y, MASKTYPE##4 mask); #define DEC4X(TYPE, MASKTYPE) \ OVERLOADABLE TYPE##4 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##4 mask); #define DEC8(TYPE, ARGTYPE, TEMPTYPE, MASKTYPE) \ OVERLOADABLE TYPE##8 shuffle2(ARGTYPE x, ARGTYPE y, MASKTYPE##8 mask); #define DEC8X(TYPE, MASKTYPE) \ OVERLOADABLE TYPE##8 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##8 mask); #define DEC16(TYPE, ARGTYPE, TEMPTYPE, MASKTYPE) \ OVERLOADABLE TYPE##16 shuffle2(ARGTYPE x, ARGTYPE y, MASKTYPE##16 
mask); #define DEC16X(TYPE, MASKTYPE) \ OVERLOADABLE TYPE##16 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##16 mask); #define DEFMASK(TYPE, MASKTYPE) \ DEC2(TYPE, TYPE##2, TYPE##4, MASKTYPE) \ DEC2(TYPE, TYPE##4, TYPE##8, MASKTYPE) \ DEC2(TYPE, TYPE##8, TYPE##16, MASKTYPE) \ DEC2X(TYPE, MASKTYPE) \ DEC4(TYPE, TYPE##2, TYPE##4, MASKTYPE) \ DEC4(TYPE, TYPE##4, TYPE##8, MASKTYPE) \ DEC4(TYPE, TYPE##8, TYPE##16, MASKTYPE) \ DEC4X(TYPE, MASKTYPE) \ DEC8(TYPE, TYPE##2, TYPE##4, MASKTYPE) \ DEC8(TYPE, TYPE##4, TYPE##8, MASKTYPE) \ DEC8(TYPE, TYPE##8, TYPE##16, MASKTYPE) \ DEC8X(TYPE, MASKTYPE) \ DEC16(TYPE, TYPE##2, TYPE##4, MASKTYPE) \ DEC16(TYPE, TYPE##4, TYPE##8, MASKTYPE) \ DEC16(TYPE, TYPE##8, TYPE##16, MASKTYPE) \ DEC16X(TYPE, MASKTYPE) #define DEF(TYPE) \ DEFMASK(TYPE, uchar) \ DEFMASK(TYPE, ushort) \ DEFMASK(TYPE, uint) \ DEFMASK(TYPE, ulong) DEF(char) DEF(uchar) DEF(short) DEF(ushort) DEF(half) DEF(int) DEF(uint) DEF(float) DEF(long) DEF(ulong) #undef DEF #undef DEFMASK #undef DEC2 #undef DEC2X #undef DEC4 #undef DEC4X #undef DEC8 #undef DEC8X #undef DEC16 #undef DEC16X struct time_stamp { // time tick ulong tick; // If context-switch or frequency change occurs since last read of tm, // event will be non-zero, otherwise, it will be zero. uint event; }; uint __gen_ocl_region(ushort offset, uint data); struct time_stamp __gen_ocl_get_timestamp(void); uint8 __gen_ocl_vme(image2d_t, image2d_t, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, uint, int, int, int); bool __gen_ocl_in_local(size_t p); bool __gen_ocl_in_private(size_t p); #if (__OPENCL_C_VERSION__ >= 200) local void *__to_local(generic void *p); global void *__to_global(generic void *p); private void *__to_private(generic void *p); #endif #endif Beignet-1.3.2-Source/backend/src/libocl/include/ocl_pipe.h000664 001750 001750 00000004101 13161142102 022453 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_PIPE_H__ #define __OCL_PIPE_H__ #include "ocl_types.h" #include "ocl_work_group.h" #include "ocl_simd.h" /* The pipe read function. */ int __read_pipe_2(pipe int p, __generic void* dst); int __read_pipe_4(pipe int p, reserve_id_t id, uint index, void* dst); reserve_id_t __reserve_read_pipe(pipe int p, uint num); void __commit_read_pipe(pipe int p, reserve_id_t rid); reserve_id_t __work_group_reserve_read_pipe(pipe int p, uint num); void __work_group_commit_read_pipe(pipe int p, reserve_id_t rid); reserve_id_t __sub_group_reserve_read_pipe(pipe int p, uint num); void __sub_group_commit_read_pipe(pipe int p, reserve_id_t rid); /* The pipe write function. 
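 *
 * A sketch of how these entry points are reached (user variable names
 * assumed for illustration): the OpenCL 2.0 built-in
 *   int ok = write_pipe(p, &val);
 * is lowered by the compiler to
 *   int ok = __write_pipe_2(p, &val);
 * and the reservation form write_pipe(p, rid, idx, &val) to
 * __write_pipe_4(p, rid, idx, &val); a return value of 0 means the packet
 * was written.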
*/ int __write_pipe_2(pipe int p, __generic void* src); int __write_pipe_4(pipe int p, reserve_id_t id, uint index, void* src); reserve_id_t __reserve_write_pipe(pipe int p, uint num); void __commit_write_pipe(pipe int p, reserve_id_t rid); reserve_id_t __work_group_reserve_write_pipe(pipe int p, uint num); void __work_group_commit_write_pipe(pipe int p, reserve_id_t rid); reserve_id_t __sub_group_reserve_write_pipe(pipe int p, uint num); void __sub_group_commit_write_pipe(pipe int p, reserve_id_t rid); /* The reserve_id_t function. */ bool is_valid_reserve_id(reserve_id_t rid); /* The pipe query function. */ uint __get_pipe_num_packets(pipe int p); uint __get_pipe_max_packets(pipe int p); #endif Beignet-1.3.2-Source/backend/src/libocl/Android.mk000664 001750 001750 00000010773 13161142102 021012 0ustar00yryr000000 000000 LOCAL_PATH:= $(call my-dir) include $(CLEAR_VARS) LOCAL_MODULE := libgbe LOCAL_MODULE_TAGS := optional LOCAL_MODULE_CLASS := SHARED_LIBRARIES generated_sources := $(call local-generated-sources-dir)/libocl $(shell mkdir -p ${generated_sources}/include/) $(shell mkdir -p ${generated_sources}/src/) #$(shell echo "cat $(LOCAL_PATH)/tmpl/ocl_defines.tmpl.h \\> ${LIBOCL_BINARY_DIR}/include/ocl_defines.h") $(shell cat $(LOCAL_PATH)/tmpl/ocl_defines.tmpl.h > ${generated_sources}/include/ocl_defines.h) #$(shell echo "cat $(LOCAL_PATH)/../ocl_common_defines.h \\>\\> ${LIBOCL_BINARY_DIR}/include/ocl_defines.h") $(shell cat ${LOCAL_PATH}/../ocl_common_defines.h >> ${generated_sources}/include/ocl_defines.h) $(shell echo "Generate the header: ${generated_sources}/include/ocl_defines.h") define COPY_THE_HEADER # Use the python script to generate the header files. $(shell cp ${LOCAL_PATH}/include/$(1).h ${generated_sources}/include/$(1).h) endef define COPY_THE_SOURCE # Use the python script to generate the header files. $(shell cp ${LOCAL_PATH}/src/$(1).cl ${generated_sources}/src/$(1).cl) endef OCL_COPY_MODULES := ocl ocl_types ocl_float ocl_printf $(foreach _M_, ${OCL_COPY_MODULES}, $(eval $(call COPY_THE_HEADER,$(_M_)))) OCL_COPY_MODULES := ocl_workitem ocl_atom ocl_async ocl_sync ocl_memcpy ocl_memset ocl_misc ocl_vload ocl_geometric ocl_image ocl_work_group OCL_SOURCE_FILES := $(OCL_COPY_MODULES) $(foreach _M_, ${OCL_COPY_MODULES}, $(eval $(call COPY_THE_HEADER,$(_M_)))) $(foreach _M_, ${OCL_COPY_MODULES}, $(eval $(call COPY_THE_SOURCE,$(_M_)))) define GENERATE_HEADER_PY # Use the python script to generate the header files. $(shell cat ${LOCAL_PATH}/tmpl/$(1).tmpl.h > ${generated_sources}/include/$(1).h) $(shell /usr/bin/python ${LOCAL_PATH}/script/gen_vector.py ${LOCAL_PATH}/script/$(1).def ${generated_sources}/include/$(1).h 1) $(shell echo "#endif" >> ${generated_sources}/include/$(1).h) endef define GENERATE_SOURCE_PY # Use the python script to generate the header files. 
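# Note: the $(shell ...) lines below run when this makefile is parsed, not
# when a target is built. The first seeds src/$(1).cl from the matching
# template; gen_vector.py then appends the generated vector overloads (its
# trailing 0/1 argument appears to select source vs. header emission).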
$(shell cat ${LOCAL_PATH}/tmpl/$(1).tmpl.cl > ${generated_sources}/src/$(1).cl) $(shell /usr/bin/python ${LOCAL_PATH}/script/gen_vector.py ${LOCAL_PATH}/script/$(1).def ${generated_sources}/src/$(1).cl 0) endef OCL_COPY_MODULES_PY := ocl_common ocl_relational ocl_integer ocl_math ocl_simd OCL_SOURCE_FILES += $(OCL_COPY_MODULES_PY) $(foreach _M_, ${OCL_COPY_MODULES_PY}, $(eval $(call GENERATE_HEADER_PY,$(_M_)))) $(foreach _M_, ${OCL_COPY_MODULES_PY}, $(eval $(call GENERATE_SOURCE_PY,$(_M_)))) define GENERATE_HEADER_BASH # Use the python script to generate the header files.\ $(shell ${LOCAL_PATH}/script/$(1).sh -p > ${generated_sources}/include/$(1).h) endef define GENERATE_SOURCE_BASH # Use the python script to generate the header files. $(shell ${LOCAL_PATH}/script/$(1).sh > ${generated_sources}/src/$(1).cl) endef OCL_COPY_MODULES_SH := ocl_as ocl_convert OCL_SOURCE_FILES += $(OCL_COPY_MODULES_SH) $(foreach _M_, ${OCL_COPY_MODULES_SH}, $(eval $(call GENERATE_HEADER_BASH,$(_M_)))) $(foreach _M_, ${OCL_COPY_MODULES_SH}, $(eval $(call GENERATE_SOURCE_BASH,$(_M_)))) CLANG_OCL_FLAGS := -fno-builtin -ffp-contract=off -cl-kernel-arg-info -DGEN7_SAMPLER_CLAMP_BORDER_WORKAROUND "-cl-std=CL1.2" define ADD_CL_TO_BC_TARGET # Use the python script to generate the header files. $(shell $(HOST_OUT)/bin/clang -cc1 ${CLANG_OCL_FLAGS} -I ${generated_sources}/include/ -emit-llvm-bc -triple spir -o ${generated_sources}/$(1).bc -x cl ${generated_sources}/src/$(1).cl) endef $(foreach _M_, ${OCL_SOURCE_FILES}, $(eval $(call ADD_CL_TO_BC_TARGET,$(_M_)))) define COPY_THE_LL # Use the python script to generate the header files. $(shell cp ${LOCAL_PATH}/src/$(1).ll ${generated_sources}/src/$(1).ll) endef define ADD_LL_TO_BC_TARGET # Use the python script to generate the header files. $(shell $(HOST_OUT)/bin/llvm-as -o ${generated_sources}/$(1).bc ${generated_sources}/src/$(1).ll) endef OCL_LL_MODULES := ocl_barrier ocl_clz OCL_SOURCE_FILES += $(OCL_LL_MODULES) $(foreach _M_, ${OCL_LL_MODULES}, $(eval $(call COPY_THE_LL,$(_M_)))) $(foreach _M_, ${OCL_LL_MODULES}, $(eval $(call ADD_LL_TO_BC_TARGET,$(_M_)))) $(shell $(HOST_OUT)/bin/llvm-link -o ${generated_sources}/../beignet.bc $(addprefix ${generated_sources}/, $(addsuffix .bc, ${OCL_SOURCE_FILES}))) $(shell $(HOST_OUT)/bin/clang -cc1 ${CLANG_OCL_FLAGS} -triple spir -I ${generated_sources}/include/ --relocatable-pch -emit-pch -isysroot ${generated_sources} -x cl ${generated_sources}/include/ocl.h -o ${generated_sources}/../beignet.pch) Beignet-1.3.2-Source/backend/src/libocl/tmpl/000775 001750 001750 00000000000 13174334761 020067 5ustar00yryr000000 000000 Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_defines.tmpl.h000664 001750 001750 00000002602 13161142102 023443 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_COMMON_DEF_H__ #define __OCL_COMMON_DEF_H__ #define __CL_VERSION_1_0__ 100 #define __CL_VERSION_1_1__ 110 #define __CL_VERSION_1_2__ 120 #define CL_VERSION_1_0 100 #define CL_VERSION_1_1 110 #define CL_VERSION_1_2 120 #if (__OPENCL_C_VERSION__ >= 200) #define __OPENCL_VERSION__ 200 #define CL_VERSION_2_0 200 #else #define __OPENCL_VERSION__ 120 #endif #define __ENDIAN_LITTLE__ 1 #define __IMAGE_SUPPORT__ 1 #define __kernel_exec(X, TYPE) __kernel __attribute__((work_group_size_hint(X,1,1))) \ __attribute__((vec_type_hint(TYPE))) #define kernel_exec(X, TYPE) __kernel_exec(X, TYPE) #endif /* end of __OCL_COMMON_DEF_H__ */ Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_math.tmpl.cl000664 001750 001750 00000333075 13161142102 023141 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "ocl_math.h" #include "ocl_float.h" #include "ocl_relational.h" #include "ocl_common.h" #include "ocl_integer.h" extern constant int __ocl_math_fastpath_flag; CONST float __gen_ocl_fabs(float x) __asm("llvm.fabs" ".f32"); CONST float __gen_ocl_sin(float x) __asm("llvm.sin" ".f32"); CONST float __gen_ocl_cos(float x) __asm("llvm.cos" ".f32"); CONST float __gen_ocl_sqrt(float x) __asm("llvm.sqrt" ".f32"); PURE CONST float __gen_ocl_rsqrt(float x); CONST float __gen_ocl_log(float x) __asm("llvm.log2" ".f32"); CONST float __gen_ocl_exp(float x) __asm("llvm.exp2" ".f32"); PURE CONST float __gen_ocl_pow(float x, float y) __asm("llvm.pow" ".f32"); PURE CONST float __gen_ocl_rcp(float x); CONST float __gen_ocl_rndz(float x) __asm("llvm.trunc" ".f32"); CONST float __gen_ocl_rnde(float x) __asm("llvm.rint" ".f32"); CONST float __gen_ocl_rndu(float x) __asm("llvm.ceil" ".f32"); CONST float __gen_ocl_rndd(float x) __asm("llvm.floor" ".f32"); /* native functions */ OVERLOADABLE float native_cos(float x) { return __gen_ocl_cos(x); } OVERLOADABLE float native_sin(float x) { return __gen_ocl_sin(x); } OVERLOADABLE float native_sqrt(float x) { return __gen_ocl_sqrt(x); } OVERLOADABLE float native_rsqrt(float x) { return __gen_ocl_rsqrt(x); } OVERLOADABLE float native_log2(float x) { return __gen_ocl_log(x); } OVERLOADABLE float native_log(float x) { return native_log2(x) * 0.6931472002f; } OVERLOADABLE float native_log10(float x) { return native_log2(x) * 0.3010299956f; } OVERLOADABLE float native_powr(float x, float y) { return __gen_ocl_pow(x,y); } OVERLOADABLE float native_recip(float x) { return __gen_ocl_rcp(x); } OVERLOADABLE float native_tan(float x) { return native_sin(x) / native_cos(x); } OVERLOADABLE float native_exp2(float x) { return __gen_ocl_exp(x); } OVERLOADABLE float native_exp(float x) { return __gen_ocl_exp(M_LOG2E_F*x); } OVERLOADABLE float native_exp10(float x) { return __gen_ocl_exp(M_LOG210_F*x); } OVERLOADABLE float native_divide(float x, float y) { return x/y; } /* Fast path */ OVERLOADABLE float __gen_ocl_internal_fastpath_acosh (float 
x) { return native_log(x + native_sqrt(x + 1) * native_sqrt(x - 1)); } OVERLOADABLE float __gen_ocl_internal_fastpath_asinh (float x) { return native_log(x + native_sqrt(x * x + 1)); } OVERLOADABLE float __gen_ocl_internal_fastpath_atanh (float x) { return 0.5f * native_log((1 + x) / (1 - x)); } OVERLOADABLE float __gen_ocl_internal_fastpath_cbrt (float x) { return __gen_ocl_pow(x, 0.3333333333f); } OVERLOADABLE float __gen_ocl_internal_fastpath_cos (float x) { return native_cos(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_cosh (float x) { return (1 + native_exp(-2 * x)) / (2 * native_exp(-x)); } OVERLOADABLE float __gen_ocl_internal_fastpath_cospi (float x) { return __gen_ocl_cos(x * M_PI_F); } OVERLOADABLE float __gen_ocl_internal_fastpath_exp (float x) { return native_exp(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_exp10 (float x) { return native_exp10(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_expm1 (float x) { return __gen_ocl_pow(M_E_F, x) - 1; } OVERLOADABLE float __gen_ocl_internal_fastpath_fmod (float x, float y) { return x-y*__gen_ocl_rndz(x/y); } OVERLOADABLE float __gen_ocl_internal_fastpath_hypot (float x, float y) { return __gen_ocl_sqrt(x*x + y*y); } OVERLOADABLE int __gen_ocl_internal_fastpath_ilogb (float x) { return __gen_ocl_rndd(native_log2(x)); } OVERLOADABLE float __gen_ocl_internal_fastpath_ldexp (float x, int n) { return __gen_ocl_pow(2, n) * x; } OVERLOADABLE float __gen_ocl_internal_fastpath_log (float x) { return native_log(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_log2 (float x) { return native_log2(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_log10 (float x) { return native_log10(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_log1p (float x) { return native_log(x + 1); } OVERLOADABLE float __gen_ocl_internal_fastpath_logb (float x) { return __gen_ocl_rndd(native_log2(x)); } OVERLOADABLE float __gen_ocl_internal_fastpath_remainder (float x, float y) { return x-y*__gen_ocl_rnde(x/y); } OVERLOADABLE float __gen_ocl_internal_fastpath_rootn(float x, int n) { return __gen_ocl_pow(x, 1.f / n); } OVERLOADABLE float __gen_ocl_internal_fastpath_sin (float x) { return native_sin(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_sincos (float x, __global float *cosval) { *cosval = native_cos(x); return native_sin(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_sincos (float x, __local float *cosval) { *cosval = native_cos(x); return native_sin(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_sincos (float x, __private float *cosval) { *cosval = native_cos(x); return native_sin(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_sinh (float x) { return (1 - native_exp(-2 * x)) / (2 * native_exp(-x)); } OVERLOADABLE float __gen_ocl_internal_fastpath_sinpi (float x) { return __gen_ocl_sin(x * M_PI_F); } OVERLOADABLE float __gen_ocl_internal_fastpath_tan (float x) { return native_tan(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_tanh (float x) { float y = native_exp(-2 * x); return (1 - y) / (1 + y); } /* Internal implement, high accuracy. 
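 *
 * Each public entry point below picks an implementation at run time,
 * following this pattern (a sketch of the scheme sin/cos/tan use later in
 * this file):
 *
 *   OVERLOADABLE float cos(float x) {
 *     if (__ocl_math_fastpath_flag)
 *       return __gen_ocl_internal_fastpath_cos(x);
 *     ... fdlibm-derived high-accuracy code ...
 *   }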
*/ OVERLOADABLE float __gen_ocl_internal_floor(float x) { return __gen_ocl_rndd(x); } OVERLOADABLE float __gen_ocl_internal_copysign(float x, float y) { union { unsigned u; float f; } ux, uy; ux.f = x; uy.f = y; ux.u = (ux.u & 0x7fffffff) | (uy.u & 0x80000000u); return ux.f; } OVERLOADABLE float inline __gen_ocl_internal_log_valid(float x) { /* * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. * ==================================================== */ union { unsigned int i; float f; } u; const float ln2_hi = 6.9313812256e-01, /* 0x3f317180 */ ln2_lo = 9.0580006145e-06, /* 0x3717f7d1 */ two25 = 3.355443200e+07, /* 0x4c000000 */ Lg1 = 6.6666668653e-01, /* 3F2AAAAB */ Lg2 = 4.0000000596e-01, /* 3ECCCCCD */ Lg3 = 2.8571429849e-01, /* 3E924925 */ Lg4 = 2.2222198546e-01; /* 3E638E29 */ const float zero = 0.0; float fsq, f, s, z, R, w, t1, t2, partial; int k, ix, i, j; u.f = x; ix = u.i; k = 0; k += (ix>>23) - 127; ix &= 0x007fffff; i = (ix + (0x95f64<<3)) & 0x800000; u.i = ix | (i^0x3f800000); x = u.f; k += (i>>23); f = x - 1.0f; fsq = f * f; if((0x007fffff & (15 + ix)) < 16) { /* |f| < 2**-20 */ R = fsq * (0.5f - 0.33333333333333333f * f); return k * ln2_hi + k * ln2_lo + f - R; } s = f / (2.0f + f); z = s * s; i = ix - (0x6147a << 3); w = z * z; j = (0x6b851 << 3) - ix; t1= w * mad(w, Lg4, Lg2); t2= z * mad(w, Lg3, Lg1); i |= j; R = t2 + t1; partial = (i > 0) ? -mad(s, 0.5f * fsq, -0.5f * fsq) : (s * f); return mad(s, R, f) - partial + k * ln2_hi + k * ln2_lo;; } OVERLOADABLE float __gen_ocl_internal_log(float x) { union { unsigned int i; float f; } u; u.f = x; int ix = u.i; if (ix < 0 ) return NAN; /* log(-#) = NaN */ if (ix >= 0x7f800000) return NAN; return __gen_ocl_internal_log_valid(x); } OVERLOADABLE float __gen_ocl_internal_log10(float x) { union { float f; unsigned i; } u; const float ivln10 = 4.3429449201e-01, /* 0x3ede5bd9 */ log10_2hi = 3.0102920532e-01, /* 0x3e9a2080 */ log10_2lo = 7.9034151668e-07; /* 0x355427db */ float y, z; int i, k, hx; u.f = x; hx = u.i; if (hx<0) return NAN; /* log(-#) = NaN */ if (hx >= 0x7f800000) return NAN; k = (hx >> 23) - 127; i = ((unsigned)k & 0x80000000) >> 31; hx = (hx&0x007fffff) | ((0x7f-i) << 23); y = (float)(k + i); u.i = hx; x = u.f; return y * log10_2lo + y * log10_2hi + ivln10 * __gen_ocl_internal_log_valid(x); } OVERLOADABLE float __gen_ocl_internal_log2(float x) { const float zero = 0.0, invln2 = 0x1.715476p+0f; int ix; union { float f; int i; } u; u.f = x; ix = u.i; if (ix < 0) return NAN; /** log(-#) = NaN */ if (ix >= 0x7f800000) return NAN; return invln2 * __gen_ocl_internal_log_valid(x); } float __gen_ocl_scalbnf (float x, int n){ /* copy from fdlibm */ float two25 = 3.355443200e+07, /* 0x4c000000 */ twom25 = 2.9802322388e-08, /* 0x33000000 */ huge = 1.0e+30, tiny = 1.0e-30; int k,ix; GEN_OCL_GET_FLOAT_WORD(ix,x); k = (ix&0x7f800000)>>23; /* extract exponent */ if (k==0) { /* 0 or subnormal x */ if ((ix&0x7fffffff)==0) return x; /* +-0 */ x *= two25; GEN_OCL_GET_FLOAT_WORD(ix,x); k = ((ix&0x7f800000)>>23) - 25; } if (k==0xff) return x+x; /* NaN or Inf */ if (n< -50000) return tiny*__gen_ocl_internal_copysign(tiny,x); /*underflow*/ if (n> 50000 || k+n > 0xfe) return huge*__gen_ocl_internal_copysign(huge,x); 
/* overflow */ /* Now k and n are bounded we know that k = k+n does not overflow. */ k = k+n; if (k > 0) { /* normal result */ GEN_OCL_SET_FLOAT_WORD(x,(ix&0x807fffff)|(k<<23)); return x; } if (k <= -25) return tiny*__gen_ocl_internal_copysign(tiny,x); /*underflow*/ k += 25; /* subnormal result */ GEN_OCL_SET_FLOAT_WORD(x,(ix&0x807fffff)|(k<<23)); return x*twom25; } const __constant unsigned int two_over_pi[] = { 0, 0, 0xA2F, 0x983, 0x6E4, 0xe44, 0x152, 0x9FC, 0x275, 0x7D1, 0xF53, 0x4DD, 0xC0D, 0xB62, 0x959, 0x93C, 0x439, 0x041, 0xFE5, 0x163, }; // The main idea is from "Radian Reduction for Trigonometric Functions" // written by Mary H. Payne and Robert N. Hanek. Also another reference // is "A Continued-Fraction Analysis of Trigonometric Argument Reduction" // written by Roger Alan Smith, who gave the worst case in this paper. // for single float, worst x = 0x1.47d0fep34, and there are 29 bit // leading zeros in the fraction part of x*(2.0/pi). so we need at least // 29 (leading zero)+ 24 (fraction )+12 (integer) + guard bits. that is, // 65 + guard bits, as we calculate in 12*7 = 84bits, which means we have // about 19 guard bits. If we need further precision, we may need more // guard bits // Note we place two 0 in two_over_pi, which is used to handle input less // than 0x1.0p23 int payne_hanek(float x, float *y) { union { float f; unsigned u;} ieee; ieee.f = x; unsigned u = ieee.u; int k = ((u & 0x7f800000) >> 23)-127; int ma = (u & 0x7fffff) | 0x800000; unsigned high, low; high = (ma & 0xfff000) >> 12; low = ma & 0xfff; // Two tune below macro, you need to fully understand the algorithm #define CALC_BLOCKS 7 #define ZERO_BITS 2 unsigned result[CALC_BLOCKS]; // round down, note we need 2 bits integer precision int index = (k-23-2) < 0 ? (k-23-2-11)/12 : (k-23-2)/12; for (int i = 0; i < CALC_BLOCKS; i++) { result[i] = low * two_over_pi[index+i+ZERO_BITS] ; result[i] += high * two_over_pi[index+i+1+ZERO_BITS]; } for (int i = CALC_BLOCKS-1; i > 0; i--) { int temp = result[i] >> 12; result[i] -= temp << 12; result[i-1] += temp; } #undef CALC_BLOCKS #undef ZERO_BITS // get number of integer digits in result[0], note we only consider 12 valid bits // and also it means the fraction digits in result[0] is (12-intDigit) int intDigit = index*(-12) + (k-23); // As the integer bits may be all included in result[0], and also maybe // some bits in result[0], and some in result[1]. So we merge succesive bits, // which makes easy coding. unsigned b0 = (result[0] << 12) | result[1]; unsigned b1 = (result[2] << 12) | result[3]; unsigned b2 = (result[4] << 12) | result[5]; unsigned b3 = (result[6] << 12); unsigned intPart = b0 >> (24-intDigit); unsigned fract1 = ((b0 << intDigit) | (b1 >> (24-intDigit))) & 0xffffff; unsigned fract2 = ((b1 << intDigit) | (b2 >> (24-intDigit))) & 0xffffff; unsigned fract3 = ((b2 << intDigit) | (b3 >> (24-intDigit))) & 0xffffff; // larger than 0.5? which mean larger than pi/4, we need // transform from [0,pi/2] to [-pi/4, pi/4] through -(1.0-fract) int largerPiBy4 = ((fract1 & 0x800000) != 0); int sign = largerPiBy4 ? 1 : 0; intPart = largerPiBy4 ? (intPart+1) : intPart; fract1 = largerPiBy4 ? (fract1 ^ 0x00ffffff) : fract1; fract2 = largerPiBy4 ? (fract2 ^ 0x00ffffff) : fract2; fract3 = largerPiBy4 ? (fract3 ^ 0x00ffffff) : fract3; int leadingZero = (fract1 == 0); // +1 is for the hidden bit 1 in floating-point format int exponent = leadingZero ? -(24+1) : -(0+1); fract1 = leadingZero ? fract2 : fract1; fract2 = leadingZero ? 
fract3 : fract2; // fract1 may have leading zeros, add it int shift = clz(fract1)-8; exponent += -shift; float pio2 = 0x1.921fb6p+0; unsigned fdigit = ((fract1 << shift) | (fract2 >> (24-shift))) & 0xffffff; // we know that denormal number will not appear here ieee.u = (sign << 31) | ((exponent+127) << 23) | (fdigit & 0x7fffff); *y = ieee.f * pio2; return intPart; } int argumentReduceSmall(float x, float * remainder) { union { float f; unsigned u; } ieee; float twoByPi = 2.0f/3.14159265f; float piBy2_1h = (float) 0xc90/0x1.0p11, piBy2_1l = (float) 0xfda/0x1.0p23, piBy2_2h = (float) 0xa22/0x1.0p35, piBy2_2l = (float) 0x168/0x1.0p47, piBy2_3h = (float) 0xc23/0x1.0p59, piBy2_3l = (float) 0x4c4/0x1.0p71; float y = (float)(int)(twoByPi * x + 0.5f); ieee.f = y; ieee.u = ieee.u & 0xfffff000; float yh = ieee.f; float yl = y - yh; float rem = x - yh*piBy2_1h - yh*piBy2_1l - yl*piBy2_1h - yl*piBy2_1l; rem = rem - yh*piBy2_2h - yh*piBy2_2l + yl*piBy2_2h + yl*piBy2_2l; rem = rem - yh*piBy2_3h - yh*piBy2_3l - yl*piBy2_3h - yl*piBy2_3l; *remainder = rem; return (int)y; } int __ieee754_rem_pio2f(float x, float *y) { if (x < 4000.0f) { return argumentReduceSmall(x, y); } else { return payne_hanek(x, y); } } OVERLOADABLE float __kernel_sinf(float x) { /* copied from fdlibm */ const float S1 = -1.6666667163e-01, /* 0xbe2aaaab */ S2 = 8.3333337680e-03, /* 0x3c088889 */ S3 = -1.9841270114e-04, /* 0xb9500d01 */ S4 = 2.7557314297e-06; /* 0x3638ef1b */ float z,r,v; z = x*x; v = z*x; r = mad(z, mad(z, mad(z, S4, S3), S2), S1); return mad(v, r, x); } float __kernel_cosf(float x, float y) { /* copied from fdlibm */ const float one = 1.0000000000e+00, /* 0x3f800000 */ C1 = 4.1666667908e-02, /* 0x3d2aaaab */ C2 = -1.3888889225e-03, /* 0xbab60b61 */ C3 = 2.4801587642e-05; /* 0x37d00d01 */ float a,hz,z,r,qx; int ix; GEN_OCL_GET_FLOAT_WORD(ix,x); ix &= 0x7fffffff; /* ix = |x|'s high word*/ z = x*x; r = z * mad(z, mad(z, C3, C2), C1); if(ix < 0x3e99999a) /* if |x| < 0.3 */ return one - ((float)0.5*z - (z*r - x*y)); else { GEN_OCL_SET_FLOAT_WORD(qx,ix-0x01000000); /* x/4 */ hz = (float)0.5*z-qx; a = one-qx; return a - (hz - (z*r-x*y)); } } OVERLOADABLE float sin(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_sin(x); const float pio4 = 7.8539812565e-01; /* 0x3f490fda */ float y,z=0.0; int n, ix; float negative = x < 0.0f? -1.0f : 1.0f; x = fabs(x); GEN_OCL_GET_FLOAT_WORD(ix,x); ix &= 0x7fffffff; /* sin(Inf or NaN) is NaN */ if (ix >= 0x7f800000) return x-x; if(x <= pio4) return negative * __kernel_sinf(x); /* argument reduction needed */ else { n = __ieee754_rem_pio2f(x,&y); float s = __kernel_sinf(y); float c = __kernel_cosf(y,0.0f); float ret = (n&1) ? negative*c : negative*s; return (n&3)> 1? -1.0f*ret : ret; } } OVERLOADABLE float cos(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_cos(x); const float pio4 = 7.8539812565e-01; /* 0x3f490fda */ float y,z=0.0; int n, ix; x = __gen_ocl_fabs(x); GEN_OCL_GET_FLOAT_WORD(ix,x); ix &= 0x7fffffff; /* cos(Inf or NaN) is NaN */ if (ix >= 0x7f800000) return x-x; if(x <= pio4) return __kernel_cosf(x, 0.f); /* argument reduction needed */ else { n = __ieee754_rem_pio2f(x,&y); n &= 3; float c = __kernel_cosf(y, 0.0f); float s = __kernel_sinf(y); float v = (n&1) ? s : c; /* n&3 return 0 cos(y) 1 -sin(y) 2 -cos(y) 3 sin(y) */ int mask = (n>>1) ^ n; float sign = (mask&1) ? 
-1.0f : 1.0f; return sign * v; } } float __kernel_tanf(float x, float y, int iy) { /* copied from fdlibm */ float z,r,v,w,s; int ix,hx; const float one = 1.0000000000e+00, /* 0x3f800000 */ pio4 = 7.8539812565e-01, /* 0x3f490fda */ pio4lo= 3.7748947079e-08; /* 0x33222168 */ float T[13];// = { T[0] = 3.3333334327e-01; /* 0x3eaaaaab */ T[1] = 1.3333334029e-01; /* 0x3e088889 */ T[2] = 5.3968254477e-02; /* 0x3d5d0dd1 */ T[3] = 2.1869488060e-02; /* 0x3cb327a4 */ T[4] = 8.8632395491e-03; /* 0x3c11371f */ T[5] = 3.5920790397e-03; /* 0x3b6b6916 */ T[6] = 1.4562094584e-03; /* 0x3abede48 */ T[7] = 5.8804126456e-04; /* 0x3a1a26c8 */ GEN_OCL_GET_FLOAT_WORD(hx,x); ix = hx&0x7fffffff; /* high word of |x| */ if(ix<0x31800000) /* x < 2**-28 */ {if((int)x==0) { /* generate inexact */ if((ix|(iy+1))==0) return one/__gen_ocl_fabs(x); else return (iy==1)? x: -one/x; } } if(ix>=0x3f2ca140) { /* |x|>=0.6744 */ if(hx<0) {x = -x; y = -y;} z = pio4-x; w = pio4lo-y; x = z+w; y = 0.0; } z = x*x; w = z*z; /* Break x^5*(T[1]+x^2*T[2]+...) into * x^5(T[1]+x^4*T[3]+...+x^20*T[11]) + * x^5(x^2*(T[2]+x^4*T[4]+...+x^22*[T12])) */ r = mad(w, mad(w, mad(w, T[7], T[5]), T[3]), T[1]); v = z* mad(w, mad(w, T[6], T[4]), T[2]); s = z*x; r = mad(z, mad(s, r + v, y), y); r += T[0]*s; w = x+r; if(ix>=0x3f2ca140) { v = (float)iy; return (float)(1-((hx>>30)&2))*(v-(float)2.0*(x-(w*w/(w+v)-r))); } if(iy==1) return w; else return -1.0/(x+r); } OVERLOADABLE float tan(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_tan(x); float y,z=0.0; int n, ix; float negative = x < 0.0f? -1.0f : 1.0f; x = negative * x; GEN_OCL_GET_FLOAT_WORD(ix,x); ix &= 0x7fffffff; /* tan(Inf or NaN) is NaN */ if (ix>=0x7f800000) return x-x; /* NaN */ /* argument reduction needed */ else { n = __ieee754_rem_pio2f(x,&y); return negative * __kernel_tanf(y,0.0f,1-((n&1)<<1)); /* 1 -- n even -1 -- n odd */ } } OVERLOADABLE float __gen_ocl_internal_cospi(float x) { int ix; if(isinf(x) || isnan(x)) { return NAN; } if(x < 0.0f) { x = -x; } GEN_OCL_GET_FLOAT_WORD(ix, x); if(x> 0x1.0p24) return 1.0f; float m = __gen_ocl_internal_floor(x); ix = (int)m; m = x-m; if((ix&0x1) != 0) m+=1.0f; ix = __gen_ocl_internal_floor(m*4.0f); switch(ix) { case 0: return __kernel_cosf(m*M_PI_F, 0.0f); case 1: case 2: return __kernel_sinf((0.5f-m)*M_PI_F); case 3: case 4: return -__kernel_cosf((m-1.0f)*M_PI_F, 0.0f); case 5: case 6: return __kernel_sinf((m-1.5f)*M_PI_F); default: return __kernel_cosf((2.0f-m)*M_PI_F, 0.0f); } } OVERLOADABLE float __gen_ocl_internal_sinpi(float x) { float sign = 1.0f; int ix; if(isinf(x)) return NAN; if(x < 0.0f) { x = -x; sign = -1.0f; } GEN_OCL_GET_FLOAT_WORD(ix, x); if(x> 0x1.0p24) return 0.0f; float m = __gen_ocl_internal_floor(x); ix = (int)m; m = x-m; if((ix&0x1) != 0) m+=1.0f; ix = __gen_ocl_internal_floor(m*4.0f); switch(ix) { case 0: return sign*__kernel_sinf(m*M_PI_F); case 1: case 2: return sign*__kernel_cosf((m-0.5f)*M_PI_F, 0.0f); case 3: case 4: return -sign*__kernel_sinf((m-1.0f)*M_PI_F); case 5: case 6: return -sign*__kernel_cosf((m-1.5f)*M_PI_F, 0.0f); default: return -sign*__kernel_sinf((2.0f-m)*M_PI_F); } } OVERLOADABLE float lgamma(float x) { /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. 
* ==================================================== */ const float zero= 0., one = 1.0000000000e+00, pi = 3.1415927410e+00, a0 = 7.7215664089e-02, a1 = 3.2246702909e-01, a2 = 6.7352302372e-02, a3 = 2.0580807701e-02, a4 = 7.3855509982e-03, a5 = 2.8905137442e-03, a6 = 1.1927076848e-03, a7 = 5.1006977446e-04, a8 = 2.2086278477e-04, a9 = 1.0801156895e-04, a10 = 2.5214456400e-05, a11 = 4.4864096708e-05, tc = 1.4616321325e+00, tf = -1.2148628384e-01, tt = 6.6971006518e-09, t0 = 4.8383611441e-01, t1 = -1.4758771658e-01, t2 = 6.4624942839e-02, t3 = -3.2788541168e-02, t4 = 1.7970675603e-02, t5 = -1.0314224288e-02, t6 = 6.1005386524e-03, t7 = -3.6845202558e-03, t8 = 2.2596477065e-03, t9 = -1.4034647029e-03, t10 = 8.8108185446e-04, t11 = -5.3859531181e-04, t12 = 3.1563205994e-04, t13 = -3.1275415677e-04, t14 = 3.3552918467e-04, u0 = -7.7215664089e-02, u1 = 6.3282704353e-01, u2 = 1.4549225569e+00, u3 = 9.7771751881e-01, u4 = 2.2896373272e-01, u5 = 1.3381091878e-02, v1 = 2.4559779167e+00, v2 = 2.1284897327e+00, v3 = 7.6928514242e-01, v4 = 1.0422264785e-01, v5 = 3.2170924824e-03, s0 = -7.7215664089e-02, s1 = 2.1498242021e-01, s2 = 3.2577878237e-01, s3 = 1.4635047317e-01, s4 = 2.6642270386e-02, s5 = 1.8402845599e-03, s6 = 3.1947532989e-05, r1 = 1.3920053244e+00, r2 = 7.2193557024e-01, r3 = 1.7193385959e-01, r4 = 1.8645919859e-02, r5 = 7.7794247773e-04, r6 = 7.3266842264e-06, w0 = 4.1893854737e-01, w1 = 8.3333335817e-02, w2 = -2.7777778450e-03, w3 = 7.9365057172e-04, w4 = -5.9518753551e-04, w5 = 8.3633989561e-04, w6 = -1.6309292987e-03; float t, y, z, nadj, p, p1, p2, p3, q, r, w; int i, hx, ix; nadj = 0; hx = *(int *)&x; ix = hx & 0x7fffffff; if (ix >= 0x7f800000) return x * x; if (ix == 0) return ((x + one) / zero); if (ix < 0x1c800000) { if (hx < 0) { return -native_log(-x); } else return -native_log(x); } if (hx < 0) { if (ix >= 0x4b000000) return ((-x) / zero); t = __gen_ocl_internal_sinpi(x); if (t == zero) return ((-x) / zero); nadj = native_log(pi / __gen_ocl_fabs(t * x)); x = -x; } if (ix == 0x3f800000 || ix == 0x40000000) r = 0; else if (ix < 0x40000000) { if (ix <= 0x3f666666) { r = -native_log(x); if (ix >= 0x3f3b4a20) { y = one - x; i = 0; } else if (ix >= 0x3e6d3308) { y = x - (tc - one); i = 1; } else { y = x; i = 2; } } else { r = zero; if (ix >= 0x3fdda618) { y = (float) 2.0 - x; i = 0; } else if (ix >= 0x3F9da620) { y = x - tc; i = 1; } else { y = x - one; i = 2; } } switch (i) { case 0: z = y * y; p1 = mad(z, mad(z, mad(z, mad(z, mad(z, a10, a8), a6), a4), a2), a0); p2 = z * mad(z, mad(z, mad(z, mad(z, mad(z, a11, a9), a7), a5), a3), a1); p = mad(y, p1, p2); r += (p - (float) 0.5 * y); break; case 1: z = y * y; w = z * y; p1 = mad(w, mad(w, mad(w, mad(w, t12, t9), t6), t3), t0); p2 = mad(w, mad(w, mad(w, mad(w, t13, t10), t7), t4), t1); p3 = mad(w, mad(w, mad(w, mad(w, t14, t11), t8), t5), t2); p = mad(p1, z, mad(w, mad(y, p3, p2), -tt)); r += (tf + p); break; case 2: p1 = y * mad(y, mad(y, mad(y, mad(y, mad(y, u5, u4), u3), u2), u1), u0); p2 = mad(y, mad(y, mad(y, mad(y, mad(y, v5, v4), v3), v2), v1), one); r += (-(float) 0.5 * y + p1 / p2); } } else if (ix < 0x41000000) { i = (int) x; t = zero; y = x - (float) i; p =y * mad(y, mad(y, mad(y, mad(y, mad(y, mad(y, s6, s5), s4), s3), s2), s1), s0); q = mad(y, mad(y, mad(y, mad(y, mad(y, mad(y, r6, r5), r4), r3), r2), r1), one); r = .5f * y + p / q; z = one; switch (i) { case 7: z *= (y + 6.0f); case 6: z *= (y + 5.0f); case 5: z *= (y + 4.0f); case 4: z *= (y + 3.0f); case 3: z *= (y + 2.0f); r += native_log(z); break; } } else if (ix 
< 0x5c800000) { t = native_log(x); z = one / x; y = z * z; w = mad(z, mad(y, mad(y, mad(y, mad(y, mad(y, w6, w5), w4), w3), w2), w1), w0); r = (x - .5f) * (t - one) + w; } else r = x * (native_log(x) - one); if (hx < 0) r = nadj - r; return r; } /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. * ==================================================== */ #define BODY \ const float \ zero= 0., \ one = 1.0000000000e+00, \ pi = 3.1415927410e+00, \ a0 = 7.7215664089e-02, \ a1 = 3.2246702909e-01, \ a2 = 6.7352302372e-02, \ a3 = 2.0580807701e-02, \ a4 = 7.3855509982e-03, \ a5 = 2.8905137442e-03, \ a6 = 1.1927076848e-03, \ a7 = 5.1006977446e-04, \ a8 = 2.2086278477e-04, \ a9 = 1.0801156895e-04, \ a10 = 2.5214456400e-05, \ a11 = 4.4864096708e-05, \ tc = 1.4616321325e+00, \ tf = -1.2148628384e-01, \ tt = 6.6971006518e-09, \ t0 = 4.8383611441e-01, \ t1 = -1.4758771658e-01, \ t2 = 6.4624942839e-02, \ t3 = -3.2788541168e-02, \ t4 = 1.7970675603e-02, \ t5 = -1.0314224288e-02, \ t6 = 6.1005386524e-03, \ t7 = -3.6845202558e-03, \ t8 = 2.2596477065e-03, \ t9 = -1.4034647029e-03, \ t10 = 8.8108185446e-04, \ t11 = -5.3859531181e-04, \ t12 = 3.1563205994e-04, \ t13 = -3.1275415677e-04, \ t14 = 3.3552918467e-04, \ u0 = -7.7215664089e-02, \ u1 = 6.3282704353e-01, \ u2 = 1.4549225569e+00, \ u3 = 9.7771751881e-01, \ u4 = 2.2896373272e-01, \ u5 = 1.3381091878e-02, \ v1 = 2.4559779167e+00, \ v2 = 2.1284897327e+00, \ v3 = 7.6928514242e-01, \ v4 = 1.0422264785e-01, \ v5 = 3.2170924824e-03, \ s0 = -7.7215664089e-02, \ s1 = 2.1498242021e-01, \ s2 = 3.2577878237e-01, \ s3 = 1.4635047317e-01, \ s4 = 2.6642270386e-02, \ s5 = 1.8402845599e-03, \ s6 = 3.1947532989e-05, \ r1 = 1.3920053244e+00, \ r2 = 7.2193557024e-01, \ r3 = 1.7193385959e-01, \ r4 = 1.8645919859e-02, \ r5 = 7.7794247773e-04, \ r6 = 7.3266842264e-06, \ w0 = 4.1893854737e-01, \ w1 = 8.3333335817e-02, \ w2 = -2.7777778450e-03, \ w3 = 7.9365057172e-04, \ w4 = -5.9518753551e-04, \ w5 = 8.3633989561e-04, \ w6 = -1.6309292987e-03; \ float t, y, z, nadj, p, p1, p2, p3, q, r, w; \ int i, hx, ix; \ nadj = 0; \ hx = *(int *)&x; \ *signgamp = 1; \ ix = hx & 0x7fffffff; \ if (ix >= 0x7f800000) \ return x * x; \ if (ix == 0) \ return ((x + one) / zero); \ if (ix < 0x1c800000) { \ if (hx < 0) { \ *signgamp = -1; \ return -native_log(-x); \ } else \ return -native_log(x); \ } \ if (hx < 0) { \ if (ix >= 0x4b000000) \ return ((-x) / zero); \ t = __gen_ocl_internal_sinpi(x); \ if (t == zero) \ return ((-x) / zero); \ nadj = native_log(pi / __gen_ocl_fabs(t * x)); \ if (t < zero) \ *signgamp = -1; \ x = -x; \ } \ if (ix == 0x3f800000 || ix == 0x40000000) \ r = 0; \ else if (ix < 0x40000000) { \ if (ix <= 0x3f666666) { \ r = -native_log(x); \ if (ix >= 0x3f3b4a20) { \ y = one - x; \ i = 0; \ } else if (ix >= 0x3e6d3308) { \ y = x - (tc - one); \ i = 1; \ } else { \ y = x; \ i = 2; \ } \ } else { \ r = zero; \ if (ix >= 0x3fdda618) { \ y = (float) 2.0 - x; \ i = 0; \ } \ else if (ix >= 0x3F9da620) { \ y = x - tc; \ i = 1; \ } \ else { \ y = x - one; \ i = 2; \ } \ } \ switch (i) { \ case 0: \ z = y * y; \ p1 = mad(z, mad(z, mad(z, mad(z, mad(z, a10, a8), a6), a4), a2), a0); \ p2 = z * mad(z, mad(z, mad(z, mad(z, mad(z, a11, a9), a7), a5), a3), a1); \ p = mad(y, p1, p2); \ r = r - mad(y, 0.5f, -p); \ break; \ case 1: \ z = 
y * y; \ w = z * y; \ p1 = mad(w, mad(w, mad(w, mad(w, t12, t9), t6), t3), t0); \ p2 = mad(w, mad(w, mad(w, mad(w, t13, t10), t7), t4), t1); \ p3 = mad(w, mad(w, mad(w, mad(w, t14, t11), t8), t5), t2); \ p = z * p1 + mad(w, mad(y, p3, p2), -tt); \ r += (tf + p); \ break; \ case 2: \ p1 = y * mad(y, mad(y, mad(y, mad(y, mad(y, u5, u4), u3), u2), u1), u0); \ p2 = mad(y, mad(y, mad(y, mad(y, mad(y, v5, v4), v3), v2), v1), one); \ r = r + mad(y, -0.5f, p1 / p2); \ } \ } else if (ix < 0x41000000) { \ i = (int) x; \ t = zero; \ y = x - (float) i; \ p = y * mad(y, mad(y, mad(y, mad(y, mad(y, mad(y, s6, s5), s4), s3), s2), s1), s0); \ q = mad(y, mad(y, mad(y, mad(y, mad(y, mad(y, r6, r5), r4), r3), r2), r1), one); \ r = mad(y, 0.5f, p / q); \ z = one; \ switch (i) { \ case 7: \ z *= (y + (float) 6.0); \ case 6: \ z *= (y + (float) 5.0); \ case 5: \ z *= (y + (float) 4.0); \ case 4: \ z *= (y + (float) 3.0); \ case 3: \ z *= (y + (float) 2.0); \ r += native_log(z); \ break; \ } \ \ } else if (ix < 0x5c800000) { \ t = native_log(x); \ z = one / x; \ y = z * z; \ w = mad(z, mad(y, mad(y, mad(y, mad(y, mad(y, w6, w5), w4), w3), w2), w1), w0); \ r = (x - .5f) * (t - one) + w; \ } else \ r = x * (native_log(x) - one); \ if (hx < 0) \ r = nadj - r; \ return r; OVERLOADABLE float lgamma_r(float x, global int *signgamp) { BODY; } OVERLOADABLE float lgamma_r(float x, local int *signgamp) { BODY; } OVERLOADABLE float lgamma_r(float x, private int *signgamp) { BODY; } #undef BODY OVERLOADABLE float log1p(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_log1p(x); /* * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. * ==================================================== */ const float ln2_hi = 6.9313812256e-01, /* 0x3f317180 */ ln2_lo = 9.0580006145e-06, /* 0x3717f7d1 */ two25 = 3.355443200e+07, /* 0x4c000000 */ Lp1 = 6.6666668653e-01, /* 3F2AAAAB */ Lp2 = 4.0000000596e-01, /* 3ECCCCCD */ Lp3 = 2.8571429849e-01, /* 3E924925 */ Lp4 = 2.2222198546e-01; /* 3E638E29 */ const float zero = 0.0; float hfsq,f,c,s,z,R,u; int k,hx,hu,ax; union {float f; unsigned i;} un; un.f = x; hx = un.i; ax = hx&0x7fffffff; k = 1; if (hx < 0x3ed413d7) { /* x < 0.41422 */ if(ax>=0x3f800000) { /* x <= -1.0 */ if(x==(float)-1.0) return -two25/zero; /* log1p(-1)=+inf */ else return (x-x)/(x-x); /* log1p(x<-1)=NaN */ } if(ax<0x31000000) { /* |x| < 2**-29 */ if(two25+x>zero /* raise inexact */ &&ax<0x24800000) /* |x| < 2**-54 */ return x; else return x - x*x*(float)0.5; } if(hx>0||hx<=((int)0xbe95f61f)) { k=0;f=x;hu=1;} /* -0.2929<x<0.41422 */ } if (hx >= 0x7f800000) return x+x; if(k!=0) { if(hx<0x5a000000) { u = (float)1.0+x; un.f = u; hu = un.i; k = (hu>>23)-127; /* correction term */ c = (k>0)? 
(float)1.0-(u-x):x-(u-(float)1.0); c /= u; } else { u = x; un.f = u; hu = un.i; k = (hu>>23)-127; c = 0; } hu &= 0x007fffff; if(hu<0x3504f7) { un.i = hu|0x3f800000; u = un.f;/* normalize u */ } else { k += 1; un.i = hu|0x3f000000; u = un.f; /* normalize u/2 */ hu = (0x00800000-hu)>>2; } f = u-(float)1.0; } hfsq=(float)0.5*f*f; if(hu==0) { /* |f| < 2**-20 */ if(f==zero) { if(k==0) return zero; else {c = mad(k , ln2_lo, c); return mad(k, ln2_hi, c);} } R = mad(hfsq, 1.0f, -0.66666666666666666f * f); if(k==0) return f-R; else return k * ln2_hi - (R - mad(k, ln2_lo, c) - f); } s = f/((float)2.0+f); z = s*s; R = z * mad(z, mad(z, mad(z, Lp4, Lp3), Lp2), Lp1); if(k==0) return f + mad(hfsq + R, s, -hfsq); else return k*ln2_hi-( (hfsq - mad(s, hfsq + R, mad(k, ln2_lo, c))) - f); } OVERLOADABLE float logb(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_logb(x); union {float f; unsigned i;} u; u.f = x; int e = ((u.i & 0x7f800000) >> 23); float r1 = e-127; float r2 = -INFINITY; float r3 = x*x; /* sub normal or +/-0 */ float r = e == 0 ? r2 : r1; /* inf & nan */ return e == 0xff ? r3 : r; } OVERLOADABLE int ilogb(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_ilogb(x); union { int i; float f; } u; if (isnan(x)) return FP_ILOGBNAN; if (isinf(x)) return 0x7FFFFFFF; u.f = x; u.i &= 0x7fffffff; if (u.i == 0) return FP_ILOGB0; if (u.i >= 0x800000) return (u.i >> 23) - 127; int r = -126; int a = u.i & 0x7FFFFF; while(a < 0x800000) { a <<= 1; r --; } return r; } OVERLOADABLE float nan(uint code) { return NAN; } OVERLOADABLE float __gen_ocl_internal_tanpi(float x) { float sign = 1.0f; int ix; if(isinf(x)) return NAN; if(x < 0.0f) { x = -x; sign = -1.0f; } GEN_OCL_GET_FLOAT_WORD(ix, x); if(x> 0x1.0p24) return 0.0f; float m = __gen_ocl_internal_floor(x); ix = (int)m; m = x-m; int n = __gen_ocl_internal_floor(m*4.0f); if(m == 0.5f) { return (ix&0x1) == 0 ? sign*INFINITY : sign*-INFINITY; } if(m == 0.0f) { return (ix&0x1) == 0 ? 0.0f : -0.0f; } switch(n) { case 0: return sign * __kernel_tanf(m*M_PI_F, 0.0f, 1); case 1: return sign * 1.0f/__kernel_tanf((0.5f-m)*M_PI_F, 0.0f, 1); case 2: return sign * 1.0f/__kernel_tanf((0.5f-m)*M_PI_F, 0.0f, 1); default: return sign * -1.0f*__kernel_tanf((1.0f-m)*M_PI_F, 0.0f, 1); } } OVERLOADABLE float __gen_ocl_internal_cbrt(float x) { /* copied from fdlibm */ const unsigned B1 = 709958130, /* B1 = (84+2/3-0.03306235651)*2**23 */ B2 = 642849266; /* B2 = (76+2/3-0.03306235651)*2**23 */ const float C = 5.4285717010e-01, /* 19/35 = 0x3f0af8b0 */ D = -7.0530611277e-01, /* -864/1225 = 0xbf348ef1 */ E = 1.4142856598e+00, /* 99/70 = 0x3fb50750 */ F = 1.6071428061e+00, /* 45/28 = 0x3fcdb6db */ G = 3.5714286566e-01; /* 5/14 = 0x3eb6db6e */ float r,s,t, w; int hx; uint sign; uint high; GEN_OCL_GET_FLOAT_WORD(hx,x); sign=hx&0x80000000; /* sign= sign(x) */ hx ^=sign; if(hx>=0x7f800000) return(x+x); /* cbrt(NaN,INF) is itself */ if(hx==0) return(x); /* cbrt(0) is itself */ GEN_OCL_SET_FLOAT_WORD(x,hx); /* x <- |x| */ /* rough cbrt to 5 bits */ if(hx<0x00800000) /* subnormal number */ { //SET_FLOAT_WORD(t,0x4b800000); /* set t= 2**24 */ //t*=x; GET_FLOAT_WORD(high,t); SET_FLOAT_WORD(t,high/3+B2); t = (sign == 0) ? 
0.0f : -0.0f; return t; } else GEN_OCL_SET_FLOAT_WORD(t,hx/3+B1); /* new cbrt to 23 bits */ r=t*t/x; s=mad(r, t, C); t*=G+F/(s+E+D/s); /* one step newton iteration to 53 bits with error less than 0.667 ulps */ s=t*t; /* t*t is exact */ r=x/s; w=t+t; r=(r-t)/(w+r); /* r-s is exact */ t=mad(t, r, t); /* retore the sign bit */ GEN_OCL_GET_FLOAT_WORD(high,t); GEN_OCL_SET_FLOAT_WORD(t,high|sign); return(t); } #define BODY \ *cosval = cos(x); \ return sin(x); OVERLOADABLE float sincos(float x, global float *cosval) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_sincos(x, cosval); BODY; } OVERLOADABLE float sincos(float x, local float *cosval) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_sincos(x, cosval); BODY; } OVERLOADABLE float sincos(float x, private float *cosval) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_sincos(x, cosval); BODY; } #undef BODY INLINE float __gen_ocl_asin_util(float x) { /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunSoft, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. * ==================================================== */ float pS0 = 1.66666666666666657415e-01, pS1 = -3.25565818622400915405e-01, pS2 = 2.01212532134862925881e-01, pS3 = -4.00555345006794114027e-02, pS4 = 7.91534994289814532176e-04, qS1 = -2.40339491173441421878e+00, qS2 = 2.02094576023350569471e+00, qS3 = -6.88283971605453293030e-01, qS4 = 7.70381505559019352791e-02; float t = x*x; float p = t * mad(t, mad(t, mad(t, mad(t, pS4, pS3), pS2), pS1), pS0); float q = mad(t, mad(t, mad(t, mad(t, qS4, qS3), qS2), qS1), 1.0f); float w = p / q; return mad(x, w, x); } OVERLOADABLE float __gen_ocl_internal_asin(float x) { uint ix; union { uint i; float f; } u; u.f = x; ix = u.i & 0x7fffffff; if(ix == 0x3f800000) { return x * M_PI_2_F; /* asin(|1|)=+-pi/2 with inexact */ } if(ix > 0x3f800000) { /* |x|>= 1 */ return NAN; /* asin(|x|>1) is NaN */ } if(ix < 0x32000000) { /* if |x| < 2**-27 */ if(HUGE_VALF + x > FLT_ONE) return x; /* return x with inexact if x!=0*/ } if(x < -0.5) { return 2 * __gen_ocl_asin_util(native_sqrt((1+x) / 2)) - M_PI_2_F; } else if(x > 0.5) { return M_PI_2_F - 2 * __gen_ocl_asin_util(native_sqrt((1-x) / 2)); } else { return __gen_ocl_asin_util(x); } } OVERLOADABLE float __gen_ocl_internal_asinpi(float x) { return __gen_ocl_internal_asin(x) / M_PI_F; } OVERLOADABLE float __gen_ocl_internal_acos(float x) { if(x > 0.5) return 2 * __gen_ocl_asin_util(native_sqrt((1-x)/2)); else return M_PI_2_F - __gen_ocl_internal_asin(x); } OVERLOADABLE float __gen_ocl_internal_acospi(float x) { return __gen_ocl_internal_acos(x) / M_PI_F; } __constant float atanhi[4] = { 4.6364760399e-01, /* atan(0.5)hi 0x3eed6338 */ 7.8539812565e-01, /* atan(1.0)hi 0x3f490fda */ 9.8279368877e-01, /* atan(1.5)hi 0x3f7b985e */ 1.5707962513e+00, /* atan(inf)hi 0x3fc90fda */ }; __constant float atanlo[4] = { 5.0121582440e-09, /* atan(0.5)lo 0x31ac3769 */ 3.7748947079e-08, /* atan(1.0)lo 0x33222168 */ 3.4473217170e-08, /* atan(1.5)lo 0x33140fb4 */ 7.5497894159e-08, /* atan(inf)lo 0x33a22168 */ }; OVERLOADABLE float __gen_ocl_internal_atan(float x) { /* copied from fdlibm */ float aT[11]; aT[0] = 3.3333334327e-01; /* 0x3eaaaaaa */ aT[1] = -2.0000000298e-01; /* 0xbe4ccccd */ aT[2] = 1.4285714924e-01; /* 0x3e124925 */ aT[3] = -1.1111110449e-01; /* 0xbde38e38 */ 
aT[4] = 9.0908870101e-02; /* 0x3dba2e6e */ aT[5] = -7.6918758452e-02; /* 0xbd9d8795 */ aT[6] = 6.6610731184e-02; /* 0x3d886b35 */ const float one = 1.0, huge = 1.0e30; float w,s1,s2,z; int ix,hx,id; GEN_OCL_GET_FLOAT_WORD(hx,x); ix = hx&0x7fffffff; if(ix>=0x50800000) { /* if |x| >= 2^34 */ if(ix>0x7f800000) return x+x; /* NaN */ if(hx>0) return atanhi[3]+atanlo[3]; else return -atanhi[3]-atanlo[3]; } if (ix < 0x3ee00000) { /* |x| < 0.4375 */ if (ix < 0x31000000) { /* |x| < 2^-29 */ if(huge+x>one) return x; /* raise inexact */ } id = -1; } else { x = __gen_ocl_fabs(x); if (ix < 0x3f980000) { /* |x| < 1.1875 */ if (ix < 0x3f300000) { /* 7/16 <=|x|<11/16 */ id = 0; x = ((float)2.0*x-one)/((float)2.0+x); } else { /* 11/16<=|x|< 19/16 */ id = 1; x = (x-one)/(x+one); } } else { if (ix < 0x401c0000) { /* |x| < 2.4375 */ id = 2; x = (x-(float)1.5)/(one+(float)1.5*x); } else { /* 2.4375 <= |x| < 2^66 */ id = 3; x = -(float)1.0/x; } }} /* end of argument reduction */ z = x*x; w = z*z; /* break sum from i=0 to 10 aT[i]z**(i+1) into odd and even poly */ s1 = z * mad(w, mad(w, mad(w, aT[6], aT[4]), aT[2]), aT[0]); s2 = w * mad(w, mad(w, aT[5], aT[3]), aT[1]); if (id<0) return x - x*(s1+s2); else { z = atanhi[id] - ((x*(s1+s2) - atanlo[id]) - x); return (hx<0)? -z:z; } } OVERLOADABLE float __gen_ocl_internal_atanpi(float x) { return __gen_ocl_internal_atan(x) / M_PI_F; } // XXX work-around PTX profile OVERLOADABLE float sqrt(float x) { return native_sqrt(x); } OVERLOADABLE float rsqrt(float x) { return native_rsqrt(x); } OVERLOADABLE float __gen_ocl_internal_atan2(float y, float x) { /* copied from fdlibm */ float z; int k,m,hx,hy,ix,iy; const float tiny = 1.0e-30, zero = 0.0, pi_o_4 = 7.8539818525e-01, /* 0x3f490fdb */ pi_o_2 = 1.5707963705e+00, /* 0x3fc90fdb */ pi = 3.1415927410e+00, /* 0x40490fdb */ pi_lo = -8.7422776573e-08; /* 0xb3bbbd2e */ GEN_OCL_GET_FLOAT_WORD(hx,x); ix = hx&0x7fffffff; GEN_OCL_GET_FLOAT_WORD(hy,y); iy = hy&0x7fffffff; if((ix>0x7f800000)|| (iy>0x7f800000)) /* x or y is NaN */ return x+y; if(hx==0x3f800000) return z=__gen_ocl_internal_atan(y); /* x=1.0 */ m = ((hy>>31)&1)|((hx>>30)&2); /* 2*sign(x)+sign(y) */ /* when y = 0 */ if(iy==0) { switch(m) { case 0: case 1: return y; /* atan(+-0,+anything)=+-0 */ case 2: return pi+tiny;/* atan(+0,-anything) = pi */ case 3: return -pi-tiny;/* atan(-0,-anything) =-pi */ } } /* when x = 0 */ if(ix==0) return (hy<0)? -pi_o_2-tiny: pi_o_2+tiny; /* both are denorms. Gen does not support denorm, so we convert to normal float number*/ if(ix <= 0x7fffff && iy <= 0x7fffff) { x = (float)(ix) * (1.0f - ((hx>>30) & 0x2)); y = (float)(iy) * (1.0f - ((hy>>30) & 0x2)); } /* when x is INF */ if(ix==0x7f800000) { if(iy==0x7f800000) { switch(m) { case 0: return pi_o_4+tiny;/* atan(+INF,+INF) */ case 1: return -pi_o_4-tiny;/* atan(-INF,+INF) */ case 2: return (float)3.0*pi_o_4+tiny;/*atan(+INF,-INF)*/ case 3: return (float)-3.0*pi_o_4-tiny;/*atan(-INF,-INF)*/ } } else { switch(m) { case 0: return zero ; /* atan(+...,+INF) */ case 1: return -zero ; /* atan(-...,+INF) */ case 2: return pi+tiny ; /* atan(+...,-INF) */ case 3: return -pi-tiny ; /* atan(-...,-INF) */ } } } /* when y is INF */ if(iy==0x7f800000) return (hy<0)? 
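/* y = +-inf with finite x collapses to +-pi/2 whatever x is; every other
   special case was already routed through the quadrant selector
   m = 2*sign(x)+sign(y) above, e.g. atan2(+0.0f, -1.0f) lands in case 2
   of the y==0 switch and returns +pi. */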
-pi_o_2-tiny: pi_o_2+tiny; /* compute y/x */ k = (iy-ix)>>23; if(k > 60) z=pi_o_2+(float)0.5*pi_lo; /* |y/x| > 2**60 */ else if(hx<0&&k<-60) z=0.0; /* |y|/x < -2**60 */ else z=__gen_ocl_internal_atan(__gen_ocl_fabs(y/x)); /* safe to do y/x */ switch (m) { case 0: return z ; /* atan(+,+) */ case 1: { uint zh; GEN_OCL_GET_FLOAT_WORD(zh,z); GEN_OCL_SET_FLOAT_WORD(z,zh ^ 0x80000000); } return z ; /* atan(-,+) */ case 2: return pi-(z-pi_lo);/* atan(+,-) */ default: /* case 3 */ return (z-pi_lo)-pi;/* atan(-,-) */ } } OVERLOADABLE float __gen_ocl_internal_atan2pi(float y, float x) { return __gen_ocl_internal_atan2(y, x) / M_PI_F; } OVERLOADABLE float __gen_ocl_internal_fabs(float x) { return __gen_ocl_fabs(x); } OVERLOADABLE float __gen_ocl_internal_trunc(float x) { return __gen_ocl_rndz(x); } OVERLOADABLE float __gen_ocl_internal_round(float x) { float y = __gen_ocl_rndz(x); if (__gen_ocl_fabs(x - y) >= 0.5f) y += __gen_ocl_internal_copysign(1.f, x); return y; } OVERLOADABLE float __gen_ocl_internal_ceil(float x) { return __gen_ocl_rndu(x); } OVERLOADABLE float __gen_ocl_internal_rint(float x) { return __gen_ocl_rnde(x); } OVERLOADABLE float __gen_ocl_internal_exp(float x) { float o_threshold = 8.8721679688e+01, /* 0x42b17180 */ u_threshold = -1.0397208405e+02, /* 0xc2cff1b5 */ twom100 = 7.8886090522e-31, /* 2**-100=0x0d800000 */ ivln2 = 1.4426950216e+00, /* 0x3fb8aa3b =1/ln2 */ one = 1.0, huge = 1.0e+30, P1 = 1.6666667163e-01, /* 0x3e2aaaab */ P2 = -2.7777778450e-03; /* 0xbb360b61 */ float y,hi=0.0,lo=0.0,c,t; int k=0,xsb; unsigned hx; float ln2HI_0 = 6.9313812256e-01; /* 0x3f317180 */ float ln2HI_1 = -6.9313812256e-01; /* 0xbf317180 */ float ln2LO_0 = 9.0580006145e-06; /* 0x3717f7d1 */ float ln2LO_1 = -9.0580006145e-06; /* 0xb717f7d1 */ float half_0 = 0.5; float half_1 = -0.5; GEN_OCL_GET_FLOAT_WORD(hx,x); xsb = (hx>>31)&1; /* sign bit of x */ hx &= 0x7fffffff; /* high word of |x| */ /* filter out non-finite argument */ if(hx >= 0x42b17218) { /* if |x|>=88.721... */ if(hx>0x7f800000) return x+x; /* NaN */ if(hx==0x7f800000) return (xsb==0)? x:0.0; /* exp(+-inf)={inf,0} */ if(x > o_threshold) return huge*huge; /* overflow */ if(x < u_threshold) return twom100*twom100; /* underflow */ } /* argument reduction */ if(hx > 0x3eb17218) { /* if |x| > 0.5 ln2 */ if(hx < 0x3F851592) { /* and |x| < 1.5 ln2 */ hi = x-(xsb ==1 ? ln2HI_1 : ln2HI_0); lo= xsb == 1? ln2LO_1 : ln2LO_0; k = 1-xsb-xsb; } else { float tmp = xsb == 1 ? half_1 : half_0; k = ivln2*x+tmp; t = k; hi = x - t*ln2HI_0; /* t*ln2HI is exact here */ lo = t*ln2LO_0; } x = hi - lo; } else if(hx < 0x31800000) { /* when |x|<2**-28 */ if(huge+x>one) return one+x;/* trigger inexact */ } else k = 0; /* x is now in primary range */ t = x*x; c = x - t*(P1+t*P2); if(k==0) return one-((x*c)/(c-(float)2.0)-x); else y = one-((lo-(x*c)/((float)2.0-c))-hi); if(k >= -125) { unsigned hy; GEN_OCL_GET_FLOAT_WORD(hy,y); GEN_OCL_SET_FLOAT_WORD(y,hy+(k<<23)); /* add k to y's exponent */ return y; } else { unsigned hy; GEN_OCL_GET_FLOAT_WORD(hy,y); GEN_OCL_SET_FLOAT_WORD(y,hy+((k+100)<<23)); /* add k to y's exponent */ return y*twom100; } } /* erf,erfc from glibc s_erff.c -- float version of s_erf.c. * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com. */ /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. 
* Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. * ==================================================== */ INLINE_OVERLOADABLE float __gen_ocl_internal_erf(float x) { /*...*/ const float tiny = 1.0e-30, half_val= 5.0000000000e-01, /* 0x3F000000 */ one = 1.0000000000e+00, /* 0x3F800000 */ two = 2.0000000000e+00, /* 0x40000000 */ /* c = (subfloat)0.84506291151 */ erx = 8.4506291151e-01, /* 0x3f58560b */ /* * Coefficients for approximation to erf on [0,0.84375] */ efx = 1.2837916613e-01, /* 0x3e0375d4 */ efx8= 1.0270333290e+00, /* 0x3f8375d4 */ pp0 = 1.2837916613e-01, /* 0x3e0375d4 */ pp1 = -3.2504209876e-01, /* 0xbea66beb */ pp2 = -2.8481749818e-02, /* 0xbce9528f */ pp3 = -5.7702702470e-03, /* 0xbbbd1489 */ pp4 = -2.3763017452e-05, /* 0xb7c756b1 */ qq1 = 3.9791721106e-01, /* 0x3ecbbbce */ qq2 = 6.5022252500e-02, /* 0x3d852a63 */ qq3 = 5.0813062117e-03, /* 0x3ba68116 */ qq4 = 1.3249473704e-04, /* 0x390aee49 */ qq5 = -3.9602282413e-06, /* 0xb684e21a */ /* * Coefficients for approximation to erf in [0.84375,1.25] */ pa0 = -2.3621185683e-03, /* 0xbb1acdc6 */ pa1 = 4.1485610604e-01, /* 0x3ed46805 */ pa2 = -3.7220788002e-01, /* 0xbebe9208 */ pa3 = 3.1834661961e-01, /* 0x3ea2fe54 */ pa4 = -1.1089469492e-01, /* 0xbde31cc2 */ pa5 = 3.5478305072e-02, /* 0x3d1151b3 */ pa6 = -2.1663755178e-03, /* 0xbb0df9c0 */ qa1 = 1.0642088205e-01, /* 0x3dd9f331 */ qa2 = 5.4039794207e-01, /* 0x3f0a5785 */ qa3 = 7.1828655899e-02, /* 0x3d931ae7 */ qa4 = 1.2617121637e-01, /* 0x3e013307 */ qa5 = 1.3637083583e-02, /* 0x3c5f6e13 */ qa6 = 1.1984500103e-02, /* 0x3c445aa3 */ /* * Coefficients for approximation to erfc in [1.25,1/0.35] */ra0 = -9.8649440333e-03, /* 0xbc21a093 */ ra1 = -6.9385856390e-01, /* 0xbf31a0b7 */ ra2 = -1.0558626175e+01, /* 0xc128f022 */ ra3 = -6.2375331879e+01, /* 0xc2798057 */ ra4 = -1.6239666748e+02, /* 0xc322658c */ ra5 = -1.8460508728e+02, /* 0xc3389ae7 */ ra6 = -8.1287437439e+01, /* 0xc2a2932b */ ra7 = -9.8143291473e+00, /* 0xc11d077e */ sa1 = 1.9651271820e+01, /* 0x419d35ce */ sa2 = 1.3765776062e+02, /* 0x4309a863 */ sa3 = 4.3456588745e+02, /* 0x43d9486f */ sa4 = 6.4538726807e+02, /* 0x442158c9 */ sa5 = 4.2900814819e+02, /* 0x43d6810b */ sa6 = 1.0863500214e+02, /* 0x42d9451f */ sa7 = 6.5702495575e+00, /* 0x40d23f7c */ sa8 = -6.0424413532e-02, /* 0xbd777f97 */ /* * Coefficients for approximation to erfc in [1/.35,28] */ rb0 = -9.8649431020e-03, /* 0xbc21a092 */ rb1 = -7.9928326607e-01, /* 0xbf4c9dd4 */ rb2 = -1.7757955551e+01, /* 0xc18e104b */ rb3 = -1.6063638306e+02, /* 0xc320a2ea */ rb4 = -6.3756646729e+02, /* 0xc41f6441 */ rb5 = -1.0250950928e+03, /* 0xc480230b */ rb6 = -4.8351919556e+02, /* 0xc3f1c275 */ sb1 = 3.0338060379e+01, /* 0x41f2b459 */ sb2 = 3.2579251099e+02, /* 0x43a2e571 */ sb3 = 1.5367296143e+03, /* 0x44c01759 */ sb4 = 3.1998581543e+03, /* 0x4547fdbb */ sb5 = 2.5530502930e+03, /* 0x451f90ce */ sb6 = 4.7452853394e+02, /* 0x43ed43a7 */ sb7 = -2.2440952301e+01; /* 0xc1b38712 */ int hx,ix,i; float R,S,P,Q,s,y,z,r; GEN_OCL_GET_FLOAT_WORD(hx,x); ix = hx&0x7fffffff; if(ix>=0x7f800000) { /* erf(nan)=nan */ i = ((unsigned int)hx>>31)<<1; return (float)(1-i)+one/x; /* erf(+-inf)=+-1 */ } if(ix < 0x3f580000) { /* |x|<0.84375 */ if(ix < 0x31800000) { /* |x|<2**-28 */ if (ix < 0x04000000) /*avoid underflow */ return (float)0.125*((float)8.0*x+efx8*x); return x + efx*x; } z = x*x; r = mad(z, mad(z, mad(z, mad(z, pp4, pp3), pp2), pp1), pp0); s = mad(z, mad(z, mad(z, mad(z, mad(z, qq5,qq4), qq3), qq2), qq1), one); y = r / 
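/* For |x| < 0.84375, erf(x) is computed as x + x*R(z) with z = x*x, where
   R = r/s is the rational minimax quotient built from pp0..pp4 over
   qq1..qq5; the mad(x, y, x) just below evaluates x*(one + R) in a single
   fused step. */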
s; return mad(x, y, x); } if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ s = __gen_ocl_internal_fabs(x)-one; P = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, pa6, pa5), pa4), pa3), pa2), pa1), pa0); Q = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, qa6, qa5), qa4), qa3), qa2), qa1), one); if(hx>=0) return erx + P/Q; else return -erx - P/Q; } if (ix >= 0x40c00000) { /* inf>|x|>=6 */ if(hx>=0) return one-tiny; else return tiny-one; } x = __gen_ocl_internal_fabs(x); s = one/(x*x); if(ix< 0x4036DB6E) { /* |x| < 1/0.35 */ R = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, ra7, ra6), ra5), ra4), ra3), ra2), ra1), ra0); S = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, sa8, sa7), sa6), sa5), sa4), sa3), sa2), sa1), one); } else { /* |x| >= 1/0.35 */ R = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, rb6, rb5), rb4), rb3), rb2), rb1), rb0); S = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, sb7, sb6), sb5), sb4), sb3), sb2), sb1), one); } GEN_OCL_GET_FLOAT_WORD(ix,x); GEN_OCL_SET_FLOAT_WORD(z,ix&0xfffff000); r = __gen_ocl_internal_exp(-z*z-(float)0.5625)*__gen_ocl_internal_exp((z-x)*(z+x)+R/S); if(hx>=0) return one-r/x; else return r/x-one; } INLINE_OVERLOADABLE float __gen_ocl_internal_erfc(float x) { /*...*/ const float tiny = 1.0e-30, half_val= 5.0000000000e-01, /* 0x3F000000 */ one = 1.0000000000e+00, /* 0x3F800000 */ two = 2.0000000000e+00, /* 0x40000000 */ /* c = (subfloat)0.84506291151 */ erx = 8.4506291151e-01, /* 0x3f58560b */ /* * Coefficients for approximation to erf on [0,0.84375] */ efx = 1.2837916613e-01, /* 0x3e0375d4 */ efx8= 1.0270333290e+00, /* 0x3f8375d4 */ pp0 = 1.2837916613e-01, /* 0x3e0375d4 */ pp1 = -3.2504209876e-01, /* 0xbea66beb */ pp2 = -2.8481749818e-02, /* 0xbce9528f */ pp3 = -5.7702702470e-03, /* 0xbbbd1489 */ pp4 = -2.3763017452e-05, /* 0xb7c756b1 */ qq1 = 3.9791721106e-01, /* 0x3ecbbbce */ qq2 = 6.5022252500e-02, /* 0x3d852a63 */ qq3 = 5.0813062117e-03, /* 0x3ba68116 */ qq4 = 1.3249473704e-04, /* 0x390aee49 */ qq5 = -3.9602282413e-06, /* 0xb684e21a */ /* * Coefficients for approximation to erf in [0.84375,1.25] */ pa0 = -2.3621185683e-03, /* 0xbb1acdc6 */ pa1 = 4.1485610604e-01, /* 0x3ed46805 */ pa2 = -3.7220788002e-01, /* 0xbebe9208 */ pa3 = 3.1834661961e-01, /* 0x3ea2fe54 */ pa4 = -1.1089469492e-01, /* 0xbde31cc2 */ pa5 = 3.5478305072e-02, /* 0x3d1151b3 */ pa6 = -2.1663755178e-03, /* 0xbb0df9c0 */ qa1 = 1.0642088205e-01, /* 0x3dd9f331 */ qa2 = 5.4039794207e-01, /* 0x3f0a5785 */ qa3 = 7.1828655899e-02, /* 0x3d931ae7 */ qa4 = 1.2617121637e-01, /* 0x3e013307 */ qa5 = 1.3637083583e-02, /* 0x3c5f6e13 */ qa6 = 1.1984500103e-02, /* 0x3c445aa3 */ /* * Coefficients for approximation to erfc in [1.25,1/0.35] */ra0 = -9.8649440333e-03, /* 0xbc21a093 */ ra1 = -6.9385856390e-01, /* 0xbf31a0b7 */ ra2 = -1.0558626175e+01, /* 0xc128f022 */ ra3 = -6.2375331879e+01, /* 0xc2798057 */ ra4 = -1.6239666748e+02, /* 0xc322658c */ ra5 = -1.8460508728e+02, /* 0xc3389ae7 */ ra6 = -8.1287437439e+01, /* 0xc2a2932b */ ra7 = -9.8143291473e+00, /* 0xc11d077e */ sa1 = 1.9651271820e+01, /* 0x419d35ce */ sa2 = 1.3765776062e+02, /* 0x4309a863 */ sa3 = 4.3456588745e+02, /* 0x43d9486f */ sa4 = 6.4538726807e+02, /* 0x442158c9 */ sa5 = 4.2900814819e+02, /* 0x43d6810b */ sa6 = 1.0863500214e+02, /* 0x42d9451f */ sa7 = 6.5702495575e+00, /* 0x40d23f7c */ sa8 = -6.0424413532e-02, /* 0xbd777f97 */ /* * Coefficients for approximation to erfc in [1/.35,28] */ rb0 = -9.8649431020e-03, /* 0xbc21a092 */ rb1 = -7.9928326607e-01, /* 0xbf4c9dd4 */ rb2 = -1.7757955551e+01, /* 0xc18e104b */ rb3 = 
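/* The rb and sb tables that follow cover the far tail 1/0.35 <= |x| < 28,
   where erfc is reconstructed further down as
   exp(-z*z - 0.5625) * exp((z-x)*(z+x) + R/S) / x. */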
-1.6063638306e+02, /* 0xc320a2ea */ rb4 = -6.3756646729e+02, /* 0xc41f6441 */ rb5 = -1.0250950928e+03, /* 0xc480230b */ rb6 = -4.8351919556e+02, /* 0xc3f1c275 */ sb1 = 3.0338060379e+01, /* 0x41f2b459 */ sb2 = 3.2579251099e+02, /* 0x43a2e571 */ sb3 = 1.5367296143e+03, /* 0x44c01759 */ sb4 = 3.1998581543e+03, /* 0x4547fdbb */ sb5 = 2.5530502930e+03, /* 0x451f90ce */ sb6 = 4.7452853394e+02, /* 0x43ed43a7 */ sb7 = -2.2440952301e+01; /* 0xc1b38712 */ int hx,ix; float R,S,P,Q,s,y,z,r; GEN_OCL_GET_FLOAT_WORD(hx,x); ix = hx&0x7fffffff; if(ix>=0x7f800000) { /* erfc(nan)=nan */ /* erfc(+-inf)=0,2 */ return (float)(((unsigned int)hx>>31)<<1)+one/x; } if(ix < 0x3f580000) { /* |x|<0.84375 */ if(ix < 0x23800000) /* |x|<2**-56 */ return one-x; z = x*x; r = mad(z, mad(z, mad(z, mad(z, pp4, pp3), pp2), pp1), pp0); s = mad(z, mad(z, mad(z, mad(z, mad(z, qq5, qq4), qq3), qq2), qq1), one); y = r/s; if(hx < 0x3e800000) { /* x<1/4 */ return one-(x+x*y); } else { r = x*y; r += (x-half_val); return half_val - r ; } } if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ s = __gen_ocl_internal_fabs(x)-one; P = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, pa6, pa5), pa4), pa3), pa2), pa1), pa0); Q = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, qa6, qa5), qa4), qa3), qa2), qa1), one); if(hx>=0) { z = one-erx; return z - P/Q; } else { z = erx+P/Q; return one+z; } } if (ix < 0x41e00000) { /* |x|<28 */ x = __gen_ocl_internal_fabs(x); s = one/(x*x); if(ix< 0x4036DB6D) { /* |x| < 1/.35 ~ 2.857143*/ R = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, ra7, ra6), ra5), ra4), ra3), ra2), ra1), ra0); S = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, sa8, sa7), sa6), sa5), sa4), sa3), sa2), sa1), one); } else { /* |x| >= 1/.35 ~ 2.857143 */ if(hx<0&&ix>=0x40c00000) return two-tiny;/* x < -6 */ R = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, rb6, rb5), rb4), rb3), rb2), rb1), rb0); S = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, sb7, sb6), sb5), sb4), sb3), sb2), sb1), one); } GEN_OCL_GET_FLOAT_WORD(ix,x); GEN_OCL_SET_FLOAT_WORD(z,ix&0xffffe000); r = __gen_ocl_internal_exp(-z*z-(float)0.5625)* __gen_ocl_internal_exp((z-x)*(z+x)+R/S); if(hx>0) { float ret = r/x; return ret; } else return two-r/x; } else { if(hx>0) { return tiny*tiny; } else return two-tiny; } } OVERLOADABLE float __gen_ocl_internal_fmod (float x, float y) { //return x-y*__gen_ocl_rndz(x/y); float one = 1.0; float Zero[2]; int n,hx,hy,hz,ix,iy,sx,i; Zero[0] = 0.0; Zero[1] = -0.0; GEN_OCL_GET_FLOAT_WORD(hx,x); GEN_OCL_GET_FLOAT_WORD(hy,y); sx = hx&0x80000000; /* sign of x */ hx ^=sx; /* |x| */ hy &= 0x7fffffff; /* |y| */ /* purge off exception values */ if(hy==0||(hx>=0x7f800000)|| /* y=0,or x not finite */ (hy>0x7f800000)) /* or y is NaN */ return (x*y)/(x*y); if(hx<hy) return x; /* |x|<|y| return x */ if(hx==hy) return Zero[(unsigned)sx>>31]; /* |x|=|y| return x*0*/ /* determine ix = ilogb(x) */ if(hx<0x00800000) { /* subnormal x */ for (ix = -126,i=(hx<<8); i>0; i<<=1) ix -=1; } else ix = (hx>>23)-127; /* determine iy = ilogb(y) */ if(hy<0x00800000) { /* subnormal y */ for (iy = -126,i=(hy<<8); i>=0; i<<=1) iy -=1; } else iy = (hy>>23)-127; /* set up {hx,lx}, {hy,ly} and align y to x */ if(ix >= -126) hx = 0x00800000|(0x007fffff&hx); else { /* subnormal x, shift x to normal */ n = -126-ix; hx = hx<<n; } if(iy >= -126) hy = 0x00800000|(0x007fffff&hy); else { /* subnormal y, shift y to normal */ n = -126-iy; hy = hy<<n; } /* fix point fmod */ n = ix - iy; while(n--) { hz=hx-hy; if(hz<0){hx = hx+hx;} else { if(hz==0) /* return sign(x)*0 */ return Zero[(unsigned)sx>>31]; hx = hz+hz; } } hz=hx-hy; if(hz>=0) {hx=hz;} /* convert back to floating value and restore the sign */ if(hx==0) /* return sign(x)*0 */ return Zero[(unsigned)sx>>31]; while(hx<0x00800000) { /* normalize x */ hx = hx+hx; iy -=
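/* fmod works on the significands as 24-bit integers: align the exponents,
   then do shift-subtract long division (the while(n--) loop above). Worked
   example: fmod(5.5f, 2.0f) aligns ilogb 2 against 1, subtracts twice and
   leaves 1.5f = 5.5 - 2*2. The loop here renormalizes the remainder so its
   leading one returns to bit 23 before the exponent is reattached. */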
1; } if(iy>= -126) { /* normalize output */ hx = ((hx-0x00800000)|((iy+127)<<23)); GEN_OCL_SET_FLOAT_WORD(x,hx|sx); } else { /* subnormal output */ n = -126 - iy; hx >>= n; GEN_OCL_SET_FLOAT_WORD(x,hx|sx); x *= one; /* create necessary signal */ } return x; /* exact output */ } OVERLOADABLE float __gen_ocl_internal_expm1(float x) { //return __gen_ocl_pow(M_E_F, x) - 1; float Q1 = -3.3333335072e-02, /* 0xbd088889 */ ln2_hi = 6.9313812256e-01, /* 0x3f317180 */ ln2_lo = 9.0580006145e-06, /* 0x3717f7d1 */ Q2 = 1.5873016091e-03, /* 0x3ad00d01 */ huge = 1.0e30, tiny = 1.0e-30, ivln2 = 1.4426950216e+00, /* 0x3fb8aa3b =1/ln2 */ one = 1.0, o_threshold= 8.8721679688e+01; /* 0x42b17180 */ float y,hi,lo,c,t,e,hxs,hfx,r1; int k,xsb; int hx; GEN_OCL_GET_FLOAT_WORD(hx,x); xsb = hx&0x80000000; /* sign bit of x */ //if(xsb==0) //y=x; //else //y= -x; /* y = |x| */ y = __gen_ocl_internal_fabs(x); hx &= 0x7fffffff; /* high word of |x| */ /* filter out huge and non-finite argument */ if(hx >= 0x4195b844) { /* if |x|>=27*ln2 */ if(hx >= 0x42b17218) { /* if |x|>=88.721... */ if(hx>0x7f800000) return x+x; /* NaN */ if(hx==0x7f800000) return (xsb==0)? x:-1.0;/* exp(+-inf)={inf,-1} */ if(x > o_threshold) return huge*huge; /* overflow */ } if(xsb!=0) { /* x < -27*ln2, return -1.0 with inexact */ if(x+tiny<(float)0.0) /* raise inexact */ return tiny-one; /* return -1 */ } } /* argument reduction */ if(hx > 0x3eb17218) {/* if |x| > 0.5 ln2 */ if(hx < 0x3F851592) {/* and |x| < 1.5 ln2 */ if(xsb==0){ hi = x - ln2_hi; lo = ln2_lo; k = 1; } else { hi = x + ln2_hi; lo = -ln2_lo; k = -1; } } else { k = ivln2*x+((xsb==0)?(float)0.5:(float)-0.5); t = k; hi = x - t*ln2_hi;/* t*ln2_hi is exact here */ lo = t*ln2_lo; } x = hi - lo; c = (hi-x)-lo; } else if(hx < 0x33000000) { /* when |x|<2**-25, return x */ //t = huge+x; /* return x with inexact flags when x!=0 */ //return x - (t-(huge+x)); return x; } else k = 0; /* x is now in primary range */ hfx = (float)0.5*x; hxs = x*hfx; r1 = one+hxs*(Q1+hxs*Q2); t = (float)3.0-r1*hfx; e = hxs*((r1-t)/((float)6.0 - x*t)); if(k==0) return x - (x*e-hxs); /* c is 0 */ else{ e = (x*(e-c)-c); e -= hxs; if(k== -1)return (float)0.5*(x-e)-(float)0.5; if(k==1){ if(x < (float)-0.25) return -(float)2.0*(e-(x+(float)0.5)); else return (one+(float)2.0*(x-e)); } if (k <= -2 || k>56) { /* suffice to return exp(x)-1 */ int i; y = one-(e-x); GEN_OCL_GET_FLOAT_WORD(i,y); GEN_OCL_SET_FLOAT_WORD(y,i+(k<<23)); /* add k to y's exponent */ return y-one; } t = one; if(k<23) { int i; GEN_OCL_SET_FLOAT_WORD(t,0x3f800000 - (0x1000000>>k)); /* t=1-2^-k */ y = t-(e-x); GEN_OCL_GET_FLOAT_WORD(i,y); GEN_OCL_SET_FLOAT_WORD(y,i+(k<<23)); /* add k to y's exponent */ } else { int i; GEN_OCL_SET_FLOAT_WORD(t,((0x7f-k)<<23)); /* 2^-k */ y = x-(e+t); y += one; GEN_OCL_GET_FLOAT_WORD(i,y); GEN_OCL_SET_FLOAT_WORD(y,i+(k<<23)); /* add k to y's exponent */ } } return y; } OVERLOADABLE float __gen_ocl_internal_acosh(float x) { //return native_log(x + native_sqrt(x + 1) * native_sqrt(x - 1)); float one = 1.0, ln2 = 6.9314718246e-01;/* 0x3f317218 */ float t; int hx; GEN_OCL_GET_FLOAT_WORD(hx,x); if(hx<0x3f800000) { /* x < 1 */ return (x-x)/(x-x); } else if(hx >=0x4d800000) { /* x > 2**28 */ if(hx >=0x7f800000) {/* x is inf or NaN */ return x+x; } else return __gen_ocl_internal_log(x)+ln2;/* acosh(huge)=log(2x) */ } else if (hx==0x3f800000) { return 0.0; /* acosh(1) = 0 */ } else if (hx > 0x40000000) { /* 2**28 > x > 2 */ t=x*x; return __gen_ocl_internal_log((float)2.0*x-one/(x+__gen_ocl_sqrt(t-one))); } else { /* 1<x<2 */ t = x-one; return log1p(t+__gen_ocl_sqrt((float)2.0*t+t*t)); } } OVERLOADABLE float __gen_ocl_internal_asinh(float x) { //return native_log(x + native_sqrt(x * x + 1)); float one = 1.0000000000e+00, ln2 = 6.9314718246e-01, huge = 1.0000000000e+30; float w; int hx,ix; GEN_OCL_GET_FLOAT_WORD(hx,x); ix = hx&0x7fffffff; if(ix < 0x38000000) { /* |x|<2**-14 */ if(huge+x>one) return x; /*
return x inexact except 0 */ } if(ix>0x47000000) {/* |x| > 2**14 */ if(ix>=0x7f800000) return x+x;/* x is inf or NaN */ w = __gen_ocl_internal_log(__gen_ocl_internal_fabs(x))+ln2; } else { float xa = __gen_ocl_internal_fabs(x); if (ix>0x40000000) {/* 2**14 > |x| > 2.0 */ w = __gen_ocl_internal_log(mad(xa, 2.0f, one / (__gen_ocl_sqrt(mad(xa, xa, one)) + xa))); } else { /* 2.0 > |x| > 2**-14 */ float t = xa*xa; w =log1p(xa+t/(one+__gen_ocl_sqrt(one+t))); } } return __gen_ocl_internal_copysign(w, x); } OVERLOADABLE float __gen_ocl_internal_sinh(float x){ //return (1 - native_exp(-2 * x)) / (2 * native_exp(-x)); float one = 1.0, shuge = 1.0e37; float t,w,h; int ix,jx; GEN_OCL_GET_FLOAT_WORD(jx,x); ix = jx&0x7fffffff; /* x is INF or NaN */ if(ix>=0x7f800000) return x+x; h = 0.5; if (jx<0) h = -h; /* |x| in [0,22], return sign(x)*0.5*(E+E/(E+1))) */ if (ix < 0x41b00000) { /* |x|<22 */ if (ix<0x31800000) /* |x|<2**-28 */ if(shuge+x>one) return x;/* sinh(tiny) = tiny with inexact */ t = __gen_ocl_internal_expm1(__gen_ocl_internal_fabs(x)); if(ix<0x3f800000) return h*((float)2.0*t-t*t/(t+one)); return h*(t+t/(t+one)); } /* |x| in [22, log(maxdouble)] return 0.5*exp(|x|) */ if (ix < 0x42b17180) return h*__gen_ocl_internal_exp(__gen_ocl_internal_fabs(x)); /* |x| in [log(maxdouble), overflowthresold] */ if (ix<=0x42b2d4fc) { w = __gen_ocl_internal_exp((float)0.5*__gen_ocl_internal_fabs(x)); t = h*w; return t*w; } /* |x| > overflowthresold, sinh(x) overflow */ return x*shuge; } OVERLOADABLE float __gen_ocl_internal_tanh(float x) { //float y = native_exp(-2 * x); //return (1 - y) / (1 + y); float one=1.0, two=2.0, tiny = 1.0e-30; float t,z; int jx,ix; GEN_OCL_GET_FLOAT_WORD(jx,x); ix = jx&0x7fffffff; /* x is INF or NaN */ if(ix>=0x7f800000) { if (jx>=0) return one/x+one; /* tanh(+-inf)=+-1 */ else return one/x-one; /* tanh(NaN) = NaN */ } if (ix < 0x41b00000) { /* |x|<22 */ if (ix == 0) return x; /* x == +-0 */ if (ix<0x24000000) /* |x|<2**-55 */ return x*(one+x); /* tanh(small) = small */ if (ix>=0x3f800000) { /* |x|>=1 */ t = __gen_ocl_internal_expm1(two*__gen_ocl_internal_fabs(x)); z = one - two/(t+two); } else { t = __gen_ocl_internal_expm1(-two*__gen_ocl_internal_fabs(x)); z= -t/(t+two); } } else { /* |x| > 22, return +-1 */ z = one - tiny; /* raised inexact flag */ } return (jx>=0)? 
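/* For |x| > 22, exp(-2|x|) is below half an ulp of 1 in single precision,
   so z saturates to one - tiny (which rounds to 1.0f) and only the sign of
   the argument survives, e.g. tanh(30.0f) == 1.0f in this implementation. */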
z: -z; } OVERLOADABLE float __gen_ocl_internal_cosh(float x) { //return (1 + native_exp(-2 * x)) / (2 * native_exp(-x)); float halF = 0.5, huge = 1.0e+30, tiny = 1.0e-30, one = 1.0; float t,w; int ix; GEN_OCL_GET_FLOAT_WORD(ix,x); ix &= 0x7fffffff; /* |x| in [0,22] */ if (ix < 0x41b00000) { /* |x| in [0,0.5*ln2], return 1+expm1(|x|)^2/(2*exp(|x|)) */ if(ix<0x3eb17218) { t = __gen_ocl_internal_expm1(__gen_ocl_fabs(x)); w = one+t; if (ix<0x24000000) return w; /* cosh(tiny) = 1 */ return one+(t*t)/(w+w); } /* |x| in [0.5*ln2,22], return (exp(|x|)+1/exp(|x|)/2; */ t = __gen_ocl_internal_exp(__gen_ocl_fabs(x)); return halF*t+halF/t; } /* |x| in [22, log(maxdouble)] return half*exp(|x|) */ if (ix < 0x42b17180) return halF*__gen_ocl_internal_exp(__gen_ocl_fabs(x)); /* |x| in [log(maxdouble), overflowthresold] */ if (ix<=0x42b2d4fc) { w = __gen_ocl_internal_exp(halF*__gen_ocl_fabs(x)); t = halF*w; return t*w; } /* x is INF or NaN */ if(ix>=0x7f800000) return x*x; /* |x| > overflowthresold, cosh(x) overflow */ return huge*huge; } OVERLOADABLE float __gen_ocl_internal_remainder(float x, float p){ //return x-y*__gen_ocl_rnde(x/y); float zero = 0.0; int hx,hp; unsigned sx; float p_half; GEN_OCL_GET_FLOAT_WORD(hx,x); GEN_OCL_GET_FLOAT_WORD(hp,p); sx = hx&0x80000000; hp &= 0x7fffffff; hx &= 0x7fffffff; /* purge off exception values */ if(hp==0) return (x*p)/(x*p); /* p = 0 */ if((hx>=0x7f800000)|| /* x not finite */ ((hp>0x7f800000))) /* p is NaN */ return (x*p)/(x*p); if (hp<=0x7effffff) x = __gen_ocl_internal_fmod(x,p+p); /* now x < 2p */ if ((hx-hp)==0) return zero*x; x = __gen_ocl_fabs(x); p = __gen_ocl_fabs(p); if (hp<0x01000000) { if(x+x>p) { x-=p; if(x+x>=p) x -= p; } } else { p_half = (float)0.5*p; if(x>p_half) { x-=p; if(x>=p_half) x -= p; } } GEN_OCL_GET_FLOAT_WORD(hx,x); GEN_OCL_SET_FLOAT_WORD(x,hx^sx); return x; } OVERLOADABLE float __gen_ocl_internal_ldexp(float x, int n) { x = __gen_ocl_scalbnf(x,n); return x; } OVERLOADABLE float __gen_ocl_internal_atanh(float x) { //return 0.5f * native_sqrt((1 + x) / (1 - x)); float xa = __gen_ocl_fabs (x); float t; if (isless (xa, 0.5f)){ if (xa < 0x1.0p-28f) return x; t = xa + xa; t = 0.5f * log1p (t + t * xa / (1.0f - xa)); } else if (isless (xa, 1.0f)){ t = 0.5f * log1p ((xa + xa) / (1.0f - xa)); } else{ if (isgreater (xa, 1.0f)) return (x - x) / (x - x); return x / 0.0f; } return __gen_ocl_internal_copysign(t, x); } OVERLOADABLE float __gen_ocl_internal_exp10(float x){ float px, qx,ans; short n; int i; float*p; float MAXL10 = 38.230809449325611792; float LOG210 = 3.32192809488736234787e0; float LG102A = 3.00781250000000000000E-1; float LG102B = 2.48745663981195213739E-4; float P[6]; P[0] = 2.063216740311022E-001; P[1] = 5.420251702225484E-001; P[2] = 1.171292686296281E+000; P[3] = 2.034649854009453E+000; P[4] = 2.650948748208892E+000; P[5] = 2.302585167056758E+000; if( x < -MAXL10 ) return 0.0; if( isinf(x)) return INFINITY; /* The following is necessary because range reduction blows up: */ if( x == 0 )return 1.0; /* Express 10**x = 10**g 2**n * = 10**g 10**( n log10(2) ) * = 10**( g + n log10(2) ) */ px = x * LOG210; qx = __gen_ocl_internal_floor( px + 0.5 ); n = qx; x -= qx * LG102A; x -= qx * LG102B; /* rational approximation for exponential * of the fractional part: * 10**x - 1 = 2x P(x**2)/( Q(x**2) - P(x**2) ) */ p = P; ans = *p++; i = 5; do{ ans = ans * x + *p++; } while( --i ); px = 1.0 + x * ans; /* multiply by power of 2 */ x = __gen_ocl_internal_ldexp( px, n ); return x; } OVERLOADABLE float cospi(float x) { if (__ocl_math_fastpath_flag) 
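/* Dispatch pattern shared by most entry points below: when the fast-math
   path is enabled (a build-time flag, presumably driven by relaxed-math
   build options), route to the native-instruction wrappers; otherwise fall
   through to the precise fdlibm-derived __gen_ocl_internal_* code. */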
return __gen_ocl_internal_fastpath_cospi(x); return __gen_ocl_internal_cospi(x); } OVERLOADABLE float cosh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_cosh(x); return __gen_ocl_internal_cosh(x); } OVERLOADABLE float acos(float x) { return __gen_ocl_internal_acos(x); } OVERLOADABLE float acospi(float x) { return __gen_ocl_internal_acospi(x); } OVERLOADABLE float acosh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_acosh(x); return __gen_ocl_internal_acosh(x); } OVERLOADABLE float sinpi(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_sinpi(x); return __gen_ocl_internal_sinpi(x); } OVERLOADABLE float sinh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_sinh(x); return __gen_ocl_internal_sinh(x); } OVERLOADABLE float asin(float x) { return __gen_ocl_internal_asin(x); } OVERLOADABLE float asinpi(float x) { return __gen_ocl_internal_asinpi(x); } OVERLOADABLE float asinh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_asinh(x); return __gen_ocl_internal_asinh(x); } OVERLOADABLE float tanpi(float x) { return __gen_ocl_internal_tanpi(x); } OVERLOADABLE float tanh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_tanh(x); return __gen_ocl_internal_tanh(x); } OVERLOADABLE float atan(float x) { return __gen_ocl_internal_atan(x); } OVERLOADABLE float atan2(float y, float x) { return __gen_ocl_internal_atan2(y, x); } OVERLOADABLE float atan2pi(float y, float x) { return __gen_ocl_internal_atan2pi(y, x); } OVERLOADABLE float atanpi(float x) { return __gen_ocl_internal_atanpi(x); } OVERLOADABLE float atanh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_atanh(x); return __gen_ocl_internal_atanh(x); } OVERLOADABLE float cbrt(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_cbrt(x); return __gen_ocl_internal_cbrt(x); } OVERLOADABLE float rint(float x) { return __gen_ocl_internal_rint(x); } OVERLOADABLE float copysign(float x, float y) { return __gen_ocl_internal_copysign(x, y); } OVERLOADABLE float erf(float x) { return __gen_ocl_internal_erf(x); } OVERLOADABLE float erfc(float x) { return __gen_ocl_internal_erfc(x); } OVERLOADABLE float fmod (float x, float y) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_fmod(x, y); return __gen_ocl_internal_fmod(x, y); } OVERLOADABLE float remainder(float x, float p) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_remainder(x, p); return __gen_ocl_internal_remainder(x, p); } OVERLOADABLE float ldexp(float x, int n) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_ldexp(x, n); if (x == (float)0.0f) x = 0.0f; return __gen_ocl_internal_ldexp(x, n); } CONST OVERLOADABLE float __gen_ocl_mad(float a, float b, float c) __asm("llvm.fma" ".f32"); CONST OVERLOADABLE half __gen_ocl_mad(half a, half b, half c) __asm("llvm.fma" ".f16"); PURE CONST float __gen_ocl_fmax(float a, float b); PURE CONST float __gen_ocl_fmin(float a, float b); OVERLOADABLE float mad(float a, float b, float c) { return __gen_ocl_mad(a, b, c); } #define BODY \ if (isnan(x) || isinf(x)) { \ *exp = 0; \ return x; \ } \ uint u = as_uint(x); \ uint a = u & 0x7FFFFFFFu; \ if (a == 0) { \ *exp = 0; \ return x; \ } \ if (a >= 0x800000) { \ *exp = (a >> 23) - 126; \ return as_float((u & (0x807FFFFFu)) | 0x3F000000); \ } \ int e = -126; \ while (a < 0x400000) { \ e --; \ a <<= 1; \ } \ a <<= 1; \ *exp = e; \ return as_float((a & 
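/* Subnormal input: the loop above shifted the significand until its      \
   leading one sits at bit 22; or-ing in 0x3F000000 below then rebuilds a  \
   normalized float in [0.5, 1). E.g. frexp(0x1.0p-130f, &e) returns 0.5f  \
   with e == -129. */ \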
(0x807FFFFFu)) | (u & 0x80000000u) | 0x3F000000); OVERLOADABLE float frexp(float x, global int *exp) { BODY; } OVERLOADABLE float frexp(float x, local int *exp) { BODY; } OVERLOADABLE float frexp(float x, private int *exp) { BODY; } #undef BODY OVERLOADABLE float nextafter(float x, float y) { int hx, hy, ix, iy; hx = as_int(x); hy = as_int(y); ix = hx & 0x7fffffff; iy = hy & 0x7fffffff; if(ix == 0) ix = hx & 0x7fffff; if(iy == 0) iy = hy & 0x7fffff; if(ix>0x7f800000 || iy>0x7f800000) return x+y; if(hx == hy) return y; if(ix == 0) { if(iy == 0) return y; else return as_float((hy&0x80000000) | 1); } if(hx >= 0) { if(hx > hy) { hx -= 1; } else { hx += 1; } } else { if(hy >= 0 || hx > hy){ hx -= 1; } else { hx += 1; } } return as_float(hx); } #define BODY \ uint hx = as_uint(x), ix = hx & 0x7FFFFFFF; \ if (ix > 0x7F800000) { \ *i = nan(0u); \ return nan(0u); \ } \ if (ix == 0x7F800000) { \ *i = x; \ return as_float(hx & 0x80000000u); \ } \ *i = __gen_ocl_rndz(x); \ return x - *i; OVERLOADABLE float modf(float x, global float *i) { BODY; } OVERLOADABLE float modf(float x, local float *i) { BODY; } OVERLOADABLE float modf(float x, private float *i) { BODY; } #undef BODY OVERLOADABLE float __gen_ocl_internal_fmax(float a, float b) { return max(a,b); } OVERLOADABLE float __gen_ocl_internal_fmin(float a, float b) { return min(a,b); } OVERLOADABLE float __gen_ocl_internal_fmax(half a, half b) { return max(a,b); } OVERLOADABLE float __gen_ocl_internal_fmin(half a, half b) { return min(a,b); } OVERLOADABLE float __gen_ocl_internal_maxmag(float x, float y) { float a = __gen_ocl_fabs(x), b = __gen_ocl_fabs(y); return a > b ? x : b > a ? y : max(x, y); } OVERLOADABLE float __gen_ocl_internal_minmag(float x, float y) { float a = __gen_ocl_fabs(x), b = __gen_ocl_fabs(y); return a < b ? x : b < a ? y : min(x, y); } OVERLOADABLE float __gen_ocl_internal_fdim(float x, float y) { if(isnan(x)) return x; if(isnan(y)) return y; return x > y ? (x - y) : +0.f; } /* * the pow/pown high precision implementation are copied from msun library. * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com. */ /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. 
* ==================================================== */ OVERLOADABLE float __gen_ocl_internal_pow(float x, float y) { float z,ax,z_h,z_l,p_h,p_l; float y1,t1,t2,r,s,sn,t,u,v,w; int i,j,k,yisint,n; int hx,hy,ix,iy,is; float bp[2],dp_h[2],dp_l[2], zero = 0.0, one = 1.0, two = 2.0, two24 = 16777216.0, /* 0x4b800000 */ huge = 1.0e30, tiny = 1.0e-30, /* poly coefs for (3/2)*(log(x)-2s-2/3*s**3 */ L1 = 6.0000002384e-01, /* 0x3f19999a */ L2 = 4.2857143283e-01, /* 0x3edb6db7 */ P1 = 1.6666667163e-01, /* 0x3e2aaaab */ P2 = -2.7777778450e-03, /* 0xbb360b61 */ lg2 = 6.9314718246e-01, /* 0x3f317218 */ lg2_h = 6.93145752e-01, /* 0x3f317200 */ lg2_l = 1.42860654e-06, /* 0x35bfbe8c */ ovt = 4.2995665694e-08, /* -(128-log2(ovfl+.5ulp)) */ cp = 9.6179670095e-01, /* 0x3f76384f =2/(3ln2) */ cp_h = 9.6179199219e-01, /* 0x3f763800 =head of cp */ cp_l = 4.7017383622e-06, /* 0x369dc3a0 =tail of cp_h */ ivln2 = 1.4426950216e+00, /* 0x3fb8aa3b =1/ln2 */ ivln2_h = 1.4426879883e+00, /* 0x3fb8aa00 =16b 1/ln2*/ ivln2_l = 7.0526075433e-06; /* 0x36eca570 =1/ln2 tail*/ bp[0] = 1.0,bp[1] = 1.5, dp_h[0] = 0.0,dp_h[1] = 5.84960938e-01, dp_l[0] = 0.0,dp_l[1] = 1.56322085e-06; GEN_OCL_GET_FLOAT_WORD(hx,x); GEN_OCL_GET_FLOAT_WORD(hy,y); ix = hx&0x7fffffff; iy = hy&0x7fffffff; if (ix < 0x00800000) { /* x < 2**-126 */ ix = 0;/* Gen does not support subnormal number now */ } if (iy < 0x00800000) { /* y < 2**-126 */ iy = 0;/* Gen does not support subnormal number now */ } /* y==zero: x**0 = 1 */ if(iy==0) return one; /* pow(+1, y) returns 1 for any y, even a NAN */ if(hx==0x3f800000) return one; /* +-NaN return x+y */ if(ix > 0x7f800000 || iy > 0x7f800000) return (x+0.0f)+y+(0.0f); /* determine if y is an odd int when x < 0 * yisint = 0 ... y is not an integer * yisint = 1 ... y is an odd int * yisint = 2 ... y is an even int */ yisint = 0; if(hx<0) { if(iy>=0x4b800000) yisint = 2; /* even integer y */ else if(iy>=0x3f800000) { k = (iy>>23)-0x7f; /* exponent */ j = iy>>(23-k); if((j<<(23-k))==iy) yisint = 2-(j&1); } } /* special value of y */ if (iy==0x7f800000) { /* y is +-inf */ if (ix==0x3f800000) //return y - y; /* inf**+-1 is NaN */ return one; else if (ix > 0x3f800000)/* (|x|>1)**+-inf = inf,0 */ return (hy>=0)? y: zero; else /* (|x|<1)**-,+inf = inf,0 */ return (hy<0)?-y: zero; } if(iy==0x3f800000) { /* y is +-1 */ if(hy<0) return one/x; else return x; } if(hy==0x40000000) return x*x; /* y is 2 */ if(hy==0x3f000000) { /* y is 0.5 */ if(hx>=0)return __gen_ocl_sqrt(x); } ax = __gen_ocl_fabs(x); /* special value of x */ if(ix==0x7f800000||ix==0||ix==0x3f800000){ z = ax; /*x is +-0,+-inf,+-1*/ if(hy<0) z = one/z; /* z = (1/|x|) */ if(hx<0) { if(((ix-0x3f800000)|yisint)==0) { z = (z-z)/(z-z); /* (-1)**non-int is NaN */ } else if(yisint==1) z = -z; /* (x<0)**odd = -(|x|**odd) */ } return z; } n = ((uint)hx>>31)-1; /* (x<0)**(non-int) is NaN */ if((n|yisint)==0) return (x-x)/(x-x); sn = one; /* s (sign of result -ve**odd) = -1 else = 1 */ if((n|(yisint-1))==0) sn = -one;/* (-ve)**(odd int) */ /* |y| is huge */ if(iy>0x4d000000) { /* if |y| > 2**27 */ /* over/underflow if x is not close to one */ if(ix<0x3f7ffff8) return (hy<0)? sn*huge*huge:sn*tiny*tiny; if(ix>0x3f800007) return (hy>0)? sn*huge*huge:sn*tiny*tiny; /* now |1-x| is tiny <= 2**-20, suffice to compute log(x) by x-x^2/2+x^3/3-x^4/4 */ t = ax-1; /* t has 20 trailing zeros */ w = (t*t)*((float)0.5-t*(0.333333333333f-t*0.25f)); u = ivln2_h*t; /* ivln2_h has 16 sig. 
bits */ v = t*ivln2_l-w*ivln2; t1 = u+v; GEN_OCL_GET_FLOAT_WORD(is,t1); GEN_OCL_SET_FLOAT_WORD(t1,is&0xfffff000); t2 = v-(t1-u); } else { float s2,s_h,s_l,t_h,t_l; n = 0; /* take care subnormal number */ //if(ix<0x00800000) //{ax *= two24; n -= 24; GEN_OCL_GET_FLOAT_WORD(ix,ax); } n += ((ix)>>23)-0x7f; j = ix&0x007fffff; /* determine interval */ ix = j|0x3f800000; /* normalize ix */ if(j<=0x1cc471) k=0; /* |x|<sqrt(3/2) */ else if(j<0x5db3d7) k=1; /* |x|<sqrt(3) */ else {k=0;n+=1;ix -= 0x00800000;} GEN_OCL_SET_FLOAT_WORD(ax,ix); /* compute s = s_h+s_l = (x-1)/(x+1) or (x-1.5)/(x+1.5) */ u = ax-bp[k]; /* bp[0]=1.0, bp[1]=1.5 */ v = one/(ax+bp[k]); s = u*v; s_h = s; GEN_OCL_GET_FLOAT_WORD(is,s_h); GEN_OCL_SET_FLOAT_WORD(s_h,is&0xfffff000); /* t_h=ax+bp[k] High */ is = ((ix>>1)&0xfffff000)|0x20000000; GEN_OCL_SET_FLOAT_WORD(t_h,is+0x00400000+(k<<21)); t_l = ax - (t_h-bp[k]); s_l = v*((u-s_h*t_h)-s_h*t_l); /* compute log(ax) */ s2 = s*s; r = s2*s2*(L1+s2*L2); r += s_l*(s_h+s); s2 = s_h*s_h; t_h = 3.0f+s2+r; GEN_OCL_GET_FLOAT_WORD(is,t_h); GEN_OCL_SET_FLOAT_WORD(t_h,is&0xffffe000); t_l = r-((t_h-3.0f)-s2); /* u+v = s*(1+...) */ u = s_h*t_h; v = s_l*t_h+t_l*s; /* 2/(3log2)*(s+...) */ p_h = u+v; GEN_OCL_GET_FLOAT_WORD(is,p_h); GEN_OCL_SET_FLOAT_WORD(p_h,is&0xffffe000); p_l = v-(p_h-u); z_h = cp_h*p_h; /* cp_h+cp_l = 2/(3*log2) */ z_l = cp_l*p_h+p_l*cp+dp_l[k]; /* log2(ax) = (s+..)*2/(3*log2) = n + dp_h + z_h + z_l */ t = (float)n; t1 = (((z_h+z_l)+dp_h[k])+t); GEN_OCL_GET_FLOAT_WORD(is,t1); GEN_OCL_SET_FLOAT_WORD(t1,is&0xffffe000); t2 = z_l-(((t1-t)-dp_h[k])-z_h); } /* split up y into y1+y2 and compute (y1+y2)*(t1+t2) */ GEN_OCL_GET_FLOAT_WORD(is,y); GEN_OCL_SET_FLOAT_WORD(y1,is&0xffffe000); p_l = (y-y1)*t1+y*t2; p_h = y1*t1; z = p_l+p_h; GEN_OCL_GET_FLOAT_WORD(j,z); if (j>0x43000000) /* if z > 128 */ return sn*huge*huge; /* overflow */ else if (j==0x43000000) { /* if z == 128 */ if(p_l+ovt>z-p_h) return sn*huge*huge; /* overflow */ } else if ((j&0x7fffffff)>0x43160000) /* z <= -150 */ return sn*tiny*tiny; /* underflow */ else if (j==0xc3160000){ /* z == -150 */ if(p_l<=z-p_h) return sn*tiny*tiny; /* underflow */ } /* * compute 2**(p_h+p_l) */ i = j&0x7fffffff; k = (i>>23)-0x7f; n = 0; if(i>0x3f000000) { /* if |z| > 0.5, set n = [z+0.5] */ n = j+(0x00800000>>(k+1)); k = ((n&0x7fffffff)>>23)-0x7f; /* new k for n */ GEN_OCL_SET_FLOAT_WORD(t,n&~(0x007fffff>>k)); n = ((n&0x007fffff)|0x00800000)>>(23-k); if(j<0) n = -n; p_h -= t; } t = p_l+p_h; GEN_OCL_GET_FLOAT_WORD(is,t); GEN_OCL_SET_FLOAT_WORD(t,is&0xffff8000); u = t*lg2_h; v = (p_l-(t-p_h))*lg2+t*lg2_l; z = u+v; w = v-(z-u); t = z*z; t1 = z - t*(P1+t*P2); r = (z*t1)/(t1-two)-(w+z*w); z = one-(r-z); GEN_OCL_GET_FLOAT_WORD(j,z); j += (n<<23); if((j>>23)<=0) z = __gen_ocl_scalbnf(z,n); /* subnormal output */ else GEN_OCL_SET_FLOAT_WORD(z,j); return sn*z; } OVERLOADABLE float tgamma (float x) { /* based on glibc __ieee754_gammaf_r by Ulrich Drepper */ unsigned int hx; GEN_OCL_GET_FLOAT_WORD(hx,x); if (hx == 0xff800000) { /* x == -Inf. According to ISO this is NaN. */ return NAN; } if ((hx & 0x7f800000) == 0x7f800000) { /* Positive infinity (return positive infinity) or NaN (return NaN). */ return x; } if (x < 0.0f && __gen_ocl_internal_floor (x) == x) { /* integer x < 0 */ return NAN; } if (x >= 36.0f) { /* Overflow. */ return INFINITY; } else if (x <= 0.0f && x >= -FLT_EPSILON / 4.0f) { return 1.0f / x; } else { float sinpix = __gen_ocl_internal_sinpi(x); if (x <= -42.0f) /* Underflow.
*/ {return 0.0f * sinpix /*for sign*/;} int exp2_adj = 0; float x_abs = __gen_ocl_fabs(x); float gam0; if (x_abs < 4.0f) { /* gamma = exp(lgamma) is only accurate for small lgamma */ float prod,x_adj; if (x_abs < 0.5f) { prod = 1.0f / x_abs; x_adj = x_abs + 1.0f; } else if (x_abs <= 1.5f) { prod = 1.0f; x_adj = x_abs; } else if (x_abs < 2.5f) { x_adj = x_abs - 1.0f; prod = x_adj; } else { x_adj = x_abs - 2.0f; prod = x_adj * (x_abs - 1.0f); } gam0 = __gen_ocl_internal_exp (lgamma (x_adj)) * prod; } else { /* Compute gamma (X) using Stirling's approximation, starting by computing pow (X, X) with a power of 2 factored out to avoid intermediate overflow. */ float x_int = __gen_ocl_internal_round (x_abs); float x_frac = x_abs - x_int; int x_log2; float x_mant = frexp (x_abs, &x_log2); if (x_mant < M_SQRT1_2_F) { x_log2--; x_mant *= 2.0f; } exp2_adj = x_log2 * (int) x_int; float ret = (__gen_ocl_internal_pow(x_mant, x_abs) * exp2 (x_log2 * x_frac) * __gen_ocl_internal_exp (-x_abs) * sqrt (2.0f * M_PI_F / x_abs) ); float x2 = x_abs * x_abs; float bsum = (0x3.403404p-12f / x2 -0xb.60b61p-12f) / x2 + 0x1.555556p-4f; gam0 = ret + ret * __gen_ocl_internal_expm1 (bsum / x_abs); } if (x > 0.0f) {return __gen_ocl_internal_ldexp (gam0, exp2_adj);} float gam1 = M_PI_F / (-x * sinpix * gam0); return __gen_ocl_internal_ldexp (gam1, -exp2_adj); } } float __gen_ocl_internal_pown(float x, int y) { const float bp[] = {1.0, 1.5,}, dp_h[] = { 0.0, 5.84960938e-01,}, /* 0x3f15c000 */ dp_l[] = { 0.0, 1.56322085e-06,}, /* 0x35d1cfdc */ zero = 0.0, one = 1.0, two = 2.0, two24 = 16777216.0, /* 0x4b800000 */ huge = 1.0e30, tiny = 1.0e-30, /* poly coefs for (3/2)*(log(x)-2s-2/3*s**3 */ L1 = 6.0000002384e-01, /* 0x3f19999a */ L2 = 4.2857143283e-01, /* 0x3edb6db7 */ P1 = 1.6666667163e-01, /* 0x3e2aaaab */ P2 = -2.7777778450e-03, /* 0xbb360b61 */ lg2 = 6.9314718246e-01, /* 0x3f317218 */ lg2_h = 0x1.62ep-1, lg2_l = 0x1.0bfbe8p-15, ovt = 4.2995665694e-08, /* -(128-log2(ovfl+.5ulp)) */ cp = 9.6179670095e-01, /* 0x3f76384f =2/(3ln2) */ cp_h = 9.6179199219e-01, /* 0x3f763800 =head of cp */ cp_l = 4.7017383622e-06, /* 0x369dc3a0 =tail of cp_h */ ivln2 = 1.4426950216e+00, /* 0x3fb8aa3b =1/ln2 */ ivln2_h = 1.4426879883e+00, /* 0x3fb8aa00 =16b 1/ln2*/ ivln2_l = 7.0526075433e-06; /* 0x36eca570 =1/ln2 tail*/ float z,ax,z_h,z_l,p_h,p_l; float y1,t1,t2,r,s,t,u,v,w; int i,j,k,yisint,n; int hx,ix,iy,is; GEN_OCL_GET_FLOAT_WORD(hx,x); ix = hx&0x7fffffff; iy = y > 0 ? y&0x7fffffff : (-y)&0x7fffffff; /* y==zero: x**0 = 1 */ if(y==0) return one; /* +-NaN return NAN */ if(ix > 0x7f800000) return NAN; /* determine if y is an odd int * yisint = 1 ... y is an odd int * yisint = 2 ... y is an even int */ yisint = y&1 ? 1 : 2; if (y == 1) return x; if (y == -1) return one/x; if (y == 2) return x*x; ax = __gen_ocl_fabs(x); /* special value of x */ if(ix==0x7f800000||ix==0||ix==0x3f800000){ z = ax; /*x is +-0,+-inf,+-1*/ if(y<0) z = one/z; /* z = (1/|x|) */ if(hx<0) { if(yisint==1) z = -z; /* (x<0)**odd = -(|x|**odd) */ } return z; } float sn = one; /* s (sign of result -ve**odd) = -1 else = 1 */ if(((((unsigned)hx>>31)-1)|(yisint-1))==0) sn = -one; /* (-ve)**(odd int) */ /* |y| is huge */ if(iy>0x08000000) { /* if |y| > 2**27 */ /* over/underflow if x is not close to one */ if(ix<0x3f7ffff8) return (y<0)? sn*huge*huge:tiny*tiny; if(ix>0x3f800007) return (y>0)? 
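/* |y| > 2**27 with x not within about 2**-20 of 1.0 cannot avoid float
   overflow or underflow, so the result is decided immediately from the
   signs; only the near-one band falls through to the short log series
   below. */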
sn*huge*huge:tiny*tiny; /* now |1-x| is tiny <= 2**-20, suffice to compute log(x) by x-x^2/2+x^3/3-x^4/4 */ t = ax-1; /* t has 20 trailing zeros */ w = (t*t)*((float)0.5-t*((float)0.333333333333-t*(float)0.25)); u = ivln2_h*t; /* ivln2_h has 16 sig. bits */ v = t*ivln2_l-w*ivln2; t1 = u+v; GEN_OCL_GET_FLOAT_WORD(is,t1); GEN_OCL_SET_FLOAT_WORD(t1,is&0xfffff000); t2 = v-(t1-u); } else { float s2,s_h,s_l,t_h,t_l; n = 0; /* take care subnormal number */ // if(ix<0x00800000) // {ax *= two24; n -= 24; GEN_OCL_GET_FLOAT_WORD(ix,ax); } n += ((ix)>>23)-0x7f; j = ix&0x007fffff; /* determine interval */ ix = j|0x3f800000; /* normalize ix */ if(j<=0x1cc471) k=0; /* |x|<sqrt(3/2) */ else if(j<0x5db3d7) k=1; /* |x|<sqrt(3) */ else {k=0;n+=1;ix -= 0x00800000;} GEN_OCL_SET_FLOAT_WORD(ax,ix); /* compute s = s_h+s_l = (x-1)/(x+1) or (x-1.5)/(x+1.5) */ u = ax-bp[k]; /* bp[0]=1.0, bp[1]=1.5 */ v = one/(ax+bp[k]); s = u*v; s_h = s; GEN_OCL_GET_FLOAT_WORD(is,s_h); GEN_OCL_SET_FLOAT_WORD(s_h,is&0xfffff000); /* t_h=ax+bp[k] High */ GEN_OCL_SET_FLOAT_WORD(t_h,(((ix>>1)|0x20000000)+0x00400000+(k<<21)) &0xfffff000); t_l = ax - (t_h-bp[k]); s_l = v*((u-s_h*t_h)-s_h*t_l); /* compute log(ax) */ s2 = s*s; r = s2*s2*(L1+s2*L2); r += s_l*(s_h+s); s2 = s_h*s_h; t_h = (float)3.0+s2+r; GEN_OCL_GET_FLOAT_WORD(is,t_h); GEN_OCL_SET_FLOAT_WORD(t_h,is&0xffffe000); t_l = r-((t_h-(float)3.0)-s2); /* u+v = s*(1+...) */ u = s_h*t_h; v = s_l*t_h+t_l*s; /* 2/(3log2)*(s+...) */ p_h = u+v; GEN_OCL_GET_FLOAT_WORD(is,p_h); GEN_OCL_SET_FLOAT_WORD(p_h,is&0xffffe000); p_l = v-(p_h-u); z_h = cp_h*p_h; /* cp_h+cp_l = 2/(3*log2) */ z_l = cp_l*p_h+p_l*cp+dp_l[k]; /* log2(ax) = (s+..)*2/(3*log2) = n + dp_h + z_h + z_l */ t = (float)n; t1 = (((z_h+z_l)+dp_h[k])+t); GEN_OCL_GET_FLOAT_WORD(is,t1); GEN_OCL_SET_FLOAT_WORD(t1,is&0xffffe000); t2 = z_l-(((t1-t)-dp_h[k])-z_h); } /* split up y into y1+y2+y3 and compute (y1+y2+y3)*(t1+t2) */ float fy = (float)y; float y3 = (float)(y-(int)fy); GEN_OCL_GET_FLOAT_WORD(is,fy); GEN_OCL_SET_FLOAT_WORD(y1,is&0xfffff000); p_l = (fy-y1)*t1 + y3*t1 + fy*t2 + y3*t2; p_h = y1*t1; z = p_l+p_h; GEN_OCL_GET_FLOAT_WORD(j,z); if (j>0x43000000) /* if z > 128 */ return sn*huge*huge; /* overflow */ else if (j==0x43000000) { /* if z == 128 */ if(p_l+ovt>z-p_h) return sn*huge*huge; /* overflow */ } else if ((j&0x7fffffff)>0x43160000) /* z <= -150 */ return sn*tiny*tiny; /* underflow */ else if (j==0xc3160000){ /* z == -150 */ if(p_l<=z-p_h) return sn*tiny*tiny; /* underflow */ } /* * compute 2**(p_h+p_l) */ i = j&0x7fffffff; k = (i>>23)-0x7f; n = 0; if(i>0x3f000000) { /* if |z| > 0.5, set n = [z+0.5] */ n = j+(0x00800000>>(k+1)); k = ((n&0x7fffffff)>>23)-0x7f; /* new k for n */ GEN_OCL_SET_FLOAT_WORD(t,n&~(0x007fffff>>k)); n = ((n&0x007fffff)|0x00800000)>>(23-k); if(j<0) n = -n; p_h -= t; z -= n; } t = z; GEN_OCL_GET_FLOAT_WORD(is,t); GEN_OCL_SET_FLOAT_WORD(t,is&0xfffff000); u = t*lg2_h; v = (p_l-(t-p_h))*lg2+t*lg2_l; z = u+v; w = v-(z-u); t = z*z; t1 = z - t*(P1+t*P2); r = (z*t1)/(t1-two)-(w+z*w); z = one-(r-z); GEN_OCL_GET_FLOAT_WORD(j,z); j += (n<<23); if((j>>23)<=0) z = __gen_ocl_scalbnf(z,n); /* subnormal output */ else GEN_OCL_SET_FLOAT_WORD(z,j); return sn*z; } OVERLOADABLE float hypot(float x, float y) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_hypot(x, y); //return __gen_ocl_sqrt(x*x + y*y); float a,b,an,bn,cn; int e; if (isfinite (x) && isfinite (y)){ /* Determine absolute values. */ x = __gen_ocl_fabs (x); y = __gen_ocl_fabs (y); /* Find the bigger and the smaller one. */ a = max(x,y); b = min(x,y); /* Now 0 <= b <= a. */ /* Write a = an * 2^e, b = bn * 2^e with 0 <= bn <= an < 1. */ an = frexp (a, &e); bn = ldexp (b, - e); /* Through the normalization, no unneeded overflow or underflow will occur here. */ cn = __gen_ocl_sqrt (an * an + bn * bn); return ldexp (cn, e); }else{ if (isinf (x) || isinf (y)) /* x or y is infinite. Return +Infinity.
*/ return INFINITY; else /* x or y is NaN. Return NaN. */ return x + y; } } #define BODY \ if (isnan(x)) { \ *p = x; \ return x; \ } \ *p = __gen_ocl_internal_floor(x); \ if (isinf(x)) { \ return x > 0 ? +0. : -0.; \ } \ return __gen_ocl_internal_fmin(x - *p, 0x1.FFFFFep-1F); OVERLOADABLE float fract(float x, global float *p) { BODY; } OVERLOADABLE float fract(float x, local float *p) { BODY; } OVERLOADABLE float fract(float x, private float *p) { BODY; } #undef BODY #define BODY \ float Zero[2]; \ int n,hx,hy,hz,ix,iy,sx,i,sy; \ uint q,sxy; \ Zero[0] = 0.0;Zero[1] = -0.0; \ if (x == 0.0f) { x = 0.0f; }; \ if (y == 0.0f) { y = 0.0f; }\ GEN_OCL_GET_FLOAT_WORD(hx,x);GEN_OCL_GET_FLOAT_WORD(hy,y); \ sxy = (hx ^ hy) & 0x80000000;sx = hx&0x80000000;sy = hy&0x80000000; \ hx ^=sx; hy &= 0x7fffffff; \ if (hx < 0x00800000)hx = 0;if (hy < 0x00800000)hy = 0; \ if(hy==0||hx>=0x7f800000||hy>0x7f800000){ \ *quo = 0;return NAN; \ } \ if( hy == 0x7F800000 || hx == 0 ) { \ *quo = 0;return x; \ } \ if( hx == hy ) { \ *quo = (x == y) ? 1 : -1; \ return sx ? -0.0 : 0.0; \ } \ if(hx<hy) { \ q = 0; \ goto fixup; \ } else if(hx==hy) { \ *quo = (sxy ? -1 : 1); \ return Zero[(uint)sx>>31]; \ } \ ix = (hx>>23)-127; \ iy = (hy>>23)-127; \ hx = 0x00800000|(0x007fffff&hx); \ hy = 0x00800000|(0x007fffff&hy); \ n = ix - iy; \ q = 0; \ while(n--) { \ hz=hx-hy; \ if(hz<0) hx = hx << 1; \ else {hx = hz << 1; q++;} \ q <<= 1; \ } \ hz=hx-hy; \ if(hz>=0) {hx=hz;q++;} \ if(hx==0) { \ q &= 0x0000007f; \ *quo = (sxy ? -q : q); \ return Zero[(uint)sx>>31]; \ } \ while(hx<0x00800000) { \ hx <<= 1;iy -= 1; \ } \ if(iy>= -126) { \ hx = ((hx-0x00800000)|((iy+127)<<23)); \ } else {\ n = -126 - iy; \ hx >>= n; \ } \ fixup: \ GEN_OCL_SET_FLOAT_WORD(x,hx); \ if(hx<0x00800000){ \ GEN_OCL_GET_FLOAT_WORD(hy,y); \ hy &= 0x7fffffff; \ if(hx+hx > hy ||(hx+hx==hy && (q & 1)))q++; \ x = 0; \ }else{ \ y = __gen_ocl_fabs(y); \ if (y < 0x1p-125f) { \ if (x+x>y || (x+x==y && (q & 1))) { \ q++;x-=y; \ } \ }else if (x>0.5f*y || (x==0.5f*y && (q & 1))) { \ q++;x-=y; \ } \ GEN_OCL_GET_FLOAT_WORD(hx,x);GEN_OCL_SET_FLOAT_WORD(x,hx^sx); \ } \ int sign = sx==sy?0:1; \ q &= 0x0000007f; \ *quo = (sign ? -q : q); \ return x; OVERLOADABLE float remquo(float x, float y, global int *quo) { BODY; } OVERLOADABLE float remquo(float x, float y, local int *quo) { BODY; } OVERLOADABLE float remquo(float x, float y, private int *quo) { BODY; } #undef BODY OVERLOADABLE float powr(float x, float y) { unsigned int hx, sx, hy, sy; if (__ocl_math_fastpath_flag) return __gen_ocl_pow(x,y); else { if (isnan(x) || isnan(y)) return NAN; GEN_OCL_GET_FLOAT_WORD(hx,x); GEN_OCL_GET_FLOAT_WORD(hy,y); sx = (hx & 0x80000000) >> 31; sy = (hy & 0x80000000) >> 31; if ((hx&0x7fffffff) < 0x00800000) { /* x < 2**-126 */ x = 0.0f;/* Gen does not support subnormal number now */ hx = hx &0x80000000; } if ((hy&0x7fffffff) < 0x00800000) { /* y < 2**-126 */ y = 0.0;/* Gen does not support subnormal number now */ hy = hy &0x80000000; } // (x < 0) ** y = NAN (y!=0) if ((sx && (hx & 0x7fffffff))) return NAN; // +/-0 ** +/-0 = NAN if ( !(hx&0x7fffffff) && !(hy&0x7fffffff)) return NAN; // +inf ** +/-0 = NAN if ( ((hx & 0x7f800000) ==0x7f800000) && !(hy&0x7fffffff)) return NAN; // others except nan/inf/0 ** 0 = 1.0 if (!(hy&0x7fffffff)) return 1.0f; // +1 ** inf = NAN; +1 ** finite = 1; if (hx == 0x3f800000) { return isinf(y) ? NAN : 1.0f; } if ( !(hx & 0x7fffffff)) { // +/-0 ** y<0 = +inf // +/-0 ** y>0 = +0 return sy ?
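/* Zero base: the OpenCL spec pins powr(+-0, y) to +inf for y < 0 and +0
   for y > 0 (the y == 0 case was already sent to NAN above). */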
INFINITY : 0.0f; } return __gen_ocl_internal_pow(x,y); } } OVERLOADABLE float pown(float x, int n) { if (__ocl_math_fastpath_flag) { if (x == 0.f && n == 0) return 1.f; if (x < 0.f && (n&1) ) return -powr(-x, n); return powr(x, n); } else { int ix; GEN_OCL_GET_FLOAT_WORD(ix, x); float sign = ix < 0 ? -1.0f : 1.0f; if (x == 0.0f) x = sign * 0.0f; return __gen_ocl_internal_pown(x, n); } } OVERLOADABLE float pow(float x, float y) { if (!__ocl_math_fastpath_flag) return __gen_ocl_internal_pow(x,y); else { int n; if (x == 0.f && y == 0.f) return 1.f; if (x >= 0.f) return powr(x, y); n = y; if ((float)n == y)//is exact integer return pown(x, n); return NAN; } } OVERLOADABLE float rootn(float x, int n) { float ax,re; int sign = 0; int hx; if( n == 0 )return NAN; GEN_OCL_GET_FLOAT_WORD(hx, x); // Gen does not support denorm, flush to zero if ((hx & 0x7fffffff) < 0x00800000) { x = hx < 0 ? -0.0f : 0.0f; } //rootn ( x, n ) returns a NaN for x < 0 and n is even. if( x < 0 && 0 == (n&1) ) return NAN; if( x == 0.0 ){ switch( n & 0x80000001 ){ //rootn ( +-0, n ) is +0 for even n > 0. case 0: return 0.0f; //rootn ( +-0, n ) is +-0 for odd n > 0. case 1: return x; //rootn ( +-0, n ) is +inf for even n < 0. case 0x80000000: return INFINITY; //rootn ( +-0, n ) is +-inf for odd n < 0. case 0x80000001: return __gen_ocl_internal_copysign(INFINITY, x); } } ax = __gen_ocl_fabs(x); if(x <0.0f && (n&1)) sign = 1; if (__ocl_math_fastpath_flag) re = __gen_ocl_pow(ax, 1.f/n); else re = __gen_ocl_internal_pow(ax,1.f/n); if(sign) re = -re; return re; } OVERLOADABLE float fabs(float x) { return __gen_ocl_internal_fabs(x); } OVERLOADABLE float trunc(float x) { return __gen_ocl_internal_trunc(x); } OVERLOADABLE float round(float x) { return __gen_ocl_internal_round(x); } OVERLOADABLE float floor(float x) { return __gen_ocl_internal_floor(x); } OVERLOADABLE float ceil(float x) { return __gen_ocl_internal_ceil(x); } OVERLOADABLE float log(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_log(x); /* Use native instruction when it has enough precision */ if((x > 0x1.1p0) || (x <= 0)) return __gen_ocl_internal_fastpath_log(x); return __gen_ocl_internal_log(x); } OVERLOADABLE float log2(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_log2(x); /* Use native instruction when it has enough precision */ if((x > 0x1.1p0) || (x <= 0)) return __gen_ocl_internal_fastpath_log2(x); return __gen_ocl_internal_log2(x); } OVERLOADABLE float log10(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_log10(x); /* Use native instruction when it has enough precision */ if((x > 0x1.1p0) || (x <= 0)) return __gen_ocl_internal_fastpath_log10(x); return __gen_ocl_internal_log10(x); } OVERLOADABLE float exp(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_exp(x); /* Use native instruction when it has enough precision */ if (x > -0x1.6p1 && x < 0x1.6p1) return __gen_ocl_internal_fastpath_exp(x); return __gen_ocl_internal_exp(x); } OVERLOADABLE float exp2(float x) { /* Use native instruction when it has enough precision, exp2 always */ return native_exp2(x); } OVERLOADABLE float exp10(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_exp10(x); return __gen_ocl_internal_exp10(x); } OVERLOADABLE float expm1(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_expm1(x); return __gen_ocl_internal_expm1(x); } OVERLOADABLE float fmin(float a, float b) { return __gen_ocl_internal_fmin(a, b); } OVERLOADABLE 
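/* fmin/fmax simply lower onto the min/max wrappers defined earlier; note
   that exp2 above trusts native_exp2 unconditionally, while log/log2/log10
   and exp only take the native route inside the argument windows where its
   precision is deemed sufficient. */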
float fmax(float a, float b) { return __gen_ocl_internal_fmax(a, b); } OVERLOADABLE float fma(float a, float b, float c) { return mad(a, b, c); } OVERLOADABLE float fdim(float x, float y) { return __gen_ocl_internal_fdim(x, y); } OVERLOADABLE float maxmag(float x, float y) { return __gen_ocl_internal_maxmag(x, y); } OVERLOADABLE float minmag(float x, float y) { return __gen_ocl_internal_minmag(x, y); } /* So far, the HW do not support half float math function. We just do the conversion and call the float version here. */ OVERLOADABLE half cospi(half x) { float _x = (float)x; return (half)cospi(_x); } OVERLOADABLE half cosh(half x) { float _x = (float)x; return (half)cosh(_x); } OVERLOADABLE half acos(half x) { float _x = (float)x; return (half)acos(_x); } OVERLOADABLE float half_cos(float x) { return (float)cos(x); } OVERLOADABLE float half_divide(float x, float y) { return (float)native_divide(x, y); } OVERLOADABLE float half_exp(float x) { return (float)native_exp(x); } OVERLOADABLE float half_exp2(float x){ return (float)native_exp2(x); } OVERLOADABLE float half_exp10(float x){ return (float)native_exp10(x); } OVERLOADABLE float half_log(float x){ return (float)native_log(x); } OVERLOADABLE float half_log2(float x){ return (float)native_log2(x); } OVERLOADABLE float half_log10(float x){ return (float)native_log10(x); } OVERLOADABLE float half_powr(float x, float y){ return (float)powr(x, y); } OVERLOADABLE float half_recip(float x){ return (float)native_recip(x); } OVERLOADABLE float half_rsqrt(float x){ return (float)native_rsqrt(x); } OVERLOADABLE float half_sin(float x){ return (float)sin(x); } OVERLOADABLE float half_sqrt(float x){ return (float)native_sqrt(x); } OVERLOADABLE float half_tan(float x){ return (float)tan(x); } OVERLOADABLE half acospi(half x) { float _x = (float)x; return (half)acospi(_x); } OVERLOADABLE half acosh(half x) { float _x = (float)x; return (half)acosh(_x); } OVERLOADABLE half sinpi(half x) { float _x = (float)x; return (half)sinpi(_x); } OVERLOADABLE half sinh(half x) { float _x = (float)x; return (half)sinh(_x); } OVERLOADABLE half asin(half x) { float _x = (float)x; return (half)asin(_x); } OVERLOADABLE half asinpi(half x) { float _x = (float)x; return (half)asinpi(_x); } OVERLOADABLE half asinh(half x) { float _x = (float)x; return (half)asinh(_x); } OVERLOADABLE half tanpi(half x) { float _x = (float)x; return (half)tanpi(_x); } OVERLOADABLE half tanh(half x) { float _x = (float)x; return (half)tanh(_x); } OVERLOADABLE half atan(half x) { float _x = (float)x; return (half)atan(_x); } OVERLOADABLE half atan2(half y, half x) { float _x = (float)x; float _y = (float)y; return (half)atan2(_x, _y); } OVERLOADABLE half atan2pi(half y, half x) { float _x = (float)x; float _y = (float)y; return (half)atan2pi(_x, _y); } OVERLOADABLE half atanpi(half x) { float _x = (float)x; return (half)atanpi(_x); } OVERLOADABLE half atanh(half x) { float _x = (float)x; return (half)atanh(_x); } OVERLOADABLE half cbrt(half x) { float _x = (float)x; return (half)cbrt(_x); } OVERLOADABLE half rint(half x) { float _x = (float)x; return (half)rint(_x); } OVERLOADABLE half copysign(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)copysign(_x, _y); } OVERLOADABLE half erf(half x) { float _x = (float)x; return (half)erf(_x); } OVERLOADABLE half erfc(half x) { float _x = (float)x; return (half)erfc(_x); } OVERLOADABLE half fmod(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)fmod(_x, _y); } OVERLOADABLE half remainder(half x, half p) { 
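/* Same promote-to-float pattern as the surrounding half overloads: float
   carries half's 11-bit significand with ample margin, so computing in
   float and converting back keeps the half results within spec accuracy. */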
float _x = (float)x; float _p = (float)p; return (half)remainder(_x, _p); } OVERLOADABLE half ldexp(half x, int n) { float _x = (float)x; return (half)ldexp(_x, n); } OVERLOADABLE half powr(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)powr(_x, _y); } OVERLOADABLE half pow(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)pow(_x, _y); } //no pow, we use powr instead OVERLOADABLE half fabs(half x) { float _x = (float)x; return (half)fabs(_x); } OVERLOADABLE half trunc(half x) { float _x = (float)x; return (half)trunc(_x); } OVERLOADABLE half round(half x) { float _x = (float)x; return (half)round(_x); } OVERLOADABLE half floor(half x) { float _x = (float)x; return (half)floor(_x); } OVERLOADABLE half ceil(half x) { float _x = (float)x; return (half)ceil(_x); } OVERLOADABLE half log(half x) { float _x = (float)x; return (half)log(_x); } OVERLOADABLE half log2(half x) { float _x = (float)x; return (half)log2(_x); } OVERLOADABLE half log10(half x) { float _x = (float)x; return (half)log10(_x); } OVERLOADABLE half exp(half x) { float _x = (float)x; return (half)exp(_x); } OVERLOADABLE half exp10(half x) { float _x = (float)x; return (half)exp10(_x); } OVERLOADABLE half expm1(half x) { float _x = (float)x; return (half)expm1(_x); } OVERLOADABLE half fmin(half a, half b) { return __gen_ocl_internal_fmin(a, b); } OVERLOADABLE half fmax(half a, half b) { return __gen_ocl_internal_fmax(a, b); } OVERLOADABLE half fma(half a, half b, half c) { float _a = (float)a; float _b = (float)b; float _c = (float)c; return (half)fma(_a, _b, _c); } OVERLOADABLE half fdim(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)fdim(_x, _y); } OVERLOADABLE half maxmag(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)maxmag(_x, _y); } OVERLOADABLE half minmag(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)minmag(_x, _y); } OVERLOADABLE half exp2(half x) { float _x = (float)x; return (half)exp2(_x); } OVERLOADABLE half mad(half a, half b, half c) { return __gen_ocl_mad(a,b,c); } OVERLOADABLE half sin(half x) { float _x = (float)x; return (half)sin(_x); } OVERLOADABLE half cos(half x) { float _x = (float)x; return (half)cos(_x); } OVERLOADABLE half tan(half x) { float _x = (float)x; return (half)tan(_x); } OVERLOADABLE half tgamma(half x) { float _x = (float)x; return (half)tgamma(_x); } OVERLOADABLE half lgamma(half x) { float _x = (float)x; return (half)lgamma(_x); } OVERLOADABLE half lgamma_r(half x, global int *signgamp) { float _x = (float)x; return (half)lgamma_r(_x, signgamp); } OVERLOADABLE half lgamma_r(half x, local int *signgamp) { float _x = (float)x; return (half)lgamma_r(_x, signgamp); } OVERLOADABLE half lgamma_r(half x, private int *signgamp) { float _x = (float)x; return (half)lgamma_r(_x, signgamp); } OVERLOADABLE half log1p(half x) { float _x = (float)x; return (half)log1p(_x); } OVERLOADABLE half logb(half x) { float _x = (float)x; return (half)logb(_x); } OVERLOADABLE int ilogb(half x) { float _x = (float)x; return ilogb(_x); } OVERLOADABLE half nan(ushort code) { return (half)NAN; } OVERLOADABLE half sincos(half x, global half *cosval) { float _x = (float)x; float _cosval; half ret = (half)sincos(_x, &_cosval); *cosval = (half)_cosval; return ret; } OVERLOADABLE half sincos(half x, local half *cosval) { float _x = (float)x; float _cosval; half ret = (half)sincos(_x, &_cosval); *cosval = (half)_cosval; return ret; } OVERLOADABLE half sincos(half x, private half *cosval) { float _x = 
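/* (private-pointer sincos variant; the _cosval temporary below exists
   because the float sincos overloads write through a float pointer) */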
(float)x; float _cosval; half ret = (half)sincos(_x, &_cosval); *cosval = (half)_cosval; return ret; } OVERLOADABLE half sqrt(half x) { float _x = (float)x; return (half)sqrt(_x); } OVERLOADABLE half rsqrt(half x) { float _x = (float)x; return (half)rsqrt(_x); } OVERLOADABLE half frexp(half x, global int *exp) { float _x = (float)x; return (half)frexp(_x, exp); } OVERLOADABLE half frexp(half x, local int *exp) { float _x = (float)x; return (half)frexp(_x, exp); } OVERLOADABLE half frexp(half x, private int *exp) { float _x = (float)x; return (half)frexp(_x, exp); } OVERLOADABLE half nextafter(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)nextafter(_x, _y); } OVERLOADABLE half modf(half x, global half *i) { float _x = (float)x; float _i; half ret = (half)modf(_x, &_i); *i = (half)_i; return ret; } OVERLOADABLE half modf(half x, local half *i) { float _x = (float)x; float _i; half ret = (half)modf(_x, &_i); *i = (half)_i; return ret; } OVERLOADABLE half modf(half x, private half *i) { float _x = (float)x; float _i; half ret = (half)modf(_x, &_i); *i = (half)_i; return ret; } OVERLOADABLE half hypot(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)hypot(_x, _y); } OVERLOADABLE half fract(half x, global half *p) { float _x = (float)x; float _p; half ret = (half)fract(_x, &_p); *p = (half)_p; return ret; } OVERLOADABLE half fract(half x, local half *p) { float _x = (float)x; float _p; half ret = (half)fract(_x, &_p); *p = (half)_p; return ret; } OVERLOADABLE half fract(half x, private half *p) { float _x = (float)x; float _p; half ret = (half)fract(_x, &_p); *p = (half)_p; return ret; } OVERLOADABLE half remquo(half x, half y, global int *quo) { float _x = (float)x; float _y = (float)y; return (half)remquo(_x, _y, quo); } OVERLOADABLE half remquo(half x, half y, local int *quo) { float _x = (float)x; float _y = (float)y; return (half)remquo(_x, _y, quo); } OVERLOADABLE half remquo(half x, half y, private int *quo) { float _x = (float)x; float _y = (float)y; return (half)remquo(_x, _y, quo); } OVERLOADABLE half pown(half x, int n) { float _x = (float)x; return (half)pown(_x, n); } OVERLOADABLE half rootn(half x, int n) { float _x = (float)x; return (half)rootn(_x, n); } Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_integer.tmpl.h000664 001750 001750 00000013655 13161142102 023475 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_INTEGER_H__ #define __OCL_INTEGER_H__ #include "ocl_types.h" #define CHAR_BIT 8 #define CHAR_MAX SCHAR_MAX #define CHAR_MIN SCHAR_MIN #define INT_MAX 2147483647 #define INT_MIN (-2147483647 - 1) #define LONG_MAX 0x7fffffffffffffffL #define LONG_MIN (-0x7fffffffffffffffL - 1) #define SCHAR_MAX 127 #define SCHAR_MIN (-127 - 1) #define SHRT_MAX 32767 #define SHRT_MIN (-32767 - 1) #define UCHAR_MAX 255 #define USHRT_MAX 65535 #define UINT_MAX 0xffffffff #define ULONG_MAX 0xffffffffffffffffUL OVERLOADABLE char clz(char x); OVERLOADABLE uchar clz(uchar x); OVERLOADABLE short clz(short x); OVERLOADABLE ushort clz(ushort x); OVERLOADABLE int clz(int x); OVERLOADABLE uint clz(uint x); OVERLOADABLE long clz(long x); OVERLOADABLE ulong clz(ulong x); char clz_s8(char); uchar clz_u8(uchar); short clz_s16(short); ushort clz_u16(ushort); int clz_s32(int); uint clz_u32(uint); long clz_s64(long); ulong clz_u64(ulong); OVERLOADABLE char ctz(char x); OVERLOADABLE uchar ctz(uchar x); OVERLOADABLE short ctz(short x); OVERLOADABLE ushort ctz(ushort x); OVERLOADABLE int ctz(int x); OVERLOADABLE uint ctz(uint x); OVERLOADABLE long ctz(long x); OVERLOADABLE ulong ctz(ulong x); char ctz_s8(char); uchar ctz_u8(uchar); short ctz_s16(short); ushort ctz_u16(ushort); int ctz_s32(int); uint ctz_u32(uint); long ctz_s64(long); ulong ctz_u64(ulong); OVERLOADABLE char popcount(char x); OVERLOADABLE uchar popcount(uchar x); OVERLOADABLE short popcount(short x); OVERLOADABLE ushort popcount(ushort x); OVERLOADABLE int popcount(int x); OVERLOADABLE uint popcount(uint x); OVERLOADABLE long popcount(long x); OVERLOADABLE ulong popcount(ulong x); OVERLOADABLE char mul_hi(char x, char y); OVERLOADABLE uchar mul_hi(uchar x, uchar y); OVERLOADABLE short mul_hi(short x, short y); OVERLOADABLE ushort mul_hi(ushort x, ushort y); OVERLOADABLE int mul_hi(int x, int y); OVERLOADABLE uint mul_hi(uint x, uint y); OVERLOADABLE long mul_hi(long x, long y); OVERLOADABLE ulong mul_hi(ulong x, ulong y); #define SDEF(TYPE) \ OVERLOADABLE TYPE add_sat(TYPE x, TYPE y); \ OVERLOADABLE TYPE sub_sat(TYPE x, TYPE y); SDEF(char); SDEF(short); SDEF(int); SDEF(long); #undef SDEF #define UDEF(TYPE) \ OVERLOADABLE TYPE add_sat(TYPE x, TYPE y); \ OVERLOADABLE TYPE sub_sat(TYPE x, TYPE y); UDEF(uchar); UDEF(ushort); UDEF(uint); UDEF(ulong); #undef UDEF #define DEF(type) OVERLOADABLE type mad_hi(type a, type b, type c); DEF(char) DEF(uchar) DEF(short) DEF(ushort) DEF(int) DEF(uint) DEF(long) DEF(ulong) #undef DEF OVERLOADABLE int mul24(int a, int b); OVERLOADABLE uint mul24(uint a, uint b); OVERLOADABLE int mad24(int a, int b, int c); OVERLOADABLE uint mad24(uint a, uint b, uint c); OVERLOADABLE char mad_sat(char a, char b, char c) ; OVERLOADABLE uchar mad_sat(uchar a, uchar b, uchar c); OVERLOADABLE short mad_sat(short a, short b, short c); OVERLOADABLE ushort mad_sat(ushort a, ushort b, ushort c); OVERLOADABLE int mad_sat(int a, int b, int c); OVERLOADABLE uint mad_sat(uint a, uint b, uint c); OVERLOADABLE long mad_sat(long a, long b, long c); OVERLOADABLE ulong mad_sat(ulong a, ulong b, ulong c); #define DEF(type, m) OVERLOADABLE type rotate(type x, type y); DEF(char, 7) DEF(uchar, 7) DEF(short, 15) DEF(ushort, 15) DEF(int, 31) DEF(uint, 31) DEF(long, 63) DEF(ulong, 63) #undef DEF OVERLOADABLE short upsample(char hi, uchar lo); OVERLOADABLE ushort upsample(uchar hi, uchar lo); OVERLOADABLE int upsample(short hi, ushort lo); OVERLOADABLE uint upsample(ushort hi, ushort lo); OVERLOADABLE long upsample(int hi, uint lo); OVERLOADABLE ulong 
upsample(uint hi, uint lo);
#define DEC DEF(char); DEF(uchar); DEF(short); DEF(ushort)
#define DEF(type) OVERLOADABLE type hadd(type x, type y);
DEC
#undef DEF
#define DEF(type) OVERLOADABLE type rhadd(type x, type y);
DEC
#undef DEF
#undef DEC
OVERLOADABLE int hadd(int x, int y);
OVERLOADABLE uint hadd(uint x, uint y);
OVERLOADABLE int rhadd(int x, int y);
OVERLOADABLE uint rhadd(uint x, uint y);
OVERLOADABLE long hadd(long x, long y);
OVERLOADABLE ulong hadd(ulong x, ulong y);
OVERLOADABLE long rhadd(long x, long y);
OVERLOADABLE ulong rhadd(ulong x, ulong y);
#define DEC(TYPE) OVERLOADABLE u##TYPE abs(TYPE x);
DEC(int)
DEC(short)
DEC(char)
#undef DEC
OVERLOADABLE ulong abs(long x);
/* For unsigned types, do nothing. */
#define DEC(TYPE) OVERLOADABLE TYPE abs(TYPE x);
DEC(uint)
DEC(ushort)
DEC(uchar)
DEC(ulong)
#undef DEC
/* Char and short type abs_diff. */
/* Promote char and short to int, so there is no modulo overflow. */
#define DEC(TYPE, UTYPE) OVERLOADABLE UTYPE abs_diff(TYPE x, TYPE y);
DEC(char, uchar)
DEC(uchar, uchar)
DEC(short, ushort)
DEC(ushort, ushort)
#undef DEC
OVERLOADABLE uint abs_diff(uint x, uint y);
OVERLOADABLE uint abs_diff(int x, int y);
OVERLOADABLE ulong abs_diff(long x, long y);
OVERLOADABLE ulong abs_diff(ulong x, ulong y);
#define DECL_MIN_MAX_CLAMP(TYPE) \
  OVERLOADABLE TYPE max(TYPE a, TYPE b); \
  OVERLOADABLE TYPE min(TYPE a, TYPE b); \
  OVERLOADABLE TYPE clamp(TYPE v, TYPE l, TYPE u);
DECL_MIN_MAX_CLAMP(int)
DECL_MIN_MAX_CLAMP(short)
DECL_MIN_MAX_CLAMP(char)
DECL_MIN_MAX_CLAMP(uint)
DECL_MIN_MAX_CLAMP(unsigned short)
DECL_MIN_MAX_CLAMP(unsigned char)
DECL_MIN_MAX_CLAMP(long)
DECL_MIN_MAX_CLAMP(ulong)
#undef DECL_MIN_MAX_CLAMP
Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_math.tmpl.h000664 001750 001750 00000021557 13161142102 022771 0ustar00yryr000000 000000 /*
 * Copyright © 2012 - 2014 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
* */ #ifndef __OCL_MATH_H__ #define __OCL_MATH_H__ #include "ocl_types.h" OVERLOADABLE float cospi(float x); OVERLOADABLE float cosh(float x); OVERLOADABLE float acos(float x); OVERLOADABLE float acospi(float x); OVERLOADABLE float acosh(float x); OVERLOADABLE float sinpi(float x); OVERLOADABLE float sinh(float x); OVERLOADABLE float asin(float x); OVERLOADABLE float asinpi(float x); OVERLOADABLE float asinh(float x); OVERLOADABLE float tanpi(float x); OVERLOADABLE float tanh(float x); OVERLOADABLE float atan(float x); OVERLOADABLE float atan2(float y, float x); OVERLOADABLE float atan2pi(float y, float x); OVERLOADABLE float atanpi(float x); OVERLOADABLE float atanh(float x); OVERLOADABLE float cbrt(float x); OVERLOADABLE float rint(float x); OVERLOADABLE float copysign(float x, float y); OVERLOADABLE float erf(float x); OVERLOADABLE float erfc(float x); OVERLOADABLE float fmod (float x, float y); OVERLOADABLE float remainder(float x, float p); OVERLOADABLE float ldexp(float x, int n); OVERLOADABLE float powr(float x, float y); OVERLOADABLE float pow(float x, float y); //no pow, we use powr instead OVERLOADABLE float fabs(float x); OVERLOADABLE float trunc(float x); OVERLOADABLE float round(float x); OVERLOADABLE float floor(float x); OVERLOADABLE float ceil(float x); OVERLOADABLE float log(float x); OVERLOADABLE float log2(float x); OVERLOADABLE float log10(float x); OVERLOADABLE float exp(float x); OVERLOADABLE float exp10(float x); OVERLOADABLE float expm1(float x); OVERLOADABLE float fmin(float a, float b); OVERLOADABLE float fmax(float a, float b); OVERLOADABLE float fma(float a, float b, float c); OVERLOADABLE float fdim(float x, float y); OVERLOADABLE float maxmag(float x, float y); OVERLOADABLE float minmag(float x, float y); OVERLOADABLE float exp2(float x); OVERLOADABLE float mad(float a, float b, float c); OVERLOADABLE float sin(float x); OVERLOADABLE float cos(float x); OVERLOADABLE float tan(float x); OVERLOADABLE float tgamma(float x); OVERLOADABLE float lgamma(float x); OVERLOADABLE float lgamma_r(float x, global int *signgamp); OVERLOADABLE float lgamma_r(float x, local int *signgamp); OVERLOADABLE float lgamma_r(float x, private int *signgamp); OVERLOADABLE float log1p(float x); OVERLOADABLE float logb(float x); OVERLOADABLE int ilogb(float x); OVERLOADABLE float nan(uint code); OVERLOADABLE float sincos(float x, global float *cosval); OVERLOADABLE float sincos(float x, local float *cosval); OVERLOADABLE float sincos(float x, private float *cosval); OVERLOADABLE float sqrt(float x); OVERLOADABLE float rsqrt(float x); OVERLOADABLE float frexp(float x, global int *exp); OVERLOADABLE float frexp(float x, local int *exp); OVERLOADABLE float frexp(float x, private int *exp); OVERLOADABLE float nextafter(float x, float y); OVERLOADABLE float modf(float x, global float *i); OVERLOADABLE float modf(float x, local float *i); OVERLOADABLE float modf(float x, private float *i); OVERLOADABLE float hypot(float x, float y); OVERLOADABLE float fract(float x, global float *p); OVERLOADABLE float fract(float x, local float *p); OVERLOADABLE float fract(float x, private float *p); OVERLOADABLE float remquo(float x, float y, global int *quo); OVERLOADABLE float remquo(float x, float y, local int *quo); OVERLOADABLE float remquo(float x, float y, private int *quo); OVERLOADABLE float pown(float x, int n); OVERLOADABLE float rootn(float x, int n); // native OVERLOADABLE float native_cos(float x); OVERLOADABLE float native_divide(float x, float y); OVERLOADABLE float native_exp(float x); 
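/* Illustrative usage sketch: a hypothetical kernel (example_fast_sigmoid is
 * not part of the Beignet sources) showing what the native_* variants are
 * for. They trade accuracy for speed and may map directly to hardware
 * instructions, so they fit cases where strict ULP bounds do not matter. */
kernel void example_fast_sigmoid(global float *out, global const float *in) {
  size_t i = get_global_id(0);
  /* 1/(1+e^-x) via the native variants declared just above; precision is
   * implementation-defined, speed is the point. */
  out[i] = native_divide(1.0f, 1.0f + native_exp(-in[i]));
}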
OVERLOADABLE float native_exp2(float x); OVERLOADABLE float native_exp10(float x); OVERLOADABLE float native_log(float x); OVERLOADABLE float native_log2(float x); OVERLOADABLE float native_log10(float x); OVERLOADABLE float native_powr(float x, float y); OVERLOADABLE float native_recip(float x); OVERLOADABLE float native_rsqrt(float x); OVERLOADABLE float native_sin(float x); OVERLOADABLE float native_sqrt(float x); OVERLOADABLE float native_tan(float x); // Half float version. OVERLOADABLE half cospi(half x); OVERLOADABLE half cosh(half x); OVERLOADABLE half acos(half x); OVERLOADABLE half acospi(half x); OVERLOADABLE half acosh(half x); OVERLOADABLE half sinpi(half x); OVERLOADABLE half sinh(half x); OVERLOADABLE half asin(half x); OVERLOADABLE half asinpi(half x); OVERLOADABLE half asinh(half x); OVERLOADABLE half tanpi(half x); OVERLOADABLE half tanh(half x); OVERLOADABLE half atan(half x); OVERLOADABLE half atan2(half y, half x); OVERLOADABLE half atan2pi(half y, half x); OVERLOADABLE half atanpi(half x); OVERLOADABLE half atanh(half x); OVERLOADABLE half cbrt(half x); OVERLOADABLE half rint(half x); OVERLOADABLE half copysign(half x, half y); OVERLOADABLE half erf(half x); OVERLOADABLE half erfc(half x); OVERLOADABLE half fmod (half x, half y); OVERLOADABLE half remainder(half x, half p); OVERLOADABLE half ldexp(half x, int n); OVERLOADABLE half powr(half x, half y); OVERLOADABLE half pow(half x, half y); //no pow, we use powr instead OVERLOADABLE half fabs(half x); OVERLOADABLE half trunc(half x); OVERLOADABLE half round(half x); OVERLOADABLE half floor(half x); OVERLOADABLE half ceil(half x); OVERLOADABLE half log(half x); OVERLOADABLE half log2(half x); OVERLOADABLE half log10(half x); OVERLOADABLE half exp(half x); OVERLOADABLE half exp10(half x); OVERLOADABLE half expm1(half x); OVERLOADABLE half fmin(half a, half b); OVERLOADABLE half fmax(half a, half b); OVERLOADABLE half fma(half a, half b, half c); OVERLOADABLE half fdim(half x, half y); OVERLOADABLE half maxmag(half x, half y); OVERLOADABLE half minmag(half x, half y); OVERLOADABLE half exp2(half x); OVERLOADABLE half mad(half a, half b, half c); OVERLOADABLE half sin(half x); OVERLOADABLE half cos(half x); OVERLOADABLE half tan(half x); OVERLOADABLE half tgamma(half x); OVERLOADABLE half lgamma(half x); OVERLOADABLE half lgamma_r(half x, global int *signgamp); OVERLOADABLE half lgamma_r(half x, local int *signgamp); OVERLOADABLE half lgamma_r(half x, private int *signgamp); OVERLOADABLE half log1p(half x); OVERLOADABLE half logb(half x); OVERLOADABLE int ilogb(half x); OVERLOADABLE half nan(ushort code); OVERLOADABLE half sincos(half x, global half *cosval); OVERLOADABLE half sincos(half x, local half *cosval); OVERLOADABLE half sincos(half x, private half *cosval); OVERLOADABLE half sqrt(half x); OVERLOADABLE half rsqrt(half x); OVERLOADABLE half frexp(half x, global int *exp); OVERLOADABLE half frexp(half x, local int *exp); OVERLOADABLE half frexp(half x, private int *exp); OVERLOADABLE half nextafter(half x, half y); OVERLOADABLE half modf(half x, global half *i); OVERLOADABLE half modf(half x, local half *i); OVERLOADABLE half modf(half x, private half *i); OVERLOADABLE half hypot(half x, half y); OVERLOADABLE half fract(half x, global half *p); OVERLOADABLE half fract(half x, local half *p); OVERLOADABLE half fract(half x, private half *p); OVERLOADABLE half remquo(half x, half y, global int *quo); OVERLOADABLE half remquo(half x, half y, local int *quo); OVERLOADABLE half remquo(half x, half y, private int *quo); 
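/* Illustrative usage sketch: a hypothetical kernel (example_half_dist is not
 * part of the Beignet sources). The half overloads above follow the pattern
 * of the .tmpl.cl implementation, widening to float internally and narrowing
 * the result, so callers simply enable fp16 and use the usual names. */
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
kernel void example_half_dist(global half *out,
                              global const half *a,
                              global const half *b) {
  size_t i = get_global_id(0);
  out[i] = hypot(a[i], b[i]); /* resolves to the half overload above */
}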
OVERLOADABLE half pown(half x, int n); OVERLOADABLE half rootn(half x, int n); // native half OVERLOADABLE half native_cos(half x); OVERLOADABLE half native_divide(half x, half y); OVERLOADABLE half native_exp(half x); OVERLOADABLE half native_exp2(half x); OVERLOADABLE half native_exp10(half x); OVERLOADABLE half native_log(half x); OVERLOADABLE half native_log2(half x); OVERLOADABLE half native_log10(half x); OVERLOADABLE half native_powr(half x, half y); OVERLOADABLE half native_recip(half x); OVERLOADABLE half native_rsqrt(half x); OVERLOADABLE half native_sin(half x); OVERLOADABLE half native_sqrt(half x); OVERLOADABLE half native_tan(half x); // half accuracy OVERLOADABLE float half_cos(float x); OVERLOADABLE float half_divide(float x, float y); OVERLOADABLE float half_exp(float x); OVERLOADABLE float half_exp2(float x); OVERLOADABLE float half_exp10(float x); OVERLOADABLE float half_log(float x); OVERLOADABLE float half_log2(float x); OVERLOADABLE float half_log10(float x); OVERLOADABLE float half_powr(float x, float y); OVERLOADABLE float half_recip(float x); OVERLOADABLE float half_rsqrt(float x); OVERLOADABLE float half_sin(float x); OVERLOADABLE float half_sqrt(float x); OVERLOADABLE float half_tan(float x); Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_simd.tmpl.cl000664 001750 001750 00000042676 13161142102 023150 0ustar00yryr000000 000000 /* * Copyright @ 2015 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "ocl_simd.h" #include "ocl_workitem.h" uint get_max_sub_group_size(void) { uint local_sz = get_local_size(0)*get_local_size(1)*get_local_size(2); uint simd_sz = get_simd_size(); return local_sz > simd_sz ? 
simd_sz : local_sz; } uint get_sub_group_size(void) { uint threadn = get_num_sub_groups(); uint threadid = get_sub_group_id(); if (threadid == (threadn - 1)) return (get_local_size(0)*get_local_size(1)*get_local_size(2) -1) % get_max_sub_group_size() + 1; else return get_max_sub_group_size(); } /* broadcast */ #define BROADCAST_IMPL(GEN_TYPE) \ OVERLOADABLE GEN_TYPE __gen_ocl_sub_group_broadcast(GEN_TYPE a, uint local_id); \ OVERLOADABLE GEN_TYPE sub_group_broadcast(GEN_TYPE a, uint local_id) { \ return __gen_ocl_sub_group_broadcast(a, local_id); \ } BROADCAST_IMPL(int) BROADCAST_IMPL(uint) BROADCAST_IMPL(long) BROADCAST_IMPL(ulong) BROADCAST_IMPL(half) BROADCAST_IMPL(float) BROADCAST_IMPL(double) BROADCAST_IMPL(short) BROADCAST_IMPL(ushort) #undef BROADCAST_IMPL OVERLOADABLE short intel_sub_group_broadcast(short a, uint local_id) { return __gen_ocl_sub_group_broadcast(a, local_id); } OVERLOADABLE ushort intel_sub_group_broadcast(ushort a, uint local_id) { return __gen_ocl_sub_group_broadcast(a, local_id); } #define RANGE_OP(RANGE, OP, GEN_TYPE, SIGN) \ OVERLOADABLE GEN_TYPE __gen_ocl_sub_group_##RANGE##_##OP(bool sign, GEN_TYPE x); \ OVERLOADABLE GEN_TYPE sub_group_##RANGE##_##OP(GEN_TYPE x) { \ return __gen_ocl_sub_group_##RANGE##_##OP(SIGN, x); \ } /* reduce add */ RANGE_OP(reduce, add, int, true) RANGE_OP(reduce, add, uint, false) RANGE_OP(reduce, add, long, true) RANGE_OP(reduce, add, ulong, false) RANGE_OP(reduce, add, half, true) RANGE_OP(reduce, add, float, true) RANGE_OP(reduce, add, double, true) RANGE_OP(reduce, add, short, true) RANGE_OP(reduce, add, ushort, false) /* reduce min */ RANGE_OP(reduce, min, int, true) RANGE_OP(reduce, min, uint, false) RANGE_OP(reduce, min, long, true) RANGE_OP(reduce, min, ulong, false) RANGE_OP(reduce, min, half, true) RANGE_OP(reduce, min, float, true) RANGE_OP(reduce, min, double, true) RANGE_OP(reduce, min, short, true) RANGE_OP(reduce, min, ushort, false) /* reduce max */ RANGE_OP(reduce, max, int, true) RANGE_OP(reduce, max, uint, false) RANGE_OP(reduce, max, long, true) RANGE_OP(reduce, max, ulong, false) RANGE_OP(reduce, max, half, true) RANGE_OP(reduce, max, float, true) RANGE_OP(reduce, max, double, true) RANGE_OP(reduce, max, short, true) RANGE_OP(reduce, max, ushort, false) /* scan_inclusive add */ RANGE_OP(scan_inclusive, add, int, true) RANGE_OP(scan_inclusive, add, uint, false) RANGE_OP(scan_inclusive, add, long, true) RANGE_OP(scan_inclusive, add, ulong, false) RANGE_OP(scan_inclusive, add, half, true) RANGE_OP(scan_inclusive, add, float, true) RANGE_OP(scan_inclusive, add, double, true) RANGE_OP(scan_inclusive, add, short, true) RANGE_OP(scan_inclusive, add, ushort, false) /* scan_inclusive min */ RANGE_OP(scan_inclusive, min, int, true) RANGE_OP(scan_inclusive, min, uint, false) RANGE_OP(scan_inclusive, min, long, true) RANGE_OP(scan_inclusive, min, ulong, false) RANGE_OP(scan_inclusive, min, half, true) RANGE_OP(scan_inclusive, min, float, true) RANGE_OP(scan_inclusive, min, double, true) RANGE_OP(scan_inclusive, min, short, true) RANGE_OP(scan_inclusive, min, ushort, false) /* scan_inclusive max */ RANGE_OP(scan_inclusive, max, int, true) RANGE_OP(scan_inclusive, max, uint, false) RANGE_OP(scan_inclusive, max, long, true) RANGE_OP(scan_inclusive, max, ulong, false) RANGE_OP(scan_inclusive, max, half, true) RANGE_OP(scan_inclusive, max, float, true) RANGE_OP(scan_inclusive, max, double, true) RANGE_OP(scan_inclusive, max, short, true) RANGE_OP(scan_inclusive, max, ushort, false) /* scan_exclusive add */ RANGE_OP(scan_exclusive, 
add, int, true) RANGE_OP(scan_exclusive, add, uint, false) RANGE_OP(scan_exclusive, add, long, true) RANGE_OP(scan_exclusive, add, ulong, false) RANGE_OP(scan_exclusive, add, half, true) RANGE_OP(scan_exclusive, add, float, true) RANGE_OP(scan_exclusive, add, double, true) RANGE_OP(scan_exclusive, add, short, true) RANGE_OP(scan_exclusive, add, ushort, false) /* scan_exclusive min */ RANGE_OP(scan_exclusive, min, int, true) RANGE_OP(scan_exclusive, min, uint, false) RANGE_OP(scan_exclusive, min, long, true) RANGE_OP(scan_exclusive, min, ulong, false) RANGE_OP(scan_exclusive, min, half, true) RANGE_OP(scan_exclusive, min, float, true) RANGE_OP(scan_exclusive, min, double, true) RANGE_OP(scan_exclusive, min, short, true) RANGE_OP(scan_exclusive, min, ushort, false) /* scan_exclusive max */ RANGE_OP(scan_exclusive, max, int, true) RANGE_OP(scan_exclusive, max, uint, false) RANGE_OP(scan_exclusive, max, long, true) RANGE_OP(scan_exclusive, max, ulong, false) RANGE_OP(scan_exclusive, max, half, true) RANGE_OP(scan_exclusive, max, float, true) RANGE_OP(scan_exclusive, max, double, true) RANGE_OP(scan_exclusive, max, short, true) RANGE_OP(scan_exclusive, max, ushort, false) #undef RANGE_OP #define INTEL_RANGE_OP(RANGE, OP, GEN_TYPE, SIGN) \ OVERLOADABLE GEN_TYPE intel_sub_group_##RANGE##_##OP(GEN_TYPE x) { \ return __gen_ocl_sub_group_##RANGE##_##OP(SIGN, x); \ } INTEL_RANGE_OP(reduce, add, short, true) INTEL_RANGE_OP(reduce, add, ushort, false) INTEL_RANGE_OP(reduce, min, short, true) INTEL_RANGE_OP(reduce, min, ushort, false) INTEL_RANGE_OP(reduce, max, short, true) INTEL_RANGE_OP(reduce, max, ushort, false) INTEL_RANGE_OP(scan_inclusive, add, short, true) INTEL_RANGE_OP(scan_inclusive, add, ushort, false) INTEL_RANGE_OP(scan_inclusive, min, short, true) INTEL_RANGE_OP(scan_inclusive, min, ushort, false) INTEL_RANGE_OP(scan_inclusive, max, short, true) INTEL_RANGE_OP(scan_inclusive, max, ushort, false) INTEL_RANGE_OP(scan_exclusive, add, short, true) INTEL_RANGE_OP(scan_exclusive, add, ushort, false) INTEL_RANGE_OP(scan_exclusive, min, short, true) INTEL_RANGE_OP(scan_exclusive, min, ushort, false) INTEL_RANGE_OP(scan_exclusive, max, short, true) INTEL_RANGE_OP(scan_exclusive, max, ushort, false) #undef INTEL_RANGE_OP PURE CONST uint __gen_ocl_sub_group_block_read_ui_mem(const global uint* p); PURE CONST uint2 __gen_ocl_sub_group_block_read_ui_mem2(const global uint* p); PURE CONST uint4 __gen_ocl_sub_group_block_read_ui_mem4(const global uint* p); PURE CONST uint8 __gen_ocl_sub_group_block_read_ui_mem8(const global uint* p); OVERLOADABLE uint intel_sub_group_block_read(const global uint* p) { return __gen_ocl_sub_group_block_read_ui_mem(p); } OVERLOADABLE uint2 intel_sub_group_block_read2(const global uint* p) { return __gen_ocl_sub_group_block_read_ui_mem2(p); } OVERLOADABLE uint4 intel_sub_group_block_read4(const global uint* p) { return __gen_ocl_sub_group_block_read_ui_mem4(p); } OVERLOADABLE uint8 intel_sub_group_block_read8(const global uint* p) { return __gen_ocl_sub_group_block_read_ui_mem8(p); } OVERLOADABLE uint intel_sub_group_block_read_ui(const global uint* p) { return __gen_ocl_sub_group_block_read_ui_mem(p); } OVERLOADABLE uint2 intel_sub_group_block_read_ui2(const global uint* p) { return __gen_ocl_sub_group_block_read_ui_mem2(p); } OVERLOADABLE uint4 intel_sub_group_block_read_ui4(const global uint* p) { return __gen_ocl_sub_group_block_read_ui_mem4(p); } OVERLOADABLE uint8 intel_sub_group_block_read_ui8(const global uint* p) { return __gen_ocl_sub_group_block_read_ui_mem8(p); 
} void __gen_ocl_sub_group_block_write_ui_mem(global uint* p, uint data); void __gen_ocl_sub_group_block_write_ui_mem2(global uint* p, uint2 data); void __gen_ocl_sub_group_block_write_ui_mem4(global uint* p, uint4 data); void __gen_ocl_sub_group_block_write_ui_mem8(global uint* p, uint8 data); OVERLOADABLE void intel_sub_group_block_write(global uint* p, uint data) { __gen_ocl_sub_group_block_write_ui_mem(p, data); } OVERLOADABLE void intel_sub_group_block_write2(global uint* p, uint2 data) { __gen_ocl_sub_group_block_write_ui_mem2(p, data); } OVERLOADABLE void intel_sub_group_block_write4(global uint* p,uint4 data) { __gen_ocl_sub_group_block_write_ui_mem4(p, data); } OVERLOADABLE void intel_sub_group_block_write8(global uint* p,uint8 data) { __gen_ocl_sub_group_block_write_ui_mem8(p, data); } OVERLOADABLE void intel_sub_group_block_write_ui(global uint* p, uint data) { __gen_ocl_sub_group_block_write_ui_mem(p, data); } OVERLOADABLE void intel_sub_group_block_write_ui2(global uint* p, uint2 data) { __gen_ocl_sub_group_block_write_ui_mem2(p, data); } OVERLOADABLE void intel_sub_group_block_write_ui4(global uint* p,uint4 data) { __gen_ocl_sub_group_block_write_ui_mem4(p, data); } OVERLOADABLE void intel_sub_group_block_write_ui8(global uint* p,uint8 data) { __gen_ocl_sub_group_block_write_ui_mem8(p, data); } PURE CONST uint __gen_ocl_sub_group_block_read_ui_image(image2d_t p, int x, int y); PURE CONST uint2 __gen_ocl_sub_group_block_read_ui_image2(image2d_t p, int x, int y); PURE CONST uint4 __gen_ocl_sub_group_block_read_ui_image4(image2d_t p, int x, int y); PURE CONST uint8 __gen_ocl_sub_group_block_read_ui_image8(image2d_t p, int x, int y); OVERLOADABLE uint intel_sub_group_block_read(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_ui_image(p, cord.x, cord.y); } OVERLOADABLE uint2 intel_sub_group_block_read2(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_ui_image2(p, cord.x, cord.y); } OVERLOADABLE uint4 intel_sub_group_block_read4(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_ui_image4(p, cord.x, cord.y); } OVERLOADABLE uint8 intel_sub_group_block_read8(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_ui_image8(p, cord.x, cord.y); } OVERLOADABLE uint intel_sub_group_block_read_ui(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_ui_image(p, cord.x, cord.y); } OVERLOADABLE uint2 intel_sub_group_block_read_ui2(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_ui_image2(p, cord.x, cord.y); } OVERLOADABLE uint4 intel_sub_group_block_read_ui4(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_ui_image4(p, cord.x, cord.y); } OVERLOADABLE uint8 intel_sub_group_block_read_ui8(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_ui_image8(p, cord.x, cord.y); } void __gen_ocl_sub_group_block_write_ui_image(image2d_t p, int x, int y, uint data); void __gen_ocl_sub_group_block_write_ui_image2(image2d_t p, int x, int y, uint2 data); void __gen_ocl_sub_group_block_write_ui_image4(image2d_t p, int x, int y, uint4 data); void __gen_ocl_sub_group_block_write_ui_image8(image2d_t p, int x, int y, uint8 data); OVERLOADABLE void intel_sub_group_block_write(image2d_t p, int2 cord, uint data) { __gen_ocl_sub_group_block_write_ui_image(p, cord.x, cord.y, data); } OVERLOADABLE void intel_sub_group_block_write2(image2d_t p, int2 cord, uint2 data) { __gen_ocl_sub_group_block_write_ui_image2(p, cord.x, cord.y, data); } OVERLOADABLE void intel_sub_group_block_write4(image2d_t p, int2 cord, uint4 
data) { __gen_ocl_sub_group_block_write_ui_image4(p, cord.x, cord.y, data); } OVERLOADABLE void intel_sub_group_block_write8(image2d_t p, int2 cord, uint8 data) { __gen_ocl_sub_group_block_write_ui_image8(p, cord.x, cord.y, data); } OVERLOADABLE void intel_sub_group_block_write_ui(image2d_t p, int2 cord, uint data) { __gen_ocl_sub_group_block_write_ui_image(p, cord.x, cord.y, data); } OVERLOADABLE void intel_sub_group_block_write_ui2(image2d_t p, int2 cord, uint2 data) { __gen_ocl_sub_group_block_write_ui_image2(p, cord.x, cord.y, data); } OVERLOADABLE void intel_sub_group_block_write_ui4(image2d_t p, int2 cord, uint4 data) { __gen_ocl_sub_group_block_write_ui_image4(p, cord.x, cord.y, data); } OVERLOADABLE void intel_sub_group_block_write_ui8(image2d_t p, int2 cord, uint8 data) { __gen_ocl_sub_group_block_write_ui_image8(p, cord.x, cord.y, data); } PURE CONST ushort __gen_ocl_sub_group_block_read_us_mem(const global ushort* p); PURE CONST ushort2 __gen_ocl_sub_group_block_read_us_mem2(const global ushort* p); PURE CONST ushort4 __gen_ocl_sub_group_block_read_us_mem4(const global ushort* p); PURE CONST ushort8 __gen_ocl_sub_group_block_read_us_mem8(const global ushort* p); OVERLOADABLE ushort intel_sub_group_block_read_us(const global ushort* p) { return __gen_ocl_sub_group_block_read_us_mem(p); } OVERLOADABLE ushort2 intel_sub_group_block_read_us2(const global ushort* p) { return __gen_ocl_sub_group_block_read_us_mem2(p); } OVERLOADABLE ushort4 intel_sub_group_block_read_us4(const global ushort* p) { return __gen_ocl_sub_group_block_read_us_mem4(p); } OVERLOADABLE ushort8 intel_sub_group_block_read_us8(const global ushort* p) { return __gen_ocl_sub_group_block_read_us_mem8(p); } void __gen_ocl_sub_group_block_write_us_mem(global ushort* p, ushort data); void __gen_ocl_sub_group_block_write_us_mem2(global ushort* p, ushort2 data); void __gen_ocl_sub_group_block_write_us_mem4(global ushort* p, ushort4 data); void __gen_ocl_sub_group_block_write_us_mem8(global ushort* p, ushort8 data); OVERLOADABLE void intel_sub_group_block_write_us(global ushort* p, ushort data) { __gen_ocl_sub_group_block_write_us_mem(p, data); } OVERLOADABLE void intel_sub_group_block_write_us2(global ushort* p, ushort2 data) { __gen_ocl_sub_group_block_write_us_mem2(p, data); } OVERLOADABLE void intel_sub_group_block_write_us4(global ushort* p,ushort4 data) { __gen_ocl_sub_group_block_write_us_mem4(p, data); } OVERLOADABLE void intel_sub_group_block_write_us8(global ushort* p,ushort8 data) { __gen_ocl_sub_group_block_write_us_mem8(p, data); } PURE CONST ushort __gen_ocl_sub_group_block_read_us_image(image2d_t p, int x, int y); PURE CONST ushort2 __gen_ocl_sub_group_block_read_us_image2(image2d_t p, int x, int y); PURE CONST ushort4 __gen_ocl_sub_group_block_read_us_image4(image2d_t p, int x, int y); PURE CONST ushort8 __gen_ocl_sub_group_block_read_us_image8(image2d_t p, int x, int y); OVERLOADABLE ushort intel_sub_group_block_read_us(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_us_image(p, cord.x, cord.y); } OVERLOADABLE ushort2 intel_sub_group_block_read_us2(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_us_image2(p, cord.x, cord.y); } OVERLOADABLE ushort4 intel_sub_group_block_read_us4(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_us_image4(p, cord.x, cord.y); } OVERLOADABLE ushort8 intel_sub_group_block_read_us8(image2d_t p, int2 cord) { return __gen_ocl_sub_group_block_read_us_image8(p, cord.x, cord.y); } void __gen_ocl_sub_group_block_write_us_image(image2d_t 
p, int x, int y, ushort data); void __gen_ocl_sub_group_block_write_us_image2(image2d_t p, int x, int y, ushort2 data); void __gen_ocl_sub_group_block_write_us_image4(image2d_t p, int x, int y, ushort4 data); void __gen_ocl_sub_group_block_write_us_image8(image2d_t p, int x, int y, ushort8 data); OVERLOADABLE void intel_sub_group_block_write_us(image2d_t p, int2 cord, ushort data) { __gen_ocl_sub_group_block_write_us_image(p, cord.x, cord.y, data); } OVERLOADABLE void intel_sub_group_block_write_us2(image2d_t p, int2 cord, ushort2 data) { __gen_ocl_sub_group_block_write_us_image2(p, cord.x, cord.y, data); } OVERLOADABLE void intel_sub_group_block_write_us4(image2d_t p, int2 cord, ushort4 data) { __gen_ocl_sub_group_block_write_us_image4(p, cord.x, cord.y, data); } OVERLOADABLE void intel_sub_group_block_write_us8(image2d_t p, int2 cord, ushort8 data) { __gen_ocl_sub_group_block_write_us_image8(p, cord.x, cord.y, data); } #define SHUFFLE_DOWN(TYPE) \ OVERLOADABLE TYPE intel_sub_group_shuffle_down(TYPE x, TYPE y, uint c) { \ TYPE res0, res1; \ res0 = intel_sub_group_shuffle(x, (get_sub_group_local_id() + c)%get_max_sub_group_size()); \ res1 = intel_sub_group_shuffle(y, (get_sub_group_local_id() + c)%get_max_sub_group_size()); \ bool inRange = ((int)c + (int)get_sub_group_local_id() > 0) && (((int)c + (int)get_sub_group_local_id() < (int) get_max_sub_group_size())); \ return inRange ? res0 : res1; \ } SHUFFLE_DOWN(float) SHUFFLE_DOWN(int) SHUFFLE_DOWN(uint) SHUFFLE_DOWN(short) SHUFFLE_DOWN(ushort) #undef SHUFFLE_DOWN #define SHUFFLE_UP(TYPE) \ OVERLOADABLE TYPE intel_sub_group_shuffle_up(TYPE x, TYPE y, uint c) { \ TYPE res0, res1; \ res0 = intel_sub_group_shuffle(x, (get_max_sub_group_size() + get_sub_group_local_id() - c)%get_max_sub_group_size()); \ res1 = intel_sub_group_shuffle(y, (get_max_sub_group_size() + get_sub_group_local_id() - c)%get_max_sub_group_size()); \ bool inRange = ((int)c - (int)get_sub_group_local_id() > 0) && (((int)c - (int)get_sub_group_local_id() < (int) get_max_sub_group_size())); \ return inRange ? res0 : res1; \ } SHUFFLE_UP(float) SHUFFLE_UP(int) SHUFFLE_UP(uint) SHUFFLE_UP(short) SHUFFLE_UP(ushort) #undef SHUFFLE_UP #define SHUFFLE_XOR(TYPE) \ OVERLOADABLE TYPE intel_sub_group_shuffle_xor(TYPE x, uint c) { \ return intel_sub_group_shuffle(x, (get_sub_group_local_id() ^ c) % get_max_sub_group_size()); \ } SHUFFLE_XOR(float) SHUFFLE_XOR(int) SHUFFLE_XOR(uint) SHUFFLE_XOR(short) SHUFFLE_XOR(ushort) #undef SHUFFLE_XOR Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_math_20.tmpl.cl000664 001750 001750 00000325107 13161142102 023437 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #include "ocl_math_20.h" #include "ocl_float.h" #include "ocl_relational.h" #include "ocl_common.h" #include "ocl_integer.h" extern constant int __ocl_math_fastpath_flag; CONST float __gen_ocl_fabs(float x) __asm("llvm.fabs" ".f32"); CONST float __gen_ocl_sin(float x) __asm("llvm.sin" ".f32"); CONST float __gen_ocl_cos(float x) __asm("llvm.cos" ".f32"); CONST float __gen_ocl_sqrt(float x) __asm("llvm.sqrt" ".f32"); PURE CONST float __gen_ocl_rsqrt(float x); CONST float __gen_ocl_log(float x) __asm("llvm.log2" ".f32"); CONST float __gen_ocl_exp(float x) __asm("llvm.exp2" ".f32"); PURE CONST float __gen_ocl_pow(float x, float y) __asm("llvm.pow" ".f32"); PURE CONST float __gen_ocl_rcp(float x); CONST float __gen_ocl_rndz(float x) __asm("llvm.trunc" ".f32"); CONST float __gen_ocl_rnde(float x) __asm("llvm.rint" ".f32"); CONST float __gen_ocl_rndu(float x) __asm("llvm.ceil" ".f32"); CONST float __gen_ocl_rndd(float x) __asm("llvm.floor" ".f32"); /* native functions */ OVERLOADABLE float native_cos(float x) { return __gen_ocl_cos(x); } OVERLOADABLE float native_sin(float x) { return __gen_ocl_sin(x); } OVERLOADABLE float native_sqrt(float x) { return __gen_ocl_sqrt(x); } OVERLOADABLE float native_rsqrt(float x) { return __gen_ocl_rsqrt(x); } OVERLOADABLE float native_log2(float x) { return __gen_ocl_log(x); } OVERLOADABLE float native_log(float x) { return native_log2(x) * 0.6931472002f; } OVERLOADABLE float native_log10(float x) { return native_log2(x) * 0.3010299956f; } OVERLOADABLE float native_powr(float x, float y) { return __gen_ocl_pow(x,y); } OVERLOADABLE float native_recip(float x) { return __gen_ocl_rcp(x); } OVERLOADABLE float native_tan(float x) { return native_sin(x) / native_cos(x); } OVERLOADABLE float native_exp2(float x) { return __gen_ocl_exp(x); } OVERLOADABLE float native_exp(float x) { return __gen_ocl_exp(M_LOG2E_F*x); } OVERLOADABLE float native_exp10(float x) { return __gen_ocl_exp(M_LOG210_F*x); } OVERLOADABLE float native_divide(float x, float y) { return x/y; } /* Fast path */ OVERLOADABLE float __gen_ocl_internal_fastpath_acosh (float x) { return native_log(x + native_sqrt(x + 1) * native_sqrt(x - 1)); } OVERLOADABLE float __gen_ocl_internal_fastpath_asinh (float x) { return native_log(x + native_sqrt(x * x + 1)); } OVERLOADABLE float __gen_ocl_internal_fastpath_atanh (float x) { return 0.5f * native_log((1 + x) / (1 - x)); } OVERLOADABLE float __gen_ocl_internal_fastpath_cbrt (float x) { return __gen_ocl_pow(x, 0.3333333333f); } OVERLOADABLE float __gen_ocl_internal_fastpath_cos (float x) { return native_cos(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_cosh (float x) { return (1 + native_exp(-2 * x)) / (2 * native_exp(-x)); } OVERLOADABLE float __gen_ocl_internal_fastpath_cospi (float x) { return __gen_ocl_cos(x * M_PI_F); } OVERLOADABLE float __gen_ocl_internal_fastpath_exp (float x) { return native_exp(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_exp10 (float x) { return native_exp10(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_expm1 (float x) { return __gen_ocl_pow(M_E_F, x) - 1; } OVERLOADABLE float __gen_ocl_internal_fastpath_fmod (float x, float y) { return x-y*__gen_ocl_rndz(x/y); } OVERLOADABLE float __gen_ocl_internal_fastpath_hypot (float x, float y) { return __gen_ocl_sqrt(x*x + y*y); } OVERLOADABLE int __gen_ocl_internal_fastpath_ilogb (float x) { return __gen_ocl_rndd(native_log2(x)); } OVERLOADABLE float __gen_ocl_internal_fastpath_ldexp (float x, int n) { return __gen_ocl_pow(2, n) * x; } OVERLOADABLE float 
__gen_ocl_internal_fastpath_log (float x) { return native_log(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_log2 (float x) { return native_log2(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_log10 (float x) { return native_log10(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_log1p (float x) { return native_log(x + 1); } OVERLOADABLE float __gen_ocl_internal_fastpath_logb (float x) { return __gen_ocl_rndd(native_log2(x)); } OVERLOADABLE float __gen_ocl_internal_fastpath_remainder (float x, float y) { return x-y*__gen_ocl_rnde(x/y); } OVERLOADABLE float __gen_ocl_internal_fastpath_rootn(float x, int n) { return __gen_ocl_pow(x, 1.f / n); } OVERLOADABLE float __gen_ocl_internal_fastpath_sin (float x) { return native_sin(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_sincos (float x, float *cosval) { *cosval = native_cos(x); return native_sin(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_sinh (float x) { return (1 - native_exp(-2 * x)) / (2 * native_exp(-x)); } OVERLOADABLE float __gen_ocl_internal_fastpath_sinpi (float x) { return __gen_ocl_sin(x * M_PI_F); } OVERLOADABLE float __gen_ocl_internal_fastpath_tan (float x) { return native_tan(x); } OVERLOADABLE float __gen_ocl_internal_fastpath_tanh (float x) { float y = native_exp(-2 * x); return (1 - y) / (1 + y); } /* Internal implement, high accuracy. */ OVERLOADABLE float __gen_ocl_internal_floor(float x) { return __gen_ocl_rndd(x); } OVERLOADABLE float __gen_ocl_internal_copysign(float x, float y) { union { unsigned u; float f; } ux, uy; ux.f = x; uy.f = y; ux.u = (ux.u & 0x7fffffff) | (uy.u & 0x80000000u); return ux.f; } OVERLOADABLE float inline __gen_ocl_internal_log_valid(float x) { /* * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. * ==================================================== */ union { unsigned int i; float f; } u; const float ln2_hi = 6.9313812256e-01, /* 0x3f317180 */ ln2_lo = 9.0580006145e-06, /* 0x3717f7d1 */ two25 = 3.355443200e+07, /* 0x4c000000 */ Lg1 = 6.6666668653e-01, /* 3F2AAAAB */ Lg2 = 4.0000000596e-01, /* 3ECCCCCD */ Lg3 = 2.8571429849e-01, /* 3E924925 */ Lg4 = 2.2222198546e-01; /* 3E638E29 */ const float zero = 0.0; float fsq, f, s, z, R, w, t1, t2, partial; int k, ix, i, j; u.f = x; ix = u.i; k = 0; k += (ix>>23) - 127; ix &= 0x007fffff; i = (ix + (0x95f64<<3)) & 0x800000; u.i = ix | (i^0x3f800000); x = u.f; k += (i>>23); f = x - 1.0f; fsq = f * f; if((0x007fffff & (15 + ix)) < 16) { /* |f| < 2**-20 */ R = fsq * (0.5f - 0.33333333333333333f * f); return k * ln2_hi + k * ln2_lo + f - R; } s = f / (2.0f + f); z = s * s; i = ix - (0x6147a << 3); w = z * z; j = (0x6b851 << 3) - ix; t1= w * mad(w, Lg4, Lg2); t2= z * mad(w, Lg3, Lg1); i |= j; R = t2 + t1; partial = (i > 0) ? 
-mad(s, 0.5f * fsq, -0.5f * fsq) : (s * f);
  return mad(s, R, f) - partial + k * ln2_hi + k * ln2_lo;
}

OVERLOADABLE float __gen_ocl_internal_log(float x) {
  union { unsigned int i; float f; } u;
  u.f = x;
  int ix = u.i;
  if (ix < 0) return NAN;           /* log(-#) = NaN */
  if (ix >= 0x7f800000) return NAN;
  return __gen_ocl_internal_log_valid(x);
}

OVERLOADABLE float __gen_ocl_internal_log10(float x) {
  union { float f; unsigned i; } u;
  const float
    ivln10    = 4.3429449201e-01, /* 0x3ede5bd9 */
    log10_2hi = 3.0102920532e-01, /* 0x3e9a2080 */
    log10_2lo = 7.9034151668e-07; /* 0x355427db */
  float y, z;
  int i, k, hx;
  u.f = x;
  hx = u.i;
  if (hx < 0) return NAN;           /* log(-#) = NaN */
  if (hx >= 0x7f800000) return NAN;
  k = (hx >> 23) - 127;
  i = ((unsigned)k & 0x80000000) >> 31;
  hx = (hx & 0x007fffff) | ((0x7f - i) << 23);
  y = (float)(k + i);
  u.i = hx;
  x = u.f;
  return y * log10_2lo + y * log10_2hi + ivln10 * __gen_ocl_internal_log_valid(x);
}

OVERLOADABLE float __gen_ocl_internal_log2(float x) {
  const float zero = 0.0, invln2 = 0x1.715476p+0f;
  int ix;
  union { float f; int i; } u;
  u.f = x;
  ix = u.i;
  if (ix < 0) return NAN;           /* log(-#) = NaN */
  if (ix >= 0x7f800000) return NAN;
  return invln2 * __gen_ocl_internal_log_valid(x);
}

float __gen_ocl_scalbnf(float x, int n) { /* copied from fdlibm */
  float two25  = 3.355443200e+07,  /* 0x4c000000 */
        twom25 = 2.9802322388e-08, /* 0x33000000 */
        huge   = 1.0e+30,
        tiny   = 1.0e-30;
  int k, ix;
  GEN_OCL_GET_FLOAT_WORD(ix, x);
  k = (ix & 0x7f800000) >> 23;     /* extract exponent */
  if (k == 0) {                    /* 0 or subnormal x */
    if ((ix & 0x7fffffff) == 0) return x; /* +-0 */
    x *= two25;
    GEN_OCL_GET_FLOAT_WORD(ix, x);
    k = ((ix & 0x7f800000) >> 23) - 25;
  }
  if (k == 0xff) return x + x;     /* NaN or Inf */
  if (n < -50000)
    return tiny * __gen_ocl_internal_copysign(tiny, x); /* underflow */
  if (n > 50000 || k + n > 0xfe)
    return huge * __gen_ocl_internal_copysign(huge, x); /* overflow */
  /* Now k and n are bounded; we know that k = k+n does not overflow. */
  k = k + n;
  if (k > 0) {                     /* normal result */
    GEN_OCL_SET_FLOAT_WORD(x, (ix & 0x807fffff) | (k << 23));
    return x;
  }
  if (k <= -25)
    return tiny * __gen_ocl_internal_copysign(tiny, x); /* underflow */
  k += 25;                         /* subnormal result */
  GEN_OCL_SET_FLOAT_WORD(x, (ix & 0x807fffff) | (k << 23));
  return x * twom25;
}

const __constant unsigned int two_over_pi[] = {
  0, 0, 0xA2F, 0x983, 0x6E4, 0xe44, 0x152, 0x9FC, 0x275, 0x7D1,
  0xF53, 0x4DD, 0xC0D, 0xB62, 0x959, 0x93C, 0x439, 0x041, 0xFE5, 0x163,
};
// The main idea is from "Radian Reduction for Trigonometric Functions",
// written by Mary H. Payne and Robert N. Hanek. Another reference is
// "A Continued-Fraction Analysis of Trigonometric Argument Reduction",
// written by Roger Alan Smith, who gave the worst case in his paper.
// For single-precision float, the worst x = 0x1.47d0fep34, and there are
// 29 leading zero bits in the fraction part of x*(2.0/pi), so we need at
// least 29 (leading zeros) + 24 (fraction) + 12 (integer) + guard bits,
// that is, 65 + guard bits. As we calculate with 12*7 = 84 bits, we have
// about 19 guard bits.
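// (A worked check of that budget, stated here for illustration: the worst
// case needs 29 + 24 + 12 = 65 significant bits, the seven 12-bit blocks
// computed below provide 7*12 = 84 bits, and 84 - 65 = 19 guard bits.)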
// If we need further precision, we may need more guard bits.
// Note we place two 0s in two_over_pi, which are used to handle inputs
// less than 0x1.0p23.
int payne_hanek(float x, float *y) {
  union { float f; unsigned u; } ieee;
  ieee.f = x;
  unsigned u = ieee.u;
  int k = ((u & 0x7f800000) >> 23) - 127;
  int ma = (u & 0x7fffff) | 0x800000;
  unsigned high, low;
  high = (ma & 0xfff000) >> 12;
  low = ma & 0xfff;
  // To tune the macros below, you need to fully understand the algorithm.
#define CALC_BLOCKS 7
#define ZERO_BITS 2
  unsigned result[CALC_BLOCKS];
  // Round down; note we need 2 bits of integer precision.
  int index = (k-23-2) < 0 ? (k-23-2-11)/12 : (k-23-2)/12;
  for (int i = 0; i < CALC_BLOCKS; i++) {
    result[i] = low * two_over_pi[index+i+ZERO_BITS];
    result[i] += high * two_over_pi[index+i+1+ZERO_BITS];
  }
  for (int i = CALC_BLOCKS-1; i > 0; i--) {
    int temp = result[i] >> 12;
    result[i] -= temp << 12;
    result[i-1] += temp;
  }
#undef CALC_BLOCKS
#undef ZERO_BITS
  // Get the number of integer digits in result[0]; note we only consider 12
  // valid bits, which also means the number of fraction digits in result[0]
  // is (12-intDigit).
  int intDigit = index*(-12) + (k-23);
  // The integer bits may be fully contained in result[0], or split between
  // result[0] and result[1]. So we merge successive bits, which makes the
  // coding easier.
  unsigned b0 = (result[0] << 12) | result[1];
  unsigned b1 = (result[2] << 12) | result[3];
  unsigned b2 = (result[4] << 12) | result[5];
  unsigned b3 = (result[6] << 12);
  unsigned intPart = b0 >> (24-intDigit);
  unsigned fract1 = ((b0 << intDigit) | (b1 >> (24-intDigit))) & 0xffffff;
  unsigned fract2 = ((b1 << intDigit) | (b2 >> (24-intDigit))) & 0xffffff;
  unsigned fract3 = ((b2 << intDigit) | (b3 >> (24-intDigit))) & 0xffffff;
  // Larger than 0.5? That means larger than pi/4, so we need to transform
  // from [0, pi/2] to [-pi/4, pi/4] through -(1.0-fract).
  int largerPiBy4 = ((fract1 & 0x800000) != 0);
  int sign = largerPiBy4 ? 1 : 0;
  intPart = largerPiBy4 ? (intPart+1) : intPart;
  fract1 = largerPiBy4 ? (fract1 ^ 0x00ffffff) : fract1;
  fract2 = largerPiBy4 ? (fract2 ^ 0x00ffffff) : fract2;
  fract3 = largerPiBy4 ? (fract3 ^ 0x00ffffff) : fract3;
  int leadingZero = (fract1 == 0);
  // +1 is for the hidden bit 1 in the floating-point format.
  int exponent = leadingZero ? -(24+1) : -(0+1);
  fract1 = leadingZero ? fract2 : fract1;
  fract2 = leadingZero ?
fract3 : fract2; // fract1 may have leading zeros, add it int shift = clz(fract1)-8; exponent += -shift; float pio2 = 0x1.921fb6p+0; unsigned fdigit = ((fract1 << shift) | (fract2 >> (24-shift))) & 0xffffff; // we know that denormal number will not appear here ieee.u = (sign << 31) | ((exponent+127) << 23) | (fdigit & 0x7fffff); *y = ieee.f * pio2; return intPart; } int argumentReduceSmall(float x, float * remainder) { union { float f; unsigned u; } ieee; float twoByPi = 2.0f/3.14159265f; float piBy2_1h = (float) 0xc90/0x1.0p11, piBy2_1l = (float) 0xfda/0x1.0p23, piBy2_2h = (float) 0xa22/0x1.0p35, piBy2_2l = (float) 0x168/0x1.0p47, piBy2_3h = (float) 0xc23/0x1.0p59, piBy2_3l = (float) 0x4c4/0x1.0p71; float y = (float)(int)(twoByPi * x + 0.5f); ieee.f = y; ieee.u = ieee.u & 0xfffff000; float yh = ieee.f; float yl = y - yh; float rem = x - yh*piBy2_1h - yh*piBy2_1l - yl*piBy2_1h - yl*piBy2_1l; rem = rem - yh*piBy2_2h - yh*piBy2_2l + yl*piBy2_2h + yl*piBy2_2l; rem = rem - yh*piBy2_3h - yh*piBy2_3l - yl*piBy2_3h - yl*piBy2_3l; *remainder = rem; return (int)y; } int __ieee754_rem_pio2f(float x, float *y) { if (x < 4000.0f) { return argumentReduceSmall(x, y); } else { return payne_hanek(x, y); } } OVERLOADABLE float __kernel_sinf(float x) { /* copied from fdlibm */ const float S1 = -1.6666667163e-01, /* 0xbe2aaaab */ S2 = 8.3333337680e-03, /* 0x3c088889 */ S3 = -1.9841270114e-04, /* 0xb9500d01 */ S4 = 2.7557314297e-06; /* 0x3638ef1b */ float z,r,v; z = x*x; v = z*x; r = mad(z, mad(z, mad(z, S4, S3), S2), S1); return mad(v, r, x); } float __kernel_cosf(float x, float y) { /* copied from fdlibm */ const float one = 1.0000000000e+00, /* 0x3f800000 */ C1 = 4.1666667908e-02, /* 0x3d2aaaab */ C2 = -1.3888889225e-03, /* 0xbab60b61 */ C3 = 2.4801587642e-05; /* 0x37d00d01 */ float a,hz,z,r,qx; int ix; GEN_OCL_GET_FLOAT_WORD(ix,x); ix &= 0x7fffffff; /* ix = |x|'s high word*/ z = x*x; r = z * mad(z, mad(z, C3, C2), C1); if(ix < 0x3e99999a) /* if |x| < 0.3 */ return one - ((float)0.5*z - (z*r - x*y)); else { GEN_OCL_SET_FLOAT_WORD(qx,ix-0x01000000); /* x/4 */ hz = (float)0.5*z-qx; a = one-qx; return a - (hz - (z*r-x*y)); } } OVERLOADABLE float sin(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_sin(x); const float pio4 = 7.8539812565e-01; /* 0x3f490fda */ float y,z=0.0; int n, ix; float negative = x < 0.0f? -1.0f : 1.0f; x = fabs(x); GEN_OCL_GET_FLOAT_WORD(ix,x); ix &= 0x7fffffff; /* sin(Inf or NaN) is NaN */ if (ix >= 0x7f800000) return x-x; if(x <= pio4) return negative * __kernel_sinf(x); /* argument reduction needed */ else { n = __ieee754_rem_pio2f(x,&y); float s = __kernel_sinf(y); float c = __kernel_cosf(y,0.0f); float ret = (n&1) ? negative*c : negative*s; return (n&3)> 1? -1.0f*ret : ret; } } OVERLOADABLE float cos(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_cos(x); const float pio4 = 7.8539812565e-01; /* 0x3f490fda */ float y,z=0.0; int n, ix; x = __gen_ocl_fabs(x); GEN_OCL_GET_FLOAT_WORD(ix,x); ix &= 0x7fffffff; /* cos(Inf or NaN) is NaN */ if (ix >= 0x7f800000) return x-x; if(x <= pio4) return __kernel_cosf(x, 0.f); /* argument reduction needed */ else { n = __ieee754_rem_pio2f(x,&y); n &= 3; float c = __kernel_cosf(y, 0.0f); float s = __kernel_sinf(y); float v = (n&1) ? s : c; /* n&3 return 0 cos(y) 1 -sin(y) 2 -cos(y) 3 sin(y) */ int mask = (n>>1) ^ n; float sign = (mask&1) ? 
-1.0f : 1.0f; return sign * v; } } float __kernel_tanf(float x, float y, int iy) { /* copied from fdlibm */ float z,r,v,w,s; int ix,hx; const float one = 1.0000000000e+00, /* 0x3f800000 */ pio4 = 7.8539812565e-01, /* 0x3f490fda */ pio4lo= 3.7748947079e-08; /* 0x33222168 */ float T[13];// = { T[0] = 3.3333334327e-01; /* 0x3eaaaaab */ T[1] = 1.3333334029e-01; /* 0x3e088889 */ T[2] = 5.3968254477e-02; /* 0x3d5d0dd1 */ T[3] = 2.1869488060e-02; /* 0x3cb327a4 */ T[4] = 8.8632395491e-03; /* 0x3c11371f */ T[5] = 3.5920790397e-03; /* 0x3b6b6916 */ T[6] = 1.4562094584e-03; /* 0x3abede48 */ T[7] = 5.8804126456e-04; /* 0x3a1a26c8 */ GEN_OCL_GET_FLOAT_WORD(hx,x); ix = hx&0x7fffffff; /* high word of |x| */ if(ix<0x31800000) /* x < 2**-28 */ {if((int)x==0) { /* generate inexact */ if((ix|(iy+1))==0) return one/__gen_ocl_fabs(x); else return (iy==1)? x: -one/x; } } if(ix>=0x3f2ca140) { /* |x|>=0.6744 */ if(hx<0) {x = -x; y = -y;} z = pio4-x; w = pio4lo-y; x = z+w; y = 0.0; } z = x*x; w = z*z; /* Break x^5*(T[1]+x^2*T[2]+...) into * x^5(T[1]+x^4*T[3]+...+x^20*T[11]) + * x^5(x^2*(T[2]+x^4*T[4]+...+x^22*[T12])) */ r = mad(w, mad(w, mad(w, T[7], T[5]), T[3]), T[1]); v = z* mad(w, mad(w, T[6], T[4]), T[2]); s = z*x; r = mad(z, mad(s, r + v, y), y); r += T[0]*s; w = x+r; if(ix>=0x3f2ca140) { v = (float)iy; return (float)(1-((hx>>30)&2))*(v-(float)2.0*(x-(w*w/(w+v)-r))); } if(iy==1) return w; else return -1.0/(x+r); } OVERLOADABLE float tan(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_tan(x); float y,z=0.0; int n, ix; float negative = x < 0.0f? -1.0f : 1.0f; x = negative * x; GEN_OCL_GET_FLOAT_WORD(ix,x); ix &= 0x7fffffff; /* tan(Inf or NaN) is NaN */ if (ix>=0x7f800000) return x-x; /* NaN */ /* argument reduction needed */ else { n = __ieee754_rem_pio2f(x,&y); return negative * __kernel_tanf(y,0.0f,1-((n&1)<<1)); /* 1 -- n even -1 -- n odd */ } } OVERLOADABLE float __gen_ocl_internal_cospi(float x) { int ix; if(isinf(x) || isnan(x)) { return NAN; } if(x < 0.0f) { x = -x; } GEN_OCL_GET_FLOAT_WORD(ix, x); if(x> 0x1.0p24) return 1.0f; float m = __gen_ocl_internal_floor(x); ix = (int)m; m = x-m; if((ix&0x1) != 0) m+=1.0f; ix = __gen_ocl_internal_floor(m*4.0f); switch(ix) { case 0: return __kernel_cosf(m*M_PI_F, 0.0f); case 1: case 2: return __kernel_sinf((0.5f-m)*M_PI_F); case 3: case 4: return -__kernel_cosf((m-1.0f)*M_PI_F, 0.0f); case 5: case 6: return __kernel_sinf((m-1.5f)*M_PI_F); default: return __kernel_cosf((2.0f-m)*M_PI_F, 0.0f); } } OVERLOADABLE float __gen_ocl_internal_sinpi(float x) { float sign = 1.0f; int ix; if(isinf(x)) return NAN; if(x < 0.0f) { x = -x; sign = -1.0f; } GEN_OCL_GET_FLOAT_WORD(ix, x); if(x> 0x1.0p24) return 0.0f; float m = __gen_ocl_internal_floor(x); ix = (int)m; m = x-m; if((ix&0x1) != 0) m+=1.0f; ix = __gen_ocl_internal_floor(m*4.0f); switch(ix) { case 0: return sign*__kernel_sinf(m*M_PI_F); case 1: case 2: return sign*__kernel_cosf((m-0.5f)*M_PI_F, 0.0f); case 3: case 4: return -sign*__kernel_sinf((m-1.0f)*M_PI_F); case 5: case 6: return -sign*__kernel_cosf((m-1.5f)*M_PI_F, 0.0f); default: return -sign*__kernel_sinf((2.0f-m)*M_PI_F); } } OVERLOADABLE float lgamma(float x) { /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. 
* ==================================================== */ const float zero= 0., one = 1.0000000000e+00, pi = 3.1415927410e+00, a0 = 7.7215664089e-02, a1 = 3.2246702909e-01, a2 = 6.7352302372e-02, a3 = 2.0580807701e-02, a4 = 7.3855509982e-03, a5 = 2.8905137442e-03, a6 = 1.1927076848e-03, a7 = 5.1006977446e-04, a8 = 2.2086278477e-04, a9 = 1.0801156895e-04, a10 = 2.5214456400e-05, a11 = 4.4864096708e-05, tc = 1.4616321325e+00, tf = -1.2148628384e-01, tt = 6.6971006518e-09, t0 = 4.8383611441e-01, t1 = -1.4758771658e-01, t2 = 6.4624942839e-02, t3 = -3.2788541168e-02, t4 = 1.7970675603e-02, t5 = -1.0314224288e-02, t6 = 6.1005386524e-03, t7 = -3.6845202558e-03, t8 = 2.2596477065e-03, t9 = -1.4034647029e-03, t10 = 8.8108185446e-04, t11 = -5.3859531181e-04, t12 = 3.1563205994e-04, t13 = -3.1275415677e-04, t14 = 3.3552918467e-04, u0 = -7.7215664089e-02, u1 = 6.3282704353e-01, u2 = 1.4549225569e+00, u3 = 9.7771751881e-01, u4 = 2.2896373272e-01, u5 = 1.3381091878e-02, v1 = 2.4559779167e+00, v2 = 2.1284897327e+00, v3 = 7.6928514242e-01, v4 = 1.0422264785e-01, v5 = 3.2170924824e-03, s0 = -7.7215664089e-02, s1 = 2.1498242021e-01, s2 = 3.2577878237e-01, s3 = 1.4635047317e-01, s4 = 2.6642270386e-02, s5 = 1.8402845599e-03, s6 = 3.1947532989e-05, r1 = 1.3920053244e+00, r2 = 7.2193557024e-01, r3 = 1.7193385959e-01, r4 = 1.8645919859e-02, r5 = 7.7794247773e-04, r6 = 7.3266842264e-06, w0 = 4.1893854737e-01, w1 = 8.3333335817e-02, w2 = -2.7777778450e-03, w3 = 7.9365057172e-04, w4 = -5.9518753551e-04, w5 = 8.3633989561e-04, w6 = -1.6309292987e-03; float t, y, z, nadj, p, p1, p2, p3, q, r, w; int i, hx, ix; nadj = 0; hx = *(int *)&x; ix = hx & 0x7fffffff; if (ix >= 0x7f800000) return x * x; if (ix == 0) return ((x + one) / zero); if (ix < 0x1c800000) { if (hx < 0) { return -native_log(-x); } else return -native_log(x); } if (hx < 0) { if (ix >= 0x4b000000) return ((-x) / zero); t = __gen_ocl_internal_sinpi(x); if (t == zero) return ((-x) / zero); nadj = native_log(pi / __gen_ocl_fabs(t * x)); x = -x; } if (ix == 0x3f800000 || ix == 0x40000000) r = 0; else if (ix < 0x40000000) { if (ix <= 0x3f666666) { r = -native_log(x); if (ix >= 0x3f3b4a20) { y = one - x; i = 0; } else if (ix >= 0x3e6d3308) { y = x - (tc - one); i = 1; } else { y = x; i = 2; } } else { r = zero; if (ix >= 0x3fdda618) { y = (float) 2.0 - x; i = 0; } else if (ix >= 0x3F9da620) { y = x - tc; i = 1; } else { y = x - one; i = 2; } } switch (i) { case 0: z = y * y; p1 = mad(z, mad(z, mad(z, mad(z, mad(z, a10, a8), a6), a4), a2), a0); p2 = z * mad(z, mad(z, mad(z, mad(z, mad(z, a11, a9), a7), a5), a3), a1); p = mad(y, p1, p2); r += (p - (float) 0.5 * y); break; case 1: z = y * y; w = z * y; p1 = mad(w, mad(w, mad(w, mad(w, t12, t9), t6), t3), t0); p2 = mad(w, mad(w, mad(w, mad(w, t13, t10), t7), t4), t1); p3 = mad(w, mad(w, mad(w, mad(w, t14, t11), t8), t5), t2); p = mad(p1, z, mad(w, mad(y, p3, p2), -tt)); r += (tf + p); break; case 2: p1 = y * mad(y, mad(y, mad(y, mad(y, mad(y, u5, u4), u3), u2), u1), u0); p2 = mad(y, mad(y, mad(y, mad(y, mad(y, v5, v4), v3), v2), v1), one); r += (-(float) 0.5 * y + p1 / p2); } } else if (ix < 0x41000000) { i = (int) x; t = zero; y = x - (float) i; p =y * mad(y, mad(y, mad(y, mad(y, mad(y, mad(y, s6, s5), s4), s3), s2), s1), s0); q = mad(y, mad(y, mad(y, mad(y, mad(y, mad(y, r6, r5), r4), r3), r2), r1), one); r = .5f * y + p / q; z = one; switch (i) { case 7: z *= (y + 6.0f); case 6: z *= (y + 5.0f); case 5: z *= (y + 4.0f); case 4: z *= (y + 3.0f); case 3: z *= (y + 2.0f); r += native_log(z); break; } } else if (ix 
< 0x5c800000) { t = native_log(x); z = one / x; y = z * z; w = mad(z, mad(y, mad(y, mad(y, mad(y, mad(y, w6, w5), w4), w3), w2), w1), w0); r = (x - .5f) * (t - one) + w; } else r = x * (native_log(x) - one); if (hx < 0) r = nadj - r; return r; } /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. * ==================================================== */ #define BODY \ const float \ zero= 0., \ one = 1.0000000000e+00, \ pi = 3.1415927410e+00, \ a0 = 7.7215664089e-02, \ a1 = 3.2246702909e-01, \ a2 = 6.7352302372e-02, \ a3 = 2.0580807701e-02, \ a4 = 7.3855509982e-03, \ a5 = 2.8905137442e-03, \ a6 = 1.1927076848e-03, \ a7 = 5.1006977446e-04, \ a8 = 2.2086278477e-04, \ a9 = 1.0801156895e-04, \ a10 = 2.5214456400e-05, \ a11 = 4.4864096708e-05, \ tc = 1.4616321325e+00, \ tf = -1.2148628384e-01, \ tt = 6.6971006518e-09, \ t0 = 4.8383611441e-01, \ t1 = -1.4758771658e-01, \ t2 = 6.4624942839e-02, \ t3 = -3.2788541168e-02, \ t4 = 1.7970675603e-02, \ t5 = -1.0314224288e-02, \ t6 = 6.1005386524e-03, \ t7 = -3.6845202558e-03, \ t8 = 2.2596477065e-03, \ t9 = -1.4034647029e-03, \ t10 = 8.8108185446e-04, \ t11 = -5.3859531181e-04, \ t12 = 3.1563205994e-04, \ t13 = -3.1275415677e-04, \ t14 = 3.3552918467e-04, \ u0 = -7.7215664089e-02, \ u1 = 6.3282704353e-01, \ u2 = 1.4549225569e+00, \ u3 = 9.7771751881e-01, \ u4 = 2.2896373272e-01, \ u5 = 1.3381091878e-02, \ v1 = 2.4559779167e+00, \ v2 = 2.1284897327e+00, \ v3 = 7.6928514242e-01, \ v4 = 1.0422264785e-01, \ v5 = 3.2170924824e-03, \ s0 = -7.7215664089e-02, \ s1 = 2.1498242021e-01, \ s2 = 3.2577878237e-01, \ s3 = 1.4635047317e-01, \ s4 = 2.6642270386e-02, \ s5 = 1.8402845599e-03, \ s6 = 3.1947532989e-05, \ r1 = 1.3920053244e+00, \ r2 = 7.2193557024e-01, \ r3 = 1.7193385959e-01, \ r4 = 1.8645919859e-02, \ r5 = 7.7794247773e-04, \ r6 = 7.3266842264e-06, \ w0 = 4.1893854737e-01, \ w1 = 8.3333335817e-02, \ w2 = -2.7777778450e-03, \ w3 = 7.9365057172e-04, \ w4 = -5.9518753551e-04, \ w5 = 8.3633989561e-04, \ w6 = -1.6309292987e-03; \ float t, y, z, nadj, p, p1, p2, p3, q, r, w; \ int i, hx, ix; \ nadj = 0; \ hx = *(int *)&x; \ *signgamp = 1; \ ix = hx & 0x7fffffff; \ if (ix >= 0x7f800000) \ return x * x; \ if (ix == 0) \ return ((x + one) / zero); \ if (ix < 0x1c800000) { \ if (hx < 0) { \ *signgamp = -1; \ return -native_log(-x); \ } else \ return -native_log(x); \ } \ if (hx < 0) { \ if (ix >= 0x4b000000) \ return ((-x) / zero); \ t = __gen_ocl_internal_sinpi(x); \ if (t == zero) \ return ((-x) / zero); \ nadj = native_log(pi / __gen_ocl_fabs(t * x)); \ if (t < zero) \ *signgamp = -1; \ x = -x; \ } \ if (ix == 0x3f800000 || ix == 0x40000000) \ r = 0; \ else if (ix < 0x40000000) { \ if (ix <= 0x3f666666) { \ r = -native_log(x); \ if (ix >= 0x3f3b4a20) { \ y = one - x; \ i = 0; \ } else if (ix >= 0x3e6d3308) { \ y = x - (tc - one); \ i = 1; \ } else { \ y = x; \ i = 2; \ } \ } else { \ r = zero; \ if (ix >= 0x3fdda618) { \ y = (float) 2.0 - x; \ i = 0; \ } \ else if (ix >= 0x3F9da620) { \ y = x - tc; \ i = 1; \ } \ else { \ y = x - one; \ i = 2; \ } \ } \ switch (i) { \ case 0: \ z = y * y; \ p1 = mad(z, mad(z, mad(z, mad(z, mad(z, a10, a8), a6), a4), a2), a0); \ p2 = z * mad(z, mad(z, mad(z, mad(z, mad(z, a11, a9), a7), a5), a3), a1); \ p = mad(y, p1, p2); \ r = r - mad(y, 0.5f, -p); \ break; \ case 1: \ z = 
y * y; \
      w = z * y; \
      p1 = mad(w, mad(w, mad(w, mad(w, t12, t9), t6), t3), t0); \
      p2 = mad(w, mad(w, mad(w, mad(w, t13, t10), t7), t4), t1); \
      p3 = mad(w, mad(w, mad(w, mad(w, t14, t11), t8), t5), t2); \
      p = z * p1 + mad(w, mad(y, p3, p2), -tt); \
      r += (tf + p); \
      break; \
    case 2: \
      p1 = y * mad(y, mad(y, mad(y, mad(y, mad(y, u5, u4), u3), u2), u1), u0); \
      p2 = mad(y, mad(y, mad(y, mad(y, mad(y, v5, v4), v3), v2), v1), one); \
      r = r + mad(y, -0.5f, p1 / p2); \
    } \
  } else if (ix < 0x41000000) { \
    i = (int) x; \
    t = zero; \
    y = x - (float) i; \
    p = y * mad(y, mad(y, mad(y, mad(y, mad(y, mad(y, s6, s5), s4), s3), s2), s1), s0); \
    q = mad(y, mad(y, mad(y, mad(y, mad(y, mad(y, r6, r5), r4), r3), r2), r1), one); \
    r = mad(y, 0.5f, p / q); \
    z = one; \
    switch (i) { \
    case 7: z *= (y + (float) 6.0); \
    case 6: z *= (y + (float) 5.0); \
    case 5: z *= (y + (float) 4.0); \
    case 4: z *= (y + (float) 3.0); \
    case 3: z *= (y + (float) 2.0); \
      r += native_log(z); \
      break; \
    } \
  } else if (ix < 0x5c800000) { \
    t = native_log(x); \
    z = one / x; \
    y = z * z; \
    w = mad(z, mad(y, mad(y, mad(y, mad(y, mad(y, w6, w5), w4), w3), w2), w1), w0); \
    r = (x - .5f) * (t - one) + w; \
  } else \
    r = x * (native_log(x) - one); \
  if (hx < 0) \
    r = nadj - r; \
  return r;
OVERLOADABLE float lgamma_r(float x, int *signgamp) { BODY; }
#undef BODY
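/* Illustrative sketch (not part of the library, compiled out): how the
 * signgamp output of lgamma_r above is meant to be consumed. Reconstructing
 * gamma as sign * exp(lgamma) only holds where exp does not overflow; the
 * kernel name and buffer layout here are hypothetical. */
#if 0
kernel void lgamma_r_demo(global const float *in, global float *out) {
  size_t gid = get_global_id(0);
  int sign;                              /* receives -1 or +1 */
  float lg = lgamma_r(in[gid], &sign);   /* log(|gamma(x)|) plus its sign */
  out[gid] = (float)sign * exp(lg);      /* ~ tgamma(in[gid]) for moderate x */
}
#endif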
OVERLOADABLE float log1p(float x) {
  if (__ocl_math_fastpath_flag)
    return __gen_ocl_internal_fastpath_log1p(x);
  /*
   * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com
   * ====================================================
   * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
   *
   * Developed at SunPro, a Sun Microsystems, Inc. business.
   * Permission to use, copy, modify, and distribute this
   * software is freely granted, provided that this notice
   * is preserved.
   * ====================================================
   */
  const float
    ln2_hi = 6.9313812256e-01, /* 0x3f317180 */
    ln2_lo = 9.0580006145e-06, /* 0x3717f7d1 */
    two25 = 3.355443200e+07,   /* 0x4c000000 */
    Lp1 = 6.6666668653e-01,    /* 3F2AAAAB */
    Lp2 = 4.0000000596e-01,    /* 3ECCCCCD */
    Lp3 = 2.8571429849e-01,    /* 3E924925 */
    Lp4 = 2.2222198546e-01;    /* 3E638E29 */
  const float zero = 0.0;
  float hfsq,f,c,s,z,R,u;
  int k,hx,hu,ax;
  union {float f; unsigned i;} un;
  un.f = x;  hx = un.i;
  ax = hx&0x7fffffff;
  k = 1;
  if (hx < 0x3ed413d7) {            /* x < 0.41422 */
    if(ax>=0x3f800000) {            /* x <= -1.0 */
      if(x==(float)-1.0) return -two25/zero; /* log1p(-1)=+inf */
      else return (x-x)/(x-x);      /* log1p(x<-1)=NaN */
    }
    if(ax<0x31000000) {             /* |x| < 2**-29 */
      if(two25+x>zero               /* raise inexact */
         &&ax<0x24800000)           /* |x| < 2**-54 */
        return x;
      else
        return x - x*x*(float)0.5;
    }
    if(hx>0||hx<=((int)0xbe95f61f)) {
      k=0;f=x;hu=1;}                /* -0.2929<x<0.41422 */
  }
  if (hx >= 0x7f800000) return x+x;
  if(k!=0) {
    if(hx<0x5a000000) {
      u = (float)1.0+x;
      un.f = u; hu = un.i;
      k = (hu>>23)-127;             /* correction term */
      c = (k>0)? (float)1.0-(u-x):x-(u-(float)1.0);
      c /= u;
    } else {
      u = x;
      un.f = u; hu = un.i;
      k = (hu>>23)-127;
      c = 0;
    }
    hu &= 0x007fffff;
    if(hu<0x3504f7) {
      un.i = hu|0x3f800000; u = un.f;   /* normalize u */
    } else {
      k += 1;
      un.i = hu|0x3f000000; u = un.f;   /* normalize u/2 */
      hu = (0x00800000-hu)>>2;
    }
    f = u-(float)1.0;
  }
  hfsq=(float)0.5*f*f;
  if(hu==0) {                       /* |f| < 2**-20 */
    if(f==zero) {
      if(k==0) return zero;
      else {c = mad(k , ln2_lo, c); return mad(k, ln2_hi, c);}
    }
    R = mad(hfsq, 1.0f, -0.66666666666666666f * f);
    if(k==0) return f-R;
    else return k * ln2_hi - (R - mad(k, ln2_lo, c) - f);
  }
  s = f/((float)2.0+f);
  z = s*s;
  R = z * mad(z, mad(z, mad(z, Lp4, Lp3), Lp2), Lp1);
  if(k==0) return f + mad(hfsq + R, s, -hfsq);
  else return k*ln2_hi-( (hfsq - mad(s, hfsq + R, mad(k, ln2_lo, c))) - f);
}
OVERLOADABLE float logb(float x) {
  if (__ocl_math_fastpath_flag)
    return __gen_ocl_internal_fastpath_logb(x);
  union {float f; unsigned i;} u;
  u.f = x;
  int e = ((u.i & 0x7f800000) >> 23);
  float r1 = e-127;
  float r2 = -INFINITY;
  float r3 = x*x;
  /* sub normal or +/-0 */
  float r = e == 0 ? r2 : r1;
  /* inf & nan */
  return e == 0xff ? r3 : r;
}
OVERLOADABLE int ilogb(float x) {
  if (__ocl_math_fastpath_flag)
    return __gen_ocl_internal_fastpath_ilogb(x);
  union { int i; float f; } u;
  if (isnan(x)) return FP_ILOGBNAN;
  if (isinf(x)) return 0x7FFFFFFF;
  u.f = x;
  u.i &= 0x7fffffff;
  if (u.i == 0) return FP_ILOGB0;
  if (u.i >= 0x800000) return (u.i >> 23) - 127;
  int r = -126;
  int a = u.i & 0x7FFFFF;
  while(a < 0x800000) {
    a <<= 1;
    r --;
  }
  return r;
}
OVERLOADABLE float nan(uint code) {
  return NAN;
}
OVERLOADABLE float __gen_ocl_internal_tanpi(float x) {
  float sign = 1.0f;
  int ix;
  if(isinf(x)) return NAN;
  if(x < 0.0f) { x = -x; sign = -1.0f; }
  GEN_OCL_GET_FLOAT_WORD(ix, x);
  if(x> 0x1.0p24) return 0.0f;
  float m = __gen_ocl_internal_floor(x);
  ix = (int)m;
  m = x-m;
  int n = __gen_ocl_internal_floor(m*4.0f);
  if(m == 0.5f) {
    return (ix&0x1) == 0 ? sign*INFINITY : sign*-INFINITY;
  }
  if(m == 0.0f) {
    return (ix&0x1) == 0 ? 0.0f : -0.0f;
  }
  switch(n) {
  case 0:  return sign * __kernel_tanf(m*M_PI_F, 0.0f, 1);
  case 1:  return sign * 1.0f/__kernel_tanf((0.5f-m)*M_PI_F, 0.0f, 1);
  case 2:  return sign * 1.0f/__kernel_tanf((0.5f-m)*M_PI_F, 0.0f, 1);
  default: return sign * -1.0f*__kernel_tanf((1.0f-m)*M_PI_F, 0.0f, 1);
  }
}
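/* Illustrative sketch (compiled out): the exact-point behavior encoded in
 * __gen_ocl_internal_tanpi above -- integers map to signed zeros and
 * half-integers to signed infinities, which a plain tan(M_PI_F * x) cannot
 * deliver once pi has been rounded. Names below are hypothetical. */
#if 0
kernel void tanpi_edges(global float *out) {
  out[0] = __gen_ocl_internal_tanpi(1.0f);   /* -0.0f: odd integer */
  out[1] = __gen_ocl_internal_tanpi(2.0f);   /* +0.0f: even integer */
  out[2] = __gen_ocl_internal_tanpi(0.5f);   /* +INFINITY */
  out[3] = __gen_ocl_internal_tanpi(1.5f);   /* -INFINITY */
}
#endif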
OVERLOADABLE float __gen_ocl_internal_cbrt(float x) {
  /* copied from fdlibm */
  const unsigned
    B1 = 709958130, /* B1 = (84+2/3-0.03306235651)*2**23 */
    B2 = 642849266; /* B2 = (76+2/3-0.03306235651)*2**23 */
  const float
    C =  5.4285717010e-01, /* 19/35     = 0x3f0af8b0 */
    D = -7.0530611277e-01, /* -864/1225 = 0xbf348ef1 */
    E =  1.4142856598e+00, /* 99/70     = 0x3fb50750 */
    F =  1.6071428061e+00, /* 45/28     = 0x3fcdb6db */
    G =  3.5714286566e-01; /* 5/14      = 0x3eb6db6e */
  float r,s,t, w;
  int hx;
  uint sign;
  uint high;
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  sign=hx&0x80000000;             /* sign= sign(x) */
  hx  ^=sign;
  if(hx>=0x7f800000) return(x+x); /* cbrt(NaN,INF) is itself */
  if(hx==0) return(x);            /* cbrt(0) is itself */
  GEN_OCL_SET_FLOAT_WORD(x,hx);   /* x <- |x| */
  /* rough cbrt to 5 bits */
  if(hx<0x00800000)               /* subnormal number */
  {
    //SET_FLOAT_WORD(t,0x4b800000); /* set t= 2**24 */
    //t*=x; GET_FLOAT_WORD(high,t); SET_FLOAT_WORD(t,high/3+B2);
    t = (sign == 0) ? 0.0f : -0.0f;
    return t;
  } else
    GEN_OCL_SET_FLOAT_WORD(t,hx/3+B1);
  /* new cbrt to 23 bits */
  r=t*t/x;
  s=mad(r, t, C);
  t*=G+F/(s+E+D/s);
  /* one step newton iteration to 53 bits with error less than 0.667 ulps */
  s=t*t;            /* t*t is exact */
  r=x/s;
  w=t+t;
  r=(r-t)/(w+r);    /* r-s is exact */
  t=mad(t, r, t);
  /* restore the sign bit */
  GEN_OCL_GET_FLOAT_WORD(high,t);
  GEN_OCL_SET_FLOAT_WORD(t,high|sign);
  return(t);
}
#define BODY \
  *cosval = cos(x); \
  return sin(x);
OVERLOADABLE float sincos(float x, float *cosval) {
  if (__ocl_math_fastpath_flag)
    return __gen_ocl_internal_fastpath_sincos(x, cosval);
  BODY;
}
#undef BODY
INLINE float __gen_ocl_asin_util(float x) {
/*
 * ====================================================
 * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
 *
 * Developed at SunSoft, a Sun Microsystems, Inc. business.
 * Permission to use, copy, modify, and distribute this
 * software is freely granted, provided that this notice
 * is preserved.
 * ====================================================
 */
  float
    pS0 =  1.66666666666666657415e-01,
    pS1 = -3.25565818622400915405e-01,
    pS2 =  2.01212532134862925881e-01,
    pS3 = -4.00555345006794114027e-02,
    pS4 =  7.91534994289814532176e-04,
    qS1 = -2.40339491173441421878e+00,
    qS2 =  2.02094576023350569471e+00,
    qS3 = -6.88283971605453293030e-01,
    qS4 =  7.70381505559019352791e-02;
  float t = x*x;
  float p = t * mad(t, mad(t, mad(t, mad(t, pS4, pS3), pS2), pS1), pS0);
  float q = mad(t, mad(t, mad(t, mad(t, qS4, qS3), qS2), qS1), 1.0f);
  float w = p / q;
  return mad(x, w, x);
}
OVERLOADABLE float __gen_ocl_internal_asin(float x) {
  uint ix;
  union { uint i; float f; } u;
  u.f = x;
  ix = u.i & 0x7fffffff;
  if(ix == 0x3f800000) {
    return x * M_PI_2_F;  /* asin(|1|)=+-pi/2 with inexact */
  }
  if(ix > 0x3f800000) {   /* |x|>= 1 */
    return  NAN;          /* asin(|x|>1) is NaN */
  }
  if(ix < 0x32000000) {   /* if |x| < 2**-27 */
    if(HUGE_VALF + x > FLT_ONE) return x;   /* return x with inexact if x!=0 */
  }
  if(x < -0.5) {
    return 2 * __gen_ocl_asin_util(native_sqrt((1+x) / 2)) - M_PI_2_F;
  } else if(x > 0.5) {
    return M_PI_2_F - 2 * __gen_ocl_asin_util(native_sqrt((1-x) / 2));
  } else {
    return __gen_ocl_asin_util(x);
  }
}
OVERLOADABLE float __gen_ocl_internal_asinpi(float x) {
  return __gen_ocl_internal_asin(x) / M_PI_F;
}
OVERLOADABLE float __gen_ocl_internal_acos(float x) {
  if(x > 0.5)
    return 2 * __gen_ocl_asin_util(native_sqrt((1-x)/2));
  else
    return M_PI_2_F - __gen_ocl_internal_asin(x);
}
OVERLOADABLE float __gen_ocl_internal_acospi(float x) {
  return __gen_ocl_internal_acos(x) / M_PI_F;
}
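/* Illustrative sketch (compiled out): acos above is built from the same
 * __gen_ocl_asin_util core as asin, so asin(x) + acos(x) ~ pi/2 should hold
 * to float accuracy on [-1,1]; a quick self-check with hypothetical names. */
#if 0
kernel void asin_acos_check(global const float *in, global float *err) {
  size_t gid = get_global_id(0);
  float x = clamp(in[gid], -1.0f, 1.0f);   /* both functions are NaN outside [-1,1] */
  err[gid] = __gen_ocl_internal_asin(x) + __gen_ocl_internal_acos(x) - M_PI_2_F;
}
#endif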
__constant float atanhi[4] = {
  4.6364760399e-01, /* atan(0.5)hi 0x3eed6338 */
  7.8539812565e-01, /* atan(1.0)hi 0x3f490fda */
  9.8279368877e-01, /* atan(1.5)hi 0x3f7b985e */
  1.5707962513e+00, /* atan(inf)hi 0x3fc90fda */
};
__constant float atanlo[4] = {
  5.0121582440e-09, /* atan(0.5)lo 0x31ac3769 */
  3.7748947079e-08, /* atan(1.0)lo 0x33222168 */
  3.4473217170e-08, /* atan(1.5)lo 0x33140fb4 */
  7.5497894159e-08, /* atan(inf)lo 0x33a22168 */
};
OVERLOADABLE float __gen_ocl_internal_atan(float x) {
  /* copied from fdlibm */
  float aT[11];
  aT[0] =  3.3333334327e-01; /* 0x3eaaaaaa */
  aT[1] = -2.0000000298e-01; /* 0xbe4ccccd */
  aT[2] =  1.4285714924e-01; /* 0x3e124925 */
  aT[3] = -1.1111110449e-01; /* 0xbde38e38 */
  aT[4] =  9.0908870101e-02; /* 0x3dba2e6e */
  aT[5] = -7.6918758452e-02; /* 0xbd9d8795 */
  aT[6] =  6.6610731184e-02; /* 0x3d886b35 */
  const float one = 1.0, huge = 1.0e30;
  float w,s1,s2,z;
  int ix,hx,id;
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  ix = hx&0x7fffffff;
  if(ix>=0x50800000) {  /* if |x| >= 2^34 */
    if(ix>0x7f800000)
      return x+x;       /* NaN */
    if(hx>0) return atanhi[3]+atanlo[3];
    else     return -atanhi[3]-atanlo[3];
  }
  if (ix < 0x3ee00000) {    /* |x| < 0.4375 */
    if (ix < 0x31000000) {  /* |x| < 2^-29 */
      if(huge+x>one) return x;  /* raise inexact */
    }
    id = -1;
  } else {
    x = __gen_ocl_fabs(x);
    if (ix < 0x3f980000) {    /* |x| < 1.1875 */
      if (ix < 0x3f300000) {  /* 7/16 <=|x|<11/16 */
        id = 0;
        x = ((float)2.0*x-one)/((float)2.0+x);
      } else {                /* 11/16<=|x|< 19/16 */
        id = 1;
        x = (x-one)/(x+one);
      }
    } else {
      if (ix < 0x401c0000) {  /* |x| < 2.4375 */
        id = 2;
        x = (x-(float)1.5)/(one+(float)1.5*x);
      } else {                /* 2.4375 <= |x| < 2^66 */
        id = 3;
        x = -(float)1.0/x;
      }
    }
  } /* end of argument reduction */
  z = x*x;
  w = z*z;
  /* break sum from i=0 to 10 aT[i]z**(i+1) into odd and even poly */
  s1 = z * mad(w, mad(w, mad(w, aT[6], aT[4]), aT[2]), aT[0]);
  s2 = w * mad(w, mad(w, aT[5], aT[3]), aT[1]);
  if (id<0) return x - x*(s1+s2);
  else {
    z = atanhi[id] - ((x*(s1+s2) - atanlo[id]) - x);
    return (hx<0)? -z:z;
  }
}
OVERLOADABLE float __gen_ocl_internal_atanpi(float x) {
  return __gen_ocl_internal_atan(x) / M_PI_F;
}
// XXX work-around PTX profile
OVERLOADABLE float sqrt(float x) { return native_sqrt(x); }
OVERLOADABLE float rsqrt(float x) { return native_rsqrt(x); }
OVERLOADABLE float __gen_ocl_internal_atan2(float y, float x) {
  /* copied from fdlibm */
  float z;
  int k,m,hx,hy,ix,iy;
  const float
    tiny = 1.0e-30,
    zero = 0.0,
    pi_o_4 = 7.8539818525e-01,  /* 0x3f490fdb */
    pi_o_2 = 1.5707963705e+00,  /* 0x3fc90fdb */
    pi = 3.1415927410e+00,      /* 0x40490fdb */
    pi_lo = -8.7422776573e-08;  /* 0xb3bbbd2e */
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  ix = hx&0x7fffffff;
  GEN_OCL_GET_FLOAT_WORD(hy,y);
  iy = hy&0x7fffffff;
  if((ix>0x7f800000)||(iy>0x7f800000))  /* x or y is NaN */
    return x+y;
  if(hx==0x3f800000) return z=__gen_ocl_internal_atan(y);  /* x=1.0 */
  m = ((hy>>31)&1)|((hx>>30)&2);  /* 2*sign(x)+sign(y) */
  /* when y = 0 */
  if(iy==0) {
    switch(m) {
    case 0:
    case 1: return y;         /* atan(+-0,+anything)=+-0 */
    case 2: return pi+tiny;   /* atan(+0,-anything) = pi */
    case 3: return -pi-tiny;  /* atan(-0,-anything) =-pi */
    }
  }
  /* when x = 0 */
  if(ix==0) return (hy<0)? -pi_o_2-tiny: pi_o_2+tiny;
  /* both are denorms. Gen does not support denorm, so we convert to normal float number */
  if(ix <= 0x7fffff && iy <= 0x7fffff) {
    x = (float)(ix) * (1.0f - ((hx>>30) & 0x2));
    y = (float)(iy) * (1.0f - ((hy>>30) & 0x2));
  }
  /* when x is INF */
  if(ix==0x7f800000) {
    if(iy==0x7f800000) {
      switch(m) {
      case 0: return pi_o_4+tiny;             /* atan(+INF,+INF) */
      case 1: return -pi_o_4-tiny;            /* atan(-INF,+INF) */
      case 2: return (float)3.0*pi_o_4+tiny;  /* atan(+INF,-INF) */
      case 3: return (float)-3.0*pi_o_4-tiny; /* atan(-INF,-INF) */
      }
    } else {
      switch(m) {
      case 0: return zero;      /* atan(+...,+INF) */
      case 1: return -zero;     /* atan(-...,+INF) */
      case 2: return pi+tiny;   /* atan(+...,-INF) */
      case 3: return -pi-tiny;  /* atan(-...,-INF) */
      }
    }
  }
  /* when y is INF */
  if(iy==0x7f800000) return (hy<0)?
-pi_o_2-tiny: pi_o_2+tiny; /* compute y/x */ k = (iy-ix)>>23; if(k > 60) z=pi_o_2+(float)0.5*pi_lo; /* |y/x| > 2**60 */ else if(hx<0&&k<-60) z=0.0; /* |y|/x < -2**60 */ else z=__gen_ocl_internal_atan(__gen_ocl_fabs(y/x)); /* safe to do y/x */ switch (m) { case 0: return z ; /* atan(+,+) */ case 1: { uint zh; GEN_OCL_GET_FLOAT_WORD(zh,z); GEN_OCL_SET_FLOAT_WORD(z,zh ^ 0x80000000); } return z ; /* atan(-,+) */ case 2: return pi-(z-pi_lo);/* atan(+,-) */ default: /* case 3 */ return (z-pi_lo)-pi;/* atan(-,-) */ } } OVERLOADABLE float __gen_ocl_internal_atan2pi(float y, float x) { return __gen_ocl_internal_atan2(y, x) / M_PI_F; } OVERLOADABLE float __gen_ocl_internal_fabs(float x) { return __gen_ocl_fabs(x); } OVERLOADABLE float __gen_ocl_internal_trunc(float x) { return __gen_ocl_rndz(x); } OVERLOADABLE float __gen_ocl_internal_round(float x) { float y = __gen_ocl_rndz(x); if (__gen_ocl_fabs(x - y) >= 0.5f) y += __gen_ocl_internal_copysign(1.f, x); return y; } OVERLOADABLE float __gen_ocl_internal_ceil(float x) { return __gen_ocl_rndu(x); } OVERLOADABLE float __gen_ocl_internal_rint(float x) { return __gen_ocl_rnde(x); } OVERLOADABLE float __gen_ocl_internal_exp(float x) { float o_threshold = 8.8721679688e+01, /* 0x42b17180 */ u_threshold = -1.0397208405e+02, /* 0xc2cff1b5 */ twom100 = 7.8886090522e-31, /* 2**-100=0x0d800000 */ ivln2 = 1.4426950216e+00, /* 0x3fb8aa3b =1/ln2 */ one = 1.0, huge = 1.0e+30, P1 = 1.6666667163e-01, /* 0x3e2aaaab */ P2 = -2.7777778450e-03; /* 0xbb360b61 */ float y,hi=0.0,lo=0.0,c,t; int k=0,xsb; unsigned hx; float ln2HI_0 = 6.9313812256e-01; /* 0x3f317180 */ float ln2HI_1 = -6.9313812256e-01; /* 0xbf317180 */ float ln2LO_0 = 9.0580006145e-06; /* 0x3717f7d1 */ float ln2LO_1 = -9.0580006145e-06; /* 0xb717f7d1 */ float half_0 = 0.5; float half_1 = -0.5; GEN_OCL_GET_FLOAT_WORD(hx,x); xsb = (hx>>31)&1; /* sign bit of x */ hx &= 0x7fffffff; /* high word of |x| */ /* filter out non-finite argument */ if(hx >= 0x42b17218) { /* if |x|>=88.721... */ if(hx>0x7f800000) return x+x; /* NaN */ if(hx==0x7f800000) return (xsb==0)? x:0.0; /* exp(+-inf)={inf,0} */ if(x > o_threshold) return huge*huge; /* overflow */ if(x < u_threshold) return twom100*twom100; /* underflow */ } /* argument reduction */ if(hx > 0x3eb17218) { /* if |x| > 0.5 ln2 */ if(hx < 0x3F851592) { /* and |x| < 1.5 ln2 */ hi = x-(xsb ==1 ? ln2HI_1 : ln2HI_0); lo= xsb == 1? ln2LO_1 : ln2LO_0; k = 1-xsb-xsb; } else { float tmp = xsb == 1 ? half_1 : half_0; k = ivln2*x+tmp; t = k; hi = x - t*ln2HI_0; /* t*ln2HI is exact here */ lo = t*ln2LO_0; } x = hi - lo; } else if(hx < 0x31800000) { /* when |x|<2**-28 */ if(huge+x>one) return one+x;/* trigger inexact */ } else k = 0; /* x is now in primary range */ t = x*x; c = x - t*(P1+t*P2); if(k==0) return one-((x*c)/(c-(float)2.0)-x); else y = one-((lo-(x*c)/((float)2.0-c))-hi); if(k >= -125) { unsigned hy; GEN_OCL_GET_FLOAT_WORD(hy,y); GEN_OCL_SET_FLOAT_WORD(y,hy+(k<<23)); /* add k to y's exponent */ return y; } else { unsigned hy; GEN_OCL_GET_FLOAT_WORD(hy,y); GEN_OCL_SET_FLOAT_WORD(y,hy+((k+100)<<23)); /* add k to y's exponent */ return y*twom100; } } /* erf,erfc from glibc s_erff.c -- float version of s_erf.c. * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com. */ /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. 
* Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. * ==================================================== */ INLINE_OVERLOADABLE float __gen_ocl_internal_erf(float x) { /*...*/ const float tiny = 1.0e-30, half_val= 5.0000000000e-01, /* 0x3F000000 */ one = 1.0000000000e+00, /* 0x3F800000 */ two = 2.0000000000e+00, /* 0x40000000 */ /* c = (subfloat)0.84506291151 */ erx = 8.4506291151e-01, /* 0x3f58560b */ /* * Coefficients for approximation to erf on [0,0.84375] */ efx = 1.2837916613e-01, /* 0x3e0375d4 */ efx8= 1.0270333290e+00, /* 0x3f8375d4 */ pp0 = 1.2837916613e-01, /* 0x3e0375d4 */ pp1 = -3.2504209876e-01, /* 0xbea66beb */ pp2 = -2.8481749818e-02, /* 0xbce9528f */ pp3 = -5.7702702470e-03, /* 0xbbbd1489 */ pp4 = -2.3763017452e-05, /* 0xb7c756b1 */ qq1 = 3.9791721106e-01, /* 0x3ecbbbce */ qq2 = 6.5022252500e-02, /* 0x3d852a63 */ qq3 = 5.0813062117e-03, /* 0x3ba68116 */ qq4 = 1.3249473704e-04, /* 0x390aee49 */ qq5 = -3.9602282413e-06, /* 0xb684e21a */ /* * Coefficients for approximation to erf in [0.84375,1.25] */ pa0 = -2.3621185683e-03, /* 0xbb1acdc6 */ pa1 = 4.1485610604e-01, /* 0x3ed46805 */ pa2 = -3.7220788002e-01, /* 0xbebe9208 */ pa3 = 3.1834661961e-01, /* 0x3ea2fe54 */ pa4 = -1.1089469492e-01, /* 0xbde31cc2 */ pa5 = 3.5478305072e-02, /* 0x3d1151b3 */ pa6 = -2.1663755178e-03, /* 0xbb0df9c0 */ qa1 = 1.0642088205e-01, /* 0x3dd9f331 */ qa2 = 5.4039794207e-01, /* 0x3f0a5785 */ qa3 = 7.1828655899e-02, /* 0x3d931ae7 */ qa4 = 1.2617121637e-01, /* 0x3e013307 */ qa5 = 1.3637083583e-02, /* 0x3c5f6e13 */ qa6 = 1.1984500103e-02, /* 0x3c445aa3 */ /* * Coefficients for approximation to erfc in [1.25,1/0.35] */ra0 = -9.8649440333e-03, /* 0xbc21a093 */ ra1 = -6.9385856390e-01, /* 0xbf31a0b7 */ ra2 = -1.0558626175e+01, /* 0xc128f022 */ ra3 = -6.2375331879e+01, /* 0xc2798057 */ ra4 = -1.6239666748e+02, /* 0xc322658c */ ra5 = -1.8460508728e+02, /* 0xc3389ae7 */ ra6 = -8.1287437439e+01, /* 0xc2a2932b */ ra7 = -9.8143291473e+00, /* 0xc11d077e */ sa1 = 1.9651271820e+01, /* 0x419d35ce */ sa2 = 1.3765776062e+02, /* 0x4309a863 */ sa3 = 4.3456588745e+02, /* 0x43d9486f */ sa4 = 6.4538726807e+02, /* 0x442158c9 */ sa5 = 4.2900814819e+02, /* 0x43d6810b */ sa6 = 1.0863500214e+02, /* 0x42d9451f */ sa7 = 6.5702495575e+00, /* 0x40d23f7c */ sa8 = -6.0424413532e-02, /* 0xbd777f97 */ /* * Coefficients for approximation to erfc in [1/.35,28] */ rb0 = -9.8649431020e-03, /* 0xbc21a092 */ rb1 = -7.9928326607e-01, /* 0xbf4c9dd4 */ rb2 = -1.7757955551e+01, /* 0xc18e104b */ rb3 = -1.6063638306e+02, /* 0xc320a2ea */ rb4 = -6.3756646729e+02, /* 0xc41f6441 */ rb5 = -1.0250950928e+03, /* 0xc480230b */ rb6 = -4.8351919556e+02, /* 0xc3f1c275 */ sb1 = 3.0338060379e+01, /* 0x41f2b459 */ sb2 = 3.2579251099e+02, /* 0x43a2e571 */ sb3 = 1.5367296143e+03, /* 0x44c01759 */ sb4 = 3.1998581543e+03, /* 0x4547fdbb */ sb5 = 2.5530502930e+03, /* 0x451f90ce */ sb6 = 4.7452853394e+02, /* 0x43ed43a7 */ sb7 = -2.2440952301e+01; /* 0xc1b38712 */ int hx,ix,i; float R,S,P,Q,s,y,z,r; GEN_OCL_GET_FLOAT_WORD(hx,x); ix = hx&0x7fffffff; if(ix>=0x7f800000) { /* erf(nan)=nan */ i = ((unsigned int)hx>>31)<<1; return (float)(1-i)+one/x; /* erf(+-inf)=+-1 */ } if(ix < 0x3f580000) { /* |x|<0.84375 */ if(ix < 0x31800000) { /* |x|<2**-28 */ if (ix < 0x04000000) /*avoid underflow */ return (float)0.125*((float)8.0*x+efx8*x); return x + efx*x; } z = x*x; r = mad(z, mad(z, mad(z, mad(z, pp4, pp3), pp2), pp1), pp0); s = mad(z, mad(z, mad(z, mad(z, mad(z, qq5,qq4), qq3), qq2), qq1), one); y = r / 
s; return mad(x, y, x); } if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ s = __gen_ocl_internal_fabs(x)-one; P = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, pa6, pa5), pa4), pa3), pa2), pa1), pa0); Q = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, qa6, qa5), qa4), qa3), qa2), qa1), one); if(hx>=0) return erx + P/Q; else return -erx - P/Q; } if (ix >= 0x40c00000) { /* inf>|x|>=6 */ if(hx>=0) return one-tiny; else return tiny-one; } x = __gen_ocl_internal_fabs(x); s = one/(x*x); if(ix< 0x4036DB6E) { /* |x| < 1/0.35 */ R = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, ra7, ra6), ra5), ra4), ra3), ra2), ra1), ra0); S = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, sa8, sa7), sa6), sa5), sa4), sa3), sa2), sa1), one); } else { /* |x| >= 1/0.35 */ R = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, rb6, rb5), rb4), rb3), rb2), rb1), rb0); S = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, sb7, sb6), sb5), sb4), sb3), sb2), sb1), one); } GEN_OCL_GET_FLOAT_WORD(ix,x); GEN_OCL_SET_FLOAT_WORD(z,ix&0xfffff000); r = __gen_ocl_internal_exp(-z*z-(float)0.5625)*__gen_ocl_internal_exp((z-x)*(z+x)+R/S); if(hx>=0) return one-r/x; else return r/x-one; } INLINE_OVERLOADABLE float __gen_ocl_internal_erfc(float x) { /*...*/ const float tiny = 1.0e-30, half_val= 5.0000000000e-01, /* 0x3F000000 */ one = 1.0000000000e+00, /* 0x3F800000 */ two = 2.0000000000e+00, /* 0x40000000 */ /* c = (subfloat)0.84506291151 */ erx = 8.4506291151e-01, /* 0x3f58560b */ /* * Coefficients for approximation to erf on [0,0.84375] */ efx = 1.2837916613e-01, /* 0x3e0375d4 */ efx8= 1.0270333290e+00, /* 0x3f8375d4 */ pp0 = 1.2837916613e-01, /* 0x3e0375d4 */ pp1 = -3.2504209876e-01, /* 0xbea66beb */ pp2 = -2.8481749818e-02, /* 0xbce9528f */ pp3 = -5.7702702470e-03, /* 0xbbbd1489 */ pp4 = -2.3763017452e-05, /* 0xb7c756b1 */ qq1 = 3.9791721106e-01, /* 0x3ecbbbce */ qq2 = 6.5022252500e-02, /* 0x3d852a63 */ qq3 = 5.0813062117e-03, /* 0x3ba68116 */ qq4 = 1.3249473704e-04, /* 0x390aee49 */ qq5 = -3.9602282413e-06, /* 0xb684e21a */ /* * Coefficients for approximation to erf in [0.84375,1.25] */ pa0 = -2.3621185683e-03, /* 0xbb1acdc6 */ pa1 = 4.1485610604e-01, /* 0x3ed46805 */ pa2 = -3.7220788002e-01, /* 0xbebe9208 */ pa3 = 3.1834661961e-01, /* 0x3ea2fe54 */ pa4 = -1.1089469492e-01, /* 0xbde31cc2 */ pa5 = 3.5478305072e-02, /* 0x3d1151b3 */ pa6 = -2.1663755178e-03, /* 0xbb0df9c0 */ qa1 = 1.0642088205e-01, /* 0x3dd9f331 */ qa2 = 5.4039794207e-01, /* 0x3f0a5785 */ qa3 = 7.1828655899e-02, /* 0x3d931ae7 */ qa4 = 1.2617121637e-01, /* 0x3e013307 */ qa5 = 1.3637083583e-02, /* 0x3c5f6e13 */ qa6 = 1.1984500103e-02, /* 0x3c445aa3 */ /* * Coefficients for approximation to erfc in [1.25,1/0.35] */ra0 = -9.8649440333e-03, /* 0xbc21a093 */ ra1 = -6.9385856390e-01, /* 0xbf31a0b7 */ ra2 = -1.0558626175e+01, /* 0xc128f022 */ ra3 = -6.2375331879e+01, /* 0xc2798057 */ ra4 = -1.6239666748e+02, /* 0xc322658c */ ra5 = -1.8460508728e+02, /* 0xc3389ae7 */ ra6 = -8.1287437439e+01, /* 0xc2a2932b */ ra7 = -9.8143291473e+00, /* 0xc11d077e */ sa1 = 1.9651271820e+01, /* 0x419d35ce */ sa2 = 1.3765776062e+02, /* 0x4309a863 */ sa3 = 4.3456588745e+02, /* 0x43d9486f */ sa4 = 6.4538726807e+02, /* 0x442158c9 */ sa5 = 4.2900814819e+02, /* 0x43d6810b */ sa6 = 1.0863500214e+02, /* 0x42d9451f */ sa7 = 6.5702495575e+00, /* 0x40d23f7c */ sa8 = -6.0424413532e-02, /* 0xbd777f97 */ /* * Coefficients for approximation to erfc in [1/.35,28] */ rb0 = -9.8649431020e-03, /* 0xbc21a092 */ rb1 = -7.9928326607e-01, /* 0xbf4c9dd4 */ rb2 = -1.7757955551e+01, /* 0xc18e104b */ rb3 = 
-1.6063638306e+02, /* 0xc320a2ea */
    rb4 = -6.3756646729e+02, /* 0xc41f6441 */
    rb5 = -1.0250950928e+03, /* 0xc480230b */
    rb6 = -4.8351919556e+02, /* 0xc3f1c275 */
    sb1 =  3.0338060379e+01, /* 0x41f2b459 */
    sb2 =  3.2579251099e+02, /* 0x43a2e571 */
    sb3 =  1.5367296143e+03, /* 0x44c01759 */
    sb4 =  3.1998581543e+03, /* 0x4547fdbb */
    sb5 =  2.5530502930e+03, /* 0x451f90ce */
    sb6 =  4.7452853394e+02, /* 0x43ed43a7 */
    sb7 = -2.2440952301e+01; /* 0xc1b38712 */
  int hx,ix;
  float R,S,P,Q,s,y,z,r;
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  ix = hx&0x7fffffff;
  if(ix>=0x7f800000) {  /* erfc(nan)=nan */
    /* erfc(+-inf)=0,2 */
    return (float)(((unsigned int)hx>>31)<<1)+one/x;
  }
  if(ix < 0x3f580000) {    /* |x|<0.84375 */
    if(ix < 0x23800000)    /* |x|<2**-56 */
      return one-x;
    z = x*x;
    r = mad(z, mad(z, mad(z, mad(z, pp4, pp3), pp2), pp1), pp0);
    s = mad(z, mad(z, mad(z, mad(z, mad(z, qq5, qq4), qq3), qq2), qq1), one);
    y = r/s;
    if(hx < 0x3e800000) {  /* x<1/4 */
      return one-(x+x*y);
    } else {
      r = x*y;
      r += (x-half_val);
      return half_val - r ;
    }
  }
  if(ix < 0x3fa00000) {    /* 0.84375 <= |x| < 1.25 */
    s = __gen_ocl_internal_fabs(x)-one;
    P = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, pa6, pa5), pa4), pa3), pa2), pa1), pa0);
    Q = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, qa6, qa5), qa4), qa3), qa2), qa1), one);
    if(hx>=0) {
      z = one-erx; return z - P/Q;
    } else {
      z = erx+P/Q; return one+z;
    }
  }
  if (ix < 0x41e00000) {   /* |x|<28 */
    x = __gen_ocl_internal_fabs(x);
    s = one/(x*x);
    if(ix< 0x4036DB6D) {   /* |x| < 1/.35 ~ 2.857143 */
      R = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, ra7, ra6), ra5), ra4), ra3), ra2), ra1), ra0);
      S = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, sa8, sa7), sa6), sa5), sa4), sa3), sa2), sa1), one);
    } else {               /* |x| >= 1/.35 ~ 2.857143 */
      if(hx<0&&ix>=0x40c00000) return two-tiny;  /* x < -6 */
      R = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, rb6, rb5), rb4), rb3), rb2), rb1), rb0);
      S = mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, mad(s, sb7, sb6), sb5), sb4), sb3), sb2), sb1), one);
    }
    GEN_OCL_GET_FLOAT_WORD(ix,x);
    GEN_OCL_SET_FLOAT_WORD(z,ix&0xffffe000);
    r = __gen_ocl_internal_exp(-z*z-(float)0.5625)*
        __gen_ocl_internal_exp((z-x)*(z+x)+R/S);
    if(hx>0) {
      float ret = r/x;
      return ret;
    } else
      return two-r/x;
  } else {
    if(hx>0) {
      return tiny*tiny;
    } else
      return two-tiny;
  }
}
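/* Illustrative sketch (compiled out): erf and erfc above share one set of
 * rational approximations, so erf(x) + erfc(x) ~ 1 is a cheap sanity check;
 * kernel name and buffer layout are hypothetical. */
#if 0
kernel void erf_complement(global const float *in, global float *resid) {
  size_t gid = get_global_id(0);
  float x = in[gid];
  resid[gid] = __gen_ocl_internal_erf(x) + __gen_ocl_internal_erfc(x) - 1.0f;
}
#endif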
OVERLOADABLE float __gen_ocl_internal_fmod (float x, float y) {
  //return x-y*__gen_ocl_rndz(x/y);
  float one = 1.0;
  float Zero[2];
  int n,hx,hy,hz,ix,iy,sx,i;
  Zero[0] = 0.0;
  Zero[1] = -0.0;
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  GEN_OCL_GET_FLOAT_WORD(hy,y);
  sx = hx&0x80000000;  /* sign of x */
  hx ^=sx;             /* |x| */
  hy &= 0x7fffffff;    /* |y| */
  /* purge off exception values */
  if(hy==0||(hx>=0x7f800000)||  /* y=0,or x not finite */
     (hy>0x7f800000))           /* or y is NaN */
    return (x*y)/(x*y);
  if(hx<hy) return x;           /* |x|<|y| return x */
  if(hx==hy)
    return Zero[(unsigned)sx>>31];  /* |x|=|y| return x*0 */
  /* determine ix = ilogb(x) */
  if(hx<0x00800000) {  /* subnormal x */
    for (ix = -126,i=(hx<<8); i>0; i<<=1) ix -=1;
  } else
    ix = (hx>>23)-127;
  /* determine iy = ilogb(y) */
  if(hy<0x00800000) {  /* subnormal y */
    for (iy = -126,i=(hy<<8); i>=0; i<<=1) iy -=1;
  } else
    iy = (hy>>23)-127;
  /* set up {hx,lx}, {hy,ly} and align y to x */
  if(ix >= -126)
    hx = 0x00800000|(0x007fffff&hx);
  else {  /* subnormal x, shift x to normal */
    n = -126-ix;
    hx = hx<<n;
  }
  if(iy >= -126)
    hy = 0x00800000|(0x007fffff&hy);
  else {  /* subnormal y, shift y to normal */
    n = -126-iy;
    hy = hy<<n;
  }
  /* fix point fmod */
  n = ix - iy;
  while(n--) {
    hz=hx-hy;
    if(hz<0){hx = hx+hx;}
    else {
      if(hz==0)  /* return sign(x)*0 */
        return Zero[(unsigned)sx>>31];
      hx = hz+hz;
    }
  }
  hz=hx-hy;
  if(hz>=0) {hx=hz;}
  /* convert back to floating value and restore the sign */
  if(hx==0)  /* return sign(x)*0 */
    return Zero[(unsigned)sx>>31];
  while(hx<0x00800000) {  /* normalize x */
    hx = hx+hx;
    iy -= 1;
  }
  if(iy>= -126) {  /* normalize output */
    hx = ((hx-0x00800000)|((iy+127)<<23));
    GEN_OCL_SET_FLOAT_WORD(x,hx|sx);
  } else {         /* subnormal output */
    n = -126 - iy;
    hx >>= n;
    GEN_OCL_SET_FLOAT_WORD(x,hx|sx);
    x *= one;      /* create necessary signal */
  }
  return x;        /* exact output */
}
OVERLOADABLE float __gen_ocl_internal_expm1(float x) {
  //return __gen_ocl_pow(M_E_F, x) - 1;
  float
    Q1 = -3.3333335072e-02,          /* 0xbd088889 */
    ln2_hi = 6.9313812256e-01,       /* 0x3f317180 */
    ln2_lo = 9.0580006145e-06,       /* 0x3717f7d1 */
    Q2 = 1.5873016091e-03,           /* 0x3ad00d01 */
    huge = 1.0e30,
    tiny = 1.0e-30,
    ivln2 = 1.4426950216e+00,        /* 0x3fb8aa3b =1/ln2 */
    one = 1.0,
    o_threshold = 8.8721679688e+01;  /* 0x42b17180 */
  float y,hi,lo,c,t,e,hxs,hfx,r1;
  int k,xsb;
  int hx;
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  xsb = hx&0x80000000;  /* sign bit of x */
  //if(xsb==0) y=x; else y= -x;  /* y = |x| */
  y = __gen_ocl_internal_fabs(x);
  hx &= 0x7fffffff;     /* high word of |x| */
  /* filter out huge and non-finite argument */
  if(hx >= 0x4195b844) {    /* if |x|>=27*ln2 */
    if(hx >= 0x42b17218) {  /* if |x|>=88.721... */
      if(hx>0x7f800000) return x+x;                  /* NaN */
      if(hx==0x7f800000) return (xsb==0)? x:-1.0;    /* exp(+-inf)={inf,-1} */
      if(x > o_threshold) return huge*huge;          /* overflow */
    }
    if(xsb!=0) {  /* x < -27*ln2, return -1.0 with inexact */
      if(x+tiny<(float)0.0)  /* raise inexact */
        return tiny-one;     /* return -1 */
    }
  }
  /* argument reduction */
  if(hx > 0x3eb17218) {    /* if |x| > 0.5 ln2 */
    if(hx < 0x3F851592) {  /* and |x| < 1.5 ln2 */
      if(xsb==0) { hi = x - ln2_hi; lo = ln2_lo;  k = 1; }
      else       { hi = x + ln2_hi; lo = -ln2_lo; k = -1; }
    } else {
      k = ivln2*x+((xsb==0)?(float)0.5:(float)-0.5);
      t = k;
      hi = x - t*ln2_hi;  /* t*ln2_hi is exact here */
      lo = t*ln2_lo;
    }
    x = hi - lo;
    c = (hi-x)-lo;
  } else if(hx < 0x33000000) {  /* when |x|<2**-25, return x */
    //t = huge+x;  /* return x with inexact flags when x!=0 */
    //return x - (t-(huge+x));
    return x;
  } else
    k = 0;
  /* x is now in primary range */
  hfx = (float)0.5*x;
  hxs = x*hfx;
  r1 = one+hxs*(Q1+hxs*Q2);
  t = (float)3.0-r1*hfx;
  e = hxs*((r1-t)/((float)6.0 - x*t));
  if(k==0)
    return x - (x*e-hxs);  /* c is 0 */
  else {
    e = (x*(e-c)-c);
    e -= hxs;
    if(k== -1) return (float)0.5*(x-e)-(float)0.5;
    if(k==1) {
      if(x < (float)-0.25)
        return -(float)2.0*(e-(x+(float)0.5));
      else
        return (one+(float)2.0*(x-e));
    }
    if (k <= -2 || k>56) {  /* suffice to return exp(x)-1 */
      int i;
      y = one-(e-x);
      GEN_OCL_GET_FLOAT_WORD(i,y);
      GEN_OCL_SET_FLOAT_WORD(y,i+(k<<23));  /* add k to y's exponent */
      return y-one;
    }
    t = one;
    if(k<23) {
      int i;
      GEN_OCL_SET_FLOAT_WORD(t,0x3f800000 - (0x1000000>>k));  /* t=1-2^-k */
      y = t-(e-x);
      GEN_OCL_GET_FLOAT_WORD(i,y);
      GEN_OCL_SET_FLOAT_WORD(y,i+(k<<23));  /* add k to y's exponent */
    } else {
      int i;
      GEN_OCL_SET_FLOAT_WORD(t,((0x7f-k)<<23));  /* 2^-k */
      y = x-(e+t);
      y += one;
      GEN_OCL_GET_FLOAT_WORD(i,y);
      GEN_OCL_SET_FLOAT_WORD(y,i+(k<<23));  /* add k to y's exponent */
    }
  }
  return y;
}
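/* Illustrative sketch (compiled out): why expm1 above is worth its argument
 * reduction -- for tiny x, computing exp(x)-1 cancels most significant bits,
 * while expm1 returns x plus its higher-order terms directly. Hypothetical
 * names. */
#if 0
kernel void expm1_small_x(global float *out) {
  float x = 1.0e-6f;
  out[0] = __gen_ocl_internal_exp(x) - 1.0f;  /* heavy cancellation */
  out[1] = __gen_ocl_internal_expm1(x);       /* ~x + x*x/2, fully accurate */
}
#endif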
OVERLOADABLE float __gen_ocl_internal_acosh(float x) {
  //return native_log(x + native_sqrt(x + 1) * native_sqrt(x - 1));
  float one = 1.0,
        ln2 = 6.9314718246e-01;  /* 0x3f317218 */
  float t;
  int hx;
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  if(hx<0x3f800000) {            /* x < 1 */
    return (x-x)/(x-x);
  } else if(hx >=0x4d800000) {   /* x > 2**28 */
    if(hx >=0x7f800000) {        /* x is inf or NaN */
      return x+x;
    } else
      return __gen_ocl_internal_log(x)+ln2;  /* acosh(huge)=log(2x) */
  } else if (hx==0x3f800000) {
    return 0.0;                  /* acosh(1) = 0 */
  } else if (hx > 0x40000000) {  /* 2**28 > x > 2 */
    t=x*x;
    return __gen_ocl_internal_log((float)2.0*x-one/(x+__gen_ocl_sqrt(t-one)));
  } else {                       /* 1<x<2 */
    t = x-one;
    return log1p(t+__gen_ocl_sqrt((float)2.0*t+t*t));
  }
}
OVERLOADABLE float __gen_ocl_internal_asinh(float x) {
  //return native_log(x + native_sqrt(x * x + 1));
  float one = 1.0000000000e+00,
        ln2 = 6.9314718246e-01,
        huge = 1.0000000000e+30;
  float w;
  int hx,ix;
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  ix = hx&0x7fffffff;
  if(ix< 0x38000000) {  /* |x|<2**-14 */
    if(huge+x>one) return x;  /* return x inexact except 0 */
  }
  if(ix>0x47000000) {   /* |x| > 2**14 */
    if(ix>=0x7f800000) return x+x;  /* x is inf or NaN */
    w = __gen_ocl_internal_log(__gen_ocl_internal_fabs(x))+ln2;
  } else {
    float xa = __gen_ocl_internal_fabs(x);
    if (ix>0x40000000) {  /* 2**14 > |x| > 2.0 */
      w = __gen_ocl_internal_log(mad(xa, 2.0f, one / (__gen_ocl_sqrt(mad(xa, xa, one)) + xa)));
    } else {              /* 2.0 > |x| > 2**-14 */
      float t = xa*xa;
      w = log1p(xa+t/(one+__gen_ocl_sqrt(one+t)));
    }
  }
  return __gen_ocl_internal_copysign(w, x);
}
OVERLOADABLE float __gen_ocl_internal_sinh(float x){
  //return (1 - native_exp(-2 * x)) / (2 * native_exp(-x));
  float one = 1.0, shuge = 1.0e37;
  float t,w,h;
  int ix,jx;
  GEN_OCL_GET_FLOAT_WORD(jx,x);
  ix = jx&0x7fffffff;
  /* x is INF or NaN */
  if(ix>=0x7f800000) return x+x;
  h = 0.5;
  if (jx<0) h = -h;
  /* |x| in [0,22], return sign(x)*0.5*(E+E/(E+1))) */
  if (ix < 0x41b00000) {  /* |x|<22 */
    if (ix<0x31800000)    /* |x|<2**-28 */
      if(shuge+x>one) return x;  /* sinh(tiny) = tiny with inexact */
    t = __gen_ocl_internal_expm1(__gen_ocl_internal_fabs(x));
    if(ix<0x3f800000) return h*((float)2.0*t-t*t/(t+one));
    return h*(t+t/(t+one));
  }
  /* |x| in [22, log(maxdouble)] return 0.5*exp(|x|) */
  if (ix < 0x42b17180)
    return h*__gen_ocl_internal_exp(__gen_ocl_internal_fabs(x));
  /* |x| in [log(maxdouble), overflowthresold] */
  if (ix<=0x42b2d4fc) {
    w = __gen_ocl_internal_exp((float)0.5*__gen_ocl_internal_fabs(x));
    t = h*w;
    return t*w;
  }
  /* |x| > overflowthresold, sinh(x) overflow */
  return x*shuge;
}
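/* Illustrative sketch (compiled out): the small-|x| branch of sinh above
 * rewrites 0.5*(e^x - e^-x) in terms of t = expm1(|x|) as
 * 0.5*(2t - t*t/(t+1)), which avoids the cancellation a direct exp-based
 * formula suffers near zero. Hypothetical names. */
#if 0
kernel void sinh_forms(global float *out) {
  float x = 0x1.0p-10f;                             /* small positive argument */
  float t = __gen_ocl_internal_expm1(x);
  out[0] = 0.5f * (2.0f * t - t * t / (t + 1.0f));  /* the |x|<1 branch above */
  out[1] = __gen_ocl_internal_sinh(x);              /* should agree closely */
}
#endif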
OVERLOADABLE float __gen_ocl_internal_tanh(float x) {
  //float y = native_exp(-2 * x);
  //return (1 - y) / (1 + y);
  float one=1.0, two=2.0, tiny = 1.0e-30;
  float t,z;
  int jx,ix;
  GEN_OCL_GET_FLOAT_WORD(jx,x);
  ix = jx&0x7fffffff;
  /* x is INF or NaN */
  if(ix>=0x7f800000) {
    if (jx>=0) return one/x+one;  /* tanh(+-inf)=+-1 */
    else       return one/x-one;  /* tanh(NaN) = NaN */
  }
  if (ix < 0x41b00000) {   /* |x|<22 */
    if (ix == 0)
      return x;            /* x == +-0 */
    if (ix<0x24000000)     /* |x|<2**-55 */
      return x*(one+x);    /* tanh(small) = small */
    if (ix>=0x3f800000) {  /* |x|>=1 */
      t = __gen_ocl_internal_expm1(two*__gen_ocl_internal_fabs(x));
      z = one - two/(t+two);
    } else {
      t = __gen_ocl_internal_expm1(-two*__gen_ocl_internal_fabs(x));
      z = -t/(t+two);
    }
  } else {           /* |x| > 22, return +-1 */
    z = one - tiny;  /* raised inexact flag */
  }
  return (jx>=0)? z: -z;
}
OVERLOADABLE float __gen_ocl_internal_cosh(float x) {
  //return (1 + native_exp(-2 * x)) / (2 * native_exp(-x));
  float halF = 0.5, huge = 1.0e+30, tiny = 1.0e-30, one = 1.0;
  float t,w;
  int ix;
  GEN_OCL_GET_FLOAT_WORD(ix,x);
  ix &= 0x7fffffff;
  /* |x| in [0,22] */
  if (ix < 0x41b00000) {
    /* |x| in [0,0.5*ln2], return 1+expm1(|x|)^2/(2*exp(|x|)) */
    if(ix<0x3eb17218) {
      t = __gen_ocl_internal_expm1(__gen_ocl_fabs(x));
      w = one+t;
      if (ix<0x24000000) return w;  /* cosh(tiny) = 1 */
      return one+(t*t)/(w+w);
    }
    /* |x| in [0.5*ln2,22], return (exp(|x|)+1/exp(|x|))/2 */
    t = __gen_ocl_internal_exp(__gen_ocl_fabs(x));
    return halF*t+halF/t;
  }
  /* |x| in [22, log(maxdouble)] return half*exp(|x|) */
  if (ix < 0x42b17180)
    return halF*__gen_ocl_internal_exp(__gen_ocl_fabs(x));
  /* |x| in [log(maxdouble), overflowthresold] */
  if (ix<=0x42b2d4fc) {
    w = __gen_ocl_internal_exp(halF*__gen_ocl_fabs(x));
    t = halF*w;
    return t*w;
  }
  /* x is INF or NaN */
  if(ix>=0x7f800000) return x*x;
  /* |x| > overflowthresold, cosh(x) overflow */
  return huge*huge;
}
OVERLOADABLE float __gen_ocl_internal_remainder(float x, float p){
  //return x-y*__gen_ocl_rnde(x/y);
  float zero = 0.0;
  int hx,hp;
  unsigned sx;
  float p_half;
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  GEN_OCL_GET_FLOAT_WORD(hp,p);
  sx = hx&0x80000000;
  hp &= 0x7fffffff;
  hx &= 0x7fffffff;
  /* purge off exception values */
  if(hp==0) return (x*p)/(x*p);  /* p = 0 */
  if((hx>=0x7f800000)||  /* x not finite */
     ((hp>0x7f800000)))  /* p is NaN */
    return (x*p)/(x*p);
  if (hp<=0x7effffff) x = __gen_ocl_internal_fmod(x,p+p);  /* now x < 2p */
  if ((hx-hp)==0) return zero*x;
  x = __gen_ocl_fabs(x);
  p = __gen_ocl_fabs(p);
  if (hp<0x01000000) {
    if(x+x>p) {
      x-=p;
      if(x+x>=p) x -= p;
    }
  } else {
    p_half = (float)0.5*p;
    if(x>p_half) {
      x-=p;
      if(x>=p_half) x -= p;
    }
  }
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  GEN_OCL_SET_FLOAT_WORD(x,hx^sx);
  return x;
}
OVERLOADABLE float __gen_ocl_internal_ldexp(float x, int n) {
  x = __gen_ocl_scalbnf(x,n);
  return x;
}
OVERLOADABLE float __gen_ocl_internal_atanh(float x) {
  //return 0.5f * native_sqrt((1 + x) / (1 - x));
  float xa = __gen_ocl_fabs (x);
  float t;
  if (isless (xa, 0.5f)){
    if (xa < 0x1.0p-28f) return x;
    t = xa + xa;
    t = 0.5f * log1p (t + t * xa / (1.0f - xa));
  } else if (isless (xa, 1.0f)){
    t = 0.5f * log1p ((xa + xa) / (1.0f - xa));
  } else {
    if (isgreater (xa, 1.0f)) return (x - x) / (x - x);
    return x / 0.0f;
  }
  return __gen_ocl_internal_copysign(t, x);
}
OVERLOADABLE float __gen_ocl_internal_exp10(float x){
  float px, qx,ans;
  short n;
  int i;
  float*p;
  float MAXL10 = 38.230809449325611792;
  float LOG210 = 3.32192809488736234787e0;
  float LG102A = 3.00781250000000000000E-1;
  float LG102B = 2.48745663981195213739E-4;
  float P[6];
  P[0] = 2.063216740311022E-001;
  P[1] = 5.420251702225484E-001;
  P[2] = 1.171292686296281E+000;
  P[3] = 2.034649854009453E+000;
  P[4] = 2.650948748208892E+000;
  P[5] = 2.302585167056758E+000;
  if( x < -MAXL10 ) return 0.0;
  if( isinf(x) )  return INFINITY;
  /* The following is necessary because range reduction blows up: */
  if( x == 0 ) return 1.0;
  /* Express 10**x = 10**g 2**n
   *  = 10**g 10**( n log10(2) )
   *  = 10**( g + n log10(2) )
   */
  px = x * LOG210;
  qx = __gen_ocl_internal_floor( px + 0.5 );
  n = qx;
  x -= qx * LG102A;
  x -= qx * LG102B;
  /* rational approximation for exponential
   * of the fractional part:
   * 10**x - 1 = 2x P(x**2)/( Q(x**2) - P(x**2) )
   */
  p = P;
  ans = *p++;
  i = 5;
  do {
    ans = ans * x + *p++;
  } while( --i );
  px = 1.0 + x * ans;
  /* multiply by power of 2 */
  x = __gen_ocl_internal_ldexp( px, n );
  return x;
}
OVERLOADABLE float cospi(float x) {
  if (__ocl_math_fastpath_flag)
return __gen_ocl_internal_fastpath_cospi(x); return __gen_ocl_internal_cospi(x); } OVERLOADABLE float cosh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_cosh(x); return __gen_ocl_internal_cosh(x); } OVERLOADABLE float acos(float x) { return __gen_ocl_internal_acos(x); } OVERLOADABLE float acospi(float x) { return __gen_ocl_internal_acospi(x); } OVERLOADABLE float acosh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_acosh(x); return __gen_ocl_internal_acosh(x); } OVERLOADABLE float sinpi(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_sinpi(x); return __gen_ocl_internal_sinpi(x); } OVERLOADABLE float sinh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_sinh(x); return __gen_ocl_internal_sinh(x); } OVERLOADABLE float asin(float x) { return __gen_ocl_internal_asin(x); } OVERLOADABLE float asinpi(float x) { return __gen_ocl_internal_asinpi(x); } OVERLOADABLE float asinh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_asinh(x); return __gen_ocl_internal_asinh(x); } OVERLOADABLE float tanpi(float x) { return __gen_ocl_internal_tanpi(x); } OVERLOADABLE float tanh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_tanh(x); return __gen_ocl_internal_tanh(x); } OVERLOADABLE float atan(float x) { return __gen_ocl_internal_atan(x); } OVERLOADABLE float atan2(float y, float x) { return __gen_ocl_internal_atan2(y, x); } OVERLOADABLE float atan2pi(float y, float x) { return __gen_ocl_internal_atan2pi(y, x); } OVERLOADABLE float atanpi(float x) { return __gen_ocl_internal_atanpi(x); } OVERLOADABLE float atanh(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_atanh(x); return __gen_ocl_internal_atanh(x); } OVERLOADABLE float cbrt(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_cbrt(x); return __gen_ocl_internal_cbrt(x); } OVERLOADABLE float rint(float x) { return __gen_ocl_internal_rint(x); } OVERLOADABLE float copysign(float x, float y) { return __gen_ocl_internal_copysign(x, y); } OVERLOADABLE float erf(float x) { return __gen_ocl_internal_erf(x); } OVERLOADABLE float erfc(float x) { return __gen_ocl_internal_erfc(x); } OVERLOADABLE float fmod (float x, float y) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_fmod(x, y); return __gen_ocl_internal_fmod(x, y); } OVERLOADABLE float remainder(float x, float p) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_remainder(x, p); return __gen_ocl_internal_remainder(x, p); } OVERLOADABLE float ldexp(float x, int n) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_ldexp(x, n); if (x == (float)0.0f) x = 0.0f; return __gen_ocl_internal_ldexp(x, n); } CONST OVERLOADABLE float __gen_ocl_mad(float a, float b, float c) __asm("llvm.fma" ".f32"); CONST OVERLOADABLE half __gen_ocl_mad(half a, half b, half c) __asm("llvm.fma" ".f16"); PURE CONST float __gen_ocl_fmax(float a, float b); PURE CONST float __gen_ocl_fmin(float a, float b); OVERLOADABLE float mad(float a, float b, float c) { return __gen_ocl_mad(a, b, c); } #define BODY \ if (isnan(x) || isinf(x)) { \ *exp = 0; \ return x; \ } \ uint u = as_uint(x); \ uint a = u & 0x7FFFFFFFu; \ if (a == 0) { \ *exp = 0; \ return x; \ } \ if (a >= 0x800000) { \ *exp = (a >> 23) - 126; \ return as_float((u & (0x807FFFFFu)) | 0x3F000000); \ } \ int e = -126; \ while (a < 0x400000) { \ e --; \ a <<= 1; \ } \ a <<= 1; \ *exp = e; \ return as_float((a & 
(0x807FFFFFu)) | (u & 0x80000000u) | 0x3F000000); OVERLOADABLE float frexp(float x, int *exp) { BODY; } #undef BODY OVERLOADABLE float nextafter(float x, float y) { int hx, hy, ix, iy; hx = as_int(x); hy = as_int(y); ix = hx & 0x7fffffff; iy = hy & 0x7fffffff; if(ix == 0) ix = hx & 0x7fffff; if(iy == 0) iy = hy & 0x7fffff; if(ix>0x7f800000 || iy>0x7f800000) return x+y; if(hx == hy) return y; if(ix == 0) { if(iy == 0) return y; else return as_float((hy&0x80000000) | 1); } if(hx >= 0) { if(hx > hy) { hx -= 1; } else { hx += 1; } } else { if(hy >= 0 || hx > hy){ hx -= 1; } else { hx += 1; } } return as_float(hx); } #define BODY \ uint hx = as_uint(x), ix = hx & 0x7FFFFFFF; \ if (ix > 0x7F800000) { \ *i = nan(0u); \ return nan(0u); \ } \ if (ix == 0x7F800000) { \ *i = x; \ return as_float(hx & 0x80000000u); \ } \ *i = __gen_ocl_rndz(x); \ return x - *i; OVERLOADABLE float modf(float x, float *i) { BODY; } #undef BODY OVERLOADABLE float __gen_ocl_internal_fmax(float a, float b) { return max(a,b); } OVERLOADABLE float __gen_ocl_internal_fmin(float a, float b) { return min(a,b); } OVERLOADABLE float __gen_ocl_internal_fmax(half a, half b) { return max(a,b); } OVERLOADABLE float __gen_ocl_internal_fmin(half a, half b) { return min(a,b); } OVERLOADABLE float __gen_ocl_internal_maxmag(float x, float y) { float a = __gen_ocl_fabs(x), b = __gen_ocl_fabs(y); return a > b ? x : b > a ? y : max(x, y); } OVERLOADABLE float __gen_ocl_internal_minmag(float x, float y) { float a = __gen_ocl_fabs(x), b = __gen_ocl_fabs(y); return a < b ? x : b < a ? y : min(x, y); } OVERLOADABLE float __gen_ocl_internal_fdim(float x, float y) { if(isnan(x)) return x; if(isnan(y)) return y; return x > y ? (x - y) : +0.f; } /* * the pow/pown high precision implementation are copied from msun library. * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com. */ /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. 
 * ====================================================
 */
OVERLOADABLE float __gen_ocl_internal_pow(float x, float y) {
  float z,ax,z_h,z_l,p_h,p_l;
  float y1,t1,t2,r,s,sn,t,u,v,w;
  int i,j,k,yisint,n;
  int hx,hy,ix,iy,is;
  float bp[2],dp_h[2],dp_l[2],
    zero = 0.0,
    one = 1.0,
    two = 2.0,
    two24 = 16777216.0,  /* 0x4b800000 */
    huge = 1.0e30,
    tiny = 1.0e-30,
    /* poly coefs for (3/2)*(log(x)-2s-2/3*s**3 */
    L1 = 6.0000002384e-01,      /* 0x3f19999a */
    L2 = 4.2857143283e-01,      /* 0x3edb6db7 */
    P1 = 1.6666667163e-01,      /* 0x3e2aaaab */
    P2 = -2.7777778450e-03,     /* 0xbb360b61 */
    lg2 = 6.9314718246e-01,     /* 0x3f317218 */
    lg2_h = 6.93145752e-01,     /* 0x3f317200 */
    lg2_l = 1.42860654e-06,     /* 0x35bfbe8c */
    ovt = 4.2995665694e-08,     /* -(128-log2(ovfl+.5ulp)) */
    cp = 9.6179670095e-01,      /* 0x3f76384f =2/(3ln2) */
    cp_h = 9.6179199219e-01,    /* 0x3f763800 =head of cp */
    cp_l = 4.7017383622e-06,    /* 0x369dc3a0 =tail of cp_h */
    ivln2 = 1.4426950216e+00,   /* 0x3fb8aa3b =1/ln2 */
    ivln2_h = 1.4426879883e+00, /* 0x3fb8aa00 =16b 1/ln2*/
    ivln2_l = 7.0526075433e-06; /* 0x36eca570 =1/ln2 tail*/
  bp[0] = 1.0,bp[1] = 1.5,
  dp_h[0] = 0.0,dp_h[1] = 5.84960938e-01,
  dp_l[0] = 0.0,dp_l[1] = 1.56322085e-06;
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  GEN_OCL_GET_FLOAT_WORD(hy,y);
  ix = hx&0x7fffffff;
  iy = hy&0x7fffffff;
  if (ix < 0x00800000) {  /* x < 2**-126 */
    ix = 0;  /* Gen does not support subnormal number now */
  }
  if (iy < 0x00800000) {  /* y < 2**-126 */
    iy = 0;  /* Gen does not support subnormal number now */
  }
  /* y==zero: x**0 = 1 */
  if(iy==0) return one;
  /* pow(+1, y) returns 1 for any y, even a NAN */
  if(hx==0x3f800000) return one;
  /* +-NaN return x+y */
  if(ix > 0x7f800000 || iy > 0x7f800000)
    return (x+0.0f)+y+(0.0f);
  /* determine if y is an odd int when x < 0
   * yisint = 0 ... y is not an integer
   * yisint = 1 ... y is an odd int
   * yisint = 2 ... y is an even int
   */
  yisint = 0;
  if(hx<0) {
    if(iy>=0x4b800000) yisint = 2;  /* even integer y */
    else if(iy>=0x3f800000) {
      k = (iy>>23)-0x7f;  /* exponent */
      j = iy>>(23-k);
      if((j<<(23-k))==iy) yisint = 2-(j&1);
    }
  }
  /* special value of y */
  if (iy==0x7f800000) {  /* y is +-inf */
    if (ix==0x3f800000)
      //return y - y;  /* inf**+-1 is NaN */
      return one;
    else if (ix > 0x3f800000)  /* (|x|>1)**+-inf = inf,0 */
      return (hy>=0)? y: zero;
    else                       /* (|x|<1)**-,+inf = inf,0 */
      return (hy<0)?-y: zero;
  }
  if(iy==0x3f800000) {  /* y is +-1 */
    if(hy<0) return one/x;
    else return x;
  }
  if(hy==0x40000000) return x*x;  /* y is 2 */
  if(hy==0x3f000000) {            /* y is 0.5 */
    if(hx>=0) return __gen_ocl_sqrt(x);
  }
  ax = __gen_ocl_fabs(x);
  /* special value of x */
  if(ix==0x7f800000||ix==0||ix==0x3f800000){
    z = ax;              /* x is +-0,+-inf,+-1 */
    if(hy<0) z = one/z;  /* z = (1/|x|) */
    if(hx<0) {
      if(((ix-0x3f800000)|yisint)==0) {
        z = (z-z)/(z-z);  /* (-1)**non-int is NaN */
      } else if(yisint==1)
        z = -z;           /* (x<0)**odd = -(|x|**odd) */
    }
    return z;
  }
  n = ((uint)hx>>31)-1;
  /* (x<0)**(non-int) is NaN */
  if((n|yisint)==0) return (x-x)/(x-x);
  sn = one;  /* s (sign of result -ve**odd) = -1 else = 1 */
  if((n|(yisint-1))==0) sn = -one;  /* (-ve)**(odd int) */
  /* |y| is huge */
  if(iy>0x4d000000) {  /* if |y| > 2**27 */
    /* over/underflow if x is not close to one */
    if(ix<0x3f7ffff8) return (hy<0)? sn*huge*huge:sn*tiny*tiny;
    if(ix>0x3f800007) return (hy>0)? sn*huge*huge:sn*tiny*tiny;
    /* now |1-x| is tiny <= 2**-20, suffice to compute log(x) by x-x^2/2+x^3/3-x^4/4 */
    t = ax-1;  /* t has 20 trailing zeros */
    w = (t*t)*((float)0.5-t*(0.333333333333f-t*0.25f));
    u = ivln2_h*t;  /* ivln2_h has 16 sig. bits */
    v = t*ivln2_l-w*ivln2;
    t1 = u+v;
    GEN_OCL_GET_FLOAT_WORD(is,t1);
    GEN_OCL_SET_FLOAT_WORD(t1,is&0xfffff000);
    t2 = v-(t1-u);
  } else {
    float s2,s_h,s_l,t_h,t_l;
    n = 0;
    /* take care subnormal number */
    //if(ix<0x00800000)
    //{ax *= two24; n -= 24; GEN_OCL_GET_FLOAT_WORD(ix,ax); }
    n += ((ix)>>23)-0x7f;
    j = ix&0x007fffff;
    /* determine interval */
    ix = j|0x3f800000;  /* normalize ix */
    if(j<=0x1cc471) k=0;      /* |x|<sqrt(3/2) */
    else if(j<0x5db3d7) k=1;  /* |x|<sqrt(3)   */
    else {k=0;n+=1;ix -= 0x00800000;}
    GEN_OCL_SET_FLOAT_WORD(ax,ix);
    /* compute s = s_h+s_l = (x-1)/(x+1) or (x-1.5)/(x+1.5) */
    u = ax-bp[k];  /* bp[0]=1.0, bp[1]=1.5 */
    v = one/(ax+bp[k]);
    s = u*v;
    s_h = s;
    GEN_OCL_GET_FLOAT_WORD(is,s_h);
    GEN_OCL_SET_FLOAT_WORD(s_h,is&0xfffff000);
    /* t_h=ax+bp[k] High */
    is = ((ix>>1)&0xfffff000)|0x20000000;
    GEN_OCL_SET_FLOAT_WORD(t_h,is+0x00400000+(k<<21));
    t_l = ax - (t_h-bp[k]);
    s_l = v*((u-s_h*t_h)-s_h*t_l);
    /* compute log(ax) */
    s2 = s*s;
    r = s2*s2*(L1+s2*L2);
    r += s_l*(s_h+s);
    s2 = s_h*s_h;
    t_h = 3.0f+s2+r;
    GEN_OCL_GET_FLOAT_WORD(is,t_h);
    GEN_OCL_SET_FLOAT_WORD(t_h,is&0xffffe000);
    t_l = r-((t_h-3.0f)-s2);
    /* u+v = s*(1+...) */
    u = s_h*t_h;
    v = s_l*t_h+t_l*s;
    /* 2/(3log2)*(s+...) */
    p_h = u+v;
    GEN_OCL_GET_FLOAT_WORD(is,p_h);
    GEN_OCL_SET_FLOAT_WORD(p_h,is&0xffffe000);
    p_l = v-(p_h-u);
    z_h = cp_h*p_h;  /* cp_h+cp_l = 2/(3*log2) */
    z_l = cp_l*p_h+p_l*cp+dp_l[k];
    /* log2(ax) = (s+..)*2/(3*log2) = n + dp_h + z_h + z_l */
    t = (float)n;
    t1 = (((z_h+z_l)+dp_h[k])+t);
    GEN_OCL_GET_FLOAT_WORD(is,t1);
    GEN_OCL_SET_FLOAT_WORD(t1,is&0xffffe000);
    t2 = z_l-(((t1-t)-dp_h[k])-z_h);
  }
  /* split up y into y1+y2 and compute (y1+y2)*(t1+t2) */
  GEN_OCL_GET_FLOAT_WORD(is,y);
  GEN_OCL_SET_FLOAT_WORD(y1,is&0xffffe000);
  p_l = (y-y1)*t1+y*t2;
  p_h = y1*t1;
  z = p_l+p_h;
  GEN_OCL_GET_FLOAT_WORD(j,z);
  if (j>0x43000000)          /* if z > 128 */
    return sn*huge*huge;     /* overflow */
  else if (j==0x43000000) {  /* if z == 128 */
    if(p_l+ovt>z-p_h) return sn*huge*huge;  /* overflow */
  } else if ((j&0x7fffffff)>0x43160000)     /* z <= -150 */
    return sn*tiny*tiny;     /* underflow */
  else if (j==0xc3160000){   /* z == -150 */
    if(p_l<=z-p_h) return sn*tiny*tiny;     /* underflow */
  }
  /*
   * compute 2**(p_h+p_l)
   */
  i = j&0x7fffffff;
  k = (i>>23)-0x7f;
  n = 0;
  if(i>0x3f000000) {  /* if |z| > 0.5, set n = [z+0.5] */
    n = j+(0x00800000>>(k+1));
    k = ((n&0x7fffffff)>>23)-0x7f;  /* new k for n */
    GEN_OCL_SET_FLOAT_WORD(t,n&~(0x007fffff>>k));
    n = ((n&0x007fffff)|0x00800000)>>(23-k);
    if(j<0) n = -n;
    p_h -= t;
  }
  t = p_l+p_h;
  GEN_OCL_GET_FLOAT_WORD(is,t);
  GEN_OCL_SET_FLOAT_WORD(t,is&0xffff8000);
  u = t*lg2_h;
  v = (p_l-(t-p_h))*lg2+t*lg2_l;
  z = u+v;
  w = v-(z-u);
  t = z*z;
  t1 = z - t*(P1+t*P2);
  r = (z*t1)/(t1-two)-(w+z*w);
  z = one-(r-z);
  GEN_OCL_GET_FLOAT_WORD(j,z);
  j += (n<<23);
  if((j>>23)<=0) z = __gen_ocl_scalbnf(z,n);  /* subnormal output */
  else GEN_OCL_SET_FLOAT_WORD(z,j);
  return sn*z;
}
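/* Illustrative sketch (compiled out): a few of the special cases the pow
 * implementation above encodes before its log/exp core; expected values
 * follow the inline comments in the code. Hypothetical names. */
#if 0
kernel void pow_specials(global float *out) {
  out[0] = __gen_ocl_internal_pow(1.0f, NAN);        /* 1.0f: pow(+1, y) is 1 for any y */
  out[1] = __gen_ocl_internal_pow(2.0f, 0.0f);       /* 1.0f: x**0 is 1 */
  out[2] = __gen_ocl_internal_pow(-1.0f, INFINITY);  /* 1.0f: |x|==1 with y = +-inf */
  out[3] = __gen_ocl_internal_pow(-2.0f, 0.5f);      /* NaN: negative base, non-integer y */
}
#endif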
OVERLOADABLE float tgamma (float x) {
  /* based on glibc __ieee754_gammaf_r by Ulrich Drepper */
  unsigned int hx;
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  if (hx == 0xff800000) {
    /* x == -Inf.  According to ISO this is NaN.  */
    return NAN;
  }
  if ((hx & 0x7f800000) == 0x7f800000) {
    /* Positive infinity (return positive infinity) or NaN (return NaN). */
    return x;
  }
  if (x < 0.0f && __gen_ocl_internal_floor (x) == x) {
    /* integer x < 0 */
    return NAN;
  }
  if (x >= 36.0f) {
    /* Overflow. */
    return INFINITY;
  } else if (x <= 0.0f && x >= -FLT_EPSILON / 4.0f) {
    return 1.0f / x;
  } else {
    float sinpix = __gen_ocl_internal_sinpi(x);
    if (x <= -42.0f)
      /* Underflow. */
      {return 0.0f * sinpix /*for sign*/;}
    int exp2_adj = 0;
    float x_abs = __gen_ocl_fabs(x);
    float gam0;
    if (x_abs < 4.0f) {
      /* gamma = exp(lgamma) is only accurate for small lgamma */
      float prod,x_adj;
      if (x_abs < 0.5f) {
        prod = 1.0f / x_abs;
        x_adj = x_abs + 1.0f;
      } else if (x_abs <= 1.5f) {
        prod = 1.0f;
        x_adj = x_abs;
      } else if (x_abs < 2.5f) {
        x_adj = x_abs - 1.0f;
        prod = x_adj;
      } else {
        x_adj = x_abs - 2.0f;
        prod = x_adj * (x_abs - 1.0f);
      }
      gam0 = __gen_ocl_internal_exp (lgamma (x_adj)) * prod;
    } else {
      /* Compute gamma (X) using Stirling's approximation,
         starting by computing pow (X, X) with a power of 2
         factored out to avoid intermediate overflow. */
      float x_int = __gen_ocl_internal_round (x_abs);
      float x_frac = x_abs - x_int;
      int x_log2;
      float x_mant = frexp (x_abs, &x_log2);
      if (x_mant < M_SQRT1_2_F) {
        x_log2--;
        x_mant *= 2.0f;
      }
      exp2_adj = x_log2 * (int) x_int;
      float ret = (__gen_ocl_internal_pow(x_mant, x_abs)
                   * exp2 (x_log2 * x_frac)
                   * __gen_ocl_internal_exp (-x_abs)
                   * sqrt (2.0f * M_PI_F / x_abs) );
      float x2 = x_abs * x_abs;
      float bsum = (0x3.403404p-12f / x2 - 0xb.60b61p-12f) / x2 + 0x1.555556p-4f;
      gam0 = ret + ret * __gen_ocl_internal_expm1 (bsum / x_abs);
    }
    if (x > 0.0f) {return __gen_ocl_internal_ldexp (gam0, exp2_adj);}
    float gam1 = M_PI_F / (-x * sinpix * gam0);
    return __gen_ocl_internal_ldexp (gam1, -exp2_adj);
  }
}
float __gen_ocl_internal_pown(float x, int y) {
  const float
    bp[] = {1.0, 1.5,},
    dp_h[] = { 0.0, 5.84960938e-01,},  /* 0x3f15c000 */
    dp_l[] = { 0.0, 1.56322085e-06,},  /* 0x35d1cfdc */
    zero = 0.0,
    one = 1.0,
    two = 2.0,
    two24 = 16777216.0,  /* 0x4b800000 */
    huge = 1.0e30,
    tiny = 1.0e-30,
    /* poly coefs for (3/2)*(log(x)-2s-2/3*s**3 */
    L1 = 6.0000002384e-01,      /* 0x3f19999a */
    L2 = 4.2857143283e-01,      /* 0x3edb6db7 */
    P1 = 1.6666667163e-01,      /* 0x3e2aaaab */
    P2 = -2.7777778450e-03,     /* 0xbb360b61 */
    lg2 = 6.9314718246e-01,     /* 0x3f317218 */
    lg2_h = 0x1.62ep-1,
    lg2_l = 0x1.0bfbe8p-15,
    ovt = 4.2995665694e-08,     /* -(128-log2(ovfl+.5ulp)) */
    cp = 9.6179670095e-01,      /* 0x3f76384f =2/(3ln2) */
    cp_h = 9.6179199219e-01,    /* 0x3f763800 =head of cp */
    cp_l = 4.7017383622e-06,    /* 0x369dc3a0 =tail of cp_h */
    ivln2 = 1.4426950216e+00,   /* 0x3fb8aa3b =1/ln2 */
    ivln2_h = 1.4426879883e+00, /* 0x3fb8aa00 =16b 1/ln2*/
    ivln2_l = 7.0526075433e-06; /* 0x36eca570 =1/ln2 tail*/
  float z,ax,z_h,z_l,p_h,p_l;
  float y1,t1,t2,r,s,t,u,v,w;
  int i,j,k,yisint,n;
  int hx,ix,iy,is;
  GEN_OCL_GET_FLOAT_WORD(hx,x);
  ix = hx&0x7fffffff;
  iy = y > 0 ? y&0x7fffffff : (-y)&0x7fffffff;
  /* y==zero: x**0 = 1 */
  if(y==0) return one;
  /* +-NaN return NAN */
  if(ix > 0x7f800000) return NAN;
  /* determine if y is an odd int
   * yisint = 1 ... y is an odd int
   * yisint = 2 ... y is an even int
   */
  yisint = y&1 ? 1 : 2;
  if (y == 1) return x;
  if (y == -1) return one/x;
  if (y == 2) return x*x;
  ax = __gen_ocl_fabs(x);
  /* special value of x */
  if(ix==0x7f800000||ix==0||ix==0x3f800000){
    z = ax;             /* x is +-0,+-inf,+-1 */
    if(y<0) z = one/z;  /* z = (1/|x|) */
    if(hx<0) {
      if(yisint==1)
        z = -z;         /* (x<0)**odd = -(|x|**odd) */
    }
    return z;
  }
  float sn = one;  /* s (sign of result -ve**odd) = -1 else = 1 */
  if(((((unsigned)hx>>31)-1)|(yisint-1))==0) sn = -one;  /* (-ve)**(odd int) */
  /* |y| is huge */
  if(iy>0x08000000) {  /* if |y| > 2**27 */
    /* over/underflow if x is not close to one */
    if(ix<0x3f7ffff8) return (y<0)? sn*huge*huge:tiny*tiny;
    if(ix>0x3f800007) return (y>0)?
sn*huge*huge:tiny*tiny;
    /* now |1-x| is tiny <= 2**-20, suffice to compute log(x) by x-x^2/2+x^3/3-x^4/4 */
    t = ax-1;  /* t has 20 trailing zeros */
    w = (t*t)*((float)0.5-t*((float)0.333333333333-t*(float)0.25));
    u = ivln2_h*t;  /* ivln2_h has 16 sig. bits */
    v = t*ivln2_l-w*ivln2;
    t1 = u+v;
    GEN_OCL_GET_FLOAT_WORD(is,t1);
    GEN_OCL_SET_FLOAT_WORD(t1,is&0xfffff000);
    t2 = v-(t1-u);
  } else {
    float s2,s_h,s_l,t_h,t_l;
    n = 0;
    /* take care subnormal number */
    // if(ix<0x00800000)
    // {ax *= two24; n -= 24; GEN_OCL_GET_FLOAT_WORD(ix,ax); }
    n += ((ix)>>23)-0x7f;
    j = ix&0x007fffff;
    /* determine interval */
    ix = j|0x3f800000;  /* normalize ix */
    if(j<=0x1cc471) k=0;      /* |x|<sqrt(3/2) */
    else if(j<0x5db3d7) k=1;  /* |x|<sqrt(3)   */
    else {k=0;n+=1;ix -= 0x00800000;}
    GEN_OCL_SET_FLOAT_WORD(ax,ix);
    /* compute s = s_h+s_l = (x-1)/(x+1) or (x-1.5)/(x+1.5) */
    u = ax-bp[k];  /* bp[0]=1.0, bp[1]=1.5 */
    v = one/(ax+bp[k]);
    s = u*v;
    s_h = s;
    GEN_OCL_GET_FLOAT_WORD(is,s_h);
    GEN_OCL_SET_FLOAT_WORD(s_h,is&0xfffff000);
    /* t_h=ax+bp[k] High */
    GEN_OCL_SET_FLOAT_WORD(t_h,(((ix>>1)|0x20000000)+0x00400000+(k<<21)) &0xfffff000);
    t_l = ax - (t_h-bp[k]);
    s_l = v*((u-s_h*t_h)-s_h*t_l);
    /* compute log(ax) */
    s2 = s*s;
    r = s2*s2*(L1+s2*L2);
    r += s_l*(s_h+s);
    s2 = s_h*s_h;
    t_h = (float)3.0+s2+r;
    GEN_OCL_GET_FLOAT_WORD(is,t_h);
    GEN_OCL_SET_FLOAT_WORD(t_h,is&0xffffe000);
    t_l = r-((t_h-(float)3.0)-s2);
    /* u+v = s*(1+...) */
    u = s_h*t_h;
    v = s_l*t_h+t_l*s;
    /* 2/(3log2)*(s+...) */
    p_h = u+v;
    GEN_OCL_GET_FLOAT_WORD(is,p_h);
    GEN_OCL_SET_FLOAT_WORD(p_h,is&0xffffe000);
    p_l = v-(p_h-u);
    z_h = cp_h*p_h;  /* cp_h+cp_l = 2/(3*log2) */
    z_l = cp_l*p_h+p_l*cp+dp_l[k];
    /* log2(ax) = (s+..)*2/(3*log2) = n + dp_h + z_h + z_l */
    t = (float)n;
    t1 = (((z_h+z_l)+dp_h[k])+t);
    GEN_OCL_GET_FLOAT_WORD(is,t1);
    GEN_OCL_SET_FLOAT_WORD(t1,is&0xffffe000);
    t2 = z_l-(((t1-t)-dp_h[k])-z_h);
  }
  /* split up y into y1+y2+y3 and compute (y1+y2+y3)*(t1+t2) */
  float fy = (float)y;
  float y3 = (float)(y-(int)fy);
  GEN_OCL_GET_FLOAT_WORD(is,fy);
  GEN_OCL_SET_FLOAT_WORD(y1,is&0xfffff000);
  p_l = (fy-y1)*t1 + y3*t1 + fy*t2 + y3*t2;
  p_h = y1*t1;
  z = p_l+p_h;
  GEN_OCL_GET_FLOAT_WORD(j,z);
  if (j>0x43000000)          /* if z > 128 */
    return sn*huge*huge;     /* overflow */
  else if (j==0x43000000) {  /* if z == 128 */
    if(p_l+ovt>z-p_h) return sn*huge*huge;  /* overflow */
  } else if ((j&0x7fffffff)>0x43160000)     /* z <= -150 */
    return sn*tiny*tiny;     /* underflow */
  else if (j==0xc3160000){   /* z == -150 */
    if(p_l<=z-p_h) return sn*tiny*tiny;     /* underflow */
  }
  /*
   * compute 2**(p_h+p_l)
   */
  i = j&0x7fffffff;
  k = (i>>23)-0x7f;
  n = 0;
  if(i>0x3f000000) {  /* if |z| > 0.5, set n = [z+0.5] */
    n = j+(0x00800000>>(k+1));
    k = ((n&0x7fffffff)>>23)-0x7f;  /* new k for n */
    GEN_OCL_SET_FLOAT_WORD(t,n&~(0x007fffff>>k));
    n = ((n&0x007fffff)|0x00800000)>>(23-k);
    if(j<0) n = -n;
    p_h -= t;
    z -= n;
  }
  t = z;
  GEN_OCL_GET_FLOAT_WORD(is,t);
  GEN_OCL_SET_FLOAT_WORD(t,is&0xfffff000);
  u = t*lg2_h;
  v = (p_l-(t-p_h))*lg2+t*lg2_l;
  z = u+v;
  w = v-(z-u);
  t = z*z;
  t1 = z - t*(P1+t*P2);
  r = (z*t1)/(t1-two)-(w+z*w);
  z = one-(r-z);
  GEN_OCL_GET_FLOAT_WORD(j,z);
  j += (n<<23);
  if((j>>23)<=0) z = __gen_ocl_scalbnf(z,n);  /* subnormal output */
  else GEN_OCL_SET_FLOAT_WORD(z,j);
  return sn*z;
}
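/* Illustrative sketch (compiled out): __gen_ocl_internal_pown above keeps the
 * sign decision separate from the magnitude computation -- an odd exponent
 * with a negative base flips the result, an even one does not. Hypothetical
 * names. */
#if 0
kernel void pown_parity(global float *out) {
  out[0] = __gen_ocl_internal_pown(-2.0f, 3);   /* -8.0f: odd n keeps the sign */
  out[1] = __gen_ocl_internal_pown(-2.0f, 2);   /*  4.0f: even n drops it */
  out[2] = __gen_ocl_internal_pown(-2.0f, -1);  /* -0.5f: y==-1 short-circuits to 1/x */
}
#endif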
      */
      return INFINITY;
    else
      /* x or y is NaN. Return NaN. */
      return x + y;
  }
}

#define BODY \
  if (isnan(x)) { \
    *p = x; \
    return x; \
  } \
  *p = __gen_ocl_internal_floor(x); \
  if (isinf(x)) { \
    return x > 0 ? +0. : -0.; \
  } \
  return __gen_ocl_internal_fmin(x - *p, 0x1.FFFFFep-1F);
OVERLOADABLE float fract(float x, float *p) { BODY; }
#undef BODY

#define BODY \
  float Zero[2]; \
  int n,hx,hy,hz,ix,iy,sx,i,sy; \
  uint q,sxy; \
  Zero[0] = 0.0;Zero[1] = -0.0; \
  if (x == 0.0f) { x = 0.0f; }; \
  if (y == 0.0f) { y = 0.0f; }\
  GEN_OCL_GET_FLOAT_WORD(hx,x);GEN_OCL_GET_FLOAT_WORD(hy,y); \
  sxy = (hx ^ hy) & 0x80000000;sx = hx&0x80000000;sy = hy&0x80000000; \
  hx ^=sx; hy &= 0x7fffffff; \
  if (hx < 0x00800000)hx = 0;if (hy < 0x00800000)hy = 0; \
  if(hy==0||hx>=0x7f800000||hy>0x7f800000){ \
    *quo = 0;return NAN; \
  } \
  if( hy == 0x7F800000 || hx == 0 ) { \
    *quo = 0;return x; \
  } \
  if( hx == hy ) { \
    *quo = (x == y) ? 1 : -1; \
    return sx ? -0.0 : 0.0; \
  } \
  if(hx<hy) { \
    q = 0; \
    goto fixup; \
  } else if(hx==hy) { \
    *quo = (sxy ? -1 : 1); \
    return Zero[(uint)sx>>31]; \
  } \
  ix = (hx>>23)-127; \
  iy = (hy>>23)-127; \
  hx = 0x00800000|(0x007fffff&hx); \
  hy = 0x00800000|(0x007fffff&hy); \
  n = ix - iy; \
  q = 0; \
  while(n--) { \
    hz=hx-hy; \
    if(hz<0) hx = hx << 1; \
    else {hx = hz << 1; q++;} \
    q <<= 1; \
  } \
  hz=hx-hy; \
  if(hz>=0) {hx=hz;q++;} \
  if(hx==0) { \
    q &= 0x0000007f; \
    *quo = (sxy ? -q : q); \
    return Zero[(uint)sx>>31]; \
  } \
  while(hx<0x00800000) { \
    hx <<= 1;iy -= 1; \
  } \
  if(iy>= -126) { \
    hx = ((hx-0x00800000)|((iy+127)<<23)); \
  } else { \
    n = -126 - iy; \
    hx >>= n; \
  } \
fixup: \
  GEN_OCL_SET_FLOAT_WORD(x,hx); \
  if(hx<0x00800000){ \
    GEN_OCL_GET_FLOAT_WORD(hy,y); \
    hy &= 0x7fffffff; \
    if(hx+hx > hy ||(hx+hx==hy && (q & 1)))q++; \
    x = 0; \
  }else{ \
    y = __gen_ocl_fabs(y); \
    if (y < 0x1p-125f) { \
      if (x+x>y || (x+x==y && (q & 1))) { \
        q++;x-=y; \
      } \
    }else if (x>0.5f*y || (x==0.5f*y && (q & 1))) { \
      q++;x-=y; \
    } \
    GEN_OCL_GET_FLOAT_WORD(hx,x);GEN_OCL_SET_FLOAT_WORD(x,hx^sx); \
  } \
  int sign = sx==sy?0:1; \
  q &= 0x0000007f; \
  *quo = (sign ? -q : q); \
  return x;
OVERLOADABLE float remquo(float x, float y, int *quo) { BODY; }
#undef BODY

OVERLOADABLE float powr(float x, float y) {
  unsigned int hx, sx, hy, sy;

  if (__ocl_math_fastpath_flag)
    return __gen_ocl_pow(x,y);
  else {
    if (isnan(x) || isnan(y)) return NAN;

    GEN_OCL_GET_FLOAT_WORD(hx,x);
    GEN_OCL_GET_FLOAT_WORD(hy,y);
    sx = (hx & 0x80000000) >> 31;
    sy = (hy & 0x80000000) >> 31;

    if ((hx&0x7fffffff) < 0x00800000) { /* x < 2**-126 */
      x = 0.0f;/* Gen does not support subnormal number now */
      hx = hx &0x80000000;
    }
    if ((hy&0x7fffffff) < 0x00800000) { /* y < 2**-126 */
      y = 0.0;/* Gen does not support subnormal number now */
      hy = hy &0x80000000;
    }

    // (x < 0) ** y = NAN (y!=0)
    if ((sx && (hx & 0x7fffffff))) return NAN;

    // +/-0 ** +/-0 = NAN
    if ( !(hx&0x7fffffff) && !(hy&0x7fffffff)) return NAN;

    // +inf ** +/-0 = NAN
    if ( ((hx & 0x7f800000) ==0x7f800000) && !(hy&0x7fffffff)) return NAN;

    // others except nan/inf/0 ** 0 = 1.0
    if (!(hy&0x7fffffff)) return 1.0f;

    // +1 ** inf = NAN; +1 ** finite = 1;
    if (hx == 0x3f800000) {
      return isinf(y) ? NAN : 1.0f;
    }

    if ( !(hx & 0x7fffffff)) {
      // +/-0 ** y<0 = +inf
      // +/-0 ** y>0 = +0
      return sy ? INFINITY : 0.0f;
    }

    return __gen_ocl_internal_pow(x,y);
  }
}

OVERLOADABLE float pown(float x, int n) {
  if (__ocl_math_fastpath_flag) {
    if (x == 0.f && n == 0)
      return 1.f;
    if (x < 0.f && (n&1) )
      return -powr(-x, n);
    return powr(x, n);
  } else {
    int ix;
    GEN_OCL_GET_FLOAT_WORD(ix, x);
    float sign = ix < 0 ?
-1.0f : 1.0f; if (x == 0.0f) x = sign * 0.0f; return __gen_ocl_internal_pown(x, n); } } OVERLOADABLE float pow(float x, float y) { if (!__ocl_math_fastpath_flag) return __gen_ocl_internal_pow(x,y); else { int n; if (x == 0.f && y == 0.f) return 1.f; if (x >= 0.f) return powr(x, y); n = y; if ((float)n == y)//is exact integer return pown(x, n); return NAN; } } OVERLOADABLE float rootn(float x, int n) { float ax,re; int sign = 0; int hx; if( n == 0 )return NAN; GEN_OCL_GET_FLOAT_WORD(hx, x); // Gen does not support denorm, flush to zero if ((hx & 0x7fffffff) < 0x00800000) { x = hx < 0 ? -0.0f : 0.0f; } //rootn ( x, n ) returns a NaN for x < 0 and n is even. if( x < 0 && 0 == (n&1) ) return NAN; if( x == 0.0 ){ switch( n & 0x80000001 ){ //rootn ( +-0, n ) is +0 for even n > 0. case 0: return 0.0f; //rootn ( +-0, n ) is +-0 for odd n > 0. case 1: return x; //rootn ( +-0, n ) is +inf for even n < 0. case 0x80000000: return INFINITY; //rootn ( +-0, n ) is +-inf for odd n < 0. case 0x80000001: return __gen_ocl_internal_copysign(INFINITY, x); } } ax = __gen_ocl_fabs(x); if(x <0.0f && (n&1)) sign = 1; if (__ocl_math_fastpath_flag) re = __gen_ocl_pow(ax, 1.f/n); else re = __gen_ocl_internal_pow(ax,1.f/n); if(sign) re = -re; return re; } OVERLOADABLE float fabs(float x) { return __gen_ocl_internal_fabs(x); } OVERLOADABLE float trunc(float x) { return __gen_ocl_internal_trunc(x); } OVERLOADABLE float round(float x) { return __gen_ocl_internal_round(x); } OVERLOADABLE float floor(float x) { return __gen_ocl_internal_floor(x); } OVERLOADABLE float ceil(float x) { return __gen_ocl_internal_ceil(x); } OVERLOADABLE float log(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_log(x); /* Use native instruction when it has enough precision */ if((x > 0x1.1p0) || (x <= 0)) return __gen_ocl_internal_fastpath_log(x); return __gen_ocl_internal_log(x); } OVERLOADABLE float log2(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_log2(x); /* Use native instruction when it has enough precision */ if((x > 0x1.1p0) || (x <= 0)) return __gen_ocl_internal_fastpath_log2(x); return __gen_ocl_internal_log2(x); } OVERLOADABLE float log10(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_log10(x); /* Use native instruction when it has enough precision */ if((x > 0x1.1p0) || (x <= 0)) return __gen_ocl_internal_fastpath_log10(x); return __gen_ocl_internal_log10(x); } OVERLOADABLE float exp(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_exp(x); /* Use native instruction when it has enough precision */ if (x > -0x1.6p1 && x < 0x1.6p1) return __gen_ocl_internal_fastpath_exp(x); return __gen_ocl_internal_exp(x); } OVERLOADABLE float exp2(float x) { /* Use native instruction when it has enough precision, exp2 always */ return native_exp2(x); } OVERLOADABLE float exp10(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_exp10(x); return __gen_ocl_internal_exp10(x); } OVERLOADABLE float expm1(float x) { if (__ocl_math_fastpath_flag) return __gen_ocl_internal_fastpath_expm1(x); return __gen_ocl_internal_expm1(x); } OVERLOADABLE float fmin(float a, float b) { return __gen_ocl_internal_fmin(a, b); } OVERLOADABLE float fmax(float a, float b) { return __gen_ocl_internal_fmax(a, b); } OVERLOADABLE float fma(float a, float b, float c) { return mad(a, b, c); } OVERLOADABLE float fdim(float x, float y) { return __gen_ocl_internal_fdim(x, y); } OVERLOADABLE float maxmag(float x, float y) { return 
__gen_ocl_internal_maxmag(x, y); }
OVERLOADABLE float minmag(float x, float y) { return __gen_ocl_internal_minmag(x, y); }

/* So far, the HW does not support half float math functions.
   We just do the conversion and call the float version here. */
OVERLOADABLE half cospi(half x) { float _x = (float)x; return (half)cospi(_x); }
OVERLOADABLE half cosh(half x) { float _x = (float)x; return (half)cosh(_x); }
OVERLOADABLE half acos(half x) { float _x = (float)x; return (half)acos(_x); }
OVERLOADABLE float half_cos(float x) { return (float)cos(x); }
OVERLOADABLE float half_divide(float x, float y) { return (float)native_divide(x, y); }
OVERLOADABLE float half_exp(float x) { return (float)native_exp(x); }
OVERLOADABLE float half_exp2(float x){ return (float)native_exp2(x); }
OVERLOADABLE float half_exp10(float x){ return (float)native_exp10(x); }
OVERLOADABLE float half_log(float x){ return (float)native_log(x); }
OVERLOADABLE float half_log2(float x){ return (float)native_log2(x); }
OVERLOADABLE float half_log10(float x){ return (float)native_log10(x); }
OVERLOADABLE float half_powr(float x, float y){ return (float)powr(x, y); }
OVERLOADABLE float half_recip(float x){ return (float)native_recip(x); }
OVERLOADABLE float half_rsqrt(float x){ return (float)native_rsqrt(x); }
OVERLOADABLE float half_sin(float x){ return (float)sin(x); }
OVERLOADABLE float half_sqrt(float x){ return (float)native_sqrt(x); }
OVERLOADABLE float half_tan(float x){ return (float)tan(x); }
OVERLOADABLE half acospi(half x) { float _x = (float)x; return (half)acospi(_x); }
OVERLOADABLE half acosh(half x) { float _x = (float)x; return (half)acosh(_x); }
OVERLOADABLE half sinpi(half x) { float _x = (float)x; return (half)sinpi(_x); }
OVERLOADABLE half sinh(half x) { float _x = (float)x; return (half)sinh(_x); }
OVERLOADABLE half asin(half x) { float _x = (float)x; return (half)asin(_x); }
OVERLOADABLE half asinpi(half x) { float _x = (float)x; return (half)asinpi(_x); }
OVERLOADABLE half asinh(half x) { float _x = (float)x; return (half)asinh(_x); }
OVERLOADABLE half tanpi(half x) { float _x = (float)x; return (half)tanpi(_x); }
OVERLOADABLE half tanh(half x) { float _x = (float)x; return (half)tanh(_x); }
OVERLOADABLE half atan(half x) { float _x = (float)x; return (half)atan(_x); }
OVERLOADABLE half atan2(half y, half x) { float _x = (float)x; float _y = (float)y; return (half)atan2(_y, _x); }
OVERLOADABLE half atan2pi(half y, half x) { float _x = (float)x; float _y = (float)y; return (half)atan2pi(_y, _x); }
OVERLOADABLE half atanpi(half x) { float _x = (float)x; return (half)atanpi(_x); }
OVERLOADABLE half atanh(half x) { float _x = (float)x; return (half)atanh(_x); }
OVERLOADABLE half cbrt(half x) { float _x = (float)x; return (half)cbrt(_x); }
OVERLOADABLE half rint(half x) { float _x = (float)x; return (half)rint(_x); }
OVERLOADABLE half copysign(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)copysign(_x, _y); }
OVERLOADABLE half erf(half x) { float _x = (float)x; return (half)erf(_x); }
OVERLOADABLE half erfc(half x) { float _x = (float)x; return (half)erfc(_x); }
OVERLOADABLE half fmod(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)fmod(_x, _y); }
OVERLOADABLE half remainder(half x, half p) { float _x = (float)x; float _p = (float)p; return (half)remainder(_x, _p); }
OVERLOADABLE half ldexp(half x, int n) { float _x = (float)x; return (half)ldexp(_x, n); }
OVERLOADABLE half powr(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)powr(_x, _y); }
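/* Editor's note, not part of the upstream file: every half-precision
 * overload above follows the same widen/compute/narrow pattern, since the
 * hardware provides no native half-precision math. The pattern could be
 * captured once in a hypothetical macro like the sketch below (expanding
 * e.g. DEF_HALF_UNARY(sqrt) would reproduce the hand-written wrapper
 * verbatim); upstream instead spells each wrapper out by hand. */
#define DEF_HALF_UNARY(NAME) \
  OVERLOADABLE half NAME(half x) { \
    float _x = (float)x;   /* widen the argument to float */ \
    return (half)NAME(_x); /* resolve to the float overload, narrow back */ \
  }
#undef DEF_HALF_UNARY /* sketch only */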
OVERLOADABLE half pow(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)pow(_x, _y); } //no pow, we use powr instead OVERLOADABLE half fabs(half x) { float _x = (float)x; return (half)fabs(_x); } OVERLOADABLE half trunc(half x) { float _x = (float)x; return (half)trunc(_x); } OVERLOADABLE half round(half x) { float _x = (float)x; return (half)round(_x); } OVERLOADABLE half floor(half x) { float _x = (float)x; return (half)floor(_x); } OVERLOADABLE half ceil(half x) { float _x = (float)x; return (half)ceil(_x); } OVERLOADABLE half log(half x) { float _x = (float)x; return (half)log(_x); } OVERLOADABLE half log2(half x) { float _x = (float)x; return (half)log2(_x); } OVERLOADABLE half log10(half x) { float _x = (float)x; return (half)log10(_x); } OVERLOADABLE half exp(half x) { float _x = (float)x; return (half)exp(_x); } OVERLOADABLE half exp10(half x) { float _x = (float)x; return (half)exp10(_x); } OVERLOADABLE half expm1(half x) { float _x = (float)x; return (half)expm1(_x); } OVERLOADABLE half fmin(half a, half b) { return __gen_ocl_internal_fmin(a, b); } OVERLOADABLE half fmax(half a, half b) { return __gen_ocl_internal_fmax(a, b); } OVERLOADABLE half fma(half a, half b, half c) { float _a = (float)a; float _b = (float)b; float _c = (float)c; return (half)fma(_a, _b, _c); } OVERLOADABLE half fdim(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)fdim(_x, _y); } OVERLOADABLE half maxmag(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)maxmag(_x, _y); } OVERLOADABLE half minmag(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)minmag(_x, _y); } OVERLOADABLE half exp2(half x) { float _x = (float)x; return (half)exp2(_x); } OVERLOADABLE half mad(half a, half b, half c) { return __gen_ocl_mad(a,b,c); } OVERLOADABLE half sin(half x) { float _x = (float)x; return (half)sin(_x); } OVERLOADABLE half cos(half x) { float _x = (float)x; return (half)cos(_x); } OVERLOADABLE half tan(half x) { float _x = (float)x; return (half)tan(_x); } OVERLOADABLE half tgamma(half x) { float _x = (float)x; return (half)tgamma(_x); } OVERLOADABLE half lgamma(half x) { float _x = (float)x; return (half)lgamma(_x); } OVERLOADABLE half lgamma_r(half x, int *signgamp) { float _x = (float)x; return (half)lgamma_r(_x, signgamp); } OVERLOADABLE half log1p(half x) { float _x = (float)x; return (half)log1p(_x); } OVERLOADABLE half logb(half x) { float _x = (float)x; return (half)logb(_x); } OVERLOADABLE int ilogb(half x) { float _x = (float)x; return ilogb(_x); } OVERLOADABLE half nan(ushort code) { return (half)NAN; } OVERLOADABLE half sincos(half x, half *cosval) { float _x = (float)x; float _cosval; half ret = (half)sincos(_x, &_cosval); *cosval = (half)_cosval; return ret; } OVERLOADABLE half sqrt(half x) { float _x = (float)x; return (half)sqrt(_x); } OVERLOADABLE half rsqrt(half x) { float _x = (float)x; return (half)rsqrt(_x); } OVERLOADABLE half frexp(half x, int *exp) { float _x = (float)x; return (half)frexp(_x, exp); } OVERLOADABLE half nextafter(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)nextafter(_x, _y); } OVERLOADABLE half modf(half x, half *i) { float _x = (float)x; float _i; half ret = (half)modf(_x, &_i); *i = (half)_i; return ret; } OVERLOADABLE half hypot(half x, half y) { float _x = (float)x; float _y = (float)y; return (half)hypot(_x, _y); } OVERLOADABLE half fract(half x, half *p) { float _x = (float)x; float _p; half ret = (half)fract(_x, &_p); *p = (half)_p; return ret; } 
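/* Editor's sketch, not part of the upstream file: how kernel code consumes
 * the fract overloads defined above. fract returns the fractional part,
 * clamped below 1.0f (capped at 0x1.FFFFFep-1F), and stores floor(x)
 * through the pointer argument. The kernel name and buffers here are
 * illustrative only. */
kernel void fract_demo(global float *frac_out, global float *floor_out,
                       global const float *in) {
  size_t gid = get_global_id(0);
  float ipart;                            /* receives floor(in[gid]) */
  frac_out[gid] = fract(in[gid], &ipart); /* e.g. 2.75f yields 0.75f */
  floor_out[gid] = ipart;                 /* and ipart becomes 2.0f  */
}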
OVERLOADABLE half remquo(half x, half y, int *quo) { float _x = (float)x; float _y = (float)y; return (half)remquo(_x, _y, quo); } OVERLOADABLE half pown(half x, int n) { float _x = (float)x; return (half)pown(_x, n); } OVERLOADABLE half rootn(half x, int n) { float _x = (float)x; return (half)rootn(_x, n); } Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_relational.tmpl.h000664 001750 001750 00000007415 13161142102 024167 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_RELATIONAL_H__ #define __OCL_RELATIONAL_H__ #include "ocl_types.h" #include "ocl_as.h" OVERLOADABLE int isequal(float x, float y); OVERLOADABLE int isnotequal(float x, float y); OVERLOADABLE int isgreater(float x, float y); OVERLOADABLE int isgreaterequal(float x, float y); OVERLOADABLE int isless(float x, float y); OVERLOADABLE int islessequal(float x, float y); OVERLOADABLE int islessgreater(float x, float y); OVERLOADABLE int isfinite(float x); OVERLOADABLE int isinf(float x); OVERLOADABLE int isnan(float x); OVERLOADABLE int isnormal(float x); OVERLOADABLE int isordered(float x, float y); OVERLOADABLE int isunordered(float x, float y); OVERLOADABLE int signbit(float x); // Half half version. 
OVERLOADABLE int isequal(half x, half y); OVERLOADABLE int isnotequal(half x, half y); OVERLOADABLE int isgreater(half x, half y); OVERLOADABLE int isgreaterequal(half x, half y); OVERLOADABLE int isless(half x, half y); OVERLOADABLE int islessequal(half x, half y); OVERLOADABLE int islessgreater(half x, half y); OVERLOADABLE int isfinite(half x); OVERLOADABLE int isinf(half x); OVERLOADABLE int isnan(half x); OVERLOADABLE int isnormal(half x); OVERLOADABLE int isordered(half x, half y); OVERLOADABLE int isunordered(half x, half y); OVERLOADABLE int signbit(half x); // any #define DEC1(type) OVERLOADABLE int any(type a); #define DEC2(type) OVERLOADABLE int any(type a); #define DEC3(type) OVERLOADABLE int any(type a); #define DEC4(type) OVERLOADABLE int any(type a); #define DEC8(type) OVERLOADABLE int any(type a); #define DEC16(type) OVERLOADABLE int any(type a); DEC1(char); DEC1(short); DEC1(int); DEC1(long); #define DEC(n) DEC##n(char##n); DEC##n(short##n); DEC##n(int##n); DEC##n(long##n); DEC(2); DEC(3); DEC(4); DEC(8); DEC(16); #undef DEC #undef DEC1 #undef DEC2 #undef DEC3 #undef DEC4 #undef DEC8 #undef DEC16 // all #define DEC1(type) OVERLOADABLE int all(type a); #define DEC2(type) OVERLOADABLE int all(type a); #define DEC3(type) OVERLOADABLE int all(type a); #define DEC4(type) OVERLOADABLE int all(type a); #define DEC8(type) OVERLOADABLE int all(type a); #define DEC16(type) OVERLOADABLE int all(type a); DEC1(char) DEC1(short) DEC1(int) DEC1(long) #define DEC(n) DEC##n(char##n) DEC##n(short##n) DEC##n(int##n) DEC##n(long##n) DEC(2) DEC(3) DEC(4) DEC(8) DEC(16) #undef DEC #undef DEC1 #undef DEC2 #undef DEC3 #undef DEC4 #undef DEC8 #undef DEC16 #define DEF(type) OVERLOADABLE type bitselect(type a, type b, type c); DEF(char) DEF(uchar) DEF(short) DEF(ushort) DEF(int) DEF(uint) DEF(long) DEF(ulong) #undef DEF OVERLOADABLE float bitselect(float a, float b, float c); OVERLOADABLE half bitselect(half a, half b, half c); #define DEF(TYPE1, TYPE2) \ OVERLOADABLE TYPE1 select(TYPE1 src0, TYPE1 src1, TYPE2 cond); DEF(char, char) DEF(char, uchar) DEF(uchar, char) DEF(uchar, uchar) DEF(short, short) DEF(short, ushort) DEF(ushort, short) DEF(ushort, ushort) DEF(int, int) DEF(int, uint) DEF(uint, int) DEF(uint, uint) DEF(long, long) DEF(long, ulong) DEF(ulong, long) DEF(ulong, ulong) DEF(float, int) DEF(float, uint) DEF(half, short) DEF(half, ushort) #undef DEF Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_simd.tmpl.h000664 001750 001750 00000031555 13161142102 022773 0ustar00yryr000000 000000 /* * Copyright © 2015 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_SIMD_H__ #define __OCL_SIMD_H__ #include "ocl_types.h" ///////////////////////////////////////////////////////////////////////////// // SIMD level function ///////////////////////////////////////////////////////////////////////////// int sub_group_any(int); int sub_group_all(int); uint get_simd_size(void); uint get_sub_group_size(void); uint get_max_sub_group_size(void); uint get_num_sub_groups(void); uint get_sub_group_id(void); uint get_sub_group_local_id(void); /* broadcast */ OVERLOADABLE int sub_group_broadcast(int a,uint local_id); OVERLOADABLE uint sub_group_broadcast(uint a, uint local_id); OVERLOADABLE long sub_group_broadcast(long a, uint local_id); OVERLOADABLE ulong sub_group_broadcast(ulong a, uint local_id); OVERLOADABLE half sub_group_broadcast(half a, uint local_id); OVERLOADABLE float sub_group_broadcast(float a, uint local_id); OVERLOADABLE double sub_group_broadcast(double a, uint local_id); OVERLOADABLE short sub_group_broadcast(short a,uint local_id); OVERLOADABLE ushort sub_group_broadcast(ushort a, uint local_id); OVERLOADABLE short intel_sub_group_broadcast(short a, uint local_id); OVERLOADABLE ushort intel_sub_group_broadcast(ushort a, uint local_id); /* reduce add */ OVERLOADABLE int sub_group_reduce_add(int x); OVERLOADABLE uint sub_group_reduce_add(uint x); OVERLOADABLE long sub_group_reduce_add(long x); OVERLOADABLE ulong sub_group_reduce_add(ulong x); OVERLOADABLE half sub_group_reduce_add(half x); OVERLOADABLE float sub_group_reduce_add(float x); OVERLOADABLE double sub_group_reduce_add(double x); OVERLOADABLE short sub_group_reduce_add(short x); OVERLOADABLE ushort sub_group_reduce_add(ushort x); OVERLOADABLE short intel_sug_group_reduce_add(short x); OVERLOADABLE ushort intel_sug_group_reduce_add(ushort x); /* reduce min */ OVERLOADABLE int sub_group_reduce_min(int x); OVERLOADABLE uint sub_group_reduce_min(uint x); OVERLOADABLE long sub_group_reduce_min(long x); OVERLOADABLE ulong sub_group_reduce_min(ulong x); OVERLOADABLE half sub_group_reduce_min(half x); OVERLOADABLE float sub_group_reduce_min(float x); OVERLOADABLE double sub_group_reduce_min(double x); OVERLOADABLE short sub_group_reduce_min(short x); OVERLOADABLE ushort sub_group_reduce_min(ushort x); OVERLOADABLE short intel_sug_group_reduce_min(short x); OVERLOADABLE ushort intel_sug_group_reduce_min(ushort x); /* reduce max */ OVERLOADABLE int sub_group_reduce_max(int x); OVERLOADABLE uint sub_group_reduce_max(uint x); OVERLOADABLE long sub_group_reduce_max(long x); OVERLOADABLE ulong sub_group_reduce_max(ulong x); OVERLOADABLE half sub_group_reduce_max(half x); OVERLOADABLE float sub_group_reduce_max(float x); OVERLOADABLE double sub_group_reduce_max(double x); OVERLOADABLE short sub_group_reduce_max(short x); OVERLOADABLE ushort sub_group_reduce_max(ushort x); OVERLOADABLE short intel_sug_group_reduce_max(short x); OVERLOADABLE ushort intel_sug_group_reduce_max(ushort x); /* scan_inclusive add */ OVERLOADABLE int sub_group_scan_inclusive_add(int x); OVERLOADABLE uint sub_group_scan_inclusive_add(uint x); OVERLOADABLE long sub_group_scan_inclusive_add(long x); OVERLOADABLE ulong sub_group_scan_inclusive_add(ulong x); OVERLOADABLE half sub_group_scan_inclusive_add(half x); OVERLOADABLE float sub_group_scan_inclusive_add(float x); OVERLOADABLE double sub_group_scan_inclusive_add(double x); OVERLOADABLE short sub_group_scan_inclusive_add(short x); OVERLOADABLE ushort sub_group_scan_inclusive_add(ushort x); OVERLOADABLE short intel_sug_group_scan_inclusive_add(short x); 
OVERLOADABLE ushort intel_sug_group_scan_inclusive_add(ushort x); /* scan_inclusive min */ OVERLOADABLE int sub_group_scan_inclusive_min(int x); OVERLOADABLE uint sub_group_scan_inclusive_min(uint x); OVERLOADABLE long sub_group_scan_inclusive_min(long x); OVERLOADABLE ulong sub_group_scan_inclusive_min(ulong x); OVERLOADABLE half sub_group_scan_inclusive_min(half x); OVERLOADABLE float sub_group_scan_inclusive_min(float x); OVERLOADABLE double sub_group_scan_inclusive_min(double x); OVERLOADABLE short sub_group_scan_inclusive_min(short x); OVERLOADABLE ushort sub_group_scan_inclusive_min(ushort x); OVERLOADABLE short intel_sug_group_scan_inclusive_min(short x); OVERLOADABLE ushort intel_sug_group_scan_inclusive_min(ushort x); /* scan_inclusive max */ OVERLOADABLE int sub_group_scan_inclusive_max(int x); OVERLOADABLE uint sub_group_scan_inclusive_max(uint x); OVERLOADABLE long sub_group_scan_inclusive_max(long x); OVERLOADABLE ulong sub_group_scan_inclusive_max(ulong x); OVERLOADABLE half sub_group_scan_inclusive_max(half x); OVERLOADABLE float sub_group_scan_inclusive_max(float x); OVERLOADABLE double sub_group_scan_inclusive_max(double x); OVERLOADABLE short sub_group_scan_inclusive_max(short x); OVERLOADABLE ushort sub_group_scan_inclusive_max(ushort x); OVERLOADABLE short intel_sug_group_scan_inclusive_max(short x); OVERLOADABLE ushort intel_sug_group_scan_inclusive_max(ushort x); /* scan_exclusive add */ OVERLOADABLE int sub_group_scan_exclusive_add(int x); OVERLOADABLE uint sub_group_scan_exclusive_add(uint x); OVERLOADABLE long sub_group_scan_exclusive_add(long x); OVERLOADABLE ulong sub_group_scan_exclusive_add(ulong x); OVERLOADABLE half sub_group_scan_exclusive_add(half x); OVERLOADABLE float sub_group_scan_exclusive_add(float x); OVERLOADABLE double sub_group_scan_exclusive_add(double x); OVERLOADABLE short sub_group_scan_exclusive_add(short x); OVERLOADABLE ushort sub_group_scan_exclusive_add(ushort x); OVERLOADABLE short intel_sub_group_scan_exclusive_add(short x); OVERLOADABLE ushort intel_sub_group_scan_exclusive_add(ushort x); /* scan_exclusive min */ OVERLOADABLE int sub_group_scan_exclusive_min(int x); OVERLOADABLE uint sub_group_scan_exclusive_min(uint x); OVERLOADABLE long sub_group_scan_exclusive_min(long x); OVERLOADABLE ulong sub_group_scan_exclusive_min(ulong x); OVERLOADABLE half sub_group_scan_exclusive_min(half x); OVERLOADABLE float sub_group_scan_exclusive_min(float x); OVERLOADABLE double sub_group_scan_exclusive_min(double x); OVERLOADABLE short sub_group_scan_exclusive_min(short x); OVERLOADABLE ushort sub_group_scan_exclusive_min(ushort x); OVERLOADABLE short intel_sug_group_scan_exclusive_min(short x); OVERLOADABLE ushort intel_sug_group_scan_exclusive_min(ushort x); /* scan_exclusive max */ OVERLOADABLE int sub_group_scan_exclusive_max(int x); OVERLOADABLE uint sub_group_scan_exclusive_max(uint x); OVERLOADABLE long sub_group_scan_exclusive_max(long x); OVERLOADABLE ulong sub_group_scan_exclusive_max(ulong x); OVERLOADABLE half sub_group_scan_exclusive_max(half x); OVERLOADABLE float sub_group_scan_exclusive_max(float x); OVERLOADABLE double sub_group_scan_exclusive_max(double x); OVERLOADABLE short sub_group_scan_exclusive_max(short x); OVERLOADABLE ushort sub_group_scan_exclusive_max(ushort x); OVERLOADABLE short intel_sug_group_scan_exclusive_max(short x); OVERLOADABLE ushort intel_sug_group_scan_exclusive_max(ushort x); /* shuffle */ OVERLOADABLE half intel_sub_group_shuffle(half x, uint c); OVERLOADABLE float intel_sub_group_shuffle(float x, uint c); 
OVERLOADABLE int intel_sub_group_shuffle(int x, uint c);
OVERLOADABLE uint intel_sub_group_shuffle(uint x, uint c);
OVERLOADABLE short intel_sub_group_shuffle(short x, uint c);
OVERLOADABLE ushort intel_sub_group_shuffle(ushort x, uint c);
OVERLOADABLE float intel_sub_group_shuffle_down(float x, float y, uint c);
OVERLOADABLE int intel_sub_group_shuffle_down(int x, int y, uint c);
OVERLOADABLE uint intel_sub_group_shuffle_down(uint x, uint y, uint c);
OVERLOADABLE short intel_sub_group_shuffle_down(short x, short y, uint c);
OVERLOADABLE ushort intel_sub_group_shuffle_down(ushort x, ushort y, uint c);
OVERLOADABLE float intel_sub_group_shuffle_up(float x, float y, uint c);
OVERLOADABLE int intel_sub_group_shuffle_up(int x, int y, uint c);
OVERLOADABLE uint intel_sub_group_shuffle_up(uint x, uint y, uint c);
OVERLOADABLE short intel_sub_group_shuffle_up(short x, short y, uint c);
OVERLOADABLE ushort intel_sub_group_shuffle_up(ushort x, ushort y, uint c);
OVERLOADABLE float intel_sub_group_shuffle_xor(float x, uint c);
OVERLOADABLE int intel_sub_group_shuffle_xor(int x, uint c);
OVERLOADABLE uint intel_sub_group_shuffle_xor(uint x, uint c);
OVERLOADABLE short intel_sub_group_shuffle_xor(short x, uint c);
OVERLOADABLE ushort intel_sub_group_shuffle_xor(ushort x, uint c);
/* block read/write */
OVERLOADABLE uint intel_sub_group_block_read(const global uint* p);
OVERLOADABLE uint2 intel_sub_group_block_read2(const global uint* p);
OVERLOADABLE uint4 intel_sub_group_block_read4(const global uint* p);
OVERLOADABLE uint8 intel_sub_group_block_read8(const global uint* p);
OVERLOADABLE void intel_sub_group_block_write(__global uint* p, uint data);
OVERLOADABLE void intel_sub_group_block_write2(__global uint* p, uint2 data);
OVERLOADABLE void intel_sub_group_block_write4(__global uint* p, uint4 data);
OVERLOADABLE void intel_sub_group_block_write8(__global uint* p, uint8 data);
OVERLOADABLE uint intel_sub_group_block_read(image2d_t image, int2 byte_coord);
OVERLOADABLE uint2 intel_sub_group_block_read2(image2d_t image, int2 byte_coord);
OVERLOADABLE uint4 intel_sub_group_block_read4(image2d_t image, int2 byte_coord);
OVERLOADABLE uint8 intel_sub_group_block_read8(image2d_t image, int2 byte_coord);
OVERLOADABLE void intel_sub_group_block_write(image2d_t image, int2 byte_coord, uint data);
OVERLOADABLE void intel_sub_group_block_write2(image2d_t image, int2 byte_coord, uint2 data);
OVERLOADABLE void intel_sub_group_block_write4(image2d_t image, int2 byte_coord, uint4 data);
OVERLOADABLE void intel_sub_group_block_write8(image2d_t image, int2 byte_coord, uint8 data);
OVERLOADABLE uint intel_sub_group_block_read_ui(const global uint* p);
OVERLOADABLE uint2 intel_sub_group_block_read_ui2(const global uint* p);
OVERLOADABLE uint4 intel_sub_group_block_read_ui4(const global uint* p);
OVERLOADABLE uint8 intel_sub_group_block_read_ui8(const global uint* p);
OVERLOADABLE void intel_sub_group_block_write_ui(__global uint* p, uint data);
OVERLOADABLE void intel_sub_group_block_write_ui2(__global uint* p, uint2 data);
OVERLOADABLE void intel_sub_group_block_write_ui4(__global uint* p, uint4 data);
OVERLOADABLE void intel_sub_group_block_write_ui8(__global uint* p, uint8 data);
OVERLOADABLE uint intel_sub_group_block_read_ui(image2d_t image, int2 byte_coord);
OVERLOADABLE uint2 intel_sub_group_block_read_ui2(image2d_t image, int2 byte_coord);
OVERLOADABLE uint4 intel_sub_group_block_read_ui4(image2d_t image, int2 byte_coord);
OVERLOADABLE uint8 intel_sub_group_block_read_ui8(image2d_t image, int2 byte_coord);
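/* Editor's sketch, not part of the upstream header: typical use of the
 * block read/write built-ins declared here. The pointer argument is the
 * block base for the whole sub-group, so it must be uniform across the
 * sub-group; each work item then receives (or contributes) one uint per
 * element of the block. blocked_copy is a hypothetical kernel and assumes
 * a single one-dimensional work-group for brevity. */
kernel void blocked_copy(global uint *dst, global const uint *src) {
  /* each sub-group moves get_max_sub_group_size() consecutive uints
     with one read/write pair */
  uint base = get_sub_group_id() * get_max_sub_group_size();
  uint v = intel_sub_group_block_read(src + base); /* one uint per work item */
  intel_sub_group_block_write(dst + base, v);
}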
OVERLOADABLE void intel_sub_group_block_write_ui(image2d_t image, int2 byte_coord, uint data); OVERLOADABLE void intel_sub_group_block_write_ui2(image2d_t image, int2 byte_coord, uint2 data); OVERLOADABLE void intel_sub_group_block_write_ui4(image2d_t image, int2 byte_coord, uint4 data); OVERLOADABLE void intel_sub_group_block_write_ui8(image2d_t image, int2 byte_coord, uint8 data); OVERLOADABLE ushort intel_sub_group_block_read_us(const global ushort* p); OVERLOADABLE ushort2 intel_sub_group_block_read_us2(const global ushort* p); OVERLOADABLE ushort4 intel_sub_group_block_read_us4(const global ushort* p); OVERLOADABLE ushort8 intel_sub_group_block_read_us8(const global ushort* p); OVERLOADABLE void intel_sub_group_block_write_us(__global ushort* p, ushort data); OVERLOADABLE void intel_sub_group_block_write_us2(__global ushort* p, ushort2 data); OVERLOADABLE void intel_sub_group_block_write_us4(__global ushort* p, ushort4 data); OVERLOADABLE void intel_sub_group_block_write_us8(__global ushort* p, ushort8 data); OVERLOADABLE ushort intel_sub_group_block_read_us(image2d_t image, int2 byte_coord); OVERLOADABLE ushort2 intel_sub_group_block_read_us2(image2d_t image, int2 byte_coord); OVERLOADABLE ushort4 intel_sub_group_block_read_us4(image2d_t image, int2 byte_coord); OVERLOADABLE ushort8 intel_sub_group_block_read_us8(image2d_t image, int2 byte_coord); OVERLOADABLE void intel_sub_group_block_write_us(image2d_t image, int2 byte_coord, ushort data); OVERLOADABLE void intel_sub_group_block_write_us2(image2d_t image, int2 byte_coord, ushort2 data); OVERLOADABLE void intel_sub_group_block_write_us4(image2d_t image, int2 byte_coord, ushort4 data); OVERLOADABLE void intel_sub_group_block_write_us8(image2d_t image, int2 byte_coord, ushort8 data); Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_integer.tmpl.cl000664 001750 001750 00000025005 13161142102 023634 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #include "ocl_integer.h" PURE CONST uint __gen_ocl_fbh(uint); PURE CONST uint __gen_ocl_fbl(uint); PURE CONST OVERLOADABLE uint __gen_ocl_cbit(uint); PURE CONST OVERLOADABLE uint __gen_ocl_cbit(int); PURE CONST OVERLOADABLE uint __gen_ocl_cbit(ushort); PURE CONST OVERLOADABLE uint __gen_ocl_cbit(short); PURE CONST OVERLOADABLE uint __gen_ocl_cbit(uchar); PURE CONST OVERLOADABLE uint __gen_ocl_cbit(char); #define SDEF(TYPE, TYPE_NAME, SIZE) \ OVERLOADABLE TYPE clz(TYPE x){ return clz_##TYPE_NAME##SIZE(x);} SDEF(char, s, 8); SDEF(uchar, u, 8); SDEF(short, s, 16); SDEF(ushort, u, 16); SDEF(int, s, 32); SDEF(uint, u, 32); SDEF(long, s, 64); SDEF(ulong, u, 64); #undef SDEF #define SDEF(TYPE, TYPE_NAME, SIZE) \ OVERLOADABLE TYPE ctz(TYPE x){ return ctz_##TYPE_NAME##SIZE(x);} SDEF(char, s, 8); SDEF(uchar, u, 8); SDEF(short, s, 16); SDEF(ushort, u, 16); SDEF(int, s, 32); SDEF(uint, u, 32); SDEF(long, s, 64); SDEF(ulong, u, 64); #undef SDEF #define SDEF(TYPE) \ OVERLOADABLE TYPE popcount(TYPE x){ return __gen_ocl_cbit(x);} SDEF(char); SDEF(uchar); SDEF(short); SDEF(ushort); SDEF(int); SDEF(uint); #undef SDEF OVERLOADABLE long popcount(long x) { union { int i[2]; long x; } u; u.x = x; uint v = popcount(u.i[1]); v += popcount(u.i[0]); return v; } OVERLOADABLE ulong popcount(ulong x) { union { uint i[2]; ulong x; } u; u.x = x; uint v = popcount(u.i[1]); v += popcount(u.i[0]); return v; } // sat #define SDEF(TYPE) \ OVERLOADABLE TYPE ocl_sadd_sat(TYPE x, TYPE y); \ OVERLOADABLE TYPE ocl_ssub_sat(TYPE x, TYPE y); \ OVERLOADABLE TYPE add_sat(TYPE x, TYPE y) { return ocl_sadd_sat(x, y); } \ OVERLOADABLE TYPE sub_sat(TYPE x, TYPE y) { return ocl_ssub_sat(x, y); } SDEF(char); SDEF(short); #undef SDEF OVERLOADABLE int ocl_sadd_sat(int x, int y); OVERLOADABLE int add_sat(int x, int y) { return ocl_sadd_sat(x, y); } OVERLOADABLE int ocl_ssub_sat(int x, int y); OVERLOADABLE int sub_sat(int x, int y) { return (y == 0x80000000u) ? 
(ocl_sadd_sat(ocl_sadd_sat(0x7fffffff, x), 1)) : ocl_ssub_sat(x, y); } OVERLOADABLE long ocl_sadd_sat(long x, long y); OVERLOADABLE long add_sat(long x, long y) { union {long l; uint i[2];} ux, uy; ux.l = x; uy.l = y; if((ux.i[1] ^ uy.i[1]) & 0x80000000u) return x + y; return ocl_sadd_sat(x, y); } OVERLOADABLE long ocl_ssub_sat(long x, long y); OVERLOADABLE long sub_sat(long x, long y) { union {long l; uint i[2];} ux, uy; ux.l = x; uy.l = y; if((ux.i[1] ^ uy.i[1]) & 0x80000000u) return ocl_ssub_sat(x, y); return x - y; } #define UDEF(TYPE) \ OVERLOADABLE TYPE ocl_uadd_sat(TYPE x, TYPE y); \ OVERLOADABLE TYPE ocl_usub_sat(TYPE x, TYPE y); \ OVERLOADABLE TYPE add_sat(TYPE x, TYPE y) { return ocl_uadd_sat(x, y); } \ OVERLOADABLE TYPE sub_sat(TYPE x, TYPE y) { return ocl_usub_sat(x, y); } UDEF(uchar); UDEF(ushort); UDEF(uint); UDEF(ulong); #undef UDEF OVERLOADABLE int __gen_ocl_mul_hi(int x, int y); OVERLOADABLE uint __gen_ocl_mul_hi(uint x, uint y); OVERLOADABLE long __gen_ocl_mul_hi(long x, long y); OVERLOADABLE ulong __gen_ocl_mul_hi(ulong x, ulong y); OVERLOADABLE char mul_hi(char x, char y) { return (x * y) >> 8; } OVERLOADABLE uchar mul_hi(uchar x, uchar y) { return (x * y) >> 8; } OVERLOADABLE short mul_hi(short x, short y) { return (x * y) >> 16; } OVERLOADABLE ushort mul_hi(ushort x, ushort y) { return (x * y) >> 16; } OVERLOADABLE int mul_hi(int x, int y) { return __gen_ocl_mul_hi(x, y); } OVERLOADABLE uint mul_hi(uint x, uint y) { return __gen_ocl_mul_hi(x, y); } OVERLOADABLE long mul_hi(long x, long y) { return __gen_ocl_mul_hi(x, y); } OVERLOADABLE ulong mul_hi(ulong x, ulong y) { return __gen_ocl_mul_hi(x, y); } #define DEF(type) OVERLOADABLE type mad_hi(type a, type b, type c) { return mul_hi(a, b) + c; } DEF(char) DEF(uchar) DEF(short) DEF(ushort) DEF(int) DEF(uint) DEF(long) DEF(ulong) #undef DEF OVERLOADABLE int mul24(int a, int b) { return a*b; } OVERLOADABLE uint mul24(uint a, uint b) { return a*b; } OVERLOADABLE int mad24(int a, int b, int c) { return mul24(a, b) + c; } OVERLOADABLE uint mad24(uint a, uint b, uint c) { return mul24(a, b) + c; } OVERLOADABLE char mad_sat(char a, char b, char c) { int x = (int)a * (int)b + (int)c; if (x > 127) x = 127; if (x < -128) x = -128; return x; } OVERLOADABLE uchar mad_sat(uchar a, uchar b, uchar c) { uint x = (uint)a * (uint)b + (uint)c; if (x > 255) x = 255; return x; } OVERLOADABLE short mad_sat(short a, short b, short c) { int x = (int)a * (int)b + (int)c; if (x > 32767) x = 32767; if (x < -32768) x = -32768; return x; } OVERLOADABLE ushort mad_sat(ushort a, ushort b, ushort c) { uint x = (uint)a * (uint)b + (uint)c; if (x > 65535) x = 65535; return x; } OVERLOADABLE int mad_sat(int a, int b, int c) { long x = (long)a * (long)b + (long)c; if (x > 0x7FFFFFFF) x = 0x7FFFFFFF; else if (x < -0x7FFFFFFF-1) x = -0x7FFFFFFF-1; return (int)x; } OVERLOADABLE uint mad_sat(uint a, uint b, uint c) { ulong x = (ulong)a * (ulong)b + (ulong)c; if (x > 0xFFFFFFFFu) x = 0xFFFFFFFFu; return (uint)x; } OVERLOADABLE long __gen_ocl_mad_sat(long a, long b, long c); OVERLOADABLE ulong __gen_ocl_mad_sat(ulong a, ulong b, ulong c); OVERLOADABLE long mad_sat(long a, long b, long c) { return __gen_ocl_mad_sat(a, b, c); } OVERLOADABLE ulong mad_sat(ulong a, ulong b, ulong c) { return __gen_ocl_mad_sat(a, b, c); } OVERLOADABLE uchar __rotate_left(uchar x, uchar y) { return (x << y) | (x >> (8 - y)); } OVERLOADABLE char __rotate_left(char x, char y) { return __rotate_left((uchar)x, (uchar)y); } OVERLOADABLE ushort __rotate_left(ushort x, ushort y) { return (x 
<< y) | (x >> (16 - y)); } OVERLOADABLE short __rotate_left(short x, short y) { return __rotate_left((ushort)x, (ushort)y); } OVERLOADABLE uint __rotate_left(uint x, uint y) { return (x << y) | (x >> (32 - y)); } OVERLOADABLE int __rotate_left(int x, int y) { return __rotate_left((uint)x, (uint)y); } OVERLOADABLE ulong __rotate_left(ulong x, ulong y) { return (x << y) | (x >> (64 - y)); } OVERLOADABLE long __rotate_left(long x, long y) { return __rotate_left((ulong)x, (ulong)y); } #define DEF(type, m) OVERLOADABLE type rotate(type x, type y) { return __rotate_left(x, (type)(y & m)); } DEF(char, 7) DEF(uchar, 7) DEF(short, 15) DEF(ushort, 15) DEF(int, 31) DEF(uint, 31) DEF(long, 63) DEF(ulong, 63) #undef DEF OVERLOADABLE short __gen_ocl_upsample(short hi, short lo); OVERLOADABLE int __gen_ocl_upsample(int hi, int lo); OVERLOADABLE long __gen_ocl_upsample(long hi, long lo); OVERLOADABLE short upsample(char hi, uchar lo) { return __gen_ocl_upsample((short)hi, (short)lo); } OVERLOADABLE ushort upsample(uchar hi, uchar lo) { return __gen_ocl_upsample((short)hi, (short)lo); } OVERLOADABLE int upsample(short hi, ushort lo) { return __gen_ocl_upsample((int)hi, (int)lo); } OVERLOADABLE uint upsample(ushort hi, ushort lo) { return __gen_ocl_upsample((int)hi, (int)lo); } OVERLOADABLE long upsample(int hi, uint lo) { return __gen_ocl_upsample((long)hi, (long)lo); } OVERLOADABLE ulong upsample(uint hi, uint lo) { return __gen_ocl_upsample((long)hi, (long)lo); } OVERLOADABLE uint __gen_ocl_hadd(uint x, uint y); OVERLOADABLE uint __gen_ocl_rhadd(uint x, uint y); #define DEC DEF(char); DEF(uchar); DEF(short); DEF(ushort) #define DEF(type) OVERLOADABLE type hadd(type x, type y) { return (x + y) >> 1; } DEC #undef DEF #define DEF(type) OVERLOADABLE type rhadd(type x, type y) { return (x + y + 1) >> 1; } DEC #undef DEF #undef DEC OVERLOADABLE int hadd(int x, int y) { return (x < 0 && y > 0) || (x > 0 && y < 0) ? ((x + y) >> 1) : __gen_ocl_hadd((uint)x, (uint)y); } OVERLOADABLE uint hadd(uint x, uint y) { return __gen_ocl_hadd(x, y); } OVERLOADABLE int rhadd(int x, int y) { return (x < 0 && y > 0) || (x > 0 && y < 0) ? ((x + y + 1) >> 1) : __gen_ocl_rhadd((uint)x, (uint)y); } OVERLOADABLE uint rhadd(uint x, uint y) { return __gen_ocl_rhadd(x, y); } OVERLOADABLE ulong __gen_ocl_hadd(ulong x, ulong y); OVERLOADABLE ulong __gen_ocl_rhadd(ulong x, ulong y); OVERLOADABLE long hadd(long x, long y) { return (x < 0 && y > 0) || (x > 0 && y < 0) ? ((x + y) >> 1) : __gen_ocl_hadd((ulong)x, (ulong)y); } OVERLOADABLE ulong hadd(ulong x, ulong y) { return __gen_ocl_hadd(x, y); } OVERLOADABLE long rhadd(long x, long y) { return (x < 0 && y > 0) || (x > 0 && y < 0) ? ((x + y + 1) >> 1) : __gen_ocl_rhadd((ulong)x, (ulong)y); } OVERLOADABLE ulong rhadd(ulong x, ulong y) { return __gen_ocl_rhadd(x, y); } PURE CONST OVERLOADABLE char __gen_ocl_abs(char x); PURE CONST OVERLOADABLE short __gen_ocl_abs(short x); PURE CONST OVERLOADABLE int __gen_ocl_abs(int x); #define DEC(TYPE) OVERLOADABLE u##TYPE abs(TYPE x) { return (u##TYPE) __gen_ocl_abs(x); } DEC(int) DEC(short) DEC(char) #undef DEC OVERLOADABLE ulong abs(long x) { return x < 0 ? -x : x; } /* For unsigned types, do nothing. */ #define DEC(TYPE) OVERLOADABLE TYPE abs(TYPE x) { return x; } DEC(uint) DEC(ushort) DEC(uchar) DEC(ulong) #undef DEC /* Char and short type abs diff */ /* promote char and short to int and will be no module overflow */ #define DEC(TYPE, UTYPE) OVERLOADABLE UTYPE abs_diff(TYPE x, TYPE y) \ { return y > x ? 
(y -x) : (x - y); } DEC(char, uchar) DEC(uchar, uchar) DEC(short, ushort) DEC(ushort, ushort) DEC(int, uint) DEC(uint, uint) DEC(long, ulong) DEC(ulong, ulong) #undef DEC #define DECL_MIN_MAX_CLAMP(TYPE) \ OVERLOADABLE TYPE max(TYPE a, TYPE b) { \ return a > b ? a : b; \ } \ OVERLOADABLE TYPE min(TYPE a, TYPE b) { \ return a < b ? a : b; \ } \ OVERLOADABLE TYPE clamp(TYPE v, TYPE l, TYPE u) { \ return max(min(v, u), l); \ } DECL_MIN_MAX_CLAMP(int) DECL_MIN_MAX_CLAMP(short) DECL_MIN_MAX_CLAMP(char) DECL_MIN_MAX_CLAMP(uint) DECL_MIN_MAX_CLAMP(unsigned short) DECL_MIN_MAX_CLAMP(unsigned char) DECL_MIN_MAX_CLAMP(long) DECL_MIN_MAX_CLAMP(ulong) #undef DECL_MIN_MAX_CLAMP Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_common.tmpl.cl000664 001750 001750 00000007074 13161142102 023475 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "ocl_common.h" #include "ocl_float.h" #include "ocl_relational.h" ///////////////////////////////////////////////////////////////////////////// // Common Functions ///////////////////////////////////////////////////////////////////////////// PURE CONST OVERLOADABLE float __gen_ocl_fmax(float a, float b); PURE CONST OVERLOADABLE float __gen_ocl_fmin(float a, float b); PURE CONST OVERLOADABLE float __gen_ocl_lrp(float a, float b, float c); OVERLOADABLE float step(float edge, float x) { return x < edge ? 0.0 : 1.0; } OVERLOADABLE float max(float a, float b) { return __gen_ocl_fmax(a, b); } OVERLOADABLE float min(float a, float b) { return __gen_ocl_fmin(a, b); } OVERLOADABLE float mix(float x, float y, float a) { return __gen_ocl_lrp(a,y,x); //The lrp using a different order with mix } OVERLOADABLE float clamp(float v, float l, float u) { return max(min(v, u), l); } OVERLOADABLE float degrees(float radians) { return M_180_PI_F * radians; } OVERLOADABLE float radians(float degrees) { return (M_PI_F / 180) * degrees; } OVERLOADABLE float smoothstep(float e0, float e1, float x) { x = clamp((x - e0) / (e1 - e0), 0.f, 1.f); return x * x * (3 - 2 * x); } OVERLOADABLE float sign(float x) { // TODO: the best form of implementation is below, // But I find it hard to implement in Beignet now, // So I would put it in the TODO list. // cmp.ne.f0 null x:f 0.0:f // and ret:ud x:ud 0x80000000:ud //(+f0) or ret:ud ret:ud 0x3f800000:ud // cmp.ne.f0 null x:f x:f //(+f0) mov ret:f 0.0f union {float f; unsigned u;} ieee; ieee.f = x; unsigned k = ieee.u; float r = (k&0x80000000) ? -1.0f : 1.0f; // differentiate +0.0f -0.0f float s = 0.0f * r; s = (x == 0.0f) ? s : r; return isnan(x) ? 0.0f : s; } // Half float version. PURE CONST OVERLOADABLE half __gen_ocl_fmax(half a, half b); PURE CONST OVERLOADABLE half __gen_ocl_fmin(half a, half b); OVERLOADABLE half step(half edge, half x) { return x < edge ? 
0.0 : 1.0; } OVERLOADABLE half max(half a, half b) { return __gen_ocl_fmax(a, b); } OVERLOADABLE half min(half a, half b) { return __gen_ocl_fmin(a, b); } OVERLOADABLE half mix(half x, half y, half a) { return x + (y-x)*a; } OVERLOADABLE half clamp(half v, half l, half u) { return max(min(v, u), l); } OVERLOADABLE half degrees(half radians) { return ((half)(M_180_PI_F)) * radians; } OVERLOADABLE half radians(half degrees) { return ((half)(M_PI_F / 180)) * degrees; } OVERLOADABLE half smoothstep(half e0, half e1, half x) { x = clamp((x - e0) / (e1 - e0), (half)0.0, (half)1.0); return x * x * (3 - 2 * x); } OVERLOADABLE half sign(half x) { union {half h; ushort u;} ieee; ieee.h = x; unsigned k = ieee.u; half r = (k&0x8000) ? -1.0 : 1.0; // differentiate +0.0f -0.0f half s = (half)0.0 * r; s = (x == (half)0.0) ? s : r; return isnan(x) ? 0.0 : s; } Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_relational.tmpl.cl000664 001750 001750 00000013313 13161142102 024330 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "ocl_relational.h" OVERLOADABLE int isequal(float x, float y) { return x == y; } OVERLOADABLE int isnotequal(float x, float y) { return x != y; } OVERLOADABLE int isgreater(float x, float y) { return x > y; } OVERLOADABLE int isgreaterequal(float x, float y) { return x >= y; } OVERLOADABLE int isless(float x, float y) { return x < y; } OVERLOADABLE int islessequal(float x, float y) { return x <= y; } OVERLOADABLE int islessgreater(float x, float y) { return (x < y) || (x > y); } OVERLOADABLE int isfinite(float x) { union { uint u; float f; } u; u.f = x; return (u.u & 0x7FFFFFFF) < 0x7F800000; } OVERLOADABLE int isinf(float x) { union { uint u; float f; } u; u.f = x; return (u.u & 0x7FFFFFFF) == 0x7F800000; } OVERLOADABLE int isnan(float x) { return x != x; } OVERLOADABLE int isnormal(float x) { union { uint u; float f; } u; u.f = x; u.u &= 0x7FFFFFFF; return (u.u < 0x7F800000) && (u.u >= 0x800000); } OVERLOADABLE int isordered(float x, float y) { return isequal(x, x) && isequal(y, y); } OVERLOADABLE int isunordered(float x, float y) { return isnan(x) || isnan(y); } OVERLOADABLE int signbit(float x) { union { uint u; float f; } u; u.f = x; return u.u >> 31; } // Half float version. 
OVERLOADABLE int isequal(half x, half y) { return x == y; } OVERLOADABLE int isnotequal(half x, half y) { return x != y; } OVERLOADABLE int isgreater(half x, half y) { return x > y; } OVERLOADABLE int isgreaterequal(half x, half y) { return x >= y; } OVERLOADABLE int isless(half x, half y) { return x < y; } OVERLOADABLE int islessequal(half x, half y) { return x <= y; } OVERLOADABLE int islessgreater(half x, half y) { return (x < y) || (x > y); } OVERLOADABLE int isfinite(half x) { union { ushort u; half h; } u; u.h = x; return (u.u & 0x7FFF) < 0x7C00; } OVERLOADABLE int isinf(half x) { union { ushort u; half h; } u; u.h = x; return (u.u & 0x7FFF) == 0x7C00; } OVERLOADABLE int isnan(half x) { return x != x; } OVERLOADABLE int isnormal(half x) { union { ushort u; half h; } u; u.h = x; u.u &= 0x7FFF; return (u.u < 0x7C00) && (u.u >= 0x400); } OVERLOADABLE int isordered(half x, half y) { return isequal(x, x) && isequal(y, y); } OVERLOADABLE int isunordered(half x, half y) { return isnan(x) || isnan(y); } OVERLOADABLE int signbit(half x) { union { ushort u; half h; } u; u.h = x; return u.u >> 15; } // any #define DEC1(type) OVERLOADABLE int any(type a) { return a<0; } #define DEC2(type) OVERLOADABLE int any(type a) { return a.s0<0 || a.s1<0; } #define DEC3(type) OVERLOADABLE int any(type a) { return a.s0<0 || a.s1<0 || a.s2<0; } #define DEC4(type) OVERLOADABLE int any(type a) { return a.s0<0 || a.s1<0 || a.s2<0 || a.s3<0; } #define DEC8(type) OVERLOADABLE int any(type a) { return a.s0<0 || a.s1<0 || a.s2<0 || a.s3<0 || a.s4<0 || a.s5<0 || a.s6<0 || a.s7<0; } #define DEC16(type) OVERLOADABLE int any(type a) { return a.s0<0 || a.s1<0 || a.s2<0 || a.s3<0 || a.s4<0 || a.s5<0 || a.s6<0 || a.s7<0 || a.s8<0 || a.s9<0 || a.sA<0 || a.sB<0 || a.sC<0 || a.sD<0 || a.sE<0 || a.sF<0; } DEC1(char); DEC1(short); DEC1(int); DEC1(long); #define DEC(n) DEC##n(char##n); DEC##n(short##n); DEC##n(int##n); DEC##n(long##n); DEC(2); DEC(3); DEC(4); DEC(8); DEC(16); #undef DEC #undef DEC1 #undef DEC2 #undef DEC3 #undef DEC4 #undef DEC8 #undef DEC16 // all #define DEC1(type) OVERLOADABLE int all(type a) { return a<0; } #define DEC2(type) OVERLOADABLE int all(type a) { return a.s0<0 && a.s1<0; } #define DEC3(type) OVERLOADABLE int all(type a) { return a.s0<0 && a.s1<0 && a.s2<0; } #define DEC4(type) OVERLOADABLE int all(type a) { return a.s0<0 && a.s1<0 && a.s2<0 && a.s3<0; } #define DEC8(type) OVERLOADABLE int all(type a) { return a.s0<0 && a.s1<0 && a.s2<0 && a.s3<0 && a.s4<0 && a.s5<0 && a.s6<0 && a.s7<0; } #define DEC16(type) OVERLOADABLE int all(type a) { return a.s0<0 && a.s1<0 && a.s2<0 && a.s3<0 && a.s4<0 && a.s5<0 && a.s6<0 && a.s7<0 && a.s8<0 && a.s9<0 && a.sA<0 && a.sB<0 && a.sC<0 && a.sD<0 && a.sE<0 && a.sF<0; } DEC1(char); DEC1(short); DEC1(int); DEC1(long); #define DEC(n) DEC##n(char##n); DEC##n(short##n); DEC##n(int##n); DEC##n(long##n); DEC(2); DEC(3); DEC(4); DEC(8); DEC(16); #undef DEC #undef DEC1 #undef DEC2 #undef DEC3 #undef DEC4 #undef DEC8 #undef DEC16 #define DEF(type) OVERLOADABLE type bitselect(type a, type b, type c) { return (a & ~c) | (b & c); } DEF(char); DEF(uchar); DEF(short); DEF(ushort); DEF(int); DEF(uint) DEF(long); DEF(ulong) #undef DEF OVERLOADABLE float bitselect(float a, float b, float c) { return as_float(bitselect(as_int(a), as_int(b), as_int(c))); } // select #define DEF(TYPE1, TYPE2) \ OVERLOADABLE TYPE1 select(TYPE1 src0, TYPE1 src1, TYPE2 cond) { \ return cond ? 
src1 : src0; \ } DEF(char, char) DEF(char, uchar) DEF(uchar, char) DEF(uchar, uchar) DEF(short, short) DEF(short, ushort) DEF(ushort, short) DEF(ushort, ushort) DEF(int, int) DEF(int, uint) DEF(uint, int) DEF(uint, uint) DEF(long, long) DEF(long, ulong) DEF(ulong, long) DEF(ulong, ulong) DEF(float, int) DEF(float, uint) #undef DEF Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_common.tmpl.h000664 001750 001750 00000003416 13161142102 023322 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #ifndef __OCL_COMMON_H__ #define __OCL_COMMON_H__ #include "ocl_types.h" ///////////////////////////////////////////////////////////////////////////// // Common Functions ///////////////////////////////////////////////////////////////////////////// OVERLOADABLE float step(float edge, float x); OVERLOADABLE float max(float a, float b); OVERLOADABLE float min(float a, float b); OVERLOADABLE float mix(float x, float y, float a); OVERLOADABLE float clamp(float v, float l, float u); OVERLOADABLE float degrees(float radians); OVERLOADABLE float radians(float degrees); OVERLOADABLE float smoothstep(float e0, float e1, float x); OVERLOADABLE float sign(float x); // Half half version. OVERLOADABLE half step(half edge, half x); OVERLOADABLE half max(half a, half b); OVERLOADABLE half min(half a, half b); OVERLOADABLE half mix(half x, half y, half a); OVERLOADABLE half clamp(half v, half l, half u); OVERLOADABLE half degrees(half radians); OVERLOADABLE half radians(half degrees); OVERLOADABLE half smoothstep(half e0, half e1, half x); OVERLOADABLE half sign(half x); Beignet-1.3.2-Source/backend/src/libocl/tmpl/ocl_math_20.tmpl.h000664 001750 001750 00000017005 13161142102 023263 0ustar00yryr000000 000000 /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __OCL_MATH_20_H__ #define __OCL_MATH_20_H__ #include "ocl_types.h" OVERLOADABLE float cospi(float x); OVERLOADABLE float cosh(float x); OVERLOADABLE float acos(float x); OVERLOADABLE float acospi(float x); OVERLOADABLE float acosh(float x); OVERLOADABLE float sinpi(float x); OVERLOADABLE float sinh(float x); OVERLOADABLE float asin(float x); OVERLOADABLE float asinpi(float x); OVERLOADABLE float asinh(float x); OVERLOADABLE float tanpi(float x); OVERLOADABLE float tanh(float x); OVERLOADABLE float atan(float x); OVERLOADABLE float atan2(float y, float x); OVERLOADABLE float atan2pi(float y, float x); OVERLOADABLE float atanpi(float x); OVERLOADABLE float atanh(float x); OVERLOADABLE float cbrt(float x); OVERLOADABLE float rint(float x); OVERLOADABLE float copysign(float x, float y); OVERLOADABLE float erf(float x); OVERLOADABLE float erfc(float x); OVERLOADABLE float fmod (float x, float y); OVERLOADABLE float remainder(float x, float p); OVERLOADABLE float ldexp(float x, int n); OVERLOADABLE float powr(float x, float y); OVERLOADABLE float pow(float x, float y); //no pow, we use powr instead OVERLOADABLE float fabs(float x); OVERLOADABLE float trunc(float x); OVERLOADABLE float round(float x); OVERLOADABLE float floor(float x); OVERLOADABLE float ceil(float x); OVERLOADABLE float log(float x); OVERLOADABLE float log2(float x); OVERLOADABLE float log10(float x); OVERLOADABLE float exp(float x); OVERLOADABLE float exp10(float x); OVERLOADABLE float expm1(float x); OVERLOADABLE float fmin(float a, float b); OVERLOADABLE float fmax(float a, float b); OVERLOADABLE float fma(float a, float b, float c); OVERLOADABLE float fdim(float x, float y); OVERLOADABLE float maxmag(float x, float y); OVERLOADABLE float minmag(float x, float y); OVERLOADABLE float exp2(float x); OVERLOADABLE float mad(float a, float b, float c); OVERLOADABLE float sin(float x); OVERLOADABLE float cos(float x); OVERLOADABLE float tan(float x); OVERLOADABLE float tgamma(float x); OVERLOADABLE float lgamma(float x); OVERLOADABLE float lgamma_r(float x, int *signgamp); OVERLOADABLE float log1p(float x); OVERLOADABLE float logb(float x); OVERLOADABLE int ilogb(float x); OVERLOADABLE float nan(uint code); OVERLOADABLE float sincos(float x, float *cosval); OVERLOADABLE float sqrt(float x); OVERLOADABLE float rsqrt(float x); OVERLOADABLE float frexp(float x, int *exp); OVERLOADABLE float nextafter(float x, float y); OVERLOADABLE float modf(float x, float *i); OVERLOADABLE float hypot(float x, float y); OVERLOADABLE float fract(float x, float *p); OVERLOADABLE float remquo(float x, float y, int *quo); OVERLOADABLE float pown(float x, int n); OVERLOADABLE float rootn(float x, int n); // native OVERLOADABLE float native_cos(float x); OVERLOADABLE float native_divide(float x, float y); OVERLOADABLE float native_exp(float x); OVERLOADABLE float native_exp2(float x); OVERLOADABLE float native_exp10(float x); OVERLOADABLE float native_log(float x); OVERLOADABLE float native_log2(float x); OVERLOADABLE float native_log10(float x); OVERLOADABLE float native_powr(float x, float y); OVERLOADABLE float native_recip(float x); OVERLOADABLE float native_rsqrt(float x); OVERLOADABLE float native_sin(float x); OVERLOADABLE float native_sqrt(float x); OVERLOADABLE float native_tan(float x); // Half float version. 
OVERLOADABLE half cospi(half x); OVERLOADABLE half cosh(half x); OVERLOADABLE half acos(half x); OVERLOADABLE half acospi(half x); OVERLOADABLE half acosh(half x); OVERLOADABLE half sinpi(half x); OVERLOADABLE half sinh(half x); OVERLOADABLE half asin(half x); OVERLOADABLE half asinpi(half x); OVERLOADABLE half asinh(half x); OVERLOADABLE half tanpi(half x); OVERLOADABLE half tanh(half x); OVERLOADABLE half atan(half x); OVERLOADABLE half atan2(half y, half x); OVERLOADABLE half atan2pi(half y, half x); OVERLOADABLE half atanpi(half x); OVERLOADABLE half atanh(half x); OVERLOADABLE half cbrt(half x); OVERLOADABLE half rint(half x); OVERLOADABLE half copysign(half x, half y); OVERLOADABLE half erf(half x); OVERLOADABLE half erfc(half x); OVERLOADABLE half fmod (half x, half y); OVERLOADABLE half remainder(half x, half p); OVERLOADABLE half ldexp(half x, int n); OVERLOADABLE half powr(half x, half y); OVERLOADABLE half pow(half x, half y); //no pow, we use powr instead OVERLOADABLE half fabs(half x); OVERLOADABLE half trunc(half x); OVERLOADABLE half round(half x); OVERLOADABLE half floor(half x); OVERLOADABLE half ceil(half x); OVERLOADABLE half log(half x); OVERLOADABLE half log2(half x); OVERLOADABLE half log10(half x); OVERLOADABLE half exp(half x); OVERLOADABLE half exp10(half x); OVERLOADABLE half expm1(half x); OVERLOADABLE half fmin(half a, half b); OVERLOADABLE half fmax(half a, half b); OVERLOADABLE half fma(half a, half b, half c); OVERLOADABLE half fdim(half x, half y); OVERLOADABLE half maxmag(half x, half y); OVERLOADABLE half minmag(half x, half y); OVERLOADABLE half exp2(half x); OVERLOADABLE half mad(half a, half b, half c); OVERLOADABLE half sin(half x); OVERLOADABLE half cos(half x); OVERLOADABLE half tan(half x); OVERLOADABLE half tgamma(half x); OVERLOADABLE half lgamma(half x); OVERLOADABLE half lgamma_r(half x, int *signgamp); OVERLOADABLE half log1p(half x); OVERLOADABLE half logb(half x); OVERLOADABLE int ilogb(half x); OVERLOADABLE half nan(ushort code); OVERLOADABLE half sincos(half x, half *cosval); OVERLOADABLE half sqrt(half x); OVERLOADABLE half rsqrt(half x); OVERLOADABLE half frexp(half x, int *exp); OVERLOADABLE half nextafter(half x, half y); OVERLOADABLE half modf(half x, half *i); OVERLOADABLE half hypot(half x, half y); OVERLOADABLE half fract(half x, half *p); OVERLOADABLE half remquo(half x, half y, int *quo); OVERLOADABLE half pown(half x, int n); OVERLOADABLE half rootn(half x, int n); // native half OVERLOADABLE half native_cos(half x); OVERLOADABLE half native_divide(half x, half y); OVERLOADABLE half native_exp(half x); OVERLOADABLE half native_exp2(half x); OVERLOADABLE half native_exp10(half x); OVERLOADABLE half native_log(half x); OVERLOADABLE half native_log2(half x); OVERLOADABLE half native_log10(half x); OVERLOADABLE half native_powr(half x, half y); OVERLOADABLE half native_recip(half x); OVERLOADABLE half native_rsqrt(half x); OVERLOADABLE half native_sin(half x); OVERLOADABLE half native_sqrt(half x); OVERLOADABLE half native_tan(half x); // half accuracy OVERLOADABLE float half_cos(float x); OVERLOADABLE float half_divide(float x, float y); OVERLOADABLE float half_exp(float x); OVERLOADABLE float half_exp2(float x); OVERLOADABLE float half_exp10(float x); OVERLOADABLE float half_log(float x); OVERLOADABLE float half_log2(float x); OVERLOADABLE float half_log10(float x); OVERLOADABLE float half_powr(float x, float y); OVERLOADABLE float half_recip(float x); OVERLOADABLE float half_rsqrt(float x); OVERLOADABLE float half_sin(float x); 
OVERLOADABLE float half_sqrt(float x); OVERLOADABLE float half_tan(float x); Beignet-1.3.2-Source/backend/src/libocl/script/000775 001750 001750 00000000000 13174334761 020417 5ustar00yryr000000 000000 Beignet-1.3.2-Source/backend/src/libocl/script/ocl_as.sh000775 001750 001750 00000010226 13161142102 022175 0ustar00yryr000000 000000 #! /bin/sh -e echo ' /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * */ ' if [ $1"a" = "-pa" ]; then echo "#ifndef __OCL_AS_H__" echo "#define __OCL_AS_H__" echo "#include \"ocl_types.h\"" echo else echo "#include \"ocl_as.h\"" echo fi # Supported base types and their lengths TYPES="long:8 ulong:8 int:4 uint:4 short:2 ushort:2 char:1 uchar:1 double:8 float:4 half:2" # Supported vector lengths VECTOR_LENGTHS="1 2 3 4 8 16" ROUNDING_MODES="rte rtz rtp rtn" # Generate list of union sizes for type in $TYPES; do size=`IFS=:; set -- dummy $type; echo $3` for vector_length in $VECTOR_LENGTHS; do if test $vector_length -eq 3; then continue; fi union_sizes="$union_sizes `expr $vector_length \* $size`" done done union_sizes="`echo $union_sizes | tr ' ' '\n' | sort -n | uniq`" # For each union size for union_size in $union_sizes; do if [ $1"a" != "-pa" ]; then # Define a union that contains all vector types that have the same size as the union unionname="union _type_cast_${union_size}_b" echo "$unionname {" for type in $TYPES; do basetype=`IFS=:; set -- dummy $type; echo $2` basesize=`IFS=:; set -- dummy $type; echo $3` for vector_length in $VECTOR_LENGTHS; do if test $vector_length -eq 3; then vector_size_length="4" else vector_size_length=$vector_length; fi vector_size_in_union="`expr $vector_size_length \* $basesize`" if test $union_size -ne $vector_size_in_union; then continue fi if test $vector_length -eq 1; then vectortype=$basetype else vectortype=$basetype$vector_length fi echo " $vectortype _$vectortype;" done done echo "};" echo fi # For each tuple of vector types that has the same size as the current union size, # define an as_* function that converts types without changing binary representation.
for ftype in $TYPES; do fbasetype=`IFS=:; set -- dummy $ftype; echo $2` fbasesize=`IFS=:; set -- dummy $ftype; echo $3` for fvector_length in $VECTOR_LENGTHS; do if test $fvector_length -eq 3; then fvector_size_length="4" else fvector_size_length=$fvector_length; fi fvector_size_in_union="`expr $fvector_size_length \* $fbasesize`" if test $union_size -ne $fvector_size_in_union; then continue fi if test $fvector_length -eq 1; then fvectortype=$fbasetype else fvectortype=$fbasetype$fvector_length fi for ttype in $TYPES; do tbasetype=`IFS=:; set -- dummy $ttype; echo $2` tbasesize=`IFS=:; set -- dummy $ttype; echo $3` if test $fbasetype = $tbasetype; then continue fi for tvector_length in $VECTOR_LENGTHS; do if test $tvector_length -eq 3; then tvector_size_length="4" else tvector_size_length=$tvector_length; fi tvector_size_in_union="`expr $tvector_size_length \* $tbasesize`" if test $union_size -ne $tvector_size_in_union; then continue fi if test $tvector_length -eq 1; then tvectortype=$tbasetype else tvectortype=$tbasetype$tvector_length fi if [ $1"a" = "-pa" ]; then echo "OVERLOADABLE $tvectortype as_$tvectortype($fvectortype v);" else echo "OVERLOADABLE $tvectortype as_$tvectortype($fvectortype v) {" echo " $unionname u;" echo " u._$fvectortype = v;" echo " return u._$tvectortype;" echo "}" echo fi done done done done done if [ $1"a" = "-pa" ]; then echo "#endif /* __OCL_AS_H__ */" fi Beignet-1.3.2-Source/backend/src/libocl/script/ocl_simd.def000664 001750 001750 00000001245 13161142102 022650 0ustar00yryr000000 000000 ##simd level functions floatn intel_sub_group_shuffle(floatn x, uint c) intn intel_sub_group_shuffle(intn x, uint c) uintn intel_sub_group_shuffle(uintn x, uint c) floatn intel_sub_group_shuffle_down(floatn x, floatn y, uint c) intn intel_sub_group_shuffle_down(intn x, intn y, uint c) uintn intel_sub_group_shuffle_down(uintn x, uintn y, uint c) floatn intel_sub_group_shuffle_up(floatn x, floatn y, uint c) intn intel_sub_group_shuffle_up(intn x, intn y, uint c) uintn intel_sub_group_shuffle_up(uintn x, uintn y, uint c) floatn intel_sub_group_shuffle_xor(floatn x, uint c) intn intel_sub_group_shuffle_xor(intn x, uint c) uintn intel_sub_group_shuffle_xor(uintn x, uint c) Beignet-1.3.2-Source/backend/src/libocl/script/ocl_common.def000664 001750 001750 00000002252 13161142102 023203 0ustar00yryr000000 000000 ##common gentype clamp (gentype x, gentype minval, gentype maxval) gentypef clamp (gentypef x, float minval, float maxval) gentypeh clamp (gentypeh x, half minval, half maxval) gentyped clamp (gentyped x, double minval, double maxval) gentype degrees (gentype radians) gentype max (gentype x, gentype y) gentypef max (gentypef x, float y) gentypeh max (gentypeh x, half y) gentyped max (gentyped x, double y) gentype min (gentype x, gentype y) gentypef min (gentypef x, float y) gentypeh min (gentypeh x, half y) gentyped min (gentyped x, double y) gentype mix (gentype x, gentype y, gentype a) gentypef mix (gentypef x, gentypef y, float a) gentypeh mix (gentypeh x, gentypeh y, half a) gentyped mix (gentyped x, gentyped y, double a) gentype radians (gentype degrees) gentype step (gentype edge, gentype x) gentypef step (float edge, gentypef x) gentypeh step (half edge, gentypeh x) gentyped step (double edge, gentyped x) gentype smoothstep (gentype edge0, gentype edge1, gentype x) gentypef smoothstep (float edge0, float edge1, gentypef x) gentypeh smoothstep (half edge0, half edge1, gentypeh x) gentyped smoothstep (double edge0, double edge1, gentyped x) gentype sign (gentype x) 
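For reference, ocl_as.sh above emits one union per distinct byte size and, for every pair of equally sized types, an as_* reinterpretation function that round-trips through that union; punning through a union rather than a pointer cast keeps the reinterpretation well defined for the compiler. Below is an illustrative sketch of the 4-byte case, hand-written to match the generator's pattern rather than captured verbatim from a run:

union _type_cast_4_b {
  int _int;
  uint _uint;
  short2 _short2;
  ushort2 _ushort2;
  char3 _char3;    /* a 3-element vector occupies 4 bytes, so it lands here too */
  char4 _char4;
  uchar3 _uchar3;
  uchar4 _uchar4;
  float _float;
  half2 _half2;
};

OVERLOADABLE int as_int(float v) {
  union _type_cast_4_b u;
  u._float = v;    /* store through one member ... */
  return u._int;   /* ... read back the same 4 bytes as another type */
}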
Beignet-1.3.2-Source/backend/src/libocl/script/ocl_convert.sh000775 001750 001750 00000046166 13161142102 023262 0ustar00yryr000000 000000 #! /bin/sh -e echo ' /* * Copyright © 2012 - 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * */ ' if [ $1"a" = "-pa" ]; then echo "#ifndef __OCL_CONVERT_H__" echo "#define __OCL_CONVERT_H__" echo "#include \"ocl_types.h\"" echo else echo "#include \"ocl_convert.h\"" echo fi # Supported base types and their lengths TYPES="long:8 ulong:8 int:4 uint:4 short:2 ushort:2 char:1 uchar:1 double:8 float:4 half:2" # Supported vector lengths VECTOR_LENGTHS="1 2 3 4 8 16" ROUNDING_MODES="rte rtz rtp rtn" # For all vector lengths and types, generate conversion functions for vector_length in $VECTOR_LENGTHS; do if test $vector_length -eq 1; then for ftype in $TYPES; do fbasetype=`IFS=:; set -- dummy $ftype; echo $2` for ttype in $TYPES; do tbasetype=`IFS=:; set -- dummy $ttype; echo $2` if [ $1"a" = "-pa" ]; then echo "OVERLOADABLE $tbasetype convert_$tbasetype($fbasetype v);" else echo "OVERLOADABLE $tbasetype convert_$tbasetype($fbasetype v) {" echo " return ($tbasetype)v;" echo "}" echo fi done done else for ftype in $TYPES; do fbasetype=`IFS=:; set -- dummy $ftype; echo $2` for ttype in $TYPES; do tbasetype=`IFS=:; set -- dummy $ttype; echo $2` if test $fbasetype = $tbasetype; then if test $vector_length -gt 1; then fvectortype=$fbasetype$vector_length tvectortype=$tbasetype$vector_length if [ $1"a" = "-pa" ]; then echo "OVERLOADABLE $tvectortype convert_$tvectortype($fvectortype v);" else echo "OVERLOADABLE $tvectortype convert_$tvectortype($fvectortype v) { return v; }" fi else if [ $1"a" = "-pa" ]; then echo "OVERLOADABLE $tbasetype convert_$tbasetype($fbasetype v);" else echo "OVERLOADABLE $tbasetype convert_$tbasetype($fbasetype v) { return v; }" fi fi continue fi fvectortype=$fbasetype$vector_length tvectortype=$tbasetype$vector_length construct="($tbasetype)(v.s0)" if test $vector_length -gt 1; then construct="$construct, ($tbasetype)(v.s1)" fi if test $vector_length -gt 2; then construct="$construct, ($tbasetype)(v.s2)" fi if test $vector_length -gt 3; then construct="$construct, ($tbasetype)(v.s3)" fi if test $vector_length -gt 4; then construct="$construct, ($tbasetype)(v.s4)" construct="$construct, ($tbasetype)(v.s5)" construct="$construct, ($tbasetype)(v.s6)" construct="$construct, ($tbasetype)(v.s7)" fi if test $vector_length -gt 8; then construct="$construct, ($tbasetype)(v.s8)" construct="$construct, ($tbasetype)(v.s9)" construct="$construct, ($tbasetype)(v.sA)" construct="$construct, ($tbasetype)(v.sB)" construct="$construct, ($tbasetype)(v.sC)" construct="$construct, ($tbasetype)(v.sD)" construct="$construct, ($tbasetype)(v.sE)" construct="$construct, ($tbasetype)(v.sF)" fi if [ $1"a" = "-pa" ]; then echo "OVERLOADABLE $tvectortype convert_$tvectortype($fvectortype v);" else echo "OVERLOADABLE $tvectortype convert_$tvectortype($fvectortype v) {" echo " 
return ($tvectortype)($construct);" echo "}" echo fi done done fi done echo ' /* The saturating conversions supported directly by HW. */ #define DEF(DSTTYPE, SRCTYPE) \ OVERLOADABLE DSTTYPE convert_ ## DSTTYPE ## _sat(SRCTYPE x); DEF(char, uchar); DEF(char, short); DEF(char, ushort); DEF(char, int); DEF(char, uint); DEF(char, float); DEF(uchar, char); DEF(uchar, short); DEF(uchar, ushort); DEF(uchar, int); DEF(uchar, uint); DEF(uchar, float); DEF(short, ushort); DEF(short, int); DEF(short, uint); DEF(short, float); DEF(ushort, short); DEF(ushort, int); DEF(ushort, uint); DEF(ushort, float); DEF(int, uint); DEF(int, float); DEF(uint, int); DEF(uint, float); DEF(char, half); DEF(uchar, half); DEF(short, half); DEF(ushort, half); DEF(int, half); DEF(uint, half); #undef DEF ' if [ $1"a" = "-pa" ]; then echo "#define DEF(DSTTYPE, SRCTYPE, MIN, MAX) OVERLOADABLE DSTTYPE convert_ ## DSTTYPE ## _sat(SRCTYPE x);" else echo ' #define DEF(DSTTYPE, SRCTYPE, MIN, MAX) \ OVERLOADABLE DSTTYPE convert_ ## DSTTYPE ## _sat(SRCTYPE x) { \ x = x >= MAX ? MAX : x; \ return x <= MIN ? (DSTTYPE)MIN : (DSTTYPE)x; \ } ' fi echo ' DEF(char, long, -128, 127); DEF(uchar, long, 0, 255); DEF(short, long, -32768, 32767); DEF(ushort, long, 0, 65535); DEF(int, long, -0x7fffffff-1, 0x7fffffff); DEF(uint, long, 0, 0xffffffffu); #undef DEF ' if [ $1"a" = "-pa" ]; then echo " #define DEF(DSTTYPE, SRCTYPE, SRC_MIN, SRC_MAX, DST_MIN, DST_MAX) \ OVERLOADABLE DSTTYPE convert_ ## DSTTYPE ## _sat(SRCTYPE x);" else echo ' //converting float to long/ulong must take care of overflow; if it overflows, the value is undefined. #define DEF(DSTTYPE, SRCTYPE, SRC_MIN, SRC_MAX, DST_MIN, DST_MAX) \ OVERLOADABLE DSTTYPE convert_ ## DSTTYPE ## _sat(SRCTYPE x) { \ DSTTYPE y = x >= SRC_MAX ? DST_MAX : (DSTTYPE)x; \ return x <= SRC_MIN ? DST_MIN : y; \ } ' fi echo ' DEF(long, float, -0x1.0p63, 0x1.0p63, 0x8000000000000000, 0x7fffffffffffffff); DEF(ulong, float, 0, 0x1.0p64, 0, 0xffffffffffffffff); #undef DEF ' if [ $1"a" = "-pa" ]; then echo "#define DEF(DSTTYPE, SRCTYPE, MAX) OVERLOADABLE DSTTYPE convert_ ## DSTTYPE ## _sat(SRCTYPE x);" else echo ' #define DEF(DSTTYPE, SRCTYPE, MAX) \ OVERLOADABLE DSTTYPE convert_ ## DSTTYPE ## _sat(SRCTYPE x) { \ return x >= MAX ? (DSTTYPE)MAX : x; \ } ' fi echo ' DEF(char, ulong, 127); DEF(uchar, ulong, 255); DEF(short, ulong, 32767); DEF(ushort, ulong, 65535); DEF(int, ulong, 0x7fffffff); DEF(uint, ulong, 0xffffffffu); #undef DEF ' if [ $1"a" = "-pa" ]; then echo "OVERLOADABLE long convert_long_sat(ulong x);" else echo ' OVERLOADABLE long convert_long_sat(ulong x) { ulong MAX = 0x7ffffffffffffffful; return x >= MAX ? MAX : x; } ' fi if [ $1"a" = "-pa" ]; then echo "#define DEF(DSTTYPE, SRCTYPE) OVERLOADABLE DSTTYPE convert_ ## DSTTYPE ## _sat(SRCTYPE x);" else echo ' #define DEF(DSTTYPE, SRCTYPE) \ OVERLOADABLE DSTTYPE convert_ ## DSTTYPE ## _sat(SRCTYPE x) { \ return x <= 0 ?
0 : x; \ } ' fi echo ' DEF(ushort, char); DEF(uint, char); DEF(uint, short); DEF(ulong, char); DEF(ulong, short); DEF(ulong, int); DEF(ulong, long); #undef DEF ' if [ $1"a" = "-pa" ]; then echo "#define DEF(DSTTYPE, SRCTYPE) OVERLOADABLE DSTTYPE convert_ ## DSTTYPE ## _sat(SRCTYPE x);" else echo ' #define DEF(DSTTYPE, SRCTYPE) \ OVERLOADABLE DSTTYPE convert_ ## DSTTYPE ## _sat(SRCTYPE x) { \ return x; \ } ' fi echo ' DEF(char, char); DEF(uchar, uchar); DEF(short, char); DEF(short, uchar); DEF(short, short); DEF(ushort, uchar); DEF(ushort, ushort); DEF(int, char); DEF(int, uchar); DEF(int, short); DEF(int, ushort); DEF(int, int); DEF(uint, uchar); DEF(uint, ushort); DEF(uint, uint); DEF(long, char); DEF(long, uchar); DEF(long, short); DEF(long, ushort); DEF(long, int); DEF(long, uint); DEF(long, long); DEF(ulong, uchar); DEF(ulong, ushort); DEF(ulong, uint); DEF(ulong, ulong); #undef DEF ' # for half to long if [ $1"a" = "-pa" ]; then echo ' OVERLOADABLE long convert_long_sat(half x); OVERLOADABLE ulong convert_ulong_sat(half x); ' else echo ' union _type_half_and_ushort { half hf; ushort us; }; OVERLOADABLE long convert_long_sat(half x) { union _type_half_and_ushort u; u.hf = x; if (u.us == 0x7C00) // +inf return 0x7FFFFFFFFFFFFFFF; if (u.us == 0xFC00) // -inf return 0x8000000000000000; return (long)x; } OVERLOADABLE ulong convert_ulong_sat(half x) { union _type_half_and_ushort u; u.hf = x; if (u.us == 0x7C00) // +inf return 0xFFFFFFFFFFFFFFFF; if (x < (half)0.0) { return 0; } return (ulong)x; }' fi # vector convert_DSTTYPE_sat function for vector_length in $VECTOR_LENGTHS; do if test $vector_length -eq 1; then continue; fi for ftype in $TYPES; do fbasetype=`IFS=:; set -- dummy $ftype; echo $2` if test $fbasetype = "double"; then continue; fi for ttype in $TYPES; do tbasetype=`IFS=:; set -- dummy $ttype; echo $2` if test $tbasetype = "double" -o $tbasetype = "float" -o $tbasetype = "half" ; then continue; fi fvectortype=$fbasetype$vector_length tvectortype=$tbasetype$vector_length conv="convert_${tbasetype}_sat" construct="$conv(v.s0)" if test $vector_length -gt 1; then construct="$construct, $conv(v.s1)" fi if test $vector_length -gt 2; then construct="$construct, $conv(v.s2)" fi if test $vector_length -gt 3; then construct="$construct, $conv(v.s3)" fi if test $vector_length -gt 4; then construct="$construct, $conv(v.s4)" construct="$construct, $conv(v.s5)" construct="$construct, $conv(v.s6)" construct="$construct, $conv(v.s7)" fi if test $vector_length -gt 8; then construct="$construct, $conv(v.s8)" construct="$construct, $conv(v.s9)" construct="$construct, $conv(v.sA)" construct="$construct, $conv(v.sB)" construct="$construct, $conv(v.sC)" construct="$construct, $conv(v.sD)" construct="$construct, $conv(v.sE)" construct="$construct, $conv(v.sF)" fi if [ $1"a" = "-pa" ]; then echo "OVERLOADABLE $tvectortype convert_${tvectortype}_sat($fvectortype v);" else echo "OVERLOADABLE $tvectortype convert_${tvectortype}_sat($fvectortype v) {" echo " return ($tvectortype)($construct);" echo "}" echo fi done done done if [ $1"a" != "-pa" ]; then echo ' CONST float __gen_ocl_rndz(float x) __asm("llvm.trunc" ".f32"); CONST float __gen_ocl_rnde(float x) __asm("llvm.rint" ".f32"); CONST float __gen_ocl_rndu(float x) __asm("llvm.ceil" ".f32"); CONST float __gen_ocl_rndd(float x) __asm("llvm.floor" ".f32"); OVERLOADABLE float __convert_float_rtz(long x) { union { uint u; float f; } u; u.f = x; long l = u.f; if((l > x && x > 0) || x >= 0x7fffffc000000000 || (l < x && x < 0)) { u.u -= 1; } return u.f; } 
OVERLOADABLE float __convert_float_rtp(long x) { union { uint u; float f; } u; u.f = x; long l = u.f; //can not use u.f < x if(l < x && x < 0x7fffffc000000000) { if(x > 0) u.u = u.u + 1; else u.u = u.u - 1; } return u.f; } OVERLOADABLE float __convert_float_rtn(long x) { union { uint u; float f; } u; u.f = x; long l = u.f; //avoid overflow if(l > x || x >= 0x7fffffc000000000) { if(x > 0) u.u = u.u - 1; else u.u = u.u + 1; } return u.f; } OVERLOADABLE float __convert_float_rtz(ulong x) { union { uint u; float f; } u; u.f = x; ulong l = u.f; if(l > x || x >= 0xffffff8000000000) u.u -= 1; return u.f; } OVERLOADABLE float __convert_float_rtp(ulong x) { union { uint u; float f; } u; u.f = x; ulong l = u.f; //can not use u.f < x if(l < x && x < 0xffffff8000000000) u.u = u.u + 1; return u.f; } OVERLOADABLE float __convert_float_rtn(ulong x) { return __convert_float_rtz(x); } OVERLOADABLE float __convert_float_rtz(int x) { union { uint u; float f; } u; u.f = x; long i = u.f; if((i > x && x > 0) || (i < x && x < 0)) { u.u -= 1; } return u.f; } OVERLOADABLE float __convert_float_rtp(int x) { union { uint u; float f; } u; u.f = x; int i = u.f; if(i < x) { if(x > 0) u.u += 1; else u.u -= 1; } return u.f; } OVERLOADABLE float __convert_float_rtn(int x) { union { uint u; float f; } u; u.f = x; long i = u.f; //avoid overflow if(i > x) { if(x > 0) u.u = u.u - 1; else u.u = u.u + 1; } return u.f; } OVERLOADABLE float __convert_float_rtz(uint x) { union { uint u; float f; } u; u.f = x; ulong i = u.f; if(i > x) u.u -= 1; return u.f; } OVERLOADABLE float __convert_float_rtp(uint x) { union { uint u; float f; } u; u.f = x; uint i = u.f; if(i < x) u.u += 1; return u.f; } OVERLOADABLE float __convert_float_rtn(uint x) { return __convert_float_rtz(x); } ' fi # convert_DSTTYPE_ROUNDING function for vector_length in $VECTOR_LENGTHS; do for ftype in $TYPES; do fbasetype=`IFS=:; set -- dummy $ftype; echo $2` if test $fbasetype = "double"; then continue; fi for ttype in $TYPES; do tbasetype=`IFS=:; set -- dummy $ttype; echo $2` if test $tbasetype = "double"; then continue; fi if test $vector_length -eq 1; then if [ $1"a" = "-pa" ]; then echo "OVERLOADABLE $tbasetype convert_${tbasetype}_rte($fbasetype x);" echo "OVERLOADABLE $tbasetype convert_${tbasetype}_rtz($fbasetype x);" echo "OVERLOADABLE $tbasetype convert_${tbasetype}_rtp($fbasetype x);" echo "OVERLOADABLE $tbasetype convert_${tbasetype}_rtn($fbasetype x);" else echo "OVERLOADABLE $tbasetype convert_${tbasetype}_rte($fbasetype x)" if test $fbasetype = "float" -a $tbasetype != "float"; then echo "{ return __gen_ocl_rnde(x); }" else echo "{ return x; }" fi echo "OVERLOADABLE $tbasetype convert_${tbasetype}_rtz($fbasetype x)" if test $fbasetype = "float" -a $tbasetype != "float"; then echo "{ return __gen_ocl_rndz(x); }" elif [ "$fbasetype" = "int" -o "$fbasetype" = "uint" -o "$fbasetype" = "long" -o "$fbasetype" = "ulong" ] && [ "$tbasetype" = "float" ]; then echo "{ return __convert_${tbasetype}_rtz(x); }" else echo "{ return x; }" fi echo "OVERLOADABLE $tbasetype convert_${tbasetype}_rtp($fbasetype x)" if test $fbasetype = "float" -a $tbasetype != "float"; then echo "{ return __gen_ocl_rndu(x); }" elif [ "$fbasetype" = "int" -o "$fbasetype" = "uint" -o "$fbasetype" = "long" -o "$fbasetype" = "ulong" ] && [ "$tbasetype" = "float" ]; then echo "{ return __convert_${tbasetype}_rtp(x); }" else echo "{ return x; }" fi echo "OVERLOADABLE $tbasetype convert_${tbasetype}_rtn($fbasetype x)" if test $fbasetype = "float" -a $tbasetype != "float"; then echo "{ return 
__gen_ocl_rndd(x); }" elif [ "$fbasetype" = "int" -o "$fbasetype" = "uint" -o "$fbasetype" = "long" -o "$fbasetype" = "ulong" ] && [ "$tbasetype" = "float" ]; then echo "{ return __convert_${tbasetype}_rtn(x); }" else echo "{ return x; }" fi fi continue fi for rounding in $ROUNDING_MODES; do fvectortype=$fbasetype$vector_length tvectortype=$tbasetype$vector_length conv="convert_${tbasetype}_${rounding}" construct="$conv(v.s0)" if test $vector_length -gt 1; then construct="$construct, $conv(v.s1)" fi if test $vector_length -gt 2; then construct="$construct, $conv(v.s2)" fi if test $vector_length -gt 3; then construct="$construct, $conv(v.s3)" fi if test $vector_length -gt 4; then construct="$construct, $conv(v.s4)" construct="$construct, $conv(v.s5)" construct="$construct, $conv(v.s6)" construct="$construct, $conv(v.s7)" fi if test $vector_length -gt 8; then construct="$construct, $conv(v.s8)" construct="$construct, $conv(v.s9)" construct="$construct, $conv(v.sA)" construct="$construct, $conv(v.sB)" construct="$construct, $conv(v.sC)" construct="$construct, $conv(v.sD)" construct="$construct, $conv(v.sE)" construct="$construct, $conv(v.sF)" fi if [ $1"a" = "-pa" ]; then echo "OVERLOADABLE $tvectortype convert_${tvectortype}_${rounding}($fvectortype v);" else echo "OVERLOADABLE $tvectortype convert_${tvectortype}_${rounding}($fvectortype v) {" echo " return ($tvectortype)($construct);" echo "}" echo fi done done done done # convert_DSTTYPE_sat_ROUNDING function for vector_length in $VECTOR_LENGTHS; do for ftype in $TYPES; do fbasetype=`IFS=:; set -- dummy $ftype; echo $2` if test $fbasetype = "double"; then continue; fi for ttype in $TYPES; do tbasetype=`IFS=:; set -- dummy $ttype; echo $2` if test $tbasetype = "double" -o $tbasetype = "float" -o $tbasetype = "half" ; then continue; fi if test $vector_length -eq 1; then if [ $1"a" = "-pa" ]; then echo "OVERLOADABLE $tbasetype convert_${tbasetype}_sat_rte($fbasetype x);" echo "OVERLOADABLE $tbasetype convert_${tbasetype}_sat_rtz($fbasetype x);" echo "OVERLOADABLE $tbasetype convert_${tbasetype}_sat_rtp($fbasetype x);" echo "OVERLOADABLE $tbasetype convert_${tbasetype}_sat_rtn($fbasetype x);" else echo "OVERLOADABLE $tbasetype convert_${tbasetype}_sat_rte($fbasetype x)" if test $fbasetype = "float"; then echo "{ return convert_${tbasetype}_sat(__gen_ocl_rnde(x)); }" else echo "{ return convert_${tbasetype}_sat(x); }" fi echo "OVERLOADABLE $tbasetype convert_${tbasetype}_sat_rtz($fbasetype x)" if test $fbasetype = "float"; then echo "{ return convert_${tbasetype}_sat(__gen_ocl_rndz(x)); }" else echo "{ return convert_${tbasetype}_sat(x); }" fi echo "OVERLOADABLE $tbasetype convert_${tbasetype}_sat_rtp($fbasetype x)" if test $fbasetype = "float"; then echo "{ return convert_${tbasetype}_sat(__gen_ocl_rndu(x)); }" else echo "{ return convert_${tbasetype}_sat(x); }" fi echo "OVERLOADABLE $tbasetype convert_${tbasetype}_sat_rtn($fbasetype x)" if test $fbasetype = "float"; then echo "{ return convert_${tbasetype}_sat(__gen_ocl_rndd(x)); }" else echo "{ return convert_${tbasetype}_sat(x); }" fi fi continue fi for rounding in $ROUNDING_MODES; do fvectortype=$fbasetype$vector_length tvectortype=$tbasetype$vector_length conv="convert_${tbasetype}_sat_${rounding}" construct="$conv(v.s0)" if test $vector_length -gt 1; then construct="$construct, $conv(v.s1)" fi if test $vector_length -gt 2; then construct="$construct, $conv(v.s2)" fi if test $vector_length -gt 3; then construct="$construct, $conv(v.s3)" fi if test $vector_length -gt 4; then 
construct="$construct, $conv(v.s4)" construct="$construct, $conv(v.s5)" construct="$construct, $conv(v.s6)" construct="$construct, $conv(v.s7)" fi if test $vector_length -gt 8; then construct="$construct, $conv(v.s8)" construct="$construct, $conv(v.s9)" construct="$construct, $conv(v.sA)" construct="$construct, $conv(v.sB)" construct="$construct, $conv(v.sC)" construct="$construct, $conv(v.sD)" construct="$construct, $conv(v.sE)" construct="$construct, $conv(v.sF)" fi if [ $1"a" = "-pa" ]; then echo "OVERLOADABLE $tvectortype convert_${tvectortype}_sat_${rounding}($fvectortype v);" else echo "OVERLOADABLE $tvectortype convert_${tvectortype}_sat_${rounding}($fvectortype v) {" echo " return ($tvectortype)($construct);" echo "}" echo fi done done done done if [ $1"a" = "-pa" ]; then echo "#endif /* __OCL_CONVERT_H__ */" fi Beignet-1.3.2-Source/backend/src/libocl/script/ocl_math.def000664 001750 001750 00000015160 13161142102 022646 0ustar00yryr000000 000000 ##math gentype acos (gentype) gentype acosh (gentype) gentype acospi (gentype x) gentype asin (gentype) gentype asinh (gentype) gentype asinpi (gentype x) gentype atan (gentype y_over_x) gentype atan2 (gentype y, gentype x) gentype atanh (gentype) gentype atanpi (gentype x) gentype atan2pi (gentype y, gentype x) gentype cbrt (gentype) gentype ceil (gentype) gentype copysign (gentype x, gentype y) gentype cos (gentype) gentype cosh (gentype) gentype cospi (gentype x) gentype erfc (gentype) gentype erf (gentype) gentype exp (gentype x) gentype exp2 (gentype) gentype exp10 (gentype) gentype expm1 (gentype x) gentype fabs (gentype) gentype fdim (gentype x, gentype y) gentype floor (gentype) # XXX we use madd for fma gentype fma (gentype a, gentype b, gentype c) gentype fmax (gentype x, gentype y) gentypef fmax (gentypef x, float y) gentypeh fmax (gentypeh x, half y) gentyped fmax (gentyped x, double y) gentype fmin (gentype x, gentype y) gentypef fmin (gentypef x, float y) gentypeh fmin (gentypeh x, half y) gentyped fmin (gentyped x, double y) gentype fmod (gentype x, gentype y) gentype fract (gentype x, __global gentype *iptr) gentype fract (gentype x, __local gentype *iptr) gentype fract (gentype x, __private gentype *iptr) floatn frexp (floatn x, __global intn *exp) floatn frexp (floatn x, __local intn *exp) floatn frexp (floatn x, __private intn *exp) float frexp (float x, __global int *exp) float frexp (float x, __local int *exp) float frexp (float x, __private int *exp) halfn frexp (halfn x, __global intn *exp) halfn frexp (halfn x, __local intn *exp) halfn frexp (halfn x, __private intn *exp) half frexp (half x, __global int *exp) half frexp (half x, __local int *exp) half frexp (half x, __private int *exp) doublen frexp (doublen x, __global intn *exp) doublen frexp (doublen x, __local intn *exp) doublen frexp (doublen x, __private intn *exp) double frexp (double x, __global int *exp) double frexp (double x, __local int *exp) double frexp (double x, __private int *exp) gentype hypot (gentype x, gentype y) intn ilogb (floatn x) int ilogb (float x) shortn ilogb (halfn x) short ilogb (half x) intn ilogb (doublen x) int ilogb (double x) floatn ldexp (floatn x, intn k) floatn ldexp (floatn x, int k) float ldexp (float x, int k) halfn ldexp (halfn x, intn k) halfn ldexp (halfn x, int k) half ldexp (half x, int k) doublen ldexp (doublen x, intn k) doublen ldexp (doublen x, int k) double ldexp (double x, int k) gentype lgamma (gentype x) floatn lgamma_r (floatn x, __global intn *signp) floatn lgamma_r (floatn x, __local intn *signp) floatn 
lgamma_r (floatn x, __private intn *signp) float lgamma_r (float x, __global int *signp) float lgamma_r (float x, __local int *signp) float lgamma_r (float x, __private int *signp) halfn lgamma_r (halfn x, __global intn *signp) halfn lgamma_r (halfn x, __local intn *signp) halfn lgamma_r (halfn x, __private intn *signp) half lgamma_r (half x, __global int *signp) half lgamma_r (half x, __local int *signp) half lgamma_r (half x, __private int *signp) #doublen lgamma_r (doublen x, __global intn *signp) #doublen lgamma_r (doublen x, __local intn *signp) #doublen lgamma_r (doublen x, __private intn *signp) #double lgamma_r (double x, __global int *signp) #double lgamma_r (double x, __local int *signp) #double lgamma_r (double x, __private int *signp) gentype log (gentype) gentype log2 (gentype) gentype log10 (gentype) gentype log1p (gentype x) gentype logb (gentype x) gentype mad (gentype a, gentype b, gentype c) gentype maxmag (gentype x, gentype y) gentype minmag (gentype x, gentype y) gentype modf (gentype x, __global gentype *iptr) gentype modf (gentype x, __local gentype *iptr) gentype modf (gentype x, __private gentype *iptr) floatn nan (uintn nancode) float nan (uint nancode) halfn nan (ushortn nancode) half nan (ushort nancode) doublen nan (ulongn nancode) double nan (ulong nancode) gentype nextafter (gentype x, gentype y) gentype pow (gentype x, gentype y) floatn pown (floatn x, intn y) float pown (float x, int y) halfn pown (halfn x, intn y) half pown (half x, int y) doublen pown (doublen x, intn y) double pown (double x, int y) gentype powr (gentype x, gentype y) gentype remainder (gentype x, gentype y) floatn remquo (floatn x, floatn y, __global intn *quo) floatn remquo (floatn x, floatn y, __local intn *quo) floatn remquo (floatn x, floatn y, __private intn *quo) float remquo (float x, float y, __global int *quo) float remquo (float x, float y, __local int *quo) float remquo (float x, float y, __private int *quo) halfn remquo (halfn x, halfn y, __global intn *quo) halfn remquo (halfn x, halfn y, __local intn *quo) halfn remquo (halfn x, halfn y, __private intn *quo) half remquo (half x, half y, __global int *quo) half remquo (half x, half y, __local int *quo) half remquo (half x, half y, __private int *quo) doublen remquo (doublen x, doublen y, __global intn *quo) doublen remquo (doublen x, doublen y, __local intn *quo) doublen remquo (doublen x, doublen y, __private intn *quo) double remquo (double x, double y, __global int *quo) double remquo (double x, double y, __local int *quo) double remquo (double x, double y, __private int *quo) gentype rint (gentype) floatn rootn (floatn x, intn y) halfn rootn (halfn x, intn y) doublen rootn (doublen x, intn y) double rootn (double x, int y) gentype round (gentype x) gentype rsqrt (gentype) gentype sin (gentype) gentype sincos (gentype x, __global gentype *cosval) gentype sincos (gentype x, __local gentype *cosval) gentype sincos (gentype x, __private gentype *cosval) gentype sinh (gentype) gentype sinpi (gentype x) gentype sqrt (gentype) gentype tan (gentype) gentype tanh (gentype) gentype tanpi (gentype x) gentype tgamma (gentype) gentype trunc (gentype) # XXX we already defined all native and non-native # functions to the same one.
gentype native_cos (gentype x) gentype native_divide (gentype x, gentype y) gentype native_exp (gentype x) gentype native_exp2 (gentype x) gentype native_exp10 (gentype x) gentype native_log (gentype x) gentype native_log2 (gentype x) gentype native_log10 (gentype x) gentype native_powr (gentype x, gentype y) gentype native_recip (gentype x) gentype native_rsqrt (gentype x) gentype native_sin (gentype x) gentype native_sqrt (gentype x) gentype native_tan (gentype x) ##half_native_math gentype half_cos (gentype x) gentype half_divide (gentype x, gentype y) gentype half_exp (gentype x) gentype half_exp2 (gentype x) gentype half_exp10 (gentype x) gentype half_log (gentype x) gentype half_log2 (gentype x) gentype half_log10 (gentype x) gentype half_powr (gentype x, gentype y) gentype half_recip (gentype x) gentype half_rsqrt (gentype x) gentype half_sin (gentype x) gentype half_sqrt (gentype x) gentype half_tan (gentype x) Beignet-1.3.2-Source/backend/src/libocl/script/ocl_relational.def000664 001750 001750 00000003044 13161142102 024045 0ustar00yryr000000 000000 ##relational intn isequal (floatn x, floatn y) shortn isequal (halfn x, halfn y) longn isequal (doublen x, doublen y) intn isnotequal (floatn x, floatn y) shortn isnotequal (halfn x, halfn y) longn isnotequal (doublen x, doublen y) intn isgreater (floatn x, floatn y) shortn isgreater (halfn x, halfn y) longn isgreater (doublen x, doublen y) intn isgreaterequal (floatn x, floatn y) shortn isgreaterequal (halfn x, halfn y) longn isgreaterequal (doublen x, doublen y) intn isless (floatn x, floatn y) shortn isless (halfn x, halfn y) longn isless (doublen x, doublen y) intn islessequal (floatn x, floatn y) shortn islessequal (halfn x, halfn y) longn islessequal (doublen x, doublen y) intn islessgreater (floatn x, floatn y) shortn islessgreater (halfn x, halfn y) longn islessgreater (doublen x, doublen y) intn isfinite (floatn) shortn isfinite (halfn) longn isfinite (doublen) intn isinf (floatn) shortn isinf (halfn) longn isinf (doublen) intn isnan (floatn) shortn isnan (halfn) longn isnan (doublen) intn isnormal (floatn) shortn isnormal (halfn) longn isnormal (doublen) intn isordered (floatn x, floatn y) shortn isordered (halfn x, halfn y) longn isordered (doublen x, doublen y) intn isunordered (floatn x, floatn y) shortn isunordered (halfn x, halfn y) longn isunordered (doublen x, doublen y) intn signbit (floatn) shortn signbit (halfn) longn signbit (doublen) int any (igentype x) int all (igentype x) gentype bitselect (gentype a, gentype b, gentype c) gentype select (gentype a, gentype b, igentype c) gentype select (gentype a, gentype b, ugentype c) Beignet-1.3.2-Source/backend/src/libocl/script/ocl_math_20.def000664 001750 001750 00000011231 13161142102 023142 0ustar00yryr000000 000000 ##math gentype acos (gentype) gentype acosh (gentype) gentype acospi (gentype x) gentype asin (gentype) gentype asinh (gentype) gentype asinpi (gentype x) gentype atan (gentype y_over_x) gentype atan2 (gentype y, gentype x) gentype atanh (gentype) gentype atanpi (gentype x) gentype atan2pi (gentype y, gentype x) gentype cbrt (gentype) gentype ceil (gentype) gentype copysign (gentype x, gentype y) gentype cos (gentype) gentype cosh (gentype) gentype cospi (gentype x) gentype erfc (gentype) gentype erf (gentype) gentype exp (gentype x) gentype exp2 (gentype) gentype exp10 (gentype) gentype expm1 (gentype x) gentype fabs (gentype) gentype fdim (gentype x, gentype y) gentype floor (gentype) # XXX we use madd for fma gentype fma (gentype a, gentype b, gentype 
c) gentype fmax (gentype x, gentype y) gentypef fmax (gentypef x, float y) gentypeh fmax (gentypeh x, half y) gentyped fmax (gentyped x, double y) gentype fmin (gentype x, gentype y) gentypef fmin (gentypef x, float y) gentypeh fmin (gentypeh x, half y) gentyped fmin (gentyped x, double y) gentype fmod (gentype x, gentype y) gentype fract (gentype x, __generic gentype *iptr) floatn frexp (floatn x, __generic intn *exp) float frexp (float x, __generic int *exp) halfn frexp (halfn x, __generic intn *exp) half frexp (half x, __generic int *exp) doublen frexp (doublen x, __generic intn *exp) double frexp (double x, __generic int *exp) gentype hypot (gentype x, gentype y) intn ilogb (floatn x) int ilogb (float x) shortn ilogb (halfn x) short ilogb (half x) intn ilogb (doublen x) int ilogb (double x) floatn ldexp (floatn x, intn k) floatn ldexp (floatn x, int k) float ldexp (float x, int k) halfn ldexp (halfn x, intn k) halfn ldexp (halfn x, int k) half ldexp (half x, int k) doublen ldexp (doublen x, intn k) doublen ldexp (doublen x, int k) double ldexp (double x, int k) gentype lgamma (gentype x) floatn lgamma_r (floatn x, __generic intn *signp) float lgamma_r (float x, __generic int *signp) halfn lgamma_r (halfn x, __generic intn *signp) half lgamma_r (half x, __generic int *signp) #doublen lgamma_r (doublen x, __generic intn *signp) #double lgamma_r (double x, __generic int *signp) gentype log (gentype) gentype log2 (gentype) gentype log10 (gentype) gentype log1p (gentype x) gentype logb (gentype x) gentype mad (gentype a, gentype b, gentype c) gentype maxmag (gentype x, gentype y) gentype minmag (gentype x, gentype y) gentype modf (gentype x, __generic gentype *iptr) floatn nan (uintn nancode) float nan (uint nancode) halfn nan (ushortn nancode) half nan (ushort nancode) doublen nan (ulongn nancode) double nan (ulong nancode) gentype nextafter (gentype x, gentype y) gentype pow (gentype x, gentype y) floatn pown (floatn x, intn y) float pown (float x, int y) halfn pown (halfn x, intn y) half pown (half x, int y) doublen pown (doublen x, intn y) double pown (double x, int y) gentype powr (gentype x, gentype y) gentype remainder (gentype x, gentype y) floatn remquo (floatn x, floatn y, __generic intn *quo) float remquo (float x, float y, __generic int *quo) halfn remquo (halfn x, halfn y, __generic intn *quo) half remquo (half x, half y, __generic int *quo) doublen remquo (doublen x, doublen y, __generic intn *quo) double remquo (double x, double y, __generic int *quo) gentype rint (gentype) floatn rootn (floatn x, intn y) halfn rootn (halfn x, intn y) doublen rootn (doublen x, intn y) double rootn (double x, int y) gentype round (gentype x) gentype rsqrt (gentype) gentype sin (gentype) gentype sincos (gentype x, __generic gentype *cosval) gentype sinh (gentype) gentype sinpi (gentype x) gentype sqrt (gentype) gentype tan (gentype) gentype tanh (gentype) gentype tanpi (gentype x) gentype tgamma (gentype) gentype trunc (gentype) # XXX we already defined all native and non-native # functions to the same one.
gentype native_cos (gentype x) gentype native_divide (gentype x, gentype y) gentype native_exp (gentype x) gentype native_exp2 (gentype x) gentype native_exp10 (gentype x) gentype native_log (gentype x) gentype native_log2 (gentype x) gentype native_log10 (gentype x) gentype native_powr (gentype x, gentype y) gentype native_recip (gentype x) gentype native_rsqrt (gentype x) gentype native_sin (gentype x) gentype native_sqrt (gentype x) gentype native_tan (gentype x) ##half_native_math gentype half_cos (gentype x) gentype half_divide (gentype x, gentype y) gentype half_exp (gentype x) gentype half_exp2 (gentype x) gentype half_exp10 (gentype x) gentype half_log (gentype x) gentype half_log2 (gentype x) gentype half_log10 (gentype x) gentype half_powr (gentype x, gentype y) gentype half_recip (gentype x) gentype half_rsqrt (gentype x) gentype half_sin (gentype x) gentype half_sqrt (gentype x) gentype half_tan (gentype x) Beignet-1.3.2-Source/backend/src/libocl/script/ocl_integer.def000664 001750 001750 00000002066 13161142102 023353 0ustar00yryr000000 000000 ##integer ugentype abs (gentype x) ugentype abs_diff (gentype x, gentype y) gentype add_sat (gentype x, gentype y) gentype hadd (gentype x, gentype y) gentype rhadd (gentype x, gentype y) gentype clamp (gentype x, gentype minval, gentype maxval) gentype clamp (gentype x, sgentype minval, sgentype maxval) gentype clz (gentype x) gentype ctz (gentype x) gentype mad_hi (gentype a, gentype b, gentype c) gentype mad_sat (gentype a, gentype b, gentype c) gentype max (gentype x, gentype y) gentype max (gentype x, sgentype y) gentype min (gentype x, gentype y) gentype min (gentype x, sgentype y) gentype mul_hi (gentype x, gentype y) gentype rotate (gentype v, gentype i) gentype sub_sat (gentype x, gentype y) shortn upsample (charn hi, ucharn lo) ushortn upsample (ucharn hi, ucharn lo) intn upsample (shortn hi, ushortn lo) uintn upsample (ushortn hi, ushortn lo) longn upsample (intn hi, uintn lo) ulongn upsample (uintn hi, uintn lo) gentype popcount (gentype x) ##fast_integer gentype mad24 (gentype x, gentype y, gentype z) gentype mul24 (gentype x, gentype y) Beignet-1.3.2-Source/backend/src/libocl/script/gen_vector.py000775 001750 001750 00000032467 13161142102 023111 0ustar00yryr000000 000000 #!/usr/bin/env python # # Copyright (C) 2012 Intel Corporation # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library. If not, see <http://www.gnu.org/licenses/>. # # Author: Zhigang Gong #/ # This file is to generate inline code to lower down those builtin # vector functions to scalar functions.
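# For illustration (a sketch of the intended output pattern, not captured
# verbatim from a run of this script), a ##math spec line such as
#     gentype fmax (gentype x, gentype y)
# is lowered for the float4 instance into per-component scalar calls of
# the form:
#
#     OVERLOADABLE float4 fmax (float4 param0, float4 param1)
#     {return (float4)(fmax(param0.s0, param1.s0),
#                      fmax(param0.s1, param1.s1),
#                      fmax(param0.s2, param1.s2),
#                      fmax(param0.s3, param1.s3)); }
#
# When the just_proto argument is "1", only the matching prototype
# (terminated by a semicolon) is emitted instead of the body.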
from __future__ import print_function import re import sys import os if len(sys.argv) != 4: print("Invalid argument {0}".format(sys.argv)) print("use {0} spec_file_name output_file_name just_proto".format(sys.argv[0])) raise all_vector = 1,2,3,4,8,16 # generate generic type sets def gen_vector_type(type_set, vector_set = all_vector): ret = [] for t in type_set: for i in vector_set: ret.append((t, i)) return ret def set_vector_memspace(vector_type_set, memspace): ret = [] if memspace == '': return vector_type_set for t in vector_type_set: ret.append((t[0], t[1], memspace)) return ret # if we have 3 elements in the type tuple, we are a pointer with a memory space type # at the third element. def isPointer(t): return len(t) == 3 all_itype = "char","short","int","long" all_utype = "uchar","ushort","uint","ulong" all_int_type = all_itype + all_utype all_float_type = "float","double","half" all_type = all_int_type + all_float_type # all vector/scalar types for t in all_type: exec("{0}n = [\"{0}n\", gen_vector_type([\"{0}\"])]".format(t)) exec("s{0} = [\"{0}\", gen_vector_type([\"{0}\"], [1])]".format(t)) # Predefined type sets according to the Open CL spec. math_gentype = ["math_gentype", gen_vector_type(all_float_type)] math_gentypef = ["math_gentypef", gen_vector_type(["float"])] math_gentypeh = ["math_gentypeh", gen_vector_type(["half"])] math_gentyped = ["math_gentyped", gen_vector_type(["double"])] half_native_math_gentype = ["half_native_math_gentype", gen_vector_type(["float"])] integer_gentype = ["integer_gentype", gen_vector_type(all_int_type)] integer_ugentype = ["integer_ugentype", gen_vector_type(all_utype)] integer_sgentype = ["integer_sgentype", gen_vector_type(all_int_type, [1])] fast_integer_gentype = ["fast_integer_gentype", gen_vector_type(["uint", "int"])] common_gentype = ["common_gentype", gen_vector_type(all_float_type)] common_gentypef = ["common_gentypef", gen_vector_type(["float"])] common_gentypeh = ["common_gentypeh", gen_vector_type(["half"])] common_gentyped = ["common_gentyped", gen_vector_type(["double"])] relational_gentype = ["relational_gentype", gen_vector_type(all_type)] relational_igentype = ["relational_igentype", gen_vector_type(all_itype)] relational_ugentype = ["relational_ugentype", gen_vector_type(all_utype)] misc_gentypem = ["misc_gentypem", gen_vector_type(all_type, [2, 4, 8, 16])] misc_gentypen = ["misc_gentypen", gen_vector_type(all_type, [2, 4, 8, 16])] misc_ugentypem = ["misc_ugentypem", gen_vector_type(all_utype, [2, 4, 8, 16])] misc_ugentypen = ["misc_ugentypen", gen_vector_type(all_utype, [2, 4, 8, 16])] all_predefined_type = math_gentype, math_gentypef, math_gentyped, math_gentypeh, \ half_native_math_gentype, integer_gentype,integer_sgentype,\ integer_ugentype, charn, ucharn, shortn, ushortn, intn, \ uintn, longn, ulongn, floatn, doublen, halfn, common_gentypeh, \ fast_integer_gentype, common_gentype, common_gentypef, \ common_gentyped, relational_gentype, relational_igentype, \ relational_ugentype, schar, suchar, sshort, sushort, sint, \ suint, slong, sulong, sfloat, shalf, sdouble, misc_gentypem, \ misc_ugentypem, misc_gentypen, misc_ugentypen # type dictionary contains all the predefined type sets. 
type_dict = {} for t in all_predefined_type: type_dict.update({t[0]:t[1]}) def _prefix(prefix, dtype): if dtype.count("gentype") != 0: return prefix + '_' + dtype return dtype memspaces = ["__local ", "__private ", "__global ", "__generic "] def stripMemSpace(t): if t[0:2] == '__': for memspace in memspaces : if t[0:len(memspace)] == memspace: return memspace, t[len(memspace):] return '', t def check_type(types): for t in types: memspace, t = stripMemSpace(t) if not t in type_dict: print(t) raise TypeError("found invalid type.") def match_unsigned(dtype): if dtype[0] == 'half': return ["ushort", dtype[1]] if dtype[0] == 'float': return ["uint", dtype[1]] if dtype[0] == 'double': return ["ulong", dtype[1]] if dtype[0][0] == 'u': return dtype return ['u' + dtype[0], dtype[1]] def match_signed(dtype): if dtype[0] == 'half': return ["short", dtype[1]] if dtype[0] == 'float': return ["int", dtype[1]] if dtype[0] == 'double': return ["long", dtype[1]] if dtype[0][0] != 'u': return dtype return [dtype[0][1:], dtype[1]] def match_scalar(dtype): return [dtype[0], 1] # The dstType is the expected type, srcType is # the reference type. Sometimes, the dstType and # srcType are different. We need to fix this issue # and return correct dst type. def fixup_type(dstType, srcType, n): if dstType == srcType: return dstType[n] if dstType != srcType: # scalar dst type if len(dstType) == 1: return dstType[0] # dst is not scalar but src is scalar if len(srcType) == 1: return dstType[n] if dstType == integer_sgentype[1] and srcType == integer_gentype[1]: return match_scalar(srcType[n]) if dstType == integer_gentype[1] and \ (srcType == integer_sgentype[1] or \ srcType == integer_ugentype[1]): return dstType[n] if dstType == integer_ugentype[1] and srcType == integer_gentype[1]: return match_unsigned(srcType[n]) if dstType == relational_igentype[1] and srcType == relational_gentype[1]: return match_signed(srcType[n]) if dstType == relational_ugentype[1] and srcType == relational_gentype[1]: return match_unsigned(srcType[n]) if dstType == relational_gentype[1] and \ (srcType == relational_igentype[1] or \ srcType == relational_ugentype[1]): return dstType[n] if (len(dstType) == len(srcType)): return dstType[n] print(dstType, srcType) raise TypeError("type mismatch") class builtinProto(): valueTypeStr = "" functionName = "" paramTypeStrs = [] paramCount = 0 outputStr = [] prefix = "" justproto = 0 def init(self, sectionHeader, sectionPrefix, justproto): self.valueTypeStr = "" self.functionName = "" self.paramTypeStrs = [] self.paramCount = 0 self.justproto = justproto if sectionHeader != "": self.outputStr = [sectionHeader] else: self.outputStr = [] if sectionPrefix != "": self.prefix = sectionPrefix self.indent = 0 def append(self, line, nextInit = ""): self.outputStr.append(line); return nextInit; def indentSpace(self): ret = "" for i in range(self.indent): ret += ' ' return ret def init_from_line(self, t): self.append('//{0}'.format(t)) line = [_f for _f in re.split(',| |\(', t.rstrip(')\n')) if _f] self.paramCount = 0 stripped = 0 memSpace = '' for i, text in enumerate(line): idx = i - stripped if idx == 0: self.valueTypeStr = _prefix(self.prefix, line[i]) continue if idx == 1: self.functionName = line[i]; continue if idx % 2 == 0: if line[i][0] == '(': tmpType = line[i][1:] else: tmpType = line[i] if tmpType == '__local' or \ tmpType == '__private' or \ tmpType == '__global' or\ tmpType == '__generic': memSpace = tmpType + ' ' stripped += 1 continue self.paramTypeStrs.append(memSpace + _prefix(self.prefix,
tmpType)) memSpace = '' self.paramCount += 1 def gen_proto_str_1(self, vtypeSeq, ptypeSeqs, i): for n in range(0, self.paramCount): ptype = fixup_type(ptypeSeqs[n], vtypeSeq, i); vtype = fixup_type(vtypeSeq, ptypeSeqs[n], i); # XXX FIXME now skip all double vector, as we don't # defined those scalar version's prototype. if ptype[0].find('double') != -1 or \ vtype[0].find('double') != -1: return if (n == 0): formatStr = 'OVERLOADABLE {0}{1} {2} ('.format(vtype[0], vtype[1], self.functionName) else: formatStr += ', ' if vtype[1] == 1: return if isPointer(ptype): formatStr += ptype[2] pointerStr = '*' else: pointerStr = '' if ptype[1] != 1: formatStr += '{0}{1} {2}param{3}'.format(ptype[0], ptype[1], pointerStr, n) else: formatStr += '{0} {1}param{2}'.format(ptype[0], pointerStr, n) formatStr += ')' if self.justproto == "1": formatStr += ';' self.append(formatStr) return formatStr formatStr = self.append(formatStr, '{{return ({0}{1})('.format(vtype[0], vtype[1])) self.indent = len(formatStr) for j in range(0, vtype[1]): if (j != 0): formatStr += ',' if (j + 1) % 2 == 0: formatStr += ' ' if j % 2 == 0: formatStr = self.append(formatStr, self.indentSpace()) if self.prefix == 'relational' and self.functionName != 'bitselect' and self.functionName != 'select': formatStr += '-' formatStr += '{0}('.format(self.functionName) for n in range(0, self.paramCount): if n != 0: formatStr += ', ' ptype = fixup_type(ptypeSeqs[n], vtypeSeq, i) vtype = fixup_type(vtypeSeq, ptypeSeqs[n], i) if vtype[1] != ptype[1]: if ptype[1] != 1: raise TypeError("parameter is not a scalar but has different width with result value.") if isPointer(ptype): formatStr += '&' formatStr += 'param{0}'.format(n) continue if (isPointer(ptype)): formatStr += '({0} {1} *)param{2} + {3:2d}'.format(ptype[2], ptype[0], n, j) else: if (self.functionName == 'select' and n == 2): formatStr += '({0})(param{1}.s{2:X} & (({0})1 << (sizeof({0})*8 - 1)))'.format(ptype[0], n, j) else: formatStr += 'param{0}.s{1:X}'.format(n, j) formatStr += ')' formatStr += '); }\n' self.append(formatStr) return formatStr def output(self): for line in self.outputStr: print(line) def output(self, outFile): for line in self.outputStr: outFile.write('{0}\n'.format(line)) def gen_proto_str(self): check_type([self.valueTypeStr] + self.paramTypeStrs) vtypeSeq = type_dict[self.valueTypeStr] ptypeSeqs = [] count = len(vtypeSeq); for t in self.paramTypeStrs: memspace,t = stripMemSpace(t) ptypeSeqs.append(set_vector_memspace(type_dict[t], memspace)) count = max(count, len(type_dict[t])) for i in range(count): formatStr = self.gen_proto_str_1(vtypeSeq, ptypeSeqs, i) self.append("") # save the prototypes into ocl_vector.h specFile = open(sys.argv[1], 'r') headerFileName = sys.argv[2] tempHeader = open(headerFileName, 'a') isJustProto = sys.argv[3] tempHeader.write("//Begin from this part is autogenerated.\n") tempHeader.write("//Don't modify it manually.\n") functionProto = builtinProto() for line in specFile: if line.isspace(): continue if line[0] == '#': if line[1] == '#': sectionHeader = "//{0} builtin functions".format(line[2:].rstrip()) sectionPrefix=(line[2:].split())[0] continue functionProto.init(sectionHeader, sectionPrefix, isJustProto) sectionHeader = "" setionPrefix = "" functionProto.init_from_line(line) functionProto.gen_proto_str() functionProto.output(tempHeader) tempHeader.close() Beignet-1.3.2-Source/backend/CMakeLists.txt000664 001750 001750 00000005726 13161142102 017610 0ustar00yryr000000 000000 project (GBE) set (LIBGBE_VERSION_MAJOR 0) set 
(LIBGBE_VERSION_MINOR 2) cmake_minimum_required (VERSION 2.6.0) set (GBE_CMAKE_DIR "${GBE_SOURCE_DIR}/cmake") set (CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${GBE_CMAKE_DIR}") ############################################################## # Compilation directives ############################################################## set (GBE_DEBUG_MEMORY false CACHE bool "Activate the memory debugger") set (GBE_USE_BLOB false CACHE bool "Compile everything from one big file") # Force Release with debug info if (NOT CMAKE_BUILD_TYPE) set (CMAKE_BUILD_TYPE RelWithDebInfo) endif (NOT CMAKE_BUILD_TYPE) set (CMAKE_BUILD_TYPE ${CMAKE_BUILD_TYPE} CACHE STRING "assure config" FORCE) message(STATUS "Building mode: " ${CMAKE_BUILD_TYPE}) if (GBE_DEBUG_MEMORY) set (GBE_DEBUG_MEMORY_FLAG "-DGBE_DEBUG_MEMORY=1") else (GBE_DEBUG_MEMORY) set (GBE_DEBUG_MEMORY_FLAG "-DGBE_DEBUG_MEMORY=0") endif (GBE_DEBUG_MEMORY) # Hide all symbols and allows the symbols declared as visible to be exported set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${LLVM_CFLAGS} ${GBE_DEBUG_MEMORY_FLAG} ${GBE_COMPILE_UTESTS_FLAG} -DGBE_COMPILER_AVAILABLE=1 -fvisibility=hidden") set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${LLVM_CFLAGS} ${GBE_DEBUG_MEMORY_FLAG} ${GBE_COMPILE_UTESTS_FLAG} -DGBE_COMPILER_AVAILABLE=1") include_directories (${CMAKE_CURRENT_BINARY_DIR}) ############################################################## # Project source code ############################################################## add_subdirectory (src) if (USE_STANDALONE_GBE_COMPILER STREQUAL "true") set(LOCAL_OCL_BITCODE_BIN ${STANDALONE_GBE_COMPILER_DIR}/beignet.bc) set(LOCAL_OCL_HEADER_DIR ${STANDALONE_GBE_COMPILER_DIR}/include) set(LOCAL_OCL_PCH_OBJECT ${STANDALONE_GBE_COMPILER_DIR}/beignet.pch) endif (USE_STANDALONE_GBE_COMPILER STREQUAL "true") set(LOCAL_OCL_BITCODE_BIN "${LOCAL_OCL_BITCODE_BIN}" PARENT_SCOPE) set(LOCAL_OCL_HEADER_DIR "${LOCAL_OCL_HEADER_DIR}" PARENT_SCOPE) set(LOCAL_OCL_PCH_OBJECT "${LOCAL_OCL_PCH_OBJECT}" PARENT_SCOPE) set(LOCAL_GBE_OBJECT_DIR ${LOCAL_GBE_OBJECT_DIR} PARENT_SCOPE) set(LOCAL_INTERP_OBJECT_DIR ${LOCAL_INTERP_OBJECT_DIR} PARENT_SCOPE) set(LOCAL_OCL_BITCODE_BIN_20 "${LOCAL_OCL_BITCODE_BIN_20}" PARENT_SCOPE) set(LOCAL_OCL_PCH_OBJECT_20 "${LOCAL_OCL_PCH_OBJECT_20}" PARENT_SCOPE) set (GBE_BIN_GENERATER env OCL_BITCODE_LIB_PATH=${LOCAL_OCL_BITCODE_BIN} OCL_HEADER_FILE_DIR=${LOCAL_OCL_HEADER_DIR} OCL_PCH_PATH=${LOCAL_OCL_PCH_OBJECT} OCL_BITCODE_LIB_20_PATH=${LOCAL_OCL_BITCODE_BIN_20} OCL_PCH_20_PATH=${LOCAL_OCL_PCH_OBJECT_20}) if (USE_STANDALONE_GBE_COMPILER STREQUAL "true") set (GBE_BIN_GENERATER ${GBE_BIN_GENERATER} ${STANDALONE_GBE_COMPILER_DIR}/gbe_bin_generater PARENT_SCOPE) else (USE_STANDALONE_GBE_COMPILER STREQUAL "true") set (GBE_BIN_GENERATER ${GBE_BIN_GENERATER} LD_LIBRARY_PATH=${CMAKE_CURRENT_BINARY_DIR}/src ${CMAKE_CURRENT_BINARY_DIR}/src/gbe_bin_generater PARENT_SCOPE) endif (USE_STANDALONE_GBE_COMPILER STREQUAL "true") Beignet-1.3.2-Source/backend/kernels/000775 001750 001750 00000000000 13174334761 016523 5ustar00yryr000000 000000 Beignet-1.3.2-Source/backend/kernels/compile.sh000775 001750 001750 00000000145 13161142102 020470 0ustar00yryr000000 000000 #!/bin/bash clang -emit-llvm -O3 -target nvptx -c $1 -o $1.o llvm-dis $1.o rm $1.o mv $1.o.ll $1.ll Beignet-1.3.2-Source/src/000775 001750 001750 00000000000 13174334761 014260 5ustar00yryr000000 000000 Beignet-1.3.2-Source/src/cl_khr_icd.c000664 001750 001750 00000012434 13161142102 016467 0ustar00yryr000000 000000 /* * Copyright © 2013 Simon Richter * * This library is free 
software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . */ #include #include "cl_platform_id.h" #include "CL/cl_intel.h" // for clGetKernelSubGroupInfoKHR /* The interop functions are only available if sharing is enabled */ #ifdef HAS_GL_EGL #define CL_GL_INTEROP(x) x #else #define CL_GL_INTEROP(x) (void *) NULL #endif /* These are not yet implemented in Beignet */ #define CL_NOTYET(x) (void *) NULL /** Return platform list through ICD interface * This code is used only if a client is linked directly against the library * instead of using the ICD loader. In this case, no other implementations * should exist in the process address space, so the call is equivalent to * clGetPlatformIDs(). * * @param[in] num_entries Number of entries allocated in return buffer * @param[out] platforms Platform identifiers supported by this implementation * @param[out] num_platforms Number of platform identifiers returned * @return OpenCL error code * @retval CL_SUCCESS Successful execution * @retval CL_PLATFORM_NOT_FOUND_KHR No platforms provided * @retval CL_INVALID_VALUE Invalid parameters */ cl_int clIcdGetPlatformIDsKHR(cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms) { return clGetPlatformIDs(num_entries, platforms, num_platforms); } struct _cl_icd_dispatch const cl_khr_icd_dispatch = { clGetPlatformIDs, clGetPlatformInfo, clGetDeviceIDs, clGetDeviceInfo, clCreateContext, clCreateContextFromType, clRetainContext, clReleaseContext, clGetContextInfo, clCreateCommandQueue, clRetainCommandQueue, clReleaseCommandQueue, clGetCommandQueueInfo, (void *) NULL, /* clSetCommandQueueProperty */ clCreateBuffer, clCreateImage2D, clCreateImage3D, clRetainMemObject, clReleaseMemObject, clGetSupportedImageFormats, clGetMemObjectInfo, clGetImageInfo, clCreateSampler, clRetainSampler, clReleaseSampler, clGetSamplerInfo, clCreateProgramWithSource, clCreateProgramWithBinary, clRetainProgram, clReleaseProgram, clBuildProgram, clUnloadCompiler, clGetProgramInfo, clGetProgramBuildInfo, clCreateKernel, clCreateKernelsInProgram, clRetainKernel, clReleaseKernel, clSetKernelArg, clGetKernelInfo, clGetKernelWorkGroupInfo, clWaitForEvents, clGetEventInfo, clRetainEvent, clReleaseEvent, clGetEventProfilingInfo, clFlush, clFinish, clEnqueueReadBuffer, clEnqueueWriteBuffer, clEnqueueCopyBuffer, clEnqueueReadImage, clEnqueueWriteImage, clEnqueueCopyImage, clEnqueueCopyImageToBuffer, clEnqueueCopyBufferToImage, clEnqueueMapBuffer, clEnqueueMapImage, clEnqueueUnmapMemObject, clEnqueueNDRangeKernel, clEnqueueTask, clEnqueueNativeKernel, clEnqueueMarker, clEnqueueWaitForEvents, clEnqueueBarrier, clGetExtensionFunctionAddress, CL_GL_INTEROP(clCreateFromGLBuffer), CL_GL_INTEROP(clCreateFromGLTexture2D), CL_NOTYET(clCreateFromGLTexture3D), CL_NOTYET(clCreateFromGLRenderbuffer), CL_NOTYET(clGetGLObjectInfo), CL_NOTYET(clGetGLTextureInfo), CL_GL_INTEROP(clEnqueueAcquireGLObjects), CL_GL_INTEROP(clEnqueueReleaseGLObjects), CL_NOTYET(clGetGLContextInfoKHR), (void *) NULL, (void *) NULL, 
(void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, clSetEventCallback, clCreateSubBuffer, clSetMemObjectDestructorCallback, clCreateUserEvent, clSetUserEventStatus, clEnqueueReadBufferRect, clEnqueueWriteBufferRect, clEnqueueCopyBufferRect, CL_NOTYET(clCreateSubDevicesEXT), CL_NOTYET(clRetainDeviceEXT), CL_NOTYET(clReleaseDeviceEXT), #ifdef CL_VERSION_1_2 (void *) NULL, clCreateSubDevices, clRetainDevice, clReleaseDevice, clCreateImage, clCreateProgramWithBuiltInKernels, clCompileProgram, clLinkProgram, clUnloadPlatformCompiler, clGetKernelArgInfo, clEnqueueFillBuffer, clEnqueueFillImage, clEnqueueMigrateMemObjects, clEnqueueMarkerWithWaitList, clEnqueueBarrierWithWaitList, clGetExtensionFunctionAddressForPlatform, CL_GL_INTEROP(clCreateFromGLTexture), (void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, (void *) NULL, #endif #ifdef CL_VERSION_2_0 clCreateCommandQueueWithProperties, clCreatePipe, clGetPipeInfo, clSVMAlloc, clSVMFree, clEnqueueSVMFree, clEnqueueSVMMemcpy, clEnqueueSVMMemFill, clEnqueueSVMMap, clEnqueueSVMUnmap, clCreateSamplerWithProperties, clSetKernelArgSVMPointer, clSetKernelExecInfo, clGetKernelSubGroupInfoKHR, #endif }; Beignet-1.3.2-Source/src/intel/000775 001750 001750 00000000000 13174334761 015373 5ustar00yryr000000 000000 Beignet-1.3.2-Source/src/intel/intel_structs.h000664 001750 001750 00000053010 13161142102 020423 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* * Copyright 2009 Intel Corporation * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. * IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* */ #ifndef __INTEL_STRUCTS_H__ #define __INTEL_STRUCTS_H__ #include typedef struct gen6_interface_descriptor { struct { uint32_t pad6:6; uint32_t kernel_start_pointer:26; } desc0; struct { uint32_t pad:7; uint32_t software_exception:1; uint32_t pad2:3; uint32_t maskstack_exception:1; uint32_t pad3:1; uint32_t illegal_opcode_exception:1; uint32_t pad4:2; uint32_t floating_point_mode:1; uint32_t thread_priority:1; uint32_t single_program_flow:1; uint32_t pad5:1; uint32_t pad6:6; uint32_t pad7:6; } desc1; struct { uint32_t pad:2; uint32_t sampler_count:3; uint32_t sampler_state_pointer:27; } desc2; struct { uint32_t binding_table_entry_count:5; /* prefetch entries only */ uint32_t binding_table_pointer:27; /* 11 bit only on IVB+ */ } desc3; struct { uint32_t curbe_read_offset:16; /* in GRFs */ uint32_t curbe_read_len:16; /* in GRFs */ } desc4; struct { uint32_t group_threads_num:8; /* 0..64, 0 - no barrier use */ uint32_t barrier_return_byte:8; uint32_t slm_sz:5; /* 0..16 - 0K..64K */ uint32_t barrier_enable:1; uint32_t rounding_mode:2; uint32_t barrier_return_grf_offset:8; } desc5; uint32_t desc6; /* unused */ uint32_t desc7; /* unused */ } gen6_interface_descriptor_t; typedef struct gen8_interface_descriptor { struct { uint32_t pad6:6; uint32_t kernel_start_pointer:26; } desc0; struct { uint32_t kernel_start_pointer_high:16; uint32_t pad6:16; } desc1; struct { uint32_t pad:7; uint32_t software_exception:1; uint32_t pad2:3; uint32_t maskstack_exception:1; uint32_t pad3:1; uint32_t illegal_opcode_exception:1; uint32_t pad4:2; uint32_t floating_point_mode:1; uint32_t thread_priority:1; uint32_t single_program_flow:1; uint32_t denorm_mode:1; uint32_t thread_preemption_disable:1; uint32_t pad5:11; } desc2; struct { uint32_t pad:2; uint32_t sampler_count:3; uint32_t sampler_state_pointer:27; } desc3; struct { uint32_t binding_table_entry_count:5; /* prefetch entries only */ uint32_t binding_table_pointer:27; /* 11 bit only on IVB+ */ } desc4; struct { uint32_t curbe_read_offset:16; /* in GRFs */ uint32_t curbe_read_len:16; /* in GRFs */ } desc5; struct { uint32_t group_threads_num:10; /* 0..64, 0 - no barrier use */ uint32_t pad:5; uint32_t global_barrier_enable:1; uint32_t slm_sz:5; /* 0..16 - 0K..64K */ uint32_t barrier_enable:1; uint32_t rounding_mode:2; uint32_t barrier_return_grf_offset:8; } desc6; uint32_t desc7; /* unused */ } gen8_interface_descriptor_t; typedef struct gen7_surface_state { struct { uint32_t cube_pos_z:1; uint32_t cube_neg_z:1; uint32_t cube_pos_y:1; uint32_t cube_neg_y:1; uint32_t cube_pos_x:1; uint32_t cube_neg_x:1; uint32_t media_boundary_pixel_mode:2; uint32_t render_cache_rw_mode:1; uint32_t pad1:1; uint32_t surface_array_spacing:1; uint32_t vertical_line_stride_offset:1; uint32_t vertical_line_stride:1; uint32_t tile_walk:1; uint32_t tiled_surface:1; uint32_t horizontal_alignment:1; uint32_t vertical_alignment:2; uint32_t surface_format:9; uint32_t pad0:1; uint32_t surface_array:1; uint32_t surface_type:3; } ss0; struct { uint32_t base_addr; } ss1; struct { uint32_t width:14; uint32_t pad1:2; uint32_t height:14; uint32_t pad0:2; } ss2; struct { uint32_t pitch:18; uint32_t pad0:3; uint32_t depth:11; } ss3; union { struct { uint32_t mulsample_pal_idx:3; uint32_t numer_mulsample:3; uint32_t mss_fmt:1; uint32_t rt_view_extent:11; uint32_t min_array_element:11; uint32_t rt_rotate:2; uint32_t pad0:1; } not_str_buf; } ss4; struct { uint32_t mip_count:4; uint32_t surface_min_load:4; uint32_t pad2:6; uint32_t coherence_type:1; uint32_t stateless_force_write_thru:1; uint32_t 
cache_control:4; uint32_t y_offset:4; uint32_t pad0:1; uint32_t x_offset:7; } ss5; uint32_t ss6; /* unused */ struct { uint32_t min_lod:12; uint32_t pad0:4; uint32_t shader_a:3; uint32_t shader_b:3; uint32_t shader_g:3; uint32_t shader_r:3; uint32_t pad1:4; } ss7; } gen7_surface_state_t; typedef struct gen8_surface_state { struct { uint32_t cube_pos_z:1; uint32_t cube_neg_z:1; uint32_t cube_pos_y:1; uint32_t cube_neg_y:1; uint32_t cube_pos_x:1; uint32_t cube_neg_x:1; uint32_t media_boundary_pixel_mode:2; uint32_t render_cache_rw_mode:1; uint32_t sampler_L2_bypass_mode:1; uint32_t vertical_line_stride_offset:1; uint32_t vertical_line_stride:1; uint32_t tile_mode:2; uint32_t horizontal_alignment:2; uint32_t vertical_alignment:2; uint32_t surface_format:9; uint32_t pad0:1; uint32_t surface_array:1; uint32_t surface_type:3; } ss0; struct { uint32_t surface_qpitch:15; uint32_t pad0:3; uint32_t pad1:1; uint32_t base_mip_level:5; uint32_t mem_obj_ctrl_state:7; uint32_t pad2:1; } ss1; struct { uint32_t width:14; uint32_t pad1:2; uint32_t height:14; uint32_t pad0:2; } ss2; struct { uint32_t surface_pitch:18; uint32_t pad1:2; uint32_t pad0:1; uint32_t depth:11; } ss3; struct { union { struct { uint32_t multisample_pos_palette_idx:3; uint32_t multisample_num:3; uint32_t multisample_format:1; uint32_t render_target_view_ext:11; uint32_t min_array_elt:11; uint32_t render_target_and_sample_rotation:2; uint32_t pad1:1; }; uint32_t pad0; }; } ss4; struct { uint32_t mip_count:4; uint32_t surface_min_lod:4; uint32_t pad5:4; uint32_t pad4:2; uint32_t conherency_type:1; uint32_t pad3:3; uint32_t pad2:2; uint32_t cube_ewa:1; uint32_t y_offset:3; uint32_t pad0:1; uint32_t x_offset:7; } ss5; struct { union { union { struct { uint32_t aux_surface_mode:3; uint32_t aux_surface_pitch:9; uint32_t pad3:4; }; struct { uint32_t uv_plane_y_offset:14; uint32_t pad2:2; }; }; struct { uint32_t uv_plane_x_offset:14; uint32_t pad1:1; uint32_t seperate_uv_plane_enable:1; }; struct { uint32_t aux_sruface_qpitch:15; uint32_t pad0:1; }; }; } ss6; struct { uint32_t resource_min_lod:12; uint32_t pad0:4; uint32_t shader_channel_select_alpha:3; uint32_t shader_channel_select_blue:3; uint32_t shader_channel_select_green:3; uint32_t shader_channel_select_red:3; uint32_t alpha_clear_color:1; uint32_t blue_clear_color:1; uint32_t green_clear_color:1; uint32_t red_clear_color:1; } ss7; struct { uint32_t surface_base_addr_lo; } ss8; struct { uint32_t surface_base_addr_hi; } ss9; struct { uint32_t pad0:12; uint32_t aux_base_addr_lo:20; } ss10; struct { uint32_t aux_base_addr_hi:32; } ss11; struct { uint32_t pad0; } ss12; /* 13~15 have meaning only when aux surface mode == AUX_HIZ */ struct { uint32_t pad0; } ss13; struct { uint32_t pad0; } ss14; struct { uint32_t pad0; } ss15; } gen8_surface_state_t; typedef struct gen7_media_surface_state { struct { uint32_t base_addr; } ss0; struct { uint32_t uv_offset_v_direction:2; uint32_t pic_struct:2; uint32_t width:14; uint32_t height:14; } ss1; struct { uint32_t tile_mode:2; uint32_t half_pitch_for_chroma:1; uint32_t surface_pitch:18; uint32_t pad1:1; uint32_t surface_object_control_state:4; uint32_t pad0:1; uint32_t interleave_chroma:1; uint32_t surface_format:4; } ss2; struct { uint32_t y_offset_for_u:14; uint32_t pad1:2; uint32_t x_offset_for_u:14; uint32_t pad0:2; } ss3; struct { uint32_t y_offset_for_v:15; uint32_t pad1:1; uint32_t x_offset_for_v:14; uint32_t pad0:2; } ss4; struct { uint32_t pad0; } ss5; struct { uint32_t pad0; } ss6; struct { uint32_t pad0; } ss7; } 
gen7_media_surface_state_t; typedef union gen_surface_state { gen7_surface_state_t gen7_surface_state; gen8_surface_state_t gen8_surface_state; } gen_surface_state_t; static const size_t surface_state_sz = sizeof(gen_surface_state_t); typedef struct gen6_vfe_state_inline { struct { uint32_t per_thread_scratch_space:4; uint32_t pad3:3; uint32_t extend_vfe_state_present:1; uint32_t pad2:2; uint32_t scratch_base:22; } vfe0; struct { uint32_t debug_counter_control:2; uint32_t gpgpu_mode:1; /* 0 for SNB!!! */ uint32_t gateway_mmio_access:2; uint32_t fast_preempt:1; uint32_t bypass_gateway_ctl:1; /* 0 - legacy, 1 - no open/close */ uint32_t reset_gateway_timer:1; uint32_t urb_entries:8; uint32_t max_threads:16; } vfe1; struct { uint32_t pad8:8; uint32_t debug_object_id:24; } vfe2; struct { uint32_t curbe_size:16; /* in GRFs */ uint32_t urb_size:16; /* in GRFs */ } vfe3; struct { uint32_t scoreboard_mask:32; /* 1 - enable the corresponding dependency */ } vfe4; struct { uint32_t scoreboard0_dx:4; uint32_t scoreboard0_dy:4; uint32_t scoreboard1_dx:4; uint32_t scoreboard1_dy:4; uint32_t scoreboard2_dx:4; uint32_t scoreboard2_dy:4; uint32_t scoreboard3_dx:4; uint32_t scoreboard3_dy:4; } vfe5; struct { uint32_t scoreboard4_dx:4; uint32_t scoreboard4_dy:4; uint32_t scoreboard5_dx:4; uint32_t scoreboard5_dy:4; uint32_t scoreboard6_dx:4; uint32_t scoreboard6_dy:4; uint32_t scoreboard7_dx:4; uint32_t scoreboard7_dy:4; } vfe6; } gen6_vfe_state_inline_t; typedef struct gen6_pipe_control { struct { uint32_t length : BITFIELD_RANGE(0, 7); uint32_t reserved : BITFIELD_RANGE(8, 15); uint32_t instruction_subopcode : BITFIELD_RANGE(16, 23); uint32_t instruction_opcode : BITFIELD_RANGE(24, 26); uint32_t instruction_pipeline : BITFIELD_RANGE(27, 28); uint32_t instruction_type : BITFIELD_RANGE(29, 31); } dw0; struct { uint32_t depth_cache_flush_enable : BITFIELD_BIT(0); uint32_t stall_at_pixel_scoreboard : BITFIELD_BIT(1); uint32_t state_cache_invalidation_enable : BITFIELD_BIT(2); uint32_t constant_cache_invalidation_enable : BITFIELD_BIT(3); uint32_t vf_cache_invalidation_enable : BITFIELD_BIT(4); uint32_t dc_flush_enable : BITFIELD_BIT(5); uint32_t protected_memory_app_id : BITFIELD_BIT(6); uint32_t pipe_control_flush_enable : BITFIELD_BIT(7); uint32_t notify_enable : BITFIELD_BIT(8); uint32_t indirect_state_pointers_disable : BITFIELD_BIT(9); uint32_t texture_cache_invalidation_enable : BITFIELD_BIT(10); uint32_t instruction_cache_invalidate_enable : BITFIELD_BIT(11); uint32_t render_target_cache_flush_enable : BITFIELD_BIT(12); uint32_t depth_stall_enable : BITFIELD_BIT(13); uint32_t post_sync_operation : BITFIELD_RANGE(14, 15); uint32_t generic_media_state_clear : BITFIELD_BIT(16); uint32_t synchronize_gfdt_surface : BITFIELD_BIT(17); uint32_t tlb_invalidate : BITFIELD_BIT(18); uint32_t global_snapshot_count_reset : BITFIELD_BIT(19); uint32_t cs_stall : BITFIELD_BIT(20); uint32_t store_data_index : BITFIELD_BIT(21); uint32_t protected_memory_enable : BITFIELD_BIT(22); uint32_t reserved : BITFIELD_RANGE(23, 31); } dw1; struct { uint32_t reserved : BITFIELD_RANGE(0, 1); uint32_t destination_address_type : BITFIELD_BIT(2); uint32_t address : BITFIELD_RANGE(3, 31); } dw2; struct { uint32_t data; } dw3; struct { uint32_t data; } dw4; } gen6_pipe_control_t; typedef struct gen8_pipe_control { struct { uint32_t length : BITFIELD_RANGE(0, 7); uint32_t reserved : BITFIELD_RANGE(8, 15); uint32_t instruction_subopcode : BITFIELD_RANGE(16, 23); uint32_t instruction_opcode : BITFIELD_RANGE(24, 26); uint32_t 
instruction_pipeline : BITFIELD_RANGE(27, 28); uint32_t instruction_type : BITFIELD_RANGE(29, 31); } dw0; struct { uint32_t depth_cache_flush_enable : BITFIELD_BIT(0); uint32_t stall_at_pixel_scoreboard : BITFIELD_BIT(1); uint32_t state_cache_invalidation_enable : BITFIELD_BIT(2); uint32_t constant_cache_invalidation_enable : BITFIELD_BIT(3); uint32_t vf_cache_invalidation_enable : BITFIELD_BIT(4); uint32_t dc_flush_enable : BITFIELD_BIT(5); uint32_t protected_memory_app_id : BITFIELD_BIT(6); uint32_t pipe_control_flush_enable : BITFIELD_BIT(7); uint32_t notify_enable : BITFIELD_BIT(8); uint32_t indirect_state_pointers_disable : BITFIELD_BIT(9); uint32_t texture_cache_invalidation_enable : BITFIELD_BIT(10); uint32_t instruction_cache_invalidate_enable : BITFIELD_BIT(11); uint32_t render_target_cache_flush_enable : BITFIELD_BIT(12); uint32_t depth_stall_enable : BITFIELD_BIT(13); uint32_t post_sync_operation : BITFIELD_RANGE(14, 15); uint32_t generic_media_state_clear : BITFIELD_BIT(16); uint32_t synchronize_gfdt_surface : BITFIELD_BIT(17); uint32_t tlb_invalidate : BITFIELD_BIT(18); uint32_t global_snapshot_count_reset : BITFIELD_BIT(19); uint32_t cs_stall : BITFIELD_BIT(20); uint32_t store_data_index : BITFIELD_BIT(21); uint32_t protected_memory_enable : BITFIELD_BIT(22); uint32_t reserved : BITFIELD_RANGE(23, 31); } dw1; struct { uint32_t reserved : BITFIELD_RANGE(0, 1); uint32_t destination_address_type : BITFIELD_BIT(2); uint32_t address : BITFIELD_RANGE(3, 31); } dw2; struct { uint32_t data; } dw3; struct { uint32_t data; } dw4; struct { uint32_t data; } dw5; } gen8_pipe_control_t; #define GEN7_NUM_VME_SEARCH_PATH_STATES 14 #define GEN7_NUM_VME_RD_LUT_SETS 4 typedef struct gen7_vme_search_path_state { struct { uint32_t SPD_0_X : BITFIELD_RANGE(0, 3); //search path distance uint32_t SPD_0_Y : BITFIELD_RANGE(4, 7); uint32_t SPD_1_X : BITFIELD_RANGE(8, 11); uint32_t SPD_1_Y : BITFIELD_RANGE(12, 15); uint32_t SPD_2_X : BITFIELD_RANGE(16, 19); uint32_t SPD_2_Y : BITFIELD_RANGE(20, 23); uint32_t SPD_3_X : BITFIELD_RANGE(24, 27); uint32_t SPD_3_Y : BITFIELD_RANGE(28, 31); }dw0; }gen7_vme_search_path_state_t; typedef struct gen7_vme_rd_lut_set { struct { uint32_t LUT_MbMode_0 : BITFIELD_RANGE(0, 7); uint32_t LUT_MbMode_1 : BITFIELD_RANGE(8, 15); uint32_t LUT_MbMode_2 : BITFIELD_RANGE(16, 23); uint32_t LUT_MbMode_3 : BITFIELD_RANGE(24, 31); }dw0; struct { uint32_t LUT_MbMode_4 : BITFIELD_RANGE(0, 7); uint32_t LUT_MbMode_5 : BITFIELD_RANGE(8, 15); uint32_t LUT_MbMode_6 : BITFIELD_RANGE(16, 23); uint32_t LUT_MbMode_7 : BITFIELD_RANGE(24, 31); }dw1; struct { uint32_t LUT_MV_0 : BITFIELD_RANGE(0, 7); uint32_t LUT_MV_1 : BITFIELD_RANGE(8, 15); uint32_t LUT_MV_2 : BITFIELD_RANGE(16, 23); uint32_t LUT_MV_3 : BITFIELD_RANGE(24, 31); }dw2; struct { uint32_t LUT_MV_4 : BITFIELD_RANGE(0, 7); uint32_t LUT_MV_5 : BITFIELD_RANGE(8, 15); uint32_t LUT_MV_6 : BITFIELD_RANGE(16, 23); uint32_t LUT_MV_7 : BITFIELD_RANGE(24, 31); }dw3; }gen7_vme_rd_lut_set_t; typedef struct gen7_vme_state { gen7_vme_search_path_state_t sp[GEN7_NUM_VME_SEARCH_PATH_STATES]; struct { uint32_t LUT_MbMode_8_0 : BITFIELD_RANGE(0, 7); uint32_t LUT_MbMode_9_0 : BITFIELD_RANGE(8, 15); uint32_t LUT_MbMode_8_1 : BITFIELD_RANGE(16, 23); uint32_t LUT_MbMode_9_1 : BITFIELD_RANGE(24, 31); }dw14; struct { uint32_t LUT_MbMode_8_2 : BITFIELD_RANGE(0, 7); uint32_t LUT_MbMode_9_2 : BITFIELD_RANGE(8, 15); uint32_t LUT_MbMode_8_3 : BITFIELD_RANGE(16, 23); uint32_t LUT_MbMode_9_3 : BITFIELD_RANGE(24, 31); }dw15; gen7_vme_rd_lut_set_t 
lut[GEN7_NUM_VME_RD_LUT_SETS]; }gen7_vme_state_t; typedef struct gen6_sampler_state { struct { uint32_t shadow_function:3; uint32_t lod_bias:11; uint32_t min_filter:3; uint32_t mag_filter:3; uint32_t mip_filter:2; uint32_t base_level:5; uint32_t min_mag_neq:1; uint32_t lod_preclamp:1; uint32_t default_color_mode:1; uint32_t pad0:1; uint32_t disable:1; } ss0; struct { uint32_t r_wrap_mode:3; uint32_t t_wrap_mode:3; uint32_t s_wrap_mode:3; uint32_t cube_control_mode:1; uint32_t pad:2; uint32_t max_lod:10; uint32_t min_lod:10; } ss1; struct { uint32_t pad:5; uint32_t default_color_pointer:27; } ss2; struct { uint32_t non_normalized_coord:1; uint32_t pad:12; uint32_t address_round:6; uint32_t max_aniso:3; uint32_t chroma_key_mode:1; uint32_t chroma_key_index:2; uint32_t chroma_key_enable:1; uint32_t monochrome_filter_width:3; uint32_t monochrome_filter_height:3; } ss3; } gen6_sampler_state_t; typedef struct gen7_sampler_border_color { float r,g,b,a; } gen7_sampler_border_color_t; typedef struct gen7_sampler_state { struct { uint32_t aniso_algorithm:1; uint32_t lod_bias:13; uint32_t min_filter:3; uint32_t mag_filter:3; uint32_t mip_filter:2; uint32_t base_level:5; uint32_t pad1:1; uint32_t lod_preclamp:1; uint32_t default_color_mode:1; uint32_t pad0:1; uint32_t disable:1; } ss0; struct { uint32_t cube_control_mode:1; uint32_t shadow_function:3; uint32_t pad:4; uint32_t max_lod:12; uint32_t min_lod:12; } ss1; struct { uint32_t pad:5; uint32_t default_color_pointer:27; } ss2; struct { uint32_t r_wrap_mode:3; uint32_t t_wrap_mode:3; uint32_t s_wrap_mode:3; uint32_t pad:1; uint32_t non_normalized_coord:1; uint32_t trilinear_quality:2; uint32_t address_round:6; uint32_t max_aniso:3; uint32_t chroma_key_mode:1; uint32_t chroma_key_index:2; uint32_t chroma_key_enable:1; uint32_t pad0:6; } ss3; } gen7_sampler_state_t; STATIC_ASSERT(sizeof(gen6_sampler_state_t) == sizeof(gen7_sampler_state_t)); typedef struct gen8_sampler_state { struct { uint32_t aniso_algorithm:1; uint32_t lod_bias:13; uint32_t min_filter:3; uint32_t mag_filter:3; uint32_t mip_filter:2; uint32_t base_level:5; uint32_t lod_preclamp:2; uint32_t default_color_mode:1; uint32_t pad0:1; uint32_t disable:1; } ss0; struct { uint32_t cube_control_mode:1; uint32_t shadow_function:3; uint32_t chromakey_mode:1; uint32_t chromakey_index:2; uint32_t chromakey_enable:1; uint32_t max_lod:12; uint32_t min_lod:12; } ss1; struct { uint32_t lod_clamp_mag_mode:1; uint32_t flexible_filter_valign:1; uint32_t flexible_filter_halign:1; uint32_t flexible_filter_coeff_size:1; uint32_t flexible_filter_mode:1; uint32_t pad1:1; uint32_t indirect_state_ptr:18; uint32_t pad0:2; uint32_t sep_filter_height:2; uint32_t sep_filter_width:2; uint32_t sep_filter_coeff_table_size:2; } ss2; struct { uint32_t r_wrap_mode:3; uint32_t t_wrap_mode:3; uint32_t s_wrap_mode:3; uint32_t pad:1; uint32_t non_normalized_coord:1; uint32_t trilinear_quality:2; uint32_t address_round:6; uint32_t max_aniso:3; uint32_t pad0:2; uint32_t non_sep_filter_footprint_mask:8; } ss3; } gen8_sampler_state_t; STATIC_ASSERT(sizeof(gen6_sampler_state_t) == sizeof(gen8_sampler_state_t)); #undef BITFIELD_BIT #undef BITFIELD_RANGE #endif /* __INTEL_STRUCTS_H__ */ Beignet-1.3.2-Source/src/intel/intel_driver.h000664 001750 001750 00000012665 13161142102 020222 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; 
either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* * Copyright 2009 Intel Corporation * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. * IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * */ #ifndef _INTEL_DRIVER_H_ #define _INTEL_DRIVER_H_ #include "cl_device_data.h" #include #include #include #include #include #include #include #include #define CMD_MI (0x0 << 29) #define CMD_2D (0x2 << 29) #define MI_NOOP (CMD_MI | 0) #define MI_BATCH_BUFFER_END (CMD_MI | (0xA << 23)) #define XY_COLOR_BLT_CMD (CMD_2D | (0x50 << 22) | 0x04) #define XY_COLOR_BLT_WRITE_ALPHA (1 << 21) #define XY_COLOR_BLT_WRITE_RGB (1 << 20) #define XY_COLOR_BLT_DST_TILED (1 << 11) /* BR13 */ #define BR13_565 (0x1 << 24) #define BR13_8888 (0x3 << 24) struct dri_state; struct intel_gpgpu_node; typedef struct _XDisplay Display; typedef struct intel_driver { dri_bufmgr *bufmgr; drm_intel_context *ctx; drm_intel_bo *null_bo; int fd; int device_id; int gen_ver; sigset_t sa_mask; pthread_mutex_t ctxmutex; int locked; int need_close; Display *x11_display; struct dri_state *dri_ctx; struct intel_gpgpu_node *gpgpu_list; int atomic_test_result; } intel_driver_t; #define SET_BLOCKED_SIGSET(DRIVER) do { \ sigset_t bl_mask; \ sigfillset(&bl_mask); \ sigdelset(&bl_mask, SIGFPE); \ sigdelset(&bl_mask, SIGILL); \ sigdelset(&bl_mask, SIGSEGV); \ sigdelset(&bl_mask, SIGBUS); \ sigdelset(&bl_mask, SIGKILL); \ pthread_sigmask(SIG_SETMASK, &bl_mask, &(DRIVER)->sa_mask); \ } while (0) #define RESTORE_BLOCKED_SIGSET(DRIVER) do { \ pthread_sigmask(SIG_SETMASK, &(DRIVER)->sa_mask, NULL); \ } while (0) #define PPTHREAD_MUTEX_LOCK(DRIVER) do { \ SET_BLOCKED_SIGSET(DRIVER); \ pthread_mutex_lock(&(DRIVER)->ctxmutex); \ } while (0) #define PPTHREAD_MUTEX_UNLOCK(DRIVER) do { \ pthread_mutex_unlock(&(DRIVER)->ctxmutex); \ RESTORE_BLOCKED_SIGSET(DRIVER); \ } while (0) /* device control */ extern void intel_driver_lock_hardware(intel_driver_t*); extern void intel_driver_unlock_hardware(intel_driver_t*); /* methods working in shared mode */ extern dri_bo* intel_driver_share_buffer(intel_driver_t*, const char *sname, uint32_t name); extern uint32_t 
intel_driver_shared_name(intel_driver_t*, dri_bo*); /* init driver shared with X using dri state, acquired from X Display */ extern int intel_driver_init_shared(intel_driver_t*, struct dri_state*); /* init driver in master mode (when X is not using the card) * usually dev_name = "/dev/dri/card0" */ extern int intel_driver_init_master(intel_driver_t*, const char* dev_name); /* init driver for render node */ extern int intel_driver_init_render(intel_driver_t*, const char* dev_name); /* terminate driver and all underlying structures */ extern int intel_driver_terminate(intel_driver_t*); /* simple check if driver was initialized (checking fd should suffice) */ extern int intel_driver_is_active(intel_driver_t*); /* init the call backs used by the ocl driver */ extern void intel_setup_callbacks(void); #endif /* _INTEL_DRIVER_H_ */ Beignet-1.3.2-Source/src/intel/intel_gpgpu.c000664 001750 001750 00000270146 13173554000 020053 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia * Alexei Soupikov */ #include #include #include #include #include #include #include #include #include #include #include #include #include "intel/intel_gpgpu.h" #include "intel/intel_defines.h" #include "intel/intel_structs.h" #include "program.h" // for BTI_RESERVED_NUM #include "cl_alloc.h" #include "cl_utils.h" #include "cl_sampler.h" #include "cl_accelerator_intel.h" #ifndef CL_VERSION_1_2 #define CL_MEM_OBJECT_IMAGE1D 0x10F4 #define CL_MEM_OBJECT_IMAGE1D_ARRAY 0x10F5 #define CL_MEM_OBJECT_IMAGE1D_BUFFER 0x10F6 #define CL_MEM_OBJECT_IMAGE2D_ARRAY 0x10F3 #endif #define GEN_CMD_MEDIA_OBJECT (0x71000000) #define MO_TS_BIT (1 << 24) #define MO_RETAIN_BIT (1 << 28) #define SAMPLER_STATE_SIZE (16) #define TIMESTAMP_ADDR 0x2358 /* Stores both binding tables and surface states */ typedef struct surface_heap { uint32_t binding_table[256]; char surface[256*sizeof(gen_surface_state_t)]; } surface_heap_t; typedef struct intel_event { drm_intel_bo *buffer; drm_intel_bo *ts_buf; int status; } intel_event_t; #define MAX_IF_DESC 32 typedef struct intel_gpgpu intel_gpgpu_t; typedef void (intel_gpgpu_set_L3_t)(intel_gpgpu_t *gpgpu, uint32_t use_slm); intel_gpgpu_set_L3_t *intel_gpgpu_set_L3 = NULL; typedef uint32_t (intel_gpgpu_get_scratch_index_t)(uint32_t size); intel_gpgpu_get_scratch_index_t *intel_gpgpu_get_scratch_index = NULL; typedef void (intel_gpgpu_post_action_t)(intel_gpgpu_t *gpgpu, int32_t flush_mode); intel_gpgpu_post_action_t *intel_gpgpu_post_action = NULL; typedef uint64_t (intel_gpgpu_read_ts_reg_t)(drm_intel_bufmgr *bufmgr); intel_gpgpu_read_ts_reg_t *intel_gpgpu_read_ts_reg = NULL; typedef void (intel_gpgpu_set_base_address_t)(intel_gpgpu_t *gpgpu); intel_gpgpu_set_base_address_t *intel_gpgpu_set_base_address = NULL; typedef void (intel_gpgpu_setup_bti_t)(intel_gpgpu_t *gpgpu, drm_intel_bo *buf, uint32_t internal_offset, size_t size, unsigned char index, 
uint32_t format); intel_gpgpu_setup_bti_t *intel_gpgpu_setup_bti = NULL; typedef void (intel_gpgpu_load_vfe_state_t)(intel_gpgpu_t *gpgpu); intel_gpgpu_load_vfe_state_t *intel_gpgpu_load_vfe_state = NULL; typedef void (intel_gpgpu_build_idrt_t)(intel_gpgpu_t *gpgpu, cl_gpgpu_kernel *kernel); intel_gpgpu_build_idrt_t *intel_gpgpu_build_idrt = NULL; typedef void (intel_gpgpu_load_curbe_buffer_t)(intel_gpgpu_t *gpgpu); intel_gpgpu_load_curbe_buffer_t *intel_gpgpu_load_curbe_buffer = NULL; typedef void (intel_gpgpu_load_idrt_t)(intel_gpgpu_t *gpgpu); intel_gpgpu_load_idrt_t *intel_gpgpu_load_idrt = NULL; typedef void (intel_gpgpu_pipe_control_t)(intel_gpgpu_t *gpgpu); intel_gpgpu_pipe_control_t *intel_gpgpu_pipe_control = NULL; typedef void (intel_gpgpu_select_pipeline_t)(intel_gpgpu_t *gpgpu); intel_gpgpu_select_pipeline_t *intel_gpgpu_select_pipeline = NULL; static void intel_gpgpu_sync(void *buf) { if (buf) drm_intel_bo_wait_rendering((drm_intel_bo *)buf); } static void *intel_gpgpu_ref_batch_buf(intel_gpgpu_t *gpgpu) { if (gpgpu->batch->last_bo) drm_intel_bo_reference(gpgpu->batch->last_bo); return gpgpu->batch->last_bo; } static void intel_gpgpu_unref_batch_buf(void *buf) { if (buf) drm_intel_bo_unreference((drm_intel_bo *)buf); } static void intel_gpgpu_delete_finished(intel_gpgpu_t *gpgpu) { if (gpgpu == NULL) return; if(gpgpu->time_stamp_b.bo) drm_intel_bo_unreference(gpgpu->time_stamp_b.bo); if(gpgpu->printf_b.bo) drm_intel_bo_unreference(gpgpu->printf_b.bo); if (gpgpu->aux_buf.bo) drm_intel_bo_unreference(gpgpu->aux_buf.bo); if (gpgpu->perf_b.bo) drm_intel_bo_unreference(gpgpu->perf_b.bo); if (gpgpu->stack_b.bo) drm_intel_bo_unreference(gpgpu->stack_b.bo); if (gpgpu->scratch_b.bo) drm_intel_bo_unreference(gpgpu->scratch_b.bo); if (gpgpu->profiling_b.bo) drm_intel_bo_unreference(gpgpu->profiling_b.bo); if(gpgpu->constant_b.bo) drm_intel_bo_unreference(gpgpu->constant_b.bo); intel_batchbuffer_delete(gpgpu->batch); cl_free(gpgpu); } /* Destroy the all intel_gpgpu, no matter finish or not, when driver destroy */ void intel_gpgpu_delete_all(intel_driver_t *drv) { struct intel_gpgpu_node *p; if(drv->gpgpu_list == NULL) return; PPTHREAD_MUTEX_LOCK(drv); while(drv->gpgpu_list) { p = drv->gpgpu_list; drv->gpgpu_list = p->next; intel_gpgpu_delete_finished(p->gpgpu); cl_free(p); } PPTHREAD_MUTEX_UNLOCK(drv); } static void intel_gpgpu_delete(intel_gpgpu_t *gpgpu) { if (gpgpu == NULL) return; intel_driver_t *drv = gpgpu->drv; struct intel_gpgpu_node *p, *node; PPTHREAD_MUTEX_LOCK(drv); p = drv->gpgpu_list; if(p) { node = p->next; while(node) { if(node->gpgpu->batch && node->gpgpu->batch->buffer && !drm_intel_bo_busy(node->gpgpu->batch->buffer)) { p->next = node->next; intel_gpgpu_delete_finished(node->gpgpu); cl_free(node); node = p->next; } else { p = node; node = node->next; } } node = drv->gpgpu_list; if(node->gpgpu->batch && node->gpgpu->batch->buffer && !drm_intel_bo_busy(node->gpgpu->batch->buffer)) { drv->gpgpu_list = drv->gpgpu_list->next; intel_gpgpu_delete_finished(node->gpgpu); cl_free(node); } } if (gpgpu == NULL) return; if(gpgpu->batch && gpgpu->batch->buffer && drm_intel_bo_busy(gpgpu->batch->buffer)) { TRY_ALLOC_NO_ERR (node, CALLOC(struct intel_gpgpu_node)); node->gpgpu = gpgpu; node->next = NULL; p = drv->gpgpu_list; if(p == NULL) drv->gpgpu_list= node; else { while(p->next) p = p->next; p->next = node; } } else intel_gpgpu_delete_finished(gpgpu); error: PPTHREAD_MUTEX_UNLOCK(drv); } static intel_gpgpu_t* intel_gpgpu_new(intel_driver_t *drv) { intel_gpgpu_t *state = NULL; 
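  /* Note: TRY_ALLOC_NO_ERR (cl_utils.h) presumably jumps to the local
   * error: label when CALLOC returns NULL, so a half-constructed state
   * is torn down through intel_gpgpu_delete() below. */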
TRY_ALLOC_NO_ERR (state, CALLOC(intel_gpgpu_t)); state->drv = drv; state->batch = intel_batchbuffer_new(state->drv); assert(state->batch); exit: return state; error: intel_gpgpu_delete(state); state = NULL; goto exit; } static void intel_gpgpu_select_pipeline_gen7(intel_gpgpu_t *gpgpu) { BEGIN_BATCH(gpgpu->batch, 1); OUT_BATCH(gpgpu->batch, CMD_PIPELINE_SELECT | PIPELINE_SELECT_GPGPU); ADVANCE_BATCH(gpgpu->batch); } static void intel_gpgpu_select_pipeline_gen9(intel_gpgpu_t *gpgpu) { BEGIN_BATCH(gpgpu->batch, 1); OUT_BATCH(gpgpu->batch, CMD_PIPELINE_SELECT | PIPELINE_SELECT_MASK | PIPELINE_SELECT_GPGPU); ADVANCE_BATCH(gpgpu->batch); } static uint32_t intel_gpgpu_get_cache_ctrl_gen7() { return cc_llc_l3; } static uint32_t intel_gpgpu_get_cache_ctrl_gen75() { return llccc_ec | l3cc_ec; } static uint32_t intel_gpgpu_get_cache_ctrl_gen8() { return tcc_llc_ec_l3 | mtllc_wb; } static uint32_t intel_gpgpu_get_cache_ctrl_gen9() { //Kernel-defined cache control registers 2: //L3CC: WB; LeCC: WB; TC: LLC/eLLC; int major = 0, minor = 0; int mocs_index = 0x2; struct utsname buf; uname(&buf); sscanf(buf.release, "%d.%d", &major, &minor); //From linux 4.3, kernel redefined the mocs table's value, //But before 4.3, still used the hw defautl value. if(strcmp(buf.sysname, "Linux") == 0 && major == 4 && minor < 3) { /* linux kernel support skl from 4.x, so check from 4 */ mocs_index = 0x9; } return (mocs_index << 1); } static void intel_gpgpu_set_base_address_gen7(intel_gpgpu_t *gpgpu) { const uint32_t def_cc = cl_gpgpu_get_cache_ctrl(); /* default Cache Control value */ BEGIN_BATCH(gpgpu->batch, 10); OUT_BATCH(gpgpu->batch, CMD_STATE_BASE_ADDRESS | 8); /* 0, Gen State Mem Obj CC, Stateless Mem Obj CC, Stateless Access Write Back */ OUT_BATCH(gpgpu->batch, 0 | (def_cc << 8) | (def_cc << 4) | (0 << 3)| BASE_ADDRESS_MODIFY); /* General State Base Addr */ /* 0, State Mem Obj CC */ /* We use a state base address for the surface heap since IVB clamp the * binding table pointer at 11 bits. So, we cannot use pointers directly while * using the surface heap */ assert(gpgpu->aux_offset.surface_heap_offset % 4096 == 0); OUT_RELOC(gpgpu->batch, gpgpu->aux_buf.bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION, gpgpu->aux_offset.surface_heap_offset + (0 | (def_cc << 8) | (def_cc << 4) | (0 << 3)| BASE_ADDRESS_MODIFY)); OUT_BATCH(gpgpu->batch, 0 | (def_cc << 8) | BASE_ADDRESS_MODIFY); /* Dynamic State Base Addr */ OUT_BATCH(gpgpu->batch, 0 | (def_cc << 8) | BASE_ADDRESS_MODIFY); /* Indirect Obj Base Addr */ OUT_BATCH(gpgpu->batch, 0 | (def_cc << 8) | BASE_ADDRESS_MODIFY); /* Instruction Base Addr */ OUT_BATCH(gpgpu->batch, 0 | BASE_ADDRESS_MODIFY); /* According to mesa i965 driver code, we must set the dynamic state access upper bound * to a valid bound value, otherwise, the border color pointer may be rejected and you * may get incorrect border color. This is a known hardware bug. 
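 * (Hence the 0xfffff000 | BASE_ADDRESS_MODIFY dword emitted below: it
 * presumably sets the bound to the highest 4KB-aligned page with the
 * modify bit set, a valid but effectively unlimited bound.)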
*/ OUT_BATCH(gpgpu->batch, 0xfffff000 | BASE_ADDRESS_MODIFY); OUT_BATCH(gpgpu->batch, 0 | BASE_ADDRESS_MODIFY); OUT_BATCH(gpgpu->batch, 0 | BASE_ADDRESS_MODIFY); ADVANCE_BATCH(gpgpu->batch); } static void intel_gpgpu_set_base_address_gen8(intel_gpgpu_t *gpgpu) { const uint32_t def_cc = cl_gpgpu_get_cache_ctrl(); /* default Cache Control value */ BEGIN_BATCH(gpgpu->batch, 16); OUT_BATCH(gpgpu->batch, CMD_STATE_BASE_ADDRESS | 14); /* 0, Gen State Mem Obj CC, Stateless Mem Obj CC, Stateless Access Write Back */ OUT_BATCH(gpgpu->batch, 0 | (def_cc << 4) | (0 << 1)| BASE_ADDRESS_MODIFY); /* General State Base Addr */ OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0 | (def_cc << 16)); /* 0, State Mem Obj CC */ /* We use a state base address for the surface heap since IVB clamp the * binding table pointer at 11 bits. So, we cannot use pointers directly while * using the surface heap */ assert(gpgpu->aux_offset.surface_heap_offset % 4096 == 0); OUT_RELOC(gpgpu->batch, gpgpu->aux_buf.bo, I915_GEM_DOMAIN_SAMPLER, I915_GEM_DOMAIN_SAMPLER, gpgpu->aux_offset.surface_heap_offset + (0 | (def_cc << 4) | (0 << 1)| BASE_ADDRESS_MODIFY)); OUT_BATCH(gpgpu->batch, 0); OUT_RELOC(gpgpu->batch, gpgpu->aux_buf.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, (0 | (def_cc << 4) | (0 << 1)| BASE_ADDRESS_MODIFY)); /* Dynamic State Base Addr */ OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0 | (def_cc << 4) | BASE_ADDRESS_MODIFY); /* Indirect Obj Base Addr */ OUT_BATCH(gpgpu->batch, 0); //OUT_BATCH(gpgpu->batch, 0 | (def_cc << 4) | BASE_ADDRESS_MODIFY); /* Instruction Base Addr */ OUT_RELOC(gpgpu->batch, (drm_intel_bo *)gpgpu->ker->bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION, 0 + (0 | (def_cc << 4) | (0 << 1)| BASE_ADDRESS_MODIFY)); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0xfffff000 | BASE_ADDRESS_MODIFY); /* According to mesa i965 driver code, we must set the dynamic state access upper bound * to a valid bound value, otherwise, the border color pointer may be rejected and you * may get incorrect border color. This is a known hardware bug. */ OUT_BATCH(gpgpu->batch, 0xfffff000 | BASE_ADDRESS_MODIFY); OUT_BATCH(gpgpu->batch, 0xfffff000 | BASE_ADDRESS_MODIFY); OUT_BATCH(gpgpu->batch, 0xfffff000 | BASE_ADDRESS_MODIFY); ADVANCE_BATCH(gpgpu->batch); } static void intel_gpgpu_set_base_address_gen9(intel_gpgpu_t *gpgpu) { const uint32_t def_cc = cl_gpgpu_get_cache_ctrl(); /* default Cache Control value */ BEGIN_BATCH(gpgpu->batch, 19); OUT_BATCH(gpgpu->batch, CMD_STATE_BASE_ADDRESS | 17); /* 0, Gen State Mem Obj CC, Stateless Mem Obj CC, Stateless Access Write Back */ OUT_BATCH(gpgpu->batch, 0 | (def_cc << 4) | (0 << 1)| BASE_ADDRESS_MODIFY); /* General State Base Addr */ OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0 | (def_cc << 16)); /* 0, State Mem Obj CC */ /* We use a state base address for the surface heap since IVB clamp the * binding table pointer at 11 bits. 
So, we cannot use pointers directly while * using the surface heap */ assert(gpgpu->aux_offset.surface_heap_offset % 4096 == 0); OUT_RELOC(gpgpu->batch, gpgpu->aux_buf.bo, I915_GEM_DOMAIN_SAMPLER, I915_GEM_DOMAIN_SAMPLER, gpgpu->aux_offset.surface_heap_offset + (0 | (def_cc << 4) | (0 << 1)| BASE_ADDRESS_MODIFY)); OUT_BATCH(gpgpu->batch, 0); OUT_RELOC(gpgpu->batch, gpgpu->aux_buf.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, (0 | (def_cc << 4) | (0 << 1)| BASE_ADDRESS_MODIFY)); /* Dynamic State Base Addr */ OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0 | (def_cc << 4) | BASE_ADDRESS_MODIFY); /* Indirect Obj Base Addr */ OUT_BATCH(gpgpu->batch, 0); //OUT_BATCH(gpgpu->batch, 0 | (def_cc << 4) | BASE_ADDRESS_MODIFY); /* Instruction Base Addr */ OUT_RELOC(gpgpu->batch, (drm_intel_bo *)gpgpu->ker->bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION, 0 + (0 | (def_cc << 4) | (0 << 1)| BASE_ADDRESS_MODIFY)); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0xfffff000 | BASE_ADDRESS_MODIFY); /* According to mesa i965 driver code, we must set the dynamic state access upper bound * to a valid bound value, otherwise, the border color pointer may be rejected and you * may get incorrect border color. This is a known hardware bug. */ OUT_BATCH(gpgpu->batch, 0xfffff000 | BASE_ADDRESS_MODIFY); OUT_BATCH(gpgpu->batch, 0xfffff000 | BASE_ADDRESS_MODIFY); OUT_BATCH(gpgpu->batch, 0xfffff000 | BASE_ADDRESS_MODIFY); /* Bindless surface state base address */ OUT_BATCH(gpgpu->batch, (def_cc << 4) | BASE_ADDRESS_MODIFY); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0xfffff000); ADVANCE_BATCH(gpgpu->batch); } uint32_t intel_gpgpu_get_scratch_index_gen7(uint32_t size) { return size / 1024 - 1; } uint32_t intel_gpgpu_get_scratch_index_gen75(uint32_t size) { //align in backend, if non pow2, must align when alloc scratch bo. assert((size & (size - 1)) == 0); size = size >> 11; uint32_t index = 0; while((size >>= 1) > 0) index++; //get leading one return index; } uint32_t intel_gpgpu_get_scratch_index_gen8(uint32_t size) { //align in backend, if non pow2, must align when alloc scratch bo. 
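  /* Gen8 encodes the per-thread scratch size as log2(size / 1KB); the
   * Gen7.5 variant above uses log2(size / 2KB), and plain Gen7 uses the
   * linear size/1024 - 1. E.g. a 16KB per-thread allocation yields
   * index 4 here. */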
assert((size & (size - 1)) == 0); size = size >> 10; uint32_t index = 0; while((size >>= 1) > 0) index++; //get leading one return index; } static cl_int intel_gpgpu_get_max_curbe_size(uint32_t device_id) { if (IS_BAYTRAIL_T(device_id) || IS_IVB_GT1(device_id)) return 992; else return 2016; } static cl_int intel_gpgpu_get_curbe_size(intel_gpgpu_t *gpgpu) { int curbe_size = gpgpu->curb.size_cs_entry * gpgpu->curb.num_cs_entries; int max_curbe_size = intel_gpgpu_get_max_curbe_size(gpgpu->drv->device_id); if (curbe_size > max_curbe_size) { fprintf(stderr, "warning, curbe size exceed limitation.\n"); return max_curbe_size; } else return curbe_size; } static void intel_gpgpu_load_vfe_state_gen7(intel_gpgpu_t *gpgpu) { int32_t scratch_index; BEGIN_BATCH(gpgpu->batch, 8); OUT_BATCH(gpgpu->batch, CMD_MEDIA_STATE_POINTERS | (8-2)); if(gpgpu->per_thread_scratch > 0) { scratch_index = intel_gpgpu_get_scratch_index(gpgpu->per_thread_scratch); OUT_RELOC(gpgpu->batch, gpgpu->scratch_b.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, scratch_index); } else { OUT_BATCH(gpgpu->batch, 0); } /* max_thread | urb entries | (reset_gateway|bypass_gate_way | gpgpu_mode) */ OUT_BATCH(gpgpu->batch, 0 | ((gpgpu->max_threads - 1) << 16) | (0 << 8) | 0xc4); OUT_BATCH(gpgpu->batch, 0); /* curbe_size */ OUT_BATCH(gpgpu->batch, intel_gpgpu_get_curbe_size(gpgpu)); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0); ADVANCE_BATCH(gpgpu->batch); } static void intel_gpgpu_load_vfe_state_gen8(intel_gpgpu_t *gpgpu) { int32_t scratch_index; BEGIN_BATCH(gpgpu->batch, 9); OUT_BATCH(gpgpu->batch, CMD_MEDIA_STATE_POINTERS | (9-2)); if(gpgpu->per_thread_scratch > 0) { scratch_index = intel_gpgpu_get_scratch_index(gpgpu->per_thread_scratch); OUT_RELOC(gpgpu->batch, gpgpu->scratch_b.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, scratch_index); } else { OUT_BATCH(gpgpu->batch, 0); } OUT_BATCH(gpgpu->batch, 0); /* max_thread | urb entries | (reset_gateway|bypass_gate_way | gpgpu_mode) */ OUT_BATCH(gpgpu->batch, 0 | ((gpgpu->max_threads - 1) << 16) | (2 << 8) | 0xc0); //urb entries can't be 0 OUT_BATCH(gpgpu->batch, 0); /* urb entries size | curbe_size */ OUT_BATCH(gpgpu->batch, 2<<16 | intel_gpgpu_get_curbe_size(gpgpu)); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0); ADVANCE_BATCH(gpgpu->batch); } static void intel_gpgpu_load_curbe_buffer_gen7(intel_gpgpu_t *gpgpu) { BEGIN_BATCH(gpgpu->batch, 4); OUT_BATCH(gpgpu->batch, CMD(2,0,1) | (4 - 2)); /* length-2 */ OUT_BATCH(gpgpu->batch, 0); /* mbz */ OUT_BATCH(gpgpu->batch, intel_gpgpu_get_curbe_size(gpgpu) * 32); OUT_RELOC(gpgpu->batch, gpgpu->aux_buf.bo, I915_GEM_DOMAIN_INSTRUCTION, 0, gpgpu->aux_offset.curbe_offset); ADVANCE_BATCH(gpgpu->batch); } static void intel_gpgpu_load_curbe_buffer_gen8(intel_gpgpu_t *gpgpu) { BEGIN_BATCH(gpgpu->batch, 4); OUT_BATCH(gpgpu->batch, CMD(2,0,1) | (4 - 2)); /* length-2 */ OUT_BATCH(gpgpu->batch, 0); /* mbz */ OUT_BATCH(gpgpu->batch, intel_gpgpu_get_curbe_size(gpgpu) * 32); OUT_BATCH(gpgpu->batch, gpgpu->aux_offset.curbe_offset); ADVANCE_BATCH(gpgpu->batch); } static void intel_gpgpu_load_idrt_gen7(intel_gpgpu_t *gpgpu) { BEGIN_BATCH(gpgpu->batch, 4); OUT_BATCH(gpgpu->batch, CMD(2,0,2) | (4 - 2)); /* length-2 */ OUT_BATCH(gpgpu->batch, 0); /* mbz */ OUT_BATCH(gpgpu->batch, 1 << 5); OUT_RELOC(gpgpu->batch, gpgpu->aux_buf.bo, I915_GEM_DOMAIN_INSTRUCTION, 0, gpgpu->aux_offset.idrt_offset); ADVANCE_BATCH(gpgpu->batch); } static void intel_gpgpu_load_idrt_gen8(intel_gpgpu_t *gpgpu) { 
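  /* MEDIA_INTERFACE_DESCRIPTOR_LOAD, presumably: the (1 << 5) dword is
   * the descriptor data length in bytes (32 == one interface
   * descriptor). Unlike the Gen7 variant above, the offset is emitted
   * directly instead of relocated, since Gen8 already points Dynamic
   * State Base Address at the aux buffer. */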
BEGIN_BATCH(gpgpu->batch, 4); OUT_BATCH(gpgpu->batch, CMD(2,0,2) | (4 - 2)); /* length-2 */ OUT_BATCH(gpgpu->batch, 0); /* mbz */ OUT_BATCH(gpgpu->batch, 1 << 5); OUT_BATCH(gpgpu->batch, gpgpu->aux_offset.idrt_offset); ADVANCE_BATCH(gpgpu->batch); } static const uint32_t gpgpu_l3_config_reg1[] = { 0x00080040, 0x02040040, 0x00800040, 0x01000038, 0x02000030, 0x01000038, 0x00000038, 0x00000040, 0x0A140091, 0x09100091, 0x08900091, 0x08900091, 0x010000a1 }; static const uint32_t gpgpu_l3_config_reg2[] = { 0x00000000, 0x00000000, 0x00080410, 0x00080410, 0x00040410, 0x00040420, 0x00080420, 0x00080020, 0x00204080, 0x00244890, 0x00284490, 0x002444A0, 0x00040810 }; /* Emit PIPE_CONTROLs to write the current GPU timestamp into a buffer. */ static void intel_gpgpu_write_timestamp(intel_gpgpu_t *gpgpu, int idx) { BEGIN_BATCH(gpgpu->batch, 5); OUT_BATCH(gpgpu->batch, CMD_PIPE_CONTROL | (5-2)); OUT_BATCH(gpgpu->batch, GEN7_PIPE_CONTROL_WRITE_TIMESTAMP); OUT_RELOC(gpgpu->batch, gpgpu->time_stamp_b.bo, I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION, GEN7_PIPE_CONTROL_GLOBAL_GTT_WRITE | idx * sizeof(uint64_t)); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, 0); ADVANCE_BATCH(); } static void intel_gpgpu_pipe_control_gen7(intel_gpgpu_t *gpgpu) { gen6_pipe_control_t* pc = (gen6_pipe_control_t*) intel_batchbuffer_alloc_space(gpgpu->batch, sizeof(gen6_pipe_control_t)); memset(pc, 0, sizeof(*pc)); pc->dw0.length = SIZEOF32(gen6_pipe_control_t) - 2; pc->dw0.instruction_subopcode = GEN7_PIPE_CONTROL_SUBOPCODE_3D_CONTROL; pc->dw0.instruction_opcode = GEN7_PIPE_CONTROL_OPCODE_3D_CONTROL; pc->dw0.instruction_pipeline = GEN7_PIPE_CONTROL_3D; pc->dw0.instruction_type = GEN7_PIPE_CONTROL_INSTRUCTION_GFX; pc->dw1.render_target_cache_flush_enable = 1; pc->dw1.texture_cache_invalidation_enable = 1; pc->dw1.cs_stall = 1; pc->dw1.dc_flush_enable = 1; //pc->dw1.instruction_cache_invalidate_enable = 1; ADVANCE_BATCH(gpgpu->batch); } static void intel_gpgpu_pipe_control_gen75(intel_gpgpu_t *gpgpu) { gen6_pipe_control_t* pc = (gen6_pipe_control_t*) intel_batchbuffer_alloc_space(gpgpu->batch, sizeof(gen6_pipe_control_t)); memset(pc, 0, sizeof(*pc)); pc->dw0.length = SIZEOF32(gen6_pipe_control_t) - 2; pc->dw0.instruction_subopcode = GEN7_PIPE_CONTROL_SUBOPCODE_3D_CONTROL; pc->dw0.instruction_opcode = GEN7_PIPE_CONTROL_OPCODE_3D_CONTROL; pc->dw0.instruction_pipeline = GEN7_PIPE_CONTROL_3D; pc->dw0.instruction_type = GEN7_PIPE_CONTROL_INSTRUCTION_GFX; pc->dw1.cs_stall = 1; pc->dw1.dc_flush_enable = 1; pc = (gen6_pipe_control_t*) intel_batchbuffer_alloc_space(gpgpu->batch, sizeof(gen6_pipe_control_t)); memset(pc, 0, sizeof(*pc)); pc->dw0.length = SIZEOF32(gen6_pipe_control_t) - 2; pc->dw0.instruction_subopcode = GEN7_PIPE_CONTROL_SUBOPCODE_3D_CONTROL; pc->dw0.instruction_opcode = GEN7_PIPE_CONTROL_OPCODE_3D_CONTROL; pc->dw0.instruction_pipeline = GEN7_PIPE_CONTROL_3D; pc->dw0.instruction_type = GEN7_PIPE_CONTROL_INSTRUCTION_GFX; pc->dw1.render_target_cache_flush_enable = 1; pc->dw1.texture_cache_invalidation_enable = 1; pc->dw1.cs_stall = 1; ADVANCE_BATCH(gpgpu->batch); } static void intel_gpgpu_pipe_control_gen8(intel_gpgpu_t *gpgpu) { gen8_pipe_control_t* pc = (gen8_pipe_control_t*) intel_batchbuffer_alloc_space(gpgpu->batch, sizeof(gen8_pipe_control_t)); memset(pc, 0, sizeof(*pc)); pc->dw0.length = SIZEOF32(gen8_pipe_control_t) - 2; pc->dw0.instruction_subopcode = GEN7_PIPE_CONTROL_SUBOPCODE_3D_CONTROL; pc->dw0.instruction_opcode = GEN7_PIPE_CONTROL_OPCODE_3D_CONTROL; pc->dw0.instruction_pipeline = 
GEN7_PIPE_CONTROL_3D; pc->dw0.instruction_type = GEN7_PIPE_CONTROL_INSTRUCTION_GFX; pc->dw1.render_target_cache_flush_enable = 1; pc->dw1.texture_cache_invalidation_enable = 1; pc->dw1.cs_stall = 1; pc->dw1.dc_flush_enable = 1; //pc->dw1.instruction_cache_invalidate_enable = 1; ADVANCE_BATCH(gpgpu->batch); } static void intel_gpgpu_set_L3_gen7(intel_gpgpu_t *gpgpu, uint32_t use_slm) { BEGIN_BATCH(gpgpu->batch, 9); OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ OUT_BATCH(gpgpu->batch, GEN7_L3_SQC_REG1_ADDRESS_OFFSET); OUT_BATCH(gpgpu->batch, 0x00A00000); OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ OUT_BATCH(gpgpu->batch, GEN7_L3_CNTL_REG2_ADDRESS_OFFSET); if (use_slm) OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg1[12]); else OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg1[4]); OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ OUT_BATCH(gpgpu->batch, GEN7_L3_CNTL_REG3_ADDRESS_OFFSET); if (use_slm) OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg2[12]); else OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg2[4]); ADVANCE_BATCH(gpgpu->batch); intel_gpgpu_pipe_control(gpgpu); } static void intel_gpgpu_set_L3_baytrail(intel_gpgpu_t *gpgpu, uint32_t use_slm) { BEGIN_BATCH(gpgpu->batch, 9); OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ OUT_BATCH(gpgpu->batch, GEN7_L3_SQC_REG1_ADDRESS_OFFSET); OUT_BATCH(gpgpu->batch, 0x00D30000); /* General credit : High credit = 26 : 6 */ OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ OUT_BATCH(gpgpu->batch, GEN7_L3_CNTL_REG2_ADDRESS_OFFSET); if (use_slm) OUT_BATCH(gpgpu->batch, 0x01020021); /* {SLM=64, URB=96, DC=16, RO=16, Sum=192} */ else OUT_BATCH(gpgpu->batch, 0x02040040); /* {SLM=0, URB=128, DC=32, RO=32, Sum=192} */ OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ OUT_BATCH(gpgpu->batch, GEN7_L3_CNTL_REG3_ADDRESS_OFFSET); OUT_BATCH(gpgpu->batch, 0x0); /* {I/S=0, Const=0, Tex=0} */ ADVANCE_BATCH(gpgpu->batch); intel_gpgpu_pipe_control(gpgpu); } static void intel_gpgpu_set_L3_gen75(intel_gpgpu_t *gpgpu, uint32_t use_slm) { /* still set L3 in batch buffer for fulsim. */ if(gpgpu->drv->atomic_test_result != SELF_TEST_ATOMIC_FAIL) { BEGIN_BATCH(gpgpu->batch, 15); OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ /* FIXME: KMD always disable the atomic in L3 for some reason. I checked the spec, and don't think we need that workaround now. Before I send a patch to kernel, let's just enable it here. 
*/ OUT_BATCH(gpgpu->batch, HSW_SCRATCH1_OFFSET); OUT_BATCH(gpgpu->batch, 0); /* enable atomic in L3 */ OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ OUT_BATCH(gpgpu->batch, HSW_ROW_CHICKEN3_HDC_OFFSET); OUT_BATCH(gpgpu->batch, (1 << 6ul) << 16); /* enable atomic in L3 */ } else { BEGIN_BATCH(gpgpu->batch, 9); } OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ OUT_BATCH(gpgpu->batch, GEN7_L3_SQC_REG1_ADDRESS_OFFSET); OUT_BATCH(gpgpu->batch, 0x08800000); OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ OUT_BATCH(gpgpu->batch, GEN7_L3_CNTL_REG2_ADDRESS_OFFSET); if (use_slm) OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg1[12]); else OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg1[4]); OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ OUT_BATCH(gpgpu->batch, GEN7_L3_CNTL_REG3_ADDRESS_OFFSET); if (use_slm) OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg2[12]); else OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg2[4]); ADVANCE_BATCH(gpgpu->batch); //if(use_slm) // gpgpu->batch->enable_slm = 1; intel_gpgpu_pipe_control(gpgpu); } static void intel_gpgpu_set_L3_gen8(intel_gpgpu_t *gpgpu, uint32_t use_slm) { BEGIN_BATCH(gpgpu->batch, 3); OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */ OUT_BATCH(gpgpu->batch, GEN8_L3_CNTL_REG_ADDRESS_OFFSET); // FIXME, this is a workaround for switch SLM enable and disable random hang if(use_slm) OUT_BATCH(gpgpu->batch, 0x60000121); /* {SLM=192, URB=128, Rest=384} */ else OUT_BATCH(gpgpu->batch, 0x60000160); /* {SLM=0, URB=384, Rest=384, Sum=768} */ //if(use_slm) // gpgpu->batch->enable_slm = 1; intel_gpgpu_pipe_control(gpgpu); } static void intel_gpgpu_batch_start(intel_gpgpu_t *gpgpu) { intel_batchbuffer_start_atomic(gpgpu->batch, 256); intel_gpgpu_pipe_control(gpgpu); assert(intel_gpgpu_set_L3); intel_gpgpu_set_L3(gpgpu, gpgpu->ker->use_slm); intel_gpgpu_select_pipeline(gpgpu); intel_gpgpu_set_base_address(gpgpu); intel_gpgpu_load_vfe_state(gpgpu); intel_gpgpu_load_curbe_buffer(gpgpu); intel_gpgpu_load_idrt(gpgpu); if (gpgpu->perf_b.bo) { BEGIN_BATCH(gpgpu->batch, 3); OUT_BATCH(gpgpu->batch, (0x28 << 23) | /* MI_REPORT_PERF_COUNT */ (3 - 2)); /* length-2 */ OUT_RELOC(gpgpu->batch, gpgpu->perf_b.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, 0 | /* Offset for the start "counters" */ 1); /* Use GTT and not PGTT */ OUT_BATCH(gpgpu->batch, 0); ADVANCE_BATCH(gpgpu->batch); } /* Insert PIPE_CONTROL for time stamp of start*/ if (gpgpu->time_stamp_b.bo) intel_gpgpu_write_timestamp(gpgpu, 0); } static void intel_gpgpu_post_action_gen7(intel_gpgpu_t *gpgpu, int32_t flush_mode) { if(flush_mode) intel_gpgpu_pipe_control(gpgpu); } static void intel_gpgpu_post_action_gen75(intel_gpgpu_t *gpgpu, int32_t flush_mode) { /* flush force for set L3 */ intel_gpgpu_pipe_control(gpgpu); /* Restore L3 control to disable SLM mode, otherwise, may affect 3D pipeline */ intel_gpgpu_set_L3(gpgpu, 0); } static void intel_gpgpu_batch_end(intel_gpgpu_t *gpgpu, int32_t flush_mode) { /* Insert PIPE_CONTROL for time stamp of end*/ if (gpgpu->time_stamp_b.bo) intel_gpgpu_write_timestamp(gpgpu, 1); /* Insert the performance counter command */ if (gpgpu->perf_b.bo) { BEGIN_BATCH(gpgpu->batch, 3); OUT_BATCH(gpgpu->batch, (0x28 << 23) | /* MI_REPORT_PERF_COUNT */ (3 - 2)); /* length-2 */ OUT_RELOC(gpgpu->batch, gpgpu->perf_b.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, 512 | /* Offset for the end "counters" */ 1); /* Use GTT and not PGTT */ OUT_BATCH(gpgpu->batch, 0); ADVANCE_BATCH(gpgpu->batch); } 
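  /* The end counters land 512 bytes into perf_b so they do not clobber
   * the start snapshot written at offset 0 in intel_gpgpu_batch_start();
   * the post action below then restores shared state (e.g. the Gen7.5
   * L3/SLM partition) before the batch is closed. */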
intel_gpgpu_post_action(gpgpu, flush_mode); intel_batchbuffer_end_atomic(gpgpu->batch); } static int intel_gpgpu_batch_reset(intel_gpgpu_t *gpgpu, size_t sz) { return intel_batchbuffer_reset(gpgpu->batch, sz); } static int intel_gpgpu_flush(intel_gpgpu_t *gpgpu) { if (!gpgpu->batch || !gpgpu->batch->buffer) return 0; return intel_batchbuffer_flush(gpgpu->batch); /* FIXME: Remove old assert here for binded buffer offset 0 which tried to guard possible NULL buffer pointer check in kernel, as in case like "runtime_null_kernel_arg", but that's wrong to just take buffer offset 0 as NULL, and cause failure for normal kernels which has no such NULL ptr check but with buffer offset 0 (which is possible now and will be normal if full PPGTT is on). Need to fix NULL ptr check otherwise. */ } static int intel_gpgpu_state_init(intel_gpgpu_t *gpgpu, uint32_t max_threads, uint32_t size_cs_entry, int profiling) { drm_intel_bo *bo; /* Binded buffers */ gpgpu->binded_n = 0; gpgpu->img_bitmap = 0; gpgpu->img_index_base = 3; gpgpu->sampler_bitmap = ~((1 << max_sampler_n) - 1); /* URB */ gpgpu->curb.num_cs_entries = 64; gpgpu->curb.size_cs_entry = size_cs_entry; gpgpu->max_threads = max_threads; if (gpgpu->printf_b.bo) dri_bo_unreference(gpgpu->printf_b.bo); gpgpu->printf_b.bo = NULL; if (gpgpu->profiling_b.bo) dri_bo_unreference(gpgpu->profiling_b.bo); gpgpu->profiling_b.bo = NULL; /* Set the profile buffer*/ if(gpgpu->time_stamp_b.bo) dri_bo_unreference(gpgpu->time_stamp_b.bo); gpgpu->time_stamp_b.bo = NULL; if (profiling) { bo = dri_bo_alloc(gpgpu->drv->bufmgr, "timestamp query", 4096, 4096); gpgpu->time_stamp_b.bo = bo; if (!bo) fprintf(stderr, "Could not allocate buffer for profiling.\n"); } /* stack */ if (gpgpu->stack_b.bo) dri_bo_unreference(gpgpu->stack_b.bo); gpgpu->stack_b.bo = NULL; /* Set the auxiliary buffer*/ uint32_t size_aux = 0; if(gpgpu->aux_buf.bo) dri_bo_unreference(gpgpu->aux_buf.bo); gpgpu->aux_buf.bo = NULL; /* begin with surface heap to make sure it's page aligned, because state base address use 20bit for the address */ gpgpu->aux_offset.surface_heap_offset = size_aux; size_aux += sizeof(surface_heap_t); //curbe must be 32 bytes aligned size_aux = ALIGN(size_aux, 64); gpgpu->aux_offset.curbe_offset = size_aux; size_aux += gpgpu->curb.num_cs_entries * gpgpu->curb.size_cs_entry * 32; //idrt must be 32 bytes aligned size_aux = ALIGN(size_aux, 32); gpgpu->aux_offset.idrt_offset = size_aux; size_aux += MAX_IF_DESC * sizeof(struct gen6_interface_descriptor); //must be 32 bytes aligned //sampler state and vme state share the same buffer, size_aux = ALIGN(size_aux, 32); gpgpu->aux_offset.sampler_state_offset = size_aux; size_aux += MAX(GEN_MAX_SAMPLERS * sizeof(gen6_sampler_state_t), GEN_MAX_VME_STATES * sizeof(gen7_vme_state_t)); //sampler border color state must be 32 bytes aligned size_aux = ALIGN(size_aux, 32); gpgpu->aux_offset.sampler_border_color_state_offset = size_aux; size_aux += GEN_MAX_SAMPLERS * sizeof(gen7_sampler_border_color_t); /* make sure aux buffer is page aligned */ size_aux = ALIGN(size_aux, 4096); bo = dri_bo_alloc(gpgpu->drv->bufmgr, "AUX_BUFFER", size_aux, 4096); if (!bo || dri_bo_map(bo, 1) != 0) { fprintf(stderr, "%s:%d: %s.\n", __FILE__, __LINE__, strerror(errno)); if (bo) dri_bo_unreference(bo); if (profiling && gpgpu->time_stamp_b.bo) dri_bo_unreference(gpgpu->time_stamp_b.bo); gpgpu->time_stamp_b.bo = NULL; return -1; } memset(bo->virtual, 0, size_aux); gpgpu->aux_buf.bo = bo; return 0; } static void intel_gpgpu_set_buf_reloc_gen7(intel_gpgpu_t *gpgpu, int32_t 
index, dri_bo* obj_bo, uint32_t obj_bo_offset) { surface_heap_t *heap = gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.surface_heap_offset; heap->binding_table[index] = offsetof(surface_heap_t, surface) + index * sizeof(gen7_surface_state_t); dri_bo_emit_reloc(gpgpu->aux_buf.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, obj_bo_offset, gpgpu->aux_offset.surface_heap_offset + heap->binding_table[index] + offsetof(gen7_surface_state_t, ss1), obj_bo); } static void intel_gpgpu_set_buf_reloc_for_vme_gen7(intel_gpgpu_t *gpgpu, int32_t index, dri_bo* obj_bo, uint32_t obj_bo_offset) { surface_heap_t *heap = gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.surface_heap_offset; heap->binding_table[index] = offsetof(surface_heap_t, surface) + index * sizeof(gen7_surface_state_t); dri_bo_emit_reloc(gpgpu->aux_buf.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, obj_bo_offset, gpgpu->aux_offset.surface_heap_offset + heap->binding_table[index] + offsetof(gen7_media_surface_state_t, ss0), obj_bo); } static dri_bo* intel_gpgpu_alloc_constant_buffer(intel_gpgpu_t *gpgpu, uint32_t size, uint8_t bti) { if(gpgpu->constant_b.bo) dri_bo_unreference(gpgpu->constant_b.bo); gpgpu->constant_b.bo = drm_intel_bo_alloc(gpgpu->drv->bufmgr, "CONSTANT_BUFFER", size, 64); if (gpgpu->constant_b.bo == NULL) return NULL; intel_gpgpu_setup_bti(gpgpu, gpgpu->constant_b.bo, 0, size, bti, I965_SURFACEFORMAT_R32G32B32A32_UINT); return gpgpu->constant_b.bo; } static void intel_gpgpu_setup_bti_gen7(intel_gpgpu_t *gpgpu, drm_intel_bo *buf, uint32_t internal_offset, size_t size, unsigned char index, uint32_t format) { assert(size <= (2ul<<30)); size_t s = size - 1; surface_heap_t *heap = gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.surface_heap_offset; gen7_surface_state_t *ss0 = (gen7_surface_state_t *) &heap->surface[index * sizeof(gen7_surface_state_t)]; memset(ss0, 0, sizeof(gen7_surface_state_t)); ss0->ss0.surface_type = I965_SURFACE_BUFFER; ss0->ss0.surface_format = format; ss0->ss2.width = s & 0x7f; /* bits 6:0 of sz */ // Per bspec, for I965_SURFACE_BUFFER with RAW format the size must be a multiple of 4 bytes.
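// e.g. size = 16 gives s = 15, so (width & 0x03) == 3 below holds; a
// hypothetical 13-byte RAW buffer (s = 12) would trip the assert.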
if(format == I965_SURFACEFORMAT_RAW) assert((ss0->ss2.width & 0x03) == 3); ss0->ss2.height = (s >> 7) & 0x3fff; /* bits 20:7 of sz */ ss0->ss3.depth = (s >> 21) & 0x3ff; /* bits 30:21 of sz */ ss0->ss5.cache_control = cl_gpgpu_get_cache_ctrl(); heap->binding_table[index] = offsetof(surface_heap_t, surface) + index * sizeof(gen7_surface_state_t); ss0->ss1.base_addr = buf->offset + internal_offset; dri_bo_emit_reloc(gpgpu->aux_buf.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, internal_offset, gpgpu->aux_offset.surface_heap_offset + heap->binding_table[index] + offsetof(gen7_surface_state_t, ss1), buf); } static void intel_gpgpu_setup_bti_gen75(intel_gpgpu_t *gpgpu, drm_intel_bo *buf, uint32_t internal_offset, size_t size, unsigned char index, uint32_t format) { assert(size <= (2ul<<30)); size_t s = size - 1; surface_heap_t *heap = gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.surface_heap_offset; gen7_surface_state_t *ss0 = (gen7_surface_state_t *) &heap->surface[index * sizeof(gen7_surface_state_t)]; memset(ss0, 0, sizeof(gen7_surface_state_t)); ss0->ss0.surface_type = I965_SURFACE_BUFFER; ss0->ss0.surface_format = format; if(format != I965_SURFACEFORMAT_RAW) { ss0->ss7.shader_r = I965_SURCHAN_SELECT_RED; ss0->ss7.shader_g = I965_SURCHAN_SELECT_GREEN; ss0->ss7.shader_b = I965_SURCHAN_SELECT_BLUE; ss0->ss7.shader_a = I965_SURCHAN_SELECT_ALPHA; } ss0->ss2.width = s & 0x7f; /* bits 6:0 of sz */ // Per bspec, for I965_SURFACE_BUFFER with RAW format the size must be a multiple of 4 bytes. if(format == I965_SURFACEFORMAT_RAW) assert((ss0->ss2.width & 0x03) == 3); ss0->ss2.height = (s >> 7) & 0x3fff; /* bits 20:7 of sz */ ss0->ss3.depth = (s >> 21) & 0x3ff; /* bits 30:21 of sz */ ss0->ss5.cache_control = cl_gpgpu_get_cache_ctrl(); heap->binding_table[index] = offsetof(surface_heap_t, surface) + index * sizeof(gen7_surface_state_t); ss0->ss1.base_addr = buf->offset + internal_offset; dri_bo_emit_reloc(gpgpu->aux_buf.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, internal_offset, gpgpu->aux_offset.surface_heap_offset + heap->binding_table[index] + offsetof(gen7_surface_state_t, ss1), buf); } static void intel_gpgpu_setup_bti_gen8(intel_gpgpu_t *gpgpu, drm_intel_bo *buf, uint32_t internal_offset, size_t size, unsigned char index, uint32_t format) { assert(size <= (2ul<<30)); size_t s = size - 1; surface_heap_t *heap = gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.surface_heap_offset; gen8_surface_state_t *ss0 = (gen8_surface_state_t *) &heap->surface[index * sizeof(gen8_surface_state_t)]; memset(ss0, 0, sizeof(gen8_surface_state_t)); ss0->ss0.surface_type = I965_SURFACE_BUFFER; ss0->ss0.surface_format = format; if(format != I965_SURFACEFORMAT_RAW) { ss0->ss7.shader_channel_select_red = I965_SURCHAN_SELECT_RED; ss0->ss7.shader_channel_select_green = I965_SURCHAN_SELECT_GREEN; ss0->ss7.shader_channel_select_blue = I965_SURCHAN_SELECT_BLUE; ss0->ss7.shader_channel_select_alpha = I965_SURCHAN_SELECT_ALPHA; } ss0->ss2.width = s & 0x7f; /* bits 6:0 of sz */ // Per bspec, for I965_SURFACE_BUFFER with RAW format the size must be a multiple of 4 bytes.
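// A worked example of the (size - 1) bitfield split used here, assuming a
// 64KB buffer: s = 0xffff gives width = 0x7f, height = 0x1ff, depth = 0,
// and (depth << 21) | (height << 7) | width == 0xffff == size - 1.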
if(format == I965_SURFACEFORMAT_RAW) assert((ss0->ss2.width & 0x03) == 3); ss0->ss2.height = (s >> 7) & 0x3fff; /* bits 20:7 of sz */ ss0->ss3.depth = (s >> 21) & 0x3ff; /* bits 30:21 of sz */ ss0->ss1.mem_obj_ctrl_state = cl_gpgpu_get_cache_ctrl(); heap->binding_table[index] = offsetof(surface_heap_t, surface) + index * sizeof(gen8_surface_state_t); ss0->ss8.surface_base_addr_lo = (buf->offset64 + internal_offset) & 0xffffffff; ss0->ss9.surface_base_addr_hi = ((buf->offset64 + internal_offset) >> 32) & 0xffffffff; dri_bo_emit_reloc(gpgpu->aux_buf.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, internal_offset, gpgpu->aux_offset.surface_heap_offset + heap->binding_table[index] + offsetof(gen8_surface_state_t, ss8), buf); } static void intel_gpgpu_setup_bti_gen9(intel_gpgpu_t *gpgpu, drm_intel_bo *buf, uint32_t internal_offset, size_t size, unsigned char index, uint32_t format) { assert(size <= (4ul<<30)); size_t s = size - 1; surface_heap_t *heap = gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.surface_heap_offset; gen8_surface_state_t *ss0 = (gen8_surface_state_t *) &heap->surface[index * sizeof(gen8_surface_state_t)]; memset(ss0, 0, sizeof(gen8_surface_state_t)); ss0->ss0.surface_type = I965_SURFACE_BUFFER; ss0->ss0.surface_format = format; if(format != I965_SURFACEFORMAT_RAW) { ss0->ss7.shader_channel_select_red = I965_SURCHAN_SELECT_RED; ss0->ss7.shader_channel_select_green = I965_SURCHAN_SELECT_GREEN; ss0->ss7.shader_channel_select_blue = I965_SURCHAN_SELECT_BLUE; ss0->ss7.shader_channel_select_alpha = I965_SURCHAN_SELECT_ALPHA; } ss0->ss2.width = s & 0x7f; /* bits 6:0 of sz */ // Per bspec, for I965_SURFACE_BUFFER with RAW format the size must be a multiple of 4 bytes. if(format == I965_SURFACEFORMAT_RAW) assert((ss0->ss2.width & 0x03) == 3); ss0->ss2.height = (s >> 7) & 0x3fff; /* bits 20:7 of sz */ ss0->ss3.depth = (s >> 21) & 0x7ff; /* bits 31:21 of sz; per bspec only Gen9 supports this */ ss0->ss1.mem_obj_ctrl_state = cl_gpgpu_get_cache_ctrl(); heap->binding_table[index] = offsetof(surface_heap_t, surface) + index * sizeof(gen8_surface_state_t); ss0->ss8.surface_base_addr_lo = (buf->offset64 + internal_offset) & 0xffffffff; ss0->ss9.surface_base_addr_hi = ((buf->offset64 + internal_offset) >> 32) & 0xffffffff; dri_bo_emit_reloc(gpgpu->aux_buf.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, internal_offset, gpgpu->aux_offset.surface_heap_offset + heap->binding_table[index] + offsetof(gen8_surface_state_t, ss8), buf); } static int intel_is_surface_array(cl_mem_object_type type) { if (type == CL_MEM_OBJECT_IMAGE1D_ARRAY || type == CL_MEM_OBJECT_IMAGE2D_ARRAY) return 1; return 0; } static int intel_get_surface_type(cl_mem_object_type type) { switch (type) { case CL_MEM_OBJECT_IMAGE1D: case CL_MEM_OBJECT_IMAGE1D_ARRAY: return I965_SURFACE_1D; case CL_MEM_OBJECT_IMAGE1D_BUFFER: case CL_MEM_OBJECT_IMAGE2D: case CL_MEM_OBJECT_IMAGE2D_ARRAY: return I965_SURFACE_2D; case CL_MEM_OBJECT_IMAGE3D: return I965_SURFACE_3D; default: assert(0); } return 0; } /* Get the fixed surface type. If it is a 1D array image with a large index, we need to fix it up to a 2D type, due to a Gen7/Gen75 sampler issue with integer-type surfaces using clamp address mode and nearest filter mode.
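 The fixup applies only to indices at or above BTI_WORKAROUND_IMAGE_OFFSET + BTI_RESERVED_NUM (see get_surface_type below).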
*/ static uint32_t get_surface_type(intel_gpgpu_t *gpgpu, int index, cl_mem_object_type type) { uint32_t surface_type; //For now all platforms need it, so the platform check is disabled; re-enable it //when some platform doesn't need this workaround if (/*((IS_IVYBRIDGE(gpgpu->drv->device_id) || IS_HASWELL(gpgpu->drv->device_id) || IS_BROADWELL(gpgpu->drv->device_id) || IS_CHERRYVIEW(gpgpu->drv->device_id) || IS_SKYLAKE(gpgpu->drv->device_id) || IS_BROXTON(gpgpu->drv->device_id) || IS_KABYLAKE(gpgpu->drv->device_id))) && */ index >= BTI_WORKAROUND_IMAGE_OFFSET + BTI_RESERVED_NUM && type == CL_MEM_OBJECT_IMAGE1D_ARRAY) surface_type = I965_SURFACE_2D; else surface_type = intel_get_surface_type(type); return surface_type; } static void intel_gpgpu_bind_image_gen7(intel_gpgpu_t *gpgpu, uint32_t index, dri_bo* obj_bo, uint32_t obj_bo_offset, uint32_t format, cl_mem_object_type type, uint32_t bpp, int32_t w, int32_t h, int32_t depth, int32_t pitch, int32_t slice_pitch, int32_t tiling) { surface_heap_t *heap = gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.surface_heap_offset; gen7_surface_state_t *ss = (gen7_surface_state_t *) &heap->surface[index * sizeof(gen7_surface_state_t)]; memset(ss, 0, sizeof(*ss)); ss->ss0.vertical_line_stride = 0; // always choose VALIGN_2 ss->ss0.surface_type = get_surface_type(gpgpu, index, type); if (intel_is_surface_array(type)) { ss->ss0.surface_array = 1; ss->ss0.surface_array_spacing = 1; } ss->ss0.surface_format = format; ss->ss1.base_addr = obj_bo->offset + obj_bo_offset; ss->ss2.width = w - 1; ss->ss2.height = h - 1; ss->ss3.depth = depth - 1; ss->ss4.not_str_buf.rt_view_extent = depth - 1; ss->ss4.not_str_buf.min_array_element = 0; ss->ss3.pitch = pitch - 1; ss->ss5.cache_control = cl_gpgpu_get_cache_ctrl(); if (tiling == GPGPU_TILE_X) { ss->ss0.tiled_surface = 1; ss->ss0.tile_walk = I965_TILEWALK_XMAJOR; } else if (tiling == GPGPU_TILE_Y) { ss->ss0.tiled_surface = 1; ss->ss0.tile_walk = I965_TILEWALK_YMAJOR; } ss->ss0.render_cache_rw_mode = 1; /* XXX do we need to set it?
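 It appears to select read-write rather than write-only render cache behavior, which surfaces written by kernels would need.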
*/ intel_gpgpu_set_buf_reloc_gen7(gpgpu, index, obj_bo, obj_bo_offset); assert(index < GEN_MAX_SURFACES); } static void intel_gpgpu_bind_image_for_vme_gen7(intel_gpgpu_t *gpgpu, uint32_t index, dri_bo* obj_bo, uint32_t obj_bo_offset, uint32_t format, cl_mem_object_type type, uint32_t bpp, int32_t w, int32_t h, int32_t depth, int32_t pitch, int32_t slice_pitch, int32_t tiling) { surface_heap_t *heap = gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.surface_heap_offset; gen7_media_surface_state_t *ss = (gen7_media_surface_state_t *) &heap->surface[index * sizeof(gen7_surface_state_t)]; memset(ss, 0, sizeof(*ss)); ss->ss0.base_addr = obj_bo->offset + obj_bo_offset; ss->ss1.uv_offset_v_direction = 0; ss->ss1.pic_struct = 0; ss->ss1.width = w - 1; ss->ss1.height = h - 1; if (tiling == GPGPU_NO_TILE) { ss->ss2.tile_mode = 0; } else if (tiling == GPGPU_TILE_X){ ss->ss2.tile_mode = 2; } else if (tiling == GPGPU_TILE_Y){ ss->ss2.tile_mode = 3; } ss->ss2.half_pitch_for_chroma = 0; ss->ss2.surface_pitch = pitch - 1; ss->ss2.surface_object_control_state = cl_gpgpu_get_cache_ctrl(); ss->ss2.interleave_chroma = 0; ss->ss2.surface_format = 12; //Y8_UNORM ss->ss3.y_offset_for_u = 0; ss->ss3.x_offset_for_u = 0; ss->ss4.y_offset_for_v = 0; ss->ss4.x_offset_for_v = 0; intel_gpgpu_set_buf_reloc_for_vme_gen7(gpgpu, index, obj_bo, obj_bo_offset); assert(index < GEN_MAX_SURFACES); } static void intel_gpgpu_bind_image_gen75(intel_gpgpu_t *gpgpu, uint32_t index, dri_bo* obj_bo, uint32_t obj_bo_offset, uint32_t format, cl_mem_object_type type, uint32_t bpp, int32_t w, int32_t h, int32_t depth, int32_t pitch, int32_t slice_pitch, int32_t tiling) { surface_heap_t *heap = gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.surface_heap_offset; gen7_surface_state_t *ss = (gen7_surface_state_t *) &heap->surface[index * sizeof(gen7_surface_state_t)]; memset(ss, 0, sizeof(*ss)); ss->ss0.vertical_line_stride = 0; // always choose VALIGN_2 ss->ss0.surface_type = get_surface_type(gpgpu, index, type); if (intel_is_surface_array(type)) { ss->ss0.surface_array = 1; ss->ss0.surface_array_spacing = 1; } ss->ss0.surface_format = format; ss->ss1.base_addr = obj_bo->offset + obj_bo_offset; ss->ss2.width = w - 1; ss->ss2.height = h - 1; ss->ss3.depth = depth - 1; ss->ss4.not_str_buf.rt_view_extent = depth - 1; ss->ss4.not_str_buf.min_array_element = 0; ss->ss3.pitch = pitch - 1; ss->ss5.cache_control = cl_gpgpu_get_cache_ctrl(); ss->ss7.shader_r = I965_SURCHAN_SELECT_RED; ss->ss7.shader_g = I965_SURCHAN_SELECT_GREEN; ss->ss7.shader_b = I965_SURCHAN_SELECT_BLUE; ss->ss7.shader_a = I965_SURCHAN_SELECT_ALPHA; if (tiling == GPGPU_TILE_X) { ss->ss0.tiled_surface = 1; ss->ss0.tile_walk = I965_TILEWALK_XMAJOR; } else if (tiling == GPGPU_TILE_Y) { ss->ss0.tiled_surface = 1; ss->ss0.tile_walk = I965_TILEWALK_YMAJOR; } ss->ss0.render_cache_rw_mode = 1; /* XXX do we need to set it? 
*/ intel_gpgpu_set_buf_reloc_gen7(gpgpu, index, obj_bo, obj_bo_offset); assert(index < GEN_MAX_SURFACES); } static void intel_gpgpu_bind_image_gen8(intel_gpgpu_t *gpgpu, uint32_t index, dri_bo* obj_bo, uint32_t obj_bo_offset, uint32_t format, cl_mem_object_type type, uint32_t bpp, int32_t w, int32_t h, int32_t depth, int32_t pitch, int32_t slice_pitch, int32_t tiling) { surface_heap_t *heap = gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.surface_heap_offset; gen8_surface_state_t *ss = (gen8_surface_state_t *) &heap->surface[index * sizeof(gen8_surface_state_t)]; memset(ss, 0, sizeof(*ss)); ss->ss0.vertical_line_stride = 0; // always choose VALIGN_2 ss->ss0.surface_type = get_surface_type(gpgpu, index, type); ss->ss0.surface_format = format; if (intel_is_surface_array(type)) { ss->ss0.surface_array = 1; ss->ss1.surface_qpitch = (h + 3)/4; } ss->ss0.horizontal_alignment = 1; ss->ss0.vertical_alignment = 1; if (tiling == GPGPU_TILE_X) { ss->ss0.tile_mode = GEN8_TILEMODE_XMAJOR; } else if (tiling == GPGPU_TILE_Y) { ss->ss0.tile_mode = GEN8_TILEMODE_YMAJOR; } else assert(tiling == GPGPU_NO_TILE);// W mode is not supported now. ss->ss2.width = w - 1; ss->ss2.height = h - 1; ss->ss3.depth = depth - 1; ss->ss8.surface_base_addr_lo = (obj_bo->offset64 + obj_bo_offset) & 0xffffffff; ss->ss9.surface_base_addr_hi = ((obj_bo->offset64 + obj_bo_offset) >> 32) & 0xffffffff; ss->ss4.render_target_view_ext = depth - 1; ss->ss4.min_array_elt = 0; ss->ss3.surface_pitch = pitch - 1; ss->ss1.mem_obj_ctrl_state = cl_gpgpu_get_cache_ctrl(); ss->ss7.shader_channel_select_red = I965_SURCHAN_SELECT_RED; ss->ss7.shader_channel_select_green = I965_SURCHAN_SELECT_GREEN; ss->ss7.shader_channel_select_blue = I965_SURCHAN_SELECT_BLUE; ss->ss7.shader_channel_select_alpha = I965_SURCHAN_SELECT_ALPHA; ss->ss0.render_cache_rw_mode = 1; /* XXX do we need to set it? 
*/ heap->binding_table[index] = offsetof(surface_heap_t, surface) + index * surface_state_sz; dri_bo_emit_reloc(gpgpu->aux_buf.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, obj_bo_offset, gpgpu->aux_offset.surface_heap_offset + heap->binding_table[index] + offsetof(gen8_surface_state_t, ss8), obj_bo); assert(index < GEN_MAX_SURFACES); } static void intel_gpgpu_bind_image_gen9(intel_gpgpu_t *gpgpu, uint32_t index, dri_bo* obj_bo, uint32_t obj_bo_offset, uint32_t format, cl_mem_object_type type, uint32_t bpp, int32_t w, int32_t h, int32_t depth, int32_t pitch, int32_t slice_pitch, int32_t tiling) { surface_heap_t *heap = gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.surface_heap_offset; gen8_surface_state_t *ss = (gen8_surface_state_t *) &heap->surface[index * sizeof(gen8_surface_state_t)]; memset(ss, 0, sizeof(*ss)); ss->ss0.vertical_line_stride = 0; // always choose VALIGN_2 ss->ss0.surface_type = get_surface_type(gpgpu, index, type); ss->ss0.surface_format = format; if (intel_is_surface_array(type) && ss->ss0.surface_type == I965_SURFACE_1D) { ss->ss0.surface_array = 1; ss->ss1.surface_qpitch = (slice_pitch/bpp + 3)/4; //align_h } if (intel_is_surface_array(type) && ss->ss0.surface_type == I965_SURFACE_2D) { ss->ss0.surface_array = 1; ss->ss1.surface_qpitch = (slice_pitch/pitch + 3)/4; } if(ss->ss0.surface_type == I965_SURFACE_3D) ss->ss1.surface_qpitch = (slice_pitch/pitch + 3)/4; ss->ss0.horizontal_alignment = 1; ss->ss0.vertical_alignment = 1; if (tiling == GPGPU_TILE_X) { ss->ss0.tile_mode = GEN8_TILEMODE_XMAJOR; } else if (tiling == GPGPU_TILE_Y) { ss->ss0.tile_mode = GEN8_TILEMODE_YMAJOR; } else assert(tiling == GPGPU_NO_TILE);// W mode is not supported now. ss->ss2.width = w - 1; ss->ss2.height = h - 1; ss->ss3.depth = depth - 1; ss->ss8.surface_base_addr_lo = (obj_bo->offset64 + obj_bo_offset) & 0xffffffff; ss->ss9.surface_base_addr_hi = ((obj_bo->offset64 + obj_bo_offset) >> 32) & 0xffffffff; ss->ss4.render_target_view_ext = depth - 1; ss->ss4.min_array_elt = 0; ss->ss3.surface_pitch = pitch - 1; ss->ss1.mem_obj_ctrl_state = cl_gpgpu_get_cache_ctrl(); ss->ss7.shader_channel_select_red = I965_SURCHAN_SELECT_RED; ss->ss7.shader_channel_select_green = I965_SURCHAN_SELECT_GREEN; ss->ss7.shader_channel_select_blue = I965_SURCHAN_SELECT_BLUE; ss->ss7.shader_channel_select_alpha = I965_SURCHAN_SELECT_ALPHA; ss->ss0.render_cache_rw_mode = 1; /* XXX do we need to set it? 
*/ heap->binding_table[index] = offsetof(surface_heap_t, surface) + index * surface_state_sz; dri_bo_emit_reloc(gpgpu->aux_buf.bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, obj_bo_offset, gpgpu->aux_offset.surface_heap_offset + heap->binding_table[index] + offsetof(gen8_surface_state_t, ss8), obj_bo); assert(index < GEN_MAX_SURFACES); } static void intel_gpgpu_bind_buf(intel_gpgpu_t *gpgpu, drm_intel_bo *buf, uint32_t offset, uint32_t internal_offset, size_t size, uint8_t bti) { assert(gpgpu->binded_n < max_buf_n); if(offset != -1) { gpgpu->binded_buf[gpgpu->binded_n] = buf; gpgpu->target_buf_offset[gpgpu->binded_n] = internal_offset; gpgpu->binded_offset[gpgpu->binded_n] = offset; gpgpu->binded_n++; } intel_gpgpu_setup_bti(gpgpu, buf, internal_offset, size, bti, I965_SURFACEFORMAT_RAW); } static int intel_gpgpu_set_scratch(intel_gpgpu_t * gpgpu, uint32_t per_thread_size) { drm_intel_bufmgr *bufmgr = gpgpu->drv->bufmgr; drm_intel_bo* old = gpgpu->scratch_b.bo; uint32_t total = per_thread_size * gpgpu->max_threads; /* Per bspec, scratch should be 2X the desired size when the EU index is not continuous */ if (IS_HASWELL(gpgpu->drv->device_id) || IS_CHERRYVIEW(gpgpu->drv->device_id) || PCI_CHIP_BROXTON_1 == gpgpu->drv->device_id || PCI_CHIP_BROXTON_3 == gpgpu->drv->device_id) total *= 2; gpgpu->per_thread_scratch = per_thread_size; if(old && old->size < total) { drm_intel_bo_unreference(old); old = NULL; } if(!old && total) { gpgpu->scratch_b.bo = drm_intel_bo_alloc(bufmgr, "SCRATCH_BO", total, 4096); if (gpgpu->scratch_b.bo == NULL) return -1; } return 0; } static void intel_gpgpu_set_stack(intel_gpgpu_t *gpgpu, uint32_t offset, uint32_t size, uint8_t bti) { drm_intel_bufmgr *bufmgr = gpgpu->drv->bufmgr; gpgpu->stack_b.bo = drm_intel_bo_alloc(bufmgr, "STACK", size, 64); cl_gpgpu_bind_buf((cl_gpgpu)gpgpu, (cl_buffer)gpgpu->stack_b.bo, offset, 0, size, bti); } static void intel_gpgpu_build_idrt_gen7(intel_gpgpu_t *gpgpu, cl_gpgpu_kernel *kernel) { gen6_interface_descriptor_t *desc; drm_intel_bo *ker_bo = NULL; desc = (gen6_interface_descriptor_t*) (gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.idrt_offset); memset(desc, 0, sizeof(*desc)); ker_bo = (drm_intel_bo *) kernel->bo; desc->desc0.kernel_start_pointer = ker_bo->offset >> 6; /* reloc */ desc->desc1.single_program_flow = 0; desc->desc1.floating_point_mode = 0; /* use IEEE-754 rule */ desc->desc5.rounding_mode = 0; /* round to nearest even */ assert((gpgpu->aux_buf.bo->offset + gpgpu->aux_offset.sampler_state_offset) % 32 == 0); desc->desc2.sampler_state_pointer = (gpgpu->aux_buf.bo->offset + gpgpu->aux_offset.sampler_state_offset) >> 5; desc->desc3.binding_table_entry_count = 0; /* no prefetch */ desc->desc3.binding_table_pointer = 0; desc->desc4.curbe_read_len = kernel->curbe_sz / 32; desc->desc4.curbe_read_offset = 0; /* Barriers / SLM are automatically handled on Gen7+ */ if (gpgpu->drv->gen_ver == 7 || gpgpu->drv->gen_ver == 75) { size_t slm_sz = kernel->slm_sz; desc->desc5.group_threads_num = kernel->use_slm ?
kernel->thread_n : 0; desc->desc5.barrier_enable = kernel->use_slm; if (slm_sz <= 4*KB) slm_sz = 4*KB; else if (slm_sz <= 8*KB) slm_sz = 8*KB; else if (slm_sz <= 16*KB) slm_sz = 16*KB; else if (slm_sz <= 32*KB) slm_sz = 32*KB; else slm_sz = 64*KB; slm_sz = slm_sz >> 12; desc->desc5.slm_sz = slm_sz; } else desc->desc5.group_threads_num = kernel->barrierID; /* BarrierID on GEN6 */ dri_bo_emit_reloc(gpgpu->aux_buf.bo, I915_GEM_DOMAIN_INSTRUCTION, 0, 0, gpgpu->aux_offset.idrt_offset + offsetof(gen6_interface_descriptor_t, desc0), ker_bo); dri_bo_emit_reloc(gpgpu->aux_buf.bo, I915_GEM_DOMAIN_SAMPLER, 0, gpgpu->aux_offset.sampler_state_offset, gpgpu->aux_offset.idrt_offset + offsetof(gen6_interface_descriptor_t, desc2), gpgpu->aux_buf.bo); } static void intel_gpgpu_build_idrt_gen8(intel_gpgpu_t *gpgpu, cl_gpgpu_kernel *kernel) { gen8_interface_descriptor_t *desc; desc = (gen8_interface_descriptor_t*) (gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.idrt_offset); memset(desc, 0, sizeof(*desc)); desc->desc0.kernel_start_pointer = 0; /* reloc */ desc->desc2.single_program_flow = 0; desc->desc2.floating_point_mode = 0; /* use IEEE-754 rule */ desc->desc6.rounding_mode = 0; /* round to nearest even */ assert((gpgpu->aux_buf.bo->offset + gpgpu->aux_offset.sampler_state_offset) % 32 == 0); desc->desc3.sampler_state_pointer = gpgpu->aux_offset.sampler_state_offset >> 5; desc->desc4.binding_table_entry_count = 0; /* no prefetch */ desc->desc4.binding_table_pointer = 0; desc->desc5.curbe_read_len = kernel->curbe_sz / 32; desc->desc5.curbe_read_offset = 0; /* Barriers / SLM are automatically handled on Gen7+ */ size_t slm_sz = kernel->slm_sz; /* group_threads_num should not be set to 0 even if the barrier is disabled per bspec */ desc->desc6.group_threads_num = kernel->thread_n; desc->desc6.barrier_enable = kernel->use_slm; if (slm_sz == 0) slm_sz = 0; else if (slm_sz <= 4*KB) slm_sz = 4*KB; else if (slm_sz <= 8*KB) slm_sz = 8*KB; else if (slm_sz <= 16*KB) slm_sz = 16*KB; else if (slm_sz <= 32*KB) slm_sz = 32*KB; else slm_sz = 64*KB; slm_sz = slm_sz >> 12; desc->desc6.slm_sz = slm_sz; } static void intel_gpgpu_build_idrt_gen9(intel_gpgpu_t *gpgpu, cl_gpgpu_kernel *kernel) { gen8_interface_descriptor_t *desc; desc = (gen8_interface_descriptor_t*) (gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.idrt_offset); memset(desc, 0, sizeof(*desc)); desc->desc0.kernel_start_pointer = 0; /* reloc */ desc->desc2.single_program_flow = 0; desc->desc2.floating_point_mode = 0; /* use IEEE-754 rule */ desc->desc6.rounding_mode = 0; /* round to nearest even */ assert((gpgpu->aux_buf.bo->offset + gpgpu->aux_offset.sampler_state_offset) % 32 == 0); desc->desc3.sampler_state_pointer = gpgpu->aux_offset.sampler_state_offset >> 5; desc->desc4.binding_table_entry_count = 0; /* no prefetch */ desc->desc4.binding_table_pointer = 0; desc->desc5.curbe_read_len = kernel->curbe_sz / 32; desc->desc5.curbe_read_offset = 0; /* Barriers / SLM are automatically handled on Gen7+ */ size_t slm_sz = kernel->slm_sz; /* group_threads_num should not be set to 0 even if the barrier is disabled per bspec */ desc->desc6.group_threads_num = kernel->thread_n; desc->desc6.barrier_enable = kernel->use_slm; if (slm_sz == 0) slm_sz = 0; else if (slm_sz <= 1*KB) slm_sz = 1; else if (slm_sz <= 2*KB) slm_sz = 2; else if (slm_sz <= 4*KB) slm_sz = 3; else if (slm_sz <= 8*KB) slm_sz = 4; else if (slm_sz <= 16*KB) slm_sz = 5; else if (slm_sz <= 32*KB) slm_sz = 6; else slm_sz = 7; desc->desc6.slm_sz = slm_sz; } static int intel_gpgpu_upload_curbes_gen7(intel_gpgpu_t 
*gpgpu, const void* data, uint32_t size) { unsigned char *curbe = NULL; cl_gpgpu_kernel *k = gpgpu->ker; uint32_t i, j; /* Upload the data first */ if (dri_bo_map(gpgpu->aux_buf.bo, 1) != 0) { fprintf(stderr, "%s:%d: %s.\n", __FILE__, __LINE__, strerror(errno)); return -1; } assert(gpgpu->aux_buf.bo->virtual); curbe = (unsigned char *) (gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.curbe_offset); memcpy(curbe, data, size); /* Now put all the relocations for our flat address space */ for (i = 0; i < k->thread_n; ++i) for (j = 0; j < gpgpu->binded_n; ++j) { *(uint32_t *)(curbe + gpgpu->binded_offset[j]+i*k->curbe_sz) = gpgpu->binded_buf[j]->offset64 + gpgpu->target_buf_offset[j]; drm_intel_bo_emit_reloc(gpgpu->aux_buf.bo, gpgpu->aux_offset.curbe_offset + gpgpu->binded_offset[j]+i*k->curbe_sz, gpgpu->binded_buf[j], gpgpu->target_buf_offset[j], I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER); } dri_bo_unmap(gpgpu->aux_buf.bo); return 0; } static int intel_gpgpu_upload_curbes_gen8(intel_gpgpu_t *gpgpu, const void* data, uint32_t size) { unsigned char *curbe = NULL; cl_gpgpu_kernel *k = gpgpu->ker; uint32_t i, j; /* Upload the data first */ if (dri_bo_map(gpgpu->aux_buf.bo, 1) != 0) { fprintf(stderr, "%s:%d: %s.\n", __FILE__, __LINE__, strerror(errno)); return -1; } assert(gpgpu->aux_buf.bo->virtual); curbe = (unsigned char *) (gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.curbe_offset); memcpy(curbe, data, size); /* Now put all the relocations for our flat address space */ for (i = 0; i < k->thread_n; ++i) for (j = 0; j < gpgpu->binded_n; ++j) { *(size_t *)(curbe + gpgpu->binded_offset[j]+i*k->curbe_sz) = gpgpu->binded_buf[j]->offset64 + gpgpu->target_buf_offset[j]; drm_intel_bo_emit_reloc(gpgpu->aux_buf.bo, gpgpu->aux_offset.curbe_offset + gpgpu->binded_offset[j]+i*k->curbe_sz, gpgpu->binded_buf[j], gpgpu->target_buf_offset[j], I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER); } dri_bo_unmap(gpgpu->aux_buf.bo); return 0; } static void intel_gpgpu_upload_samplers(intel_gpgpu_t *gpgpu, const void *data, uint32_t n) { if (n) { const size_t sz = n * sizeof(gen6_sampler_state_t); memcpy(gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.sampler_state_offset, data, sz); } } int translate_wrap_mode(uint32_t cl_address_mode, int using_nearest) { switch( cl_address_mode ) { case CLK_ADDRESS_NONE: case CLK_ADDRESS_REPEAT: return GEN_TEXCOORDMODE_WRAP; case CLK_ADDRESS_CLAMP: return GEN_TEXCOORDMODE_CLAMP_BORDER; case CLK_ADDRESS_CLAMP_TO_EDGE: return GEN_TEXCOORDMODE_CLAMP; case CLK_ADDRESS_MIRRORED_REPEAT: return GEN_TEXCOORDMODE_MIRROR; default: return GEN_TEXCOORDMODE_WRAP; } } static void intel_gpgpu_insert_vme_state_gen7(intel_gpgpu_t *gpgpu, cl_accelerator_intel accel, uint32_t index) { gen7_vme_state_t* vme = (gen7_vme_state_t*)(gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.sampler_state_offset) + index; memset(vme, 0, sizeof(*vme)); gen7_vme_search_path_state_t* sp = vme->sp; if(accel->desc.me.search_path_type == CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL){ sp[0].dw0.SPD_0_X = 0; sp[0].dw0.SPD_0_Y = 0; sp[0].dw0.SPD_1_X = 0; sp[0].dw0.SPD_1_Y = 0; sp[0].dw0.SPD_2_X = 0; sp[0].dw0.SPD_2_Y = 0; sp[0].dw0.SPD_3_X = 0; sp[0].dw0.SPD_3_Y = 0; } else if(accel->desc.me.search_path_type == CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL){ sp[0].dw0.SPD_0_X = 1; sp[0].dw0.SPD_0_Y = 0; sp[0].dw0.SPD_1_X = 0; sp[0].dw0.SPD_1_Y = 1; sp[0].dw0.SPD_2_X = -1; sp[0].dw0.SPD_2_Y = 0; sp[0].dw0.SPD_3_X = 0; sp[0].dw0.SPD_3_Y = 0; } else if(accel->desc.me.search_path_type == CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL){ sp[0].dw0.SPD_0_X = 1; 
sp[0].dw0.SPD_0_Y = 0; sp[0].dw0.SPD_1_X = 1; sp[0].dw0.SPD_1_Y = 0; sp[0].dw0.SPD_2_X = 1; sp[0].dw0.SPD_2_Y = 0; sp[0].dw0.SPD_3_X = 1; sp[0].dw0.SPD_3_Y = 0; sp[1].dw0.SPD_0_X = 1; sp[1].dw0.SPD_0_Y = 0; sp[1].dw0.SPD_1_X = 1; sp[1].dw0.SPD_1_Y = 0; sp[1].dw0.SPD_2_X = 1; sp[1].dw0.SPD_2_Y = 0; sp[1].dw0.SPD_3_X = 0; sp[1].dw0.SPD_3_Y = 1; sp[2].dw0.SPD_0_X = -1; sp[2].dw0.SPD_0_Y = 0; sp[2].dw0.SPD_1_X = -1; sp[2].dw0.SPD_1_Y = 0; sp[2].dw0.SPD_2_X = -1; sp[2].dw0.SPD_2_Y = 0; sp[2].dw0.SPD_3_X = -1; sp[2].dw0.SPD_3_Y = 0; sp[3].dw0.SPD_0_X = -1; sp[3].dw0.SPD_0_Y = 0; sp[3].dw0.SPD_1_X = -1; sp[3].dw0.SPD_1_Y = 0; sp[3].dw0.SPD_2_X = -1; sp[3].dw0.SPD_2_Y = 0; sp[3].dw0.SPD_3_X = 0; sp[3].dw0.SPD_3_Y = 1; sp[4].dw0.SPD_0_X = 1; sp[4].dw0.SPD_0_Y = 0; sp[4].dw0.SPD_1_X = 1; sp[4].dw0.SPD_1_Y = 0; sp[4].dw0.SPD_2_X = 1; sp[4].dw0.SPD_2_Y = 0; sp[4].dw0.SPD_3_X = 1; sp[4].dw0.SPD_3_Y = 0; sp[5].dw0.SPD_0_X = 1; sp[5].dw0.SPD_0_Y = 0; sp[5].dw0.SPD_1_X = 1; sp[5].dw0.SPD_1_Y = 0; sp[5].dw0.SPD_2_X = 1; sp[5].dw0.SPD_2_Y = 0; sp[5].dw0.SPD_3_X = 0; sp[5].dw0.SPD_3_Y = 1; sp[6].dw0.SPD_0_X = -1; sp[6].dw0.SPD_0_Y = 0; sp[6].dw0.SPD_1_X = -1; sp[6].dw0.SPD_1_Y = 0; sp[6].dw0.SPD_2_X = -1; sp[6].dw0.SPD_2_Y = 0; sp[6].dw0.SPD_3_X = -1; sp[6].dw0.SPD_3_Y = 0; sp[7].dw0.SPD_0_X = -1; sp[7].dw0.SPD_0_Y = 0; sp[7].dw0.SPD_1_X = -1; sp[7].dw0.SPD_1_Y = 0; sp[7].dw0.SPD_2_X = -1; sp[7].dw0.SPD_2_Y = 0; sp[7].dw0.SPD_3_X = 0; sp[7].dw0.SPD_3_Y = 1; sp[8].dw0.SPD_0_X = 1; sp[8].dw0.SPD_0_Y = 0; sp[8].dw0.SPD_1_X = 1; sp[8].dw0.SPD_1_Y = 0; sp[8].dw0.SPD_2_X = 1; sp[8].dw0.SPD_2_Y = 0; sp[8].dw0.SPD_3_X = 1; sp[8].dw0.SPD_3_Y = 0; sp[9].dw0.SPD_0_X = 1; sp[9].dw0.SPD_0_Y = 0; sp[9].dw0.SPD_1_X = 1; sp[9].dw0.SPD_1_Y = 0; sp[9].dw0.SPD_2_X = 1; sp[9].dw0.SPD_2_Y = 0; sp[9].dw0.SPD_3_X = 0; sp[9].dw0.SPD_3_Y = 1; sp[10].dw0.SPD_0_X = -1; sp[10].dw0.SPD_0_Y = 0; sp[10].dw0.SPD_1_X = -1; sp[10].dw0.SPD_1_Y = 0; sp[10].dw0.SPD_2_X = -1; sp[10].dw0.SPD_2_Y = 0; sp[10].dw0.SPD_3_X = -1; sp[10].dw0.SPD_3_Y = 0; sp[11].dw0.SPD_0_X = -1; sp[11].dw0.SPD_0_Y = 0; sp[11].dw0.SPD_1_X = -1; sp[11].dw0.SPD_1_Y = 0; sp[11].dw0.SPD_2_X = -1; sp[11].dw0.SPD_2_Y = 0; sp[11].dw0.SPD_3_X = 0; sp[11].dw0.SPD_3_Y = 0; } } static void intel_gpgpu_bind_vme_state_gen7(intel_gpgpu_t *gpgpu, cl_accelerator_intel accel) { intel_gpgpu_insert_vme_state_gen7(gpgpu, accel, 0); } static void intel_gpgpu_insert_sampler_gen7(intel_gpgpu_t *gpgpu, uint32_t index, uint32_t clk_sampler) { int using_nearest = 0; uint32_t wrap_mode; gen7_sampler_state_t *sampler; sampler = (gen7_sampler_state_t *)(gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.sampler_state_offset) + index; memset(sampler, 0, sizeof(*sampler)); assert((gpgpu->aux_buf.bo->offset + gpgpu->aux_offset.sampler_border_color_state_offset) % 32 == 0); sampler->ss2.default_color_pointer = (gpgpu->aux_buf.bo->offset + gpgpu->aux_offset.sampler_border_color_state_offset) >> 5; if ((clk_sampler & __CLK_NORMALIZED_MASK) == CLK_NORMALIZED_COORDS_FALSE) sampler->ss3.non_normalized_coord = 1; else sampler->ss3.non_normalized_coord = 0; switch (clk_sampler & __CLK_FILTER_MASK) { case CLK_FILTER_NEAREST: sampler->ss0.min_filter = GEN_MAPFILTER_NEAREST; sampler->ss0.mip_filter = GEN_MIPFILTER_NONE; sampler->ss0.mag_filter = GEN_MAPFILTER_NEAREST; using_nearest = 1; break; case CLK_FILTER_LINEAR: sampler->ss0.min_filter = GEN_MAPFILTER_LINEAR; sampler->ss0.mip_filter = GEN_MIPFILTER_NONE; sampler->ss0.mag_filter = GEN_MAPFILTER_LINEAR; break; } wrap_mode = translate_wrap_mode(clk_sampler & 
__CLK_ADDRESS_MASK, using_nearest); sampler->ss3.s_wrap_mode = wrap_mode; /* XXX the Mesa i965 driver code points out that if the surface is a 1D surface, we may need * to set t_wrap_mode to GEN_TEXCOORDMODE_WRAP. */ sampler->ss3.t_wrap_mode = wrap_mode; sampler->ss3.r_wrap_mode = wrap_mode; sampler->ss0.lod_preclamp = 1; /* OpenGL mode */ sampler->ss0.default_color_mode = 0; /* OpenGL/DX10 mode */ sampler->ss0.base_level = 0; sampler->ss1.max_lod = 0; sampler->ss1.min_lod = 0; if (sampler->ss0.min_filter != GEN_MAPFILTER_NEAREST) sampler->ss3.address_round |= GEN_ADDRESS_ROUNDING_ENABLE_U_MIN | GEN_ADDRESS_ROUNDING_ENABLE_V_MIN | GEN_ADDRESS_ROUNDING_ENABLE_R_MIN; if (sampler->ss0.mag_filter != GEN_MAPFILTER_NEAREST) sampler->ss3.address_round |= GEN_ADDRESS_ROUNDING_ENABLE_U_MAG | GEN_ADDRESS_ROUNDING_ENABLE_V_MAG | GEN_ADDRESS_ROUNDING_ENABLE_R_MAG; dri_bo_emit_reloc(gpgpu->aux_buf.bo, I915_GEM_DOMAIN_SAMPLER, 0, gpgpu->aux_offset.sampler_border_color_state_offset, gpgpu->aux_offset.sampler_state_offset + index * sizeof(gen7_sampler_state_t) + offsetof(gen7_sampler_state_t, ss2), gpgpu->aux_buf.bo); } static void intel_gpgpu_insert_sampler_gen8(intel_gpgpu_t *gpgpu, uint32_t index, uint32_t clk_sampler) { int using_nearest = 0; uint32_t wrap_mode; gen8_sampler_state_t *sampler; sampler = (gen8_sampler_state_t *)(gpgpu->aux_buf.bo->virtual + gpgpu->aux_offset.sampler_state_offset) + index; memset(sampler, 0, sizeof(*sampler)); assert((gpgpu->aux_buf.bo->offset + gpgpu->aux_offset.sampler_border_color_state_offset) % 32 == 0); if ((clk_sampler & __CLK_NORMALIZED_MASK) == CLK_NORMALIZED_COORDS_FALSE) sampler->ss3.non_normalized_coord = 1; else sampler->ss3.non_normalized_coord = 0; switch (clk_sampler & __CLK_FILTER_MASK) { case CLK_FILTER_NEAREST: sampler->ss0.min_filter = GEN_MAPFILTER_NEAREST; sampler->ss0.mip_filter = GEN_MIPFILTER_NONE; sampler->ss0.mag_filter = GEN_MAPFILTER_NEAREST; using_nearest = 1; break; case CLK_FILTER_LINEAR: sampler->ss0.min_filter = GEN_MAPFILTER_LINEAR; sampler->ss0.mip_filter = GEN_MIPFILTER_NONE; sampler->ss0.mag_filter = GEN_MAPFILTER_LINEAR; break; } wrap_mode = translate_wrap_mode(clk_sampler & __CLK_ADDRESS_MASK, using_nearest); sampler->ss3.s_wrap_mode = wrap_mode; /* XXX the Mesa i965 driver code points out that if the surface is a 1D surface, we may need * to set t_wrap_mode to GEN_TEXCOORDMODE_WRAP.
*/ sampler->ss3.t_wrap_mode = wrap_mode; sampler->ss3.r_wrap_mode = wrap_mode; sampler->ss0.lod_preclamp = 1; /* OpenGL mode */ sampler->ss0.default_color_mode = 0; /* OpenGL/DX10 mode */ sampler->ss0.base_level = 0; sampler->ss1.max_lod = 0; sampler->ss1.min_lod = 0; if (sampler->ss0.min_filter != GEN_MAPFILTER_NEAREST) sampler->ss3.address_round |= GEN_ADDRESS_ROUNDING_ENABLE_U_MIN | GEN_ADDRESS_ROUNDING_ENABLE_V_MIN | GEN_ADDRESS_ROUNDING_ENABLE_R_MIN; if (sampler->ss0.mag_filter != GEN_MAPFILTER_NEAREST) sampler->ss3.address_round |= GEN_ADDRESS_ROUNDING_ENABLE_U_MAG | GEN_ADDRESS_ROUNDING_ENABLE_V_MAG | GEN_ADDRESS_ROUNDING_ENABLE_R_MAG; } static void intel_gpgpu_bind_sampler_gen7(intel_gpgpu_t *gpgpu, uint32_t *samplers, size_t sampler_sz) { int index; assert(sampler_sz <= GEN_MAX_SAMPLERS); for(index = 0; index < sampler_sz; index++) intel_gpgpu_insert_sampler_gen7(gpgpu, index, samplers[index]); } static void intel_gpgpu_bind_sampler_gen8(intel_gpgpu_t *gpgpu, uint32_t *samplers, size_t sampler_sz) { int index; assert(sampler_sz <= GEN_MAX_SAMPLERS); for(index = 0; index < sampler_sz; index++) intel_gpgpu_insert_sampler_gen8(gpgpu, index, samplers[index]); } static void intel_gpgpu_states_setup(intel_gpgpu_t *gpgpu, cl_gpgpu_kernel *kernel) { gpgpu->ker = kernel; if (gpgpu->drv->null_bo) intel_gpgpu_setup_bti(gpgpu, gpgpu->drv->null_bo, 0, 64*1024, 0xfe, I965_SURFACEFORMAT_RAW); intel_gpgpu_build_idrt(gpgpu, kernel); dri_bo_unmap(gpgpu->aux_buf.bo); } static void intel_gpgpu_set_perf_counters(intel_gpgpu_t *gpgpu, cl_buffer *perf) { if (gpgpu->perf_b.bo) drm_intel_bo_unreference(gpgpu->perf_b.bo); drm_intel_bo_reference((drm_intel_bo*) perf); gpgpu->perf_b.bo = (drm_intel_bo*) perf; } static void intel_gpgpu_walker_gen7(intel_gpgpu_t *gpgpu, uint32_t simd_sz, uint32_t thread_n, const size_t global_wk_off[3], const size_t global_dim_off[3], const size_t global_wk_sz[3], const size_t local_wk_sz[3]) { const uint32_t global_wk_dim[3] = { global_wk_sz[0] / local_wk_sz[0], global_wk_sz[1] / local_wk_sz[1], global_wk_sz[2] / local_wk_sz[2] }; uint32_t right_mask = ~0x0; size_t group_sz = local_wk_sz[0] * local_wk_sz[1] * local_wk_sz[2]; assert(simd_sz == 8 || simd_sz == 16); uint32_t shift = (group_sz & (simd_sz - 1)); shift = (shift == 0) ? 
simd_sz : shift; right_mask = (1 << shift) - 1; BEGIN_BATCH(gpgpu->batch, 11); OUT_BATCH(gpgpu->batch, CMD_GPGPU_WALKER | 9); OUT_BATCH(gpgpu->batch, 0); /* kernel index == 0 */ assert(thread_n <= 64); if (simd_sz == 16) OUT_BATCH(gpgpu->batch, (1 << 30) | (thread_n-1)); /* SIMD16 | thread max */ else OUT_BATCH(gpgpu->batch, (0 << 30) | (thread_n-1)); /* SIMD8 | thread max */ OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, global_wk_dim[0]); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, global_wk_dim[1]); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, global_wk_dim[2]); OUT_BATCH(gpgpu->batch, right_mask); OUT_BATCH(gpgpu->batch, ~0x0); /* we always set height as 1, so set bottom mask as all 1*/ ADVANCE_BATCH(gpgpu->batch); BEGIN_BATCH(gpgpu->batch, 2); OUT_BATCH(gpgpu->batch, CMD_MEDIA_STATE_FLUSH | 0); OUT_BATCH(gpgpu->batch, 0); /* kernel index == 0 */ ADVANCE_BATCH(gpgpu->batch); if (IS_IVYBRIDGE(gpgpu->drv->device_id)) intel_gpgpu_pipe_control(gpgpu); } static void intel_gpgpu_walker_gen8(intel_gpgpu_t *gpgpu, uint32_t simd_sz, uint32_t thread_n, const size_t global_wk_off[3], const size_t global_dim_off[3], const size_t global_wk_sz[3], const size_t local_wk_sz[3]) { const uint32_t global_wk_dim[3] = { global_wk_sz[0] / local_wk_sz[0], global_wk_sz[1] / local_wk_sz[1], global_wk_sz[2] / local_wk_sz[2] }; uint32_t right_mask = ~0x0; size_t group_sz = local_wk_sz[0] * local_wk_sz[1] * local_wk_sz[2]; assert(simd_sz == 8 || simd_sz == 16); uint32_t shift = (group_sz & (simd_sz - 1)); shift = (shift == 0) ? simd_sz : shift; right_mask = (1 << shift) - 1; BEGIN_BATCH(gpgpu->batch, 15); OUT_BATCH(gpgpu->batch, CMD_GPGPU_WALKER | 13); OUT_BATCH(gpgpu->batch, 0); /* kernel index == 0 */ OUT_BATCH(gpgpu->batch, 0); /* Indirect Data Length */ OUT_BATCH(gpgpu->batch, 0); /* Indirect Data Start Address */ assert(thread_n <= 64); if (simd_sz == 16) OUT_BATCH(gpgpu->batch, (1 << 30) | (thread_n-1)); /* SIMD16 | thread max */ else OUT_BATCH(gpgpu->batch, (0 << 30) | (thread_n-1)); /* SIMD8 | thread max */ OUT_BATCH(gpgpu->batch, global_dim_off[0]); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, global_wk_dim[0]+global_dim_off[0]); OUT_BATCH(gpgpu->batch, global_dim_off[1]); OUT_BATCH(gpgpu->batch, 0); OUT_BATCH(gpgpu->batch, global_wk_dim[1]+global_dim_off[1]); OUT_BATCH(gpgpu->batch, global_dim_off[2]); OUT_BATCH(gpgpu->batch, global_wk_dim[2]+global_dim_off[2]); OUT_BATCH(gpgpu->batch, right_mask); OUT_BATCH(gpgpu->batch, ~0x0); /* we always set height as 1, so set bottom mask as all 1*/ ADVANCE_BATCH(gpgpu->batch); BEGIN_BATCH(gpgpu->batch, 2); OUT_BATCH(gpgpu->batch, CMD_MEDIA_STATE_FLUSH | 0); OUT_BATCH(gpgpu->batch, 0); /* kernel index == 0 */ ADVANCE_BATCH(gpgpu->batch); intel_gpgpu_pipe_control(gpgpu); } static intel_event_t* intel_gpgpu_event_new(intel_gpgpu_t *gpgpu) { intel_event_t *event = NULL; TRY_ALLOC_NO_ERR (event, CALLOC(intel_event_t)); event->buffer = gpgpu->batch->buffer; if (event->buffer) drm_intel_bo_reference(event->buffer); event->status = command_queued; if(gpgpu->time_stamp_b.bo) { event->ts_buf = gpgpu->time_stamp_b.bo; drm_intel_bo_reference(event->ts_buf); } exit: return event; error: cl_free(event); event = NULL; goto exit; } /* The upper layer already flushed the batch buffer, just update internal status to command_submitted. 
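 (The code below tracks this state with the command_running enum value.)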
*/ static void intel_gpgpu_event_flush(intel_event_t *event) { assert(event->status == command_queued); event->status = command_running; } static int intel_gpgpu_event_update_status(intel_event_t *event, int wait) { if(event->status == command_complete) return event->status; if (event->buffer && event->status == command_running && !drm_intel_bo_busy(event->buffer)) { event->status = command_complete; drm_intel_bo_unreference(event->buffer); event->buffer = NULL; return event->status; } if(wait == 0) return event->status; if (event->buffer) { drm_intel_bo_wait_rendering(event->buffer); event->status = command_complete; drm_intel_bo_unreference(event->buffer); event->buffer = NULL; } return event->status; } static void intel_gpgpu_event_delete(intel_event_t *event) { if(event->buffer) drm_intel_bo_unreference(event->buffer); if(event->ts_buf) drm_intel_bo_unreference(event->ts_buf); cl_free(event); } /* IVB's and HSW's results MUST be shifted on x86_64 systems */ static uint64_t intel_gpgpu_read_ts_reg_gen7(drm_intel_bufmgr *bufmgr) { uint64_t result = 0; drm_intel_reg_read(bufmgr, TIMESTAMP_ADDR, &result); /* On x86_64 systems, the low 32 bits of the timestamp count are stored in the high 32 bits of the result we get from drm_intel_reg_read, and bits 32-35 are lost; on i386 systems the result matches the bspec. This seems to be a kernel readq bug. So shift by 32 bits on x86_64, and keep only the low 32 bits of data on i386. */ struct utsname buf; uname(&buf); /* On some systems the user space is 32-bit but the kernel is 64-bit, so we can't use a * compiler flag to determine the kernel's architecture; use uname to get it. */ /* x86_64 on Linux, amd64 on BSD */ if(strcmp(buf.machine, "x86_64") == 0 || strcmp(buf.machine, "amd64") == 0) return result >> 32; else return result & 0x0ffffffff; } /* Baytrail's result should have the high 4 bits cleared */ static uint64_t intel_gpgpu_read_ts_reg_baytrail(drm_intel_bufmgr *bufmgr) { uint64_t result = 0; drm_intel_reg_read(bufmgr, TIMESTAMP_ADDR, &result); return result & 0x0ffffffff; } /* Get the current time of the GPU. */ static void intel_gpgpu_event_get_gpu_cur_timestamp(intel_driver_t* gen_driver, uint64_t* ret_ts) { uint64_t result = 0; drm_intel_bufmgr *bufmgr = gen_driver->bufmgr; /* Get the timestamp that matches the bspec */ result = intel_gpgpu_read_ts_reg(bufmgr); result *= 80; *ret_ts = result; return; } /* Get the GPU execution time. */ static void intel_gpgpu_event_get_exec_timestamp(intel_gpgpu_t* gpgpu, int index, uint64_t* ret_ts) { uint64_t result = 0; assert(gpgpu->time_stamp_b.bo); assert(index == 0 || index == 1); drm_intel_gem_bo_map_gtt(gpgpu->time_stamp_b.bo); uint64_t* ptr = gpgpu->time_stamp_b.bo->virtual; result = ptr[index]; /* According to the bspec, the timestamp counter should be 36 bits, but comparing it with the timestamp counter from the IO control read, we find the top 4 bits seem to be fake. In order to keep the timestamp counter consistent, we just skip those 4 bits.
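 Each remaining tick represents 80 ns, hence the multiply by 80 below to convert to nanoseconds.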
*/ result = (result & 0x0FFFFFFFF) * 80; //convert to nanoseconds *ret_ts = result; drm_intel_gem_bo_unmap_gtt(gpgpu->time_stamp_b.bo); } static int intel_gpgpu_set_profiling_buf(intel_gpgpu_t *gpgpu, uint32_t size, uint32_t offset, uint8_t bti) { drm_intel_bo *bo = NULL; gpgpu->profiling_b.bo = drm_intel_bo_alloc(gpgpu->drv->bufmgr, "Profiling buffer", size, 64); bo = gpgpu->profiling_b.bo; if (!bo || (drm_intel_bo_map(bo, 1) != 0)) { fprintf(stderr, "%s:%d: %s.\n", __FILE__, __LINE__, strerror(errno)); return -1; } memset(bo->virtual, 0, size); drm_intel_bo_unmap(bo); cl_gpgpu_bind_buf((cl_gpgpu)gpgpu, (cl_buffer)bo, offset, 0, size, bti); return 0; } static void intel_gpgpu_set_profiling_info(intel_gpgpu_t *gpgpu, void* profiling_info) { gpgpu->profiling_info = profiling_info; } static void* intel_gpgpu_get_profiling_info(intel_gpgpu_t *gpgpu) { return gpgpu->profiling_info; } static int intel_gpgpu_set_printf_buf(intel_gpgpu_t *gpgpu, uint32_t size, uint8_t bti) { if (gpgpu->printf_b.bo) dri_bo_unreference(gpgpu->printf_b.bo); gpgpu->printf_b.bo = dri_bo_alloc(gpgpu->drv->bufmgr, "Printf buffer", size, 4096); if (!gpgpu->printf_b.bo || (drm_intel_bo_map(gpgpu->printf_b.bo, 1) != 0)) { fprintf(stderr, "%s:%d: %s.\n", __FILE__, __LINE__, strerror(errno)); return -1; } memset(gpgpu->printf_b.bo->virtual, 0, size); *(uint32_t *)(gpgpu->printf_b.bo->virtual) = 4; // the first four bytes are reserved for the length. drm_intel_bo_unmap(gpgpu->printf_b.bo); /* No need to bind the buffer; we do not need to emit a reloc. */ intel_gpgpu_setup_bti(gpgpu, gpgpu->printf_b.bo, 0, size, bti, I965_SURFACEFORMAT_RAW); return 0; } static void* intel_gpgpu_map_profiling_buf(intel_gpgpu_t *gpgpu) { drm_intel_bo *bo = NULL; bo = gpgpu->profiling_b.bo; drm_intel_bo_map(bo, 1); return bo->virtual; } static void intel_gpgpu_unmap_profiling_buf_addr(intel_gpgpu_t *gpgpu) { drm_intel_bo *bo = NULL; bo = gpgpu->profiling_b.bo; drm_intel_bo_unmap(bo); } static void* intel_gpgpu_map_printf_buf(intel_gpgpu_t *gpgpu) { drm_intel_bo *bo = NULL; bo = gpgpu->printf_b.bo; drm_intel_bo_map(bo, 1); return bo->virtual; } static void intel_gpgpu_unmap_printf_buf_addr(intel_gpgpu_t *gpgpu) { drm_intel_bo *bo = NULL; bo = gpgpu->printf_b.bo; drm_intel_bo_unmap(bo); } static void intel_gpgpu_release_printf_buf(intel_gpgpu_t *gpgpu) { drm_intel_bo_unreference(gpgpu->printf_b.bo); gpgpu->printf_b.bo = NULL; } static void intel_gpgpu_set_printf_info(intel_gpgpu_t *gpgpu, void* printf_info) { gpgpu->printf_info = printf_info; } static void* intel_gpgpu_get_printf_info(intel_gpgpu_t *gpgpu) { return gpgpu->printf_info; } static void intel_gpgpu_set_kernel(intel_gpgpu_t *gpgpu, void * kernel) { gpgpu->kernel = kernel; } static void* intel_gpgpu_get_kernel(intel_gpgpu_t *gpgpu) { return gpgpu->kernel; } LOCAL void intel_set_gpgpu_callbacks(int device_id) { cl_gpgpu_new = (cl_gpgpu_new_cb *) intel_gpgpu_new; cl_gpgpu_delete = (cl_gpgpu_delete_cb *) intel_gpgpu_delete; cl_gpgpu_sync = (cl_gpgpu_sync_cb *) intel_gpgpu_sync; cl_gpgpu_bind_buf = (cl_gpgpu_bind_buf_cb *) intel_gpgpu_bind_buf; cl_gpgpu_set_stack = (cl_gpgpu_set_stack_cb *) intel_gpgpu_set_stack; cl_gpgpu_state_init = (cl_gpgpu_state_init_cb *) intel_gpgpu_state_init; cl_gpgpu_set_perf_counters = (cl_gpgpu_set_perf_counters_cb *) intel_gpgpu_set_perf_counters; cl_gpgpu_alloc_constant_buffer = (cl_gpgpu_alloc_constant_buffer_cb *) intel_gpgpu_alloc_constant_buffer; cl_gpgpu_states_setup = (cl_gpgpu_states_setup_cb *) intel_gpgpu_states_setup; cl_gpgpu_upload_samplers = (cl_gpgpu_upload_samplers_cb *)
intel_gpgpu_upload_samplers; cl_gpgpu_batch_reset = (cl_gpgpu_batch_reset_cb *) intel_gpgpu_batch_reset; cl_gpgpu_batch_start = (cl_gpgpu_batch_start_cb *) intel_gpgpu_batch_start; cl_gpgpu_batch_end = (cl_gpgpu_batch_end_cb *) intel_gpgpu_batch_end; cl_gpgpu_flush = (cl_gpgpu_flush_cb *) intel_gpgpu_flush; cl_gpgpu_bind_sampler = (cl_gpgpu_bind_sampler_cb *) intel_gpgpu_bind_sampler_gen7; cl_gpgpu_bind_vme_state = (cl_gpgpu_bind_vme_state_cb *) intel_gpgpu_bind_vme_state_gen7; cl_gpgpu_set_scratch = (cl_gpgpu_set_scratch_cb *) intel_gpgpu_set_scratch; cl_gpgpu_event_new = (cl_gpgpu_event_new_cb *)intel_gpgpu_event_new; cl_gpgpu_event_flush = (cl_gpgpu_event_flush_cb *)intel_gpgpu_event_flush; cl_gpgpu_event_update_status = (cl_gpgpu_event_update_status_cb *)intel_gpgpu_event_update_status; cl_gpgpu_event_delete = (cl_gpgpu_event_delete_cb *)intel_gpgpu_event_delete; cl_gpgpu_event_get_exec_timestamp = (cl_gpgpu_event_get_exec_timestamp_cb *)intel_gpgpu_event_get_exec_timestamp; cl_gpgpu_event_get_gpu_cur_timestamp = (cl_gpgpu_event_get_gpu_cur_timestamp_cb *)intel_gpgpu_event_get_gpu_cur_timestamp; cl_gpgpu_ref_batch_buf = (cl_gpgpu_ref_batch_buf_cb *)intel_gpgpu_ref_batch_buf; cl_gpgpu_unref_batch_buf = (cl_gpgpu_unref_batch_buf_cb *)intel_gpgpu_unref_batch_buf; cl_gpgpu_set_profiling_buffer = (cl_gpgpu_set_profiling_buffer_cb *)intel_gpgpu_set_profiling_buf; cl_gpgpu_set_profiling_info = (cl_gpgpu_set_profiling_info_cb *)intel_gpgpu_set_profiling_info; cl_gpgpu_get_profiling_info = (cl_gpgpu_get_profiling_info_cb *)intel_gpgpu_get_profiling_info; cl_gpgpu_map_profiling_buffer = (cl_gpgpu_map_profiling_buffer_cb *)intel_gpgpu_map_profiling_buf; cl_gpgpu_unmap_profiling_buffer = (cl_gpgpu_unmap_profiling_buffer_cb *)intel_gpgpu_unmap_profiling_buf_addr; cl_gpgpu_set_printf_buffer = (cl_gpgpu_set_printf_buffer_cb *)intel_gpgpu_set_printf_buf; cl_gpgpu_map_printf_buffer = (cl_gpgpu_map_printf_buffer_cb *)intel_gpgpu_map_printf_buf; cl_gpgpu_unmap_printf_buffer = (cl_gpgpu_unmap_printf_buffer_cb *)intel_gpgpu_unmap_printf_buf_addr; cl_gpgpu_release_printf_buffer = (cl_gpgpu_release_printf_buffer_cb *)intel_gpgpu_release_printf_buf; cl_gpgpu_set_printf_info = (cl_gpgpu_set_printf_info_cb *)intel_gpgpu_set_printf_info; cl_gpgpu_get_printf_info = (cl_gpgpu_get_printf_info_cb *)intel_gpgpu_get_printf_info; cl_gpgpu_set_kernel = (cl_gpgpu_set_kernel_cb *)intel_gpgpu_set_kernel; cl_gpgpu_get_kernel = (cl_gpgpu_get_kernel_cb *)intel_gpgpu_get_kernel; if (IS_BROADWELL(device_id) || IS_CHERRYVIEW(device_id)) { cl_gpgpu_bind_image = (cl_gpgpu_bind_image_cb *) intel_gpgpu_bind_image_gen8; intel_gpgpu_set_L3 = intel_gpgpu_set_L3_gen8; cl_gpgpu_get_cache_ctrl = (cl_gpgpu_get_cache_ctrl_cb *)intel_gpgpu_get_cache_ctrl_gen8; intel_gpgpu_get_scratch_index = intel_gpgpu_get_scratch_index_gen8; intel_gpgpu_post_action = intel_gpgpu_post_action_gen7; //BDW need not restore SLM, same as gen7 intel_gpgpu_read_ts_reg = intel_gpgpu_read_ts_reg_gen7; if(IS_CHERRYVIEW(device_id)) intel_gpgpu_read_ts_reg = intel_gpgpu_read_ts_reg_baytrail; intel_gpgpu_set_base_address = intel_gpgpu_set_base_address_gen8; intel_gpgpu_setup_bti = intel_gpgpu_setup_bti_gen8; intel_gpgpu_load_vfe_state = intel_gpgpu_load_vfe_state_gen8; cl_gpgpu_walker = (cl_gpgpu_walker_cb *)intel_gpgpu_walker_gen8; intel_gpgpu_build_idrt = intel_gpgpu_build_idrt_gen8; intel_gpgpu_load_curbe_buffer = intel_gpgpu_load_curbe_buffer_gen8; intel_gpgpu_load_idrt = intel_gpgpu_load_idrt_gen8; cl_gpgpu_bind_sampler = (cl_gpgpu_bind_sampler_cb *) 
intel_gpgpu_bind_sampler_gen8; intel_gpgpu_pipe_control = intel_gpgpu_pipe_control_gen8; intel_gpgpu_select_pipeline = intel_gpgpu_select_pipeline_gen7; cl_gpgpu_upload_curbes = (cl_gpgpu_upload_curbes_cb *) intel_gpgpu_upload_curbes_gen8; return; } if (IS_GEN9(device_id)) { cl_gpgpu_bind_image = (cl_gpgpu_bind_image_cb *) intel_gpgpu_bind_image_gen9; intel_gpgpu_set_L3 = intel_gpgpu_set_L3_gen8; cl_gpgpu_get_cache_ctrl = (cl_gpgpu_get_cache_ctrl_cb *)intel_gpgpu_get_cache_ctrl_gen9; intel_gpgpu_get_scratch_index = intel_gpgpu_get_scratch_index_gen8; intel_gpgpu_post_action = intel_gpgpu_post_action_gen7; //SKL need not restore SLM, same as gen7 intel_gpgpu_read_ts_reg = intel_gpgpu_read_ts_reg_gen7; if(IS_GEMINILAKE(device_id)) intel_gpgpu_read_ts_reg = intel_gpgpu_read_ts_reg_baytrail; intel_gpgpu_set_base_address = intel_gpgpu_set_base_address_gen9; intel_gpgpu_setup_bti = intel_gpgpu_setup_bti_gen9; intel_gpgpu_load_vfe_state = intel_gpgpu_load_vfe_state_gen8; cl_gpgpu_walker = (cl_gpgpu_walker_cb *)intel_gpgpu_walker_gen8; intel_gpgpu_build_idrt = intel_gpgpu_build_idrt_gen9; intel_gpgpu_load_curbe_buffer = intel_gpgpu_load_curbe_buffer_gen8; intel_gpgpu_load_idrt = intel_gpgpu_load_idrt_gen8; cl_gpgpu_bind_sampler = (cl_gpgpu_bind_sampler_cb *) intel_gpgpu_bind_sampler_gen8; intel_gpgpu_pipe_control = intel_gpgpu_pipe_control_gen8; intel_gpgpu_select_pipeline = intel_gpgpu_select_pipeline_gen9; cl_gpgpu_upload_curbes = (cl_gpgpu_upload_curbes_cb *) intel_gpgpu_upload_curbes_gen8; return; } cl_gpgpu_upload_curbes = (cl_gpgpu_upload_curbes_cb *) intel_gpgpu_upload_curbes_gen7; intel_gpgpu_set_base_address = intel_gpgpu_set_base_address_gen7; intel_gpgpu_load_vfe_state = intel_gpgpu_load_vfe_state_gen7; cl_gpgpu_walker = (cl_gpgpu_walker_cb *)intel_gpgpu_walker_gen7; intel_gpgpu_build_idrt = intel_gpgpu_build_idrt_gen7; intel_gpgpu_load_curbe_buffer = intel_gpgpu_load_curbe_buffer_gen7; intel_gpgpu_load_idrt = intel_gpgpu_load_idrt_gen7; intel_gpgpu_select_pipeline = intel_gpgpu_select_pipeline_gen7; if (IS_HASWELL(device_id)) { cl_gpgpu_bind_image = (cl_gpgpu_bind_image_cb *) intel_gpgpu_bind_image_gen75; intel_gpgpu_set_L3 = intel_gpgpu_set_L3_gen75; cl_gpgpu_get_cache_ctrl = (cl_gpgpu_get_cache_ctrl_cb *)intel_gpgpu_get_cache_ctrl_gen75; intel_gpgpu_get_scratch_index = intel_gpgpu_get_scratch_index_gen75; intel_gpgpu_post_action = intel_gpgpu_post_action_gen75; intel_gpgpu_read_ts_reg = intel_gpgpu_read_ts_reg_gen7; //HSW same as ivb intel_gpgpu_setup_bti = intel_gpgpu_setup_bti_gen75; intel_gpgpu_pipe_control = intel_gpgpu_pipe_control_gen75; } else if (IS_IVYBRIDGE(device_id)) { cl_gpgpu_bind_image = (cl_gpgpu_bind_image_cb *) intel_gpgpu_bind_image_gen7; cl_gpgpu_bind_image_for_vme = (cl_gpgpu_bind_image_cb *) intel_gpgpu_bind_image_for_vme_gen7; if (IS_BAYTRAIL_T(device_id)) { intel_gpgpu_set_L3 = intel_gpgpu_set_L3_baytrail; intel_gpgpu_read_ts_reg = intel_gpgpu_read_ts_reg_baytrail; } else { intel_gpgpu_set_L3 = intel_gpgpu_set_L3_gen7; intel_gpgpu_read_ts_reg = intel_gpgpu_read_ts_reg_gen7; } cl_gpgpu_get_cache_ctrl = (cl_gpgpu_get_cache_ctrl_cb *)intel_gpgpu_get_cache_ctrl_gen7; intel_gpgpu_get_scratch_index = intel_gpgpu_get_scratch_index_gen7; intel_gpgpu_post_action = intel_gpgpu_post_action_gen7; intel_gpgpu_setup_bti = intel_gpgpu_setup_bti_gen7; intel_gpgpu_pipe_control = intel_gpgpu_pipe_control_gen7; } } Beignet-1.3.2-Source/src/intel/intel_cl_gl_share_image_info.h000664 001750 001750 00000000501 13161142102 023330 0ustar00yryr000000 000000 #ifndef 
__INTEL_CL_GL_SHARE_IMAGE_INFO_ #define __INTEL_CL_GL_SHARE_IMAGE_INFO_ struct _intel_cl_gl_share_image_info { int fd; size_t w; size_t h; size_t depth; size_t pitch; int tiling; size_t offset; size_t tile_x; size_t tile_y; unsigned int gl_format; size_t row_pitch, slice_pitch; }; #endif Beignet-1.3.2-Source/src/intel/intel_gpgpu.h000664 001750 001750 00000006037 13161142102 020045 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia * Alexei Soupikov */ #ifndef __INTEL_GPGPU_H__ #define __INTEL_GPGPU_H__ #include "cl_utils.h" #include "cl_driver.h" #include "intel/intel_batchbuffer.h" #include "intel/intel_driver.h" #include #include /* We can bind only a limited number of buffers */ enum { max_buf_n = 128 }; enum { max_img_n = 128}; enum {max_sampler_n = 16 }; struct intel_driver; struct intel_batchbuffer; /* Handle GPGPU state */ struct intel_gpgpu { void* ker_opaque; void* printf_info; void* profiling_info; struct intel_driver *drv; struct intel_batchbuffer *batch; cl_gpgpu_kernel *ker; drm_intel_bo *binded_buf[max_buf_n]; /* all buffers binded for the call */ uint32_t target_buf_offset[max_buf_n];/* internal offset for buffers binded for the call */ uint32_t binded_offset[max_buf_n]; /* their offsets in the curbe buffer */ uint32_t binded_n; /* number of buffers binded */ void *kernel; /* cl_kernel with this gpgpu */ unsigned long img_bitmap; /* image usage bitmap. */ unsigned int img_index_base; /* base index for image surface.*/ unsigned long sampler_bitmap; /* sampler usage bitmap. 
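 One bit per sampler slot; intel_gpgpu_state_init() seeds it with ~((1 << max_sampler_n) - 1).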
*/ struct { drm_intel_bo *bo; } stack_b; struct { drm_intel_bo *bo; } perf_b; struct { drm_intel_bo *bo; } scratch_b; struct { drm_intel_bo *bo; } constant_b; struct { drm_intel_bo *bo; } time_stamp_b; /* time stamp buffer */ struct { drm_intel_bo *bo; } printf_b; /* the printf buf and index buf*/ struct { drm_intel_bo *bo; } profiling_b; /* the buf for profiling*/ struct { drm_intel_bo *bo; } aux_buf; struct { uint32_t surface_heap_offset; uint32_t curbe_offset; uint32_t idrt_offset; uint32_t sampler_state_offset; uint32_t sampler_border_color_state_offset; } aux_offset; uint32_t per_thread_scratch; struct { uint32_t num_cs_entries; uint32_t size_cs_entry; /* size of one entry in 512bit elements */ } curb; uint32_t max_threads; /* max threads requested by the user */ }; struct intel_gpgpu_node { struct intel_gpgpu *gpgpu; struct intel_gpgpu_node *next; }; /* Set the gpgpu related call backs */ extern void intel_set_gpgpu_callbacks(int device_id); #endif /* __INTEL_GPGPU_H__ */ Beignet-1.3.2-Source/src/intel/intel_driver.c000664 001750 001750 00000070734 13173554000 020225 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* * Copyright 2009 Intel Corporation * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. * IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* * Authors: * Xiang Haihao * Zou Nan hai * */ #if defined(HAS_GL_EGL) #define EGL_EGLEXT_PROTOTYPES #include "GL/gl.h" #include "EGL/egl.h" #include #endif #ifdef HAS_X11 #include #include "x11/dricommon.h" #endif #include "intel_driver.h" #include "intel_gpgpu.h" #include "intel_batchbuffer.h" #include "intel_bufmgr.h" #include "cl_mem.h" #include #include #include #include #include #include #include #include #include "cl_utils.h" #include "cl_alloc.h" #include "cl_context.h" #include "cl_driver.h" #include "cl_device_id.h" #include "cl_platform_id.h" static void intel_driver_delete(intel_driver_t *driver) { if (driver == NULL) return; cl_free(driver); } static intel_driver_t* intel_driver_new(void) { intel_driver_t *driver = NULL; TRY_ALLOC_NO_ERR (driver, CALLOC(intel_driver_t)); driver->fd = -1; exit: return driver; error: intel_driver_delete(driver); driver = NULL; goto exit; } /* only used to size the maximum relocation count in drm_intel */ #define BATCH_SIZE 0x4000 /* set OCL_DUMP_AUB=1 to get an aub file */ static void intel_driver_aub_dump(intel_driver_t *driver) { char *val; val = getenv("OCL_DUMP_AUB"); if (!val) return; if (atoi(val) != 0) { drm_intel_bufmgr_gem_set_aub_filename(driver->bufmgr, "beignet.aub"); drm_intel_bufmgr_gem_set_aub_dump(driver->bufmgr, 1); } } static int intel_driver_memman_init(intel_driver_t *driver) { driver->bufmgr = drm_intel_bufmgr_gem_init(driver->fd, BATCH_SIZE); if (!driver->bufmgr) return 0; drm_intel_bufmgr_gem_enable_reuse(driver->bufmgr); driver->device_id = drm_intel_bufmgr_gem_get_devid(driver->bufmgr); intel_driver_aub_dump(driver); return 1; } static int intel_driver_context_init(intel_driver_t *driver) { driver->ctx = drm_intel_gem_context_create(driver->bufmgr); if (!driver->ctx) return 0; driver->null_bo = NULL; #ifdef HAS_BO_SET_SOFTPIN drm_intel_bo *bo = dri_bo_alloc(driver->bufmgr, "null_bo", 64*1024, 4096); drm_intel_bo_set_softpin_offset(bo, 0); // Don't reuse this bo: reuse could let a second bo try to bind to the same
// address, which would be unreasonable.
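// (Softpinning at offset 0 presumably reserves the zero page of the
// per-context address space: a stray null address dereferenced by the GPU
// then lands in this harmless bo, whose only content is MI_BATCH_BUFFER_END,
// instead of aliasing a real allocation.)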
drm_intel_bo_disable_reuse(bo); drm_intel_bo_map(bo, 1); *(uint32_t *)bo->virtual = MI_BATCH_BUFFER_END; drm_intel_bo_unmap(bo); if (drm_intel_gem_bo_context_exec(bo, driver->ctx, 0, 0) == 0) { driver->null_bo = bo; } else { drm_intel_bo_unreference(bo); } #endif return 1; } static void intel_driver_context_destroy(intel_driver_t *driver) { if (driver->null_bo) drm_intel_bo_unreference(driver->null_bo); if(driver->ctx) drm_intel_gem_context_destroy(driver->ctx); driver->ctx = NULL; } static int intel_driver_init(intel_driver_t *driver, int dev_fd) { driver->fd = dev_fd; driver->locked = 0; pthread_mutex_init(&driver->ctxmutex, NULL); if (!intel_driver_memman_init(driver)) return 0; if (!intel_driver_context_init(driver)) return 0; #if EMULATE_GEN driver->gen_ver = EMULATE_GEN; if (EMULATE_GEN == 75) driver->device_id = PCI_CHIP_HASWELL_L; /* we pick L for HSW */ else if (EMULATE_GEN == 7) driver->device_id = PCI_CHIP_IVYBRIDGE_GT2; /* we pick GT2 for IVB */ else if (EMULATE_GEN == 6) driver->device_id = PCI_CHIP_SANDYBRIDGE_GT2; /* we pick GT2 for SNB */ else FATAL ("Unsupported Gen for emulation"); #else if (IS_GEN9(driver->device_id)) driver->gen_ver = 9; else if (IS_GEN8(driver->device_id)) driver->gen_ver = 8; else if (IS_GEN75(driver->device_id)) driver->gen_ver = 75; else if (IS_GEN7(driver->device_id)) driver->gen_ver = 7; else if (IS_GEN6(driver->device_id)) driver->gen_ver = 6; else if(IS_IGDNG(driver->device_id)) driver->gen_ver = 5; else driver->gen_ver = 4; #endif /* EMULATE_GEN */ return 1; } static cl_int intel_driver_open(intel_driver_t *intel, cl_context_prop props) { int cardi; #ifdef HAS_X11 char *driver_name; #endif if (props != NULL && props->gl_type != CL_GL_NOSHARE && props->gl_type != CL_GL_GLX_DISPLAY && props->gl_type != CL_GL_EGL_DISPLAY) { fprintf(stderr, "Unsupported gl share type %d.\n", props->gl_type); return CL_INVALID_OPERATION; } #ifdef HAS_X11 intel->x11_display = XOpenDisplay(NULL); if(intel->x11_display) { if((intel->dri_ctx = getDRI2State(intel->x11_display, DefaultScreen(intel->x11_display), &driver_name))) { intel_driver_init_shared(intel, intel->dri_ctx); Xfree(driver_name); } else fprintf(stderr, "X server found, but the dri2 connection failed!\n"); } #endif if(!intel_driver_is_active(intel)) { char card_name[20]; for(cardi = 0; cardi < 16; cardi++) { sprintf(card_name, "/dev/dri/renderD%d", 128+cardi); if (access(card_name, R_OK) != 0) continue; if(intel_driver_init_render(intel, card_name)) break; } } if(!intel_driver_is_active(intel)) { char card_name[20]; for(cardi = 0; cardi < 16; cardi++) { sprintf(card_name, "/dev/dri/card%d", cardi); if (access(card_name, R_OK) != 0) continue; if(intel_driver_init_master(intel, card_name)) break; } } if(!intel_driver_is_active(intel)) { fprintf(stderr, "Device open failed, aborting...\n"); return CL_DEVICE_NOT_FOUND; } #ifdef HAS_GL_EGL if (props && props->gl_type == CL_GL_EGL_DISPLAY) { assert(props->egl_display); } #endif return CL_SUCCESS; } static void intel_driver_close(intel_driver_t *intel) { // Due to the drm change regarding the userptr test, we need to destroy the
// bufmgr before the driver is closed; otherwise the test userptr will not be freed.
if (intel->bufmgr) drm_intel_bufmgr_destroy(intel->bufmgr); #ifdef HAS_X11 if(intel->dri_ctx) dri_state_release(intel->dri_ctx); if(intel->x11_display) XCloseDisplay(intel->x11_display); #endif if(intel->need_close) { close(intel->fd); intel->need_close = 0; } intel->dri_ctx = NULL; intel->x11_display = NULL; intel->fd = -1; } LOCAL int intel_driver_is_active(intel_driver_t *driver) { return driver->fd >= 0; } #ifdef HAS_X11 LOCAL int intel_driver_init_shared(intel_driver_t *driver, dri_state_t *state) { int ret; assert(state); if(state->driConnectedFlag != DRI2) return 0; ret = intel_driver_init(driver, state->fd); driver->need_close = 0; return ret; } #endif LOCAL int intel_driver_init_master(intel_driver_t *driver, const char* dev_name) { int dev_fd, ret; drm_client_t client; // usually dev_name = "/dev/dri/card%d" dev_fd = open(dev_name, O_RDWR); if (dev_fd == -1) { fprintf(stderr, "open(\"%s\", O_RDWR) failed: %s\n", dev_name, strerror(errno)); return 0; } // Check that we're authenticated memset(&client, 0, sizeof(drm_client_t)); ret = ioctl(dev_fd, DRM_IOCTL_GET_CLIENT, &client); if (ret == -1) { fprintf(stderr, "ioctl(dev_fd, DRM_IOCTL_GET_CLIENT, &client) failed: %s\n", strerror(errno)); close(dev_fd); return 0; } if (!client.auth) { fprintf(stderr, "%s not authenticated\n", dev_name); close(dev_fd); return 0; } ret = intel_driver_init(driver, dev_fd); driver->need_close = 1; return ret; } LOCAL int intel_driver_init_render(intel_driver_t *driver, const char* dev_name) { int dev_fd, ret; dev_fd = open(dev_name, O_RDWR); if (dev_fd == -1) return 0; ret = intel_driver_init(driver, dev_fd); driver->need_close = 1; return ret; } LOCAL int intel_driver_terminate(intel_driver_t *driver) { pthread_mutex_destroy(&driver->ctxmutex); if(driver->need_close) { close(driver->fd); driver->need_close = 0; } driver->fd = -1; return 1; } LOCAL void intel_driver_lock_hardware(intel_driver_t *driver) { PPTHREAD_MUTEX_LOCK(driver); assert(!driver->locked); driver->locked = 1; } LOCAL void intel_driver_unlock_hardware(intel_driver_t *driver) { driver->locked = 0; PPTHREAD_MUTEX_UNLOCK(driver); } LOCAL dri_bo* intel_driver_share_buffer_from_name(intel_driver_t *driver, const char *sname, uint32_t name) { dri_bo *bo = intel_bo_gem_create_from_name(driver->bufmgr, sname, name); if (bo == NULL) { fprintf(stderr, "intel_bo_gem_create_from_name create \"%s\" bo from name %d failed: %s\n", sname, name, strerror(errno)); return NULL; } return bo; } LOCAL dri_bo* intel_driver_share_buffer_from_fd(intel_driver_t *driver, int fd, int size) { dri_bo *bo = drm_intel_bo_gem_create_from_prime(driver->bufmgr, fd, size); if (bo == NULL) { fprintf(stderr, "drm_intel_bo_gem_create_from_prime create bo(size %d) from fd %d failed: %s\n", size, fd, strerror(errno)); return NULL; } return bo; } LOCAL uint32_t intel_driver_shared_name(intel_driver_t *driver, dri_bo *bo) { uint32_t name; assert(bo); dri_bo_flink(bo, &name); return name; } /* XXX a null props is ok? 
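* (It is: intel_driver_open only dereferences props after a NULL check, and a
* NULL props simply means no GL sharing; intel_get_device_id below relies on
* exactly that.)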
*/ static int intel_get_device_id(void) { intel_driver_t *driver = NULL; int intel_device_id; driver = intel_driver_new(); assert(driver != NULL); if(UNLIKELY(intel_driver_open(driver, NULL) != CL_SUCCESS)) return INVALID_CHIP_ID; intel_device_id = driver->device_id; intel_driver_context_destroy(driver); intel_driver_close(driver); intel_driver_terminate(driver); intel_driver_delete(driver); return intel_device_id; } extern void intel_gpgpu_delete_all(intel_driver_t *driver); static void cl_intel_driver_delete(intel_driver_t *driver) { if (driver == NULL) return; intel_gpgpu_delete_all(driver); intel_driver_context_destroy(driver); intel_driver_close(driver); intel_driver_terminate(driver); intel_driver_delete(driver); } #include "cl_gbe_loader.h" static intel_driver_t* cl_intel_driver_new(cl_context_prop props) { intel_driver_t *driver = NULL; TRY_ALLOC_NO_ERR (driver, intel_driver_new()); if(UNLIKELY(intel_driver_open(driver, props) != CL_SUCCESS)) goto error; exit: return driver; error: cl_intel_driver_delete(driver); driver = NULL; goto exit; } static drm_intel_bufmgr* intel_driver_get_bufmgr(intel_driver_t *drv) { return drv->bufmgr; } static uint32_t intel_driver_get_ver(struct intel_driver *drv) { return drv->gen_ver; } static void intel_driver_enlarge_stack_size(struct intel_driver *drv, int32_t *stack_size) { if (drv->gen_ver == 75) *stack_size = *stack_size * 4; else if (drv->device_id == PCI_CHIP_BROXTON_1 || drv->device_id == PCI_CHIP_BROXTON_3 || IS_CHERRYVIEW(drv->device_id)) *stack_size = *stack_size * 2; } static void intel_driver_set_atomic_flag(intel_driver_t *drv, int atomic_flag) { drv->atomic_test_result = atomic_flag; } static size_t drm_intel_bo_get_size(drm_intel_bo *bo) { return bo->size; } static void* drm_intel_bo_get_virtual(drm_intel_bo *bo) { return bo->virtual; } static int get_cl_tiling(uint32_t drm_tiling) { switch(drm_tiling) { case I915_TILING_X: return CL_TILE_X; case I915_TILING_Y: return CL_TILE_Y; case I915_TILING_NONE: return CL_NO_TILE; default: assert(0); } return CL_NO_TILE; } static uint32_t intel_buffer_get_tiling_align(cl_context ctx, uint32_t tiling_mode, uint32_t dim) { uint32_t gen_ver = ((intel_driver_t *)ctx->drv)->gen_ver; uint32_t ret = 0; switch (tiling_mode) { case CL_TILE_X: if (dim == 0) { //tileX width in bytes ret = 512; } else if (dim == 1) { //tileX height in number of rows ret = 8; } else if (dim == 2) { //height to calculate slice pitch if (gen_ver == 9) //SKL same as tileY height ret = 8; else if (gen_ver == 8) //IVB, HSW, BDW same as CL_NO_TILE vertical alignment ret = 4; else ret = 2; } else assert(0); break; case CL_TILE_Y: if (dim == 0) { //tileY width in bytes ret = 128; } else if (dim == 1) { //tileY height in number of rows ret = 32; } else if (dim == 2) { //height to calculate slice pitch if (gen_ver == 9) //SKL same as tileY height ret = 32; else if (gen_ver == 8) //IVB, HSW, BDW same as CL_NO_TILE vertical alignment ret = 4; else ret = 2; } else assert(0); break; case CL_NO_TILE: if (dim == 1 || dim == 2) { //vertical alignment if (gen_ver == 8 || gen_ver == 9) //SKL 1D array need 4 alignment qpitch ret = 4; else ret = 2; } else assert(0); break; } return ret; } #if defined(HAS_GL_EGL) #include "intel_cl_gl_share_image_info.h" #include "cl_image.h" static PFNEGLEXPORTDMABUFIMAGEMESAPROC eglExportDMABUFImageMESA_func = NULL; static int get_required_egl_extensions(){ if(eglExportDMABUFImageMESA_func == NULL){ eglExportDMABUFImageMESA_func = (PFNEGLEXPORTDMABUFIMAGEMESAPROC) 
eglGetProcAddress("eglExportDMABUFImageMESA"); if(eglExportDMABUFImageMESA_func == NULL){ fprintf(stderr, "Failed to get EGL extension function eglExportDMABUFImageMESA\n"); return -1; } } return 0; } static int cl_get_clformat_from_texture(GLint tex_format, cl_image_format * cl_format) { cl_int ret = CL_SUCCESS; switch (tex_format) { case GL_RGBA8: case GL_RGBA: case GL_RGBA16: case GL_RGBA8I: case GL_RGBA16I: case GL_RGBA32I: case GL_RGBA8UI: case GL_RGBA16UI: case GL_RGBA32UI: case GL_RGBA16F: case GL_RGBA32F: cl_format->image_channel_order = CL_RGBA; break; case GL_BGRA: cl_format->image_channel_order = CL_BGRA; break; default: ret = -1; goto error; } switch (tex_format) { case GL_RGBA8: case GL_RGBA: case GL_BGRA: cl_format->image_channel_data_type = CL_UNORM_INT8; break; case GL_RGBA16: cl_format->image_channel_data_type = CL_UNORM_INT16; break; case GL_RGBA8I: cl_format->image_channel_data_type = CL_SIGNED_INT8; break; case GL_RGBA16I: cl_format->image_channel_data_type = CL_SIGNED_INT16; break; case GL_RGBA32I: cl_format->image_channel_data_type = CL_SIGNED_INT32; break; case GL_RGBA8UI: cl_format->image_channel_data_type = CL_UNSIGNED_INT8; break; case GL_RGBA16UI: cl_format->image_channel_data_type = CL_UNSIGNED_INT16; break; case GL_RGBA32UI: cl_format->image_channel_data_type = CL_UNSIGNED_INT32; break; case GL_RGBA16F: cl_format->image_channel_data_type = CL_HALF_FLOAT; break; case GL_RGBA32F: cl_format->image_channel_data_type = CL_FLOAT; break; default: ret = -1; goto error; } error: return ret; } static int get_mem_type_from_target(GLenum texture_target, cl_mem_object_type *type) { switch(texture_target) { case GL_TEXTURE_1D: *type = CL_MEM_OBJECT_IMAGE1D; break; case GL_TEXTURE_2D: *type = CL_MEM_OBJECT_IMAGE2D; break; case GL_TEXTURE_3D: *type = CL_MEM_OBJECT_IMAGE3D; break; case GL_TEXTURE_1D_ARRAY: *type = CL_MEM_OBJECT_IMAGE1D_ARRAY; break; case GL_TEXTURE_2D_ARRAY: *type = CL_MEM_OBJECT_IMAGE2D_ARRAY; break; default: return -1; } return CL_SUCCESS; } static cl_buffer intel_alloc_buffer_from_texture_egl(cl_context ctx, unsigned int target, int miplevel, unsigned int texture, struct _cl_mem_image *image) { drm_intel_bo *intel_bo = NULL; struct _intel_cl_gl_share_image_info info; unsigned int bpp, intel_fmt; cl_image_format cl_format; EGLBoolean ret; EGLenum e_target; // We only support GL_TEXTURE_2D for now because we cannot yet query info such as slice_pitch. if(target == GL_TEXTURE_2D) e_target = EGL_GL_TEXTURE_2D; else return NULL; if(get_required_egl_extensions() != 0) return NULL; EGLAttrib attrib_list[] = {EGL_GL_TEXTURE_LEVEL, miplevel, EGL_NONE}; EGLImage e_image = eglCreateImage(EGL_DISP(ctx), EGL_CTX(ctx), e_target, (EGLClientBuffer)texture, &attrib_list[0]); if(e_image == EGL_NO_IMAGE) return NULL; int fd, stride, offset; ret = eglExportDMABUFImageMESA_func(EGL_DISP(ctx), e_image, &fd, &stride, &offset); if(ret != EGL_TRUE){ eglDestroyImage(EGL_DISP(ctx), e_image); return NULL; } info.fd = fd; /* The size argument only takes effect in intel_driver_share_buffer_from_fd when * the Linux kernel is older than 3.12, so it does not matter that we set it to 0 here.
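* (On 3.12+ kernels libdrm can presumably recover the real size from the
* dma-buf fd itself, e.g. via lseek(fd, 0, SEEK_END), so passing 0 is safe.)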
*/ int size = 0; intel_bo = intel_driver_share_buffer_from_fd((intel_driver_t *)ctx->drv, fd, size); if (intel_bo == NULL) { eglDestroyImage(EGL_DISP(ctx), e_image); return NULL; } GLint param_value; glGetTexLevelParameteriv(target, miplevel, GL_TEXTURE_WIDTH, &param_value); info.w = param_value; glGetTexLevelParameteriv(target, miplevel, GL_TEXTURE_HEIGHT, &param_value); info.h = param_value; glGetTexLevelParameteriv(target, miplevel, GL_TEXTURE_DEPTH, &param_value); info.depth = 1; info.pitch = stride; uint32_t tiling_mode, swizzle_mode; drm_intel_bo_get_tiling(intel_bo, &tiling_mode, &swizzle_mode); info.offset = offset; info.tile_x = 0; info.tile_y = 0; glGetTexLevelParameteriv(target, miplevel, GL_TEXTURE_INTERNAL_FORMAT, &param_value); info.gl_format = param_value; info.row_pitch = stride; info.slice_pitch = 0; info.tiling = get_cl_tiling(tiling_mode); if (cl_get_clformat_from_texture(info.gl_format, &cl_format) != 0) goto error; if (cl_image_byte_per_pixel(&cl_format, &bpp) != CL_SUCCESS) goto error; intel_fmt = cl_image_get_intel_format(&cl_format); if (intel_fmt == INTEL_UNSUPPORTED_FORMAT) goto error; cl_mem_object_type image_type; if (get_mem_type_from_target(target, &image_type) != 0) goto error; cl_mem_image_init(image, info.w, info.h, image_type, info.depth, cl_format, intel_fmt, bpp, info.row_pitch, info.slice_pitch, info.tiling, info.tile_x, info.tile_y, info.offset); struct _cl_mem_gl_image *gl_image = (struct _cl_mem_gl_image*)image; gl_image->fd = fd; gl_image->egl_image = e_image; return (cl_buffer) intel_bo; error: drm_intel_bo_unreference(intel_bo); close(fd); eglDestroyImage(EGL_DISP(ctx), e_image); return NULL; } static cl_buffer intel_alloc_buffer_from_texture(cl_context ctx, unsigned int target, int miplevel, unsigned int texture, struct _cl_mem_image *image) { if (IS_EGL_CONTEXT(ctx)) return intel_alloc_buffer_from_texture_egl(ctx, target, miplevel, texture, image); return NULL; } static int intel_release_buffer_from_texture(cl_context ctx, struct _cl_mem_gl_image *gl_image) { if (IS_EGL_CONTEXT(ctx)) { close(gl_image->fd); eglDestroyImage(EGL_DISP(ctx), gl_image->egl_image); return CL_SUCCESS; } return -1; } #endif cl_buffer intel_share_buffer_from_libva(cl_context ctx, unsigned int bo_name, size_t *sz) { drm_intel_bo *intel_bo; intel_bo = intel_driver_share_buffer_from_name((intel_driver_t *)ctx->drv, "shared from libva", bo_name); if (intel_bo == NULL) return NULL; if (sz) *sz = intel_bo->size; return (cl_buffer)intel_bo; } cl_buffer intel_share_image_from_libva(cl_context ctx, unsigned int bo_name, struct _cl_mem_image *image) { drm_intel_bo *intel_bo; uint32_t intel_tiling, intel_swizzle_mode; intel_bo = intel_driver_share_buffer_from_name((intel_driver_t *)ctx->drv, "shared from libva", bo_name); if (intel_bo == NULL) return NULL; drm_intel_bo_get_tiling(intel_bo, &intel_tiling, &intel_swizzle_mode); image->tiling = get_cl_tiling(intel_tiling); return (cl_buffer)intel_bo; } cl_buffer intel_share_buffer_from_fd(cl_context ctx, int fd, int buffer_size) { drm_intel_bo *intel_bo; intel_bo = intel_driver_share_buffer_from_fd((intel_driver_t *)ctx->drv, fd, buffer_size); if (intel_bo == NULL) return NULL; return (cl_buffer)intel_bo; } cl_buffer intel_share_image_from_fd(cl_context ctx, int fd, int image_size, struct _cl_mem_image *image) { drm_intel_bo *intel_bo; uint32_t intel_tiling, intel_swizzle_mode; intel_bo = intel_driver_share_buffer_from_fd((intel_driver_t *)ctx->drv, fd, image_size); if (intel_bo == NULL) return NULL; drm_intel_bo_get_tiling(intel_bo, &intel_tiling,
&intel_swizzle_mode); image->tiling = get_cl_tiling(intel_tiling); return (cl_buffer)intel_bo; } static cl_buffer intel_buffer_alloc_userptr(cl_buffer_mgr bufmgr, const char* name, void *data, size_t size, unsigned long flags) { #ifdef HAS_USERPTR drm_intel_bo *bo; bo = drm_intel_bo_alloc_userptr((drm_intel_bufmgr *)bufmgr, name, data, I915_TILING_NONE, 0, size, flags); /* Fall back to unsynchronized userptr allocation if the kernel has no MMU notifier enabled. */ if (bo == NULL) bo = drm_intel_bo_alloc_userptr((drm_intel_bufmgr *)bufmgr, name, data, I915_TILING_NONE, 0, size, flags | I915_USERPTR_UNSYNCHRONIZED); return (cl_buffer)bo; #else return NULL; #endif } static int32_t get_intel_tiling(cl_int tiling, uint32_t *intel_tiling) { switch (tiling) { case CL_NO_TILE: *intel_tiling = I915_TILING_NONE; break; case CL_TILE_X: *intel_tiling = I915_TILING_X; break; case CL_TILE_Y: *intel_tiling = I915_TILING_Y; break; default: assert(0); return -1; } return 0; } static int intel_buffer_set_tiling(cl_buffer bo, cl_image_tiling_t tiling, size_t stride) { uint32_t intel_tiling; int ret; if (UNLIKELY((get_intel_tiling(tiling, &intel_tiling)) < 0)) return -1; #ifndef NDEBUG uint32_t required_tiling; required_tiling = intel_tiling; #endif ret = drm_intel_bo_set_tiling((drm_intel_bo*)bo, &intel_tiling, stride); assert(intel_tiling == required_tiling); return ret; } #define CHV_CONFIG_WARNING \ "Warning: can't get the GPU's configuration, will use the minimal one. Please update your drm to 2.4.59+ and your linux kernel to 4.0.0+.\n" static void intel_update_device_info(cl_device_id device) { intel_driver_t *driver; driver = intel_driver_new(); assert(driver != NULL); if (intel_driver_open(driver, NULL) != CL_SUCCESS) { intel_driver_delete(driver); return; } #ifdef HAS_USERPTR const size_t sz = 4096; void *host_ptr; host_ptr = cl_aligned_malloc(sz, 4096); if (host_ptr != NULL) { cl_buffer bo = intel_buffer_alloc_userptr((cl_buffer_mgr)driver->bufmgr, "CL memory object", host_ptr, sz, 0); if (bo == NULL) device->host_unified_memory = CL_FALSE; else drm_intel_bo_unreference((drm_intel_bo*)bo); cl_free(host_ptr); } else device->host_unified_memory = CL_FALSE; #endif #ifdef HAS_EU_TOTAL unsigned int eu_total; /* Prefer driver-queried max compute units if supported */ if (!drm_intel_get_eu_total(driver->fd, &eu_total)) device->max_compute_unit = eu_total; else if (IS_CHERRYVIEW(device->device_id)) printf(CHV_CONFIG_WARNING); #else if (IS_CHERRYVIEW(device->device_id)) { #if defined(__ANDROID__) device->max_compute_unit = 12; #else printf(CHV_CONFIG_WARNING); #endif } #endif #ifdef HAS_SUBSLICE_TOTAL unsigned int subslice_total; /* Prefer driver-queried subslice count if supported */ if (!drm_intel_get_subslice_total(driver->fd, &subslice_total)) device->sub_slice_count = subslice_total; else if (IS_CHERRYVIEW(device->device_id)) printf(CHV_CONFIG_WARNING); #else if (IS_CHERRYVIEW(device->device_id)) { #if defined(__ANDROID__) device->sub_slice_count = 2; #else printf(CHV_CONFIG_WARNING); #endif } #endif #ifdef HAS_POOLED_EU /* BXT pooled EU: the 3x6 config runs as 2x9, so the sub slice count behaves as 2 */ int has_pooled_eu; if((has_pooled_eu = drm_intel_get_pooled_eu(driver->fd)) > 0) device->sub_slice_count = 2; #ifdef HAS_MIN_EU_IN_POOL int min_eu; /* fused-down 2x6 devices are not supported by beignet. */ if (has_pooled_eu > 0 && (min_eu = drm_intel_get_min_eu_in_pool(driver->fd)) > 0) { assert(min_eu == 9); // fused-down devices are not supported.
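// (A full BXT presumably exposes its 3x6 EU config as two pools of 9; a
// fused-down 2x6 part would report pools of 6 and trip this assert.)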
} #endif //HAS_MIN_EU_IN_POOL #endif //HAS_POOLED_EU // We should get the device memory size dynamically, but the // mappable mem size usage is unknown. Just ignore it for now. size_t total_mem, map_mem; if (drm_intel_get_aperture_sizes(driver->fd, &map_mem, &total_mem) == 0) device->global_mem_size = (cl_ulong)total_mem; intel_driver_context_destroy(driver); intel_driver_close(driver); intel_driver_terminate(driver); intel_driver_delete(driver); } LOCAL void intel_setup_callbacks(void) { cl_driver_new = (cl_driver_new_cb *) cl_intel_driver_new; cl_driver_delete = (cl_driver_delete_cb *) cl_intel_driver_delete; cl_driver_get_ver = (cl_driver_get_ver_cb *) intel_driver_get_ver; cl_driver_enlarge_stack_size = (cl_driver_enlarge_stack_size_cb *) intel_driver_enlarge_stack_size; cl_driver_set_atomic_flag = (cl_driver_set_atomic_flag_cb *) intel_driver_set_atomic_flag; cl_driver_get_bufmgr = (cl_driver_get_bufmgr_cb *) intel_driver_get_bufmgr; cl_driver_get_device_id = (cl_driver_get_device_id_cb *) intel_get_device_id; cl_driver_update_device_info = (cl_driver_update_device_info_cb *) intel_update_device_info; cl_buffer_alloc = (cl_buffer_alloc_cb *) drm_intel_bo_alloc; cl_buffer_alloc_userptr = (cl_buffer_alloc_userptr_cb*) intel_buffer_alloc_userptr; #ifdef HAS_BO_SET_SOFTPIN cl_buffer_set_softpin_offset = (cl_buffer_set_softpin_offset_cb *) drm_intel_bo_set_softpin_offset; cl_buffer_set_bo_use_full_range = (cl_buffer_set_bo_use_full_range_cb *) drm_intel_bo_use_48b_address_range; #endif cl_buffer_disable_reuse = (cl_buffer_disable_reuse_cb *) drm_intel_bo_disable_reuse; cl_buffer_set_tiling = (cl_buffer_set_tiling_cb *) intel_buffer_set_tiling; #if defined(HAS_GL_EGL) cl_buffer_alloc_from_texture = (cl_buffer_alloc_from_texture_cb *) intel_alloc_buffer_from_texture; cl_buffer_release_from_texture = (cl_buffer_release_from_texture_cb *) intel_release_buffer_from_texture; #endif cl_buffer_get_buffer_from_libva = (cl_buffer_get_buffer_from_libva_cb *) intel_share_buffer_from_libva; cl_buffer_get_image_from_libva = (cl_buffer_get_image_from_libva_cb *) intel_share_image_from_libva; cl_buffer_reference = (cl_buffer_reference_cb *) drm_intel_bo_reference; cl_buffer_unreference = (cl_buffer_unreference_cb *) drm_intel_bo_unreference; cl_buffer_map = (cl_buffer_map_cb *) drm_intel_bo_map; cl_buffer_unmap = (cl_buffer_unmap_cb *) drm_intel_bo_unmap; cl_buffer_map_gtt = (cl_buffer_map_gtt_cb *) drm_intel_gem_bo_map_gtt; cl_buffer_unmap_gtt = (cl_buffer_unmap_gtt_cb *) drm_intel_gem_bo_unmap_gtt; cl_buffer_map_gtt_unsync = (cl_buffer_map_gtt_unsync_cb *) drm_intel_gem_bo_map_unsynchronized; cl_buffer_get_virtual = (cl_buffer_get_virtual_cb *) drm_intel_bo_get_virtual; cl_buffer_get_size = (cl_buffer_get_size_cb *) drm_intel_bo_get_size; cl_buffer_pin = (cl_buffer_pin_cb *) drm_intel_bo_pin; cl_buffer_unpin = (cl_buffer_unpin_cb *) drm_intel_bo_unpin; cl_buffer_subdata = (cl_buffer_subdata_cb *) drm_intel_bo_subdata; cl_buffer_get_subdata = (cl_buffer_get_subdata_cb *) drm_intel_bo_get_subdata; cl_buffer_wait_rendering = (cl_buffer_wait_rendering_cb *) drm_intel_bo_wait_rendering; cl_buffer_get_fd = (cl_buffer_get_fd_cb *) drm_intel_bo_gem_export_to_prime; cl_buffer_get_tiling_align = (cl_buffer_get_tiling_align_cb *) intel_buffer_get_tiling_align; cl_buffer_get_buffer_from_fd = (cl_buffer_get_buffer_from_fd_cb *) intel_share_buffer_from_fd; cl_buffer_get_image_from_fd = (cl_buffer_get_image_from_fd_cb *) intel_share_image_from_fd; intel_set_gpgpu_callbacks(intel_get_device_id()); }
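/* A minimal usage sketch, illustrative only: it assumes the callback table
 * above has been populated by intel_setup_callbacks(), and the function name
 * example_buffer_roundtrip is hypothetical (not part of Beignet). It shows
 * how generic runtime code can allocate, map and release a bo purely through
 * the cl_driver/cl_buffer hooks declared in cl_driver.h, without touching
 * libdrm directly. Error handling is elided. */
#if 0 /* example only, not compiled */
#include <string.h>
#include "cl_driver.h"

static void example_buffer_roundtrip(cl_context_prop props)
{
  cl_driver drv = cl_driver_new(props);          /* dispatches to cl_intel_driver_new */
  cl_buffer_mgr mgr = cl_driver_get_bufmgr(drv); /* the drm_intel_bufmgr underneath */
  /* name, size, alignment: same signature as drm_intel_bo_alloc */
  cl_buffer bo = cl_buffer_alloc(mgr, "example bo", 4096, 64);
  if (cl_buffer_map(bo, 1) == 0) {               /* 1 = write-enabled mapping */
    memset(cl_buffer_get_virtual(bo), 0, 4096);
    cl_buffer_unmap(bo);
  }
  cl_buffer_unreference(bo);                     /* destroys the bo at refcount 0 */
  cl_driver_delete(drv);                         /* dispatches to cl_intel_driver_delete */
}
#endif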
Beignet-1.3.2-Source/src/intel/intel_batchbuffer.c000664 001750 001750 00000012611 13161142102 021164 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /************************************************************************** * * Copyright 2006 Tungsten Graphics, Inc., Cedar Park, Texas. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. * IN NO EVENT SHALL TUNGSTEN GRAPHICS AND/OR ITS SUPPLIERS BE LIABLE FOR * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* **************************************************************************/ #include "intel/intel_batchbuffer.h" #include "intel/intel_driver.h" #include "cl_alloc.h" #include "cl_utils.h" #include #include #include #include LOCAL int intel_batchbuffer_reset(intel_batchbuffer_t *batch, size_t sz) { if (batch->buffer != NULL) { dri_bo_unreference(batch->buffer); batch->buffer = NULL; batch->last_bo = NULL; } batch->buffer = dri_bo_alloc(batch->intel->bufmgr, "batch buffer", sz, 64); if (!batch->buffer || (dri_bo_map(batch->buffer, 1) != 0)) { if (batch->buffer) dri_bo_unreference(batch->buffer); batch->buffer = NULL; return -1; } batch->map = (uint8_t*) batch->buffer->virtual; batch->size = sz; batch->ptr = batch->map; batch->atomic = 0; batch->last_bo = batch->buffer; batch->enable_slm = 0; return 0; } LOCAL void intel_batchbuffer_init(intel_batchbuffer_t *batch, intel_driver_t *intel) { assert(intel); batch->intel = intel; } LOCAL void intel_batchbuffer_terminate(intel_batchbuffer_t *batch) { assert(batch->buffer); if (batch->map) { dri_bo_unmap(batch->buffer); batch->map = NULL; } dri_bo_unreference(batch->buffer); batch->buffer = NULL; } LOCAL int intel_batchbuffer_flush(intel_batchbuffer_t *batch) { uint32_t used = batch->ptr - batch->map; int is_locked = batch->intel->locked; int err = 0; if (used == 0) return 0; /* pad with a zero dword if needed so the final size, including * MI_BATCH_BUFFER_END, stays QWORD aligned */ if ((used & 4) == 0) { *(uint32_t*) batch->ptr = 0; batch->ptr += 4; } *(uint32_t*)batch->ptr = MI_BATCH_BUFFER_END; batch->ptr += 4; used = batch->ptr - batch->map; dri_bo_unmap(batch->buffer); batch->ptr = batch->map = NULL; if (!is_locked) intel_driver_lock_hardware(batch->intel); int flag = I915_EXEC_RENDER; if(batch->enable_slm) { /* use the hard-coded value here temporarily; switch to * I915_EXEC_ENABLE_SLM once drm accepts the patch */ flag |= (1<<13); } if (drm_intel_gem_bo_context_exec(batch->buffer, batch->intel->ctx, used, flag) < 0) { fprintf(stderr, "drm_intel_gem_bo_context_exec() failed: %s\n", strerror(errno)); err = -1; } if (!is_locked) intel_driver_unlock_hardware(batch->intel); return err; } LOCAL void intel_batchbuffer_emit_reloc(intel_batchbuffer_t *batch, dri_bo *bo, uint32_t read_domains, uint32_t write_domains, uint32_t delta) { assert(batch->ptr - batch->map < batch->size); dri_bo_emit_reloc(batch->buffer, read_domains, write_domains, delta, batch->ptr - batch->map, bo); intel_batchbuffer_emit_dword(batch, bo->offset + delta); } LOCAL intel_batchbuffer_t* intel_batchbuffer_new(intel_driver_t *intel) { intel_batchbuffer_t *batch = NULL; assert(intel); TRY_ALLOC_NO_ERR (batch, CALLOC(intel_batchbuffer_t)); intel_batchbuffer_init(batch, intel); exit: return batch; error: intel_batchbuffer_delete(batch); batch = NULL; goto exit; } LOCAL void intel_batchbuffer_delete(intel_batchbuffer_t *batch) { if (batch == NULL) return; if(batch->buffer) intel_batchbuffer_terminate(batch); cl_free(batch); } Beignet-1.3.2-Source/src/intel/intel_defines.h000664 001750 001750 00000042637 13161142102 020346 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details.
* * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* Copyright (C) Intel Corp. 2006. All Rights Reserved. Intel funded Tungsten Graphics (http://www.tungstengraphics.com) to develop this 3D driver. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. **********************************************************************/ /* * Authors: * Keith Whitwell */ #ifndef __GENX_DEFINES_H__ #define __GENX_DEFINES_H__ #define CMD(PIPELINE,OP,SUB_OP) ((3 << 29) | \ ((PIPELINE) << 27) | \ ((OP) << 24) | \ ((SUB_OP) << 16)) #define CMD_URB_FENCE CMD(0, 0, 0) #define CMD_CS_URB_STATE CMD(0, 0, 1) #define CMD_CONSTANT_BUFFER CMD(0, 0, 2) #define CMD_STATE_PREFETCH CMD(0, 0, 3) #define CMD_MEDIA_GATEWAY_STATE CMD(2, 0, 3) #define CMD_MEDIA_STATE_FLUSH CMD(2, 0, 4) #define CMD_GPGPU_WALKER CMD(2, 1, 5) #define CMD_PIPE_CONTROL CMD(3, 2, 0) #define CMD_LOAD_REGISTER_IMM (0x22 << 23) #define CMD_STATE_BASE_ADDRESS CMD(0, 1, 1) #define CMD_STATE_SIP CMD(0, 1, 2) #define CMD_PIPELINE_SELECT CMD(1, 1, 4) #define CMD_SAMPLER_PALETTE_LOAD CMD(3, 1, 2) #define CMD_MEDIA_STATE_POINTERS CMD(2, 0, 0) #define CMD_MEDIA CMD(2, 1, 0) #define CMD_MEDIA_EX CMD(2, 1, 1) #define CMD_PIPELINED_POINTERS CMD(3, 0, 0) #define CMD_BINDING_TABLE_POINTERS CMD(3, 0, 1) #define CMD_VERTEX_BUFFERS CMD(3, 0, 8) #define CMD_VERTEX_ELEMENTS CMD(3, 0, 9) #define CMD_DRAWING_RECTANGLE CMD(3, 1, 0) #define CMD_CONSTANT_COLOR CMD(3, 1, 1) #define CMD_3DPRIMITIVE CMD(3, 3, 0) #define BASE_ADDRESS_MODIFY (1 << 0) #define PIPELINE_SELECT_3D 0 #define PIPELINE_SELECT_MEDIA 1 #define PIPELINE_SELECT_GPGPU 2 #define PIPELINE_SELECT_MASK (3 << 8) #define UF0_CS_REALLOC (1 << 13) #define UF0_VFE_REALLOC (1 << 12) #define UF0_SF_REALLOC (1 << 11) #define UF0_CLIP_REALLOC (1 << 10) #define UF0_GS_REALLOC (1 << 9) #define UF0_VS_REALLOC (1 << 8) #define UF1_CLIP_FENCE_SHIFT 20 #define UF1_GS_FENCE_SHIFT 10 #define UF1_VS_FENCE_SHIFT 0 #define UF2_CS_FENCE_SHIFT 20 #define UF2_VFE_FENCE_SHIFT 10 #define UF2_SF_FENCE_SHIFT 0 #define FLOATING_POINT_IEEE_754 0 #define FLOATING_POINT_NON_IEEE_754 1 #define I965_SURFACE_1D 0 #define I965_SURFACE_2D 1 #define I965_SURFACE_3D 2 #define I965_SURFACE_CUBE 3 #define I965_SURFACE_BUFFER 4 #define I965_SURFACE_NULL 7 #define I965_SURFACEFORMAT_R32G32B32A32_FLOAT 0x000 #define I965_SURFACEFORMAT_R32G32B32A32_SINT 0x001 #define I965_SURFACEFORMAT_R32G32B32A32_UINT 0x002 #define I965_SURFACEFORMAT_R32G32B32A32_UNORM 0x003 #define 
I965_SURFACEFORMAT_R32G32B32A32_SNORM 0x004 #define I965_SURFACEFORMAT_R64G64_FLOAT 0x005 #define I965_SURFACEFORMAT_R32G32B32X32_FLOAT 0x006 #define I965_SURFACEFORMAT_R32G32B32A32_SSCALED 0x007 #define I965_SURFACEFORMAT_R32G32B32A32_USCALED 0x008 #define I965_SURFACEFORMAT_R32G32B32_FLOAT 0x040 #define I965_SURFACEFORMAT_R32G32B32_SINT 0x041 #define I965_SURFACEFORMAT_R32G32B32_UINT 0x042 #define I965_SURFACEFORMAT_R32G32B32_UNORM 0x043 #define I965_SURFACEFORMAT_R32G32B32_SNORM 0x044 #define I965_SURFACEFORMAT_R32G32B32_SSCALED 0x045 #define I965_SURFACEFORMAT_R32G32B32_USCALED 0x046 #define I965_SURFACEFORMAT_R16G16B16A16_UNORM 0x080 #define I965_SURFACEFORMAT_R16G16B16A16_SNORM 0x081 #define I965_SURFACEFORMAT_R16G16B16A16_SINT 0x082 #define I965_SURFACEFORMAT_R16G16B16A16_UINT 0x083 #define I965_SURFACEFORMAT_R16G16B16A16_FLOAT 0x084 #define I965_SURFACEFORMAT_R32G32_FLOAT 0x085 #define I965_SURFACEFORMAT_R32G32_SINT 0x086 #define I965_SURFACEFORMAT_R32G32_UINT 0x087 #define I965_SURFACEFORMAT_R32_FLOAT_X8X24_TYPELESS 0x088 #define I965_SURFACEFORMAT_X32_TYPELESS_G8X24_UINT 0x089 #define I965_SURFACEFORMAT_L32A32_FLOAT 0x08A #define I965_SURFACEFORMAT_R32G32_UNORM 0x08B #define I965_SURFACEFORMAT_R32G32_SNORM 0x08C #define I965_SURFACEFORMAT_R64_FLOAT 0x08D #define I965_SURFACEFORMAT_R16G16B16X16_UNORM 0x08E #define I965_SURFACEFORMAT_R16G16B16X16_FLOAT 0x08F #define I965_SURFACEFORMAT_A32X32_FLOAT 0x090 #define I965_SURFACEFORMAT_L32X32_FLOAT 0x091 #define I965_SURFACEFORMAT_I32X32_FLOAT 0x092 #define I965_SURFACEFORMAT_R16G16B16A16_SSCALED 0x093 #define I965_SURFACEFORMAT_R16G16B16A16_USCALED 0x094 #define I965_SURFACEFORMAT_R32G32_SSCALED 0x095 #define I965_SURFACEFORMAT_R32G32_USCALED 0x096 #define I965_SURFACEFORMAT_B8G8R8A8_UNORM 0x0C0 #define I965_SURFACEFORMAT_B8G8R8A8_UNORM_SRGB 0x0C1 #define I965_SURFACEFORMAT_R10G10B10A2_UNORM 0x0C2 #define I965_SURFACEFORMAT_R10G10B10A2_UNORM_SRGB 0x0C3 #define I965_SURFACEFORMAT_R10G10B10A2_UINT 0x0C4 #define I965_SURFACEFORMAT_R10G10B10_SNORM_A2_UNORM 0x0C5 #define I965_SURFACEFORMAT_R8G8B8A8_UNORM 0x0C7 #define I965_SURFACEFORMAT_R8G8B8A8_UNORM_SRGB 0x0C8 #define I965_SURFACEFORMAT_R8G8B8A8_SNORM 0x0C9 #define I965_SURFACEFORMAT_R8G8B8A8_SINT 0x0CA #define I965_SURFACEFORMAT_R8G8B8A8_UINT 0x0CB #define I965_SURFACEFORMAT_R16G16_UNORM 0x0CC #define I965_SURFACEFORMAT_R16G16_SNORM 0x0CD #define I965_SURFACEFORMAT_R16G16_SINT 0x0CE #define I965_SURFACEFORMAT_R16G16_UINT 0x0CF #define I965_SURFACEFORMAT_R16G16_FLOAT 0x0D0 #define I965_SURFACEFORMAT_B10G10R10A2_UNORM 0x0D1 #define I965_SURFACEFORMAT_B10G10R10A2_UNORM_SRGB 0x0D2 #define I965_SURFACEFORMAT_R11G11B10_FLOAT 0x0D3 #define I965_SURFACEFORMAT_R32_SINT 0x0D6 #define I965_SURFACEFORMAT_R32_UINT 0x0D7 #define I965_SURFACEFORMAT_R32_FLOAT 0x0D8 #define I965_SURFACEFORMAT_R24_UNORM_X8_TYPELESS 0x0D9 #define I965_SURFACEFORMAT_X24_TYPELESS_G8_UINT 0x0DA #define I965_SURFACEFORMAT_L16A16_UNORM 0x0DF #define I965_SURFACEFORMAT_I24X8_UNORM 0x0E0 #define I965_SURFACEFORMAT_L24X8_UNORM 0x0E1 #define I965_SURFACEFORMAT_A24X8_UNORM 0x0E2 #define I965_SURFACEFORMAT_I32_FLOAT 0x0E3 #define I965_SURFACEFORMAT_L32_FLOAT 0x0E4 #define I965_SURFACEFORMAT_A32_FLOAT 0x0E5 #define I965_SURFACEFORMAT_B8G8R8X8_UNORM 0x0E9 #define I965_SURFACEFORMAT_B8G8R8X8_UNORM_SRGB 0x0EA #define I965_SURFACEFORMAT_R8G8B8X8_UNORM 0x0EB #define I965_SURFACEFORMAT_R8G8B8X8_UNORM_SRGB 0x0EC #define I965_SURFACEFORMAT_R9G9B9E5_SHAREDEXP 0x0ED #define I965_SURFACEFORMAT_B10G10R10X2_UNORM 0x0EE #define 
I965_SURFACEFORMAT_L16A16_FLOAT 0x0F0 #define I965_SURFACEFORMAT_R32_UNORM 0x0F1 #define I965_SURFACEFORMAT_R32_SNORM 0x0F2 #define I965_SURFACEFORMAT_R10G10B10X2_USCALED 0x0F3 #define I965_SURFACEFORMAT_R8G8B8A8_SSCALED 0x0F4 #define I965_SURFACEFORMAT_R8G8B8A8_USCALED 0x0F5 #define I965_SURFACEFORMAT_R16G16_SSCALED 0x0F6 #define I965_SURFACEFORMAT_R16G16_USCALED 0x0F7 #define I965_SURFACEFORMAT_R32_SSCALED 0x0F8 #define I965_SURFACEFORMAT_R32_USCALED 0x0F9 #define I965_SURFACEFORMAT_B5G6R5_UNORM 0x100 #define I965_SURFACEFORMAT_B5G6R5_UNORM_SRGB 0x101 #define I965_SURFACEFORMAT_B5G5R5A1_UNORM 0x102 #define I965_SURFACEFORMAT_B5G5R5A1_UNORM_SRGB 0x103 #define I965_SURFACEFORMAT_B4G4R4A4_UNORM 0x104 #define I965_SURFACEFORMAT_B4G4R4A4_UNORM_SRGB 0x105 #define I965_SURFACEFORMAT_R8G8_UNORM 0x106 #define I965_SURFACEFORMAT_R8G8_SNORM 0x107 #define I965_SURFACEFORMAT_R8G8_SINT 0x108 #define I965_SURFACEFORMAT_R8G8_UINT 0x109 #define I965_SURFACEFORMAT_R16_UNORM 0x10A #define I965_SURFACEFORMAT_R16_SNORM 0x10B #define I965_SURFACEFORMAT_R16_SINT 0x10C #define I965_SURFACEFORMAT_R16_UINT 0x10D #define I965_SURFACEFORMAT_R16_FLOAT 0x10E #define I965_SURFACEFORMAT_I16_UNORM 0x111 #define I965_SURFACEFORMAT_L16_UNORM 0x112 #define I965_SURFACEFORMAT_A16_UNORM 0x113 #define I965_SURFACEFORMAT_L8A8_UNORM 0x114 #define I965_SURFACEFORMAT_I16_FLOAT 0x115 #define I965_SURFACEFORMAT_L16_FLOAT 0x116 #define I965_SURFACEFORMAT_A16_FLOAT 0x117 #define I965_SURFACEFORMAT_R5G5_SNORM_B6_UNORM 0x119 #define I965_SURFACEFORMAT_B5G5R5X1_UNORM 0x11A #define I965_SURFACEFORMAT_B5G5R5X1_UNORM_SRGB 0x11B #define I965_SURFACEFORMAT_R8G8_SSCALED 0x11C #define I965_SURFACEFORMAT_R8G8_USCALED 0x11D #define I965_SURFACEFORMAT_R16_SSCALED 0x11E #define I965_SURFACEFORMAT_R16_USCALED 0x11F #define I965_SURFACEFORMAT_R8_UNORM 0x140 #define I965_SURFACEFORMAT_R8_SNORM 0x141 #define I965_SURFACEFORMAT_R8_SINT 0x142 #define I965_SURFACEFORMAT_R8_UINT 0x143 #define I965_SURFACEFORMAT_A8_UNORM 0x144 #define I965_SURFACEFORMAT_I8_UNORM 0x145 #define I965_SURFACEFORMAT_L8_UNORM 0x146 #define I965_SURFACEFORMAT_P4A4_UNORM 0x147 #define I965_SURFACEFORMAT_A4P4_UNORM 0x148 #define I965_SURFACEFORMAT_R8_SSCALED 0x149 #define I965_SURFACEFORMAT_R8_USCALED 0x14A #define I965_SURFACEFORMAT_R1_UINT 0x181 #define I965_SURFACEFORMAT_YCRCB_NORMAL 0x182 #define I965_SURFACEFORMAT_YCRCB_SWAPUVY 0x183 #define I965_SURFACEFORMAT_BC1_UNORM 0x186 #define I965_SURFACEFORMAT_BC2_UNORM 0x187 #define I965_SURFACEFORMAT_BC3_UNORM 0x188 #define I965_SURFACEFORMAT_BC4_UNORM 0x189 #define I965_SURFACEFORMAT_BC5_UNORM 0x18A #define I965_SURFACEFORMAT_BC1_UNORM_SRGB 0x18B #define I965_SURFACEFORMAT_BC2_UNORM_SRGB 0x18C #define I965_SURFACEFORMAT_BC3_UNORM_SRGB 0x18D #define I965_SURFACEFORMAT_MONO8 0x18E #define I965_SURFACEFORMAT_YCRCB_SWAPUV 0x18F #define I965_SURFACEFORMAT_YCRCB_SWAPY 0x190 #define I965_SURFACEFORMAT_DXT1_RGB 0x191 #define I965_SURFACEFORMAT_FXT1 0x192 #define I965_SURFACEFORMAT_R8G8B8_UNORM 0x193 #define I965_SURFACEFORMAT_R8G8B8_SNORM 0x194 #define I965_SURFACEFORMAT_R8G8B8_SSCALED 0x195 #define I965_SURFACEFORMAT_R8G8B8_USCALED 0x196 #define I965_SURFACEFORMAT_R64G64B64A64_FLOAT 0x197 #define I965_SURFACEFORMAT_R64G64B64_FLOAT 0x198 #define I965_SURFACEFORMAT_BC4_SNORM 0x199 #define I965_SURFACEFORMAT_BC5_SNORM 0x19A #define I965_SURFACEFORMAT_R16G16B16_UNORM 0x19C #define I965_SURFACEFORMAT_R16G16B16_SNORM 0x19D #define I965_SURFACEFORMAT_R16G16B16_SSCALED 0x19E #define I965_SURFACEFORMAT_R16G16B16_USCALED 0x19F #define 
I965_SURFACEFORMAT_RAW 0x1FF #define I965_MAPFILTER_NEAREST 0x0 #define I965_MAPFILTER_LINEAR 0x1 #define I965_MAPFILTER_ANISOTROPIC 0x2 #define I965_MIPFILTER_NONE 0 #define I965_MIPFILTER_NEAREST 1 #define I965_MIPFILTER_LINEAR 3 #define I965_TEXCOORDMODE_WRAP 0 #define I965_TEXCOORDMODE_MIRROR 1 #define I965_TEXCOORDMODE_CLAMP 2 #define I965_TEXCOORDMODE_CUBE 3 #define I965_TEXCOORDMODE_CLAMP_BORDER 4 #define I965_TEXCOORDMODE_MIRROR_ONCE 5 #define I965_SURFACERETURNFORMAT_FLOAT32 0 #define I965_SURFACERETURNFORMAT_S1 1 #define I965_TILEWALK_XMAJOR 0 #define I965_TILEWALK_YMAJOR 1 #define GEN8_TILEMODE_LINEAR 0 #define GEN8_TILEMODE_WMAJOR 1 #define GEN8_TILEMODE_XMAJOR 2 #define GEN8_TILEMODE_YMAJOR 3 #define I965_SURCHAN_SELECT_ZERO 0 #define I965_SURCHAN_SELECT_ONE 1 #define I965_SURCHAN_SELECT_RED 4 #define I965_SURCHAN_SELECT_GREEN 5 #define I965_SURCHAN_SELECT_BLUE 6 #define I965_SURCHAN_SELECT_ALPHA 7 #define URB_SIZE(intel) (IS_IGDNG(intel->device_id) ? 1024 : \ IS_G4X(intel->device_id) ? 384 : 256) // HSW #define HSW_SCRATCH1_OFFSET (0xB038) #define HSW_ROW_CHICKEN3_HDC_OFFSET (0xE49C) // L3 cache stuff #define GEN7_L3_SQC_REG1_ADDRESS_OFFSET (0XB010) #define GEN7_L3_CNTL_REG2_ADDRESS_OFFSET (0xB020) #define GEN7_L3_CNTL_REG3_ADDRESS_OFFSET (0xB024) #define GEN8_L3_CNTL_REG_ADDRESS_OFFSET (0x7034) // To issue pipe controls (reset L3 / SLM or stall) #define GEN7_PIPE_CONTROL_MEDIA 0x2 #define GEN7_PIPE_CONTROL_3D 0x3 #define GEN7_PIPE_CONTROL_INSTRUCTION_GFX 0x3 #define GEN7_PIPE_CONTROL_OPCODE_3D_CONTROL 0x2 #define GEN7_PIPE_CONTROL_SUBOPCODE_3D_CONTROL 0x0 #define GEN7_PIPE_CONTROL_WRITE_TIMESTAMP (3 << 14) #define GEN7_PIPE_CONTROL_GLOBAL_GTT_WRITE (1 << 2) #define GEN_MAPFILTER_NEAREST 0x0 #define GEN_MAPFILTER_LINEAR 0x1 #define GEN_MAPFILTER_ANISOTROPIC 0x2 #define GEN_MIPFILTER_NONE 0 #define GEN_MIPFILTER_NEAREST 1 #define GEN_MIPFILTER_LINEAR 3 #define GEN_ADDRESS_ROUNDING_ENABLE_U_MAG 0x20 #define GEN_ADDRESS_ROUNDING_ENABLE_U_MIN 0x10 #define GEN_ADDRESS_ROUNDING_ENABLE_V_MAG 0x08 #define GEN_ADDRESS_ROUNDING_ENABLE_V_MIN 0x04 #define GEN_ADDRESS_ROUNDING_ENABLE_R_MAG 0x02 #define GEN_ADDRESS_ROUNDING_ENABLE_R_MIN 0x01 #define GEN_TEXCOORDMODE_WRAP 0 #define GEN_TEXCOORDMODE_MIRROR 1 #define GEN_TEXCOORDMODE_CLAMP 2 #define GEN_TEXCOORDMODE_CUBE 3 #define GEN_TEXCOORDMODE_CLAMP_BORDER 4 #define GEN_TEXCOORDMODE_MIRROR_ONCE 5 #endif /* __GENX_DEFINES_H__ */ Beignet-1.3.2-Source/src/intel/intel_batchbuffer.h000664 001750 001750 00000012242 13161142102 021171 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /************************************************************************** * * Copyright 2006 Tungsten Graphics, Inc., Cedar Park, Texas. * All Rights Reserved. 
* * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. * IN NO EVENT SHALL TUNGSTEN GRAPHICS AND/OR ITS SUPPLIERS BE LIABLE FOR * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * **************************************************************************/ #ifndef _INTEL_BATCHBUFFER_H_ #define _INTEL_BATCHBUFFER_H_ #include "intel_defines.h" #include "cl_utils.h" #include #include #include #include #include #include #include #define BEGIN_BATCH(b, n) do { \ intel_batchbuffer_require_space(b, (n) * 4); \ } while (0) #define OUT_BATCH(b, d) do { \ intel_batchbuffer_emit_dword(b, d); \ } while (0) #define OUT_RELOC(b, bo, read_domains, write_domain, delta) do { \ assert((delta) >= 0); \ intel_batchbuffer_emit_reloc(b, bo, read_domains, write_domain, delta); \ } while (0) #define ADVANCE_BATCH(b) do { } while (0) struct intel_driver; typedef struct intel_batchbuffer { struct intel_driver *intel; drm_intel_bo *buffer; /** Last bo submitted to the hardware; used for clFinish. */ drm_intel_bo *last_bo; uint32_t size; uint8_t *map; uint8_t *ptr; /** HSW: can't set LRI in the batch buffer, so set the I915_EXEC_ENABLE_SLM * flag when calling exec.
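* (intel_batchbuffer_flush above currently hard-codes this as bit 13 of the
* exec flags; see its enable_slm branch.)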
*/ uint8_t enable_slm; int atomic; } intel_batchbuffer_t; extern intel_batchbuffer_t* intel_batchbuffer_new(struct intel_driver*); extern void intel_batchbuffer_delete(intel_batchbuffer_t*); extern void intel_batchbuffer_emit_reloc(intel_batchbuffer_t*, drm_intel_bo*, uint32_t read_domains, uint32_t write_domains, uint32_t delta); extern void intel_batchbuffer_init(intel_batchbuffer_t*, struct intel_driver*); extern void intel_batchbuffer_terminate(intel_batchbuffer_t*); extern int intel_batchbuffer_flush(intel_batchbuffer_t*); extern int intel_batchbuffer_reset(intel_batchbuffer_t*, size_t sz); static INLINE uint32_t intel_batchbuffer_space(const intel_batchbuffer_t *batch) { assert(batch->ptr); return batch->size - (batch->ptr - batch->map); } static INLINE void intel_batchbuffer_emit_dword(intel_batchbuffer_t *batch, uint32_t x) { assert(intel_batchbuffer_space(batch) >= 4); *(uint32_t*)batch->ptr = x; batch->ptr += 4; } static INLINE void intel_batchbuffer_require_space(intel_batchbuffer_t *batch, uint32_t size) { assert(size < batch->size - 8); if (intel_batchbuffer_space(batch) < size) intel_batchbuffer_space(batch); } static INLINE uint8_t* intel_batchbuffer_alloc_space(intel_batchbuffer_t *batch, uint32_t size) { assert(intel_batchbuffer_space(batch) >= size); uint8_t *space_ptr = batch->ptr; batch->ptr += size; return space_ptr; } static INLINE void intel_batchbuffer_start_atomic(intel_batchbuffer_t *batch, uint32_t size) { assert(!batch->atomic); intel_batchbuffer_require_space(batch, size); batch->atomic = 1; } static INLINE void intel_batchbuffer_end_atomic(intel_batchbuffer_t *batch) { assert(batch->atomic); batch->atomic = 0; } #endif /* _INTEL_BATCHBUFFER_H_ */ Beignet-1.3.2-Source/src/cl_alloc.c000664 001750 001750 00000003310 13161142102 016147 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ #include "cl_alloc.h" #include "cl_utils.h" #include #include #include static volatile int32_t cl_alloc_n = 0; LOCAL void* cl_malloc(size_t sz) { void * p = NULL; atomic_inc(&cl_alloc_n); p = malloc(sz); assert(p); return p; } LOCAL void* cl_aligned_malloc(size_t sz, size_t align) { void * p = NULL; atomic_inc(&cl_alloc_n); p = memalign(align, sz); assert(p); return p; } LOCAL void* cl_calloc(size_t n, size_t elem_size) { void *p = NULL; atomic_inc(&cl_alloc_n); p = calloc(n, elem_size); assert(p); return p; } LOCAL void* cl_realloc(void *ptr, size_t sz) { if (ptr == NULL) atomic_inc(&cl_alloc_n); return realloc(ptr, sz); } LOCAL void cl_free(void *ptr) { if (ptr == NULL) return; atomic_dec(&cl_alloc_n); free(ptr); ptr = NULL; } LOCAL size_t cl_report_unfreed(void) { return cl_alloc_n; } LOCAL void cl_report_set_all_freed(void) { cl_alloc_n = 0; } Beignet-1.3.2-Source/src/cl_driver.h000664 001750 001750 00000044211 13161142102 016362 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #ifndef __CL_DRIVER_H__ #define __CL_DRIVER_H__ #include #include #include "cl_driver_type.h" #include "CL/cl_ext.h" /* Various limitations we should remove actually */ #define GEN_MAX_SURFACES 256 #define GEN_MAX_SAMPLERS 16 #define GEN_MAX_VME_STATES 8 /************************************************************************** * cl_driver: * Hide behind some call backs the buffer allocation / deallocation ... 
This * makes it easier to use a software performance simulator and * minimizes the code that is specific to the HW or to the simulator **************************************************************************/ /* Create a new driver */ typedef cl_driver (cl_driver_new_cb)(cl_context_prop); extern cl_driver_new_cb *cl_driver_new; /* Delete the driver */ typedef void (cl_driver_delete_cb)(cl_driver); extern cl_driver_delete_cb *cl_driver_delete; /* Get the buffer manager from the driver */ typedef cl_buffer_mgr (cl_driver_get_bufmgr_cb)(cl_driver); extern cl_driver_get_bufmgr_cb *cl_driver_get_bufmgr; /* Get the Gen version from the driver */ typedef uint32_t (cl_driver_get_ver_cb)(cl_driver); extern cl_driver_get_ver_cb *cl_driver_get_ver; /* Enlarge the stack size from the driver */ typedef void (cl_driver_enlarge_stack_size_cb)(cl_driver, int32_t*); extern cl_driver_enlarge_stack_size_cb *cl_driver_enlarge_stack_size; typedef enum cl_self_test_res { SELF_TEST_PASS = 0, SELF_TEST_SLM_FAIL = 1, SELF_TEST_ATOMIC_FAIL = 2, SELF_TEST_OTHER_FAIL = 3, } cl_self_test_res; /* Set the atomic enable/disable flag in the driver */ typedef void (cl_driver_set_atomic_flag_cb)(cl_driver, int); extern cl_driver_set_atomic_flag_cb *cl_driver_set_atomic_flag; /************************************************************************** * GPGPU command streamer **************************************************************************/ /* Describe texture tiling */ typedef enum cl_gpgpu_tiling { GPGPU_NO_TILE = 0, GPGPU_TILE_X = 1, GPGPU_TILE_Y = 2, } cl_gpgpu_tiling; /* Cache control options for gen7 */ typedef enum cl_cache_control { cc_gtt = 0x0, cc_l3 = 0x1, cc_llc = 0x2, cc_llc_l3 = 0x3 } cl_cache_control; /* L3 Cache control options for gen75 */ typedef enum cl_l3_cache_control { l3cc_uc = 0x0, l3cc_ec = 0x1 } cl_l3_cache_control; /* LLCCC Cache control options for gen75 */ typedef enum cl_llccc_cache_control { llccc_pte = 0x0<<1, llccc_uc = 0x1<<1, llccc_ec = 0x2<<1, llccc_ucllc = 0x3<<1 } cl_llccc_cache_control; /* Target Cache control options for gen8 */ typedef enum cl_target_cache_control { tcc_ec_only = 0x0<<3, tcc_llc_only = 0x1<<3, tcc_llc_ec = 0x2<<3, tcc_llc_ec_l3 = 0x3<<3 } cl_target_cache_control; /* Memory type LLC/ELLC Cache control options for gen8 */ typedef enum cl_mtllc_cache_control { mtllc_pte = 0x0<<5, mtllc_none = 0x1<<5, mtllc_wt = 0x2<<5, mtllc_wb = 0x3<<5 } cl_mtllc_cache_control; typedef enum gpu_command_status { command_queued = 3, command_submitted = 2, command_running = 1, command_complete = 0 } gpu_command_status; /* Use this structure to bind kernels in the gpgpu state */ typedef struct cl_gpgpu_kernel { const char *name; /* kernel name and bo name */ uint32_t grf_blocks; /* register blocks kernel wants (in 8 reg blocks) */ uint32_t curbe_sz; /* total size of all curbes */ cl_buffer bo; /* kernel code in the proper addr space */ int32_t barrierID; /* barrierID for _this_ kernel */ uint32_t use_slm:1; /* For gen7 (automatic barrier management) */ uint32_t thread_n:15; /* For gen7 (automatic barrier management) */ uint32_t slm_sz; /* For gen7 (automatic SLM allocation) */ } cl_gpgpu_kernel; /* Create a new gpgpu state */ typedef cl_gpgpu (cl_gpgpu_new_cb)(cl_driver); extern cl_gpgpu_new_cb *cl_gpgpu_new; /* Delete the gpgpu state */ typedef void (cl_gpgpu_delete_cb)(cl_gpgpu); extern cl_gpgpu_delete_cb *cl_gpgpu_delete; /* Synchronize GPU with CPU */ typedef void (cl_gpgpu_sync_cb)(void*); extern cl_gpgpu_sync_cb *cl_gpgpu_sync; /* Bind a regular unformatted buffer
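* (the trailing uint8_t bti argument is presumably the binding table index
* under which the kernel will address this buffer)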
*/ typedef void (cl_gpgpu_bind_buf_cb)(cl_gpgpu, cl_buffer, uint32_t offset, uint32_t internal_offset, size_t size, uint8_t bti); extern cl_gpgpu_bind_buf_cb *cl_gpgpu_bind_buf; typedef void (cl_gpgpu_set_kernel_cb)(cl_gpgpu, void *); extern cl_gpgpu_set_kernel_cb *cl_gpgpu_set_kernel; typedef void* (cl_gpgpu_get_kernel_cb)(cl_gpgpu); extern cl_gpgpu_get_kernel_cb *cl_gpgpu_get_kernel; /* bind samplers defined in both kernel and kernel args. */ typedef void (cl_gpgpu_bind_sampler_cb)(cl_gpgpu, uint32_t *samplers, size_t sampler_sz); extern cl_gpgpu_bind_sampler_cb *cl_gpgpu_bind_sampler; typedef void (cl_gpgpu_bind_vme_state_cb)(cl_gpgpu, cl_accelerator_intel accel); extern cl_gpgpu_bind_vme_state_cb *cl_gpgpu_bind_vme_state; /* get the default cache control value. */ typedef uint32_t (cl_gpgpu_get_cache_ctrl_cb)(); extern cl_gpgpu_get_cache_ctrl_cb *cl_gpgpu_get_cache_ctrl; /* Set a 2d texture */ typedef void (cl_gpgpu_bind_image_cb)(cl_gpgpu state, uint32_t id, cl_buffer obj_bo, uint32_t obj_bo_offset, uint32_t format, uint32_t bpp, uint32_t type, int32_t w, int32_t h, int32_t depth, int pitch, int32_t slice_pitch, cl_gpgpu_tiling tiling); extern cl_gpgpu_bind_image_cb *cl_gpgpu_bind_image; typedef void (cl_gpgpu_bind_image_for_vme_cb)(cl_gpgpu state, uint32_t id, cl_buffer obj_bo, uint32_t obj_bo_offset, uint32_t format, uint32_t bpp, uint32_t type, int32_t w, int32_t h, int32_t depth, int pitch, int32_t slice_pitch, cl_gpgpu_tiling tiling); extern cl_gpgpu_bind_image_for_vme_cb *cl_gpgpu_bind_image_for_vme; /* Setup a stack */ typedef void (cl_gpgpu_set_stack_cb)(cl_gpgpu, uint32_t offset, uint32_t size, uint32_t cchint); extern cl_gpgpu_set_stack_cb *cl_gpgpu_set_stack; /* Setup scratch */ typedef int (cl_gpgpu_set_scratch_cb)(cl_gpgpu, uint32_t per_thread_size); extern cl_gpgpu_set_scratch_cb *cl_gpgpu_set_scratch; /* Configure internal state */ typedef int (cl_gpgpu_state_init_cb)(cl_gpgpu, uint32_t max_threads, uint32_t size_cs_entry, int profiling); extern cl_gpgpu_state_init_cb *cl_gpgpu_state_init; /* Set the buffer object where to report performance counters */ typedef void (cl_gpgpu_set_perf_counters_cb)(cl_gpgpu, cl_buffer perf); extern cl_gpgpu_set_perf_counters_cb *cl_gpgpu_set_perf_counters; /* Fills current curbe buffer with data */ typedef int (cl_gpgpu_upload_curbes_cb)(cl_gpgpu, const void* data, uint32_t size); extern cl_gpgpu_upload_curbes_cb *cl_gpgpu_upload_curbes; typedef cl_buffer (cl_gpgpu_alloc_constant_buffer_cb)(cl_gpgpu, uint32_t size, uint8_t bti); extern cl_gpgpu_alloc_constant_buffer_cb *cl_gpgpu_alloc_constant_buffer; /* Setup all indirect states */ typedef void (cl_gpgpu_states_setup_cb)(cl_gpgpu, cl_gpgpu_kernel *kernel); extern cl_gpgpu_states_setup_cb *cl_gpgpu_states_setup; /* Upload the constant samplers as specified inside the OCL kernel */ typedef void (cl_gpgpu_upload_samplers_cb)(cl_gpgpu *state, const void *data, uint32_t n); extern cl_gpgpu_upload_samplers_cb *cl_gpgpu_upload_samplers; /* Set a sampler */ typedef void (cl_gpgpu_set_sampler_cb)(cl_gpgpu, uint32_t index, uint32_t non_normalized); extern cl_gpgpu_set_sampler_cb *cl_gpgpu_set_sampler; /* Allocate the batch buffer and return the BO used for the batch buffer */ typedef int (cl_gpgpu_batch_reset_cb)(cl_gpgpu, size_t sz); extern cl_gpgpu_batch_reset_cb *cl_gpgpu_batch_reset; /* Atomic begin, pipeline select, urb, pipeline state and constant buffer */ typedef void (cl_gpgpu_batch_start_cb)(cl_gpgpu); extern cl_gpgpu_batch_start_cb *cl_gpgpu_batch_start; /* atomic end with possibly 
inserted flush */ typedef void (cl_gpgpu_batch_end_cb)(cl_gpgpu, int32_t flush_mode); extern cl_gpgpu_batch_end_cb *cl_gpgpu_batch_end; /* Flush the command buffer */ typedef int (cl_gpgpu_flush_cb)(cl_gpgpu); extern cl_gpgpu_flush_cb *cl_gpgpu_flush; /* Create an event for a batch buffer */ typedef cl_gpgpu_event (cl_gpgpu_event_new_cb)(cl_gpgpu); extern cl_gpgpu_event_new_cb *cl_gpgpu_event_new; /* Update the status of this event's batch buffer */ typedef int (cl_gpgpu_event_update_status_cb)(cl_gpgpu_event, int); extern cl_gpgpu_event_update_status_cb *cl_gpgpu_event_update_status; /* Flush the batch buffer of this event */ typedef void (cl_gpgpu_event_flush_cb)(cl_gpgpu_event); extern cl_gpgpu_event_flush_cb *cl_gpgpu_event_flush; /* Cancel execution of the batch buffer of this event */ typedef void (cl_gpgpu_event_cancel_cb)(cl_gpgpu_event); extern cl_gpgpu_event_cancel_cb *cl_gpgpu_event_cancel; /* Delete a gpgpu event */ typedef void (cl_gpgpu_event_delete_cb)(cl_gpgpu_event); extern cl_gpgpu_event_delete_cb *cl_gpgpu_event_delete; /* Get an event execution timestamp */ typedef void (cl_gpgpu_event_get_exec_timestamp_cb)(cl_gpgpu, int, uint64_t*); extern cl_gpgpu_event_get_exec_timestamp_cb *cl_gpgpu_event_get_exec_timestamp; /* Get the current GPU timestamp */ typedef void (cl_gpgpu_event_get_gpu_cur_timestamp_cb)(cl_driver, uint64_t*); extern cl_gpgpu_event_get_gpu_cur_timestamp_cb *cl_gpgpu_event_get_gpu_cur_timestamp; /* Get the current batch buffer handle */ typedef void* (cl_gpgpu_ref_batch_buf_cb)(cl_gpgpu); extern cl_gpgpu_ref_batch_buf_cb *cl_gpgpu_ref_batch_buf; /* Release the batch buffer handle */ typedef void (cl_gpgpu_unref_batch_buf_cb)(void*); extern cl_gpgpu_unref_batch_buf_cb *cl_gpgpu_unref_batch_buf; /* Set the profiling buffer */ typedef int (cl_gpgpu_set_profiling_buffer_cb)(cl_gpgpu, uint32_t, uint32_t, uint8_t); extern cl_gpgpu_set_profiling_buffer_cb *cl_gpgpu_set_profiling_buffer; typedef int (cl_gpgpu_set_profiling_info_cb)(cl_gpgpu, void *); extern cl_gpgpu_set_profiling_info_cb *cl_gpgpu_set_profiling_info; typedef void* (cl_gpgpu_get_profiling_info_cb)(cl_gpgpu); extern cl_gpgpu_get_profiling_info_cb *cl_gpgpu_get_profiling_info; typedef void* (cl_gpgpu_map_profiling_buffer_cb)(cl_gpgpu); extern cl_gpgpu_map_profiling_buffer_cb *cl_gpgpu_map_profiling_buffer; typedef void (cl_gpgpu_unmap_profiling_buffer_cb)(cl_gpgpu); extern cl_gpgpu_unmap_profiling_buffer_cb *cl_gpgpu_unmap_profiling_buffer; /* Set the printf buffer */ typedef int (cl_gpgpu_set_printf_buffer_cb)(cl_gpgpu, uint32_t, uint8_t); extern cl_gpgpu_set_printf_buffer_cb *cl_gpgpu_set_printf_buffer; /* Get the printf buffer offset in the aperture */ typedef unsigned long (cl_gpgpu_reloc_printf_buffer_cb)(cl_gpgpu, uint32_t, uint32_t); extern cl_gpgpu_reloc_printf_buffer_cb *cl_gpgpu_reloc_printf_buffer; /* Map the printf buffer */ typedef void* (cl_gpgpu_map_printf_buffer_cb)(cl_gpgpu); extern cl_gpgpu_map_printf_buffer_cb *cl_gpgpu_map_printf_buffer; /* Unmap the printf buffer */ typedef void (cl_gpgpu_unmap_printf_buffer_cb)(cl_gpgpu); extern cl_gpgpu_unmap_printf_buffer_cb *cl_gpgpu_unmap_printf_buffer; /* Release the printf buffer */ typedef unsigned long (cl_gpgpu_release_printf_buffer_cb)(cl_gpgpu); extern cl_gpgpu_release_printf_buffer_cb *cl_gpgpu_release_printf_buffer; /* Set the last printfset pointer */ typedef int (cl_gpgpu_set_printf_info_cb)(cl_gpgpu, void *); extern cl_gpgpu_set_printf_info_cb *cl_gpgpu_set_printf_info; /* Get the last printfset pointer */ typedef void* (cl_gpgpu_get_printf_info_cb)(cl_gpgpu); extern
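/* Illustrative sketch only (assumes the callbacks above are populated and
 * that flush_mode 0 and a non-blocking status poll are acceptable): a
 * typical batch/event round trip.
 *
 *   cl_gpgpu_event ev = cl_gpgpu_event_new(gpgpu);
 *   cl_gpgpu_batch_end(gpgpu, 0);            // close the batch
 *   cl_gpgpu_event_flush(ev);                // submit the event's batch
 *   while (cl_gpgpu_event_update_status(ev, 0) != command_complete)
 *     ;                                      // a real caller would not spin
 *   cl_gpgpu_event_delete(ev);
 */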
cl_gpgpu_get_printf_info_cb *cl_gpgpu_get_printf_info; /* Will spawn all threads */ typedef void (cl_gpgpu_walker_cb)(cl_gpgpu, uint32_t simd_sz, uint32_t thread_n, const size_t global_wk_off[3], const size_t global_dim_off[3], const size_t global_wk_sz[3], const size_t local_wk_sz[3]); extern cl_gpgpu_walker_cb *cl_gpgpu_walker; /************************************************************************** * Buffer **************************************************************************/ /* Allocate a buffer */ typedef cl_buffer (cl_buffer_alloc_cb)(cl_buffer_mgr, const char*, size_t, size_t); extern cl_buffer_alloc_cb *cl_buffer_alloc; typedef cl_buffer (cl_buffer_alloc_userptr_cb)(cl_buffer_mgr, const char*, void *, size_t, unsigned long); extern cl_buffer_alloc_userptr_cb *cl_buffer_alloc_userptr; typedef int (cl_buffer_set_softpin_offset_cb)(cl_buffer, uint64_t); extern cl_buffer_set_softpin_offset_cb *cl_buffer_set_softpin_offset; typedef int (cl_buffer_set_bo_use_full_range_cb)(cl_buffer, uint32_t); extern cl_buffer_set_bo_use_full_range_cb *cl_buffer_set_bo_use_full_range; typedef int (cl_buffer_disable_reuse_cb)(cl_buffer); extern cl_buffer_disable_reuse_cb *cl_buffer_disable_reuse; /* Set a buffer's tiling mode */ typedef int (cl_buffer_set_tiling_cb)(cl_buffer, int tiling, size_t stride); extern cl_buffer_set_tiling_cb *cl_buffer_set_tiling; #include "cl_context.h" #include "cl_mem.h" typedef cl_buffer (cl_buffer_alloc_from_texture_cb)(cl_context, unsigned int, int, unsigned int, struct _cl_mem_image *gl_image); extern cl_buffer_alloc_from_texture_cb *cl_buffer_alloc_from_texture; typedef void (cl_buffer_release_from_texture_cb)(cl_context, struct _cl_mem_gl_image *); extern cl_buffer_release_from_texture_cb *cl_buffer_release_from_texture; typedef cl_buffer (cl_buffer_get_buffer_from_libva_cb)(cl_context ctx, unsigned int bo_name, size_t *sz); extern cl_buffer_get_buffer_from_libva_cb *cl_buffer_get_buffer_from_libva; typedef cl_buffer (cl_buffer_get_image_from_libva_cb)(cl_context ctx, unsigned int bo_name, struct _cl_mem_image *image); extern cl_buffer_get_image_from_libva_cb *cl_buffer_get_image_from_libva; /* Unref a buffer and destroy it once no refs remain */ typedef int (cl_buffer_unreference_cb)(cl_buffer); extern cl_buffer_unreference_cb *cl_buffer_unreference; /* Add one more ref on a buffer */ typedef void (cl_buffer_reference_cb)(cl_buffer); extern cl_buffer_reference_cb *cl_buffer_reference; /* Map a buffer */ typedef int (cl_buffer_map_cb)(cl_buffer, uint32_t write_enable); extern cl_buffer_map_cb *cl_buffer_map; /* Unmap a buffer */ typedef int (cl_buffer_unmap_cb)(cl_buffer); extern cl_buffer_unmap_cb *cl_buffer_unmap; /* Map a buffer in the GTT domain */ typedef int (cl_buffer_map_gtt_cb)(cl_buffer); extern cl_buffer_map_gtt_cb *cl_buffer_map_gtt; /* Map a buffer in the GTT domain without waiting for pending GPU reads or writes */ typedef int (cl_buffer_map_gtt_unsync_cb)(cl_buffer); extern cl_buffer_map_gtt_unsync_cb *cl_buffer_map_gtt_unsync; /* Unmap a buffer in the GTT domain */ typedef int (cl_buffer_unmap_gtt_cb)(cl_buffer); extern cl_buffer_unmap_gtt_cb *cl_buffer_unmap_gtt; /* Get the virtual address (when mapped) */ typedef void* (cl_buffer_get_virtual_cb)(cl_buffer); extern cl_buffer_get_virtual_cb *cl_buffer_get_virtual; /* Get the size of the buffer */ typedef size_t (cl_buffer_get_size_cb)(cl_buffer); extern cl_buffer_get_size_cb *cl_buffer_get_size; /* Pin a buffer */ typedef int (cl_buffer_pin_cb)(cl_buffer, uint32_t alignment); extern cl_buffer_pin_cb
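/* Illustrative sketch only: how the buffer callbacks compose for a CPU-side
 * fill. Error handling is omitted, and the two size_t arguments of
 * cl_buffer_alloc are taken to be size and alignment for this sketch:
 *
 *   cl_buffer bo = cl_buffer_alloc(mgr, "scratch", 4096, 64);
 *   cl_buffer_map(bo, 1);                    // 1 = write_enable
 *   memset(cl_buffer_get_virtual(bo), 0, cl_buffer_get_size(bo));
 *   cl_buffer_unmap(bo);
 *   cl_buffer_unreference(bo);               // destroyed once no refs remain
 */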
*cl_buffer_pin; /* Unpin a buffer */ typedef int (cl_buffer_unpin_cb)(cl_buffer); extern cl_buffer_unpin_cb *cl_buffer_unpin; /* Fill data in the buffer */ typedef int (cl_buffer_subdata_cb)(cl_buffer, unsigned long, unsigned long, const void*); extern cl_buffer_subdata_cb *cl_buffer_subdata; /* Get data from buffer */ typedef int (cl_buffer_get_subdata_cb)(cl_buffer, unsigned long, unsigned long, void*); extern cl_buffer_get_subdata_cb *cl_buffer_get_subdata; /* Wait for all pending rendering for this buffer to complete */ typedef int (cl_buffer_wait_rendering_cb) (cl_buffer); extern cl_buffer_wait_rendering_cb *cl_buffer_wait_rendering; typedef int (cl_buffer_get_fd_cb)(cl_buffer, int *fd); extern cl_buffer_get_fd_cb *cl_buffer_get_fd; typedef int (cl_buffer_get_tiling_align_cb)(cl_context ctx, uint32_t tiling_mode, uint32_t dim); extern cl_buffer_get_tiling_align_cb *cl_buffer_get_tiling_align; typedef cl_buffer (cl_buffer_get_buffer_from_fd_cb)(cl_context ctx, int fd, int size); extern cl_buffer_get_buffer_from_fd_cb *cl_buffer_get_buffer_from_fd; typedef cl_buffer (cl_buffer_get_image_from_fd_cb)(cl_context ctx, int fd, int size, struct _cl_mem_image *image); extern cl_buffer_get_image_from_fd_cb *cl_buffer_get_image_from_fd; /* Get the device id */ typedef int (cl_driver_get_device_id_cb)(void); extern cl_driver_get_device_id_cb *cl_driver_get_device_id; /* Update the device info */ typedef void (cl_driver_update_device_info_cb)(cl_device_id device); extern cl_driver_update_device_info_cb *cl_driver_update_device_info; #endif /* __CL_DRIVER_H__ */ Beignet-1.3.2-Source/src/x11/000775 001750 001750 00000000000 13174334761 014671 5ustar00yryr000000 000000 Beignet-1.3.2-Source/src/x11/va_dri2.c000664 001750 001750 00000022452 13161142102 016346 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* * Copyright 2008 Red Hat, Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Soft- * ware"), to deal in the Software without restriction, including without * limitation the rights to use, copy, modify, merge, publish, distribute, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, provided that the above copyright * notice(s) and this permission notice appear in all copies of the Soft- * ware and that both the above copyright notice(s) and this permission * notice appear in supporting documentation. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABIL- * ITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY * RIGHTS. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN * THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSE- * QUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, * DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER * TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFOR- * MANCE OF THIS SOFTWARE. * * Except as contained in this notice, the name of a copyright holder shall * not be used in advertising or otherwise to promote the sale, use or * other dealings in this Software without prior written authorization of * the copyright holder. * * Authors: * Kristian Hgsberg (krh@redhat.com) */ #define NEED_REPLIES #include #include #include #include "xf86drm.h" #include "x11/va_dri2.h" #include "x11/va_dri2str.h" #include "x11/va_dri2tokens.h" #ifndef DRI2DriverDRI #define DRI2DriverDRI 0 #endif #define LOCAL __attribute__ ((visibility ("internal"))) static char va_dri2ExtensionName[] = DRI2_NAME; static XExtensionInfo _va_dri2_info_data; static XExtensionInfo *va_dri2Info = &_va_dri2_info_data; static XEXT_GENERATE_CLOSE_DISPLAY (VA_DRI2CloseDisplay, va_dri2Info) static /* const */ XExtensionHooks va_dri2ExtensionHooks = { NULL, /* create_gc */ NULL, /* copy_gc */ NULL, /* flush_gc */ NULL, /* free_gc */ NULL, /* create_font */ NULL, /* free_font */ VA_DRI2CloseDisplay, /* close_display */ NULL, /* wire_to_event */ NULL, /* event_to_wire */ NULL, /* error */ NULL, /* error_string */ }; static XEXT_GENERATE_FIND_DISPLAY (DRI2FindDisplay, va_dri2Info, va_dri2ExtensionName, &va_dri2ExtensionHooks, 0, NULL) LOCAL Bool VA_DRI2QueryExtension(Display *dpy, int *eventBase, int *errorBase) { XExtDisplayInfo *info = DRI2FindDisplay(dpy); if (XextHasExtension(info)) { *eventBase = info->codes->first_event; *errorBase = info->codes->first_error; return True; } return False; } LOCAL Bool VA_DRI2QueryVersion(Display *dpy, int *major, int *minor) { XExtDisplayInfo *info = DRI2FindDisplay (dpy); xDRI2QueryVersionReply rep; xDRI2QueryVersionReq *req; XextCheckExtension (dpy, info, va_dri2ExtensionName, False); LockDisplay(dpy); GetReq(DRI2QueryVersion, req); req->reqType = info->codes->major_opcode; req->dri2Reqtype = X_DRI2QueryVersion; req->majorVersion = DRI2_MAJOR; req->minorVersion = DRI2_MINOR; if (!_XReply(dpy, (xReply *)&rep, 0, xFalse)) { UnlockDisplay(dpy); SyncHandle(); return False; } *major = rep.majorVersion; *minor = rep.minorVersion; UnlockDisplay(dpy); SyncHandle(); return True; } LOCAL Bool VA_DRI2Connect(Display *dpy, XID window, char **driverName, char **deviceName) { XExtDisplayInfo *info = DRI2FindDisplay(dpy); xDRI2ConnectReply rep; xDRI2ConnectReq *req; XextCheckExtension (dpy, info, va_dri2ExtensionName, False); LockDisplay(dpy); GetReq(DRI2Connect, req); req->reqType = info->codes->major_opcode; req->dri2Reqtype = X_DRI2Connect; req->window = window; req->drivertype = DRI2DriverDRI; if (!_XReply(dpy, (xReply *)&rep, 0, xFalse)) { UnlockDisplay(dpy); SyncHandle(); return False; } if (rep.driverNameLength == 0 && rep.deviceNameLength == 0) { UnlockDisplay(dpy); SyncHandle(); return False; } *driverName = Xmalloc(rep.driverNameLength + 1); if (*driverName == NULL) { _XEatData(dpy, ((rep.driverNameLength + 3) & ~3) + ((rep.deviceNameLength + 3) & ~3)); UnlockDisplay(dpy); SyncHandle(); return False; } _XReadPad(dpy, *driverName, rep.driverNameLength); (*driverName)[rep.driverNameLength] = '\0'; *deviceName = Xmalloc(rep.deviceNameLength + 1); if (*deviceName == NULL) { Xfree(*driverName); _XEatData(dpy, 
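/* (n + 3) & ~3 rounds n up to the X protocol's 4-byte padding unit
 * (e.g. 5 -> 8, 8 -> 8), so this eats exactly the padded reply payload. */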
((rep.deviceNameLength + 3) & ~3)); UnlockDisplay(dpy); SyncHandle(); return False; } _XReadPad(dpy, *deviceName, rep.deviceNameLength); (*deviceName)[rep.deviceNameLength] = '\0'; UnlockDisplay(dpy); SyncHandle(); return True; } LOCAL Bool VA_DRI2Authenticate(Display *dpy, XID window, drm_magic_t magic) { XExtDisplayInfo *info = DRI2FindDisplay(dpy); xDRI2AuthenticateReq *req; xDRI2AuthenticateReply rep; XextCheckExtension (dpy, info, va_dri2ExtensionName, False); LockDisplay(dpy); GetReq(DRI2Authenticate, req); req->reqType = info->codes->major_opcode; req->dri2Reqtype = X_DRI2Authenticate; req->window = window; req->magic = magic; if (!_XReply(dpy, (xReply *)&rep, 0, xFalse)) { UnlockDisplay(dpy); SyncHandle(); return False; } UnlockDisplay(dpy); SyncHandle(); return rep.authenticated; } LOCAL void VA_DRI2CreateDrawable(Display *dpy, XID drawable) { XExtDisplayInfo *info = DRI2FindDisplay(dpy); xDRI2CreateDrawableReq *req; XextSimpleCheckExtension (dpy, info, va_dri2ExtensionName); LockDisplay(dpy); GetReq(DRI2CreateDrawable, req); req->reqType = info->codes->major_opcode; req->dri2Reqtype = X_DRI2CreateDrawable; req->drawable = drawable; UnlockDisplay(dpy); SyncHandle(); } LOCAL void VA_DRI2DestroyDrawable(Display *dpy, XID drawable) { XExtDisplayInfo *info = DRI2FindDisplay(dpy); xDRI2DestroyDrawableReq *req; XextSimpleCheckExtension (dpy, info, va_dri2ExtensionName); XSync(dpy, False); LockDisplay(dpy); GetReq(DRI2DestroyDrawable, req); req->reqType = info->codes->major_opcode; req->dri2Reqtype = X_DRI2DestroyDrawable; req->drawable = drawable; UnlockDisplay(dpy); SyncHandle(); } LOCAL VA_DRI2Buffer *VA_DRI2GetBuffers(Display *dpy, XID drawable, int *width, int *height, unsigned int *attachments, int count, int *outcount) { XExtDisplayInfo *info = DRI2FindDisplay(dpy); xDRI2GetBuffersReply rep; xDRI2GetBuffersReq *req; VA_DRI2Buffer *buffers; xDRI2Buffer repBuffer; CARD32 *p; int i; XextCheckExtension (dpy, info, va_dri2ExtensionName, False); LockDisplay(dpy); GetReqExtra(DRI2GetBuffers, count * 4, req); req->reqType = info->codes->major_opcode; req->dri2Reqtype = X_DRI2GetBuffers; req->drawable = drawable; req->count = count; p = (CARD32 *) &req[1]; for (i = 0; i < count; i++) p[i] = attachments[i]; if (!_XReply(dpy, (xReply *)&rep, 0, xFalse)) { UnlockDisplay(dpy); SyncHandle(); return NULL; } *width = rep.width; *height = rep.height; *outcount = rep.count; buffers = Xmalloc(rep.count * sizeof buffers[0]); if (buffers == NULL) { _XEatData(dpy, rep.count * sizeof repBuffer); UnlockDisplay(dpy); SyncHandle(); return NULL; } for (i = 0; i < (int) rep.count; i++) { _XReadPad(dpy, (char *) &repBuffer, sizeof repBuffer); buffers[i].attachment = repBuffer.attachment; buffers[i].name = repBuffer.name; buffers[i].pitch = repBuffer.pitch; buffers[i].cpp = repBuffer.cpp; buffers[i].flags = repBuffer.flags; } UnlockDisplay(dpy); SyncHandle(); return buffers; } LOCAL void VA_DRI2CopyRegion(Display *dpy, XID drawable, XserverRegion region, CARD32 dest, CARD32 src) { XExtDisplayInfo *info = DRI2FindDisplay(dpy); xDRI2CopyRegionReq *req; xDRI2CopyRegionReply rep; XextSimpleCheckExtension (dpy, info, va_dri2ExtensionName); LockDisplay(dpy); GetReq(DRI2CopyRegion, req); req->reqType = info->codes->major_opcode; req->dri2Reqtype = X_DRI2CopyRegion; req->drawable = drawable; req->region = region; req->dest = dest; req->src = src; _XReply(dpy, (xReply *)&rep, 0, xFalse); UnlockDisplay(dpy); SyncHandle(); } Beignet-1.3.2-Source/src/x11/va_dri2.h000664 001750 001750 00000006600 13161142102 016350 
0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* * Copyright 2007,2008 Red Hat, Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Soft- * ware"), to deal in the Software without restriction, including without * limitation the rights to use, copy, modify, merge, publish, distribute, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, provided that the above copyright * notice(s) and this permission notice appear in all copies of the Soft- * ware and that both the above copyright notice(s) and this permission * notice appear in supporting documentation. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABIL- * ITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY * RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN * THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSE- * QUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, * DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER * TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFOR- * MANCE OF THIS SOFTWARE. * * Except as contained in this notice, the name of a copyright holder shall * not be used in advertising or otherwise to promote the sale, use or * other dealings in this Software without prior written authorization of * the copyright holder. 
* * Authors: * Kristian Hgsberg (krh@redhat.com) */ #ifndef _VA_DRI2_H_ #define _VA_DRI2_H_ #include #include #include typedef struct { unsigned int attachment; unsigned int name; unsigned int pitch; unsigned int cpp; unsigned int flags; } VA_DRI2Buffer; extern Bool VA_DRI2QueryExtension(Display *display, int *eventBase, int *errorBase); extern Bool VA_DRI2QueryVersion(Display *display, int *major, int *minor); extern Bool VA_DRI2Connect(Display *display, XID window, char **driverName, char **deviceName); extern Bool VA_DRI2Authenticate(Display *display, XID window, drm_magic_t magic); extern void VA_DRI2CreateDrawable(Display *display, XID drawable); extern void VA_DRI2DestroyDrawable(Display *display, XID handle); extern VA_DRI2Buffer * VA_DRI2GetBuffers(Display *dpy, XID drawable, int *width, int *height, unsigned int *attachments, int count, int *outcount); #if 1 extern void VA_DRI2CopyRegion(Display *dpy, XID drawable, XserverRegion region, CARD32 dest, CARD32 src); #endif #endif Beignet-1.3.2-Source/src/x11/dricommon.c000664 001750 001750 00000021161 13161142102 017003 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia * Note: the code is taken from libva code base */ /* * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. * IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include "x11/va_dri2.h" #include "x11/va_dri2tokens.h" #include "x11/dricommon.h" #include "cl_utils.h" #include "cl_alloc.h" #include #include #include #include #define LOCAL __attribute__ ((visibility ("internal"))) LOCAL dri_drawable_t* dri_state_do_drawable_hash(dri_state_t *state, XID drawable) { int index = drawable % DRAWABLE_HASH_SZ; struct dri_drawable *dri_drawable = state->drawable_hash[index]; while (dri_drawable) { if (dri_drawable->x_drawable == drawable) return dri_drawable; dri_drawable = dri_drawable->next; } dri_drawable = dri_state_create_drawable(state, drawable); if(dri_drawable == NULL) return NULL; dri_drawable->x_drawable = drawable; dri_drawable->next = state->drawable_hash[index]; state->drawable_hash[index] = dri_drawable; return dri_drawable; } LOCAL void dri_state_free_drawable_hash(dri_state_t *state) { int i; struct dri_drawable *dri_drawable, *prev; for (i = 0; i < DRAWABLE_HASH_SZ; i++) { dri_drawable = state->drawable_hash[i]; while (dri_drawable) { prev = dri_drawable; dri_drawable = prev->next; dri_state_destroy_drawable(state, prev); } } } LOCAL dri_drawable_t* dri_state_get_drawable(dri_state_t *state, XID drawable) { return dri_state_do_drawable_hash(state, drawable); } LOCAL void dri_state_init_drawable_hash_table(dri_state_t *state) { int i; for(i=0; i < DRAWABLE_HASH_SZ; i++) state->drawable_hash[i] = NULL; } LOCAL void dri_state_delete(dri_state_t *state) { if (state == NULL) return; dri_state_close(state); cl_free(state); } LOCAL dri_state_t* dri_state_new(void) { dri_state_t *state = NULL; TRY_ALLOC_NO_ERR (state, CALLOC(dri_state_t)); state->fd = -1; state->driConnectedFlag = NONE; dri_state_init_drawable_hash_table(state); exit: return state; error: dri_state_delete(state); state = NULL; goto exit; } #define __DRI_BUFFER_FRONT_LEFT 0 #define __DRI_BUFFER_BACK_LEFT 1 #define __DRI_BUFFER_FRONT_RIGHT 2 #define __DRI_BUFFER_BACK_RIGHT 3 #define __DRI_BUFFER_DEPTH 4 #define __DRI_BUFFER_STENCIL 5 #define __DRI_BUFFER_ACCUM 6 #define __DRI_BUFFER_FAKE_FRONT_LEFT 7 #define __DRI_BUFFER_FAKE_FRONT_RIGHT 8 typedef struct dri2_drawable { struct dri_drawable base; union dri_buffer buffers[5]; int width; int height; int has_backbuffer; int back_index; int front_index; } dri2_drawable_t; LOCAL dri_drawable_t* dri_state_create_drawable(dri_state_t *state, XID x_drawable) { dri2_drawable_t *dri2_drwble; dri2_drwble = (dri2_drawable_t*)calloc(1, sizeof(*dri2_drwble)); if (!dri2_drwble) return NULL; dri2_drwble->base.x_drawable = x_drawable; dri2_drwble->base.x = 0; dri2_drwble->base.y = 0; VA_DRI2CreateDrawable(state->x11_dpy, x_drawable); return &dri2_drwble->base; } LOCAL void dri_state_destroy_drawable(dri_state_t *state, dri_drawable_t *dri_drwble) { VA_DRI2DestroyDrawable(state->x11_dpy, dri_drwble->x_drawable); free(dri_drwble); } LOCAL void dri_state_swap_buffer(dri_state_t *state, dri_drawable_t *dri_drwble) { dri2_drawable_t *dri2_drwble = (dri2_drawable_t*)dri_drwble; XRectangle xrect; XserverRegion region; if (dri2_drwble->has_backbuffer) { xrect.x = 0; xrect.y = 0; xrect.width = dri2_drwble->width; xrect.height = dri2_drwble->height; region = XFixesCreateRegion(state->x11_dpy, &xrect, 1); VA_DRI2CopyRegion(state->x11_dpy, dri_drwble->x_drawable, region, DRI2BufferFrontLeft, DRI2BufferBackLeft); XFixesDestroyRegion(state->x11_dpy, region); } } LOCAL union dri_buffer* dri_state_get_rendering_buffer(dri_state_t *state, dri_drawable_t *dri_drwble) { dri2_drawable_t *dri2_drwble = (dri2_drawable_t *)dri_drwble; int i; int count; 
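/* Request the back and front left attachments; the server answers with one
 * buffer record per attachment it can actually satisfy, which is why the
 * returned count is used below rather than the requested one. */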
unsigned int attachments[5]; VA_DRI2Buffer *buffers; i = 0; attachments[i++] = __DRI_BUFFER_BACK_LEFT; attachments[i++] = __DRI_BUFFER_FRONT_LEFT; buffers = VA_DRI2GetBuffers(state->x11_dpy, dri_drwble->x_drawable, &dri2_drwble->width, &dri2_drwble->height, attachments, i, &count); assert(buffers); if (buffers == NULL) return NULL; dri2_drwble->has_backbuffer = 0; for (i = 0; i < count; i++) { dri2_drwble->buffers[i].dri2.attachment = buffers[i].attachment; dri2_drwble->buffers[i].dri2.name = buffers[i].name; dri2_drwble->buffers[i].dri2.pitch = buffers[i].pitch; dri2_drwble->buffers[i].dri2.cpp = buffers[i].cpp; dri2_drwble->buffers[i].dri2.flags = buffers[i].flags; if (buffers[i].attachment == __DRI_BUFFER_BACK_LEFT) { dri2_drwble->has_backbuffer = 1; dri2_drwble->back_index = i; } if (buffers[i].attachment == __DRI_BUFFER_FRONT_LEFT) dri2_drwble->front_index = i; } dri_drwble->width = dri2_drwble->width; dri_drwble->height = dri2_drwble->height; Xfree(buffers); if (dri2_drwble->has_backbuffer) return &dri2_drwble->buffers[dri2_drwble->back_index]; return &dri2_drwble->buffers[dri2_drwble->front_index]; } LOCAL void dri_state_close(dri_state_t *state) { dri_state_free_drawable_hash(state); assert(state->fd >= 0); close(state->fd); } LOCAL void dri_state_release(dri_state_t *state) { dri_state_delete(state); } LOCAL dri_state_t* getDRI2State(Display* dpy, int screen, char **driver_name) { int major, minor; int error_base; int event_base; char *device_name = NULL; drm_magic_t magic; char * internal_driver_name = NULL; int fd = -1; dri_state_t* state = NULL; if (!VA_DRI2QueryExtension(dpy, &event_base, &error_base)) goto err_out; if (!VA_DRI2QueryVersion(dpy, &major, &minor)) goto err_out; if (!VA_DRI2Connect(dpy, RootWindow(dpy, screen), &internal_driver_name, &device_name)) goto err_out; if(device_name != NULL ) fd = open(device_name, O_RDWR); if (fd < 0) goto err_out; if (drmGetMagic(fd, &magic)) goto err_out; if (!VA_DRI2Authenticate(dpy, RootWindow(dpy, screen), magic)) goto err_out; if(driver_name) *driver_name = internal_driver_name; else Xfree(internal_driver_name); state = dri_state_new(); state->fd = fd; state->x11_dpy = dpy; state->x11_screen = screen; state->driConnectedFlag = DRI2; if (device_name) Xfree(device_name); return state; err_out: if (device_name) Xfree(device_name); if (internal_driver_name) Xfree(internal_driver_name); if(driver_name) *driver_name = NULL; if (fd >= 0) close(fd); if (driver_name) *driver_name = NULL; return state; } Beignet-1.3.2-Source/src/x11/dricommon.h000664 001750 001750 00000006013 13161142102 017007 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia * Note: the code is taken from libva code base */ /* * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the * "Software"), to deal in the Software without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sub license, and/or sell copies of the Software, and to * permit persons to whom the Software is furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice (including the * next paragraph) shall be included in all copies or substantial portions * of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. * IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _VA_DRICOMMON_H_ #define _VA_DRICOMMON_H_ #include #include #include #include union dri_buffer { struct { unsigned int attachment; unsigned int name; unsigned int pitch; unsigned int cpp; unsigned int flags; } dri2; }; typedef struct dri_drawable { XID x_drawable; int x; int y; unsigned int width; unsigned int height; struct dri_drawable *next; } dri_drawable_t; #define DRAWABLE_HASH_SZ 32 enum DRI_VER { NONE = 0, // NOT supported VA_DRI1 = 1, DRI2 = 2 }; typedef struct dri_state { Display *x11_dpy; int x11_screen; int fd; enum DRI_VER driConnectedFlag; /* 0: disconnected, 2: DRI2 */ dri_drawable_t *drawable_hash[DRAWABLE_HASH_SZ]; } dri_state_t; dri_drawable_t *dri_state_create_drawable(dri_state_t*, XID x_drawable); void dri_state_destroy_drawable(dri_state_t*, dri_drawable_t*); void dri_state_close(dri_state_t*); void dri_state_release(dri_state_t*); // Create a dri2 state from dpy and screen dri_state_t *getDRI2State(Display* dpy, int screen, char **driver_name); #endif /* _VA_DRICOMMON_H_ */ Beignet-1.3.2-Source/src/x11/va_dri2tokens.h000664 001750 001750 00000005275 13161142102 017603 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* * Copyright 2008 Red Hat, Inc. 
* * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Soft- * ware"), to deal in the Software without restriction, including without * limitation the rights to use, copy, modify, merge, publish, distribute, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, provided that the above copyright * notice(s) and this permission notice appear in all copies of the Soft- * ware and that both the above copyright notice(s) and this permission * notice appear in supporting documentation. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABIL- * ITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY * RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN * THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSE- * QUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, * DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER * TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFOR- * MANCE OF THIS SOFTWARE. * * Except as contained in this notice, the name of a copyright holder shall * not be used in advertising or otherwise to promote the sale, use or * other dealings in this Software without prior written authorization of * the copyright holder. * * Authors: * Kristian Hgsberg (krh@redhat.com) */ #ifndef _DRI2_TOKENS_H_ #define _DRI2_TOKENS_H_ #define DRI2BufferFrontLeft 0 #define DRI2BufferBackLeft 1 #define DRI2BufferFrontRight 2 #define DRI2BufferBackRight 3 #define DRI2BufferDepth 4 #define DRI2BufferStencil 5 #define DRI2BufferAccum 6 #define DRI2BufferFakeFrontLeft 7 #define DRI2BufferFakeFrontRight 8 #define DRI2DriverDRI 0 #endif Beignet-1.3.2-Source/src/x11/va_dri2str.h000664 001750 001750 00000013537 13161142102 017110 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* * Copyright 2008 Red Hat, Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Soft- * ware"), to deal in the Software without restriction, including without * limitation the rights to use, copy, modify, merge, publish, distribute, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, provided that the above copyright * notice(s) and this permission notice appear in all copies of the Soft- * ware and that both the above copyright notice(s) and this permission * notice appear in supporting documentation. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABIL- * ITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY * RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN * THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSE- * QUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, * DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER * TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFOR- * MANCE OF THIS SOFTWARE. * * Except as contained in this notice, the name of a copyright holder shall * not be used in advertising or otherwise to promote the sale, use or * other dealings in this Software without prior written authorization of * the copyright holder. * * Authors: * Kristian Hgsberg (krh@redhat.com) */ #ifndef _DRI2_PROTO_H_ #define _DRI2_PROTO_H_ #define DRI2_NAME "DRI2" #define DRI2_MAJOR 1 #define DRI2_MINOR 0 #define DRI2NumberErrors 0 #define DRI2NumberEvents 0 #define DRI2NumberRequests 7 #define X_DRI2QueryVersion 0 #define X_DRI2Connect 1 #define X_DRI2Authenticate 2 #define X_DRI2CreateDrawable 3 #define X_DRI2DestroyDrawable 4 #define X_DRI2GetBuffers 5 #define X_DRI2CopyRegion 6 typedef struct { CARD32 attachment B32; CARD32 name B32; CARD32 pitch B32; CARD32 cpp B32; CARD32 flags B32; } xDRI2Buffer; typedef struct { CARD8 reqType; CARD8 dri2Reqtype; CARD16 length B16; CARD32 majorVersion B32; CARD32 minorVersion B32; } xDRI2QueryVersionReq; #define sz_xDRI2QueryVersionReq 12 typedef struct { BYTE type; /* X_Reply */ BYTE pad1; CARD16 sequenceNumber B16; CARD32 length B32; CARD32 majorVersion B32; CARD32 minorVersion B32; CARD32 pad2 B32; CARD32 pad3 B32; CARD32 pad4 B32; CARD32 pad5 B32; } xDRI2QueryVersionReply; #define sz_xDRI2QueryVersionReply 32 typedef struct { CARD8 reqType; CARD8 dri2Reqtype; CARD16 length B16; CARD32 window B32; CARD32 drivertype B32; } xDRI2ConnectReq; #define sz_xDRI2ConnectReq 12 typedef struct { BYTE type; /* X_Reply */ BYTE pad1; CARD16 sequenceNumber B16; CARD32 length B32; CARD32 driverNameLength B32; CARD32 deviceNameLength B32; CARD32 pad2 B32; CARD32 pad3 B32; CARD32 pad4 B32; CARD32 pad5 B32; } xDRI2ConnectReply; #define sz_xDRI2ConnectReply 32 typedef struct { CARD8 reqType; CARD8 dri2Reqtype; CARD16 length B16; CARD32 window B32; CARD32 magic B32; } xDRI2AuthenticateReq; #define sz_xDRI2AuthenticateReq 12 typedef struct { BYTE type; /* X_Reply */ BYTE pad1; CARD16 sequenceNumber B16; CARD32 length B32; CARD32 authenticated B32; CARD32 pad2 B32; CARD32 pad3 B32; CARD32 pad4 B32; CARD32 pad5 B32; CARD32 pad6 B32; } xDRI2AuthenticateReply; #define sz_xDRI2AuthenticateReply 32 typedef struct { CARD8 reqType; CARD8 dri2Reqtype; CARD16 length B16; CARD32 drawable B32; } xDRI2CreateDrawableReq; #define sz_xDRI2CreateDrawableReq 8 typedef struct { CARD8 reqType; CARD8 dri2Reqtype; CARD16 length B16; CARD32 drawable B32; } xDRI2DestroyDrawableReq; #define sz_xDRI2DestroyDrawableReq 8 typedef struct { CARD8 reqType; CARD8 dri2Reqtype; CARD16 length B16; CARD32 drawable B32; CARD32 count B32; } xDRI2GetBuffersReq; #define sz_xDRI2GetBuffersReq 12 typedef struct { BYTE type; /* X_Reply */ BYTE pad1; CARD16 sequenceNumber B16; CARD32 length B32; CARD32 width B32; CARD32 height B32; CARD32 count B32; CARD32 pad2 B32; CARD32 pad3 B32; CARD32 pad4 B32; } xDRI2GetBuffersReply; #define sz_xDRI2GetBuffersReply 32 typedef struct { CARD8 reqType; CARD8 
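/* The B16/B32 suffixes on these wire-protocol fields are legacy Xlib
 * byte-order annotations from the X11 protocol headers; on current
 * platforms they expand to nothing. */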
dri2Reqtype; CARD16 length B16; CARD32 drawable B32; CARD32 region B32; CARD32 dest B32; CARD32 src B32; } xDRI2CopyRegionReq; #define sz_xDRI2CopyRegionReq 20 typedef struct { BYTE type; /* X_Reply */ BYTE pad1; CARD16 sequenceNumber B16; CARD32 length B32; CARD32 pad2 B32; CARD32 pad3 B32; CARD32 pad4 B32; CARD32 pad5 B32; CARD32 pad6 B32; CARD32 pad7 B32; } xDRI2CopyRegionReply; #define sz_xDRI2CopyRegionReply 32 #endif Beignet-1.3.2-Source/src/cl_image.c000664 001750 001750 00000023273 13161142102 016151 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #include "cl_image.h" #include "cl_utils.h" #include "intel/intel_defines.h" #include LOCAL cl_int cl_image_byte_per_pixel(const cl_image_format *fmt, uint32_t *bpp) { assert(bpp); if(fmt == NULL) return CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; const uint32_t type = fmt->image_channel_data_type; const uint32_t order = fmt->image_channel_order; switch (type) { #define DECL_BPP(DATA_TYPE, VALUE) case DATA_TYPE: *bpp = VALUE; DECL_BPP(CL_SNORM_INT8, 1); break; DECL_BPP(CL_SNORM_INT16, 2); break; DECL_BPP(CL_UNORM_INT8, 1); break; DECL_BPP(CL_UNORM_INT16, 2); break; DECL_BPP(CL_UNORM_SHORT_565, 2); if (order != CL_RGBx && order != CL_RGB) return CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; break; DECL_BPP(CL_UNORM_SHORT_555, 2); if (order != CL_RGBx && order != CL_RGB) return CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; break; DECL_BPP(CL_UNORM_INT_101010, 4); if (order != CL_RGBx && order != CL_RGB) return CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; break; DECL_BPP(CL_SIGNED_INT8, 1); break; DECL_BPP(CL_SIGNED_INT16, 2); break; DECL_BPP(CL_SIGNED_INT32, 4); break; DECL_BPP(CL_UNSIGNED_INT8, 1); break; DECL_BPP(CL_UNSIGNED_INT16, 2); break; DECL_BPP(CL_UNSIGNED_INT32, 4); break; DECL_BPP(CL_HALF_FLOAT, 2); break; DECL_BPP(CL_FLOAT, 4); break; #undef DECL_BPP default: return CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; }; switch (order) { case CL_Rx: break; case CL_R: break; case CL_A: break; case CL_RA: *bpp *= 2; break; case CL_RG: *bpp *= 2; break; case CL_INTENSITY: case CL_LUMINANCE: if (type != CL_UNORM_INT8 && type != CL_UNORM_INT16 && type != CL_SNORM_INT8 && type != CL_SNORM_INT16 && type != CL_HALF_FLOAT && type != CL_FLOAT) return CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; break; case CL_RGB: case CL_RGBx: if (type != CL_UNORM_SHORT_555 && type != CL_UNORM_SHORT_565 && type != CL_UNORM_INT_101010) return CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; break; case CL_RGBA: *bpp *= 4; break; case CL_ARGB: case CL_BGRA: if (type != CL_UNORM_INT8 && type != CL_SIGNED_INT8 && type != CL_SNORM_INT8 && type != CL_UNSIGNED_INT8) return CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; *bpp *= 4; break; case CL_sRGBA: case CL_sBGRA: if (type != CL_UNORM_INT8) return CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; *bpp *= 4; break; default: return CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; }; return CL_SUCCESS; } LOCAL uint32_t cl_image_get_intel_format(const 
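/* Worked example for cl_image_byte_per_pixel above (illustrative):
 * CL_RGBA + CL_UNORM_INT8 -> the data-type switch sets *bpp = 1, then the
 * channel-order switch multiplies by 4:
 *
 *   cl_image_format f = { CL_RGBA, CL_UNORM_INT8 };
 *   uint32_t bpp;
 *   assert(cl_image_byte_per_pixel(&f, &bpp) == CL_SUCCESS && bpp == 4);
 */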
cl_image_format *fmt) { const uint32_t type = fmt->image_channel_data_type; const uint32_t order = fmt->image_channel_order; switch (order) { case CL_R: #if 0 case CL_Rx: case CL_A: case CL_INTENSITY: case CL_LUMINANCE: if ((order == CL_INTENSITY || order == CL_LUMINANCE) && (type != CL_UNORM_INT8 && type != CL_UNORM_INT16 && type != CL_SNORM_INT8 && type != CL_SNORM_INT16 && type != CL_HALF_FLOAT && type != CL_FLOAT)) return INTEL_UNSUPPORTED_FORMAT; #endif /* XXX it seems we have some acuracy compatible issue with snomr_int8/16, * have to disable those formats currently. */ switch (type) { case CL_HALF_FLOAT: return I965_SURFACEFORMAT_R16_FLOAT; case CL_FLOAT: return I965_SURFACEFORMAT_R32_FLOAT; // case CL_SNORM_INT16: return I965_SURFACEFORMAT_R16_SNORM; // case CL_SNORM_INT8: return I965_SURFACEFORMAT_R8_SNORM; case CL_UNORM_INT8: return I965_SURFACEFORMAT_R8_UNORM; case CL_UNORM_INT16: return I965_SURFACEFORMAT_R16_UNORM; case CL_SIGNED_INT8: return I965_SURFACEFORMAT_R8_SINT; case CL_SIGNED_INT16: return I965_SURFACEFORMAT_R16_SINT; case CL_SIGNED_INT32: return I965_SURFACEFORMAT_R32_SINT; case CL_UNSIGNED_INT8: return I965_SURFACEFORMAT_R8_UINT; case CL_UNSIGNED_INT16: return I965_SURFACEFORMAT_R16_UINT; case CL_UNSIGNED_INT32: return I965_SURFACEFORMAT_R32_UINT; default: return INTEL_UNSUPPORTED_FORMAT; }; case CL_RG: switch (type) { case CL_UNORM_INT8: return I965_SURFACEFORMAT_R8G8_UNORM; case CL_UNORM_INT16: return I965_SURFACEFORMAT_R16G16_UNORM; case CL_UNSIGNED_INT8: return I965_SURFACEFORMAT_R8G8_UINT; case CL_UNSIGNED_INT16: return I965_SURFACEFORMAT_R16G16_UINT; default: return INTEL_UNSUPPORTED_FORMAT; }; #if 0 case CL_RG: case CL_RA: switch (type) { case CL_HALF_FLOAT: return I965_SURFACEFORMAT_R16G16_FLOAT; case CL_FLOAT: return I965_SURFACEFORMAT_R32G32_FLOAT; case CL_SNORM_INT16: return I965_SURFACEFORMAT_R16G16_SNORM; case CL_SNORM_INT8: return I965_SURFACEFORMAT_R8G8_SNORM; case CL_UNORM_INT8: return I965_SURFACEFORMAT_R8G8_UNORM; case CL_UNORM_INT16: return I965_SURFACEFORMAT_R16G16_UNORM; case CL_SIGNED_INT8: return I965_SURFACEFORMAT_R8G8_SINT; case CL_SIGNED_INT16: return I965_SURFACEFORMAT_R16G16_SINT; case CL_SIGNED_INT32: return I965_SURFACEFORMAT_R32G32_SINT; case CL_UNSIGNED_INT8: return I965_SURFACEFORMAT_R8G8_UINT; case CL_UNSIGNED_INT16: return I965_SURFACEFORMAT_R16G16_UINT; case CL_UNSIGNED_INT32: return I965_SURFACEFORMAT_R32G32_UINT; default: return INTEL_UNSUPPORTED_FORMAT; }; case CL_RGB: case CL_RGBx: switch (type) { case CL_UNORM_INT_101010: return I965_SURFACEFORMAT_R10G10B10A2_UNORM; case CL_UNORM_SHORT_565: case CL_UNORM_SHORT_555: default: return INTEL_UNSUPPORTED_FORMAT; }; #endif case CL_RGBA: switch (type) { case CL_HALF_FLOAT: return I965_SURFACEFORMAT_R16G16B16A16_FLOAT; case CL_FLOAT: return I965_SURFACEFORMAT_R32G32B32A32_FLOAT; // case CL_SNORM_INT16: return I965_SURFACEFORMAT_R16G16B16A16_SNORM; // case CL_SNORM_INT8: return I965_SURFACEFORMAT_R8G8B8A8_SNORM; case CL_UNORM_INT8: return I965_SURFACEFORMAT_R8G8B8A8_UNORM; case CL_UNORM_INT16: return I965_SURFACEFORMAT_R16G16B16A16_UNORM; case CL_SIGNED_INT8: return I965_SURFACEFORMAT_R8G8B8A8_SINT; case CL_SIGNED_INT16: return I965_SURFACEFORMAT_R16G16B16A16_SINT; case CL_SIGNED_INT32: return I965_SURFACEFORMAT_R32G32B32A32_SINT; case CL_UNSIGNED_INT8: return I965_SURFACEFORMAT_R8G8B8A8_UINT; case CL_UNSIGNED_INT16: return I965_SURFACEFORMAT_R16G16B16A16_UINT; case CL_UNSIGNED_INT32: return I965_SURFACEFORMAT_R32G32B32A32_UINT; default: return INTEL_UNSUPPORTED_FORMAT; }; case 
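/* The I965 surface format table used here has no entry matching the ARGB
 * channel order, hence the unsupported case that follows. */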
CL_ARGB: return INTEL_UNSUPPORTED_FORMAT; case CL_BGRA: switch (type) { case CL_UNORM_INT8: return I965_SURFACEFORMAT_B8G8R8A8_UNORM; default: return INTEL_UNSUPPORTED_FORMAT; }; case CL_sRGBA: switch (type) { case CL_UNORM_INT8: return I965_SURFACEFORMAT_R8G8B8A8_UNORM_SRGB; default: return INTEL_UNSUPPORTED_FORMAT; }; case CL_sBGRA: switch (type) { case CL_UNORM_INT8: return I965_SURFACEFORMAT_B8G8R8A8_UNORM_SRGB; default: return INTEL_UNSUPPORTED_FORMAT; }; default: return INTEL_UNSUPPORTED_FORMAT; }; } static const uint32_t cl_image_order[] = { CL_R, CL_A, CL_RG, CL_RA, CL_RGB, CL_RGBA, CL_BGRA, CL_ARGB, CL_INTENSITY, CL_LUMINANCE, CL_Rx, CL_RGx, CL_RGBx, CL_sRGBA, CL_sBGRA }; static const uint32_t cl_image_type[] = { CL_SNORM_INT8, CL_SNORM_INT16, CL_UNORM_INT8, CL_UNORM_INT16, CL_UNORM_SHORT_565, CL_UNORM_SHORT_555, CL_UNORM_INT_101010, CL_SIGNED_INT8, CL_SIGNED_INT16, CL_SIGNED_INT32, CL_UNSIGNED_INT8, CL_UNSIGNED_INT16, CL_UNSIGNED_INT32, CL_HALF_FLOAT, CL_FLOAT }; static const size_t cl_image_order_n = SIZEOF32(cl_image_order); static const size_t cl_image_type_n = SIZEOF32(cl_image_type); cl_int cl_image_get_supported_fmt(cl_context ctx, cl_mem_flags flags, cl_mem_object_type image_type, cl_uint num_entries, cl_image_format *image_formats, cl_uint *num_image_formats) { size_t i, j, n = 0; for (i = 0; i < cl_image_order_n; ++i) for (j = 0; j < cl_image_type_n; ++j) { const cl_image_format fmt = { .image_channel_order = cl_image_order[i], .image_channel_data_type = cl_image_type[j] }; const uint32_t intel_fmt = cl_image_get_intel_format(&fmt); if (cl_image_order[i] >= CL_sRGBA && ((flags & CL_MEM_WRITE_ONLY) || (flags & CL_MEM_READ_WRITE) || (flags & CL_MEM_KERNEL_READ_AND_WRITE))) continue; if (intel_fmt == INTEL_UNSUPPORTED_FORMAT) continue; if (n < num_entries && image_formats) image_formats[n] = fmt; n++; } if (num_image_formats) *num_image_formats = n; return CL_SUCCESS; } Beignet-1.3.2-Source/src/cl_gt_device.h000664 001750 001750 00000015025 13173554000 017030 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ #undef LIBCL_VERSION_STRING #undef LIBCL_C_VERSION_STRING #ifdef GEN9_DEVICE #define LIBCL_VERSION_STRING GEN9_LIBCL_VERSION_STRING #define LIBCL_C_VERSION_STRING GEN9_LIBCL_C_VERSION_STRING #else #define LIBCL_VERSION_STRING NONGEN9_LIBCL_VERSION_STRING #define LIBCL_C_VERSION_STRING NONGEN9_LIBCL_C_VERSION_STRING #endif /* Common fields for both all GT devices (IVB / SNB) */ .device_type = CL_DEVICE_TYPE_GPU, .device_id=0,/* == device_id (set when requested) */ .vendor_id = INTEL_VENDOR_ID, .max_work_item_dimensions = 3, .max_1d_global_work_sizes = {1024 * 1024 * 256, 1, 1}, .max_2d_global_work_sizes = {8192, 8192, 1}, .max_3d_global_work_sizes = {8192, 8192, 2048}, .preferred_vector_width_char = 16, .preferred_vector_width_short = 8, .preferred_vector_width_int = 4, .preferred_vector_width_long = 2, .preferred_vector_width_float = 4, .preferred_vector_width_double = 0, .preferred_vector_width_half = 0, .native_vector_width_char = 8, .native_vector_width_short = 8, .native_vector_width_int = 4, .native_vector_width_long = 2, .native_vector_width_float = 4, .native_vector_width_double = 2, .native_vector_width_half = 8, .address_bits = 32, .svm_capabilities = CL_DEVICE_SVM_COARSE_GRAIN_BUFFER, .preferred_platform_atomic_alignment = 0, .preferred_global_atomic_alignment = 0, .preferred_local_atomic_alignment = 0, .image_support = CL_TRUE, .max_read_image_args = BTI_MAX_READ_IMAGE_ARGS, .max_write_image_args = BTI_MAX_WRITE_IMAGE_ARGS, .max_read_write_image_args = BTI_MAX_WRITE_IMAGE_ARGS, .image_max_array_size = 2048, .image2d_max_width = 8192, .image2d_max_height = 8192, .image3d_max_width = 8192, .image3d_max_height = 8192, .image3d_max_depth = 2048, .image_mem_size = 65536, .max_samplers = 16, .mem_base_addr_align = sizeof(cl_long) * 16 * 8, .min_data_type_align_size = sizeof(cl_long) * 16, .max_pipe_args = 16, .pipe_max_active_reservations = 1, .pipe_max_packet_siz = 1024, .double_fp_config = 0, .global_mem_cache_type = CL_READ_WRITE_CACHE, .max_constant_buffer_size = 128 * 1024 * 1024, .max_constant_args = 8, .max_global_variable_size = 64 * 1024, .global_variable_preferred_total_size = 64 * 1024, .error_correction_support = CL_FALSE, #ifdef HAS_USERPTR .host_unified_memory = CL_TRUE, #else .host_unified_memory = CL_FALSE, #endif .profiling_timer_resolution = 80, /* ns */ .endian_little = CL_TRUE, .available = CL_TRUE, .compiler_available = CL_TRUE, .linker_available = CL_TRUE, .execution_capabilities = CL_EXEC_KERNEL | CL_EXEC_NATIVE_KERNEL, .queue_properties = CL_QUEUE_PROFILING_ENABLE, .queue_on_host_properties = CL_QUEUE_PROFILING_ENABLE, .queue_on_device_properties = CL_QUEUE_PROFILING_ENABLE | CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, .queue_on_device_preferred_size = 16 * 1024, .queue_on_device_max_size = 256 * 1024, .max_on_device_queues = 1, .max_on_device_events = 1024, .platform = NULL, /* == intel_platform (set when requested) */ /* IEEE 754, XXX does IVB support CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT? */ .single_fp_config = CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST , /* IEEE 754. 
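   Half precision advertises the same Inf/NaN and round-to-nearest
   guarantees, mirroring single_fp_config above.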
*/ .half_fp_config = CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST , .printf_buffer_size = 1 * 1024 * 1024, .interop_user_sync = CL_TRUE, #define DECL_INFO_STRING(FIELD, STRING) \ .FIELD = STRING, \ .JOIN(FIELD,_sz) = sizeof(STRING), DECL_INFO_STRING(name, "Intel HD Graphics Family") DECL_INFO_STRING(vendor, "Intel") DECL_INFO_STRING(version, LIBCL_VERSION_STRING) DECL_INFO_STRING(profile, "FULL_PROFILE") DECL_INFO_STRING(opencl_c_version, LIBCL_C_VERSION_STRING) DECL_INFO_STRING(extensions, "") DECL_INFO_STRING(built_in_kernels, "__cl_copy_region_align4;" "__cl_copy_region_align16;" "__cl_cpy_region_unalign_same_offset;" "__cl_copy_region_unalign_dst_offset;" "__cl_copy_region_unalign_src_offset;" "__cl_copy_buffer_rect;" "__cl_copy_image_1d_to_1d;" "__cl_copy_image_2d_to_2d;" "__cl_copy_image_3d_to_2d;" "__cl_copy_image_2d_to_3d;" "__cl_copy_image_3d_to_3d;" "__cl_copy_image_2d_to_buffer;" "__cl_copy_image_3d_to_buffer;" "__cl_copy_buffer_to_image_2d;" "__cl_copy_buffer_to_image_3d;" "__cl_fill_region_unalign;" "__cl_fill_region_align2;" "__cl_fill_region_align4;" "__cl_fill_region_align8_2;" "__cl_fill_region_align8_4;" "__cl_fill_region_align8_8;" "__cl_fill_region_align8_16;" "__cl_fill_region_align128;" "__cl_fill_image_1d;" "__cl_fill_image_1d_array;" "__cl_fill_image_2d;" "__cl_fill_image_2d_array;" "__cl_fill_image_3d;" #ifdef GEN7_DEVICE "block_motion_estimate_intel;" #endif ) DECL_INFO_STRING(driver_version, LIBCL_DRIVER_VERSION_STRING) DECL_INFO_STRING(spir_versions, "1.2") #undef DECL_INFO_STRING .parent_device = NULL, .partition_max_sub_device = 1, .partition_property = {0}, .affinity_domain = 0, .partition_type = {0}, .image_pitch_alignment = 1, .image_base_address_alignment = 4096, .sub_group_sizes = {8, 16}, .sub_group_sizes_sz = sizeof(size_t) * 2, .cmrt_device = NULL Beignet-1.3.2-Source/src/OCLConfig.h.in000664 001750 001750 00000000435 13161142102 016561 0ustar00yryr000000 000000 // the configured options and settings for LIBCL #define LIBCL_DRIVER_VERSION_MAJOR @LIBCL_DRIVER_VERSION_MAJOR@ #define LIBCL_DRIVER_VERSION_MINOR @LIBCL_DRIVER_VERSION_MINOR@ #define LIBCL_C_VERSION_MAJOR @LIBCL_C_VERSION_MAJOR@ #define LIBCL_C_VERSION_MINOR @LIBCL_C_VERSION_MINOR@ Beignet-1.3.2-Source/src/cl_api_sampler.c000664 001750 001750 00000006524 13161142102 017363 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #include "cl_sampler.h" #include "cl_context.h" #include "cl_device_id.h" cl_sampler clCreateSampler(cl_context context, cl_bool normalized, cl_addressing_mode addressing, cl_filter_mode filter, cl_int *errcode_ret) { cl_sampler sampler = NULL; cl_int err = CL_SUCCESS; cl_uint i; do { if (!CL_OBJECT_IS_CONTEXT(context)) { err = CL_INVALID_CONTEXT; break; } if (addressing < CL_ADDRESS_NONE || addressing > CL_ADDRESS_MIRRORED_REPEAT) { err = CL_INVALID_VALUE; break; } if (filter < CL_FILTER_NEAREST || filter > CL_FILTER_LINEAR) { err = CL_INVALID_VALUE; break; } /* Check if images are not supported by any device associated with context */ for (i = 0; i < context->device_num; i++) { if (context->devices[i]->image_support == CL_FALSE) { err = CL_INVALID_OPERATION; break; } } if (err != CL_SUCCESS) break; sampler = cl_create_sampler(context, normalized, addressing, filter, &err); } while (0); if (errcode_ret) *errcode_ret = err; return sampler; } cl_int clGetSamplerInfo(cl_sampler sampler, cl_sampler_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { const void *src_ptr = NULL; size_t src_size = 0; cl_int ref; if (!CL_OBJECT_IS_SAMPLER(sampler)) { return CL_INVALID_SAMPLER; } if (param_name == CL_SAMPLER_REFERENCE_COUNT) { ref = CL_OBJECT_GET_REF(sampler); src_ptr = &ref; src_size = sizeof(cl_int); } else if (param_name == CL_SAMPLER_CONTEXT) { src_ptr = &sampler->ctx; src_size = sizeof(cl_context); } else if (param_name == CL_SAMPLER_NORMALIZED_COORDS) { src_ptr = &sampler->normalized_coords; src_size = sizeof(cl_bool); } else if (param_name == CL_SAMPLER_ADDRESSING_MODE) { src_ptr = &sampler->address; src_size = sizeof(cl_addressing_mode); } else if (param_name == CL_SAMPLER_FILTER_MODE) { src_ptr = &sampler->filter; src_size = sizeof(cl_filter_mode); } else { return CL_INVALID_VALUE; } return cl_get_info_helper(src_ptr, src_size, param_value, param_value_size, param_value_size_ret); } cl_int clRetainSampler(cl_sampler sampler) { if (!CL_OBJECT_IS_SAMPLER(sampler)) { return CL_INVALID_SAMPLER; } cl_sampler_add_ref(sampler); return CL_SUCCESS; } cl_int clReleaseSampler(cl_sampler sampler) { if (!CL_OBJECT_IS_SAMPLER(sampler)) { return CL_INVALID_SAMPLER; } cl_sampler_delete(sampler); return CL_SUCCESS; } Beignet-1.3.2-Source/src/git_sha1.sh000775 001750 001750 00000001073 13173553727 016323 0ustar00yryr000000 000000 #!/bin/bash SOURCE_DIR=$1 FILE=$2 touch ${SOURCE_DIR}/${FILE}_tmp if test -d ${SOURCE_DIR}/../.git; then if which git > /dev/null; then git --git-dir=${SOURCE_DIR}/../.git log -n 1 --oneline | \ sed 's/^\([^ ]*\) .*/#define BEIGNET_GIT_SHA1 "git-\1"/' \ > ${SOURCE_DIR}/${FILE}_tmp fi fi #updating ${SOURCE_DIR}/${FILE} if ! 
cmp -s ${SOURCE_DIR}/${FILE}_tmp ${SOURCE_DIR}/${FILE}; then mv ${SOURCE_DIR}/${FILE}_tmp ${SOURCE_DIR}/${FILE} else rm ${SOURCE_DIR}/${FILE}_tmp fi Beignet-1.3.2-Source/src/cl_mem_gl.h000664 001750 001750 00000001047 13161142102 016327 0ustar00yryr000000 000000 #ifndef __CL_MEM_GL_H__ #define __CL_MEM_GL_H__ #include "cl_mem.h" cl_mem cl_mem_new_gl_buffer(cl_context ctx, cl_mem_flags flags, GLuint buf_obj, cl_int *errcode_ret); cl_mem cl_mem_new_gl_texture(cl_context ctx, cl_mem_flags flags, GLenum texture_target, GLint miplevel, GLuint texture, cl_int *errcode_ret); #endif Beignet-1.3.2-Source/src/cl_accelerator_intel.h000664 001750 001750 00000002111 13161142102 020537 0ustar00yryr000000 000000 #ifndef __CL_ACCELERATOR_INTEL_H__ #define __CL_ACCELERATOR_INTEL_H__ #include "cl_base_object.h" #include "CL/cl.h" #include "CL/cl_ext.h" #include struct _cl_accelerator_intel { _cl_base_object base; cl_accelerator_intel prev, next; /* We chain in the allocator, why chain? */ cl_context ctx; /* Context it belongs to */ cl_accelerator_type_intel type; union { cl_motion_estimation_desc_intel me; } desc; /* save desc before we decide how to handle it */ }; #define CL_OBJECT_ACCELERATOR_INTEL_MAGIC 0x7e6a08c9a7ac3e3fLL #define CL_OBJECT_IS_ACCELERATOR_INTEL(obj) \ (((cl_base_object)obj)->magic == CL_OBJECT_ACCELERATOR_INTEL_MAGIC) cl_accelerator_intel cl_accelerator_intel_new(cl_context ctx, cl_accelerator_type_intel accel_type, size_t desc_sz, const void* desc, cl_int* errcode_ret); void cl_accelerator_intel_add_ref(cl_accelerator_intel accel); void cl_accelerator_intel_delete(cl_accelerator_intel accel); #endif Beignet-1.3.2-Source/src/CMakeLists.txt000664 001750 001750 00000015600 13173554000 017007 0ustar00yryr000000 000000 include_directories(${CMAKE_CURRENT_SOURCE_DIR} ${DRM_INCLUDE_DIRS} ${DRM_INCLUDE_DIRS}/../ ${CMAKE_CURRENT_SOURCE_DIR}/../backend/src/backend/ ${CMAKE_CURRENT_SOURCE_DIR}/../include ${LLVM_INCLUDE_DIR} ${OPENGL_INCLUDE_DIRS} ${EGL_INCLUDE_DIRS}) macro (MakeKernelBinStr KERNEL_PATH KERNEL_FILES) foreach (KF ${KERNEL_FILES}) set (input_file ${KERNEL_PATH}/${KF}.cl) set (output_file ${KERNEL_PATH}/${KF}_str.c) list (APPEND KERNEL_STR_FILES ${output_file}) list (GET GBE_BIN_GENERATER -1 GBE_BIN_FILE) if(GEN_PCI_ID) add_custom_command( OUTPUT ${output_file} COMMAND rm -rf ${output_file} COMMAND ${GBE_BIN_GENERATER} -s -o${output_file} -t${GEN_PCI_ID} ${input_file} DEPENDS ${input_file} ${GBE_BIN_FILE} beignet_bitcode) else(GEN_PCI_ID) add_custom_command( OUTPUT ${output_file} COMMAND rm -rf ${output_file} COMMAND ${GBE_BIN_GENERATER} -s -o${output_file} ${input_file} DEPENDS ${input_file} ${GBE_BIN_FILE} beignet_bitcode) endif(GEN_PCI_ID) endforeach (KF) endmacro (MakeKernelBinStr) macro (MakeBuiltInKernelStr KERNEL_PATH KERNEL_FILES) set (output_file ${KERNEL_PATH}/${BUILT_IN_NAME}.cl) set (file_content) file (REMOVE ${output_file}) foreach (KF ${KERNEL_NAMES}) set (input_file ${KERNEL_PATH}/${KF}.cl) file(READ ${input_file} file_content ) STRING(REGEX REPLACE ";" "\\\\;" file_content "${file_content}") file(APPEND ${output_file} ${file_content}) endforeach (KF) endmacro (MakeBuiltInKernelStr) set (KERNEL_STR_FILES) set (KERNEL_NAMES cl_internal_copy_buf_align4 cl_internal_copy_buf_align16 cl_internal_copy_buf_unalign_same_offset cl_internal_copy_buf_unalign_dst_offset cl_internal_copy_buf_unalign_src_offset cl_internal_copy_buf_rect cl_internal_copy_buf_rect_align4 cl_internal_copy_image_1d_to_1d cl_internal_copy_image_2d_to_2d cl_internal_copy_image_3d_to_2d 
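    # Note (added): each name in this list corresponds to kernels/<name>.cl;
    # MakeKernelBinStr (above) turns it into kernels/<name>_str.c via
    # GBE_BIN_GENERATER, e.g. cl_internal_copy_buf_align4.cl ->
    # cl_internal_copy_buf_align4_str.c.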
cl_internal_copy_image_2d_to_3d cl_internal_copy_image_3d_to_3d cl_internal_copy_image_2d_to_2d_array cl_internal_copy_image_1d_array_to_1d_array cl_internal_copy_image_2d_array_to_2d_array cl_internal_copy_image_2d_array_to_2d cl_internal_copy_image_2d_array_to_3d cl_internal_copy_image_3d_to_2d_array cl_internal_copy_image_2d_to_buffer cl_internal_copy_image_2d_to_buffer_align16 cl_internal_copy_image_3d_to_buffer cl_internal_copy_buffer_to_image_2d cl_internal_copy_buffer_to_image_2d_align16 cl_internal_copy_buffer_to_image_3d cl_internal_fill_buf_align8 cl_internal_fill_buf_align4 cl_internal_fill_buf_align2 cl_internal_fill_buf_unalign cl_internal_fill_buf_align128 cl_internal_fill_image_1d cl_internal_fill_image_1d_array cl_internal_fill_image_2d cl_internal_fill_image_2d_array cl_internal_fill_image_3d cl_internal_block_motion_estimate_intel) set (BUILT_IN_NAME cl_internal_built_in_kernel) MakeBuiltInKernelStr ("${CMAKE_CURRENT_SOURCE_DIR}/kernels/" "${KERNEL_NAMES}") MakeKernelBinStr ("${CMAKE_CURRENT_SOURCE_DIR}/kernels/" "${KERNEL_NAMES}") MakeKernelBinStr ("${CMAKE_CURRENT_SOURCE_DIR}/kernels/" "${BUILT_IN_NAME}") set(OPENCL_SRC ${KERNEL_STR_FILES} cl_base_object.c cl_api.c cl_api_platform_id.c cl_api_device_id.c cl_api_mem.c cl_api_kernel.c cl_api_command_queue.c cl_api_event.c cl_api_context.c cl_api_sampler.c cl_api_program.c cl_alloc.c cl_kernel.c cl_program.c cl_gbe_loader.cpp cl_sampler.c cl_accelerator_intel.c cl_event.c cl_enqueue.c cl_image.c cl_mem.c cl_platform_id.c cl_extensions.c cl_device_id.c cl_context.c cl_command_queue.c cl_command_queue.h cl_device_enqueue.c cl_device_enqueue.h cl_command_queue_gen7.c cl_command_queue_enqueue.c cl_utils.c cl_driver.h cl_driver.cpp cl_driver_defs.c intel/intel_gpgpu.c intel/intel_batchbuffer.c intel/intel_driver.c performance.c) if (X11_FOUND) set(CMAKE_CXX_FLAGS "-DHAS_X11 ${CMAKE_CXX_FLAGS}") set(CMAKE_C_FLAGS "-DHAS_X11 ${CMAKE_C_FLAGS}") set(OPENCL_SRC ${OPENCL_SRC} x11/dricommon.c x11/va_dri2.c) endif (X11_FOUND) if (CMRT_FOUND) set(CMAKE_CXX_FLAGS "-DHAS_CMRT ${CMAKE_CXX_FLAGS}") set(CMAKE_CXX_FLAGS "-DCMRT_PATH=${CMRT_LIBDIR}/libcmrt.so.1 ${CMAKE_CXX_FLAGS}") set(CMAKE_C_FLAGS "-DHAS_CMRT ${CMAKE_C_FLAGS}") set(OPENCL_SRC ${OPENCL_SRC} cl_cmrt.cpp) endif (CMRT_FOUND) if (OPENGL_FOUND AND EGL_FOUND) set (OPENCL_SRC ${OPENCL_SRC} cl_mem_gl.c cl_gl_api.c ) SET(CMAKE_CXX_FLAGS "-DHAS_GL_EGL ${CMAKE_CXX_FLAGS}") SET(CMAKE_C_FLAGS "-DHAS_GL_EGL ${CMAKE_C_FLAGS}") endif (OPENGL_FOUND AND EGL_FOUND) if (OCLIcd_FOUND) set (OPENCL_SRC ${OPENCL_SRC} cl_khr_icd.c) SET(CMAKE_CXX_FLAGS "-DHAS_OCLIcd ${CMAKE_CXX_FLAGS}") SET(CMAKE_C_FLAGS "-DHAS_OCLIcd ${CMAKE_C_FLAGS}") endif (OCLIcd_FOUND) if (HAVE_DRM_INTEL_USERPTR) SET(CMAKE_CXX_FLAGS "-DHAS_USERPTR ${CMAKE_CXX_FLAGS}") SET(CMAKE_C_FLAGS "-DHAS_USERPTR ${CMAKE_C_FLAGS}") endif (HAVE_DRM_INTEL_USERPTR) if (HAVE_DRM_INTEL_EU_TOTAL) SET(CMAKE_CXX_FLAGS "-DHAS_EU_TOTAL ${CMAKE_CXX_FLAGS}") SET(CMAKE_C_FLAGS "-DHAS_EU_TOTAL ${CMAKE_C_FLAGS}") endif (HAVE_DRM_INTEL_EU_TOTAL) if (HAVE_DRM_INTEL_SUBSLICE_TOTAL) SET(CMAKE_CXX_FLAGS "-DHAS_SUBSLICE_TOTAL ${CMAKE_CXX_FLAGS}") SET(CMAKE_C_FLAGS "-DHAS_SUBSLICE_TOTAL ${CMAKE_C_FLAGS}") endif (HAVE_DRM_INTEL_SUBSLICE_TOTAL) if (HAVE_DRM_INTEL_POOLED_EU) SET(CMAKE_CXX_FLAGS "-DHAS_POOLED_EU ${CMAKE_CXX_FLAGS}") SET(CMAKE_C_FLAGS "-DHAS_POOLED_EU ${CMAKE_C_FLAGS}") endif (HAVE_DRM_INTEL_POOLED_EU) if (HAVE_DRM_INTEL_MIN_EU_IN_POOL) SET(CMAKE_CXX_FLAGS "-DHAS_MIN_EU_IN_POOL ${CMAKE_CXX_FLAGS}") SET(CMAKE_C_FLAGS "-DHAS_MIN_EU_IN_POOL 
${CMAKE_C_FLAGS}") endif (HAVE_DRM_INTEL_MIN_EU_IN_POOL) if (HAVE_DRM_INTEL_BO_SET_SOFTPIN) SET(CMAKE_CXX_FLAGS "-DHAS_BO_SET_SOFTPIN ${CMAKE_CXX_FLAGS}") SET(CMAKE_C_FLAGS "-DHAS_BO_SET_SOFTPIN ${CMAKE_C_FLAGS}") endif (HAVE_DRM_INTEL_BO_SET_SOFTPIN) set(GIT_SHA1 "git_sha1.h") add_custom_target(${GIT_SHA1} ALL COMMAND chmod +x ${CMAKE_CURRENT_SOURCE_DIR}/git_sha1.sh COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/git_sha1.sh ${CMAKE_CURRENT_SOURCE_DIR} ${GIT_SHA1} ) SET(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,-Bsymbolic,--allow-shlib-undefined") link_directories (${LLVM_LIBRARY_DIR} ${DRM_LIBDIR} ${OPENGL_LIBDIR} ${EGL_LIBDIR}) add_library(cl SHARED ${OPENCL_SRC}) ADD_DEPENDENCIES(cl ${GIT_SHA1}) target_link_libraries( cl rt ${X11_LIBRARIES} ${XEXT_LIBRARIES} ${XFIXES_LIBRARIES} ${DRM_INTEL_LIBRARIES} ${DRM_LIBRARIES} ${CMAKE_THREAD_LIBS_INIT} ${CMAKE_DL_LIBS} ${OPENGL_LIBRARIES} ${EGL_LIBRARIES}) install (TARGETS cl LIBRARY DESTINATION ${BEIGNET_INSTALL_DIR}) Beignet-1.3.2-Source/src/cl_device_id.h000664 001750 001750 00000017411 13173554000 017013 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #ifndef __CL_DEVICE_ID_H__ #define __CL_DEVICE_ID_H__ #define EXTENSTION_LENGTH 512 #include "cl_base_object.h" /* Store complete information about the device */ struct _cl_device_id { _cl_base_object base; cl_device_type device_type; cl_uint device_id; cl_uint vendor_id; cl_uint max_compute_unit; // maximum EU number cl_uint max_thread_per_unit; // maximum EU threads per EU. cl_uint sub_slice_count; // Device's sub slice count cl_uint max_work_item_dimensions; // should be 3. size_t max_work_item_sizes[3]; // equal to maximum work group size. size_t max_work_group_size; // maximum work group size under simd16 mode. size_t max_1d_global_work_sizes[3]; // maximum 1d global work size for builtin kernels. size_t max_2d_global_work_sizes[3]; // maximum 2d global work size for builtin kernels. size_t max_3d_global_work_sizes[3]; // maximum 3d global work size for builtin kernels. 
cl_uint preferred_vector_width_char; cl_uint preferred_vector_width_short; cl_uint preferred_vector_width_int; cl_uint preferred_vector_width_long; cl_uint preferred_vector_width_float; cl_uint preferred_vector_width_double; cl_uint preferred_vector_width_half; cl_uint native_vector_width_char; cl_uint native_vector_width_short; cl_uint native_vector_width_int; cl_uint native_vector_width_long; cl_uint native_vector_width_float; cl_uint native_vector_width_double; cl_uint native_vector_width_half; cl_uint max_clock_frequency; cl_uint address_bits; cl_ulong max_mem_alloc_size; cl_device_svm_capabilities svm_capabilities; cl_uint preferred_platform_atomic_alignment; cl_uint preferred_global_atomic_alignment; cl_uint preferred_local_atomic_alignment; cl_bool image_support; cl_uint max_read_image_args; cl_uint max_write_image_args; cl_uint max_read_write_image_args; size_t image2d_max_width; size_t image_max_array_size; size_t image2d_max_height; size_t image3d_max_width; size_t image3d_max_height; size_t image3d_max_depth; size_t image_mem_size; cl_uint max_samplers; size_t max_parameter_size; cl_uint mem_base_addr_align; cl_uint min_data_type_align_size; cl_uint max_pipe_args; cl_uint pipe_max_active_reservations; cl_uint pipe_max_packet_siz; cl_device_fp_config single_fp_config; cl_device_fp_config half_fp_config; cl_device_fp_config double_fp_config; cl_device_mem_cache_type global_mem_cache_type; cl_uint global_mem_cache_line_size; cl_ulong global_mem_cache_size; cl_ulong global_mem_size; cl_ulong max_constant_buffer_size; cl_uint max_constant_args; size_t max_global_variable_size; size_t global_variable_preferred_total_size; cl_device_local_mem_type local_mem_type; cl_ulong local_mem_size; cl_ulong scratch_mem_size; cl_bool error_correction_support; cl_bool host_unified_memory; size_t profiling_timer_resolution; cl_bool endian_little; cl_bool available; cl_bool compiler_available; cl_bool linker_available; cl_device_exec_capabilities execution_capabilities; cl_command_queue_properties queue_properties; cl_command_queue_properties queue_on_host_properties; cl_command_queue_properties queue_on_device_properties; cl_uint queue_on_device_preferred_size; cl_uint queue_on_device_max_size; cl_uint max_on_device_queues; cl_uint max_on_device_events; cl_platform_id platform; size_t printf_buffer_size; cl_bool interop_user_sync; const char *name; const char *vendor; const char *version; const char *profile; const char *opencl_c_version; const char extensions[EXTENSTION_LENGTH]; const char *driver_version; const char *spir_versions; const char *built_in_kernels; size_t name_sz; size_t vendor_sz; size_t version_sz; size_t profile_sz; size_t opencl_c_version_sz; size_t extensions_sz; size_t driver_version_sz; size_t spir_versions_sz; size_t built_in_kernels_sz; /* SubDevice specific info */ cl_device_id parent_device; cl_uint partition_max_sub_device; cl_device_partition_property partition_property[3]; cl_device_affinity_domain affinity_domain; cl_device_partition_property partition_type[3]; uint32_t atomic_test_result; cl_uint image_pitch_alignment; cl_uint image_base_address_alignment; size_t sub_group_sizes[2]; size_t sub_group_sizes_sz; //inited as NULL, created only when cmrt kernel is used void* cmrt_device; //realtype: CmDevice* }; #define CL_OBJECT_DEVICE_MAGIC 0x2acaddcca8853c52LL #define CL_OBJECT_IS_DEVICE(obj) ((obj && \ ((cl_base_object)obj)->magic == CL_OBJECT_DEVICE_MAGIC && \ CL_OBJECT_GET_REF(obj) >= 1)) /* Get a device from the given platform */ extern cl_int 
cl_get_device_ids(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices); /* Get the intel GPU device we currently have in this machine (if any) */ extern cl_device_id cl_get_gt_device(cl_device_type device_type); /* Provide info about the device */ extern cl_int cl_get_device_info(cl_device_id device, cl_device_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret); extern cl_int cl_get_kernel_workgroup_info(cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret); extern cl_int cl_get_kernel_subgroup_info(cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name, size_t input_value_size, const void * input_value, size_t param_value_size, void * param_value, size_t * param_value_size_ret); /* Returns the Gen device ID */ extern cl_int cl_device_get_version(cl_device_id device, cl_int *ver); extern size_t cl_get_kernel_max_wg_sz(cl_kernel); extern cl_int cl_devices_list_check(cl_uint num_devices, const cl_device_id *devices); extern cl_int cl_devices_list_include_check(cl_uint num_devices, const cl_device_id *devices, cl_uint num_to_check, const cl_device_id *devices_to_check); #endif /* __CL_DEVICE_ID_H__ */ Beignet-1.3.2-Source/src/.gitignore000664 001750 001750 00000000025 13161142102 016223 0ustar00yryr000000 000000 OCLConfig.h libcl.so Beignet-1.3.2-Source/src/cl_enqueue.h000664 001750 001750 00000006734 13161142102 016546 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Rong Yang */ #ifndef __CL_ENQUEUE_H__ #define __CL_ENQUEUE_H__ #include "cl_internals.h" #include "cl_driver.h" #include "CL/cl.h" typedef enum { EnqueueReturnSuccesss = 0, /* For some case, we have nothing to do, just return SUCCESS. 
*/
  EnqueueReadBuffer,
  EnqueueReadBufferRect,
  EnqueueWriteBuffer,
  EnqueueWriteBufferRect,
  EnqueueCopyBuffer,
  EnqueueCopyBufferRect,
  EnqueueReadImage,
  EnqueueWriteImage,
  EnqueueCopyImage,
  EnqueueCopyImageToBuffer,
  EnqueueCopyBufferToImage,
  EnqueueMapBuffer,
  EnqueueMapImage,
  EnqueueUnmapMemObject,
  EnqueueNDRangeKernel,
  EnqueueNativeKernel,
  EnqueueMarker,
  EnqueueBarrier,
  EnqueueFillBuffer,
  EnqueueFillImage,
  EnqueueMigrateMemObj,
  EnqueueSVMFree,
  EnqueueSVMMemCopy,
  EnqueueSVMMemFill,
  EnqueueInvalid
} enqueue_type;

typedef struct _enqueue_data {
  enqueue_type type;         /* Command type */
  cl_mem mem_obj;            /* Enqueue's cl_mem */
  cl_command_queue queue;    /* Command queue */
  size_t offset;             /* Mem object's offset */
  size_t size;               /* Size */
  size_t origin[3];          /* Origin */
  size_t host_origin[3];     /* Origin */
  size_t region[3];          /* Region */
  size_t row_pitch;          /* Row pitch */
  size_t slice_pitch;        /* Slice pitch */
  size_t host_row_pitch;     /* Host row pitch, used in read/write buffer rect */
  size_t host_slice_pitch;   /* Host slice pitch, used in read/write buffer rect */
  const void *const_ptr;     /* Const ptr for memory read */
  void *ptr;                 /* Ptr for write and return value */
  const cl_mem *mem_list;    /* mem_list of clEnqueueNativeKernel */
  uint8_t unsync_map;        /* Indicate the clEnqueueMapBuffer/Image is an unsync map */
  uint8_t write_map;         /* Indicate if the clEnqueueMapBuffer is write enabled */
  void **pointers;           /* The svm_pointers of clEnqueueSVMFree */
  size_t pattern_size;       /* The pattern_size of clEnqueueSVMMemFill */
  void (*user_func)(void *); /* Pointer to a host-callable user function */
  void (CL_CALLBACK *free_func)(cl_command_queue queue, cl_uint num_svm_pointers,
                                void *svm_pointers[], void *user_data); /* Pointer to pfn_free_func of clEnqueueSVMFree */
  cl_gpgpu gpgpu;
  cl_bool mid_event_of_enq;  /* For a non-uniform ndrange, one enqueue has a sequence of events and
                                the last event needs to parse the device enqueue information.
                                0 : last event; 1: non-last event */
} enqueue_data;

/* Do the real enqueue commands */
extern cl_int cl_enqueue_handle(enqueue_data *data, cl_int status);
extern void cl_enqueue_delete(enqueue_data *data);
#endif /* __CL_ENQUEUE_H__ */ Beignet-1.3.2-Source/src/cl_command_queue_gen7.c000664 001750 001750 00000044571 13161142102 020635 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia
 */
#include "cl_command_queue.h"
#include "cl_context.h"
#include "cl_program.h"
#include "cl_kernel.h"
#include "cl_device_id.h"
#include "cl_mem.h"
#include "cl_event.h"
#include "cl_utils.h"
#include "cl_alloc.h"
#include "cl_device_enqueue.h"
#include
#include
#include
#include

#define MAX_GROUP_SIZE_IN_HALFSLICE 512
static INLINE size_t cl_kernel_compute_batch_sz(cl_kernel k) { return 256+256; }

/* The "varying" payload is the part of the curbe that changes across threads in the
 * same work group.
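 * For example (illustrative numbers): with a local size of {8, 2, 1} under
 * SIMD8 there are two HW threads; the lanes of thread 0 receive local IDs
 * x = 0..7 with y = 0, and the lanes of thread 1 receive x = 0..7 with y = 1.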
Right now, it consists in local IDs and block IPs */ static cl_int cl_set_varying_payload(const cl_kernel ker, char *data, const size_t *local_wk_sz, size_t simd_sz, size_t cst_sz, size_t thread_n) { uint32_t *ids[3] = {NULL,NULL,NULL}; uint16_t *block_ips = NULL; uint32_t *thread_ids = NULL; size_t i, j, k, curr = 0; int32_t id_offset[3], ip_offset, tid_offset; cl_int err = CL_SUCCESS; int32_t dw_ip_offset = -1; id_offset[0] = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_LOCAL_ID_X, 0); id_offset[1] = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_LOCAL_ID_Y, 0); id_offset[2] = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_LOCAL_ID_Z, 0); ip_offset = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_BLOCK_IP, 0); tid_offset = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_THREAD_ID, 0); if (ip_offset < 0) dw_ip_offset = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_DW_BLOCK_IP, 0); assert(ip_offset < 0 || dw_ip_offset < 0); assert(ip_offset >= 0 || dw_ip_offset >= 0); if (id_offset[0] >= 0) TRY_ALLOC(ids[0], (uint32_t*) alloca(sizeof(uint32_t)*thread_n*simd_sz)); if (id_offset[1] >= 0) TRY_ALLOC(ids[1], (uint32_t*) alloca(sizeof(uint32_t)*thread_n*simd_sz)); if (id_offset[2] >= 0) TRY_ALLOC(ids[2], (uint32_t*) alloca(sizeof(uint32_t)*thread_n*simd_sz)); TRY_ALLOC(block_ips, (uint16_t*) alloca(sizeof(uint16_t)*thread_n*simd_sz)); if (tid_offset >= 0) { TRY_ALLOC(thread_ids, (uint32_t*) alloca(sizeof(uint32_t)*thread_n)); memset(thread_ids, 0, sizeof(uint32_t)*thread_n); } /* 0xffff means that the lane is inactivated */ memset(block_ips, 0xff, sizeof(int16_t)*thread_n*simd_sz); /* Compute the IDs and the block IPs */ for (k = 0; k < local_wk_sz[2]; ++k) for (j = 0; j < local_wk_sz[1]; ++j) for (i = 0; i < local_wk_sz[0]; ++i, ++curr) { if (id_offset[0] >= 0) ids[0][curr] = i; if (id_offset[1] >= 0) ids[1][curr] = j; if (id_offset[2] >= 0) ids[2][curr] = k; block_ips[curr] = 0; if (thread_ids) thread_ids[curr/simd_sz] = curr/simd_sz; } /* Copy them to the curbe buffer */ curr = 0; for (i = 0; i < thread_n; ++i, data += cst_sz) { uint32_t *ids0 = (uint32_t *) (data + id_offset[0]); uint32_t *ids1 = (uint32_t *) (data + id_offset[1]); uint32_t *ids2 = (uint32_t *) (data + id_offset[2]); uint16_t *ips = (uint16_t *) (data + ip_offset); uint32_t *dw_ips = (uint32_t *) (data + dw_ip_offset); if (thread_ids) *(uint32_t *)(data + tid_offset) = thread_ids[i]; for (j = 0; j < simd_sz; ++j, ++curr) { if (id_offset[0] >= 0) ids0[j] = ids[0][curr]; if (id_offset[1] >= 0) ids1[j] = ids[1][curr]; if (id_offset[2] >= 0) ids2[j] = ids[2][curr]; if (ip_offset >= 0) ips[j] = block_ips[curr]; if (dw_ip_offset >= 0) dw_ips[j] = block_ips[curr]; } } error: return err; } static int cl_upload_constant_buffer(cl_command_queue queue, cl_kernel ker, cl_gpgpu gpgpu) { if (interp_kernel_get_ocl_version(ker->opaque) >= 200) { // pass the starting of constant address space int32_t constant_addrspace = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_CONSTANT_ADDRSPACE, 0); if (constant_addrspace >= 0) { size_t global_const_size = interp_program_get_global_constant_size(ker->program->opaque); if (global_const_size > 0) { *(char **)(ker->curbe + constant_addrspace) = ker->program->global_data_ptr; cl_gpgpu_bind_buf(gpgpu, ker->program->global_data, constant_addrspace, 0, ALIGN(global_const_size, getpagesize()), BTI_CONSTANT); } } return 0; } // TODO this is only valid for OpenCL 1.2, // under ocl1.2 we gather all constant into one dedicated surface. 
// but in 2.0 we put program global into one surface, but constants // pass through kernel argument in each separate buffer int32_t arg; size_t offset = 0; uint32_t raw_size = 0, aligned_size =0; gbe_program prog = ker->program->opaque; const int32_t arg_n = interp_kernel_get_arg_num(ker->opaque); size_t global_const_size = interp_program_get_global_constant_size(prog); raw_size = global_const_size; // Surface state need 4 byte alignment, and Constant argument's buffer size // have align to 4 byte when alloc, so align global constant size to 4 can // ensure the finally aligned_size align to 4. aligned_size = ALIGN(raw_size, 4); /* Reserve 8 bytes to get rid of 0 address */ if(global_const_size == 0) aligned_size = 8; for (arg = 0; arg < arg_n; ++arg) { const enum gbe_arg_type type = interp_kernel_get_arg_type(ker->opaque, arg); if (type == GBE_ARG_CONSTANT_PTR && ker->args[arg].mem) { uint32_t alignment = interp_kernel_get_arg_align(ker->opaque, arg); assert(alignment != 0); cl_mem mem = ker->args[arg].mem; raw_size += mem->size; aligned_size = ALIGN(aligned_size, alignment); aligned_size += mem->size; } } if(raw_size == 0) return 0; cl_buffer bo = cl_gpgpu_alloc_constant_buffer(gpgpu, aligned_size, BTI_CONSTANT); if (bo == NULL) return -1; cl_buffer_map(bo, 1); char * cst_addr = cl_buffer_get_virtual(bo); if (cst_addr == NULL) return -1; /* upload the global constant data */ if (global_const_size > 0) { interp_program_get_global_constant_data(prog, (char*)(cst_addr+offset)); offset += global_const_size; } /* reserve 8 bytes to get rid of 0 address */ if(global_const_size == 0) { offset = 8; } /* upload constant buffer argument */ int32_t curbe_offset = 0; for (arg = 0; arg < arg_n; ++arg) { const enum gbe_arg_type type = interp_kernel_get_arg_type(ker->opaque, arg); if (type == GBE_ARG_CONSTANT_PTR && ker->args[arg].mem) { cl_mem mem = ker->args[arg].mem; uint32_t alignment = interp_kernel_get_arg_align(ker->opaque, arg); offset = ALIGN(offset, alignment); curbe_offset = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_KERNEL_ARGUMENT, arg); if (curbe_offset < 0) continue; *(uint32_t *) (ker->curbe + curbe_offset) = offset; cl_buffer_map(mem->bo, 1); void * addr = cl_buffer_get_virtual(mem->bo); memcpy(cst_addr + offset, addr, mem->size); cl_buffer_unmap(mem->bo); offset += mem->size; } } cl_buffer_unmap(bo); return 0; } /* Will return the total amount of slm used */ static int32_t cl_curbe_fill(cl_kernel ker, const uint32_t work_dim, const size_t *global_wk_off, const size_t *global_wk_sz, const size_t *local_wk_sz, const size_t *enqueued_local_wk_sz, size_t thread_n) { int32_t offset; #define UPLOAD(ENUM, VALUE) \ if ((offset = interp_kernel_get_curbe_offset(ker->opaque, ENUM, 0)) >= 0) \ *((uint32_t *) (ker->curbe + offset)) = VALUE; UPLOAD(GBE_CURBE_LOCAL_SIZE_X, local_wk_sz[0]); UPLOAD(GBE_CURBE_LOCAL_SIZE_Y, local_wk_sz[1]); UPLOAD(GBE_CURBE_LOCAL_SIZE_Z, local_wk_sz[2]); UPLOAD(GBE_CURBE_ENQUEUED_LOCAL_SIZE_X, enqueued_local_wk_sz[0]); UPLOAD(GBE_CURBE_ENQUEUED_LOCAL_SIZE_Y, enqueued_local_wk_sz[1]); UPLOAD(GBE_CURBE_ENQUEUED_LOCAL_SIZE_Z, enqueued_local_wk_sz[2]); UPLOAD(GBE_CURBE_GLOBAL_SIZE_X, global_wk_sz[0]); UPLOAD(GBE_CURBE_GLOBAL_SIZE_Y, global_wk_sz[1]); UPLOAD(GBE_CURBE_GLOBAL_SIZE_Z, global_wk_sz[2]); UPLOAD(GBE_CURBE_GLOBAL_OFFSET_X, global_wk_off[0]); UPLOAD(GBE_CURBE_GLOBAL_OFFSET_Y, global_wk_off[1]); UPLOAD(GBE_CURBE_GLOBAL_OFFSET_Z, global_wk_off[2]); UPLOAD(GBE_CURBE_GROUP_NUM_X, global_wk_sz[0] / enqueued_local_wk_sz[0] + 
(global_wk_sz[0]%enqueued_local_wk_sz[0]?1:0)); UPLOAD(GBE_CURBE_GROUP_NUM_Y, global_wk_sz[1] / enqueued_local_wk_sz[1] + (global_wk_sz[1]%enqueued_local_wk_sz[1]?1:0)); UPLOAD(GBE_CURBE_GROUP_NUM_Z, global_wk_sz[2] / enqueued_local_wk_sz[2] + (global_wk_sz[2]%enqueued_local_wk_sz[2]?1:0)); UPLOAD(GBE_CURBE_THREAD_NUM, thread_n); UPLOAD(GBE_CURBE_WORK_DIM, work_dim); #undef UPLOAD /* Handle the various offsets to SLM */ const int32_t arg_n = interp_kernel_get_arg_num(ker->opaque); int32_t arg, slm_offset = interp_kernel_get_slm_size(ker->opaque); ker->local_mem_sz = 0; for (arg = 0; arg < arg_n; ++arg) { const enum gbe_arg_type type = interp_kernel_get_arg_type(ker->opaque, arg); if (type != GBE_ARG_LOCAL_PTR) continue; uint32_t align = interp_kernel_get_arg_align(ker->opaque, arg); assert(align != 0); slm_offset = ALIGN(slm_offset, align); offset = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_KERNEL_ARGUMENT, arg); if (offset < 0) continue; uint32_t *slmptr = (uint32_t *) (ker->curbe + offset); *slmptr = slm_offset; slm_offset += ker->args[arg].local_sz; ker->local_mem_sz += ker->args[arg].local_sz; } return slm_offset; } static void cl_bind_stack(cl_gpgpu gpgpu, cl_kernel ker) { cl_context ctx = ker->program->ctx; cl_device_id device = ctx->devices[0]; const int32_t per_lane_stack_sz = ker->stack_size; const int32_t value = GBE_CURBE_EXTRA_ARGUMENT; const int32_t sub_value = GBE_STACK_BUFFER; const int32_t offset_stack_buffer = interp_kernel_get_curbe_offset(ker->opaque, value, sub_value); int32_t stack_sz = per_lane_stack_sz; /* No stack required for this kernel */ if (per_lane_stack_sz == 0) return; /* The stack size is given for *each* SIMD lane. So, we accordingly compute * the size we need for the complete machine */ assert(offset_stack_buffer >= 0); stack_sz *= interp_kernel_get_simd_width(ker->opaque); stack_sz *= device->max_compute_unit * ctx->devices[0]->max_thread_per_unit; /* for some hardware, part of EUs are disabled with EU id reserved, * it makes the active EU id larger than count of EUs within a subslice, * need to enlarge stack size for such case to avoid out of range. */ cl_driver_enlarge_stack_size(ctx->drv, &stack_sz); const int32_t offset_stack_size = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_STACK_SIZE, 0); if (offset_stack_size >= 0) { *(uint64_t *)(ker->curbe + offset_stack_size) = stack_sz; } cl_gpgpu_set_stack(gpgpu, offset_stack_buffer, stack_sz, BTI_PRIVATE); } static int cl_bind_profiling(cl_gpgpu gpgpu, uint32_t simd_sz, cl_kernel ker, size_t global_sz, size_t local_sz, uint32_t bti) { int32_t offset; int i = 0; int thread_num; if (simd_sz == 16) { for(i = 0; i < 3; i++) { offset = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_PROFILING_TIMESTAMP0 + i, 0); assert(offset >= 0); memset(ker->curbe + offset, 0x0, sizeof(uint32_t)*8*2); thread_num = (local_sz + 15)/16; } } else { assert(simd_sz == 8); for(i = 0; i < 5; i++) { offset = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_PROFILING_TIMESTAMP0 + i, 0); assert(offset >= 0); memset(ker->curbe + offset, 0x0, sizeof(uint32_t)*8); thread_num = (local_sz + 7)/8; } } offset = interp_kernel_get_curbe_offset(ker->opaque, GBE_CURBE_PROFILING_BUF_POINTER, 0); thread_num = thread_num*(global_sz/local_sz); if (cl_gpgpu_set_profiling_buffer(gpgpu, thread_num*128 + 4, offset, bti)) return -1; return 0; } static int cl_alloc_printf(cl_gpgpu gpgpu, cl_kernel ker, void* printf_info, int printf_num, size_t global_sz) { /* An guess size. 
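   Note (added): for example, with global_sz = 65536 and printf_num = 2 the
   guess below is 65536 * sizeof(int) * 16 * 2 = 8 MB, which the clamps keep
   inside the [1 MB, 16 MB] window.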
*/ size_t buf_size = global_sz * sizeof(int) * 16 * printf_num; if (buf_size > 16*1024*1024) //at most. buf_size = 16*1024*1024; if (buf_size < 1*1024*1024) // at least. buf_size = 1*1024*1024; if (cl_gpgpu_set_printf_buffer(gpgpu, buf_size, interp_get_printf_buf_bti(printf_info)) != 0) return -1; return 0; } LOCAL cl_int cl_command_queue_ND_range_gen7(cl_command_queue queue, cl_kernel ker, cl_event event, const uint32_t work_dim, const size_t *global_wk_off, const size_t *global_dim_off, const size_t *global_wk_sz, const size_t *global_wk_sz_use, const size_t *local_wk_sz, const size_t *local_wk_sz_use) { cl_gpgpu gpgpu = cl_gpgpu_new(queue->ctx->drv); cl_context ctx = queue->ctx; char *final_curbe = NULL; /* Includes them and one sub-buffer per group */ cl_gpgpu_kernel kernel; const uint32_t simd_sz = cl_kernel_get_simd_width(ker); size_t i, batch_sz = 0u, local_sz = 0u; size_t cst_sz = interp_kernel_get_curbe_size(ker->opaque); int32_t scratch_sz = interp_kernel_get_scratch_size(ker->opaque); size_t thread_n = 0u; int printf_num = 0; cl_int err = CL_SUCCESS; size_t global_size = global_wk_sz[0] * global_wk_sz[1] * global_wk_sz[2]; void* printf_info = NULL; uint32_t max_bti = 0; if (ker->exec_info_n > 0) { cst_sz += ker->exec_info_n * sizeof(void *); cst_sz = (cst_sz + 31) / 32 * 32; //align to register size, hard code here. ker->curbe = cl_realloc(ker->curbe, cst_sz); } ker->curbe_sz = cst_sz; /* Setup kernel */ kernel.name = interp_kernel_get_name(ker->opaque); kernel.grf_blocks = 128; kernel.bo = ker->bo; kernel.barrierID = 0; kernel.slm_sz = 0; kernel.use_slm = interp_kernel_use_slm(ker->opaque); /* Compute the number of HW threads we need */ if(UNLIKELY(err = cl_kernel_work_group_sz(ker, local_wk_sz_use, 3, &local_sz) != CL_SUCCESS)) { DEBUGP(DL_ERROR, "Work group size exceed Kernel's work group size."); return err; } kernel.thread_n = thread_n = (local_sz + simd_sz - 1) / simd_sz; kernel.curbe_sz = cst_sz; if (scratch_sz > ker->program->ctx->devices[0]->scratch_mem_size) { DEBUGP(DL_ERROR, "Out of scratch memory %d.", scratch_sz); return CL_OUT_OF_RESOURCES; } /* Curbe step 1: fill the constant urb buffer data shared by all threads */ if (ker->curbe) { kernel.slm_sz = cl_curbe_fill(ker, work_dim, global_wk_off, global_wk_sz,local_wk_sz_use ,local_wk_sz, thread_n); if (kernel.slm_sz > ker->program->ctx->devices[0]->local_mem_size) { DEBUGP(DL_ERROR, "Out of shared local memory %d.", kernel.slm_sz); return CL_OUT_OF_RESOURCES; } } printf_info = interp_dup_printfset(ker->opaque); cl_gpgpu_set_printf_info(gpgpu, printf_info); /* Setup the kernel */ if (queue->props & CL_QUEUE_PROFILING_ENABLE) err = cl_gpgpu_state_init(gpgpu, ctx->devices[0]->max_compute_unit * ctx->devices[0]->max_thread_per_unit, cst_sz / 32, 1); else err = cl_gpgpu_state_init(gpgpu, ctx->devices[0]->max_compute_unit * ctx->devices[0]->max_thread_per_unit, cst_sz / 32, 0); if (err != 0) goto error; printf_num = interp_get_printf_num(printf_info); if (printf_num) { if (cl_alloc_printf(gpgpu, ker, printf_info, printf_num, global_size) != 0) goto error; } if (interp_get_profiling_bti(ker->opaque) != 0) { if (cl_bind_profiling(gpgpu, simd_sz, ker, global_size, local_sz, interp_get_profiling_bti(ker->opaque))) goto error; cl_gpgpu_set_profiling_info(gpgpu, interp_dup_profiling(ker->opaque)); } else { cl_gpgpu_set_profiling_info(gpgpu, NULL); } /* Bind user buffers */ cl_command_queue_bind_surface(queue, ker, gpgpu, &max_bti); /* Bind user images */ if(UNLIKELY(err = cl_command_queue_bind_image(queue, ker, gpgpu, &max_bti) 
!= CL_SUCCESS)) return err; /* Bind all exec infos */ cl_command_queue_bind_exec_info(queue, ker, gpgpu, &max_bti); /* Bind device enqueue buffer */ cl_device_enqueue_bind_buffer(gpgpu, ker, &max_bti, &kernel); /* Bind all samplers */ if (ker->vme) cl_gpgpu_bind_vme_state(gpgpu, ker->accel); else cl_gpgpu_bind_sampler(gpgpu, ker->samplers, ker->sampler_sz); if (cl_gpgpu_set_scratch(gpgpu, scratch_sz) != 0) goto error; /* Bind a stack if needed */ cl_bind_stack(gpgpu, ker); if (cl_upload_constant_buffer(queue, ker, gpgpu) != 0) goto error; cl_gpgpu_states_setup(gpgpu, &kernel); /* Curbe step 2. Give the localID and upload it to video memory */ if (ker->curbe) { assert(cst_sz > 0); TRY_ALLOC (final_curbe, (char*) alloca(thread_n * cst_sz)); for (i = 0; i < thread_n; ++i) { memcpy(final_curbe + cst_sz * i, ker->curbe, cst_sz); } TRY (cl_set_varying_payload, ker, final_curbe, local_wk_sz_use, simd_sz, cst_sz, thread_n); if (cl_gpgpu_upload_curbes(gpgpu, final_curbe, thread_n*cst_sz) != 0) goto error; } /* Start a new batch buffer */ batch_sz = cl_kernel_compute_batch_sz(ker); if (cl_gpgpu_batch_reset(gpgpu, batch_sz) != 0) goto error; //cl_set_thread_batch_buf(queue, cl_gpgpu_ref_batch_buf(gpgpu)); cl_gpgpu_batch_start(gpgpu); /* Issue the GPGPU_WALKER command */ cl_gpgpu_walker(gpgpu, simd_sz, thread_n, global_wk_off,global_dim_off, global_wk_sz_use, local_wk_sz_use); /* Close the batch buffer and submit it */ cl_gpgpu_batch_end(gpgpu, 0); event->exec_data.queue = queue; event->exec_data.gpgpu = gpgpu; event->exec_data.type = EnqueueNDRangeKernel; return CL_SUCCESS; error: /* only some command/buffer internal error reach here, so return error code OOR */ return CL_OUT_OF_RESOURCES; } Beignet-1.3.2-Source/src/cl_api_mem.c000664 001750 001750 00000201717 13161142102 016477 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
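 *
 * Usage sketch (illustrative addition, not part of the original file): a
 * blocking map/modify/unmap round trip through the entry points implemented
 * below. q and buf are assumed to be a valid in-order queue and a buffer of
 * size bytes; error handling is elided.
 *
 *   cl_int err;
 *   void *p = clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_WRITE,
 *                                0, size, 0, NULL, NULL, &err);
 *   memset(p, 0, size);
 *   clEnqueueUnmapMemObject(q, buf, p, 0, NULL, NULL);
 *   clFinish(q);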
* */ #include "cl_mem.h" #include "cl_enqueue.h" #include "cl_command_queue.h" #include "cl_event.h" #include "CL/cl.h" cl_int clSetMemObjectDestructorCallback(cl_mem memobj, void(CL_CALLBACK *pfn_notify)(cl_mem, void *), void *user_data) { if (!CL_OBJECT_IS_MEM(memobj)) return CL_INVALID_MEM_OBJECT; if (pfn_notify == NULL) return CL_INVALID_VALUE; return cl_mem_set_destructor_callback(memobj, pfn_notify, user_data); } cl_int clGetMemObjectInfo(cl_mem memobj, cl_mem_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { const void *src_ptr = NULL; size_t src_size = 0; cl_mem_object_type type; size_t ptr, offset; cl_int ref; cl_mem parent; if (!CL_OBJECT_IS_MEM(memobj)) { return CL_INVALID_MEM_OBJECT; } switch (param_name) { case CL_MEM_TYPE: { type = cl_get_mem_object_type(memobj); src_ptr = &type; src_size = sizeof(cl_mem_object_type); break; } case CL_MEM_FLAGS: src_ptr = &memobj->flags; src_size = sizeof(cl_mem_flags); break; case CL_MEM_SIZE: src_ptr = &memobj->size; src_size = sizeof(size_t); break; case CL_MEM_HOST_PTR: { ptr = 0; if (memobj->type == CL_MEM_IMAGE_TYPE) { ptr = (size_t)memobj->host_ptr; } else { struct _cl_mem_buffer *buf = (struct _cl_mem_buffer *)memobj; ptr = (size_t)memobj->host_ptr + buf->sub_offset; } src_ptr = &ptr; src_size = sizeof(size_t); break; } case CL_MEM_USES_SVM_POINTER: { src_ptr = &memobj->is_svm; src_size = sizeof(memobj->is_svm); break; } case CL_MEM_MAP_COUNT: src_ptr = &memobj->map_ref; src_size = sizeof(cl_uint); break; case CL_MEM_REFERENCE_COUNT: { ref = CL_OBJECT_GET_REF(memobj); src_ptr = &ref; src_size = sizeof(cl_int); break; } case CL_MEM_CONTEXT: src_ptr = &memobj->ctx; src_size = sizeof(cl_context); break; case CL_MEM_ASSOCIATED_MEMOBJECT: { parent = NULL; if (memobj->type == CL_MEM_SUBBUFFER_TYPE) { struct _cl_mem_buffer *buf = (struct _cl_mem_buffer *)memobj; parent = (cl_mem)(buf->parent); } else if (memobj->type == CL_MEM_IMAGE_TYPE) { parent = memobj; } else if (memobj->type == CL_MEM_BUFFER1D_IMAGE_TYPE) { struct _cl_mem_buffer1d_image *image_buffer = (struct _cl_mem_buffer1d_image *)memobj; parent = image_buffer->descbuffer; } else parent = NULL; src_ptr = &parent; src_size = sizeof(cl_mem); break; } case CL_MEM_OFFSET: { offset = 0; if (memobj->type == CL_MEM_SUBBUFFER_TYPE) { struct _cl_mem_buffer *buf = (struct _cl_mem_buffer *)memobj; offset = buf->sub_offset; } src_ptr = &offset; src_size = sizeof(size_t); break; } default: return CL_INVALID_VALUE; } return cl_get_info_helper(src_ptr, src_size, param_value, param_value_size, param_value_size_ret); } cl_int clGetImageInfo(cl_mem memobj, cl_image_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { const void *src_ptr = NULL; size_t src_size = 0; struct _cl_mem_image *image; size_t height, depth, array_sz; cl_uint value; if (!CL_OBJECT_IS_MEM(memobj)) { return CL_INVALID_MEM_OBJECT; } image = cl_mem_image(memobj); switch (param_name) { case CL_IMAGE_FORMAT: src_ptr = &image->fmt; src_size = sizeof(cl_image_format); break; case CL_IMAGE_ELEMENT_SIZE: src_ptr = &image->bpp; src_size = sizeof(size_t); break; case CL_IMAGE_ROW_PITCH: src_ptr = &image->row_pitch; src_size = sizeof(size_t); break; case CL_IMAGE_SLICE_PITCH: src_ptr = &image->slice_pitch; src_size = sizeof(size_t); break; case CL_IMAGE_WIDTH: if (memobj->type == CL_MEM_BUFFER1D_IMAGE_TYPE) { struct _cl_mem_buffer1d_image *buffer1d_image = (struct _cl_mem_buffer1d_image *)image; src_ptr = &buffer1d_image->size; } else { src_ptr = &image->w; 
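      /* Note (added): 1D image buffers keep their width in the wrapper's size
         field, so the branch above reports that instead of image->w. */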
} src_size = sizeof(size_t); break; case CL_IMAGE_HEIGHT: { height = 0; if (memobj->type != CL_MEM_BUFFER1D_IMAGE_TYPE) { height = IS_1D_IMAGE(image) ? 0 : image->h; } src_ptr = &height; src_size = sizeof(size_t); break; } case CL_IMAGE_DEPTH: { depth = 0; depth = IS_3D_IMAGE(image) ? image->depth : 0; src_ptr = &depth; src_size = sizeof(size_t); break; } case CL_IMAGE_ARRAY_SIZE: { array_sz = 0; array_sz = IS_IMAGE_ARRAY(image) ? image->depth : 0; src_ptr = &array_sz; src_size = sizeof(size_t); break; } case CL_IMAGE_BUFFER: src_ptr = &image->buffer_1d; src_size = sizeof(cl_mem); break; case CL_IMAGE_NUM_MIP_LEVELS: case CL_IMAGE_NUM_SAMPLES: { value = 0; src_ptr = &value; src_size = sizeof(cl_uint); break; } default: return CL_INVALID_VALUE; } return cl_get_info_helper(src_ptr, src_size, param_value, param_value_size, param_value_size_ret); } void * clEnqueueMapBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_map, cl_map_flags map_flags, size_t offset, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event, cl_int *errcode_ret) { cl_int err = CL_SUCCESS; void *ptr = NULL; void *mem_ptr = NULL; cl_event e = NULL; cl_int e_status; enqueue_data *data = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_BUFFER(buffer)) { err = CL_INVALID_MEM_OBJECT; break; } if (command_queue->ctx != buffer->ctx) { err = CL_INVALID_CONTEXT; break; } if (!size || offset + size > buffer->size) { err = CL_INVALID_VALUE; break; } if ((map_flags & CL_MAP_READ && buffer->flags & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS)) || (map_flags & (CL_MAP_WRITE | CL_MAP_WRITE_INVALIDATE_REGION) && buffer->flags & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS))) { err = CL_INVALID_OPERATION; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_MAP_BUFFER, &err); if (err != CL_SUCCESS) { break; } if (blocking_map) { err = cl_event_wait_for_event_ready(e); if (err != CL_SUCCESS) break; /* Blocking call API is a sync point of flush. */ err = cl_command_queue_wait_flush(command_queue); if (err != CL_SUCCESS) { break; } } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type = EnqueueMapBuffer; data->mem_obj = buffer; data->offset = offset; data->size = size; data->ptr = NULL; data->unsync_map = 0; if (map_flags & (CL_MAP_WRITE | CL_MAP_WRITE_INVALIDATE_REGION)) data->write_map = 1; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_SUBMITTED, CL_TRUE); // Submit to get the address. 
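      /* Note (added): executing up to CL_SUBMITTED creates the mapping now, so
         data->ptr is already valid below even though the event has not
         completed yet. */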
if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } ptr = data->ptr; assert(ptr); err = cl_mem_record_map_mem(buffer, ptr, &mem_ptr, offset, size, NULL, NULL); assert(err == CL_SUCCESS); } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } if (errcode_ret) *errcode_ret = err; return mem_ptr; } cl_int clEnqueueUnmapMemObject(cl_command_queue command_queue, cl_mem memobj, void *mapped_ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; cl_int e_status; enqueue_data *data = NULL; cl_event e = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_MEM(memobj)) { err = CL_INVALID_MEM_OBJECT; break; } if (command_queue->ctx != memobj->ctx) { err = CL_INVALID_CONTEXT; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_UNMAP_MEM_OBJECT, &err); if (err != CL_SUCCESS) { break; } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type = EnqueueUnmapMemObject; data->mem_obj = memobj; data->ptr = mapped_ptr; if (e_status == CL_COMPLETE) { // No need to wait err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { // May need to wait some event to complete. err = cl_event_exec(e, CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueReadBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t size, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; enqueue_data *data = NULL; cl_int e_status; cl_event e = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_BUFFER(buffer)) { err = CL_INVALID_MEM_OBJECT; break; } if (command_queue->ctx != buffer->ctx) { err = CL_INVALID_CONTEXT; break; } if (!ptr || !size || offset + size > buffer->size) { err = CL_INVALID_VALUE; break; } if (buffer->flags & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS)) { err = CL_INVALID_OPERATION; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_READ_BUFFER, &err); if (err != CL_SUCCESS) { break; } if (blocking_read) { err = cl_event_wait_for_event_ready(e); if (err != CL_SUCCESS) break; /* Blocking call API is a sync point of flush. */ err = cl_command_queue_wait_flush(command_queue); if (err != CL_SUCCESS) { break; } } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type = EnqueueReadBuffer; data->mem_obj = buffer; data->ptr = ptr; data->offset = offset; data->size = size; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. 
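      /* Note (added): the wait list is already satisfied, so the read runs
         synchronously to completion instead of being queued. */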
err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueWriteBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t size, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; enqueue_data *data = NULL; cl_int e_status; cl_event e = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_BUFFER(buffer)) { err = CL_INVALID_MEM_OBJECT; break; } if (command_queue->ctx != buffer->ctx) { err = CL_INVALID_CONTEXT; break; } if (!ptr || !size || offset + size > buffer->size) { err = CL_INVALID_VALUE; break; } if (buffer->flags & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS)) { err = CL_INVALID_OPERATION; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_WRITE_BUFFER, &err); if (err != CL_SUCCESS) { break; } if (blocking_write) { err = cl_event_wait_for_event_ready(e); if (err != CL_SUCCESS) break; /* Blocking call API is a sync point of flush. */ err = cl_command_queue_wait_flush(command_queue); if (err != CL_SUCCESS) { break; } } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type = EnqueueWriteBuffer; data->mem_obj = buffer; data->const_ptr = ptr; data->offset = offset; data->size = size; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. 
err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueReadBufferRect(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, const size_t *buffer_origin, const size_t *host_origin, const size_t *region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; size_t total_size = 0; enqueue_data *data = NULL; cl_int e_status; cl_event e = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_BUFFER(buffer)) { err = CL_INVALID_MEM_OBJECT; break; } if (command_queue->ctx != buffer->ctx) { err = CL_INVALID_CONTEXT; break; } if (buffer->flags & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS)) { err = CL_INVALID_OPERATION; break; } if (!ptr || !region || region[0] == 0 || region[1] == 0 || region[2] == 0) { err = CL_INVALID_VALUE; break; } if (buffer_row_pitch == 0) buffer_row_pitch = region[0]; if (buffer_slice_pitch == 0) buffer_slice_pitch = region[1] * buffer_row_pitch; if (host_row_pitch == 0) host_row_pitch = region[0]; if (host_slice_pitch == 0) host_slice_pitch = region[1] * host_row_pitch; if (buffer_row_pitch < region[0] || host_row_pitch < region[0]) { err = CL_INVALID_VALUE; break; } if ((buffer_slice_pitch < region[1] * buffer_row_pitch || buffer_slice_pitch % buffer_row_pitch != 0) || (host_slice_pitch < region[1] * host_row_pitch || host_slice_pitch % host_row_pitch != 0)) { err = CL_INVALID_VALUE; break; } total_size = (buffer_origin[2] + region[2] - 1) * buffer_slice_pitch + (buffer_origin[1] + region[1] - 1) * buffer_row_pitch + buffer_origin[0] + region[0]; if (total_size > buffer->size) { err = CL_INVALID_VALUE; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_READ_BUFFER_RECT, &err); if (err != CL_SUCCESS) { break; } if (blocking_read) { err = cl_event_wait_for_event_ready(e); if (err != CL_SUCCESS) break; /* Blocking call API is a sync point of flush. */ err = cl_command_queue_wait_flush(command_queue); if (err != CL_SUCCESS) { break; } } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type = EnqueueReadBufferRect; data->mem_obj = buffer; data->ptr = ptr; data->origin[0] = buffer_origin[0]; data->origin[1] = buffer_origin[1]; data->origin[2] = buffer_origin[2]; data->host_origin[0] = host_origin[0]; data->host_origin[1] = host_origin[1]; data->host_origin[2] = host_origin[2]; data->region[0] = region[0]; data->region[1] = region[1]; data->region[2] = region[2]; data->row_pitch = buffer_row_pitch; data->slice_pitch = buffer_slice_pitch; data->host_row_pitch = host_row_pitch; data->host_slice_pitch = host_slice_pitch; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. 
err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueWriteBufferRect(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, const size_t *buffer_origin, const size_t *host_origin, const size_t *region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; size_t total_size = 0; enqueue_data *data = NULL; cl_int e_status; cl_event e = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_BUFFER(buffer)) { err = CL_INVALID_MEM_OBJECT; break; } if (command_queue->ctx != buffer->ctx) { err = CL_INVALID_CONTEXT; break; } if (buffer->flags & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS)) { err = CL_INVALID_OPERATION; break; } if (!ptr || !region || region[0] == 0 || region[1] == 0 || region[2] == 0) { err = CL_INVALID_VALUE; break; } if (buffer_row_pitch == 0) buffer_row_pitch = region[0]; if (buffer_slice_pitch == 0) buffer_slice_pitch = region[1] * buffer_row_pitch; if (host_row_pitch == 0) host_row_pitch = region[0]; if (host_slice_pitch == 0) host_slice_pitch = region[1] * host_row_pitch; if (buffer_row_pitch < region[0] || host_row_pitch < region[0]) { err = CL_INVALID_VALUE; break; } if ((buffer_slice_pitch < region[1] * buffer_row_pitch || buffer_slice_pitch % buffer_row_pitch != 0) || (host_slice_pitch < region[1] * host_row_pitch || host_slice_pitch % host_row_pitch != 0)) { err = CL_INVALID_VALUE; break; } total_size = (buffer_origin[2] + region[2] - 1) * buffer_slice_pitch + (buffer_origin[1] + region[1] - 1) * buffer_row_pitch + buffer_origin[0] + region[0]; if (total_size > buffer->size) { err = CL_INVALID_VALUE; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_WRITE_BUFFER_RECT, &err); if (err != CL_SUCCESS) { break; } if (blocking_write) { err = cl_event_wait_for_event_ready(e); if (err != CL_SUCCESS) break; /* Blocking call API is a sync point of flush. */ err = cl_command_queue_wait_flush(command_queue); if (err != CL_SUCCESS) { break; } } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type = EnqueueWriteBufferRect; data->mem_obj = buffer; data->const_ptr = ptr; data->origin[0] = buffer_origin[0]; data->origin[1] = buffer_origin[1]; data->origin[2] = buffer_origin[2]; data->host_origin[0] = host_origin[0]; data->host_origin[1] = host_origin[1]; data->host_origin[2] = host_origin[2]; data->region[0] = region[0]; data->region[1] = region[1]; data->region[2] = region[2]; data->row_pitch = buffer_row_pitch; data->slice_pitch = buffer_slice_pitch; data->host_row_pitch = host_row_pitch; data->host_slice_pitch = host_slice_pitch; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. 
      err = cl_event_exec(e, CL_COMPLETE, CL_FALSE);
      if (err != CL_SUCCESS) {
        break;
      }
    } else {
      err = cl_event_exec(e, CL_QUEUED, CL_FALSE);
      if (err != CL_SUCCESS) {
        break;
      }
      cl_command_queue_enqueue_event(command_queue, e);
    }
  } while (0);

  if (err == CL_SUCCESS && event) {
    *event = e;
  } else {
    cl_event_delete(e);
  }
  return err;
}

cl_int
clEnqueueCopyBuffer(cl_command_queue command_queue,
                    cl_mem src_buffer,
                    cl_mem dst_buffer,
                    size_t src_offset,
                    size_t dst_offset,
                    size_t cb,
                    cl_uint num_events_in_wait_list,
                    const cl_event *event_wait_list,
                    cl_event *event)
{
  cl_int err = CL_SUCCESS;
  cl_event e = NULL;
  cl_int e_status;

  do {
    if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) {
      err = CL_INVALID_COMMAND_QUEUE;
      break;
    }
    if (!CL_OBJECT_IS_MEM(src_buffer)) {
      err = CL_INVALID_MEM_OBJECT;
      break;
    }
    if (!CL_OBJECT_IS_MEM(dst_buffer)) {
      err = CL_INVALID_MEM_OBJECT;
      break;
    }
    if (command_queue->ctx != src_buffer->ctx) {
      err = CL_INVALID_CONTEXT;
      break;
    }
    if (command_queue->ctx != dst_buffer->ctx) {
      err = CL_INVALID_CONTEXT;
      break;
    }
    if (src_offset + cb > src_buffer->size) {
      err = CL_INVALID_VALUE;
      break;
    }
    if (dst_offset + cb > dst_buffer->size) {
      err = CL_INVALID_VALUE;
      break;
    }
    /* Check overlap */
    if (src_buffer == dst_buffer &&
        (src_offset <= dst_offset && dst_offset <= src_offset + cb - 1) &&
        (dst_offset <= src_offset && src_offset <= dst_offset + cb - 1)) {
      err = CL_MEM_COPY_OVERLAP;
      break;
    }
    /* Check sub overlap */
    if (src_buffer->type == CL_MEM_SUBBUFFER_TYPE && dst_buffer->type == CL_MEM_SUBBUFFER_TYPE) {
      struct _cl_mem_buffer *src_b = (struct _cl_mem_buffer *)src_buffer;
      struct _cl_mem_buffer *dst_b = (struct _cl_mem_buffer *)dst_buffer;
      size_t src_sub_offset = src_b->sub_offset;
      size_t dst_sub_offset = dst_b->sub_offset;
      if ((src_offset + src_sub_offset <= dst_offset + dst_sub_offset &&
           dst_offset + dst_sub_offset <= src_offset + src_sub_offset + cb - 1) &&
          (dst_offset + dst_sub_offset <= src_offset + src_sub_offset &&
           src_offset + src_sub_offset <= dst_offset + dst_sub_offset + cb - 1)) {
        err = CL_MEM_COPY_OVERLAP;
        break;
      }
    }
    err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx);
    if (err != CL_SUCCESS) {
      break;
    }
    e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list,
                        event_wait_list, CL_COMMAND_COPY_BUFFER, &err);
    if (err != CL_SUCCESS) {
      break;
    }
    err = cl_mem_copy(command_queue, e, src_buffer, dst_buffer, src_offset, dst_offset, cb);
    if (err != CL_SUCCESS) {
      break;
    }
    /* We flush the ndrange if no event depends on it; otherwise we add it to the
       queue list. The finish or complete status is always handled in the queue list. */
    e_status = cl_event_is_ready(e);
    if (e_status < CL_COMPLETE) { // Error happened, cancel.
      err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST;
      break;
    }
    err = cl_event_exec(e, e_status == CL_COMPLETE ?
CL_SUBMITTED : CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } /* The following code checking overlap is from Appendix of openCL spec 1.1 */ static cl_bool check_copy_overlap(const size_t src_offset[3], const size_t dst_offset[3], const size_t region[3], size_t row_pitch, size_t slice_pitch) { const size_t src_min[] = {src_offset[0], src_offset[1], src_offset[2]}; const size_t src_max[] = {src_offset[0] + region[0], src_offset[1] + region[1], src_offset[2] + region[2]}; const size_t dst_min[] = {dst_offset[0], dst_offset[1], dst_offset[2]}; const size_t dst_max[] = {dst_offset[0] + region[0], dst_offset[1] + region[1], dst_offset[2] + region[2]}; // Check for overlap cl_bool overlap = CL_TRUE; unsigned i; size_t dst_start = dst_offset[2] * slice_pitch + dst_offset[1] * row_pitch + dst_offset[0]; size_t dst_end = dst_start + (region[2] * slice_pitch + region[1] * row_pitch + region[0]); size_t src_start = src_offset[2] * slice_pitch + src_offset[1] * row_pitch + src_offset[0]; size_t src_end = src_start + (region[2] * slice_pitch + region[1] * row_pitch + region[0]); for (i = 0; i != 3; ++i) { overlap = overlap && (src_min[i] < dst_max[i]) && (src_max[i] > dst_min[i]); } if (!overlap) { size_t delta_src_x = (src_offset[0] + region[0] > row_pitch) ? src_offset[0] + region[0] - row_pitch : 0; size_t delta_dst_x = (dst_offset[0] + region[0] > row_pitch) ? dst_offset[0] + region[0] - row_pitch : 0; if ((delta_src_x > 0 && delta_src_x > dst_offset[0]) || (delta_dst_x > 0 && delta_dst_x > src_offset[0])) { if ((src_start <= dst_start && dst_start < src_end) || (dst_start <= src_start && src_start < dst_end)) overlap = CL_TRUE; } if (region[2] > 1) { size_t src_height = slice_pitch / row_pitch; size_t dst_height = slice_pitch / row_pitch; size_t delta_src_y = (src_offset[1] + region[1] > src_height) ? src_offset[1] + region[1] - src_height : 0; size_t delta_dst_y = (dst_offset[1] + region[1] > dst_height) ? 
dst_offset[1] + region[1] - dst_height : 0; if ((delta_src_y > 0 && delta_src_y > dst_offset[1]) || (delta_dst_y > 0 && delta_dst_y > src_offset[1])) { if ((src_start <= dst_start && dst_start < src_end) || (dst_start <= src_start && src_start < dst_end)) overlap = CL_TRUE; } } } return overlap; } cl_int clEnqueueCopyBufferRect(cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, const size_t *src_origin, const size_t *dst_origin, const size_t *region, size_t src_row_pitch, size_t src_slice_pitch, size_t dst_row_pitch, size_t dst_slice_pitch, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; cl_event e = NULL; size_t total_size = 0; cl_int e_status; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_MEM(src_buffer)) { err = CL_INVALID_MEM_OBJECT; break; } if (!CL_OBJECT_IS_MEM(dst_buffer)) { err = CL_INVALID_MEM_OBJECT; break; } if ((command_queue->ctx != src_buffer->ctx) || (command_queue->ctx != dst_buffer->ctx)) { err = CL_INVALID_CONTEXT; break; } if (!region || region[0] == 0 || region[1] == 0 || region[2] == 0) { err = CL_INVALID_VALUE; break; } if (src_row_pitch == 0) src_row_pitch = region[0]; if (src_slice_pitch == 0) src_slice_pitch = region[1] * src_row_pitch; if (dst_row_pitch == 0) dst_row_pitch = region[0]; if (dst_slice_pitch == 0) dst_slice_pitch = region[1] * dst_row_pitch; if (src_row_pitch < region[0] || dst_row_pitch < region[0]) { err = CL_INVALID_VALUE; break; } if ((src_slice_pitch < region[1] * src_row_pitch || src_slice_pitch % src_row_pitch != 0) || (dst_slice_pitch < region[1] * dst_row_pitch || dst_slice_pitch % dst_row_pitch != 0)) { err = CL_INVALID_VALUE; break; } total_size = (src_origin[2] + region[2] - 1) * src_slice_pitch + (src_origin[1] + region[1] - 1) * src_row_pitch + src_origin[0] + region[0]; if (total_size > src_buffer->size) { err = CL_INVALID_VALUE; break; } total_size = (dst_origin[2] + region[2] - 1) * dst_slice_pitch + (dst_origin[1] + region[1] - 1) * dst_row_pitch + dst_origin[0] + region[0]; if (total_size > dst_buffer->size) { err = CL_INVALID_VALUE; break; } if (src_buffer == dst_buffer && (src_row_pitch != dst_row_pitch || src_slice_pitch != dst_slice_pitch)) { err = CL_INVALID_VALUE; break; } if (src_buffer == dst_buffer && check_copy_overlap(src_origin, dst_origin, region, src_row_pitch, src_slice_pitch)) { err = CL_MEM_COPY_OVERLAP; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_COPY_BUFFER_RECT, &err); if (err != CL_SUCCESS) { break; } err = cl_mem_copy_buffer_rect(command_queue, e, src_buffer, dst_buffer, src_origin, dst_origin, region, src_row_pitch, src_slice_pitch, dst_row_pitch, dst_slice_pitch); if (err != CL_SUCCESS) { break; } /* We will flush the ndrange if no event depend. Else we will add it to queue list. The finish or Complete status will always be done in queue list. */ e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { // Error happend, cancel. 
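/* check_copy_overlap() above implements the interval test from the appendix of
 * the OpenCL 1.1 spec. A quick sanity sketch of its behavior (hypothetical
 * values; single-slice copies inside one buffer with a 64-byte row pitch):
 *
 *   size_t a[3] = {0, 0, 0}, b[3] = {0, 1, 0}, reg[3] = {16, 1, 1};
 *   assert(check_copy_overlap(a, b, reg, 64, 64) == CL_FALSE); // disjoint rows
 *   assert(check_copy_overlap(a, a, reg, 64, 64) == CL_TRUE);  // same region
 */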
err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } else if (e_status == CL_COMPLETE) { err = cl_event_exec(e, CL_SUBMITTED, CL_FALSE); if (err != CL_SUCCESS) { break; } } cl_command_queue_enqueue_event(command_queue, e); } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueFillBuffer(cl_command_queue command_queue, cl_mem buffer, const void *pattern, size_t pattern_size, size_t offset, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; static size_t valid_sz[] = {1, 2, 4, 8, 16, 32, 64, 128}; int i = 0; cl_event e = NULL; cl_int e_status; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_BUFFER(buffer)) { err = CL_INVALID_MEM_OBJECT; break; } if (command_queue->ctx != buffer->ctx) { err = CL_INVALID_CONTEXT; break; } if (offset + size > buffer->size) { err = CL_INVALID_VALUE; break; } if (pattern == NULL) { err = CL_INVALID_VALUE; break; } for (i = 0; i < sizeof(valid_sz) / sizeof(size_t); i++) { if (valid_sz[i] == pattern_size) break; } if (i == sizeof(valid_sz) / sizeof(size_t)) { err = CL_INVALID_VALUE; break; } if (offset % pattern_size || size % pattern_size) { err = CL_INVALID_VALUE; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_FILL_BUFFER, &err); if (err != CL_SUCCESS) { break; } err = cl_mem_fill(command_queue, e, pattern, pattern_size, buffer, offset, size); if (err) { break; } /* We will flush the ndrange if no event depend. Else we will add it to queue list. The finish or Complete status will always be done in queue list. */ e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { // Error happened, cancel. err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } err = cl_event_exec(e, e_status == CL_COMPLETE ? CL_SUBMITTED : CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueMigrateMemObjects(cl_command_queue command_queue, cl_uint num_mem_objects, const cl_mem *mem_objects, cl_mem_migration_flags flags, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { /* So far, we just support 1 device and no subdevice. So all the command queues belong to the same context. There is no need to migrate the mem objects by now. 
*/ cl_int err = CL_SUCCESS; cl_event e = NULL; cl_int e_status; cl_uint i = 0; do { if (!(flags & CL_MIGRATE_MEM_OBJECT_HOST)) { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } } if (num_mem_objects == 0 || mem_objects == NULL) { err = CL_INVALID_VALUE; break; } if (flags && flags & ~(CL_MIGRATE_MEM_OBJECT_HOST | CL_MIGRATE_MEM_OBJECT_CONTENT_UNDEFINED)) { err = CL_INVALID_VALUE; break; } for (i = 0; i < num_mem_objects; i++) { if (!CL_OBJECT_IS_MEM(mem_objects[i])) { err = CL_INVALID_MEM_OBJECT; break; } if (mem_objects[i]->ctx != command_queue->ctx) { err = CL_INVALID_CONTEXT; break; } } if (err != CL_SUCCESS) { break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_MIGRATE_MEM_OBJECTS, &err); if (err != CL_SUCCESS) { break; } /* Nothing to do now, just enqueue an event. */ e->exec_data.type = EnqueueMigrateMemObj; /* We will flush the ndrange if no event depend. Else we will add it to queue list. The finish or Complete status will always be done in queue list. */ e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { // Error happened, cancel. err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } err = cl_event_exec(e, e_status == CL_COMPLETE ? CL_SUBMITTED : CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } /************************************ Images *********************************************/ static cl_int check_image_region(struct _cl_mem_image *image, const size_t *pregion, size_t *region) { if (pregion == NULL) { return CL_INVALID_VALUE; } if (image->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) { region[0] = pregion[0]; region[1] = 1; region[2] = pregion[1]; } else { region[0] = pregion[0]; region[1] = pregion[1]; region[2] = pregion[2]; } if ((region[0] == 0) || (region[1] == 0) || (region[2] == 0)) { return CL_INVALID_VALUE; } return CL_SUCCESS; } static cl_int check_image_origin(struct _cl_mem_image *image, const size_t *porigin, size_t *origin) { if (porigin == NULL) { return CL_INVALID_VALUE; } if (image->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) { origin[0] = porigin[0]; origin[1] = 0; origin[2] = porigin[1]; } else { origin[0] = porigin[0]; origin[1] = porigin[1]; origin[2] = porigin[2]; } return CL_SUCCESS; } void * clEnqueueMapImage(cl_command_queue command_queue, cl_mem mem, cl_bool blocking_map, cl_map_flags map_flags, const size_t *porigin, const size_t *pregion, size_t *image_row_pitch, size_t *image_slice_pitch, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event, cl_int *errcode_ret) { cl_int err = CL_SUCCESS; void *ptr = NULL; void *mem_ptr = NULL; size_t offset = 0; struct _cl_mem_image *image = NULL; cl_int e_status; enqueue_data *data = NULL; size_t region[3]; size_t origin[3]; cl_event e = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_IMAGE(mem)) { err = CL_INVALID_MEM_OBJECT; break; } image = cl_mem_image(mem); err = check_image_region(image, pregion, region); if (err != CL_SUCCESS) { break; } err = check_image_origin(image, porigin, origin); if (err != CL_SUCCESS) { break; } if (command_queue->ctx != mem->ctx) { err = CL_INVALID_CONTEXT; break; } if 
(origin[0] + region[0] > image->w || origin[1] + region[1] > image->h || origin[2] + region[2] > image->depth) { err = CL_INVALID_VALUE; break; } if (!image_row_pitch || (image->slice_pitch && !image_slice_pitch)) { err = CL_INVALID_VALUE; break; } if ((map_flags & CL_MAP_READ && mem->flags & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS)) || (map_flags & (CL_MAP_WRITE | CL_MAP_WRITE_INVALIDATE_REGION) && mem->flags & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS))) { err = CL_INVALID_OPERATION; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_MAP_IMAGE, &err); if (err != CL_SUCCESS) { break; } if (blocking_map) { err = cl_event_wait_for_event_ready(e); if (err != CL_SUCCESS) break; /* Blocking call API is a sync point of flush. */ err = cl_command_queue_wait_flush(command_queue); if (err != CL_SUCCESS) { break; } } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type = EnqueueMapImage; data->mem_obj = mem; data->origin[0] = origin[0]; data->origin[1] = origin[1]; data->origin[2] = origin[2]; data->region[0] = region[0]; data->region[1] = region[1]; data->region[2] = region[2]; data->ptr = ptr; data->unsync_map = 1; if (map_flags & (CL_MAP_WRITE | CL_MAP_WRITE_INVALIDATE_REGION)) data->write_map = 1; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_SUBMITTED, CL_TRUE); // Submit to get the address. if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } ptr = data->ptr; assert(ptr); /* Store and write back map info. */ if (mem->flags & CL_MEM_USE_HOST_PTR) { if (image_slice_pitch) *image_slice_pitch = image->host_slice_pitch; *image_row_pitch = image->host_row_pitch; offset = image->bpp * origin[0] + image->host_row_pitch * origin[1] + image->host_slice_pitch * origin[2]; } else { if (image_slice_pitch) *image_slice_pitch = image->slice_pitch; if (image->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) *image_row_pitch = image->slice_pitch; else *image_row_pitch = image->row_pitch; offset = image->bpp * origin[0] + image->row_pitch * origin[1] + image->slice_pitch * origin[2]; } err = cl_mem_record_map_mem(mem, ptr, &mem_ptr, offset, 0, origin, region); assert(err == CL_SUCCESS); // Easy way, do not use unmap to handle error. 
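/* Typical host-side use of this mapping path (a minimal sketch, error handling
 * elided; 'queue' and 'img' are assumed to be a valid queue and a 2D image of
 * size width x height):
 *
 *   size_t org[3] = {0, 0, 0}, reg[3] = {width, height, 1};
 *   size_t pitch = 0;
 *   cl_int st;
 *   unsigned char *p = clEnqueueMapImage(queue, img, CL_TRUE, CL_MAP_READ, org,
 *                                        reg, &pitch, NULL, 0, NULL, NULL, &st);
 *   // row y of the mapped region starts at p + y * pitch
 *   clEnqueueUnmapMemObject(queue, img, p, 0, NULL, NULL);
 */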
} while (0); if (err != CL_SUCCESS) { if (e) { cl_event_delete(e); e = NULL; } assert(ptr == NULL); } if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } if (errcode_ret) *errcode_ret = err; return mem_ptr; } cl_int clEnqueueReadImage(cl_command_queue command_queue, cl_mem mem, cl_bool blocking_read, const size_t *porigin, const size_t *pregion, size_t row_pitch, size_t slice_pitch, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; struct _cl_mem_image *image = NULL; enqueue_data *data = NULL; cl_int e_status; size_t region[3]; size_t origin[3]; cl_event e = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_IMAGE(mem)) { err = CL_INVALID_MEM_OBJECT; break; } image = cl_mem_image(mem); err = check_image_region(image, pregion, region); if (err != CL_SUCCESS) { break; } err = check_image_origin(image, porigin, origin); if (err != CL_SUCCESS) { break; } if (command_queue->ctx != mem->ctx) { err = CL_INVALID_CONTEXT; break; } if (origin[0] + region[0] > image->w || origin[1] + region[1] > image->h || origin[2] + region[2] > image->depth) { err = CL_INVALID_VALUE; break; } if (!row_pitch) { row_pitch = image->bpp * region[0]; } else if (row_pitch < image->bpp * region[0]) { err = CL_INVALID_VALUE; break; } if (image->slice_pitch) { if (!slice_pitch) { slice_pitch = row_pitch * region[1]; } else if (slice_pitch < row_pitch * region[1]) { err = CL_INVALID_VALUE; break; } } else if (slice_pitch) { err = CL_INVALID_VALUE; break; } if (!ptr) { err = CL_INVALID_VALUE; break; } if (mem->flags & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS)) { err = CL_INVALID_OPERATION; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_READ_IMAGE, &err); if (err != CL_SUCCESS) { break; } if (blocking_read) { err = cl_event_wait_for_event_ready(e); if (err != CL_SUCCESS) break; /* Blocking call API is a sync point of flush. */ err = cl_command_queue_wait_flush(command_queue); if (err != CL_SUCCESS) { break; } } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type = EnqueueReadImage; data->mem_obj = mem; data->ptr = ptr; data->origin[0] = origin[0]; data->origin[1] = origin[1]; data->origin[2] = origin[2]; data->region[0] = region[0]; data->region[1] = region[1]; data->region[2] = region[2]; data->row_pitch = row_pitch; data->slice_pitch = slice_pitch; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. 
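/* Per the pitch checks above, a row_pitch of 0 defaults to the tightly packed
 * image->bpp * region[0], and slice_pitch to row_pitch * region[1]. For
 * example, reading a 640x480 CL_RGBA / CL_UNORM_INT8 image with both pitches
 * left at 0 copies 640 * 4 = 2560 bytes per row into the host buffer. */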
err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueWriteImage(cl_command_queue command_queue, cl_mem mem, cl_bool blocking_write, const size_t *porigin, const size_t *pregion, size_t row_pitch, size_t slice_pitch, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; struct _cl_mem_image *image = NULL; enqueue_data *data = NULL; cl_int e_status; size_t region[3]; size_t origin[3]; cl_event e = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_IMAGE(mem)) { err = CL_INVALID_MEM_OBJECT; break; } image = cl_mem_image(mem); err = check_image_region(image, pregion, region); if (err != CL_SUCCESS) { break; } err = check_image_origin(image, porigin, origin); if (err != CL_SUCCESS) { break; } if (command_queue->ctx != mem->ctx) { err = CL_INVALID_CONTEXT; break; } if (origin[0] + region[0] > image->w || origin[1] + region[1] > image->h || origin[2] + region[2] > image->depth) { err = CL_INVALID_VALUE; break; } if (!row_pitch) { row_pitch = image->bpp * region[0]; } else if (row_pitch < image->bpp * region[0]) { err = CL_INVALID_VALUE; break; } if (image->slice_pitch) { if (!slice_pitch) { slice_pitch = row_pitch * region[1]; } else if (slice_pitch < row_pitch * region[1]) { err = CL_INVALID_VALUE; break; } } else if (slice_pitch) { err = CL_INVALID_VALUE; break; } if (!ptr) { err = CL_INVALID_VALUE; break; } if (mem->flags & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS)) { err = CL_INVALID_OPERATION; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_WRITE_IMAGE, &err); if (err != CL_SUCCESS) { break; } if (blocking_write) { err = cl_event_wait_for_event_ready(e); if (err != CL_SUCCESS) break; /* Blocking call API is a sync point of flush. */ err = cl_command_queue_wait_flush(command_queue); if (err != CL_SUCCESS) { break; } } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type = EnqueueWriteImage; data->mem_obj = mem; data->const_ptr = ptr; data->origin[0] = origin[0]; data->origin[1] = origin[1]; data->origin[2] = origin[2]; data->region[0] = region[0]; data->region[1] = region[1]; data->region[2] = region[2]; data->row_pitch = row_pitch; data->slice_pitch = slice_pitch; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. 
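/* Dispatch convention used by the enqueue entry points in this file:
 * cl_event_is_ready() reports CL_COMPLETE once every event in the wait list
 * has completed, and a negative status if one of them failed. When the wait
 * list is already satisfied the command executes synchronously with
 * CL_COMPLETE; otherwise it is marked CL_QUEUED and parked on the queue's
 * event list until a later sync point (flush/finish or a blocking call)
 * drains it. */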
err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueCopyImage(cl_command_queue command_queue, cl_mem src_mem, cl_mem dst_mem, const size_t *psrc_origin, const size_t *pdst_origin, const size_t *pregion, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; cl_bool overlap = CL_TRUE; cl_int i = 0; cl_event e = NULL; struct _cl_mem_image *src_image = NULL; struct _cl_mem_image *dst_image = NULL; size_t region[3]; size_t src_origin[3]; size_t dst_origin[3]; cl_int e_status; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_IMAGE(src_mem)) { err = CL_INVALID_MEM_OBJECT; break; } if (!CL_OBJECT_IS_IMAGE(dst_mem)) { err = CL_INVALID_MEM_OBJECT; break; } src_image = cl_mem_image(src_mem); dst_image = cl_mem_image(dst_mem); err = check_image_region(src_image, pregion, region); if (err != CL_SUCCESS) { break; } err = check_image_origin(src_image, psrc_origin, src_origin); if (err != CL_SUCCESS) { break; } err = check_image_origin(dst_image, pdst_origin, dst_origin); if (err != CL_SUCCESS) { break; } if (command_queue->ctx != src_mem->ctx || command_queue->ctx != dst_mem->ctx) { err = CL_INVALID_CONTEXT; break; } if (src_image->fmt.image_channel_order != dst_image->fmt.image_channel_order || src_image->fmt.image_channel_data_type != dst_image->fmt.image_channel_data_type) { err = CL_IMAGE_FORMAT_MISMATCH; break; } if (src_origin[0] + region[0] > src_image->w || src_origin[1] + region[1] > src_image->h || src_origin[2] + region[2] > src_image->depth) { err = CL_INVALID_VALUE; break; } if (dst_origin[0] + region[0] > dst_image->w || dst_origin[1] + region[1] > dst_image->h || dst_origin[2] + region[2] > dst_image->depth) { err = CL_INVALID_VALUE; break; } if ((src_image->image_type == CL_MEM_OBJECT_IMAGE2D && (src_origin[2] != 0 || region[2] != 1)) || (dst_image->image_type == CL_MEM_OBJECT_IMAGE2D && (dst_origin[2] != 0 || region[2] != 1))) { err = CL_INVALID_VALUE; break; } if (src_image == dst_image) { for (i = 0; i < 3; i++) { overlap = overlap && (src_origin[i] < dst_origin[i] + region[i]) && (dst_origin[i] < src_origin[i] + region[i]); } if (overlap == CL_TRUE) { err = CL_MEM_COPY_OVERLAP; break; } } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_COPY_IMAGE, &err); if (err != CL_SUCCESS) { break; } err = cl_mem_kernel_copy_image(command_queue, e, src_image, dst_image, src_origin, dst_origin, region); if (err != CL_SUCCESS) { break; } /* We will flush the ndrange if no event depend. Else we will add it to queue list. The finish or Complete status will always be done in queue list. */ e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { // Error happend, cancel. err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } err = cl_event_exec(e, e_status == CL_COMPLETE ? 
CL_SUBMITTED : CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueCopyImageToBuffer(cl_command_queue command_queue, cl_mem src_mem, cl_mem dst_buffer, const size_t *psrc_origin, const size_t *pregion, size_t dst_offset, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; struct _cl_mem_image *src_image = NULL; size_t region[3]; size_t src_origin[3]; cl_event e = NULL; cl_int e_status; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_IMAGE(src_mem)) { err = CL_INVALID_MEM_OBJECT; break; } if (!CL_OBJECT_IS_BUFFER(dst_buffer)) { err = CL_INVALID_MEM_OBJECT; break; } src_image = cl_mem_image(src_mem); err = check_image_region(src_image, pregion, region); if (err != CL_SUCCESS) { break; } err = check_image_origin(src_image, psrc_origin, src_origin); if (err != CL_SUCCESS) { break; } if (command_queue->ctx != src_mem->ctx || command_queue->ctx != dst_buffer->ctx) { err = CL_INVALID_CONTEXT; break; } if (dst_offset + region[0] * region[1] * region[2] * src_image->bpp > dst_buffer->size) { err = CL_INVALID_VALUE; break; } if (src_origin[0] + region[0] > src_image->w || src_origin[1] + region[1] > src_image->h || src_origin[2] + region[2] > src_image->depth) { err = CL_INVALID_VALUE; break; } if (src_image->image_type == CL_MEM_OBJECT_IMAGE2D && (src_origin[2] != 0 || region[2] != 1)) { err = CL_INVALID_VALUE; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_COPY_IMAGE_TO_BUFFER, &err); if (err != CL_SUCCESS) { break; } err = cl_mem_copy_image_to_buffer(command_queue, e, src_image, dst_buffer, src_origin, dst_offset, region); if (err != CL_SUCCESS) { break; } /* We will flush the ndrange if no event depend. Else we will add it to queue list. The finish or Complete status will always be done in queue list. */ e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { // Error happend, cancel. err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } err = cl_event_exec(e, e_status == CL_COMPLETE ? 
CL_SUBMITTED : CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueCopyBufferToImage(cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_mem, size_t src_offset, const size_t *pdst_origin, const size_t *pregion, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; struct _cl_mem_image *dst_image = NULL; size_t region[3]; size_t dst_origin[3]; cl_event e = NULL; cl_int e_status; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_BUFFER(src_buffer)) { err = CL_INVALID_MEM_OBJECT; break; } if (!CL_OBJECT_IS_IMAGE(dst_mem)) { err = CL_INVALID_MEM_OBJECT; break; } dst_image = cl_mem_image(dst_mem); err = check_image_region(dst_image, pregion, region); if (err != CL_SUCCESS) { break; } err = check_image_origin(dst_image, pdst_origin, dst_origin); if (err != CL_SUCCESS) { break; } if (command_queue->ctx != src_buffer->ctx || command_queue->ctx != dst_mem->ctx) { err = CL_INVALID_CONTEXT; break; } if (src_offset + region[0] * region[1] * region[2] * dst_image->bpp > src_buffer->size) { err = CL_INVALID_VALUE; break; } if (dst_origin[0] + region[0] > dst_image->w || dst_origin[1] + region[1] > dst_image->h || dst_origin[2] + region[2] > dst_image->depth) { err = CL_INVALID_VALUE; break; } if (dst_image->image_type == CL_MEM_OBJECT_IMAGE2D && (dst_origin[2] != 0 || region[2] != 1)) { err = CL_INVALID_VALUE; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_COPY_BUFFER_TO_IMAGE, &err); if (err != CL_SUCCESS) { break; } err = cl_mem_copy_buffer_to_image(command_queue, e, src_buffer, dst_image, src_offset, dst_origin, region); if (err != CL_SUCCESS) { break; } /* We will flush the ndrange if no event depend. Else we will add it to queue list. The finish or Complete status will always be done in queue list. */ e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { // Error happend, cancel. err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } err = cl_event_exec(e, e_status == CL_COMPLETE ? 
CL_SUBMITTED : CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueFillImage(cl_command_queue command_queue, cl_mem mem, const void *fill_color, const size_t *porigin, const size_t *pregion, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; size_t region[3]; size_t origin[3]; cl_event e = NULL; struct _cl_mem_image *image = NULL; cl_int e_status; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_IMAGE(mem)) { err = CL_INVALID_MEM_OBJECT; break; } image = cl_mem_image(mem); err = check_image_region(image, pregion, region); if (err != CL_SUCCESS) { break; } err = check_image_origin(image, porigin, origin); if (err != CL_SUCCESS) { break; } if (command_queue->ctx != mem->ctx) { err = CL_INVALID_CONTEXT; break; } if (fill_color == NULL) { err = CL_INVALID_VALUE; break; } if (origin[0] + region[0] > image->w || origin[1] + region[1] > image->h || origin[2] + region[2] > image->depth) { err = CL_INVALID_VALUE; break; } if (image->image_type == CL_MEM_OBJECT_IMAGE2D && (origin[2] != 0 || region[2] != 1)) { err = CL_INVALID_VALUE; break; } if (image->image_type == CL_MEM_OBJECT_IMAGE1D && (origin[2] != 0 || origin[1] != 0 || region[2] != 1 || region[1] != 1)) { err = CL_INVALID_VALUE; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_FILL_IMAGE, &err); if (err != CL_SUCCESS) { break; } err = cl_image_fill(command_queue, e, fill_color, image, origin, region); if (err != CL_SUCCESS) { break; } /* We will flush the ndrange if no event depend. Else we will add it to queue list. The finish or Complete status will always be done in queue list. */ e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { // Error happend, cancel. err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } err = cl_event_exec(e, e_status == CL_COMPLETE ? CL_SUBMITTED : CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clRetainMemObject(cl_mem memobj) { if (!CL_OBJECT_IS_MEM(memobj)) { return CL_INVALID_MEM_OBJECT; } cl_mem_add_ref(memobj); return CL_SUCCESS; } cl_int clReleaseMemObject(cl_mem memobj) { if (!CL_OBJECT_IS_MEM(memobj)) { return CL_INVALID_MEM_OBJECT; } cl_mem_delete(memobj); return CL_SUCCESS; } Beignet-1.3.2-Source/src/cl_driver.cpp000664 001750 001750 00000002264 13161142102 016717 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. 
If not, see . * * Author: Benjamin Segovia */ extern "C" { #include "intel/intel_driver.h" #include "cl_utils.h" #include #include } namespace { /*! Just use c++ pre-main to initialize the call-backs */ struct OCLDriverCallBackInitializer { OCLDriverCallBackInitializer(void) { intel_setup_callbacks(); } }; /*! Set the call backs at pre-main time */ static OCLDriverCallBackInitializer cbInitializer; } /* namespace */ Beignet-1.3.2-Source/src/cl_sampler.h000664 001750 001750 00000004071 13161142102 016532 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #ifndef __CL_SAMPLER_H__ #define __CL_SAMPLER_H__ #include "CL/cl.h" #include "cl_base_object.h" #include "../backend/src/ocl_common_defines.h" #include /* How to access images */ typedef struct _cl_sampler { _cl_base_object base; cl_context ctx; /* Context it belongs to */ cl_bool normalized_coords; /* Are coordinates normalized? */ cl_addressing_mode address; /* CLAMP / REPEAT and so on... */ cl_filter_mode filter; /* LINEAR / NEAREST mostly */ uint32_t clkSamplerValue; } _cl_sampler; #define CL_OBJECT_SAMPLER_MAGIC 0x686a0ecba79ce32fLL #define CL_OBJECT_IS_SAMPLER(obj) ((obj && \ ((cl_base_object)obj)->magic == CL_OBJECT_SAMPLER_MAGIC && \ CL_OBJECT_GET_REF(obj) >= 1)) /* Create a new sampler object */ extern cl_sampler cl_create_sampler(cl_context, cl_bool, cl_addressing_mode, cl_filter_mode, cl_int *err); /* Unref the object and delete it if no more reference on it */ extern void cl_sampler_delete(cl_sampler); /* Add one more reference to this object */ extern void cl_sampler_add_ref(cl_sampler); /* set a sampler kernel argument */ int cl_set_sampler_arg_slot(cl_kernel k, int index, cl_sampler sampler); #endif /* __CL_SAMPLER_H__ */ Beignet-1.3.2-Source/src/cl_gbe_loader.h000664 001750 001750 00000011045 13173554000 017160 0ustar00yryr000000 000000 /* * Copyright © 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* */ #ifndef __CL_GBE_LOADER_H__ #define __CL_GBE_LOADER_H__ #include "program.h" #ifdef __cplusplus extern "C" { #endif extern gbe_program_new_from_source_cb *compiler_program_new_from_source; extern gbe_program_new_from_llvm_file_cb *compiler_program_new_from_llvm_file; extern gbe_program_compile_from_source_cb *compiler_program_compile_from_source; extern gbe_program_new_gen_program_cb *compiler_program_new_gen_program; extern gbe_program_link_program_cb *compiler_program_link_program; extern gbe_program_check_opt_cb *compiler_program_check_opt; extern gbe_program_build_from_llvm_cb *compiler_program_build_from_llvm; extern gbe_program_new_from_llvm_binary_cb *compiler_program_new_from_llvm_binary; extern gbe_program_serialize_to_binary_cb *compiler_program_serialize_to_binary; extern gbe_program_new_from_llvm_cb *compiler_program_new_from_llvm; extern gbe_program_clean_llvm_resource_cb *compiler_program_clean_llvm_resource; extern gbe_program_new_from_binary_cb *interp_program_new_from_binary; extern gbe_program_get_global_constant_size_cb *interp_program_get_global_constant_size; extern gbe_program_get_global_constant_data_cb *interp_program_get_global_constant_data; extern gbe_program_get_global_reloc_count_cb *interp_program_get_global_reloc_count; extern gbe_program_get_global_reloc_table_cb *interp_program_get_global_reloc_table; extern gbe_program_delete_cb *interp_program_delete; extern gbe_program_get_kernel_num_cb *interp_program_get_kernel_num; extern gbe_program_get_kernel_by_name_cb *interp_program_get_kernel_by_name; extern gbe_program_get_kernel_cb *interp_program_get_kernel; extern gbe_program_get_device_enqueue_kernel_name_cb *interp_program_get_device_enqueue_kernel_name; extern gbe_kernel_get_name_cb *interp_kernel_get_name; extern gbe_kernel_get_attributes_cb *interp_kernel_get_attributes; extern gbe_kernel_get_code_cb *interp_kernel_get_code; extern gbe_kernel_get_code_size_cb *interp_kernel_get_code_size; extern gbe_kernel_get_arg_num_cb *interp_kernel_get_arg_num; extern gbe_kernel_get_arg_size_cb *interp_kernel_get_arg_size; extern gbe_kernel_get_arg_bti_cb *interp_kernel_get_arg_bti; extern gbe_kernel_get_arg_type_cb *interp_kernel_get_arg_type; extern gbe_kernel_get_arg_align_cb *interp_kernel_get_arg_align; extern gbe_kernel_get_simd_width_cb *interp_kernel_get_simd_width; extern gbe_kernel_get_curbe_offset_cb *interp_kernel_get_curbe_offset; extern gbe_kernel_get_curbe_size_cb *interp_kernel_get_curbe_size; extern gbe_kernel_get_stack_size_cb *interp_kernel_get_stack_size; extern gbe_kernel_get_scratch_size_cb *interp_kernel_get_scratch_size; extern gbe_kernel_get_required_work_group_size_cb *interp_kernel_get_required_work_group_size; extern gbe_kernel_use_slm_cb *interp_kernel_use_slm; extern gbe_kernel_get_slm_size_cb *interp_kernel_get_slm_size; extern gbe_kernel_get_sampler_size_cb *interp_kernel_get_sampler_size; extern gbe_kernel_get_sampler_data_cb *interp_kernel_get_sampler_data; extern gbe_kernel_get_compile_wg_size_cb *interp_kernel_get_compile_wg_size; extern gbe_kernel_get_image_size_cb *interp_kernel_get_image_size; extern gbe_kernel_get_image_data_cb *interp_kernel_get_image_data; extern gbe_kernel_get_ocl_version_cb *interp_kernel_get_ocl_version; extern gbe_output_profiling_cb* interp_output_profiling; extern gbe_get_profiling_bti_cb* interp_get_profiling_bti; extern gbe_dup_profiling_cb* interp_dup_profiling; extern gbe_get_printf_num_cb* interp_get_printf_num; extern gbe_get_printf_buf_bti_cb* interp_get_printf_buf_bti; extern 
gbe_dup_printfset_cb* interp_dup_printfset; extern gbe_release_printf_info_cb* interp_release_printf_info; extern gbe_output_printf_cb* interp_output_printf; extern gbe_kernel_get_arg_info_cb *interp_kernel_get_arg_info; extern gbe_kernel_use_device_enqueue_cb * interp_kernel_use_device_enqueue; int CompilerSupported(); #ifdef __cplusplus } #endif #endif /* __CL_GBE_LOADER_H__ */ Beignet-1.3.2-Source/src/cl_gl_api.c000664 001750 001750 00000017542 13173554000 016333 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Zhigang Gong */ #include #include #include #ifdef HAS_GL_EGL #include #endif #include "cl_platform_id.h" #include "cl_device_id.h" #include "cl_context.h" #include "cl_command_queue.h" #include "cl_program.h" #include "cl_kernel.h" #include "cl_mem.h" #include "cl_image.h" #include "cl_sampler.h" #include "cl_alloc.h" #include "cl_utils.h" #include "cl_enqueue.h" #include "cl_event.h" #include "CL/cl.h" #include "CL/cl_gl.h" #include "CL/cl_intel.h" #include "cl_mem_gl.h" #define CHECK_GL_CONTEXT(CTX) \ do { \ if (UNLIKELY(CTX->props.gl_type == CL_GL_NOSHARE)) { \ err = CL_INVALID_CONTEXT; \ goto error; \ } \ } while (0) cl_mem clCreateFromGLBuffer(cl_context context, cl_mem_flags flags, GLuint bufobj, cl_int * errcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); CHECK_GL_CONTEXT (context); mem = cl_mem_new_gl_buffer(context, flags, bufobj, &err); error: if (errcode_ret) *errcode_ret = err; return mem; } cl_mem clCreateFromGLTexture2D(cl_context context, cl_mem_flags flags, GLenum texture_target, GLint miplevel, GLuint texture, cl_int * errcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); CHECK_GL_CONTEXT (context); mem = cl_mem_new_gl_texture(context, flags, texture_target, miplevel, texture, &err); error: if (errcode_ret) *errcode_ret = err; return mem; } cl_mem clCreateFromGLTexture3D(cl_context context, cl_mem_flags flags, GLenum texture_target, GLint miplevel, GLuint texture, cl_int * errcode_ret) { NOT_IMPLEMENTED; } cl_mem clCreateFromGLTexture(cl_context context, cl_mem_flags flags, cl_GLenum target, cl_GLint miplevel, cl_GLuint texture, cl_int * errcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); CHECK_GL_CONTEXT (context); //We just support GL_TEXTURE_2D now. if(target != GL_TEXTURE_2D){ err = CL_INVALID_VALUE; goto error; } mem = cl_mem_new_gl_texture(context, flags, target, miplevel, texture, &err); error: if (errcode_ret) *errcode_ret = err; return mem; } /* XXX NULL function currently. 
*/ cl_int clEnqueueAcquireGLObjects (cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; cl_int e_status, i; cl_event e = NULL; enqueue_data *data = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (UNLIKELY(command_queue->ctx->props.gl_type == CL_GL_NOSHARE)) { err = CL_INVALID_CONTEXT; break; } if ((num_objects == 0 && mem_objects != NULL) || (num_objects > 0 && mem_objects == NULL)) { err = CL_INVALID_VALUE; break; } for (i = 0; i < num_objects; i++) { if (!cl_mem_image(mem_objects[i])) { err = CL_INVALID_MEM_OBJECT; break; } if (!IS_GL_IMAGE(mem_objects[i])) { err = CL_INVALID_GL_OBJECT; break; } } if (err != CL_SUCCESS) { break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_ACQUIRE_GL_OBJECTS, &err); if (err != CL_SUCCESS) { break; } e_status = cl_event_is_ready(e); data = &e->exec_data; data->type = EnqueueReturnSuccesss; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_SUBMITTED, CL_TRUE); // Submit to get the address. if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } /* XXX NULL function currently. */ cl_int clEnqueueReleaseGLObjects (cl_command_queue command_queue, cl_uint num_objects, const cl_mem *mem_objects, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; cl_int e_status, i; cl_event e = NULL; enqueue_data *data = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (UNLIKELY(command_queue->ctx->props.gl_type == CL_GL_NOSHARE)) { err = CL_INVALID_CONTEXT; break; } if ((num_objects == 0 && mem_objects != NULL) || (num_objects > 0 && mem_objects == NULL)) { err = CL_INVALID_VALUE; break; } for (i = 0; i < num_objects; i++) { if (!cl_mem_image(mem_objects[i])) { err = CL_INVALID_MEM_OBJECT; break; } if (!IS_GL_IMAGE(mem_objects[i])) { err = CL_INVALID_GL_OBJECT; break; } } if (err != CL_SUCCESS) { break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_ACQUIRE_GL_OBJECTS, &err); if (err != CL_SUCCESS) { break; } e_status = cl_event_is_ready(e); data = &e->exec_data; data->type = EnqueueReturnSuccesss; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_SUBMITTED, CL_TRUE); // Submit to get the address. 
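/* Host-side sharing pattern these two entry points participate in (a sketch;
 * 'queue', 'glmem' and 'kernel' are assumed to exist, and pending GL work on
 * the shared object should be finished first, e.g. with glFinish):
 *
 *   clEnqueueAcquireGLObjects(queue, 1, &glmem, 0, NULL, NULL);
 *   clSetKernelArg(kernel, 0, sizeof(cl_mem), &glmem);
 *   // ... enqueue kernels that read or write glmem ...
 *   clEnqueueReleaseGLObjects(queue, 1, &glmem, 0, NULL, NULL);
 *   clFinish(queue);
 */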
if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } Beignet-1.3.2-Source/src/cl_api_event.c000664 001750 001750 00000021104 13161142102 017030 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * */ #include "cl_event.h" #include "cl_context.h" #include "cl_command_queue.h" #include "CL/cl.h" #include cl_event clCreateUserEvent(cl_context context, cl_int *errcode_ret) { cl_int err = CL_SUCCESS; cl_event event = NULL; do { if (!CL_OBJECT_IS_CONTEXT(context)) { err = CL_INVALID_CONTEXT; break; } event = cl_event_create(context, NULL, 0, NULL, CL_COMMAND_USER, &err); } while (0); if (errcode_ret) *errcode_ret = err; return event; } cl_int clSetUserEventStatus(cl_event event, cl_int execution_status) { cl_int err = CL_SUCCESS; if (!CL_OBJECT_IS_EVENT(event)) { return CL_INVALID_EVENT; } if (execution_status > CL_COMPLETE) { return CL_INVALID_VALUE; } err = cl_event_set_status(event, execution_status); return err; } /* 1.1 API, deprecated */ cl_int clEnqueueMarker(cl_command_queue command_queue, cl_event *event) { return clEnqueueMarkerWithWaitList(command_queue, 0, NULL, event); } cl_int clEnqueueMarkerWithWaitList(cl_command_queue command_queue, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; cl_event e = NULL; cl_int e_status; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } if (event == NULL) { /* Create an anonymous event; it cannot be waited on and is useless. */ return CL_SUCCESS; } e = cl_event_create_marker_or_barrier(command_queue, num_events_in_wait_list, event_wait_list, CL_FALSE, &err); if (err != CL_SUCCESS) { return err; } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { // Error happened, cancel. 
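/* A marker folds everything before it into one waitable event (usage sketch;
 * 'queue' and 'evs', two earlier events, are assumed to exist):
 *
 *   cl_event done;
 *   clEnqueueMarkerWithWaitList(queue, 2, evs, &done);
 *   clWaitForEvents(1, &done); // both events in evs have completed here
 *   clReleaseEvent(done);
 */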
err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } else if (e_status == CL_COMPLETE) { err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { cl_command_queue_enqueue_event(command_queue, e); } } while (0); if (event) { *event = e; } else { cl_event_delete(e); } return err; } /* 1.1 API, deprecated */ cl_int clEnqueueBarrier(cl_command_queue command_queue) { return clEnqueueBarrierWithWaitList(command_queue, 0, NULL, NULL); } cl_int clEnqueueBarrierWithWaitList(cl_command_queue command_queue, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; cl_event e = NULL; cl_int e_status; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create_marker_or_barrier(command_queue, num_events_in_wait_list, event_wait_list, CL_TRUE, &err); if (err != CL_SUCCESS) { break; } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { // Error happened, cancel. err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } else if (e_status == CL_COMPLETE) { cl_command_queue_insert_barrier_event(command_queue, e); err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } /* Already a completed barrier, no need to insert to queue. */ } else { cl_command_queue_insert_barrier_event(command_queue, e); cl_command_queue_enqueue_event(command_queue, e); } } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clWaitForEvents(cl_uint num_events, const cl_event *event_list) { cl_int err = CL_SUCCESS; cl_uint i; if (num_events == 0 || event_list == NULL) { return CL_INVALID_VALUE; } err = cl_event_check_waitlist(num_events, event_list, NULL, NULL); if (err != CL_SUCCESS) { return err; } for (i = 0; i < num_events; i++) { if (cl_event_get_status(event_list[i]) < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; return err; } } err = cl_event_wait_for_events_list(num_events, event_list); return err; } /* 1.1 API, deprecated */ cl_int clEnqueueWaitForEvents(cl_command_queue command_queue, cl_uint num_events, const cl_event *event_list) { cl_int err = CL_SUCCESS; if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { return CL_INVALID_COMMAND_QUEUE; } err = clWaitForEvents(num_events, event_list); return err; } cl_int clSetEventCallback(cl_event event, cl_int command_exec_callback_type, void(CL_CALLBACK *pfn_notify)(cl_event, cl_int, void *), void *user_data) { cl_int err = CL_SUCCESS; if (!CL_OBJECT_IS_EVENT(event)) { return CL_INVALID_EVENT; } if ((pfn_notify == NULL) || (command_exec_callback_type > CL_SUBMITTED) || (command_exec_callback_type < CL_COMPLETE)) { return CL_INVALID_VALUE; } err = cl_event_set_callback(event, command_exec_callback_type, pfn_notify, user_data); return err; } cl_int clGetEventInfo(cl_event event, cl_event_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { void *src_ptr = NULL; size_t src_size = 0; cl_uint ref; cl_int status; if (!CL_OBJECT_IS_EVENT(event)) { return CL_INVALID_EVENT; } if (param_name == CL_EVENT_COMMAND_QUEUE) { src_ptr = &event->queue; src_size = sizeof(cl_command_queue); } else if (param_name == CL_EVENT_CONTEXT) { src_ptr = &event->ctx; src_size = sizeof(cl_context); } else if (param_name == CL_EVENT_COMMAND_TYPE) { src_ptr = &event->event_type; 
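/* Query sketch for this branch (assumes a valid event 'ev'):
 *
 *   cl_command_type t;
 *   clGetEventInfo(ev, CL_EVENT_COMMAND_TYPE, sizeof(t), &t, NULL);
 */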
src_size = sizeof(cl_command_type); } else if (param_name == CL_EVENT_COMMAND_EXECUTION_STATUS) { status = cl_event_get_status(event); src_ptr = &status; src_size = sizeof(cl_int); } else if (param_name == CL_EVENT_REFERENCE_COUNT) { ref = CL_OBJECT_GET_REF(event); src_ptr = &ref; src_size = sizeof(cl_int); } else { return CL_INVALID_VALUE; } return cl_get_info_helper(src_ptr, src_size, param_value, param_value_size, param_value_size_ret); } cl_int clGetEventProfilingInfo(cl_event event, cl_profiling_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { cl_ulong ret_val; if (!CL_OBJECT_IS_EVENT(event)) { return CL_INVALID_EVENT; } assert(event->event_type == CL_COMMAND_USER || event->queue != NULL); if (event->event_type == CL_COMMAND_USER || !(event->queue->props & CL_QUEUE_PROFILING_ENABLE) || cl_event_get_status(event) != CL_COMPLETE) { return CL_PROFILING_INFO_NOT_AVAILABLE; } if (param_value && param_value_size < sizeof(cl_ulong)) { return CL_INVALID_VALUE; } if (param_name < CL_PROFILING_COMMAND_QUEUED || param_name > CL_PROFILING_COMMAND_COMPLETE) { return CL_INVALID_VALUE; } ret_val = event->timestamp[param_name - CL_PROFILING_COMMAND_QUEUED]; if (ret_val == CL_EVENT_INVALID_TIMESTAMP) { return CL_INVALID_VALUE; } if (param_value) *(cl_ulong *)param_value = ret_val; if (param_value_size_ret) *param_value_size_ret = sizeof(cl_ulong); return CL_SUCCESS; } Beignet-1.3.2-Source/src/cl_enqueue.c000664 001750 001750 00000046015 13161142102 016535 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Rong Yang */ //#include "cl_image.h" #include "cl_enqueue.h" #include "cl_driver.h" #include "cl_event.h" #include "cl_command_queue.h" #include "cl_utils.h" #include "cl_alloc.h" #include "cl_device_enqueue.h" #include #include #include #include static cl_int cl_enqueue_read_buffer(enqueue_data *data, cl_int status) { cl_int err = CL_SUCCESS; cl_mem mem = data->mem_obj; if (status != CL_COMPLETE) return err; assert(mem->type == CL_MEM_BUFFER_TYPE || mem->type == CL_MEM_SUBBUFFER_TYPE); struct _cl_mem_buffer *buffer = (struct _cl_mem_buffer *)mem; //cl_buffer_get_subdata sometime is very very very slow in linux kernel, in skl and chv, //and it is randomly. So temporary disable it, use map/copy/unmap to read. //Should re-enable it after find root cause. 
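/* The 'if (0 && ...)' below is that temporary disable: the constant 0
 * short-circuits the cl_buffer_get_subdata() fast path, so every read
 * currently takes the map/memcpy/unmap branch. Re-enabling it once the
 * kernel-side slowness is understood is a one-character change. */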
if (0 && !mem->is_userptr) { if (cl_buffer_get_subdata(mem->bo, data->offset + buffer->sub_offset, data->size, data->ptr) != 0) err = CL_MAP_FAILURE; } else { void *src_ptr = cl_mem_map_auto(mem, 0); if (src_ptr == NULL) err = CL_MAP_FAILURE; else { //sometimes, application invokes read buffer, instead of map buffer, even if userptr is enabled //memcpy is not necessary for this case if (data->ptr != (char *)src_ptr + data->offset + buffer->sub_offset) memcpy(data->ptr, (char *)src_ptr + data->offset + buffer->sub_offset, data->size); cl_mem_unmap_auto(mem); } } return err; } static cl_int cl_enqueue_read_buffer_rect(enqueue_data *data, cl_int status) { cl_int err = CL_SUCCESS; void *src_ptr; void *dst_ptr; const size_t *origin = data->origin; const size_t *host_origin = data->host_origin; const size_t *region = data->region; cl_mem mem = data->mem_obj; if (status != CL_COMPLETE) return err; assert(mem->type == CL_MEM_BUFFER_TYPE || mem->type == CL_MEM_SUBBUFFER_TYPE); struct _cl_mem_buffer *buffer = (struct _cl_mem_buffer *)mem; if (!(src_ptr = cl_mem_map_auto(mem, 0))) { err = CL_MAP_FAILURE; goto error; } size_t offset = origin[0] + data->row_pitch * origin[1] + data->slice_pitch * origin[2]; src_ptr = (char *)src_ptr + offset + buffer->sub_offset; offset = host_origin[0] + data->host_row_pitch * host_origin[1] + data->host_slice_pitch * host_origin[2]; dst_ptr = (char *)data->ptr + offset; if (data->row_pitch == region[0] && data->row_pitch == data->host_row_pitch && (region[2] == 1 || (data->slice_pitch == region[0] * region[1] && data->slice_pitch == data->host_slice_pitch))) { memcpy(dst_ptr, src_ptr, region[2] == 1 ? data->row_pitch * region[1] : data->slice_pitch * region[2]); } else { cl_uint y, z; for (z = 0; z < region[2]; z++) { const char *src = src_ptr; char *dst = dst_ptr; for (y = 0; y < region[1]; y++) { memcpy(dst, src, region[0]); src += data->row_pitch; dst += data->host_row_pitch; } src_ptr = (char *)src_ptr + data->slice_pitch; dst_ptr = (char *)dst_ptr + data->host_slice_pitch; } } err = cl_mem_unmap_auto(mem); error: return err; } static cl_int cl_enqueue_write_buffer(enqueue_data *data, cl_int status) { cl_int err = CL_SUCCESS; cl_mem mem = data->mem_obj; assert(mem->type == CL_MEM_BUFFER_TYPE || mem->type == CL_MEM_SUBBUFFER_TYPE); struct _cl_mem_buffer *buffer = (struct _cl_mem_buffer *)mem; if (status != CL_COMPLETE) return err; if (mem->is_userptr) { void *dst_ptr = cl_mem_map_auto(mem, 1); if (dst_ptr == NULL) err = CL_MAP_FAILURE; else { memcpy((char *)dst_ptr + data->offset + buffer->sub_offset, data->const_ptr, data->size); cl_mem_unmap_auto(mem); } } else { if (cl_buffer_subdata(mem->bo, data->offset + buffer->sub_offset, data->size, data->const_ptr) != 0) err = CL_MAP_FAILURE; } return err; } static cl_int cl_enqueue_write_buffer_rect(enqueue_data *data, cl_int status) { cl_int err = CL_SUCCESS; void *src_ptr; void *dst_ptr; const size_t *origin = data->origin; const size_t *host_origin = data->host_origin; const size_t *region = data->region; cl_mem mem = data->mem_obj; assert(mem->type == CL_MEM_BUFFER_TYPE || mem->type == CL_MEM_SUBBUFFER_TYPE); struct _cl_mem_buffer *buffer = (struct _cl_mem_buffer *)mem; if (status != CL_COMPLETE) return err; if (!(dst_ptr = cl_mem_map_auto(mem, 1))) { err = CL_MAP_FAILURE; goto error; } size_t offset = origin[0] + data->row_pitch * origin[1] + data->slice_pitch * origin[2]; dst_ptr = (char *)dst_ptr + offset + buffer->sub_offset; offset = host_origin[0] + data->host_row_pitch * host_origin[1] + 
data->host_slice_pitch * host_origin[2]; src_ptr = (char *)data->const_ptr + offset; if (data->row_pitch == region[0] && data->row_pitch == data->host_row_pitch && (region[2] == 1 || (data->slice_pitch == region[0] * region[1] && data->slice_pitch == data->host_slice_pitch))) { memcpy(dst_ptr, src_ptr, region[2] == 1 ? data->row_pitch * region[1] : data->slice_pitch * region[2]); } else { cl_uint y, z; for (z = 0; z < region[2]; z++) { const char *src = src_ptr; char *dst = dst_ptr; for (y = 0; y < region[1]; y++) { memcpy(dst, src, region[0]); src += data->host_row_pitch; dst += data->row_pitch; } src_ptr = (char *)src_ptr + data->host_slice_pitch; dst_ptr = (char *)dst_ptr + data->slice_pitch; } } err = cl_mem_unmap_auto(mem); error: return err; } static cl_int cl_enqueue_read_image(enqueue_data *data, cl_int status) { cl_int err = CL_SUCCESS; void *src_ptr; cl_mem mem = data->mem_obj; CHECK_IMAGE(mem, image); const size_t *origin = data->origin; const size_t *region = data->region; if (status != CL_COMPLETE) return err; if (!(src_ptr = cl_mem_map_auto(mem, 0))) { err = CL_MAP_FAILURE; goto error; } size_t offset = image->offset + image->bpp*origin[0] + image->row_pitch*origin[1] + image->slice_pitch*origin[2]; src_ptr = (char*)src_ptr + offset; if (!origin[0] && region[0] == image->w && data->row_pitch == image->row_pitch && (region[2] == 1 || (!origin[1] && region[1] == image->h && data->slice_pitch == image->slice_pitch))) { memcpy(data->ptr, src_ptr, region[2] == 1 ? data->row_pitch * region[1] : data->slice_pitch * region[2]); } else { cl_uint y, z; for (z = 0; z < region[2]; z++) { const char *src = src_ptr; char *dst = data->ptr; for (y = 0; y < region[1]; y++) { memcpy(dst, src, image->bpp * region[0]); src += image->row_pitch; dst += data->row_pitch; } src_ptr = (char *)src_ptr + image->slice_pitch; data->ptr = (char *)data->ptr + data->slice_pitch; } } err = cl_mem_unmap_auto(mem); error: return err; } static cl_int cl_enqueue_write_image(enqueue_data *data, cl_int status) { cl_int err = CL_SUCCESS; void *dst_ptr; cl_mem mem = data->mem_obj; CHECK_IMAGE(mem, image); if (status != CL_COMPLETE) return err; if (!(dst_ptr = cl_mem_map_auto(mem, 1))) { err = CL_MAP_FAILURE; goto error; } cl_mem_copy_image_region(data->origin, data->region, dst_ptr + image->offset, image->row_pitch, image->slice_pitch, data->const_ptr, data->row_pitch, data->slice_pitch, image, CL_TRUE, CL_FALSE); err = cl_mem_unmap_auto(mem); error: return err; } static cl_int cl_enqueue_map_buffer(enqueue_data *data, cl_int status) { void *ptr = NULL; cl_int err = CL_SUCCESS; cl_mem mem = data->mem_obj; assert(mem->type == CL_MEM_BUFFER_TYPE || mem->type == CL_MEM_SUBBUFFER_TYPE || mem->type == CL_MEM_SVM_TYPE); struct _cl_mem_buffer* buffer = (struct _cl_mem_buffer *)mem; if (status == CL_SUBMITTED) { if (buffer->base.is_userptr) { ptr = buffer->base.host_ptr; } else { if ((ptr = cl_mem_map_gtt_unsync(&buffer->base)) == NULL) { err = CL_MAP_FAILURE; return err; } } data->ptr = ptr; } else if (status == CL_COMPLETE) { if (mem->is_userptr) ptr = cl_mem_map_auto(mem, data->write_map ? 1 : 0); else { if (data->unsync_map == 1) //because using unsync map in clEnqueueMapBuffer, so force use map_gtt here ptr = cl_mem_map_gtt(mem); else ptr = cl_mem_map_auto(mem, data->write_map ? 
1 : 0); } if (ptr == NULL) { err = CL_MAP_FAILURE; return err; } data->ptr = ptr; if ((mem->flags & CL_MEM_USE_HOST_PTR) && !mem->is_userptr) { assert(mem->host_ptr); ptr = (char *)ptr + data->offset + buffer->sub_offset; memcpy(mem->host_ptr + data->offset + buffer->sub_offset, ptr, data->size); } } return err; } static cl_int cl_enqueue_map_image(enqueue_data *data, cl_int status) { cl_int err = CL_SUCCESS; cl_mem mem = data->mem_obj; void *ptr = NULL; size_t row_pitch = 0; CHECK_IMAGE(mem, image); if (status == CL_SUBMITTED) { if ((ptr = cl_mem_map_gtt_unsync(mem)) == NULL) { err = CL_MAP_FAILURE; goto error; } data->ptr = ptr; } else if (status == CL_COMPLETE) { if (data->unsync_map == 1) //because using unsync map in clEnqueueMapBuffer, so force use map_gtt here ptr = cl_mem_map_gtt(mem); else ptr = cl_mem_map_auto(mem, data->write_map ? 1 : 0); if (ptr == NULL) { err = CL_MAP_FAILURE; goto error; } data->ptr = (char*)ptr + image->offset; if (image->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) row_pitch = image->slice_pitch; else row_pitch = image->row_pitch; if(mem->flags & CL_MEM_USE_HOST_PTR) { assert(mem->host_ptr); if (!mem->is_userptr) //src and dst need add offset in function cl_mem_copy_image_region cl_mem_copy_image_region(data->origin, data->region, mem->host_ptr, image->host_row_pitch, image->host_slice_pitch, data->ptr, row_pitch, image->slice_pitch, image, CL_TRUE, CL_TRUE); } } error: return err; } static cl_int cl_enqueue_unmap_mem_object(enqueue_data *data, cl_int status) { cl_int err = CL_SUCCESS; int i, j; size_t mapped_size = 0; size_t origin[3], region[3]; void *v_ptr = NULL; void *mapped_ptr = data->ptr; cl_mem memobj = data->mem_obj; size_t row_pitch = 0; if (status != CL_COMPLETE) return err; assert(memobj->mapped_ptr_sz >= memobj->map_ref); INVALID_VALUE_IF(!mapped_ptr); for (i = 0; i < memobj->mapped_ptr_sz; i++) { if (memobj->mapped_ptr[i].ptr == mapped_ptr) { memobj->mapped_ptr[i].ptr = NULL; mapped_size = memobj->mapped_ptr[i].size; v_ptr = memobj->mapped_ptr[i].v_ptr; for (j = 0; j < 3; j++) { region[j] = memobj->mapped_ptr[i].region[j]; origin[j] = memobj->mapped_ptr[i].origin[j]; memobj->mapped_ptr[i].region[j] = 0; memobj->mapped_ptr[i].origin[j] = 0; } memobj->mapped_ptr[i].size = 0; memobj->mapped_ptr[i].v_ptr = NULL; memobj->map_ref--; break; } } /* can not find a mapped address? */ INVALID_VALUE_IF(i == memobj->mapped_ptr_sz); if (memobj->flags & CL_MEM_USE_HOST_PTR) { if (memobj->type == CL_MEM_BUFFER_TYPE || memobj->type == CL_MEM_SUBBUFFER_TYPE || memobj->type == CL_MEM_SVM_TYPE) { assert(mapped_ptr >= memobj->host_ptr && mapped_ptr + mapped_size <= memobj->host_ptr + memobj->size); /* Sync the data. */ if (!memobj->is_userptr) memcpy(v_ptr, mapped_ptr, mapped_size); } else { CHECK_IMAGE(memobj, image); if (image->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) row_pitch = image->slice_pitch; else row_pitch = image->row_pitch; if (!memobj->is_userptr) //v_ptr have added offset, host_ptr have not added offset. cl_mem_copy_image_region(origin, region, v_ptr, row_pitch, image->slice_pitch, memobj->host_ptr, image->host_row_pitch, image->host_slice_pitch, image, CL_FALSE, CL_TRUE); } } else { assert(v_ptr == mapped_ptr); } cl_mem_unmap_auto(memobj); /* shrink the mapped slot. */ if (memobj->mapped_ptr_sz / 2 > memobj->map_ref) { int j = 0; cl_mapped_ptr *new_ptr = (cl_mapped_ptr *)malloc( sizeof(cl_mapped_ptr) * (memobj->mapped_ptr_sz / 2)); if (!new_ptr) { /* Just do nothing. 
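     [Editor's note] Shrinking the mapped_ptr bookkeeping array is purely an
     optimization; when this malloc fails we simply keep the larger array and
     still report success, since the unmap itself has already been done.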
*/
      goto error;
    }
    memset(new_ptr, 0, (memobj->mapped_ptr_sz / 2) * sizeof(cl_mapped_ptr));
    for (i = 0; i < memobj->mapped_ptr_sz; i++) {
      if (memobj->mapped_ptr[i].ptr) {
        new_ptr[j] = memobj->mapped_ptr[i];
        j++;
        assert(j < memobj->mapped_ptr_sz / 2);
      }
    }
    memobj->mapped_ptr_sz = memobj->mapped_ptr_sz / 2;
    free(memobj->mapped_ptr);
    memobj->mapped_ptr = new_ptr;
  }

error:
  return err;
}

static cl_int
cl_enqueue_native_kernel(enqueue_data *data, cl_int status)
{
  cl_int err = CL_SUCCESS;
  cl_uint num_mem_objects = (cl_uint)data->offset;
  const cl_mem *mem_list = data->mem_list;
  const void **args_mem_loc = (const void **)data->const_ptr;
  cl_uint i;

  if (status != CL_COMPLETE)
    return err;

  for (i = 0; i < num_mem_objects; ++i) {
    const cl_mem buffer = mem_list[i];
    CHECK_MEM(buffer);
    *((void **)args_mem_loc[i]) = cl_mem_map_auto(buffer, 0);
  }
  data->user_func(data->ptr);

  for (i = 0; i < num_mem_objects; ++i) {
    cl_mem_unmap_auto(mem_list[i]);
  }

error:
  return err;
}

cl_int cl_enqueue_svm_free(enqueue_data *data, cl_int status)
{
  int i;
  void **pointers = data->pointers;
  uint num_svm_ptrs = data->size;
  cl_int err = CL_SUCCESS;

  if (status != CL_COMPLETE)
    return err;

  if(data->free_func) {
    data->free_func(data->queue, num_svm_ptrs, pointers, data->ptr);
  } else {
    for(i=0; i<num_svm_ptrs; i++)
      cl_mem_svm_delete(data->queue->ctx, pointers[i]);
  }

  free(pointers);
  return CL_SUCCESS;
}

cl_int cl_enqueue_svm_mem_copy(enqueue_data *data, cl_int status)
{
  cl_mem mem;
  size_t size = data->size;
  const char* src_ptr = (const char *)data->const_ptr;
  char *dst_ptr = (char *)data->ptr;
  cl_int err = CL_SUCCESS;
  int i;

  if (status != CL_COMPLETE)
    return err;

  if((mem = cl_context_get_svm_from_ptr(data->queue->ctx, data->ptr)) != NULL) {
    dst_ptr = (char *)cl_mem_map_auto(mem, 1);
  }

  if((mem = cl_context_get_svm_from_ptr(data->queue->ctx, data->const_ptr)) != NULL) {
    src_ptr = (const char *)cl_mem_map_auto(mem, 0);
  }

  for(i=0; i<size; i++) {
    dst_ptr[i] = src_ptr[i];
  }

  return CL_SUCCESS;
}

cl_int cl_enqueue_svm_mem_fill(enqueue_data *data, cl_int status)
{
  cl_mem mem;
  size_t size = data->size;
  size_t pattern_size = data->pattern_size;
  const char* pattern = (const char *)data->const_ptr;
  char *ptr = (char *)data->ptr;
  cl_int err = CL_SUCCESS;
  int i, j;

  if (status != CL_COMPLETE)
    return err;

  if((mem = cl_context_get_svm_from_ptr(data->queue->ctx, data->ptr)) != NULL) {
    ptr = (char *)cl_mem_map_auto(mem, 1);
  }

  for(i=0; i<size; i += pattern_size) {
    for(j=0; j<pattern_size; j++)
      ptr[i+j] = pattern[j];
  }

  return CL_SUCCESS;
}

static cl_int
cl_enqueue_ndrange(enqueue_data *data, cl_int status)
{
  cl_int err = CL_SUCCESS;

  if (status == CL_SUBMITTED) {
    err = cl_command_queue_flush_gpgpu(data->gpgpu);
    //if it is the last ndrange of a cl enqueue API call,
    //check the device enqueue information.
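  /* [Editor's note] cl_enqueue_handle() below drives each handler once per
   * status transition: the CL_SUBMITTED pass flushes the batch above, and the
   * CL_COMPLETE pass takes the sync branch below. data->mid_event_of_enq is
   * nonzero for the internal helper NDRanges a single API call may emit; only
   * the final one (0) checks the device-side enqueue results. */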
if (data->mid_event_of_enq == 0) { assert(data->queue); cl_device_enqueue_parse_result(data->queue, data->gpgpu); } } else if (status == CL_COMPLETE) { void *batch_buf = cl_gpgpu_ref_batch_buf(data->gpgpu); cl_gpgpu_sync(batch_buf); cl_gpgpu_unref_batch_buf(batch_buf); } return err; } static cl_int cl_enqueue_marker_or_barrier(enqueue_data *data, cl_int status) { return CL_COMPLETE; } LOCAL void cl_enqueue_delete(enqueue_data *data) { if (data == NULL) return; if (data->type == EnqueueCopyBufferRect || data->type == EnqueueCopyBuffer || data->type == EnqueueCopyImage || data->type == EnqueueCopyBufferToImage || data->type == EnqueueCopyImageToBuffer || data->type == EnqueueNDRangeKernel || data->type == EnqueueFillBuffer || data->type == EnqueueFillImage) { if (data->gpgpu) { cl_gpgpu_delete(data->gpgpu); data->gpgpu = NULL; } return; } if (data->type == EnqueueNativeKernel) { if (data->mem_list) { cl_free((void*)data->mem_list); data->mem_list = NULL; } if (data->ptr) { cl_free((void*)data->ptr); data->ptr = NULL; } if (data->const_ptr) { cl_free((void*)data->const_ptr); data->const_ptr = NULL; } } } LOCAL cl_int cl_enqueue_handle(enqueue_data *data, cl_int status) { switch (data->type) { case EnqueueReturnSuccesss: return CL_SUCCESS; case EnqueueReadBuffer: return cl_enqueue_read_buffer(data, status); case EnqueueReadBufferRect: return cl_enqueue_read_buffer_rect(data, status); case EnqueueWriteBuffer: return cl_enqueue_write_buffer(data, status); case EnqueueWriteBufferRect: return cl_enqueue_write_buffer_rect(data, status); case EnqueueReadImage: return cl_enqueue_read_image(data, status); case EnqueueWriteImage: return cl_enqueue_write_image(data, status); case EnqueueMapBuffer: return cl_enqueue_map_buffer(data, status); case EnqueueMapImage: return cl_enqueue_map_image(data, status); case EnqueueUnmapMemObject: return cl_enqueue_unmap_mem_object(data, status); case EnqueueSVMFree: return cl_enqueue_svm_free(data, status); case EnqueueSVMMemCopy: return cl_enqueue_svm_mem_copy(data, status); case EnqueueSVMMemFill: return cl_enqueue_svm_mem_fill(data, status); case EnqueueMarker: case EnqueueBarrier: return cl_enqueue_marker_or_barrier(data, status); case EnqueueCopyBufferRect: case EnqueueCopyBuffer: case EnqueueCopyImage: case EnqueueCopyBufferToImage: case EnqueueCopyImageToBuffer: case EnqueueNDRangeKernel: case EnqueueFillBuffer: case EnqueueFillImage: //return cl_event_flush(event); return cl_enqueue_ndrange(data, status); case EnqueueNativeKernel: return cl_enqueue_native_kernel(data, status); case EnqueueMigrateMemObj: default: return CL_SUCCESS; } } Beignet-1.3.2-Source/src/cl_driver_defs.c000664 001750 001750 00000013650 13161142102 017361 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ #include "cl_driver.h" #include "cl_utils.h" #include /* Driver */ LOCAL cl_driver_new_cb *cl_driver_new = NULL; LOCAL cl_driver_delete_cb *cl_driver_delete = NULL; LOCAL cl_driver_get_bufmgr_cb *cl_driver_get_bufmgr = NULL; LOCAL cl_driver_get_ver_cb *cl_driver_get_ver = NULL; LOCAL cl_driver_enlarge_stack_size_cb *cl_driver_enlarge_stack_size = NULL; LOCAL cl_driver_set_atomic_flag_cb *cl_driver_set_atomic_flag = NULL; LOCAL cl_driver_get_device_id_cb *cl_driver_get_device_id = NULL; LOCAL cl_driver_update_device_info_cb *cl_driver_update_device_info = NULL; /* Buffer */ LOCAL cl_buffer_alloc_cb *cl_buffer_alloc = NULL; LOCAL cl_buffer_alloc_userptr_cb *cl_buffer_alloc_userptr = NULL; LOCAL cl_buffer_set_softpin_offset_cb *cl_buffer_set_softpin_offset = NULL; LOCAL cl_buffer_set_bo_use_full_range_cb *cl_buffer_set_bo_use_full_range = NULL; LOCAL cl_buffer_disable_reuse_cb *cl_buffer_disable_reuse = NULL; LOCAL cl_buffer_set_tiling_cb *cl_buffer_set_tiling = NULL; LOCAL cl_buffer_alloc_from_texture_cb *cl_buffer_alloc_from_texture = NULL; LOCAL cl_buffer_release_from_texture_cb *cl_buffer_release_from_texture = NULL; LOCAL cl_buffer_reference_cb *cl_buffer_reference = NULL; LOCAL cl_buffer_unreference_cb *cl_buffer_unreference = NULL; LOCAL cl_buffer_map_cb *cl_buffer_map = NULL; LOCAL cl_buffer_unmap_cb *cl_buffer_unmap = NULL; LOCAL cl_buffer_map_gtt_cb *cl_buffer_map_gtt = NULL; LOCAL cl_buffer_map_gtt_unsync_cb *cl_buffer_map_gtt_unsync = NULL; LOCAL cl_buffer_unmap_gtt_cb *cl_buffer_unmap_gtt = NULL; LOCAL cl_buffer_get_virtual_cb *cl_buffer_get_virtual = NULL; LOCAL cl_buffer_get_size_cb *cl_buffer_get_size = NULL; LOCAL cl_buffer_pin_cb *cl_buffer_pin = NULL; LOCAL cl_buffer_unpin_cb *cl_buffer_unpin = NULL; LOCAL cl_buffer_subdata_cb *cl_buffer_subdata = NULL; LOCAL cl_buffer_get_subdata_cb *cl_buffer_get_subdata = NULL; LOCAL cl_buffer_wait_rendering_cb *cl_buffer_wait_rendering = NULL; LOCAL cl_buffer_get_buffer_from_libva_cb *cl_buffer_get_buffer_from_libva = NULL; LOCAL cl_buffer_get_image_from_libva_cb *cl_buffer_get_image_from_libva = NULL; LOCAL cl_buffer_get_fd_cb *cl_buffer_get_fd = NULL; LOCAL cl_buffer_get_tiling_align_cb *cl_buffer_get_tiling_align = NULL; LOCAL cl_buffer_get_buffer_from_fd_cb *cl_buffer_get_buffer_from_fd = NULL; LOCAL cl_buffer_get_image_from_fd_cb *cl_buffer_get_image_from_fd = NULL; /* GPGPU */ LOCAL cl_gpgpu_new_cb *cl_gpgpu_new = NULL; LOCAL cl_gpgpu_delete_cb *cl_gpgpu_delete = NULL; LOCAL cl_gpgpu_sync_cb *cl_gpgpu_sync = NULL; LOCAL cl_gpgpu_bind_buf_cb *cl_gpgpu_bind_buf = NULL; LOCAL cl_gpgpu_set_stack_cb *cl_gpgpu_set_stack = NULL; LOCAL cl_gpgpu_set_scratch_cb *cl_gpgpu_set_scratch = NULL; LOCAL cl_gpgpu_bind_image_cb *cl_gpgpu_bind_image = NULL; LOCAL cl_gpgpu_bind_image_cb *cl_gpgpu_bind_image_for_vme = NULL; LOCAL cl_gpgpu_get_cache_ctrl_cb *cl_gpgpu_get_cache_ctrl = NULL; LOCAL cl_gpgpu_state_init_cb *cl_gpgpu_state_init = NULL; LOCAL cl_gpgpu_alloc_constant_buffer_cb * cl_gpgpu_alloc_constant_buffer = NULL; LOCAL cl_gpgpu_set_perf_counters_cb *cl_gpgpu_set_perf_counters = NULL; LOCAL cl_gpgpu_upload_curbes_cb *cl_gpgpu_upload_curbes = NULL; LOCAL cl_gpgpu_states_setup_cb *cl_gpgpu_states_setup = NULL; LOCAL cl_gpgpu_upload_samplers_cb *cl_gpgpu_upload_samplers = NULL; LOCAL cl_gpgpu_batch_reset_cb *cl_gpgpu_batch_reset = NULL; LOCAL cl_gpgpu_batch_start_cb *cl_gpgpu_batch_start = NULL; LOCAL cl_gpgpu_batch_end_cb *cl_gpgpu_batch_end = NULL; LOCAL cl_gpgpu_flush_cb *cl_gpgpu_flush = NULL; LOCAL 
cl_gpgpu_walker_cb *cl_gpgpu_walker = NULL;
LOCAL cl_gpgpu_bind_sampler_cb *cl_gpgpu_bind_sampler = NULL;
LOCAL cl_gpgpu_bind_vme_state_cb *cl_gpgpu_bind_vme_state = NULL;
LOCAL cl_gpgpu_event_new_cb *cl_gpgpu_event_new = NULL;
LOCAL cl_gpgpu_event_update_status_cb *cl_gpgpu_event_update_status = NULL;
LOCAL cl_gpgpu_event_flush_cb *cl_gpgpu_event_flush = NULL;
LOCAL cl_gpgpu_event_delete_cb *cl_gpgpu_event_delete = NULL;
LOCAL cl_gpgpu_event_get_exec_timestamp_cb *cl_gpgpu_event_get_exec_timestamp = NULL;
LOCAL cl_gpgpu_event_get_gpu_cur_timestamp_cb *cl_gpgpu_event_get_gpu_cur_timestamp = NULL;
LOCAL cl_gpgpu_ref_batch_buf_cb *cl_gpgpu_ref_batch_buf = NULL;
LOCAL cl_gpgpu_unref_batch_buf_cb *cl_gpgpu_unref_batch_buf = NULL;
LOCAL cl_gpgpu_set_profiling_buffer_cb *cl_gpgpu_set_profiling_buffer = NULL;
LOCAL cl_gpgpu_set_profiling_info_cb *cl_gpgpu_set_profiling_info = NULL;
LOCAL cl_gpgpu_get_profiling_info_cb *cl_gpgpu_get_profiling_info = NULL;
LOCAL cl_gpgpu_map_profiling_buffer_cb *cl_gpgpu_map_profiling_buffer = NULL;
LOCAL cl_gpgpu_unmap_profiling_buffer_cb *cl_gpgpu_unmap_profiling_buffer = NULL;
LOCAL cl_gpgpu_set_printf_buffer_cb *cl_gpgpu_set_printf_buffer = NULL;
LOCAL cl_gpgpu_reloc_printf_buffer_cb *cl_gpgpu_reloc_printf_buffer = NULL;
LOCAL cl_gpgpu_map_printf_buffer_cb *cl_gpgpu_map_printf_buffer = NULL;
LOCAL cl_gpgpu_unmap_printf_buffer_cb *cl_gpgpu_unmap_printf_buffer = NULL;
LOCAL cl_gpgpu_set_printf_info_cb *cl_gpgpu_set_printf_info = NULL;
LOCAL cl_gpgpu_get_printf_info_cb *cl_gpgpu_get_printf_info = NULL;
LOCAL cl_gpgpu_release_printf_buffer_cb *cl_gpgpu_release_printf_buffer = NULL;
LOCAL cl_gpgpu_set_kernel_cb *cl_gpgpu_set_kernel = NULL;
LOCAL cl_gpgpu_get_kernel_cb *cl_gpgpu_get_kernel = NULL;
Beignet-1.3.2-Source/src/cl_event.h000664 001750 001750 00000010206 13161142102 016205 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

#ifndef __CL_EVENT_H__
#define __CL_EVENT_H__

#include
#include "cl_base_object.h"
#include "cl_enqueue.h"
#include "CL/cl.h"

typedef void(CL_CALLBACK *cl_event_notify_cb)(cl_event event,
                                              cl_int event_command_exec_status,
                                              void *user_data);

typedef struct _cl_event_user_callback {
  cl_int status;                 /* The execution status */
  cl_bool executed;              /* Indicates whether the callback has been called or not */
  cl_event_notify_cb pfn_notify; /* Callback function */
  void *user_data;               /* Callback user data */
  list_node node;                /* Event callback list node */
} _cl_event_user_callback;

typedef _cl_event_user_callback *cl_event_user_callback;

typedef struct _cl_event {
  _cl_base_object base;
  cl_context ctx;             /* The context associated with the event */
  cl_command_queue queue;     /* The command queue associated with the event */
  cl_command_type event_type; /* Event type. */
  cl_bool is_barrier;         /* Is this event a barrier */
  cl_int status;              /* The execution status */
  cl_event *depend_events;    /* The events that must complete before this one. */
  cl_uint depend_event_num;   /* The number of depend events. */
  list_head callbacks;        /* The event callback functions */
  list_node enqueue_node;     /* The node in the enqueue list. */
  cl_ulong timestamp[5];      /* The time stamps for profiling. */
  enqueue_data exec_data;     /* Context for executing this event. */
} _cl_event;

#define CL_OBJECT_EVENT_MAGIC 0x8324a9f810ebf90fLL
#define CL_OBJECT_IS_EVENT(obj) ((obj &&                                      \
         ((cl_base_object)obj)->magic == CL_OBJECT_EVENT_MAGIC &&             \
         CL_OBJECT_GET_REF(obj) >= 1))

#define CL_EVENT_STATE_UNKNOWN 0x4

#define CL_EVENT_IS_MARKER(E) (E->event_type == CL_COMMAND_MARKER)
#define CL_EVENT_IS_BARRIER(E) (E->event_type == CL_COMMAND_BARRIER)
#define CL_EVENT_IS_USER(E) (E->event_type == CL_COMMAND_USER)

#define CL_EVENT_INVALID_TIMESTAMP 0xFFFFFFFFFFFFFFFF

/* Create a new event object */
extern cl_event cl_event_create(cl_context ctx, cl_command_queue queue,
                                cl_uint num_events, const cl_event *event_list,
                                cl_command_type type, cl_int *errcode_ret);
extern cl_int cl_event_check_waitlist(cl_uint num_events_in_wait_list,
                                      const cl_event *event_wait_list,
                                      cl_event* event, cl_context ctx);
extern cl_uint cl_event_exec(cl_event event, cl_int exec_to_status,
                             cl_bool ignore_depends);
/* 0 means ready, >0 means not ready, <0 means error. */
extern cl_int cl_event_is_ready(cl_event event);
extern cl_int cl_event_get_status(cl_event event);
extern void cl_event_add_ref(cl_event event);
extern void cl_event_delete(cl_event event);
extern cl_int cl_event_set_status(cl_event event, cl_int status);
extern cl_int cl_event_set_callback(cl_event event, cl_int exec_type,
                                    cl_event_notify_cb pfn_notify, void *user_data);
extern cl_int cl_event_wait_for_events_list(cl_uint num_events,
                                            const cl_event *event_list);
extern cl_int cl_event_wait_for_event_ready(cl_event event);
extern cl_event cl_event_create_marker_or_barrier(cl_command_queue queue,
                                                  cl_uint num_events_in_wait_list,
                                                  const cl_event *event_wait_list,
                                                  cl_bool is_barrier,
                                                  cl_int* error);
extern void cl_event_update_timestamp(cl_event event, cl_int status);

#endif /* __CL_EVENT_H__ */
Beignet-1.3.2-Source/src/cl_api_platform_id.c000664 001750 001750 00000004153 13161142102 020214 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

#include "cl_platform_id.h"
#include "CL/cl_ext.h"

cl_int
clGetPlatformInfo(cl_platform_id platform,
                  cl_platform_info param_name,
                  size_t param_value_size,
                  void *param_value,
                  size_t *param_value_size_ret)
{
  const void *src_ptr = NULL;
  size_t src_size = 0;

  if (!CL_OBJECT_IS_PLATFORM(platform)) {
    return CL_INVALID_PLATFORM;
  }

  /* Only one platform now.
*/
  if (platform != cl_get_platform_default()) {
    return CL_INVALID_PLATFORM;
  }

  if (param_name == CL_PLATFORM_PROFILE) {
    src_ptr = platform->profile;
    src_size = platform->profile_sz;
  } else if (param_name == CL_PLATFORM_VERSION) {
    src_ptr = platform->version;
    src_size = platform->version_sz;
  } else if (param_name == CL_PLATFORM_NAME) {
    src_ptr = platform->name;
    src_size = platform->name_sz;
  } else if (param_name == CL_PLATFORM_VENDOR) {
    src_ptr = platform->vendor;
    src_size = platform->vendor_sz;
  } else if (param_name == CL_PLATFORM_EXTENSIONS) {
    src_ptr = platform->extensions;
    src_size = platform->extensions_sz;
  } else if (param_name == CL_PLATFORM_ICD_SUFFIX_KHR) {
    src_ptr = platform->icd_suffix_khr;
    src_size = platform->icd_suffix_khr_sz;
  } else {
    return CL_INVALID_VALUE;
  }

  return cl_get_info_helper(src_ptr, src_size,
                            param_value, param_value_size, param_value_size_ret);
}
Beignet-1.3.2-Source/src/cl_event.c000664 001750 001750 00000043722 13161142102 016211 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */

#include "cl_event.h"
#include "cl_context.h"
#include "cl_command_queue.h"
#include "cl_alloc.h"
#include <stdio.h>
#include <string.h>

// TODO: Need to move it to some device-related file later.
static void
cl_event_update_timestamp_gen(cl_event event, cl_int status)
{
  cl_ulong ts = 0;

  if ((event->exec_data.type == EnqueueCopyBufferRect) ||
      (event->exec_data.type == EnqueueCopyBuffer) ||
      (event->exec_data.type == EnqueueCopyImage) ||
      (event->exec_data.type == EnqueueCopyBufferToImage) ||
      (event->exec_data.type == EnqueueCopyImageToBuffer) ||
      (event->exec_data.type == EnqueueNDRangeKernel) ||
      (event->exec_data.type == EnqueueFillBuffer) ||
      (event->exec_data.type == EnqueueFillImage)) {
    if (status == CL_QUEUED || status == CL_SUBMITTED) {
      cl_gpgpu_event_get_gpu_cur_timestamp(event->queue->ctx->drv, &ts);
      if (ts == CL_EVENT_INVALID_TIMESTAMP)
        ts++;
      event->timestamp[CL_QUEUED - status] = ts;
      return;
    } else if (status == CL_RUNNING) {
      assert(event->exec_data.gpgpu);
      return; // Wait until the event completes, then record RUNNING and COMPLETE together.
    } else {
      assert(event->exec_data.gpgpu);
      cl_gpgpu_event_get_exec_timestamp(event->exec_data.gpgpu, 0, &ts);
      if (ts == CL_EVENT_INVALID_TIMESTAMP)
        ts++;
      event->timestamp[2] = ts;
      cl_gpgpu_event_get_exec_timestamp(event->exec_data.gpgpu, 1, &ts);
      if (ts == CL_EVENT_INVALID_TIMESTAMP)
        ts++;
      event->timestamp[3] = ts;
      /* Set the submit time the same as the running time if it is later.
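     [Editor's note] Presumably needed because the QUEUED/SUBMITTED stamps are
     sampled from the current GPU clock while RUNNING/COMPLETE come from the
     batch's execution report; SUBMIT can therefore land after START, or the
     two can drift apart by more than the 0x0FFFFFFFFFF window when the
     counter wraps. Either way it is clamped down to the START value.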
*/
      if (event->timestamp[1] > event->timestamp[2] ||
          event->timestamp[2] - event->timestamp[1] > 0x0FFFFFFFFFF /* Overflowed */)
        event->timestamp[1] = event->timestamp[2];
      return;
    }
  } else {
    cl_gpgpu_event_get_gpu_cur_timestamp(event->queue->ctx->drv, &ts);
    if (ts == CL_EVENT_INVALID_TIMESTAMP)
      ts++;
    event->timestamp[CL_QUEUED - status] = ts;
    return;
  }
}

LOCAL void
cl_event_update_timestamp(cl_event event, cl_int state)
{
  int i;
  cl_bool re_cal = CL_FALSE;
  cl_ulong ts[4];

  assert(state >= CL_COMPLETE && state <= CL_QUEUED);

  if (event->event_type == CL_COMMAND_USER)
    return;

  assert(event->queue);
  if ((event->queue->props & CL_QUEUE_PROFILING_ENABLE) == 0)
    return;

  /* Should not record the timestamp twice. */
  assert(event->timestamp[CL_QUEUED - state] == CL_EVENT_INVALID_TIMESTAMP);
  cl_event_update_timestamp_gen(event, state);

  if (state == CL_COMPLETE) {
    // TODO: Need to set the real CL_PROFILING_COMMAND_COMPLETE once child
    // enqueue is enabled. Just a duplicate of the event complete time for now.
    event->timestamp[4] = event->timestamp[3];

    /* If a timestamp overflowed, set the queued time to 0 and re-calculate. */
    for (i = 0; i < 4; i++) {
      if (event->timestamp[i + 1] < event->timestamp[i]) {
        re_cal = CL_TRUE;
        break;
      }
    }

    if (re_cal) {
      for (i = 3; i >= 0; i--) {
        if (event->timestamp[i + 1] < event->timestamp[i]) { //overflow
          ts[i] = event->timestamp[i + 1] + (CL_EVENT_INVALID_TIMESTAMP - event->timestamp[i]);
        } else {
          ts[i] = event->timestamp[i + 1] - event->timestamp[i];
        }
      }

      event->timestamp[0] = 0;
      for (i = 1; i < 5; i++) {
        event->timestamp[i] = event->timestamp[i - 1] + ts[i - 1];
      }
    }
  }
}

LOCAL void
cl_event_add_ref(cl_event event)
{
  assert(event);
  CL_OBJECT_INC_REF(event);
}

LOCAL cl_int
cl_event_get_status(cl_event event)
{
  cl_int ret;

  assert(event);
  CL_OBJECT_LOCK(event);
  ret = event->status;
  CL_OBJECT_UNLOCK(event);
  return ret;
}

static cl_event
cl_event_new(cl_context ctx, cl_command_queue queue, cl_command_type type,
             cl_uint num_events, cl_event *event_list)
{
  int i;
  cl_event e = cl_calloc(1, sizeof(_cl_event));
  if (e == NULL)
    return NULL;

  CL_OBJECT_INIT_BASE(e, CL_OBJECT_EVENT_MAGIC);

  /* Append the event to the context event list */
  cl_context_add_event(ctx, e);

  e->queue = queue;
  list_init(&e->callbacks);
  list_node_init(&e->enqueue_node);

  assert(type >= CL_COMMAND_NDRANGE_KERNEL && type <= CL_COMMAND_SVM_UNMAP);
  e->event_type = type;
  if (type == CL_COMMAND_USER) {
    assert(queue == NULL);
    e->status = CL_SUBMITTED;
  } else {
    e->status = CL_EVENT_STATE_UNKNOWN;
  }

  e->depend_events = event_list;
  e->depend_event_num = num_events;
  for (i = 0; i < 4; i++) {
    e->timestamp[i] = CL_EVENT_INVALID_TIMESTAMP;
  }

  return e;
}

LOCAL void
cl_event_delete(cl_event event)
{
  int i;
  cl_event_user_callback cb;

  if (UNLIKELY(event == NULL))
    return;

  if (CL_OBJECT_DEC_REF(event) > 1)
    return;

  cl_enqueue_delete(&event->exec_data);

  assert(list_node_out_of_list(&event->enqueue_node));

  if (event->depend_events) {
    assert(event->depend_event_num);
    for (i = 0; i < event->depend_event_num; i++) {
      cl_event_delete(event->depend_events[i]);
    }
    cl_free(event->depend_events);
  }

  /* Free all the callbacks. Last ref, no need to lock.
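     [Editor's note] Safe without the lock: CL_OBJECT_DEC_REF() returned <= 1
     just above, so no other thread can still hold a reference that would let
     it touch this callback list.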
*/
  while (!list_empty(&event->callbacks)) {
    cb = list_entry(event->callbacks.head_node.n, _cl_event_user_callback, node);
    list_node_del(&cb->node);
    cl_free(cb);
  }

  /* Remove it from the context's event list */
  assert(event->ctx);
  cl_context_remove_event(event->ctx, event);

  CL_OBJECT_DESTROY_BASE(event);
  cl_free(event);
}

LOCAL cl_event
cl_event_create(cl_context ctx, cl_command_queue queue, cl_uint num_events,
                const cl_event *event_list, cl_command_type type, cl_int *errcode_ret)
{
  cl_event e = NULL;
  cl_event *depend_events = NULL;
  cl_int err = CL_SUCCESS;
  cl_uint total_events = 0;
  int i;

  assert(ctx);

  do {
    if (event_list)
      assert(num_events);

    if (queue == NULL) {
      assert(type == CL_COMMAND_USER);
      assert(event_list == NULL);
      assert(num_events == 0);

      e = cl_event_new(ctx, queue, type, 0, NULL);
      if (e == NULL) {
        err = CL_OUT_OF_HOST_MEMORY;
        break;
      }
    } else {
      CL_OBJECT_LOCK(queue);
      total_events = queue->barrier_events_num + num_events;

      if (total_events) {
        depend_events = cl_calloc(total_events, sizeof(cl_event));
        if (depend_events == NULL) {
          CL_OBJECT_UNLOCK(queue);
          err = CL_OUT_OF_HOST_MEMORY;
          break;
        }
      }

      /* Add all the barrier events as depend events. */
      for (i = 0; i < queue->barrier_events_num; i++) {
        assert(CL_EVENT_IS_BARRIER(queue->barrier_events[i]));
        cl_event_add_ref(queue->barrier_events[i]);
        depend_events[num_events + i] = queue->barrier_events[i];
      }

      CL_OBJECT_UNLOCK(queue);

      for (i = 0; i < num_events; i++) {
        assert(event_list && event_list[i]);
        assert(event_list[i]->ctx == ctx);
        assert(CL_OBJECT_IS_EVENT(event_list[i]));
        cl_event_add_ref(event_list[i]);
        depend_events[i] = event_list[i];
      }

      if (depend_events)
        assert(total_events);

      e = cl_event_new(ctx, queue, type, total_events, depend_events);
      if (e == NULL) {
        err = CL_OUT_OF_HOST_MEMORY;
        break;
      }
      depend_events = NULL;
    }
  } while (0);

  if (err != CL_SUCCESS) {
    if (depend_events) {
      for (i = 0; i < total_events; i++) {
        cl_event_delete(depend_events[i]);
      }
      cl_free(depend_events);
    }

    // Once depend_events has been handed to an event, creation has succeeded,
    // so in the error path e is always NULL here (cl_event_delete tolerates NULL).
    assert(e == NULL || e->depend_events == NULL);
    cl_event_delete(e);
  }

  if (errcode_ret)
    *errcode_ret = err;

  return e;
}

LOCAL cl_int
cl_event_set_callback(cl_event event, cl_int exec_type,
                      cl_event_notify_cb pfn_notify, void *user_data)
{
  cl_int err = CL_SUCCESS;
  cl_event_user_callback cb;
  cl_bool exec_imm = CL_FALSE;

  assert(event);
  assert(pfn_notify);

  do {
    cb = cl_calloc(1, sizeof(_cl_event_user_callback));
    if (cb == NULL) {
      err = CL_OUT_OF_HOST_MEMORY;
      break;
    }

    list_node_init(&cb->node);
    cb->pfn_notify = pfn_notify;
    cb->user_data = user_data;
    cb->status = exec_type;
    cb->executed = CL_FALSE;

    CL_OBJECT_LOCK(event);
    if (event->status > exec_type) {
      list_add_tail(&event->callbacks, &cb->node);
      cb = NULL;
    } else { /* The status has already passed exec_type; call it immediately. */
      exec_imm = CL_TRUE;
    }
    CL_OBJECT_UNLOCK(event);

    if (exec_imm) {
      cb->pfn_notify(event, event->status, cb->user_data);
    }
  } while (0);

  if (cb)
    cl_free(cb);

  return err;
}

LOCAL cl_int
cl_event_set_status(cl_event event, cl_int status)
{
  list_head tmp_callbacks;
  list_node *n;
  list_node *pos;
  cl_bool notify_queue = CL_FALSE;
  cl_event_user_callback cb;

  assert(event);

  CL_OBJECT_LOCK(event);
  if (event->status <= CL_COMPLETE) { // Already set to an error or completed
    CL_OBJECT_UNLOCK(event);
    return CL_INVALID_OPERATION;
  }

  if (CL_EVENT_IS_USER(event)) {
    assert(event->status != CL_RUNNING && event->status != CL_QUEUED);
  } else {
    assert(event->queue); // Must belong to some queue.
  }

  if (status >= event->status) { // The status must never move backward.
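  /* [Editor's note] Status values are ordered CL_QUEUED(3) > CL_SUBMITTED(2)
   * > CL_RUNNING(1) > CL_COMPLETE(0) > negative error codes, so a legal
   * update always moves strictly downward. E.g. a user event created at
   * CL_SUBMITTED may go to CL_COMPLETE via clSetUserEventStatus(), but a
   * second transition back to CL_SUBMITTED would "go back" and is rejected. */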
CL_OBJECT_UNLOCK(event);
    return CL_INVALID_OPERATION;
  }

  event->status = status;

  /* Call all the callbacks. */
  if (!list_empty(&event->callbacks)) {
    do {
      status = event->status;
      list_init(&tmp_callbacks);
      list_move(&event->callbacks, &tmp_callbacks);
      /* Call all the callbacks without the lock held. */
      CL_OBJECT_UNLOCK(event);

      list_for_each_safe(pos, n, &tmp_callbacks)
      {
        cb = list_entry(pos, _cl_event_user_callback, node);
        assert(cb->executed == CL_FALSE);

        if (cb->status < status)
          continue;

        list_node_del(&cb->node);
        cb->executed = CL_TRUE;
        cb->pfn_notify(event, status, cb->user_data);
        cl_free(cb);
      }

      CL_OBJECT_LOCK(event);

      // Put back the callbacks that were not called.
      list_merge(&event->callbacks, &tmp_callbacks);
      /* The status may have changed while we were unlocked; check again. */
    } while (status != event->status);
  }

  /* Wake up all the waiters on the status change. */
  CL_OBJECT_NOTIFY_COND(event);

  if (event->status <= CL_COMPLETE) {
    notify_queue = CL_TRUE;
  }

  CL_OBJECT_UNLOCK(event);

  /* Need to notify all the command queues within the same context. */
  if (notify_queue) {
    cl_command_queue queue = NULL;

    /* First, we need to remove it from the queue's barrier list. */
    if (CL_EVENT_IS_BARRIER(event)) {
      assert(event->queue);
      cl_command_queue_remove_barrier_event(event->queue, event);
    }

    /* Then, notify all the queues within the same context. */
    CL_OBJECT_LOCK(event->ctx);
    /* Temporarily disable adding and removing queues on the context. We need
       to make sure all the queues in the context stay valid while we walk them. */
    event->ctx->queue_modify_disable++;
    CL_OBJECT_UNLOCK(event->ctx);
    list_for_each(pos, &event->ctx->queues)
    {
      queue = (cl_command_queue)(list_entry(pos, _cl_base_object, node));
      assert(queue != NULL);
      cl_command_queue_notify(queue);
    }
    CL_OBJECT_LOCK(event->ctx);
    /* Re-enable adding and removing queues on the context. */
    event->ctx->queue_modify_disable--;
    CL_OBJECT_NOTIFY_COND(event->ctx);
    CL_OBJECT_UNLOCK(event->ctx);
  }

  return CL_SUCCESS;
}

LOCAL cl_int
cl_event_wait_for_event_ready(const cl_event event)
{
  assert(CL_OBJECT_IS_EVENT(event));
  return cl_event_wait_for_events_list(event->depend_event_num, event->depend_events);
}

LOCAL cl_int
cl_event_wait_for_events_list(cl_uint num_events, const cl_event *event_list)
{
  int i;
  cl_event e;
  cl_int ret = CL_SUCCESS;

  for (i = 0; i < num_events; i++) {
    e = event_list[i];
    assert(e);
    assert(CL_OBJECT_IS_EVENT(e));

    CL_OBJECT_LOCK(e);
    while (e->status > CL_COMPLETE) {
      CL_OBJECT_WAIT_ON_COND(e);
    }
    assert(e->status <= CL_COMPLETE);
    /* If some error happened, return the error. */
    if (e->status < CL_COMPLETE) {
      ret = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST;
    }
    CL_OBJECT_UNLOCK(e);
  }

  return ret;
}

LOCAL cl_int
cl_event_check_waitlist(cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
                        cl_event *event, cl_context ctx)
{
  cl_int err = CL_SUCCESS;
  cl_int i;

  do {
    /* Check the event_wait_list and num_events_in_wait_list */
    if ((event_wait_list == NULL) && (num_events_in_wait_list > 0)) {
      err = CL_INVALID_EVENT_WAIT_LIST;
      break;
    }

    if ((event_wait_list != NULL) && (num_events_in_wait_list == 0)) {
      err = CL_INVALID_EVENT_WAIT_LIST;
      break;
    }

    /* Check the events and the context */
    for (i = 0; i < num_events_in_wait_list; i++) {
      if (!CL_OBJECT_IS_EVENT(event_wait_list[i])) {
        err = CL_INVALID_EVENT_WAIT_LIST;
        break;
      }

      if (event == event_wait_list + i) { /* Pointer to an element of the wait list itself */
        err = CL_INVALID_EVENT_WAIT_LIST;
        break;
      }

      /* Check that all events belong to the same context.
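     [Editor's note] ctx may legitimately be NULL on entry; in that case the
     first event's context becomes the reference that every remaining event is
     checked against, as the code below shows.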
*/
      if (ctx == NULL) {
        ctx = event_wait_list[i]->ctx;
      }
      if (event_wait_list[i]->ctx != ctx) {
        err = CL_INVALID_CONTEXT;
        break;
      }
    }

    if (err != CL_SUCCESS)
      break;
  } while (0);

  return err;
}

/* When we call this function, all the events it depends on should already
   be ready, unless ignore_depends is set. */
LOCAL cl_uint
cl_event_exec(cl_event event, cl_int exec_to_status, cl_bool ignore_depends)
{
  /* We are MT safe here; no one should call this on the same event at the
     same time. No need to lock. */
  cl_int ret = CL_SUCCESS;
  cl_int cur_status = cl_event_get_status(event);
  cl_int depend_status;
  cl_int s;

  assert(exec_to_status >= CL_COMPLETE);
  assert(exec_to_status <= CL_QUEUED);
  if (cur_status < CL_COMPLETE) {
    return cur_status;
  }

  depend_status = cl_event_is_ready(event);
  assert(depend_status <= CL_COMPLETE || ignore_depends || exec_to_status == CL_QUEUED);
  if (depend_status < CL_COMPLETE) { // An error happened; cancel the exec.
    ret = cl_event_set_status(event, depend_status);
    return depend_status;
  }

  if (cur_status <= exec_to_status) {
    return ret;
  }

  /* Exec to the target status. */
  for (s = cur_status - 1; s >= exec_to_status; s--) {
    assert(s >= CL_COMPLETE);
    ret = cl_enqueue_handle(&event->exec_data, s);

    if (ret != CL_SUCCESS) {
      assert(ret < 0);
      DEBUGP(DL_WARNING, "Exec event %p error, type is %d, error status is %d",
             event, event->event_type, ret);
      ret = cl_event_set_status(event, ret);
      assert(ret == CL_SUCCESS);
      return ret; // Failed; never proceed any further.
    } else {
      assert(!CL_EVENT_IS_USER(event));
      if ((event->queue->props & CL_QUEUE_PROFILING_ENABLE) != 0) {
        /* Record the timestamp before actually doing something. */
        cl_event_update_timestamp(event, s);
      }

      ret = cl_event_set_status(event, s);
      assert(ret == CL_SUCCESS);
    }
  }

  return ret;
}

/* 0 means ready, >0 means not ready, <0 means error. */
LOCAL cl_int
cl_event_is_ready(cl_event event)
{
  int i;
  int status;
  int ret_status = CL_COMPLETE;

  for (i = 0; i < event->depend_event_num; i++) {
    status = cl_event_get_status(event->depend_events[i]);

    if (status > CL_COMPLETE) { // Found one that is not ready yet; report it.
      return status;
    }

    if (status < CL_COMPLETE) { // Record the error.
      ret_status = status;
    }
  }

  return ret_status;
}

LOCAL cl_event
cl_event_create_marker_or_barrier(cl_command_queue queue, cl_uint num_events_in_wait_list,
                                  const cl_event *event_wait_list, cl_bool is_barrier,
                                  cl_int *error)
{
  cl_event e = NULL;
  cl_int err = CL_SUCCESS;
  cl_command_type type = CL_COMMAND_MARKER;
  enqueue_type eq_type = EnqueueMarker;

  if (is_barrier) {
    type = CL_COMMAND_BARRIER;
    eq_type = EnqueueBarrier;
  }

  if (event_wait_list) {
    assert(num_events_in_wait_list > 0);

    e = cl_event_create(queue->ctx, queue, num_events_in_wait_list,
                        event_wait_list, type, &err);
    if (err != CL_SUCCESS) {
      *error = err;
      return NULL;
    }
  } else { /* The marker depends on all events in the queue now. */
    cl_command_queue_enqueue_worker worker = &queue->worker;
    cl_uint i;
    cl_uint event_num;
    cl_event *depend_events;

    CL_OBJECT_LOCK(queue);

    /* First, wait for the command queue to retire all in-flight events. */
    while (1) {
      if (worker->quit) { // Queue is already being destroyed?
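      /* [Editor's note] Standard condition-variable loop: CL_OBJECT_WAIT_ON_COND()
         drops the queue lock while sleeping and re-acquires it on wakeup, so
         worker->quit and worker->in_exec_status are re-tested on every pass. */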
CL_OBJECT_UNLOCK(queue);
        *error = CL_INVALID_COMMAND_QUEUE;
        return NULL;
      }

      if (worker->in_exec_status != CL_COMPLETE) {
        CL_OBJECT_WAIT_ON_COND(queue);
        continue;
      }

      break;
    }

    event_num = 0;
    depend_events = NULL;
    if (!list_empty(&worker->enqueued_events)) {
      depend_events = cl_command_queue_record_in_queue_events(queue, &event_num);
    }

    CL_OBJECT_UNLOCK(queue);

    e = cl_event_create(queue->ctx, queue, event_num, depend_events, type, &err);

    for (i = 0; i < event_num; i++) { // Unref the temporary references.
      cl_event_delete(depend_events[i]);
    }
    if (depend_events)
      cl_free(depend_events);

    if (err != CL_SUCCESS) {
      *error = err;
      return NULL;
    }
  }

  e->exec_data.type = eq_type;
  *error = CL_SUCCESS;
  return e;
}
Beignet-1.3.2-Source/src/cl_extensions.c000664 001750 001750 00000012061 13173554000 017266 0ustar00yryr000000 000000 #include "llvm/Config/llvm-config.h"
#ifdef HAS_GL_EGL
#include "EGL/egl.h"
#include "EGL/eglext.h"
#endif
#include "cl_platform_id.h"
#include "cl_device_id.h"
#include "cl_internals.h"
#include "CL/cl.h"
#include "cl_utils.h"
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* These extensions should be common to all Intel GPU platforms.
   Every device may have its own additional extensions. */
static struct cl_extensions intel_platform_extensions =
{
  {
#define DECL_EXT(name) \
  {(struct cl_extension_base){.ext_id = cl_##name##_ext_id, .ext_name = "cl_" #name, .ext_enabled = 0}},
  DECL_ALL_EXTENSIONS
  },
#undef DECL_EXT
  {""}
};

void check_basic_extension(cl_extensions_t *extensions)
{
  int id;
  for(id = BASE_EXT_START_ID; id <= BASE_EXT_END_ID; id++)
    if (id != EXT_ID(khr_fp64))
      extensions->extensions[id].base.ext_enabled = 1;
}

void check_opt1_extension(cl_extensions_t *extensions)
{
  int id;
  for(id = OPT1_EXT_START_ID; id <= OPT1_EXT_END_ID; id++)
  {
    if (id == EXT_ID(khr_icd))
      extensions->extensions[id].base.ext_enabled = 1;
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35
    if (id == EXT_ID(khr_spir))
      extensions->extensions[id].base.ext_enabled = 1;
#endif
    if (id == EXT_ID(khr_image2d_from_buffer))
      extensions->extensions[id].base.ext_enabled = 1;
    if (id == EXT_ID(khr_3d_image_writes))
      extensions->extensions[id].base.ext_enabled = 1;
  }
}

void
check_gl_extension(cl_extensions_t *extensions)
{
#if defined(HAS_GL_EGL)
  int id;
  /* For now, we only support cl_khr_gl_sharing. */
  for(id = GL_EXT_START_ID; id <= GL_EXT_END_ID; id++)
    if (id == EXT_ID(khr_gl_sharing))
      extensions->extensions[id].base.ext_enabled = 1;
#endif
}

void
check_intel_extension(cl_extensions_t *extensions)
{
  int id;
  for(id = INTEL_EXT_START_ID; id <= INTEL_EXT_END_ID; id++)
  {
    if(id != EXT_ID(intel_motion_estimation))
      extensions->extensions[id].base.ext_enabled = 1;
    if(id == EXT_ID(intel_required_subgroup_size))
#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR > 40
      extensions->extensions[id].base.ext_enabled = 1;
#else
      extensions->extensions[id].base.ext_enabled = 0;
#endif
  }
}

void
process_extension_str(cl_extensions_t *extensions)
{
  int str_max = sizeof(extensions->ext_str);
  int str_offset = 0;
  int id;

  memset(extensions->ext_str, 0, sizeof(extensions->ext_str));

  for(id = 0; id < cl_khr_extension_id_max; id++)
  {
    if (extensions->extensions[id].base.ext_enabled) {
      int copy_len;
      char *ext_name = extensions->extensions[id].base.ext_name;
      if (str_offset + 1 >= str_max)
        return;

      if (str_offset != 0)
        extensions->ext_str[str_offset - 1] = ' ';
      copy_len = (strlen(ext_name) + 1 + str_offset) < str_max ?
(strlen(ext_name) + 1) : (str_max - str_offset - 1); strncpy(&extensions->ext_str[str_offset], extensions->extensions[id].base.ext_name, copy_len); str_offset += copy_len; } } } LOCAL void cl_intel_platform_get_default_extension(cl_device_id device) { cl_platform_id pf = device->platform; memcpy((char*)device->extensions, pf->internal_extensions->ext_str, sizeof(device->extensions)); device->extensions_sz = strlen(pf->internal_extensions->ext_str) + 1; } LOCAL void cl_intel_platform_enable_extension(cl_device_id device, uint32_t ext) { int id; char* ext_str = NULL; cl_platform_id pf = device->platform; assert(pf); for(id = BASE_EXT_START_ID; id < cl_khr_extension_id_max; id++) { if (id == ext) { if (!pf->internal_extensions->extensions[id].base.ext_enabled) ext_str = pf->internal_extensions->extensions[id].base.ext_name; break; } } /* already enabled, skip. */ if (ext_str && strstr(device->extensions, ext_str)) ext_str = NULL; if (ext_str) { if (device->extensions_sz <= 1) { memcpy((char*)device->extensions, ext_str, strlen(ext_str)); device->extensions_sz = strlen(ext_str) + 1; } else { assert(device->extensions_sz + 1 + strlen(ext_str) < EXTENSTION_LENGTH); *(char*)(device->extensions + device->extensions_sz - 1) = ' '; memcpy((char*)device->extensions + device->extensions_sz, ext_str, strlen(ext_str)); device->extensions_sz = device->extensions_sz + strlen(ext_str) + 1; } *(char*)(device->extensions + device->extensions_sz - 1) = 0; } } LOCAL void cl_intel_platform_extension_init(cl_platform_id intel_platform) { static int ext_initialized = 0; /* The EXT should be only inited once. */ (void) ext_initialized; assert(!ext_initialized); check_basic_extension(&intel_platform_extensions); check_opt1_extension(&intel_platform_extensions); check_gl_extension(&intel_platform_extensions); check_intel_extension(&intel_platform_extensions); process_extension_str(&intel_platform_extensions); ext_initialized = 1; intel_platform->internal_extensions = &intel_platform_extensions; intel_platform->extensions = intel_platform_extensions.ext_str; intel_platform->extensions_sz = strlen(intel_platform->extensions) + 1; return; } Beignet-1.3.2-Source/src/cl_utils.h000664 001750 001750 00000043567 13161142102 016244 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #ifndef __CL_UTILS_H__ #define __CL_UTILS_H__ #include "CL/cl.h" /* INLINE is forceinline */ #define INLINE __attribute__((always_inline)) inline /* Branch hint */ #define LIKELY(x) __builtin_expect((x),1) #define UNLIKELY(x) __builtin_expect((x),0) /* Stringify macros */ #define JOIN(X, Y) _DO_JOIN(X, Y) #define _DO_JOIN(X, Y) _DO_JOIN2(X, Y) #define _DO_JOIN2(X, Y) X##Y enum DEBUGP_LEVEL { DL_INFO, DL_WARNING, DL_ERROR }; #ifdef NDEBUG #define DEBUGP(...) #else //TODO: decide print or not with the value of level from environment #define DEBUGP(level, fmt, ...) 
\ do { \ fprintf(stderr, "Beignet: "#fmt, ##__VA_ARGS__); \ fprintf(stderr, "\n"); \ } while (0) #endif /* Check compile time errors */ #define STATIC_ASSERT(value) \ struct JOIN(__,JOIN(__,__LINE__)) { \ int x[(value) ? 1 : -1]; \ } /* Throw errors */ #ifdef NDEBUG #define ERR(ERROR, ...) \ do { \ err = ERROR; \ goto error; \ } while (0) #else #define ERR(ERROR, ...) \ do { \ fprintf(stderr, "error in %s line %i\n", __FILE__, __LINE__); \ fprintf(stderr, __VA_ARGS__); \ fprintf(stderr, "\n"); \ err = ERROR; \ goto error; \ } while (0) #endif #define DO_ALLOC_ERR \ do { \ ERR(CL_OUT_OF_HOST_MEMORY, "Out of memory"); \ } while (0) #define ERR_IF(COND, ERROR, ...) \ do { \ if (UNLIKELY(COND)) ERR (ERROR, __VA_ARGS__); \ } while (0) #define INVALID_VALUE_IF(COND) \ do { \ ERR_IF(COND, CL_INVALID_VALUE, "Invalid value"); \ } while (0) #define INVALID_DEVICE_IF(COND) \ do { \ ERR_IF(COND, CL_INVALID_DEVICE, "Invalid device"); \ } while (0) #define MAX(x0, x1) ((x0) > (x1) ? (x0) : (x1)) #define MIN(x0, x1) ((x0) < (x1) ? (x0) : (x1)) #define ALIGN(A, B) (((A) % (B)) ? (A) + (B) - ((A) % (B)) : (A)) #define DO_ALLOC_ERROR \ do { \ err = CL_OUT_OF_HOST_MEMORY; \ goto error; \ } while (0) #define FATAL(...) \ do { \ fprintf(stderr, "error: "); \ fprintf(stderr, __VA_ARGS__); \ fprintf(stderr, "\n"); \ assert(0); \ exit(-1); \ } while (0) #define FATAL_IF(COND, ...) \ do { \ if (UNLIKELY(COND)) FATAL(__VA_ARGS__); \ } while (0) #define NOT_IMPLEMENTED FATAL ("Not implemented") #define CHECK_CONTEXT(CTX) \ do { \ if (UNLIKELY(CTX == NULL)) { \ err = CL_INVALID_CONTEXT; \ goto error; \ } \ if (UNLIKELY(!CL_OBJECT_IS_CONTEXT(CTX))) { \ err = CL_INVALID_CONTEXT; \ goto error; \ } \ } while (0) #define CHECK_QUEUE(QUEUE) \ do { \ if (UNLIKELY(QUEUE == NULL)) { \ err = CL_INVALID_COMMAND_QUEUE; \ goto error; \ } \ if (UNLIKELY(!CL_OBJECT_IS_COMMAND_QUEUE(QUEUE))) { \ err = CL_INVALID_COMMAND_QUEUE; \ goto error; \ } \ } while (0) #define CHECK_MEM(MEM) \ do { \ if (UNLIKELY(MEM == NULL)) { \ err = CL_INVALID_MEM_OBJECT; \ goto error; \ } \ if (UNLIKELY(!CL_OBJECT_IS_MEM(MEM))) { \ err = CL_INVALID_MEM_OBJECT; \ goto error; \ } \ } while (0) #define CHECK_IMAGE(MEM, IMAGE) \ CHECK_MEM(MEM); \ do { \ if (UNLIKELY(!IS_IMAGE(MEM))) { \ err = CL_INVALID_MEM_OBJECT; \ goto error; \ } \ } while (0); \ struct _cl_mem_image *IMAGE; \ IMAGE = cl_mem_image(MEM); \ #define FIXUP_IMAGE_REGION(IMAGE, PREGION, REGION) \ const size_t *REGION; \ size_t REGION ##_REC[3]; \ do { \ if (PREGION == NULL) \ { \ err = CL_INVALID_VALUE; \ goto error; \ } \ if (IMAGE->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) { \ REGION ##_REC[0] = PREGION[0]; \ REGION ##_REC[1] = 1; \ REGION ##_REC[2] = PREGION[1]; \ REGION = REGION ##_REC; \ } else { \ REGION = PREGION; \ } \ if((REGION[0] == 0)||(REGION[1] == 0)||(REGION[2] == 0)) \ { \ err = CL_INVALID_VALUE; \ goto error; \ } \ } while(0) #define FIXUP_IMAGE_ORIGIN(IMAGE, PREGION, REGION) \ const size_t *REGION; \ size_t REGION ##_REC[3]; \ do { \ if (PREGION == NULL) \ { \ err = CL_INVALID_VALUE; \ goto error; \ } \ if (IMAGE->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) { \ REGION ##_REC[0] = PREGION[0]; \ REGION ##_REC[1] = 0; \ REGION ##_REC[2] = PREGION[1]; \ REGION = REGION ##_REC; \ } else { \ REGION = PREGION; \ } \ } while(0) #define CHECK_EVENT(EVENT) \ do { \ if (UNLIKELY(EVENT == NULL)) { \ err = CL_INVALID_EVENT; \ goto error; \ } \ if (UNLIKELY(!CL_OBJECT_IS_EVENT(EVENT))) { \ err = CL_INVALID_EVENT; \ goto error; \ } \ } while (0) #define CHECK_SAMPLER(SAMPLER) \ do { \ if 
(UNLIKELY(SAMPLER == NULL)) { \ err = CL_INVALID_SAMPLER; \ goto error; \ } \ if (UNLIKELY(!CL_OBJECT_IS_SAMPLER(SAMPLER))) { \ err = CL_INVALID_SAMPLER; \ goto error; \ } \ } while (0) #define CHECK_ACCELERATOR_INTEL(ACCELERATOR_INTEL) \ do { \ if (UNLIKELY(ACCELERATOR_INTEL == NULL)) { \ err = CL_INVALID_ACCELERATOR_INTEL; \ goto error; \ } \ if (UNLIKELY(!CL_OBJECT_IS_ACCELERATOR_INTEL(ACCELERATOR_INTEL))) { \ err = CL_INVALID_ACCELERATOR_INTEL; \ goto error; \ } \ } while (0) #define CHECK_KERNEL(KERNEL) \ do { \ if (UNLIKELY(KERNEL == NULL)) { \ err = CL_INVALID_KERNEL; \ goto error; \ } \ if (UNLIKELY(!CL_OBJECT_IS_KERNEL(KERNEL))) { \ err = CL_INVALID_KERNEL; \ goto error; \ } \ } while (0) #define CHECK_PROGRAM(PROGRAM) \ do { \ if (UNLIKELY(PROGRAM == NULL)) { \ err = CL_INVALID_PROGRAM; \ goto error; \ } \ if (UNLIKELY(!CL_OBJECT_IS_PROGRAM(PROGRAM))) { \ err = CL_INVALID_PROGRAM; \ goto error; \ } \ } while (0) #define ELEMENTS(x) (sizeof(x)/sizeof(*(x))) #define CALLOC_STRUCT(T) (struct T*) cl_calloc(1, sizeof(struct T)) #define CALLOC(T) (T*) cl_calloc(1, sizeof(T)) #define CALLOC_ARRAY(T, N) (T*) cl_calloc(N, sizeof(T)) #define MEMZERO(x) do { memset((x),0,sizeof(*(x))); } while (0) /* Run some code and catch errors */ #define TRY(fn,...) \ do { \ if (UNLIKELY((err = fn(__VA_ARGS__)) != CL_SUCCESS)) \ goto error; \ } while (0) #define TRY_NO_ERR(fn,...) \ do { \ if (UNLIKELY(fn(__VA_ARGS__) != CL_SUCCESS)) \ goto error; \ } while (0) #define TRY_ALLOC(dst, EXPR) \ do { \ if (UNLIKELY((dst = EXPR) == NULL)) \ DO_ALLOC_ERROR; \ } while (0) #define TRY_ALLOC_NO_ERR(dst, EXPR) \ do { \ if (UNLIKELY((dst = EXPR) == NULL)) \ goto error; \ } while (0) #define TRY_ALLOC_NO_RET(EXPR) \ do { \ if (UNLIKELY((EXPR) == NULL)) \ DO_ALLOC_ERROR; \ } while (0) /* Break Point Definitions */ #if !defined(NDEBUG) #define BREAK \ do { \ __asm__("int3"); \ } while(0) #define BREAK_IF(value) \ do { \ if (UNLIKELY(!(value))) BREAKPOINT(); \ } while(0) #else #define BREAKPOINT() do { } while(0) #define ASSERT(value) do { } while(0) #endif /* For all internal functions */ #define LOCAL __attribute__ ((visibility ("internal"))) /* Align a structure or a variable */ #define ALIGNED(X) __attribute__ ((aligned (X))) /* Number of DWORDS */ #define SIZEOF32(X) (sizeof(X) / sizeof(uint32_t)) /* Memory quantity */ #define KB 1024 #define MB (KB*KB) /* To help bitfield definitions */ #define BITFIELD_BIT(X) 1 #define BITFIELD_RANGE(X,Y) ((Y) - (X) + 1) /* 32 bits atomic variable */ typedef volatile int atomic_t; static INLINE int atomic_add(atomic_t *v, const int c) { register int i = c; __asm__ __volatile__("lock ; xaddl %0, %1;" : "+r"(i), "+m"(*v) : "m"(*v), "r"(i)); return i; } static INLINE int atomic_read(atomic_t *v) { return *v; } static INLINE int atomic_inc(atomic_t *v) { return atomic_add(v, 1); } static INLINE int atomic_dec(atomic_t *v) { return atomic_add(v, -1); } /* Define one list node. 
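     [Editor's note] These are circular doubly-linked lists: a detached node
     points at itself (see list_node_init()/list_node_out_of_list() below) and
     a list_head is just a sentinel node. Minimal usage sketch with
     hypothetical variables:

       list_head h;  list_node n;
       list_init(&h);  list_node_init(&n);
       list_add_tail(&h, &n);               // n becomes the only element
       assert(!list_node_out_of_list(&n));
       list_node_del(&n);                   // n points at itself again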
*/ typedef struct list_node { struct list_node *n; struct list_node *p; } list_node; typedef struct list_head { list_node head_node; } list_head; static inline void list_node_init(list_node *node) { node->n = node; node->p = node; } static inline int list_node_out_of_list(const struct list_node *node) { return node->n == node; } static inline void list_init(list_head *head) { head->head_node.n = &head->head_node; head->head_node.p = &head->head_node; } extern void list_node_insert_before(list_node *node, list_node *the_new); extern void list_node_insert_after(list_node *node, list_node *the_new); static inline void list_node_del(struct list_node *node) { node->n->p = node->p; node->p->n = node->n; /* And all point to self for safe. */ node->p = node; node->n = node; } static inline void list_add(list_head *head, list_node *the_new) { list_node_insert_after(&head->head_node, the_new); } static inline void list_add_tail(list_head *head, list_node *the_new) { list_node_insert_before(&head->head_node, the_new); } static inline int list_empty(const struct list_head *head) { return head->head_node.n == &head->head_node; } /* Move the content from one head to another. */ extern void list_move(struct list_head *the_old, struct list_head *the_new); /* Merge the content of the two lists to one head. */ extern void list_merge(struct list_head *head, struct list_head *to_merge); #undef offsetof #ifdef __compiler_offsetof #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER) #else #define offsetof(TYPE, MEMBER) ((size_t) & ((TYPE *)0)->MEMBER) #endif #define list_entry(ptr, type, member) ({ \ const typeof( ((type *)0)->member ) *__mptr = (ptr); \ (type *)( (char *)__mptr - offsetof(type,member) ); }) #define list_for_each(pos, head) \ for (pos = (head)->head_node.n; pos != &((head)->head_node); pos = pos->n) #define list_for_each_safe(pos, ne, head) \ for (pos = (head)->head_node.n, ne = pos->n; pos != &((head)->head_node); \ pos = ne, ne = pos->n) extern cl_int cl_get_info_helper(const void *src, size_t src_size, void *dst, size_t dst_size, size_t *ret_size); #endif /* __CL_UTILS_H__ */ Beignet-1.3.2-Source/src/cl_program.c000664 001750 001750 00000070360 13173554000 016544 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ #include "cl_kernel.h" #include "cl_program.h" #include "cl_device_id.h" #include "cl_context.h" #include "cl_alloc.h" #include "cl_utils.h" #include "cl_khr_icd.h" #include "cl_gbe_loader.h" #include "cl_cmrt.h" #include "CL/cl.h" #include "CL/cl_intel.h" #include "CL/cl_ext.h" #include #include #include #include #include #include #include #include static void cl_program_release_sources(cl_program p) { if (p->source) { cl_free(p->source); p->source = NULL; } } static void cl_program_release_binary(cl_program p) { if (p->binary) { cl_free(p->binary); p->binary = NULL; } } LOCAL void cl_program_delete(cl_program p) { uint32_t ref, i; if (p == NULL) return; /* We are not done with it yet */ if ((ref = CL_OBJECT_DEC_REF(p)) > 1) return; /* Destroy the sources and binary if still allocated */ cl_program_release_sources(p); cl_program_release_binary(p); /* Release the build options. */ if (p->build_opts) { cl_free(p->build_opts); p->build_opts = NULL; } if (p->build_log) { free(p->build_log); p->build_log = NULL; } #ifdef HAS_CMRT if (p->cmrt_program != NULL) cmrt_destroy_program(p); else #endif { cl_free(p->bin); /* Free the blob */ for (i = 0; i < p->ker_n; ++i) /* Free the kernels */ cl_kernel_delete(p->ker[i]); cl_free(p->ker); } if (p->global_data_ptr) cl_buffer_unreference(p->global_data); cl_free(p->global_data_ptr); /* Remove it from the list */ cl_context_remove_program(p->ctx, p); /* Free the program as allocated by the compiler */ if (p->opaque) { if (CompilerSupported()) //For static variables release, gbeLoader may have been released, so //compiler_program_clean_llvm_resource and interp_program_delete may be NULL. if(compiler_program_clean_llvm_resource) compiler_program_clean_llvm_resource(p->opaque); if(interp_program_delete) interp_program_delete(p->opaque); } CL_OBJECT_DESTROY_BASE(p); cl_free(p); } #define BUILD_LOG_MAX_SIZE (1024*1024U) LOCAL cl_program cl_program_new(cl_context ctx) { cl_program p = NULL; /* Allocate the structure */ TRY_ALLOC_NO_ERR (p, CALLOC(struct _cl_program)); CL_OBJECT_INIT_BASE(p, CL_OBJECT_PROGRAM_MAGIC); p->build_status = CL_BUILD_NONE; p->cmrt_program = NULL; p->build_log = calloc(BUILD_LOG_MAX_SIZE, sizeof(char)); if (p->build_log) p->build_log_max_sz = BUILD_LOG_MAX_SIZE; /* The queue also belongs to its context */ cl_context_add_program(ctx, p); exit: return p; error: cl_program_delete(p); goto exit; } LOCAL void cl_program_add_ref(cl_program p) { assert(p); CL_OBJECT_INC_REF(p); } static cl_int cl_program_load_gen_program(cl_program p) { cl_int err = CL_SUCCESS; uint32_t i; assert(p->opaque != NULL); p->ker_n = interp_program_get_kernel_num(p->opaque); /* Allocate the kernel array */ TRY_ALLOC (p->ker, CALLOC_ARRAY(cl_kernel, p->ker_n)); for (i = 0; i < p->ker_n; ++i) { const gbe_kernel opaque = interp_program_get_kernel(p->opaque, i); assert(opaque != NULL); TRY_ALLOC (p->ker[i], cl_kernel_new(p)); cl_kernel_setup(p->ker[i], opaque); } error: return err; } #define BINARY_HEADER_LENGTH 5 static const unsigned char binary_type_header[BHI_MAX][BINARY_HEADER_LENGTH]= \ {{'B','C', 0xC0, 0xDE}, {1, 'B', 'C', 0xC0, 0xDE}, {2, 'B', 'C', 0xC0, 0xDE}, {1, 'G','E', 'N', 'C'}, {'C','I', 'S', 'A'}, }; LOCAL cl_bool headerCompare(const unsigned char *BufPtr, BINARY_HEADER_INDEX index) { bool matched = true; int length = (index == BHI_SPIR || index == BHI_CMRT) ? 
BINARY_HEADER_LENGTH - 1 : BINARY_HEADER_LENGTH;
  int i = 0;
  if(index == BHI_GEN_BINARY) i = 1;
  for (; i < length; ++i) { matched = matched && (BufPtr[i] == binary_type_header[index][i]); }
  if(index == BHI_GEN_BINARY && matched) {
    if(BufPtr[0] != binary_type_header[index][0]) {
      DEBUGP(DL_WARNING, "The Beignet binary format has changed, please regenerate the binary.\n");
      matched = false;
    }
  }
  return matched;
}
#define isSPIR(BufPtr) headerCompare(BufPtr, BHI_SPIR)
#define isLLVM_C_O(BufPtr) headerCompare(BufPtr, BHI_COMPIRED_OBJECT)
#define isLLVM_LIB(BufPtr) headerCompare(BufPtr, BHI_LIBRARY)
#define isGenBinary(BufPtr) headerCompare(BufPtr, BHI_GEN_BINARY)
#define isCMRT(BufPtr) headerCompare(BufPtr, BHI_CMRT)
static cl_int
get_program_global_data(cl_program prog) {
  //OpenCL 1.2 never calls this function, and OpenCL 2.0 always has HAS_BO_SET_SOFTPIN.
#ifdef HAS_BO_SET_SOFTPIN
  cl_buffer_mgr bufmgr = NULL;
  bufmgr = cl_context_get_bufmgr(prog->ctx);
  assert(bufmgr);
  size_t const_size = interp_program_get_global_constant_size(prog->opaque);
  if (const_size == 0) return CL_SUCCESS;
  int page_size = getpagesize();
  size_t alignedSz = ALIGN(const_size, page_size);
  char * p = (char*)cl_aligned_malloc(alignedSz, page_size);
  prog->global_data_ptr = p;
  interp_program_get_global_constant_data(prog->opaque, (char*)p);
  prog->global_data = cl_buffer_alloc_userptr(bufmgr, "program global data", p, alignedSz, 0);
  cl_buffer_set_softpin_offset(prog->global_data, (size_t)p);
  cl_buffer_set_bo_use_full_range(prog->global_data, 1);
  uint32_t reloc_count = interp_program_get_global_reloc_count(prog->opaque);
  if (reloc_count > 0) {
    uint32_t x;
    struct RelocEntry {int refOffset; int defOffset;};
    char *temp = (char*) malloc(reloc_count *sizeof(int)*2);
    interp_program_get_global_reloc_table(prog->opaque, temp);
    /* Patch each reference slot with the absolute address of its definition. */
    for (x = 0; x < reloc_count; x++) {
      int ref_offset = ((struct RelocEntry *)temp)[x].refOffset;
      *(uint64_t*)&(p[ref_offset]) = ((struct RelocEntry *)temp)[x].defOffset + (uint64_t)p;
    }
    free(temp);
  }
#if 0
  int x = 0;
  for (x = 0; x < const_size; x++) {
    printf("offset %d data: %x\n", x, (unsigned)p[x]);
  }
#endif
#endif
  return CL_SUCCESS;
}
LOCAL size_t
cl_program_get_global_variable_size(cl_program prog) {
  return interp_program_get_global_constant_size(prog->opaque);
}
LOCAL cl_program
cl_program_create_from_binary(cl_context ctx, cl_uint num_devices, const cl_device_id * devices, const size_t * lengths, const unsigned char ** binaries, cl_int * binary_status, cl_int * errcode_ret)
{
  cl_program program = NULL;
  cl_int err = CL_SUCCESS;
  assert(ctx);
  INVALID_DEVICE_IF (num_devices != 1);
  INVALID_DEVICE_IF (devices == NULL);
  INVALID_DEVICE_IF (devices[0] != ctx->devices[0]);
  INVALID_VALUE_IF (binaries == NULL);
  INVALID_VALUE_IF (lengths == NULL);
  if (binaries[0] == NULL) {
    err = CL_INVALID_VALUE;
    if (binary_status) binary_status[0] = CL_INVALID_VALUE;
    goto error;
  }
  //need at least 4 bytes to check the binary type.
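  /*
   * Illustrative note (derived from the binary_type_header table above): the
   * first bytes of the blob identify its producer, so a host application can
   * perform the same sniffing on a cached binary it is about to reload:
   *
   *   static int is_llvm_bitcode_header(const unsigned char *buf, size_t len)
   *   {
   *     // 'B','C',0xC0,0xDE is the LLVM bitcode magic used for SPIR; a
   *     // leading 1 or 2 marks a Beignet compiled object or library instead.
   *     return len >= 4 && buf[0] == 'B' && buf[1] == 'C' &&
   *            buf[2] == 0xC0 && buf[3] == 0xDE;
   *   }
   *
   * Gen binaries start with {1,'G','E','N','C'} and CMRT binaries with "CISA".
   */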
if (lengths[0] == 0 || lengths[0] < 4) { err = CL_INVALID_VALUE; if (binary_status) binary_status[0] = CL_INVALID_VALUE; goto error; } program = cl_program_new(ctx); if (UNLIKELY(program == NULL)) { err = CL_OUT_OF_HOST_MEMORY; goto error; } TRY_ALLOC(program->binary, cl_calloc(lengths[0], sizeof(char))); memcpy(program->binary, binaries[0], lengths[0]); program->binary_sz = lengths[0]; program->source_type = FROM_BINARY; if (isCMRT((unsigned char*)program->binary)) { program->source_type = FROM_CMRT; }else if(isSPIR((unsigned char*)program->binary)) { char* typed_binary; TRY_ALLOC(typed_binary, cl_calloc(lengths[0]+1, sizeof(char))); memcpy(typed_binary+1, binaries[0], lengths[0]); *typed_binary = 1; program->opaque = compiler_program_new_from_llvm_binary(program->ctx->devices[0]->device_id, typed_binary, program->binary_sz+1); cl_free(typed_binary); if (UNLIKELY(program->opaque == NULL)) { err = CL_INVALID_PROGRAM; goto error; } program->source_type = FROM_LLVM_SPIR; program->binary_type = CL_PROGRAM_BINARY_TYPE_INTERMEDIATE; }else if(isLLVM_C_O((unsigned char*)program->binary) || isLLVM_LIB((unsigned char*)program->binary)) { if(*program->binary == BHI_COMPIRED_OBJECT){ program->binary_type = CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT; }else if(*program->binary == BHI_LIBRARY){ program->binary_type = CL_PROGRAM_BINARY_TYPE_LIBRARY; }else{ err= CL_INVALID_BINARY; goto error; } program->opaque = compiler_program_new_from_llvm_binary(program->ctx->devices[0]->device_id, program->binary, program->binary_sz); if (UNLIKELY(program->opaque == NULL)) { err = CL_INVALID_PROGRAM; goto error; } program->source_type = FROM_LLVM; } else if (isGenBinary((unsigned char*)program->binary)) { program->opaque = interp_program_new_from_binary(program->ctx->devices[0]->device_id, program->binary, program->binary_sz); if (UNLIKELY(program->opaque == NULL)) { DEBUGP(DL_ERROR, "Incompatible binary, please delete the binary and generate again."); err = CL_INVALID_PROGRAM; goto error; } /* Create all the kernels */ TRY (cl_program_load_gen_program, program); program->binary_type = CL_PROGRAM_BINARY_TYPE_EXECUTABLE; } else { err= CL_INVALID_BINARY; goto error; } if (binary_status) binary_status[0] = CL_SUCCESS; exit: if (errcode_ret) *errcode_ret = err; return program; error: cl_program_delete(program); program = NULL; goto exit; return CL_SUCCESS; } LOCAL cl_program cl_program_create_with_built_in_kernles(cl_context ctx, cl_uint num_devices, const cl_device_id * devices, const char * kernel_names, cl_int * errcode_ret) { cl_int err = CL_SUCCESS; cl_program built_in_prgs = NULL; assert(ctx); INVALID_DEVICE_IF (num_devices != 1); INVALID_DEVICE_IF (devices == NULL); INVALID_DEVICE_IF (devices[0] != ctx->devices[0]); cl_int binary_status = CL_SUCCESS; extern char cl_internal_built_in_kernel_str[]; extern size_t cl_internal_built_in_kernel_str_size; char* p_built_in_kernel_str =cl_internal_built_in_kernel_str; built_in_prgs = cl_program_create_from_binary(ctx, 1, &ctx->devices[0], (size_t*)&cl_internal_built_in_kernel_str_size, (const unsigned char **)&p_built_in_kernel_str, &binary_status, &err); if (!built_in_prgs) return NULL; err = cl_program_build(built_in_prgs, NULL); if (err != CL_SUCCESS) return NULL; built_in_prgs->is_built = 1; exit: if (errcode_ret) *errcode_ret = err; return built_in_prgs; error: goto exit; return CL_SUCCESS; } LOCAL cl_program cl_program_create_from_llvm(cl_context ctx, cl_uint num_devices, const cl_device_id *devices, const char *file_name, cl_int *errcode_ret) { cl_program program = NULL; 
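  /*
   * Illustrative usage sketch: this path is reached through the
   * clCreateProgramWithLLVMIntel entry point of the cl_intel extension
   * (assuming the declaration in CL/cl_intel.h), with the LLVM IR kept in a
   * file rather than in a memory blob:
   *
   *   cl_int err;
   *   cl_program prog = clCreateProgramWithLLVMIntel(ctx, 1, &dev,
   *                                                  "kernel.bc", &err);
   *
   * The file name is handed to the compiler backend as-is; kernels are
   * created eagerly below, so a NULL opaque handle means the module could
   * not be loaded.
   */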
cl_int err = CL_SUCCESS; assert(ctx); INVALID_DEVICE_IF (num_devices != 1); INVALID_DEVICE_IF (devices == NULL); INVALID_DEVICE_IF (devices[0] != ctx->devices[0]); INVALID_VALUE_IF (file_name == NULL); program = cl_program_new(ctx); if (UNLIKELY(program == NULL)) { err = CL_OUT_OF_HOST_MEMORY; goto error; } program->opaque = compiler_program_new_from_llvm_file(ctx->devices[0]->device_id, file_name, program->build_log_max_sz, program->build_log, &program->build_log_sz); if (UNLIKELY(program->opaque == NULL)) { err = CL_INVALID_PROGRAM; goto error; } /* Create all the kernels */ TRY (cl_program_load_gen_program, program); program->source_type = FROM_LLVM; exit: if (errcode_ret) *errcode_ret = err; return program; error: cl_program_delete(program); program = NULL; goto exit; } LOCAL cl_program cl_program_create_from_source(cl_context ctx, cl_uint count, const char **strings, const size_t *lengths, cl_int *errcode_ret) { cl_program program = NULL; cl_int err = CL_SUCCESS; cl_uint i; int32_t * lens = NULL; int32_t len_total = 0; assert(ctx); char * p = NULL; // the real compilation step will be done at build time since we do not have // yet the compilation options program = cl_program_new(ctx); if (UNLIKELY(program == NULL)) { err = CL_OUT_OF_HOST_MEMORY; goto error; } TRY_ALLOC (lens, cl_calloc(count, sizeof(int32_t))); for (i = 0; i < (int) count; ++i) { size_t len; if (lengths == NULL || lengths[i] == 0) len = strlen(strings[i]); else len = lengths[i]; lens[i] = len; len_total += len; } TRY_ALLOC(program->source, cl_calloc(len_total+1, sizeof(char))); p = program->source; for (i = 0; i < (int) count; ++i) { memcpy(p, strings[i], lens[i]); p += lens[i]; } *p = '\0'; program->source_type = FROM_SOURCE; program->binary_type = CL_PROGRAM_BINARY_TYPE_NONE; exit: cl_free(lens); lens = NULL; if (errcode_ret) *errcode_ret = err; return program; error: cl_program_delete(program); program = NULL; goto exit; } /* Before we do the real work, we need to check whether our platform cl version can meet -cl-std= */ static int check_cl_version_option(cl_program p, const char* options) { const char* s = NULL; int ver1 = 0; int ver2 = 0; char version_str[64] = {0}; if (options && (s = strstr(options, "-cl-std="))) { if (s + strlen("-cl-std=CLX.X") > options + strlen(options)) { return 0; } if (s[8] != 'C' || s[9] != 'L' || s[10] > '9' || s[10] < '0' || s[11] != '.' 
|| s[12] > '9' || s[12] < '0') {
    return 0;
  }
  ver1 = (s[10] - '0') * 10 + (s[12] - '0');
  if (cl_get_device_info(p->ctx->devices[0], CL_DEVICE_OPENCL_C_VERSION, sizeof(version_str), version_str, NULL) != CL_SUCCESS)
    return 0;
  assert(strstr(version_str, "OpenCL") && version_str[0] == 'O');
  ver2 = (version_str[9] - '0') * 10 + (version_str[11] - '0');
  if (ver2 < ver1)
    return 0;
  return 1;
  }
  return 1;
}
LOCAL cl_int
cl_program_build(cl_program p, const char *options)
{
  cl_int err = CL_SUCCESS;
  int i = 0;
  int copyed = 0;
  if (CL_OBJECT_GET_REF(p) > 1) {
    err = CL_INVALID_OPERATION;
    goto error;
  }
#ifdef HAS_CMRT
  if (p->source_type == FROM_CMRT) {
    //only here do we begin to invoke CMRT
    //this breaks the spec by returning other errors such as CL_DEVICE_NOT_FOUND
    err = cmrt_build_program(p, options);
    if (err == CL_SUCCESS) {
      p->build_status = CL_BUILD_SUCCESS;
      p->binary_type = CL_PROGRAM_BINARY_TYPE_EXECUTABLE;
      return CL_SUCCESS;
    } else
      goto error;
  }
#endif
  if (!check_cl_version_option(p, options)) {
    err = CL_BUILD_PROGRAM_FAILURE;
    goto error;
  }
  if (options) {
    if(p->build_opts == NULL || strcmp(options, p->build_opts) != 0) {
      if(p->build_opts) {
        cl_free(p->build_opts);
        p->build_opts = NULL;
      }
      TRY_ALLOC (p->build_opts, cl_calloc(strlen(options) + 1, sizeof(char)));
      memcpy(p->build_opts, options, strlen(options));
    }
  }
  if (options == NULL && p->build_opts) {
    cl_free(p->build_opts);
    p->build_opts = NULL;
  }
  if (p->source_type == FROM_SOURCE) {
    if (!CompilerSupported()) {
      err = CL_COMPILER_NOT_AVAILABLE;
      goto error;
    }
    p->opaque = compiler_program_new_from_source(p->ctx->devices[0]->device_id, p->source, p->build_log_max_sz, options, p->build_log, &p->build_log_sz);
    if (UNLIKELY(p->opaque == NULL)) {
      if (p->build_log_sz > 0 && strstr(p->build_log, "error: error reading 'options'"))
        err = CL_INVALID_BUILD_OPTIONS;
      else
        err = CL_BUILD_PROGRAM_FAILURE;
      goto error;
    }
    /* Create all the kernels */
    TRY (cl_program_load_gen_program, p);
  } else if (p->source_type == FROM_LLVM || p->source_type == FROM_LLVM_SPIR) {
    if (!CompilerSupported()) {
      err = CL_COMPILER_NOT_AVAILABLE;
      goto error;
    }
    compiler_program_build_from_llvm(p->opaque, p->build_log_max_sz, p->build_log, &p->build_log_sz, options);
    if (UNLIKELY(p->opaque == NULL)) {
      if (p->build_log_sz > 0 && strstr(p->build_log, "error: error reading 'options'"))
        err = CL_INVALID_BUILD_OPTIONS;
      else
        err = CL_BUILD_PROGRAM_FAILURE;
      goto error;
    }
    /* Create all the kernels */
    TRY (cl_program_load_gen_program, p);
  } else if (p->source_type == FROM_BINARY && p->binary_type != CL_PROGRAM_BINARY_TYPE_EXECUTABLE) {
    p->opaque = interp_program_new_from_binary(p->ctx->devices[0]->device_id, p->binary, p->binary_sz);
    if (UNLIKELY(p->opaque == NULL)) {
      err = CL_BUILD_PROGRAM_FAILURE;
      goto error;
    }
    /* Create all the kernels */
    TRY (cl_program_load_gen_program, p);
  }
  p->binary_type = CL_PROGRAM_BINARY_TYPE_EXECUTABLE;
  /* Concatenate the code of all kernels into one blob. */
  for (i = 0; i < p->ker_n; i ++) {
    const gbe_kernel opaque = interp_program_get_kernel(p->opaque, i);
    p->bin_sz += interp_kernel_get_code_size(opaque);
  }
  TRY_ALLOC (p->bin, cl_calloc(p->bin_sz, sizeof(char)));
  for (i = 0; i < p->ker_n; i ++) {
    const gbe_kernel opaque = interp_program_get_kernel(p->opaque, i);
    size_t sz = interp_kernel_get_code_size(opaque);
    memcpy(p->bin + copyed, interp_kernel_get_code(opaque), sz);
    copyed += sz;
  }
  uint32_t ocl_version = interp_kernel_get_ocl_version(interp_program_get_kernel(p->opaque, 0));
  if (ocl_version >= 200 && (err = get_program_global_data(p)) != CL_SUCCESS)
    goto error;
  p->is_built = 1;
  p->build_status = CL_BUILD_SUCCESS;
  return CL_SUCCESS;
error:
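  /*
   * Every failure path above funnels to this label with p->build_log already
   * filled in, so a client can fetch the diagnostics through the standard
   * API. A minimal host-side sketch (the log buffer size is arbitrary):
   *
   *   if (clBuildProgram(prog, 1, &dev, "-cl-std=CL1.2", NULL, NULL) ==
   *       CL_BUILD_PROGRAM_FAILURE) {
   *     char log[16 * 1024];
   *     clGetProgramBuildInfo(prog, dev, CL_PROGRAM_BUILD_LOG,
   *                           sizeof(log), log, NULL);
   *     fprintf(stderr, "%s\n", log);
   *   }
   */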
p->build_status = CL_BUILD_ERROR;
  return err;
}
cl_program
cl_program_link(cl_context context, cl_uint num_input_programs, const cl_program * input_programs, const char * options, cl_int* errcode_ret)
{
  cl_program p = NULL;
  cl_int err = CL_SUCCESS;
  cl_int i = 0;
  int copyed = 0;
  cl_bool ret = 0;
  int available_program = 0;
  //Although we don't use the options ourselves, we still need to validate them.
  if(!compiler_program_check_opt(options)) {
    err = CL_INVALID_LINKER_OPTIONS;
    goto error;
  }
  const char kernel_arg_option[] = "-cl-kernel-arg-info";
  cl_bool option_exist = CL_TRUE;
  for(i = 0; i < num_input_programs; i++) {
    //num_input_programs > 0 and input_programs MUST NOT be NULL, so input_programs[i] can be dereferenced directly.
    if(input_programs[i]->binary_type == CL_PROGRAM_BINARY_TYPE_LIBRARY ||
       input_programs[i]->binary_type == CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT ||
       input_programs[i]->binary_type == CL_PROGRAM_BINARY_TYPE_INTERMEDIATE) {
      available_program++;
    }
    if(input_programs[i]->build_opts == NULL || strstr(input_programs[i]->build_opts, kernel_arg_option) == NULL ) {
      option_exist = CL_FALSE;
    }
  }
  //None of the programs contains a compiled binary or library.
  if(available_program == 0) {
    goto done;
  }
  //All of the programs must contain a compiled binary or library.
  if(available_program < num_input_programs) {
    err = CL_INVALID_OPERATION;
    goto error;
  }
  p = cl_program_new(context);
  if (UNLIKELY(p == NULL)) {
    err = CL_OUT_OF_HOST_MEMORY;
    goto error;
  }
  if(option_exist) {
    TRY_ALLOC (p->build_opts, cl_calloc(strlen(kernel_arg_option) + 1, sizeof(char)));
    memcpy(p->build_opts, kernel_arg_option, strlen(kernel_arg_option));
  }
  if (!check_cl_version_option(p, options)) {
    err = CL_BUILD_PROGRAM_FAILURE;
    goto error;
  }
  p->opaque = compiler_program_new_gen_program(context->devices[0]->device_id, NULL, NULL, NULL);
  for(i = 0; i < num_input_programs; i++) {
    // if a program was created from an LLVM binary, it must be deserialized first to get its module.
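    /*
     * Illustrative sketch of the host-side flow this loop serves (standard
     * OpenCL 1.2 entry points; the program names are hypothetical):
     *
     *   clCompileProgram(p1, 1, &dev, "", 0, NULL, NULL, NULL, NULL);
     *   clCompileProgram(p2, 1, &dev, "", 0, NULL, NULL, NULL, NULL);
     *   cl_program in[] = { p1, p2 };
     *   cl_program out = clLinkProgram(ctx, 1, &dev, "-create-library",
     *                                  2, in, NULL, NULL, &err);
     *
     * With "-create-library" the result keeps the binary type
     * CL_PROGRAM_BINARY_TYPE_LIBRARY and is not lowered to Gen code;
     * without it the merged module is built into an executable below.
     */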
if(input_programs[i]) ret = compiler_program_link_program(p->opaque, input_programs[i]->opaque, p->build_log_max_sz, p->build_log, &p->build_log_sz); if (UNLIKELY(ret)) { err = CL_LINK_PROGRAM_FAILURE; goto error; } } if(options && strstr(options, "-create-library")){ p->binary_type = CL_PROGRAM_BINARY_TYPE_LIBRARY; goto done; }else{ p->binary_type = CL_PROGRAM_BINARY_TYPE_EXECUTABLE; } compiler_program_build_from_llvm(p->opaque, p->build_log_max_sz, p->build_log, &p->build_log_sz, options); /* Create all the kernels */ TRY (cl_program_load_gen_program, p); for (i = 0; i < p->ker_n; i ++) { const gbe_kernel opaque = interp_program_get_kernel(p->opaque, i); p->bin_sz += interp_kernel_get_code_size(opaque); } TRY_ALLOC (p->bin, cl_calloc(p->bin_sz, sizeof(char))); for (i = 0; i < p->ker_n; i ++) { const gbe_kernel opaque = interp_program_get_kernel(p->opaque, i); size_t sz = interp_kernel_get_code_size(opaque); memcpy(p->bin + copyed, interp_kernel_get_code(opaque), sz); copyed += sz; } uint32_t ocl_version = interp_kernel_get_ocl_version(interp_program_get_kernel(p->opaque, 0)); if (ocl_version >= 200 && (err = get_program_global_data(p)) != CL_SUCCESS) goto error; done: if(p) p->is_built = 1; if(p) p->build_status = CL_BUILD_SUCCESS; if (errcode_ret) *errcode_ret = err; return p; error: if(p) p->build_status = CL_BUILD_ERROR; if (errcode_ret) *errcode_ret = err; return p; } #define FILE_PATH_LENGTH 1024 LOCAL cl_int cl_program_compile(cl_program p, cl_uint num_input_headers, const cl_program * input_headers, const char ** header_include_names, const char* options) { cl_int err = CL_SUCCESS; int i = 0; if (CL_OBJECT_GET_REF(p) > 1) { err = CL_INVALID_OPERATION; goto error; } if (!check_cl_version_option(p, options)) { err = CL_BUILD_PROGRAM_FAILURE; goto error; } if (options) { if(p->build_opts == NULL || strcmp(options, p->build_opts) != 0) { if(p->build_opts) { cl_free(p->build_opts); p->build_opts = NULL; } TRY_ALLOC (p->build_opts, cl_calloc(strlen(options) + 1, sizeof(char))); memcpy(p->build_opts, options, strlen(options)); } } if (options == NULL && p->build_opts) { cl_free(p->build_opts); p->build_opts = NULL; } #if defined(__ANDROID__) char temp_header_template[]= "/data/local/tmp/beignet.XXXXXX"; #else char temp_header_template[]= "/tmp/beignet.XXXXXX"; #endif char* temp_header_path = mkdtemp(temp_header_template); if (p->source_type == FROM_SOURCE) { if (!CompilerSupported()) { err = CL_COMPILER_NOT_AVAILABLE; goto error; } //write the headers to /tmp/beignet.XXXXXX for include. 
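    /*
     * Illustrative sketch: each input header is itself a program created
     * from source, and its include name decides where it is written under
     * the temporary directory (standard OpenCL 1.2 usage; the names are
     * hypothetical):
     *
     *   const char *hdr_src = "#define SCALE 2\n";
     *   cl_program hdr = clCreateProgramWithSource(ctx, 1, &hdr_src, NULL, &err);
     *   const char *names[] = { "scale.h" };
     *   clCompileProgram(p, 1, &dev, "", 1, &hdr, names, NULL, NULL);
     *
     * The loop below then writes the header's source to <tmpdir>/scale.h so
     * the frontend can resolve #include "scale.h".
     */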
for (i = 0; i < num_input_headers; i++) { if(header_include_names[i] == NULL || input_headers[i] == NULL) continue; char temp_path[FILE_PATH_LENGTH]=""; strncat(temp_path, temp_header_path, strlen(temp_header_path)); strncat(temp_path, "/", 1); strncat(temp_path, header_include_names[i], strlen(header_include_names[i])); if(strlen(temp_path) >= FILE_PATH_LENGTH - 1 ) { err = CL_COMPILE_PROGRAM_FAILURE; goto error; } temp_path[strlen(temp_path)+1] = '\0'; char* dirc = strdup(temp_path); char* dir = dirname(dirc); mkdir(dir, 0755); if(access(dir, R_OK|W_OK) != 0){ err = CL_COMPILE_PROGRAM_FAILURE; goto error; } free(dirc); FILE* pfile = fopen(temp_path, "wb"); if(pfile){ fwrite(input_headers[i]->source, strlen(input_headers[i]->source), 1, pfile); fclose(pfile); }else{ err = CL_COMPILE_PROGRAM_FAILURE; goto error; } } p->opaque = compiler_program_compile_from_source(p->ctx->devices[0]->device_id, p->source, temp_header_path, p->build_log_max_sz, options, p->build_log, &p->build_log_sz); char rm_path[255]="rm "; strncat(rm_path, temp_header_path, strlen(temp_header_path)); strncat(rm_path, " -rf", 4); int temp = system(rm_path); if(temp){ assert(0); } if (UNLIKELY(p->opaque == NULL)) { if (p->build_log_sz > 0 && strstr(p->build_log, "error: error reading 'options'")) err = CL_INVALID_COMPILER_OPTIONS; else err = CL_COMPILE_PROGRAM_FAILURE; goto error; } /* Create all the kernels */ p->binary_type = CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT; }else if(p->source_type == FROM_BINARY){ err = CL_INVALID_OPERATION; return err; } p->is_built = 1; p->build_status = CL_BUILD_SUCCESS; return CL_SUCCESS; error: p->build_status = CL_BUILD_ERROR; return err; } LOCAL cl_kernel cl_program_create_kernel(cl_program p, const char *name, cl_int *errcode_ret) { cl_kernel from = NULL, to = NULL; cl_int err = CL_SUCCESS; uint32_t i = 0; #ifdef HAS_CMRT if (p->cmrt_program != NULL) { void* cmrt_kernel = cmrt_create_kernel(p, name); if (cmrt_kernel != NULL) { to = cl_kernel_new(p); to->cmrt_kernel = cmrt_kernel; goto exit; } else { err = CL_INVALID_KERNEL_NAME; goto error; } } #endif /* Find the program first */ for (i = 0; i < p->ker_n; ++i) { assert(p->ker[i]); const char *ker_name = cl_kernel_get_name(p->ker[i]); if (ker_name != NULL && strcmp(ker_name, name) == 0) { from = p->ker[i]; break; } } /* We were not able to find this named kernel */ if (UNLIKELY(from == NULL)) { err = CL_INVALID_KERNEL_NAME; goto error; } TRY_ALLOC(to, cl_kernel_dup(from)); exit: if (errcode_ret) *errcode_ret = err; return to; error: cl_kernel_delete(to); to = NULL; goto exit; } LOCAL cl_int cl_program_create_kernels_in_program(cl_program p, cl_kernel* ker) { int i = 0; if(ker == NULL) return CL_SUCCESS; for (i = 0; i < p->ker_n; ++i) { TRY_ALLOC_NO_ERR(ker[i], cl_kernel_dup(p->ker[i])); } return CL_SUCCESS; error: do { cl_kernel_delete(ker[i]); ker[i--] = NULL; } while(i > 0); return CL_OUT_OF_HOST_MEMORY; } LOCAL void cl_program_get_kernel_names(cl_program p, size_t size, char *names, size_t *size_ret) { int i = 0; const char *ker_name = NULL; size_t len = 0; if(size_ret) *size_ret = 0; if(p->ker == NULL) { return; } ker_name = cl_kernel_get_name(p->ker[0]); if (ker_name != NULL) len = strlen(ker_name); else len = 0; if(names && ker_name) { strncpy(names, ker_name, size - 1); names[size - 1] = '\0'; if(size < len - 1) { if(size_ret) *size_ret = size; return; } size = size - len - 1; //sub \0 } if(size_ret) *size_ret = len + 1; //add NULL for (i = 1; i < p->ker_n; ++i) { ker_name = cl_kernel_get_name(p->ker[i]); if (ker_name != NULL) len 
= strlen(ker_name); else len = 0; if(names && ker_name) { strncat(names, ";", size); if(size >= 1) strncat(names, ker_name, size - 1); if(size < len + 1) { if(size_ret) *size_ret = size; break; } size = size - len - 1; } if(size_ret) *size_ret += len + 1; //add ';' } } Beignet-1.3.2-Source/src/cl_api_device_id.c000664 001750 001750 00000005403 13161142102 017626 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "cl_device_id.h" #include "cl_platform_id.h" cl_int clGetDeviceIDs(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices) { const cl_device_type valid_type = CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_CPU | CL_DEVICE_TYPE_ACCELERATOR | CL_DEVICE_TYPE_DEFAULT | CL_DEVICE_TYPE_CUSTOM; /* Check parameter consistency */ if (UNLIKELY(devices == NULL && num_devices == NULL)) return CL_INVALID_VALUE; if (UNLIKELY(platform && platform != cl_get_platform_default())) return CL_INVALID_PLATFORM; if (UNLIKELY(devices && num_entries == 0)) return CL_INVALID_VALUE; if ((device_type & valid_type) == 0) return CL_INVALID_DEVICE_TYPE; return cl_get_device_ids(platform, device_type, num_entries, devices, num_devices); } cl_int clGetDeviceInfo(cl_device_id device, cl_device_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { if (!CL_OBJECT_IS_DEVICE(device)) { return CL_INVALID_DEVICE; } return cl_get_device_info(device, param_name, param_value_size, param_value, param_value_size_ret); } cl_int clRetainDevice(cl_device_id device) { // XXX stub for C++ Bindings return CL_SUCCESS; } cl_int clReleaseDevice(cl_device_id device) { // XXX stub for C++ Bindings return CL_SUCCESS; } cl_int clCreateSubDevices(cl_device_id in_device, const cl_device_partition_property *properties, cl_uint num_devices, cl_device_id *out_devices, cl_uint *num_devices_ret) { /* Check parameter consistency */ if (UNLIKELY(out_devices == NULL && num_devices_ret == NULL)) return CL_INVALID_VALUE; if (UNLIKELY(in_device == NULL && properties == NULL)) return CL_INVALID_VALUE; *num_devices_ret = 0; return CL_INVALID_DEVICE_PARTITION_COUNT; } Beignet-1.3.2-Source/src/cl_context.c000664 001750 001750 00000031303 13173554000 016553 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. 
If not, see . * * Author: Benjamin Segovia */ #include "cl_platform_id.h" #include "cl_device_id.h" #include "cl_context.h" #include "cl_command_queue.h" #include "cl_mem.h" #include "cl_sampler.h" #include "cl_event.h" #include "cl_alloc.h" #include "cl_utils.h" #include "cl_driver.h" #include "cl_khr_icd.h" #include "cl_kernel.h" #include "cl_program.h" #include "CL/cl.h" #include "CL/cl_gl.h" #include #include #include #include #include LOCAL void cl_context_add_queue(cl_context ctx, cl_command_queue queue) { assert(queue->ctx == NULL); cl_context_add_ref(ctx); CL_OBJECT_LOCK(ctx); while (ctx->queue_modify_disable) { CL_OBJECT_WAIT_ON_COND(ctx); } list_add_tail(&ctx->queues, &queue->base.node); ctx->queue_num++; CL_OBJECT_UNLOCK(ctx); queue->ctx = ctx; } LOCAL void cl_context_remove_queue(cl_context ctx, cl_command_queue queue) { assert(queue->ctx == ctx); CL_OBJECT_LOCK(ctx); while (ctx->queue_modify_disable) { CL_OBJECT_WAIT_ON_COND(ctx); } list_node_del(&queue->base.node); ctx->queue_num--; CL_OBJECT_UNLOCK(ctx); cl_context_delete(ctx); queue->ctx = NULL; } LOCAL void cl_context_add_mem(cl_context ctx, cl_mem mem) { assert(mem->ctx == NULL); cl_context_add_ref(ctx); CL_OBJECT_LOCK(ctx); list_add_tail(&ctx->mem_objects, &mem->base.node); ctx->mem_object_num++; CL_OBJECT_UNLOCK(ctx); mem->ctx = ctx; } LOCAL void cl_context_remove_mem(cl_context ctx, cl_mem mem) { assert(mem->ctx == ctx); CL_OBJECT_LOCK(ctx); list_node_del(&mem->base.node); ctx->mem_object_num--; CL_OBJECT_UNLOCK(ctx); cl_context_delete(ctx); mem->ctx = NULL; } LOCAL void cl_context_add_sampler(cl_context ctx, cl_sampler sampler) { assert(sampler->ctx == NULL); cl_context_add_ref(ctx); CL_OBJECT_LOCK(ctx); list_add_tail(&ctx->samplers, &sampler->base.node); ctx->sampler_num++; CL_OBJECT_UNLOCK(ctx); sampler->ctx = ctx; } LOCAL void cl_context_remove_sampler(cl_context ctx, cl_sampler sampler) { assert(sampler->ctx == ctx); CL_OBJECT_LOCK(ctx); list_node_del(&sampler->base.node); ctx->sampler_num--; CL_OBJECT_UNLOCK(ctx); cl_context_delete(ctx); sampler->ctx = NULL; } LOCAL void cl_context_add_event(cl_context ctx, cl_event event) { assert(event->ctx == NULL); cl_context_add_ref(ctx); CL_OBJECT_LOCK(ctx); list_add_tail(&ctx->events, &event->base.node); ctx->event_num++; CL_OBJECT_UNLOCK(ctx); event->ctx = ctx; } LOCAL void cl_context_remove_event(cl_context ctx, cl_event event) { assert(event->ctx == ctx); CL_OBJECT_LOCK(ctx); list_node_del(&event->base.node); ctx->event_num--; CL_OBJECT_UNLOCK(ctx); cl_context_delete(ctx); event->ctx = NULL; } LOCAL void cl_context_add_program(cl_context ctx, cl_program program) { assert(program->ctx == NULL); cl_context_add_ref(ctx); CL_OBJECT_LOCK(ctx); list_add_tail(&ctx->programs, &program->base.node); ctx->program_num++; CL_OBJECT_UNLOCK(ctx); program->ctx = ctx; } LOCAL void cl_context_remove_program(cl_context ctx, cl_program program) { assert(program->ctx == ctx); CL_OBJECT_LOCK(ctx); list_node_del(&program->base.node); ctx->program_num--; CL_OBJECT_UNLOCK(ctx); cl_context_delete(ctx); program->ctx = NULL; } #define CHECK(var) \ if (var) \ return CL_INVALID_PROPERTY; \ else \ var = 1; static cl_int cl_context_properties_process(const cl_context_properties *prop, struct _cl_context_prop *cl_props, cl_uint * prop_len) { int set_cl_context_platform = 0, set_cl_gl_context_khr = 0, set_cl_egl_display_khr = 0, set_cl_glx_display_khr = 0, set_cl_wgl_hdc_khr = 0, set_cl_cgl_sharegroup_khr = 0; cl_int err = CL_SUCCESS; cl_props->gl_type = CL_GL_NOSHARE; cl_props->platform_id = 0; if 
(prop == NULL) goto exit; while(*prop) { switch (*prop) { case CL_CONTEXT_PLATFORM: CHECK (set_cl_context_platform); cl_props->platform_id = *(prop + 1); if (UNLIKELY((cl_platform_id) cl_props->platform_id != cl_get_platform_default())) { err = CL_INVALID_PLATFORM; goto error; } break; case CL_GL_CONTEXT_KHR: CHECK (set_cl_gl_context_khr); cl_props->gl_context = *(prop + 1); break; case CL_EGL_DISPLAY_KHR: CHECK (set_cl_egl_display_khr); cl_props->gl_type = CL_GL_EGL_DISPLAY; cl_props->egl_display = *(prop + 1); break; case CL_GLX_DISPLAY_KHR: CHECK (set_cl_glx_display_khr); cl_props->gl_type = CL_GL_GLX_DISPLAY; cl_props->glx_display = *(prop + 1); break; case CL_WGL_HDC_KHR: CHECK (set_cl_wgl_hdc_khr); cl_props->gl_type = CL_GL_WGL_HDC; cl_props->wgl_hdc = *(prop + 1); break; case CL_CGL_SHAREGROUP_KHR: CHECK (set_cl_cgl_sharegroup_khr); cl_props->gl_type = CL_GL_CGL_SHAREGROUP; cl_props->cgl_sharegroup = *(prop + 1); break; default: err = CL_INVALID_PROPERTY; goto error; } prop += 2; *prop_len += 2; } (*prop_len)++; exit: error: return err; } LOCAL cl_context cl_create_context(const cl_context_properties * properties, cl_uint num_devices, const cl_device_id * devices, void (CL_CALLBACK * pfn_notify) (const char*, const void*, size_t, void*), void * user_data, cl_int * errcode_ret) { /* cl_platform_id platform = NULL; */ struct _cl_context_prop props; cl_context ctx = NULL; cl_int err = CL_SUCCESS; cl_uint prop_len = 0; cl_uint dev_num = 0; cl_device_id* all_dev = NULL; cl_uint i, j; /* XXX */ FATAL_IF (num_devices != 1, "Only one device is supported"); /* Check that we are getting the right platform */ if (UNLIKELY(((err = cl_context_properties_process(properties, &props, &prop_len)) != CL_SUCCESS))) goto error; /* Filter out repeated device. */ assert(num_devices > 0); all_dev = cl_calloc(num_devices, sizeof(cl_device_id)); if (all_dev == NULL) { *errcode_ret = CL_OUT_OF_HOST_MEMORY; return NULL; } for (i = 0; i < num_devices; i++) { for (j = 0; j < i; j++) { if (devices[j] == devices[i]) { break; } } if (j != i) { // Find some duplicated one. continue; } all_dev[dev_num] = devices[i]; dev_num++; } assert(dev_num == 1); // TODO: multi devices later. /* We are good */ if (UNLIKELY((ctx = cl_context_new(&props, dev_num, all_dev)) == NULL)) { cl_free(all_dev); err = CL_OUT_OF_HOST_MEMORY; goto error; } if(properties != NULL && prop_len > 0) { TRY_ALLOC (ctx->prop_user, CALLOC_ARRAY(cl_context_properties, prop_len)); memcpy(ctx->prop_user, properties, sizeof(cl_context_properties)*prop_len); } ctx->prop_len = prop_len; /* cl_context_new will use all_dev. 
*/ all_dev = NULL; /* Save the user callback and user data*/ ctx->pfn_notify = pfn_notify; ctx->user_data = user_data; cl_driver_set_atomic_flag(ctx->drv, ctx->devices[0]->atomic_test_result); exit: if (errcode_ret != NULL) *errcode_ret = err; return ctx; error: cl_context_delete(ctx); ctx = NULL; goto exit; } LOCAL cl_context cl_context_new(struct _cl_context_prop *props, cl_uint dev_num, cl_device_id* all_dev) { cl_context ctx = NULL; TRY_ALLOC_NO_ERR (ctx, CALLOC(struct _cl_context)); CL_OBJECT_INIT_BASE(ctx, CL_OBJECT_CONTEXT_MAGIC); ctx->devices = all_dev; ctx->device_num = dev_num; list_init(&ctx->queues); list_init(&ctx->mem_objects); list_init(&ctx->samplers); list_init(&ctx->events); list_init(&ctx->programs); ctx->queue_modify_disable = CL_FALSE; TRY_ALLOC_NO_ERR (ctx->drv, cl_driver_new(props)); ctx->props = *props; ctx->ver = cl_driver_get_ver(ctx->drv); exit: return ctx; error: cl_context_delete(ctx); ctx = NULL; goto exit; } LOCAL void cl_context_delete(cl_context ctx) { int i = 0; if (UNLIKELY(ctx == NULL)) return; int internal_ctx_refs = 1; // determine how many ctx refs are held by internal_prgs and built_in_prgs for (i = CL_INTERNAL_KERNEL_MIN; i < CL_INTERNAL_KERNEL_MAX; i++) { if (ctx->internal_kernels[i] && ctx->internal_prgs[i]) ++internal_ctx_refs; } /* We are not done yet */ if (CL_OBJECT_DEC_REF(ctx) > internal_ctx_refs) return; // create a temporary extra ref here so cl_program_delete doesn't // attempt a recursive full cl_context_delete when cleaning up // our internal programs CL_OBJECT_INC_REF(ctx); /* delete the internal programs. */ for (i = CL_INTERNAL_KERNEL_MIN; i < CL_INTERNAL_KERNEL_MAX; i++) { if (ctx->internal_kernels[i]) { cl_kernel k = ctx->internal_kernels[i]; ctx->internal_kernels[i] = NULL; cl_kernel_delete(k); assert(ctx->internal_prgs[i]); cl_program p = ctx->internal_prgs[i]; ctx->internal_prgs[i] = NULL; cl_program_delete(p); } } CL_OBJECT_DEC_REF(ctx); cl_free(ctx->prop_user); cl_free(ctx->devices); cl_driver_delete(ctx->drv); CL_OBJECT_DESTROY_BASE(ctx); cl_free(ctx); } LOCAL void cl_context_add_ref(cl_context ctx) { assert(ctx); CL_OBJECT_INC_REF(ctx); } cl_buffer_mgr cl_context_get_bufmgr(cl_context ctx) { return cl_driver_get_bufmgr(ctx->drv); } cl_kernel cl_context_get_static_kernel_from_bin(cl_context ctx, cl_int index, const char * str_kernel, size_t size, const char * str_option) { cl_int ret; cl_int binary_status = CL_SUCCESS; cl_kernel ker; CL_OBJECT_TAKE_OWNERSHIP(ctx, 1); if (ctx->internal_prgs[index] == NULL) { ctx->internal_prgs[index] = cl_program_create_from_binary(ctx, 1, &ctx->devices[0], &size, (const unsigned char **)&str_kernel, &binary_status, &ret); if (!ctx->internal_prgs[index]) { ker = NULL; goto unlock; } ret = cl_program_build(ctx->internal_prgs[index], str_option); if (ret != CL_SUCCESS) { ker = NULL; goto unlock; } ctx->internal_prgs[index]->is_built = 1; if (index == CL_ENQUEUE_FILL_BUFFER_ALIGN8_8) { ctx->internal_kernels[index] = cl_program_create_kernel(ctx->internal_prgs[index], "__cl_fill_region_align8_2", NULL); } else if (index == CL_ENQUEUE_FILL_BUFFER_ALIGN8_16) { ctx->internal_kernels[index] = cl_program_create_kernel(ctx->internal_prgs[index], "__cl_fill_region_align8_4", NULL); } else if (index == CL_ENQUEUE_FILL_BUFFER_ALIGN8_32) { ctx->internal_kernels[index] = cl_program_create_kernel(ctx->internal_prgs[index], "__cl_fill_region_align8_8", NULL); } else if (index == CL_ENQUEUE_FILL_BUFFER_ALIGN8_64) { ctx->internal_kernels[index] = cl_program_create_kernel(ctx->internal_prgs[index], 
"__cl_fill_region_align8_16", NULL); } else { ctx->internal_kernels[index] = cl_kernel_dup(ctx->internal_prgs[index]->ker[0]); } } ker = ctx->internal_kernels[index]; unlock: CL_OBJECT_RELEASE_OWNERSHIP(ctx); return cl_kernel_dup(ker); } cl_mem cl_context_get_svm_from_ptr(cl_context ctx, const void * p) { struct list_node *pos; cl_mem buf; list_for_each (pos, (&ctx->mem_objects)) { buf = (cl_mem)list_entry(pos, _cl_base_object, node); if(buf->host_ptr == NULL) continue; if(buf->is_svm == 0) continue; if(buf->type != CL_MEM_SVM_TYPE) continue; if((size_t)buf->host_ptr <= (size_t)p && (size_t)p < ((size_t)buf->host_ptr + buf->size)) return buf; } return NULL; } cl_mem cl_context_get_mem_from_ptr(cl_context ctx, const void * p) { struct list_node *pos; cl_mem buf; list_for_each (pos, (&ctx->mem_objects)) { buf = (cl_mem)list_entry(pos, _cl_base_object, node); if(buf->host_ptr == NULL) continue; if((size_t)buf->host_ptr <= (size_t)p && (size_t)p < ((size_t)buf->host_ptr + buf->size)) return buf; } return NULL; } Beignet-1.3.2-Source/src/cl_gen75_device.h000664 001750 001750 00000002202 13161142102 017325 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* Common fields for both CHV,VLV and HSW devices */ .max_parameter_size = 1024, .global_mem_cache_line_size = 64, /* XXX */ .global_mem_cache_size = 8 << 10, /* XXX */ .local_mem_type = CL_LOCAL, .local_mem_size = 64 << 10, .scratch_mem_size = 2 << 20, .max_mem_alloc_size = 2 * 1024 * 1024 * 1024ul, .global_mem_size = 2 * 1024 * 1024 * 1024ul, #include "cl_gt_device.h" Beignet-1.3.2-Source/src/cl_extensions.h000664 001750 001750 00000006551 13173554000 017302 0ustar00yryr000000 000000 #ifndef __CL_EXTENSIONS_H__ #define __CL_EXTENSIONS_H__ /* The following approved Khronos extension * names must be returned by all device that * support OpenCL C 1.2. 
*/ #define DECL_BASE_EXTENSIONS \ DECL_EXT(khr_global_int32_base_atomics) \ DECL_EXT(khr_global_int32_extended_atomics) \ DECL_EXT(khr_local_int32_base_atomics) \ DECL_EXT(khr_local_int32_extended_atomics) \ DECL_EXT(khr_byte_addressable_store) \ DECL_EXT(khr_3d_image_writes)\ DECL_EXT(khr_image2d_from_buffer)\ DECL_EXT(khr_depth_images)\ DECL_EXT(khr_fp64) /* The OPT1 extensions are those optional extensions * which don't have external dependecies*/ #define DECL_OPT1_EXTENSIONS \ DECL_EXT(khr_int64_base_atomics)\ DECL_EXT(khr_int64_extended_atomics)\ DECL_EXT(khr_fp16)\ DECL_EXT(khr_initialize_memory)\ DECL_EXT(khr_context_abort)\ DECL_EXT(khr_spir) \ DECL_EXT(khr_icd) #define DECL_INTEL_EXTENSIONS \ DECL_EXT(intel_accelerator) \ DECL_EXT(intel_motion_estimation) \ DECL_EXT(intel_subgroups) \ DECL_EXT(intel_subgroups_short) \ DECL_EXT(intel_required_subgroup_size) #define DECL_GL_EXTENSIONS \ DECL_EXT(khr_gl_sharing)\ DECL_EXT(khr_gl_event)\ DECL_EXT(khr_gl_depth_images)\ DECL_EXT(khr_gl_msaa_sharing) #define DECL_D3D_EXTENSIONS \ DECL_EXT(khr_d3d10_sharing)\ DECL_EXT(khr_dx9_media_sharing)\ DECL_EXT(khr_d3d11_sharing)\ #define DECL_ALL_EXTENSIONS \ DECL_BASE_EXTENSIONS \ DECL_OPT1_EXTENSIONS \ DECL_INTEL_EXTENSIONS \ DECL_GL_EXTENSIONS \ DECL_D3D_EXTENSIONS #define EXT_ID(name) cl_ ## name ## _ext_id #define EXT_STRUCT_NAME(name) cl_ ## name ## ext /*Declare enum ids */ typedef enum { #define DECL_EXT(name) EXT_ID(name), DECL_ALL_EXTENSIONS #undef DECL_EXT cl_khr_extension_id_max }cl_extension_enum; #define BASE_EXT_START_ID EXT_ID(khr_global_int32_base_atomics) #define BASE_EXT_END_ID EXT_ID(khr_fp64) #define OPT1_EXT_START_ID EXT_ID(khr_int64_base_atomics) #define OPT1_EXT_END_ID EXT_ID(khr_icd) #define INTEL_EXT_START_ID EXT_ID(intel_accelerator) #define INTEL_EXT_END_ID EXT_ID(intel_subgroups_short) #define GL_EXT_START_ID EXT_ID(khr_gl_sharing) #define GL_EXT_END_ID EXT_ID(khr_gl_msaa_sharing) #define IS_BASE_EXTENSION(id) (id >= BASE_EXT_START_ID && id <= BASE_EXT_END_ID) #define IS_OPT1_EXTENSION(id) (id >= OPT1_EXT_START_ID && id <= OPT1_EXT_END_ID) #define IS_GL_EXTENSION(id) (id >= GL_EXT_START_ID && id <= GL_EXT_END_ID) struct cl_extension_base { cl_extension_enum ext_id; int ext_enabled; char *ext_name; }; /* Declare each extension structure. */ #define DECL_EXT(name) \ struct EXT_STRUCT_NAME(name) { \ struct cl_extension_base base;\ }; DECL_BASE_EXTENSIONS DECL_OPT1_EXTENSIONS DECL_INTEL_EXTENSIONS DECL_D3D_EXTENSIONS DECL_GL_EXTENSIONS #undef DECL_EXT /* Union all extensions together. 
*/ typedef union { struct cl_extension_base base; #define DECL_EXT(name) struct EXT_STRUCT_NAME(name) EXT_STRUCT_NAME(name); DECL_ALL_EXTENSIONS #undef DECL_EXT } extension_union; #include "cl_device_id.h" typedef struct cl_extensions { extension_union extensions[cl_khr_extension_id_max]; char ext_str[EXTENSTION_LENGTH]; } cl_extensions_t; extern void cl_intel_platform_extension_init(cl_platform_id intel_platform); extern void cl_intel_platform_enable_extension(cl_device_id device, uint32_t name); extern void cl_intel_platform_get_default_extension(cl_device_id device); #endif /* __CL_EXTENSIONS_H__ */ Beignet-1.3.2-Source/src/Android.mk000664 001750 001750 00000011532 13161142102 016151 0ustar00yryr000000 000000 LOCAL_PATH:= $(call my-dir) include $(CLEAR_VARS) include $(LOCAL_PATH)/../Android.common.mk ocl_config_file = $(LOCAL_PATH)/OCLConfig.h $(shell echo "// the configured options and settings for LIBCL" > $(ocl_config_file)) $(shell echo "#define LIBCL_DRIVER_VERSION_MAJOR 1" >> $(ocl_config_file)) $(shell echo "#define LIBCL_DRIVER_VERSION_MINOR 2" >> $(ocl_config_file)) $(shell echo "#define LIBCL_C_VERSION_MAJOR 1" >> $(ocl_config_file)) $(shell echo "#define LIBCL_C_VERSION_MINOR 2" >> $(ocl_config_file)) LOCAL_C_INCLUDES := $(TOP_C_INCLUDE) $(BEIGNET_ROOT_PATH)/backend/src/backend/ $(BEIGNET_ROOT_PATH) LOCAL_C_INCLUDES += $(DRM_INCLUDE_PATH) LOCAL_C_INCLUDES += $(LLVM_INCLUDE_DIRS) LOCAL_C_INCLUDES += hardware/drm_gralloc LOCAL_CPPFLAGS := $(TOP_CPPFLAGS) -std=c++11 -DHAS_USERPTR LOCAL_CFLAGS := $(TOP_CFLAGS) -DHAS_USERPTR OPTIONAL_EGL_LIBRARY := LOCAL_LDFLAGS := -Wl,-Bsymbolic LOCAL_LDLIBS := -lm -ldl LOCAL_SHARED_LIBRARIES += liblog libcutils LOCAL_ADDITIONAL_DEPENDENCIES := $(GBE_BIN_GENERATER) LOCAL_MODULE := libcl LOCAL_REQUIRED_MODULES := $(HOST_OUT_EXECUTABLES)/gbe_bin_generater LOCAL_ADDITIONAL_DEPENDENCIES := $(BEIGNET_ROOT_PATH)/backend/src/Android.mk KERNEL_PATH := $(BEIGNET_ROOT_PATH)/src/kernels KERNEL_NAMES := cl_internal_copy_buf_align4 \ cl_internal_copy_buf_align16 \ cl_internal_copy_buf_unalign_same_offset \ cl_internal_copy_buf_unalign_dst_offset \ cl_internal_copy_buf_unalign_src_offset \ cl_internal_copy_buf_rect \ cl_internal_copy_buf_rect_align4 \ cl_internal_copy_image_1d_to_1d \ cl_internal_copy_image_2d_to_2d \ cl_internal_copy_image_3d_to_2d \ cl_internal_copy_image_2d_to_3d \ cl_internal_copy_image_3d_to_3d \ cl_internal_copy_image_2d_to_2d_array \ cl_internal_copy_image_1d_array_to_1d_array \ cl_internal_copy_image_2d_array_to_2d_array \ cl_internal_copy_image_2d_array_to_2d \ cl_internal_copy_image_2d_array_to_3d \ cl_internal_copy_image_3d_to_2d_array \ cl_internal_copy_image_2d_to_buffer \ cl_internal_copy_image_2d_to_buffer_align16 \ cl_internal_copy_image_3d_to_buffer \ cl_internal_copy_buffer_to_image_2d \ cl_internal_copy_buffer_to_image_2d_align16 \ cl_internal_copy_buffer_to_image_3d \ cl_internal_fill_buf_align8 \ cl_internal_fill_buf_align4 \ cl_internal_fill_buf_align2 \ cl_internal_fill_buf_unalign \ cl_internal_fill_buf_align128 \ cl_internal_fill_image_1d \ cl_internal_fill_image_1d_array \ cl_internal_fill_image_2d \ cl_internal_fill_image_2d_array \ cl_internal_fill_image_3d BUILT_IN_NAME := cl_internal_built_in_kernel GBE_BIN_GENERATER := $(HOST_OUT_EXECUTABLES)/gbe_bin_generater $(shell rm $(KERNEL_PATH)/$(BUILT_IN_NAME).cl) define GEN_INTERNAL_KER # Use the python script to generate the header files. 
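# Illustrative expansion for one kernel (paths abbreviated): the
# gbe_bin_generater host tool turns each .cl file into an embeddable C
# string, and the same source is appended to the concatenated built-in
# kernel file:
#   gbe_bin_generater -s kernels/cl_internal_copy_buf_align4.cl \
#       -o kernels/cl_internal_copy_buf_align4_str.c
#   cat kernels/cl_internal_copy_buf_align4.cl \
#       >> kernels/cl_internal_built_in_kernel.cl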
$(shell $(GBE_BIN_GENERATER) -s $(KERNEL_PATH)/$(1).cl -o $(KERNEL_PATH)/$(1)_str.c) $(shell cat $(KERNEL_PATH)/$(1).cl >> $(KERNEL_PATH)/$(BUILT_IN_NAME).cl) endef $(foreach KERNEL_NAME, ${KERNEL_NAMES}, $(eval $(call GEN_INTERNAL_KER,$(KERNEL_NAME)))) $(shell $(GBE_BIN_GENERATER) -s $(KERNEL_PATH)/$(BUILT_IN_NAME).cl -o $(KERNEL_PATH)/$(BUILT_IN_NAME)_str.c) GIT_SHA1 = git_sha1.h $(shell chmod +x $(LOCAL_PATH)/git_sha1.sh) $(shell $(LOCAL_PATH)/git_sha1.sh $(LOCAL_PATH) ${GIT_SHA1}) LOCAL_SRC_FILES:= \ $(addprefix kernels/,$(addsuffix _str.c, $(KERNEL_NAMES))) \ $(addprefix kernels/,$(addsuffix _str.c, $(BUILT_IN_NAME))) \ cl_base_object.c \ cl_api.c \ cl_api_platform_id.c \ cl_api_device_id.c \ cl_api_mem.c \ cl_api_kernel.c \ cl_api_command_queue.c \ cl_api_event.c \ cl_api_context.c \ cl_api_sampler.c \ cl_api_program.c \ cl_alloc.c \ cl_kernel.c \ cl_program.c \ cl_gbe_loader.cpp \ cl_sampler.c \ cl_accelerator_intel.c \ cl_event.c \ cl_enqueue.c \ cl_image.c \ cl_mem.c \ cl_platform_id.c \ cl_extensions.c \ cl_device_id.c \ cl_context.c \ cl_command_queue.c \ cl_command_queue.h \ cl_command_queue_gen7.c \ cl_command_queue_enqueue.c \ cl_device_enqueue.c \ cl_utils.c \ cl_driver.h \ cl_driver.cpp \ cl_driver_defs.c \ intel/intel_gpgpu.c \ intel/intel_batchbuffer.c \ intel/intel_driver.c \ performance.c LOCAL_SHARED_LIBRARIES := \ libgbe \ libdl \ $(DRM_INTEL_LIBRARY) \ $(DRM_LIBRARY) \ $(OPTIONAL_EGL_LIBRARY) \ libhardware #LOCAL_CLANG := true include external/libcxx/libcxx.mk include $(BUILD_SHARED_LIBRARY) Beignet-1.3.2-Source/src/cl_internals.h000664 001750 001750 00000002737 13161142102 017075 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #ifndef __CL_INTERNALS_H__ #define __CL_INTERNALS_H__ /* We put a header to identify each object. This will make the programmer life * easy if objects are wrongly used in the API */ #define CL_MAGIC_KERNEL_HEADER 0x1234567890abcdefLL #define CL_MAGIC_CONTEXT_HEADER 0x0ab123456789cdefLL #define CL_MAGIC_PROGRAM_HEADER 0x34560ab12789cdefLL #define CL_MAGIC_QUEUE_HEADER 0x83650a12b79ce4dfLL #define CL_MAGIC_SAMPLER_HEADER 0x686a0ecba79ce33fLL #define CL_MAGIC_EVENT_HEADER 0x8324a9c810ebf90fLL #define CL_MAGIC_MEM_HEADER 0x381a27b9ce6504dfLL #define CL_MAGIC_DEAD_HEADER 0xdeaddeaddeaddeadLL #define CL_MAGIC_ACCELERATOR_INTEL_HEADER 0x7c6a08c9a7ac3e3fLL #endif /* __CL_INTERNALS_H__ */ Beignet-1.3.2-Source/src/cl_sampler.c000664 001750 001750 00000007125 13161142102 016530 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. 
* * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #include "cl_context.h" #include "cl_sampler.h" #include "cl_utils.h" #include "cl_alloc.h" #include "cl_khr_icd.h" #include "cl_kernel.h" #include uint32_t cl_to_clk(cl_bool normalized_coords, cl_addressing_mode address, cl_filter_mode filter) { int clk_address = CLK_ADDRESS_NONE; int clk_filter = CLK_FILTER_NEAREST; switch (address) { case CL_ADDRESS_NONE: clk_address = CLK_ADDRESS_NONE; break; case CL_ADDRESS_CLAMP: clk_address = CLK_ADDRESS_CLAMP; break; case CL_ADDRESS_CLAMP_TO_EDGE: clk_address = CLK_ADDRESS_CLAMP_TO_EDGE; break; case CL_ADDRESS_REPEAT: clk_address = CLK_ADDRESS_REPEAT; break; case CL_ADDRESS_MIRRORED_REPEAT: clk_address = CLK_ADDRESS_MIRRORED_REPEAT; break; default: assert(0); } switch(filter) { case CL_FILTER_NEAREST: clk_filter = CLK_FILTER_NEAREST; break; case CL_FILTER_LINEAR: clk_filter = CLK_FILTER_LINEAR; break; default: assert(0); } return (clk_address << __CLK_ADDRESS_BASE) | (normalized_coords << __CLK_NORMALIZED_BASE) | (clk_filter); } #define IS_SAMPLER_ARG(v) (v & __CLK_SAMPLER_ARG_KEY_BIT) #define SAMPLER_ARG_ID(v) ((v & __CLK_SAMPLER_ARG_MASK) >> __CLK_SAMPLER_ARG_BASE) int cl_set_sampler_arg_slot(cl_kernel k, int index, cl_sampler sampler) { int slot_id; for(slot_id = 0; slot_id < k->sampler_sz; slot_id++) { if (IS_SAMPLER_ARG(k->samplers[slot_id])) { if (SAMPLER_ARG_ID(k->samplers[slot_id]) == index) { k->samplers[slot_id] = (k->samplers[slot_id] & (~__CLK_SAMPLER_MASK)) | sampler->clkSamplerValue; return slot_id; } } } return -1; } LOCAL cl_sampler cl_create_sampler(cl_context ctx, cl_bool normalized_coords, cl_addressing_mode address, cl_filter_mode filter, cl_int *errcode_ret) { cl_sampler sampler = NULL; /* Allocate and inialize the structure itself */ sampler = cl_calloc(1, sizeof(_cl_sampler)); if (sampler == NULL) { *errcode_ret = CL_OUT_OF_HOST_MEMORY; return NULL; } CL_OBJECT_INIT_BASE(sampler, CL_OBJECT_SAMPLER_MAGIC); sampler->normalized_coords = normalized_coords; sampler->address = address; sampler->filter = filter; /* Append the sampler in the context sampler list */ cl_context_add_sampler(ctx, sampler); // TODO: May move it to other place, it's not a common sampler logic. sampler->clkSamplerValue = cl_to_clk(normalized_coords, address, filter); *errcode_ret = CL_SUCCESS; return sampler; } LOCAL void cl_sampler_delete(cl_sampler sampler) { if (UNLIKELY(sampler == NULL)) return; if (CL_OBJECT_DEC_REF(sampler) > 1) return; cl_context_remove_sampler(sampler->ctx, sampler); CL_OBJECT_DESTROY_BASE(sampler); cl_free(sampler); } LOCAL void cl_sampler_add_ref(cl_sampler sampler) { assert(sampler); CL_OBJECT_INC_REF(sampler); } Beignet-1.3.2-Source/src/cl_cmrt.h000664 001750 001750 00000002776 13161142102 016046 0ustar00yryr000000 000000 /* * Copyright @2015 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. 
* * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Guo Yejun */ #ifndef __CL_CMRT_H__ #define __CL_CMRT_H__ #ifdef __cplusplus extern "C" { #endif #include "cl_kernel.h" #include "cl_program.h" cl_int cmrt_build_program(cl_program p, const char *options); cl_int cmrt_destroy_program(cl_program p); cl_int cmrt_destroy_device(cl_device_id device); void* cmrt_create_kernel(cl_program p, const char *name); cl_int cmrt_destroy_kernel(cl_kernel k); cl_int cmrt_enqueue(cl_command_queue cq, cl_kernel k, const size_t* global_work_size, const size_t* local_work_size); cl_int cmrt_set_kernel_arg(cl_kernel k, cl_uint index, size_t sz, const void *value); cl_int cmrt_destroy_memory(cl_mem mem); cl_int cmrt_destroy_event(cl_command_queue cq); cl_int cmrt_wait_for_task_finished(cl_command_queue cq); #ifdef __cplusplus } #endif #endif Beignet-1.3.2-Source/src/cl_device_enqueue.h000664 001750 001750 00000002277 13161142102 020063 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Rong Yang */ #ifndef __CL_DEVICE_ENQUEUE_H__ #define __CL_DEVICE_ENQUEUE_H__ #include "cl_internals.h" #include "cl_driver.h" #include "CL/cl.h" #include extern cl_int cl_device_enqueue_bind_buffer(cl_gpgpu gpgpu, cl_kernel ker, uint32_t *max_bti, cl_gpgpu_kernel *kernel); extern cl_int cl_device_enqueue_parse_result(cl_command_queue queue, cl_gpgpu gpgpu); #endif /* __CL_DEVICE_ENQUEUE_H__ */ Beignet-1.3.2-Source/src/cl_platform_id.h000664 001750 001750 00000005732 13161142102 017374 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ #ifndef __CL_PLATFORM_ID_H__ #define __CL_PLATFORM_ID_H__ #include "CL/cl.h" #include "cl_internals.h" #include "cl_extensions.h" #include "cl_base_object.h" #include "src/OCLConfig.h" #include "src/git_sha1.h" struct _cl_platform_id { _cl_base_object base; const char *profile; const char *version; const char *name; const char *vendor; char *extensions; const char *icd_suffix_khr; size_t profile_sz; size_t version_sz; size_t name_sz; size_t vendor_sz; size_t extensions_sz; size_t icd_suffix_khr_sz; struct cl_extensions *internal_extensions; }; #define CL_OBJECT_PLATFORM_MAGIC 0xaacdbb00123ccd85LL #define CL_OBJECT_IS_PLATFORM(obj) ((obj && \ ((cl_base_object)obj)->magic == CL_OBJECT_PLATFORM_MAGIC && \ CL_OBJECT_GET_REF(obj) >= 1)) /* Return the default platform */ extern cl_platform_id cl_get_platform_default(void); /* Return the valid platform */ extern cl_int cl_get_platform_ids(cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms); #define _STR(x) #x #define _JOINT(x, y) _STR(x) "." _STR(y) #define _JOINT3(x, y, z) _STR(x) "." _STR(y) "." _STR(z) #ifdef BEIGNET_GIT_SHA1 #define BEIGNET_GIT_SHA1_STRING " (" BEIGNET_GIT_SHA1 ")" #else #define BEIGNET_GIT_SHA1_STRING #endif #ifdef LIBCL_DRIVER_VERSION_PATCH #define LIBCL_DRIVER_VERSION_STRING _JOINT3(LIBCL_DRIVER_VERSION_MAJOR, LIBCL_DRIVER_VERSION_MINOR, LIBCL_DRIVER_VERSION_PATCH) #else #define LIBCL_DRIVER_VERSION_STRING _JOINT(LIBCL_DRIVER_VERSION_MAJOR, LIBCL_DRIVER_VERSION_MINOR) #endif #define GEN9_LIBCL_VERSION_STRING "OpenCL " _JOINT(LIBCL_C_VERSION_MAJOR, LIBCL_C_VERSION_MINOR) " beignet " LIBCL_DRIVER_VERSION_STRING BEIGNET_GIT_SHA1_STRING #define GEN9_LIBCL_C_VERSION_STRING "OpenCL C " _JOINT(LIBCL_C_VERSION_MAJOR, LIBCL_C_VERSION_MINOR) " beignet " LIBCL_DRIVER_VERSION_STRING BEIGNET_GIT_SHA1_STRING #define NONGEN9_LIBCL_VERSION_STRING "OpenCL 1.2 beignet " LIBCL_DRIVER_VERSION_STRING BEIGNET_GIT_SHA1_STRING #define NONGEN9_LIBCL_C_VERSION_STRING "OpenCL C 1.2 beignet " LIBCL_DRIVER_VERSION_STRING BEIGNET_GIT_SHA1_STRING #endif /* __CL_PLATFORM_ID_H__ */ Beignet-1.3.2-Source/src/cl_alloc.h000664 001750 001750 00000002655 13161142102 016167 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #ifndef __CL_ALLOC_H__ #define __CL_ALLOC_H__ #include "cl_internals.h" #include /* Return a valid pointer for the requested memory block size */ extern void *cl_malloc(size_t sz); /* Aligned malloc */ extern void* cl_aligned_malloc(size_t sz, size_t align); /* malloc + memzero */ extern void *cl_calloc(size_t n, size_t elem_size); /* Regular realloc */ extern void *cl_realloc(void *ptr, size_t sz); /* Free a pointer allocated with cl_*alloc */ extern void cl_free(void *ptr); /* We count the number of allocation. 
This function report the number of * allocation still unfreed */ extern size_t cl_report_unfreed(void); #endif /* __CL_ALLOC_H__ */ Beignet-1.3.2-Source/src/cl_mem_gl.c000664 001750 001750 00000004672 13161142102 016331 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Zhigang Gong */ #include #include #include #include #include #include #include "cl_mem.h" #include "cl_image.h" #include "cl_context.h" #include "cl_utils.h" #include "cl_alloc.h" #include "cl_device_id.h" #include "cl_driver.h" #include "cl_platform_id.h" #include "cl_mem_gl.h" #include "CL/cl.h" #include "CL/cl_intel.h" #include "CL/cl_gl.h" LOCAL cl_mem cl_mem_new_gl_buffer(cl_context ctx, cl_mem_flags flags, GLuint buf_obj, cl_int *errcode_ret) { NOT_IMPLEMENTED; } LOCAL cl_mem cl_mem_new_gl_texture(cl_context ctx, cl_mem_flags flags, GLenum texture_target, GLint miplevel, GLuint texture, cl_int *errcode_ret) { cl_int err = CL_SUCCESS; cl_mem mem = NULL; /* Check flags consistency */ if (UNLIKELY(flags & CL_MEM_COPY_HOST_PTR)) { err = CL_INVALID_ARG_VALUE; goto error; } mem = cl_mem_allocate(CL_MEM_GL_IMAGE_TYPE, ctx, flags, 0, 0, NULL, NULL, &err); if (mem == NULL || err != CL_SUCCESS) goto error; mem->bo = cl_buffer_alloc_from_texture(ctx, texture_target, miplevel, texture, cl_mem_image(mem)); if (UNLIKELY(mem->bo == NULL)) { err = CL_MEM_OBJECT_ALLOCATION_FAILURE; goto error; } exit: if (errcode_ret) *errcode_ret = err; return mem; error: cl_mem_delete(mem); mem = NULL; goto exit; } LOCAL void cl_mem_gl_delete(struct _cl_mem_gl_image *gl_image) { if (gl_image->base.base.bo != NULL) cl_buffer_release_from_texture(gl_image->base.base.ctx, gl_image); } Beignet-1.3.2-Source/src/cl_accelerator_intel.c000664 001750 001750 00000004165 13161142102 020545 0ustar00yryr000000 000000 #include "cl_context.h" #include "cl_accelerator_intel.h" #include "cl_utils.h" #include "cl_alloc.h" #include "cl_khr_icd.h" #include "cl_kernel.h" #include LOCAL cl_accelerator_intel cl_accelerator_intel_new(cl_context ctx, cl_accelerator_type_intel accel_type, size_t desc_sz, const void* desc, cl_int* errcode_ret) { cl_accelerator_intel accel = NULL; cl_int err = CL_SUCCESS; /* Allocate and inialize the structure itself */ TRY_ALLOC(accel, CALLOC(struct _cl_accelerator_intel)); CL_OBJECT_INIT_BASE(accel, CL_OBJECT_ACCELERATOR_INTEL_MAGIC); if (accel_type != CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL) { err = CL_INVALID_ACCELERATOR_TYPE_INTEL; goto error; } accel->type = accel_type; if (desc == NULL) { // and check inside desc err = CL_INVALID_ACCELERATOR_DESCRIPTOR_INTEL; goto error; } accel->desc.me = *(cl_motion_estimation_desc_intel*)desc; /* Append the accelerator_intel in the context accelerator_intel list */ /* does this really needed? 
*/ CL_OBJECT_LOCK(ctx); accel->next = ctx->accels; if (ctx->accels != NULL) ctx->accels->prev = accel; ctx->accels = accel; CL_OBJECT_UNLOCK(ctx); accel->ctx = ctx; cl_context_add_ref(ctx); exit: if (errcode_ret) *errcode_ret = err; return accel; error: cl_accelerator_intel_delete(accel); accel = NULL; goto exit; } LOCAL void cl_accelerator_intel_add_ref(cl_accelerator_intel accel) { CL_OBJECT_INC_REF(accel); } LOCAL void cl_accelerator_intel_delete(cl_accelerator_intel accel) { if (UNLIKELY(accel == NULL)) return; if (CL_OBJECT_DEC_REF(accel) > 1) return; /* Remove the accelerator_intel from the context accelerator_intel list */ CL_OBJECT_LOCK(accel->ctx); if (accel->prev) accel->prev->next = accel->next; if (accel->next) accel->next->prev = accel->prev; if (accel->ctx->accels == accel) accel->ctx->accels = accel->next; CL_OBJECT_UNLOCK(accel->ctx); cl_context_delete(accel->ctx); CL_OBJECT_DESTROY_BASE(accel); cl_free(accel); } Beignet-1.3.2-Source/src/cl_command_queue.c000664 001750 001750 00000027267 13161150061 017713 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ #include "program.h" // for BTI_MAX_IMAGE_NUM #include "cl_command_queue.h" #include "cl_context.h" #include "cl_program.h" #include "cl_kernel.h" #include "cl_device_id.h" #include "cl_mem.h" #include "cl_utils.h" #include "cl_alloc.h" #include "cl_driver.h" #include "cl_khr_icd.h" #include "cl_event.h" #include "performance.h" #include "cl_cmrt.h" #include <assert.h> #include <stdio.h> #include <string.h> static cl_command_queue cl_command_queue_new(cl_context ctx) { cl_command_queue queue = NULL; assert(ctx); queue = cl_calloc(1, sizeof(_cl_command_queue)); if (queue == NULL) return NULL; CL_OBJECT_INIT_BASE(queue, CL_OBJECT_COMMAND_QUEUE_MAGIC); if (cl_command_queue_init_enqueue(queue) != CL_SUCCESS) { cl_free(queue); return NULL; } /* Append the command queue to the list */ cl_context_add_queue(ctx, queue); return queue; } LOCAL cl_command_queue cl_create_command_queue(cl_context ctx, cl_device_id device, cl_command_queue_properties properties, cl_uint queue_size, cl_int *errcode_ret) { cl_command_queue queue = cl_command_queue_new(ctx); if (queue == NULL) { *errcode_ret = CL_OUT_OF_HOST_MEMORY; return NULL; } queue->props = properties; queue->device = device; queue->size = queue_size; *errcode_ret = CL_SUCCESS; return queue; } LOCAL void cl_command_queue_delete(cl_command_queue queue) { assert(queue); if (CL_OBJECT_DEC_REF(queue) > 1) return; /* Before we destroy the queue, we should make sure all the commands in the queue are finished.
*/ cl_command_queue_wait_finish(queue); cl_context_remove_queue(queue->ctx, queue); cl_command_queue_destroy_enqueue(queue); cl_mem_delete(queue->perf); if (queue->barrier_events) { cl_free(queue->barrier_events); } CL_OBJECT_DESTROY_BASE(queue); cl_free(queue); } LOCAL void cl_command_queue_add_ref(cl_command_queue queue) { CL_OBJECT_INC_REF(queue); } static void set_image_info(char *curbe, struct ImageInfo * image_info, struct _cl_mem_image *image) { if (image_info->wSlot >= 0) *(uint32_t*)(curbe + image_info->wSlot) = image->w; if (image_info->hSlot >= 0) *(uint32_t*)(curbe + image_info->hSlot) = image->h; if (image_info->depthSlot >= 0) *(uint32_t*)(curbe + image_info->depthSlot) = image->depth; if (image_info->channelOrderSlot >= 0) *(uint32_t*)(curbe + image_info->channelOrderSlot) = image->fmt.image_channel_order; if (image_info->dataTypeSlot >= 0) *(uint32_t*)(curbe + image_info->dataTypeSlot) = image->fmt.image_channel_data_type; } LOCAL cl_int cl_command_queue_bind_image(cl_command_queue queue, cl_kernel k, cl_gpgpu gpgpu, uint32_t *max_bti) { uint32_t i; for (i = 0; i < k->image_sz; i++) { int id = k->images[i].arg_idx; struct _cl_mem_image *image; assert(interp_kernel_get_arg_type(k->opaque, id) == GBE_ARG_IMAGE); image = cl_mem_image(k->args[id].mem); set_image_info(k->curbe, &k->images[i], image); if(*max_bti < k->images[i].idx) *max_bti = k->images[i].idx; if(k->vme){ if( (image->fmt.image_channel_order != CL_R) || (image->fmt.image_channel_data_type != CL_UNORM_INT8) ) return CL_IMAGE_FORMAT_NOT_SUPPORTED; cl_gpgpu_bind_image_for_vme(gpgpu, k->images[i].idx, image->base.bo, image->offset + k->args[id].mem->offset, image->intel_fmt, image->image_type, image->bpp, image->w, image->h, image->depth, image->row_pitch, image->slice_pitch, (cl_gpgpu_tiling)image->tiling); } else cl_gpgpu_bind_image(gpgpu, k->images[i].idx, image->base.bo, image->offset + k->args[id].mem->offset, image->intel_fmt, image->image_type, image->bpp, image->w, image->h, image->depth, image->row_pitch, image->slice_pitch, (cl_gpgpu_tiling)image->tiling); // TODO, this workaround is for GEN7/GEN75 only, we may need to do it in the driver layer // on demand. 
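/* Descriptive note on the workaround below (our reading of the code, not the author's wording): for CL_MEM_OBJECT_IMAGE1D_ARRAY the same bo is bound a second time at idx + BTI_WORKAROUND_IMAGE_OFFSET, so kernels compiled for these Gens can reach the image through the extra binding-table entry. */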
if (image->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) cl_gpgpu_bind_image(gpgpu, k->images[i].idx + BTI_WORKAROUND_IMAGE_OFFSET, image->base.bo, image->offset + k->args[id].mem->offset, image->intel_fmt, image->image_type, image->bpp, image->w, image->h, image->depth, image->row_pitch, image->slice_pitch, (cl_gpgpu_tiling)image->tiling); } return CL_SUCCESS; } LOCAL cl_int cl_command_queue_bind_surface(cl_command_queue queue, cl_kernel k, cl_gpgpu gpgpu, uint32_t *max_bti) { /* Bind all user buffers (given by clSetKernelArg) */ uint32_t i, bti; uint32_t ocl_version = interp_kernel_get_ocl_version(k->opaque); enum gbe_arg_type arg_type; /* kind of argument */ for (i = 0; i < k->arg_n; ++i) { int32_t offset; // location of the address in the curbe arg_type = interp_kernel_get_arg_type(k->opaque, i); if (!(arg_type == GBE_ARG_GLOBAL_PTR || (arg_type == GBE_ARG_CONSTANT_PTR && ocl_version >= 200) || arg_type == GBE_ARG_PIPE) || !k->args[i].mem) continue; offset = interp_kernel_get_curbe_offset(k->opaque, GBE_CURBE_KERNEL_ARGUMENT, i); if (offset < 0) continue; bti = interp_kernel_get_arg_bti(k->opaque, i); if(*max_bti < bti) *max_bti = bti; if (k->args[i].mem->type == CL_MEM_SUBBUFFER_TYPE) { struct _cl_mem_buffer* buffer = (struct _cl_mem_buffer*)k->args[i].mem; cl_gpgpu_bind_buf(gpgpu, k->args[i].mem->bo, offset, k->args[i].mem->offset + buffer->sub_offset, k->args[i].mem->size, bti); } else { size_t mem_offset = 0; // if(k->args[i].is_svm) { mem_offset = (size_t)k->args[i].ptr - (size_t)k->args[i].mem->host_ptr; } cl_gpgpu_bind_buf(gpgpu, k->args[i].mem->bo, offset, k->args[i].mem->offset + mem_offset, k->args[i].mem->size, bti); } } return CL_SUCCESS; } LOCAL cl_int cl_command_queue_bind_exec_info(cl_command_queue queue, cl_kernel k, cl_gpgpu gpgpu, uint32_t *max_bti) { uint32_t i; size_t mem_offset, bti = *max_bti; cl_mem mem; int32_t offset = interp_kernel_get_curbe_size(k->opaque); for (i = 0; i < k->exec_info_n; i++) { void *ptr = k->exec_info[i]; mem = cl_context_get_svm_from_ptr(k->program->ctx, ptr); if(mem == NULL) mem = cl_context_get_mem_from_ptr(k->program->ctx, ptr); if (mem) { mem_offset = (size_t)ptr - (size_t)mem->host_ptr; /* only need realloc in surface state, don't need realloc in curbe */ cl_gpgpu_bind_buf(gpgpu, mem->bo, offset + i * sizeof(ptr), mem->offset + mem_offset, mem->size, bti++); if(bti == BTI_WORKAROUND_IMAGE_OFFSET) bti = *max_bti + BTI_WORKAROUND_IMAGE_OFFSET; assert(bti < BTI_MAX_ID); } } *max_bti = bti; return CL_SUCCESS; } extern cl_int cl_command_queue_ND_range_gen7(cl_command_queue, cl_kernel, cl_event, uint32_t, const size_t *, const size_t *,const size_t *, const size_t *, const size_t *, const size_t *); static cl_int cl_kernel_check_args(cl_kernel k) { uint32_t i; for (i = 0; i < k->arg_n; ++i) if (k->args[i].is_set == CL_FALSE) return CL_INVALID_KERNEL_ARGS; return CL_SUCCESS; } LOCAL cl_int cl_command_queue_ND_range(cl_command_queue queue, cl_kernel k, cl_event event, const uint32_t work_dim, const size_t *global_wk_off, const size_t *global_dim_off, const size_t *global_wk_sz, const size_t *global_wk_sz_use, const size_t *local_wk_sz, const size_t *local_wk_sz_use) { if(b_output_kernel_perf) time_start(queue->ctx, cl_kernel_get_name(k), queue); const int32_t ver = cl_driver_get_ver(queue->ctx->drv); cl_int err = CL_SUCCESS; /* Check that the user did not forget any argument */ TRY (cl_kernel_check_args, k); if (ver == 7 || ver == 75 || ver == 8 || ver == 9) //TRY (cl_command_queue_ND_range_gen7, queue, k, work_dim, global_wk_off, global_wk_sz, 
local_wk_sz); TRY (cl_command_queue_ND_range_gen7, queue, k, event, work_dim, global_wk_off, global_dim_off, global_wk_sz, global_wk_sz_use, local_wk_sz, local_wk_sz_use); else FATAL ("Unknown Gen Device"); error: return err; } LOCAL int cl_command_queue_flush_gpgpu(cl_gpgpu gpgpu) { void* printf_info = cl_gpgpu_get_printf_info(gpgpu); void* profiling_info; if (cl_gpgpu_flush(gpgpu) < 0) return CL_OUT_OF_RESOURCES; if (printf_info && interp_get_printf_num(printf_info)) { void *addr = cl_gpgpu_map_printf_buffer(gpgpu); interp_output_printf(printf_info, addr); cl_gpgpu_unmap_printf_buffer(gpgpu); } if (printf_info) { interp_release_printf_info(printf_info); cl_gpgpu_set_printf_info(gpgpu, NULL); } /* If we have profiling info, output it. */ profiling_info = cl_gpgpu_get_profiling_info(gpgpu); if (profiling_info) { interp_output_profiling(profiling_info, cl_gpgpu_map_profiling_buffer(gpgpu)); cl_gpgpu_unmap_profiling_buffer(gpgpu); } return CL_SUCCESS; } LOCAL void cl_command_queue_insert_barrier_event(cl_command_queue queue, cl_event event) { cl_int i = 0; cl_event_add_ref(event); assert(queue != NULL); CL_OBJECT_LOCK(queue); if (queue->barrier_events == NULL) { queue->barrier_events_size = 4; queue->barrier_events = cl_calloc(queue->barrier_events_size, sizeof(cl_event)); assert(queue->barrier_events); } for (i = 0; i < queue->barrier_events_num; i++) { assert(queue->barrier_events[i] != event); } if(queue->barrier_events_num < queue->barrier_events_size) { queue->barrier_events[queue->barrier_events_num++] = event; CL_OBJECT_UNLOCK(queue); return; } /* Array is full; double its size. */ queue->barrier_events_size *= 2; queue->barrier_events = cl_realloc(queue->barrier_events, queue->barrier_events_size * sizeof(cl_event)); assert(queue->barrier_events); queue->barrier_events[queue->barrier_events_num++] = event; CL_OBJECT_UNLOCK(queue); return; } LOCAL void cl_command_queue_remove_barrier_event(cl_command_queue queue, cl_event event) { cl_int i = 0; assert(queue != NULL); CL_OBJECT_LOCK(queue); assert(queue->barrier_events_num > 0); assert(queue->barrier_events); for(i = 0; i < queue->barrier_events_num; i++) { if(queue->barrier_events[i] == event) break; } assert(i < queue->barrier_events_num); // Must find it. if(i == queue->barrier_events_num - 1) { // The last one. queue->barrier_events[i] = NULL; } else { for(; i < queue->barrier_events_num - 1; i++) { // Move forward. queue->barrier_events[i] = queue->barrier_events[i+1]; } } queue->barrier_events_num -= 1; CL_OBJECT_UNLOCK(queue); cl_event_delete(event); } Beignet-1.3.2-Source/src/cl_utils.c000664 001750 001750 00000004377 13161142102 016233 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>.
* */ #include "cl_utils.h" #include #include LOCAL void list_node_insert_before(struct list_node *node, struct list_node *the_new) { list_node *before_node = node->p; the_new->p = before_node; the_new->n = node; node->p = the_new; before_node->n = the_new; } LOCAL void list_node_insert_after(struct list_node *node, struct list_node *the_new) { list_node *after_node = node->n; the_new->n = after_node; the_new->p = node; node->n = the_new; after_node->p = the_new; } LOCAL void list_move(struct list_head *the_old, struct list_head *the_new) { assert(list_empty(the_new)); if (list_empty(the_old)) { return; } memcpy(&the_new->head_node, &the_old->head_node, sizeof(list_node)); the_new->head_node.n->p = &the_new->head_node; the_new->head_node.p->n = &the_new->head_node; list_init(the_old); } LOCAL void list_merge(struct list_head *head, struct list_head *to_merge) { if (list_empty(to_merge)) return; list_node *merge_last_node = to_merge->head_node.p; list_node *merge_first_node = to_merge->head_node.n; merge_last_node->n = &head->head_node; merge_first_node->p = head->head_node.p; head->head_node.p->n = merge_first_node; head->head_node.p = merge_last_node; list_init(to_merge); } LOCAL cl_int cl_get_info_helper(const void *src, size_t src_size, void *dst, size_t dst_size, size_t *ret_size) { if (dst && dst_size < src_size) return CL_INVALID_VALUE; if (dst && dst_size) { memcpy(dst, src, src_size); } if (ret_size) *ret_size = src_size; return CL_SUCCESS; } Beignet-1.3.2-Source/src/cl_device_id.c000664 001750 001750 00000206774 13173554000 017022 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ #include "cl_platform_id.h" #include "cl_device_id.h" #include "cl_internals.h" #include "cl_utils.h" #include "cl_driver.h" #include "cl_device_data.h" #include "cl_khr_icd.h" #include "CL/cl.h" #include "CL/cl_ext.h" #include "CL/cl_intel.h" #include "cl_gbe_loader.h" #include "cl_alloc.h" #include #include #include #include #include #ifndef CL_VERSION_1_2 #define CL_DEVICE_BUILT_IN_KERNELS 0x103F #endif static struct _cl_device_id intel_ivb_gt2_device = { .max_compute_unit = 16, .max_thread_per_unit = 8, .sub_slice_count = 2, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen7_device.h" }; static struct _cl_device_id intel_ivb_gt1_device = { .max_compute_unit = 6, .max_thread_per_unit = 6, .sub_slice_count = 1, .max_work_item_sizes = {256, 256, 256}, .max_work_group_size = 256, .max_clock_frequency = 1000, #include "cl_gen7_device.h" }; static struct _cl_device_id intel_baytrail_t_device = { .max_compute_unit = 4, .max_thread_per_unit = 8, .sub_slice_count = 1, .max_work_item_sizes = {256, 256, 256}, .max_work_group_size = 256, .max_clock_frequency = 1000, #include "cl_gen7_device.h" }; /* XXX we clone IVB for HSW now */ static struct _cl_device_id intel_hsw_gt1_device = { .max_compute_unit = 10, .max_thread_per_unit = 7, .sub_slice_count = 1, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen75_device.h" }; static struct _cl_device_id intel_hsw_gt2_device = { .max_compute_unit = 20, .max_thread_per_unit = 7, .sub_slice_count = 2, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen75_device.h" }; static struct _cl_device_id intel_hsw_gt3_device = { .max_compute_unit = 40, .max_thread_per_unit = 7, .sub_slice_count = 4, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen75_device.h" }; /* XXX we clone IVB for HSW now */ static struct _cl_device_id intel_brw_gt1_device = { .max_compute_unit = 12, .max_thread_per_unit = 7, .sub_slice_count = 2, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen8_device.h" }; static struct _cl_device_id intel_brw_gt2_device = { .max_compute_unit = 24, .max_thread_per_unit = 7, .sub_slice_count = 3, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen8_device.h" }; static struct _cl_device_id intel_brw_gt3_device = { .max_compute_unit = 48, .max_thread_per_unit = 7, .sub_slice_count = 6, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen8_device.h" }; //Cherryview has the same pciid, must get the max_compute_unit and max_thread_per_unit from drm static struct _cl_device_id intel_chv_device = { .max_compute_unit = 8, .max_thread_per_unit = 7, .sub_slice_count = 2, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen75_device.h" }; /* XXX we clone brw now */ static struct _cl_device_id intel_skl_gt1_device = { .max_compute_unit = 6, .max_thread_per_unit = 7, .sub_slice_count = 2, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_skl_gt2_device = { .max_compute_unit = 24, .max_thread_per_unit = 7, .sub_slice_count = 3, 
.max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_skl_gt3_device = { .max_compute_unit = 48, .max_thread_per_unit = 7, .sub_slice_count = 6, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_skl_gt4_device = { .max_compute_unit = 72, .max_thread_per_unit = 7, .sub_slice_count = 9, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_bxt18eu_device = { .max_compute_unit = 18, .max_thread_per_unit = 6, .sub_slice_count = 3, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_bxt12eu_device = { .max_compute_unit = 12, .max_thread_per_unit = 6, .sub_slice_count = 2, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_kbl_gt1_device = { .max_compute_unit = 12, .max_thread_per_unit = 7, .sub_slice_count = 2, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_kbl_gt15_device = { .max_compute_unit = 18, .max_thread_per_unit = 7, .sub_slice_count = 3, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_kbl_gt2_device = { .max_compute_unit = 24, .max_thread_per_unit = 7, .sub_slice_count = 3, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_kbl_gt3_device = { .max_compute_unit = 48, .max_thread_per_unit = 7, .sub_slice_count = 6, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_kbl_gt4_device = { .max_compute_unit = 72, .max_thread_per_unit = 7, .sub_slice_count = 9, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_glk18eu_device = { .max_compute_unit = 18, .max_thread_per_unit = 6, .sub_slice_count = 3, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; static struct _cl_device_id intel_glk12eu_device = { .max_compute_unit = 12, .max_thread_per_unit = 6, .sub_slice_count = 2, .max_work_item_sizes = {512, 512, 512}, .max_work_group_size = 512, .max_clock_frequency = 1000, #include "cl_gen9_device.h" }; LOCAL cl_device_id cl_get_gt_device(cl_device_type device_type) { cl_device_id ret = NULL; const int device_id = cl_driver_get_device_id(); cl_device_id device = NULL; //cl_get_gt_device only returns GPU type devices.
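/* Host-side path that reaches this function (an illustrative sketch only, not part of this file; error handling omitted): cl_uint n; cl_device_id dev; clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &dev, &n); Any device_type without the GPU or DEFAULT bit set is rejected by the check below. */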
if (((CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_DEFAULT) & device_type) == 0) return NULL; #define DECL_INFO_STRING(BREAK, STRUCT, FIELD, STRING) \ STRUCT.FIELD = STRING; \ STRUCT.JOIN(FIELD,_sz) = sizeof(STRING); \ device = &STRUCT; \ goto BREAK; switch (device_id) { case PCI_CHIP_HASWELL_D1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell GT1 Desktop"); case PCI_CHIP_HASWELL_D2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell GT2 Desktop"); case PCI_CHIP_HASWELL_D3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell GT3 Desktop"); case PCI_CHIP_HASWELL_S1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell GT1 Server"); case PCI_CHIP_HASWELL_S2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell GT2 Server"); case PCI_CHIP_HASWELL_S3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell GT3 Server"); case PCI_CHIP_HASWELL_M1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell GT1 Mobile"); case PCI_CHIP_HASWELL_M2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell GT2 Mobile"); case PCI_CHIP_HASWELL_M3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell GT3 Mobile"); case PCI_CHIP_HASWELL_B1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell GT1 reserved"); case PCI_CHIP_HASWELL_B2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell GT2 reserved"); case PCI_CHIP_HASWELL_B3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell GT3 reserved"); case PCI_CHIP_HASWELL_E1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell GT1 reserved"); case PCI_CHIP_HASWELL_E2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell GT2 reserved"); case PCI_CHIP_HASWELL_E3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell GT3 reserved"); case PCI_CHIP_HASWELL_SDV_D1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT1 Desktop"); case PCI_CHIP_HASWELL_SDV_D2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT2 Desktop"); case PCI_CHIP_HASWELL_SDV_D3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT3 Desktop"); case PCI_CHIP_HASWELL_SDV_S1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT1 Server"); case PCI_CHIP_HASWELL_SDV_S2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT2 Server"); case PCI_CHIP_HASWELL_SDV_S3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT3 Server"); case PCI_CHIP_HASWELL_SDV_M1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT1 Mobile"); case PCI_CHIP_HASWELL_SDV_M2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT2 Mobile"); case PCI_CHIP_HASWELL_SDV_M3: 
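/* Note on the switch pattern used throughout this function (a descriptive comment, not from the author): each DECL_INFO_STRING(...) sets the name string and its size on the chosen static struct, points `device` at it, and ends in a goto to the given label, so runs of bare case labels deliberately fall through to the next DECL_INFO_STRING that carries their shared name. */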
DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT3 Mobile"); case PCI_CHIP_HASWELL_SDV_B1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT1 reserved"); case PCI_CHIP_HASWELL_SDV_B2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT2 reserved"); case PCI_CHIP_HASWELL_SDV_B3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT3 reserved"); case PCI_CHIP_HASWELL_SDV_E1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT1 reserved"); case PCI_CHIP_HASWELL_SDV_E2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT2 reserved"); case PCI_CHIP_HASWELL_SDV_E3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell" " Software Development Vehicle device GT3 reserved"); case PCI_CHIP_HASWELL_ULT_D1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT1 Desktop"); case PCI_CHIP_HASWELL_ULT_D2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT2 Desktop"); case PCI_CHIP_HASWELL_ULT_D3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT3 Desktop"); case PCI_CHIP_HASWELL_ULT_S1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT1 Server"); case PCI_CHIP_HASWELL_ULT_S2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT2 Server"); case PCI_CHIP_HASWELL_ULT_S3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT3 Server"); case PCI_CHIP_HASWELL_ULT_M1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT1 Mobile"); case PCI_CHIP_HASWELL_ULT_M2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile"); case PCI_CHIP_HASWELL_ULT_M3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT3 Mobile"); case PCI_CHIP_HASWELL_ULT_B1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT1 reserved"); case PCI_CHIP_HASWELL_ULT_B2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT2 reserved"); case PCI_CHIP_HASWELL_ULT_B3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT3 reserved"); case PCI_CHIP_HASWELL_ULT_E1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT1 reserved"); case PCI_CHIP_HASWELL_ULT_E2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT2 reserved"); case PCI_CHIP_HASWELL_ULT_E3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell Ultrabook GT3 reserved"); /* CRW */ case PCI_CHIP_HASWELL_CRW_D1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell CRW GT1 Desktop"); case PCI_CHIP_HASWELL_CRW_D2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell CRW GT2 
Desktop"); case PCI_CHIP_HASWELL_CRW_D3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell CRW GT3 Desktop"); case PCI_CHIP_HASWELL_CRW_S1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell CRW GT1 Server"); case PCI_CHIP_HASWELL_CRW_S2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell CRW GT2 Server"); case PCI_CHIP_HASWELL_CRW_S3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell CRW GT3 Server"); case PCI_CHIP_HASWELL_CRW_M1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell CRW GT1 Mobile"); case PCI_CHIP_HASWELL_CRW_M2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell CRW GT2 Mobile"); case PCI_CHIP_HASWELL_CRW_M3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell CRW GT3 Mobile"); case PCI_CHIP_HASWELL_CRW_B1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell CRW GT1 reserved"); case PCI_CHIP_HASWELL_CRW_B2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell CRW GT2 reserved"); case PCI_CHIP_HASWELL_CRW_B3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell CRW GT3 reserved"); case PCI_CHIP_HASWELL_CRW_E1: DECL_INFO_STRING(has_break, intel_hsw_gt1_device, name, "Intel(R) HD Graphics Haswell CRW GT1 reserved"); case PCI_CHIP_HASWELL_CRW_E2: DECL_INFO_STRING(has_break, intel_hsw_gt2_device, name, "Intel(R) HD Graphics Haswell CRW GT2 reserved"); case PCI_CHIP_HASWELL_CRW_E3: DECL_INFO_STRING(has_break, intel_hsw_gt3_device, name, "Intel(R) HD Graphics Haswell CRW GT3 reserved"); has_break: device->device_id = device_id; device->platform = cl_get_platform_default(); ret = device; cl_intel_platform_get_default_extension(ret); break; case PCI_CHIP_IVYBRIDGE_GT1: DECL_INFO_STRING(ivb_gt1_break, intel_ivb_gt1_device, name, "Intel(R) HD Graphics IvyBridge GT1"); case PCI_CHIP_IVYBRIDGE_M_GT1: DECL_INFO_STRING(ivb_gt1_break, intel_ivb_gt1_device, name, "Intel(R) HD Graphics IvyBridge M GT1"); case PCI_CHIP_IVYBRIDGE_S_GT1: DECL_INFO_STRING(ivb_gt1_break, intel_ivb_gt1_device, name, "Intel(R) HD Graphics IvyBridge S GT1"); ivb_gt1_break: intel_ivb_gt1_device.device_id = device_id; intel_ivb_gt1_device.platform = cl_get_platform_default(); ret = &intel_ivb_gt1_device; cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_intel_motion_estimation_ext_id); break; case PCI_CHIP_IVYBRIDGE_GT2: DECL_INFO_STRING(ivb_gt2_break, intel_ivb_gt2_device, name, "Intel(R) HD Graphics IvyBridge GT2"); case PCI_CHIP_IVYBRIDGE_M_GT2: DECL_INFO_STRING(ivb_gt2_break, intel_ivb_gt2_device, name, "Intel(R) HD Graphics IvyBridge M GT2"); case PCI_CHIP_IVYBRIDGE_S_GT2: DECL_INFO_STRING(ivb_gt2_break, intel_ivb_gt2_device, name, "Intel(R) HD Graphics IvyBridge S GT2"); ivb_gt2_break: intel_ivb_gt2_device.device_id = device_id; intel_ivb_gt2_device.platform = cl_get_platform_default(); ret = &intel_ivb_gt2_device; cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_intel_motion_estimation_ext_id); break; case PCI_CHIP_BAYTRAIL_T: DECL_INFO_STRING(baytrail_t_device_break, intel_baytrail_t_device, name, "Intel(R) HD Graphics Bay Trail-T"); baytrail_t_device_break: intel_baytrail_t_device.device_id = device_id; intel_baytrail_t_device.platform = cl_get_platform_default(); ret = 
&intel_baytrail_t_device; cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_intel_motion_estimation_ext_id); break; case PCI_CHIP_BROADWLL_M_GT1: DECL_INFO_STRING(brw_gt1_break, intel_brw_gt1_device, name, "Intel(R) HD Graphics BroadWell Mobile GT1"); case PCI_CHIP_BROADWLL_D_GT1: DECL_INFO_STRING(brw_gt1_break, intel_brw_gt1_device, name, "Intel(R) HD Graphics BroadWell U-Processor GT1"); case PCI_CHIP_BROADWLL_S_GT1: DECL_INFO_STRING(brw_gt1_break, intel_brw_gt1_device, name, "Intel(R) HD Graphics BroadWell Server GT1"); case PCI_CHIP_BROADWLL_W_GT1: DECL_INFO_STRING(brw_gt1_break, intel_brw_gt1_device, name, "Intel(R) HD Graphics BroadWell Workstation GT1"); case PCI_CHIP_BROADWLL_U_GT1: DECL_INFO_STRING(brw_gt1_break, intel_brw_gt1_device, name, "Intel(R) HD Graphics BroadWell ULX GT1"); brw_gt1_break: /* For Gen8 and later, half float is suppported and we will enable cl_khr_fp16. */ intel_brw_gt1_device.device_id = device_id; intel_brw_gt1_device.platform = cl_get_platform_default(); ret = &intel_brw_gt1_device; cl_intel_platform_get_default_extension(ret); #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_BROADWLL_M_GT2: DECL_INFO_STRING(brw_gt2_break, intel_brw_gt2_device, name, "Intel(R) HD Graphics 5600 BroadWell Mobile GT2"); case PCI_CHIP_BROADWLL_D_GT2: DECL_INFO_STRING(brw_gt2_break, intel_brw_gt2_device, name, "Intel(R) HD Graphics 5500 BroadWell U-Processor GT2"); case PCI_CHIP_BROADWLL_S_GT2: DECL_INFO_STRING(brw_gt2_break, intel_brw_gt2_device, name, "Intel(R) HD Graphics BroadWell Server GT2"); case PCI_CHIP_BROADWLL_W_GT2: DECL_INFO_STRING(brw_gt2_break, intel_brw_gt2_device, name, "Intel(R) HD Graphics BroadWell Workstation GT2"); case PCI_CHIP_BROADWLL_U_GT2: DECL_INFO_STRING(brw_gt2_break, intel_brw_gt2_device, name, "Intel(R) HD Graphics 5300 BroadWell ULX GT2"); brw_gt2_break: intel_brw_gt2_device.device_id = device_id; intel_brw_gt2_device.platform = cl_get_platform_default(); ret = &intel_brw_gt2_device; cl_intel_platform_get_default_extension(ret); #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_BROADWLL_M_GT3: DECL_INFO_STRING(brw_gt3_break, intel_brw_gt3_device, name, "Intel(R) Iris Pro Graphics 6200 BroadWell Mobile GT3"); case PCI_CHIP_BROADWLL_D_GT3: DECL_INFO_STRING(brw_gt3_break, intel_brw_gt3_device, name, "Intel(R) HD Graphics 6000 BroadWell U-Processor GT3"); case PCI_CHIP_BROADWLL_UI_GT3: DECL_INFO_STRING(brw_gt3_break, intel_brw_gt3_device, name, "Intel(R) Iris Graphics 6100 BroadWell U-Processor GT3"); case PCI_CHIP_BROADWLL_S_GT3: DECL_INFO_STRING(brw_gt3_break, intel_brw_gt3_device, name, "Intel(R) Iris Pro Graphics P6300 BroadWell Server GT3"); case PCI_CHIP_BROADWLL_W_GT3: DECL_INFO_STRING(brw_gt3_break, intel_brw_gt3_device, name, "Intel(R) HD Graphics BroadWell Workstation GT3"); case PCI_CHIP_BROADWLL_U_GT3: DECL_INFO_STRING(brw_gt3_break, intel_brw_gt3_device, name, "Intel(R) HD Graphics BroadWell ULX GT3"); brw_gt3_break: intel_brw_gt3_device.device_id = device_id; intel_brw_gt3_device.platform = cl_get_platform_default(); ret = &intel_brw_gt3_device; cl_intel_platform_get_default_extension(ret); #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case 
PCI_CHIP_CHV_0: case PCI_CHIP_CHV_1: case PCI_CHIP_CHV_2: case PCI_CHIP_CHV_3: DECL_INFO_STRING(chv_break, intel_chv_device, name, "Intel(R) HD Graphics Cherryview"); chv_break: intel_chv_device.device_id = device_id; intel_chv_device.platform = cl_get_platform_default(); ret = &intel_chv_device; cl_intel_platform_get_default_extension(ret); #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_SKYLAKE_ULT_GT1: DECL_INFO_STRING(skl_gt1_break, intel_skl_gt1_device, name, "Intel(R) HD Graphics Skylake ULT GT1"); case PCI_CHIP_SKYLAKE_ULX_GT1: DECL_INFO_STRING(skl_gt1_break, intel_skl_gt1_device, name, "Intel(R) HD Graphics Skylake ULX GT1"); case PCI_CHIP_SKYLAKE_DT_GT1: DECL_INFO_STRING(skl_gt1_break, intel_skl_gt1_device, name, "Intel(R) HD Graphics Skylake Desktop GT1"); case PCI_CHIP_SKYLAKE_HALO_GT1: DECL_INFO_STRING(skl_gt1_break, intel_skl_gt1_device, name, "Intel(R) HD Graphics Skylake Halo GT1"); case PCI_CHIP_SKYLAKE_SRV_GT1: DECL_INFO_STRING(skl_gt1_break, intel_skl_gt1_device, name, "Intel(R) HD Graphics Skylake Server GT1"); skl_gt1_break: intel_skl_gt1_device.device_id = device_id; intel_skl_gt1_device.platform = cl_get_platform_default(); ret = &intel_skl_gt1_device; #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_SKYLAKE_ULT_GT2: DECL_INFO_STRING(skl_gt2_break, intel_skl_gt2_device, name, "Intel(R) HD Graphics Skylake ULT GT2"); case PCI_CHIP_SKYLAKE_ULT_GT2F: DECL_INFO_STRING(skl_gt2_break, intel_skl_gt2_device, name, "Intel(R) HD Graphics Skylake ULT GT2F"); case PCI_CHIP_SKYLAKE_ULX_GT2: DECL_INFO_STRING(skl_gt2_break, intel_skl_gt2_device, name, "Intel(R) HD Graphics Skylake ULX GT2"); case PCI_CHIP_SKYLAKE_DT_GT2: DECL_INFO_STRING(skl_gt2_break, intel_skl_gt2_device, name, "Intel(R) HD Graphics Skylake Desktop GT2"); case PCI_CHIP_SKYLAKE_HALO_GT2: DECL_INFO_STRING(skl_gt2_break, intel_skl_gt2_device, name, "Intel(R) HD Graphics Skylake Halo GT2"); case PCI_CHIP_SKYLAKE_SRV_GT2: DECL_INFO_STRING(skl_gt2_break, intel_skl_gt2_device, name, "Intel(R) HD Graphics Skylake Server GT2"); case PCI_CHIP_SKYLAKE_WKS_GT2: DECL_INFO_STRING(skl_gt2_break, intel_skl_gt2_device, name, "Intel(R) HD Graphics Skylake Workstation GT2"); skl_gt2_break: intel_skl_gt2_device.device_id = device_id; intel_skl_gt2_device.platform = cl_get_platform_default(); ret = &intel_skl_gt2_device; #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_SKYLAKE_ULT_GT3: DECL_INFO_STRING(skl_gt3_break, intel_skl_gt3_device, name, "Intel(R) HD Graphics Skylake ULT GT3"); case PCI_CHIP_SKYLAKE_ULT_GT3E1: DECL_INFO_STRING(skl_gt3_break, intel_skl_gt3_device, name, "Intel(R) HD Graphics Skylake ULT GT3E"); case PCI_CHIP_SKYLAKE_ULT_GT3E2: DECL_INFO_STRING(skl_gt3_break, intel_skl_gt3_device, name, "Intel(R) HD Graphics Skylake ULT GT3E"); case PCI_CHIP_SKYLAKE_HALO_GT3: DECL_INFO_STRING(skl_gt3_break, intel_skl_gt3_device, name, "Intel(R) HD Graphics Skylake Halo GT3"); case PCI_CHIP_SKYLAKE_SRV_GT3: DECL_INFO_STRING(skl_gt3_break, intel_skl_gt3_device, name, "Intel(R) HD Graphics Skylake Server GT3"); case PCI_CHIP_SKYLAKE_MEDIA_SRV_GT3: DECL_INFO_STRING(skl_gt3_break, 
intel_skl_gt3_device, name, "Intel(R) HD Graphics Skylake Media Server GT3"); skl_gt3_break: intel_skl_gt3_device.device_id = device_id; intel_skl_gt3_device.platform = cl_get_platform_default(); ret = &intel_skl_gt3_device; cl_intel_platform_get_default_extension(ret); #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_SKYLAKE_DT_GT4: DECL_INFO_STRING(skl_gt4_break, intel_skl_gt4_device, name, "Intel(R) HD Graphics Skylake Desktop GT4"); case PCI_CHIP_SKYLAKE_HALO_GT4: DECL_INFO_STRING(skl_gt4_break, intel_skl_gt4_device, name, "Intel(R) HD Graphics Skylake Halo GT4"); case PCI_CHIP_SKYLAKE_SRV_GT4: DECL_INFO_STRING(skl_gt4_break, intel_skl_gt4_device, name, "Intel(R) HD Graphics Skylake Server GT4"); case PCI_CHIP_SKYLAKE_WKS_GT4: DECL_INFO_STRING(skl_gt4_break, intel_skl_gt4_device, name, "Intel(R) HD Graphics Skylake Workstation GT4"); skl_gt4_break: intel_skl_gt4_device.device_id = device_id; intel_skl_gt4_device.platform = cl_get_platform_default(); ret = &intel_skl_gt4_device; #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_BROXTON_0: DECL_INFO_STRING(bxt18eu_break, intel_bxt18eu_device, name, "Intel(R) HD Graphics Broxton 0"); case PCI_CHIP_BROXTON_2: DECL_INFO_STRING(bxt18eu_break, intel_bxt18eu_device, name, "Intel(R) HD Graphics Broxton 2"); bxt18eu_break: intel_bxt18eu_device.device_id = device_id; intel_bxt18eu_device.platform = cl_get_platform_default(); ret = &intel_bxt18eu_device; cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_BROXTON_1: DECL_INFO_STRING(bxt12eu_break, intel_bxt12eu_device, name, "Intel(R) HD Graphics Broxton 1"); case PCI_CHIP_BROXTON_3: DECL_INFO_STRING(bxt12eu_break, intel_bxt12eu_device, name, "Intel(R) HD Graphics Broxton 3"); bxt12eu_break: intel_bxt12eu_device.device_id = device_id; intel_bxt12eu_device.platform = cl_get_platform_default(); ret = &intel_bxt12eu_device; cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_KABYLAKE_ULT_GT1: DECL_INFO_STRING(kbl_gt1_break, intel_kbl_gt1_device, name, "Intel(R) HD Graphics Kabylake ULT GT1"); case PCI_CHIP_KABYLAKE_DT_GT1: DECL_INFO_STRING(kbl_gt1_break, intel_kbl_gt1_device, name, "Intel(R) HD Graphics Kabylake Desktop GT1"); case PCI_CHIP_KABYLAKE_HALO_GT1: DECL_INFO_STRING(kbl_gt1_break, intel_kbl_gt1_device, name, "Intel(R) HD Graphics Kabylake Halo GT1"); case PCI_CHIP_KABYLAKE_ULX_GT1: DECL_INFO_STRING(kbl_gt1_break, intel_kbl_gt1_device, name, "Intel(R) HD Graphics Kabylake ULX GT1"); case PCI_CHIP_KABYLAKE_SRV_GT1: DECL_INFO_STRING(kbl_gt1_break, intel_kbl_gt1_device, name, "Intel(R) HD Graphics Kabylake Server GT1"); kbl_gt1_break: intel_kbl_gt1_device.device_id = device_id; intel_kbl_gt1_device.platform = cl_get_platform_default(); ret = &intel_kbl_gt1_device; #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_KABYLAKE_ULT_GT15: DECL_INFO_STRING(kbl_gt15_break, intel_kbl_gt15_device, name, "Intel(R) HD Graphics Kabylake ULT GT1.5"); case PCI_CHIP_KABYLAKE_DT_GT15: DECL_INFO_STRING(kbl_gt15_break, 
intel_kbl_gt15_device, name, "Intel(R) HD Graphics Kabylake Desktop GT1.5"); case PCI_CHIP_KABYLAKE_HALO_GT15: DECL_INFO_STRING(kbl_gt15_break, intel_kbl_gt15_device, name, "Intel(R) HD Graphics Kabylake Halo GT1.5"); case PCI_CHIP_KABYLAKE_ULX_GT15: DECL_INFO_STRING(kbl_gt15_break, intel_kbl_gt15_device, name, "Intel(R) HD Graphics Kabylake ULX GT1.5"); kbl_gt15_break: intel_kbl_gt15_device.device_id = device_id; intel_kbl_gt15_device.platform = cl_get_platform_default(); ret = &intel_kbl_gt15_device; #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_KABYLAKE_ULT_GT2: case PCI_CHIP_KABYLAKE_ULT_GT2_1: DECL_INFO_STRING(kbl_gt2_break, intel_kbl_gt2_device, name, "Intel(R) HD Graphics Kabylake ULT GT2"); case PCI_CHIP_KABYLAKE_DT_GT2: DECL_INFO_STRING(kbl_gt2_break, intel_kbl_gt2_device, name, "Intel(R) HD Graphics Kabylake Desktop GT2"); case PCI_CHIP_KABYLAKE_HALO_GT2: DECL_INFO_STRING(kbl_gt2_break, intel_kbl_gt2_device, name, "Intel(R) HD Graphics Kabylake Halo GT2"); case PCI_CHIP_KABYLAKE_ULX_GT2: DECL_INFO_STRING(kbl_gt2_break, intel_kbl_gt2_device, name, "Intel(R) HD Graphics Kabylake ULX GT2"); case PCI_CHIP_KABYLAKE_SRV_GT2: DECL_INFO_STRING(kbl_gt2_break, intel_kbl_gt2_device, name, "Intel(R) HD Graphics Kabylake Server GT2"); case PCI_CHIP_KABYLAKE_WKS_GT2: DECL_INFO_STRING(kbl_gt2_break, intel_kbl_gt2_device, name, "Intel(R) HD Graphics Kabylake Workstation GT2"); kbl_gt2_break: intel_kbl_gt2_device.device_id = device_id; intel_kbl_gt2_device.platform = cl_get_platform_default(); ret = &intel_kbl_gt2_device; #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_KABYLAKE_ULT_GT3: case PCI_CHIP_KABYLAKE_ULT_GT3_1: case PCI_CHIP_KABYLAKE_ULT_GT3_2: DECL_INFO_STRING(kbl_gt3_break, intel_kbl_gt3_device, name, "Intel(R) HD Graphics Kabylake ULT GT3"); kbl_gt3_break: intel_kbl_gt3_device.device_id = device_id; intel_kbl_gt3_device.platform = cl_get_platform_default(); ret = &intel_kbl_gt3_device; #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_KABYLAKE_HALO_GT4: DECL_INFO_STRING(kbl_gt4_break, intel_kbl_gt4_device, name, "Intel(R) HD Graphics Kabylake ULT GT4"); kbl_gt4_break: intel_kbl_gt4_device.device_id = device_id; intel_kbl_gt4_device.platform = cl_get_platform_default(); ret = &intel_kbl_gt4_device; #ifdef ENABLE_FP64 cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id); #endif cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_GLK_3x6: DECL_INFO_STRING(glk18eu_break, intel_bxt18eu_device, name, "Intel(R) HD Graphics Geminilake(3x6)"); glk18eu_break: intel_glk18eu_device.device_id = device_id; intel_glk18eu_device.platform = cl_get_platform_default(); ret = &intel_glk18eu_device; cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_GLK_2x6: DECL_INFO_STRING(glk12eu_break, intel_bxt12eu_device, name, "Intel(R) HD Graphics Geminilake(2x6)"); glk12eu_break: intel_glk12eu_device.device_id = device_id; intel_glk12eu_device.platform = 
cl_get_platform_default(); ret = &intel_glk12eu_device; cl_intel_platform_get_default_extension(ret); cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id); break; case PCI_CHIP_SANDYBRIDGE_BRIDGE: case PCI_CHIP_SANDYBRIDGE_GT1: case PCI_CHIP_SANDYBRIDGE_GT2: case PCI_CHIP_SANDYBRIDGE_GT2_PLUS: case PCI_CHIP_SANDYBRIDGE_BRIDGE_M: case PCI_CHIP_SANDYBRIDGE_M_GT1: case PCI_CHIP_SANDYBRIDGE_M_GT2: case PCI_CHIP_SANDYBRIDGE_M_GT2_PLUS: case PCI_CHIP_SANDYBRIDGE_BRIDGE_S: case PCI_CHIP_SANDYBRIDGE_S_GT: // Intel(R) HD Graphics SandyBridge not supported yet ret = NULL; break; default: printf("cl_get_gt_device(): error, unknown device: %x\n", device_id); } if (ret == NULL) return NULL; CL_OBJECT_INIT_BASE(ret, CL_OBJECT_DEVICE_MAGIC); if (!CompilerSupported()) { ret->compiler_available = CL_FALSE; //ret->linker_available = CL_FALSE; ret->profile = "EMBEDDED_PROFILE"; ret->profile_sz = strlen(ret->profile) + 1; } /* Apply any driver-dependent updates to the device info */ cl_driver_update_device_info(ret); #define toMB(size) (size)&(UINT64_MAX<<20) /* Get the global_mem_size and max_mem_alloc size from * driver, system ram and hardware*/ struct sysinfo info; if (sysinfo(&info) == 0) { uint64_t totalgpumem = ret->global_mem_size; uint64_t maxallocmem = ret->max_mem_alloc_size; uint64_t totalram = info.totalram * info.mem_unit; /* In case to keep system stable we just use half * of the raw as global mem */ ret->global_mem_size = toMB((totalram / 2 > totalgpumem) ? totalgpumem: totalram / 2); /* The hardware has some limit about the alloc size * and the excution of kernel need some global mem * so we now make sure single mem does not use much * than 3/4 global mem*/ ret->max_mem_alloc_size = toMB((ret->global_mem_size * 3 / 4 > maxallocmem) ? maxallocmem: ret->global_mem_size * 3 / 4); } return ret; } /* Runs a small kernel to check that the device works; returns * SELF_TEST_PASS: for success. * SELF_TEST_SLM_FAIL: for SLM results mismatch; * SELF_TEST_ATOMIC_FAIL: for hsw enqueue kernel failure to not enable atomics in L3. 
* SELF_TEST_OTHER_FAIL: other failures, e.g. a runtime API failure. */ LOCAL cl_self_test_res cl_self_test(cl_device_id device, cl_self_test_res atomic_in_l3_flag) { cl_int status; cl_context ctx; cl_command_queue queue; cl_program program; cl_kernel kernel; cl_mem buffer; cl_event kernel_finished; size_t n = 3; cl_int test_data[3] = {3, 7, 5}; const char* kernel_source = "__kernel void self_test(__global int *buf) {" " __local int tmp[3];" " tmp[get_local_id(0)] = buf[get_local_id(0)];" " barrier(CLK_LOCAL_MEM_FENCE);" " buf[get_global_id(0)] = tmp[2 - get_local_id(0)] + buf[get_global_id(0)];" "}"; // using __local to catch the "no SLM on Haswell" problem static int tested = 0; static cl_self_test_res ret = SELF_TEST_OTHER_FAIL; if (tested != 0) return ret; tested = 1; ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &status); if(!ctx) return ret; cl_driver_set_atomic_flag(ctx->drv, atomic_in_l3_flag); if (status == CL_SUCCESS) { queue = clCreateCommandQueueWithProperties(ctx, device, 0, &status); if (status == CL_SUCCESS) { program = clCreateProgramWithSource(ctx, 1, &kernel_source, NULL, &status); if (status == CL_SUCCESS) { status = clBuildProgram(program, 1, &device, "", NULL, NULL); if (status == CL_SUCCESS) { kernel = clCreateKernel(program, "self_test", &status); if (status == CL_SUCCESS) { buffer = clCreateBuffer(ctx, CL_MEM_COPY_HOST_PTR, n*4, test_data, &status); if (status == CL_SUCCESS) { status = clSetKernelArg(kernel, 0, sizeof(cl_mem), &buffer); if (status == CL_SUCCESS) { status = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, &n, 0, NULL, &kernel_finished); if (status == CL_SUCCESS) { status = clEnqueueReadBuffer(queue, buffer, CL_TRUE, 0, n*4, test_data, 1, &kernel_finished, NULL); if (status == CL_SUCCESS) { if (test_data[0] == 8 && test_data[1] == 14 && test_data[2] == 8){ ret = SELF_TEST_PASS; } else { ret = SELF_TEST_SLM_FAIL; printf("Beignet: self-test failed: (3, 7, 5) + (5, 7, 3) returned (%i, %i, %i)\n" "See README.md or http://www.freedesktop.org/wiki/Software/Beignet/\n", test_data[0], test_data[1], test_data[2]); } } } else{ ret = SELF_TEST_ATOMIC_FAIL; // An atomic fail needs to test SLM again with the atomic-in-L3 feature disabled. tested = 0; } clReleaseEvent(kernel_finished); } } clReleaseMemObject(buffer); } clReleaseKernel(kernel); } } clReleaseProgram(program); } clReleaseCommandQueue(queue); } clReleaseContext(ctx); return ret; } LOCAL cl_int cl_get_device_ids(cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) { cl_device_id device; /* Do we have a usable device?
*/ device = cl_get_gt_device(device_type); if (device) { cl_self_test_res ret = cl_self_test(device, SELF_TEST_PASS); if (ret == SELF_TEST_ATOMIC_FAIL) { device->atomic_test_result = ret; ret = cl_self_test(device, ret); printf("Beignet: warning - disable atomic in L3 feature.\n"); } if(ret == SELF_TEST_SLM_FAIL) { int disable_self_test = 0; // can't use BVAR (backend/src/sys/cvar.hpp) here as it's C++ const char *env = getenv("OCL_IGNORE_SELF_TEST"); if (env != NULL) { sscanf(env, "%i", &disable_self_test); } if (disable_self_test) { printf("Beignet: Warning - overriding self-test failure\n"); } else { printf("Beignet: disabling non-working device\n"); device = 0; } } } if (!device) { if (num_devices) *num_devices = 0; if (devices) *devices = 0; return CL_DEVICE_NOT_FOUND; } else { if (num_devices) *num_devices = 1; if (devices) { *devices = device; } return CL_SUCCESS; } } LOCAL cl_bool is_gen_device(cl_device_id device) { return device == &intel_ivb_gt1_device || device == &intel_ivb_gt2_device || device == &intel_baytrail_t_device || device == &intel_hsw_gt1_device || device == &intel_hsw_gt2_device || device == &intel_hsw_gt3_device || device == &intel_brw_gt1_device || device == &intel_brw_gt2_device || device == &intel_brw_gt3_device || device == &intel_chv_device || device == &intel_skl_gt1_device || device == &intel_skl_gt2_device || device == &intel_skl_gt3_device || device == &intel_skl_gt4_device || device == &intel_bxt18eu_device || device == &intel_bxt12eu_device || device == &intel_kbl_gt1_device || device == &intel_kbl_gt15_device || device == &intel_kbl_gt2_device || device == &intel_kbl_gt3_device || device == &intel_kbl_gt4_device || device == &intel_glk18eu_device || device == &intel_glk12eu_device; } LOCAL cl_int cl_get_device_info(cl_device_id device, cl_device_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { const void *src_ptr = NULL; size_t src_size = 0; cl_int dev_ref; // We only support Gen devices for now.
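/* Illustrative two-step query against this entry point (a sketch, assuming a valid Gen device; malloc and result checks omitted): size_t sz; clGetDeviceInfo(dev, CL_DEVICE_NAME, 0, NULL, &sz); char *name = malloc(sz); clGetDeviceInfo(dev, CL_DEVICE_NAME, sz, name, NULL); The cl_get_info_helper() call at the end of this function returns CL_INVALID_VALUE whenever the destination buffer is smaller than the source, which is what makes the size-query-then-fetch idiom safe. */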
if (UNLIKELY(is_gen_device(device) == CL_FALSE)) return CL_INVALID_DEVICE; /* Find the correct parameter */ switch (param_name) { case CL_DEVICE_TYPE: src_ptr = &device->device_type; src_size = sizeof(device->device_type); break; case CL_DEVICE_VENDOR_ID: src_ptr = &device->vendor_id; src_size = sizeof(device->vendor_id); break; case CL_DEVICE_MAX_COMPUTE_UNITS: src_ptr = &device->max_compute_unit; src_size = sizeof(device->max_compute_unit); break; case CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: src_ptr = &device->max_work_item_dimensions; src_size = sizeof(device->max_work_item_dimensions); break; case CL_DEVICE_MAX_WORK_ITEM_SIZES: src_ptr = &device->max_work_item_sizes; src_size = sizeof(device->max_work_item_sizes); break; case CL_DEVICE_MAX_WORK_GROUP_SIZE: src_ptr = &device->max_work_group_size; src_size = sizeof(device->max_work_group_size); break; case CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: src_ptr = &device->preferred_vector_width_char; src_size = sizeof(device->preferred_vector_width_char); break; case CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: src_ptr = &device->preferred_vector_width_short; src_size = sizeof(device->preferred_vector_width_short); break; case CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: src_ptr = &device->preferred_vector_width_int; src_size = sizeof(device->preferred_vector_width_int); break; case CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: src_ptr = &device->preferred_vector_width_long; src_size = sizeof(device->preferred_vector_width_long); break; case CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: src_ptr = &device->preferred_vector_width_float; src_size = sizeof(device->preferred_vector_width_float); break; case CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: src_ptr = &device->preferred_vector_width_double; src_size = sizeof(device->preferred_vector_width_double); break; case CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF: src_ptr = &device->preferred_vector_width_half; src_size = sizeof(device->preferred_vector_width_half); break; case CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR: src_ptr = &device->native_vector_width_char; src_size = sizeof(device->native_vector_width_char); break; case CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT: src_ptr = &device->native_vector_width_short; src_size = sizeof(device->native_vector_width_short); break; case CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: src_ptr = &device->native_vector_width_int; src_size = sizeof(device->native_vector_width_int); break; case CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: src_ptr = &device->native_vector_width_long; src_size = sizeof(device->native_vector_width_long); break; case CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: src_ptr = &device->native_vector_width_float; src_size = sizeof(device->native_vector_width_float); break; case CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: src_ptr = &device->native_vector_width_double; src_size = sizeof(device->native_vector_width_double); break; case CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF: src_ptr = &device->native_vector_width_half; src_size = sizeof(device->native_vector_width_half); break; case CL_DEVICE_MAX_CLOCK_FREQUENCY: src_ptr = &device->max_clock_frequency; src_size = sizeof(device->max_clock_frequency); break; case CL_DEVICE_ADDRESS_BITS: src_ptr = &device->address_bits; src_size = sizeof(device->address_bits); break; case CL_DEVICE_MAX_MEM_ALLOC_SIZE: src_ptr = &device->max_mem_alloc_size; src_size = sizeof(device->max_mem_alloc_size); break; case CL_DEVICE_IMAGE_SUPPORT: src_ptr = &device->image_support; src_size = sizeof(device->image_support); break; case CL_DEVICE_MAX_READ_IMAGE_ARGS: src_ptr = &device->max_read_image_args; src_size 
= sizeof(device->max_read_image_args); break; case CL_DEVICE_MAX_WRITE_IMAGE_ARGS: src_ptr = &device->max_write_image_args; src_size = sizeof(device->max_write_image_args); break; case CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS: src_ptr = &device->max_read_write_image_args; src_size = sizeof(device->max_read_write_image_args); break; case CL_DEVICE_IMAGE_MAX_ARRAY_SIZE: src_ptr = &device->image_max_array_size; src_size = sizeof(device->image_max_array_size); break; case CL_DEVICE_IMAGE2D_MAX_WIDTH: src_ptr = &device->image2d_max_width; src_size = sizeof(device->image2d_max_width); break; case CL_DEVICE_IMAGE2D_MAX_HEIGHT: src_ptr = &device->image2d_max_height; src_size = sizeof(device->image2d_max_height); break; case CL_DEVICE_IMAGE3D_MAX_WIDTH: src_ptr = &device->image3d_max_width; src_size = sizeof(device->image3d_max_width); break; case CL_DEVICE_IMAGE3D_MAX_HEIGHT: src_ptr = &device->image3d_max_height; src_size = sizeof(device->image3d_max_height); break; case CL_DEVICE_IMAGE3D_MAX_DEPTH: src_ptr = &device->image3d_max_depth; src_size = sizeof(device->image3d_max_depth); break; case CL_DEVICE_MAX_SAMPLERS: src_ptr = &device->max_samplers; src_size = sizeof(device->max_samplers); break; case CL_DEVICE_MAX_PARAMETER_SIZE: src_ptr = &device->max_parameter_size; src_size = sizeof(device->max_parameter_size); break; case CL_DEVICE_MEM_BASE_ADDR_ALIGN: src_ptr = &device->mem_base_addr_align; src_size = sizeof(device->mem_base_addr_align); break; case CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: src_ptr = &device->min_data_type_align_size; src_size = sizeof(device->min_data_type_align_size); break; case CL_DEVICE_MAX_PIPE_ARGS: src_ptr = &device->max_pipe_args; src_size = sizeof(device->max_pipe_args); break; case CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS: src_ptr = &device->pipe_max_active_reservations; src_size = sizeof(device->pipe_max_active_reservations); break; case CL_DEVICE_PIPE_MAX_PACKET_SIZE: src_ptr = &device->pipe_max_packet_siz; src_size = sizeof(device->pipe_max_packet_siz); break; case CL_DEVICE_SINGLE_FP_CONFIG: src_ptr = &device->single_fp_config; src_size = sizeof(device->single_fp_config); break; case CL_DEVICE_HALF_FP_CONFIG: src_ptr = &device->half_fp_config; src_size = sizeof(device->half_fp_config); break; case CL_DEVICE_DOUBLE_FP_CONFIG: src_ptr = &device->double_fp_config; src_size = sizeof(device->double_fp_config); break; case CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: src_ptr = &device->global_mem_cache_type; src_size = sizeof(device->global_mem_cache_type); break; case CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: src_ptr = &device->global_mem_cache_line_size; src_size = sizeof(device->global_mem_cache_line_size); break; case CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: src_ptr = &device->global_mem_cache_size; src_size = sizeof(device->global_mem_cache_size); break; case CL_DEVICE_GLOBAL_MEM_SIZE: src_ptr = &device->global_mem_size; src_size = sizeof(device->global_mem_size); break; case CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: src_ptr = &device->max_constant_buffer_size; src_size = sizeof(device->max_constant_buffer_size); break; case CL_DEVICE_IMAGE_MAX_BUFFER_SIZE: src_ptr = &device->image_mem_size; src_size = sizeof(device->image_mem_size); break; case CL_DEVICE_MAX_CONSTANT_ARGS: src_ptr = &device->max_constant_args; src_size = sizeof(device->max_constant_args); break; case CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE: src_ptr = &device->max_global_variable_size; src_size = sizeof(device->max_global_variable_size); break; case CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE: src_ptr = 
&device->global_variable_preferred_total_size; src_size = sizeof(device->global_variable_preferred_total_size); break; case CL_DEVICE_LOCAL_MEM_TYPE: src_ptr = &device->local_mem_type; src_size = sizeof(device->local_mem_type); break; case CL_DEVICE_LOCAL_MEM_SIZE: src_ptr = &device->local_mem_size; src_size = sizeof(device->local_mem_size); break; case CL_DEVICE_ERROR_CORRECTION_SUPPORT: src_ptr = &device->error_correction_support; src_size = sizeof(device->error_correction_support); break; case CL_DEVICE_HOST_UNIFIED_MEMORY: src_ptr = &device->host_unified_memory; src_size = sizeof(device->host_unified_memory); break; case CL_DEVICE_PROFILING_TIMER_RESOLUTION: src_ptr = &device->profiling_timer_resolution; src_size = sizeof(device->profiling_timer_resolution); break; case CL_DEVICE_ENDIAN_LITTLE: src_ptr = &device->endian_little; src_size = sizeof(device->endian_little); break; case CL_DEVICE_AVAILABLE: src_ptr = &device->available; src_size = sizeof(device->available); break; case CL_DEVICE_COMPILER_AVAILABLE: src_ptr = &device->compiler_available; src_size = sizeof(device->compiler_available); break; case CL_DEVICE_LINKER_AVAILABLE: src_ptr = &device->linker_available; src_size = sizeof(device->linker_available); break; case CL_DEVICE_EXECUTION_CAPABILITIES: src_ptr = &device->execution_capabilities; src_size = sizeof(device->execution_capabilities); break; case CL_DEVICE_QUEUE_PROPERTIES: src_ptr = &device->queue_properties; src_size = sizeof(device->queue_properties); break; case CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES: src_ptr = &device->queue_on_device_properties; src_size = sizeof(device->queue_on_device_properties); break; case CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE: src_ptr = &device->queue_on_device_preferred_size; src_size = sizeof(device->queue_on_device_preferred_size); break; case CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE: src_ptr = &device->queue_on_device_max_size; src_size = sizeof(device->queue_on_device_max_size); break; case CL_DEVICE_MAX_ON_DEVICE_QUEUES: src_ptr = &device->max_on_device_queues; src_size = sizeof(device->max_on_device_queues); break; case CL_DEVICE_MAX_ON_DEVICE_EVENTS: src_ptr = &device->max_on_device_events; src_size = sizeof(device->max_on_device_events); break; case CL_DEVICE_PLATFORM: src_ptr = &device->platform; src_size = sizeof(device->platform); break; case CL_DEVICE_PRINTF_BUFFER_SIZE: src_ptr = &device->printf_buffer_size; src_size = sizeof(device->printf_buffer_size); break; case CL_DEVICE_PREFERRED_INTEROP_USER_SYNC: src_ptr = &device->interop_user_sync; src_size = sizeof(device->interop_user_sync); break; case CL_DEVICE_NAME: src_ptr = device->name; src_size = device->name_sz; break; case CL_DEVICE_VENDOR: src_ptr = device->vendor; src_size = device->vendor_sz; break; case CL_DEVICE_VERSION: src_ptr = device->version; src_size = device->version_sz; break; case CL_DEVICE_PROFILE: src_ptr = device->profile; src_size = device->profile_sz; break; case CL_DEVICE_OPENCL_C_VERSION: src_ptr = device->opencl_c_version; src_size = device->opencl_c_version_sz; break; case CL_DEVICE_SPIR_VERSIONS: src_ptr = device->spir_versions; src_size = device->spir_versions_sz; break; case CL_DEVICE_EXTENSIONS: src_ptr = device->extensions; src_size = device->extensions_sz; break; case CL_DEVICE_BUILT_IN_KERNELS: src_ptr = device->built_in_kernels; src_size = device->built_in_kernels_sz; break; case CL_DEVICE_PARENT_DEVICE: src_ptr = &device->parent_device; src_size = sizeof(device->parent_device); break; case CL_DEVICE_PARTITION_MAX_SUB_DEVICES: src_ptr = 
&device->partition_max_sub_device; src_size = sizeof(device->partition_max_sub_device); break; case CL_DEVICE_PARTITION_PROPERTIES: src_ptr = &device->partition_property; src_size = sizeof(device->partition_property); break; case CL_DEVICE_PARTITION_AFFINITY_DOMAIN: src_ptr = &device->affinity_domain; src_size = sizeof(device->affinity_domain); break; case CL_DEVICE_PARTITION_TYPE: src_ptr = &device->partition_type; src_size = sizeof(device->partition_type); break; case CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT: src_ptr = &device->preferred_platform_atomic_alignment; src_size = sizeof(device->preferred_platform_atomic_alignment); break; case CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT: src_ptr = &device->preferred_global_atomic_alignment; src_size = sizeof(device->preferred_global_atomic_alignment); break; case CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT: src_ptr = &device->preferred_local_atomic_alignment; src_size = sizeof(device->preferred_local_atomic_alignment); break; case CL_DEVICE_IMAGE_PITCH_ALIGNMENT: src_ptr = &device->image_pitch_alignment; src_size = sizeof(device->image_pitch_alignment); break; case CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT: src_ptr = &device->image_base_address_alignment; src_size = sizeof(device->image_base_address_alignment); break; case CL_DEVICE_SVM_CAPABILITIES: src_ptr = &device->svm_capabilities; src_size = sizeof(device->svm_capabilities); break; case CL_DEVICE_REFERENCE_COUNT: { dev_ref = CL_OBJECT_GET_REF(device); src_ptr = &dev_ref; src_size = sizeof(cl_int); break; } case CL_DRIVER_VERSION: src_ptr = device->driver_version; src_size = device->driver_version_sz; break; case CL_DEVICE_SUB_GROUP_SIZES_INTEL: src_ptr = device->sub_group_sizes; src_size = device->sub_group_sizes_sz; break; default: return CL_INVALID_VALUE; } return cl_get_info_helper(src_ptr, src_size, param_value, param_value_size, param_value_size_ret); } LOCAL cl_int cl_device_get_version(cl_device_id device, cl_int *ver) { if (UNLIKELY(is_gen_device(device) == CL_FALSE)) return CL_INVALID_DEVICE; if (ver == NULL) return CL_SUCCESS; if (device == &intel_ivb_gt1_device || device == &intel_ivb_gt2_device || device == &intel_baytrail_t_device) { *ver = 7; } else if (device == &intel_hsw_gt1_device || device == &intel_hsw_gt2_device || device == &intel_hsw_gt3_device) { *ver = 75; } else if (device == &intel_brw_gt1_device || device == &intel_brw_gt2_device || device == &intel_brw_gt3_device || device == &intel_chv_device) { *ver = 8; } else if (device == &intel_skl_gt1_device || device == &intel_skl_gt2_device || device == &intel_skl_gt3_device || device == &intel_skl_gt4_device || device == &intel_bxt18eu_device || device == &intel_bxt12eu_device || device == &intel_kbl_gt1_device || device == &intel_kbl_gt2_device || device == &intel_kbl_gt3_device || device == &intel_kbl_gt4_device || device == &intel_kbl_gt15_device || device == &intel_glk18eu_device || device == &intel_glk12eu_device) { *ver = 9; } else return CL_INVALID_VALUE; return CL_SUCCESS; } #undef DECL_FIELD #define _DECL_FIELD(FIELD) \ if (param_value && param_value_size < sizeof(FIELD)) \ return CL_INVALID_VALUE; \ if (param_value_size_ret != NULL) \ *param_value_size_ret = sizeof(FIELD); \ if (param_value) \ memcpy(param_value, &FIELD, sizeof(FIELD)); \ return CL_SUCCESS; #define DECL_FIELD(CASE,FIELD) \ case JOIN(CL_KERNEL_,CASE): \ _DECL_FIELD(FIELD) #include "cl_kernel.h" #include "cl_program.h" static int cl_check_builtin_kernel_dimension(cl_kernel kernel, cl_device_id device) { const char * n = 
cl_kernel_get_name(kernel);
  const char * builtin_kernels_2d = "__cl_copy_image_2d_to_2d;__cl_copy_image_2d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_fill_image_2d;__cl_fill_image_2d_array;";
  const char * builtin_kernels_3d = "__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_3d;__cl_fill_image_3d";
  if (n == NULL || !strstr(device->built_in_kernels, n)) {
    return 0;
  } else if (strstr(builtin_kernels_2d, n)) {
    return 2;
  } else if (strstr(builtin_kernels_3d, n)) {
    return 3;
  } else
    return 1;
}

LOCAL size_t
cl_get_kernel_max_wg_sz(cl_kernel kernel)
{
  size_t work_group_size, thread_cnt;
  int simd_width = interp_kernel_get_simd_width(kernel->opaque);
  int device_id = kernel->program->ctx->devices[0]->device_id;
  if (!interp_kernel_use_slm(kernel->opaque)) {
    if (!IS_BAYTRAIL_T(device_id) || simd_width == 16)
      work_group_size = simd_width * 64;
    else
      work_group_size = kernel->program->ctx->devices[0]->max_compute_unit *
                        kernel->program->ctx->devices[0]->max_thread_per_unit * simd_width;
  } else {
    thread_cnt = kernel->program->ctx->devices[0]->max_compute_unit *
                 kernel->program->ctx->devices[0]->max_thread_per_unit /
                 kernel->program->ctx->devices[0]->sub_slice_count;
    if (thread_cnt > 64)
      thread_cnt = 64;
    work_group_size = thread_cnt * simd_width;
  }
  if (work_group_size > kernel->program->ctx->devices[0]->max_work_group_size)
    work_group_size = kernel->program->ctx->devices[0]->max_work_group_size;
  return work_group_size;
}

LOCAL cl_int
cl_get_kernel_workgroup_info(cl_kernel kernel,
                             cl_device_id device,
                             cl_kernel_work_group_info param_name,
                             size_t param_value_size,
                             void* param_value,
                             size_t* param_value_size_ret)
{
  int err = CL_SUCCESS;
  int dimension = 0;
  CHECK_KERNEL(kernel);
  if (device == NULL)
    device = kernel->program->ctx->devices[0];
  if (UNLIKELY(is_gen_device(device) == CL_FALSE))
    return CL_INVALID_DEVICE;

  switch (param_name) {
  case CL_KERNEL_WORK_GROUP_SIZE:
  {
    if (param_value && param_value_size < sizeof(size_t))
      return CL_INVALID_VALUE;
    if (param_value_size_ret != NULL)
      *param_value_size_ret = sizeof(size_t);
    if (param_value) {
      size_t work_group_size = cl_get_kernel_max_wg_sz(kernel);
      *(size_t*)param_value = work_group_size;
    }
    return CL_SUCCESS;
  }
  case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE:
  {
    if (param_value && param_value_size < sizeof(size_t))
      return CL_INVALID_VALUE;
    if (param_value_size_ret != NULL)
      *param_value_size_ret = sizeof(size_t);
    if (param_value)
      *(size_t*)param_value = interp_kernel_get_simd_width(kernel->opaque);
    return CL_SUCCESS;
  }
  case CL_KERNEL_LOCAL_MEM_SIZE:
  {
    size_t local_mem_sz = interp_kernel_get_slm_size(kernel->opaque) + kernel->local_mem_sz;
    _DECL_FIELD(local_mem_sz)
  }
  DECL_FIELD(COMPILE_WORK_GROUP_SIZE, kernel->compile_wg_sz)
  DECL_FIELD(PRIVATE_MEM_SIZE, kernel->stack_size)
  case CL_KERNEL_GLOBAL_WORK_SIZE:
  {
    dimension = cl_check_builtin_kernel_dimension(kernel, device);
    if (!dimension)
      return CL_INVALID_VALUE;
    if (param_value_size_ret != NULL)
      *param_value_size_ret = sizeof(device->max_1d_global_work_sizes);
    if (param_value) {
      if (dimension == 1) {
        memcpy(param_value, device->max_1d_global_work_sizes, sizeof(device->max_1d_global_work_sizes));
      } else if (dimension == 2) {
        memcpy(param_value, device->max_2d_global_work_sizes, sizeof(device->max_2d_global_work_sizes));
      } else if (dimension == 3) {
        memcpy(param_value, device->max_3d_global_work_sizes, sizeof(device->max_3d_global_work_sizes));
      } else
        return CL_INVALID_VALUE;
      return CL_SUCCESS;
    }
    return CL_SUCCESS;
  }
  case CL_KERNEL_SPILL_MEM_SIZE_INTEL:
  {
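    /* CL_KERNEL_SPILL_MEM_SIZE_INTEL: the scratch space the register
     * allocator had to spill to memory for this kernel; it is read
     * straight out of the compiled kernel blob via the interp layer. */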
if (param_value && param_value_size < sizeof(cl_ulong)) return CL_INVALID_VALUE; if (param_value_size_ret != NULL) *param_value_size_ret = sizeof(cl_ulong); if (param_value) *(cl_ulong*)param_value = (cl_ulong)interp_kernel_get_scratch_size(kernel->opaque); return CL_SUCCESS; } default: return CL_INVALID_VALUE; }; error: return err; } LOCAL cl_int cl_get_kernel_subgroup_info(cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name, size_t input_value_size, const void* input_value, size_t param_value_size, void* param_value, size_t* param_value_size_ret) { int err = CL_SUCCESS; if(device != NULL) if (kernel->program->ctx->devices[0] != device) return CL_INVALID_DEVICE; CHECK_KERNEL(kernel); switch (param_name) { case CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR: { int i, dim = 0; size_t local_sz = 1; if (param_value && param_value_size < sizeof(size_t)) return CL_INVALID_VALUE; if (param_value_size_ret != NULL) *param_value_size_ret = sizeof(size_t); switch (input_value_size) { case sizeof(size_t)*1: case sizeof(size_t)*2: case sizeof(size_t)*3: dim = input_value_size/sizeof(size_t); break; default: return CL_INVALID_VALUE; } if (input_value == NULL ) return CL_INVALID_VALUE; for(i = 0; i < dim; i++) local_sz *= ((size_t*)input_value)[i]; if (param_value) { size_t simd_sz = cl_kernel_get_simd_width(kernel); size_t sub_group_size = local_sz >= simd_sz? simd_sz : local_sz; *(size_t*)param_value = sub_group_size; return CL_SUCCESS; } break; } case CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE_KHR: { int i, dim = 0; size_t local_sz = 1; if (param_value && param_value_size < sizeof(size_t)) return CL_INVALID_VALUE; if (param_value_size_ret != NULL) *param_value_size_ret = sizeof(size_t); switch (input_value_size) { case sizeof(size_t)*1: case sizeof(size_t)*2: case sizeof(size_t)*3: dim = input_value_size/sizeof(size_t); break; default: return CL_INVALID_VALUE; } if (input_value == NULL ) return CL_INVALID_VALUE; for(i = 0; i < dim; i++) local_sz *= ((size_t*)input_value)[i]; if (param_value) { size_t simd_sz = cl_kernel_get_simd_width(kernel); size_t sub_group_num = (local_sz + simd_sz - 1) / simd_sz; *(size_t*)param_value = sub_group_num; return CL_SUCCESS; } break; } case CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL: { if (param_value && param_value_size < sizeof(size_t)) return CL_INVALID_VALUE; if (param_value_size_ret != NULL) *param_value_size_ret = sizeof(size_t); if (param_value) *(size_t*)param_value = interp_kernel_get_simd_width(kernel->opaque); return CL_SUCCESS; } default: return CL_INVALID_VALUE; }; error: return err; } LOCAL cl_int cl_devices_list_check(cl_uint num_devices, const cl_device_id *devices) { cl_uint i; if (devices == NULL) return CL_INVALID_DEVICE; assert(num_devices > 0); for (i = 0; i < num_devices; i++) { if (!CL_OBJECT_IS_DEVICE(devices[i])) { return CL_INVALID_DEVICE; } if (devices[i]->available == CL_FALSE) { return CL_DEVICE_NOT_AVAILABLE; } // We now just support one platform. if (devices[i]->platform != cl_get_platform_default()) { return CL_INVALID_DEVICE; } // TODO: We now just support Gen Device. 
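    // cl_get_gt_device() returns the canonical, statically allocated device
    // object for the given device type (compare the &intel_*_device globals
    // used in cl_device_get_version() above), so a plain pointer comparison
    // is enough to reject any cl_device_id that did not come from this
    // implementation.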
    if (devices[i] != cl_get_gt_device(devices[i]->device_type)) {
      return CL_INVALID_DEVICE;
    }
  }
  return CL_SUCCESS;
}

LOCAL cl_int
cl_devices_list_include_check(cl_uint num_devices, const cl_device_id *devices,
                              cl_uint num_to_check, const cl_device_id *devices_to_check)
{
  cl_uint i, j;

  for (i = 0; i < num_to_check; i++) {
    for (j = 0; j < num_devices; j++) {
      if (devices_to_check[i] == devices[j])
        break;
    }
    if (j == num_devices)
      return CL_INVALID_DEVICE;
  }

  return CL_SUCCESS;
}
Beignet-1.3.2-Source/src/cl_api_program.c000664 001750 001750 00000014075 13161142102 017367 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */
#include "cl_context.h"
#include "cl_program.h"
#include "cl_device_id.h"
#include <string.h>

cl_int
clGetProgramInfo(cl_program program,
                 cl_program_info param_name,
                 size_t param_value_size,
                 void *param_value,
                 size_t *param_value_size_ret)
{
  const void *src_ptr = NULL;
  size_t src_size = 0;
  const char *ret_str = "";
  cl_int ref;
  cl_uint num_dev, kernels_num;

  if (!CL_OBJECT_IS_PROGRAM(program)) {
    return CL_INVALID_PROGRAM;
  }

  if (param_name == CL_PROGRAM_REFERENCE_COUNT) {
    ref = CL_OBJECT_GET_REF(program);
    src_ptr = &ref;
    src_size = sizeof(cl_int);
  } else if (param_name == CL_PROGRAM_CONTEXT) {
    src_ptr = &program->ctx;
    src_size = sizeof(cl_context);
  } else if (param_name == CL_PROGRAM_NUM_DEVICES) {
    num_dev = program->ctx->device_num; // Just 1 dev now.
    src_ptr = &num_dev;
    src_size = sizeof(cl_uint);
  } else if (param_name == CL_PROGRAM_DEVICES) {
    src_ptr = program->ctx->devices;
    src_size = program->ctx->device_num * sizeof(cl_device_id);
  } else if (param_name == CL_PROGRAM_NUM_KERNELS) {
    kernels_num = program->ker_n;
    src_ptr = &kernels_num;
    src_size = sizeof(cl_uint);
  } else if (param_name == CL_PROGRAM_SOURCE) {
    if (!program->source) {
      src_ptr = ret_str;
      src_size = 1;
    } else {
      src_ptr = program->source;
      src_size = strlen(program->source) + 1;
    }
  } else if (param_name == CL_PROGRAM_KERNEL_NAMES) {
    // TODO: need to refine this.
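    // cl_program_get_kernel_names() builds the semicolon-separated name list
    // and fills param_value/param_value_size_ret itself, so this branch
    // returns directly instead of going through cl_get_info_helper().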
cl_program_get_kernel_names(program, param_value_size, (char *)param_value, param_value_size_ret); return CL_SUCCESS; } else if (param_name == CL_PROGRAM_BINARY_SIZES) { if (program->binary == NULL) { if (program->binary_type == CL_PROGRAM_BINARY_TYPE_EXECUTABLE) { program->binary_sz = compiler_program_serialize_to_binary(program->opaque, &program->binary, 0); } else if (program->binary_type == CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT) { program->binary_sz = compiler_program_serialize_to_binary(program->opaque, &program->binary, 1); } else if (program->binary_type == CL_PROGRAM_BINARY_TYPE_LIBRARY) { program->binary_sz = compiler_program_serialize_to_binary(program->opaque, &program->binary, 2); } else { return CL_INVALID_BINARY; } } if (program->binary == NULL || program->binary_sz == 0) { return CL_OUT_OF_RESOURCES; } src_ptr = &program->binary_sz; src_size = sizeof(size_t); } else if (param_name == CL_PROGRAM_BINARIES) { if (param_value_size_ret) *param_value_size_ret = sizeof(void *); if (!param_value) return CL_SUCCESS; /* param_value points to an array of n pointers allocated by the caller */ if (program->binary == NULL) { if (program->binary_type == CL_PROGRAM_BINARY_TYPE_EXECUTABLE) { program->binary_sz = compiler_program_serialize_to_binary(program->opaque, &program->binary, 0); } else if (program->binary_type == CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT) { program->binary_sz = compiler_program_serialize_to_binary(program->opaque, &program->binary, 1); } else if (program->binary_type == CL_PROGRAM_BINARY_TYPE_LIBRARY) { program->binary_sz = compiler_program_serialize_to_binary(program->opaque, &program->binary, 2); } else { return CL_INVALID_BINARY; } } if (program->binary == NULL || program->binary_sz == 0) { return CL_OUT_OF_RESOURCES; } memcpy(*((void **)param_value), program->binary, program->binary_sz); return CL_SUCCESS; } else { return CL_INVALID_VALUE; } return cl_get_info_helper(src_ptr, src_size, param_value, param_value_size, param_value_size_ret); } cl_int clGetProgramBuildInfo(cl_program program, cl_device_id device, cl_program_build_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { const void *src_ptr = NULL; size_t src_size = 0; const char *ret_str = ""; size_t global_size; if (!CL_OBJECT_IS_PROGRAM(program)) { return CL_INVALID_PROGRAM; } cl_int err = cl_devices_list_include_check(program->ctx->device_num, program->ctx->devices, 1, &device); if (err != CL_SUCCESS) return err; if (param_name == CL_PROGRAM_BUILD_STATUS) { src_ptr = &program->build_status; src_size = sizeof(cl_build_status); } else if (param_name == CL_PROGRAM_BUILD_OPTIONS) { if (program->is_built && program->build_opts) { ret_str = program->build_opts; } src_ptr = ret_str; src_size = strlen(ret_str) + 1; } else if (param_name == CL_PROGRAM_BUILD_LOG) { src_ptr = program->build_log; src_size = program->build_log_sz + 1; } else if (param_name == CL_PROGRAM_BINARY_TYPE) { src_ptr = &program->binary_type; src_size = sizeof(cl_uint); } else if (param_name == CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE) { global_size = 0; if (program->is_built) global_size = cl_program_get_global_variable_size(program); src_ptr = &global_size; src_size = sizeof(global_size); } else { return CL_INVALID_VALUE; } return cl_get_info_helper(src_ptr, src_size, param_value, param_value_size, param_value_size_ret); } Beignet-1.3.2-Source/src/cl_kernel.h000664 001750 001750 00000013426 13161142102 016353 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free 
software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia
 */
#ifndef __CL_KERNEL_H__
#define __CL_KERNEL_H__

#include "cl_internals.h"
#include "cl_base_object.h"
#include "cl_driver.h"
#include "cl_gbe_loader.h"
#include "CL/cl.h"
#include "CL/cl_ext.h"
#include <stdint.h>
#include <stdlib.h>

/* This is the kernel as it is interfaced by the compiler */
struct _gbe_kernel;

/* We need to save buffer data for relocation and binding and we must figure out
 * if all arguments are properly set */
typedef struct cl_argument {
  cl_mem mem;           /* For image and regular buffers */
  cl_sampler sampler;   /* For sampler. */
  cl_accelerator_intel accel;
  unsigned char bti;
  void *ptr;            /* SVM ptr value. */
  uint32_t local_sz:30; /* For __local size specification */
  uint32_t is_set:1;    /* All args must be set before NDRange */
  uint32_t is_svm:1;    /* Indicate this argument is SVMPointer */
} cl_argument;

/* One OCL function */
struct _cl_kernel {
  _cl_base_object base;
  cl_buffer bo;               /* The code itself */
  cl_program program;         /* Owns this structure (and pointers) */
  gbe_kernel opaque;          /* (Opaque) compiler structure for the OCL kernel */
  cl_accelerator_intel accel; /* accelerator */
  char *curbe;                /* One curbe per kernel */
  size_t curbe_sz;            /* Size of it */
  uint32_t samplers[GEN_MAX_SAMPLERS]; /* samplers defined in kernel & kernel args */
  size_t sampler_sz;          /* sampler size defined in kernel & kernel args. */
  struct ImageInfo *images;   /* images defined in kernel args */
  size_t image_sz;            /* image count in kernel args */
  cl_ulong local_mem_sz;      /* local memory size specified in kernel args. */
  size_t compile_wg_sz[3];    /* Required workgroup size by
                                 __attribute__((reqd_work_group_size(X, Y, Z))) qualifier. */
  size_t global_work_sz[3];   /* maximum global size that can be used to execute a kernel
                                 (i.e. global_work_size argument to clEnqueueNDRangeKernel.) */
  size_t stack_size;          /* stack size per work item.
*/ cl_argument *args; /* To track argument setting */ uint32_t arg_n:30; /* Number of arguments */ uint32_t ref_its_program:1; /* True only for the user kernel (created by clCreateKernel) */ uint32_t vme:1; /* True only if it is a built-in kernel for VME */ void* cmrt_kernel; /* CmKernel* */ uint32_t exec_info_n; /* The kernel's exec info count */ void** exec_info; /* The kernel's exec info */ cl_bool useDeviceEnqueue; /* kernel use device enqueue */ void* device_enqueue_ptr; /* device_enqueue buffer*/ uint32_t device_enqueue_info_n; /* count of parent kernel's arguments buffers, as child enqueues' exec info */ void** device_enqueue_infos; /* parent kernel's arguments buffers, as child enqueues' exec info */ }; #define CL_OBJECT_KERNEL_MAGIC 0x1234567890abedefLL #define CL_OBJECT_IS_KERNEL(obj) ((obj && \ ((cl_base_object)obj)->magic == CL_OBJECT_KERNEL_MAGIC && \ CL_OBJECT_GET_REF(obj) >= 1)) /* Allocate an empty kernel */ extern cl_kernel cl_kernel_new(cl_program); /* Destroy and deallocate an empty kernel */ extern void cl_kernel_delete(cl_kernel); /* Setup the kernel with the given GBE Kernel */ extern void cl_kernel_setup(cl_kernel k, gbe_kernel opaque); /* Get the kernel name */ extern const char *cl_kernel_get_name(cl_kernel k); /* Get the kernel attributes*/ extern const char *cl_kernel_get_attributes(cl_kernel k); /* Get the simd width as used in the code */ extern uint32_t cl_kernel_get_simd_width(cl_kernel k); /* When a kernel is created from outside, we just duplicate the structure we * have internally and give it back to the user */ extern cl_kernel cl_kernel_dup(cl_kernel); /* Add one more reference on the kernel object */ extern void cl_kernel_add_ref(cl_kernel); /* Set the argument before kernel execution */ extern int cl_kernel_set_arg(cl_kernel, uint32_t arg_index, size_t arg_size, const void *arg_value); extern int cl_kernel_set_arg_svm_pointer(cl_kernel, uint32_t arg_index, const void *arg_value); extern cl_int cl_kernel_set_exec_info(cl_kernel k, size_t n, const void *value); /* Get the argument information */ extern int cl_get_kernel_arg_info(cl_kernel k, cl_uint arg_index, cl_kernel_arg_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); /* Compute and check the work group size from the user provided local size */ extern cl_int cl_kernel_work_group_sz(cl_kernel ker, const size_t *local_wk_sz, cl_uint wk_dim, size_t *wk_grp_sz); #endif /* __CL_KERNEL_H__ */ Beignet-1.3.2-Source/src/cl_api_command_queue.c000664 001750 001750 00000014740 13161142102 020541 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
 *
 */
#include "cl_command_queue.h"
#include "cl_device_id.h"
#include "CL/cl.h"
#include

/* Deprecated as of OpenCL 2.0 */
cl_command_queue
clCreateCommandQueue(cl_context context,
                     cl_device_id device,
                     cl_command_queue_properties properties,
                     cl_int *errcode_ret)
{
  cl_command_queue queue = NULL;
  cl_int err = CL_SUCCESS;

  do {
    if (!CL_OBJECT_IS_CONTEXT(context)) {
      err = CL_INVALID_CONTEXT;
      break;
    }

    err = cl_devices_list_include_check(context->device_num, context->devices, 1, &device);
    if (err)
      break;

    if (properties & ~(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE)) {
      err = CL_INVALID_VALUE;
      break;
    }

    if (properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) { /* not supported yet */
      err = CL_INVALID_QUEUE_PROPERTIES;
      break;
    }

    queue = cl_create_command_queue(context, device, properties, 0, &err);
  } while (0);

  if (errcode_ret)
    *errcode_ret = err;
  return queue;
}

/* OpenCL 2.0 API for creating a command queue. */
cl_command_queue
clCreateCommandQueueWithProperties(cl_context context,
                                   cl_device_id device,
                                   const cl_queue_properties *properties,
                                   cl_int *errcode_ret)
{
  cl_command_queue queue = NULL;
  cl_int err = CL_SUCCESS;
  cl_command_queue_properties prop = 0xFFFFFFFF;
  cl_uint queue_sz = 0xFFFFFFFF;

  do {
    if (!CL_OBJECT_IS_CONTEXT(context)) {
      err = CL_INVALID_CONTEXT;
      break;
    }

    err = cl_devices_list_include_check(context->device_num, context->devices, 1, &device);
    if (err)
      break;

    if (properties) {
      cl_ulong que_type;
      cl_ulong que_val;
      cl_uint i;
      for (i = 0; (que_type = properties[i++]) != 0; i++) {
        que_val = properties[i];
        switch (que_type) {
        case CL_QUEUE_PROPERTIES:
          if (prop != 0xFFFFFFFF)
            err = CL_INVALID_VALUE;
          else {
            switch (que_val) {
            case 0:
            case CL_QUEUE_PROFILING_ENABLE:
            case CL_QUEUE_PROFILING_ENABLE | CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE:
            case CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE:
            case CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE:
            case CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT:
            case CL_QUEUE_PROFILING_ENABLE | CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE:
            case CL_QUEUE_PROFILING_ENABLE | CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT:
              prop = que_val;
              break;
            default:
              err = CL_INVALID_VALUE;
              break;
            }
          }
          break;
        case CL_QUEUE_SIZE:
          queue_sz = que_val;
          break;
        default:
          err = CL_INVALID_VALUE;
          break;
        }
      }

      if (err) /* break the while and return some err. */
        break;
    }

    /* Set some parameters to their default values. */
    if (prop == 0xFFFFFFFF)
      prop = 0;
    if (queue_sz != 0xFFFFFFFF)
      if (!(prop & CL_QUEUE_ON_DEVICE)) {
        err = CL_INVALID_VALUE;
        break;
      }
    if (queue_sz == 0xFFFFFFFF)
      queue_sz = device->queue_on_device_preferred_size;
    if (queue_sz > device->queue_on_device_max_size) {
      err = CL_INVALID_VALUE;
      break;
    }

    queue = cl_create_command_queue(context, device, prop, queue_sz, &err);
  } while (0);

  if (errcode_ret)
    *errcode_ret = err;
  return queue;
}

cl_int
clGetCommandQueueInfo(cl_command_queue command_queue,
                      cl_command_queue_info param_name,
                      size_t param_value_size,
                      void *param_value,
                      size_t *param_value_size_ret)
{
  const void *src_ptr = NULL;
  size_t src_size = 0;
  cl_int ref;

  if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }

  if (param_name == CL_QUEUE_CONTEXT) {
    src_ptr = &command_queue->ctx;
    src_size = sizeof(cl_context);
  } else if (param_name == CL_QUEUE_DEVICE) {
    src_ptr = &command_queue->device;
    src_size = sizeof(cl_device_id);
  } else if (param_name == CL_QUEUE_REFERENCE_COUNT) {
    ref = CL_OBJECT_GET_REF(command_queue);
    src_ptr = &ref;
    src_size = sizeof(cl_int);
  } else if (param_name == CL_QUEUE_PROPERTIES) {
    src_ptr = &command_queue->props;
    src_size = sizeof(cl_command_queue_properties);
  } else if (param_name == CL_QUEUE_SIZE) {
    src_ptr = &command_queue->size;
    src_size = sizeof(command_queue->size);
  } else {
    return CL_INVALID_VALUE;
  }

  return cl_get_info_helper(src_ptr, src_size, param_value, param_value_size, param_value_size_ret);
}

cl_int
clFlush(cl_command_queue command_queue)
{
  if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  return cl_command_queue_wait_flush(command_queue);
}

cl_int
clFinish(cl_command_queue command_queue)
{
  if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  return cl_command_queue_wait_finish(command_queue);
}

cl_int
clRetainCommandQueue(cl_command_queue command_queue)
{
  if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  cl_command_queue_add_ref(command_queue);
  return CL_SUCCESS;
}

cl_int
clReleaseCommandQueue(cl_command_queue command_queue)
{
  if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) {
    return CL_INVALID_COMMAND_QUEUE;
  }
  cl_command_queue_wait_flush(command_queue);
  cl_command_queue_delete(command_queue);
  return CL_SUCCESS;
}
Beignet-1.3.2-Source/src/cl_cmrt.cpp000664 001750 001750 00000021610 13161142102 016365 0ustar00yryr000000 000000 #include "cl_cmrt.h"
#include "cl_device_id.h"
#include "intel/intel_defines.h"
#include "cl_command_queue.h"
#include "cm_rt.h" //header file of libcmrt.so

typedef INT (*CreateCmDeviceFunc)(CmDevice * &pDevice, UINT & version, CmDriverContext * drivercontext, UINT DevCreateOption);
typedef INT (*DestroyCmDeviceFunc)(CmDevice * &pDevice);

#include <dlfcn.h>

static void* dlhCMRT = NULL;
static CreateCmDeviceFunc pfnCreateCmDevice = NULL;
static DestroyCmDeviceFunc pfnDestroyCmDevice = NULL;

#define XSTR(x) #x
#define STR(x) XSTR(x)

class CmrtCleanup
{
public:
  CmrtCleanup(){}
  ~CmrtCleanup()
  {
    if (dlhCMRT != NULL)
      dlclose(dlhCMRT);
  }
};

enum CMRT_MEM_TYPE {
  CMRT_BUFFER,
  CMRT_SURFACE2D,
};

static CmrtCleanup cmrtCleanup;

static bool LoadCmrtLibrary()
{
  if (dlhCMRT == NULL) {
    dlhCMRT = dlopen(STR(CMRT_PATH), RTLD_LAZY | RTLD_LOCAL);
    if (dlhCMRT == NULL)
      return false;
    pfnCreateCmDevice = (CreateCmDeviceFunc)dlsym(dlhCMRT, "CreateCmDevice");
    if (pfnCreateCmDevice == NULL)
      return false;
    pfnDestroyCmDevice = (DestroyCmDeviceFunc)dlsym(dlhCMRT, "DestroyCmDevice");
    if (pfnDestroyCmDevice == NULL)
      return false;
  }
  return true;
}
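/* A minimal sketch of the lazy-binding pattern LoadCmrtLibrary() implements,
 * assuming a hypothetical libfoo.so exporting "foo_init" (names are
 * illustrative only): open the library once, cache the resolved function
 * pointer, and fail soft so the caller can report the device as unavailable.
 *
 *   typedef int (*FooInitFunc)(void);
 *   static FooInitFunc pfnFooInit = NULL;
 *   static bool LoadFooLibrary(void)
 *   {
 *     if (pfnFooInit != NULL) return true;                  // already bound
 *     void *dlh = dlopen("libfoo.so", RTLD_LAZY | RTLD_LOCAL);
 *     if (dlh == NULL) return false;                        // library missing
 *     pfnFooInit = (FooInitFunc)dlsym(dlh, "foo_init");
 *     return pfnFooInit != NULL;                            // symbol missing?
 *   }
 */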
cl_int cmrt_build_program(cl_program p, const char *options)
{
  CmDevice*& cmrt_device = (CmDevice*&)(p->ctx->device->cmrt_device);
  int result;
  if (cmrt_device == NULL) {
    if (!LoadCmrtLibrary())
      return CL_DEVICE_NOT_AVAILABLE; //not an exact match for the failure, but the closest error code available

    CmDriverContext ctx;
    ctx.shared_bufmgr = 1;
    ctx.bufmgr = (drm_intel_bufmgr*)cl_context_get_bufmgr(p->ctx);
    ctx.userptr_enabled = 0;
    ctx.deviceid = p->ctx->device->device_id;
    ctx.device_rev = -1;
    UINT version = 0;
    result = (*pfnCreateCmDevice)(cmrt_device, version, &ctx, CM_DEVICE_CREATE_OPTION_DEFAULT);
    if (result != CM_SUCCESS)
      return CL_DEVICE_NOT_AVAILABLE;
  }

  CmProgram* cmrt_program = NULL;
  result = cmrt_device->LoadProgram(p->binary, p->binary_sz, cmrt_program, options);
  if (result != CM_SUCCESS)
    return CL_COMPILE_PROGRAM_FAILURE;

  p->cmrt_program = cmrt_program;
  cmrt_program->GetKernelCount(p->ker_n);
  return CL_SUCCESS;
}

cl_int cmrt_destroy_program(cl_program p)
{
  CmDevice* cmrt_device = (CmDevice*)(p->ctx->device->cmrt_device);
  CmProgram*& cmrt_program = (CmProgram*&)(p->cmrt_program);
  if (cmrt_device->DestroyProgram(cmrt_program) != CM_SUCCESS)
    return CL_INVALID_PROGRAM;
  return CL_SUCCESS;
}

cl_int cmrt_destroy_device(cl_device_id device)
{
  CmDevice*& cmrt_device = (CmDevice*&)(device->cmrt_device);
  if ((*pfnDestroyCmDevice)(cmrt_device) != CM_SUCCESS)
    return CL_INVALID_DEVICE;
  return CL_SUCCESS;
}

void* cmrt_create_kernel(cl_program p, const char *name)
{
  CmDevice* cmrt_device = (CmDevice*)(p->ctx->device->cmrt_device);
  CmKernel* cmrt_kernel = NULL;
  int result = cmrt_device->CreateKernel((CmProgram*)(p->cmrt_program), name, cmrt_kernel);
  if (result != CM_SUCCESS)
    return NULL;
  return cmrt_kernel;
}

cl_int cmrt_destroy_kernel(cl_kernel k)
{
  CmDevice* cmrt_device = (CmDevice*)(k->program->ctx->device->cmrt_device);
  CmKernel*& cmrt_kernel = (CmKernel*&)(k->cmrt_kernel);
  if (cmrt_device->DestroyKernel(cmrt_kernel) != CM_SUCCESS)
    return CL_INVALID_KERNEL;
  return CL_SUCCESS;
}

cl_int cmrt_enqueue(cl_command_queue cq, cl_kernel k, const size_t* global_work_size, const size_t* local_work_size)
{
  CmDevice* cmrt_device = (CmDevice*)(k->program->ctx->device->cmrt_device);
  CmKernel* cmrt_kernel = (CmKernel*)(k->cmrt_kernel);
  int result = 0;

  cmrt_kernel->SetThreadCount(global_work_size[0]*global_work_size[1]);

  //no need to destroy the queue explicitly:
  //there is only one queue instance within each device and
  //CreateQueue always returns the same instance
  CmQueue* pCmQueue = NULL;
  cmrt_device->CreateQueue(pCmQueue);

  CmTask *pKernelArray = NULL;
  cmrt_device->CreateTask(pKernelArray);
  pKernelArray->AddKernel(cmrt_kernel);

  CmEvent* e = NULL;
  if (local_work_size == NULL) {
    CmThreadSpace* pTS = NULL;
    cmrt_device->CreateThreadSpace(global_work_size[0], global_work_size[1], pTS);
    result = pCmQueue->Enqueue(pKernelArray, e, pTS);
  } else {
    CmThreadGroupSpace* pTGS = NULL;
    cmrt_device->CreateThreadGroupSpace(global_work_size[0], global_work_size[1], local_work_size[0], local_work_size[1], pTGS);
    result = pCmQueue->EnqueueWithGroup(pKernelArray, e, pTGS);
    cmrt_device->DestroyThreadGroupSpace(pTGS);
  }

  if (result != CM_SUCCESS)
    return CL_INVALID_OPERATION;

  cmrt_device->DestroyTask(pKernelArray);

  CmEvent*& olde = (CmEvent*&)cq->cmrt_event;
  if (olde != NULL)
    pCmQueue->DestroyEvent(olde); //release the previous event before tracking the new one
  cq->cmrt_event = e;

  return CL_SUCCESS;
}

static VA_CM_FORMAT GetCmrtFormat(_cl_mem_image* image)
{
  switch (image->intel_fmt) {
  case I965_SURFACEFORMAT_B8G8R8A8_UNORM:     return VA_CM_FMT_A8R8G8B8;
  case I965_SURFACEFORMAT_B8G8R8X8_UNORM:     return VA_CM_FMT_X8R8G8B8;
  case I965_SURFACEFORMAT_A8_UNORM:           return VA_CM_FMT_A8;
  case I965_SURFACEFORMAT_R10G10B10A2_UNORM:  return VA_CM_FMT_A2B10G10R10;
  case I965_SURFACEFORMAT_R16G16B16A16_UNORM: return VA_CM_FMT_A16B16G16R16;
  case I965_SURFACEFORMAT_L8_UNORM:           return VA_CM_FMT_L8;
  case I965_SURFACEFORMAT_R16_UINT:           return VA_CM_FMT_R16U;
  case I965_SURFACEFORMAT_R8_UNORM:           return VA_CM_FMT_R8U;
  case I965_SURFACEFORMAT_L16_UNORM:          return VA_CM_FMT_L16;
  case I965_SURFACEFORMAT_R32_FLOAT:          return VA_CM_FMT_R32F;
  default:                                    return VA_CM_FMT_UNKNOWN;
  }
}

static bool CreateCmrtMemory(cl_mem mem)
{
  if (mem->cmrt_mem != NULL)
    return true;

  CmDevice* cmrt_device = (CmDevice*)(mem->ctx->device->cmrt_device);
  int result;
  CmOsResource osResource;
  osResource.bo_size = mem->size;
  osResource.bo_flags = DRM_BO_HANDLE;
  osResource.bo = (drm_intel_bo*)mem->bo;

  if (IS_IMAGE(mem)) {
    _cl_mem_image* image = cl_mem_image(mem);
    if (CL_MEM_OBJECT_IMAGE2D != image->image_type)
      return false; //only 2D images can be wrapped as CMRT surfaces
    osResource.format = GetCmrtFormat(image);
    if (osResource.format == VA_CM_FMT_UNKNOWN)
      return false;
    osResource.aligned_width = image->row_pitch;
    osResource.aligned_height = mem->size / image->row_pitch;
    osResource.pitch = image->row_pitch;
    osResource.tile_type = image->tiling;
    osResource.orig_width = image->w;
    osResource.orig_height = image->h;
    CmSurface2D*& cmrt_surface2d = (CmSurface2D*&)(mem->cmrt_mem);
    result = cmrt_device->CreateSurface2D(&osResource, cmrt_surface2d);
    mem->cmrt_mem_type = CMRT_SURFACE2D;
  } else {
    osResource.format = VA_CM_FMT_BUFFER;
    osResource.buf_bytes = mem->size;
    CmBuffer*& cmrt_buffer = (CmBuffer*&)(mem->cmrt_mem);
    result = cmrt_device->CreateBuffer(&osResource, cmrt_buffer);
    mem->cmrt_mem_type = CMRT_BUFFER;
  }

  if (result != CM_SUCCESS)
    return false;
  return true;
}

cl_int cmrt_set_kernel_arg(cl_kernel k, cl_uint index, size_t sz, const void *value)
{
  if (value == NULL)
    return CL_INVALID_ARG_VALUE;

  CmKernel* cmrt_kernel = (CmKernel*)(k->cmrt_kernel);
  WORD argKind = -1;
  if (cmrt_kernel->GetArgKind(index, argKind) != CM_SUCCESS)
    return CL_INVALID_ARG_INDEX;

  int result;
  if (argKind == ARG_KIND_GENERAL)
    result = cmrt_kernel->SetKernelArg(index, sz, value);
  else {
    cl_mem mem = *(cl_mem*)value;
    if (((cl_base_object)mem)->magic == CL_MAGIC_MEM_HEADER) {
      if (!CreateCmrtMemory(mem))
        return CL_INVALID_ARG_VALUE;
      SurfaceIndex * memIndex = NULL;
      if (mem->cmrt_mem_type == CMRT_BUFFER) {
        CmBuffer* cmrt_buffer = (CmBuffer*)(mem->cmrt_mem);
        cmrt_buffer->GetIndex(memIndex);
      } else {
        CmSurface2D* cmrt_surface2d = (CmSurface2D*)(mem->cmrt_mem);
        cmrt_surface2d->GetIndex(memIndex);
      }
      result = cmrt_kernel->SetKernelArg(index, sizeof(SurfaceIndex), memIndex);
    } else
      return CL_INVALID_ARG_VALUE;
  }

  if (result != CM_SUCCESS)
    return CL_INVALID_KERNEL_ARGS;
  return CL_SUCCESS;
}

cl_int cmrt_destroy_memory(cl_mem mem)
{
  CmDevice* cmrt_device = (CmDevice*)(mem->ctx->device->cmrt_device);
  if (mem->cmrt_mem_type == CMRT_BUFFER) {
    CmBuffer*& cmrt_buffer = (CmBuffer*&)(mem->cmrt_mem);
    cmrt_device->DestroySurface(cmrt_buffer);
  } else {
    CmSurface2D*& cmrt_surface2d = (CmSurface2D*&)(mem->cmrt_mem);
    cmrt_device->DestroySurface(cmrt_surface2d);
  }
  return CL_SUCCESS;
}

cl_int cmrt_destroy_event(cl_command_queue cq)
{
  CmEvent*& cmrt_event = (CmEvent*&)(cq->cmrt_event);
  CmDevice* cmrt_device = (CmDevice*)(cq->ctx->device->cmrt_device);
  CmQueue* pCmQueue = NULL;
  cmrt_event->WaitForTaskFinished();
  cmrt_device->CreateQueue(pCmQueue);
  pCmQueue->DestroyEvent(cmrt_event);
  return CL_SUCCESS;
}

cl_int cmrt_wait_for_task_finished(cl_command_queue cq)
{
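  /* Block until the most recently enqueued CMRT task on this queue, tracked
   * through cq->cmrt_event, has finished executing on the GPU. */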
CmEvent* cmrt_event = (CmEvent*)(cq->cmrt_event); cmrt_event->WaitForTaskFinished(); return CL_SUCCESS; } Beignet-1.3.2-Source/src/cl_khr_icd.h000664 001750 001750 00000002026 13161142102 016470 0ustar00yryr000000 000000 /* * Copyright © 2013 Simon Richter * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . */ #ifndef __CL_KHR_ICD_H__ #define __CL_KHR_ICD_H__ #ifdef HAS_OCLIcd #define SET_ICD(dispatch) \ dispatch = &cl_khr_icd_dispatch; #define DEFINE_ICD(member) struct _cl_icd_dispatch const *member; extern struct _cl_icd_dispatch const cl_khr_icd_dispatch; #else #define SET_ICD(dispatch) #define DEFINE_ICD(member) #endif #endif Beignet-1.3.2-Source/src/cl_image.h000664 001750 001750 00000003341 13161142102 016150 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #ifndef __CL_IMAGE_H__ #define __CL_IMAGE_H__ #include "cl_internals.h" #include "CL/cl.h" #include /* Returned when the OCL format is not supported */ #define INTEL_UNSUPPORTED_FORMAT ((uint32_t) ~0x0u) /* Compute the number of bytes per pixel if the format is supported */ extern cl_int cl_image_byte_per_pixel(const cl_image_format *fmt, uint32_t *bpp); /* Return the intel format for the given OCL format */ extern uint32_t cl_image_get_intel_format(const cl_image_format *fmt); /* Return the list of formats supported by the API */ extern cl_int cl_image_get_supported_fmt(cl_context context, cl_mem_flags flags, cl_mem_object_type image_type, cl_uint num_entries, cl_image_format *image_formats, cl_uint *num_image_formats); #endif /* __CL_IMAGE_H__ */ Beignet-1.3.2-Source/src/cl_command_queue.h000664 001750 001750 00000012211 13161142102 017704 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. 
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia
 */
#ifndef __CL_COMMAND_QUEUE_H__
#define __CL_COMMAND_QUEUE_H__

#include "cl_internals.h"
#include "cl_driver.h"
#include "cl_base_object.h"
#include "CL/cl.h"
#include

struct intel_gpgpu;

typedef struct _cl_command_queue_enqueue_worker {
  cl_command_queue queue;
  pthread_t tid;
  cl_uint cookie;
  cl_bool quit;
  list_head enqueued_events;
  cl_uint in_exec_status; // Same value as CL_COMPLETE, CL_SUBMITTED ...
} _cl_command_queue_enqueue_worker;

typedef _cl_command_queue_enqueue_worker *cl_command_queue_enqueue_worker;

/* Basically, this is a (kind-of) batch buffer */
typedef struct _cl_command_queue {
  _cl_base_object base;
  _cl_command_queue_enqueue_worker worker;
  cl_context ctx;                    /* Its parent context */
  cl_device_id device;               /* Its device */
  cl_event* barrier_events;          /* Points to an array of non-complete user events that block this command queue */
  cl_int barrier_events_num;         /* Number of non-complete user events */
  cl_int barrier_events_size;        /* The size of the array that barrier_events points to */
  cl_command_queue_properties props; /* Queue properties */
  cl_mem perf;                       /* Where to put the perf counters */
  cl_uint size;                      /* Store the specified size for the queue */
} _cl_command_queue;

#define CL_OBJECT_COMMAND_QUEUE_MAGIC 0x83650a12b79ce4efLL
#define CL_OBJECT_IS_COMMAND_QUEUE(obj) ((obj && \
  ((cl_base_object)obj)->magic == CL_OBJECT_COMMAND_QUEUE_MAGIC && \
  CL_OBJECT_GET_REF(obj) >= 1))

/* Allocate and initialize a new command queue. Also insert it in the list of
 * command queues in the associated context */
extern cl_command_queue cl_create_command_queue(cl_context, cl_device_id, cl_command_queue_properties, cl_uint, cl_int*);
/* Destroy and deallocate the command queue */
extern void cl_command_queue_delete(cl_command_queue);
/* Keep one more reference on the queue */
extern void cl_command_queue_add_ref(cl_command_queue);
/* Map ND range kernel from OCL API */
extern cl_int cl_command_queue_ND_range(cl_command_queue queue,
                                        cl_kernel ker,
                                        cl_event event,
                                        const uint32_t work_dim,
                                        const size_t *global_wk_off,
                                        const size_t *global_dim_off,
                                        const size_t *global_wk_sz,
                                        const size_t *global_wk_sz_use,
                                        const size_t *local_wk_sz,
                                        const size_t *local_wk_sz_use);
/* The memory object where to report the performance */
extern cl_int cl_command_queue_set_report_buffer(cl_command_queue, cl_mem);
/* Flush for the specified gpgpu */
extern int cl_command_queue_flush_gpgpu(cl_gpgpu);
/* Bind all the surfaces in the GPGPU state */
extern cl_int cl_command_queue_bind_surface(cl_command_queue, cl_kernel, cl_gpgpu, uint32_t *);
/* Bind all the image surfaces in the GPGPU state */
extern cl_int cl_command_queue_bind_image(cl_command_queue, cl_kernel, cl_gpgpu, uint32_t *);
/* Bind all exec info to the bind table */
extern cl_int cl_command_queue_bind_exec_info(cl_command_queue, cl_kernel, cl_gpgpu, uint32_t *);
/* Insert a user event to the command's wait_events */
extern void cl_command_queue_insert_event(cl_command_queue, cl_event);
/* Remove a user event from the command's wait_events */
extern void cl_command_queue_remove_event(cl_command_queue, cl_event);
extern void cl_command_queue_insert_barrier_event(cl_command_queue queue, cl_event event);
extern void cl_command_queue_remove_barrier_event(cl_command_queue queue, cl_event event);
extern void cl_command_queue_notify(cl_command_queue queue);
extern void cl_command_queue_enqueue_event(cl_command_queue queue, cl_event event);
extern cl_int
cl_command_queue_init_enqueue(cl_command_queue queue); extern void cl_command_queue_destroy_enqueue(cl_command_queue queue); extern cl_int cl_command_queue_wait_finish(cl_command_queue queue); extern cl_int cl_command_queue_wait_flush(cl_command_queue queue); /* Note: Must call this function with queue's lock. */ extern cl_event *cl_command_queue_record_in_queue_events(cl_command_queue queue, cl_uint *list_num); #endif /* __CL_COMMAND_QUEUE_H__ */ Beignet-1.3.2-Source/src/cl_gbe_loader.cpp000664 001750 001750 00000037673 13173554000 017532 0ustar00yryr000000 000000 /* * Copyright © 2014 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include #include #include #include #include "cl_gbe_loader.h" #include "backend/src/GBEConfig.h" //function pointer from libgbe.so gbe_program_new_from_source_cb *compiler_program_new_from_source = NULL; gbe_program_new_from_llvm_file_cb *compiler_program_new_from_llvm_file = NULL; gbe_program_compile_from_source_cb *compiler_program_compile_from_source = NULL; gbe_program_new_gen_program_cb *compiler_program_new_gen_program = NULL; gbe_program_link_program_cb *compiler_program_link_program = NULL; gbe_program_check_opt_cb *compiler_program_check_opt = NULL; gbe_program_build_from_llvm_cb *compiler_program_build_from_llvm = NULL; gbe_program_new_from_llvm_binary_cb *compiler_program_new_from_llvm_binary = NULL; gbe_program_serialize_to_binary_cb *compiler_program_serialize_to_binary = NULL; gbe_program_new_from_llvm_cb *compiler_program_new_from_llvm = NULL; gbe_program_clean_llvm_resource_cb *compiler_program_clean_llvm_resource = NULL; //function pointer from libgbeinterp.so gbe_program_new_from_binary_cb *interp_program_new_from_binary = NULL; gbe_program_get_global_constant_size_cb *interp_program_get_global_constant_size = NULL; gbe_program_get_global_constant_data_cb *interp_program_get_global_constant_data = NULL; gbe_program_get_global_reloc_count_cb *interp_program_get_global_reloc_count = NULL; gbe_program_get_global_reloc_table_cb *interp_program_get_global_reloc_table = NULL; gbe_program_delete_cb *interp_program_delete = NULL; gbe_program_get_kernel_num_cb *interp_program_get_kernel_num = NULL; gbe_program_get_kernel_by_name_cb *interp_program_get_kernel_by_name = NULL; gbe_program_get_kernel_cb *interp_program_get_kernel = NULL; gbe_program_get_device_enqueue_kernel_name_cb *interp_program_get_device_enqueue_kernel_name = NULL; gbe_kernel_get_name_cb *interp_kernel_get_name = NULL; gbe_kernel_get_attributes_cb *interp_kernel_get_attributes = NULL; gbe_kernel_get_code_cb *interp_kernel_get_code = NULL; gbe_kernel_get_code_size_cb *interp_kernel_get_code_size = NULL; gbe_kernel_get_arg_num_cb *interp_kernel_get_arg_num = NULL; gbe_kernel_get_arg_size_cb *interp_kernel_get_arg_size = NULL; gbe_kernel_get_arg_bti_cb *interp_kernel_get_arg_bti = NULL; gbe_kernel_get_arg_type_cb *interp_kernel_get_arg_type = NULL; gbe_kernel_get_arg_align_cb 
*interp_kernel_get_arg_align = NULL; gbe_kernel_get_simd_width_cb *interp_kernel_get_simd_width = NULL; gbe_kernel_get_curbe_offset_cb *interp_kernel_get_curbe_offset = NULL; gbe_kernel_get_curbe_size_cb *interp_kernel_get_curbe_size = NULL; gbe_kernel_get_stack_size_cb *interp_kernel_get_stack_size = NULL; gbe_kernel_get_scratch_size_cb *interp_kernel_get_scratch_size = NULL; gbe_kernel_get_required_work_group_size_cb *interp_kernel_get_required_work_group_size = NULL; gbe_kernel_use_slm_cb *interp_kernel_use_slm = NULL; gbe_kernel_get_slm_size_cb *interp_kernel_get_slm_size = NULL; gbe_kernel_get_sampler_size_cb *interp_kernel_get_sampler_size = NULL; gbe_kernel_get_sampler_data_cb *interp_kernel_get_sampler_data = NULL; gbe_kernel_get_compile_wg_size_cb *interp_kernel_get_compile_wg_size = NULL; gbe_kernel_get_image_size_cb *interp_kernel_get_image_size = NULL; gbe_kernel_get_image_data_cb *interp_kernel_get_image_data = NULL; gbe_kernel_get_ocl_version_cb *interp_kernel_get_ocl_version = NULL; gbe_output_profiling_cb* interp_output_profiling = NULL; gbe_get_profiling_bti_cb* interp_get_profiling_bti = NULL; gbe_dup_profiling_cb* interp_dup_profiling = NULL; gbe_get_printf_num_cb* interp_get_printf_num = NULL; gbe_get_printf_buf_bti_cb* interp_get_printf_buf_bti = NULL; gbe_dup_printfset_cb* interp_dup_printfset = NULL; gbe_release_printf_info_cb* interp_release_printf_info = NULL; gbe_output_printf_cb* interp_output_printf = NULL; gbe_kernel_get_arg_info_cb *interp_kernel_get_arg_info = NULL; gbe_kernel_use_device_enqueue_cb *interp_kernel_use_device_enqueue = NULL; struct GbeLoaderInitializer { GbeLoaderInitializer() { LoadCompiler(); const char* path; if (!LoadInterp(path)) std::cerr << "unable to load " << path << " which is part of the driver, please check!" 
<< std::endl; } bool LoadInterp(const char*& path) { const char* interpPath = getenv("OCL_INTERP_PATH"); if (interpPath == NULL|| !strcmp(interpPath, "")) interpPath = INTERP_OBJECT_DIR; path = interpPath; dlhInterp = dlopen(interpPath, RTLD_LAZY | RTLD_LOCAL); if (dlhInterp == NULL) { return false; } interp_program_new_from_binary = *(gbe_program_new_from_binary_cb**)dlsym(dlhInterp, "gbe_program_new_from_binary"); if (interp_program_new_from_binary == NULL) return false; interp_program_get_global_constant_size = *(gbe_program_get_global_constant_size_cb**)dlsym(dlhInterp, "gbe_program_get_global_constant_size"); if (interp_program_get_global_constant_size == NULL) return false; interp_program_get_global_constant_data = *(gbe_program_get_global_constant_data_cb**)dlsym(dlhInterp, "gbe_program_get_global_constant_data"); if (interp_program_get_global_constant_data == NULL) return false; interp_program_get_global_reloc_count = *(gbe_program_get_global_reloc_count_cb**)dlsym(dlhInterp, "gbe_program_get_global_reloc_count"); if (interp_program_get_global_reloc_count == NULL) return false; interp_program_get_global_reloc_table = *(gbe_program_get_global_reloc_table_cb**)dlsym(dlhInterp, "gbe_program_get_global_reloc_table"); if (interp_program_get_global_reloc_table == NULL) return false; interp_program_delete = *(gbe_program_delete_cb**)dlsym(dlhInterp, "gbe_program_delete"); if (interp_program_delete == NULL) return false; interp_program_get_kernel_num = *(gbe_program_get_kernel_num_cb**)dlsym(dlhInterp, "gbe_program_get_kernel_num"); if (interp_program_get_kernel_num == NULL) return false; interp_program_get_kernel_by_name = *(gbe_program_get_kernel_by_name_cb**)dlsym(dlhInterp, "gbe_program_get_kernel_by_name"); if (interp_program_get_kernel_by_name == NULL) return false; interp_program_get_kernel = *(gbe_program_get_kernel_cb**)dlsym(dlhInterp, "gbe_program_get_kernel"); if (interp_program_get_kernel == NULL) return false; interp_program_get_device_enqueue_kernel_name = *(gbe_program_get_device_enqueue_kernel_name_cb**)dlsym(dlhInterp, "gbe_program_get_device_enqueue_kernel_name"); if (interp_program_get_device_enqueue_kernel_name == NULL) return false; interp_kernel_get_name = *(gbe_kernel_get_name_cb**)dlsym(dlhInterp, "gbe_kernel_get_name"); if (interp_kernel_get_name == NULL) return false; interp_kernel_get_attributes = *(gbe_kernel_get_attributes_cb**)dlsym(dlhInterp, "gbe_kernel_get_attributes"); if (interp_kernel_get_attributes == NULL) return false; interp_kernel_get_code = *(gbe_kernel_get_code_cb**)dlsym(dlhInterp, "gbe_kernel_get_code"); if (interp_kernel_get_code == NULL) return false; interp_kernel_get_code_size = *(gbe_kernel_get_code_size_cb**)dlsym(dlhInterp, "gbe_kernel_get_code_size"); if (interp_kernel_get_code_size == NULL) return false; interp_kernel_get_arg_num = *(gbe_kernel_get_arg_num_cb**)dlsym(dlhInterp, "gbe_kernel_get_arg_num"); if (interp_kernel_get_arg_num == NULL) return false; interp_kernel_get_arg_size = *(gbe_kernel_get_arg_size_cb**)dlsym(dlhInterp, "gbe_kernel_get_arg_size"); if (interp_kernel_get_arg_size == NULL) return false; interp_kernel_get_arg_bti = *(gbe_kernel_get_arg_bti_cb**)dlsym(dlhInterp, "gbe_kernel_get_arg_bti"); if (interp_kernel_get_arg_bti == NULL) return false; interp_kernel_get_arg_type = *(gbe_kernel_get_arg_type_cb**)dlsym(dlhInterp, "gbe_kernel_get_arg_type"); if (interp_kernel_get_arg_type == NULL) return false; interp_kernel_get_arg_align = *(gbe_kernel_get_arg_align_cb**)dlsym(dlhInterp, "gbe_kernel_get_arg_align"); if 
(interp_kernel_get_arg_align == NULL) return false; interp_kernel_get_simd_width = *(gbe_kernel_get_simd_width_cb**)dlsym(dlhInterp, "gbe_kernel_get_simd_width"); if (interp_kernel_get_simd_width == NULL) return false; interp_kernel_get_curbe_offset = *(gbe_kernel_get_curbe_offset_cb**)dlsym(dlhInterp, "gbe_kernel_get_curbe_offset"); if (interp_kernel_get_curbe_offset == NULL) return false; interp_kernel_get_curbe_size = *(gbe_kernel_get_curbe_size_cb**)dlsym(dlhInterp, "gbe_kernel_get_curbe_size"); if (interp_kernel_get_curbe_size == NULL) return false; interp_kernel_get_stack_size = *(gbe_kernel_get_stack_size_cb**)dlsym(dlhInterp, "gbe_kernel_get_stack_size"); if (interp_kernel_get_stack_size == NULL) return false; interp_kernel_get_scratch_size = *(gbe_kernel_get_scratch_size_cb**)dlsym(dlhInterp, "gbe_kernel_get_scratch_size"); if (interp_kernel_get_scratch_size == NULL) return false; interp_kernel_get_required_work_group_size = *(gbe_kernel_get_required_work_group_size_cb**)dlsym(dlhInterp, "gbe_kernel_get_required_work_group_size"); if (interp_kernel_get_required_work_group_size == NULL) return false; interp_kernel_use_slm = *(gbe_kernel_use_slm_cb**)dlsym(dlhInterp, "gbe_kernel_use_slm"); if (interp_kernel_use_slm == NULL) return false; interp_kernel_get_slm_size = *(gbe_kernel_get_slm_size_cb**)dlsym(dlhInterp, "gbe_kernel_get_slm_size"); if (interp_kernel_get_slm_size == NULL) return false; interp_kernel_get_sampler_size = *(gbe_kernel_get_sampler_size_cb**)dlsym(dlhInterp, "gbe_kernel_get_sampler_size"); if (interp_kernel_get_sampler_size == NULL) return false; interp_kernel_get_sampler_data = *(gbe_kernel_get_sampler_data_cb**)dlsym(dlhInterp, "gbe_kernel_get_sampler_data"); if (interp_kernel_get_sampler_data == NULL) return false; interp_kernel_get_compile_wg_size = *(gbe_kernel_get_compile_wg_size_cb**)dlsym(dlhInterp, "gbe_kernel_get_compile_wg_size"); if (interp_kernel_get_compile_wg_size == NULL) return false; interp_kernel_get_image_size = *(gbe_kernel_get_image_size_cb**)dlsym(dlhInterp, "gbe_kernel_get_image_size"); if (interp_kernel_get_image_size == NULL) return false; interp_kernel_get_image_data = *(gbe_kernel_get_image_data_cb**)dlsym(dlhInterp, "gbe_kernel_get_image_data"); if (interp_kernel_get_image_data == NULL) return false; interp_kernel_get_ocl_version = *(gbe_kernel_get_ocl_version_cb**)dlsym(dlhInterp, "gbe_kernel_get_ocl_version"); if (interp_kernel_get_ocl_version == NULL) return false; interp_output_profiling = *(gbe_output_profiling_cb**)dlsym(dlhInterp, "gbe_output_profiling"); if (interp_output_profiling == NULL) return false; interp_get_profiling_bti = *(gbe_get_profiling_bti_cb**)dlsym(dlhInterp, "gbe_get_profiling_bti"); if (interp_get_profiling_bti == NULL) return false; interp_dup_profiling = *(gbe_dup_profiling_cb**)dlsym(dlhInterp, "gbe_dup_profiling"); if (interp_dup_profiling == NULL) return false; interp_get_printf_num = *(gbe_get_printf_num_cb**)dlsym(dlhInterp, "gbe_get_printf_num"); if (interp_get_printf_num == NULL) return false; interp_get_printf_buf_bti = *(gbe_get_printf_buf_bti_cb**)dlsym(dlhInterp, "gbe_get_printf_buf_bti"); if (interp_get_printf_buf_bti == NULL) return false; interp_dup_printfset = *(gbe_dup_printfset_cb**)dlsym(dlhInterp, "gbe_dup_printfset"); if (interp_dup_printfset == NULL) return false; interp_release_printf_info = *(gbe_release_printf_info_cb**)dlsym(dlhInterp, "gbe_release_printf_info"); if (interp_release_printf_info == NULL) return false; interp_output_printf = *(gbe_output_printf_cb**)dlsym(dlhInterp, 
"gbe_output_printf"); if (interp_output_printf == NULL) return false; interp_kernel_get_arg_info = *(gbe_kernel_get_arg_info_cb**)dlsym(dlhInterp, "gbe_kernel_get_arg_info"); if (interp_kernel_get_arg_info == NULL) return false; interp_kernel_use_device_enqueue = *(gbe_kernel_use_device_enqueue_cb**)dlsym(dlhInterp, "gbe_kernel_use_device_enqueue"); if (interp_kernel_use_device_enqueue == NULL) return false; return true; } void LoadCompiler() { compilerLoaded = false; const char* nonCompiler = getenv("OCL_NON_COMPILER"); if (nonCompiler != NULL) { if (strcmp(nonCompiler, "1") == 0) return; } const char* gbePath = getenv("OCL_GBE_PATH"); if (gbePath == NULL || !strcmp(gbePath, "")) gbePath = GBE_OBJECT_DIR; dlhCompiler = dlopen(gbePath, RTLD_LAZY | RTLD_LOCAL); if (dlhCompiler != NULL) { compiler_program_new_from_source = *(gbe_program_new_from_source_cb **)dlsym(dlhCompiler, "gbe_program_new_from_source"); if (compiler_program_new_from_source == NULL) return; compiler_program_new_from_llvm_file = *(gbe_program_new_from_llvm_file_cb **)dlsym(dlhCompiler, "gbe_program_new_from_llvm_file"); if (compiler_program_new_from_llvm_file == NULL) return; compiler_program_compile_from_source = *(gbe_program_compile_from_source_cb **)dlsym(dlhCompiler, "gbe_program_compile_from_source"); if (compiler_program_compile_from_source == NULL) return; compiler_program_new_gen_program = *(gbe_program_new_gen_program_cb **)dlsym(dlhCompiler, "gbe_program_new_gen_program"); if (compiler_program_new_gen_program == NULL) return; compiler_program_link_program = *(gbe_program_link_program_cb **)dlsym(dlhCompiler, "gbe_program_link_program"); if (compiler_program_link_program == NULL) return; compiler_program_check_opt = *(gbe_program_check_opt_cb **)dlsym(dlhCompiler, "gbe_program_check_opt"); if (compiler_program_check_opt == NULL) return; compiler_program_build_from_llvm = *(gbe_program_build_from_llvm_cb **)dlsym(dlhCompiler, "gbe_program_build_from_llvm"); if (compiler_program_build_from_llvm == NULL) return; compiler_program_new_from_llvm_binary = *(gbe_program_new_from_llvm_binary_cb **)dlsym(dlhCompiler, "gbe_program_new_from_llvm_binary"); if (compiler_program_new_from_llvm_binary == NULL) return; compiler_program_serialize_to_binary = *(gbe_program_serialize_to_binary_cb **)dlsym(dlhCompiler, "gbe_program_serialize_to_binary"); if (compiler_program_serialize_to_binary == NULL) return; compiler_program_new_from_llvm = *(gbe_program_new_from_llvm_cb **)dlsym(dlhCompiler, "gbe_program_new_from_llvm"); if (compiler_program_new_from_llvm == NULL) return; compiler_program_clean_llvm_resource = *(gbe_program_clean_llvm_resource_cb **)dlsym(dlhCompiler, "gbe_program_clean_llvm_resource"); if (compiler_program_clean_llvm_resource == NULL) return; compilerLoaded = true; } } ~GbeLoaderInitializer() { if (dlhCompiler != NULL) dlclose(dlhCompiler); if (dlhInterp != NULL) dlclose(dlhInterp); //When destroy, set the release relative functions //to NULL to avoid dangling pointer visit. 
compiler_program_clean_llvm_resource = NULL; interp_program_delete = NULL; } bool compilerLoaded; void *dlhCompiler; void *dlhInterp; }; static struct GbeLoaderInitializer gbeLoader; int CompilerSupported() { if (gbeLoader.compilerLoaded) return 1; else return 0; } Beignet-1.3.2-Source/src/cl_device_data.h000664 001750 001750 00000043571 13173554000 017336 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #ifndef __CL_DEVICE_DATA_H__ #define __CL_DEVICE_DATA_H__ #define INVALID_CHIP_ID -1 //returned by intel_get_device_id if no device found #define INTEL_VENDOR_ID 0x8086 // Vendor ID for Intel #define PCI_CHIP_GM45_GM 0x2A42 #define PCI_CHIP_IGD_E_G 0x2E02 #define PCI_CHIP_Q45_G 0x2E12 #define PCI_CHIP_G45_G 0x2E22 #define PCI_CHIP_G41_G 0x2E32 #define PCI_CHIP_IGDNG_D_G 0x0042 #define PCI_CHIP_IGDNG_M_G 0x0046 #define IS_G45(devid) (devid == PCI_CHIP_IGD_E_G || \ devid == PCI_CHIP_Q45_G || \ devid == PCI_CHIP_G45_G || \ devid == PCI_CHIP_G41_G) #define IS_GM45(devid) (devid == PCI_CHIP_GM45_GM) #define IS_G4X(devid) (IS_G45(devid) || IS_GM45(devid)) #define IS_IGDNG_D(devid) (devid == PCI_CHIP_IGDNG_D_G) #define IS_IGDNG_M(devid) (devid == PCI_CHIP_IGDNG_M_G) #define IS_IGDNG(devid) (IS_IGDNG_D(devid) || IS_IGDNG_M(devid)) #ifndef PCI_CHIP_SANDYBRIDGE_BRIDGE #define PCI_CHIP_SANDYBRIDGE_BRIDGE 0x0100 /* Desktop */ #define PCI_CHIP_SANDYBRIDGE_GT1 0x0102 #define PCI_CHIP_SANDYBRIDGE_GT2 0x0112 #define PCI_CHIP_SANDYBRIDGE_GT2_PLUS 0x0122 #define PCI_CHIP_SANDYBRIDGE_BRIDGE_M 0x0104 /* Mobile */ #define PCI_CHIP_SANDYBRIDGE_M_GT1 0x0106 #define PCI_CHIP_SANDYBRIDGE_M_GT2 0x0116 #define PCI_CHIP_SANDYBRIDGE_M_GT2_PLUS 0x0126 #define PCI_CHIP_SANDYBRIDGE_BRIDGE_S 0x0108 /* Server */ #define PCI_CHIP_SANDYBRIDGE_S_GT 0x010A #endif #define IS_GEN6(devid) \ (devid == PCI_CHIP_SANDYBRIDGE_GT1 || \ devid == PCI_CHIP_SANDYBRIDGE_GT2 || \ devid == PCI_CHIP_SANDYBRIDGE_GT2_PLUS || \ devid == PCI_CHIP_SANDYBRIDGE_M_GT1 || \ devid == PCI_CHIP_SANDYBRIDGE_M_GT2 || \ devid == PCI_CHIP_SANDYBRIDGE_M_GT2_PLUS || \ devid == PCI_CHIP_SANDYBRIDGE_S_GT) #define PCI_CHIP_IVYBRIDGE_GT1 0x0152 /* Desktop */ #define PCI_CHIP_IVYBRIDGE_GT2 0x0162 #define PCI_CHIP_IVYBRIDGE_M_GT1 0x0156 /* Mobile */ #define PCI_CHIP_IVYBRIDGE_M_GT2 0x0166 #define PCI_CHIP_IVYBRIDGE_S_GT1 0x015a /* Server */ #define PCI_CHIP_IVYBRIDGE_S_GT2 0x016a #define PCI_CHIP_BAYTRAIL_T 0x0F31 #define IS_IVB_GT1(devid) \ (devid == PCI_CHIP_IVYBRIDGE_GT1 || \ devid == PCI_CHIP_IVYBRIDGE_M_GT1 || \ devid == PCI_CHIP_IVYBRIDGE_S_GT1) #define IS_IVB_GT2(devid) \ (devid == PCI_CHIP_IVYBRIDGE_GT2 || \ devid == PCI_CHIP_IVYBRIDGE_M_GT2 || \ devid == PCI_CHIP_IVYBRIDGE_S_GT2) #define IS_BAYTRAIL_T(devid) \ (devid == PCI_CHIP_BAYTRAIL_T) #define IS_IVYBRIDGE(devid) (IS_IVB_GT1(devid) || IS_IVB_GT2(devid) || IS_BAYTRAIL_T(devid)) #define IS_GEN7(devid) IS_IVYBRIDGE(devid) #define 
PCI_CHIP_HASWELL_D1 0x0402 /* GT1 desktop */ #define PCI_CHIP_HASWELL_D2 0x0412 /* GT2 desktop */ #define PCI_CHIP_HASWELL_D3 0x0422 /* GT3 desktop */ #define PCI_CHIP_HASWELL_S1 0x040a /* GT1 server */ #define PCI_CHIP_HASWELL_S2 0x041a /* GT2 server */ #define PCI_CHIP_HASWELL_S3 0x042a /* GT3 server */ #define PCI_CHIP_HASWELL_M1 0x0406 /* GT1 mobile */ #define PCI_CHIP_HASWELL_M2 0x0416 /* GT2 mobile */ #define PCI_CHIP_HASWELL_M3 0x0426 /* GT3 mobile */ #define PCI_CHIP_HASWELL_B1 0x040B /* Haswell GT1 */ #define PCI_CHIP_HASWELL_B2 0x041B /* Haswell GT2 */ #define PCI_CHIP_HASWELL_B3 0x042B /* Haswell GT3 */ #define PCI_CHIP_HASWELL_E1 0x040E /* Haswell GT1 */ #define PCI_CHIP_HASWELL_E2 0x041E /* Haswell GT2 */ #define PCI_CHIP_HASWELL_E3 0x042E /* Haswell GT3 */ /* Software Development Vehicle devices. */ #define PCI_CHIP_HASWELL_SDV_D1 0x0C02 /* SDV GT1 desktop */ #define PCI_CHIP_HASWELL_SDV_D2 0x0C12 /* SDV GT2 desktop */ #define PCI_CHIP_HASWELL_SDV_D3 0x0C22 /* SDV GT3 desktop */ #define PCI_CHIP_HASWELL_SDV_S1 0x0C0A /* SDV GT1 server */ #define PCI_CHIP_HASWELL_SDV_S2 0x0C1A /* SDV GT2 server */ #define PCI_CHIP_HASWELL_SDV_S3 0x0C2A /* SDV GT3 server */ #define PCI_CHIP_HASWELL_SDV_M1 0x0C06 /* SDV GT1 mobile */ #define PCI_CHIP_HASWELL_SDV_M2 0x0C16 /* SDV GT2 mobile */ #define PCI_CHIP_HASWELL_SDV_M3 0x0C26 /* SDV GT3 mobile */ #define PCI_CHIP_HASWELL_SDV_B1 0x0C0B /* SDV GT1 */ #define PCI_CHIP_HASWELL_SDV_B2 0x0C1B /* SDV GT2 */ #define PCI_CHIP_HASWELL_SDV_B3 0x0C2B /* SDV GT3 */ #define PCI_CHIP_HASWELL_SDV_E1 0x0C0E /* SDV GT1 */ #define PCI_CHIP_HASWELL_SDV_E2 0x0C1E /* SDV GT2 */ #define PCI_CHIP_HASWELL_SDV_E3 0x0C2E /* SDV GT3 */ /* Ultrabooks */ #define PCI_CHIP_HASWELL_ULT_D1 0x0A02 /* ULT GT1 desktop */ #define PCI_CHIP_HASWELL_ULT_D2 0x0A12 /* ULT GT2 desktop */ #define PCI_CHIP_HASWELL_ULT_D3 0x0A22 /* ULT GT3 desktop */ #define PCI_CHIP_HASWELL_ULT_S1 0x0A0A /* ULT GT1 server */ #define PCI_CHIP_HASWELL_ULT_S2 0x0A1A /* ULT GT2 server */ #define PCI_CHIP_HASWELL_ULT_S3 0x0A2A /* ULT GT3 server */ #define PCI_CHIP_HASWELL_ULT_M1 0x0A06 /* ULT GT1 mobile */ #define PCI_CHIP_HASWELL_ULT_M2 0x0A16 /* ULT GT2 mobile */ #define PCI_CHIP_HASWELL_ULT_M3 0x0A26 /* ULT GT3 mobile */ #define PCI_CHIP_HASWELL_ULT_B1 0x0A0B /* ULT GT1 */ #define PCI_CHIP_HASWELL_ULT_B2 0x0A1B /* ULT GT2 */ #define PCI_CHIP_HASWELL_ULT_B3 0x0A2B /* ULT GT3 */ #define PCI_CHIP_HASWELL_ULT_E1 0x0A0E /* ULT GT1 */ #define PCI_CHIP_HASWELL_ULT_E2 0x0A1E /* ULT GT2 */ #define PCI_CHIP_HASWELL_ULT_E3 0x0A2E /* ULT GT3 */ /* CRW */ #define PCI_CHIP_HASWELL_CRW_D1 0x0D02 /* CRW GT1 desktop */ #define PCI_CHIP_HASWELL_CRW_D2 0x0D12 /* CRW GT2 desktop */ #define PCI_CHIP_HASWELL_CRW_D3 0x0D22 /* CRW GT3 desktop */ #define PCI_CHIP_HASWELL_CRW_S1 0x0D0A /* CRW GT1 server */ #define PCI_CHIP_HASWELL_CRW_S2 0x0D1A /* CRW GT2 server */ #define PCI_CHIP_HASWELL_CRW_S3 0x0D2A /* CRW GT3 server */ #define PCI_CHIP_HASWELL_CRW_M1 0x0D06 /* CRW GT1 mobile */ #define PCI_CHIP_HASWELL_CRW_M2 0x0D16 /* CRW GT2 mobile */ #define PCI_CHIP_HASWELL_CRW_M3 0x0D26 /* CRW GT3 mobile */ #define PCI_CHIP_HASWELL_CRW_B1 0x0D0B /* CRW GT1 */ #define PCI_CHIP_HASWELL_CRW_B2 0x0D1B /* CRW GT2 */ #define PCI_CHIP_HASWELL_CRW_B3 0x0D2B /* CRW GT3 */ #define PCI_CHIP_HASWELL_CRW_E1 0x0D0E /* CRW GT1 */ #define PCI_CHIP_HASWELL_CRW_E2 0x0D1E /* CRW GT2 */ #define PCI_CHIP_HASWELL_CRW_E3 0x0D2E /* CRW GT3 */ #define IS_HASWELL(devid) ( \ (devid) == PCI_CHIP_HASWELL_D1 || (devid) == PCI_CHIP_HASWELL_D2 || \ (devid) == 
PCI_CHIP_HASWELL_D3 || (devid) == PCI_CHIP_HASWELL_S1 || \ (devid) == PCI_CHIP_HASWELL_S2 || (devid) == PCI_CHIP_HASWELL_S3 || \ (devid) == PCI_CHIP_HASWELL_M1 || (devid) == PCI_CHIP_HASWELL_M2 || \ (devid) == PCI_CHIP_HASWELL_M3 || (devid) == PCI_CHIP_HASWELL_B1 || \ (devid) == PCI_CHIP_HASWELL_B2 || (devid) == PCI_CHIP_HASWELL_B3 || \ (devid) == PCI_CHIP_HASWELL_E1 || (devid) == PCI_CHIP_HASWELL_E2 || \ (devid) == PCI_CHIP_HASWELL_E3 || (devid) == PCI_CHIP_HASWELL_SDV_D1 || \ (devid) == PCI_CHIP_HASWELL_SDV_D2 || (devid) == PCI_CHIP_HASWELL_SDV_D3 || \ (devid) == PCI_CHIP_HASWELL_SDV_S1 || (devid) == PCI_CHIP_HASWELL_SDV_S2 || \ (devid) == PCI_CHIP_HASWELL_SDV_S3 || (devid) == PCI_CHIP_HASWELL_SDV_M1 || \ (devid) == PCI_CHIP_HASWELL_SDV_M2 || (devid) == PCI_CHIP_HASWELL_SDV_M3 || \ (devid) == PCI_CHIP_HASWELL_SDV_B1 || (devid) == PCI_CHIP_HASWELL_SDV_B2 || \ (devid) == PCI_CHIP_HASWELL_SDV_B3 || (devid) == PCI_CHIP_HASWELL_SDV_E1 || \ (devid) == PCI_CHIP_HASWELL_SDV_E2 || (devid) == PCI_CHIP_HASWELL_SDV_E3 || \ (devid) == PCI_CHIP_HASWELL_ULT_D1 || (devid) == PCI_CHIP_HASWELL_ULT_D2 || \ (devid) == PCI_CHIP_HASWELL_ULT_D3 || (devid) == PCI_CHIP_HASWELL_ULT_S1 || \ (devid) == PCI_CHIP_HASWELL_ULT_S2 || (devid) == PCI_CHIP_HASWELL_ULT_S3 || \ (devid) == PCI_CHIP_HASWELL_ULT_M1 || (devid) == PCI_CHIP_HASWELL_ULT_M2 || \ (devid) == PCI_CHIP_HASWELL_ULT_M3 || (devid) == PCI_CHIP_HASWELL_ULT_B1 || \ (devid) == PCI_CHIP_HASWELL_ULT_B2 || (devid) == PCI_CHIP_HASWELL_ULT_B3 || \ (devid) == PCI_CHIP_HASWELL_ULT_E1 || (devid) == PCI_CHIP_HASWELL_ULT_E2 || \ (devid) == PCI_CHIP_HASWELL_ULT_E3 || (devid) == PCI_CHIP_HASWELL_CRW_D1 || \ (devid) == PCI_CHIP_HASWELL_CRW_D2 || (devid) == PCI_CHIP_HASWELL_CRW_D3 || \ (devid) == PCI_CHIP_HASWELL_CRW_S1 || (devid) == PCI_CHIP_HASWELL_CRW_S2 || \ (devid) == PCI_CHIP_HASWELL_CRW_S3 || (devid) == PCI_CHIP_HASWELL_CRW_M1 || \ (devid) == PCI_CHIP_HASWELL_CRW_M2 || (devid) == PCI_CHIP_HASWELL_CRW_M3 || \ (devid) == PCI_CHIP_HASWELL_CRW_B1 || (devid) == PCI_CHIP_HASWELL_CRW_B2 || \ (devid) == PCI_CHIP_HASWELL_CRW_B3 || (devid) == PCI_CHIP_HASWELL_CRW_E1 || \ (devid) == PCI_CHIP_HASWELL_CRW_E2 || (devid) == PCI_CHIP_HASWELL_CRW_E3) #define IS_GEN75(devid) IS_HASWELL(devid) /* BRW */ #define PCI_CHIP_BROADWLL_M_GT1 0x1602 /* Intel(R) Broadwell Mobile - Halo (EDRAM) - GT1 */ #define PCI_CHIP_BROADWLL_D_GT1 0x1606 /* Intel(R) Broadwell U-Processor - GT1 */ #define PCI_CHIP_BROADWLL_S_GT1 0x160A /* Intel(R) Broadwell Server - GT1 */ #define PCI_CHIP_BROADWLL_W_GT1 0x160D /* Intel(R) Broadwell Workstation - GT1 */ #define PCI_CHIP_BROADWLL_U_GT1 0x160E /* Intel(R) Broadwell ULX - GT1 */ #define PCI_CHIP_BROADWLL_M_GT2 0x1612 /* Intel(R) Broadwell Mobile - Halo (EDRAM) - GT2 */ #define PCI_CHIP_BROADWLL_D_GT2 0x1616 /* Intel(R) Broadwell U-Processor - GT2 */ #define PCI_CHIP_BROADWLL_S_GT2 0x161A /* Intel(R) Broadwell Server - GT2 */ #define PCI_CHIP_BROADWLL_W_GT2 0x161D /* Intel(R) Broadwell Workstation - GT2 */ #define PCI_CHIP_BROADWLL_U_GT2 0x161E /* Intel(R) Broadwell ULX - GT2 */ #define PCI_CHIP_BROADWLL_M_GT3 0x1622 /* Intel(R) Broadwell Mobile - Halo (EDRAM) - GT3 */ #define PCI_CHIP_BROADWLL_D_GT3 0x1626 /* Intel(R) Broadwell U-Processor HD 6000 - GT3 */ #define PCI_CHIP_BROADWLL_UI_GT3 0x162B /* Intel(R) Broadwell U-Process Iris 6100 - GT3 */ #define PCI_CHIP_BROADWLL_S_GT3 0x162A /* Intel(R) Broadwell Server - GT3 */ #define PCI_CHIP_BROADWLL_W_GT3 0x162D /* Intel(R) Broadwell Workstation - GT3 */ #define PCI_CHIP_BROADWLL_U_GT3 0x162E /* Intel(R) Broadwell 
ULX - GT3 */ #define IS_BRW_GT1(devid) \ (devid == PCI_CHIP_BROADWLL_M_GT1 || \ devid == PCI_CHIP_BROADWLL_D_GT1 || \ devid == PCI_CHIP_BROADWLL_S_GT1 || \ devid == PCI_CHIP_BROADWLL_W_GT1 || \ devid == PCI_CHIP_BROADWLL_U_GT1) #define IS_BRW_GT2(devid) \ (devid == PCI_CHIP_BROADWLL_M_GT2 || \ devid == PCI_CHIP_BROADWLL_D_GT2 || \ devid == PCI_CHIP_BROADWLL_S_GT2 || \ devid == PCI_CHIP_BROADWLL_W_GT2 || \ devid == PCI_CHIP_BROADWLL_U_GT2) #define IS_BRW_GT3(devid) \ (devid == PCI_CHIP_BROADWLL_M_GT3 || \ devid == PCI_CHIP_BROADWLL_D_GT3 || \ devid == PCI_CHIP_BROADWLL_S_GT3 || \ devid == PCI_CHIP_BROADWLL_W_GT3 || \ devid == PCI_CHIP_BROADWLL_UI_GT3 || \ devid == PCI_CHIP_BROADWLL_U_GT3) #define IS_BROADWELL(devid) (IS_BRW_GT1(devid) || IS_BRW_GT2(devid) || IS_BRW_GT3(devid)) #define PCI_CHIP_CHV_0 0x22B0 #define PCI_CHIP_CHV_1 0x22B1 #define PCI_CHIP_CHV_2 0x22B2 #define PCI_CHIP_CHV_3 0x22B3 #define IS_CHERRYVIEW(devid) \ (devid == PCI_CHIP_CHV_0 || \ devid == PCI_CHIP_CHV_1 || \ devid == PCI_CHIP_CHV_2 || \ devid == PCI_CHIP_CHV_3) #define IS_GEN8(devid) (IS_BROADWELL(devid) || IS_CHERRYVIEW(devid)) /* SKL */ #define PCI_CHIP_SKYLAKE_ULT_GT1 0x1906 /* Intel(R) Skylake ULT - GT1 */ #define PCI_CHIP_SKYLAKE_ULT_GT2 0x1916 /* Intel(R) Skylake ULT - GT2 */ #define PCI_CHIP_SKYLAKE_ULT_GT3 0x1923 /* Intel(R) Skylake ULT - GT3 */ #define PCI_CHIP_SKYLAKE_ULT_GT3E1 0x1926 /* Intel(R) Skylake ULT - GT3E */ #define PCI_CHIP_SKYLAKE_ULT_GT3E2 0x1927 /* Intel(R) Skylake ULT - GT3E */ #define PCI_CHIP_SKYLAKE_ULT_GT2F 0x1921 /* Intel(R) Skylake ULT - GT2F */ #define PCI_CHIP_SKYLAKE_ULX_GT1 0x190E /* Intel(R) Skylake ULX - GT1 */ #define PCI_CHIP_SKYLAKE_ULX_GT2 0x191E /* Intel(R) Skylake ULX - GT2 */ #define PCI_CHIP_SKYLAKE_DT_GT1 0x1902 /* Intel(R) Skylake Desktop - GT1 */ #define PCI_CHIP_SKYLAKE_DT_GT2 0x1912 /* Intel(R) Skylake Desktop - GT2 */ #define PCI_CHIP_SKYLAKE_DT_GT4 0x1932 /* Intel(R) Skylake Desktop - GT4 */ #define PCI_CHIP_SKYLAKE_HALO_GT1 0x190B /* Intel(R) Skylake HALO - GT1 */ #define PCI_CHIP_SKYLAKE_HALO_GT2 0x191B /* Intel(R) Skylake HALO - GT2 */ #define PCI_CHIP_SKYLAKE_HALO_GT3 0x192B /* Intel(R) Skylake HALO - GT3 */ #define PCI_CHIP_SKYLAKE_HALO_GT4 0x193B /* Intel(R) Skylake HALO - GT4 */ #define PCI_CHIP_SKYLAKE_SRV_GT1 0x190A /* Intel(R) Skylake Server - GT1 */ #define PCI_CHIP_SKYLAKE_SRV_GT2 0x191A /* Intel(R) Skylake Server - GT2 */ #define PCI_CHIP_SKYLAKE_SRV_GT3 0x192A /* Intel(R) Skylake Server - GT3 */ #define PCI_CHIP_SKYLAKE_SRV_GT4 0x193A /* Intel(R) Skylake Server - GT4 */ #define PCI_CHIP_SKYLAKE_WKS_GT2 0x191D /* Intel(R) Skylake WKS - GT2 */ #define PCI_CHIP_SKYLAKE_MEDIA_SRV_GT3 0x192D /* Intel(R) Skylake Media Server - GT3 */ #define PCI_CHIP_SKYLAKE_WKS_GT4 0x193D /* Intel(R) Skylake WKS - GT4 */ #define IS_SKL_GT1(devid) \ (devid == PCI_CHIP_SKYLAKE_ULT_GT1 || \ devid == PCI_CHIP_SKYLAKE_ULX_GT1 || \ devid == PCI_CHIP_SKYLAKE_DT_GT1 || \ devid == PCI_CHIP_SKYLAKE_HALO_GT1 || \ devid == PCI_CHIP_SKYLAKE_SRV_GT1) #define IS_SKL_GT2(devid) \ (devid == PCI_CHIP_SKYLAKE_ULT_GT2 || \ devid == PCI_CHIP_SKYLAKE_ULT_GT2F || \ devid == PCI_CHIP_SKYLAKE_ULX_GT2 || \ devid == PCI_CHIP_SKYLAKE_DT_GT2 || \ devid == PCI_CHIP_SKYLAKE_HALO_GT2 || \ devid == PCI_CHIP_SKYLAKE_SRV_GT2 || \ devid == PCI_CHIP_SKYLAKE_WKS_GT2) #define IS_SKL_GT3(devid) \ (devid == PCI_CHIP_SKYLAKE_ULT_GT3 || \ devid == PCI_CHIP_SKYLAKE_ULT_GT3E1 || \ devid == PCI_CHIP_SKYLAKE_ULT_GT3E2 || \ devid == PCI_CHIP_SKYLAKE_HALO_GT3 || \ devid == PCI_CHIP_SKYLAKE_SRV_GT3 || \ devid == 
PCI_CHIP_SKYLAKE_MEDIA_SRV_GT3) #define IS_SKL_GT4(devid) \ (devid == PCI_CHIP_SKYLAKE_DT_GT4 || \ devid == PCI_CHIP_SKYLAKE_HALO_GT4 || \ devid == PCI_CHIP_SKYLAKE_SRV_GT4 || \ devid == PCI_CHIP_SKYLAKE_WKS_GT4) #define IS_SKYLAKE(devid) (IS_SKL_GT1(devid) || IS_SKL_GT2(devid) || IS_SKL_GT3(devid) || IS_SKL_GT4(devid)) /* BXT */ #define PCI_CHIP_BROXTON_0 0x5A84 #define PCI_CHIP_BROXTON_1 0x5A85 #define PCI_CHIP_BROXTON_2 0x1A84 #define PCI_CHIP_BROXTON_3 0x1A85 #define IS_BROXTON(devid) \ (devid == PCI_CHIP_BROXTON_0 || \ devid == PCI_CHIP_BROXTON_1 || \ devid == PCI_CHIP_BROXTON_2 || \ devid == PCI_CHIP_BROXTON_3) #define PCI_CHIP_KABYLAKE_ULT_GT1 0x5906 #define PCI_CHIP_KABYLAKE_ULT_GT2 0x5916 #define PCI_CHIP_KABYLAKE_ULT_GT3 0x5926 #define PCI_CHIP_KABYLAKE_ULT_GT15 0x5913 #define PCI_CHIP_KABYLAKE_ULT_GT2_1 0x5921 #define PCI_CHIP_KABYLAKE_ULT_GT3_1 0x5923 #define PCI_CHIP_KABYLAKE_ULT_GT3_2 0x5927 #define PCI_CHIP_KABYLAKE_DT_GT1 0x5902 #define PCI_CHIP_KABYLAKE_DT_GT2 0x5912 #define PCI_CHIP_KABYLAKE_DT_GT15 0x5917 #define PCI_CHIP_KABYLAKE_HALO_GT1 0x590B #define PCI_CHIP_KABYLAKE_HALO_GT2 0x591B #define PCI_CHIP_KABYLAKE_HALO_GT4 0x593B #define PCI_CHIP_KABYLAKE_HALO_GT15 0x5908 #define PCI_CHIP_KABYLAKE_ULX_GT1 0x590E #define PCI_CHIP_KABYLAKE_ULX_GT2 0x591E #define PCI_CHIP_KABYLAKE_ULX_GT15 0x5915 #define PCI_CHIP_KABYLAKE_SRV_GT1 0x590A #define PCI_CHIP_KABYLAKE_SRV_GT2 0x591A #define PCI_CHIP_KABYLAKE_WKS_GT2 0x591D #define IS_KBL_GT1(devid) \ (devid == PCI_CHIP_KABYLAKE_ULT_GT1 || \ devid == PCI_CHIP_KABYLAKE_DT_GT1 || \ devid == PCI_CHIP_KABYLAKE_HALO_GT1 || \ devid == PCI_CHIP_KABYLAKE_ULX_GT1 || \ devid == PCI_CHIP_KABYLAKE_SRV_GT1) #define IS_KBL_GT15(devid) \ (devid == PCI_CHIP_KABYLAKE_ULT_GT15 || \ devid == PCI_CHIP_KABYLAKE_DT_GT15 || \ devid == PCI_CHIP_KABYLAKE_HALO_GT15 || \ devid == PCI_CHIP_KABYLAKE_ULX_GT15) #define IS_KBL_GT2(devid) \ (devid == PCI_CHIP_KABYLAKE_ULT_GT2 || \ devid == PCI_CHIP_KABYLAKE_ULT_GT2_1 || \ devid == PCI_CHIP_KABYLAKE_DT_GT2 || \ devid == PCI_CHIP_KABYLAKE_HALO_GT2 || \ devid == PCI_CHIP_KABYLAKE_ULX_GT2 || \ devid == PCI_CHIP_KABYLAKE_SRV_GT2 || \ devid == PCI_CHIP_KABYLAKE_WKS_GT2) #define IS_KBL_GT3(devid) \ (devid == PCI_CHIP_KABYLAKE_ULT_GT3 || \ devid == PCI_CHIP_KABYLAKE_ULT_GT3_1 || \ devid == PCI_CHIP_KABYLAKE_ULT_GT3_2) #define IS_KBL_GT4(devid) \ (devid == PCI_CHIP_KABYLAKE_HALO_GT4) #define IS_KABYLAKE(devid) (IS_KBL_GT1(devid) || IS_KBL_GT15(devid) || IS_KBL_GT2(devid) || IS_KBL_GT3(devid) || IS_KBL_GT4(devid)) #define PCI_CHIP_GLK_3x6 0x3184 #define PCI_CHIP_GLK_2x6 0x3185 #define IS_GEMINILAKE(devid) \ (devid == PCI_CHIP_GLK_3x6 || \ devid == PCI_CHIP_GLK_2x6) #define IS_GEN9(devid) (IS_SKYLAKE(devid) || IS_BROXTON(devid) || IS_KABYLAKE(devid) || IS_GEMINILAKE(devid)) #define MAX_OCLVERSION(devid) (IS_GEN9(devid) ? 200 : 120) #endif /* __CL_DEVICE_DATA_H__ */ Beignet-1.3.2-Source/src/cl_context.h000664 001750 001750 00000021520 13173554000 016560 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. 
* * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ #ifndef __CL_CONTEXT_H__ #define __CL_CONTEXT_H__ #include "CL/cl.h" #include "CL/cl_ext.h" #include "cl_internals.h" #include "cl_driver.h" #include "cl_base_object.h" #include <stdint.h> #include <pthread.h> /* DRI device created at create context */ struct intel_driver; enum _cl_gl_context_type { CL_GL_NOSHARE, CL_GL_EGL_DISPLAY, CL_GL_GLX_DISPLAY, CL_GL_WGL_HDC, CL_GL_CGL_SHAREGROUP }; enum _cl_internal_ker_type { CL_INTERNAL_KERNEL_MIN = 0, CL_ENQUEUE_COPY_BUFFER_ALIGN4 = 0, CL_ENQUEUE_COPY_BUFFER_ALIGN16, CL_ENQUEUE_COPY_BUFFER_UNALIGN_SAME_OFFSET, CL_ENQUEUE_COPY_BUFFER_UNALIGN_DST_OFFSET, CL_ENQUEUE_COPY_BUFFER_UNALIGN_SRC_OFFSET, CL_ENQUEUE_COPY_BUFFER_RECT, CL_ENQUEUE_COPY_BUFFER_RECT_ALIGN4, CL_ENQUEUE_COPY_IMAGE_1D_TO_1D, //copy image 1d to image 1d CL_ENQUEUE_COPY_IMAGE_2D_TO_2D, //copy image 2d to image 2d CL_ENQUEUE_COPY_IMAGE_3D_TO_2D, //copy image 3d to image 2d CL_ENQUEUE_COPY_IMAGE_2D_TO_3D, //copy image 2d to image 3d CL_ENQUEUE_COPY_IMAGE_3D_TO_3D, //copy image 3d to image 3d CL_ENQUEUE_COPY_IMAGE_2D_TO_2D_ARRAY, //copy image 2d to image 2d array CL_ENQUEUE_COPY_IMAGE_1D_ARRAY_TO_1D_ARRAY, //copy image 1d array to image 1d array CL_ENQUEUE_COPY_IMAGE_2D_ARRAY_TO_2D_ARRAY, //copy image 2d array to image 2d array CL_ENQUEUE_COPY_IMAGE_2D_ARRAY_TO_2D, //copy image 2d array to image 2d CL_ENQUEUE_COPY_IMAGE_2D_ARRAY_TO_3D, //copy image 2d array to image 3d CL_ENQUEUE_COPY_IMAGE_3D_TO_2D_ARRAY, //copy image 3d to image 2d array CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER, //copy image 2d to buffer CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER_ALIGN16, CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER, //copy image 3d to buffer CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D, //copy buffer to image 2d CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D_ALIGN16, CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D, //copy buffer to image 3d CL_ENQUEUE_FILL_BUFFER_UNALIGN, //fill buffer with 1-aligned pattern, pattern size=1 CL_ENQUEUE_FILL_BUFFER_ALIGN2, //fill buffer with 2-aligned pattern, pattern size=2 CL_ENQUEUE_FILL_BUFFER_ALIGN4, //fill buffer with 4-aligned pattern, pattern size=4 CL_ENQUEUE_FILL_BUFFER_ALIGN8_8, //fill buffer with 8-aligned pattern, pattern size=8 CL_ENQUEUE_FILL_BUFFER_ALIGN8_16, //fill buffer with 8-aligned pattern, pattern size=16 CL_ENQUEUE_FILL_BUFFER_ALIGN8_32, //fill buffer with 8-aligned pattern, pattern size=32 CL_ENQUEUE_FILL_BUFFER_ALIGN8_64, //fill buffer with 8-aligned pattern, pattern size=64 CL_ENQUEUE_FILL_BUFFER_ALIGN128, //fill buffer with 128-aligned pattern, pattern size=128 CL_ENQUEUE_FILL_IMAGE_1D, //fill image 1d CL_ENQUEUE_FILL_IMAGE_1D_ARRAY, //fill image 1d array CL_ENQUEUE_FILL_IMAGE_2D, //fill image 2d CL_ENQUEUE_FILL_IMAGE_2D_ARRAY, //fill image 2d array CL_ENQUEUE_FILL_IMAGE_3D, //fill image 3d CL_INTERNAL_KERNEL_MAX }; struct _cl_context_prop { cl_context_properties platform_id; enum _cl_gl_context_type gl_type; cl_context_properties gl_context; union { cl_context_properties egl_display; cl_context_properties glx_display; cl_context_properties wgl_hdc; cl_context_properties cgl_sharegroup; }; }; #define IS_EGL_CONTEXT(ctx) (ctx->props.gl_type == CL_GL_EGL_DISPLAY) #define EGL_DISP(ctx) (EGLDisplay)(ctx->props.egl_display) #define EGL_CTX(ctx) (EGLContext)(ctx->props.gl_context) /* Encapsulate the whole device */ struct _cl_context { _cl_base_object base; cl_driver drv; /* Handles HW or simulator */ cl_device_id* devices; /* All devices belonging to this context */ cl_uint device_num;
/* Devices number of this context */ list_head queues; /* All command queues currently allocated */ cl_uint queue_num; /* All queue number currently allocated */ cl_uint queue_modify_disable; /* Temp disable queue list change. */ list_head mem_objects; /* All memory object currently allocated */ cl_uint mem_object_num; /* All memory number currently allocated */ list_head samplers; /* All sampler object currently allocated */ cl_uint sampler_num; /* All sampler number currently allocated */ list_head events; /* All event object currently allocated */ cl_uint event_num; /* All event number currently allocated */ list_head programs; /* All programs currently allocated */ cl_uint program_num; /* All program number currently allocated */ cl_accelerator_intel accels; /* All accelerator_intel object currently allocated */ cl_program internal_prgs[CL_INTERNAL_KERNEL_MAX]; /* All programs internal used, for example clEnqueuexxx api use */ cl_kernel internal_kernels[CL_INTERNAL_KERNEL_MAX]; /* All kernels for clenqueuexxx api, for example clEnqueuexxx api use */ uint32_t ver; /* Gen version */ struct _cl_context_prop props; cl_context_properties * prop_user; /* a copy of user passed context properties when create context */ cl_uint prop_len; /* count of the properties */ void (CL_CALLBACK *pfn_notify)(const char *, const void *, size_t, void *); /* User's callback when error occur in context */ void *user_data; /* A pointer to user supplied data */ }; #define CL_OBJECT_CONTEXT_MAGIC 0x20BBCADE993134AALL #define CL_OBJECT_IS_CONTEXT(obj) ((obj && \ ((cl_base_object)obj)->magic == CL_OBJECT_CONTEXT_MAGIC && \ CL_OBJECT_GET_REF(obj) >= 1)) extern void cl_context_add_queue(cl_context ctx, cl_command_queue queue); extern void cl_context_remove_queue(cl_context ctx, cl_command_queue queue); extern void cl_context_add_mem(cl_context ctx, cl_mem mem); extern void cl_context_remove_mem(cl_context ctx, cl_mem mem); extern void cl_context_add_sampler(cl_context ctx, cl_sampler sampler); extern void cl_context_remove_sampler(cl_context ctx, cl_sampler sampler); extern void cl_context_add_event(cl_context ctx, cl_event sampler); extern void cl_context_remove_event(cl_context ctx, cl_event sampler); extern void cl_context_add_program(cl_context ctx, cl_program program); extern void cl_context_remove_program(cl_context ctx, cl_program program); /* Implement OpenCL function */ extern cl_context cl_create_context(const cl_context_properties*, cl_uint, const cl_device_id*, void (CL_CALLBACK * pfn_notify) (const char*, const void*, size_t, void*), void *, cl_int*); /* Allocate and initialize a context */ extern cl_context cl_context_new(struct _cl_context_prop *prop, cl_uint dev_num, cl_device_id* all_dev); /* Destroy and deallocate a context */ extern void cl_context_delete(cl_context); /* Increment the context reference counter */ extern void cl_context_add_ref(cl_context); /* Enqueue a ND Range kernel */ extern cl_int cl_context_ND_kernel(cl_context, cl_command_queue, cl_kernel, cl_uint, const size_t*, const size_t*, const size_t*); /* Used for allocation */ extern cl_buffer_mgr cl_context_get_bufmgr(cl_context ctx); /* Get the internal used kernel from binary*/ extern cl_kernel cl_context_get_static_kernel_from_bin(cl_context ctx, cl_int index, const char * str_kernel, size_t size, const char * str_option); /* Get the SVM from pointer, return NULL if pointer is not from SVM */ extern cl_mem cl_context_get_svm_from_ptr(cl_context ctx, const void *p); /* Get the mem from pointer, return NULL if pointer is not from 
mem*/ extern cl_mem cl_context_get_mem_from_ptr(cl_context ctx, const void *p); #endif /* __CL_CONTEXT_H__ */ Beignet-1.3.2-Source/src/cl_kernel.c000664 001750 001750 00000042335 13161142102 016347 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #include "cl_kernel.h" #include "cl_program.h" #include "cl_device_id.h" #include "cl_context.h" #include "cl_mem.h" #include "cl_alloc.h" #include "cl_utils.h" #include "cl_khr_icd.h" #include "CL/cl.h" #include "cl_sampler.h" #include "cl_accelerator_intel.h" #include "cl_cmrt.h" #include #include #include #include #include LOCAL void cl_kernel_delete(cl_kernel k) { uint32_t i; if (k == NULL) return; #ifdef HAS_CMRT if (k->cmrt_kernel != NULL) { cmrt_destroy_kernel(k); CL_OBJECT_DESTROY_BASE(k); cl_free(k); return; } #endif /* We are not done with the kernel */ if (CL_OBJECT_DEC_REF(k) > 1) return; /* Release one reference on all bos we own */ if (k->bo) cl_buffer_unreference(k->bo); /* This will be true for kernels created by clCreateKernel */ if (k->ref_its_program) cl_program_delete(k->program); /* Release the curbe if allocated */ if (k->curbe) cl_free(k->curbe); /* Release the argument array if required */ if (k->args) { for (i = 0; i < k->arg_n; ++i) if (k->args[i].mem != NULL) cl_mem_delete(k->args[i].mem); cl_free(k->args); } if (k->image_sz) cl_free(k->images); if (k->exec_info) cl_free(k->exec_info); if (k->device_enqueue_ptr) cl_mem_svm_delete(k->program->ctx, k->device_enqueue_ptr); if (k->device_enqueue_infos) cl_free(k->device_enqueue_infos); CL_OBJECT_DESTROY_BASE(k); cl_free(k); } LOCAL cl_kernel cl_kernel_new(cl_program p) { cl_kernel k = NULL; TRY_ALLOC_NO_ERR (k, CALLOC(struct _cl_kernel)); CL_OBJECT_INIT_BASE(k, CL_OBJECT_KERNEL_MAGIC); k->program = p; k->cmrt_kernel = NULL; exit: return k; error: cl_kernel_delete(k); k = NULL; goto exit; } LOCAL const char* cl_kernel_get_name(cl_kernel k) { if (UNLIKELY(k == NULL)) return NULL; return interp_kernel_get_name(k->opaque); } LOCAL const char* cl_kernel_get_attributes(cl_kernel k) { if (UNLIKELY(k == NULL)) return NULL; return interp_kernel_get_attributes(k->opaque); } LOCAL void cl_kernel_add_ref(cl_kernel k) { CL_OBJECT_INC_REF(k); } LOCAL cl_int cl_kernel_set_arg(cl_kernel k, cl_uint index, size_t sz, const void *value) { int32_t offset; /* where to patch */ enum gbe_arg_type arg_type; /* kind of argument */ size_t arg_sz; /* size of the argument */ cl_mem mem = NULL; /* for __global, __constant and image arguments */ cl_context ctx = k->program->ctx; if (UNLIKELY(index >= k->arg_n)) return CL_INVALID_ARG_INDEX; arg_type = interp_kernel_get_arg_type(k->opaque, index); arg_sz = interp_kernel_get_arg_size(k->opaque, index); if (k->vme && index == 0) { //the best method is to return the arg type of GBE_ARG_ACCELERATOR_INTEL //but it is not straightforward since clang does not support it now //the easy way is to 
consider typedef accelerator_intel_t as a struct, //this easy way makes the size mismatched, so use another size check method. if (sz != sizeof(cl_accelerator_intel) || arg_sz != sizeof(cl_motion_estimation_desc_intel)) return CL_INVALID_ARG_SIZE; cl_accelerator_intel* accel = (cl_accelerator_intel*)value; if ((*accel)->type != CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL) return CL_INVALID_ACCELERATOR_TYPE_INTEL; } else { if (UNLIKELY(arg_type != GBE_ARG_LOCAL_PTR && arg_sz != sz)) { if (arg_type != GBE_ARG_SAMPLER || (arg_type == GBE_ARG_SAMPLER && sz != sizeof(cl_sampler))) return CL_INVALID_ARG_SIZE; } } if(UNLIKELY(arg_type == GBE_ARG_LOCAL_PTR && sz == 0)) return CL_INVALID_ARG_SIZE; if(arg_type == GBE_ARG_VALUE) { if(UNLIKELY(value == NULL)) return CL_INVALID_ARG_VALUE; } else if(arg_type == GBE_ARG_LOCAL_PTR) { if(UNLIKELY(value != NULL)) return CL_INVALID_ARG_VALUE; } else if(arg_type == GBE_ARG_SAMPLER) { if (UNLIKELY(value == NULL)) return CL_INVALID_ARG_VALUE; cl_sampler s = *(cl_sampler*)value; if(!CL_OBJECT_IS_SAMPLER(s)) return CL_INVALID_SAMPLER; } else { // should be image, GLOBAL_PTR, CONSTANT_PTR if (UNLIKELY(value == NULL && (arg_type == GBE_ARG_IMAGE || arg_type == GBE_ARG_PIPE))) return CL_INVALID_ARG_VALUE; if(value != NULL) mem = *(cl_mem*)value; if(arg_type == GBE_ARG_PIPE) { _cl_mem_pipe* pipe= cl_mem_pipe(mem); size_t type_size = (size_t)interp_kernel_get_arg_info(k->opaque, index,5); if(pipe->packet_size != type_size) return CL_INVALID_ARG_VALUE; } if(value != NULL && mem) { if(CL_SUCCESS != cl_mem_is_valid(mem, ctx)) return CL_INVALID_MEM_OBJECT; if (UNLIKELY((arg_type == GBE_ARG_IMAGE && !IS_IMAGE(mem)) || (arg_type != GBE_ARG_IMAGE && IS_IMAGE(mem)))) return CL_INVALID_ARG_VALUE; } } /* Copy the structure or the value directly into the curbe */ if (arg_type == GBE_ARG_VALUE) { if (k->vme && index == 0) { cl_accelerator_intel accel; memcpy(&accel, value, sz); offset = interp_kernel_get_curbe_offset(k->opaque, GBE_CURBE_KERNEL_ARGUMENT, index); if (offset >= 0) { assert(offset + sz <= k->curbe_sz); memcpy(k->curbe + offset, &(accel->desc.me), arg_sz); } k->args[index].local_sz = 0; k->args[index].is_set = 1; k->args[index].mem = NULL; k->accel = accel; return CL_SUCCESS; } else { offset = interp_kernel_get_curbe_offset(k->opaque, GBE_CURBE_KERNEL_ARGUMENT, index); if (offset >= 0) { assert(offset + sz <= k->curbe_sz); memcpy(k->curbe + offset, value, sz); } k->args[index].local_sz = 0; k->args[index].is_set = 1; k->args[index].mem = NULL; return CL_SUCCESS; } } /* For a local pointer just save the size */ if (arg_type == GBE_ARG_LOCAL_PTR) { k->args[index].local_sz = sz; k->args[index].is_set = 1; k->args[index].mem = NULL; return CL_SUCCESS; } /* Is it a sampler*/ if (arg_type == GBE_ARG_SAMPLER) { cl_sampler sampler; memcpy(&sampler, value, sz); k->args[index].local_sz = 0; k->args[index].is_set = 1; k->args[index].mem = NULL; k->args[index].sampler = sampler; cl_set_sampler_arg_slot(k, index, sampler); offset = interp_kernel_get_curbe_offset(k->opaque, GBE_CURBE_KERNEL_ARGUMENT, index); if (offset >= 0) { assert(offset + 4 <= k->curbe_sz); memcpy(k->curbe + offset, &sampler->clkSamplerValue, 4); } return CL_SUCCESS; } if(value != NULL) mem = *(cl_mem*) value; if(value == NULL || mem == NULL) { /* for buffer object GLOBAL_PTR CONSTANT_PTR, it maybe NULL */ int32_t offset = interp_kernel_get_curbe_offset(k->opaque, GBE_CURBE_KERNEL_ARGUMENT, index); if (offset >= 0) *((uint32_t *)(k->curbe + offset)) = 0; assert(arg_type == GBE_ARG_GLOBAL_PTR || arg_type == 
GBE_ARG_CONSTANT_PTR); if (k->args[index].mem) cl_mem_delete(k->args[index].mem); k->args[index].mem = NULL; k->args[index].is_set = 1; k->args[index].local_sz = 0; return CL_SUCCESS; } mem = *(cl_mem*) value; cl_mem_add_ref(mem); if (k->args[index].mem) cl_mem_delete(k->args[index].mem); k->args[index].mem = mem; k->args[index].is_set = 1; k->args[index].is_svm = mem->is_svm; if(mem->is_svm) k->args[index].ptr = mem->host_ptr; k->args[index].local_sz = 0; k->args[index].bti = interp_kernel_get_arg_bti(k->opaque, index); return CL_SUCCESS; } LOCAL cl_int cl_kernel_set_arg_svm_pointer(cl_kernel k, cl_uint index, const void *value) { enum gbe_arg_type arg_type; /* kind of argument */ //size_t arg_sz; /* size of the argument */ cl_context ctx = k->program->ctx; cl_mem mem= cl_context_get_svm_from_ptr(ctx, value); if (UNLIKELY(index >= k->arg_n)) return CL_INVALID_ARG_INDEX; arg_type = interp_kernel_get_arg_type(k->opaque, index); //arg_sz = interp_kernel_get_arg_size(k->opaque, index); if(arg_type != GBE_ARG_GLOBAL_PTR && arg_type != GBE_ARG_CONSTANT_PTR ) return CL_INVALID_ARG_VALUE; if(mem == NULL) return CL_INVALID_ARG_VALUE; cl_mem_add_ref(mem); if (k->args[index].mem) cl_mem_delete(k->args[index].mem); k->args[index].ptr = (void *)value; k->args[index].mem = mem; k->args[index].is_set = 1; k->args[index].is_svm = 1; k->args[index].local_sz = 0; k->args[index].bti = interp_kernel_get_arg_bti(k->opaque, index); return 0; } LOCAL cl_int cl_kernel_set_exec_info(cl_kernel k, size_t n, const void *value) { cl_int err = CL_SUCCESS; assert(k != NULL); if (n == 0) return err; TRY_ALLOC(k->exec_info, cl_calloc(n, 1)); memcpy(k->exec_info, value, n); k->exec_info_n = n / sizeof(void *); error: return err; } LOCAL int cl_get_kernel_arg_info(cl_kernel k, cl_uint arg_index, cl_kernel_arg_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { assert(k != NULL); void *ret_info = interp_kernel_get_arg_info(k->opaque, arg_index, param_name - CL_KERNEL_ARG_ADDRESS_QUALIFIER); uint32_t arg_type = interp_kernel_get_arg_type(k->opaque, arg_index); int str_len = 0; cl_kernel_arg_type_qualifier type_qual = CL_KERNEL_ARG_TYPE_NONE; switch (param_name) { case CL_KERNEL_ARG_ADDRESS_QUALIFIER: if (param_value_size_ret) *param_value_size_ret = sizeof(cl_kernel_arg_address_qualifier); if (!param_value) return CL_SUCCESS; if (param_value_size < sizeof(cl_kernel_arg_address_qualifier)) return CL_INVALID_VALUE; if ((size_t)ret_info == 0) { *(cl_kernel_arg_address_qualifier *)param_value = CL_KERNEL_ARG_ADDRESS_PRIVATE; } else if ((size_t)ret_info == 1 || (size_t)ret_info == 4) { *(cl_kernel_arg_address_qualifier *)param_value = CL_KERNEL_ARG_ADDRESS_GLOBAL; } else if ((size_t)ret_info == 2) { *(cl_kernel_arg_address_qualifier *)param_value = CL_KERNEL_ARG_ADDRESS_CONSTANT; } else if ((size_t)ret_info == 3) { *(cl_kernel_arg_address_qualifier *)param_value = CL_KERNEL_ARG_ADDRESS_LOCAL; } else { /* If no address qualifier is specified, the default address qualifier which is CL_KERNEL_ARG_ADDRESS_PRIVATE is returned. 
*/ *(cl_kernel_arg_address_qualifier *)param_value = CL_KERNEL_ARG_ADDRESS_PRIVATE; } return CL_SUCCESS; case CL_KERNEL_ARG_ACCESS_QUALIFIER: if (param_value_size_ret) *param_value_size_ret = sizeof(cl_kernel_arg_access_qualifier); if (!param_value) return CL_SUCCESS; if (param_value_size < sizeof(cl_kernel_arg_access_qualifier)) return CL_INVALID_VALUE; if (!strcmp((char*)ret_info, "write_only")) { *(cl_kernel_arg_address_qualifier *)param_value = CL_KERNEL_ARG_ACCESS_WRITE_ONLY; } else if (!strcmp((char*)ret_info, "read_only")) { *(cl_kernel_arg_address_qualifier *)param_value = CL_KERNEL_ARG_ACCESS_READ_ONLY; } else if (!strcmp((char*)ret_info, "read_write")) { *(cl_kernel_arg_address_qualifier *)param_value = CL_KERNEL_ARG_ACCESS_READ_WRITE; } else { *(cl_kernel_arg_address_qualifier *)param_value = CL_KERNEL_ARG_ACCESS_NONE; } return CL_SUCCESS; case CL_KERNEL_ARG_TYPE_NAME: case CL_KERNEL_ARG_NAME: str_len = strlen(ret_info); if (param_value_size_ret) *param_value_size_ret = str_len + 1; if (!param_value) return CL_SUCCESS; if (param_value_size < str_len + 1) return CL_INVALID_VALUE; memcpy(param_value, ret_info, str_len); ((char *)param_value)[str_len] = 0; return CL_SUCCESS; case CL_KERNEL_ARG_TYPE_QUALIFIER: if (param_value_size_ret) *param_value_size_ret = sizeof(cl_kernel_arg_type_qualifier); if (!param_value) return CL_SUCCESS; if (param_value_size < sizeof(cl_kernel_arg_type_qualifier)) return CL_INVALID_VALUE; if (strstr((char*)ret_info, "const") && (arg_type == GBE_ARG_GLOBAL_PTR || arg_type == GBE_ARG_CONSTANT_PTR || arg_type == GBE_ARG_LOCAL_PTR)) type_qual = type_qual | CL_KERNEL_ARG_TYPE_CONST; if (strstr((char*)ret_info, "volatile")) type_qual = type_qual | CL_KERNEL_ARG_TYPE_VOLATILE; if (strstr((char*)ret_info, "restrict")) type_qual = type_qual | CL_KERNEL_ARG_TYPE_RESTRICT; if (strstr((char*)ret_info, "pipe")) type_qual = CL_KERNEL_ARG_TYPE_PIPE; *(cl_kernel_arg_type_qualifier *)param_value = type_qual; return CL_SUCCESS; default: assert(0); } return CL_SUCCESS; } LOCAL uint32_t cl_kernel_get_simd_width(cl_kernel k) { assert(k != NULL); return interp_kernel_get_simd_width(k->opaque); } LOCAL void cl_kernel_setup(cl_kernel k, gbe_kernel opaque) { cl_context ctx = k->program->ctx; cl_buffer_mgr bufmgr = cl_context_get_bufmgr(ctx); if(k->bo != NULL) cl_buffer_unreference(k->bo); /* Allocate the gen code here */ const uint32_t code_sz = interp_kernel_get_code_size(opaque); const char *code = interp_kernel_get_code(opaque); k->bo = cl_buffer_alloc(bufmgr, "CL kernel", code_sz, 64u); k->arg_n = interp_kernel_get_arg_num(opaque); /* Upload the code */ cl_buffer_subdata(k->bo, 0, code_sz, code); k->opaque = opaque; const char* kname = cl_kernel_get_name(k); if (kname != NULL && strncmp(kname, "block_motion_estimate_intel", sizeof("block_motion_estimate_intel")) == 0) k->vme = 1; else k->vme = 0; /* Create the curbe */ k->curbe_sz = interp_kernel_get_curbe_size(k->opaque); /* Get sampler data & size */ k->sampler_sz = interp_kernel_get_sampler_size(k->opaque); assert(k->sampler_sz <= GEN_MAX_SAMPLERS); if (k->sampler_sz > 0) interp_kernel_get_sampler_data(k->opaque, k->samplers); interp_kernel_get_compile_wg_size(k->opaque, k->compile_wg_sz); k->stack_size = interp_kernel_get_stack_size(k->opaque); /* Get image data & size */ k->image_sz = interp_kernel_get_image_size(k->opaque); assert(k->sampler_sz <= GEN_MAX_SURFACES); assert(k->image_sz <= ctx->devices[0]->max_read_image_args + ctx->devices[0]->max_write_image_args); if (k->image_sz > 0) { TRY_ALLOC_NO_ERR(k->images, 
cl_calloc(k->image_sz, sizeof(k->images[0]))); interp_kernel_get_image_data(k->opaque, k->images); } else k->images = NULL; return; error: cl_buffer_unreference(k->bo); k->bo = NULL; } LOCAL cl_kernel cl_kernel_dup(cl_kernel from) { cl_kernel to = NULL; if (UNLIKELY(from == NULL)) return NULL; TRY_ALLOC_NO_ERR (to, CALLOC(struct _cl_kernel)); CL_OBJECT_INIT_BASE(to, CL_OBJECT_KERNEL_MAGIC); to->bo = from->bo; to->opaque = from->opaque; to->vme = from->vme; to->program = from->program; to->arg_n = from->arg_n; to->curbe_sz = from->curbe_sz; to->sampler_sz = from->sampler_sz; to->image_sz = from->image_sz; to->exec_info_n = from->exec_info_n; memcpy(to->compile_wg_sz, from->compile_wg_sz, sizeof(from->compile_wg_sz)); to->stack_size = from->stack_size; if (to->sampler_sz) memcpy(to->samplers, from->samplers, to->sampler_sz * sizeof(uint32_t)); if (to->image_sz) { TRY_ALLOC_NO_ERR(to->images, cl_calloc(to->image_sz, sizeof(to->images[0]))); memcpy(to->images, from->images, to->image_sz * sizeof(to->images[0])); } else to->images = NULL; if (to->exec_info_n) { /* Must always 0 here */ TRY_ALLOC_NO_ERR(to->exec_info, cl_calloc(to->exec_info_n, sizeof(void *))); memcpy(to->exec_info, from->exec_info, to->exec_info_n * sizeof(void *)); } TRY_ALLOC_NO_ERR(to->args, cl_calloc(to->arg_n, sizeof(cl_argument))); if (to->curbe_sz) TRY_ALLOC_NO_ERR(to->curbe, cl_calloc(1, to->curbe_sz)); /* Retain the bos */ if (from->bo) cl_buffer_reference(from->bo); /* We retain the program destruction since this kernel (user allocated) * depends on the program for some of its pointers */ assert(from->program); cl_program_add_ref(from->program); to->ref_its_program = CL_TRUE; exit: return to; error: cl_kernel_delete(to); to = NULL; goto exit; } LOCAL cl_int cl_kernel_work_group_sz(cl_kernel ker, const size_t *local_wk_sz, uint32_t wk_dim, size_t *wk_grp_sz) { cl_int err = CL_SUCCESS; size_t sz = 0; cl_uint i; for (i = 0; i < wk_dim; ++i) { const uint32_t required_sz = interp_kernel_get_required_work_group_size(ker->opaque, i); if (required_sz != 0 && required_sz != local_wk_sz[i]) { err = CL_INVALID_WORK_ITEM_SIZE; goto error; } } sz = local_wk_sz[0]; for (i = 1; i < wk_dim; ++i) sz *= local_wk_sz[i]; if (sz > cl_get_kernel_max_wg_sz(ker)) { err = CL_INVALID_WORK_ITEM_SIZE; goto error; } error: if (wk_grp_sz) *wk_grp_sz = sz; return err; } Beignet-1.3.2-Source/src/cl_platform_id.c000664 001750 001750 00000004164 13161142102 017365 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ #include "cl_platform_id.h" #include "cl_internals.h" #include "cl_utils.h" #include "CL/cl.h" #include "CL/cl_ext.h" #include #include #define DECL_INFO_STRING(FIELD, STRING) \ .FIELD = STRING, \ .JOIN(FIELD,_sz) = sizeof(STRING), static struct _cl_platform_id intel_platform_data = { DECL_INFO_STRING(profile, "FULL_PROFILE") DECL_INFO_STRING(version, GEN9_LIBCL_VERSION_STRING) DECL_INFO_STRING(name, "Intel Gen OCL Driver") DECL_INFO_STRING(vendor, "Intel") DECL_INFO_STRING(icd_suffix_khr, "Intel") }; #undef DECL_INFO_STRING /* Intel platform (only GPU now). It is used as default when the API's platform ptr is NULL */ static cl_platform_id intel_platform = NULL; LOCAL cl_platform_id cl_get_platform_default(void) { if (intel_platform) return intel_platform; intel_platform = &intel_platform_data; CL_OBJECT_INIT_BASE(intel_platform, CL_OBJECT_PLATFORM_MAGIC); cl_intel_platform_extension_init(intel_platform); return intel_platform; } LOCAL cl_int cl_get_platform_ids(cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms) { if (num_platforms != NULL) *num_platforms = 1; /* Easy right now, only one platform is supported */ if(platforms) *platforms = cl_get_platform_default(); return CL_SUCCESS; } Beignet-1.3.2-Source/src/cl_api.c000664 001750 001750 00000124245 13161142102 015641 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ #include "cl_platform_id.h" #include "cl_device_id.h" #include "cl_context.h" #include "cl_command_queue.h" #include "cl_enqueue.h" #include "cl_event.h" #include "cl_program.h" #include "cl_kernel.h" #include "cl_mem.h" #include "cl_image.h" #include "cl_sampler.h" #include "cl_accelerator_intel.h" #include "cl_alloc.h" #include "cl_utils.h" #include "cl_cmrt.h" #include "CL/cl.h" #include "CL/cl_ext.h" #include "CL/cl_intel.h" #include #include #include #include #include "performance.h" #ifndef CL_VERSION_1_2 #define CL_MAP_WRITE_INVALIDATE_REGION (1 << 2) #define CL_DEVICE_TYPE_CUSTOM (1 << 4) #define CL_MEM_HOST_WRITE_ONLY (1 << 7) #define CL_MEM_HOST_READ_ONLY (1 << 8) #define CL_MEM_HOST_NO_ACCESS (1 << 9) typedef intptr_t cl_device_partition_property; #endif #define FILL_GETINFO_RET(TYPE, ELT, VAL, RET) \ do { \ if (param_value && param_value_size < sizeof(TYPE)*ELT) \ return CL_INVALID_VALUE; \ if (param_value) { \ memset(param_value, 0, param_value_size); \ memcpy(param_value, (VAL), sizeof(TYPE)*ELT); \ } \ \ if (param_value_size_ret) \ *param_value_size_ret = sizeof(TYPE)*ELT; \ return RET; \ } while(0) cl_int clGetPlatformIDs(cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms) { if(UNLIKELY(platforms == NULL && num_platforms == NULL)) return CL_INVALID_VALUE; if(UNLIKELY(num_entries == 0 && platforms != NULL)) return CL_INVALID_VALUE; return cl_get_platform_ids(num_entries, platforms, num_platforms); } cl_mem clCreateBuffer(cl_context context, cl_mem_flags flags, size_t size, void * host_ptr, cl_int * errcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); mem = cl_mem_new_buffer(context, flags, size, host_ptr, &err); error: if (errcode_ret) *errcode_ret = err; return mem; } cl_mem clCreateSubBuffer(cl_mem buffer, cl_mem_flags flags, cl_buffer_create_type buffer_create_type, const void * buffer_create_info, cl_int * errcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_MEM(buffer); mem = cl_mem_new_sub_buffer(buffer, flags, buffer_create_type, buffer_create_info, &err); error: if (errcode_ret) *errcode_ret = err; return mem; } cl_mem clCreateImage(cl_context context, cl_mem_flags flags, const cl_image_format *image_format, const cl_image_desc *image_desc, void *host_ptr, cl_int * errcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); if (image_format == NULL) { err = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; goto error; } if (image_format->image_channel_order < CL_R || image_format->image_channel_order > CL_sBGRA) { err = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; goto error; } if (image_format->image_channel_data_type < CL_SNORM_INT8 || image_format->image_channel_data_type > CL_FLOAT) { err = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; goto error; } if (image_desc == NULL) { err = CL_INVALID_IMAGE_DESCRIPTOR; goto error; } if (image_desc->image_type <= CL_MEM_OBJECT_BUFFER || image_desc->image_type > CL_MEM_OBJECT_IMAGE1D_BUFFER) { err = CL_INVALID_IMAGE_DESCRIPTOR; goto error; } /* buffer refers to a valid buffer memory object if image_type is CL_MEM_OBJECT_IMAGE1D_BUFFER or CL_MEM_OBJECT_IMAGE2D. Otherwise it must be NULL. 
*/ if (image_desc->image_type != CL_MEM_OBJECT_IMAGE1D_BUFFER && image_desc->image_type != CL_MEM_OBJECT_IMAGE2D && image_desc->buffer) { err = CL_INVALID_IMAGE_DESCRIPTOR; goto error; } if (image_desc->num_mip_levels || image_desc->num_samples) { err = CL_INVALID_IMAGE_DESCRIPTOR; goto error; } /* Remaining image_desc checks are left to image creation. */ mem = cl_mem_new_image(context, flags, image_format, image_desc, host_ptr, &err); error: if (errcode_ret) *errcode_ret = err; return mem; } void * clSVMAlloc (cl_context context, cl_svm_mem_flags flags, size_t size, unsigned int alignment) { cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); (void) err; return cl_mem_svm_allocate(context, flags, size, alignment); error: return NULL; } void clSVMFree (cl_context context, void* svm_pointer) { cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); (void) err; return cl_mem_svm_delete(context, svm_pointer); error: return; } cl_int clEnqueueSVMFree (cl_command_queue command_queue, cl_uint num_svm_pointers, void *svm_pointers[], void (CL_CALLBACK *pfn_free_func)( cl_command_queue queue, cl_uint num_svm_pointers, void *svm_pointers[], void *user_data), void *user_data, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; cl_int i = 0; void** pointers = NULL; cl_event e = NULL; cl_int e_status; enqueue_data *data; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if(num_svm_pointers == 0 || svm_pointers == NULL) { err = CL_INVALID_VALUE; break; } for(i = 0; i < (cl_int)num_svm_pointers; i++) { if(UNLIKELY(svm_pointers[i] == NULL)) { err = CL_INVALID_VALUE; break; } } if (err != CL_SUCCESS) { break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_SVM_FREE, &err); if (err != CL_SUCCESS) { break; } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } pointers = malloc(num_svm_pointers * sizeof(void *)); if(UNLIKELY(pointers == NULL)) { err = CL_OUT_OF_HOST_MEMORY; break; } memcpy(pointers, svm_pointers, num_svm_pointers * sizeof(void *)); data = &e->exec_data; data->type = EnqueueSVMFree; data->queue = command_queue; data->pointers = pointers; data->free_func = pfn_free_func; data->size = num_svm_pointers; data->ptr = user_data; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event.
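/* This sync/async split is the common enqueue pattern in this file: a command
 * whose wait list is already satisfied executes inline on the calling thread,
 * otherwise it is recorded on the queue and run when its dependencies resolve.
 * A hedged host-side sketch of driving clEnqueueSVMFree (hypothetical handles,
 * error checks omitted):
 *
 *   void *p = clSVMAlloc(ctx, CL_MEM_READ_WRITE, 4096, 0);
 *   void *ptrs[] = { p };
 *   // With pfn_free_func == NULL the runtime frees each pointer itself.
 *   clEnqueueSVMFree(queue, 1, ptrs, NULL, NULL, 0, NULL, NULL);
 *   clFinish(queue);
 */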
err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueSVMMap (cl_command_queue command_queue, cl_bool blocking_map, cl_map_flags map_flags, void *svm_ptr, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; cl_mem buffer; CHECK_QUEUE(command_queue); buffer = cl_context_get_svm_from_ptr(command_queue->ctx, svm_ptr); if(buffer == NULL) { err = CL_INVALID_VALUE; goto error; } clEnqueueMapBuffer(command_queue, buffer, blocking_map, map_flags, 0, size, num_events_in_wait_list, event_wait_list, event, &err); if(event) (*event)->event_type = CL_COMMAND_SVM_MAP; error: return err; } cl_int clEnqueueSVMUnmap (cl_command_queue command_queue, void *svm_ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; cl_mem buffer; CHECK_QUEUE(command_queue); buffer = cl_context_get_svm_from_ptr(command_queue->ctx, svm_ptr); if(buffer == NULL) { err = CL_INVALID_VALUE; goto error; } err = clEnqueueUnmapMemObject(command_queue, buffer, svm_ptr, num_events_in_wait_list, event_wait_list, event); if(event) (*event)->event_type = CL_COMMAND_SVM_UNMAP; error: return err; } cl_int clEnqueueSVMMemcpy (cl_command_queue command_queue, cl_bool blocking_copy, void *dst_ptr, const void *src_ptr, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; enqueue_data *data; cl_int e_status; cl_event e = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if(UNLIKELY(dst_ptr == NULL || src_ptr == NULL || size == 0 )) { err = CL_INVALID_VALUE; break; } if(((size_t)src_ptr < (size_t)dst_ptr && ((size_t)src_ptr + size > (size_t)dst_ptr)) || ((size_t)dst_ptr < (size_t)src_ptr && ((size_t)dst_ptr + size > (size_t)src_ptr))) { err = CL_MEM_COPY_OVERLAP; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_SVM_MEMCPY, &err); if (err != CL_SUCCESS) { break; } if (blocking_copy) { err = cl_event_wait_for_event_ready(e); if (err != CL_SUCCESS) break; /* Blocking call API is a sync point of flush. */ err = cl_command_queue_wait_flush(command_queue); if (err != CL_SUCCESS) { break; } } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type = EnqueueSVMMemCopy; data->queue = command_queue; data->ptr = dst_ptr; data->const_ptr = src_ptr; data->size = size; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. 
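/* Same inline-vs-queued dispatch as clEnqueueSVMFree above. The overlap test
 * near the top of this function is what raises CL_MEM_COPY_OVERLAP; restated
 * as a standalone predicate it is exactly (illustrative C, mirroring the
 * condition in the code above):
 *
 *   static int svm_ranges_overlap(const void *a, const void *b, size_t n) {
 *     uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;
 *     //identical base pointers with n > 0 are not treated as an overlap here
 *     return (x < y && x + n > y) || (y < x && y + n > x);
 *   }
 */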
err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } } while(0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueSVMMemFill (cl_command_queue command_queue, void *svm_ptr, const void *pattern, size_t pattern_size, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; enqueue_data *data; cl_int e_status; cl_event e = NULL; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if(UNLIKELY(svm_ptr == NULL || ((size_t)svm_ptr & (pattern_size - 1)) != 0)) { err = CL_INVALID_VALUE; break; } if(UNLIKELY(pattern == NULL || (pattern_size & (pattern_size - 1)) != 0 || pattern_size > 128)) { err = CL_INVALID_VALUE; break; } if(UNLIKELY(size == 0 || (size % pattern_size) != 0)) { err = CL_INVALID_VALUE; break; } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_SVM_MEMFILL, &err); if (err != CL_SUCCESS) { break; } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type = EnqueueSVMMemFill; data->queue = command_queue; data->ptr = svm_ptr; data->const_ptr = pattern; data->pattern_size = pattern_size; data->size = size; if (e_status == CL_COMPLETE) { // Sync mode, no need to queue event. err = cl_event_exec(e, CL_COMPLETE, CL_FALSE); if (err != CL_SUCCESS) { break; } } else { err = cl_event_exec(e, CL_QUEUED, CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); } } while(0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_mem clCreateImage2D(cl_context context, cl_mem_flags flags, const cl_image_format * image_format, size_t image_width, size_t image_height, size_t image_row_pitch, void * host_ptr, cl_int * errcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); cl_image_desc image_desc; memset(&image_desc, 0, sizeof(image_desc)); image_desc.image_type = CL_MEM_OBJECT_IMAGE2D; image_desc.image_width = image_width; image_desc.image_height = image_height; image_desc.image_row_pitch = image_row_pitch; mem = cl_mem_new_image(context, flags, image_format, &image_desc, host_ptr, &err); error: if (errcode_ret) *errcode_ret = err; return mem; } cl_mem clCreateImage3D(cl_context context, cl_mem_flags flags, const cl_image_format * image_format, size_t image_width, size_t image_height, size_t image_depth, size_t image_row_pitch, size_t image_slice_pitch, void * host_ptr, cl_int * errcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); cl_image_desc image_desc; image_desc.image_type = CL_MEM_OBJECT_IMAGE3D; image_desc.image_width = image_width; image_desc.image_height = image_height; image_desc.image_depth = image_depth; image_desc.image_row_pitch = image_row_pitch; image_desc.image_slice_pitch = image_slice_pitch; mem = cl_mem_new_image(context, flags, image_format, &image_desc, host_ptr, &err); error: if (errcode_ret) *errcode_ret = err; return mem; } cl_int clGetSupportedImageFormats(cl_context ctx, cl_mem_flags flags, cl_mem_object_type image_type, 
cl_uint num_entries, cl_image_format * image_formats, cl_uint * num_image_formats) { cl_int err = CL_SUCCESS; CHECK_CONTEXT (ctx); if (UNLIKELY(num_entries == 0 && image_formats != NULL)) { err = CL_INVALID_VALUE; goto error; } if (UNLIKELY(image_type != CL_MEM_OBJECT_IMAGE1D && image_type != CL_MEM_OBJECT_IMAGE1D_ARRAY && image_type != CL_MEM_OBJECT_IMAGE1D_BUFFER && image_type != CL_MEM_OBJECT_IMAGE2D_ARRAY && image_type != CL_MEM_OBJECT_IMAGE2D && image_type != CL_MEM_OBJECT_IMAGE3D)) { err = CL_INVALID_VALUE; goto error; } err = cl_image_get_supported_fmt(ctx, flags, image_type, num_entries, image_formats, num_image_formats); error: return err; } cl_sampler clCreateSamplerWithProperties(cl_context context, const cl_sampler_properties *sampler_properties, cl_int * errcode_ret) { cl_sampler sampler = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); cl_bool normalized = 0xFFFFFFFF; cl_addressing_mode addressing = 0xFFFFFFFF; cl_filter_mode filter = 0xFFFFFFFF; if(sampler_properties) { cl_ulong sam_type; cl_ulong sam_val; cl_uint i; for(i = 0;(sam_type = sampler_properties[i++])!=0;i++) { sam_val = sampler_properties[i]; switch(sam_type) { case CL_SAMPLER_NORMALIZED_COORDS: if(normalized != 0xFFFFFFFF) err = CL_INVALID_VALUE; else if(sam_val == CL_TRUE || sam_val == CL_FALSE) normalized = sam_val; else err = CL_INVALID_VALUE; break; case CL_SAMPLER_ADDRESSING_MODE: if(addressing != 0xFFFFFFFF) err = CL_INVALID_VALUE; else if(sam_val == CL_ADDRESS_MIRRORED_REPEAT || sam_val == CL_ADDRESS_REPEAT || sam_val == CL_ADDRESS_CLAMP_TO_EDGE || sam_val == CL_ADDRESS_CLAMP || sam_val == CL_ADDRESS_NONE) addressing = sam_val; else err = CL_INVALID_VALUE; break; case CL_SAMPLER_FILTER_MODE: if(filter != 0xFFFFFFFF) err = CL_INVALID_VALUE; else if(sam_val == CL_FILTER_LINEAR || sam_val == CL_FILTER_NEAREST) filter = sam_val; else err = CL_INVALID_VALUE; break; default: err = CL_INVALID_VALUE; break; } } } if(err) goto error; if(normalized == 0xFFFFFFFF) normalized = CL_TRUE; if(addressing == 0xFFFFFFFF) addressing = CL_ADDRESS_CLAMP; if(filter == 0xFFFFFFFF) filter = CL_FILTER_NEAREST; sampler = cl_create_sampler(context, normalized, addressing, filter, &err); error: if (errcode_ret) *errcode_ret = err; return sampler; } cl_program clCreateProgramWithSource(cl_context context, cl_uint count, const char ** strings, const size_t * lengths, cl_int * errcode_ret) { cl_program program = NULL; cl_int err = CL_SUCCESS; cl_uint i; CHECK_CONTEXT (context); INVALID_VALUE_IF (count == 0); INVALID_VALUE_IF (strings == NULL); for(i = 0; i < count; i++) { if(UNLIKELY(strings[i] == NULL)) { err = CL_INVALID_VALUE; goto error; } } program = cl_program_create_from_source(context, count, strings, lengths, &err); error: if (errcode_ret) *errcode_ret = err; return program; } cl_program clCreateProgramWithBinary(cl_context context, cl_uint num_devices, const cl_device_id * devices, const size_t * lengths, const unsigned char ** binaries, cl_int * binary_status, cl_int * errcode_ret) { cl_program program = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); program = cl_program_create_from_binary(context, num_devices, devices, lengths, binaries, binary_status, &err); error: if (errcode_ret) *errcode_ret = err; return program; } cl_program clCreateProgramWithBuiltInKernels(cl_context context, cl_uint num_devices, const cl_device_id * device_list, const char * kernel_names, cl_int * errcode_ret) { cl_program program = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); INVALID_VALUE_IF (kernel_names == NULL); 
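/* Illustrative caller-side sketch (not part of this file; `ctx` and `dev` are
 * placeholders). A caller would normally query the device's built-in kernel
 * list before requesting one, e.g.:
 *
 *   char names[1024];
 *   clGetDeviceInfo(dev, CL_DEVICE_BUILT_IN_KERNELS, sizeof(names), names, NULL);
 *   cl_int err;
 *   cl_program prog = clCreateProgramWithBuiltInKernels(ctx, 1, &dev,
 *                       "block_motion_estimate_intel", &err);
 *
 * (The helper called below keeps the tree's historical spelling "kernles".) */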
program = cl_program_create_with_built_in_kernles(context, num_devices, device_list, kernel_names, &err); error: if (errcode_ret) *errcode_ret = err; return program; } cl_int clRetainProgram(cl_program program) { cl_int err = CL_SUCCESS; CHECK_PROGRAM (program); cl_program_add_ref(program); error: return err; } cl_int clReleaseProgram(cl_program program) { cl_int err = CL_SUCCESS; CHECK_PROGRAM (program); cl_program_delete(program); error: return err; } cl_int clBuildProgram(cl_program program, cl_uint num_devices, const cl_device_id * device_list, const char * options, void (CL_CALLBACK *pfn_notify) (cl_program, void*), void * user_data) { cl_int err = CL_SUCCESS; CHECK_PROGRAM(program); INVALID_VALUE_IF (num_devices > 1); INVALID_VALUE_IF (num_devices == 0 && device_list != NULL); INVALID_VALUE_IF (num_devices != 0 && device_list == NULL); INVALID_VALUE_IF (pfn_notify == 0 && user_data != NULL); /* Everything is easy. We only support one device anyway */ if (num_devices != 0) { assert(program->ctx); err = cl_devices_list_include_check(program->ctx->device_num, program->ctx->devices, num_devices, device_list); if (err) goto error; } assert(program->source_type == FROM_LLVM || program->source_type == FROM_SOURCE || program->source_type == FROM_LLVM_SPIR || program->source_type == FROM_BINARY || program->source_type == FROM_CMRT); if((err = cl_program_build(program, options)) != CL_SUCCESS) { goto error; } program->is_built = CL_TRUE; if (pfn_notify) pfn_notify(program, user_data); error: return err; } cl_int clCompileProgram(cl_program program , cl_uint num_devices , const cl_device_id * device_list , const char * options , cl_uint num_input_headers , const cl_program * input_headers , const char ** header_include_names , void (CL_CALLBACK * pfn_notify )(cl_program, void *), void * user_data ) { cl_int err = CL_SUCCESS; CHECK_PROGRAM(program); INVALID_VALUE_IF (num_devices > 1); INVALID_VALUE_IF (num_devices == 0 && device_list != NULL); INVALID_VALUE_IF (num_devices != 0 && device_list == NULL); INVALID_VALUE_IF (pfn_notify == 0 && user_data != NULL); INVALID_VALUE_IF (num_input_headers == 0 && input_headers != NULL); INVALID_VALUE_IF (num_input_headers != 0 && input_headers == NULL); /* Everything is easy. 
We only support one device anyway */ if (num_devices != 0) { assert(program->ctx); err = cl_devices_list_include_check(program->ctx->device_num, program->ctx->devices, num_devices, device_list); if (err) goto error; } /* TODO support create program from binary */ assert(program->source_type == FROM_LLVM || program->source_type == FROM_SOURCE || program->source_type == FROM_LLVM_SPIR || program->source_type == FROM_BINARY); if((err = cl_program_compile(program, num_input_headers, input_headers, header_include_names, options)) != CL_SUCCESS) { goto error; } program->is_built = CL_TRUE; if (pfn_notify) pfn_notify(program, user_data); error: return err; } cl_program clLinkProgram(cl_context context, cl_uint num_devices, const cl_device_id * device_list, const char * options, cl_uint num_input_programs, const cl_program * input_programs, void (CL_CALLBACK * pfn_notify)(cl_program program, void * user_data), void * user_data, cl_int * errcode_ret) { cl_int err = CL_SUCCESS; cl_program program = NULL; CHECK_CONTEXT (context); INVALID_VALUE_IF (num_devices > 1); INVALID_VALUE_IF (num_devices == 0 && device_list != NULL); INVALID_VALUE_IF (num_devices != 0 && device_list == NULL); INVALID_VALUE_IF (pfn_notify == 0 && user_data != NULL); INVALID_VALUE_IF (num_input_programs == 0 && input_programs != NULL); INVALID_VALUE_IF (num_input_programs != 0 && input_programs == NULL); INVALID_VALUE_IF (num_input_programs == 0 && input_programs == NULL); program = cl_program_link(context, num_input_programs, input_programs, options, &err); if(program) program->is_built = CL_TRUE; if (pfn_notify) pfn_notify(program, user_data); error: if (errcode_ret) *errcode_ret = err; return program; } cl_int clUnloadCompiler(void) { return CL_SUCCESS; } cl_int clUnloadPlatformCompiler(cl_platform_id platform) { return CL_SUCCESS; } cl_kernel clCreateKernel(cl_program program, const char * kernel_name, cl_int * errcode_ret) { cl_kernel kernel = NULL; cl_int err = CL_SUCCESS; CHECK_PROGRAM (program); if (program->ker_n <= 0) { err = CL_INVALID_PROGRAM_EXECUTABLE; goto error; } INVALID_VALUE_IF (kernel_name == NULL); kernel = cl_program_create_kernel(program, kernel_name, &err); error: if (errcode_ret) *errcode_ret = err; return kernel; } cl_int clCreateKernelsInProgram(cl_program program, cl_uint num_kernels, cl_kernel * kernels, cl_uint * num_kernels_ret) { cl_int err = CL_SUCCESS; CHECK_PROGRAM (program); if (program->ker_n <= 0) { err = CL_INVALID_PROGRAM_EXECUTABLE; goto error; } if (kernels && num_kernels < program->ker_n) { err = CL_INVALID_VALUE; goto error; } if(num_kernels_ret) *num_kernels_ret = program->ker_n; if(kernels) err = cl_program_create_kernels_in_program(program, kernels); error: return err; } cl_int clRetainKernel(cl_kernel kernel) { cl_int err = CL_SUCCESS; CHECK_KERNEL(kernel); cl_kernel_add_ref(kernel); error: return err; } cl_int clReleaseKernel(cl_kernel kernel) { cl_int err = CL_SUCCESS; CHECK_KERNEL(kernel); cl_kernel_delete(kernel); error: return err; } cl_int clSetKernelArg(cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void * arg_value) { cl_int err = CL_SUCCESS; CHECK_KERNEL(kernel); #ifdef HAS_CMRT if (kernel->cmrt_kernel != NULL) err = cmrt_set_kernel_arg(kernel, arg_index, arg_size, arg_value); else #endif err = cl_kernel_set_arg(kernel, arg_index, arg_size, arg_value); error: return err; } cl_int clSetKernelArgSVMPointer(cl_kernel kernel, cl_uint arg_index, const void *arg_value) { cl_int err = CL_SUCCESS; CHECK_KERNEL(kernel); err = cl_kernel_set_arg_svm_pointer(kernel, 
arg_index, arg_value); error: return err; } cl_int clSetKernelExecInfo(cl_kernel kernel, cl_kernel_exec_info param_name, size_t param_value_size, const void *param_value) { cl_int err = CL_SUCCESS; CHECK_KERNEL(kernel); if((param_name != CL_KERNEL_EXEC_INFO_SVM_PTRS && param_name != CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM) || param_value == NULL || param_value_size == 0) { err = CL_INVALID_VALUE; goto error; } if(param_name == CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM && *(cl_bool *)param_value == CL_TRUE) { err = CL_INVALID_OPERATION; goto error; } err = cl_kernel_set_exec_info(kernel, param_value_size, param_value); error: return err; } cl_int clGetKernelArgInfo(cl_kernel kernel, cl_uint arg_index, cl_kernel_arg_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { cl_int err = CL_SUCCESS; CHECK_KERNEL(kernel); if(kernel->program->build_opts == NULL || strstr(kernel->program->build_opts,"-cl-kernel-arg-info") == NULL ) { err = CL_KERNEL_ARG_INFO_NOT_AVAILABLE; goto error; } if (param_name != CL_KERNEL_ARG_ADDRESS_QUALIFIER && param_name != CL_KERNEL_ARG_ACCESS_QUALIFIER && param_name != CL_KERNEL_ARG_TYPE_NAME && param_name != CL_KERNEL_ARG_TYPE_QUALIFIER && param_name != CL_KERNEL_ARG_NAME) { err = CL_INVALID_VALUE; goto error; } if (arg_index >= kernel->arg_n) { err = CL_INVALID_ARG_INDEX; goto error; } err = cl_get_kernel_arg_info(kernel, arg_index, param_name, param_value_size, param_value, param_value_size_ret); error: return err; } cl_int clGetKernelWorkGroupInfo(cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { return cl_get_kernel_workgroup_info(kernel, device, param_name, param_value_size, param_value, param_value_size_ret); } cl_int clGetKernelSubGroupInfoKHR(cl_kernel kernel, cl_device_id device, cl_kernel_work_group_info param_name, size_t input_value_size, const void * input_value, size_t param_value_size, void * param_value, size_t * param_value_size_ret) { return cl_get_kernel_subgroup_info(kernel, device, param_name, input_value_size, input_value, param_value_size, param_value, param_value_size_ret); } cl_int clRetainEvent(cl_event event) { cl_int err = CL_SUCCESS; CHECK_EVENT(event); cl_event_add_ref(event); error: return err; } cl_int clReleaseEvent(cl_event event) { cl_int err = CL_SUCCESS; CHECK_EVENT(event); cl_event_delete(event); error: return err; } cl_mem clCreatePipe (cl_context context, cl_mem_flags flags, cl_uint pipe_packet_size, cl_uint pipe_max_packets, const cl_pipe_properties *properties, cl_int *errcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; cl_uint device_max_size = 0; CHECK_CONTEXT (context); if(UNLIKELY((flags & ~(CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS)) != 0)) { err = CL_INVALID_VALUE; goto error; } if(UNLIKELY(properties != NULL)) { err = CL_INVALID_VALUE; goto error; } if(UNLIKELY(pipe_packet_size == 0 || pipe_max_packets == 0)) { err = CL_INVALID_PIPE_SIZE; goto error; } if ((err = cl_get_device_info(context->devices[0], CL_DEVICE_PIPE_MAX_PACKET_SIZE, sizeof(device_max_size), &device_max_size, NULL)) != CL_SUCCESS) { goto error; } if(UNLIKELY(pipe_packet_size > device_max_size)) { err = CL_INVALID_PIPE_SIZE; goto error; } if(flags == 0) flags = CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS; mem = cl_mem_new_pipe(context, flags, pipe_packet_size, pipe_max_packets, &err); error: if (errcode_ret) *errcode_ret = err; return mem; } cl_int clGetPipeInfo (cl_mem pipe, cl_pipe_info param_name, size_t 
param_value_size, void *param_value, size_t *param_value_size_ret) { cl_int err = CL_SUCCESS; CHECK_MEM(pipe); err = cl_get_pipe_info(pipe, param_name, param_value_size, param_value, param_value_size_ret); error: return err; } #define EXTFUNC(x) \ if (strcmp(#x, func_name) == 0) \ return (void *)x; static void* internal_clGetExtensionFunctionAddress(const char *func_name) { if (func_name == NULL) return NULL; #ifdef HAS_OCLIcd /* cl_khr_icd */ EXTFUNC(clIcdGetPlatformIDsKHR) #endif EXTFUNC(clCreateProgramWithLLVMIntel) EXTFUNC(clGetGenVersionIntel) EXTFUNC(clMapBufferIntel) EXTFUNC(clUnmapBufferIntel) EXTFUNC(clMapBufferGTTIntel) EXTFUNC(clUnmapBufferGTTIntel) EXTFUNC(clPinBufferIntel) EXTFUNC(clUnpinBufferIntel) EXTFUNC(clReportUnfreedIntel) EXTFUNC(clCreateBufferFromLibvaIntel) EXTFUNC(clCreateImageFromLibvaIntel) EXTFUNC(clGetMemObjectFdIntel) EXTFUNC(clCreateBufferFromFdINTEL) EXTFUNC(clCreateImageFromFdINTEL) EXTFUNC(clCreateAcceleratorINTEL) EXTFUNC(clRetainAcceleratorINTEL) EXTFUNC(clReleaseAcceleratorINTEL) EXTFUNC(clGetAcceleratorInfoINTEL) EXTFUNC(clGetKernelSubGroupInfoKHR) return NULL; } void* clGetExtensionFunctionAddress(const char *func_name) { return internal_clGetExtensionFunctionAddress(func_name); } void* clGetExtensionFunctionAddressForPlatform(cl_platform_id platform, const char *func_name) { if (UNLIKELY(platform != NULL && platform != cl_get_platform_default())) return NULL; return internal_clGetExtensionFunctionAddress(func_name); } #undef EXTFUNC cl_int clReportUnfreedIntel(void) { return cl_report_unfreed(); } void* clMapBufferIntel(cl_mem mem, cl_int *errcode_ret) { void *ptr = NULL; cl_int err = CL_SUCCESS; CHECK_MEM (mem); ptr = cl_mem_map(mem, 1); error: if (errcode_ret) *errcode_ret = err; return ptr; } cl_int clUnmapBufferIntel(cl_mem mem) { cl_int err = CL_SUCCESS; CHECK_MEM (mem); err = cl_mem_unmap(mem); error: return err; } void* clMapBufferGTTIntel(cl_mem mem, cl_int *errcode_ret) { void *ptr = NULL; cl_int err = CL_SUCCESS; CHECK_MEM (mem); ptr = cl_mem_map_gtt(mem); error: if (errcode_ret) *errcode_ret = err; return ptr; } cl_int clUnmapBufferGTTIntel(cl_mem mem) { cl_int err = CL_SUCCESS; CHECK_MEM (mem); err = cl_mem_unmap_gtt(mem); error: return err; } cl_int clPinBufferIntel(cl_mem mem) { cl_int err = CL_SUCCESS; CHECK_MEM (mem); cl_mem_pin(mem); error: return err; } cl_int clUnpinBufferIntel(cl_mem mem) { cl_int err = CL_SUCCESS; CHECK_MEM (mem); cl_mem_unpin(mem); error: return err; } cl_int clGetGenVersionIntel(cl_device_id device, cl_int *ver) { return cl_device_get_version(device, ver); } cl_program clCreateProgramWithLLVMIntel(cl_context context, cl_uint num_devices, const cl_device_id * devices, const char * filename, cl_int * errcode_ret) { return cl_program_create_from_llvm(context, num_devices, devices, filename, errcode_ret); } cl_mem clCreateBufferFromLibvaIntel(cl_context context, unsigned int bo_name, cl_int *errorcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); mem = cl_mem_new_libva_buffer(context, bo_name, &err); error: if (errorcode_ret) *errorcode_ret = err; return mem; } cl_mem clCreateImageFromLibvaIntel(cl_context context, const cl_libva_image *info, cl_int *errorcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); if (!info) { err = CL_INVALID_VALUE; goto error; } mem = cl_mem_new_libva_image(context, info->bo_name, info->offset, info->width, info->height, info->fmt, info->row_pitch, &err); error: if (errorcode_ret) *errorcode_ret = err; return mem; } extern 
CL_API_ENTRY cl_int CL_API_CALL clGetMemObjectFdIntel(cl_context context, cl_mem memobj, int* fd) { cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); CHECK_MEM (memobj); err = cl_mem_get_fd(memobj, fd); error: return err; } cl_mem clCreateBufferFromFdINTEL(cl_context context, const cl_import_buffer_info_intel* info, cl_int *errorcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); if (!info) { err = CL_INVALID_VALUE; goto error; } mem = cl_mem_new_buffer_from_fd(context, info->fd, info->size, &err); error: if (errorcode_ret) *errorcode_ret = err; return mem; } cl_mem clCreateImageFromFdINTEL(cl_context context, const cl_import_image_info_intel* info, cl_int *errorcode_ret) { cl_mem mem = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT (context); if (!info) { err = CL_INVALID_VALUE; goto error; } /* Create image object from fd. * We just support creating CL_MEM_OBJECT_IMAGE2D image object now. * Other image type will be supported later if necessary. */ if(info->type == CL_MEM_OBJECT_IMAGE2D){ mem = cl_mem_new_image_from_fd(context, info->fd, info->size, info->offset, info->width, info->height, info->fmt, info->row_pitch, &err); } else{ err = CL_INVALID_ARG_VALUE; goto error; } error: if (errorcode_ret) *errorcode_ret = err; return mem; } cl_accelerator_intel clCreateAcceleratorINTEL(cl_context context, cl_accelerator_type_intel accel_type, size_t desc_sz, const void* desc, cl_int* errcode_ret) { cl_accelerator_intel accel = NULL; cl_int err = CL_SUCCESS; CHECK_CONTEXT(context); accel = cl_accelerator_intel_new(context, accel_type, desc_sz, desc, &err); error: if (errcode_ret) *errcode_ret = err; return accel; } cl_int clRetainAcceleratorINTEL(cl_accelerator_intel accel) { cl_int err = CL_SUCCESS; CHECK_ACCELERATOR_INTEL(accel); cl_accelerator_intel_add_ref(accel); error: return err; } cl_int clReleaseAcceleratorINTEL(cl_accelerator_intel accel) { cl_int err = CL_SUCCESS; CHECK_ACCELERATOR_INTEL(accel); cl_accelerator_intel_delete(accel); error: return err; } cl_int clGetAcceleratorInfoINTEL(cl_accelerator_intel accel, cl_accelerator_info_intel param_name, size_t param_value_size, void* param_value, size_t* param_value_size_ret) { cl_int err = CL_SUCCESS; CHECK_ACCELERATOR_INTEL(accel); if (param_name == CL_ACCELERATOR_REFERENCE_COUNT_INTEL) { cl_uint ref = CL_OBJECT_GET_REF(accel); FILL_GETINFO_RET (cl_uint, 1, &ref, CL_SUCCESS); } else if (param_name == CL_ACCELERATOR_CONTEXT_INTEL) { FILL_GETINFO_RET (cl_context, 1, &accel->ctx, CL_SUCCESS); } else if (param_name == CL_ACCELERATOR_TYPE_INTEL) { FILL_GETINFO_RET (cl_uint, 1, &accel->type, CL_SUCCESS); } else if (param_name == CL_ACCELERATOR_DESCRIPTOR_INTEL) { FILL_GETINFO_RET (cl_motion_estimation_desc_intel, 1, &(accel->desc.me), CL_SUCCESS); } else{ return CL_INVALID_VALUE; } error: return err; } Beignet-1.3.2-Source/src/cl_gen7_device.h000664 001750 001750 00000002464 13161142102 017252 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. 
* * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* Common fields for both IVB devices (either GT1 or GT2) */ .max_parameter_size = 1024, .global_mem_cache_line_size = 64, /* XXX */ .global_mem_cache_size = 8 << 10, /* XXX */ .local_mem_type = CL_LOCAL, .local_mem_size = 64 << 10, .scratch_mem_size = 12 << 10, .max_mem_alloc_size = 2 * 1024 * 1024 * 1024ul, .global_mem_size = 2 * 1024 * 1024 * 1024ul, //temporarily define to only export builtin kernel block_motion_estimate_intel only for Gen7 //will remove after HSW+ also support #define GEN7_DEVICE #include "cl_gt_device.h" #undef GEN7_DEVICE Beignet-1.3.2-Source/src/cl_device_enqueue.c000664 001750 001750 00000014715 13161142102 020056 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Rong Yang */ #include "cl_device_enqueue.h" #include "cl_mem.h" #include "cl_utils.h" #include "cl_context.h" #include "cl_program.h" #include "cl_alloc.h" #include "cl_kernel.h" #include "cl_command_queue.h" #include "cl_event.h" LOCAL cl_int cl_device_enqueue_fix_offset(cl_kernel ker) { uint32_t i; void *ptr; cl_mem mem; enum gbe_arg_type arg_type; /* kind of argument */ for (i = 0; i < ker->arg_n; ++i) { arg_type = interp_kernel_get_arg_type(ker->opaque, i); //HOW about image if (!(arg_type == GBE_ARG_GLOBAL_PTR || arg_type == GBE_ARG_CONSTANT_PTR) || !ker->args[i].mem) continue; if(!ker->args[i].is_svm) { mem = ker->args[i].mem; ptr = cl_mem_map(mem, 0); cl_buffer_set_softpin_offset(mem->bo, (size_t)ptr); cl_buffer_set_bo_use_full_range(mem->bo, 1); cl_buffer_disable_reuse(mem->bo); mem->host_ptr = ptr; cl_mem_unmap(mem); ker->device_enqueue_infos[ker->device_enqueue_info_n++] = ptr; } else { ker->device_enqueue_infos[ker->device_enqueue_info_n++] = ker->args[i].mem->host_ptr; } } return 0; } LOCAL cl_int cl_device_enqueue_bind_buffer(cl_gpgpu gpgpu, cl_kernel ker, uint32_t *max_bti, cl_gpgpu_kernel *kernel) { int32_t value = GBE_CURBE_ENQUEUE_BUF_POINTER; int32_t offset = interp_kernel_get_curbe_offset(ker->opaque, value, 0); size_t buf_size = 32 * 1024 * 1024; //fix 32M cl_mem mem; if(offset > 0) { if(ker->useDeviceEnqueue == false) { if(ker->device_enqueue_ptr == NULL) ker->device_enqueue_ptr = cl_mem_svm_allocate(ker->program->ctx, 0, buf_size, 0); if(ker->device_enqueue_infos == NULL) ker->device_enqueue_infos = cl_calloc(ker->arg_n, sizeof(void *)); ker->device_enqueue_info_n = 0; ker->useDeviceEnqueue = CL_TRUE; cl_device_enqueue_fix_offset(ker); cl_kernel_add_ref(ker); } mem = cl_context_get_svm_from_ptr(ker->program->ctx, ker->device_enqueue_ptr); assert(mem); cl_gpgpu_bind_buf(gpgpu, mem->bo, offset, 0, buf_size, *max_bti); cl_gpgpu_set_kernel(gpgpu, ker); } return 0; } typedef struct ndrange_info_t { int type; int global_work_size[3]; int local_work_size[3]; int global_work_offset[3]; } 
ndrange_info_t;

typedef struct Block_literal {
  void *isa; // initialized to &_NSConcreteStackBlock or &_NSConcreteGlobalBlock
  int flags;
  int reserved;
  int index;
  struct Block_descriptor_1 {
    unsigned long int slm_size; // NULL
    unsigned long int size;     // sizeof(struct Block_literal_1)
    // optional helper functions
    void *copy_helper;    // IFF (1<<25)
    void *dispose_helper; // IFF (1<<25)
    // required ABI.2010.3.16
    const char *signature; // IFF (1<<30)
  } *descriptor;
  // imported variables
} Block_literal;

LOCAL cl_int
cl_device_enqueue_parse_result(cl_command_queue queue, cl_gpgpu gpgpu)
{
  cl_mem mem;
  int size, type, dim, i;
  const char * kernel_name;
  cl_kernel child_ker;
  cl_event evt = NULL;

  cl_kernel ker = cl_gpgpu_get_kernel(gpgpu);
  if(ker == NULL || ker->useDeviceEnqueue == CL_FALSE)
    return 0;

  void *buf = cl_gpgpu_ref_batch_buf(gpgpu);
  //wait for the gpgpu's batch buffer to finish; the gpgpu in the queue may not
  //be the same as the param gpgpu, for example when flushing an event.
  cl_gpgpu_sync(buf);
  cl_gpgpu_unref_batch_buf(buf);

  mem = cl_context_get_svm_from_ptr(ker->program->ctx, ker->device_enqueue_ptr);
  if(mem == NULL)
    return -1;

  char *ptr = (char *)cl_mem_map(mem, 0);
  size = *(int *)ptr;
  ptr += 4;
  while(size > 0) {
    size_t fixed_global_off[] = {0,0,0};
    size_t fixed_global_sz[] = {1,1,1};
    size_t fixed_local_sz[] = {1,1,1};
    ndrange_info_t* ndrange_info = (ndrange_info_t *)ptr;
    size -= sizeof(ndrange_info_t);
    ptr += sizeof(ndrange_info_t);

    Block_literal *block = (Block_literal *)ptr;
    size -= block->descriptor->size;
    ptr += block->descriptor->size;

    type = ndrange_info->type;
    dim = (type & 0xf0) >> 4;
    type = type & 0xf;
    assert(dim <= 2);
    for(i = 0; i <= dim; i++) {
      fixed_global_sz[i] = ndrange_info->global_work_size[i];
      if(type > 1)
        fixed_local_sz[i] = ndrange_info->local_work_size[i];
      if(type > 2)
        fixed_global_off[i] = ndrange_info->global_work_offset[i];
    }

    int *slm_sizes = (int *)ptr;
    int slm_size = block->descriptor->slm_size;
    size -= slm_size;
    ptr += slm_size;

    kernel_name = interp_program_get_device_enqueue_kernel_name(ker->program->opaque, block->index);
    child_ker = cl_program_create_kernel(ker->program, kernel_name, NULL);
    assert(child_ker);
    cl_kernel_set_arg_svm_pointer(child_ker, 0, block);
    int index = 1;
    for(i = 0; i < slm_size/sizeof(int); i++, index++) {
      cl_kernel_set_arg(child_ker, index, slm_sizes[i], NULL);
    }
    cl_kernel_set_exec_info(child_ker, ker->device_enqueue_info_n * sizeof(void *),
                            ker->device_enqueue_infos);

    if (evt != NULL) {
      clReleaseEvent(evt);
      evt = NULL;
    }
    clEnqueueNDRangeKernel(queue, child_ker, dim + 1, fixed_global_off,
                           fixed_global_sz, fixed_local_sz, 0, NULL, &evt);
    cl_command_queue_flush_gpgpu(gpgpu);
    cl_kernel_delete(child_ker);
  }

  if (evt != NULL) {
    //Can't call clWaitForEvents here, it may cause deadlock.
    //If evt->exec_data.gpgpu is NULL, evt has finished.
    if (evt->exec_data.gpgpu) {
      buf = cl_gpgpu_ref_batch_buf(evt->exec_data.gpgpu);
      //wait for the gpgpu's batch buffer to finish; the gpgpu in the queue may
      //not be the same as the param gpgpu, for example when flushing an event.
      cl_gpgpu_sync(buf);
      cl_gpgpu_unref_batch_buf(buf);
    }
    clReleaseEvent(evt);
    evt = NULL;
  }
  cl_mem_unmap_auto(mem);
  cl_kernel_delete(ker);
  return 0;
}
Beignet-1.3.2-Source/src/cl_driver_type.h000664 001750 001750 00000001702 13161142102 017421 0ustar00yryr000000 000000 /**************************************************************************
 * cl_driver:
 * Hide behind some callbacks the buffer allocation / deallocation ...
 * This will allow us to make the use of a software performance simulator
 * easier and to minimize the code specific for the HW and for the simulator
 **************************************************************************/
#ifndef __CL_DRIVER_TYPE_H__
#define __CL_DRIVER_TYPE_H__

/* Encapsulates command buffer / data buffer / kernels */
typedef struct _cl_buffer *cl_buffer;

/* Encapsulates buffer manager */
typedef struct _cl_buffer_mgr *cl_buffer_mgr;

/* Encapsulates the driver backend functionalities */
typedef struct _cl_driver *cl_driver;

/* Encapsulates the gpgpu stream of commands */
typedef struct _cl_gpgpu *cl_gpgpu;

/* Encapsulates the event of a command stream */
typedef struct _cl_gpgpu_event *cl_gpgpu_event;

typedef struct _cl_context_prop *cl_context_prop;
#endif
Beignet-1.3.2-Source/src/cl_base_object.c000664 001750 001750 00000006612 13161142102 017325 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 */
#include <stdio.h>
#include "cl_base_object.h"

static pthread_t invalid_thread_id = -1;

LOCAL void
cl_object_init_base(cl_base_object obj, cl_ulong magic)
{
  obj->magic = magic;
  obj->ref = 1;
  SET_ICD(obj->dispatch);
  pthread_mutex_init(&obj->mutex, NULL);
  pthread_cond_init(&obj->cond, NULL);
  obj->owner = invalid_thread_id;
  list_node_init(&obj->node);
}

LOCAL void
cl_object_destroy_base(cl_base_object obj)
{
  int ref = CL_OBJECT_GET_REF(obj);
  if (ref != 0) {
    DEBUGP(DL_ERROR, "CL object %p, call destroy with a reference %d", obj, ref);
    assert(0);
  }

  if (!CL_OBJECT_IS_VALID(obj)) {
    DEBUGP(DL_ERROR, "CL object %p, call destroy while it is already a dead object", obj);
    assert(0);
  }

  if (obj->owner != invalid_thread_id) {
    DEBUGP(DL_ERROR, "CL object %p, call destroy while it still has an owner %d",
           obj, (int)obj->owner);
    assert(0);
  }

  if (!list_node_out_of_list(&obj->node)) {
    DEBUGP(DL_ERROR, "CL object %p, call destroy while it still belongs to some object %p",
           obj, obj->node.p);
    assert(0);
  }

  obj->magic = CL_OBJECT_INVALID_MAGIC;
  pthread_mutex_destroy(&obj->mutex);
  pthread_cond_destroy(&obj->cond);
}

LOCAL cl_int
cl_object_take_ownership(cl_base_object obj, cl_int wait, cl_bool withlock)
{
  pthread_t self;

  assert(CL_OBJECT_IS_VALID(obj));
  self = pthread_self();

  if (withlock == CL_FALSE)
    pthread_mutex_lock(&obj->mutex);

  if (pthread_equal(obj->owner, self)) { // already owned by this thread
    if (withlock == CL_FALSE)
      pthread_mutex_unlock(&obj->mutex);
    return 1;
  }

  if (pthread_equal(obj->owner, invalid_thread_id)) {
    obj->owner = self;
    if (withlock == CL_FALSE)
      pthread_mutex_unlock(&obj->mutex);
    return 1;
  }

  if (wait == 0) {
    if (withlock == CL_FALSE)
      pthread_mutex_unlock(&obj->mutex);
    return 0;
  }

  while (!pthread_equal(obj->owner, invalid_thread_id)) {
    pthread_cond_wait(&obj->cond, &obj->mutex);
  }
  obj->owner = self;
  if (withlock == CL_FALSE)
    pthread_mutex_unlock(&obj->mutex);
  return 1;
}

LOCAL void
cl_object_release_ownership(cl_base_object obj, cl_bool withlock)
{
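/* Hand the object back: clear the owner under the object mutex and broadcast
   the condition so every thread blocked in cl_object_take_ownership() gets a
   chance to re-check and claim ownership. */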
assert(CL_OBJECT_IS_VALID(obj)); if (withlock == CL_FALSE) pthread_mutex_lock(&obj->mutex); assert(pthread_equal(pthread_self(), obj->owner)); obj->owner = invalid_thread_id; pthread_cond_broadcast(&obj->cond); if (withlock == CL_FALSE) pthread_mutex_unlock(&obj->mutex); } LOCAL void cl_object_wait_on_cond(cl_base_object obj) { assert(CL_OBJECT_IS_VALID(obj)); pthread_cond_wait(&obj->cond, &obj->mutex); } LOCAL void cl_object_notify_cond(cl_base_object obj) { assert(CL_OBJECT_IS_VALID(obj)); pthread_cond_broadcast(&obj->cond); } Beignet-1.3.2-Source/src/cl_api_context.c000664 001750 001750 00000011276 13161142102 017404 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "cl_context.h" #include "cl_device_id.h" #include "cl_alloc.h" cl_context clCreateContext(const cl_context_properties *properties, cl_uint num_devices, const cl_device_id *devices, void (*pfn_notify)(const char *, const void *, size_t, void *), void *user_data, cl_int *errcode_ret) { cl_int err = CL_SUCCESS; cl_context context = NULL; do { /* Assure parameters correctness */ if (devices == NULL) { err = CL_INVALID_VALUE; break; } if (num_devices == 0) { err = CL_INVALID_VALUE; break; } if (pfn_notify == NULL && user_data != NULL) { err = CL_INVALID_VALUE; break; } err = cl_devices_list_check(num_devices, devices); if (err != CL_SUCCESS) break; context = cl_create_context(properties, num_devices, devices, pfn_notify, user_data, &err); } while (0); if (errcode_ret) *errcode_ret = err; return context; } cl_context clCreateContextFromType(const cl_context_properties *properties, cl_device_type device_type, void(CL_CALLBACK *pfn_notify)(const char *, const void *, size_t, void *), void *user_data, cl_int *errcode_ret) { cl_context context = NULL; cl_int err = CL_SUCCESS; cl_device_id *devices = NULL; cl_uint num_devices = 0; const cl_device_type valid_type = CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_CPU | CL_DEVICE_TYPE_ACCELERATOR | CL_DEVICE_TYPE_DEFAULT | CL_DEVICE_TYPE_CUSTOM; do { /* Assure parameters correctness */ if (pfn_notify == NULL && user_data != NULL) { err = CL_INVALID_VALUE; break; } if ((device_type & valid_type) == 0) { err = CL_INVALID_DEVICE_TYPE; break; } /* Get the devices num first. 
*/ err = cl_get_device_ids(NULL, device_type, 0, NULL, &num_devices); if (err != CL_SUCCESS) break; assert(num_devices > 0); devices = cl_malloc(num_devices * sizeof(cl_device_id)); err = cl_get_device_ids(NULL, device_type, num_devices, &devices[0], &num_devices); if (err != CL_SUCCESS) break; context = cl_create_context(properties, num_devices, devices, pfn_notify, user_data, &err); } while (0); if (devices) cl_free(devices); if (errcode_ret) *errcode_ret = err; return context; } cl_int clRetainContext(cl_context context) { if (!CL_OBJECT_IS_CONTEXT(context)) { return CL_INVALID_CONTEXT; } cl_context_add_ref(context); return CL_SUCCESS; } cl_int clReleaseContext(cl_context context) { if (!CL_OBJECT_IS_CONTEXT(context)) { return CL_INVALID_CONTEXT; } cl_context_delete(context); return CL_SUCCESS; } cl_int clGetContextInfo(cl_context context, cl_context_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { const void *src_ptr = NULL; size_t src_size = 0; cl_uint n, ref; cl_context_properties p; if (!CL_OBJECT_IS_CONTEXT(context)) { return CL_INVALID_CONTEXT; } if (param_name == CL_CONTEXT_DEVICES) { src_ptr = context->devices; src_size = sizeof(cl_device_id) * context->device_num; } else if (param_name == CL_CONTEXT_NUM_DEVICES) { n = context->device_num; src_ptr = &n; src_size = sizeof(cl_uint); } else if (param_name == CL_CONTEXT_REFERENCE_COUNT) { ref = CL_OBJECT_GET_REF(context); src_ptr = &ref; src_size = sizeof(cl_uint); } else if (param_name == CL_CONTEXT_PROPERTIES) { if (context->prop_len > 0) { src_ptr = context->prop_user; src_size = sizeof(cl_context_properties) * context->prop_len; } else { p = 0; src_ptr = &p; src_size = sizeof(cl_context_properties); } } else { return CL_INVALID_VALUE; } return cl_get_info_helper(src_ptr, src_size, param_value, param_value_size, param_value_size_ret); } Beignet-1.3.2-Source/src/cl_mem.c000664 001750 001750 00000254026 13161142102 015647 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
 *
 * Author: Benjamin Segovia
 */
#include "cl_mem.h"
#include "cl_image.h"
#include "cl_context.h"
#include "cl_utils.h"
#include "cl_alloc.h"
#include "cl_device_id.h"
#include "cl_driver.h"
#include "cl_khr_icd.h"
#include "cl_kernel.h"
#include "cl_command_queue.h"
#include "cl_cmrt.h"
#include "cl_enqueue.h"
#include "CL/cl.h"
#include "CL/cl_intel.h"
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <math.h>

#define FIELD_SIZE(CASE,TYPE)               \
  case JOIN(CL_,CASE):                      \
    if(param_value_size_ret)                \
      *param_value_size_ret = sizeof(TYPE); \
    if(!param_value)                        \
      return CL_SUCCESS;                    \
    if(param_value_size < sizeof(TYPE))     \
      return CL_INVALID_VALUE;              \
    break;

#define MAX_TILING_SIZE (128 * MB)

LOCAL cl_mem_object_type
cl_get_mem_object_type(cl_mem mem)
{
  switch (mem->type) {
  case CL_MEM_BUFFER_TYPE:
  case CL_MEM_SUBBUFFER_TYPE:
    return CL_MEM_OBJECT_BUFFER;
  case CL_MEM_IMAGE_TYPE:
  case CL_MEM_GL_IMAGE_TYPE: {
    struct _cl_mem_image *image = cl_mem_image(mem);
    return image->image_type;
  }
  default:
    return CL_MEM_OBJECT_BUFFER;
  }
}

LOCAL cl_int
cl_get_pipe_info(cl_mem mem, cl_mem_info param_name, size_t param_value_size,
                 void *param_value, size_t *param_value_size_ret)
{
  _cl_mem_pipe *pipe;
  switch(param_name) {
    FIELD_SIZE(PIPE_PACKET_SIZE, cl_uint);
    FIELD_SIZE(PIPE_MAX_PACKETS, cl_uint);
  default:
    return CL_INVALID_VALUE;
  }

  if(mem->type != CL_MEM_PIPE_TYPE)
    return CL_INVALID_MEM_OBJECT;
  pipe = cl_mem_pipe(mem);

  switch(param_name) {
  case CL_PIPE_PACKET_SIZE:
    *((cl_uint *)param_value) = pipe->packet_size;
    break;
  case CL_PIPE_MAX_PACKETS:
    *((cl_uint *)param_value) = pipe->max_packets;
    break;
  }

  return CL_SUCCESS;
}

LOCAL cl_mem
cl_mem_allocate(enum cl_mem_type type,
                cl_context ctx,
                cl_mem_flags flags,
                size_t sz,
                cl_int is_tiled,
                void *host_ptr, //pointer from application
                cl_mem buffer,  //image2D from buffer
                cl_int *errcode)
{
  cl_buffer_mgr bufmgr = NULL;
  cl_mem mem = NULL;
  cl_int err = CL_SUCCESS;
  size_t alignment = 64;

  assert(ctx);

  /* Allocate and initialize the structure itself */
  if (type == CL_MEM_IMAGE_TYPE) {
    struct _cl_mem_image *image = NULL;
    TRY_ALLOC (image, CALLOC(struct _cl_mem_image));
    mem = &image->base;
  } else if (type == CL_MEM_GL_IMAGE_TYPE ) {
    struct _cl_mem_gl_image *gl_image = NULL;
    TRY_ALLOC (gl_image, CALLOC(struct _cl_mem_gl_image));
    mem = &gl_image->base.base;
  } else if (type == CL_MEM_BUFFER1D_IMAGE_TYPE) {
    struct _cl_mem_buffer1d_image *buffer1d_image = NULL;
    TRY_ALLOC(buffer1d_image, CALLOC(struct _cl_mem_buffer1d_image));
    mem = &buffer1d_image->base.base;
  } else if (type == CL_MEM_PIPE_TYPE) {
    _cl_mem_pipe *pipe = NULL;
    TRY_ALLOC(pipe, CALLOC(struct _cl_mem_pipe));
    mem = &pipe->base;
  } else {
    struct _cl_mem_buffer *buffer = NULL;
    TRY_ALLOC (buffer, CALLOC(struct _cl_mem_buffer));
    mem = &buffer->base;
  }

  CL_OBJECT_INIT_BASE(mem, CL_OBJECT_MEM_MAGIC);
  list_init(&mem->dstr_cb_head);
  mem->type = type;
  mem->flags = flags;
  mem->is_userptr = 0;
  mem->offset = 0;
  mem->is_svm = 0;
  mem->cmrt_mem = NULL;
  if (mem->type == CL_MEM_IMAGE_TYPE) {
    cl_mem_image(mem)->is_image_from_buffer = 0;
  }

  if (sz != 0) {
    /* Pinning will require stricter alignment rules */
    if ((flags & CL_MEM_PINNABLE) || is_tiled)
      alignment = 4096;

    /* Allocate space in memory */
    bufmgr = cl_context_get_bufmgr(ctx);
    assert(bufmgr);

#ifdef HAS_USERPTR
    uint8_t bufCreated = 0;
    if (ctx->devices[0]->host_unified_memory) {
      int page_size = getpagesize();
      int cacheline_size = 0;
      cl_get_device_info(ctx->devices[0], CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE,
                         sizeof(cacheline_size), &cacheline_size, NULL);

      if (type == CL_MEM_BUFFER_TYPE) {
        if (flags & CL_MEM_USE_HOST_PTR) {
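          /* Zero-copy attempt for CL_MEM_USE_HOST_PTR on devices with unified
             host memory: when the pointer and size meet the cache-line and
             page-alignment constraints checked below, the application's pages
             are wrapped in a userptr bo instead of being copied. */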
assert(host_ptr != NULL); cl_mem svm_mem = NULL; if((svm_mem = cl_context_get_svm_from_ptr(ctx, host_ptr)) != NULL) mem->is_svm = 1; /* userptr not support tiling */ if (!is_tiled) { if(svm_mem != NULL) { //SVM always paged alignment mem->offset = 0; mem->is_userptr = 1; mem->bo = svm_mem->bo; cl_mem_add_ref(svm_mem); bufCreated = 1; } else if ((ALIGN((unsigned long)host_ptr, cacheline_size) == (unsigned long)host_ptr) && (ALIGN((unsigned long)sz, cacheline_size) == (unsigned long)sz)) { void* aligned_host_ptr = (void*)(((unsigned long)host_ptr) & (~(page_size - 1))); mem->offset = host_ptr - aligned_host_ptr; mem->is_userptr = 1; size_t aligned_sz = ALIGN((mem->offset + sz), page_size); mem->bo = cl_buffer_alloc_userptr(bufmgr, "CL userptr memory object", aligned_host_ptr, aligned_sz, 0); bufCreated = 1; } } } else if (flags & CL_MEM_ALLOC_HOST_PTR) { const size_t alignedSZ = ALIGN(sz, page_size); void* internal_host_ptr = cl_aligned_malloc(alignedSZ, page_size); mem->host_ptr = internal_host_ptr; mem->is_userptr = 1; mem->bo = cl_buffer_alloc_userptr(bufmgr, "CL userptr memory object", internal_host_ptr, alignedSZ, 0); bufCreated = 1; } } else if (type == CL_MEM_IMAGE_TYPE) { if (host_ptr != NULL) { assert(flags & CL_MEM_USE_HOST_PTR); assert(!is_tiled); assert(ALIGN((unsigned long)host_ptr, cacheline_size) == (unsigned long)host_ptr); void* aligned_host_ptr = (void*)(((unsigned long)host_ptr) & (~(page_size - 1))); mem->offset = host_ptr - aligned_host_ptr; mem->is_userptr = 1; size_t aligned_sz = ALIGN((mem->offset + sz), page_size); mem->bo = cl_buffer_alloc_userptr(bufmgr, "CL userptr memory object", aligned_host_ptr, aligned_sz, 0); bufCreated = 1; } } } if(type == CL_MEM_IMAGE_TYPE && buffer != NULL) { // if create image from USE_HOST_PTR buffer, the buffer's base address need be aligned. if(buffer->is_userptr) { int base_alignement = 0; cl_get_device_info(ctx->devices[0], CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, sizeof(base_alignement), &base_alignement, NULL); if(ALIGN((unsigned long)buffer->host_ptr, base_alignement) != (unsigned long)buffer->host_ptr) { err = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; goto error; } } // if the image if created from buffer, should use the bo directly to share same bo. mem->bo = buffer->bo; cl_mem_image(mem)->is_image_from_buffer = 1; bufCreated = 1; } if (!bufCreated) mem->bo = cl_buffer_alloc(bufmgr, "CL memory object", sz, alignment); #else if(type == CL_MEM_IMAGE_TYPE && buffer != NULL) { // if the image if created from buffer, should use the bo directly to share same bo. 
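    /* Sharing the buffer's bo: the image aliases the buffer's storage, so no
       copy is made and writes through either object land in the same memory. */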
mem->bo = buffer->bo; cl_mem_image(mem)->is_image_from_buffer = 1; } else mem->bo = cl_buffer_alloc(bufmgr, "CL memory object", sz, alignment); #endif if (UNLIKELY(mem->bo == NULL)) { err = CL_MEM_OBJECT_ALLOCATION_FAILURE; goto error; } mem->size = sz; } /* Append the buffer in the context buffer list */ cl_context_add_mem(ctx, mem); exit: if (errcode) *errcode = err; return mem; error: cl_mem_delete(mem); mem = NULL; goto exit; } LOCAL cl_int cl_mem_is_valid(cl_mem mem, cl_context ctx) { struct list_node *pos; cl_base_object pbase_object; CL_OBJECT_LOCK(ctx); list_for_each (pos, (&ctx->mem_objects)) { pbase_object = list_entry(pos, _cl_base_object, node); if (pbase_object == (cl_base_object)mem) { if (UNLIKELY(!CL_OBJECT_IS_MEM(mem))) { CL_OBJECT_UNLOCK(ctx); return CL_INVALID_MEM_OBJECT; } CL_OBJECT_UNLOCK(ctx); return CL_SUCCESS; } } CL_OBJECT_UNLOCK(ctx); return CL_INVALID_MEM_OBJECT; } LOCAL cl_mem cl_mem_new_buffer(cl_context ctx, cl_mem_flags flags, size_t sz, void *data, cl_int *errcode_ret) { /* Possible mem type combination: CL_MEM_ALLOC_HOST_PTR CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR CL_MEM_USE_HOST_PTR CL_MEM_COPY_HOST_PTR */ cl_int err = CL_SUCCESS; cl_mem mem = NULL; cl_ulong max_mem_size; if (UNLIKELY(sz == 0)) { err = CL_INVALID_BUFFER_SIZE; goto error; } if (UNLIKELY(((flags & CL_MEM_READ_WRITE) && (flags & (CL_MEM_READ_ONLY | CL_MEM_WRITE_ONLY))) || ((flags & CL_MEM_READ_ONLY) && (flags & (CL_MEM_WRITE_ONLY))) || ((flags & CL_MEM_ALLOC_HOST_PTR) && (flags & CL_MEM_USE_HOST_PTR)) || ((flags & CL_MEM_COPY_HOST_PTR) && (flags & CL_MEM_USE_HOST_PTR)) || ((flags & CL_MEM_HOST_READ_ONLY) && (flags & CL_MEM_HOST_NO_ACCESS)) || ((flags & CL_MEM_HOST_READ_ONLY) && (flags & CL_MEM_HOST_WRITE_ONLY)) || ((flags & CL_MEM_HOST_WRITE_ONLY) && (flags & CL_MEM_HOST_NO_ACCESS)) || ((flags & (~(CL_MEM_READ_WRITE | CL_MEM_WRITE_ONLY | CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR | CL_MEM_USE_HOST_PTR | CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS))) != 0))) { err = CL_INVALID_VALUE; goto error; } /* This flag is valid only if host_ptr is not NULL */ if (UNLIKELY((((flags & CL_MEM_COPY_HOST_PTR) || (flags & CL_MEM_USE_HOST_PTR)) && data == NULL)) || (!(flags & (CL_MEM_COPY_HOST_PTR |CL_MEM_USE_HOST_PTR)) && (data != NULL))) { err = CL_INVALID_HOST_PTR; goto error; } if ((err = cl_get_device_info(ctx->devices[0], CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(max_mem_size), &max_mem_size, NULL)) != CL_SUCCESS) { goto error; } if (UNLIKELY(sz > max_mem_size)) { err = CL_INVALID_BUFFER_SIZE; goto error; } /* HSW: Byte scattered Read/Write has limitation that the buffer size must be a multiple of 4 bytes. 
*/ sz = ALIGN(sz, 4); /* Create the buffer in video memory */ mem = cl_mem_allocate(CL_MEM_BUFFER_TYPE, ctx, flags, sz, CL_FALSE, data, NULL, &err); if (mem == NULL || err != CL_SUCCESS) goto error; /* Copy the data if required */ if (flags & CL_MEM_COPY_HOST_PTR) { if (mem->is_userptr) memcpy(mem->host_ptr, data, sz); else cl_buffer_subdata(mem->bo, 0, sz, data); } if ((flags & CL_MEM_USE_HOST_PTR) && !mem->is_userptr) cl_buffer_subdata(mem->bo, 0, sz, data); if (flags & CL_MEM_USE_HOST_PTR) mem->host_ptr = data; exit: if (errcode_ret) *errcode_ret = err; return mem; error: cl_mem_delete(mem); mem = NULL; goto exit; } LOCAL cl_mem cl_mem_new_sub_buffer(cl_mem buffer, cl_mem_flags flags, cl_buffer_create_type create_type, const void *create_info, cl_int *errcode_ret) { cl_int err = CL_SUCCESS; cl_mem mem = NULL; struct _cl_mem_buffer *sub_buf = NULL; if (buffer->type != CL_MEM_BUFFER_TYPE) { err = CL_INVALID_MEM_OBJECT; goto error; } if (flags && (((buffer->flags & CL_MEM_WRITE_ONLY) && (flags & (CL_MEM_READ_WRITE|CL_MEM_READ_ONLY))) || ((buffer->flags & CL_MEM_READ_ONLY) && (flags & (CL_MEM_READ_WRITE|CL_MEM_WRITE_ONLY))) || (flags & (CL_MEM_USE_HOST_PTR | CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR)) || ((flags & CL_MEM_HOST_READ_ONLY) && (flags & CL_MEM_HOST_NO_ACCESS)) || ((flags & CL_MEM_HOST_READ_ONLY) && (flags & CL_MEM_HOST_WRITE_ONLY)) || ((flags & CL_MEM_HOST_WRITE_ONLY) && (flags & CL_MEM_HOST_NO_ACCESS)))) { err = CL_INVALID_VALUE; goto error; } if((flags & (CL_MEM_WRITE_ONLY | CL_MEM_READ_ONLY | CL_MEM_READ_WRITE)) == 0) { flags |= buffer->flags & (CL_MEM_WRITE_ONLY | CL_MEM_READ_ONLY | CL_MEM_READ_WRITE); } flags |= buffer->flags & (CL_MEM_USE_HOST_PTR | CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR); if((flags & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS)) == 0) { flags |= buffer->flags & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS); } if (create_type != CL_BUFFER_CREATE_TYPE_REGION) { err = CL_INVALID_VALUE; goto error; } if (!create_info) { err = CL_INVALID_VALUE; goto error; } cl_buffer_region *info = (cl_buffer_region *)create_info; if (!info->size) { err = CL_INVALID_BUFFER_SIZE; goto error; } if (info->origin > buffer->size || info->origin + info->size > buffer->size) { err = CL_INVALID_VALUE; goto error; } if (info->origin & (buffer->ctx->devices[0]->mem_base_addr_align / 8 - 1)) { err = CL_MISALIGNED_SUB_BUFFER_OFFSET; goto error; } /* Now create the sub buffer and link it to the buffer. 
*/ TRY_ALLOC (sub_buf, CALLOC(struct _cl_mem_buffer)); mem = &sub_buf->base; CL_OBJECT_INIT_BASE(mem, CL_OBJECT_MEM_MAGIC); list_init(&mem->dstr_cb_head); mem->type = CL_MEM_SUBBUFFER_TYPE; mem->flags = flags; mem->offset = buffer->offset; mem->is_userptr = buffer->is_userptr; sub_buf->parent = (struct _cl_mem_buffer*)buffer; cl_mem_add_ref(buffer); /* Append the buffer in the parent buffer list */ pthread_mutex_lock(&((struct _cl_mem_buffer*)buffer)->sub_lock); sub_buf->sub_next = ((struct _cl_mem_buffer*)buffer)->subs; if (((struct _cl_mem_buffer*)buffer)->subs != NULL) ((struct _cl_mem_buffer*)buffer)->subs->sub_prev = sub_buf; ((struct _cl_mem_buffer*)buffer)->subs = sub_buf; pthread_mutex_unlock(&((struct _cl_mem_buffer*)buffer)->sub_lock); mem->bo = buffer->bo; mem->size = info->size; sub_buf->sub_offset = info->origin; if (buffer->flags & CL_MEM_USE_HOST_PTR || buffer->flags & CL_MEM_COPY_HOST_PTR || buffer->flags & CL_MEM_ALLOC_HOST_PTR) { mem->host_ptr = buffer->host_ptr; } /* Append the buffer in the context buffer list */ cl_context_add_mem(buffer->ctx, mem); exit: if (errcode_ret) *errcode_ret = err; return mem; error: cl_mem_delete(mem); mem = NULL; goto exit; } cl_mem cl_mem_new_pipe(cl_context ctx, cl_mem_flags flags, cl_uint packet_size, cl_uint max_packets, cl_int *errcode_ret) { _cl_mem_pipe* pipe = NULL; cl_uint *ptr = NULL; cl_mem mem = NULL; cl_int err; cl_uint sz; if(UNLIKELY((pipe = CALLOC(_cl_mem_pipe)) == NULL)) { err = CL_OUT_OF_RESOURCES; goto error; } sz = packet_size * max_packets; assert(sz != 0); /* HSW: Byte scattered Read/Write has limitation that the buffer size must be a multiple of 4 bytes. */ sz = ALIGN(sz, 4); sz += 128; //The head of pipe is for data struct, and alignment to 128 byte for max data type double16 mem = cl_mem_allocate(CL_MEM_PIPE_TYPE, ctx, flags, sz, CL_FALSE,NULL , NULL, &err); if (mem == NULL || err != CL_SUCCESS) goto error; ptr = cl_mem_map_auto(mem, 1); if(ptr == NULL){ err = CL_OUT_OF_RESOURCES; goto error; } ptr[0] = max_packets; ptr[1] = packet_size; ptr[2] = 0; //write ptr ptr[3] = 0; //read ptr ptr[4] = 0; //reservation read ptr ptr[5] = 0; //reservation write ptr ptr[6] = 0; //packet num cl_mem_unmap(mem); pipe = cl_mem_pipe(mem); pipe->flags = flags; pipe->packet_size = packet_size; pipe->max_packets = max_packets; return mem; exit: if (errcode_ret) *errcode_ret = err; return mem; error: cl_mem_delete(mem); mem = NULL; goto exit; } void cl_mem_replace_buffer(cl_mem buffer, cl_buffer new_bo) { cl_buffer_unreference(buffer->bo); buffer->bo = new_bo; cl_buffer_reference(new_bo); if (buffer->type != CL_MEM_SUBBUFFER_TYPE) return; struct _cl_mem_buffer *it = ((struct _cl_mem_buffer*)buffer)->sub_next; for( ; it != (struct _cl_mem_buffer*)buffer; it = it->sub_next) { cl_buffer_unreference(it->base.bo); it->base.bo = new_bo; cl_buffer_reference(new_bo); } } void* cl_mem_svm_allocate(cl_context ctx, cl_svm_mem_flags flags, size_t size, unsigned int alignment) { cl_int err = CL_SUCCESS; size_t max_mem_size; if(UNLIKELY(alignment & (alignment - 1))) return NULL; if ((err = cl_get_device_info(ctx->devices[0], CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(max_mem_size), &max_mem_size, NULL)) != CL_SUCCESS) { return NULL; } if(UNLIKELY(size == 0 || size > max_mem_size)) { return NULL; } if (flags & (CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS)) { return NULL; } if (flags && ((flags & (CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_FINE_GRAIN_BUFFER)) || ((flags & CL_MEM_WRITE_ONLY) && (flags & CL_MEM_READ_ONLY)) || ((flags & CL_MEM_WRITE_ONLY) 
&& (flags & CL_MEM_READ_WRITE)) || ((flags & CL_MEM_READ_ONLY) && (flags & CL_MEM_READ_WRITE)))) { return NULL; } void * ptr = NULL; #ifdef HAS_BO_SET_SOFTPIN cl_buffer_mgr bufmgr = NULL; cl_mem mem; _cl_mem_svm* svm; if(UNLIKELY((svm = CALLOC(_cl_mem_svm)) == NULL)) return NULL; mem = &svm->base; mem->type = CL_MEM_SVM_TYPE; CL_OBJECT_INIT_BASE(mem, CL_OBJECT_MEM_MAGIC); list_init(&mem->dstr_cb_head); mem->flags = flags | CL_MEM_USE_HOST_PTR; mem->is_userptr = 0; mem->is_svm = 0; mem->offset = 0; bufmgr = cl_context_get_bufmgr(ctx); assert(bufmgr); int page_size = getpagesize(); const size_t alignedSZ = ALIGN(size, page_size); if(alignment == 0) alignment = page_size; else alignment = ALIGN(alignment, page_size); ptr = cl_aligned_malloc(alignedSZ, alignment); if(ptr == NULL) return NULL; mem->host_ptr = ptr; mem->is_svm = 1; mem->is_userptr = 1; mem->bo = cl_buffer_alloc_userptr(bufmgr, "CL SVM memory object", ptr, alignedSZ, 0); mem->size = size; cl_buffer_set_softpin_offset(mem->bo, (size_t)ptr); cl_buffer_set_bo_use_full_range(mem->bo, 1); /* Append the svm in the context buffer list */ cl_context_add_mem(ctx, mem); #endif return ptr; } void cl_mem_copy_image_region(const size_t *origin, const size_t *region, void *dst, size_t dst_row_pitch, size_t dst_slice_pitch, const void *src, size_t src_row_pitch, size_t src_slice_pitch, const struct _cl_mem_image *image, cl_bool offset_dst, cl_bool offset_src) { if(offset_dst) { size_t dst_offset = image->bpp * origin[0] + dst_row_pitch * origin[1] + dst_slice_pitch * origin[2]; dst = (char*)dst + dst_offset; } if(offset_src) { size_t src_offset = image->bpp * origin[0] + src_row_pitch * origin[1] + src_slice_pitch * origin[2]; src = (char*)src + src_offset; } if (!origin[0] && region[0] == image->w && dst_row_pitch == src_row_pitch && (region[2] == 1 || (!origin[1] && region[1] == image->h && dst_slice_pitch == src_slice_pitch))) { memcpy(dst, src, region[2] == 1 ? 
src_row_pitch*region[1] : src_slice_pitch*region[2]); } else { cl_uint y, z; for (z = 0; z < region[2]; z++) { const char* src_ptr = src; char* dst_ptr = dst; for (y = 0; y < region[1]; y++) { memcpy(dst_ptr, src_ptr, image->bpp*region[0]); src_ptr += src_row_pitch; dst_ptr += dst_row_pitch; } src = (char*)src + src_slice_pitch; dst = (char*)dst + dst_slice_pitch; } } } void cl_mem_copy_image_to_image(const size_t *dst_origin,const size_t *src_origin, const size_t *region, const struct _cl_mem_image *dst_image, const struct _cl_mem_image *src_image) { char* dst= cl_mem_map_auto((cl_mem)dst_image, 1); char* src= cl_mem_map_auto((cl_mem)src_image, 0); size_t dst_offset = dst_image->bpp * dst_origin[0] + dst_image->row_pitch * dst_origin[1] + dst_image->slice_pitch * dst_origin[2]; size_t src_offset = src_image->bpp * src_origin[0] + src_image->row_pitch * src_origin[1] + src_image->slice_pitch * src_origin[2]; dst= (char*)dst+ dst_offset; src= (char*)src+ src_offset; cl_uint y, z; for (z = 0; z < region[2]; z++) { const char* src_ptr = src; char* dst_ptr = dst; for (y = 0; y < region[1]; y++) { memcpy(dst_ptr, src_ptr, src_image->bpp*region[0]); src_ptr += src_image->row_pitch; dst_ptr += dst_image->row_pitch; } src = (char*)src + src_image->slice_pitch; dst = (char*)dst + dst_image->slice_pitch; } cl_mem_unmap_auto((cl_mem)src_image); cl_mem_unmap_auto((cl_mem)dst_image); } static void cl_mem_copy_image(struct _cl_mem_image *image, size_t row_pitch, size_t slice_pitch, void* host_ptr) { char* dst_ptr = cl_mem_map_auto((cl_mem)image, 1); size_t origin[3] = {0, 0, 0}; size_t region[3] = {image->w, image->h, image->depth}; cl_mem_copy_image_region(origin, region, dst_ptr, image->row_pitch, image->slice_pitch, host_ptr, row_pitch, slice_pitch, image, CL_FALSE, CL_FALSE); //offset is 0 cl_mem_unmap_auto((cl_mem)image); } cl_image_tiling_t cl_get_default_tiling(cl_driver drv) { static int initialized = 0; static cl_image_tiling_t tiling = CL_TILE_X; if (!initialized) { // FIXME, need to find out the performance diff's root cause on BDW. // SKL's 3D Image can't use TILE_X, so use TILE_Y as default if(cl_driver_get_ver(drv) == 8 || cl_driver_get_ver(drv) == 9) tiling = CL_TILE_Y; char *tilingStr = getenv("OCL_TILING"); if (tilingStr != NULL) { switch (tilingStr[0]) { case '0': tiling = CL_NO_TILE; break; case '1': tiling = CL_TILE_X; break; case '2': tiling = CL_TILE_Y; break; default: break; } } initialized = 1; } return tiling; } static cl_mem _cl_mem_new_image(cl_context ctx, cl_mem_flags flags, const cl_image_format *fmt, const cl_mem_object_type orig_image_type, size_t w, size_t h, size_t depth, size_t pitch, size_t slice_pitch, void *data, //pointer from application cl_mem buffer, //for image2D from buffer cl_int *errcode_ret) { cl_int err = CL_SUCCESS; cl_mem mem = NULL; cl_mem_object_type image_type = orig_image_type; uint32_t bpp = 0, intel_fmt = INTEL_UNSUPPORTED_FORMAT; size_t sz = 0, aligned_pitch = 0, aligned_slice_pitch = 0, aligned_h = 0; size_t origin_width = w; // for image1d buffer work around. 
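  /* When a CL_MEM_OBJECT_IMAGE1D_BUFFER is wider than image2d_max_width, it is
     folded further below into several rows of a 2D image, so the original
     logical 1D width is preserved here (stored later in buffer1d_image->size). */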
cl_image_tiling_t tiling = CL_NO_TILE; int enable_true_hostptr = 0; /* can't use BVAR (backend/src/sys/cvar.hpp) here as it's C++ */ const char *env = getenv("OCL_IMAGE_HOSTPTR"); if (env != NULL) { sscanf(env, "%i", &enable_true_hostptr); } /* Check flags consistency */ if (UNLIKELY((flags & (CL_MEM_COPY_HOST_PTR | CL_MEM_USE_HOST_PTR)) && data == NULL)) { err = CL_INVALID_HOST_PTR; goto error; } /* Get the size of each pixel */ if (UNLIKELY((err = cl_image_byte_per_pixel(fmt, &bpp)) != CL_SUCCESS)) goto error; /* Only a sub-set of the formats are supported */ intel_fmt = cl_image_get_intel_format(fmt); if (UNLIKELY(intel_fmt == INTEL_UNSUPPORTED_FORMAT)) { err = CL_IMAGE_FORMAT_NOT_SUPPORTED; goto error; } /* See if the user parameters match */ #define DO_IMAGE_ERROR \ do { \ err = CL_INVALID_IMAGE_SIZE; \ goto error; \ } while (0); if (UNLIKELY(w == 0)) DO_IMAGE_ERROR; if (UNLIKELY(h == 0 && (image_type != CL_MEM_OBJECT_IMAGE1D && image_type != CL_MEM_OBJECT_IMAGE1D_ARRAY && image_type != CL_MEM_OBJECT_IMAGE1D_BUFFER))) DO_IMAGE_ERROR; if (image_type == CL_MEM_OBJECT_IMAGE1D) { size_t min_pitch = bpp * w; if (data && pitch == 0) pitch = min_pitch; h = 1; depth = 1; if (UNLIKELY(w > ctx->devices[0]->image2d_max_width)) DO_IMAGE_ERROR; if (UNLIKELY(data && min_pitch > pitch)) DO_IMAGE_ERROR; if (UNLIKELY(data && (slice_pitch % pitch != 0))) DO_IMAGE_ERROR; if (UNLIKELY(!data && pitch != 0)) DO_IMAGE_ERROR; if (UNLIKELY(!data && slice_pitch != 0)) DO_IMAGE_ERROR; tiling = CL_NO_TILE; } else if (image_type == CL_MEM_OBJECT_IMAGE2D || image_type == CL_MEM_OBJECT_IMAGE1D_BUFFER) { if (image_type == CL_MEM_OBJECT_IMAGE1D_BUFFER) { if (UNLIKELY(w > ctx->devices[0]->image_mem_size)) DO_IMAGE_ERROR; /* This is an image1d buffer which exceeds the normal image size restriction. We have to use a 2D image to simulate this 1D image. */ h = (w + ctx->devices[0]->image2d_max_width - 1) / ctx->devices[0]->image2d_max_width; w = w > ctx->devices[0]->image2d_max_width ? ctx->devices[0]->image2d_max_width : w; tiling = CL_NO_TILE; } else if(image_type == CL_MEM_OBJECT_IMAGE2D && buffer != NULL) { tiling = CL_NO_TILE; } else if (cl_driver_get_ver(ctx->drv) != 6) { /* Pick up tiling mode (we do only linear on SNB) */ tiling = cl_get_default_tiling(ctx->drv); } size_t min_pitch = bpp * w; if (data && pitch == 0) pitch = min_pitch; if (UNLIKELY(w > ctx->devices[0]->image2d_max_width)) DO_IMAGE_ERROR; if (UNLIKELY(h > ctx->devices[0]->image2d_max_height)) DO_IMAGE_ERROR; if (UNLIKELY(data && min_pitch > pitch)) DO_IMAGE_ERROR; if (UNLIKELY(!data && pitch != 0 && buffer == NULL)) DO_IMAGE_ERROR; depth = 1; } else if (image_type == CL_MEM_OBJECT_IMAGE3D || image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY || image_type == CL_MEM_OBJECT_IMAGE2D_ARRAY) { if (image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) { h = 1; tiling = CL_NO_TILE; } else if (cl_driver_get_ver(ctx->drv) != 6) tiling = cl_get_default_tiling(ctx->drv); size_t min_pitch = bpp * w; if (data && pitch == 0) pitch = min_pitch; size_t min_slice_pitch = pitch * h; if (data && slice_pitch == 0) slice_pitch = min_slice_pitch; if (UNLIKELY(w > ctx->devices[0]->image3d_max_width)) DO_IMAGE_ERROR; if (UNLIKELY(h > ctx->devices[0]->image3d_max_height)) DO_IMAGE_ERROR; if (image_type == CL_MEM_OBJECT_IMAGE3D && (UNLIKELY(depth > ctx->devices[0]->image3d_max_depth))) DO_IMAGE_ERROR else if (UNLIKELY(depth > ctx->devices[0]->image_max_array_size)) DO_IMAGE_ERROR; if (UNLIKELY(data && min_pitch > pitch)) DO_IMAGE_ERROR; if (UNLIKELY(data && min_slice_pitch > slice_pitch)) DO_IMAGE_ERROR; if (UNLIKELY(!data && pitch != 0)) DO_IMAGE_ERROR; if (UNLIKELY(!data && slice_pitch != 0)) DO_IMAGE_ERROR; } else assert(0); #undef DO_IMAGE_ERROR uint8_t enableUserptr = 0; if (enable_true_hostptr && ctx->devices[0]->host_unified_memory && data != NULL && (flags & CL_MEM_USE_HOST_PTR)) { int cacheline_size = 0; cl_get_device_info(ctx->devices[0], CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, sizeof(cacheline_size), &cacheline_size, NULL); if (ALIGN((unsigned long)data, cacheline_size) == (unsigned long)data && ALIGN(h, cl_buffer_get_tiling_align(ctx, CL_NO_TILE, 1)) == h && ALIGN(h * pitch * depth, cacheline_size) == h * pitch * depth && /* h and pitch must already equal aligned_h and aligned_pitch when userptr is enabled */ ((image_type != CL_MEM_OBJECT_IMAGE3D && image_type != CL_MEM_OBJECT_IMAGE1D_ARRAY && image_type != CL_MEM_OBJECT_IMAGE2D_ARRAY) || pitch * h == slice_pitch)) { tiling = CL_NO_TILE; enableUserptr = 1; } } /* Tiling requires both pitch and height to be aligned */ if (tiling == CL_NO_TILE) { aligned_pitch = w * bpp; if (aligned_pitch < pitch && enableUserptr) aligned_pitch = pitch; /* No need to align the height for a 2D image created from a buffer; the pitch must match the buffer's pitch since they share the same bo. */ if (image_type == CL_MEM_OBJECT_IMAGE2D && buffer != NULL) { if(aligned_pitch < pitch) { aligned_pitch = pitch; } aligned_h = h; } else aligned_h = ALIGN(h, cl_buffer_get_tiling_align(ctx, CL_NO_TILE, 1)); } else if (tiling == CL_TILE_X) { aligned_pitch = ALIGN(w * bpp, cl_buffer_get_tiling_align(ctx, CL_TILE_X, 0)); aligned_h = ALIGN(h, cl_buffer_get_tiling_align(ctx, CL_TILE_X, 1)); } else if (tiling == CL_TILE_Y) { aligned_pitch = ALIGN(w * bpp, cl_buffer_get_tiling_align(ctx, CL_TILE_Y, 0)); aligned_h = ALIGN(h, cl_buffer_get_tiling_align(ctx, CL_TILE_Y, 1)); } sz = aligned_pitch * aligned_h * depth;
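/* Worked example (illustrative; the tile dimensions are assumed here, they
 * really come from cl_buffer_get_tiling_align): a 1000x600 RGBA8888 2D image
 * with CL_TILE_Y, assuming a 128-byte tile width and a 32-row tile height:
 *   aligned_pitch = ALIGN(1000 * 4, 128) = 4096 bytes
 *   aligned_h     = ALIGN(600, 32)       = 608 rows
 *   sz            = 4096 * 608 * 1       = 2490368 bytes
 * i.e. the bo is padded in both directions up to whole tiles. */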
/* Image 2D created from a buffer: per spec, the buffer size may be larger than the 2D image requires. */ if (buffer->size >= sz) sz = buffer->size; else { err = CL_INVALID_IMAGE_SIZE; goto error; } } /* If sz is larger than 128MB, mapping the GTT may fail on some systems. Since there is no obvious performance drop, disable tiling in that case. */ if(tiling != CL_NO_TILE && sz > MAX_TILING_SIZE) { tiling = CL_NO_TILE; aligned_pitch = w * bpp; aligned_h = ALIGN(h, cl_buffer_get_tiling_align(ctx, CL_NO_TILE, 1)); sz = aligned_pitch * aligned_h * depth; } if (image_type != CL_MEM_OBJECT_IMAGE1D_BUFFER) { if (image_type == CL_MEM_OBJECT_IMAGE2D && buffer != NULL) mem = cl_mem_allocate(CL_MEM_IMAGE_TYPE, ctx, flags, sz, tiling != CL_NO_TILE, NULL, buffer, &err); else { if (enableUserptr) mem = cl_mem_allocate(CL_MEM_IMAGE_TYPE, ctx, flags, sz, tiling != CL_NO_TILE, data, NULL, &err); else mem = cl_mem_allocate(CL_MEM_IMAGE_TYPE, ctx, flags, sz, tiling != CL_NO_TILE, NULL, NULL, &err); } } else { mem = cl_mem_allocate(CL_MEM_BUFFER1D_IMAGE_TYPE, ctx, flags, sz, tiling != CL_NO_TILE, NULL, NULL, &err); if (mem != NULL && err == CL_SUCCESS) { struct _cl_mem_buffer1d_image *buffer1d_image = (struct _cl_mem_buffer1d_image *)mem; buffer1d_image->size = origin_width; } } if (mem == NULL || err != CL_SUCCESS) goto error; if(!(image_type == CL_MEM_OBJECT_IMAGE2D && buffer != NULL)) { /* No need to set tiling for a 2D image created from a buffer: they share the same bo. */ cl_buffer_set_tiling(mem->bo, tiling, aligned_pitch); } if (image_type == CL_MEM_OBJECT_IMAGE1D || image_type == CL_MEM_OBJECT_IMAGE2D || image_type == CL_MEM_OBJECT_IMAGE1D_BUFFER) aligned_slice_pitch = 0; else /* SKL needs the tiling-aligned h to compute slice_pitch, while IVB through BDW need the CL_NO_TILE-aligned h. */ aligned_slice_pitch = aligned_pitch * ALIGN(h, cl_buffer_get_tiling_align(ctx, tiling, 2)); cl_mem_image_init(cl_mem_image(mem), w, h, image_type, depth, *fmt, intel_fmt, bpp, aligned_pitch, aligned_slice_pitch, tiling, 0, 0, 0); /* Copy the data if required */ if (flags & CL_MEM_COPY_HOST_PTR && data) cl_mem_copy_image(cl_mem_image(mem), pitch, slice_pitch, data); if (flags & CL_MEM_USE_HOST_PTR && data) { mem->host_ptr = data; cl_mem_image(mem)->host_row_pitch = pitch; cl_mem_image(mem)->host_slice_pitch = slice_pitch; if (!enableUserptr) cl_mem_copy_image(cl_mem_image(mem), pitch, slice_pitch, data); } exit: if (errcode_ret) *errcode_ret = err; return mem; error: cl_mem_delete(mem); mem = NULL; goto exit; } static cl_mem _cl_mem_new_image_from_buffer(cl_context ctx, cl_mem_flags flags, const cl_image_format* image_format, const cl_image_desc *image_desc, cl_int *errcode_ret) { cl_mem image = NULL; cl_mem buffer = image_desc->buffer; cl_int err = CL_SUCCESS; *errcode_ret = err; cl_ulong max_size; cl_mem_flags merged_flags; uint32_t bpp; uint32_t intel_fmt = INTEL_UNSUPPORTED_FORMAT; size_t offset = 0; /* Get the size of each pixel */ if (UNLIKELY((err = cl_image_byte_per_pixel(image_format, &bpp)) != CL_SUCCESS)) goto error; /* Only a sub-set of the formats are supported */ intel_fmt = cl_image_get_intel_format(image_format); if (UNLIKELY(intel_fmt == INTEL_UNSUPPORTED_FORMAT)) { err = CL_INVALID_IMAGE_FORMAT_DESCRIPTOR; goto error; } if (!buffer) { err = CL_INVALID_IMAGE_DESCRIPTOR; goto error; } if (flags & (CL_MEM_USE_HOST_PTR|CL_MEM_ALLOC_HOST_PTR|CL_MEM_COPY_HOST_PTR)) { err = CL_INVALID_IMAGE_DESCRIPTOR; goto error; } /* Access-mode compatibility checks between the buffer's flags and the requested image flags. */
if ((buffer->flags & CL_MEM_WRITE_ONLY) && (flags & (CL_MEM_READ_WRITE|CL_MEM_READ_ONLY))) { err = CL_INVALID_VALUE; goto error; } if ((buffer->flags & CL_MEM_READ_ONLY) && (flags & (CL_MEM_READ_WRITE|CL_MEM_WRITE_ONLY))) { err = CL_INVALID_VALUE; goto error; } if ((buffer->flags & CL_MEM_HOST_WRITE_ONLY) && (flags & CL_MEM_HOST_READ_ONLY)) { err = CL_INVALID_VALUE; goto error; } if ((buffer->flags & CL_MEM_HOST_READ_ONLY) && (flags & CL_MEM_HOST_WRITE_ONLY)) { err = CL_INVALID_VALUE; goto error; } if ((buffer->flags & CL_MEM_HOST_NO_ACCESS) && (flags & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_WRITE_ONLY))) { err = CL_INVALID_VALUE; goto error; } if ((err = cl_get_device_info(ctx->devices[0], CL_DEVICE_IMAGE_MAX_BUFFER_SIZE, sizeof(max_size), &max_size, NULL)) != CL_SUCCESS) { goto error; } if (image_desc->image_width > max_size) { err = CL_INVALID_IMAGE_DESCRIPTOR; goto error; } if (image_desc->image_width*bpp > buffer->size) { err = CL_INVALID_IMAGE_DESCRIPTOR; goto error; } merged_flags = buffer->flags; if (flags & (CL_MEM_READ_WRITE|CL_MEM_READ_ONLY|CL_MEM_WRITE_ONLY)) { merged_flags &= ~(CL_MEM_READ_WRITE|CL_MEM_READ_ONLY|CL_MEM_WRITE_ONLY); merged_flags |= flags & (CL_MEM_READ_WRITE|CL_MEM_READ_ONLY|CL_MEM_WRITE_ONLY); } if (flags & (CL_MEM_HOST_WRITE_ONLY|CL_MEM_HOST_READ_ONLY|CL_MEM_HOST_NO_ACCESS)) { merged_flags &= ~(CL_MEM_HOST_WRITE_ONLY|CL_MEM_HOST_READ_ONLY|CL_MEM_HOST_NO_ACCESS); merged_flags |= flags & (CL_MEM_HOST_WRITE_ONLY|CL_MEM_HOST_READ_ONLY|CL_MEM_HOST_NO_ACCESS); } struct _cl_mem_buffer *mem_buffer = (struct _cl_mem_buffer*)buffer; if (buffer->type == CL_MEM_SUBBUFFER_TYPE) { offset = ((struct _cl_mem_buffer *)buffer)->sub_offset; mem_buffer = mem_buffer->parent; } if(image_desc->image_type == CL_MEM_OBJECT_IMAGE2D) { image = _cl_mem_new_image(ctx, flags, image_format, image_desc->image_type, image_desc->image_width, image_desc->image_height, image_desc->image_depth, image_desc->image_row_pitch, image_desc->image_slice_pitch, NULL, image_desc->buffer, errcode_ret); } else if (image_desc->image_type == CL_MEM_OBJECT_IMAGE1D_BUFFER) { /* Per bspec, an image needs at least a 2-line vertical alignment, so we can't simply attach the buffer to a 1D image surface of the same size. We have to create a new image, copy the buffer data into it, and redirect all of the buffer object's references to this image. */ image = _cl_mem_new_image(ctx, flags, image_format, image_desc->image_type, mem_buffer->base.size / bpp, 0, 0, 0, 0, NULL, NULL, errcode_ret); } else assert(0); if (image == NULL) return NULL; if(image_desc->image_type == CL_MEM_OBJECT_IMAGE2D) { /* No copy needed: the 2D image and the buffer share the same bo. */ } else if (image_desc->image_type == CL_MEM_OBJECT_IMAGE1D_BUFFER) { /* FIXME: we could later use the buffer-to-image copy to do this on the GPU; currently that function doesn't support 1D images. There is a potential risk that this buffer was mapped and the caller still holds the pointer and wants to access it again. This scenario is not explicitly forbidden in the spec, although it should not be permitted. */
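/* Usage sketch (illustrative, not part of the driver): this function is
 * reached through clCreateImage when image_desc->buffer is set, e.g. with
 * error handling omitted and `buf` an existing cl_mem buffer:
 *
 *   cl_image_format f = { CL_RGBA, CL_UNORM_INT8 };
 *   cl_image_desc desc = {0};
 *   desc.image_type  = CL_MEM_OBJECT_IMAGE1D_BUFFER;
 *   desc.image_width = 4096;   // pixels; width * bpp must fit in buf
 *   desc.buffer      = buf;
 *   cl_mem img = clCreateImage(ctx, 0, &f, &desc, NULL, &err);
 *
 * For CL_MEM_OBJECT_IMAGE1D_BUFFER the code below copies the buffer contents
 * into the new image bo; a 2D image from a buffer shares the buffer's bo. */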
void *src = cl_mem_map(buffer, 0); void *dst = cl_mem_map(image, 1); memcpy(dst, src, mem_buffer->base.size); cl_mem_unmap(image); cl_mem_unmap(buffer); struct _cl_mem_buffer1d_image* image_buffer = (struct _cl_mem_buffer1d_image*)image; image_buffer->descbuffer = buffer; } else assert(0); if (err != 0) goto error; /* Now replace the buffer's bo with this new bo, taking care of the sub-buffer case. */ if (image_desc->image_type == CL_MEM_OBJECT_IMAGE1D_BUFFER) cl_mem_replace_buffer(buffer, image->bo); /* Now point to the right offset if buffer is a SUB_BUFFER. */ if (buffer->flags & CL_MEM_USE_HOST_PTR) image->host_ptr = buffer->host_ptr + offset; cl_mem_image(image)->offset = offset; cl_mem_add_ref(buffer); cl_mem_image(image)->buffer_1d = buffer; return image; error: if (image) cl_mem_delete(image); image = NULL; *errcode_ret = err; return image; } LOCAL cl_mem cl_mem_new_image(cl_context context, cl_mem_flags flags, const cl_image_format *image_format, const cl_image_desc *image_desc, void *host_ptr, cl_int *errcode_ret) { switch (image_desc->image_type) { case CL_MEM_OBJECT_IMAGE1D: case CL_MEM_OBJECT_IMAGE3D: return _cl_mem_new_image(context, flags, image_format, image_desc->image_type, image_desc->image_width, image_desc->image_height, image_desc->image_depth, image_desc->image_row_pitch, image_desc->image_slice_pitch, host_ptr, NULL, errcode_ret); case CL_MEM_OBJECT_IMAGE2D: if(image_desc->buffer) return _cl_mem_new_image_from_buffer(context, flags, image_format, image_desc, errcode_ret); else return _cl_mem_new_image(context, flags, image_format, image_desc->image_type, image_desc->image_width, image_desc->image_height, image_desc->image_depth, image_desc->image_row_pitch, image_desc->image_slice_pitch, host_ptr, NULL, errcode_ret); case CL_MEM_OBJECT_IMAGE1D_ARRAY: case CL_MEM_OBJECT_IMAGE2D_ARRAY: return _cl_mem_new_image(context, flags, image_format, image_desc->image_type, image_desc->image_width, image_desc->image_height, image_desc->image_array_size, image_desc->image_row_pitch, image_desc->image_slice_pitch, host_ptr, NULL, errcode_ret); case CL_MEM_OBJECT_IMAGE1D_BUFFER: return _cl_mem_new_image_from_buffer(context, flags, image_format, image_desc, errcode_ret); case CL_MEM_OBJECT_BUFFER: default: assert(0); } return NULL; } LOCAL void cl_mem_svm_delete(cl_context ctx, void *svm_pointer) { cl_mem mem; if(UNLIKELY(svm_pointer == NULL)) return; mem = cl_context_get_svm_from_ptr(ctx, svm_pointer); if(mem == NULL) return; cl_mem_delete(mem); } LOCAL void cl_mem_delete(cl_mem mem) { cl_int i; cl_mem_dstr_cb cb = NULL; if (UNLIKELY(mem == NULL)) return; if (CL_OBJECT_DEC_REF(mem) > 1) return; #ifdef HAS_GL_EGL if (UNLIKELY(IS_GL_IMAGE(mem))) { cl_mem_gl_delete(cl_mem_gl_image(mem)); } #endif #ifdef HAS_CMRT if (mem->cmrt_mem != NULL) cmrt_destroy_memory(mem); #endif /* First, call all the callbacks registered by the user. */ while (!list_empty(&mem->dstr_cb_head)) { cb = list_entry(mem->dstr_cb_head.head_node.n, _cl_mem_dstr_cb, node); list_node_del(&cb->node); cb->pfn_notify(mem, cb->user_data); cl_free(cb); }
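/* Illustrative note: the loop above drains the callbacks registered through
 * clSetMemObjectDestructorCallback (see cl_mem_set_destructor_callback further
 * below). Since list_add appears to push new nodes at the head, the callbacks
 * should fire in reverse registration order, as the OpenCL spec requires.
 * A minimal registration sketch from application code:
 *   void CL_CALLBACK on_destroy(cl_mem m, void *user) { free(user); }
 *   clSetMemObjectDestructorCallback(buf, on_destroy, user_data);
 */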
/* If we are an image, delete the associated 1D buffer if present. */ if (IS_IMAGE(mem)) { if (cl_mem_image(mem)->buffer_1d) { assert(cl_mem_image(mem)->image_type == CL_MEM_OBJECT_IMAGE1D_BUFFER || cl_mem_image(mem)->image_type == CL_MEM_OBJECT_IMAGE2D); cl_mem_delete(cl_mem_image(mem)->buffer_1d); if(cl_mem_image(mem)->image_type == CL_MEM_OBJECT_IMAGE2D && cl_mem_image(mem)->is_image_from_buffer == 1) { cl_mem_image(mem)->buffer_1d = NULL; mem->bo = NULL; } } } /* Someone still has it mapped; unmap. */ if(mem->map_ref > 0) { assert(mem->mapped_ptr); for(i = 0; i < mem->mapped_ptr_sz; i++) { if(mem->mapped_ptr[i].ptr != NULL) { mem->map_ref--; cl_mem_unmap_auto(mem); } } assert(mem->map_ref == 0); } if (mem->mapped_ptr) free(mem->mapped_ptr); /* If we are a sub-buffer, do nothing for the bo release. */ if (mem->type == CL_MEM_SUBBUFFER_TYPE) { struct _cl_mem_buffer* buffer = (struct _cl_mem_buffer*)mem; /* Remove it from the parent's list */ assert(buffer->parent); pthread_mutex_lock(&buffer->parent->sub_lock); if (buffer->sub_prev) buffer->sub_prev->sub_next = buffer->sub_next; if (buffer->sub_next) buffer->sub_next->sub_prev = buffer->sub_prev; if (buffer->parent->subs == buffer) buffer->parent->subs = buffer->sub_next; pthread_mutex_unlock(&buffer->parent->sub_lock); cl_mem_delete((cl_mem)(buffer->parent)); } else if (mem->is_svm && mem->type != CL_MEM_SVM_TYPE) { cl_mem svm_mem = cl_context_get_svm_from_ptr(mem->ctx, mem->host_ptr); if (svm_mem != NULL) cl_mem_delete(svm_mem); } else if (LIKELY(mem->bo != NULL)) { cl_buffer_unreference(mem->bo); } /* Remove it from the list */ cl_context_remove_mem(mem->ctx, mem); if ((mem->is_userptr && (mem->flags & CL_MEM_ALLOC_HOST_PTR) && (mem->type != CL_MEM_SUBBUFFER_TYPE)) || (mem->is_svm && mem->type == CL_MEM_SVM_TYPE)) cl_free(mem->host_ptr); CL_OBJECT_DESTROY_BASE(mem); cl_free(mem); } LOCAL void cl_mem_add_ref(cl_mem mem) { assert(mem); CL_OBJECT_INC_REF(mem); } #define LOCAL_SZ_0 16 #define LOCAL_SZ_1 4 #define LOCAL_SZ_2 4 LOCAL cl_int cl_mem_copy(cl_command_queue queue, cl_event event, cl_mem src_buf, cl_mem dst_buf, size_t src_offset, size_t dst_offset, size_t cb) { cl_int ret = CL_SUCCESS; cl_kernel ker = NULL; size_t global_off[] = {0,0,0}; size_t global_sz[] = {1,1,1}; size_t local_sz[] = {1,1,1}; const unsigned int masks[4] = {0xffffffff, 0x0ff, 0x0ffff, 0x0ffffff}; int aligned = 0; int dw_src_offset = src_offset/4; int dw_dst_offset = dst_offset/4; if (!cb) return ret; /* We use one kernel to copy the data. The kernel is lazily created. */ assert(src_buf->ctx == dst_buf->ctx);
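/* Worked example for the unaligned paths below (illustrative): masks[n]
 * covers the low n bytes of a little-endian dword, with masks[0] meaning
 * "the whole dword". With dst_offset = 2 and cb = 5:
 *   dw_num     = ((2 % 4 + 5) + 3) / 4 = 2 dst dwords touched
 *   first_mask = masks[2] = 0x0000ffff
 *   last_mask  = masks[(2 + 5) % 4] = masks[3] = 0x00ffffff
 * These masks tell the kernels which bytes of the boundary dwords belong to
 * the copied range, so they can read, merge and write whole dwords without
 * clobbering the neighbouring bytes. */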
/* Everything is 16-byte aligned: the fast and easy case. */ if((cb % 16 == 0) && (src_offset % 16 == 0) && (dst_offset % 16 == 0)) { extern char cl_internal_copy_buf_align16_str[]; extern size_t cl_internal_copy_buf_align16_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_BUFFER_ALIGN16, cl_internal_copy_buf_align16_str, (size_t)cl_internal_copy_buf_align16_str_size, NULL); cb = cb/16; aligned = 1; } else if ((cb % 4 == 0) && (src_offset % 4 == 0) && (dst_offset % 4 == 0)) { /* All dword aligned. */ extern char cl_internal_copy_buf_align4_str[]; extern size_t cl_internal_copy_buf_align4_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_BUFFER_ALIGN4, cl_internal_copy_buf_align4_str, (size_t)cl_internal_copy_buf_align4_str_size, NULL); cb = cb/4; aligned = 1; } if (aligned) { if (!ker) return CL_OUT_OF_RESOURCES; if (cb < LOCAL_SZ_0) { local_sz[0] = 1; } else { local_sz[0] = LOCAL_SZ_0; } global_sz[0] = ((cb + LOCAL_SZ_0 - 1)/LOCAL_SZ_0)*LOCAL_SZ_0; cl_kernel_set_arg(ker, 0, sizeof(cl_mem), &src_buf); cl_kernel_set_arg(ker, 1, sizeof(int), &dw_src_offset); cl_kernel_set_arg(ker, 2, sizeof(cl_mem), &dst_buf); cl_kernel_set_arg(ker, 3, sizeof(int), &dw_dst_offset); cl_kernel_set_arg(ker, 4, sizeof(int), &cb); ret = cl_command_queue_ND_range(queue, ker, event, 1, global_off, global_off, global_sz, global_sz, local_sz, local_sz); cl_kernel_delete(ker); return ret; } /* Now handle the unaligned cases. */ int dw_num = ((dst_offset % 4 + cb) + 3) / 4; unsigned int first_mask = dst_offset % 4 == 0 ? 0x0 : masks[dst_offset % 4]; unsigned int last_mask = masks[(dst_offset + cb) % 4]; /* Handle the very small range copy. */ if (cb < 4 && dw_num == 1) { first_mask = first_mask | ~last_mask; } if (cb < LOCAL_SZ_0) { local_sz[0] = 1; } else { local_sz[0] = LOCAL_SZ_0; } global_sz[0] = ((dw_num + LOCAL_SZ_0 - 1)/LOCAL_SZ_0)*LOCAL_SZ_0; if (src_offset % 4 == dst_offset % 4) { /* Src and dst have the same unaligned offset; just handle the head and tail. */ extern char cl_internal_copy_buf_unalign_same_offset_str[]; extern size_t cl_internal_copy_buf_unalign_same_offset_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_BUFFER_UNALIGN_SAME_OFFSET, cl_internal_copy_buf_unalign_same_offset_str, (size_t)cl_internal_copy_buf_unalign_same_offset_str_size, NULL); if (!ker) return CL_OUT_OF_RESOURCES; cl_kernel_set_arg(ker, 0, sizeof(cl_mem), &src_buf); cl_kernel_set_arg(ker, 1, sizeof(int), &dw_src_offset); cl_kernel_set_arg(ker, 2, sizeof(cl_mem), &dst_buf); cl_kernel_set_arg(ker, 3, sizeof(int), &dw_dst_offset); cl_kernel_set_arg(ker, 4, sizeof(int), &dw_num); cl_kernel_set_arg(ker, 5, sizeof(int), &first_mask); cl_kernel_set_arg(ker, 6, sizeof(int), &last_mask); ret = cl_command_queue_ND_range(queue, ker, event, 1, global_off, global_off, global_sz, global_sz, local_sz, local_sz); cl_kernel_delete(ker); return ret; }
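/* Illustrative note on the two cross-offset cases below: align_diff is the
 * distance in bytes between the two misalignments and shift = align_diff * 8
 * the matching bit shift. Conceptually each work item rebuilds one dst dword
 * from two adjacent src dwords, roughly
 *   dst = (src[i] >> shift) | (src[i + 1] << (32 - shift));
 * (or its mirror image), with dw_mask picking the bytes taken from each
 * source word; the exact merge lives in the internal copy kernels. */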
/* Dst's offset is smaller than src's: each dst dword needs two sequential src dwords to fill it. */ if (dst_offset % 4 < src_offset % 4) { extern char cl_internal_copy_buf_unalign_dst_offset_str[]; extern size_t cl_internal_copy_buf_unalign_dst_offset_str_size; int align_diff = src_offset % 4 - dst_offset % 4; unsigned int dw_mask = masks[align_diff]; int shift = align_diff * 8; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_BUFFER_UNALIGN_DST_OFFSET, cl_internal_copy_buf_unalign_dst_offset_str, (size_t)cl_internal_copy_buf_unalign_dst_offset_str_size, NULL); if (!ker) return CL_OUT_OF_RESOURCES; cl_kernel_set_arg(ker, 0, sizeof(cl_mem), &src_buf); cl_kernel_set_arg(ker, 1, sizeof(int), &dw_src_offset); cl_kernel_set_arg(ker, 2, sizeof(cl_mem), &dst_buf); cl_kernel_set_arg(ker, 3, sizeof(int), &dw_dst_offset); cl_kernel_set_arg(ker, 4, sizeof(int), &dw_num); cl_kernel_set_arg(ker, 5, sizeof(int), &first_mask); cl_kernel_set_arg(ker, 6, sizeof(int), &last_mask); cl_kernel_set_arg(ker, 7, sizeof(int), &shift); cl_kernel_set_arg(ker, 8, sizeof(int), &dw_mask); ret = cl_command_queue_ND_range(queue, ker, event, 1, global_off, global_off, global_sz, global_sz, local_sz, local_sz); cl_kernel_delete(ker); return ret; } /* Dst's offset is larger than src's: again each dst dword needs two sequential src dwords to fill it. */ if (dst_offset % 4 > src_offset % 4) { extern char cl_internal_copy_buf_unalign_src_offset_str[]; extern size_t cl_internal_copy_buf_unalign_src_offset_str_size; int align_diff = dst_offset % 4 - src_offset % 4; unsigned int dw_mask = masks[4 - align_diff]; int shift = align_diff * 8; int src_less = !(src_offset % 4) && !((src_offset + cb) % 4); ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_BUFFER_UNALIGN_SRC_OFFSET, cl_internal_copy_buf_unalign_src_offset_str, (size_t)cl_internal_copy_buf_unalign_src_offset_str_size, NULL); if (!ker) return CL_OUT_OF_RESOURCES; cl_kernel_set_arg(ker, 0, sizeof(cl_mem), &src_buf); cl_kernel_set_arg(ker, 1, sizeof(int), &dw_src_offset); cl_kernel_set_arg(ker, 2, sizeof(cl_mem), &dst_buf); cl_kernel_set_arg(ker, 3, sizeof(int), &dw_dst_offset); cl_kernel_set_arg(ker, 4, sizeof(int), &dw_num); cl_kernel_set_arg(ker, 5, sizeof(int), &first_mask); cl_kernel_set_arg(ker, 6, sizeof(int), &last_mask); cl_kernel_set_arg(ker, 7, sizeof(int), &shift); cl_kernel_set_arg(ker, 8, sizeof(int), &dw_mask); cl_kernel_set_arg(ker, 9, sizeof(int), &src_less); ret = cl_command_queue_ND_range(queue, ker, event, 1, global_off, global_off, global_sz, global_sz, local_sz, local_sz); cl_kernel_delete(ker); return ret; } /* No remaining case to handle?
*/ assert(0); return ret; } LOCAL cl_int cl_image_fill(cl_command_queue queue, cl_event e, const void * pattern, struct _cl_mem_image* src_image, const size_t * origin, const size_t * region) { cl_int ret = CL_SUCCESS; cl_kernel ker = NULL; size_t global_off[] = {0,0,0}; size_t global_sz[] = {1,1,1}; size_t local_sz[] = {LOCAL_SZ_0,LOCAL_SZ_1,LOCAL_SZ_2}; uint32_t savedIntelFmt = src_image->intel_fmt; if(region[1] == 1) local_sz[1] = 1; if(region[2] == 1) local_sz[2] = 1; global_sz[0] = ((region[0] + local_sz[0] - 1) / local_sz[0]) * local_sz[0]; global_sz[1] = ((region[1] + local_sz[1] - 1) / local_sz[1]) * local_sz[1]; global_sz[2] = ((region[2] + local_sz[2] - 1) / local_sz[2]) * local_sz[2]; if(src_image->image_type == CL_MEM_OBJECT_IMAGE1D) { extern char cl_internal_fill_image_1d_str[]; extern size_t cl_internal_fill_image_1d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_FILL_IMAGE_1D, cl_internal_fill_image_1d_str, (size_t)cl_internal_fill_image_1d_str_size, NULL); }else if(src_image->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) { extern char cl_internal_fill_image_1d_array_str[]; extern size_t cl_internal_fill_image_1d_array_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_FILL_IMAGE_1D_ARRAY, cl_internal_fill_image_1d_array_str, (size_t)cl_internal_fill_image_1d_array_str_size, NULL); }else if(src_image->image_type == CL_MEM_OBJECT_IMAGE2D) { extern char cl_internal_fill_image_2d_str[]; extern size_t cl_internal_fill_image_2d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_FILL_IMAGE_2D, cl_internal_fill_image_2d_str, (size_t)cl_internal_fill_image_2d_str_size, NULL); }else if(src_image->image_type == CL_MEM_OBJECT_IMAGE2D_ARRAY) { extern char cl_internal_fill_image_2d_array_str[]; extern size_t cl_internal_fill_image_2d_array_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_FILL_IMAGE_2D_ARRAY, cl_internal_fill_image_2d_array_str, (size_t)cl_internal_fill_image_2d_array_str_size, NULL); }else if(src_image->image_type == CL_MEM_OBJECT_IMAGE3D) { extern char cl_internal_fill_image_3d_str[]; extern size_t cl_internal_fill_image_3d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_FILL_IMAGE_3D, cl_internal_fill_image_3d_str, (size_t)cl_internal_fill_image_3d_str_size, NULL); }else{ return CL_IMAGE_FORMAT_NOT_SUPPORTED; } if (!ker) return CL_OUT_OF_RESOURCES; cl_kernel_set_arg(ker, 0, sizeof(cl_mem), &src_image); if(src_image->fmt.image_channel_order >= CL_sRGBA) { #define RGB2sRGB(linear) ( linear <= 0.0031308f )? 
( 12.92f * linear ):( 1.055f * powf( linear, 1.0f/2.4f ) - 0.055f); cl_image_format fmt; float newpattern[4] = {0.0,0.0,0.0,((float*)pattern)[3]}; int i; for(i = 0; i < 3; i++){ if(src_image->fmt.image_channel_order == CL_sRGBA) { newpattern[i] = RGB2sRGB(((float*)pattern)[i]); } else newpattern[2-i] = RGB2sRGB(((float*)pattern)[i]); } cl_kernel_set_arg(ker, 1, sizeof(float)*4, newpattern); fmt.image_channel_order = CL_RGBA; fmt.image_channel_data_type = CL_UNORM_INT8; src_image->intel_fmt = cl_image_get_intel_format(&fmt); #undef RGB2sRGB } else cl_kernel_set_arg(ker, 1, sizeof(float)*4, pattern); cl_kernel_set_arg(ker, 2, sizeof(cl_int), &region[0]); cl_kernel_set_arg(ker, 3, sizeof(cl_int), &region[1]); cl_kernel_set_arg(ker, 4, sizeof(cl_int), &region[2]); cl_kernel_set_arg(ker, 5, sizeof(cl_int), &origin[0]); cl_kernel_set_arg(ker, 6, sizeof(cl_int), &origin[1]); cl_kernel_set_arg(ker, 7, sizeof(cl_int), &origin[2]); ret = cl_command_queue_ND_range(queue, ker, e, 3, global_off, global_off, global_sz, global_sz, local_sz, local_sz); cl_kernel_delete(ker); src_image->intel_fmt = savedIntelFmt; return ret; } LOCAL cl_int cl_mem_fill(cl_command_queue queue, cl_event e, const void * pattern, size_t pattern_size, cl_mem buffer, size_t offset, size_t size) { cl_int ret = CL_SUCCESS; cl_kernel ker = NULL; size_t global_off[] = {0,0,0}; size_t global_sz[] = {1,1,1}; size_t local_sz[] = {1,1,1}; char pattern_comb[4]; int is_128 = 0; const void * pattern1 = NULL; assert(offset % pattern_size == 0); assert(size % pattern_size == 0); if (!size) return ret; if (pattern_size == 128) { /* A pattern size of 128 corresponds to double16, but double does not work well on some platforms, so we split the fill into two float16 halves. */ extern char cl_internal_fill_buf_align128_str[]; extern size_t cl_internal_fill_buf_align128_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_FILL_BUFFER_ALIGN128, cl_internal_fill_buf_align128_str, (size_t)cl_internal_fill_buf_align128_str_size, NULL); is_128 = 1; pattern_size = pattern_size / 2; pattern1 = pattern + pattern_size; size = size / 2; } else if (pattern_size % 8 == 0) { /* Handle the 8, 16, 32 and 64 byte cases here. */ extern char cl_internal_fill_buf_align8_str[]; extern size_t cl_internal_fill_buf_align8_str_size; int order = ffs(pattern_size / 8) - 1; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_FILL_BUFFER_ALIGN8_8 + order, cl_internal_fill_buf_align8_str, (size_t)cl_internal_fill_buf_align8_str_size, NULL); } else if (pattern_size == 4) { extern char cl_internal_fill_buf_align4_str[]; extern size_t cl_internal_fill_buf_align4_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_FILL_BUFFER_ALIGN4, cl_internal_fill_buf_align4_str, (size_t)cl_internal_fill_buf_align4_str_size, NULL); } else if (size >= 4 && size % 4 == 0 && offset % 4 == 0) { /* The unaligned pattern case: if the fill size and offset are still 4-byte aligned, we can fake it by replicating the pattern into one dword. */
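/* Worked example (illustrative): filling with the single byte 0xAB at a
 * 4-byte-aligned offset and size replicates it into pattern_comb =
 * {0xAB, 0xAB, 0xAB, 0xAB}, i.e. the dword 0xABABABAB, and then reuses the
 * 4-byte fill kernel with pattern_size forced to 4. A 2-byte pattern
 * {0x12, 0x34} is duplicated once to {0x12, 0x34, 0x12, 0x34} the same way. */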
assert(pattern_size == 1 || pattern_size == 2); extern char cl_internal_fill_buf_align4_str[]; extern size_t cl_internal_fill_buf_align4_str_size; if (pattern_size == 2) { memcpy(pattern_comb, pattern, sizeof(char)*2); memcpy(pattern_comb + 2, pattern, sizeof(char)*2); } else { pattern_comb[0] = pattern_comb[1] = pattern_comb[2] = pattern_comb[3] = *(char *)pattern; } ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_FILL_BUFFER_ALIGN4, cl_internal_fill_buf_align4_str, (size_t)cl_internal_fill_buf_align4_str_size, NULL); pattern_size = 4; pattern = pattern_comb; } /* TODO: for the remaining unaligned cases we may want to optimize as in cl_mem_copy, using masks in the kernels. That depends on usage; for now we only handle pattern sizes 1 and 2 directly. */ else if (pattern_size == 2) { extern char cl_internal_fill_buf_align2_str[]; extern size_t cl_internal_fill_buf_align2_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_FILL_BUFFER_ALIGN2, cl_internal_fill_buf_align2_str, (size_t)cl_internal_fill_buf_align2_str_size, NULL); } else if (pattern_size == 1) { extern char cl_internal_fill_buf_unalign_str[]; extern size_t cl_internal_fill_buf_unalign_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_FILL_BUFFER_UNALIGN, cl_internal_fill_buf_unalign_str, (size_t)cl_internal_fill_buf_unalign_str_size, NULL); } else assert(0); if (!ker) return CL_OUT_OF_RESOURCES; size = size / pattern_size; offset = offset / pattern_size; if (size < LOCAL_SZ_0) { local_sz[0] = 1; } else { local_sz[0] = LOCAL_SZ_0; } global_sz[0] = ((size + LOCAL_SZ_0 - 1) / LOCAL_SZ_0) * LOCAL_SZ_0; cl_kernel_set_arg(ker, 0, sizeof(cl_mem), &buffer); cl_kernel_set_arg(ker, 1, pattern_size, pattern); cl_kernel_set_arg(ker, 2, sizeof(cl_uint), &offset); cl_kernel_set_arg(ker, 3, sizeof(cl_uint), &size); if (is_128) cl_kernel_set_arg(ker, 4, pattern_size, pattern1); ret = cl_command_queue_ND_range(queue, ker, e, 1, global_off, global_off, global_sz, global_sz, local_sz, local_sz); cl_kernel_delete(ker); return ret; } LOCAL cl_int cl_mem_copy_buffer_rect(cl_command_queue queue, cl_event event, cl_mem src_buf, cl_mem dst_buf, const size_t *src_origin, const size_t *dst_origin, const size_t *region, size_t src_row_pitch, size_t src_slice_pitch, size_t dst_row_pitch, size_t dst_slice_pitch) { cl_int ret; cl_kernel ker; size_t global_off[] = {0,0,0}; size_t global_sz[] = {1,1,1}; size_t local_sz[] = {LOCAL_SZ_0,LOCAL_SZ_1,LOCAL_SZ_1}; /* If the src and dst rects are both contiguous in memory, the copy degenerates into a plain buffer copy. */ if((region[0] == dst_row_pitch) && (region[0] == src_row_pitch) && (region[1] * src_row_pitch == src_slice_pitch) && (region[1] * dst_row_pitch == dst_slice_pitch)){ cl_int src_offset = src_origin[2]*src_slice_pitch + src_origin[1]*src_row_pitch + src_origin[0]; cl_int dst_offset = dst_origin[2]*dst_slice_pitch + dst_origin[1]*dst_row_pitch + dst_origin[0]; cl_int size = region[0]*region[1]*region[2]; ret = cl_mem_copy(queue, NULL, src_buf, dst_buf, src_offset, dst_offset, size); return ret; } if(region[1] == 1) local_sz[1] = 1; if(region[2] == 1) local_sz[2] = 1; global_sz[0] = ((region[0] + local_sz[0] - 1) / local_sz[0]) * local_sz[0]; global_sz[1] = ((region[1] + local_sz[1] - 1) / local_sz[1]) * local_sz[1]; global_sz[2] = ((region[2] + local_sz[2] - 1) / local_sz[2]) * local_sz[2]; cl_int src_offset = src_origin[2]*src_slice_pitch + src_origin[1]*src_row_pitch + src_origin[0]; cl_int dst_offset = dst_origin[2]*dst_slice_pitch + dst_origin[1]*dst_row_pitch + dst_origin[0];
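/* Illustrative example of the fast path above: region = {64, 16, 4} with a
 * row pitch of 64 and a slice pitch of 64 * 16 = 1024 on both sides means
 * rows and slices are back to back in both buffers, so the rectangle is one
 * contiguous span of 64 * 16 * 4 = 4096 bytes and a single linear
 * cl_mem_copy is enough. */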
/* We use one kernel to copy the data. The kernel is lazily created. */ assert(src_buf->ctx == dst_buf->ctx); /* Set up the kernel and run. */ size_t region0 = region[0]; if( (src_offset % 4 == 0) && (dst_offset % 4 == 0) && (src_row_pitch % 4 == 0) && (dst_row_pitch % 4 == 0) && (src_slice_pitch % 4 == 0) && (dst_slice_pitch % 4 == 0) && (region0 % 4 == 0) ){ extern char cl_internal_copy_buf_rect_align4_str[]; extern size_t cl_internal_copy_buf_rect_align4_str_size; region0 /= 4; src_offset /= 4; dst_offset /= 4; src_row_pitch /= 4; dst_row_pitch /= 4; src_slice_pitch /= 4; dst_slice_pitch /= 4; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_BUFFER_RECT_ALIGN4, cl_internal_copy_buf_rect_align4_str, (size_t)cl_internal_copy_buf_rect_align4_str_size, NULL); }else{ extern char cl_internal_copy_buf_rect_str[]; extern size_t cl_internal_copy_buf_rect_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_BUFFER_RECT, cl_internal_copy_buf_rect_str, (size_t)cl_internal_copy_buf_rect_str_size, NULL); } if (!ker) return CL_OUT_OF_RESOURCES; cl_kernel_set_arg(ker, 0, sizeof(cl_mem), &src_buf); cl_kernel_set_arg(ker, 1, sizeof(cl_mem), &dst_buf); cl_kernel_set_arg(ker, 2, sizeof(cl_int), &region0); cl_kernel_set_arg(ker, 3, sizeof(cl_int), &region[1]); cl_kernel_set_arg(ker, 4, sizeof(cl_int), &region[2]); cl_kernel_set_arg(ker, 5, sizeof(cl_int), &src_offset); cl_kernel_set_arg(ker, 6, sizeof(cl_int), &dst_offset); cl_kernel_set_arg(ker, 7, sizeof(cl_int), &src_row_pitch); cl_kernel_set_arg(ker, 8, sizeof(cl_int), &src_slice_pitch); cl_kernel_set_arg(ker, 9, sizeof(cl_int), &dst_row_pitch); cl_kernel_set_arg(ker, 10, sizeof(cl_int), &dst_slice_pitch); ret = cl_command_queue_ND_range(queue, ker, event, 1, global_off, global_off, global_sz, global_sz, local_sz, local_sz); cl_kernel_delete(ker); return ret; } LOCAL cl_int cl_mem_kernel_copy_image(cl_command_queue queue, cl_event event, struct _cl_mem_image* src_image, struct _cl_mem_image* dst_image, const size_t *src_origin, const size_t *dst_origin, const size_t *region) { cl_int ret; cl_kernel ker = NULL; size_t global_off[] = {0,0,0}; size_t global_sz[] = {1,1,1}; size_t local_sz[] = {LOCAL_SZ_0,LOCAL_SZ_1,LOCAL_SZ_2}; uint32_t fixupDataType; uint32_t savedIntelFmt; if(region[1] == 1) local_sz[1] = 1; if(region[2] == 1) local_sz[2] = 1; global_sz[0] = ((region[0] + local_sz[0] - 1) / local_sz[0]) * local_sz[0]; global_sz[1] = ((region[1] + local_sz[1] - 1) / local_sz[1]) * local_sz[1]; global_sz[2] = ((region[2] + local_sz[2] - 1) / local_sz[2]) * local_sz[2]; switch (src_image->fmt.image_channel_data_type) { case CL_SNORM_INT8: case CL_UNORM_INT8: fixupDataType = CL_UNSIGNED_INT8; break; case CL_HALF_FLOAT: case CL_SNORM_INT16: case CL_UNORM_INT16: fixupDataType = CL_UNSIGNED_INT16; break; case CL_FLOAT: fixupDataType = CL_UNSIGNED_INT32; break; default: fixupDataType = 0; } if (fixupDataType) { cl_image_format fmt; if (src_image->fmt.image_channel_order != CL_BGRA && src_image->fmt.image_channel_order != CL_sBGRA && src_image->fmt.image_channel_order != CL_sRGBA) fmt.image_channel_order = src_image->fmt.image_channel_order; else fmt.image_channel_order = CL_RGBA; fmt.image_channel_data_type = fixupDataType; savedIntelFmt = src_image->intel_fmt; src_image->intel_fmt = cl_image_get_intel_format(&fmt); dst_image->intel_fmt = src_image->intel_fmt; } /* We use one kernel to copy the data. The kernel is lazily created. */ assert(src_image->base.ctx == dst_image->base.ctx); /* Set up the kernel and run.
*/ if(src_image->image_type == CL_MEM_OBJECT_IMAGE1D) { if(dst_image->image_type == CL_MEM_OBJECT_IMAGE1D) { extern char cl_internal_copy_image_1d_to_1d_str[]; extern size_t cl_internal_copy_image_1d_to_1d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_1D_TO_1D, cl_internal_copy_image_1d_to_1d_str, (size_t)cl_internal_copy_image_1d_to_1d_str_size, NULL); } } else if(src_image->image_type == CL_MEM_OBJECT_IMAGE2D) { if(dst_image->image_type == CL_MEM_OBJECT_IMAGE2D) { extern char cl_internal_copy_image_2d_to_2d_str[]; extern size_t cl_internal_copy_image_2d_to_2d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_2D_TO_2D, cl_internal_copy_image_2d_to_2d_str, (size_t)cl_internal_copy_image_2d_to_2d_str_size, NULL); } else if(dst_image->image_type == CL_MEM_OBJECT_IMAGE3D) { extern char cl_internal_copy_image_2d_to_3d_str[]; extern size_t cl_internal_copy_image_2d_to_3d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_2D_TO_3D, cl_internal_copy_image_2d_to_3d_str, (size_t)cl_internal_copy_image_2d_to_3d_str_size, NULL); } else if(dst_image->image_type == CL_MEM_OBJECT_IMAGE2D_ARRAY) { extern char cl_internal_copy_image_2d_to_2d_array_str[]; extern size_t cl_internal_copy_image_2d_to_2d_array_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_2D_TO_2D_ARRAY, cl_internal_copy_image_2d_to_2d_array_str, (size_t)cl_internal_copy_image_2d_to_2d_array_str_size, NULL); } } else if(src_image->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) { if(dst_image->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY) { extern char cl_internal_copy_image_1d_array_to_1d_array_str[]; extern size_t cl_internal_copy_image_1d_array_to_1d_array_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_1D_ARRAY_TO_1D_ARRAY, cl_internal_copy_image_1d_array_to_1d_array_str, (size_t)cl_internal_copy_image_1d_array_to_1d_array_str_size, NULL); } } else if(src_image->image_type == CL_MEM_OBJECT_IMAGE2D_ARRAY) { if(dst_image->image_type == CL_MEM_OBJECT_IMAGE2D_ARRAY) { extern char cl_internal_copy_image_2d_array_to_2d_array_str[]; extern size_t cl_internal_copy_image_2d_array_to_2d_array_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_2D_ARRAY_TO_2D_ARRAY, cl_internal_copy_image_2d_array_to_2d_array_str, (size_t)cl_internal_copy_image_2d_array_to_2d_array_str_size, NULL); } else if(dst_image->image_type == CL_MEM_OBJECT_IMAGE2D) { extern char cl_internal_copy_image_2d_array_to_2d_str[]; extern size_t cl_internal_copy_image_2d_array_to_2d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_2D_ARRAY_TO_2D, cl_internal_copy_image_2d_array_to_2d_str, (size_t)cl_internal_copy_image_2d_array_to_2d_str_size, NULL); } else if(dst_image->image_type == CL_MEM_OBJECT_IMAGE3D) { extern char cl_internal_copy_image_2d_array_to_3d_str[]; extern size_t cl_internal_copy_image_2d_array_to_3d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_2D_ARRAY_TO_3D, cl_internal_copy_image_2d_array_to_3d_str, (size_t)cl_internal_copy_image_2d_array_to_3d_str_size, NULL); } } else if(src_image->image_type == CL_MEM_OBJECT_IMAGE3D) { if(dst_image->image_type == CL_MEM_OBJECT_IMAGE2D) { extern char cl_internal_copy_image_3d_to_2d_str[]; extern size_t cl_internal_copy_image_3d_to_2d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, 
CL_ENQUEUE_COPY_IMAGE_3D_TO_2D, cl_internal_copy_image_3d_to_2d_str, (size_t)cl_internal_copy_image_3d_to_2d_str_size, NULL); } else if(dst_image->image_type == CL_MEM_OBJECT_IMAGE3D) { extern char cl_internal_copy_image_3d_to_3d_str[]; extern size_t cl_internal_copy_image_3d_to_3d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_3D_TO_3D, cl_internal_copy_image_3d_to_3d_str, (size_t)cl_internal_copy_image_3d_to_3d_str_size, NULL); } else if(dst_image->image_type == CL_MEM_OBJECT_IMAGE2D_ARRAY) { extern char cl_internal_copy_image_3d_to_2d_array_str[]; extern size_t cl_internal_copy_image_3d_to_2d_array_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_3D_TO_2D_ARRAY, cl_internal_copy_image_3d_to_2d_array_str, (size_t)cl_internal_copy_image_3d_to_2d_array_str_size, NULL); } } if (!ker) { ret = CL_OUT_OF_RESOURCES; goto fail; } cl_kernel_set_arg(ker, 0, sizeof(cl_mem), &src_image); cl_kernel_set_arg(ker, 1, sizeof(cl_mem), &dst_image); cl_kernel_set_arg(ker, 2, sizeof(cl_int), &region[0]); cl_kernel_set_arg(ker, 3, sizeof(cl_int), &region[1]); cl_kernel_set_arg(ker, 4, sizeof(cl_int), &region[2]); cl_kernel_set_arg(ker, 5, sizeof(cl_int), &src_origin[0]); cl_kernel_set_arg(ker, 6, sizeof(cl_int), &src_origin[1]); cl_kernel_set_arg(ker, 7, sizeof(cl_int), &src_origin[2]); cl_kernel_set_arg(ker, 8, sizeof(cl_int), &dst_origin[0]); cl_kernel_set_arg(ker, 9, sizeof(cl_int), &dst_origin[1]); cl_kernel_set_arg(ker, 10, sizeof(cl_int), &dst_origin[2]); ret = cl_command_queue_ND_range(queue, ker, event, 1, global_off, global_off, global_sz, global_sz, local_sz, local_sz); fail: cl_kernel_delete(ker); if (fixupDataType) { src_image->intel_fmt = savedIntelFmt; dst_image->intel_fmt = savedIntelFmt; } return ret; } LOCAL cl_int cl_mem_copy_image_to_buffer(cl_command_queue queue, cl_event event, struct _cl_mem_image* image, cl_mem buffer, const size_t *src_origin, const size_t dst_offset, const size_t *region) { cl_int ret; cl_kernel ker = NULL; size_t global_off[] = {0,0,0}; size_t global_sz[] = {1,1,1}; size_t local_sz[] = {LOCAL_SZ_0,LOCAL_SZ_1,LOCAL_SZ_2}; uint32_t intel_fmt, bpp; cl_image_format fmt; size_t origin0, region0; size_t kn_dst_offset; int align16 = 0; size_t align_size = 1; size_t w_saved; if(region[1] == 1) local_sz[1] = 1; if(region[2] == 1) local_sz[2] = 1; global_sz[0] = ((region[0] + local_sz[0] - 1) / local_sz[0]) * local_sz[0]; global_sz[1] = ((region[1] + local_sz[1] - 1) / local_sz[1]) * local_sz[1]; global_sz[2] = ((region[2] + local_sz[2] - 1) / local_sz[2]) * local_sz[2]; /* We use one kernel to copy the data. The kernel is lazily created. */
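/* Illustrative note: when the row size, x origin, region width and buffer
 * offset are all 16-byte aligned, the block below temporarily retypes the 2D
 * image as CL_RGBA / CL_UNSIGNED_INT32 and rescales image->w, image->bpp and
 * the x coordinates by align_size, so each work item moves one uint4
 * (16 bytes) instead of a single uchar. The original intel_fmt, bpp and w
 * are saved first and restored after the enqueue. */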
assert(image->base.ctx == buffer->ctx); intel_fmt = image->intel_fmt; bpp = image->bpp; w_saved = image->w; region0 = region[0] * bpp; kn_dst_offset = dst_offset; if((image->image_type == CL_MEM_OBJECT_IMAGE2D) && ((image->w * image->bpp) % 16 == 0) && ((src_origin[0] * bpp) % 16 == 0) && (region0 % 16 == 0) && (dst_offset % 16 == 0)){ fmt.image_channel_order = CL_RGBA; fmt.image_channel_data_type = CL_UNSIGNED_INT32; align16 = 1; align_size = 16; } else{ fmt.image_channel_order = CL_R; fmt.image_channel_data_type = CL_UNSIGNED_INT8; align_size = 1; } image->intel_fmt = cl_image_get_intel_format(&fmt); image->w = (image->w * image->bpp) / align_size; image->bpp = align_size; region0 = (region[0] * bpp) / align_size; origin0 = (src_origin[0] * bpp) / align_size; kn_dst_offset /= align_size; global_sz[0] = ((region0 + local_sz[0] - 1) / local_sz[0]) * local_sz[0]; /* Set up the kernel and run. */ if(image->image_type == CL_MEM_OBJECT_IMAGE2D) { if(align16){ extern char cl_internal_copy_image_2d_to_buffer_align16_str[]; extern size_t cl_internal_copy_image_2d_to_buffer_align16_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER_ALIGN16, cl_internal_copy_image_2d_to_buffer_align16_str, (size_t)cl_internal_copy_image_2d_to_buffer_align16_str_size, NULL); } else{ extern char cl_internal_copy_image_2d_to_buffer_str[]; extern size_t cl_internal_copy_image_2d_to_buffer_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER, cl_internal_copy_image_2d_to_buffer_str, (size_t)cl_internal_copy_image_2d_to_buffer_str_size, NULL); } }else if(image->image_type == CL_MEM_OBJECT_IMAGE3D) { extern char cl_internal_copy_image_3d_to_buffer_str[]; extern size_t cl_internal_copy_image_3d_to_buffer_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER, cl_internal_copy_image_3d_to_buffer_str, (size_t)cl_internal_copy_image_3d_to_buffer_str_size, NULL); } if (!ker) { ret = CL_OUT_OF_RESOURCES; goto fail; } cl_kernel_set_arg(ker, 0, sizeof(cl_mem), &image); cl_kernel_set_arg(ker, 1, sizeof(cl_mem), &buffer); cl_kernel_set_arg(ker, 2, sizeof(cl_int), &region0); cl_kernel_set_arg(ker, 3, sizeof(cl_int), &region[1]); cl_kernel_set_arg(ker, 4, sizeof(cl_int), &region[2]); cl_kernel_set_arg(ker, 5, sizeof(cl_int), &origin0); cl_kernel_set_arg(ker, 6, sizeof(cl_int), &src_origin[1]); cl_kernel_set_arg(ker, 7, sizeof(cl_int), &src_origin[2]); cl_kernel_set_arg(ker, 8, sizeof(cl_int), &kn_dst_offset); ret = cl_command_queue_ND_range(queue, ker, event, 1, global_off, global_off, global_sz, global_sz, local_sz, local_sz); fail: cl_kernel_delete(ker); image->intel_fmt = intel_fmt; image->bpp = bpp; image->w = w_saved; return ret; } LOCAL cl_int cl_mem_copy_buffer_to_image(cl_command_queue queue, cl_event event, cl_mem buffer, struct _cl_mem_image* image, const size_t src_offset, const size_t *dst_origin, const size_t *region) { cl_int ret; cl_kernel ker = NULL; size_t global_off[] = {0,0,0}; size_t global_sz[] = {1,1,1}; size_t local_sz[] = {LOCAL_SZ_0,LOCAL_SZ_1,LOCAL_SZ_2}; uint32_t intel_fmt, bpp; cl_image_format fmt; size_t origin0, region0; size_t kn_src_offset; int align16 = 0; size_t align_size = 1; size_t w_saved = 0; if(region[1] == 1) local_sz[1] = 1; if(region[2] == 1) local_sz[2] = 1; global_sz[0] = ((region[0] + local_sz[0] - 1) / local_sz[0]) * local_sz[0]; global_sz[1] = ((region[1] + local_sz[1] - 1) / local_sz[1]) * local_sz[1]; global_sz[2] = ((region[2] + local_sz[2] - 1) /
local_sz[2]) * local_sz[2]; /* We use one kernel to copy the data. The kernel is lazily created. */ assert(image->base.ctx == buffer->ctx); intel_fmt = image->intel_fmt; bpp = image->bpp; w_saved = image->w; region0 = region[0] * bpp; kn_src_offset = src_offset; if((image->image_type == CL_MEM_OBJECT_IMAGE2D) && ((image->w * image->bpp) % 16 == 0) && ((dst_origin[0] * bpp) % 16 == 0) && (region0 % 16 == 0) && (src_offset % 16 == 0)){ fmt.image_channel_order = CL_RGBA; fmt.image_channel_data_type = CL_UNSIGNED_INT32; align16 = 1; align_size = 16; } else{ fmt.image_channel_order = CL_R; fmt.image_channel_data_type = CL_UNSIGNED_INT8; align_size = 1; } image->intel_fmt = cl_image_get_intel_format(&fmt); image->w = (image->w * image->bpp) / align_size; image->bpp = align_size; region0 = (region[0] * bpp) / align_size; origin0 = (dst_origin[0] * bpp) / align_size; kn_src_offset /= align_size; global_sz[0] = ((region0 + local_sz[0] - 1) / local_sz[0]) * local_sz[0]; /* Set up the kernel and run. */ if(image->image_type == CL_MEM_OBJECT_IMAGE2D) { if(align16){ extern char cl_internal_copy_buffer_to_image_2d_align16_str[]; extern size_t cl_internal_copy_buffer_to_image_2d_align16_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D_ALIGN16, cl_internal_copy_buffer_to_image_2d_align16_str, (size_t)cl_internal_copy_buffer_to_image_2d_align16_str_size, NULL); } else{ extern char cl_internal_copy_buffer_to_image_2d_str[]; extern size_t cl_internal_copy_buffer_to_image_2d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D, cl_internal_copy_buffer_to_image_2d_str, (size_t)cl_internal_copy_buffer_to_image_2d_str_size, NULL); } }else if(image->image_type == CL_MEM_OBJECT_IMAGE3D) { extern char cl_internal_copy_buffer_to_image_3d_str[]; extern size_t cl_internal_copy_buffer_to_image_3d_str_size; ker = cl_context_get_static_kernel_from_bin(queue->ctx, CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D, cl_internal_copy_buffer_to_image_3d_str, (size_t)cl_internal_copy_buffer_to_image_3d_str_size, NULL); } if (!ker) return CL_OUT_OF_RESOURCES; cl_kernel_set_arg(ker, 0, sizeof(cl_mem), &image); cl_kernel_set_arg(ker, 1, sizeof(cl_mem), &buffer); cl_kernel_set_arg(ker, 2, sizeof(cl_int), &region0); cl_kernel_set_arg(ker, 3, sizeof(cl_int), &region[1]); cl_kernel_set_arg(ker, 4, sizeof(cl_int), &region[2]); cl_kernel_set_arg(ker, 5, sizeof(cl_int), &origin0); cl_kernel_set_arg(ker, 6, sizeof(cl_int), &dst_origin[1]); cl_kernel_set_arg(ker, 7, sizeof(cl_int), &dst_origin[2]); cl_kernel_set_arg(ker, 8, sizeof(cl_int), &kn_src_offset); ret = cl_command_queue_ND_range(queue, ker, event, 1, global_off, global_off, global_sz, global_sz, local_sz, local_sz); cl_kernel_delete(ker); image->intel_fmt = intel_fmt; image->bpp = bpp; image->w = w_saved; return ret; } LOCAL void* cl_mem_map(cl_mem mem, int write) { cl_buffer_map(mem->bo, write); assert(cl_buffer_get_virtual(mem->bo)); return cl_buffer_get_virtual(mem->bo); } LOCAL cl_int cl_mem_unmap(cl_mem mem) { cl_buffer_unmap(mem->bo); return CL_SUCCESS; } LOCAL void* cl_mem_map_gtt(cl_mem mem) { cl_buffer_map_gtt(mem->bo); assert(cl_buffer_get_virtual(mem->bo)); mem->mapped_gtt = 1; return cl_buffer_get_virtual(mem->bo); } LOCAL void * cl_mem_map_gtt_unsync(cl_mem mem) { cl_buffer_map_gtt_unsync(mem->bo); assert(cl_buffer_get_virtual(mem->bo)); return cl_buffer_get_virtual(mem->bo); } LOCAL cl_int cl_mem_unmap_gtt(cl_mem mem) { cl_buffer_unmap_gtt(mem->bo); return CL_SUCCESS; } LOCAL void*
cl_mem_map_auto(cl_mem mem, int write) { //if mem is not created from userptr, the offset should be always zero. if (!mem->is_userptr) assert(mem->offset == 0); if (IS_IMAGE(mem) && cl_mem_image(mem)->tiling != CL_NO_TILE) return cl_mem_map_gtt(mem); else { if (mem->is_userptr) { cl_buffer_wait_rendering(mem->bo); return mem->host_ptr; }else return cl_mem_map(mem, write); } } LOCAL cl_int cl_mem_unmap_auto(cl_mem mem) { if (mem->mapped_gtt == 1) { cl_buffer_unmap_gtt(mem->bo); mem->mapped_gtt = 0; } else if (!mem->is_userptr) cl_buffer_unmap(mem->bo); return CL_SUCCESS; } LOCAL cl_int cl_mem_pin(cl_mem mem) { assert(mem); if (UNLIKELY((mem->flags & CL_MEM_PINNABLE) == 0)) return CL_INVALID_MEM_OBJECT; cl_buffer_pin(mem->bo, 4096); return CL_SUCCESS; } LOCAL cl_int cl_mem_unpin(cl_mem mem) { assert(mem); if (UNLIKELY((mem->flags & CL_MEM_PINNABLE) == 0)) return CL_INVALID_MEM_OBJECT; cl_buffer_unpin(mem->bo); return CL_SUCCESS; } LOCAL cl_mem cl_mem_new_libva_buffer(cl_context ctx, unsigned int bo_name, cl_int* errcode) { cl_int err = CL_SUCCESS; cl_mem mem = NULL; mem = cl_mem_allocate(CL_MEM_BUFFER_TYPE, ctx, 0, 0, CL_FALSE, NULL, NULL, &err); if (mem == NULL || err != CL_SUCCESS) goto error; size_t sz = 0; mem->bo = cl_buffer_get_buffer_from_libva(ctx, bo_name, &sz); if (mem->bo == NULL) { err = CL_MEM_OBJECT_ALLOCATION_FAILURE; goto error; } mem->size = sz; exit: if (errcode) *errcode = err; return mem; error: cl_mem_delete(mem); mem = NULL; goto exit; } LOCAL cl_mem cl_mem_new_libva_image(cl_context ctx, unsigned int bo_name, size_t offset, size_t width, size_t height, cl_image_format fmt, size_t row_pitch, cl_int *errcode) { cl_int err = CL_SUCCESS; cl_mem mem = NULL; struct _cl_mem_image *image = NULL; uint32_t intel_fmt, bpp; /* Get the size of each pixel */ if (UNLIKELY((err = cl_image_byte_per_pixel(&fmt, &bpp)) != CL_SUCCESS)) goto error; intel_fmt = cl_image_get_intel_format(&fmt); if (intel_fmt == INTEL_UNSUPPORTED_FORMAT) { err = CL_IMAGE_FORMAT_NOT_SUPPORTED; goto error; } mem = cl_mem_allocate(CL_MEM_IMAGE_TYPE, ctx, 0, 0, 0, NULL, NULL, &err); if (mem == NULL || err != CL_SUCCESS) goto error; image = cl_mem_image(mem); mem->bo = cl_buffer_get_image_from_libva(ctx, bo_name, image); if (mem->bo == NULL) { err = CL_MEM_OBJECT_ALLOCATION_FAILURE; goto error; } image->w = width; image->h = height; image->image_type = CL_MEM_OBJECT_IMAGE2D; image->depth = 1; image->fmt = fmt; image->intel_fmt = intel_fmt; image->bpp = bpp; image->row_pitch = row_pitch; image->slice_pitch = 0; // NOTE: tiling of image is set in cl_buffer_get_image_from_libva(). 
image->tile_x = 0; image->tile_y = 0; image->offset = offset; exit: if (errcode) *errcode = err; return mem; error: cl_mem_delete(mem); mem = NULL; goto exit; } LOCAL cl_int cl_mem_get_fd(cl_mem mem, int* fd) { cl_int err = CL_SUCCESS; if(cl_buffer_get_fd(mem->bo, fd)) err = CL_INVALID_OPERATION; return err; } LOCAL cl_mem cl_mem_new_buffer_from_fd(cl_context ctx, int fd, int buffer_sz, cl_int* errcode) { cl_int err = CL_SUCCESS; cl_mem mem = NULL; mem = cl_mem_allocate(CL_MEM_BUFFER_TYPE, ctx, 0, 0, CL_FALSE, NULL, NULL, &err); if (mem == NULL || err != CL_SUCCESS) goto error; mem->bo = cl_buffer_get_buffer_from_fd(ctx, fd, buffer_sz); if (mem->bo == NULL) { err = CL_MEM_OBJECT_ALLOCATION_FAILURE; goto error; } mem->size = buffer_sz; exit: if (errcode) *errcode = err; return mem; error: cl_mem_delete(mem); mem = NULL; goto exit; } LOCAL cl_mem cl_mem_new_image_from_fd(cl_context ctx, int fd, int image_sz, size_t offset, size_t width, size_t height, cl_image_format fmt, size_t row_pitch, cl_int *errcode) { cl_int err = CL_SUCCESS; cl_mem mem = NULL; struct _cl_mem_image *image = NULL; uint32_t intel_fmt, bpp; /* Get the size of each pixel */ if (UNLIKELY((err = cl_image_byte_per_pixel(&fmt, &bpp)) != CL_SUCCESS)) goto error; intel_fmt = cl_image_get_intel_format(&fmt); if (intel_fmt == INTEL_UNSUPPORTED_FORMAT) { err = CL_IMAGE_FORMAT_NOT_SUPPORTED; goto error; } mem = cl_mem_allocate(CL_MEM_IMAGE_TYPE, ctx, 0, 0, 0, NULL, NULL, &err); if (mem == NULL || err != CL_SUCCESS) goto error; image = cl_mem_image(mem); mem->bo = cl_buffer_get_image_from_fd(ctx, fd, image_sz, image); if (mem->bo == NULL) { err = CL_MEM_OBJECT_ALLOCATION_FAILURE; goto error; } mem->size = image_sz; image->w = width; image->h = height; image->image_type = CL_MEM_OBJECT_IMAGE2D; image->depth = 1; image->fmt = fmt; image->intel_fmt = intel_fmt; image->bpp = bpp; image->row_pitch = row_pitch; image->slice_pitch = 0; // NOTE: tiling of image is set in cl_buffer_get_image_from_fd(). image->tile_x = 0; image->tile_y = 0; image->offset = offset; exit: if (errcode) *errcode = err; return mem; error: cl_mem_delete(mem); mem = NULL; goto exit; } LOCAL cl_int cl_mem_record_map_mem(cl_mem mem, void *ptr, void **mem_ptr, size_t offset, size_t size, const size_t *origin, const size_t *region) { // TODO: Need to add MT safe logic. cl_int slot = -1; int err = CL_SUCCESS; size_t sub_offset = 0; if(mem->type == CL_MEM_SUBBUFFER_TYPE) { struct _cl_mem_buffer* buffer = (struct _cl_mem_buffer*)mem; sub_offset = buffer->sub_offset; } ptr = (char*)ptr + offset + sub_offset; if(mem->flags & CL_MEM_USE_HOST_PTR) { assert(mem->host_ptr); //only calc ptr here, will do memcpy in enqueue *mem_ptr = (char *)mem->host_ptr + offset + sub_offset; } else { *mem_ptr = ptr; } /* Record the mapped address. 
*/ if (!mem->mapped_ptr_sz) { mem->mapped_ptr_sz = 16; mem->mapped_ptr = (cl_mapped_ptr *)malloc( sizeof(cl_mapped_ptr) * mem->mapped_ptr_sz); if (!mem->mapped_ptr) { cl_mem_unmap_auto(mem); err = CL_OUT_OF_HOST_MEMORY; goto error; } memset(mem->mapped_ptr, 0, mem->mapped_ptr_sz * sizeof(cl_mapped_ptr)); slot = 0; } else { int i = 0; for (; i < mem->mapped_ptr_sz; i++) { if (mem->mapped_ptr[i].ptr == NULL) { slot = i; break; } } if (i == mem->mapped_ptr_sz) { cl_mapped_ptr *new_ptr = (cl_mapped_ptr *)malloc( sizeof(cl_mapped_ptr) * mem->mapped_ptr_sz * 2); if (!new_ptr) { cl_mem_unmap_auto(mem); err = CL_OUT_OF_HOST_MEMORY; goto error; } memset(new_ptr, 0, 2 * mem->mapped_ptr_sz * sizeof(cl_mapped_ptr)); memcpy(new_ptr, mem->mapped_ptr, mem->mapped_ptr_sz * sizeof(cl_mapped_ptr)); slot = mem->mapped_ptr_sz; mem->mapped_ptr_sz *= 2; free(mem->mapped_ptr); mem->mapped_ptr = new_ptr; } } assert(slot != -1); mem->mapped_ptr[slot].ptr = *mem_ptr; mem->mapped_ptr[slot].v_ptr = ptr; mem->mapped_ptr[slot].size = size; if(origin) { assert(region); mem->mapped_ptr[slot].origin[0] = origin[0]; mem->mapped_ptr[slot].origin[1] = origin[1]; mem->mapped_ptr[slot].origin[2] = origin[2]; mem->mapped_ptr[slot].region[0] = region[0]; mem->mapped_ptr[slot].region[1] = region[1]; mem->mapped_ptr[slot].region[2] = region[2]; } mem->map_ref++; error: if (err != CL_SUCCESS) *mem_ptr = NULL; return err; } LOCAL cl_int cl_mem_set_destructor_callback(cl_mem memobj, void(CL_CALLBACK *pfn_notify)(cl_mem, void *), void *user_data) { cl_mem_dstr_cb cb = cl_calloc(1, sizeof(_cl_mem_dstr_cb)); if (cb == NULL) { return CL_OUT_OF_HOST_MEMORY; } memset(cb, 0, sizeof(_cl_mem_dstr_cb)); list_node_init(&cb->node); cb->pfn_notify = pfn_notify; cb->user_data = user_data; CL_OBJECT_LOCK(memobj); list_add(&memobj->dstr_cb_head, &cb->node); CL_OBJECT_UNLOCK(memobj); return CL_SUCCESS; } Beignet-1.3.2-Source/src/kernels/000775 001750 001750 00000000000 13174334761 015723 5ustar00yryr000000 000000 Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_2d_array_to_2d.cl000664 001750 001750 00000001656 13161142102 025571 0ustar00yryr000000 000000 kernel void __cl_copy_image_2d_array_to_2d(__read_only image2d_array_t src_image, __write_only image2d_t dst_image, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2) { int i = get_global_id(0); int j = get_global_id(1); int4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; int4 src_coord; int2 dst_coord; if((i >= region0) || (j>= region1)) return; src_coord.x = src_origin0 + i; src_coord.y = src_origin1 + j; src_coord.z = src_origin2; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin1 + j; color = read_imagei(src_image, sampler, src_coord); write_imagei(dst_image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_1d_to_1d.cl000664 001750 001750 00000001551 13161142102 024363 0ustar00yryr000000 000000 kernel void __cl_copy_image_1d_to_1d(__read_only image1d_t src_image, __write_only image1d_t dst_image, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int4 color; const sampler_t sampler = 
CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; int src_coord; int dst_coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; src_coord = src_origin0 + i; dst_coord = dst_origin0 + i; color = read_imagei(src_image, sampler, src_coord); write_imagei(dst_image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_fill_image_1d.cl000664 001750 001750 00000000762 13161142102 023254 0ustar00yryr000000 000000 kernel void __cl_fill_image_1d( __write_only image1d_t image, float4 pattern, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int origin0, unsigned int origin1, unsigned int origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; coord = origin0 + i; write_imagef(image, coord, pattern); } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_3d_to_3d.cl000664 001750 001750 00000001763 13161142102 024374 0ustar00yryr000000 000000 kernel void __cl_copy_image_3d_to_3d(__read_only image3d_t src_image, __write_only image3d_t dst_image, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; int4 src_coord; int4 dst_coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; src_coord.x = src_origin0 + i; src_coord.y = src_origin1 + j; src_coord.z = src_origin2 + k; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin1 + j; dst_coord.z = dst_origin2 + k; color = read_imagei(src_image, sampler, src_coord); write_imagei(dst_image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_buffer_to_image_2d.cl000664 001750 001750 00000001403 13161142102 025325 0ustar00yryr000000 000000 kernel void __cl_copy_buffer_to_image_2d(__write_only image2d_t image, global uchar* buffer, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2, unsigned int src_offset) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); uint4 color = (uint4)(0); int2 dst_coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin1 + j; src_offset += (k * region1 + j) * region0 + i; color.x = buffer[src_offset]; write_imageui(image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_2d_to_buffer_align16.cl000664 001750 001750 00000001437 13161142102 026655 0ustar00yryr000000 000000 kernel void __cl_copy_image_2d_to_buffer_align16( __read_only image2d_t image, global uint4* buffer, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_offset) { int i = get_global_id(0); int j = get_global_id(1); if((i >= region0) || (j>= region1)) return; uint4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; int2 src_coord; src_coord.x = src_origin0 + i; src_coord.y = src_origin1 + j; color = read_imageui(image, sampler, src_coord); *(buffer + dst_offset + region0*j + i) = color; } 
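/* Illustrative note (appended comment, not in the original kernel): this
 * align16 variant is the device-side half of the repacking trick in
 * cl_mem_copy_image_to_buffer: the host retypes the image as RGBA32, so the
 * single read_imageui above fetches 16 bytes and the uint4 store writes them
 * to the buffer in one operation per work item. */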
Beignet-1.3.2-Source/src/kernels/cl_internal_fill_image_3d.cl000664 001750 001750 00000001047 13161142102 023253 0ustar00yryr000000 000000 kernel void __cl_fill_image_3d( __write_only image3d_t image, float4 pattern, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int origin0, unsigned int origin1, unsigned int origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int4 coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; coord.x = origin0 + i; coord.y = origin1 + j; coord.z = origin2 + k; write_imagef(image, coord, pattern); } Beignet-1.3.2-Source/src/kernels/cl_internal_fill_buf_align4.cl000664 001750 001750 00000000335 13161142102 023614 0ustar00yryr000000 000000 kernel void __cl_fill_region_align4 ( global float* dst, float pattern, unsigned int offset, unsigned int size) { int i = get_global_id(0); if (i < size) { dst[i+offset] = pattern; } } Beignet-1.3.2-Source/src/kernels/cl_internal_fill_buf_align128.cl000664 001750 001750 00000000467 13161142102 023771 0ustar00yryr000000 000000 kernel void __cl_fill_region_align128 ( global float16* dst, float16 pattern0, unsigned int offset, unsigned int size, float16 pattern1) { int i = get_global_id(0); if (i < size) { dst[i*2+offset] = pattern0; dst[i*2+offset+1] = pattern1; } } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_buf_align16.cl000664 001750 001750 00000000730 13161142102 023722 0ustar00yryr000000 000000 kernel void __cl_copy_region_align16 ( global float* src, unsigned int src_offset, global float* dst, unsigned int dst_offset, unsigned int size) { int i = get_global_id(0) * 4; if (i < size*4) { dst[i+dst_offset] = src[i+src_offset]; dst[i+dst_offset + 1] = src[i+src_offset + 1]; dst[i+dst_offset + 2] = src[i+src_offset + 2]; dst[i+dst_offset + 3] = src[i+src_offset + 3]; } } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_3d_to_buffer.cl000664 001750 001750 00000001676 13161142102 025342 0ustar00yryr000000 000000 #define IMAGE_TYPE image3d_t #define COORD_TYPE int4 kernel void __cl_copy_image_3d_to_buffer ( __read_only IMAGE_TYPE image, global uchar* buffer, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_offset) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); uint4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; COORD_TYPE src_coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; src_coord.x = src_origin0 + i; src_coord.y = src_origin1 + j; src_coord.z = src_origin2 + k; color = read_imageui(image, sampler, src_coord); dst_offset += (k * region1 + j) * region0 + i; buffer[dst_offset] = color.x; } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_2d_to_2d_array.cl000664 001750 001750 00000001725 13161142102 025566 0ustar00yryr000000 000000 kernel void __cl_copy_image_2d_to_2d_array(__read_only image2d_t src_image, __write_only image2d_array_t dst_image, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2) { int i = get_global_id(0); int j = get_global_id(1); int4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; int2 src_coord; int4 dst_coord; if((i >= region0) || (j>= region1)) return; src_coord.x = src_origin0 + 
i; src_coord.y = src_origin1 + j; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin1 + j; dst_coord.z = dst_origin2; color = read_imagei(src_image, sampler, src_coord); write_imagei(dst_image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_2d_to_2d.cl000664 001750 001750 00000001661 13161142102 024367 0ustar00yryr000000 000000 kernel void __cl_copy_image_2d_to_2d(__read_only image2d_t src_image, __write_only image2d_t dst_image, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; int2 src_coord; int2 dst_coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; src_coord.x = src_origin0 + i; src_coord.y = src_origin1 + j; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin1 + j; color = read_imagei(src_image, sampler, src_coord); write_imagei(dst_image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_fill_image_1d_array.cl000664 001750 001750 00000001032 13161142102 024441 0ustar00yryr000000 000000 kernel void __cl_fill_image_1d_array( __write_only image1d_array_t image, float4 pattern, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int origin0, unsigned int origin1, unsigned int origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int2 coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; coord.x = origin0 + i; coord.y = origin2 + k; write_imagef(image, coord, pattern); } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_buf_align4.cl000664 001750 001750 00000000442 13161142102 023637 0ustar00yryr000000 000000 kernel void __cl_copy_region_align4 ( global float* src, unsigned int src_offset, global float* dst, unsigned int dst_offset, unsigned int size) { int i = get_global_id(0); if (i < size) dst[i+dst_offset] = src[i+src_offset]; } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_buf_rect.cl000664 001750 001750 00000001421 13161142102 023414 0ustar00yryr000000 000000 kernel void __cl_copy_buffer_rect ( global char* src, global char* dst, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_offset, unsigned int dst_offset, unsigned int src_row_pitch, unsigned int src_slice_pitch, unsigned int dst_row_pitch, unsigned int dst_slice_pitch) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); if((i >= region0) || (j>= region1) || (k>=region2)) return; src_offset += k * src_slice_pitch + j * src_row_pitch + i; dst_offset += k * dst_slice_pitch + j * dst_row_pitch + i; dst[dst_offset] = src[src_offset]; } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_1d_array_to_1d_array.cl000664 001750 001750 00000001635 13161142102 026762 0ustar00yryr000000 000000 kernel void __cl_copy_image_1d_array_to_1d_array(__read_only image1d_array_t src_image, __write_only image1d_array_t dst_image, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2) { int i = get_global_id(0); int k = get_global_id(2); int4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | 
CLK_FILTER_NEAREST; int2 src_coord; int2 dst_coord; if((i >= region0) || (k>=region2)) return; src_coord.x = src_origin0 + i; src_coord.y = src_origin2 + k; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin2 + k; color = read_imagei(src_image, sampler, src_coord); write_imagei(dst_image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_buf_rect_align4.cl000664 001750 001750 00000001426 13161142102 024657 0ustar00yryr000000 000000 kernel void __cl_copy_buffer_rect_align4 ( global int* src, global int* dst, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_offset, unsigned int dst_offset, unsigned int src_row_pitch, unsigned int src_slice_pitch, unsigned int dst_row_pitch, unsigned int dst_slice_pitch) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); if((i >= region0) || (j>= region1) || (k>=region2)) return; src_offset += k * src_slice_pitch + j * src_row_pitch + i; dst_offset += k * dst_slice_pitch + j * dst_row_pitch + i; dst[dst_offset] = src[src_offset]; } Beignet-1.3.2-Source/src/kernels/cl_internal_fill_image_2d_array.cl000664 001750 001750 00000001063 13161142102 024446 0ustar00yryr000000 000000 kernel void __cl_fill_image_2d_array( __write_only image2d_array_t image, float4 pattern, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int origin0, unsigned int origin1, unsigned int origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int4 coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; coord.x = origin0 + i; coord.y = origin1 + j; coord.z = origin2 + k; write_imagef(image, coord, pattern); } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_buf_unalign_dst_offset.cl000664 001750 001750 00000001777 13161142102 026352 0ustar00yryr000000 000000 kernel void __cl_copy_region_unalign_dst_offset ( global int* src, unsigned int src_offset, global int* dst, unsigned int dst_offset, unsigned int size, unsigned int first_mask, unsigned int last_mask, unsigned int shift, unsigned int dw_mask) { int i = get_global_id(0); unsigned int tmp = 0; if (i > size -1) return; /* last dw, need to be careful, not to overflow the source. 
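 * (the last destination dword normally combines bytes from src[i] and
 * src[i+1]; when last_mask shows that the src[i+1] contribution is not
 * needed, the branch below skips that read so it never touches memory past
 * the end of the source buffer)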
*/ if ((i == size - 1) && ((last_mask & (~(~dw_mask >> shift))) == 0)) { tmp = ((src[src_offset + i] & ~dw_mask) >> shift); } else { tmp = ((src[src_offset + i] & ~dw_mask) >> shift) | ((src[src_offset + i + 1] & dw_mask) << (32 - shift)); } if (i == 0) { dst[dst_offset] = (dst[dst_offset] & first_mask) | (tmp & (~first_mask)); } else if (i == size - 1) { dst[i+dst_offset] = (tmp & last_mask) | (dst[i+dst_offset] & (~last_mask)); } else { dst[i+dst_offset] = tmp; } } Beignet-1.3.2-Source/src/kernels/cl_internal_fill_image_2d.cl000664 001750 001750 00000001016 13161142102 023246 0ustar00yryr000000 000000 kernel void __cl_fill_image_2d( __write_only image2d_t image, float4 pattern, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int origin0, unsigned int origin1, unsigned int origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int2 coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; coord.x = origin0 + i; coord.y = origin1 + j; write_imagef(image, coord, pattern); } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_2d_array_to_2d_array.cl000664 001750 001750 00000002013 13161142102 026753 0ustar00yryr000000 000000 kernel void __cl_copy_image_2d_array_to_2d_array(__read_only image2d_array_t src_image, __write_only image2d_array_t dst_image, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; int4 src_coord; int4 dst_coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; src_coord.x = src_origin0 + i; src_coord.y = src_origin1 + j; src_coord.z = src_origin2 + k; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin1 + j; dst_coord.z = dst_origin2 + k; color = read_imagei(src_image, sampler, src_coord); write_imagei(dst_image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_2d_array_to_3d.cl000664 001750 001750 00000001777 13161142102 025576 0ustar00yryr000000 000000 kernel void __cl_copy_image_2d_array_to_3d(__read_only image2d_array_t src_image, __write_only image3d_t dst_image, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; int4 src_coord; int4 dst_coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; src_coord.x = src_origin0 + i; src_coord.y = src_origin1 + j; src_coord.z = src_origin2 + k; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin1 + j; dst_coord.z = dst_origin2 + k; color = read_imagei(src_image, sampler, src_coord); write_imagei(dst_image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_2d_to_3d.cl000664 001750 001750 00000001766 13161142102 024376 0ustar00yryr000000 000000 kernel void __cl_copy_image_2d_to_3d(__read_only image2d_t src_image, __write_only image3d_t dst_image, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned 
int src_origin2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; int2 src_coord; int4 dst_coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; src_coord.x = src_origin0 + i; src_coord.y = src_origin1 + j; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin1 + j; dst_coord.z = dst_origin2 + k; color = read_imagei(src_image, sampler, src_coord); write_imagei(dst_image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_fill_buf_align2.cl000664 001750 001750 00000000336 13161142102 023613 0ustar00yryr000000 000000 kernel void __cl_fill_region_align2 ( global char2 * dst, char2 pattern, unsigned int offset, unsigned int size) { int i = get_global_id(0); if (i < size) { dst[i+offset] = pattern; } } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_buffer_to_image_2d_align16.cl000664 001750 001750 00000001316 13161142102 026651 0ustar00yryr000000 000000 kernel void __cl_copy_buffer_to_image_2d_align16(__write_only image2d_t image, global uint4* buffer, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2, unsigned int src_offset) { int i = get_global_id(0); int j = get_global_id(1); uint4 color = (uint4)(0); int2 dst_coord; if((i >= region0) || (j>= region1)) return; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin1 + j; src_offset += j * region0 + i; color = buffer[src_offset]; write_imageui(image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_fill_buf_unalign.cl000664 001750 001750 00000000337 13161142102 024075 0ustar00yryr000000 000000 kernel void __cl_fill_region_unalign ( global char * dst, char pattern, unsigned int offset, unsigned int size) { int i = get_global_id(0); if (i < size) { dst[i+offset] = pattern; } } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_buf_unalign_same_offset.cl000664 001750 001750 00000001226 13161142102 026472 0ustar00yryr000000 000000 kernel void __cl_copy_region_unalign_same_offset ( global int* src, unsigned int src_offset, global int* dst, unsigned int dst_offset, unsigned int size, unsigned int first_mask, unsigned int last_mask) { int i = get_global_id(0); if (i > size -1) return; if (i == 0) { dst[dst_offset] = (dst[dst_offset] & first_mask) | (src[src_offset] & (~first_mask)); } else if (i == size - 1) { dst[i+dst_offset] = (src[i+src_offset] & last_mask) | (dst[i+dst_offset] & (~last_mask)); } else { dst[i+dst_offset] = src[i+src_offset]; } } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_buffer_to_image_3d.cl000664 001750 001750 00000001444 13161142102 025333 0ustar00yryr000000 000000 kernel void __cl_copy_buffer_to_image_3d(__write_only image3d_t image, global uchar* buffer, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2, unsigned int src_offset) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); uint4 color = (uint4)(0); int4 dst_coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin1 + j; dst_coord.z = dst_origin2 + k; src_offset += (k * region1 + j) * region0 + i; color.x = buffer[src_offset]; write_imageui(image, dst_coord, color); } 
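For the unaligned copy kernels such as __cl_copy_region_unalign_same_offset above, the host must supply the boundary masks. A minimal sketch of how they could be derived (an assumption for illustration: little-endian byte order, with 'start' and 'end' the byte offsets of the copied range, equally aligned for src and dst as the same-offset kernel requires):

#include <stddef.h>

/* first_mask marks the bytes of the first destination dword to preserve;
 * last_mask marks the bytes of the last destination dword to take from
 * the source, matching the kernel's (dst & first_mask) | (src & ~first_mask)
 * and (src & last_mask) | (dst & ~last_mask) boundary handling. */
static void unalign_copy_masks(size_t start, size_t end,
                               unsigned int *first_mask, unsigned int *last_mask)
{
  unsigned int head = (unsigned int)(start & 3); /* bytes before the range in its first dword */
  unsigned int tail = (unsigned int)(end & 3);   /* bytes of the range in its last dword */
  *first_mask = head ? ((1u << (8 * head)) - 1) : 0u;
  *last_mask  = tail ? ((1u << (8 * tail)) - 1) : ~0u;
}

With start dword-aligned (head == 0) the first dword is copied whole, and with end aligned the last mask degenerates to all ones, so both boundaries fall back to full-dword copies.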
Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_3d_to_2d.cl000664 001750 001750 00000001722 13161142102 024366 0ustar00yryr000000 000000 kernel void __cl_copy_image_3d_to_2d(__read_only image3d_t src_image, __write_only image2d_t dst_image, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); int4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; int4 src_coord; int2 dst_coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; src_coord.x = src_origin0 + i; src_coord.y = src_origin1 + j; src_coord.z = src_origin2 + k; dst_coord.x = dst_origin0 + i; dst_coord.y = dst_origin1 + j; color = read_imagei(src_image, sampler, src_coord); write_imagei(dst_image, dst_coord, color); } Beignet-1.3.2-Source/src/kernels/cl_internal_fill_buf_align8.cl000664 001750 001750 00000000656 13161142102 023626 0ustar00yryr000000 000000 #define COMPILER_ABS_FUNC_N(N) \ kernel void __cl_fill_region_align8_##N ( global float##N* dst, float##N pattern, \ unsigned int offset, unsigned int size) { \ int i = get_global_id(0); \ if (i < size) { \ dst[i+offset] = pattern; \ } \ } COMPILER_ABS_FUNC_N(2) COMPILER_ABS_FUNC_N(4) COMPILER_ABS_FUNC_N(8) COMPILER_ABS_FUNC_N(16) Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_2d_to_buffer.cl000664 001750 001750 00000001540 13161142102 025327 0ustar00yryr000000 000000 kernel void __cl_copy_image_2d_to_buffer( __read_only image2d_t image, global uchar* buffer, unsigned int region0, unsigned int region1, unsigned int region2, unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2, unsigned int dst_offset) { int i = get_global_id(0); int j = get_global_id(1); int k = get_global_id(2); uint4 color; const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; int2 src_coord; if((i >= region0) || (j>= region1) || (k>=region2)) return; src_coord.x = src_origin0 + i; src_coord.y = src_origin1 + j; color = read_imageui(image, sampler, src_coord); dst_offset += (k * region1 + j) * region0 + i; buffer[dst_offset] = color.x; } Beignet-1.3.2-Source/src/kernels/cl_internal_block_motion_estimate_intel.cl000664 001750 001750 00000035551 13161142102 026351 0ustar00yryr000000 000000 typedef struct _motion_estimation_desc_intel { uint mb_block_type; uint subpixel_mode; uint sad_adjust_mode; uint search_path_type; } accelerator_intel_t; __kernel __attribute__((reqd_work_group_size(16,1,1))) void block_motion_estimate_intel(accelerator_intel_t accel, __read_only image2d_t src_image, __read_only image2d_t ref_image, __global short2 * prediction_motion_vector_buffer, __global short2 * motion_vector_buffer, __global ushort * residuals){ uint src_grf0_dw7; uint src_grf0_dw6; uint src_grf0_dw5; uint src_grf0_dw4; uint src_grf0_dw3; uint src_grf0_dw2; uint src_grf0_dw1; uint src_grf0_dw0; uint src_grf1_dw7; uint src_grf1_dw6; uint src_grf1_dw5; uint src_grf1_dw4; uint src_grf1_dw3; uint src_grf1_dw2; uint src_grf1_dw1; uint src_grf1_dw0; uint src_grf2_dw7; uint src_grf2_dw6; uint src_grf2_dw5; uint src_grf2_dw4; uint src_grf2_dw3; uint src_grf2_dw2; uint src_grf2_dw1; uint src_grf2_dw0; uint src_grf3_dw7; uint src_grf3_dw6; uint src_grf3_dw5; uint src_grf3_dw4; uint src_grf3_dw3; uint src_grf3_dw2; uint src_grf3_dw1; uint 
src_grf3_dw0; uint src_grf4_dw7; uint src_grf4_dw6; uint src_grf4_dw5; uint src_grf4_dw4; uint src_grf4_dw3; uint src_grf4_dw2; uint src_grf4_dw1; uint src_grf4_dw0; uint8 vme_result = (0); int lgid_x = get_group_id(0); int lgid_y = get_group_id(1); int num_groups_x = get_num_groups(0); int index = lgid_y * num_groups_x + lgid_x; uint2 srcCoord = 0; short2 predict_mv = 0; if(prediction_motion_vector_buffer != NULL){ predict_mv = prediction_motion_vector_buffer[index]; predict_mv.x = predict_mv.x / 4; predict_mv.y = predict_mv.y / 4; } srcCoord.x = lgid_x * 16; srcCoord.y = lgid_y * 16; //CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL if(accel.search_path_type == 0x0){ //src_grf0_dw5 = (Ref_Height << 24) | (Ref_Width << 16) | (Ignored << 8) | (Dispatch_Id); src_grf0_dw5 = (20 << 24) | (20 << 16) | (0 << 8) | (0); //src_grf0_dw1 = (Ref1Y << 16) | (Ref1X); src_grf0_dw1 = ((-2 + predict_mv.y) << 16 ) | ((-2 + predict_mv.x) & 0x0000ffff); //src_grf0_dw0 = (Ref0Y << 16) | (Ref0X); src_grf0_dw0 = ((-2 + predict_mv.y) << 16 ) | ((-2 + predict_mv.x) & 0x0000ffff); //src_grf1_dw2 = (Start1Y << 28) | (Start1X << 24) | (Start0Y << 20) src_grf1_dw2 = (0 << 28) | (0 << 24) | (0 << 20) //| (Start0X << 16) | (Max_Num_SU << 8) | (LenSP); | (0 << 16) | (2 << 8) | (2); } //CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL else if(accel.search_path_type == 0x1){ src_grf0_dw5 = (24 << 24) | (24 << 16) | (0 << 8) | (0); src_grf0_dw1 = ((-4 + predict_mv.y) << 16 ) | ((-4 + predict_mv.x) & 0x0000ffff); src_grf0_dw0 = ((-4 + predict_mv.y) << 16 ) | ((-4 + predict_mv.x) & 0x0000ffff); src_grf1_dw2 = (0 << 28) | (0 << 24) | (0 << 20) | (0 << 16) | (48 << 8) | (48); } //CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL else if(accel.search_path_type == 0x5){ src_grf0_dw5 = (40 << 24) | (48 << 16) | (0 << 8) | (0); src_grf0_dw1 = ((-12 + predict_mv.y) << 16 ) | ((-16 + predict_mv.x) & 0x0000ffff); src_grf0_dw0 = ((-12 + predict_mv.y) << 16 ) | ((-16 + + predict_mv.x) & 0x0000ffff); src_grf1_dw2 = (0 << 28) | (0 << 24) | (0 << 20) | (0 << 16) | (48 << 8) | (48); } /*Deal with mb_block_type & sad_adjust_mode & subpixel_mode*/ uchar sub_mb_part_mask = 0; //CL_ME_MB_TYPE_16x16_INTEL if(accel.mb_block_type == 0x0) sub_mb_part_mask = 0x7e; //CL_ME_MB_TYPE_8x8_INTEL else if(accel.mb_block_type == 0x1) sub_mb_part_mask = 0x77; //CL_ME_MB_TYPE_4x4_INTEL else if(accel.mb_block_type == 0x2) sub_mb_part_mask = 0x3f; uchar inter_sad = 0; //CL_ME_SAD_ADJUST_MODE_NONE_INTEL if(accel.sad_adjust_mode == 0x0) inter_sad = 0; //CL_ME_SAD_ADJUST_MODE_HAAR_INTEL else if(accel.sad_adjust_mode == 0x1) inter_sad = 2; uchar sub_pel_mode = 0; //CL_ME_SUBPIXEL_MODE_INTEGER_INTEL if(accel.subpixel_mode == 0x0) sub_pel_mode = 0; //CL_ME_SUBPIXEL_MODE_HPEL_INTEL else if(accel.subpixel_mode == 0x1) sub_pel_mode = 1; //CL_ME_SUBPIXEL_MODE_QPEL_INTEL else if(accel.subpixel_mode == 0x2) sub_pel_mode = 3; //src_grf0_dw3 = (Reserved << 31) | (Sub_Mb_Part_Mask << 24) | (Intra_SAD << 22) src_grf0_dw3 = (0 << 31) | (sub_mb_part_mask << 24) | (0 << 22) //| (Inter_SAD << 20) | (BB_Skip_Enabled << 19) | (Reserverd << 18) | (inter_sad << 20) | (0 << 19) | (0 << 18) //| (Dis_Aligned_Src_Fetch << 17) | (Dis_Aligned_Ref_Fetch << 16) | (Dis_Field_Cache_Alloc << 15) | (0 << 17) | (0 << 16) | (0 << 15) //| (Skip_Type << 14) | (Sub_Pel_Mode << 12) | (Dual_Search_Path_Opt << 11) | (0 << 14) | (sub_pel_mode << 12) | (0 << 11) //| (Search_Ctrl << 8) | (Ref_Access << 7) | (SrcAccess << 6) | (0 << 8) | (0 << 7) | (0 << 6) //| (Mb_Type_Remap << 4) | (Reserved_Workaround << 3) | (Reserved_Workaround << 2) | (0 << 
4) | (0 << 3) | (0 << 2) //| (Src_Size); | (0); //src_grf0_dw7 = Debug; src_grf0_dw7 = 0; //src_grf0_dw6 = Debug; src_grf0_dw6 = 0; //src_grf0_dw5 = (Ref_Height << 24) | (Ref_Width << 16) | (Ignored << 8) | (Dispatch_Id?); //src_grf0_dw4 = Ignored; src_grf0_dw4 = 0; //src_grf0_dw2 = (SrcY << 16) | (SrcX); src_grf0_dw2 = (srcCoord.y << 16) | (srcCoord.x); //src_grf0_dw1 = (Ref1Y << 16) | (Ref1X); //src_grf0_dw0 = (Ref0Y << 16) | (Ref0X); /*src_grf1_dw7 = (Skip_Center_Mask << 24) | (Reserved << 22) | (Ref1_Field_Polarity << 21) | (Ref0_Field_Polarity << 20) | (Src_Field_Polarity << 19) | (Bilinear_Enable << 18) | (MV_Cost_Scale_Factor << 16) | (Mb_Intra_Struct << 8) | (Intra_Corner_Swap << 7) | (Non_Skip_Mode_Added << 6) | (Non_Skip_ZMv_Added << 5) | (IntraPartMask);*/ src_grf1_dw7 = 0; //src_grf1_dw6 = Reserved; src_grf1_dw6 = 0; /*src_grf1_dw5 = (Cost_Center1Y << 16) | (Cost_Center1X); src_grf1_dw4 = (Cost_Center0Y << 16) | (Cost_Center0X); src_grf1_dw3 = (Ime_Too_Good << 24 ) | (Ime_Too_Bad << 16) | (Part_Tolerance_Thrhd << 8) | (FBPrunThrhd);*/ src_grf1_dw5 = 0; src_grf1_dw4 = 0; src_grf1_dw3 = 0; //src_grf1_dw2 = (Start1Y << 28) | (Start1X << 24) | (Start0Y << 20) //| (Start0X << 16) | (Max_Num_SU << 8) | (LenSP); /*src_grf1_dw1 = (RepartEn << 31) | (FBPrunEn << 30) | (AdaptiveValidationControl << 29) | (Uni_Mix_Disable << 28) | (Bi_Sub_Mb_Part_Mask << 24) | (Reserverd << 22) | (Bi_Weight << 16) | (Reserved << 6) | (MaxNumMVs);*/ //src_grf1_dw1 = (0 << 24) | (2); src_grf1_dw1 = (0 << 24) | (16); /*src_grf1_dw0 = (Early_Ime_Stop << 24) | (Early_Fme_Success << 16) | (Skip_Success << 8) | (T8x8_Flag_For_Inter_En << 7) | (Quit_Inter_En << 6) | (Early_Ime_Success_En << 5) | (Early_Success_En << 4) | (Part_Candidate_En << 3) | (Bi_Mix_Dis << 2) | (Adaptive_En << 1) | (SkipModeEn);*/ src_grf1_dw0 = 0; /*src_grf2_dw7 = Ref1_SkipCenter_3_Delta_XY; src_grf2_dw6 = Ref0_SkipCenter_3_Delta_XY; src_grf2_dw5 = Ref1_SkipCenter_2_Delta_XY; src_grf2_dw4 = Ref0_SkipCenter_3_Delta_XY; src_grf2_dw3 = Ref1_SkipCenter_1_Delta_XY; src_grf2_dw2 = Ref0_SkipCenter_1_Delta_XY; src_grf2_dw1 = Ref1_SkipCenter_0_Delta_XY; src_grf2_dw0 = (Ref0_Skip_Center_0_Delta_Y << 16) | (Ref0_Skip_Center_0_Delta_X); src_grf3_dw7 = Neighbor pixel Luma value [23, -1] to [20, -1]; src_grf3_dw6 = Neighbor pixel Luma value [19, -1] to [16, -1]; src_grf3_dw5 = Neighbor pixel Luma value [15, -1] to [12, -1]; src_grf3_dw4 = Neighbor pixel Luma value [11, -1] to [8, -1]; src_grf3_dw3 = Neighbor pixel Luma value [7, -1] to [4, -1]; src_grf3_dw2 = (Neighbor pixel Luma value [3, -1] << 24) | (Neighbor pixel Luma value [2, -1] << 16) | (Neighbor pixel Luma value [1, -1] << 8) | (Neighbor pixel Luma value [0, -1]); //src_grf3_dw1 = (?) | (Reserved) | ((Intra_16x16_Mode_Mask); src_grf3_dw0 = (Reserved<<25) | (Intra_16x16_Mode_Mask << 16) | (Reserved) | (Intra_16x16_Mode_Mask); src_grf4_dw7 = Reserved; src_grf4_dw6 = Reserved; src_grf4_dw5 = Reserved; src_grf4_dw4 = (Intra_MxM_Pred_Mode_B15 << 28) | (Intra_MxM_Pred_Mode_B14 << 24) | (Intra_MxM_Pred_Mode_B11 << 20) | (Intra_MxM_Pred_Mode_B10 << 16) | (Intra_MxM_Pred_Mode_A15 << 12) | (Intra_MxM_Pred_Mode_A13 << 8) | (Intra_MxM_Pred_Mode_A7 << 4) | (Intra_MxM_Pred_Mode_A5); //src_grf4_dw3 = (?) 
| (Neighbor pixel Luma value [-1, 14] to [-1, 12]); src_grf4_dw2 = Neighbor pixel Luma value [-1, 11] to [-1, 8]; src_grf4_dw1 = Neighbor pixel Luma value [-1, 7] to [-1, 4]; src_grf4_dw0 = (Neighbor pixel Luma value [-1, 3] << 24) | (Neighbor pixel Luma value [-1, 2] << 16) | (Neighbor pixel Luma value [-1, 1] << 8) | (Neighbor pixel Luma value [-1, 0]);*/ src_grf2_dw7 = 0; src_grf2_dw6 = 0; src_grf2_dw5 = 0; src_grf2_dw4 = 0; src_grf2_dw3 = 0; src_grf2_dw2 = 0; src_grf2_dw1 = 0; src_grf2_dw0 = 0; src_grf3_dw7 = 0; src_grf3_dw6 = 0; src_grf3_dw5 = 0; src_grf3_dw4 = 0; src_grf3_dw3 = 0; src_grf3_dw2 = 0; src_grf3_dw1 = 0; src_grf3_dw0 = 0; src_grf4_dw7 = 0; src_grf4_dw6 = 0; src_grf4_dw5 = 0; src_grf4_dw4 = 0; src_grf4_dw3 = 0; src_grf4_dw2 = 0; src_grf4_dw1 = 0; src_grf4_dw0 = 0; int lid_x = get_local_id(0); vme_result = __gen_ocl_vme(src_image, ref_image, src_grf0_dw7, src_grf0_dw6, src_grf0_dw5, src_grf0_dw4, src_grf0_dw3, src_grf0_dw2, src_grf0_dw1, src_grf0_dw0, src_grf1_dw7, src_grf1_dw6, src_grf1_dw5, src_grf1_dw4, src_grf1_dw3, src_grf1_dw2, src_grf1_dw1, src_grf1_dw0, src_grf2_dw7, src_grf2_dw6, src_grf2_dw5, src_grf2_dw4, src_grf2_dw3, src_grf2_dw2, src_grf2_dw1, src_grf2_dw0, src_grf3_dw7, src_grf3_dw6, src_grf3_dw5, src_grf3_dw4, src_grf3_dw3, src_grf3_dw2, src_grf3_dw1, src_grf3_dw0, src_grf4_dw7, src_grf4_dw6, src_grf4_dw5, src_grf4_dw4, src_grf4_dw3, src_grf4_dw2, src_grf4_dw1, src_grf4_dw0, //msg_type, vme_search_path_lut, lut_sub, 1, 0, 0); barrier(CLK_LOCAL_MEM_FENCE); short2 mv[16]; ushort res[16]; uint write_back_dwx; uint simd_width = get_max_sub_group_size(); /* In simd 8 mode, one kernel variable 'uint' map to 8 dword. * In simd 16 mode, one kernel variable 'uint' map to 16 dword. * That's why we should treat simd8 and simd16 differently when * use __gen_ocl_region. 
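 * For example (an illustration read off the code below, not a hardware
 * spec): with 8x8 blocks the four motion vectors occupy write-back dwords
 * 0, 2, 4 and 6. In SIMD8 they are fetched from vme_result.s1 at those
 * region offsets, while in SIMD16 the first eight write-back dwords land in
 * the upper half of vme_result.s0, which is why the reads below add '+ 8'.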
* */ if(simd_width == 8){ write_back_dwx = __gen_ocl_region(0, vme_result.s1); mv[0] = as_short2( write_back_dwx ); if(accel.mb_block_type > 0x0){ for(int i = 2, j = 1; j < 4; i += 2, j++){ write_back_dwx = __gen_ocl_region(i, vme_result.s1); mv[j] = as_short2( write_back_dwx ); } if(accel.mb_block_type > 0x1){ for(int i = 0, j = 4; j < 8; i += 2, j++){ write_back_dwx = __gen_ocl_region(i, vme_result.s2); mv[j] = as_short2( write_back_dwx ); } for(int i = 0, j = 8; j < 12; i += 2, j++){ write_back_dwx = __gen_ocl_region(i, vme_result.s3); mv[j] = as_short2( write_back_dwx ); } for(int i = 0, j = 12; j < 16; i += 2, j++){ write_back_dwx = __gen_ocl_region(i, vme_result.s4); mv[j] = as_short2( write_back_dwx ); } } } ushort2 temp_res; for(int i = 0; i < 8; i++){ write_back_dwx = __gen_ocl_region(i, vme_result.s5); temp_res = as_ushort2(write_back_dwx); res[i*2] = temp_res.s0; res[i*2+1] = temp_res.s1; } } else if(simd_width == 16){ write_back_dwx = __gen_ocl_region(0 + 8, vme_result.s0); mv[0] = as_short2( write_back_dwx ); if(accel.mb_block_type > 0x0){ for(int i = 2, j = 1; j < 4; i += 2, j++){ write_back_dwx = __gen_ocl_region(i + 8, vme_result.s0); mv[j] = as_short2( write_back_dwx ); } if(accel.mb_block_type > 0x1){ for(int i = 0, j = 4; j < 8; i += 2, j++){ write_back_dwx = __gen_ocl_region(i, vme_result.s1); mv[j] = as_short2( write_back_dwx ); } for(int i = 0, j = 8; j < 12; i += 2, j++){ write_back_dwx = __gen_ocl_region(i + 8, vme_result.s1); mv[j] = as_short2( write_back_dwx ); } for(int i = 0, j = 12; j < 16; i += 2, j++){ write_back_dwx = __gen_ocl_region(i, vme_result.s2); mv[j] = as_short2( write_back_dwx ); } } } ushort2 temp_res; for(int i = 0; i < 8; i++){ write_back_dwx = __gen_ocl_region(i + 8, vme_result.s2); temp_res = as_ushort2(write_back_dwx); res[i*2] = temp_res.s0; res[i*2+1] = temp_res.s1; } } int mv_index; //CL_ME_MB_TYPE_16x16_INTEL if(accel.mb_block_type == 0x0){ mv_index = index * 1; if( lid_x == 0 ){ motion_vector_buffer[mv_index] = mv[lid_x]; if(residuals) residuals[mv_index] = 2 * res[lid_x]; } } //CL_ME_MB_TYPE_8x8_INTEL else if(accel.mb_block_type == 0x1){ if(lid_x < 4){ mv_index = lgid_y * num_groups_x * 4 + lgid_x * 2; mv_index = mv_index + num_groups_x * 2 * (lid_x / 2) + (lid_x % 2); motion_vector_buffer[mv_index] = mv[lid_x]; if(residuals) residuals[mv_index] = 2 * res[lid_x]; } } //CL_ME_MB_TYPE_4x4_INTEL else if(accel.mb_block_type == 0x2){ if(lid_x < 16){ mv_index = lgid_y * num_groups_x * 16 + lgid_x * 4; mv_index = mv_index + num_groups_x * 4 * (lid_x / 4) + (lid_x % 4); motion_vector_buffer[mv_index] = mv[lid_x]; if(residuals) residuals[mv_index] = 2 * res[lid_x]; } } } Beignet-1.3.2-Source/src/kernels/cl_internal_copy_buf_unalign_src_offset.cl000664 001750 001750 00000002045 13161142102 026334 0ustar00yryr000000 000000 kernel void __cl_copy_region_unalign_src_offset ( global int* src, unsigned int src_offset, global int* dst, unsigned int dst_offset, unsigned int size, unsigned int first_mask, unsigned int last_mask, unsigned int shift, unsigned int dw_mask, int src_less) { int i = get_global_id(0); unsigned int tmp = 0; if (i > size -1) return; if (i == 0) { tmp = ((src[src_offset + i] & dw_mask) << shift); } else if (src_less && i == size - 1) { // not exceed the bound of source tmp = ((src[src_offset + i - 1] & ~dw_mask) >> (32 - shift)); } else { tmp = ((src[src_offset + i - 1] & ~dw_mask) >> (32 - shift)) | ((src[src_offset + i] & dw_mask) << shift); } if (i == 0) { dst[dst_offset] = (dst[dst_offset] & first_mask) | (tmp & (~first_mask)); 
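    /* Write-back of the first dword: first_mask preserves the destination
     * bytes that precede the copied range; tmp itself was stitched from up
     * to two neighbouring source dwords via dw_mask and shift. */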
} else if (i == size - 1) {
    dst[i+dst_offset] = (tmp & last_mask) | (dst[i+dst_offset] & (~last_mask));
  } else {
    dst[i+dst_offset] = tmp;
  }
}
Beignet-1.3.2-Source/src/kernels/cl_internal_copy_image_3d_to_2d_array.cl000664 001750 001750 00000001777 13161142102 025572 0ustar00yryr000000 000000 
kernel void __cl_copy_image_3d_to_2d_array(__read_only image3d_t src_image, __write_only image2d_array_t dst_image,
                                           unsigned int region0, unsigned int region1, unsigned int region2,
                                           unsigned int src_origin0, unsigned int src_origin1, unsigned int src_origin2,
                                           unsigned int dst_origin0, unsigned int dst_origin1, unsigned int dst_origin2)
{
  int i = get_global_id(0);
  int j = get_global_id(1);
  int k = get_global_id(2);
  int4 color;
  const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST;
  int4 src_coord;
  int4 dst_coord;
  if((i >= region0) || (j >= region1) || (k >= region2))
    return;
  src_coord.x = src_origin0 + i;
  src_coord.y = src_origin1 + j;
  src_coord.z = src_origin2 + k;
  dst_coord.x = dst_origin0 + i;
  dst_coord.y = dst_origin1 + j;
  dst_coord.z = dst_origin2 + k;
  color = read_imagei(src_image, sampler, src_coord);
  write_imagei(dst_image, dst_coord, color);
}
Beignet-1.3.2-Source/src/performance.c000664 001750 001750 00000024173 13161142102 016712 0ustar00yryr000000 000000 
#include <stdio.h>    /* printf */
#include <stdlib.h>   /* malloc, free, getenv, atexit, qsort */
#include <string.h>   /* strncmp, strncpy, memset */
#include <math.h>     /* sqrt, pow */
#include <pthread.h>  /* pthread_mutex_t */
#include <sys/time.h> /* gettimeofday */
#include <CL/cl.h>    /* cl_context, cl_command_queue, clFinish */

#define MAX_KERNEL_NAME_LENGTH 100
#define MAX_KERNEL_EXECUTION_COUNT 100000
#define MAX_KERNEL_BUILD_OPT 1000

typedef struct kernel_storage_node {
  char kernel_name[MAX_KERNEL_NAME_LENGTH];
  float kernel_times[MAX_KERNEL_EXECUTION_COUNT];
  char build_option[MAX_KERNEL_BUILD_OPT];
  int current_count;
  float kernel_sum_time;
  struct kernel_storage_node *next;
} kernel_storage_node;

typedef struct context_storage_node {
  uintptr_t context_id;
  kernel_storage_node *kernels_storage;
  char max_time_kernel_name[MAX_KERNEL_NAME_LENGTH];
  float kernel_max_time;
  int kernel_count;
  struct context_storage_node *next;
} context_storage_node;

typedef struct storage {
  context_storage_node * context_storage;
} storage;

static storage record;
static int atexit_registered = 0;
static context_storage_node * prev_context_pointer = NULL;
static kernel_storage_node * prev_kernel_pointer = NULL;

static context_storage_node * find_context(cl_context context)
{
  if(NULL != prev_context_pointer) {
    if(prev_context_pointer->context_id == (uintptr_t)context)
      return prev_context_pointer;
  }
  if(NULL == record.context_storage) {
    record.context_storage = (context_storage_node *) malloc(sizeof(context_storage_node));
    if (record.context_storage == NULL)
      return NULL;
    record.context_storage->context_id = (uintptr_t)context;
    record.context_storage->kernels_storage = NULL;
    record.context_storage->kernel_max_time = 0.0f;
    record.context_storage->next = NULL;
    record.context_storage->kernel_count = 0;
    return record.context_storage;
  }
  context_storage_node *pre = record.context_storage;
  context_storage_node *cur = record.context_storage;
  while(NULL != cur && (uintptr_t)context != cur->context_id) {
    pre = cur;
    cur = cur->next;
  }
  if(NULL != cur)
    return cur;
  pre->next = (context_storage_node *)malloc(sizeof(context_storage_node));
  pre = pre->next;
  pre->context_id = (uintptr_t)context;
  pre->kernels_storage = NULL;
  pre->kernel_max_time = 0.0f;
  pre->next = NULL;
  pre->kernel_count = 0;
  return pre;
}

static kernel_storage_node * find_kernel(context_storage_node *p_context, const char *kernel_name, const char *build_opt)
{
  if(NULL != prev_kernel_pointer && NULL != prev_context_pointer && p_context ==
prev_context_pointer
     && !strncmp(kernel_name, prev_kernel_pointer->kernel_name, MAX_KERNEL_NAME_LENGTH)
     && !strncmp(build_opt, prev_kernel_pointer->build_option, MAX_KERNEL_BUILD_OPT))
    return prev_kernel_pointer;
  if(NULL == p_context)
    return NULL;
  if(NULL == p_context->kernels_storage) {
    p_context->kernels_storage = (kernel_storage_node *)malloc(sizeof(kernel_storage_node));
    if (p_context->kernels_storage == NULL)
      return NULL;
    p_context->kernel_count++;
    strncpy(p_context->kernels_storage->kernel_name, kernel_name, MAX_KERNEL_NAME_LENGTH);
    p_context->kernels_storage->kernel_name[MAX_KERNEL_NAME_LENGTH - 1] = '\0';
    strncpy(p_context->kernels_storage->build_option, build_opt, MAX_KERNEL_BUILD_OPT);
    p_context->kernels_storage->build_option[MAX_KERNEL_BUILD_OPT - 1] = '\0';
    p_context->kernels_storage->current_count = 0;
    p_context->kernels_storage->kernel_sum_time = 0.0f;
    p_context->kernels_storage->next = NULL;
    return p_context->kernels_storage;
  }
  kernel_storage_node *pre = p_context->kernels_storage;
  kernel_storage_node *cur = p_context->kernels_storage;
  while(NULL != cur && (strncmp(cur->kernel_name, kernel_name, MAX_KERNEL_NAME_LENGTH)
        || strncmp(cur->build_option, build_opt, MAX_KERNEL_BUILD_OPT))) {
    pre = cur;
    cur = cur->next;
  }
  if(NULL != cur)
    return cur;
  p_context->kernel_count++;
  pre->next = (kernel_storage_node *)malloc(sizeof(kernel_storage_node));
  pre = pre->next;
  pre->current_count = 0;
  pre->kernel_sum_time = 0.0f;
  pre->next = NULL;
  strncpy(pre->kernel_name, kernel_name, MAX_KERNEL_NAME_LENGTH);
  pre->kernel_name[MAX_KERNEL_NAME_LENGTH - 1] = '\0';
  strncpy(pre->build_option, build_opt, MAX_KERNEL_BUILD_OPT);
  pre->build_option[MAX_KERNEL_BUILD_OPT - 1] = '\0';
  return pre;
}

static void free_storage()
{
  context_storage_node *p_context = record.context_storage;
  while(NULL != p_context) {
    context_storage_node *p_tmp_context = p_context->next;
    kernel_storage_node *p_kernel = p_context->kernels_storage;
    while(NULL != p_kernel) {
      kernel_storage_node *p_tmp_kernel = p_kernel->next;
      free(p_kernel);
      p_kernel = p_tmp_kernel;
    }
    free(p_context);
    p_context = p_tmp_context;
  }
}

typedef struct time_element {
  char kernel_name[MAX_KERNEL_NAME_LENGTH];
  float kernel_sum_time;
  int kernel_execute_count;
  double dev;
  float kernel_times[MAX_KERNEL_EXECUTION_COUNT];
  uint32_t time_index;
} time_element;

static int cmp(const void *a, const void *b)
{
  if(((time_element *)a)->kernel_sum_time < ((time_element *)b)->kernel_sum_time)
    return 1;
  else if(((time_element *)a)->kernel_sum_time > ((time_element *)b)->kernel_sum_time)
    return -1;
  else
    return 0;
}

static void print_time_info()
{
  context_storage_node *p_context = record.context_storage;
  if(NULL == p_context) {
    printf("Nothing to output!\n");
    return;
  }
  int tmp_context_id = 0;
  while(NULL != p_context) {
    printf("[------------ CONTEXT %4d ------------]\n", tmp_context_id++);
    printf("  ->>>> KERNELS TIME SUMMARY <<<<-\n");
    kernel_storage_node *p_kernel = p_context->kernels_storage;
    kernel_storage_node *p_tmp_kernel = p_kernel;
    time_element *te = (time_element *)malloc(sizeof(time_element)*p_context->kernel_count);
    if (te == NULL)
      return;
    memset(te, 0, sizeof(time_element)*p_context->kernel_count);
    int i = -1, j = 0, k = 0;
    while(NULL != p_tmp_kernel) {
      for(k = 0; k <= i; k++) {
        if(!strncmp(te[k].kernel_name, p_tmp_kernel->kernel_name, MAX_KERNEL_NAME_LENGTH))
          break;
      }
      if(k == i+1) {
        i++;
        k = i;
      }
      te[k].kernel_execute_count += p_tmp_kernel->current_count;
      strncpy(te[k].kernel_name, p_tmp_kernel->kernel_name, MAX_KERNEL_NAME_LENGTH);
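      /* The matching loop above compares kernel names only, so records that
       * differ just in build options are folded into one time_element. */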
te[k].kernel_name[MAX_KERNEL_NAME_LENGTH - 1] = '\0';
      te[k].kernel_sum_time += p_tmp_kernel->kernel_sum_time;
      for(j = 0; j != p_tmp_kernel->current_count; ++j)
        te[k].kernel_times[te[k].time_index++] = p_tmp_kernel->kernel_times[j];
      p_tmp_kernel = p_tmp_kernel->next;
    }
    for(k = 0; k <= i; k++) {
      float average = te[k].kernel_sum_time / te[k].kernel_execute_count;
      double sumsquare = 0.0;
      for(j = 0; j < te[k].time_index; ++j)
        sumsquare += pow(te[k].kernel_times[j] - average, 2.0);
      te[k].dev = sqrt(sumsquare / te[k].kernel_execute_count);
    }
    float sum_time = 0.0f;
    qsort((void *)te, p_context->kernel_count, sizeof(time_element), cmp);
    for(j = 0; j <= i; ++j)
      sum_time += te[j].kernel_sum_time;
    for(j = 0; j <= i; ++j) {
      printf("  [Kernel Name: %-30s Time(ms): (%4.1f%%) %9.2f Count: %-7d Ave(ms): %7.2f Dev: %.1lf%%]\n",
             te[j].kernel_name,
             te[j].kernel_sum_time / sum_time * 100,
             te[j].kernel_sum_time,
             te[j].kernel_execute_count,
             te[j].kernel_sum_time / te[j].kernel_execute_count,
             te[j].dev / te[j].kernel_sum_time * te[j].kernel_execute_count * 100);
    }
    free(te);
    printf("  Total : %.2f\n", sum_time);
    if(2 != b_output_kernel_perf) {
      printf("[------------ CONTEXT ENDS------------]\n\n");
      p_context = p_context->next;
      continue;
    }
    p_tmp_kernel = p_kernel;
    printf("\n  ->>>> KERNELS TIME DETAIL <<<<-\n");
    while(NULL != p_kernel) {
      printf("  [Kernel Name : %30s  Time(ms): %.2f]\n", p_kernel->kernel_name, p_kernel->kernel_sum_time);
      if(*p_kernel->build_option != '\0') {
        int count = 0;
        printf("  ->Build Options : ");
        while(p_kernel->build_option[count] != '\0') {
          printf("%c", p_kernel->build_option[count++]);
          if(count % 100 == 0)
            printf("\n                    ");
        }
        printf("\n");
      }
      for(i = 0; i != p_kernel->current_count; ++i)
        printf("  Execution Round%5d : %.2f (ms)\n", i+1, p_kernel->kernel_times[i]);
      p_kernel = p_kernel->next;
    }
    printf("[------------ CONTEXT ENDS------------]\n\n");
    p_context = p_context->next;
  }
  free_storage();
}

static void insert(cl_context context, const char *kernel_name, const char *build_opt, float time)
{
  if(!atexit_registered) {
    atexit_registered = 1;
    atexit(print_time_info);
  }
  context_storage_node *p_context = find_context(context);
  kernel_storage_node *p_kernel = find_kernel(p_context, kernel_name, build_opt);
  if(!p_kernel)
    return;
  prev_context_pointer = p_context;
  prev_kernel_pointer = p_kernel;
  p_kernel->kernel_times[p_kernel->current_count++] = time;
  p_kernel->kernel_sum_time += time;
  if(p_kernel->kernel_sum_time > p_context->kernel_max_time) {
    p_context->kernel_max_time = p_kernel->kernel_sum_time;
    strncpy(p_context->max_time_kernel_name, kernel_name, MAX_KERNEL_NAME_LENGTH);
    p_context->max_time_kernel_name[MAX_KERNEL_NAME_LENGTH - 1] = '\0';
  }
}

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int b_output_kernel_perf = 0;
static struct timeval start, end;

void initialize_env_var()
{
  char *env = getenv("OCL_OUTPUT_KERNEL_PERF");
  if(NULL == env || !strncmp(env, "0", 1))
    b_output_kernel_perf = 0;
  else if(!strncmp(env, "1", 1))
    b_output_kernel_perf = 1;
  else
    b_output_kernel_perf = 2;
}

void time_start(cl_context context, const char * kernel_name, cl_command_queue cq)
{
  pthread_mutex_lock(&mutex);
  gettimeofday(&start, NULL);
}

void time_end(cl_context context, const char * kernel_name, const char * build_opt, cl_command_queue cq)
{
  clFinish(cq);
  gettimeofday(&end, NULL);
  float t = (end.tv_sec - start.tv_sec)*1000 + (end.tv_usec - start.tv_usec)/1000.0f;
  insert(context, kernel_name, build_opt, t);
  pthread_mutex_unlock(&mutex);
}
Beignet-1.3.2-Source/src/cl_mem.h000664 001750 001750 00000033722 13161142102 015652 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License
as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia
 */
#ifndef __CL_MEM_H__
#define __CL_MEM_H__

#include "cl_internals.h"
#include "cl_driver_type.h"
#include "CL/cl.h"
#include "cl_base_object.h"
#include <assert.h>   /* assert() in the inline helpers below */
#include <pthread.h>  /* pthread_mutex_t in _cl_mem_buffer */

#if defined(HAS_GL_EGL)
#include "EGL/egl.h"
#endif

#ifndef CL_VERSION_1_2
#define CL_MEM_OBJECT_IMAGE1D 0x10F4
#define CL_MEM_OBJECT_IMAGE1D_ARRAY 0x10F5
#define CL_MEM_OBJECT_IMAGE1D_BUFFER 0x10F6
#define CL_MEM_OBJECT_IMAGE2D_ARRAY 0x10F3
typedef struct _cl_image_desc {
  cl_mem_object_type image_type;
  size_t image_width;
  size_t image_height;
  size_t image_depth;
  size_t image_array_size;
  size_t image_row_pitch;
  size_t image_slice_pitch;
  cl_uint num_mip_levels;
  cl_uint num_samples;
  cl_mem buffer;
} cl_image_desc;
#endif

typedef enum cl_image_tiling {
  CL_NO_TILE = 0,
  CL_TILE_X  = 1,
  CL_TILE_Y  = 2
} cl_image_tiling_t;

typedef struct _cl_mapped_ptr {
  void * ptr;
  void * v_ptr;
  size_t size;
  size_t origin[3]; /* mapped origin */
  size_t region[3]; /* mapped region */
} cl_mapped_ptr;

typedef struct _cl_mem_dstr_cb {
  list_node node; /* Mem callback list node */
  void(CL_CALLBACK *pfn_notify)(cl_mem memobj, void *user_data);
  void *user_data;
} _cl_mem_dstr_cb;
typedef _cl_mem_dstr_cb* cl_mem_dstr_cb;

/* Used for buffers and images */
enum cl_mem_type {
  CL_MEM_BUFFER_TYPE,
  CL_MEM_SUBBUFFER_TYPE,
  CL_MEM_PIPE_TYPE,
  CL_MEM_SVM_TYPE,
  CL_MEM_IMAGE_TYPE,
  CL_MEM_GL_IMAGE_TYPE,
  CL_MEM_BUFFER1D_IMAGE_TYPE
};
#define IS_IMAGE(mem) (mem->type >= CL_MEM_IMAGE_TYPE)
#define IS_GL_IMAGE(mem) (mem->type == CL_MEM_GL_IMAGE_TYPE)

typedef struct _cl_mem {
  _cl_base_object base;
  enum cl_mem_type type;
  cl_buffer bo;             /* Data in GPU memory */
  size_t size;              /* original request size, not the aligned size, used in constant buffer */
  cl_context ctx;           /* Context it belongs to */
  cl_mem_flags flags;       /* Flags specified at the creation time */
  void * host_ptr;          /* Pointer of the host mem specified by CL_MEM_ALLOC_HOST_PTR, CL_MEM_USE_HOST_PTR */
  cl_mapped_ptr* mapped_ptr;/* Store the mapped addresses and size by caller. */
  int mapped_ptr_sz;        /* The array size of mapped_ptr. */
  int map_ref;              /* The mapped count. */
  uint8_t mapped_gtt;       /* This object has mapped gtt, for unmap. */
  list_head dstr_cb_head;   /* All destroy callbacks. */
  uint8_t is_userptr;       /* CL_MEM_USE_HOST_PTR is enabled */
  cl_bool is_svm;           /* This object is svm */
  size_t offset;            /* offset of host_ptr to the page beginning, only for CL_MEM_USE_HOST_PTR */
  uint8_t cmrt_mem_type;    /* CmBuffer, CmSurface2D, ...
*/ void* cmrt_mem; } _cl_mem; #define CL_OBJECT_MEM_MAGIC 0x381a27b9ee6504dfLL #define CL_OBJECT_IS_MEM(obj) ((obj && \ ((cl_base_object)obj)->magic == CL_OBJECT_MEM_MAGIC && \ CL_OBJECT_GET_REF(obj) >= 1)) #define CL_OBJECT_IS_IMAGE(mem) ((mem && \ ((cl_base_object)mem)->magic == CL_OBJECT_MEM_MAGIC && \ CL_OBJECT_GET_REF(mem) >= 1 && \ mem->type >= CL_MEM_IMAGE_TYPE)) #define CL_OBJECT_IS_BUFFER(mem) ((mem && \ ((cl_base_object)mem)->magic == CL_OBJECT_MEM_MAGIC && \ CL_OBJECT_GET_REF(mem) >= 1 && \ mem->type < CL_MEM_IMAGE_TYPE)) typedef struct _cl_mem_pipe { _cl_mem base; cl_svm_mem_flags flags; /* Flags specified at the creation time */ uint32_t packet_size; uint32_t max_packets; } _cl_mem_pipe; typedef struct _cl_mem_svm { _cl_mem base; cl_svm_mem_flags flags; /* Flags specified at the creation time */ } _cl_mem_svm; struct _cl_mem_image { _cl_mem base; cl_image_format fmt; /* only for images */ uint32_t intel_fmt; /* format to provide in the surface state */ size_t bpp; /* number of bytes per pixel */ cl_mem_object_type image_type; /* only for images 1D/2D...*/ size_t w, h, depth; /* only for images (depth is only for 3D images) */ size_t row_pitch, slice_pitch; size_t host_row_pitch, host_slice_pitch; cl_image_tiling_t tiling; /* only IVB+ supports TILE_[X,Y] (image only) */ size_t tile_x, tile_y; /* tile offset, used for mipmap images. */ size_t offset; /* offset for dri_bo, used when it's reloc. */ cl_mem buffer_1d; /* if the image is created from buffer, it point to the buffer.*/ uint8_t is_image_from_buffer; /* IMAGE from Buffer*/ }; struct _cl_mem_gl_image { struct _cl_mem_image base; int fd; #if defined(HAS_GL_EGL) EGLImage egl_image; #endif }; struct _cl_mem_buffer1d_image { struct _cl_mem_image base; uint32_t size; _cl_mem * descbuffer; }; #define IS_1D_IMAGE(image) (image->image_type == CL_MEM_OBJECT_IMAGE1D || \ image->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY || \ image->image_type == CL_MEM_OBJECT_IMAGE1D_BUFFER) #define IS_2D_IMAGE(image) (image->image_type == CL_MEM_OBJECT_IMAGE2D || \ image->image_type == CL_MEM_OBJECT_IMAGE2D_ARRAY) #define IS_3D_IMAGE(image) (image->image_type == CL_MEM_OBJECT_IMAGE3D) #define IS_IMAGE_ARRAY(image) (image->image_type == CL_MEM_OBJECT_IMAGE1D_ARRAY || \ image->image_type == CL_MEM_OBJECT_IMAGE2D_ARRAY) inline static void cl_mem_image_init(struct _cl_mem_image *image, size_t w, size_t h, cl_mem_object_type image_type, size_t depth, cl_image_format fmt, uint32_t intel_fmt, uint32_t bpp, size_t row_pitch, size_t slice_pitch, cl_image_tiling_t tiling, size_t tile_x, size_t tile_y, size_t offset) { image->w = w; image->h = h; image->image_type = image_type; image->depth = depth; image->fmt = fmt; image->intel_fmt = intel_fmt; image->bpp = bpp; image->row_pitch = row_pitch; image->slice_pitch = slice_pitch; image->tiling = tiling; image->tile_x = tile_x; image->tile_y = tile_y; image->offset = offset; } struct _cl_mem_buffer { _cl_mem base; struct _cl_mem_buffer* subs; /* Sub buf objects. */ size_t sub_offset; /* The sub start offset. 
*/ struct _cl_mem_buffer* sub_prev, *sub_next;/* We chain the sub memory buffers together */ pthread_mutex_t sub_lock; /* Sub buffers list lock*/ struct _cl_mem_buffer* parent; /* Point to the parent buffer if is sub-buffer */ }; inline static struct _cl_mem_image * cl_mem_image(cl_mem mem) { assert(IS_IMAGE(mem)); return (struct _cl_mem_image *)mem; } inline static struct _cl_mem_gl_image * cl_mem_gl_image(cl_mem mem) { assert(IS_GL_IMAGE(mem)); return (struct _cl_mem_gl_image*)mem; } inline static struct _cl_mem_pipe * cl_mem_pipe(cl_mem mem) { assert(mem->type == CL_MEM_PIPE_TYPE); return (struct _cl_mem_pipe *)mem; } /* Query information about a memory object */ extern cl_mem_object_type cl_get_mem_object_type(cl_mem mem); /* Query whether mem is in buffers */ extern cl_int cl_mem_is_valid(cl_mem mem, cl_context ctx); /* Create a new memory object and initialize it with possible user data */ extern cl_mem cl_mem_new_buffer(cl_context, cl_mem_flags, size_t, void*, cl_int*); /* Create a new sub memory object */ extern cl_mem cl_mem_new_sub_buffer(cl_mem, cl_mem_flags, cl_buffer_create_type, const void *, cl_int *); extern cl_mem cl_mem_new_pipe(cl_context, cl_mem_flags, cl_uint, cl_uint, cl_int *); /* Query information about a pipe object */ extern cl_int cl_get_pipe_info(cl_mem, cl_mem_info, size_t, void *, size_t *); void* cl_mem_svm_allocate(cl_context, cl_svm_mem_flags, size_t, unsigned int); void cl_mem_svm_delete(cl_context, void *svm_pointer); /* Idem but this is an image */ extern cl_mem cl_mem_new_image(cl_context context, cl_mem_flags flags, const cl_image_format *image_format, const cl_image_desc *image_desc, void *host_ptr, cl_int *errcode_ret); /* Unref the object and delete it if no more reference */ extern void cl_mem_delete(cl_mem); /* Destroy egl image. 
*/ extern void cl_mem_gl_delete(struct _cl_mem_gl_image *); /* Add one more reference to this object */ extern void cl_mem_add_ref(cl_mem); /* api clEnqueueCopyBuffer help function */ extern cl_int cl_mem_copy(cl_command_queue queue, cl_event event, cl_mem src_buf, cl_mem dst_buf, size_t src_offset, size_t dst_offset, size_t cb); extern cl_int cl_mem_fill(cl_command_queue queue, cl_event e, const void * pattern, size_t pattern_size, cl_mem buffer, size_t offset, size_t size); extern cl_int cl_image_fill(cl_command_queue queue, cl_event e, const void * pattern, struct _cl_mem_image*, const size_t *, const size_t *); /* api clEnqueueCopyBufferRect help function */ extern cl_int cl_mem_copy_buffer_rect(cl_command_queue, cl_event event, cl_mem, cl_mem, const size_t *, const size_t *, const size_t *, size_t, size_t, size_t, size_t); /* api clEnqueueCopyImage help function */ extern cl_int cl_mem_kernel_copy_image(cl_command_queue, cl_event event, struct _cl_mem_image*, struct _cl_mem_image*, const size_t *, const size_t *, const size_t *); /* api clEnqueueCopyImageToBuffer help function */ extern cl_int cl_mem_copy_image_to_buffer(cl_command_queue, cl_event, struct _cl_mem_image*, cl_mem, const size_t *, const size_t, const size_t *); /* api clEnqueueCopyBufferToImage help function */ extern cl_int cl_mem_copy_buffer_to_image(cl_command_queue, cl_event, cl_mem, struct _cl_mem_image*, const size_t, const size_t *, const size_t *); /* Directly map a memory object */ extern void *cl_mem_map(cl_mem, int); /* Unmap a memory object */ extern cl_int cl_mem_unmap(cl_mem); /* Directly map a memory object in GTT mode */ extern void *cl_mem_map_gtt(cl_mem); /* Directly map a memory object in GTT mode, with out waiting gpu idle */ extern void *cl_mem_map_gtt_unsync(cl_mem); /* Unmap a memory object in GTT mode */ extern cl_int cl_mem_unmap_gtt(cl_mem); /* Directly map a memory object - tiled images are mapped in GTT mode */ extern void *cl_mem_map_auto(cl_mem, int); /* Unmap a memory object - tiled images are unmapped in GTT mode */ extern cl_int cl_mem_unmap_auto(cl_mem); /* Pin/unpin the buffer in memory (you must be root) */ extern cl_int cl_mem_pin(cl_mem); extern cl_int cl_mem_unpin(cl_mem); extern cl_mem cl_mem_allocate(enum cl_mem_type type, cl_context ctx, cl_mem_flags flags, size_t sz, cl_int is_tiled, void *host_ptr, cl_mem buffer, cl_int *errcode); void cl_mem_copy_image_region(const size_t *origin, const size_t *region, void *dst, size_t dst_row_pitch, size_t dst_slice_pitch, const void *src, size_t src_row_pitch, size_t src_slice_pitch, const struct _cl_mem_image *image, cl_bool offset_dst, cl_bool offset_src); void cl_mem_copy_image_to_image(const size_t *dst_origin,const size_t *src_origin, const size_t *region, const struct _cl_mem_image *dst_image, const struct _cl_mem_image *src_image); extern cl_mem cl_mem_new_libva_buffer(cl_context ctx, unsigned int bo_name, cl_int *errcode); extern cl_mem cl_mem_new_libva_image(cl_context ctx, unsigned int bo_name, size_t offset, size_t width, size_t height, cl_image_format fmt, size_t row_pitch, cl_int *errcode); extern cl_int cl_mem_get_fd(cl_mem mem, int* fd); extern cl_mem cl_mem_new_buffer_from_fd(cl_context ctx, int fd, int buffer_sz, cl_int* errcode); extern cl_mem cl_mem_new_image_from_fd(cl_context ctx, int fd, int image_sz, size_t offset, size_t width, size_t height, cl_image_format fmt, size_t row_pitch, cl_int *errcode); extern cl_int cl_mem_record_map_mem(cl_mem mem, void *ptr, void **mem_ptr, size_t offset, size_t size, const size_t 
*origin, const size_t *region);
extern cl_int cl_mem_set_destructor_callback(cl_mem memobj,
                                             void(CL_CALLBACK *pfn_notify)(cl_mem, void *), void *user_data);

#endif /* __CL_MEM_H__ */
Beignet-1.3.2-Source/src/cl_command_queue_enqueue.c000664 001750 001750 00000021202 13161142102 021430 0ustar00yryr000000 000000 /*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: He Junyan
 */

#include "cl_command_queue.h"
#include "cl_event.h"
#include "cl_alloc.h"
#include <assert.h>  /* assert() */
#include <pthread.h> /* pthread_create, pthread_join */

static void *
worker_thread_function(void *Arg)
{
  cl_command_queue_enqueue_worker worker = (cl_command_queue_enqueue_worker)Arg;
  cl_command_queue queue = worker->queue;
  cl_event e;
  cl_uint cookie = -1;
  list_node *pos;
  list_node *n;
  list_head ready_list;
  cl_int exec_status;

  CL_OBJECT_LOCK(queue);
  while (1) {
    /* Must have locked here. */
    if (worker->quit == CL_TRUE) {
      CL_OBJECT_UNLOCK(queue);
      return NULL;
    }
    if (list_empty(&worker->enqueued_events)) {
      CL_OBJECT_WAIT_ON_COND(queue);
      continue;
    }
    /* The cookie changes when an event status changes or something happens to
       this command queue. If we have already checked the event list and found
       nothing to execute, wait for the cookie to be updated, to avoid looping
       forever. */
    if (cookie == worker->cookie) {
      CL_OBJECT_WAIT_ON_COND(queue);
      continue;
    }
    /* Hold the lock while checking event status, to avoid missing a status notify. */
    list_init(&ready_list);
    list_for_each_safe(pos, n, &worker->enqueued_events) {
      e = list_entry(pos, _cl_event, enqueue_node);
      if (cl_event_is_ready(e) <= CL_COMPLETE) {
        list_node_del(&e->enqueue_node);
        list_add_tail(&ready_list, &e->enqueue_node);
      }
    }
    if (list_empty(&ready_list)) { /* Nothing to do, just wait. */
      cookie = worker->cookie;
      continue;
    }
    /* Notify waiters that we changed the event list. */
    CL_OBJECT_NOTIFY_COND(queue);
    worker->in_exec_status = CL_QUEUED;
    CL_OBJECT_UNLOCK(queue);
    /* Do the real job without the lock. */
    exec_status = CL_SUBMITTED;
    list_for_each_safe(pos, n, &ready_list) {
      e = list_entry(pos, _cl_event, enqueue_node);
      cl_event_exec(e, exec_status, CL_FALSE);
    }
    /* Notify everyone waiting for a flush. */
    CL_OBJECT_LOCK(queue);
    worker->in_exec_status = CL_SUBMITTED;
    CL_OBJECT_NOTIFY_COND(queue);
    CL_OBJECT_UNLOCK(queue);
    list_for_each_safe(pos, n, &ready_list) {
      e = list_entry(pos, _cl_event, enqueue_node);
      cl_event_exec(e, CL_COMPLETE, CL_FALSE);
    }
    /* Clear and delete all the events. */
    list_for_each_safe(pos, n, &ready_list) {
      e = list_entry(pos, _cl_event, enqueue_node);
      list_node_del(&e->enqueue_node);
      cl_event_delete(e);
    }
    CL_OBJECT_LOCK(queue);
    worker->in_exec_status = CL_COMPLETE;
    /* Notify finish waiters, we have processed all the ready events.
*/ CL_OBJECT_NOTIFY_COND(queue); } } LOCAL void cl_command_queue_notify(cl_command_queue queue) { if (CL_OBJECT_GET_REF(queue) < 1) { return; } assert(queue && (((cl_base_object)queue)->magic == CL_OBJECT_COMMAND_QUEUE_MAGIC)); CL_OBJECT_LOCK(queue); queue->worker.cookie++; CL_OBJECT_NOTIFY_COND(queue); CL_OBJECT_UNLOCK(queue); } LOCAL void cl_command_queue_enqueue_event(cl_command_queue queue, cl_event event) { CL_OBJECT_INC_REF(event); assert(CL_OBJECT_IS_COMMAND_QUEUE(queue)); CL_OBJECT_LOCK(queue); assert(queue->worker.quit == CL_FALSE); assert(list_node_out_of_list(&event->enqueue_node)); list_add_tail(&queue->worker.enqueued_events, &event->enqueue_node); queue->worker.cookie++; CL_OBJECT_NOTIFY_COND(queue); CL_OBJECT_UNLOCK(queue); } LOCAL cl_int cl_command_queue_init_enqueue(cl_command_queue queue) { cl_command_queue_enqueue_worker worker = &queue->worker; worker->queue = queue; worker->quit = CL_FALSE; worker->in_exec_status = CL_COMPLETE; worker->cookie = 8; list_init(&worker->enqueued_events); if (pthread_create(&worker->tid, NULL, worker_thread_function, worker)) { DEBUGP(DL_ERROR, "Cannot create worker thread for queue %p...\n", queue); return CL_OUT_OF_RESOURCES; } return CL_SUCCESS; } LOCAL void cl_command_queue_destroy_enqueue(cl_command_queue queue) { cl_command_queue_enqueue_worker worker = &queue->worker; list_node *pos; list_node *n; cl_event e; assert(worker->queue == queue); assert(worker->quit == CL_FALSE); CL_OBJECT_LOCK(queue); worker->quit = CL_TRUE; CL_OBJECT_NOTIFY_COND(queue); CL_OBJECT_UNLOCK(queue); pthread_join(worker->tid, NULL); /* We wait for the worker to finish before destroying the command queue. */ if (!list_empty(&worker->enqueued_events)) { DEBUGP(DL_WARNING, "There is still enqueued work in the queue %p when this" " queue is destroyed; this may cause very serious problems.\n", queue); list_for_each_safe(pos, n, &worker->enqueued_events) { e = list_entry(pos, _cl_event, enqueue_node); list_node_del(&e->enqueue_node); cl_event_set_status(e, -1); // Give waiters a chance to wake up. cl_event_delete(e); } } } /* Note: Must call this function with the queue's lock held. */ LOCAL cl_event * cl_command_queue_record_in_queue_events(cl_command_queue queue, cl_uint *list_num) { int event_num = 0; list_node *pos; cl_command_queue_enqueue_worker worker = &queue->worker; cl_event *enqueued_list = NULL; int i; cl_event tmp_e = NULL; list_for_each(pos, &worker->enqueued_events) { event_num++; } assert(event_num > 0); enqueued_list = cl_calloc(event_num, sizeof(cl_event)); assert(enqueued_list); i = 0; list_for_each(pos, &worker->enqueued_events) { tmp_e = list_entry(pos, _cl_event, enqueue_node); cl_event_add_ref(tmp_e); // Take a temporary ref to avoid deletion. enqueued_list[i] = tmp_e; i++; } assert(i == event_num); *list_num = event_num; return enqueued_list; } LOCAL cl_int cl_command_queue_wait_flush(cl_command_queue queue) { cl_command_queue_enqueue_worker worker = &queue->worker; cl_event *enqueued_list = NULL; cl_uint enqueued_num = 0; int i; CL_OBJECT_LOCK(queue); if (worker->quit) { // already destroyed the queue? CL_OBJECT_UNLOCK(queue); return CL_INVALID_COMMAND_QUEUE; } if (!list_empty(&worker->enqueued_events)) { enqueued_list = cl_command_queue_record_in_queue_events(queue, &enqueued_num); assert(enqueued_num > 0); assert(enqueued_list); } while (worker->in_exec_status == CL_QUEUED) { CL_OBJECT_WAIT_ON_COND(queue); if (worker->quit) { // already destroyed the queue?
CL_OBJECT_UNLOCK(queue); return CL_INVALID_COMMAND_QUEUE; } } CL_OBJECT_UNLOCK(queue); /* Wait for all events to enter the submitted status. */ for (i = 0; i < enqueued_num; i++) { CL_OBJECT_LOCK(enqueued_list[i]); while (enqueued_list[i]->status > CL_SUBMITTED) { CL_OBJECT_WAIT_ON_COND(enqueued_list[i]); } CL_OBJECT_UNLOCK(enqueued_list[i]); } for (i = 0; i < enqueued_num; i++) { cl_event_delete(enqueued_list[i]); } if (enqueued_list) cl_free(enqueued_list); return CL_SUCCESS; } LOCAL cl_int cl_command_queue_wait_finish(cl_command_queue queue) { cl_command_queue_enqueue_worker worker = &queue->worker; cl_event *enqueued_list = NULL; cl_uint enqueued_num = 0; int i; CL_OBJECT_LOCK(queue); if (worker->quit) { // already destroyed the queue? CL_OBJECT_UNLOCK(queue); return CL_INVALID_COMMAND_QUEUE; } if (!list_empty(&worker->enqueued_events)) { enqueued_list = cl_command_queue_record_in_queue_events(queue, &enqueued_num); assert(enqueued_num > 0); assert(enqueued_list); } while (worker->in_exec_status > CL_COMPLETE) { CL_OBJECT_WAIT_ON_COND(queue); if (worker->quit) { // already destroyed the queue? CL_OBJECT_UNLOCK(queue); return CL_INVALID_COMMAND_QUEUE; } } CL_OBJECT_UNLOCK(queue); /* Wait for all events to enter the complete status. */ for (i = 0; i < enqueued_num; i++) { CL_OBJECT_LOCK(enqueued_list[i]); while (enqueued_list[i]->status > CL_COMPLETE) { CL_OBJECT_WAIT_ON_COND(enqueued_list[i]); } CL_OBJECT_UNLOCK(enqueued_list[i]); } for (i = 0; i < enqueued_num; i++) { cl_event_delete(enqueued_list[i]); } if (enqueued_list) cl_free(enqueued_list); return CL_SUCCESS; } Beignet-1.3.2-Source/src/cl_gen8_device.h000664 001750 001750 00000002164 13161142102 017250 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>.
* * Author: Benjamin Segovia */ /* Common fields for both BDW devices */ .max_parameter_size = 1024, .global_mem_cache_line_size = 64, /* XXX */ .global_mem_cache_size = 8 << 10, /* XXX */ .local_mem_type = CL_LOCAL, .local_mem_size = 64 << 10, .scratch_mem_size = 2 << 20, .max_mem_alloc_size = 2 * 1024 * 1024 * 1024ul, .global_mem_size = 4 * 1024 * 1024 * 1024ul, #include "cl_gt_device.h" Beignet-1.3.2-Source/src/performance.h000664 001750 001750 00000000514 13161142102 016710 0ustar00yryr000000 000000 #ifndef __PERFORMANCE_H__ #define __PERFORMANCE_H__ #include "CL/cl.h" extern int b_output_kernel_perf; void time_start(cl_context context, const char * kernel_name, cl_command_queue cq); void time_end(cl_context context, const char * kernel_name, const char * build_opt, cl_command_queue cq); void initialize_env_var(); #endif Beignet-1.3.2-Source/src/cl_base_object.h000664 001750 001750 00000007742 13161142102 017337 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * */ #ifndef __CL_BASE_OBJECT_H__ #define __CL_BASE_OBJECT_H__ #include "cl_utils.h" #include "cl_khr_icd.h" #include "CL/cl.h" #include <pthread.h> #include <assert.h> /************************************************************************ Every CL object should have: ICD dispatcher: Holds the ICD function table pointer. Reference: Maintains the object's lifetime. The CL retain/release APIs change its value, and we destroy the object when the count reaches 0. Magic: Just a number identifying each kind of CL object. We use it to check whether a pointer really is the object we want. Mutex & Cond: Used to keep the CL object MT-safe. A lock/unlock critical region should be short and should not make any blocking function calls. take_ownership/release_ownership can own the object for a long time. take_ownership does not hold the lock and so will not cause deadlock problems; we can wait on the cond to get the ownership.
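   As a sketch, a typical short critical section built from the macros
   declared below looks like this (obj is any CL object):

     CL_OBJECT_LOCK(obj);
     ... inspect or update the object's short-lived state ...
     CL_OBJECT_NOTIFY_COND(obj);    wake any CL_OBJECT_WAIT_ON_COND() sleeper
     CL_OBJECT_UNLOCK(obj);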
*************************************************************************/ typedef struct _cl_base_object { DEFINE_ICD(dispatch); /* Dispatch function table for the ICD */ cl_ulong magic; /* Magic number for each CL object */ atomic_t ref; /* Reference count for each CL object */ list_node node; /* CL object node belonging to some container */ pthread_mutex_t mutex; /* The mutex keeping this object MT-safe */ pthread_cond_t cond; /* Condition to wait on for getting the object */ pthread_t owner; /* The thread which owns this object */ } _cl_base_object; typedef struct _cl_base_object *cl_base_object; #define CL_OBJECT_INVALID_MAGIC 0xFEFEFEFEFEFEFEFELL #define CL_OBJECT_IS_VALID(obj) (((cl_base_object)obj)->magic != CL_OBJECT_INVALID_MAGIC) #define CL_OBJECT_INC_REF(obj) (atomic_inc(&((cl_base_object)obj)->ref)) #define CL_OBJECT_DEC_REF(obj) (atomic_dec(&((cl_base_object)obj)->ref)) #define CL_OBJECT_GET_REF(obj) (atomic_read(&((cl_base_object)obj)->ref)) #define CL_OBJECT_LOCK(obj) (pthread_mutex_lock(&((cl_base_object)obj)->mutex)) #define CL_OBJECT_UNLOCK(obj) (pthread_mutex_unlock(&((cl_base_object)obj)->mutex)) extern void cl_object_init_base(cl_base_object obj, cl_ulong magic); extern void cl_object_destroy_base(cl_base_object obj); extern cl_int cl_object_take_ownership(cl_base_object obj, cl_int wait, cl_bool withlock); extern void cl_object_release_ownership(cl_base_object obj, cl_bool withlock); extern void cl_object_wait_on_cond(cl_base_object obj); extern void cl_object_notify_cond(cl_base_object obj); #define CL_OBJECT_INIT_BASE(obj, magic) (cl_object_init_base((cl_base_object)obj, magic)) #define CL_OBJECT_DESTROY_BASE(obj) (cl_object_destroy_base((cl_base_object)obj)) #define CL_OBJECT_TAKE_OWNERSHIP(obj, wait) (cl_object_take_ownership((cl_base_object)obj, wait, CL_FALSE)) #define CL_OBJECT_RELEASE_OWNERSHIP(obj) (cl_object_release_ownership((cl_base_object)obj, CL_FALSE)) #define CL_OBJECT_TAKE_OWNERSHIP_WITHLOCK(obj, wait) (cl_object_take_ownership((cl_base_object)obj, wait, CL_TRUE)) #define CL_OBJECT_RELEASE_OWNERSHIP_WITHLOCK(obj) (cl_object_release_ownership((cl_base_object)obj, CL_TRUE)) #define CL_OBJECT_WAIT_ON_COND(obj) (cl_object_wait_on_cond((cl_base_object)obj)) #define CL_OBJECT_NOTIFY_COND(obj) (cl_object_notify_cond((cl_base_object)obj)) #endif /* __CL_BASE_OBJECT_H__ */ Beignet-1.3.2-Source/src/cl_program.h000664 001750 001750 00000014103 13161142102 016533 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>.
* * Author: Benjamin Segovia */ #ifndef __CL_PROGRAM_H__ #define __CL_PROGRAM_H__ #include "cl_internals.h" #include "cl_gbe_loader.h" #include "cl_base_object.h" #include "CL/cl.h" #include #include // This is the structure output by the compiler struct _gbe_program; enum { FROM_SOURCE = 0, FROM_LLVM = 1, FROM_BINARY = 2, FROM_LLVM_SPIR = 3, FROM_CMRT = 4, }; typedef enum _BINARY_HEADER_INDEX { BHI_SPIR = 0, BHI_COMPIRED_OBJECT = 1, BHI_LIBRARY = 2, BHI_GEN_BINARY = 3, BHI_CMRT = 4, BHI_MAX, }BINARY_HEADER_INDEX; /* This maps an OCL file containing some kernels */ struct _cl_program { _cl_base_object base; gbe_program opaque; /* (Opaque) program as output by the compiler */ cl_kernel *ker; /* All kernels included by the OCL file */ cl_program prev, next; /* We chain the programs together */ cl_context ctx; /* Its parent context */ cl_buffer global_data; char * global_data_ptr; char *bin; /* The program copied verbatim */ size_t bin_sz; /* Its size in memory */ char *source; /* Program sources */ char *binary; /* Program binary. */ size_t binary_sz; /* The binary size. */ uint32_t binary_type; /* binary type: COMPILED_OBJECT(LLVM IR), LIBRARY(LLVM IR with option "-create-library"), or EXECUTABLE(GEN binary). */ /* ext binary type: BINARY_TYPE_INTERMEDIATE. */ uint32_t ker_n; /* Number of declared kernels */ uint32_t source_type:3; /* Built from binary, source, CMRT or LLVM */ uint32_t is_built:1; /* Did we call clBuildProgram on it? */ int32_t build_status; /* Build status. */ char *build_opts; /* The build options for this program */ size_t build_log_max_sz; /* Build log maximum size in bytes. */ char *build_log; /* The build log for this program. */ size_t build_log_sz; /* The actual build log size. */ void* cmrt_program; /* real type: CmProgram* */ }; #define CL_OBJECT_PROGRAM_MAGIC 0x34562ab12789cdefLL #define CL_OBJECT_IS_PROGRAM(obj) ((obj && \ ((cl_base_object)obj)->magic == CL_OBJECT_PROGRAM_MAGIC && \ CL_OBJECT_GET_REF(obj) >= 1)) /* Create an empty program */ extern cl_program cl_program_new(cl_context); /* Destroy and deallocate a program */ extern void cl_program_delete(cl_program); /* Add one more reference to the object (to defer its deletion) */ extern void cl_program_add_ref(cl_program); /* Create a kernel for the OCL user */ extern cl_kernel cl_program_create_kernel(cl_program, const char*, cl_int*); /* Creates kernel objects for all kernel functions in the program.
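   This is the helper behind the public clCreateKernelsInProgram() entry
   point, whose usual two-call pattern is (a sketch, error checks elided):

     cl_uint n;
     clCreateKernelsInProgram(program, 0, NULL, &n);         query the count
     cl_kernel *kernels = calloc(n, sizeof(cl_kernel));
     clCreateKernelsInProgram(program, n, kernels, NULL);    fill the array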
*/ extern cl_int cl_program_create_kernels_in_program(cl_program, cl_kernel*); /* Create a program from OCL source */ extern cl_program cl_program_create_from_source(cl_context ctx, cl_uint count, const char **strings, const size_t *lengths, cl_int *errcode_ret); /* Directly create a program from a blob */ extern cl_program cl_program_create_from_binary(cl_context context, cl_uint num_devices, const cl_device_id * devices, const size_t * lengths, const unsigned char ** binaries, cl_int * binary_status, cl_int * errcode_ret); /* Create a program with built-in kernels*/ extern cl_program cl_program_create_with_built_in_kernles(cl_context context, cl_uint num_devices, const cl_device_id * device_list, const char * kernel_names, cl_int * errcode_ret); /* Directly create a program from a LLVM source file */ extern cl_program cl_program_create_from_llvm(cl_context context, cl_uint num_devices, const cl_device_id * devices, const char * fileName, cl_int * errcode_ret); /* Build the program as specified by OCL */ extern cl_int cl_program_build(cl_program p, const char* options); /* Compile the program as specified by OCL */ extern cl_int cl_program_compile(cl_program p, cl_uint num_input_headers, const cl_program * input_headers, const char ** header_include_names, const char* options); /* link the program as specified by OCL */ extern cl_program cl_program_link(cl_context context, cl_uint num_input_programs, const cl_program * input_programs, const char * options, cl_int* errcode_ret); /* Get the kernel names in program */ extern void cl_program_get_kernel_names(cl_program p, size_t size, char *names, size_t *size_ret); extern size_t cl_program_get_global_variable_size(cl_program p); #endif /* __CL_PROGRAM_H__ */ Beignet-1.3.2-Source/src/cl_gen9_device.h000664 001750 001750 00000002236 13161142102 017251 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /* Common fields for both SKL devices */ .max_parameter_size = 1024, .global_mem_cache_line_size = 64, /* XXX */ .global_mem_cache_size = 8 << 10, /* XXX */ .local_mem_type = CL_LOCAL, .local_mem_size = 64 << 10, .scratch_mem_size = 2 << 20, .max_mem_alloc_size = 4 * 1024 * 1024 * 1024ul, .global_mem_size = 4 * 1024 * 1024 * 1024ul, #define GEN9_DEVICE 1 #include "cl_gt_device.h" #undef GEN9_DEVICE Beignet-1.3.2-Source/src/cl_api_kernel.c000664 001750 001750 00000032574 13161142102 017204 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. 
* * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * */ #include "cl_mem.h" #include "cl_kernel.h" #include "cl_enqueue.h" #include "cl_command_queue.h" #include "cl_event.h" #include "cl_context.h" #include "cl_program.h" #include "cl_alloc.h" #include "CL/cl.h" #include #include cl_int clGetKernelInfo(cl_kernel kernel, cl_kernel_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret) { const void *src_ptr = NULL; size_t src_size = 0; const char *str = NULL; cl_int ref; cl_uint n; if (!CL_OBJECT_IS_KERNEL(kernel)) { return CL_INVALID_KERNEL; } if (param_name == CL_KERNEL_CONTEXT) { src_ptr = &kernel->program->ctx; src_size = sizeof(cl_context); } else if (param_name == CL_KERNEL_PROGRAM) { src_ptr = &kernel->program; src_size = sizeof(cl_program); } else if (param_name == CL_KERNEL_NUM_ARGS) { n = kernel->arg_n; src_ptr = &n; src_size = sizeof(cl_uint); } else if (param_name == CL_KERNEL_REFERENCE_COUNT) { ref = CL_OBJECT_GET_REF(kernel); src_ptr = &ref; src_size = sizeof(cl_int); } else if (param_name == CL_KERNEL_FUNCTION_NAME) { str = cl_kernel_get_name(kernel); src_ptr = str; src_size = strlen(str) + 1; } else if (param_name == CL_KERNEL_ATTRIBUTES) { str = cl_kernel_get_attributes(kernel); src_ptr = str; src_size = strlen(str) + 1; } else { return CL_INVALID_VALUE; } return cl_get_info_helper(src_ptr, src_size, param_value, param_value_size, param_value_size_ret); } cl_int clEnqueueNDRangeKernel(cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t *global_work_offset, const size_t *global_work_size, const size_t *local_work_size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { size_t fixed_global_off[] = {0, 0, 0}; size_t fixed_global_sz[] = {1, 1, 1}; size_t fixed_local_sz[] = {1, 1, 1}; cl_int err = CL_SUCCESS; cl_uint i; cl_event e = NULL; cl_int event_status; do { if (!CL_OBJECT_IS_COMMAND_QUEUE(command_queue)) { err = CL_INVALID_COMMAND_QUEUE; break; } if (!CL_OBJECT_IS_KERNEL(kernel)) { err = CL_INVALID_KERNEL; break; } /* Check number of dimensions we have */ if (UNLIKELY(work_dim == 0 || work_dim > 3)) { err = CL_INVALID_WORK_DIMENSION; break; } /* We need a work size per dimension */ if (UNLIKELY(global_work_size == NULL)) { err = CL_INVALID_GLOBAL_WORK_SIZE; break; } if (kernel->vme) { if (work_dim != 2) { err = CL_INVALID_WORK_DIMENSION; break; } if (local_work_size != NULL) { err = CL_INVALID_WORK_GROUP_SIZE; break; } } if (global_work_offset != NULL) { for (i = 0; i < work_dim; ++i) { if (UNLIKELY(global_work_offset[i] + global_work_size[i] > (size_t)-1)) { err = CL_INVALID_GLOBAL_OFFSET; break; } } } /* Queue and kernel must share the same context */ assert(kernel->program); if (command_queue->ctx != kernel->program->ctx) { err = CL_INVALID_CONTEXT; break; } if (local_work_size != NULL) { for (i = 0; i < work_dim; ++i) fixed_local_sz[i] = local_work_size[i]; } else { if (kernel->vme) { fixed_local_sz[0] = 16; fixed_local_sz[1] = 1; } else { uint j, maxDimSize = 64 /* from 64? 
*/, maxGroupSize = 256; // MAX_WORK_GROUP_SIZE may be too large size_t realGroupSize = 1; for (i = 0; i < work_dim; i++) { for (j = maxDimSize; j > 1; j--) { if (global_work_size[i] % j == 0 && j <= maxGroupSize) { fixed_local_sz[i] = j; maxGroupSize = maxGroupSize / j; maxDimSize = maxGroupSize > maxDimSize ? maxDimSize : maxGroupSize; break; // choose the next work_dim } } realGroupSize *= fixed_local_sz[i]; } // In a loop of a conformance test (such as test_api repeated_setup_cleanup), each iteration // creates a new context and a new command queue and uses 'globalsize[0]=1000, localsize=NULL' to enqueue a kernel, // which would trigger the following message many times. // To avoid flooding the log, only print it the first time in the process. // A plain static variable is fine here, since printing a few extra times in the multi-threaded case does not matter. static int warn_no_good_localsize = 1; if (realGroupSize % 8 != 0 && warn_no_good_localsize) { warn_no_good_localsize = 0; DEBUGP(DL_WARNING, "unable to find good values for local_work_size[i], please provide\n" " local_work_size[] explicitly; you can find good values with\n" " a trial-and-error method."); } } } if (kernel->vme) { fixed_global_sz[0] = (global_work_size[0] + 15) / 16 * 16; fixed_global_sz[1] = (global_work_size[1] + 15) / 16; } else { for (i = 0; i < work_dim; ++i) fixed_global_sz[i] = global_work_size[i]; } if (global_work_offset != NULL) for (i = 0; i < work_dim; ++i) fixed_global_off[i] = global_work_offset[i]; if (kernel->compile_wg_sz[0] || kernel->compile_wg_sz[1] || kernel->compile_wg_sz[2]) { if (fixed_local_sz[0] != kernel->compile_wg_sz[0] || fixed_local_sz[1] != kernel->compile_wg_sz[1] || fixed_local_sz[2] != kernel->compile_wg_sz[2]) { err = CL_INVALID_WORK_GROUP_SIZE; break; } } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } int i, j, k; const size_t global_wk_sz_div[3] = { fixed_global_sz[0] / fixed_local_sz[0] * fixed_local_sz[0], fixed_global_sz[1] / fixed_local_sz[1] * fixed_local_sz[1], fixed_global_sz[2] / fixed_local_sz[2] * fixed_local_sz[2]}; const size_t global_wk_sz_rem[3] = { fixed_global_sz[0] % fixed_local_sz[0], fixed_global_sz[1] % fixed_local_sz[1], fixed_global_sz[2] % fixed_local_sz[2]}; cl_uint count; count = global_wk_sz_rem[0] ? 2 : 1; count *= global_wk_sz_rem[1] ? 2 : 1; count *= global_wk_sz_rem[2] ? 2 : 1; const size_t *global_wk_all[2] = {global_wk_sz_div, global_wk_sz_rem}; /* Go through the at most 8 cases and enqueue if there are work items left */ for (i = 0; i < 2; i++) { for (j = 0; j < 2; j++) { for (k = 0; k < 2; k++) { size_t global_wk_sz_use[3] = {global_wk_all[k][0], global_wk_all[j][1], global_wk_all[i][2]}; size_t global_dim_off[3] = { k * global_wk_sz_div[0] / fixed_local_sz[0], j * global_wk_sz_div[1] / fixed_local_sz[1], i * global_wk_sz_div[2] / fixed_local_sz[2]}; size_t local_wk_sz_use[3] = { k ? global_wk_sz_rem[0] : fixed_local_sz[0], j ? global_wk_sz_rem[1] : fixed_local_sz[1], i ? global_wk_sz_rem[2] : fixed_local_sz[2]}; if (local_wk_sz_use[0] == 0 || local_wk_sz_use[1] == 0 || local_wk_sz_use[2] == 0) continue; e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_NDRANGE_KERNEL, &err); if (err != CL_SUCCESS) { break; } /* Do device specific checks and enqueue the kernel */ err = cl_command_queue_ND_range(command_queue, kernel, e, work_dim, fixed_global_off, global_dim_off, fixed_global_sz, global_wk_sz_use, fixed_local_sz, local_wk_sz_use); if (err != CL_SUCCESS) { break; } e->exec_data.mid_event_of_enq = (count > 1); count--; /* We will flush the ndrange if it depends on no events; otherwise we add it to the queue list. The finish or complete status is always handled in the queue list. */ event_status = cl_event_is_ready(e); if (event_status < CL_COMPLETE) { // An error happened, cancel. err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } err = cl_event_exec(e, (event_status == CL_COMPLETE ? CL_SUBMITTED : CL_QUEUED), CL_FALSE); if (err != CL_SUCCESS) { break; } cl_command_queue_enqueue_event(command_queue, e); if (e->exec_data.mid_event_of_enq) cl_event_delete(e); } if (err != CL_SUCCESS) { break; } } if (err != CL_SUCCESS) { break; } } } while (0); if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } cl_int clEnqueueTask(cl_command_queue command_queue, cl_kernel kernel, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { const size_t global_size[3] = {1, 0, 0}; const size_t local_size[3] = {1, 0, 0}; return clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, global_size, local_size, num_events_in_wait_list, event_wait_list, event); } cl_int clEnqueueNativeKernel(cl_command_queue command_queue, void (*user_func)(void *), void *args, size_t cb_args, cl_uint num_mem_objects, const cl_mem *mem_list, const void **args_mem_loc, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) { cl_int err = CL_SUCCESS; void *new_args = NULL; void **new_args_mem_loc = NULL; cl_mem *new_mem_list = NULL; cl_int i; cl_int e_status; cl_event e = NULL; enqueue_data *data = NULL; do { if (user_func == NULL || (args == NULL && cb_args > 0) || (args == NULL && num_mem_objects > 0) || (args != NULL && cb_args == 0) || (num_mem_objects > 0 && (mem_list == NULL || args_mem_loc == NULL)) || (num_mem_objects == 0 && (mem_list != NULL || args_mem_loc != NULL))) { err = CL_INVALID_VALUE; break; } // Per spec, we need to copy the args if (cb_args) { new_args = cl_malloc(cb_args); if (num_mem_objects) { new_args_mem_loc = cl_malloc(sizeof(void *) * num_mem_objects); new_mem_list = cl_malloc(sizeof(cl_mem) * num_mem_objects); memcpy(new_mem_list, mem_list, sizeof(cl_mem) * num_mem_objects); } if (new_args == NULL || new_args_mem_loc == NULL) { err = CL_OUT_OF_HOST_MEMORY; break; } memcpy(new_args, args, cb_args); for (i = 0; i < num_mem_objects; ++i) { if (!CL_OBJECT_IS_MEM(mem_list[i])) { err = CL_INVALID_MEM_OBJECT; break; } new_args_mem_loc[i] = new_args + (args_mem_loc[i] - args); // relocate the pointer into the copied args } } err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list, event, command_queue->ctx); if (err != CL_SUCCESS) { break; } e = cl_event_create(command_queue->ctx, command_queue, num_events_in_wait_list, event_wait_list, CL_COMMAND_NATIVE_KERNEL, &err); if (err != CL_SUCCESS) { break; } e_status = cl_event_is_ready(e); if (e_status < CL_COMPLETE) { err = CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST; break; } data = &e->exec_data; data->type =
EnqueueNativeKernel; data->mem_list = new_mem_list; data->ptr = new_args; data->size = cb_args; data->offset = (size_t)num_mem_objects; data->const_ptr = new_args_mem_loc; data->user_func = user_func; new_args = NULL; new_mem_list = NULL; new_args_mem_loc = NULL; // Event delete will free them. err = cl_event_exec(e, (e_status == CL_COMPLETE ? CL_COMPLETE : CL_QUEUED), CL_FALSE); if (err != CL_SUCCESS) { break; } if (e_status != CL_COMPLETE) cl_command_queue_enqueue_event(command_queue, e); } while (0); if (err != CL_SUCCESS) { if (new_args) cl_free(new_args); if (new_mem_list) cl_free(new_mem_list); if (new_args_mem_loc) cl_free(new_args_mem_loc); } if (err == CL_SUCCESS && event) { *event = e; } else { cl_event_delete(e); } return err; } Beignet-1.3.2-Source/Android.common.mk000664 001750 001750 00000001617 13161142102 016654 0ustar00yryr000000 000000 #LOCAL_PATH:= $(call my-dir) #include $(CLEAR_VARS) TOP_C_INCLUDE := bionic $(BEIGNET_ROOT_PATH)/include TOP_CPPFLAGS := -Wall -Wno-invalid-offsetof -mfpmath=sse -fno-rtti -Wcast-align -std=c++11 -msse2 -msse3 -mssse3 -msse4.1 -D__ANDROID__ TOP_CFLAGS := -Wall -mfpmath=sse -msse2 -Wcast-align -msse2 -msse3 -mssse3 -msse4.1 -D__ANDROID__ LLVM_INCLUDE_DIRS := external/llvm/device/include\ external/llvm/include \ external/clang/include \ LLVM_CFLAGS := -DNDEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS LLVM_LFLAGS := -ldl -lm LLVM_FOUND := true DRM_INCLUDE_PATH := external/drm/intel external/drm/include/drm external/drm DRM_LIBRARY := libdrm DRM_FOUND := true THREAD_LIBS_INIT := libpthread DRM_INTEL_LIBRARY := libdrm_intel DRM_INTEL_FOUND := true GBE_LIBRARY := libgbe GBE_FOUND := false OCLIcd_FOUND := false Beignet-1.3.2-Source/CMakeLists.txt000664 001750 001750 00000032073 13174331716 016233 0ustar00yryr000000 000000 # compiler choose,now support ICC,GCC CLANG compiler if (COMPILER STREQUAL "GCC") find_program(CMAKE_C_COMPILER NAMES gcc) find_program(CMAKE_CXX_COMPILER NAMES g++) elseif (COMPILER STREQUAL "CLANG") set (CMAKE_C_COMPILER "clang") set (CMAKE_CXX_COMPILER "clang++") find_program(CMAKE_AR NAMES llvm-ar) find_program(CMAKE_LINKER NAMES llvm-ld) elseif (COMPILER STREQUAL "ICC") find_program(CMAKE_C_COMPILER NAMES icc) find_program(CMAKE_CXX_COMPILER NAMES icpc) find_program(CMAKE_AR NAMES xiar) find_program(CMAKE_LINKER NAMES xild) endif () CMAKE_MINIMUM_REQUIRED(VERSION 2.6.0) PROJECT(OCL) if( ${CMAKE_CXX_COMPILER_ID} STREQUAL "Clang") set(COMPILER "CLANG") elseif(${CMAKE_CXX_COMPILER_ID} STREQUAL "GNU") set(COMPILER "GCC") elseif(${CMAKE_CXX_COMPILER_ID} STREQUAL "Intel") set(COMPILER "ICC") endif() set (NOT_BUILD_STAND_ALONE_UTEST 1) INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_CURRENT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/include) INCLUDE (FindPkgConfig) SET(CMAKE_VERBOSE_MAKEFILE "false") set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/CMake/") INCLUDE (GNUInstallDirs OPTIONAL) # support old CMake without GNUInstallDirs if (NOT CMAKE_INSTALL_FULL_LIBDIR) set (CMAKE_INSTALL_FULL_LIBDIR "${CMAKE_INSTALL_PREFIX}/lib") set (BEIGNET_LIBRARY_ARCHITECTURE "") else (NOT CMAKE_INSTALL_FULL_LIBDIR) set (BEIGNET_LIBRARY_ARCHITECTURE "${CMAKE_LIBRARY_ARCHITECTURE}") endif (NOT CMAKE_INSTALL_FULL_LIBDIR) if (NOT LIB_INSTALL_DIR) set (LIB_INSTALL_DIR "${CMAKE_INSTALL_FULL_LIBDIR}") endif (NOT LIB_INSTALL_DIR) if (NOT BEIGNET_INSTALL_DIR) set (BEIGNET_INSTALL_DIR "${LIB_INSTALL_DIR}/beignet/") endif (NOT BEIGNET_INSTALL_DIR) # allow co-installation of 32- and 64-bit 
versions: # https://wiki.debian.org/Multiarch if (BEIGNET_INSTALL_DIR STREQUAL "${CMAKE_INSTALL_PREFIX}/lib/beignet/") set (ICD_FILE_NAME "intel-beignet.icd") else (BEIGNET_INSTALL_DIR STREQUAL "${CMAKE_INSTALL_PREFIX}/lib/beignet/") if (BEIGNET_LIBRARY_ARCHITECTURE STREQUAL "") set (ICD_FILE_NAME "intel-beignet.icd") else (BEIGNET_LIBRARY_ARCHITECTURE STREQUAL "") set (ICD_FILE_NAME "intel-beignet-${BEIGNET_LIBRARY_ARCHITECTURE}.icd") endif (BEIGNET_LIBRARY_ARCHITECTURE STREQUAL "") endif (BEIGNET_INSTALL_DIR STREQUAL "${CMAKE_INSTALL_PREFIX}/lib/beignet/") # Force Release with debug info if (NOT CMAKE_BUILD_TYPE) set (CMAKE_BUILD_TYPE RelWithDebInfo) endif (NOT CMAKE_BUILD_TYPE) set (CMAKE_BUILD_TYPE ${CMAKE_BUILD_TYPE} CACHE STRING "assure config" FORCE) message(STATUS "Building mode: " ${CMAKE_BUILD_TYPE}) # XXX now hard coded to enable the clamp to border workaround for IVB. ADD_DEFINITIONS(-DGEN7_SAMPLER_CLAMP_BORDER_WORKAROUND) # compiler flag setting if (COMPILER STREQUAL "GCC") set (CMAKE_C_CXX_FLAGS "${CMAKE_C_CXX_FLAGS} -funroll-loops -fstrict-aliasing -msse2 -msse3 -mssse3 -msse4.1 -fPIC -Wall -mfpmath=sse -Wcast-align -Wl,-E") elseif (COMPILER STREQUAL "CLANG") set (CMAKE_C_CXX_FLAGS "${CMAKE_C_CXX_FLAGS} -funroll-loops -fstrict-aliasing -msse2 -msse3 -mssse3 -msse4.1 -fPIC -Wall") elseif (COMPILER STREQUAL "ICC") set (CMAKE_C_CXX_FLAGS "${CMAKE_C_CXX_FLAGS} -wd2928 -Wall -fPIC -fstrict-aliasing -fp-model fast -msse4.1 -Wl,-E") endif () set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CMAKE_C_CXX_FLAGS} -std=c++0x -Wno-invalid-offsetof") set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${CMAKE_C_CXX_FLAGS}") set (CMAKE_CXX_FLAGS_DEBUG "-O0 -g -DGBE_DEBUG=1") set (CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O2 -g -DGBE_DEBUG=1") set (CMAKE_CXX_FLAGS_MINSIZEREL "-Os -DNDEBUG -DGBE_DEBUG=0") set (CMAKE_CXX_FLAGS_RELEASE "-O2 -DNDEBUG -DGBE_DEBUG=0") set (CMAKE_C_FLAGS_DEBUG "-O0 -g -DGBE_DEBUG=1") set (CMAKE_C_FLAGS_RELWITHDEBINFO "-O2 -g -DGBE_DEBUG=1") set (CMAKE_C_FLAGS_MINSIZEREL "-Os -DNDEBUG -DGBE_DEBUG=0") set (CMAKE_C_FLAGS_RELEASE "-O2 -DNDEBUG -DGBE_DEBUG=0") IF (USE_STANDALONE_GBE_COMPILER STREQUAL "true") Find_Package(StandaloneGbeCompiler) ELSE (USE_STANDALONE_GBE_COMPILER STREQUAL "true") # Front end stuff we need #INCLUDE(CMake/FindLLVM.cmake) Find_Package(LLVM 3.3) SET (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-rtti") ENDIF (USE_STANDALONE_GBE_COMPILER STREQUAL "true") set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,-Bsymbolic -Wl,--no-undefined ${LLVM_LDFLAGS}") # XLib Find_Package(X11) IF(X11_FOUND) MESSAGE(STATUS "Looking for XLib - found") ELSE(X11_FOUND) MESSAGE(STATUS "Looking for XLib - not found") ENDIF(X11_FOUND) # DRM pkg_check_modules(DRM REQUIRED libdrm) IF(DRM_FOUND) MESSAGE(STATUS "Looking for DRM - found at ${DRM_PREFIX} ${DRM_VERSION}") INCLUDE_DIRECTORIES(${DRM_INCLUDE_DIRS}) ELSE(DRM_FOUND) MESSAGE(STATUS "Looking for DRM - not found") ENDIF(DRM_FOUND) include(CheckLibraryExists) # DRM Intel pkg_check_modules(DRM_INTEL libdrm_intel>=2.4.52) IF(DRM_INTEL_FOUND) INCLUDE_DIRECTORIES(${DRM_INTEL_INCLUDE_DIRS}) MESSAGE(STATUS "Looking for DRM Intel - found at ${DRM_INTEL_PREFIX} ${DRM_INTEL_VERSION}") CHECK_LIBRARY_EXISTS(drm_intel "drm_intel_bo_alloc_userptr" ${DRM_INTEL_LIBDIR} HAVE_DRM_INTEL_USERPTR) IF(HAVE_DRM_INTEL_USERPTR) MESSAGE(STATUS "Enable userptr support") ELSE(HAVE_DRM_INTEL_USERPTR) MESSAGE(STATUS "Disable userptr support") ENDIF(HAVE_DRM_INTEL_USERPTR) CHECK_LIBRARY_EXISTS(drm_intel "drm_intel_get_eu_total" ${DRM_INTEL_LIBDIR} 
HAVE_DRM_INTEL_EU_TOTAL) IF(HAVE_DRM_INTEL_EU_TOTAL) MESSAGE(STATUS "Enable EU total query support") ELSE(HAVE_DRM_INTEL_EU_TOTAL) MESSAGE(STATUS "Disable EU total query support") ENDIF(HAVE_DRM_INTEL_EU_TOTAL) CHECK_LIBRARY_EXISTS(drm_intel "drm_intel_get_subslice_total" ${DRM_INTEL_LIBDIR} HAVE_DRM_INTEL_SUBSLICE_TOTAL) IF(HAVE_DRM_INTEL_SUBSLICE_TOTAL) MESSAGE(STATUS "Enable subslice total query support") ELSE(HAVE_DRM_INTEL_SUBSLICE_TOTAL) MESSAGE(STATUS "Disable subslice total query support") ENDIF(HAVE_DRM_INTEL_SUBSLICE_TOTAL) CHECK_LIBRARY_EXISTS(drm_intel "drm_intel_get_pooled_eu" ${DRM_INTEL_LIBDIR} HAVE_DRM_INTEL_POOLED_EU) IF(HAVE_DRM_INTEL_POOLED_EU) MESSAGE(STATUS "Enable pooled eu query support") ELSE(HAVE_DRM_INTEL_POOLED_EU) MESSAGE(STATUS "Disable pooled eu query support") ENDIF(HAVE_DRM_INTEL_POOLED_EU) CHECK_LIBRARY_EXISTS(drm_intel "drm_intel_get_min_eu_in_pool" ${DRM_INTEL_LIBDIR} HAVE_DRM_INTEL_MIN_EU_IN_POOL) IF(HAVE_DRM_INTEL_MIN_EU_IN_POOL) MESSAGE(STATUS "Enable min eu in pool query support") ELSE(HAVE_DRM_INTEL_MIN_EU_IN_POOL) MESSAGE(STATUS "Disable min eu in pool query support") ENDIF(HAVE_DRM_INTEL_MIN_EU_IN_POOL) CHECK_LIBRARY_EXISTS(drm_intel "drm_intel_bo_set_softpin_offset" ${DRM_INTEL_LIBDIR} HAVE_DRM_INTEL_BO_SET_SOFTPIN) ELSE(DRM_INTEL_FOUND) MESSAGE(FATAL_ERROR "Looking for DRM Intel (>= 2.4.52) - not found") ENDIF(DRM_INTEL_FOUND) # CMRT #disable CMRT as default, since we do not see real case, #but see build issue of this feature OPTION(INVOKE_CMRT "Enable CMRT" OFF) IF(INVOKE_CMRT) pkg_check_modules(CMRT libcmrt) IF(CMRT_FOUND) INCLUDE_DIRECTORIES(${CMRT_INCLUDE_DIRS}) ENDIF(CMRT_FOUND) ENDIF(INVOKE_CMRT) # Threads Find_Package(Threads) IF(X11_FOUND) # Xext pkg_check_modules(XEXT REQUIRED xext) IF(XEXT_FOUND) INCLUDE_DIRECTORIES(${XEXT_INCLUDE_DIRS}) MESSAGE(STATUS "Looking for Xext - found at ${XEXT_PREFIX}") ELSE(XEXT_FOUND) MESSAGE(STATUS "Looking for Xext - not found") ENDIF(XEXT_FOUND) # Xfixes pkg_check_modules(XFIXES REQUIRED xfixes) IF(XFIXES_FOUND) INCLUDE_DIRECTORIES(${XFIXES_INCLUDE_DIRS}) MESSAGE(STATUS "Looking for Xfixes - found at ${XFIXES_PREFIX}") ELSE(XFIXES_FOUND) MESSAGE(STATUS "Looking for Xfixes - not found") ENDIF(XFIXES_FOUND) ENDIF(X11_FOUND) pkg_check_modules(OPENGL QUIET gl>=13.0.0) IF(OPENGL_FOUND) MESSAGE(STATUS "Looking for OpenGL - found at ${OPENGL_PREFIX} ${OPENGL_VERSION}") ELSE(OPENGL_FOUND) MESSAGE(STATUS "Looking for OpenGL (>=13.0.0) - not found, cl_khr_gl_sharing will be disabled") ENDIF(OPENGL_FOUND) pkg_check_modules(EGL QUIET egl>=13.0.0) IF(EGL_FOUND) MESSAGE(STATUS "Looking for EGL - found at ${EGL_PREFIX} ${EGL_VERSION}") ELSE(EGL_FOUND) MESSAGE(STATUS "Looking for EGL (>=13.0.0) - not found, cl_khr_gl_sharing will be disabled") ENDIF(EGL_FOUND) OPTION(OCLICD_COMPAT "OCL ICD compatibility mode" ON) IF(OCLICD_COMPAT) Find_Package(OCLIcd) IF(OCLIcd_FOUND) MESSAGE(STATUS "Looking for OCL ICD header file - found") configure_file ( "intel-beignet.icd.in" "${ICD_FILE_NAME}" ) install (FILES ${CMAKE_CURRENT_BINARY_DIR}/${ICD_FILE_NAME} DESTINATION /etc/OpenCL/vendors) ELSE(OCLIcd_FOUND) MESSAGE(STATUS "Looking for OCL ICD header file - not found") MESSAGE(FATAL_ERROR "OCL ICD loader miss. 
If you really want to disable OCL ICD support, please run cmake with option -DOCLICD_COMPAT=0.") ENDIF(OCLIcd_FOUND) ENDIF(OCLICD_COMPAT) Find_Package(PythonInterp) OPTION(EXPERIMENTAL_DOUBLE "Enable experimental double support" OFF) IF (EXPERIMENTAL_DOUBLE) ADD_DEFINITIONS(-DENABLE_FP64) ENDIF(EXPERIMENTAL_DOUBLE) SET(CAN_OPENCL_20 ON) IF (CMAKE_SIZEOF_VOID_P EQUAL 4) SET(CAN_OPENCL_20 OFF) ENDIF (CMAKE_SIZEOF_VOID_P EQUAL 4) IF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN) SET(CAN_OPENCL_20 OFF) ENDIF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN) IF (LLVM_VERSION_NODOT VERSION_LESS 39) SET(CAN_OPENCL_20 OFF) ENDIF (LLVM_VERSION_NODOT VERSION_LESS 39) IF (ENABLE_OPENCL_20) IF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN) MESSAGE(FATAL_ERROR "Please update libdrm to version 2.4.66 or later to enable OpenCL 2.0.") ENDIF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN) IF (LLVM_VERSION_NODOT VERSION_LESS 39) MESSAGE(FATAL_ERROR "Please update LLVM to version 3.9 or later to enable OpenCL 2.0.") ENDIF (LLVM_VERSION_NODOT VERSION_LESS 39) IF (CMAKE_SIZEOF_VOID_P EQUAL 4) MESSAGE(FATAL_ERROR "Please use x64 host to enable OpenCL 2.0.") ENDIF (CMAKE_SIZEOF_VOID_P EQUAL 4) ENDIF(ENABLE_OPENCL_20) IF (DEFINED ENABLE_OPENCL_20) IF (ENABLE_OPENCL_20 AND CAN_OPENCL_20) SET(CAN_OPENCL_20 ON) ELSE(ENABLE_OPENCL_20 AND CAN_OPENCL_20) SET(CAN_OPENCL_20 OFF) ENDIF (ENABLE_OPENCL_20 AND CAN_OPENCL_20) ENDIF (DEFINED ENABLE_OPENCL_20) OPTION(ENABLE_OPENCL_20 "Enable opencl 2.0 support" ${CAN_OPENCL_20}) IF (CAN_OPENCL_20) SET (ENABLE_OPENCL_20 ON) MESSAGE(STATUS "Building with OpenCL 2.0.") ADD_DEFINITIONS(-DENABLE_OPENCL_20) ELSE (CAN_OPENCL_20) MESSAGE(STATUS "Building with OpenCL 1.2.") ENDIF(CAN_OPENCL_20) set (LIBCL_DRIVER_VERSION_MAJOR 1) set (LIBCL_DRIVER_VERSION_MINOR 3) set (LIBCL_DRIVER_VERSION_PATCH 2) if (ENABLE_OPENCL_20) set (LIBCL_C_VERSION_MAJOR 2) set (LIBCL_C_VERSION_MINOR 0) else (ENABLE_OPENCL_20) set (LIBCL_C_VERSION_MAJOR 1) set (LIBCL_C_VERSION_MINOR 2) endif (ENABLE_OPENCL_20) configure_file ( "src/OCLConfig.h.in" "src/OCLConfig.h" ) OPTION(BUILD_EXAMPLES "Build examples" OFF) IF(BUILD_EXAMPLES) IF(NOT X11_FOUND) MESSAGE(FATAL_ERROR "XLib is necessary for examples - not found") ENDIF(NOT X11_FOUND) # libva & libva-x11 pkg_check_modules(LIBVA REQUIRED libva) pkg_check_modules(LIBVA-X11 REQUIRED libva-x11) set(LIBVA_BUF_SH_DEP false) set(V4L2_BUF_SH_DEP false) IF(LIBVA_FOUND AND LIBVA-X11_FOUND) MESSAGE(STATUS "Looking for LIBVA - found at ${LIBVA_PREFIX} ${LIBVA_VERSION}") MESSAGE(STATUS "Looking for LIBVA-X11 - found at ${LIBVA-X11_PREFIX} ${LIBVA-X11_VERSION}") INCLUDE_DIRECTORIES(${LIBVA_INCLUDE_DIRS}) INCLUDE_DIRECTORIES(${LIBVA-X11_INCLUDE_DIRS}) set(V4L2_BUF_SH_DEP true) IF(LIBVA_VERSION VERSION_LESS "0.36.0" OR LIBVA-X11_VERSION VERSION_LESS "0.36.0") IF(LIBVA_VERSION VERSION_LESS "0.36.0") MESSAGE(STATUS "Looking for LIBVA (>= 0.36.0) - not found") ENDIF(LIBVA_VERSION VERSION_LESS "0.36.0") IF(LIBVA-X11_VERSION VERSION_LESS "0.36.0") MESSAGE(STATUS "Looking for LIBVA-X11 (>= 0.36.0) - not found") ENDIF(LIBVA-X11_VERSION VERSION_LESS "0.36.0") MESSAGE(STATUS "Example libva_buffer_sharing will not be built") ELSE(LIBVA_VERSION VERSION_LESS "0.36.0" OR LIBVA-X11_VERSION VERSION_LESS "0.36.0") set(LIBVA_BUF_SH_DEP true) ENDIF(LIBVA_VERSION VERSION_LESS "0.36.0" OR LIBVA-X11_VERSION VERSION_LESS "0.36.0") ELSE(LIBVA_FOUND AND LIBVA-X11_FOUND) MESSAGE(STATUS "Example libva_buffer_sharing and v4l2_buffer_sharing will not be built") ENDIF(LIBVA_FOUND AND LIBVA-X11_FOUND) IF(NOT (OPENGL_FOUND AND EGL_FOUND AND X11_FOUND)) 
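# (Examples are opt-in; a typical configure enabling them, as a sketch:
#    cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_EXAMPLES=ON <source-dir>
#  The X11, OpenGL/EGL and libva checks above then decide which examples
#  can actually be built.)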
MESSAGE(STATUS "Example gl_buffer_sharing will not be built") ENDIF(NOT (OPENGL_FOUND AND EGL_FOUND AND X11_FOUND)) ENDIF(BUILD_EXAMPLES) ADD_SUBDIRECTORY(include) ADD_SUBDIRECTORY(backend) ADD_SUBDIRECTORY(src) ADD_SUBDIRECTORY(utests EXCLUDE_FROM_ALL) # compile benchmark only if standalone compiler is not provided IF (NOT (USE_STANDALONE_GBE_COMPILER STREQUAL "true")) ADD_SUBDIRECTORY(benchmark EXCLUDE_FROM_ALL) ENDIF (NOT (USE_STANDALONE_GBE_COMPILER STREQUAL "true")) IF(BUILD_EXAMPLES) ADD_SUBDIRECTORY(examples) ENDIF(BUILD_EXAMPLES) SET(CPACK_SET_DESTDIR ON) SET(CPACK_PACKAGE_VERSION_MAJOR "${LIBCL_DRIVER_VERSION_MAJOR}") SET(CPACK_PACKAGE_VERSION_MINOR "${LIBCL_DRIVER_VERSION_MINOR}") SET(CPACK_PACKAGE_VERSION_PATCH "${LIBCL_DRIVER_VERSION_PATCH}") SET(CPACK_SOURCE_GENERATOR "TGZ;TZ") SET(CPACK_PACKAGE_NAME "Beignet") SET(CPACK_PACKAGE_VENDOR "Intel Open Source Technology Center") INCLUDE(CPack) Beignet-1.3.2-Source/include/000775 001750 001750 00000000000 13174334761 015114 5ustar00yryr000000 000000 Beignet-1.3.2-Source/include/CMakeLists.txt000664 001750 001750 00000000255 13161142102 017634 0ustar00yryr000000 000000 FILE(GLOB HEADER_FILES "CL/*.h") FILE(GLOB HPP_FILES "CL/*.hpp") install (FILES ${HEADER_FILES} DESTINATION include/CL) install (FILES ${HPP_FILES} DESTINATION include/CL) Beignet-1.3.2-Source/include/CL/000775 001750 001750 00000000000 13174334761 015412 5ustar00yryr000000 000000 Beignet-1.3.2-Source/include/CL/cl_egl.h000664 001750 001750 00000012356 13161142102 016775 0ustar00yryr000000 000000 /******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. 
******************************************************************************/ #ifndef __OPENCL_CL_EGL_H #define __OPENCL_CL_EGL_H #ifdef __APPLE__ #else #include #endif #ifdef __cplusplus extern "C" { #endif /* Command type for events created with clEnqueueAcquireEGLObjectsKHR */ #define CL_COMMAND_EGL_FENCE_SYNC_OBJECT_KHR 0x202F #define CL_COMMAND_ACQUIRE_EGL_OBJECTS_KHR 0x202D #define CL_COMMAND_RELEASE_EGL_OBJECTS_KHR 0x202E /* Error type for clCreateFromEGLImageKHR */ #define CL_INVALID_EGL_OBJECT_KHR -1093 #define CL_EGL_RESOURCE_NOT_ACQUIRED_KHR -1092 /* CLeglImageKHR is an opaque handle to an EGLImage */ typedef void* CLeglImageKHR; /* CLeglDisplayKHR is an opaque handle to an EGLDisplay */ typedef void* CLeglDisplayKHR; /* CLeglSyncKHR is an opaque handle to an EGLSync object */ typedef void* CLeglSyncKHR; /* properties passed to clCreateFromEGLImageKHR */ typedef intptr_t cl_egl_image_properties_khr; #define cl_khr_egl_image 1 extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromEGLImageKHR(cl_context /* context */, CLeglDisplayKHR /* egldisplay */, CLeglImageKHR /* eglimage */, cl_mem_flags /* flags */, const cl_egl_image_properties_khr * /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromEGLImageKHR_fn)( cl_context context, CLeglDisplayKHR egldisplay, CLeglImageKHR eglimage, cl_mem_flags flags, const cl_egl_image_properties_khr * properties, cl_int * errcode_ret); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireEGLObjectsKHR(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireEGLObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event); extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseEGLObjectsKHR(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseEGLObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event); #define cl_khr_egl_event 1 extern CL_API_ENTRY cl_event CL_API_CALL clCreateEventFromEGLSyncKHR(cl_context /* context */, CLeglSyncKHR /* sync */, CLeglDisplayKHR /* display */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_event (CL_API_CALL *clCreateEventFromEGLSyncKHR_fn)( cl_context context, CLeglSyncKHR sync, CLeglDisplayKHR display, cl_int * errcode_ret); #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_EGL_H */ Beignet-1.3.2-Source/include/CL/cl_dx9_media_sharing.h000664 001750 001750 00000012455 13161142102 021604 0ustar00yryr000000 000000 /********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. 
* * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_DX9_MEDIA_SHARING_H #define __OPENCL_CL_DX9_MEDIA_SHARING_H #include #include #ifdef __cplusplus extern "C" { #endif /******************************************************************************/ /* cl_khr_dx9_media_sharing */ #define cl_khr_dx9_media_sharing 1 typedef cl_uint cl_dx9_media_adapter_type_khr; typedef cl_uint cl_dx9_media_adapter_set_khr; #if defined(_WIN32) #include typedef struct _cl_dx9_surface_info_khr { IDirect3DSurface9 *resource; HANDLE shared_handle; } cl_dx9_surface_info_khr; #endif /******************************************************************************/ /* Error Codes */ #define CL_INVALID_DX9_MEDIA_ADAPTER_KHR -1010 #define CL_INVALID_DX9_MEDIA_SURFACE_KHR -1011 #define CL_DX9_MEDIA_SURFACE_ALREADY_ACQUIRED_KHR -1012 #define CL_DX9_MEDIA_SURFACE_NOT_ACQUIRED_KHR -1013 /* cl_media_adapter_type_khr */ #define CL_ADAPTER_D3D9_KHR 0x2020 #define CL_ADAPTER_D3D9EX_KHR 0x2021 #define CL_ADAPTER_DXVA_KHR 0x2022 /* cl_media_adapter_set_khr */ #define CL_PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR 0x2023 #define CL_ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR 0x2024 /* cl_context_info */ #define CL_CONTEXT_ADAPTER_D3D9_KHR 0x2025 #define CL_CONTEXT_ADAPTER_D3D9EX_KHR 0x2026 #define CL_CONTEXT_ADAPTER_DXVA_KHR 0x2027 /* cl_mem_info */ #define CL_MEM_DX9_MEDIA_ADAPTER_TYPE_KHR 0x2028 #define CL_MEM_DX9_MEDIA_SURFACE_INFO_KHR 0x2029 /* cl_image_info */ #define CL_IMAGE_DX9_MEDIA_PLANE_KHR 0x202A /* cl_command_type */ #define CL_COMMAND_ACQUIRE_DX9_MEDIA_SURFACES_KHR 0x202B #define CL_COMMAND_RELEASE_DX9_MEDIA_SURFACES_KHR 0x202C /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromDX9MediaAdapterKHR_fn)( cl_platform_id platform, cl_uint num_media_adapters, cl_dx9_media_adapter_type_khr * media_adapter_type, void * media_adapters, cl_dx9_media_adapter_set_khr media_adapter_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL 
*clCreateFromDX9MediaSurfaceKHR_fn)( cl_context context, cl_mem_flags flags, cl_dx9_media_adapter_type_khr adapter_type, void * surface_info, cl_uint plane, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireDX9MediaSurfacesKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseDX9MediaSurfacesKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_DX9_MEDIA_SHARING_H */ Beignet-1.3.2-Source/include/CL/cl_gl.h000664 001750 001750 00000016637 13161142102 016636 0ustar00yryr000000 000000 /********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. 
**********************************************************************************/ #ifndef __OPENCL_CL_GL_H #define __OPENCL_CL_GL_H #ifdef __APPLE__ #include #else #include #endif #ifdef __cplusplus extern "C" { #endif typedef cl_uint cl_gl_object_type; typedef cl_uint cl_gl_texture_info; typedef cl_uint cl_gl_platform_info; typedef struct __GLsync *cl_GLsync; /* cl_gl_object_type = 0x2000 - 0x200F enum values are currently taken */ #define CL_GL_OBJECT_BUFFER 0x2000 #define CL_GL_OBJECT_TEXTURE2D 0x2001 #define CL_GL_OBJECT_TEXTURE3D 0x2002 #define CL_GL_OBJECT_RENDERBUFFER 0x2003 #define CL_GL_OBJECT_TEXTURE2D_ARRAY 0x200E #define CL_GL_OBJECT_TEXTURE1D 0x200F #define CL_GL_OBJECT_TEXTURE1D_ARRAY 0x2010 #define CL_GL_OBJECT_TEXTURE_BUFFER 0x2011 /* cl_gl_texture_info */ #define CL_GL_TEXTURE_TARGET 0x2004 #define CL_GL_MIPMAP_LEVEL 0x2005 #define CL_GL_NUM_SAMPLES 0x2012 extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLBuffer(cl_context /* context */, cl_mem_flags /* flags */, cl_GLuint /* bufobj */, int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLTexture(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateFromGLRenderbuffer(cl_context /* context */, cl_mem_flags /* flags */, cl_GLuint /* renderbuffer */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetGLObjectInfo(cl_mem /* memobj */, cl_gl_object_type * /* gl_object_type */, cl_GLuint * /* gl_object_name */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetGLTextureInfo(cl_mem /* memobj */, cl_gl_texture_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueAcquireGLObjects(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReleaseGLObjects(cl_command_queue /* command_queue */, cl_uint /* num_objects */, const cl_mem * /* mem_objects */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; /* Deprecated OpenCL 1.1 APIs */ extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateFromGLTexture2D(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateFromGLTexture3D(cl_context /* context */, cl_mem_flags /* flags */, cl_GLenum /* target */, cl_GLint /* miplevel */, cl_GLuint /* texture */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; /* cl_khr_gl_sharing extension */ #define cl_khr_gl_sharing 1 typedef cl_uint cl_gl_context_info; /* Additional Error Codes */ #define CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR -1000 /* cl_gl_context_info */ #define CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR 0x2006 #define CL_DEVICES_FOR_GL_CONTEXT_KHR 0x2007 /* Additional cl_context_properties */ #define CL_GL_CONTEXT_KHR 0x2008 #define 
CL_EGL_DISPLAY_KHR 0x2009 #define CL_GLX_DISPLAY_KHR 0x200A #define CL_WGL_HDC_KHR 0x200B #define CL_CGL_SHAREGROUP_KHR 0x200C extern CL_API_ENTRY cl_int CL_API_CALL clGetGLContextInfoKHR(const cl_context_properties * /* properties */, cl_gl_context_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetGLContextInfoKHR_fn)( const cl_context_properties * properties, cl_gl_context_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret); #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_GL_H */ Beignet-1.3.2-Source/include/CL/cl_ext.h000664 001750 001750 00000046431 13161142102 017027 0ustar00yryr000000 000000 /******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ /* $Revision: 11928 $ on $Date: 2010-07-13 09:04:56 -0700 (Tue, 13 Jul 2010) $ */ /* cl_ext.h contains OpenCL extensions which don't have external */ /* (OpenGL, D3D) dependencies. */ #ifndef __CL_EXT_H #define __CL_EXT_H #ifdef __cplusplus extern "C" { #endif #ifdef __APPLE__ #include #include #else #include #endif /* cl_khr_fp16 extension - no extension #define since it has no functions */ #define CL_DEVICE_HALF_FP_CONFIG 0x1033 /* Memory object destruction * * Apple extension for use to manage externally allocated buffers used with cl_mem objects with CL_MEM_USE_HOST_PTR * * Registers a user callback function that will be called when the memory object is deleted and its resources * freed. Each call to clSetMemObjectCallbackFn registers the specified user callback function on a callback * stack associated with memobj. The registered user callback functions are called in the reverse order in * which they were registered. The user callback functions are called and then the memory object is deleted * and its resources freed. 
/* Context Logging Functions * * The next three convenience functions are intended to be used as the pfn_notify parameter to clCreateContext(). * Please check for the "cl_APPLE_ContextLoggingFunctions" extension using clGetDeviceInfo(CL_DEVICE_EXTENSIONS) * before using. * * clLogMessagesToSystemLog forwards all log messages to the Apple System Logger */ #define cl_APPLE_ContextLoggingFunctions 1 extern void CL_API_ENTRY clLogMessagesToSystemLogAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /* clLogMessagesToStdout sends all log messages to the file descriptor stdout */ extern void CL_API_ENTRY clLogMessagesToStdoutAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /* clLogMessagesToStderr sends all log messages to the file descriptor stderr */ extern void CL_API_ENTRY clLogMessagesToStderrAPPLE( const char * /* errstr */, const void * /* private_info */, size_t /* cb */, void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0; /************************ * cl_khr_icd extension * ************************/ #define cl_khr_icd 1 /* cl_platform_info */ #define CL_PLATFORM_ICD_SUFFIX_KHR 0x0920 /* Additional Error Codes */ #define CL_PLATFORM_NOT_FOUND_KHR -1001 extern CL_API_ENTRY cl_int CL_API_CALL clIcdGetPlatformIDsKHR(cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */); typedef CL_API_ENTRY cl_int (CL_API_CALL *clIcdGetPlatformIDsKHR_fn)( cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */); /* Extension: cl_khr_image2d_buffer * * This extension allows a 2D image to be created from a cl_mem buffer without a copy. * The type associated with a 2D image created from a buffer in an OpenCL program is image2d_t. * Both the sampler and sampler-less read_image built-in functions are supported for 2D images * and 2D images created from a buffer. Similarly, the write_image built-ins are also supported * for 2D images created from a buffer. * * When the 2D image from buffer is created, the client must specify the width, * height, image format (i.e. channel order and channel data type) and optionally the row pitch. * * The pitch specified must be a multiple of CL_DEVICE_IMAGE_PITCH_ALIGNMENT pixels. * The base address of the buffer must be aligned to CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT pixels. */
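/*
 * Illustrative sketch (not part of the original header) of the
 * cl_khr_image2d_buffer path described above: a 2D image that aliases an
 * existing buffer, created through the core clCreateImage with
 * cl_image_desc::mem_object. The row pitch is given in bytes and, per the
 * note above, must respect CL_DEVICE_IMAGE_PITCH_ALIGNMENT; passing 0 lets
 * the runtime pick it. Format and flags are example choices.
 */
static cl_mem image2d_from_buffer(cl_context ctx, cl_mem buffer,
                                  size_t width, size_t height,
                                  size_t row_pitch, cl_int *err)
{
    cl_image_format fmt = { CL_RGBA, CL_UNORM_INT8 };
    cl_image_desc desc = {0};
    desc.image_type = CL_MEM_OBJECT_IMAGE2D;
    desc.image_width = width;
    desc.image_height = height;
    desc.image_row_pitch = row_pitch;
    desc.mem_object = buffer;   /* the buffer whose storage is aliased */
    /* host_ptr is NULL: the image shares the buffer's storage, no copy. */
    return clCreateImage(ctx, CL_MEM_READ_WRITE, &fmt, &desc, NULL, err);
}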
/************************************* * cl_khr_initialize_memory extension * *************************************/ #define CL_CONTEXT_MEMORY_INITIALIZE_KHR 0x2030 /************************************** * cl_khr_terminate_context extension * **************************************/ #define CL_DEVICE_TERMINATE_CAPABILITY_KHR 0x2031 #define CL_CONTEXT_TERMINATE_KHR 0x2032 #define cl_khr_terminate_context 1 extern CL_API_ENTRY cl_int CL_API_CALL clTerminateContextKHR(cl_context /* context */) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clTerminateContextKHR_fn)(cl_context /* context */) CL_EXT_SUFFIX__VERSION_1_2; /* * Extension: cl_khr_spir * * This extension adds support to create an OpenCL program object from a * Standard Portable Intermediate Representation (SPIR) instance */ #define CL_DEVICE_SPIR_VERSIONS 0x40E0 #define CL_PROGRAM_BINARY_TYPE_INTERMEDIATE 0x40E1 /****************************************** * cl_nv_device_attribute_query extension * ******************************************/ /* cl_nv_device_attribute_query extension - no extension #define since it has no functions */ #define CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV 0x4000 #define CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV 0x4001 #define CL_DEVICE_REGISTERS_PER_BLOCK_NV 0x4002 #define CL_DEVICE_WARP_SIZE_NV 0x4003 #define CL_DEVICE_GPU_OVERLAP_NV 0x4004 #define CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV 0x4005 #define CL_DEVICE_INTEGRATED_MEMORY_NV 0x4006 /********************************* * cl_amd_device_attribute_query * *********************************/ #define CL_DEVICE_PROFILING_TIMER_OFFSET_AMD 0x4036 /********************************* * cl_arm_printf extension *********************************/ #define CL_PRINTF_CALLBACK_ARM 0x40B0 #define CL_PRINTF_BUFFERSIZE_ARM 0x40B1 /********************************* * cl_intel_accelerator extension * *********************************/ #define cl_intel_accelerator 1 #define cl_intel_motion_estimation 1 typedef struct _cl_accelerator_intel* cl_accelerator_intel; typedef cl_uint cl_accelerator_type_intel; typedef cl_uint cl_accelerator_info_intel; typedef struct _cl_motion_estimation_desc_intel { cl_uint mb_block_type; cl_uint subpixel_mode; cl_uint sad_adjust_mode; cl_uint search_path_type; } cl_motion_estimation_desc_intel; /* Error Codes */ #define CL_INVALID_ACCELERATOR_INTEL -1094 #define CL_INVALID_ACCELERATOR_TYPE_INTEL -1095 #define CL_INVALID_ACCELERATOR_DESCRIPTOR_INTEL -1096 #define CL_ACCELERATOR_TYPE_NOT_SUPPORTED_INTEL -1097 /* Deprecated Error Codes */ #define CL_INVALID_ACCELERATOR_INTEL_DEPRECATED -6000 #define CL_INVALID_ACCELERATOR_TYPE_INTEL_DEPRECATED -6001 #define CL_INVALID_ACCELERATOR_DESCRIPTOR_INTEL_DEPRECATED -6002 #define CL_ACCELERATOR_TYPE_NOT_SUPPORTED_INTEL_DEPRECATED -6003 /* cl_accelerator_type_intel */ #define CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL 0x0 /* cl_accelerator_info_intel */ #define CL_ACCELERATOR_DESCRIPTOR_INTEL 0x4090 #define CL_ACCELERATOR_REFERENCE_COUNT_INTEL 0x4091 #define CL_ACCELERATOR_CONTEXT_INTEL 0x4092 #define CL_ACCELERATOR_TYPE_INTEL 0x4093 /* cl_motion_estimation_desc_intel flags */ #define CL_ME_MB_TYPE_16x16_INTEL 0x0 #define CL_ME_MB_TYPE_8x8_INTEL 0x1 #define CL_ME_MB_TYPE_4x4_INTEL 0x2 #define CL_ME_SUBPIXEL_MODE_INTEGER_INTEL 0x0 #define CL_ME_SUBPIXEL_MODE_HPEL_INTEL 0x1 #define CL_ME_SUBPIXEL_MODE_QPEL_INTEL 0x2 #define CL_ME_SAD_ADJUST_MODE_NONE_INTEL 0x0 #define CL_ME_SAD_ADJUST_MODE_HAAR_INTEL 0x1 #define CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL 0x0 #define CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL 0x1 #define
CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL 0x5 extern CL_API_ENTRY cl_accelerator_intel CL_API_CALL clCreateAcceleratorINTEL( cl_context /* context */, cl_accelerator_type_intel /* accelerator_type */, size_t /* descriptor_size */, const void* /* descriptor */, cl_int* /* errcode_ret */ ) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_accelerator_intel (CL_API_CALL *clCreateAcceleratorINTEL_fn)( cl_context /* context */, cl_accelerator_type_intel /* accelerator_type */, size_t /* descriptor_size */, const void* /* descriptor */, cl_int* /* errcode_ret */ ) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clGetAcceleratorInfoINTEL ( cl_accelerator_intel /* accelerator */, cl_accelerator_info_intel /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */ ) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetAcceleratorInfoINTEL_fn)( cl_accelerator_intel /* accelerator */, cl_accelerator_info_intel /* param_name */, size_t /* param_value_size */, void* /* param_value */, size_t* /* param_value_size_ret */ ) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clRetainAcceleratorINTEL( cl_accelerator_intel /* accelerator */ ) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clRetainAcceleratorINTEL_fn)( cl_accelerator_intel /* accelerator */ ) CL_EXT_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseAcceleratorINTEL( cl_accelerator_intel /* accelerator */ ) CL_EXT_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clReleaseAcceleratorINTEL_fn)( cl_accelerator_intel /* accelerator */ ) CL_EXT_SUFFIX__VERSION_1_2; #ifdef CL_VERSION_1_1 /*********************************** * cl_ext_device_fission extension * ***********************************/ #define cl_ext_device_fission 1 extern CL_API_ENTRY cl_int CL_API_CALL clReleaseDeviceEXT( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL *clReleaseDeviceEXT_fn)( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clRetainDeviceEXT( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int (CL_API_CALL *clRetainDeviceEXT_fn)( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef cl_ulong cl_device_partition_property_ext; extern CL_API_ENTRY cl_int CL_API_CALL clCreateSubDevicesEXT( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1; typedef CL_API_ENTRY cl_int ( CL_API_CALL * clCreateSubDevicesEXT_fn)( cl_device_id /*in_device*/, const cl_device_partition_property_ext * /* properties */, cl_uint /*num_entries*/, cl_device_id * /*out_devices*/, cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1; /* cl_device_partition_property_ext */ #define CL_DEVICE_PARTITION_EQUALLY_EXT 0x4050 #define CL_DEVICE_PARTITION_BY_COUNTS_EXT 0x4051 #define CL_DEVICE_PARTITION_BY_NAMES_EXT 0x4052 #define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN_EXT 0x4053 /* clDeviceGetInfo selectors */ #define CL_DEVICE_PARENT_DEVICE_EXT 0x4054 #define CL_DEVICE_PARTITION_TYPES_EXT 0x4055 #define CL_DEVICE_AFFINITY_DOMAINS_EXT 0x4056 #define CL_DEVICE_REFERENCE_COUNT_EXT 0x4057 #define CL_DEVICE_PARTITION_STYLE_EXT 0x4058 /* error codes */ #define CL_DEVICE_PARTITION_FAILED_EXT -1057 #define CL_INVALID_PARTITION_COUNT_EXT -1058 #define CL_INVALID_PARTITION_NAME_EXT -1059 
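/*
 * Illustrative sketch (not part of the original header): partitioning a
 * device into equal sub-devices with the EXT fission entry points above.
 * The value 2 (compute units per sub-device) is an arbitrary example; the
 * CL_PROPERTIES_LIST_END_EXT terminator is defined just below.
 */
static cl_int split_device_equally(cl_device_id dev,
                                   cl_uint num_entries,
                                   cl_device_id *out_devices,
                                   cl_uint *num_devices)
{
    const cl_device_partition_property_ext props[] = {
        CL_DEVICE_PARTITION_EQUALLY_EXT,
        2,                         /* compute units per sub-device */
        CL_PROPERTIES_LIST_END_EXT /* list terminator */
    };
    return clCreateSubDevicesEXT(dev, props, num_entries,
                                 out_devices, num_devices);
}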
/* CL_AFFINITY_DOMAINs */ #define CL_AFFINITY_DOMAIN_L1_CACHE_EXT 0x1 #define CL_AFFINITY_DOMAIN_L2_CACHE_EXT 0x2 #define CL_AFFINITY_DOMAIN_L3_CACHE_EXT 0x3 #define CL_AFFINITY_DOMAIN_L4_CACHE_EXT 0x4 #define CL_AFFINITY_DOMAIN_NUMA_EXT 0x10 #define CL_AFFINITY_DOMAIN_NEXT_FISSIONABLE_EXT 0x100 /* cl_device_partition_property_ext list terminators */ #define CL_PROPERTIES_LIST_END_EXT ((cl_device_partition_property_ext) 0) #define CL_PARTITION_BY_COUNTS_LIST_END_EXT ((cl_device_partition_property_ext) 0) #define CL_PARTITION_BY_NAMES_LIST_END_EXT ((cl_device_partition_property_ext) 0 - 1) /********************************* * cl_qcom_ext_host_ptr extension *********************************/ #define CL_MEM_EXT_HOST_PTR_QCOM (1 << 29) #define CL_DEVICE_EXT_MEM_PADDING_IN_BYTES_QCOM 0x40A0 #define CL_DEVICE_PAGE_SIZE_QCOM 0x40A1 #define CL_IMAGE_ROW_ALIGNMENT_QCOM 0x40A2 #define CL_IMAGE_SLICE_ALIGNMENT_QCOM 0x40A3 #define CL_MEM_HOST_UNCACHED_QCOM 0x40A4 #define CL_MEM_HOST_WRITEBACK_QCOM 0x40A5 #define CL_MEM_HOST_WRITETHROUGH_QCOM 0x40A6 #define CL_MEM_HOST_WRITE_COMBINING_QCOM 0x40A7 typedef cl_uint cl_image_pitch_info_qcom; extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceImageInfoQCOM(cl_device_id device, size_t image_width, size_t image_height, const cl_image_format *image_format, cl_image_pitch_info_qcom param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret); typedef struct _cl_mem_ext_host_ptr { /* Type of external memory allocation. */ /* Legal values will be defined in layered extensions. */ cl_uint allocation_type; /* Host cache policy for this external memory allocation. */ cl_uint host_cache_policy; } cl_mem_ext_host_ptr; /********************************* * cl_qcom_ion_host_ptr extension *********************************/ #define CL_MEM_ION_HOST_PTR_QCOM 0x40A8 typedef struct _cl_mem_ion_host_ptr { /* Type of external memory allocation. */ /* Must be CL_MEM_ION_HOST_PTR_QCOM for ION allocations. */ cl_mem_ext_host_ptr ext_host_ptr; /* ION file descriptor */ int ion_filedesc; /* Host pointer to the ION allocated memory */ void* ion_hostptr; } cl_mem_ion_host_ptr; #endif /* CL_VERSION_1_1 */ #ifdef CL_VERSION_2_0 /********************************* * cl_khr_sub_groups extension *********************************/ #define cl_khr_sub_groups 1 typedef cl_uint cl_kernel_sub_group_info; /* cl_khr_sub_group_info */ #define CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR 0x2033 #define CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE_KHR 0x2034 extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelSubGroupInfoKHR(cl_kernel /* in_kernel */, cl_device_id /*in_device*/, cl_kernel_sub_group_info /* param_name */, size_t /*input_value_size*/, const void * /*input_value*/, size_t /*param_value_size*/, void* /*param_value*/, size_t* /*param_value_size_ret*/ ) CL_EXT_SUFFIX__VERSION_2_0; typedef CL_API_ENTRY cl_int ( CL_API_CALL * clGetKernelSubGroupInfoKHR_fn)(cl_kernel /* in_kernel */, cl_device_id /*in_device*/, cl_kernel_sub_group_info /* param_name */, size_t /*input_value_size*/, const void * /*input_value*/, size_t /*param_value_size*/, void* /*param_value*/, size_t* /*param_value_size_ret*/ ) CL_EXT_SUFFIX__VERSION_2_0; #endif /* CL_VERSION_2_0 */ #ifdef __cplusplus } #endif #endif /* __CL_EXT_H */ Beignet-1.3.2-Source/include/CL/cl_d3d10.h000664 001750 001750 00000012002 13161142102 017025 0ustar00yryr000000 000000 /********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. 
* * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_D3D10_H #define __OPENCL_CL_D3D10_H #include <d3d10.h> #include <CL/cl.h> #include <CL/cl_platform.h> #ifdef __cplusplus extern "C" { #endif /****************************************************************************** * cl_khr_d3d10_sharing */ #define cl_khr_d3d10_sharing 1 typedef cl_uint cl_d3d10_device_source_khr; typedef cl_uint cl_d3d10_device_set_khr; /******************************************************************************/ /* Error Codes */ #define CL_INVALID_D3D10_DEVICE_KHR -1002 #define CL_INVALID_D3D10_RESOURCE_KHR -1003 #define CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR -1004 #define CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR -1005 /* cl_d3d10_device_source_khr */ #define CL_D3D10_DEVICE_KHR 0x4010 #define CL_D3D10_DXGI_ADAPTER_KHR 0x4011 /* cl_d3d10_device_set_khr */ #define CL_PREFERRED_DEVICES_FOR_D3D10_KHR 0x4012 #define CL_ALL_DEVICES_FOR_D3D10_KHR 0x4013 /* cl_context_info */ #define CL_CONTEXT_D3D10_DEVICE_KHR 0x4014 #define CL_CONTEXT_D3D10_PREFER_SHARED_RESOURCES_KHR 0x402C /* cl_mem_info */ #define CL_MEM_D3D10_RESOURCE_KHR 0x4015 /* cl_image_info */ #define CL_IMAGE_D3D10_SUBRESOURCE_KHR 0x4016 /* cl_command_type */ #define CL_COMMAND_ACQUIRE_D3D10_OBJECTS_KHR 0x4017 #define CL_COMMAND_RELEASE_D3D10_OBJECTS_KHR 0x4018 /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromD3D10KHR_fn)( cl_platform_id platform, cl_d3d10_device_source_khr d3d_device_source, void * d3d_object, cl_d3d10_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10BufferKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Buffer * resource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10Texture2DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Texture2D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D10Texture3DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D10Texture3D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireD3D10ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseD3D10ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_0; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_D3D10_H */
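/*
 * Illustrative sketch (not part of the original header): cl_khr_d3d10_sharing
 * deliberately exposes only function-pointer typedefs, so the entry points
 * must be fetched at run time. A minimal loader, assuming an OpenCL 1.2
 * platform (clGetExtensionFunctionAddressForPlatform is a core CL 1.2 API
 * declared in CL/cl.h):
 */
static clGetDeviceIDsFromD3D10KHR_fn
load_clGetDeviceIDsFromD3D10KHR(cl_platform_id platform)
{
    /* Returns NULL when the platform does not expose the extension. */
    return (clGetDeviceIDsFromD3D10KHR_fn)
        clGetExtensionFunctionAddressForPlatform(platform,
                                                 "clGetDeviceIDsFromD3D10KHR");
}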
Beignet-1.3.2-Source/include/CL/cl_gl_ext.h000664 001750 001750 00000005465 13161142102 017513 0ustar00yryr000000 000000 /********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ /* cl_gl_ext.h contains vendor (non-KHR) OpenCL extensions which have */ /* OpenGL dependencies. */ #ifndef __OPENCL_CL_GL_EXT_H #define __OPENCL_CL_GL_EXT_H #ifdef __cplusplus extern "C" { #endif #ifdef __APPLE__ #include <OpenCL/cl_gl.h> #else #include <CL/cl_gl.h> #endif /* * For each extension, follow this template * cl_VEN_extname extension */ /* #define cl_VEN_extname 1 * ... define new types, if any * ... define new tokens, if any * ... define new APIs, if any * * If you need GLtypes here, mirror them with a cl_GLtype, rather than including a GL header * This allows us to avoid having to decide whether to include GL headers or GLES here. */ /* * cl_khr_gl_event extension * See section 9.9 in the OpenCL 1.1 spec for more information */ #define CL_COMMAND_GL_FENCE_SYNC_OBJECT_KHR 0x200D extern CL_API_ENTRY cl_event CL_API_CALL clCreateEventFromGLsyncKHR(cl_context /* context */, cl_GLsync /* cl_GLsync */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_GL_EXT_H */
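/*
 * Illustrative sketch (not part of the original header): turning a GL fence
 * into a CL event so queued CL work waits on prior GL work without a full
 * glFinish(). The cl_GLsync argument is assumed to come from glFenceSync()
 * on the caller's GL side.
 */
static cl_int wait_on_gl_fence(cl_context ctx, cl_command_queue queue,
                               cl_GLsync fence)
{
    cl_int err;
    cl_event evt = clCreateEventFromGLsyncKHR(ctx, fence, &err);
    if (err != CL_SUCCESS)
        return err;
    /* Commands enqueued after this barrier run once the GL fence signals. */
    err = clEnqueueBarrierWithWaitList(queue, 1, &evt, NULL);
    clReleaseEvent(evt);
    return err;
}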
Beignet-1.3.2-Source/include/CL/cl.h000664 001750 001750 00000213053 13161142102 016143 0ustar00yryr000000 000000 /******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ #ifndef __OPENCL_CL_H #define __OPENCL_CL_H #ifdef __APPLE__ #include <OpenCL/cl_platform.h> #else #include <CL/cl_platform.h> #endif #ifdef __cplusplus extern "C" { #endif /******************************************************************************/ typedef struct _cl_platform_id * cl_platform_id; typedef struct _cl_device_id * cl_device_id; typedef struct _cl_context * cl_context; typedef struct _cl_command_queue * cl_command_queue; typedef struct _cl_mem * cl_mem; typedef struct _cl_program * cl_program; typedef struct _cl_kernel * cl_kernel; typedef struct _cl_event * cl_event; typedef struct _cl_sampler * cl_sampler; typedef cl_uint cl_bool; /* WARNING! Unlike cl_ types in cl_platform.h, cl_bool is not guaranteed to be the same size as the bool in kernels.
*/ typedef cl_ulong cl_bitfield; typedef cl_bitfield cl_device_type; typedef cl_uint cl_platform_info; typedef cl_uint cl_device_info; typedef cl_bitfield cl_device_fp_config; typedef cl_uint cl_device_mem_cache_type; typedef cl_uint cl_device_local_mem_type; typedef cl_bitfield cl_device_exec_capabilities; typedef cl_bitfield cl_device_svm_capabilities; typedef cl_bitfield cl_command_queue_properties; typedef intptr_t cl_device_partition_property; typedef cl_bitfield cl_device_affinity_domain; typedef intptr_t cl_context_properties; typedef cl_uint cl_context_info; typedef cl_bitfield cl_queue_properties; typedef cl_uint cl_command_queue_info; typedef cl_uint cl_channel_order; typedef cl_uint cl_channel_type; typedef cl_bitfield cl_mem_flags; typedef cl_bitfield cl_svm_mem_flags; typedef cl_uint cl_mem_object_type; typedef cl_uint cl_mem_info; typedef cl_bitfield cl_mem_migration_flags; typedef cl_uint cl_image_info; typedef cl_uint cl_buffer_create_type; typedef cl_uint cl_addressing_mode; typedef cl_uint cl_filter_mode; typedef cl_uint cl_sampler_info; typedef cl_bitfield cl_map_flags; typedef intptr_t cl_pipe_properties; typedef cl_uint cl_pipe_info; typedef cl_uint cl_program_info; typedef cl_uint cl_program_build_info; typedef cl_uint cl_program_binary_type; typedef cl_int cl_build_status; typedef cl_uint cl_kernel_info; typedef cl_uint cl_kernel_arg_info; typedef cl_uint cl_kernel_arg_address_qualifier; typedef cl_uint cl_kernel_arg_access_qualifier; typedef cl_bitfield cl_kernel_arg_type_qualifier; typedef cl_uint cl_kernel_work_group_info; typedef cl_uint cl_event_info; typedef cl_uint cl_command_type; typedef cl_uint cl_profiling_info; typedef cl_bitfield cl_sampler_properties; typedef cl_uint cl_kernel_exec_info; typedef struct _cl_image_format { cl_channel_order image_channel_order; cl_channel_type image_channel_data_type; } cl_image_format; typedef struct _cl_image_desc { cl_mem_object_type image_type; size_t image_width; size_t image_height; size_t image_depth; size_t image_array_size; size_t image_row_pitch; size_t image_slice_pitch; cl_uint num_mip_levels; cl_uint num_samples; #ifdef __GNUC__ __extension__ /* Prevents warnings about anonymous union in -pedantic builds */ #endif union { cl_mem buffer; cl_mem mem_object; }; } cl_image_desc; typedef struct _cl_buffer_region { size_t origin; size_t size; } cl_buffer_region; /******************************************************************************/ /* Error Codes */ #define CL_SUCCESS 0 #define CL_DEVICE_NOT_FOUND -1 #define CL_DEVICE_NOT_AVAILABLE -2 #define CL_COMPILER_NOT_AVAILABLE -3 #define CL_MEM_OBJECT_ALLOCATION_FAILURE -4 #define CL_OUT_OF_RESOURCES -5 #define CL_OUT_OF_HOST_MEMORY -6 #define CL_PROFILING_INFO_NOT_AVAILABLE -7 #define CL_MEM_COPY_OVERLAP -8 #define CL_IMAGE_FORMAT_MISMATCH -9 #define CL_IMAGE_FORMAT_NOT_SUPPORTED -10 #define CL_BUILD_PROGRAM_FAILURE -11 #define CL_MAP_FAILURE -12 #define CL_MISALIGNED_SUB_BUFFER_OFFSET -13 #define CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST -14 #define CL_COMPILE_PROGRAM_FAILURE -15 #define CL_LINKER_NOT_AVAILABLE -16 #define CL_LINK_PROGRAM_FAILURE -17 #define CL_DEVICE_PARTITION_FAILED -18 #define CL_KERNEL_ARG_INFO_NOT_AVAILABLE -19 #define CL_INVALID_VALUE -30 #define CL_INVALID_DEVICE_TYPE -31 #define CL_INVALID_PLATFORM -32 #define CL_INVALID_DEVICE -33 #define CL_INVALID_CONTEXT -34 #define CL_INVALID_QUEUE_PROPERTIES -35 #define CL_INVALID_COMMAND_QUEUE -36 #define CL_INVALID_HOST_PTR -37 #define CL_INVALID_MEM_OBJECT -38 #define 
CL_INVALID_IMAGE_FORMAT_DESCRIPTOR -39 #define CL_INVALID_IMAGE_SIZE -40 #define CL_INVALID_SAMPLER -41 #define CL_INVALID_BINARY -42 #define CL_INVALID_BUILD_OPTIONS -43 #define CL_INVALID_PROGRAM -44 #define CL_INVALID_PROGRAM_EXECUTABLE -45 #define CL_INVALID_KERNEL_NAME -46 #define CL_INVALID_KERNEL_DEFINITION -47 #define CL_INVALID_KERNEL -48 #define CL_INVALID_ARG_INDEX -49 #define CL_INVALID_ARG_VALUE -50 #define CL_INVALID_ARG_SIZE -51 #define CL_INVALID_KERNEL_ARGS -52 #define CL_INVALID_WORK_DIMENSION -53 #define CL_INVALID_WORK_GROUP_SIZE -54 #define CL_INVALID_WORK_ITEM_SIZE -55 #define CL_INVALID_GLOBAL_OFFSET -56 #define CL_INVALID_EVENT_WAIT_LIST -57 #define CL_INVALID_EVENT -58 #define CL_INVALID_OPERATION -59 #define CL_INVALID_GL_OBJECT -60 #define CL_INVALID_BUFFER_SIZE -61 #define CL_INVALID_MIP_LEVEL -62 #define CL_INVALID_GLOBAL_WORK_SIZE -63 #define CL_INVALID_PROPERTY -64 #define CL_INVALID_IMAGE_DESCRIPTOR -65 #define CL_INVALID_COMPILER_OPTIONS -66 #define CL_INVALID_LINKER_OPTIONS -67 #define CL_INVALID_DEVICE_PARTITION_COUNT -68 #define CL_INVALID_PIPE_SIZE -69 #define CL_INVALID_DEVICE_QUEUE -70 /* OpenCL Version */ #define CL_VERSION_1_0 1 #define CL_VERSION_1_1 1 #define CL_VERSION_1_2 1 #define CL_VERSION_2_0 1 /* cl_bool */ #define CL_FALSE 0 #define CL_TRUE 1 #define CL_BLOCKING CL_TRUE #define CL_NON_BLOCKING CL_FALSE /* cl_platform_info */ #define CL_PLATFORM_PROFILE 0x0900 #define CL_PLATFORM_VERSION 0x0901 #define CL_PLATFORM_NAME 0x0902 #define CL_PLATFORM_VENDOR 0x0903 #define CL_PLATFORM_EXTENSIONS 0x0904 /* cl_device_type - bitfield */ #define CL_DEVICE_TYPE_DEFAULT (1 << 0) #define CL_DEVICE_TYPE_CPU (1 << 1) #define CL_DEVICE_TYPE_GPU (1 << 2) #define CL_DEVICE_TYPE_ACCELERATOR (1 << 3) #define CL_DEVICE_TYPE_CUSTOM (1 << 4) #define CL_DEVICE_TYPE_ALL 0xFFFFFFFF /* cl_device_info */ #define CL_DEVICE_TYPE 0x1000 #define CL_DEVICE_VENDOR_ID 0x1001 #define CL_DEVICE_MAX_COMPUTE_UNITS 0x1002 #define CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS 0x1003 #define CL_DEVICE_MAX_WORK_GROUP_SIZE 0x1004 #define CL_DEVICE_MAX_WORK_ITEM_SIZES 0x1005 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR 0x1006 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT 0x1007 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT 0x1008 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG 0x1009 #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT 0x100A #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 0x100B #define CL_DEVICE_MAX_CLOCK_FREQUENCY 0x100C #define CL_DEVICE_ADDRESS_BITS 0x100D #define CL_DEVICE_MAX_READ_IMAGE_ARGS 0x100E #define CL_DEVICE_MAX_WRITE_IMAGE_ARGS 0x100F #define CL_DEVICE_MAX_MEM_ALLOC_SIZE 0x1010 #define CL_DEVICE_IMAGE2D_MAX_WIDTH 0x1011 #define CL_DEVICE_IMAGE2D_MAX_HEIGHT 0x1012 #define CL_DEVICE_IMAGE3D_MAX_WIDTH 0x1013 #define CL_DEVICE_IMAGE3D_MAX_HEIGHT 0x1014 #define CL_DEVICE_IMAGE3D_MAX_DEPTH 0x1015 #define CL_DEVICE_IMAGE_SUPPORT 0x1016 #define CL_DEVICE_MAX_PARAMETER_SIZE 0x1017 #define CL_DEVICE_MAX_SAMPLERS 0x1018 #define CL_DEVICE_MEM_BASE_ADDR_ALIGN 0x1019 #define CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE 0x101A #define CL_DEVICE_SINGLE_FP_CONFIG 0x101B #define CL_DEVICE_GLOBAL_MEM_CACHE_TYPE 0x101C #define CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE 0x101D #define CL_DEVICE_GLOBAL_MEM_CACHE_SIZE 0x101E #define CL_DEVICE_GLOBAL_MEM_SIZE 0x101F #define CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE 0x1020 #define CL_DEVICE_MAX_CONSTANT_ARGS 0x1021 #define CL_DEVICE_LOCAL_MEM_TYPE 0x1022 #define CL_DEVICE_LOCAL_MEM_SIZE 0x1023 #define CL_DEVICE_ERROR_CORRECTION_SUPPORT 0x1024 #define 
CL_DEVICE_PROFILING_TIMER_RESOLUTION 0x1025 #define CL_DEVICE_ENDIAN_LITTLE 0x1026 #define CL_DEVICE_AVAILABLE 0x1027 #define CL_DEVICE_COMPILER_AVAILABLE 0x1028 #define CL_DEVICE_EXECUTION_CAPABILITIES 0x1029 #define CL_DEVICE_QUEUE_PROPERTIES 0x102A /* deprecated */ #define CL_DEVICE_QUEUE_ON_HOST_PROPERTIES 0x102A #define CL_DEVICE_NAME 0x102B #define CL_DEVICE_VENDOR 0x102C #define CL_DRIVER_VERSION 0x102D #define CL_DEVICE_PROFILE 0x102E #define CL_DEVICE_VERSION 0x102F #define CL_DEVICE_EXTENSIONS 0x1030 #define CL_DEVICE_PLATFORM 0x1031 #define CL_DEVICE_DOUBLE_FP_CONFIG 0x1032 /* 0x1033 reserved for CL_DEVICE_HALF_FP_CONFIG */ #define CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF 0x1034 #define CL_DEVICE_HOST_UNIFIED_MEMORY 0x1035 /* deprecated */ #define CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR 0x1036 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT 0x1037 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_INT 0x1038 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG 0x1039 #define CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT 0x103A #define CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE 0x103B #define CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF 0x103C #define CL_DEVICE_OPENCL_C_VERSION 0x103D #define CL_DEVICE_LINKER_AVAILABLE 0x103E #define CL_DEVICE_BUILT_IN_KERNELS 0x103F #define CL_DEVICE_IMAGE_MAX_BUFFER_SIZE 0x1040 #define CL_DEVICE_IMAGE_MAX_ARRAY_SIZE 0x1041 #define CL_DEVICE_PARENT_DEVICE 0x1042 #define CL_DEVICE_PARTITION_MAX_SUB_DEVICES 0x1043 #define CL_DEVICE_PARTITION_PROPERTIES 0x1044 #define CL_DEVICE_PARTITION_AFFINITY_DOMAIN 0x1045 #define CL_DEVICE_PARTITION_TYPE 0x1046 #define CL_DEVICE_REFERENCE_COUNT 0x1047 #define CL_DEVICE_PREFERRED_INTEROP_USER_SYNC 0x1048 #define CL_DEVICE_PRINTF_BUFFER_SIZE 0x1049 #define CL_DEVICE_IMAGE_PITCH_ALIGNMENT 0x104A #define CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT 0x104B #define CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS 0x104C #define CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE 0x104D #define CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES 0x104E #define CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE 0x104F #define CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE 0x1050 #define CL_DEVICE_MAX_ON_DEVICE_QUEUES 0x1051 #define CL_DEVICE_MAX_ON_DEVICE_EVENTS 0x1052 #define CL_DEVICE_SVM_CAPABILITIES 0x1053 #define CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE 0x1054 #define CL_DEVICE_MAX_PIPE_ARGS 0x1055 #define CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS 0x1056 #define CL_DEVICE_PIPE_MAX_PACKET_SIZE 0x1057 #define CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT 0x1058 #define CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT 0x1059 #define CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT 0x105A /* cl_device_fp_config - bitfield */ #define CL_FP_DENORM (1 << 0) #define CL_FP_INF_NAN (1 << 1) #define CL_FP_ROUND_TO_NEAREST (1 << 2) #define CL_FP_ROUND_TO_ZERO (1 << 3) #define CL_FP_ROUND_TO_INF (1 << 4) #define CL_FP_FMA (1 << 5) #define CL_FP_SOFT_FLOAT (1 << 6) #define CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT (1 << 7) /* cl_device_mem_cache_type */ #define CL_NONE 0x0 #define CL_READ_ONLY_CACHE 0x1 #define CL_READ_WRITE_CACHE 0x2 /* cl_device_local_mem_type */ #define CL_LOCAL 0x1 #define CL_GLOBAL 0x2 /* cl_device_exec_capabilities - bitfield */ #define CL_EXEC_KERNEL (1 << 0) #define CL_EXEC_NATIVE_KERNEL (1 << 1) /* cl_command_queue_properties - bitfield */ #define CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE (1 << 0) #define CL_QUEUE_PROFILING_ENABLE (1 << 1) #define CL_QUEUE_ON_DEVICE (1 << 2) #define CL_QUEUE_ON_DEVICE_DEFAULT (1 << 3) /* cl_context_info */ #define CL_CONTEXT_REFERENCE_COUNT 0x1080 #define CL_CONTEXT_DEVICES 0x1081 #define CL_CONTEXT_PROPERTIES 
0x1082 #define CL_CONTEXT_NUM_DEVICES 0x1083 /* cl_context_properties */ #define CL_CONTEXT_PLATFORM 0x1084 #define CL_CONTEXT_INTEROP_USER_SYNC 0x1085 /* cl_device_partition_property */ #define CL_DEVICE_PARTITION_EQUALLY 0x1086 #define CL_DEVICE_PARTITION_BY_COUNTS 0x1087 #define CL_DEVICE_PARTITION_BY_COUNTS_LIST_END 0x0 #define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN 0x1088 /* cl_device_affinity_domain */ #define CL_DEVICE_AFFINITY_DOMAIN_NUMA (1 << 0) #define CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE (1 << 1) #define CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE (1 << 2) #define CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE (1 << 3) #define CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE (1 << 4) #define CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE (1 << 5) /* cl_device_svm_capabilities */ #define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER (1 << 0) #define CL_DEVICE_SVM_FINE_GRAIN_BUFFER (1 << 1) #define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM (1 << 2) #define CL_DEVICE_SVM_ATOMICS (1 << 3) /* cl_command_queue_info */ #define CL_QUEUE_CONTEXT 0x1090 #define CL_QUEUE_DEVICE 0x1091 #define CL_QUEUE_REFERENCE_COUNT 0x1092 #define CL_QUEUE_PROPERTIES 0x1093 #define CL_QUEUE_SIZE 0x1094 /* cl_mem_flags and cl_svm_mem_flags - bitfield */ #define CL_MEM_READ_WRITE (1 << 0) #define CL_MEM_WRITE_ONLY (1 << 1) #define CL_MEM_READ_ONLY (1 << 2) #define CL_MEM_USE_HOST_PTR (1 << 3) #define CL_MEM_ALLOC_HOST_PTR (1 << 4) #define CL_MEM_COPY_HOST_PTR (1 << 5) /* reserved (1 << 6) */ #define CL_MEM_HOST_WRITE_ONLY (1 << 7) #define CL_MEM_HOST_READ_ONLY (1 << 8) #define CL_MEM_HOST_NO_ACCESS (1 << 9) #define CL_MEM_SVM_FINE_GRAIN_BUFFER (1 << 10) /* used by cl_svm_mem_flags only */ #define CL_MEM_SVM_ATOMICS (1 << 11) /* used by cl_svm_mem_flags only */ #define CL_MEM_KERNEL_READ_AND_WRITE (1 << 12) /* cl_mem_migration_flags - bitfield */ #define CL_MIGRATE_MEM_OBJECT_HOST (1 << 0) #define CL_MIGRATE_MEM_OBJECT_CONTENT_UNDEFINED (1 << 1) /* cl_channel_order */ #define CL_R 0x10B0 #define CL_A 0x10B1 #define CL_RG 0x10B2 #define CL_RA 0x10B3 #define CL_RGB 0x10B4 #define CL_RGBA 0x10B5 #define CL_BGRA 0x10B6 #define CL_ARGB 0x10B7 #define CL_INTENSITY 0x10B8 #define CL_LUMINANCE 0x10B9 #define CL_Rx 0x10BA #define CL_RGx 0x10BB #define CL_RGBx 0x10BC #define CL_DEPTH 0x10BD #define CL_DEPTH_STENCIL 0x10BE #define CL_sRGB 0x10BF #define CL_sRGBx 0x10C0 #define CL_sRGBA 0x10C1 #define CL_sBGRA 0x10C2 #define CL_ABGR 0x10C3 /* cl_channel_type */ #define CL_SNORM_INT8 0x10D0 #define CL_SNORM_INT16 0x10D1 #define CL_UNORM_INT8 0x10D2 #define CL_UNORM_INT16 0x10D3 #define CL_UNORM_SHORT_565 0x10D4 #define CL_UNORM_SHORT_555 0x10D5 #define CL_UNORM_INT_101010 0x10D6 #define CL_SIGNED_INT8 0x10D7 #define CL_SIGNED_INT16 0x10D8 #define CL_SIGNED_INT32 0x10D9 #define CL_UNSIGNED_INT8 0x10DA #define CL_UNSIGNED_INT16 0x10DB #define CL_UNSIGNED_INT32 0x10DC #define CL_HALF_FLOAT 0x10DD #define CL_FLOAT 0x10DE #define CL_UNORM_INT24 0x10DF /* cl_mem_object_type */ #define CL_MEM_OBJECT_BUFFER 0x10F0 #define CL_MEM_OBJECT_IMAGE2D 0x10F1 #define CL_MEM_OBJECT_IMAGE3D 0x10F2 #define CL_MEM_OBJECT_IMAGE2D_ARRAY 0x10F3 #define CL_MEM_OBJECT_IMAGE1D 0x10F4 #define CL_MEM_OBJECT_IMAGE1D_ARRAY 0x10F5 #define CL_MEM_OBJECT_IMAGE1D_BUFFER 0x10F6 #define CL_MEM_OBJECT_PIPE 0x10F7 /* cl_mem_info */ #define CL_MEM_TYPE 0x1100 #define CL_MEM_FLAGS 0x1101 #define CL_MEM_SIZE 0x1102 #define CL_MEM_HOST_PTR 0x1103 #define CL_MEM_MAP_COUNT 0x1104 #define CL_MEM_REFERENCE_COUNT 0x1105 #define CL_MEM_CONTEXT 0x1106 #define CL_MEM_ASSOCIATED_MEMOBJECT 0x1107 #define CL_MEM_OFFSET 0x1108 
#define CL_MEM_USES_SVM_POINTER 0x1109 /* cl_image_info */ #define CL_IMAGE_FORMAT 0x1110 #define CL_IMAGE_ELEMENT_SIZE 0x1111 #define CL_IMAGE_ROW_PITCH 0x1112 #define CL_IMAGE_SLICE_PITCH 0x1113 #define CL_IMAGE_WIDTH 0x1114 #define CL_IMAGE_HEIGHT 0x1115 #define CL_IMAGE_DEPTH 0x1116 #define CL_IMAGE_ARRAY_SIZE 0x1117 #define CL_IMAGE_BUFFER 0x1118 #define CL_IMAGE_NUM_MIP_LEVELS 0x1119 #define CL_IMAGE_NUM_SAMPLES 0x111A /* cl_pipe_info */ #define CL_PIPE_PACKET_SIZE 0x1120 #define CL_PIPE_MAX_PACKETS 0x1121 /* cl_addressing_mode */ #define CL_ADDRESS_NONE 0x1130 #define CL_ADDRESS_CLAMP_TO_EDGE 0x1131 #define CL_ADDRESS_CLAMP 0x1132 #define CL_ADDRESS_REPEAT 0x1133 #define CL_ADDRESS_MIRRORED_REPEAT 0x1134 /* cl_filter_mode */ #define CL_FILTER_NEAREST 0x1140 #define CL_FILTER_LINEAR 0x1141 /* cl_sampler_info */ #define CL_SAMPLER_REFERENCE_COUNT 0x1150 #define CL_SAMPLER_CONTEXT 0x1151 #define CL_SAMPLER_NORMALIZED_COORDS 0x1152 #define CL_SAMPLER_ADDRESSING_MODE 0x1153 #define CL_SAMPLER_FILTER_MODE 0x1154 #define CL_SAMPLER_MIP_FILTER_MODE 0x1155 #define CL_SAMPLER_LOD_MIN 0x1156 #define CL_SAMPLER_LOD_MAX 0x1157 /* cl_map_flags - bitfield */ #define CL_MAP_READ (1 << 0) #define CL_MAP_WRITE (1 << 1) #define CL_MAP_WRITE_INVALIDATE_REGION (1 << 2) /* cl_program_info */ #define CL_PROGRAM_REFERENCE_COUNT 0x1160 #define CL_PROGRAM_CONTEXT 0x1161 #define CL_PROGRAM_NUM_DEVICES 0x1162 #define CL_PROGRAM_DEVICES 0x1163 #define CL_PROGRAM_SOURCE 0x1164 #define CL_PROGRAM_BINARY_SIZES 0x1165 #define CL_PROGRAM_BINARIES 0x1166 #define CL_PROGRAM_NUM_KERNELS 0x1167 #define CL_PROGRAM_KERNEL_NAMES 0x1168 /* cl_program_build_info */ #define CL_PROGRAM_BUILD_STATUS 0x1181 #define CL_PROGRAM_BUILD_OPTIONS 0x1182 #define CL_PROGRAM_BUILD_LOG 0x1183 #define CL_PROGRAM_BINARY_TYPE 0x1184 #define CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE 0x1185 /* cl_program_binary_type */ #define CL_PROGRAM_BINARY_TYPE_NONE 0x0 #define CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT 0x1 #define CL_PROGRAM_BINARY_TYPE_LIBRARY 0x2 #define CL_PROGRAM_BINARY_TYPE_EXECUTABLE 0x4 /* cl_build_status */ #define CL_BUILD_SUCCESS 0 #define CL_BUILD_NONE -1 #define CL_BUILD_ERROR -2 #define CL_BUILD_IN_PROGRESS -3 /* cl_kernel_info */ #define CL_KERNEL_FUNCTION_NAME 0x1190 #define CL_KERNEL_NUM_ARGS 0x1191 #define CL_KERNEL_REFERENCE_COUNT 0x1192 #define CL_KERNEL_CONTEXT 0x1193 #define CL_KERNEL_PROGRAM 0x1194 #define CL_KERNEL_ATTRIBUTES 0x1195 /* cl_kernel_arg_info */ #define CL_KERNEL_ARG_ADDRESS_QUALIFIER 0x1196 #define CL_KERNEL_ARG_ACCESS_QUALIFIER 0x1197 #define CL_KERNEL_ARG_TYPE_NAME 0x1198 #define CL_KERNEL_ARG_TYPE_QUALIFIER 0x1199 #define CL_KERNEL_ARG_NAME 0x119A /* cl_kernel_arg_address_qualifier */ #define CL_KERNEL_ARG_ADDRESS_GLOBAL 0x119B #define CL_KERNEL_ARG_ADDRESS_LOCAL 0x119C #define CL_KERNEL_ARG_ADDRESS_CONSTANT 0x119D #define CL_KERNEL_ARG_ADDRESS_PRIVATE 0x119E /* cl_kernel_arg_access_qualifier */ #define CL_KERNEL_ARG_ACCESS_READ_ONLY 0x11A0 #define CL_KERNEL_ARG_ACCESS_WRITE_ONLY 0x11A1 #define CL_KERNEL_ARG_ACCESS_READ_WRITE 0x11A2 #define CL_KERNEL_ARG_ACCESS_NONE 0x11A3 /* cl_kernel_arg_type_qualifier */ #define CL_KERNEL_ARG_TYPE_NONE 0 #define CL_KERNEL_ARG_TYPE_CONST (1 << 0) #define CL_KERNEL_ARG_TYPE_RESTRICT (1 << 1) #define CL_KERNEL_ARG_TYPE_VOLATILE (1 << 2) #define CL_KERNEL_ARG_TYPE_PIPE (1 << 3) /* cl_kernel_work_group_info */ #define CL_KERNEL_WORK_GROUP_SIZE 0x11B0 #define CL_KERNEL_COMPILE_WORK_GROUP_SIZE 0x11B1 #define CL_KERNEL_LOCAL_MEM_SIZE 0x11B2 #define
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE 0x11B3 #define CL_KERNEL_PRIVATE_MEM_SIZE 0x11B4 #define CL_KERNEL_GLOBAL_WORK_SIZE 0x11B5 /* cl_kernel_exec_info */ #define CL_KERNEL_EXEC_INFO_SVM_PTRS 0x11B6 #define CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM 0x11B7 /* cl_event_info */ #define CL_EVENT_COMMAND_QUEUE 0x11D0 #define CL_EVENT_COMMAND_TYPE 0x11D1 #define CL_EVENT_REFERENCE_COUNT 0x11D2 #define CL_EVENT_COMMAND_EXECUTION_STATUS 0x11D3 #define CL_EVENT_CONTEXT 0x11D4 /* cl_command_type */ #define CL_COMMAND_NDRANGE_KERNEL 0x11F0 #define CL_COMMAND_TASK 0x11F1 #define CL_COMMAND_NATIVE_KERNEL 0x11F2 #define CL_COMMAND_READ_BUFFER 0x11F3 #define CL_COMMAND_WRITE_BUFFER 0x11F4 #define CL_COMMAND_COPY_BUFFER 0x11F5 #define CL_COMMAND_READ_IMAGE 0x11F6 #define CL_COMMAND_WRITE_IMAGE 0x11F7 #define CL_COMMAND_COPY_IMAGE 0x11F8 #define CL_COMMAND_COPY_IMAGE_TO_BUFFER 0x11F9 #define CL_COMMAND_COPY_BUFFER_TO_IMAGE 0x11FA #define CL_COMMAND_MAP_BUFFER 0x11FB #define CL_COMMAND_MAP_IMAGE 0x11FC #define CL_COMMAND_UNMAP_MEM_OBJECT 0x11FD #define CL_COMMAND_MARKER 0x11FE #define CL_COMMAND_ACQUIRE_GL_OBJECTS 0x11FF #define CL_COMMAND_RELEASE_GL_OBJECTS 0x1200 #define CL_COMMAND_READ_BUFFER_RECT 0x1201 #define CL_COMMAND_WRITE_BUFFER_RECT 0x1202 #define CL_COMMAND_COPY_BUFFER_RECT 0x1203 #define CL_COMMAND_USER 0x1204 #define CL_COMMAND_BARRIER 0x1205 #define CL_COMMAND_MIGRATE_MEM_OBJECTS 0x1206 #define CL_COMMAND_FILL_BUFFER 0x1207 #define CL_COMMAND_FILL_IMAGE 0x1208 #define CL_COMMAND_SVM_FREE 0x1209 #define CL_COMMAND_SVM_MEMCPY 0x120A #define CL_COMMAND_SVM_MEMFILL 0x120B #define CL_COMMAND_SVM_MAP 0x120C #define CL_COMMAND_SVM_UNMAP 0x120D /* command execution status */ #define CL_COMPLETE 0x0 #define CL_RUNNING 0x1 #define CL_SUBMITTED 0x2 #define CL_QUEUED 0x3 /* cl_buffer_create_type */ #define CL_BUFFER_CREATE_TYPE_REGION 0x1220 /* cl_profiling_info */ #define CL_PROFILING_COMMAND_QUEUED 0x1280 #define CL_PROFILING_COMMAND_SUBMIT 0x1281 #define CL_PROFILING_COMMAND_START 0x1282 #define CL_PROFILING_COMMAND_END 0x1283 #define CL_PROFILING_COMMAND_COMPLETE 0x1284 /********************************************************************************************************/ /* Platform API */ extern CL_API_ENTRY cl_int CL_API_CALL clGetPlatformIDs(cl_uint /* num_entries */, cl_platform_id * /* platforms */, cl_uint * /* num_platforms */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetPlatformInfo(cl_platform_id /* platform */, cl_platform_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Device APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceIDs(cl_platform_id /* platform */, cl_device_type /* device_type */, cl_uint /* num_entries */, cl_device_id * /* devices */, cl_uint * /* num_devices */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetDeviceInfo(cl_device_id /* device */, cl_device_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clCreateSubDevices(cl_device_id /* in_device */, const cl_device_partition_property * /* properties */, cl_uint /* num_devices */, cl_device_id * /* out_devices */, cl_uint * /* num_devices_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clRetainDevice(cl_device_id /* device */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL 
clReleaseDevice(cl_device_id /* device */) CL_API_SUFFIX__VERSION_1_2; /* Context APIs */ extern CL_API_ENTRY cl_context CL_API_CALL clCreateContext(const cl_context_properties * /* properties */, cl_uint /* num_devices */, const cl_device_id * /* devices */, void (CL_CALLBACK * /* pfn_notify */)(const char *, const void *, size_t, void *), void * /* user_data */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_context CL_API_CALL clCreateContextFromType(const cl_context_properties * /* properties */, cl_device_type /* device_type */, void (CL_CALLBACK * /* pfn_notify*/ )(const char *, const void *, size_t, void *), void * /* user_data */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainContext(cl_context /* context */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseContext(cl_context /* context */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetContextInfo(cl_context /* context */, cl_context_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Command Queue APIs */ extern CL_API_ENTRY cl_command_queue CL_API_CALL clCreateCommandQueueWithProperties(cl_context /* context */, cl_device_id /* device */, const cl_queue_properties * /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainCommandQueue(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseCommandQueue(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetCommandQueueInfo(cl_command_queue /* command_queue */, cl_command_queue_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Memory Object APIs */ extern CL_API_ENTRY cl_mem CL_API_CALL clCreateBuffer(cl_context /* context */, cl_mem_flags /* flags */, size_t /* size */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateSubBuffer(cl_mem /* buffer */, cl_mem_flags /* flags */, cl_buffer_create_type /* buffer_create_type */, const void * /* buffer_create_info */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateImage(cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format * /* image_format */, const cl_image_desc * /* image_desc */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_mem CL_API_CALL clCreatePipe(cl_context /* context */, cl_mem_flags /* flags */, cl_uint /* pipe_packet_size */, cl_uint /* pipe_max_packets */, const cl_pipe_properties * /* properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainMemObject(cl_mem /* memobj */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseMemObject(cl_mem /* memobj */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetSupportedImageFormats(cl_context /* context */, cl_mem_flags /* flags */, cl_mem_object_type /* image_type */, cl_uint /* num_entries */, cl_image_format * /* image_formats */, cl_uint * /* num_image_formats */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetMemObjectInfo(cl_mem /* memobj 
*/, cl_mem_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetImageInfo(cl_mem /* image */, cl_image_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetPipeInfo(cl_mem /* pipe */, cl_pipe_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetMemObjectDestructorCallback(cl_mem /* memobj */, void (CL_CALLBACK * /*pfn_notify*/)( cl_mem /* memobj */, void* /*user_data*/), void * /*user_data */ ) CL_API_SUFFIX__VERSION_1_1; /* SVM Allocation APIs */ extern CL_API_ENTRY void * CL_API_CALL clSVMAlloc(cl_context /* context */, cl_svm_mem_flags /* flags */, size_t /* size */, cl_uint /* alignment */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY void CL_API_CALL clSVMFree(cl_context /* context */, void * /* svm_pointer */) CL_API_SUFFIX__VERSION_2_0; /* Sampler APIs */ extern CL_API_ENTRY cl_sampler CL_API_CALL clCreateSamplerWithProperties(cl_context /* context */, const cl_sampler_properties * /* sampler_properties */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainSampler(cl_sampler /* sampler */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseSampler(cl_sampler /* sampler */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetSamplerInfo(cl_sampler /* sampler */, cl_sampler_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Program Object APIs */ extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithSource(cl_context /* context */, cl_uint /* count */, const char ** /* strings */, const size_t * /* lengths */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBinary(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const size_t * /* lengths */, const unsigned char ** /* binaries */, cl_int * /* binary_status */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithBuiltInKernels(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* kernel_names */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clRetainProgram(cl_program /* program */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseProgram(cl_program /* program */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clBuildProgram(cl_program /* program */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, void (CL_CALLBACK * /* pfn_notify */)(cl_program /* program */, void * /* user_data */), void * /* user_data */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clCompileProgram(cl_program /* program */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, cl_uint /* num_input_headers */, const cl_program * /* input_headers */, const char ** /* header_include_names */, void (CL_CALLBACK * /* pfn_notify
*/)(cl_program /* program */, void * /* user_data */), void * /* user_data */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_program CL_API_CALL clLinkProgram(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* options */, cl_uint /* num_input_programs */, const cl_program * /* input_programs */, void (CL_CALLBACK * /* pfn_notify */)(cl_program /* program */, void * /* user_data */), void * /* user_data */, cl_int * /* errcode_ret */ ) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clUnloadPlatformCompiler(cl_platform_id /* platform */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clGetProgramInfo(cl_program /* program */, cl_program_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetProgramBuildInfo(cl_program /* program */, cl_device_id /* device */, cl_program_build_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Kernel Object APIs */ extern CL_API_ENTRY cl_kernel CL_API_CALL clCreateKernel(cl_program /* program */, const char * /* kernel_name */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clCreateKernelsInProgram(cl_program /* program */, cl_uint /* num_kernels */, cl_kernel * /* kernels */, cl_uint * /* num_kernels_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clRetainKernel(cl_kernel /* kernel */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseKernel(cl_kernel /* kernel */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelArg(cl_kernel /* kernel */, cl_uint /* arg_index */, size_t /* arg_size */, const void * /* arg_value */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelArgSVMPointer(cl_kernel /* kernel */, cl_uint /* arg_index */, const void * /* arg_value */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetKernelExecInfo(cl_kernel /* kernel */, cl_kernel_exec_info /* param_name */, size_t /* param_value_size */, const void * /* param_value */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelInfo(cl_kernel /* kernel */, cl_kernel_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelArgInfo(cl_kernel /* kernel */, cl_uint /* arg_indx */, cl_kernel_arg_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelWorkGroupInfo(cl_kernel /* kernel */, cl_device_id /* device */, cl_kernel_work_group_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Event Object APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clWaitForEvents(cl_uint /* num_events */, const cl_event * /* event_list */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clGetEventInfo(cl_event /* event */, cl_event_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_event 
CL_API_CALL clCreateUserEvent(cl_context /* context */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clRetainEvent(cl_event /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clReleaseEvent(cl_event /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clSetUserEventStatus(cl_event /* event */, cl_int /* execution_status */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clSetEventCallback( cl_event /* event */, cl_int /* command_exec_callback_type */, void (CL_CALLBACK * /* pfn_notify */)(cl_event, cl_int, void *), void * /* user_data */) CL_API_SUFFIX__VERSION_1_1; /* Profiling APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clGetEventProfilingInfo(cl_event /* event */, cl_profiling_info /* param_name */, size_t /* param_value_size */, void * /* param_value */, size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0; /* Flush and Finish APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clFlush(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clFinish(cl_command_queue /* command_queue */) CL_API_SUFFIX__VERSION_1_0; /* Enqueued Commands APIs */ extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_read */, size_t /* offset */, size_t /* size */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadBufferRect(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_read */, const size_t * /* buffer_offset */, const size_t * /* host_offset */, const size_t * /* region */, size_t /* buffer_row_pitch */, size_t /* buffer_slice_pitch */, size_t /* host_row_pitch */, size_t /* host_slice_pitch */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_write */, size_t /* offset */, size_t /* size */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteBufferRect(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_write */, const size_t * /* buffer_offset */, const size_t * /* host_offset */, const size_t * /* region */, size_t /* buffer_row_pitch */, size_t /* buffer_slice_pitch */, size_t /* host_row_pitch */, size_t /* host_slice_pitch */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, const void * /* pattern */, size_t /* pattern_size */, size_t /* offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBuffer(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_buffer */, size_t /* src_offset */, size_t /* dst_offset */, size_t /* size */, cl_uint /* 
num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferRect(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_buffer */, const size_t * /* src_origin */, const size_t * /* dst_origin */, const size_t * /* region */, size_t /* src_row_pitch */, size_t /* src_slice_pitch */, size_t /* dst_row_pitch */, size_t /* dst_slice_pitch */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_1; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueReadImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_read */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t /* row_pitch */, size_t /* slice_pitch */, void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueWriteImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_write */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t /* input_row_pitch */, size_t /* input_slice_pitch */, const void * /* ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueFillImage(cl_command_queue /* command_queue */, cl_mem /* image */, const void * /* fill_color */, const size_t * /* origin[3] */, const size_t * /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImage(cl_command_queue /* command_queue */, cl_mem /* src_image */, cl_mem /* dst_image */, const size_t * /* src_origin[3] */, const size_t * /* dst_origin[3] */, const size_t * /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyImageToBuffer(cl_command_queue /* command_queue */, cl_mem /* src_image */, cl_mem /* dst_buffer */, const size_t * /* src_origin[3] */, const size_t * /* region[3] */, size_t /* dst_offset */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueCopyBufferToImage(cl_command_queue /* command_queue */, cl_mem /* src_buffer */, cl_mem /* dst_image */, size_t /* src_offset */, const size_t * /* dst_origin[3] */, const size_t * /* region[3] */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY void * CL_API_CALL clEnqueueMapBuffer(cl_command_queue /* command_queue */, cl_mem /* buffer */, cl_bool /* blocking_map */, cl_map_flags /* map_flags */, size_t /* offset */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY void * CL_API_CALL clEnqueueMapImage(cl_command_queue /* command_queue */, cl_mem /* image */, cl_bool /* blocking_map */, cl_map_flags /* map_flags */, const size_t * /* origin[3] */, const size_t * /* region[3] */, size_t * /* 
image_row_pitch */, size_t * /* image_slice_pitch */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */, cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueUnmapMemObject(cl_command_queue /* command_queue */, cl_mem /* memobj */, void * /* mapped_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMigrateMemObjects(cl_command_queue /* command_queue */, cl_uint /* num_mem_objects */, const cl_mem * /* mem_objects */, cl_mem_migration_flags /* flags */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueNDRangeKernel(cl_command_queue /* command_queue */, cl_kernel /* kernel */, cl_uint /* work_dim */, const size_t * /* global_work_offset */, const size_t * /* global_work_size */, const size_t * /* local_work_size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueNativeKernel(cl_command_queue /* command_queue */, void (CL_CALLBACK * /*user_func*/)(void *), void * /* args */, size_t /* cb_args */, cl_uint /* num_mem_objects */, const cl_mem * /* mem_list */, const void ** /* args_mem_loc */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueMarkerWithWaitList(cl_command_queue /* command_queue */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueBarrierWithWaitList(cl_command_queue /* command_queue */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_1_2; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMFree(cl_command_queue /* command_queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void (CL_CALLBACK * /*pfn_free_func*/)(cl_command_queue /* queue */, cl_uint /* num_svm_pointers */, void *[] /* svm_pointers[] */, void * /* user_data */), void * /* user_data */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemcpy(cl_command_queue /* command_queue */, cl_bool /* blocking_copy */, void * /* dst_ptr */, const void * /* src_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMemFill(cl_command_queue /* command_queue */, void * /* svm_ptr */, const void * /* pattern */, size_t /* pattern_size */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMMap(cl_command_queue /* command_queue */, cl_bool /* blocking_map */, cl_map_flags /* flags */, void * /* svm_ptr */, size_t /* size */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; extern 
CL_API_ENTRY cl_int CL_API_CALL clEnqueueSVMUnmap(cl_command_queue /* command_queue */, void * /* svm_ptr */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_API_SUFFIX__VERSION_2_0; /* Extension function access * * Returns the extension function address for the given function name, * or NULL if a valid function can not be found. The client must * check to make sure the address is not NULL, before using or * calling the returned function address. */ extern CL_API_ENTRY void * CL_API_CALL clGetExtensionFunctionAddressForPlatform(cl_platform_id /* platform */, const char * /* func_name */) CL_API_SUFFIX__VERSION_1_2; /* Deprecated OpenCL 1.1 APIs */ extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateImage2D(cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format * /* image_format */, size_t /* image_width */, size_t /* image_height */, size_t /* image_row_pitch */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL clCreateImage3D(cl_context /* context */, cl_mem_flags /* flags */, const cl_image_format * /* image_format */, size_t /* image_width */, size_t /* image_height */, size_t /* image_depth */, size_t /* image_row_pitch */, size_t /* image_slice_pitch */, void * /* host_ptr */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clEnqueueMarker(cl_command_queue /* command_queue */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clEnqueueWaitForEvents(cl_command_queue /* command_queue */, cl_uint /* num_events */, const cl_event * /* event_list */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clEnqueueBarrier(cl_command_queue /* command_queue */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL clUnloadCompiler(void) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED void * CL_API_CALL clGetExtensionFunctionAddress(const char * /* func_name */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED; /* Deprecated OpenCL 2.0 APIs */ extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_command_queue CL_API_CALL clCreateCommandQueue(cl_context /* context */, cl_device_id /* device */, cl_command_queue_properties /* properties */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_sampler CL_API_CALL clCreateSampler(cl_context /* context */, cl_bool /* normalized_coords */, cl_addressing_mode /* addressing_mode */, cl_filter_mode /* filter_mode */, cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED; extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_2_DEPRECATED cl_int CL_API_CALL clEnqueueTask(cl_command_queue /* command_queue */, cl_kernel /* kernel */, cl_uint /* num_events_in_wait_list */, const cl_event * /* event_wait_list */, cl_event * /* event */) CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_H */ Beignet-1.3.2-Source/include/CL/cl_intel.h000664 001750 001750 00000020435 13173554000 017345 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free 
software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ #ifndef __OPENCL_CL_INTEL_H #define __OPENCL_CL_INTEL_H #include "CL/cl.h" #ifdef __cplusplus extern "C" { #endif #define CL_MEM_PINNABLE (1 << 10) /* Track allocations and report current number of unfreed allocations */ extern CL_API_ENTRY cl_int CL_API_CALL clReportUnfreedIntel(void); typedef CL_API_ENTRY cl_int (CL_API_CALL *clReportUnfreedIntel_fn)(void); /* 1 to 1 mapping of drm_intel_bo_map */ extern CL_API_ENTRY void* CL_API_CALL clMapBufferIntel(cl_mem, cl_int*); typedef CL_API_ENTRY void* (CL_API_CALL *clMapBufferIntel_fn)(cl_mem, cl_int*); /* 1 to 1 mapping of drm_intel_bo_unmap */ extern CL_API_ENTRY cl_int CL_API_CALL clUnmapBufferIntel(cl_mem); typedef CL_API_ENTRY cl_int (CL_API_CALL *clUnmapBufferIntel_fn)(cl_mem); /* 1 to 1 mapping of drm_intel_gem_bo_map_gtt */ extern CL_API_ENTRY void* CL_API_CALL clMapBufferGTTIntel(cl_mem, cl_int*); typedef CL_API_ENTRY void* (CL_API_CALL *clMapBufferGTTIntel_fn)(cl_mem, cl_int*); /* 1 to 1 mapping of drm_intel_gem_bo_unmap_gtt */ extern CL_API_ENTRY cl_int CL_API_CALL clUnmapBufferGTTIntel(cl_mem); typedef CL_API_ENTRY cl_int (CL_API_CALL *clUnmapBufferGTTIntel_fn)(cl_mem); /* Pin/Unpin the buffer in GPU memory (must be root) */ extern CL_API_ENTRY cl_int CL_API_CALL clPinBufferIntel(cl_mem); extern CL_API_ENTRY cl_int CL_API_CALL clUnpinBufferIntel(cl_mem); typedef CL_API_ENTRY cl_int (CL_API_CALL *clPinBufferIntel_fn)(cl_mem); typedef CL_API_ENTRY cl_int (CL_API_CALL *clUnpinBufferIntel_fn)(cl_mem); /* Get the generation of the Gen device (used to load the proper binary) */ extern CL_API_ENTRY cl_int CL_API_CALL clGetGenVersionIntel(cl_device_id device, cl_int *ver); typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetGenVersionIntel_fn)( cl_device_id device, cl_int *ver); /* Create a program from an LLVM source file */ extern CL_API_ENTRY cl_program CL_API_CALL clCreateProgramWithLLVMIntel(cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* file */, cl_int * /* errcode_ret */); typedef CL_API_ENTRY cl_program (CL_API_CALL *clCreateProgramWithLLVMIntel_fn)( cl_context /* context */, cl_uint /* num_devices */, const cl_device_id * /* device_list */, const char * /* file */, cl_int * /* errcode_ret */); /* Create buffer from libva's buffer object */ extern CL_API_ENTRY cl_mem CL_API_CALL clCreateBufferFromLibvaIntel(cl_context /* context */, unsigned int /* bo_name */, cl_int * /* errcode_ret */); typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateBufferFromLibvaIntel_fn)( cl_context /* context */, unsigned int /* bo_name */, cl_int * /* errcode_ret */); /* Create image from libva's buffer object */ typedef struct _cl_libva_image { unsigned int bo_name; uint32_t offset; uint32_t width; uint32_t height; cl_image_format fmt; uint32_t row_pitch; uint32_t reserved[8]; } cl_libva_image; extern CL_API_ENTRY cl_mem CL_API_CALL clCreateImageFromLibvaIntel(cl_context /*
context */, const cl_libva_image * /* info */, cl_int * /* errcode_ret */); typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateImageFromLibvaIntel_fn)( cl_context /* context */, const cl_libva_image * /* info */, cl_int * /* errcode_ret */); /* Get a file descriptor for a memory object's underlying buffer */ extern CL_API_ENTRY cl_int CL_API_CALL clGetMemObjectFdIntel(cl_context /* context */, cl_mem /* Memory Object */, int* /* returned fd */); typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetMemObjectFdIntel_fn)( cl_context /* context */, cl_mem /* Memory Object */, int* /* returned fd */); typedef struct _cl_import_buffer_info_intel { int fd; int size; } cl_import_buffer_info_intel; typedef struct _cl_import_image_info_intel { int fd; int size; cl_mem_object_type type; cl_image_format fmt; uint32_t offset; uint32_t width; uint32_t height; uint32_t row_pitch; } cl_import_image_info_intel; /* Create memory object from external buffer object by fd */ extern CL_API_ENTRY cl_mem CL_API_CALL clCreateBufferFromFdINTEL(cl_context /* context */, const cl_import_buffer_info_intel * /* info */, cl_int * /* errcode_ret */); typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateBufferFromFdINTEL_fn)( cl_context /* context */, const cl_import_buffer_info_intel * /* info */, cl_int * /* errcode_ret */); extern CL_API_ENTRY cl_mem CL_API_CALL clCreateImageFromFdINTEL(cl_context /* context */, const cl_import_image_info_intel * /* info */, cl_int * /* errcode_ret */); typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateImageFromFdINTEL_fn)( cl_context /* context */, const cl_import_image_info_intel * /* info */, cl_int * /* errcode_ret */); #ifndef CL_VERSION_2_0 typedef cl_uint cl_kernel_sub_group_info; /* cl_khr_sub_group_info */ #define CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR 0x2033 #define CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE_KHR 0x2034 extern CL_API_ENTRY cl_int CL_API_CALL clGetKernelSubGroupInfoKHR(cl_kernel /* in_kernel */, cl_device_id /*in_device*/, cl_kernel_sub_group_info /* param_name */, size_t /*input_value_size*/, const void * /*input_value*/, size_t /*param_value_size*/, void* /*param_value*/, size_t* /*param_value_size_ret*/ ); typedef CL_API_ENTRY cl_int ( CL_API_CALL * clGetKernelSubGroupInfoKHR_fn)(cl_kernel /* in_kernel */, cl_device_id /*in_device*/, cl_kernel_sub_group_info /* param_name */, size_t /*input_value_size*/, const void * /*input_value*/, size_t /*param_value_size*/, void* /*param_value*/, size_t* /*param_value_size_ret*/ ); #endif /* cl_intel_required_subgroup_size extension */ #define CL_DEVICE_SUB_GROUP_SIZES_INTEL 0x4108 #define CL_KERNEL_SPILL_MEM_SIZE_INTEL 0x4109 #define CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL 0x410A #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_INTEL_H */ Beignet-1.3.2-Source/include/CL/cl_platform.h000664 001750 001750 00000123273 13161142102 020053 0ustar00yryr000000 000000 /********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc.
 * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11803 $ on $Date: 2010-06-25 10:02:12 -0700 (Fri, 25 Jun 2010) $ */ #ifndef __CL_PLATFORM_H #define __CL_PLATFORM_H #ifdef __APPLE__ /* Contains #defines for AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER below */ #include <AvailabilityMacros.h> #endif #ifdef __cplusplus extern "C" { #endif #if defined(_WIN32) #define CL_API_ENTRY #define CL_API_CALL __stdcall #define CL_CALLBACK __stdcall #else #define CL_API_ENTRY #define CL_API_CALL #define CL_CALLBACK #endif /* * Deprecation flags refer to the last version of the header in which the * feature was not deprecated. * * E.g. VERSION_1_1_DEPRECATED means the feature is present in 1.1 without * deprecation but is deprecated in versions later than 1.1. */ #ifdef __APPLE__ #define CL_EXTENSION_WEAK_LINK __attribute__((weak_import)) #define CL_API_SUFFIX__VERSION_1_0 AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_0 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER #define CL_API_SUFFIX__VERSION_1_1 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define GCL_API_SUFFIX__VERSION_1_1 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_1 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER_BUT_DEPRECATED_IN_MAC_OS_X_VERSION_10_7 #ifdef AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define GCL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_2 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_8_AND_LATER #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER_BUT_DEPRECATED_IN_MAC_OS_X_VERSION_10_8 #else #warning This path should never happen outside of internal operating system development. AvailabilityMacros do not function correctly here!
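/* Editor's illustration (added note, drawn from the cl.h declarations earlier
 * in this tree; not Khronos text): the _DEPRECATED prefix/suffix pairs defined
 * in this header bracket an entry point, e.g.
 *
 *   extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_int CL_API_CALL
 *   clEnqueueMarker(cl_command_queue, cl_event *) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
 *
 * so that, on GCC for instance, callers get an __attribute__((deprecated))
 * warning unless they define CL_USE_DEPRECATED_OPENCL_1_1_APIS before
 * including the header. */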
#define CL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define GCL_API_SUFFIX__VERSION_1_2 AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_2 CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED CL_EXTENSION_WEAK_LINK AVAILABLE_MAC_OS_X_VERSION_10_7_AND_LATER #endif #else #define CL_EXTENSION_WEAK_LINK #define CL_API_SUFFIX__VERSION_1_0 #define CL_EXT_SUFFIX__VERSION_1_0 #define CL_API_SUFFIX__VERSION_1_1 #define CL_EXT_SUFFIX__VERSION_1_1 #define CL_API_SUFFIX__VERSION_1_2 #define CL_EXT_SUFFIX__VERSION_1_2 #define CL_API_SUFFIX__VERSION_2_0 #define CL_EXT_SUFFIX__VERSION_2_0 #ifdef __GNUC__ #ifdef CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_2_APIS #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED __attribute__((deprecated)) #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #endif #elif _WIN32 #ifdef CL_USE_DEPRECATED_OPENCL_1_0_APIS #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED __declspec(deprecated) #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_1_APIS #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED __declspec(deprecated) #endif #ifdef CL_USE_DEPRECATED_OPENCL_1_2_APIS #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #else #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED __declspec(deprecated) #endif #else #define CL_EXT_SUFFIX__VERSION_1_0_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_0_DEPRECATED #define CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_1_DEPRECATED #define CL_EXT_SUFFIX__VERSION_1_2_DEPRECATED #define CL_EXT_PREFIX__VERSION_1_2_DEPRECATED #endif #endif #if (defined (_WIN32) && defined(_MSC_VER)) /* scalar types */ typedef signed __int8 cl_char; typedef unsigned __int8 cl_uchar; typedef signed __int16 cl_short; typedef unsigned __int16 cl_ushort; typedef signed __int32 cl_int; typedef unsigned __int32 cl_uint; typedef signed __int64 cl_long; typedef unsigned __int64 cl_ulong; typedef unsigned __int16 cl_half; typedef float cl_float; typedef double cl_double; /* Macro names and corresponding values defined by OpenCL */ #define CL_CHAR_BIT 8 #define CL_SCHAR_MAX 127 #define CL_SCHAR_MIN (-127-1) #define CL_CHAR_MAX CL_SCHAR_MAX #define CL_CHAR_MIN CL_SCHAR_MIN #define CL_UCHAR_MAX 255 #define CL_SHRT_MAX 32767 #define CL_SHRT_MIN (-32767-1) #define CL_USHRT_MAX 65535 #define CL_INT_MAX 2147483647 #define CL_INT_MIN (-2147483647-1) #define CL_UINT_MAX 0xffffffffU #define CL_LONG_MAX ((cl_long) 0x7FFFFFFFFFFFFFFFLL) #define CL_LONG_MIN ((cl_long) -0x7FFFFFFFFFFFFFFFLL - 1LL) #define CL_ULONG_MAX 
((cl_ulong) 0xFFFFFFFFFFFFFFFFULL) #define CL_FLT_DIG 6 #define CL_FLT_MANT_DIG 24 #define CL_FLT_MAX_10_EXP +38 #define CL_FLT_MAX_EXP +128 #define CL_FLT_MIN_10_EXP -37 #define CL_FLT_MIN_EXP -125 #define CL_FLT_RADIX 2 #define CL_FLT_MAX 340282346638528859811704183484516925440.0f #define CL_FLT_MIN 1.175494350822287507969e-38f #define CL_FLT_EPSILON 0x1.0p-23f #define CL_DBL_DIG 15 #define CL_DBL_MANT_DIG 53 #define CL_DBL_MAX_10_EXP +308 #define CL_DBL_MAX_EXP +1024 #define CL_DBL_MIN_10_EXP -307 #define CL_DBL_MIN_EXP -1021 #define CL_DBL_RADIX 2 #define CL_DBL_MAX 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0 #define CL_DBL_MIN 2.225073858507201383090e-308 #define CL_DBL_EPSILON 2.220446049250313080847e-16 #define CL_M_E 2.718281828459045090796 #define CL_M_LOG2E 1.442695040888963387005 #define CL_M_LOG10E 0.434294481903251816668 #define CL_M_LN2 0.693147180559945286227 #define CL_M_LN10 2.302585092994045901094 #define CL_M_PI 3.141592653589793115998 #define CL_M_PI_2 1.570796326794896557999 #define CL_M_PI_4 0.785398163397448278999 #define CL_M_1_PI 0.318309886183790691216 #define CL_M_2_PI 0.636619772367581382433 #define CL_M_2_SQRTPI 1.128379167095512558561 #define CL_M_SQRT2 1.414213562373095145475 #define CL_M_SQRT1_2 0.707106781186547572737 #define CL_M_E_F 2.71828174591064f #define CL_M_LOG2E_F 1.44269502162933f #define CL_M_LOG10E_F 0.43429449200630f #define CL_M_LN2_F 0.69314718246460f #define CL_M_LN10_F 2.30258512496948f #define CL_M_PI_F 3.14159274101257f #define CL_M_PI_2_F 1.57079637050629f #define CL_M_PI_4_F 0.78539818525314f #define CL_M_1_PI_F 0.31830987334251f #define CL_M_2_PI_F 0.63661974668503f #define CL_M_2_SQRTPI_F 1.12837922573090f #define CL_M_SQRT2_F 1.41421353816986f #define CL_M_SQRT1_2_F 0.70710676908493f #define CL_NAN (CL_INFINITY - CL_INFINITY) #define CL_HUGE_VALF ((cl_float) 1e50) #define CL_HUGE_VAL ((cl_double) 1e500) #define CL_MAXFLOAT CL_FLT_MAX #define CL_INFINITY CL_HUGE_VALF #else #include <stdint.h> /* scalar types */ typedef int8_t cl_char; typedef uint8_t cl_uchar; typedef int16_t cl_short __attribute__((aligned(2))); typedef uint16_t cl_ushort __attribute__((aligned(2))); typedef int32_t cl_int __attribute__((aligned(4))); typedef uint32_t cl_uint __attribute__((aligned(4))); typedef int64_t cl_long __attribute__((aligned(8))); typedef uint64_t cl_ulong __attribute__((aligned(8))); typedef uint16_t cl_half __attribute__((aligned(2))); typedef float cl_float __attribute__((aligned(4))); typedef double cl_double __attribute__((aligned(8))); /* Macro names and corresponding values defined by OpenCL */ #define CL_CHAR_BIT 8 #define CL_SCHAR_MAX 127 #define CL_SCHAR_MIN (-127-1) #define CL_CHAR_MAX CL_SCHAR_MAX #define CL_CHAR_MIN CL_SCHAR_MIN #define CL_UCHAR_MAX 255 #define CL_SHRT_MAX 32767 #define CL_SHRT_MIN (-32767-1) #define CL_USHRT_MAX 65535 #define CL_INT_MAX 2147483647 #define CL_INT_MIN (-2147483647-1) #define CL_UINT_MAX 0xffffffffU #define CL_LONG_MAX ((cl_long) 0x7FFFFFFFFFFFFFFFLL) #define CL_LONG_MIN ((cl_long) -0x7FFFFFFFFFFFFFFFLL - 1LL) #define CL_ULONG_MAX ((cl_ulong) 0xFFFFFFFFFFFFFFFFULL) #define CL_FLT_DIG 6 #define CL_FLT_MANT_DIG 24 #define CL_FLT_MAX_10_EXP +38 #define CL_FLT_MAX_EXP +128 #define CL_FLT_MIN_10_EXP -37 #define CL_FLT_MIN_EXP -125 #define
CL_FLT_RADIX 2 #define CL_FLT_MAX 0x1.fffffep127f #define CL_FLT_MIN 0x1.0p-126f #define CL_FLT_EPSILON 0x1.0p-23f #define CL_DBL_DIG 15 #define CL_DBL_MANT_DIG 53 #define CL_DBL_MAX_10_EXP +308 #define CL_DBL_MAX_EXP +1024 #define CL_DBL_MIN_10_EXP -307 #define CL_DBL_MIN_EXP -1021 #define CL_DBL_RADIX 2 #define CL_DBL_MAX 0x1.fffffffffffffp1023 #define CL_DBL_MIN 0x1.0p-1022 #define CL_DBL_EPSILON 0x1.0p-52 #define CL_M_E 2.718281828459045090796 #define CL_M_LOG2E 1.442695040888963387005 #define CL_M_LOG10E 0.434294481903251816668 #define CL_M_LN2 0.693147180559945286227 #define CL_M_LN10 2.302585092994045901094 #define CL_M_PI 3.141592653589793115998 #define CL_M_PI_2 1.570796326794896557999 #define CL_M_PI_4 0.785398163397448278999 #define CL_M_1_PI 0.318309886183790691216 #define CL_M_2_PI 0.636619772367581382433 #define CL_M_2_SQRTPI 1.128379167095512558561 #define CL_M_SQRT2 1.414213562373095145475 #define CL_M_SQRT1_2 0.707106781186547572737 #define CL_M_E_F 2.71828174591064f #define CL_M_LOG2E_F 1.44269502162933f #define CL_M_LOG10E_F 0.43429449200630f #define CL_M_LN2_F 0.69314718246460f #define CL_M_LN10_F 2.30258512496948f #define CL_M_PI_F 3.14159274101257f #define CL_M_PI_2_F 1.57079637050629f #define CL_M_PI_4_F 0.78539818525314f #define CL_M_1_PI_F 0.31830987334251f #define CL_M_2_PI_F 0.63661974668503f #define CL_M_2_SQRTPI_F 1.12837922573090f #define CL_M_SQRT2_F 1.41421353816986f #define CL_M_SQRT1_2_F 0.70710676908493f #if defined( __GNUC__ ) #define CL_HUGE_VALF __builtin_huge_valf() #define CL_HUGE_VAL __builtin_huge_val() #define CL_NAN __builtin_nanf( "" ) #else #define CL_HUGE_VALF ((cl_float) 1e50) #define CL_HUGE_VAL ((cl_double) 1e500) float nanf( const char * ); #define CL_NAN nanf( "" ) #endif #define CL_MAXFLOAT CL_FLT_MAX #define CL_INFINITY CL_HUGE_VALF #endif #include <stddef.h> /* Mirror types to GL types. Mirror types allow us to avoid deciding which headers to load based on whether we are using GL or GLES here. */ typedef unsigned int cl_GLuint; typedef int cl_GLint; typedef unsigned int cl_GLenum; /* * Vector types * * Note: OpenCL requires that all types be naturally aligned. * This means that vector types must be naturally aligned. * For example, a vector of four floats must be aligned to * a 16 byte boundary (calculated as 4 * the natural 4-byte * alignment of the float). The alignment qualifiers here * will only function properly if your compiler supports them * and if you don't actively work to defeat them. For example, * in order for a cl_float4 to be 16 byte aligned in a struct, * the start of the struct must itself be 16-byte aligned. * * Maintaining proper alignment is the user's responsibility. */ /* Define basic vector types */ #if defined( __VEC__ ) #include <altivec.h> /* may be omitted depending on compiler. AltiVec spec provides no way to detect whether the header is required.
*/ typedef vector unsigned char __cl_uchar16; typedef vector signed char __cl_char16; typedef vector unsigned short __cl_ushort8; typedef vector signed short __cl_short8; typedef vector unsigned int __cl_uint4; typedef vector signed int __cl_int4; typedef vector float __cl_float4; #define __CL_UCHAR16__ 1 #define __CL_CHAR16__ 1 #define __CL_USHORT8__ 1 #define __CL_SHORT8__ 1 #define __CL_UINT4__ 1 #define __CL_INT4__ 1 #define __CL_FLOAT4__ 1 #endif #if defined( __SSE__ ) #if defined( __MINGW64__ ) #include <intrin.h> #else #include <xmmintrin.h> #endif #if defined( __GNUC__ ) typedef float __cl_float4 __attribute__((vector_size(16))); #else typedef __m128 __cl_float4; #endif #define __CL_FLOAT4__ 1 #endif #if defined( __SSE2__ ) #if defined( __MINGW64__ ) #include <intrin.h> #else #include <emmintrin.h> #endif #if defined( __GNUC__ ) typedef cl_uchar __cl_uchar16 __attribute__((vector_size(16))); typedef cl_char __cl_char16 __attribute__((vector_size(16))); typedef cl_ushort __cl_ushort8 __attribute__((vector_size(16))); typedef cl_short __cl_short8 __attribute__((vector_size(16))); typedef cl_uint __cl_uint4 __attribute__((vector_size(16))); typedef cl_int __cl_int4 __attribute__((vector_size(16))); typedef cl_ulong __cl_ulong2 __attribute__((vector_size(16))); typedef cl_long __cl_long2 __attribute__((vector_size(16))); typedef cl_double __cl_double2 __attribute__((vector_size(16))); #else typedef __m128i __cl_uchar16; typedef __m128i __cl_char16; typedef __m128i __cl_ushort8; typedef __m128i __cl_short8; typedef __m128i __cl_uint4; typedef __m128i __cl_int4; typedef __m128i __cl_ulong2; typedef __m128i __cl_long2; typedef __m128d __cl_double2; #endif #define __CL_UCHAR16__ 1 #define __CL_CHAR16__ 1 #define __CL_USHORT8__ 1 #define __CL_SHORT8__ 1 #define __CL_INT4__ 1 #define __CL_UINT4__ 1 #define __CL_ULONG2__ 1 #define __CL_LONG2__ 1 #define __CL_DOUBLE2__ 1 #endif #if defined( __MMX__ ) #include <mmintrin.h> #if defined( __GNUC__ ) typedef cl_uchar __cl_uchar8 __attribute__((vector_size(8))); typedef cl_char __cl_char8 __attribute__((vector_size(8))); typedef cl_ushort __cl_ushort4 __attribute__((vector_size(8))); typedef cl_short __cl_short4 __attribute__((vector_size(8))); typedef cl_uint __cl_uint2 __attribute__((vector_size(8))); typedef cl_int __cl_int2 __attribute__((vector_size(8))); typedef cl_ulong __cl_ulong1 __attribute__((vector_size(8))); typedef cl_long __cl_long1 __attribute__((vector_size(8))); typedef cl_float __cl_float2 __attribute__((vector_size(8))); #else typedef __m64 __cl_uchar8; typedef __m64 __cl_char8; typedef __m64 __cl_ushort4; typedef __m64 __cl_short4; typedef __m64 __cl_uint2; typedef __m64 __cl_int2; typedef __m64 __cl_ulong1; typedef __m64 __cl_long1; typedef __m64 __cl_float2; #endif #define __CL_UCHAR8__ 1 #define __CL_CHAR8__ 1 #define __CL_USHORT4__ 1 #define __CL_SHORT4__ 1 #define __CL_INT2__ 1 #define __CL_UINT2__ 1 #define __CL_ULONG1__ 1 #define __CL_LONG1__ 1 #define __CL_FLOAT2__ 1 #endif #if defined( __AVX__ ) #if defined( __MINGW64__ ) #include <intrin.h> #else #include <immintrin.h> #endif #if defined( __GNUC__ ) typedef cl_float __cl_float8 __attribute__((vector_size(32))); typedef cl_double __cl_double4 __attribute__((vector_size(32))); #else typedef __m256 __cl_float8; typedef __m256d __cl_double4; #endif #define __CL_FLOAT8__ 1 #define __CL_DOUBLE4__ 1 #endif /* Define capabilities for anonymous struct members. */ #if defined( __GNUC__) && !
defined( __STRICT_ANSI__ ) #define __CL_HAS_ANON_STRUCT__ 1 #define __CL_ANON_STRUCT__ __extension__ #elif defined( _WIN32) && (_MSC_VER >= 1500) /* Microsoft Developer Studio 2008 supports anonymous structs, but * complains by default. */ #define __CL_HAS_ANON_STRUCT__ 1 #define __CL_ANON_STRUCT__ /* Disable warning C4201: nonstandard extension used : nameless * struct/union */ #pragma warning( push ) #pragma warning( disable : 4201 ) #else #define __CL_HAS_ANON_STRUCT__ 0 #define __CL_ANON_STRUCT__ #endif /* Define alignment keys */ #if defined( __GNUC__ ) #define CL_ALIGNED(_x) __attribute__ ((aligned(_x))) #elif defined( _WIN32) && (_MSC_VER) /* Alignment keys neutered on windows because MSVC can't swallow function arguments with alignment requirements */ /* http://msdn.microsoft.com/en-us/library/373ak2y1%28VS.71%29.aspx */ /* #include <crtdefs.h> */ /* #define CL_ALIGNED(_x) _CRT_ALIGN(_x) */ #define CL_ALIGNED(_x) #else #warning Need to implement some method to align data here #define CL_ALIGNED(_x) #endif /* Indicate whether .xyzw, .s0123 and .hi.lo are supported */ #if __CL_HAS_ANON_STRUCT__ /* .xyzw and .s0123...{f|F} are supported */ #define CL_HAS_NAMED_VECTOR_FIELDS 1 /* .hi and .lo are supported */ #define CL_HAS_HI_LO_VECTOR_FIELDS 1 #endif /* Define cl_vector types */ /* ---- cl_charn ---- */ typedef union { cl_char CL_ALIGNED(2) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_char lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2; #endif }cl_char2; typedef union { cl_char CL_ALIGNED(4) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_char2 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[2]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4; #endif }cl_char4; /* cl_char3 is identical in size, alignment and behavior to cl_char4. See section 6.1.5.
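 *
 * Editor's illustration (added; not Khronos text): when anonymous structs are
 * available (__CL_HAS_ANON_STRUCT__ is 1), the overlapping members declared
 * above all alias the same storage. For a cl_char4 v with v.s[0]..v.s[3]
 * set to 1, 2, 3, 4:
 *
 *   v.x == v.s0 == 1 and v.w == v.s3 == 4, while
 *   v.lo is the cl_char2 holding {1, 2} and v.hi the cl_char2 holding {3, 4}.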
*/ typedef cl_char4 cl_char3; typedef union { cl_char CL_ALIGNED(8) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_char4 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[4]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4[2]; #endif #if defined( __CL_CHAR8__ ) __cl_char8 v8; #endif }cl_char8; typedef union { cl_char CL_ALIGNED(16) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_char x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_char s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_char8 lo, hi; }; #endif #if defined( __CL_CHAR2__) __cl_char2 v2[8]; #endif #if defined( __CL_CHAR4__) __cl_char4 v4[4]; #endif #if defined( __CL_CHAR8__ ) __cl_char8 v8[2]; #endif #if defined( __CL_CHAR16__ ) __cl_char16 v16; #endif }cl_char16; /* ---- cl_ucharn ---- */ typedef union { cl_uchar CL_ALIGNED(2) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_uchar lo, hi; }; #endif #if defined( __cl_uchar2__) __cl_uchar2 v2; #endif }cl_uchar2; typedef union { cl_uchar CL_ALIGNED(4) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_uchar2 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[2]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4; #endif }cl_uchar4; /* cl_uchar3 is identical in size, alignment and behavior to cl_uchar4. See section 6.1.5. 
*/ typedef cl_uchar4 cl_uchar3; typedef union { cl_uchar CL_ALIGNED(8) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_uchar4 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[4]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4[2]; #endif #if defined( __CL_UCHAR8__ ) __cl_uchar8 v8; #endif }cl_uchar8; typedef union { cl_uchar CL_ALIGNED(16) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uchar x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_uchar s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_uchar8 lo, hi; }; #endif #if defined( __CL_UCHAR2__) __cl_uchar2 v2[8]; #endif #if defined( __CL_UCHAR4__) __cl_uchar4 v4[4]; #endif #if defined( __CL_UCHAR8__ ) __cl_uchar8 v8[2]; #endif #if defined( __CL_UCHAR16__ ) __cl_uchar16 v16; #endif }cl_uchar16; /* ---- cl_shortn ---- */ typedef union { cl_short CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_short lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2; #endif }cl_short2; typedef union { cl_short CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_short2 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[2]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4; #endif }cl_short4; /* cl_short3 is identical in size, alignment and behavior to cl_short4. See section 6.1.5. 
*/ typedef cl_short4 cl_short3; typedef union { cl_short CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_short4 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[4]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4[2]; #endif #if defined( __CL_SHORT8__ ) __cl_short8 v8; #endif }cl_short8; typedef union { cl_short CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_short x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_short s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_short8 lo, hi; }; #endif #if defined( __CL_SHORT2__) __cl_short2 v2[8]; #endif #if defined( __CL_SHORT4__) __cl_short4 v4[4]; #endif #if defined( __CL_SHORT8__ ) __cl_short8 v8[2]; #endif #if defined( __CL_SHORT16__ ) __cl_short16 v16; #endif }cl_short16; /* ---- cl_ushortn ---- */ typedef union { cl_ushort CL_ALIGNED(4) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_ushort lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2; #endif }cl_ushort2; typedef union { cl_ushort CL_ALIGNED(8) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_ushort2 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[2]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4; #endif }cl_ushort4; /* cl_ushort3 is identical in size, alignment and behavior to cl_ushort4. See section 6.1.5. 
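 *
 * Editor's note (added for illustration; follows directly from the union
 * definitions above): the lo/hi members halve a vector recursively. For a
 * cl_ushort8 u, u.lo is the cl_ushort4 holding u.s0..u.s3, u.hi holds
 * u.s4..u.s7, and u.hi.hi is the cl_ushort2 holding u.s6 and u.s7.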
*/ typedef cl_ushort4 cl_ushort3; typedef union { cl_ushort CL_ALIGNED(16) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_ushort4 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[4]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4[2]; #endif #if defined( __CL_USHORT8__ ) __cl_ushort8 v8; #endif }cl_ushort8; typedef union { cl_ushort CL_ALIGNED(32) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ushort x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_ushort s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_ushort8 lo, hi; }; #endif #if defined( __CL_USHORT2__) __cl_ushort2 v2[8]; #endif #if defined( __CL_USHORT4__) __cl_ushort4 v4[4]; #endif #if defined( __CL_USHORT8__ ) __cl_ushort8 v8[2]; #endif #if defined( __CL_USHORT16__ ) __cl_ushort16 v16; #endif }cl_ushort16; /* ---- cl_intn ---- */ typedef union { cl_int CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_int lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2; #endif }cl_int2; typedef union { cl_int CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_int2 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[2]; #endif #if defined( __CL_INT4__) __cl_int4 v4; #endif }cl_int4; /* cl_int3 is identical in size, alignment and behavior to cl_int4. See section 6.1.5. */ typedef cl_int4 cl_int3; typedef union { cl_int CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_int4 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[4]; #endif #if defined( __CL_INT4__) __cl_int4 v4[2]; #endif #if defined( __CL_INT8__ ) __cl_int8 v8; #endif }cl_int8; typedef union { cl_int CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_int x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_int s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_int8 lo, hi; }; #endif #if defined( __CL_INT2__) __cl_int2 v2[8]; #endif #if defined( __CL_INT4__) __cl_int4 v4[4]; #endif #if defined( __CL_INT8__ ) __cl_int8 v8[2]; #endif #if defined( __CL_INT16__ ) __cl_int16 v16; #endif }cl_int16; /* ---- cl_uintn ---- */ typedef union { cl_uint CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_uint lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2; #endif }cl_uint2; typedef union { cl_uint CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_uint2 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[2]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4; #endif }cl_uint4; /* cl_uint3 is identical in size, alignment and behavior to cl_uint4. See section 6.1.5. 
*/ typedef cl_uint4 cl_uint3; typedef union { cl_uint CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_uint4 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[4]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4[2]; #endif #if defined( __CL_UINT8__ ) __cl_uint8 v8; #endif }cl_uint8; typedef union { cl_uint CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_uint x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_uint s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_uint8 lo, hi; }; #endif #if defined( __CL_UINT2__) __cl_uint2 v2[8]; #endif #if defined( __CL_UINT4__) __cl_uint4 v4[4]; #endif #if defined( __CL_UINT8__ ) __cl_uint8 v8[2]; #endif #if defined( __CL_UINT16__ ) __cl_uint16 v16; #endif }cl_uint16; /* ---- cl_longn ---- */ typedef union { cl_long CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_long lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2; #endif }cl_long2; typedef union { cl_long CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_long2 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[2]; #endif #if defined( __CL_LONG4__) __cl_long4 v4; #endif }cl_long4; /* cl_long3 is identical in size, alignment and behavior to cl_long4. See section 6.1.5. */ typedef cl_long4 cl_long3; typedef union { cl_long CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_long4 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[4]; #endif #if defined( __CL_LONG4__) __cl_long4 v4[2]; #endif #if defined( __CL_LONG8__ ) __cl_long8 v8; #endif }cl_long8; typedef union { cl_long CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_long x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_long s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_long8 lo, hi; }; #endif #if defined( __CL_LONG2__) __cl_long2 v2[8]; #endif #if defined( __CL_LONG4__) __cl_long4 v4[4]; #endif #if defined( __CL_LONG8__ ) __cl_long8 v8[2]; #endif #if defined( __CL_LONG16__ ) __cl_long16 v16; #endif }cl_long16; /* ---- cl_ulongn ---- */ typedef union { cl_ulong CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_ulong lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2; #endif }cl_ulong2; typedef union { cl_ulong CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_ulong2 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[2]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4; #endif }cl_ulong4; /* cl_ulong3 is identical in size, alignment and behavior to cl_ulong4. See section 6.1.5. 
*/ typedef cl_ulong4 cl_ulong3; typedef union { cl_ulong CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_ulong4 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[4]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4[2]; #endif #if defined( __CL_ULONG8__ ) __cl_ulong8 v8; #endif }cl_ulong8; typedef union { cl_ulong CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_ulong x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_ulong s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_ulong8 lo, hi; }; #endif #if defined( __CL_ULONG2__) __cl_ulong2 v2[8]; #endif #if defined( __CL_ULONG4__) __cl_ulong4 v4[4]; #endif #if defined( __CL_ULONG8__ ) __cl_ulong8 v8[2]; #endif #if defined( __CL_ULONG16__ ) __cl_ulong16 v16; #endif }cl_ulong16; /* --- cl_floatn ---- */ typedef union { cl_float CL_ALIGNED(8) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_float lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2; #endif }cl_float2; typedef union { cl_float CL_ALIGNED(16) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_float2 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[2]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4; #endif }cl_float4; /* cl_float3 is identical in size, alignment and behavior to cl_float4. See section 6.1.5. 
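 *
 * Editor's sketch (added; assumes a GCC-style compiler with __SSE__ defined,
 * so that __CL_FLOAT4__ is set earlier in this header): the v4 member exposes
 * the native SIMD type, letting a whole vector be processed at once:
 *
 *   cl_float4 a, b, r;
 *   #if defined(__CL_FLOAT4__)
 *     r.v4 = a.v4 + b.v4;                       one 4-wide add (GCC vectors)
 *   #else
 *     for (int i = 0; i < 4; ++i) r.s[i] = a.s[i] + b.s[i];
 *   #endif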
*/ typedef cl_float4 cl_float3; typedef union { cl_float CL_ALIGNED(32) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_float4 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[4]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4[2]; #endif #if defined( __CL_FLOAT8__ ) __cl_float8 v8; #endif }cl_float8; typedef union { cl_float CL_ALIGNED(64) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_float x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_float s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_float8 lo, hi; }; #endif #if defined( __CL_FLOAT2__) __cl_float2 v2[8]; #endif #if defined( __CL_FLOAT4__) __cl_float4 v4[4]; #endif #if defined( __CL_FLOAT8__ ) __cl_float8 v8[2]; #endif #if defined( __CL_FLOAT16__ ) __cl_float16 v16; #endif }cl_float16; /* --- cl_doublen ---- */ typedef union { cl_double CL_ALIGNED(16) s[2]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1; }; __CL_ANON_STRUCT__ struct{ cl_double lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2; #endif }cl_double2; typedef union { cl_double CL_ALIGNED(32) s[4]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3; }; __CL_ANON_STRUCT__ struct{ cl_double2 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[2]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4; #endif }cl_double4; /* cl_double3 is identical in size, alignment and behavior to cl_double4. See section 6.1.5. */ typedef cl_double4 cl_double3; typedef union { cl_double CL_ALIGNED(64) s[8]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3, s4, s5, s6, s7; }; __CL_ANON_STRUCT__ struct{ cl_double4 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[4]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4[2]; #endif #if defined( __CL_DOUBLE8__ ) __cl_double8 v8; #endif }cl_double8; typedef union { cl_double CL_ALIGNED(128) s[16]; #if __CL_HAS_ANON_STRUCT__ __CL_ANON_STRUCT__ struct{ cl_double x, y, z, w, __spacer4, __spacer5, __spacer6, __spacer7, __spacer8, __spacer9, sa, sb, sc, sd, se, sf; }; __CL_ANON_STRUCT__ struct{ cl_double s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF; }; __CL_ANON_STRUCT__ struct{ cl_double8 lo, hi; }; #endif #if defined( __CL_DOUBLE2__) __cl_double2 v2[8]; #endif #if defined( __CL_DOUBLE4__) __cl_double4 v4[4]; #endif #if defined( __CL_DOUBLE8__ ) __cl_double8 v8[2]; #endif #if defined( __CL_DOUBLE16__ ) __cl_double16 v16; #endif }cl_double16; /* Macro to facilitate debugging * Usage: * Place CL_PROGRAM_STRING_DEBUG_INFO on the line before the first line of your source. * The first line ends with: CL_PROGRAM_STRING_DEBUG_INFO \" * Each line thereafter of OpenCL C source must end with: \n\ * The last line ends in "; * * Example: * * const char *my_program = CL_PROGRAM_STRING_DEBUG_INFO "\ * kernel void foo( int a, float * b ) \n\ * { \n\ * // my comment \n\ * *b[ get_global_id(0)] = a; \n\ * } \n\ * "; * * This should correctly set up the line, (column) and file information for your source * string so you can do source level debugging. 
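 *
 * Editor's note (derived from the definition that follows; not Khronos
 * text): the macro stringifies __LINE__ and splices in __FILE__, so used on
 * line 10 of "foo.c" it prepends the OpenCL C directive
 *
 *   #line 10 "foo.c"
 *
 * to the program source, which the OpenCL compiler then uses when reporting
 * errors and emitting debug information.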
*/ #define __CL_STRINGIFY( _x ) # _x #define _CL_STRINGIFY( _x ) __CL_STRINGIFY( _x ) #define CL_PROGRAM_STRING_DEBUG_INFO "#line " _CL_STRINGIFY(__LINE__) " \"" __FILE__ "\" \n\n" #ifdef __cplusplus } #endif #undef __CL_HAS_ANON_STRUCT__ #undef __CL_ANON_STRUCT__ #if defined( _WIN32) && (_MSC_VER >= 1500) #pragma warning( pop ) #endif #endif /* __CL_PLATFORM_H */ Beignet-1.3.2-Source/include/CL/opencl.h000664 001750 001750 00000003711 13161142102 017023 0ustar00yryr000000 000000 /******************************************************************************* * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. ******************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_H #define __OPENCL_H #ifdef __cplusplus extern "C" { #endif #ifdef __APPLE__ #include <OpenCL/cl.h> #include <OpenCL/cl_gl.h> #include <OpenCL/cl_gl_ext.h> #include <OpenCL/cl_ext.h> #else #include <CL/cl.h> #include <CL/cl_gl.h> #include <CL/cl_gl_ext.h> #include <CL/cl_ext.h> #endif #ifdef __cplusplus } #endif #endif /* __OPENCL_H */ Beignet-1.3.2-Source/include/CL/cl_d3d11.h000664 001750 001750 00000011774 13161142102 017045 0ustar00yryr000000 000000 /********************************************************************************** * Copyright (c) 2008-2015 The Khronos Group Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and/or associated documentation files (the * "Materials"), to deal in the Materials without restriction, including * without limitation the rights to use, copy, modify, merge, publish, * distribute, sublicense, and/or sell copies of the Materials, and to * permit persons to whom the Materials are furnished to do so, subject to * the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Materials. * * MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS * KHRONOS STANDARDS.
THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS * SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT * https://www.khronos.org/registry/ * * THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE * MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS. **********************************************************************************/ /* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */ #ifndef __OPENCL_CL_D3D11_H #define __OPENCL_CL_D3D11_H #include <d3d11.h> #include <CL/cl.h> #include <CL/cl_platform.h> #ifdef __cplusplus extern "C" { #endif /****************************************************************************** * cl_khr_d3d11_sharing */ #define cl_khr_d3d11_sharing 1 typedef cl_uint cl_d3d11_device_source_khr; typedef cl_uint cl_d3d11_device_set_khr; /******************************************************************************/ /* Error Codes */ #define CL_INVALID_D3D11_DEVICE_KHR -1006 #define CL_INVALID_D3D11_RESOURCE_KHR -1007 #define CL_D3D11_RESOURCE_ALREADY_ACQUIRED_KHR -1008 #define CL_D3D11_RESOURCE_NOT_ACQUIRED_KHR -1009 /* cl_d3d11_device_source */ #define CL_D3D11_DEVICE_KHR 0x4019 #define CL_D3D11_DXGI_ADAPTER_KHR 0x401A /* cl_d3d11_device_set */ #define CL_PREFERRED_DEVICES_FOR_D3D11_KHR 0x401B #define CL_ALL_DEVICES_FOR_D3D11_KHR 0x401C /* cl_context_info */ #define CL_CONTEXT_D3D11_DEVICE_KHR 0x401D #define CL_CONTEXT_D3D11_PREFER_SHARED_RESOURCES_KHR 0x402D /* cl_mem_info */ #define CL_MEM_D3D11_RESOURCE_KHR 0x401E /* cl_image_info */ #define CL_IMAGE_D3D11_SUBRESOURCE_KHR 0x401F /* cl_command_type */ #define CL_COMMAND_ACQUIRE_D3D11_OBJECTS_KHR 0x4020 #define CL_COMMAND_RELEASE_D3D11_OBJECTS_KHR 0x4021 /******************************************************************************/ typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetDeviceIDsFromD3D11KHR_fn)( cl_platform_id platform, cl_d3d11_device_source_khr d3d_device_source, void * d3d_object, cl_d3d11_device_set_khr d3d_device_set, cl_uint num_entries, cl_device_id * devices, cl_uint * num_devices) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11BufferKHR_fn)( cl_context context, cl_mem_flags flags, ID3D11Buffer * resource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11Texture2DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D11Texture2D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromD3D11Texture3DKHR_fn)( cl_context context, cl_mem_flags flags, ID3D11Texture3D * resource, UINT subresource, cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireD3D11ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event) CL_API_SUFFIX__VERSION_1_2; typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseD3D11ObjectsKHR_fn)( cl_command_queue command_queue, cl_uint num_objects, const cl_mem * mem_objects, cl_uint num_events_in_wait_list, const cl_event * event_wait_list, cl_event * event)
CL_API_SUFFIX__VERSION_1_2; #ifdef __cplusplus } #endif #endif /* __OPENCL_CL_D3D11_H */ Beignet-1.3.2-Source/utests/000775 001750 001750 00000000000 13174334761 015010 5ustar00yryr000000 000000 Beignet-1.3.2-Source/utests/builtin_frexp.cpp000664 001750 001750 00000002627 13161142102 020363 0ustar00yryr000000 000000 #include <cmath> #include "utest_helper.hpp" void builtin_frexp(void) { const int n = 32; float src[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("builtin_frexp"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); src[0] = ((float*)buf_data[0])[0] = 0.f; src[1] = ((float*)buf_data[0])[1] = -0.f; src[2] = ((float*)buf_data[0])[2] = nanf(""); src[3] = ((float*)buf_data[0])[3] = INFINITY; src[4] = ((float*)buf_data[0])[4] = -INFINITY; for (int i = 5; i < n; ++i) src[i] = ((float*)buf_data[0])[i] = (rand() & 255) * 0.1f - 12.8f; OCL_UNMAP_BUFFER(0); OCL_NDRANGE(1); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); float *dst = (float*)buf_data[1]; int *exp = (int*)buf_data[2]; int w; OCL_ASSERT(dst[0] == 0.f && exp[0] == 0); OCL_ASSERT(dst[1] == -0.f && exp[1] == 0); OCL_ASSERT(isnanf(dst[2])); OCL_ASSERT(dst[3] == INFINITY); OCL_ASSERT(dst[4] == -INFINITY); for (int i = 5; i < n; ++i) { OCL_ASSERT(fabsf(dst[i] - frexpf(src[i], &w)) < 1e-5); OCL_ASSERT(exp[i] == w); } OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(builtin_frexp); Beignet-1.3.2-Source/utests/compiler_get_sub_group_local_id.cpp000664 001750 001750 00000001467 13161142102 024076 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_get_sub_group_local_id(void) { if(!cl_check_subgroups()) return; const size_t n = 256; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_get_sub_group_local_id"); OCL_CREATE_BUFFER(buf[0], 0, (n+1) * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) (n+1); ++i) ((int*)buf_data[0])[i] = -1; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(0); int* dst = (int *)buf_data[0]; OCL_ASSERT(8 == dst[0] || 16 == dst[0]); for (int32_t i = 1; i < (int32_t) n; ++i){ OCL_ASSERT((i-1) % dst[0] == dst[i]); } OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_get_sub_group_local_id); Beignet-1.3.2-Source/utests/compiler_vect_compare.cpp000664 001750 001750 00000002214 13161142102 022042 0ustar00yryr000000 000000 #include "utest_helper.hpp" typedef struct { int x; int y; int z; int w; } int4; void compiler_vect_compare(void) { const size_t n = 16; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_vect_compare"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int4), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int4), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_MAP_BUFFER(0); for (uint32_t i = 0; i < n; ++i) { ((int4*)buf_data[0])[i].x = i & 0x1; ((int4*)buf_data[0])[i].y = i & 0x2; ((int4*)buf_data[0])[i].z = i & 0x4; ((int4*)buf_data[0])[i].w = i & 0x8; } OCL_UNMAP_BUFFER(0); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 16; ++i) { OCL_ASSERT(((int4*)buf_data[1])[i].x == (int)((i&0x1)?0xffffffff:0)); OCL_ASSERT(((int4*)buf_data[1])[i].y ==
(int)((i&0x2)?0xffffffff:0)); OCL_ASSERT(((int4*)buf_data[1])[i].z == (int)((i&0x4)?0xffffffff:0)); OCL_ASSERT(((int4*)buf_data[1])[i].w == (int)((i&0x8)?0xffffffff:0)); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_vect_compare); Beignet-1.3.2-Source/utests/get_arg_info.cpp000664 001750 001750 00000006404 13161142102 020131 0ustar00yryr000000 000000 #include <string.h> #include "utest_helper.hpp" void test_get_arg_info(void) { int ret; uint32_t ret_val; cl_kernel_arg_type_qualifier type_qual; size_t ret_sz; char name[64]; // Setup kernel and buffers OCL_CALL (cl_kernel_init, "test_get_arg_info.cl", "test_get_arg_info", SOURCE, "-cl-kernel-arg-info"); //Arg 0 ret = clGetKernelArgInfo(kernel, 0, CL_KERNEL_ARG_ADDRESS_QUALIFIER, sizeof(ret_val), &ret_val, &ret_sz); OCL_ASSERT(ret == CL_SUCCESS); OCL_ASSERT(ret_sz == sizeof(cl_kernel_arg_address_qualifier)); OCL_ASSERT(ret_val == CL_KERNEL_ARG_ADDRESS_GLOBAL); ret = clGetKernelArgInfo(kernel, 0, CL_KERNEL_ARG_ACCESS_QUALIFIER, sizeof(ret_val), &ret_val, &ret_sz); OCL_ASSERT(ret == CL_SUCCESS); OCL_ASSERT(ret_sz == sizeof(cl_kernel_arg_access_qualifier)); OCL_ASSERT(ret_val == CL_KERNEL_ARG_ACCESS_NONE); ret = clGetKernelArgInfo(kernel, 0, CL_KERNEL_ARG_TYPE_NAME, sizeof(name), name, &ret_sz); OCL_ASSERT(ret == CL_SUCCESS); OCL_ASSERT(ret_sz == strlen("float*") + 1); OCL_ASSERT(!strcmp(name, "float*")); ret = clGetKernelArgInfo(kernel, 0, CL_KERNEL_ARG_NAME, sizeof(name), name, &ret_sz); OCL_ASSERT(ret == CL_SUCCESS); OCL_ASSERT(ret_sz == strlen("src") + 1); OCL_ASSERT(!strcmp(name, "src")); ret = clGetKernelArgInfo(kernel, 0, CL_KERNEL_ARG_TYPE_QUALIFIER, sizeof(type_qual), &type_qual, &ret_sz); OCL_ASSERT(ret == CL_SUCCESS); OCL_ASSERT(ret_sz == sizeof(cl_kernel_arg_type_qualifier)); OCL_ASSERT(type_qual == (CL_KERNEL_ARG_TYPE_CONST|CL_KERNEL_ARG_TYPE_VOLATILE)); //Arg 1 ret = clGetKernelArgInfo(kernel, 1, CL_KERNEL_ARG_ADDRESS_QUALIFIER, sizeof(ret_val), &ret_val, &ret_sz); OCL_ASSERT(ret == CL_SUCCESS); OCL_ASSERT(ret_sz == sizeof(cl_kernel_arg_address_qualifier)); OCL_ASSERT(ret_val == CL_KERNEL_ARG_ADDRESS_LOCAL); ret = clGetKernelArgInfo(kernel, 1, CL_KERNEL_ARG_ACCESS_QUALIFIER, sizeof(ret_val), &ret_val, &ret_sz); OCL_ASSERT(ret == CL_SUCCESS); OCL_ASSERT(ret_sz == sizeof(cl_kernel_arg_access_qualifier)); OCL_ASSERT(ret_val == CL_KERNEL_ARG_ACCESS_NONE); ret = clGetKernelArgInfo(kernel, 1, CL_KERNEL_ARG_TYPE_NAME, sizeof(name), name, &ret_sz); OCL_ASSERT(ret == CL_SUCCESS); OCL_ASSERT(ret_sz == strlen("int*") + 1); OCL_ASSERT(!strcmp(name, "int*")); ret = clGetKernelArgInfo(kernel, 1, CL_KERNEL_ARG_NAME, sizeof(name), name, &ret_sz); OCL_ASSERT(ret == CL_SUCCESS); OCL_ASSERT(ret_sz == strlen("dst") + 1); OCL_ASSERT(!strcmp(name, "dst")); ret = clGetKernelArgInfo(kernel, 1, CL_KERNEL_ARG_TYPE_QUALIFIER, sizeof(type_qual), &type_qual, &ret_sz); OCL_ASSERT(ret == CL_SUCCESS); OCL_ASSERT(ret_sz == sizeof(cl_kernel_arg_type_qualifier)); OCL_ASSERT(type_qual == CL_KERNEL_ARG_TYPE_NONE); //Arg 2 ret = clGetKernelArgInfo(kernel, 2, CL_KERNEL_ARG_TYPE_NAME, sizeof(name), name, &ret_sz); OCL_ASSERT(ret == CL_SUCCESS); OCL_ASSERT(ret_sz == strlen("test_arg_struct") + 1); OCL_ASSERT(!strcmp(name, "test_arg_struct")); } MAKE_UTEST_FROM_FUNCTION(test_get_arg_info); Beignet-1.3.2-Source/utests/compiler_copy_buffer_row.cpp000664 001750 001750 00000002253 13161142102 022570 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_copy_buffer_row(void) { uint32_t *src_buffer = NULL; int *data_buffer = NULL; const int row = 8192;
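/* The source data is row_n rows of row elements each; buf[2] passes the pair {row, n} so the kernel can address whole rows, and the final check expects buf[1] to come back as a verbatim copy of buf[0]. */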
const int row_n = 2; const int n = row * row_n; // Setup kernel and buffers OCL_CREATE_KERNEL("test_copy_buffer_row"); src_buffer = (uint32_t *) malloc(sizeof(uint32_t) * n); for (int32_t i = 0; i < n; ++i) src_buffer[i] = i; data_buffer = (int *) malloc(sizeof(int) * 2); data_buffer[0] = row; data_buffer[1] = n; OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t), src_buffer); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_CREATE_BUFFER(buf[2], CL_MEM_COPY_HOST_PTR, 2 * sizeof(uint32_t), data_buffer); free(src_buffer); free(data_buffer); // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); // Check results OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int32_t i = 0; i < n; ++i) OCL_ASSERT(((uint32_t*)buf_data[0])[i] == ((uint32_t*)buf_data[1])[i]); } MAKE_UTEST_FROM_FUNCTION(compiler_copy_buffer_row); Beignet-1.3.2-Source/utests/get_cl_info.cpp000664 001750 001750 00000073303 13161142102 017760 0ustar00yryr000000 000000 #include <string.h> #include <stdio.h> #include <stdlib.h> #include <string> #include <iostream> #include <fstream> #include <map> #include "utest_helper.hpp" using namespace std; /* ***************************************************** * * This file tests all the clGetXXXXInfo style APIs * * ***************************************************** */ #define NO_STANDARD_REF 0xFFFFF template <typename T = cl_uint> struct Info_Result { T ret; T refer; int size; typedef T type_value; void * get_ret(void) { return (void *)&ret; } Info_Result(T other) { refer = other; size = sizeof(T); } bool check_result (void) { //printf("The refer is %d, we get result is %d\n", refer, ret); if (ret != refer && refer != (T)NO_STANDARD_REF) return false; return true; } }; template <> struct Info_Result<char *> { char * ret; char * refer; int size; typedef char* type_value; Info_Result(const char *other, int sz): refer(NULL) { size = sz; ret = (char *)malloc(sizeof(char) * sz); if (other) { refer = (char *)malloc(sizeof(char) * sz); memcpy(refer, other, sz); } } ~Info_Result(void) { free(refer); free(ret); } void * get_ret(void) { return (void *)ret; } bool check_result (void) { if (refer && ::memcmp(ret, refer, size)) return false; return true; } }; template <> //Used for such as CL_PROGRAM_BINARIES struct Info_Result<char **> { char ** ret; char ** refer; int *elt_size; int size; typedef char** type_value; int array_size; Info_Result(char **other, int *sz, int elt_num) { array_size = elt_num; size = elt_num * sizeof(char**); ret = (char **)malloc(elt_num * sizeof(char *)); memset(ret, 0, (elt_num * sizeof(char *))); refer = (char **)malloc(elt_num * sizeof(char *)); memset(refer, 0, (elt_num * sizeof(char *))); elt_size = (int *)malloc(elt_num * sizeof(int)); memset(elt_size, 0, (elt_num * sizeof(int))); if (sz) { int i = 0; for (; i < elt_num; i++) { elt_size[i] = sz[i]; ret[i] = (char *)malloc(sz[i] * sizeof(char)); if (other[i] && elt_size[i] > 0) { refer[i] = (char *)malloc(sz[i] * sizeof(char)); memcpy(&refer[i], &other[i], sz[i]); } else refer[i] = NULL; } } } ~Info_Result(void) { int i = 0; for (; i < array_size; i++) { if (refer[i]) free(refer[i]); free(ret[i]); } free(ret); free(refer); free(elt_size); } void * get_ret(void) { return (void *)ret; } bool check_result (void) { int i = 0; for (; i < array_size; i++) { if (refer[i] && ::memcmp(ret[i], refer[i], elt_size[i])) return false; } return true; } }; template <typename T1, typename T2> struct Traits { static bool Is_Same(void) { return false; }; }; template <typename T1> struct Traits<T1, T1> { static bool Is_Same(void) { return
true; }; }; template <typename T> Info_Result<T>* cast_as(void *info) { Info_Result<T>* ret; ret = reinterpret_cast<Info_Result<T>*>(info); OCL_ASSERT((Traits<T, typename Info_Result<T>::type_value>::Is_Same())); return ret; } #define CALL_INFO_AND_RET(TYPE, FUNC, ...) \ do { \ cl_int ret; \ size_t ret_size; \ \ Info_Result<TYPE>* info = cast_as<TYPE>(x->second); \ ret = FUNC (__VA_ARGS__, x->first, \ info->size, info->get_ret(), &ret_size); \ OCL_ASSERT((!ret)); \ OCL_ASSERT((info->check_result())); \ delete info; \ } while(0) /* ***************************************************** * * clGetProgramInfo * * ***************************************************** */ #define CALL_PROGINFO_AND_RET(TYPE) CALL_INFO_AND_RET(TYPE, clGetProgramInfo, program) void get_program_info(void) { map<cl_program_info, void *> maps; int expect_value; char * expect_source; int sz; char *ker_path = (char *)malloc(4096 * sizeof(char)); const char *kiss_path = getenv("OCL_KERNEL_PATH"); if(!kiss_path) return; string line; string source_code; if(strlen(kiss_path) > 4000) return; sprintf(ker_path, "%s/%s", kiss_path, "compiler_if_else.cl"); ifstream in(ker_path); while (getline(in,line)) { source_code = (source_code == "") ? source_code + line : source_code + "\n" + line; } free(ker_path); //cout<< source_code; source_code = source_code + "\n"; expect_source = (char *)source_code.c_str(); OCL_CREATE_KERNEL("compiler_if_else"); /* First test for clGetProgramInfo. We just have 1 device now */ expect_value = 2;//One program, one kernel. maps.insert(make_pair(CL_PROGRAM_REFERENCE_COUNT, (void *)(new Info_Result<>(((cl_uint)expect_value))))); maps.insert(make_pair(CL_PROGRAM_CONTEXT, (void *)(new Info_Result<cl_context>(ctx)))); expect_value = 1; maps.insert(make_pair(CL_PROGRAM_NUM_DEVICES, (void *)(new Info_Result<>(((cl_uint)expect_value))))); maps.insert(make_pair(CL_PROGRAM_DEVICES, (void *)(new Info_Result<cl_device_id>(device)))); sz = (strlen(expect_source) + 1); maps.insert(make_pair(CL_PROGRAM_SOURCE, (void *)(new Info_Result<char *>(expect_source, sz)))); expect_value = NO_STANDARD_REF; maps.insert(make_pair(CL_PROGRAM_BINARY_SIZES, (void *)(new Info_Result<size_t>((size_t)expect_value)))); sz = 8192; //big enough? expect_source = NULL; maps.insert(make_pair(CL_PROGRAM_BINARIES, (void *)(new Info_Result<char **>(&expect_source, &sz, 1)))); for (map<cl_program_info, void *>::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_PROGRAM_REFERENCE_COUNT: case CL_PROGRAM_NUM_DEVICES: CALL_PROGINFO_AND_RET(cl_uint); break; case CL_PROGRAM_CONTEXT: CALL_PROGINFO_AND_RET(cl_context); break; case CL_PROGRAM_DEVICES: CALL_PROGINFO_AND_RET(cl_device_id); break; case CL_PROGRAM_SOURCE: CALL_PROGINFO_AND_RET(char *); break; case CL_PROGRAM_BINARY_SIZES: CALL_PROGINFO_AND_RET(size_t); break; case CL_PROGRAM_BINARIES: CALL_PROGINFO_AND_RET(char **); break; default: break; } } } MAKE_UTEST_FROM_FUNCTION(get_program_info); /* ***************************************************** * * clGetCommandQueueInfo * * ***************************************************** */ #define CALL_QUEUEINFO_AND_RET(TYPE) CALL_INFO_AND_RET(TYPE, clGetCommandQueueInfo, queue) void get_queue_info(void) { /* use the compiler_fabs case to test us.
*/ const size_t n = 16; map<cl_command_queue_info, void *> maps; int expect_ref; cl_command_queue_properties prop; OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_CREATE_KERNEL("compiler_fabs"); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) ((float*)buf_data[0])[i] = .1f * (rand() & 15) - .75f; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); /* Do our test.*/ maps.insert(make_pair(CL_QUEUE_CONTEXT, (void *)(new Info_Result<cl_context>(ctx)))); maps.insert(make_pair(CL_QUEUE_DEVICE, (void *)(new Info_Result<cl_device_id>(device)))); expect_ref = 1; maps.insert(make_pair(CL_QUEUE_REFERENCE_COUNT, (void *)(new Info_Result<>(((cl_uint)expect_ref))))); prop = 0; maps.insert(make_pair(CL_QUEUE_PROPERTIES, (void *)(new Info_Result<cl_command_queue_properties>( ((cl_command_queue_properties)prop))))); for (map<cl_command_queue_info, void *>::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_QUEUE_CONTEXT: CALL_QUEUEINFO_AND_RET(cl_context); break; case CL_QUEUE_DEVICE: CALL_QUEUEINFO_AND_RET(cl_device_id); break; case CL_QUEUE_REFERENCE_COUNT: CALL_QUEUEINFO_AND_RET(cl_uint); break; case CL_QUEUE_PROPERTIES: CALL_QUEUEINFO_AND_RET(cl_command_queue_properties); break; default: break; } } } MAKE_UTEST_FROM_FUNCTION(get_queue_info); /* ***************************************************** * * clGetProgramBuildInfo * * ***************************************************** */ #define CALL_PROG_BUILD_INFO_AND_RET(TYPE) CALL_INFO_AND_RET(TYPE, \ clGetProgramBuildInfo, program, device) void get_program_build_info(void) { map<cl_program_build_info, void *> maps; cl_build_status expect_status; char build_opt[] = "-emit-llvm"; char log[] = ""; int sz; OCL_CALL (cl_kernel_init, "compiler_if_else.cl", "compiler_if_else", SOURCE, build_opt); /* Do our test.*/ expect_status = CL_BUILD_SUCCESS; maps.insert(make_pair(CL_PROGRAM_BUILD_STATUS, (void *)(new Info_Result<cl_build_status>(expect_status)))); sz = strlen(build_opt) + 1; maps.insert(make_pair(CL_PROGRAM_BUILD_OPTIONS, (void *)(new Info_Result<char *>(build_opt, sz)))); sz = strlen(log) + 1; maps.insert(make_pair(CL_PROGRAM_BUILD_LOG, /* not supported now, just "" */ (void *)(new Info_Result<char *>(log, sz)))); for (map<cl_program_build_info, void *>::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_PROGRAM_BUILD_STATUS: CALL_PROG_BUILD_INFO_AND_RET(cl_build_status); break; case CL_PROGRAM_BUILD_OPTIONS: CALL_PROG_BUILD_INFO_AND_RET(char *); break; case CL_PROGRAM_BUILD_LOG: CALL_PROG_BUILD_INFO_AND_RET(char *); break; default: break; } } } MAKE_UTEST_FROM_FUNCTION(get_program_build_info); // This method uses clGetProgramBuildInfo to check the llvm dump build options sent // and verifies that the llvm dump file is actually generated in the backend.
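// For reference, the option exercised below can also be fed straight through the
// public API; a rough standalone equivalent of what cl_kernel_init does here
// (error handling omitted, illustrative only) would be:
//   clBuildProgram(program, 1, &device, "-dump-opt-llvm=test_llvm_dump.txt", NULL, NULL);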
void get_build_llvm_info(void) { map<cl_program_build_info, void *> maps; cl_build_status expect_status; char llvm_file[] = "test_llvm_dump.txt"; char build_opt[] = "-dump-opt-llvm=test_llvm_dump.txt"; FILE *fp = NULL; int sz; //Remove any pre-existing file if( (fp = fopen(llvm_file, "r")) != NULL) { fclose(fp); std::remove(llvm_file); } OCL_CALL (cl_kernel_init, "compiler_if_else.cl", "compiler_if_else", SOURCE, build_opt); /* Do our test.*/ expect_status = CL_BUILD_SUCCESS; maps.insert(make_pair(CL_PROGRAM_BUILD_STATUS, (void *)(new Info_Result<cl_build_status>(expect_status)))); sz = strlen(build_opt) + 1; maps.insert(make_pair(CL_PROGRAM_BUILD_OPTIONS, (void *)(new Info_Result<char *>(build_opt, sz)))); for (map<cl_program_build_info, void *>::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_PROGRAM_BUILD_STATUS: CALL_PROG_BUILD_INFO_AND_RET(cl_build_status); break; case CL_PROGRAM_BUILD_OPTIONS: CALL_PROG_BUILD_INFO_AND_RET(char *); break; default: break; } } if (cl_check_beignet()) { //Test is successful if the backend created the file if( (fp = fopen(llvm_file, "r")) == NULL) { std::cout << "LLVM file creation.. FAILED"; OCL_ASSERT(0); } else { fclose(fp); std::cout << "LLVM file created.. SUCCESS"; } } } MAKE_UTEST_FROM_FUNCTION(get_build_llvm_info); // This method uses clGetProgramBuildInfo to check the dump-spir-binary options // and verifies that the spir dump file is actually generated in the backend. void compile_spir_binary(void) { map<cl_program_build_info, void *> maps; cl_build_status expect_status; char spir_file[] = "test_spir_dump.txt"; char compile_opt[] = "-dump-spir-binary=test_spir_dump.txt"; FILE *fp = NULL; int sz; //Remove any pre-existing file if( (fp = fopen(spir_file, "r")) != NULL) { fclose(fp); std::remove(spir_file); } OCL_CALL (cl_kernel_compile, "compiler_ceil.cl", "compiler_ceil", compile_opt); /* Do our test.*/ expect_status = CL_BUILD_SUCCESS; maps.insert(make_pair(CL_PROGRAM_BUILD_STATUS, (void *)(new Info_Result<cl_build_status>(expect_status)))); sz = strlen(compile_opt) + 1; maps.insert(make_pair(CL_PROGRAM_BUILD_OPTIONS, (void *)(new Info_Result<char *>(compile_opt, sz)))); for (map<cl_program_build_info, void *>::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_PROGRAM_BUILD_STATUS: CALL_PROG_BUILD_INFO_AND_RET(cl_build_status); break; case CL_PROGRAM_BUILD_OPTIONS: CALL_PROG_BUILD_INFO_AND_RET(char *); break; default: break; } } if (cl_check_beignet()) { //Test is successful if the backend created the file if( (fp = fopen(spir_file, "r")) == NULL) { std::cout << "SPIR file creation.. FAILED"; OCL_ASSERT(0); } else { fclose(fp); std::cout << "SPIR file created..
SUCCESS"; } } } MAKE_UTEST_FROM_FUNCTION(compile_spir_binary); void build_spir_binary(void) { map maps; cl_build_status expect_status; char spir_file[] = "test_spir_dump.txt"; char build_opt[] = "-dump-spir-binary=test_spir_dump.txt"; FILE *fp = NULL; int sz; //Remove any pre-existing file if( (fp = fopen(spir_file, "r")) != NULL) { fclose(fp); std::remove(spir_file); } OCL_CALL (cl_kernel_init, "compiler_ceil.cl", "compiler_ceil", SOURCE, build_opt); /* Do our test.*/ expect_status = CL_BUILD_SUCCESS; maps.insert(make_pair(CL_PROGRAM_BUILD_STATUS, (void *)(new Info_Result(expect_status)))); sz = strlen(build_opt) + 1; maps.insert(make_pair(CL_PROGRAM_BUILD_OPTIONS, (void *)(new Info_Result(build_opt, sz)))); for (map::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_PROGRAM_BUILD_STATUS: CALL_PROG_BUILD_INFO_AND_RET(cl_build_status); break; case CL_PROGRAM_BUILD_OPTIONS: CALL_PROG_BUILD_INFO_AND_RET(char *); break; default: break; } } if (cl_check_beignet()) { //Test is successful if the backend created the file if( (fp = fopen(spir_file, "r")) == NULL) { std::cout << "SPIR file creation.. FAILED"; OCL_ASSERT(0); } else { fclose(fp); std::cout << "SPIR file created.. SUCCESS"; } } } MAKE_UTEST_FROM_FUNCTION(build_spir_binary); // This method uses clGetProgramBuildInfo to check the asm dump build options sent // And verifies that the asm dump file is actually generated in the backend. void get_build_asm_info(void) { map maps; cl_build_status expect_status; char asm_file[] = "test_asm_dump.txt"; char build_opt[] ="-dump-opt-asm=test_asm_dump.txt"; FILE *fp = NULL; int sz; //Remove any pre-existing file if( (fp = fopen(asm_file, "r")) != NULL) { fclose(fp); std::remove(asm_file); } OCL_CALL (cl_kernel_init, "compiler_if_else.cl", "compiler_if_else", SOURCE, build_opt); /* Do our test.*/ expect_status = CL_BUILD_SUCCESS; maps.insert(make_pair(CL_PROGRAM_BUILD_STATUS, (void *)(new Info_Result(expect_status)))); sz = strlen(build_opt) + 1; maps.insert(make_pair(CL_PROGRAM_BUILD_OPTIONS, (void *)(new Info_Result(build_opt, sz)))); for (map::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_PROGRAM_BUILD_STATUS: CALL_PROG_BUILD_INFO_AND_RET(cl_build_status); break; case CL_PROGRAM_BUILD_OPTIONS: CALL_PROG_BUILD_INFO_AND_RET(char *); break; default: break; } } if (cl_check_beignet()) { //Test is successful if the backend created the file if( (fp = fopen(asm_file, "r")) == NULL) { std::cout << "ASM file creation.. FAILED"; OCL_ASSERT(0); } else { fclose(fp); std::cout << "ASM file created.. 
SUCCESS"; } } } MAKE_UTEST_FROM_FUNCTION(get_build_asm_info); void get_compile_llvm_info(void) { map maps; cl_build_status expect_status; char llvm_file[] = "test_llvm_dump.txt"; char compile_opt[] = "-dump-opt-llvm=test_llvm_dump.txt"; FILE *fp = NULL; //Remove any pre-existing file if( (fp = fopen(llvm_file, "r")) != NULL) { fclose(fp); std::remove(llvm_file); } OCL_CALL (cl_kernel_compile, "compiler_if_else.cl", "compiler_if_else", compile_opt); /* Do our test.*/ expect_status = CL_BUILD_SUCCESS; maps.insert(make_pair(CL_PROGRAM_BUILD_STATUS, (void *)(new Info_Result(expect_status)))); for (map::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_PROGRAM_BUILD_STATUS: CALL_PROG_BUILD_INFO_AND_RET(cl_build_status); break; case CL_PROGRAM_BUILD_OPTIONS: CALL_PROG_BUILD_INFO_AND_RET(char *); break; default: break; } } if (cl_check_beignet()) { //Test is successful if the backend created the file if( (fp = fopen(llvm_file, "r")) == NULL) { std::cout << "LLVM file creation.. FAILED"; OCL_ASSERT(0); } else { fclose(fp); std::cout << "LLVM file created.. SUCCESS"; } } } MAKE_UTEST_FROM_FUNCTION(get_compile_llvm_info); void get_link_asm_info(void) { map maps; cl_build_status expect_status; char asm_file[] = "test_asm_dump.txt"; char link_opt[] = "-dump-opt-asm=test_asm_dump.txt"; FILE *fp = NULL; //Remove any pre-existing file if( (fp = fopen(asm_file, "r")) != NULL) { fclose(fp); std::remove(asm_file); } OCL_CALL (cl_kernel_link, "compiler_if_else.cl", "compiler_if_else", link_opt); /* Do our test.*/ expect_status = CL_BUILD_SUCCESS; maps.insert(make_pair(CL_PROGRAM_BUILD_STATUS, (void *)(new Info_Result(expect_status)))); for (map::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_PROGRAM_BUILD_STATUS: CALL_PROG_BUILD_INFO_AND_RET(cl_build_status); break; case CL_PROGRAM_BUILD_OPTIONS: CALL_PROG_BUILD_INFO_AND_RET(char *); break; default: break; } } if (cl_check_beignet()) { //Test is successful if the backend created the file if( (fp = fopen(asm_file, "r")) == NULL) { std::cout << "ASM file creation.. FAILED"; OCL_ASSERT(0); } else { fclose(fp); std::cout << "ASM file created.. SUCCESS"; } } } MAKE_UTEST_FROM_FUNCTION(get_link_asm_info); /* ***************************************************** * * clGetContextInfo * * ***************************************************** */ #define CALL_CONTEXTINFO_AND_RET(TYPE) CALL_INFO_AND_RET(TYPE, clGetContextInfo, ctx) void get_context_info(void) { /* use the compiler_fabs case to test us. 
*/ const size_t n = 16; map<cl_context_info, void *> maps; int expect_ref; OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_CREATE_KERNEL("compiler_fabs"); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) ((float*)buf_data[0])[i] = .1f * (rand() & 15) - .75f; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); /* Do our test.*/ expect_ref = 1; maps.insert(make_pair(CL_CONTEXT_NUM_DEVICES, (void *)(new Info_Result<cl_uint>(expect_ref)))); maps.insert(make_pair(CL_CONTEXT_DEVICES, (void *)(new Info_Result<cl_device_id>(device)))); // reference count seems to depend on the implementation expect_ref = NO_STANDARD_REF; maps.insert(make_pair(CL_CONTEXT_REFERENCE_COUNT, (void *)(new Info_Result<>(((cl_uint)expect_ref))))); maps.insert(make_pair(CL_CONTEXT_PROPERTIES, (void *)(new Info_Result<char *>( (const char*)NULL, 100*sizeof(cl_context_properties))))); for (map<cl_context_info, void *>::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_CONTEXT_NUM_DEVICES: CALL_CONTEXTINFO_AND_RET(cl_uint); break; case CL_CONTEXT_DEVICES: CALL_CONTEXTINFO_AND_RET(cl_device_id); break; case CL_CONTEXT_REFERENCE_COUNT: CALL_CONTEXTINFO_AND_RET(cl_uint); break; case CL_CONTEXT_PROPERTIES: CALL_CONTEXTINFO_AND_RET(char*); break; default: break; } } } MAKE_UTEST_FROM_FUNCTION(get_context_info); /* ***************************************************** * * clGetKernelInfo * * ***************************************************** */ #define CALL_KERNELINFO_AND_RET(TYPE) CALL_INFO_AND_RET(TYPE, clGetKernelInfo, kernel) void get_kernel_info(void) { /* use the compiler_fabs case to test us. */ const size_t n = 16; map<cl_kernel_info, void *> maps; int expect_ref; OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_CREATE_KERNEL("compiler_fabs"); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); // Run the kernel on GPU maps.insert(make_pair(CL_KERNEL_PROGRAM, (void *)(new Info_Result<cl_program>(program)))); maps.insert(make_pair(CL_KERNEL_CONTEXT, (void *)(new Info_Result<cl_context>(ctx)))); // reference count seems to depend on the implementation expect_ref = NO_STANDARD_REF; maps.insert(make_pair(CL_KERNEL_REFERENCE_COUNT, (void *)(new Info_Result<>(((cl_uint)expect_ref))))); expect_ref = 2; maps.insert(make_pair(CL_KERNEL_NUM_ARGS, (void *)(new Info_Result<cl_uint>(expect_ref)))); const char * expected_name = "compiler_fabs"; maps.insert(make_pair(CL_KERNEL_FUNCTION_NAME, (void *)(new Info_Result<char *>(expected_name, strlen(expected_name)+1)))); for (map<cl_kernel_info, void *>::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_KERNEL_PROGRAM: CALL_KERNELINFO_AND_RET(cl_program); break; case CL_KERNEL_CONTEXT: CALL_KERNELINFO_AND_RET(cl_context); break; case CL_KERNEL_REFERENCE_COUNT: CALL_KERNELINFO_AND_RET(cl_uint); break; case CL_KERNEL_NUM_ARGS: CALL_KERNELINFO_AND_RET(cl_uint); break; case CL_KERNEL_FUNCTION_NAME: CALL_KERNELINFO_AND_RET(char*); break; default: break; } } } MAKE_UTEST_FROM_FUNCTION(get_kernel_info); /* ***************************************************** * * clGetImageInfo * * ***************************************************** */ void get_image_info(void) { const size_t w = 512; const size_t h = 512; cl_image_format format; cl_image_desc desc; format.image_channel_order = CL_RGBA; format.image_channel_data_type = CL_UNSIGNED_INT8; desc.image_type = CL_MEM_OBJECT_IMAGE2D; desc.image_width = w; desc.image_height
= h; desc.image_row_pitch = 0; desc.image_slice_pitch = 0; desc.num_mip_levels = 0; desc.num_samples = 0; desc.buffer = NULL; OCL_CREATE_IMAGE(buf[0], 0, &format, &desc, NULL); cl_mem image = buf[0]; cl_image_format ret_format; OCL_CALL(clGetImageInfo, image, CL_IMAGE_FORMAT, sizeof(ret_format), &ret_format, NULL); OCL_ASSERT(format.image_channel_order == ret_format.image_channel_order); OCL_ASSERT(format.image_channel_data_type == ret_format.image_channel_data_type); size_t element_size; OCL_CALL(clGetImageInfo, image, CL_IMAGE_ELEMENT_SIZE, sizeof(element_size), &element_size, NULL); OCL_ASSERT(element_size == 4); size_t row_pitch; OCL_CALL(clGetImageInfo, image, CL_IMAGE_ROW_PITCH, sizeof(row_pitch), &row_pitch, NULL); OCL_ASSERT(row_pitch == 4 * w); size_t slice_pitch; OCL_CALL(clGetImageInfo, image, CL_IMAGE_SLICE_PITCH, sizeof(slice_pitch), &slice_pitch, NULL); OCL_ASSERT(slice_pitch == 0); size_t width; OCL_CALL(clGetImageInfo, image, CL_IMAGE_WIDTH, sizeof(width), &width, NULL); OCL_ASSERT(width == w); size_t height; OCL_CALL(clGetImageInfo, image, CL_IMAGE_HEIGHT, sizeof(height), &height, NULL); OCL_ASSERT(height == h); size_t depth; OCL_CALL(clGetImageInfo, image, CL_IMAGE_DEPTH, sizeof(depth), &depth, NULL); OCL_ASSERT(depth == 0); } MAKE_UTEST_FROM_FUNCTION(get_image_info); /* ***************************************************** * * clGetMemObjectInfo * * ***************************************************** */ #define CALL_GETMEMINFO_AND_RET(TYPE) CALL_INFO_AND_RET(TYPE, clGetMemObjectInfo, (buf[0])) void get_mem_info(void) { map<cl_mem_info, void *> maps; int expect_ref; cl_mem sub_buf; cl_int error; OCL_CREATE_BUFFER(buf[1], 0, 4096, NULL); cl_buffer_region region; region.origin = 1024; region.size = 2048; sub_buf = clCreateSubBuffer(buf[1], 0, CL_BUFFER_CREATE_TYPE_REGION, &region, &error ); buf[0] = sub_buf; OCL_ASSERT(error == CL_SUCCESS); void * map_ptr = clEnqueueMapBuffer(queue, buf[0], 1, CL_MAP_READ, 0, 64, 0, NULL, NULL, NULL); expect_ref = CL_MEM_OBJECT_BUFFER; maps.insert(make_pair(CL_MEM_TYPE, (void *)(new Info_Result<cl_mem_object_type>((cl_mem_object_type)expect_ref)))); expect_ref = 0; maps.insert(make_pair(CL_MEM_FLAGS, (void *)(new Info_Result<cl_mem_flags>(expect_ref)))); expect_ref = 2048; maps.insert(make_pair(CL_MEM_SIZE, (void *)(new Info_Result<size_t>(((size_t)expect_ref))))); expect_ref = 1024; maps.insert(make_pair(CL_MEM_HOST_PTR, (void *)(new Info_Result<size_t>(((size_t)expect_ref))))); expect_ref = 1; maps.insert(make_pair(CL_MEM_MAP_COUNT, (void *)(new Info_Result<cl_uint>(((cl_uint)expect_ref))))); expect_ref = 1; maps.insert(make_pair(CL_MEM_REFERENCE_COUNT, (void *)(new Info_Result<cl_uint>(((cl_uint)expect_ref))))); maps.insert(make_pair(CL_MEM_CONTEXT, (void *)(new Info_Result<cl_context>(((cl_context)ctx))))); maps.insert(make_pair(CL_MEM_ASSOCIATED_MEMOBJECT, (void *)(new Info_Result<cl_mem>(((cl_mem)buf[1]))))); expect_ref = 1024; maps.insert(make_pair(CL_MEM_OFFSET, (void *)(new Info_Result<size_t>(((size_t)expect_ref))))); for (map<cl_mem_info, void *>::iterator x = maps.begin(); x != maps.end(); ++x) { switch (x->first) { case CL_MEM_TYPE: CALL_GETMEMINFO_AND_RET(cl_mem_object_type); break; case CL_MEM_FLAGS: CALL_GETMEMINFO_AND_RET(cl_mem_flags); break; case CL_MEM_SIZE: CALL_GETMEMINFO_AND_RET(size_t); break; case CL_MEM_HOST_PTR: CALL_GETMEMINFO_AND_RET(size_t); break; case CL_MEM_MAP_COUNT: CALL_GETMEMINFO_AND_RET(cl_uint); break; case CL_MEM_REFERENCE_COUNT: CALL_GETMEMINFO_AND_RET(cl_uint); break; case CL_MEM_CONTEXT: CALL_GETMEMINFO_AND_RET(cl_context); break; case CL_MEM_ASSOCIATED_MEMOBJECT: CALL_GETMEMINFO_AND_RET(cl_mem); break; case
CL_MEM_OFFSET: CALL_GETMEMINFO_AND_RET(size_t); break; default: break; } } clEnqueueUnmapMemObject(queue, buf[0], map_ptr, 0, NULL, NULL); } MAKE_UTEST_FROM_FUNCTION(get_mem_info); Beignet-1.3.2-Source/utests/vload_bench.cpp000664 001750 001750 00000006110 13161142102 017746 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include <sys/time.h> #define N_ITERATIONS 10000 #define T uint8_t template static double vload_bench(const char *kernelFunc, uint32_t N, uint32_t offset, bool benchMode) { const size_t n = benchMode ? (512 * 1024) : (8 * 1024); struct timeval start, end; // Setup kernel and buffers std::string kernelName = kernelFunc + std::to_string((long long unsigned int)N); OCL_CALL (cl_kernel_init, "vload_bench.cl", kernelName.c_str(), SOURCE, NULL); //OCL_CREATE_KERNEL("compiler_array"); buf_data[0] = (T*) malloc(sizeof(T) * n); for (uint32_t i = 0; i < n; ++i) ((T*)buf_data[0])[i] = i; //rand() & ((1LL << N) - 1); OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(T), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(uint32_t), &offset); globals[0] = n / ((N + 1) & ~0x1); locals[0] = 256; if (benchMode) gettimeofday(&start, NULL); OCL_NDRANGE(1); if (benchMode) { OCL_FINISH(); gettimeofday(&end, NULL); double elapsed = (end.tv_sec - start.tv_sec) * 1e6 + (end.tv_usec - start.tv_usec); double bandwidth = (globals[0] * (N_ITERATIONS) * sizeof(T) * N) / (elapsed * 1000.); printf("\t%2.1fGB/S\n", bandwidth); return bandwidth; } else { // Check result OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < globals[0]; ++i) { OCL_ASSERT((uint32_t)(((T*)buf_data[0])[i + offset]) == ((uint32_t*)buf_data[1])[i]); } return 0; } } #define VLOAD_TEST(T, kT) \ static void vload_test_ ##kT(void) \ { \ uint8_t vectorSize[] = {2, 3, 4, 8, 16}; \ for(uint32_t i = 0; i < sizeof(vectorSize); i++) { \ for(uint32_t offset = 0; offset < vectorSize[i]; offset++) {\ (void)vload_bench("vload_bench_1" #kT, vectorSize[i], offset, false); \ }\ } \ }\ MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(vload_test_ ##kT, true) #ifndef BUILD_BENCHMARK VLOAD_TEST(uint8_t, uchar) VLOAD_TEST(int8_t, char) VLOAD_TEST(uint16_t, ushort) VLOAD_TEST(int16_t, short) VLOAD_TEST(uint32_t, uint) VLOAD_TEST(int32_t, int) VLOAD_TEST(float, float) #endif #define VLOAD_BENCH(T, kT) \ static double vload_bench_ ##kT(void) \ { \ uint8_t vectorSize[] = {2, 3, 4, 8, 16}; \ double totBandwidth = 0; \ unsigned int j = 0;\ printf("\n");\ for(uint32_t i = 0; i < sizeof(vectorSize); i++, j++) { \ printf(" Vector size %d:\n", vectorSize[i]); \ uint32_t k = 0;\ double bandwidthForOneSize = 0;\ for(uint32_t offset = 0; offset < vectorSize[i]; offset++, k++) {\ printf("\tOffset %d :", offset); \ bandwidthForOneSize += vload_bench("vload_bench_10000" #kT, vectorSize[i], offset, true); \ }\ totBandwidth += bandwidthForOneSize / k;\ } \ return totBandwidth/j;\ }\ MAKE_BENCHMARK_FROM_FUNCTION_KEEP_PROGRAM(vload_bench_ ##kT, true, "GB/S") #ifdef BUILD_BENCHMARK VLOAD_BENCH(uint8_t, uchar) VLOAD_BENCH(uint16_t, ushort) VLOAD_BENCH(uint32_t, uint) #endif Beignet-1.3.2-Source/utests/compiler_unstructured_branch1.cpp000664 001750 001750 00000003073 13161142102 023544 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_unstructured_branch1(void) { const size_t n = 16; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_unstructured_branch1");
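// The three phases below drive the same kernel with different inputs: all 2s
// (every work item takes one branch, output 2), all -2s (every item takes the
// other branch, output 3), and a half/half mix that forces divergent,
// unstructured control flow inside a single SIMD thread.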
buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t) * n); for (uint32_t i = 0; i < n; ++i) ((uint32_t*)buf_data[0])[i] = 2; OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; OCL_NDRANGE(1); // First control flow OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 2); // Second control flow for (uint32_t i = 0; i < n; ++i) ((int32_t*)buf_data[0])[i] = -2; OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((uint32_t*)buf_data[1])[i] == 3); // Third control flow for (uint32_t i = 0; i < 8; ++i) ((int32_t*)buf_data[0])[i] = 2; for (uint32_t i = 8; i < n; ++i) ((int32_t*)buf_data[0])[i] = -2; OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 8; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 2); for (uint32_t i = 8; i < n; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 3); } MAKE_UTEST_FROM_FUNCTION(compiler_unstructured_branch1); Beignet-1.3.2-Source/utests/builtin_shuffle2.cpp000664 001750 001750 00000002361 13161142102 020750 0ustar00yryr000000 000000 #include "utest_helper.hpp" void builtin_shuffle2(void) { const int n = 32; // Setup kernel and buffers OCL_CREATE_KERNEL("builtin_shuffle2"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[3], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int i = 0; i < n; i ++) { ((float *)(buf_data[0]))[i] = (rand() & 15) * 0.1f; ((float *)(buf_data[1]))[i] = (rand() & 15) * 0.1f; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); OCL_MAP_BUFFER(3); for (int i = 0; i < n; i ++) { OCL_ASSERT(2 * ((float *)(buf_data[0]))[i] == ((float *)(buf_data[3]))[i]); OCL_ASSERT(2 * ((float *)(buf_data[1]))[i] == ((float *)(buf_data[2]))[i]); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); OCL_UNMAP_BUFFER(3); } MAKE_UTEST_FROM_FUNCTION(builtin_shuffle2); Beignet-1.3.2-Source/utests/compiler_box_blur_image.cpp000664 001750 001750 00000002511 13161142102 022351 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_box_blur_image() { int w, h; cl_image_format format = { }; cl_image_desc desc = { }; size_t origin[3] = { }; size_t region[3]; int *src, *dst; OCL_CREATE_KERNEL("compiler_box_blur_image"); /* Load the picture */ src = cl_read_bmp("sample.bmp", &w, &h); format.image_channel_order = CL_RGBA; format.image_channel_data_type = CL_UNSIGNED_INT8; desc.image_type = CL_MEM_OBJECT_IMAGE2D; desc.image_width = w; desc.image_height = h; desc.image_depth = 1; desc.image_row_pitch = w*sizeof(uint32_t); /* Run the kernel */ OCL_CREATE_IMAGE(buf[0], CL_MEM_COPY_HOST_PTR, &format, &desc, src); free(src); desc.image_row_pitch = 0; OCL_CREATE_IMAGE(buf[1], 0, &format, &desc, NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), 
&buf[1]); globals[0] = w; globals[1] = h; locals[0] = 16; locals[1] = 16; OCL_NDRANGE(2); dst = (int*)malloc(w*h*sizeof(uint32_t)); region[0] = w; region[1] = h; region[2] = 1; OCL_READ_IMAGE(buf[1], origin, region, dst); /* Save the image (for debug purpose) */ cl_write_bmp(dst, w, h, "compiler_box_blur_image.bmp"); /* Compare with the golden image */ OCL_CHECK_IMAGE(dst, w, h, "compiler_box_blur_ref.bmp"); free(dst); } MAKE_UTEST_FROM_FUNCTION(compiler_box_blur_image); Beignet-1.3.2-Source/utests/compiler_subgroup_scan_inclusive.cpp000664 001750 001750 00000041204 13161142102 024330 0ustar00yryr000000 000000 #include <cstring> #include <iostream> #include <iomanip> #include <cmath> #include <algorithm> #include <cstdlib> #include <ctime> #include <limits> #include "utest_helper.hpp" using namespace std; /* set to 1 for debug, output of input-expected data */ #define DEBUG_STDOUT 0 /* NDRANGE */ #define WG_GLOBAL_SIZE 30 #define WG_LOCAL_SIZE 30 enum WG_FUNCTION { WG_SCAN_INCLUSIVE_ADD, WG_SCAN_INCLUSIVE_MAX, WG_SCAN_INCLUSIVE_MIN }; /* * Generic compute-expected function for op SCAN INCLUSIVE type * and any variable type */ template <class T> static void compute_expected(WG_FUNCTION wg_func, T* input, T* expected, size_t SIMD_SIZE, bool IS_HALF) { if(wg_func == WG_SCAN_INCLUSIVE_ADD) { expected[0] = input[0]; for(uint32_t i = 1; i < SIMD_SIZE; i++) { if (IS_HALF) expected[i] = __float_to_half(as_uint(as_float(__half_to_float(input[i])) + as_float(__half_to_float(expected[i - 1])))); else expected[i] = input[i] + expected[i - 1]; } } else if(wg_func == WG_SCAN_INCLUSIVE_MAX) { expected[0] = input[0]; for(uint32_t i = 1; i < SIMD_SIZE; i++) { if (IS_HALF) expected[i] = (as_float(__half_to_float(input[i])) > as_float(__half_to_float(expected[i - 1]))) ? input[i] : expected[i - 1]; else expected[i] = max(input[i], expected[i - 1]); } } else if(wg_func == WG_SCAN_INCLUSIVE_MIN) { expected[0] = input[0]; for(uint32_t i = 1; i < SIMD_SIZE; i++) { if (IS_HALF) expected[i] = (as_float(__half_to_float(input[i])) < as_float(__half_to_float(expected[i - 1]))) ? input[i] : expected[i - 1]; else expected[i] = min(input[i], expected[i - 1]); } } } /* * Generic input-expected generate function for op SCAN INCLUSIVE type * and any variable type */ template <class T> static void generate_data(WG_FUNCTION wg_func, T* &input, T* &expected, size_t SIMD_SIZE, bool IS_HALF) { input = new T[WG_GLOBAL_SIZE]; expected = new T[WG_GLOBAL_SIZE]; /* base value for all data types */ T base_val = (long)7 << (sizeof(T) * 5 - 3); /* seed for random inputs */ srand (time(NULL)); /* generate inputs and expected values */ for(uint32_t gid = 0; gid < WG_GLOBAL_SIZE; gid += SIMD_SIZE) { #if DEBUG_STDOUT cout << endl << "IN: " << endl; #endif SIMD_SIZE = (gid + SIMD_SIZE) > WG_GLOBAL_SIZE ?
WG_GLOBAL_SIZE - gid : SIMD_SIZE; /* input values */ for(uint32_t lid = 0; lid < SIMD_SIZE; lid++) { /* initially 0, augment after */ input[gid + lid] = 0; /* check all data types, test ideal for QWORD types */ input[gid + lid] += ((rand() % 2 - 1) * base_val); /* add trailing random bits, tests GENERAL cases */ input[gid + lid] += (rand() % 112); if (IS_HALF) input[gid + lid] = __float_to_half(as_uint((float)input[gid + lid]/2)); #if DEBUG_STDOUT /* output generated input */ cout << setw(4) << input[gid + lid] << ", " ; if((lid + 1) % 8 == 0) cout << endl; #endif } /* expected values */ compute_expected(wg_func, input + gid, expected + gid, SIMD_SIZE, IS_HALF); #if DEBUG_STDOUT /* output expected input */ cout << endl << "EXP: " << endl; for(uint32_t lid = 0; lid < SIMD_SIZE; lid++) { cout << setw(4) << expected[gid + lid] << ", " ; if((lid + 1) % 8 == 0) cout << endl; } cout << endl; #endif } } /* * Generic subgroup utest function for op SCAN INCLUSIVE type * and any variable type */ template <class T> static void subgroup_generic(WG_FUNCTION wg_func, T* input, T* expected, bool IS_HALF = false) { /* get simd size */ globals[0] = WG_GLOBAL_SIZE; locals[0] = WG_LOCAL_SIZE; size_t SIMD_SIZE = 0; OCL_CALL(utestclGetKernelSubGroupInfoKHR,kernel,device,CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR,sizeof(size_t)*1,locals,sizeof(size_t),&SIMD_SIZE,NULL); /* input and expected data */ generate_data(wg_func, input, expected, SIMD_SIZE, IS_HALF); /* prepare input for data type */ OCL_CREATE_BUFFER(buf[0], 0, WG_GLOBAL_SIZE * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[1], 0, WG_GLOBAL_SIZE * sizeof(T), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); /* set input data for GPU */ OCL_MAP_BUFFER(0); memcpy(buf_data[0], input, WG_GLOBAL_SIZE * sizeof(T)); OCL_UNMAP_BUFFER(0); /* run the kernel on GPU */ OCL_NDRANGE(1); /* check if mismatch */ OCL_MAP_BUFFER(1); uint32_t mismatches = 0; for (uint32_t i = 0; i < WG_GLOBAL_SIZE; i++) if(((T *)buf_data[1])[i] != *(expected + i)) { if (IS_HALF) { float num_computed = as_float(__half_to_float(((T *)buf_data[1])[i])); float num_expected = as_float(__half_to_float(*(expected + i))); float num_diff = abs(num_computed - num_expected) / abs(num_expected); if (num_diff > 0.03f) { mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << num_computed << " != " << num_expected << " diff: " << num_diff << endl; #endif } } else if (std::numeric_limits<T>::is_integer) { mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << ((T *)buf_data[1])[i] << " != " << *(expected + i) << endl; #endif } /* float error is tolerable though */ else { float num_computed = ((T *)buf_data[1])[i]; float num_expected = *(expected + i); float num_diff = abs(num_computed - num_expected) / abs(num_expected); if(num_diff > 0.01f){ mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << ((T *)buf_data[1])[i] << " != " << *(expected + i) << endl; #endif } } } #if DEBUG_STDOUT /* output mismatch count */ cout << "mismatches " << mismatches << endl; #endif OCL_UNMAP_BUFFER(1); OCL_ASSERT(mismatches == 0); } /* * Workgroup scan_inclusive add utest functions */ void compiler_subgroup_scan_inclusive_add_int(void) { if(!cl_check_subgroups()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_add_int"); subgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_add_int); void
compiler_subgroup_scan_inclusive_add_uint(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_add_uint"); subgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_add_uint); void compiler_subgroup_scan_inclusive_add_long(void) { if(!cl_check_subgroups()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_add_long"); subgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_scan_inclusive_add_long); void compiler_subgroup_scan_inclusive_add_ulong(void) { if(!cl_check_subgroups()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_add_ulong"); subgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_scan_inclusive_add_ulong); void compiler_subgroup_scan_inclusive_add_float(void) { if(!cl_check_subgroups()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_add_float"); subgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_add_float); void compiler_subgroup_scan_inclusive_add_half(void) { if(!cl_check_subgroups()) return; if(!cl_check_half()) return; cl_half *input = NULL; cl_half *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_scan_inclusive.cl", "compiler_subgroup_scan_inclusive_add_half", SOURCE, "-DHALF"); subgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected, true); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_add_half); void compiler_subgroup_scan_inclusive_add_short(void) { if(!cl_check_subgroups()) return; cl_short *input = NULL; cl_short *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_add_short"); subgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_add_short); void compiler_subgroup_scan_inclusive_add_ushort(void) { if(!cl_check_subgroups()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_add_ushort"); subgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_add_ushort); /* * Workgroup scan_inclusive max utest functions */ void compiler_subgroup_scan_inclusive_max_int(void) { if(!cl_check_subgroups()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_max_int"); subgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_max_int); void compiler_subgroup_scan_inclusive_max_uint(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_max_uint"); subgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_max_uint); void 
compiler_subgroup_scan_inclusive_max_long(void) { if(!cl_check_subgroups()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_max_long"); subgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_scan_inclusive_max_long); void compiler_subgroup_scan_inclusive_max_ulong(void) { if(!cl_check_subgroups()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_max_ulong"); subgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_scan_inclusive_max_ulong); void compiler_subgroup_scan_inclusive_max_float(void) { if(!cl_check_subgroups()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_max_float"); subgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_max_float); void compiler_subgroup_scan_inclusive_max_half(void) { if(!cl_check_subgroups()) return; if(!cl_check_half()) return; cl_half *input = NULL; cl_half *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_scan_inclusive.cl", "compiler_subgroup_scan_inclusive_max_half", SOURCE, "-DHALF"); subgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected, true); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_max_half); void compiler_subgroup_scan_inclusive_max_short(void) { if(!cl_check_subgroups()) return; cl_short *input = NULL; cl_short *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_max_short"); subgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_max_short); void compiler_subgroup_scan_inclusive_max_ushort(void) { if(!cl_check_subgroups()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_max_ushort"); subgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_max_ushort); /* * Workgroup scan_inclusive min utest functions */ void compiler_subgroup_scan_inclusive_min_int(void) { if(!cl_check_subgroups()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_min_int"); subgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_min_int); void compiler_subgroup_scan_inclusive_min_uint(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_min_uint"); subgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_min_uint); void compiler_subgroup_scan_inclusive_min_long(void) { if(!cl_check_subgroups()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_min_long"); subgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_scan_inclusive_min_long); void 
compiler_subgroup_scan_inclusive_min_ulong(void) { if(!cl_check_subgroups()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_min_ulong"); subgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_scan_inclusive_min_ulong); void compiler_subgroup_scan_inclusive_min_float(void) { if(!cl_check_subgroups()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_min_float"); subgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_min_float); void compiler_subgroup_scan_inclusive_min_half(void) { if(!cl_check_subgroups()) return; if(!cl_check_half()) return; cl_half *input = NULL; cl_half *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_scan_inclusive.cl", "compiler_subgroup_scan_inclusive_min_half", SOURCE, "-DHALF"); subgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected, true); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_min_half); void compiler_subgroup_scan_inclusive_min_short(void) { if(!cl_check_subgroups()) return; cl_short *input = NULL; cl_short *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_min_short"); subgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_min_short); void compiler_subgroup_scan_inclusive_min_ushort(void) { if(!cl_check_subgroups()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_scan_inclusive", "compiler_subgroup_scan_inclusive_min_ushort"); subgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_scan_inclusive_min_ushort); Beignet-1.3.2-Source/utests/compiler_upsample_long.cpp000664 001750 001750 00000002002 13161142102 022233 0ustar00yryr000000 000000 #include <stdint.h> #include "utest_helper.hpp" void compiler_upsample_long(void) { const int n = 32; int src1[n]; unsigned int src2[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_upsample_long"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(unsigned int), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int64_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int i = 0; i < n; ++i) { src1[i] = ((int*)buf_data[0])[i] = rand(); src2[i] = ((unsigned int*)buf_data[1])[i] = rand(); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(2); for (int i = 0; i < n; ++i) OCL_ASSERT(((int64_t*)buf_data[2])[i] == (((int64_t)(src1[i]) << 32) | src2[i])); OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_upsample_long); Beignet-1.3.2-Source/utests/runtime_event.cpp000664 001750 001750 00000003242 13161142102 020367 0ustar00yryr000000 000000 #include "utest_helper.hpp" #define BUFFERSIZE 32*1024 void runtime_event(void) { const size_t n = BUFFERSIZE; cl_int cpu_src[BUFFERSIZE]; cl_event ev[3]; cl_int status = 0; cl_int value = 34; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_event"); OCL_CREATE_BUFFER(buf[0], 0, BUFFERSIZE*sizeof(int), NULL); for(cl_uint i=0; i= CL_SUBMITTED); } buf_data[0] =
clEnqueueMapBuffer(queue, buf[0], CL_FALSE, 0, 0, BUFFERSIZE*sizeof(int), 1, &ev[2], NULL, NULL); OCL_SET_USER_EVENT_STATUS(ev[0], CL_COMPLETE); clGetEventInfo(ev[0], CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(status), &status, NULL); OCL_ASSERT(status == CL_COMPLETE); OCL_FINISH(); for (cl_uint i = 0; i != sizeof(ev) / sizeof(cl_event); ++i) { clGetEventInfo(ev[i], CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(status), &status, NULL); OCL_ASSERT(status <= CL_COMPLETE); } for (uint32_t i = 0; i < n; ++i) { OCL_ASSERT(((int*)buf_data[0])[i] == (int)value + 0x3); } for (cl_uint i = 0; i != sizeof(ev) / sizeof(cl_event); ++i) { clReleaseEvent(ev[i]); } } MAKE_UTEST_FROM_FUNCTION(runtime_event); Beignet-1.3.2-Source/utests/utest_helper.cpp000664 001750 001750 00000100366 13173554000 020222 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ #include "utest_file_map.hpp" #include "utest_helper.hpp" #include "utest_error.h" #include "CL/cl.h" #include "CL/cl_intel.h" #include <stdio.h> #include <stdlib.h> #include <string.h> #include <assert.h> #include <cmath> #include <algorithm> #define FATAL(...) \ do { \ fprintf(stderr, "error: "); \ fprintf(stderr, __VA_ARGS__); \ fprintf(stderr, "\n");\ assert(0); \ exit(-1); \ } while (0) #define FATAL_IF(COND, ...)
\ do { \ if (COND) FATAL(__VA_ARGS__); \ } while (0) cl_platform_id platform = NULL; cl_device_id device = NULL; cl_context ctx = NULL; __thread cl_program program = NULL; __thread cl_kernel kernel = NULL; cl_command_queue queue = NULL; __thread cl_mem buf[MAX_BUFFER_N] = {}; __thread void *buf_data[MAX_BUFFER_N] = {}; __thread size_t globals[3] = {}; __thread size_t locals[3] = {}; float ULPSIZE_FAST_MATH = 10000.; __attribute__ ((visibility ("internal"))) clGetKernelSubGroupInfoKHR_cb* utestclGetKernelSubGroupInfoKHR = NULL; #ifdef HAS_GL_EGL_X11 Display *xDisplay; EGLDisplay eglDisplay; EGLContext eglContext = NULL; EGLSurface eglSurface; Window xWindow; void cl_ocl_destroy_egl_window() { eglMakeCurrent(eglDisplay, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT); eglDestroyContext(eglDisplay, eglContext); eglDestroySurface(eglDisplay, eglSurface); XDestroyWindow(xDisplay, xWindow); XCloseDisplay(xDisplay); } bool init_egl_window(int width, int height) { XSetWindowAttributes swa; Window win, root; EGLint attr[] = { // some attributes to set up our egl-interface EGL_BUFFER_SIZE, 16, EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT, EGL_NONE }; //// egl-contexts collect all state descriptions needed required for operation EGLint ctxattr[] = { #if 0 EGL_CONTEXT_CLIENT_VERSION, 2, #endif EGL_NONE }; EGLConfig ecfg; EGLint numConfig; eglContext = EGL_NO_CONTEXT; xDisplay = XOpenDisplay(NULL); if (xDisplay == NULL) { fprintf(stderr, "Failed to open DISPLAY.\n"); return false; } root = DefaultRootWindow(xDisplay); swa.event_mask = ExposureMask | PointerMotionMask | KeyPressMask; win = XCreateWindow( xDisplay, root, 0, 0, width, height, 0, CopyFromParent, InputOutput, CopyFromParent, CWEventMask, &swa); xWindow = win; /////// the egl part ////////////////////////////////////////////////////////////////// // egl provides an interface to connect the graphics related functionality of openGL ES // with the windowing interface and functionality of the native operation system (X11 // in our case. 
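/* init_egl_window() below follows the standard EGL bring-up order: get a
 * display, bind the desired client API, initialize, choose a config, create a
 * window surface and a context, and finally eglMakeCurrent() before issuing
 * any GL calls. */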
eglDisplay = eglGetDisplay( (EGLNativeDisplayType) xDisplay ); if ( eglDisplay == EGL_NO_DISPLAY ) { fprintf(stderr, "Got no EGL display.\n"); return false; } eglBindAPI(EGL_OPENGL_API); int m,n; if ( !eglInitialize( eglDisplay, &m, &n ) ) { fprintf(stderr, "Unable to initialize EGL\n"); return false; } if ( !eglChooseConfig( eglDisplay, attr, &ecfg, 1, &numConfig ) ) { fprintf(stderr, "Failed to choose config (eglError: %d)\n", eglGetError()); return false; } if ( numConfig != 1 ) { fprintf(stderr, "Didn't get exactly one config, but %d", numConfig); return false; } eglSurface = eglCreateWindowSurface ( eglDisplay, ecfg, win, NULL ); if ( eglSurface == EGL_NO_SURFACE ) { fprintf(stderr, "Unable to create EGL surface (eglError: %d)\n", eglGetError()); return false; } eglContext = eglCreateContext ( eglDisplay, ecfg, EGL_NO_CONTEXT, ctxattr ); if ( eglContext == EGL_NO_CONTEXT ) { fprintf(stderr, "Unable to create EGL context (eglError: %d)\n", eglGetError()); return false; } //// associate the egl-context with the egl-surface eglMakeCurrent( eglDisplay, eglSurface, eglSurface, eglContext); glClearColor(1.0, 1.0, 1.0, 1.0); glClear(GL_COLOR_BUFFER_BIT); glFinish(); eglSwapBuffers(eglDisplay, eglSurface); return true; } #endif static const char* cl_test_channel_order_string(cl_channel_order order) { switch(order) { #define DECL_ORDER(WHICH) case CL_##WHICH: return "CL_"#WHICH DECL_ORDER(R); DECL_ORDER(A); DECL_ORDER(RG); DECL_ORDER(RA); DECL_ORDER(RGB); DECL_ORDER(RGBA); DECL_ORDER(BGRA); DECL_ORDER(ARGB); DECL_ORDER(INTENSITY); DECL_ORDER(LUMINANCE); DECL_ORDER(Rx); DECL_ORDER(RGx); DECL_ORDER(RGBx); DECL_ORDER(sRGBA); DECL_ORDER(sBGRA); #undef DECL_ORDER default: return "Unsupported image channel order"; }; } static const char* cl_test_channel_type_string(cl_channel_type type) { switch(type) { #define DECL_TYPE(WHICH) case CL_##WHICH: return "CL_"#WHICH DECL_TYPE(SNORM_INT8); DECL_TYPE(SNORM_INT16); DECL_TYPE(UNORM_INT8); DECL_TYPE(UNORM_INT16); DECL_TYPE(UNORM_SHORT_565); DECL_TYPE(UNORM_SHORT_555); DECL_TYPE(UNORM_INT_101010); DECL_TYPE(SIGNED_INT8); DECL_TYPE(SIGNED_INT16); DECL_TYPE(SIGNED_INT32); DECL_TYPE(UNSIGNED_INT8); DECL_TYPE(UNSIGNED_INT16); DECL_TYPE(UNSIGNED_INT32); DECL_TYPE(HALF_FLOAT); DECL_TYPE(FLOAT); #undef DECL_TYPE default: return "Unsupported image channel type"; }; } static void clpanic(const char *msg, int rval) { printf("Failed: %s (%d)\n", msg, rval); exit(-1); } char* cl_do_kiss_path(const char *file, cl_device_id device) { const char *sub_path = NULL; char *ker_path = NULL; const char *kiss_path = getenv("OCL_KERNEL_PATH"); size_t sz = strlen(file); sub_path = ""; if (kiss_path == NULL) clpanic("set OCL_KERNEL_PATH. 
This is where the kiss kernels are", -1); sz += strlen(kiss_path) + strlen(sub_path) + 2; /* +1 for end of string, +1 for '/' */ if ((ker_path = (char*) malloc(sz)) == NULL) clpanic("Allocation failed", -1); sprintf(ker_path, "%s/%s%s", kiss_path, sub_path, file); return ker_path; } int cl_kernel_init(const char *file_name, const char *kernel_name, int format, const char * build_opt) { cl_file_map_t *fm = NULL; char *ker_path = NULL; cl_int status = CL_SUCCESS; static const char *prevFileName = NULL; /* Load the program and build it */ if (!program || (program && (!prevFileName || strcmp(prevFileName, file_name)))) { if (program) clReleaseProgram(program); ker_path = cl_do_kiss_path(file_name, device); if (format == LLVM) { assert(0); } else if (format == SOURCE) { cl_file_map_t *fm = cl_file_map_new(); if(!fm) { fprintf(stderr, "run out of memory\n"); goto error; } FATAL_IF (cl_file_map_open(fm, ker_path) != CL_FILE_MAP_SUCCESS, "Failed to open file \"%s\" with kernel \"%s\". Did you properly set OCL_KERNEL_PATH variable?", file_name, kernel_name); const char *src = cl_file_map_begin(fm); const size_t sz = cl_file_map_size(fm); program = clCreateProgramWithSource(ctx, 1, &src, &sz, &status); cl_file_map_delete(fm); } else FATAL("Not able to create program from binary"); if (status != CL_SUCCESS) { fprintf(stderr, "error calling clCreateProgramWithBinary\n"); goto error; } prevFileName = file_name; /* OCL requires to build the program even if it is created from a binary */ OCL_CALL (clBuildProgram, program, 1, &device, build_opt, NULL, NULL); } /* Create a kernel from the program */ if (kernel) clReleaseKernel(kernel); kernel = clCreateKernel(program, kernel_name, &status); if (status != CL_SUCCESS) { fprintf(stderr, "error calling clCreateKernel\n"); goto error; } exit: free(ker_path); cl_file_map_delete(fm); return status; error: prevFileName = NULL; goto exit; } int cl_kernel_compile(const char *file_name, const char *kernel_name, const char * compile_opt) { cl_file_map_t *fm = NULL; char *ker_path = NULL; cl_int status = CL_SUCCESS; static const char *prevFileName = NULL; /* Load the program and build it */ if (!program || (program && (!prevFileName || strcmp(prevFileName, file_name)))) { if (program) clReleaseProgram(program); ker_path = cl_do_kiss_path(file_name, device); cl_file_map_t *fm = cl_file_map_new(); if(!fm) { fprintf(stderr, "run out of memory\n"); goto error; } FATAL_IF (cl_file_map_open(fm, ker_path) != CL_FILE_MAP_SUCCESS, "Failed to open file \"%s\" with kernel \"%s\". 
Did you properly set OCL_KERNEL_PATH variable?", file_name, kernel_name); const char *src = cl_file_map_begin(fm); const size_t sz = cl_file_map_size(fm); program = clCreateProgramWithSource(ctx, 1, &src, &sz, &status); cl_file_map_delete(fm); if (status != CL_SUCCESS) { fprintf(stderr, "error calling clCreateProgramWithSource\n"); goto error; } prevFileName = file_name; OCL_CALL (clCompileProgram, program, 1, &device, // num_devices & device_list compile_opt, // compile_options 0, // num_input_headers NULL, NULL, NULL, NULL); OCL_ASSERT(status == CL_SUCCESS); } exit: free(ker_path); cl_file_map_delete(fm); return status; error: prevFileName = NULL; goto exit; } int cl_kernel_link(const char *file_name, const char *kernel_name, const char * link_opt) { cl_file_map_t *fm = NULL; char *ker_path = NULL; cl_int status = CL_SUCCESS; static const char *prevFileName = NULL; /* Load the program and build it */ if (!program || (program && (!prevFileName || strcmp(prevFileName, file_name)))) { if (program) clReleaseProgram(program); ker_path = cl_do_kiss_path(file_name, device); cl_file_map_t *fm = cl_file_map_new(); if(!fm) { fprintf(stderr, "run out of memory\n"); goto error; } FATAL_IF (cl_file_map_open(fm, ker_path) != CL_FILE_MAP_SUCCESS, "Failed to open file \"%s\" with kernel \"%s\". Did you properly set OCL_KERNEL_PATH variable?", file_name, kernel_name); const char *src = cl_file_map_begin(fm); const size_t sz = cl_file_map_size(fm); program = clCreateProgramWithSource(ctx, 1, &src, &sz, &status); cl_file_map_delete(fm); if (status != CL_SUCCESS) { fprintf(stderr, "error calling clCreateProgramWithSource\n"); goto error; } prevFileName = file_name; OCL_CALL (clCompileProgram, program, 1, &device, // num_devices & device_list NULL, // compile_options 0, // num_input_headers NULL, NULL, NULL, NULL); OCL_ASSERT(status==CL_SUCCESS); cl_program input_programs[1] = {program}; program = clLinkProgram(ctx, 1, &device, link_opt, 1, input_programs, NULL, NULL, &status); OCL_ASSERT(program != NULL); OCL_ASSERT(status == CL_SUCCESS); clReleaseProgram(input_programs[0]); } /* Create a kernel from the program */ if (kernel) clReleaseKernel(kernel); kernel = clCreateKernel(program, kernel_name, &status); if (status != CL_SUCCESS) { fprintf(stderr, "error calling clCreateKernel\n"); goto error; } exit: free(ker_path); cl_file_map_delete(fm); return status; error: prevFileName = NULL; goto exit; } #define GET_PLATFORM_STR_INFO(LOWER_NAME, NAME) \ { \ size_t param_value_size; \ OCL_CALL (clGetPlatformInfo, platform, CL_PLATFORM_##NAME, 0, 0, &param_value_size); \ std::vector<char> param_value(param_value_size); \ OCL_CALL (clGetPlatformInfo, platform, CL_PLATFORM_##NAME, \ param_value_size, param_value.empty() ? NULL : &param_value.front(), \ &param_value_size); \ std::string str; \ if (!param_value.empty()) \ str = std::string(&param_value.front(), param_value_size-1); \ printf("platform_" #LOWER_NAME " \"%s\"\n", str.c_str()); \ } #include <string> #define GET_DEVICE_STR_INFO(LOWER_NAME, NAME) \ std::string LOWER_NAME ##Str; \ OCL_CALL (clGetDeviceInfo, device, CL_DEVICE_##NAME, 0, 0, &param_value_size); \ { \ std::vector<char> param_value(param_value_size); \ OCL_CALL (clGetDeviceInfo, device, CL_DEVICE_##NAME, \ param_value_size, param_value.empty() ?
NULL : &param_value.front(), \ &param_value_size); \ if (!param_value.empty()) \ LOWER_NAME ##Str = std::string(&param_value.front(), param_value_size-1); \ } \ printf("device_" #LOWER_NAME " \"%s\"\n", LOWER_NAME ##Str.c_str()); int cl_ocl_init(void) { cl_int status = CL_SUCCESS; cl_uint platform_n; size_t i; #ifdef HAS_GL_EGL_X11 bool hasGLExt = false; #endif cl_context_properties *props = NULL; /* Get the platform number */ OCL_CALL (clGetPlatformIDs, 0, NULL, &platform_n); printf("platform number %u\n", platform_n); assert(platform_n >= 1); /* Get a valid platform */ OCL_CALL (clGetPlatformIDs, 1, &platform, &platform_n); GET_PLATFORM_STR_INFO(profile, PROFILE); GET_PLATFORM_STR_INFO(name, NAME); GET_PLATFORM_STR_INFO(vendor, VENDOR); GET_PLATFORM_STR_INFO(version, VERSION); GET_PLATFORM_STR_INFO(extensions, EXTENSIONS); /* Get the device (only GPU device is supported right now) */ try { OCL_CALL (clGetDeviceIDs, platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL); { size_t param_value_size; GET_DEVICE_STR_INFO(profile, PROFILE); GET_DEVICE_STR_INFO(name, NAME); GET_DEVICE_STR_INFO(vendor, VENDOR); GET_DEVICE_STR_INFO(version, VERSION); GET_DEVICE_STR_INFO(extensions, EXTENSIONS); GET_DEVICE_STR_INFO(opencl_c_version, OPENCL_C_VERSION); #ifdef HAS_GL_EGL_X11 if (std::strstr(extensionsStr.c_str(), "cl_khr_gl_sharing")) { hasGLExt = true; } #endif } } catch (...) { fprintf(stderr, "error calling clGetDeviceIDs\n"); status = CL_DEVICE_NOT_FOUND; goto error; } #ifdef HAS_GL_EGL_X11 if (hasGLExt) { int i = 0; props = new cl_context_properties[7]; props[i++] = CL_CONTEXT_PLATFORM; props[i++] = (cl_context_properties)platform; if (init_egl_window(EGL_WINDOW_WIDTH, EGL_WINDOW_HEIGHT)) { props[i++] = CL_EGL_DISPLAY_KHR; props[i++] = (cl_context_properties)eglGetCurrentDisplay(); props[i++] = CL_GL_CONTEXT_KHR; props[i++] = (cl_context_properties)eglGetCurrentContext(); } props[i++] = 0; } #endif /* Now create a context */ ctx = clCreateContext(props, 1, &device, NULL, NULL, &status); if (status != CL_SUCCESS) { fprintf(stderr, "error calling clCreateContext\n"); goto error; } /* All image types currently supported by the context */ cl_image_format fmt[256]; cl_uint fmt_n; clGetSupportedImageFormats(ctx, 0, CL_MEM_OBJECT_IMAGE2D, 256, fmt, &fmt_n); printf("%u image formats are supported\n", fmt_n); for (i = 0; i < fmt_n; ++i) printf("[%s %s]\n", cl_test_channel_order_string(fmt[i].image_channel_order), cl_test_channel_type_string(fmt[i].image_channel_data_type)); /* We are going to push NDRange kernels here */ queue = clCreateCommandQueue(ctx, device, 0, &status); if (status != CL_SUCCESS) { fprintf(stderr, "error calling clCreateCommandQueue\n"); goto error; } error: if (props) delete[] props; return status; } int cl_test_init(const char *file_name, const char *kernel_name, int format) { cl_int status = CL_SUCCESS; /* Initialize OCL */ if ((status = cl_ocl_init()) != CL_SUCCESS) goto error; /* Load the kernel */ if ((status = cl_kernel_init(file_name, kernel_name, format, NULL)) != CL_SUCCESS) goto error; error: return status; } void cl_kernel_destroy(bool needDestroyProgram) { if (kernel) { clReleaseKernel(kernel); kernel = NULL; } if (needDestroyProgram && program) { clReleaseProgram(program); program = NULL; } } void cl_ocl_destroy(void) { clReleaseCommandQueue(queue); clReleaseContext(ctx); #ifdef HAS_GL_EGL_X11 if (eglContext != NULL) { cl_ocl_destroy_egl_window(); eglContext = NULL; } #endif } void cl_test_destroy(void) { cl_kernel_destroy(); cl_ocl_destroy(); } void cl_buffer_destroy(void) { int i; for
(i = 0; i < MAX_BUFFER_N; ++i) { if (buf_data[i] != NULL) { clEnqueueUnmapMemObject(queue, buf[i], buf_data[i], 0, NULL, NULL); buf_data[i] = NULL; } if (buf[i] != NULL) { clReleaseMemObject(buf[i]); buf[i] = NULL; } } } void cl_report_perf_counters(cl_mem perf) { cl_int status = CL_SUCCESS; uint32_t *start = NULL, *end = NULL; uint32_t i; if (perf == NULL) return; start = (uint32_t*)clEnqueueMapBuffer(queue, perf, CL_TRUE, CL_MAP_READ, 0, 128 * sizeof(uint32_t)/*size*/, 0, NULL, NULL, &status); assert(status == CL_SUCCESS && start != NULL); end = start + 128; printf("BEFORE\n"); for (i = 0; i < 6*8; ++i) { if (i % 8 == 0) printf("\n"); printf("[%3u 0x%8x] ", i, start[i]); } printf("\n\n"); printf("AFTER\n"); for (i = 0; i < 6*8; ++i) { if (i % 8 == 0) printf("\n"); printf("[%3u 0x%8x] ", i, end[i]); } printf("\n\n"); printf("DIFF\n"); for (i = 0; i < 6*8; ++i) { if (i % 8 == 0) printf("\n"); printf("[%3u %8i] ", i, end[i] - start[i]); } printf("\n\n"); clEnqueueUnmapMemObject(queue, perf, start, 0, NULL, NULL); } struct bmphdr { // 2 bytes of magic here, "BM", total header size is 54 bytes! int filesize; // 4 total file size incl header short as0, as1; // 8 app specific int bmpoffset; // 12 ofset of bmp data int headerbytes; // 16 bytes in header from this point (40 actually) int width; // 20 int height; // 24 short nplanes; // 26 no of color planes short bpp; // 28 bits/pixel int compression; // 32 BI_RGB = 0 = no compression int sizeraw; // 36 size of raw bmp file, excluding header, incl padding int hres; // 40 horz resolutions pixels/meter int vres; // 44 int npalcolors; // 48 No of colors in palette int nimportant; // 52 No of important colors // raw b, g, r data here, dword aligned per scan line }; int *cl_read_bmp(const char *filename, int *width, int *height) { struct bmphdr hdr; char *bmppath = cl_do_kiss_path(filename, device); FILE *fp = fopen(bmppath, "rb"); assert(fp); char magic[2]; int ret; ret = fread(&magic[0], 1, 2, fp); if(2 != ret){ fclose(fp); free(bmppath); return NULL; } assert(magic[0] == 'B' && magic[1] == 'M'); ret = fread(&hdr, sizeof(hdr), 1, fp); if(1 != ret){ fclose(fp); free(bmppath); return NULL; } assert(hdr.width > 0 && hdr.height > 0 && hdr.nplanes == 1 && hdr.compression == 0); int *rgb32 = (int *) malloc(hdr.width * hdr.height * sizeof(int)); assert(rgb32); int x, y; int *dst = rgb32; for (y = 0; y < hdr.height; y++) { for (x = 0; x < hdr.width; x++) { assert(!feof(fp)); int b = (getc(fp) & 0x0ff); int g = (getc(fp) & 0x0ff); int r = (getc(fp) & 0x0ff); *dst++ = (r | (g << 8) | (b << 16) | 0xff000000); /* abgr */ } while (x & 3) { getc(fp); x++; } // each scanline padded to dword // printf("read row %d\n", y); // fflush(stdout); } fclose(fp); *width = hdr.width; *height = hdr.height; free(bmppath); return rgb32; } void cl_write_bmp(const int *data, int width, int height, const char *filename) { int x, y; FILE *fp = NULL; #if defined(__ANDROID__) char dst_img[256]; snprintf(dst_img, sizeof(dst_img), "/sdcard/ocl/%s", filename); fp = fopen(dst_img, "wb"); if(fp == NULL) return; #else fp = fopen(filename, "wb"); #endif assert(fp); char *raw = (char *) malloc(width * height * sizeof(int)); // at most assert(raw); char *p = raw; for (y = 0; y < height; y++) { for (x = 0; x < width; x++) { int c = *data++; *p++ = ((c >> 16) & 0xff); *p++ = ((c >> 8) & 0xff); *p++ = ((c >> 0) & 0xff); } while (x & 3) { *p++ = 0; x++; } // pad to dword } int sizeraw = p - raw; int scanline = (width * 3 + 3) & ~3; assert(sizeraw == scanline * height); struct bmphdr hdr; 
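/* Rows of a 24bpp BMP are padded to a 4-byte boundary, hence the
 * (width * 3 + 3) & ~3 scanline computation above: width 3 gives 9 payload
 * bytes padded to a 12-byte scanline, width 4 gives 12 bytes and no padding. */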
hdr.filesize = scanline * height + sizeof(hdr) + 2; hdr.as0 = 0; hdr.as1 = 0; hdr.bmpoffset = sizeof(hdr) + 2; hdr.headerbytes = 40; hdr.width = width; hdr.height = height; hdr.nplanes = 1; hdr.bpp = 24; hdr.compression = 0; hdr.sizeraw = sizeraw; hdr.hres = 0; // 2834; hdr.vres = 0; // 2834; hdr.npalcolors = 0; hdr.nimportant = 0; /* Now write bmp file */ char magic[2] = { 'B', 'M' }; fwrite(&magic[0], 1, 2, fp); fwrite(&hdr, 1, sizeof(hdr), fp); fwrite(raw, 1, hdr.sizeraw, fp); fclose(fp); free(raw); } static const float pixel_threshold = 0.05f; static const float max_error_ratio = 0.001f; int cl_check_image(const int *img, int w, int h, const char *bmp) { int refw, refh; int *ref = cl_read_bmp(bmp, &refw, &refh); if (ref == NULL || refw != w || refh != h) return 0; const int n = w*h; int discrepancy = 0; for (int i = 0; i < n; ++i) { const float r = (float) (img[i] & 0xff); const float g = (float) ((img[i] >> 8) & 0xff); const float b = (float) ((img[i] >> 16) & 0xff); const float rr = (float) (ref[i] & 0xff); const float rg = (float) ((ref[i] >> 8) & 0xff); const float rb = (float) ((ref[i] >> 16) & 0xff); const float dr = fabs(r-rr) / (1.f/255.f + std::max(r,rr)); const float dg = fabs(g-rg) / (1.f/255.f + std::max(g,rg)); const float db = fabs(b-rb) / (1.f/255.f + std::max(b,rb)); const float err = sqrtf(dr*dr+dg*dg+db*db); if (err > pixel_threshold) discrepancy++; } free(ref); return (float(discrepancy) / float(n) > max_error_ratio) ? 0 : 1; } float cl_FLT_ULP(float float_number) { SF floatBin, ulpBin, ulpBinBase; floatBin.f = float_number; ulpBin.spliter.sign = ulpBinBase.spliter.sign = 0; ulpBin.spliter.exponent = ulpBinBase.spliter.exponent = floatBin.spliter.exponent; ulpBin.spliter.mantissa = 0x1; ulpBinBase.spliter.mantissa = 0x0; return ulpBin.f - ulpBinBase.f; } int cl_INT_ULP(int int_number) { return 0; } double time_subtract(struct timeval *y, struct timeval *x, struct timeval *result) { if ( x->tv_sec > y->tv_sec ) return -1; if ((x->tv_sec == y->tv_sec) && (x->tv_usec > y->tv_usec)) return -1; if ( result != NULL){ result->tv_sec = ( y->tv_sec - x->tv_sec ); result->tv_usec = ( y->tv_usec - x->tv_usec ); if (result->tv_usec < 0){ result->tv_sec --; result->tv_usec += 1000000; } } double msec = 1000.0*(y->tv_sec - x->tv_sec) + (y->tv_usec - x->tv_usec)/1000.0; return msec; } float select_ulpsize(float ULPSIZE_FAST_MATH, float ULPSIZE_NO_FAST_MATH) { const char* env_strict = getenv("OCL_STRICT_CONFORMANCE"); float ULPSIZE_FACTOR = ULPSIZE_NO_FAST_MATH; if (env_strict != NULL && strcmp(env_strict, "0") == 0 ) ULPSIZE_FACTOR = ULPSIZE_FAST_MATH; return ULPSIZE_FACTOR; } int cl_check_double(void) { std::string extStr; size_t param_value_size; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, 0, 0, &param_value_size); std::vector<char> param_value(param_value_size); OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, param_value_size, param_value.empty() ?
NULL : &param_value.front(), &param_value_size); if (!param_value.empty()) extStr = std::string(&param_value.front(), param_value_size-1); if (std::strstr(extStr.c_str(), "cl_khr_fp64") == NULL) { printf("No cl_khr_fp64, Skip!"); return 0; } return 1; } int cl_check_beignet(void) { size_t param_value_size; size_t ret_sz; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_VERSION, 0, 0, &param_value_size); if(param_value_size == 0) { return 0; } char* device_version_str = (char* )malloc(param_value_size * sizeof(char) ); OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_VERSION, param_value_size, (void*)device_version_str, &ret_sz); OCL_ASSERT(ret_sz == param_value_size); if(!strstr(device_version_str, "beignet")) { free(device_version_str); return 0; } free(device_version_str); return 1; } int cl_check_motion_estimation(void) { std::string extStr; size_t param_value_size; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, 0, 0, &param_value_size); std::vector<char> param_value(param_value_size); OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, param_value_size, param_value.empty() ? NULL : &param_value.front(), &param_value_size); if (!param_value.empty()) extStr = std::string(&param_value.front(), param_value_size-1); if (std::strstr(extStr.c_str(), "cl_intel_motion_estimation") == NULL) { printf("No cl_intel_motion_estimation, Skip!"); return 0; } return 1; } int cl_check_subgroups(void) { std::string extStr; size_t param_value_size; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, 0, 0, &param_value_size); std::vector<char> param_value(param_value_size); OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, param_value_size, param_value.empty() ? NULL : &param_value.front(), &param_value_size); if (!param_value.empty()) extStr = std::string(&param_value.front(), param_value_size-1); if (std::strstr(extStr.c_str(), "cl_intel_subgroups") == NULL) { printf("No cl_intel_subgroups, Skip!"); return 0; } if(utestclGetKernelSubGroupInfoKHR == NULL) utestclGetKernelSubGroupInfoKHR = (clGetKernelSubGroupInfoKHR_cb*) clGetExtensionFunctionAddressForPlatform(platform,"clGetKernelSubGroupInfoKHR"); if(utestclGetKernelSubGroupInfoKHR == NULL) { printf("Can't find clGetKernelSubGroupInfoKHR"); OCL_ASSERT(0); } return 1; } int cl_check_subgroups_short(void) { if (!cl_check_subgroups()) return 0; std::string extStr; size_t param_value_size; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, 0, 0, &param_value_size); std::vector<char> param_value(param_value_size); OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, param_value_size, param_value.empty() ?
NULL : &param_value.front(), &param_value_size); if (!param_value.empty()) extStr = std::string(&param_value.front(), param_value_size-1); if (std::strstr(extStr.c_str(), "cl_intel_subgroups_short") == NULL) { printf("No cl_intel_subgroups_short, Skip!"); return 0; } return 1; } int cl_check_ocl20(bool or_beignet) { size_t param_value_size; size_t ret_sz; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_OPENCL_C_VERSION, 0, 0, &param_value_size); if(param_value_size == 0) { printf("Not OpenCL 2.0 device, "); if(or_beignet){ if(cl_check_beignet()) { printf("Beignet extension test!"); return 1; } else { printf("Not beignet device, Skip!"); return 0; } }else{ printf("Skip!"); return 0; } } char* device_version_str = (char* )malloc(param_value_size * sizeof(char) ); OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_OPENCL_C_VERSION, param_value_size, (void*)device_version_str, &ret_sz); OCL_ASSERT(ret_sz == param_value_size); if(!strstr(device_version_str, "2.0")) { free(device_version_str); printf("Not OpenCL 2.0 device, "); if(or_beignet){ if(cl_check_beignet()) { printf("Beignet extension test!"); return 1; } else { printf("Not beignet device, Skip!"); return 0; } }else{ printf("Skip!"); return 0; } } free(device_version_str); return 1; } int cl_check_half(void) { std::string extStr; size_t param_value_size; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, 0, 0, &param_value_size); std::vector<char> param_value(param_value_size); OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, param_value_size, param_value.empty() ? NULL : &param_value.front(), &param_value_size); if (!param_value.empty()) extStr = std::string(&param_value.front(), param_value_size-1); if (std::strstr(extStr.c_str(), "cl_khr_fp16") == NULL) { printf("No cl_khr_fp16, Skip!"); return 0; } return 1; } uint32_t __half_to_float(uint16_t h, bool *isInf, bool *infSign) { uint32_t out_val = 0; uint16_t sign = (h & 0x8000) >> 15; uint16_t exp = (h & 0x7c00) >> 10; uint16_t fraction = h & 0x03ff; if (isInf) *isInf = false; if (infSign) *infSign = false; if (exp == 0 && fraction == 0) { // (Signed) zero return (sign << 31); } if (exp == 0) { // subnormal mode assert(fraction > 0); exp = -1; do { fraction = fraction << 1; exp++; } while ((fraction & 0x400) == 0); exp = 127 - exp - 15; out_val = (sign << 31) | ((exp & 0xff) << 23) | ((fraction & 0x3ff) << 13); return out_val; } if (exp == 0x1f) { // inf or NAN if (fraction == 0) { // inf out_val = (sign << 31) | (255 << 23); if (isInf) *isInf = true; if (infSign) *infSign = (sign == 0) ? 1 : 0; return out_val; } else { // NAN mode out_val = (sign << 31) | (255 << 23) | 0x7fffff; return out_val; } } // Easy case, just convert. exp = 127 - 15 + exp; out_val = (sign << 31) | ((exp & 0xff) << 23) | ((fraction & 0x3ff) << 13); return out_val; } uint16_t __float_to_half(uint32_t x) { uint16_t sign = (x & 0x80000000) >> 31; uint16_t exp = (x & 0x7F800000) >> 23; uint32_t fraction = (x & 0x7fffff); uint16_t out_val = 0; /* Handle the float NAN format. */ if (exp == 0xFF && fraction != 0) { /* return a NAN half. */ out_val = (sign << 15) | (0x7C00) | (fraction & 0x3ff); return out_val; } /* Float exp is from -126~127, half exp is from -14~15 */ if (exp - 127 > 15) { // Should overflow. /* return +- inf. */ out_val = (sign << 15) | (0x7C00); return out_val; } /* half has 10 bits fraction, so there is a chance to convert to (-1)^sign X 2^(-14) X 0.fraction form. But if the exp - 127 < -14 - 10, we must have underflow. */ if (exp < -14 + 127 - 10) { // Should underflow. /* Return zero without subnormal numbers.
*/ out_val = (sign << 15); return out_val; } if (exp < -14 + 127) { //May underflow, but may use subnormal numbers int shift = -(exp - 127 + 14); assert(shift > 0); assert(shift <= 10); fraction = fraction | 0x0800000; // in 1.significantbits2, add the 1 fraction = fraction >> shift; // To half fraction fraction = (fraction & 0x7ff000) >> 12; out_val = (sign << 15) | ((fraction >> 1) & 0x3ff); if (fraction & 0x01) out_val++; return out_val; } /* Easy case, just convert. */ fraction = (fraction & 0x7ff000) >> 12; exp = exp - 127 + 15; assert(exp > 0); assert(exp < 0x01f); out_val = (sign << 15) | (exp << 10) | ((fraction >> 1) & 0x3ff); if (fraction & 0x01) out_val++; return out_val; } uint32_t as_uint(float f) { union uint32_cast _tmp; _tmp._float = f; return _tmp._uint; } float as_float(uint32_t i) { union uint32_cast _tmp; _tmp._uint = i; return _tmp._float; } int cl_check_reqd_subgroup(void) { if (!cl_check_subgroups()) return 0; std::string extStr; size_t param_value_size; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, 0, 0, &param_value_size); std::vector<char> param_value(param_value_size); OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_EXTENSIONS, param_value_size, param_value.empty() ? NULL : &param_value.front(), &param_value_size); if (!param_value.empty()) extStr = std::string(&param_value.front(), param_value_size-1); if (std::strstr(extStr.c_str(), "cl_intel_required_subgroup_size") == NULL) { printf("No cl_intel_required_subgroup_size, Skip!"); return 0; } return 1; } Beignet-1.3.2-Source/utests/compiler_workitem_builtin.cpp000664 001750 001750 00000000257 13161142102 022767 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_workitem_builtin(void) { OCL_CREATE_KERNEL("compiler_workitem_builtin"); } MAKE_UTEST_FROM_FUNCTION(compiler_workitem_builtin); Beignet-1.3.2-Source/utests/compiler_getelementptr_bitcast.cpp000664 001750 001750 00000002447 13161142102 023773 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_getelementptr_bitcast(void) { const size_t n = 16; float cpu_dst[16], cpu_src[16]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_getelementptr_bitcast"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; //must be 1 to pass the test, it is required by the special usage in the kernel locals[0] = 1; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((float*)buf_data[0])[i] = .1f * (rand() & 15) - .75f; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i){ unsigned char* c = (unsigned char*)&cpu_src[i]; cpu_dst[i] = c[2]; } // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i){ //printf("src:%f, gpu_dst: %f, cpu_dst: %f\n", cpu_src[i], ((float *)buf_data[1])[i], cpu_dst[i]); OCL_ASSERT(((float *)buf_data[1])[i] == cpu_dst[i]); } OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(compiler_getelementptr_bitcast); Beignet-1.3.2-Source/utests/compiler_function_argument.cpp000664 001750 001750 00000001131 13161142102 023120 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_function_argument(void) { const size_t n = 2048; const int value = 34; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_function_argument"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem),
&buf[0]); OCL_SET_ARG(1, sizeof(int), &value); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); OCL_MAP_BUFFER(0); // Check results for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((int*)buf_data[0])[i] == value); } MAKE_UTEST_FROM_FUNCTION(compiler_function_argument); Beignet-1.3.2-Source/utests/compiler_subgroup_image_block_write.cpp000664 001750 001750 00000017006 13161142102 024774 0ustar00yryr000000 000000 #include <iostream> #include <iomanip> #include <string.h> #include "utest_helper.hpp" using namespace std; /* set to 1 for debug, output of input-expected data */ #define DEBUG_STDOUT 0 /* NDRANGE */ #define WG_GLOBAL_SIZE 32 #define WG_LOCAL_SIZE 32 /* * Generic compute-expected function for media block write */ template <class T> static void compute_expected(T* input, T* expected, size_t VEC_SIZE) { for(uint32_t i = 0; i < WG_GLOBAL_SIZE; i++) for(uint32_t j = 0; j < VEC_SIZE; j++) expected[WG_GLOBAL_SIZE * j + i] = input[i * VEC_SIZE + j]; } /* * Generic input-expected generate function for media block write */ template <class T> static void generate_data(T* &input, T* &expected, size_t VEC_SIZE) { /* allocate input and expected arrays */ input = new T[WG_GLOBAL_SIZE * VEC_SIZE]; expected = new T[WG_GLOBAL_SIZE * VEC_SIZE]; /* base value for all data types */ T base_val = (long)7 << (sizeof(T) * 5 - 3); /* seed for random inputs */ srand (time(NULL)); #if DEBUG_STDOUT cout << endl << "IN: " << endl; #endif /* generate inputs and expected values */ for(uint32_t gid = 0; gid < WG_GLOBAL_SIZE * VEC_SIZE; gid++) { /* initially 0, augment after */ input[gid] = ((rand() % 2 - 1) * base_val) + (rand() % 112); //input[gid] = gid; #if DEBUG_STDOUT /* output generated input */ cout << setw(4) << input[gid] << ", " ; if((gid + 1) % 8 == 0) cout << endl; #endif } /* expected values */ compute_expected(input, expected, VEC_SIZE); #if DEBUG_STDOUT /* output expected input */ cout << endl << "EXP: " << endl; for(uint32_t gid = 0; gid < WG_GLOBAL_SIZE; gid++) { cout << "("; for(uint32_t vsz = 0; vsz < VEC_SIZE; vsz++) cout << setw(4) << expected[gid* VEC_SIZE + vsz] << ", " ; cout << ")"; if((gid + 1) % 8 == 0) cout << endl; cout << endl; } #endif } /* * Generic subgroup utest function for media block write */ template <class T> static void subgroup_generic(T* input, T* expected, size_t VEC_SIZE) { cl_image_format format; cl_image_desc desc; memset(&desc, 0x0, sizeof(cl_image_desc)); memset(&format, 0x0, sizeof(cl_image_format)); /* get simd size */ globals[0] = WG_GLOBAL_SIZE; locals[0] = WG_LOCAL_SIZE; size_t SIMD_SIZE = 0; OCL_CALL(utestclGetKernelSubGroupInfoKHR,kernel,device,CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR,sizeof(size_t)*1,locals,sizeof(size_t),&SIMD_SIZE,NULL); size_t buf_sz = VEC_SIZE * WG_GLOBAL_SIZE; /* input and expected data */ generate_data(input, expected, VEC_SIZE); /* prepare input for datatype */ format.image_channel_order = CL_R; format.image_channel_data_type = CL_UNSIGNED_INT32; desc.image_type = CL_MEM_OBJECT_IMAGE2D; desc.image_width = WG_GLOBAL_SIZE; desc.image_height = VEC_SIZE; desc.image_row_pitch = 0; OCL_CREATE_IMAGE(buf[0], 0, &format, &desc, NULL); OCL_CREATE_BUFFER(buf[1], 0, buf_sz * sizeof(T), NULL); /* set input data for GPU */ OCL_MAP_BUFFER(1); memcpy(buf_data[1], input, buf_sz* sizeof(T)); OCL_UNMAP_BUFFER(1); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); /* run the kernel on GPU */ OCL_NDRANGE(1); /* check if mismatch */ OCL_MAP_BUFFER_GTT(0); uint32_t mismatches = 0; size_t image_row_pitch = 0; OCL_CALL(clGetImageInfo, buf[0], CL_IMAGE_ROW_PITCH,
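/* CL_IMAGE_ROW_PITCH is reported in bytes; it is divided by sizeof(T) right
 * after so 'out' can be indexed element-wise, one image row per vector lane. */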
sizeof(image_row_pitch), &image_row_pitch, NULL); image_row_pitch /= sizeof(T); T *out = (T *)buf_data[0]; for (uint32_t vsz = 0; vsz < VEC_SIZE; vsz++) for (uint32_t i = 0; i < WG_GLOBAL_SIZE; i++) if (out[vsz * image_row_pitch + i] != expected[WG_GLOBAL_SIZE * vsz + i]) { /* found mismatch, increment */ mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << WG_GLOBAL_SIZE * vsz + i << ", " << out[vsz * image_row_pitch + i] << " != " << expected[WG_GLOBAL_SIZE * vsz + i] << endl; #endif } OCL_UNMAP_BUFFER_GTT(0); OCL_ASSERT(mismatches == 0); free(input); free(expected); } /* * sub_group image block write functions */ void compiler_subgroup_image_block_write_ui1(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_image_block_write", "compiler_subgroup_image_block_write_ui1"); subgroup_generic(input, expected, 1); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_image_block_write_ui1); void compiler_subgroup_image_block_write_ui2(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_image_block_write", "compiler_subgroup_image_block_write_ui2"); subgroup_generic(input, expected, 2); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_image_block_write_ui2); void compiler_subgroup_image_block_write_ui4(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_image_block_write", "compiler_subgroup_image_block_write_ui4"); subgroup_generic(input, expected, 4); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_image_block_write_ui4); void compiler_subgroup_image_block_write_ui8(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_image_block_write", "compiler_subgroup_image_block_write_ui8"); subgroup_generic(input, expected, 8); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_image_block_write_ui8); void compiler_subgroup_image_block_write_us1(void) { if(!cl_check_subgroups_short()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_image_block_write.cl", "compiler_subgroup_image_block_write_us1", SOURCE, "-DSHORT"); subgroup_generic(input, expected, 1); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_image_block_write_us1); void compiler_subgroup_image_block_write_us2(void) { if(!cl_check_subgroups_short()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_image_block_write.cl", "compiler_subgroup_image_block_write_us2", SOURCE, "-DSHORT"); subgroup_generic(input, expected, 2); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_image_block_write_us2); void compiler_subgroup_image_block_write_us4(void) { if(!cl_check_subgroups_short()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_image_block_write.cl", "compiler_subgroup_image_block_write_us4", SOURCE, "-DSHORT"); subgroup_generic(input, expected, 4); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_image_block_write_us4); void compiler_subgroup_image_block_write_us8(void) { if(!cl_check_subgroups_short()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_image_block_write.cl", "compiler_subgroup_image_block_write_us8", SOURCE, "-DSHORT"); subgroup_generic(input, expected, 8); } 
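/* The us* variants above rebuild the same kernel source with -DSHORT so one
 * .cl file covers both the uint and ushort paths, which is why they call
 * cl_kernel_init with an explicit build option instead of
 * OCL_CREATE_KERNEL_FROM_FILE. */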
MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_image_block_write_us8); Beignet-1.3.2-Source/utests/compiler_math_constants.cpp000664 001750 001750 00000000251 13161142102 022417 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_math_constants(void) { OCL_CREATE_KERNEL("compiler_math_constants"); } MAKE_UTEST_FROM_FUNCTION(compiler_math_constants); Beignet-1.3.2-Source/utests/compiler_uint3_unaligned_copy.cpp000664 001750 001750 00000002377 13161142102 023527 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_uint3_unaligned_copy(void) { const size_t n = 128; // Setup kernel and buffers. Note that uint3 is aligned on 16 bytes // according to the OCL specification OCL_CREATE_KERNEL("compiler_uint3_unaligned_copy"); buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t[4]) * n); for (uint32_t i = 0; i < n; ++i) { ((uint32_t*)buf_data[0])[3*i+0] = 3*i+0; ((uint32_t*)buf_data[0])[3*i+1] = 3*i+1; ((uint32_t*)buf_data[0])[3*i+2] = 3*i+2; } OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t[4]), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t[4]), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); // Check result OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < n; ++i) { OCL_ASSERT(((uint32_t*)buf_data[0])[3*i+0] == ((uint32_t*)buf_data[1])[3*i+0]); OCL_ASSERT(((uint32_t*)buf_data[0])[3*i+1] == ((uint32_t*)buf_data[1])[3*i+1]); OCL_ASSERT(((uint32_t*)buf_data[0])[3*i+2] == ((uint32_t*)buf_data[1])[3*i+2]); } } MAKE_UTEST_FROM_FUNCTION(compiler_uint3_unaligned_copy); Beignet-1.3.2-Source/utests/builtin_num_groups.cpp000664 001750 001750 00000004043 13161142102 021427 0ustar00yryr000000 000000 /* According to the OpenCL v1.1 & v1.2 chapter 6.11, the behavior of function get_num_groups should be as following: globals[0] = 1; globals[1] = 4; globals[2] = 9; locals[0] = 1; locals[1] = 2; locals[2] = 3; #ifdef CL_VERSION_1_2 | CL_VERSION_1_1: get_num_groups(-1) = 1 (dimension:1) get_num_groups(0) = 1 (dimension:1) get_num_groups(1) = 1 (dimension:1) get_num_groups(-1) = 1 (dimension:2) get_num_groups(0) = 1 (dimension:2) get_num_groups(1) = 2 (dimension:2) get_num_groups(2) = 1 (dimension:2) get_num_groups(-1) = 1 (dimension:3) get_num_groups(0) = 1 (dimension:3) get_num_groups(1) = 2 (dimension:3) get_num_groups(2) = 3 (dimension:3) get_num_groups(3) = 1 (dimension:3) */ #define udebug 0 #include "utest_helper.hpp" static void builtin_num_groups(void) { // Setup kernel and buffers int dim, dim_arg_global, num_groups, err; OCL_CREATE_KERNEL("builtin_num_groups"); OCL_CREATE_BUFFER(buf[0], CL_MEM_READ_WRITE, sizeof(int), NULL); OCL_CREATE_BUFFER(buf[1], CL_MEM_READ_WRITE, sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 1; globals[1] = 4; globals[2] = 9; locals[0] = 1; locals[1] = 2; locals[2] = 3; for( dim=1; dim <= 3; dim++ ) { for( dim_arg_global = -1; dim_arg_global <= dim + 1; dim_arg_global++ ) { err = clEnqueueWriteBuffer( queue, buf[1], CL_TRUE, 0, sizeof(int), &dim_arg_global, 0, NULL, NULL); if (err != CL_SUCCESS) { printf("Error: Failed to write to source array!\n"); OCL_ASSERT(0); } // Run the kernel OCL_NDRANGE( dim ); OCL_MAP_BUFFER(0); num_groups = ((int*)buf_data[0])[0]; #if udebug printf("get_num_groups(%d) = %d (dimension:%d)\n", dim_arg_global, num_groups, dim); #endif if ( dim_arg_global >= 0 && 
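/* with globals = {1, 4, 9} and locals = {1, 2, 3}, dimension d launches
 * globals[d]/locals[d] = d + 1 work-groups, so get_num_groups must return
 * dim_arg_global + 1 for a valid dimension and 1 otherwise */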
dim_arg_global < dim) OCL_ASSERT( num_groups == dim_arg_global + 1 ); else { OCL_ASSERT( num_groups == 1); } OCL_UNMAP_BUFFER(0); } } } MAKE_UTEST_FROM_FUNCTION(builtin_num_groups); Beignet-1.3.2-Source/utests/compiler_sampler.cpp000664 001750 001750 00000003242 13161142102 021040 0ustar00yryr000000 000000 /* test OpenCL 1.1 Sampler Objects (section 5.5) */ #include "utest_helper.hpp" void compiler_sampler(void) { if(!cl_check_ocl20(false)) return; OCL_CREATE_KERNEL("compiler_sampler"); OCL_ASSERT(ctx != 0); cl_sampler s; cl_int err; cl_uint a1[] = {CL_TRUE, CL_FALSE}, a2[] = {CL_ADDRESS_MIRRORED_REPEAT, CL_ADDRESS_REPEAT, CL_ADDRESS_CLAMP_TO_EDGE, CL_ADDRESS_CLAMP, CL_ADDRESS_NONE}, a3[] = {CL_FILTER_NEAREST, CL_FILTER_LINEAR}, a4[] = {CL_SAMPLER_REFERENCE_COUNT, CL_SAMPLER_CONTEXT, CL_SAMPLER_NORMALIZED_COORDS, CL_SAMPLER_ADDRESSING_MODE, CL_SAMPLER_FILTER_MODE}; char pv[1000]; size_t pv_size; int i, j, k, l; for(i=0; i<2; i++) for(j=0; j<5; j++) for(k=0; k<2; k++) { s = clCreateSampler(ctx, a1[i], a2[j], a3[k], &err); OCL_ASSERT(err == CL_SUCCESS); OCL_CALL(clRetainSampler, s); OCL_CALL(clReleaseSampler, s); for(l=0; l<5; l++) OCL_CALL(clGetSamplerInfo, s, a4[l], 1000, pv, &pv_size); OCL_CALL(clReleaseSampler, s); cl_sampler_properties sam[] = { CL_SAMPLER_NORMALIZED_COORDS, a1[i], CL_SAMPLER_ADDRESSING_MODE, a2[j], CL_SAMPLER_FILTER_MODE, a3[k], 0}; s = clCreateSamplerWithProperties(ctx, sam, &err); OCL_ASSERT(err == CL_SUCCESS); OCL_CALL(clRetainSampler, s); OCL_CALL(clReleaseSampler, s); for(l=0; l<5; l++) OCL_CALL(clGetSamplerInfo, s, a4[l], 1000, pv, &pv_size); OCL_CALL(clReleaseSampler, s); } } MAKE_UTEST_FROM_FUNCTION(compiler_sampler); Beignet-1.3.2-Source/utests/compiler_degrees.cpp000664 001750 001750 00000001434 13161142102 021014 0ustar00yryr000000 000000 #include "utest_helper.hpp" #define M_180_PI_F 57.295779513082321F void compiler_degrees(void) { const int n = 32; float src[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_degrees"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); for (int i = 0; i < n; ++i) { src[i] = ((float *)buf_data[0])[i] = rand() * 0.01f; } OCL_UNMAP_BUFFER(0); OCL_NDRANGE(1); OCL_MAP_BUFFER(1); for (int i = 0; i < n; ++i) { OCL_ASSERT(((float *)buf_data[1])[i] == src[i] * M_180_PI_F); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_degrees); Beignet-1.3.2-Source/utests/builtin_max_sub_group_size.cpp000664 001750 001750 00000002670 13161142102 023141 0ustar00yryr000000 000000 /* According to the OpenCL cl_intel_subgroups. 
Now define local and global size as following: globals[0] = 4; globals[1] = 9; globals[2] = 16; locals[0] = 2; locals[1] = 3; locals[2] = 4; */ #define udebug 0 #include "utest_helper.hpp" static void builtin_max_sub_group_size(void) { if(!cl_check_subgroups()) return; // Setup kernel and buffers size_t dim, i,local_sz = 1,buf_len = 1; OCL_CREATE_KERNEL("builtin_max_sub_group_size"); size_t sub_sz; OCL_CREATE_BUFFER(buf[0], CL_MEM_READ_WRITE, sizeof(int)*576, NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); for( dim=1; dim <= 3; dim++ ) { buf_len = 1; local_sz = 1; for(i=1; i <= dim; i++) { locals[i - 1] = i + 1; globals[i - 1] = (i + 1) * (i + 1); buf_len *= ((i + 1) * (i + 1)); local_sz *= i + 1; } for(i = dim+1; i <= 3; i++) { globals[i - 1] = 0; locals[i - 1] = 0; } OCL_CALL(utestclGetKernelSubGroupInfoKHR,kernel,device,CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR,sizeof(size_t)*dim,locals,sizeof(size_t),&sub_sz,NULL); // Run the kernel OCL_NDRANGE( dim ); clFinish(queue); OCL_MAP_BUFFER(0); for( i = 0; i < buf_len; i++) { #if udebug printf("got %d expect %d\n", ((uint32_t*)buf_data[0])[i], sub_sz); #endif OCL_ASSERT( ((uint32_t*)buf_data[0])[i] == sub_sz); } OCL_UNMAP_BUFFER(0); } } MAKE_UTEST_FROM_FUNCTION(builtin_max_sub_group_size); Beignet-1.3.2-Source/utests/compiler_integer_builtin.cpp000664 001750 001750 00000000254 13161142102 022560 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_integer_builtin(void) { OCL_CREATE_KERNEL("compiler_integer_builtin"); } MAKE_UTEST_FROM_FUNCTION(compiler_integer_builtin); Beignet-1.3.2-Source/utests/builtin_local_id.cpp000664 001750 001750 00000003136 13161142102 021001 0ustar00yryr000000 000000 /* According to the OpenCL v1.1 & v1.2 chapter 6.11. Now define local and global size as following: globals[0] = 4; globals[1] = 9; globals[2] = 16; locals[0] = 2; locals[1] = 3; locals[2] = 4; Kernel: int id = get_local_id(0) + get_group_id(0)*2 + \ get_local_id(1) * 4 + get_group_id(1)*12 +\ get_local_id(2) *36 + get_group_id(2)*144; dimension:1 0 1 2 3 dimension:2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 dimension:3 0 1 2 3 4 5 6 7 ... 139 140 141 142 143 ... ... 429 430 431 432 433 434 ... 571 572 573 574 575 */ #define udebug 0 #include "utest_helper.hpp" static void builtin_local_id(void) { // Setup kernel and buffers int dim, i, buf_len=1; OCL_CREATE_KERNEL("builtin_local_id"); OCL_CREATE_BUFFER(buf[0], CL_MEM_READ_WRITE, sizeof(int)*576, NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); for( dim=1; dim <= 3; dim++ ) { buf_len = 1; for(i=1; i <= dim; i++) { locals[i - 1] = i + 1; globals[i - 1] = (i + 1) * (i + 1); buf_len *= ((i + 1) * (i + 1)); } for(i = dim+1; i <= 3; i++) { globals[i - 1] = 0; locals[i - 1] = 0; } // Run the kernel OCL_NDRANGE( dim ); clFinish(queue); OCL_MAP_BUFFER(0); #if udebug for(i = 0; i < buf_len; i++) { printf("%2d ", ((int*)buf_data[0])[i]); if ((i + 1) % 4 == 0) printf("\n"); } #endif for( i = 0; i < buf_len; i++) OCL_ASSERT( ((int*)buf_data[0])[i] == i); OCL_UNMAP_BUFFER(0); } } MAKE_UTEST_FROM_FUNCTION(builtin_local_id); Beignet-1.3.2-Source/utests/compiler_step.cpp000664 001750 001750 00000021465 13161142102 020357 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include "string.h" template struct cl_vec { T ptr[((N+1)/2)*2]; //align to 2 elements. 
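/* N is rounded up to an even element count because OpenCL sizes 3-component
 * vectors like 4-component ones, so a float3 here occupies 4 floats just as
 * it does on the device. */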
typedef cl_vec vec_type; cl_vec(void) { memset(ptr, 0, sizeof(T) * ((N+1)/2)*2); } cl_vec(vec_type & other) { memset(ptr, 0, sizeof(T) * ((N+1)/2)*2); memcpy (this->ptr, other.ptr, sizeof(T) * N); } vec_type& operator= (vec_type & other) { memset(ptr, 0, sizeof(T) * ((N+1)/2)*2); memcpy (this->ptr, other.ptr, sizeof(T) * N); return *this; } template vec_type& operator= (cl_vec & other) { memset(ptr, 0, sizeof(T) * ((N+1)/2)*2); memcpy (this->ptr, other.ptr, sizeof(T) * N); return *this; } bool operator== (vec_type & other) { return !memcmp (this->ptr, other.ptr, sizeof(T) * N); } void step (vec_type & other) { int i = 0; for (; i < N; i++) { T a = ptr[i]; T edge = other.ptr[i]; T f = a < edge ? 0.0 : 1.0; ptr[i] = f; } } void step (float & edge) { int i = 0; for (; i < N; i++) { T a = ptr[i]; T f = a < edge ? 0.0 : 1.0; ptr[i] = f; } } }; template static void cpu (int global_id, cl_vec *edge, cl_vec *src, cl_vec *dst) { cl_vec v = src[global_id]; v.step(edge[global_id]); dst[global_id] = v; } template static void cpu(int global_id, T *edge, T *src, U *dst) { T f = src[global_id]; T e = edge[global_id]; f = f < e ? 0.0 : 1.0; dst[global_id] = (U)f; } template static void cpu (int global_id, float edge, cl_vec *src, cl_vec *dst) { cl_vec v = src[global_id]; v.step(edge); dst[global_id] = v; } template static void cpu(int global_id, float edge, T *src, U *dst) { T f = src[global_id]; f = f < edge ? 0.0 : 1.0; dst[global_id] = (U)f; } template static void gen_rand_val (cl_vec& vect) { int i = 0; memset(vect.ptr, 0, sizeof(T) * ((N+1)/2)*2); for (; i < N; i++) { vect.ptr[i] = static_cast(.1f * (rand() & 15) - .75f); } } template static void gen_rand_val (T & val) { val = static_cast(.1f * (rand() & 15) - .75f); } template inline static void print_data (T& val) { if (std::is_unsigned::value) printf(" %u", val); else printf(" %d", val); } inline static void print_data (float& val) { printf(" %f", val); } template static void dump_data (cl_vec* edge, cl_vec* src, cl_vec* dst, int n) { U* val = reinterpret_cast(dst); n = n*((N+1)/2)*2; printf("\nEdge: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((T *)buf_data[0])[i]); } printf("\nx: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((T *)buf_data[1])[i]); } printf("\nCPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(val[i]); } printf("\nGPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((U *)buf_data[2])[i]); } } template static void dump_data (T* edge, T* src, U* dst, int n) { printf("\nedge: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((T *)buf_data[0])[i]); } printf("\nx: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((T *)buf_data[1])[i]); } printf("\nCPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(dst[i]); } printf("\nGPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((U *)buf_data[2])[i]); } } template static void dump_data (float edge, cl_vec* src, cl_vec* dst, int n) { U* val = reinterpret_cast(dst); n = n*((N+1)/2)*2; printf("\nEdge: %f\n", edge); printf("\nx: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((T *)buf_data[0])[i]); } printf("\nCPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(val[i]); } printf("\nGPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((U *)buf_data[1])[i]); } } template static void dump_data (float edge, T* src, U* dst, int n) { printf("\nedge: %f\n", edge); printf("\nx: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((T 
*)buf_data[0])[i]); } printf("\nCPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(dst[i]); } printf("\nGPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((U *)buf_data[1])[i]); } } template static void compiler_step_with_type(void) { const size_t n = 16; T cpu_dst[n], cpu_src[n]; T edge[n]; // Setup buffers OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(T), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = n; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); /* Clear the dst buffer to avoid random data. */ OCL_MAP_BUFFER(2); memset(buf_data[2], 0, sizeof(T) * n); OCL_UNMAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { gen_rand_val(cpu_src[i]); gen_rand_val(edge[i]); } memcpy(buf_data[1], cpu_src, sizeof(T) * n); memcpy(buf_data[0], edge, sizeof(T) * n); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, edge, cpu_src, cpu_dst); // Compare OCL_MAP_BUFFER(2); //dump_data(edge, cpu_src, cpu_dst, n); OCL_ASSERT(!memcmp(buf_data[2], cpu_dst, sizeof(T) * n)); OCL_UNMAP_BUFFER(2); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(0); } } #define STEP_TEST_TYPE(TYPE) \ static void compiler_step_##TYPE (void) \ { \ OCL_CALL (cl_kernel_init, "compiler_step.cl", "compiler_step_"#TYPE, SOURCE, NULL); \ compiler_step_with_type(); \ } \ MAKE_UTEST_FROM_FUNCTION(compiler_step_##TYPE); typedef cl_vec float2; typedef cl_vec float3; typedef cl_vec float4; typedef cl_vec float8; typedef cl_vec float16; STEP_TEST_TYPE(float) STEP_TEST_TYPE(float2) STEP_TEST_TYPE(float3) STEP_TEST_TYPE(float4) STEP_TEST_TYPE(float8) STEP_TEST_TYPE(float16) template static void compiler_stepf_with_type(void) { const size_t n = 16; T cpu_dst[n], cpu_src[n]; float edge = (float)(.1f * (rand() & 15) - .75f); // Setup buffers OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(T), NULL); OCL_SET_ARG(0, sizeof(float), &edge); OCL_SET_ARG(1, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = n; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); /* Clear the dst buffer to avoid random data. 
*/ OCL_MAP_BUFFER(1); memset(buf_data[1], 0, sizeof(T) * n); OCL_UNMAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { gen_rand_val(cpu_src[i]); } memcpy(buf_data[0], cpu_src, sizeof(T) * n); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, edge, cpu_src, cpu_dst); // Compare OCL_MAP_BUFFER(1); //dump_data(edge, cpu_src, cpu_dst, n); OCL_ASSERT(!memcmp(buf_data[1], cpu_dst, sizeof(T) * n)); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(0); } } #define _STEPF_TEST_TYPE(TYPE, keep_program) \ static void compiler_stepf_##TYPE (void) \ { \ OCL_CALL (cl_kernel_init, "compiler_step.cl", "compiler_stepf_"#TYPE, SOURCE, NULL); \ compiler_stepf_with_type(); \ } \ MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(compiler_stepf_##TYPE, keep_program); #define STEPF_TEST_TYPE(TYPE) _STEPF_TEST_TYPE(TYPE, true) #define STEPF_TEST_TYPE_END(TYPE) _STEPF_TEST_TYPE(TYPE, false) STEPF_TEST_TYPE(float) STEPF_TEST_TYPE(float2) STEPF_TEST_TYPE(float3) STEPF_TEST_TYPE(float4) STEPF_TEST_TYPE(float8) STEPF_TEST_TYPE_END(float16) Beignet-1.3.2-Source/utests/setenv.sh.in000664 001750 001750 00000001013 13161142102 017236 0ustar00yryr000000 000000 #!/bin/sh # export OCL_BITCODE_LIB_PATH=@LOCAL_OCL_BITCODE_BIN@ export OCL_HEADER_FILE_DIR=@LOCAL_OCL_HEADER_DIR@ export OCL_BITCODE_LIB_20_PATH=@LOCAL_OCL_BITCODE_BIN_20@ export OCL_PCH_PATH=@LOCAL_OCL_PCH_OBJECT@ export OCL_PCH_20_PATH=@LOCAL_OCL_PCH_OBJECT_20@ export OCL_KERNEL_PATH=@CMAKE_CURRENT_SOURCE_DIR@/../kernels export OCL_GBE_PATH=@LOCAL_GBE_OBJECT_DIR@ export OCL_INTERP_PATH=@LOCAL_INTERP_OBJECT_DIR@ #disable self-test so we can get something more precise than "doesn't work" export OCL_IGNORE_SELF_TEST=1 Beignet-1.3.2-Source/utests/compiler_constant_expr.cpp000664 001750 001750 00000001725 13161142102 022270 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include static void compiler_constant_expr(void) { const size_t n = 48; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_constant_expr"); buf_data[0] = (uint32_t*) malloc(sizeof(float) * n); for (uint32_t i = 0; i < n; ++i) ((float*)buf_data[0])[i] = i; OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(float), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; OCL_NDRANGE(1); // Check result OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < n; ++i) { float expect = pow(((float*)buf_data[0])[i], (i % 3) + 1); float err = fabs(((float*)buf_data[1])[i] - expect); OCL_ASSERT(err <= 100 * cl_FLT_ULP(expect)); } } MAKE_UTEST_FROM_FUNCTION(compiler_constant_expr); Beignet-1.3.2-Source/utests/compiler_sub_group_shuffle_down.cpp000664 001750 001750 00000006223 13161142102 024147 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_sub_group_shuffle_down_int(void) { if(!cl_check_subgroups()) return; const size_t n = 32; const int32_t buf_size = 4 * n + 1; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_sub_group_shuffle_down", "compiler_sub_group_shuffle_down_int"); OCL_CREATE_BUFFER(buf[0], 0, buf_size * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); int c = 13; OCL_SET_ARG(1, sizeof(int), &c); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); for (int32_t i = 0; i < buf_size; ++i) ((int*)buf_data[0])[i] = -1; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(0); int* 
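/* sub_group_shuffle_down(cur, next, delta): lane id reads cur from lane
 * id + delta while that index stays inside the subgroup, and next from lane
 * id + delta - subgroup_size once it wraps; the assertions below check
 * exactly that split. */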
dst = (int *)buf_data[0]; int suggroupsize = dst[0]; OCL_ASSERT(suggroupsize == 8 || suggroupsize == 16); dst++; for (int32_t i = 0; i < (int32_t) n; ++i){ int round = i / suggroupsize; int index = i % suggroupsize; //printf("%d %d %d %d\n",dst[4*i], dst[4*i+1], dst[4*i+2], dst[4*i+3]); OCL_ASSERT( (index + c >= suggroupsize ? 456 : 123) == dst[4*i]); OCL_ASSERT( (index + c >= suggroupsize ? (round * suggroupsize + (i + c) % suggroupsize): 123) == dst[4*i+1]); OCL_ASSERT( (index + index + 1 >= suggroupsize ? -(round * suggroupsize + (i + index + 1) % suggroupsize) : (round * suggroupsize + (i + index + 1) % suggroupsize)) == dst[4*i+2]); OCL_ASSERT((round * suggroupsize + (suggroupsize - 1)) == dst[4*i+3]); } OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_sub_group_shuffle_down_int); void compiler_sub_group_shuffle_down_short(void) { if(!cl_check_subgroups_short()) return; const size_t n = 32; const int32_t buf_size = 4 * n + 1; // Setup kernel and buffers OCL_CALL(cl_kernel_init, "compiler_sub_group_shuffle_down.cl", "compiler_sub_group_shuffle_down_short", SOURCE, "-DSHORT"); OCL_CREATE_BUFFER(buf[0], 0, buf_size * sizeof(short), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); int c = 13; OCL_SET_ARG(1, sizeof(int), &c); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); for (int32_t i = 0; i < buf_size; ++i) ((short*)buf_data[0])[i] = -1; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(0); short* dst = (short *)buf_data[0]; short suggroupsize = dst[0]; OCL_ASSERT(suggroupsize == 8 || suggroupsize == 16); dst++; for (int32_t i = 0; i < (int32_t) n; ++i){ int round = i / suggroupsize; int index = i % suggroupsize; //printf("%d %d %d %d\n",dst[4*i], dst[4*i+1], dst[4*i+2], dst[4*i+3]); OCL_ASSERT( (index + c >= suggroupsize ? 456 : 123) == dst[4*i]); OCL_ASSERT( (index + c >= suggroupsize ? (round * suggroupsize + (i + c) % suggroupsize): 123) == dst[4*i+1]); OCL_ASSERT( (index + index + 1 >= suggroupsize ? 
-(round * suggroupsize + (i + index + 1) % suggroupsize) : (round * suggroupsize + (i + index + 1) % suggroupsize)) == dst[4*i+2]); OCL_ASSERT((round * suggroupsize + (suggroupsize - 1)) == dst[4*i+3]); } OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_sub_group_shuffle_down_short); Beignet-1.3.2-Source/utests/compiler_lower_return2.cpp000664 001750 001750 00000002264 13161142102 022211 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void cpu(int global_id, int *src, int *dst) { const int id = global_id; dst[id] = id; while (dst[id] > src[id]) { if (dst[id] > 10) return; dst[id]--; } dst[id] += 2; } static void compiler_lower_return2(void) { const size_t n = 16; int cpu_dst[16], cpu_src[16]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_lower_return2"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((int32_t*)buf_data[0])[i] = rand() % 16; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i <(int32_t) n; ++i) cpu(i, cpu_src, cpu_dst); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < 11; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(compiler_lower_return2); Beignet-1.3.2-Source/utests/compiler_fill_image_1d.cpp000664 001750 001750 00000002441 13161142102 022051 0ustar00yryr000000 000000 #include #include "utest_helper.hpp" static void compiler_fill_image_1d(void) { const size_t w = 2048; cl_image_format format; cl_image_desc desc; memset(&desc, 0x0, sizeof(cl_image_desc)); memset(&format, 0x0, sizeof(cl_image_format)); format.image_channel_order = CL_RGBA; format.image_channel_data_type = CL_UNSIGNED_INT8; desc.image_type = CL_MEM_OBJECT_IMAGE1D; desc.image_width = w; desc.image_row_pitch = 0; // Setup kernel and images OCL_CREATE_KERNEL("test_fill_image_1d"); OCL_CREATE_IMAGE(buf[0], 0, &format, &desc, NULL); OCL_MAP_BUFFER_GTT(0); for (uint32_t i = 0; i < w; i++) { ((uint32_t*)buf_data[0])[i] = 0; } OCL_UNMAP_BUFFER_GTT(0); // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); globals[0] = w/2; locals[0] = 16; OCL_NDRANGE(1); // Check result OCL_MAP_BUFFER_GTT(0); //printf("------ The image result is: -------\n"); for (uint32_t i = 0; i < w/2; i++) { //printf(" %2x", ((uint32_t *)buf_data[0])[i]); OCL_ASSERT(((uint32_t*)buf_data[0])[i] == 0x03020100); } for (uint32_t i = w/2; i < w; i++) { //printf(" %2x", ((uint32_t *)buf_data[0])[i]); OCL_ASSERT(((uint32_t*)buf_data[0])[i] == 0); } OCL_UNMAP_BUFFER_GTT(0); } MAKE_UTEST_FROM_FUNCTION(compiler_fill_image_1d); Beignet-1.3.2-Source/utests/compiler_overflow.cpp000664 001750 001750 00000011735 13161142102 021246 0ustar00yryr000000 000000 #include "utest_helper.hpp" namespace { typedef struct { unsigned long x; unsigned long y; unsigned long z; unsigned long w; }ulong4; typedef struct { uint32_t x; uint32_t y; uint32_t z; uint32_t w; } uint4; typedef struct { uint16_t x; uint16_t y; uint16_t z; uint16_t w; } ushort4; typedef struct { uint8_t x; uint8_t y; uint8_t z; uint8_t w; } uchar4; template U get_max() { int shift_bit = sizeof(U)*8; U u_max = 0; for (int i = 0; i < shift_bit; i++) u_max |= 1<<(shift_bit-i-1); return u_max; } template void test(const char *kernel_name, int func_type) { const 
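/* Added note: n = 16 four-component elements per buffer, and a single 16-wide work-group (globals[0] = locals[0] = 16) covers all of them. */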
size_t n = 16; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_overflow", kernel_name); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(T), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); U max = get_max(); // test add and sub overflow when src1 is 1: // uadd.with.overflow: max + 1 // usub.with.overflow: 0 - 1 OCL_MAP_BUFFER(0); for (uint32_t i = 0; i < n; ++i) { if(func_type == 0) { ((T*)buf_data[0])[i].x = max; ((T*)buf_data[0])[i].y = max; ((T*)buf_data[0])[i].z = max; ((T*)buf_data[0])[i].w = i; }else if(func_type == 1) { ((T*)buf_data[0])[i].x = 0; ((T*)buf_data[0])[i].y = 0; ((T*)buf_data[0])[i].z = 0; ((T*)buf_data[0])[i].w = n+2-i; }else OCL_ASSERT(0); } OCL_UNMAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < n; ++i) { ((T*)buf_data[1])[i].x = 1; ((T*)buf_data[1])[i].y = 1; ((T*)buf_data[1])[i].z = 1; ((T*)buf_data[1])[i].w = 1; } OCL_UNMAP_BUFFER(1); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); OCL_MAP_BUFFER(2); for (uint32_t i = 0; i < 16; ++i) { // printf("%u,%u,%u,%u\n", ((T*)buf_data[2])[i].x,((T*)buf_data[2])[i].y, ((T*)buf_data[2])[i].z, ((T*)buf_data[2])[i].w ); if(func_type == 0) { OCL_ASSERT(((T*)buf_data[2])[i].x == 0); OCL_ASSERT(((T*)buf_data[2])[i].y == 1); OCL_ASSERT(((T*)buf_data[2])[i].z == 1); OCL_ASSERT(((T*)buf_data[2])[i].w == i+2); }else if(func_type == 1) { OCL_ASSERT(((T*)buf_data[2])[i].x == max); OCL_ASSERT(((T*)buf_data[2])[i].y == max-1); OCL_ASSERT(((T*)buf_data[2])[i].z == max-1); OCL_ASSERT(((T*)buf_data[2])[i].w == n-i); }else OCL_ASSERT(0); } OCL_UNMAP_BUFFER(2); // test add and sub overflow when src1 is max: // uadd.with.overflow: max + max // usub.with.overflow: 0 - max OCL_MAP_BUFFER(0); for (uint32_t i = 0; i < n; ++i) { if(func_type == 0) { ((T*)buf_data[0])[i].x = max; ((T*)buf_data[0])[i].y = max; ((T*)buf_data[0])[i].z = max; ((T*)buf_data[0])[i].w = i; }else if(func_type == 1) { ((T*)buf_data[0])[i].x = 0; ((T*)buf_data[0])[i].y = 0; ((T*)buf_data[0])[i].z = 0; ((T*)buf_data[0])[i].w = n+2-i; }else OCL_ASSERT(0); } OCL_UNMAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < n; ++i) { ((T*)buf_data[1])[i].x = max; ((T*)buf_data[1])[i].y = max; ((T*)buf_data[1])[i].z = max; ((T*)buf_data[1])[i].w = 1; } OCL_UNMAP_BUFFER(1); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); OCL_MAP_BUFFER(2); for (uint32_t i = 0; i < 16; ++i) { // printf("%u,%u,%u,%u\n", ((T*)buf_data[2])[i].x,((T*)buf_data[2])[i].y, ((T*)buf_data[2])[i].z, ((T*)buf_data[2])[i].w ); if(func_type == 0) { OCL_ASSERT(((T*)buf_data[2])[i].x == max-1); OCL_ASSERT(((T*)buf_data[2])[i].y == max); OCL_ASSERT(((T*)buf_data[2])[i].z == max); OCL_ASSERT(((T*)buf_data[2])[i].w == i+2); }else if(func_type == 1) { OCL_ASSERT(((T*)buf_data[2])[i].x == 1); OCL_ASSERT(((T*)buf_data[2])[i].y == 0); OCL_ASSERT(((T*)buf_data[2])[i].z == 0); OCL_ASSERT(((T*)buf_data[2])[i].w == n-i); }else OCL_ASSERT(0); } OCL_UNMAP_BUFFER(2); } } #define compiler_overflow_add(type, subtype, kernel, func_type) \ static void compiler_overflow_add_ ##type(void)\ {\ test(# kernel, func_type);\ }\ MAKE_UTEST_FROM_FUNCTION(compiler_overflow_add_ ## type); #define compiler_overflow_sub(type, subtype, kernel, func_type) \ static void compiler_overflow_sub_ ##type(void)\ {\ test(# kernel, func_type);\ }\ MAKE_UTEST_FROM_FUNCTION(compiler_overflow_sub_ ## type); compiler_overflow_add(ulong4, unsigned long, 
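/* Added note: the trailing 0/1 argument of these macros is func_type: 0 drives the add-with-overflow kernels, 1 the sub-with-overflow ones (see test() above). */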
compiler_overflow_ulong4_add, 0) compiler_overflow_add(uint4, uint32_t, compiler_overflow_uint4_add, 0) compiler_overflow_add(ushort4, uint16_t, compiler_overflow_ushort4_add, 0) compiler_overflow_add(uchar4, uint8_t, compiler_overflow_uchar4_add, 0) // as the llvm overflow intrinsics don't support byte/short types, // we just test uint overflow here. compiler_overflow_sub(uint4, uint32_t, compiler_overflow_uint4_sub, 1) Beignet-1.3.2-Source/utests/compiler_upsample_int.cpp000664 001750 001750 00000001727 13161142102 022103 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_upsample_int(void) { const int n = 32; short src1[n]; unsigned short src2[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_upsample_int"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(short), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(short), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int i = 0; i < n; ++i) { src1[i] = ((short*)buf_data[0])[i] = rand(); src2[i] = ((short*)buf_data[1])[i] = rand(); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(2); for (int i = 0; i < n; ++i) OCL_ASSERT(((int*)buf_data[2])[i] == (int)((src1[i] << 16) | src2[i])); OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_upsample_int); Beignet-1.3.2-Source/utests/sub_buffer.cpp000664 001750 001750 00000011113 13161142102 017621 0ustar00yryr000000 000000 #include "utest_helper.hpp" void sub_buffer_check(void) { cl_int error; cl_ulong max_alloc_size; cl_uint address_align; cl_mem main_buf; cl_mem sub_buf; char *main_buf_content; char sub_buf_content[32]; error = clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(max_alloc_size), &max_alloc_size, NULL); OCL_ASSERT(error == CL_SUCCESS); error = clGetDeviceInfo(device, CL_DEVICE_MEM_BASE_ADDR_ALIGN, sizeof(address_align ), &address_align, NULL ); OCL_ASSERT(error == CL_SUCCESS); max_alloc_size /= 8; main_buf_content = (char *)malloc(sizeof(char) * max_alloc_size); for (cl_ulong i = 0; i < max_alloc_size; i++) { main_buf_content[i] = rand() & 63; } main_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, max_alloc_size, main_buf_content, &error); OCL_ASSERT(error == CL_SUCCESS); /* Test read sub buffer. */ for (cl_ulong sz = max_alloc_size / 4; sz <= max_alloc_size; sz += max_alloc_size / 4) { for (cl_ulong off = 0; off < max_alloc_size; off += 1234 + max_alloc_size / 3) { cl_buffer_region region; region.origin = off; region.size = sz; sub_buf = clCreateSubBuffer(main_buf, 0, CL_BUFFER_CREATE_TYPE_REGION, &region, &error ); /* invalid size, should fail. */ if(off + sz > max_alloc_size) { OCL_ASSERT(error != CL_SUCCESS); continue; } /* invalid align, should fail.
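CL_DEVICE_MEM_BASE_ADDR_ALIGN is expressed in bits, hence the division by 8 below; a sub-buffer origin that is not a multiple of it must make clCreateSubBuffer fail with CL_MISALIGNED_SUB_BUFFER_OFFSET. (Added clarification continuing the original comment.)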
*/ if(off & ((address_align/8)-1)) { OCL_ASSERT(error != CL_SUCCESS); continue; } OCL_ASSERT(error == CL_SUCCESS); error = clEnqueueReadBuffer(queue, sub_buf, CL_TRUE, 0, 32, (void *)sub_buf_content, 0, NULL, NULL); OCL_ASSERT(error == CL_SUCCESS); #if 0 printf("\nRead ########### Src buffer: \n"); for (int i = 0; i < 32; ++i) printf(" %2.2u", main_buf_content[off + i]); printf("\nRead ########### dst buffer: \n"); for (int i = 0; i < 32; ++i) printf(" %2.2u", sub_buf_content[i]); printf("\n"); #endif for (int i = 0; i < 32; ++i) { if (main_buf_content[off + i] != sub_buf_content[i]) { printf ("different index is %d\n", i); OCL_ASSERT(0); } } } } for (cl_ulong sz = max_alloc_size / 4; sz <= max_alloc_size; sz += max_alloc_size / 4) { for (cl_ulong off = 0; off < max_alloc_size; off += 1234 + max_alloc_size / 3) { cl_buffer_region region; region.origin = off; region.size = sz; sub_buf = clCreateSubBuffer(main_buf, 0, CL_BUFFER_CREATE_TYPE_REGION, &region, &error ); /* invalid size, should fail. */ if(off + sz > max_alloc_size) { OCL_ASSERT(error != CL_SUCCESS); continue; } /* invalid align, should fail. */ if(off & (address_align/8-1)) { OCL_ASSERT(error != CL_SUCCESS); continue; } OCL_ASSERT(error == CL_SUCCESS); for (int i = 0; i < 32; i++) { sub_buf_content[i] = rand() & 63; } error = clEnqueueWriteBuffer(queue, main_buf, CL_TRUE, off, 32, sub_buf_content, 0, NULL, NULL); OCL_ASSERT(error == CL_SUCCESS); void * mapped_ptr = clEnqueueMapBuffer(queue, sub_buf, CL_TRUE, (cl_map_flags)( CL_MAP_READ | CL_MAP_WRITE ), 0, 32, 0, NULL, NULL, &error ); OCL_ASSERT(error == CL_SUCCESS); #if 0 printf("\nMap ########### Src buffer: \n"); for (int i = 0; i < 32; ++i) printf(" %2.2u", sub_buf_content[i]); printf("\nMap ########### dst buffer: \n"); for (int i = 0; i < 32; ++i) printf(" %2.2u", ((char *)mapped_ptr)[i]); printf("\n"); #endif for (int i = 0; i < 32; i++) { if (mapped_ptr && ((char *)mapped_ptr)[i] != sub_buf_content[i]) { printf ("different index is %d\n", i); OCL_ASSERT(0); } } error = clEnqueueUnmapMemObject(queue, sub_buf, mapped_ptr, 0, NULL, NULL ); OCL_ASSERT(error == CL_SUCCESS); clReleaseMemObject(sub_buf); } } clReleaseMemObject(main_buf); free(main_buf_content); } MAKE_UTEST_FROM_FUNCTION(sub_buffer_check); Beignet-1.3.2-Source/utests/compiler_multiple_kernels.cpp000664 001750 001750 00000000315 13161142102 022751 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_multiple_kernels(void) { OCL_CREATE_KERNEL_FROM_FILE("compiler_multiple_kernels", "first_kernel"); } MAKE_UTEST_FROM_FUNCTION(compiler_multiple_kernels);Beignet-1.3.2-Source/utests/enqueue_copy_buf_unaligned.cpp000664 001750 001750 00000005667 13161142102 023073 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void test_copy_buf(size_t sz, size_t src_off, size_t dst_off, size_t cb) { unsigned int i; OCL_MAP_BUFFER(0); for (i=0; i < sz; i++) { ((char*)buf_data[0])[i] = (rand() & 31); } OCL_UNMAP_BUFFER(0); OCL_MAP_BUFFER(1); for (i=0; i < sz; i++) { ((char*)buf_data[1])[i] = 64; } OCL_UNMAP_BUFFER(1); if (src_off + cb > sz || dst_off + cb > sz) { /* Expect an error:
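clEnqueueCopyBuffer must reject a copy whose source or destination range runs past its buffer, returning a negative status (CL_INVALID_VALUE), which is all the assertion below relies on. (Added clarification continuing the original comment.)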
*/ OCL_ASSERT(clEnqueueCopyBuffer(queue, buf[0], buf[1], src_off, dst_off, cb*sizeof(char), 0, NULL, NULL)); return; } OCL_ASSERT(!clEnqueueCopyBuffer(queue, buf[0], buf[1], src_off, dst_off, cb*sizeof(char), 0, NULL, NULL)); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); #if 0 printf ("@@@@@@@@@ cb is %d\n", cb); printf ("@@@@@@@@@ src_off is %d\n", src_off); printf ("@@@@@@@@@ dst_off is %d\n", dst_off); printf("\n########### Src buffer: \n"); for (i = 0; i < sz; ++i) printf(" %2.2u", ((unsigned char*)buf_data[0])[i]); printf("\n########### dst buffer: \n"); for (i = 0; i < sz; ++i) printf(" %2.2u", ((unsigned char*)buf_data[1])[i]); #endif // Check results for (i = 0; i < cb; ++i) { if (((char*)buf_data[0])[i +src_off] != ((char*)buf_data[1])[i + dst_off]) { printf ("different index is %d\n", i); OCL_ASSERT(0); } } for (i = 0; i < dst_off; ++i) { if (((char*)buf_data[1])[i] != 64) { printf ("wrong write, different index is %d\n", i); OCL_ASSERT(0); } } for (i = dst_off + cb; i < sz; ++i) { if (((char*)buf_data[1])[i] != 64) { printf ("wrong write, different index is %d\n", i); OCL_ASSERT(0); } } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); } void enqueue_copy_buf_unaligned(void) { size_t i; size_t j; const size_t sz = 1024; unsigned int offset = 0; OCL_CREATE_BUFFER(buf[0], 0, sz * sizeof(char), NULL); OCL_CREATE_BUFFER(buf[1], 0, sz * sizeof(char), NULL); #if 1 /* Test the same offset cases. */ for (i=0; i #include #include #include "utest_helper.hpp" void compiler_long_not_vec8(void) { const size_t n = 64; const int v = 8; int64_t src[n * v]; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_long_not", "compiler_long_not_vec8"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int64_t) * v, NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int64_t) * v, NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int64_t) * v, NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n*v; ++i) { if (i % 3 == 0) src[i] = 0x0UL; else src[i] = ((int64_t)rand() << 32) + rand(); // printf(" 0x%lx", src[i]); } OCL_MAP_BUFFER(0); memcpy(buf_data[0], src, sizeof(int64_t) * n * v); OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); uint64_t res; // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n*v; ++i) { res = 0xffffffffffffffffUL; if (src[i]) res = 0x0; OCL_ASSERT(((uint64_t *)(buf_data[1]))[i] == res); //printf("ref is 0x%lx, result is 0x%lx\n", res, ((int64_t *)(buf_data[1]))[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_long_not_vec8); Beignet-1.3.2-Source/utests/utest_file_map.hpp000664 001750 001750 00000004626 13161142102 020517 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ /** * \file utest_file_map.hpp * * \author Benjamin Segovia */ #ifndef __UTEST_FILE_MAP_HPP__ #define __UTEST_FILE_MAP_HPP__ #include "CL/cl.h" #include /* Map a file into memory for direct / cached / simple accesses */ typedef struct cl_file_map { void *start, *stop; /* First character and last one */ size_t size; /* Total size of the file */ int fd; /* Posix file descriptor */ cl_bool mapped; /* Indicate if a file was mapped or not */ char *name; /* File name */ } cl_file_map_t; /* Report information about an open attempt */ enum { CL_FILE_MAP_SUCCESS = 0, CL_FILE_MAP_FILE_NOT_FOUND = 1, CL_FILE_MAP_FAILED_TO_MMAP = 2 }; /* Allocate and initialize a file mapper (but do not map any file) */ extern cl_file_map_t *cl_file_map_new(void); /* Initialize a file mapper (but do not map any file) */ extern int cl_file_map_init(cl_file_map_t *fm); /* Destroy but do not deallocate a file map */ extern void cl_file_map_destroy(cl_file_map_t *fm); /* Destroy and free it */ extern void cl_file_map_delete(cl_file_map_t *fm); /* Open a file and return the error code */ extern int cl_file_map_open(cl_file_map_t *fm, const char *name); static inline cl_bool cl_file_map_is_mapped(const cl_file_map_t *fm) { return fm->mapped; } static inline const char* cl_file_map_begin(const cl_file_map_t *fm) { return (const char*) fm->start; } static inline const char* cl_file_map_end(const cl_file_map_t *fm) { return (const char*) fm->stop; } static inline size_t cl_file_map_size(const cl_file_map_t *fm) { return fm->size; } #endif /* __UTEST_FILE_MAP_HPP__ */ Beignet-1.3.2-Source/utests/compiler_switch.cpp000664 001750 001750 00000002767 13161142102 020705 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void cpu_compiler_switch(int *dst, int *src, int get_global_id0) { switch (get_global_id0) { case 0: dst[get_global_id0] = src[get_global_id0 + 4]; break; case 1: dst[get_global_id0] = src[get_global_id0 + 14]; break; case 2: dst[get_global_id0] = src[get_global_id0 + 13]; break; case 6: dst[get_global_id0] = src[get_global_id0 + 11]; break; case 7: dst[get_global_id0] = src[get_global_id0 + 10]; break; case 10: dst[get_global_id0] = src[get_global_id0 + 9]; break; case 12: dst[get_global_id0] = src[get_global_id0 + 6]; break; default: dst[get_global_id0] = src[get_global_id0 + 8]; break; } } static void compiler_switch(void) { const size_t n = 32; int cpu_dst[32], cpu_src[32]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_switch"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 32; ++i) cpu_src[i] = ((int32_t*)buf_data[1])[i] = i; OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int i = 0; i < 16; ++i) cpu_compiler_switch(cpu_dst, cpu_src, i); for (int i = 0; i < 16; ++i) OCL_ASSERT(((int32_t*)buf_data[0])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_switch) Beignet-1.3.2-Source/utests/compiler_get_max_sub_group_size.cpp000664 001750 001750 00000001427 13161142102 024143 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_get_max_sub_group_size(void) { if(!cl_check_subgroups()) return; const size_t n = 256; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_get_max_sub_group_size"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int), NULL);
OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) ((int*)buf_data[0])[i] = -1; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(0); int* dst = (int *)buf_data[0]; for (int32_t i = 0; i < (int32_t) n; ++i){ OCL_ASSERT(8 == dst[i] || 16 == dst[i] || 32 == dst[i]); } OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_get_max_sub_group_size); Beignet-1.3.2-Source/utests/runtime_set_kernel_arg.cpp000664 001750 001750 00000001313 13161142102 022227 0ustar00yryr000000 000000 #include "utest_helper.hpp" void runtime_set_kernel_arg(void) { const size_t n = 16; cl_float3 src; src.s[0] = 1; src.s[1] =2; src.s[2] = 3; // Setup kernel and buffers OCL_CREATE_KERNEL("set_kernel_arg"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_float3), &src); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); OCL_MAP_BUFFER(0); // Check results for (uint32_t i = 0; i < n; ++i) { // printf("%d %d\n",i, ((uint32_t*)buf_data[0])[i]); OCL_ASSERT(((uint32_t*)buf_data[0])[i] == src.s[i%3]); } OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(runtime_set_kernel_arg); Beignet-1.3.2-Source/utests/builtin_sub_group_size.cpp000664 001750 001750 00000003071 13161142102 022270 0ustar00yryr000000 000000 /* According to the OpenCL cl_intel_subgroups. Now define local and global size as following: globals[0] = 4; globals[1] = 9; globals[2] = 16; locals[0] = 2; locals[1] = 3; locals[2] = 4; */ #define udebug 0 #include "utest_helper.hpp" static void builtin_sub_group_size(void) { if(!cl_check_subgroups()) return; // Setup kernel and buffers size_t dim, i,local_sz = 1,buf_len = 1; OCL_CREATE_KERNEL("builtin_sub_group_size"); size_t max_sub_sz; OCL_CREATE_BUFFER(buf[0], CL_MEM_READ_WRITE, sizeof(int)*576, NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); for( dim=1; dim <= 3; dim++ ) { buf_len = 1; local_sz = 1; for(i=1; i <= dim; i++) { locals[i - 1] = i + 1; globals[i - 1] = (i + 1) * (i + 1); buf_len *= ((i + 1) * (i + 1)); local_sz *= i + 1; } for(i = dim+1; i <= 3; i++) { globals[i - 1] = 0; locals[i - 1] = 0; } OCL_CALL(utestclGetKernelSubGroupInfoKHR,kernel,device,CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR,sizeof(size_t)*dim,locals,sizeof(size_t),&max_sub_sz,NULL); // Run the kernel OCL_NDRANGE( dim ); clFinish(queue); OCL_MAP_BUFFER(0); for( i = 0; i < buf_len; i++) { size_t expect_sz = (i % local_sz) < (local_sz / max_sub_sz * max_sub_sz) ? 
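/* Added note: lanes inside full subgroups expect max_sub_sz; the remaining lanes form one final, smaller subgroup, hence the modulo in the other branch. */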
max_sub_sz : (local_sz % max_sub_sz); #if udebug printf("%zu get %d, expect %zu\n",i, ((uint32_t*)buf_data[0])[i], expect_sz); #endif OCL_ASSERT( ((uint32_t*)buf_data[0])[i] == expect_sz); } OCL_UNMAP_BUFFER(0); } } MAKE_UTEST_FROM_FUNCTION(builtin_sub_group_size); Beignet-1.3.2-Source/utests/builtin_bitselect.cpp000664 001750 001750 00000002455 13161142102 021214 0ustar00yryr000000 000000 #include "utest_helper.hpp" int as_int(float f) { void *p = &f; return *(int *)p; } int cpu(int a, int b, int c) { return (a & ~c) | (b & c); } void builtin_bitselect(void) { const int n = 32; float src1[n], src2[n], src3[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("builtin_bitselect"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[3], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); for (int i = 0; i < n; ++i) { src1[i] = ((float*)buf_data[0])[i] = rand() * 0.1f; src2[i] = ((float*)buf_data[1])[i] = rand() * 0.1f; src3[i] = ((float*)buf_data[2])[i] = rand() * 0.1f; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); OCL_NDRANGE(1); OCL_MAP_BUFFER(3); for (int i = 0; i < n; ++i) OCL_ASSERT(((int*)buf_data[3])[i] == cpu(as_int(src1[i]), as_int(src2[i]), as_int(src3[i]))); OCL_UNMAP_BUFFER(3); } MAKE_UTEST_FROM_FUNCTION(builtin_bitselect); Beignet-1.3.2-Source/utests/utest_error.c000664 001750 001750 00000007272 13161142102 017527 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
* * Author: Benjamin Segovia */ #include "utest_error.h" #include "CL/cl.h" const char *err_msg[] = { [-CL_SUCCESS] = "CL_SUCCESS", [-CL_DEVICE_NOT_FOUND] = "CL_DEVICE_NOT_FOUND", [-CL_DEVICE_NOT_AVAILABLE] = "CL_DEVICE_NOT_AVAILABLE", [-CL_COMPILER_NOT_AVAILABLE] = "CL_COMPILER_NOT_AVAILABLE", [-CL_MEM_OBJECT_ALLOCATION_FAILURE] = "CL_MEM_OBJECT_ALLOCATION_FAILURE", [-CL_OUT_OF_RESOURCES] = "CL_OUT_OF_RESOURCES", [-CL_OUT_OF_HOST_MEMORY] = "CL_OUT_OF_HOST_MEMORY", [-CL_PROFILING_INFO_NOT_AVAILABLE] = "CL_PROFILING_INFO_NOT_AVAILABLE", [-CL_MEM_COPY_OVERLAP] = "CL_MEM_COPY_OVERLAP", [-CL_IMAGE_FORMAT_MISMATCH] = "CL_IMAGE_FORMAT_MISMATCH", [-CL_IMAGE_FORMAT_NOT_SUPPORTED] = "CL_IMAGE_FORMAT_NOT_SUPPORTED", [-CL_BUILD_PROGRAM_FAILURE] = "CL_BUILD_PROGRAM_FAILURE", [-CL_MAP_FAILURE] = "CL_MAP_FAILURE", [-CL_MISALIGNED_SUB_BUFFER_OFFSET] = "CL_MISALIGNED_SUB_BUFFER_OFFSET", [-CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST] = "CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST", [-CL_INVALID_VALUE] = "CL_INVALID_VALUE", [-CL_INVALID_DEVICE_TYPE] = "CL_INVALID_DEVICE_TYPE", [-CL_INVALID_PLATFORM] = "CL_INVALID_PLATFORM", [-CL_INVALID_DEVICE] = "CL_INVALID_DEVICE", [-CL_INVALID_CONTEXT] = "CL_INVALID_CONTEXT", [-CL_INVALID_QUEUE_PROPERTIES] = "CL_INVALID_QUEUE_PROPERTIES", [-CL_INVALID_COMMAND_QUEUE] = "CL_INVALID_COMMAND_QUEUE", [-CL_INVALID_HOST_PTR] = "CL_INVALID_HOST_PTR", [-CL_INVALID_MEM_OBJECT] = "CL_INVALID_MEM_OBJECT", [-CL_INVALID_IMAGE_FORMAT_DESCRIPTOR] = "CL_INVALID_IMAGE_FORMAT_DESCRIPTOR", [-CL_INVALID_IMAGE_SIZE] = "CL_INVALID_IMAGE_SIZE", [-CL_INVALID_SAMPLER] = "CL_INVALID_SAMPLER", [-CL_INVALID_BINARY] = "CL_INVALID_BINARY", [-CL_INVALID_BUILD_OPTIONS] = "CL_INVALID_BUILD_OPTIONS", [-CL_INVALID_PROGRAM] = "CL_INVALID_PROGRAM", [-CL_INVALID_PROGRAM_EXECUTABLE] = "CL_INVALID_PROGRAM_EXECUTABLE", [-CL_INVALID_KERNEL_NAME] = "CL_INVALID_KERNEL_NAME", [-CL_INVALID_KERNEL_DEFINITION] = "CL_INVALID_KERNEL_DEFINITION", [-CL_INVALID_KERNEL] = "CL_INVALID_KERNEL", [-CL_INVALID_ARG_INDEX] = "CL_INVALID_ARG_INDEX", [-CL_INVALID_ARG_VALUE] = "CL_INVALID_ARG_VALUE", [-CL_INVALID_ARG_SIZE] = "CL_INVALID_ARG_SIZE", [-CL_INVALID_KERNEL_ARGS] = "CL_INVALID_KERNEL_ARGS", [-CL_INVALID_WORK_DIMENSION] = "CL_INVALID_WORK_DIMENSION", [-CL_INVALID_WORK_GROUP_SIZE] = "CL_INVALID_WORK_GROUP_SIZE", [-CL_INVALID_WORK_ITEM_SIZE] = "CL_INVALID_WORK_ITEM_SIZE", [-CL_INVALID_GLOBAL_OFFSET] = "CL_INVALID_GLOBAL_OFFSET", [-CL_INVALID_EVENT_WAIT_LIST] = "CL_INVALID_EVENT_WAIT_LIST", [-CL_INVALID_EVENT] = "CL_INVALID_EVENT", [-CL_INVALID_OPERATION] = "CL_INVALID_OPERATION", [-CL_INVALID_GL_OBJECT] = "CL_INVALID_GL_OBJECT", [-CL_INVALID_BUFFER_SIZE] = "CL_INVALID_BUFFER_SIZE", [-CL_INVALID_MIP_LEVEL] = "CL_INVALID_MIP_LEVEL", [-CL_INVALID_GLOBAL_WORK_SIZE] = "CL_INVALID_GLOBAL_WORK_SIZE", [-CL_INVALID_PROPERTY] = "CL_INVALID_PROPERTY" }; const size_t err_msg_n = sizeof(err_msg) / sizeof(err_msg[0]); Beignet-1.3.2-Source/utests/compiler_insert_vector.cpp000664 001750 001750 00000000616 13161142102 022265 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_insert_vector(void) { const size_t n = 2048; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_insert_vector"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int) * 4, NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); } MAKE_UTEST_FROM_FUNCTION(compiler_insert_vector); Beignet-1.3.2-Source/utests/compiler_uint16_copy.cpp000664 001750 001750 00000001774 13161142102 021565 0ustar00yryr000000 
000000 #include "utest_helper.hpp" static void compiler_uint16_copy(void) { const size_t n = 128; // Setup kernel and buffers. Note that uint16 is aligned on 16 bytes // according to the OCL specificatio OCL_CREATE_KERNEL("compiler_uint16_copy"); buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t[16]) * n); for (uint32_t i = 0; i < n; ++i) for (uint32_t j = 0; j < 16; ++j) ((uint32_t*)buf_data[0])[16*i+j] = 16*i+j; OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t[16]), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t[16]), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); // Check result OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 16*n; ++i) OCL_ASSERT(((uint32_t*)buf_data[0])[i] == ((uint32_t*)buf_data[1])[i]); } MAKE_UTEST_FROM_FUNCTION(compiler_uint16_copy); Beignet-1.3.2-Source/utests/compiler_function_constant1.cpp000664 001750 001750 00000002215 13161142102 023213 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_function_constant1(void) { const size_t n = 2048; const uint32_t value = 34; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_function_constant"); OCL_CREATE_BUFFER(buf[0], 0, 75 * sizeof(short), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(uint32_t), &value); OCL_MAP_BUFFER(0); for(uint32_t i = 0; i < 69; ++i) ((short *)buf_data[0])[i] = i; OCL_UNMAP_BUFFER(0); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); OCL_CREATE_BUFFER(buf[2], 0, 101 * sizeof(short), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[2]); OCL_MAP_BUFFER(2); for(uint32_t i = 0; i < 69; ++i) ((short *)buf_data[2])[i] = 2*i; OCL_UNMAP_BUFFER(2); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); OCL_MAP_BUFFER(1); // Check results for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((uint32_t *)buf_data[1])[i] == (value + (i%69)*2)); OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_function_constant1); Beignet-1.3.2-Source/utests/compiler_hadd.cpp000664 001750 001750 00000001712 13161142102 020275 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_hadd(void) { const int n = 32; int src1[n], src2[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_hadd"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int i = 0; i < n; ++i) { src1[i] = ((int*)buf_data[0])[i] = rand(); src2[i] = ((int*)buf_data[1])[i] = rand(); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(2); for (int i = 0; i < n; ++i) { long long a = src1[i]; a += src2[i]; a >>= 1; OCL_ASSERT(((int*)buf_data[2])[i] == (int)a); } OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_hadd); Beignet-1.3.2-Source/utests/compiler_subgroup_reduce.cpp000664 001750 001750 00000042276 13161142102 022604 0ustar00yryr000000 000000 #include #include #include #include #include #include #include #include "utest_helper.hpp" using namespace std; /* set to 1 for debug, output of input-expected data */ #define DEBUG_STDOUT 0 /* NDRANGE */ #define 
WG_GLOBAL_SIZE 30 #define WG_LOCAL_SIZE 30 enum WG_FUNCTION { WG_ANY, WG_ALL, WG_REDUCE_ADD, WG_REDUCE_MIN, WG_REDUCE_MAX }; /* * Generic compute-expected function for op REDUCE/ANY/ALL * and any variable type */ template static void compute_expected(WG_FUNCTION wg_func, T* input, T* expected, size_t SIMD_SIZE, bool IS_HALF) { if(wg_func == WG_ANY) { T wg_predicate = input[0]; for(uint32_t i = 1; i < SIMD_SIZE; i++) wg_predicate = (int)wg_predicate || (int)input[i]; for(uint32_t i = 0; i < SIMD_SIZE; i++) expected[i] = wg_predicate; } else if(wg_func == WG_ALL) { T wg_predicate = input[0]; for(uint32_t i = 1; i < SIMD_SIZE; i++) wg_predicate = (int)wg_predicate && (int)input[i]; for(uint32_t i = 0; i < SIMD_SIZE; i++) expected[i] = wg_predicate; } else if(wg_func == WG_REDUCE_ADD) { T wg_sum = input[0]; if(IS_HALF) { float wg_sum_tmp = 0.0f; for(uint32_t i = 0; i < SIMD_SIZE; i++) { wg_sum_tmp += as_float(__half_to_float(input[i])); } wg_sum = __float_to_half(as_uint(wg_sum_tmp)); } else { for(uint32_t i = 1; i < SIMD_SIZE; i++) wg_sum += input[i]; } for(uint32_t i = 0; i < SIMD_SIZE; i++) expected[i] = wg_sum; } else if(wg_func == WG_REDUCE_MAX) { T wg_max = input[0]; for(uint32_t i = 1; i < SIMD_SIZE; i++) { if (IS_HALF) { wg_max = (as_float(__half_to_float(input[i])) > as_float(__half_to_float(wg_max))) ? input[i] : wg_max; } else wg_max = max(input[i], wg_max); } for(uint32_t i = 0; i < SIMD_SIZE; i++) expected[i] = wg_max; } else if(wg_func == WG_REDUCE_MIN) { T wg_min = input[0]; for(uint32_t i = 1; i < SIMD_SIZE; i++) { if (IS_HALF) { wg_min= (as_float(__half_to_float(input[i])) < as_float(__half_to_float(wg_min))) ? input[i] : wg_min; } else wg_min = min(input[i], wg_min); } for(uint32_t i = 0; i < SIMD_SIZE; i++) expected[i] = wg_min; } } /* * Generic input-expected generate function for op REDUCE/ANY/ALL * and any variable type */ template static void generate_data(WG_FUNCTION wg_func, T* &input, T* &expected, size_t SIMD_SIZE, bool IS_HALF) { input = new T[WG_GLOBAL_SIZE]; expected = new T[WG_GLOBAL_SIZE]; /* base value for all data types */ T base_val = (long)7 << (sizeof(T) * 5 - 3); /* seed for random inputs */ srand (time(NULL)); /* generate inputs and expected values */ for(uint32_t gid = 0; gid < WG_GLOBAL_SIZE; gid += SIMD_SIZE) { #if DEBUG_STDOUT cout << endl << "IN: " << endl; #endif SIMD_SIZE = (gid + SIMD_SIZE) > WG_GLOBAL_SIZE ? 
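/* Added note: this clamp handles the trailing partial subgroup; WG_GLOBAL_SIZE (30) is deliberately not a multiple of the usual 8/16 SIMD widths. */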
WG_GLOBAL_SIZE - gid : SIMD_SIZE; /* input values */ for (uint32_t lid = 0; lid < SIMD_SIZE; lid++) { /* initially 0, augment after */ input[gid + lid] = 0; if (numeric_limits::is_integer) { /* check all data types, test ideal for QWORD types */ input[gid + lid] += ((rand() % 2 - 1) * base_val); /* add trailing random bits, tests GENERAL cases */ input[gid + lid] += (rand() % 112); /* always last bit is 1, ideal test ALL/ANY */ if (IS_HALF) input[gid + lid] = __float_to_half(as_uint((float)input[gid + lid]/2)); } else { input[gid + lid] += rand(); input[gid + lid] += rand() / ((float)RAND_MAX + 1); } #if DEBUG_STDOUT /* output generated input */ cout << setw(4) << input[gid + lid] << ", " ; if((lid + 1) % 8 == 0) cout << endl; #endif } /* expected values */ compute_expected(wg_func, input + gid, expected + gid, SIMD_SIZE, IS_HALF); #if DEBUG_STDOUT /* output expected input */ cout << endl << "EXP: " << endl; for(uint32_t lid = 0; lid < SIMD_SIZE; lid++) { cout << setw(4) << expected[gid + lid] << ", " ; if((lid + 1) % 8 == 0) cout << endl; } cout << endl; #endif } } /* * Generic subgroup utest function for op REDUCE/ANY/ALL * and any variable type */ template static void subgroup_generic(WG_FUNCTION wg_func, T* input, T* expected, bool IS_HALF = false) { /* get simd size */ globals[0] = WG_GLOBAL_SIZE; locals[0] = WG_LOCAL_SIZE; size_t SIMD_SIZE = 0; OCL_CALL(utestclGetKernelSubGroupInfoKHR,kernel,device,CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR,sizeof(size_t)*1,locals,sizeof(size_t),&SIMD_SIZE,NULL); /* input and expected data */ generate_data(wg_func, input, expected, SIMD_SIZE, IS_HALF); /* prepare input for data type */ OCL_CREATE_BUFFER(buf[0], 0, WG_GLOBAL_SIZE * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[1], 0, WG_GLOBAL_SIZE * sizeof(T), NULL); /* set input data for GPU */ OCL_MAP_BUFFER(0); memcpy(buf_data[0], input, WG_GLOBAL_SIZE * sizeof(T)); OCL_UNMAP_BUFFER(0); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); /* run the kernel on GPU */ OCL_NDRANGE(1); /* check if mismatch */ OCL_MAP_BUFFER(1); uint32_t mismatches = 0; for (uint32_t i = 0; i < WG_GLOBAL_SIZE; i++) if(((T *)buf_data[1])[i] != *(expected + i)) { if (IS_HALF) { float num_computed = as_float(__half_to_float(((T *)buf_data[1])[i])); float num_expected = as_float(__half_to_float(*(expected + i))); float num_diff = abs(num_computed - num_expected) / abs(num_expected); if (num_diff > 0.03f) { mismatches++; } #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << num_computed << " != " << num_expected << " diff: " <::is_integer) { mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << ((T *)buf_data[1])[i] << " != " << *(expected + i) << endl; #endif } /* float error is tolerable though */ else { float num_computed = ((T *)buf_data[1])[i]; float num_expected = *(expected + i); float num_diff = abs(num_computed - num_expected) / abs(num_expected); if (num_diff > 0.01f) { mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << ((T *)buf_data[1])[i] << " != " << *(expected + i) << endl; #endif } } } #if DEBUG_STDOUT /* output mismatch count */ cout << "mismatches " << mismatches << endl; #endif OCL_UNMAP_BUFFER(1); OCL_ASSERT(mismatches == 0); } /* * Workgroup any/all utest functions */ void compiler_subgroup_any(void) { if(!cl_check_subgroups()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_any"); subgroup_generic(WG_ANY, 
input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_any); void compiler_subgroup_all(void) { if(!cl_check_subgroups()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_all"); subgroup_generic(WG_ALL, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_all); /* * Workgroup reduce add utest functions */ void compiler_subgroup_reduce_add_int(void) { if(!cl_check_subgroups()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_add_int"); subgroup_generic(WG_REDUCE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_add_int); void compiler_subgroup_reduce_add_uint(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_add_uint"); subgroup_generic(WG_REDUCE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_add_uint); void compiler_subgroup_reduce_add_long(void) { if(!cl_check_subgroups()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_add_long"); subgroup_generic(WG_REDUCE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_reduce_add_long); void compiler_subgroup_reduce_add_ulong(void) { if(!cl_check_subgroups()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_add_ulong"); subgroup_generic(WG_REDUCE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_reduce_add_ulong); void compiler_subgroup_reduce_add_float(void) { if(!cl_check_subgroups()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_add_float"); subgroup_generic(WG_REDUCE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_add_float); void compiler_subgroup_reduce_add_half(void) { if(!cl_check_subgroups()) return; if(!cl_check_half()) return; cl_half *input = NULL; cl_half *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_reduce.cl", "compiler_subgroup_reduce_add_half", SOURCE, "-DHALF"); subgroup_generic(WG_REDUCE_ADD, input, expected, true); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_add_half); void compiler_subgroup_reduce_add_short(void) { if(!cl_check_subgroups_short()) return; cl_short *input = NULL; cl_short *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_add_short"); subgroup_generic(WG_REDUCE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_add_short); void compiler_subgroup_reduce_add_ushort(void) { if(!cl_check_subgroups_short()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_add_ushort"); subgroup_generic(WG_REDUCE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_add_ushort); /* * Workgroup reduce max utest functions */ void compiler_subgroup_reduce_max_int(void) { if(!cl_check_subgroups()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_max_int"); subgroup_generic(WG_REDUCE_MAX, input, expected); } 
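/* Added for reference: the device side driven by these host tests has roughly
 * the shape sketched below. This is only a reader's sketch, assuming the usual
 * subgroup builtins -- the authoritative kernels live in
 * kernels/compiler_subgroup_reduce.cl, which the tests load by name:
 *
 *   kernel void compiler_subgroup_reduce_add_int(global int *src,
 *                                                global int *dst) {
 *     uint i = get_global_id(0);
 *     dst[i] = sub_group_reduce_add(src[i]);
 *   }
 *
 * Every work-item of a subgroup receives the same reduced value, which is
 * exactly what compute_expected() mirrors on the CPU side. */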
MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_max_int); void compiler_subgroup_reduce_max_uint(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_max_uint"); subgroup_generic(WG_REDUCE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_max_uint); void compiler_subgroup_reduce_max_long(void) { if(!cl_check_subgroups()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_max_long"); subgroup_generic(WG_REDUCE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_reduce_max_long); void compiler_subgroup_reduce_max_ulong(void) { if(!cl_check_subgroups()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_max_ulong"); subgroup_generic(WG_REDUCE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_reduce_max_ulong); void compiler_subgroup_reduce_max_float(void) { if(!cl_check_subgroups()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_max_float"); subgroup_generic(WG_REDUCE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_max_float); void compiler_subgroup_reduce_max_half(void) { if(!cl_check_subgroups()) return; if(!cl_check_half()) return; cl_half *input = NULL; cl_half *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_reduce.cl", "compiler_subgroup_reduce_max_half", SOURCE, "-DHALF"); subgroup_generic(WG_REDUCE_MAX, input, expected, true); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_max_half); void compiler_subgroup_reduce_max_short(void) { if(!cl_check_subgroups_short()) return; cl_short *input = NULL; cl_short *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_max_short"); subgroup_generic(WG_REDUCE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_max_short); void compiler_subgroup_reduce_max_ushort(void) { if(!cl_check_subgroups_short()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_max_ushort"); subgroup_generic(WG_REDUCE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_max_ushort); /* * Workgroup reduce min utest functions */ void compiler_subgroup_reduce_min_int(void) { if(!cl_check_subgroups()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_min_int"); subgroup_generic(WG_REDUCE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_min_int); void compiler_subgroup_reduce_min_uint(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_min_uint"); subgroup_generic(WG_REDUCE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_min_uint); void compiler_subgroup_reduce_min_long(void) { if(!cl_check_subgroups()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_min_long"); subgroup_generic(WG_REDUCE_MIN, input, expected); } 
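/* Added note: the long/ulong variants here are registered through MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE instead of MAKE_UTEST_FROM_FUNCTION; going by the helper's name this flags them as known-issue cases the runner reports separately rather than as plain failures -- an inference from the naming, not a description of the macro's exact behaviour. */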
MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_reduce_min_long); void compiler_subgroup_reduce_min_ulong(void) { if(!cl_check_subgroups()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_min_ulong"); subgroup_generic(WG_REDUCE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_subgroup_reduce_min_ulong); void compiler_subgroup_reduce_min_float(void) { if(!cl_check_subgroups()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_min_float"); subgroup_generic(WG_REDUCE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_min_float); void compiler_subgroup_reduce_min_half(void) { if(!cl_check_subgroups()) return; if(!cl_check_half()) return; cl_half *input = NULL; cl_half *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_reduce.cl", "compiler_subgroup_reduce_min_half", SOURCE, "-DHALF"); subgroup_generic(WG_REDUCE_MIN, input, expected, true); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_min_half); void compiler_subgroup_reduce_min_short(void) { if(!cl_check_subgroups_short()) return; cl_short *input = NULL; cl_short *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_min_short"); subgroup_generic(WG_REDUCE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_min_short); void compiler_subgroup_reduce_min_ushort(void) { if(!cl_check_subgroups_short()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_reduce", "compiler_subgroup_reduce_min_ushort"); subgroup_generic(WG_REDUCE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_reduce_min_ushort); Beignet-1.3.2-Source/utests/compiler_long_convert.cpp000664 001750 001750 00000010507 13161142102 022076 0ustar00yryr000000 000000 #include #include #include #include "utest_helper.hpp" // convert shorter integer to 64-bit integer void compiler_long_convert(void) { const size_t n = 16; char src1[n]; short src2[n]; int src3[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_long_convert"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(char), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(short), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[3], 0, n * sizeof(int64_t), NULL); OCL_CREATE_BUFFER(buf[4], 0, n * sizeof(int64_t), NULL); OCL_CREATE_BUFFER(buf[5], 0, n * sizeof(int64_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]); OCL_SET_ARG(4, sizeof(cl_mem), &buf[4]); OCL_SET_ARG(5, sizeof(cl_mem), &buf[5]); globals[0] = n; locals[0] = 16; // Run random tests for (int32_t i = 0; i < (int32_t) n; ++i) { src1[i] = -i; src2[i] = -i; src3[i] = -i; } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); memcpy(buf_data[0], src1, sizeof(src1)); memcpy(buf_data[1], src2, sizeof(src2)); memcpy(buf_data[2], src3, sizeof(src3)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(3); OCL_MAP_BUFFER(4); OCL_MAP_BUFFER(5); int64_t *dst1 = ((int64_t *)buf_data[3]); int64_t *dst2 = ((int64_t *)buf_data[4]); int64_t *dst3 = ((int64_t *)buf_data[5]); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%lx %lx %lx\n", dst1[i], dst2[i], dst3[i]); 
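/* Added note: char, short and int sources must all sign-extend on conversion, so every 64-bit lane is expected to equal -(int64_t)i. */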
OCL_ASSERT(dst1[i] == -(int64_t)i); OCL_ASSERT(dst2[i] == -(int64_t)i); OCL_ASSERT(dst3[i] == -(int64_t)i); } OCL_UNMAP_BUFFER(3); OCL_UNMAP_BUFFER(4); OCL_UNMAP_BUFFER(5); } MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(compiler_long_convert, true); // convert 64-bit integer to shorter integer void compiler_long_convert_2(void) { const size_t n = 16; int64_t src[n]; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_long_convert", "compiler_long_convert_2"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(char), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(short), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[3], 0, n * sizeof(int64_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]); globals[0] = n; locals[0] = 16; // Run random tests for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = -i; } OCL_MAP_BUFFER(3); memcpy(buf_data[3], src, sizeof(src)); OCL_UNMAP_BUFFER(3); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); char *dst1 = ((char *)buf_data[0]); short *dst2 = ((short *)buf_data[1]); int *dst3 = ((int *)buf_data[2]); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%x %x %x\n", dst1[i], dst2[i], dst3[i]); OCL_ASSERT(dst1[i] == -i); OCL_ASSERT(dst2[i] == -i); OCL_ASSERT(dst3[i] == -i); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(compiler_long_convert_2, true); // convert 64-bit integer to 32-bit float void compiler_long_convert_to_float(void) { const size_t n = 16; int64_t src[n]; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_long_convert", "compiler_long_convert_to_float"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int64_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; // Run random tests for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = -(int64_t)i; } OCL_MAP_BUFFER(1); memcpy(buf_data[1], src, sizeof(src)); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); float *dst = ((float *)buf_data[0]); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%f\n", dst[i]); OCL_ASSERT(dst[i] == src[i]); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_long_convert_to_float); Beignet-1.3.2-Source/utests/compiler_local_memory_barrier_2.cpp000664 001750 001750 00000001445 13161142102 024011 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_local_memory_barrier_2(void) { const size_t n = 16*1024; globals[0] = n/2; locals[0] = 256; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_local_memory_barrier_2"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); //OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, locals[0] * 2 * sizeof(uint32_t), NULL); // Run the kernel OCL_NDRANGE(1); OCL_MAP_BUFFER(0); // Check results uint32_t *dst = (uint32_t*)buf_data[0]; for (uint32_t i = 0; i < n; i+=locals[0]) for (uint32_t j = 0; j < locals[0]; ++j) OCL_ASSERT(dst[i+j] == locals[0] - 1 -j); OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_local_memory_barrier_2); Beignet-1.3.2-Source/utests/compiler_radians.cpp000664 001750 001750 00000001402 13161142102 021012 0ustar00yryr000000 
000000 #include "utest_helper.hpp" void compiler_radians(void) { const int n = 32; float src[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_radians"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); for (int i = 0; i < n; ++i) { src[i] = ((float *)buf_data[0])[i] = rand() * 0.01f; } OCL_UNMAP_BUFFER(0); OCL_NDRANGE(1); OCL_MAP_BUFFER(1); for (int i = 0; i < n; ++i) { OCL_ASSERT(((float *)buf_data[1])[i] == src[i] * (3.141592653589793F / 180)); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_radians); Beignet-1.3.2-Source/utests/compiler_subgroup_buffer_block_write.cpp000664 001750 001750 00000017136 13161142102 025167 0ustar00yryr000000 000000 #include #include #include #include "utest_helper.hpp" using namespace std; /* set to 1 for debug, output of input-expected data */ #define DEBUG_STDOUT 0 /* NDRANGE */ #define WG_GLOBAL_SIZE 32 #define WG_LOCAL_SIZE 32 /* * Generic input-expected generate function for block write */ template static void compute_expected(T* input, T* expected, size_t VEC_SIZE, size_t SIMD_SIZE) { for(uint32_t i = 0; i < SIMD_SIZE; i++) for(uint32_t j = 0; j < VEC_SIZE; j++) expected[SIMD_SIZE * j + i] = input[i * VEC_SIZE + j]; } /* * Generic compute-expected function for buffer block write */ template static void generate_data(T* &input, T* &expected, size_t VEC_SIZE, size_t SIMD_SIZE) { /* allocate input and expected arrays */ input = new T[WG_GLOBAL_SIZE * VEC_SIZE]; expected = new T[WG_GLOBAL_SIZE * VEC_SIZE]; /* base value for all data types */ T base_val = (long)7 << (sizeof(T) * 5 - 3); /* seed for random inputs */ srand (time(NULL)); /* generate inputs and expected values */ for(uint32_t gid = 0; gid < WG_GLOBAL_SIZE; gid += SIMD_SIZE) { #if DEBUG_STDOUT cout << endl << "IN: " << endl; #endif SIMD_SIZE = (gid + SIMD_SIZE) > WG_GLOBAL_SIZE ? 
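/* Added note: same tail clamp as in the reduce tests; with WG_GLOBAL_SIZE = 32 it only matters for SIMD widths that do not divide 32. */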
WG_GLOBAL_SIZE - gid : SIMD_SIZE; /* input values */ for(uint32_t lid = 0; lid < SIMD_SIZE; lid++) { for(uint32_t vsz = 0; vsz < VEC_SIZE; vsz++) { /* initially 0, augment after */ input[(gid + lid)*VEC_SIZE + vsz] = 0; /* check all data types, test ideal for QWORD types */ input[(gid + lid)*VEC_SIZE + vsz] += ((rand() % 2 - 1) * base_val); /* add trailing random bits, tests GENERAL cases */ input[(gid + lid)*VEC_SIZE + vsz] += (rand() % 112); //input[(gid + lid)*VEC_SIZE + vsz] = (gid + lid)*VEC_SIZE + vsz; #if DEBUG_STDOUT /* output generated input */ cout << setw(4) << input[(gid + lid)*VEC_SIZE + vsz] << ", " ; if((lid + 1) % 8 == 0) cout << endl; #endif } } /* expected values */ compute_expected(input + gid * VEC_SIZE, expected + gid * VEC_SIZE, VEC_SIZE, SIMD_SIZE); #if DEBUG_STDOUT /* output expected input */ cout << endl << "EXP: " << endl; for(uint32_t lid = 0; lid < SIMD_SIZE ; lid++){ for(uint32_t vsz = 0; vsz < VEC_SIZE; vsz++) cout << setw(4) << expected[(gid + lid)*VEC_SIZE + vsz] << ", " ; if((lid + 1) % 8 == 0) cout << endl; } cout << endl; #endif } } /* * Generic subgroup utest function for buffer block write */ template static void subgroup_generic(T* input, T* expected, size_t VEC_SIZE) { /* get simd size */ globals[0] = WG_GLOBAL_SIZE; locals[0] = WG_LOCAL_SIZE; size_t SIMD_SIZE = 0; OCL_CALL(utestclGetKernelSubGroupInfoKHR,kernel,device,CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR,sizeof(size_t)*1,locals,sizeof(size_t),&SIMD_SIZE,NULL); size_t buf_sz = VEC_SIZE * WG_GLOBAL_SIZE; /* input and expected data */ generate_data(input, expected, VEC_SIZE, SIMD_SIZE); /* prepare input for datatype */ OCL_CREATE_BUFFER(buf[0], 0, buf_sz * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[1], 0, buf_sz * sizeof(T), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); /* set input data for GPU */ OCL_MAP_BUFFER(0); memcpy(buf_data[0], input, buf_sz* sizeof(T)); OCL_UNMAP_BUFFER(0); /* run the kernel on GPU */ OCL_NDRANGE(1); /* check if mismatch */ OCL_MAP_BUFFER(1); uint32_t mismatches = 0; for (uint32_t i = 0; i < buf_sz; i++) if(((T *)buf_data[1])[i] != *(expected + i)) { /* found mismatch, increment */ mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << ((T *)buf_data[1])[i] << " != " << *(expected + i) << endl; #endif } #if DEBUG_STDOUT /* output mismatch count */ cout << "mismatches " << mismatches << endl; #endif OCL_UNMAP_BUFFER(1); OCL_ASSERT(mismatches == 0); free(input); free(expected); } /* * subgroup buffer block write */ void compiler_subgroup_buffer_block_write_ui1(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_buffer_block_write", "compiler_subgroup_buffer_block_write_ui1"); subgroup_generic(input, expected, 1); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_buffer_block_write_ui1); void compiler_subgroup_buffer_block_write_ui2(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_buffer_block_write", "compiler_subgroup_buffer_block_write_ui2"); subgroup_generic(input, expected, 2); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_buffer_block_write_ui2); void compiler_subgroup_buffer_block_write_ui4(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_buffer_block_write", "compiler_subgroup_buffer_block_write_ui4"); subgroup_generic(input, expected, 4); 
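/* Added note: a vector width of 4 corresponds to the uint4 (intel_sub_group_block_write4) flavour of the cl_intel_subgroups block write -- assuming the usual naming of those builtins. */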
} MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_buffer_block_write_ui4); void compiler_subgroup_buffer_block_write_ui8(void) { if(!cl_check_subgroups()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_subgroup_buffer_block_write", "compiler_subgroup_buffer_block_write_ui8"); subgroup_generic(input, expected, 8); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_buffer_block_write_ui8); void compiler_subgroup_buffer_block_write_us1(void) { if(!cl_check_subgroups_short()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_buffer_block_write.cl", "compiler_subgroup_buffer_block_write_us1", SOURCE, "-DSHORT"); subgroup_generic(input, expected, 1); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_buffer_block_write_us1); void compiler_subgroup_buffer_block_write_us2(void) { if(!cl_check_subgroups_short()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_buffer_block_write.cl", "compiler_subgroup_buffer_block_write_us2", SOURCE, "-DSHORT"); subgroup_generic(input, expected, 2); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_buffer_block_write_us2); void compiler_subgroup_buffer_block_write_us4(void) { if(!cl_check_subgroups_short()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_buffer_block_write.cl", "compiler_subgroup_buffer_block_write_us4", SOURCE, "-DSHORT"); subgroup_generic(input, expected, 4); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_buffer_block_write_us4); void compiler_subgroup_buffer_block_write_us8(void) { if(!cl_check_subgroups_short()) return; cl_ushort *input = NULL; cl_ushort *expected = NULL; OCL_CALL(cl_kernel_init, "compiler_subgroup_buffer_block_write.cl", "compiler_subgroup_buffer_block_write_us8", SOURCE, "-DSHORT"); subgroup_generic(input, expected, 8); } MAKE_UTEST_FROM_FUNCTION(compiler_subgroup_buffer_block_write_us8); Beignet-1.3.2-Source/utests/compiler_long_2.cpp000664 001750 001750 00000002653 13161142102 020562 0ustar00yryr000000 000000 #include <cstdint> #include <cstring> #include <cstdio> #include "utest_helper.hpp" void compiler_long_2(void) { const size_t n = 16; int64_t src1[n], src2[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_long_2"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int64_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int64_t), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int64_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; // Run random tests for (int32_t i = 0; i < (int32_t) n; ++i) { src1[i] = ((int64_t)rand() << 32) + rand(); src2[i] = ((int64_t)rand() << 32) + rand(); } src1[4] = 1; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], src1, sizeof(src1)); memcpy(buf_data[1], src2, sizeof(src2)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(2); int64_t *dest = ((int64_t *)buf_data[2]); //for (int32_t i = 0; i < (int32_t) n; ++i) // printf("%lx\n", dest[i]); OCL_ASSERT(0xFEDCBA9876543210UL == (uint64_t)dest[0]); OCL_ASSERT((src1[1] & src2[1]) == dest[1]); OCL_ASSERT((src1[2] | src2[2]) == dest[2]); OCL_ASSERT((src1[3] ^ src2[3]) == dest[3]); OCL_ASSERT(0x1122334455667788L == dest[4]); OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_long_2); Beignet-1.3.2-Source/utests/builtin_sub_group_id.cpp000664 001750 001750 00000002762 13161142102 021720
0ustar00yryr000000 000000 /* According to the OpenCL cl_intel_subgroups extension. Now define local and global size as follows: globals[0] = 4; globals[1] = 9; globals[2] = 16; locals[0] = 2; locals[1] = 3; locals[2] = 4; */ #define udebug 0 #include "utest_helper.hpp" static void builtin_sub_group_id(void) { if(!cl_check_subgroups()) return; // Setup kernel and buffers size_t dim, i,local_sz = 1,buf_len = 1; OCL_CREATE_KERNEL("builtin_sub_group_id"); size_t max_sub_sz; OCL_CREATE_BUFFER(buf[0], CL_MEM_READ_WRITE, sizeof(int)*576, NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); for( dim=1; dim <= 3; dim++ ) { buf_len = 1; local_sz = 1; for(i=1; i <= dim; i++) { locals[i - 1] = i + 1; globals[i - 1] = (i + 1) * (i + 1); buf_len *= ((i + 1) * (i + 1)); local_sz *= i + 1; } for(i = dim+1; i <= 3; i++) { globals[i - 1] = 0; locals[i - 1] = 0; } OCL_CALL(utestclGetKernelSubGroupInfoKHR,kernel,device,CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR,sizeof(size_t)*dim,locals,sizeof(size_t),&max_sub_sz,NULL); // Run the kernel OCL_NDRANGE( dim ); clFinish(queue); OCL_MAP_BUFFER(0); for( i = 0; i < buf_len; i++) { size_t expect_id = (i % local_sz) / max_sub_sz; #if udebug printf("%zu get %d, expect %zu\n",i, ((uint32_t*)buf_data[0])[i], expect_id); #endif OCL_ASSERT( ((uint32_t*)buf_data[0])[i] == expect_id); } OCL_UNMAP_BUFFER(0); } } MAKE_UTEST_FROM_FUNCTION(builtin_sub_group_id); Beignet-1.3.2-Source/utests/compiler_fill_gl_image.cpp000664 001750 001750 00000004470 13161142102 022153 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_fill_gl_image(void) { const size_t w = EGL_WINDOW_WIDTH; const size_t h = EGL_WINDOW_HEIGHT; uint32_t color0 = 0x123456FF; uint32_t color1 = 0x789ABCDE; uint32_t *resultColor0; uint32_t *resultColor1; GLuint tex; if (eglContext == EGL_NO_CONTEXT) { fprintf(stderr, "There is no valid egl context. Ignore this case.\n"); return; } // Setup kernel and images glGenTextures(1, &tex); glBindTexture(GL_TEXTURE_2D, tex); // Must set all the filters to GL_NEAREST!
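// The per-texel OCL_ASSERT checks below require the exact fill colors back
// from glGetTexImage, and with the default mipmapped GL_TEXTURE_MIN_FILTER a
// texture with only levels 0 and 1 defined may be treated as incomplete,
// which can make the CL-from-GL image creation or the GL readback fail.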
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0, GL_RGBA, GL_UNSIGNED_INT_8_8_8_8, NULL); glGenerateMipmap(GL_TEXTURE_2D); glTexImage2D(GL_TEXTURE_2D, 1, GL_RGBA, w/2, h/2, 0, GL_RGBA, GL_UNSIGNED_INT_8_8_8_8, NULL); OCL_CREATE_KERNEL("test_fill_gl_image"); //Create cl image from miplevel 0 OCL_CREATE_GL_IMAGE(buf[0], 0, GL_TEXTURE_2D, 0, tex); // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(color0), &color0); globals[0] = w; globals[1] = h; locals[0] = 16; locals[1] = 16; glFinish(); OCL_ENQUEUE_ACQUIRE_GL_OBJECTS(0); OCL_NDRANGE(2); OCL_FLUSH(); OCL_ENQUEUE_RELEASE_GL_OBJECTS(0); // Check result resultColor0 = new uint32_t[w * h]; if (resultColor0 == NULL) assert(0); glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_INT_8_8_8_8, resultColor0); for (uint32_t j = 0; j < h; ++j) for (uint32_t i = 0; i < w; i++) OCL_ASSERT(resultColor0[j * w + i] == color0); //Create cl image from miplevel 1 OCL_CREATE_GL_IMAGE(buf[1], 0, GL_TEXTURE_2D, 1, tex); OCL_SET_ARG(0, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(1, sizeof(color1), &color1); globals[0] = w/2; globals[1] = h/2; OCL_ENQUEUE_ACQUIRE_GL_OBJECTS(1); OCL_NDRANGE(2); OCL_FLUSH(); OCL_ENQUEUE_RELEASE_GL_OBJECTS(1); // Check result resultColor1 = new uint32_t[(w/2)*(h/2)]; glGetTexImage(GL_TEXTURE_2D, 1, GL_RGBA, GL_UNSIGNED_INT_8_8_8_8, resultColor1); for (uint32_t j = 0; j < h/2; ++j) for (uint32_t i = 0; i < w/2; i++) OCL_ASSERT(resultColor1[j * (w/2) + i] == color1); delete[] resultColor0; delete[] resultColor1; } MAKE_UTEST_FROM_FUNCTION(compiler_fill_gl_image); Beignet-1.3.2-Source/utests/compiler_reqd_sub_group_size.cpp000664 001750 001750 00000003022 13173554000 023452 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include <iostream> #include <sstream> #include <string> using namespace std; void compiler_reqd_sub_group_size(void) { if (!cl_check_reqd_subgroup()) return; size_t param_value_size; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_SUB_GROUP_SIZES_INTEL, 0, NULL, &param_value_size); size_t* param_value = new size_t[param_value_size]; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_SUB_GROUP_SIZES_INTEL, param_value_size, param_value, NULL); const char* opt = "-D SIMD_SIZE="; for( uint32_t i = 0; i < param_value_size / sizeof(size_t) ; ++i) { ostringstream ss; uint32_t simd_size = param_value[i]; ss << opt << simd_size; //cout << "options: " << ss.str() << endl; OCL_CALL(cl_kernel_init, "compiler_reqd_sub_group_size.cl", "compiler_reqd_sub_group_size", SOURCE, ss.str().c_str()); size_t SIMD_SIZE = 0; OCL_CALL(utestclGetKernelSubGroupInfoKHR,kernel,device, CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL,0, NULL,sizeof(size_t),&SIMD_SIZE,NULL); //cout << SIMD_SIZE << " with " << simd_size << endl; OCL_ASSERT(SIMD_SIZE == simd_size); cl_ulong SPILL_SIZE = 0xFFFFFFFF; OCL_CALL(clGetKernelWorkGroupInfo, kernel, device, CL_KERNEL_SPILL_MEM_SIZE_INTEL, sizeof(cl_ulong), &SPILL_SIZE, NULL); //cout << "spill size: " << SPILL_SIZE << endl; OCL_ASSERT(SPILL_SIZE == 0); clReleaseProgram(program); program = NULL; } delete[] param_value; } MAKE_UTEST_FROM_FUNCTION(compiler_reqd_sub_group_size); Beignet-1.3.2-Source/utests/compiler_mul_hi.cpp000664 001750 001750 00000001721 13161142102 020652 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_mul_hi(void) { const int n = 32; int src1[n], src2[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_mul_hi"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int),
NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int i = 0; i < n; ++i) { src1[i] = ((int*)buf_data[0])[i] = rand(); src2[i] = ((int*)buf_data[1])[i] = rand(); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(2); for (int i = 0; i < n; ++i) { long long a = src1[i]; a *= src2[i]; a >>= 32; OCL_ASSERT(((int*)buf_data[2])[i] == (int)a); } OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_mul_hi); Beignet-1.3.2-Source/utests/compiler_unstructured_branch3.cpp000664 001750 001750 00000003301 13161142102 023540 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_unstructured_branch3(void) { const size_t n = 16; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_unstructured_branch3"); buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t) * n); for (uint32_t i = 0; i < n; ++i) ((uint32_t*)buf_data[0])[i] = 2; OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; OCL_NDRANGE(1); // First control flow OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 2); // Second control flow for (uint32_t i = 0; i < n; ++i) ((int32_t*)buf_data[0])[i] = 0; OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((uint32_t*)buf_data[1])[i] == 3); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Third control flow OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 8; ++i) ((int32_t*)buf_data[0])[i] = 2; for (uint32_t i = 8; i < n; ++i) ((int32_t*)buf_data[0])[i] = 0; OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 8; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 2); for (uint32_t i = 8; i < n; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 3); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_unstructured_branch3); Beignet-1.3.2-Source/utests/builtin_global_linear_id.cpp000664 001750 001750 00000003452 13161142102 022502 0ustar00yryr000000 000000 /* According to the OpenCL v2.0 spec, chapter 6.13.1. Now define global size as follows: globals[0] = 3; globals[1] = 4; globals[2] = 5; offsets[0] = 1; offsets[1] = 2; offsets[2] = 3; Kernel: id = get_global_linear_id(0) dimension:1 0 1 2 dimension:2 0 1 2 3 4 5 6 7 8 9 10 11 dimension:3 0 1 2 12 13 14 24 25 26 36 37 38 48 49 50 3 4 5 15 16 17 27 28 29 39 40 41 51 52 53 6 7 8 18 19 20 30 31 32 42 43 44 54 55 56 9 10 11 21 22 23 33 34 35 45 46 47 57 58 59 */ #define udebug 0 #include "utest_helper.hpp" static void builtin_global_linear_id(void) { if (!cl_check_ocl20()) return; // Setup kernel and buffers int dim, err, i, buf_len=1; size_t offsets[3] = {0,0,0}; OCL_CREATE_KERNEL("builtin_global_linear_id"); OCL_CREATE_BUFFER(buf[0], CL_MEM_READ_WRITE, sizeof(int)*80, NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); for( dim=1; dim <= 3; dim++ ) { buf_len = 1; for(i=1; i <= dim; i++) { globals[i - 1] = 2 + i; locals[i - 1] = 2 + i; offsets[i - 1] = i; buf_len *= 2 +
i; } for(i=dim+1; i <= 3; i++) { globals[i - 1] = 0; locals[i - 1] = 0; offsets[i - 1] = 0; } // Run the kernel err = clEnqueueNDRangeKernel(queue, kernel, dim, offsets, globals, locals, 0, NULL, NULL); if (err != CL_SUCCESS) { printf("Error: Failed to execute kernel! %d\n", err); OCL_ASSERT(0); } clFinish(queue); OCL_MAP_BUFFER(0); #if udebug for(i = 0; i < buf_len; i++) { printf("%2d ", ((int*)buf_data[0])[i]); if ((i + 1) % 3 == 0) printf("\n"); } #endif for( i = 0; i < buf_len; i++) OCL_ASSERT( ((int*)buf_data[0])[i] == i); OCL_UNMAP_BUFFER(0); } } MAKE_UTEST_FROM_FUNCTION(builtin_global_linear_id); Beignet-1.3.2-Source/utests/compiler_uint2_copy.cpp000664 001750 001750 00000001560 13161142102 021471 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_uint2_copy(void) { const size_t n = 128; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_uint2_copy"); buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t[2]) * n); for (uint32_t i = 0; i < 2*n; ++i) ((uint32_t*)buf_data[0])[i] = i; OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t[2]), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t[2]), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); // Check result OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 2*n; ++i) OCL_ASSERT(((uint32_t*)buf_data[0])[i] == ((uint32_t*)buf_data[1])[i]); } MAKE_UTEST_FROM_FUNCTION(compiler_uint2_copy); Beignet-1.3.2-Source/utests/compiler_private_data_overflow.cpp000664 001750 001750 00000000663 13161142102 023767 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_private_data_overflow(void) { OCL_CREATE_KERNEL( "compiler_private_data_overflow" ); OCL_CREATE_BUFFER( buf[0], 0, sizeof(cl_int4), NULL ); OCL_SET_ARG( 0, sizeof(cl_mem), &buf[0] ); globals[0] = 64; locals[0] = 32; OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_ASSERT( ((uint32_t *)buf_data[0])[0] == 0 ); OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION( compiler_private_data_overflow ); Beignet-1.3.2-Source/utests/compiler_function_argument2.cpp000664 001750 001750 00000003046 13161142102 023210 0ustar00yryr000000 000000 #include "utest_helper.hpp" #define VECSIZE 8 void compiler_function_argument2(void) { char arg0[8] = { 0 }; unsigned char arg1[8] = { 0 }; short arg2[8] = { 0 }; unsigned short arg3[8] = { 0 }; int arg4[8] = { 0 }; unsigned int arg5[8] = { 0 }; float arg6[8] = { 0 }; for (uint32_t i = 0; i < 8; ++i) { arg0[i] = rand(); arg1[i] = rand(); arg2[i] = rand(); arg3[i] = rand(); arg4[i] = rand(); arg5[i] = rand(); arg6[i] = rand(); } // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_function_argument2"); OCL_CREATE_BUFFER(buf[0], 0, sizeof(float) * 8 * 8, NULL); OCL_SET_ARG(0, sizeof(arg0), arg0); OCL_SET_ARG(1, sizeof(arg1), arg1); OCL_SET_ARG(2, sizeof(arg2), arg2); OCL_SET_ARG(3, sizeof(arg3), arg3); OCL_SET_ARG(4, sizeof(arg4), arg4); OCL_SET_ARG(5, sizeof(arg5), arg5); OCL_SET_ARG(6, sizeof(arg6), arg6); OCL_SET_ARG(7, sizeof(cl_mem), &buf[0]); // Run the kernel globals[0] = 1; locals[0] = 1; OCL_NDRANGE(1); OCL_MAP_BUFFER(0); /* Check results */ float *dst = (float*)buf_data[0]; for (uint32_t i = 0; i < 8; ++i) { OCL_ASSERT((float)arg0[i] == dst[0*8 + i]); OCL_ASSERT((float)arg1[i] == dst[1*8 + i]); OCL_ASSERT((float)arg2[i] == dst[2*8 + i]); OCL_ASSERT((float)arg3[i] == dst[3*8 + i]); OCL_ASSERT((float)arg4[i] == dst[4*8 + i]); OCL_ASSERT((float)arg5[i] 
== dst[5*8 + i]); OCL_ASSERT((float)arg6[i] == dst[6*8 + i]); } OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_function_argument2); Beignet-1.3.2-Source/utests/compiler_saturate_sub.cpp000664 001750 001750 00000010707 13161142102 022102 0ustar00yryr000000 000000 #include "utest_helper.hpp" namespace { const int n = 16; // declaration only, we should create each template specialization for each type. template <typename T> T get_data(int idx, int part); /* the format of test data is as follows: * the first column is A * the second column is B * the third column is the expected result. */ #define DEF_TEMPLATE(TYPE, NAME) \ template <> \ TYPE get_data<TYPE>(int idx, int part) \ { \ static TYPE test_data[n][3] = { \ { 0, 0, 0 }, \ { 0, 1, -1 }, \ { CL_##NAME##_MIN, CL_##NAME##_MIN, 0 }, \ { CL_##NAME##_MAX, CL_##NAME##_MAX, 0 }, \ { -2, CL_##NAME##_MIN, CL_##NAME##_MAX-1 }, \ { -1, CL_##NAME##_MIN, CL_##NAME##_MAX }, \ { 0, CL_##NAME##_MIN, CL_##NAME##_MAX }, \ { 1, CL_##NAME##_MIN, CL_##NAME##_MAX }, \ { -2, CL_##NAME##_MAX, CL_##NAME##_MIN }, \ { -1, CL_##NAME##_MAX, CL_##NAME##_MIN }, \ { 0, CL_##NAME##_MAX, -CL_##NAME##_MAX }, \ { 1, CL_##NAME##_MAX, -CL_##NAME##_MAX+1 }, \ { CL_##NAME##_MIN, CL_##NAME##_MAX, CL_##NAME##_MIN }, \ { CL_##NAME##_MIN, 1, CL_##NAME##_MIN }, \ { CL_##NAME##_MIN, -1, CL_##NAME##_MIN+1 }, \ { CL_##NAME##_MAX, CL_##NAME##_MIN, CL_##NAME##_MAX }, \ }; \ return test_data[idx][part]; \ } \ \ template <> \ u##TYPE get_data<u##TYPE>(int idx, int part) \ { \ static u##TYPE test_data[n][3] = { \ { 0, 0, 0 }, \ { 0, 1, 0 }, \ { 1, 1, 0 }, \ { 1, 0, 1 }, \ { CL_U##NAME##_MAX, CL_U##NAME##_MAX, 0 }, \ { 0, CL_U##NAME##_MAX, 0 }, \ { 1, CL_U##NAME##_MAX, 0 }, \ { CL_U##NAME##_MAX, 0, CL_U##NAME##_MAX }, \ }; \ return test_data[idx][part]; \ } DEF_TEMPLATE(int8_t, CHAR) DEF_TEMPLATE(int16_t, SHRT) DEF_TEMPLATE(int32_t, INT) //DEF_TEMPLATE(int64_t, LONG) template <typename T> void test(const char *kernel_name) { T C[n] = { 0 }; T A[n] = { 0 }; T B[n] = { 0 }; for (int i = 0; i < n; i++) { A[i] = get_data<T>(i, 0); B[i] = get_data<T>(i, 1); } OCL_CREATE_KERNEL_FROM_FILE("compiler_saturate_sub", kernel_name); OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(T), &C[0]); OCL_CREATE_BUFFER(buf[1], CL_MEM_COPY_HOST_PTR, n * sizeof(T), &A[0]); OCL_CREATE_BUFFER(buf[2], CL_MEM_COPY_HOST_PTR, n * sizeof(T), &B[0]); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = n; OCL_NDRANGE(1); OCL_MAP_BUFFER(0); for (int i = 0; i < n; i++) { OCL_ASSERT(((T*)buf_data[0])[i] == get_data<T>(i, 2)); } OCL_UNMAP_BUFFER(0); } } #define compiler_saturate_sub(type, kernel) \ static void compiler_saturate_sub_ ##type(void)\ {\ test<type>(# kernel);\ }\ MAKE_UTEST_FROM_FUNCTION(compiler_saturate_sub_ ## type); compiler_saturate_sub(int8_t, test_char) compiler_saturate_sub(uint8_t, test_uchar) compiler_saturate_sub(int16_t, test_short) compiler_saturate_sub(uint16_t, test_ushort) compiler_saturate_sub(int32_t, test_int) compiler_saturate_sub(uint32_t, test_uint) //compiler_saturate_sub(int64_t, test_long) //compiler_saturate_sub(uint64_t, test_ulong) Beignet-1.3.2-Source/utests/compiler_insn_selection_max.cpp000664 001750 001750 00000001723 13161142102 023260 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include <algorithm> static void compiler_insn_selection_max(void) { const size_t n = 8192 * 4; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_insn_selection_max"); buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t) * n); for (uint32_t i =
0; i < n; ++i) ((float*)buf_data[0])[i] = float(i); OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); // Check result OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); float *dst = (float*)buf_data[1]; float *src = (float*)buf_data[0]; for (uint32_t i = 0; i < n; ++i) { OCL_ASSERT(dst[i] == std::max(src[i], src[0])); } } MAKE_UTEST_FROM_FUNCTION(compiler_insn_selection_max) Beignet-1.3.2-Source/utests/compiler_abs.cpp000664 001750 001750 00000016066 13161142102 020152 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include "string.h" template <typename T, int N> struct cl_vec { T ptr[((N+1)/2)*2]; //align to 2 elements. typedef cl_vec<T, N> vec_type; cl_vec(void) { memset(ptr, 0, sizeof(T) * ((N+1)/2)*2); } cl_vec(vec_type & other) { memset(ptr, 0, sizeof(T) * ((N+1)/2)*2); memcpy (this->ptr, other.ptr, sizeof(T) * N); } vec_type& operator= (vec_type & other) { memset(ptr, 0, sizeof(T) * ((N+1)/2)*2); memcpy (this->ptr, other.ptr, sizeof(T) * N); return *this; } template <typename U> vec_type& operator= (cl_vec<U, N> & other) { memset(ptr, 0, sizeof(T) * ((N+1)/2)*2); memcpy (this->ptr, other.ptr, sizeof(T) * N); return *this; } bool operator== (vec_type & other) { return !memcmp (this->ptr, other.ptr, sizeof(T) * N); } void abs(void) { int i = 0; for (; i < N; i++) { T f = ptr[i]; f = f < 0 ? -f : f; ptr[i] = f; } } }; template <typename T, typename U, int N> static void cpu (int global_id, cl_vec<T, N> *src, cl_vec<U, N> *dst) { cl_vec<T, N> v = src[global_id]; v.abs(); dst[global_id] = v; } template <typename T, typename U> static void cpu(int global_id, T *src, U *dst) { T f = src[global_id]; f = f < 0 ?
-f : f; dst[global_id] = (U)f; } template <typename T, int N> static void gen_rand_val (cl_vec<T, N>& vect) { int i = 0; memset(vect.ptr, 0, sizeof(T) * ((N+1)/2)*2); for (; i < N; i++) { vect.ptr[i] = static_cast<T>((rand() & 63) - 32); } } template <typename T> static void gen_rand_val (T & val) { val = static_cast<T>((rand() & 63) - 32); } template <typename T> inline static void print_data (T& val) { if (std::is_unsigned<T>::value) printf(" %u", val); else printf(" %d", val); } template <typename T, typename U, int N> static void dump_data (cl_vec<T, N>* src, cl_vec<U, N>* dst, int n) { U* val = reinterpret_cast<U*>(dst); n = n*((N+1)/2)*2; printf("\nRaw: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((T *)buf_data[0])[i]); } printf("\nCPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(val[i]); } printf("\nGPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((U *)buf_data[1])[i]); } } template <typename T, typename U> static void dump_data (T* src, U* dst, int n) { printf("\nRaw: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((T *)buf_data[0])[i]); } printf("\nCPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(dst[i]); } printf("\nGPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(((U *)buf_data[1])[i]); } } template <typename T> static void check_result(T* actual, T* expected) { OCL_ASSERT(*actual == *expected); } template <typename T, int N> static void check_result(cl_vec<T, N>* actual, cl_vec<T, N>* expected) { OCL_ASSERT(!memcmp(actual, expected, sizeof(T)*N)); } template <typename T, typename U> static void compiler_abs_with_type(void) { const size_t n = 16; U cpu_dst[16]; T cpu_src[16]; // Setup buffers OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(T), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); /* Clear the dst buffer to avoid random data.
*/ OCL_MAP_BUFFER(1); memset(buf_data[1], 0, sizeof(U) * n); OCL_UNMAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { gen_rand_val(cpu_src[i]); } memcpy(buf_data[0], cpu_src, sizeof(T) * n); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, cpu_src, cpu_dst); // Compare OCL_MAP_BUFFER(1); // dump_data(cpu_src, cpu_dst, n); U* actual = (U*)buf_data[1]; U* expected = cpu_dst; for (size_t i = 0; i < n; ++i) check_result(&actual[i], &expected[i]); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(0); } } #define ABS_TEST_TYPE_1(TYPE, UTYPE, KEEP_PROGRAM) \ static void compiler_abs_##TYPE (void) \ { \ OCL_CALL (cl_kernel_init, "compiler_abs.cl", "compiler_abs_"#TYPE, SOURCE, NULL); \ compiler_abs_with_type<TYPE, UTYPE>(); \ } \ MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(compiler_abs_##TYPE, KEEP_PROGRAM); #define ABS_TEST_TYPE(TYPE, UTYPE) ABS_TEST_TYPE_1(TYPE, UTYPE, true) #define ABS_TEST_TYPE_END(TYPE, UTYPE) ABS_TEST_TYPE_1(TYPE, UTYPE, false) typedef unsigned char uchar; typedef unsigned short ushort; typedef unsigned int uint; ABS_TEST_TYPE(int, uint) ABS_TEST_TYPE(short, ushort) ABS_TEST_TYPE(char, uchar) ABS_TEST_TYPE(uint, uint) ABS_TEST_TYPE(ushort, ushort) ABS_TEST_TYPE(uchar, uchar) typedef cl_vec<int, 2> int2; typedef cl_vec<int, 3> int3; typedef cl_vec<int, 4> int4; typedef cl_vec<int, 8> int8; typedef cl_vec<int, 16> int16; typedef cl_vec<uint, 2> uint2; typedef cl_vec<uint, 3> uint3; typedef cl_vec<uint, 4> uint4; typedef cl_vec<uint, 8> uint8; typedef cl_vec<uint, 16> uint16; ABS_TEST_TYPE(int2, uint2) ABS_TEST_TYPE(int3, uint3) ABS_TEST_TYPE(int4, uint4) ABS_TEST_TYPE(int8, uint8) ABS_TEST_TYPE(int16, uint16) ABS_TEST_TYPE(uint2, uint2) ABS_TEST_TYPE(uint3, uint3) ABS_TEST_TYPE(uint4, uint4) ABS_TEST_TYPE(uint8, uint8) ABS_TEST_TYPE(uint16, uint16) typedef cl_vec<char, 2> char2; typedef cl_vec<char, 3> char3; typedef cl_vec<char, 4> char4; typedef cl_vec<char, 8> char8; typedef cl_vec<char, 16> char16; typedef cl_vec<uchar, 2> uchar2; typedef cl_vec<uchar, 3> uchar3; typedef cl_vec<uchar, 4> uchar4; typedef cl_vec<uchar, 8> uchar8; typedef cl_vec<uchar, 16> uchar16; ABS_TEST_TYPE(char2, uchar2) ABS_TEST_TYPE(char3, uchar3) ABS_TEST_TYPE(char4, uchar4) ABS_TEST_TYPE(char8, uchar8) ABS_TEST_TYPE(char16, uchar16) ABS_TEST_TYPE(uchar2, uchar2) ABS_TEST_TYPE(uchar3, uchar3) ABS_TEST_TYPE(uchar4, uchar4) ABS_TEST_TYPE(uchar8, uchar8) ABS_TEST_TYPE(uchar16, uchar16) typedef cl_vec<short, 2> short2; typedef cl_vec<short, 3> short3; typedef cl_vec<short, 4> short4; typedef cl_vec<short, 8> short8; typedef cl_vec<short, 16> short16; typedef cl_vec<ushort, 2> ushort2; typedef cl_vec<ushort, 3> ushort3; typedef cl_vec<ushort, 4> ushort4; typedef cl_vec<ushort, 8> ushort8; typedef cl_vec<ushort, 16> ushort16; ABS_TEST_TYPE(short2, ushort2) ABS_TEST_TYPE(short3, ushort3) ABS_TEST_TYPE(short4, ushort4) ABS_TEST_TYPE(short8, ushort8) ABS_TEST_TYPE(short16, ushort16) ABS_TEST_TYPE(ushort2, ushort2) ABS_TEST_TYPE(ushort3, ushort3) ABS_TEST_TYPE(ushort4, ushort4) ABS_TEST_TYPE(ushort8, ushort8) ABS_TEST_TYPE_END(ushort16, ushort16) Beignet-1.3.2-Source/utests/compiler_sub_group_shuffle_up.cpp000664 001750 001750 00000005621 13161142102 023625 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_sub_group_shuffle_up_int(void) { if(!cl_check_subgroups()) return; const size_t n = 32; const int32_t buf_size = 4 * n + 1; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_sub_group_shuffle_up", "compiler_sub_group_shuffle_up_int"); OCL_CREATE_BUFFER(buf[0], 0, buf_size * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); int c = 3; OCL_SET_ARG(1, sizeof(int), &c); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); for (int32_t i = 0; i < buf_size; ++i) ((int*)buf_data[0])[i] = -1; OCL_UNMAP_BUFFER(0); // Run the
kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(0); int* dst = (int *)buf_data[0]; int subgroupsize = dst[0]; OCL_ASSERT(subgroupsize == 8 || subgroupsize == 16); dst++; for (int32_t i = 0; i < (int32_t) n; ++i){ int round = i / subgroupsize; int index = i % subgroupsize; //printf("%d %d %d %d\n",dst[4*i], dst[4*i+1], dst[4*i+2], dst[4*i+3]); OCL_ASSERT( ((c - index) > 0 ? 123 : 456) == dst[4*i]); OCL_ASSERT( ((c - index) > 0 ? 123 : (i - c)) == dst[4*i+1]); OCL_ASSERT( ((subgroupsize - index - 1 - index) > 0 ? (i + index + 1) : -(i + index + 1 - subgroupsize)) == dst[4*i+2]); OCL_ASSERT((round * subgroupsize + (subgroupsize - 1)) == dst[4*i+3]); } OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_sub_group_shuffle_up_int); void compiler_sub_group_shuffle_up_short(void) { if(!cl_check_subgroups_short()) return; const size_t n = 32; const int32_t buf_size = 4 * n + 1; // Setup kernel and buffers OCL_CALL(cl_kernel_init, "compiler_sub_group_shuffle_up.cl", "compiler_sub_group_shuffle_up_short", SOURCE, "-DSHORT"); OCL_CREATE_BUFFER(buf[0], 0, buf_size * sizeof(short), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); int c = 3; OCL_SET_ARG(1, sizeof(int), &c); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); for (int32_t i = 0; i < buf_size; ++i) ((short*)buf_data[0])[i] = -1; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(0); short* dst = (short *)buf_data[0]; short subgroupsize = dst[0]; OCL_ASSERT(subgroupsize == 8 || subgroupsize == 16); dst++; for (int32_t i = 0; i < (int32_t) n; ++i){ int round = i / subgroupsize; int index = i % subgroupsize; //printf("%d %d %d %d\n",dst[4*i], dst[4*i+1], dst[4*i+2], dst[4*i+3]); OCL_ASSERT( ((c - index) > 0 ? 123 : 456) == dst[4*i]); OCL_ASSERT( ((c - index) > 0 ? 123 : (i - c)) == dst[4*i+1]); OCL_ASSERT( ((subgroupsize - index - 1 - index) > 0 ? (i + index + 1) : -(i + index + 1 - subgroupsize)) == dst[4*i+2]); OCL_ASSERT((round * subgroupsize + (subgroupsize - 1)) == dst[4*i+3]); } OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_sub_group_shuffle_up_short); Beignet-1.3.2-Source/utests/utest.hpp000664 001750 001750 00000012334 13161142102 016656 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>. * * Author: Benjamin Segovia */ /** * \file utest.hpp * \author Benjamin Segovia * * Provides all unit test capabilities. It is rather rudimentary but it should * do the job */ #ifndef __UTEST_UTEST_HPP__ #define __UTEST_UTEST_HPP__ #include "utest_exception.hpp" #include <vector> #include <iostream> #include <iomanip> /*! struct for statistics */ struct RStatistics { size_t passCount; size_t failCount; size_t finishrun; size_t actualrun; }; /*! Quick and dirty unit test system with registration */ struct UTest { /*! A unit test function to run */ typedef void (*Function) (void); /*! Empty test */ UTest(void); /*!
Build a new unit test and append it to the unit test list */ UTest(Function fn, const char *name, bool isBenchMark = false, bool haveIssue = false, bool needDestroyProgram = true); /*! Function to execute */ Function fn; /*! Name of the test */ const char *name; /*! numbers of the jobs */ const char *number; /*! whether it is a bench mark. */ bool isBenchMark; /*! Indicate whether current test cases has issue to be fixes */ bool haveIssue; /*! Indicate whether destroy kernels/program. */ bool needDestroyProgram; /*! The tests that are registered */ static std::vector *utestList; /*! Run the test with the given name */ static void run(const char *name); /*! Run the test with the given name */ static void runMultiThread(const char *number); /*! Run all the tests without known issue*/ static void runAllNoIssue(void); /*! Run all the benchmark. */ static void runAllBenchMark(void); /*! Run all the tests */ static void runAll(void); /*! List all test cases */ static void listAllCases(void); /*! List test cases that can run*/ static void listCasesCanRun(void); /*! List test cases with issue*/ static void listCasesWithIssue(void); /*! Statistics struct */ static RStatistics retStatistics; /*! Do run a test case actually */ static void do_run(struct UTest utest); }; /*! Register a new unit test */ #define UTEST_REGISTER(FN) static const UTest __##FN##__(FN, #FN); #define MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(FN, KEEP_PROGRAM) \ static void __ANON__##FN##__(void) { UTEST_EXPECT_SUCCESS(FN()); } \ static const UTest __##FN##__(__ANON__##FN##__, #FN, false, false, !(KEEP_PROGRAM)); /*! Turn a function into a unit test */ #define MAKE_UTEST_FROM_FUNCTION(FN) \ static void __ANON__##FN##__(void) { UTEST_EXPECT_SUCCESS(FN()); } \ static const UTest __##FN##__(__ANON__##FN##__, #FN); /*! Register a test case which has issue to be fixed */ #define MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(FN) \ static void __ANON__##FN##__(void) { UTEST_EXPECT_SUCCESS(FN()); } \ static const UTest __##FN##__(__ANON__##FN##__, #FN, false ,true); /*! Turn a function into a unit performance test */ #define MAKE_BENCHMARK_FROM_FUNCTION_KEEP_PROGRAM(FN, KEEP_PROGRAM, ...) \ static void __ANON__##FN##__(void) { BENCHMARK(FN(), __VA_ARGS__); } \ static const UTest __##FN##__(__ANON__##FN##__, #FN, true, false, !(KEEP_PROGRAM)); #define MAKE_BENCHMARK_FROM_FUNCTION(FN, ...) \ static void __ANON__##FN##__(void) { BENCHMARK(FN(), __VA_ARGS__); } \ static const UTest __##FN##__(__ANON__##FN##__, #FN, true); /*! No assert is expected */ #define UTEST_EXPECT_SUCCESS(EXPR) \ do { \ try { \ EXPR; \ std::cout << " [SUCCESS]" << std::endl; \ UTest::retStatistics.passCount += 1; \ } \ catch (Exception e) { \ std::cout << " [FAILED]" << std::endl; \ std::cout << " " << e.what() << std::endl; \ UTest::retStatistics.failCount++; \ } \ } while (0) #define UTEST_EXPECT_FAILED(EXPR) \ do { \ try { \ EXPR; \ std::cout << " [FAILED]" << std::endl; \ retStatistics.failCount++; \ } \ catch (gbe::Exception e) { \ std::cout << " [SUCCESS]" << std::endl; \ retStatistics.passCount++; \ } \ } while (0) #define BENCHMARK(EXPR, ...) 
\ do { \ double ret = 0;\ try { \ ret = EXPR; \ std::cout << " [Result: " << std::fixed<< std::setprecision(3) << ret << " " << __VA_ARGS__ << "] [SUCCESS]" << std::endl; \ UTest::retStatistics.passCount += 1; \ } \ catch (Exception e) { \ std::cout << " " << #EXPR << " [FAILED]" << std::endl; \ std::cout << " " << e.what() << std::endl; \ UTest::retStatistics.failCount++; \ } \ } while (0) #define BANDWIDTH(BYTES, MSEC) \ ((double)(BYTES)) / ((MSEC) * 1e6); #endif /* __UTEST_UTEST_HPP__ */ Beignet-1.3.2-Source/utests/compiler_unstructured_branch0.cpp000664 001750 001750 00000003117 13161142102 023542 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_unstructured_branch0(void) { const size_t n = 32; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_unstructured_branch0"); buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t) * n); for (uint32_t i = 0; i < n; ++i) ((uint32_t*)buf_data[0])[i] = 2; OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; OCL_NDRANGE(1); // First control flow OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 16; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 2); for (uint32_t i = 16; i < 32; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 1); // Second control flow for (uint32_t i = 0; i < n; ++i) ((int32_t*)buf_data[0])[i] = -2; OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 32; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 1); // Third control flow for (uint32_t i = 0; i < 8; ++i) ((int32_t*)buf_data[0])[i] = 2; OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 8; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 2); for (uint32_t i = 8; i < 32; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 1); } MAKE_UTEST_FROM_FUNCTION(compiler_unstructured_branch0); Beignet-1.3.2-Source/utests/compiler_write_only.cpp000664 001750 001750 00000002423 13161142102 021570 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see <http://www.gnu.org/licenses/>.
* * Author: Benjamin Segovia */ #include "utest_helper.hpp" static void compiler_write_only(void) { const size_t n = 2048; // Setup kernel and buffers OCL_CREATE_KERNEL("test_write_only"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); OCL_MAP_BUFFER(0); // Check results for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((uint32_t*)buf_data[0])[i] == i); } MAKE_UTEST_FROM_FUNCTION(compiler_write_only); Beignet-1.3.2-Source/utests/builtin_exp.cpp000664 001750 001750 00000006257 13161142102 020036 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include <cmath> #include <algorithm> #define udebug 0 #define FLT_MAX 0x1.fffffep127f #define FLT_MIN ldexpf(1.0,-126) #define FLT_ULP (1.0e-6f) #define printf_c(...) \ {\ printf("\033[1m\033[40;31m");\ printf( __VA_ARGS__ );\ printf("\033[0m");\ } namespace{ float input_data[] = {FLT_MAX, -FLT_MAX, FLT_MIN, -FLT_MIN, 80, -80, 3.14, -3.14, -0.5, 0.5, 1, -1, 0.0 }; const int count_input = sizeof(input_data) / sizeof(input_data[0]); const int max_function = 5; static void cpu_compiler_math(float *dst, const float *src) { const float x = *src; dst[0] = exp(x); dst[1] = exp2(x); dst[2] = exp10(x); dst[3] = expm1(x); dst[4] = x; } static void builtin_exp(void) { // Setup kernel and buffers int k, i, index_cur; float gpu_data[max_function * count_input] = {0}, cpu_data[max_function * count_input] = {0}; float diff; char log[256] = {0}; OCL_CREATE_KERNEL("builtin_exp"); OCL_CREATE_BUFFER(buf[0], CL_MEM_READ_WRITE, count_input * max_function * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], CL_MEM_READ_WRITE, count_input * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[2], CL_MEM_READ_WRITE, sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = count_input; locals[0] = 1; clEnqueueWriteBuffer( queue, buf[1], CL_TRUE, 0, count_input * sizeof(float), input_data, 0, NULL, NULL); int maxfunc = max_function; clEnqueueWriteBuffer( queue, buf[2], CL_TRUE, 0, sizeof(int), &maxfunc, 0, NULL, NULL); // Run the kernel OCL_NDRANGE( 1 ); clEnqueueReadBuffer( queue, buf[0], CL_TRUE, 0, sizeof(float) * max_function * count_input, gpu_data, 0, NULL, NULL); for (k = 0; (uint)k < count_input; k++) { cpu_compiler_math( cpu_data + k * max_function, input_data + k); for (i = 0; i < max_function; i++) { index_cur = k * max_function + i; diff = fabs(gpu_data[index_cur]-cpu_data[index_cur]); sprintf(log, "%d/%d: %f -> gpu:%f cpu:%f diff:%f expect:%f\n", \ k, i, input_data[k], gpu_data[index_cur], cpu_data[index_cur], \ diff/gpu_data[index_cur], 3 * FLT_ULP); #if udebug if (std::isinf(cpu_data[index_cur]) && std::isinf(gpu_data[index_cur])){ printf(log); } else if (std::isnan(cpu_data[index_cur]) && std::isnan(gpu_data[index_cur])){ printf(log); } else if( diff / cpu_data[index_cur] < 3 * FLT_ULP \ && ( gpu_data[index_cur] > FLT_ULP || cpu_data[index_cur] > FLT_ULP )){ printf(log); } else if ( gpu_data[index_cur] < FLT_ULP && cpu_data[index_cur] < FLT_ULP) printf(log); else printf_c(log); #else if (std::isinf(cpu_data[index_cur])) OCL_ASSERTM(std::isinf(gpu_data[index_cur]), log); else if (std::isnan(cpu_data[index_cur])) OCL_ASSERTM(std::isnan(gpu_data[index_cur]), log); else if ( gpu_data[index_cur] > FLT_ULP || cpu_data[index_cur] > FLT_ULP) OCL_ASSERTM(fabs( diff / cpu_data[index_cur]) < 3 * FLT_ULP, log); else OCL_ASSERTM(fabs(diff) < 3 * FLT_ULP, log); #endif } }
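/* The #else branch above encodes the tolerance policy: infinities and NaNs
 * must match in kind, values clearly above FLT_ULP are compared by relative
 * error, and near-zero values by absolute error. The same policy as a
 * standalone predicate (illustrative only, kept out of the build): */
#if 0
static bool exp_result_acceptable(float gpu, float cpu) {
  if (std::isinf(cpu)) return std::isinf(gpu);   /* inf must stay inf */
  if (std::isnan(cpu)) return std::isnan(gpu);   /* nan must stay nan */
  float diff = fabsf(gpu - cpu);
  if (gpu > FLT_ULP || cpu > FLT_ULP)
    return fabsf(diff / cpu) < 3 * FLT_ULP;      /* relative error */
  return diff < 3 * FLT_ULP;                     /* absolute, near zero */
}
#endif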
MAKE_UTEST_FROM_FUNCTION(builtin_exp) } Beignet-1.3.2-Source/utests/compiler_lower_return0.cpp000664 001750 001750 00000002743 13161142102 022211 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_lower_return0(void) { const size_t n = 32; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_lower_return0"); buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t) * n); for (uint32_t i = 0; i < n; ++i) ((uint32_t*)buf_data[0])[i] = 2; OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); // First control flow OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int32_t i = 0; i < 32; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == i); // Second control flow for (uint32_t i = 0; i < n; ++i) ((int32_t*)buf_data[0])[i] = -2; OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 32; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == -2); // Third control flow for (uint32_t i = 0; i < 8; ++i) ((int32_t*)buf_data[0])[i] = 2; OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int32_t i = 0; i < 8; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == i); for (int32_t i = 8; i < 32; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == -2); } MAKE_UTEST_FROM_FUNCTION(compiler_lower_return0); Beignet-1.3.2-Source/utests/compiler_integer_remainder.cpp000664 001750 001750 00000002250 13161142102 023056 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void cpu(int global_id, int *src, int *dst, int x) { dst[global_id] = src[global_id] % x; } void compiler_integer_remainder(void) { const size_t n = 16; int cpu_dst[16], cpu_src[16]; const int x = 7; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_integer_remainder"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(x), &x); globals[0] = 16; locals[0] = 16; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((int32_t*)buf_data[0])[i] = rand() % 16; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i <(int32_t) n; ++i) cpu(i, cpu_src, cpu_dst, x); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < 11; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(compiler_integer_remainder); Beignet-1.3.2-Source/utests/compiler_box_blur_float.cpp000664 001750 001750 00000003374 13161142102 022404 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include <algorithm> static int *tmp = NULL; static struct float4 {float x,y,z,w;} *src = NULL, *dst = NULL; static int w = 0; static int h = 0; static int sz = 0; static const size_t chunk = 64; static void compiler_box_blur_float() { OCL_CREATE_KERNEL("compiler_box_blur_float"); /* Load the picture */ tmp = cl_read_bmp("sample.bmp", &w, &h); if(tmp == NULL) return; sz = w * h * sizeof(float[4]); src = (float4*)malloc(sz); /* RGBA -> float4 conversion */ const int n = w*h; for (int i = 0; i < n; ++i) { src[i].x = (float) (tmp[i] & 0xff); src[i].y = (float) ((tmp[i] >> 8) & 0xff);
src[i].z = (float) ((tmp[i] >> 16) & 0xff); src[i].w = 0.f; } free(tmp); /* Run the kernel */ OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, sz, src); OCL_CREATE_BUFFER(buf[1], 0, sz, NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(int), &w); OCL_SET_ARG(3, sizeof(int), &h); OCL_SET_ARG(4, sizeof(int), &chunk); globals[0] = size_t(w); globals[1] = h/chunk + ((h%chunk)?1:0); locals[0] = 16; locals[1] = 1; free(src); OCL_NDRANGE(2); OCL_MAP_BUFFER(1); dst = (float4*) buf_data[1]; /* Convert back to RGBA and save */ int *tmp = (int*) malloc(n*sizeof(int)); for (int i = 0; i < n; ++i) { int to = int(std::min(dst[i].x, 255.f)); to |= int(std::min(dst[i].y, 255.f)) << 8; to |= int(std::min(dst[i].z, 255.f)) << 16; tmp[i] = to; } /* Save the image (for debug purpose) */ cl_write_bmp(tmp, w, h, "compiler_box_blur_float.bmp"); /* Compare with the golden image */ OCL_CHECK_IMAGE(tmp, w, h, "compiler_box_blur_float_ref.bmp"); free(tmp); } MAKE_UTEST_FROM_FUNCTION(compiler_box_blur_float); Beignet-1.3.2-Source/utests/compiler_integer_division.cpp000664 001750 001750 00000002247 13161142102 022742 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void cpu(int global_id, int *src, int *dst, int x) { dst[global_id] = src[global_id] / x; } void compiler_integer_division(void) { const size_t n = 16; int cpu_dst[16], cpu_src[16]; const int x = 7; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_integer_division"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(x), &x); globals[0] = 16; locals[0] = 16; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((int32_t*)buf_data[0])[i] = rand() % 1000; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i <(int32_t) n; ++i) cpu(i, cpu_src, cpu_dst, x); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < 11; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(compiler_integer_division); Beignet-1.3.2-Source/utests/compiler_group_size.cpp000664 001750 001750 00000007016 13161142102 021566 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include <string.h> struct xyz{ unsigned short b; unsigned short e; unsigned int o; }; void compiler_group_size1(void) { const size_t n = 7*32*17; int group_size[] = {7, 17, 32}; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_group_size"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); for(int i = 0; i < 3; i++) { // Run the kernel globals[0] = n; locals[0] = group_size[i]; OCL_NDRANGE(1); OCL_MAP_BUFFER(0); // Check results for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((uint32_t*)buf_data[0])[i] == i); OCL_UNMAP_BUFFER(0); } } void compiler_group_size2(void) { const uint32_t n = 4*17*8; int size_x[] = {2, 4, 17}; int size_y[] = {2, 4, 4}; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_group_size"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); for(int i = 0; i < 3; i++) { // Run the kernel globals[0] = 4*17; globals[1] = 8; locals[0] = size_x[i]; locals[1] = size_y[i]; OCL_NDRANGE(2); OCL_MAP_BUFFER(0); // Check results for (uint32_t i = 0; i < n; ++i)
OCL_ASSERT(((uint32_t*)buf_data[0])[i] == i); OCL_UNMAP_BUFFER(0); } } void compiler_group_size3(void) { const uint32_t n = 4*17*8*4; int size_x[] = {2, 4, 17}; int size_y[] = {2, 4, 4}; int size_z[] = {2, 1, 2}; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_group_size"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); for(int i = 0; i < 3; i++) { // Run the kernel globals[0] = 4*17; globals[1] = 8; globals[2] = 4; locals[0] = size_x[i]; locals[1] = size_y[i]; locals[2] = size_z[i]; OCL_NDRANGE(3); OCL_MAP_BUFFER(0); // Check results for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((uint32_t*)buf_data[0])[i] == i); OCL_UNMAP_BUFFER(0); } } void compiler_group_size4(void) { const size_t n = 16; uint32_t color = 2; uint32_t num = 1; int group_size[] = {1}; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_group_size", "compiler_group_size4"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(struct xyz), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); for(uint32_t i = 0; i < num; i++) { // Run the kernel OCL_MAP_BUFFER(0); ((struct xyz*)buf_data[0])[0].b = 0; ((struct xyz*)buf_data[0])[0].e = 2; ((struct xyz*)buf_data[0])[0].o = 0; OCL_UNMAP_BUFFER(0); OCL_MAP_BUFFER(1); memset(((uint32_t*)buf_data[1]), 0x0, sizeof(uint32_t)*n); OCL_UNMAP_BUFFER(1); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_int), &group_size[i]); OCL_SET_ARG(3, sizeof(cl_int), &color); globals[0] = group_size[i]; locals[0] = group_size[i]; OCL_NDRANGE(1); OCL_MAP_BUFFER(1); // Check results for (uint32_t j = 0; j < n; ++j) { // std::cout <<((uint32_t*)buf_data[1])[j] << " "; if(j >= i && j <= i+2) { OCL_ASSERT(((uint32_t*)buf_data[1])[j] == color); } else { OCL_ASSERT(((uint32_t*)buf_data[1])[j] == 0); } } OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(compiler_group_size1, true); MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(compiler_group_size2, true); MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(compiler_group_size3, true); MAKE_UTEST_FROM_FUNCTION(compiler_group_size4); Beignet-1.3.2-Source/utests/compiler_vector_load_store.cpp000664 001750 001750 00000005642 13161142102 023120 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include <cstring> #include <cmath> template <typename T> static void compiler_vector_load_store(int elemNum, const char *kernelName) { const size_t n = elemNum * 256; if (strstr(kernelName, "half") != NULL) if (!cl_check_half()) return; // Setup kernel and buffers if (strstr(kernelName, "half") != NULL) OCL_CALL(cl_kernel_init, "compiler_vector_load_store.cl", kernelName, SOURCE, "-DHALF"); else OCL_CREATE_KERNEL_FROM_FILE("compiler_vector_load_store", kernelName); buf_data[0] = (T*) malloc(sizeof(T) * n); for (uint32_t i = 0; i < n; ++i) { if (strstr(kernelName, "half") != NULL) ((T*)buf_data[0])[i] = __float_to_half(as_uint((float)i/(float)n)); else ((T*)buf_data[0])[i] = i; } OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(T), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(T), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n / elemNum; locals[0] = 16; OCL_NDRANGE(1); // Check result OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < n; ++i) { int shift = ((i % elemNum) + 1); if (strstr(kernelName, "double") != NULL) OCL_ASSERT((((T*)buf_data[1])[i] - ((T)((T*)buf_data[0])[i] + shift)) < 1e-5); else if (strstr(kernelName, "half") != NULL)
{ float fdst = as_float(__half_to_float(((T*)buf_data[1])[i])); float fsrc = as_float(__half_to_float((T)(((T*)buf_data[0])[i]))); fsrc += shift; //printf("%d (%f, %f)\n",i, fdst, fsrc); OCL_ASSERT((fabs(fsrc - fdst) <= 0.03 * fabs(fdst))); } else OCL_ASSERT(((T*)buf_data[1])[i] == (T)(((T*)buf_data[0])[i] + shift)); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); } #define compiler_vector_load_store(type, n, kernel_type, keep_program) \ static void compiler_vector_ ##kernel_type ##n ##_load_store(void)\ {\ compiler_vector_load_store<type>(n, "test_" #kernel_type #n);\ }\ MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(compiler_vector_ ## kernel_type ##n ##_load_store, keep_program); #define test_all_vector(type, kernel_type, keep_program) \ compiler_vector_load_store(type, 2, kernel_type, true) \ compiler_vector_load_store(type, 3, kernel_type, true) \ compiler_vector_load_store(type, 4, kernel_type, true) \ compiler_vector_load_store(type, 8, kernel_type, true) \ compiler_vector_load_store(type, 16, kernel_type, keep_program) test_all_vector(int8_t, char, true) test_all_vector(uint8_t, uchar, true) test_all_vector(int16_t, short, true) test_all_vector(uint16_t, ushort, true) test_all_vector(int32_t, int, true) test_all_vector(uint32_t, uint, true) test_all_vector(float, float, true) //test_all_vector(double, double, true) test_all_vector(int64_t, long, true) test_all_vector(uint64_t, ulong, false) test_all_vector(uint16_t, half, false) Beignet-1.3.2-Source/utests/CMakeLists.txt000664 001750 001750 00000031255 13173554000 017553 0ustar00yryr000000 000000 ################################################################################### # these configurations are copied from the beignet root directory cmake for a stand-alone build. # do NOT set NOT_BUILD_STAND_ALONE_UTEST when building the utests stand-alone.
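# A typical stand-alone build, with illustrative paths (nothing in this file
# prescribes the exact commands):
#   cd Beignet-1.3.2-Source/utests
#   cmake . && make
#   . setenv.sh && ./utest_run
# An in-tree build instead defines NOT_BUILD_STAND_ALONE_UTEST from the parent
# CMakeLists, so the configuration block below is skipped.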
if (NOT NOT_BUILD_STAND_ALONE_UTEST) message(STATUS "Building Stand Alone Utest") CMAKE_MINIMUM_REQUIRED(VERSION 2.6.0) INCLUDE (FindPkgConfig) Find_Package(PythonInterp) # OpenCL pkg_check_modules(OPENCL REQUIRED OpenCL) IF(OPENCL_FOUND) INCLUDE_DIRECTORIES(${OPENCL_INCLUDE_DIRS}) ENDIF(OPENCL_FOUND) # Force Release with debug info if (NOT CMAKE_BUILD_TYPE) set (CMAKE_BUILD_TYPE RelWithDebInfo) endif (NOT CMAKE_BUILD_TYPE) message(STATUS "Building mode: " ${CMAKE_BUILD_TYPE}) set (CMAKE_BUILD_TYPE ${CMAKE_BUILD_TYPE} CACHE STRING "assure config" FORCE) # Threads Find_Package(Threads) set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CMAKE_C_CXX_FLAGS} -std=c++0x -Wno-invalid-offsetof") set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${CMAKE_C_CXX_FLAGS}") set (CMAKE_CXX_FLAGS_DEBUG "-O0 -g -DGBE_DEBUG=1") set (CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O2 -g -DGBE_DEBUG=1") set (CMAKE_CXX_FLAGS_MINSIZEREL "-Os -DNDEBUG -DGBE_DEBUG=0") set (CMAKE_CXX_FLAGS_RELEASE "-O2 -DNDEBUG -DGBE_DEBUG=0") set (CMAKE_C_FLAGS_DEBUG "-O0 -g -DGBE_DEBUG=1") set (CMAKE_C_FLAGS_RELWITHDEBINFO "-O2 -g -DGBE_DEBUG=1") set (CMAKE_C_FLAGS_MINSIZEREL "-Os -DNDEBUG -DGBE_DEBUG=0") set (CMAKE_C_FLAGS_RELEASE "-O2 -DNDEBUG -DGBE_DEBUG=0") endif (NOT NOT_BUILD_STAND_ALONE_UTEST) ################################################################################### INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/../include ${OPENGL_INCLUDE_DIRS} ${EGL_INCLUDE_DIRS}) ##### Math Function Part: EXECUTE_PROCESS(COMMAND mkdir generated -p WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}) EXECUTE_PROCESS(COMMAND ${PYTHON_EXECUTABLE} utest_math_gen.py WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR} OUTPUT_VARIABLE GEN_MATH_STRING) string(REGEX REPLACE " " ";" ADDMATHFUNC ${GEN_MATH_STRING}) string(REGEX REPLACE "generated/([^\ ]*)\\.cpp" "${CMAKE_CURRENT_SOURCE_DIR}/../kernels/\\1.cl" KERNEL_MATH_LIST ${GEN_MATH_STRING}) string(REGEX REPLACE " " ";" KERNEL_MATH_LIST ${KERNEL_MATH_LIST}) string(REGEX REPLACE "generated/([^\ ]*)\\.cpp" "\\1.cl" KERNEL_GITIGNORE_LIST ${GEN_MATH_STRING}) set_directory_properties(PROPERTIES ADDITIONAL_MAKE_CLEAN_FILES "generated;${KERNEL_MATH_LIST}") configure_file ( "setenv.sh.in" "setenv.sh" ) #XXX only need GL if required link_directories (${LLVM_LIBRARY_DIR} ${OPENGL_LIBDIR} ${EGL_LIBDIR} ${X11_LIBDIR} ${DRM_LIBDIR}) set (utests_basic_sources utest_error.c utest_assert.cpp utest.cpp utest_file_map.cpp utest_helper.cpp) # the test case with binary kernel if (NOT_BUILD_STAND_ALONE_UTEST) set (utests_binary_kernel_sources load_program_from_bin_file.cpp enqueue_built_in_kernels.cpp) endif (NOT_BUILD_STAND_ALONE_UTEST) set (utests_sources compiler_basic_arithmetic.cpp compiler_displacement_map_element.cpp compiler_mandelbrot.cpp compiler_mandelbrot_alternate.cpp compiler_box_blur_float.cpp compiler_box_blur_image.cpp compiler_box_blur.cpp compiler_insert_to_constant.cpp compiler_argument_structure.cpp compiler_argument_structure_indirect.cpp compiler_argument_structure_select.cpp compiler_arith_shift_right.cpp compiler_mixed_pointer.cpp compiler_array0.cpp compiler_array.cpp compiler_array1.cpp compiler_array2.cpp compiler_array3.cpp compiler_array4.cpp compiler_byte_scatter.cpp compiler_ceil.cpp compiler_popcount.cpp compiler_convert_uchar_sat.cpp compiler_copy_buffer.cpp compiler_copy_image.cpp compiler_copy_image_1d.cpp compiler_copy_image_3d.cpp compiler_copy_buffer_row.cpp compiler_degrees.cpp compiler_step.cpp compiler_fabs.cpp compiler_abs.cpp compiler_abs_diff.cpp compiler_fill_image.cpp 
compiler_fill_image0.cpp compiler_fill_image_1d.cpp compiler_fill_image_3d.cpp compiler_fill_image_3d_2.cpp compiler_function_argument0.cpp compiler_function_argument1.cpp compiler_function_argument2.cpp compiler_function_argument.cpp compiler_function_constant0.cpp compiler_function_constant1.cpp compiler_function_constant.cpp compiler_global_constant.cpp compiler_global_constant_2.cpp compiler_group_size.cpp compiler_hadd.cpp compiler_if_else.cpp compiler_integer_division.cpp compiler_integer_remainder.cpp compiler_insert_vector.cpp compiler_lower_return0.cpp compiler_lower_return1.cpp compiler_lower_return2.cpp compiler_mad_hi.cpp compiler_mul_hi.cpp compiler_mad24.cpp compiler_mul24.cpp compiler_multiple_kernels.cpp compiler_radians.cpp compiler_rhadd.cpp compiler_rotate.cpp compiler_saturate.cpp compiler_saturate_sub.cpp compiler_shift_right.cpp compiler_short_scatter.cpp compiler_smoothstep.cpp compiler_uint2_copy.cpp compiler_uint3_copy.cpp compiler_uint8_copy.cpp compiler_uint16_copy.cpp compiler_uint3_unaligned_copy.cpp compiler_upsample_int.cpp compiler_upsample_long.cpp compiler_unstructured_branch0.cpp compiler_unstructured_branch1.cpp compiler_unstructured_branch2.cpp compiler_unstructured_branch3.cpp compiler_write_only_bytes.cpp compiler_write_only.cpp compiler_write_only_shorts.cpp compiler_switch.cpp compiler_bswap.cpp compiler_clz.cpp compiler_ctz.cpp compiler_math.cpp compiler_atomic_functions.cpp compiler_async_copy.cpp compiler_workgroup_broadcast.cpp compiler_workgroup_reduce.cpp compiler_workgroup_scan_exclusive.cpp compiler_workgroup_scan_inclusive.cpp compiler_subgroup_broadcast.cpp compiler_subgroup_reduce.cpp compiler_subgroup_scan_exclusive.cpp compiler_subgroup_scan_inclusive.cpp compiler_subgroup_buffer_block_read.cpp compiler_subgroup_buffer_block_write.cpp compiler_subgroup_image_block_read.cpp compiler_subgroup_image_block_write.cpp compiler_async_stride_copy.cpp compiler_insn_selection_min.cpp compiler_insn_selection_max.cpp compiler_insn_selection_masked_min_max.cpp compiler_load_bool_imm.cpp compiler_global_memory_barrier.cpp compiler_local_memory_two_ptr.cpp compiler_local_memory_barrier.cpp compiler_local_memory_barrier_wg64.cpp compiler_local_memory_barrier_2.cpp compiler_local_slm.cpp compiler_movforphi_undef.cpp compiler_volatile.cpp compiler_copy_image1.cpp compiler_get_image_info.cpp compiler_get_image_info_array.cpp compiler_vect_compare.cpp compiler_vector_load_store.cpp compiler_vector_inc.cpp compiler_cl_finish.cpp get_cl_info.cpp builtin_atan2.cpp builtin_bitselect.cpp builtin_frexp.cpp builtin_mad_sat.cpp builtin_modf.cpp builtin_nextafter.cpp builtin_remquo.cpp builtin_shuffle.cpp builtin_shuffle2.cpp builtin_sign.cpp builtin_lgamma.cpp builtin_lgamma_r.cpp builtin_tgamma.cpp buildin_work_dim.cpp builtin_global_size.cpp builtin_local_size.cpp builtin_global_id.cpp builtin_num_groups.cpp builtin_local_id.cpp builtin_sub_group_size.cpp builtin_max_sub_group_size.cpp builtin_num_sub_groups.cpp builtin_sub_group_id.cpp builtin_acos_asin.cpp builtin_pow.cpp builtin_exp.cpp builtin_convert_sat.cpp sub_buffer.cpp runtime_createcontext.cpp runtime_set_kernel_arg.cpp runtime_null_kernel_arg.cpp runtime_event.cpp runtime_barrier_list.cpp runtime_marker_list.cpp runtime_compile_link.cpp compiler_long.cpp compiler_long_2.cpp compiler_long_not.cpp compiler_long_hi_sat.cpp compiler_long_div.cpp compiler_long_convert.cpp compiler_long_shl.cpp compiler_long_shr.cpp compiler_long_asr.cpp compiler_long_mult.cpp compiler_long_cmp.cpp compiler_long_bitcast.cpp 
compiler_half.cpp compiler_function_argument3.cpp compiler_function_qualifiers.cpp compiler_bool_cross_basic_block.cpp compiler_private_const.cpp compiler_private_data_overflow.cpp compiler_getelementptr_bitcast.cpp compiler_time_stamp.cpp compiler_double_precision.cpp compiler_double.cpp compiler_double_div.cpp compiler_double_convert.cpp load_program_from_gen_bin.cpp get_arg_info.cpp profiling_exec.cpp enqueue_copy_buf.cpp enqueue_copy_buf_unaligned.cpp test_printf.cpp enqueue_fill_buf.cpp builtin_kernel_max_global_size.cpp image_1D_buffer.cpp image_from_buffer.cpp compare_image_2d_and_1d_array.cpp compiler_fill_image_1d_array.cpp compiler_fill_image_2d_array.cpp compiler_constant_expr.cpp compiler_assignment_operation_in_if.cpp vload_bench.cpp runtime_use_host_ptr_buffer.cpp runtime_alloc_host_ptr_buffer.cpp runtime_use_host_ptr_image.cpp compiler_get_max_sub_group_size.cpp compiler_get_sub_group_local_id.cpp compiler_sub_group_shuffle.cpp compiler_sub_group_shuffle_down.cpp compiler_sub_group_shuffle_up.cpp compiler_sub_group_shuffle_xor.cpp compiler_reqd_sub_group_size.cpp builtin_global_linear_id.cpp builtin_local_linear_id.cpp multi_queue_events.cpp compiler_mix.cpp compiler_math_3op.cpp compiler_bsort.cpp builtin_kernel_block_motion_estimate_intel.cpp compiler_program_global.cpp compiler_generic_atomic.cpp compiler_atomic_functions_20.cpp compiler_sampler.cpp compiler_generic_pointer.cpp runtime_pipe_query.cpp compiler_pipe_builtin.cpp compiler_device_enqueue.cpp compiler_sqrt_div.cpp compiler_remove_negative_add.cpp) if (LLVM_VERSION_NODOT VERSION_GREATER 34) SET(utests_sources ${utests_sources} compiler_overflow.cpp) endif (LLVM_VERSION_NODOT VERSION_GREATER 34) if (NOT_BUILD_STAND_ALONE_UTEST) if (X11_FOUND) SET(utests_sources ${utests_sources} runtime_climage_from_boname.cpp) SET(UTESTS_REQUIRED_X11_LIB ${X11_LIBRARIES} ${XEXT_LIBRARIES}) else() SET(UTESTS_REQUIRED_X11_LIB "") endif (X11_FOUND) endif (NOT_BUILD_STAND_ALONE_UTEST) if (CMRT_FOUND) SET(utests_sources ${utests_sources} runtime_cmrt.cpp) endif (CMRT_FOUND) SET (kernel_bin ${CMAKE_CURRENT_SOURCE_DIR}/../kernels/compiler_ceil) list (GET GBE_BIN_GENERATER -1 GBE_BIN_FILE) if(GEN_PCI_ID) ADD_CUSTOM_COMMAND( OUTPUT ${kernel_bin}.bin COMMAND ${GBE_BIN_GENERATER} ${kernel_bin}.cl -o${kernel_bin}.bin -t${GEN_PCI_ID} DEPENDS ${GBE_BIN_FILE} ${kernel_bin}.cl) else(GEN_PCI_ID) ADD_CUSTOM_COMMAND( OUTPUT ${kernel_bin}.bin COMMAND ${GBE_BIN_GENERATER} ${kernel_bin}.cl -o${kernel_bin}.bin DEPENDS ${GBE_BIN_FILE} ${kernel_bin}.cl) endif(GEN_PCI_ID) if (NOT_BUILD_STAND_ALONE_UTEST) ADD_CUSTOM_TARGET(kernel_bin.bin DEPENDS ${kernel_bin}.bin) endif (NOT_BUILD_STAND_ALONE_UTEST) add_custom_command(OUTPUT ${CMAKE_CURRENT_SOURCE_DIR}/generated COMMAND mkdir ${CMAKE_CURRENT_SOURCE_DIR}/generated -p COMMAND ${PYTHON_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/utest_math_gen.py > /dev/null 2>&1 COMMAND echo ${KERNEL_GITIGNORE_LIST} |sed 's/ /\\n/g' > ${CMAKE_CURRENT_SOURCE_DIR}/../kernels/.gitignore WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR} ) add_custom_target(utest_generator DEPENDS generated WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR} ) #compiler_fill_gl_image test case also need xlib if (OPENGL_FOUND AND EGL_FOUND AND X11_FOUND) SET(utests_sources ${utests_sources} compiler_fill_gl_image.cpp) SET(CMAKE_CXX_FLAGS "-DHAS_GL_EGL_X11 ${CMAKE_CXX_FLAGS} ${DEF_OCL_PCH_PCM_PATH}") SET(CMAKE_C_FLAGS "-DHAS_GL_EGL_X11 ${CMAKE_C_FLAGS} ${DEF_OCL_PCH_PCM_PATH}") SET(UTESTS_REQUIRED_GL_EGL_X11_LIB ${OPENGL_LIBRARIES} ${EGL_LIBRARIES} 
${X11_LIBRARIES}) endif() if (USE_STANDALONE_GBE_COMPILER STREQUAL "true") SET(utests_sources ${utests_basic_sources} ${utests_binary_kernel_sources}) else () SET(utests_sources ${utests_basic_sources} ${utests_binary_kernel_sources} ${ADDMATHFUNC} ${utests_sources}) endif () if (COMPILER STREQUAL "CLANG") SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-tautological-compare") endif () SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-deprecated-declarations") SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-deprecated-declarations" ) ADD_LIBRARY(utests SHARED ${utests_sources}) if (NOT_BUILD_STAND_ALONE_UTEST) TARGET_LINK_LIBRARIES(utests cl m ${UTESTS_REQUIRED_GL_EGL_X11_LIB} ${CMAKE_THREAD_LIBS_INIT} ${UTESTS_REQUIRED_X11_LIB}) else() TARGET_LINK_LIBRARIES(utests ${OPENCL_LIBRARIES} m ${UTESTS_REQUIRED_GL_EGL_X11_LIB} ${CMAKE_THREAD_LIBS_INIT} ${UTESTS_REQUIRED_X11_LIB}) endif() ADD_EXECUTABLE(utest_run utest_run.cpp) TARGET_LINK_LIBRARIES(utest_run utests) if (NOT_BUILD_STAND_ALONE_UTEST) ADD_DEPENDENCIES (utest_run kernel_bin.bin) endif (NOT_BUILD_STAND_ALONE_UTEST) ADD_DEPENDENCIES (utests utest_generator) ADD_EXECUTABLE(flat_address_space runtime_flat_address_space.cpp) TARGET_LINK_LIBRARIES(flat_address_space utests) ADD_CUSTOM_TARGET(utest DEPENDS utest_run utests flat_address_space) Beignet-1.3.2-Source/utests/compiler_math.cpp000664 001750 001750 00000005247 13161142102 020335 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include <cmath> #include <algorithm> /* angle-bracket header names were stripped in this listing; <cmath> and <algorithm> are reconstructed from the cosf()/std::isinf()/std::isnan() usage below */ static void cpu_compiler_math(float *dst, float *src, int i) { const float x = src[i]; const float PI = 3.141592653589793f; switch (i) { case 0: dst[i] = cosf(x); break; case 1: dst[i] = sinf(x); break; case 2: dst[i] = log2f(x); break; case 3: dst[i] = sqrtf(x); break; case 4: dst[i] = 1.f/ sqrtf(x); break; case 5: dst[i] = 1.f / x; break; case 6: dst[i] = tanf(x); break; case 7: dst[i] = powf(x, 0.3333333333333333333f); break; case 8: dst[i] = ceilf(x); break; case 9: dst[i] = cosf(PI * x); break; case 10: dst[i] = powf(2, x); break; case 11: dst[i] = powf(10, x); break; case 12: dst[i] = expf(x) - 1; break; case 13: dst[i] = logf(x + 1); break; case 14: dst[i] = floorf(log2f(x)); break; case 15: dst[i] = sinf(PI * x); break; case 16: dst[i] = tanf(PI * x); break; case 17: dst[i] = 2 * roundf(x / 2); break; case 18: dst[i] = sinhf(x); break; case 19: dst[i] = coshf(x); break; case 20: dst[i] = tanhf(x); break; case 21: dst[i] = asinhf(x); break; case 22: dst[i] = acoshf(x); break; case 23: dst[i] = atanhf(x); break; case 24: dst[i] = asinf(x); break; case 25: dst[i] = acosf(x); break; case 26: dst[i] = atanf(x); break; case 27: dst[i] = asinf(x) / PI; break; case 28: dst[i] = acosf(x) / PI; break; case 29: dst[i] = atanf(x) / PI; break; case 30: dst[i] = erff(x); break; case 31: dst[i] = nanf(""); break; default: dst[i] = 1.f; break; }; } static void compiler_math(void) { const size_t n = 32; float cpu_dst[32], cpu_src[32]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_math"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; int j; for(j = 0; j < 1000; j ++) { OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 32; ++i) cpu_src[i] = ((float*)buf_data[1])[i] = .1f * (rand() & 15); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int i = 0; i < 16; ++i) cpu_compiler_math(cpu_dst, cpu_src, i); for (int i = 0; i < 16; ++i) { const float cpu =
cpu_dst[i]; const float gpu = ((float*)buf_data[0])[i]; if (std::isinf(cpu)) OCL_ASSERT(std::isinf(gpu)); else if (std::isnan(cpu)) OCL_ASSERT(std::isnan(gpu)); else OCL_ASSERT(fabs(gpu-cpu) < 1e-3f); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(compiler_math) Beignet-1.3.2-Source/utests/compiler_mul24.cpp000664 001750 001750 00000001634 13161142102 020343 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_mul24(void) { const int n = 32; int src1[n], src2[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_mul24"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int i = 0; i < n; ++i) { src1[i] = ((int*)buf_data[0])[i] = rand(); src2[i] = ((int*)buf_data[1])[i] = rand(); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(2); for (int i = 0; i < n; ++i) OCL_ASSERT(((int*)buf_data[2])[i] == (src1[i]) * (src2[i])); OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_mul24); Beignet-1.3.2-Source/utests/compiler_ceil.cpp000664 001750 001750 00000002155 13161142102 020313 0ustar00yryr000000 000000 #include <cmath> /* header name stripped in this listing; <cmath> reconstructed from the ceilf() usage below */ #include "utest_helper.hpp" static void cpu(int global_id, float *src, float *dst) { dst[global_id] = ceilf(src[global_id]); } void compiler_ceil(void) { const size_t n = 16; float cpu_dst[16], cpu_src[16]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_ceil"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((float*)buf_data[0])[i] = .1f * (rand() & 15) - .75f; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, cpu_src, cpu_dst); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) OCL_ASSERT(((float *)buf_data[1])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(compiler_ceil); Beignet-1.3.2-Source/utests/compiler_sub_group_shuffle_xor.cpp000664 001750 001750 00000005501 13161142102 024006 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_sub_group_shuffle_xor_int(void) { if(!cl_check_subgroups()) return; const size_t n = 32; const int32_t buf_size = 4 * n + 1; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_sub_group_shuffle_xor", "compiler_sub_group_shuffle_xor_int"); OCL_CREATE_BUFFER(buf[0], 0, buf_size * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); int c = 3; OCL_SET_ARG(1, sizeof(int), &c); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); for (int32_t i = 0; i < buf_size; ++i) ((int*)buf_data[0])[i] = -1; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(0); int* dst = (int *)buf_data[0]; int subgroupsize = dst[0]; OCL_ASSERT(subgroupsize == 8 || subgroupsize == 16); dst++; for (int32_t i = 0; i < (int32_t) n; ++i){ int round = i / subgroupsize; int index = i % subgroupsize; OCL_ASSERT(index == dst[4*i]); //printf("%d %d %d %d\n", i, dst[4*i+1], dst[4*i+2], dst[4*i+3]);
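/* The kernel (kernels/compiler_sub_group_shuffle_xor.cl, not part of this listing) presumably stores each lane's global id and shuffles it with three different xor masks: shuffling a lane-indexed value with mask m returns the value held by lane (index ^ m) of the same sub group, hence the expected round * subgroupsize + (index ^ m) pattern in the asserts below. */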
OCL_ASSERT((round * subgroupsize + (c ^ index)) == dst[4*i+1]); OCL_ASSERT((round * subgroupsize + (index ^ (subgroupsize - index -1))) == dst[4*i+2]); OCL_ASSERT((round * subgroupsize + (index ^ (index + 1) % subgroupsize)) == dst[4*i+3]); } OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_sub_group_shuffle_xor_int); void compiler_sub_group_shuffle_xor_short(void) { if(!cl_check_subgroups_short()) return; const size_t n = 32; const int32_t buf_size = 4 * n + 1; // Setup kernel and buffers OCL_CALL(cl_kernel_init, "compiler_sub_group_shuffle_xor.cl", "compiler_sub_group_shuffle_xor_short", SOURCE, "-DSHORT"); OCL_CREATE_BUFFER(buf[0], 0, buf_size * sizeof(short), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); int c = 3; OCL_SET_ARG(1, sizeof(int), &c); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); for (int32_t i = 0; i < buf_size; ++i) ((short*)buf_data[0])[i] = -1; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(0); short* dst = (short *)buf_data[0]; short subgroupsize = dst[0]; OCL_ASSERT(subgroupsize == 8 || subgroupsize == 16); dst++; for (int32_t i = 0; i < (int32_t) n; ++i){ int round = i / subgroupsize; int index = i % subgroupsize; OCL_ASSERT(index == dst[4*i]); //printf("%d %d %d %d\n", i, dst[4*i+1], dst[4*i+2], dst[4*i+3]); OCL_ASSERT((round * subgroupsize + (c ^ index)) == dst[4*i+1]); OCL_ASSERT((round * subgroupsize + (index ^ (subgroupsize - index -1))) == dst[4*i+2]); OCL_ASSERT((round * subgroupsize + (index ^ (index + 1) % subgroupsize)) == dst[4*i+3]); } OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_sub_group_shuffle_xor_short); Beignet-1.3.2-Source/utests/compiler_long_hi_sat.cpp000664 001750 001750 00000011011 13161142102 021656 0ustar00yryr000000 000000 #include <cstdint> #include <cstring> #include <cstdio> #include <cstdlib> /* the four angle-bracket header names were stripped in this listing; reconstructed from the uint64_t/memset/memcpy/rand() usage below */ #include "utest_helper.hpp" static void __u64_mul_u64(uint64_t sourceA, uint64_t sourceB, uint64_t &destLow, uint64_t &destHi) { uint64_t lowA, lowB; uint64_t highA, highB; lowA = sourceA & 0xffffffff; highA = sourceA >> 32; lowB = sourceB & 0xffffffff; highB = sourceB >> 32; uint64_t aHibHi = highA * highB; uint64_t aHibLo = highA * lowB; uint64_t aLobHi = lowA * highB; uint64_t aLobLo = lowA * lowB; uint64_t aLobLoHi = aLobLo >> 32; uint64_t aLobHiLo = aLobHi & 0xFFFFFFFFULL; aHibLo += aLobLoHi + aLobHiLo; destHi = aHibHi + (aHibLo >> 32 ) + (aLobHi >> 32); // Can't overflow destLow = (aHibLo << 32) | ( aLobLo & 0xFFFFFFFFULL); } static void __64_mul_64(int64_t sourceA, int64_t sourceB, uint64_t &destLow, int64_t &destHi) { int64_t aSign = sourceA >> 63; int64_t bSign = sourceB >> 63; int64_t resultSign = aSign ^ bSign; // take absolute values of the arguments sourceA = (sourceA ^ aSign) - aSign; sourceB = (sourceB ^ bSign) - bSign; uint64_t hi; __u64_mul_u64( (uint64_t) sourceA, (uint64_t) sourceB, destLow, hi ); // Fix the sign if( resultSign ) { destLow ^= resultSign; hi ^= resultSign; destLow -= resultSign; //carry if necessary if( 0 == destLow ) hi -= resultSign; } destHi = (int64_t) hi; } static void __mad_sat(int64_t sourceA, int64_t sourceB, int64_t sourceC, int64_t& dst) { cl_long multHi; cl_ulong multLo; __64_mul_64(sourceA, sourceB, multLo, multHi); cl_ulong sum = multLo + sourceC; // carry if overflow if(sourceC >= 0) { if(multLo > sum) { multHi++; if(CL_LONG_MIN == multHi) { multHi = CL_LONG_MAX; sum = CL_ULONG_MAX; } } } else { if( multLo < sum ) { multHi--; if( CL_LONG_MAX == multHi ) { multHi = CL_LONG_MIN; sum = 0; } } } // saturate if( multHi > 0 ) sum = CL_LONG_MAX; else if ( multHi == 0 && sum > CL_LONG_MAX)
sum = CL_LONG_MAX; else if ( multHi == -1 && sum < (cl_ulong)CL_LONG_MIN) sum = CL_LONG_MIN; else if( multHi < -1 ) sum = CL_LONG_MIN; dst = (cl_long) sum; } void compiler_long_mul_hi(void) { const size_t n = 32; int64_t src[n]; int64_t num0 = 0xF00A00CED0090B0CUL; int64_t num1 = 0x7FABCD57FC098FC1UL; memset(src, 0, sizeof(int64_t) * n); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_long_hi_sat", "compiler_long_mul_hi"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint64_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint64_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_long), &num0); OCL_SET_ARG(3, sizeof(cl_long), &num1); globals[0] = n; locals[0] = 32; for (int32_t i = 0; i < (int32_t) n; ++i) { uint64_t a = rand(); a = a <<32 | a; src[i] = a; } OCL_MAP_BUFFER(0); memcpy(buf_data[0], src, sizeof(uint64_t) * n); OCL_UNMAP_BUFFER(0); uint64_t res_lo; int64_t res_hi; // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { if (i % 2 == 0) __64_mul_64(src[i], num0, res_lo, res_hi); else __64_mul_64(src[i], num1, res_lo, res_hi); OCL_ASSERT(((int64_t *)(buf_data[1]))[i] == res_hi); } OCL_UNMAP_BUFFER(1); } void compiler_long_mul_sat(void) { const size_t n = 32; int64_t src[n]; int64_t num0 = 0xF00000CED8090B0CUL; int64_t num1 = 0x0000000000098FC1UL; memset(src, 0, sizeof(int64_t) * n); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_long_hi_sat", "compiler_long_mul_sat"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint64_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint64_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_long), &num0); OCL_SET_ARG(3, sizeof(cl_long), &num1); globals[0] = n; locals[0] = 32; for (int32_t i = 0; i < (int32_t) n; ++i) { uint64_t a = rand(); a = a <<32 | a; src[i] = a; } OCL_MAP_BUFFER(0); memcpy(buf_data[0], src, sizeof(uint64_t) * n); OCL_UNMAP_BUFFER(0); int64_t res; // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { __mad_sat(src[i], num0, num1, res); OCL_ASSERT(((int64_t *)(buf_data[1]))[i] == res); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_long_mul_hi); MAKE_UTEST_FROM_FUNCTION(compiler_long_mul_sat); Beignet-1.3.2-Source/utests/compiler_unstructured_branch2.cpp000664 001750 001750 00000004066 13161142102 023550 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_unstructured_branch2(void) { const size_t n = 16; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_unstructured_branch2"); buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t) * n); for (uint32_t i = 0; i < n; ++i) ((uint32_t*)buf_data[0])[i] = 2; OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; OCL_NDRANGE(1); // First control flow OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 12); // Second control flow for (uint32_t i = 0; i < n; ++i) ((int32_t*)buf_data[0])[i] = -2; OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] 
== -6); // Third control flow for (uint32_t i = 0; i < 8; ++i) ((int32_t*)buf_data[0])[i] = 2; for (uint32_t i = 8; i < n; ++i) ((int32_t*)buf_data[0])[i] = -2; OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 8; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 12); for (uint32_t i = 8; i < n; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == -6); // Fourth control flow for (uint32_t i = 0; i < 4; ++i) ((int32_t*)buf_data[0])[i] = 1; for (uint32_t i = 4; i < 8; ++i) ((int32_t*)buf_data[0])[i] = 2; for (uint32_t i = 8; i < n; ++i) ((int32_t*)buf_data[0])[i] = -2; OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 8; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == 12); for (uint32_t i = 8; i < n; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == -6); } MAKE_UTEST_FROM_FUNCTION(compiler_unstructured_branch2); Beignet-1.3.2-Source/utests/compiler_short_scatter.cpp000664 001750 001750 00000001045 13161142102 022260 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_short_scatter(void) { const size_t n = 128; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_short_scatter"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int16_t), NULL); // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); // Check result OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) OCL_ASSERT(((int16_t*)buf_data[0])[i] == (int16_t) i); } MAKE_UTEST_FROM_FUNCTION(compiler_short_scatter); Beignet-1.3.2-Source/utests/multi_queue_events.cpp000664 001750 001750 00000007304 13161142102 021430 0ustar00yryr000000 000000 #include "utest_helper.hpp" #define THREAD_SIZE 8 pthread_t tid[THREAD_SIZE]; static cl_command_queue all_queues[THREAD_SIZE]; static cl_event enqueue_events[THREAD_SIZE]; static cl_event user_event; static cl_kernel the_kernel; static char source_str[] = "kernel void assgin_work_dim( __global int *ret, int i) { \n" "if (i == 0) ret[i] = 10; \n" "else ret[i] = ret[i - 1] + 1; \n" "}\n"; static size_t the_globals[3] = {16, 1, 1}; static size_t the_locals[3] = {16, 1, 1}; static size_t the_goffsets[3] = {0, 0, 0}; static void *thread_function(void *arg) { int num = *((int *)arg); cl_int ret; cl_event dep_event[2]; ret = clSetKernelArg(the_kernel, 1, sizeof(cl_int), &num); OCL_ASSERT(ret == CL_SUCCESS); if (num == 0) { dep_event[0] = user_event; ret = clEnqueueNDRangeKernel(all_queues[num], the_kernel, 1, the_goffsets, the_globals, the_locals, 1, dep_event, &enqueue_events[num]); } else { dep_event[0] = user_event; dep_event[1] = enqueue_events[num - 1]; ret = clEnqueueNDRangeKernel(all_queues[num], the_kernel, 1, the_goffsets, the_globals, the_locals, 2, dep_event, &enqueue_events[num]); } OCL_ASSERT(ret == CL_SUCCESS); return NULL; } void multi_queue_events(void) { cl_int ret; size_t source_size = sizeof(source_str); const char *source = source_str; cl_program program = NULL; int i; /* Create Kernel Program from the source */ program = clCreateProgramWithSource(ctx, 1, &source, &source_size, &ret); OCL_ASSERT(ret == CL_SUCCESS); /* Build Kernel Program */ ret = clBuildProgram(program, 1, &device, NULL, NULL, NULL); OCL_ASSERT(ret == CL_SUCCESS); the_kernel = clCreateKernel(program, "assgin_work_dim", NULL); OCL_ASSERT(the_kernel != NULL); int buffer_content[16] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; cl_mem buf = clCreateBuffer(ctx, CL_MEM_COPY_HOST_PTR, 16 * 4, buffer_content, &ret); OCL_ASSERT(buf != 
NULL); ret = clSetKernelArg(the_kernel, 0, sizeof(cl_mem), &buf); OCL_ASSERT(ret == CL_SUCCESS); for (i = 0; i < THREAD_SIZE; i++) { all_queues[i] = clCreateCommandQueue(ctx, device, 0, &ret); OCL_ASSERT(ret == CL_SUCCESS); } user_event = clCreateUserEvent(ctx, &ret); OCL_ASSERT(ret == CL_SUCCESS); for (i = 0; i < THREAD_SIZE; i++) { pthread_create(&tid[i], NULL, thread_function, &i); pthread_join(tid[i], NULL); } cl_event map_event; void *map_ptr = clEnqueueMapBuffer(all_queues[0], buf, 0, CL_MAP_READ, 0, 32, THREAD_SIZE, enqueue_events, &map_event, NULL); OCL_ASSERT(map_ptr != NULL); cl_event all_event[10]; for (i = 0; i < THREAD_SIZE; i++) { all_event[i] = enqueue_events[i]; } all_event[8] = user_event; all_event[9] = map_event; //printf("before Waitfor events ##\n"); clSetUserEventStatus(user_event, CL_COMPLETE); ret = clWaitForEvents(10, all_event); OCL_ASSERT(ret == CL_SUCCESS); //printf("After Waitfor events ##\n"); //printf("############# Finish Setting ################\n"); printf("\n"); for (i = 0; i < 8; i++) { //printf(" %d", ((int *)map_ptr)[i]); OCL_ASSERT(((int *)map_ptr)[i] == 10 + i); } //printf("\n"); ret = clEnqueueUnmapMemObject(all_queues[0], buf, map_ptr, 1, &map_event, NULL); OCL_ASSERT(ret == CL_SUCCESS); //printf("------------------------- End -------------------------------\n"); clReleaseKernel(the_kernel); clReleaseProgram(program); clReleaseMemObject(buf); for (i = 0; i < THREAD_SIZE; i++) { clReleaseCommandQueue(all_queues[i]); clReleaseEvent(enqueue_events[i]); } clReleaseEvent(user_event); clReleaseEvent(map_event); } MAKE_UTEST_FROM_FUNCTION(multi_queue_events); Beignet-1.3.2-Source/utests/compiler_fill_image_3d.cpp000664 001750 001750 00000002411 13161142102 022050 0ustar00yryr000000 000000 #include <cstring> /* header name stripped in this listing; <cstring> reconstructed from the memset() usage below */ #include "utest_helper.hpp" static void compiler_fill_image_3d(void) { const size_t w = 512; const size_t h = 512; const size_t depth = 5; uint32_t color = 0x12345678; cl_image_format format; cl_image_desc desc; memset(&desc, 0x0, sizeof(cl_image_desc)); memset(&format, 0x0, sizeof(cl_image_format)); format.image_channel_order = CL_RGBA; format.image_channel_data_type = CL_UNSIGNED_INT8; desc.image_type = CL_MEM_OBJECT_IMAGE3D; desc.image_width = w; desc.image_height = h; desc.image_depth = depth; desc.image_row_pitch = 0; desc.image_slice_pitch = 0; // Setup kernel and images OCL_CREATE_KERNEL("test_fill_image_3d"); OCL_CREATE_IMAGE(buf[0], 0, &format, &desc, NULL); // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(color), &color); globals[0] = w; globals[1] = h; globals[2] = depth; locals[0] = 16; locals[1] = 16; locals[2] = 1; OCL_NDRANGE(3); // Check result OCL_MAP_BUFFER_GTT(0); for (uint32_t k = 0; k < depth; k++) for (uint32_t j = 0; j < h; ++j) for (uint32_t i = 0; i < w; i++) OCL_ASSERT(((uint32_t*)buf_data[0])[k*w*h + j*w + i] == 0x78563412); OCL_UNMAP_BUFFER_GTT(0); } MAKE_UTEST_FROM_FUNCTION(compiler_fill_image_3d); Beignet-1.3.2-Source/utests/compiler_workgroup_broadcast.cpp000664 001750 001750 00000021064 13161142102 023460 0ustar00yryr000000 000000 #include <iostream> #include <iomanip> #include <cstdlib> #include <cstring> /* the four angle-bracket header names were stripped in this listing; reconstructed from the cout/setw/srand()/memcpy() usage below */ #include "utest_helper.hpp" using namespace std; /* set to 1 for debug, output of input-expected data */ #define DEBUG_STDOUT 0 /* NDRANGE */ #define WG_GLOBAL_SIZE_X 16 #define WG_GLOBAL_SIZE_Y 4 #define WG_GLOBAL_SIZE_Z 4 #define WG_LOCAL_SIZE_X 16 #define WG_LOCAL_SIZE_Y 2 #define WG_LOCAL_SIZE_Z 2 /* TODO debug below case, lid2 always stays 0, instead of 0 and 1 * * #define WG_GLOBAL_SIZE_X 16 * #define WG_GLOBAL_SIZE_Y 1 * #define
WG_GLOBAL_SIZE_Z 4 * * #define WG_LOCAL_SIZE_X 16 * #define WG_LOCAL_SIZE_Y 1 * #define WG_LOCAL_SIZE_Z 2 */ #define WG_LOCAL_X 5 #define WG_LOCAL_Y 0 #define WG_LOCAL_Z 0 enum WG_BROADCAST { WG_BROADCAST_1D, WG_BROADCAST_2D, WG_BROADCAST_3D }; /* * Generic compute-expected function for op BROADCAST type * and any variable type */ template <class T> static void compute_expected(WG_BROADCAST wg_broadcast, T* input, T* expected, uint32_t wg_global_size, uint32_t wg_local_size) { if(wg_broadcast == WG_BROADCAST_1D) { for(uint32_t i = 0; i < wg_local_size; i++) expected[i] = input[WG_LOCAL_X]; } else if(wg_broadcast == WG_BROADCAST_2D) { for(uint32_t i = 0; i < wg_local_size; i++) expected[i] = input[WG_LOCAL_X + WG_LOCAL_Y * WG_LOCAL_SIZE_X]; } else if(wg_broadcast == WG_BROADCAST_3D) { for(uint32_t i = 0; i < wg_local_size; i++) expected[i] = input[WG_LOCAL_X + WG_LOCAL_Y * WG_LOCAL_SIZE_X + WG_LOCAL_Z * WG_LOCAL_SIZE_X * WG_LOCAL_SIZE_Y]; } } /* * Generic input-expected generate function for op BROADCAST type * and any variable type */ template <class T> static void generate_data(WG_BROADCAST wg_broadcast, T* &input, T* &expected, uint32_t &wg_global_size, uint32_t &wg_local_size) { if(wg_broadcast == WG_BROADCAST_1D) { wg_global_size = WG_GLOBAL_SIZE_X; wg_local_size = WG_LOCAL_SIZE_X; } else if(wg_broadcast == WG_BROADCAST_2D) { wg_global_size = WG_GLOBAL_SIZE_X * WG_GLOBAL_SIZE_Y; wg_local_size = WG_LOCAL_SIZE_X * WG_LOCAL_SIZE_Y; } else if(wg_broadcast == WG_BROADCAST_3D) { wg_global_size = WG_GLOBAL_SIZE_X * WG_GLOBAL_SIZE_Y * WG_GLOBAL_SIZE_Z; wg_local_size = WG_LOCAL_SIZE_X * WG_LOCAL_SIZE_Y * WG_LOCAL_SIZE_Z; } /* allocate input and expected arrays */ input = new T[wg_global_size]; expected = new T[wg_global_size]; /* base value for all data types */ T base_val = (long)7 << (sizeof(T) * 5 - 3); /* seed for random inputs */ srand (time(NULL)); /* generate inputs and expected values */ for(uint32_t gid = 0; gid < wg_global_size; gid += wg_local_size) { #if DEBUG_STDOUT cout << endl << "IN: " << endl; #endif /* input values */ for(uint32_t lid = 0; lid < wg_local_size; lid++) { /* initially 0, augment after */ input[gid + lid] = 0; /* check all data types, test ideal for QWORD types */ input[gid + lid] += ((rand() % 2 - 1) * base_val); /* add trailing random bits, tests GENERAL cases */ input[gid + lid] += (rand() % 112); #if DEBUG_STDOUT /* output generated input */ cout << setw(4) << input[gid + lid] << ", " ; if((lid + 1) % 8 == 0) cout << endl; #endif } /* expected values */ compute_expected(wg_broadcast, input + gid, expected + gid, wg_global_size, wg_local_size); #if DEBUG_STDOUT /* output expected input */ cout << endl << "EXP: " << endl; for(uint32_t lid = 0; lid < wg_local_size; lid++){ cout << setw(4) << expected[gid + lid] << ", " ; if((lid + 1) % 8 == 0) cout << endl; } #endif } } /* * Generic workgroup utest function for op BROADCAST type * and any variable type */ template <class T> static void workgroup_generic(WG_BROADCAST wg_broadcast, T* input, T* expected) { uint32_t wg_global_size = 0; uint32_t wg_local_size = 0; cl_uint wg_local_x = WG_LOCAL_X; cl_uint wg_local_y = WG_LOCAL_Y; cl_uint wg_local_z = WG_LOCAL_Z; /* input and expected data */ generate_data(wg_broadcast, input, expected, wg_global_size, wg_local_size); /* prepare input for datatype */ OCL_CREATE_BUFFER(buf[0], 0, wg_global_size * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[1], 0, wg_global_size * sizeof(T), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_uint),
&wg_local_x); OCL_SET_ARG(3, sizeof(cl_uint), &wg_local_y); OCL_SET_ARG(4, sizeof(cl_uint), &wg_local_z); /* set input data for GPU */ OCL_MAP_BUFFER(0); memcpy(buf_data[0], input, wg_global_size * sizeof(T)); OCL_UNMAP_BUFFER(0); /* run the kernel on GPU */ if(wg_broadcast == WG_BROADCAST_1D) { globals[0] = WG_GLOBAL_SIZE_X; locals[0] = WG_LOCAL_SIZE_X; OCL_NDRANGE(1); } else if(wg_broadcast == WG_BROADCAST_2D) { globals[0] = WG_GLOBAL_SIZE_X; locals[0] = WG_LOCAL_SIZE_X; globals[1] = WG_GLOBAL_SIZE_Y; locals[1] = WG_LOCAL_SIZE_Y; OCL_NDRANGE(2); } else if(wg_broadcast == WG_BROADCAST_3D) { globals[0] = WG_GLOBAL_SIZE_X; locals[0] = WG_LOCAL_SIZE_X; globals[1] = WG_GLOBAL_SIZE_Y; locals[1] = WG_LOCAL_SIZE_Y; globals[2] = WG_GLOBAL_SIZE_Z; locals[2] = WG_LOCAL_SIZE_Z; OCL_NDRANGE(3); } /* check if mismatch */ OCL_MAP_BUFFER(1); uint32_t mismatches = 0; for (uint32_t i = 0; i < wg_global_size; i++) if(((T *)buf_data[1])[i] != *(expected + i)) { /* found mismatch, increment */ mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << ((T *)buf_data[1])[i] << " != " << *(expected + i) << endl; #endif } #if DEBUG_STDOUT /* output mismatch count */ cout << "mismatches " << mismatches << endl; #endif OCL_UNMAP_BUFFER(1); OCL_ASSERT(mismatches == 0); } /* * Workgroup broadcast 1D functions */ void compiler_workgroup_broadcast_1D_int(void) { if (!cl_check_ocl20()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_broadcast", "compiler_workgroup_broadcast_1D_int"); workgroup_generic(WG_BROADCAST_1D, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_broadcast_1D_int); void compiler_workgroup_broadcast_1D_long(void) { if (!cl_check_ocl20()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_broadcast", "compiler_workgroup_broadcast_1D_long"); workgroup_generic(WG_BROADCAST_1D, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_broadcast_1D_long); /* * Workgroup broadcast 2D functions */ void compiler_workgroup_broadcast_2D_int(void) { if (!cl_check_ocl20()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_broadcast", "compiler_workgroup_broadcast_2D_int"); workgroup_generic(WG_BROADCAST_2D, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_broadcast_2D_int); void compiler_workgroup_broadcast_2D_long(void) { if (!cl_check_ocl20()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_broadcast", "compiler_workgroup_broadcast_2D_long"); workgroup_generic(WG_BROADCAST_2D, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_broadcast_2D_long); /* * Workgroup broadcast 3D functions */ void compiler_workgroup_broadcast_3D_int(void) { if (!cl_check_ocl20()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_broadcast", "compiler_workgroup_broadcast_3D_int"); workgroup_generic(WG_BROADCAST_3D, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_broadcast_3D_int); void compiler_workgroup_broadcast_3D_long(void) { if (!cl_check_ocl20()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_broadcast", "compiler_workgroup_broadcast_3D_long"); workgroup_generic(WG_BROADCAST_3D, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_broadcast_3D_long);
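/* The matching kernels/compiler_workgroup_broadcast.cl is not part of this listing. A minimal sketch of the 1D int variant that would satisfy workgroup_generic above, assuming the kernels simply wrap the OpenCL 2.0 work_group_broadcast built-in (the kernel name and argument order come from the OCL_CREATE_KERNEL_FROM_FILE/OCL_SET_ARG calls; the body itself is an assumption): __kernel void compiler_workgroup_broadcast_1D_int(__global int *src, __global int *dst, uint wg_local_x, uint wg_local_y, uint wg_local_z) { int val = src[get_global_id(0)]; dst[get_global_id(0)] = work_group_broadcast(val, wg_local_x); } The 2D and 3D variants would pass (wg_local_x, wg_local_y) resp. (wg_local_x, wg_local_y, wg_local_z) to the two- and three-coordinate overloads of work_group_broadcast. */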
Beignet-1.3.2-Source/utests/compiler_copy_image_1d.cpp000664 001750 001750 00000002761 13161142102 022102 0ustar00yryr000000 000000 #include <cstring> /* header name stripped in this listing; <cstring> reconstructed from the memset() usage below */ #include "utest_helper.hpp" static void compiler_copy_image_1d(void) { const size_t w = 512; cl_image_format format; cl_image_desc desc; cl_sampler sampler; memset(&desc, 0x0, sizeof(cl_image_desc)); memset(&format, 0x0, sizeof(cl_image_format)); // Setup kernel and images OCL_CREATE_KERNEL("test_copy_image_1d"); buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t) * w); for (uint32_t i = 0; i < w; i++) ((uint32_t*)buf_data[0])[i] = i; format.image_channel_order = CL_RGBA; format.image_channel_data_type = CL_UNSIGNED_INT8; desc.image_type = CL_MEM_OBJECT_IMAGE1D; desc.image_width = w; desc.image_row_pitch = w * sizeof(uint32_t); OCL_CREATE_IMAGE(buf[0], CL_MEM_COPY_HOST_PTR, &format, &desc, buf_data[0]); desc.image_row_pitch = 0; OCL_CREATE_IMAGE(buf[1], 0, &format, &desc, NULL); OCL_CREATE_SAMPLER(sampler, CL_ADDRESS_REPEAT, CL_FILTER_NEAREST); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(sampler), &sampler); globals[0] = w; locals[0] = 16; OCL_NDRANGE(1); // Check result OCL_MAP_BUFFER_GTT(0); OCL_MAP_BUFFER_GTT(1); for (uint32_t i = 0; i < w; i++) { //printf (" %x", ((uint32_t*)buf_data[1])[i]); OCL_ASSERT(((uint32_t*)buf_data[0])[i] == ((uint32_t*)buf_data[1])[i]); } OCL_UNMAP_BUFFER_GTT(0); OCL_UNMAP_BUFFER_GTT(1); } MAKE_UTEST_FROM_FUNCTION(compiler_copy_image_1d); Beignet-1.3.2-Source/utests/enqueue_fill_buf.cpp000664 001750 001750 00000004117 13161142102 021016 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include <cstring> /* header name stripped in this listing; <cstring> reconstructed from the memset() usage below */ static char pattern_serials[128]; static void test_fill_buf(size_t sz, size_t offset, size_t size, size_t pattern_sz) { unsigned int i; int ret = 0; OCL_MAP_BUFFER(0); memset(((char*)buf_data[0]), 0, sz); OCL_UNMAP_BUFFER(0); for (i=0; i < pattern_sz; i++) { pattern_serials[i] = (rand() & 63); } if (offset + size > sz) { /* Expect Error.
*/ OCL_ASSERT(clEnqueueFillBuffer(queue, buf[0], pattern_serials, pattern_sz, offset, size, 0, NULL, NULL)); return; } ret = clEnqueueFillBuffer(queue, buf[0], pattern_serials, pattern_sz, offset, size, 0, NULL, NULL); OCL_ASSERT(!ret); OCL_MAP_BUFFER(0); #if 0 printf("\n==== pattern size is %d, offset is %d, size is %d ====\n", pattern_sz, offset, size); printf("\n########### buffer: \n"); for (i = 0; i < sz; ++i) printf(" %2.2u", ((unsigned char*)buf_data[0])[i]); #endif // Check results int j = 0; for (i = 0; i < sz; ++i) { if (i < offset || i >= offset + size) { if (((char*)buf_data[0])[i] != 0) { printf ("\nnon zero index is %d\n", i); OCL_ASSERT(0); } continue; } if (((char*)buf_data[0])[i] != pattern_serials[j]) { printf ("\ndifferent index is %d\n", i); OCL_ASSERT(0); } j++; if (j == (int)pattern_sz) j = 0; } OCL_UNMAP_BUFFER(0); } void enqueue_fill_buf(void) { size_t offset; size_t pattern_sz; const size_t sz = 1024; size_t size = 0; static int valid_sz[] = {1, 2, 4, 8, 16, 32, 64, 128}; unsigned int i = 0; OCL_CREATE_BUFFER(buf[0], 0, sz * sizeof(char), NULL); for (i = 0; i < sizeof(valid_sz)/sizeof(int); i++) { pattern_sz = valid_sz[i]; size = ((rand()%1024)/pattern_sz) * pattern_sz; offset = ((rand()%1024)/pattern_sz) * pattern_sz; while (size + offset + 1 > sz) { if (size > offset) { size = size - offset; } else offset = offset - size; } test_fill_buf(sz, offset, size, pattern_sz); } } MAKE_UTEST_FROM_FUNCTION(enqueue_fill_buf); Beignet-1.3.2-Source/utests/.gitignore000664 001750 001750 00000000504 13161142102 016765 0ustar00yryr000000 000000 compiler_box_blur.bmp compiler_box_blur_float.bmp compiler_clod.bmp compiler_julia.bmp compiler_julia_no_break.bmp compiler_mandelbrot.bmp compiler_mandelbrot_alternate.bmp compiler_menger_sponge_no_shadow.bmp compiler_nautilus.bmp compiler_ribbon.bmp flat_address_space libutests.so utest_run generated utest_generator.pyc Beignet-1.3.2-Source/utests/compiler_long_bitcast.cpp000664 001750 001750 00000014064 13161142102 022051 0ustar00yryr000000 000000 #include <cstdint> #include <cstring> #include <cstdio> #include <cstdlib> /* the four angle-bracket header names were stripped in this listing; reconstructed from the uint64_t/memcpy/rand() usage below */ #include "utest_helper.hpp" void compiler_bitcast_char8_to_long(void) { const size_t n = 64; const int v = 8; char src[n * v]; uint64_t *dst = (uint64_t *)src; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_long_bitcast", "compiler_bitcast_char8_to_long"); OCL_CREATE_BUFFER(buf[0], 0, sizeof(src), NULL); OCL_CREATE_BUFFER(buf[1], 0, sizeof(src), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n*v; ++i) { src[i] = (char)rand(); } OCL_MAP_BUFFER(0); memcpy(buf_data[0], src, sizeof(src)); OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { OCL_ASSERT(((uint64_t *)(buf_data[1]))[i] == dst[i]); //printf("ref is 0x%lx, result is 0x%lx\n", dst[i], ((int64_t *)(buf_data[1]))[i]); } OCL_UNMAP_BUFFER(1); } void compiler_bitcast_long_to_char8(void) { const size_t n = 64; const int v = 8; uint64_t src[n]; char *dst = (char *)src; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_long_bitcast", "compiler_bitcast_char8_to_long"); OCL_CREATE_BUFFER(buf[0], 0, sizeof(src), NULL); OCL_CREATE_BUFFER(buf[1], 0, sizeof(src), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = ((int64_t)rand() << 32) + rand(); } OCL_MAP_BUFFER(0);
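/* A bitcast only reinterprets bytes (the device side presumably uses an as_char8()-style reinterpretation), so the CPU reference needs no conversion: dst above is simply the char view of src, and the raw 64-bit patterns are uploaded unchanged for the GPU. */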
memcpy(buf_data[0], src, sizeof(src)); OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n*v; ++i) { OCL_ASSERT(((char *)(buf_data[1]))[i] == dst[i]); // printf("ref is 0x%2x, result is 0x%2x\n", dst[i], ((char *)(buf_data[1]))[i]); } OCL_UNMAP_BUFFER(1); } void compiler_bitcast_int2_to_long(void) { const size_t n = 64; const int v = 2; int src[n * v]; uint64_t *dst = (uint64_t *)src; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_long_bitcast", "compiler_bitcast_int2_to_long"); OCL_CREATE_BUFFER(buf[0], 0, sizeof(src), NULL); OCL_CREATE_BUFFER(buf[1], 0, sizeof(src), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n*v; ++i) { src[i] = (int)rand(); } OCL_MAP_BUFFER(0); memcpy(buf_data[0], src, sizeof(src)); OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { OCL_ASSERT(((uint64_t *)(buf_data[1]))[i] == dst[i]); //printf("ref is 0x%lx, result is 0x%lx\n", dst[i], ((int64_t *)(buf_data[1]))[i]); } OCL_UNMAP_BUFFER(1); } void compiler_bitcast_long_to_int2(void) { const size_t n = 64; const int v = 2; uint64_t src[n]; uint32_t *dst = (uint32_t *)src; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_long_bitcast", "compiler_bitcast_long_to_int2"); OCL_CREATE_BUFFER(buf[0], 0, sizeof(src), NULL); OCL_CREATE_BUFFER(buf[1], 0, sizeof(src), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = ((int64_t)i << 32) + i; } OCL_MAP_BUFFER(0); memcpy(buf_data[0], src, sizeof(src)); OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n*v; ++i) { OCL_ASSERT(((uint32_t *)(buf_data[1]))[i] == dst[i]); //printf("ref is 0x%2x, result is 0x%2x\n", dst[i], ((uint32_t *)(buf_data[1]))[i]); } OCL_UNMAP_BUFFER(1); } void compiler_bitcast_short4_to_long(void) { const size_t n = 64; const int v = 4; short src[n * v]; uint64_t *dst = (uint64_t *)src; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_long_bitcast", "compiler_bitcast_short4_to_long"); OCL_CREATE_BUFFER(buf[0], 0, sizeof(src), NULL); OCL_CREATE_BUFFER(buf[1], 0, sizeof(src), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n*v; ++i) { src[i] = (short)rand(); } OCL_MAP_BUFFER(0); memcpy(buf_data[0], src, sizeof(src)); OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { OCL_ASSERT(((uint64_t *)(buf_data[1]))[i] == dst[i]); //printf("ref is 0x%lx, result is 0x%lx\n", dst[i], ((int64_t *)(buf_data[1]))[i]); } OCL_UNMAP_BUFFER(1); } void compiler_bitcast_long_to_short4(void) { const size_t n = 64; const int v = 4; uint64_t src[n]; uint16_t *dst = (uint16_t *)src; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_long_bitcast", "compiler_bitcast_long_to_short4"); OCL_CREATE_BUFFER(buf[0], 0, sizeof(src), NULL); OCL_CREATE_BUFFER(buf[1], 0, sizeof(src), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = ((int64_t)rand() << 32) + 
rand(); } OCL_MAP_BUFFER(0); memcpy(buf_data[0], src, sizeof(src)); OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n*v; ++i) { OCL_ASSERT(((uint16_t *)(buf_data[1]))[i] == dst[i]); //printf("ref is 0x%2x, result is 0x%2x\n", dst[i], ((uint16_t *)(buf_data[1]))[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_bitcast_char8_to_long); MAKE_UTEST_FROM_FUNCTION(compiler_bitcast_long_to_char8); MAKE_UTEST_FROM_FUNCTION(compiler_bitcast_int2_to_long); MAKE_UTEST_FROM_FUNCTION(compiler_bitcast_long_to_int2); MAKE_UTEST_FROM_FUNCTION(compiler_bitcast_short4_to_long); MAKE_UTEST_FROM_FUNCTION(compiler_bitcast_long_to_short4); Beignet-1.3.2-Source/utests/new_data.txt000664 001750 001750 00000005600 13161142102 017322 0ustar00yryr000000 000000 6 5 3 4 37 15 10 2 200 3 156 1 97 200 3 3 2 1 2 10 128 2 124 25 5 5 251 0 256 0 256 0 256 0 256 0 256 0 256 0 256 1 256 2 256 3 256 0 256 0 256 0 256 1 256 2 256 3 256 0 256 0 256 0 256 0 256 0 256 4 256 5 256 6 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 3 100 255 100 155 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 0 100 255 56 0 100 253 100 255 56 0 56 0 20 8 180 9 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 
0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 256 0 1 253 5 252 150 168 100 254 150 168 100 254 1 253 5 252 Beignet-1.3.2-Source/utests/compiler_load_bool_imm.cpp000664 001750 001750 00000001506 13161142102 022172 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_load_bool_imm(void) { const size_t n = 1024; const size_t local_size = 16; const int copiesPerWorkItem = 5; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_load_bool_imm"); OCL_CREATE_BUFFER(buf[0], 0, n * copiesPerWorkItem * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, local_size*copiesPerWorkItem*sizeof(int), NULL); // 16 x int OCL_SET_ARG(2, sizeof(int), &copiesPerWorkItem); // 16 x int // Run the kernel globals[0] = n; locals[0] = local_size; OCL_NDRANGE(1); OCL_MAP_BUFFER(0); // Check results int *dst = (int*)buf_data[0]; for (uint32_t i = 0; i < n * copiesPerWorkItem; i++) OCL_ASSERT(dst[i] == copiesPerWorkItem); OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_load_bool_imm); Beignet-1.3.2-Source/utests/compiler_global_constant.cpp000664 001750 001750 00000005736 13161142102 022560 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_global_constant(void) { const size_t n = 2048; const uint32_t e = 34, r = 77; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_global_constant"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(uint32_t), &e); OCL_SET_ARG(2, sizeof(uint32_t), &r); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); unsigned int m[3] = {71,72,73}; // Check results OCL_MAP_BUFFER(0); for (uint32_t i = 0; i < n; ++i) // printf("%d result %d reference %d\n", i, ((uint32_t *)buf_data[0])[i], m[i%3] + e + r); OCL_ASSERT(((uint32_t *)buf_data[0])[i] == m[i%3] + e + r); OCL_UNMAP_BUFFER(0); } void compiler_global_constant1(void) { const size_t n = 32; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_global_constant", "compiler_global_constant1"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); uint32_t data1[] = {1, 4, 7}; uint32_t data2[]= {3, 7, 11}; // Check results OCL_MAP_BUFFER(0); for (uint32_t i = 0; i < n; ++i) // printf("%d result %d reference %d\n", i, ((uint32_t *)buf_data[0])[i], data1[i%3] + data2[i%3]); OCL_ASSERT(((uint32_t *)buf_data[0])[i] == data1[i%3] + data2[i%3]); OCL_UNMAP_BUFFER(0); } void compiler_global_constant2(void) { const size_t n = 32; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_global_constant", "compiler_global_constant2"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); // Check results OCL_MAP_BUFFER(0); for (uint32_t i = 0; i < n; ++i) // printf("%d result %d reference %d\n", i, ((uint32_t *)buf_data[0])[i], 6); OCL_ASSERT(((uint32_t *)buf_data[0])[i] == 6); OCL_UNMAP_BUFFER(0); } void compiler_global_constant3(void) { const size_t n = 32; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_global_constant", "compiler_global_constant3"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); 
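// Reference data below presumably mirrors the __constant arrays declared in kernels/compiler_global_constant.cl (not part of this listing); the kernel is expected to write data1[i%3] + data2[i%3] per element.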
uint32_t data1[] = {3, 6, 9}; char data2[]= {'c', 'f', 'j'}; // Check results OCL_MAP_BUFFER(0); for (uint32_t i = 0; i < n; ++i) // printf("%d result %d reference %d\n", i, ((uint32_t *)buf_data[0])[i], data1[i%3] + (int)data2[i%3]); OCL_ASSERT(((uint32_t *)buf_data[0])[i] == data1[i%3] + (uint32_t)data2[i%3]); OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(compiler_global_constant, true); MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(compiler_global_constant1, true); MAKE_UTEST_FROM_FUNCTION_KEEP_PROGRAM(compiler_global_constant2, true); MAKE_UTEST_FROM_FUNCTION(compiler_global_constant3); Beignet-1.3.2-Source/utests/runtime_pipe_query.cpp000664 001750 001750 00000001064 13161142102 021430 0ustar00yryr000000 000000 #include <cstring> /* header name stripped in this listing; <cstring> is the most likely original */ #include "utest_helper.hpp" static void runtime_pipe_query(void) { if(!cl_check_ocl20(false)) return; const size_t w = 16; const size_t sz = 8; cl_uint retnum, retsz; /* create a pipe and query its properties */ OCL_CALL2(clCreatePipe, buf[0], ctx, 0, sz, w, NULL); OCL_CALL(clGetPipeInfo, buf[0], CL_PIPE_MAX_PACKETS, sizeof(retnum), &retnum, NULL); OCL_CALL(clGetPipeInfo, buf[0], CL_PIPE_PACKET_SIZE, sizeof(retsz), &retsz, NULL); /* Check result */ OCL_ASSERT(sz == retsz && w == retnum); } MAKE_UTEST_FROM_FUNCTION(runtime_pipe_query); Beignet-1.3.2-Source/utests/runtime_createcontext.cpp000664 001750 001750 00000000550 13161142102 022115 0ustar00yryr000000 000000 #include "utest_helper.hpp" void runtime_createcontextfromtype(void) { cl_int status; cl_context ctx; ctx = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, NULL, NULL, &status); if (ctx == NULL) { OCL_THROW_ERROR("runtime_createcontextfromtype", status); } clReleaseContext(ctx); } MAKE_UTEST_FROM_FUNCTION(runtime_createcontextfromtype); Beignet-1.3.2-Source/utests/builtin_convert_sat.cpp000664 001750 001750 00000004755 13161142102 021570 0ustar00yryr000000 000000 #include <cstdlib> /* header name stripped in this listing; <cstdlib> reconstructed from the rand() usage below */ #include "utest_helper.hpp" typedef unsigned char uchar; typedef unsigned short ushort; int64_t my_rand(void) { int64_t x = rand() - RAND_MAX/2; int64_t y = rand() - RAND_MAX/2; return x * y; } #define DEF2(DST_TYPE, SRC_TYPE, DST_MIN, DST_MAX, REAL_SRC_TYPE) \ void builtin_convert_ ## SRC_TYPE ## _to_ ## DST_TYPE ## _sat(void) \ { \ const int n = 128; \ OCL_CREATE_KERNEL_FROM_FILE("builtin_convert_sat", "builtin_convert_" # SRC_TYPE "_to_" # DST_TYPE "_sat"); \ OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(REAL_SRC_TYPE), NULL); \ OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(DST_TYPE), NULL); \ OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); \ OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); \ globals[0] = n; \ locals[0] = 16; \ OCL_MAP_BUFFER(0); \ for (int i = 0; i < n; i++) \ ((REAL_SRC_TYPE *)buf_data[0])[i] = my_rand(); \ OCL_UNMAP_BUFFER(0); \ OCL_NDRANGE(1); \ OCL_MAP_BUFFER(0); \ OCL_MAP_BUFFER(1); \ for (int i = 0; i < n; i++) { \ REAL_SRC_TYPE src = ((REAL_SRC_TYPE *)buf_data[0])[i]; \ DST_TYPE dst; \ if ((double)src > (double)DST_MAX) \ dst = DST_MAX; \ else if ((double)src < (double)DST_MIN) \ dst = DST_MIN; \ else \ dst = src; \ OCL_ASSERT(((DST_TYPE *)buf_data[1])[i] == dst); \ } \ OCL_UNMAP_BUFFER(0); \ OCL_UNMAP_BUFFER(1); \ } \ MAKE_UTEST_FROM_FUNCTION(builtin_convert_ ## SRC_TYPE ## _to_ ## DST_TYPE ## _sat); #define DEF(DST_TYPE, SRC_TYPE, DST_MIN, DST_MAX) \ DEF2(DST_TYPE, SRC_TYPE, DST_MIN, DST_MAX, SRC_TYPE) DEF(char, uchar, -128, 127); DEF(char, short, -128, 127); DEF(char, ushort, -128, 127); DEF(char, int, -128, 127); DEF(char, uint, -128, 127); DEF2(char, long, -128, 127, int64_t); DEF(char, float, -128, 127); DEF(uchar, char, 0, 255); DEF(uchar, short,
0, 255); DEF(uchar, ushort, 0, 255); DEF(uchar, int, 0, 255); DEF(uchar, uint, 0, 255); DEF2(uchar, long, 0, 255, int64_t); DEF(uchar, float, 0, 255); DEF(short, ushort, -32768, 32767); DEF(short, int, -32768, 32767); DEF(short, uint, -32768, 32767); DEF2(short, long, -32768, 32767, int64_t); DEF(short, float, -32768, 32767); DEF(ushort, short, 0, 65535); DEF(ushort, int, 0, 65535); DEF(ushort, uint, 0, 65535); DEF2(ushort, long, 0, 65535, int64_t); DEF(ushort, float, 0, 65535); DEF(int, uint, -0x7FFFFFFF-1, 0x7FFFFFFF); DEF2(int, long, -0x7FFFFFFF-1, 0x7FFFFFFF, int64_t); DEF(int, float, -0x7FFFFFFF-1, 0x7FFFFFFF); DEF(uint, int, 0, 0xffffffffu); DEF2(uint, long, 0, 0xffffffffu, int64_t); DEF(uint, float, 0, 0xffffffffu); #undef DEF Beignet-1.3.2-Source/utests/compiler_if_else.cpp000664 001750 001750 00000003142 13173554000 021011 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_if_else(void) { const size_t n = 17; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_if_else"); buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t) * n); for (uint32_t i = 0; i < n; ++i) ((uint32_t*)buf_data[0])[i] = 2; OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; OCL_NDRANGE(1); // First control flow OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 16; ++i) { OCL_ASSERT(((int32_t*)buf_data[1])[i] == 2); } OCL_UNMAP_BUFFER(1); // Second control flow OCL_MAP_BUFFER(0); for (uint32_t i = 0; i < n; ++i) ((int32_t*)buf_data[0])[i] = -1; OCL_UNMAP_BUFFER(0); OCL_NDRANGE(1); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 16; ++i) { OCL_ASSERT(((int32_t*)buf_data[1])[i] == -2); } OCL_UNMAP_BUFFER(1); // Third control flow OCL_MAP_BUFFER(0); for (uint32_t i = 0; i < 4; ++i) ((int32_t*)buf_data[0])[i] = 2; for (uint32_t i = 4; i < n; ++i) ((int32_t*)buf_data[0])[i] = -1; OCL_UNMAP_BUFFER(0); OCL_NDRANGE(1); OCL_MAP_BUFFER(1); for (uint32_t i = 0; i < 3; ++i) { OCL_ASSERT(((int32_t*)buf_data[1])[i] == 2); } OCL_ASSERT(((int32_t*)buf_data[1])[3] == -1); for (uint32_t i = 4; i < 16; ++i) { OCL_ASSERT(((int32_t*)buf_data[1])[i] == -2); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_if_else); Beignet-1.3.2-Source/utests/compiler_bswap.cpp000664 001750 001750 00000015205 13161142102 020513 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include "string.h" #define cpu_htons(A) ((((uint16_t)(A) & 0xff00) >> 8) | \ (((uint16_t)(A) & 0x00ff) << 8)) #define cpu_htonl(A) ((((uint32_t)(A) & 0xff000000) >> 24) | \ (((uint32_t)(A) & 0x00ff0000) >> 8) | \ (((uint32_t)(A) & 0x0000ff00) << 8) | \ (((uint32_t)(A) & 0x000000ff) << 24)) #define cpu_htonll(A) ((((uint64_t)(A) & 0xff00000000000000) >> 56) | \ (((uint64_t)(A) & 0x00ff000000000000) >> 40) | \ (((uint64_t)(A) & 0x0000ff0000000000) >> 24) | \ (((uint64_t)(A) & 0x000000ff00000000) >> 8) | \ (((uint64_t)(A) & 0x00000000ff000000) << 8) | \ (((uint64_t)(A) & 0x0000000000ff0000) << 24) | \ (((uint64_t)(A) & 0x000000000000ff00) << 40) | \ (((uint64_t)(A) & 0x00000000000000ff) << 56) ) template <class T> static void gen_rand_val(T & val) { val = static_cast<T>(rand());//(0xAABBCCDD);// } template <class T> static void cpu(int global_id, T *src, T *dst) { T f = src[global_id]; T g = 0; if (sizeof(T) == sizeof(int16_t)) g = cpu_htons(f); else if (sizeof(T) == sizeof(int32_t)) g = cpu_htonl(f); else if
(sizeof(T) == sizeof(int64_t)) g = cpu_htonll(f); dst[global_id] = g; } template <class T> static void cpu(int global_id, T src, T *dst) { T f = src; T g = 0; if (sizeof(T) == sizeof(int16_t)) g = cpu_htons(f); else if (sizeof(T) == sizeof(int32_t)) g = cpu_htonl(f); else if (sizeof(T) == sizeof(int64_t)) g = cpu_htonll(f); dst[global_id] = g; } template <class T> inline static void print_data(T& val) { if(sizeof(T) == sizeof(uint16_t)) printf(" 0x%hx", (uint16_t)val); else if(sizeof(T) == sizeof(uint32_t)) printf(" 0x%x", (uint32_t)val); else if(sizeof(T) == sizeof(uint64_t)) printf(" 0x%lx", (uint64_t)val); } template <class T> static void dump_data(T* raw, T* cpu, T* gpu, int n) { printf("\nRaw: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(raw[i]); } printf("\nCPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(cpu[i]); } printf("\nGPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(gpu[i]); } } template <class T> static void dump_data(T raw, T* cpu, T* gpu, int n) { printf("\nRaw: \n"); print_data(raw); printf("\nCPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(cpu[i]); } printf("\nGPU: \n"); for (int32_t i = 0; i < (int32_t) n; ++i) { print_data(gpu[i]); } } void compiler_bswap(void) { const size_t n = 16; uint32_t src0[n]; uint16_t src1[n]; uint32_t dst0[n]; uint16_t dst1[n]; int32_t src2 = static_cast<int32_t>(rand()); int32_t dst2[n]; int16_t src3 = static_cast<int16_t>(rand()); int16_t dst3[n]; uint64_t src4[n]; uint64_t dst4[n]; int64_t src5 = static_cast<int64_t>(rand()) << 32| static_cast<int64_t>(rand()); int64_t dst5[n]; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_bswap", "compiler_bswap"); OCL_CREATE_BUFFER(buf[0], 0, sizeof(src0), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_CREATE_BUFFER(buf[1], 0, sizeof(dst0), NULL); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_CREATE_BUFFER(buf[2], 0, sizeof(src1), NULL); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_CREATE_BUFFER(buf[3], 0, sizeof(dst1), NULL); OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]); OCL_SET_ARG(4, sizeof(int32_t), &src2); OCL_CREATE_BUFFER(buf[4], 0, sizeof(dst2), NULL); OCL_SET_ARG(5, sizeof(cl_mem), &buf[4]); OCL_SET_ARG(6, sizeof(int16_t), &src3); OCL_CREATE_BUFFER(buf[5], 0, sizeof(dst3), NULL); OCL_SET_ARG(7, sizeof(cl_mem), &buf[5]); OCL_CREATE_BUFFER(buf[6], 0, sizeof(src4), NULL); OCL_SET_ARG(8, sizeof(cl_mem), &buf[6]); OCL_CREATE_BUFFER(buf[7], 0, sizeof(dst4), NULL); OCL_SET_ARG(9, sizeof(cl_mem), &buf[7]); OCL_SET_ARG(10, sizeof(int64_t), &src5); OCL_CREATE_BUFFER(buf[8], 0, sizeof(dst5), NULL); OCL_SET_ARG(11, sizeof(cl_mem), &buf[8]); OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) { gen_rand_val(src0[i]); } memcpy(buf_data[0], src0, sizeof(src0)); OCL_UNMAP_BUFFER(0); /* Clear the dst buffer to avoid random data. */ OCL_MAP_BUFFER(1); memset(buf_data[1], 0, sizeof(dst0)); OCL_UNMAP_BUFFER(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { gen_rand_val(src1[i]); } memcpy(buf_data[2], src1, sizeof(src1)); OCL_UNMAP_BUFFER(2); /* Clear the dst buffer to avoid random data. */ OCL_MAP_BUFFER(3); memset(buf_data[3], 0, sizeof(dst1)); OCL_UNMAP_BUFFER(3); /* Clear the dst buffer to avoid random data. */ OCL_MAP_BUFFER(4); memset(buf_data[4], 0, sizeof(dst2)); OCL_UNMAP_BUFFER(4); /* Clear the dst buffer to avoid random data.
*/ OCL_MAP_BUFFER(5); memset(buf_data[5], 0, sizeof(dst3)); OCL_UNMAP_BUFFER(5); OCL_MAP_BUFFER(6); for (int32_t i = 0; i < (int32_t) n; ++i) { uint64_t x, y; gen_rand_val(x); gen_rand_val(y); src4[i] = (x << 32)| y; } memcpy(buf_data[6], src4, sizeof(src4)); OCL_UNMAP_BUFFER(6); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { if (i%2) { dst0[i] = src0[i]; continue; } cpu(i, src0, dst0); } // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { cpu(i, src1, dst1); if (i%2) { dst1[i] = dst1[i] + 1; cpu(i, dst1, dst1); } } // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, src2, dst2); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, src3, dst3); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, src4, dst4); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, src5, dst5); OCL_MAP_BUFFER(1); //dump_data(src0, dst0, (uint32_t *)buf_data[1], n); OCL_ASSERT(!memcmp(buf_data[1], dst0, sizeof(dst0))); OCL_UNMAP_BUFFER(1); OCL_MAP_BUFFER(3); //dump_data(src1, dst1, (uint16_t *)buf_data[3], n); OCL_ASSERT(!memcmp(buf_data[3], dst1, sizeof(dst1))); OCL_UNMAP_BUFFER(3); OCL_MAP_BUFFER(4); //dump_data(src2, dst2, (int32_t *)buf_data[4], n); OCL_ASSERT(!memcmp(buf_data[4], dst2, sizeof(dst2))); OCL_UNMAP_BUFFER(4); OCL_MAP_BUFFER(5); //dump_data(src3, dst3, (int16_t *)buf_data[5], n); OCL_ASSERT(!memcmp(buf_data[5], dst3, sizeof(dst3))); OCL_UNMAP_BUFFER(5); OCL_MAP_BUFFER(7); //dump_data(src4, dst4, (uint64_t *)buf_data[7], n); OCL_ASSERT(!memcmp(buf_data[7], dst4, sizeof(dst4))); OCL_UNMAP_BUFFER(7); OCL_MAP_BUFFER(8); //dump_data(src5, dst5, (int64_t *)buf_data[8], n); OCL_ASSERT(!memcmp(buf_data[8], dst5, sizeof(dst5))); OCL_UNMAP_BUFFER(8); } MAKE_UTEST_FROM_FUNCTION(compiler_bswap); Beignet-1.3.2-Source/utests/compiler_workgroup_scan_inclusive.cpp000664 001750 001750 00000025745 13161142102 024535 0ustar00yryr000000 000000 #include #include #include #include #include #include #include #include "utest_helper.hpp" using namespace std; /* set to 1 for debug, output of input-expected data */ #define DEBUG_STDOUT 0 /* NDRANGE */ #define WG_GLOBAL_SIZE 64 #define WG_LOCAL_SIZE 32 enum WG_FUNCTION { WG_SCAN_INCLUSIVE_ADD, WG_SCAN_INCLUSIVE_MAX, WG_SCAN_INCLUSIVE_MIN }; /* * Generic compute-expected function for op SCAN INCLUSIVE type * and any variable type */ template static void compute_expected(WG_FUNCTION wg_func, T* input, T* expected) { if(wg_func == WG_SCAN_INCLUSIVE_ADD) { expected[0] = input[0]; for(uint32_t i = 1; i < WG_LOCAL_SIZE; i++) expected[i] = input[i] + expected[i - 1]; } else if(wg_func == WG_SCAN_INCLUSIVE_MAX) { expected[0] = input[0]; for(uint32_t i = 1; i < WG_LOCAL_SIZE; i++) expected[i] = max(input[i], expected[i - 1]); } else if(wg_func == WG_SCAN_INCLUSIVE_MIN) { expected[0] = input[0]; for(uint32_t i = 1; i < WG_LOCAL_SIZE; i++) expected[i] = min(input[i], expected[i - 1]); } } /* * Generic input-expected generate function for op SCAN INCLUSIVE type * and any variable type */ template static void generate_data(WG_FUNCTION wg_func, T* &input, T* &expected) { input = new T[WG_GLOBAL_SIZE]; expected = new T[WG_GLOBAL_SIZE]; /* base value for all data types */ T base_val = (long)7 << (sizeof(T) * 5 - 3); /* seed for random inputs */ srand (time(NULL)); /* generate inputs and expected values */ for(uint32_t gid = 0; gid < WG_GLOBAL_SIZE; gid += WG_LOCAL_SIZE) { #if DEBUG_STDOUT cout << endl << "IN: " << endl; #endif /* input values */ for(uint32_t lid = 0; lid < 
WG_LOCAL_SIZE; lid++) { /* initially 0, augment after */ input[gid + lid] = 0; /* check all data types, test ideal for QWORD types */ input[gid + lid] += ((rand() % 2 - 1) * base_val); /* add trailing random bits, tests GENERAL cases */ input[gid + lid] += (rand() % 112); #if DEBUG_STDOUT /* output generated input */ cout << setw(4) << input[gid + lid] << ", " ; if((lid + 1) % 8 == 0) cout << endl; #endif } /* expected values */ compute_expected(wg_func, input + gid, expected + gid); #if DEBUG_STDOUT /* output expected input */ cout << endl << "EXP: " << endl; for(uint32_t lid = 0; lid < WG_LOCAL_SIZE; lid++) { cout << setw(4) << expected[gid + lid] << ", " ; if((lid + 1) % 8 == 0) cout << endl; } #endif } } /* * Generic workgroup utest function for op SCAN INCLUSIVE type * and any variable type */ template static void workgroup_generic(WG_FUNCTION wg_func, T* input, T* expected) { /* input and expected data */ generate_data(wg_func, input, expected); /* prepare input for data type */ OCL_CREATE_BUFFER(buf[0], 0, WG_GLOBAL_SIZE * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[1], 0, WG_GLOBAL_SIZE * sizeof(T), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); /* set input data for GPU */ OCL_MAP_BUFFER(0); memcpy(buf_data[0], input, WG_GLOBAL_SIZE * sizeof(T)); OCL_UNMAP_BUFFER(0); /* run the kernel on GPU */ globals[0] = WG_GLOBAL_SIZE; locals[0] = WG_LOCAL_SIZE; OCL_NDRANGE(1); /* check if mismatch */ OCL_MAP_BUFFER(1); uint32_t mismatches = 0; for (uint32_t i = 0; i < WG_GLOBAL_SIZE; i++) if(((T *)buf_data[1])[i] != *(expected + i)) { /* found mismatch on integer, increment */ if(numeric_limits::is_integer){ mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << ((T *)buf_data[1])[i] << " != " << *(expected + i) << endl; #endif } /* float error is tolerable though */ else { float num_computed = ((T *)buf_data[1])[i]; float num_expected = *(expected + i); float num_diff = abs(num_computed - num_expected) / abs(num_expected); if(num_diff > 0.01f){ mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << ((T *)buf_data[1])[i] << " != " << *(expected + i) << endl; #endif } } } #if DEBUG_STDOUT /* output mismatch count */ cout << "mismatches " << mismatches << endl; #endif OCL_UNMAP_BUFFER(1); OCL_ASSERT(mismatches == 0); } /* * Workgroup scan_inclusive add utest functions */ void compiler_workgroup_scan_inclusive_add_int(void) { if (!cl_check_ocl20()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_add_int"); workgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_inclusive_add_int); void compiler_workgroup_scan_inclusive_add_uint(void) { if (!cl_check_ocl20()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_add_uint"); workgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_inclusive_add_uint); void compiler_workgroup_scan_inclusive_add_long(void) { if (!cl_check_ocl20()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_add_long"); workgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_inclusive_add_long); 
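/* Editor's sketch (an assumption; the kernel source is not part of this
 * section): the "compiler_workgroup_scan_inclusive_*" kernels these hosts
 * load presumably just forward to the OpenCL 2.0 work-group built-ins,
 * e.g. for the int/add case:
 *
 *   kernel void compiler_workgroup_scan_inclusive_add_int(global int *src,
 *                                                         global int *dst)
 *   {
 *     size_t id = get_global_id(0);
 *     dst[id] = work_group_scan_inclusive_add(src[id]);
 *   }
 *
 * which matches the per-work-group recurrence computed on the CPU by
 * compute_expected() above. */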
void compiler_workgroup_scan_inclusive_add_ulong(void) { if (!cl_check_ocl20()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_add_ulong"); workgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_inclusive_add_ulong); void compiler_workgroup_scan_inclusive_add_float(void) { if (!cl_check_ocl20()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_add_float"); workgroup_generic(WG_SCAN_INCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_inclusive_add_float); /* * Workgroup scan_inclusive max utest functions */ void compiler_workgroup_scan_inclusive_max_int(void) { if (!cl_check_ocl20()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_max_int"); workgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_inclusive_max_int); void compiler_workgroup_scan_inclusive_max_uint(void) { if (!cl_check_ocl20()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_max_uint"); workgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_inclusive_max_uint); void compiler_workgroup_scan_inclusive_max_long(void) { if (!cl_check_ocl20()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_max_long"); workgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_inclusive_max_long); void compiler_workgroup_scan_inclusive_max_ulong(void) { if (!cl_check_ocl20()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_max_ulong"); workgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_inclusive_max_ulong); void compiler_workgroup_scan_inclusive_max_float(void) { if (!cl_check_ocl20()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_max_float"); workgroup_generic(WG_SCAN_INCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_inclusive_max_float); /* * Workgroup scan_inclusive min utest functions */ void compiler_workgroup_scan_inclusive_min_int(void) { if (!cl_check_ocl20()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_min_int"); workgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_inclusive_min_int); void compiler_workgroup_scan_inclusive_min_uint(void) { if (!cl_check_ocl20()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_min_uint"); workgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } 
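/* Editor's note (illustrative): for one work-group with input {3, 1, 4, 2}
 * the inclusive scans checked by these tests are
 *
 *   add: {3, 4, 8, 10}   max: {3, 3, 4, 4}   min: {3, 1, 1, 1}
 *
 * i.e. element i combines inputs 0..i, exactly the recurrence implemented
 * in compute_expected(). */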
MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_inclusive_min_uint); void compiler_workgroup_scan_inclusive_min_long(void) { if (!cl_check_ocl20()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_min_long"); workgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_inclusive_min_long); void compiler_workgroup_scan_inclusive_min_ulong(void) { if (!cl_check_ocl20()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_min_ulong"); workgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_inclusive_min_ulong); void compiler_workgroup_scan_inclusive_min_float(void) { if (!cl_check_ocl20()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_inclusive", "compiler_workgroup_scan_inclusive_min_float"); workgroup_generic(WG_SCAN_INCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_inclusive_min_float); Beignet-1.3.2-Source/utests/compiler_mad_hi.cpp000664 001750 001750 00000002251 13161142102 020615 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_mad_hi(void) { const int n = 32; int src1[n], src2[n], src3[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_mad_hi"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[3], 0, n * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); for (int i = 0; i < n; ++i) { src1[i] = ((int*)buf_data[0])[i] = rand(); src2[i] = ((int*)buf_data[1])[i] = rand(); src3[i] = ((int*)buf_data[2])[i] = rand(); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); OCL_NDRANGE(1); OCL_MAP_BUFFER(3); for (int i = 0; i < n; ++i) { long long a = src1[i]; a *= src2[i]; a >>= 32; a += src3[i]; OCL_ASSERT(((int*)buf_data[3])[i] == (int)a); } OCL_UNMAP_BUFFER(3); } MAKE_UTEST_FROM_FUNCTION(compiler_mad_hi); Beignet-1.3.2-Source/utests/utest_assert.cpp000664 001750 001750 00000002650 13161142102 020232 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . 
 *
 * Author: Benjamin Segovia
 */

/**
 * \file assert.cpp
 * \author Benjamin Segovia
 */
#include "utest_assert.hpp"
#include "utest_exception.hpp"
#include <cassert> // header names were dropped by the archive-to-text conversion;
#include <cstdio>  // restored to match the assert()/sprintf() calls below

void onFailedAssertion(const char *msg, const char *file, const char *fn, int line)
{
  char lineString[256];
  sprintf(lineString, "%i", line);
  assert(msg != NULL && file != NULL && fn != NULL);
  const std::string str = "Error: " + std::string(msg)
    + "\n at file " + std::string(file)
    + ", function " + std::string(fn)
    + ", line " + std::string(lineString);
  throw Exception(str);
}
Beignet-1.3.2-Source/utests/compiler_array.cpp000664 001750 001750 00000001322 13161142102 020510 0ustar00yryr000000 000000 #include "utest_helper.hpp"

void compiler_array(void)
{
  const size_t n = 16;

  // Setup kernel and buffers
  OCL_CREATE_KERNEL("compiler_array");
  OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL);
  OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL);
  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);

  // First control flow
  OCL_MAP_BUFFER(0);
  for (uint32_t i = 0; i < n; ++i)
    ((int32_t*)buf_data[0])[i] = -2;
  OCL_UNMAP_BUFFER(0);
  globals[0] = n;
  locals[0] = 16;
  OCL_NDRANGE(1);
  OCL_MAP_BUFFER(1);
  for (uint32_t i = 0; i < 16; ++i)
    OCL_ASSERT(((int32_t*)buf_data[1])[i] == 3);
  OCL_UNMAP_BUFFER(1);
}

MAKE_UTEST_FROM_FUNCTION(compiler_array);
Beignet-1.3.2-Source/utests/compiler_mix.cpp000664 001750 001750 00000002625 13161142102 020176 0ustar00yryr000000 000000 #include "utest_helper.hpp"
#include <cmath> // header name dropped by the conversion; fabsf() below needs it

void compiler_mix(void)
{
  const float MAXERR = 1e-3f;
  const int n = 1024;
  float src1[n], src2[n], src3[n];

  // Setup kernel and buffers
  OCL_CREATE_KERNEL("compiler_mix");
  OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL);
  OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL);
  OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(float), NULL);
  OCL_CREATE_BUFFER(buf[3], 0, n * sizeof(float), NULL);
  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);
  OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]);
  OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]);
  globals[0] = n;
  locals[0] = 16;

  OCL_MAP_BUFFER(0);
  OCL_MAP_BUFFER(1);
  OCL_MAP_BUFFER(2);
  for (int i = 0; i < n; ++i) {
    src1[i] = ((float*)buf_data[0])[i] = (float)rand();
    src2[i] = ((float*)buf_data[1])[i] = (float)rand();
    src3[i] = ((float*)buf_data[2])[i] = (float)rand()/(float)RAND_MAX;
  }
  OCL_UNMAP_BUFFER(0);
  OCL_UNMAP_BUFFER(1);
  OCL_UNMAP_BUFFER(2);
  OCL_NDRANGE(1);

  OCL_MAP_BUFFER(3);
  float res, err;
  float max_err = 0.0f;
  for (int i = 0; i < n; ++i) {
    res = src1[i] + ((src2[i] - src1[i]) * src3[i]);
    err = fabsf((((float*)buf_data[3])[i] - res)/ res);
    max_err = err > max_err?
err: max_err; } OCL_UNMAP_BUFFER(3); printf("\tmix max err is %g\n",max_err); OCL_ASSERT(max_err < MAXERR); } MAKE_UTEST_FROM_FUNCTION(compiler_mix); Beignet-1.3.2-Source/utests/compiler_function_constant0.cpp000664 001750 001750 00000002035 13161142102 023212 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_function_constant0(void) { const size_t n = 2048; const uint32_t value = 34; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_function_constant0"); OCL_CREATE_BUFFER(buf[0], 0, 75 * sizeof(int32_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, 1 * sizeof(char), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(uint32_t), &value); OCL_MAP_BUFFER(0); for(uint32_t i = 0; i < 69; ++i) ((int32_t *)buf_data[0])[i] = i; OCL_UNMAP_BUFFER(0); OCL_MAP_BUFFER(1); ((char *)buf_data[1])[0] = 15; OCL_UNMAP_BUFFER(1); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); OCL_MAP_BUFFER(2); // Check results for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((uint32_t *)buf_data[2])[i] == (value + 15 + i%69)); OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_function_constant0); Beignet-1.3.2-Source/utests/cl_create_kernel.cpp000664 001750 001750 00000001002 13161142102 020754 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void test_create_kernel(void) { cl_ulong max_mem_size; cl_int status; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(max_mem_size), &max_mem_size, NULL); OCL_ASSERT(max_mem_size < (cl_ulong)-1); // increment the size so that following clCreateBuffer() would fail. ++max_mem_size; buf[0] = clCreateBuffer(ctx, 0, max_mem_size, NULL, &status); OCL_ASSERT(status == CL_INVALID_BUFFER_SIZE); } MAKE_UTEST_FROM_FUNCTION(test_create_kernel); Beignet-1.3.2-Source/utests/compiler_double_convert.cpp000664 001750 001750 00000043174 13161142102 022417 0ustar00yryr000000 000000 #include #include #include "utest_helper.hpp" void compiler_double_convert_int(void) { const size_t n = 16; double src[n]; int32_t cpu_dst0[n]; uint32_t cpu_dst1[n]; if (!cl_check_double()) return; memset(cpu_dst0, 0, sizeof(cpu_dst0)); memset(cpu_dst1, 0, sizeof(cpu_dst1)); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_double_convert", "compiler_double_convert_int"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(double), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int32_t), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; // Run random tests OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = ((double*)buf_data[0])[i] = 32.1 * (rand() & 1324135) + 1434342.73209855531; ((int32_t*)buf_data[1])[i] = 0; ((uint32_t*)buf_data[2])[i] = 0; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { if (i%3 == 0) continue; cpu_dst0[i] = (int32_t)src[i]; cpu_dst1[i] = (uint32_t)src[i]; } // Compare OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("Return Int is %d, ref is %d,\t Uint is %u, ref is %u,\t double is %f\n", // ((int*)buf_data[1])[i], cpu_dst0[i], ((uint32_t*)buf_data[2])[i], cpu_dst1[i], src[i]); 
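/* Editor's note (illustrative): both the C cast used for the reference
 * values and OpenCL's default convert_int(double) round toward zero, so
 * e.g. (int32_t)1434374.7 == 1434374 and (int32_t)-32.9 == -32, which is
 * why an exact == comparison is valid here. */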
OCL_ASSERT(((int32_t*)buf_data[1])[i] == cpu_dst0[i]); OCL_ASSERT(((uint32_t*)buf_data[2])[i] == cpu_dst1[i]); } OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_double_convert_int); void compiler_double_convert_float(void) { const size_t n = 16; double src[n]; float cpu_dst[n]; if (!cl_check_double()) return; memset(cpu_dst, 0, sizeof(cpu_dst)); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_double_convert", "compiler_double_convert_float"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(double), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; // Run random tests OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = ((double*)buf_data[0])[i] = 1332.1 * (rand() & 1324135) - 1434342.73209855531 * (rand() & 135); ((float*)buf_data[1])[i] = 0; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { cpu_dst[i] = (float)src[i]; } // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("Return float is %f,\t ref is %f,\t double is %f\n", ((float*)buf_data[1])[i], cpu_dst[i], src[i]); OCL_ASSERT(((float*)buf_data[1])[i] == cpu_dst[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_double_convert_float); void compiler_double_convert_short(void) { const size_t n = 16; double src[n]; int16_t cpu_dst0[n]; uint16_t cpu_dst1[n]; if (!cl_check_double()) return; memset(cpu_dst0, 0, sizeof(cpu_dst0)); memset(cpu_dst1, 0, sizeof(cpu_dst1)); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_double_convert", "compiler_double_convert_short"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(double), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int16_t), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(uint16_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; // Run random tests OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = ((double*)buf_data[0])[i] = 10.3443 * (rand() & 15) + 14.8924323; ((int16_t*)buf_data[1])[i] = 0; ((uint16_t*)buf_data[2])[i] = 0; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { if (i%3 == 0) continue; cpu_dst0[i] = (int16_t)src[i]; cpu_dst1[i] = (uint16_t)src[i]; } // Compare OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("Return Int is %d, ref is %d,\t Uint is %u, ref is %u,\t double is %f\n", // ((int16_t*)buf_data[1])[i], cpu_dst0[i], ((uint16_t*)buf_data[2])[i], cpu_dst1[i], src[i]); OCL_ASSERT(((int16_t*)buf_data[1])[i] == cpu_dst0[i]); OCL_ASSERT(((uint16_t*)buf_data[2])[i] == cpu_dst1[i]); } OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_double_convert_short); void compiler_double_convert_char(void) { const size_t n = 16; double src[n]; int8_t cpu_dst0[n]; uint8_t cpu_dst1[n]; if (!cl_check_double()) return; memset(cpu_dst0, 0, sizeof(cpu_dst0)); memset(cpu_dst1, 0, sizeof(cpu_dst1)); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_double_convert", "compiler_double_convert_char"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(double), NULL); 
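/* Editor's note (illustrative): with rand() & 7 the sources generated
 * below span roughly 2.89 .. 10.3443*7 + 2.89 = 75.3, so every value fits
 * in both int8_t and uint8_t and the plain (non-saturating) casts used as
 * the CPU reference are valid. */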
OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int8_t), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(uint8_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; // Run random tests OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = ((double*)buf_data[0])[i] = 10.3443 * (rand() & 7) + 2.8924323; ((int8_t*)buf_data[1])[i] = 0; ((uint8_t*)buf_data[2])[i] = 0; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { if (i%3 == 0) continue; cpu_dst0[i] = (int8_t)src[i]; cpu_dst1[i] = (uint8_t)src[i]; } // Compare OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { // printf("Return Int is %d, ref is %d,\t Uint is %u, ref is %u,\t double is %f\n", // ((int8_t*)buf_data[1])[i], cpu_dst0[i], ((uint8_t*)buf_data[2])[i], cpu_dst1[i], src[i]); OCL_ASSERT(((int8_t*)buf_data[1])[i] == cpu_dst0[i]); OCL_ASSERT(((uint8_t*)buf_data[2])[i] == cpu_dst1[i]); } OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_double_convert_char); void compiler_double_convert_long(void) { const size_t n = 16; double src[n]; int64_t cpu_dst0[n]; uint64_t cpu_dst1[n]; if (!cl_check_double()) return; memset(cpu_dst0, 0, sizeof(cpu_dst0)); memset(cpu_dst1, 0, sizeof(cpu_dst1)); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_double_convert", "compiler_double_convert_long"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(double), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int64_t), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(uint64_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; // Run random tests OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = ((double*)buf_data[0])[i] = 10.3443 * (rand() & 7) + 2.8924323; ((int64_t*)buf_data[1])[i] = 0; ((uint64_t*)buf_data[2])[i] = 0; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { if (i%3 == 0) continue; cpu_dst0[i] = (int64_t)src[i]; cpu_dst1[i] = (uint64_t)src[i]; } // Compare OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { // printf("Return Int is %d, ref is %d,\t Uint is %u, ref is %u,\t double is %f\n", // ((int8_t*)buf_data[1])[i], cpu_dst0[i], ((uint8_t*)buf_data[2])[i], cpu_dst1[i], src[i]); OCL_ASSERT(((int64_t*)buf_data[1])[i] == cpu_dst0[i]); OCL_ASSERT(((uint64_t*)buf_data[2])[i] == cpu_dst1[i]); } OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_double_convert_long); void compiler_long_convert_double(void) { const size_t n = 16; int64_t src0[n]; uint64_t src1[n]; double cpu_dst0[n]; double cpu_dst1[n]; if (!cl_check_double()) return; memset(cpu_dst0, 0, sizeof(cpu_dst0)); memset(cpu_dst1, 0, sizeof(cpu_dst1)); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_double_convert", "compiler_long_convert_double"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int64_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint64_t), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(double), NULL); OCL_CREATE_BUFFER(buf[3], 0, n * sizeof(double), NULL); OCL_SET_ARG(0, sizeof(cl_mem), 
&buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]); globals[0] = n; locals[0] = 16; // Run random tests OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); OCL_MAP_BUFFER(3); for (int32_t i = 0; i < (int32_t) n; ++i) { src0[i] = ((int64_t*)buf_data[0])[i] = 0xABC8ABACDA00C * (rand() & 7); src1[i] = ((uint64_t*)buf_data[1])[i] = 0xCABC8ABACDA00C * (rand() & 15); ((double*)buf_data[2])[i] = 0.0; ((double*)buf_data[3])[i] = 0.0; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); OCL_UNMAP_BUFFER(3); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { cpu_dst0[i] = (double)src0[i]; cpu_dst1[i] = (double)src1[i]; } // Compare OCL_MAP_BUFFER(2); OCL_MAP_BUFFER(3); for (int32_t i = 0; i < (int32_t) n; ++i) { // printf("long is %ld, ref is %f, double is %f \t" // "ulong is %lu, ref is %f, double is %f\n", // src0[i], cpu_dst0[i], ((double*)buf_data[2])[i], // src1[i], cpu_dst1[i], ((double*)buf_data[3])[i]); OCL_ASSERT(((double*)buf_data[2])[i] == cpu_dst0[i]); OCL_ASSERT(((double*)buf_data[3])[i] == cpu_dst1[i]); } OCL_UNMAP_BUFFER(2); OCL_UNMAP_BUFFER(3); } MAKE_UTEST_FROM_FUNCTION(compiler_long_convert_double); void compiler_int_convert_double(void) { const size_t n = 16; int32_t src0[n]; uint32_t src1[n]; double cpu_dst0[n]; double cpu_dst1[n]; if (!cl_check_double()) return; memset(cpu_dst0, 0, sizeof(cpu_dst0)); memset(cpu_dst1, 0, sizeof(cpu_dst1)); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_double_convert", "compiler_int_convert_double"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int32_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(double), NULL); OCL_CREATE_BUFFER(buf[3], 0, n * sizeof(double), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]); globals[0] = n; locals[0] = 16; // Run random tests OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); OCL_MAP_BUFFER(3); for (int32_t i = 0; i < (int32_t) n; ++i) { src0[i] = ((int32_t*)buf_data[0])[i] = 0xCABC8A0C * (rand() & 7); src1[i] = ((uint32_t*)buf_data[1])[i] = 0xCACDA00C * (rand() & 15); ((double*)buf_data[2])[i] = 0.0; ((double*)buf_data[3])[i] = 0.0; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); OCL_UNMAP_BUFFER(3); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { cpu_dst0[i] = (double)src0[i]; cpu_dst1[i] = (double)src1[i]; } // Compare OCL_MAP_BUFFER(2); OCL_MAP_BUFFER(3); for (int32_t i = 0; i < (int32_t) n; ++i) { // printf("int is %d, ref is %f, double is %f \t" // "uint is %u, ref is %f, double is %f\n", // src0[i], cpu_dst0[i], ((double*)buf_data[2])[i], // src1[i], cpu_dst1[i], ((double*)buf_data[3])[i]); OCL_ASSERT(((double*)buf_data[2])[i] == cpu_dst0[i]); OCL_ASSERT(((double*)buf_data[3])[i] == cpu_dst1[i]); } OCL_UNMAP_BUFFER(2); OCL_UNMAP_BUFFER(3); } MAKE_UTEST_FROM_FUNCTION(compiler_int_convert_double); void compiler_short_convert_double(void) { const size_t n = 16; int16_t src0[n]; uint16_t src1[n]; double cpu_dst0[n]; double cpu_dst1[n]; if (!cl_check_double()) return; memset(cpu_dst0, 0, sizeof(cpu_dst0)); memset(cpu_dst1, 0, sizeof(cpu_dst1)); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_double_convert", "compiler_short_convert_double"); OCL_CREATE_BUFFER(buf[0], 0, n * 
sizeof(int16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(double), NULL); OCL_CREATE_BUFFER(buf[3], 0, n * sizeof(double), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]); globals[0] = n; locals[0] = 16; // Run random tests OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); OCL_MAP_BUFFER(3); for (int32_t i = 0; i < (int32_t) n; ++i) { src0[i] = ((int16_t*)buf_data[0])[i] = 0x8A0C * (rand() & 7); src1[i] = ((uint16_t*)buf_data[1])[i] = 0xC00C * (rand() & 15); ((double*)buf_data[2])[i] = 0.0; ((double*)buf_data[3])[i] = 0.0; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); OCL_UNMAP_BUFFER(3); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { cpu_dst0[i] = (double)src0[i]; cpu_dst1[i] = (double)src1[i]; } // Compare OCL_MAP_BUFFER(2); OCL_MAP_BUFFER(3); for (int32_t i = 0; i < (int32_t) n; ++i) { // printf("short is %d, ref is %f, double is %f \t" // "ushort is %u, ref is %f, double is %f\n", // src0[i], cpu_dst0[i], ((double*)buf_data[2])[i], // src1[i], cpu_dst1[i], ((double*)buf_data[3])[i]); OCL_ASSERT(((double*)buf_data[2])[i] == cpu_dst0[i]); OCL_ASSERT(((double*)buf_data[3])[i] == cpu_dst1[i]); } OCL_UNMAP_BUFFER(2); OCL_UNMAP_BUFFER(3); } MAKE_UTEST_FROM_FUNCTION(compiler_short_convert_double); void compiler_char_convert_double(void) { const size_t n = 16; int8_t src0[n]; uint8_t src1[n]; double cpu_dst0[n]; double cpu_dst1[n]; if (!cl_check_double()) return; memset(cpu_dst0, 0, sizeof(cpu_dst0)); memset(cpu_dst1, 0, sizeof(cpu_dst1)); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_double_convert", "compiler_char_convert_double"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int8_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint8_t), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(double), NULL); OCL_CREATE_BUFFER(buf[3], 0, n * sizeof(double), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]); globals[0] = n; locals[0] = 16; // Run random tests OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); OCL_MAP_BUFFER(3); for (int32_t i = 0; i < (int32_t) n; ++i) { src0[i] = ((int8_t*)buf_data[0])[i] = 0x8C * (rand() & 7); src1[i] = ((uint8_t*)buf_data[1])[i] = 0xC0 * (rand() & 15); ((double*)buf_data[2])[i] = 0.0; ((double*)buf_data[3])[i] = 0.0; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); OCL_UNMAP_BUFFER(3); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { cpu_dst0[i] = (double)src0[i]; cpu_dst1[i] = (double)src1[i]; } // Compare OCL_MAP_BUFFER(2); OCL_MAP_BUFFER(3); for (int32_t i = 0; i < (int32_t) n; ++i) { // printf("char is %d, ref is %f, double is %f \t" // "uchar is %u, ref is %f, double is %f\n", // src0[i], cpu_dst0[i], ((double*)buf_data[2])[i], // src1[i], cpu_dst1[i], ((double*)buf_data[3])[i]); OCL_ASSERT(((double*)buf_data[2])[i] == cpu_dst0[i]); OCL_ASSERT(((double*)buf_data[3])[i] == cpu_dst1[i]); } OCL_UNMAP_BUFFER(2); OCL_UNMAP_BUFFER(3); } MAKE_UTEST_FROM_FUNCTION(compiler_char_convert_double); void compiler_float_convert_double(void) { const size_t n = 16; float src[n]; double cpu_dst[n]; if (!cl_check_double()) return; memset(cpu_dst, 0, sizeof(cpu_dst)); // Setup kernel and buffers 
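/* Editor's note (illustrative): every finite float is exactly
 * representable as a double, so the float -> double widening tested here
 * loses nothing and the test can demand bit-exact equality against
 * cpu_dst[i] instead of a relative-error tolerance. */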
OCL_CREATE_KERNEL_FROM_FILE("compiler_double_convert", "compiler_float_convert_double"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(double), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; // Run random tests OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = ((float*)buf_data[0])[i] = (float)(0x8C * (rand() & 7)) * 1342.42f; ((double*)buf_data[1])[i] = 0.0; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) { cpu_dst[i] = (double)src[i]; } // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%f, \t%f\n", ((double*)buf_data[1])[i], cpu_dst[i]); OCL_ASSERT(((double*)buf_data[1])[i] == cpu_dst[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_float_convert_double); Beignet-1.3.2-Source/utests/builtin_num_sub_groups.cpp000664 001750 001750 00000002666 13161142102 022311 0ustar00yryr000000 000000 /* According to the OpenCL cl_intel_subgroups. Now define local and global size as following: globals[0] = 4; globals[1] = 9; globals[2] = 16; locals[0] = 2; locals[1] = 3; locals[2] = 4; */ #define udebug 0 #include "utest_helper.hpp" static void builtin_num_sub_groups(void) { if(!cl_check_subgroups()) return; // Setup kernel and buffers size_t dim, i,local_sz = 1,buf_len = 1; OCL_CREATE_KERNEL("builtin_num_sub_groups"); size_t num_sub; OCL_CREATE_BUFFER(buf[0], CL_MEM_READ_WRITE, sizeof(int)*576, NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); for( dim=1; dim <= 3; dim++ ) { buf_len = 1; local_sz = 1; for(i=1; i <= dim; i++) { locals[i - 1] = i + 1; globals[i - 1] = (i + 1) * (i + 1); buf_len *= ((i + 1) * (i + 1)); local_sz *= i + 1; } for(i = dim+1; i <= 3; i++) { globals[i - 1] = 0; locals[i - 1] = 0; } OCL_CALL(utestclGetKernelSubGroupInfoKHR,kernel,device,CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE_KHR ,sizeof(size_t)*dim,locals,sizeof(size_t),&num_sub,NULL); // Run the kernel OCL_NDRANGE( dim ); clFinish(queue); OCL_MAP_BUFFER(0); for( i = 0; i < buf_len; i++) { #if udebug printf("%zu get %d, expect %zu\n",i, ((uint32_t*)buf_data[0])[i], num_sub); #endif OCL_ASSERT( ((uint32_t*)buf_data[0])[i] == num_sub); } OCL_UNMAP_BUFFER(0); } } MAKE_UTEST_FROM_FUNCTION(builtin_num_sub_groups); Beignet-1.3.2-Source/utests/runtime_marker_list.cpp000664 001750 001750 00000004130 13161142102 021557 0ustar00yryr000000 000000 #include "utest_helper.hpp" #define BUFFERSIZE 32*1024 void runtime_marker_list(void) { const size_t n = BUFFERSIZE; cl_int cpu_src[BUFFERSIZE]; cl_int cpu_src_2[BUFFERSIZE]; cl_event ev[5]; cl_int status = 0; cl_int value = 34; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_event"); OCL_CREATE_BUFFER(buf[0], 0, BUFFERSIZE*sizeof(int), NULL); OCL_CREATE_BUFFER(buf[1], 0, BUFFERSIZE*sizeof(int), NULL); for(cl_uint i=0; i= CL_SUBMITTED); } buf_data[0] = clEnqueueMapBuffer(queue, buf[0], CL_FALSE, 0, 0, BUFFERSIZE*sizeof(int), 1, &ev[2], NULL, NULL); clEnqueueMarkerWithWaitList(queue, 0, NULL, &ev[3]); clEnqueueWriteBuffer(queue, buf[1], CL_FALSE, 0, BUFFERSIZE*sizeof(int), (void *)cpu_src_2, 1, &ev[3], &ev[4]); clGetEventInfo(ev[4], CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(status), &status, NULL); OCL_ASSERT(status != CL_COMPLETE); OCL_SET_USER_EVENT_STATUS(ev[0], CL_COMPLETE); OCL_FINISH(); clGetEventInfo(ev[0], CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(status), &status, NULL); 
OCL_ASSERT(status == CL_COMPLETE); for (cl_uint i = 0; i != sizeof(ev) / sizeof(cl_event); ++i) { clGetEventInfo(ev[i], CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(status), &status, NULL); OCL_ASSERT(status <= CL_COMPLETE); } for (uint32_t i = 0; i < n; ++i) { OCL_ASSERT(((int*)buf_data[0])[i] == (int)value + 0x3); } for (cl_uint i = 0; i != sizeof(ev) / sizeof(cl_event); ++i) { clReleaseEvent(ev[i]); } } MAKE_UTEST_FROM_FUNCTION(runtime_marker_list); Beignet-1.3.2-Source/utests/builtin_atan2.cpp000664 001750 001750 00000002073 13161142102 020237 0ustar00yryr000000 000000 #include #include "utest_helper.hpp" void builtin_atan2(void) { const int n = 1024; float y[n], x[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("builtin_atan2"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int i = 0; i < n; ++i) { y[i] = ((float*) buf_data[0])[i] = (rand()&255) * 0.01f; x[i] = ((float*) buf_data[1])[i] = (rand()&255) * 0.01f; } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(2); float *dst = (float*) buf_data[2]; for (int i = 0; i < n; ++i) { float cpu = atan2f(y[i], x[i]); float gpu = dst[i]; if (fabsf(cpu - gpu) >= 1e-2) { printf("%f %f %f %f\n", y[i], x[i], cpu, gpu); OCL_ASSERT(0); } } OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION (builtin_atan2); Beignet-1.3.2-Source/utests/compiler_global_memory_barrier.cpp000664 001750 001750 00000001420 13161142102 023727 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void compiler_global_memory_barrier(void) { const size_t n = 16*1024; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_global_memory_barrier"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); // Run the kernel globals[0] = n/2; locals[0] = 256; OCL_NDRANGE(1); OCL_MAP_BUFFER(0); // Check results uint32_t *dst = (uint32_t*)buf_data[0]; for (uint32_t i = 0; i < n; i+=locals[0]) for (uint32_t j = 0; j < locals[0]; ++j) OCL_ASSERT(dst[i+j] == locals[0] - 1 -j); OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_global_memory_barrier); Beignet-1.3.2-Source/utests/compiler_half.cpp000664 001750 001750 00000070715 13161142102 020320 0ustar00yryr000000 000000 #include #include #include #include #include "utest_helper.hpp" void compiler_half_basic(void) { const size_t n = 16; uint16_t hsrc[n]; float fsrc[n], fdst[n]; float f = 2.5; uint32_t tmp_f; if (!cl_check_half()) return; memcpy(&tmp_f, &f, sizeof(float)); // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half", "compiler_half_basic"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { fsrc[i] = 10.1 * i; memcpy(&tmp_f, &fsrc[i], sizeof(float)); hsrc[i] = __float_to_half(tmp_f); } for (int32_t i = 0; i < (int32_t) n; ++i) { fdst[i] = fsrc[i] + f; fdst[i] = fdst[i]*fdst[i]; fdst[i] = fdst[i]/1.8; } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, 
sizeof(hsrc)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { tmp_f = __half_to_float(((uint16_t *)buf_data[1])[i]); memcpy(&f, &tmp_f, sizeof(float)); //printf("%f %f\n", f, fdst[i]); OCL_ASSERT(fabs(f - fdst[i]) <= 0.01 * fabs(fdst[i]) || (fdst[i] == 0.0 && f == 0.0)); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half_basic); static const int half_n = 16; static float half_test_src[half_n] = { -0.23455f, 1.23413f, 2.3412, 8.234f, -122.31f, -14.233f, 0.0023f, 99.322f, 0.0f, 0.332f, 123.12f, -0.003f, 16.0f, 19.22f, 128.006f, 25.032f }; #define HALF_MATH_TEST_1ARG(NAME, CPPNAME) \ void compiler_half_math_##NAME(void) \ { \ const size_t n = half_n; \ uint16_t hsrc[n]; \ float fsrc[n], fdst[n]; \ uint32_t tmp_f; \ float f; \ \ if (!cl_check_half()) \ return; \ \ OCL_CREATE_KERNEL_FROM_FILE("compiler_half_math", "compiler_half_math_" #NAME); \ OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); \ OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); \ OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); \ OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); \ globals[0] = n; \ locals[0] = 16; \ \ for (int32_t i = 0; i < (int32_t) n; ++i) { \ fsrc[i] = half_test_src[i]; \ memcpy(&tmp_f, &fsrc[i], sizeof(float)); \ hsrc[i] = __float_to_half(tmp_f); \ } \ \ for (int32_t i = 0; i < (int32_t) n; ++i) { \ /* printf("Float is %f\n", fsrc[i]); */ \ fdst[i] = CPPNAME(fsrc[i]); \ } \ \ OCL_MAP_BUFFER(0); \ OCL_MAP_BUFFER(1); \ memcpy(buf_data[0], hsrc, sizeof(hsrc)); \ memset(buf_data[1], 0, sizeof(hsrc)); \ OCL_UNMAP_BUFFER(0); \ OCL_UNMAP_BUFFER(1); \ OCL_NDRANGE(1); \ \ OCL_MAP_BUFFER(1); \ for (int32_t i = 0; i < (int32_t) n; ++i) { \ bool isInf, infSign; \ tmp_f = __half_to_float(((uint16_t *)buf_data[1])[i], &isInf, &infSign); \ memcpy(&f, &tmp_f, sizeof(float)); \ /* printf("%.15f %.15f, diff is %f\n", f, fdst[i], (fabs(f - fdst[i])/fabs(fdst[i]))); */ \ OCL_ASSERT(((fabs(fdst[i]) < 6e-8f) && (fabs(f) < 6e-8f)) || \ (fabs(f - fdst[i]) <= 0.03 * fabs(fdst[i])) || \ (isInf && ((infSign && fdst[i] > 65504.0f) || (!infSign && fdst[i] < -65504.0f))) || \ (std::isnan(f) && std::isnan(fdst[i]))); \ } \ OCL_UNMAP_BUFFER(1); \ } \ MAKE_UTEST_FROM_FUNCTION(compiler_half_math_##NAME); HALF_MATH_TEST_1ARG(sin, sinf); HALF_MATH_TEST_1ARG(cos, cosf); HALF_MATH_TEST_1ARG(sinh, sinh); HALF_MATH_TEST_1ARG(cosh, cosh); HALF_MATH_TEST_1ARG(tan, tanf); HALF_MATH_TEST_1ARG(log10, log10f); HALF_MATH_TEST_1ARG(log, logf); HALF_MATH_TEST_1ARG(trunc, truncf); HALF_MATH_TEST_1ARG(exp, expf); HALF_MATH_TEST_1ARG(sqrt, sqrtf); HALF_MATH_TEST_1ARG(ceil, ceilf); #define HALF_MATH_TEST_2ARG(NAME, CPPNAME, RANGE_L, RANGE_H) \ void compiler_half_math_##NAME(void) \ { \ const size_t n = 16*4; \ uint16_t hsrc0[n], hsrc1[n]; \ float fsrc0[n], fsrc1[n], fdst[n]; \ uint32_t tmp_f; \ float f; \ \ if (!cl_check_half()) \ return; \ \ OCL_CREATE_KERNEL_FROM_FILE("compiler_half_math", "compiler_half_math_" #NAME); \ OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); \ OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); \ OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(uint16_t), NULL); \ OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); \ OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); \ OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); \ globals[0] = n; \ locals[0] = 16; \ \ for (int32_t i = 0; i < (int32_t) n; ++i) { \ fsrc0[i] = RANGE_L + (((RANGE_H) - (RANGE_L))/n) * i; \ memcpy(&tmp_f, &fsrc0[i], sizeof(float)); \ hsrc0[i] = 
__float_to_half(tmp_f); \ fsrc1[i] = RANGE_L + (half_test_src[i/4] + 63) * ((RANGE_H) - (RANGE_L)); \ memcpy(&tmp_f, &fsrc1[i], sizeof(float)); \ hsrc1[i] = __float_to_half(tmp_f); \ } \ \ for (int32_t i = 0; i < (int32_t) n; ++i) { \ /* printf("Float is %f %f\n", fsrc0[i], fsrc1[i]);*/ \ fdst[i] = CPPNAME(fsrc0[i], fsrc1[i]); \ } \ \ OCL_MAP_BUFFER(0); \ OCL_MAP_BUFFER(1); \ OCL_MAP_BUFFER(2); \ memcpy(buf_data[0], hsrc0, sizeof(hsrc0)); \ memcpy(buf_data[1], hsrc1, sizeof(hsrc1)); \ memset(buf_data[2], 0, sizeof(hsrc0)); \ OCL_UNMAP_BUFFER(0); \ OCL_UNMAP_BUFFER(1); \ OCL_UNMAP_BUFFER(2); \ OCL_NDRANGE(1); \ \ OCL_MAP_BUFFER(2); \ for (int32_t i = 0; i < (int32_t) n; ++i) { \ bool isInf, infSign; \ tmp_f = __half_to_float(((uint16_t *)buf_data[2])[i], &isInf, &infSign); \ memcpy(&f, &tmp_f, sizeof(float)); \ /*printf("%.15f %.15f, diff is %%%f\n", f, fdst[i], (fabs(f - fdst[i])/fabs(fdst[i]))); */ \ OCL_ASSERT(((fabs(fdst[i]) < 6e-8f) && (fabs(f) < 6e-8f)) || \ (fabs(f - fdst[i]) <= 0.03 * fabs(fdst[i])) || \ (isInf && ((infSign && fdst[i] > 65504.0f) || (!infSign && fdst[i] < -65504.0f))) || \ (std::isnan(f) && std::isnan(fdst[i]))); \ } \ OCL_UNMAP_BUFFER(2); \ } \ MAKE_UTEST_FROM_FUNCTION(compiler_half_math_##NAME); HALF_MATH_TEST_2ARG(fmod, fmod, 1.0, 500.0); HALF_MATH_TEST_2ARG(fmax, fmax, -10.0, 20.0); HALF_MATH_TEST_2ARG(fmin, fmin, -10.0, 20.0); void compiler_half_isnan(void) { const size_t n = 16*2; uint16_t hsrc[n]; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_relation", "compiler_half_isnan"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { hsrc[i] = 0xFF00; } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, sizeof(uint16_t)*n); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%d\n", ((uint16_t *)buf_data[1])[i]); OCL_ASSERT(((int16_t *)buf_data[1])[i] == -1); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half_isnan); void compiler_half_isinf(void) { const size_t n = 16; uint16_t hsrc[n]; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_relation", "compiler_half_isinf"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n/2; ++i) { hsrc[i] = 0x7C00; } for (int32_t i = n/2; i < (int32_t) n; ++i) { hsrc[i] = 0xFC00; } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, sizeof(int)*n); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%d\n", ((int *)buf_data[1])[i]); OCL_ASSERT(((int *)buf_data[1])[i] == 1); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half_isinf); void compiler_half_to_float(void) { const size_t n = 16*4; uint16_t hsrc[n]; float fdst[n]; uint32_t tmp_f; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", 
"compiler_half_to_float"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { fdst[i] = 13.1 * i; memcpy(&tmp_f, &fdst[i], sizeof(float)); hsrc[i] = __float_to_half(tmp_f); } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0.0f, sizeof(fdst)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%f %f, abs is %f\n", (((float *)buf_data[1])[i]), fdst[i], fabs((((float *)buf_data[1])[i]) - fdst[i])); OCL_ASSERT((fabs((((float *)buf_data[1])[i]) - fdst[i]) < 0.001 * fabs(fdst[i])) || (fdst[i] == 0.0 && (((float *)buf_data[1])[i]) == 0.0)); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half_to_float); void compiler_half_as_char2(void) { const size_t n = 16; uint16_t hsrc[n]; uint8_t* csrc = (uint8_t*)hsrc; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_half_as_char2"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { hsrc[i] = (i&0x0f)<<8 | ((i+1)&0x0f); } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, sizeof(hsrc)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n*2; ++i) { //printf("%d %d\n", (((uint8_t *)buf_data[1])[i]), csrc[i]); OCL_ASSERT((((uint8_t *)buf_data[1])[i]) == csrc[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half_as_char2); void compiler_half2_as_int(void) { const size_t n = 16*2; uint16_t hsrc[n]; int* isrc = (int*)hsrc; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_half2_as_int"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { hsrc[i] = (i&0x0f)<<8 | ((i+1)&0x0f); } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, sizeof(hsrc)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n/2; ++i) { //printf("%d %d\n", (((int *)buf_data[1])[i]), isrc[i]); OCL_ASSERT((((int *)buf_data[1])[i]) == isrc[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half2_as_int); void compiler_half_to_char_sat(void) { const size_t n = 16; uint16_t hsrc[n]; float fsrc[n]; char dst[n]; uint32_t tmp_f; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_half_to_char_sat"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(char), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < 
(int32_t) n; ++i) { fsrc[i] = -200.1f + 30.5f * i; memcpy(&tmp_f, &fsrc[i], sizeof(float)); hsrc[i] = __float_to_half(tmp_f); if (fsrc[i] <= -128.0f) { dst[i] = -128; } else if (fsrc[i] >= 127.0f) { dst[i] = 127; } else { dst[i] = (char)fsrc[i]; } } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, sizeof(dst)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%d %d\n", (((char *)buf_data[1])[i]), dst[i]); OCL_ASSERT((((char *)buf_data[1])[i]) == dst[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half_to_char_sat); void compiler_half_to_ushort_sat(void) { const size_t n = 16; uint16_t hsrc[n]; float fsrc[n]; uint16_t dst[n]; uint32_t tmp_f; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_half_to_ushort_sat"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { fsrc[i] = -100.1f + 10.3f * i; memcpy(&tmp_f, &fsrc[i], sizeof(float)); hsrc[i] = __float_to_half(tmp_f); if (fsrc[i] <= 0.0f) { dst[i] = 0; } else { dst[i] = (uint16_t)fsrc[i]; } } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, sizeof(dst)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%u %u\n", (((uint16_t *)buf_data[1])[i]), dst[i]); OCL_ASSERT((((uint16_t *)buf_data[1])[i]) == dst[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half_to_ushort_sat); void compiler_half_to_uint_sat(void) { const size_t n = 16; uint16_t hsrc[n]; float fsrc[n]; uint32_t dst[n]; uint32_t tmp_f; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_half_to_uint_sat"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { fsrc[i] = -10.1f + 13.965f * i; memcpy(&tmp_f, &fsrc[i], sizeof(float)); hsrc[i] = __float_to_half(tmp_f); if (fsrc[i] <= 0.0f) { dst[i] = 0; } else { dst[i] = (uint32_t)fsrc[i]; } } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, sizeof(dst)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%u %u\n", (((uint32_t *)buf_data[1])[i]), dst[i]); OCL_ASSERT((((uint32_t *)buf_data[1])[i]) == dst[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half_to_uint_sat); void compiler_uchar_to_half(void) { const size_t n = 16; uint8_t hsrc[n]; float fdst[n]; uint32_t tmp_f; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_uchar_to_half"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint8_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; 
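/* Editor's note (illustrative): half has a 10-bit stored significand
 * (11 bits counting the implicit leading 1), so all integers up to 2048
 * are exactly representable; the uchar sources below (5*i, at most 75)
 * therefore convert without rounding and the f == fdst[i] check is a
 * legitimate exact comparison. */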
locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { hsrc[i] = 5*i; fdst[i] = (float)hsrc[i]; } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, n*sizeof(uint16_t)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { float f; tmp_f = __half_to_float(((uint16_t *)buf_data[1])[i]); memcpy(&f, &tmp_f, sizeof(float)); //printf("%f %f\n", f, fdst[i]); OCL_ASSERT(f == fdst[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_uchar_to_half); void compiler_int_to_half(void) { const size_t n = 16; int hsrc[n]; float fdst[n]; uint32_t tmp_f; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_int_to_half"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { hsrc[i] = 51*i; fdst[i] = (float)hsrc[i]; } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, n*sizeof(uint16_t)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { float f; tmp_f = __half_to_float(((uint16_t *)buf_data[1])[i]); memcpy(&f, &tmp_f, sizeof(float)); //printf("%f %f\n", f, fdst[i]); OCL_ASSERT(f == fdst[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_int_to_half); void compiler_half_to_long(void) { const size_t n = 16; uint16_t hsrc[n]; int64_t ldst[n]; uint32_t tmp_f; float f; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_half_to_long"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint64_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { f = -100.1f + 10.3f * i; memcpy(&tmp_f, &f, sizeof(float)); hsrc[i] = __float_to_half(tmp_f); ldst[i] = (int64_t)f; } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, n*sizeof(uint64_t)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%ld %ld\n", (((int64_t *)buf_data[1])[i]), ldst[i]); OCL_ASSERT((((int64_t *)buf_data[1])[i]) == ldst[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half_to_long); void compiler_ulong_to_half(void) { const size_t n = 16; uint64_t src[n]; float fdst[n]; uint32_t tmp_f; float f; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_ulong_to_half"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint64_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { src[i] = 10 + 126*i; fdst[i] = (float)src[i]; } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], src, sizeof(src)); memset(buf_data[1], 0, n*sizeof(uint16_t)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU 
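/* Editor's note (illustrative): the ulong sources 10 + 126*i stay below
 * 10 + 126*15 = 1900 < 2048, inside half's exact-integer range, so this
 * ulong -> half conversion is expected to be lossless and the equality
 * check below is safe. */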
OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { tmp_f = __half_to_float(((uint16_t *)buf_data[1])[i]); memcpy(&f, &tmp_f, sizeof(float)); //printf("%f %f\n", f, fdst[i]); OCL_ASSERT(f == fdst[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_ulong_to_half); void compiler_half_to_long_sat(void) { const size_t n = 16; uint16_t hsrc[n]; int64_t ldst[n]; uint32_t tmp_f; float f; if (!cl_check_half()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_half_to_long_sat"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint64_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 1; i < (int32_t) n-1; ++i) { f = -100.1f + 10.3f * i; memcpy(&tmp_f, &f, sizeof(float)); hsrc[i] = __float_to_half(tmp_f); ldst[i] = (int64_t)f; } hsrc[0] = 0xFC00; //-inf; ldst[0] = 0x8000000000000000; hsrc[n-1] = 0x7C00; //inf; ldst[n-1] = 0x7FFFFFFFFFFFFFFF; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, n*sizeof(uint64_t)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%lx %lx\n", (((int64_t *)buf_data[1])[i]), ldst[i]); OCL_ASSERT((((int64_t *)buf_data[1])[i]) == ldst[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half_to_long_sat); void compiler_half_to_double(void) { const size_t n = 16; uint16_t hsrc[n]; double ddst[n]; uint32_t tmp_f; float f; // if (!cl_check_half()) // return; if (!cl_check_double()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_half_to_double"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint16_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(double), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { f = -100.1f + 10.3f * i; memcpy(&tmp_f, &f, sizeof(float)); hsrc[i] = __float_to_half(tmp_f); ddst[i] = (double)f; } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], hsrc, sizeof(hsrc)); memset(buf_data[1], 0, n*sizeof(double)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { double dd = ((double *)(buf_data[1]))[i]; // printf("%f %f, diff is %%%f\n", dd, ddst[i], fabs(dd - ddst[i])/fabs(ddst[i])); OCL_ASSERT(fabs(dd - ddst[i]) < 0.001f * fabs(ddst[i])); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_half_to_double); void compiler_double_to_half(void) { const size_t n = 16; uint16_t hdst[n]; double src[n]; uint32_t tmp_f; float f; // if (!cl_check_half()) // return; if (!cl_check_double()) return; // Setup kernel and buffers OCL_CREATE_KERNEL_FROM_FILE("compiler_half_convert", "compiler_double_to_half"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(double), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint16_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int32_t i = 0; i < (int32_t) n; ++i) { f = -100.1f + 10.3f * i; src[i] = (double)f; memcpy(&tmp_f, &f, sizeof(float)); hdst[i] = __float_to_half(tmp_f); } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], src, sizeof(src)); 
memset(buf_data[1], 0, n*sizeof(uint16_t)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) { uint16_t hf = ((uint16_t *)(buf_data[1]))[i]; //tmp_f = __half_to_float(hf); //memcpy(&f, &tmp_f, sizeof(float)); //printf("%f, %x, %x\n", f, hf, hdst[i]); OCL_ASSERT(hf == hdst[i]); } OCL_UNMAP_BUFFER(1); } MAKE_UTEST_FROM_FUNCTION(compiler_double_to_half); Beignet-1.3.2-Source/utests/compiler_arith_shift_right.cpp000664 001750 001750 00000002163 13161142102 023077 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void cpu(int global_id, int *src, int *dst) { dst[global_id] = src[global_id] >> 24; } void compiler_arith_shift_right(void) { const size_t n = 16; int cpu_src[16]; int cpu_dst[16]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_arith_shift_right"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((int*)buf_data[0])[i] = 0x80000000 | rand(); OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, cpu_src, cpu_dst); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) OCL_ASSERT(((int *)buf_data[1])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(compiler_arith_shift_right); Beignet-1.3.2-Source/utests/compiler_array3.cpp000664 001750 001750 00000002416 13161142102 020600 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void cpu(int global_id, int *src, int *dst) { int tmp[32]; for (int i = 0; i < 16; ++i) { for (int j = 0; j < 16; ++j) tmp[j] = global_id; for (int j = 0; j < src[0]; ++j) tmp[j] = 1+src[j]; tmp[16+i] = tmp[i]; } dst[global_id] = tmp[16+global_id]; } void compiler_array3(void) { const size_t n = 16; int cpu_dst[16], cpu_src[16]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_array3"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((int32_t*)buf_data[0])[i] = rand() % 16; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i <(int32_t) n; ++i) cpu(i, cpu_src, cpu_dst); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < 11; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(compiler_array3); Beignet-1.3.2-Source/utests/builtin_lgamma.cpp000664 001750 001750 00000001642 13161142102 020471 0ustar00yryr000000 000000 #include #include "utest_helper.hpp" void builtin_lgamma(void) { const int n = 1024; float src[n]; // Setup kernel and buffers OCL_CREATE_KERNEL("builtin_lgamma"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; for (int j = 0; j < 1024; j++) { OCL_MAP_BUFFER(0); for (int i = 0; i < n; ++i) { src[i] = ((float*) 
buf_data[0])[i] = (j * n + i + 1) * 0.001f; } OCL_UNMAP_BUFFER(0); OCL_NDRANGE(1); OCL_MAP_BUFFER(1); float *dst = (float*) buf_data[1]; for (int i = 0; i < n; ++i) { float cpu = lgamma(src[i]); float gpu = dst[i]; if (fabsf(cpu - gpu) >= 1e-3) { printf("%f %f %f", src[i], cpu, gpu); OCL_ASSERT(0); } } OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION (builtin_lgamma); Beignet-1.3.2-Source/utests/compiler_double.cpp000664 001750 001750 00000002376 13161142102 020656 0ustar00yryr000000 000000 #include #include "utest_helper.hpp" static void cpu(int global_id, double *src, double *dst) { double f = src[global_id]; double d = 1.234567890123456789; dst[global_id] = global_id < 14 ? (d * (f + d)) : 14; } void compiler_double(void) { const size_t n = 16; double cpu_dst[n], cpu_src[n]; if (!cl_check_double()) return; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_double"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(double), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(double), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; // Run random tests for (uint32_t pass = 0; pass < 1; ++pass) { OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((double*)buf_data[0])[i] = .1f * (rand() & 15) - .75f; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, cpu_src, cpu_dst); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) OCL_ASSERT(fabs(((double*)buf_data[1])[i] - cpu_dst[i]) < 1e-32); OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(compiler_double); Beignet-1.3.2-Source/utests/compiler_long_cmp.cpp000664 001750 001750 00000007644 13161142102 021205 0ustar00yryr000000 000000 #include #include #include #include "utest_helper.hpp" void compiler_long_cmp(void) { const size_t n = 16; int64_t src1[n], src2[n]; src1[0] = (int64_t)1 << 63, src2[0] = 0x7FFFFFFFFFFFFFFFll; src1[1] = (int64_t)1 << 63, src2[1] = ((int64_t)1 << 63) | 1; src1[2] = -1ll, src2[2] = 0; src1[3] = ((int64_t)123 << 32) | 0x7FFFFFFF, src2[3] = ((int64_t)123 << 32) | 0x80000000; src1[4] = 0x7FFFFFFFFFFFFFFFll, src2[4] = (int64_t)1 << 63; src1[5] = ((int64_t)1 << 63) | 1, src2[5] = (int64_t)1 << 63; src1[6] = 0, src2[6] = -1ll; src1[7] = ((int64_t)123 << 32) | 0x80000000, src2[7] = ((int64_t)123 << 32) | 0x7FFFFFFF; for(size_t i=8; i src2[i]) ? 3 : 4; OCL_ASSERT(x == dest[i]); } OCL_UNMAP_BUFFER(2); OCL_DESTROY_KERNEL_KEEP_PROGRAM(true); OCL_CREATE_KERNEL_FROM_FILE("compiler_long_cmp", "compiler_long_cmp_ge"); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_NDRANGE(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { int64_t *dest = (int64_t *)buf_data[2]; int64_t x = (src1[i] >= src2[i]) ? 3 : 4; OCL_ASSERT(x == dest[i]); } OCL_UNMAP_BUFFER(2); OCL_DESTROY_KERNEL_KEEP_PROGRAM(true); OCL_CREATE_KERNEL_FROM_FILE("compiler_long_cmp", "compiler_long_cmp_eq"); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_NDRANGE(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { int64_t *dest = (int64_t *)buf_data[2]; int64_t x = (src1[i] == src2[i]) ? 
3 : 4; OCL_ASSERT(x == dest[i]); } OCL_UNMAP_BUFFER(2); OCL_DESTROY_KERNEL_KEEP_PROGRAM(true); OCL_CREATE_KERNEL_FROM_FILE("compiler_long_cmp", "compiler_long_cmp_neq"); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_NDRANGE(1); OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { int64_t *dest = (int64_t *)buf_data[2]; int64_t x = (src1[i] != src2[i]) ? 3 : 4; OCL_ASSERT(x == dest[i]); } OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_long_cmp); Beignet-1.3.2-Source/utests/load_program_from_bin_file.cpp000664 001750 001750 00000004602 13161142102 023024 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include "utest_file_map.hpp" #include #include using namespace std; static void cpu(int global_id, float *src, float *dst) { dst[global_id] = ceilf(src[global_id]); } static void test_load_program_from_bin_file(void) { const size_t n = 16; float cpu_dst[16], cpu_src[16]; cl_int status; cl_int binary_status; char *ker_path = NULL; cl_file_map_t *fm = cl_file_map_new(); if(!fm) { fprintf(stderr, "run out of memory\n"); return; } ker_path = cl_do_kiss_path("compiler_ceil.bin", device); OCL_ASSERT (cl_file_map_open(fm, ker_path) == CL_FILE_MAP_SUCCESS); const unsigned char *src = (const unsigned char *)cl_file_map_begin(fm); const size_t sz = cl_file_map_size(fm); program = clCreateProgramWithBinary(ctx, 1, &device, &sz, &src, &binary_status, &status); OCL_ASSERT(program && status == CL_SUCCESS); /* OCL requires to build the program even if it is created from a binary */ OCL_ASSERT(clBuildProgram(program, 1, &device, NULL, NULL, NULL) == CL_SUCCESS); kernel = clCreateKernel(program, "compiler_ceil", &status); OCL_ASSERT(status == CL_SUCCESS); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((float*)buf_data[0])[i] = .1f * (rand() & 15) - .75f; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, cpu_src, cpu_dst); // Compare OCL_MAP_BUFFER(1); #if 0 printf("#### GPU:\n"); for (int32_t i = 0; i < (int32_t) n; ++i) printf(" %f", ((float *)buf_data[1])[i]); printf("\n#### CPU:\n"); for (int32_t i = 0; i < (int32_t) n; ++i) printf(" %f", cpu_dst[i]); printf("\n"); #endif for (int32_t i = 0; i < (int32_t) n; ++i) OCL_ASSERT(((float *)buf_data[1])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(test_load_program_from_bin_file); Beignet-1.3.2-Source/utests/compiler_assignment_operation_in_if.cpp000664 001750 001750 00000001660 13161142102 024773 0ustar00yryr000000 000000 #include "utest_helper.hpp" typedef struct cpu_int3{ int x; int y; int z; }cpu_int3; static void cpu(int gidx, int *dst) { cpu_int3 d1 = {gidx, gidx-1, gidx-3}; int k = gidx % 5; if (k == 1){ d1.x = d1.y; } int * addr = dst + gidx; *addr = d1.x; } void compiler_assignment_operation_in_if(void){ const size_t n = 16; int cpu_dst[16] = {0}; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_assignment_operation_in_if"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); globals[0] = 16; locals[0] = 16; // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < 
(int32_t) n; ++i) cpu(i, cpu_dst); // Compare OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) OCL_ASSERT(((int *)buf_data[0])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(compiler_assignment_operation_in_if) Beignet-1.3.2-Source/utests/compiler_fill_image0.cpp000664 001750 001750 00000002020 13161142102 021536 0ustar00yryr000000 000000 #include #include "utest_helper.hpp" static void compiler_fill_image0(void) { const size_t w = 512; const size_t h = 512; cl_image_format format; cl_image_desc desc; memset(&desc, 0x0, sizeof(cl_image_desc)); memset(&format, 0x0, sizeof(cl_image_format)); format.image_channel_order = CL_RGBA; format.image_channel_data_type = CL_UNSIGNED_INT8; desc.image_type = CL_MEM_OBJECT_IMAGE2D; desc.image_width = w; desc.image_height = h; desc.image_row_pitch = 0; // Setup kernel and images OCL_CREATE_KERNEL("test_fill_image0"); OCL_CREATE_IMAGE(buf[0], 0, &format, &desc, NULL); // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); globals[0] = w; globals[1] = h; locals[0] = 16; locals[1] = 16; OCL_NDRANGE(2); // Check result OCL_MAP_BUFFER_GTT(0); for (uint32_t j = 0; j < h; ++j) for (uint32_t i = 0; i < w; i++) OCL_ASSERT(((uint32_t*)buf_data[0])[j * w + i] == (i << 16 | j)); OCL_UNMAP_BUFFER_GTT(0); } MAKE_UTEST_FROM_FUNCTION(compiler_fill_image0); Beignet-1.3.2-Source/utests/compiler_async_copy.cpp000664 001750 001750 00000003242 13161142102 021544 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include typedef unsigned char uchar; typedef unsigned short ushort; #define DEF(TYPE, KER_TYPE, VEC_SIZE) \ static void compiler_async_copy_##KER_TYPE##VEC_SIZE(void) \ { \ const size_t n = 1024; \ const size_t local_size = 32; \ const int copiesPerWorkItem = 5; \ \ /* Setup kernel and buffers */\ OCL_CREATE_KERNEL_FROM_FILE("compiler_async_copy", "compiler_async_copy_" # KER_TYPE # VEC_SIZE); \ OCL_CREATE_BUFFER(buf[0], 0, n * copiesPerWorkItem * sizeof(TYPE) * VEC_SIZE, NULL); \ OCL_CREATE_BUFFER(buf[1], 0, n * copiesPerWorkItem * sizeof(TYPE) * VEC_SIZE, NULL); \ OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); \ OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); \ OCL_SET_ARG(2, local_size*copiesPerWorkItem*sizeof(TYPE)*VEC_SIZE, NULL); \ OCL_SET_ARG(3, sizeof(int), &copiesPerWorkItem); \ \ OCL_MAP_BUFFER(1); \ for (uint32_t i = 0; i < n * copiesPerWorkItem * VEC_SIZE; ++i) \ ((TYPE*)buf_data[1])[i] = rand(); \ OCL_UNMAP_BUFFER(1); \ \ /* Run the kernel */\ globals[0] = n; \ locals[0] = local_size; \ OCL_NDRANGE(1); \ OCL_MAP_BUFFER(0); \ OCL_MAP_BUFFER(1); \ \ /* Check results */\ TYPE *dst = (TYPE*)buf_data[0]; \ TYPE *src = (TYPE*)buf_data[1]; \ for (uint32_t i = 0; i < n * copiesPerWorkItem * VEC_SIZE; i++) \ OCL_ASSERT(dst[i] == src[i]); \ OCL_UNMAP_BUFFER(0); \ OCL_UNMAP_BUFFER(1); \ } \ \ MAKE_UTEST_FROM_FUNCTION(compiler_async_copy_##KER_TYPE##VEC_SIZE); DEF(char, char, 2); DEF(uchar, uchar, 2); DEF(short, short, 2); DEF(ushort, ushort, 2); DEF(int, int, 2); DEF(uint, uint, 2); DEF(int64_t, long, 2); DEF(uint64_t, ulong, 2); DEF(float, float, 2); //DEF(double, double, 2); Beignet-1.3.2-Source/utests/compiler_workgroup_scan_exclusive.cpp000664 001750 001750 00000026440 13161142102 024534 0ustar00yryr000000 000000 #include #include #include #include #include #include #include #include "utest_helper.hpp" using namespace std; /* set to 1 for debug, output of input-expected data */ #define DEBUG_STDOUT 0 /* NDRANGE */ #define WG_GLOBAL_SIZE 64 #define WG_LOCAL_SIZE 32 enum WG_FUNCTION { WG_SCAN_EXCLUSIVE_ADD, WG_SCAN_EXCLUSIVE_MAX, 
WG_SCAN_EXCLUSIVE_MIN }; /* * Generic compute-expected function for op SCAN EXCLUSIVE type * and any variable type */ template <class T> static void compute_expected(WG_FUNCTION wg_func, T* input, T* expected) { if(wg_func == WG_SCAN_EXCLUSIVE_ADD) { expected[0] = 0; expected[1] = input[0]; for(uint32_t i = 2; i < WG_LOCAL_SIZE; i++) expected[i] = input[i - 1] + expected[i - 1]; } else if(wg_func == WG_SCAN_EXCLUSIVE_MAX) { if(numeric_limits<T>::is_integer) expected[0] = numeric_limits<T>::min(); else expected[0] = - numeric_limits<T>::infinity(); expected[1] = input[0]; for(uint32_t i = 2; i < WG_LOCAL_SIZE; i++) expected[i] = max(input[i - 1], expected[i - 1]); } else if(wg_func == WG_SCAN_EXCLUSIVE_MIN) { if(numeric_limits<T>::is_integer) expected[0] = numeric_limits<T>::max(); else expected[0] = numeric_limits<T>::infinity(); expected[1] = input[0]; for(uint32_t i = 2; i < WG_LOCAL_SIZE; i++) expected[i] = min(input[i - 1], expected[i - 1]); } } /* * Generic input/expected data generation function for op SCAN EXCLUSIVE type * and any variable type */ template <class T> static void generate_data(WG_FUNCTION wg_func, T* &input, T* &expected) { input = new T[WG_GLOBAL_SIZE]; expected = new T[WG_GLOBAL_SIZE]; /* base value for all data types */ T base_val = (long)7 << (sizeof(T) * 5 - 3); /* seed for random inputs */ srand (time(NULL)); /* generate inputs and expected values */ for(uint32_t gid = 0; gid < WG_GLOBAL_SIZE; gid += WG_LOCAL_SIZE) { #if DEBUG_STDOUT cout << endl << "IN: " << endl; #endif /* input values */ for(uint32_t lid = 0; lid < WG_LOCAL_SIZE; lid++) { /* initially 0, augment after */ input[gid + lid] = 0; /* check all data types, test ideal for QWORD types */ input[gid + lid] += ((rand() % 2 - 1) * base_val); /* add trailing random bits, tests GENERAL cases */ input[gid + lid] += (rand() % 112); #if DEBUG_STDOUT /* output generated input */ cout << setw(4) << input[gid + lid] << ", " ; if((lid + 1) % 8 == 0) cout << endl; #endif } /* expected values */ compute_expected(wg_func, input + gid, expected + gid); #if DEBUG_STDOUT /* output expected input */ cout << endl << "EXP: " << endl; for(uint32_t lid = 0; lid < WG_LOCAL_SIZE; lid++) { cout << setw(4) << expected[gid + lid] << ", " ; if((lid + 1) % 8 == 0) cout << endl; } #endif } } /* * Generic workgroup utest function for op SCAN EXCLUSIVE type * and any variable type */ template <class T> static void workgroup_generic(WG_FUNCTION wg_func, T* input, T* expected) { /* input and expected data */ generate_data(wg_func, input, expected); /* prepare input for data type */ OCL_CREATE_BUFFER(buf[0], 0, WG_GLOBAL_SIZE * sizeof(T), NULL); OCL_CREATE_BUFFER(buf[1], 0, WG_GLOBAL_SIZE * sizeof(T), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); /* set input data for GPU */ OCL_MAP_BUFFER(0); memcpy(buf_data[0], input, WG_GLOBAL_SIZE * sizeof(T)); OCL_UNMAP_BUFFER(0); /* run the kernel on GPU */ globals[0] = WG_GLOBAL_SIZE; locals[0] = WG_LOCAL_SIZE; OCL_NDRANGE(1); /* check if mismatch */ OCL_MAP_BUFFER(1); uint32_t mismatches = 0; for (uint32_t i = 0; i < WG_GLOBAL_SIZE; i++) if(((T *)buf_data[1])[i] != *(expected + i)) { /* found mismatch on integer, increment */ if(numeric_limits<T>::is_integer){ mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << ((T *)buf_data[1])[i] << " != " << *(expected + i) << endl; #endif } /* float error is tolerable though */ else { float num_computed = ((T *)buf_data[1])[i]; float num_expected = *(expected + i); float num_diff = abs(num_computed - num_expected) / abs(num_expected);
if(num_diff > 0.01f){ mismatches++; #if DEBUG_STDOUT /* output mismatch */ cout << "Err at " << i << ", " << ((T *)buf_data[1])[i] << " != " << *(expected + i) << endl; #endif } } } #if DEBUG_STDOUT /* output mismatch count */ cout << "mismatches " << mismatches << endl; #endif OCL_UNMAP_BUFFER(1); OCL_ASSERT(mismatches == 0); } /* * Workgroup scan_exclusive add utest functions */ void compiler_workgroup_scan_exclusive_add_int(void) { if (!cl_check_ocl20()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_add_int"); workgroup_generic(WG_SCAN_EXCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_exclusive_add_int); void compiler_workgroup_scan_exclusive_add_uint(void) { if (!cl_check_ocl20()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_add_uint"); workgroup_generic(WG_SCAN_EXCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_exclusive_add_uint); void compiler_workgroup_scan_exclusive_add_long(void) { if (!cl_check_ocl20()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_add_long"); workgroup_generic(WG_SCAN_EXCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_exclusive_add_long); void compiler_workgroup_scan_exclusive_add_ulong(void) { if (!cl_check_ocl20()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_add_ulong"); workgroup_generic(WG_SCAN_EXCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_exclusive_add_ulong); void compiler_workgroup_scan_exclusive_add_float(void) { if (!cl_check_ocl20()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_add_float"); workgroup_generic(WG_SCAN_EXCLUSIVE_ADD, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_exclusive_add_float); /* * Workgroup scan_exclusive max utest functions */ void compiler_workgroup_scan_exclusive_max_int(void) { if (!cl_check_ocl20()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_max_int"); workgroup_generic(WG_SCAN_EXCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_exclusive_max_int); void compiler_workgroup_scan_exclusive_max_uint(void) { if (!cl_check_ocl20()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_max_uint"); workgroup_generic(WG_SCAN_EXCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_exclusive_max_uint); void compiler_workgroup_scan_exclusive_max_long(void) { if (!cl_check_ocl20()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_max_long"); workgroup_generic(WG_SCAN_EXCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_exclusive_max_long); void 
compiler_workgroup_scan_exclusive_max_ulong(void) { if (!cl_check_ocl20()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_max_ulong"); workgroup_generic(WG_SCAN_EXCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_exclusive_max_ulong); void compiler_workgroup_scan_exclusive_max_float(void) { if (!cl_check_ocl20()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_max_float"); workgroup_generic(WG_SCAN_EXCLUSIVE_MAX, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_exclusive_max_float); /* * Workgroup scan_exclusive min utest functions */ void compiler_workgroup_scan_exclusive_min_int(void) { if (!cl_check_ocl20()) return; cl_int *input = NULL; cl_int *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_min_int"); workgroup_generic(WG_SCAN_EXCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_exclusive_min_int); void compiler_workgroup_scan_exclusive_min_uint(void) { if (!cl_check_ocl20()) return; cl_uint *input = NULL; cl_uint *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_min_uint"); workgroup_generic(WG_SCAN_EXCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_exclusive_min_uint); void compiler_workgroup_scan_exclusive_min_long(void) { if (!cl_check_ocl20()) return; cl_long *input = NULL; cl_long *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_min_long"); workgroup_generic(WG_SCAN_EXCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_exclusive_min_long); void compiler_workgroup_scan_exclusive_min_ulong(void) { if (!cl_check_ocl20()) return; cl_ulong *input = NULL; cl_ulong *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_min_ulong"); workgroup_generic(WG_SCAN_EXCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(compiler_workgroup_scan_exclusive_min_ulong); void compiler_workgroup_scan_exclusive_min_float(void) { if (!cl_check_ocl20()) return; cl_float *input = NULL; cl_float *expected = NULL; OCL_CREATE_KERNEL_FROM_FILE("compiler_workgroup_scan_exclusive", "compiler_workgroup_scan_exclusive_min_float"); workgroup_generic(WG_SCAN_EXCLUSIVE_MIN, input, expected); } MAKE_UTEST_FROM_FUNCTION(compiler_workgroup_scan_exclusive_min_float); Beignet-1.3.2-Source/utests/builtin_shuffle.cpp000664 001750 001750 00000002312 13161142102 020662 0ustar00yryr000000 000000 #include "utest_helper.hpp" void builtin_shuffle(void) { const int n = 32; // Setup kernel and buffers OCL_CREATE_KERNEL("builtin_shuffle"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[3], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(cl_mem), &buf[3]); globals[0] = n; locals[0] = 16; OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); for (int i = 0; i < n; i ++) { ((float *)(buf_data[0]))[i] = rand(); ((float 
*)(buf_data[1]))[i] = rand(); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_NDRANGE(1); OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); OCL_MAP_BUFFER(2); OCL_MAP_BUFFER(3); for (int i = 0; i < n; i ++) { OCL_ASSERT(((float *)(buf_data[0]))[i] == ((float *)(buf_data[3]))[i]); OCL_ASSERT(((float *)(buf_data[1]))[i] == ((float *)(buf_data[2]))[i]); } OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); OCL_UNMAP_BUFFER(2); OCL_UNMAP_BUFFER(3); } MAKE_UTEST_FROM_FUNCTION(builtin_shuffle); Beignet-1.3.2-Source/utests/profiling_exec.cpp000664 001750 001750 00000007210 13161142102 020477 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include "string.h" static void cpu_exec (int n, float* src, float* dst) { int i = 0; for (; i < n; i++) { float f = src[i]; f = f < 0 ? -f : f; dst[i] = f; } } #define QUEUE_SECONDS_LIMIT 10 #define SUBMIT_SECONDS_LIMIT 20 #define COMMAND_SECONDS_LIMIT 10 static void check_profiling_time(cl_ulong queued, cl_ulong submit, cl_ulong start, cl_ulong end) { size_t profiling_resolution = 0; OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_PROFILING_TIMER_RESOLUTION, sizeof(profiling_resolution), &profiling_resolution, NULL); /* Convert the times to seconds. */ double queue_to_submit = (double)(submit - queued)*1e-9; double submit_to_start = (double)(start - submit)*1e-9; double start_to_end = (double)(end - start)*1e-9; //printf("Profiling info:\n"); //printf("Time from queue to submit : %fms\n", (double)(queue_to_submit) * 1000.f ); //printf( "Time from submit to start : %fms\n", (double)(submit_to_start) * 1000.f ); //printf( "Time from start to end: %fms\n", (double)(start_to_end) * 1000.f ); OCL_ASSERTM(queued <= submit, "Enqueue time is later than submit time, invalid\n"); OCL_ASSERTM(submit <= start, "Submit time is later than start time, invalid\n"); OCL_ASSERTM(start <= end, "Start time is later than end time, invalid\n"); /* Check each stage against its own limit defined above. */ OCL_ASSERTM(queue_to_submit <= QUEUE_SECONDS_LIMIT, "Too large time from queue to submit\n"); OCL_ASSERTM(submit_to_start <= SUBMIT_SECONDS_LIMIT, "Too large time from submit to start\n"); OCL_ASSERTM(start_to_end <= COMMAND_SECONDS_LIMIT, "Too large time from start to end\n"); } static void profiling_exec(void) { const size_t n = 512; cl_int status = CL_SUCCESS; cl_command_queue profiling_queue = NULL; float* cpu_src = (float *)malloc(n*sizeof(float)); float* cpu_dst = (float *)malloc(n*sizeof(float)); cl_event exec_event; cl_ulong time_queue, time_submit, time_start, time_end; /* Because of the profiling property, we cannot use the default queue.
*/ profiling_queue = clCreateCommandQueue(ctx, device, CL_QUEUE_PROFILING_ENABLE, &status); OCL_ASSERT(status == CL_SUCCESS); OCL_CREATE_KERNEL("compiler_fabs"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(float), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 256; OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((float*)buf_data[0])[i] = .1f * (rand() & 15) - .75f; OCL_UNMAP_BUFFER(0); cpu_exec(n, cpu_src, cpu_dst); // Run the kernel on GPU OCL_CALL(clEnqueueNDRangeKernel, profiling_queue, kernel, 1, NULL, globals, locals, 0, NULL, &exec_event); OCL_CALL(clWaitForEvents, 1, &exec_event); OCL_CALL(clGetEventProfilingInfo, exec_event, CL_PROFILING_COMMAND_QUEUED, sizeof(cl_ulong), &time_queue, NULL); OCL_CALL(clGetEventProfilingInfo, exec_event, CL_PROFILING_COMMAND_SUBMIT, sizeof(cl_ulong), &time_submit, NULL); OCL_CALL(clGetEventProfilingInfo, exec_event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &time_start, NULL); OCL_CALL(clGetEventProfilingInfo, exec_event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &time_end, NULL); clReleaseEvent(exec_event); check_profiling_time(time_queue, time_submit, time_start, time_end); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) OCL_ASSERT(((float *)buf_data[1])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(1); clReleaseCommandQueue(profiling_queue); free(cpu_dst); free(cpu_src); } MAKE_UTEST_FROM_FUNCTION(profiling_exec); Beignet-1.3.2-Source/utests/compiler_insn_selection_min.cpp000664 001750 001750 00000001722 13161142102 023255 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include static void compiler_insn_selection_min(void) { const size_t n = 8192 * 4; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_insn_selection_min"); buf_data[0] = (uint32_t*) malloc(sizeof(uint32_t) * n); for (uint32_t i = 0; i < n; ++i) ((float*)buf_data[0])[i] = float(i); OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, n * sizeof(uint32_t), buf_data[0]); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); free(buf_data[0]); buf_data[0] = NULL; // Run the kernel OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); // Check result OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); float *dst = (float*)buf_data[1]; float *src = (float*)buf_data[0]; for (uint32_t i = 0; i < n; ++i) { OCL_ASSERT(dst[i] == std::min(src[i], src[0])); } } MAKE_UTEST_FROM_FUNCTION(compiler_insn_selection_min) Beignet-1.3.2-Source/utests/compiler_program_global.cpp000664 001750 001750 00000004573 13161142102 022374 0ustar00yryr000000 000000 #include "utest_helper.hpp" #include "utest_file_map.hpp" static int init_program(const char* name, cl_context ctx, cl_program *pg ) { cl_int err; char* ker_path = cl_do_kiss_path(name, device); cl_file_map_t *fm = cl_file_map_new(); err = cl_file_map_open(fm, ker_path); if(err != CL_FILE_MAP_SUCCESS) OCL_ASSERT(0); const char *src = cl_file_map_begin(fm); *pg = clCreateProgramWithSource(ctx, 1, &src, NULL, &err); free(ker_path); cl_file_map_delete(fm); return 0; } void compiler_program_global() { if(!cl_check_ocl20(false)) return; const int n = 16; cl_int err; // Setup kernel and buffers cl_program program; init_program("compiler_program_global.cl", ctx, &program); OCL_CALL (clBuildProgram, program, 1, &device, "-cl-std=CL2.0", NULL, NULL); cl_kernel k0 = clCreateKernel(program, "compiler_program_global0", &err); 
assert(err == CL_SUCCESS); cl_kernel k1 = clCreateKernel(program, "compiler_program_global1", &err); assert(err == CL_SUCCESS); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int), NULL); OCL_CALL (clSetKernelArg, k0, 0, sizeof(cl_mem), &buf[0]); OCL_CALL (clSetKernelArg, k1, 0, sizeof(cl_mem), &buf[1]); int dynamic = 1; OCL_CALL (clSetKernelArg, k0, 1, sizeof(cl_int), &dynamic); OCL_CALL (clSetKernelArg, k1, 1, sizeof(cl_int), &dynamic); globals[0] = 16; locals[0] = 16; OCL_MAP_BUFFER(0); for (int i = 0; i < n; ++i) ((int*)buf_data[0])[i] = i; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_CALL (clEnqueueNDRangeKernel, queue, k0, 1, NULL, globals, locals, 0, NULL, NULL); OCL_CALL (clEnqueueNDRangeKernel, queue, k1, 1, NULL, globals, locals, 0, NULL, NULL); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < n; ++i) { // printf("i=%d dst=%d\n", i, ((int*)buf_data[1])[i]); switch(i) { default: OCL_ASSERT(((int*)buf_data[1])[i] == i); break; case 11: OCL_ASSERT(((int*)buf_data[1])[i] == 7); break; case 12: OCL_ASSERT(((int*)buf_data[1])[i] == 4); break; case 13: OCL_ASSERT(((int*)buf_data[1])[i] == 2); break; case 14: OCL_ASSERT(((int*)buf_data[1])[i] == 3); break; case 15: OCL_ASSERT(((int*)buf_data[1])[i] == 2); break; } } OCL_UNMAP_BUFFER(1); clReleaseKernel(k0); clReleaseKernel(k1); clReleaseProgram(program); } MAKE_UTEST_FROM_FUNCTION(compiler_program_global); Beignet-1.3.2-Source/utests/compiler_array2.cpp000664 001750 001750 00000002422 13161142102 020574 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void cpu(int global_id, int *src, int *dst) { int final[16]; int array[16]; for (int j = 0; j < 16; ++j) array[j] = j; for (int j = 0; j < 16; ++j) final[j] = j+1; if (global_id == 15) dst[global_id] = final[global_id]; else dst[global_id] = array[15 - global_id]; } void compiler_array2(void) { const size_t n = 16; int cpu_dst[16], cpu_src[16]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_array2"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((int32_t*)buf_data[0])[i] = rand() % 16; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i <(int32_t) n; ++i) cpu(i, cpu_src, cpu_dst); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < 11; ++i) OCL_ASSERT(((int32_t*)buf_data[1])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(compiler_array2); Beignet-1.3.2-Source/utests/compiler_long.cpp000664 001750 001750 00000003522 13161142102 020335 0ustar00yryr000000 000000 #include #include #include #include "utest_helper.hpp" void compiler_long(void) { const size_t n = 16; int64_t src1[n], src2[n]; int64_t zero = 0; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_long"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(int64_t), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int64_t), NULL); OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(int64_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]); OCL_SET_ARG(3, sizeof(cl_long), &zero); globals[0] = n; locals[0] = 16; // Run random tests src1[0] = -1L, src2[0] = -1L; src1[1] = 0x8000000000000000UL, src2[1] = 
0x8000000000000000UL; src1[2] = 0x7FFFFFFFFFFFFFFFL, src2[2] = 1L; src1[3] = 0xFFFFFFFEL, src2[3] = 1L; src1[4] = 0x7FFFFFFFL, src2[4] = 0x80000000L; src1[5] = 0, src2[5] = 0; src1[6] = 0, src2[6] = 1; src1[7] = -2L, src2[7] = -1L; src1[8] = 0, src2[8] = 0x8000000000000000UL; for (int32_t i = 9; i < (int32_t) n; ++i) { src1[i] = ((int64_t)rand() << 32) + rand(); src2[i] = ((int64_t)rand() << 32) + rand(); } OCL_MAP_BUFFER(0); OCL_MAP_BUFFER(1); memcpy(buf_data[0], src1, sizeof(src1)); memcpy(buf_data[1], src2, sizeof(src2)); OCL_UNMAP_BUFFER(0); OCL_UNMAP_BUFFER(1); // Run the kernel on GPU OCL_NDRANGE(1); // Compare OCL_MAP_BUFFER(2); for (int32_t i = 0; i < (int32_t) n; ++i) { //printf("%lx\n", ((int64_t *)buf_data[2])[i]); if (i < 5) OCL_ASSERT(src1[i] + src2[i] == ((int64_t *)buf_data[2])[i]); if (i > 5) OCL_ASSERT(src1[i] - src2[i] == ((int64_t *)buf_data[2])[i]); } OCL_UNMAP_BUFFER(2); } MAKE_UTEST_FROM_FUNCTION(compiler_long); Beignet-1.3.2-Source/utests/compiler_local_memory_two_ptr.cpp000664 001750 001750 00000003027 13161142102 023636 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ #include "utest_helper.hpp" static void compiler_local_memory_two_ptr(void) { const size_t n = 1024; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_local_memory_two_ptr"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, 64, NULL); // 16 x int OCL_SET_ARG(2, 64, NULL); // 16 x int // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); OCL_MAP_BUFFER(0); // Check results int32_t *dst = (int32_t*)buf_data[0]; for (int32_t i = 0; i < (int) n; i+=16) for (int32_t j = 0; j < 16; ++j) { const int gid = i + j; const int tid = j; OCL_ASSERT(dst[i+j] == (gid&~0xf) + 15-tid + 15-tid); } } MAKE_UTEST_FROM_FUNCTION(compiler_local_memory_two_ptr); Beignet-1.3.2-Source/utests/compiler_preprocessor_macros.cpp000664 001750 001750 00000000270 13161142102 023465 0ustar00yryr000000 000000 #include "utest_helper.hpp" void compiler_preprocessor_macros(void) { OCL_CREATE_KERNEL("compiler_preprocessor_macros"); } MAKE_UTEST_FROM_FUNCTION(compiler_preprocessor_macros); Beignet-1.3.2-Source/utests/runtime_null_kernel_arg.cpp000664 001750 001750 00000001160 13161142102 022406 0ustar00yryr000000 000000 #include "utest_helper.hpp" void runtime_null_kernel_arg(void) { const size_t n = 32; // Setup kernel and buffers OCL_CREATE_KERNEL("null_kernel_arg"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(uint32_t), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), NULL); OCL_SET_ARG(2, sizeof(cl_mem), NULL); // Run the kernel globals[0] = n; locals[0] = 16; OCL_NDRANGE(1); OCL_MAP_BUFFER(0); // Check results for (uint32_t i = 0; i < n; ++i) OCL_ASSERT(((uint32_t*)buf_data[0])[i] == i); OCL_UNMAP_BUFFER(0); } MAKE_UTEST_FROM_FUNCTION(runtime_null_kernel_arg); 
Beignet-1.3.2-Source/utests/compiler_convert_uchar_sat.cpp000664 001750 001750 00000002240 13161142102 023103 0ustar00yryr000000 000000 #include "utest_helper.hpp" static void cpu(int global_id, float *src, int *dst) { float f = src[global_id]; dst[global_id] = f > 255 ? 255 : f < 0 ? 0 : f; } void compiler_convert_uchar_sat(void) { const size_t n = 16; float cpu_src[16]; int cpu_dst[16]; // Setup kernel and buffers OCL_CREATE_KERNEL("compiler_convert_uchar_sat"); OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(float), NULL); OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(int), NULL); OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]); OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]); globals[0] = 16; locals[0] = 16; // Run random tests for (uint32_t pass = 0; pass < 8; ++pass) { OCL_MAP_BUFFER(0); for (int32_t i = 0; i < (int32_t) n; ++i) cpu_src[i] = ((float*)buf_data[0])[i] = (rand() & 1023) / 2; OCL_UNMAP_BUFFER(0); // Run the kernel on GPU OCL_NDRANGE(1); // Run on CPU for (int32_t i = 0; i < (int32_t) n; ++i) cpu(i, cpu_src, cpu_dst); // Compare OCL_MAP_BUFFER(1); for (int32_t i = 0; i < (int32_t) n; ++i) OCL_ASSERT(((int *)buf_data[1])[i] == cpu_dst[i]); OCL_UNMAP_BUFFER(1); } } MAKE_UTEST_FROM_FUNCTION(compiler_convert_uchar_sat); Beignet-1.3.2-Source/utests/utest_run.cpp000664 001750 001750 00000007512 13161142102 017537 0ustar00yryr000000 000000 /* * Copyright © 2012 Intel Corporation * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library. If not, see . * * Author: Benjamin Segovia */ /** * \file utest_run.cpp * \author Benjamin Segovia * * Just run the unit tests. The user can possibly provides the subset of it */ #include "utest_helper.hpp" #include "utest_exception.hpp" #include #include #include static const char *shortopts = "c:j:l::anh"; struct option longopts[] = { {"casename", required_argument, NULL, 'c'}, {"jobs", required_argument, NULL, 'j'}, {"list", optional_argument, NULL, 'l'}, {"all", no_argument, NULL, 'a'}, {"allnoissue", no_argument, NULL, 'n'}, {"help", no_argument, NULL, 'h'}, {0, 0, 0, 0}, }; void usage() { std::cout << "\ Usage:\n\ ./utest_run

o-------o
|       |
|   1   |---->-----o
|       |          |
o-------o          |
    |              |
    |              |
o-------o          |
|       |          |
|   2   |---->-----------o
|       |          |     |
o-------o          |     |
    |              |     |
    |              |     |
    | o------o     |     |
    | |      |     |     |
    | v      |     |     |
o-------o    |     |     |
|       |    |     |     |
|   3   |    |     |     |
|       |    |     |     |
o-------o    |     |     |
    | |      |     |     |
    | o------o     |     |
    |              |     |
o-------o          |     |
|       |          |     |
|   4   |<---------o     |
|       |                |
o-------o                |
    |                    |
    |                    |
o-------o                |
|       |                |
|   5   |<----------------o
|       |
o-------o
Mapping it to a SIMD machine may seem challenging. Actually, it is not too complicated. The problem is the 2->5 jump: we have to be sure that we do not miss any computation done in block 4. To do so:

- Instead of jumping from block 2 to block 5, we jump from block 2 to block 4.
- We implement a `JOIN` point on top of block 4. We check if any lane is going to be reactivated for block 4. If not, we jump to block 5.

This leads to the following linearized CFG:
o-------o
|       |
|   1   |---->-----o
|       |          |
o-------o          |
    |              |
    |              |
o-------o          |
|       |          |
|   2   |---->-----------o
|       |          |     |
o-------o          |     |
    |              |     |
    |              |     |
    | o--<---o     |     |
    | |      |     |     |
    | v      |     |     |
o-------o    |     |     |
|       |    |     |     |
|   3   |    ^     |     |
|       |    |     |     |
o-------o    |     |     |
    | |      |     |     |
    | o-->---o     |     |
    |              |     |
o-------o          |     |
|       |==========|=====|====O
|   4   |<---------|-----o    |
|       |<---------o          |
o-------o                     |
    |                         |
    |                         |
o-------o                     |
|       |                     |
|   5   |<====================O
|       |
o-------o
There is a new jump from block 4 to block 5.

### Implementation on Gen

When using structured branches, Gen supports auto-masking, i.e. based on which branches are taken, the control flow is properly handled and masks are automatically applied to all instructions. However, there is no similar support for unstructured branches. We therefore decided to mask instructions manually and to use a single program flow. This is actually quite easy to do since Gen is able to predicate any branch.

Now, how do we evaluate the if conditions in an efficient way? The choice we made is to use *per-lane block IPs*: we store a short (16 bits) per SIMD lane in a regular 256-bit GPR (general purpose register). This "blockIP" register is used in the following way:

At the beginning of each block, we compare the blockIP register with the ID of the block. The lane is _activated_ if its blockIP is _smaller_ than the ID of the block. Otherwise, the lane is deactivated. Therefore, we build a flag register at the entry of each basic block with a single 16-wide uint16_t compare. If no lane is activated, a jump is performed to the next block where some lanes are going to be activated. Since these are regular jumps, we just use the `jmpi` instruction.

With the help of predication, we can express all the different possibilities:

- Backward branches are always taken if _any_ of the lanes in the predicate is true. We just use `<+f0.0.anyh>` predication.
- A forward branch is *not* taken if some of the lanes are going to be activated in the next block. We therefore compare the blockIP with the ID of the _next_ block. If all of them are strictly greater than the ID of the next block, we jump. We therefore use the `<+f0.0.allh>` predicate in that case.
- `JOIN` points are even simpler. We simply jump if none of the lanes is activated. We therefore use the `<-f0.0.anyh>` predicate.

The complete encoding is done in `src/backend/gen_insn_selection.cpp`. Forward branches are handled by `SimpleSelection::emitForwardBranch`. Backward branches are handled by `SimpleSelection::emitBackwardBranch`. Finally, since `JOIN` points are at the top of each basic block, they are handled by `SimpleSelection::emitLabelInstruction`.

### Computing `JOIN` points

The last problem is to compute the `JOIN` points, i.e. we need to know, at the beginning of each block, whether we need to jump and, if we do, what the target of the branch is. The code is relatively straightforward and can be found in `src/backend/context.cpp`; the function is `Context::buildJIPs`.
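To make the per-lane blockIP scheme above more concrete, here is a small scalar C++ emulation of it. This is purely illustrative and is not code from the Beignet backend: the lane count, block IDs and divergence pattern are invented for the example, and the activation test is written as a plain "lane has reached this block" comparison rather than the exact Gen flag/predicate encoding described above.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Purely illustrative scalar emulation of per-lane block IPs (not Beignet
// code). Each lane stores the ID of the next block it wants to execute.
int main() {
  const int simdWidth = 16;                    // one 16-wide SIMD thread
  std::vector<uint16_t> blockIP(simdWidth, 0); // all lanes start in block 0

  // Emulates the 16-wide compare that builds the per-block flag register.
  auto anyLaneActive = [&](int id) {
    for (int lane = 0; lane < simdWidth; ++lane)
      if (blockIP[lane] <= id) return true;
    return false;
  };

  // Divergent branch at the end of block 0: even lanes jump to block 2,
  // odd lanes fall through to block 1. Taking a branch in this scheme is
  // nothing more than rewriting the lane's blockIP.
  for (int lane = 0; lane < simdWidth; ++lane)
    blockIP[lane] = (lane % 2 == 0) ? 2 : 1;

  // Linearized execution: visit the remaining blocks in order.
  for (int id = 1; id <= 2; ++id) {
    // JOIN point: if no lane is going to run this block, jump over it
    // (the <-f0.0.anyh> case described above).
    if (!anyLaneActive(id)) continue;
    for (int lane = 0; lane < simdWidth; ++lane) {
      if (blockIP[lane] <= id) { // this lane's activation bit
        std::printf("lane %d executes block %d\n", lane, id);
        blockIP[lane] = id + 1;  // fall through to the next block
      }
    }
  }
  return 0;
}
```

Running it shows the odd lanes executing block 1 while the even lanes sit out, and all 16 lanes re-converging in block 2; had every lane taken the branch, the `JOIN` check would have skipped block 1 entirely.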
Actually, the current implementation is not that elegant. A colleague, Thomas Raoux, has a simpler and better idea to handle it.

### Advantages and drawbacks of the method

- The method has one decisive advantage: it is simple and extremely robust. It can handle any kind of CFG (reducible or not) and does not require any transformation. The use of shorts is also not random: a 16-wide short compare is issued in 2 cycles (so it is twice as fast as a 16-wide 32-bit compare).
- The main drawback is performance. Even if this is not too bad, we still need more instructions than with structured branches. Mostly:
  * one or two instructions for `JOIN` points
  * three instructions for backward and forward jumps (two more than structured branches, which just require the branch instruction itself)

Note that all extra instructions are 16-bit instructions (i.e. they use shorts), so they will only cost 2 cycles anyway.

The last point is that the Gen encoding restricts conditional modifiers and predicates to be the same in an instruction. This requires copying or recomputing the flag register for compares and selects, so one more instruction is required for these two instructions. Once again, this only costs 2 cycles.

Remarks on `ret` instructions
-----------------------------

Since we can handle any kind of CFG, handling return statements is relatively straightforward. We first create one return block at the end of the program. Then we replace all other returns by an unconditional jump to this block. The CFG linearization will take care of the rest. We then simply encode the (only) return instruction as an End-Of-Thread (EOT) message.

Code examples
-------------

Some tests were written to assert the correctness of the CFG linearization and the code generation. They can be found in the _run-time_ code base here:

`utest/compiler_if_else.cpp`
`utest/compiler_lower_return0.cpp`
`utest/compiler_lower_return1.cpp`
`utest/compiler_lower_return2.cpp`
`utest/compiler_short_scatter.cpp`
`utest/compiler_unstructured_branch0.cpp`
`utest/compiler_unstructured_branch1.cpp`
`utest/compiler_unstructured_branch2.cpp`
`utest/compiler_unstructured_branch3.cpp`

Beignet-1.3.2-Source/docs/Beignet/Backend/TODO.mdwn000664 001750 001750 00000006773 13161142102 020744 0ustar00yryr000000 000000 

TODO
====

The compiler is quite complete now in terms of functionality. It can pass almost all of the piglit OCL test cases, and the pass rate for the OpenCV test suite is also quite good now. But there are plenty of things to do for the final performance tuning.

OpenCL standard library
-----------------------

Today we define the OpenCL API in header file `src/ocl_stdlib.h`. By the way, one question remains: do we want to implement the high-precision functions as _inline_ functions or as external functions to call? Indeed, inlining all functions may lead to severe code bloat, while calling functions will require implementing a proper ABI. We certainly want to do both actually.

LLVM front-end
--------------

The code is defined in `src/llvm`. We used the SPIR and the OpenCL profile to compile the code. Therefore, a good part of the job is already done. However, many things must be implemented:

- From LLVM 3.3, we use SPIR IR. We need to use the compiler-defined types to represent sampler\_t/image2d\_t/image1d\_t/....
- Consider using libclc in our project and avoid using the PCH, which is not compatible across different clang versions. We may also contribute what we have done in ocl\_stdlib.h to libclc if possible.
- Optimize math functions.
Gen IR
------

The code is defined in `src/ir`. Main things to do are:

- Support structured while loops and self-loop BBs.
- Finish the handling of function arguments (see the [[IR description|gen_ir]] for more details).
- Merge independent uniform loads (and samples). This is a major performance improvement once the uniform analysis is done. Basically, several uniform loads may be collapsed into one load if no write happens in-between. This will obviously impact both instruction selection and register allocation.
- Implement a fast path for small local variables. When the kernel only defines a small local array/variable, there will be a good chance to allocate the local array/variable in register space rather than system memory. This will remove a lot of memory loads/stores from the system memory. After custom loop unrolling, this optimization is not very important for most cases now.

Backend
-------

The code is defined in `src/backend`. Main things to do are:

- Optimize register spilling (see the [[compiler backend description|compiler_backend]] for more details).
- Implement proper instruction selection. A "simple" tree matching algorithm should provide good results for Gen.
- Improve the instruction scheduling pass. We need proper pre-register-allocation scheduling to lower register pressure.
- Reduce the macro instructions in gen\_context. The macro instructions added in gen\_context will not get a chance to do post-register-allocation scheduling.
- Peephole optimization. There are many chances to do further peephole optimization.
- Implement a better framework to do backend instruction optimizations.

General plumbing
----------------

I tried to keep the code clean, well, as far as C++ can be really clean. There are some header cleaning steps required though, in particular in the backend code. The context used in the IR code generation (see `src/ir/context.*pp`) should be split up and cleaned up too.

I also purely and simply copied and pasted the Gen ISA disassembler from Mesa. This leads to code duplication. Also, some messages used by OpenCL (untyped reads and writes) are not properly decoded yet.

All the code that should be improved and cleaned up is tracked with "XXX" comments in the code.

Beignet-1.3.2-Source/docs/Beignet.mdwn000664 001750 001750 00000026140 13173554000 016655 0ustar00yryr000000 000000 

Beignet
=======

Beignet is an open source implementation of the OpenCL specification - a generic compute oriented API. This code base contains the code to run OpenCL programs on Intel GPUs; it basically defines and implements the OpenCL host functions required to initialize the device, create the command queues, the kernels and the programs, and run them on the GPU. The code base also contains the compiler part of the stack, which is included in `backend/`. For more specific information about the compiler, please refer to `backend/README.md`.

News
----

[[Beignet project news|Beignet/NEWS]]

Prerequisite
------------

The project depends on the following external libraries:

- libdrm libraries (libdrm and libdrm\_intel)
- Various LLVM components
- If run with an X server, beignet needs XLib, Xfixes and Xext installed. Otherwise, there is no X11 dependency.
And if you want to work with the standard ICD libOpenCL.so, then you need two more packages (the following package names are for Ubuntu):

- ocl-icd-dev
- ocl-icd-libopencl1

If you don't want to enable ICD, or your system doesn't have ICD OpenCL support, you must explicitly disable ICD support by running cmake with the option `-DOCLICD_COMPAT=0`; you can then still link to the beignet OpenCL library. You can find beignet/libcl.so in your system's library installation directories.

Note that the compiler depends on LLVM (Low-Level Virtual Machine project), and the project normally supports the 3 latest released LLVM versions. Right now, the code has been compiled with LLVM 3.6, 3.7 and 3.8. Older LLVM versions back to 3.3 can still build the project, but they are not fully covered by our tests.

A simple command to install all the above dependencies for Ubuntu or Debian is:

`sudo apt-get install cmake pkg-config python ocl-icd-dev libegl1-mesa-dev`
` ocl-icd-opencl-dev libdrm-dev libxfixes-dev libxext-dev llvm-3.6-dev`
` clang-3.6 libclang-3.6-dev libtinfo-dev libedit-dev zlib1g-dev`

[http://llvm.org/releases/](http://llvm.org/releases/)

**The recommended LLVM/CLANG version is 3.6 and/or 3.7**

Based on our test results, LLVM 3.6 and 3.7 have the best pass rate on all the test suites. If you use LLVM 3.8 instead, you should pay attention to floating-point immediates. For example, if you use 1.0 in a kernel, LLVM 3.6 will treat it as 1.0f, a single-precision float, because the project doesn't support double precision floats, but LLVM 3.8 will treat it as 1.0, a double, which may ultimately cause errors. So we recommend using 1.0f instead of 1.0 if you don't need double precision. Beignet still supports LLVM 3.4 and 3.5, but support may be limited to the build and major functions.

How to build and install
------------------------

The project uses CMake with three profiles:

1. Debug (-g)
2. RelWithDebInfo (-g with optimizations)
3. Release (only optimizations)

Basically, from the root directory of the project:

`> mkdir build`
`> cd build`
`> cmake ../ # to configure`

Please note that the code was compiled with GCC 4.6, GCC 4.7, GCC 4.8, CLANG 3.5 and ICC 14.0.3. Since the code uses fairly recent C++11 features, you may expect problems with older compilers. The default compiler is GCC; if you want to choose the compiler manually, configure it as below:

`> cmake -DCOMPILER=[GCC|CLANG|ICC] ../`

CMake will check the dependencies and will complain if it does not find them.

`> make`

CMake builds the backend first. Please refer to [[OpenCL Gen Backend|Beignet/Backend]] for more on its dependencies. Once built, the run-time produces a shared object libcl.so which basically directly implements the OpenCL API.

`> make utest`

A set of tests is also produced. They may be found in `utests/`. Simply invoke:

`> make install`

It installs the following six files to the beignet/ directory, relative to your library installation directory:

- libcl.so
- libgbeinterp.so
- libgbe.so
- ocl\_stdlib.h, ocl\_stdlib.h.pch
- beignet.bc

It installs the OCL ICD vendor file to /etc/OpenCL/vendors, if the system supports ICD:

- intel-beignet.icd

`> make package`

It packages the driver binaries; you may copy and install the package on another machine with a similar system.

How to run
----------

After building and installing Beignet, you may need to check whether it works on your platform. Beignet also produces various tests to ensure the compiler and the run-time consistency.
The bundled test framework uses a simple C++ registration system to register all the unit tests. First, source setenv.sh in the utests/ directory to set the required environment variables:

`> . setenv.sh`

Then in `utests/`:

`> ./utest_run`

will run all the unit tests one after the other, while

`> ./utest_run some_unit_test`

will only run the `some_unit_test` test.

On all supported target platforms, the pass rate should be 100%. If it is not, refer to the "Known Issues" section. Please note that `. setenv.sh` is only required to run the unit test cases; do not execute it for other OpenCL applications.

Normally, beignet needs to run under an X server environment as a normal user. Without an X server, beignet provides two alternatives:

* Run as root without X.
* Enable the DRM render nodes by passing drm.rnodes=1 in the kernel boot arguments; you can then run beignet as non-root and without X.

Supported Targets
-----------------

* 3rd Generation Intel Core Processors "Ivybridge".
* 3rd Generation Intel Atom Processors "BayTrail".
* 4th Generation Intel Core Processors "Haswell"; a kernel patch is needed if your Linux kernel is older than 4.2, see the "Known Issues" section.
* 5th Generation Intel Core Processors "Broadwell".
* 5th Generation Intel Atom Processors "Braswell".
* 6th Generation Intel Core Processors "Skylake" and "Kabylake".
* 5th Generation Intel Atom Processors "Broxton" or "Apollolake".

OpenCL 2.0
----------

From release v1.3.0, beignet supports OpenCL 2.0 on Skylake and later hardware. This requires LLVM/Clang 3.9 or later, libdrm 2.4.66 or later and x86_64 Linux. As required by the OpenCL specification, kernels are compiled as OpenCL C 1.2 by default; to use 2.0, it must be explicitly requested with the -cl-std=CL2.0 build option. As OpenCL 2.0 is likely to be slower than 1.2, we recommend using it only where needed: 2.0 uses more registers and many int64 operations, and some 2.0 features (pipes and especially device queues) are implemented in software, so they do not provide any performance gain. Beignet will continue to improve OpenCL 2.0 performance.

Known Issues
------------

* GPU hang issues. To check whether the GPU hung, run dmesg and look for a message like:

  `[17909.175965] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed...`

  If it appears, there was a GPU hang. Usually this means something is wrong in the kernel, as it indicates the OCL kernel has not finished for about 6 seconds or even more. If you think the OCL kernel genuinely needs to run that long and you have confidence in the kernel, you can disable the Linux kernel driver's hang-check feature to fix this hang issue. Just invoke the following command on an Ubuntu system:

  `# echo -n 0 > /sys/module/i915/parameters/enable_hangcheck`

  This command is a little bit dangerous, though: if your kernel really hangs, the GPU will lock up until a reboot.

* "Beignet: self-test failed" and almost all unit tests fail. Linux 3.15 and 3.16 (commits [f0a346b](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f0a346bdafaf6fc4a51df9ddf1548fd888f860d8) to [c9224fa](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c9224faa59c3071ecfa2d4b24592f4eb61e57069)) enable the register whitelist by default but miss some registers needed for Beignet.
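  (You can check which kernel version you are running with `> uname -r`.)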
  This can be fixed by upgrading Linux, or by disabling the whitelist:

  `# echo 0 > /sys/module/i915/parameters/enable_cmd_parser`

* "Beignet: self-test failed" and 15-30 unit tests fail on 4th Generation (Haswell) hardware. On Haswell, shared local memory (\_\_local) does not work at all on Linux <= 4.0, and requires the i915.enable_ppgtt=2 [boot parameter](https://wiki.ubuntu.com/Kernel/KernelBootParameters) on Linux 4.1. This is fixed in Linux 4.2; older versions can be fixed with [this patch](https://01.org/zh/beignet/downloads/linux-kernel-patch-hsw-support). If you do not need \_\_local, you can override the self-test with `export OCL_IGNORE_SELF_TEST=1`, but using \_\_local after this may silently give wrong results.

* Precision issues. Currently Gen does not provide native support for the high-precision math functions required by OpenCL. We provide a software version to achieve high precision, which you can turn off with `# export OCL_STRICT_CONFORMANCE=0`. This loses some precision but gains performance.

* cl\_khr\_gl\_sharing. This extension is partially implemented (the most commonly used parts); we will implement the other parts as required.

Project repository
------------------

Right now, we host our project on freedesktop.org at [http://cgit.freedesktop.org/beignet/](http://cgit.freedesktop.org/beignet/), and on Intel 01.org at [https://01.org/beignet](https://01.org/beignet).

The team
--------

The Beignet project was created by Ben Segovia. Since 2013, the Intel China OTC graphics team has continued to work on this project. The official contact for this project is Zou Nanhai.

Maintainers from Intel:

* Gong, Zhigang
* Yang, Rong

Developers from Intel:

* Song, Ruiling
* He, Junyan
* Luo, Xionghu
* Wen, Chuanbo
* Guo, Yejun
* Pan, Xiuli

Debian Maintainer:

* Rebecca Palmer

Fedora Maintainer:

* Igor Gnatenko

If I missed any other package maintainers, please feel free to contact the mailing list.

How to contribute
-----------------

You are always welcome to contribute to this project; just subscribe to the beignet mailing list and send patches to it for review. The official mailing list is:

[http://lists.freedesktop.org/mailman/listinfo/beignet](http://lists.freedesktop.org/mailman/listinfo/beignet)

The official bugzilla is at:

[https://bugs.freedesktop.org/enter_bug.cgi?product=Beignet](https://bugs.freedesktop.org/enter_bug.cgi?product=Beignet)

You are welcome to submit Beignet bugs. Please specify the exact platform information, such as BYT/IVB/HSW/BDW, and GT1/GT2/GT3. You can easily get this information by running Beignet's unit tests.
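If you prefer not to build the unit tests, a minimal host-side query like the sketch below (illustrative only; error checking omitted) prints the device name reported by the driver, which typically includes the hardware generation and GT level:

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
      cl_platform_id platform;
      cl_device_id device;
      char name[256];
      /* Grab the first platform and its first GPU device. */
      clGetPlatformIDs(1, &platform, NULL);
      clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
      clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
      printf("Device: %s\n", name);
      return 0;
    }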
Documents for OpenCL application developers
-------------------------------------------

- [[Cross compile (yocto)|Beignet/howto/cross-compiler-howto]]
- [[Work with old system without c++11|Beignet/howto/oldgcc-howto]]
- [[Kernel Optimization Guide|Beignet/optimization-guide]]
- [[Libva Buffer Sharing|Beignet/howto/libva-buffer-sharing-howto]]
- [[V4l2 Buffer Sharing|Beignet/howto/v4l2-buffer-sharing-howto]]
- [[OpenGL Buffer Sharing|Beignet/howto/gl-buffer-sharing-howto]]
- [[Video Motion Estimation|Beignet/howto/video-motion-estimation-howto]]
- [[Stand Alone Unit Test|Beignet/howto/stand-alone-utest-howto]]
- [[Android build|Beignet/howto/android-build-howto]]

The wiki URL is as below:

[http://www.freedesktop.org/wiki/Software/Beignet/](http://www.freedesktop.org/wiki/Software/Beignet/)

Beignet-1.3.2-Source/docs/NEWS.mdwn000664 001750 001750 00000004376 13174331653 016063 0ustar00yryr000000 000000 

# News

## Oct 26, 2017
[Beignet 1.3.2](https://01.org/beignet/downloads/beignet-1.3.2-2017-10-26) is released. This is a bug-fix release.

## Mar 13, 2017
[Beignet 1.3.1](https://01.org/beignet/downloads/beignet-1.3.1-2017-03-13) is released. This is a bug-fix release.

## Jan 20, 2017
[Beignet 1.3.0](https://01.org/beignet/downloads/beignet-1.3.0-2017-01-20) is released. This is a major release. Please see the release notes for more information.

## Nov 4, 2016
[Beignet 1.2.1](https://01.org/beignet/downloads/beignet-1.2.1-2016-11-04) is released. This is a bug-fix release.

## Aug 30, 2016
[Beignet 1.2.0](https://01.org/beignet/downloads/beignet-1.2.0-2016-08-30) is released. This is a major release. Please see the release notes for more information.

## Apr 19, 2016
[Beignet 1.1.2](https://01.org/beignet/downloads/beignet-1.1.2-2016-04-19) is released. This is a bug-fix release.

## Oct 08, 2015
[Beignet 1.1.1](https://01.org/beignet/downloads/beignet-1.1.1-2015-10-08) is released. This is a bug-fix release.

## Jul 21, 2015
[Beignet 1.1.0](https://01.org/beignet/downloads/beignet-1.1.0-2015-07-31) is released. This is a major release. Please see the release notes for more information.

## Jan 19, 2015
[Beignet 1.0.1](https://01.org/beignet/downloads/beignet-1.0.1-2015-01-19) is released. This is a bug-fix release.

## Nov 14, 2014
[Beignet 1.0.0](https://01.org/beignet/downloads/beignet-1.0.0-2014-11-14) is released. This is a major release. Please see the release notes for more information.

## Sep 15, 2014
[Beignet 0.9.3](https://01.org/zh/beignet/downloads/beignet-0.9.3-2014-09-15-0) is released. This is a bug-fix release.

## July 17, 2014
[Beignet 0.9.2](https://01.org/zh/beignet/downloads/beignet-0.9.2-2014-07-17) is released. This is a bug-fix release.

## July 4, 2014
[Beignet 0.9.1](https://01.org/zh/beignet/downloads/beignet-0.9.1-2014-07-04) is released. This is a bug-fix release.

## June 26, 2014
[Beignet 0.9.0](https://01.org/zh/beignet/downloads/beignet-0.9-2014-06-26) is released. This is a major release. Please see the release notes for more information.

## Feb 12, 2014
[Beignet 0.8.0](https://01.org/zh/beignet/downloads/2014/beignet-0.8.0-2014-02-12) is released. This is a major release. Please see the release notes for more information.
Beignet-1.3.2-Source/.gitmodules000664 001750 001750 00000000165 13161142102 015626 0ustar00yryr000000 000000 
[submodule "examples/thirdparty/libva"]
	path = examples/thirdparty/libva
	url = git://anongit.freedesktop.org/libva
Beignet-1.3.2-Source/benchmark/000775 001750 001750 00000000000 13174334761 015423 5ustar00yryr000000 000000 
Beignet-1.3.2-Source/benchmark/benchmark_workgroup.cpp000664 001750 001750 00000025031 13161142102 022161 0ustar00yryr000000 000000 
/* The bracketed header names are missing in this copy; the headers below are
 * assumed from what the code uses: iostream (cout), cstring (memcpy),
 * cstdlib (rand/srand), ctime (time), algorithm (min/max) and
 * sys/time.h (gettimeofday). */
#include <iostream>
#include <cstring>
#include <cstdlib>
#include "utest_helper.hpp"
#include <ctime>
#include <algorithm>
#include <sys/time.h>

using namespace std;

/* work-group general settings */
#define WG_GLOBAL_SIZE (512 * 256)
#define WG_LOCAL_SIZE 128
#define WG_LOOP_COUNT 1000

/* work-group broadcast only */
#define WG_GLOBAL_SIZE_X 1024
#define WG_GLOBAL_SIZE_Y 1024
#define WG_LOCAL_SIZE_X 32
#define WG_LOCAL_SIZE_Y 2
#define WG_LOCAL_X 5
#define WG_LOCAL_Y 0

enum WG_FUNCTION
{
  WG_BROADCAST_1D,
  WG_BROADCAST_2D,
  WG_REDUCE_ADD,
  WG_REDUCE_MIN,
  WG_REDUCE_MAX,
  WG_SCAN_EXCLUSIVE_ADD,
  WG_SCAN_EXCLUSIVE_MAX,
  WG_SCAN_EXCLUSIVE_MIN,
  WG_SCAN_INCLUSIVE_ADD,
  WG_SCAN_INCLUSIVE_MAX,
  WG_SCAN_INCLUSIVE_MIN
};

/*
 * Generic compute-expected on CPU function for any workgroup type
 * and any variable type
 */
template <class T>
static void benchmark_expected(WG_FUNCTION wg_func,
                               T* input,
                               T* expected,
                               uint32_t wg_global_size,
                               uint32_t wg_local_size)
{
  if(wg_func == WG_BROADCAST_1D)
  {
    for(uint32_t i = 0; i < wg_local_size; i++)
      expected[i] = input[WG_LOCAL_X];
  }
  else if(wg_func == WG_BROADCAST_2D)
  {
    for(uint32_t i = 0; i < wg_local_size; i++)
      expected[i] = input[WG_LOCAL_X + WG_LOCAL_Y * WG_LOCAL_SIZE_X];
  }
  else if(wg_func == WG_REDUCE_ADD)
  {
    T wg_sum = input[0];
    for(uint32_t i = 1; i < wg_local_size; i++)
      wg_sum += input[i];
    for(uint32_t i = 0; i < wg_local_size; i++)
      expected[i] = wg_sum;
  }
  else if(wg_func == WG_REDUCE_MAX)
  {
    T wg_max = input[0];
    for(uint32_t i = 1; i < wg_local_size; i++)
      wg_max = max(input[i], wg_max);
    for(uint32_t i = 0; i < wg_local_size; i++)
      expected[i] = wg_max;
  }
  else if(wg_func == WG_REDUCE_MIN)
  {
    T wg_min = input[0];
    for(uint32_t i = 1; i < wg_local_size; i++)
      wg_min = min(input[i], wg_min);
    for(uint32_t i = 0; i < wg_local_size; i++)
      expected[i] = wg_min;
  }
  else if(wg_func == WG_SCAN_INCLUSIVE_ADD)
  {
    expected[0] = input[0];
    for(uint32_t i = 1; i < wg_local_size; i++)
      expected[i] = input[i] + expected[i - 1];
  }
  else if(wg_func == WG_SCAN_INCLUSIVE_MAX)
  {
    expected[0] = input[0];
    for(uint32_t i = 1; i < wg_local_size; i++)
      expected[i] = max(input[i], expected[i - 1]);
  }
  else if(wg_func == WG_SCAN_INCLUSIVE_MIN)
  {
    expected[0] = input[0];
    for(uint32_t i = 1; i < wg_local_size; i++)
      expected[i] = min(input[i], expected[i - 1]);
  }
}

/*
 * Generic input-expected generate function for any workgroup type
 * and any variable type
 */
template <class T>
static void benchmark_data(WG_FUNCTION wg_func,
                           T* &input,
                           T* &expected,
                           uint32_t &wg_global_size,
                           uint32_t &wg_local_size)
{
  if(wg_func == WG_BROADCAST_1D)
  {
    wg_global_size = WG_GLOBAL_SIZE_X;
    wg_local_size = WG_LOCAL_SIZE_X;
  }
  else if(wg_func == WG_BROADCAST_2D)
  {
    wg_global_size = WG_GLOBAL_SIZE_X * WG_GLOBAL_SIZE_Y;
    wg_local_size = WG_LOCAL_SIZE_X * WG_LOCAL_SIZE_Y;
  }
  else
  {
    wg_global_size = WG_GLOBAL_SIZE;
    wg_local_size = WG_LOCAL_SIZE;
  }

  input = new T[wg_global_size];
  expected = new T[wg_global_size];

  /* seed for random inputs */
  srand (time(NULL));

  /* generate inputs and expected values */
  for(uint32_t gid = 0; gid < wg_global_size; gid += wg_local_size)
  {
    /* input values */
    for(uint32_t lid = 0; lid < wg_local_size; lid++)
      input[gid + lid] = (rand() % 512) / 3.1415f;
    /* expected values */
    benchmark_expected(wg_func, input + gid, expected + gid,
                       wg_global_size, wg_local_size);
  }
}

/*
 * Generic benchmark function for any workgroup type
 * and any variable type
 */
template <class T>
static double benchmark_generic(WG_FUNCTION wg_func,
                                T* input,
                                T* expected)
{
  double elapsed = 0;
  const uint32_t reduce_loop = WG_LOOP_COUNT;
  struct timeval start,stop;

  uint32_t wg_global_size = 0;
  uint32_t wg_local_size = 0;

  /* input and expected data */
  benchmark_data(wg_func, input, expected, wg_global_size, wg_local_size);

  /* prepare input for datatype */
  OCL_CREATE_BUFFER(buf[0], 0, wg_global_size * sizeof(T), NULL);
  OCL_CREATE_BUFFER(buf[1], 0, wg_global_size * sizeof(T), NULL);
  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);
  OCL_SET_ARG(2, sizeof(cl_uint), &reduce_loop);
  if(wg_func == WG_BROADCAST_1D || wg_func == WG_BROADCAST_2D)
  {
    cl_uint wg_local_x = WG_LOCAL_X;
    cl_uint wg_local_y = WG_LOCAL_Y;
    OCL_SET_ARG(3, sizeof(cl_uint), &wg_local_x);
    OCL_SET_ARG(4, sizeof(cl_uint), &wg_local_y);
  }

  /* set input data for GPU */
  OCL_MAP_BUFFER(0);
  memcpy(buf_data[0], input, wg_global_size * sizeof(T));
  OCL_UNMAP_BUFFER(0);

  /* run the kernel on GPU */
  gettimeofday(&start,0);
  if(wg_func == WG_BROADCAST_1D)
  {
    globals[0] = WG_GLOBAL_SIZE_X;
    locals[0] = WG_LOCAL_SIZE_X;
    OCL_NDRANGE(1);
  }
  else if(wg_func == WG_BROADCAST_2D)
  {
    globals[0] = WG_GLOBAL_SIZE_X;
    locals[0] = WG_LOCAL_SIZE_X;
    globals[1] = WG_GLOBAL_SIZE_Y;
    locals[1] = WG_LOCAL_SIZE_Y;
    OCL_NDRANGE(2);
  }
  else
  { /* reduce, scan inclusive, scan exclusive */
    globals[0] = WG_GLOBAL_SIZE;
    locals[0] = WG_LOCAL_SIZE;
    OCL_NDRANGE(1);
  }
  clFinish(queue);
  gettimeofday(&stop,0);
  elapsed = time_subtract(&stop, &start, 0);

  /* check for mismatches, display execution time */
  OCL_MAP_BUFFER(1);
  uint32_t mismatches = 0;
  for (uint32_t i = 0; i < wg_global_size; i++)
    if(((T *)buf_data[1])[i] != *(expected + i)){
      /* uncomment below for DEBUG */
      /* cout << "Err at " << i << ", " << ((T *)buf_data[1])[i]
           << " != " << *(expected + i) << endl; */
      mismatches++;
    }
  cout << endl << endl << "Mismatches " << mismatches << endl;
  cout << "Exec time " << elapsed << endl << endl;
  OCL_UNMAP_BUFFER(1);

  return BANDWIDTH(sizeof(T) * wg_global_size * reduce_loop, elapsed);
}

/*
 * Benchmark workgroup broadcast
 */
double benchmark_workgroup_broadcast_1D_int(void)
{
  cl_int *input = NULL;
  cl_int *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_broadcast_1D_int");
  return benchmark_generic(WG_BROADCAST_1D, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_broadcast_1D_int, "GB/S");

double benchmark_workgroup_broadcast_1D_long(void)
{
  cl_long *input = NULL;
  cl_long *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_broadcast_1D_long");
  return benchmark_generic(WG_BROADCAST_1D, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_broadcast_1D_long, "GB/S");

double benchmark_workgroup_broadcast_2D_int(void)
{
  cl_int *input = NULL;
  cl_int *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_broadcast_2D_int");
  return benchmark_generic(WG_BROADCAST_2D, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_broadcast_2D_int, "GB/S");

double benchmark_workgroup_broadcast_2D_long(void)
{
  cl_long *input = NULL;
  cl_long *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_broadcast_2D_long");
  return benchmark_generic(WG_BROADCAST_2D, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_broadcast_2D_long, "GB/S");

/*
 * Benchmark workgroup reduce add
 */
double benchmark_workgroup_reduce_add_int(void)
{
  cl_int *input = NULL;
  cl_int *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_reduce_add_int");
  return benchmark_generic(WG_REDUCE_ADD, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_reduce_add_int, "GB/S");

double benchmark_workgroup_reduce_add_long(void)
{
  cl_long *input = NULL;
  cl_long *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_reduce_add_long");
  return benchmark_generic(WG_REDUCE_ADD, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_reduce_add_long, "GB/S");

/*
 * Benchmark workgroup reduce min
 */
double benchmark_workgroup_reduce_min_int(void)
{
  cl_int *input = NULL;
  cl_int *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_reduce_min_int");
  return benchmark_generic(WG_REDUCE_MIN, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_reduce_min_int, "GB/S");

double benchmark_workgroup_reduce_min_long(void)
{
  cl_long *input = NULL;
  cl_long *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_reduce_min_long");
  return benchmark_generic(WG_REDUCE_MIN, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_reduce_min_long, "GB/S");

/*
 * Benchmark workgroup scan inclusive add
 */
double benchmark_workgroup_scan_inclusive_add_int(void)
{
  cl_int *input = NULL;
  cl_int *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_scan_inclusive_add_int");
  return benchmark_generic(WG_SCAN_INCLUSIVE_ADD, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_scan_inclusive_add_int, "GB/S");

double benchmark_workgroup_scan_inclusive_add_long(void)
{
  cl_long *input = NULL;
  cl_long *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_scan_inclusive_add_long");
  return benchmark_generic(WG_SCAN_INCLUSIVE_ADD, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_scan_inclusive_add_long, "GB/S");

/*
 * Benchmark workgroup scan inclusive min
 */
double benchmark_workgroup_scan_inclusive_min_int(void)
{
  cl_int *input = NULL;
  cl_int *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_scan_inclusive_min_int");
  return benchmark_generic(WG_SCAN_INCLUSIVE_MIN, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_scan_inclusive_min_int, "GB/S");

double benchmark_workgroup_scan_inclusive_min_long(void)
{
  cl_long *input = NULL;
  cl_long *expected = NULL;
  OCL_CREATE_KERNEL_FROM_FILE("bench_workgroup",
                              "bench_workgroup_scan_inclusive_min_long");
  return benchmark_generic(WG_SCAN_INCLUSIVE_MIN, input, expected);
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_workgroup_scan_inclusive_min_long, "GB/S");
Beignet-1.3.2-Source/benchmark/benchmark_read_image.cpp000664 001750 001750 00000003531 13161142102 022202 0ustar00yryr000000 000000 
/* Bracketed header names are missing in this copy; assumed from usage:
 * string.h (memset) and sys/time.h (gettimeofday). */
#include <string.h>
#include "utests/utest_helper.hpp"
#include <sys/time.h>

double benchmark_read_image(void)
{
  struct timeval start,stop;

  const size_t x_count = 4;
  const size_t y_count = 4;
  const size_t w = 1024;
  const size_t h = 1024;
  const size_t sz = 4 * x_count * y_count * w * h;
  cl_image_format format;
  cl_image_desc desc;

  memset(&desc, 0x0, sizeof(cl_image_desc));
  memset(&format, 0x0, sizeof(cl_image_format));

  // Setup kernel and images
  OCL_CREATE_KERNEL("compiler_read_image");

  buf_data[0] = (uint32_t*) malloc(sizeof(float) * sz);
  buf_data[1] = (uint32_t*) malloc(sizeof(float) * sz);
  for (uint32_t i = 0; i < sz; ++i) {
    ((float*)buf_data[0])[i] = rand();
    ((float*)buf_data[1])[i] = rand();
  }

  format.image_channel_order = CL_RGBA;
  format.image_channel_data_type = CL_FLOAT;
  desc.image_type = CL_MEM_OBJECT_IMAGE2D;
  desc.image_width = w * x_count;
  desc.image_height = h * y_count;
  desc.image_row_pitch = desc.image_width * sizeof(float) * 4;
  OCL_CREATE_IMAGE(buf[0], CL_MEM_COPY_HOST_PTR, &format, &desc, buf_data[0]);
  OCL_CREATE_IMAGE(buf[1], CL_MEM_COPY_HOST_PTR, &format, &desc, buf_data[1]);
  OCL_CREATE_BUFFER(buf[2], 0, sz * sizeof(float), NULL);
  free(buf_data[0]);
  buf_data[0] = NULL;
  free(buf_data[1]);
  buf_data[1] = NULL;

  // Run the kernel
  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);
  OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]);
  globals[0] = w;
  globals[1] = h;
  locals[0] = 16;
  locals[1] = 16;

  gettimeofday(&start,0);
  for (size_t i=0; i<100; i++) {
    OCL_NDRANGE(2);
  }
  OCL_FINISH();
  gettimeofday(&stop,0);

  free(buf_data[0]);
  buf_data[0] = NULL;

  double elapsed = time_subtract(&stop, &start, 0);

  return BANDWIDTH(sz * sizeof(float) * 2 * 100, elapsed);
}

MAKE_BENCHMARK_FROM_FUNCTION(benchmark_read_image, "GB/S");
Beignet-1.3.2-Source/benchmark/CMakeLists.txt000664 001750 001750 00000002225 13161142102 020144 0ustar00yryr000000 000000 
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}
                    ${CMAKE_CURRENT_SOURCE_DIR}/../utests
                    ${CMAKE_CURRENT_SOURCE_DIR}/../include)

link_directories (${LLVM_LIBRARY_DIR} ${DRM_LIBDIR})

set (benchmark_sources
  ../utests/utest_error.c
  ../utests/utest_assert.cpp
  ../utests/utest.cpp
  ../utests/utest_file_map.cpp
  ../utests/utest_helper.cpp
  ../utests/vload_bench.cpp
  benchmark_copy_buf.cpp
  benchmark_use_host_ptr_buffer.cpp
  benchmark_read_buffer.cpp
  benchmark_read_image.cpp
  benchmark_copy_buffer_to_image.cpp
  benchmark_copy_image_to_buffer.cpp
  benchmark_copy_buffer.cpp
  benchmark_copy_image.cpp
  benchmark_workgroup.cpp
  benchmark_math.cpp)

SET(CMAKE_CXX_FLAGS "-DBUILD_BENCHMARK ${CMAKE_CXX_FLAGS}")
SET(CMAKE_C_FLAGS "-DBUILD_BENCHMARK ${CMAKE_C_FLAGS}")

ADD_LIBRARY(benchmarks SHARED ${ADDMATHFUNC} ${benchmark_sources})

#TARGET_LINK_LIBRARIES(benchmarks cl m ${OPENGL_LIBRARIES} ${CMAKE_THREAD_LIBS_INIT})
TARGET_LINK_LIBRARIES(benchmarks cl m)

ADD_EXECUTABLE(benchmark_run benchmark_run.cpp)
TARGET_LINK_LIBRARIES(benchmark_run benchmarks)

ADD_CUSTOM_TARGET(benchmark DEPENDS benchmarks benchmark_run)
Beignet-1.3.2-Source/benchmark/benchmark_math.cpp000664 001750 001750 00000007420 13161142102 021053 0ustar00yryr000000 000000 
/* Bracketed header names are missing in this copy; assumed from usage:
 * cstdlib (calloc), cstring (memcpy), cstdio (printf), cmath, iostream
 * and sys/time.h (gettimeofday). */
#include "utests/utest_helper.hpp"
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <cmath>
#include <iostream>
#include "utest_helper.hpp"
#include <sys/time.h>

double benchmark_generic_math(const char* str_filename,
                              const char* str_kernel)
{
  double elapsed = 0;
  struct timeval start,stop;
  const size_t global_size = 1024 * 1024;
  const size_t local_size = 64;

  /* Compute math OP, loop times on global size */
  cl_float base = 1.000002;
  cl_float pwr = 1.0102003;
  uint32_t loop = 1000;

  /* Input set will be generated */
  float* src = (float*)calloc(sizeof(float), global_size);
  OCL_ASSERT(src != NULL);
  for(uint32_t i = 0; i < global_size; i++)
    src[i] = base + i * (base - 1);

  /* Setup kernel and buffers */
  OCL_CALL(cl_kernel_init, str_filename, str_kernel, SOURCE, "");
  OCL_CREATE_BUFFER(buf[0], 0, (global_size) * sizeof(float), NULL);
  OCL_CREATE_BUFFER(buf[1], 0, (global_size) * sizeof(float), NULL);

  OCL_MAP_BUFFER(0);
  memcpy(buf_data[0], src, global_size * sizeof(float));
  OCL_UNMAP_BUFFER(0);

  globals[0] = global_size;
  locals[0] = local_size;
  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);
  OCL_SET_ARG(2, sizeof(cl_float), &pwr);
  OCL_SET_ARG(3, sizeof(cl_uint), &loop);

  /* Measure performance */
  gettimeofday(&start,0);
  OCL_NDRANGE(1);
  clFinish(queue);
  gettimeofday(&stop,0);
  elapsed = time_subtract(&stop, &start, 0);

  /* Show compute results */
  OCL_MAP_BUFFER(1);
  for(uint32_t i = 0; i < global_size; i += 8192)
    printf("\t%.3f", ((float*)buf_data[1])[i]);
  OCL_UNMAP_BUFFER(1);

  return BANDWIDTH(global_size * loop, elapsed);
}

double benchmark_math_pow(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_pow");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_pow, "Mop/s");

double benchmark_math_exp2(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_exp2");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_exp2, "Mop/s");

double benchmark_math_exp(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_exp");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_exp, "Mop/s");

double benchmark_math_exp10(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_exp10");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_exp10, "Mop/s");

double benchmark_math_log2(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_log2");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_log2, "Mop/s");

double benchmark_math_log(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_log");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_log, "Mop/s");

double benchmark_math_log10(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_log10");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_log10, "Mop/s");

double benchmark_math_sqrt(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_sqrt");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_sqrt, "Mop/s");

double benchmark_math_sin(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_sin");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_sin, "Mop/s");

double benchmark_math_cos(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_cos");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_cos, "Mop/s");

double benchmark_math_tan(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_tan");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_tan, "Mop/s");

double benchmark_math_asin(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_asin");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_asin, "Mop/s");

double benchmark_math_acos(void){
  return benchmark_generic_math("bench_math.cl", "bench_math_acos");
}
MAKE_BENCHMARK_FROM_FUNCTION(benchmark_math_acos, "Mop/s");
Beignet-1.3.2-Source/benchmark/benchmark_read_buffer.cpp000664 001750 001750 00000002245 13161142102 022366 0ustar00yryr000000 000000 
/* Bracketed header name is missing in this copy; assumed from usage:
 * sys/time.h (gettimeofday). */
#include "utests/utest_helper.hpp"
#include <sys/time.h>

double benchmark_read_buffer(void)
{
  struct timeval start,stop;

  const size_t n = 1024 * 1024;
  int count = 16;
  const size_t sz = 4 * n * count;

  OCL_CREATE_BUFFER(buf[0], 0, sz * sizeof(float), NULL);
  OCL_CREATE_BUFFER(buf[1], 0, sz * sizeof(float), NULL);
  OCL_CREATE_BUFFER(buf[2], 0, sz * sizeof(float), NULL);

  OCL_CREATE_KERNEL("compiler_read_buffer");

  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);
  OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]);

  OCL_MAP_BUFFER(0);
  OCL_MAP_BUFFER(1);
  for (size_t i = 0; i < sz; i ++) {
    ((float *)(buf_data[0]))[i] = rand();
    ((float *)(buf_data[1]))[i] = rand();
  }
  OCL_UNMAP_BUFFER(0);
  OCL_UNMAP_BUFFER(1);

  // Setup the NDRange
  globals[0] = n;
  locals[0] = 256;
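  /* Time 100 back-to-back kernel launches; OCL_FINISH() below drains the
   * queue, so the stop timestamp covers all the enqueued work. */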
  gettimeofday(&start,0);
  for (size_t i=0; i<100; i++) {
    OCL_NDRANGE(1);
  }
  OCL_FINISH();
  gettimeofday(&stop,0);

  free(buf_data[0]);
  buf_data[0] = NULL;

  double elapsed = time_subtract(&stop, &start, 0);

  return BANDWIDTH(sz * sizeof(float) * 2 * 100, elapsed);
}

MAKE_BENCHMARK_FROM_FUNCTION(benchmark_read_buffer, "GB/S");
Beignet-1.3.2-Source/benchmark/benchmark_use_host_ptr_buffer.cpp000664 001750 001750 00000002115 13161142102 024165 0ustar00yryr000000 000000 
/* Bracketed header name is missing in this copy; assumed from usage:
 * sys/time.h (gettimeofday). */
#include "utests/utest_helper.hpp"
#include <sys/time.h>

double benchmark_use_host_ptr_buffer(void)
{
  struct timeval start,stop;

  const size_t n = 4096*4096;

  // Setup kernel and buffers
  OCL_CREATE_KERNEL("runtime_use_host_ptr_buffer");

  int ret = posix_memalign(&buf_data[0], 64, sizeof(uint32_t) * n);
  OCL_ASSERT(ret == 0);
  for (uint32_t i = 0; i < n; ++i)
    ((uint32_t*)buf_data[0])[i] = i;
  OCL_CREATE_BUFFER(buf[0], CL_MEM_USE_HOST_PTR, n * sizeof(uint32_t), buf_data[0]);

  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);

  globals[0] = n;
  locals[0] = 256;

  gettimeofday(&start,0);
  for (size_t i=0; i<100; i++) {
    OCL_NDRANGE(1);
    void* mapptr = (int*)clEnqueueMapBuffer(queue, buf[0], CL_TRUE, CL_MAP_READ, 0, n*sizeof(uint32_t), 0, NULL, NULL, NULL);
    clEnqueueUnmapMemObject(queue, buf[0], mapptr, 0, NULL, NULL);
  }
  gettimeofday(&stop,0);

  free(buf_data[0]);
  buf_data[0] = NULL;

  double elapsed = time_subtract(&stop, &start, 0);

  return BANDWIDTH(n*sizeof(uint32_t)*100*2, elapsed);
}

MAKE_BENCHMARK_FROM_FUNCTION(benchmark_use_host_ptr_buffer, "GB/S");
Beignet-1.3.2-Source/benchmark/benchmark_copy_buffer_to_image.cpp000664 001750 001750 00000003447 13161142102 024276 0ustar00yryr000000 000000 
/* Bracketed header names are missing in this copy; assumed from usage:
 * string.h (memset) and sys/time.h (gettimeofday). */
#include <string.h>
#include "utests/utest_helper.hpp"
#include <sys/time.h>

#define IMAGE_BPP 2

double benchmark_copy_buffer_to_image(void)
{
  struct timeval start,stop;

  const size_t w = 960 * 4;
  const size_t h = 540 * 4;
  const size_t sz = IMAGE_BPP * w * h;
  cl_image_format format;
  cl_image_desc desc;

  memset(&desc, 0x0, sizeof(cl_image_desc));
  memset(&format, 0x0, sizeof(cl_image_format));

  // Setup image and buffer
  buf_data[0] = (unsigned short*) malloc(sz);
  for (uint32_t i = 0; i < w*h; ++i) {
    ((unsigned short*)buf_data[0])[i] = (rand() & 0xffff);
  }

  format.image_channel_order = CL_R;
  format.image_channel_data_type = CL_UNSIGNED_INT16;
  desc.image_type = CL_MEM_OBJECT_IMAGE2D;
  desc.image_width = w;
  desc.image_height = h;
  desc.image_row_pitch = 0;

  OCL_CREATE_BUFFER(buf[0], CL_MEM_COPY_HOST_PTR, sz, buf_data[0]);
  OCL_CREATE_IMAGE(buf[1], 0, &format, &desc, NULL);

  /* copy buffer to image */
  size_t origin[3] = {0, 0, 0};
  size_t region[3] = {w, h, 1};
  OCL_CALL (clEnqueueCopyBufferToImage, queue, buf[0], buf[1], 0, origin, region, 0, NULL, NULL);
  OCL_FINISH();

  OCL_MAP_BUFFER_GTT(1);
  /* check result */
  for (uint32_t j = 0; j < h; ++j)
    for (uint32_t i = 0; i < w; i++) {
      OCL_ASSERT(((unsigned short*)buf_data[0])[j * w + i] ==
                 ((unsigned short*)buf_data[1])[j * w + i]);
    }
  OCL_UNMAP_BUFFER_GTT(1);

  gettimeofday(&start,0);
  for (uint32_t i=0; i<100; i++) {
    OCL_CALL (clEnqueueCopyBufferToImage, queue, buf[0], buf[1], 0, origin, region, 0, NULL, NULL);
  }
  OCL_FINISH();
  gettimeofday(&stop,0);

  free(buf_data[0]);
  buf_data[0] = NULL;

  double elapsed = time_subtract(&stop, &start, 0);

  return BANDWIDTH(sz * 100, elapsed);
}

MAKE_BENCHMARK_FROM_FUNCTION(benchmark_copy_buffer_to_image, "GB/S");
Beignet-1.3.2-Source/benchmark/benchmark_run.cpp000664 001750 001750 00000005354 13161142102 020732 0ustar00yryr000000 000000 
/*
 * Copyright © 2012 Intel Corporation
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library. If not, see <http://www.gnu.org/licenses/>.
 *
 * Author: Benjamin Segovia
 */

/**
 * \file utest_run.cpp
 * \author Benjamin Segovia
 *
 * Just run the unit tests. The user can optionally provide a subset of them.
 */

/* Bracketed header names are missing in this copy; assumed from usage:
 * getopt.h (getopt_long, struct option) and iostream (std::cout). */
#include "utest_helper.hpp"
#include "utest_exception.hpp"
#include <getopt.h>
#include <iostream>

static const char *shortopts = "c:lanh";
struct option longopts[] = {
  {"casename", required_argument, NULL, 'c'},
  {"list", no_argument, NULL, 'l'},
  {"all", no_argument, NULL, 'a'},
  {"allnoissue", no_argument, NULL, 'n'},
  {"help", no_argument, NULL, 'h'},
  {0, 0, 0, 0},
};

void usage()
{
    std::cout << "\
Usage:\n\
  ./utest_run