==== menhir-20210929/.gitignore ====

*~
.merlin
_build
analysis/data.csv
analysis/*.pdf
dune-workspace.versions

==== menhir-20210929/.gitlab-ci.yml ====

image: busybox
pages:
  script:
    - mv www public
  artifacts:
    paths:
      - public
  only:
    - master

==== menhir-20210929/LICENSE ====

In the following,

* "THE RUNTIME LIBRARY" refers to the files in the subdirectory lib/.

* "THE COQ LIBRARY" refers to the files in the subdirectory coq-menhirlib/.

* "THE GENERATOR" refers to the files that are not part of THE RUNTIME LIBRARY, not part of THE COQ LIBRARY, and not located in the subdirectory test/.

The files in the subdirectory test/ are not covered by this license.

THE GENERATOR is distributed under the terms of the GNU General Public License version 2 (included below).

THE RUNTIME LIBRARY is distributed under the terms of the GNU Library General Public License version 2 (included below).

As a special exception to the GNU Library General Public License, you may link, statically or dynamically, a "work that uses the Library" with a publicly distributed version of the Library to produce an executable file containing portions of the Library, and distribute that executable file under terms of your choice, without any of the additional requirements listed in clause 6 of the GNU Library General Public License. By "a publicly distributed version of the Library", we mean either the unmodified Library as distributed by INRIA, or a modified version of the Library that is distributed under the conditions defined in clause 3 of the GNU Library General Public License. This exception does not however invalidate any other reasons why the executable file might be covered by the GNU Library General Public License.

THE COQ LIBRARY is distributed under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. Version 3 of the GNU Lesser General Public License is included in the file coq-menhirlib/LICENSE.

----------------------------------------------------------------------

                    GNU GENERAL PUBLIC LICENSE
                       Version 2, June 1991

 Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

 Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

                            Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price.
Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.

Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and modification follow.

                    GNU GENERAL PUBLIC LICENSE
   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does.

1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.

You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

    c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.

In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.

3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:

    a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

    b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

    c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.

If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.

4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.

7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.

It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.

This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.

9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.

10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.

                            NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

                     END OF TERMS AND CONDITIONS

            How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.

    Copyright (C)

    This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

    This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

    You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this when it starts in an interactive mode:

    Gnomovision version 69, Copyright (C) year name of author
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names:

    Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker.

    , 1 April 1989
    Ty Coon, President of Vice

This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.

----------------------------------------------------------------------

                  GNU LIBRARY GENERAL PUBLIC LICENSE
                       Version 2, June 1991

 Copyright (C) 1991 Free Software Foundation, Inc.
 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA

 Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

[This is the first released version of the library GPL. It is numbered 2 because it goes with version 2 of the ordinary GPL.]

                            Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users.

This license, the Library General Public License, applies to some specially designated Free Software Foundation software, and to any other libraries whose authors decide to use it. You can use it for your libraries, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library, or if you modify it.

For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link a program with the library, you must provide complete object files to the recipients so that they can relink them with the library, after making changes to the library and recompiling it. And you must show them these terms so they know their rights.

Our method of protecting your rights has two steps: (1) copyright the library, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the library.

Also, for each distributor's protection, we want to make certain that everyone understands that there is no warranty for this free library. If the library is modified by someone else and passed on, we want its recipients to know that what they have is not the original version, so that any problems introduced by others will not reflect on the original authors' reputations.

Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that companies distributing free software will individually obtain patent licenses, thus in effect transforming the program into proprietary software. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.

Most GNU software, including some libraries, is covered by the ordinary GNU General Public License, which was designed for utility programs. This license, the GNU Library General Public License, applies to certain designated libraries. This license is quite different from the ordinary one; be sure to read it in full, and don't assume that anything in it is the same as in the ordinary license.

The reason we have a separate public license for some libraries is that they blur the distinction we usually make between modifying or adding to a program and simply using it. Linking a program with a library, without changing the library, is in some sense simply using the library, and is analogous to running a utility program or application program. However, in a textual and legal sense, the linked executable is a combined work, a derivative of the original library, and the ordinary General Public License treats it as such.

Because of this blurred distinction, using the ordinary General Public License for libraries did not effectively promote software sharing, because most developers did not use the libraries. We concluded that weaker conditions might promote sharing better.

However, unrestricted linking of non-free programs would deprive the users of those programs of all benefit from the free status of the libraries themselves. This Library General Public License is intended to permit developers of non-free programs to use free libraries, while preserving your freedom as a user of such programs to change the free libraries that are incorporated in them. (We have not seen how to achieve this as regards changes in header files, but we have achieved it as regards changes in the actual functions of the Library.) The hope is that this will lead to faster development of free libraries.

The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, while the latter only works together with the library.

Note that it is possible for a library to be covered by the ordinary General Public License rather than by this special one.

                  GNU LIBRARY GENERAL PUBLIC LICENSE
   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License Agreement applies to any software library which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Library General Public License (also called "this License"). Each licensee is addressed as "you".

A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables.

The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".)

"Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library.

Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does.

1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library.

You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

    a) The modified work must itself be a software library.

    b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change.

    c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License.

    d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful.

    (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.)

These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library.

In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.

3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices.

Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy.

This option is useful when you wish to copy part of the code of the Library into a program that is not a library.

4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange.

If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code.

5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License.

However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables.

When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law.

If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.)

Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself.

6. As an exception to the Sections above, you may also compile or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications.

You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things:

    a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.)

    b) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution.

    c) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place.

    d) Verify that the user has already received a copy of these materials or that you have already sent this user a copy.

For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.

It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute.

7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things:

    a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above.

    b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work.

8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it.

10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.

11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library.

If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances.

It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.

This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.

12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.

13. The Free Software Foundation may publish revised and/or new versions of the Library General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation.

14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.

                            NO WARRANTY

15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

                     END OF TERMS AND CONDITIONS

        Appendix: How to Apply These Terms to Your New Libraries

If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License).

To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.

    Copyright (C)

    This library is free software; you can redistribute it and/or modify it under the terms of the GNU Library General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public License for more details.

    You should have received a copy of the GNU Library General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA

Also add information on how to contact you by electronic and paper mail.

You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary.
Here is a sample; alter the names:

    Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker.

    , 1 April 1990
    Ty Coon, President of Vice

That's all there is to it!

==== menhir-20210929/check-tarball.sh ====

#!/bin/bash
set -euo pipefail
IFS=$'\n\t'

# This script checks that a Menhir tarball can be compiled and installed.
# The command line argument should be the tarball's name without .tar.gz.
# For example: ./check-tarball.sh menhir-20210929

PACKAGE="$1"
TARBALL=$PACKAGE.tar.gz

# We use a dedicated opam switch where it is permitted to uninstall/reinstall
# Menhir.
if opam switch list | grep 'test-menhir' >/dev/null ; then
  echo "The switch test-menhir already exists." ;
else
  echo "Creating switch test-menhir..." ;
  opam switch create test-menhir 4.11.1 ;
  echo "Installing required packages..." ;
  opam install --yes dune visitors coq ;
fi
echo "Now switching to test-menhir..."
eval $(opam env --set-switch --switch test-menhir)

# Uninstall Menhir if it is installed.
echo "Removing menhir if already installed..."
# read -p "Can I remove it [Enter/^C]?" -n 1 -r ;
opam remove menhir || /bin/true

# Create a temporary directory; extract into it.
# Build and install; then uninstall.
TEMPDIR=`mktemp -d /tmp/menhir-test.XXXXXX`
INSTALL=$TEMPDIR/install
COQCONTRIB=$INSTALL/coq-contrib
DUNE=dune

cp $TARBALL $TEMPDIR
echo " * Extracting. "
(cd $TEMPDIR && tar xfz $TARBALL)
echo " * Compiling and installing."
mkdir $INSTALL
(cd $TEMPDIR/$PACKAGE &&
  $DUNE build @install &&
  $DUNE install --prefix=$INSTALL menhir &&
  make -C coq-menhirlib all &&
  make -C coq-menhirlib CONTRIB=$COQCONTRIB install
) > $TEMPDIR/install.log 2>&1 || (cat $TEMPDIR/install.log; exit 1)
echo " * Uninstalling."
(cd $TEMPDIR/$PACKAGE &&
  $DUNE uninstall --prefix=$INSTALL menhir
  make -C coq-menhirlib CONTRIB=$COQCONTRIB uninstall
) > $TEMPDIR/uninstall.log 2>&1 || (cat $TEMPDIR/uninstall.log; exit 1)
rm -rf $TEMPDIR

==== menhir-20210929/compile-ocaml.sh ====

#!/bin/bash
set -euo pipefail
IFS=$' \n\t'

# This script compiles OCaml using the current version of Menhir and verifies
# that it behaves exactly as expected. It is meant to be run from the root of
# the Menhir repository (MENHIR_ROOT is set to the current directory).

MENHIR_ROOT=$(pwd)

# We use a dedicated opam switch where it is permitted to uninstall/reinstall
# Menhir.
if opam switch list | grep 'test-menhir' >/dev/null ; then
  echo "The switch test-menhir already exists." ;
else
  echo "Creating switch test-menhir..." ;
  opam switch create test-menhir ocaml-system.4.11.1 ;
  echo "Installing required packages..." ;
  opam install --yes dune ;
fi
eval $(opam env --set-switch --switch test-menhir)

# Uninstall Menhir if it is installed.
# We are not sure whether it was installed via opam or directly
# via [make install], so we try both uninstallation methods.
echo "Removing menhir if already installed..."
# read -p "Can I remove it [Enter/^C]?" -n 1 -r ;
(opam remove menhir menhirLib menhirSdk || /bin/true) >/dev/null 2>&1
make -C $MENHIR_ROOT uninstall >/dev/null 2>&1

# Check if everything has been committed.
# This seems required for [opam pin] to work properly.
if git status --porcelain | grep -v compile-ocaml ; then
  echo "Error: there remain uncommitted changes." ;
  git status ;
  exit 1 ;
fi

# This function runs a command silently, and prints its execution time.
execute () {
  echo "$1" > .command
  T="$(date +%s)"
  if eval "$1" >log.out 2>log.err ; then
    T="$(($(date +%s)-T))"
    echo " $T seconds." ;
  else
    code=$?
    cat log.err
    exit $code
  fi
}

# Check out a fresh copy of OCaml at a specific tag.
TEMPDIR=/tmp/menhir-test-ocaml
mkdir -p $TEMPDIR
cd $TEMPDIR
rm -rf ocaml
echo -n "Cloning OCaml..."
execute "git clone git@github.com:ocaml/ocaml.git --depth 1 --branch 4.11.1"
cd ocaml

# Configure and compile OCaml. This step does not depend on Menhir.
echo -n "Configuring OCaml..."
execute "./configure"
echo -n "Compiling OCaml..."
execute "make -j"
ls -l ocamlc
if false ; then
  echo -n "Testing OCaml..."
  execute "make -C testsuite parallel"
  execute "make -C testsuite clean"
fi

# Install Menhir.
# Should we install Menhir via [make install] or via [opam]?
# [make install], which invokes [dune install], is likely to be much
# faster (especially if Menhir has already been compiled in its
# working directory). Unfortunately, I have seen dune (2.8.2) become
# confused and install files partly in one switch, partly in another,
# so perhaps installing via [opam pin ...] is preferable.
echo -n "Installing Menhir..."
if true ; then
  # Installation via opam.
  execute "make -C $MENHIR_ROOT pin"
else
  # Installation via [make install].
  execute "make -C $MENHIR_ROOT install"
fi
ls -l `which menhir`

# Re-compile OCaml's parser using Menhir.
echo -n "Recompiling OCaml's parser using Menhir..."
execute "make promote-menhir"
echo -n "Committing the recompiled parser..."
execute "git add boot/menhir && git commit -m 'make promote-menhir'"

# Take a snapshot of the ASTs produced by the current parser.
echo -n "Constructing ASTs for all source files..."
execute "make -j build-all-asts"
echo -n "Committing all ASTs..."
execute "make list-all-asts | xargs git add && git commit -m 'Build all ASTs.'"

# Compile OCaml (again).
# Cleaning up first should be unnecessary, but let's make sure the
# compiler is correctly reconstructed from scratch.
echo -n "Cleaning up..."
execute "make clean"
echo -n "Compiling OCaml..."
execute "make -j"
ls -l ocamlc
if false ; then
  echo -n "Testing OCaml..."
  execute "make -C testsuite parallel"
fi

# Reconstruct all ASTs.
# Removing them first should be unnecessary, but let's make sure they
# are correctly reconstructed from scratch.
echo -n "Removing previous ASTs..."
execute "make list-all-asts | xargs rm -f"
echo -n "Constructing ASTs for all source files..."
execute "make -j build-all-asts"

# Compare the ASTs produced by the current parser with the snapshot.
rm -f .command log.{err,out}
if git diff --exit-code >/dev/null ; then
  echo "Success: the original parser and the recompiled parser agree."
else
  echo "Failure: the original parser and the recompiled parser disagree."
  echo "cd $TEMPDIR/ocaml && git status"
fi

==== menhir-20210929/coq-menhirlib/CHANGES.md ====

# Changes

## 2021/09/28

* Change `Instance` to `Global Instance` in the library and in the Coq files
  produced by `menhir --coq` so as to avoid warnings with Coq 8.14.

## 2021/04/19

* The types returned by the parsing functions, `parse_result` and
  `step_result`, have been extended to carry additional information returned
  during failure. `Fail_pr` (resp. `Fail_sr`) is now an abbreviation for
  `Fail_pr_full _ _` (resp. `Fail_sr_full _ _`), and `Fail_pr_full`
  (resp. `Fail_sr_full`) contains a payload of the parser's state and the
  most recent token when the failure occurred. This enables error reporting
  in the Coq parsers generated by Menhir. (Contributed by Brian Ward.)
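  A minimal, self-contained sketch of the shape of this change (a toy mirror,
  not the library's actual definitions: the real `parse_result` carries
  different type parameters, and `failure_info` is a hypothetical helper):

  ```coq
  Section Sketch.
    Variables state token buffer A : Type.

    (* Toy mirror of the extended result type: the failure constructor
       now carries the parser's state and the most recent token. *)
    Inductive parse_result :=
    | Fail_pr_full : state -> token -> parse_result
    | Timeout_pr   : parse_result
    | Parsed_pr    : A -> buffer -> parse_result.

    (* The payload of [Fail_pr_full] is now available for error reporting: *)
    Definition failure_info (r : parse_result) : option (state * token) :=
      match r with
      | Fail_pr_full s tok => Some (s, tok)
      | _ => None
      end.
  End Sketch.
  ```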
## 2021/03/10

* Replace `Require Omega` with `Require ZArith` so as to guarantee
  compatibility with Coq 8.14.

* Change `Hint` to `Global Hint` in several places, so as to avoid warnings
  with Coq 8.13.

## 2020/05/03

* Import `ListNotations` wherever it is necessary so that we do not rely on
  it being exported by `Program`.

## 2019/09/24

* Fix compatibility with Coq 8.10, and avoid some warnings.

## 2019/06/26

* Fix compatibility with Coq 8.7 and Coq 8.9:
  * In Coq 8.7, in the syntax `{ x : T & T' }` for the `sigT` types, it was
    not possible to omit the type `T`.
  * An anomaly in Coq 8.7 has been worked around.
  * In Coq 8.9, the numeral notation for positives moved from
    `Coq.Numbers.BinNums` to `Coq.PArith.BinPos`.

## 2019/06/13

* The Coq development is now free of any axiom (it used to use axiom `K`),
  and the parsers can now be executed directly within Coq, without using
  extraction.

* The parser interpreter is now written using dependent types, so that no
  dynamic checks are needed anymore at parsing time. When running the
  extracted code, this should give a performance boost. Moreover, efficient
  extraction of `int31` is no longer needed. This required some refactoring
  of the type of parse trees.

* Instead of a dependent pair of a terminal and a semantic value, tokens are
  now a user-defined (inductive) type.

## 2018/08/27

* Avoid an undocumented mode of use of the `fix` tactic, which would cause
  an incompatibility with Coq > 8.8.1. (Reported and corrected by Michael
  Soegtrop.)

## 2018/05/30

* Initial release.

==== menhir-20210929/coq-menhirlib/LICENSE ====

All files in this directory are distributed under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. Version 3 of the GNU Lesser General Public License is included below.

----------------------------------------------------------------------

                  GNU LESSER GENERAL PUBLIC LICENSE
                      Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc.

 Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

This version of the GNU Lesser General Public License incorporates the terms and conditions of version 3 of the GNU General Public License, supplemented by the additional permissions listed below.

0. Additional Definitions.

As used herein, "this License" refers to version 3 of the GNU Lesser General Public License, and the "GNU GPL" refers to version 3 of the GNU General Public License.

"The Library" refers to a covered work governed by this License, other than an Application or a Combined Work as defined below.

An "Application" is any work that makes use of an interface provided by the Library, but which is not otherwise based on the Library. Defining a subclass of a class defined by the Library is deemed a mode of using an interface provided by the Library.

A "Combined Work" is a work produced by combining or linking an Application with the Library. The particular version of the Library with which the Combined Work was made is also called the "Linked Version".

The "Minimal Corresponding Source" for a Combined Work means the Corresponding Source for the Combined Work, excluding any source code for portions of the Combined Work that, considered in isolation, are based on the Application, and not on the Linked Version.
The "Corresponding Application Code" for a Combined Work means the object code and/or source code for the Application, including any data and utility programs needed for reproducing the Combined Work from the Application, but excluding the System Libraries of the Combined Work. 1. Exception to Section 3 of the GNU GPL. You may convey a covered work under sections 3 and 4 of this License without being bound by section 3 of the GNU GPL. 2. Conveying Modified Versions. If you modify a copy of the Library, and, in your modifications, a facility refers to a function or data to be supplied by an Application that uses the facility (other than as an argument passed when the facility is invoked), then you may convey a copy of the modified version: a) under this License, provided that you make a good faith effort to ensure that, in the event an Application does not supply the function or data, the facility still operates, and performs whatever part of its purpose remains meaningful, or b) under the GNU GPL, with none of the additional permissions of this License applicable to that copy. 3. Object Code Incorporating Material from Library Header Files. The object code form of an Application may incorporate material from a header file that is part of the Library. You may convey such object code under terms of your choice, provided that, if the incorporated material is not limited to numerical parameters, data structure layouts and accessors, or small macros, inline functions and templates (ten or fewer lines in length), you do both of the following: a) Give prominent notice with each copy of the object code that the Library is used in it and that the Library and its use are covered by this License. b) Accompany the object code with a copy of the GNU GPL and this license document. 4. Combined Works. You may convey a Combined Work under terms of your choice that, taken together, effectively do not restrict modification of the portions of the Library contained in the Combined Work and reverse engineering for debugging such modifications, if you also do each of the following: a) Give prominent notice with each copy of the Combined Work that the Library is used in it and that the Library and its use are covered by this License. b) Accompany the Combined Work with a copy of the GNU GPL and this license document. c) For a Combined Work that displays copyright notices during execution, include the copyright notice for the Library among these notices, as well as a reference directing the user to the copies of the GNU GPL and this license document. d) Do one of the following: 0) Convey the Minimal Corresponding Source under the terms of this License, and the Corresponding Application Code in a form suitable for, and under terms that permit, the user to recombine or relink the Application with a modified version of the Linked Version to produce a modified Combined Work, in the manner specified by section 6 of the GNU GPL for conveying Corresponding Source. 1) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (a) uses at run time a copy of the Library already present on the user's computer system, and (b) will operate properly with a modified version of the Library that is interface-compatible with the Linked Version. 
e) Provide Installation Information, but only if you would otherwise be required to provide such information under section 6 of the GNU GPL, and only to the extent that such information is necessary to install and execute a modified version of the Combined Work produced by recombining or relinking the Application with a modified version of the Linked Version. (If you use option 4d0, the Installation Information must accompany the Minimal Corresponding Source and Corresponding Application Code. If you use option 4d1, you must provide the Installation Information in the manner specified by section 6 of the GNU GPL for conveying Corresponding Source.) 5. Combined Libraries. You may place library facilities that are a work based on the Library side by side in a single library together with other library facilities that are not Applications and are not covered by this License, and convey such a combined library under terms of your choice, if you do both of the following: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities, conveyed under the terms of this License. b) Give prominent notice with the combined library that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 6. Revised Versions of the GNU Lesser General Public License. The Free Software Foundation may publish revised and/or new versions of the GNU Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library as you received it specifies that a certain numbered version of the GNU Lesser General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that published version or of any later version published by the Free Software Foundation. If the Library as you received it does not specify a version number of the GNU Lesser General Public License, you may choose any version of the GNU Lesser General Public License ever published by the Free Software Foundation. If the Library as you received it specifies that a proxy can decide whether future versions of the GNU Lesser General Public License shall apply, that proxy's public statement of acceptance of any version is permanent authorization for you to choose that version for the Library. ---------------------------------------------------------------------- GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007 Copyright (C) 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The GNU General Public License is a free, copyleft license for software and other kinds of works. The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. 
Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things. To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others. For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. 
Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. 
You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. 
A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. 
For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. 
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. 
Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. 
If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. "Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. 
The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. 
It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. <one line to give the program's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>. Also add information on how to contact you by electronic and paper mail. If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode: <program> Copyright (C) <year> <name of author> This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box". You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see <https://www.gnu.org/licenses/>. The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read <https://www.gnu.org/licenses/why-not-lgpl.html>. menhir-20210929/coq-menhirlib/Makefile000066400000000000000000000002201412503066000174300ustar00rootroot00000000000000# Delegate the following commands: .PHONY: all clean install uninstall all clean install uninstall: @ $(MAKE) -C src --no-print-directory $@ menhir-20210929/coq-menhirlib/README.md000066400000000000000000000013171412503066000172550ustar00rootroot00000000000000# A support library for verified Coq parsers produced by Menhir The [Menhir](http://gallium.inria.fr/~fpottier/menhir/) parser generator, in `--coq` mode, can produce [Coq](https://coq.inria.fr/) parsers. These parsers must be linked against this library, which provides both an interpreter (which allows running the generated parser) and a validator (which allows verifying, at parser construction time, that the generated parser is correct and complete with respect to the grammar). ## Installation To install the latest released version, use `opam install coq-menhirlib`. To install from the sources, use `make` followed by `make install`. 
## Authors * [Jacques-Henri Jourdan](jacques-henri.jourdan@lri.fr) menhir-20210929/coq-menhirlib/src/000077500000000000000000000000001412503066000165635ustar00rootroot00000000000000menhir-20210929/coq-menhirlib/src/.gitignore000066400000000000000000000000741412503066000205540ustar00rootroot00000000000000*.vo *.glob *.v.d .*.aux _CoqProject *.vok *.vos .lia.cache menhir-20210929/coq-menhirlib/src/Alphabet.v000066400000000000000000000206731412503066000205020ustar00rootroot00000000000000(****************************************************************************) (* *) (* Menhir *) (* *) (* Jacques-Henri Jourdan, CNRS, LRI, Université Paris Sud *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under *) (* the terms of the GNU Lesser General Public License as published by the *) (* Free Software Foundation, either version 3 of the License, or (at your *) (* option) any later version, as described in the file LICENSE. *) (* *) (****************************************************************************) From Coq Require Import ZArith List Relations RelationClasses. Import ListNotations. Local Obligation Tactic := intros. (** A comparable type is equipped with a [compare] function that defines an order relation. **) Class Comparable (A:Type) := { compare : A -> A -> comparison; compare_antisym : forall x y, CompOpp (compare x y) = compare y x; compare_trans : forall x y z c, (compare x y) = c -> (compare y z) = c -> (compare x z) = c }. Theorem compare_refl {A:Type} (C: Comparable A) : forall x, compare x x = Eq. Proof. intros. pose proof (compare_antisym x x). destruct (compare x x); intuition; try discriminate. Qed. (** The corresponding order is a strict order. **) Definition comparableLt {A:Type} (C: Comparable A) : relation A := fun x y => compare x y = Lt. Global Instance ComparableLtStrictOrder {A:Type} (C: Comparable A) : StrictOrder (comparableLt C). Proof. apply Build_StrictOrder. unfold Irreflexive, Reflexive, complement, comparableLt. intros. pose proof H. rewrite <- compare_antisym in H. rewrite H0 in H. discriminate H. unfold Transitive, comparableLt. intros x y z. apply compare_trans. Qed. (** nat is comparable. **) Global Program Instance natComparable : Comparable nat := { compare := Nat.compare }. Next Obligation. symmetry. destruct (Nat.compare x y) as [] eqn:?. rewrite Nat.compare_eq_iff in Heqc. destruct Heqc. rewrite Nat.compare_eq_iff. trivial. rewrite <- nat_compare_lt in *. rewrite <- nat_compare_gt in *. trivial. rewrite <- nat_compare_lt in *. rewrite <- nat_compare_gt in *. trivial. Qed. Next Obligation. destruct c. rewrite Nat.compare_eq_iff in *; destruct H; assumption. rewrite <- nat_compare_lt in *. apply (Nat.lt_trans _ _ _ H H0). rewrite <- nat_compare_gt in *. apply (gt_trans _ _ _ H H0). Qed. 
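(** As a quick illustration of the [Comparable] interface (this
    example instance is hypothetical and not part of the original
    library), booleans can be ordered with [false] smaller than
    [true]: *)
Global Program Instance boolComparable : Comparable bool :=
  { compare := fun x y =>
      match x, y with
      | false, false | true, true => Eq
      | false, true => Lt
      | true, false => Gt
      end }.
Next Obligation. (* antisymmetry: check all four cases by computation *)
  destruct x, y; reflexivity. Qed.
Next Obligation. (* transitivity: all eight cases are solved by congruence *)
  destruct x, y, z; simpl in *; congruence. Qed.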
(** A pair of comparable types is comparable. **) Global Program Instance PairComparable {A:Type} (CA:Comparable A) {B:Type} (CB:Comparable B) : Comparable (A*B) := { compare := fun x y => let (xa, xb) := x in let (ya, yb) := y in match compare xa ya return comparison with | Eq => compare xb yb | x => x end }. Next Obligation. destruct x, y. rewrite <- (compare_antisym a a0). rewrite <- (compare_antisym b b0). destruct (compare a a0); intuition. Qed. Next Obligation. destruct x, y, z. destruct (compare a a0) as [] eqn:?, (compare a0 a1) as [] eqn:?; try (rewrite <- H0 in H; discriminate); try (destruct (compare a a1) as [] eqn:?; try (rewrite <- compare_antisym in Heqc0; rewrite CompOpp_iff in Heqc0; rewrite (compare_trans _ _ _ _ Heqc0 Heqc2) in Heqc1; discriminate); try (rewrite <- compare_antisym in Heqc1; rewrite CompOpp_iff in Heqc1; rewrite (compare_trans _ _ _ _ Heqc2 Heqc1) in Heqc0; discriminate); assumption); rewrite (compare_trans _ _ _ _ Heqc0 Heqc1); try assumption. apply (compare_trans _ _ _ _ H H0). Qed. (** Special case of comparable, where equality is Leibniz equality. **) Class ComparableLeibnizEq {A:Type} (C: Comparable A) := compare_eq : forall x y, compare x y = Eq -> x = y. (** Boolean equality for a [Comparable]. **) Definition compare_eqb {A:Type} {C:Comparable A} (x y:A) := match compare x y with | Eq => true | _ => false end. Theorem compare_eqb_iff {A:Type} {C:Comparable A} {U:ComparableLeibnizEq C} : forall x y, compare_eqb x y = true <-> x = y. Proof. unfold compare_eqb. intuition. apply compare_eq. destruct (compare x y); intuition; discriminate. destruct H. rewrite compare_refl; intuition. Qed. Global Instance NComparableLeibnizEq : ComparableLeibnizEq natComparable := Nat.compare_eq. (** A pair of ComparableLeibnizEq is ComparableLeibnizEq. **) Global Instance PairComparableLeibnizEq {A:Type} {CA:Comparable A} (UA:ComparableLeibnizEq CA) {B:Type} {CB:Comparable B} (UB:ComparableLeibnizEq CB) : ComparableLeibnizEq (PairComparable CA CB). Proof. intros x y; destruct x, y; simpl. pose proof (compare_eq a a0); pose proof (compare_eq b b0). destruct (compare a a0); try discriminate. intuition. destruct H2, H0. reflexivity. Qed. (** A [Finite] type is a type equipped with the list of all its elements. **) Class Finite (A:Type) := { all_list : list A; all_list_forall : forall x:A, In x all_list }. (** An alphabet is both [ComparableLeibnizEq] and [Finite]. **) Class Alphabet (A:Type) := { AlphabetComparable :> Comparable A; AlphabetComparableLeibnizEq :> ComparableLeibnizEq AlphabetComparable; AlphabetFinite :> Finite A }. (** The [Numbered] class provides a convenient way to build [Alphabet] instances with a good computational complexity. It is essentially an injection from the type into [positive]. **) Class Numbered (A:Type) := { inj : A -> positive; surj : positive -> A; surj_inj_compat : forall x, surj (inj x) = x; inj_bound : positive; inj_bound_spec : forall x, (inj x < Pos.succ inj_bound)%positive }. 
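(** As an illustration of [Numbered] (a hypothetical example, not part
    of the original library), a small enumerated type can be numbered
    by hand; together with [NumberedAlphabet], defined just below,
    this yields an [Alphabet] instance for it: *)
Inductive demo_token := DemoA | DemoB | DemoC.
Global Program Instance demo_tokenNumbered : Numbered demo_token :=
  { (* number the three constructors 1, 2, 3 *)
    inj := fun t => match t with DemoA => xH | DemoB => xO xH | DemoC => xI xH end;
    surj := fun p => match p with xH => DemoA | xO xH => DemoB | _ => DemoC end;
    inj_bound := xI xH }.
Next Obligation. (* surj is a left inverse of inj *)
  destruct x; reflexivity. Qed.
Next Obligation. (* every image is below the bound: 1, 2, 3 < 4 *)
  destruct x; reflexivity. Qed.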
Global Program Instance NumberedAlphabet {A:Type} (N:Numbered A) : Alphabet A := { AlphabetComparable := {| compare := fun x y => Pos.compare (inj x) (inj y) |}; AlphabetFinite := {| all_list := fst (Pos.iter (fun '(l, p) => (surj p::l, Pos.succ p)) ([], 1%positive) inj_bound) |} }. Next Obligation. simpl. now rewrite <- Pos.compare_antisym. Qed. Next Obligation. match goal with c : comparison |- _ => destruct c end. - rewrite Pos.compare_eq_iff in *. congruence. - rewrite Pos.compare_lt_iff in *. eauto using Pos.lt_trans. - rewrite Pos.compare_gt_iff in *. eauto using Pos.lt_trans. Qed. Next Obligation. intros x y. unfold compare. intros Hxy. assert (Hxy' : inj x = inj y). (* We do not use [Pos.compare_eq_iff] directly to make sure the proof is executable. *) { destruct (Pos.eq_dec (inj x) (inj y)) as [|[]]; [now auto|]. now apply Pos.compare_eq_iff. } (* Using rewrite here leads to non-executable proofs. *) transitivity (surj (inj x)). { apply eq_sym, surj_inj_compat. } transitivity (surj (inj y)); cycle 1. { apply surj_inj_compat. } apply f_equal, Hxy'. Defined. Next Obligation. rewrite <-(surj_inj_compat x). generalize (inj_bound_spec x). generalize (inj x). clear x. intros x. match goal with |- ?Hx -> In ?s (fst ?p) => assert ((Hx -> In s (fst p)) /\ snd p = Pos.succ inj_bound); [|now intuition] end. rewrite Pos.lt_succ_r. induction inj_bound as [|y [IH1 IH2]] using Pos.peano_ind; (split; [intros Hx|]); simpl. - rewrite (Pos.le_antisym _ _ Hx); auto using Pos.le_1_l. - auto. - rewrite Pos.iter_succ. destruct Pos.iter; simpl in *. subst. rewrite Pos.le_lteq in Hx. destruct Hx as [?%Pos.lt_succ_r| ->]; now auto. - rewrite Pos.iter_succ. destruct Pos.iter. simpl in IH2. subst. reflexivity. Qed. (** Definitions of [FSet]/[FMap] from [Comparable]. **) Require Import OrderedTypeAlt. Require FSetAVL. Require FMapAVL. Import OrderedType. Module Type ComparableM. Parameter t : Type. Global Declare Instance tComparable : Comparable t. End ComparableM. Module OrderedTypeAlt_from_ComparableM (C:ComparableM) <: OrderedTypeAlt. Definition t := C.t. Definition compare : t -> t -> comparison := compare. Infix "?=" := compare (at level 70, no associativity). Lemma compare_sym x y : (y?=x) = CompOpp (x?=y). Proof. exact (Logic.eq_sym (compare_antisym x y)). Qed. Lemma compare_trans c x y z : (x?=y) = c -> (y?=z) = c -> (x?=z) = c. Proof. apply compare_trans. Qed. End OrderedTypeAlt_from_ComparableM. Module OrderedType_from_ComparableM (C:ComparableM) <: OrderedType. Module Alt := OrderedTypeAlt_from_ComparableM C. Include (OrderedType_from_Alt Alt). End OrderedType_from_ComparableM. menhir-20210929/coq-menhirlib/src/Automaton.v000066400000000000000000000147611412503066000207300ustar00rootroot00000000000000(****************************************************************************) (* *) (* Menhir *) (* *) (* Jacques-Henri Jourdan, CNRS, LRI, Université Paris Sud *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under *) (* the terms of the GNU Lesser General Public License as published by the *) (* Free Software Foundation, either version 3 of the License, or (at your *) (* option) any later version, as described in the file LICENSE. *) (* *) (****************************************************************************) Require Grammar. Require Export Alphabet. From Coq Require Import Orders. From Coq Require Export List Syntax. Module Type AutInit. (** The grammar of the automaton. **) Declare Module Gram:Grammar.T. Export Gram. (** The set of non-initial states is considered as an alphabet. **) Parameter noninitstate : Type. Global Declare Instance NonInitStateAlph : Alphabet noninitstate. Parameter initstate : Type. Global Declare Instance InitStateAlph : Alphabet initstate. (** When we are at this state, we know that this symbol is the top of the stack. **) Parameter last_symb_of_non_init_state: noninitstate -> symbol. End AutInit. Module Types(Import Init:AutInit). (** In many ways, the behaviour of the initial state is different from the behaviour of the other states. So we have chosen to explicitly separate them: the user has to provide the type of non-initial states. **) Inductive state := | Init: initstate -> state | Ninit: noninitstate -> state. Global Program Instance StateAlph : Alphabet state := { AlphabetComparable := {| compare := fun x y => match x, y return comparison with | Init _, Ninit _ => Lt | Init x, Init y => compare x y | Ninit _, Init _ => Gt | Ninit x, Ninit y => compare x y end |}; AlphabetFinite := {| all_list := map Init all_list ++ map Ninit all_list |} }. Local Obligation Tactic := intros. Next Obligation. 
destruct x, y; intuition; apply compare_antisym. Qed. Next Obligation. destruct x, y, z; intuition. apply (compare_trans _ i0); intuition. congruence. congruence. apply (compare_trans _ n0); intuition. Qed. Next Obligation. intros x y. destruct x, y; intuition; try discriminate. rewrite (compare_eq i i0); intuition. rewrite (compare_eq n n0); intuition. Qed. Next Obligation. apply in_or_app; destruct x; intuition; [left|right]; apply in_map; apply all_list_forall. Qed. Coercion Ninit : noninitstate >-> state. Coercion Init : initstate >-> state. (** For an LR automaton, there are four kinds of actions that can be done at a given state: - Shifting, that is, reading a token and pushing it onto the stack, - Reducing a production, that is, popping the right hand side of the production from the stack and pushing the left hand side, - Failing, - Accepting the word (a special case of reduction). As in the menhir parser generator, we do not want our parser to read past the end of the stream. That means that once the parser has read a word in the grammar language, it should stop without peeking at the input stream. So, for the automaton to be complete, the grammar must satisfy a particular property: if a word is in its language, then it is not a proper prefix of another word of the language (otherwise, menhir reports an end-of-stream conflict). As a consequence of that, there are two notions of action: the first one is an action performed before having read the stream, the second one after. **) Inductive lookahead_action (term:terminal) := | Shift_act: forall s:noninitstate, T term = last_symb_of_non_init_state s -> lookahead_action term | Reduce_act: production -> lookahead_action term | Fail_act: lookahead_action term. Arguments Shift_act {term}. Arguments Reduce_act {term}. Arguments Fail_act {term}. Inductive action := | Default_reduce_act: production -> action | Lookahead_act : (forall term:terminal, lookahead_action term) -> action. (** Types used for the annotations of the automaton. **) (** An item is a part of the annotations given to the validator. It is actually a set of LR(1) items sharing the same core. It is needed to validate completeness. **) Record item := { (** The pseudo-production of the item. **) prod_item: production; (** The position of the dot. **) dot_pos_item: nat; (** The lookahead symbols of the item. We are using a list, so that we can store together multiple LR(1) items sharing the same core. **) lookaheads_item: list terminal }. End Types. Module Type T. Include AutInit <+ Types. Module Export GramDefs := Grammar.Defs Gram. (** For each initial state, the non-terminal it recognizes. **) Parameter start_nt: initstate -> nonterminal. (** The action table maps each state to an action: either a default reduction, or a map from terminals to lookahead actions. **) Parameter action_table: state -> action. (** The goto table of an LR(1) automaton. **) Parameter goto_table: state -> forall nt:nonterminal, option { s:noninitstate | NT nt = last_symb_of_non_init_state s }. (** Some annotations on the automaton to help the validation. **) (** When we are at this state, we know that these symbols are just below the top of the stack. The list is ordered such that the head corresponds to the (almost) top of the stack. **) Parameter past_symb_of_non_init_state: noninitstate -> list symbol. (** When we are at this state, the (strictly) previous states verify these predicates. **) Parameter past_state_of_non_init_state: noninitstate -> list (state -> bool). (** The items of the state. **) Parameter items_of_state: state -> list item. 
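(** For intuition, a hypothetical example (the production and
    terminals named here are not from the actual development): if [p]
    denotes a production [E -> E PLUS E], then the item
    [{| prod_item := p; dot_pos_item := 1; lookaheads_item := [t1; t2] |}]
    packs together the two LR(1) items "E -> E . PLUS E [t1]" and
    "E -> E . PLUS E [t2]", with the dot placed after the first symbol
    of the right-hand side. *)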
(** The nullable predicate for non-terminals: true if and only if the non-terminal can produce the empty string. **) Parameter nullable_nterm: nonterminal -> bool. (** The FIRST predicate for non-terminals: a terminal is in the returned list if, and only if, the non-terminal can produce a word that begins with the given terminal. **) Parameter first_nterm: nonterminal -> list terminal. End T. menhir-20210929/coq-menhirlib/src/Grammar.v000066400000000000000000000140121412503066000203460ustar00rootroot00000000000000(****************************************************************************) (* *) (* Menhir *) (* *) (* Jacques-Henri Jourdan, CNRS, LRI, Université Paris Sud *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under *) (* the terms of the GNU Lesser General Public License as published by the *) (* Free Software Foundation, either version 3 of the License, or (at your *) (* option) any later version, as described in the file LICENSE. *) (* *) (****************************************************************************) From Coq Require Import List Orders. Import ListNotations. Require Import Alphabet. (** The terminal and non-terminal alphabets of the grammar. **) Module Type Alphs. Parameters terminal nonterminal : Type. Global Declare Instance TerminalAlph: Alphabet terminal. Global Declare Instance NonTerminalAlph: Alphabet nonterminal. End Alphs. (** Definition of the alphabet of symbols, given the alphabets of terminals and non-terminals. **) Module Symbol(Import A:Alphs). Inductive symbol := | T: terminal -> symbol | NT: nonterminal -> symbol. Global Program Instance SymbolAlph : Alphabet symbol := { AlphabetComparable := {| compare := fun x y => match x, y return comparison with | T _, NT _ => Gt | NT _, T _ => Lt | T x, T y => compare x y | NT x, NT y => compare x y end |}; AlphabetFinite := {| all_list := map T all_list++map NT all_list |} }. Next Obligation. destruct x; destruct y; intuition; apply compare_antisym. Qed. Next Obligation. destruct x; destruct y; destruct z; intuition; try discriminate. apply (compare_trans _ t0); intuition. apply (compare_trans _ n0); intuition. Qed. Next Obligation. intros x y. destruct x; destruct y; try discriminate; intros. rewrite (compare_eq t t0); now intuition. rewrite (compare_eq n n0); now intuition. Defined. Next Obligation. rewrite in_app_iff. destruct x; [left | right]; apply in_map; apply all_list_forall. Qed. End Symbol. (** The type of a curried function with multiple parameters. **) Definition arrows_right: Type -> list Type -> Type := fold_right (fun A B => A -> B). 
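(** As an illustration (this example is not part of the original
    library), [arrows_right] turns a result type and a list of
    argument types into a curried function type; the head of the list
    becomes the first argument: *)
Example arrows_right_demo :
  arrows_right bool [nat; unit] = (nat -> unit -> bool).
Proof. reflexivity. Qed.
(** Applied to [map symbol_semantic_type (prod_rhs_rev p)] below, this
    is why a semantic action receives the value of the last symbol of
    the right-hand side first. *)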
Module Type T. Include Alphs <+ Symbol. (** [symbol_semantic_type] maps a symbol to the type of its semantic values. **) Parameter symbol_semantic_type: symbol -> Type. (** The type of production identifiers. **) Parameter production : Type. Global Declare Instance ProductionAlph : Alphabet production. (** Accessors for productions: left hand side, right hand side, and semantic action. The semantic actions are given in the form of curried functions that take their arguments in reverse order. **) Parameter prod_lhs: production -> nonterminal. (* The RHS of a production is given in reversed order: the head of the list is the last symbol of the right-hand side. *) Parameter prod_rhs_rev: production -> list symbol. Parameter prod_action: forall p:production, arrows_right (symbol_semantic_type (NT (prod_lhs p))) (map symbol_semantic_type (prod_rhs_rev p)). (** Tokens are the atomic elements of the input stream: they contain a terminal and a semantic value of the type corresponding to this terminal. *) Parameter token : Type. Parameter token_term : token -> terminal. Parameter token_sem : forall tok : token, symbol_semantic_type (T (token_term tok)). End T. Module Defs(Import G:T). (** The semantics of a grammar is defined in two stages. First, we define the notion of parse tree, which represents one way of recognizing a word with a head symbol. Semantic values are stored at the leaves. This notion is defined in two mutually recursive flavours: either for a single head symbol, or for a list of head symbols. *) Inductive parse_tree: forall (head_symbol:symbol) (word:list token), Type := (** Parse tree for a terminal symbol. *) | Terminal_pt: forall (tok:token), parse_tree (T (token_term tok)) [tok] (** Parse tree for a non-terminal symbol. *) | Non_terminal_pt: forall (prod:production) {word:list token}, parse_tree_list (prod_rhs_rev prod) word -> parse_tree (NT (prod_lhs prod)) word (* Note: the list head_symbols_rev is reversed. *) with parse_tree_list: forall (head_symbols_rev:list symbol) (word:list token), Type := | Nil_ptl: parse_tree_list [] [] | Cons_ptl: forall {head_symbolsq:list symbol} {wordq:list token}, parse_tree_list head_symbolsq wordq -> forall {head_symbolt:symbol} {wordt:list token}, parse_tree head_symbolt wordt -> parse_tree_list (head_symbolt::head_symbolsq) (wordq++wordt). (** We can now finish the definition of the semantics of a grammar, by giving the semantic value associated with a parse tree. *) Fixpoint pt_sem {head_symbol word} (tree:parse_tree head_symbol word) : symbol_semantic_type head_symbol := match tree with | Terminal_pt tok => token_sem tok | Non_terminal_pt prod ptl => ptl_sem ptl (prod_action prod) end with ptl_sem {A head_symbols word} (tree:parse_tree_list head_symbols word) : arrows_right A (map symbol_semantic_type head_symbols) -> A := match tree with | Nil_ptl => fun act => act | Cons_ptl q t => fun act => ptl_sem q (act (pt_sem t)) end. Fixpoint pt_size {head_symbol word} (tree:parse_tree head_symbol word) := match tree with | Terminal_pt _ => 1 | Non_terminal_pt _ l => S (ptl_size l) end with ptl_size {head_symbols word} (tree:parse_tree_list head_symbols word) := match tree with | Nil_ptl => 0 | Cons_ptl q t => pt_size t + ptl_size q end. End Defs. menhir-20210929/coq-menhirlib/src/Interpreter.v000066400000000000000000000450151412503066000212620ustar00rootroot00000000000000(****************************************************************************) (* *) (* Menhir *) (* *) (* Jacques-Henri Jourdan, CNRS, LRI, Université Paris Sud *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under *) (* the terms of the GNU Lesser General Public License as published by the *) (* Free Software Foundation, either version 3 of the License, or (at your *) (* option) any later version, as described in the file LICENSE. *) (* *) (****************************************************************************) From Coq Require Import List Syntax. Import ListNotations. From Coq.ssr Require Import ssreflect. Require Automaton. Require Import Alphabet Grammar Validator_safe. Module Make(Import A:Automaton.T). Module Import ValidSafe := Validator_safe.Make A. (** A few helpers for dependent types. *) (** Decidable propositions. *) Class Decidable (P : Prop) := decide : {P} + {~P}. Arguments decide _ {_}. 
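(** As an illustration (hypothetical, not part of the original
    library), here is a hand-written [Decidable] instance for a
    boolean test; [Decidable P] unfolds to the sumbool [{P} + {~P}]: *)
Example bool_eq_true_dec (b : bool) : Decidable (b = true).
Proof. destruct b; [left; reflexivity|right; discriminate]. Defined.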
*) Global Instance comparable_decidable_eq T `{ComparableLeibnizEq T} (x y : T) : Decidable (x = y). Proof. unfold Decidable. destruct (compare x y) eqn:EQ; [left; apply compare_eq; intuition | ..]; right; intros ->; by rewrite compare_refl in EQ. Defined. Global Instance list_decidable_eq T : (forall x y : T, Decidable (x = y)) -> (forall l1 l2 : list T, Decidable (l1 = l2)). Proof. unfold Decidable. decide equality. Defined. Ltac subst_existT := repeat match goal with | _ => progress subst | H : @existT ?A ?P ?x ?y1 = @existT ?A ?P ?x ?y2 |- _ => let DEC := fresh in assert (DEC : forall u1 u2 : A, Decidable (u1 = u2)) by apply _; apply Eqdep_dec.inj_pair2_eq_dec in H; [|by apply DEC]; clear DEC end. (** The interpreter is written using dependent types. In order to avoid reducing proof terms while executing the parser, we thunk all the propositions behind an arrow. Note that thunkP is still in Prop so that it is erased by extraction. *) Definition thunkP (P : Prop) : Prop := True -> P. (** Sometimes, we actually need a reduced proof in a program (for example when using an equality to cast a value). In that case, instead of reducing the proof we already have, we reprove the assertion by using decidability. *) Definition reprove {P} `{Decidable P} (p : thunkP P) : P := match decide P with | left p => p | right np => False_ind _ (np (p I)) end. (** Combination of reprove with eq_rect. *) Definition cast {T : Type} (F : T -> Type) {x y : T} (eq : thunkP (x = y)) {DEC : unit -> Decidable (x = y)}: F x -> F y := fun a => eq_rect x F a y (@reprove _ (DEC ()) eq). Lemma cast_eq T F (x : T) (eq : thunkP (x = x)) `{forall x y, Decidable (x = y)} a : cast F eq a = a. Proof. by rewrite /cast -Eqdep_dec.eq_rect_eq_dec. Qed. (** Input buffers and operations on them. **) CoInductive buffer : Type := Buf_cons { buf_head : token; buf_tail : buffer }. (* Note: Coq 8.12.0 wants a Declare Scope command, but this breaks compatibility with Coq < 8.10. Declare Scope buffer_scope. *) Delimit Scope buffer_scope with buf. Bind Scope buffer_scope with buffer. Infix "::" := Buf_cons (at level 60, right associativity) : buffer_scope. (** Concatenation of a list and an input buffer **) Fixpoint app_buf (l:list token) (buf:buffer) := match l with | nil => buf | cons t q => (t :: app_buf q buf)%buf end. Infix "++" := app_buf (at level 60, right associativity) : buffer_scope. Lemma app_buf_assoc (l1 l2:list token) (buf:buffer) : (l1 ++ (l2 ++ buf) = (l1 ++ l2) ++ buf)%buf. Proof. induction l1 as [|?? IH]=>//=. rewrite IH //. Qed. (** The type of a non-initial state: the type of semantic values associated with the last symbol of this state. *) Definition noninitstate_type state := symbol_semantic_type (last_symb_of_non_init_state state). (** The stack of the automaton: it is either nil or it contains a non-initial state, a semantic value for the symbol associated with this state, and a nested stack. **) Definition stack := list (sigT noninitstate_type). (* e.g. list {state & state_type state} *) Section Interpreter. Hypothesis safe: safe. (* Properties of the automaton deduced from safety validation. *) Proposition shift_head_symbs: shift_head_symbs. Proof. pose proof safe; unfold ValidSafe.safe in H; intuition. Qed. Proposition goto_head_symbs: goto_head_symbs. Proof. pose proof safe; unfold ValidSafe.safe in H; intuition. Qed. Proposition shift_past_state: shift_past_state. Proof. pose proof safe; unfold ValidSafe.safe in H; intuition. Qed. Proposition goto_past_state: goto_past_state. Proof.
pose proof safe; unfold ValidSafe.safe in H; intuition. Qed. Proposition reduce_ok: reduce_ok. Proof. pose proof safe; unfold ValidSafe.safe in H; intuition. Qed. Variable init : initstate. (** The top state of a stack **) Definition state_of_stack (stack:stack): state := match stack with | [] => init | existT _ s _::_ => s end. (** The stack of states of an automaton stack **) Definition state_stack_of_stack (stack:stack) := (List.map (fun cell:sigT noninitstate_type => singleton_state_pred (projT1 cell)) stack ++ [singleton_state_pred init])%list. (** The stack of symbols of an automaton stack **) Definition symb_stack_of_stack (stack:stack) := List.map (fun cell:sigT noninitstate_type => last_symb_of_non_init_state (projT1 cell)) stack. (** The stack invariant: it basically states that the assumptions on the states are true. **) Inductive stack_invariant: stack -> Prop := | stack_invariant_constr: forall stack, prefix (head_symbs_of_state (state_of_stack stack)) (symb_stack_of_stack stack) -> prefix_pred (head_states_of_state (state_of_stack stack)) (state_stack_of_stack stack) -> stack_invariant_next stack -> stack_invariant stack with stack_invariant_next: stack -> Prop := | stack_invariant_next_nil: stack_invariant_next [] | stack_invariant_next_cons: forall state_cur st stack_rec, stack_invariant stack_rec -> stack_invariant_next (existT _ state_cur st::stack_rec). (** [pop] pops some symbols from the stack. It discards the popped states, applies the curried [action] to the popped semantic values, and returns the new stack together with the result. **) Fixpoint pop (symbols_to_pop:list symbol) {A:Type} (stk:stack) : thunkP (prefix symbols_to_pop (symb_stack_of_stack stk)) -> forall (action:arrows_right A (map symbol_semantic_type symbols_to_pop)), stack * A. unshelve refine (match symbols_to_pop return (thunkP (prefix symbols_to_pop (symb_stack_of_stack stk))) -> forall (action:arrows_right A (map _ symbols_to_pop)), stack * A with | [] => fun _ action => (stk, action) | t::q => fun Hp action => match stk return thunkP (prefix (t::q) (symb_stack_of_stack stk)) -> stack * A with | existT _ state_cur sem::stack_rec => fun Hp => let sem_conv := cast symbol_semantic_type _ sem in pop q _ stack_rec _ (action sem_conv) | [] => fun Hp => False_rect _ _ end Hp end). Proof. - simpl in Hp. clear -Hp. abstract (intros _ ; specialize (Hp I); now inversion Hp). - clear -Hp. abstract (specialize (Hp I); now inversion Hp). - simpl in Hp. clear -Hp. abstract (intros _ ; specialize (Hp I); now inversion Hp). Defined. (* Equivalent declarative specification for pop, so that we avoid (part of) the dependent types nightmare. *) Inductive pop_spec {A:Type} : forall (symbols_to_pop:list symbol) (stk : stack) (action : arrows_right A (map symbol_semantic_type symbols_to_pop)) (stk' : stack) (sem : A), Prop := | Nil_pop_spec stk sem : pop_spec [] stk sem stk sem | Cons_pop_spec symbols_to_pop st stk action sem stk' res : pop_spec symbols_to_pop stk (action sem) stk' res -> pop_spec (last_symb_of_non_init_state st::symbols_to_pop) (existT _ st sem :: stk) action stk' res. Lemma pop_spec_ok {A:Type} symbols_to_pop stk Hp action stk' res: pop symbols_to_pop stk Hp action = (stk', res) <-> pop_spec (A:=A) symbols_to_pop stk action stk' res. Proof. revert stk Hp action. induction symbols_to_pop as [|t symbols_to_pop IH]=>stk Hp action /=. - split. + intros [= <- <-]. constructor. + intros H. inversion H. by subst_existT. - destruct stk as [|[st sem]]=>/=; [by destruct pop_subproof0|].
remember (pop_subproof t symbols_to_pop stk st Hp) as EQ eqn:eq. clear eq. generalize EQ. revert Hp action. rewrite <-(EQ I)=>Hp action ?. rewrite cast_eq. rewrite IH. split. + intros. by constructor. + intros H. inversion H. by subst_existT. Qed. Lemma pop_preserves_invariant symbols_to_pop stk Hp A action : stack_invariant stk -> stack_invariant (fst (pop symbols_to_pop stk Hp (A:=A) action)). Proof. revert stk Hp A action. induction symbols_to_pop as [|t q IH]=>//=. intros stk Hp A action Hi. destruct Hi as [stack Hp' Hpp [|state st stk']]. - destruct pop_subproof0. - now apply IH. Qed. Lemma pop_state_valid symbols_to_pop stk Hp A action lpred : prefix_pred lpred (state_stack_of_stack stk) -> let stk' := fst (pop symbols_to_pop stk Hp (A:=A) action) in state_valid_after_pop (state_of_stack stk') symbols_to_pop lpred. Proof. revert stk Hp A action lpred. induction symbols_to_pop as [|t q IH]=>/=. - intros stk Hp A a lpred Hpp. destruct lpred as [|pred lpred]; constructor. inversion Hpp as [|? lpred' ? pred' Himpl Hpp' eq1 eq2]; subst. specialize (Himpl (state_of_stack stk)). destruct (pred' (state_of_stack stk)) as [] eqn:Heqpred'=>//. destruct stk as [|[]]; simpl in *. + inversion eq2; subst; clear eq2. unfold singleton_state_pred in Heqpred'. now rewrite compare_refl in Heqpred'; discriminate. + inversion eq2; subst; clear eq2. unfold singleton_state_pred in Heqpred'. now rewrite compare_refl in Heqpred'; discriminate. - intros stk Hp A a lpred Hpp. destruct stk as [|[] stk]=>//=. + destruct pop_subproof0. + destruct lpred as [|pred lpred]; [by constructor|]. constructor. apply IH. by inversion Hpp. Qed. (** [step_result] represents the result of one step of the automaton: it can fail, accept or progress. [Fail_sr] means that the input word is rejected by the automaton. [Accept_sr] means that this is the last step of the automaton, and it returns the semantic value of the input word. [Progress_sr] means that some progress has been made, but new steps are needed in order to accept a word. For [Accept_sr] and [Progress_sr], the result contains the new input buffer. [Fail_sr] is different from [Err] (from the error monad), which means that the automaton is bogus and has performed a forbidden action. **) Inductive step_result := | Fail_sr_full: state -> token -> step_result | Accept_sr: symbol_semantic_type (NT (start_nt init)) -> buffer -> step_result | Progress_sr: stack -> buffer -> step_result. (** [reduce_step] does a reduce action: - pops some elements from the stack - executes the action of the production - follows the goto for the produced non-terminal symbol **) Definition reduce_step stk prod (buffer : buffer) (Hval : thunkP (valid_for_reduce (state_of_stack stk) prod)) (Hi : thunkP (stack_invariant stk)) : step_result. refine ((let '(stk', sem) as ss := pop (prod_rhs_rev prod) stk _ (prod_action prod) return thunkP (state_valid_after_pop (state_of_stack (fst ss)) _ (head_states_of_state (state_of_stack stk))) -> _ in fun Hval' => match goto_table (state_of_stack stk') (prod_lhs prod) as goto return (thunkP (goto = None -> match state_of_stack stk' with | Init i => prod_lhs prod = start_nt i | Ninit _ => False end)) -> _ with | Some (exist _ state_new e) => fun _ => let sem := eq_rect _ _ sem _ e in Progress_sr (existT noninitstate_type state_new sem::stk') buffer | None => fun Hval => let sem := cast symbol_semantic_type _ sem in Accept_sr sem buffer end (fun _ => _)) (fun _ => pop_state_valid _ _ _ _ _ _ _)). Proof. - clear -Hi Hval.
abstract (intros _; destruct Hi=>//; eapply prefix_trans; [by apply Hval|eassumption]). - clear -Hval. abstract (intros _; f_equal; specialize (Hval I eq_refl); destruct stk' as [|[]]=>//). - simpl in Hval'. clear -Hval Hval'. abstract (move : Hval => /(_ I) [_ /(_ _ (Hval' I))] Hval2 Hgoto; by rewrite Hgoto in Hval2). - clear -Hi. abstract by destruct Hi. Defined. Lemma reduce_step_stack_invariant_preserved stk prod buffer Hv Hi stk' buffer': reduce_step stk prod buffer Hv Hi = Progress_sr stk' buffer' -> stack_invariant stk'. Proof. unfold reduce_step. match goal with | |- context [pop ?symbols_to_pop stk ?Hp ?action] => assert (Hi':=pop_preserves_invariant symbols_to_pop stk Hp _ action (Hi I)); generalize (pop_state_valid symbols_to_pop stk Hp _ action) end. destruct pop as [stk0 sem]=>/=. simpl in Hi'. intros Hv'. assert (Hgoto1:=goto_head_symbs (state_of_stack stk0) (prod_lhs prod)). assert (Hgoto2:=goto_past_state (state_of_stack stk0) (prod_lhs prod)). match goal with | |- context [fun _ : True => ?X] => generalize X end. destruct goto_table as [[state_new e]|] eqn:EQgoto=>//. intros _ [= <- <-]. constructor=>/=. - constructor. eapply prefix_trans. apply Hgoto1. by destruct Hi'. - unfold state_stack_of_stack; simpl; constructor. + intros ?. by destruct singleton_state_pred. + eapply prefix_pred_trans. apply Hgoto2. by destruct Hi'. - by constructor. Qed. (** One step of parsing. **) Definition step stk buffer (Hi : thunkP (stack_invariant stk)): step_result := match action_table (state_of_stack stk) as a return thunkP match a return Prop with | Default_reduce_act prod => _ | Lookahead_act awt => forall t : terminal, match awt t with | Reduce_act p => _ | _ => True end end -> _ with | Default_reduce_act prod => fun Hv => reduce_step stk prod buffer Hv Hi | Lookahead_act awt => fun Hv => match buf_head buffer with | tok => match awt (token_term tok) as a return thunkP match a return Prop with Reduce_act p => _ | _ => _ end -> _ with | Shift_act state_new e => fun _ => let sem_conv := eq_rect _ symbol_semantic_type (token_sem tok) _ e in Progress_sr (existT noninitstate_type state_new sem_conv::stk) (buf_tail buffer) | Reduce_act prod => fun Hv => reduce_step stk prod buffer Hv Hi | Fail_act => fun _ => Fail_sr_full (state_of_stack stk) tok end (fun _ => Hv I (token_term tok)) end end (fun _ => reduce_ok _). Lemma step_stack_invariant_preserved stk buffer Hi stk' buffer': step stk buffer Hi = Progress_sr stk' buffer' -> stack_invariant stk'. Proof. unfold step. generalize (reduce_ok (state_of_stack stk))=>Hred. assert (Hshift1 := shift_head_symbs (state_of_stack stk)). assert (Hshift2 := shift_past_state (state_of_stack stk)). destruct action_table as [prod|awt]=>/=. - eauto using reduce_step_stack_invariant_preserved. - set (term := token_term (buf_head buffer)). generalize (Hred term). clear Hred. intros Hred. specialize (Hshift1 term). specialize (Hshift2 term). destruct (awt term) as [state_new e|prod|]=>//. + intros [= <- <-]. constructor=>/=. * constructor. eapply prefix_trans. apply Hshift1. by destruct Hi. * unfold state_stack_of_stack; simpl; constructor. -- intros ?. by destruct singleton_state_pred. -- eapply prefix_pred_trans. apply Hshift2. by destruct Hi. * constructor; by apply Hi. + eauto using reduce_step_stack_invariant_preserved. Qed. (** The parser uses a [nat] fuel parameter [log_n_steps], so that we do not have to prove termination, which is difficult.
Note that [log_n_steps] is *not* the fuel in the conventional sense: this parameter contains the logarithm (in base 2) of the number of steps to perform. Hence, a value of, e.g., 50 will usually be enough to ensure termination. *) Fixpoint parse_fix stk buffer (log_n_steps : nat) (Hi : thunkP (stack_invariant stk)): { sr : step_result | forall stk' buffer', sr = Progress_sr stk' buffer' -> stack_invariant stk' } := match log_n_steps with | O => exist _ (step stk buffer Hi) (step_stack_invariant_preserved _ _ Hi) | S log_n_steps => match parse_fix stk buffer log_n_steps Hi with | exist _ (Progress_sr stk buffer) Hi' => parse_fix stk buffer log_n_steps (fun _ => Hi' _ buffer eq_refl) | sr => sr end end. (** The final result of a parsing is either a failure (the automaton has rejected the input word), a timeout (the automaton has spent all of the [2^log_n_steps] allowed steps), or a parsed semantic value together with the rest of the input buffer. Note that we do not make parse_result depend on start_nt for the result type, so that this inductive is extracted without the use of Obj.t in OCaml. **) Inductive parse_result {A : Type} := | Fail_pr_full: state -> token -> parse_result | Timeout_pr: parse_result | Parsed_pr: A -> buffer -> parse_result. Global Arguments parse_result _ : clear implicits. Definition parse (buffer : buffer) (log_n_steps : nat): parse_result (symbol_semantic_type (NT (start_nt init))). refine (match proj1_sig (parse_fix [] buffer log_n_steps _) with | Fail_sr_full st tok => Fail_pr_full st tok | Accept_sr sem buffer' => Parsed_pr sem buffer' | Progress_sr _ _ => Timeout_pr end). Proof. abstract (repeat constructor; intros; by destruct singleton_state_pred). Defined. End Interpreter. Arguments Fail_sr_full {init} _ _. Arguments Accept_sr {init} _ _. Arguments Progress_sr {init} _ _. (* These notations are provided for backwards compatibility with Coq code * from before the addition of the return information. They are used in the * theorem statements. *) Notation Fail_sr := (Fail_sr_full _ _) (only parsing). Notation Fail_pr := (Fail_pr_full _ _) (only parsing). End Make. Module Type T(A:Automaton.T). Include (Make A). End T. menhir-20210929/coq-menhirlib/src/Interpreter_complete.v000066400000000000000000001015311412503066000231460ustar00rootroot00000000000000(****************************************************************************) (* *) (* Menhir *) (* *) (* Jacques-Henri Jourdan, CNRS, LRI, Université Paris Sud *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under *) (* the terms of the GNU Lesser General Public License as published by the *) (* Free Software Foundation, either version 3 of the License, or (at your *) (* option) any later version, as described in the file LICENSE. *) (* *) (****************************************************************************) From Coq Require Import List Arith. Import ListNotations. From Coq.ssr Require Import ssreflect. Require Import Alphabet Grammar. Require Automaton Interpreter Validator_complete. Module Make(Import A:Automaton.T) (Import Inter:Interpreter.T A). Module Import Valid := Validator_complete.Make A. (** * Completeness Proof **) Section Completeness_Proof. Hypothesis safe: Inter.ValidSafe.safe. Hypothesis complete: complete. (* Properties of the automaton deduced from completeness validation. *) Proposition nullable_stable: nullable_stable. Proof.
pose proof complete; unfold Valid.complete in H; intuition. Qed. Proposition start_future: start_future. Proof. pose proof complete; unfold Valid.complete in H; intuition. Qed. Proposition terminal_shift: terminal_shift. Proof. pose proof complete; unfold Valid.complete in H; intuition. Qed. Proposition end_reduce: end_reduce. Proof. pose proof complete; unfold Valid.complete in H; intuition. Qed. Proposition start_goto: start_goto. Proof. pose proof complete; unfold Valid.complete in H; intuition. Qed. Proposition non_terminal_goto: non_terminal_goto. Proof. pose proof complete; unfold Valid.complete in H; intuition. Qed. Proposition non_terminal_closed: non_terminal_closed. Proof. pose proof complete; unfold Valid.complete in H; intuition. Qed. (** If the nullable predicate has been validated, then it is correct. **) Lemma nullable_correct head word : word = [] -> parse_tree head word -> nullable_symb head = true with nullable_correct_list heads word : word = [] -> parse_tree_list heads word -> nullable_word heads = true. Proof. - destruct 2=>//. assert (Hnull := nullable_stable prod). erewrite nullable_correct_list in Hnull; eauto. - intros Hword. destruct 1=>//=. destruct (app_eq_nil _ _ Hword). eauto using andb_true_intro. Qed. (** Auxiliary lemma for first_correct. *) Lemma first_word_set_app t word1 word2 : TerminalSet.In t (first_word_set (word1 ++ word2)) <-> TerminalSet.In t (first_word_set word1) \/ TerminalSet.In t (first_word_set word2) /\ nullable_word (rev word1) = true. Proof. induction word1 as [|s word1 IH]=>/=. - split; [tauto|]. move=>[/TerminalSet.empty_1 ?|[? _]]//. - rewrite /nullable_word forallb_app /=. destruct nullable_symb=>/=. + rewrite Bool.andb_true_r. split. * move=>/TerminalSet.union_1. rewrite IH. move=>[?|[?|[??]]]; auto using TerminalSet.union_2, TerminalSet.union_3. * destruct IH. move=>[/TerminalSet.union_1 [?|?]|[??]]; auto using TerminalSet.union_2, TerminalSet.union_3. + rewrite Bool.andb_false_r. by intuition. Qed. (** If the first predicate has been validated, then it is correct. **) Lemma first_correct head word t q : word = t::q -> parse_tree head word -> TerminalSet.In (token_term t) (first_symb_set head) with first_correct_list heads word t q : word = t::q -> parse_tree_list heads word -> TerminalSet.In (token_term t) (first_word_set (rev' heads)). Proof. - intros Hword. destruct 1=>//. + inversion Hword. subst. apply TerminalSet.singleton_2, compare_refl. + eapply first_stable. eauto. - intros Hword. destruct 1 as [|symq wordq ptl symt wordt pt]=>//=. rewrite /rev' -rev_alt /= first_word_set_app /= rev_involutive rev_alt. destruct wordq; [right|left]. + destruct nullable_symb; eauto using TerminalSet.union_2, nullable_correct_list. + inversion Hword. subst. fold (rev' symq). eauto. Qed. (** A PTL is compatible with a stack if the top of the stack contains data representing this PTL. *) Fixpoint ptl_stack_compat {symbs word} (stk0 : stack) (ptl : parse_tree_list symbs word) (stk : stack) : Prop := match ptl with | Nil_ptl => stk0 = stk | @Cons_ptl _ _ ptl sym _ pt => match stk with | [] => False | existT _ _ sem::stk => ptl_stack_compat stk0 ptl stk /\ exists e, sem = eq_rect _ symbol_semantic_type (pt_sem pt) _ e end end. (** ... and when a PTL is compatible with a stack, calling the pop function returns the semantic value of this PTL.
*) Lemma pop_stack_compat_pop_spec {A symbs word} (ptl:parse_tree_list symbs word) (stk:stack) (stk0:stack) action : ptl_stack_compat stk0 ptl stk -> pop_spec symbs stk action stk0 (ptl_sem (A:=A) ptl action). Proof. revert stk. induction ptl=>stk /= Hstk. - subst. constructor. - destruct stk as [|[st sem] stk]=>//. destruct Hstk as [Hstk [??]]. subst. simpl. constructor. eauto. Qed. Variable init: initstate. (** In order to prove completeness, we first fix a word to be parsed together with the content of the parser at the end of the parsing. *) Variable full_word: list token. Variable buffer_end: buffer. (** Completeness is proved by following the traversal of the parse tree which is performed by the parser. Each step of parsing corresponds to one step of traversal. In order to represent the state of the traversal, we define the notion of "dotted" parse tree, which is a parse tree with one dot on one of its nodes. The place of the dot represents the place of the next action to be executed. Such a dotted parse tree is decomposed into two parts: a "regular" parse tree, which is the parse tree placed under the dot, and a "parse tree zipper", which is the part of the parse tree placed above the dot. Therefore, a parse tree zipper is a parse tree with a hole. Moreover, for easier manipulation, a parse tree zipper is represented "upside down". That is, the root of the parse tree is actually a leaf of the zipper, while the root of the zipper is the hole. *) Inductive pt_zipper: forall (hole_symb:symbol) (hole_word:list token), Type := | Top_ptz: pt_zipper (NT (start_nt init)) full_word | Cons_ptl_ptz: forall {head_symbolsq:list symbol} {wordq:list token}, parse_tree_list head_symbolsq wordq -> forall {head_symbolt:symbol} {wordt:list token}, ptl_zipper (head_symbolt::head_symbolsq) (wordq++wordt) -> pt_zipper head_symbolt wordt with ptl_zipper: forall (hole_symbs:list symbol) (hole_word:list token), Type := | Non_terminal_pt_ptlz: forall {p:production} {word:list token}, pt_zipper (NT (prod_lhs p)) word -> ptl_zipper (prod_rhs_rev p) word | Cons_ptl_ptlz: forall {head_symbolsq:list symbol} {wordq:list token}, forall {head_symbolt:symbol} {wordt:list token}, parse_tree head_symbolt wordt -> ptl_zipper (head_symbolt::head_symbolsq) (wordq++wordt) -> ptl_zipper head_symbolsq wordq. (** A dotted parse tree is the combination of a parse tree zipper with a parse tree. It comes in two flavors, depending on the next action to be executed (shift or reduce). *) Inductive pt_dot: Type := | Reduce_ptd: forall {prod word}, parse_tree_list (prod_rhs_rev prod) word -> pt_zipper (NT (prod_lhs prod)) word -> pt_dot | Shift_ptd: forall (tok : token) {symbolsq wordq}, parse_tree_list symbolsq wordq -> ptl_zipper (T (token_term tok)::symbolsq) (wordq++[tok]) -> pt_dot. (** We can compute the full semantic value of a parse tree when represented as a dotted parse tree.
*) Fixpoint ptlz_sem {hole_symbs hole_word} (ptlz:ptl_zipper hole_symbs hole_word) : (forall A, arrows_right A (map symbol_semantic_type hole_symbs) -> A) -> (symbol_semantic_type (NT (start_nt init))) := match ptlz with | @Non_terminal_pt_ptlz prod _ ptz => fun k => ptz_sem ptz (k _ (prod_action prod)) | Cons_ptl_ptlz pt ptlz => fun k => ptlz_sem ptlz (fun _ f => k _ (f (pt_sem pt))) end with ptz_sem {hole_symb hole_word} (ptz:pt_zipper hole_symb hole_word): symbol_semantic_type hole_symb -> symbol_semantic_type (NT (start_nt init)) := match ptz with | Top_ptz => fun sem => sem | Cons_ptl_ptz ptl ptlz => fun sem => ptlz_sem ptlz (fun _ f => ptl_sem ptl (f sem)) end. Definition ptd_sem (ptd : pt_dot) := match ptd with | @Reduce_ptd prod _ ptl ptz => ptz_sem ptz (ptl_sem ptl (prod_action prod)) | Shift_ptd tok ptl ptlz => ptlz_sem ptlz (fun _ f => ptl_sem ptl (f (token_sem tok))) end. (** The buffer associated with a dotted parse tree corresponds to the buffer left to be read by the parser when at the state represented by the dotted parse tree. *) Fixpoint ptlz_buffer {hole_symbs hole_word} (ptlz:ptl_zipper hole_symbs hole_word): buffer := match ptlz with | Non_terminal_pt_ptlz ptz => ptz_buffer ptz | @Cons_ptl_ptlz _ _ _ wordt _ ptlz' => wordt ++ ptlz_buffer ptlz' end with ptz_buffer {hole_symb hole_word} (ptz:pt_zipper hole_symb hole_word): buffer := match ptz with | Top_ptz => buffer_end | Cons_ptl_ptz _ ptlz => ptlz_buffer ptlz end. Definition ptd_buffer (ptd:pt_dot) := match ptd with | Reduce_ptd _ ptz => ptz_buffer ptz | @Shift_ptd tok _ wordq _ ptlz => (tok::ptlz_buffer ptlz)%buf end. (** We are now ready to define the main invariant of the proof of completeness: we need to specify when a stack is compatible with a dotted parse tree. Informally, a stack is compatible with a dotted parse tree when it is the concatenation of stack fragments which are compatible with each of the partially recognized productions appearing in the parse tree zipper. Moreover, the head of each of these stack fragments contains a state which has an item predicted by the corresponding zipper. More formally, the compatibility relation first needs the following auxiliary definitions: *) Fixpoint ptlz_prod {hole_symbs hole_word} (ptlz:ptl_zipper hole_symbs hole_word): production := match ptlz with | @Non_terminal_pt_ptlz prod _ _ => prod | Cons_ptl_ptlz _ ptlz' => ptlz_prod ptlz' end. Fixpoint ptlz_future {hole_symbs hole_word} (ptlz:ptl_zipper hole_symbs hole_word): list symbol := match ptlz with | Non_terminal_pt_ptlz _ => [] | @Cons_ptl_ptlz _ _ s _ _ ptlz' => s::ptlz_future ptlz' end. Fixpoint ptlz_lookahead {hole_symbs hole_word} (ptlz:ptl_zipper hole_symbs hole_word) : terminal := match ptlz with | Non_terminal_pt_ptlz ptz => token_term (buf_head (ptz_buffer ptz)) | Cons_ptl_ptlz _ ptlz' => ptlz_lookahead ptlz' end. Fixpoint ptz_stack_compat {hole_symb hole_word} (stk : stack) (ptz : pt_zipper hole_symb hole_word) : Prop := match ptz with | Top_ptz => stk = [] | Cons_ptl_ptz ptl ptlz => exists stk0, state_has_future (state_of_stack init stk) (ptlz_prod ptlz) (hole_symb::ptlz_future ptlz) (ptlz_lookahead ptlz) /\ ptl_stack_compat stk0 ptl stk /\ ptlz_stack_compat stk0 ptlz end with ptlz_stack_compat {hole_symbs hole_word} (stk : stack) (ptlz : ptl_zipper hole_symbs hole_word) : Prop := match ptlz with | Non_terminal_pt_ptlz ptz => ptz_stack_compat stk ptz | Cons_ptl_ptlz _ ptlz => ptlz_stack_compat stk ptlz end.
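(* A small unfolding sketch (the states [s1], [s2] and subtrees [pt1], [pt2]
   below are hypothetical): for the PTL [Cons_ptl (Cons_ptl Nil_ptl pt1) pt2],
   the relation [ptl_stack_compat stk0 _ stk] forces
     stk = existT _ s2 sem2 :: existT _ s1 sem1 :: stk0
   where [sem1] and [sem2] are casts of [pt_sem pt1] and [pt_sem pt2]: the
   semantic value of the rightmost subtree sits on top of the stack. *)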
Definition ptd_stack_compat (ptd:pt_dot) (stk:stack): Prop := match ptd with | @Reduce_ptd prod _ ptl ptz => exists stk0, state_has_future (state_of_stack init stk) prod [] (token_term (buf_head (ptz_buffer ptz))) /\ ptl_stack_compat stk0 ptl stk /\ ptz_stack_compat stk0 ptz | Shift_ptd tok ptl ptlz => exists stk0, state_has_future (state_of_stack init stk) (ptlz_prod ptlz) (T (token_term tok) :: ptlz_future ptlz) (ptlz_lookahead ptlz) /\ ptl_stack_compat stk0 ptl stk /\ ptlz_stack_compat stk0 ptlz end. Lemma ptz_stack_compat_cons_state_has_future {symbsq wordq symbt wordt} stk (ptl : parse_tree_list symbsq wordq) (ptlz : ptl_zipper (symbt :: symbsq) (wordq ++ wordt)) : ptz_stack_compat stk (Cons_ptl_ptz ptl ptlz) -> state_has_future (state_of_stack init stk) (ptlz_prod ptlz) (symbt::ptlz_future ptlz) (ptlz_lookahead ptlz). Proof. move=>[stk0 [? [? ?]]] //. Qed. Lemma ptlz_future_ptlz_prod hole_symbs hole_word (ptlz:ptl_zipper hole_symbs hole_word) : rev_append (ptlz_future ptlz) hole_symbs = prod_rhs_rev (ptlz_prod ptlz). Proof. induction ptlz=>//=. Qed. Lemma ptlz_future_first {symbs word} (ptlz : ptl_zipper symbs word) : TerminalSet.In (token_term (buf_head (ptlz_buffer ptlz))) (first_word_set (ptlz_future ptlz)) \/ token_term (buf_head (ptlz_buffer ptlz)) = ptlz_lookahead ptlz /\ nullable_word (ptlz_future ptlz) = true. Proof. induction ptlz as [|??? [|tok] pt ptlz IH]; [by auto| |]=>/=. - rewrite (nullable_correct _ _ eq_refl pt). destruct IH as [|[??]]; [left|right]=>/=; auto using TerminalSet.union_3. - left. destruct nullable_symb; eauto using TerminalSet.union_2, first_correct. Qed. (** We now want to define the next dotted parse tree to be handled after one action. Such a dotted parse tree is built in two steps: not only do we have to perform the action by completing the parse tree, but we also have to prepare for the following step by moving the dot down, placing it in front of the next action to be performed.
*) Fixpoint build_pt_dot_from_pt {symb word} (pt : parse_tree symb word) (ptz : pt_zipper symb word) : pt_dot := match pt in parse_tree symb word return pt_zipper symb word -> pt_dot with | Terminal_pt tok => fun ptz => let X := match ptz in pt_zipper symb word return match symb with T term => True | NT _ => False end -> { symbsq : list symbol & { wordq : list token & (parse_tree_list symbsq wordq * ptl_zipper (symb :: symbsq) (wordq ++ word))%type } } with | Top_ptz => fun F => False_rect _ F | Cons_ptl_ptz ptl ptlz => fun _ => existT _ _ (existT _ _ (ptl, ptlz)) end I in Shift_ptd tok (fst (projT2 (projT2 X))) (snd (projT2 (projT2 X))) | Non_terminal_pt prod ptl => fun ptz => let is_notnil := match ptl in parse_tree_list w _ return option (match w return Prop with [] => False | _ => True end) with | Nil_ptl => None | _ => Some I end in match is_notnil with | None => Reduce_ptd ptl ptz | Some H => build_pt_dot_from_pt_rec ptl H (Non_terminal_pt_ptlz ptz) end end ptz with build_pt_dot_from_pt_rec {symbs word} (ptl : parse_tree_list symbs word) (Hsymbs : match symbs with [] => False | _ => True end) (ptlz : ptl_zipper symbs word) : pt_dot := match ptl in parse_tree_list symbs word return match symbs with [] => False | _ => True end -> ptl_zipper symbs word -> pt_dot with | Nil_ptl => fun Hsymbs _ => False_rect _ Hsymbs | Cons_ptl ptl' pt => fun _ => match ptl' in parse_tree_list symbsq wordq return parse_tree_list symbsq wordq -> ptl_zipper (_ :: symbsq) (wordq ++ _) -> pt_dot with | Nil_ptl => fun _ ptlz => build_pt_dot_from_pt pt (Cons_ptl_ptz Nil_ptl ptlz) | _ => fun ptl' ptlz => build_pt_dot_from_pt_rec ptl' I (Cons_ptl_ptlz pt ptlz) end ptl' end Hsymbs ptlz. Definition build_pt_dot_from_ptl {symbs word} (ptl : parse_tree_list symbs word) (ptlz : ptl_zipper symbs word) : pt_dot := match ptlz in ptl_zipper symbs word return parse_tree_list symbs word -> pt_dot with | Non_terminal_pt_ptlz ptz => fun ptl => Reduce_ptd ptl ptz | Cons_ptl_ptlz pt ptlz => fun ptl => build_pt_dot_from_pt pt (Cons_ptl_ptz ptl ptlz) end ptl. Definition next_ptd (ptd:pt_dot) : option pt_dot := match ptd with | Shift_ptd tok ptl ptlz => Some (build_pt_dot_from_ptl (Cons_ptl ptl (Terminal_pt tok)) ptlz) | Reduce_ptd ptl ptz => match ptz in pt_zipper symb word return parse_tree symb word -> _ with | Top_ptz => fun _ => None | Cons_ptl_ptz ptl' ptlz => fun pt => Some (build_pt_dot_from_ptl (Cons_ptl ptl' pt) ptlz) end (Non_terminal_pt _ ptl) end. Fixpoint next_ptd_iter (ptd:pt_dot) (log_n_steps:nat) : option pt_dot := match log_n_steps with | O => next_ptd ptd | S log_n_steps => match next_ptd_iter ptd log_n_steps with | None => None | Some ptd => next_ptd_iter ptd log_n_steps end end. (** We prove that these functions behave well w.r.t. semantic values. *) Lemma sem_build_from_pt {symb word} (pt : parse_tree symb word) (ptz : pt_zipper symb word) : ptz_sem ptz (pt_sem pt) = ptd_sem (build_pt_dot_from_pt pt ptz) with sem_build_from_pt_rec {symbs word} (ptl : parse_tree_list symbs word) (ptlz : ptl_zipper symbs word) Hsymbs : ptlz_sem ptlz (fun _ f => ptl_sem ptl f) = ptd_sem (build_pt_dot_from_pt_rec ptl Hsymbs ptlz). Proof. - destruct pt as [tok|prod word ptl]=>/=. + revert ptz. generalize [tok]. generalize (token_sem tok). generalize I. change True with (match T (token_term tok) with T _ => True | NT _ => False end) at 1. generalize (T (token_term tok)) => symb HT sem word ptz. by destruct ptz. + match goal with | |- context [match ?X with Some H => _ | None => _ end] => destruct X=>// end. 
by rewrite -sem_build_from_pt_rec. - destruct ptl; [contradiction|]. specialize (sem_build_from_pt_rec _ _ ptl)=>/=. destruct ptl. + by rewrite -sem_build_from_pt. + by rewrite -sem_build_from_pt_rec. Qed. Lemma sem_build_from_ptl {symbs word} (ptl : parse_tree_list symbs word) (ptlz : ptl_zipper symbs word) : ptlz_sem ptlz (fun _ f => ptl_sem ptl f) = ptd_sem (build_pt_dot_from_ptl ptl ptlz). Proof. destruct ptlz=>//=. by rewrite -sem_build_from_pt. Qed. Lemma sem_next_ptd (ptd : pt_dot) : match next_ptd ptd with | None => True | Some ptd' => ptd_sem ptd = ptd_sem ptd' end. Proof. destruct ptd as [prod word ptl ptz|tok symbs word ptl ptlz] =>/=. - change (ptl_sem ptl (prod_action prod)) with (pt_sem (Non_terminal_pt prod ptl)). generalize (Non_terminal_pt prod ptl). clear ptl. destruct ptz as [|?? ptl ?? ptlz]=>// pt. by rewrite -sem_build_from_ptl. - by rewrite -sem_build_from_ptl. Qed. Lemma sem_next_ptd_iter (ptd : pt_dot) (log_n_steps : nat) : match next_ptd_iter ptd log_n_steps with | None => True | Some ptd' => ptd_sem ptd = ptd_sem ptd' end. Proof. revert ptd. induction log_n_steps as [|log_n_steps IH]; [by apply sem_next_ptd|]=>/= ptd. assert (IH1 := IH ptd). destruct next_ptd_iter as [ptd'|]=>//. specialize (IH ptd'). destruct next_ptd_iter=>//. congruence. Qed. (** We prove that these functions behave well w.r.t. xxx_buffer. *) Lemma ptd_buffer_build_from_pt {symb word} (pt : parse_tree symb word) (ptz : pt_zipper symb word) : (word ++ ptz_buffer ptz)%buf = ptd_buffer (build_pt_dot_from_pt pt ptz) with ptd_buffer_build_from_pt_rec {symbs word} (ptl : parse_tree_list symbs word) (ptlz : ptl_zipper symbs word) Hsymbs : (word ++ ptlz_buffer ptlz)%buf = ptd_buffer (build_pt_dot_from_pt_rec ptl Hsymbs ptlz). Proof. - destruct pt as [tok|prod word ptl]=>/=. + f_equal. revert ptz. generalize [tok]. generalize (token_sem tok). generalize I. change True with (match T (token_term tok) with T _ => True | NT _ => False end) at 1. generalize (T (token_term tok)) => symb HT sem word ptz. by destruct ptz. + match goal with | |- context [match ?X with Some H => _ | None => _ end] => destruct X eqn:EQ end. * by rewrite -ptd_buffer_build_from_pt_rec. * rewrite [X in (X ++ _)%buf](_ : word = []) //. clear -EQ. by destruct ptl. - destruct ptl as [|?? ptl ?? pt]; [contradiction|]. specialize (ptd_buffer_build_from_pt_rec _ _ ptl). destruct ptl. + by rewrite /= -ptd_buffer_build_from_pt. + by rewrite -ptd_buffer_build_from_pt_rec //= app_buf_assoc. Qed. Lemma ptd_buffer_build_from_ptl {symbs word} (ptl : parse_tree_list symbs word) (ptlz : ptl_zipper symbs word) : ptlz_buffer ptlz = ptd_buffer (build_pt_dot_from_ptl ptl ptlz). Proof. destruct ptlz as [|???? pt]=>//=. by rewrite -ptd_buffer_build_from_pt. Qed. (** We prove that these functions behave well w.r.t. xxx_stack_compat. *) Lemma ptd_stack_compat_build_from_pt {symb word} (pt : parse_tree symb word) (ptz : pt_zipper symb word) (stk: stack) : ptz_stack_compat stk ptz -> ptd_stack_compat (build_pt_dot_from_pt pt ptz) stk with ptd_stack_compat_build_from_pt_rec {symbs word} (ptl : parse_tree_list symbs word) (ptlz : ptl_zipper symbs word) (stk : stack) Hsymbs : ptlz_stack_compat stk ptlz -> state_has_future (state_of_stack init stk) (ptlz_prod ptlz) (rev' (prod_rhs_rev (ptlz_prod ptlz))) (ptlz_lookahead ptlz) -> ptd_stack_compat (build_pt_dot_from_pt_rec ptl Hsymbs ptlz) stk. Proof. - intros Hstk. destruct pt as [tok|prod word ptl]=>/=. + revert ptz Hstk. generalize [tok]. generalize (token_sem tok). generalize I. 
change True with (match T (token_term tok) with T _ => True | NT _ => False end) at 1. generalize (T (token_term tok)) => symb HT sem word ptz. by destruct ptz. + assert (state_has_future (state_of_stack init stk) prod (rev' (prod_rhs_rev prod)) (token_term (buf_head (ptz_buffer ptz)))). { revert ptz Hstk. remember (NT (prod_lhs prod)) eqn:EQ=>ptz. destruct ptz as [|?? ptl0 ?? ptlz0]. - intros ->. apply start_future. congruence. - subst. intros (stk0 & Hfut & _). apply non_terminal_closed in Hfut. specialize (Hfut prod eq_refl). destruct (ptlz_future_first ptlz0) as [Hfirst|[Hfirst Hnull]]. + destruct Hfut as [_ Hfut]. auto. + destruct Hfut as [Hfut _]. by rewrite Hnull -Hfirst in Hfut. } match goal with | |- context [match ?X with Some H => _ | None => _ end] => destruct X eqn:EQ end. * by apply ptd_stack_compat_build_from_pt_rec. * exists stk. destruct ptl=>//. - intros Hstk Hfut. destruct ptl as [|?? ptl ?? pt]; [contradiction|]. specialize (ptd_stack_compat_build_from_pt_rec _ _ ptl). destruct ptl. + eapply ptd_stack_compat_build_from_pt=>//. exists stk. split; [|split]=>//; []. by rewrite -ptlz_future_ptlz_prod rev_append_rev /rev' -rev_alt rev_app_distr rev_involutive in Hfut. + by apply ptd_stack_compat_build_from_pt_rec. Qed. Lemma ptd_stack_compat_build_from_ptl {symbs word} (ptl : parse_tree_list symbs word) (ptlz : ptl_zipper symbs word) (stk stk0: stack) : ptlz_stack_compat stk0 ptlz -> ptl_stack_compat stk0 ptl stk -> state_has_future (state_of_stack init stk) (ptlz_prod ptlz) (ptlz_future ptlz) (ptlz_lookahead ptlz) -> ptd_stack_compat (build_pt_dot_from_ptl ptl ptlz) stk. Proof. intros Hstk0 Hstk Hfut. destruct ptlz=>/=. - eauto. - apply ptd_stack_compat_build_from_pt=>/=. eauto. Qed. (** We can now proceed by proving that the invariant is preserved by each step of parsing. We also prove that each step of parsing follows next_ptd. We start with reduce steps: *) Lemma reduce_step_next_ptd (prod : production) (word : list token) (ptl : parse_tree_list (prod_rhs_rev prod) word) (ptz : pt_zipper (NT (prod_lhs prod)) word) (stk : stack) Hval Hi : ptd_stack_compat (Reduce_ptd ptl ptz) stk -> match next_ptd (Reduce_ptd ptl ptz) with | None => reduce_step init stk prod (ptz_buffer ptz) Hval Hi = Accept_sr (ptd_sem (Reduce_ptd ptl ptz)) buffer_end | Some ptd => exists stk', reduce_step init stk prod (ptz_buffer ptz) Hval Hi = Progress_sr stk' (ptd_buffer ptd) /\ ptd_stack_compat ptd stk' end. Proof. intros (stk0 & _ & Hstk & Hstk0). apply pop_stack_compat_pop_spec with (action := prod_action prod) in Hstk. rewrite <-pop_spec_ok with (Hp := reduce_step_subproof init stk prod Hval Hi) in Hstk. unfold reduce_step. match goal with | |- context [pop_state_valid init ?A stk ?B ?C ?D ?E ?F] => generalize (pop_state_valid init A stk B C D E F) end. rewrite Hstk /=. intros Hv. generalize (reduce_step_subproof1 init stk prod Hval stk0 (fun _ : True => Hv)). clear Hval Hstk Hi Hv stk. assert (Hgoto := fun fut prod' => non_terminal_goto (state_of_stack init stk0) prod' (NT (prod_lhs prod)::fut)). simpl in Hgoto. destruct goto_table as [[st Hst]|] eqn:Hgoto'. - intros _. assert (match ptz with Top_ptz => False | _ => True end). { revert ptz Hst Hstk0 Hgoto'. generalize (eq_refl (NT (prod_lhs prod))). generalize (NT (prod_lhs prod)) at 1 3 5. intros nt Hnt ptz. destruct ptz=>//. injection Hnt=> <- /= Hst -> /= Hg. assert (Hsg := start_goto init). by rewrite Hg in Hsg. } clear Hgoto'. change (ptl_sem ptl (prod_action prod)) with (pt_sem (Non_terminal_pt prod ptl)). 
generalize (Non_terminal_pt prod ptl). clear ptl. destruct ptz as [|?? ptl ? ? ptlz]=>// pt. subst=>/=. eexists _. split. + f_equal. apply ptd_buffer_build_from_ptl. + destruct Hstk0 as (stk0' & Hfut & Hstk0' & Hstk0). apply (ptd_stack_compat_build_from_ptl _ _ _ stk0'); auto; []. split=>//. by exists eq_refl. - intros Hv. generalize (reduce_step_subproof0 _ prod _ (fun _ => Hv)). intros EQnt. clear Hv Hgoto'. change (ptl_sem ptl (prod_action prod)) with (pt_sem (Non_terminal_pt prod ptl)). generalize (Non_terminal_pt prod ptl). clear ptl. destruct ptz. + intros pt. f_equal. by rewrite cast_eq. + edestruct Hgoto. eapply ptz_stack_compat_cons_state_has_future, Hstk0. Qed. Lemma step_next_ptd (ptd : pt_dot) (stk : stack) Hi : ptd_stack_compat ptd stk -> match next_ptd ptd with | None => step safe init stk (ptd_buffer ptd) Hi = Accept_sr (ptd_sem ptd) buffer_end | Some ptd' => exists stk', step safe init stk (ptd_buffer ptd) Hi = Progress_sr stk' (ptd_buffer ptd') /\ ptd_stack_compat ptd' stk' end. Proof. intros Hstk. unfold step. generalize (reduce_ok safe (state_of_stack init stk)). destruct ptd as [prod word ptl ptz|tok symbs word ptl ptlz]. - assert (Hfut : state_has_future (state_of_stack init stk) prod [] (token_term (buf_head (ptz_buffer ptz)))). { destruct Hstk as (? & ? & ?)=>//. } assert (Hact := end_reduce _ _ _ _ Hfut). destruct action_table as [?|awt]=>Hval /=. + subst. by apply reduce_step_next_ptd. + set (term := token_term (buf_head (ptz_buffer ptz))) in *. generalize (Hval term). clear Hval. destruct (awt term)=>//. subst. intros Hval. by apply reduce_step_next_ptd. - destruct Hstk as (stk0 & Hfut & Hstk & Hstk0). assert (Hact := terminal_shift _ _ _ _ Hfut). simpl in Hact. clear Hfut. destruct action_table as [?|awt]=>//= /(_ (token_term tok)). destruct awt as [st' EQ| |]=>// _. eexists. split. + f_equal. rewrite -ptd_buffer_build_from_ptl //. + apply (ptd_stack_compat_build_from_ptl _ _ _ stk0); simpl; eauto. Qed. (** We prove the completeness of the parser's main loop. *) Lemma parse_fix_next_ptd_iter (ptd : pt_dot) (stk : stack) (log_n_steps : nat) Hi : ptd_stack_compat ptd stk -> match next_ptd_iter ptd log_n_steps with | None => proj1_sig (parse_fix safe init stk (ptd_buffer ptd) log_n_steps Hi) = Accept_sr (ptd_sem ptd) buffer_end | Some ptd' => exists stk', proj1_sig (parse_fix safe init stk (ptd_buffer ptd) log_n_steps Hi) = Progress_sr stk' (ptd_buffer ptd') /\ ptd_stack_compat ptd' stk' end. Proof. revert ptd stk Hi. induction log_n_steps as [|log_n_steps IH]; [by apply step_next_ptd|]. move => /= ptd stk Hi Hstk. assert (IH1 := IH ptd stk Hi Hstk). assert (EQsem := sem_next_ptd_iter ptd log_n_steps). destruct parse_fix as [sr Hi']. simpl in IH1. destruct next_ptd_iter as [ptd'|]. - rewrite EQsem. destruct IH1 as (stk' & -> & Hstk'). by apply IH. - by subst. Qed. (** The parser is defined by recursion over a fuel parameter. In the completeness proof, we need to predict how much fuel is going to be needed in order to prove that enough fuel gives rise to a successful parsing. To do so, we define the cost of a dotted parse tree, which is the number of actions left to be executed before parsing completes when the current state is represented by the dotted parse tree.
*) Fixpoint ptlz_cost {hole_symbs hole_word} (ptlz:ptl_zipper hole_symbs hole_word) := match ptlz with | Non_terminal_pt_ptlz ptz => ptz_cost ptz | Cons_ptl_ptlz pt ptlz' => pt_size pt + ptlz_cost ptlz' end with ptz_cost {hole_symb hole_word} (ptz:pt_zipper hole_symb hole_word) := match ptz with | Top_ptz => 0 | Cons_ptl_ptz ptl ptlz' => 1 + ptlz_cost ptlz' end. Definition ptd_cost (ptd:pt_dot) := match ptd with | Reduce_ptd ptl ptz => ptz_cost ptz | Shift_ptd _ ptl ptlz => 1 + ptlz_cost ptlz end. Lemma ptd_cost_build_from_pt {symb word} (pt : parse_tree symb word) (ptz : pt_zipper symb word) : pt_size pt + ptz_cost ptz = S (ptd_cost (build_pt_dot_from_pt pt ptz)) with ptd_cost_build_from_pt_rec {symbs word} (ptl : parse_tree_list symbs word) (ptlz : ptl_zipper symbs word) Hsymbs : ptl_size ptl + ptlz_cost ptlz = ptd_cost (build_pt_dot_from_pt_rec ptl Hsymbs ptlz). Proof. - destruct pt as [tok|prod word ptl']=>/=. + revert ptz. generalize [tok]. generalize (token_sem tok). generalize I. change True with (match T (token_term tok) with T _ => True | NT _ => False end) at 1. generalize (T (token_term tok)) => symb HT sem word ptz. by destruct ptz. + match goal with | |- context [match ?X with Some H => _ | None => _ end] => destruct X eqn:EQ end. * rewrite -ptd_cost_build_from_pt_rec /= plus_n_Sm //. * simpl. by destruct ptl'. - destruct ptl as [|?? ptl ?? pt]; [contradiction|]. specialize (ptd_cost_build_from_pt_rec _ _ ptl). destruct ptl. + apply eq_add_S. rewrite -ptd_cost_build_from_pt /=. ring. + rewrite -ptd_cost_build_from_pt_rec //=. ring. Qed. Lemma ptd_cost_build_from_ptl {symbs word} (ptl : parse_tree_list symbs word) (ptlz : ptl_zipper symbs word) : ptlz_cost ptlz = ptd_cost (build_pt_dot_from_ptl ptl ptlz). Proof. destruct ptlz=>//. apply eq_add_S. rewrite -ptd_cost_build_from_pt /=. ring. Qed. Lemma next_ptd_cost ptd: match next_ptd ptd with | None => ptd_cost ptd = 0 | Some ptd' => ptd_cost ptd = S (ptd_cost ptd') end. Proof. destruct ptd as [prod word ptl ptz|tok symbq wordq ptl ptlz] =>/=. - generalize (Non_terminal_pt prod ptl). clear ptl. destruct ptz as [|?? ptl ?? ptlz]=>// pt. by rewrite -ptd_cost_build_from_ptl. - by rewrite -ptd_cost_build_from_ptl. Qed. Lemma next_ptd_iter_cost ptd log_n_steps : match next_ptd_iter ptd log_n_steps with | None => ptd_cost ptd < 2^log_n_steps | Some ptd' => ptd_cost ptd = 2^log_n_steps + ptd_cost ptd' end. Proof. revert ptd. induction log_n_steps as [|log_n_steps IH]=>ptd /=. - assert (Hptd := next_ptd_cost ptd). destruct next_ptd=>//. by rewrite Hptd. - rewrite Nat.add_0_r. assert (IH1 := IH ptd). destruct next_ptd_iter as [ptd'|]. + specialize (IH ptd'). destruct next_ptd_iter as [ptd''|]. * by rewrite IH1 IH -!plus_assoc. * rewrite IH1. by apply plus_lt_compat_l. + by apply lt_plus_trans. Qed. (** We can now prove the completeness of the top-level parsing function. The only thing left to be done is the initialization. To do so, we define the initial dotted parse tree, which depends on a full (top-level) parse tree. *) Variable full_pt : parse_tree (NT (start_nt init)) full_word. Theorem parse_complete log_n_steps: match parse safe init (full_word ++ buffer_end) log_n_steps with | Parsed_pr sem buff => sem = pt_sem full_pt /\ buff = buffer_end /\ pt_size full_pt <= 2^log_n_steps | Timeout_pr => 2^log_n_steps < pt_size full_pt | Fail_pr => False end. Proof. assert (Hstk : ptd_stack_compat (build_pt_dot_from_pt full_pt Top_ptz) []) by by apply ptd_stack_compat_build_from_pt. unfold parse.
assert (Hparse := parse_fix_next_ptd_iter _ _ log_n_steps (parse_subproof init) Hstk). rewrite -ptd_buffer_build_from_pt -sem_build_from_pt /= in Hparse. assert (Hcost := next_ptd_iter_cost (build_pt_dot_from_pt full_pt Top_ptz) log_n_steps). destruct next_ptd_iter. - destruct Hparse as (? & -> & ?). apply (f_equal S) in Hcost. rewrite -ptd_cost_build_from_pt Nat.add_0_r in Hcost. rewrite Hcost. apply le_lt_n_Sm, le_plus_l. - rewrite Hparse. split; [|split]=>//. apply lt_le_S in Hcost. by rewrite -ptd_cost_build_from_pt Nat.add_0_r in Hcost. Qed. End Completeness_Proof. End Make. menhir-20210929/coq-menhirlib/src/Interpreter_correct.v000066400000000000000000000162751412503066000230070ustar00rootroot00000000000000(****************************************************************************) (* *) (* Menhir *) (* *) (* Jacques-Henri Jourdan, CNRS, LRI, Université Paris Sud *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under *) (* the terms of the GNU Lesser General Public License as published by the *) (* Free Software Foundation, either version 3 of the License, or (at your *) (* option) any later version, as described in the file LICENSE. *) (* *) (****************************************************************************) From Coq Require Import List. Import ListNotations. Require Import Alphabet. Require Grammar Automaton Interpreter. From Coq.ssr Require Import ssreflect. Module Make(Import A:Automaton.T) (Import Inter:Interpreter.T A). (** * Correctness of the interpreter **) (** We prove that, in any case, if the interpreter accepts and returns a semantic value, then it is a semantic value of the input. **) Section Init. Variable init:initstate. (** [word_has_stack_semantics] relates a word with a stack, stating that the word is a concatenation of words that have the semantic values stored in the stack. **) Inductive word_has_stack_semantics: forall (word:list token) (stack:stack), Prop := | Nil_stack_whss: word_has_stack_semantics [] [] | Cons_stack_whss: forall (wordq:list token) (stackq:stack), word_has_stack_semantics wordq stackq -> forall (wordt:list token) (s:noninitstate) (pt:parse_tree (last_symb_of_non_init_state s) wordt), word_has_stack_semantics (wordq++wordt) (existT noninitstate_type s (pt_sem pt)::stackq). (** [pop] preserves the invariant **) Lemma pop_spec_ptl A symbols_to_pop action word_stk stk (res : A) stk' : pop_spec symbols_to_pop stk action stk' res -> word_has_stack_semantics word_stk stk -> exists word_stk' word_res (ptl:parse_tree_list symbols_to_pop word_res), (word_stk' ++ word_res = word_stk)%list /\ word_has_stack_semantics word_stk' stk' /\ ptl_sem ptl action = res. Proof. intros Hspec. revert word_stk. induction Hspec as [stk sem|symbols_to_pop st stk action sem stk' res Hspec IH]; intros word_stk Hword_stk. - exists word_stk, [], Nil_ptl. rewrite -app_nil_end. eauto. - inversion Hword_stk. subst_existT. edestruct IH as (word_stk' & word_res & ptl & ? & Hword_stk'' & ?); [eassumption|]. subst. eexists word_stk', (word_res ++ _)%list, (Cons_ptl ptl _). split; [|split]=>//. rewrite app_assoc //. Qed.
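(* As an illustration (a sketch with hypothetical states [s1], [s2] and parse
   trees [pt1], [pt2] for words [w1], [w2]): two applications of
   [Cons_stack_whss] on top of [Nil_stack_whss] derive
     word_has_stack_semantics (w1 ++ w2)
       [existT _ s2 (pt_sem pt2); existT _ s1 (pt_sem pt1)]
   so the head of the stack carries the semantic value of the most recently
   read fragment of the word. *)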
(** [reduce_step] preserves the invariant **) Lemma reduce_step_invariant (stk:stack) (prod:production) Hv Hi word buffer : word_has_stack_semantics word stk -> match reduce_step init stk prod buffer Hv Hi with | Accept_sr sem buffer_new => exists pt : parse_tree (NT (start_nt init)) word, buffer = buffer_new /\ pt_sem pt = sem | Progress_sr stk' buffer_new => buffer = buffer_new /\ word_has_stack_semantics word stk' | Fail_sr => True end. Proof. intros Hword_stk. unfold reduce_step. match goal with | |- context [pop_state_valid init ?stp stk ?x1 ?x2 ?x3 ?x4 ?x5] => generalize (pop_state_valid init stp stk x1 x2 x3 x4 x5) end. destruct pop as [stk' sem] eqn:Hpop=>/= Hv'. apply pop_spec_ok in Hpop. apply pop_spec_ptl with (word_stk := word) in Hpop=>//. destruct Hpop as (word1 & word2 & ptl & <- & Hword1 & <-). generalize (reduce_step_subproof1 init stk prod Hv stk' (fun _ : True => Hv')). destruct goto_table as [[st' EQ]|]. - intros _. split=>//. change (ptl_sem ptl (prod_action prod)) with (pt_sem (Non_terminal_pt prod ptl)). generalize (Non_terminal_pt prod ptl). rewrite ->EQ. intros pt. by constructor. - intros Hstk'. destruct Hword1; [|by destruct Hstk']. generalize (reduce_step_subproof0 init prod [] (fun _ : True => Hstk')). simpl in Hstk'. rewrite -Hstk' // => EQ. rewrite cast_eq. exists (Non_terminal_pt prod ptl). by split. Qed. (** [step] preserves the invariant **) Lemma step_invariant stk word buffer safe Hi : word_has_stack_semantics word stk -> match step safe init stk buffer Hi with | Accept_sr sem buffer_new => exists word_new (pt:parse_tree (NT (start_nt init)) word_new), (word ++ buffer = word_new ++ buffer_new)%buf /\ pt_sem pt = sem | Progress_sr stk_new buffer_new => exists word_new, (word ++ buffer = word_new ++ buffer_new)%buf /\ word_has_stack_semantics word_new stk_new | Fail_sr => True end. Proof. intros Hword_stk. unfold step. generalize (reduce_ok safe (state_of_stack init stk)). destruct action_table as [prod|awt]. - intros Hv. apply (reduce_step_invariant stk prod (fun _ => Hv) Hi word buffer) in Hword_stk. destruct reduce_step=>//. + destruct Hword_stk as (pt & <- & <-); eauto. + destruct Hword_stk as [<- ?]; eauto. - destruct buffer as [tok buffer]=>/=. move=> /(_ (token_term tok)) Hv. destruct (awt (token_term tok)) as [st EQ|prod|]=>//. + eexists _. split; [by apply app_buf_assoc with (l2 := [_])|]. change (token_sem tok) with (pt_sem (Terminal_pt tok)). generalize (Terminal_pt tok). generalize [tok]. rewrite -> EQ=>word' pt /=. by constructor. + apply (reduce_step_invariant stk prod (fun _ => Hv) Hi word (tok::buffer)) in Hword_stk. destruct reduce_step=>//. * destruct Hword_stk as (pt & <- & <-); eauto. * destruct Hword_stk as [<- ?]; eauto. Qed. (** [parse_fix] preserves the invariant **) Lemma parse_fix_invariant stk word buffer safe log_n_steps Hi : word_has_stack_semantics word stk -> match proj1_sig (parse_fix safe init stk buffer log_n_steps Hi) with | Accept_sr sem buffer_new => exists word_new (pt:parse_tree (NT (start_nt init)) word_new), (word ++ buffer = word_new ++ buffer_new)%buf /\ pt_sem pt = sem | Progress_sr stk_new buffer_new => exists word_new, (word ++ buffer = word_new ++ buffer_new)%buf /\ word_has_stack_semantics word_new stk_new | Fail_sr => True end. Proof. revert stk word buffer Hi. induction log_n_steps as [|log_n_steps IH]=>/= stk word buffer Hi Hstk; [by apply step_invariant|]. assert (IH1 := IH stk word buffer Hi Hstk). destruct parse_fix as [[] Hi']=>/=; try by apply IH1. destruct IH1 as (word' & -> & Hstk')=>//. by apply IH.
Qed. (** The interpreter is correct: if it returns a semantic value, then the input word has this semantic value. **) Theorem parse_correct safe buffer log_n_steps: match parse safe init buffer log_n_steps with | Parsed_pr sem buffer_new => exists word_new (pt:parse_tree (NT (start_nt init)) word_new), buffer = (word_new ++ buffer_new)%buf /\ pt_sem pt = sem | _ => True end. Proof. unfold parse. assert (Hparse := parse_fix_invariant [] [] buffer safe log_n_steps (parse_subproof init)). destruct proj1_sig=>//. apply Hparse. constructor. Qed. End Init. End Make. menhir-20210929/coq-menhirlib/src/Main.v000066400000000000000000000065441412503066000176470ustar00rootroot00000000000000(****************************************************************************) (* *) (* Menhir *) (* *) (* Jacques-Henri Jourdan, CNRS, LRI, Université Paris Sud *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under *) (* the terms of the GNU Lesser General Public License as published by the *) (* Free Software Foundation, either version 3 of the License, or (at your *) (* option) any later version, as described in the file LICENSE. *) (* *) (****************************************************************************) Require Grammar Automaton Interpreter_correct Interpreter_complete. From Coq Require Import Syntax Arith. Module Make(Export Aut:Automaton.T). Export Aut.Gram. Export Aut.GramDefs. Module Import Inter := Interpreter.Make Aut. Module Correct := Interpreter_correct.Make Aut Inter. Module Complete := Interpreter_complete.Make Aut Inter. Definition complete_validator:unit->bool := Complete.Valid.is_complete. Definition safe_validator:unit->bool := ValidSafe.is_safe. Definition parse (safe:safe_validator ()=true) init log_n_steps buffer : parse_result (symbol_semantic_type (NT (start_nt init))):= parse (ValidSafe.safe_is_validator safe) init buffer log_n_steps. (** Correctness theorem. **) Theorem parse_correct (safe:safe_validator ()= true) init log_n_steps buffer: match parse safe init log_n_steps buffer with | Parsed_pr sem buffer_new => exists word (pt : parse_tree (NT (start_nt init)) word), buffer = (word ++ buffer_new)%buf /\ pt_sem pt = sem | _ => True end. Proof. apply Correct.parse_correct. Qed. (** Completeness theorem. **) Theorem parse_complete (safe:safe_validator () = true) init log_n_steps word buffer_end: complete_validator () = true -> forall tree:parse_tree (NT (start_nt init)) word, match parse safe init log_n_steps (word ++ buffer_end) with | Fail_pr => False | Parsed_pr sem_res buffer_end_res => sem_res = pt_sem tree /\ buffer_end_res = buffer_end /\ pt_size tree <= 2^log_n_steps | Timeout_pr => 2^log_n_steps < pt_size tree end. Proof. intros. now apply Complete.parse_complete, Complete.Valid.complete_is_validator. Qed. (** Unambiguity theorem. **) Theorem unambiguity: safe_validator () = true -> complete_validator () = true -> inhabited token -> forall init word, forall (tree1 tree2:parse_tree (NT (start_nt init)) word), pt_sem tree1 = pt_sem tree2. Proof. intros Hsafe Hcomp [tok] init word tree1 tree2. pose (buf_end := cofix buf_end := (tok :: buf_end)%buf). assert (Hcomp1 := parse_complete Hsafe init (pt_size tree1) word buf_end Hcomp tree1). assert (Hcomp2 := parse_complete Hsafe init (pt_size tree1) word buf_end Hcomp tree2). destruct parse. - destruct Hcomp1. - exfalso. eapply PeanoNat.Nat.lt_irrefl. etransitivity; [|apply Hcomp1]. eapply Nat.pow_gt_lin_r. constructor. - destruct Hcomp1 as [-> _], Hcomp2 as [-> _]. reflexivity. Qed. End Make.
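(* A hypothetical usage sketch (the names [Aut], [Parser], [Hsafe] and [buf]
   below are not part of this file): given a Menhir-generated automaton
   [Aut : Automaton.T], one would instantiate
     Module Parser := Main.Make Aut.
   and, with a proof [Hsafe : Parser.safe_validator () = true], an initial
   state [init] and an input [buf : Parser.Inter.buffer], run
     Definition res := Parser.parse Hsafe init 50 buf.
   where the fuel value [50] allows up to [2^50] parsing steps. *)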
menhir-20210929/coq-menhirlib/src/Makefile000066400000000000000000000014711412503066000202260ustar00rootroot00000000000000PWD := $(shell pwd) COQINCLUDE := -R . MenhirLib \ export COQINCLUDE .PHONY: all clean install uninstall all: _CoqProject @ $(MAKE) -f Makefile.coq --no-print-directory all _CoqProject: @ $(MAKE) -f Makefile.coq --no-print-directory _CoqProject clean: @ $(MAKE) -f Makefile.coq --no-print-directory clean @ rm -f _CoqProject # The role of DESTDIR is explained here: # https://www.gnu.org/prep/standards/html_node/DESTDIR.html # Basically, it is empty in a normal installation. # A nonempty value can be used to perform a dummy installation # in a different location. CONTRIB = $(shell $(COQBIN)coqc -where | tr -d '\r' | tr '\\' '/')/user-contrib TARGET = $(DESTDIR)$(CONTRIB)/MenhirLib install: all rm -rf $(TARGET) mkdir -p $(TARGET) install -m 644 *.v *.vo *.glob $(TARGET) uninstall: rm -rf $(TARGET) menhir-20210929/coq-menhirlib/src/Makefile.coq000066400000000000000000000126201412503066000210050ustar00rootroot00000000000000############################################################################ # Requirements. # We need bash. We use the pipefail option to control the exit code of a # pipeline. SHELL := /usr/bin/env bash ############################################################################ # Configuration # # # This Makefile relies on the following variables: # ROOTDIR (default: .) # COQBIN (default: empty) # COQINCLUDE (default: empty) # VV (default: *.v) # V_AUX (default: undefined/empty) # SERIOUS (default: 1) # (if 0, we produce .vio files) # (if 1, we produce .vo files in the old way) # VERBOSE (default: undefined) # (if defined, commands are displayed) ifndef ROOTDIR ROOTDIR := . endif ifndef VV VV := $(wildcard $(ROOTDIR)/*.v) endif # Typically, $(VV) should list only the .v files that we are ultimately # interested in checking. $(V_AUX) should list every other .v file in the # project. $(VD) is obtained from $(VV) and $(V_AUX), so [make] sees all # dependencies and can rebuild files anywhere in the project, if needed, and # only if needed. ifndef VD VD := $(patsubst %.v,%.v.d,$(VV) $(V_AUX)) endif VIO := $(patsubst %.v,%.vio,$(VV)) VQ := $(patsubst %.v,%.vq,$(VV)) VO := $(patsubst %.v,%.vo,$(VV)) SERIOUS := 1 ############################################################################ # Binaries COQC := $(COQBIN)coqc $(COQFLAGS) COQDEP := $(COQBIN)coqdep COQIDE := $(COQBIN)coqide $(COQFLAGS) COQCHK := $(COQBIN)coqchk ############################################################################ # Targets .PHONY: all proof depend quick proof_vo proof_vq all: proof ifeq ($(SERIOUS),0) proof: proof_vq else proof: proof_vo endif proof_vq: $(VQ) depend: $(VD) quick: $(VIO) proof_vo: $(VO) ############################################################################ # Verbosity control. # Our commands are pretty long (due, among other things, to the use of # absolute paths everywhere). So, we hide them by default, and echo a short # message instead. However, sometimes one wants to see the command. # By default, VERBOSE is undefined, so the .SILENT directive is read, so no # commands are echoed. If VERBOSE is defined by the user, then the .SILENT # directive is ignored, so commands are echoed, unless they begin with an # explicit @. ifndef VERBOSE .SILENT: endif ############################################################################ # Verbosity filter. # Coq is way too verbose when using one of the -schedule-* commands. 
# So, we grep its output and remove any line that contains 'Checking task'. # We need a pipe that keeps the exit code of the *first* process. In # bash, when the pipefail option is set, the exit code is the logical # conjunction of the exit codes of the two processes. If we make sure # that the second process always succeeds, then we get the exit code # of the first process, as desired. ############################################################################ # Rules # If B uses A, then the dependencies produced by coqdep are: # B.vo: B.v A.vo # B.vio: B.v A.vio %.v.d: %.v $(COQDEP) $(COQINCLUDE) $< > $@ ifeq ($(SERIOUS),0) %.vo: %.vio @echo "Compiling `basename $*`..." set -o pipefail; ( \ $(COQC) $(COQINCLUDE) -schedule-vio2vo 1 $* \ 2>&1 | (grep -v 'Checking task' || true)) # The recipe for producing %.vio destroys %.vo. In other words, we do not # allow a young .vio file to co-exist with an old (possibly out-of-date) .vo # file, because this seems to lead Coq into various kinds of problems # ("inconsistent assumption" errors, "undefined universe" errors, warnings # about the existence of both files, and so on). Destroying %.vo should be OK # as long as the user does not try to build a mixture of .vo and .vio files in # one invocation of make. %.vio: %.v @echo "Digesting `basename $*`..." rm -f $*.vo $(COQC) $(COQINCLUDE) -quick $< %.vq: %.vio @echo "Checking `basename $*`..." set -o pipefail; ( \ $(COQC) $(COQINCLUDE) -schedule-vio-checking 1 $< \ 2>&1 | (grep -v 'Checking task' || true)) touch $@ endif ifeq ($(SERIOUS),1) %.vo: %.v @echo "Compiling `basename $*`..." $(COQC) $(COQINCLUDE) $< # @echo "$(COQC) $(COQINCLUDE) $<" endif _CoqProject: .FORCE @echo $(COQINCLUDE) > $@ .FORCE: ############################################################################ # Dependencies ifeq ($(findstring $(MAKECMDGOALS),depend clean),) -include $(VD) endif ############################################################################ # IDE .PHONY: ide .coqide: @echo '$(COQIDE) $(COQINCLUDE) $$*' > .coqide @chmod +x .coqide ide: _CoqProject $(COQIDE) $(COQINCLUDE) ############################################################################ # Clean .PHONY: clean # In a multi-directory setting, it is not entirely clear how to find the # files that we wish to remove. # One approach would be to view $(VV) as the authoritative list of source files # and remove just the derived files $(VO), etc. # Another approach is to scan all subdirectories of $(ROOTDIR) and remove all # object files in them. We follow this approach. # Be careful to use regular expressions that work both with GNU find # and with BSD find (MacOS). clean:: for d in `find $(ROOTDIR) -type d -not -regex ".*\\.git.*"` ; do \ (cd $$d && \ rm -f *~ && \ rm -f .*.aux && \ rm -f *.{vo,vos,vok,vio,vq,v.d,aux,glob,cache,crashcoqide} && \ rm -rf *.coq-native *.coqide && \ true) ; \ done menhir-20210929/coq-menhirlib/src/Validator_classes.v000066400000000000000000000061751412503066000224250ustar00rootroot00000000000000(****************************************************************************) (* *) (* Menhir *) (* *) (* Jacques-Henri Jourdan, CNRS, LRI, Université Paris Sud *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under *) (* the terms of the GNU Lesser General Public License as published by the *) (* Free Software Foundation, either version 3 of the License, or (at your *) (* option) any later version, as described in the file LICENSE. 
*) (* *) (****************************************************************************) From Coq Require Import List. From Coq.ssr Require Import ssreflect. Require Import Alphabet. Class IsValidator (P : Prop) (b : bool) := is_validator : b = true -> P. Global Hint Mode IsValidator + - : typeclass_instances. Global Instance is_validator_true : IsValidator True true. Proof. done. Qed. Global Instance is_validator_false : IsValidator False false. Proof. done. Qed. Global Instance is_validator_eq_true b : IsValidator (b = true) b. Proof. done. Qed. Global Instance is_validator_and P1 b1 P2 b2 `{IsValidator P1 b1} `{IsValidator P2 b2}: IsValidator (P1 /\ P2) (if b1 then b2 else false). Proof. by split; destruct b1, b2; apply is_validator. Qed. Global Instance is_validator_comparable_leibniz_eq A (C:Comparable A) (x y : A) : ComparableLeibnizEq C -> IsValidator (x = y) (compare_eqb x y). Proof. intros ??. by apply compare_eqb_iff. Qed. Global Instance is_validator_comparable_eq_impl A `(Comparable A) (x y : A) P b : IsValidator P b -> IsValidator (x = y -> P) (if compare_eqb x y then b else true). Proof. intros Hval Val ->. rewrite /compare_eqb compare_refl in Val. auto. Qed. Lemma is_validator_forall_finite A P b `(Finite A) : (forall (x : A), IsValidator (P x) (b x)) -> IsValidator (forall (x : A), P x) (forallb b all_list). Proof. move=> ? /forallb_forall Hb ?. apply is_validator, Hb, all_list_forall. Qed. (* We do not use an instance directly here, because we need somehow to force Coq to instantiate b with a lambda. *) Global Hint Extern 2 (IsValidator (forall x : ?A, _) _) => eapply (is_validator_forall_finite _ _ (fun (x:A) => _)) : typeclass_instances. (* Hint for synthetizing pattern-matching. *) Global Hint Extern 2 (IsValidator (match ?u with _ => _ end) ?b0) => let b := fresh "b" in unshelve notypeclasses refine (let b : bool := _ in _); [destruct u; intros; shelve|]; (* Synthetize `match .. with` in the validator. *) unify b b0; unfold b; destruct u; clear b : typeclass_instances. (* Hint for unfolding definitions. This is necessary because many hints for IsValidator use [Hint Extern], which do not automatically unfold identifiers. *) Global Hint Extern 100 (IsValidator ?X _) => unfold X : typeclass_instances. menhir-20210929/coq-menhirlib/src/Validator_complete.v000066400000000000000000000360331412503066000225740ustar00rootroot00000000000000(****************************************************************************) (* *) (* Menhir *) (* *) (* Jacques-Henri Jourdan, CNRS, LRI, Université Paris Sud *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under *) (* the terms of the GNU Lesser General Public License as published by the *) (* Free Software Foundation, either version 3 of the License, or (at your *) (* option) any later version, as described in the file LICENSE. *) (* *) (****************************************************************************) From Coq Require Import List Syntax Derive. From Coq.ssr Require Import ssreflect. Import ListNotations. Require Automaton. Require Import Alphabet Validator_classes. Module Make(Import A:Automaton.T). (** We instantiate some sets/map. **) Module TerminalComparableM <: ComparableM. Definition t := terminal. Global Instance tComparable : Comparable t := _. End TerminalComparableM. Module TerminalOrderedType := OrderedType_from_ComparableM TerminalComparableM. Module StateProdPosComparableM <: ComparableM. Definition t := (state*production*nat)%type. Global Instance tComparable : Comparable t := _. 
End StateProdPosComparableM. Module StateProdPosOrderedType := OrderedType_from_ComparableM StateProdPosComparableM. Module TerminalSet := FSetAVL.Make TerminalOrderedType. Module StateProdPosMap := FMapAVL.Make StateProdPosOrderedType. (** Nullable predicate for symbols and list of symbols. **) Definition nullable_symb (symbol:symbol) := match symbol with | NT nt => nullable_nterm nt | _ => false end. Definition nullable_word (word:list symbol) := forallb nullable_symb word. (** First predicate for non terminal, symbols and list of symbols, given as FSets. **) Definition first_nterm_set (nterm:nonterminal) := fold_left (fun acc t => TerminalSet.add t acc) (first_nterm nterm) TerminalSet.empty. Definition first_symb_set (symbol:symbol) := match symbol with | NT nt => first_nterm_set nt | T t => TerminalSet.singleton t end. Fixpoint first_word_set (word:list symbol) := match word with | [] => TerminalSet.empty | t::q => if nullable_symb t then TerminalSet.union (first_symb_set t) (first_word_set q) else first_symb_set t end. (** Small helper for finding the part of an item that is after the dot. **) Definition future_of_prod prod dot_pos : list symbol := (fix loop n lst := match n with | O => lst | S x => match loop x lst with [] => [] | _::q => q end end) dot_pos (rev' (prod_rhs_rev prod)). (** We build a fast map to store all the items of all the states. **) Definition items_map (_:unit): StateProdPosMap.t TerminalSet.t := fold_left (fun acc state => fold_left (fun acc item => let key := (state, prod_item item, dot_pos_item item) in let data := fold_left (fun acc t => TerminalSet.add t acc) (lookaheads_item item) TerminalSet.empty in let old := match StateProdPosMap.find key acc with | Some x => x | None => TerminalSet.empty end in StateProdPosMap.add key (TerminalSet.union data old) acc ) (items_of_state state) acc ) all_list (StateProdPosMap.empty TerminalSet.t). (** We need to avoid computing items_map each time we need it. To that purpose, we declare a typeclass specifying that some map is equal to items_map. *) Class IsItemsMap m := is_items_map : m = items_map (). (** Accessor. **) Definition find_items_map items_map state prod dot_pos : TerminalSet.t := match StateProdPosMap.find (state, prod, dot_pos) items_map with | None => TerminalSet.empty | Some x => x end. Definition state_has_future state prod (fut:list symbol) (lookahead:terminal) := exists dot_pos:nat, fut = future_of_prod prod dot_pos /\ TerminalSet.In lookahead (find_items_map (items_map ()) state prod dot_pos). (** Iterator over items. **) Definition forallb_items items_map (P:state -> production -> nat -> TerminalSet.t -> bool): bool:= StateProdPosMap.fold (fun key set acc => match key with (st, p, pos) => (acc && P st p pos set)%bool end ) items_map true. (** Typeclass instances for synthetizing the validator. *) Global Instance is_validator_subset S1 S2 : IsValidator (TerminalSet.Subset S1 S2) (TerminalSet.subset S1 S2). Proof. intros ?. by apply TerminalSet.subset_2. Qed. (* While the specification of the validator always quantify over possible lookahead tokens individually, the validator usually handles lookahead sets directly instead, for better performances. 
For instance, the validator for [state_has_future], which speaks about one single lookahead token is a subset operation: *) Lemma is_validator_state_has_future_subset st prod pos lookahead lset im fut : TerminalSet.In lookahead lset -> fut = future_of_prod prod pos -> IsItemsMap im -> IsValidator (state_has_future st prod fut lookahead) (TerminalSet.subset lset (find_items_map im st prod pos)). Proof. intros ? -> -> HSS%TerminalSet.subset_2. exists pos. split=>//. by apply HSS. Qed. (* We do not declare this lemma as an instance, and use [Hint Extern] instead, because the typeclass mechanism has trouble instantiating some evars if we do not explicitely call [eassumption]. *) Global Hint Extern 2 (IsValidator (state_has_future _ _ _ _) _) => eapply is_validator_state_has_future_subset; [eassumption|eassumption || reflexivity|] : typeclass_instances. (* As said previously, we manipulate lookahead terminal sets instead of lookahead individually. Hence, when we quantify over a lookahead set in the specification, we do not do anything in the executable validator. This instance is used for [non_terminal_closed]. *) Global Instance is_validator_forall_lookahead_set lset P b: (forall lookahead, TerminalSet.In lookahead lset -> IsValidator (P lookahead) b) -> IsValidator (forall lookahead, TerminalSet.In lookahead lset -> P lookahead) b. Proof. unfold IsValidator. firstorder. Qed. (* Dually, we sometimes still need to explicitelly iterate over a lookahead set. This is what this lemma allows. Used only in [end_reduce]. *) Lemma is_validator_iterate_lset P b lookahead lset : TerminalSet.In lookahead lset -> IsValidator P (b lookahead) -> IsValidator P (TerminalSet.fold (fun lookahead acc => if acc then b lookahead else false) lset true). Proof. intros Hlset%TerminalSet.elements_1 Hval Val. apply Hval. revert Val. rewrite TerminalSet.fold_1. generalize true at 1. clear -Hlset. induction Hlset as [? l <-%compare_eq|? l ? IH]=> /= b' Val. - destruct (b lookahead). by destruct b'. exfalso. by induction l; destruct b'. - eauto. Qed. Global Hint Extern 100 (IsValidator _ _) => match goal with | H : TerminalSet.In ?lookahead ?lset |- _ => eapply (is_validator_iterate_lset _ (fun lookahead => _) _ _ H); clear H end : typeclass_instances. (* We often quantify over all the items of all the states of the automaton. This lemma and the accompanying [Hint Resolve] declaration allow generating the corresponding executable validator. Note that it turns out that, in all the uses of this pattern, the first thing we do for each item is pattern-matching over the future. This lemma also embbed this pattern-matching, which makes it possible to get the hypothesis [fut' = future_of_prod prod (S pos)] in the non-nil branch. Moreover, note, again, that while the specification quantifies over lookahead terminals individually, the code provides lookahead sets instead. 
*) Lemma is_validator_forall_items P1 b1 P2 b2 im : IsItemsMap im -> (forall st prod lookahead lset pos, TerminalSet.In lookahead lset -> [] = future_of_prod prod pos -> IsValidator (P1 st prod lookahead) (b1 st prod pos lset)) -> (forall st prod pos lookahead lset s fut', TerminalSet.In lookahead lset -> fut' = future_of_prod prod (S pos) -> IsValidator (P2 st prod lookahead s fut') (b2 st prod pos lset s fut')) -> IsValidator (forall st prod fut lookahead, state_has_future st prod fut lookahead -> match fut with | [] => P1 st prod lookahead | s :: fut' => P2 st prod lookahead s fut' end) (forallb_items im (fun st prod pos lset => match future_of_prod prod pos with | [] => b1 st prod pos lset | s :: fut' => b2 st prod pos lset s fut' end)). Proof. intros -> Hval1 Hval2 Val st prod fut lookahead (pos & -> & Hlookahead). rewrite /forallb_items StateProdPosMap.fold_1 in Val. assert (match future_of_prod prod pos with | [] => b1 st prod pos (find_items_map (items_map ()) st prod pos) | s :: fut' => b2 st prod pos (find_items_map (items_map ()) st prod pos) s fut' end = true). - unfold find_items_map in *. assert (Hfind := @StateProdPosMap.find_2 _ (items_map ()) (st, prod, pos)). destruct StateProdPosMap.find as [lset|]; [|by edestruct (TerminalSet.empty_1); eauto]. specialize (Hfind _ eq_refl). apply StateProdPosMap.elements_1 in Hfind. revert Val. generalize true at 1. induction Hfind as [[? ?] l [?%compare_eq ?]|??? IH]=>?. + simpl in *; subst. match goal with |- _ -> ?X = true => destruct X end; [done|]. rewrite Bool.andb_false_r. clear. induction l as [|[[[??]?]?] l IH]=>//. + apply IH. - destruct future_of_prod eqn:EQ. by eapply Hval1; eauto. eapply Hval2 with (pos := pos); eauto; []. revert EQ. unfold future_of_prod=>-> //. Qed. (* We need a hint to explicitly instantiate b1 and b2 with lambdas. *) Global Hint Extern 0 (IsValidator (forall st prod fut lookahead, state_has_future st prod fut lookahead -> _) _) => eapply (is_validator_forall_items _ (fun st prod pos lset => _) _ (fun st prod pos lset s fut' => _)) : typeclass_instances. (* Used in [start_future] only. *) Global Instance is_validator_forall_state_has_future im st prod : IsItemsMap im -> IsValidator (forall look, state_has_future st prod (rev' (prod_rhs_rev prod)) look) (let lookaheads := find_items_map im st prod 0 in forallb (fun t => TerminalSet.mem t lookaheads) all_list). Proof. move=> -> /forallb_forall Val look. specialize (Val look (all_list_forall _)). exists 0. split=>//. by apply TerminalSet.mem_2. Qed. (** * Validation for completeness **) (** The nullable predicate is a fixpoint : it is correct. **) Definition nullable_stable := forall p:production, if nullable_word (prod_rhs_rev p) then nullable_nterm (prod_lhs p) = true else True. (** The first predicate is a fixpoint : it is correct. **) Definition first_stable:= forall (p:production), TerminalSet.Subset (first_word_set (rev' (prod_rhs_rev p))) (first_nterm_set (prod_lhs p)). (** The initial state has all the S=>.u items, where S is the start non-terminal **) Definition start_future := forall (init:initstate) (p:production), prod_lhs p = start_nt init -> forall (t:terminal), state_has_future init p (rev' (prod_rhs_rev p)) t. (** If a state contains an item of the form A->_.av[[b]], where a is a terminal, then reading an a does a [Shift_act], to a state containing an item of the form A->_.v[[b]]. 
**) Definition terminal_shift := forall (s1:state) prod fut lookahead, state_has_future s1 prod fut lookahead -> match fut with | T t::q => match action_table s1 with | Lookahead_act awp => match awp t with | Shift_act s2 _ => state_has_future s2 prod q lookahead | _ => False end | _ => False end | _ => True end. (** If a state contains an item of the form A->_.[[a]], then either we do a [Default_reduce_act] of the corresponding production, either a is a terminal (ie. there is a lookahead terminal), and reading a does a [Reduce_act] of the corresponding production. **) Definition end_reduce := forall (s:state) prod fut lookahead, state_has_future s prod fut lookahead -> match fut with | [] => match action_table s with | Default_reduce_act p => p = prod | Lookahead_act awt => match awt lookahead with | Reduce_act p => p = prod | _ => False end end | _ => True end. Definition is_end_reduce items_map := forallb_items items_map (fun s prod pos lset => match future_of_prod prod pos with | [] => match action_table s with | Default_reduce_act p => compare_eqb p prod | Lookahead_act awt => TerminalSet.fold (fun lookahead acc => match awt lookahead with | Reduce_act p => (acc && compare_eqb p prod)%bool | _ => false end) lset true end | _ => true end). (** If a state contains an item of the form A->_.Bv[[b]], where B is a non terminal, then the goto table says we have to go to a state containing an item of the form A->_.v[[b]]. **) Definition non_terminal_goto := forall (s1:state) prod fut lookahead, state_has_future s1 prod fut lookahead -> match fut with | NT nt::q => match goto_table s1 nt with | Some (exist _ s2 _) => state_has_future s2 prod q lookahead | None => False end | _ => True end. Definition start_goto := forall (init:initstate), match goto_table init (start_nt init) with | None => True | Some _ => False end. (** Closure property of item sets : if a state contains an item of the form A->_.Bv[[b]], then for each production B->u and each terminal a of first(vb), the state contains an item of the form B->_.u[[a]] **) Definition non_terminal_closed := forall s1 prod fut lookahead, state_has_future s1 prod fut lookahead -> match fut with | NT nt::q => forall p, prod_lhs p = nt -> (if nullable_word q then state_has_future s1 p (future_of_prod p 0) lookahead else True) /\ (forall lookahead2, TerminalSet.In lookahead2 (first_word_set q) -> state_has_future s1 p (future_of_prod p 0) lookahead2) | _ => True end. (** The automaton is complete **) Definition complete := nullable_stable /\ first_stable /\ start_future /\ terminal_shift /\ end_reduce /\ non_terminal_goto /\ start_goto /\ non_terminal_closed. Derive is_complete_0 SuchThat (forall im, IsItemsMap im -> IsValidator complete (is_complete_0 im)) As complete_0_is_validator. Proof. intros im. subst is_complete_0. instantiate (1:=fun im => _). apply _. Qed. Definition is_complete (_:unit) := is_complete_0 (items_map ()). Lemma complete_is_validator : IsValidator complete (is_complete ()). Proof. by apply complete_0_is_validator. Qed. End Make. menhir-20210929/coq-menhirlib/src/Validator_safe.v000066400000000000000000000203251412503066000216770ustar00rootroot00000000000000(****************************************************************************) (* *) (* Menhir *) (* *) (* Jacques-Henri Jourdan, CNRS, LRI, Université Paris Sud *) (* *) (* Copyright Inria. All rights reserved. 
This file is distributed under *) (* the terms of the GNU Lesser General Public License as published by the *) (* Free Software Foundation, either version 3 of the License, or (at your *) (* option) any later version, as described in the file LICENSE. *) (* *) (****************************************************************************) From Coq Require Import List Syntax Derive. Import ListNotations. From Coq.ssr Require Import ssreflect. Require Automaton. Require Import Alphabet Validator_classes. Module Make(Import A:Automaton.T). (** The singleton predicate for states **) Definition singleton_state_pred (state:state) := (fun state' => match compare state state' with Eq => true |_ => false end). (** [past_state_of_non_init_state], extended for all states. **) Definition past_state_of_state (state:state) := match state with | Init _ => [] | Ninit nis => past_state_of_non_init_state nis end. (** Concatenations of last and past **) Definition head_symbs_of_state (state:state) := match state with | Init _ => [] | Ninit s => last_symb_of_non_init_state s::past_symb_of_non_init_state s end. Definition head_states_of_state (state:state) := singleton_state_pred state::past_state_of_state state. (** * Validation for correctness **) (** Prefix predicate between two lists of symbols. **) Inductive prefix: list symbol -> list symbol -> Prop := | prefix_nil: forall l, prefix [] l | prefix_cons: forall l1 l2 x, prefix l1 l2 -> prefix (x::l1) (x::l2). (** [prefix] is transitive **) Lemma prefix_trans: forall (l1 l2 l3:list symbol), prefix l1 l2 -> prefix l2 l3 -> prefix l1 l3. Proof. intros l1 l2 l3 H1 H2. revert l3 H2. induction H1; [now constructor|]. inversion 1. subst. constructor. eauto. Qed. Fixpoint is_prefix (l1 l2:list symbol) := match l1, l2 with | [], _ => true | t1::q1, t2::q2 => (compare_eqb t1 t2 && is_prefix q1 q2)%bool | _::_, [] => false end. Global Instance prefix_is_validator l1 l2 : IsValidator (prefix l1 l2) (is_prefix l1 l2). Proof. revert l2. induction l1 as [|x1 l1 IH]=>l2 Hpref. - constructor. - destruct l2 as [|x2 l2]=>//. move: Hpref=> /andb_prop [/compare_eqb_iff -> /IH ?]. by constructor. Qed. (** If we shift, then the known top symbols of the destination state is a prefix of the known top symbols of the source state, with the new symbol added. **) Definition shift_head_symbs := forall s, match action_table s with | Lookahead_act awp => forall t, match awp t with | Shift_act s2 _ => prefix (past_symb_of_non_init_state s2) (head_symbs_of_state s) | _ => True end | _ => True end. (** When a goto happens, then the known top symbols of the destination state is a prefix of the known top symbols of the source state, with the new symbol added. **) Definition goto_head_symbs := forall s nt, match goto_table s nt with | Some (exist _ s2 _) => prefix (past_symb_of_non_init_state s2) (head_symbs_of_state s) | None => True end. (** We have to say the same kind of checks for the assumptions about the states stack. However, theses assumptions are predicates. So we define a notion of "prefix" over predicates lists, that means, basically, that an assumption entails another **) Inductive prefix_pred: list (state->bool) -> list (state->bool) -> Prop := | prefix_pred_nil: forall l, prefix_pred [] l | prefix_pred_cons: forall l1 l2 f1 f2, (forall x, implb (f2 x) (f1 x) = true) -> prefix_pred l1 l2 -> prefix_pred (f1::l1) (f2::l2). (** [prefix_pred] is transitive **) Lemma prefix_pred_trans: forall (l1 l2 l3:list (state->bool)), prefix_pred l1 l2 -> prefix_pred l2 l3 -> prefix_pred l1 l3. 
Proof. intros l1 l2 l3 H1 H2. revert l3 H2. induction H1 as [|l1 l2 f1 f2 Hf2f1]; [now constructor|]. intros l3. inversion 1 as [|??? f3 Hf3f2]. subst. constructor; [|now eauto]. intros x. specialize (Hf3f2 x). specialize (Hf2f1 x). repeat destruct (_ x); auto. Qed. Fixpoint is_prefix_pred (l1 l2:list (state->bool)) := match l1, l2 with | [], _ => true | f1::q1, f2::q2 => (forallb (fun x => implb (f2 x) (f1 x)) all_list && is_prefix_pred q1 q2)%bool | _::_, [] => false end. Global Instance prefix_pred_is_validator l1 l2 : IsValidator (prefix_pred l1 l2) (is_prefix_pred l1 l2). Proof. revert l2. induction l1 as [|x1 l1 IH]=>l2 Hpref. - constructor. - destruct l2 as [|x2 l2]=>//. move: Hpref=> /andb_prop [/forallb_forall ? /IH ?]. constructor; auto using all_list_forall. Qed. (** The assumptions about state stack is conserved when we shift **) Definition shift_past_state := forall s, match action_table s with | Lookahead_act awp => forall t, match awp t with | Shift_act s2 _ => prefix_pred (past_state_of_non_init_state s2) (head_states_of_state s) | _ => True end | _ => True end. (** The assumptions about state stack is conserved when we do a goto **) Definition goto_past_state := forall s nt, match goto_table s nt with | Some (exist _ s2 _) => prefix_pred (past_state_of_non_init_state s2) (head_states_of_state s) | None => True end. (** What states are possible after having popped these symbols from the stack, given the annotation of the current state ? **) Inductive state_valid_after_pop (s:state): list symbol -> list (state -> bool) -> Prop := | state_valid_after_pop_nil1: forall p pl, p s = true -> state_valid_after_pop s [] (p::pl) | state_valid_after_pop_nil2: forall sl, state_valid_after_pop s sl [] | state_valid_after_pop_cons: forall st sq p pl, state_valid_after_pop s sq pl -> state_valid_after_pop s (st::sq) (p::pl). Fixpoint is_state_valid_after_pop (state:state) (to_pop:list symbol) annot := match annot, to_pop with | [], _ => true | p::_, [] => p state | p::pl, s::sl => is_state_valid_after_pop state sl pl end. Global Instance impl_is_state_valid_after_pop_is_validator state sl pl P b : IsValidator P b -> IsValidator (state_valid_after_pop state sl pl -> P) (if is_state_valid_after_pop state sl pl then b else true). Proof. destruct (is_state_valid_after_pop _ sl pl) eqn:EQ. - intros ???. by eapply is_validator. - intros _ _ Hsvap. exfalso. induction Hsvap=>//; [simpl in EQ; congruence|]. by destruct sl. Qed. (** A state is valid for reducing a production when : - The assumptions on the state are such that we will find the right hand side of the production on the stack. - We will be able to do a goto after having popped the right hand side. **) Definition valid_for_reduce (state:state) prod := prefix (prod_rhs_rev prod) (head_symbs_of_state state) /\ forall state_new, state_valid_after_pop state_new (prod_rhs_rev prod) (head_states_of_state state) -> match goto_table state_new (prod_lhs prod) with | None => match state_new with | Init i => prod_lhs prod = start_nt i | Ninit _ => False end | _ => True end. (** All the states that does a reduce are valid for reduction **) Definition reduce_ok := forall s, match action_table s with | Lookahead_act awp => forall t, match awp t with | Reduce_act p => valid_for_reduce s p | _ => True end | Default_reduce_act p => valid_for_reduce s p end. (** The automaton is safe **) Definition safe := shift_head_symbs /\ goto_head_symbs /\ shift_past_state /\ goto_past_state /\ reduce_ok. 
Derive is_safe SuchThat (IsValidator safe (is_safe ())) As safe_is_validator. Proof. subst is_safe. instantiate (1:=fun _ => _). apply _. Qed. End Make. menhir-20210929/coq-menhirlib/src/Version.v000066400000000000000000000000431412503066000203740ustar00rootroot00000000000000Definition require_20210929 := tt. menhir-20210929/doc/000077500000000000000000000000001412503066000140105ustar00rootroot00000000000000menhir-20210929/doc/.gitignore000066400000000000000000000002331412503066000157760ustar00rootroot00000000000000*.dvi *.aux *.bbl *.blg *.log *.out *.toc *.fdb_latexmk *.fls manual.pdf *.haux *.htoc manual.image.tex manual[0-9][0-9][0-9].png manual.html *.synctex.gz menhir-20210929/doc/Makefile000066400000000000000000000014571412503066000154570ustar00rootroot00000000000000.PHONY: all loop clean export TEXINPUTS=.: DEPS = $(wildcard *.tex) $(wildcard *.bib) $(wildcard *.sty) $(wildcard *.mly) all: manual.pdf manual.html %.pdf: %.tex $(DEPS) pdflatex $* bibtex $* pdflatex $* pdflatex $* manual.html: manual.tex $(DEPS) $(wildcard *.hva) hevea -fix manual.tex # # Hevea interprets 'tabbing' environment in a way # that creates spacing errors in the rendered output # of "textual version of derivation trees": it # asks for (padding:0px;) while the TeX rendering # inserts spacing between columns. Change this # to {padding:1px;} sed -i.bak -e "s/cellpadding0/cellpadding1/" manual.html && rm manual.html.bak # # Note: hevea generates images manual00{1,2,3}.png for the tikz pictures # present in the manual. loop: latexmk -pdf -pvc manual clean: @ rm -f `cat .gitignore` menhir-20210929/doc/declarations-onerrorreduce.mly000066400000000000000000000004051412503066000220560ustar00rootroot00000000000000%token ID ARROW LPAREN RPAREN COLON SEMICOLON %start program %on_error_reduce typ1 %% typ0: ID | LPAREN typ1 RPAREN {} typ1: typ0 | typ0 ARROW typ1 {} declaration: ID COLON typ1 {} program: | LPAREN declaration RPAREN | declaration SEMICOLON {} menhir-20210929/doc/declarations-phantom.mly000066400000000000000000000004731412503066000206530ustar00rootroot00000000000000%token ID ARROW LPAREN RPAREN COLON SEMICOLON %start program %% typ0: ID | LPAREN typ1(RPAREN) RPAREN {} typ1(phantom): typ0 | typ0 ARROW typ1(phantom) {} declaration(phantom): ID COLON typ1(phantom) {} program: | LPAREN declaration(RPAREN) RPAREN | declaration(SEMICOLON) SEMICOLON {} menhir-20210929/doc/declarations.mly000066400000000000000000000003571412503066000172100ustar00rootroot00000000000000%token ID ARROW LPAREN RPAREN COLON SEMICOLON %start program %% typ0: ID | LPAREN typ1 RPAREN {} typ1: typ0 | typ0 ARROW typ1 {} declaration: ID COLON typ1 {} program: | LPAREN declaration RPAREN | declaration SEMICOLON {} menhir-20210929/doc/dune000066400000000000000000000001601412503066000146630ustar00rootroot00000000000000;; Install the man page. (install (section man) (files menhir.1) (package menhir) ) (include dune.manual) menhir-20210929/doc/dune.manual000066400000000000000000000005601412503066000161430ustar00rootroot00000000000000;; This file is concatenated at the end of the file "dune" ;; by the release script, so the documentation is installed ;; only on release branches. ;; The documentation is currently built outside of dune's control ;; by doc/Makefile. (install (section doc) (files manual.pdf manual.html manual001.png manual002.png manual003.png ) (package menhir) ) menhir-20210929/doc/fppdf.sty000066400000000000000000000024401412503066000156500ustar00rootroot00000000000000% This tiny package invokes ``hyperref'' with appropriate options. 
% Three modes are provided: % if \fppdf is defined, we configure ``hyperref'' for PDF output. % otherwise, if WhizzyTeX is active, we do configure ``softref'' for producing DVI output % containing ``advi''-style hyperlinks. % otherwise, we configure nothing. \ProvidesPackage{fppdf} \@ifundefined{fppdf}{ \newcommand{\texorpdfstring}[2]{#1} \newcommand{\href}[2]{#2} \@ifundefined{WhizzyTeX}{ % PostScript output. \typeout{No hyperlinks.} }{ % WhizzyTeX output. \typeout{Hyperlinks in advi style.} % % Dfinissons les commandes \softlink et \softtarget, employes par locallabel, % de faon ce que les labels de preuves deviennent des hyperliens. % \edef\hyper@quote{\string"} \edef\hyper@sharp{\string#} \def \softlink #1#2{\special {html:}#2\special {html:}} \def \softtarget #1#2{\special {html:}#2\special {html:}} } }{ % PDF output. \typeout{Hyperlinks in pdflatex style.} \usepackage[bookmarks=true,bookmarksopen=true,colorlinks=true,linkcolor=blue,citecolor=blue,urlcolor=blue]{hyperref} \let\softlink\hyperlink \let\softtarget\hypertarget } menhir-20210929/doc/hevea.sty000066400000000000000000000057621412503066000156530ustar00rootroot00000000000000% hevea : hevea.sty % This is a very basic style file for latex document to be processed % with hevea. It contains definitions of LaTeX environment which are % processed in a special way by the translator. % Mostly : % - latexonly, not processed by hevea, processed by latex. % - htmlonly , the reverse. % - rawhtml, to include raw HTML in hevea output. % - toimage, to send text to the image file. % The package also provides hevea logos, html related commands (ahref % etc.), void cutting and image commands. \NeedsTeXFormat{LaTeX2e} \ProvidesPackage{hevea}[2002/01/11] \RequirePackage{comment} \newif\ifhevea\heveafalse \@ifundefined{ifimagen}{\newif\ifimagen\imagenfalse} \makeatletter% \newcommand{\heveasmup}[2]{% \raise #1\hbox{$\m@th$% \csname S@\f@size\endcsname \fontsize\sf@size 0% \math@fontsfalse\selectfont #2% }}% \DeclareRobustCommand{\hevea}{H\kern-.15em\heveasmup{.2ex}{E}\kern-.15emV\kern-.15em\heveasmup{.2ex}{E}\kern-.15emA}% \DeclareRobustCommand{\hacha}{H\kern-.15em\heveasmup{.2ex}{A}\kern-.15emC\kern-.1em\heveasmup{.2ex}{H}\kern-.15emA}% \DeclareRobustCommand{\html}{\protect\heveasmup{0.ex}{HTML}} %%%%%%%%% Hyperlinks hevea style \newcommand{\ahref}[2]{{#2}} \newcommand{\ahrefloc}[2]{{#2}} \newcommand{\aname}[2]{{#2}} \newcommand{\ahrefurl}[1]{\texttt{#1}} \newcommand{\footahref}[2]{#2\footnote{\texttt{#1}}} \newcommand{\mailto}[1]{\texttt{#1}} \newcommand{\imgsrc}[2][]{} \newcommand{\home}[1]{\protect\raisebox{-.75ex}{\char126}#1} \AtBeginDocument {\@ifundefined{url} {%url package is not loaded \let\url\ahref\let\oneurl\ahrefurl\let\footurl\footahref} {}} %% Void cutting instructions \newcounter{cuttingdepth} \newcommand{\tocnumber}{} \newcommand{\notocnumber}{} \newcommand{\cuttingunit}{} \newcommand{\cutdef}[2][]{} \newcommand{\cuthere}[2]{} \newcommand{\cutend}{} \newcommand{\htmlhead}[1]{} \newcommand{\htmlfoot}[1]{} \newcommand{\htmlprefix}[1]{} \newenvironment{cutflow}[1]{}{} \newcommand{\cutname}[1]{} \newcommand{\toplinks}[3]{} \newcommand{\setlinkstext}[3]{} \newcommand{\flushdef}[1]{} \newcommand{\footnoteflush}[1]{} %%%% Html only \excludecomment{rawhtml} \newcommand{\rawhtmlinput}[1]{} \excludecomment{htmlonly} %%%% Latex only \newenvironment{latexonly}{}{} \newenvironment{verblatex}{}{} %%%% Image file stuff \def\toimage{\endgroup} \def\endtoimage{\begingroup\def\@currenvir{toimage}} \def\verbimage{\endgroup} 
\def\endverbimage{\begingroup\def\@currenvir{verbimage}} \newcommand{\imageflush}[1][]{} %%% Bgcolor definition \newsavebox{\@bgcolorbin} \newenvironment{bgcolor}[2][] {\newcommand{\@mycolor}{#2}\begin{lrbox}{\@bgcolorbin}\vbox\bgroup} {\egroup\end{lrbox}% \begin{flushleft}% \colorbox{\@mycolor}{\usebox{\@bgcolorbin}}% \end{flushleft}} %%% Style sheets macros, defined as no-ops \newcommand{\newstyle}[2]{} \newcommand{\addstyle}[1]{} \newcommand{\setenvclass}[2]{} \newcommand{\getenvclass}[1]{} \newcommand{\loadcssfile}[1]{} \newenvironment{divstyle}[1]{}{} \newenvironment{cellstyle}[2]{}{} \newif\ifexternalcss %%% Postlude \makeatother menhir-20210929/doc/local.bib000066400000000000000000000165441412503066000155720ustar00rootroot00000000000000@String{acta = "Acta Informatica"} @String{aw = "Addison-Wesley"} @String{cacm = "Communications of the {ACM}"} @String{cc = "Compiler Construction (CC)"} @String{cup = "Cambridge University Press"} @String{entcs = "Electronic Notes in Theoretical Computer Science"} @String{spe = "Software: Practice and Experience"} @String{toplas = "ACM Transactions on Programming Languages and Systems"} @Misc{compcert-github, author = "Xavier Leroy", title = "The {CompCert C} verified compiler", year = "2014", howpublished = "\url{https://github.com/AbsInt/CompCert}", } @Misc{obelisk, author = {L\'elio Brun}, title = {Obelisk}, howpublished = {\url{https://github.com/Lelio-Brun/Obelisk}}, year = {2017}, } @Book{aho-86, author = "Alfred V. Aho and Ravi Sethi and Jeffrey D. Ullman", title = "Compilers: Principles, Techniques, and Tools", publisher = aw, year = "1986", } @Book{appel-tiger-98, author = "Andrew Appel", title = "Modern Compiler Implementation in {ML}", publisher = cup, year = "1998", URL = "http://www.cs.princeton.edu/~appel/modern/ml/", } @Article{bhamidipaty-proebsting-98, author = "Achyutram Bhamidipaty and Todd A. Proebsting", title = "Very Fast {YACC}-Compatible Parsers (For Very Little Effort)", journal = spe, year = "1998", volume = "28", number = "2", pages = "181--190", URL = "http://www.cs.arizona.edu/people/todd/papers/TR95-09.ps", } @Article{dencker-84, author = "Peter Dencker and Karl Dürre and Johannes Heuft", title = "Optimization of parser tables for portable compilers", journal = toplas, volume = "6", number = "4", year = "1984", pages = "546--572", URL = "http://doi.acm.org/10.1145/1780.1802", } @Article{deremer-pennello-82, author = "Frank DeRemer and Thomas Pennello", title = "Efficient Computation of ${LALR}(1)$ Look-Ahead Sets", journal = toplas, volume = "4", number = "4", year = "1982", pages = "615--649", URL = "http://doi.acm.org/10.1145/69622.357187", } @Manual{bison, title = "Bison", author = "Charles Donnelly and Richard Stallman", year = "2015", URL = "http://www.gnu.org/software/bison/manual/", } @Book{hopcroft-motwani-ullman-00, author = "John E. Hopcroft and Rajeev Motwani and Jeffrey D. Ullman", title = "Introduction to Automata Theory, Languages, and Computation", publisher = aw, year = "2000", URL = "http://www-db.stanford.edu/~ullman/ialc.html", } @Article{horspool-faster-90, author = "R. Nigel Horspool and Michael Whitney", title = "Even Faster {LR} Parsing", journal = spe, year = "1990", volume = "20", number = "6", pages = "515--535", URL = "http://www.cs.uvic.ca/~nigelh/Publications/fastparse.pdf", } @Article{jeffery-03, author = "Clinton L. 
Jeffery", title = "Generating {LR} syntax error messages from examples", journal = toplas, volume = "25", number = "5", year = "2003", pages = "631--640", URL = "http://doi.acm.org/10.1145/937563.937566", } @InCollection{johnson-yacc-79, author = "Steven C. Johnson", title = "{Yacc}: Yet Another Compiler Compiler", booktitle = "{UNIX} Programmer's Manual", volume = "2", publisher = "Holt, Rinehart, and Winston", pages = "353--387", year = "1979", URL = "http://dinosaur.compilertools.net/", } @InProceedings{jourdan-leroy-pottier-12, author = "Jacques-Henri Jourdan and François Pottier and Xavier Leroy", title = "Validating ${LR}(1)$ Parsers", year = "2012", booktitle = esop, publisher = springer, series = lncs, volume = "7211", pages = "397--416", URL = "http://gallium.inria.fr/~fpottier/publis/jourdan-leroy-pottier-validating-parsers.pdf", } @Article{klint-laemmel-verhoef-05, author = "Paul Klint and Ralf L{\"a}mmel and Chris Verhoef", title = "Toward an engineering discipline for grammarware", journal = tosem, volume = "14", number = "3", year = "2005", pages = "331--380", URL = "http://www.few.vu.nl/~x/gw/gw.pdf", } @Article{knuth-lr-65, author = "Donald E. Knuth", title = "On the translation of languages from left to right", journal = "Information \& Control", year = "1965", volume = "8", number = "6", pages = "607--639", URL = "http://www.sciencedirect.com/science/article/pii/S0019995865904262", } @Misc{compcert, author = "Xavier Leroy", title = "The {CompCert C} compiler", year = "2015", howpublished = "\url{http://compcert.inria.fr/}", } @Misc{ocaml, author = "Xavier Leroy and Damien Doligez and Alain Frisch and Jacques Garrigue and Didier Rémy and Jérôme Vouillon", title = "The {OCaml} system: documentation and user's manual", year = "2016", URL = "http://caml.inria.fr/", } @Article{pager-77, author = "David Pager", title = "A Practical General Method for Constructing ${LR}(k)$ Parsers", journal = acta, year = "1977", volume = "7", pages = "249--268", URL = "http://dx.doi.org/10.1007/BF00290336", } @InProceedings{pottier-reachability-cc-2016, author = "François Pottier", title = "Reachability and error diagnosis in {LR}(1) parsers", booktitle = cc, year = "2016", pages = "88--98", URL = "http://gallium.inria.fr/~fpottier/publis/fpottier-reachability-cc2016.pdf", } @Article{pottier-regis-gianas-typed-lr, author = "François Pottier and Yann {Régis-Gianas}", title = "Towards efficient, typed {LR} parsers", URL = "http://gallium.inria.fr/~fpottier/publis/fpottier-regis-gianas-typed-lr.pdf", year = "2006", pages = "155--180", journal = entcs, volume = "148", number = "2", } @Manual{tarditi-appel-00, title = "{ML-Yacc} User's Manual", author = "David R. Tarditi and Andrew W. Appel", year = "2000", URL = "http://www.smlnj.org/doc/ML-Yacc/", } @Article{tarjan-yao-79, author = "Robert Endre Tarjan and Andrew Chi-Chih Yao", title = "Storing a sparse table", journal = cacm, volume = "22", number = "11", year = "1979", pages = "606--611", URL = "http://doi.acm.org/10.1145/359168.359175", } @Article{jourdan-pottier-17, author = "Jacques-Henri Jourdan and François Pottier", title = "A simple, possibly correct {LR} parser for {C11}", journal = "ACM Transactions on Programming Languages and Systems", month = aug, year = "2017", volume = "39", number = "4", pages = "14:1--14:36", URL = "http://gallium.inria.fr/~fpottier/publis/jourdan-fpottier-2016.pdf", } menhir-20210929/doc/macros.tex000066400000000000000000000220261412503066000160200ustar00rootroot00000000000000% EBNF syntax. 
\let\nt\textit % Nonterminal. \newcommand{\is}{& ${} ::= {}$ &} \newcommand{\newprod}{\\\hspace{1cm}\barre\hspace{2mm}} \newcommand{\phaprod}{\\\hspace{1cm}\phantom\barre\hspace{2mm}} % Options and choices. \newcommand{\optional}[1]{% \ifmmode [\,#1\,]% \else $[\,\text{#1}\,]$% \fi } \newcommand{\metachoice}{$\,\mid\,$} % Lists. \newcommand{\seplist}[2]{#2#1${}\ldots{}$#1#2} \newcommand{\sepspacelist}[1]{\seplist{\ }{#1}} \newcommand{\sepcommalist}[1]{\seplist{,\ }{#1}} \newcommand{\precseplist}[2]{% \optional{#1} \seplist{\ #1}{#2}% } % Optional parameters. \newcommand{\tuple}[1]{\dlpar\sepcommalist{#1}\drpar} \newcommand{\oparams}[1]{\optional{\tuple{#1}}} % Some nonterminal symbols used in the grammar. \newcommand{\expression}{\nt{expression}\xspace} \newcommand{\expressionsub}[1]{\nt{expression}${}_{#1}$} \newcommand{\pattern}{\nt{pattern}\xspace} % Concrete syntax. \newcommand{\percentpercent}{\kw{\%\%}\xspace} \newcommand{\deuxpoints}{\kw{:}\xspace} \newcommand{\barre}{\kw{\textbar}\xspace} \newcommand{\kangle}[1]{\kw{\textless} #1 \kw{\textgreater}} \newcommand{\ocamltype}{\kangle{\textit{\ocaml type}}\xspace} \newcommand{\ocamlparam}{\kangle{\nt{uid} \deuxpoints \textit{\ocaml module type}}\xspace} \newcommand{\dheader}[1]{\kw{\%\{} #1 \kw{\%\}}} \newcommand{\dtoken}{\kw{\%token}\xspace} \newcommand{\dstart}{\kw{\%start}\xspace} \newcommand{\dtype}{\kw{\%type}\xspace} \newcommand{\dnonassoc}{\kw{\%nonassoc}\xspace} \newcommand{\dleft}{\kw{\%left}\xspace} \newcommand{\dright}{\kw{\%right}\xspace} \newcommand{\dparameter}{\kw{\%parameter}\xspace} \newcommand{\dpublic}{\kw{\%public}\xspace} \newcommand{\dinline}{\kw{\%inline}\xspace} \newcommand{\donerrorreduce}{\kw{\%on\_error\_reduce}\xspace} \newcommand{\dattribute}{\kw{\%attribute}\xspace} \newcommand{\dpaction}[1]{\kw{\{} #1 \kw{\}}\xspace} \newcommand{\daction}{\dpaction{\textit{\ocaml code}}\xspace} \newcommand{\dpfaction}[1]{\kw{<} #1 \kw{>}\xspace} \newcommand{\dpfidentityaction}{\kw{<>}\xspace} \newcommand{\dprec}{\kw{\%prec}\xspace} \newcommand{\dequal}{\kw{\small =}\xspace} \newcommand{\dquestion}{\kw{?}\xspace} \newcommand{\dplus}{\raisebox{2pt}{\kw{\small +}}\xspace} \newcommand{\dcolonequal}{\kw{\small :=}\xspace} \newcommand{\dequalequal}{\kw{\small ==}\xspace} \newcommand{\dstar}{\kw{*}\xspace} \newcommand{\dlpar}{\kw{(}\,\xspace} \newcommand{\drpar}{\,\kw{)}\xspace} \newcommand{\eos}{\kw{\#}\xspace} \newcommand{\dnewline}{\kw{\textbackslash n}\xspace} \newcommand{\dlet}{\kw{let}\xspace} \newcommand{\dsemi}{\kw{;}\xspace} \newcommand{\dunderscore}{\kw{\_}\xspace} \newcommand{\dtilde}{\raisebox{-4pt}{\kw{\textasciitilde}}\xspace} % Stylistic conventions. \newcommand{\kw}[1]{\text{\upshape\sf\bfseries #1}} \newcommand{\inlinesidecomment}[1]{\textit{\textbf{\footnotesize // #1}}} \newcommand{\sidecomment}[1]{\hspace{2cm}\inlinesidecomment{#1}} \newcommand{\docskip}{\vspace{1mm plus 1mm}} \newcommand{\docswitch}[1]{\docskip#1.\hspace{3mm}} \newcommand{\error}{\kw{error}\xspace} % Links to Menhir's repository. 
\newcommand{\repo}[2]{\href{https://gitlab.inria.fr/fpottier/menhir/blob/master/#1}{#2}} \newcommand{\menhirlibconvert}{\repo{lib/Convert.mli}{\texttt{MenhirLib.Convert}}\xspace} \newcommand{\menhirliberrorreports}{\repo{lib/ErrorReports.mli}{\texttt{MenhirLib.ErrorReports}}\xspace} \newcommand{\menhirlibincrementalengine}{\repo{lib/IncrementalEngine.ml}{\texttt{MenhirLib.IncrementalEngine}}\xspace} \newcommand{\menhirliblexerutil}{\repo{lib/LexerUtil.mli}{\texttt{MenhirLib.LexerUtil}}\xspace} \newcommand{\standardmly}{\repo{src/standard.mly}{\texttt{standard.mly}}\xspace} \newcommand{\distrib}[1]{\repo{#1}{\texttt{#1}}} % Links to CompCert's repository. \newcommand{\compcertgithub}{https://github.com/AbsInt/CompCert/tree/master} \newcommand{\compcertgithubfile}[1]{\href{\compcertgithub/#1}{\texttt{#1}}} % Abbreviations. \newcommand{\menhir}{Menhir\xspace} \newcommand{\menhirlib}{\texttt{MenhirLib}\xspace} \newcommand{\menhirsdk}{\texttt{MenhirSdk}\xspace} \newcommand{\coqmenhirlib}{\texttt{MenhirLib}\xspace} % for now \newcommand{\menhirinterpreter}{\texttt{MenhirInterpreter}\xspace} \newcommand{\cmenhir}{\texttt{menhir}\xspace} \newcommand{\ml}{\texttt{.ml}\xspace} \newcommand{\mli}{\texttt{.mli}\xspace} \newcommand{\mly}{\texttt{.mly}\xspace} \newcommand{\cmly}{\texttt{.cmly}\xspace} \newcommand{\vy}{\texttt{.vy}\xspace} \newcommand{\ocaml}{OCaml\xspace} \newcommand{\ocamlc}{\texttt{ocamlc}\xspace} \newcommand{\ocamlopt}{\texttt{ocamlopt}\xspace} \newcommand{\ocamldep}{\texttt{ocamldep}\xspace} \newcommand{\make}{\texttt{make}\xspace} \newcommand{\omake}{\texttt{omake}\xspace} \newcommand{\ocamlbuild}{\texttt{ocamlbuild}\xspace} \newcommand{\dune}{\texttt{dune}\xspace} \newcommand{\Makefile}{\texttt{Makefile}\xspace} \newcommand{\yacc}{\texttt{yacc}\xspace} \newcommand{\bison}{\texttt{bison}\xspace} \newcommand{\ocamlyacc}{\texttt{ocamlyacc}\xspace} \newcommand{\ocamllex}{\texttt{ocamllex}\xspace} \newcommand{\token}{\texttt{token}\xspace} \newcommand{\automaton}{\texttt{.automaton}\xspace} \newcommand{\automatonresolved}{\texttt{.automaton.resolved}\xspace} \newcommand{\conflicts}{\texttt{.conflicts}\xspace} \newcommand{\dott}{\texttt{.dot}\xspace} \newcommand{\legacy}{\texttt{legacy}\xspace} \newcommand{\simplified}{\texttt{simplified}\xspace} % Environments. \newcommand{\question}[1]{\vspace{3mm}$\diamond$ \textbf{#1}} % Ocamlweb settings. \newcommand{\basic}[1]{\textit{#1}} \let\ocwkw\kw \let\ocwbt\basic \let\ocwupperid\basic \let\ocwlowerid\basic \let\ocwtv\basic \newcommand{\ocwbar}{\vskip 2mm plus 2mm \hrule \vskip 2mm plus 2mm} \newcommand{\tcup}{${}\cup{}$} \newcommand{\tcap}{${}\cap{}$} \newcommand{\tminus}{${}\setminus{}$} % Command line options. \newcommand{\oo}[1]{\texttt{-{}-#1}\xspace} \newcommand{\obase}{\oo{base}} \newcommand{\ocanonical}{\oo{canonical}} % undocumented! 
\newcommand{\ocomment}{\oo{comment}} \newcommand{\ocmly}{\oo{cmly}} \newcommand{\odepend}{\oo{depend}} \newcommand{\orawdepend}{\oo{raw-depend}} \newcommand{\odump}{\oo{dump}} \newcommand{\odumpresolved}{\oo{dump-resolved}} \newcommand{\oerrorrecovery}{\oo{error-recovery}} \newcommand{\oexplain}{\oo{explain}} \newcommand{\oexternaltokens}{\oo{external-tokens}} \newcommand{\ofixedexc}{\oo{fixed-exception}} \newcommand{\ograph}{\oo{graph}} \newcommand{\oignoreone}{\oo{unused-token}} \newcommand{\oignoreall}{\oo{unused-tokens}} \newcommand{\oignoreprec}{\oo{unused-precedence-levels}} \newcommand{\oinfer}{\oo{infer}} \newcommand{\oinferwrite}{\oo{infer-write-query}} \newcommand{\oinferread}{\oo{infer-read-reply}} \newcommand{\oinferprotocolsupported}{\oo{infer-protocol-supported}} \newcommand{\oinspection}{\oo{inspection}} \newcommand{\ointerpret}{\oo{interpret}} \newcommand{\ointerpretshowcst}{\oo{interpret-show-cst}} \newcommand{\ologautomaton}{\oo{log-automaton}} \newcommand{\ologcode}{\oo{log-code}} \newcommand{\ologgrammar}{\oo{log-grammar}} \newcommand{\onodollars}{\oo{no-dollars}} \newcommand{\onoinline}{\oo{no-inline}} \newcommand{\onostdlib}{\oo{no-stdlib}} \newcommand{\oocamlc}{\oo{ocamlc}} \newcommand{\oocamldep}{\oo{ocamldep}} \newcommand{\oonlypreprocess}{\oo{only-preprocess}} \newcommand{\oonlytokens}{\oo{only-tokens}} \newcommand{\orequirealiases}{\oo{require-aliases}} \newcommand{\ostrict}{\oo{strict}} \newcommand{\osuggestcomp}{\oo{suggest-comp-flags}} \newcommand{\osuggestlinkb}{\oo{suggest-link-flags-byte}} \newcommand{\osuggestlinko}{\oo{suggest-link-flags-opt}} \newcommand{\osuggestmenhirlib}{\oo{suggest-menhirLib}} \newcommand{\osuggestocamlfind}{\oo{suggest-ocamlfind}} \newcommand{\otable}{\oo{table}} \newcommand{\otimings}{\oo{timings}} \newcommand{\otimingsto}{\oo{timings-to}} \newcommand{\otrace}{\oo{trace}} \newcommand{\ostdlib}{\oo{stdlib}} \newcommand{\oversion}{\oo{version}} \newcommand{\ocoq}{\oo{coq}} \newcommand{\ocoqlibpath}{\oo{coq-lib-path}} \newcommand{\ocoqlibnopath}{\oo{coq-lib-no-path}} \newcommand{\ocoqnocomplete}{\oo{coq-no-complete}} \newcommand{\ocoqnoactions}{\oo{coq-no-actions}} \newcommand{\ocoqnoversioncheck}{\oo{coq-no-version-check}} \newcommand{\olisterrors}{\oo{list-errors}} \newcommand{\ointerpreterror}{\oo{interpret-error}} \newcommand{\ocompileerrors}{\oo{compile-errors}} \newcommand{\ocompareerrors}{\oo{compare-errors}} \newcommand{\oupdateerrors}{\oo{update-errors}} \newcommand{\oechoerrors}{\oo{echo-errors}} \newcommand{\oechoerrorsconcrete}{\oo{echo-errors-concrete}} \newcommand{\omergeerrors}{\oo{merge-errors}} \newcommand{\ostrategy}{\oo{strategy}} \newcommand{\orandomseed}{\oo{random-seed}} \newcommand{\orandomselfinit}{\oo{random-self-init}} \newcommand{\orandomsentencelength}{\oo{random-sentence-length}} \newcommand{\orandomsentence}{\oo{random-sentence}} \newcommand{\orandomsentenceconcrete}{\oo{random-sentence-concrete}} % The .messages file format. \newcommand{\messages}{\text{\tt .messages}\xspace} % Adding mathstruts to ensure a common baseline. \newcommand{\mycommonbaseline}{ \let\oldnt\nt \renewcommand{\nt}[1]{$\mathstrut$\oldnt{##1}} \let\oldbasic\basic \renewcommand{\basic}[1]{$\mathstrut$\oldbasic{##1}} } % Position keywords. \newcommand{\ksymbolstartpos}{\texttt{\$symbolstartpos}\xspace} menhir-20210929/doc/manual.html000066400000000000000000013573301412503066000161670ustar00rootroot00000000000000 Menhir Reference Manual (version 20210929)

Menhir Reference Manual
(version 20210929)

François Pottier and Yann Régis-Gianas
INRIA
{Francois.Pottier, Yann.Regis-Gianas}@inria.fr

1  Foreword

Menhir is a parser generator. It turns high-level grammar specifications, decorated with semantic actions expressed in the OCaml programming language [18], into parsers, again expressed in OCaml. It is based on Knuth’s LR(1) parser construction technique [15]. It is strongly inspired by its precursors: yacc [11], ML-Yacc [22], and ocamlyacc [18], but offers a large number of minor and major improvements that make it a more modern tool.

This brief reference manual explains how to use Menhir. It does not attempt to explain context-free grammars, parsing, or the LR technique. Readers who have never used a parser generator are encouraged to read about these ideas first [1,2,8]. They are also invited to have a look at the demos directory in Menhir’s distribution.

Potential users of Menhir should be warned that Menhir’s feature set is not completely stable. There is a tension between preserving a measure of compatibility with ocamlyacc, on the one hand, and introducing new ideas, on the other hand. Some aspects of the tool, such as the error handling mechanism, are still potentially subject to incompatible changes: for instance, in the future, the current error handling mechanism (which is based on the error token, see §10) could be removed and replaced with an entirely different mechanism.

There is room for improvement in the tool and in this reference manual. Bug reports and suggestions are welcome!

2  Usage

Menhir is invoked as follows:

menhir option … option filename … filename

Each of the file names must end with .mly (unless --coq is used, in which case it must end with .vy) and denotes a partial grammar specification. These partial grammar specifications are joined (§5.1) to form a single, self-contained grammar specification, which is then processed. The following optional command line switches allow controlling many aspects of the process.

--base basename.  This switch controls the base name of the .ml and .mli files that are produced. That is, the tool will produce files named basename.ml and basename.mli. Note that basename can contain occurrences of the / character, so it really specifies a path and a base name. When only one filename is provided on the command line, the default basename is obtained by depriving filename of its final .mly suffix. When multiple file names are provided on the command line, no default base name exists, so that the --base switch must be used.
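
For instance, the following hypothetical invocation joins two partial grammar specifications and produces parser.ml and parser.mli:

menhir --base parser tokens.mly grammar.mly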

--cmly.  This switch causes Menhir to produce a .cmly file in addition to its normal operation. This file contains a (binary-form) representation of the grammar and automaton (see §13.1).

--comment.  This switch causes a few comments to be inserted into the OCaml code that is written to the .ml file.

--compare-errors filename1 --compare-errors filename2.  Two such switches must always be used in conjunction so as to specify the names of two .messages files, filename1 and filename2. Each file is read and internally translated to a mapping of states to messages. Menhir then checks that the left-hand mapping is a subset of the right-hand mapping. This feature is typically used in conjunction with --list-errors to check that filename2 is complete (that is, covers all states where an error can occur). For more information, see §11.
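
As a sketch of this workflow (all file names here are hypothetical), assuming that grammar.auto.messages was generated with --list-errors, the following invocation checks that the hand-written file grammar.messages covers every error state:

menhir grammar.mly --compare-errors grammar.auto.messages --compare-errors grammar.messages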

--compile-errors filename.  This switch causes Menhir to read the file filename, which must obey the .messages file format, and to compile it to an OCaml function that maps a state number to a message. The OCaml code is sent to the standard output channel. At the same time, Menhir checks that the collection of input sentences in the file filename is correct and irredundant. For more information, see §11.
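
For example (again with hypothetical file names), the generated code is typically captured in a module of its own:

menhir grammar.mly --compile-errors grammar.messages > parserMessages.ml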

--coq.  This switch causes Menhir to produce Coq code. See §12.
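
For instance, assuming a Coq-flavored grammar specification named parser.vy, the hypothetical invocation below produces the Coq file parser.v:

menhir --coq parser.vy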

--coq-lib-path path.  This switch allows specifying under what name (or path) the Coq support library is known to Coq. When Menhir runs in --coq mode, the generated parser contains references to several modules in this library. This path is used to qualify these references. Its default value is MenhirLib.

--coq-lib-no-path.  This switch indicates that references to the Coq library MenhirLib should not be qualified. This was the default behavior of Menhir prior to 2018/05/30. This switch is provided for compatibility, but normally should not be used.

--coq-no-actions.  (Used in conjunction with --coq.) This switch causes the semantic actions present in the .vy file to be ignored and replaced with tt, the unique inhabitant of Coq’s unit type. This feature can be used to test the Coq back-end with a standard grammar, that is, a grammar that contains OCaml semantic actions. Just rename the file from .mly to .vy and set this switch.

--coq-no-complete.  (Used in conjunction with --coq.) This switch disables the generation of the proof of completeness of the parser (§12). This can be necessary because the proof of completeness is possible only if the grammar has no conflict (not even a benign one, in the sense of §6.1). This can be desirable also because, for a complex grammar, completeness may require a heavy certificate and its validation by Coq may take time.

--coq-no-version-check.  (Used in conjunction with --coq.) This switch prevents the generation of the check that verifies that the versions of Menhir and MenhirLib match.

--depend.  See §14.

--dump.  This switch causes a description of the automaton to be written to the file basename.automaton. This description is written after benign conflicts have been resolved, before severe conflicts are resolved (§6), and before extra reductions are introduced (§4.1.8).

--dump-resolved.  This switch causes a description of the automaton to be written to the file basename.automaton.resolved. This description is written after all conflicts have been resolved (§6) and after extra reductions have been introduced (§4.1.8).

--echo-errors filename.  This switch causes Menhir to read the .messages file filename and to produce on the standard output channel just the input sentences. (That is, all messages, blank lines, and comments are filtered out.) For more information, see §11.

--echo-errors-concrete filename.  This switch causes Menhir to read the .messages file filename and to produce on the standard output channel just the input sentences. Each sentence is followed with a comment of the form ## Concrete syntax: ... that shows this sentence in concrete syntax. This comment is printed only if the user has defined an alias for every token (§4.1.3).

--explain.  This switch causes conflict explanations to be written to the file basename.conflicts. See also §6.

--external-tokens T.  This switch causes the definition of the token type to be omitted in basename.ml and basename.mli. Instead, the generated parser relies on the type T.token, where T is an OCaml module name. It is up to the user to define module T and to make sure that it exports a suitable token type. Module T can be hand-written. It can also be automatically generated out of a grammar specification using the --only-tokens switch.
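For instance (a hypothetical sketch), one might compile the parser with --external-tokens Tokens and provide a hand-written file tokens.ml along these lines:

(* tokens.ml: a hand-written token module. Per --external-tokens,
   the module Tokens must export a type named token. *)
type token =
  | INT of int   (* an integer literal, carrying its value *)
  | PLUS
  | EOF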

--fixed-exception.  This switch causes the exception Error to be internally defined as a synonym for Parsing.Parse_error. This means that an exception handler that catches Parsing.Parse_error will also catch the generated parser’s Error. This helps increase Menhir’s compatibility with ocamlyacc. There is otherwise no reason to use this switch.

--graph.  This switch causes a description of the grammar’s dependency graph to be written to the file basename.dot. The graph’s vertices are the grammar’s nonterminal symbols. There is a directed edge from vertex A to vertex B if the definition of A refers to B. The file is in a format that is suitable for processing by the graphviz toolkit.

--infer, --infer-write-query, --infer-read-reply.  See §14.

--inspection.  This switch requires --table. It causes Menhir to generate not only the monolithic and incremental APIs (§9.1, §9.2), but also the inspection API (§9.3). Activating this switch causes a few more tables to be produced, resulting in somewhat larger code size.

--interpret.  This switch causes Menhir to act as an interpreter, rather than as a compiler. No OCaml code is generated. Instead, Menhir reads sentences off the standard input channel, parses them, and displays outcomes. This switch can be usefully combined with --trace. For more information, see §8.

--interpret-error.  This switch is analogous to --interpret, except Menhir expects every sentence to cause an error on its last token, and displays information about the state in which the error is detected, in the .messages file format. For more information, see §11.

--interpret-show-cst.  This switch, used in conjunction with --interpret, causes Menhir to display a concrete syntax tree when a sentence is successfully parsed. For more information, see §8.

--list-errors.  This switch causes Menhir to produce (on the standard output channel) a complete list of input sentences that cause an error, in the .messages file format. For more information, see §11.

--log-automaton level.  When level is nonzero, this switch causes some information about the automaton to be logged to the standard error channel.

--log-code level.  When level is nonzero, this switch causes some information about the generated OCaml code to be logged to the standard error channel.

--log-grammar level.  When level is nonzero, this switch causes some information about the grammar to be logged to the standard error channel. When level is 2, the nullable, FIRST, and FOLLOW tables are displayed.

--merge-errors filename1 --merge-errors filename2.  Two such switches must always be used in conjunction so as to specify the names of two .messages files, filename1 and filename2. This command causes Menhir to merge these two .messages files and print the result on the standard output channel. For more information, see §11.

--no-dollars.  This switch disallows the use of positional keywords of the form $i.

--no-inline.  This switch causes all %inline keywords in the grammar specification to be ignored. This is especially useful in order to understand whether these keywords help solve any conflicts.

--no-stdlib.  This switch instructs Menhir to not use its standard library (§5.4).

--ocamlc command.  See §14.

--ocamldep command.  See §14.

--only-preprocess.  This switch causes the grammar specifications to be transformed up to the point where the automaton’s construction can begin. The grammar specifications whose names are provided on the command line are joined (§5.1); all parameterized nonterminal symbols are expanded away (§5.2); type inference is performed, if --infer is enabled; all nonterminal symbols marked %inline are expanded away (§5.3). This yields a single, monolithic grammar specification, which is printed on the standard output channel.

--only-tokens.  This switch causes the %token declarations in the grammar specification to be translated into a definition of the token type, which is written to the files basename.ml and basename.mli. No code is generated. This is useful when a single set of tokens is to be shared between several parsers. The directory demos/calc-two contains a demo that illustrates the use of this switch.

--random-seed seed.  This switch allows the user to set a random seed. This seed influences the random sentence generator.

--random-self-init.  This switch asks Menhir to choose a random seed in a nondeterministic (system-dependent) way. This seed influences the random sentence generator.

--random-sentence-length length.  This switch allows the user to set a goal length for the random sentence generator. The generated sentences will normally have length at most length.

--random-sentence symbol.  This switch asks Menhir to produce and display a random sentence that is generated by the nonterminal symbol symbol. The sentence is displayed as a sequence of terminal symbols, separated with spaces. Each terminal symbol is represented by its name.

The generated sentence is valid with respect to the grammar. If the grammar is in the class LR(1) (that is, if it has no conflicts at all), then the generated sentence is also accepted by the automaton. However, if the grammar has conflicts, then it may be the case that the sentence is rejected by the automaton.

The distribution of sentences is not uniform; some sentences (or fragments of sentences) may be more likely to appear than others.

The productions that involve the error pseudo-token are ignored by the random sentence generator.

--random-sentence-concrete symbol.  This switch asks Menhir to produce and display a random sentence that is generated by the nonterminal symbol symbol. The sentence is displayed as a sequence of terminal symbols, separated with spaces. Each terminal symbol is represented by its token alias (§4.1.3). This assumes that a token alias has been defined for every token.

--raw-depend.  See §14.

--require-aliases.  This switch causes Menhir to check that a token alias (§4.1.3) has been defined for every token. There is no requirement for this alias to be actually used; it must simply exist. A missing alias gives rise to a warning (and, in --strict mode, to an error).

--stdlib directory.  This switch exists only for backwards compatibility and is ignored. It may be removed in the future.

--strategy strategy.  This switch selects an error handling strategy, to be used by the code back-end, the table back-end, and the reference interpreter. The available strategies are legacy and simplified. (However, at the time of writing, the code back-end does not yet support the simplified strategy.) When this switch is omitted, the legacy strategy is used. The choice of a strategy matters only if the grammar uses the error token. For more details, see §10.

--strict.  This switch causes several warnings about the grammar and about the automaton to be considered errors. This includes warnings about useless precedence declarations, non-terminal symbols that produce the empty language, unreachable non-terminal symbols, productions that are never reduced, conflicts that are not resolved by precedence declarations, end-of-stream conflicts, and missing token aliases.

--suggest-*.  See §14.

--table.  This switch causes Menhir to use its table-based back-end, as opposed to its (default) code-based back-end. When --table is used, Menhir produces significantly more compact and somewhat slower parsers. See §16 for a speed comparison.

The table-based back-end produces rather compact tables, which are analogous to those produced by yacc, bison, or ocamlyacc. These tables are not quite stand-alone: they are exploited by an interpreter, which is shipped as part of the support library MenhirLib. For this reason, when --table is used, MenhirLib must be made visible to the OCaml compilers, and must be linked into your executable program. The --suggest-* switches, described above, help do this.

The code-based back-end compiles the LR automaton directly into a nest of mutually recursive OCaml functions. In that case, MenhirLib is not required.

The incremental API (§9.2) and the inspection API (§9.3) are made available only by the table-based back-end.

--timings.  This switch causes internal timing information to be sent to the standard error channel.

--timings-to filename.  This switch causes internal timing information to be written to the file filename.

--trace.  This switch causes tracing code to be inserted into the generated parser, so that, when the parser is run, its actions are logged to the standard error channel. This is analogous to ocamlrun’s p=1 parameter, except this switch must be enabled at compile time: one cannot selectively enable or disable tracing at runtime.

--unused-precedence-levels.  This switch suppresses all warnings about useless %left, %right, %nonassoc and %prec declarations.

--unused-token symbol.  This switch suppresses the warning that is normally emitted when Menhir finds that the terminal symbol symbol is unused.

--unused-tokens.  This switch suppresses all of the warnings that are normally emitted when Menhir finds that some terminal symbols are unused.

--update-errors filename.  This switch causes Menhir to read the .messages file filename and to produce on the standard output channel a new .messages file that is identical, except the auto-generated comments have been re-generated. For more information, see §11.

--version.  This switch causes Menhir to print its own version number and exit.

3  Lexical conventions

A semicolon character (;) may appear after a declaration (§4.1).

An old-style rule (§4.2) may be terminated with a semicolon. Also, within an old-style rule, each producer (§4.2.3) may be terminated with a semicolon.

A new-style rule (§4.3) must not be terminated with a semicolon. Within such a rule, the elements of a sequence must be separated with semicolons.

Semicolons are not allowed to appear anywhere except in the places mentioned above. This is in contrast with ocamlyacc, which views semicolons as insignificant, just like whitespace.

Identifiers (id) coincide with OCaml identifiers, except they are not allowed to contain the quote (') character. Following OCaml, identifiers that begin with a lowercase letter (lid) or with an uppercase letter (uid) are distinguished.

A quoted identifier qid is a string enclosed in double quotes. Such a string cannot contain a double quote or a backslash. Quoted identifiers are used as token aliases (§4.1.3).

Comments are C-style (surrounded with /* and */, cannot be nested), C++-style (announced by // and extending until the end of the line), or OCaml-style (surrounded with (* and *), can be nested). Of course, inside OCaml code, only OCaml-style comments are allowed.

OCaml type expressions are surrounded with < and >. Within such expressions, all references to type constructors (other than the built-in list, option, etc.) must be fully qualified.

4  Syntax of grammar specifications


specification ::= declaration … declaration %% rule … rule [ %% OCaml code ]
declaration ::= %{ OCaml code %}
  ∣  %parameter < uid : OCaml module type >
  ∣  %token [ < OCaml type > ] uid [ qid ] … uid [ qid ]
  ∣  %nonassoc uid … uid
  ∣  %left uid … uid
  ∣  %right uid … uid
  ∣  %type < OCaml type > lid … lid
  ∣  %start [ < OCaml type > ] lid … lid
  ∣  %attribute actual … actual attribute … attribute
  ∣  % attribute
  ∣  %on_error_reduce lid … lid
attribute ::= [@ name payload ]
old syntax:
rule ::= [ %public ] [ %inline ] lid [ ( id, …, id ) ] : [ | ] group | … | group
group ::= production | … | production { OCaml code } [ %prec id ]
production ::= producer … producer [ %prec id ]
producer ::= [ lid = ] actual
actual ::= id [ ( actual, …, actual ) ]
  ∣  actual ( ?  ∣  +  ∣  * )
  ∣  group | … | group
new syntax:
rule ::= [ %public ] let lid [ ( id, …, id ) ] ( :=  ∣  == ) expression
expression ::= [ | ] expression | … | expression
  ∣  [ pattern = ] expression ; expression
  ∣  id [ ( expression, …, expression ) ]
  ∣  expression ( ?  ∣  +  ∣  * )
  ∣  { OCaml code } [ %prec id ]
  ∣  < OCaml id > [ %prec id ]
pattern ::= lid  ∣  _  ∣  ~  ∣  ( pattern, …, pattern )
Figure 1: Syntax of grammar specifications

The syntax of grammar specifications appears in Figure 1. The places where attributes can be attached are not shown; they are documented separately (§13.2). A grammar specification begins with a sequence of declarations (§4.1), ended by a mandatory %% keyword. Following this keyword, a sequence of rules is expected. Each rule defines a nonterminal symbol lid, whose name must begin with a lowercase letter. A rule is expressed either in the “old syntax” (§4.2) or in the “new syntax” (§4.3), which is slightly more elegant and powerful.

4.1  Declarations

4.1.1  Headers

A header is a piece of OCaml code, surrounded with %{ and %}. It is copied verbatim at the beginning of the .ml file. It typically contains OCaml open directives and function definitions for use by the semantic actions. If a single grammar specification file contains multiple headers, their order is preserved. However, when two headers originate in distinct grammar specification files, the order in which they are copied to the .ml file is unspecified.

It is important to note that the header is copied by Menhir only to the .ml file, not to the .mli file. Therefore, it should not contain declarations that affect the meaning of the types that appear in the .mli file. Here are two problems that people commonly run into:

  • Placing an open directive that is required for a %type declaration to make sense. For instance, writing open Foo in the header and declaring %type<t> bar, where the type t is defined in the module Foo, will not work. You must write %type<Foo.t> bar.
  • Declaring a module alias that affects a (declared or inferred) type. For instance, writing module F = Foo in the header and declaring %type<Foo.t> bar may not work (from 2020/05/25 on). The reason is, OCaml may infer that the symbol bar has type F.t, and Menhir will rely on this information without realizing that F is a local name, so in the end, the .mli file contains a reference to F.t that does not make sense.
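To illustrate, here is a sketch (where Foo, t, and bar are hypothetical names) that avoids both pitfalls: the open directive is confined to the .ml file, and the %type declaration uses a fully qualified type:

%{
  (* This header is copied to the .ml file only. *)
  open Foo
%}
%type <Foo.t> bar  /* fully qualified, so it also makes sense in the .mli file */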

4.1.2  Parameters

A declaration of the form:

%parameter < uid : OCaml module type >

causes the entire parser to become parameterized over the OCaml module uid, that is, to become an OCaml functor. The directory demos/calc-param contains a demo that illustrates the use of this switch.
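As a sketch, such a declaration might look as follows (the module type shown here is hypothetical):

%parameter <Semantics : sig
  type number
  val inject: int -> number
  val add: number -> number -> number
end>

The generated parser is then a functor that must be applied to an implementation of this module type before it can be used.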

If a single specification file contains multiple %parameter declarations, their order is preserved, so that the module name uid introduced by one declaration is effectively in scope in the declarations that follow. When two %parameter declarations originate in distinct grammar specification files, the order in which they are processed is unspecified. Last, %parameter declarations take effect before %{ … %}, %token, %type, or %start declarations are considered, so that the module name uid introduced by a %parameter declaration is effectively in scope in all %{ … %}, %token, %type, or %start declarations, regardless of whether they precede or follow the %parameter declaration. This means, in particular, that the side effects of an OCaml header are observed only when the functor is applied, not when it is defined.

4.1.3  Tokens

A declaration of the form:

%token [ < OCaml type > ] uid1 [ qid1 ] … uidn [ qidn ]

defines the identifiers uid1, …, uidn as tokens, that is, as terminal symbols in the grammar specification and as data constructors in the token type.

If an OCaml type t is present, then these tokens are considered to carry a semantic value of type t, otherwise they are considered to carry no semantic value.

If a quoted identifier qidi is present, then it is considered an alias for the terminal symbol uidi. (This feature, known as “token aliases”, is borrowed from Bison.) Throughout the grammar, the quoted identifier qidi is then synonymous with the identifier uidi. For example, if one declares:

%token PLUS "+"

then the quoted identifier "+" stands for the terminal symbol PLUS throughout the grammar. An example of the use of token aliases appears in the directory demos/calc-alias. Token aliases can be used to improve the readability of a grammar. One must keep in mind, however, that they are just syntactic sugar: they are not interpreted in any way by Menhir or conveyed to tools like ocamllex. They could be considered confusing by a reader who mistakenly believes that they are interpreted as string literals.
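For instance, the two rules below are then equivalent ways of writing the same thing (the symbol expr is hypothetical):

expr:
   |  e1 = expr; PLUS; e2 = expr { e1 + e2 }

/* equivalently, using the token alias: */
expr:
   |  e1 = expr; "+"; e2 = expr { e1 + e2 }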

4.1.4  Priority and associativity

A declaration of one of the following forms:

%nonassoc uid1 … uidn
%left uid1 … uidn
%right uid1 … uidn

assigns both a priority level and an associativity status to the symbols uid1, …, uidn. The priority level assigned to uid1, …, uidn is not defined explicitly: instead, it is defined to be higher than the priority level assigned by the previous %nonassoc, %left, or %right declaration, and lower than that assigned by the next %nonassoc, %left, or %right declaration. The symbols uid1, …, uidn can be tokens (defined elsewhere by a %token declaration) or dummies (not defined anywhere). Both can be referred to as part of %prec annotations. Associativity status and priority levels allow shift/reduce conflicts to be silently resolved (§6).
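For instance, in a hypothetical grammar of arithmetic expressions, one might write:

%left PLUS MINUS      /* both tokens at the same priority level, left-associative */
%left TIMES DIV       /* a higher priority level, because this line comes later */

so that TIMES and DIV bind tighter than PLUS and MINUS.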

4.1.5  Types

A declaration of the form:

%type < OCaml type > lid1 … lidn

assigns an OCaml type to each of the nonterminal symbols lid1, …, lidn. For start symbols, providing an OCaml type is mandatory, but is usually done as part of the %start declaration. For other symbols, it is optional. Providing type information can improve the quality of OCaml’s type error messages.

A %type declaration may concern not only a nonterminal symbol, such as, say, expression, but also a fully applied parameterized nonterminal symbol, such as list(expression) or separated_list(COMMA, option(expression)).

The types provided as part of %type declarations are copied verbatim to the .ml and .mli files. In contrast, headers (§4.1.1) are copied to the .ml file only. For this reason, the types provided as part of %type declarations must make sense both in the presence and in the absence of these headers. They should typically be fully qualified types.
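For instance, both of the following declarations are valid (the type Syntax.expr and the symbols are hypothetical); note the fully qualified types:

%type <Syntax.expr> expression
%type <Syntax.expr list> separated_list(COMMA, expression)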

4.1.6  Start symbols

A declaration of the form:

%start [ < OCaml type > ] lid1 … lidn

declares the nonterminal symbols lid1, …, lidn to be start symbols. Each such symbol must be assigned an OCaml type either as part of the %start declaration or via separate %type declarations. Each of lid1, …, lidn becomes the name of a function whose signature is published in the .mli file and that can be used to invoke the parser.
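For instance, under the declaration %start <Syntax.program> main (where Syntax.program is a hypothetical type), the generated .mli file publishes, in the monolithic API, a function of the following signature, where token is the token type:

val main: (Lexing.lexbuf -> token) -> Lexing.lexbuf -> Syntax.program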

4.1.7  Attribute declarations

Attribute declarations of the form %attribute actual … actual attribute … attribute and % attribute are explained in §13.2.

4.1.8  Extra reductions on error

A declaration of the form:

%on_error_reduce lid1 … lidn

marks the nonterminal symbols lid1, …, lidn as potentially eligible for reduction when an invalid token is found. This may cause one or more extra reduction steps to be performed before the error is detected.
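For instance, the declaration below (where expression is a hypothetical nonterminal symbol) makes expression eligible for such extra reductions:

%on_error_reduce expression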

More precisely, this declaration affects the automaton as follows. Let us say that a production lid → … is “reducible on error” if its left-hand symbol lid appears in a %on_error_reduce declaration. After the automaton has been constructed and after any conflicts have been resolved, in every state s, the following algorithm is applied:

  1. Construct the set of all productions that are ready to be reduced in state s and are reducible on error;
  2. Test if one of them, say p, has higher “on-error-reduce-priority” than every other production in this set;
  3. If so, in state s, replace every error action with a reduction of the production p. (In other words, for every terminal symbol t, if the action table says: “in state s, when the next input symbol is t, fail”, then this entry is replaced with: “in state s, when the next input symbol is t, reduce production p”.)

If step 3 above is executed in state s, then an error can never be detected in state s, since all error actions in state s are replaced with reduce actions. Error detection is deferred: at least one reduction takes place before the error is detected. It is a “spurious” reduction: in a canonical LR(1) automaton, it would not take place.

An %on_error_reduce declaration does not affect the language that is accepted by the automaton. It does not affect the location where an error is detected. It is used to control in which state an error is detected. If used wisely, it can make errors easier to report, because they are detected in a state for which it is easier to write an accurate diagnostic message (§11.3).

Like a %type declaration, an %on_error_reduce declaration may concern not only a nonterminal symbol, such as, say, expression, but also a fully applied parameterized nonterminal symbol, such as list(expression) or separated_list(COMMA, option(expression)).

The “on-error-reduce-priority” of a production is that of its left-hand symbol. The “on-error-reduce-priority” of a nonterminal symbol is determined implicitly by the order of %on_error_reduce declarations. In the declaration %on_error_reduce lid1 … lidn, the symbols lid1, …, lidn have the same “on-error-reduce-priority”. They have higher “on-error-reduce-priority” than the symbols listed in previous %on_error_reduce declarations, and lower “on-error-reduce-priority” than those listed in later %on_error_reduce declarations.

4.2  Rules—old syntax

In its simplest form, a rule begins with the nonterminal symbol lid, followed by a colon character (:), and continues with a sequence of production groups (§4.2.1). Each production group is preceded with a vertical bar character (|); the very first bar is optional. The meaning of the bar is choice: the nonterminal symbol lid develops to either of the production groups. We defer explanations of the keyword %public (§5.1), of the keyword %inline (§5.3), and of the optional formal parameters ( id, …, id ) (§5.2).
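Here is a minimal example of a rule in the old syntax, with two production groups (all symbols are hypothetical):

expression:
   |  i = INT { i }
   |  LPAREN; e = expression; RPAREN { e }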

4.2.1  Production groups

In its simplest form, a production group consists of a single production (§4.2.2), followed by an OCaml semantic action (§4.2.1) and an optional %prec annotation (§4.2.1). A production specifies a sequence of terminal and nonterminal symbols that should be recognized, and optionally binds identifiers to their semantic values.

Semantic actions

A semantic action is a piece of OCaml code that is executed in order to assign a semantic value to the nonterminal symbol with which this production group is associated. A semantic action can refer to the (already computed) semantic values of the terminal or nonterminal symbols that appear in the production via the semantic value identifiers bound by the production.

For compatibility with ocamlyacc, semantic actions can also refer to unnamed semantic values via positional keywords of the form $1, $2, etc. This style is discouraged. (It is in fact forbidden if --no-dollars is turned on.) Furthermore, as a positional keyword of the form $i is internally rewritten as _i, the user should not use identifiers of the form _i.

%prec annotations

An annotation of the form %prec id indicates that the precedence level of the production group is the level assigned to the symbol id via a previous %nonassoc, %left, or %right declaration (§4.1.4). In the absence of a %prec annotation, the precedence level assigned to each production is the level assigned to the rightmost terminal symbol that appears in it. It is undefined if the rightmost terminal symbol has an undefined precedence level or if the production mentions no terminal symbols at all. The precedence level assigned to a production is used when resolving shift/reduce conflicts (§6).

Multiple productions in a group

If multiple productions are present in a single group, then the semantic action and precedence annotation are shared between them. This short-hand effectively allows several productions to share a semantic action and precedence annotation without requiring textual duplication. It is legal only when every production binds exactly the same set of semantic value identifiers and when no positional semantic value keywords ($1, etc.) are used.
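For instance, in the following rule (all symbols hypothetical), the two productions form a single group and share the semantic action { e }; this is legal because both productions bind exactly the identifier e:

expression_in_brackets:
   |  LPAREN; e = expression; RPAREN
   |  LBRACE; e = expression; RBRACE { e }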

4.2.2  Productions

A production is a sequence of producers (§4.2.3), optionally followed by a %prec annotation (§4.2.1). If a precedence annotation is present, it applies to this production alone, not to other productions in the production group. It is illegal for a production and its production group to both carry %prec annotations.

4.2.3  Producers

A producer is an actual (§4.2.4), optionally preceded with a binding of a semantic value identifier, of the form lid =. The actual specifies which construction should be recognized and how a semantic value should be computed for that construction. The identifier lid, if present, becomes bound to that semantic value in the semantic action that follows. Otherwise, the semantic value can be referred to via a positional keyword ($1, etc.).

4.2.4  Actuals

In its simplest form, an actual is just a terminal or nonterminal symbol id. If it is a parameterized non-terminal symbol (see §5.2), then it should be applied: id ( actual, …, actual ).

An actual may be followed with a modifier (?, +, or *). This is explained further on (see §5.2 and Figure 2).

An actual may also be an “anonymous rule”. In that case, one writes just the rule’s right-hand side, which takes the form group || group. (This form is allowed only as an argument in an application.) This form is expanded on the fly to a definition of a fresh non-terminal symbol, which is declared %inline. For instance, providing an anonymous rule as an argument to list:

list (  e = expression; SEMICOLON { e }  )

is equivalent to writing this:

list (  expression_SEMICOLON  )

where the non-terminal symbol expression_SEMICOLON is chosen fresh and is defined as follows:

%inline expression_SEMICOLON:
   |  e = expression; SEMICOLON { e }

4.3  Rules—new syntax

Please be warned that the new syntax is considered experimental and is subject to change in the future.

In its simplest form, a rule takes the form let lid := expression. Its left-hand side lid is a nonterminal symbol; its right-hand side is an expression. Such a rule defines an ordinary nonterminal symbol, while the alternate form let lid == expression defines an %inline nonterminal symbol (§5.3), that is, a macro. A rule can be preceded with the keyword %public (§5.1) and can be parameterized with a tuple of formal parameters ( id, …, id ) (§5.2). The various forms of expressions, listed in Figure 1, are:

  • A choice between several expressions, [ | ] expression1 | … | expressionn. The leading bar is optional.
  • A sequence of two expressions, pattern = expression1 ; expression2. The semantic value produced by expression1 is decomposed according to the pattern pattern. The OCaml variables introduced by pattern may appear in a semantic action that ends the sequence expression2.
  • A sequence ~ = id1 ; expression2, which is sugar for id1 = id1 ; expression2. This is a pun.
  • A sequence expression1 ; expression2, which is sugar for _ = expression1 ; expression2.
  • A symbol id, possibly applied to a tuple of expressions (  expression1, …, expressionn  ). It is worth noting that such an expression can form the end of a sequence: id at the end of a sequence stands for x = id ; { x } for some fresh variable x. Thus, a sequence need not end with a semantic action.
  • An expression followed with ?, +, or *. This is sugar for the previous form: see §5.2 and Figure 2.
  • A semantic action { OCaml code } , possibly followed with a precedence annotation %prec id. This OCaml code can refer to the variables that have been bound earlier in the sequence that this semantic action ends. These include all variables named by the user as well as all variables introduced by a ~ pattern as part of a pun. The notation $i, where i is an integer, is forbidden.
  • A point-free semantic action < OCaml id >, possibly followed with a precedence annotation %prec id. The OCaml identifier id must denote a function or a data constructor. It is applied to a tuple of the variables that have been bound earlier in the sequence that this semantic action ends. Thus, < id > is sugar for { id (x1, …, xn) }, where x1, …, xn are the variables bound earlier. These include all variables named by the user as well as all variables introduced by a ~ pattern.
  • An identity semantic action <>. This is sugar for < identity >, where identity is OCaml’s identity function. Therefore, it is sugar for { (x1, …, xn) }, where x1, …, xn are the variables bound earlier.

The syntax of expressions, as presented in Figure 1, seems more permissive than it really is. In reality, a choice cannot be nested inside a sequence; a sequence cannot be nested in the left-hand side of a sequence; a semantic action cannot appear in the left-hand side of a sequence. (Thus, there is a stratification in three levels: choice expressions, sequence expressions, and atomic expressions, which corresponds roughly to the stratification of rules, productions, and producers in the old syntax.) Furthermore, an expression between parentheses (  expression  ) is not a valid expression. To surround an expression with parentheses, one must write either midrule  (  expression  ) or endrule  (  expression  ) ; see §5.4 and Figure 3.

When a complex expression (e.g., a choice or a sequence) is placed in parentheses, as in id (  expression  ), this is equivalent to using id (  s ) , where the fresh symbol s is declared as a synonym for this expression, via the declaration let s == expression. This idiom is also known as an anonymous rule (§4.2.4).

Examples

As an example of a rule in the new syntax, the parameterized nonterminal symbol option, which is part of Menhir’s standard library (§5.4), can be defined as follows:

let option(x) :=
  | { None }
  | x = x ; { Some x }

Using a pun, it can also be written as follows:

let option(x) :=
  | { None }
  | ~ = x ; { Some x }

Using a pun and a point-free semantic action, it can also be expressed as follows:

let option(x) :=
  | { None }
  | ~ = x ; < Some >

As another example, the parameterized symbol delimited, also part of Menhir’s standard library (§5.4), can be defined in the new syntax as follows:

let delimited(opening, x, closing) ==
  opening ; ~ = x ; closing ; <>

The use of == indicates that this is a macro, i.e., an %inline nonterminal symbol (see §5.3). The identity semantic action <> is here synonymous with { x }.

Other illustrations of the new syntax can be found in the directories demos/calc-new-syntax and demos/calc-ast.

5  Advanced features

5.1  Splitting specifications over multiple files

Modules

Grammar specifications can be split over multiple files. When Menhir is invoked with multiple argument file names, it considers each of these files as a partial grammar specification, and joins these partial specifications in order to obtain a single, complete specification.

This feature is intended to promote a form of modularity. It is hoped that, by splitting large grammar specifications into several “modules”, they can be made more manageable. It is also hoped that this mechanism, in conjunction with parameterization (§5.2), will promote sharing and reuse. It should be noted, however, that this is only a weak form of modularity. Indeed, partial specifications cannot be independently processed (say, checked for conflicts). It is necessary to first join them, so as to form a complete grammar specification, before any kind of grammar analysis can be done.

This mechanism is, in fact, how Menhir’s standard library (§5.4) is made available: even though its name does not appear on the command line, it is automatically joined with the user’s explicitly-provided grammar specifications, making the standard library’s definitions globally visible.

A partial grammar specification, or module, contains declarations and rules, just like a complete one: there is no visible difference. Of course, it can consist of only declarations, or only rules, if the user so chooses. (Don’t forget the mandatory %% keyword that separates declarations and rules. It must be present, even if one of the two sections is empty.)

Private and public nonterminal symbols

It should be noted that joining is not a purely textual process. If two modules happen to define a nonterminal symbol by the same name, then it is considered, by default, that this is an accidental name clash. In that case, each of the two nonterminal symbols is silently renamed so as to avoid the clash. In other words, by default, a nonterminal symbol defined in module A is considered private, and cannot be defined again, or referred to, in module B.

Naturally, it is sometimes desirable to define a nonterminal symbol N in module A and to refer to it in module B. This is permitted if N is public, that is, if either its definition carries the keyword %public or N is declared to be a start symbol. A public nonterminal symbol is never renamed, so it can be referred to by modules other than its defining module.

In fact, it is permitted to split the definition of a public nonterminal symbol, over multiple modules and/or within a single module. That is, a public nonterminal symbol N can have multiple definitions, within one module and/or in distinct modules. All of these definitions are joined using the choice (|) operator. For instance, in the grammar of a programming language, the definition of the nonterminal symbol expression could be split into multiple modules, where one module groups the expression forms that have to do with arithmetic, one module groups those that concern function definitions and function calls, one module groups those that concern object definitions and method calls, and so on.

Tokens aside

Another use of modularity consists in placing all %token declarations in one module, and the actual grammar specification in another module. The module that contains the token definitions can then be shared, making it easier to define multiple parsers that accept the same type of tokens. (On this topic, see demos/calc-two.)
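For instance (file names and tokens are hypothetical), the shared module could be a file tokens.mly that contains only declarations; it would then be joined with each parser via an invocation such as menhir tokens.mly parser.mly --base parser:

/* tokens.mly: token declarations only. The %% keyword remains mandatory. */
%token <int> INT
%token PLUS TIMES EOF
%%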

5.2  Parameterizing rules

A rule (that is, the definition of a nonterminal symbol) can be parameterized over an arbitrary number of symbols, which are referred to as formal parameters.

Example

For instance, here is the definition of the parameterized nonterminal symbol option, taken from the standard library (§5.4):

%public option(X):
   |  { None }
   |  x = X { Some x }

This definition states that option(X) expands to either the empty string, producing the semantic value None, or to the string X, producing the semantic value Some x, where x is the semantic value of X. In this definition, the symbol X is abstract: it stands for an arbitrary terminal or nonterminal symbol. The definition is made public, so option can be referred to within client modules.

A client who wishes to use option simply refers to it, together with an actual parameter – a symbol that is intended to replace X. For instance, here is how one might define a sequence of declarations, preceded with optional commas:

declarations:
   |  { [] }
   |  ds = declarations; option(COMMA); d = declaration { d :: ds }

This definition states that declarations expands either to the empty string or to declarations followed by an optional comma followed by declaration. (Here, COMMA is presumably a terminal symbol.) When this rule is encountered, the definition of option is instantiated: that is, a copy of the definition, where COMMA replaces X, is produced. Things behave exactly as if one had written:

optional_comma:
   |  { None }
   |  x = COMMA { Some x }
declarations:
   |  { [] }
   |  ds = declarations; optional_comma; d = declaration { d :: ds }

Note that, even though COMMA presumably has been declared as a token with no semantic value, writing x = COMMA is legal, and binds x to the unit value. This design choice ensures that the definition of option makes sense regardless of the nature of X: that is, X can be instantiated with a terminal symbol, with or without a semantic value, or with a nonterminal symbol.

Parameterization in general

In general, the definition of a nonterminal symbol N can be parameterized with an arbitrary number of formal parameters. When N is referred to within a production, it must be applied to the same number of actuals. In general, an actual is:

  • either a single symbol, which can be a terminal symbol, a nonterminal symbol, or a formal parameter;
  • or an application of such a symbol to a number of actuals.

For instance, here is a rule whose single production consists of a single producer, which contains several, nested actuals. (This example is discussed again in §5.4.)

plist(X):
   |  xs = loption(delimited(LPAREN, separated_nonempty_list(COMMA, X), RPAREN)) { xs }

actual?  is syntactic sugar for option(actual)
actual+  is syntactic sugar for nonempty_list(actual)
actual*  is syntactic sugar for list(actual)
Figure 2: Syntactic sugar for simulating regular expressions, also known as EBNF

Applications of the parameterized nonterminal symbols option, nonempty_list, and list, which are defined in the standard library (§5.4), can be written using a familiar, regular-expression like syntax (Figure 2).

Higher-order parameters

A formal parameter can itself expect parameters. For instance, here is a rule that defines the syntax of procedures in an imaginary programming language:

procedure(list):
   |  PROCEDURE ID list(formal) SEMICOLON block SEMICOLON {}

This rule states that the token ID, which represents the name of the procedure, should be followed with a list of formal parameters. (The definitions of the nonterminal symbols formal and block are not shown.) However, because list is a formal parameter, as opposed to a concrete nonterminal symbol defined elsewhere, this definition does not specify how the list is laid out: which token, if any, is used to separate, or terminate, list elements? is the list allowed to be empty? and so on. A more concrete notion of procedure is obtained by instantiating the formal parameter list: for instance, procedure(plist), where plist is the parameterized nonterminal symbol defined earlier, is a valid application.
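For instance, this application could appear in a production as follows (a hypothetical sketch):

program:
   |  procedure(plist) {}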

Consistency

Definitions and uses of parameterized nonterminal symbols are checked for consistency before they are expanded away. In short, it is checked that, wherever a nonterminal symbol is used, it is supplied with actual arguments in appropriate number and of appropriate nature. This guarantees that expansion of parameterized definitions terminates and produces a well-formed grammar as its outcome.

5.3  Inlining

It is well-known that the following grammar of arithmetic expressions does not work as expected: that is, in spite of the priority declarations, it has shift/reduce conflicts.

%token < int > INT
%token PLUS TIMES
%left PLUS
%left TIMES
 
%%
 
expression:
   |  i = INT { i }
   |  e = expression; o = op; f = expression { o e f }
op:
   |  PLUS { ( + ) }
   |  TIMES { ( * ) }

The trouble is, the precedence level of the production expression → expression op expression is undefined, and there is no sensible way of defining it via a %prec declaration, since the desired level really depends upon the symbol that was recognized by op: was it PLUS or TIMES?

The standard workaround is to abandon the definition of op as a separate nonterminal symbol, and to inline its definition into the definition of expression, like this:

expression:
   |  i = INT { i }
   |  e = expression; PLUS; f = expression { e + f }
   |  e = expression; TIMES; f = expression { e * f }

This avoids the shift/reduce conflict, but gives up some of the original specification’s structure, which, in realistic situations, can be harmful. Fortunately, Menhir offers a way of avoiding the conflict without manually transforming the grammar, by declaring that the nonterminal symbol op should be inlined:

expression:
   |  i = INT { i }
   |  e = expression; o = op; f = expression { o e f }
%inline op:
   |  PLUS { ( + ) }
   |  TIMES { ( * ) }

The %inline keyword causes all references to op to be replaced with its definition. In this example, the definition of op involves two productions, one that develops to PLUS and one that expands to TIMES, so every production that refers to op is effectively turned into two productions, one that refers to PLUS and one that refers to TIMES. After inlining, op disappears and expression has three productions: that is, the result of inlining is exactly the manual workaround shown above.

In some situations, inlining can also help recover a slight efficiency margin. For instance, the definition:

%inline plist(X):
   |  xs = loption(delimited(LPAREN, separated_nonempty_list(COMMA, X), RPAREN)) { xs }

effectively makes plist(X) an alias for the right-hand side loption(…). Without the %inline keyword, the language recognized by the grammar would be the same, but the LR automaton would probably have one more state and would perform one more reduction at run time.

The %inline keyword does not affect the computation of positions (§7). The same positions are computed, regardless of where %inline keywords are placed.

If the semantic actions have side effects, the %inline keyword can affect the order in which these side effects take place. In the example of op and expression above, if for some reason the semantic action associated with op has a side effect (such as updating a global variable, or printing a message), then, by inlining op, we delay this side effect, which takes place after the second operand has been recognized, whereas in the absence of inlining it takes place as soon as the operator has been recognized.

5.4  The standard library


Name                              Recognizes                          Produces                     Comment

endrule(X)                        X                                   α, if X : α                  (inlined)
midrule(X)                        X                                   α, if X : α

option(X)                         є | X                               α option, if X : α           (also X?)
ioption(X)                        є | X                               α option, if X : α           (inlined)
boption(X)                        є | X                               bool
loption(X)                        є | X                               α list, if X : α list

pair(X, Y)                        X Y                                 α×β, if X : α and Y : β
separated_pair(X, sep, Y)         X sep Y                             α×β, if X : α and Y : β
preceded(opening, X)              opening X                           α, if X : α
terminated(X, closing)            X closing                           α, if X : α
delimited(opening, X, closing)    opening X closing                   α, if X : α

list(X)                           a possibly empty sequence of X’s    α list, if X : α             (also X*)
nonempty_list(X)                  a nonempty sequence of X’s          α list, if X : α             (also X+)
separated_list(sep, X)            a possibly empty sequence of X’s    α list, if X : α
                                  separated with sep’s
separated_nonempty_list(sep, X)   a nonempty sequence of X’s          α list, if X : α
                                  separated with sep’s

rev(X)                            X                                   α list, if X : α list        (inlined)
flatten(X)                        X                                   α list, if X : α list list   (inlined)
append(X, Y)                      X Y                                 α list, if X, Y : α list     (inlined)
Figure 3: Summary of the standard library; see standard.mly for details

Once equipped with a rudimentary module system (§5.1), parameterization (§5.2), and inlining (§5.3), it is straightforward to propose a collection of commonly used definitions, such as options, sequences, lists, and so on. This standard library is joined, by default, with every grammar specification. A summary of the nonterminal symbols offered by the standard library appears in Figure 3. See also the short-hands documented in Figure 2.

By relying on the standard library, a client module can concisely define more elaborate notions. For instance, the following rule:

%inline plist(X):
   |  xs = loption(delimited(LPAREN, separated_nonempty_list(COMMA, X), RPAREN)) { xs }

causes plist(X) to recognize a list of X’s, where the empty list is represented by the empty string, and a non-empty list is delimited with parentheses and comma-separated.

The standard library is stored in a file named standard.mly, which is embedded inside Menhir when it is built. The command line switch --no-stdlib instructs Menhir to not load the standard library.

The meaning of the symbols defined in the standard library (Figure 3) should be clear in most cases. Yet, the symbols endrule(X) and midrule(X) deserve an explanation. Both take an argument X, which typically will be instantiated with an anonymous rule (§4.2.4). Both are defined as a synonym for X. In both cases, this allows placing an anonymous subrule in the middle of a rule.

For instance, the following is a well-formed production:

  cat    endrule(dog    { OCaml code1 })    cow    { OCaml code2 }

This production consists of three producers, namely cat and endrule(dog { OCaml code1 }) and cow, and a semantic action { OCaml code2 }. Because endrule(X) is declared as an %inline synonym for X, the expansion of anonymous rules (§4.2.4), followed with the expansion of %inline symbols (§5.3), transforms the above production into the following:

  cat    dog    cow    { OCaml code1; OCaml code2 }

Note that OCaml code1 moves to the end of the rule, which means that this code is executed only after cat, dog and cow have been recognized. In this example, the use of endrule is rather pointless, as the expanded code is more concise and clearer than the original code. Still, endrule can be useful when its actual argument is an anonymous rule with multiple branches.

midrule is used in exactly the same way as endrule, but its expansion is different. For instance, the following is a well-formed production:

  cat    midrule({ OCaml code1 })    cow    { OCaml code2 }

(There is no dog in this example; this is intentional.) Because midrule(X) is a synonym for X, but is not declared %inline, the expansion of anonymous rules (§4.2.4), followed with the expansion of %inline symbols (§5.3), transforms the above production into the following:

  cat    xxx    cow    { OCaml code2 }

where the fresh nonterminal symbol xxx is separately defined by the rule xxx: { OCaml code1 } . Thus, xxx recognizes the empty string, and as soon as it is recognized, OCaml code1 is executed. This is known as a “mid-rule action”.
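A typical use of midrule, sketched below with hypothetical helper functions, is to perform a side effect at a well-defined point in the middle of a production, e.g. to open a scope before the body is parsed:

block:
   |  LBRACE; midrule({ enter_scope () }); ds = declarations; RBRACE
        { exit_scope (); ds }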

6  Conflicts

When a shift/reduce or reduce/reduce conflict is detected, it is classified as either benign, if it can be resolved by consulting user-supplied precedence declarations, or severe, if it cannot. Benign conflicts are not reported. Severe conflicts are reported and, if the --explain switch is on, explained.

6.1  When is a conflict benign?

A shift/reduce conflict involves a single token (the one that one might wish to shift) and one or more productions (those that one might wish to reduce). When such a conflict is detected, the precedence levels (§4.1.4, §4.2.1) of these entities are looked up and compared as follows:

  1. if only one production is involved, and if it has higher priority than the token, then the conflict is resolved in favor of reduction.
  2. if only one production is involved, and if it has the same priority as the token, then the associativity status of the token is looked up:
    1. if the token was declared nonassociative, then the conflict is resolved in favor of neither action, that is, a syntax error will be signaled if this token shows up when this production is about to be reduced;
    2. if the token was declared left-associative, then the conflict is resolved in favor of reduction;
    3. if the token was declared right-associative, then the conflict is resolved in favor of shifting.
  3. if multiple productions are involved, and if, considered one by one, they all cause the conflict to be resolved in the same way (that is, either in favor of shifting, or in favor of neither), then the conflict is resolved in that way.

In either of these cases, the conflict is considered benign. Otherwise, it is considered severe. Note that a reduce/reduce conflict is always considered severe, unless it happens to be subsumed by a benign multi-way shift/reduce conflict (item 3 above).

6.2  How are severe conflicts explained?

When the --dump switch is on, a description of the automaton is written to the .automaton file. Severe conflicts are shown as part of this description. Fortunately, there is also a way of understanding conflicts in terms of the grammar, rather than in terms of the automaton. When the --explain switch is on, a textual explanation is written to the .conflicts file.

Not all conflicts are explained in this file: instead, only one conflict per automaton state is explained. This is done partly in the interest of brevity, but also because Pager’s algorithm can create artificial conflicts in a state that already contains a true LR(1) conflict; thus, one cannot hope in general to explain all of the conflicts that appear in the automaton. As a result of this policy, once all conflicts explained in the .conflicts file have been fixed, one might need to run Menhir again to produce yet more conflict explanations.


%token IF THEN ELSE
%start < expression > expression
 
%%
 
expression:
   |  …
   |  IF b = expression THEN e = expression {}
   |  IF b = expression THEN e = expression ELSE f = expression {}
   |  …
Figure 4: Basic example of a shift/reduce conflict

How the conflict state is reached

Figure 4 shows a grammar specification with a typical shift/reduce conflict. When this specification is analyzed, the conflict is detected, and an explanation is written to the .conflicts file. The explanation first indicates in which state the conflict lies by showing how that state is reached. Here, it is reached after recognizing the following string of terminal and nonterminal symbols—the conflict string:

IF expression THEN IF expression THEN expression

Allowing the conflict string to contain both nonterminal and terminal symbols usually makes it shorter and more readable. If desired, a conflict string composed purely of terminal symbols could be obtained by replacing each occurrence of a nonterminal symbol N with an arbitrary N-sentence.

The conflict string can be thought of as a path that leads from one of the automaton’s start states to the conflict state. When multiple such paths exist, the shortest one is displayed. Nevertheless, it may sometimes be quite long. In that case, artificially (and temporarily) declaring some existing nonterminal symbols to be start symbols has the effect of adding new start states to the automaton and can help produce shorter conflict strings. Here, expression was declared to be a start symbol, which is why the conflict string is quite short.

In addition to the conflict string, the .conflicts file also states that the conflict token is ELSE. That is, when the automaton has recognized the conflict string and when the lookahead token (the next token on the input stream) is ELSE, a conflict arises. A conflict corresponds to a choice: the automaton is faced with several possible actions, and does not know which one should be taken. This indicates that the grammar is not LR(1). The grammar may or may not be inherently ambiguous.

In our example, the conflict string and the conflict token are enough to understand why there is a conflict: when two IF constructs are nested, it is ambiguous which of the two constructs the ELSE branch should be associated with. Nevertheless, the .conflicts file provides further information: it explicitly shows that there exists a conflict, by proving that two distinct actions are possible. Here, one of these actions consists in shifting, while the other consists in reducing: this is a shift/reduce conflict.

A proof takes the form of a partial derivation tree whose fringe begins with the conflict string, followed by the conflict token. A derivation tree is a tree whose nodes are labeled with symbols. The root node carries a start symbol. A node that carries a terminal symbol is considered a leaf, and has no children. A node that carries a nonterminal symbol N either is considered a leaf, and has no children; or is not considered a leaf, and has n children, where n ≥ 0, labeled x1, …, xn, where N → x1 … xn is a production. The fringe of a partial derivation tree is the string of terminal and nonterminal symbols carried by the tree’s leaves. A string of terminal and nonterminal symbols that is the fringe of some partial derivation tree is a sentential form.

Why shifting is legal


Figure 5: A partial derivation tree that justifies shifting


expression
IF expression THEN expression
IF expression THEN expression . ELSE expression
Figure 6: A textual version of the tree in Figure 5

In our example, the proof that shifting is possible is the derivation tree shown in Figures 5 and 6. At the root of the tree is the grammar’s start symbol, expression. This symbol develops into the string IF expression THEN expression, which forms the tree’s second level. The second occurrence of expression in that string develops into IF expression THEN expression ELSE expression, which forms the tree’s last level. The tree’s fringe, a sentential form, is the string IF expression THEN IF expression THEN expression ELSE expression. As announced earlier, it begins with the conflict string IF expression THEN IF expression THEN expression, followed with the conflict token ELSE.

In Figure 6, the end of the conflict string is materialized with a dot. Note that this dot does not occupy the rightmost position in the tree’s last level. In other words, the conflict token (ELSE) itself occurs on the tree’s last level. In practical terms, this means that, after the automaton has recognized the conflict string and peeked at the conflict token, it makes sense for it to shift that token.

Why reducing is legal


Figure 7: A partial derivation tree that justifies reducing


expression
IF expression THEN expression ELSE expression       // lookahead token appears
IF expression THEN expression .
Figure 8: A textual version of the tree in Figure 7

In our example, the proof that reducing is possible is the derivation tree shown in Figures 7 and 8. Again, the sentential form found at the fringe of the tree begins with the conflict string, followed with the conflict token.

Again, in Figure 8, the end of the conflict string is materialized with a dot. Note that, this time, the dot occupies the rightmost position in the tree’s last level. In other words, the conflict token (ELSE) appeared on an earlier level (here, on the second level). This fact is emphasized by the comment // lookahead token appears found at the second level. In practical terms, this means that, after the automaton has recognized the conflict string and peeked at the conflict token, it makes sense for it to reduce the production that corresponds to the tree’s last level—here, the production is expressionIF expression THEN expression.

An example of a more complex derivation tree

Figures 9 and 10 show a partial derivation tree that justifies reduction in a more complex situation. (This derivation tree is relative to a grammar that is not shown.) Here, the conflict string is DATA UIDENT EQUALS UIDENT; the conflict token is LIDENT. It is quite clear that the fringe of the tree begins with the conflict string. However, in this case, the fringe does not explicitly exhibit the conflict token. Let us examine the tree more closely and answer the question: following UIDENT, what’s the next terminal symbol on the fringe?


Figure 9: A partial derivation tree that justifies reducing


decls
decl opt_semi decls       // lookahead token appears because opt_semi can vanish and decls can begin with LIDENT
DATA UIDENT EQUALS tycon_expr       // lookahead token is inherited
tycon_item       // lookahead token is inherited
UIDENT opt_type_exprs       // lookahead token is inherited
.
Figure 10: A textual version of the tree in Figure 9

First, note that opt_type_exprs is not a leaf node, even though it has no children. The grammar contains the production opt_type_exprs → є: the nonterminal symbol opt_type_exprs develops to the empty string. (This is made clear in Figure 10, where a single dot appears immediately below opt_type_exprs.) Thus, opt_type_exprs is not part of the fringe.

Next, note that opt_type_exprs is the rightmost symbol within its level. Thus, in order to find the next symbol on the fringe, we have to look up one level. This is the meaning of the comment // lookahead token is inherited. Similarly, tycon_item and tycon_expr appear rightmost within their level, so we again have to look further up.

This brings us back to the tree’s second level. There, decl is not the rightmost symbol: next to it, we find opt_semi and decls. Does this mean that opt_semi is the next symbol on the fringe? Yes and no. opt_semi is a nonterminal symbol, but we are really interested in finding out what the next terminal symbol on the fringe could be. The partial derivation tree shown in Figures 9 and 10 does not explicitly answer this question. In order to answer it, we need to know more about opt_semi and decls.

Here, opt_semi stands (as one might have guessed) for an optional semicolon, so the grammar contains a production opt_semi → ε. This is indicated by the comment // opt_semi can vanish. (Nonterminal symbols that generate ε are also said to be nullable.) Thus, one could choose to turn this partial derivation tree into a larger one by developing opt_semi into ε, making it a non-leaf node. That would yield a new partial derivation tree where the next symbol on the fringe, following UIDENT, is decls.

Now, what about decls? Again, it is a nonterminal symbol, and we are really interested in finding out what the next terminal symbol on the fringe could be. Again, we need to imagine how this partial derivation tree could be turned into a larger one by developing decls. Here, the grammar happens to contain a production of the form decls → LIDENT … This is indicated by the comment // decls can begin with LIDENT. Thus, by developing decls, it is possible to construct a partial derivation tree where the next symbol on the fringe, following UIDENT, is LIDENT. This is precisely the conflict token.

To sum up, there exists a partial derivation tree whose fringe begins with the conflict string, followed with the conflict token. Furthermore, in that derivation tree, the dot occupies the rightmost position in the last level. As in our previous example, this means that, after the automaton has recognized the conflict string and peeked at the conflict token, it makes sense for it to reduce the production that corresponds to the tree’s last level—here, the production is opt_type_exprs → ε.

Greatest common factor among derivation trees

Understanding conflicts requires comparing two (or more) derivation trees. It is frequent for these trees to exhibit a common factor, that is, to exhibit identical structure near the top of the tree, and to differ only below a specific node. Manual identification of that node can be tedious, so Menhir performs this work automatically. When explaining an n-way conflict, it first displays the greatest common factor of the n derivation trees. A question mark symbol (?) is used to identify the node where the trees begin to differ. Then, Menhir displays each of the n derivation trees, without their common factor – that is, it displays n sub-trees that actually begin to differ at the root. This should make visual comparisons significantly easier.

6.3  How are severe conflicts resolved in the end?

It is unspecified how severe conflicts are resolved. Menhir attempts to mimic ocamlyacc’s specification, that is, to resolve shift/reduce conflicts in favor of shifting, and to resolve reduce/reduce conflicts in favor of the production that textually appears earliest in the grammar specification. However, this specification is inconsistent in case of three-way conflicts, that is, conflicts that simultaneously involve a shift action and several reduction actions. Furthermore, textual precedence can be undefined when the grammar specification is split over multiple modules. In short, Menhir’s philosophy is that

severe conflicts should not be tolerated,

so you should not care how they are resolved.

6.4  End-of-stream conflicts

Menhir’s treatment of the end of the token stream is (believed to be) fully compatible with ocamlyacc’s. Yet, Menhir attempts to be more user-friendly by warning about a class of so-called “end-of-stream conflicts”.

How the end of stream is handled

In many textbooks on parsing, it is assumed that the lexical analyzer, which produces the token stream, produces a special token, written #, to signal that the end of the token stream has been reached. A parser generator can take advantage of this by transforming the grammar: for each start symbol S in the original grammar, a new start symbol S′ is defined, together with the production S′ → S #. The symbol S is no longer a start symbol in the new grammar. This means that the parser will accept a sentence derived from S only if it is immediately followed by the end of the token stream.

This approach has the advantage of simplicity. However, ocamlyacc and Menhir do not follow it, for several reasons. Perhaps the most convincing one is that it is not flexible enough: sometimes, it is desirable to recognize a sentence derived from S, without requiring that it be followed by the end of the token stream: this is the case, for instance, when reading commands, one by one, on the standard input channel. In that case, there is no end of stream: the token stream is conceptually infinite. Furthermore, after a command has been recognized, we do not wish to examine the next token, because doing so might cause the program to block, waiting for more input.

In short, ocamlyacc and Menhir’s approach is to recognize a sentence derived from S and to not look, if possible, at what follows. However, this is possible only if the definition of S is such that the end of an S-sentence is identifiable without knowledge of the lookahead token. When the definition of S does not satisfy this criterion, an end-of-stream conflict arises: after a potential S-sentence has been read, there can be a tension between consulting the next token, in order to determine whether the sentence is continued, and not consulting the next token, because the sentence might be over and whatever follows should not be read. Menhir warns about end-of-stream conflicts, whereas ocamlyacc does not.

A definition of end-of-stream conflicts

Technically, Menhir proceeds as follows. A # symbol is introduced. It is, however, only a pseudo-token: it is never produced by the lexical analyzer. For each start symbol S in the original grammar, a new start symbol S′ is defined, together with the production S′ → S. The corresponding start state of the LR(1) automaton is composed of the LR(1) item S′ → . S [ # ]. That is, the pseudo-token # initially appears in the lookahead set, indicating that we expect to be done after recognizing an S-sentence. During the construction of the LR(1) automaton, this lookahead set is inherited by other items, with the effect that, in the end, the automaton has:

  • shift actions only on physical tokens; and
  • reduce actions either on physical tokens or on the pseudo-token #.

A state of the automaton has a reduce action on # if, in that state, an S-sentence has been read, so that the job is potentially finished. A state has a shift or reduce action on a physical token if, in that state, more tokens potentially need to be read before an S-sentence is recognized. If a state has a reduce action on #, then that action should be taken without requesting the next token from the lexical analyzer. On the other hand, if a state has a shift or reduce action on a physical token, then the lookahead token must be consulted in order to determine if that action should be taken.


%token<int> INT
%token PLUS TIMES
%left PLUS
%left TIMES
%start<int> expr
%%
expr:
   |  i = INT { i }
   |  e1 = expr PLUS e2 = expr { e1 + e2 }
   |  e1 = expr TIMES e2 = expr { e1 * e2 }
Figure 11: Basic example of an end-of-stream conflict


State 6:
expr -> expr . PLUS expr [ # TIMES PLUS ]
expr -> expr PLUS expr . [ # TIMES PLUS ]
expr -> expr . TIMES expr [ # TIMES PLUS ]
-- On TIMES shift to state 3
-- On # PLUS reduce production expr -> expr PLUS expr

State 4:
expr -> expr . PLUS expr [ # TIMES PLUS ]
expr -> expr . TIMES expr [ # TIMES PLUS ]
expr -> expr TIMES expr . [ # TIMES PLUS ]
-- On # TIMES PLUS reduce production expr -> expr TIMES expr

State 2:
expr' -> expr . [ # ]
expr -> expr . PLUS expr [ # TIMES PLUS ]
expr -> expr . TIMES expr [ # TIMES PLUS ]
-- On TIMES shift to state 3
-- On PLUS shift to state 5
-- On # accept expr
Figure 12: Part of an LR automaton for the grammar in Figure 11


%token END
%start<int> main     // instead of expr
%%
main:
   |  e = expr END { e }
expr:
   |  …
Figure 13: Fixing the grammar specification in Figure 11

An end-of-stream conflict arises when a state has distinct actions on # and on at least one physical token. In short, this means that the end of an S-sentence cannot be unambiguously identified without examining one extra token. Menhir’s default behavior, in that case, is to suppress the action on #, so that more input is always requested.

Example

Figure 11 shows a grammar that has end-of-stream conflicts. When this grammar is processed, Menhir warns about these conflicts, and further warns that expr is never accepted. Let us explain.

Part of the corresponding automaton, as described in the .automaton file, is shown in Figure 12. Explanations at the end of the .automaton file (not shown) point out that states 6 and 2 have an end-of-stream conflict. Indeed, both states have distinct actions on # and on the physical token TIMES. It is interesting to note that, even though state 4 has actions on # and on physical tokens, it does not have an end-of-stream conflict. This is because the action taken in state 4 is always to reduce the production expr → expr TIMES expr, regardless of the lookahead token.

By default, Menhir produces a parser where end-of-stream conflicts are resolved in favor of looking ahead: that is, the problematic reduce actions on # are suppressed. This means, in particular, that the accept action in state 2, which corresponds to reducing the production expr′ → expr, is suppressed. This explains why the symbol expr is never accepted: because expressions do not have an unambiguous end marker, the parser will always request one more token and will never stop.

In order to avoid this end-of-stream conflict, the standard solution is to introduce a new token, say END, and to use it as an end marker for expressions. The END token could be generated by the lexical analyzer when it encounters the actual end of stream, or it could correspond to a piece of concrete syntax, say, a line feed character, a semicolon, or an end keyword. The solution is shown in Figure 13.

7  Positions

When an ocamllex-generated lexical analyzer produces a token, it updates two fields, named lex_start_p and lex_curr_p, in its environment record, whose type is Lexing.lexbuf. Each of these fields holds a value of type Lexing.position. Together, they represent the token’s start and end positions within the text that is being scanned. These fields are read by Menhir after calling the lexical analyzer, so it is the lexical analyzer’s responsibility to correctly set these fields.

A position consists mainly of an offset (the position’s pos_cnum field), but also holds information about the current file name, the current line number, and the current offset within the current line. (Not all ocamllex-generated analyzers keep this extra information up to date. This must be explicitly programmed by the author of the lexical analyzer.)
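For example, in an ocamllex-generated analyzer, line information is typically kept up to date by calling Lexing.new_line whenever a newline character is consumed. (ocamllex maintains the character offsets automatically; the line count must be maintained by hand.) The following is a minimal sketch of such a lexer, assuming the tokens INT, PLUS, TIMES, EOL of the grammar in demos/calc:

  rule token = parse
  | [' ' '\t']      { token lexbuf }
  | '\n'            { Lexing.new_line lexbuf; EOL }
                    (* update pos_lnum and pos_bol before returning *)
  | ['0'-'9']+ as i { INT (int_of_string i) }
  | '+'             { PLUS }
  | '*'             { TIMES }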


$startpos  start position of the first symbol in the production’s right-hand side, if there is one;
    end position of the most recently parsed symbol, otherwise
$endpos  end position of the last symbol in the production’s right-hand side, if there is one;
    end position of the most recently parsed symbol, otherwise
$startpos( $i | id )  start position of the symbol named $i or id
$endpos( $i | id )  end position of the symbol named $i or id
$symbolstartpos   start position of the leftmost symbol id such that $startpos(id) !=  $endpos(id);
    if there is no such symbol, $endpos
$startofs, $endofs, $startofs( $i | id ), $endofs( $i | id ), $symbolstartofs
    same as the five position keywords above, but produce an integer offset instead of a position
$loc  stands for the pair ($startpos, $endpos)
$loc( id )  stands for the pair ($startpos( id ), $endpos( id ))
$sloc  stands for the pair ($symbolstartpos, $endpos)
Figure 14: Position-related keywords


symbol_start_pos()    $symbolstartpos
symbol_end_pos()      $endpos
rhs_start_pos i       $startpos($i)      (1 ≤ i ≤ n)
rhs_end_pos i         $endpos($i)        (1 ≤ i ≤ n)
symbol_start()        $symbolstartofs
symbol_end()          $endofs
rhs_start i           $startofs($i)      (1 ≤ i ≤ n)
rhs_end i             $endofs($i)        (1 ≤ i ≤ n)
Figure 15: Translating position-related incantations from ocamlyacc to Menhir

This mechanism allows associating pairs of positions with terminal symbols. If desired, Menhir automatically extends it to nonterminal symbols as well. That is, it offers a mechanism for associating pairs of positions with terminal or nonterminal symbols. This is done by making a set of keywords available to semantic actions (Figure 14). These keywords are not available outside of a semantic action: in particular, they cannot be used within an OCaml header.
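For instance, the $loc keyword offers a convenient way of attaching a source code location to each node of an abstract syntax tree. The following sketch assumes hypothetical AST constructors Int and Add whose first component is a pair of positions:

  expr:
    | i = INT
        { Int ($loc, i) }
    | e1 = expr PLUS e2 = expr
        { Add ($loc, e1, e2) }  // $loc stands for ($startpos, $endpos)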

OCaml’s standard library module Parsing is deprecated. The functions that it offers can be called, but will return dummy positions.

We remark that, if the current production has an empty right-hand side, then $startpos and $endpos are equal, and (by convention) are the end position of the most recently parsed symbol (that is, the symbol that happens to be on top of the automaton’s stack when this production is reduced). If the current production has a nonempty right-hand side, then $startpos is the same as $startpos($1) and $endpos is the same as $endpos($n), where n is the length of the right-hand side.

More generally, $startpos and $endpos are equal if and only if the current production has matched a sentence of length zero.

The position $startpos is sometimes “further towards the left” than one would like. For example, in the following production:

  declaration: modifier? variable { $startpos }

the keyword $startpos represents the start position of the optional modifier modifier?. If this modifier turns out to be absent, then its start position is (by definition) the end position of the most recently parsed symbol. This may not be what is desired: perhaps the user would prefer in this case to use the start position of the symbol variable. This is achieved by using $symbolstartpos instead of $startpos. By definition, $symbolstartpos is the start position of the leftmost symbol whose start and end positions differ. In this example, the computation of $symbolstartpos skips the absent modifier, whose start and end positions coincide, and returns the start position of the symbol variable (assuming this symbol has distinct start and end positions).

There is no keyword $symbolendpos. Indeed, the problem with $startpos is due to the asymmetry in the definition of $startpos and $endpos in the case of an empty right-hand side, and does not affect $endpos.

The positions computed by Menhir are exactly the same as those computed by ocamlyacc. More precisely, Figure 15 sums up how to translate a call to the Parsing module, as used in an ocamlyacc grammar, to a Menhir keyword.

We note that Menhir’s $startpos does not appear in the right-hand column in Figure 15. In other words, Menhir’s $startpos does not correspond exactly to any of the ocamlyacc function calls. An exact ocamlyacc equivalent of $startpos is rhs_start_pos 1 if the current production has a nonempty right-hand side and symbol_start_pos() if it has an empty right-hand side.

Finally, we remark that Menhir’s %inline keyword (§5.3) does not affect the computation of positions. The same positions are computed, regardless of where %inline keywords are placed.

8  Using Menhir as an interpreter

When --interpret is set, Menhir no longer behaves as a compiler. Instead, it acts as an interpreter. That is, it repeatedly:

  • reads a sentence off the standard input channel;
  • parses this sentence, according to the grammar;
  • displays an outcome.

This process stops when the end of the input channel is reached.

8.1  Sentences

The syntax of sentences is as follows:

sentence ::= [ lid : ] uid … uid \n

Less formally, a sentence is a sequence of zero or more terminal symbols (uid’s), separated with whitespace, terminated with a newline character, and optionally preceded with a non-terminal start symbol (lid). This non-terminal symbol can be omitted if, and only if, the grammar only has one start symbol.

For instance, here are four valid sentences for the grammar of arithmetic expressions found in the directory demos/calc:

main: INT PLUS INT EOL
INT PLUS INT
INT PLUS PLUS INT EOL
INT PLUS PLUS

In the first sentence, the start symbol main was explicitly specified. In the other sentences, it was omitted, which is permitted, because this grammar has no start symbol other than main. The first sentence is a stream of four terminal symbols, namely INT, PLUS, INT, and EOL. These terminal symbols must be provided under their symbolic names. Writing, say, “12+32\n” instead of INT PLUS INT EOL is not permitted. Menhir would not be able to make sense of such a concrete notation, since it does not have a lexer for it.

8.2  Outcomes

As soon as Menhir is able to read a complete sentence off the standard input channel (that is, as soon as it finds the newline character that ends the sentence), it parses the sentence according to whichever grammar was specified on the command line, and displays an outcome.

An outcome is one of the following:

  • ACCEPT: a prefix of the sentence was successfully parsed; a parser generated by Menhir would successfully stop and produce a semantic value;
  • OVERSHOOT: the end of the sentence was reached before it could be accepted; a parser generated by Menhir would request a non-existent “next token” from the lexer, causing it to fail or block;
  • REJECT: the sentence was not accepted; a parser generated by Menhir would raise the exception Error.

When --interpret-show-cst is set, each ACCEPT outcome is followed with a concrete syntax tree. A concrete syntax tree is either a leaf or a node. A leaf is either a terminal symbol or error. A node is annotated with a non-terminal symbol, and carries a sequence of immediate descendants that correspond to a valid expansion of this non-terminal symbol. Menhir’s notation for concrete syntax trees is as follows:

cst ::= uid
      | error
      | [ lid : cst … cst ]

For instance, if one wished to parse the example sentences of §8.1 using the grammar of arithmetic expressions in demos/calc, one could invoke Menhir as follows:

$ menhir --interpret --interpret-show-cst demos/calc/parser.mly
main: INT PLUS INT EOL
ACCEPT
[main: [expr: [expr: INT] PLUS [expr: INT]] EOL]
INT PLUS INT
OVERSHOOT
INT PLUS PLUS INT EOL
REJECT
INT PLUS PLUS
REJECT

(Here, Menhir’s input—the sentences provided by the user on the standard input channel— is shown intermixed with Menhir’s output—the outcomes printed by Menhir on the standard output channel.) The first sentence is valid, and accepted; a concrete syntax tree is displayed. The second sentence is incomplete, because the grammar specifies that a valid expansion of main ends with the terminal symbol EOL; hence, the outcome is OVERSHOOT. The third sentence is invalid, because of the repeated occurrence of the terminal symbol PLUS; the outcome is REJECT. The fourth sentence, a prefix of the third one, is rejected for the same reason.

8.3  Remarks

Using Menhir as an interpreter offers an easy way of debugging your grammar. For instance, if one wished to check that addition is considered left-associative, as requested by the %left directive found in the file demos/calc/parser.mly, one could submit the following sentence:

$ ./menhir --interpret --interpret-show-cst ../demos/calc/parser.mly
INT PLUS INT PLUS INT EOL
ACCEPT
[main:
  [expr: [expr: [expr: INT] PLUS [expr: INT]] PLUS [expr: INT]]
  EOL
]

The concrete syntax tree displayed by Menhir is skewed towards the left, as desired.

The switches --interpret and --trace can be used in conjunction. When --trace is set, the interpreter logs its actions to the standard error channel.

9  Generated API

When Menhir processes a grammar specification, say parser.mly, it produces one OCaml module, Parser, whose code resides in the file parser.ml and whose signature resides in the file parser.mli. We now review this signature. For simplicity, we assume that the grammar specification has just one start symbol main, whose OCaml type is thing.

9.1  Monolithic API

The monolithic API defines the type token, the exception Error, and the parsing function main, named after the start symbol of the grammar.

The type token is an algebraic data type. A value of type token represents a terminal symbol and its semantic value. For instance, if the grammar contains the declarations %token A and %token<int> B, then the generated file parser.mli contains the following definition:

  type token =
  | A
  | B of int

If --only-tokens is specified on the command line, the type token is generated, and the rest is omitted. Conversely, if --external-tokens is used, the type token is omitted, but the rest (described below) is generated.

The exception Error carries no argument. It is raised by the parsing function main (described below) when a syntax error is detected.

  exception Error

Next comes one parsing function for each start symbol of the grammar. Here, we have assumed that there is one start symbol, named main, so the generated file parser.mli contains the following declaration:

  val main: (Lexing.lexbuf -> token) -> Lexing.lexbuf -> thing

This function expects two arguments, namely: a lexer, which typically is produced by ocamllex and has type Lexing.lexbuf -> token; and a lexing buffer, which has type Lexing.lexbuf. This API is compatible with ocamlyacc. (For information on using Menhir without ocamllex, please consult §16.) This API is “monolithic” in the sense that there is just one function, which does everything: it pulls tokens from the lexer, parses, and eventually returns a semantic value (or fails by throwing the exception Error).
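As an illustration, here is one way of invoking this function. This is only a sketch; it assumes (this is not part of the generated API) that the lexical analyzer resides in a module named Lexer:

  let parse (ic : in_channel) : thing =
    let lexbuf = Lexing.from_channel ic in
    try
      Parser.main Lexer.token lexbuf
    with Parser.Error ->
      (* the position of the error is read off the lexing buffer *)
      let pos = lexbuf.Lexing.lex_curr_p in
      failwith (Printf.sprintf "Syntax error at line %d." pos.Lexing.pos_lnum)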

9.2  Incremental API

If --table is set, Menhir offers an incremental API in addition to the monolithic API. In this API, control is inverted. The parser does not have access to the lexer. Instead, when the parser needs the next token, it stops and returns its current state to the user. The user is then responsible for obtaining this token (typically by invoking the lexer) and resuming the parser from that state. The directory demos/calc-incremental contains a demo that illustrates the use of the incremental API.

This API is “incremental” in the sense that the user has access to a sequence of the intermediate states of the parser. Assuming that semantic values are immutable, a parser state is a persistent data structure: it can be stored and used multiple times, if desired. This enables applications such as “live parsing”, where a buffer is continuously parsed while it is being edited. The parser can be re-started in the middle of the buffer whenever the user edits a character. Because two successive parser states share most of their data in memory, a list of n successive parser states occupies only O(n) space in memory.

9.2.1  Starting the parser

In this API, the parser is started by invoking Incremental.main. (Recall that we assume that main is the name of the start symbol.) The generated file parser.mli contains the following declaration:

  module Incremental : sig
    val main: position -> thing MenhirInterpreter.checkpoint
  end

The argument is the initial position. If the lexer is based on an OCaml lexing buffer, this argument should be lexbuf.lex_curr_p. In §9.2 and §9.3, the type position is a synonym for Lexing.position.

We emphasize that the function Incremental.main does not parse anything. It constructs a checkpoint which serves as a starting point. The functions offer and resume, described below, are used to drive the parser.

9.2.2  Driving the parser

The sub-module MenhirInterpreter is also part of the incremental API. Its declaration, which appears in the generated file parser.mli, is as follows:

  module MenhirInterpreter : MenhirLib.IncrementalEngine.INCREMENTAL_ENGINE
    with type token = token

The signature INCREMENTAL_ENGINE, defined in the module MenhirLib.IncrementalEngine, contains many types and functions, which are described in the rest of this section (§9.2.2) and in the following sections (§9.2.3, §9.2.4).

Please keep in mind that, from the outside, these types and functions should be referred to with an appropriate prefix. For instance, the type checkpoint should be referred to as MenhirInterpreter.checkpoint, or Parser.MenhirInterpreter.checkpoint, depending on which modules the user chooses to open.

  type 'a env

The abstract type 'a env represents the current state of the parser. (That is, it contains the current state and stack of the LR automaton.) Assuming that semantic values are immutable, it is a persistent data structure: it can be stored and used multiple times, if desired. The parameter 'a is the type of the semantic value that will eventually be produced if the parser succeeds.

  type production

The abstract type production represents a production of the grammar. The “start productions” (which do not exist in an .mly file, but are constructed by Menhir internally) are not part of this type.

  type 'a checkpoint = private
    | InputNeeded of 'a env
    | Shifting of 'a env * 'a env * bool
    | AboutToReduce of 'a env * production
    | HandlingError of 'a env
    | Accepted of 'a
    | Rejected

The type 'a checkpoint represents an intermediate or final state of the parser. An intermediate checkpoint is a suspension: it records the parser’s current state, and allows parsing to be resumed. The parameter 'a is the type of the semantic value that will eventually be produced if the parser succeeds.

Accepted and Rejected are final checkpoints. Accepted carries a semantic value.

InputNeeded is an intermediate checkpoint. It means that the parser wishes to read one token before continuing.

Shifting is an intermediate checkpoint. It means that the parser is taking a shift transition. It exposes the state of the parser before and after the transition. The Boolean parameter tells whether the parser intends to request a new token after this transition. (It always does, except when it is about to accept.)

AboutToReduce is an intermediate checkpoint: it means that the parser is about to perform a reduction step. HandlingError is also an intermediate checkpoint: it means that the parser has detected an error and is about to handle it. (Error handling is typically performed in several steps, so the next checkpoint is likely to be HandlingError again.) In these two cases, the parser does not need more input. The parser suspends itself at this point only in order to give the user an opportunity to observe the parser’s transitions and possibly handle errors in a different manner, if desired.

  val offer:
    'a checkpoint ->
    token * position * position ->
    'a checkpoint

The function offer allows the user to resume the parser after the parser has suspended itself with a checkpoint of the form InputNeeded env. This function expects the previous checkpoint checkpoint as well as a new token (together with the start and end positions of this token). It produces a new checkpoint, which again can be an intermediate checkpoint or a final checkpoint. It does not raise any exception. (The exception Error is used only in the monolithic API.)

  val resume:
    ?strategy:[ `Legacy | `Simplified ] ->
    'a checkpoint ->
    'a checkpoint

The function resume allows the user to resume the parser after the parser has suspended itself with a checkpoint of the form AboutToReduce (env, prod) or HandlingError env. This function expects just the previous checkpoint checkpoint. It produces a new checkpoint. It does not raise any exception. The optional argument strategy influences the manner in which resume deals with checkpoints of the form HandlingError _. Its default value is `Legacy. For more details, see §10.

The incremental API subsumes the monolithic API. Indeed, main can be (and is in fact) implemented by first using Incremental.main, then calling offer and resume in a loop, until a final checkpoint is obtained.
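Such a loop can be written as follows. This is only a sketch; it takes the token-producing function as an argument, and abbreviates Parser.MenhirInterpreter as I:

  module I = Parser.MenhirInterpreter

  let rec drive
      (read : unit -> Parser.token * Lexing.position * Lexing.position)
      (checkpoint : thing I.checkpoint) : thing =
    match checkpoint with
    | I.InputNeeded _ ->
        (* the parser needs a token; obtain one and offer it *)
        drive read (I.offer checkpoint (read ()))
    | I.Shifting _ | I.AboutToReduce _ | I.HandlingError _ ->
        (* the parser can make progress on its own *)
        drive read (I.resume checkpoint)
    | I.Accepted v ->
        v
    | I.Rejected ->
        raise Parser.Error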

  type supplier =
    unit -> token * position * position

A token supplier is a function of no arguments which delivers a new token (together with its start and end positions) every time it is called. The function loop and its variants, described below, expect a supplier as an argument.

  val lexer_lexbuf_to_supplier:
    (Lexing.lexbuf -> token) -> Lexing.lexbuf -> supplier

The function lexer_lexbuf_to_supplier, applied to a lexer and to a lexing buffer, produces a fresh supplier.

The functions offer and resume, documented above, are sufficient to write a parser loop. One can imagine many variations of such a loop, which is why we expose offer and resume in the first place. Nevertheless, some variations are so common that it is worth providing them, ready for use. The following functions are implemented on top of offer and resume.

  val loop:
    ?strategy:[ `Legacy | `Simplified ] ->
    supplier -> 'a checkpoint -> 'a

loop supplier checkpoint begins parsing from checkpoint, reading tokens from supplier. It continues parsing until it reaches a checkpoint of the form Accepted v or Rejected. In the former case, it returns v. In the latter case, it raises the exception Error. (By the way, this is how we implement the monolithic API on top of the incremental API.) The optional argument strategy influences the manner in which loop deals with checkpoints of the form HandlingError _. Its default value is `Legacy. For more details, see §10.

  val loop_handle:
    ('a -> 'answer) ->
    ('a checkpoint -> 'answer) ->
    supplier -> 'a checkpoint -> 'answer

loop_handle succeed fail supplier checkpoint begins parsing from checkpoint, reading tokens from supplier. It continues until it reaches a checkpoint of the form Accepted v or HandlingError _ (or Rejected, but that should not happen, as HandlingError _ will be observed first). In the former case, it calls succeed v. In the latter case, it calls fail with this checkpoint. It cannot raise Error.

This means that Menhir’s traditional error-handling procedure (which pops the stack until a state that can act on the error token is found) does not get a chance to run. Instead, the user can implement her own error handling code, in the fail continuation.
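For instance, the following sketch (where, as before, I stands for Parser.MenhirInterpreter, and the lexer is assumed to reside in a module named Lexer) reports a syntax error together with the number of the state where it was detected:

  let parse (lexbuf : Lexing.lexbuf) : (thing, string) result =
    let supplier = I.lexer_lexbuf_to_supplier Lexer.token lexbuf in
    let checkpoint = Parser.Incremental.main lexbuf.Lexing.lex_curr_p in
    I.loop_handle
      (fun v -> Ok v)
      (function
       | I.HandlingError env ->
           Error (Printf.sprintf "Syntax error in state %d."
                    (I.current_state_number env))
       | _ ->
           assert false (* loop_handle invokes fail on HandlingError only *))
      supplier checkpoint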

  val loop_handle_undo:
    ('a -> 'answer) ->
    ('a checkpoint -> 'a checkpoint -> 'answer) ->
    supplier -> 'a checkpoint -> 'answer

loop_handle_undo is analogous to loop_handle, but passes a pair of checkpoints (instead of a single checkpoint) to the failure continuation. The first (and oldest) checkpoint that is passed to the failure continuation is the last InputNeeded checkpoint that was encountered before the error was detected. The second (and newest) checkpoint is where the error was detected. (This is the same checkpoint that loop_handle would pass to its failure continuation.) Going back to the first checkpoint can be thought of as undoing any reductions that were performed after seeing the problematic token. (These reductions must be default reductions or spurious reductions.) This can be useful to someone who wishes to implement an error explanation or error recovery mechanism.

loop_handle_undo must be applied to an InputNeeded checkpoint. The initial checkpoint produced by Incremental.main is of this form.

  val shifts: 'a checkpoint -> 'a env option

shifts checkpoint assumes that checkpoint has been obtained by submitting a token to the parser. It runs the parser from checkpoint, through an arbitrary number of reductions, until the parser either accepts this token (i.e., shifts) or rejects it (i.e., signals an error). If the parser decides to shift, then Some env is returned, where env is the parser’s state just before shifting. Otherwise, None is returned. This can be used to test whether the parser is willing to accept a certain token. This function should be used with caution, though, as it causes semantic actions to be executed. It is desirable that all semantic actions be side-effect-free, or that their side-effects be harmless.

  val acceptable: 'a checkpoint -> token -> position -> bool

acceptable checkpoint token pos requires checkpoint to be an InputNeeded checkpoint. It returns true iff the parser is willing to shift this token. This can be used to test, after an error has been detected, which tokens would have been accepted at this point. To do this, one would typically use loop_handle_undo to get access to the last InputNeeded checkpoint that was encountered before the error was detected, and apply acceptable to that checkpoint.

acceptable is implemented using shifts, so, like shifts, it causes certain semantic actions to be executed. It is desirable that all semantic actions be side-effect-free, or that their side-effects be harmless.

9.2.3  Inspecting the parser’s state

Although the type env is opaque, a parser state can be inspected via a few accessor functions, which are described in this section. The following types and functions are contained in the MenhirInterpreter sub-module.

  type 'a lr1state

The abstract type 'a lr1state describes a (non-initial) state of the LR(1) automaton. If s is such a state, then s should have at least one incoming transition, and all of its incoming transitions carry the same (terminal or non-terminal) symbol, say A. We say that A is the incoming symbol of the state s. The index 'a is the type of the semantic values associated with A. The role played by 'a is clarified in the definition of the type element, which appears further on.

  val number: _ lr1state -> int

The states of the LR(1) automaton are numbered (from 0 and up). The function number maps a state to its number.

  val production_index: production -> int
  val find_production: int -> production

Productions are numbered. (The set of indices of all productions forms an interval, which does not necessarily begin at 0.) The function production_index converts a production to an integer number, whereas the function find_production carries out the reverse conversion. It is an error to apply find_production to an invalid index.

  type element =
    | Element: 'a lr1state * 'a * position * position -> element

The type element describes one entry in the stack of the LR(1) automaton. In a stack element of the form Element (s, v, startp, endp), s is a (non-initial) state and v is a semantic value. The value v is associated with the incoming symbol A of the state s. In other words, the value v was pushed onto the stack just before the state s was entered. Thus, for some type 'a, the state s has type 'a lr1state and the value v has type 'a. The positions startp and endp delimit the fragment of the input text that was reduced to the symbol A.

In order to do anything useful with the value v, one must gain information about the type 'a, by inspection of the state s. So far, the type 'a lr1state is abstract, so there is no way of inspecting s. The inspection API (§9.3) offers further tools for this purpose.

  val top: 'a env -> element option

top env returns the parser’s top stack element. The state contained in this stack element is the current state of the automaton. If the stack is empty, None is returned. In that case, the current state of the automaton must be an initial state.
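For instance, the following sketch uses top to retrieve the start and end positions of the most recently recognized symbol, if there is one:

  let last_symbol_positions (env : 'a I.env)
      : (Lexing.position * Lexing.position) option =
    match I.top env with
    | Some (I.Element (_, _, startp, endp)) ->
        Some (startp, endp)
    | None ->
        None (* the stack is empty: we are in an initial state *)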

  val pop_many: int -> 'a env -> 'a env option

pop_many i env pops i elements off the automaton’s stack. This is done via i successive invocations of pop. Thus, pop_many 1 is pop. The index i must be nonnegative. The time complexity is O(i).

  val get: int -> 'a env -> element option

get i env returns the parser’s i-th stack element. The index i is 0-based: thus, get 0 is top. If i is greater than or equal to the number of elements in the stack, None is returned. get is implemented using pop_many and top: its time complexity is O(i).

  val current_state_number: 'a env -> int

current_state_number env is the integer number of the automaton’s current state. Although this number might conceivably be obtained via the functions top and number, using current_state_number is preferable, because this method works even when the automaton’s stack is empty (in which case the current state is an initial state, and top returns None). This number can be passed as an argument to a message function generated by menhir --compile-errors.

  val equal: 'a env -> 'a env -> bool

equal env1 env2 tells whether the parser configurations env1 and env2 are equal in the sense that the automaton’s current state is the same in env1 and env2 and the stack is physically the same in env1 and env2. If equal env1 env2 is true, then the sequence of the stack elements, as observed via pop and top, must be the same in env1 and env2. Also, if equal env1 env2 holds, then the checkpoints input_needed env1 and input_needed env2 must be equivalent. (The function input_needed is documented in §9.2.4.) The function equal has time complexity O(1).

  val positions: 'a env -> position * position

The function positions returns the start and end positions of the current lookahead token. If invoked in an initial state, this function returns a pair made of two copies of the initial position that was passed as an argument to main.

  val env_has_default_reduction: 'a env -> bool
  val state_has_default_reduction: _ lr1state -> bool

When applied to an environment env taken from a checkpoint of the form AboutToReduce (env, prod), the function env_has_default_reduction tells whether the reduction that is about to take place is a default reduction.

state_has_default_reduction s tells whether the state s has a default reduction. This includes the case where s is an accepting state.

9.2.4  Updating the parser’s state

The functions presented in the previous section (§9.2.3) allow inspecting parser states of type 'a checkpoint and 'a env. However, so far, there are no functions for manufacturing new parser states, except offer and resume, which create new checkpoints by feeding tokens, one by one, to the parser.

In this section, a small number of functions are provided for manufacturing new parser states of type 'a env and 'a checkpoint. These functions allow going far back into the past and jumping ahead into the future, so to speak. In other words, they allow driving the parser in other ways than by feeding tokens into it. The functions pop and force_reduction (described below), as well as feed (part of the inspection API; see §9.3), construct values of type 'a env. The function input_needed constructs values of type 'a checkpoint and thereby allows resuming parsing in normal mode (via offer). Together, these functions can be used to implement error handling and error recovery strategies.

  val pop: 'a env -> 'a env option

pop env returns a new environment, where the parser’s top stack cell has been popped off. (If the stack is empty, None is returned.) This amounts to pretending that the (terminal or nonterminal) symbol that corresponds to this stack cell has not been read.

  val force_reduction: production -> 'a env -> 'a env

force_reduction prod env can be called only if in the state env the parser is capable of reducing the production prod. If this condition is satisfied, then this production is reduced, which means that its semantic action is executed (this can have side effects!) and the automaton makes a goto (nonterminal) transition. If this condition is not satisfied, an Invalid_argument exception is raised.

  val input_needed: 'a env -> 'a checkpoint

input_needed env returns InputNeeded env. Thus, out of a parser state that might have been obtained via a series of calls to the functions pop, force_reduction, feed, and so on, it produces a checkpoint, which can be used to resume normal parsing, by supplying this checkpoint as an argument to offer.

This function should be used with some care. It could “mess up the lookahead” in the sense that it allows parsing to resume in an arbitrary state s with an arbitrary lookahead symbol t, even though Menhir’s reachability analysis (which is carried out via the --list-errors switch) might well think that it is impossible to reach this particular configuration. If one is using Menhir’s new error reporting facility (§11), this could cause the parser to reach an error state for which no error message has been prepared.
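As an illustration of how these functions can be combined, the following sketch implements a crude form of error recovery: it pops stack cells until the offending token becomes acceptable, then feeds this token to the parser via offer. This is a demonstration of the API, not a recommended strategy; recall that acceptable causes semantic actions to be executed:

  let rec recover (env : 'a I.env)
      (triple : Parser.token * Lexing.position * Lexing.position)
      : 'a I.checkpoint option =
    let (token, startp, _) = triple in
    if I.acceptable (I.input_needed env) token startp then
      (* resume normal parsing by offering the offending token *)
      Some (I.offer (I.input_needed env) triple)
    else
      match I.pop env with
      | Some env -> recover env triple
      | None -> None (* the stack is empty; give up *)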

9.3  Inspection API

If --inspection is set, Menhir offers an inspection API in addition to the monolithic and incremental APIs. (The reason why this is not done by default is that this requires more tables to be generated, thus making the generated parser larger.) Like the incremental API, the inspection API is found in the sub-module MenhirInterpreter. It offers the following types and functions.

The type 'a terminal is a generalized algebraic data type (GADT). A value of type 'a terminal represents a terminal symbol (without a semantic value). The index 'a is the type of the semantic values associated with this symbol. For instance, if the grammar contains the declarations %token A and %token<int> B, then the generated module MenhirInterpreter contains the following definition:

  type _ terminal =
  | T_A : unit terminal
  | T_B : int terminal

The data constructors are named after the terminal symbols, prefixed with “T_”.

The type 'a nonterminal is also a GADT. A value of type 'a nonterminal represents a nonterminal symbol (without a semantic value). The index 'a is the type of the semantic values associated with this symbol. For instance, if main is the only nonterminal symbol, then the generated module MenhirInterpreter contains the following definition:

  type _ nonterminal =
  | N_main : thing nonterminal

The data constructors are named after the nonterminal symbols, prefixed with “N_”.

The type 'a symbol is the disjoint union of the types 'a terminal and 'a nonterminal. In other words, a value of type 'a symbol represents a terminal or nonterminal symbol (without a semantic value). This type is (always) defined as follows:

  type 'a symbol =
    | T : 'a terminal -> 'a symbol
    | N : 'a nonterminal -> 'a symbol

The type xsymbol is an existentially quantified version of the type 'a symbol. It is useful in situations where the index 'a is not statically known. It is (always) defined as follows:

  type xsymbol =
    | X : 'a symbol -> xsymbol

The type item describes an LR(0) item, that is, a pair of a production prod and an index i into the right-hand side of this production. If the length of the right-hand side is n, then i is between 0 and n, inclusive.

  type item =
      production * int

The following functions implement total orderings on the types _ terminal, _ nonterminal, xsymbol, production, and item.

  val compare_terminals: _ terminal -> _ terminal -> int
  val compare_nonterminals: _ nonterminal -> _ nonterminal -> int
  val compare_symbols: xsymbol -> xsymbol -> int
  val compare_productions: production -> production -> int
  val compare_items: item -> item -> int

The function incoming_symbol maps a (non-initial) LR(1) state s to its incoming symbol, that is, the symbol that the parser must recognize before it enters the state s.

  val incoming_symbol: 'a lr1state -> 'a symbol

This function can be used to gain access to the semantic value v in a stack element Element (s, v, _, _). Indeed, by case analysis on the symbol incoming_symbol s, one gains information about the type 'a, hence one obtains the ability to do something useful with the value v.
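For instance, with the declarations %token A and %token<int> B and the start symbol main used as running examples above, the following sketch converts a stack element to a string:

  let describe (e : I.element) : string =
    match e with
    | I.Element (s, v, _, _) ->
        match I.incoming_symbol s with
        | I.T I.T_A    -> "A"               (* here, v has type unit *)
        | I.T I.T_B    -> string_of_int v   (* here, v has type int *)
        | I.N I.N_main -> "main"            (* here, v has type thing *)
        | _            -> "?"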

The function items maps a (non-initial) LR(1) state s to its LR(0) core, that is, to the underlying set of LR(0) items. This set is represented as a list, whose elements appear in an arbitrary order. This set is not closed under ε-transitions.

  val items: _ lr1state -> item list

The functions lhs and rhs map a production prod to its left-hand side and right-hand side, respectively. The left-hand side is always a nonterminal symbol, hence always of the form N _. The right-hand side is a (possibly empty) sequence of (terminal or nonterminal) symbols.

  val lhs: production -> xsymbol
  val rhs: production -> xsymbol list

The function nullable, applied to a non-terminal symbol, tells whether this symbol is nullable. A nonterminal symbol is nullable if and only if it produces the empty word ε.

  val nullable: _ nonterminal -> bool

The function call first nt t tells whether the FIRST set of the nonterminal symbol nt contains the terminal symbol t. That is, it returns true if and only if nt produces a word that begins with t. The function xfirst is identical to first, except it expects a first argument of type xsymbol instead of _ terminal.

  val first: _ nonterminal -> _ terminal -> bool
  val xfirst: xsymbol -> _ terminal -> bool

The function foreach_terminal enumerates the terminal symbols, including the special symbol error. The function foreach_terminal_but_error enumerates the terminal symbols, excluding error.

  val foreach_terminal:           (xsymbol -> 'a -> 'a) -> 'a -> 'a
  val foreach_terminal_but_error: (xsymbol -> 'a -> 'a) -> 'a -> 'a
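For instance, a list of all terminal symbols, as values of type xsymbol, can be built as follows:

  let all_terminals : I.xsymbol list =
    I.foreach_terminal (fun symbol accu -> symbol :: accu) []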

feed symbol startp semv endp env causes the parser to consume the (terminal or nonterminal) symbol symbol, accompanied with the semantic value semv and with the start and end positions startp and endp. Thus, the automaton makes a transition, and reaches a new state. The stack grows by one cell. This operation is permitted only if the current state (as determined by env) has an outgoing transition labeled with symbol. Otherwise, an Invalid_argument exception is raised.

  val feed: 'a symbol -> position -> 'a -> position -> 'b env -> 'b env

10  Error handling: the traditional way

Menhir’s traditional error handling mechanism is considered deprecated: although it is still supported for the time being, it might be removed in the future. We recommend setting up an error handling mechanism using the new tools offered by Menhir (§11).

Error handling

Menhir’s traditional error handling mechanism is inspired by that of yacc and ocamlyacc, but is not identical. A special error token is made available for use within productions. The LR automaton is constructed exactly as if error was a regular terminal symbol. However, error is never produced by the lexical analyzer. Instead, when an error is detected, the current lookahead token is discarded and replaced with the error token, which becomes the current lookahead token. At this point, the parser enters error handling mode. In error handling mode, the parser behaves as follows:

  • If the current state has a shift action on the error token, then this action takes place. Under the legacy strategy, the parser then reads the next token and returns to normal mode. Under the simplified strategy, it does not request the next token, so the current token remains error, and the parser remains in error handling mode.
  • If the current state has a reduce action on the error token, then this action takes place. (This behavior differs from that of yacc and ocamlyacc, which do not reduce on error. It is somewhat unclear why not.) The current token remains error and the parser remains in error handling mode.
  • If the current state has no action on the error token, then, under the simplified strategy, the parser rejects the input. Under the legacy strategy, the parser pops a cell off its stack and remains in error handling mode. If the stack is empty, then the parser rejects the input.

In the monolithic API, the parser rejects the input by raising the exception Error. This exception carries no information. The position of the error can be obtained by reading the lexical analyzer’s environment record. In the incremental API, the parser rejects the input by returning the checkpoint Rejected.

Which strategy should one choose? First, let us note that the difference between the strategies legacy and simplified matters only if the grammar uses the error token. The following rule of thumb can be used to select between them:

  • If the error token is used only to catch an error and stop, then the simplified strategy should be preferred. (In this restricted style, the error token always appears at the end of a production, whose semantic action raises an exception; see the sketch after this list.)
  • If the error token is used to survive an error and continue parsing, then the legacy strategy should be selected.
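The following sketch illustrates the first style, under the assumption that a SyntaxError exception has been declared elsewhere:

  main:
    | e = expr EOL
        { e }
    | error
        { raise (SyntaxError "unable to parse this command") }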

Error recovery

ocamlyacc offers an error recovery mode, which is entered immediately after an error token was successfully shifted. In this mode, tokens are repeatedly taken off the input stream and discarded until an acceptable token is found. This feature is no longer offered by Menhir.

Error-related keywords

The following keyword is made available to semantic actions.

When the $syntaxerror keyword is evaluated, evaluation of the semantic action is aborted, so that the current reduction is abandoned; the current lookahead token is discarded and replaced with the error token; and error handling mode is entered. Note that there is no mechanism for inserting an error token in front of the current lookahead token, even though this might also be desirable. It is unclear whether this keyword is useful; it might be suppressed in the future.

11  Error handling: the new way

Menhir’s incremental API (§9.2) allows taking control when an error is detected. Indeed, as soon as an invalid token is detected, the parser produces a checkpoint of the form HandlingError _. At this point, if one decides to let the parser proceed, by just calling resume, then Menhir enters its traditional error handling mode (§10). Instead, however, one can decide to take control and perform error handling or error recovery in any way one pleases. One can, for instance, build and display a diagnostic message, based on the automaton’s current stack and/or state. Or, one could modify the input stream, by inserting or deleting tokens, so as to suppress the error, and resume normal parsing. In principle, the possibilities are endless.

An apparently simple-minded approach to error reporting, proposed by Jeffery [10] and further explored by Pottier [20], consists in selecting a diagnostic message (or a template for a diagnostic message) based purely on the current state of the automaton.

In this approach, one determines, ahead of time, which are the “error states” (that is, the states in which an error can be detected), and one prepares, for each error state, a diagnostic message. Because state numbers are fragile (they change when the grammar evolves), an error state is identified not by its number, but by an input sentence that leads to it: more precisely, by an input sentence which causes an error to be detected in this state. Thus, one maintains a set of pairs of an erroneous input sentence and a diagnostic message.

Menhir defines a file format, the .messages file format, for representing this information (§11.1), and offers a set of tools for creating, maintaining, and exploiting .messages files (§11.2). Once one understands these tools, there remains the task of writing a collection of diagnostic messages, which is more subtle than one might think (§11.3), and of gluing everything together (§11.4).

In this approach to error handling, as in any other approach, one must understand exactly when (that is, in which states) errors are detected. This in turn requires understanding how the automaton is constructed. Menhir’s construction technique is not Knuth’s canonical LR(1) technique [15], which is usually too expensive to be practical. Instead, Menhir merges states [19] and introduces so-called default reductions. These techniques defer error detection by allowing extra reductions to take place before an error is detected. The impact of these alterations must be taken into account when writing diagnostic messages (§11.3).

In this approach to error handling, the special error token is not used. It should not appear in the grammar. Similarly, the $syntaxerror keyword should not be used.

11.1  The .messages file format

Definition

A .messages file is a text file. It is composed of a list of entries. Each entry consists of one or more input sentences, followed with one or more blank lines, followed with a message. Two entries are separated by one or more blank lines. The syntax of an input sentence is described in §8.1. A message is an arbitrary piece of text, but cannot contain a blank line.

Blank lines are significant: they are used as separators, both between entries, and (within an entry) between the sentences and the message. Thus, there cannot be a blank line between two sentences. (If there is one, Menhir becomes confused and may complain about some word not being “a known non-terminal symbol”). There also cannot be a blank line inside a message.


grammar: TYPE UID
# This hand-written comment concerns just the sentence above.
grammar: TYPE OCAMLTYPE UID PREC
# This hand-written comment concerns just the sentence above.

# This hand-written comment concerns both sentences above.

Ill-formed declaration.
Examples of well-formed declarations:
  %type <Syntax.expression> expression
  %type <int> date time
Figure 16: An entry in a .messages file


grammar: TYPE UID
##
## Ends in an error in state: 1.
##
## declaration -> TYPE . OCAMLTYPE separated_nonempty_list(option(COMMA),
##   strict_actual) [ TYPE TOKEN START RIGHT PUBLIC PERCENTPERCENT PARAMETER
##   ON_ERROR_REDUCE NONASSOC LEFT INLINE HEADER EOF COLON ]
##
## The known suffix of the stack is as follows:
## TYPE
##
# This hand-written comment concerns just the sentence above.
#
grammar: TYPE OCAMLTYPE UID PREC
##
## Ends in an error in state: 5.
##
## strict_actual -> symbol . loption(delimited(LPAREN,separated_nonempty_list
##   (COMMA,strict_actual),RPAREN)) [ UID TYPE TOKEN START STAR RIGHT QUESTION
##   PUBLIC PLUS PERCENTPERCENT PARAMETER ON_ERROR_REDUCE NONASSOC LID LEFT
##   INLINE HEADER EOF COMMA COLON ]
##
## The known suffix of the stack is as follows:
## symbol
##
# This hand-written comment concerns just the sentence above.

# This hand-written comment concerns both sentences above.

Ill-formed declaration.
Examples of well-formed declarations:
  %type <Syntax.expression> expression
  %type <int> date time
Figure 17: An entry in a .messages file, decorated with auto-generated comments

As an example, Figure 16 shows a valid entry, taken from Menhir’s own .messages file. This entry contains two input sentences, which lead to errors in two distinct states. A single message is associated with these two error states.

Comments

Comment lines, which begin with a # character, are ignored everywhere. However, users who wish to take advantage of Menhir’s facility for merging two .messages files (§11.2) should follow certain conventions regarding the placement of comments:

  • If a comment concerns a specific sentence and should remain attached to this sentence, then it must immediately follow this sentence (without a blank line in between).
  • If a comment concerns all sentences in an entry, then it should appear between the sentences and the message, with blank lines in between.
  • One should avoid placing comments between two entries, as the merging algorithm will not be able to handle them in a satisfactory way.

Auto-generated comments

Several commands, described next (§11.2), produce .messages files where each input sentence is followed with an auto-generated comment, marked with ##. This special comment indicates in which state the error is detected, and is supposed to help the reader understand what it means to be in this state: What has been read so far? What is expected next?

As an example, the previous entry, decorated with auto-generated comments, is shown in Figure 17. (We have manually wrapped the lines that did not fit in this document.)

An auto-generated comment begins with the number of the error state that is reached via this input sentence.

Then, the auto-generated comment shows the LR(1) items that compose this state, in the same format as in an .automaton file. These items offer a description of the past (that is, what has been read so far) and the future (that is, which terminal symbols are allowed next).

Finally, the auto-generated comment shows what is known about the stack when the automaton is in this state. (This can be deduced from the LR(1) items, but is more readable if shown separately.)

In a canonical LR(1) automaton, the LR(1) items offer an exact description of the past and future. However, in a noncanonical automaton, which is by default what Menhir produces, the situation is more subtle. The lookahead sets can be over-approximated, so the automaton can perform one or more “spurious reductions” before an error is detected. As a result, the LR(1) items in the error state offer a description of the future that may be both incorrect (that is, a terminal symbol that appears in a lookahead set is not necessarily a valid continuation) and incomplete (that is, a terminal symbol that does not appear in any lookahead set may nevertheless be a valid continuation). More details appear further on (§11.3).

In order to attract the user’s attention to this issue, if an input sentence causes one or more spurious reductions, then the auto-generated comment contains a warning about this fact. This mechanism is not completely foolproof, though, as it may be the case that one particular sentence does not cause any spurious reductions (hence, no warning appears), yet leads to an error state that can be reached via other sentences that do involve spurious reductions.

11.2  Maintaining .messages files

Ideally, the set of input sentences in a .messages file should be correct (that is, every sentence causes an error on its last token), irredundant (that is, no two sentences lead to the same error state), and complete (that is, every error state is reached by some sentence).

Verifying correctness and irredundancy

The correctness and irredundancy of a .messages file are checked by supplying --compile-errors filename on the command line, where filename is the name of the .messages file. (These arguments must be supplied in addition to the other usual arguments, such as the name of the .mly file.) This command fails if a sentence does not cause an error at all, or causes an error too early. It also fails if two sentences lead to the same error state. If the file is correct and irredundant, then (as its name suggests) this command compiles the .messages file down to an OCaml function, whose code is printed on the standard output channel. This function, named message, has type int -> string, and maps a state number to a message. It raises the exception Not_found if its argument is not the number of a state for which a message has been defined. If the set of input sentences is complete, then it cannot raise Not_found.
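For the sake of illustration, here is a sketch of the shape of the OCaml code that --compile-errors prints. The state numbers below are hypothetical, the message is the one shown in Figure 17, and the actual generated code may differ in its details. Note how two error states can share a single message:

  (* A sketch of the code printed by --compile-errors (not its exact output). *)
  let message (s : int) : string =
    match s with
    | 8 | 15 -> (* hypothetical state numbers *)
        "Ill-formed declaration.\nExamples of well-formed declarations:\n  %type <Syntax.expression> expression\n  %type <int> date time\n"
    | _ ->
        raise Not_found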

Verifying completeness

The completeness of a .messages file is checked via the commands --list-errors and --compare-errors. The former produces, from scratch, a complete set of input sentences, that is, a set of input sentences that reaches all error states. The latter compares two sets of sentences (more precisely, the two underlying sets of error states) for inclusion.

The command --list-errors first computes all possible ways of causing an error. From this information, it deduces a list of all error states, that is, all states where an error can be detected. For each of these states, it computes a (minimal) input sentence that causes an error in this state. Finally, it prints these sentences, in the .messages file format, on the standard output channel. Each sentence is followed with an auto-generated comment and with a dummy diagnostic message. The user should be warned that this algorithm may require large amounts of time (typically in the tens of seconds, possibly more) and memory (typically in the gigabytes, possibly more). It requires a 64-bit machine. (On a 32-bit machine, it works, but quickly hits a built-in size limit.) At the verbosity level --log-automaton 2, it displays some progress information and internal statistics on the standard error channel.

The command --compare-errors filename1 --compare-errors filename2 compares the .messages files filename1 and filename2. Each file is read and internally translated to a mapping of states to messages. Menhir then checks that the left-hand mapping is a subset of the right-hand mapping. That is, if a state s is reached by some sentence in filename1, then it should also be reached by some sentence in filename2. Furthermore, if the message associated with s in filename1 is not a dummy message, then the same message should be associated with s in filename2.

To check that the sentences in filename2 cover all error states, it suffices to (1) use --list-errors to produce a complete set of sentences, which one stores in filename1, then (2) use --compare-errors to compare filename1 and filename2.
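Concretely, assuming the grammar resides in the file parser.mly and the hand-written messages in parserMessages.messages (both file names are hypothetical), this two-step check might be carried out as follows:

  menhir parser.mly --list-errors > generated.messages
  menhir parser.mly --compare-errors generated.messages --compare-errors parserMessages.messages

The first command can take a while; the second command reports an error if some error state reached in generated.messages is not covered by parserMessages.messages.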

In the case of a grammar that evolves fairly often, it can take significant human time and effort to update the .messages file and ensure correctness, irredundancy, and completeness. A tempting way of reducing this effort is to abandon completeness. This implies that the auto-generated message function can raise Not_found and that a generic “syntax error” message must be produced in that case. We prefer to discourage this approach, as it implies that the end user is exposed to a mixture of specific and generic syntax error messages, and there is no guarantee that the specific (hand-written) messages will appear in all situations where they are expected to appear. Instead, we recommend waiting for the grammar to become stable and enforcing completeness.

Merging .messages files

The command --merge-errors filename1 --merge-errors filename2 attempts to merge the .messages files filename1 and filename2, and prints the result on the standard output channel. This command can be useful if two users have worked independently and each of them has produced a .messages file that covers a subset of all error states. The merging algorithm works roughly as follows:

  • All entries in filename2 are preserved literally.
  • An entry in filename1 that contains the dummy message <YOUR SYNTAX ERROR MESSAGE HERE> is ignored.
  • An entry in filename1 that leads to a state for which there is no entry in filename2 is copied to filename2.
  • An entry in filename1 that leads to a state for which there is also an entry in filename2, with a distinct message, gives rise to a conflict. It is inserted into filename2 together with a comment that signals the conflict.

The algorithm is asymmetric: the content of filename1 is inserted into or appended to filename2. For this reason, if one of the files is a large “reference” file and the other file is a small “delta”, then it is recommended to provide the “delta” as filename1 and the “reference” as filename2.

Other commands

The command --update-errors filename is used to update the auto-generated comments in the .messages file filename. It is typically used after a change in the grammar (or in the command line options that affect the construction of the automaton). A new .messages file is produced on the standard output channel. It is identical to filename, except the auto-generated comments, identified by ##, have been removed and re-generated.

The command --echo-errors filename is used to filter out all comments, blank lines, and messages from the .messages file filename. The input sentences, and nothing else, are echoed on the standard output channel. As an example application, one could then translate the sentences to concrete syntax and create a collection of source files that trigger every possible syntax error.

The command --interpret-error is analogous to --interpret. It causes Menhir to act as an interpreter. Menhir reads sentences off the standard input channel, parses them, and displays the outcome. This switch can be usefully combined with --trace. The main difference between --interpret and --interpret-error is that, when the latter command is used, Menhir expects the input sentence to cause an error on its last token, and displays information about the state in which the error is detected, in the form of a .messages file entry. This can be used to quickly find out exactly what error is caused by one particular input sentence.

11.3  Writing accurate diagnostic messages

One might think that writing a diagnostic message for each error state is a straightforward (if lengthy) task. In reality, it is not so simple.

A state, not a sentence

The first thing to keep in mind is that a diagnostic message is associated with a state s, as opposed to a sentence. An entry in a .messages file contains a sentence w that leads to an error in state s. This sentence is just one way of causing an error in state s; there may exist many other sentences that also cause an error in this state. The diagnostic message should not be specific to the sentence w: it should make sense regardless of how the state s is reached.

As a rule of thumb, when writing a diagnostic message, one should (as much as possible) ignore the example sentence w altogether, and concentrate on the description of the state s, which appears as part of the auto-generated comment.

The LR(1) items that compose the state s offer a description of the past (that is, what has been read so far) and the future (that is, which terminal symbols are allowed next). A diagnostic message should be designed based on this description.


%token ID ARROW LPAREN RPAREN COLON SEMICOLON
%start<unit> program
%%
typ0: ID | LPAREN typ1 RPAREN {}
typ1: typ0 | typ0 ARROW typ1  {}
declaration: ID COLON typ1    {}
program:
| LPAREN declaration RPAREN
| declaration SEMICOLON       {}
Figure 18: A grammar where one error state is difficult to explain


program: ID COLON ID LPAREN
##
## Ends in an error in state: 8.
##
## typ1 -> typ0 . [ SEMICOLON RPAREN ]
## typ1 -> typ0 . ARROW typ1 [ SEMICOLON RPAREN ]
##
## The known suffix of the stack is as follows:
## typ0
##
Figure 19: A problematic error state in the grammar of Figure 18, due to over-approximation

The problem of over-approximated lookahead sets

As pointed out earlier (§11.1), in a noncanonical automaton, the lookahead sets in the LR(1) items can be both over- and under-approximated. One must be aware of this phenomenon, otherwise one runs the risk of writing a diagnostic message that proposes too many or too few continuations.

As an example, let us consider the grammar in Figure 18. According to this grammar, a “program” is either a declaration between parentheses or a declaration followed with a semicolon. A “declaration” is an identifier, followed with a colon, followed with a type. A “type” is an identifier, a type between parentheses, or a function type in the style of OCaml.

The (noncanonical) automaton produced by Menhir for this grammar has 17 states. Using --list-errors, we find that an error can be detected in 10 of these 17 states. By manual inspection of the auto-generated comments, we find that for 9 out of these 10 states, writing an accurate diagnostic message is easy. However, one problematic state remains, namely state 8, shown in Figure 19.

In this state, a (level-0) type has just been read. One valid continuation, which corresponds to the second LR(1) item in Figure 19, is to continue this type: the terminal symbol ARROW, followed with a (level-1) type, is a valid continuation. Now, the question is, what other valid continuations are there? By examining the first LR(1) item in Figure 19, it may look as if both SEMICOLON and RPAREN are valid continuations. However, this cannot be the case. A moment’s thought reveals that either we have seen an opening parenthesis LPAREN at the very beginning of the program, in which case we definitely expect a closing parenthesis RPAREN; or we have not seen one, in which case we definitely expect a semicolon SEMICOLON. It is never the case that both SEMICOLON and RPAREN are valid continuations!

In fact, the lookahead set in the first LR(1) item in Figure 19 is over-approximated. State 8 in the noncanonical automaton results from merging two states in the canonical automaton.

In such a situation, one cannot write an accurate diagnostic message. Knowing that the automaton is in state 8 does not give us a precise view of the valid continuations. Some valuable information (that is, whether we have seen an opening parenthesis LPAREN at the very beginning of the program) is buried in the automaton’s stack.


%token ID ARROW LPAREN RPAREN COLON SEMICOLON
%start<unit> program
%%
typ0: ID | LPAREN typ1(RPAREN) RPAREN          {}
typ1(phantom): typ0 | typ0 ARROW typ1(phantom) {}
declaration(phantom): ID COLON typ1(phantom)   {}
program:
| LPAREN declaration(RPAREN) RPAREN
| declaration(SEMICOLON)  SEMICOLON            {}
Figure 20: Splitting the problematic state of Figure 19 via selective duplication


%token ID ARROW LPAREN RPAREN COLON SEMICOLON
%start<unit> program
%on_error_reduce typ1
%%
typ0: ID | LPAREN typ1 RPAREN {}
typ1: typ0 | typ0 ARROW typ1  {}
declaration: ID COLON typ1    {}
program:
| LPAREN declaration RPAREN
| declaration SEMICOLON       {}
Figure 21: Avoiding the problematic state of Figure 19 via reductions on error


program: ID COLON ID LPAREN
##
## Ends in an error in state: 15.
##
## program -> declaration . SEMICOLON [ # ]
##
## The known suffix of the stack is as follows:
## declaration
##
## WARNING: This example involves spurious reductions.
## This implies that, although the LR(1) items shown above provide an
## accurate view of the past (what has been recognized so far), they
## may provide an INCOMPLETE view of the future (what was expected next).
## In state 8, spurious reduction of production typ1 -> typ0
## In state 11, spurious reduction of production declaration -> ID COLON typ1
##
Figure 22: A problematic error state in the grammar of Figure 21, due to under-approximation

How can one work around this problem? Let us suggest three options.

Blind duplication of states

One option would be to build a canonical automaton by using the --canonical switch. In this example, one would obtain a 27-state automaton, where the problem has disappeared. However, this option is rarely viable, as it duplicates many states without good reason.

Selective duplication of states

A second option is to manually cause just enough duplication to remove the problematic over-approximation. In our example, we wish to distinguish two kinds of types and declarations, namely those that must be followed with a closing parenthesis, and those that must be followed with a semicolon. We create such a distinction by parameterizing typ1 and declaration with a phantom parameter. The modified grammar is shown in Figure 20. The phantom parameter does not affect the language that is accepted: for instance, the nonterminal symbols declaration(SEMICOLON) and declaration(RPAREN) generate the same language as declaration in the grammar of Figure 18. Yet, by giving distinct names to these two symbols, we force the construction of an automaton where more states are distinguished. In this example, Menhir produces a 23-state automaton. Using --list-errors, we find that an error can be detected in 11 of these 23 states, and by manual inspection of the auto-generated comments, we find that for each of these 11 states, writing an accurate diagnostic message is easy. In summary, we have selectively duplicated just enough states so as to split the problematic error state into two non-problematic error states.

Reductions on error

A third and last option is to introduce an %on_error_reduce declaration (§4.1.8) so as to prevent the detection of an error in the problematic state 8. We see in Figure 19 that, in state 8, the production typ1 -> typ0 is ready to be reduced. If we could force this reduction to take place, then the automaton would move to some other state where it would be clear which of SEMICOLON and RPAREN is expected. We achieve this by marking typ1 as “reducible on error”. The modified grammar is shown in Figure 21. For this grammar, Menhir produces a 17-state automaton. (This is the exact same automaton as for the grammar of Figure 18, except 2 of the 17 states have received extra reduction actions.) Using --list-errors, we find that an error can be detected in 9 of these 17 states. The problematic state, namely state 8, is no longer an error state! The problem has vanished.

The problem of under-approximated lookahead sets

The third option seems by far the simplest of all, and is recommended in many situations. However, it comes with a caveat. There may now exist states whose lookahead sets are under-approximated, in a certain sense. Because of this, there is a danger of writing an incomplete diagnostic message, one that does not list all valid continuations.

To see this, let us look again at the sentence ID COLON ID LPAREN. In the grammar and automaton of Figure 18, this sentence takes us to the problematic state 8, shown in Figure 19. In the grammar and automaton of Figure 21, because more reduction actions are carried out before the error is detected, this sentence takes us to state 15, shown in Figure 22.

When writing a diagnostic message for state 15, one might be tempted to write: “Up to this point, a declaration has been recognized. At this point, a semicolon is expected”. Indeed, by examining the sole LR(1) item in state 15, it looks as if SEMICOLON is the only permitted continuation. However, this is not the case. Another valid continuation is ARROW: indeed, the sentence ID COLON ID ARROW ID SEMICOLON forms a valid program. In fact, if the first token following ID COLON ID is ARROW, then in state 8 this token is shifted, so the two reductions that take us from state 8 through state 11 to state 15 never take place. This is why, even though ARROW does not appear in state 15 as a valid continuation, it nevertheless is a valid continuation of ID COLON ID. The warning produced by Menhir, shown in Figure 22, is supposed to attract attention to this issue.

Another way to explain this issue is to point out that, by declaring %on_error_reduce typ1, we make a choice. When the parser reads a type and finds an invalid token, it decides that this type is finished, even though, in reality, this type could be continued with ARROW …. This in turn causes the parser to perform another reduction and consider the current declaration finished, even though, in reality, this declaration could be continued with ARROW ….

In summary, when writing a diagnostic message for state 15, one should take into account the fact that this state can be reached via spurious reductions and (therefore) SEMICOLON may not be the only permitted continuation. One way of doing this, without explicitly listing all permitted continuations, is to write: “Up to this point, a declaration has been recognized. If this declaration is complete, then at this point, a semicolon is expected”.

11.4  A working example

The demo demos/calc-syntax-errors illustrates this approach to error handling. It is based on the demo demos/calc, which involves a very simple grammar of arithmetic expressions. Compared with demos/calc, one %on_error_reduce declaration is added so as to reduce the number of error states. There remain just 9 error states, for which we write 5 distinct syntax error messages. These messages are stored in the file demos/calc-syntax-errors/parserMessages.messages. The file demos/calc-syntax-errors/dune instructs the build system to check this file for correctness, irredundancy, and completeness and to compile this file into an OCaml module parserMessages.ml. This OCaml module contains a single function, ParserMessages.message, which maps a state number to a diagnostic message. It is called from the main module, demos/calc-syntax-errors/calc.ml. There, we use the facilities offered by the module MenhirLib.ErrorReports to print a full syntax error message, which includes the precise location of the error as well as the diagnostic message returned by the function ParserMessages.message. As icing on the cake, we allow the diagnostic message to contain placeholders of the form $i, where i is an integer constant, understood as a 0-based index into the parser’s stack. We replace such a placeholder with the fragment of the source text that corresponds to this stack entry. A number of expected-output files demonstrate the kind of syntax error messages that we produce; see for instance demos/calc-syntax-errors/calc03.exp and demos/calc-syntax-errors/calc07.exp.
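To give a concrete idea of how such a function is exploited, here is a minimal sketch (not the demo’s exact code) of a loop that drives Menhir’s incremental API (§9.2) and, when an error is detected, queries ParserMessages.message. The modules Parser, Lexer, and ParserMessages are assumed to be the generated ones, and main is a hypothetical start symbol:

  module I = Parser.MenhirInterpreter

  let rec loop lexbuf (checkpoint : _ I.checkpoint) =
    match checkpoint with
    | I.InputNeeded _ ->
        (* The parser needs a token: read one and offer it. *)
        let token = Lexer.token lexbuf in
        let startp = lexbuf.Lexing.lex_start_p
        and endp = lexbuf.Lexing.lex_curr_p in
        loop lexbuf (I.offer checkpoint (token, startp, endp))
    | I.Shifting _ | I.AboutToReduce _ ->
        loop lexbuf (I.resume checkpoint)
    | I.HandlingError env ->
        (* Map the number of the error state to a hand-written message. *)
        let s = I.current_state_number env in
        prerr_string (try ParserMessages.message s with Not_found -> "Syntax error.\n")
    | I.Accepted _ ->
        print_string "OK.\n"
    | I.Rejected ->
        assert false (* unreachable: we stop at HandlingError *)

The loop would be started with loop lexbuf (Parser.Incremental.main lexbuf.Lexing.lex_curr_p).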

The CompCert verified compiler offers another real-world example. The “pre-parser” is where syntax errors are detected: see cparser/pre_parser.mly. A database of erroneous input sentences and (templates for) diagnostic messages is stored in cparser/handcrafted.messages.

12  Coq back-end

Menhir is able to generate a parser whose correctness can be formally verified using the Coq proof assistant [13]. This feature is used to construct the parser of the CompCert verified compiler [17].

Setting the --coq switch on the command line enables the Coq back-end. When this switch is set, Menhir expects an input file whose name ends in .vy and generates a Coq file whose name ends in .v.

Like a .mly file, a .vy file is a grammar specification, with embedded semantic actions. The only difference is that the semantic actions in a .vy file are expressed in Coq instead of OCaml. A .vy file otherwise uses the same syntax as a .mly file. CompCert’s cparser/Parser.vy serves as an example.
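As a tiny illustration (a sketch, not taken from CompCert), a .vy file might look as follows; the semantic actions and the declared types are now Coq expressions and Coq types:

  %token<nat> INT
  %token PLUS EOF
  %start<nat> main
  %type<nat> expr
  %%
  main: e = expr EOF          { e }
  expr: i = INT               { i }
      | e = expr PLUS i = INT { e + i }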

Several restrictions are imposed when Menhir is used in --coq mode:

  • The error handling mechanism (§10) is absent. The $syntaxerror keyword and the error token are not supported.
  • Location information is not propagated. The $start* and $end* keywords (Figure 14) are not supported.
  • %parameter (§4.1.2) is not supported.
  • %inline (§5.3) is not supported.
  • The standard library (§5.4) is not supported, of course, because its semantic actions are expressed in OCaml. If desired, the user can define an analogous library, whose semantic actions are expressed in Coq.
  • Because Coq’s type inference algorithm is rather unpredictable, the Coq type of every nonterminal symbol must be provided via a %type or %start declaration (§4.1.5, §4.1.6).
  • Unless the proof of completeness has been deactivated using --coq-no-complete, the grammar must not have a conflict (not even a benign one, in the sense of §6.1). That is, the grammar must be LR(1). Conflict resolution via priority and associativity declarations (§4.1.4) is not supported. The reason is that there is no simple formal specification of how conflict resolution should work.

The generated file contains several modules:

  • The module Gram defines the terminal and non-terminal symbols, the grammar, and the semantic actions.
  • The module Aut contains the automaton generated by Menhir, together with a certificate that is checked by Coq while establishing the soundness and completeness of the parser.

The type terminal of the terminal symbols is an inductive type, with one constructor for each terminal symbol. A terminal symbol named Foo in the .vy file is named Foo't in Coq. A terminal symbol per se does not carry a semantic value.

We also define the type token of tokens, that is, dependent pairs of a terminal symbol and a semantic value of an appropriate type for this symbol. We model the lexer as an object of type Streams.Stream token, that is, an infinite stream of tokens.

The type nonterminal of the non-terminal symbols is an inductive type, with one constructor for each non-terminal symbol. A non-terminal symbol named Bar in the .vy file is named Bar'nt in Coq.

The proof of termination of an LR(1) parser in the case of invalid input seems far from obvious. We did not find such a proof in the literature. In an application such as CompCert [17], this question is not considered crucial. For this reason, we did not formally establish the termination of the parser. Instead, in order to satisfy Coq’s termination requirements, we use the “fuel” technique: the parser takes an additional parameter log_fuel of type nat such that 2^log_fuel is the maximum number of steps the parser is allowed to perform. In practice, one can use a value of, say, 40 or 50 to make sure the parser will never run out of fuel in a reasonable time.

Parsing can have three different outcomes, represented by the type parse_result. (This definition is implicitly parameterized over the initial state init. We omit the details here.)

  Inductive parse_result :=
  | Fail_pr_full: state -> token -> parse_result
  | Timeout_pr: parse_result
  | Parsed_pr:
      symbol_semantic_type (NT (start_nt init)) ->
      Stream token ->
      parse_result.

The outcome Fail_pr_full means that parsing has failed because of a syntax error. (If the completeness of the parser with respect to the grammar has been proved, this implies that the input is invalid). It contains two pieces of information: the state of the parser and the token which caused the error. These are provided for error reporting, if desired. It is important to note that, even though they should be correct, the validity of these two pieces of information is not certified by either the correctness or the completeness theorem. For more discussion on this, see §12.1. The outcome Timeout_pr means that the fuel has been exhausted. Of course, this cannot happen if the parser was given an infinite amount of fuel, as suggested above. The outcome Parsed_pr means that the parser has succeeded in parsing a prefix of the input stream. It carries the semantic value that has been constructed for this prefix, as well as the remainder of the input stream.

For each entry point entry of the grammar, Menhir generates a parsing function entry, whose type is nat -> Stream token -> parse_result.

Two theorems are provided, named entry_point_correct and entry_point_complete. The correctness theorem states that, if a word (a prefix of the input stream) is accepted, then this word is valid (with respect to the grammar) and the semantic value that is constructed by the parser is valid as well (with respect to the grammar). The completeness theorem states that if a word (a prefix of the input stream) is valid (with respect to the grammar), then (given sufficient fuel) it is accepted by the parser.

These results imply that the grammar is unambiguous: for every input, there is at most one valid interpretation. This is proved by another generated theorem, named Parser.unambiguous.

The parsers produced by Menhir’s Coq back-end must be linked with a Coq library. This library can be installed via the command opam install coq-menhirlib.2 The Coq sources of this library can be found in the coq-menhirlib directory of the Menhir repository.

The directory demos/coq-minicalc contains a minimal example that shows how to set things up.

The CompCert verified compiler [17,16] offers a real-world example. There, see in particular the directory cparser.

12.1  Error messaging options for Coq mode

Users of the Coq mode have several options for providing error messages from the parser. If they wish, they can follow the pattern of CompCert and use Menhir’s incremental mode for a non-verified separate parser (§11.4). This may also aid in parsing languages (such as C) that need a lexical feedback loop for correct parsing.

A similar option is available in demos/calc-syntax-errors, where a second parser is used after the first to determine errors.

The parse result Fail_pr_full provides the third and simplest option. As it carries state and token information, it allows constructing meaningful error messages with a small amount of work. The generated function Aut.N_of_state converts a state to a state number. While this is not as powerful as the advanced error handling enabled by the incremental API for non-verified parsers, this does allow interoperability with the existing .messages files and tooling, described in §11.1 and §11.2. This allows error messaging without a second parser. An example is provided in demos/coq-syntax-errors.

Note that the extra information carried by the data constructor Fail_pr_full (and by the related data constructor Fail_sr_full that is used internally) is not verified. This extra information is provided for convenience, but there is no proof of its correctness.

Users who wish to ignore this extra information can use the abbreviated notation Fail_pr and Fail_sr. This notation is used in the statements of the theorems about the parser. It is available only in Coq code, such as in demos/coq-minicalc, not in extracted OCaml code.

13  Building grammarware on top of Menhir

It is possible to build a variety of grammar-processing tools, also known as “grammarware” [14], on top of Menhir’s front-end. Indeed, Menhir offers a facility for dumping a .cmly file, which contains a (binary-form) representation of the grammar and automaton, as well as a library, MenhirSdk, for (programmatically) reading and exploiting a .cmly file. These facilities are described in §13.1. Furthermore, Menhir allows decorating a grammar with “attributes”, which are ignored by Menhir’s back-ends, yet are written to the .cmly file, thus can be exploited by other tools, via MenhirSdk. Attributes are described in §13.2.

13.1  Menhir’s SDK

The command line option --cmly causes Menhir to produce a .cmly file in addition to its normal operation. This file contains a (binary-form) representation of the grammar and automaton. This is the grammar that is obtained after the following steps have been carried out:

  • joining multiple .mly files, if necessary;
  • eliminating anonymous rules;
  • expanding away parameterized nonterminal symbols;
  • removing unreachable nonterminal symbols;
  • performing OCaml type inference, if the --infer switch is used;
  • inlining away nonterminal symbols that are decorated with %inline.

The library MenhirSdk offers an API for reading a .cmly file. The functor MenhirSdk.Cmly_read.Read reads such a file and produces a module whose signature is MenhirSdk.Cmly_api.GRAMMAR. This API is not explained in this document; for details, the reader is expected to follow the above links.
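As a sketch of what a small grammarware tool might look like, the following program loads a .cmly file (under the hypothetical name parser.cmly) and prints the name of every nonterminal symbol; the iteration and naming functions used here are assumed to be offered by the signature MenhirSdk.Cmly_api.GRAMMAR:

  (* Load the .cmly file at program start-up. *)
  module G = MenhirSdk.Cmly_read.Read (struct let filename = "parser.cmly" end)

  let () =
    (* Iterate over the nonterminal symbols and print their names. *)
    G.Nonterminal.iter (fun nt ->
      print_endline (G.Nonterminal.name nt))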

13.2  Attributes

Attributes are decorations that can be placed in .mly files. They are ignored by Menhir’s back-ends, but are written to .cmly files, thus can be exploited by other tools, via MenhirSdk.

An attribute consists of a name and a payload. An attribute name is an OCaml identifier, such as cost, or a list of OCaml identifiers, separated with dots, such as my.name. An attribute payload is an OCaml expression of arbitrary type, such as 1 or "&&" or print_int. Following the syntax of OCaml’s attributes, an attribute’s name and payload are separated with one or more spaces, and are delimited by [@ and ]. Thus, [@cost 1] and [@printer print_int] are examples of attributes.

An attribute can be attached at one of four levels:

  1. An attribute can be attached with the grammar. Such an attribute must be preceded with a % sign and must appear in the declarations section (§4.1). For example, the following is a valid declaration:
      %[@trace true]
    
  2. An attribute can be attached with a terminal symbol. Such an attribute must follow the declaration of this symbol. For example, the following is a valid declaration of the terminal symbol INT:
      %token<int> INT [@cost 0] [@printer print_int]
    
  3. An attribute can be attached with a nonterminal symbol. Such an attribute must appear inside the rule that defines this symbol, immediately after the name of this symbol. For instance, the following is a valid definition of the nonterminal symbol expr:
      expr [@default EConst 0]:
        i = INT                  { EConst i }
      | e1 = expr PLUS e2 = expr { EAdd (e1, e2) }
    
    An attribute can be attached with a parameterized nonterminal symbol:
      option [@default None] (X):
              { None }
      | x = X { Some x }
    
    An attribute cannot be attached with a nonterminal symbol that is decorated with the %inline keyword.
  4. An attribute can be attached with a producer (§4.2.3), that is, with an occurrence of a terminal or nonterminal symbol in the right-hand side of a production. Such an attribute must appear immediately after the producer. For instance, in the following rule, an attribute is attached with the producer expr*:
      exprs:
        LPAREN es = expr* [@list true] RPAREN { es }
    

As a convenience, it is possible to attach many attributes with many (terminal and nonterminal) symbols in one go, via an %attribute declaration, which must be placed in the declarations section (§4.1). For instance, the following declaration attaches both of the attributes [@cost 0] and [@precious false] with each of the symbols INT and id:

  %attribute INT id [@cost 0] [@precious false]

An %attribute declaration can be considered syntactic sugar: it is desugared away in terms of the four forms of attributes presented earlier. (The command line switch --only-preprocess can be used to see how it is desugared.)

If an attribute is attached with a parameterized nonterminal symbol, then, when this symbol is expanded away, the attribute is transmitted to every instance. For instance, in an earlier example, the attribute [@default None] was attached with the parameterized symbol option. Then, every instance of option, such as option(expr), option(COMMA), and so on, inherits this attribute. To attach an attribute with one specific instance only, one can use an %attribute declaration. For instance, the declaration %attribute option(expr) [@cost 10] attaches an attribute with the nonterminal symbol option(expr), but not with the symbol option(COMMA).

14  Interaction with build systems

This section explains some details of the compilation workflow, including OCaml type inference and its repercussions on dependency analysis (§14.1) and compilation flags (§14.2). This material should be of interest only to authors of build systems who wish to build support for Menhir into their system. Ordinary users should skip this section and use a build system that knows about Menhir, such as dune (preferred) or ocamlbuild.

14.1  OCaml type inference and dependency analysis

In an ideal world, the semantic actions in a .mly file should be well-typed according to the OCaml type discipline, and their types should be known to Menhir, which may need this knowledge. (When --inspection is set, Menhir needs to know the OCaml type of every nonterminal symbol.) To address this problem, three approaches exist:

  • Ignore the problem and let Menhir run without OCaml type information (§14.1.1).
  • Let Menhir obtain OCaml type information by invoking the OCaml compiler (§14.1.2).
  • Let Menhir request and receive OCaml type information without invoking the OCaml compiler (§14.1.3).

14.1.1  Running without OCaml type information

The simplest thing to do is to run Menhir without any of the flags described in the following (§14.1.2, §14.1.3). Then, the semantic actions are not type-checked, and their OCaml type is not inferred. (This is analogous to using ocamlyacc.) The drawbacks of this approach are as follows:

  • A type error in a semantic action is detected only when the .ml file produced by Menhir is type-checked. The location of the type error, as reported by the OCaml compiler, can be suboptimal.
  • Unless a %type declaration for every nonterminal symbol is given, the inspection API cannot be generated, that is, --inspection must be turned off.

14.1.2  Obtaining OCaml type information by calling the OCaml compiler

The second approach is to let Menhir invoke the OCaml compiler so as to type-check the semantic actions and infer their types. This is done by invoking Menhir with the --infer switch, as follows.

--infer.  This switch causes the semantic actions to be checked for type consistency before the parser is generated. To do so, Menhir generates a mock .ml file, which contains just the semantic actions, and invokes the OCaml compiler, under the form ocamlc -i, so as to type-check this file and infer the types of the semantic actions. Menhir then reads this information and produces real .ml and .mli files.

--ocamlc command.  This switch controls how ocamlc is invoked. It allows setting both the name of the executable and the command line options that are passed to it.

One difficulty with this approach is that the OCaml compiler usually needs to consult a few .cm[iox] files. Indeed, if the .mly file contains a reference to an external OCaml module, say A, then the OCaml compiler typically needs to read one or more files named A.cm[iox].

This implies that these files must have been created first. But how is one supposed to know, exactly, which files should be created first? One must scan the .mly file so as to find out which external modules it depends upon. In other words, a dependency analysis is required. This analysis can be carried out by invoking Menhir with the --depend switch, as follows.

--depend.  This switch causes Menhir to generate dependency information for use in conjunction with make. When invoked in this mode, Menhir does not generate a parser. Instead, it examines the grammar specification and prints a list of prerequisites for the targets basename.cm[iox], basename.ml, and basename.mli. This list is intended to be textually included within a Makefile. To produce this list, Menhir generates a mock .ml file, which contains just the semantic actions, invokes ocamldep, and postprocesses its output.

--raw-depend.  This switch is analogous to --depend. However, in this case, ocamldep’s output is not postprocessed by Menhir: it is echoed without change. This switch is not suitable for direct use with make; it is intended for use with omake or ocamlbuild, which perform their own postprocessing.

--ocamldep command.  This switch controls how ocamldep is invoked. It allows setting both the name of the executable and the command line options that are passed to it.

14.1.3  Obtaining OCaml type information without calling the OCaml compiler

The third approach is to let Menhir request and receive OCaml type information without allowing Menhir to invoke the OCaml compiler. There is nothing magic about this: to achieve this, Menhir must be invoked twice, and the OCaml compiler must be invoked (by the user, or by the build system) in between. This is done as follows.

--infer-write-query mockfilename.  When invoked in this mode, Menhir does not generate a parser. Instead, it generates a mock .ml file, named mockfilename, which contains just the semantic actions. Then, it stops.

It is then up to the user (or to the build system) to invoke ocamlc -i so as to type-check the mock .ml file and infer its signature. The output of this command should be redirected to some file sigfilename. Then, Menhir can be invoked again, as follows.

--infer-read-reply sigfilename.  When invoked in this mode, Menhir assumes that the file sigfilename contains the result of running ocamlc -i on the file mockfilename. It reads and parses this file, so as to obtain the OCaml type of every semantic action, then proceeds normally to generate a parser.
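Putting the three steps together, and using hypothetical file names, the protocol might be exercised as follows:

  menhir parser.mly --infer-write-query mock.ml
  ocamlc -i mock.ml > inferred.mli
  menhir parser.mly --infer-read-reply inferred.mli

In reality, the invocation of ocamlc would typically include suitable -I flags, so that the external modules on which the semantic actions depend can be found.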

This protocol was introduced on 2018/05/23; earlier versions of Menhir do not support it. Its existence can be tested as follows:

--infer-protocol-supported.  When invoked with this switch, Menhir immediately terminates with exit code 0. An earlier version of Menhir, which does not support this protocol, would display a help message and terminate with a nonzero exit code.

14.2  Compilation flags

The following switches allow querying Menhir so as to find out which compilation flags should be passed to the OCaml compiler and linker.

--suggest-comp-flags.  This switch causes Menhir to print a set of suggested compilation flags, and exit. These flags are intended to be passed to the OCaml compilers (ocamlc or ocamlopt) when compiling and linking the parser generated by Menhir. What flags are suggested? In the absence of the --table switch, no flags are suggested. When --table is set, a -I flag is suggested, so as to ensure that MenhirLib is visible to the OCaml compiler.

--suggest-link-flags-byte.  This switch causes Menhir to print a set of suggested link flags, and exit. These flags are intended to be passed to ocamlc when producing a bytecode executable. What flags are suggested? In the absence of the --table switch, no flags are suggested. When --table is set, the object file menhirLib.cma is suggested, so as to ensure that MenhirLib is linked in.

--suggest-link-flags-opt.  This switch causes Menhir to print a set of suggested link flags, and exit. These flags are intended to be passed to ocamlopt when producing a native code executable. What flags are suggested? In the absence of the --table switch, no flags are suggested. When --table is set, the object file menhirLib.cmxa is suggested, so as to ensure that MenhirLib is linked in.

--suggest-menhirLib.  This switch causes Menhir to print (the absolute path of) the directory where MenhirLib was installed.

--suggest-ocamlfind.  This switch is deprecated and may be removed in the future. It always prints false.

15  Comparison with ocamlyacc

Roughly speaking, Menhir is 90% compatible with ocamlyacc. Legacy ocamlyacc grammar specifications are accepted and compiled by Menhir. The resulting parsers run and produce correct parse trees. However, parsers that explicitly invoke functions in the module Parsing behave slightly incorrectly. For instance, the functions that provide access to positions return a dummy position when invoked by a Menhir parser. Porting a grammar specification from ocamlyacc to Menhir requires replacing all calls to Parsing with new Menhir-specific keywords (§7).

Here is an incomplete list of the differences between ocamlyacc and Menhir. The list is roughly sorted by decreasing order of importance.

  • Menhir allows the definition of a nonterminal symbol to be parameterized (§5.2). A formal parameter can be instantiated with a terminal symbol, a nonterminal symbol, or an anonymous rule (§4.2.4). A library of standard parameterized definitions (§5.4), including options, sequences, and lists, is bundled with Menhir. EBNF syntax is supported: the modifiers ?, +, and * are sugar for options, nonempty lists, and arbitrary lists (Figure 2).
  • ocamlyacc only accepts LALR(1) grammars. Menhir accepts LR(1) grammars, thus avoiding certain artificial conflicts.
  • Menhir’s %inline keyword (§5.3) helps avoid or resolve some LR(1) conflicts without artificial modification of the grammar.
  • Menhir explains conflicts (§6) in terms of the grammar, not just in terms of the automaton. Menhir’s explanations are believed to be understandable by mere humans.
  • Menhir offers an incremental API (in --table mode only) (§9.2). This means that the state of the parser can be saved at any point (at no cost) and that parsing can later be resumed from a saved state.
  • Menhir offers a set of tools for building a (complete, irredundant) set of invalid input sentences, mapping each such sentence to a (hand-written) error message, and maintaining this set as the grammar evolves (§11).
  • In --coq mode, Menhir produces a parser whose correctness and completeness with respect to the grammar can be checked by Coq (§12).
  • Menhir offers an interpreter (§8) that helps debug grammars interactively.
  • Menhir allows grammar specifications to be split over multiple files (§5.1). It also allows several grammars to share a single set of tokens.
  • Menhir produces reentrant parsers.
  • Menhir is able to produce parsers that are parameterized by OCaml modules.
  • ocamlyacc requires semantic values to be referred to via keywords: $1, $2, and so on. Menhir allows semantic values to be explicitly named.
  • Menhir warns about end-of-stream conflicts (§6.4), whereas ocamlyacc does not. Menhir warns about productions that are never reduced, whereas, at least in some cases, ocamlyacc does not.
  • Menhir offers an option to typecheck semantic actions before a parser is generated: see --infer.
  • ocamlyacc produces tables that are interpreted by a piece of C code, requiring semantic actions to be encapsulated as OCaml closures and invoked by C code. Menhir offers a choice between producing tables and producing code. In either case, no C code is involved.
  • Menhir makes OCaml’s standard library module Parsing entirely obsolete. Access to locations is now via keywords (§7). Uses of raise Parse_error within semantic actions are deprecated. The function parse_error is deprecated. They are replaced with keywords (§10).
  • Menhir’s error handling mechanism (§10) is inspired by ocamlyacc’s, but is not guaranteed to be fully compatible. Error recovery, also known as re-synchronization, is not supported by Menhir.
  • The way in which severe conflicts (§6) are resolved is not guaranteed to be fully compatible with ocamlyacc.
  • Menhir warns about unused %token, %nonassoc, %left, and %right declarations. It also warns about %prec annotations that do not help resolve a conflict.
  • Menhir accepts OCaml-style comments.
  • Menhir allows %start and %type declarations to be condensed.
  • Menhir allows two (or more) productions to share a single semantic action.
  • Menhir produces better error messages when a semantic action contains ill-balanced parentheses.
  • ocamlyacc ignores semicolons and commas everywhere. Menhir regards semicolons and commas as significant, and allows them, or requires them, in certain well-defined places.
  • ocamlyacc allows %type declarations to refer to terminal or non-terminal symbols, whereas Menhir requires them to refer to non-terminal symbols. Types can be assigned to terminal symbols with a %token declaration.

16  Questions and Answers


Is Menhir faster than ocamlyacc? What is the speed difference between menhir and menhir --table? A (not quite scientific) benchmark suggests that the parsers produced by ocamlyacc and menhir --table have comparable speed, whereas those produced by menhir are between 2 and 5 times faster. This benchmark excludes the time spent in the lexer and in the semantic actions.


How do I write Makefile rules for Menhir? This can be a bit tricky. If you must do this, see §14. It is recommended instead to use a build system with built-in support for Menhir, such as dune (preferred) or ocamlbuild.


How do I use Menhir with ocamlbuild? Pass -use-menhir to ocamlbuild. To pass options to Menhir, pass -menhir "menhir <options>" to ocamlbuild. To use Menhir’s table-based back-end, pass -menhir "menhir --table" to ocamlbuild, and either pass -package menhirLib to ocamlbuild or add the tag package(menhirLib) in the _tags file. To combine multiple .mly files, say a.mly and b.mly, into a single parser, say parser.{ml,mli}, create a file named parser.mlypack that contains the module names A B. See the directory demos/ocamlbuild for examples. To deal with .messages files (§11), use the rules provided in the file demos/ocamlbuild/myocamlbuild.ml.


How do I use Menhir with dune?
Please use dune version 1.4.0 or newer, as it has appropriate built-in rules for Menhir parsers. In the simplest scenario, where the parser resides in a single source file parser.mly, the dune file should contain a “stanza” along the following lines:

(menhir
  (modules parser)
  (flags --explain --dump)
  (infer true)
)

Ordinary command line switches, like --explain and --dump, are passed as part of the flags line, as done above. The --infer switch has special status and should not be used directly; instead, write (infer true) or (infer false), as done above. (The default is true.) The --table switch can also be listed as part of the flags line; if you do so, then you must add menhirLib to the list of libraries that your code requires, as in the following example:

(executable
   (name myexecutable)
   (libraries menhirLib)
)

The directory demos offers several examples. For more details, see dune’s documentation. To deal with .messages files (§11), please use and adapt the rules found in the file src/stage2/dune.


My .mly file begins with a module alias declaration module F = Foo. Because of this, the .mli file generated by Menhir contains references to F instead of Foo. This does not make sense! Beginning with Menhir 20200525, Menhir prefers to use the types inferred by the OCaml compiler over the types provided by the user in %type declarations. (This may sound strange, but these types can differ in some situations that involve polymorphic variants. Using the inferred type is required for type soundness.) In the presence of a module alias declaration such as module F = Foo, OCaml can infer types that begin with F. instead of Foo., and Menhir currently does not detect that F is a local name. The suggested fix is to avoid placing module alias declarations in .mly files.


Menhir reports more shift/reduce conflicts than ocamlyacc! How come? ocamlyacc sometimes merges two states of the automaton that Menhir considers distinct. This happens when the grammar is not LALR(1). If these two states happen to contain a shift/reduce conflict, then Menhir reports two conflicts, while ocamlyacc only reports one. Of course, the two conflicts are very similar, so fixing one will usually fix the other as well.


I do not use ocamllex. Is there an API that does not involve lexing buffers? Like ocamlyacc, Menhir produces parsers whose monolithic API (§9.1) is intended for use with ocamllex. However, it is possible to convert them, after the fact, to a simpler, revised API. In the revised API, there are no lexing buffers, and a lexer is just a function from unit to tokens. Converters are provided by the library module MenhirLib.Convert. This can be useful, for instance, for users of sedlex, the Unicode-friendly lexer generator. Also, please note that Menhir’s incremental API (§9.2) does not mention the type Lexing.lexbuf. In this API, the parser expects to be supplied with triples of a token and start/end positions of type Lexing.position.
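As a sketch, assuming the generated parser offers an entry point Parser.main and assuming a lexer function next_token of type unit -> Parser.token * Lexing.position * Lexing.position (both names are hypothetical), the conversion might look as follows:

  (* Convert the traditional, lexbuf-based entry point to the revised API,
     where the lexer is just a function that produces tokens on demand. *)
  let main_revised =
    MenhirLib.Convert.Simplified.traditional2revised Parser.main

  let result = main_revised next_token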


Is there other useful magic in MenhirLib? There is some. The module MenhirLib.ErrorReports offers some facilities for constructing syntax error messages. The module MenhirLib.LexerUtil offers facilities for extracting the position of a syntax error out of the lexing buffer and displaying it in a readable way.


I need both %inline and non-%inline versions of a non-terminal symbol. Is this possible? Define an %inline version first, then use it to define a non-%inline version, like this:

%inline ioption(X):  (* nothing *) { None } | x = X { Some x }
         option(X): o = ioption(X) { o }

This can work even in the presence of recursion, as illustrated by the following definition of (reversed, left-recursive, possibly empty) lists:

%inline irevlist(X):    (* nothing *) { [] } | xs = revlist(X) x = X { x :: xs }
         revlist(X): xs = irevlist(X) { xs }

The definition of irevlist is expanded into the definition of revlist, so in the end, revlist receives its normal, recursive definition. One can then view irevlist as a variant of revlist that is inlined one level deep.


Can I ship a generated parser while avoiding a dependency on MenhirLib? Yes. One option is to use the code-based back-end (that is, to not use --table). In this case, the generated parser is self-contained. Another option is to use the table-based back-end (that is, use --table) and include a copy of the files menhirLib.{ml,mli} together with the generated parser. The command menhir --suggest-menhirLib will tell you where to find these source files.


Why is $startpos off towards the left? It seems to include some leading whitespace. Indeed, as of 2015/11/04, the computation of positions has changed so as to match ocamlyacc’s behavior. As a result, $startpos can now appear to be too far off to the left. This is explained in §7. In short, the solution is to use $symbolstartpos instead.


Can I pretty-print a grammar in ASCII, HTML, or LaTeX format? Yes. Have a look at obelisk [4].


Does Menhir support mid-rule actions? Yes. See midrule and its explanation in §5.4.

17  Technical background

After experimenting with Knuth’s canonical LR(1) technique [15], we found that it really is not practical, even on today’s computers. For this reason, Menhir implements a slightly modified version of Pager’s algorithm [19], which merges states on the fly if it can be proved that no reduce/reduce conflicts will arise as a consequence of this decision. This is how Menhir avoids the so-called mysterious conflicts created by LALR(1) parser generators [7, section 5.7].

Menhir’s algorithm for explaining conflicts is inspired by DeRemer and Pennello’s [6] and adapted for use with Pager’s construction technique.

By default, Menhir produces code, as opposed to tables. This approach has been explored before [3,9]. Menhir performs some static analysis of the automaton in order to produce more compact code.

When asked to produce tables, Menhir performs compression via first-fit row displacement, as described by Tarjan and Yao [23]. Double displacement is not used. The action table is made sparse by factoring out an error matrix, as suggested by Dencker, Dürre, and Heuft [5].

The type-theoretic tricks that triggered our interest in LR parsers [21] are not implemented in Menhir. In the beginning, we did not implement them because the OCaml compiler did not at the time offer generalized algebraic data types (GADTs). Today, OCaml has GADTs, but, as the saying goes, “if it ain’t broken, don’t fix it”.

The main ideas behind the Coq back-end are described in a paper by Jourdan, Pottier and Leroy [13]. The C11 parser in the CompCert compiler [17] is constructed by Menhir and verified by Coq, following this technique. How to construct a correct C11 parser using Menhir is described by Jourdan and Pottier [12].

The approach to error reports presented in §11 was proposed by Jeffery [10] and further explored by Pottier [20].

18  Acknowledgements

Menhir’s interpreter (--interpret) and table-based back-end (--table) were implemented by Guillaume Bau, Raja Boujbel, and François Pottier. The project was generously funded by Jane Street Capital, LLC through the “OCaml Summer Project” initiative.

Frédéric Bour provided motivation and an initial implementation for the incremental API, for the inspection API, for attributes, and for MenhirSdk. Merlin, an emacs mode for OCaml, contains an impressive incremental, syntax-error-tolerant OCaml parser, which is based on Menhir and has been a driving force for Menhir’s APIs.

Jacques-Henri Jourdan designed and implemented the Coq back-end and did the Coq proofs for it.

Gabriel Scherer provided motivation for investigating Jeffery’s technique.

References

[1]
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.
[2]
Andrew Appel. Modern Compiler Implementation in ML. Cambridge University Press, 1998.
[3]
Achyutram Bhamidipaty and Todd A. Proebsting. Very fast YACC-compatible parsers (for very little effort). Software: Practice and Experience, 28(2):181–190, 1998.
[4]
Lélio Brun. Obelisk. https://github.com/Lelio-Brun/Obelisk, 2017.
[5]
Peter Dencker, Karl Dürre, and Johannes Heuft. Optimization of parser tables for portable compilers. ACM Transactions on Programming Languages and Systems, 6(4):546–572, 1984.
[6]
Frank DeRemer and Thomas Pennello. Efficient computation of LALR(1) look-ahead sets. ACM Transactions on Programming Languages and Systems, 4(4):615–649, 1982.
[7]
Charles Donnelly and Richard Stallman. Bison, 2015.
[8]
John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 2000.
[9]
R. Nigel Horspool and Michael Whitney. Even faster LR parsing. Software: Practice and Experience, 20(6):515–535, 1990.
[10]
Clinton L. Jeffery. Generating LR syntax error messages from examples. ACM Transactions on Programming Languages and Systems, 25(5):631–640, 2003.
[11]
Steven C. Johnson. Yacc: Yet another compiler compiler. In UNIX Programmer’s Manual, volume 2, pages 353–387. Holt, Rinehart, and Winston, 1979.
[12]
Jacques-Henri Jourdan and François Pottier. A simple, possibly correct LR parser for C11. ACM Transactions on Programming Languages and Systems, 39(4):14:1–14:36, August 2017.
[13]
Jacques-Henri Jourdan, François Pottier, and Xavier Leroy. Validating LR(1) parsers. In European Symposium on Programming (ESOP), volume 7211 of Lecture Notes in Computer Science, pages 397–416. Springer, 2012.
[14]
Paul Klint, Ralf Lämmel, and Chris Verhoef. Toward an engineering discipline for grammarware. ACM Transactions on Software Engineering and Methodology, 14(3):331–380, 2005.
[15]
Donald E. Knuth. On the translation of languages from left to right. Information & Control, 8(6):607–639, 1965.
[16]
Xavier Leroy. The CompCert C verified compiler. https://github.com/AbsInt/CompCert, 2014.
[17]
Xavier Leroy. The CompCert C compiler. http://compcert.inria.fr/, 2015.
[18]
Xavier Leroy, Damien Doligez, Alain Frisch, Jacques Garrigue, Didier Rémy, and Jérôme Vouillon. The OCaml system: documentation and user’s manual, 2016.
[19]
David Pager. A practical general method for constructing LR(k) parsers. Acta Informatica, 7:249–268, 1977.
[20]
François Pottier. Reachability and error diagnosis in LR(1) parsers. In Compiler Construction (CC), pages 88–98, 2016.
[21]
François Pottier and Yann Régis-Gianas. Towards efficient, typed LR parsers. Electronic Notes in Theoretical Computer Science, 148(2):155–180, 2006.
[22]
David R. Tarditi and Andrew W. Appel. ML-Yacc User’s Manual, 2000.
[23]
Robert Endre Tarjan and Andrew Chi-Chih Yao. Storing a sparse table. Communications of the ACM, 22(11):606–611, 1979.

1
The computation of $symbolstartpos is optimized by Menhir under two assumptions about the lexer. First, Menhir assumes that the lexer never produces a token whose start and end positions are equal. Second, Menhir assumes that two positions produced by the lexer are equal if and only if they are physically equal. If the lexer violates either of these assumptions, the computation of $symbolstartpos could produce a result that differs from Parsing.symbol_start_pos().
2
This assumes that you have installed opam, the OCaml package manager, and that you have run the command opam repo add coq-released https://coq.inria.fr/opam/released.

mȴr|MiSn {^w]j;iH31&A 4ݛ&3]6Iyl`&g0T}cM5!<ģ'j`tٛ06Wk#c϶Z},78$d%{MwXXiVZ"Ţe=m%oNN9>a֍kg*EXG1 "+iNt{FT$@_.bBElU~EO5ڟ/ӛ:mudo=/^%댾GHNʔ0ijB^]:~с=`!1Pz}nE\p~NlKĚ0vu+y3)F<AN@scGDյWx`ATm gOEӝ>r&h4w<~!>ő]VY,]/[xfaC+4EHyjׅ[JI`yD(k@77ԽgQƞXmbcc$lv{#DKfir&h5TຼS"Bg4= CY,s'bTOn 6&̟U=ߜ֑2u;y [ZHSbz84fӘPiM䷤^@0%Y!,;hT=WwC~Ɗq5 ū_Qz<.akmŏ$d7nuLg͙zdW,4}vl#!jߧI3VJKFVdbjOIx&[~W9Tk0a{MIFy܉D <_GwCj!>g酘KƝ ߯ 39$$bzTlkOpzF/Boa}ߙ!/Awqs*=6)P9/ԁrzh\a'\C: z|n*'IG<[i˽ = 9iem)[;XMO$}А WU^13&0)[TUgߵd;4397kqH斍SSfO]' %e<3':bi&ePʣ:`ރGGa8qRRsg;騖m"7SIb`+-Ičd2 xq_VFt`U$ Ŷܷgϥ.ZW)Lf6w96gy`P8N]|6~hdQbXH2#cruX#\Ibʳ<2Ċ8͞ՎV&EWhot퓗/ b]&6LviM`,AKI'g 1OH5.S_Gܐ'YlJ< Zv7Ž ZͲ2Yn?0U!ր@)7g2fj |6:YQn5%3B^]<ݤv z9S[qw2s2z9w(Id!D#VKie7 f΁Ԃy t}&<'7NdО;G4 EI$g]/JQM4l9 R(rWHط zy*φ 1 N[XAdA/@Oi?iP)

/DX)5BPχ'NTY&Ub2G91$vf1DLZoȊqa_8'2 MWOń0~ҥ17ǜ]`рDv|ً,/8F%6\Uz%c+Ka;y\܌Wߤ/:q;:dW>8;iP=p`rXo@ahq1C{Y Ԕn#$XnMZ~v( Qzkfs_wpnTG0g LGM"1AM.eR֣]=\^CEJ#"ةߪq6"- r]fqC+VDs/> @VTֶkzeNiۉ*~FuBܰvoES )h쵈*"%1ct{rFnuEn%r\Ch+Xk57'bKu>Xb+=!($%) )(M6~AEy>Ӆ@lIy'8+gII2lI~'Y@GBHrC$ 0 *^jr9;i6<]K4pNTӇ l9iR∝;R%2ln2yZ9F7XoQ>"QK^Kg χBmzJF\Cx,Ƃ(,)b\VeyL:V&> q=MMwkMClFۨ{-f1OD$xnNêNz.He\hiɓ'u.^CKk-P|n:g{ .}T%8?.Mmk҄Z€@"Jh(&İZjԐF>j~rIJ,)JS#UlFGt/BhVG_(5_(| qK H1nھ׳4&gXX ߹$_#oE_8&pRR=2t#M5jҵ˧!Ӝߚya8[)Q2EϢkU59/U'̰4H%vd*Ȟ&~?{MBL MO^s.xBĹ?TQb^y8G<'~x8x)BLmQru\)c ~Foyϳ n;.QL ֪=Z=R|b2;s(?oxҟ:vbHI"ʑN,_TW9FP^yvPiR"a/#3Kkk'/*)c&LoԜJӟΒSdko;6HV uƨIj(wbӼJ۪&5֋cDM,wT8^pVb:ٝ.T'=~J͞҈=>!uJ[/,|MSAnC/F2A†͇* x-Z%SLXeyԾ4)HMO^`'w"abOFF=hqO*D}RLv#Bsrf>H{w&魾;2qc"FXq*ו{&9?_G|1Pv%Jm<3x3dw!vF/fֹO^\""?%J~wu neĵϪDbfutI?ASl45 l{oz,UC4[XLⲯ@)s9z X^}2^piET=0Z},zv%d0!9Aw7¾y׎q"μC[@:%117%i[eAҘ<^jUoEoL%r:u\<=BY$Wۛ1{^W'KfgyάcCbÙWPVc]tq~ëtQn'[Wr ɳS6}+wJs ["5j!Rgn5xRs\=P |4NO\~Wd:hQ? 8o_V($.o`huv[O27cGyOpHҏ?i,`\KԆw>۠pGzhOHiSsQ/I7G쓆ΰFqix#oM]m$_r7YS@fT%4˙+0@3b{ps&\ܴ:3+eX{ gsGE GیzobZaBN:ڥ~h_ɣB٦wt?y _<zע^H.Ω|ڞbceX?q'5 / X" Ivԝ !({!@nPPE88ٛlj[qA~}pmfTgw/D&W>V6sdCCY( -NA~bwRx.my)lr . յ^4 SZ >֍NEb#I&O//CxI&3Q]7IJΞߖzת饺w9qy[쑺i\am%w Jr**<ϟF=L F ]|ȨL5עphk ;aWm΂[4E3IGwSᖳ쏲zls(2\Z 7U NJ|!nTG–X! {^II&Zm_Di5tg=f0;t:p8k wbTG}}j\#bNv$>%,tJҒtQMܘ`MՖaaxDMyN&*]< VY˸aJ|Ay Y*ܷp%r<ޒNX܊]=EqkaE !iv3  {HUW+Էߣmhf/o$qCX mҭBfQ;뾜d#9QKSNy%vhĻmY?fXdU$|Cwꂳ3o[/EZYxT~!:w*{OBš(t{F#˩3nԔY<3M82GmR66ۢG*BR=wd a>_ƦDx%Fxb*Y?hFfLtz> ӱի=uYmyPhBƠĘ~F: b9Cw\p; 2lcj?6L탅7{S94 \ 3Zְ02Y9c C[ݷ)8Jn%S6f*[WЪfBJ;v{OmC:A눴gf>zE\)/$x*6*0LeFɒYV`iLrvZaI@iH93$P($!P4,($4$,yVw6~!>XԷ$W_"ۯ0^qk$y/ۏ]v嚐4Wٕn0fXf awLY84=V'IX-|16Z[ bZ^Ɗqo>7׳NjȽo09 dZA4uj8*yedCZ{ o3@io) 9qi;ː8K2-B.wQH~-)T[]yYGT=^u)MܯSdy֪|fS~Un 3݌wǂ3șzeӅ)Y]0F y?ig(V{?> stream xڍt4\"Ja:m0e3zhA! тDQI5k9{gww?aa瑵C4,kj`/ObG Y`(8!0(S18M$Pt DX""`ؿHq@4y5$"dGzјc r11逬+n EP#s-Gah`tDy(^7 G;z0 fЂLK8QH{7`.p[D<ှCkpB]oBpd- #{ VE(Bb^P 9PQ(˯~ܲ"N CQS{l1g7aGӍw*`\9ЀXTD@T0[G_ |`_nnH73,n^0 ߁Z`E608?1n|C/;$?5W3crrH_ `^YF ?{$nsOj/g1-$0?,7 m17(ݐ0]|0DcPcjUEC1JE8` QJpm3Gt( & F`Θ C?!( 65a=E-ԗz%C0 6Nj@1)fAkcb(m[OLف_o`>0[H[pYoOۙ'&<-]#V )Os+>wB,ZOr&}xVl&=?e yU`.G/^хCKF mХBco;J"jZ-m59hŌ~,, P-&}3Ť.rJ]M:g=6D+ty9M(C Wm4Sx"k#F=}F=J7۔ <G 7(If}/qDg uf1E9I*8;߻B6Ğz|/(OV?)֌86qLtF!U!n=-HCB&0,)7r-ꠂ7*qVI Y(rjcx/1nWɵsYaFn?VOlPcNy;ZM'>gR7{$+kB/4 Y}-9sۍ4R(3WwNUr1w,44/RN [{cJQUt ]5LO圎^EdXW~˩qdZaJ4P!g|`d't| J|J1yh<\ T ,CO+Cؽ 4Å/o1<cOnntհ"4kXX}T(Y' ]賧v3<:dp)*Jf~NyoY=aӒ*,L?Z\>quώ;=͗E^~2-q .RD\]7ڜuЌ4_>pU8+Ry%ҫsäZԆ1!)bIcr߃鄄@ΟρX""?=VB]Ӣ Gh/ga ٮq!upǥڣ<ހv>j ʪ22{̺^#FhMd6;r^]i_l 9I6fWIkJ-iXŁ)Y=ERbIYIlgIȩ__ y'Pg=ږ_'RW./zIpaXH(oAvyOP]b#Kcz)Y6><*)V9aSLUbkn.zҢzgS6ML}bKܓN>J֛o4HƾfqFvl*nRo~iC՘ʈUԣW$AN7|LMr W_K& ?*zؘ0)p.w**_{]v5J;53 S{-Md`sB+ם\QT$=Lf |[).H=N>皃g:C?s,#SvQ( Qxo'}'}iTqv@l3;v@$>uͦmX@񻂸;͌:-.R;zd87:v&Τ2p/+OW٬ 3z.{@6on%voշ(";֛!--Чmi/?ho$lDɁѸQuH A=g3bKR0m3iT6lkZs )g 05הn 6 n(+0 tKm.LA/^%ݛ|29YWesI}}w¾xoH~Qvs2.l4 2tX'nǥ좿4&mP7gɹ t{e޺n6*̞Qe⇣BľP`""@EȽ"l#ۧkxy7Օ8bamVsMK%ILt7]2ueaX]}oVjqT(o[IݏV2U#aV$l;-Byv?W"6bEX3=tzřAeCfw.dxf%$|0aOȔҺ;"V5kRyUt_K/#6 Z8Ꮂ.BLf">22#&4( M4i;jtMH?9IK5OAl\aNq na=g˦u-иtěW^' ߆!_OplW6DOpX9ELjX93&>ˈyO %DIƛ*U\w皑yan``I ծ(c@|gqX1yO@= 캫L~g G%oPA m-T#Oл]>ϽfqyB7>ysCZI4> $pI[oSq!+yG5PfÎogs5qڎWn*/;d{ɵڲi=-"Mj/:%Z3u'X2k3 FEOX[޹'gC2^-b^P臊"FX_Fy&SdBn7 *| 2FMc:O:O?1Һl) 3wc]&{Ά0.328CAX4ѭ1T[>^z""ƧYyx+c PvzKkIT4?^ |a~uh\vФ\X;="Vzwm̂T֧lb݉FUjwHL)w2r֗#2,(UskkY Ju XG{&~I>qw'ho=RYc@b KrBk(f7c5qU7+T{0Hʷ+ҾhAYrGk@}/aF/N}E֤C, sw~ Xr:l@]|5u )24lZfŕӖioB$s0euQy(ɽո!_ ~ 8{ vbQF \U=aqL"to$rhhG쐌"џtBm#Y ZEm <kJLky?О`bs+:*n]Vַ\vA*>N+PZZ,5qk@2" uWQz,GI|E~툞=S8GDf8nJ}tVZ+EJ`}سgTR^IMx|~ľ' ^SkqMGgU# jo  A3L#_h"&wVB{QSB 
+'AQXQoxq׷"S(An'X\"TMd!]__7Qpmof3"}eľBVD(w`E;s3݅`D\lŚ&nUUmB>xS]m 񣅥[AېTsG5ȰRy gNUӧĹo*4[pt Q鈵ln(nܳ]lHUEG*w&C]Iq}Gߩ0u):s&5 e!n3T[fМ$ng]|mP<:ah.D!a'}uD$[3sYbUgQ+sOedux8?EOI>vj'$n@eȿXhoi??[Y.HIeX|jHV``3hLaUR`klC~ϻEtTl95PsWb2Yr0 WIŒkmdUj$qϊx⠕_H5>T"|~܌UO~!jhfh] k~tof5OϏe;h0_^ ʭ ~"eC&"1ڮRj黹r$v~>_;ϣl oE7"?D_{FQc/ک0@'Ap|b驷Ŋu!ܠWB_H*]gvm9^ptࣩFDj9&'-Tw˙Q=+ݬ j$9}{kE6 9b QX4X+~hz6]}zpV4;2{ i}clMI]r'4_Hi ?D<Ę·rI;ːCϽua%7' s{8[h>.v>y;IW HP󌼩yF 'U@TK> stream xڍvT.")H7C 1t7 )!!݂Htwt珟wνkݻf}v{ =&9,ـ윂Ie ' ^  !0Rt l@nO/ G 9CN/ ssXZs01m3 [m#l03.p;Av#;R[4`g9w-Oe-+\fw9 u|pJU;0Vooߎ пAff0[; X@lU%v+V8A QAqus4s!6K񕥡0[[0;?)8tb[.ɎC wK=jf۽/!7X`X b~Ap9p' CS%o0W#ߟ>q_吓gSd0W'M2ǃQANC-`?>>?;MT`MrCN^N/721;!'L1b#i { 2dRy8qġ6zF lYa\@`5#Z=dequ8>R/qr34 f{ĸx co,]"1 ?XwT0u?n? o_#}0<lغaf7_.=t;v3'Gk_||0l>=3 l,'sa[|=J1x4Es6Ui3;݆g!_coEۢ~MljiPl/q]Gq3>P)D{EpyY⊼wBBG Cb+Ahң okʵA8*5q }+ kpneɺb+ hT [M:։3#U1,}]Vj޾.įu}^E;E-]/ fS)qXg_> {I\Wr99߼zڎOϛyURsҼ4bmCʃ\+ni`T Յ*5cEK%RnTJ1+SGty<듥anW9:%ڍQ9k{l[5~=A[+F0Pe--ɲO/VLiiI8S}z}:ВZ$/EϘ7ܩ-9vJ*WY/W7K&4HY$Zou:GdZhݗ'GV yؔ~~HFf\M9Y|!"~AEz"u ӏ,A7' 1mA)a"6z=]cRG3׆-tR;hC|#b05HD\0QAFʋ<fAzf[/QV{*?&iqKgS V`|t?$\tG ݃%Og}ew|Bn,$̤ oG#P:ϲ?`"ưwp:%5Aqj*mp!+pdHJ$ۧ |#! my"cp;*Nb)zX%1@oe7)*b$ei{ȅB?I2[ WSLrV˔ƗXmX E;r͘X6<̄0F8h|h2W1OQ=)JʞZ|t #Z5gL~6E/}G^eE@񝨼ll }"sԩs7  VA W؅mw,C@`ucS]uǚ*.yIlܖ)&FyoǶCNw<8vΔs:kgٱg6դ)uę5%86y$3s/mJ=~`0~{ AA GHB|1yڡEĴf{TGa71b<ۚoqVѷ < UL͑ sDWFIj<9$nWd|PpmiDJS:u%Ƚ6u4{f!$~ 4p 5Dt#}SKg9GBG>+ Wp'y#,SQTs,!V|ʒljQFXXm}Wm W5R\98gafͥd5 zC'N?E= A  2=fA&]\.o<ݍ ˵aew*)JǴn9ebIȥ+1:i)׊X s]\IpWdэyKb p3dk.}(1m&sOXWiODY}{aeYH=I/&zHpu2?Ƕ5:ng53/iҺ{'~s>nJN>,`wxOGח,^wh/ ^! S  W,GifBr(g~bL ˦0N@"$ǖ|ُX&A:۟vԾu +F3t&0)&Q dkaFJP%.oq;J "3͈Bw>cnKe>kOU!gF5?o@ y|hqmd2/ ZG[F*sse`Mz*PY4' Ro{:~ۃ*1/CԮOe?8U hǖXY#F]l[~z NbP|e5MǬ gj19-&s@H JلL b蕛Y?RKjT(<ejgޥ.Aw (Ejg#6tz܆:(-}{ Lswo:9YDzo*6J-)IAdʮvrdjd騖dN6lx^68ʠ$lӳ;5{oGB-`[sղq{i_Hf)X*ʓɐv=DP+\[bKig託2hbno2$\$&ĢQ%;Ss^mՅc/^J/ B-.\y|傲uҠxˇzmV ! sݶn;-WА0А|u2ΚOU\? /̧:7hs:k$nWAܷhmB]En&v?\> ͧg}ZHl JP쐀c|kCjV]5^ 42ď (aBY>@Q"dQ:GU].e].:(sKpmUM1G3MFL358BkqU4lWYu'q3p2ɠSkN~p%I`^w4>z~$Ot-|(Bl>.VHP a:C U_簵SUGԽy2N-J48Eh{}l?-@ SHЏACG_/$Q2˓.ߣI}-1/Wx=yzU>ۺý/[8>$D)ApH0K^8F&qB_>x^f#$<7\~/*;6aEIu}I§nE_4v޾C0q޷\bj2ԼA>|bc9ZrNnӻv/2eӓQ=mQ<r)R'SZʴ$_'ޘy11C"{]'rШ'_ Ji=59wʝݧW7Nc,r ӆV~ݰMI}NaYnjy0eF |,ȴ&KFFTg *t|ΕZ/*%> ~MExw~#Cd#4ʲIHE>wɈоC'7_ b`ܔ-)1kfa]+5PO_:R 㞉_a{ t!° {%4fI|v0Z+ŵ&݅X,)P2Z@ʰKiAn(1]Tȕ>J_ůYzB3lՇ4 6׷ (M"+5o}mүxa8C2B8* x0 Vc!Q-_bS2D*4Z9+07Z-N}h8>UH0e#J4$i>FSp{!wӈ!z2G"oTֈ$ pF@Y1l >y C͹s[^'"/9Gk i<Pgܻzd,s B_aj]0Z \2k(#zfߍbkAWDJS>dz3+CiZ/,Oעhܶs8ljCM $UuS isU%DŽf{3ľW؞(?TM0Җ{~6VE*ODꠟ: QQiIs ڭz3f,KhT$4y/Z#2h':Ҫ,׌L u;QQf*_>|b&dBPҮ'@($n./Iff @s7asCrOdr^7Os]-|TH1cjh_kjHNn/[LkᲤu"¢ElOap ErSNDT~u }C`iVg9mlG,> ًUlq;S​~#ȕ4&_rib☁Gm)53Һ`b;v.ujОy̹77T[}(e./G9S=?!b 8" )w{A籠XvIXD=9)>1~bl'i3}燪ϏZ#:1 SniM-d>~hF*ʨ$<_LḼ !!ӜeW3 :rMxy8nkX5!YiJIW(n>LE} _w =v/sy)^n=Nծ~aOlAzG\:Y\km ISȹҬ/ᵅ}ROmob6J[{NnI٧C[ej'}oJ?d jWG`1ƒ8e{ke2DI[XydTF>da0 =ȰV'Od`?֜@NHfUM>w*KHMɎ}Wbps6}\'FŅf|I,4O+jj(Q'y /qI<Mk>ΜNpBUc fx?#qAUeTDŽ߇d܍cƠ"ݓwv$TE$tmJTd (8l*x#)rbL ɹ:c<`+BRK鮻Yt]r|7)nVA.9WXo LW?%qϲ{lmɲae&zѕ,CȪD8[kﶭRS C!f3el䬅D%b$lޞ|C {#'K+Pp#t"(ڱPk^6d|֨hr6YٞZ!՛$u+0b>'*UooF_nF2AʹUHZN84Ϲ4&D57zbb ՄN0۲ARf|j8j#_Z:fc[2de4wE å!'< Yr!\G#/ZZ?#6i5.*4O>ddlAHUEy{3Gt*⟝0ylx-4)?Zt3Ug#=d#u0QBdYPm`Q$g"yˣn)|lb5(J67gRADv#ϳr1=p1-~B9ϝ endstream endobj 1037 0 obj << /Length1 1373 /Length2 6093 /Length3 0 /Length 7034 /Filter /FlateDecode >> stream xڍtT/]%Ȁ9H#) 00--J#tJK(ߨ9{Ͻkݻf<~mp/( P >"bc3?Z"6c0 $]:E ݡ#PT(&)(#! Py@Zu8 $bSz# (t\p Z #]m!`?Rp>vD\%<==A.H~8 /r胑`k\6{0~"6#GmGy`Z؂aHt;k 4:`g?;ѿA`ApWsC`&? 
~9H8:@A6hߍrzzC" ($?54KV)]\0W} {|!0;_#ع  n`5ſ=*(vl~%7v6Vu#!`/`mD ( #OvlGFo dƖ *&yy(OHD̢ ݅b`pğfѷ=>36X0?7E0C,w?Po+/a@xuGG3߮OU Bs@%B/.e*Fp$׃ *[gD &?K*lv%$" ! o"ђ708 @#~SX ~~):(Ool4~ſߜDp[Pֳj9OQ)ͧ\|6 R4+>+q.0_~kÏhNkJҟl!8N7\m/!#ߵq3vf:[8nՙgWmopVƝI8XiW63tx(>&n/)ʗcIC6 nslj!v~ZIr `SĮ4&$ |R_R)dI@jHz&j3ڐR[iuӃr+Q^ujяza~(It)i/9K:*J(9镤+;xz$LiR8΀ہFmCRn|qnV.CǤ1K 2/tx;\<+1R]0sߕD55bM;EJp@*δ;3Ŧn(rD>IE7,(sA%V=0!J%a8.aS>h;Y&`=uʚK#H|!PSynf/1T4Shn^B!KIi!! 5J-#Q(ͼNqE3Ɠ#GZHLwW$wC>4l(B~ב:S6!U/~5&, YOlj hy̥U1 N\Id:v@ SQ/]tCG2uk@uѝ,$ ?c}Q0@u=44mg z{ I.DmX6WD(LkEhni(9}d{az 1,Ũe(ǻ3e,3&—$O^u'5oU;ЫM-([t` ?Rl}1Đ7N.ĩ2t7?ER=zYbf6]pD`@g31,ܹRo>3kMonFJy_^t.~X] |N"K#вMd Cb.ך"&z B##]],P A1±V^aV36~jzwQu0<~՚ζoULby[p#i:m:w \!ܾ-onVIz6(JhqSnuߧpk#Eq",_U@i CF)(؁XkaD5lPB- ^K=&j2}EHLjq2٩Y 13̾< fGSiU[x"5O-ݎ7u>1^E.)a&'ѩ' J:^DN.E\&mدg#bCbv^~v& -ޔ*,lc@+nNG)d_LQ0:}_U-!8]0ˎqksm1m 6. Ǒ$2Z{ګvZG7Ym&Ќw#0Gf}P${Ǖ])fDDzGbez"uO>sl"ɑÌxG^IĺO4Z >A[0OT_q"2Wng]ŸխTw ΧRټos`bA=swǴ-Wer{*RP)N{^Ou/|fYڏzΜ~4N NA)lV#xbg&G=We\[i3SSM/:Xа*s|^4OA#~kR2Vq`L׬=GY¨Eg dw%nMz.+1T SFv7rTr]LRSux·{pD+6:5YE#05.h߸=0п# lD)cZ͓_g)'IXg6}ܕM))=fL#C~}wiZ'I*屨{lּ.嵐]-u$#] pdi+t}%-ޮJ=ƭ? _(UwR&x@fTf֏;;Om-(a C䛨LQO'_y}#kjɔB̞UlU$uw:yx4tJlRB7Z+&2Y'cdy䴧}+ݔfmycj'DUzkɟX ܝ=XE-*b7x2G>[<9ЬOgș}u^=?XecYʀߨS0z@\)"Jҙ/~nwY1z:|wZpaťM*)j/b-HΫIƹ A’C _?cG>o\}ѭ$JrxdU=_!;YH}U, - o'PWoܳ L|] :Ut&UZl¥RFQ'iSW%bgGO i,CG_ޱwȓRi[J)`\R!zB+l[4Ct?4wSK5uƾ>VkS#9c^z`J"BNu0Y,e,5v;4fc>ج]™kXp8Hx>:4"9 P6!K@Hf./+w52:' 8G'0c@|#bySb?C(sv,l_}cu (g&1y6Qyt+z4TtHHVaGR#ikTʻe;m2 h v2\pI_c!@ڻ˛xԑm Pܽwyn@.=| joKLy[0c-lrF2[f1*1^5$WlyNvGZm A>Nh$!JRt6ܴѵ)cԄC]7ĔgWGScmVKZeWІI3/}FUTּXkꋪO%y~@5drjoSXz_yecvФ%^Fw ΂4:[Ay~Q5ewWHG)]3YgwIR!&y:gB;!]| +V\8t\GuX mz}mNv-N?(mۇS3o ;z?lt `VɊen" eԭ$ca~f6Us< /Gl#ڿhD;M2slFp^b*U yµR69 }$ܓlF_7(u"R%k9y:t5׼I bKc`UGܾ̃#-EKqiDr&"ViJ|Yςc9(C"U)7ݣ6%{5!9i!E͘0o"ؒ]3{Vp_} v Jv|'n`#uAAUcmͰw!}> _!1+m%O=XX%cpW/QjpAeRQ}zsJrKCy3PE5,('v\W`68cZ >,.hAQ Pgt}h=,J\"a.hR;LRXk:2#[\eCQiV[ٶ--dÛwQ+Bƒߕ^ȩԼUq)ey`ɖwڑ-^l7f@7-lHW0p+ YMyGQym!FF 2JcX>c3V<,oΦ jc-v/enHy.Qiʎ8UP*!ᅀfOnux\'x>|\vLgEO~ ͙T' CMk?n&_~5*^o5$ʽa]-M'}6qx,ez4rtxglޗt͛=!pk1!Z%xu@.;R Ϳ9sp Lo1;8!Z#xnÛxectk->g)6pzE ~F u`2٬ojrVS8tl-\5\KF PÑ4AM7=G6}S[C]IT"2VմV.^ۡ9 xW_-]` =1AD3M&ī^?-~){?g>cAM]Q?a|&_5jzhg4D\%&J=^Dt[)þN>ET mM$m}'݅{M0}C4C$M'{@͖L BN5S7R*9?ziZr. 8$x7{HH=5=ۊs]và)~YN8?S7 -) ʩb ?I#C>u"Љ*m9[OQE >OwmX3z`Ќ%}]nk;1Eq*- IuF%Jz{rAdEګgJ. Җ`^]e|lw3`(=y'Ǎ!գg'8Ы|[qM` e#&"VUp[&(D$_a1vy$ê endstream endobj 1039 0 obj << /Length1 1492 /Length2 6799 /Length3 0 /Length 7817 /Filter /FlateDecode >> stream xڍxT[.ҤJ5t{o  "Mzo "&MAQ9{׺we$gfyfYag3WCBTpP mh."@0;@ܑP\1! {B8@I Ka Po"]  h 4p] upD+ PpC 8@rw`C A+GUJPK@;r(G q~ @*M`E1"Q^ w `vAZ]WY @Pogpq}p=h Q|C" OE~(@  vH(W YnpqQH_=Csp=ng ;WAc8/"ADbbB v+7! G C~H'roÿWBB;(8@!C@_Yf|CbAu-ESS޿JǨ K%ŀ!!!1( q@п_p{@Os;eϿ4׀pKV.?7ovW;#췝\0hzS@? zUBOh~p(R ӃT75o0(@B0h/ l!;o$ZM {_e8ak؄E ww(O=vb ( ]cN""AOж߰P h rAA.]BA_7n? @NP uEg W)`ww\ο׿/&DÝ^({YZN4EqL=sVخf(Co.aD<#z{>#cKlK'"]u 7,$w.SRtzTB^SseԠ#F#WOgm?|V-fjTUERj %R&jAbp(*%Dp,b=_9^ך/ࠣrG7CGBtR&{f- MSr#_M"{`jX~6cF{QZvf9&buq\vU L E1=됽49;:Ӳ=>J3f}7FO!f;ﶓި~&3QCjN!:mƃ Z;T-%GYLDH Ȧ602uͥ(;+g|5Rs&4`` wܶNuƳp|\émIT4/>xmdC^(>gq|3( Xtj秠U-2ߎsmx5\bJ*D[(-@ZmMx YI_ qlM!. 
I*`I?l(D'w@ “p GF5,U%Ӧ UZA~_VqOO=/fb}w6Lj7D_*=h"+4OPhu%j#|G Db`ֿ̓At~zahzX՜0y#u5ƬrC(H QJAIHo$H PR<ͅi?c 81yXUYjUFZY;)EN?7hN ȮmPY2=Ux>1~zrYW)auIxc-JV3U(Zc5Y&Rj3Aey&uץ2K VvB̽C[T,%ӑT,F7ʳۦ`Cd!KoE +]쬭K"}osIҙЇ,A-U* ʼnvO(wTj B+{?d؋e6XP&7R7ퟹMoqYon$|/!H `?餏&tzRmqIPXbLxaA12&GKsQcw ['ckq~-rWpwMp9sGjGOQeE$vr f\aaOޝ96S~P\7׃d5;1b-ӻTp3X};SUT;'Xj=-Oob%Dp Ʀop5L,ŨjӛR.Te㬸?a (IN+8_8I"ܿ9-$?{1tMG ˑ//N7)6w`jWF!aB'zBJ$jz;,ĹY LD=ީ%2<~syBD YS w/EZ+b {xOshk OeIbwwqa8N',{0j]3 l)Z({$9@NҼ~H|YdM3n9FN o#w&Rz紆5vnY?]YK!`m=U!KߩRXLi T}x@@AoʨG:T>fkjBehYo1 WnxἋ6hᒟyKnjPw[x!k% r0(? "6adyL3D>:DT^(wQ0פM V$A_og tq88Y e:#Vb1GjD Vϭ )w)'S< ]"`4YpC:9;Yߩ7\T*ayv ӯ~5+vRVs1 l NgܛBx?_y8!.P#T;/8ŷnQ'bʅ]hF쏛ҖW }qV?f'-eigr 7+y_N?56렫 ݗЩ\m;y07H]:;vC]Vq݋5BNW*dtQ'HFTڞ*يңWn*S\lB 拜;- uDn}ՃgJK":Z:/#=$*d[H͹^S^gUO3jUNX[5#"x+>w@倕\ixP;zlzig\ɮ\ǯNI?0&UPJJZ_WXX6-1s6jgM\q}2)UdZC#qNgN T5u-m|?o?OO >,4v8$w6QY0ي}!!.,tsy8Hs/W{|QDn*!\ꇔ)4'sf+ҷKեYw vN^%ˌ\߹JY U&jG+8{K.n=:>` +KGH0O=T}51Ifݪ7LKlڿ.(?VIRaȌXՊimjm%.'jTi`~c5lիt] @}|L{_9C7޵ UO`&H;xKjP+w$b">fC$vyfQw |lT'蔕Of#rS8IH \Mny*y;yDK8`fzpc|u )nTt"0IXŲKlsqay;Y5go%T$K_OIh!S"YPfiv:k8xYv~V.#iNSy>5nj1O?fBoed){&"yG"rXŚ吢,vwF[~nyȏN;"uaV,7K}768ǟ 3.= |W] ΓTptq *vjuf[>^l]CwEHnmFu|vza',b]Xϡ PfY}]TR>= t]*"ǟIKq|xuQqgʥ{Ajɏ0^z#{jyT4V"os,l!6JÅoqѐn1ְ 盚Ay/gu XT4۹ZWh20EsQ$ztz}쇃۵$^KڕUz<%㬊ܛP Yq ^kr_RRW3 Al2)i%5Qygz(a|Ҵs(5|M9Eccl^Bϟ̳F{IV c:Jp}~ k#;֔]VӰj/b^bfH;j2vEp9{Yٗd u[n b&Q|߿Q'!źQpfi-_ 6l9 ;\OenvG^$Uoi6^Bt?/}x9;e|yixvskۏ0?b~e :6O";'XGzU* D݄ aRz;xQx^I+mWWy9SS_LuGS]˜ wC8>I˄ F Ͼcg t5?_y[J#A I=}7S,8ãm.=axv7ߩHKIA.@H &|rJyob"LFAM7R$xFxIv@(j} +Ii0ݫXٱc*;k[T<<&ŮNe &`c̼OȤIVG bkE65qB_uFWVz1_"YMILoPPsg+o9jGAU:q'rZ8*'+cEݍ,kΈdP !E3l+i]rV,}O GĕoӒ_o7D? ^BTEwVmS$eU:-dQt@IԉD*^|?_OnEr߸n29Hcc 7N{use k޴ lnԒ$:\4uR?-Xv+B.Xz~q;Q_Vs2¹9W G|J *6v:EFkm?7 6Z諶~wfٵ9$-IFKzYgۼ3Pa5~ֳÖn) a&DPcu[Ndtz0DE6 +wKԁjCKxzQcZTo`MPZ|)^bǯxxM 66Ptep2Ϊ~Vԛxqu4]Ӟzh,˴bC> j3DBFqU;ӧ$7rۍum*LNj-R.ld@,zG,0C-_1:_dIuޅW:ay3}Ƚ('.y%ߊ\yD}VԶy7G4 cEtYm A,0iCu>_+~$ ՄZSz RV~qy)?rO|bcى $$Ln/I>+'pz4-*P&&lN7 fFasOɿK`RC4(زIpwgy&Q hWc^WŀXMNA۞/wʞ|.xea5R=@1PPF$ֹZ t*=I!V6J:uhZՃwANOrk?) /_G9]_?% endstream endobj 1041 0 obj << /Length1 1372 /Length2 5935 /Length3 0 /Length 6880 /Filter /FlateDecode >> stream xڍuTݶ-A@%Bґ.А%$A JEDtKA91#\ss~b30Rw(, T3b 0XeĹ .3DdAQ@mPDRVDJ 2"1@5'(v qmEdd]$ AqW0  C"p>(+ù {yyX'BF,㉀ uE 8"pc= A @a(8o4!PȺg_P E Q@{ G (/"C=Hs(!xXÂH_# *?eu\@ᰀ_!1}ܬ3 G{ -<%22@; sU ;( Ov@ 0 "#a8:FY/Bx>~ F~Z 17OH HKd,cEm\-=([1cϿk>?kEUG` 0 %-)77twwC].> xzCmo9ipGpPQx1 p$7@`$7e5$ a"[Y`9X.xs_u 3Q I x9OoH8 Og ڣ1_*. v a?J<0~+ֿ@x#` 4L.ܩ:RK{_Zb-%pܓu/gj܇]O3z92¿q8mݖ2G޵%w怸G3; I,Po>2IyB yl>q!.\Tpւ]Y RYpsZc-8YZS` &ZCg8#H|ƻ4< ɲHZ&:_m&GXn})L]#爠]8(S凛va#VbLj 춺g8Ј4G’g7WyH)Z$ vn+憯rǁw)e%md$"t2tթjܞwKT(]y7w{0!ט>Vxb quC 5~fҶfgwYߎkuz_<ٿ5v1vZ4[:mϧ)~x[~鞰0lFaP`y{s%I:|ڕiZxUH|V?*/}i;`R$1QKA^zCLtog;UD~+3 DEpd㧏h^@idJrM\UC4 e5k6AeLWwK`9w)B |E r!n+uw7NJUԀ4t/X 6L6 ^xV٩"j@ټ0;ŸkjXGLJ3=(N\G&7inzha?7r[:ikz|c| d#q2|PPgmKqS%PDYٯ{>o={1)]="&njyXE`9P^xN(e?>ޕ}@:G&*9rd٧Z6'b-*]m(GʱCИa `rv* RYelptcq>2h?|wBuuZT!<,z,w5IGj'ƒ*˟Oi8fsNCzorIw.`gd؟Kx^x0#ye)p6yIʗ4?{~rKkG#4 Gdn>y,ȼa<AҾ4PN""1 7/JI딖a f&l^- &v^^ao@ug(3$#5#x ;X{O>}:Ktxqqc Ng)6gAKig/+޾~c9ψw7A`P "E] nS̴SPTb)sc,RG0ϟGd6M~䗆(o:0X BEO>ȯ fMtCdh킻 `"y'*f:DflYd&eK.a Ob]^}2jD;"޴&:<ǛTnupEWf5³ &N9)+yi+Jn+d~= .-1桽έhetn~Z^ƒcXi_x-0=آKCIQ秛ȟĂmHnEOZd08vwvxg "Y;#6>ݲ&8a_bEvYi:,$#IzCmַ  acx9R]4naK %nS nQ'}o{uyKCiqő($I_c gng^ËՏ-'8Pzf&I.1 LRV,xF( ܋^R;}OX5s#(|ijCf&{=ɅĪx bO 5[2 !PǍD5=3eXUhRqS3g;j T PV3Q֟}+mфC#-_GFoQ;e:GuҢW!{YɶZ8n6#gَVe[<5߼S.%gpg'sPpH)TR{ )h/|/xEY'Q2n?իo#|$%um%=K_'S_v4ײyE8+m~!q(O4Uԍ~a a{ RYd]~S.(@d Of.AblMJ]fԗo7Ǐ]b5?i,/HH|ꄻ^ Xtl0ZЖQ$KC{kĨUqfb7Iv7}|j3VY9>#rUw{bmYˢ\8Lo-Y#yH Ѽtӿlx8cXl MN~e˛{r]^UҤb6`Lg.Okx1^|? 
Hm!UJtkѠu@RdavK"n,qqg1O̸.mSM#]ܛk="Z$IAua ( nl: ˯|b~(v:S4JigS 0b,ktm73%`SPF~F$ImtV:"3I˔ {0kHmťQ1QMsɬvEaRTE|!v//ˆvGEZ]U(*b P[9ZTu EݥTdU{$/Ȗ~!۞WLv}J&hݺ}N`+<`vsrN])AU0fv_Umۓn1c=3Y*ȼ [G^#Җ~|[,ּďpԖwZku>yIEmvc*|7tAZ#6qrxYh%Y ǜGqAzؐrHɢkWL%Qg (?"XۤY}՛y۷%M\ ٍizGTok95<[پԚSLB8*%yy#vm2,]Ֆޫ`Ik7,/*d~`N~D9IP|›<x'k"U q^C%t7J Wܠ/\hѩmn>ҋe{ŕOL>}7ڄ 1b!O7I0i.*'?2\E5ʰPi/U:Sv`nɋ5@OWg.4kfqqzqaEDBQR3}uPO{.pAOUћl֊J$v=g;`kՁ[p)\2'e WzVt<T' Ru endstream endobj 1043 0 obj << /Length1 726 /Length2 7624 /Length3 0 /Length 8217 /Filter /FlateDecode >> stream xmReP%8 Cpw`w . Wkj T&А`gpQSK:PD  t@ #ƎF tpt[YCtm; vsp[\\Y]EX4i@5` $UTetZY ف ` ;;@,G=@IZS\FEY -ɪ) B,C. sl:{,P AcoK߰??\`vSڃtP3@ h!.q[Cڃ<_P3!VcȀ=@`-v| 9J ?V G寁Zr?2{!`@Pſ_eU G?'`\ u{ Xs#lyDF˒pf3srعxy||G3?uM ma\0&>O:oq|TqvnkmaB8WњE%03<En4K|W@UjWs#A\94eq!Ha+-RHG=ʴ {5v~ "KoDd=1X qJMdb[;LuGOC%5bk'] qeQ+{@t2LZv9W) `h%R|<$;-ڃx;zRO4<\3 G~"$ҥ-eu5atU&7D<="6GoPzq 0~5_cZǗxR`2R'(Pviѯ^m+vT<-Z7D[d)]dȦ"+wlt7UἬXQfϔN.|;4N_+ 2nU]ܬ^ UɤdŔT &|D+ȓ)Ȝ[#PSBx3Ȅf\ ?\YC{ⴤgkZsc7|Nm:VmnF*~z[* R)qāC# CTq!-Gq|Q\J1-uXIsNF{ZnZU:B\社H7lXڑIIzP(ʉ9;$v-$?!i#qH{FSis4pJC{:2+l l$X_;⥌'S7@˪f8 3q Jj9:OrEg3~ ;|E+,фJ)[z7 V`0{iEcFuKW7b~(5tiǴs dЌ$!n )lH!VŨhrhRKptFc*X`!nu% u6t~˰'&uRb FƹOoxfx'ưO8#T1\2mu$sfSCpp5*yQN ;7,eAñ jɮx9Ob \x(&QӱEYV؝&taxn_(^0qMZE"b iɏ y?xC7ajNuOaOXfF -3LӼ)ZLbPɘБhɐ>e | 7e F,;wOBQ2ڸLV--wrV#Zӳ7S%M+WX]~:cAf\+#WF ٨3 ^oWLc|ȿ: ߱\jqeJ&8Tbr)94i``%Zϫ um"|训"7U gGRy],R_KyM^k= { 싫bfbwzyÛ,FF3ܸvsD]nu/${MG)N IT/y8=%F?%s4{I ?Gt6U~}LcaJn pCVu}t(b+AM`õdw19jf(-\b-u7V:ibˇ|tͦL9ۀhnS1l~-D ;%Bz0Oc(f x*#^Cz_fD \ޠMxc#ZϨB+~dY| 9T*Rixηq“D _a:jUD ` {6K bae5Q?_.;[ •ЪNRQ&KBk_WIضb` zd<K?OhjoX@nk7V[׵ZmI(RyR}/d ؐR8DwkW?V"KZ &ڄ嘅&oy7V~~~+Mq5z2_Dk]+aJ"aЛs_,M)3VqP$Jl׭JHPxK|n]w`ͅE;$g;.:E/:+Tz*@^kNz2ݭbgWm2 pU~&S F; Ę7~7v8_=U2ÐeB{>zpɸVfBEB79(Wеʸ ~AOM 5uWu(7)A`Z8-efP B)qh=Q :~ ywP~aS9BIRֲAq-|6ΩZ״i8|=w Pla0n9W&θWV%+Ww^ QZR:tom^Xެ4'ds7+ɾ$Lg:ekr#AܭOe36Ak4JWY ]lΟ2GxE8cTxŇv kloFg bvbuB^2m(qWn4Ok1üǴ-pK7;K3 ïtY47bijLx>axBiޕM%U@@O>@IAݭVOӻ]G$aJyrue:Nsu T=yYȯb{2)\K[ $7+='ݻlacq\<7巛_ng |CGǦ?'!H`ncߪaUD3Tjt/2py JXS2 훬.wjEgrϐ9B%}7c痂DŽ2>a(h.w>sx3-IfsN}xX]1r:9H7$ IVlI";IxohƼ[(pgդ0}zbo #+J37bYq^TN^"1{ג*ڂ2Ts1g^t,.O)F8dZ8Y%sa?Xn-` q6]^Ƒ{aN3lLPqYE:z5q%c_)/Q]jce Y?;Cjn3 E; `2{qF.1&0:[n\76&Y$Qs ЮO?| 9 ìIkC)<8H.߈<]ۉG169˹aJ w{3i"ȯ6/y}jԼgKnז9`=3o4?WwpH]B ,R @%' \u풒 ZO!>y5nt#Tnx\԰ERY@,dvv*^dUgιWa29|b*< @B\b 08Nݽzq=mi~#_8\p1${7hm"}|]6_"RBi2C@5hvos>M)m埔86 $rN\F 6ٚhծQ1(A\[Vp]&ܸ'Yp&czfl$@Fp_3l>X@iv-"I^~e!'9lG\/}VQg)HL~ rgu.x/HQods.X_VX0Rʿ8J? I?}$W"xϝ18AȂQ07,TKe jwb򨓆73{ 04ž#k_QB=uE- b]sפ wXWISga~ kPؕlk` >.ÓRS+3:[/7I~ѕ\]:KUaTnz/tl h:M.$lWnH{e- }ʚ͏Ν8GNR@#F\&}aPc簆%ADӡ?3;ц$Bb$&+"婃DHE6;9l1d1  'Jd:G(jn ˮ^|&7$lM0~74#VU 4&Ҏ 6p"*H-(Y2PFAmCB7r9 ο~U6K=+KI"l,83ߑ8>Pc.eT]@KmYP;a%ܑY<d-Ҏ;Av*K⡊$uՌsߪOc+xr;Ҧ*q eyٔ~T CAm3̱,Ʈ\=[$$:o a.Qx6IԳ,Y*רs_)MVYVp$#~=';\Q y40 7Onc=ĽLyl @;~V$dmZ)O%p v gJu85ZdׅU~",zS?epu36Ɍeq; ^h̖"#ku(7O x vn,@Ljݝn9ѯܻ\7upO5C#>E"&to%+\%R.W>^ !n'*Հ-[ 둟^Hq3=h/FOhu%/khk?ՇT e V_͵{X:*bf:j_ 2x#T6<jGM@q'oǁL41$fsK9^v%m}[hjy#gu^gn3фC˻͙ۗ7P43VD<,_Y c\mF- Cሥn:@:n ,FiS?s!h_iqb8M*}S呂mg7[#l響":`ly2K#_L۸bmcV.,f$'ט!._M>J4*2M&lX#+cd @O zX9ΙlNԳ*P 4#^۪ĴZô̲S a9L@s9YI?#X(3Ԋʕ)&hAg8,A*ױw~k@a0ޙr ҵjgöh"[6 {РV0:}HyNB`Vfv>N6|+)M+YEsҩ!#fX7(@|F>a6\y%5yJqiH#q?fΰ x--\(vJF+H*:-}z yQ5G|B)#+6 ${ft/-vpݸކH΃-+4]qHÃjhoUQP]8*E50u!K^? 
~bOu)蠭oEX7 >q|t.YAڸ=fE{qN3~814VU,1Iʬ6Q]~Ns1܌9u\Ň${ LqJt2]3UlH;Qj f :M˒+' ;1R+.~kq\Z_AG)woPW@+#UR)G)(:9NBOj.p`eK_;STV:_Q/&ƸRExXXc# Pwc+i#XR`{ `l!Ob}4|1kdL|]o& OL[o mGO.N4<دqjQ,0SZc`X7b.3*[t.ud1L$1b?44wm4Ǭg7jlw`VdlV(m ?F0&nb>/3 endstream endobj 1045 0 obj << /Length1 738 /Length2 6744 /Length3 0 /Length 7344 /Filter /FlateDecode >> stream xmvePi-A 3 Nw=xnA [u뭧T>?^e"#/ W yXX=!`/'% " 0!A ? @`;0r99y`0'[Gܛ[Hx9BN2!P'md p@=A' d ePR3Pv ?p\O5t,WYO,^,K7U= OB?4AT{j vE xB`("N@  9S#<ᏘN~;'?ۃ]wϪjO+\j+T"N+AmavNPqvN뀝9{^5^N~3~޿_`~m@u[Tf^>ňV ,+ke9TsB8QG !P ޺{aFm&v(gYSY=GnC N/vtV%cQgԓ)`0}?~(C|Pמ K}Rk*„HQ'yW/nEwcRB;?kСUV2"W̿H@`^mmN+zE#f^AV"zx@C۵ӂ,[o"(1bCn=6ov3۫vIωPk 9g{)# ._k26̅2]_C.f ɗ[D>79,B7/lz-48!fտ?-5*9wnu`|,״+(;b =}$Mӕyٷ=#l%BETB-z7[-`; XX m πIesL^ O_e~ޔ wݍZͥ$̰縊-?>*3P!0;{O&pTb/H" 5]iJ#^TT95ŔG*י RHIdɵ3og!RW2-}MY|sT=e BU[e=-G DqAJ?dG(V;SۮYUU :۵|7wȄ{Mipkz>;, ďK1ZTXgCp P/*~Cd 8} b!|5PoYZL{z&UИG㵣"L}+CRQJsq7;8~i#hZj8}ֈb^Ul؜ǧ o;Tp϶ kDӧȭ!ۯ_fRYD/+i^E{͞Ç;T5bA=* J؇^3f-v4u`|83xG)Kh}ik 9>ڸl:b@1-RvHb0V C]x?2n4|< %Okfv!c.hT3丗Wzrn5>SݧMĖD/qT} -d׮mLiȖt47RcbÔnnm4/b Z*lN~A; 5A#K#(ځw}Chba1~D(]d?K*ճwk2He!yK{c2v7,at{\'E 2JMΎF(̗ 9t>$ꉹ_ЫxĚy] N9[A"\W{PƲ?"ry$wbE!gՆ'Ed|K'}¨wu̝mc^<`-vpIW!S'wzjdyfG"tC55D@)f [qEdBMӹ0 q[ɣ 8-_b0z\Z"CRAKᅻ(u)ojܒ_=hętɂ<("i(US?=D|jZ&ZM=wZ&^)M.E]@K 8`$F)QVM4`?մ ht[}ﹾF\fs.^: p`]͢jXp:~8z'in#{:sXTDBY a12 6z]C=}URQX7jknOj"YDty#52`GSGRKeLZBQ4ոt8ưEaf$ӹ{31)ݩ?wh:`gw`v#C&1YPEB@gך//@;Kd.1ntq#k g/Zl*q h9L iH:e>BG"]j@2"2߶naˈdh6r8٪͏绿E_}4{Bkउ ~;k(6L9O'I<&qkVj$A\~ '_ˊ*/W<Ig|zKTGJl'C|=Kۄ#@aؑ&ԍ 5!bj&c ($=Ň]"בQ Uz"[v&4㟄KM>b߿荲.l =.fd3Z,k]BޭpR|Q ȡH#X[ݳpd/'kM֣ c <#Skm$Yֺ MիLqL,,P0>=ctWǑ_UIB1ӘkxRaI\o6m{LpDJ>}AXܧ~SNW\(R`55l+ ў{99DYo:`W0.HKUJ;Grg//w 췻͏m$pְK~\]!@beO$=HӽشE"Ttcj3Sv%N`?oXP,$rP\Ʉ+| "?4EZ"8A`!SR#_a} 6&$ґPם76+VnJs,RMˢTr` [qt-1(g曝_;GLaغ7tqD[so:CIL7gؗtE opPiݢ K,[S2)vhStaD,/—907T̹D1bo_NA!%ѵED^32tgڰX>ݷ {&S.I5<^ߢMe$r+@X> $I72mԥ6Q?]j%p$2{(%}p_`RJUp.>~LM,&G&(/]=zˠ< `%LvF 3<*!5Xf|􉟊m7>dLRmNy9v/;tF۝@GZ5t:y?d hYhi'UJL~#&9W]Rš=VT)0yHgHAW09c+~.=Ic{;aY5'}|&xqn6>REǬ% ſCrqr]/Ku$gAYUf%vQ]]`ܸ݀ q6 ŐkE!#IN ג0{x}![%/h8"VŚeх jM_ɽuWow*;1xs=D8 )BŗwBp yb󦕸*时mTB\ jQGQ~5GDiR>11PԶlgSP;ݒ0w}bI*=ז\.[-[8<66~&L}Qܿ7697Πs/ϐ~#E<([)BYI0XbّDn#ÂҾ\ܧ0F*IAI㿳]A>[W hJa\ KAΤ)͆,QVO'efGR_,98 nTD"25;&؇5Rm:tpgm描mJs]܃Bn w-_r^2 hBib\U[z3LFLN$ ގ>[Q]C FuE6j@ݾ؏q?i6b+_$φ4lb1njtY*t;(hTgQ=QAB5J&?u$P]sb˳w0?N^e|Q K"k8>z'}e s? endstream endobj 1047 0 obj << /Length1 738 /Length2 19516 /Length3 0 /Length 20096 /Filter /FlateDecode >> stream xlzc&m{m۶m۞Ӷm۶mcڶm}9|7'w"rʬڻ*PIHHE,J@T`dnk#b"P(ڹ;9PQ[Ife`lnm@fkebndF"LGO2@d 01+hJʉP˩lV ΆVF2FG%F6Ho6.8ZȊ˩ ӫHn7_8de?HcupDȉ`jn_IژGml_?G@ ¶vNY[c ?8Z9l 46r_I'1L Nob`k֫Ӫ%O?iMO]NG/,'-,H?o)?8fwZ܍@_뿒ueb$eff"`dgc%dd_p6N?wFזm,R[B}D g+gR/3LF443?ºWs7ݽܑo&2OLF.cWl+Շ]aZɁ݄9(PMdHٗ"3F)G؎[]x%GSW#̚\ܐbj>Rq_ΠQ ?;1ængBDMY%;m 3a͊EAтJmR`X9Tlla+R#G4(}"ˤf6GHE55X Գ^*MѠ 7v``F]K+R>rwJ/q35z$K8U` V&nΚg<_4Ŀ~7\R zO.ʥ0 (:%@?3Jr:ɸ"Bqd4m=N[K~}iƉXZn^m.j=1o d-`){e$8|KbRI@K Rhʱؕs 4b@((k. K2^=NEf/b.ˑ\xAFФ4.+[~(EB.Sw\}k4X KJ04T!9GB;&ÈX6|*!-{7bNsf1{F!|Uޤ"PvñA0#͊omn. 
\ ] 1 \܉5vwm's`׶*ۣd=,6{'[IǐYi^xm: HTf4=xxD!(T+{?%NqJBk/kNAJ7'"HjNϯBi+6y_1:b 6x ơaV6.D12uvʪ'&5SC{Br+̨eVqOkoH,9[k&;lIk/ o0ЃwT%wK0O/h|yG >Ύ/A\5%c0a=_ ;I.$ RB IjLxҗIV_(>OM1qĐ XpiR$g\#'Tj:ty_ 1YQkJMҽ鞨@ ɼJVZL16.{b <;M[ ~lb6q#$_*"O;gJ3QMU#}MS(\gKkNrYhJ \*+BAɢ^Frd$Hz\ԈɄ._Sj9FfI_6,Z8w#l;W =O&8洵ees6T4p8ēpKݐ d.#w'sSVMǽ/oHtQdbEng 䁴ȪOIhQpѥjhRL*)WMB u?)KK{Ab6QcIgT7lQ XU437oZ zs5UA-`%&b3v$c.>F7=H"U= $}-ݟ(VxI M-)pkM$fΚOv҉Hgr4;#10^`94$?g q2Iv,گv :_-DD!8ѽb214I2dpd%a숷\P~fT;?)WPT"d,+:s[?($t0j}".mpŽo`@Zh6][P NqwV14wXt?Lmr-+Y4a0qZXctb2뱗vڤ_&io 7ו uUSlDA|h0|ۋMU=AX&%&]}{&Qx|Я]9 (ڏG?PK:F>0U^ji"{W,G.Ja dT;᎞Rڡ7/)_⭇ni6ZWeo'y<W7pQ̗_"oXК"ktPve1#X#n5wHV;zV?>W)ißE">E;Xm^ ^VfAp[H!f ΫDEE) e4?9k@Oא4۽oMn7M!Hw-%Q5f}mx+96f ;cp0BRq:p%"1($(I:ԥI3/]teU#9c}Xk7ZfK?CWWY37*(mJ e^7?lNI 0ܼw`ӓA~=>de~όpI4D^I5 >շc.iz$p|P̳ Y|!MPTc۞b5JEVP&z&ąf]syC׎!y=S٪XNc~c5L<ȇ0yB4氰z %qjfe] %%:5lNdncgfj(@2A<$b O:8=Od(b}^XYϥsP\f~? Lff=wܥ)]/4عƓբH%Na@9ltuֺRohX#rPREh~?'TQQy)| 4̧S=Rsk傥';VcYlO׋m,b/1>A⿓hi3 ErYOIR>i]W%1<u34uaIVC\v.OE>C3m  mdfe*RiZve&|P FȒpDŐ#P-l_g@3ߏ ?Rh܆6n&юiXtj~N\_lRobodq|ǬF=i5=={Z[ P/"eC=# zƀ /Me'I\"#ͺu\?Cк1 XxH KE/;UVUy].\%=EXy=χ4jox4Ӣ`hQ+$Pawd<%khA@kyFnU*INٚNnC p Je\q/H;shIʒul'~5ߖ!W&1y}8" 3YmhLlD5*jӦjڬcrTuٱ33 _l\yY ǭ{F1g.f!ibɔ,):`Et,gՌ8EI"w|j[}dAzɣ3ʜI=OOS;!Uu*grN:_\' dTIo|%S MMV:M;1vB{ql ̡ǻ1EI;Rl4c ?)#aK!Z̗W;y1 q&vCrF !KBNL V3->JSўOuqbj$^_% Wœfևιv+\ˊ1Ɵu<9ؾKW¿toI sqĝ7?Muѓ,6TrXOlx]~L:dKqS6s; n]O66Ox2KVa,n+6I2s/>fJyHs#&`gܞev)}kMGt3ۓB}< _bu.5,01;Ixg5'8]$G)cLHk{wP;/jrT';zov]Eݰ ൐reS}ܵWօ8͸AZ,Fη *0h3ωAA*H#!,%@GQ#YkB!SdP(w|jN*O qxw)67 CJWVHSj&cLEaskƨXNj]vth^.a<sTcW{)tZmp-FpzG1*b-~e A=XtɼxU~ `9E~I"2H)_!@0Lj'{ A_N}~xztS~$c]h ! R<,V&K]KWu+m3Ȉ•j2cQն!HyUxUk`Irgډ Bcb͌k|]BwH%Ö(0,[+r)?HTQ ?0) ]J]gI {UBKQNk:0l UT]%os!O&_Qc WUDV"-' ɲDYZ'DQ-vSyN)2}rc`)azְr/:Qmdɉn'@ĉ"2\XGY#V'kXӶS d)d }yn&8N}vuD1 EeL6i(=iOthWiyѣ^PhSZk֔C{@n'6p'yo AaI&x}XԬ_OLE05A{] x#Fz+V 6"qO sjOQex8_ 5}tVݮU>D{!Up5^~kۄ k=R;N:K&o:CjftGЇ|ۊ4Z*{/ܰđӂ^=eD;lyIB>)Ԑ> e- bP(yPU>s9Y8> H_ʍSXj.?kŞPv4gW$VNܑ ys.TiKzS1 FN{mM}c$_žx7%DM jI}s)ost"7V֝qKS !bR&H\$׈<rϕѧ#_<9yM.61D(Ad ~l#n+GSr6d[R3T궢aI--4Sm|Un/WkHj38,F+ώU<&GU Ϧ40m]u8cNwylZo4N4QzaJEˠ|_+eFf̒\ ZeBeb= ម{Yh ь@tdbr8m7zs0ćߦ/ZS7,a+ E*Y3^sH R"ۿc;@HJ~C#v͆rP9+w>Y?fK´J ,U2fw TKW49F[+9pu35tؘԗbn|b<[zY]COGP@9h\VǬl6OEtDRUjR9AGyҞe `%ZkI8wC1,-j7/c JΎ$X;}~_l4+mw͕S\RQDX7ryqtuxUioFkOCn;mw#HK>Y5ۉkn jg{glE V]DGĝIn=x/2WsDhTt YSQuF;@2XTWGWASw_eS6,cvKrp^o:CuSӢN)>%!vF.lWkNVqPR]pp|@ p(g&ԡ ѮI#o:ksK_S-&Ǘ'UNF$H3ׁ* ę>yZ]x[ajK&"ߟ#I"]f13Sӥ N.(g[kXAT{nu2:Ybx`(P@ @sF@%f־2'ù'فs+ƪ\vK9d#`*:G`;3 s?dS,aeһHfIyq􁔣ETKoW)"0X/Et%)Sw\hͬUh-\Flb3)?K21KF/w $WܰxiH3a#ERP}0Bu=&. ~5AIkzBsu_ .E֜C_4BuKкF}xVZgp ;YQy+rB״}B^mA?G9d~ S ,͋ON(">K/M0ÖXIFnbߚz 1+VͅzBdzX u6r(y)KO;< `GQޥUg+R <qзv;i5yqQh5<r7z`;h؄dСّ-x< HEXx#JXW+>G_t~'坻4ےB]ĩ1ߴC3OBrC᮪e2X܈'lβYIe0bHoi@#B|=8KjeF/M#mx@ 0TgC ve|Ҵ s{m$؍ۅWZ IvtsM[M9ꃤ+z'6`+Hر?BT(R8++X,OK+ GxP-a&K%f󋠜_\پ:=ޠ^U!qmiyVv@9msǫsI~0ÅrNvsPiqQU^L. eʈj B\oei~"3Vy Ҭeꪈh;XF=MR-9Ԙ1z-5^ `{UxQz*C&Fuҹji7/?&5:CU\viT v_RM]; u̜eSf^\A{g7\eKO<@nHL7 xmGTmxEڽx11dTt[n ] PrqW ᫲50<{Cllf~g 21DvJ;zvщmM%woN4gkW:b?f;ҥ̇hB13~+v/6Cf!Y Vӆ{ՁȨkg;/Q)|a׀4: 7^e` µqYeҘyl'Kd!pm954Tr}ޛũmNY rY*.̄p#^cQ](va;\=Q L| dL!.R;43QO6+pa] USb|189Cop_}|*o1y]Ta"ˮ8Vkh_-=,Ie1&D? %S@-\$3i7.Z%?X_B`O?cP_o;{ |qBJ@.ĒrA%?+3<|6s[ّ+>B&Fj଺p;Hck9p.I%?0WߢUΌ[ta U=G?GWD}< "TB^ͿSevLOC`.#3,DUH4yՔ/ꨚHcBsqywϭħ\A ljMtr mJ8n=.|?C/N'%Q}4fk+ߌޛ7\&ѴըFb4CPN vc789X}/wcvH5 eQC.Q/nK8S}>Orn- `Oȿ?' 鈥, Y&+0>jkChb-Bvvȕf=/EokS'psQ 5vdF ؤ6 cp^2}NLȷ]ڀۻB\\s7 |?PKœ7k P`.oD)qn<,˃mq#(0(:m=wp9єj%'|ɣ!RlZO_$I0 ef/*x8)Fծt0~'fXk)FQ5#%PPu<8JLHrp-ΣoS=WNJMAflSշ!0S . 
` !Q9ySP#sh“7!tS銰sqհ# 2\YLuEb {~OxᢨQl35Vg7@wo`;}BDh%cmKMko=j83y+CU ~}}wej 0?;0X%݋B/^LCU־\2C_qQǞs^<*$ORY`cY כ~@S[oHv$M8uzrrZx$ƒ Ta}B$LOwy(ou{o9ȍ~L}eavKPEQX-6%[<[e-r KbI5o 33獆+rƃꂲ_?N]'>,[@;3fЛ%o6m|՘e1 .c, c\_Z@(Y$ұY  8:*Ҭ,mzHѲӂ $DvuN_Br♄{ ]ld)BQFTU.%|Jw1Wr19Gz~4>L .Q5X}BkJ {uS*8#,ў'vAQIEjWc2h$i)䪋.#qeşyB@-6{ ¼yrп?̼^FG5+\3H]÷]BNR꺓ugNP,LA!ˌ}w?5q'(&ZN2&;Fv5>r%}3QrԻFsKu^*HaK[ΊlJZ_ MY?p$G/f¢_KS KͿf=Y 'sӉ{NHBLqw4nwǓ DǑ١}; 'FwC=lzE6 ݓ[h%_=u#] q|.4edi;=TpVΈ:,YX0(|Os{G9v'iΣ`n(]X`f힋"+:>dz+#11ɱx%,"~'EܯA(kHj,KIp oe},[2U2d"f?y{2y<-&(O ,Ϥt\QVqO .5KLgEWѝΑ+)oC'A=))^+ i.H}5 ǟzkz@*X@l(*|ш;~̖KɨC}ѿ5;;ya9&3!i4[h" Zgj?:GD_Z`T ^;F06)_䱽%jS”P#6Ho <9Q~yKm66]{eC}o1i-Z JQ 5jz yX뷜U[v~Ki-Nх5+i?7püN5}_BSwev'-'i@Z+PfQ28!n9]>6=͢‡ܾWF ep5b@=eN`6֟USA]*M*ƥ= )Gb+,ҥ*_>QJO}\WG54uCmr :)}Y 5feR )Jي6F}dUKBѳ=.uXxĴ1ĚЎE`&Y8A~$R+7K0 9 +rXjG̝l[Ea,\{l5tf}ʨ+IJN"h;2Fo?63g%-nmYÀ~J|$_x{,VOA]I@w%*w豔w^U2LPǎd0gNHoa:+DPW"(O8Ugn/&b&T%\\_:67ޘCΐB SHȩ j>*0ŭ3b]Ʌ쯾G {ח aA9/d",5)U:1^pQ9IIѝDiH䐲oO kVA1WK@Z#:Iˮ/~ 9,X'!ެtwQ1-qCV 04YLJ}S"e:i.0߷ؒj˟#*o L" ˤSc{7EFeHN=58u ҋԪ&OӸ   {_K /ΑmP<:iTK+6@ -쇸~>rァF:+,OVwyȢZAG>H'<3-VuF 2W:I-pϫH﹏> xH:{P`0 w Qrr3oQcZjZq?vtDJy.·INdbR:xO/)9"z_ewޞ}i%7jc?,:ϯpVvw0MX:|AFb.[ڰj~J\Q]~MV_YG?Q_p@H_)+Ҕo-IzU]3RhA`7QZx匧%4${mn?B 2jxH _YEׇc9DWp B=`UrE̊huW^ž@Jz `"tv7rVF4%Ӯ˾dQy7kfoc"y&&GpF8̄F\3 CubfߟS[ C(7~abct5J0-tx'ʼ?I/gLZ3#W,,t֮HS_LT%tZqa+K`i6|iLV5ca' E,0E_S%}};yZelCPXc1Ofm0n?vO~ҍq:GYa%6+)Jv+]#`StI_rq[mjeS34鍶̑[d{CHo!o߃E:dU*^JTuUmtDH:k)xPp̩w`S/Ip0 s]BK6~su4M>X{jױ<{!s7yMEKdćGȽy6F,4B)wu᛻}B];ti Q6iV19L`'oV^@֖3ajH4te{C}̺j;Jÿ?Y¬ӿ7OyHJP?DUŵWġפ5L9>dy K\(ibDk|u"5Tx8$]۴h/I^/1;c ?1m ""FOx:}#qEVpy▐.O(&0;VDzQhEu1v@NS𻝕(B]K0&\̴v-2TM i7v>&U\: (A@5KGK8X(@iodTF3#<4RCr<'/*jSRI8Ra uT6Ex&e# endstream endobj 1049 0 obj << /Length1 725 /Length2 27938 /Length3 0 /Length 28472 /Filter /FlateDecode >> stream xlc.]-\]me۶˶mvm۶ݷύg2֘c͌") #-#@ILY@ MJ*hblag+blP31(L Фa;{G 3sg忍Ukc  G9UɅх$%  ,'!)+Uؚ8X] -F&N&S;Gſ09?Ll2ʂbrQazea1@Zt[g'MU $k G$:hFF3_Iښc6v@NP!%hkkY#@/-77FH:C?1d$fnb,oΎ.&1godL-\ln'쪱]Ro)iEm-lJih? ,Aߎ0?]@a_IE ٹ{212hY9l? In&w#e;# Ԗr pi}eF-$Asc+{e<7XCf"X(0eb 3w0k0U,#Y;yI̜)R^$q(~pՅG[qJr oX}jYkC׽5м]R,=rP+Ka- 9jTH!kbJS*R=15#=x;OsA*K[yPH޵DR9{v_`\|(as3>(T![2 ~;?2 z_y:n8D O exFrh.z fsF*rNi^r}}1If: hUl zF&<*-#6 ]_oH]@Hi^+ C~j넇.t|)H;}NXzZ"QCg%3]%>l>/_]tjej/e(YVbذ G;ŋnß/42<񡠙@,u1F*ͮYͮgPF8Y3cd[g&a\LB'(ق<5T?),߭;81cxs}S ?|D[=`jK.1Sk%4 tp+35 5SҜboQ{d([Npx^Mz{W-@҇ΣN)jpS{tኤ[ 3N[K_N݋XaCp~L*N|i2@0=r 萔9@JpM%JA0e۹"?XISزRJtoE(ZѼ83ڮASrp\},#c##虫XA .rΰ; QZzJw˖D턔|@8eo< 񨔁Iȫb@ < :=RĖhUAU[4XՓNW]'ךD6 BB7yi M(hPiz JBKe\\$fG|A;: ݛU>I^,poQ_3˜[mf7t rԝ$S$vڏW53ů%>$7׌ƸylQ9=}7`n?ͱw`զ:A@Z7d|XoFP*Ov/UӅ2@=QV(q 'Aa)¹- J6 'YU&}rז;];Z9bPY"*^WszΊ+@T]sK3 ]X:aޯ4b)v[" O A#T;w_J!V\wuӞ钟i.l?8Lp]l)oʚ \i|B B.iY*&w:|xbsI"ߙfPS?@*nM9R?/R Cѓ7^ݯ9e"``:|'6\M|B.5d3̈́Mᐉ|?=9`6Oũiv\bƄc)&fE-n/0 ?_I/EK\ӫ. <uDQ tC2{ɦ!/ٵ\SFbӒ6-2LJ J9ޑq 9+*n DgWj¢>*.[NG}WU8݆c;Ӏ iToH%(4Z^ qKPr;yMNG Q=/S'?ͫ#ACHA2k, raע!5L;SW؁~~{D95Z\oʲo?,dꈩqٛKO 3eU՘",E268D(+OֵťQB@~sv7[E 輤Rs^ tFhuGLVC62&qTu`Q1ar (+~ ilQN͵|}? LBgN30*Ѷ4Y^Y꜎Ih!֐; l\ ]B3V$2(bPO 9B:BYw_f^;:jJ`H8T>Dfߩt]5ab;r]6ō.'d P{bq7n|Hw.to_p7Б /p%54ll*VwӀP%є0gOAl?UQ%٤HYRnش-)B̳t~.B㩡gj֑`LnR.Bh )c+җտBU;rfnhw=pjCF*cs>/ɺ{`*xn,6OA]9^B;7 ߟ@) J<; n[Qdl<ȩe Γ+ 6ˀ^@{`6c!OS3/kp u0q9ٲMkn5ބ6ʼ1AUCtP:-qTU:;b+#u,%a:[O20pSvq2I5i[IN.(ٜ>;~\̤Ӣ 67~4=YYD^HWqzbѿS_Kݾ;_W"6@=;ऱ,@p ph{k(=~ǎnDTIoDJcAB ۇ3} 3 tNOx\ȑۃ;~Rt)=p/Yr(7D=0[no/xlQb"X%m7HfԆ|JQK+#ɀْN 7.&*09-{9HW>83 n!s3w;%\ ߲ٕ$ ߮64u{ Lsׁȋv=%DDW4Wn 6ܙ}Zsf\~5YC2&"uiIȶ h)qQÚv?: @cr-s5vhd/d(0( !܇iN+3leRZtoCccب9;L)FY:)X-X\&akH4՝7yAcZU4*b+Iq'ke^q?;ns+0H3|M!sD[Ul$uM dȺ$hǀ {`~ы2 F_U<詗AU. 
Xuvȫ@߽%fn^š.p4YMM)W J~-t833G|hM܍SKH!t j/4vy FJ0&Gh9[t5,MWzc"-L*desvR4Vˡ7f3dK eOQ7C!)EA&m.&Ř9Wm QvYeEUeGog}㭗,݄MӥU[X¯OIsU|D,RŜ #-fd %DHwGi7#2>,^7X%s~Ij!MmG2eҷTo+ |S⪺mpˁ m2Ln#Zp;# Pa.SIBKX>JmN/"2 +EW(ʊXr\RU:EQ!XHt]\(u2٨n͍QQP%bBEhM hYh@&0n,zm5=%2^q,GverLrb.GvyA5<hP/z_MhWMV4EڒF3z 7Z`u_(ʴW!p J62eIzqozԴxBđMG黂埖YgSvkUP| Ivj:cBQK|!m? {zqy|b +#Q@@Xy] hTcM{>wr&>̱E'"'~GcJy|>P%̙zm rdMl{>RPh<#ů(%7d&P'6gUg>m#x3pÆn:Ϋy~]}[Lk#h|ey>m8g@*V\ԥzF\v{[Y$/Y}e%P-Y.v=l{掤L5Q k#Ә,3uNRe'T`~R)F jPFqVQhɑ)[YMkXT&}.vYfz/O넪2aDX@7{CߕSLܳz_:3\DT@@{( n) `NJ>Q/ j)]869a?_D}1ztT%2 aR^N}a7Pe)dvUk@q_|PȚA# 7"oj{Ojz7!YFdr D:eUmkӸW׃vtZhe6m⑹o>/~5zq%NӌP-\Z47r$n˼7 9gҖ;>2Rip?K zE+ M1C R( Uj!s#ps]jBĐA(l ˭D{(ɭ9B6<8SFS!bg;|u;9s=W]T~~q"O_ф.LqQO8Vc7Z@T)ٮCluGw x8v/ $\J >7&8 QW* h)h**Ni}`0ꏐi4CKo'I*,Ŝ Jgkljb?H ?qV{Q<VxO+Ig䧯bR\*aqWIWvx'[JBZ-Գ9_+c"ӍpSÚ[^ʄ0PY6cz5Ĝq+7GľT5Chn)Jif8Geh+`< &ǐw1tz3Q1s؄&'h}&RuX7eYtSR(s MCBMw\qGԃ%|! _/`|eg?aDgl1T\"ȷI)R3zSN4L26WmH[]g< H!0yLB$ (2iԦ}u8>nh_t@h(d[dAu+˫nC[&V*JWx~$)C|}gfR.sg(M*Rz+:XT)23-u梗09sk~lٖ Ɓ4OIjtcy\Etn&>qD6 pPZ$|jV̰K;8 MBhۏ3^X|k-esuPd(%ݟG#<P+5G"woG̖gcn@~'3{nG,9@Ja!J(VtJ BCOuo}}jZ#Le[\w4 @{#Py)1NPx~D!1 EפN!ߒn/ jx&yy:}x?HTU_HX !z.uLy $⻄{c> ]Rz;Wb@YVZ:LkH֭TñظZbNآ,sڪx;Z77Z6+LBݚpijBOpB9:Tf9CxXFI^%N8e$Ң,Yi饷dXko}˭DJ] {Fm͑dVx\%2cp%VN𒬿.s@P;/e[Џ׵TKݭA#ZoRy%+gtpf_r]Z:4iߞM u_K9{YKLgkBl4I-aoݫ(F=yc,ք?*Ed yɉt@tݦqJւfR]!R¦'ޭb {䔳%NEvsNll=B)."yA?jJ^̦y 78IV-3Y^0 _X pqC9(X,z{o)4ySťa\A/r=+[Pt82cQ }:x![`Mz#jrzzP=2hmr{ ߅18w.Qr1vצ#UG1} N&XHwRv;ۻ_LHz#m9_(=~AȤmPhuk˅].V-]gOMyOwr?oN(0bg2D{Ȝ`HGF;t%VC ?X^vF;?FB,O qŎJc,m,JR-{|8!gC͑ԪsOXR` vEئA,?uRК`tK9i#ciRf$AޅSyve5Aȇ= B qTH}0 ˆ<ށl+tK}曐 \. D~nߍR}B>+(g_^~4NYY0l1r9Fm~5cJT@[_Á"x*'[ E ?!|.k:-=u  %lAoT'V}vbIG7q4lY(}WfeDtbKdNPM<;ݘi X!K|@u7̞'PD,t`GEyUjtM1{-~QwOC=b#:0ZfaDJ7VZyLuv l;i^棇c5 G6pӧrs-t2nwgK^]~pvPkAqeߋPt_u=,V??Z-q^xIIj' bR9%`6J-R4䥣nNGgGز%O!oۺNjHпʨhӾ_RvSo1gQAM7xkxmCH\UeIlD^ )點E Gՠ90RMPBn<fQ:2 \GY ^}+WgS(T64}6;g1>#p젌zˁKJ @J8iv5{ _2ci,8|Dzb.=<#p $ d2S|?4...cj9iq'ߑm"E!I޺B5bCj@GG;2["f&<&YB3m {M$1l+u2i^iy@nCWL3l܊7L0:z O֏dH$ԝj|6^YްbuZA([&l@v ng]ֿ7砶𵳟$-Ym[dR>WqfY{~ْіng]mt/mڸT4Na>ysL3v1 8ؖ!宪70%'R&J?gb茭1s(_[kbm%"yݖ$ l[eepE{i 3DO8іduC?x$7v{68:_tu2Q}J܎09ާ=**=B{|*Dˆv rӲoPbNX>}g}rA|AʹqvWyC1 bFɊ4cS䰍z;br6@u-de&j8:C/O?yCoܥBјR@4C? t#UrF$Xju@3ݏDRyU0!clTj, L2;1`y|de!a SN<&g"pi1"o4)`GR:FZɣ S~Hw뚽;5aBADU{Z&% _HjBv#ޅ:=`9yL|_&O4ޔ̒p KM< YZޘ~\T·@GfY8-SG5UFHuYz5C0>tK!$rfCκ+齱"*)G-n,zsx#A2yr6\oE5c,oÎ⼌۬86TKIHOX`VPm #P nt)h餅C~^K>Ѯгu/)ןc#"Ss:dQފ>!d|[,ǁ|U!3!%6Xxn~ Q/p`^O8l NΣ DEPlcJQ-e]9XWA(ct3<;qu_;R&Ɲ'oz^*P<Ul(Ü{ 1ON/@gNVI1ůٶ6FT&W]RLj׵0IGq^rxf``Ǯ4e؄L- CSYe `Yf8/.ܷFb顑5>@ np唔u8'tL(Ԟ_F0Gw:2AgWnfVTٚmZK#eH(=̴H*A㳱B+$)}&e /tS7N"ZչɷȚ6ESy|UO3mw{~WZh#2nq3zVk ƅtu͕0\mcv:m ;ӱm۶m3pVծ]{u\@n_ #U(Cąl_`RPB ,}v c_*&+b.}W8f+q_#@R-/8ӝ{YߋK٢ՆbvSofIlOH62Ago^BСh] SC}U>Y~;ġ4+ 㧴_..u.Gl« [+Kw08Vz]M7̧Ҟ8@k.g>  t y7n^hj0\oxs tran)htacs%|e9lS{'UxX5F kW@߱ɭ 0. nwb&9.{Iji%Eڹ5qf~\QXDpOWuwm[z`@zׯrڣQ/wmCh66MCRYwXKQX085+`q)O^j2B( L-P:kGKBA_K=m$]wv_لVcU8{A6%gg= &uU;Id8ytRlJ?|Rz{wF=iV$[BQCdj+4G'ڒwhA}?t)Lq84ZH.Kɼ0 J ^kۃM~5CRPy=kEP*8+R1yVlڀWe)R;yOH}f7Dmsn?lc#e1!?eW@T=oKOka<,Az+ ~al˔N9 0<{LQw$ -ܯBUUSVD#4j0Tel0HDÌ?gKa=W-> edtR [-8Mwҧ_i65EC+z>*ұ:2 W\ؑZ˞cŻc@3x1^C(G|_ fb-#t@v?,<,hp[.dvjWS\v1{2&`$KlˮM9K4Z}ѨCtqak]Q_Uȓ[_mX3Ç޲ Nƙ Y"?<*iixmWQ+/ RMalh/eo&&*Ȓ2e¾<1"lFx2^x#x `P`/v&uocUFљ }a NVE"PxM-C>W#؋^Wg1D>vښ?0dwlbb}j l˸wZ,4 rWPv,TfA EeHZ*`pKlaJ݆}ySfHEly4J^LqJ#R.: CaM(rUB.M `XN1́쒒T#~ CN 6dIʰmA8tqN=\8L=N0:}O 9=)4D:#ZveFzU_1ɼZm|ML-мo9. uYV;2f( pMZ&g$%4_/xyR :Ȝa7?+C6{,&zc0.T @,QV},5qgvU jk곆s\Y񷳽WJmuo"2a>m5:Č~Mߤ'Ћu/76 84a_9kp(@; z}@,/e9NqPi^<Ǩ5,CDH/~%bVT`\h_%0aw[EL+(ft*i73/UI8 I PZYG>(;&-1wi,F;c˜rQ+b'QXa4$ v~ΐ. 
Y[*ꌠ,02u&PF 'UemC-.֐5x5ֆ$Αn0fvjRʈf}DŽ$iҜh#׵<56mp2Ovw#}6z;^}h.BW+x3˽T8 b3me '/wdG No=3ׅp/ F%z&ӑ_=oܲCPh(YAkugĦ+\'^Q?IAa8Rڲ~aT7}UQn&(BYiv-.dpЕE(߯ŽBtGKe-)S>Z?ij8PbZ@lC0uVw8iW<d2o؊SMAB2ϋ,Gܩ2& 4vC%gr2 *ȝ5ꖈӝ%?ą%hvيC^5 s*KHsc^ 6>/ INt5K)4mU/Arg} ٢zoHc1Ho\]={! ET*Gk_jDb):tw&sT <gX8Fvh/8QRJJx[xZld cJ͠ݪVDRP-i}ҭ>+|v1A%J\&&qDd4͝^yGPIʚA֘j#1v,;zSnxoB>~lHWf;y@EV-\\a{s[ԙ^tD .I%Kt)c /BYؠUB8ǠB0DYXϞiW6WcR-;fo xa]TGH60dC?FNlA>!"9$v6wRS?bWqBuNg̱@"iLlv"}reqcM^>?T!@q8gw}Ww`|#(t l9~A7GXLi/Ͻ6^R#!*8Smd6[?[{ ËndAsJk#.f4ee*XRغGp(#[PtN*܅?z%ޚa644 䟢pT;S9ZH_2džܝT:/\R_)lo@R"mV ۞9 $st:492,pIn{,$,;w ؖR^ ôZ?XmgCeo_Cأdx,:!_ݴ߄)o/8dLO!_J? /ң}KfӮr7r-c8@25--BWI2'1.lw羉R;*r#ڥ <'a:C9dx !\$\"=a[0f,tS\H7Q6T1zag:El;dzs y$ljpX'bA[{Wq{'/GE;Q$ MQ.D",BjsqU{1Z y_oî8Ebݢ"7bzT5`DUhG`]ZIi{EsӍq\ϏD!s2/aB{gk·ZK_+jXJ1UEN':5ٽ= BVAѠS4gHHNm,$UQGr4~~b\rA2xӒbY!=i4?H4S-~,W9e&,j7yX wuS׉=*I-u`q ㏲@LDbM5|NZ;y1w'ijdބuu/^*rfr{A='܅Kddcu D2ߓ%,*gdp7=-C]KdF ˱ #NlYi%ކ9k`Jvl AKHeR Lzr%%cSAgȞkD>GDc`MAlWQH~g_6/%F|҇W'l+ J`5[$'1' 9`r/zucEhдT|< C7e%H&o+^3f.xƯ#æ^ i.-Gb4jOz/exa?m{82kE`ǧ9 $d5n+TPhFFG!TEaK$ jg_HaTb |iQe"JbWr> pRW;[ɬ{E,E԰1:0dy;;Rӏ1U+"@p't~F7Jޗa,x&àP_lh!L?+̅ZW}Z BukrH-vY?"^E|4pg%צV1KRṜsY Fgt T/NIXa{߲m~ڃmQf`ԭi崠L|ꭚa-"15>9W5ѡ%!Eɹ]4lv!`-Dh?YY_x Smj-ܫ@vF@Fd3Kg1b0!swmGfoҺ`?lȀqVY J* :dL4$Yj>xD DŽC^lpY)2!yr}eE0dɛ୵9O@8}/J|nI(C23Yu衧3"C r.hnA7f+-4TB*jl6?[OnU7ı8p-@k^=p coON$!L=hS/-Ӕ]3@cXbcu]c#%bE$z0hz2/4>Vdʹ#w/EUX?E?ߵDǾ&vk}0VqpC:ؖt" `ߪ]s%Q ;,k";gc.gUP?5?l 8 8)~OXp>WB67EwQ9P>'5-ęh(\r/28~Tti`~2 ,gF蒙V ]U x>}(Ϲ˷nY;xbڀETw)¥8ǵ<5X2̩- t)k*wk|VbtYVDpO2zeq5>.0 pؙ5 Ϭ$h]K8.KT <>oq\کo,1G뾢j;H&TB2D'K|K\h=lUa#/H(}dX>tS#aZ[%YŞzH~$m&*?{ٷXړ̦ddԸkL'_ImB!D5)@U{b+nYIRhF\yJ!̐0OQ,*\@4$(Z#JRi 9yθ8_9qRG_duqUA\iB^U?k%w0g-Ҳ/Lg,Cc~"?p4,+OKS3}dbh dp/=0r!/ˣYW0WS}wkZ.@ YLlWÍ7;N<dҘÓ'AݦTo !-~ V~Sk@/+m\CPiAlGNŸ1jm + (m=mQ\/26EsKv`ǧ3~fcO&?{|XڂB{>EI^V܇][Aa$k/͢\bau_CEnF!#9vS/v*]ʞ'T, 'nvۿKRG[@,7֩9`T̽_^X9/5G,4>5,"}B /pP 0MGw 7zD Ζ8[O0-X^{5F=$6Vs?m`7Hm4?""@k=sL>{Y"xVґr5WɅܡ`ӵo$&B}\+g(seAARaf$x1:GW9Lp_E2px *.9Iˋ%O1)1KQ/y7澦x' |nP<#ѴGi[vgR.7iX-7l@= -A0K_X6 x(pѹԈTęO)yd{Xd1ԇ% mM[ue=J&[%)k v!AD,I}r=‰F&3L.3H͙O8>Qtd6V-TbfS6FVu6z\irJE3>u4ΰaVfܭ!a S8] C^DᐱZٝ>yNZ󾻳>A֏z(NA1Ljiyuc7si2 *c J8 <wZ70Nަ51 glK3{ƒ१0/GցluAůH섊{b~_ye~b ( RXs6CjNȪA%~&5ɼ)J #Gw@ ۅ)n_%A A]Kc.1]vʧ9vzg].\ԫ/; Vs<9Ոsث5=+t1YQW*ݖ76ut؟Wkc} f:h-nиDj:=eV+ IP+Gf?) vQ Ǫ>~\ղi|"0hi& wV'oY`43&R^( k  SY2լ™(b"pa]{ -]J;O?Tvu~ )⊳^nfe5hob 89`8MD߬3bӑı8dS 3mpHr w!TUD%%"ٟ6O} `Ae-t3 endstream endobj 1051 0 obj << /Length1 725 /Length2 22309 /Length3 0 /Length 22859 /Filter /FlateDecode >> stream xlspo-v~Ɏm۶mٱm۶mNvl;9zj3{9=WUwZDbv.*t \e1F3 ))@l`d0100<,-\Ɣj6&N5{{7Kc ++?Iʦ S)@X^ASRN@!. 73u2(Xd,MM)fN v&LovnN.3sȊ˩DUv&\65vqWu4Z6k`bi025nvf&?hH 015'F@!lob71uK{w' 5 .!hg| ,,=LM,]?F?hϪx)ы*H S 373( -A: Ʋ.Nmÿt_-`_t]L\-?djaj jolZ+Z4_ N=?kJ6{l9lat󠂓?`Ļw+Blg w25MVܷvC!\L X6%cs#Pj%+VʱY1ڠsOғ8.9J_8y# =[Rl}Š0k+amMn 8c:ON B_*X3Xe+(U Wp̓qbbۿ޺[e0mZkA^`g)x{@-~%Ϧ{6zK t:O*(i= 2Ch5-I/42l^Xа^~<.&M**̚^Ʂ[PoӍhl־31c\RDd"RS%Nޅҟ #Y?4fVADl- =(H&wYgWJ%VO1;Ƞ;.g\)Ò2DI˔ i1훱j8p켦c }MLEWE39k7Ze+IfInms&fN6ppSt}z)Ʉ*( i'IV_@81}Gf$֦ŗ,q;i\B3(mykߦ|{ٛ936}xQe>:dUFk@ؖ4~0 {Ԑ"jTGuy ˮ渨!F%/v:=Mxi6mKTŸ/ Op-;RyFߵO2289I@e q}e`>@WdE6uYhZ $%>omi4]FcYX,o_@cfÙSAH-CzLy`uřmb~bmxZ-# S%ʚJփnں5]=WP2SYÙD"8ZdTmr' \oWN||{gxq) Ov[ ,>Z#eu)kGy[I $+ bؐ>.q{=,r_hOtoH3Nڏ@ fH4T_s 6W ѐ]Qބ0N*y6?bL`+B_jR9HR(PP{{F/sZː篳WrnuP_w93i0͏駅4 "$b`oܩO@O0Ű()E^A\Xcuqg.g f,'3#Eڗ~bL n͕m9O)̜&(EvW/7{ݨŌCc`' _}?G7eZ&P! ~P ^54+ߴ3C_E,/!poKFNRXR {G@ad>&+1dm )h[\JH_ Ylqh%Dfj}V"Hiɇ!l^BhI~=΅I6wBQ :w-Ix'߾nVdOs'P'ji;JңF0> Ńh08$ C4BңI0S[uT pݭ"HMjWY&c &I?;bZ-r0ag 9Q.V5>gkZx@ꇵ6a WavO]"aRP~c|4xŒ⒨ٰAefLQ/? 
Va/ Ntld&^ kEoCX]ym*Z]FmDg}wo65(O~?$smooJCYGdp SWǔ4,ʱn'D#>(2d-"" r—-1fr927 Z9`@G_${uS 3깇rh*` :L(g=ZD&pbq])!L>TKFdHW⤏ڐ6|㴭^$X  RR[ԸۭrmvtiD\Cezk~O5lywcWn?hn$EK:k ӄEW kw<|Ǔ )Ðg[!ކ'8O#oV(Ǥ+ܧޗ# (({JQɕ fm|Xo|փl||@CS=Ҟ%)"PN|~TnzVWT+ @D/ v~V6v1b/ 'n @CەCV ʰؑb5LRU>aI(#chmZ-˵$ar0!KAʌ {HEň idU}Tiõ"!Gg< f\,pEɎAJq 䰼4ai0xY4v`Jx5D ~:G# &ZT;a-KMIcö$S7g0^ՅTޖx/qCV9Wv2%EZ/1P "& $%R}sl[jǁ. Пj !qg-ffg7bą6袂y+%]h J?N0q!(FLŮ>7DQ&`fZXP+4?qcƧ+`9.%Zn,:L{w)jy i-kd})#M$[S䡸F1S$*Nʭb$(Ł@'a8$1ĘT,WFwM:R\.ذh**0w`Px!1r@u 0c|/`ӄHI:kw1*1pbc7gT"8Dn@~!+%G'ġIEY{~ VÉ.I*6Ҙ9Zzɰ:s\쿷}7`辌ۛ#(Z?KQvRU]…/T)jC7s˨W*#aPb`7.$nz3GkBG38lfX)b%ƥ3ٲ*UI{8zZUܵR~ZU,pa@ |<9(}ZJ.i6>S_H}DZV9;cj-) >HZͰXr QMo> v]11BKȔ]RJG@qAhk٤L2uG*EA. Wcx C@Fd_KF}Nl8~j#/\78ӝ "#ZlV{ 0X҇ G8vi/GnM=ěL?IcuʸǺk4}DPAI|> Cb7!rVis=_[1_3a%j8=?t*y`slrT4n&Ww֤k,m}IA56 5P=zH|3>תhi=/szAaDT$M+8uEl3'{qNu9 J {&xepRk S1pAY+g\!Xd) asmLU{ X"e4" Z(oP\)ж#U1oǹZ (\+Zs~Zw] PʇjII`q0yyrؗ띄>lNJToÔ4ÀB憌LS)偤} xLO ?Z*gS]r_aOJ:•")!7۫,aU9Lး&nAXCeu  . ?/Dj9mH1a1YlbZ}@i>*;*ũZB_y>sy2K ~NgiJVy^+ݲl4RaftC ?\|J8^jN_'}[{>7L*mz+a4!r<0Ѓ]+JtFlTZ/bjL_oA3Tbweгbe?][ HX#Yل<+ު@ EFN\KM7ТfP+Tj\2=LP}m^}^"9sU&~kK+=O覝>|H2TB{)c_ Gf€sJ!TȀOHz)\ɝylM 8ƉJPRnܧS2vǙPA򖗷ti^E^nC<5Juۆ@+6#D=)--"_w8mGW$ `j]e3j.]!f竞^5b}pW*.>~,C4@:Ղ;+#6ߥ:8kj\U/Ը#F*X -UU1[u,5K.*Ua 2. na:N[2sMCY%|pח;ߞ+VZ"f|5_Y%hq {}FlIEz!a&r cSƴg짉eݎu&qYۭ bt !Z#Kۖrߗ/nz߳ T=k]mԩش,AJoEAޔl{ves)6A܇J1^<4BB$6*'o3M1t?8)گJmkgFs !IɈ1M0ͯڻ위/Jb@A"y27Õ %-iD'Ř?ЋgLTMc%vLu|~A2}%C_ s$~ٷox!%^lq􂪩m[բY6o ?{9duL[vD$a[Y -3ߵ\>>d;C/sGFA&D,e9w+IܘTSUƘZ,yn5 Ѥ10^H>斉WzR*G/Gځ%4V\T| 7;%[c '#i+VY1Q H 2C `(}ouK?꧚QF4~;2`O?^^Las,N*`27 wE@-}J5݅: 4.넭.W3Y!ƉNf^:Ni{8(֨ xa?8`&wGǮ:CE@q~~,oJ2D>F94P#]"Sx十n.*(o3-,7NV;Vzl8sۑ "\LXxY3 =IBm+hN;7q8~m:h`D avnY/աUBP0e\;n KrC@tA~n{%Fp/6& >4oq27/ι5VpHi))tU 1 %kf-uohmL}QYUZ],cYNT69v$V ܂@n׍rgPݮŮw*;`tKu1XȺD{\`(]N0 0EoocL]pd]QDAw=Z1YS <YsK%{{Q}JXAX1ru9q?`T\qb2n њ@$̢-J"}{Qab^*50D[nuAj7BycF4ƁD/*.{tN@`3spFn(5+麒}}$}$1@Fe ;w#KK߃j:S~Qt{֚ vB|T(1ұ^ yF?O[~LbJ:3 TGE]L shh3** GWTCl;mhrX©XvaS#N: ӁE iOX An_žy!!4eP). 
&ZwtA NuR?rPZ'9ÕSSkd%%`Td}}g{2-_ޮOVx`8S0&Ŧ-m􄊨wX]g{> (}Z+I};O虡~-a~m G fQeS˓e/;)R5ȨM%c&nj+yȳɈ;ΛJJfӢ QlG XEwV$qGsp:)8 5BqKF2&y?/+/#eN $:;u*a< mYѹ+P ,#TıLˊr4^d%abXjKѕlԿv(;6Y8u6|;e12+ D: { jd+6i/}ճ޹е+RAb4F:gN@O4]΅p-T Dq cc~J'/oHRLvKV`*s]6D(ڕzu_=D(D_zDZT/\ڐJ֨IN4ӊJ=,i,?923Szhv8_{U+\Eo?i1eѬ$pVy54ڿ%4DQhn,r)[NmC c:eDB WgUBmř9uHo;BƮ6T9lAN!#( '6u^j"4J<h])P*?uքvc^i*ʏu 0HUð&WRV[,D1;o)k$Ȓ [3lc"xbƷjQ P ~h}5uǚvC.|gr\%F1gmyre{| [ٮQcN`D=E*+<}[MU]"~YWI=ǎeMiワIl:s0!L11O$9yzGTSQy^ORbzrLē/,H8.# #HP1Ojl< @B2]1׃1W;OlC텦 E<7ڏ6[uxDz jqs2R@lƪ`/ה7dVGm ߴMد)ZtSXfdoG/) JכaPK#*^r"%er/JFbk#5}R 7yC\x^N¢ԃ&`릍4wH;]`-@z~~ZOWO3_4na@qa96aCPԪ=6S''V{[[NOrNv+|NQA'r`GC 9BG8̃gQ FTc=8>ی6`?8Of$F3~oIu|)s键"Dzr&jIp O@)WtH;5-r[^s+0 ς2V$f݃QYAA[YAYx}0vLmdS#\2JQŗ g/Ȇ'J h+tտr1r%{wI?6)/`v5 OK)~WFyؼ(:|R+.M; ,/dۀȌ+xrbtr1WNɈPV>SSc dܗ|3 Y݉p> ^+myq* m8׺ ؇al"#y|]j*1g̨FgHo,e;MX^He~C*ȔH T>vؖ"WOA0͸|%;qܽ*6j[" ci&5#2Nb|K[&Lhݚd q՜AշP=0=fO7 {A-]kGy9:`ӸEھz8%4y =*O Gz 7X{e\Km* A"y:rhzkATo*f͑XgmO> hy@ǤvU)虋0 ]fkIkzz IgE25]߷97/gއ\{m q/ł5%1**2(qeȚ o+Jljy{|nNJgUٙjB+&e.+h־27P=Dͺ +N2 ʩ+qf˖91ozMCQ{\W#*-]gUqn|7WQ;!xZv%(HU[{ܴߪl-h6nbUe!j`8]⎼^%+x[;J )'Tɑ.f<+صyDt }#76·pwTd-&CZjPlYd&-tMyB60:3̌"6^Tٜ ΥE|Au DƾBoS{eERώuhFJ@w;Mr-N PSQj.!gq_KqIT" 8@-הN{r,fre`$hh}[*GmyP[%hM[ Ͼ )d炟$ dt?!8eqn0q4 hAU Z =QdXALVm`(ĕ3-`IʮƎe'faOSI=SqA8 }5lTmSm% n>?NΧm)Nȝ.RZ'Jm7(6*pՆ0r2zLVwWʨ'0Hᇪ͢(1 雒=3z"bv`4˿FlсJA;Pvq(@%:T'׾Uzk).5[QEZ2߫wҹ,Eٶ.ǼoL qUt=L*\ !YJ̗Ϭa<Ԝ4 9&RA^?xYUw}@.~9<"uhD*q>nDAo8 &8*U(q`)dƞR$J/}#ݐ<.v&Y˨Y>(NgFsDKwSB$Xަ{0?^Yf_|"j>H-mMPm~wSY t(jd@M ԽPk 850\pWmEWo{78H\FJ01lW@V8kK5`:N -7 3C˅YBF T *m tYɼ`ũeƘ\4` k\¯w'85_VݕnXS3C [#pNF&xf2CB~ ٪uvu(KQ4-C{Ots¥-g4Wh">6|JQ/UCPkW?`D+O|~;<8]6,A|,4LU4PMz`7Ya*-]KH^Sf9ȣnx%pÇVZƾw F<W0&v='83r8irlopU`45kO-u187byd`*iK#. A.r] .^ .Z \֐Q!XgUvJ}I^r_rjZ qN =@)o&L #,,A޹ {=S[ JUu|[#W3bh'z Q=4ЩYRő.qCN_vʛ[.w*0 UZ\j*>tWaU=f+xb:Z[(lQ-# pbbJcy1- YQ Y.Fq3 8"p*^J@EzzX̃ Le ޷+I>N)dC`ѻJ R!?8 'Ihk[`=(^lIm%DG3A.ŒkGbmQ8~).B=kpxظ{J,gg#аuV[e>rxx)#=b~ADo }ȔaN e0)9Mr}9&[ ɦB v&űZlQgyq&^"a]qpQm9kH8D ;q5K-#_&, z2q(K-R+y0z\%{3ZdIN7[JЫ]~hpQ8XDZ %e _>X3?Z:Lc4Jf)ˡfQ,P P# A~Xҡ;IRqHV|"ռ6z5It=NDvZVF ~׻t<8|0 Jp:OCj*yٹ~$Zf,I3ƻi#=C4! 'TEϷ Jt/ug7vzA%#ȩA;uT\s=q;Qy`FQý4^ f"K._J@(*l3P"}FO@'jo}Ma;r1klFDFV0t!i>,5S㠌^I*-wlއ`o?+xNiFŤYA% ]f h3Pr ŝ&fOs |.&kx=?}t }pʹAhg̩v7ChsS>o(e|r5|7* D iv|90VAuL=:Lqt'Ʈ|yo x_cDD"=>_8t E}_%׊%؞P. p4UG#8GO7xrA6)hgX0\Dޒ؍wE ו6/BGr$9w r:S:qdIX.@Peq%+~Vc`x"U6$7.V9.Rre?v3( r^|isɪ<,úJɃ[4*AO:\+*6@ @b(cO%٨&i˜~ g.\*}̞ߖDz֚"z)r)?HDf I. 䀛NoSG8|ݠycix~ Kcn WnY~+ Sڒ~+hQ1Eaۦ{;S@'j\;e$1?#cF;^V_}>_u<7ȐSF㍁Ish~nb ^Цbq\.]yVμ5wPK'Mvz7HL씼o)>(@1~8,XϘ;iP4X1=$J|-S8%NiYA?ΎxD=YZiL}rxUY/'!T=2n< EeE6Ձal7m.xv"oLR[qU͝2uI56)f7xZ3gV R]&W{:c%kY әF7ћEahR@Άr{ʔM]g!0=/٫MZOw{}mldH,ο;gH y6n'ki6R,Y^p|Ȭ>oi`JJ_vk^_FL+ȧn{GB {BԺCfStxn7,5Xo&<%q$Lw4+Nу^fiO[ܞXFG_&3W@:}"p}D qJ09{lo.P-1e+ jqSD7,WP8<;=($>aO.\"rZ 7[0Vڃ XLjQSlh[+P18Z}yQ3>M #Rk`Hh !3!B2{{!a⤴_s܄+V͢2nx[ؐҀAp}3_<t=/FE @VFdS$_IJ8 ζ T\Ӡž8}Z=Q MAҍw BD#xHRx80V~]c3y |b/ӱbN]*2jZ.J@'ZO9s#U ֢ɴDavg(M8rpW4:hG _u*Ib 4gCH]O0Be*+! >Dע6&K}@`|(,ڳ6||@P~edH-N#^]^{Rot+d5P@"j]Q>²?P?)q$;-OQ~MGEf(VUn=K}K8i10˔ۼ-9)gbGsz5?}FKw\}qӞa]kG.oLjGcH7En1GΘQy}m>T6 *ᢖ@1J؛.e x*V$_b0|U]͎zNOn,hD5G}+$OR=!{^Bí=Uky2/]p~ta1l |i5r3A:1Rح\Z#J|95@ ѓ/nmB&K44-9fؒF i퐠ִ&1=GqU;$W٫MGoҷarnš`j>gu 67(Пs`.{N@%'.3ݽy4n"ES*5.}ϡSst"TE X*qҊh S\ѣTD8]6̖~ZƦ ~Uh>ecGk<=U݌9@'V+?EvFM/-,E`8 Xw6Y׊[o:H@p'^^OLD_b~ʫB\x"l+Wp"O#Orɋ紟1A"St,~"]2ltmRdJ.O=mr Mszmvcu'ueۤ5M* %l 썪25U 9Fӭ=I<+fC `Ktea7DjA *_7]|Ñ+L_⑖F9ݿFS`U?(Z^:[5frU7? J,J9Q&3r6Ƈ;6ȫ ? (}q7=҆Cjsa]uoxUr#px ?W?H9e %,D@0=pyxu; SٴPԹF3/a/$ 1 f}.6D"_ĽǕ 5!5?t}IמqDyt.B̌cز cܵ!o5g5Uu~rf>nQYq(wc_᭗%d ℑ2[fs f%P1gcxS!pmM@y KqI<};Ch[h#x̭cj 5(Pok9B>xKvиz͎;r C_BotBU4ur Оo8I ΓDo'܆DA! 
H@TLtQ9WIr.|lʲtL[n;KvlOQk#M@&JC$.}Lx՝y#s/z SF$2@Б}$aOi@[#Gɪr 1駟s|7X/5|9AuY*Y"H1ߛll@z=s+kK1lNj ju3|&s?j,J҉;Ɯ b*WT?HtΣ}G^ Ly?t8mdKeJ 8=;T݂]U(QMذS|5_5V= |s>(r!"%+j6\qemD# o=¾(л!wLFg<5DO` ܨ"iu{Ԅ .D&t^ܿW؇d۶A=rՖe^~zGB$h>QJY_Ӥ2dPR|xE}4[MYgӋVsP6O\©tDy|>~IR9ª-J vy*6^E{^>"|8x# ib#Mm'Q}+Pу^TJ8g}, ٴUٳT"QǮ1L`8 =9@uYn":@pEmx\KyK8} hƮc;D䘛kÜDM6᯵y:DyCc y@1#Խs-S}}:5ҧil} M6qٸl~{R]F.c}uݍ'Neʣ%DJu? Dk.oҶUeNyh|Q)Z]Ⱦ|u}JG.E˶9Nes2Ao⏈u2Z_0tw:fևDQ~^ ΰ;n-Aゼ;GXU ƻyPik|;jKhq(Y^I^KJ7Ē.mѡL '4! 3B.v/f̅Y{O,d,-bB e>pOnŗ*M0W̐@OP# .uc+,%z.uỵ:2 {$f5*"J'4Fım GJK~gڂ]ϿqC+A\{w ׃gd(iGpU^Ŕy\qT.AmYnrf08i{85*?կaw1f+ԗ7u6Kjhe\pJzu91ۗ dE9dt: Z`9?4d-2 infY eg;D1J]^.]m^GEgd$0qc"P!8RxL28fWgnnLNG QnW%EvcheG]>:R{C*aMKtFU3Íi-X7;Gcw0)"dg,P*5KxyS(WM%  !5J}BQ 'iw[+3Aћ kTNXJdU~EPp\#()M[|>BOXXAjg\q%g pL#Q1V"IxbԈk*)n-_V>ct΁e9-[> stream xlcn%ZeOٶm۶m6vٵe۶m>_X2Gf9fFs-RB1{;OSFZF:. #'+TNŔ njP6u02IN. c5CK[K'Mٕɕ"eSS) ,))'Sڙ:\l,2ƦvΦ3{'89?̜m*br*Qaza @Fr;gM]\ ,,:hFF _Iڙ:7!(ѐ`bjO)B kobdN M5& I3G>@b& .jX?{#kjbjr4gWMl<׊D4dg373(+Zg`߾@_#翖eb`21Y ]cW''S;O/=Lk!V-a~EԳˌ+S7H&W*x3F;D1د%ĕ~5[oz`jXr`7w.kA T,ٹ5ReH1V)uɑaMg-B _@waoH- +õ ꯬5;%0Dȶ*7ﯘ~el .<*œYQze<ܝq{VW!w p&ݖ%b9]^rq[UG«*lɜ7-/¶bFJK$q!G]qVy81疩QEV֗_9-Bp׸ϣ^\ Н21q w,geRwyBĚ*'dXXkLݪ6ܫ~;j1UY{-B iWv G@?~Y7e+;L?It&CУ];;.U"tuD](]i;D827EBƍ`Fߑbu4%VkWljlCڬQ"TV`#|$pڃF !0L_1mٹ=9a;Irf~tB9eP18#;2640 fT@xO\Z+[gs%hܫCަA7X5ɱ;eR<&LjbֹnkMZ(am5M>X>@:Ql In'qmdVO݊mg$ڵ$OY8{D&c x;v.|GyZ+2'LxN;S&ͼzFuzpC\%\3!/TRi)P+ g@]qaڌ˪ N T-?ڒ07;)N>!?IQ$) r7wv8DtqC)Mh(9,Ũ IC;E34I,Lpfڳ4cG7pTf9Bо'OҸ[_aB&/re%f :.#=";)¬D/:cGمcJ`_/ݼRuEݐX*s #sX@ N7.<kCn''9ot2Ԟ#)GL\^KlwP쐶)|9$AϿq VhuY~Ae8vJV^QfA&ʇ . A]A5`I>[ *)BH;޷0.E`~Bp@QAYRᬙ$]SK|ކe-v(bIvqړ ;m02\;lSPcF7Wr?(@&K,yeII ʨ6BZ,qÙ|6H=灿N VL,b KvB:+(3ďɆʽ^10/wT4x{/ s>AgZhW#>yu#Y%P,x#Hm̗Up8{PAߜ7e&:k778 ys+^( v`P(mOx:Mҵ?;$M -ЉBN2yau}Al\i[D{ B>mU덌r79Q:i hH: l)FKeB^&)@2h.[pϴ«.mF?"p_N)i&Ē;][Yd].xy'Z z쏗í`H J-AsʢsWB]anjXd[v[p/?ᝦyq2Db04ŮE:U9hkM/ i5hs{Ĭx?B_ VAv LjmG:䉆#q@S<vՇCF <ġ1͡j2aN DgUg^߱']7\^`WppU!٠ 'c^t٩֪( Aj<"!IGSRp /C_캥r~E/% =9GSZarkg*ʀYi)i<] TD9_Wkư+/}@GE~SET: dž;퍉Hvn R԰.\ ro\f1Ү(ߡSz@l3\֋PׯPIc /[~<xm|)V=HloyYZU^q8UA{*1:_LRZn%;<"|z2j'gxE:A;G2hmcixI5ZUǜtc7{7F=z5j3OV˂f+橾X9U{$3_8'7-Ho&"zo<88Ca uӆ~;xhvFY0/TAkゅ氨 TjUCꗩ %ּf+{|;yTї-*o6LO~ˠ ʖb@/_ziЃW%4kw4s% ˻"R#moDT^1Q eFlV\U.i82dvrUyH끐YYxlLܸ(*\Z'!vujnn>e }:SWEݑ{cuŀ|jFXxZb| bX'~Gx‚爵yhX6g[&6߾HІ)A&u]oʝ`.1ph ErZ(~=3k5ia Y̻c(L)Y?b s̯5x6p BZ?AJA7>` pUd\'2ofrAԤ*=_CK jG0z)Vx99S6 ZׅQ/4o«'kNj- >?V}Hwb4qVAWX6ʈhk*[Ps׊_{ W5Q1DaeK_WQꈿ(p>6ЭGD!"%7ug DZ%ڽxώqhkVrt ӊ;a(\5 8X"}(|+1M(fc4֓t{MR*owe*l1Z`eYG_*Jv~E?k Ma]?GE]2i*lMCd3 ?}Y`:CG֡Ҩ;Z\%EhDu>zҔwxqPC[d6x*cҲ{ M;@IqÃO)aHt$at)E)RŠ!1rQ`S%+nW9E7|-v3(aZJm ~K\wlAxFPzKȆQsrp"&/Sy!J{C53Tqqftu'^9}(/,@0ERp.$grmԷNA̺1JV;$ ;#iIM FQU >‚-e&%_6JV EH_'GFsuy\XJNK0`b 3gH0h+^k CT(y2BI7VmSκx;2c'oCeG~K"Ϛ?t˃r[82E6HL_Υ={zߣ9V*oQ4F *zfw3I }aZ%QReIYEjɗoЛK RUk=*Y  }Ű9AvǠTlO dT`RbF;5ZNQ@K"}\j/T % )f=ێ5P&pP2\@r$-\\`#*hk;91@ysjza@p6ܨ4*av3Z@MBJMa+m%|}[瓷jYUƽ}pBawPc7}pe.u*'Q_ 0Wt&HЗp+kS³эOjG[X_ n&@ǷRfA#}V%tVNၕw$)㌪9tī_rv\h]@諆,1 &{OeJEWYsh6F*Y'Vܵ\.;}߇a+Z==oŹ{B oYtU]v4+L/-{:3SQ\H9ST+`aYb`"Z<=UsEi/)@_f%3[h/lH| 涺L5A|yQ>18O=sDSҌN6b-Ask6ԓuyiA;4Ɲ=maeʃ5F*7>S& TÞ!*=CaWf)=MhX^e:emGW:jEj_Őr5Vzu+|_?XD.yi(s{^2G֦ #`Zs;߲cР# b+ R&&Շ 6QN9K6 WF5/!%xCXF$!0cq>qٗ=^>'!2 L{?兜'xtNj4bTUktKgoF T@i+ڐ閶Xj/ꚜOQ~x,Ãh&)J!B_P[KrM=8XWA]N7c^ȋNZadB_M2aFK'_ \MM,ƮzDZ`k,7pvԙ]9@GѱĞA9#Rqv`z8RUh@cF$WNߵ!523-cO fsKKU(ej@:C]+ip'v6GS{؃rO|~qHQ]?[̂E5 9ct{Nȴ1=Aqx" ͨPnOZҖSό򱡷tl݉!O~|_0E~ =GZE1-w`Jҧ\ ejn[h XӋ,0ftw{9K&֘֬`We^} +~iKoIvJiqW(9 '`?  
_h:XAQOv=Ð@gV)՘AmU[ Y2P)_r 5 Sv1M;Mj>J[;aNsp9I㕨hwٷg^ΑJ#pEYvSl5j?TeÎ6}@qg'xLƔ/cx>8'`zط9|@oOzXeXh1$ٯ6\9A9\$0OU$I>ݤU,74=kci!ξX ΠXqEfDLGTᇁMg`da]hd $ y|Q%[Pt$U-Κu8b|%a2^ChT @P~^餧.ZL26n7_0E>-ԩI`Q {o;T%d*SfQw퓫ފn ލK=u?dNoF$#l1쓴ȩno0Ec)_;D-XϬϪ{`v"c?K xZܨh=D8*eNg`D ڢ oAX#?I?nەM,9z%&POSFM^_љSL ] Ôڛ N9OA(*:;6,g䡾>1?J5;ԂeʚyeSt'嘤/R[K0\GMCx7dg٫"9Kr/ 1R: a^_ (qnli8dɋU"n:܇6Ij󣚍?z{*8{K iA[$cG(]Z,b|1XS 2]Փ.tݟRB/힨({طHꂏd,i8C`֚VCd|UQs l:-Q4$ny& 4zgC܀T&8x#!\,?3oyչӯsgPCEf ܍Cc~! E(IS2e[:  T󤄚Uբ$b}Rw&1XJ;|n:ٌG[ Pgݾ'_'IPsLJ8'g h:uPސ q9?qA\V*@酩_i+('"X Sj+?UN3Ŀ Ps =䡂AosեWSs32Uz͑{CnmcĠ\ɏ;0/Di,_v'Hw5k| =ch]){Iӎ @qJ*V^Ӽiq^L 7j?.Na!pښ4C^GcJ F~@S jDzDV\q~tLoi3L:;8 bRpXLCM( WtP>Vc Km&p%}ǯ+8 * `hyܝ.Ns{$0+i "Cs'3yA#3X,a#X HzkbʉT(+AOW($)؂ѷYO R`GE&gM, ]_;=pyd\r#<&L|7t!-)dv+ᔪiaC'xaЙ6h~X_nc*T=y(ݳ00P؁Bmq˱X"}Y>gzKyȎʐ "V'_pGy.³)r3VdKodYaXga$E[WP7 AbaD p!bW.7V;U=.`BÛcv{ٱaATK3+;RW ȕB{  &/$#P9?juX`\) 򠻹QK>3hd$͂j`ʩbrxk\,7 R,{:ZL 9)0_^k4еl4|RѳÓm;vtm?JRoXwMX#wZj~6P] ۍ:4ã|~f' 7R`5 }/CJs l: 6mXj#FOvSwfӳnuH]JPn^r  S.geV$*#',r,w~wc}kqGQ- &9sJ胭E4cb`.L5hRƐ84sOq`Ac):rHh2n- W)Ծd$"dFa+cݹհs1+Mz8W.eiNJ$M*ZQ-\ᤛgZN K: v}>0[5_XŶ. {^yWZɹ}/H=]-NtR}AU؆c׃QX*qA~YcH#QR{F V8vKJ#h:(|ڤ튽4^w9qg:w C4êL~p3l.◟z-ЏLO F!7ޙ([T)x}I %0's:cCBf ?[6eʔ V$t< 8a^?;:u`8׍}ҡq[$Ʌ~m; Ӝ 2}7zwv[o[8 F#r hQn _GY,0 M֓͗I s4oIkv"A؜k4$ٔځUT˺P"Pvpn(}N])G FS*Q0!8n̼[N ȋZh∌Sp*YP{)7 ؏7],$"7j,(9_~jE4Ȯ$NwoIJhd+scP)FGG1 ^vNڵzamaWx&~ .>#gl O礓CwQݽu:!$]Aƙq9=K98͙3w dD${8'FTOJw CΈS Rt& H/~LuE\t 2i6`#vNr@7qIww /.M#eP%^UeyOCg$ğK#*zokiEaI{2m=J~V D6,{\}\2{7(ݍM/e4 1?:|ަΥEhf@EeoeO?]?sM~_l芨Z6IQzVo~#<[8CB쎥Fss,5*;iR'|4$!9[g ?KkxS>I|ϹXu:Ld5>lIG&X\[;b@W*!kHYM0}'E@IhS:\,FKh{ 3n: rZhf!eOet6iFקHQK*y a3ìۃ}yS,7a)*8F'/{- E*gIPGq (5A`%Pİ~<@$~Әڻ)WӬ 3Y+WatlӞBxV;Z,-1&UU,\-t-T-ꨉD%Vs -"܂hsSN H`,Vv-g2[1uǰ̓gы8ķW˝/4sMJn[6J~Jy@Xb(tK>mxB[By5MY#W®TFhd!#^CVdfcL"g}-0jϨA j"NĵQy@!d) Zp[Ѡq^C .^K-ˮh_q1L#ʈ$Šk:UjmĢc8&@ BJIfT 1 z'S+bAvJTZ(}ƥdCoCٖ2 G5/TlZ-ilaإ'dQGص;az WHZD_";'"$Ѭbhs nP+F(lz7lvgp\5/D]A1ř/%͢2f;Œetz :&|wO|-?,bOyZ$MsW8'65/[}& 'އ~]}3uѾlt`UvLlQ=eېat-րpEQշro]=q\.tKj%/_^_=@ \zps_+5*H.fowWз4fH E4AM^a0el3{ZIP%@"Q׳$`HLɼSwm^G }3p ӆpAl Tp^PCf@elo$˺그'Si]h*/zԭwD^RR=ݩ!X4}C Y97W"l ͧzv$׼)RVN^v$i<0{ 371JaGQ(i ܭq]';y_G& ]5~5$+G_ShqpѲ8] ~÷xnZ%)P @A5 (x O XǾ? ( >{c!4UͶ@ȃS]X3Dr& ND1,t(]:/BP_s fL*E,;1{F K\9q{[vx Wu3@AOJԶUI|| brSgk%QɅVn]}Yh GaC@N3Iirwh.}A/&טgJmk]S$`%qu$ͅec?]͋S?ߝDo7 SxI/թ;ȐQ'qegDZcr^l?PHѻx݃/;RC?M7ކ R ,^o>Xü !"l]FNK {yDO;V?kdMwdyd_},f;OH/ţŚoAB/y;'=,[2-n (J)|+6(!n[BYI@jAQA 5$k +SCv :|g+!cȿL3ZtND(╶ E[s{*%SMŷV+mg0YO}SC` /~I'@cDJ,{hI["'59`'$J 0)#{۪ %Ou^;dEB[DR y\7|~aL\# "sP>ZM!MX{_6 3 ]Ο0QFQp{IʥlJa|7S.mkaAU E2d@Vnԉq#s\,i* |RHxxvlNy*U~-iz`=!zE򙉊~;>N?B W)HQ8<0UFڔ2ChlV ea65Dt?Id[/p.:WwkN;=cތ.y]) \Hovׅ^sYyΉIAzI}䜶D'`7ZjcY@˰Z>'0XJ.yÁXuqЮgǝ,&mj^?_-kڲK}z~eXDćc1TYDWfBFxA_K$]dDl\ e"ɾVU~ I(rW.G&`5X&z*\QɎ@GYU>f Yj.ħY?~4D7PKMn.cZF i"y[Ip ~:"%f@ֆ _zJ;- Up{_ -юvH|05r=j&zԁ|U!!3ʈ#UM,[芙FQ<y+Kĝg~L[h/dI?S0-MCqI y]x$aEksVR6ߜq9f89}1;ڠ/pW沦_.aqn0D]ŅO%k_cNZk5oY /<3>EsI|z"؜ 7Hihpri`ZGF3l쐥 `I (yzֈC7d8i}>:AA.i蔿. 
(ՙ>eNrYLfWRxGWXeW^fPK8JġQ݆K=E|ԍOs}_;BiDy=[c?K%J#NϓA93*ϱ 2VKV`owtxM"$T7~FUn4ײlp@Ŕ\98<72u-7{?<[mYA?Ύpq5nJ"1eKmEݞ&?D ng0 aY[ ZkDжj~UuU{']7jD:_|$d[F(PÎH0l1T'pZǸiJ!/,<-AD蕀 ɦ^.>@gyMA;_dij'od«x%'ӭ_~ր"w\G]aQC@uP HdUνA?[9?G\fŽ=&i zpgP,%ka-pfy, 3li5xۄDD$N圌k9/mFjzxK@(huK,$^ǨDiUSnY"$(HDuYgB<%qt!e6ڿ/s6u,>.DҕO䋚Lԥ83ssiQ{:Ac8oP1~ד{s, 2&8RFڄl:!fe0;[~%Х;О)qK|b$y!F|ΰh^J~ 4oRԚt|Gc^KS>x5V~rgͼ*1ԫW3MfB*u2A f|Nt `j=ڜs<75Md[,:zН%LL*NFhX){tG t-k?ipou;b[UEJVYn@\7Lp<[\A>'}RBYL 4Ac=Dy.f,,{/Uք;o^W1 AStj@hUhKq՞)n;OQ˘zI F%# ˚>L=uAA껔DC*¶a ӝ88J$֚rMCV=zz l񫇸m l2spMVdS`)G5ⱃsSz^pQ#B̈́kZ2|Y-w̑yLR%O.%W_06j6B8 <K=IlUcߩ[m;{.SJdϫ;-fw}D,r{KD``haG2B!]e,ҥ`R|kP&T_s%AK(H4e^ F5.zk\6AGG:bz&̒t{srr c'mo G)@ϪPKMҢ(kZƳ<( F>u?bgkB ؅qC끾(% “i+f1ʥ^Rr{z:Q5ߺ\L1Sԯ\/83lwU|6.}@,fe@B+*Q+٪7]izJNsղA3yU (zaSX~WnQ(rb ,J݀+f#Knj Hr u}''y]KRXqoߥ[n۩q:Xy֝~Ȯ eOY**Ab0>%* /5oMeCMbtj'wڡQh#5˖,N}ܶk?Gl{ϋ׸EQI(TD{v7 ;TjZ&M /`y_QNґЃ?R9{Krjsq7_b`y{: OA,q.4Sy> ?Tvb68H)P7f73Z9MijM%ejz§4O^]~zv[ó@|` w.'XҼmH'O5 Δ~xF"\)5[-o`j.k#{`7~y:Go C*iX# QJe` )mdK r `RYM9da"6~ðM΃8\֙XmE"W W-XQpŗqƥX%F&@%UacFPtݐ$*rv6aIJpȠV'bJK8:$2XO-٩!xRhmeޗ/Ϡk5ئ^)wO <'(WΜͳjʝï"Pz*vn /A|?١]'D>R`"ʯw8ަ<,beyT?_*f}V h03<6wGb_MvP\SZzC\?- ;lWjI=kGOw:˳,򊅍8)35T 6WL;rn+$^,0VB9溷pJ\S##=؜N9nqU\όtZ.]pωn\S+0]]s\ǧ`ŞD&8G\!`C2@2hE?bV~B7땿H74M˖S>mVڙUn 1;5h.Irӕޝp&Rcvː^Ե }.{ ::ʛ(8KDC \њT˗PiӠ(l*G}՚CH980`#-#I!q5.:yj/ )yM-,LjuaĊU{?f-07{!UJv2lgwwbSE nY7޶6gJ%ڹeI̸݆}E:<֎5 Pa+)&:W@?MC;́(cơ]7)`MR.h \TS##D%Ne(@ ~C5R\XG4`<Dmsޜ',xNI8ͨ8/ nU81Ǫ‘D̜9J^?bJy>a6WjMdXJ{?ҮxIPFX Cb&}UR1aPb1ƵGyiJи8Os_8f~p ~Z#֩6EzO1׎[:mCȡ5a >ukګQN^ƥdbD愒(!W[zZ;6źly4eALGk!=H(Ƀ)BL#ĹWMX rƱ 1=ΉaDޛ_i4c#,L$?ȅ/!:|`4: ia)^3Wӽ@W4j"md&Ô҃aG">`wH`/]\=Qҟp~ cMI% ֕x4%st (8\>svi,c3CKJ}؁ y6x]w,1-(+IcK~Þ ݊%ъP@Afo :l ˥.uUĦ**q/CKfeCn¸>@bW*_;SjGuYT?gTn']K3̜__fylw/Ǒֵyڗvڭvo{p9#R,rN<L5S)O`.mtάYyF'?$a);T32c*BBrpd .* z `h1`i!AGQiK3d;ߦe|p&M jnB9fm~X%;lؾw?f_D\̻xk!1(@Tglw1xTMYp p1tGS -s8]}d//bt⎮͙6ԩڰ gec|]h'҃'ش=Ҍ')gf6c0: 4޵ne.NhYW 6:g]Mk'm7:ֲ  \(!/i, K#\6w־:yA*A3}X)q. 
menhir-20210929/doc/manual.tex
\def\true{true}
\let\fpacm\true
\documentclass[onecolumn,11pt,nocopyrightspace,preprint]{sigplanconf}
\usepackage{amstext}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{moreverb}
\usepackage{tikz}
\usepackage{xspace}
\usepackage{mymacros}
\def\fppdf{true}
\usepackage{fppdf}
\input{macros}
\input{version}

% Let Menhir's version number appear at the bottom right of every page.
\makeatletter
\def\@formatyear{\menhirversion}
\makeatother

% Hevea-specific logic; see
% http://hevea.inria.fr/doc/manual007.html
% http://hevea.inria.fr/doc/manual008.html
\usepackage{hevea}
\newenvironment{heveapicture}{
  \begin{toimage}
}{
  \end{toimage}
  \imageflush{}
}

% ------------------------------------------------------------------------------
% Headings.

\title{\menhir Reference Manual\\\normalsize (version \menhirversion)}

\begin{document}

\authorinfo{François Pottier\and Yann Régis-Gianas}
  {INRIA}
  {\{Francois.Pottier, Yann.Regis-Gianas\}@inria.fr}

\maketitle

% ------------------------------------------------------------------------------

\clearpage
\tableofcontents
\clearpage

% ------------------------------------------------------------------------------

\section{Foreword}

\menhir is a parser generator. It turns high-level grammar specifications,
decorated with semantic actions expressed in the \ocaml programming
language~\cite{ocaml}, into parsers, again expressed in \ocaml. It is based
on Knuth's LR(1) parser construction technique~\cite{knuth-lr-65}. It is
strongly inspired by its precursors: \yacc~\cite{johnson-yacc-79},
\texttt{ML-Yacc}~\cite{tarditi-appel-00}, and \ocamlyacc~\cite{ocaml}, but
offers a large number of minor and major improvements that make it a more
modern tool.

This brief reference manual explains how to use \menhir. It does not attempt
to explain context-free grammars, parsing, or the LR technique. Readers who
have never used a parser generator are encouraged to read about these ideas
first~\cite{aho-86,appel-tiger-98,hopcroft-motwani-ullman-00}. They are also
invited to have a look at the \distrib{demos} directory in \menhir's
distribution.

Potential users of \menhir should be warned that \menhir's feature set is
not completely stable. There is a tension between preserving a measure of
compatibility with \ocamlyacc, on the one hand, and introducing new ideas,
on the other hand. Some aspects of the tool, such as the error handling
mechanism, are still potentially subject to incompatible changes: for
instance, in the future, the current error handling mechanism (which is
based on the \error token, see \sref{sec:errors}) could be removed and
replaced with an entirely different mechanism.

There is room for improvement in the tool and in this reference manual. Bug
reports and suggestions are welcome!

% ------------------------------------------------------------------------------

\section{Usage}

\menhir is invoked as follows:
\begin{quote}
\cmenhir \nt{option} \ldots \nt{option} \nt{filename} \ldots \nt{filename}
\end{quote}
Each of the file names must end with \mly (unless \ocoq is used, in which
case it must end with \vy) and denotes a partial grammar specification.
These partial grammar specifications are joined (\sref{sec:split}) to form a
single, self-contained grammar specification, which is then processed.
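For instance, assuming a single grammar specification file named
\texttt{parser.mly} (a hypothetical name), the invocation
\begin{verbatim}
  menhir parser.mly
\end{verbatim}
processes this file and produces the two files \texttt{parser.ml} and
\texttt{parser.mli}, which contain the generated parser and its interface.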
The following optional command line switches allow controlling many aspects
of the process.

\docswitch{\obase \nt{basename}} This switch controls the base name of the
\ml and \mli files that are produced. That is, the tool will produce files
named \nt{basename}\texttt{.ml} and \nt{basename}\texttt{.mli}. Note that
\nt{basename} can contain occurrences of the \texttt{/} character, so it
really specifies a path and a base name. When only one \nt{filename} is
provided on the command line, the default \nt{basename} is obtained by
depriving \nt{filename} of its final \mly suffix. When multiple file names
are provided on the command line, no default base name exists, so that the
\obase switch \emph{must} be used.

\docswitch{\ocmly} This switch causes \menhir to produce a \cmly file in
addition to its normal operation. This file contains a (binary-form)
representation of the grammar and automaton (see \sref{sec:sdk}).

\docswitch{\ocomment} This switch causes a few comments to be inserted into
the \ocaml code that is written to the \ml file.

\docswitch{\ocompareerrors \nt{filename1} \ocompareerrors \nt{filename2}}
Two such switches must always be used in conjunction so as to specify the
names of two \messages files, \nt{filename1} and \nt{filename2}. Each file
is read and internally translated to a mapping of states to messages.
\menhir then checks that the left-hand mapping is a subset of the right-hand
mapping. This feature is typically used in conjunction with \olisterrors to
check that \nt{filename2} is complete (that is, covers all states where an
error can occur). For more information, see \sref{sec:errors:new}.

\docswitch{\ocompileerrors \nt{filename}} This switch causes \menhir to read
the file \nt{filename}, which must obey the \messages file format, and to
compile it to an \ocaml function that maps a state number to a message. The
\ocaml code is sent to the standard output channel. At the same time,
\menhir checks that the collection of input sentences in the file
\nt{filename} is correct and irredundant. For more information, see
\sref{sec:errors:new}.

\docswitch{\ocoq} This switch causes \menhir to produce Coq code. See
\sref{sec:coq}.

\docswitch{\ocoqlibpath \nt{path}} This switch allows specifying under what
name (or path) the Coq support library is known to Coq. When \menhir runs in
\ocoq mode, the generated parser contains references to several modules in
this library. This path is used to qualify these references. Its default
value is \coqmenhirlib.

\docswitch{\ocoqlibnopath} This switch indicates that references to the Coq
library \coqmenhirlib should \emph{not} be qualified. This was the default
behavior of \menhir prior to 2018/05/30. This switch is provided for
compatibility, but normally should not be used.

\docswitch{\ocoqnoactions} (Used in conjunction with \ocoq.) This switch
causes the semantic actions present in the \vy file to be ignored and
replaced with \verb+tt+, the unique inhabitant of Coq's \verb+unit+ type.
This feature can be used to test the Coq back-end with a standard grammar,
that is, a grammar that contains \ocaml semantic actions. Just rename the
file from \mly to \vy and set this switch.

\docswitch{\ocoqnocomplete} (Used in conjunction with \ocoq.) This switch
disables the generation of the proof of completeness of the parser
(\sref{sec:coq}). This can be necessary because the proof of completeness is
possible only if the grammar has no conflict (not even a benign one, in the
sense of \sref{sec:conflicts:benign}). This can be desirable also because,
for a complex grammar, completeness may require a heavy certificate and its
validation by Coq may take time.

\docswitch{\ocoqnoversioncheck} (Used in conjunction with \ocoq.) This
switch prevents the generation of the check that verifies that the versions
of \menhir and \coqmenhirlib match.

\docswitch{\odepend} See \sref{sec:build}.

\docswitch{\odump} This switch causes a description of the automaton to be
written to the file \nt{basename}\automaton. This description is written
after benign conflicts have been resolved, before severe conflicts are
resolved (\sref{sec:conflicts}), and before extra reductions are introduced
(\sref{sec:onerrorreduce}).

\docswitch{\odumpresolved} This command line switch causes a description of
the automaton to be written to the file \nt{basename}\automatonresolved.
This description is written after all conflicts have been resolved
(\sref{sec:conflicts}) and after extra reductions have been introduced
(\sref{sec:onerrorreduce}).

\docswitch{\oechoerrors \nt{filename}} This switch causes \menhir to read
the \messages file \nt{filename} and to produce on the standard output
channel just the input sentences. (That is, all messages, blank lines, and
comments are filtered out.) For more information, see \sref{sec:errors:new}.

\docswitch{\oechoerrorsconcrete \nt{filename}} This switch causes \menhir to
read the \messages file \nt{filename} and to produce on the standard output
channel just the input sentences. Each sentence is followed with a comment
of the form \verb+## Concrete syntax: ...+ that shows this sentence in
concrete syntax. This comment is printed only if the user has defined an
alias for every token (\sref{sec:tokens}).

\docswitch{\oexplain} This switch causes conflict explanations to be written
to the file \nt{basename}\conflicts. See also \sref{sec:conflicts}.

\docswitch{\oexternaltokens \nt{T}} This switch causes the definition of the
\token type to be omitted in \nt{basename}\texttt{.ml} and
\nt{basename}\texttt{.mli}. Instead, the generated parser relies on the type
$T$\texttt{.}\token, where $T$ is an \ocaml module name. It is up to the
user to define module $T$ and to make sure that it exports a suitable \token
type. Module $T$ can be hand-written. It can also be automatically generated
out of a grammar specification using the \oonlytokens switch.

\docswitch{\ofixedexc} This switch causes the exception \texttt{Error} to be
internally defined as a synonym for \texttt{Parsing.Parse\_error}. This
means that an exception handler that catches \texttt{Parsing.Parse\_error}
will also catch the generated parser's \texttt{Error}. This helps increase
\menhir's compatibility with \ocamlyacc. There is otherwise no reason to use
this switch.

\docswitch{\ograph} This switch causes a description of the grammar's
dependency graph to be written to the file \nt{basename}\dott. The graph's
vertices are the grammar's nonterminal symbols. There is a directed edge
from vertex $A$ to vertex $B$ if the definition of $A$ refers to $B$. The
file is in a format that is suitable for processing by the \emph{graphviz}
toolkit.

\docswitch{\oinfer, \oinferwrite, \oinferread} See \sref{sec:build}.

\docswitch{\oinspection} This switch requires \otable. It causes \menhir to
generate not only the monolithic and incremental APIs
(\sref{sec:monolithic}, \sref{sec:incremental}), but also the inspection API
(\sref{sec:inspection}). Activating this switch causes a few more tables to
be produced, resulting in somewhat larger code size.

\docswitch{\ointerpret} This switch causes \menhir to act as an interpreter,
rather than as a compiler. No \ocaml code is generated. Instead, \menhir
reads sentences off the standard input channel, parses them, and displays
outcomes. This switch can be usefully combined with \otrace. For more
information, see \sref{sec:interpret}.

\docswitch{\ointerpreterror} This switch is analogous to \ointerpret, except
\menhir expects every sentence to cause an error on its last token, and
displays information about the state in which the error is detected, in the
\messages file format. For more information, see \sref{sec:errors:new}.

\docswitch{\ointerpretshowcst} This switch, used in conjunction with
\ointerpret, causes \menhir to display a concrete syntax tree when a
sentence is successfully parsed. For more information, see
\sref{sec:interpret}.

\docswitch{\olisterrors} This switch causes \menhir to produce (on the
standard output channel) a complete list of input sentences that cause an
error, in the \messages file format. For more information, see
\sref{sec:errors:new}.

\docswitch{\ologautomaton \nt{level}} When \nt{level} is nonzero, this
switch causes some information about the automaton to be logged to the
standard error channel.

\docswitch{\ologcode \nt{level}} When \nt{level} is nonzero, this switch
causes some information about the generated \ocaml code to be logged to the
standard error channel.

\docswitch{\ologgrammar \nt{level}} When \nt{level} is nonzero, this switch
causes some information about the grammar to be logged to the standard
error channel. When \nt{level} is 2, the \emph{nullable}, \emph{FIRST}, and
\emph{FOLLOW} tables are displayed.

\docswitch{\omergeerrors \nt{filename1} \omergeerrors \nt{filename2}} Two
such switches must always be used in conjunction so as to specify the names
of two \messages files, \nt{filename1} and \nt{filename2}. This command
causes \menhir to merge these two \messages files and print the result on
the standard output channel. For more information, see
\sref{sec:errors:new}.

\docswitch{\onodollars} This switch disallows the use of positional keywords
of the form \kw{\$i}.

\docswitch{\onoinline} This switch causes all \dinline keywords in the
grammar specification to be ignored. This is especially useful in order to
understand whether these keywords help solve any conflicts.

\docswitch{\onostdlib} This switch instructs \menhir to \emph{not} use its
standard library (\sref{sec:library}).

\docswitch{\oocamlc \nt{command}} See \sref{sec:build}.

\docswitch{\oocamldep \nt{command}} See \sref{sec:build}.

\docswitch{\oonlypreprocess} This switch causes the grammar specifications
to be transformed up to the point where the automaton's construction can
begin. The grammar specifications whose names are provided on the command
line are joined (\sref{sec:split}); all parameterized nonterminal symbols
are expanded away (\sref{sec:templates}); type inference is performed, if
\oinfer is enabled; all nonterminal symbols marked \dinline are expanded
away (\sref{sec:inline}). This yields a single, monolithic grammar
specification, which is printed on the standard output channel.
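For instance, under the hypothetical file name used earlier, the invocation
\begin{verbatim}
  menhir --only-preprocess parser.mly
\end{verbatim}
prints on the standard output channel a flattened version of the grammar in
\texttt{parser.mly}, where parameterized symbols and \dinline symbols have
been expanded away.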
\docswitch{\oonlytokens} This switch causes the \dtoken declarations in the
grammar specification to be translated into a definition of the \token type,
which is written to the files \nt{basename}\texttt{.ml} and
\nt{basename}\texttt{.mli}. No code is generated. This is useful when a
single set of tokens is to be shared between several parsers. The directory
\distrib{demos/calc-two} contains a demo that illustrates the use of this
switch.

\docswitch{\orandomseed \nt{seed}} This switch allows the user to set a
random seed. This seed influences the random sentence generator.

\docswitch{\orandomselfinit} This switch asks Menhir to choose a random seed
in a nondeterministic (system-dependent) way. This seed influences the
random sentence generator.

\docswitch{\orandomsentencelength \nt{length}} This switch allows the user
to set a goal length for the random sentence generator. The generated
sentences will normally have length at most \nt{length}.
% If \nt{length} is chosen in such a way that this constraint is infeasible,
% then the system will automatically choose a different goal length.

\docswitch{\orandomsentence \nt{symbol}} This switch asks Menhir to produce
and display a random sentence that is generated by the nonterminal symbol
\nt{symbol}. The sentence is displayed as a sequence of terminal symbols,
separated with spaces. Each terminal symbol is represented by its name. The
generated sentence is valid with respect to the grammar. If the grammar is
in the class LR(1) (that is, if it has no conflicts at all), then the
generated sentence is also accepted by the automaton. However, if the
grammar has conflicts, then it may be the case that the sentence is rejected
by the automaton. The distribution of sentences is \emph{not uniform}; some
sentences (or fragments of sentences) may be more likely to appear than
others. The productions that involve the \error pseudo-token are ignored by
the random sentence generator.

\docswitch{\orandomsentenceconcrete \nt{symbol}} This switch asks Menhir to
produce and display a random sentence that is generated by the nonterminal
symbol \nt{symbol}. The sentence is displayed as a sequence of terminal
symbols, separated with spaces. Each terminal symbol is represented by its
token alias (\sref{sec:tokens}). This assumes that a token alias has been
defined for every token.

\docswitch{\orawdepend} See \sref{sec:build}.

\docswitch{\orequirealiases} This switch causes \menhir to check that a
token alias (\sref{sec:tokens}) has been defined for every token. There is
no requirement for this alias to be actually used; it must simply exist. A
missing alias gives rise to a warning (and, in \ostrict mode, to an error).

\docswitch{\ostdlib \nt{directory}} This switch exists only for backwards
compatibility and is ignored. It may be removed in the future.

\docswitch{\ostrategy \nt{strategy}} This switch selects an error handling
strategy, to be used by the code back-end, the table back-end, and the
reference interpreter. The available strategies are \legacy and \simplified.
(However, at the time of writing, the code back-end does not yet support the
\simplified strategy.) When this switch is omitted, the \legacy strategy is
used. The choice of a strategy matters only if the grammar uses the \error
token. For more details, see \sref{sec:errors}.

\docswitch{\ostrict} This switch causes several warnings about the grammar
and about the automaton to be considered errors. This includes warnings
about useless precedence declarations, non-terminal symbols that produce the
empty language, unreachable non-terminal symbols, productions that are never
reduced, conflicts that are not resolved by precedence declarations,
end-of-stream conflicts, and missing token aliases.

\docswitch{\oo{suggest-*}} See \sref{sec:build}.

\docswitch{\otable} This switch causes \menhir to use its table-based
back-end, as opposed to its (default) code-based back-end. When \otable is
used, \menhir produces significantly more compact and somewhat slower
parsers. See \sref{sec:qa} for a speed comparison.

The table-based back-end produces rather compact tables, which are analogous
to those produced by \yacc, \bison, or \ocamlyacc. These tables are not
quite stand-alone: they are exploited by an interpreter, which is shipped as
part of the support library \menhirlib. For this reason, when \otable is
used, \menhirlib must be made visible to the \ocaml compilers, and must be
linked into your executable program. The \texttt{--suggest-*} switches,
described above, help do this.

The code-based back-end compiles the LR automaton directly into a nest of
mutually recursive \ocaml functions. In that case, \menhirlib is not
required.

The incremental API (\sref{sec:incremental}) and the inspection API
(\sref{sec:inspection}) are made available only by the table-based back-end.

\docswitch{\otimings} This switch causes internal timing information to be
sent to the standard error channel.

\docswitch{\otimingsto \nt{filename}} This switch causes internal timing
information to be written to the file \nt{filename}.

\docswitch{\otrace} This switch causes tracing code to be inserted into the
generated parser, so that, when the parser is run, its actions are logged to
the standard error channel. This is analogous to \texttt{ocamlrun}'s
\texttt{p=1} parameter, except this switch must be enabled at compile time:
one cannot selectively enable or disable tracing at runtime.

\docswitch{\oignoreprec} This switch suppresses all warnings about useless
\dleft, \dright, \dnonassoc and \dprec declarations.

\docswitch{\oignoreone \nt{symbol}} This switch suppresses the warning that
is normally emitted when \menhir finds that the terminal symbol \nt{symbol}
is unused.

\docswitch{\oignoreall} This switch suppresses all of the warnings that are
normally emitted when \menhir finds that some terminal symbols are unused.

\docswitch{\oupdateerrors \nt{filename}} This switch causes \menhir to read
the \messages file \nt{filename} and to produce on the standard output
channel a new \messages file that is identical, except the auto-generated
comments have been re-generated. For more information, see
\sref{sec:errors:new}.

\docswitch{\oversion} This switch causes \menhir to print its own version
number and exit.

% ------------------------------------------------------------------------------

\section{Lexical conventions}

A semicolon character (\kw{;}) \emph{may} appear after a declaration
(\sref{sec:decls}).

An old-style rule (\sref{sec:old:rules}) \emph{may} be terminated with a
semicolon. Also, within an old-style rule, each producer
(\sref{sec:producers}) \emph{may} be terminated with a semicolon.

A new-style rule (\sref{sec:new:rules}) \emph{must not} be terminated with a
semicolon. Within such a rule, the elements of a sequence \emph{must} be
separated with semicolons.

Semicolons are not allowed to appear anywhere except in the places mentioned
above. This is in contrast with \ocamlyacc, which views semicolons as
insignificant, just like whitespace.
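As an illustration of these rules, here is a minimal sketch of the same rule
written first in the old syntax, then in the new syntax. The token
\texttt{EOL} and the symbols \texttt{main} and \texttt{expr} are
hypothetical.
\begin{verbatim}
/* Old-style rule: each producer may be terminated
   with a semicolon, and so may the rule itself. */
main: e = expr; EOL; { e };

/* New-style rule: the elements of a sequence must be
   separated with semicolons; no final semicolon. */
let main := e = expr; EOL; { e }
\end{verbatim}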
Identifiers (\nt{id}) coincide with \ocaml identifiers, except they are not
allowed to contain the quote (\kw{'}) character. Following \ocaml,
identifiers that begin with a lowercase letter (\nt{lid}) or with an
uppercase letter (\nt{uid}) are distinguished.

A quoted identifier \nt{qid} is a string enclosed in double quotes. Such a
string cannot contain a double quote or a backslash. Quoted identifiers are
used as token aliases (\sref{sec:tokens}).

Comments are C-style (surrounded with \kw{/*} and \kw{*/}, cannot be
nested), C++-style (announced by \kw{/$\!$/} and extending until the end of
the line), or \ocaml-style (surrounded with \kw{(*} and \kw{*)}, can be
nested). Of course, inside \ocaml code, only \ocaml-style comments are
allowed.

\ocaml type expressions are surrounded with \kangle{and}. Within such
expressions, all references to type constructors (other than the built-in
\textit{list}, \textit{option}, etc.) must be fully qualified.

% ------------------------------------------------------------------------------

\section{Syntax of grammar specifications}

\newcommand{\modifier}{(\,\dquestion \metachoice \dplus \metachoice \dstar\hspace{-.3mm})}

\begin{figure}
\begin{center}
\begin{tabular}{r@{}c@{}l}
\nt{specification} \is
  \sepspacelist{\nt{declaration}} \percentpercent \sepspacelist{\nt{rule}}
  \optional{\percentpercent \textit{\ocaml code}} \\
\nt{declaration} \is \dheader{\textit{\ocaml code}} \\
&& \dparameter \ocamlparam \\
&& \dtoken \optional{\ocamltype} \sepspacelist{\nt{uid} \optional{\nt{qid}}} \\
&& \dnonassoc \sepspacelist{\nt{uid}} \\
&& \dleft \sepspacelist{\nt{uid}} \\
&& \dright \sepspacelist{\nt{uid}} \\
&& \dtype \ocamltype \sepspacelist{\nt{lid}} \\
&& \dstart \optional{\ocamltype} \sepspacelist{\nt{lid}} \\
&& \dattribute \sepspacelist{\nt{actual}} \sepspacelist{\nt{attribute}} \\
&& \kw{\%} \nt{attribute} \\ % a grammar-wide attribute
&& \donerrorreduce \sepspacelist{\nt{lid}} \\
\nt{attribute} \is \kw{[@} \nt{name} \nt{payload} \kw{]} \\[4mm]
\emph{old syntax} ---
\nt{rule} \is
  \optional{\dpublic} \optional{\dinline} \nt{lid} \oparams{\nt{id}}
  \deuxpoints \precseplist\barre{\nt{group}} \\
\nt{group} \is
  \seplist{\ \barre}{\nt{production}} \daction \optional {\dprec \nt{id}} \\
\nt{production} \is \sepspacelist{\nt{producer}} \optional {\dprec \nt{id}} \\
\nt{producer} \is \optional{\nt{lid} \dequal} \nt{actual} \\
\nt{actual} \is \nt{id} \oparams{\nt{actual}} \\
&& \nt{actual} \modifier \\
&& \seplist{\ \barre}{\nt{group}} % not really allowed everywhere
\\[4mm]
\emph{new syntax} ---
\nt{rule} \is
  \optional{\dpublic} \dlet \nt{lid} \oparams{\nt{id}}
  (\,\dcolonequal \metachoice \dequalequal\hspace{-.2mm}) \expression \\
\expression \is
  % a choice between sequence expressions:
  \precseplist\barre\expression
\\&&
  % a sequence expression:
  \optional{\pattern \dequal{}} \expression \dsemi \expression
\\&&
  % a symbol expression:
  \nt{id} \oparams{\expression}
\\&&
  % a symbol expression:
  \expression \modifier
\\&&
  % an action expression:
  \daction \optional {\dprec \nt{id}}
  % %prec is in fact allowed to appear before the semantic action,
  % but this is not documented.
\\&&
  % an action expression:
  \dpfaction{\nt{\ocaml id}} \optional {\dprec \nt{id}} \\
\pattern \is
  \nt{lid} \,\metachoice\, \dunderscore \,\metachoice\, \dtilde
  \,\metachoice\, \tuple\pattern
% The places where attributes can be attached are not shown in this
% figure. This is intentional; let's avoid pollution. Attributes are
% described separately.
\end{tabular}
\end{center}
\caption{Syntax of grammar specifications}
\label{fig:syntax}
\end{figure}

The syntax of grammar specifications appears in \fref{fig:syntax}. The
places where attributes can be attached are not shown; they are documented
separately (\sref{sec:attributes}).
%
% (For compatibility with \ocamlyacc, some specifications that do not fully
% adhere to this syntax are also accepted.)
%
A grammar specification begins with a sequence of declarations
(\sref{sec:decls}), ended by a mandatory \percentpercent keyword.
%
Following this keyword, a sequence of rules is expected. Each rule defines a
nonterminal symbol~\nt{lid}, whose name must begin with a lowercase letter.
%
% In reality, in the old syntax, this is enforced only for start symbols.
% In the new syntax, this is enforced for all symbols.
%
% A rule can also *extend* a symbol, but let's not mention that here.
%
A rule is expressed either in the ``old syntax'' (\sref{sec:old:rules}) or
in the ``new syntax'' (\sref{sec:new:rules}), which is slightly more elegant
and powerful.

\subsection{Declarations}
\label{sec:decls}

\subsubsection{Headers}
\label{sec:decls:headers}

A header is a piece of \ocaml code, surrounded with \dheader{and}. It is
copied verbatim at the beginning of the \ml file. It typically contains
\ocaml \kw{open} directives and function definitions for use by the semantic
actions. If a single grammar specification file contains multiple headers,
their order is preserved. However, when two headers originate in distinct
grammar specification files, the order in which they are copied to the \ml
file is unspecified.

It is important to note that the header is copied by \menhir only to the \ml
file, \emph{not} to the \mli file. Therefore, it should not contain
declarations that affect the meaning of the types that appear in the \mli
file. Here are two problems that people commonly run into:
\begin{itemize}
\item Placing an \kw{open} directive that is required for a \dtype
  declaration to make sense. For instance, writing \verb+open Foo+ in the
  header and declaring \verb+%type <t> bar+, where the type \verb+t+ is
  defined in the module \verb+Foo+, will not work. You must write
  \verb+%type <Foo.t> bar+.
\item Declaring a module alias that affects a (declared or inferred) type.
  For instance, writing \verb+module F = Foo+ in the header and declaring
  \verb+%type <F.t> bar+ may not work (from 2020/05/25 on). The reason is,
  OCaml may infer that the symbol \verb+bar+ has type \verb+F.t+, and Menhir
  will rely on this information without realizing that \verb+F+ is a local
  name, so in the end, the \mli file contains a reference to \verb+F.t+ that
  does not make sense.
\end{itemize}

\subsubsection{Parameters}
\label{sec:parameter}

A declaration of the form:
\begin{quote}
\dparameter \ocamlparam
\end{quote}
causes the entire parser to become parameterized over the \ocaml module
\nt{uid}, that is, to become an \ocaml functor. The directory
\distrib{demos/calc-param} contains a demo that illustrates the use of this
switch.

If a single specification file contains multiple \dparameter declarations,
their order is preserved, so that the module name \nt{uid} introduced by one
declaration is effectively in scope in the declarations that follow. When
two \dparameter declarations originate in distinct grammar specification
files, the order in which they are processed is unspecified. Last,
\dparameter declarations take effect before \dheader{$\ldots$}, \dtoken,
\dtype, or \dstart declarations are considered, so that the module name
\nt{uid} introduced by a \dparameter declaration is effectively in scope in
\emph{all} \dheader{$\ldots$}, \dtoken, \dtype, or \dstart declarations,
regardless of whether they precede or follow the \dparameter declaration.
This means, in particular, that the side effects of an \ocaml header are
observed only when the functor is applied, not when it is defined.
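As an illustration, here is a minimal sketch of a \dparameter declaration;
the module name \texttt{Semantics} and its signature are hypothetical:
\begin{verbatim}
%parameter<Semantics : sig
  type number
  val inject: int -> number
  val add: number -> number -> number
end>
\end{verbatim}
The generated parser then takes the form of a functor, which must be applied
to a concrete implementation of this signature before the parser can be
used.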
Last, \dparameter declarations take effect before \dheader{$\ldots$}, \dtoken, \dtype, or \dstart declarations are considered, so that the module name \nt{uid} introduced by a \dparameter declaration is effectively in scope in \emph{all} \dheader{$\ldots$}, \dtoken, \dtype, or \dstart declarations, regardless of whether they precede or follow the \dparameter declaration. This means, in particular, that the side effects of an \ocaml header are observed only when the functor is applied, not when it is defined. \subsubsection{Tokens} \label{sec:tokens} A declaration of the form: \begin{quote} \dtoken \optional{\ocamltype} $\nt{uid}_1$ \optional{$\nt{qid}_1$} $\;\ldots\;$ $\nt{uid}_n$ \optional{$\nt{qid}_n$} \end{quote} defines the identifiers $\nt{uid}_1, \ldots, \nt{uid}_n$ as tokens, that is, as terminal symbols in the grammar specification and as data constructors in the \textit{token} type. If an \ocaml type $t$ is present, then these tokens are considered to carry a semantic value of type $t$, otherwise they are considered to carry no semantic value. If a quoted identifier $\nt{qid}_i$ is present, then it is considered an alias for the terminal symbol $\nt{uid}_i$. (This feature, known as ``token aliases'', is borrowed from Bison.) % https://www.gnu.org/software/bison/manual/html_node/Token-Decl.html#Token-Decl Throughout the grammar, the quoted identifier $\nt{qid}_i$ is then synonymous with the identifier $\nt{uid}_i$. % For example, if one declares: \begin{verbatim} %token PLUS "+" \end{verbatim} then the quoted identifier \texttt{"+"} stands for the terminal symbol \texttt{PLUS} throughout the grammar. An example of the use of token aliases appears in the directory \distrib{demos/calc-alias}. % Token aliases can be used to improve the readability of a grammar. One must keep in mind, however, that they are just syntactic sugar: they are not interpreted in any way by Menhir or conveyed to tools like \ocamllex. % They could be considered confusing by a reader who mistakenly believes that they are interpreted as string literals. \subsubsection{Priority and associativity} \label{sec:assoc} A declaration of one of the following forms: \begin{quote} \dnonassoc $\nt{uid}_1 \ldots \nt{uid}_n$ \\ \dleft $\nt{uid}_1 \ldots \nt{uid}_n$ \\ \dright $\nt{uid}_1 \ldots \nt{uid}_n$ \end{quote} assigns both a \emph{priority level} and an \emph{associativity status} to the symbols $\nt{uid}_1, \ldots, \nt{uid}_n$. The priority level assigned to $\nt{uid}_1, \ldots, \nt{uid}_n$ is not defined explicitly: instead, it is defined to be higher than the priority level assigned by the previous \dnonassoc, \dleft, or \dright declaration, and lower than that assigned by the next \dnonassoc, \dleft, or \dright declaration. The symbols $\nt{uid}_1, \ldots, \nt{uid}_n$ can be tokens (defined elsewhere by a \dtoken declaration) or dummies (not defined anywhere). Both can be referred to as part of \dprec annotations. Associativity status and priority levels allow shift/reduce conflicts to be silently resolved (\sref{sec:conflicts}). \subsubsection{Types} \label{sec:type} A declaration of the form: \begin{quote} \dtype \ocamltype $\nt{lid}_1 \ldots \nt{lid}_n$ \end{quote} assigns an \ocaml type to each of the nonterminal symbols $\nt{lid}_1, \ldots, \nt{lid}_n$. For start symbols, providing an \ocaml type is mandatory, but is usually done as part of the \dstart declaration. For other symbols, it is optional. Providing type information can improve the quality of \ocaml's type error messages. 
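For instance, assuming the semantic values of the symbols \texttt{expression} and \texttt{declarations} are built out of a (hypothetical) module \texttt{Syntax}, one might write:
\begin{verbatim}
%type <Syntax.expression> expression
%type <Syntax.declaration list> declarations
\end{verbatim}
(The reason why these types are written in fully qualified form is explained below.)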
A \dtype declaration may concern not only a nonterminal symbol, such as, say, \texttt{expression}, but also a fully applied parameterized nonterminal symbol, such as \texttt{list(expression)} or \texttt{separated\_list(COMMA, option(expression))}. The types provided as part of \dtype declarations are copied verbatim to the \ml and \mli files. In contrast, headers (\sref{sec:decls:headers}) are copied to the \ml file only. For this reason, the types provided as part of \dtype declarations must make sense both in the presence and in the absence of these headers. They should typically be fully qualified types. % TEMPORARY type information can be mandatory in --coq mode; document? \subsubsection{Start symbols} \label{sec:start} A declaration of the form: \begin{quote} \dstart \optional{\ocamltype} $\nt{lid}_1 \ldots \nt{lid}_n$ \end{quote} declares the nonterminal symbols $\nt{lid}_1, \ldots, \nt{lid}_n$ to be start symbols. Each such symbol must be assigned an \ocaml type either as part of the \dstart declaration or via separate \dtype declarations. Each of $\nt{lid}_1, \ldots, \nt{lid}_n$ becomes the name of a function whose signature is published in the \mli file and that can be used to invoke the parser. \subsubsection{Attribute declarations} Attribute declarations of the form \dattribute \sepspacelist{\nt{actual}} \sepspacelist{\nt{attribute}} and \kw{\%} \nt{attribute} are explained in \sref{sec:attributes}. \subsubsection{Extra reductions on error} \label{sec:onerrorreduce} A declaration of the form: \begin{quote} \donerrorreduce $\nt{lid}_1 \ldots \nt{lid}_n$ \end{quote} marks the nonterminal symbols $\nt{lid}_1, \ldots, \nt{lid}_n$ as potentially eligible for reduction when an invalid token is found. This may cause one or more extra reduction steps to be performed before the error is detected. More precisely, this declaration affects the automaton as follows. Let us say that a production $\nt{lid} \rightarrow \ldots$ is ``reducible on error'' if its left-hand symbol~\nt{lid} appears in a \donerrorreduce declaration. After the automaton has been constructed and after any conflicts have been resolved, in every state~$s$, the following algorithm is applied: \begin{enumerate} \item Construct the set of all productions that are ready to be reduced in state~$s$ and are reducible on error; \item Test if one of them, say $p$, has higher ``on-error-reduce-priority'' than every other production in this set; \item If so, in state~$s$, replace every error action with a reduction of the production~$p$. (In other words, for every terminal symbol~$t$, if the action table says: ``in state~$s$, when the next input symbol is~$t$, fail'', then this entry is replaced with: ``in state~$s$, when the next input symbol is~$t$, reduce production~$p$''.) \end{enumerate} If step 3 above is executed in state~$s$, then an error can never be detected in state~$s$, since all error actions in state~$s$ are replaced with reduce actions. Error detection is deferred: at least one reduction takes place before the error is detected. It is a ``spurious'' reduction: in a canonical LR(1) automaton, it would not take place. An \donerrorreduce declaration does not affect the language that is accepted by the automaton. It does not affect the location where an error is detected. It is used to control in which state an error is detected. If used wisely, it can make errors easier to report, because they are detected in a state for which it is easier to write an accurate diagnostic message (\sref{sec:errors:diagnostics}). 
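As a minimal sketch, assuming the grammar defines the nonterminal symbols \texttt{expression} and \texttt{declaration}, one might write:
\begin{verbatim}
%on_error_reduce expression
%on_error_reduce declaration
\end{verbatim}
Then, when an invalid token is encountered, the automaton may perform extra reductions of productions whose left-hand side is \texttt{expression} or \texttt{declaration} before the error is detected.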
% This may make the tables bigger (but I have no statistics). % This makes LRijkstra significantly slower. Like a \dtype declaration, an \donerrorreduce declaration may concern not only a nonterminal symbol, such as, say, \texttt{expression}, but also a fully applied parameterized nonterminal symbol, such as \texttt{list(expression)} or \texttt{separated\_list(COMMA, option(expression))}. The ``on-error-reduce-priority'' of a production is that of its left-hand symbol. The ``on-error-reduce-priority'' of a nonterminal symbol is determined implicitly by the order of \donerrorreduce declarations. In the declaration $\donerrorreduce\;\nt{lid}_1 \ldots \nt{lid}_n$, the symbols $\nt{lid}_1, \ldots, \nt{lid}_n$ have the same ``on-error-reduce-priority''. They have higher ``on-error-reduce-priority'' than the symbols listed in previous \donerrorreduce declarations, and lower ``on-error-reduce-priority'' than those listed in later \donerrorreduce declarations. \subsection{Rules---old syntax} \label{sec:old:rules} In its simplest form, a rule begins with the nonterminal symbol \nt{lid}, followed by a colon character (\deuxpoints), and continues with a sequence of production groups (\sref{sec:productiongroups}). Each production group is preceded with a vertical bar character (\barre); the very first bar is optional. The meaning of the bar is choice: the nonterminal symbol \nt{id} develops to either of the production groups. We defer explanations of the keyword \dpublic (\sref{sec:split}), of the keyword \dinline (\sref{sec:inline}), and of the optional formal parameters $\tuple{\nt{id}}$ (\sref{sec:templates}). \subsubsection{Production groups} \label{sec:productiongroups} In its simplest form, a production group consists of a single production (\sref{sec:productions}), followed by an \ocaml semantic action (\sref{sec:actions}) and an optional \dprec annotation (\sref{sec:prec}). A production specifies a sequence of terminal and nonterminal symbols that should be recognized, and optionally binds identifiers to their semantic values. \paragraph{Semantic actions} \label{sec:actions} A semantic action is a piece of \ocaml code that is executed in order to assign a semantic value to the nonterminal symbol with which this production group is associated. A semantic action can refer to the (already computed) semantic values of the terminal or nonterminal symbols that appear in the production via the semantic value identifiers bound by the production. For compatibility with \ocamlyacc, semantic actions can also refer to unnamed semantic values via positional keywords of the form \kw{\$1}, \kw{\$2}, etc.\ This style is discouraged. (It is in fact forbidden if \onodollars is turned on.) Furthermore, as a positional keyword of the form \kw{\$i} is internally rewritten as \nt{\_i}, the user should not use identifiers of the form \nt{\_i}. \paragraph{\dprec annotations} \label{sec:prec} An annotation of the form \dprec \nt{id} indicates that the precedence level of the production group is the level assigned to the symbol \nt{id} via a previous \dnonassoc, \dleft, or \dright declaration (\sref{sec:assoc}). In the absence of a \dprec annotation, the precedence level assigned to each production is the level assigned to the rightmost terminal symbol that appears in it. It is undefined if the rightmost terminal symbol has an undefined precedence level or if the production mentions no terminal symbols at all. 
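The typical use of \dprec is to override this default choice of level. Here is a classic sketch (not one of \menhir's demos), the treatment of unary minus, where \basic{UMINUS} is a dummy symbol whose sole purpose is to carry a priority level:
\begin{verbatim}
%token <int> INT
%token PLUS MINUS TIMES
%left PLUS MINUS
%left TIMES
%nonassoc UMINUS   /* a dummy symbol; used in %prec annotations only */
%start <int> expression

%%

expression:
| i = INT                                 { i }
| e = expression; PLUS; f = expression    { e + f }
| e = expression; MINUS; f = expression   { e - f }
| e = expression; TIMES; f = expression   { e * f }
| MINUS; e = expression %prec UMINUS      { - e }
\end{verbatim}
Without the \dprec annotation, the last production would receive the level of its rightmost terminal symbol, \basic{MINUS}, which is too low: \texttt{- 1 * 2} would be read as \texttt{- (1 * 2)}. With the annotation, it receives the level of \basic{UMINUS}, the highest level, so \texttt{- 1 * 2} is read as \texttt{(- 1) * 2}.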
The precedence level assigned to a production is used when resolving shift/reduce conflicts (\sref{sec:conflicts}). \paragraph{Multiple productions in a group} If multiple productions are present in a single group, then the semantic action and precedence annotation are shared between them. This short-hand effectively allows several productions to share a semantic action and precedence annotation without requiring textual duplication. It is legal only when every production binds exactly the same set of semantic value identifiers and when no positional semantic value keywords (\kw{\$1}, etc.) are used. \subsubsection{Productions} \label{sec:productions} A production is a sequence of producers (\sref{sec:producers}), optionally followed by a \dprec annotation (\sref{sec:prec}). If a precedence annotation is present, it applies to this production alone, not to other productions in the production group. It is illegal for a production and its production group to both carry \dprec annotations. \subsubsection{Producers} \label{sec:producers} A producer is an actual (\sref{sec:actual}), optionally preceded with a binding of a semantic value identifier, of the form \nt{lid} \dequal. The actual specifies which construction should be recognized and how a semantic value should be computed for that construction. The identifier \nt{lid}, if present, becomes bound to that semantic value in the semantic action that follows. Otherwise, the semantic value can be referred to via a positional keyword (\kw{\$1}, etc.). \subsubsection{Actuals} \label{sec:actual} In its simplest form, an actual is just a terminal or nonterminal symbol $\nt{id}$. If it is a parameterized non-terminal symbol (see \sref{sec:templates}), then it should be applied: $\nt{id}\tuple{\nt{actual}}$. An actual may be followed with a modifier (\dquestion, \dplus, or \dstar). This is explained further on (see \sref{sec:templates} and \fref{fig:sugar}). An actual may also be an ``anonymous rule''. In that case, one writes just the rule's right-hand side, which takes the form $\seplist{\ \barre\ }{\nt{group}}$. (This form is allowed only as an argument in an application.) This form is expanded on the fly to a definition of a fresh non-terminal symbol, which is declared \dinline. For instance, providing an anonymous rule as an argument to \nt{list}: \begin{quote} \begin{tabular}{l} \nt{list} \dlpar \basic{e} = \nt{expression}; \basic{SEMICOLON} \dpaction{\basic{e}} \drpar \end{tabular} \end{quote} is equivalent to writing this: \begin{quote} \begin{tabular}{l} \nt{list} \dlpar \nt{expression\_SEMICOLON} \drpar \end{tabular} \end{quote} where the non-terminal symbol \nt{expression\_SEMICOLON} is chosen fresh and is defined as follows: \begin{quote} \begin{tabular}{l} \dinline \nt{expression\_SEMICOLON}: \newprod \basic{e} = \nt{expression}; \basic{SEMICOLON} \dpaction{\basic{e}} \end{tabular} \end{quote} \subsection{Rules---new syntax} \label{sec:new:rules} Please be warned that \textbf{the new syntax is considered experimental} and is subject to change in the future. % TEMPORARY à supprimer un jour... In its simplest form, a rule takes the form \dlet \nt{lid} \dcolonequal \expression. % Its left-hand side \nt{lid} is a nonterminal symbol; its right-hand side is an expression. % Such a rule defines an ordinary nonterminal symbol, while the alternate form \dlet \nt{lid} \dequalequal \expression defines an \dinline nonterminal symbol (\sref{sec:inline}), that is, a macro. 
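For instance, assuming the tokens \basic{INT}, \basic{PLUS}, and \basic{TIMES} have been declared, here is a sketch that uses both forms; the same grammar is written in the old syntax in \sref{sec:inline}:
\begin{verbatim}
let expression :=
| i = INT;                                 { i }
| e = expression; o = op; f = expression;  { o e f }

let op ==              /* == makes op an %inline symbol */
| PLUS;  { ( + ) }
| TIMES; { ( * ) }
\end{verbatim}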
% A rule can be preceded with the keyword \dpublic (\sref{sec:split}) and can be parameterized with a tuple of formal parameters $\tuple{\nt{id}}$ (\sref{sec:templates}). % The various forms of expressions, listed in \fref{fig:syntax}, are: % \begin{itemize} \item A \textbf{choice} between several expressions, % \precseplist\barre\expression. \optional{\barre} \expressionsub{1} \barre ${}\ldots{}$ \barre\expressionsub{n}. The leading bar is optional. \item A \textbf{sequence} of two expressions, \pattern \dequal \expressionsub{1} \dsemi \expressionsub{2}. The semantic value produced by \expressionsub{1} is decomposed according to the pattern \pattern. The \ocaml variables introduced by \pattern may appear in a semantic action that ends the sequence \expressionsub{2}. \item A sequence \dtilde \dequal \nt{id}${}_1$ \dsemi \expressionsub{2}, which is sugar for \nt{id}${}_1$ \dequal \nt{id}${}_1$ \dsemi \expressionsub{2}. This is a \textbf{pun}. % This is a special case of the previous form, % yet it receives special treatment; this is the % only case where ~ represents a deterministically-chosen name. \item A sequence \expressionsub{1} \dsemi \expressionsub{2}, which is sugar for \dunderscore \dequal \expressionsub{1} \dsemi \expressionsub{2}. \item A \textbf{symbol} \nt{id}, possibly applied to a tuple of expressions \dlpar \expressionsub{1},\ ${}\ldots{}$,\ \expressionsub{n} \drpar. It is worth noting that such an expression \emph{can} form the end of a sequence: \nt{id} at the end of a sequence stands for \nt{x} \dequal \nt{id} \dsemi \dpaction{\nt{x}} for some fresh variable \nt{x}. Thus, a sequence need not end with a semantic action. \item An expression followed with \dquestion, \dplus, or \dstar. This is sugar for the previous form: see \sref{sec:templates} and \fref{fig:sugar}. \item A \textbf{semantic action} \daction, possibly followed with a precedence annotation \dprec \nt{id}. This \ocaml code can refer to the variables that have been bound earlier in the sequence that this semantic action ends. These include all variables named by the user as well as all variables introduced by a $\dtilde$ pattern as part of a pun. % (but not variables introduced by deep ~ patterns) The notation $\kw{\$}i$, where $i$ is an integer, is forbidden. \item A \textbf{point-free semantic action} \dpfaction{\nt{\ocaml id}}, possibly followed with a precedence annotation \dprec~\nt{id}. The \ocaml identifier \nt{id} must denote a function or a data constructor. It is applied to a tuple of the variables that have been bound earlier in the sequence that this semantic action ends. Thus, $\dpfaction{\,\nt{id}\,}$ is sugar for $\dpaction{\,\nt{id}\;\,(x_1, \ldots, x_n)\,}$, where $x_1, \ldots, x_n$ are the variables bound earlier. These include all variables named by the user as well as all variables introduced by a $\dtilde$ pattern. \item An identity semantic action \dpfidentityaction. This is sugar for \dpfaction{\nt{identity}}, where \nt{identity} is \ocaml's identity function. Therefore, it is sugar for $\dpaction{\,(x_1, \ldots, x_n)\,}$, where $x_1, \ldots, x_n$ are the variables bound earlier. \end{itemize} \begin{comment} % fp: not sure if this paragraph is helpful. To some degree, an expression is analogous to an \ocaml expression: it returns an \ocaml value, and as a side effect, recognizes and consumes a fragment of the input. In particular, a sequence % \pattern \dequal \expressionsub{1} \dsemi \expressionsub{2} % is roughly analogous to an \ocaml sequence \verb+let p = e1 in e2+. 
% terminal symbol = recognize and consume one input symbol % nonterminal symbol = procedure call or macro invocation (if %inline) % semantic action = OCaml code insertion \end{comment} The syntax of expressions, as presented in \fref{fig:syntax}, seems more permissive than it really is. In reality, a choice cannot be nested inside a sequence; % (either on the left or on the right) a sequence cannot be nested in the left-hand side of a sequence; a semantic action cannot appear in the left-hand side of a sequence. (Thus, there is a stratification in three levels: choice expressions, sequence expressions, and atomic expressions, which corresponds roughly to the stratification of rules, productions, and producers in the old syntax.) % Furthermore, an expression between parentheses $\dlpar \expression \drpar$ is \emph{not} a valid expression. To surround an expression with parentheses, one must write either $\nt{midrule}\, \dlpar \expression \drpar$ or $\nt{endrule}\, \dlpar \expression \drpar$; see \sref{sec:library} and \fref{fig:standard}. When a complex expression (e.g., a choice or a sequence) is placed in parentheses, as in \nt{id}\,\dlpar\expression\drpar, this is equivalent to using $\nt{id}\,\dlpar\nt{s}\drpar$, where the fresh symbol~\nt{s} is declared as a synonym for this expression, via the declaration \dlet \nt{s} \dequalequal \expression. This idiom is also known as an anonymous rule (\sref{sec:actual}). \paragraph{Examples} As an example of a rule in the new syntax, the parameterized nonterminal symbol \nt{option}, which is part of Menhir's standard library (\sref{sec:library}), can be defined as follows: % \begin{quote} \begin{tabular}{l} \dlet \nt{option}(\nt{x}) \dcolonequal \\ \quad \barre \phantom{\nt{x} \dequal \nt{x} \dsemi{}} \dpaction{\nt{None}} \\ \quad \barre \nt{x} \dequal \nt{x} \dsemi{} \dpaction{\nt{Some x}} \end{tabular} \end{quote} % Using a pun, it can also be written as follows: % \begin{quote} \begin{tabular}{l} \dlet \nt{option}(\nt{x}) \dcolonequal \\ \quad \barre \phantom{\dtilde \dequal \nt{x} \dsemi{}} \dpaction{\nt{None}} \\ \quad \barre \dtilde \dequal \nt{x} \dsemi{} \dpaction{\nt{Some x}} \end{tabular} \end{quote} % Using a pun and a point-free semantic action, it can also be expressed as follows: % \begin{quote} \begin{tabular}{l} \dlet \nt{option}(\nt{x}) \dcolonequal \\ \quad \barre \phantom{\dtilde \dequal \nt{x} \dsemi{}} \dpaction{\nt{None}} \\ \quad \barre \dtilde \dequal \nt{x} \dsemi{} \dpfaction{\nt{Some}} \end{tabular} \end{quote} % As another example, the parameterized symbol $\nt{delimited}$, also part of Menhir's standard library (\sref{sec:library}), can be defined in the new syntax as follows: % \begin{quote} \begin{tabular}{l} \dlet \nt{delimited}(\nt{opening}, \nt{x}, \nt{closing}) \dequalequal \\ \quad \nt{opening} \dsemi \dtilde \dequal \nt{x} \dsemi \nt{closing} \dsemi \dpfidentityaction \end{tabular} \end{quote} % The use of $\dequalequal$ indicates that this is a macro, i.e., an \dinline nonterminal symbol (see \sref{sec:inline}). The identity semantic action \dpfidentityaction is here synonymous with \dpaction{\nt{x}}. Other illustrations of the new syntax can be found in the directories \distrib{demos/calc-new-syntax} and \distrib{demos/calc-ast}. \section{Advanced features} \subsection{Splitting specifications over multiple files} \label{sec:split} \paragraph{Modules} Grammar specifications can be split over multiple files. 
When \menhir is invoked with multiple argument file names, it considers each of these files as a \emph{partial} grammar specification, and \emph{joins} these partial specifications in order to obtain a single, complete specification. This feature is intended to promote a form of modularity. It is hoped that, by splitting large grammar specifications into several ``modules'', they can be made more manageable. It is also hoped that this mechanism, in conjunction with parameterization (\sref{sec:templates}), will promote sharing and reuse. It should be noted, however, that this is only a weak form of modularity. Indeed, partial specifications cannot be independently processed (say, checked for conflicts). It is necessary to first join them, so as to form a complete grammar specification, before any kind of grammar analysis can be done. This mechanism is, in fact, how \menhir's standard library (\sref{sec:library}) is made available: even though its name does not appear on the command line, it is automatically joined with the user's explicitly-provided grammar specifications, making the standard library's definitions globally visible. A partial grammar specification, or module, contains declarations and rules, just like a complete one: there is no visible difference. Of course, it can consist of only declarations, or only rules, if the user so chooses. (Don't forget the mandatory \percentpercent keyword that separates declarations and rules. It must be present, even if one of the two sections is empty.)
\paragraph{Private and public nonterminal symbols}
It should be noted that joining is \emph{not} a purely textual process. If two modules happen to define a nonterminal symbol by the same name, then it is considered, by default, that this is an accidental name clash. In that case, each of the two nonterminal symbols is silently renamed so as to avoid the clash. In other words, by default, a nonterminal symbol defined in module $A$ is considered \emph{private}, and cannot be defined again, or referred to, in module $B$. Naturally, it is sometimes desirable to define a nonterminal symbol $N$ in module $A$ and to refer to it in module~$B$. This is permitted if $N$ is public, that is, if either its definition carries the keyword \dpublic or $N$ is declared to be a start symbol. A public nonterminal symbol is never renamed, so it can be referred to by modules other than its defining module. In fact, it is permitted to split the definition of a \emph{public} nonterminal symbol, over multiple modules and/or within a single module. That is, a public nonterminal symbol $N$ can have multiple definitions, within one module and/or in distinct modules. All of these definitions are joined using the choice (\barre) operator. For instance, in the grammar of a programming language, the definition of the nonterminal symbol \nt{expression} could be split into multiple modules, where one module groups the expression forms that have to do with arithmetic, one module groups those that concern function definitions and function calls, one module groups those that concern object definitions and method calls, and so on.
\paragraph{Tokens aside}
Another use of modularity consists in placing all \dtoken declarations in one module, and the actual grammar specification in another module. The module that contains the token definitions can then be shared, making it easier to define multiple parsers that accept the same type of tokens. (On this topic, see \distrib{demos/calc-two}.)
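As a sketch of this pattern (the file names are hypothetical), the tokens could live in a file \texttt{tokens.mly}:
\begin{verbatim}
/* tokens.mly: token declarations only */
%token <int> INT
%token PLUS TIMES EOF
%%   /* mandatory, even though no rules follow */
\end{verbatim}
while the grammar itself lives in a file \texttt{parser.mly}:
\begin{verbatim}
/* parser.mly: rules only; the tokens are declared in tokens.mly */
%start <int> main
%%
main:
| i = INT; EOF { i }
\end{verbatim}
Invoking \menhir with both file names on the command line causes the two partial specifications to be joined; the \texttt{--base} command line switch can then be used to choose the base name of the output files.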
\subsection{Parameterizing rules} \label{sec:templates} A rule (that is, the definition of a nonterminal symbol) can be parameterized over an arbitrary number of symbols, which are referred to as formal parameters. \paragraph{Example} For instance, here is the definition of the parameterized nonterminal symbol \nt{option}, taken from the standard library (\sref{sec:library}): % \begin{quote} \begin{tabular}{l} \dpublic \basic{option}(\basic{X}): \newprod \dpaction{\basic{None}} \newprod \basic{x} = \basic{X} \dpaction{\basic{Some} \basic{x}} \end{tabular} \end{quote} % This definition states that \nt{option}(\basic{X}) expands to either the empty string, producing the semantic value \basic{None}, or to the string \basic{X}, producing the semantic value {\basic{Some}~\basic{x}}, where \basic{x} is the semantic value of \basic{X}. In this definition, the symbol \basic{X} is abstract: it stands for an arbitrary terminal or nonterminal symbol. The definition is made public, so \nt{option} can be referred to within client modules. A client who wishes to use \nt{option} simply refers to it, together with an actual parameter -- a symbol that is intended to replace \basic{X}. For instance, here is how one might define a sequence of declarations, preceded with optional commas: % \begin{quote} \begin{tabular}{l} \nt{declarations}: \newprod \dpaction{[]} \newprod \basic{ds} = \nt{declarations}; \nt{option}(\basic{COMMA}); \basic{d} = \nt{declaration} \dpaction{ \basic{d} :: \basic{ds} } \end{tabular} \end{quote} % This definition states that \nt{declarations} expands either to the empty string or to \nt{declarations} followed by an optional comma followed by \nt{declaration}. (Here, \basic{COMMA} is presumably a terminal symbol.) When this rule is encountered, the definition of \nt{option} is instantiated: that is, a copy of the definition, where \basic{COMMA} replaces \basic{X}, is produced. Things behave exactly as if one had written: \begin{quote} \begin{tabular}{l} \basic{optional\_comma}: \newprod \dpaction{\basic{None}} \newprod \basic{x} = \basic{COMMA} \dpaction{\basic{Some} \basic{x}} \\ \nt{declarations}: \newprod \dpaction{[]} \newprod \basic{ds} = \nt{declarations}; \nt{optional\_comma}; \basic{d} = \nt{declaration} \dpaction{ \basic{d} :: \basic{ds} } \end{tabular} \end{quote} % Note that, even though \basic{COMMA} presumably has been declared as a token with no semantic value, writing \basic{x}~=~\basic{COMMA} is legal, and binds \basic{x} to the unit value. This design choice ensures that the definition of \nt{option} makes sense regardless of the nature of \basic{X}: that is, \basic{X} can be instantiated with a terminal symbol, with or without a semantic value, or with a nonterminal symbol. \paragraph{Parameterization in general} In general, the definition of a nonterminal symbol $N$ can be parameterized with an arbitrary number of formal parameters. When $N$ is referred to within a production, it must be applied to the same number of actuals. In general, an actual is: % \begin{itemize} \item either a single symbol, which can be a terminal symbol, a nonterminal symbol, or a formal parameter; \item or an application of such a symbol to a number of actuals. \end{itemize} For instance, here is a rule whose single production consists of a single producer, which contains several, nested actuals. (This example is discussed again in \sref{sec:library}.) 
%
\begin{quote} \begin{tabular}{l} \nt{plist}(\nt{X}): \newprod \basic{xs} = \nt{loption}(%
\nt{delimited}(%
\basic{LPAREN}, \nt{separated\_nonempty\_list}(\basic{COMMA}, \basic{X}), \basic{RPAREN}%
)%
) \dpaction{\basic{xs}} \end{tabular} \end{quote}
\begin{figure} \begin{center} \begin{tabular}{r@{\hspace{2mm}}c@{\hspace{2mm}}l} \nt{actual}\dquestion & is syntactic sugar for & \nt{option}(\nt{actual}) \\ \nt{actual}\dplus & is syntactic sugar for & \nt{nonempty\_list}(\nt{actual}) \\ \nt{actual}\dstar & is syntactic sugar for & \nt{list}(\nt{actual}) \end{tabular} \end{center} \caption{Syntactic sugar for simulating regular expressions, also known as EBNF} \label{fig:sugar} \end{figure}
%
Applications of the parameterized nonterminal symbols \nt{option}, \nt{nonempty\_list}, and \nt{list}, which are defined in the standard library (\sref{sec:library}), can be written using a familiar, regular-expression-like syntax (\fref{fig:sugar}).
\paragraph{Higher-order parameters}
A formal parameter can itself expect parameters. For instance, here is a rule that defines the syntax of procedures in an imaginary programming language:
%
\begin{quote} \begin{tabular}{l} \nt{procedure}(\nt{list}): \newprod \basic{PROCEDURE} \basic{ID} \nt{list}(\nt{formal}) \basic{SEMICOLON} \nt{block} \basic{SEMICOLON} \dpaction{$\ldots$} \end{tabular} \end{quote}
%
This rule states that the token \basic{ID}, which represents the name of the procedure, should be followed with a list of formal parameters. (The definitions of the nonterminal symbols \nt{formal} and \nt{block} are not shown.) However, because \nt{list} is a formal parameter, as opposed to a concrete nonterminal symbol defined elsewhere, this definition does not specify how the list is laid out: which token, if any, is used to separate, or terminate, list elements? is the list allowed to be empty? and so on. A more concrete notion of procedure is obtained by instantiating the formal parameter \nt{list}: for instance, \nt{procedure}(\nt{plist}), where \nt{plist} is the parameterized nonterminal symbol defined earlier, is a valid application.
\paragraph{Consistency}
Definitions and uses of parameterized nonterminal symbols are checked for consistency before they are expanded away. In short, it is checked that, wherever a nonterminal symbol is used, it is supplied with actual arguments in appropriate number and of appropriate nature. This guarantees that expansion of parameterized definitions terminates and produces a well-formed grammar as its outcome.
\subsection{Inlining} \label{sec:inline}
It is well-known that the following grammar of arithmetic expressions does not work as expected: that is, in spite of the priority declarations, it has shift/reduce conflicts.
%
\begin{quote} \begin{tabular}{l} \dtoken \kangle{\basic{int}} \basic{INT} \\ \dtoken \basic{PLUS} \basic{TIMES} \\ \dleft \basic{PLUS} \\ \dleft \basic{TIMES} \\ \\ \percentpercent \\ \\ \nt{expression}: \newprod \basic{i} = \basic{INT} \dpaction{\basic{i}} \newprod \basic{e} = \nt{expression}; \basic{o} = \nt{op}; \basic{f} = \nt{expression} \dpaction{\basic{o} \basic{e} \basic{f}} \\ \nt{op}: \newprod \basic{PLUS} \dpaction{( + )} \newprod \basic{TIMES} \dpaction{( * )} \end{tabular} \end{quote}
%
The trouble is, the precedence level of the production \nt{expression} $\rightarrow$ \nt{expression} \nt{op} \nt{expression} is undefined, and there is no sensible way of defining it via a \dprec declaration, since the desired level really depends upon the symbol that was recognized by \nt{op}: was it \basic{PLUS} or \basic{TIMES}? The standard workaround is to abandon the definition of \nt{op} as a separate nonterminal symbol, and to inline its definition into the definition of \nt{expression}, like this:
%
\begin{quote} \begin{tabular}{l} \nt{expression}: \newprod \basic{i} = \basic{INT} \dpaction{\basic{i}} \newprod \basic{e} = \nt{expression}; \basic{PLUS}; \basic{f} = \nt{expression} \dpaction{\basic{e} + \basic{f}} \newprod \basic{e} = \nt{expression}; \basic{TIMES}; \basic{f} = \nt{expression} \dpaction{\basic{e} * \basic{f}} \end{tabular} \end{quote}
%
This avoids the shift/reduce conflict, but gives up some of the original specification's structure, which, in realistic situations, can be harmful. Fortunately, \menhir offers a way of avoiding the conflict without manually transforming the grammar, by declaring that the nonterminal symbol \nt{op} should be inlined:
%
\begin{quote} \begin{tabular}{l} \nt{expression}: \newprod \basic{i} = \basic{INT} \dpaction{\basic{i}} \newprod \basic{e} = \nt{expression}; \basic{o} = \nt{op}; \basic{f} = \nt{expression} \dpaction{\basic{o} \basic{e} \basic{f}} \\ \dinline \nt{op}: \newprod \basic{PLUS} \dpaction{( + )} \newprod \basic{TIMES} \dpaction{( * )} \end{tabular} \end{quote}
%
The \dinline keyword causes all references to \nt{op} to be replaced with its definition. In this example, the definition of \nt{op} involves two productions, one that develops to \basic{PLUS} and one that develops to \basic{TIMES}, so every production that refers to \nt{op} is effectively turned into two productions, one that refers to \basic{PLUS} and one that refers to \basic{TIMES}. After inlining, \nt{op} disappears and \nt{expression} has three productions: that is, the result of inlining is exactly the manual workaround shown above. In some situations, inlining can also help recover a slight efficiency margin. For instance, the definition:
%
\begin{quote} \begin{tabular}{l} \dinline \nt{plist}(\nt{X}): \newprod \basic{xs} = \nt{loption}(%
\nt{delimited}(%
\basic{LPAREN}, \nt{separated\_nonempty\_list}(\basic{COMMA}, \basic{X}), \basic{RPAREN}%
)%
) \dpaction{\basic{xs}} \end{tabular} \end{quote}
%
effectively makes \nt{plist}(\nt{X}) an alias for the right-hand side \nt{loption}($\ldots$). Without the \dinline keyword, the language recognized by the grammar would be the same, but the LR automaton would probably have one more state and would perform one more reduction at run time. The \dinline keyword does not affect the computation of positions (\sref{sec:positions}). The same positions are computed, regardless of where \dinline keywords are placed.
If the semantic actions have side effects, the \dinline keyword \emph{can} affect the order in which these side effects take place. In the example of \nt{op} and \nt{expression} above, if for some reason the semantic action associated with \nt{op} has a side effect (such as updating a global variable, or printing a message), then, by inlining \nt{op}, we delay this side effect, which takes place \emph{after} the second operand has been recognized, whereas in the absence of inlining it takes place as soon as the operator has been recognized. % Du coup, ça change l'ordre des effets, dans cet exemple, de infixe % à postfixe. \subsection{The standard library} \label{sec:library} \begin{figure} \begin{center} \begin{tabular}{lp{51mm}l@{}l} Name & Recognizes & Produces & Comment \\ \hline\\ % \nt{epsilon} & $\epsilon$ & \basic{unit} & (inlined) \\ % \\ \nt{endrule}(\nt{X}) & \nt{X} & $\alpha$, if \nt{X} : $\alpha$ & (inlined) \\ \nt{midrule}(\nt{X}) & \nt{X} & $\alpha$, if \nt{X} : $\alpha$ \\ \\ \nt{option}(\nt{X}) & $\epsilon$ \barre \nt{X} & $\alpha$ \basic{option}, if \nt{X} : $\alpha$ & (also \nt{X}\dquestion) \\ \nt{ioption}(\nt{X}) & $\epsilon$ \barre \nt{X} & $\alpha$ \basic{option}, if \nt{X} : $\alpha$ & (inlined) \\ \nt{boption}(\nt{X}) & $\epsilon$ \barre \nt{X} & \basic{bool} \\ \nt{loption}(\nt{X}) & $\epsilon$ \barre \nt{X} & $\alpha$ \basic{list}, if \nt{X} : $\alpha$ \nt{list} \\ \\ \nt{pair}(\nt{X}, \nt{Y}) & \nt{X} \nt{Y} & $\alpha\times\beta$, if \nt{X} : $\alpha$ and \nt{Y} : $\beta$ \\ \nt{separated\_pair}(\nt{X}, \nt{sep}, \nt{Y}) & \nt{X} \nt{sep} \nt{Y} & $\alpha\times\beta$, if \nt{X} : $\alpha$ and \nt{Y} : $\beta$ \\ \nt{preceded}(\nt{opening}, \nt{X}) & \nt{opening} \nt{X} & $\alpha$, if \nt{X} : $\alpha$ \\ \nt{terminated}(\nt{X}, \nt{closing}) & \nt{X} \nt{closing} & $\alpha$, if \nt{X} : $\alpha$ \\ \nt{delimited}(\nt{opening}, \nt{X}, \nt{closing}) & \nt{opening} \nt{X} \nt{closing} & $\alpha$, if \nt{X} : $\alpha$ \\ \\ \nt{list}(\nt{X}) & a possibly empty sequence of \nt{X}'s & $\alpha$ \basic{list}, if \nt{X} : $\alpha$ & (also \nt{X}\dstar) \\ \nt{nonempty\_list}(\nt{X}) & a nonempty sequence of \nt{X}'s & $\alpha$ \basic{list}, if \nt{X} : $\alpha$ & (also \nt{X}\dplus) \\ \nt{separated\_list}(\nt{sep}, \nt{X}) & a possibly empty sequence of \nt{X}'s separated with \nt{sep}'s & $\alpha$ \basic{list}, if \nt{X} : $\alpha$ \\ \nt{separated\_nonempty\_list}(\nt{sep}, \nt{X}) & a nonempty sequence of \nt{X}'s \hspace{2mm} separated with \nt{sep}'s & $\alpha$ \basic{list}, if \nt{X} : $\alpha$ \\ \\ \nt{rev}(\nt{X}) & \nt{X} & $\alpha$ \basic{list}, if \nt{X} : $\alpha$ \basic{list} & (inlined) \\ \nt{flatten}(\nt{X}) & \nt{X} & $\alpha$ \basic{list}, if \nt{X} : $\alpha$ \basic{list} \basic{list} & (inlined) \\ \nt{append}(\nt{X}, \nt{Y}) & \nt{X} \nt{Y} & $\alpha$ \basic{list}, if \nt{X}, \nt{Y} : $\alpha$ \basic{list} & (inlined) \\ \end{tabular} \end{center} \caption{Summary of the standard library; see \standardmly for details} \label{fig:standard} \end{figure} Once equipped with a rudimentary module system (\sref{sec:split}), parameterization (\sref{sec:templates}), and inlining (\sref{sec:inline}), it is straightforward to propose a collection of commonly used definitions, such as options, sequences, lists, and so on. This \emph{standard library} is joined, by default, with every grammar specification. A summary of the nonterminal symbols offered by the standard library appears in \fref{fig:standard}. 
See also the short-hands documented in \fref{fig:sugar}. By relying on the standard library, a client module can concisely define more elaborate notions. For instance, the following rule: % \begin{quote} \begin{tabular}{l} \dinline \nt{plist}(\nt{X}): \newprod \basic{xs} = \nt{loption}(% \nt{delimited}(% \basic{LPAREN}, \nt{separated\_nonempty\_list}(\basic{COMMA}, \basic{X}), \basic{RPAREN}% )% ) \dpaction{\basic{xs}} \end{tabular} \end{quote} % causes \nt{plist}(\nt{X}) to recognize a list of \nt{X}'s, where the empty list is represented by the empty string, and a non-empty list is delimited with parentheses and comma-separated. The standard library is stored in a file named \standardmly, which is embedded inside \menhir when it is built. % The command line switch \onostdlib instructs \menhir to \emph{not} load the standard library. The meaning of the symbols defined in the standard library (\fref{fig:standard}) should be clear in most cases. Yet, the symbols \nt{endrule}(\nt{X}) and \nt{midrule}(\nt{X}) deserve an explanation. Both take an argument \nt{X}, which typically will be instantiated with an anonymous rule (\sref{sec:actual}). Both are defined as a synonym for \nt{X}. In both cases, this allows placing an anonymous subrule in the middle of a rule. \newcommand{\AAA}{\nt{cat}} \newcommand{\BBB}{\nt{dog}} \newcommand{\CCC}{\nt{cow}} \newcommand{\XXX}{\nt{xxx}} For instance, the following is a well-formed production: % \[\begin{array}{l} \AAA \quad \nt{endrule}(\BBB \quad \dpaction{\nt{\ocaml code$_1$}}) \quad \CCC \quad \dpaction{\nt{\ocaml code$_2$}} \end{array}\] % This production consists of three producers, namely \AAA{} and \nt{endrule}(\BBB$\;$\dpaction{\nt{\ocaml code$_1$}}) and \CCC, and a semantic action \dpaction{\nt{\ocaml code$_2$}}. % Because \nt{endrule}(\nt{X}) is declared as an \dinline synonym for \nt{X}, the expansion of anonymous rules (\sref{sec:actual}), followed with the expansion of \dinline symbols (\sref{sec:inline}), transforms the above production into the following: % \[\begin{array}{l} \AAA \quad \BBB \quad \CCC \quad \dpaction{\nt{\ocaml code$_1$; \ocaml code$_2$}} \end{array}\] % Note that \nt{\ocaml code$_1$} moves to the end of the rule, which means that this code is executed only after \AAA, \BBB{} and \CCC{} have been recognized. In this example, the use of \nt{endrule} is rather pointless, as the expanded code is more concise and clearer than the original code. Still, \nt{endrule} can be useful when its actual argument is an anonymous rule with multiple branches. % Let me *not* show an example. See the comments in standard.mly. \nt{midrule} is used in exactly the same way as \nt{endrule}, but its expansion is different. For instance, the following is a well-formed production: % \[\begin{array}{l} \AAA \quad \nt{midrule}(\dpaction{\nt{\ocaml code$_1$}}) \quad \CCC \quad \dpaction{\nt{\ocaml code$_2$}} \end{array}\] % (There is no \BBB{} in this example; this is intentional.) Because \nt{midrule}(\nt{X}) is a synonym for \nt{X}, but is not declared \dinline, the expansion of anonymous rules (\sref{sec:actual}), followed with the expansion of \dinline symbols (\sref{sec:inline}), transforms the above production into the following: % \[\begin{array}{l} \AAA \quad \XXX \quad \CCC \quad \dpaction{\nt{\ocaml code$_2$}} \end{array}\] % where the fresh nonterminal symbol $\XXX$ is separately defined by the rule $\XXX: \dpaction{\nt{\ocaml code$_1$}}$. 
Thus, $\XXX$ recognizes the empty string, and as soon as it is recognized, \nt{\ocaml code$_1$} is executed. This is known as a ``mid-rule action''.
% https://www.gnu.org/software/bison/manual/html_node/Mid_002dRule-Actions.html
% ------------------------------------------------------------------------------
\section{Conflicts} \label{sec:conflicts}
When a shift/reduce or reduce/reduce conflict is detected, it is classified as either benign, if it can be resolved by consulting user-supplied precedence declarations, or severe, if it cannot. Benign conflicts are not reported. Severe conflicts are reported and, if the \oexplain switch is on, explained.
\subsection{When is a conflict benign?} \label{sec:conflicts:benign}
A shift/reduce conflict involves a single token (the one that one might wish to shift) and one or more productions (those that one might wish to reduce). When such a conflict is detected, the precedence levels (\sref{sec:assoc}, \sref{sec:prec}) of these entities are looked up and compared as follows:
\begin{enumerate}
\item if only one production is involved, and if it has higher priority than the token, then the conflict is resolved in favor of reduction.
\item if only one production is involved, and if it has lower priority than the token, then the conflict is resolved in favor of shifting.
\item if only one production is involved, and if it has the same priority as the token, then the associativity status of the token is looked up:
\begin{enumerate}
\item if the token was declared nonassociative, then the conflict is resolved in favor of neither action, that is, a syntax error will be signaled if this token shows up when this production is about to be reduced;
\item if the token was declared left-associative, then the conflict is resolved in favor of reduction;
\item if the token was declared right-associative, then the conflict is resolved in favor of shifting.
\end{enumerate}
\item \label{multiway} if multiple productions are involved, and if, considered one by one, they all cause the conflict to be resolved in the same way (that is, either in favor of shifting, or in favor of neither), then the conflict is resolved in that way.
\end{enumerate}
In each of these cases, the conflict is considered benign. Otherwise, it is considered severe. Note that a reduce/reduce conflict is always considered severe, unless it happens to be subsumed by a benign multi-way shift/reduce conflict (item~\ref{multiway} above).
\subsection{How are severe conflicts explained?}
When the \odump switch is on, a description of the automaton is written to the \automaton file. Severe conflicts are shown as part of this description. Fortunately, there is also a way of understanding conflicts in terms of the grammar, rather than in terms of the automaton. When the \oexplain switch is on, a textual explanation is written to the \conflicts file. \emph{Not all conflicts are explained} in this file: instead, \emph{only one conflict per automaton state is explained}. This is done partly in the interest of brevity, but also because Pager's algorithm can create artificial conflicts in a state that already contains a true LR(1) conflict; thus, one cannot hope in general to explain all of the conflicts that appear in the automaton. As a result of this policy, once all conflicts explained in the \conflicts file have been fixed, one might need to run \menhir again to produce yet more conflict explanations.
\begin{figure} \begin{quote} \begin{tabular}{l} \dtoken \basic{IF THEN ELSE} \\ \dstart \kangle{\basic{expression}} \nt{expression} \\ \\ \percentpercent \\ \\ \nt{expression}: \newprod $\ldots$ \newprod \basic{IF b} = \nt{expression} \basic{THEN e} = \nt{expression} \dpaction{$\ldots$} \newprod \basic{IF b} = \nt{expression} \basic{THEN e} = \nt{expression} \basic{ELSE f} = \nt{expression} \dpaction{$\ldots$} \newprod $\ldots$ \end{tabular} \end{quote} \caption{Basic example of a shift/reduce conflict} \label{fig:basicshiftreduce} \end{figure}
\paragraph{How the conflict state is reached}
\fref{fig:basicshiftreduce} shows a grammar specification with a typical shift/reduce conflict.
%
When this specification is analyzed, the conflict is detected, and an explanation is written to the \conflicts file. The explanation first indicates in which state the conflict lies by showing how that state is reached. Here, it is reached after recognizing the following string of terminal and nonterminal symbols---the \emph{conflict string}:
%
\begin{quote} \basic{IF expression THEN IF expression THEN expression} \end{quote}
Allowing the conflict string to contain both nonterminal and terminal symbols usually makes it shorter and more readable. If desired, a conflict string composed purely of terminal symbols could be obtained by replacing each occurrence of a nonterminal symbol $N$ with an arbitrary $N$-sentence. The conflict string can be thought of as a path that leads from one of the automaton's start states to the conflict state. When multiple such paths exist, the shortest one is displayed. Nevertheless, it may sometimes be quite long. In that case, artificially (and temporarily) declaring some existing nonterminal symbols to be start symbols has the effect of adding new start states to the automaton and can help produce shorter conflict strings. Here, \nt{expression} was declared to be a start symbol, which is why the conflict string is quite short. In addition to the conflict string, the \conflicts file also states that the \emph{conflict token} is \basic{ELSE}. That is, when the automaton has recognized the conflict string and when the lookahead token (the next token on the input stream) is \basic{ELSE}, a conflict arises. A conflict corresponds to a choice: the automaton is faced with several possible actions, and does not know which one should be taken. This indicates that the grammar is not LR(1). The grammar may or may not be inherently ambiguous. In our example, the conflict string and the conflict token are enough to understand why there is a conflict: when two \basic{IF} constructs are nested, it is ambiguous which of the two constructs the \basic{ELSE} branch should be associated with. Nevertheless, the \conflicts file provides further information: it explicitly shows that there exists a conflict, by proving that two distinct actions are possible. Here, one of these actions consists in \emph{shifting}, while the other consists in \emph{reducing}: this is a \emph{shift/reduce} conflict. A \emph{proof} takes the form of a \emph{partial derivation tree} whose \emph{fringe} begins with the conflict string, followed by the conflict token. A derivation tree is a tree whose nodes are labeled with symbols. The root node carries a start symbol. A node that carries a terminal symbol is considered a leaf, and has no children.
A node that carries a nonterminal symbol $N$ either is considered a leaf, and has no children; or is not considered a leaf, and has $n$ children, where $n\geq 0$, labeled $\nt{x}_1,\ldots,\nt{x}_n$, where $N \rightarrow \nt{x}_1 \ldots \nt{x}_n$ is a production. The fringe of a partial derivation tree is the string of terminal and nonterminal symbols carried by the tree's leaves. A string of terminal and nonterminal symbols that is the fringe of some partial derivation tree is a \emph{sentential form}.
\paragraph{Why shifting is legal}
\begin{figure} \mycommonbaseline \begin{center} \begin{heveapicture} \begin{tikzpicture}[level distance=12mm] \node { \nt{expression} } child { node {\basic{IF}} } child { node {\nt{expression}} } child { node {\basic{THEN}} } child { node {\nt{expression}} child { node {\basic{IF}} } child { node {\nt{expression}} } child { node {\basic{THEN}} } child { node {\nt{expression}} } child { node {\basic{ELSE}} } child { node {\nt{expression}} } } ; \end{tikzpicture} \end{heveapicture} \end{center} \caption{A partial derivation tree that justifies shifting} \label{fig:shifting:tree} \end{figure}
\begin{figure} \begin{center} \begin{tabbing} \= \nt{expression} \\ \> \basic{IF} \nt{expression} \basic{THEN} \= \nt{expression} \\ \> \> \basic{IF} \nt{expression} \basic{THEN} \nt{expression} . \basic{ELSE} \nt{expression} \end{tabbing} \end{center} \caption{A textual version of the tree in \fref{fig:shifting:tree}} \label{fig:shifting:text} \end{figure}
In our example, the proof that shifting is possible is the derivation tree shown in Figures~\ref{fig:shifting:tree} and~\ref{fig:shifting:text}. At the root of the tree is the grammar's start symbol, \nt{expression}. This symbol develops into the string \nt{IF expression THEN expression}, which forms the tree's second level. The second occurrence of \nt{expression} in that string develops into \nt{IF expression THEN expression ELSE expression}, which forms the tree's last level. The tree's fringe, a sentential form, is the string \nt{IF expression THEN IF expression THEN expression ELSE expression}. As announced earlier, it begins with the conflict string \nt{IF expression THEN IF expression THEN expression}, followed with the conflict token \nt{ELSE}. In \fref{fig:shifting:text}, the end of the conflict string is materialized with a dot. Note that this dot does not occupy the rightmost position in the tree's last level. In other words, the conflict token (\basic{ELSE}) itself occurs on the tree's last level. In practical terms, this means that, after the automaton has recognized the conflict string and peeked at the conflict token, it makes sense for it to \emph{shift} that token.
\paragraph{Why reducing is legal}
\begin{figure} \mycommonbaseline \begin{center} \begin{heveapicture} \begin{tikzpicture}[level distance=12mm] \node { \nt{expression} } child { node {\basic{IF}} } child { node {\nt{expression}} } child { node {\basic{THEN}} } child { node {\nt{expression}} child { node {\basic{IF}} } child { node {\nt{expression}} } child { node {\basic{THEN}} } child { node {\nt{expression}} } } child { node {\basic{ELSE}} } child { node {\nt{expression}} } ; \end{tikzpicture} \end{heveapicture} \end{center} \caption{A partial derivation tree that justifies reducing} \label{fig:reducing:tree} \end{figure}
\begin{figure} \begin{center} \begin{tabbing} \= \nt{expression} \\ \> \basic{IF} \nt{expression} \basic{THEN} \= \nt{expression} \basic{ELSE} \nt{expression} \sidecomment{lookahead token appears} \\ \> \> \basic{IF} \nt{expression} \basic{THEN} \nt{expression} . \end{tabbing} \end{center} \caption{A textual version of the tree in \fref{fig:reducing:tree}} \label{fig:reducing:text} \end{figure}
In our example, the proof that reducing is possible is the derivation tree shown in Figures~\ref{fig:reducing:tree} and~\ref{fig:reducing:text}. Again, the sentential form found at the fringe of the tree begins with the conflict string, followed with the conflict token. Again, in \fref{fig:reducing:text}, the end of the conflict string is materialized with a dot. Note that, this time, the dot occupies the rightmost position in the tree's last level. In other words, the conflict token (\basic{ELSE}) appeared on an earlier level (here, on the second level). This fact is emphasized by the comment \inlinesidecomment{lookahead token appears} found at the second level. In practical terms, this means that, after the automaton has recognized the conflict string and peeked at the conflict token, it makes sense for it to \emph{reduce} the production that corresponds to the tree's last level---here, the production is \nt{expression} $\rightarrow$ \basic{IF} \nt{expression} \basic{THEN} \nt{expression}.
\paragraph{An example of a more complex derivation tree}
Figures~\ref{fig:xreducing:tree} and~\ref{fig:xreducing:text} show a partial derivation tree that justifies reduction in a more complex situation. (This derivation tree is relative to a grammar that is not shown.) Here, the conflict string is \basic{DATA UIDENT EQUALS UIDENT}; the conflict token is \basic{LIDENT}. It is quite clear that the fringe of the tree begins with the conflict string. However, in this case, the fringe does not explicitly exhibit the conflict token. Let us examine the tree more closely and answer the question: following \basic{UIDENT}, what's the next terminal symbol on the fringe?
\begin{figure} \mycommonbaseline \begin{center} \begin{heveapicture} \begin{tikzpicture}[level distance=12mm,level 1/.style={sibling distance=18mm}, level 2/.style={sibling distance=18mm}, level 4/.style={sibling distance=24mm}] \node { \nt{decls} } child { node {\nt{decl}} child { node {\basic{DATA}} } child { node {\basic{UIDENT}} } child { node {\basic{EQUALS}} } child { node {\nt{tycon\_expr}} child { node {\nt{tycon\_item}} child { node {\basic{UIDENT}} } child { node {\nt{opt\_type\_exprs}} child { node {} edge from parent [dashed] } } } } } child { node {\nt{opt\_semi}} } child { node {\nt{decls}} } ; \end{tikzpicture} \end{heveapicture} \end{center} \caption{A partial derivation tree that justifies reducing} \label{fig:xreducing:tree} \end{figure}
\begin{figure} \begin{center} \begin{tabbing} \= \nt{decls} \\ \> \nt{decl} \nt{opt\_semi} \nt{decls} \sidecomment{lookahead token appears because \nt{opt\_semi} can vanish and \nt{decls} can begin with \basic{LIDENT}} \\ \> \basic{DATA UIDENT} \basic{EQUALS} \= \nt{tycon\_expr} \sidecomment{lookahead token is inherited} \\ \> \> \nt{tycon\_item} \sidecomment{lookahead token is inherited} \\ \> \> \basic{UIDENT} \= \nt{opt\_type\_exprs} \sidecomment{lookahead token is inherited} \\ \> \> \> . \end{tabbing} \end{center} \caption{A textual version of the tree in \fref{fig:xreducing:tree}} \label{fig:xreducing:text} \end{figure}
% TEMPORARY the HTML rendering of this figure isn't good
First, note that \nt{opt\_type\_exprs} is \emph{not} a leaf node, even though it has no children. The grammar contains the production $\nt{opt\_type\_exprs} \rightarrow \epsilon$: the nonterminal symbol \nt{opt\_type\_exprs} develops to the empty string. (This is made clear in \fref{fig:xreducing:text}, where a single dot appears immediately below \nt{opt\_type\_exprs}.) Thus, \nt{opt\_type\_exprs} is not part of the fringe. Next, note that \nt{opt\_type\_exprs} is the rightmost symbol within its level. Thus, in order to find the next symbol on the fringe, we have to look up one level. This is the meaning of the comment \inlinesidecomment{lookahead token is inherited}. Similarly, \nt{tycon\_item} and \nt{tycon\_expr} appear rightmost within their level, so we again have to look further up. This brings us back to the tree's second level. There, \nt{decl} is \emph{not} the rightmost symbol: next to it, we find \nt{opt\_semi} and \nt{decls}. Does this mean that \nt{opt\_semi} is the next symbol on the fringe? Yes and no. \nt{opt\_semi} is a \emph{nonterminal} symbol, but we are really interested in finding out what the next \emph{terminal} symbol on the fringe could be. The partial derivation tree shown in Figures~\ref{fig:xreducing:tree} and~\ref{fig:xreducing:text} does not explicitly answer this question. In order to answer it, we need to know more about \nt{opt\_semi} and \nt{decls}. Here, \nt{opt\_semi} stands (as one might have guessed) for an optional semicolon, so the grammar contains a production $\nt{opt\_semi} \rightarrow \epsilon$. This is indicated by the comment \inlinesidecomment{\nt{opt\_semi} can vanish}. (Nonterminal symbols that generate $\epsilon$ are also said to be \emph{nullable}.) Thus, one could choose to turn this partial derivation tree into a larger one by developing \nt{opt\_semi} into $\epsilon$, making it a non-leaf node. That would yield a new partial derivation tree where the next symbol on the fringe, following \basic{UIDENT}, is \nt{decls}. Now, what about \nt{decls}?
Again, it is a \emph{nonterminal} symbol, and we are really interested in finding out what the next \emph{terminal} symbol on the fringe could be. Again, we need to imagine how this partial derivation tree could be turned into a larger one by developing \nt{decls}. Here, the grammar happens to contain a production of the form $\nt{decls} \rightarrow \basic{LIDENT} \ldots$ This is indicated by the comment \inlinesidecomment{\nt{decls} can begin with \basic{LIDENT}}. Thus, by developing \nt{decls}, it is possible to construct a partial derivation tree where the next symbol on the fringe, following \basic{UIDENT}, is \basic{LIDENT}. This is precisely the conflict token. To sum up, there exists a partial derivation tree whose fringe begins with the conflict string, followed with the conflict token. Furthermore, in that derivation tree, the dot occupies the rightmost position in the last level. As in our previous example, this means that, after the automaton has recognized the conflict string and peeked at the conflict token, it makes sense for it to \emph{reduce} the production that corresponds to the tree's last level---here, the production is $\nt{opt\_type\_exprs} \rightarrow \epsilon$.
\paragraph{Greatest common factor among derivation trees}
Understanding conflicts requires comparing two (or more) derivation trees. It is frequent for these trees to exhibit a common factor, that is, to exhibit identical structure near the top of the tree, and to differ only below a specific node. Manual identification of that node can be tedious, so \menhir performs this work automatically. When explaining an $n$-way conflict, it first displays the greatest common factor of the $n$ derivation trees. A question mark symbol $\basic{(?)}$ is used to identify the node where the trees begin to differ. Then, \menhir displays each of the $n$ derivation trees, \emph{without their common factor} -- that is, it displays $n$ sub-trees that actually begin to differ at the root. This should make visual comparisons significantly easier.
\subsection{How are severe conflicts resolved in the end?}
It is unspecified how severe conflicts are resolved. \menhir attempts to mimic \ocamlyacc's specification, that is, to resolve shift/reduce conflicts in favor of shifting, and to resolve reduce/reduce conflicts in favor of the production that textually appears earliest in the grammar specification. However, this specification is inconsistent in case of three-way conflicts, that is, conflicts that simultaneously involve a shift action and several reduction actions. Furthermore, textual precedence can be undefined when the grammar specification is split over multiple modules. In short, \menhir's philosophy is that \begin{center} severe conflicts should not be tolerated, \end{center} so you should not care how they are resolved.
% If a shift/reduce conflict is resolved in favor of reduction, then there can
% exist words of terminal symbols that are accepted by the canonical LR(1)
% automaton without traversing any conflict state and which are rejected by our
% automaton (constructed by Pager's method followed by conflict
% resolution). Same problem when a shift/reduce conflict is resolved in favor of
% neither action (via \dnonassoc) or when a reduce/reduce conflict is resolved
% arbitrarily.
\subsection{End-of-stream conflicts} \label{sec:eos}
\menhir's treatment of the end of the token stream is (believed to be) fully compatible with \ocamlyacc's.
Yet, \menhir attempts to be more user-friendly by warning about a class of so-called ``end-of-stream conflicts''.

% TEMPORARY it should be noted that \menhir does not conform to ocamlyacc in
% the presence of end-of-stream conflicts; apparently it runs into a wall,
% always demanding the next token, whereas ocamlyacc is able to stop (how?);
% cf. the problem reported by S. Hinderer (April 2015).

\paragraph{How the end of stream is handled}

In many textbooks on parsing, it is assumed that the lexical analyzer, which produces the token stream, produces a special token, written \eos, to signal that the end of the token stream has been reached. A parser generator can take advantage of this by transforming the grammar: for each start symbol $\nt{S}$ in the original grammar, a new start symbol $\nt{S'}$ is defined, together with the production $S'\rightarrow S\eos$. The symbol $S$ is no longer a start symbol in the new grammar. This means that the parser will accept a sentence derived from $S$ only if it is immediately followed by the end of the token stream.

This approach has the advantage of simplicity. However, \ocamlyacc and \menhir do not follow it, for several reasons. Perhaps the most convincing one is that it is not flexible enough: sometimes, it is desirable to recognize a sentence derived from $S$, \emph{without} requiring that it be followed by the end of the token stream: this is the case, for instance, when reading commands, one by one, on the standard input channel. In that case, there is no end of stream: the token stream is conceptually infinite. Furthermore, after a command has been recognized, we do \emph{not} wish to examine the next token, because doing so might cause the program to block, waiting for more input.

In short, \ocamlyacc and \menhir's approach is to recognize a sentence derived from $S$ and to \emph{not look}, if possible, at what follows. However, this is possible only if the definition of $S$ is such that the end of an $S$-sentence is identifiable without knowledge of the lookahead token. When the definition of $S$ does not satisfy this criterion, an \emph{end-of-stream conflict} arises: after a potential $S$-sentence has been read, there can be a tension between consulting the next token, in order to determine whether the sentence is continued, and \emph{not} consulting the next token, because the sentence might be over and whatever follows should not be read. \menhir warns about end-of-stream conflicts, whereas \ocamlyacc does not.

\paragraph{A definition of end-of-stream conflicts}

Technically, \menhir proceeds as follows. A \eos symbol is introduced. It is, however, only a \emph{pseudo-}token: it is never produced by the lexical analyzer. For each start symbol $\nt{S}$ in the original grammar, a new start symbol $\nt{S'}$ is defined, together with the production $S'\rightarrow S$. The corresponding start state of the LR(1) automaton is composed of the LR(1) item $S' \rightarrow . \;S\; [\eos]$. That is, the pseudo-token \eos initially appears in the lookahead set, indicating that we expect to be done after recognizing an $S$-sentence. During the construction of the LR(1) automaton, this lookahead set is inherited by other items, with the effect that, in the end, the automaton has:
\begin{itemize}
\item \emph{shift} actions only on physical tokens; and
\item \emph{reduce} actions either on physical tokens or on the pseudo-token \eos.
\end{itemize}
A state of the automaton has a reduce action on \eos if, in that state, an $S$-sentence has been read, so that the job is potentially finished. A state has a shift or reduce action on a physical token if, in that state, more tokens potentially need to be read before an $S$-sentence is recognized. If a state has a reduce action on \eos, then that action should be taken \emph{without} requesting the next token from the lexical analyzer. On the other hand, if a state has a shift or reduce action on a physical token, then the lookahead token \emph{must} be consulted in order to determine if that action should be taken.

\begin{figure}[p]
\begin{quote}
\begin{tabular}{l}
\dtoken \kangle{\basic{int}} \basic{INT} \\
\dtoken \basic{PLUS TIMES} \\
\dleft PLUS \\
\dleft TIMES \\
\dstart \kangle{\basic{int}} \nt{expr} \\
\percentpercent \\
\nt{expr}:
\newprod \basic{i} = \basic{INT} \dpaction{\basic{i}}
\newprod \basic{e1} = \nt{expr} \basic{PLUS} \basic{e2} = \nt{expr} \dpaction{\basic{e1 + e2}}
\newprod \basic{e1} = \nt{expr} \basic{TIMES} \basic{e2} = \nt{expr} \dpaction{\basic{e1 * e2}}
\end{tabular}
\end{quote}
\caption{Basic example of an end-of-stream conflict}
\label{fig:basiceos}
\end{figure}

\begin{figure}[p]
\begin{verbatim}
State 6:
expr -> expr . PLUS expr [ # TIMES PLUS ]
expr -> expr PLUS expr . [ # TIMES PLUS ]
expr -> expr . TIMES expr [ # TIMES PLUS ]
-- On TIMES shift to state 3
-- On # PLUS reduce production expr -> expr PLUS expr

State 4:
expr -> expr . PLUS expr [ # TIMES PLUS ]
expr -> expr . TIMES expr [ # TIMES PLUS ]
expr -> expr TIMES expr . [ # TIMES PLUS ]
-- On # TIMES PLUS reduce production expr -> expr TIMES expr

State 2:
expr' -> expr . [ # ]
expr -> expr . PLUS expr [ # TIMES PLUS ]
expr -> expr . TIMES expr [ # TIMES PLUS ]
-- On TIMES shift to state 3
-- On PLUS shift to state 5
-- On # accept expr
\end{verbatim}
\caption{Part of an LR automaton for the grammar in \fref{fig:basiceos}}
\label{fig:basiceosdump}
\end{figure}

\begin{figure}[p]
\begin{quote}
\begin{tabular}{l}
\ldots \\
\dtoken \basic{END} \\
\dstart \kangle{\basic{int}} \nt{main} \hspace{1cm} \textit{// instead of \nt{expr}} \\
\percentpercent \\
\nt{main}:
\newprod \basic{e} = \nt{expr} \basic{END} \dpaction{\basic{e}} \\
\nt{expr}:
\newprod \ldots
\end{tabular}
\end{quote}
\caption{Fixing the grammar specification in \fref{fig:basiceos}}
\label{fig:basiceos:sol}
\end{figure}

An end-of-stream conflict arises when a state has distinct actions on \eos and on at least one physical token. In short, this means that the end of an $S$-sentence cannot be unambiguously identified without examining one extra token. \menhir's default behavior, in that case, is to suppress the action on \eos, so that more input is \emph{always} requested.

\paragraph{Example}

\fref{fig:basiceos} shows a grammar that has end-of-stream conflicts. When this grammar is processed, \menhir warns about these conflicts, and further warns that \nt{expr} is never accepted. Let us explain.

Part of the corresponding automaton, as described in the \automaton file, is shown in \fref{fig:basiceosdump}. Explanations at the end of the \automaton file (not shown) point out that states 6 and 2 have an end-of-stream conflict. Indeed, both states have distinct actions on \eos and on the physical token \basic{TIMES}.
%
It is interesting to note that, even though state 4 has actions on \eos and on physical tokens, it does not have an end-of-stream conflict.
This is because the action taken in state 4 is always to reduce the production $\nt{expr} \rightarrow \nt{expr}$ \basic{TIMES} \nt{expr}, regardless of the lookahead token.

By default, \menhir produces a parser where end-of-stream conflicts are resolved in favor of looking ahead: that is, the problematic reduce actions on \eos are suppressed. This means, in particular, that the \emph{accept} action in state 2, which corresponds to reducing the production $\nt{expr'} \rightarrow \nt{expr}$, is suppressed. This explains why the symbol \nt{expr} is never accepted: because expressions do not have an unambiguous end marker, the parser will always request one more token and will never stop.

In order to avoid this end-of-stream conflict, the standard solution is to introduce a new token, say \basic{END}, and to use it as an end marker for expressions. The \basic{END} token could be generated by the lexical analyzer when it encounters the actual end of stream, or it could correspond to a piece of concrete syntax, say, a line feed character, a semicolon, or an \texttt{end} keyword. The solution is shown in \fref{fig:basiceos:sol}.

% ------------------------------------------------------------------------------

\section{Positions}
\label{sec:positions}

When an \ocamllex-generated lexical analyzer produces a token, it updates two fields, named \verb+lex_start_p+ and \verb+lex_curr_p+, in its environment record, whose type is \verb+Lexing.lexbuf+. Each of these fields holds a value of type \verb+Lexing.position+. Together, they represent the token's start and end positions within the text that is being scanned. These fields are read by \menhir after calling the lexical analyzer, so \textbf{it is the lexical analyzer's responsibility} to correctly set these fields.

A position consists mainly of an offset (the position's \verb+pos_cnum+ field), but also holds information about the current file name, the current line number, and the current offset within the current line. (Not all \ocamllex-generated analyzers keep this extra information up to date. This must be explicitly programmed by the author of the lexical analyzer.)
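
For instance, a lexer can keep the line-related information up to date by invoking the standard library function \verb+Lexing.new_line+ whenever it consumes a newline character. The following is a minimal sketch in \ocamllex syntax; the token names \texttt{INT} and \texttt{EOL} and the catch-all error case are assumptions made for this example, not taken from any particular demo:
\begin{verbatim}
rule token = parse
| [' ' '\t']
    { token lexbuf }   (* skip blanks *)
| '\n'
    { (* update pos_lnum and pos_bol in lexbuf.lex_curr_p *)
      Lexing.new_line lexbuf; EOL }
| ['0'-'9']+ as i
    { INT (int_of_string i) }
| _
    { failwith "unexpected character" }
\end{verbatim}
With this discipline, the positions stored in \verb+lex_start_p+ and \verb+lex_curr_p+ carry accurate line numbers, which the position keywords described below then propagate to nonterminal symbols.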
\begin{figure}
\begin{center}
\begin{tabular}{@{}l@{\hspace{7.0mm}}l@{}}
\verb+$startpos+ & start position of the first symbol in the production's right-hand side, if there is one; \\
& end position of the most recently parsed symbol, otherwise \\
\verb+$endpos+ & end position of the last symbol in the production's right-hand side, if there is one; \\
& end position of the most recently parsed symbol, otherwise \\
\verb+$startpos(+ \verb+$+\nt{i} \barre \nt{id} \verb+)+ & start position of the symbol named \verb+$+\nt{i} or \nt{id} \\
\verb+$endpos(+ \verb+$+\nt{i} \barre \nt{id} \verb+)+ & end position of the symbol named \verb+$+\nt{i} or \nt{id} \\
\ksymbolstartpos & start position of the leftmost symbol \nt{id} such that \verb+$startpos(+\nt{id}\verb+)+ \verb+!=+\, \verb+$endpos(+\nt{id}\verb+)+; \\
& if there is no such symbol, \verb+$endpos+ \\[2mm]
%
\verb+$startofs+ \\
\verb+$endofs+ \\
\verb+$startofs(+ \verb+$+\nt{i} \barre \nt{id} \verb+)+ & same as above, but produce an integer offset instead of a position \\
\verb+$endofs(+ \verb+$+\nt{i} \barre \nt{id} \verb+)+ \\
\verb+$symbolstartofs+ \\[2mm]
%
\verb+$loc+ & stands for the pair \verb+($startpos, $endpos)+ \\
\verb+$loc(+ \nt{id} \verb+)+ & stands for the pair \verb+($startpos(+ \nt{id} \verb+), $endpos(+ \nt{id} \verb+))+ \\
% $loc($i)$ works too,
% but is not documented,
% as that would be visually heavy
% and its use is not encouraged anyway.
\verb+$sloc+ & stands for the pair \verb+($symbolstartpos, $endpos)+ \\
\end{tabular}
\end{center}
\caption{Position-related keywords}
\label{fig:pos}
\end{figure}
% We could document $endpos($0). Not sure whether that would be a good thing.

\begin{figure}
\begin{tabular}{@{}ll@{\hspace{2cm}}l}
% Positions.
\verb+symbol_start_pos()+ & \ksymbolstartpos \\
\verb+symbol_end_pos()+ & \verb+$endpos+ \\
\verb+rhs_start_pos i+ & \verb+$startpos($i)+ & ($1 \leq i \leq n$) \\
\verb+rhs_end_pos i+ & \verb+$endpos($i)+ & ($1 \leq i \leq n$) \\
% i = 0 permitted, really
% Offsets.
\verb+symbol_start()+ & \verb+$symbolstartofs+ \\
\verb+symbol_end()+ & \verb+$endofs+ \\
\verb+rhs_start i+ & \verb+$startofs($i)+ & ($1 \leq i \leq n$) \\
\verb+rhs_end i+ & \verb+$endofs($i)+ & ($1 \leq i \leq n$) \\
% i = 0 permitted, really
\end{tabular}
\caption{Translating position-related incantations from \ocamlyacc to \menhir}
\label{fig:pos:mapping}
\end{figure}

This mechanism allows associating pairs of positions with terminal symbols. If desired, \menhir automatically extends it to nonterminal symbols as well. That is, it offers a mechanism for associating pairs of positions with terminal or nonterminal symbols. This is done by making a set of keywords available to semantic actions (\fref{fig:pos}). These keywords are \emph{not} available outside of a semantic action: in particular, they cannot be used within an \ocaml header. \ocaml's standard library module \texttt{Parsing} is deprecated. The functions that it offers \emph{can} be called, but will return dummy positions.

We remark that, if the current production has an empty right-hand side, then \verb+$startpos+ and \verb+$endpos+ are equal, and (by convention) are the end position of the most recently parsed symbol (that is, the symbol that happens to be on top of the automaton's stack when this production is reduced). If the current production has a nonempty right-hand side, then \verb+$startpos+ is the same as \verb+$startpos($1)+ and \verb+$endpos+ is the same as \verb+$endpos($+\nt{n}\verb+)+, where \nt{n} is the length of the right-hand side.
More generally, if the current production has matched a sentence of length zero, then \verb+$startpos+ and \verb+$endpos+ will be equal, and conversely.
% (provided the lexer is reasonable and never produces a token whose start and
% end positions are equal).

The position \verb+$startpos+ is sometimes ``further towards the left'' than one would like. For example, in the following production:
\begin{verbatim}
declaration: modifier? variable { $startpos }
\end{verbatim}
the keyword \verb+$startpos+ represents the start position of the optional modifier \verb+modifier?+. If this modifier turns out to be absent, then its start position is (by definition) the end position of the most recently parsed symbol. This may not be what is desired: perhaps the user would prefer in this case to use the start position of the symbol \verb+variable+. This is achieved by using \ksymbolstartpos instead of \verb+$startpos+. By definition, \ksymbolstartpos is the start position of the leftmost symbol whose start and end positions differ. In this example, the computation of \ksymbolstartpos skips the absent \verb+modifier+, whose start and end positions coincide, and returns the start position of the symbol \verb+variable+ (assuming this symbol has distinct start and end positions).

% One could point out that $symbolstartpos returns the $startpos of the first
% non-empty symbol, and not the $symbolstartpos of the first non-empty symbol.
% So this can remain a little counter-intuitive, and may not correspond exactly
% to what one expects. Moreover, the computation of $symbolstartpos is
% preserved by %inline (this is obtained very easily by eliminating
% $symbolstartpos before inlining) but does not correspond to what
% $symbolstartpos would give after manual inlining. Fundamentally, this notion
% of $symbolstartpos is not quite sound.

There is no keyword \verb+$symbolendpos+. Indeed, the problem with \verb+$startpos+ is due to the asymmetry in the definition of \verb+$startpos+ and \verb+$endpos+ in the case of an empty right-hand side, and does not affect \verb+$endpos+.

\newcommand{\fineprint}{\footnote{%
The computation of \ksymbolstartpos is optimized by \menhir under two assumptions about the lexer. First, \menhir assumes that the lexer never produces a token whose start and end positions are equal. Second, \menhir assumes that two positions produced by the lexer are equal if and only if they are physically equal. If the lexer violates either of these assumptions, the computation of \ksymbolstartpos could produce a result that differs from \texttt{Parsing.symbol\_start\_pos()}.
}}

The positions computed by \menhir are exactly the same as those computed by \verb+ocamlyacc+\fineprint. More precisely, \fref{fig:pos:mapping} sums up how to translate a call to the \texttt{Parsing} module, as used in an \ocamlyacc grammar, to a \menhir keyword.

We note that \menhir's \verb+$startpos+ does not appear in the right-hand column in \fref{fig:pos:mapping}. In other words, \menhir's \verb+$startpos+ does not correspond exactly to any of the \ocamlyacc function calls. An exact \ocamlyacc equivalent of \verb+$startpos+ is \verb+rhs_start_pos 1+ if the current production has a nonempty right-hand side and \verb+symbol_start_pos()+ if it has an empty right-hand side.

Finally, we remark that \menhir's \dinline keyword (\sref{sec:inline}) does not affect the computation of positions. The same positions are computed, regardless of where \dinline keywords are placed.
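
As an illustration, here is a sketch of the production above, rewritten so as to build a hypothetical AST node that records a location; the constructor \texttt{Decl} and the symbols \nt{modifier} and \nt{variable} are assumptions made for this example:
\begin{verbatim}
declaration:
  m = modifier? v = variable
    { (* When the optional modifier is absent, the location computed
         here starts at the variable, whereas a location based on the
         pair (startpos, endpos) would start at the end position of
         the most recently parsed symbol. *)
      Decl ($sloc, m, v) }
\end{verbatim}
Here, \verb+$sloc+ stands for the pair \verb+($symbolstartpos, $endpos)+ (\fref{fig:pos}).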
% ------------------------------------------------------------------------------

\section{Using \menhir as an interpreter}
\label{sec:interpret}

When \ointerpret is set, \menhir no longer behaves as a compiler. Instead, it acts as an interpreter. That is, it repeatedly:
\begin{itemize}
\item reads a sentence off the standard input channel;
\item parses this sentence, according to the grammar;
\item displays an outcome.
\end{itemize}
This process stops when the end of the input channel is reached.

\subsection{Sentences}
\label{sec:sentences}

The syntax of sentences is as follows:
\begin{center}
\begin{tabular}{r@{}c@{}l}
\nt{sentence} \is \optional{\nt{lid}\,\deuxpoints} \sepspacelist{\nt{uid}} \,\dnewline
\end{tabular}
\end{center}

Less formally, a sentence is a sequence of zero or more terminal symbols (\nt{uid}'s), separated with whitespace, terminated with a newline character, and optionally preceded with a non-terminal start symbol (\nt{lid}). This non-terminal symbol can be omitted if, and only if, the grammar only has one start symbol.

For instance, here are four valid sentences for the grammar of arithmetic expressions found in the directory \distrib{demos/calc}:
%
\begin{verbatim}
main: INT PLUS INT EOL
INT PLUS INT
INT PLUS PLUS INT EOL
INT PLUS PLUS
\end{verbatim}
%
In the first sentence, the start symbol \texttt{main} was explicitly specified. In the other sentences, it was omitted, which is permitted, because this grammar has no start symbol other than \texttt{main}. The first sentence is a stream of four terminal symbols, namely \texttt{INT}, \texttt{PLUS}, \texttt{INT}, and \texttt{EOL}. These terminal symbols must be provided under their symbolic names. Writing, say, ``\texttt{12+32\textbackslash n}'' instead of \texttt{INT PLUS INT EOL} is not permitted. \menhir would not be able to make sense of such a concrete notation, since it does not have a lexer for it.

% One could document the fact that a finite sentence is turned by \menhir
% into a potentially infinite stream of tokens, with an infinite suffix of
% EOF tokens... But this is a hack, which could change in the future.

\subsection{Outcomes}
\label{sec:outcomes}

As soon as \menhir is able to read a complete sentence off the standard input channel (that is, as soon as it finds the newline character that ends the sentence), it parses the sentence according to whichever grammar was specified on the command line, and displays an outcome.

An outcome is one of the following:
\begin{itemize}
\item \texttt{ACCEPT}: a prefix of the sentence was successfully parsed; a parser generated by \menhir would successfully stop and produce a semantic value;
\item \texttt{OVERSHOOT}: the end of the sentence was reached before it could be accepted; a parser generated by \menhir would request a non-existent ``next token'' from the lexer, causing it to fail or block;
\item \texttt{REJECT}: the sentence was not accepted; a parser generated by \menhir would raise the exception \texttt{Error}.
\end{itemize}

When \ointerpretshowcst is set, each \texttt{ACCEPT} outcome is followed with a concrete syntax tree. A concrete syntax tree is either a leaf or a node. A leaf is either a terminal symbol or \error. A node is annotated with a non-terminal symbol, and carries a sequence of immediate descendants that correspond to a valid expansion of this non-terminal symbol.
\menhir's notation for concrete syntax trees is as follows:
\begin{center}
\begin{tabular}{r@{}c@{}l}
\nt{cst} \is \nt{uid} \\
&& \error \\
&& \texttt{[} \nt{lid}\,\deuxpoints \sepspacelist{\nt{cst}} \texttt{]}
\end{tabular}
\end{center}
% This notation is not quite unambiguous (it is ambiguous if several
% productions are identical).

For instance, if one wished to parse the example sentences of \sref{sec:sentences} using the grammar of arithmetic expressions in \distrib{demos/calc}, one could invoke \menhir as follows:
\begin{verbatim}
$ menhir --interpret --interpret-show-cst demos/calc/parser.mly
main: INT PLUS INT EOL
ACCEPT
[main: [expr: [expr: INT] PLUS [expr: INT]] EOL]
INT PLUS INT
OVERSHOOT
INT PLUS PLUS INT EOL
REJECT
INT PLUS PLUS
REJECT
\end{verbatim}
(Here, \menhir's input---the sentences provided by the user on the standard input channel---is shown intermixed with \menhir's output---the outcomes printed by \menhir on the standard output channel.) The first sentence is valid, and accepted; a concrete syntax tree is displayed. The second sentence is incomplete, because the grammar specifies that a valid expansion of \texttt{main} ends with the terminal symbol \texttt{EOL}; hence, the outcome is \texttt{OVERSHOOT}. The third sentence is invalid, because of the repeated occurrence of the terminal symbol \texttt{PLUS}; the outcome is \texttt{REJECT}. The fourth sentence, a prefix of the third one, is rejected for the same reason.

\subsection{Remarks}

Using \menhir as an interpreter offers an easy way of debugging your grammar. For instance, if one wished to check that addition is considered left-associative, as requested by the \dleft directive found in the file \distrib{demos/calc/parser.mly}, one could submit the following sentence:
\begin{verbatim}
$ ./menhir --interpret --interpret-show-cst ../demos/calc/parser.mly
INT PLUS INT PLUS INT EOL
ACCEPT
[main: [expr: [expr: [expr: INT] PLUS [expr: INT]] PLUS [expr: INT]] EOL
]
\end{verbatim}
%$
The concrete syntax tree displayed by \menhir is skewed towards the left, as desired.

The switches \ointerpret and \otrace can be used in conjunction. When \otrace is set, the interpreter logs its actions to the standard error channel.

% ------------------------------------------------------------------------------

\section{Generated API}

When \menhir processes a grammar specification, say \texttt{parser.mly}, it produces one \ocaml module, \texttt{Parser}, whose code resides in the file \texttt{parser.ml} and whose signature resides in the file \texttt{parser.mli}. We now review this signature. For simplicity, we assume that the grammar specification has just one start symbol \verb+main+, whose \ocaml type is \verb+thing+.

% ------------------------------------------------------------------------------

\subsection{Monolithic API}
\label{sec:monolithic}

The monolithic API defines the type \verb+token+, the exception \verb+Error+, and the parsing function \verb+main+, named after the start symbol of the grammar.

%% type token

The type \verb+token+ is an algebraic data type. A value of type \verb+token+ represents a terminal symbol and its semantic value. For instance, if the grammar contains the declarations \verb+%token A+ and \verb+%token<int> B+, then the generated file \texttt{parser.mli} contains the following definition:
\begin{verbatim}
type token =
  | A
  | B of int
\end{verbatim}
%
If \oonlytokens is specified on the command line, the type \verb+token+ is generated, and the rest is omitted.
On the contrary, if \oexternaltokens is used, the type \verb+token+ is omitted, but the rest (described below) is generated.

%% exception Error

The exception \verb+Error+ carries no argument. It is raised by the parsing function \verb+main+ (described below) when a syntax error is detected.
%
\begin{verbatim}
exception Error
\end{verbatim}

%% val main

Next comes one parsing function for each start symbol of the grammar. Here, we have assumed that there is one start symbol, named \verb+main+, so the generated file \texttt{parser.mli} contains the following declaration:
\begin{verbatim}
val main: (Lexing.lexbuf -> token) -> Lexing.lexbuf -> thing
\end{verbatim}
% We do not show the definition of the exception Error.
This function expects two arguments, namely: a lexer, which typically is produced by \ocamllex and has type \verb+Lexing.lexbuf -> token+; and a lexing buffer, which has type \verb+Lexing.lexbuf+. This API is compatible with \ocamlyacc. (For information on using \menhir without \ocamllex, please consult \sref{sec:qa}.)
%
This API is ``monolithic'' in the sense that there is just one function, which does everything: it pulls tokens from the lexer, parses, and eventually returns a semantic value (or fails by throwing the exception \texttt{Error}).

% We may wish to note that the behavior of the function \verb+main+
% is influenced by the strategy that is chosen at compile time via
% \ostrategy.

% ------------------------------------------------------------------------------

\subsection{Incremental API}
\label{sec:incremental}

If \otable is set, \menhir offers an incremental API in addition to the monolithic API. In this API, control is inverted. The parser does not have access to the lexer. Instead, when the parser needs the next token, it stops and returns its current state to the user. The user is then responsible for obtaining this token (typically by invoking the lexer) and resuming the parser from that state.
%
The directory \distrib{demos/calc-incremental} contains a demo that illustrates the use of the incremental API.

This API is ``incremental'' in the sense that the user has access to a sequence of the intermediate states of the parser. Assuming that semantic values are immutable, a parser state is a persistent data structure: it can be stored and used multiple times, if desired. This enables applications such as ``live parsing'', where a buffer is continuously parsed while it is being edited. The parser can be re-started in the middle of the buffer whenever the user edits a character. Because two successive parser states share most of their data in memory, a list of $n$ successive parser states occupies only $O(n)$ space in memory.

% One could point out that semantic actions should be side-effect free.
% But that is not an absolute requirement. Semantic actions can have side
% effects, if the user knows what they are doing.

% TEMPORARY actually, live parsing also requires a way of performing
% error recovery, up to a complete parse... as in Merlin.

% ------------------------------------------------------------------------------

\subsubsection{Starting the parser}

In this API, the parser is started by invoking \verb+Incremental.main+. (Recall that we assume that \verb+main+ is the name of the start symbol.) The generated file \texttt{parser.mli} contains the following declaration:
\begin{verbatim}
module Incremental : sig
  val main: position -> thing MenhirInterpreter.checkpoint
end
\end{verbatim}
The argument is the initial position.
If the lexer is based on an \ocaml lexing buffer, this argument should be \verb+lexbuf.lex_curr_p+. In \sref{sec:incremental} and \sref{sec:inspection}, the type \verb+position+ is a synonym for \verb+Lexing.position+.

We emphasize that the function \verb+Incremental.main+ does not parse anything. It constructs a checkpoint which serves as a \emph{starting} point. The functions \verb+offer+ and \verb+resume+, described below, are used to drive the parser.

% ------------------------------------------------------------------------------

\subsubsection{Driving the parser}
\label{sec:incremental:driving}

The sub-module \menhirinterpreter is also part of the incremental API. Its declaration, which appears in the generated file \texttt{parser.mli}, is as follows:
\begin{verbatim}
module MenhirInterpreter : MenhirLib.IncrementalEngine.INCREMENTAL_ENGINE
  with type token = token
\end{verbatim}
The signature \verb+INCREMENTAL_ENGINE+, defined in the module \menhirlibincrementalengine, contains many types and functions, which are described in the rest of this section (\sref{sec:incremental:driving}) and in the following sections (\sref{sec:incremental:inspecting}, \sref{sec:incremental:updating}).

Please keep in mind that, from the outside, these types and functions should be referred to with an appropriate prefix. For instance, the type \verb+checkpoint+ should be referred to as \verb+MenhirInterpreter.checkpoint+, or \verb+Parser.MenhirInterpreter.checkpoint+, depending on which modules the user chooses to open.

%% type token
% We pass over it in silence.

%% type 'a env

\begin{verbatim}
type 'a env
\end{verbatim}
The abstract type \verb+'a env+ represents the current state of the parser. (That is, it contains the current state and stack of the LR automaton.) Assuming that semantic values are immutable, it is a persistent data structure: it can be stored and used multiple times, if desired. The parameter \verb+'a+ is the type of the semantic value that will eventually be produced if the parser succeeds.

%% type production

\begin{verbatim}
type production
\end{verbatim}
The abstract type \verb+production+ represents a production of the grammar.
%
The ``start productions'' (which do not exist in an \mly file, but are constructed by \menhir internally) are \emph{not} part of this type.

%% type 'a checkpoint

\begin{verbatim}
type 'a checkpoint = private
  | InputNeeded of 'a env
  | Shifting of 'a env * 'a env * bool
  | AboutToReduce of 'a env * production
  | HandlingError of 'a env
  | Accepted of 'a
  | Rejected
\end{verbatim}
The type \verb+'a checkpoint+ represents an intermediate or final state of the parser. An intermediate checkpoint is a suspension: it records the parser's current state, and allows parsing to be resumed. The parameter \verb+'a+ is the type of the semantic value that will eventually be produced if the parser succeeds.

\verb+Accepted+ and \verb+Rejected+ are final checkpoints. \verb+Accepted+ carries a semantic value.

\verb+InputNeeded+ is an intermediate checkpoint. It means that the parser wishes to read one token before continuing.

\verb+Shifting+ is an intermediate checkpoint. It means that the parser is taking a shift transition. It exposes the state of the parser before and after the transition. The Boolean parameter tells whether the parser intends to request a new token after this transition. (It always does, except when it is about to accept.)

\verb+AboutToReduce+ is an intermediate checkpoint: it means that the parser is about to perform a reduction step.
\verb+HandlingError+ is also an intermediate checkpoint: it means that the parser has detected an error and is about to handle it. (Error handling is typically performed in several steps, so the next checkpoint is likely to be \verb+HandlingError+ again.) In these two cases, the parser does not need more input. The parser suspends itself at this point only in order to give the user an opportunity to observe the parser's transitions and possibly handle errors in a different manner, if desired.

%% val offer

\begin{verbatim}
val offer: 'a checkpoint -> token * position * position -> 'a checkpoint
\end{verbatim}
The function \verb+offer+ allows the user to resume the parser after the parser has suspended itself with a checkpoint of the form \verb+InputNeeded env+. This function expects the previous checkpoint \verb+checkpoint+ as well as a new token (together with the start and end positions of this token). It produces a new checkpoint, which again can be an intermediate checkpoint or a final checkpoint. It does not raise any exception. (The exception \texttt{Error} is used only in the monolithic API.)

%% val resume

\begin{verbatim}
val resume: ?strategy:[ `Legacy | `Simplified ] ->
  'a checkpoint -> 'a checkpoint
\end{verbatim}
The function \verb+resume+ allows the user to resume the parser after the parser has suspended itself with a checkpoint of the form \verb+AboutToReduce (env, prod)+ or \verb+HandlingError env+. This function expects just the previous checkpoint \verb+checkpoint+. It produces a new checkpoint. It does not raise any exception.
%
The optional argument \verb+strategy+ influences the manner in which \verb+resume+ deals with checkpoints of the form \verb+HandlingError _+. Its default value is \verb+`Legacy+. For more details, see \sref{sec:errors}.

The incremental API subsumes the monolithic API. Indeed, \verb+main+ can be (and is in fact) implemented by first using \verb+Incremental.main+, then calling \verb+offer+ and \verb+resume+ in a loop, until a final checkpoint is obtained.

%% type supplier

\begin{verbatim}
type supplier = unit -> token * position * position
\end{verbatim}
A token supplier is a function of no arguments which delivers a new token (together with its start and end positions) every time it is called. The function \verb+loop+ and its variants, described below, expect a supplier as an argument.

%% val lexer_lexbuf_to_supplier

\begin{verbatim}
val lexer_lexbuf_to_supplier:
  (Lexing.lexbuf -> token) -> Lexing.lexbuf -> supplier
\end{verbatim}
The function \verb+lexer_lexbuf_to_supplier+, applied to a lexer and to a lexing buffer, produces a fresh supplier.

%% (remark about the loop* functions)

The functions \verb+offer+ and \verb+resume+, documented above, are sufficient to write a parser loop. One can imagine many variations of such a loop, which is why we expose \verb+offer+ and \verb+resume+ in the first place. Nevertheless, some variations are so common that it is worth providing them, ready for use. The following functions are implemented on top of \verb+offer+ and \verb+resume+.

%% val loop

\begin{verbatim}
val loop: ?strategy:[ `Legacy | `Simplified ] ->
  supplier -> 'a checkpoint -> 'a
\end{verbatim}
\verb+loop supplier checkpoint+ begins parsing from \verb+checkpoint+, reading tokens from \verb+supplier+. It continues parsing until it reaches a checkpoint of the form \verb+Accepted v+ or \verb+Rejected+. In the former case, it returns \verb+v+. In the latter case, it raises the exception \verb+Error+.
(By the way, this is how we implement the monolithic API on top of the incremental API.)
%
The optional argument \verb+strategy+ influences the manner in which \verb+loop+ deals with checkpoints of the form \verb+HandlingError _+. Its default value is \verb+`Legacy+. For more details, see \sref{sec:errors}.

\begin{verbatim}
val loop_handle:
  ('a -> 'answer) ->
  ('a checkpoint -> 'answer) ->
  supplier -> 'a checkpoint -> 'answer
\end{verbatim}
\verb+loop_handle succeed fail supplier checkpoint+ begins parsing from \verb+checkpoint+, reading tokens from \verb+supplier+. It continues until it reaches a checkpoint of the form \verb+Accepted v+ or \verb+HandlingError _+ (or~\verb+Rejected+, but that should not happen, as \verb+HandlingError _+ will be observed first). In the former case, it calls \verb+succeed v+. In the latter case, it calls \verb+fail+ with this checkpoint. It cannot raise \verb+Error+.

This means that \menhir's traditional error-handling procedure (which pops the stack until a state that can act on the \error token is found) does not get a chance to run. Instead, the user can implement her own error handling code, in the \verb+fail+ continuation.

%% val loop_handle_undo

\begin{verbatim}
val loop_handle_undo:
  ('a -> 'answer) ->
  ('a checkpoint -> 'a checkpoint -> 'answer) ->
  supplier -> 'a checkpoint -> 'answer
\end{verbatim}
\verb+loop_handle_undo+ is analogous to \verb+loop_handle+, but passes a pair of checkpoints (instead of a single checkpoint) to the failure continuation.
%
The first (and oldest) checkpoint that is passed to the failure continuation is the last \verb+InputNeeded+ checkpoint that was encountered before the error was detected. The second (and newest) checkpoint is where the error was detected. (This is the same checkpoint that \verb+loop_handle+ would pass to its failure continuation.) Going back to the first checkpoint can be thought of as undoing any reductions that were performed after seeing the problematic token. (These reductions must be default reductions or spurious reductions.) This can be useful to someone who wishes to implement an error explanation or error recovery mechanism.

\verb+loop_handle_undo+ must be applied to an \verb+InputNeeded+ checkpoint. The initial checkpoint produced by \verb+Incremental.main+ is of this form.

%% val shifts

\begin{verbatim}
val shifts: 'a checkpoint -> 'a env option
\end{verbatim}
\verb+shifts checkpoint+ assumes that \verb+checkpoint+ has been obtained by submitting a token to the parser. It runs the parser from \verb+checkpoint+, through an arbitrary number of reductions, until the parser either accepts this token (i.e., shifts) or rejects it (i.e., signals an error). If the parser decides to shift, then \verb+Some env+ is returned, where \verb+env+ is the parser's state just before shifting. Otherwise, \verb+None+ is returned.

This can be used to test whether the parser is willing to accept a certain token. This function should be used with caution, though, as it causes semantic actions to be executed. It is desirable that all semantic actions be side-effect-free, or that their side-effects be harmless.

%% val acceptable

\begin{verbatim}
val acceptable: 'a checkpoint -> token -> position -> bool
\end{verbatim}
\verb+acceptable checkpoint token pos+ requires \verb+checkpoint+ to be an \verb+InputNeeded+ checkpoint. It returns \verb+true+ iff the parser is willing to shift this token.
%
This can be used to test, after an error has been detected, which tokens would have been accepted at this point.
To do this, one would typically use \verb+loop_handle_undo+ to get access to the last \verb+InputNeeded+ checkpoint that was encountered before the error was detected, and apply \verb+acceptable+ to that checkpoint.

\verb+acceptable+ is implemented using \verb+shifts+, so, like \verb+shifts+, it causes certain semantic actions to be executed. It is desirable that all semantic actions be side-effect-free, or that their side-effects be harmless.

% ------------------------------------------------------------------------------

\subsubsection{Inspecting the parser's state}
\label{sec:incremental:inspecting}

Although the type \verb+env+ is opaque, a parser state can be inspected via a few accessor functions, which are described in this section. The following types and functions are contained in the \verb+MenhirInterpreter+ sub-module.

%% type 'a lr1state

\begin{verbatim}
type 'a lr1state
\end{verbatim}
The abstract type \verb+'a lr1state+ describes a (non-initial) state of the LR(1) automaton.
%
If \verb+s+ is such a state, then \verb+s+ should have at least one incoming transition, and all of its incoming transitions carry the same (terminal or non-terminal) symbol, say $A$. We say that $A$ is the \emph{incoming symbol} of the state~\verb+s+.
%
The index \verb+'a+ is the type of the semantic values associated with $A$. The role played by \verb+'a+ is clarified in the definition of the type \verb+element+, which appears further on.

%% val number

\begin{verbatim}
val number: _ lr1state -> int
\end{verbatim}
The states of the LR(1) automaton are numbered (from 0 and up). The function \verb+number+ maps a state to its number.

%% val production_index
%% val find_production

\begin{verbatim}
val production_index: production -> int
val find_production: int -> production
\end{verbatim}
Productions are numbered. (The set of indices of all productions forms an interval, which does \emph{not} necessarily begin at 0.)
%
The function \verb+production_index+ converts a production to an integer number, whereas the function \verb+find_production+ carries out the reverse conversion. It is an error to apply \verb+find_production+ to an invalid index.

%% type element

\begin{verbatim}
type element =
  | Element: 'a lr1state * 'a * position * position -> element
\end{verbatim}
The type \verb+element+ describes one entry in the stack of the LR(1) automaton. In a stack element of the form \verb+Element (s, v, startp, endp)+, \verb+s+ is a (non-initial) state and \verb+v+ is a semantic value. The value~\verb+v+ is associated with the incoming symbol~$A$ of the state~\verb+s+. In other words, the value \verb+v+ was pushed onto the stack just before the state \verb+s+ was entered. Thus, for some type \verb+'a+, the state~\verb+s+ has type \verb+'a lr1state+ and the value~\verb+v+ has type~\verb+'a+. The positions \verb+startp+ and \verb+endp+ delimit the fragment of the input text that was reduced to the symbol $A$.

In order to do anything useful with the value \verb+v+, one must gain information about the type \verb+'a+, by inspection of the state~\verb+s+. So far, the type \verb+'a lr1state+ is abstract, so there is no way of inspecting~\verb+s+. The inspection API (\sref{sec:inspection}) offers further tools for this purpose.

%% val top

\begin{verbatim}
val top: 'a env -> element option
\end{verbatim}
\verb+top env+ returns the parser's top stack element. The state contained in this stack element is the current state of the automaton. If the stack is empty, \verb+None+ is returned.
In that case, the current state of the automaton must be an initial state.

%% val pop_many

\begin{verbatim}
val pop_many: int -> 'a env -> 'a env option
\end{verbatim}
\verb+pop_many i env+ pops \verb+i+ elements off the automaton's stack. This is done via \verb+i+ successive invocations of \verb+pop+. Thus, \verb+pop_many 1+ is \verb+pop+. The index \verb+i+ must be nonnegative. The time complexity is $O(i)$.

%% val get

\begin{verbatim}
val get: int -> 'a env -> element option
\end{verbatim}
\verb+get i env+ returns the parser's \verb+i+-th stack element. The index \verb+i+ is 0-based: thus, \verb+get 0+ is \verb+top+. If \verb+i+ is greater than or equal to the number of elements in the stack, \verb+None+ is returned. \verb+get+ is implemented using \verb+pop_many+ and \verb+top+: its time complexity is $O(i)$.

%% val current_state_number

\begin{verbatim}
val current_state_number: 'a env -> int
\end{verbatim}
\verb+current_state_number env+ is the integer number of the automaton's current state. Although this number might conceivably be obtained via the functions~\verb+top+ and \verb+number+, using \verb+current_state_number+ is preferable, because this method works even when the automaton's stack is empty (in which case the current state is an initial state, and \verb+top+ returns \verb+None+). This number can be passed as an argument to a \verb+message+ function generated by \verb+menhir --compile-errors+.

%% val equal

\begin{verbatim}
val equal: 'a env -> 'a env -> bool
\end{verbatim}
\verb+equal env1 env2+ tells whether the parser configurations \verb+env1+ and \verb+env2+ are equal in the sense that the automaton's current state is the same in \verb+env1+ and \verb+env2+ and the stack is \emph{physically} the same in \verb+env1+ and \verb+env2+. If \verb+equal env1 env2+ is \verb+true+, then the sequence of the stack elements, as observed via \verb+pop+ and \verb+top+, must be the same in \verb+env1+ and \verb+env2+. Also, if \verb+equal env1 env2+ holds, then the checkpoints \verb+input_needed env1+ and \verb+input_needed env2+ must be equivalent. (The function \verb+input_needed+ is documented in \sref{sec:incremental:updating}.) The function \verb+equal+ has time complexity $O(1)$.

%% val positions

\begin{verbatim}
val positions: 'a env -> position * position
\end{verbatim}
The function \verb+positions+ returns the start and end positions of the current lookahead token. If invoked in an initial state, this function returns a pair whose two components are both the initial position that was passed as an argument to \verb+main+.

%% val has_default_reduction
%% val state_has_default_reduction

\begin{verbatim}
val env_has_default_reduction: 'a env -> bool
val state_has_default_reduction: _ lr1state -> bool
\end{verbatim}
When applied to an environment \verb+env+ taken from a checkpoint of the form \verb+AboutToReduce (env, prod)+, the function \verb+env_has_default_reduction+ tells whether the reduction that is about to take place is a default reduction. \verb+state_has_default_reduction s+ tells whether the state \verb+s+ has a default reduction. This includes the case where \verb+s+ is an accepting state.

% ------------------------------------------------------------------------------

\subsubsection{Updating the parser's state}
\label{sec:incremental:updating}

The functions presented in the previous section (\sref{sec:incremental:inspecting}) allow inspecting parser states of type \verb+'a checkpoint+ and \verb+'a env+.
However, so far, there are no functions for manufacturing new parser states, except \verb+offer+ and \verb+resume+, which create new checkpoints by feeding tokens, one by one, to the parser.

In this section, a small number of functions are provided for manufacturing new parser states of type \verb+'a env+ and \verb+'a checkpoint+. These functions allow going far back into the past and jumping ahead into the future, so to speak. In other words, they allow driving the parser in other ways than by feeding tokens into it. The functions \verb+pop+, \verb+force_reduction+ and \verb+feed+ (part of the inspection API; see \sref{sec:inspection}) construct values of type \verb+'a env+. The function \verb+input_needed+ constructs values of type \verb+'a checkpoint+ and thereby allows resuming parsing in normal mode (via \verb+offer+). Together, these functions can be used to implement error handling and error recovery strategies.

%% val pop

\begin{verbatim}
val pop: 'a env -> 'a env option
\end{verbatim}
\verb+pop env+ returns a new environment, where the parser's top stack cell has been popped off. (If the stack is empty, \verb+None+ is returned.) This amounts to pretending that the (terminal or nonterminal) symbol that corresponds to this stack cell has not been read.

%% val force_reduction

\begin{verbatim}
val force_reduction: production -> 'a env -> 'a env
\end{verbatim}
\verb+force_reduction prod env+ can be called only if in the state \verb+env+ the parser is capable of reducing the production \verb+prod+. If this condition is satisfied, then this production is reduced, which means that its semantic action is executed (this can have side effects!) and the automaton makes a goto (nonterminal) transition. If this condition is not satisfied, an \verb+Invalid_argument+ exception is raised.

%% val input_needed

\begin{verbatim}
val input_needed: 'a env -> 'a checkpoint
\end{verbatim}
\verb+input_needed env+ returns \verb+InputNeeded env+. Thus, out of a parser state that might have been obtained via a series of calls to the functions \verb+pop+, \verb+force_reduction+, \verb+feed+, and so on, it produces a checkpoint, which can be used to resume normal parsing, by supplying this checkpoint as an argument to \verb+offer+.

This function should be used with some care. It could ``mess up the lookahead'' in the sense that it allows parsing to resume in an arbitrary state \verb+s+ with an arbitrary lookahead symbol \verb+t+, even though \menhir's reachability analysis (which is carried out via the \olisterrors switch) might well think that it is impossible to reach this particular configuration. If one is using \menhir's new error reporting facility (\sref{sec:errors:new}), this could cause the parser to reach an error state for which no error message has been prepared.

% ------------------------------------------------------------------------------

\subsection{Inspection API}
\label{sec:inspection}

If \oinspection is set, \menhir offers an inspection API in addition to the monolithic and incremental APIs. (The reason why this is not done by default is that this requires more tables to be generated, thus making the generated parser larger.) Like the incremental API, the inspection API is found in the sub-module \menhirinterpreter. It offers the following types and functions.

%% type _ terminal

The type \verb+'a terminal+ is a generalized algebraic data type (GADT). A value of type \verb+'a terminal+ represents a terminal symbol (without a semantic value).
The index \verb+'a+ is the type of the semantic values associated with this symbol. For instance, if the grammar contains the declarations \verb+%token A+ and \verb+%token<int> B+, then the generated module \menhirinterpreter contains the following definition:
%
\begin{verbatim}
type _ terminal =
  | T_A : unit terminal
  | T_B : int terminal
\end{verbatim}
%
The data constructors are named after the terminal symbols, prefixed with ``\verb+T_+''.

%% type _ nonterminal

The type \verb+'a nonterminal+ is also a GADT. A value of type \verb+'a nonterminal+ represents a nonterminal symbol (without a semantic value). The index \verb+'a+ is the type of the semantic values associated with this symbol. For instance, if \verb+main+ is the only nonterminal symbol, then the generated module \menhirinterpreter contains the following definition:
%
\begin{verbatim}
type _ nonterminal =
  | N_main : thing nonterminal
\end{verbatim}
%
The data constructors are named after the nonterminal symbols, prefixed with ``\verb+N_+''.

%% type 'a symbol

The type \verb+'a symbol+
% (an algebraic data type)
is the disjoint union of the types \verb+'a terminal+ and \verb+'a nonterminal+. In other words, a value of type \verb+'a symbol+ represents a terminal or nonterminal symbol (without a semantic value). This type is (always) defined as follows:
%
\begin{verbatim}
type 'a symbol =
  | T : 'a terminal -> 'a symbol
  | N : 'a nonterminal -> 'a symbol
\end{verbatim}

%% type xsymbol

The type \verb+xsymbol+ is an existentially quantified version of the type \verb+'a symbol+. It is useful in situations where the index \verb+'a+ is not statically known. It is (always) defined as follows:
%
\begin{verbatim}
type xsymbol =
  | X : 'a symbol -> xsymbol
\end{verbatim}

%% type item

The type \verb+item+ describes an LR(0) item, that is, a pair of a production \verb+prod+ and an index \verb+i+ into the right-hand side of this production. If the length of the right-hand side is \verb+n+, then \verb+i+ is comprised between 0 and \verb+n+, inclusive.

\begin{verbatim}
type item = production * int
\end{verbatim}

%% Comparison functions.

The following functions implement total orderings on the types \verb+_ terminal+, \verb+_ nonterminal+, \verb+xsymbol+, \verb+production+, and \verb+item+.

\begin{verbatim}
val compare_terminals: _ terminal -> _ terminal -> int
val compare_nonterminals: _ nonterminal -> _ nonterminal -> int
val compare_symbols: xsymbol -> xsymbol -> int
val compare_productions: production -> production -> int
val compare_items: item -> item -> int
\end{verbatim}

%% val incoming_symbol

The function \verb+incoming_symbol+ maps a (non-initial) LR(1) state~\verb+s+ to its incoming symbol, that is, the symbol that the parser must recognize before it enters the state \verb+s+.
%
\begin{verbatim}
val incoming_symbol: 'a lr1state -> 'a symbol
\end{verbatim}
%
This function can be used to gain access to the semantic value \verb+v+ in a stack element \verb+Element (s, v, _, _)+. Indeed, by case analysis on the symbol \verb+incoming_symbol s+, one gains information about the type \verb+'a+, hence one obtains the ability to do something useful with the value~\verb+v+.

%% val items

The function \verb+items+ maps a (non-initial) LR(1) state~\verb+s+ to its LR(0) \emph{core}, that is, to the underlying set of LR(0) items. This set is represented as a list, whose elements appear in an arbitrary order. This set is \emph{not} closed under $\epsilon$-transitions.
%
\begin{verbatim}
val items: _ lr1state -> item list
\end{verbatim}

%% val lhs
%% val rhs

The functions \verb+lhs+ and \verb+rhs+ map a production \verb+prod+ to its left-hand side and right-hand side, respectively. The left-hand side is always a nonterminal symbol, hence always of the form \verb+N _+. The right-hand side is a (possibly empty) sequence of (terminal or nonterminal) symbols.
%
\begin{verbatim}
val lhs: production -> xsymbol
val rhs: production -> xsymbol list
\end{verbatim}

%% val nullable

The function \verb+nullable+, applied to a non-terminal symbol, tells whether this symbol is nullable. A nonterminal symbol is nullable if and only if it produces the empty word $\epsilon$.
%
\begin{verbatim}
val nullable: _ nonterminal -> bool
\end{verbatim}

%% val first
%% val xfirst

The function call \verb+first nt t+ tells whether the \emph{FIRST} set of the nonterminal symbol \verb+nt+ contains the terminal symbol \verb+t+. That is, it returns \verb+true+ if and only if \verb+nt+ produces a word that begins with \verb+t+. The function \verb+xfirst+ is identical to \verb+first+, except it expects a first argument of type \verb+xsymbol+ instead of \verb+_ terminal+.
%
\begin{verbatim}
val first: _ nonterminal -> _ terminal -> bool
val xfirst: xsymbol -> _ terminal -> bool
\end{verbatim}

%% val foreach_terminal
%% val foreach_terminal_but_error

The function \verb+foreach_terminal+ enumerates the terminal symbols, including the special symbol \error. The function \verb+foreach_terminal_but_error+ enumerates the terminal symbols, excluding \error.

\begin{verbatim}
val foreach_terminal:           (xsymbol -> 'a -> 'a) -> 'a -> 'a
val foreach_terminal_but_error: (xsymbol -> 'a -> 'a) -> 'a -> 'a
\end{verbatim}

%% val feed

\verb+feed symbol startp semv endp env+ causes the parser to consume the (terminal or nonterminal) symbol \verb+symbol+, accompanied with the semantic value \verb+semv+ and with the start and end positions \verb+startp+ and \verb+endp+. Thus, the automaton makes a transition, and reaches a new state. The stack grows by one cell. This operation is permitted only if the current state (as determined by \verb+env+) has an outgoing transition labeled with \verb+symbol+. Otherwise, an \verb+Invalid_argument+ exception is raised.

\begin{verbatim}
val feed: 'a symbol -> position -> 'a -> position -> 'b env -> 'b env
\end{verbatim}

% TEMPORARY
% document the modules that use the inspection API: Printers
% document MenhirLib.General?
% The directory \distrib{demos/calc-inspection} contains a demo that
% illustrates the use of the inspection API.
% review it / clean it up!

% ------------------------------------------------------------------------------

\section{Error handling: the traditional way}
\label{sec:errors}

\menhir's traditional error handling mechanism is considered deprecated: although it is still supported for the time being, it might be removed in the future.
%
We recommend setting up an error handling mechanism using the new tools offered by \menhir (\sref{sec:errors:new}).

\paragraph{Error handling}

\menhir's traditional error handling mechanism is inspired by that of \yacc and \ocamlyacc, but is not identical. A special \error token is made available for use within productions. The LR automaton is constructed exactly as if \error was a regular terminal symbol. However, \error is never produced by the lexical analyzer. Instead, when an error is detected, the current lookahead token is discarded and replaced with the \error token, which becomes the current lookahead token.
At this point, the parser enters \emph{error handling} mode.
%
In error handling mode, the parser behaves as follows:
\begin{itemize}
\item If the current state has a shift action on the \error token, then this action takes place. Under the \legacy strategy, the parser then reads the next token and returns to normal mode. Under the \simplified strategy, it does \emph{not} request the next token, so the current token remains \error, and the parser remains in error handling mode.
\item If the current state has a reduce action on the \error token, then this action takes place. (This behavior differs from that of \yacc and \ocamlyacc, which do not reduce on \error. It is somewhat unclear why not.) The current token remains \error and the parser remains in error handling mode.
\item If the current state has no action on the \error token, then, under the \simplified strategy, the parser rejects the input. Under the \legacy strategy, the parser pops a cell off its stack and remains in error handling mode. If the stack is empty, then the parser rejects the input.
\end{itemize}

In the monolithic API, the parser rejects the input by raising the exception \texttt{Error}. This exception carries no information. The position of the error can be obtained by reading the lexical analyzer's environment record. In the incremental API, the parser rejects the input by returning the checkpoint \texttt{Rejected}.

Which strategy should one choose? First, let us note that the difference between the strategies \legacy and \simplified matters only if the grammar uses the \error token. The following rule of thumb can be used to select between them:
\begin{itemize}
\item If the \error token is used only to catch an error and stop, then the \simplified strategy should be preferred. (In this restricted style, the \error token always appears at the end of a production, whose semantic action raises an exception.)
\item If the \error token is used to survive an error and continue parsing, then the \legacy strategy should be selected.
\end{itemize}

\paragraph{Error recovery}

\ocamlyacc offers an error recovery mode, which is entered immediately after an \error token was successfully shifted. In this mode, tokens are repeatedly taken off the input stream and discarded until an acceptable token is found. This feature is no longer offered by \menhir.

\paragraph{Error-related keywords}

The following keyword is made available to semantic actions. When the \verb+$syntaxerror+ keyword is evaluated, evaluation of the semantic action is aborted, so that the current reduction is abandoned; the current lookahead token is discarded and replaced with the \error token; and error handling mode is entered. Note that there is no mechanism for inserting an \error token \emph{in front of} the current lookahead token, even though this might also be desirable. It is unclear whether this keyword is useful; it might be suppressed in the future.

% ------------------------------------------------------------------------------

\section{Error handling: the new way}
\label{sec:errors:new}

\menhir's incremental API (\sref{sec:incremental}) allows taking control when an error is detected. Indeed, as soon as an invalid token is detected, the parser produces a checkpoint of the form \verb+HandlingError _+. At this point, if one decides to let the parser proceed, by just calling \verb+resume+, then \menhir enters its traditional error handling mode (\sref{sec:errors}).
Instead, however, one can decide to take control and perform error handling or error recovery in any way one pleases. One can, for instance, build and display a diagnostic message, based on the automaton's current stack and/or state. Or, one could modify the input stream, by inserting or deleting tokens, so as to suppress the error, and resume normal parsing. In principle, the possibilities are endless. An apparently simple-minded approach to error reporting, proposed by Jeffery~\cite{jeffery-03} and further explored by Pottier~\cite{pottier-reachability-cc-2016}, consists in selecting a diagnostic message (or a template for a diagnostic message) based purely on the current state of the automaton. In this approach, one determines, ahead of time, which are the ``error states'' (that is, the states in which an error can be detected), and one prepares, for each error state, a diagnostic message. Because state numbers are fragile (they change when the grammar evolves), an error state is identified not by its number, but by an input sentence that leads to it: more precisely, by an input sentence which causes an error to be detected in this state. Thus, one maintains a set of pairs of an erroneous input sentence and a diagnostic message. \menhir defines a file format, the \messages file format, for representing this information (\sref{sec:messages:format}), and offers a set of tools for creating, maintaining, and exploiting \messages files (\sref{sec:messages:tools}). Once one understands these tools, there remains to write a collection of diagnostic messages, a more subtle task than one might think (\sref{sec:errors:diagnostics}), and to glue everything together (\sref{sec:errors:example}). In this approach to error handling, as in any other approach, one must understand exactly when (that is, in which states) errors are detected. This in turn requires understanding how the automaton is constructed. \menhir's construction technique is not Knuth's canonical LR(1) technique~\cite{knuth-lr-65}, which is usually too expensive to be practical. Instead, \menhir \emph{merges} states~\cite{pager-77} and introduces so-called \emph{default reductions}. These techniques \emph{defer} error detection by allowing extra reductions to take place before an error is detected. % Furthermore, \menhir supports \donerrorreduce declarations, % which also introduce extra reductions. The impact of these alterations must be taken into account when writing diagnostic messages (\sref{sec:errors:diagnostics}). In this approach to error handling, the special \error token is not used. It should not appear in the grammar. Similarly, the \verb+$syntaxerror+ keyword should not be used.
% ------------------------------------------------------------------------------
\subsection{The \messages file format} \label{sec:messages:format} \paragraph{Definition} A \messages file is a text file. It is composed of a list of entries. Each entry consists of one or more input sentences, followed with one or more blank lines, followed with a message. Two entries are separated by one or more blank lines. The syntax of an input sentence is described in \sref{sec:sentences}. A~message is an arbitrary piece of text, but cannot contain a blank line. Blank lines are significant: they are used as separators, both between entries, and (within an entry) between the sentences and the message. Thus, there cannot be a blank line between two sentences.
(If there is one, \menhir becomes confused and may complain about some word not being ``a known non-terminal symbol''). There also cannot be a blank line inside a message.
\begin{figure} \begin{verbatim} grammar: TYPE UID # This hand-written comment concerns just the sentence above. grammar: TYPE OCAMLTYPE UID PREC # This hand-written comment concerns just the sentence above. # This hand-written comment concerns both sentences above. Ill-formed declaration. Examples of well-formed declarations: %type <Syntax.expression> expression %type <int> date time \end{verbatim} \caption{An entry in a \messages file} \label{fig:messages:entry} \end{figure}
\begin{figure} \begin{verbatim} grammar: TYPE UID ## ## Ends in an error in state: 1. ## ## declaration -> TYPE . OCAMLTYPE separated_nonempty_list(option(COMMA), ## strict_actual) [ TYPE TOKEN START RIGHT PUBLIC PERCENTPERCENT PARAMETER ## ON_ERROR_REDUCE NONASSOC LEFT INLINE HEADER EOF COLON ] ## ## The known suffix of the stack is as follows: ## TYPE ## # This hand-written comment concerns just the sentence above. # grammar: TYPE OCAMLTYPE UID PREC ## ## Ends in an error in state: 5. ## ## strict_actual -> symbol . loption(delimited(LPAREN,separated_nonempty_list ## (COMMA,strict_actual),RPAREN)) [ UID TYPE TOKEN START STAR RIGHT QUESTION ## PUBLIC PLUS PERCENTPERCENT PARAMETER ON_ERROR_REDUCE NONASSOC LID LEFT ## INLINE HEADER EOF COMMA COLON ] ## ## The known suffix of the stack is as follows: ## symbol ## # This hand-written comment concerns just the sentence above. # This hand-written comment concerns both sentences above. Ill-formed declaration. Examples of well-formed declarations: %type <Syntax.expression> expression %type <int> date time \end{verbatim} \caption{An entry in a \messages file, decorated with auto-generated comments} \label{fig:messages:entry:decorated} \end{figure}
As an example, \fref{fig:messages:entry} shows a valid entry, taken from \menhir's own \messages file. This entry contains two input sentences, which lead to errors in two distinct states. A single message is associated with these two error states. \paragraph{Comments} Comment lines, which begin with a \verb+#+ character, are ignored everywhere. However, users who wish to take advantage of \menhir's facility for merging two \messages files (\sref{sec:messages:merge}) should follow certain conventions regarding the placement of comments: \begin{itemize} \item If a comment concerns a specific sentence and should remain attached to this sentence, then it must immediately follow this sentence (without a blank line in between). \item If a comment concerns all sentences in an entry, then it should appear between the sentences and the message, with blank lines in between. \item One should avoid placing comments between two entries, as the merging algorithm will not be able to handle them in a satisfactory way. \end{itemize} \paragraph{Auto-generated comments} Several commands, described next (\sref{sec:messages:tools}), produce \messages files where each input sentence is followed with an auto-generated comment, marked with \verb+##+. This special comment indicates in which state the error is detected, and is supposed to help the reader understand what it means to be in this state: What has been read so far? What is expected next? As an example, the previous entry, decorated with auto-generated comments, is shown in \fref{fig:messages:entry:decorated}. (We have manually wrapped the lines that did not fit in this document.)
An auto-generated comment begins with the number of the error state that is reached via this input sentence. Then, the auto-generated comment shows the LR(1) items that compose this state, in the same format as in an \automaton file. These items offer a description of the past (that is, what has been read so far) and the future (that is, which terminal symbols are allowed next). Finally, the auto-generated comment shows what is known about the stack when the automaton is in this state. (This can be deduced from the LR(1) items, but is more readable if shown separately.) % Plus, there might be cases where the known suffix is longer than what % the LR(1) items suggest. But I have never seen this yet.
In a canonical LR(1) automaton, the LR(1) items offer an exact description of the past and future. However, in a noncanonical automaton, which is by default what \menhir produces, the situation is more subtle. The lookahead sets can be over-approximated, so the automaton can perform one or more ``spurious reductions'' before an error is detected. As a result, the LR(1) items in the error state offer a description of the future that may be both incorrect (that is, a terminal symbol that appears in a lookahead set is not necessarily a valid continuation) and incomplete (that is, a terminal symbol that does not appear in any lookahead set may nevertheless be a valid continuation). More details appear further on (\sref{sec:errors:diagnostics}). In order to attract the user's attention to this issue, if an input sentence causes one or more spurious reductions, then the auto-generated comment contains a warning about this fact. This mechanism is not completely foolproof, though, as it may be the case that one particular sentence does not cause any spurious reductions (hence, no warning appears), yet leads to an error state that can be reached via other sentences that do involve spurious reductions. % Not sure what to conclude about this issue...
% ------------------------------------------------------------------------------
\subsection{Maintaining \messages files} \label{sec:messages:tools} Ideally, the set of input sentences in a \messages file should be correct (that is, every sentence causes an error on its last token), irredundant (that is, no two sentences lead to the same error state), and complete (that is, every error state is reached by some sentence). \paragraph{Verifying correctness and irredundancy} The correctness and irredundancy of a \messages file are checked by supplying \ocompileerrors \nt{filename} on the command line, where \nt{filename} is the name of the \messages file. (These arguments must be supplied in addition to the other usual arguments, such as the name of the \mly file.) This command fails if a sentence does not cause an error at all, or causes an error too early. It also fails if two sentences lead to the same error state. % If the file is correct and irredundant, then (as its name suggests) this command compiles the \messages file down to an \ocaml function, whose code is printed on the standard output channel. This function, named \verb+message+, has type \verb+int -> string+, and maps a state number to a message. It raises the exception \verb+Not_found+ if its argument is not the number of a state for which a message has been defined. If the set of input sentences is complete, then it cannot raise \verb+Not_found+. \paragraph{Verifying completeness} The completeness of a \messages file is checked via the commands \olisterrors and \ocompareerrors.
The former produces, from scratch, a complete set of input sentences, that is, a set of input sentences that reaches all error states. The latter compares two sets of sentences (more precisely, the two underlying sets of error states) for inclusion. The command \olisterrors first computes all possible ways of causing an error. From this information, it deduces a list of all error states, that is, all states where an error can be detected. For each of these states, it computes a (minimal) input sentence that causes an error in this state. Finally, it prints these sentences, in the \messages file format, on the standard output channel. Each sentence is followed with an auto-generated comment and with a dummy diagnostic message. The user should be warned that this algorithm may require large amounts of time (typically in the tens of seconds, possibly more) and memory (typically in the gigabytes, possibly more). In practice, it requires a 64-bit machine. (On a 32-bit machine, it works, but quickly hits a built-in size limit.) At the verbosity level \ologautomaton~\texttt{2}, it displays some progress information and internal statistics on the standard error channel.
The command \ocompareerrors \nt{filename1} \ocompareerrors \nt{filename2} compares the \messages files \nt{filename1} and \nt{filename2}. Each file is read and internally translated to a mapping of states to messages. \menhir then checks that the left-hand mapping is a subset of the right-hand mapping. That is, if a state~$s$ is reached by some sentence in \nt{filename1}, then it should also be reached by some sentence in \nt{filename2}. Furthermore, if the message associated with $s$ in \nt{filename1} is not a dummy message, then the same message should be associated with $s$ in \nt{filename2}. To check that the sentences in \nt{filename2} cover all error states, it suffices to (1)~use \olisterrors to produce a complete set of sentences, which one stores in \nt{filename1}, then (2)~use \ocompareerrors to compare \nt{filename1} and \nt{filename2}.
In the case of a grammar that evolves fairly often, it can take significant human time and effort to update the \messages file and ensure correctness, irredundancy, and completeness. A tempting way of reducing this effort is to abandon completeness. This implies that the auto-generated \verb+message+ function can raise \verb+Not_found+ and that a generic ``syntax error'' message must be produced in that case. We prefer to discourage this approach, as it implies that the end user is exposed to a mixture of specific and generic syntax error messages, and there is no guarantee that the specific (hand-written) messages will appear in \emph{all} situations where they are expected to appear. Instead, we recommend waiting for the grammar to become stable and enforcing completeness.
\paragraph{Merging \messages files} \label{sec:messages:merge} The command \omergeerrors \nt{filename1} \omergeerrors \nt{filename2} attempts to merge the \messages files \nt{filename1} and \nt{filename2}, and prints the result on the standard output channel. This command can be useful if two users have worked independently and each of them has produced a \messages file that covers a subset of all error states. The merging algorithm works roughly as follows: % \begin{itemize} \item All entries in \nt{filename2} are preserved literally. \item An entry in \nt{filename1} that contains the dummy message \verb+<YOUR SYNTAX ERROR MESSAGE HERE>+ is ignored.
\item An entry in \nt{filename1} that leads to a state for which there is no entry in \nt{filename2} is copied to \nt{filename2}. \item An entry in \nt{filename1} that leads to a state for which there is also an entry in \nt{filename2}, with a distinct message, gives rise to a conflict. It is inserted into \nt{filename2} together with a comment that signals the conflict. \end{itemize} % The algorithm is asymmetric: the content of \nt{filename1} is inserted into or appended to \nt{filename2}. For this reason, if one of the files is a large ``reference'' file and the other file is a small ``delta'', then it is recommended to provide the ``delta'' as \nt{filename1} and the ``reference'' as \nt{filename2}.
\paragraph{Other commands} The command \oupdateerrors \nt{filename} is used to update the auto-generated comments in the \messages file \nt{filename}. It is typically used after a change in the grammar (or in the command line options that affect the construction of the automaton). A new \messages file is produced on the standard output channel. It is identical to \nt{filename}, except the auto-generated comments, identified by \verb+##+, have been removed and re-generated. The command \oechoerrors \nt{filename} is used to filter out all comments, blank lines, and messages from the \messages file \nt{filename}. The input sentences, and nothing else, are echoed on the standard output channel. As an example application, one could then translate the sentences to concrete syntax and create a collection of source files that trigger every possible syntax error. The command \ointerpreterror is analogous to \ointerpret. It causes \menhir to act as an interpreter. \menhir reads sentences off the standard input channel, parses them, and displays the outcome. This switch can be usefully combined with \otrace. The main difference between \ointerpret and \ointerpreterror is that, when the latter command is used, \menhir expects the input sentence to cause an error on its last token, and displays information about the state in which the error is detected, in the form of a \messages file entry. This can be used to quickly find out exactly what error is caused by one particular input sentence.
% ------------------------------------------------------------------------------
\subsection{Writing accurate diagnostic messages} \label{sec:errors:diagnostics} One might think that writing a diagnostic message for each error state is a straightforward (if lengthy) task. In reality, it is not so simple. % Here are a few guidelines. % The reader is referred to Pottier's % paper~\cite{pottier-reachability-cc-2016} for more details.
\paragraph{A state, not a sentence} The first thing to keep in mind is that a diagnostic message is associated with a \emph{state}~$s$, as opposed to a sentence. An entry in a \messages file contains a sentence~$w$ that leads to an error in state~$s$. This sentence is just one way of causing an error in state~$s$; there may exist many other sentences that also cause an error in this state. The diagnostic message should not be specific to the sentence~$w$: it should make sense regardless of how the state~$s$ is reached. As a rule of thumb, when writing a diagnostic message, one should (as much as possible) ignore the example sentence~$w$ altogether, and concentrate on the description of the state~$s$, which appears as part of the auto-generated comment.
The LR(1) items that compose the state~$s$ offer a description of the past (that is, what has been read so far) and the future (that is, which terminal symbols are allowed next). A diagnostic message should be designed based on this description. \begin{figure} \verbatiminput{declarations.mly} \caption{A grammar where one error state is difficult to explain} \label{fig:declarations} \end{figure} \begin{figure} \begin{verbatim} program: ID COLON ID LPAREN ## ## Ends in an error in state: 8. ## ## typ1 -> typ0 . [ SEMICOLON RPAREN ] ## typ1 -> typ0 . ARROW typ1 [ SEMICOLON RPAREN ] ## ## The known suffix of the stack is as follows: ## typ0 ## \end{verbatim} \caption{A problematic error state in the grammar of \fref{fig:declarations}, due to over-approximation} \label{fig:declarations:over} \end{figure} \paragraph{The problem of over-approximated lookahead sets} As pointed out earlier (\sref{sec:messages:format}), in a noncanonical automaton, the lookahead sets in the LR(1) items can be both over- and under-approximated. One must be aware of this phenomenon, otherwise one runs the risk of writing a diagnostic message that proposes too many or too few continuations. As an example, let us consider the grammar in \fref{fig:declarations}. According to this grammar, a ``program'' is either a declaration between parentheses or a declaration followed with a semicolon. A ``declaration'' is an identifier, followed with a colon, followed with a type. A ``type'' is an identifier, a type between parentheses, or a function type in the style of \ocaml. The (noncanonical) automaton produced by \menhir for this grammar has 17~states. Using \olisterrors, we find that an error can be detected in 10 of these 17~states. By manual inspection of the auto-generated comments, we find that for 9 out of these 10~states, writing an accurate diagnostic message is easy. However, one problematic state remains, namely state~8, shown in \fref{fig:declarations:over}. In this state, a (level-0) type has just been read. One valid continuation, which corresponds to the second LR(1) item in \fref{fig:declarations:over}, is to continue this type: the terminal symbol \verb+ARROW+, followed with a (level-1) type, is a valid continuation. Now, the question is, what other valid continuations are there? By examining the first LR(1) item in \fref{fig:declarations:over}, it may look as if both \verb+SEMICOLON+ and \verb+RPAREN+ are valid continuations. However, this cannot be the case. A moment's thought reveals that \emph{either} we have seen an opening parenthesis \verb+LPAREN+ at the very beginning of the program, in which case we definitely expect a closing parenthesis \verb+RPAREN+; \emph{or} we have not seen one, in which case we definitely expect a semicolon \verb+SEMICOLON+. It is \emph{never} the case that \emph{both} \verb+SEMICOLON+ and \verb+RPAREN+ are valid continuations! In fact, the lookahead set in the first LR(1) item in \fref{fig:declarations:over} is over-approximated. State~8 in the noncanonical automaton results from merging two states in the canonical automaton. In such a situation, one cannot write an accurate diagnostic message. % by lack of ``static context''. Knowing that the automaton is in state~8 does not give us a precise view of the valid continuations. Some valuable information (that is, whether we have seen an opening parenthesis \verb+LPAREN+ at the very beginning of the program) is buried in the automaton's stack. 
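Since the grammar of \fref{fig:declarations} is included from an external file, here is a plausible rendering of it, reconstructed from the description above (a hedged sketch: the actual file may differ in details such as token declarations and semantic actions):
\begin{verbatim}
%token ID COLON SEMICOLON LPAREN RPAREN ARROW
%start<unit> program
%%

(* A program is a declaration followed with a semicolon,
   or a declaration between parentheses. *)
program:
| declaration SEMICOLON { () }
| LPAREN declaration RPAREN { () }

declaration:
| ID COLON typ1 { () }

(* A type is an identifier, a type between parentheses,
   or a function type in the style of OCaml. *)
typ0:
| ID { () }
| LPAREN typ1 RPAREN { () }

typ1:
| typ0 { () }
| typ0 ARROW typ1 { () }
\end{verbatim}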
\begin{figure} \verbatiminput{declarations-phantom.mly} \caption{Splitting the problematic state of \fref{fig:declarations:over} via selective duplication} \label{fig:declarations:phantom} \end{figure}
\begin{figure} \verbatiminput{declarations-onerrorreduce.mly} \caption{Avoiding the problematic state of \fref{fig:declarations:over} via reductions on error} \label{fig:declarations:onerrorreduce} \end{figure}
\begin{figure} \begin{verbatim} program: ID COLON ID LPAREN ## ## Ends in an error in state: 15. ## ## program -> declaration . SEMICOLON [ # ] ## ## The known suffix of the stack is as follows: ## declaration ## ## WARNING: This example involves spurious reductions. ## This implies that, although the LR(1) items shown above provide an ## accurate view of the past (what has been recognized so far), they ## may provide an INCOMPLETE view of the future (what was expected next). ## In state 8, spurious reduction of production typ1 -> typ0 ## In state 11, spurious reduction of production declaration -> ID COLON typ1 ## \end{verbatim} \caption{A problematic error state in the grammar of \fref{fig:declarations:onerrorreduce}, due to under-approximation} \label{fig:declarations:under} \end{figure}
How can one work around this problem? Let us suggest three options.
\paragraph{Blind duplication of states} One option would be to build a canonical automaton by using the % (undocumented!) \ocanonical switch. In this example, one would obtain a 27-state automaton, where the problem has disappeared. However, this option is rarely viable, as it duplicates many states without good reason.
\paragraph{Selective duplication of states} A second option is to manually cause just enough duplication to remove the problematic over-approximation. In our example, we wish to distinguish two kinds of types and declarations, namely those that must be followed with a closing parenthesis, and those that must be followed with a semicolon. We create such a distinction by parameterizing \verb+typ1+ and \verb+declaration+ with a phantom parameter. The modified grammar is shown in \fref{fig:declarations:phantom}. The phantom parameter does not affect the language that is accepted: for instance, the nonterminal symbols \texttt{declaration(SEMICOLON)} and \texttt{declaration(RPAREN)} generate the same language as \texttt{declaration} in the grammar of \fref{fig:declarations}. Yet, by giving distinct names to these two symbols, we force the construction of an automaton where more states are distinguished. In this example, \menhir produces a 23-state automaton. Using \olisterrors, we find that an error can be detected in 11 of these 23~states, and by manual inspection of the auto-generated comments, we find that for each of these 11~states, writing an accurate diagnostic message is easy. In summary, we have selectively duplicated just enough states so as to split the problematic error state into two non-problematic error states. % I wonder whether there is a connection with the translation of LR(k+1) to LR(k)... % One can see that the FOLLOW set is integrated into the nonterminal symbol.
\paragraph{Reductions on error} A third and last option is to introduce an \donerrorreduce declaration (\sref{sec:onerrorreduce}) so as to prevent the detection of an error in the problematic state~8. We see in \fref{fig:declarations:over} that, in state~8, the production $\texttt{typ1} \rightarrow \texttt{typ0}$ is ready to be reduced.
If we could force this reduction to take place, then the automaton would move to some other state where it would be clear which of \verb+SEMICOLON+ and \verb+RPAREN+ is expected. We achieve this by marking \verb+typ1+ as ``reducible on error''. The modified grammar is shown in \fref{fig:declarations:onerrorreduce}. For this grammar, \menhir produces a 17-state automaton. (This is the exact same automaton as for the grammar of \fref{fig:declarations}, except 2 of the 17 states have received extra reduction actions.) Using \olisterrors, we find that an error can be detected in 9 of these~17 states. The problematic state, namely state~8, is no longer an error state! The problem has vanished. \paragraph{The problem of under-approximated lookahead sets} The third option seems by far the simplest of all, and is recommended in many situations. However, it comes with a caveat. There may now exist states whose lookahead sets are under-approximated, in a certain sense. Because of this, there is a danger of writing an incomplete diagnostic message, one that does not list all valid continuations. To see this, let us look again at the sentence \texttt{ID COLON ID LPAREN}. In the grammar and automaton of \fref{fig:declarations}, this sentence takes us to the problematic state~8, shown in \fref{fig:declarations:over}. In the grammar and automaton of \fref{fig:declarations:onerrorreduce}, because more reduction actions are carried out before the error is detected, this sentence takes us to state~15, shown in \fref{fig:declarations:under}. When writing a diagnostic message for state~15, one might be tempted to write: ``Up to this point, a declaration has been recognized. At this point, a semicolon is expected''. Indeed, by examining the sole LR(1) item in state~15, it looks as if \verb+SEMICOLON+ is the only permitted continuation. However, this is not the case. Another valid continuation is \verb+ARROW+: indeed, the sentence \texttt{ID COLON ID ARROW ID SEMICOLON} forms a valid program. In fact, if the first token following \texttt{ID COLON ID} is \texttt{ARROW}, then in state~8 this token is shifted, so the two reductions that take us from state~8 through state~11 to state~15 never take place. This is why, even though \texttt{ARROW} does not appear in state~15 as a valid continuation, it nevertheless is a valid continuation of \texttt{ID COLON ID}. The warning produced by \menhir, shown in \fref{fig:declarations:under}, is supposed to attract attention to this issue. Another way to explain this issue is to point out that, by declaring \verb+%on_error_reduce typ1+, we make a choice. When the parser reads a type and finds an invalid token, it decides that this type is finished, even though, in reality, this type could be continued with \verb+ARROW+ \ldots. This in turn causes the parser to perform another reduction and consider the current declaration finished, even though, in reality, this declaration could be continued with \verb+ARROW+ \ldots. In summary, when writing a diagnostic message for state~15, one should take into account the fact that this state can be reached via spurious reductions and (therefore) \verb+SEMICOLON+ may not be the only permitted continuation. One way of doing this, without explicitly listing all permitted continuations, is to write: ``Up to this point, a declaration has been recognized. If this declaration is complete, then at this point, a semicolon is expected''. 
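Before moving on, let us make the ``selective duplication'' option more concrete. The file shown in \fref{fig:declarations:phantom} is likewise included from an external source; based on the description given earlier, it plausibly looks as follows (a hedged sketch, which may differ from the actual file):
\begin{verbatim}
(* The phantom parameter records which terminal symbol must
   follow: SEMICOLON at the top level, RPAREN inside
   parentheses. It does not influence the language. *)
program:
| declaration(SEMICOLON) SEMICOLON { () }
| LPAREN declaration(RPAREN) RPAREN { () }

declaration(phantom):
| ID COLON typ1(phantom) { () }

typ0(phantom):
| ID { () }
| LPAREN typ1(RPAREN) RPAREN { () }

typ1(phantom):
| typ0(phantom) { () }
| typ0(phantom) ARROW typ1(phantom) { () }
\end{verbatim}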
% ------------------------------------------------------------------------------
\subsection{A working example} \label{sec:errors:example} The demo \distrib{demos/calc-syntax-errors} illustrates this approach to error handling. It is based on the demo \distrib{demos/calc}, which involves a very simple grammar of arithmetic expressions. Compared with \distrib{demos/calc}, one \donerrorreduce declaration is added so as to reduce the number of error states. There remain just 9~error states, for which we write 5~distinct syntax error messages. These messages are stored in the file \distrib{demos/calc-syntax-errors/parserMessages.messages}. % The file \distrib{demos/calc-syntax-errors/dune} instructs the build system to check this file for correctness, irredundancy and completeness and to compile this file into an \ocaml module \verb+parserMessages.ml+. % This \ocaml module contains a single function, \verb+ParserMessages.messages+, which maps a state number to a diagnostic message. It is called from the main module, \distrib{demos/calc-syntax-errors/calc.ml}. % There, we use the facilities offered by the module \menhirliberrorreports to print a full syntax error message, which includes the precise location of the error as well as the diagnostic message returned by the function \verb+ParserMessages.messages+. % As icing on the cake, we allow the diagnostic message to contain placeholders of the form \verb+$i+, where \verb+i+ is an integer constant, understood as a 0-based index into the parser's stack. We replace such a placeholder with the fragment of the source text that corresponds to this stack entry. % A number of expected-output files demonstrate the kind of syntax error messages that we produce; see for instance \distrib{demos/calc-syntax-errors/calc03.exp} and \distrib{demos/calc-syntax-errors/calc07.exp}.
The CompCert verified compiler offers another real-world example. The ``pre-parser'' is where syntax errors are detected: see \compcertgithubfile{cparser/pre\_parser.mly}. % (The pre-parser is also in charge of distinguishing type names versus variable % names, but that is an independent issue.) A database of erroneous input sentences and (templates for) diagnostic messages is stored in \compcertgithubfile{cparser/handcrafted.messages}. % It is compiled, using \ocompileerrors, to an \ocaml file named % \texttt{cparser/pre\_parser\_messages.ml}. The function % \verb+Pre_parser_messages.message+, which maps a state number to (a template % for) a diagnostic message, is called from % \compcertgithubfile{cparser/ErrorReports.ml}, where we construct and display a % full-fledged diagnostic message.
% ------------------------------------------------------------------------------
\section{Coq back-end} \label{sec:coq} \menhir is able to generate a parser whose correctness can be formally verified using the Coq proof assistant~\cite{jourdan-leroy-pottier-12}. This feature is used to construct the parser of the CompCert verified compiler~\cite{compcert}. Setting the \ocoq switch on the command line enables the Coq back-end. When this switch is set, \menhir expects an input file whose name ends in \vy and generates a Coq file whose name ends in \texttt{.v}. Like a \mly file, a \vy file is a grammar specification, with embedded semantic actions. The only difference is that the semantic actions in a \vy file are expressed in Coq instead of \ocaml. A \vy file otherwise uses the same syntax as a \mly file. CompCert's \compcertgithubfile{cparser/Parser.vy} serves as an example.
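To give a flavor of the format, here is a hedged sketch of a minimal \vy file (an invented example, not taken from any distribution). Note that the type annotations and the semantic actions are Coq expressions, and that the grammar must be conflict-free:
\begin{verbatim}
%token<nat> INT
%token PLUS EOF
%start<nat> main
%type<nat> expr
%%

main:
| e = expr EOF { e }

(* Left recursion keeps this grammar LR(1). *)
expr:
| i = INT { i }
| e = expr PLUS i = INT { e + i }
\end{verbatim}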
Several restrictions are imposed when \menhir is used in \ocoq mode: % \begin{itemize} \item The error handling mechanism (\sref{sec:errors}) is absent. The \verb+$syntaxerror+ keyword and the \error token are not supported. \item Location information is not propagated. The \verb+$start*+ and \verb+$end*+ keywords (\fref{fig:pos}) are not supported. \item \dparameter (\sref{sec:parameter}) is not supported. \item \dinline (\sref{sec:inline}) is not supported. \item The standard library (\sref{sec:library}) is not supported, of course, because its semantic actions are expressed in \ocaml. If desired, the user can define an analogous library, whose semantic actions are expressed in Coq. \item Because Coq's type inference algorithm is rather unpredictable, the Coq type of every nonterminal symbol must be provided via a \dtype or \dstart declaration (\sref{sec:type}, \sref{sec:start}). \item Unless the proof of completeness has been deactivated using \ocoqnocomplete, the grammar must not have a conflict (not even a benign one, in the sense of \sref{sec:conflicts:benign}). That is, the grammar must be LR(1). Conflict resolution via priority and associativity declarations (\sref{sec:assoc}) is not supported. The reason is that there is no simple formal specification of how conflict resolution should work. \end{itemize}
The generated file contains several modules: \begin{itemize} \item The module \verb+Gram+ defines the terminal and non-terminal symbols, the grammar, and the semantic actions. \item The module \verb+Aut+ contains the automaton generated by \menhir, together with a certificate that is checked by Coq while establishing the soundness and completeness of the parser. \end{itemize} The type~\verb+terminal+ of the terminal symbols is an inductive type, with one constructor for each terminal symbol. A terminal symbol named \verb+Foo+ in the \verb+.vy+ file is named \verb+Foo't+ in Coq. A~terminal symbol per se does not carry a semantic value. We also define the type \verb+token+ of tokens, that is, dependent pairs of a terminal symbol and a semantic value of an appropriate type for this symbol. We model the lexer as an object of type \verb+Streams.Stream token+, that is, an infinite stream of tokens. % TEMPORARY document that, as a consequence, after extraction, the only way % for an OCaml lexer to produce tokens is to use Obj.magic; % cf. the function compute_token_stream in CompCert's Lexer.mll: % Cons (Coq_existT (t, Obj.magic v), Lazy.from_fun compute_token_stream)
The type~\verb+nonterminal+ of the non-terminal symbols is an inductive type, with one constructor for each non-terminal symbol. A non-terminal symbol named \verb+Bar+ in the \verb+.vy+ file is named \verb+Bar'nt+ in Coq. The proof of termination of an LR(1) parser in the case of invalid input seems far from obvious. We did not find such a proof in the literature. In an application such as CompCert~\cite{compcert}, this question is not considered crucial. For this reason, we did not formally establish the termination of the parser. Instead, in order to satisfy Coq's termination requirements, we use the ``fuel'' technique: the parser takes an additional parameter \verb+log_fuel+ of type \verb+nat+ such that $2^{\verb+log_fuel+}$ is the maximum number of steps the parser is allowed to perform. In practice, one can use a value of, e.g., 40 or 50 to make sure the parser will never run out of fuel in a reasonable time. Parsing can have three different outcomes, represented by the type \verb+parse_result+.
% (This definition is implicitly parameterized over the initial state~\verb+init+. We omit the details here.) % \begin{verbatim} Inductive parse_result := | Fail_pr_full: state -> token -> parse_result | Timeout_pr: parse_result | Parsed_pr: symbol_semantic_type (NT (start_nt init)) -> Stream token -> parse_result. \end{verbatim}
The outcome \verb+Fail_pr_full+ means that parsing has failed because of a syntax error. (If the completeness of the parser with respect to the grammar has been proved, this implies that the input is invalid.) It contains two pieces of information: the state of the parser and the token which caused the error. These are provided for error reporting, if desired. It is important to note that, even though they should be correct, the validity of these two pieces of information is not certified by either the correctness or the completeness theorem. For more discussion on this, see \sref{sec:coq:errmsg}. The outcome \verb+Timeout_pr+ means that the fuel has been exhausted. Of course, in practice, this cannot happen if the parser is given a sufficiently large amount of fuel, as suggested above. The outcome \verb+Parsed_pr+ means that the parser has succeeded in parsing a prefix of the input stream. It carries the semantic value that has been constructed for this prefix, as well as the remainder of the input stream. For each entry point \verb+entry+ of the grammar, \menhir generates a parsing function \verb+entry+, whose type is \verb+nat -> Stream token -> parse_result+. % jh: I am a bit bothered, because init actually has type initstate, but % I do not want to discuss this in the documentation. All that matters is % that the first parameter of Parsed_pr has a type that is compatible with % the type supplied by the user.
Two theorems are provided, named \verb+entry_point_correct+ and \verb+entry_point_complete+. The correctness theorem states that, if a word (a prefix of the input stream) is accepted, then this word is valid (with respect to the grammar) and the semantic value that is constructed by the parser is valid as well (with respect to the grammar). The completeness theorem states that if a word (a prefix of the input stream) is valid (with respect to the grammar), then (given sufficient fuel) it is accepted by the parser. These results imply that the grammar is unambiguous: for every input, there is at most one valid interpretation. This is proved by another generated theorem, named \verb+Parser.unambiguous+. % jh: There is no need to prove termination in order to obtain unambiguity, % since the cases of nontermination concern invalid inputs only. % fp: well spotted! % fp: it would be interesting to have a certificate stating that the grammar % really is LR(1), but perhaps we do not care. It is good to know that it % is unambiguous. % jh: I do not know what a certificate that the grammar is LR(1) would look % like, in practice... % fp: It would be a proof of a theorem, expressed purely in terms of the % grammar, stating that the grammar is LR(1). There is a definition of this % property in the textbook by Aho and Ullman, if I remember correctly. % But never mind. % fp: One might also wish for a theorem stating that the parser does not % read too far into the stream... % jh: to really prove this, one would have to invert the control. Otherwise, % as a slightly weaker result, in the current version, we return the % remaining stream, and we prove that it matches the tail of the input Stream.
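As a hedged illustration (an invented sketch, which assumes a \vy file that declares \verb+%start<nat> main+ and assumes that the generated definitions are in scope), one might invoke the generated entry point and pattern-match on the outcome as follows:
\begin{verbatim}
(* Run the parser with 2^40 steps of fuel and keep just the
   semantic value, discarding the remainder of the stream. *)
Definition run_main (toks : Stream token) : option nat :=
  match main 40 toks with
  | Parsed_pr v _    => Some v  (* success: v has type nat *)
  | Fail_pr_full _ _ => None    (* syntax error: the state and
                                   token could be reported *)
  | Timeout_pr       => None    (* fuel exhausted *)
  end.
\end{verbatim}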
The parsers produced by \menhir's Coq back-end must be linked with a Coq library. This library can be installed via the command \verb+opam install coq-menhirlib+.% % \footnote{This assumes that you have installed \texttt{opam}, the \ocaml package manager, and that you have run the command \texttt{opam repo add coq-released https://coq.inria.fr/opam/released}.} % The Coq sources of this library can be found in the \texttt{coq-menhirlib} directory of the \menhir repository. The directory \distrib{demos/coq-minicalc} contains a minimal example that shows how to set things up. The CompCert verified compiler~\cite{compcert,compcert-github} offers a real-world example. There, see in particular the directory \compcertgithubfile{cparser}.
\subsection{Error messaging options for Coq mode} \label{sec:coq:errmsg} Users of the Coq mode have several options for providing error messages from the parser. If they wish, they can follow the pattern of CompCert and use \menhir's incremental mode for a \textit{non-verified} separate parser (\sref{sec:errors:example}). This may also aid in parsing languages (such as C) that need a lexical feedback loop for correct parsing. A similar option is available in \distrib{demos/calc-syntax-errors}, where a second parser is used after the first to determine errors. The parse result \verb+Fail_pr_full+ provides the third and simplest option. As it carries state and token information, it allows constructing meaningful error messages with a small amount of work. The generated function \verb+Aut.N_of_state+ converts a state to a state number. While this is not as powerful as the advanced error handling enabled by the incremental API for non-verified parsers, it does allow interoperability with the existing \messages files and tooling, described in \sref{sec:messages:format} and \sref{sec:messages:tools}. This allows error messaging without a second parser. An example is provided in \distrib{demos/coq-syntax-errors}.
Note that the extra information carried by the data constructor \verb+Fail_pr_full+ (and by the related data constructor \verb+Fail_sr_full+ that is used internally) is not verified. This extra information is provided for convenience, but there is no proof of its correctness. Users who wish to ignore this extra information can use the abbreviated notation \verb+Fail_pr+ and \verb+Fail_sr+. This notation is used in the statements of the theorems about the parser. It is available only in Coq code, such as in \distrib{demos/coq-minicalc}, not in extracted \ocaml code.
% ------------------------------------------------------------------------------
\section{Building grammarware on top of \menhir} \label{sec:grammarware} It is possible to build a variety of grammar-processing tools, also known as ``grammarware''~\cite{klint-laemmel-verhoef-05}, on top of \menhir's front-end. Indeed, \menhir offers a facility for dumping a \cmly file, which contains a (binary-form) representation of the grammar and automaton, as well as a library, \menhirsdk, for (programmatically) reading and exploiting a \cmly file. These facilities are described in \sref{sec:sdk}. % Furthermore, \menhir allows decorating a grammar with ``attributes'', which are ignored by \menhir's back-ends, yet are written to the \cmly file, thus can be exploited by other tools, via \menhirsdk. % Attributes are described in \sref{sec:attributes}.
\subsection{\menhir's SDK} \label{sec:sdk} The command line option \ocmly causes \menhir to produce a \cmly file in addition to its normal operation.
This file contains a (binary-form) representation of the grammar and automaton. This is the grammar that is obtained after the following steps have been carried out: \begin{itemize} \item joining multiple \mly files, if necessary; % in fact, always (due to standard.mly) \item eliminating anonymous rules; \item expanding away parameterized nonterminal symbols; \item removing unreachable nonterminal symbols; \item performing \ocaml type inference, if the \oinfer switch is used; \item inlining away nonterminal symbols that are decorated with \dinline. \end{itemize} The library \menhirsdk offers an API for reading a \cmly file. The functor \repo{sdk/cmly_read.mli}{\texttt{MenhirSdk.Cmly\_read.Read}} reads such a file and produces a module whose signature is \repo{sdk/cmly_api.ml}{\texttt{MenhirSdk.Cmly\_api.GRAMMAR}}. This API is not explained in this document; for details, the reader is expected to follow the above links. % TEMPORARY mention the demo generate-printers % as an example of both the SDK and attributes % (possibly make it an independent package) \subsection{Attributes} \label{sec:attributes} Attributes are decorations that can be placed in \mly files. They are ignored by \menhir's back-ends, but are written to \cmly files, thus can be exploited by other tools, via \menhirsdk. An attribute consists of a name and a payload. An attribute name is an \ocaml identifier, such as \texttt{cost}, or a list of \ocaml identifiers, separated with dots, such as \texttt{my.name}. An attribute payload is an \ocaml expression of arbitrary type, such as \texttt{1} or \verb+"&&"+ or \verb+print_int+. Following the syntax of \ocaml's attributes, an attribute's name and payload are separated with one or more spaces, and are delimited by \verb+[@+ and \verb+]+. Thus, \verb+[@cost 1]+ and \verb+[@printer print_int]+ are examples of attributes. An attribute can be attached at one of four levels: % grammar-level attributes, %[@foo ...] % terminal attribute, %token BAR [@foo ...] % nonterminal attribute, bar [@foo ...]: ... % producer attribute, e = expr [@foo ...] \begin{enumerate} \item An attribute can be attached with the grammar. Such an attribute must be preceded with a \verb+%+ sign and must appear in the declarations section (\sref{sec:decls}). For example, the following is a valid declaration: \begin{verbatim} %[@trace true] \end{verbatim} \item An attribute can be attached with a terminal symbol. Such an attribute must follow the declaration of this symbol. For example, the following is a valid declaration of the terminal symbol \verb+INT+: \begin{verbatim} %token INT [@cost 0] [@printer print_int] \end{verbatim} \item An attribute can be attached with a nonterminal symbol. Such an attribute must appear inside the rule that defines this symbol, immediately after the name of this symbol. For instance, the following is a valid definition of the nonterminal symbol \verb+expr+: \begin{verbatim} expr [@default EConst 0]: i = INT { EConst i } | e1 = expr PLUS e2 = expr { EAdd (e1, e2) } \end{verbatim} An attribute can be attached with a parameterized nonterminal symbol: \begin{verbatim} option [@default None] (X): { None } | x = X { Some x } \end{verbatim} An attribute cannot be attached with a nonterminal symbol that is decorated with the \dinline keyword. \item An attribute can be attached with a producer (\sref{sec:producers}), that is, with an occurrence of a terminal or nonterminal symbol in the right-hand side of a production. Such an attribute must appear immediately after the producer. 
For instance, in the following rule, an attribute is attached with the producer \verb+expr*+: \begin{verbatim} exprs: LPAREN es = expr* [@list true] RPAREN { es } \end{verbatim} \end{enumerate} % %attribute declarations: As a convenience, it is possible to attach many attributes with many (terminal and nonterminal) symbols in one go, via an \dattribute declaration, which must be placed in the declarations section (\sref{sec:decls}). For instance, the following declaration attaches both of the attributes \verb+[@cost 0]+ and \verb+[@precious false]+ with each of the symbols \verb+INT+ and \verb+id+: \begin{verbatim} %attribute INT id [@cost 0] [@precious false] \end{verbatim} An \dattribute declaration can be considered syntactic sugar: it is desugared away in terms of the four forms of attributes presented earlier. (The command line switch \oonlypreprocess can be used to see how it is desugared.) % Interaction of %attribute declarations and parameterized nonterminals: If an attribute is attached with a parameterized nonterminal symbol, then, when this symbol is expanded away, the attribute is transmitted to every instance. For instance, in an earlier example, the attribute \verb+[@default None]+ was attached with the parameterized symbol \verb+option+. Then, every instance of \verb+option+, such as \verb+option(expr)+, \verb+option(COMMA)+, and so on, inherits this attribute. To attach an attribute with one specific instance only, one can use an \dattribute declaration. For instance, the declaration \verb+%attribute option(expr) [@cost 10]+ attaches an attribute with the nonterminal symbol \verb+option(expr)+, but not with the symbol \verb+option(COMMA)+. % ------------------------------------------------------------------------------ \section{Interaction with build systems} \label{sec:build} This section explains some details of the compilation workflow, including \ocaml type inference and its repercussions on dependency analysis (\sref{sec:build:infer}) and compilation flags (\sref{sec:build:flags}). % This material should be of interest only to authors of build systems who wish to build support for \menhir into their system. % Ordinary users should skip this section and use a build system that knows about \menhir, such as \hyperlink{dune}{\dune} (preferred) or \ocamlbuild. \subsection{\ocaml type inference and dependency analysis} \label{sec:build:infer} In an ideal world, the semantic actions in a \mly file should be well-typed according to the \ocaml type discipline, and their types should be known to \menhir, which may need this knowledge. (When \oinspection is set, \menhir needs to know the \ocaml type of every nonterminal symbol.) % To address this problem, three approaches exist: \begin{itemize} \item Ignore the problem and let \menhir run without \ocaml type information (\sref{sec:build:infer:none}). \item Let \menhir obtain \ocaml type information by invoking the \ocaml compiler (\sref{sec:build:infer:direct}). \item Let \menhir request and receive \ocaml type information without invoking the \ocaml compiler (\sref{sec:build:infer:indirect}). \end{itemize} \subsubsection{Running without \ocaml type information} \label{sec:build:infer:none} The simplest thing to do is to run \menhir \emph{without} any of the flags described in the following (\sref{sec:build:infer:direct}, \sref{sec:build:infer:indirect}). % Then, the semantic actions are \emph{not} type-checked, and their \ocaml type is \emph{not} inferred. % (This is analogous to using \ocamlyacc.) 
% The drawbacks of this approach are as follows: \begin{itemize} \item A type error in a semantic action is detected only when the \ml file produced by \menhir is type-checked. The location of the type error, as reported by the \ocaml compiler, can be suboptimal. % I think that the type error should be reported inside a semantic % action (we produce # directives for this purpose). Yet I am not % certain that this will be the case. Plus, the type error could be % reported inside Menhir's standard library, whereas when --infer is % used, we place the standard library first, so as to ensure that no % type error is found inside it. (See [infer.ml].) \item Unless a \dtype declaration for every nonterminal symbol is given, the inspection API cannot be generated, that is, \oinspection must be turned off. \end{itemize} \subsubsection{Obtaining \ocaml type information by calling the \ocaml compiler} \label{sec:build:infer:direct} The second approach is to let \menhir invoke the \ocaml compiler so as to type-check the semantic actions and infer their types. This is done by invoking \menhir with the \oinfer switch, as follows. \docswitch{\oinfer} This switch causes the semantic actions to be checked for type consistency \emph{before} the parser is generated. To do so, \menhir generates a mock \ml file, which contains just the semantic actions, and invokes the \ocaml compiler, under the form \verb+ocamlc -i+, so as to type-check this file and infer the types of the semantic actions. \menhir then reads this information and produces real \ml and \mli files. % There is a slight catch with \oinfer. The types inferred by \ocamlc are valid % in the toplevel context, but can change meaning when inserted into a local % context. \docswitch{\oocamlc \nt{command}} This switch controls how \ocamlc is invoked. It allows setting both the name of the executable and the command line options that are passed to it. \docskip One difficulty with this approach is that the \ocaml compiler usually needs to consult a few \texttt{.cm[iox]} files. Indeed, if the \mly file contains a reference to an external \ocaml module, say \texttt{A}, then the \ocaml compiler typically needs to read one or more files named \texttt{A.cm[iox]}. This implies that these files must have been created first. But how is one supposed to know, exactly, which files should be created first? One must scan the \mly file so as to find out which external modules it depends upon. In other words, a dependency analysis is required. This analysis can be carried out by invoking \menhir with the \odepend switch, as follows. \docswitch{\odepend} This switch causes \menhir to generate dependency information for use in conjunction with \make. When invoked in this mode, \menhir does not generate a parser. Instead, it examines the grammar specification and prints a list of prerequisites for the targets \nt{basename}\texttt{.cm[iox]}, \nt{basename}\texttt{.ml}, and \nt{basename}\texttt{.mli}. This list is intended to be textually included within a \Makefile. % % It is important to note that \nt{basename}\texttt{.ml} and % \nt{basename}\texttt{.mli} can have \texttt{.cm[iox]} prerequisites. This is % because, when the \oinfer switch is used, \menhir infers types by invoking % \ocamlc, and \ocamlc itself requires the \ocaml modules that the grammar % specification depends upon to have been compiled first. % To produce this list, \menhir generates a mock \ml file, which contains just the semantic actions, invokes \ocamldep, and postprocesses its output. 
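As a hedged illustration (the file names are invented, and recipe lines must begin with a tab), the output of \odepend might be exploited in a \Makefile along the following lines:
\begin{verbatim}
# Generate the parser; --infer lets Menhir invoke ocamlc -i.
parser.ml parser.mli: parser.mly
        menhir --infer parser.mly

# Compute dependency information for the above targets,
# and textually include it.
.depend.menhir: parser.mly
        menhir --depend parser.mly > $@

include .depend.menhir
\end{verbatim}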
\docswitch{\orawdepend} This switch is analogous to \odepend. However, in this case, \ocamldep's output is \emph{not} postprocessed by \menhir: it is echoed without change. This switch is not suitable for direct use with \make; it is intended for use with \omake or \ocamlbuild, which perform their own postprocessing.
\docswitch{\oocamldep \nt{command}} This switch controls how \ocamldep is invoked. It allows setting both the name of the executable and the command line options that are passed to it.
\subsubsection{Obtaining \ocaml type information without calling the \ocaml compiler} \label{sec:build:infer:indirect} The third approach is to let \menhir request and receive \ocaml type information \emph{without} allowing \menhir to invoke the \ocaml compiler. There is nothing magic about this: to achieve this, \menhir must be invoked twice, and the \ocaml compiler must be invoked (by the user, or by the build system) in between. This is done as follows.
\docswitch{\oinferwrite \nt{mockfilename}} When invoked in this mode, \menhir does not generate a parser. Instead, it generates a mock \ml file, named \nt{mockfilename}, which contains just the semantic actions. Then, it stops.
\docskip It is then up to the user (or to the build system) to invoke \verb+ocamlc -i+ so as to type-check the mock \ml file and infer its signature. The output of this command should be redirected to some file \nt{sigfilename}. Then, \menhir can be invoked again, as follows.
\docswitch{\oinferread \nt{sigfilename}} When invoked in this mode, \menhir assumes that the file \nt{sigfilename} contains the result of running \verb+ocamlc -i+ on the file \nt{mockfilename}. It reads and parses this file, so as to obtain the \ocaml type of every semantic action, then proceeds normally to generate a parser.
\docskip This protocol was introduced on 2018/05/23; earlier versions of \menhir do not support it. Its existence can be tested as follows:
\docswitch{\oinferprotocolsupported} When invoked with this switch, \menhir immediately terminates with exit code 0. An earlier version of \menhir, which does not support this protocol, would display a help message and terminate with a nonzero exit code.
\subsection{Compilation flags} \label{sec:build:flags} The following switches allow querying \menhir so as to find out which compilation flags should be passed to the \ocaml compiler and linker.
\docswitch{\osuggestcomp} This switch causes \menhir to print a set of suggested compilation flags, and exit. These flags are intended to be passed to the \ocaml compilers (\ocamlc or \ocamlopt) when compiling and linking the parser generated by \menhir. What flags are suggested? In the absence of the \otable switch, no flags are suggested. When \otable is set, a \texttt{-I} flag is suggested, so as to ensure that \menhirlib is visible to the \ocaml compiler.
\docswitch{\osuggestlinkb} This switch causes \menhir to print a set of suggested link flags, and exit. These flags are intended to be passed to \texttt{ocamlc} when producing a bytecode executable. What flags are suggested? In the absence of the \otable switch, no flags are suggested. When \otable is set, the object file \texttt{menhirLib.cma} is suggested, so as to ensure that \menhirlib is linked in.
\docswitch{\osuggestlinko} This switch causes \menhir to print a set of suggested link flags, and exit. These flags are intended to be passed to \texttt{ocamlopt} when producing a native code executable. What flags are suggested? In the absence of the \otable switch, no flags are suggested.
When \otable is set, the object file \texttt{menhirLib.cmxa} is suggested, so as to ensure that \menhirlib is linked in.
\docswitch{\osuggestmenhirlib} This switch causes \menhir to print (the absolute path of) the directory where \menhirlib was installed.
\docswitch{\osuggestocamlfind} This switch is deprecated and may be removed in the future. It always prints \texttt{false}.
% ------------------------------------------------------------------------------
\section{Comparison with \ocamlyacc} % TEMPORARY ideally, we should document the difference in behavior % regarding default reductions (on symbols other than #). Roughly speaking, \menhir is 90\% compatible with \ocamlyacc. Legacy \ocamlyacc grammar specifications are accepted and compiled by \menhir. The resulting parsers run and produce correct parse trees. However, parsers that explicitly invoke functions in the module \texttt{Parsing} behave slightly incorrectly. For instance, the functions that provide access to positions return a dummy position when invoked by a \menhir parser. Porting a grammar specification from \ocamlyacc to \menhir requires replacing all calls to \texttt{Parsing} with new \menhir-specific keywords (\sref{sec:positions}). Here is an incomplete list of the differences between \ocamlyacc and \menhir. The list is roughly sorted by decreasing order of importance. \begin{itemize} \item \menhir allows the definition of a nonterminal symbol to be parameterized (\sref{sec:templates}). A formal parameter can be instantiated with a terminal symbol, a nonterminal symbol, or an anonymous rule (\sref{sec:actual}). A library of standard parameterized definitions (\sref{sec:library}), including options, sequences, and lists, is bundled with \menhir. EBNF syntax is supported: the modifiers \dquestion, \dplus, and \dstar are sugar for options, nonempty lists, and arbitrary lists (\fref{fig:sugar}). \item \ocamlyacc only accepts LALR(1) grammars. \menhir accepts LR(1) grammars, thus avoiding certain artificial conflicts. \item \menhir's \dinline keyword (\sref{sec:inline}) helps avoid or resolve some LR(1) conflicts without artificial modification of the grammar. \item \menhir explains conflicts (\sref{sec:conflicts}) in terms of the grammar, not just in terms of the automaton. \menhir's explanations are believed to be understandable by mere humans. \item \menhir offers an incremental API (in \otable mode only) (\sref{sec:incremental}). This means that the state of the parser can be saved at any point (at no cost) and that parsing can later be resumed from a saved state. \item \menhir offers a set of tools for building a (complete, irredundant) set of invalid input sentences, mapping each such sentence to a (hand-written) error message, and maintaining this set as the grammar evolves (\sref{sec:errors:new}). \item In \ocoq mode, \menhir produces a parser whose correctness and completeness with respect to the grammar can be checked by Coq (\sref{sec:coq}). \item \menhir offers an interpreter (\sref{sec:interpret}) that helps debug grammars interactively. \item \menhir allows grammar specifications to be split over multiple files (\sref{sec:split}). It also allows several grammars to share a single set of tokens. \item \menhir produces reentrant parsers. \item \menhir is able to produce parsers that are parameterized by \ocaml modules. \item \ocamlyacc requires semantic values to be referred to via keywords: \verb+$1+, \verb+$2+, and so on. \menhir allows semantic values to be explicitly named.
\item
\menhir warns about end-of-stream conflicts (\sref{sec:eos}), whereas
\ocamlyacc does not. \menhir warns about productions that are never
reduced, whereas, at least in some cases, \ocamlyacc does not.
\item
\menhir offers an option to typecheck semantic actions \emph{before} a
parser is generated: see \oinfer.
\item
\ocamlyacc produces tables that are interpreted by a piece of C code,
requiring semantic actions to be encapsulated as \ocaml closures and
invoked by C code. \menhir offers a choice between producing tables and
producing code. In either case, no C code is involved.
\item
\menhir makes \ocaml's standard library module \texttt{Parsing} entirely
obsolete. Access to locations is now via keywords (\sref{sec:positions}).
Uses of \verb+raise Parse_error+ within semantic actions are deprecated.
The function \verb+parse_error+ is deprecated. They are replaced with
keywords (\sref{sec:errors}).
\item
\menhir's error handling mechanism (\sref{sec:errors}) is inspired by
\ocamlyacc's, but is not guaranteed to be fully compatible. Error recovery,
also known as re-synchronization, is not supported by \menhir.
\item
The way in which severe conflicts (\sref{sec:conflicts}) are resolved is
not guaranteed to be fully compatible with \ocamlyacc.
\item
\menhir warns about unused \dtoken, \dnonassoc, \dleft, and \dright
declarations. It also warns about \dprec annotations that do not help
resolve a conflict.
\item
\menhir accepts \ocaml-style comments.
\item
\menhir allows \dstart and \dtype declarations to be condensed.
\item
\menhir allows two (or more) productions to share a single semantic action.
\item
\menhir produces better error messages when a semantic action contains
ill-balanced parentheses.
% \item \ocamlyacc allows nonterminal start symbols to start with an uppercase
% letter, and produces invalid \ocaml code in that case. \menhir disallows this.
\item
\ocamlyacc ignores semicolons and commas everywhere. \menhir regards
semicolons and commas as significant, and allows them, or requires them, in
certain well-defined places.
% \item \ocamlyacc ignores multiple definitions of a token, even when two of them are at
% different types. \menhir rejects this.
\item
\ocamlyacc allows \dtype declarations to refer to terminal or non-terminal
symbols, whereas \menhir requires them to refer to non-terminal symbols.
Types can be assigned to terminal symbols with a \dtoken declaration.
\end{itemize}

% ------------------------------------------------------------------------------

\section{Questions and Answers}
\label{sec:qa}

$\mathstrut$ % Ensure correct indentation of the first question. Ugly.
\vspace{-\baselineskip}

\question{Is \menhir faster than \ocamlyacc? What is the speed difference
between \texttt{menhir} and \texttt{menhir -{}-table}?}

A (not quite scientific) benchmark suggests that the parsers produced by
\ocamlyacc and \texttt{menhir -{}-table} have comparable speed, whereas
those produced by \texttt{menhir} are between 2 and 5 times faster. This
benchmark excludes the time spent in the lexer and in the semantic actions.

\question{How do I write \Makefile rules for \menhir?}

This can be a bit tricky. % understatement
If you must do this, see \sref{sec:build}. It is recommended instead to use
a build system with built-in support for \menhir, such as
\hyperlink{dune}{\dune} (preferred) or \ocamlbuild.

\question{How do I use \menhir with \ocamlbuild?}

Pass \verb+-use-menhir+ to \ocamlbuild. To pass options to \menhir, pass
\verb+-menhir "menhir <options>"+ to \ocamlbuild.
To use \menhir's table-based back-end, pass \verb+-menhir "menhir --table"+
to \ocamlbuild, and either pass \verb+-package menhirLib+ to \ocamlbuild or
add the tag \verb+package(menhirLib)+ in the \verb+_tags+ file.

To combine multiple \mly files, say \verb+a.mly+ and \verb+b.mly+, into a
single parser, say \verb+parser.{ml,mli}+, create a file named
\verb+parser.mlypack+ that contains the module names \verb+A B+. See the
directory \distrib{demos/ocamlbuild} for examples.

To deal with \messages files (\sref{sec:errors:new}), use the rules
provided in the file \distrib{demos/ocamlbuild/myocamlbuild.ml}.

% Advanced scenario: to use --only-tokens and -external-tokens,
% use .mlypack + _tags + myocamlbuild.ml. Not explained here,
% but \distrib{demos/ocamlbuild/calc-two} contains an example.

\hypertarget{dune}{%
\question{How do I use \menhir with \dune?}}

Please use \dune version 1.4.0 or newer, as it has appropriate built-in
rules for Menhir parsers. In the simplest scenario, where the parser
resides in a single source file \texttt{parser.mly}, the \texttt{dune} file
should contain a ``stanza'' along the following lines:
\begin{verbatim}
(menhir
  (modules parser)
  (flags --explain --dump)
  (infer true)
)
\end{verbatim}
%
Ordinary command line switches, like \oexplain and \odump, are passed as
part of the \texttt{flags} line, as done above.
%
The \oinfer switch has special status and should not be used directly;
instead, write \texttt{(infer true)} or \texttt{(infer false)}, as done
above. (The default is \texttt{true}.)
%
The \otable switch can also be listed as part of the \texttt{flags} line;
if you do so, then you must add \texttt{menhirLib} to the list of libraries
that your code requires, as in the following example:
%
\begin{verbatim}
(executable
  (name myexecutable)
  (libraries menhirLib)
)
\end{verbatim}
%
The directory \distrib{demos}
% (and others like it)
offers several examples.
%
For more details, see \href
% {https://dune.readthedocs.io/en/stable/}
{https://dune.readthedocs.io/en/stable/dune-files.html#menhir}
{\dune's documentation}.
%
To deal with \messages files (\sref{sec:errors:new}), please use and adapt
the rules found in the file \distrib{src/stage2/dune}.
% It may be necessary to specify which version of the Menhir build rules
% one wishes to use. This is done by writing, e.g.
% \begin{verbatim}
% (using menhir 2.0)
% \end{verbatim}
% at the top level of the \texttt{dune-project} file.
% However, my understanding is that this is usually not necessary.
% \dune will automatically add this line for us
% when a project is first compiled.

\question{My \mly file begins with a module alias declaration
\texttt{module F = Foo}. Because of this, the \mli file generated by Menhir
contains references to \texttt{F} instead of \texttt{Foo}. This does not
make sense!}
%
Beginning with \menhir 20200525, \menhir prefers to use the types inferred
by the \ocaml compiler over the types provided by the user in \dtype
declarations. (This may sound strange, but these types can differ in some
situations that involve polymorphic variants. Using the inferred type is
required for type soundness.)
%
In the presence of a~module alias declaration such as
\texttt{module F = Foo}, OCaml can infer types that begin with \texttt{F.}
instead of \texttt{Foo.}, and Menhir currently does not detect that
\texttt{F} is a local name.
%
The suggested fix is to avoid placing module alias declarations in \mly
files.
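For instance (a hypothetical illustration, where \texttt{Foo.make} stands
for a function of some external module \texttt{Foo}), instead of writing:
\begin{verbatim}
%{ module F = Foo %}
%%
expr: e = factor { F.make e }
\end{verbatim}
it is preferable to use fully qualified names in the semantic actions:
\begin{verbatim}
%%
expr: e = factor { Foo.make e }
\end{verbatim}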
%
% fp: If people insist on using module aliases, the simplest way of supporting
% them would be to generate a module alias definition both in the .ml file
% and in the .mli file. This could be done by adding a new %module_alias
% declaration to Menhir.

\question{\menhir reports \emph{more} shift/reduce conflicts than
\ocamlyacc! How come?}

\ocamlyacc sometimes merges two states of the automaton that \menhir
considers distinct. This happens when the grammar is not LALR(1). If these
two states happen to contain a shift/reduce conflict, then \menhir reports
two conflicts, while \ocamlyacc only reports one. Of course, the two
conflicts are very similar, so fixing one will usually fix the other as
well.

\question{I do not use \ocamllex. Is there an API that does not involve
lexing buffers?}

Like \ocamlyacc, \menhir produces parsers whose monolithic API
(\sref{sec:monolithic}) is intended for use with \ocamllex. However, it is
possible to convert them, after the fact, to a simpler, revised API. In the
revised API, there are no lexing buffers, and a lexer is just a function
from unit to tokens. Converters are provided by the library module
\menhirlibconvert. This can be useful, for instance, for users of
\texttt{sedlex}, the Unicode-friendly lexer generator. Also, please note
that \menhir's incremental API (\sref{sec:incremental}) does not mention
the type \verb+Lexing.lexbuf+. In this API, the parser expects to be
supplied with triples of a token and start/end positions of type
\verb+Lexing.position+.

\question{Is there other useful magic in \menhirlib?}

There is some. The module \menhirliberrorreports offers some facilities for
constructing syntax error messages. The module \menhirliblexerutil offers
facilities for extracting the position of a syntax error out of the lexing
buffer and displaying it in a readable way.

\question{I need both \dinline and non-\dinline versions of a non-terminal
symbol. Is this possible?}

Define an \dinline version first, then use it to define a non-\dinline
version, like this:
\begin{verbatim}
%inline ioption(X):
  (* nothing *) { None }
| x = X { Some x }
option(X):
  o = ioption(X) { o }
\end{verbatim}
This can work even in the presence of recursion, as illustrated by the
following definition of (reversed, left-recursive, possibly empty) lists:
\begin{verbatim}
%inline irevlist(X):
  (* nothing *) { [] }
| xs = revlist(X) x = X { x :: xs }
revlist(X):
  xs = irevlist(X) { xs }
\end{verbatim}
The definition of \verb+irevlist+ is expanded into the definition of
\verb+revlist+, so in the end, \verb+revlist+ receives its normal,
recursive definition. One can then view \verb+irevlist+ as a variant of
\verb+revlist+ that is inlined one level deep.
% Intentionally do not call this "list", because people may copy-paste this
% definition, and will end up unintentionally redefining the meaning of *.

\question{Can I ship a generated parser while avoiding a dependency on
\menhirlib?}

Yes. One option is to use the code-based back-end (that is, to not use
\otable). In this case, the generated parser is self-contained. Another
option is to use the table-based back-end (that is, use \otable) and
include a copy of the files \verb+menhirLib.{ml,mli}+ together with the
generated parser. The command \texttt{menhir \osuggestmenhirlib} will tell
you where to find these source files.

\question{Why is \texttt{\$startpos} off towards the left? It seems to
include some leading whitespace.}

Indeed, as of 2015/11/04, the computation of positions has changed so as to
match \ocamlyacc's behavior.
As a result, \texttt{\$startpos} can now appear to be too far off to the
left. This is explained in \sref{sec:positions}. In short, the solution is
to use \verb+$symbolstartpos+ instead.

\question{Can I pretty-print a grammar in ASCII, HTML, or \LaTeX{} format?}

Yes. Have a look at \texttt{obelisk} \cite{obelisk}.

\question{Does \menhir support mid-rule actions?}

Yes. See \nt{midrule} and its explanation in \sref{sec:library}.

% ------------------------------------------------------------------------------

\section{Technical background}

After experimenting with Knuth's canonical LR(1)
technique~\cite{knuth-lr-65}, we found that it \emph{really} is not
practical, even on today's computers. For this reason, \menhir implements a
slightly modified version of Pager's algorithm~\cite{pager-77}, which
merges states on the fly if it can be proved that no reduce/reduce
conflicts will arise as a consequence of this decision. This is how \menhir
avoids the so-called \emph{mysterious} conflicts created by LALR(1) parser
generators~\cite[section 5.7]{bison}.

\menhir's algorithm for explaining conflicts is inspired by DeRemer and
Pennello's~\cite{deremer-pennello-82} and adapted for use with Pager's
construction technique.

By default, \menhir produces code, as opposed to tables. This approach has
been explored before~\cite{bhamidipaty-proebsting-98,horspool-faster-90}.
\menhir performs some static analysis of the automaton in order to produce
more compact code.

When asked to produce tables, \menhir performs compression via first-fit
row displacement, as described by Tarjan and Yao~\cite{tarjan-yao-79}.
Double displacement is not used. The action table is made sparse by
factoring out an error matrix, as suggested by Dencker, Dürre, and
Heuft~\cite{dencker-84}.

The type-theoretic tricks that triggered our interest in LR
parsers~\cite{pottier-regis-gianas-typed-lr} are not implemented in
\menhir. In the beginning, we did not implement them because the \ocaml
compiler did not at the time offer generalized algebraic data types
(GADTs). Today, \ocaml has GADTs, but, as the saying goes, ``if it ain't
broken, don't fix it''.

The main ideas behind the Coq back-end are described in a paper by Jourdan,
Pottier and Leroy~\cite{jourdan-leroy-pottier-12}. The C11 parser in the
CompCert compiler~\cite{compcert} is constructed by Menhir and verified by
Coq, following this technique. How to construct a correct C11 parser using
Menhir is described by Jourdan and Pottier~\cite{jourdan-pottier-17}.

The approach to error reports presented in \sref{sec:errors:new} was
proposed by Jeffery~\cite{jeffery-03} and further explored by
Pottier~\cite{pottier-reachability-cc-2016}.

% ------------------------------------------------------------------------------

\section{Acknowledgements}

\menhir's interpreter (\ointerpret) and table-based back-end (\otable) were
implemented by Guillaume Bau, Raja Boujbel, and François Pottier. The
project was generously funded by Jane Street Capital, LLC through the
``OCaml Summer Project'' initiative.

Frédéric Bour provided motivation and an initial implementation for the
incremental API, for the inspection API, for attributes, and for
\menhirsdk. \href{https://github.com/ocaml/merlin}{Merlin}, an emacs mode
for \ocaml, contains an impressive incremental, syntax-error-tolerant
\ocaml parser, which is based on \menhir and has been a driving force for
\menhir's APIs.

Jacques-Henri Jourdan designed and implemented the Coq back-end and did the
Coq proofs for it.
Gabriel Scherer provided motivation for investigating Jeffery's technique.

% ------------------------------------------------------------------------------

% Bibliography.

\bibliographystyle{plain}
\bibliography{local}

\end{document}

% LocalWords: Yann Régis Gianas Regis inria Menhir filename mly basename Coq
% LocalWords: coq vy tt Coq's iox Menhir's nonterminal graphviz nullable calc
% LocalWords: inline postprocessed postprocessing ocamlc bytecode linkpkg cmo
% LocalWords: menhirLib ocamlopt cmx qa ocamlrun runtime uid productiongroups
% LocalWords: prec Actuals parameterization Parameterizing ds actuals plist xs
% LocalWords: loption LPAREN RPAREN Inlining inlined inlining lp ioption bool
% LocalWords: boption sep nonassociative multi basicshiftreduce lookahead decl
% LocalWords: UIDENT LIDENT decls tycon expr exprs basiceos basiceosdump lex
% LocalWords: curr Lexing lexbuf pos cnum startpos endpos startofs endofs LALR
% LocalWords: syntaxerror whitespace EOL cst API lexing MenhirInterpreter pc
% LocalWords: InputNeeded HandlingError env CompCert Aut se nat init cparser
% LocalWords: validator subdirectory EBNF reentrant eos typecheck menhir ulex
% LocalWords: DeRemer Pennello's Tarjan Yao Dencker Dürre Heuft Bau Raja LLC
% LocalWords: Acknowledgements Boujbel Frédéric Bour
menhir-20210929/doc/manual001.png000066400000000000000000000103331412503066000162140ustar00rootroot00000000000000
[binary PNG data omitted]
menhir-20210929/doc/manual002.png000066400000000000000000000100361412503066000162150ustar00rootroot00000000000000
[binary PNG data omitted]
menhir-20210929/doc/menhir.1
.\" for manpage-specific macros, see man(7)
.SH NAME
menhir \- an LR(1) parser generator for OCaml
.SH SYNOPSIS
.B menhir
.RI [ options ] " files"
.SH DESCRIPTION
.B menhir
is an LR(1) parser generator for the OCaml programming language.
That is, Menhir compiles LR(1) grammar specifications down to OCaml code.
It is mostly compatible with
.BR ocamlyacc (1).
.SH OPTIONS
.TP
.B \-h, \-\-help
Show summary of options.
.TP
.BI \-b,\ \-\-base\ basename
Specifies a base name for the output file(s).
.TP
.B \-\-canonical
Construct a canonical Knuth LR(1) automaton.
.TP
.B \-\-cmly
Write the grammar and automaton to
.IR basename .cmly.
.TP
.B \-\-comment
Include comments in the generated code.
.TP
.BI \-\-compare\-errors\ file1\ \-\-compare\-errors\ file2
Compare two .messages files.
.TP
.BI \-\-compile\-errors\ file
Compile a .messages file to OCaml code.
.TP
.B \-\-coq
Generate a formally verified parser, in Coq.
.TP
.BI \-\-coq\-lib\-path\ path
How to qualify references to MenhirLib.
.TP
.B \-\-coq\-lib\-no\-path
Do not qualify references to MenhirLib.
.TP
.B \-\-coq\-no\-actions
Ignore semantic actions in the Coq output.
.TP
.B \-\-coq\-no\-complete
Do not generate a proof of completeness.
.TP
.B \-\-depend
Invoke ocamldep and display dependencies.
.TP
.B \-\-dump
Describe the automaton in
.IR basename .automaton.
.TP
.BI \-\-echo\-errors\ file
Echo the sentences in a .messages file.
.TP
.B \-\-explain
Explain conflicts in
.IR basename .conflicts.
.TP
.BI \-\-external\-tokens\ module
Import token type definition from
.IR module .
.TP
.B \-\-fixed\-exception
Declares Error = Parsing.Parse_error.
.TP
.B \-\-graph
Write grammar's dependency graph to
.IR basename .dot.
.TP
.B \-\-infer
Invoke ocamlc for ahead of time type inference.
.TP
.B \-\-infer\-protocol\-supported
Stop with exit code 0.
.TP
.BI \-\-infer\-write\-query\ file
Write mock .ml file.
.TP
.BI \-\-infer\-read\-reply\ file
Read inferred .mli file.
.TP
.B \-\-inspection
Generate the inspection API.
.TP
.B \-\-interpret
Interpret the sentences provided on stdin.
.TP
.B \-\-interpret\-show\-cst
Show a concrete syntax tree upon acceptance.
.TP
.B \-\-interpret\-error
Interpret an error sentence provided on stdin.
.TP
.B \-\-lalr
Construct an LALR(1) automaton.
.TP
.BI \-la,\ \-\-log\-automaton\ level
Log information about the automaton.
.TP
.BI \-lc,\ \-\-log\-code\ level
Log information about the generated code.
.TP
.BI \-lg,\ \-\-log\-grammar\ level
Log information about the grammar.
.TP
.B \-\-list\-errors
Produce a list of erroneous inputs.
.TP
.B \-\-no\-dollars
Disallow the use of $i notation.
.TP
.B \-\-no\-inline
Ignore the %inline keyword.
.TP
.B \-\-no\-stdlib
Do not load the standard library.
.TP
.BI \-\-ocamlc\ command
Specifies how ocamlc should be invoked.
.TP
.BI \-\-ocamldep\ command
Specifies how ocamldep should be invoked.
.TP
.B \-\-only\-preprocess
Print a simplified grammar and exit.
.TP
.B \-\-only\-preprocess\-for\-ocamlyacc
Print grammar in ocamlyacc format and exit.
.TP
.B \-\-only\-preprocess\-u
Print grammar with unit actions and exit.
.TP
.B \-\-only\-preprocess\-uu
Print grammar with unit actions and tokens and exit.
.TP
.B \-\-only\-tokens
Generate token type definition only, no code.
.TP
.B \-\-raw\-depend
Invoke ocamldep and echo its raw output.
.TP
.BI \-\-stdlib\ directory
Specify where the standard library lies.
.TP
.B \-\-strict
Warnings about the grammar are errors.
.TP
.B \-\-suggest\-comp\-flags
Suggest compilation flags for ocaml{c,opt}.
.TP
.B \-\-suggest\-link\-flags\-byte
Suggest link flags for ocamlc.
.TP
.B \-\-suggest\-link\-flags\-opt
Suggest link flags for ocamlopt.
.TP
.B \-\-suggest\-menhirLib
Suggest where MenhirLib was installed in source form.
.TP
.B \-\-suggest\-ocamlfind
Deprecated.
.TP
.B \-t, \-\-table
Use the table-based back-end.
.TP
.B \-\-timings
Display internal timings.
.TP
.B \-\-trace
Include tracing instructions in the generated code.
.TP
.B \-\-unused\-precedence\-levels
Do not warn about unused precedence levels.
.TP
.BI \-\-unused\-token\ token
Do not warn that
.IR token
is unused.
.TP
.B \-\-unused\-tokens
Do not warn about any unused token.
.TP
.BI \-\-update\-errors\ file
Update auto-comments in a .messages file.
.TP
.B \-\-version
Show version number and exit.
.TP
.B \-v
Synonymous with
.BR \-\-dump\ \-\-explain .
.SH SEE ALSO
.BR ocaml (1).
.SH AUTHOR
.B menhir
was written by Fran\(,cois Pottier and Yann R\('egis-Gianas.
.PP
This manual page was originally written by Samuel Mimram for the Debian
project (but may be used by others).
menhir-20210929/doc/mymacros.hva000066400000000000000000000001601412503066000163370ustar00rootroot00000000000000
\input{mymacros.sty}
% Ignore \raisebox and \phantom.
\newcommand{\raisebox}[2]{#2}
\newcommand{\phantom}[1]{}
menhir-20210929/doc/mymacros.sty000066400000000000000000000014461412503066000164100ustar00rootroot00000000000000
%; whizzy -macros main.tex

% References to sections, lemmas, theorems, etc.
\newcommand{\sref}[1]{\S\ref{#1}}
\newcommand{\tref}[1]{Theorem~\ref{#1}}
\newcommand{\lemref}[1]{Lemma~\ref{#1}}
\newcommand{\dref}[1]{Definition~\ref{#1}}
\newcommand{\eref}[1]{Example~\ref{#1}}
\newcommand{\fref}[1]{Figure~\ref{#1}}
\newcommand{\aref}[1]{Appendix~\ref{#1}}

% Abbreviations.
\def\etal.{\emph{et al.}}

% Define \citeyear in addition to \cite, if not already defined.
\@ifundefined{citeyear}{
  \@ifundefined{shortcite}{
    \let\citeyear\cite
  }{
    \let\citeyear\shortcite
  }
}{}

% Lambda-calculus syntax.
\newcommand{\ekw}[1]{\mathsf{#1}}
\newcommand{\expr}{e}
\newcommand{\evar}{x}
\newcommand{\eabs}[2]{\lambda#1.#2}
\newcommand{\eapp}[2]{#1\;#2}
\newcommand{\elet}[3]{\ekw{let}\;#1=#2\;\ekw{in}\;#3}
menhir-20210929/doc/new-rule-syntax-blog-post.md000066400000000000000000000325001412503066000213200ustar00rootroot00000000000000
# Parser Construction With Menhir: A Couple Appetizers

This post is a shameless advertisement for Menhir, a parser generator for OCaml. It illustrates Menhir's new input syntax, which was introduced on November 13, 2018. The code fragments shown below are excerpts of valid `.mly` files.

## Ingredients

Suppose I would like to parse and evaluate our good old friends, the arithmetic expressions. For instance, the string `"(3 + 4) * 5 - 9"` should be accepted and evaluated to the value `26`. I assume that I have a lexical analyzer that can chop up this string into a stream of basic tokens, or terminal symbols. My alphabet of terminal symbols is the following:

```
%token <int> INT
%token PLUS MINUS TIMES DIV LPAREN RPAREN EOL
```

Based on this alphabet, I wish to define the syntax of (and obtain a parser for) arithmetic expressions. This exercise may seem old and tired, but let me try and see if I can add some new spice and style to it. In fact, let me do it twice, in two slightly different ways. So, how would you like your arithmetic expressions cooked?

## First Flavor: Hot Off the Oven, With On-The-Fly Evaluation

In this first demo, I wish to evaluate an arithmetic expression, that is, find out which integer value it represents. Thus, I am eventually interested in just an integer result.

```
%start <int> main
%%
```

I wish to recognize an expression followed with an end-of-line symbol:

```
let main :=
  ~ = expr; EOL; <>
```

Here, `~ = expr` is a **pun**, a shorthand for `expr = expr`. It can be read as follows: "read an expression; evaluate it; let the variable `expr` stand for its value".

`<>` is a **point-free semantic action**. In general, it is a shorthand for a semantic action that builds a tuple of the variables that have been bound earlier in the sequence. Thus, in this case, it is a shorthand for the semantic action `{ expr }`.

It is now time to define `expr` and thereby describe the syntax and the meaning of arithmetic expressions. To do this in a nonambiguous manner, one of several traditional approaches is to stratify the syntax in several levels, namely additive expressions, multiplicative expressions, and atomic expressions. These levels are also traditionally known as *expressions*, *terms*, and *factors*.

The topmost level is the level of additive expressions. In other words, an expression is just an additive expression:

```
let expr ==
  additive_expr
```

This definition has no runtime cost: it makes `expr` a synonym for `additive_expr`. In traditional Menhir speak, `expr` is an `%inline` nonterminal symbol. This definition introduces a useful level of indirection: if in the future I decide to introduce a new level in the syntax of expressions, all I have to do is update the definition of `expr`; the places where `expr` is used do not need to be updated. In other words, the fact that "an expression is just an additive expression" is an implementation detail, and should not be revealed.

An additive expression is a nonempty, left-associative list of multiplicative expressions, separated with additive operators:

```
let additive_expr ==
  fold_left(additive_op, multiplicative_expr)
```

What does this mean?
Well, quite obviously, the additive operators are `PLUS` and `MINUS`, which respectively denote addition and subtraction:

```
let additive_op ==
  | PLUS;  { ( + ) }
  | MINUS; { ( - ) }
```

Furthermore, a nonempty list of elements `elem` separated by operators `op` is: either a single element; or a (smaller) such list, followed with an operator, followed with an element. In the second case, the operator must be applied to the sum of the left-hand list and to the right-hand element:

```
let fold_left(op, elem) :=
  | elem
  | sum = fold_left(op, elem); ~ = op; ~ = elem; { op sum elem }
```

This is a **parameterized definition**. Because this definition is recursive, it cannot be macro-expanded away: we cannot use `==` and must instead use `:=`.

So much for additive expressions. This scheme can now be reproduced, one level down: a multiplicative expression is a nonempty, left-associative list of atomic expressions, separated with multiplicative operators.

```
let multiplicative_expr ==
  fold_left(multiplicative_op, atomic_expr)

let multiplicative_op ==
  | TIMES; { ( * ) }
  | DIV;   { ( / ) }
```

There remains to define atomic expressions. In this demo, I wish to allow the use of `MINUS` as a unary operator. Thus, an atomic expression shall be one of the following: an integer literal; an arbitrary expression between parentheses; or an application of a unary operator to an atomic expression.

```
let atomic_expr :=
  | INT
  | delimited(LPAREN, expr, RPAREN)
  | app(unary_op, atomic_expr)
```

There is just one unary operator, `MINUS`, whose meaning is integer negation:

```
let unary_op ==
  | MINUS; { (~- ) }
```

There remains to explain `delimited(left, x, right)` and `app(f, x)`. My main motivation for introducing these auxiliary parameterized symbols is to make the definition of `atomic_expr` prettier.

`delimited(left, x, right)` is in fact part of Menhir's standard library, where it is defined as follows:

```
%public let delimited(left, x, right) ==
  left; ~ = x; right; <>
```

`app(f, x)` recognizes the sequence `f; x`. Its value is the application of the value of `f` to the value of `x`. It is defined as follows:

```
let app(f, x) ==
  ~ = f; ~ = x; { f x }
```

At this point, the arithmetic-expression parser-and-evaluator is complete. Menhir accepts it without complaining, which means that this grammar is in the class LR(1), therefore is **unambiguous**. From it, Menhir generates an LR(1) parser, a deterministic pushdown automaton, whose **performance is predictable**: provided each semantic action takes constant time, its time complexity is linear in the size of the input. Compared with other parsing techniques, guaranteed unambiguity and efficiency are two important strengths of LR(1) parsers.

## Second Flavor: As An Abstract-Syntax-and-Location Millefeuille

Let me now be more ambitious. Instead of evaluating arithmetic expressions on the fly, let me build Abstract Syntax Trees. This opens the door to all kinds of symbolic computation: compilation down to native code, simplification, automatic differentiation, and so on.

In a separate file, say `syntax.ml`, I define the types of the ASTs that I wish to build:

```
type unop =
  | OpNeg

type binop =
  | OpPlus
  | OpMinus
  | OpTimes
  | OpDiv

type 'a located =
  { loc: Lexing.position * Lexing.position; value: 'a }

type expr =
  raw_expr located

and raw_expr =
  | ELiteral of int
  | EUnOp of unop * expr
  | EBinOp of expr * binop * expr
```

The types `unop` and `binop` are simple enumerated types.
In the definition of the type `raw_expr`, one recognizes three kinds of expressions: integer literals, applications of unary operators, and applications of binary operators. There is no data constructor for expressions in parentheses: although parentheses are a necessary feature of the concrete syntax, there is no need to record them in the abstract syntax.

In an abstract syntax tree, I would like every subtree to be annotated with its location in the input text. This would be important, in a real-world programming language implementation, in order to produce error messages that carry a source code location. To achieve this, I use a traditional technique: I define two types, `expr` and `raw_expr`, in a mutually recursive manner. An expression is a raw expression annotated with a location (a pair of a start position and an end position). A raw expression is an integer literal, an application of a unary operator to an expression, or an application of a binary operator to two expressions.

Thus, like a cake, an abstract syntax tree has layered structure: one layer of location information, one layer of structural information, one layer of location information, one layer of structural information, and so on.

Let me now move on to the description of the parser. This time, I am eventually interested in producing an abstract syntax tree.

```
%start <Syntax.expr> main

%{ open Syntax %}

%%
```

The first few definitions are unchanged:

```
let main :=
  ~ = expr; EOL; <>

let expr ==
  additive_expr
```

This time around, I won't use a generic definition along the lines of `fold_left(op, elem)`. It can be done, though; this is left as an exercise for the reader! Here is a direct definition of additive expressions:

```
let additive_expr :=
  | multiplicative_expr
  | located(
      ~ = additive_expr; ~ = additive_op; ~ = multiplicative_expr; <EBinOp>
    )

let additive_op ==
  | PLUS;  { OpPlus }
  | MINUS; { OpMinus }
```

In short, an additive expression is either a multiplicative expression, or an additive expression followed with an additive operator followed with a multiplicative expression.

In the second production, I use three `~` patterns in order to avoid the chore of naming the three semantic values. I again use **a point-free semantic action**: `<EBinOp>` means that the data constructor `EBinOp` should be applied to a tuple of the three semantic values. At the cost of greater verbosity, one could equivalently write `e1 = additive_expr; op = additive_op; e2 = multiplicative_expr; { EBinOp (e1, op, e2) }`.

Now, `EBinOp(e1, op, e2)` has type `raw_expr`, but I would like the semantic value of the nonterminal symbol `additive_expr` to have type `expr`. Therefore, I need to wrap this semantic value in a record of type `raw_expr located`. This can be done in a lightweight and elegant manner just by wrapping the second production with `located(...)`, where the parameterized nonterminal symbol `located(x)` is defined once and for all as follows:

```
let located(x) ==
  ~ = x; { { loc = $loc; value = x } }
```

`located(x)` recognizes the same input as `x`, and wraps the semantic value of type `'a` produced by `x` in a record of type `'a located`.
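In case you are wondering, `$loc` is a standard Menhir keyword that stands for the pair `($startpos, $endpos)`, that is, the start and end positions of the current production's right-hand side. Thus, an equivalent way of writing `located(x)` would be:

```
let located(x) ==
  ~ = x; { { loc = ($startpos, $endpos); value = x } }
```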
One level down, multiplicative expressions are described via the same pattern:

```
let multiplicative_expr :=
  | atomic_expr
  | located(
      ~ = multiplicative_expr; ~ = multiplicative_op; ~ = atomic_expr; <EBinOp>
    )

let multiplicative_op ==
  | TIMES; { OpTimes }
  | DIV;   { OpDiv }
```

Finally, as earlier, an atomic expression is one of: an expression between parentheses; an integer literal; an application of a unary operator to an atomic expression.

```
let atomic_expr :=
  | LPAREN; ~ = expr; RPAREN; <>
  | located(
      | ~ = INT; <ELiteral>
      | ~ = unary_op; ~ = atomic_expr; <EUnOp>
    )

let unary_op ==
  | MINUS; { OpNeg }
```

Only the last two cases in the definition of `atomic_expr` are wrapped in `located(...)`: in the first case, this is not necessary, as the expression already carries a location. Things are formulated in such a way that the computed locations are tight: the source code range associated with a parenthesized subexpression does not include the parentheses. One could of course easily adopt the reverse convention: this is left as another exercise for the reader!

## Behind The Scenes, Or: In The Kitchen

If one expands away all symbols introduced by `==`, expands away all parameterized symbols, and strips away all semantic actions, one finds that the two descriptions presented above represent the same LR(1) grammar, therefore give rise to the same deterministic pushdown automaton. This bare-bones grammar is printed by `menhir --only-preprocess-u`, a useful inspection tool. It is printed in Menhir's traditional syntax. Once manually translated to the modern syntax used in this article, it is as follows:

```
%token DIV EOL INT LPAREN MINUS PLUS RPAREN TIMES
%start main
%%

let main :=
  additive_expr; EOL

let additive_expr :=
  | multiplicative_expr
  | additive_expr; PLUS; multiplicative_expr
  | additive_expr; MINUS; multiplicative_expr

let multiplicative_expr :=
  | atomic_expr
  | multiplicative_expr; TIMES; atomic_expr
  | multiplicative_expr; DIV; atomic_expr

let atomic_expr :=
  | INT
  | LPAREN; additive_expr; RPAREN
  | MINUS; atomic_expr
```

## Spilling the Sauce: A Syntax Error

Suppose my fingers slip, and I make a syntax error in my grammar description:

```
let main :=
  ~ = expr; EOL; <>;
```

Not to worry. Menhir's parser for `.mly` files is a Menhir-generated parser, and produces reasonable syntax error messages. Here, the semicolon that follows the semantic action is invalid:

```
File "parser.mly", line 30, characters 19-20:
Error: syntax error after '<>' and before ';'.
At this point, one of the following is expected:
a bar '|' followed with an expression, or
another rule.
```

Yes, **LR(1) parsers can produce good syntax error messages**.
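## Bonus: Tasting the Result

As a final appetizer, here is a sketch of how the first demo's generated parser might be driven. This fragment is not part of the original demos; it assumes that the lexer lives in a module `Lexer` whose main rule is named `token` and maps a newline character to `EOL`:

```
let () =
  let lexbuf = Lexing.from_string "(3 + 4) * 5 - 9\n" in
  (* Menhir's monolithic API mirrors ocamlyacc's:
     Parser.main : (Lexing.lexbuf -> Parser.token) -> Lexing.lexbuf -> int *)
  let result = Parser.main Lexer.token lexbuf in
  Printf.printf "%d\n" result (* prints 26 *)
```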
## References

The full source code of [the first demo](https://gitlab.inria.fr/fpottier/menhir/blob/master/demos/calc-new-syntax-dune/parser.mly) and [the second demo](https://gitlab.inria.fr/fpottier/menhir/blob/master/demos/calc-ast-dune/parser.mly) is available online. [A summary of the changes](https://gitlab.inria.fr/fpottier/menhir/blob/master/doc/new-rule-syntax-summary.md) between the old and new syntaxes is also available. The syntax of Menhir is of course also documented in the [reference manual](http://gallium.inria.fr/~fpottier/menhir/manual.html#sec5).
menhir-20210929/doc/new-rule-syntax-summary.md000066400000000000000000000072711412503066000211160ustar00rootroot00000000000000
# Differences between the old and new rule syntax

This presentation of the new rule syntax is meant to be useful to a reader who is familiar with the old rule syntax. For a direct, self-contained presentation of the new rule syntax, please consult Menhir's manual.

## Rules

A rule begins with `let`.

| | | |
|----------------------------|---------|--------------------------|
| `foo: ...` | becomes | `let foo := ...` |
| `%inline foo: ...` | becomes | `let foo == ...` |
| `%public foo: ...` | becomes | `%public let foo := ...` |
| `%public %inline foo: ...` | becomes | `%public let foo == ...` |

A rule **cannot** be terminated with a semicolon.

## Sums

Productions are separated with `|`. A leading `|` is permitted, and ignored. For instance, the rule `let a := | A; { () }` has only one production, which recognizes the symbol `A`.

In contrast with the old syntax, two productions **cannot** share a semantic action.

## Sequences

In a sequence `p1 = e1; e2`, the semicolon is **mandatory**. The pattern `p1` binds the semantic values produced by `e1`.

| | | |
|-----------------|-------|-------------------------------|
| `x = foo;` | means | the same as in the old syntax |
| `foo;` | means | `_ = foo;` |
| `~ = foo;` | means | `foo = foo;` |
| `(x, y) = foo;` | means | `_xy = foo;` where the following semantic action is wrapped in `let (x, y) = _xy in ...` |

In contrast with the old syntax, when a sequence ends with a semantic action, the semicolon that precedes the semantic action is still mandatory. For instance, in `let literal := i = INT; { i }`, the semicolon is required.

In contrast with the old syntax, **a sequence need not end with a semantic action**. A sequence can also end with a symbol, whose semantic value is then implicitly returned. For instance,

| | | |
|--------------------------------|-------|------------------|
| `foo` at the end of a sequence | means | `x = foo; { x }` |

This implies that **it becomes easy to define a symbol as a synonym for another symbol** or for a complex expression. For instance,

| | | |
|---------------------------------|---------|-------------------|
| `%inline foo: xs = bar* { xs }` | becomes | `let foo == bar*` |

## Semantic actions

Traditional semantic actions, such as `{ (x, y) }`, remain available.

In addition, so-called **point-free semantic actions** appear. They take the form of a single OCaml identifier between angle brackets. This identifier, which should denote a function or a data constructor, is implicitly **applied** to a tuple of the variables that have been bound earlier in the sequence. If this identifier is omitted, the identity function is assumed. Thus,

| | | |
|-----------------------------------------------------|-------|--------------------------------------------------------|
| `let pair(x, y) := ~ = x; ~ = y; <Pair>` | means | `let pair(x, y) := x = x; y = y; { Pair (x, y) }` |
| `let pair(x, y) := ~ = x; ~ = y; <>` | means | `let pair(x, y) := x = x; y = y; { (x, y) }` |
| `let parenthesized(x) := LPAREN; ~ = x; RPAREN; <>` | means | `let parenthesized(x) := LPAREN; x = x; RPAREN; { x }` |

This often removes the need to invent names for semantic values.

`$1`, `$2`, etc. are forbidden. Semantic values must be named. A semantic value can be named either explicitly or via a `~` pattern.
menhir-20210929/doc/plain.bst000066400000000000000000000460631412503066000156360ustar00rootroot00000000000000
% BibTeX standard bibliography style `plain'
% version 0.99a for BibTeX versions 0.99a or later, LaTeX version 2.09.
% Copyright (C) 1985, all rights reserved.
% Copying of this file is authorized only if either % (1) you make absolutely no changes to your copy, including name, or % (2) if you do make changes, you name it something other than % btxbst.doc, plain.bst, unsrt.bst, alpha.bst, and abbrv.bst. % This restriction helps ensure that all standard styles are identical. % The file btxbst.doc has the documentation for this style. % Modified by Francois.Pottier@inria.fr with support for url field. ENTRY { address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type url volume year } {} { label } INTEGERS { output.state before.all mid.sentence after.sentence after.block } FUNCTION {init.state.consts} { #0 'before.all := #1 'mid.sentence := #2 'after.sentence := #3 'after.block := } STRINGS { s t } FUNCTION {output.nonnull} { 's := output.state mid.sentence = { ", " * write$ } { output.state after.block = { add.period$ write$ newline$ "\newblock " write$ } { output.state before.all = 'write$ { add.period$ " " * write$ } if$ } if$ mid.sentence 'output.state := } if$ s } FUNCTION {output} { duplicate$ empty$ 'pop$ 'output.nonnull if$ } FUNCTION {output.check} { 't := duplicate$ empty$ { pop$ "empty " t * " in " * cite$ * warning$ } 'output.nonnull if$ } FUNCTION {output.bibitem} { newline$ "\bibitem{" write$ cite$ write$ "}" write$ newline$ "" before.all 'output.state := } FUNCTION {fin.entry} { add.period$ write$ newline$ } FUNCTION {new.block} { output.state before.all = 'skip$ { after.block 'output.state := } if$ } FUNCTION {new.sentence} { output.state after.block = 'skip$ { output.state before.all = 'skip$ { after.sentence 'output.state := } if$ } if$ } FUNCTION {not} { { #0 } { #1 } if$ } FUNCTION {and} { 'skip$ { pop$ #0 } if$ } FUNCTION {or} { { pop$ #1 } 'skip$ if$ } FUNCTION {new.block.checka} { empty$ 'skip$ 'new.block if$ } FUNCTION {new.block.checkb} { empty$ swap$ empty$ and 'skip$ 'new.block if$ } FUNCTION {new.sentence.checka} { empty$ 'skip$ 'new.sentence if$ } FUNCTION {new.sentence.checkb} { empty$ swap$ empty$ and 'skip$ 'new.sentence if$ } FUNCTION {field.or.null} { duplicate$ empty$ { pop$ "" } 'skip$ if$ } FUNCTION {emphasize} { duplicate$ empty$ { pop$ "" } { "{\em " swap$ * "}" * } if$ } INTEGERS { nameptr namesleft numnames } FUNCTION {format.names} { 's := #1 'nameptr := s num.names$ 'numnames := numnames 'namesleft := { namesleft #0 > } { s nameptr "{ff~}{vv~}{ll}{, jj}" format.name$ 't := nameptr #1 > { namesleft #1 > { ", " * t * } { numnames #2 > { "," * } 'skip$ if$ t "others" = { " et~al." 
* } { " and " * t * } if$ } if$ } 't if$ nameptr #1 + 'nameptr := namesleft #1 - 'namesleft := } while$ } FUNCTION {format.authors} { author empty$ { "" } { author format.names } if$ } FUNCTION {format.editors} { editor empty$ { "" } { editor format.names editor num.names$ #1 > { ", editors" * } { ", editor" * } if$ } if$ } FUNCTION {format.title} { title empty$ { "" } { url empty$ { title "t" change.case$ } { "\href{" url "}{" title "t" change.case$ "}" * * * * } if$ } if$ } FUNCTION {n.dashify} { 't := "" { t empty$ not } { t #1 #1 substring$ "-" = { t #1 #2 substring$ "--" = not { "--" * t #2 global.max$ substring$ 't := } { { t #1 #1 substring$ "-" = } { "-" * t #2 global.max$ substring$ 't := } while$ } if$ } { t #1 #1 substring$ * t #2 global.max$ substring$ 't := } if$ } while$ } FUNCTION {format.date} { year empty$ { month empty$ { "" } { "there's a month but no year in " cite$ * warning$ month } if$ } { month empty$ 'year { month " " * year * } if$ } if$ } FUNCTION {format.btitle} { url empty$ { title emphasize } { "\href{" url "}{" title emphasize "}" * * * * } if$ } FUNCTION {tie.or.space.connect} { duplicate$ text.length$ #3 < { "~" } { " " } if$ swap$ * * } FUNCTION {either.or.check} { empty$ 'pop$ { "can't use both " swap$ * " fields in " * cite$ * warning$ } if$ } FUNCTION {format.bvolume} { volume empty$ { "" } { "volume" volume tie.or.space.connect series empty$ 'skip$ { " of " * series emphasize * } if$ "volume and number" number either.or.check } if$ } FUNCTION {format.number.series} { volume empty$ { number empty$ { series field.or.null } { output.state mid.sentence = { "number" } { "Number" } if$ number tie.or.space.connect series empty$ { "there's a number but no series in " cite$ * warning$ } { " in " * series * } if$ } if$ } { "" } if$ } FUNCTION {format.edition} { edition empty$ { "" } { output.state mid.sentence = { edition "l" change.case$ " edition" * } { edition "t" change.case$ " edition" * } if$ } if$ } INTEGERS { multiresult } FUNCTION {multi.page.check} { 't := #0 'multiresult := { multiresult not t empty$ not and } { t #1 #1 substring$ duplicate$ "-" = swap$ duplicate$ "," = swap$ "+" = or or { #1 'multiresult := } { t #2 global.max$ substring$ 't := } if$ } while$ multiresult } FUNCTION {format.pages} { pages empty$ { "" } { pages multi.page.check { "pages" pages n.dashify tie.or.space.connect } { "page" pages tie.or.space.connect } if$ } if$ } FUNCTION {format.vol.num.pages} { volume field.or.null number empty$ 'skip$ { "(" number * ")" * * volume empty$ { "there's a number but no volume in " cite$ * warning$ } 'skip$ if$ } if$ pages empty$ 'skip$ { duplicate$ empty$ { pop$ format.pages } { ":" * pages n.dashify * } if$ } if$ } FUNCTION {format.chapter.pages} { chapter empty$ 'format.pages { type empty$ { "chapter" } { type "l" change.case$ } if$ chapter tie.or.space.connect pages empty$ 'skip$ { ", " * format.pages * } if$ } if$ } FUNCTION {format.in.ed.booktitle} { booktitle empty$ { "" } { editor empty$ { "In " booktitle emphasize * } { "In " format.editors * ", " * booktitle emphasize * } if$ } if$ } FUNCTION {empty.misc.check} { author empty$ title empty$ howpublished empty$ month empty$ year empty$ note empty$ and and and and and key empty$ not and { "all relevant fields are empty in " cite$ * warning$ } 'skip$ if$ } FUNCTION {format.thesis.type} { type empty$ 'skip$ { pop$ type "t" change.case$ } if$ } FUNCTION {format.tr.number} { type empty$ { "Technical Report" } 'type if$ number empty$ { "t" change.case$ } { number tie.or.space.connect } if$ } 
FUNCTION {format.article.crossref} { key empty$ { journal empty$ { "need key or journal for " cite$ * " to crossref " * crossref * warning$ "" } { "In {\em " journal * "\/}" * } if$ } { "In " key * } if$ " \cite{" * crossref * "}" * } FUNCTION {format.crossref.editor} { editor #1 "{vv~}{ll}" format.name$ editor num.names$ duplicate$ #2 > { pop$ " et~al." * } { #2 < 'skip$ { editor #2 "{ff }{vv }{ll}{ jj}" format.name$ "others" = { " et~al." * } { " and " * editor #2 "{vv~}{ll}" format.name$ * } if$ } if$ } if$ } FUNCTION {format.book.crossref} { volume empty$ { "empty volume in " cite$ * "'s crossref of " * crossref * warning$ "In " } { "Volume" volume tie.or.space.connect " of " * } if$ editor empty$ editor field.or.null author field.or.null = or { key empty$ { series empty$ { "need editor, key, or series for " cite$ * " to crossref " * crossref * warning$ "" * } { "{\em " * series * "\/}" * } if$ } { key * } if$ } { format.crossref.editor * } if$ " \cite{" * crossref * "}" * } FUNCTION {format.incoll.inproc.crossref} { editor empty$ editor field.or.null author field.or.null = or { key empty$ { booktitle empty$ { "need editor, key, or booktitle for " cite$ * " to crossref " * crossref * warning$ "" } { "In {\em " booktitle * "\/}" * } if$ } { "In " key * } if$ } { "In " format.crossref.editor * } if$ " \cite{" * crossref * "}" * } FUNCTION {article} { output.bibitem format.authors "author" output.check new.block format.title "title" output.check new.block crossref missing$ { journal emphasize "journal" output.check format.vol.num.pages output format.date "year" output.check } { format.article.crossref output.nonnull format.pages output } if$ new.block note output fin.entry } FUNCTION {book} { output.bibitem author empty$ { format.editors "author and editor" output.check } { format.authors output.nonnull crossref missing$ { "author and editor" editor either.or.check } 'skip$ if$ } if$ new.block format.btitle "title" output.check crossref missing$ { format.bvolume output new.block format.number.series output new.sentence publisher "publisher" output.check address output } { new.block format.book.crossref output.nonnull } if$ format.edition output format.date "year" output.check new.block note output fin.entry } FUNCTION {booklet} { output.bibitem format.authors output new.block format.title "title" output.check howpublished address new.block.checkb howpublished output address output format.date output new.block note output fin.entry } FUNCTION {inbook} { output.bibitem author empty$ { format.editors "author and editor" output.check } { format.authors output.nonnull crossref missing$ { "author and editor" editor either.or.check } 'skip$ if$ } if$ new.block format.btitle "title" output.check crossref missing$ { format.bvolume output format.chapter.pages "chapter and pages" output.check new.block format.number.series output new.sentence publisher "publisher" output.check address output } { format.chapter.pages "chapter and pages" output.check new.block format.book.crossref output.nonnull } if$ format.edition output format.date "year" output.check new.block note output fin.entry } FUNCTION {incollection} { output.bibitem format.authors "author" output.check new.block format.title "title" output.check new.block crossref missing$ { format.in.ed.booktitle "booktitle" output.check format.bvolume output format.number.series output format.chapter.pages output new.sentence publisher "publisher" output.check address output format.edition output format.date "year" output.check } { 
format.incoll.inproc.crossref output.nonnull format.chapter.pages output } if$ new.block note output fin.entry } FUNCTION {inproceedings} { output.bibitem format.authors "author" output.check new.block format.title "title" output.check new.block crossref missing$ { format.in.ed.booktitle "booktitle" output.check format.bvolume output format.number.series output format.pages output address empty$ { organization publisher new.sentence.checkb organization output publisher output format.date "year" output.check } { address output.nonnull format.date "year" output.check new.sentence organization output publisher output } if$ } { format.incoll.inproc.crossref output.nonnull format.pages output } if$ new.block note output fin.entry } FUNCTION {conference} { inproceedings } FUNCTION {manual} { output.bibitem author empty$ { organization empty$ 'skip$ { organization output.nonnull address output } if$ } { format.authors output.nonnull } if$ new.block format.btitle "title" output.check author empty$ { organization empty$ { address new.block.checka address output } 'skip$ if$ } { organization address new.block.checkb organization output address output } if$ format.edition output format.date output new.block note output fin.entry } FUNCTION {mastersthesis} { output.bibitem format.authors "author" output.check new.block format.title "title" output.check new.block "Master's thesis" format.thesis.type output.nonnull school "school" output.check address output format.date "year" output.check new.block note output fin.entry } FUNCTION {misc} { output.bibitem format.authors output title howpublished new.block.checkb format.title output howpublished new.block.checka howpublished output format.date output new.block note output fin.entry empty.misc.check } FUNCTION {phdthesis} { output.bibitem format.authors "author" output.check new.block format.btitle "title" output.check new.block "PhD thesis" format.thesis.type output.nonnull school "school" output.check address output format.date "year" output.check new.block note output fin.entry } FUNCTION {proceedings} { output.bibitem editor empty$ { organization output } { format.editors output.nonnull } if$ new.block format.btitle "title" output.check format.bvolume output format.number.series output address empty$ { editor empty$ { publisher new.sentence.checka } { organization publisher new.sentence.checkb organization output } if$ publisher output format.date "year" output.check } { address output.nonnull format.date "year" output.check new.sentence editor empty$ 'skip$ { organization output } if$ publisher output } if$ new.block note output fin.entry } FUNCTION {techreport} { output.bibitem format.authors "author" output.check new.block format.title "title" output.check new.block format.tr.number output.nonnull institution "institution" output.check address output format.date "year" output.check new.block note output fin.entry } FUNCTION {unpublished} { output.bibitem format.authors "author" output.check new.block format.title "title" output.check new.block note "note" output.check format.date output fin.entry } FUNCTION {default.type} { misc } MACRO {jan} {"January"} MACRO {feb} {"February"} MACRO {mar} {"March"} MACRO {apr} {"April"} MACRO {may} {"May"} MACRO {jun} {"June"} MACRO {jul} {"July"} MACRO {aug} {"August"} MACRO {sep} {"September"} MACRO {oct} {"October"} MACRO {nov} {"November"} MACRO {dec} {"December"} MACRO {acmcs} {"ACM Computing Surveys"} MACRO {acta} {"Acta Informatica"} MACRO {cacm} {"Communications of the ACM"} MACRO {ibmjrd} {"IBM Journal 
of Research and Development"} MACRO {ibmsj} {"IBM Systems Journal"} MACRO {ieeese} {"IEEE Transactions on Software Engineering"} MACRO {ieeetc} {"IEEE Transactions on Computers"} MACRO {ieeetcad} {"IEEE Transactions on Computer-Aided Design of Integrated Circuits"} MACRO {ipl} {"Information Processing Letters"} MACRO {jacm} {"Journal of the ACM"} MACRO {jcss} {"Journal of Computer and System Sciences"} MACRO {scp} {"Science of Computer Programming"} MACRO {sicomp} {"SIAM Journal on Computing"} MACRO {tocs} {"ACM Transactions on Computer Systems"} MACRO {tods} {"ACM Transactions on Database Systems"} MACRO {tog} {"ACM Transactions on Graphics"} MACRO {toms} {"ACM Transactions on Mathematical Software"} MACRO {toois} {"ACM Transactions on Office Information Systems"} MACRO {toplas} {"ACM Transactions on Programming Languages and Systems"} MACRO {tcs} {"Theoretical Computer Science"} READ FUNCTION {sortify} { purify$ "l" change.case$ } INTEGERS { len } FUNCTION {chop.word} { 's := 'len := s #1 len substring$ = { s len #1 + global.max$ substring$ } 's if$ } FUNCTION {sort.format.names} { 's := #1 'nameptr := "" s num.names$ 'numnames := numnames 'namesleft := { namesleft #0 > } { nameptr #1 > { " " * } 'skip$ if$ s nameptr "{vv{ } }{ll{ }}{ ff{ }}{ jj{ }}" format.name$ 't := nameptr numnames = t "others" = and { "et al" * } { t sortify * } if$ nameptr #1 + 'nameptr := namesleft #1 - 'namesleft := } while$ } FUNCTION {sort.format.title} { 't := "A " #2 "An " #3 "The " #4 t chop.word chop.word chop.word sortify #1 global.max$ substring$ } FUNCTION {author.sort} { author empty$ { key empty$ { "to sort, need author or key in " cite$ * warning$ "" } { key sortify } if$ } { author sort.format.names } if$ } FUNCTION {author.editor.sort} { author empty$ { editor empty$ { key empty$ { "to sort, need author, editor, or key in " cite$ * warning$ "" } { key sortify } if$ } { editor sort.format.names } if$ } { author sort.format.names } if$ } FUNCTION {author.organization.sort} { author empty$ { organization empty$ { key empty$ { "to sort, need author, organization, or key in " cite$ * warning$ "" } { key sortify } if$ } { "The " #4 organization chop.word sortify } if$ } { author sort.format.names } if$ } FUNCTION {editor.organization.sort} { editor empty$ { organization empty$ { key empty$ { "to sort, need editor, organization, or key in " cite$ * warning$ "" } { key sortify } if$ } { "The " #4 organization chop.word sortify } if$ } { editor sort.format.names } if$ } FUNCTION {presort} { type$ "book" = type$ "inbook" = or 'author.editor.sort { type$ "proceedings" = 'editor.organization.sort { type$ "manual" = 'author.organization.sort 'author.sort if$ } if$ } if$ " " * year field.or.null sortify * " " * title field.or.null sort.format.title * #1 entry.max$ substring$ 'sort.key$ := } ITERATE {presort} SORT STRINGS { longest.label } INTEGERS { number.label longest.label.width } FUNCTION {initialize.longest.label} { "" 'longest.label := #1 'number.label := #0 'longest.label.width := } FUNCTION {longest.label.pass} { number.label int.to.str$ 'label := number.label #1 + 'number.label := label width$ longest.label.width > { label 'longest.label := label width$ 'longest.label.width := } 'skip$ if$ } EXECUTE {initialize.longest.label} ITERATE {longest.label.pass} FUNCTION {begin.bib} { preamble$ empty$ 'skip$ { preamble$ write$ newline$ } if$ "\begin{thebibliography}{" longest.label * "}" * write$ newline$ } EXECUTE {begin.bib} EXECUTE {init.state.consts} ITERATE {call.type$} FUNCTION {end.bib} { newline$ 
"\end{thebibliography}" write$ newline$ } EXECUTE {end.bib} menhir-20210929/doc/sigplanconf.cls000066400000000000000000000712071412503066000170250ustar00rootroot00000000000000%----------------------------------------------------------------------------- % % LaTeX Class/Style File % % Name: sigplanconf.cls % Purpose: A LaTeX 2e class file for SIGPLAN conference proceedings. % This class file supercedes acm_proc_article-sp, % sig-alternate, and sigplan-proc. % % Author: Paul C. Anagnostopoulos % Windfall Software % 978 371-2316 % paul@windfall.com % % Created: 12 September 2004 % % Revisions: See end of file. % %----------------------------------------------------------------------------- \NeedsTeXFormat{LaTeX2e}[1995/12/01] \ProvidesClass{sigplanconf}[2005/03/07 v0.93 ACM SIGPLAN Proceedings] % The following few pages contain LaTeX programming extensions adapted % from the ZzTeX macro package. % Token Hackery % ----- ------- \def \@expandaftertwice {\expandafter\expandafter\expandafter} \def \@expandafterthrice {\expandafter\expandafter\expandafter\expandafter \expandafter\expandafter\expandafter} % This macro discards the next token. \def \@discardtok #1{}% token % This macro removes the `pt' following a dimension. {\catcode `\p = 12 \catcode `\t = 12 \gdef \@remover #1pt{#1} } % \catcode % This macro extracts the contents of a macro and returns it as plain text. % Usage: \expandafter\@defof \meaning\macro\@mark \def \@defof #1:->#2\@mark{#2} % Control Sequence Names % ------- -------- ----- \def \@name #1{% {\tokens} \csname \expandafter\@discardtok \string#1\endcsname} \def \@withname #1#2{% {\command}{\tokens} \expandafter#1\csname \expandafter\@discardtok \string#2\endcsname} % Flags (Booleans) % ----- ---------- % The boolean literals \@true and \@false are appropriate for use with % the \if command, which tests the codes of the next two characters. \def \@true {TT} \def \@false {FL} \def \@setflag #1=#2{\edef #1{#2}}% \flag = boolean % IF and Predicates % -- --- ---------- % A "predicate" is a macro that returns \@true or \@false as its value. % Such values are suitable for use with the \if conditional. For example: % % \if \oddp{\x} \else \fi % A predicate can be used with \@setflag as follows: % % \@setflag \flag = {} % Here are the predicates for TeX's repertoire of conditional % commands. These might be more appropriately interspersed with % other definitions in this module, but what the heck. % Some additional "obvious" predicates are defined. 
\def \eqlp #1#2{\ifnum #1 = #2\@true \else \@false \fi} \def \neqlp #1#2{\ifnum #1 = #2\@false \else \@true \fi} \def \lssp #1#2{\ifnum #1 < #2\@true \else \@false \fi} \def \gtrp #1#2{\ifnum #1 > #2\@true \else \@false \fi} \def \zerop #1{\ifnum #1 = 0\@true \else \@false \fi} \def \onep #1{\ifnum #1 = 1\@true \else \@false \fi} \def \posp #1{\ifnum #1 > 0\@true \else \@false \fi} \def \negp #1{\ifnum #1 < 0\@true \else \@false \fi} \def \oddp #1{\ifodd #1\@true \else \@false \fi} \def \evenp #1{\ifodd #1\@false \else \@true \fi} \def \rangep #1#2#3{\if \orp{\lssp{#1}{#2}}{\gtrp{#1}{#3}}\@false \else \@true \fi} \def \tensp #1{\rangep{#1}{10}{19}} \def \dimeqlp #1#2{\ifdim #1 = #2\@true \else \@false \fi} \def \dimneqlp #1#2{\ifdim #1 = #2\@false \else \@true \fi} \def \dimlssp #1#2{\ifdim #1 < #2\@true \else \@false \fi} \def \dimgtrp #1#2{\ifdim #1 > #2\@true \else \@false \fi} \def \dimzerop #1{\ifdim #1 = 0pt\@true \else \@false \fi} \def \dimposp #1{\ifdim #1 > 0pt\@true \else \@false \fi} \def \dimnegp #1{\ifdim #1 < 0pt\@true \else \@false \fi} \def \vmodep {\ifvmode \@true \else \@false \fi} \def \hmodep {\ifhmode \@true \else \@false \fi} \def \mathmodep {\ifmmode \@true \else \@false \fi} \def \textmodep {\ifmmode \@false \else \@true \fi} \def \innermodep {\ifinner \@true \else \@false \fi} \long\def \codeeqlp #1#2{\if #1#2\@true \else \@false \fi} \long\def \cateqlp #1#2{\ifcat #1#2\@true \else \@false \fi} \long\def \tokeqlp #1#2{\ifx #1#2\@true \else \@false \fi} \long\def \xtokeqlp #1#2{\expandafter\ifx #1#2\@true \else \@false \fi} \long\def \definedp #1{% \expandafter\ifx \csname \expandafter\@discardtok \string#1\endcsname \relax \@false \else \@true \fi} \long\def \undefinedp #1{% \expandafter\ifx \csname \expandafter\@discardtok \string#1\endcsname \relax \@true \else \@false \fi} \def \emptydefp #1{\ifx #1\@empty \@true \else \@false \fi}% {\name} \let \emptylistp = \emptydefp \long\def \emptyargp #1{% {#n} \@empargp #1\@empargq\@mark} \long\def \@empargp #1#2\@mark{% \ifx #1\@empargq \@true \else \@false \fi} \def \@empargq {\@empargq} \def \emptytoksp #1{% {\tokenreg} \expandafter\@emptoksp \the#1\@mark} \long\def \@emptoksp #1\@mark{\emptyargp{#1}} \def \voidboxp #1{\ifvoid #1\@true \else \@false \fi} \def \hboxp #1{\ifhbox #1\@true \else \@false \fi} \def \vboxp #1{\ifvbox #1\@true \else \@false \fi} \def \eofp #1{\ifeof #1\@true \else \@false \fi} % Flags can also be used as predicates, as in: % % \if \flaga \else \fi % Now here we have predicates for the common logical operators. \def \notp #1{\if #1\@false \else \@true \fi} \def \andp #1#2{\if #1% \if #2\@true \else \@false \fi \else \@false \fi} \def \orp #1#2{\if #1% \@true \else \if #2\@true \else \@false \fi \fi} \def \xorp #1#2{\if #1% \if #2\@false \else \@true \fi \else \if #2\@true \else \@false \fi \fi} % Arithmetic % ---------- \def \@increment #1{\advance #1 by 1\relax}% {\count} \def \@decrement #1{\advance #1 by -1\relax}% {\count} % Options % ------- \@setflag \@blockstyle = \@false \@setflag \@copyrightwanted = \@true \@setflag \@explicitsize = \@false \@setflag \@mathtime = \@false \@setflag \@ninepoint = \@true \@setflag \@onecolumn = \@false \@setflag \@preprint = \@false \newcount{\@numheaddepth} \@numheaddepth = 3 \@setflag \@times = \@false % Note that all the dangerous article class options are trapped. 
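% For example (a sketch, not part of this class's own documentation):
% a document that begins with
%
%   \documentclass[preprint,mathtime]{sigplanconf}
%
% causes the declarations below to set \@preprint and \@mathtime to
% \@true before the article class is loaded.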
\DeclareOption{9pt}{\@setflag \@ninepoint = \@true \@setflag \@explicitsize = \@true} \DeclareOption{10pt}{\PassOptionsToClass{10pt}{article}% \@setflag \@ninepoint = \@false \@setflag \@explicitsize = \@true} \DeclareOption{11pt}{\PassOptionsToClass{11pt}{article}% \@setflag \@ninepoint = \@false \@setflag \@explicitsize = \@true} \DeclareOption{12pt}{\@unsupportedoption{12pt}} \DeclareOption{a4paper}{\@unsupportedoption{a4paper}} \DeclareOption{a5paper}{\@unsupportedoption{a5paper}} \DeclareOption{b5paper}{\@unsupportedoption{b5paper}} \DeclareOption{blockstyle}{\@setflag \@blockstyle = \@true} \DeclareOption{cm}{\@setflag \@times = \@false} \DeclareOption{computermodern}{\@setflag \@times = \@false} \DeclareOption{executivepaper}{\@unsupportedoption{executivepaper}} \DeclareOption{indentedstyle}{\@setflag \@blockstyle = \@false} \DeclareOption{landscape}{\@unsupportedoption{landscape}} \DeclareOption{legalpaper}{\@unsupportedoption{legalpaper}} \DeclareOption{letterpaper}{\@unsupportedoption{letterpaper}} \DeclareOption{mathtime}{\@setflag \@mathtime = \@true} \DeclareOption{nocopyrightspace}{\@setflag \@copyrightwanted = \@false} \DeclareOption{notitlepage}{\@unsupportedoption{notitlepage}} \DeclareOption{numberedpars}{\@numheaddepth = 4} \DeclareOption{onecolumn}{\@setflag \@onecolumn = \@true} \DeclareOption{preprint}{\@setflag \@preprint = \@true} \DeclareOption{times}{\@setflag \@times = \@true} \DeclareOption{titlepage}{\@unsupportedoption{titlepage}} \DeclareOption{twocolumn}{\@setflag \@onecolumn = \@false} \DeclareOption*{\PassOptionsToClass{\CurrentOption}{article}} \ExecuteOptions{9pt,indentedstyle,times} \@setflag \@explicitsize = \@false \ProcessOptions \if \@onecolumn \if \notp{\@explicitsize}% \@setflag \@ninepoint = \@false \PassOptionsToClass{11pt}{article}% \fi \PassOptionsToClass{twoside,onecolumn}{article} \else \PassOptionsToClass{twoside,twocolumn}{article} \fi \LoadClass{article} \def \@unsupportedoption #1{% \ClassError{proc}{The standard '#1' option is not supported.}} % Utilities % --------- \newcommand{\setvspace}[2]{% #1 = #2 \advance #1 by -1\parskip} % Document Parameters % -------- ---------- % Page: \setlength{\hoffset}{-1in} \setlength{\voffset}{-1in} \setlength{\topmargin}{1in} \setlength{\headheight}{0pt} \setlength{\headsep}{0pt} \if \@onecolumn \setlength{\evensidemargin}{.75in} \setlength{\oddsidemargin}{.75in} \else \setlength{\evensidemargin}{.75in} \setlength{\oddsidemargin}{.75in} \fi % Text area: \newdimen{\standardtextwidth} \setlength{\standardtextwidth}{42pc} \if \@onecolumn \setlength{\textwidth}{40.5pc} \else \setlength{\textwidth}{\standardtextwidth} \fi \setlength{\topskip}{8pt} \setlength{\columnsep}{2pc} \setlength{\textheight}{54.5pc} % Running foot: \setlength{\footskip}{30pt} % Paragraphs: \if \@blockstyle \setlength{\parskip}{5pt plus .1pt minus .5pt} \setlength{\parindent}{0pt} \else \setlength{\parskip}{0pt} \setlength{\parindent}{12pt} \fi \setlength{\lineskip}{.5pt} \setlength{\lineskiplimit}{\lineskip} \frenchspacing \pretolerance = 400 \tolerance = \pretolerance \setlength{\emergencystretch}{5pt} \clubpenalty = 10000 \widowpenalty = 10000 \setlength{\hfuzz}{.5pt} % Standard vertical spaces: \newskip{\standardvspace} \setvspace{\standardvspace}{5pt plus 1pt minus .5pt} % Margin paragraphs: \setlength{\marginparwidth}{0pt} \setlength{\marginparsep}{0pt} \setlength{\marginparpush}{0pt} \setlength{\skip\footins}{8pt plus 3pt minus 1pt} \setlength{\footnotesep}{9pt} \renewcommand{\footnoterule}{% \hrule width .5\columnwidth height 
.33pt depth 0pt} \renewcommand{\@makefntext}[1]{% \noindent \@makefnmark \hspace{1pt}#1} % Floats: \setcounter{topnumber}{4} \setcounter{bottomnumber}{1} \setcounter{totalnumber}{4} \renewcommand{\fps@figure}{tp} \renewcommand{\fps@table}{tp} \renewcommand{\topfraction}{0.90} \renewcommand{\bottomfraction}{0.30} \renewcommand{\textfraction}{0.10} \renewcommand{\floatpagefraction}{0.75} \setcounter{dbltopnumber}{4} \renewcommand{\dbltopfraction}{\topfraction} \renewcommand{\dblfloatpagefraction}{\floatpagefraction} \setlength{\floatsep}{18pt plus 4pt minus 2pt} \setlength{\textfloatsep}{18pt plus 4pt minus 3pt} \setlength{\intextsep}{10pt plus 4pt minus 3pt} \setlength{\dblfloatsep}{18pt plus 4pt minus 2pt} \setlength{\dbltextfloatsep}{20pt plus 4pt minus 3pt} % Miscellaneous: \errorcontextlines = 5 % Fonts % ----- \if \@times \renewcommand{\rmdefault}{ptm}% \if \@mathtime \usepackage[mtbold,noTS1]{mathtime}% \else %%% \usepackage{mathptm}% \fi \else \relax \fi \if \@ninepoint \renewcommand{\normalsize}{% \@setfontsize{\normalsize}{9pt}{10pt}% \setlength{\abovedisplayskip}{5pt plus 1pt minus .5pt}% \setlength{\belowdisplayskip}{\abovedisplayskip}% \setlength{\abovedisplayshortskip}{3pt plus 1pt minus 2pt}% \setlength{\belowdisplayshortskip}{\abovedisplayshortskip}} \renewcommand{\tiny}{\@setfontsize{\tiny}{5pt}{6pt}} \renewcommand{\scriptsize}{\@setfontsize{\scriptsize}{7pt}{8pt}} \renewcommand{\small}{% \@setfontsize{\small}{8pt}{9pt}% \setlength{\abovedisplayskip}{4pt plus 1pt minus 1pt}% \setlength{\belowdisplayskip}{\abovedisplayskip}% \setlength{\abovedisplayshortskip}{2pt plus 1pt}% \setlength{\belowdisplayshortskip}{\abovedisplayshortskip}} \renewcommand{\footnotesize}{% \@setfontsize{\footnotesize}{8pt}{9pt}% \setlength{\abovedisplayskip}{4pt plus 1pt minus .5pt}% \setlength{\belowdisplayskip}{\abovedisplayskip}% \setlength{\abovedisplayshortskip}{2pt plus 1pt}% \setlength{\belowdisplayshortskip}{\abovedisplayshortskip}} \renewcommand{\large}{\@setfontsize{\large}{11pt}{13pt}} \renewcommand{\Large}{\@setfontsize{\Large}{14pt}{18pt}} \renewcommand{\LARGE}{\@setfontsize{\LARGE}{18pt}{20pt}} \renewcommand{\huge}{\@setfontsize{\huge}{20pt}{25pt}} \renewcommand{\Huge}{\@setfontsize{\Huge}{25pt}{30pt}} \fi % Abstract % -------- \renewenvironment{abstract}{% \section*{Abstract}% \normalsize}{% } % Bibliography % ------------ \renewenvironment{thebibliography}[1] {\section*{\refname \@mkboth{\MakeUppercase\refname}{\MakeUppercase\refname}}% \list{\@biblabel{\@arabic\c@enumiv}}% {\settowidth\labelwidth{\@biblabel{#1}}% \leftmargin\labelwidth \advance\leftmargin\labelsep \@openbib@code \usecounter{enumiv}% \let\p@enumiv\@empty \renewcommand\theenumiv{\@arabic\c@enumiv}}% \small \softraggedright%%%\sloppy \clubpenalty4000 \@clubpenalty \clubpenalty \widowpenalty4000% \sfcode`\.\@m} {\def\@noitemerr {\@latex@warning{Empty `thebibliography' environment}}% \endlist} % Categories % ---------- \@setflag \@firstcategory = \@true \newcommand{\category}[3]{% \if \@firstcategory \paragraph*{Categories and Subject Descriptors}% \@setflag \@firstcategory = \@false \else \unskip ;\hspace{.75em}% \fi \@ifnextchar [{\@category{#1}{#2}{#3}}{\@category{#1}{#2}{#3}[]}} \def \@category #1#2#3[#4]{% {\let \and = \relax #1 [\textit{#2}]% \if \emptyargp{#4}% \if \notp{\emptyargp{#3}}: #3\fi \else :\space \if \notp{\emptyargp{#3}}#3---\fi \textrm{#4}% \fi}} % Copyright Notice % --------- ------ \def \ftype@copyrightbox {8} \def \@toappear {} \def \@permission {} \def \@copyrightspace {% \@float{copyrightbox}[b]% 
\vbox to 1.25in{% \vfill \begin{center}% \@toappear \end{center}}% \end@float} \long\def \toappear #1{% \def \@toappear {\parbox[b]{20pc}{\scriptsize #1}}} %%%\def \toappearbox #1{% %%% \def \@toappear {\raisebox{5pt}{\framebox[20pc]{\parbox[b]{19pc}{#1}}}}} \toappear{% \noindent \@permission \par \vspace{2pt} \noindent \textsl{\@conferencename}\quad \@conferenceinfo \par \@copyrightinfo} \newcommand{\permission}[1]{% \gdef \@permission {#1}} \permission{% Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.} \def \@copyrightinfo {% \if \notp{\emptydefp{\copyrightinfo}}% Copyright \copyright\ \@copyrightyear\ ACM \@copyrightdata\dots \$5.00. \fi} % Enunciations % ------------ \def \@begintheorem #1#2{% {name}{number} \trivlist \item[\hskip \labelsep \textsc{#1 #2.}]% \itshape\selectfont \ignorespaces} \def \@opargbegintheorem #1#2#3{% {name}{number}{title} \trivlist \item[% \hskip\labelsep \textsc{#1\ #2}% \if \notp{\emptyargp{#3}}\nut (#3).\fi]% \itshape\selectfont \ignorespaces} \@setflag \@qeddone = \@false \newenvironment{proof}{% \global\@setflag \@qeddone = \@false \@ifnextchar[{\@titledproof}{\@titledproof[]}}{% \if \notp{\@qeddone}\qed \fi \endtrivlist} \def \@titledproof [#1]{% \trivlist \item[\hskip \labelsep \textsc{Proof% \if \notp{\emptyargp{#1}}\space #1\fi .}]% \ignorespaces} \newcommand{\qed}{% \unskip \kern 6pt {\linethickness{.5pt}\framebox(4,4){}}% \global\@setflag \@qeddone = \@true} \newcommand{\newdef}[2]{% {type}{name} \@withname\@ifdefinable {#1}{% \@definecounter{#1}% \@withname\xdef {the#1}{\@thmcounter{#1}}% \global\@namedef{#1}{\@begindef{#1}{#2}}% \global\@namedef{end#1}{\@endtheorem}}} \def \@begindef #1#2{% {type}{name} \refstepcounter{#1}% \@ifnextchar[{\@titleddef{#1}{#2}}{\@titleddef{#1}{#2}[]}} \def \@titleddef #1#2[#3]{% {type}{name}[title] \trivlist \item[\hskip \labelsep \itshape{#2% \if \notp{\emptyargp{#3}}\space #3\fi .}]% \ignorespaces} % Figures % ------- \@setflag \@caprule = \@true \long\def \@makecaption #1#2{% \addvspace{4pt} \if \@caprule \hrule width \hsize height .33pt \vspace{4pt} \fi \setbox \@tempboxa = \hbox{\@setfigurenumber{#1.}\nut #2}% \if \dimgtrp{\wd\@tempboxa}{\hsize}% \noindent \@setfigurenumber{#1.}\nut #2\par \else \centerline{\box\@tempboxa}% \fi} \newcommand{\nocaptionrule}{% \@setflag \@caprule = \@false} \def \@setfigurenumber #1{% {\rmfamily \bfseries \selectfont #1}} % Hierarchy % --------- \setcounter{secnumdepth}{\@numheaddepth} \newskip{\@sectionaboveskip} \setvspace{\@sectionaboveskip}{10pt plus 3pt minus 2pt} \newskip{\@sectionbelowskip} \if \@blockstyle \setlength{\@sectionbelowskip}{0.1pt}% \else \setlength{\@sectionbelowskip}{4pt}% \fi \renewcommand{\section}{% \@startsection {section}% {1}% {0pt}% {-\@sectionaboveskip}% {\@sectionbelowskip}% {\large \bfseries \raggedright}} \newskip{\@subsectionaboveskip} \setvspace{\@subsectionaboveskip}{8pt plus 2pt minus 2pt} \newskip{\@subsectionbelowskip} \if \@blockstyle \setlength{\@subsectionbelowskip}{0.1pt}% \else \setlength{\@subsectionbelowskip}{4pt}% \fi \renewcommand{\subsection}{% \@startsection% {subsection}% {2}% {0pt}% {-\@subsectionaboveskip}% {\@subsectionbelowskip}% {\normalsize \bfseries \raggedright}} 
\renewcommand{\subsubsection}{% \@startsection% {subsubsection}% {3}% {0pt}% {-\@subsectionaboveskip} {\@subsectionbelowskip}% {\normalsize \bfseries \raggedright}} \newskip{\@paragraphaboveskip} \setvspace{\@paragraphaboveskip}{6pt plus 2pt minus 2pt} \renewcommand{\paragraph}{% \@startsection% {paragraph}% {4}% {0pt}% {\@paragraphaboveskip} {-1em}% {\normalsize \bfseries \if \@times \itshape \fi}} % Standard headings: \newcommand{\acks}{\section*{Acknowledgments}} \newcommand{\keywords}{\paragraph*{Keywords}} \newcommand{\terms}{\paragraph*{General Terms}} % Identification % -------------- \def \@conferencename {} \def \@conferenceinfo {} \def \@copyrightyear {} \def \@copyrightdata {[to be supplied]} \newcommand{\conferenceinfo}[2]{% \gdef \@conferencename {#1}% \gdef \@conferenceinfo {#2}} \newcommand{\copyrightyear}[1]{% \gdef \@copyrightyear {#1}} \let \CopyrightYear = \copyrightyear \newcommand{\copyrightdata}[1]{% \gdef \@copyrightdata {#1}} \let \crdata = \copyrightdata % Lists % ----- \setlength{\leftmargini}{13pt} \setlength\leftmarginii{13pt} \setlength\leftmarginiii{13pt} \setlength\leftmarginiv{13pt} \setlength{\labelsep}{3.5pt} \setlength{\topsep}{\standardvspace} \if \@blockstyle \setlength{\itemsep}{0pt} \setlength{\parsep}{4pt} \else \setlength{\itemsep}{2pt} \setlength{\parsep}{0pt} \fi \renewcommand{\labelitemi}{{\small \centeroncapheight{\textbullet}}} \renewcommand{\labelitemii}{\centeroncapheight{\rule{2.5pt}{2.5pt}}} \renewcommand{\labelitemiii}{$-$} \renewcommand{\labelitemiv}{{\Large \textperiodcentered}} \renewcommand{\@listi}{% \leftmargin = \leftmargini \listparindent = \parindent} \let \@listI = \@listi \renewcommand{\@listii}{% \leftmargin = \leftmarginii \labelwidth = \leftmarginii \advance \labelwidth by -\labelsep \listparindent = \parindent} \renewcommand{\@listiii}{% \leftmargin = \leftmarginiii \labelwidth = \leftmarginiii \advance \labelwidth by -\labelsep \listparindent = \parindent} \renewcommand{\@listiv}{% \leftmargin = \leftmarginiv \labelwidth = \leftmarginiv \advance \labelwidth by -\labelsep \listparindent = \parindent} % Mathematics % ----------- \def \theequation {\arabic{equation}} % Miscellaneous % ------------- \newcommand{\balancecolumns}{% \vfill\eject \global\@colht = \textheight \global\ht\@cclv = \textheight} \newcommand{\nut}{\hspace{.5em}} \newcommand{\softraggedright}{% \let \\ = \@centercr \leftskip = 0pt \rightskip = 0pt plus 10pt} % Program Code % ------- ---- \newcommand{\mono}[1]{% {\@tempdima = \fontdimen2\font \texttt{\spaceskip = 1.1\@tempdima #1}}} % Running Heads and Feet % ------- ----- --- ---- \if \@preprint \def \ps@plain {% \let \@mkboth = \@gobbletwo \let \@evenhead = \@empty \def \@evenfoot {% \reset@font \@conferencename \hfil \thepage \hfil \@formatyear}% \let \@oddhead = \@empty \let \@oddfoot = \@evenfoot} \else \let \ps@plain = \ps@empty \let \ps@headings = \ps@empty \let \ps@myheadings = \ps@empty \fi \def \@formatyear {% \number\year/\number\month/\number\day} % Title Page % ----- ---- \@setflag \@addauthorsdone = \@false \def \@titletext {\@latex@error{No title was provided}{}} \def \@subtitletext {} \newcount{\@authorcount} \newcount{\@titlenotecount} \newtoks{\@titlenotetext} \renewcommand{\title}[1]{% \gdef \@titletext {#1}} \newcommand{\subtitle}[1]{% \gdef \@subtitletext {#1}} \newcommand{\authorinfo}[3]{% {names}{affiliation}{email/URL} \global\@increment \@authorcount \@withname\gdef {\@authorname\romannumeral\@authorcount}{#1}% \@withname\gdef {\@authoraffil\romannumeral\@authorcount}{#2}% 
\@withname\gdef {\@authoremail\romannumeral\@authorcount}{#3}} \renewcommand{\author}[1]{% \@latex@error{The \string\author\space command is obsolete; use \string\authorinfo}{}} \renewcommand{\maketitle}{% \pagestyle{plain}% \if \@onecolumn {\hsize = \standardtextwidth \@maketitle}% \else \twocolumn[\@maketitle]% \fi \@placetitlenotes \if \@copyrightwanted \@copyrightspace \fi} \def \@maketitle {% \begin{center} \let \thanks = \titlenote \noindent \LARGE \bfseries \@titletext \par \vskip 6pt \noindent \Large \@subtitletext \par \vskip 12pt \ifcase \@authorcount \@latex@error{No authors were specified for this paper}{}\or \@titleauthors{i}{}{}\or \@titleauthors{i}{ii}{}\or \@titleauthors{i}{ii}{iii}\or \@titleauthors{i}{ii}{iii}\@titleauthors{iv}{}{}\or \@titleauthors{i}{ii}{iii}\@titleauthors{iv}{v}{}\or \@titleauthors{i}{ii}{iii}\@titleauthors{iv}{v}{vi}\or \@titleauthors{i}{ii}{iii}\@titleauthors{iv}{v}{vi}% \@titleauthors{vii}{}{}\or \@titleauthors{i}{ii}{iii}\@titleauthors{iv}{v}{vi}% \@titleauthors{vii}{viii}{}\or \@titleauthors{i}{ii}{iii}\@titleauthors{iv}{v}{vi}% \@titleauthors{vii}{viii}{ix}\or \@titleauthors{i}{ii}{iii}\@titleauthors{iv}{v}{vi}% \@titleauthors{vii}{viii}{ix}\@titleauthors{x}{}{}% \@titleauthors{i}{ii}{iii}\@titleauthors{iv}{v}{vi}% \@titleauthors{vii}{viii}{ix}\@titleauthors{x}{xi}{}% \@titleauthors{i}{ii}{iii}\@titleauthors{iv}{v}{vi}% \@titleauthors{vii}{viii}{ix}\@titleauthors{x}{xi}{xii}% \else \@latex@error{Cannot handle more than 12 authors}{}% \fi \vspace{1.75pc} \end{center}} \def \@titleauthors #1#2#3{% \if \andp{\emptyargp{#2}}{\emptyargp{#3}}% \noindent \@setauthor{40pc}{#1}{\@false}\par \else\if \emptyargp{#3}% \noindent \@setauthor{17pc}{#1}{\@false}\hspace{3pc}% \@setauthor{17pc}{#2}{\@false}\par \else \noindent \@setauthor{12.5pc}{#1}{\@false}\hspace{2pc}% \@setauthor{12.5pc}{#2}{\@false}\hspace{2pc}% \@setauthor{12.5pc}{#3}{\@true}\par \relax \fi\fi \vspace{20pt}} \def \@setauthor #1#2#3{% \vtop{% \def \and {% \hspace{16pt}} \hsize = #1 \normalfont \centering \large \@name{\@authorname#2}\par \vspace{5pt} \normalsize \@name{\@authoraffil#2}\par \vspace{2pt} \textsf{\@name{\@authoremail#2}}\par}} \def \@maybetitlenote #1{% \if \andp{#1}{\gtrp{\@authorcount}{3}}% \titlenote{See page~\pageref{@addauthors} for additional authors.}% \fi} \newtoks{\@fnmark} \newcommand{\titlenote}[1]{% \global\@increment \@titlenotecount \ifcase \@titlenotecount \relax \or \@fnmark = {\ast}\or \@fnmark = {\dagger}\or \@fnmark = {\ddagger}\or \@fnmark = {\S}\or \@fnmark = {\P}\or \@fnmark = {\ast\ast}% \fi \,$^{\the\@fnmark}$% \edef \reserved@a {\noexpand\@appendtotext{% \noexpand\@titlefootnote{\the\@fnmark}}}% \reserved@a{#1}} \def \@appendtotext #1#2{% \global\@titlenotetext = \expandafter{\the\@titlenotetext #1{#2}}} \newcount{\@authori} \iffalse \def \additionalauthors {% \if \gtrp{\@authorcount}{3}% \section{Additional Authors}% \label{@addauthors}% \noindent \@authori = 4 {\let \\ = ,% \loop \textbf{\@name{\@authorname\romannumeral\@authori}}, \@name{\@authoraffil\romannumeral\@authori}, email: \@name{\@authoremail\romannumeral\@authori}.% \@increment \@authori \if \notp{\gtrp{\@authori}{\@authorcount}} \repeat}% \par \fi \global\@setflag \@addauthorsdone = \@true} \fi \let \addauthorsection = \additionalauthors \def \@placetitlenotes { \the\@titlenotetext} % Utilities % --------- \newcommand{\centeroncapheight}[1]{% {\setbox\@tempboxa = \hbox{#1}% \@measurecapheight{\@tempdima}% % Calculate ht(CAP) - ht(text) \advance \@tempdima by -\ht\@tempboxa % 
------------------ \divide \@tempdima by 2 % 2 \raise \@tempdima \box\@tempboxa}} \newbox{\@measbox} \def \@measurecapheight #1{% {\dimen} \setbox\@measbox = \hbox{ABCDEFGHIJKLMNOPQRSTUVWXYZ}% #1 = \ht\@measbox} \long\def \@titlefootnote #1#2{% \insert\footins{% \reset@font\footnotesize \interlinepenalty\interfootnotelinepenalty \splittopskip\footnotesep \splitmaxdepth \dp\strutbox \floatingpenalty \@MM \hsize\columnwidth \@parboxrestore %%% \protected@edef\@currentlabel{% %%% \csname p@footnote\endcsname\@thefnmark}% \color@begingroup \def \@makefnmark {$^{#1}$}% \@makefntext{% \rule\z@\footnotesep\ignorespaces#2\@finalstrut\strutbox}% \color@endgroup}} % LaTeX Modifications % ----- ------------- \def \@seccntformat #1{% \@name{\the#1}% \@expandaftertwice\@seccntformata \csname the#1\endcsname.\@mark \quad} \def \@seccntformata #1.#2\@mark{% \if \emptyargp{#2}.\fi} % Revision History % -------- ------- % Date Person Ver. Change % ---- ------ ---- ------ % 2004.09.12 PCA 0.1--5 Preliminary development. % 2004.11.18 PCA 0.5 Start beta testing. % 2004.11.19 PCA 0.6 Obsolete \author and replace with % \authorinfo. % Add 'nocopyrightspace' option. % Compress article opener spacing. % Add 'mathtime' option. % Increase text height by 6 points. % 2004.11.28 PCA 0.7 Add 'cm/computermodern' options. % Change default to Times text. % 2004.12.14 PCA 0.8 Remove use of mathptm.sty; it cannot % coexist with latexsym or amssymb. % 2005.01.20 PCA 0.9 Rename class file to sigplanconf.cls. % 2005.03.05 PCA 0.91 Change default copyright data. % 2005.03.06 PCA 0.92 Add at-signs to some macro names. % 2005.03.07 PCA 0.93 The 'onecolumn' option defaults to '11pt', % and it uses the full type width. menhir-20210929/doc/sigplanconf.hva000066400000000000000000000003311412503066000170100ustar00rootroot00000000000000\input{article.hva} \usepackage{hyperref} \usepackage{natbib} \usepackage{amsmath} \newcommand{\authorinfo}[3]{\author{\renewcommand\and{ and }#1\\#2\\\texttt{#3}}} \newcommand{\mathstrut}[1]{#1} \usepackage{style}menhir-20210929/doc/style.hva000066400000000000000000000002531412503066000156500ustar00rootroot00000000000000% Compact layout \newstyle{body}{ max-width:800px; width: 85\%; margin: auto; font-size: 1rem; } \newstyle{pre, .quote}{ margin-left: 2em; font-size: 1rem; } menhir-20210929/doc/version.tex000066400000000000000000000000361412503066000162160ustar00rootroot00000000000000\gdef\menhirversion{20210929} menhir-20210929/doc/whizzy.el000066400000000000000000000003541412503066000157000ustar00rootroot00000000000000(whizzy-add-configuration ".*\.\\(tex\\|sty\\)" '((whizzy-master . "main.tex")) ) (whizzy-add-configuration "main\.tex" '((whizzy . "section -advi \"advi -geometry 1270x1024 -fullwidth -html Start-Document\" -dvicopy dvicopy" )) ) menhir-20210929/doc/whizzy.sh000066400000000000000000000004601412503066000157100ustar00rootroot00000000000000# Include TEXINPUTS setting from Makefile.local. # Do not include all of Makefile.local, because both whizzytex and Makefile # rely on NAME (for different purposes). if [ -f Makefile.local ] then echo "Extracting TEXINPUTS setting from Makefile.local..." `grep TEXINPUTS Makefile.local` fi menhir-20210929/doc/whizzy.sty000066400000000000000000000001371412503066000161160ustar00rootroot00000000000000% Use small pages when whizzytex'ing.
\makeatletter \setlength\textheight{340\p@} \makeatother menhir-20210929/dune000066400000000000000000000001201412503066000141120ustar00rootroot00000000000000(vendored_dirs fix pprint) (data_only_dirs analysis attic headers releases www) menhir-20210929/dune-project000066400000000000000000000003011412503066000155570ustar00rootroot00000000000000(lang dune 2.0) (name menhir) (using menhir 2.0) (version 20210929) (package (name menhirLib) ) (package (name menhirSdk) ) (package (name menhir) ) (package (name coq-menhirlib) ) menhir-20210929/exec.sh000077500000000000000000000016271412503066000145340ustar00rootroot00000000000000#!/bin/bash set -euo pipefail # This script re-runs a specific test, named on the command line. # Examples: # ./exec.sh good/mezzo # ./exec.sh bad/option # If this is a newly created test, then [make depend] should be run first # for dune to know about this test. for name in "$@" do if [[ $name =~ ^good/.* ]] ; then # A positive test. base=${name#good/} rm -f _build/default/test/static/"$name".out dune build @$base # Display the timings. cat _build/default/test/static/"$name".timings elif [[ $name =~ ^bad/.* ]] ; then # A negative test. base=${name#bad/} rm -f _build/default/test/static/"$name".out dune build @$base # Display the output. cat _build/default/test/static/"$name".out else # Unrecognized. echo "Don't know what to do with '$name'." echo "This script handles tests whose name begins with good/ or bad/." exit 1 fi done menhir-20210929/fix/000077500000000000000000000000001412503066000140315ustar00rootroot00000000000000menhir-20210929/fix/.gitignore000066400000000000000000000000361412503066000160200ustar00rootroot00000000000000_build .merlin fix.install *~ menhir-20210929/fix/AUTHORS000066400000000000000000000000731412503066000151010ustar00rootroot00000000000000François Pottier, Inria Paris menhir-20210929/fix/CHANGES.md000066400000000000000000000027031412503066000154250ustar00rootroot00000000000000# CHANGES ## 2020/11/20 * New module `DataFlow`, which performs a forward data flow analysis over a directed graph. (Such a computation could previously be performed by using the generic solver `Fix.Make`, but it was somewhat awkward to write, as it required access to predecessors. The new algorithm is easier to use and is more efficient.) * In `Memoize`, new combinator `curried`, which can be used in combination with `fix` or `defensive_fix`. Thus, for instance, `curried fix` is a fixed point combinator that constructs a memoizing two-argument curried function. ## 2020/01/31 * In `Gensym`, new abstract type `generator`, with three functions `generator`, `fresh`, and `current`. * In `Memoize`, new function `visibly_memoize`, which not only returns a memoized function, but also provides outside access to the memoization table. * New signatures `ONGOING_NUMBERING` and `TWO_PHASE_NUMBERING` and new module `Numbering`, which provides facilities for numbering things. * Breaking change: the module `Fix.Number` is renamed `Fix.GraphNumbering`. ## 2018/12/06 * New release, including new modules (`Gensym`, `Memoize`, `Tabulate`, `Number`, `HashCons`, `Prop`, `Glue`), new convenience functors (`Fix.ForHashedType`, etc.), and new demos. The least-fixed-point computation algorithm is unchanged. ## 2013/06/11 * Initial release of the package, containing just `Fix.Make`, the least-fixed-point computation algorithm. 
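For illustration, here is a minimal sketch of the memoizing fixed-point combinator provided by `Memoize` (a hedged example: it assumes only that the library is installed under its usual name `fix`; `fib` is hypothetical client code, not part of the library):

```ocaml
(* Memoized Fibonacci via Fix.Memoize.ForHashedType.fix. Recursive
   calls go through the memoization table, so the function is computed
   at most once at each argument. *)
module M =
  Fix.Memoize.ForHashedType (struct
    type t = int
    let equal = (=)
    let hash = Hashtbl.hash
  end)

let fib : int -> int =
  M.fix (fun fib n -> if n <= 1 then n else fib (n - 1) + fib (n - 2))

let () = assert (fib 40 = 102334155)
```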
menhir-20210929/fix/LICENSE000066400000000000000000000636611412503066000150520ustar00rootroot00000000000000In the following, "the Library" refers to the OCaml source files that form the Fix library. The names of these files match the pattern src/*.{ml,mli}. The Library is distributed under the terms of the GNU Library General Public License version 2 (included below). As a special exception to the GNU Library General Public License, you may link, statically or dynamically, a "work that uses the Library" with a publicly distributed version of the Library to produce an executable file containing portions of the Library, and distribute that executable file under terms of your choice, without any of the additional requirements listed in clause 6 of the GNU Library General Public License. By "a publicly distributed version of the Library", we mean either the unmodified Library as distributed by INRIA, or a modified version of the Library that is distributed under the conditions defined in clause 2 of the GNU Library General Public License. This exception does not however invalidate any other reasons why the executable file might be covered by the GNU Library General Public License. ---------------------------------------------------------------------- GNU LIBRARY GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1991 Free Software Foundation, Inc. 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. [This is the first released version of the library GPL. It is numbered 2 because it goes with version 2 of the ordinary GPL.] Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This license, the Library General Public License, applies to some specially designated Free Software Foundation software, and to any other libraries whose authors decide to use it. You can use it for your libraries, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library, or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link a program with the library, you must provide complete object files to the recipients so that they can relink them with the library, after making changes to the library and recompiling it. And you must show them these terms so they know their rights. Our method of protecting your rights has two steps: (1) copyright the library, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the library. 
Also, for each distributor's protection, we want to make certain that everyone understands that there is no warranty for this free library. If the library is modified by someone else and passed on, we want its recipients to know that what they have is not the original version, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that companies distributing free software will individually obtain patent licenses, thus in effect transforming the program into proprietary software. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License, which was designed for utility programs. This license, the GNU Library General Public License, applies to certain designated libraries. This license is quite different from the ordinary one; be sure to read it in full, and don't assume that anything in it is the same as in the ordinary license. The reason we have a separate public license for some libraries is that they blur the distinction we usually make between modifying or adding to a program and simply using it. Linking a program with a library, without changing the library, is in some sense simply using the library, and is analogous to running a utility program or application program. However, in a textual and legal sense, the linked executable is a combined work, a derivative of the original library, and the ordinary General Public License treats it as such. Because of this blurred distinction, using the ordinary General Public License for libraries did not effectively promote software sharing, because most developers did not use the libraries. We concluded that weaker conditions might promote sharing better. However, unrestricted linking of non-free programs would deprive the users of those programs of all benefit from the free status of the libraries themselves. This Library General Public License is intended to permit developers of non-free programs to use free libraries, while preserving your freedom as a user of such programs to change the free libraries that are incorporated in them. (We have not seen how to achieve this as regards changes in header files, but we have achieved it as regards changes in the actual functions of the Library.) The hope is that this will lead to faster development of free libraries. The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, while the latter only works together with the library. Note that it is possible for a library to be covered by the ordinary General Public License rather than by this special one. GNU LIBRARY GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Library General Public License (also called "this License"). Each licensee is addressed as "you". 
A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) These requirements apply to the modified work as a whole. 
If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. 
The threshold for this to be true is not precisely defined by law. If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. As an exception to the Sections above, you may also compile or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. c) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. d) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. 
You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. 
Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. The Free Software Foundation may publish revised and/or new versions of the Library General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 
END OF TERMS AND CONDITIONS Appendix: How to Apply These Terms to Your New Libraries If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This library is free software; you can redistribute it and/or modify it under the terms of the GNU Library General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public License for more details. You should have received a copy of the GNU Library General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA Also add information on how to contact you by electronic and paper mail. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker. , 1 April 1990 Ty Coon, President of Vice That's all there is to it! menhir-20210929/fix/dune000066400000000000000000000001461412503066000147100ustar00rootroot00000000000000(env (dev (flags :standard -g -w @A-4-44-67 )) (release (flags :standard )) ) menhir-20210929/fix/dune-project000066400000000000000000000000201412503066000163430ustar00rootroot00000000000000(lang dune 1.3) menhir-20210929/fix/dune-workspace.versions000066400000000000000000000005071412503066000205540ustar00rootroot00000000000000(lang dune 2.0) (context (opam (switch 4.02.3))) (context (opam (switch 4.03.0))) (context (opam (switch 4.04.2))) (context (opam (switch 4.05.0))) (context (opam (switch 4.06.1))) (context (opam (switch 4.07.1))) (context (opam (switch 4.08.1))) (context (opam (switch 4.09.1))) (context (opam (switch 4.09.0+bytecode-only))) menhir-20210929/fix/src/000077500000000000000000000000001412503066000146205ustar00rootroot00000000000000menhir-20210929/fix/src/Boolean.ml000066400000000000000000000021311412503066000165260ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. 
*) (* *) (******************************************************************************) type property = bool let bottom = false let equal (b1 : bool) (b2 : bool) = b1 = b2 let leq (b1 : bool) (b2 : bool) = b1 <= b2 let is_maximal b = b let leq_join b1 b2 = b1 || b2 let join = leq_join menhir-20210929/fix/src/Boolean.mli000066400000000000000000000021301412503066000166760ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**The Boolean lattice. The ordering is [false <= true]. *) open Sigs include PROPERTY with type property = bool include SEMI_LATTICE with type property := bool include MINIMAL_SEMI_LATTICE with type property := bool menhir-20210929/fix/src/Core.ml000066400000000000000000000303051412503066000160430ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) open Sigs (* -------------------------------------------------------------------------- *) (* The code is parametric in an implementation of maps over variables and in an implementation of properties. *) module Make (M : IMPERATIVE_MAPS) (P : PROPERTY) = struct type variable = M.key type property = P.property type valuation = variable -> property type rhs = valuation -> property type equations = variable -> rhs (* -------------------------------------------------------------------------- *) (* The dependency graph. *) (* An edge from [node1] to [node2] in the dynamic dependency graph means that [node1] depends on [node2], or (equivalently) that [node1] observes [node2]. Then, an update of the current property at [node2] causes a signal to be sent to [node1]. A node can observe itself. *) module Graph = CoreDependencyGraph type node = data Graph.node (* Each node in the dependency graph corresponds to a specific variable [v], and carries data about it. *) and data = { (* This is the result of the application of [rhs] to the variable [v]. It must be stored in order to guarantee that this application is performed at most once. *) rhs: rhs; (* This is the current property at [v]. It evolves monotonically with time. *) mutable property: property; (* That's it! *) } (* [property node] returns the current property at [node]. *) let property node = (Graph.data node).property (* -------------------------------------------------------------------------- *) (* Many definitions must be made within the body of the function [lfp]. For greater syntactic convenience, we place them in a local module. *) let lfp (eqs : equations) : valuation = let module LFP = struct (* -------------------------------------------------------------------------- *) (* The workset. *) (* When the algorithm is inactive, the workset is empty. 
*) (* Our workset is based on a Queue, but it could just as well be based on a Stack. A textual replacement is possible. It could also be based on a priority queue, provided a sensible way of assigning priorities could be found. *) module Workset : sig (* [insert node] inserts [node] into the workset. [node] must have no successors. *) val insert: node -> unit (* [repeat f] repeatedly applies [f] to a node extracted out of the workset, until the workset becomes empty. [f] is allowed to use [insert]. *) val repeat: (node -> unit) -> unit (* That's it! *) end = struct (* Initialize the workset. *) let workset = Queue.create() let insert node = Queue.push node workset let repeat f = while not (Queue.is_empty workset) do f (Queue.pop workset) done end (* -------------------------------------------------------------------------- *) (* Signals. *) (* A node in the workset has no successors. (It can have predecessors.) In other words, a predecessor (an observer) of some node is never in the workset. Furthermore, a node never appears twice in the workset. *) (* When a variable broadcasts a signal, all of its predecessors (observers) receive the signal. Any variable that receives the signal loses all of its successors (that is, it ceases to observe anything) and is inserted into the workset. This preserves the above invariant. *) let signal subject = List.iter (fun observer -> Graph.clear_successors observer; Workset.insert observer ) (Graph.predecessors subject) (* At this point, [subject] has no predecessors. This plays no role in the correctness proof, though. *) (* -------------------------------------------------------------------------- *) (* Tables. *) (* The permanent table maps variables that have reached a fixed point to properties. It persists forever. *) let permanent : property M.t = M.create() (* The transient table maps variables that have not yet reached a fixed point to nodes. (A node contains not only a property, but also a memoized right-hand side, and carries edges.) At the beginning of a run, it is empty. It fills up during a run. At the end of a run, it is copied into the permanent table and cleared. *) let transient : node M.t = M.create() (* [freeze()] copies the transient table into the permanent table, and empties the transient table. This allows all nodes to be reclaimed by the garbage collector. *) let freeze () = M.iter (fun v node -> M.add v (property node) permanent ) transient; M.clear transient (* -------------------------------------------------------------------------- *) (* Workset processing. *) (* [solve node] re-evaluates the right-hand side at [node]. If this leads to a change, then the current property is updated, and [node] emits a signal towards its observers. *) (* When [solve node] is invoked, [node] has no subjects. Indeed, when [solve] is invoked by [node_for], [node] is newly created; when [solve] is invoked by [Workset.repeat], [node] has just been extracted out of the workset, and a node in the workset has no subjects. *) (* [node] must not be in the workset. *) (* In short, when [solve node] is invoked, [node] is neither awake nor asleep. When [solve node] finishes, [node] is either awake or asleep again. (Chances are, it is asleep, unless it is its own observer; then, it is awakened by the final call to [signal node].) *) let rec solve (node : node) : unit = (* Retrieve the data record carried by this node. *) let data = Graph.data node in (* Prepare to compute an updated value at this node. 
This is done by invoking the client's right-hand side function. *) (* The flag [alive] is used to prevent the client from invoking [request] after this interaction phase is over. In theory, this dynamic check seems required in order to argue that [request] behaves like a pure function. In practice, this check is not very useful: only a bizarre client would store a [request] function and invoke it after it has become stale. *) let alive = ref true and subjects = ref [] in (* We supply the client with [request], a function that provides access to the current valuation, and dynamically records dependencies. This yields a set of dependencies that is correct by construction. *) let request (v : variable) : property = assert !alive; try M.find v permanent with Not_found -> let subject = node_for v in (* IFPAPER subjects := subject :: !subjects; property subject ELSE *) let p = property subject in if not (P.is_maximal p) then subjects := subject :: !subjects; p (* END *) in (* Give control to the client. *) let new_property = data.rhs request in (* From now on, prevent any invocation of this instance of [request] by the client. *) alive := false; (* At this point, [node] has no subjects, as noted above. Thus, the precondition of [set_successors] is met. We can install [subjects] as the new set of subjects for this node. *) (* If we have gathered no subjects in the list [subjects], then this node must have stabilized. If [new_property] is maximal, then this node must have stabilized. *) (* If this node has stabilized, then it need not observe any more, so the call to [set_successors] is skipped. In practice, this seems to be a minor optimization. In the particular case where every node stabilizes at the very first call to [rhs], this means that no edges are ever built. This particular case is unlikely, as it means that we are just doing memoization, not a true fixed point computation. *) (* One could go further and note that, if this node has stabilized, then it could immediately be taken out of the transient table and copied into the permanent table. This would have the beneficial effect of allowing the detection of further nodes that have stabilized. Furthermore, it would enforce the property that no node in the transient table has a maximal value, hence the call to [is_maximal] above would become useless. *) (* IFPAPER Graph.set_successors node !subjects; ELSE *) if not (!subjects = [] || P.is_maximal new_property) then Graph.set_successors node !subjects; (* END *) (* If the updated value differs from the previous value, record the updated value and send a signal to all observers of [node]. *) if not (P.equal data.property new_property) then begin data.property <- new_property; signal node end (* Note that equality of the two values does not imply that this node has stabilized forever. *) (* -------------------------------------------------------------------------- *) (* [node_for v] returns the graph node associated with the variable [v]. It is assumed that [v] does not appear in the permanent table. If [v] appears in the transient table, the associated node is returned. Otherwise, [v] is a newly discovered variable: a new node is created on the fly, and the transient table is grown. The new node can either be inserted into the workset (it is then awake) or handled immediately via a recursive call to [solve] (it is then asleep, unless it observes itself). *) (* The recursive call to [solve node] can be replaced, if desired, by a call to [Workset.insert node]. 
Using a recursive call to [solve] permits eager top-down discovery of new nodes. This can save a constant factor, because it allows new nodes to move directly from [bottom] to a good first approximation, without sending any signals, since [node] has no observers when [solve node] is invoked. In fact, if the dependency graph is acyclic, the algorithm discovers nodes top-down, performs computation on the way back up, and runs without ever inserting a node into the workset! Unfortunately, this causes the stack to grow as deep as the longest path in the dependency graph, which can overflow the stack. *) and node_for (v : variable) : node = try M.find v transient with Not_found -> let node = Graph.create { rhs = eqs v; property = P.bottom } in (* Adding this node to the transient table prior to calling [solve] recursively is mandatory, otherwise [solve] might loop, creating an infinite number of nodes for the same variable. *) M.add v node transient; solve node; (* or: Workset.insert node *) node (* -------------------------------------------------------------------------- *) (* Invocations of [get] trigger the fixed point computation. *) (* The flag [inactive] prevents reentrant calls by the client. *) let inactive = ref true let get (v : variable) : property = try M.find v permanent with Not_found -> assert !inactive; inactive := false; let node = node_for v in Workset.repeat solve; freeze(); inactive := true; property node (* -------------------------------------------------------------------------- *) (* Close the local module [LFP]. *) end in LFP.get end (* -------------------------------------------------------------------------- *) (* Special cases, for easier use. *) module ForOrderedType (T : OrderedType) (P : PROPERTY) = Make(Glue.PersistentMapsToImperativeMaps(Map.Make(T)))(P) module ForHashedType (T : HashedType) (P : PROPERTY) = Make(Glue.HashTablesAsImperativeMaps(T))(P) module ForType (T : TYPE) (P : PROPERTY) = ForHashedType(Glue.TrivialHashedType(T))(P) menhir-20210929/fix/src/Core.mli000066400000000000000000000053161412503066000162200ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**[Fix] offers support for computing the least solution of a set of monotone equations, as described in the unpublished paper "Lazy Least Fixed Points in ML". In other words, it allows defining a recursive function of type [variable -> property], where cyclic dependencies between variables are allowed, and properties must be equipped with a partial order. The function thus obtained performs the fixed point computation on demand, in an incremental manner, and is memoizing. *) open Sigs (**[Make] constructs a solver for a type [key] that is equipped with an implementation of imperative maps and a type [property] that is equipped with [bottom], [equal], and [is_maximal] functions. *) module Make (M : IMPERATIVE_MAPS) (P : PROPERTY) : SOLVER with type variable = M.key and type property = P.property (**[ForOrderedType] is a special case of [Make] where it suffices to pass an ordered type [T] as an argument. A reference to a persistent map is used to hold the memoization table.
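For illustration, here is a hedged usage sketch; it is not part of this library, and it assumes a user-supplied function [succ : int -> int list] that lists the successors of a vertex in a finite directed graph. It defines a function that tests whether a vertex can reach the distinguished vertex [0]:

     module F =
       ForOrderedType
         (struct type t = int let compare = compare end)
         (Boolean)

     let reaches : int -> bool =
       F.lfp (fun v request -> v = 0 || List.exists request (succ v))

   Cyclic dependencies between vertices are permitted: [reaches] computes a least fixed point, on demand, and memoizes its results.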
*) module ForOrderedType (T : OrderedType) (P : PROPERTY) : SOLVER with type variable = T.t and type property = P.property (**[ForHashedType] is a special case of [Make] where it suffices to pass a hashed type [T] as an argument. A hash table is used to hold the memoization table. *) module ForHashedType (T : HashedType) (P : PROPERTY) : SOLVER with type variable = T.t and type property = P.property (**[ForType] is a special case of [Make] where it suffices to pass an arbitrary type [T] as an argument. A hash table is used to hold the memoization table. OCaml's built-in generic equality and hash functions are used. *) module ForType (T : TYPE) (P : PROPERTY) : SOLVER with type variable = T.t and type property = P.property menhir-20210929/fix/src/CoreDependencyGraph.ml000066400000000000000000000105211412503066000210220ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* Using doubly-linked adjacency lists, one could implement [predecessors] in worst-case linear time with respect to the length of the output list, [set_successors] in worst-case linear time with respect to the length of the input list, and [clear_successors] in worst-case linear time with respect to the number of edges that are removed. We use a simpler implementation, based on singly-linked adjacency lists, with deferred removal of edges. It achieves the same complexity bounds, except [predecessors] only offers an amortized complexity bound. This is good enough for our purposes, and, in practice, is more efficient by a constant factor. This simplification was suggested by Arthur Charguéraud. *) (* -------------------------------------------------------------------------- *) (* Nodes and edges. *) type 'data node = { (* The client information associated with this node. *) data: 'data; (* This node's incoming and outgoing edges. *) mutable outgoing: 'data edge list; mutable incoming: 'data edge list; (* A transient mark, always set to [false], except when checking against duplicate elements in a successor list. *) mutable marked: bool; } and 'data edge = { (* This edge's nodes. Edges are symmetric: source and destination are not distinguished. Thus, an edge appears both in the outgoing edge list of its source node and in the incoming edge list of its destination node. This allows edges to be easily marked as destroyed. *) node1: 'data node; node2: 'data node; (* Edges that are destroyed are marked as such, but are not immediately removed from the adjacency lists. *) mutable destroyed: bool; } (* -------------------------------------------------------------------------- *) (* Node creation. *) let create data = { data = data; outgoing = []; incoming = []; marked = false; } (* Data access. *) let data node = node.data (* [follow src edge] returns the node that is connected to [src] by [edge]. Time complexity: constant. *) let follow src edge = if edge.node1 == src then edge.node2 else begin assert (edge.node2 == src); edge.node1 end (* The [predecessors] function removes edges that have been marked destroyed. 
The cost of removing these has already been paid for, so the amortized time complexity of [predecessors] is linear in the length of the output list. *) let predecessors (node : 'data node) : 'data node list = let predecessors = List.filter (fun edge -> not edge.destroyed) node.incoming in node.incoming <- predecessors; List.map (follow node) predecessors (* [link src dst] creates a new edge from [src] to [dst], together with its reverse edge. Time complexity: constant. *) let link (src : 'data node) (dst : 'data node) = let edge = { node1 = src; node2 = dst; destroyed = false; } in src.outgoing <- edge :: src.outgoing; dst.incoming <- edge :: dst.incoming let set_successors (src : 'data node) (dsts : 'data node list) = assert (src.outgoing = []); let rec loop = function | [] -> () | dst :: dsts -> if dst.marked then loop dsts (* skip duplicate elements *) else begin dst.marked <- true; link src dst; loop dsts; dst.marked <- false end in loop dsts let clear_successors node = List.iter (fun edge -> assert (not edge.destroyed); edge.destroyed <- true; ) node.outgoing; node.outgoing <- [] menhir-20210929/fix/src/CoreDependencyGraph.mli000066400000000000000000000043601412503066000211770ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* This module provides a data structure for maintaining and modifying a directed graph. Each node is allowed to carry a piece of client data. There are functions for creating a new node, looking up a node's data, looking up a node's predecessors, and setting or clearing a node's successors (all at once). *) type 'data node (* [create data] creates a new node, with no incident edges, with client information [data]. Time complexity: constant. *) val create: 'data -> 'data node (* [data node] returns the client information associated with the node [node]. Time complexity: constant. *) val data: 'data node -> 'data (* [predecessors node] returns a list of [node]'s predecessors. Amortized time complexity: linear in the length of the output list. *) val predecessors: 'data node -> 'data node list (* [set_successors src dsts] creates an edge from the node [src] to each of the nodes in the list [dsts]. Duplicate elements in the list [dsts] are removed, so that no duplicate edges are created. It is assumed that [src] initially has no successors. Time complexity: linear in the length of the input list. *) val set_successors: 'data node -> 'data node list -> unit (* [clear_successors node] removes all of [node]'s outgoing edges. Time complexity: linear in the number of edges that are removed. *) val clear_successors: 'data node -> unit menhir-20210929/fix/src/DataFlow.ml000066400000000000000000000121001412503066000166450ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. 
*) (* *) (******************************************************************************) open Sigs (* Such a data flow analysis problem could also be solved by the generic least fixed point computation algorithm [Fix.Make.lfp]. However, such an approach would be less efficient, as (1) it would require reversing the graph first, so as to have access to predecessors; (2) whenever a dirty node is examined, the contributions of all of its predecessors would be recomputed and joined, whereas the forward data flow analysis algorithm pushes information from a dirty node to its successors, thereby avoiding recomputation along edges whose source is not dirty; (3) the generic algorithm performs dynamic discovery of dependencies, whereas in this situation, all dependencies are explicitly provided by the user. *) (* We require a minimal semi-lattice, equipped with a [leq_join] operation, as opposed to a semi-lattice, which offers separate [leq] and [join] operations. Although [leq_join] is less powerful, it is sufficient for our purposes, and is potentially more efficient than the sequence of [leq] and [join]. *) module Run (M : MINIMAL_IMPERATIVE_MAPS) (P : MINIMAL_SEMI_LATTICE) (G : DATA_FLOW_GRAPH with type variable = M.key and type property = P.property) = struct open P type variable = M.key (* A mapping of variables to properties. This mapping is initially empty. *) let properties = M.create() (* A set of dirty variables, whose outgoing transitions must be examined. *) (* The set of dirty variables is represented as a combination of a stack and a map of variables to Booleans. This map keeps track of which variables are in the stack and allows us to avoid pushing a variable onto the stack when it is already in the stack. (In principle, a map of variables to [unit] should suffice, but our minimal map API does not offer a [remove] function. Thus, we have to use a map of variables to Booleans.) *) let pending : variable Stack.t = Stack.create() let dirty : bool M.t = M.create() let is_dirty (x : variable) = try M.find x dirty with Not_found -> false let schedule (x : variable) = if not (is_dirty x) then begin M.add x true dirty; Stack.push x pending end (* [update x' p'] ensures that the property associated with the variable [x'] is at least [p']. If this causes a change in the property at [x'], then [x'] is scheduled or rescheduled. *) let update (x' : variable) (p' : property) = match M.find x' properties with | exception Not_found -> (* [x'] is newly discovered. *) M.add x' p' properties; schedule x' | p -> (* [x'] has been discovered earlier. *) let p'' = P.leq_join p' p in if p'' != p then begin (* The failure of the physical equality test [p'' == p] implies that [P.leq p' p] does not hold. Thus, [x'] is affected by this update and must itself be scheduled. *) M.add x' p'' properties; schedule x' end (* [examine] examines a variable that has just been taken out of the stack. Its outgoing transitions are inspected and its successors are updated. *) let examine (x : variable) = (* [x] is dirty, so a property must have been associated with it. *) let p = try M.find x properties with Not_found -> assert false in G.foreach_successor x p update (* Populate the stack with the root variables. *) let () = G.foreach_root (fun x p -> M.add x p properties; schedule x ) (* As long as the stack is nonempty, pop a variable and examine it. *) let () = try while true do let x = Stack.pop pending in M.add x false dirty; examine x done with Stack.Empty -> () (* Expose the solution.
*) type property = P.property option let solution x = try Some (M.find x properties) with Not_found -> None end module ForOrderedType (T : OrderedType) = Run(Glue.PersistentMapsToImperativeMaps(Map.Make(T))) module ForHashedType (T : HashedType) = Run(Glue.HashTablesAsImperativeMaps(T)) module ForType (T : TYPE) = ForHashedType(Glue.TrivialHashedType(T)) module ForIntSegment (K : sig val n: int end) = Run(Glue.ArraysAsImperativeMaps(K)) menhir-20210929/fix/src/DataFlow.mli000066400000000000000000000066051412503066000170330ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**[DataFlow] performs a forward data flow analysis over a directed graph. *) open Sigs (**[Run] requires a type [variable] that is equipped with an implementation of imperative maps, a type [property] that is equipped with [leq] and [join] functions, and a data flow graph whose edges describe the propagation of properties. It performs a forward data flow analysis and returns its result. *) (**The function [solution] has type [variable -> property option]. A reachable variable is mapped to [Some _]; an unreachable one is mapped to [None]. *) module Run (M : MINIMAL_IMPERATIVE_MAPS) (P : MINIMAL_SEMI_LATTICE) (G : DATA_FLOW_GRAPH with type variable = M.key and type property = P.property) : SOLUTION with type variable = G.variable and type property = P.property option (**[ForOrderedType] is a special case of [Run] where it suffices to pass an ordered type [T] as an argument. A reference to a persistent map is used to hold the memoization table. *) module ForOrderedType (T : OrderedType) (P : MINIMAL_SEMI_LATTICE) (G : DATA_FLOW_GRAPH with type variable = T.t and type property = P.property) : SOLUTION with type variable = G.variable and type property = P.property option (**[ForHashedType] is a special case of [Run] where it suffices to pass a hashed type [T] as an argument. A hash table is used to hold the memoization table. *) module ForHashedType (T : HashedType) (P : MINIMAL_SEMI_LATTICE) (G : DATA_FLOW_GRAPH with type variable = T.t and type property = P.property) : SOLUTION with type variable = G.variable and type property = P.property option (**[ForType] is a special case of [Run] where it suffices to pass an arbitrary type [T] as an argument. A hash table is used to hold the memoization table. OCaml's built-in generic equality and hash functions are used. *) module ForType (T : TYPE) (P : MINIMAL_SEMI_LATTICE) (G : DATA_FLOW_GRAPH with type variable = T.t and type property = P.property) : SOLUTION with type variable = G.variable and type property = P.property option (**[ForIntSegment] is a special case of [Run] where the type of variables is the integer segment [\[0..n)]. An array is used to hold the table. 
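As a hedged illustration (a sketch, not part of this library; it assumes a user-supplied function [succ : int -> int list] and a vertex count [vertex_count]), here is how one might compute which vertices of a finite graph are reachable from vertex [0]:

     module P = struct
       type property = unit
       let leq_join () () = ()
     end

     module G = struct
       type variable = int
       type property = unit
       let foreach_root yield = yield 0 ()
       let foreach_successor x () yield =
         List.iter (fun y -> yield y ()) (succ x)
     end

     module S = ForIntSegment (struct let n = vertex_count end) (P) (G)

   Then [S.solution x] is [Some ()] if and only if vertex [x] is reachable from vertex [0].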
*) module ForIntSegment (K : sig val n: int end) (P : MINIMAL_SEMI_LATTICE) (G : DATA_FLOW_GRAPH with type variable = int and type property = P.property) : SOLUTION with type variable = G.variable and type property = P.property option menhir-20210929/fix/src/Gensym.ml000066400000000000000000000023431412503066000164160ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) type gensym = unit -> int let postincrementor c () = let y = !c in c := y + 1; y let make () = postincrementor (ref 0) (* We do not worry about overflow. On today's 64-bit machines, it won't occur in a lifetime. *) type generator = int ref let generator () = ref 0 let fresh c = let y = !c in c := y + 1; y let current c = !c menhir-20210929/fix/src/Gensym.mli000066400000000000000000000031201412503066000165610ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**[Gensym] offers a simple facility for generating fresh integer identifiers. *) (**A gensym is a generator of unique integer identifiers. *) type gensym = unit -> int (**[make()] produces a new gensym. *) val make : unit -> gensym (**A slightly more powerful abstraction is a generator whose current state can be inspected (without modification). *) type generator (**[generator()] creates a new generator. [fresh generator] causes the generator to create and return a fresh integer identifier. [current generator] returns the generator's current state, that is, the next available integer identifier. *) val generator: unit -> generator val fresh: generator -> int val current: generator -> int menhir-20210929/fix/src/Glue.ml000066400000000000000000000066301412503066000160530ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. 
*) (* *) (******************************************************************************) open Sigs module CHAR = struct type t = char end module INT = struct type t = int end module STRING = struct type t = string end module TrivialOrderedType (T : TYPE) = struct include T let compare = compare end module TrivialHashedType (T : TYPE) = struct include T let equal = (=) let hash = Hashtbl.hash end module InjectOrderedType (U : OrderedType) (I : INJECTION with type u := U.t) = struct type t = I.t let compare x y = U.compare (I.encode x) (I.encode y) end module InjectHashedType (U : HashedType) (I : INJECTION with type u := U.t) = struct type t = I.t let equal x y = U.equal (I.encode x) (I.encode y) let hash x = U.hash (I.encode x) end module InjectMinimalImperativeMaps (M : MINIMAL_IMPERATIVE_MAPS) (I : INJECTION with type u := M.key) = struct type key = I.t type 'data t = 'data M.t let create = M.create let add x y m = M.add (I.encode x) y m let find x m = M.find (I.encode x) m end module InjectImperativeMaps (M : IMPERATIVE_MAPS) (I : INJECTION with type u := M.key) (J : sig val decode: M.key -> I.t end) = struct include InjectMinimalImperativeMaps(M)(I) let clear = M.clear let iter f m = M.iter (fun x y -> f (J.decode x) y ) m end module PersistentMapsToImperativeMaps (M : PERSISTENT_MAPS) = struct type key = M.key type 'data t = 'data M.t ref let create () = ref M.empty let clear t = t := M.empty let add k d t = t := M.add k d !t let find k t = M.find k !t let iter f t = M.iter f !t end module ArraysAsImperativeMaps (K : sig val n: int end) = struct open K type key = int type 'data t = 'data option array let create () = Array.make n None let clear m = Array.fill m 0 n None let add key data m = m.(key) <- Some data let find key m = match m.(key) with | None -> raise Not_found | Some data -> data let iter f m = Array.iteri (fun key data -> match data with | None -> () | Some data -> f key data ) m end module Adapt (T : Hashtbl.S) = struct include T (* types: [key], ['data t] *) (* values: [clear], [iter] *) let create () = T.create 1023 let add key data table = T.add table key data let find key table = T.find table key end module HashTablesAsImperativeMaps (H : HashedType) = Adapt(Hashtbl.Make(H)) module MinimalSemiLattice (P : SEMI_LATTICE) = struct type property = P.property let leq_join p' p = if P.leq p' p then p else P.join p' p end menhir-20210929/fix/src/Glue.mli000066400000000000000000000103621412503066000162210ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**[Glue] contains glue code that helps build various implementations of association maps. *) open Sigs (* -------------------------------------------------------------------------- *) (**Some common types, packaged as modules of signature [TYPE]. *) module CHAR : TYPE with type t = char module INT : TYPE with type t = int module STRING : TYPE with type t = string (* -------------------------------------------------------------------------- *) (**An arbitrary type can be equipped with an ordering function, just by using OCaml's built-in generic comparison function.
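For instance (a hedged sketch, not part of this library), ordinary maps over integer keys can be obtained as follows:

     module IntMap = Map.Make (TrivialOrderedType (INT))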
*) module TrivialOrderedType (T : TYPE) : OrderedType with type t = T.t (**An arbitrary type can be equipped with equality and hash functions, just by using OCaml's built-in generic equality and hash functions. *) module TrivialHashedType (T : TYPE) : HashedType with type t = T.t (* -------------------------------------------------------------------------- *) (**If there is an injection of [t] into [u], then an ordering on [u] gives rise to an ordering on [t]. *) module InjectOrderedType (U : OrderedType) (I : INJECTION with type u := U.t) : OrderedType with type t = I.t (**If there is an injection of [t] into [u], then a hashed-type structure on [u] can be transported to [t]. *) module InjectHashedType (U : HashedType) (I : INJECTION with type u := U.t) : HashedType with type t = I.t (**If there is an injection of [t] into [u], then an implementation of minimal imperative maps for the type [u] can be transported to the type [t]. *) module InjectMinimalImperativeMaps (M : MINIMAL_IMPERATIVE_MAPS) (I : INJECTION with type u := M.key) : MINIMAL_IMPERATIVE_MAPS with type key = I.t (**If there is an injection of [t] into [u], and if the inverse mapping can be effectively computed, then an implementation of imperative maps for the type [u] can be transported to the type [t]. *) module InjectImperativeMaps (M : IMPERATIVE_MAPS) (I : INJECTION with type u := M.key) (J : sig val decode: M.key -> I.t end) : IMPERATIVE_MAPS with type key = I.t (* -------------------------------------------------------------------------- *) (**Implementations of various map signatures. *) (**An implementation of persistent maps can be made to satisfy the interface of imperative maps. An imperative map is represented as a persistent map, wrapped within a reference cell. *) module PersistentMapsToImperativeMaps (M : PERSISTENT_MAPS) : IMPERATIVE_MAPS with type key = M.key and type 'data t = 'data M.t ref (**An implementation of imperative maps as arrays is possible if keys are consecutive integers. *) module ArraysAsImperativeMaps (K : sig val n: int end) : IMPERATIVE_MAPS with type key = int and type 'data t = 'data option array (**An implementation of imperative maps as a hash table. *) module HashTablesAsImperativeMaps (H : HashedType) : IMPERATIVE_MAPS with type key = H.t and type 'data t = 'data Hashtbl.Make(H).t (* -------------------------------------------------------------------------- *) (**[MinimalSemiLattice] converts a semi-lattice to a minimal semi-lattice; that is, it implements [leq_join] in terms of separate [leq] and [join] operations. *) module MinimalSemiLattice (P : SEMI_LATTICE) : MINIMAL_SEMI_LATTICE with type property = P.property menhir-20210929/fix/src/GraphNumbering.ml000066400000000000000000000042601412503066000200640ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) open Sigs module Make (M : IMPERATIVE_MAPS) (G : GRAPH with type t = M.key) = struct (* Set up a facility for numbering vertices. *) module N = Numbering.Make(M) (* Implement a depth-first search. 
The functions [N.has_been_encoded] and [N.encode] allow us not only to assign a unique number to each vertex, but also to mark a vertex and test whether a vertex has been marked. *) let frontier = Stack.create() let push x = Stack.push x frontier let rec visit () = match Stack.pop frontier with | exception Stack.Empty -> (* The stack is empty: we are done. *) () | x -> if N.has_been_encoded x then (* [x] is known already: ignore it. *) visit() else (* Assign a number to [x]. *) let (_ : int) = N.encode x in G.foreach_successor x push; visit() (* Perform the depth-first search. *) let () = G.foreach_root push; visit() (* We are done! This defines [n], [encode], [decode]. *) include N.Done() end module ForOrderedType (T : OrderedType) = Make(Glue.PersistentMapsToImperativeMaps(Map.Make(T))) module ForHashedType (T : HashedType) = Make(Glue.HashTablesAsImperativeMaps(T)) module ForType (T : TYPE) = ForHashedType(Glue.TrivialHashedType(T)) menhir-20210929/fix/src/GraphNumbering.mli000066400000000000000000000040401412503066000202310ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**[GraphNumbering] offers a facility for discovering and numbering the reachable vertices in a finite directed graph. *) open Sigs (**[Make(M)(G)] produces a numbering of the graph [G], or more precisely, of the subset of the vertices of [G] that are reachable from the roots. The type of the vertices must be equipped with an implementation of imperative maps. *) module Make (M : IMPERATIVE_MAPS) (G : GRAPH with type t = M.key) : NUMBERING with type t = G.t (**[ForOrderedType] is a special case of [Make] where it suffices for the vertices of [G] to be ordered. *) module ForOrderedType (T : OrderedType) (G : GRAPH with type t = T.t) : NUMBERING with type t = G.t (**[ForHashedType] is a special case of [Make] where it suffices for the vertices of [G] to be hashed. *) module ForHashedType (T : HashedType) (G : GRAPH with type t = T.t) : NUMBERING with type t = G.t (**[ForType] is a special case of [Make] where the vertices of [G] can have arbitrary type. OCaml's built-in generic equality and hash functions are used. *) module ForType (T : TYPE) (G : GRAPH with type t = T.t) : NUMBERING with type t = G.t menhir-20210929/fix/src/HashCons.ml000066400000000000000000000052151412503066000166630ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) open Sigs (* The interface that we expose is not fully safe: it is possible, by applying the functor [Make] twice, to construct two instances of the hash-consing service that produce hash-consed values of *compatible* type [M.key cell]. 
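For instance, if [S1] and [S2] are two such services for the same key type, then [equal (S1.make x) (S2.make x)] type-checks, yet it compares unique identifiers drawn from two unrelated gensyms, so its result is meaningless.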
*) (* To achieve greater safety, one might wish to make the functor [Make] generative, so that each application of [Make] creates a fresh abstract type [t] which is convertible (in one direction only) into [M.key cell]. However, that would render [Make] impossible to use in situations where the user wishes to hash-cons a type of trees. Indeed, the user needs to first define a (concrete, recursive) type of trees, then create an instance of the hash-consing service. If [Make] produces an abstract type, then the type definition and the functor application must be mutually recursive, which is not permitted. *) type 'data cell = { id: int; data: 'data } let id x = x.id let data x = x.data let equal x y = x.id = y.id (* We could also use physical equality, saving two reads. *) let compare x y = compare x.id y.id (* To compare two cells, we compare their unique identifiers. *) let hash x = Hashtbl.hash x.id (* To hash a cell, we hash its unique identifier. *) (* We could also return [x.id] without hashing it. *) module type SERVICE = sig type data val make: data -> data cell end (* Creating a fresh hash-consing service is a simple matter of: 1- creating a new gensym; 2- memoizing the function [fun data -> { id = gensym(); data }]. *) module Make (M : MEMOIZER) = struct type data = M.key let gensym = Gensym.make() let make = M.memoize (fun data -> { id = gensym(); data }) end module ForHashedType (T : HashedType) = Make(Memoize.ForHashedType(T)) menhir-20210929/fix/src/HashCons.mli000066400000000000000000000053251412503066000170360ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**[HashCons] offers support for setting up a hash-consed data type, that is, a data type whose values carry unique integer identifiers. *) open Sigs (**The type ['data cell] describes a cell that carries a unique identifier [id] as well as a payload [data]. *) (**This type is marked [private], which means that the user has no direct way of allocating cells. Instead, the user must apply the functor [Make] (below) to obtain a function [make] which either allocates a fresh cell or returns an existing cell. The user is still allowed to read existing cells. *) type 'data cell = private { id: int; data: 'data } (**Accessors. *) val id : 'data cell -> int val data: 'data cell -> 'data (**Cells come with an equality test, a comparison function, and a hash function. These functions exploit the cell's unique identifier only -- the data is ignored. *) (**Wherever a module of signature [HashedType with type t = foo cell] is expected, the module [HashCons] can be supplied. This holds regardless of the type [foo]. *) val equal: 'data cell -> 'data cell -> bool val compare: 'data cell -> 'data cell -> int val hash : 'data cell -> int (**A hash-consing service allocates uniquely-numbered cells for data. The smart constructor [make] either allocates a fresh cell or returns an existing cell, as appropriate. *) module type SERVICE = sig type data val make: data -> data cell end (**The functor [Make] expects a type [data] for which a memoizer exists, and produces a hash-consing service for it.
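As a hedged illustration (a sketch, not part of this library), here is how one might set up hash-consed binary trees via [ForHashedType] (below):

     type skel = Leaf | Node of tree * tree
     and tree = skel HashCons.cell

     module T = HashCons.ForHashedType (struct
       type t = skel
       let equal s1 s2 =
         match s1, s2 with
         | Leaf, Leaf -> true
         | Node (l1, r1), Node (l2, r2) ->
             HashCons.equal l1 l2 && HashCons.equal r1 r2
         | _, _ -> false
       let hash = function
         | Leaf -> 0
         | Node (l, r) -> Hashtbl.hash (HashCons.id l, HashCons.id r)
     end)

   Because the children of a [Node] are themselves hash-consed, shallow equality and hash functions suffice. [T.make] returns one cell per distinct tree, so trees can then be compared in constant time via [equal].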
*) module Make (M : MEMOIZER) : SERVICE with type data = M.key (**[ForHashedType] is a special case of [Make] where it suffices to pass a hashed type [T] as an argument. A hash table is used to hold the memoization table. *) module ForHashedType (T : HashedType) : SERVICE with type data = T.t menhir-20210929/fix/src/Makefile000066400000000000000000000002131412503066000162540ustar00rootroot00000000000000.PHONY: all all: dune build @check # This is enough to type-check the code in this directory, # and is much faster than "make -C .. all". menhir-20210929/fix/src/Memoize.ml000066400000000000000000000122701412503066000165610ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) open Sigs (* [rev_take accu n xs] is [accu @ rev (take n xs)], where [take n xs] takes the first [n] elements of the list [xs]. The length of [xs] must be at least [n]. *) let rec rev_take accu n xs = match n, xs with | 0, _ -> accu | _, [] -> (* The list is too short. This cannot happen. *) assert false | _, x :: xs -> rev_take (x :: accu) (n - 1) xs module Make (M : IMPERATIVE_MAPS) = struct type key = M.key let add x y table = M.add x y table; y (* [memoize] could be defined as a special case of [fix] via the declaration [let memoize f = fix (fun _ x -> f x)]. The following direct definition is perhaps easier to understand and may give rise to more efficient code. *) type 'a t = 'a M.t let visibly_memoize (f : key -> 'a) : (key -> 'a) * 'a t = let table = M.create() in let f x = try M.find x table with Not_found -> add x (f x) table in f, table let memoize (f : key -> 'a) : key -> 'a = let f, _table = visibly_memoize f in f let fix (ff : (key -> 'a) -> (key -> 'a)) : key -> 'a = let table = M.create() in let rec f x = try M.find x table with Not_found -> add x (ff f x) table in f (* In the implementation of [defensive_fix], we choose to use two tables. A permanent table, [table] maps keys to values. Once a pair [x, y] has been added to this table, it remains present forever: [x] is stable, and a call to [f x] returns [y] immediately. A transient table, [marked], is used only while a call is in progress. This table maps keys to integers: for each key [x], it records the depth of the stack at the time [x] was pushed onto the stack. Finally, [stack] is a list of the keys currently under examination (most recent key first), and [depth] is the length of the list [stack]. Recording integer depths in the table [marked] allows us to identify the desired cycle, a prefix of the list [stack], without requiring an equality test on keys. *) exception Cycle of key list * key let defensive_fix (ff : (key -> 'a) -> (key -> 'a)) : key -> 'a = (* Create the permanent table. *) let table = M.create() in (* Define the main recursive function. *) let rec f stack depth marked (x : key) : 'a = try M.find x table with Not_found -> match M.find x marked with | i -> (* [x] is marked, and was pushed onto the stack at a time when the stack depth was [i]. We have found a cycle. Fail. Cut a prefix of the reversed stack, which represents the cycle that we have detected, and reverse it on the fly. 
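For instance, if [stack] is [[c; b; a]] (most recent first), [depth] is [3], and [x] was marked at depth [i = 1], so that [x] is in fact [b], then [rev_take [] 2 stack] produces [[b; c]], that is, the cycle in discovery order, beginning with [x].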
*) raise (Cycle (rev_take [] (depth - i) stack, x)) | exception Not_found -> (* [x] is not marked. Mark it while we work on it. There is no need to unmark [x] afterwards; inserting it into [table] indicates that it has stabilized. There also is no need to catch and re-raise the exception [Cycle]; we just let it escape. *) M.add x depth marked; let stack = x :: stack and depth = depth + 1 in let y = ff (f stack depth marked) x in add x y table in fun x -> (* Create the transient table. *) let marked = M.create() and stack = [] and depth = 0 in (* Answer this query. *) f stack depth marked x (* The combinator [curried] can be used to obtain a curried version of [fix] or [defensive_fix] in a concrete instance where the type [key] is a product type. *) (* [curried] could be defined as a toplevel function; it does not depend on any of the code above. However, it seems convenient to place it here. *) let curry f x y = f (x, y) let uncurry f (x, y) = f x y let curried (fix : ('a * 'b -> 'c) fix) : ('a -> 'b -> 'c) fix = fun ff -> let ff f = uncurry (ff (curry f)) in curry (fix ff) end module ForOrderedType (T : OrderedType) = Make(Glue.PersistentMapsToImperativeMaps(Map.Make(T))) module ForHashedType (T : HashedType) = Make(Glue.HashTablesAsImperativeMaps(T)) module ForType (T : TYPE) = ForHashedType(Glue.TrivialHashedType(T)) module Char = ForType(Glue.CHAR) module Int = ForType(Glue.INT) module String = ForType(Glue.STRING) menhir-20210929/fix/src/Memoize.mli000066400000000000000000000046421412503066000167360ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**[Memoize] offers a number of combinators that help construct possibly recursive memoizing functions, that is, functions that lazily record their input/output graph, so as to avoid repeated computation. *) open Sigs (**[Make] constructs a memoizer for a type [key] that is equipped with an implementation of imperative maps. *) module Make (M : IMPERATIVE_MAPS) : MEMOIZER with type key = M.key and type 'a t = 'a M.t (**[ForOrderedType] is a special case of [Make] where it suffices to pass an ordered type [T] as an argument. A reference to a persistent map is used to hold the memoization table. *) module ForOrderedType (T : OrderedType) : MEMOIZER with type key = T.t and type 'a t = 'a Map.Make(T).t ref (**[ForHashedType] is a special case of [Make] where it suffices to pass a hashed type [T] as an argument. A hash table is used to hold the memoization table. *) module ForHashedType (T : HashedType) : MEMOIZER with type key = T.t and type 'a t = 'a Hashtbl.Make(T).t (**[ForType] is a special case of [Make] where it suffices to pass an arbitrary type [T] as an argument. A hash table is used to hold the memoization table. OCaml's built-in generic equality and hash functions are used. *) module ForType (T : TYPE) : MEMOIZER with type key = T.t (**Memoizers for some common types. 
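As a hedged illustration (a sketch, not part of this library), [Int.fix] yields a memoizing recursive function over the integers:

     let fib : int -> int =
       Int.fix (fun fib n -> if n <= 1 then n else fib (n - 1) + fib (n - 2))

   Thanks to memoization, [fib n] performs a linear number of additions, whereas the naive recursive definition performs an exponential number.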
*) module Char : MEMOIZER with type key = char module Int : MEMOIZER with type key = int module String : MEMOIZER with type key = string menhir-20210929/fix/src/Numbering.ml000066400000000000000000000061371412503066000171070ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) open Sigs let force o = match o with Some x -> x | None -> assert false module Make (M : IMPERATIVE_MAPS) = struct type t = M.key (* Create a generator of fresh integers. *) open Gensym let g = generator() let current () = current g (* Memoizing the function [fun _ -> fresh g] yields the function [encode], which maps keys to unique integers. We use [visibly_memoize] so as to have access to the memoization table. This allows us to use operations such as [M.find] and [M.iter] below. *) let (encode : t -> int), (table : int M.t) = let module Memo = Memoize.Make(M) in Memo.visibly_memoize (fun (_ : t) -> fresh g) (* Testing whether a key has been encountered already. *) let has_been_encoded (x : t) : bool = (* We do not have [M.mem], so we re-implement it in terms of [M.find]. *) try let _ = M.find x table in true with Not_found -> false (* Building a mapping of integer codes back to keys. *) let reverse_mapping () : t array = let n = current() in let reverse : t option array = Array.make n None in M.iter (fun x i -> reverse.(i) <- Some x ) table; Array.map force reverse module Done () = struct type t = M.key let n = current() let encode x = (* It would be an error to try and encode new keys now. Thus, if [x] has not been encountered before, the client is at fault. Fail with a nice informative message. *) if has_been_encoded x then encode x else Printf.sprintf "Fix.Numbering: invalid argument passed to \"encode\".\n%s\n" __LOC__ |> invalid_arg let reverse = reverse_mapping() let decode i = if 0 <= i && i < n then reverse.(i) else Printf.sprintf "Fix.Numbering: invalid argument passed to \"decode\".\n\ The index %d is not in the range [0, %d).\n%s\n" i n __LOC__ |> invalid_arg end end module ForOrderedType (T : OrderedType) = Make(Glue.PersistentMapsToImperativeMaps(Map.Make(T))) module ForHashedType (T : HashedType) = Make(Glue.HashTablesAsImperativeMaps(T)) module ForType (T : TYPE) = ForHashedType(Glue.TrivialHashedType(T)) menhir-20210929/fix/src/Numbering.mli000066400000000000000000000041021412503066000172460ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**[Numbering] offers a facility for assigning a unique number to each value in a certain finite set and translating (both ways) between values and their numbers. *) open Sigs (**The functor [Make] requires an implementation of maps for the type [M.key] and offers a two-phase numbering facility. 
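During the first phase, [encode] assigns a fresh number, counting from zero, to each key that it encounters; applying the functor [Done()] closes this phase, after which [encode] accepts previously encountered keys only and [decode] maps each number back to its key. A hedged sketch (not part of this library):

     module N = ForType (struct type t = string end)
     let x = N.encode "foo"   (* first phase: numbers are assigned on demand *)
     module M = N.Done ()     (* second phase: the numbering is frozen *)
     let () = assert (M.n = 1 && M.decode x = "foo")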
The function [encode] is backed by a map, therefore runs in logarithmic time or constant time, depending on the type of map that is used. The function [decode] is backed by an array of size [n], therefore runs in constant time. *) module Make (M : IMPERATIVE_MAPS) : TWO_PHASE_NUMBERING with type t = M.key (**[ForOrderedType] is a special case of [Make] where it suffices for keys to be ordered. *) module ForOrderedType (T : OrderedType) : TWO_PHASE_NUMBERING with type t = T.t (**[ForHashedType] is a special case of [Make] where it suffices for keys to be hashed. *) module ForHashedType (T : HashedType) : TWO_PHASE_NUMBERING with type t = T.t (**[ForType] is a special case of [Make] where keys can have arbitrary type. OCaml's built-in generic equality and hash functions are used. *) module ForType (T : TYPE) : TWO_PHASE_NUMBERING with type t = T.t menhir-20210929/fix/src/Option.ml000066400000000000000000000033131412503066000164220ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) module Option (X : sig type t end) = struct open X type property = t option let bottom = None let equal (o1 : property) (o2 : property) = (* It is permitted to assume that [o1 <= o2] holds. This implies that when [o1] is [Some x1] and [o2] is [Some x2] we may return [true] without actually comparing [x1] and [x2]. *) match o1, o2 with | Some _, None -> (* Because [o1 <= o2] holds, this cannot happen. *) let msg = Printf.sprintf "\n Fix.Prop.Option says: \ please check that your \"rhs\" function is \ monotone.\n %s\n" __LOC__ in raise (Invalid_argument msg) | None, Some _ -> false | None, None | Some _, Some _ -> true let is_maximal o = match o with | None -> false | Some _ -> true end menhir-20210929/fix/src/Option.mli000066400000000000000000000023471412503066000165770ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**The option type, equipped with the ordering [None <= Some x]. *) (**This ordering is not a lattice. *) (**Although the code is polymorphic in the type of elements, it must still be packaged as a functor, because [property] cannot be a parameterized type. *) open Sigs module Option (X : sig type t end) : PROPERTY with type property = X.t option menhir-20210929/fix/src/Set.ml000066400000000000000000000024061412503066000157070ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE.
*) (* *) (******************************************************************************) module Set (X : sig type t val empty: t val equal: t -> t -> bool end) = struct type property = X.t (* a set *) let bottom = X.empty let equal = X.equal let is_maximal _s = false (* We do not know what the full set is. We could take it as a functor argument, but the comparison would be costly anyway, so that seems pointless. *) end menhir-20210929/fix/src/Set.mli000066400000000000000000000021471412503066000160620ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**The lattice of sets. The ordering is set inclusion. Therefore, the empty set is the bottom element. *) open Sigs module Set (X : sig type t val empty: t val equal: t -> t -> bool end) : PROPERTY with type property = X.t menhir-20210929/fix/src/Sigs.ml000066400000000000000000000273731412503066000160730ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* -------------------------------------------------------------------------- *) (**A type alone. *) module type TYPE = sig type t end (* -------------------------------------------------------------------------- *) (**An ordered type. A hashed type. These are standard notions. *) module type OrderedType = Map.OrderedType module type HashedType = Hashtbl.HashedType (* -------------------------------------------------------------------------- *) (**A type whose elements can be enumerated. *) module type FINITE_TYPE = sig type t val foreach: (t -> unit) -> unit end (* -------------------------------------------------------------------------- *) (**Association maps. *) (**Following the convention of the ocaml standard library, [find] raises the exception [Not_found] when the key is not in the domain of the map. In contrast, [get] returns an option. *) (**Persistent maps. The empty map is a constant. Insertion creates a new map. *) (**This is a fragment of the standard signature [Map.S]. *) module type PERSISTENT_MAPS = sig type key type 'data t val empty: 'data t val add: key -> 'data -> 'data t -> 'data t val find: key -> 'data t -> 'data val iter: (key -> 'data -> unit) -> 'data t -> unit end (**Imperative maps. A fresh empty map is produced by [create]. Insertion updates a map in place. [clear] empties an existing map. *) (**The order of the arguments to [add] and [find] is consistent with the order used in [PERSISTENT_MAPS] above. Thus, it departs from the convention used in OCaml's [Hashtbl] module. 
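As an illustration, here is a minimal sketch of how OCaml's [Hashtbl] can be wrapped so as to satisfy [MINIMAL_IMPERATIVE_MAPS] for string keys; the module [Glue] in this library offers ready-made adapters of this kind:
{[
  module StringMaps = struct
    type key = string
    type 'data t = (string, 'data) Hashtbl.t
    let create () = Hashtbl.create 127
    (* Key and datum come first; the map comes last. *)
    let add k d m = Hashtbl.replace m k d
    let find k m = Hashtbl.find m k   (* raises Not_found if absent *)
  end
]}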
*) module type MINIMAL_IMPERATIVE_MAPS = sig type key type 'data t val create: unit -> 'data t val add: key -> 'data -> 'data t -> unit val find: key -> 'data t -> 'data end module type IMPERATIVE_MAPS = sig include MINIMAL_IMPERATIVE_MAPS val clear: 'data t -> unit val iter: (key -> 'data -> unit) -> 'data t -> unit end (* -------------------------------------------------------------------------- *) (**The signature [PROPERTY] is used by [Fix.Make], the least fixed point computation algorithm. *) (**The type [property] must form a partial order. It must be equipped with a least element [bottom] and with an equality test [equal]. (In the function call [equal p q], it is permitted to assume that [p <= q] holds.) We do not require an ordering test [leq]. We do not require a join operation [lub]. We do require the ascending chain condition: every monotone sequence must eventually stabilize. *) (**The function [is_maximal] determines whether a property [p] is maximal with respect to the partial order. Only a conservative check is required: in any event, it is permitted for [is_maximal p] to be [false]. If [is_maximal p] is [true], then [p] must have no strict upper bound. In particular, in the case where properties form a lattice, this means that [p] must be the top element. *) module type PROPERTY = sig type property val bottom: property val equal: property -> property -> bool val is_maximal: property -> bool end (* -------------------------------------------------------------------------- *) (**The signature [SEMI_LATTICE] offers separate [leq] and [join] functions. The functor [Glue.MinimalSemiLattice] can be used, if necessary, to convert this signature to [MINIMAL_SEMI_LATTICE]. *) module type SEMI_LATTICE = sig type property val leq: property -> property -> bool val join: property -> property -> property end (**The signature [MINIMAL_SEMI_LATTICE] is used by [Fix.DataFlow]. *) module type MINIMAL_SEMI_LATTICE = sig type property (** [leq_join p q] must compute the join of [p] and [q]. If the result is logically equal to [q], then [q] itself must be returned. Thus, we have [leq_join p q == q] if and only if [leq p q] holds. *) val leq_join: property -> property -> property end (* -------------------------------------------------------------------------- *) (**The type of a fixed point combinator that constructs a value of type ['a]. *) type 'a fix = ('a -> 'a) -> 'a (* -------------------------------------------------------------------------- *) (**Memoizers -- higher-order functions that construct memoizing functions. *) module type MEMOIZER = sig (**A type of keys. *) type key (**A memoization combinator for this type. *) val memoize: (key -> 'a) -> (key -> 'a) (**A memoization combinator where the memoization table is exposed. *) type 'a t val visibly_memoize: (key -> 'a) -> (key -> 'a) * 'a t (**A recursive memoization combinator. *) val fix: (key -> 'a) fix (**[defensive_fix] works like [fix], except it additionally detects circular dependencies, which can arise if the second-order function supplied by the user does not follow a well-founded recursion pattern. When the user invokes [f x], where [f] is the function returned by [defensive_fix], if a cyclic dependency is detected, then [Cycle (zs, z)] is raised, where the list [zs] begins with [z] and continues with a series of intermediate keys, leading back to [z]. Note that undetected divergence remains possible; this corresponds to an infinite dependency chain, without a cycle. 
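As a hedged sketch (assuming the memoizer [Memoize.Int] offered by this library), the following deliberately circular definition causes [Cycle] to be raised at the first application:
{[
  module M = Memoize.Int
  let f : int -> int =
    M.defensive_fix (fun f x -> f x)  (* every call depends on itself *)
  let () =
    match f 0 with
    | _ -> assert false
    | exception M.Cycle (zs, z) ->
        assert (List.hd zs = z)       (* [zs] begins with [z] *)
]}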
*) exception Cycle of key list * key val defensive_fix: (key -> 'a) fix (**This combinator can be used to obtain a curried version of [fix] or [defensive_fix] in a concrete instance where the type [key] is a product type. *) val curried: ('a * 'b -> 'c) fix -> ('a -> 'b -> 'c) fix end (* -------------------------------------------------------------------------- *) (**Tabulators: higher-order functions that construct tabulated functions. *) (**Like memoization, tabulation guarantees that, for every key [x], the image [f x] is computed at most once. Unlike memoization, where this computation takes place on demand, in the case of tabulation, the computation of every [f x] takes place immediately, when [tabulate] is invoked. The graph of the function [f], a table, is constructed and held in memory. *) module type TABULATOR = sig (**A type of keys. *) type key (**A tabulation combinator for this type. *) val tabulate: (key -> 'a) -> (key -> 'a) end (* -------------------------------------------------------------------------- *) (**Solvers: higher-order functions that compute the least solution of a monotone system of equations. *) module type SOLVER = sig type variable type property (**A valuation is a mapping of variables to properties. *) type valuation = variable -> property (**A right-hand side, when supplied with a valuation that gives meaning to its free variables, evaluates to a property. More precisely, a right-hand side is a monotone function of valuations to properties. *) type rhs = valuation -> property (**A system of equations is a mapping of variables to right-hand sides. *) type equations = variable -> rhs (**[lfp eqs] produces the least solution of the system of monotone equations [eqs]. *) (**It is guaranteed that, for each variable [v], the application [eqs v] is performed at most once (whereas the right-hand side produced by this application is, in general, evaluated multiple times). This guarantee can be used to perform costly pre-computation, or memory allocation, when [eqs] is applied to its first argument. *) (**When [lfp] is applied to a system of equations [eqs], it performs no actual computation. It produces a valuation, [get], which represents the least solution of the system of equations. The actual fixed point computation takes place, on demand, when [get] is applied. *) val lfp: equations -> valuation end (* -------------------------------------------------------------------------- *) (**The signature [SOLUTION] is used to describe the result of [Fix.DataFlow]. *) module type SOLUTION = sig type variable type property val solution: variable -> property end (* -------------------------------------------------------------------------- *) (**Directed, rooted graphs. *) module type GRAPH = sig type t val foreach_root: (t -> unit) -> unit val foreach_successor: t -> (t -> unit) -> unit end (* -------------------------------------------------------------------------- *) (**The signature [DATA_FLOW_GRAPH] is used to describe a data flow analysis problem. It is used to describe the input to [Fix.DataFlow]. *) (**The function [foreach_root] describes the root nodes of the data flow graph as well as the properties associated with them. *) (**The function [foreach_successor] describes the edges of the data flow graph as well as the manner in which a property at the source of an edge is transformed into a property at the target. The property at the target must of course be a monotonic function of the property at the source. 
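As a hedged sketch, here is a data flow graph that describes forward reachability from node 0, where [successors : int -> int list] is an assumed description of the edges and where properties are trivial (so monotonicity holds vacuously):
{[
  module G = struct
    type variable = int
    type property = unit
    let foreach_root yield =
      yield 0 ()
    let foreach_successor x () yield =
      List.iter (fun y -> yield y ()) (successors x)
  end
]}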
*) module type DATA_FLOW_GRAPH = sig type variable type property val foreach_root: (variable -> property -> unit) -> unit val foreach_successor: variable -> property -> (variable -> property -> unit) -> unit end (* -------------------------------------------------------------------------- *) (**Numberings. *) (**An ongoing numbering of (a subset of) a type [t] offers a function [encode] which maps a value of type [t] to a unique integer code. If applied twice to the same value, [encode] returns the same code; if applied to a value that has never been encountered, it returns a fresh code. The function [current] returns the next available code, which is also the number of values that have been encoded so far. The function [has_been_encoded] tests whether a value has been encoded already. *) module type ONGOING_NUMBERING = sig type t val encode: t -> int val current: unit -> int val has_been_encoded: t -> bool end (**A numbering of (a subset of) a type [t] is a triple of an integer [n] and two functions [encode] and [decode] which represent an isomorphism between this subset of [t] and the interval [\[0..n)]. *) module type NUMBERING = sig type t val n: int val encode: t -> int val decode: int -> t end (**A combination of the above two signatures. According to this signature, a numbering process is organized in two phases. During the first phase, the numbering is ongoing; one can encode keys, but not decode. Applying the functor [Done()] ends the first phase. A fixed numbering then becomes available, which gives access to the total number [n] of encoded keys and to both [encode] and [decode] functions. *) module type TWO_PHASE_NUMBERING = sig include ONGOING_NUMBERING module Done () : NUMBERING with type t = t end (* -------------------------------------------------------------------------- *) (**Injections. *) (**An injection of [t] into [u] is an injective function of type [t -> u]. Because [encode] is injective, [encode x] can be thought of as the identity of the object [x]. *) module type INJECTION = sig type t type u val encode: t -> u end menhir-20210929/fix/src/Tabulate.ml000066400000000000000000000037361412503066000167240ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) open Sigs module Make (F : FINITE_TYPE) (M : MINIMAL_IMPERATIVE_MAPS with type key = F.t) = struct type key = M.key let tabulate (f : key -> 'a) : key -> 'a = let table = M.create() in F.foreach (fun x -> M.add x (f x) table); fun x -> try M.find x table with Not_found -> (* This cannot happen if [foreach] is exhaustive. 
*) let msg = Printf.sprintf "\n Fix.Tabulate says: \ please check that your \"foreach\" function is \ exhaustive.\n %s\n" __LOC__ in raise (Invalid_argument msg) end module ForOrderedType (F : FINITE_TYPE) (T : OrderedType with type t = F.t) = Make(F)(Glue.PersistentMapsToImperativeMaps(Map.Make(T))) module ForHashedType (F : FINITE_TYPE) (T : HashedType with type t = F.t) = Make(F)(Glue.HashTablesAsImperativeMaps(T)) module ForType (F : FINITE_TYPE) = ForHashedType(F)(Glue.TrivialHashedType(F)) module ForIntSegment (K : sig val n: int end) = struct type key = int let tabulate (f : key -> 'a) : key -> 'a = let table = Array.init K.n f in fun x -> table.(x) end menhir-20210929/fix/src/Tabulate.mli000066400000000000000000000045611412503066000170720ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (**[Tabulate] offers facilities for tabulating a function, that is, eagerly evaluating this function at every point in its domain, so as to obtain an equivalent function that can be queried in constant time. *) open Sigs (**[Make] constructs a tabulator for a finite type that is equipped with an implementation of imperative maps. *) module Make (F : FINITE_TYPE) (M : MINIMAL_IMPERATIVE_MAPS with type key = F.t) : TABULATOR with type key = F.t (**[ForOrderedType] is a special case of [Make] where it suffices to pass a finite ordered type as an argument. A reference to a persistent map is used to hold the table. *) module ForOrderedType (F : FINITE_TYPE) (T : OrderedType with type t = F.t) : TABULATOR with type key = F.t (**[ForHashedType] is a special case of [Make] where it suffices to pass a finite hashed type as an argument. A hash table is used to hold the table. *) module ForHashedType (F : FINITE_TYPE) (T : HashedType with type t = F.t) : TABULATOR with type key = F.t (**[ForType] is a special case of [Make] where it suffices to pass an arbitrary finite type as an argument. A hash table is used to hold the table. *) module ForType (F : FINITE_TYPE) : TABULATOR with type key = F.t (**[ForIntSegment] constructs a tabulator for the integer segment [\[0..n)]. An array is used to hold the table. *) module ForIntSegment (K : sig val n: int end) : TABULATOR with type key = int menhir-20210929/fix/src/Vendored_fix.ml000066400000000000000000000047621412503066000175730ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* The module [Fix] that we present to the final user is obtained as a combination of several modules, as follows. *) (* Define all signatures in the toplevel structure. We expect the user to declare [open Fix] and to have direct access to all of these signatures, under unqualified names.
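For instance, a client might write the following (a hedged sketch; the submodule [Memoize] is re-exported below):

     open Fix

     let fib : int -> int =
       Memoize.Int.fix (fun fib n ->
         if n <= 1 then n else fib (n - 1) + fib (n - 2))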
*) include Sigs (* Give access to the following modules as submodules. Thus, if the user has declared [open Fix], then she can use [Glue], [Memoize], etc. If she hasn't, then she must use [Fix.Glue], [Fix.Memoize], etc. *) module Glue = Glue module Memoize = Memoize module Numbering = Numbering module GraphNumbering = GraphNumbering module Tabulate = Tabulate module Gensym = Gensym module HashCons = HashCons module DataFlow = DataFlow module Prop = struct (**[Prop] offers a number of ready-made implementations of the signature [PROPERTY]. *) (**The lattice of Booleans. *) module Boolean = Boolean (* The following declarations are set up so that the user sees [Prop.Option] and [Prop.Set] as functors. *) (**The lattice of options. *) include Option (**The lattice of sets. *) include Set end (* As a special case, [Core] is renamed [Fix]. Thus, if the user has declared [open Fix], then she can still use [Fix.Make], [Fix.ForHashedType], etc. (This seems nice.) If she hasn't, then she can still use [Fix.Make], because we define an alias for [Make] one level up. This is required for compatibility with earlier versions of [Fix] (2013-2018), where [Fix.Make] was the sole entry point. *) module Fix = Core module Make = Core.Make menhir-20210929/fix/src/attic/000077500000000000000000000000001412503066000157245ustar00rootroot00000000000000menhir-20210929/fix/src/attic/BoolEqs.ml000066400000000000000000000040731412503066000176260ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) open Fix (* -------------------------------------------------------------------------- *) (* Positive Boolean formulae. *) type 'variable formula = | FVar of 'variable | FTrue | FFalse | FDisjunction of 'variable formula * 'variable formula | FConjunction of 'variable formula * 'variable formula (* Construction. *) let disjunction f xs = List.fold_left (fun formula x -> FDisjunction (formula, f x)) FFalse xs let conjunction f xs = List.fold_left (fun formula x -> FConjunction (formula, f x)) FTrue xs (* Evaluation. *) let rec eval f env = match f with | FVar x -> env x | FTrue -> true | FFalse -> false | FDisjunction (f1, f2) -> eval f1 env || eval f2 env | FConjunction (f1, f2) -> eval f1 env && eval f2 env (* -------------------------------------------------------------------------- *) (* We are now ready to solve. *) module Make (M : IMPERATIVE_MAPS) = struct type variable = M.key type valuation = variable -> bool type equations = variable -> variable formula module F = Fix.Make(M)(Boolean) let lfp (eqs : equations) : valuation = let eqs : F.equations = fun v -> eval (eqs v) in F.lfp eqs end menhir-20210929/fix/src/attic/BoolEqs.mli000066400000000000000000000033271412503066000200000ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. 
*) (* *) (******************************************************************************) open Fix (* -------------------------------------------------------------------------- *) (* Positive Boolean formulae. *) type 'variable formula = | FVar of 'variable | FTrue | FFalse | FDisjunction of 'variable formula * 'variable formula | FConjunction of 'variable formula * 'variable formula (* TEMPORARY remove these two constructors? *) val disjunction: ('a -> 'variable formula) -> 'a list -> 'variable formula val conjunction: ('a -> 'variable formula) -> 'a list -> 'variable formula (* -------------------------------------------------------------------------- *) (* Solving systems of recursive positive Boolean equations. *) module Make (M : IMPERATIVE_MAPS) : sig type variable = M.key type valuation = variable -> bool type equations = variable -> variable formula val lfp: equations -> valuation end menhir-20210929/fix/src/attic/ChopFix.ml000066400000000000000000000127471412503066000176250ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) open Maps open Fix (* -------------------------------------------------------------------------- *) (* Systems of syntactic equations. *) module type SYNTACTIC_EQUATIONS = sig type variable type 'variable formula val rhs: variable -> variable formula end (* -------------------------------------------------------------------------- *) (* Generic syntax for right-hand sides. *) module type SYNTAX = sig type 'variable formula type property val var: 'a -> 'a formula val cost: 'a formula -> int val map: ('a -> 'b) -> ('a formula -> 'b formula) -> 'a formula -> 'b formula val eval: 'variable formula -> ('variable -> property) -> property end (* -------------------------------------------------------------------------- *) module Solve (S : SYNTAX) = struct open S (* -------------------------------------------------------------------------- *) (* A system of arbitrary equations is turned into a system of small equations by chopping formulae so as to ensure that their size is bounded by a constant [k]. *) let k = 32 (* TEMPORARY parameterize? *) (* In the new equation system, variables (known as ``points'') are either original variables, or roots of small formulae whose leaves are again points. Each such root carries a mutable field for use by the [Fix] module. This allows [Fix] to perform memoization at each formula root, and guarantees linear time complexity.
*) type ('variable, 'property) dummy = (* TEMPORARY *) ('variable, 'property) info type ('variable, 'property) point = | V of 'variable | R of ('variable, 'property) root and ('variable, 'property) root = { formula: ('variable, 'property) point formula; (* a small formula *) mutable data: ('variable, 'property) info option; } and ('variable, 'property) info = (('variable, 'property) point, 'property) dummy let mkV v = V v (* -------------------------------------------------------------------------- *) module Make (V : sig type variable end) (P : PROPERTY with type property = S.property) (M : IMPERATIVE_MAP with type key = V.variable and type data = (V.variable, P.property) info) (E : SYNTACTIC_EQUATIONS with type variable = V.variable and type 'variable formula = 'variable S.formula) = struct (* -------------------------------------------------------------------------- *) (* Conversion of arbitrary formulae to small formulae. *) module SmallEquations = struct type variable = (E.variable, P.property) point type property = P.property let rec chop (f : E.variable formula) : variable formula = let credit = ref k in let rec chopc (f : E.variable formula) : variable formula = let cost = cost f in if !credit <= 0 && cost > 0 then var (R { formula = chop f; data = None }) else ( credit := !credit - cost; map mkV chopc f ) in chopc f (* Because [rhs] is invoked at most once per variable, each right-hand side is chopped at most once. Thus, at most one copy of each formula root exists. This allows storing information about the root within the root itself. *) let rhs : variable -> variable formula = function | V v -> chop (E.rhs v) | R root -> root.formula let rhs (v : variable) : (variable -> property) -> property = eval (rhs v) end (* -------------------------------------------------------------------------- *) (* An imperative map over points is implemented in terms of the existing imperative map over variables, on the one hand, and of the mutable storage field found within each formula root, on the other hand. *) module MixedMap = struct type key = SmallEquations.variable type data = (V.variable, P.property) info let set (key : key) (data : data) : unit = match key with | V x -> M.set x data | R root -> root.data <- Some data let get (key : key) : data option = match key with | V x -> M.get x | R root -> root.data end (* -------------------------------------------------------------------------- *) (* We are now ready to solve the small equation system. *) module Solution = Make (SmallEquations) (P) (MixedMap) (SmallEquations) (* -------------------------------------------------------------------------- *) (* Publish the solution (at original variables only). *) type variable = E.variable type property = P.property let get (v : variable) = Solution.get (V v) end end (* TEMPORARY measure performance without (singular) imperative map; use (plural) imperative maps instead, implemented as a hash table or an infinite array; see if it's worth the trouble of publishing those info types. *) menhir-20210929/fix/src/attic/ChopFix.mli000066400000000000000000000041461412503066000177740ustar00rootroot00000000000000(******************************************************************************) (* *) (* Fix *) (* *) (* François Pottier, Inria Paris *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. 
*) (* *) (******************************************************************************) open Maps open Fix (* -------------------------------------------------------------------------- *) (* Systems of syntactic equations. *) module type SYNTACTIC_EQUATIONS = sig type variable type 'variable formula val rhs: variable -> variable formula end (* -------------------------------------------------------------------------- *) (* Generic syntax for right-hand sides. *) module type SYNTAX = sig type 'variable formula type property val var: 'a -> 'a formula val cost: 'a formula -> int val map: ('a -> 'b) -> ('a formula -> 'b formula) -> 'a formula -> 'b formula val eval: 'variable formula -> ('variable -> property) -> property end (* -------------------------------------------------------------------------- *) (* Chopping and solving. *) module Solve (S : SYNTAX) : sig type ('variable, 'property) info module Make (V : sig type variable end) (P : PROPERTY with type property = S.property) (M : IMPERATIVE_MAP with type key = V.variable and type data = (V.variable, P.property) info) (E : SYNTACTIC_EQUATIONS with type variable = V.variable and type 'variable formula = 'variable S.formula) : VALUATION with type variable = V.variable and type property = P.property end menhir-20210929/fix/src/dune000066400000000000000000000002111412503066000154700ustar00rootroot00000000000000(library (name vendored_fix) (public_name vendored_fix) (synopsis "An on-demand, incremental fixed point computation algorithm") ) menhir-20210929/fix/vendored_fix.opam000066400000000000000000000007121412503066000173630ustar00rootroot00000000000000name: "vendored_fix" opam-version: "2.0" maintainer: "francois.pottier@inria.fr" authors: [ "François Pottier " ] homepage: "https://gitlab.inria.fr/fpottier/fix" dev-repo: "git+https://gitlab.inria.fr/fpottier/fix.git" bug-reports: "francois.pottier@inria.fr" build: [ ["dune" "build" "-p" name "-j" jobs] ] depends: [ "ocaml" { >= "4.02.3" } "dune" { >= "1.3" } ] synopsis: "Facilities for memoization and fixed points" menhir-20210929/lib/000077500000000000000000000000001412503066000140115ustar00rootroot00000000000000menhir-20210929/lib/Convert.ml000066400000000000000000000110121412503066000157560ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* An ocamlyacc-style, or Menhir-style, parser requires access to the lexer, which must be parameterized with a lexing buffer, and to the lexing buffer itself, where it reads position information. *) (* This traditional API is convenient when used with ocamllex, but inelegant when used with other lexer generators. *) type ('token, 'semantic_value) traditional = (Lexing.lexbuf -> 'token) -> Lexing.lexbuf -> 'semantic_value (* This revised API is independent of any lexer generator. Here, the parser only requires access to the lexer, and the lexer takes no parameters. The tokens returned by the lexer may contain position information. 
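For example, here is a hedged sketch in which an ocamlyacc-style parser [Parser.main] (an assumed name) is converted, then driven by tokens drawn from a queue [q] of triples of a raw token, a start position, and an end position:

     let revised = Simplified.traditional2revised Parser.main
     let v = revised (fun () -> Queue.pop q)

The module [Simplified], defined at the end of this file, handles such concrete triples.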
*) type ('token, 'semantic_value) revised = (unit -> 'token) -> 'semantic_value (* --------------------------------------------------------------------------- *) (* Converting a traditional parser, produced by ocamlyacc or Menhir, into a revised parser. *) (* A token of the revised lexer is essentially a triple of a token of the traditional lexer (or raw token), a start position, and an end position. The three [get] functions are accessors. *) (* We do not require the type ['token] to actually be a triple type. This enables complex applications where it is a record type with more than three fields. It also enables simple applications where positions are of no interest, so ['token] is just ['raw_token] and [get_startp] and [get_endp] return dummy positions. *) let traditional2revised (get_raw_token : 'token -> 'raw_token) (get_startp : 'token -> Lexing.position) (get_endp : 'token -> Lexing.position) (parser : ('raw_token, 'semantic_value) traditional) : ('token, 'semantic_value) revised = (* Accept a revised lexer. *) fun (lexer : unit -> 'token) -> (* Create a dummy lexing buffer. *) let lexbuf : Lexing.lexbuf = Lexing.from_string "" in (* Wrap the revised lexer as a traditional lexer. A traditional lexer returns a raw token and updates the fields of the lexing buffer with new positions, which will be read by the parser. *) let lexer (lexbuf : Lexing.lexbuf) : 'raw_token = let token : 'token = lexer() in lexbuf.Lexing.lex_start_p <- get_startp token; lexbuf.Lexing.lex_curr_p <- get_endp token; get_raw_token token in (* Invoke the traditional parser. *) parser lexer lexbuf (* --------------------------------------------------------------------------- *) (* Converting a revised parser back to a traditional parser. *) let revised2traditional (make_token : 'raw_token -> Lexing.position -> Lexing.position -> 'token) (parser : ('token, 'semantic_value) revised) : ('raw_token, 'semantic_value) traditional = (* Accept a traditional lexer and a lexing buffer. *) fun (lexer : Lexing.lexbuf -> 'raw_token) (lexbuf : Lexing.lexbuf) -> (* Wrap the traditional lexer as a revised lexer. *) let lexer () : 'token = let token : 'raw_token = lexer lexbuf in make_token token lexbuf.Lexing.lex_start_p lexbuf.Lexing.lex_curr_p in (* Invoke the revised parser. *) parser lexer (* --------------------------------------------------------------------------- *) (* Simplified versions of the above, where concrete triples are used. *) module Simplified = struct let traditional2revised parser = traditional2revised (fun (token, _, _) -> token) (fun (_, startp, _) -> startp) (fun (_, _, endp) -> endp) parser let revised2traditional parser = revised2traditional (fun token startp endp -> (token, startp, endp)) parser end menhir-20210929/lib/Convert.mli000066400000000000000000000065701412503066000161440ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE.
*) (* *) (******************************************************************************) (* An ocamlyacc-style, or Menhir-style, parser requires access to the lexer, which must be parameterized with a lexing buffer, and to the lexing buffer itself, where it reads position information. *) (* This traditional API is convenient when used with ocamllex, but inelegant when used with other lexer generators. *) type ('token, 'semantic_value) traditional = (Lexing.lexbuf -> 'token) -> Lexing.lexbuf -> 'semantic_value (* This revised API is independent of any lexer generator. Here, the parser only requires access to the lexer, and the lexer takes no parameters. The tokens returned by the lexer may contain position information. *) type ('token, 'semantic_value) revised = (unit -> 'token) -> 'semantic_value (* --------------------------------------------------------------------------- *) (* Converting a traditional parser, produced by ocamlyacc or Menhir, into a revised parser. *) (* A token of the revised lexer is essentially a triple of a token of the traditional lexer (or raw token), a start position, and an end position. The three [get] functions are accessors. *) (* We do not require the type ['token] to actually be a triple type. This enables complex applications where it is a record type with more than three fields. It also enables simple applications where positions are of no interest, so ['token] is just ['raw_token] and [get_startp] and [get_endp] return dummy positions. *) val traditional2revised: ('token -> 'raw_token) -> ('token -> Lexing.position) -> ('token -> Lexing.position) -> ('raw_token, 'semantic_value) traditional -> ('token, 'semantic_value) revised (* --------------------------------------------------------------------------- *) (* Converting a revised parser back to a traditional parser. *) val revised2traditional: ('raw_token -> Lexing.position -> Lexing.position -> 'token) -> ('token, 'semantic_value) revised -> ('raw_token, 'semantic_value) traditional (* --------------------------------------------------------------------------- *) (* Simplified versions of the above, where concrete triples are used. *) module Simplified : sig val traditional2revised: ('token, 'semantic_value) traditional -> ('token * Lexing.position * Lexing.position, 'semantic_value) revised val revised2traditional: ('token * Lexing.position * Lexing.position, 'semantic_value) revised -> ('token, 'semantic_value) traditional end menhir-20210929/lib/Engine.ml000066400000000000000000001133271412503066000155570ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) type position = Lexing.position open EngineTypes (* The LR parsing engine. *) (* This module is used: - at compile time, if so requested by the user, via the --interpret options; - at run time, in the table-based back-end. *) module Make (T : TABLE) = struct (* This propagates type and exception definitions. The functions [number], [production_index], [find_production], too, are defined by this [include] declaration.
*) include T type 'a env = (state, semantic_value, token) EngineTypes.env (* ------------------------------------------------------------------------ *) (* The type [checkpoint] represents an intermediate or final result of the parser. See [EngineTypes]. *) (* The type [checkpoint] is presented to the user as a private type (see [IncrementalEngine]). This prevents the user from manufacturing checkpoints (i.e., continuations) that do not make sense. (Such continuations could potentially violate the LR invariant and lead to crashes.) *) (* 2017/03/29 Although [checkpoint] is a private type, we now expose a constructor function, [input_needed]. This function allows manufacturing a checkpoint out of an environment. For this reason, the type [env] must also be parameterized with ['a]. *) type 'a checkpoint = | InputNeeded of 'a env | Shifting of 'a env * 'a env * bool | AboutToReduce of 'a env * production | HandlingError of 'a env | Accepted of 'a | Rejected (* ------------------------------------------------------------------------ *) (* As of 2020/12/16, we introduce a choice between multiple error handling strategies. *) (* Regardless of the strategy, when a syntax error is encountered, the function [initiate] is called, a [HandlingError] checkpoint is produced, and (after resuming) the function [error] is called. This function checks whether the current state allows shifting, reducing, or neither, when the lookahead token is [error]. Its behavior, then, depends on the strategy, as follows. *) (* In the legacy strategy, which until now was the only strategy, - If shifting is possible, then a [Shifting] checkpoint is produced, whose field [please_discard] is [true], so (after resuming) an [InputNeeded] checkpoint is produced, and (after a new token has been provided) the parser leaves error-handling mode and returns to normal mode. - If reducing is possible, then one or more reductions are performed. Default reductions are announced via [AboutToReduce] checkpoints, whereas ordinary reductions are performed silently. (It is unclear why this is so.) The parser remains in error-handling mode, so another [HandlingError] checkpoint is produced, and the function [error] is called again. - If neither action is possible and if the stack is nonempty, then a cell is popped off the stack, then a [HandlingError] checkpoint is produced, and the function [error] is called again. - If neither action is possible and if the stack is empty, then the parse dies with a [Reject] checkpoint. *) (* The simplified strategy differs from the legacy strategy as follows: - When shifting, a [Shifting] checkpoint is produced, whose field [please_discard] is [false], so the parser does not request another token, and the parser remains in error-handling mode. (If the destination state of this shift transition has a default reduction, then the parser will perform this reduction as its next step.) - When reducing, all reductions are announced by [AboutToReduce] checkpoints. - If neither shifting [error] nor reducing on [error] is possible, then the parser dies with a [Reject] checkpoint. (The parser does not attempt to pop cells off the stack one by one.) This simplified strategy is appropriate when the grammar uses the [error] token in a limited way, where the [error] token always appears at the end of a production whose semantic action raises an exception (whose purpose is to signal a syntax error and perhaps produce a custom message). Then, the parser must not request one token past the syntax error. 
(In a REPL, that would be undesirable.) It must perform as many reductions on [error] as possible, then (if possible) shift the [error] token and move to a new state where a default reduction will be possible. (Because the [error] token always appears at the end of a production, no other action can exist in that state, so a default reduction must exist.) The semantic action raises an exception, and that is it. *) (* Let us note that it is also possible to perform no error handling at all, or to perform customized error handling, by stopping as soon as the first [ErrorHandling] checkpoint appears. *) type strategy = [ `Legacy | `Simplified ] (* ------------------------------------------------------------------------ *) (* In the code-based back-end, the [run] function is sometimes responsible for pushing a new cell on the stack. This is motivated by code sharing concerns. In this interpreter, there is no such concern; [run]'s caller is always responsible for updating the stack. *) (* In the code-based back-end, there is a [run] function for each state [s]. This function can behave in two slightly different ways, depending on when it is invoked, or (equivalently) depending on [s]. If [run] is invoked after shifting a terminal symbol (or, equivalently, if [s] has a terminal incoming symbol), then [run] discards a token, unless [s] has a default reduction on [#]. (Indeed, in that case, requesting the next token might drive the lexer off the end of the input stream.) If, on the other hand, [run] is invoked after performing a goto transition, or invoked directly by an entry point, then there is nothing to discard. These two cases are reflected in [CodeBackend.gettoken]. Here, the code is structured in a slightly different way. It is up to the caller of [run] to indicate whether to discard a token, via the parameter [please_discard]. This flag is set when [s] is being entered by shifting a terminal symbol and [s] does not have a default reduction on [#]. *) (* The following recursive group of functions are tail recursive, produce a checkpoint of type [semantic_value checkpoint], and cannot raise an exception. A semantic action can raise [Error], but this exception is immediately caught within [reduce]. *) let rec run env please_discard : semantic_value checkpoint = (* Log the fact that we just entered this state. *) if log then Log.state env.current; (* If [please_discard] is set, we discard the current lookahead token and fetch the next one. In order to request a token from the user, we return an [InputNeeded] continuation, which, when invoked by the user, will take us to [discard]. If [please_discard] is not set, we skip this step and jump directly to [check_for_default_reduction]. *) if please_discard then InputNeeded env else check_for_default_reduction env (* [discard env triple] stores [triple] into [env], overwriting the previous token. It is invoked by [offer], which itself is invoked by the user in response to an [InputNeeded] checkpoint. *) and discard env triple = if log then begin let (token, startp, endp) = triple in Log.lookahead_token (T.token2terminal token) startp endp end; let env = { env with error = false; triple } in check_for_default_reduction env and check_for_default_reduction env = (* Examine what situation we are in. This case analysis is analogous to that performed in [CodeBackend.gettoken], in the sub-case where we do not have a terminal incoming symbol. 
*) T.default_reduction env.current announce_reduce (* there is a default reduction; perform it *) check_for_error_token (* there is none; continue below *) env and check_for_error_token env = (* There is no default reduction. Consult the current lookahead token so as to determine which action should be taken. *) (* Peeking at the first input token, without taking it off the input stream, is done by reading [env.triple]. We are careful to first check [env.error]. *) (* Note that, if [please_discard] was true, then we have just called [discard], so the lookahead token cannot be [error]. *) (* Returning [HandlingError env] is like calling [error ~strategy env] directly, except it allows the user to regain control and choose an error-handling strategy. *) if env.error then begin if log then Log.resuming_error_handling(); HandlingError env end else let (token, _, _) = env.triple in (* We consult the two-dimensional action table, indexed by the current state and the current lookahead token, in order to determine which action should be taken. *) T.action env.current (* determines a row *) (T.token2terminal token) (* determines a column *) (T.token2value token) shift (* shift continuation *) announce_reduce (* reduce continuation *) initiate (* failure continuation *) env (* ------------------------------------------------------------------------ *) (* This function takes care of shift transitions along a terminal symbol. (Goto transitions are taken care of within [reduce] below.) The symbol can be either an actual token or the [error] pseudo-token. *) (* Here, the lookahead token CAN be [error]. *) and shift env (please_discard : bool) (terminal : terminal) (value : semantic_value) (s' : state) = (* Log the transition. *) if log then Log.shift terminal s'; (* Push a new cell onto the stack, containing the identity of the state that we are leaving. *) let (_, startp, endp) = env.triple in let stack = { state = env.current; semv = value; startp; endp; next = env.stack; } in (* Switch to state [s']. *) let new_env = { env with stack; current = s' } in (* Expose the transition to the user. (In principle, we have a choice between exposing the transition before we take it, after we take it, or at some point in between. This affects the number and type of the parameters carried by [Shifting]. Here, we choose to expose the transition after we take it; this allows [Shifting] to carry only three parameters, whose meaning is simple.) *) Shifting (env, new_env, please_discard) (* ------------------------------------------------------------------------ *) (* The function [announce_reduce] stops the parser and returns a checkpoint which allows the parser to be resumed by calling [reduce]. *) (* Only ordinary productions are exposed to the user. Start productions are not exposed to the user. Reducing a start production simply leads to the successful termination of the parser. *) and announce_reduce env (prod : production) = if T.is_start prod then accept env prod else AboutToReduce (env, prod) (* The function [reduce] takes care of reductions. It is invoked by [resume] after an [AboutToReduce] event has been produced. *) (* Here, the lookahead token CAN be [error]. *) (* The production [prod] CANNOT be a start production. *) and reduce env (prod : production) = (* Log a reduction event. *) if log then Log.reduce_or_accept prod; (* Invoke the semantic action. The semantic action is responsible for truncating the stack and pushing a new cell onto the stack, which contains a new semantic value. 
It can raise [Error]. *) (* If the semantic action terminates normally, it returns a new stack, which becomes the current stack. *) (* If the semantic action raises [Error], we catch it and initiate error handling. *) (* This [match/with/exception] construct requires OCaml 4.02. *) match T.semantic_action prod env with | stack -> (* By our convention, the semantic action has produced an updated stack. The state now found in the top stack cell is the return state. *) (* Perform a goto transition. The target state is determined by consulting the goto table at the return state and at production [prod]. *) let current = T.goto_prod stack.state prod in let env = { env with stack; current } in run env false | exception Error -> initiate env and accept env prod = (* Log an accept event. *) if log then Log.reduce_or_accept prod; (* Extract the semantic value out of the stack. *) let v = env.stack.semv in (* Finish. *) Accepted v (* ------------------------------------------------------------------------ *) (* The following functions deal with errors. *) (* [initiate] initiates or resumes error handling. *) (* Here, the lookahead token CAN be [error]. *) and initiate env = if log then Log.initiating_error_handling(); let env = { env with error = true } in HandlingError env (* [error] handles errors. *) and error ~strategy env = assert env.error; (* Consult the column associated with the [error] pseudo-token in the action table. *) T.action env.current (* determines a row *) T.error_terminal (* determines a column *) T.error_value (error_shift ~strategy) (* shift continuation *) (error_reduce ~strategy) (* reduce continuation *) (error_fail ~strategy) (* failure continuation *) env and error_shift ~strategy env please_discard terminal value s' = assert (terminal = T.error_terminal && value = T.error_value); (* This state is capable of shifting the [error] token. *) if log then Log.handling_error env.current; (* In the simplified strategy, we change [please_discard] to [false], which means that we won't request the next token and (therefore) we will remain in error-handling mode after shifting the [error] token. *) let please_discard = match strategy with `Legacy -> please_discard | `Simplified -> false in shift env please_discard terminal value s' and error_reduce ~strategy env prod = (* This state is capable of performing a reduction on [error]. *) if log then Log.handling_error env.current; (* In the legacy strategy, we call [reduce] instead of [announce_reduce], apparently in an attempt to hide the reduction steps performed during error handling. In the simplified strategy, all reductions steps are announced. *) match strategy with | `Legacy -> reduce env prod | `Simplified -> announce_reduce env prod and error_fail ~strategy env = (* This state is unable to handle errors. In the simplified strategy, we die immediately. In the legacy strategy, we attempt to pop a stack cell. (This amounts to forgetting part of what we have just read, in the hope of reaching a state where we can shift the [error] token and resume parsing in normal mode. Forgetting past input is not appropriate when the goal is merely to produce a good syntax error message.) *) match strategy with | `Simplified -> Rejected | `Legacy -> (* Attempt to pop a stack cell. *) let cell = env.stack in let next = cell.next in if next == cell then (* The stack is empty. Die. *) Rejected else begin (* The stack is nonempty. 
Pop a cell, updating the current state to the state [cell.state] found in the popped cell, and continue error handling there. *) (* I note that if the new state [cell.state] has a default reduction, then it is ignored. It is unclear whether this is intentional. It could be a good thing, as it avoids a scenario where the parser diverges by repeatedly popping, performing a default reduction of an epsilon production, popping, etc. Still, the question of whether to obey default reductions while error handling seems obscure. *) let env = { env with stack = next; current = cell.state } in HandlingError env end (* End of the nest of tail recursive functions. *) (* ------------------------------------------------------------------------ *) (* ------------------------------------------------------------------------ *) (* The incremental interface. See [EngineTypes]. *) (* [start s] begins the parsing process. *) let start (s : state) (initial : position) : semantic_value checkpoint = (* Build an empty stack. This is a dummy cell, which is its own successor. Its [next] field WILL be accessed by [error_fail] if an error occurs and is propagated all the way until the stack is empty. Its [endp] field WILL be accessed (by a semantic action) if an epsilon production is reduced when the stack is empty. *) let rec empty = { state = s; (* dummy *) semv = T.error_value; (* dummy *) startp = initial; (* dummy *) endp = initial; next = empty; } in (* Build an initial environment. *) (* Unfortunately, there is no type-safe way of constructing a dummy token. Tokens carry semantic values, which in general we cannot manufacture. This instance of [Obj.magic] could be avoided by adopting a different representation (e.g., no [env.error] field, and an option in the first component of [env.triple]), but I like this representation better. *) let dummy_token = Obj.magic () in let env = { error = false; triple = (dummy_token, initial, initial); (* dummy *) stack = empty; current = s; } in (* Begin parsing. *) (* The parameter [please_discard] here is [true], which means we know that we must read at least one token. This claim relies on the fact that we have ruled out the two special cases where a start symbol recognizes the empty language or the singleton language {epsilon}. *) run env true (* [offer checkpoint triple] is invoked by the user in response to a checkpoint of the form [InputNeeded env]. It checks that [checkpoint] is indeed of this form, and invokes [discard]. *) (* [resume checkpoint] is invoked by the user in response to a checkpoint of the form [AboutToReduce (env, prod)] or [HandlingError env]. It checks that [checkpoint] is indeed of this form, and invokes [reduce] or [error], as appropriate. *) (* In reality, [offer] and [resume] accept an argument of type [semantic_value checkpoint] and produce a checkpoint of the same type. The choice of [semantic_value] is forced by the fact that this is the parameter of the checkpoint [Accepted]. *) (* We change this as follows. *) (* We change the argument and result type of [offer] and [resume] from [semantic_value checkpoint] to ['a checkpoint]. This is safe, in this case, because we give the user access to values of type [t checkpoint] only if [t] is indeed the type of the eventual semantic value for this run. (More precisely, by examining the signatures [INCREMENTAL_ENGINE] and [INCREMENTAL_ENGINE_START], one finds that the user can build a value of type ['a checkpoint] only if ['a] is [semantic_value]. 
The table back-end goes further than this and produces versions of [start] composed with a suitable cast, which give the user access to a value of type [t checkpoint] where [t] is the type of the start symbol.) *) let offer : 'a . 'a checkpoint -> token * position * position -> 'a checkpoint = function | InputNeeded env -> Obj.magic discard env | _ -> invalid_arg "offer expects InputNeeded" let resume : 'a . ?strategy:strategy -> 'a checkpoint -> 'a checkpoint = fun ?(strategy=`Legacy) checkpoint -> match checkpoint with | HandlingError env -> Obj.magic error ~strategy env | Shifting (_, env, please_discard) -> Obj.magic run env please_discard | AboutToReduce (env, prod) -> Obj.magic reduce env prod | _ -> invalid_arg "resume expects HandlingError | Shifting | AboutToReduce" (* ------------------------------------------------------------------------ *) (* ------------------------------------------------------------------------ *) (* The traditional interface. See [EngineTypes]. *) (* ------------------------------------------------------------------------ *) (* Wrapping a lexer and lexbuf as a token supplier. *) type supplier = unit -> token * position * position let lexer_lexbuf_to_supplier (lexer : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : supplier = fun () -> let token = lexer lexbuf in let startp = lexbuf.Lexing.lex_start_p and endp = lexbuf.Lexing.lex_curr_p in token, startp, endp (* ------------------------------------------------------------------------ *) (* The main loop repeatedly handles intermediate checkpoints, until a final checkpoint is obtained. This allows implementing the monolithic interface ([entry]) in terms of the incremental interface ([start], [offer], [handle], [reduce]). *) (* By convention, acceptance is reported by returning a semantic value, whereas rejection is reported by raising [Error]. *) (* [loop] is polymorphic in ['a]. No cheating is involved in achieving this. All of the cheating resides in the types assigned to [offer] and [handle] above. *) let rec loop : 'a . ?strategy:strategy -> supplier -> 'a checkpoint -> 'a = fun ?(strategy=`Legacy) read checkpoint -> match checkpoint with | InputNeeded _ -> (* The parser needs a token. Request one from the lexer, and offer it to the parser, which will produce a new checkpoint. Then, repeat. *) let triple = read() in let checkpoint = offer checkpoint triple in loop ~strategy read checkpoint | Shifting _ | AboutToReduce _ | HandlingError _ -> (* The parser has suspended itself, but does not need new input. Just resume the parser. Then, repeat. *) let checkpoint = resume ~strategy checkpoint in loop ~strategy read checkpoint | Accepted v -> (* The parser has succeeded and produced a semantic value. Return this semantic value to the user. *) v | Rejected -> (* The parser rejects this input. Raise an exception. *) raise Error let entry strategy (s : state) lexer lexbuf : semantic_value = let initial = lexbuf.Lexing.lex_curr_p in loop ~strategy (lexer_lexbuf_to_supplier lexer lexbuf) (start s initial) (* ------------------------------------------------------------------------ *) (* [loop_handle] stops if it encounters an error, and at this point, invokes its failure continuation, without letting Menhir do its own traditional error-handling (which involves popping the stack, etc.). 
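As an illustration, here is a hedged sketch of client code, assuming a Menhir-generated parser with an incremental entry point [Parser.Incremental.main] and an ocamllex-generated lexer [Lexer.token]:

     let process lexbuf =
       let supplier = lexer_lexbuf_to_supplier Lexer.token lexbuf in
       let checkpoint = Parser.Incremental.main lexbuf.Lexing.lex_curr_p in
       loop_handle
         (fun v -> Ok v)                            (* success *)
         (fun _checkpoint -> Error "syntax error")  (* failure *)
         supplier checkpoint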
*) let rec loop_handle succeed fail read checkpoint = match checkpoint with | InputNeeded _ -> let triple = read() in let checkpoint = offer checkpoint triple in loop_handle succeed fail read checkpoint | Shifting _ | AboutToReduce _ -> (* Which strategy is passed to [resume] here is irrelevant, since this checkpoint is not [HandlingError _]. *) let checkpoint = resume checkpoint in loop_handle succeed fail read checkpoint | HandlingError _ | Rejected -> (* The parser has detected an error. Invoke the failure continuation. *) fail checkpoint | Accepted v -> (* The parser has succeeded and produced a semantic value. Invoke the success continuation. *) succeed v (* ------------------------------------------------------------------------ *) (* [loop_handle_undo] is analogous to [loop_handle], except it passes a pair of checkpoints to the failure continuation. The first (and oldest) checkpoint is the last [InputNeeded] checkpoint that was encountered before the error was detected. The second (and newest) checkpoint is where the error was detected, as in [loop_handle]. Going back to the first checkpoint can be thought of as undoing any reductions that were performed after seeing the problematic token. (These reductions must be default reductions or spurious reductions.) *) let rec loop_handle_undo succeed fail read (inputneeded, checkpoint) = match checkpoint with | InputNeeded _ -> (* Update the last recorded [InputNeeded] checkpoint. *) let inputneeded = checkpoint in let triple = read() in let checkpoint = offer checkpoint triple in loop_handle_undo succeed fail read (inputneeded, checkpoint) | Shifting _ | AboutToReduce _ -> (* Which strategy is passed to [resume] here is irrelevant, since this checkpoint is not [HandlingError _]. *) let checkpoint = resume checkpoint in loop_handle_undo succeed fail read (inputneeded, checkpoint) | HandlingError _ | Rejected -> fail inputneeded checkpoint | Accepted v -> succeed v (* For simplicity, we publish a version of [loop_handle_undo] that takes a single checkpoint as an argument, instead of a pair of checkpoints. We check that the argument is [InputNeeded _], and duplicate it. *) (* The parser cannot accept or reject before it asks for the very first character of input. (Indeed, we statically reject a symbol that generates the empty language or the singleton language {epsilon}.) So, the [start] checkpoint must match [InputNeeded _]. Hence, it is permitted to call [loop_handle_undo] with a [start] checkpoint. *) let loop_handle_undo succeed fail read checkpoint = assert (match checkpoint with InputNeeded _ -> true | _ -> false); loop_handle_undo succeed fail read (checkpoint, checkpoint) (* ------------------------------------------------------------------------ *) let rec shifts checkpoint = match checkpoint with | Shifting (env, _, _) -> (* The parser is about to shift, which means it is willing to consume the terminal symbol that we have fed it. Return the state just before this transition. *) Some env | AboutToReduce _ -> (* The parser wishes to reduce. Just follow. *) (* Which strategy is passed to [resume] here is irrelevant, since this checkpoint is not [HandlingError _]. *) shifts (resume checkpoint) | HandlingError _ -> (* The parser fails, which means it rejects the terminal symbol that we have fed it. *) None | InputNeeded _ | Accepted _ | Rejected -> (* None of these cases can arise. Indeed, after a token is submitted to it, the parser must shift, reduce, or signal an error, before it can request another token or terminate. 
*) assert false let acceptable checkpoint token pos = let triple = (token, pos, pos) in let checkpoint = offer checkpoint triple in match shifts checkpoint with | None -> false | Some _env -> true (* ------------------------------------------------------------------------ *) (* The type ['a lr1state] describes the (non-initial) states of the LR(1) automaton. The index ['a] represents the type of the semantic value associated with the state's incoming symbol. *) (* The type ['a lr1state] is defined as an alias for [state], which itself is usually defined as [int] (see [TableInterpreter]). So, ['a lr1state] is technically a phantom type, but should really be thought of as a GADT whose data constructors happen to be represented as integers. It is presented to the user as an abstract type (see [IncrementalEngine]). *) type 'a lr1state = state (* ------------------------------------------------------------------------ *) (* Stack inspection. *) (* We offer a read-only view of the parser's state as a stream of elements. Each element contains a pair of a (non-initial) state and a semantic value associated with (the incoming symbol of) this state. Note that the type [element] is an existential type. *) (* As of 2017/03/31, the type [stack] and the function [stack] are DEPRECATED. If desired, they could now be implemented outside Menhir, by relying on the functions [top] and [pop]. *) type element = | Element: 'a lr1state * 'a * position * position -> element open General type stack = element stream (* If [current] is the current state and [cell] is the top stack cell, then [stack cell current] is a view of the parser's state as a stream of elements. *) let rec stack cell current : element stream = lazy ( (* The stack is empty iff the top stack cell is its own successor. In that case, the current state [current] should be an initial state (which has no incoming symbol). We do not allow the user to inspect this state. *) let next = cell.next in if next == cell then Nil else (* Construct an element containing the current state [current] as well as the semantic value contained in the top stack cell. This semantic value is associated with the incoming symbol of this state, so it makes sense to pair them together. The state has type ['a state] and the semantic value has type ['a], for some type ['a]. Here, the OCaml type-checker thinks ['a] is [semantic_value] and considers this code well-typed. Outside, we will use magic to provide the user with a way of inspecting states and recovering the value of ['a]. *) let element = Element ( current, cell.semv, cell.startp, cell.endp ) in Cons (element, stack next cell.state) ) let stack env : element stream = stack env.stack env.current (* As explained above, the function [top] allows access to the top stack element only if the stack is nonempty, i.e., only if the current state is not an initial state. *) let top env : element option = let cell = env.stack in let next = cell.next in if next == cell then None else Some (Element (env.current, cell.semv, cell.startp, cell.endp)) (* [equal] compares the stacks for physical equality, and compares the current states via their numbers (this seems cleaner than using OCaml's polymorphic equality). *) (* The two fields that are not compared by [equal], namely [error] and [triple], are overwritten by the function [discard], which handles [InputNeeded] checkpoints. 
Thus, if [equal env1 env2] holds, then the checkpoints [input_needed env1] and [input_needed env2] are equivalent: they lead the parser to behave in the same way. *) let equal env1 env2 = env1.stack == env2.stack && number env1.current = number env2.current let current_state_number env = number env.current (* ------------------------------------------------------------------------ *) (* Access to the position of the lookahead token. *) let positions { triple = (_, startp, endp); _ } = startp, endp (* ------------------------------------------------------------------------ *) (* Access to information about default reductions. *) (* This can be a function of states, or a function of environments. We offer both. *) (* Instead of a Boolean result, we could return a [production option]. However, we would have to explicitly test whether [prod] is a start production, and in that case, return [None], I suppose. Indeed, we have decided not to expose the start productions. *) let state_has_default_reduction (state : _ lr1state) : bool = T.default_reduction state (fun _env _prod -> true) (fun _env -> false) () let env_has_default_reduction env = state_has_default_reduction env.current (* ------------------------------------------------------------------------ *) (* The following functions work at the level of environments (as opposed to checkpoints). The function [pop] causes the automaton to go back into the past, pretending that the last input symbol has never been read. The function [force_reduction] causes the automaton to re-interpret the past, by recognizing the right-hand side of a production and reducing this production. The function [feed] causes the automaton to progress into the future by pretending that a (terminal or nonterminal) symbol has been read. *) (* The function [feed] would ideally be defined here. However, for this function to be type-safe, the GADT ['a symbol] is needed. For this reason, we move its definition to [InspectionTableInterpreter], where the inspection API is available. *) (* [pop] pops one stack cell. It cannot go wrong. *) let pop (env : 'a env) : 'a env option = let cell = env.stack in let next = cell.next in if next == cell then (* The stack is empty. *) None else (* The stack is nonempty. Pop off one cell. *) Some { env with stack = next; current = cell.state } (* [force_reduction] is analogous to [reduce], except that it does not continue by calling [run env] or [initiate env]. Instead, it returns [env] to the user. *) (* [force_reduction] is dangerous insofar as it executes a semantic action. This semantic action could have side effects: nontermination, state, exceptions, input/output, etc. *) let force_reduction prod (env : 'a env) : 'a env = (* Check if this reduction is permitted. This check is REALLY important. The stack must have the correct shape: that is, it must be sufficiently high, and must contain semantic values of appropriate types, otherwise the semantic action will crash and burn. *) (* We currently check whether the current state is WILLING to reduce this production (i.e., there is a reduction action in the action table row associated with this state), whereas it would be more liberal to check whether this state is CAPABLE of reducing this production (i.e., the stack has an appropriate shape). We currently have no means of performing such a check. 
*) if not (T.may_reduce env.current prod) then invalid_arg "force_reduction: this reduction is not permitted in this state" else begin (* We do not expose the start productions to the user, so this cannot be a start production. Hence, it has a semantic action. *) assert (not (T.is_start prod)); (* Invoke the semantic action. *) let stack = T.semantic_action prod env in (* Perform a goto transition. *) let current = T.goto_prod stack.state prod in { env with stack; current } end (* The environment manipulation functions -- [pop] and [force_reduction] above, plus [feed] -- manipulate the automaton's stack and current state, but do not affect the automaton's lookahead symbol. When the function [input_needed] is used to go back from an environment to a checkpoint (and therefore, resume normal parsing), the lookahead symbol is clobbered anyway, since the only action that the user can take is to call [offer]. So far, so good. One problem, though, is that this call to [offer] may well place the automaton in a configuration of a state [s] and a lookahead symbol [t] that is normally unreachable. Also, perhaps the state [s] is a state where an input symbol normally is never demanded, so this [InputNeeded] checkpoint is fishy. There does not seem to be a deep problem here, but, when programming an error recovery strategy, one should pay some attention to this issue. Ideally, perhaps, one should use [input_needed] only in a state [s] where an input symbol is normally demanded, that is, a state [s] whose incoming symbol is a terminal symbol and which does not have a default reduction on [#]. *) let input_needed (env : 'a env) : 'a checkpoint = InputNeeded env (* The following functions are compositions of [top] and [pop]. *) let rec pop_many i env = if i = 0 then Some env else match pop env with | None -> None | Some env -> pop_many (i - 1) env let get i env = match pop_many i env with | None -> None | Some env -> top env end menhir-20210929/lib/Engine.mli000066400000000000000000000031271412503066000157240ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) open EngineTypes (* The LR parsing engine. *) module Make (T : TABLE) : ENGINE with type state = T.state and type token = T.token and type semantic_value = T.semantic_value and type production = T.production and type 'a env = (T.state, T.semantic_value, T.token) EngineTypes.env (* We would prefer not to expose the definition of the type [env]. However, it must be exposed because some of the code in the inspection API needs access to the engine's internals; see [InspectionTableInterpreter]. Everything would be simpler if --inspection was always ON, but that would lead to bigger parse tables for everybody. *) menhir-20210929/lib/EngineTypes.ml000066400000000000000000000334271412503066000166060ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. 
This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* This file defines several types and module types that are used in the specification of module [Engine]. *) (* --------------------------------------------------------------------------- *) (* It would be nice if we could keep the structure of stacks and environments hidden. However, stacks and environments must be accessible to semantic actions, so the following data structure definitions must be public. *) (* --------------------------------------------------------------------------- *) (* A stack is a linked list of cells. A sentinel cell -- which is its own successor -- is used to mark the bottom of the stack. The sentinel cell itself is not significant -- it contains dummy values. *) type ('state, 'semantic_value) stack = { (* The state that we should go back to if we pop this stack cell. *) (* This convention means that the state contained in the top stack cell is not the current state [env.current]. It also means that the state found within the sentinel is a dummy -- it is never consulted. This convention is the same as that adopted by the code-based back-end. *) state: 'state; (* The semantic value associated with the chunk of input that this cell represents. *) semv: 'semantic_value; (* The start and end positions of the chunk of input that this cell represents. *) startp: Lexing.position; endp: Lexing.position; (* The next cell down in the stack. If this is a self-pointer, then this cell is the sentinel, and the stack is conceptually empty. *) next: ('state, 'semantic_value) stack; } (* --------------------------------------------------------------------------- *) (* A parsing environment contains all of the parser's state (except for the current program point). *) type ('state, 'semantic_value, 'token) env = { (* If this flag is true, then the first component of [env.triple] should be ignored, as it has been logically overwritten with the [error] pseudo-token. *) error: bool; (* The last token that was obtained from the lexer, together with its start and end positions. Warning: before the first call to the lexer has taken place, a dummy (and possibly invalid) token is stored here. *) triple: 'token * Lexing.position * Lexing.position; (* The stack. In [CodeBackend], it is passed around on its own, whereas, here, it is accessed via the environment. *) stack: ('state, 'semantic_value) stack; (* The current state. In [CodeBackend], it is passed around on its own, whereas, here, it is accessed via the environment. *) current: 'state; } (* --------------------------------------------------------------------------- *) (* This signature describes the parameters that must be supplied to the LR engine. *) module type TABLE = sig (* The type of automaton states. *) type state (* States are numbered. *) val number: state -> int (* The type of tokens. These can be thought of as real tokens, that is, tokens returned by the lexer. They carry a semantic value. This type does not include the [error] pseudo-token. *) type token (* The type of terminal symbols. These can be thought of as integer codes. They do not carry a semantic value. This type does include the [error] pseudo-token. *) type terminal (* The type of nonterminal symbols. *) type nonterminal (* The type of semantic values. 
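(* As an aside, here is a sketch (illustration only; [s], [v], [p] are placeholders, not defined here) of the sentinel convention for stacks described above: the bottom of the stack is a cell that is its own successor, and emptiness is tested via physical equality.

     let rec sentinel = { state = s; semv = v; startp = p; endp = p; next = sentinel }
     let is_empty stack = (stack.next == stack)
*)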
*) type semantic_value (* A token is conceptually a pair of a (non-[error]) terminal symbol and a semantic value. The following two functions are the pair projections. *) val token2terminal: token -> terminal val token2value: token -> semantic_value (* Even though the [error] pseudo-token is not a real token, it is a terminal symbol. Furthermore, for regularity, it must have a semantic value. *) val error_terminal: terminal val error_value: semantic_value (* [foreach_terminal] allows iterating over all terminal symbols. *) val foreach_terminal: (terminal -> 'a -> 'a) -> 'a -> 'a (* The type of productions. *) type production val production_index: production -> int val find_production: int -> production (* If a state [s] has a default reduction on production [prod], then, upon entering [s], the automaton should reduce [prod] without consulting the lookahead token. The following function allows determining which states have default reductions. *) (* Instead of returning a value of a sum type -- either [DefRed prod], or [NoDefRed] -- it accepts two continuations, and invokes just one of them. This mechanism allows avoiding a memory allocation. *) val default_reduction: state -> ('env -> production -> 'answer) -> ('env -> 'answer) -> 'env -> 'answer (* An LR automaton can normally take three kinds of actions: shift, reduce, or fail. (Acceptance is a particular case of reduction: it consists in reducing a start production.) *) (* There are two variants of the shift action. [shift/discard s] instructs the automaton to discard the current token, request a new one from the lexer, and move to state [s]. [shift/nodiscard s] instructs it to move to state [s] without requesting a new token. This instruction should be used when [s] has a default reduction on [#]. See [CodeBackend.gettoken] for details. *) (* This is the automaton's action table. It maps a pair of a state and a terminal symbol to an action. *) (* Instead of returning a value of a sum type -- one of shift/discard, shift/nodiscard, reduce, or fail -- this function accepts three continuations, and invokes just one of them. This mechanism allows avoiding a memory allocation. *) (* In summary, the parameters to [action] are as follows: - the first two parameters, a state and a terminal symbol, are used to look up the action table; - the next parameter is the semantic value associated with the above terminal symbol; it is not used, only passed along to the shift continuation, as explained below; - the shift continuation expects an environment; a flag that tells whether to discard the current token; the terminal symbol that is being shifted; its semantic value; and the target state of the transition; - the reduce continuation expects an environment and a production; - the fail continuation expects an environment; - the last parameter is the environment; it is not used, only passed along to the selected continuation. *) val action: state -> terminal -> semantic_value -> ('env -> bool -> terminal -> semantic_value -> state -> 'answer) -> ('env -> production -> 'answer) -> ('env -> 'answer) -> 'env -> 'answer (* This is the automaton's goto table. This table maps a pair of a state and a nonterminal symbol to a new state. By extension, it also maps a pair of a state and a production to a new state. *) (* The function [goto_nt] can be applied to [s] and [nt] ONLY if the state [s] has an outgoing transition labeled [nt]. Otherwise, its result is undefined.
Similarly, the call [goto_prod prod s] is permitted ONLY if the state [s] has an outgoing transition labeled with the nonterminal symbol [lhs prod]. The function [maybe_goto_nt] involves an additional dynamic check and CAN be called even if there is no outgoing transition. *) val goto_nt : state -> nonterminal -> state val goto_prod: state -> production -> state val maybe_goto_nt: state -> nonterminal -> state option (* [is_start prod] tells whether the production [prod] is a start production. *) val is_start: production -> bool (* By convention, a semantic action is responsible for: 1. fetching whatever semantic values and positions it needs off the stack; 2. popping an appropriate number of cells off the stack, as dictated by the length of the right-hand side of the production; 3. computing a new semantic value, as well as new start and end positions; 4. pushing a new stack cell, which contains the three values computed in step 3; 5. returning the new stack computed in steps 2 and 4. Point 1 is essentially forced upon us: if semantic values were fetched off the stack by this interpreter, then the calling convention for semantic actions would be variadic: not all semantic actions would have the same number of arguments. The rest follows rather naturally. *) (* Semantic actions are allowed to raise [Error]. *) exception Error type semantic_action = (state, semantic_value, token) env -> (state, semantic_value) stack val semantic_action: production -> semantic_action (* [may_reduce state prod] tests whether the state [state] is capable of reducing the production [prod]. This function is currently costly and is not used by the core LR engine. It is used in the implementation of certain functions, such as [force_reduction], which allow the engine to be driven programmatically. *) val may_reduce: state -> production -> bool (* The LR engine requires a number of hooks, which are used for logging. *) (* The comments below indicate the conventional messages that correspond to these hooks in the code-based back-end; see [CodeBackend]. *) (* If the flag [log] is false, then the logging functions are not called. If it is [true], then they are called. *) val log : bool module Log : sig (* State %d: *) val state: state -> unit (* Shifting (<terminal>) to state <state> *) val shift: terminal -> state -> unit (* Reducing a production should be logged either as a reduction event (for regular productions) or as an acceptance event (for start productions). *) (* Reducing production <production> / Accepting *) val reduce_or_accept: production -> unit (* Lookahead token is now <terminal> (<pos>-<pos>) *) val lookahead_token: terminal -> Lexing.position -> Lexing.position -> unit (* Initiating error handling *) val initiating_error_handling: unit -> unit (* Resuming error handling *) val resuming_error_handling: unit -> unit (* Handling error in state <state> *) val handling_error: state -> unit end end (* --------------------------------------------------------------------------- *) (* This signature describes the monolithic (traditional) LR engine. *) (* In this interface, the parser controls the lexer. *) module type MONOLITHIC_ENGINE = sig type state type token type semantic_value (* An entry point to the engine requires a start state, a lexer, and a lexing buffer. It either succeeds and produces a semantic value, or fails and raises [Error].
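(* For instance (a sketch; the start state [s], the lexer [Lexer.token], and the buffer [lexbuf] are assumptions, not defined here), the monolithic interface is invoked as follows:

     let v : semantic_value = entry `Legacy s Lexer.token lexbuf

   The first argument selects an error-handling strategy; see [IncrementalEngine]. *)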
*) exception Error val entry: (* strategy: *) [ `Legacy | `Simplified ] -> (* see [IncrementalEngine] *) state -> (Lexing.lexbuf -> token) -> Lexing.lexbuf -> semantic_value end (* --------------------------------------------------------------------------- *) (* The following signatures describe the incremental LR engine. *) (* First, see [INCREMENTAL_ENGINE] in the file [IncrementalEngine.ml]. *) (* The [start] function is set apart because we do not wish to publish it as part of the generated [parser.mli] file. Instead, the table back-end will publish specialized versions of it, with a suitable type cast. *) module type INCREMENTAL_ENGINE_START = sig (* [start] is an entry point. It requires a start state and a start position and begins the parsing process. If the lexer is based on an OCaml lexing buffer, the start position should be [lexbuf.lex_curr_p]. [start] produces a checkpoint, which usually will be an [InputNeeded] checkpoint. (It could be [Accepted] if this starting state accepts only the empty word. It could be [Rejected] if this starting state accepts no word at all.) It does not raise any exception. *) (* [start s pos] should really produce a checkpoint of type ['a checkpoint], for a fixed ['a] that depends on the state [s]. We cannot express this, so we use [semantic_value checkpoint], which is safe. The table back-end uses [Obj.magic] to produce safe specialized versions of [start]. *) type state type semantic_value type 'a checkpoint val start: state -> Lexing.position -> semantic_value checkpoint end (* --------------------------------------------------------------------------- *) (* This signature describes the LR engine, which combines the monolithic and incremental interfaces. *) module type ENGINE = sig include MONOLITHIC_ENGINE include IncrementalEngine.INCREMENTAL_ENGINE with type token := token and type 'a lr1state = state (* useful for us; hidden from the end user *) include INCREMENTAL_ENGINE_START with type state := state and type semantic_value := semantic_value and type 'a checkpoint := 'a checkpoint end menhir-20210929/lib/ErrorReports.ml000066400000000000000000000112221412503066000170110ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* -------------------------------------------------------------------------- *) (* A two-place buffer stores zero, one, or two elements. *) type 'a content = | Zero | One of 'a | Two of 'a * (* most recent: *) 'a type 'a buffer = 'a content ref (* [update buffer x] pushes [x] into [buffer], causing the buffer to slide. *) let update buffer x = buffer := match !buffer, x with | Zero, _ -> One x | One x1, x2 | Two (_, x1), x2 -> Two (x1, x2) let show f buffer : string = match !buffer with | Zero -> (* The buffer cannot be empty. If we have read no tokens, we cannot have detected a syntax error. *) assert false | One invalid -> (* It is unlikely, but possible, that we have read just one token. *) Printf.sprintf "before '%s'" (f invalid) | Two (valid, invalid) -> (* In the most likely case, we have read two tokens. 
*) Printf.sprintf "after '%s' and before '%s'" (f valid) (f invalid) let last buffer = match !buffer with | Zero -> (* The buffer cannot be empty. If we have read no tokens, we cannot have detected a syntax error. *) assert false | One invalid | Two (_, invalid) -> invalid open Lexing let wrap lexer = let buffer = ref Zero in buffer, fun lexbuf -> let token = lexer lexbuf in update buffer (lexbuf.lex_start_p, lexbuf.lex_curr_p); token let wrap_supplier supplier = let buffer = ref Zero in buffer, fun () -> let (_token, pos1, pos2) as triple = supplier() in update buffer (pos1, pos2); triple (* -------------------------------------------------------------------------- *) let extract text (pos1, pos2) : string = let ofs1 = pos1.pos_cnum and ofs2 = pos2.pos_cnum in let len = ofs2 - ofs1 in try String.sub text ofs1 len with Invalid_argument _ -> (* In principle, this should not happen, but if it does, let's make this a non-fatal error. *) "???" let sanitize text = String.map (fun c -> if Char.code c < 32 then ' ' else c ) text (* If we were willing to depend on [Str], we could implement [compress] as follows: let compress text = Str.global_replace (Str.regexp "[ \t\n\r]+") " " text *) let rec compress n b i j skipping = if j < n then let c, j = Bytes.get b j, j + 1 in match c with | ' ' | '\t' | '\n' | '\r' -> let i = if not skipping then (Bytes.set b i ' '; i + 1) else i in let skipping = true in compress n b i j skipping | _ -> let i = Bytes.set b i c; i + 1 in let skipping = false in compress n b i j skipping else Bytes.sub_string b 0 i let compress text = let b = Bytes.of_string text in let n = Bytes.length b in compress n b 0 0 false let shorten k text = let n = String.length text in if n <= 2 * k + 3 then text else String.sub text 0 k ^ "..." ^ String.sub text (n - k) k let is_digit c = let c = Char.code c in Char.code '0' <= c && c <= Char.code '9' exception Copy let expand f text = let n = String.length text in let b = Buffer.create n in let rec loop i = if i < n then begin let c, i = text.[i], i + 1 in loop ( try if c <> '$' then raise Copy; let j = ref i in while !j < n && is_digit text.[!j] do incr j done; if i = !j then raise Copy; let k = int_of_string (String.sub text i (!j - i)) in Buffer.add_string b (f k); !j with Copy -> (* We reach this point if either [c] is not '$' or [c] is '$' but is not followed by an integer literal. *) Buffer.add_char b c; i ) end else Buffer.contents b in loop 0 menhir-20210929/lib/ErrorReports.mli000066400000000000000000000064001412503066000171640ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* -------------------------------------------------------------------------- *) (* The following functions help keep track of the start and end positions of the last two tokens in a two-place buffer. This is used to nicely display where a syntax error took place. *) type 'a buffer (* [wrap lexer] returns a pair of a new (initially empty) buffer and a lexer which internally relies on [lexer] and updates [buffer] on the fly whenever a token is demanded. 
*) (* The type of the buffer is [(position * position) buffer], which means that it stores two pairs of positions, which are the start and end positions of the last two tokens. *) open Lexing val wrap: (lexbuf -> 'token) -> (position * position) buffer * (lexbuf -> 'token) val wrap_supplier: (unit -> 'token * position * position) -> (position * position) buffer * (unit -> 'token * position * position) (* [show f buffer] prints the contents of the buffer, producing a string that is typically of the form "after '%s' and before '%s'". The function [f] is used to print an element. The buffer MUST be nonempty. *) val show: ('a -> string) -> 'a buffer -> string (* [last buffer] returns the last element of the buffer. The buffer MUST be nonempty. *) val last: 'a buffer -> 'a (* -------------------------------------------------------------------------- *) (* [extract text (pos1, pos2)] extracts the sub-string of [text] delimited by the positions [pos1] and [pos2]. *) val extract: string -> position * position -> string (* [sanitize text] eliminates any special characters from the text [text]. A special character is a character whose ASCII code is less than 32. Every special character is replaced with a single space character. *) val sanitize: string -> string (* [compress text] replaces every run of at least one whitespace character with exactly one space character. *) val compress: string -> string (* [shorten k text] limits the length of [text] to [2k+3] characters. If the text is too long, a fragment in the middle is replaced with an ellipsis. *) val shorten: int -> string -> string (* [expand f text] searches [text] for occurrences of [$k], where [k] is a nonnegative integer literal, and replaces each such occurrence with the string [f k]. *) val expand: (int -> string) -> string -> string menhir-20210929/lib/General.ml000066400000000000000000000042011412503066000157150ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* --------------------------------------------------------------------------- *) (* Lists. *) let rec take n xs = match n, xs with | 0, _ | _, [] -> [] | _, (x :: xs as input) -> let xs' = take (n - 1) xs in if xs == xs' then input else x :: xs' let rec drop n xs = match n, xs with | 0, _ -> xs | _, [] -> [] | _, _ :: xs -> drop (n - 1) xs let rec uniq1 cmp x ys = match ys with | [] -> [] | y :: ys -> if cmp x y = 0 then uniq1 cmp x ys else y :: uniq1 cmp y ys let uniq cmp xs = match xs with | [] -> [] | x :: xs -> x :: uniq1 cmp x xs let weed cmp xs = uniq cmp (List.sort cmp xs) (* --------------------------------------------------------------------------- *) (* Streams. *) type 'a stream = 'a head Lazy.t and 'a head = | Nil | Cons of 'a * 'a stream (* The length of a stream. *) let rec length xs = match Lazy.force xs with | Nil -> 0 | Cons (_, xs) -> 1 + length xs (* Folding over a stream. 
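(* For instance (illustration only), [foldr] can be used to force a stream and turn it into a list:

     let to_list (xs : 'a stream) : 'a list =
       foldr (fun x accu -> x :: accu) xs []
*)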
*) let rec foldr f xs accu = match Lazy.force xs with | Nil -> accu | Cons (x, xs) -> f x (foldr f xs accu) menhir-20210929/lib/General.mli000066400000000000000000000045361412503066000161010ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* This module offers general-purpose functions on lists and streams. *) (* As of 2017/03/31, this module is DEPRECATED. It might be removed in the future. *) (* --------------------------------------------------------------------------- *) (* Lists. *) (* [take n xs] returns the [n] first elements of the list [xs]. It is acceptable for the list [xs] to have length less than [n], in which case [xs] itself is returned. *) val take: int -> 'a list -> 'a list (* [drop n xs] returns the list [xs], deprived of its [n] first elements. It is acceptable for the list [xs] to have length less than [n], in which case an empty list is returned. *) val drop: int -> 'a list -> 'a list (* [uniq cmp xs] assumes that the list [xs] is sorted according to the ordering [cmp] and returns the list [xs] deprived of any duplicate elements. *) val uniq: ('a -> 'a -> int) -> 'a list -> 'a list (* [weed cmp xs] returns the list [xs] deprived of any duplicate elements. *) val weed: ('a -> 'a -> int) -> 'a list -> 'a list (* --------------------------------------------------------------------------- *) (* A stream is a list whose elements are produced on demand. *) type 'a stream = 'a head Lazy.t and 'a head = | Nil | Cons of 'a * 'a stream (* The length of a stream. *) val length: 'a stream -> int (* Folding over a stream. *) val foldr: ('a -> 'b -> 'b) -> 'a stream -> 'b -> 'b menhir-20210929/lib/IncrementalEngine.ml000066400000000000000000000504151412503066000177370ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) type position = Lexing.position open General (* This signature describes the incremental LR engine. *) (* In this mode, the user controls the lexer, and the parser suspends itself when it needs to read a new token. *) module type INCREMENTAL_ENGINE = sig type token (* A value of type [production] is (an index for) a production. The start productions (which do not exist in an \mly file, but are constructed by Menhir internally) are not part of this type. *) type production (* The type ['a checkpoint] represents an intermediate or final state of the parser. An intermediate checkpoint is a suspension: it records the parser's current state, and allows parsing to be resumed. The parameter ['a] is the type of the semantic value that will eventually be produced if the parser succeeds. 
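(* As a preview (a sketch; the checkpoint constructors and the functions [offer] and [resume] are described below), a minimal driver dispatches on checkpoints as follows -- this is essentially the function [loop] offered further down:

     let rec drive supplier checkpoint =
       match checkpoint with
       | InputNeeded _ ->
           drive supplier (offer checkpoint (supplier ()))
       | Shifting _ | AboutToReduce _ | HandlingError _ ->
           drive supplier (resume checkpoint)
       | Accepted v -> v
       | Rejected -> failwith "syntax error"
*)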
*) (* [Accepted] and [Rejected] are final checkpoints. [Accepted] carries a semantic value. *) (* [InputNeeded] is an intermediate checkpoint. It means that the parser wishes to read one token before continuing. *) (* [Shifting] is an intermediate checkpoint. It means that the parser is taking a shift transition. It exposes the state of the parser before and after the transition. The Boolean parameter tells whether the parser intends to request a new token after this transition. (It always does, except when it is about to accept.) *) (* [AboutToReduce] is an intermediate checkpoint. It means that the parser is about to perform a reduction step. It exposes the parser's current state as well as the production that is about to be reduced. *) (* [HandlingError] is an intermediate checkpoint. It means that the parser has detected an error and is currently handling it, in several steps. *) (* A value of type ['a env] represents a configuration of the automaton: current state, stack, lookahead token, etc. The parameter ['a] is the type of the semantic value that will eventually be produced if the parser succeeds. *) (* In normal operation, the parser works with checkpoints: see the functions [offer] and [resume]. However, it is also possible to work directly with environments (see the functions [pop], [force_reduction], and [feed]) and to reconstruct a checkpoint out of an environment (see [input_needed]). This is considered advanced functionality; its purpose is to allow error recovery strategies to be programmed by the user. *) type 'a env type 'a checkpoint = private | InputNeeded of 'a env | Shifting of 'a env * 'a env * bool | AboutToReduce of 'a env * production | HandlingError of 'a env | Accepted of 'a | Rejected (* [offer] allows the user to resume the parser after it has suspended itself with a checkpoint of the form [InputNeeded env]. [offer] expects the old checkpoint as well as a new token and produces a new checkpoint. It does not raise any exception. *) val offer: 'a checkpoint -> token * position * position -> 'a checkpoint (* [resume] allows the user to resume the parser after it has suspended itself with a checkpoint of the form [AboutToReduce (env, prod)] or [HandlingError env]. [resume] expects the old checkpoint and produces a new checkpoint. It does not raise any exception. *) (* The optional argument [strategy] influences the manner in which [resume] deals with checkpoints of the form [HandlingError _]. Its default value is [`Legacy]. It can be briefly described as follows: - If the [error] token is used only to report errors (that is, if the [error] token appears only at the end of a production, whose semantic action raises an exception) then the simplified strategy should be preferred. (This includes the case where the [error] token does not appear at all in the grammar.) - If the [error] token is used to recover after an error, or if perfect backward compatibility is required, the legacy strategy should be selected. More details on these strategies appear in the file [Engine.ml]. *) type strategy = [ `Legacy | `Simplified ] val resume: ?strategy:strategy -> 'a checkpoint -> 'a checkpoint (* A token supplier is a function of no arguments which delivers a new token (together with its start and end positions) every time it is called. *) type supplier = unit -> token * position * position (* A pair of a lexer and a lexing buffer can be easily turned into a supplier.
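(* For example (a sketch; [Lexer.token] and [lexbuf] are assumptions):

     let supplier : supplier =
       lexer_lexbuf_to_supplier Lexer.token lexbuf

   Each call [supplier()] then reads one token from [lexbuf] and returns it together with its start and end positions. *)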
*) val lexer_lexbuf_to_supplier: (Lexing.lexbuf -> token) -> Lexing.lexbuf -> supplier (* The functions [offer] and [resume] are sufficient to write a parser loop. One can imagine many variations (which is why we expose these functions in the first place!). Here, we expose a few variations of the main loop, ready for use. *) (* [loop supplier checkpoint] begins parsing from [checkpoint], reading tokens from [supplier]. It continues parsing until it reaches a checkpoint of the form [Accepted v] or [Rejected]. In the former case, it returns [v]. In the latter case, it raises the exception [Error]. The optional argument [strategy], whose default value is [Legacy], is passed to [resume] and influences the error-handling strategy. *) val loop: ?strategy:strategy -> supplier -> 'a checkpoint -> 'a (* [loop_handle succeed fail supplier checkpoint] begins parsing from [checkpoint], reading tokens from [supplier]. It continues parsing until it reaches a checkpoint of the form [Accepted v] or [HandlingError env] (or [Rejected], but that should not happen, as [HandlingError _] will be observed first). In the former case, it calls [succeed v]. In the latter case, it calls [fail] with this checkpoint. It cannot raise [Error]. This means that Menhir's error-handling procedure does not get a chance to run. For this reason, there is no [strategy] parameter. Instead, the user can implement her own error handling code, in the [fail] continuation. *) val loop_handle: ('a -> 'answer) -> ('a checkpoint -> 'answer) -> supplier -> 'a checkpoint -> 'answer (* [loop_handle_undo] is analogous to [loop_handle], except it passes a pair of checkpoints to the failure continuation. The first (and oldest) checkpoint is the last [InputNeeded] checkpoint that was encountered before the error was detected. The second (and newest) checkpoint is where the error was detected, as in [loop_handle]. Going back to the first checkpoint can be thought of as undoing any reductions that were performed after seeing the problematic token. (These reductions must be default reductions or spurious reductions.) [loop_handle_undo] must initially be applied to an [InputNeeded] checkpoint. The parser's initial checkpoints satisfy this constraint. *) val loop_handle_undo: ('a -> 'answer) -> ('a checkpoint -> 'a checkpoint -> 'answer) -> supplier -> 'a checkpoint -> 'answer (* [shifts checkpoint] assumes that [checkpoint] has been obtained by submitting a token to the parser. It runs the parser from [checkpoint], through an arbitrary number of reductions, until the parser either accepts this token (i.e., shifts) or rejects it (i.e., signals an error). If the parser decides to shift, then [Some env] is returned, where [env] is the parser's state just before shifting. Otherwise, [None] is returned. *) (* It is desirable that the semantic actions be side-effect free, or that their side-effects be harmless (replayable). *) val shifts: 'a checkpoint -> 'a env option (* The function [acceptable] allows testing, after an error has been detected, which tokens would have been accepted at this point. It is implemented using [shifts]. Its argument should be an [InputNeeded] checkpoint. *) (* For completeness, one must undo any spurious reductions before carrying out this test -- that is, one must apply [acceptable] to the FIRST checkpoint that is passed by [loop_handle_undo] to its failure continuation. *) (* This test causes some semantic actions to be run! The semantic actions should be side-effect free, or their side-effects should be harmless. 
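(* For instance (a sketch), [acceptable] can be used to filter a list of candidate tokens down to those that the parser would accept; here, [checkpoint] should be the first checkpoint passed by [loop_handle_undo] to its failure continuation:

     let admissible tokens checkpoint pos =
       List.filter (fun token -> acceptable checkpoint token pos) tokens
*)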
*) (* The position [pos] is used as the start and end positions of the hypothetical token, and may be picked up by the semantic actions. We suggest using the position where the error was detected. *) val acceptable: 'a checkpoint -> token -> position -> bool (* The abstract type ['a lr1state] describes the non-initial states of the LR(1) automaton. The index ['a] represents the type of the semantic value associated with this state's incoming symbol. *) type 'a lr1state (* The states of the LR(1) automaton are numbered (from 0 and up). *) val number: _ lr1state -> int (* Productions are numbered. *) (* [find_production i] requires the index [i] to be valid. Use with care. *) val production_index: production -> int val find_production: int -> production (* An element is a pair of a non-initial state [s] and a semantic value [v] associated with the incoming symbol of this state. The idea is, the value [v] was pushed onto the stack just before the state [s] was entered. Thus, for some type ['a], the state [s] has type ['a lr1state] and the value [v] has type ['a]. In other words, the type [element] is an existential type. *) type element = | Element: 'a lr1state * 'a * position * position -> element (* The parser's stack is (or, more precisely, can be viewed as) a stream of elements. The type [stream] is defined by the module [General]. *) (* As of 2017/03/31, the types [stream] and [stack] and the function [stack] are DEPRECATED. They might be removed in the future. An alternative way of inspecting the stack is via the functions [top] and [pop]. *) type stack = (* DEPRECATED *) element stream (* This is the parser's stack, a stream of elements. This stream is empty if the parser is in an initial state; otherwise, it is non-empty. The LR(1) automaton's current state is the one found in the top element of the stack. *) val stack: 'a env -> stack (* DEPRECATED *) (* [top env] returns the parser's top stack element. The state contained in this stack element is the current state of the automaton. If the stack is empty, [None] is returned. In that case, the current state of the automaton must be an initial state. *) val top: 'a env -> element option (* [pop_many i env] pops [i] cells off the automaton's stack. This is done via [i] successive invocations of [pop]. Thus, [pop_many 1] is [pop]. The index [i] must be nonnegative. The time complexity is O(i). *) val pop_many: int -> 'a env -> 'a env option (* [get i env] returns the parser's [i]-th stack element. The index [i] is 0-based: thus, [get 0] is [top]. If [i] is greater than or equal to the number of elements in the stack, [None] is returned. The time complexity is O(i). *) val get: int -> 'a env -> element option (* [current_state_number env] is (the integer number of) the automaton's current state. This works even if the automaton's stack is empty, in which case the current state is an initial state. This number can be passed as an argument to a [message] function generated by [menhir --compile-errors]. *) val current_state_number: 'a env -> int (* [equal env1 env2] tells whether the parser configurations [env1] and [env2] are equal in the sense that the automaton's current state is the same in [env1] and [env2] and the stack is *physically* the same in [env1] and [env2]. If [equal env1 env2] is [true], then the sequence of the stack elements, as observed via [pop] and [top], must be the same in [env1] and [env2]. Also, if [equal env1 env2] holds, then the checkpoints [input_needed env1] and [input_needed env2] must be equivalent. 
The function [equal] has time complexity O(1). *) val equal: 'a env -> 'a env -> bool (* These are the start and end positions of the current lookahead token. If invoked in an initial state, this function returns a pair of twice the initial position. *) val positions: 'a env -> position * position (* When applied to an environment taken from a checkpoint of the form [AboutToReduce (env, prod)], the function [env_has_default_reduction] tells whether the reduction that is about to take place is a default reduction. *) val env_has_default_reduction: 'a env -> bool (* [state_has_default_reduction s] tells whether the state [s] has a default reduction. This includes the case where [s] is an accepting state. *) val state_has_default_reduction: _ lr1state -> bool (* [pop env] returns a new environment, where the parser's top stack cell has been popped off. (If the stack is empty, [None] is returned.) This amounts to pretending that the (terminal or nonterminal) symbol that corresponds to this stack cell has not been read. *) val pop: 'a env -> 'a env option (* [force_reduction prod env] should be called only if in the state [env] the parser is capable of reducing the production [prod]. If this condition is satisfied, then this production is reduced, which means that its semantic action is executed (this can have side effects!) and the automaton makes a goto (nonterminal) transition. If this condition is not satisfied, [Invalid_argument _] is raised. *) val force_reduction: production -> 'a env -> 'a env (* [input_needed env] returns [InputNeeded env]. That is, out of an [env] that might have been obtained via a series of calls to the functions [pop], [force_reduction], [feed], etc., it produces a checkpoint, which can be used to resume normal parsing, by supplying this checkpoint as an argument to [offer]. *) (* This function should be used with some care. It could "mess up the lookahead" in the sense that it allows parsing to resume in an arbitrary state [s] with an arbitrary lookahead symbol [t], even though Menhir's reachability analysis (menhir --list-errors) might well think that it is impossible to reach this particular configuration. If one is using Menhir's new error reporting facility, this could cause the parser to reach an error state for which no error message has been prepared. *) val input_needed: 'a env -> 'a checkpoint end (* This signature is a fragment of the inspection API that is made available to the user when [--inspection] is used. This fragment contains type definitions for symbols. *) module type SYMBOLS = sig (* The type ['a terminal] represents a terminal symbol. The type ['a nonterminal] represents a nonterminal symbol. In both cases, the index ['a] represents the type of the semantic values associated with this symbol. The concrete definitions of these types are generated. *) type 'a terminal type 'a nonterminal (* The type ['a symbol] represents a terminal or nonterminal symbol. It is the disjoint union of the types ['a terminal] and ['a nonterminal]. *) type 'a symbol = | T : 'a terminal -> 'a symbol | N : 'a nonterminal -> 'a symbol (* The type [xsymbol] is an existentially quantified version of the type ['a symbol]. This type is useful in situations where the index ['a] is not statically known. *) type xsymbol = | X : 'a symbol -> xsymbol end (* This signature describes the inspection API that is made available to the user when [--inspection] is used. *) module type INSPECTION = sig (* The types of symbols are described above. 
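(* As an illustration, a function that examines an [xsymbol] opens the existential package and performs case analysis (a sketch):

     let kind (x : xsymbol) : string =
       match x with
       | X (T _) -> "terminal"
       | X (N _) -> "nonterminal"
*)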
*) include SYMBOLS (* The type ['a lr1state] is meant to be the same as in [INCREMENTAL_ENGINE]. *) type 'a lr1state (* The type [production] is meant to be the same as in [INCREMENTAL_ENGINE]. It represents a production of the grammar. A production can be examined via the functions [lhs] and [rhs] below. *) type production (* An LR(0) item is a pair of a production [prod] and a valid index [i] into this production. That is, if the length of [rhs prod] is [n], then [i] is comprised between 0 and [n], inclusive. *) type item = production * int (* Ordering functions. *) val compare_terminals: _ terminal -> _ terminal -> int val compare_nonterminals: _ nonterminal -> _ nonterminal -> int val compare_symbols: xsymbol -> xsymbol -> int val compare_productions: production -> production -> int val compare_items: item -> item -> int (* [incoming_symbol s] is the incoming symbol of the state [s], that is, the symbol that the parser must recognize before (has recognized when) it enters the state [s]. This function gives access to the semantic value [v] stored in a stack element [Element (s, v, _, _)]. Indeed, by case analysis on the symbol [incoming_symbol s], one discovers the type ['a] of the value [v]. *) val incoming_symbol: 'a lr1state -> 'a symbol (* [items s] is the set of the LR(0) items in the LR(0) core of the LR(1) state [s]. This set is not epsilon-closed. This set is presented as a list, in an arbitrary order. *) val items: _ lr1state -> item list (* [lhs prod] is the left-hand side of the production [prod]. This is always a non-terminal symbol. *) val lhs: production -> xsymbol (* [rhs prod] is the right-hand side of the production [prod]. This is a (possibly empty) sequence of (terminal or nonterminal) symbols. *) val rhs: production -> xsymbol list (* [nullable nt] tells whether the non-terminal symbol [nt] is nullable. That is, it is true if and only if this symbol produces the empty word [epsilon]. *) val nullable: _ nonterminal -> bool (* [first nt t] tells whether the FIRST set of the nonterminal symbol [nt] contains the terminal symbol [t]. That is, it is true if and only if [nt] produces a word that begins with [t]. *) val first: _ nonterminal -> _ terminal -> bool (* [xfirst] is analogous to [first], but expects a first argument of type [xsymbol] instead of [_ terminal]. *) val xfirst: xsymbol -> _ terminal -> bool (* [foreach_terminal] enumerates the terminal symbols, including [error]. [foreach_terminal_but_error] enumerates the terminal symbols, excluding [error]. *) val foreach_terminal: (xsymbol -> 'a -> 'a) -> 'a -> 'a val foreach_terminal_but_error: (xsymbol -> 'a -> 'a) -> 'a -> 'a (* The type [env] is meant to be the same as in [INCREMENTAL_ENGINE]. *) type 'a env (* [feed symbol startp semv endp env] causes the parser to consume the (terminal or nonterminal) symbol [symbol], accompanied with the semantic value [semv] and with the start and end positions [startp] and [endp]. Thus, the automaton makes a transition, and reaches a new state. The stack grows by one cell. This operation is permitted only if the current state (as determined by [env]) has an outgoing transition labeled with [symbol]. Otherwise, [Invalid_argument _] is raised. *) val feed: 'a symbol -> position -> 'a -> position -> 'b env -> 'b env end (* This signature combines the incremental API and the inspection API. 
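(* As an illustration of the combined API (a sketch, assuming a parser generated with --inspection, so that the signature below is available), one can examine the element on top of the stack:

     let describe env =
       match top env with
       | Some (Element (s, _v, _, _)) ->
           begin match incoming_symbol s with
           | T _ -> "a terminal symbol is on top of the stack"
           | N _ -> "a nonterminal symbol is on top of the stack"
           end
       | None -> "the stack is empty"

   Matching on a specific constructor of [incoming_symbol s] (for example, a generated constructor such as [N_expr]) would additionally reveal the precise type of the semantic value [_v]. *)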
*) module type EVERYTHING = sig include INCREMENTAL_ENGINE include INSPECTION with type 'a lr1state := 'a lr1state with type production := production with type 'a env := 'a env end menhir-20210929/lib/InfiniteArray.ml000066400000000000000000000036601412503066000171140ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (** This module implements infinite arrays, that is, arrays that grow transparently upon demand. *) type 'a t = { default: 'a; mutable table: 'a array; mutable extent: int; (* the index of the greatest [set] ever, plus one *) } let default_size = 16384 (* must be non-zero *) let make x = { default = x; table = Array.make default_size x; extent = 0; } let rec new_length length i = if i < length then length else new_length (2 * length) i let ensure a i = assert (0 <= i); let table = a.table in let length = Array.length table in if i >= length then begin let table' = Array.make (new_length (2 * length) i) a.default in Array.blit table 0 table' 0 length; a.table <- table' end let get a i = ensure a i; Array.unsafe_get a.table (i) let set a i x = ensure a i; Array.unsafe_set a.table (i) x; if a.extent <= i then a.extent <- i + 1 let extent a = a.extent let domain a = Array.sub a.table 0 a.extent menhir-20210929/lib/InfiniteArray.mli000066400000000000000000000034311412503066000172610ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (** This module implements infinite arrays. **) type 'a t (** [make x] creates an infinite array, where every slot contains [x]. **) val make: 'a -> 'a t (** [get a i] returns the element contained at offset [i] in the array [a]. Slots are numbered 0 and up. **) val get: 'a t -> int -> 'a (** [set a i x] sets the element contained at offset [i] in the array [a] to [x]. Slots are numbered 0 and up. **) val set: 'a t -> int -> 'a -> unit (** [extent a] is the length of an initial segment of the array [a] that is sufficiently large to contain all [set] operations ever performed. In other words, all elements beyond that segment have the default value. *) val extent: 'a t -> int (** [domain a] is a fresh copy of an initial segment of the array [a] whose length is [extent a]. *) val domain: 'a t -> 'a array menhir-20210929/lib/InspectionTableFormat.ml000066400000000000000000000060651412503066000206060ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. 
This file is distributed under the *) (*  terms of the GNU Library General Public License version 2, with a        *) (*  special exception on linking, as described in the file LICENSE.          *) (*                                                                            *) (******************************************************************************) (* This signature defines the format of the tables that are produced (in addition to the tables described in [TableFormat]) when the command line switch [--inspection] is enabled. It is used as an argument to [InspectionTableInterpreter.Make]. *) module type TABLES = sig (* The types of symbols. *) include IncrementalEngine.SYMBOLS (* The type ['a lr1state] describes an LR(1) state. The generated parser defines it internally as [int]. *) type 'a lr1state (* Some of the tables that follow use encodings of (terminal and nonterminal) symbols as integers. So, we need functions that map the integer encoding of a symbol to its algebraic encoding. *) val terminal: int -> xsymbol val nonterminal: int -> xsymbol (* The left-hand side of every production already appears in the signature [TableFormat.TABLES], so we need not repeat it here. *) (* The right-hand side of every production. This is a linearized array of arrays of integers, whose [data] and [entry] components have been packed. The encoding of symbols as integers is described in [TableBackend]. *) val rhs: PackedIntArray.t * PackedIntArray.t (* A mapping of every (non-initial) state to its LR(0) core. *) val lr0_core: PackedIntArray.t (* A mapping of every LR(0) state to its set of LR(0) items. Each item is represented in its packed form (see [Item]) as an integer. Thus the mapping is an array of arrays of integers, which is linearized and packed, like [rhs]. *) val lr0_items: PackedIntArray.t * PackedIntArray.t (* A mapping of every LR(0) state to its incoming symbol, if it has one. *) val lr0_incoming: PackedIntArray.t (* A table that tells which non-terminal symbols are nullable. *) val nullable: string (* This is a packed int array of bit width 1. It can be read using [PackedIntArray.get1]. *) (* A two-dimensional table, indexed by a nonterminal symbol and by a terminal symbol (other than [#]), encodes the FIRST sets. *) val first: int (* width of the bitmap *) * string (* second component of [PackedIntArray.t] *) end menhir-20210929/lib/InspectionTableInterpreter.ml000066400000000000000000000253221412503066000216560ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the *) (*  terms of the GNU Library General Public License version 2, with a        *) (*  special exception on linking, as described in the file LICENSE.          *) (*                                                                            *) (******************************************************************************) (* -------------------------------------------------------------------------- *) (* The type functor. *) module Symbols (T : sig type 'a terminal type 'a nonterminal end) = struct open T (* This should be the only place in the whole library (and generator!) where these types are defined. *) type 'a symbol = | T : 'a terminal -> 'a symbol | N : 'a nonterminal -> 'a symbol type xsymbol = | X : 'a symbol -> xsymbol end (* -------------------------------------------------------------------------- *) (* The code functor.
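Before reading it, it may help to keep in mind the integer encoding of symbols that it decodes, which can be read off [decode_symbol] below: the code 0 stands for ``no symbol''; otherwise, the low-order bit distinguishes the two kinds, so the terminal symbol numbered [t] is encoded as (t + 1) lsl 1, while the nonterminal symbol numbered [nt] is encoded as (nt lsl 1) lor 1. For example, terminal 3 is encoded as 8, and nonterminal 3 is encoded as 7.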
*) module Make (TT : TableFormat.TABLES) (IT : InspectionTableFormat.TABLES with type 'a lr1state = int) (ET : EngineTypes.TABLE with type terminal = int and type nonterminal = int and type semantic_value = Obj.t) (E : sig type 'a env = (ET.state, ET.semantic_value, ET.token) EngineTypes.env end) = struct (* Including [IT] is an easy way of inheriting the definitions of the types [symbol] and [xsymbol]. *) include IT (* This auxiliary function decodes a packed linearized array, as created by [TableBackend.linearize_and_marshal1]. Here, we read a row all at once. *) let read_packed_linearized (data, entry : PackedIntArray.t * PackedIntArray.t) (i : int) : int list = LinearizedArray.read_row_via (PackedIntArray.get data) (PackedIntArray.get entry) i (* This auxiliary function decodes a symbol. The encoding was done by [encode_symbol] or [encode_symbol_option] in the table back-end. *) let decode_symbol (symbol : int) : IT.xsymbol = (* If [symbol] is 0, then we have no symbol. This could mean e.g. that the function [incoming_symbol] has been applied to an initial state. In principle, this cannot happen. *) assert (symbol > 0); (* The low-order bit distinguishes terminal and nonterminal symbols. *) let kind = symbol land 1 in let symbol = symbol lsr 1 in if kind = 0 then IT.terminal (symbol - 1) else IT.nonterminal symbol (* These auxiliary functions convert a symbol to its integer code. For speed and for convenience, we use an unsafe type cast. This relies on the fact that the data constructors of the [terminal] and [nonterminal] GADTs are declared in an order that reflects their internal code. In the case of nonterminal symbols, we add [start] to account for the presence of the start symbols. *) let n2i (nt : 'a IT.nonterminal) : int = let answer = TT.start + Obj.magic nt in (* For safety, check that the above cast produced a correct result. *) assert (IT.nonterminal answer = X (N nt)); answer let t2i (t : 'a IT.terminal) : int = let answer = Obj.magic t in (* For safety, check that the above cast produced a correct result. *) assert (IT.terminal answer = X (T t)); answer (* Ordering functions. *) let compare_terminals t1 t2 = (* Subtraction is safe because overflow is impossible. *) t2i t1 - t2i t2 let compare_nonterminals nt1 nt2 = (* Subtraction is safe because overflow is impossible. *) n2i nt1 - n2i nt2 let compare_symbols symbol1 symbol2 = match symbol1, symbol2 with | X (T _), X (N _) -> -1 | X (N _), X (T _) -> 1 | X (T t1), X (T t2) -> compare_terminals t1 t2 | X (N nt1), X (N nt2) -> compare_nonterminals nt1 nt2 let compare_productions prod1 prod2 = (* Subtraction is safe because overflow is impossible. *) prod1 - prod2 let compare_items (prod1, index1) (prod2, index2) = let c = compare_productions prod1 prod2 in (* Subtraction is safe because overflow is impossible. *) if c <> 0 then c else index1 - index2 (* The function [incoming_symbol] goes through the tables [IT.lr0_core] and [IT.lr0_incoming]. This yields a representation of type [xsymbol], out of which we strip the [X] quantifier, so as to get a naked symbol. This last step is ill-typed and potentially dangerous. It is safe only because this function is used at type ['a lr1state -> 'a symbol], which forces an appropriate choice of ['a]. 
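As an example of a safe use, the module [Printers], further down in this library, prints the symbol carried by a stack element as follows: match element with Element (s, _, _, _) -> print_symbol (X (incoming_symbol s)).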
*) let incoming_symbol (s : 'a IT.lr1state) : 'a IT.symbol = let core = PackedIntArray.get IT.lr0_core s in let symbol = decode_symbol (PackedIntArray.get IT.lr0_incoming core) in match symbol with | IT.X symbol -> Obj.magic symbol (* The function [lhs] reads the table [TT.lhs] and uses [IT.nonterminal] to decode the symbol. *) let lhs prod = IT.nonterminal (PackedIntArray.get TT.lhs prod) (* The function [rhs] reads the table [IT.rhs] and uses [decode_symbol] to decode the symbol. *) let rhs prod = List.map decode_symbol (read_packed_linearized IT.rhs prod) (* The function [items] maps the LR(1) state [s] to its LR(0) core, then uses [core] as an index into the table [IT.lr0_items]. The items are then decoded by the function [export] below, which is essentially a copy of [Item.export]. *) type item = int * int let low_bits = 10 let low_limit = 1 lsl low_bits let export t : item = (t lsr low_bits, t mod low_limit) let items s = (* Map [s] to its LR(0) core. *) let core = PackedIntArray.get IT.lr0_core s in (* Now use [core] to look up the table [IT.lr0_items]. *) List.map export (read_packed_linearized IT.lr0_items core) (* The function [nullable] maps the nonterminal symbol [nt] to its integer code, which it uses to look up the array [IT.nullable]. This yields 0 or 1, which we map back to a Boolean result. *) let decode_bool i = assert (i = 0 || i = 1); i = 1 let nullable nt = decode_bool (PackedIntArray.get1 IT.nullable (n2i nt)) (* The function [first] maps the symbols [nt] and [t] to their integer codes, which it uses to look up the matrix [IT.first]. *) let first nt t = decode_bool (PackedIntArray.unflatten1 IT.first (n2i nt) (t2i t)) let xfirst symbol t = match symbol with | X (T t') -> compare_terminals t t' = 0 | X (N nt) -> first nt t (* The function [foreach_terminal] exploits the fact that the first component of [TT.error] is [Terminal.n - 1], i.e., the number of terminal symbols, including [error] but not [#]. *) let rec foldij i j f accu = if i = j then accu else foldij (i + 1) j f (f i accu) let foreach_terminal f accu = let n, _ = TT.error in foldij 0 n (fun i accu -> f (IT.terminal i) accu ) accu let foreach_terminal_but_error f accu = let n, _ = TT.error in foldij 0 n (fun i accu -> if i = TT.error_terminal then accu else f (IT.terminal i) accu ) accu (* ------------------------------------------------------------------------ *) (* The following is the implementation of the function [feed]. This function is logically part of the LR engine, so it would be nice if it were placed in the module [Engine], but it must be placed here because, to ensure type safety, its arguments must be a symbol of type ['a symbol] and a semantic value of type ['a]. The type ['a symbol] is not available in [Engine]. It is available here. *) open EngineTypes open ET open E (* [feed] fails if the current state does not have an outgoing transition labeled with the desired symbol. This check is carried out at runtime. *) let feed_failure () = invalid_arg "feed: outgoing transition does not exist" (* Feeding a nonterminal symbol [nt]. Here, [nt] has type [nonterminal], which is a synonym for [int], and [semv] has type [semantic_value], which is a synonym for [Obj.t]. This type is unsafe, because pushing a semantic value of arbitrary type into the stack can later cause a semantic action to crash and burn. The function [feed] is given a safe type below. 
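As a purely hypothetical illustration: for a grammar that defines a nonterminal symbol [expr] whose semantic value has type [Syntax.expr], the generated inspection API contains a constructor [N_expr], and a well-typed call reads: let env = feed (N N_expr) startp e endp env, where [e : Syntax.expr]. (The names [expr], [Syntax.expr], and [N_expr] are made up for this example.)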
*) let feed_nonterminal (nt : nonterminal) startp (semv : semantic_value) endp (env : 'b env) : 'b env = (* Check if the source state has an outgoing transition labeled [nt]. This is done by consulting the [goto] table. *) let source = env.current in match ET.maybe_goto_nt source nt with | None -> feed_failure() | Some target -> (* Push a new cell onto the stack, containing the identity of the state that we are leaving. The semantic value [semv] and positions [startp] and [endp] contained in the new cell are provided by the caller. *) let stack = { state = source; semv; startp; endp; next = env.stack } in (* Move to the target state. *) { env with stack; current = target } let reduce _env _prod = feed_failure() let initiate _env = feed_failure() let feed_terminal (terminal : terminal) startp (semv : semantic_value) endp (env : 'b env) : 'b env = (* Check if the source state has an outgoing transition labeled [terminal]. This is done by consulting the [action] table. *) let source = env.current in ET.action source terminal semv (fun env _please_discard _terminal semv target -> (* There is indeed a transition toward the state [target]. Push a new cell onto the stack and move to the target state. *) let stack = { state = source; semv; startp; endp; next = env.stack } in { env with stack; current = target } ) reduce initiate env (* The type assigned to [feed] ensures that the type of the semantic value [semv] is appropriate: it must be the semantic-value type of the symbol [symbol]. *) let feed (symbol : 'a symbol) startp (semv : 'a) endp env = let semv : semantic_value = Obj.repr semv in match symbol with | N nt -> feed_nonterminal (n2i nt) startp semv endp env | T terminal -> feed_terminal (t2i terminal) startp semv endp env end menhir-20210929/lib/InspectionTableInterpreter.mli000066400000000000000000000041401412503066000220220ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* This functor is invoked inside the generated parser, in [--table] mode. It produces no code! It simply constructs the types [symbol] and [xsymbol] on top of the generated types [terminal] and [nonterminal]. *) module Symbols (T : sig type 'a terminal type 'a nonterminal end) : IncrementalEngine.SYMBOLS with type 'a terminal := 'a T.terminal and type 'a nonterminal := 'a T.nonterminal (* This functor is invoked inside the generated parser, in [--table] mode. It constructs the inspection API on top of the inspection tables described in [InspectionTableFormat]. 
*) module Make (TT : TableFormat.TABLES) (IT : InspectionTableFormat.TABLES with type 'a lr1state = int) (ET : EngineTypes.TABLE with type terminal = int and type nonterminal = int and type semantic_value = Obj.t) (E : sig type 'a env = (ET.state, ET.semantic_value, ET.token) EngineTypes.env end) : IncrementalEngine.INSPECTION with type 'a terminal := 'a IT.terminal and type 'a nonterminal := 'a IT.nonterminal and type 'a lr1state := 'a IT.lr1state and type production := int and type 'a env := 'a E.env menhir-20210929/lib/LexerUtil.ml000066400000000000000000000037421412503066000162660ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the *) (*  terms of the GNU Library General Public License version 2, with a        *) (*  special exception on linking, as described in the file LICENSE.          *) (*                                                                            *) (******************************************************************************) open Lexing open Printf let init filename lexbuf = lexbuf.lex_curr_p <- { pos_fname = filename; pos_lnum = 1; pos_bol = 0; pos_cnum = 0 }; lexbuf let read filename = let c = open_in filename in let text = really_input_string c (in_channel_length c) in close_in c; let lexbuf = Lexing.from_string text in text, init filename lexbuf let newline lexbuf = let pos = lexbuf.lex_curr_p in lexbuf.lex_curr_p <- { pos with pos_lnum = pos.pos_lnum + 1; pos_bol = pos.pos_cnum; } let is_dummy (pos1, pos2) = pos1 == dummy_pos || pos2 == dummy_pos let range ((pos1, pos2) as range) = if is_dummy range then sprintf "At an unknown location:\n" else let file = pos1.pos_fname in let line = pos1.pos_lnum in let char1 = pos1.pos_cnum - pos1.pos_bol in let char2 = pos2.pos_cnum - pos1.pos_bol in (* yes, [pos1.pos_bol] *) sprintf "File \"%s\", line %d, characters %d-%d:\n" file line char1 char2 (* use [char1 + 1] and [char2 + 1] if *not* using Caml mode *) menhir-20210929/lib/LexerUtil.mli000066400000000000000000000041511412503066000164320ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the *) (*  terms of the GNU Library General Public License version 2, with a        *) (*  special exception on linking, as described in the file LICENSE.          *) (*                                                                            *) (******************************************************************************) open Lexing (* [init filename lexbuf] initializes the lexing buffer [lexbuf] so that the positions that are subsequently read from it refer to the file [filename]. It returns [lexbuf]. *) val init: string -> lexbuf -> lexbuf (* [read filename] reads the entire contents of the file [filename] and returns a pair of this content (a string) and a lexing buffer that has been initialized, based on this string. *) val read: string -> string * lexbuf (* [newline lexbuf] increments the line counter stored within [lexbuf]. It should be invoked by the lexer itself every time a newline character is consumed. This allows maintaining the current line number in [lexbuf]. *) val newline: lexbuf -> unit (* [range (startpos, endpos)] prints a textual description of the range delimited by the start and end positions [startpos] and [endpos].
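For instance, a typical description reads: File "sample.ml", line 2, characters 5-9: (the file name here is made up).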
This description is one line long and ends in a newline character. This description mentions the file name, the line number, and a range of characters on this line. The line number is correct only if [newline] has been correctly used, as described above. *) val range: position * position -> string menhir-20210929/lib/LinearizedArray.ml000066400000000000000000000053061412503066000174340ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the *) (*  terms of the GNU Library General Public License version 2, with a        *) (*  special exception on linking, as described in the file LICENSE.          *) (*                                                                            *) (******************************************************************************) (* The [entry] array contains offsets into the [data] array. It has [n+1] elements if the original (unencoded) array has [n] elements. The value of [entry.(n)] is the length of the [data] array. This convention is natural and allows avoiding a special case. *) type 'a t = (* data: *) 'a array * (* entry: *) int array let make (a : 'a array array) : 'a t = let n = Array.length a in (* Build the entry array. *) let size = ref 0 in let entry = Array.init (n + 1) (fun i -> let s = !size in if i < n then size := s + Array.length a.(i); s ) in assert (entry.(n) = !size); (* Build the data array. *) let i = ref 0 and j = ref 0 in let data = Array.init !size (fun _ -> while !j = Array.length a.(!i) do i := !i + 1; j := 0; done; let x = a.(!i).(!j) in j := !j + 1; x ) in data, entry let length ((_, entry) : 'a t) : int = Array.length entry - 1 (* [entry] has one element per row, plus one; see above *) let row_length ((_, entry) : 'a t) i : int = entry.(i + 1) - entry.(i) let row_length_via get_entry i = get_entry (i + 1) - get_entry i let read ((data, entry) as la : 'a t) i j : 'a = assert (0 <= j && j < row_length la i); data.(entry.(i) + j) let read_via get_data get_entry i j = assert (0 <= j && j < row_length_via get_entry i); get_data (get_entry i + j) let write ((data, entry) as la : 'a t) i j (v : 'a) : unit = assert (0 <= j && j < row_length la i); data.(entry.(i) + j) <- v let rec read_interval_via get_data i j = if i = j then [] else get_data i :: read_interval_via get_data (i + 1) j let read_row_via get_data get_entry i = read_interval_via get_data (get_entry i) (get_entry (i + 1)) let read_row ((data, entry) : 'a t) i : 'a list = read_row_via (Array.get data) (Array.get entry) i menhir-20210929/lib/LinearizedArray.mli000066400000000000000000000053001412503066000175770ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the *) (*  terms of the GNU Library General Public License version 2, with a        *) (*  special exception on linking, as described in the file LICENSE.          *) (*                                                                            *) (******************************************************************************) (* An array of arrays (of possibly different lengths!) can be ``linearized'', i.e., encoded as a data array (by concatenating all of the little arrays) and an entry array (which contains offsets into the data array). *) type 'a t = (* data: *) 'a array * (* entry: *) int array (* [make a] turns the array of arrays [a] into a linearized array.
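For example, a quick check against the code above: make [| [|1; 2; 3|]; [||]; [|4|] |] yields the data array [|1; 2; 3; 4|] together with the entry array [|0; 3; 3; 4|].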
*) val make: 'a array array -> 'a t (* [read la i j] reads the linearized array [la] at indices [i] and [j]. Thus, [read (make a) i j] is equivalent to [a.(i).(j)]. *) val read: 'a t -> int -> int -> 'a (* [write la i j v] writes the value [v] into the linearized array [la] at indices [i] and [j]. *) val write: 'a t -> int -> int -> 'a -> unit (* [length la] is the number of rows of the array [la]. Thus, [length (make a)] is equivalent to [Array.length a]. *) val length: 'a t -> int (* [row_length la i] is the length of the row at index [i] in the linearized array [la]. Thus, [row_length (make a) i] is equivalent to [Array.length a.(i)]. *) val row_length: 'a t -> int -> int (* [read_row la i] reads the row at index [i], producing a list. Thus, [read_row (make a) i] is equivalent to [Array.to_list a.(i)]. *) val read_row: 'a t -> int -> 'a list (* The following variants read the linearized array via accessors [get_data : int -> 'a] and [get_entry : int -> int]. *) val row_length_via: (* get_entry: *) (int -> int) -> (* i: *) int -> int val read_via: (* get_data: *) (int -> 'a) -> (* get_entry: *) (int -> int) -> (* i: *) int -> (* j: *) int -> 'a val read_row_via: (* get_data: *) (int -> 'a) -> (* get_entry: *) (int -> int) -> (* i: *) int -> 'a list menhir-20210929/lib/Makefile000066400000000000000000000000461412503066000154510ustar00rootroot00000000000000.PHONY: all all: @ dune build @check menhir-20210929/lib/PackedIntArray.ml000066400000000000000000000141171412503066000172100ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the *) (*  terms of the GNU Library General Public License version 2, with a        *) (*  special exception on linking, as described in the file LICENSE.          *) (*                                                                            *) (******************************************************************************) (* A packed integer array is represented as a pair of an integer [k] and a string [s]. The integer [k] is the number of bits per integer that we use. The string [s] is just an array of bits, which is read in 8-bit chunks. *) (* The ocaml programming language treats string literals and array literals in slightly different ways: the former are statically allocated, while the latter are dynamically allocated. (This is rather arbitrary.) In the context of Menhir's table-based back-end, where compact, immutable integer arrays are needed, ocaml strings are preferable to ocaml arrays. *) type t = int * string (* The magnitude [k] of an integer [v] is the number of bits required to represent [v]. It is rounded up to the nearest power of two, so that [k] divides [Sys.word_size]. *) let magnitude (v : int) = if v < 0 then Sys.word_size else let rec check k max = (* [max] equals [2^k] *) if (max <= 0) || (v < max) then k (* if [max] just overflowed, then [v] requires a full ocaml integer, and [k] is the number of bits in an ocaml integer plus one, that is, [Sys.word_size]. *) else check (2 * k) (max * max) in check 1 2 (* [pack a] turns an array of integers into a packed integer array. *) (* Because the sign bit is the most significant bit, the magnitude of any negative number is the word size. In other words, [pack] does not achieve any space savings as soon as [a] contains any negative numbers, even if they are ``small''.
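As a concrete example, which can be checked against the code below: pack [| 0; 1; 2; 3 |] chooses k = 2 bits per entry, packs the four entries into the single byte 0b00011011, and returns (2, "\027"); accordingly, get (2, "\027") 2 = 2.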
*) let pack (a : int array) : t = let m = Array.length a in (* Compute the maximum magnitude of the array elements. This tells us how many bits per element we are going to use. *) let k = Array.fold_left (fun k v -> max k (magnitude v) ) 1 a in (* Because access to ocaml strings is performed on an 8-bit basis, two cases arise. If [k] is less than 8, then we can pack multiple array entries into a single character. If [k] is greater than 8, then we must use multiple characters to represent a single array entry. *) if k <= 8 then begin (* [w] is the number of array entries that we pack in a character. *) assert (8 mod k = 0); let w = 8 / k in (* [n] is the length of the string that we allocate. *) let n = if m mod w = 0 then m / w else m / w + 1 in let s = Bytes.create n in (* Define a reader for the source array. The reader might run off the end if [w] does not divide [m]. *) let i = ref 0 in let next () = let ii = !i in if ii = m then 0 (* ran off the end, pad with zeroes *) else let v = a.(ii) in i := ii + 1; v in (* Fill up the string. *) for j = 0 to n - 1 do let c = ref 0 in for _x = 1 to w do c := (!c lsl k) lor next() done; Bytes.set s j (Char.chr !c) done; (* Done. *) k, Bytes.unsafe_to_string s end else begin (* k > 8 *) (* [w] is the number of characters that we use to encode an array entry. *) assert (k mod 8 = 0); let w = k / 8 in (* [n] is the length of the string that we allocate. *) let n = m * w in let s = Bytes.create n in (* Fill up the string. *) for i = 0 to m - 1 do let v = ref a.(i) in for x = 1 to w do Bytes.set s ((i + 1) * w - x) (Char.chr (!v land 255)); v := !v lsr 8 done done; (* Done. *) k, Bytes.unsafe_to_string s end (* Access to a string. *) let read (s : string) (i : int) : int = Char.code (String.unsafe_get s i) (* [get1 t i] returns the integer stored in the packed array [t] at index [i]. It assumes (and does not check) that the array's bit width is [1]. The parameter [t] is just a string. *) let get1 (s : string) (i : int) : int = let c = read s (i lsr 3) in let c = c lsr ((lnot i) land 0b111) in let c = c land 0b1 in c (* [get t i] returns the integer stored in the packed array [t] at index [i]. *) (* Together, [pack] and [get] satisfy the following property: if the index [i] is within bounds, then [get (pack a) i] equals [a.(i)]. *) let get ((k, s) : t) (i : int) : int = match k with | 1 -> get1 s i | 2 -> let c = read s (i lsr 2) in let c = c lsr (2 * ((lnot i) land 0b11)) in let c = c land 0b11 in c | 4 -> let c = read s (i lsr 1) in let c = c lsr (4 * ((lnot i) land 0b1)) in let c = c land 0b1111 in c | 8 -> read s i | 16 -> let j = 2 * i in (read s j) lsl 8 + read s (j + 1) | _ -> assert (k = 32); (* 64 bits unlikely, not supported *) let j = 4 * i in (((read s j lsl 8) + read s (j + 1)) lsl 8 + read s (j + 2)) lsl 8 + read s (j + 3) (* [unflatten1 (n, data) i j] accesses the two-dimensional bitmap represented by [(n, data)] at indices [i] and [j]. The integer [n] is the width of the bitmap; the string [data] is the second component of the packed array obtained by encoding the table as a one-dimensional array. *) let unflatten1 (n, data) i j = get1 data (n * i + j) menhir-20210929/lib/PackedIntArray.mli000066400000000000000000000053421412503066000173610ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. 
This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* A packed integer array is represented as a pair of an integer [k] and a string [s]. The integer [k] is the number of bits per integer that we use. The string [s] is just an array of bits, which is read in 8-bit chunks. *) (* The ocaml programming language treats string literals and array literals in slightly different ways: the former are statically allocated, while the latter are dynamically allocated. (This is rather arbitrary.) In the context of Menhir's table-based back-end, where compact, immutable integer arrays are needed, ocaml strings are preferable to ocaml arrays. *) type t = int * string (* [pack a] turns an array of integers into a packed integer array. *) (* Because the sign bit is the most significant bit, the magnitude of any negative number is the word size. In other words, [pack] does not achieve any space savings as soon as [a] contains any negative numbers, even if they are ``small''. *) val pack: int array -> t (* [get t i] returns the integer stored in the packed array [t] at index [i]. *) (* Together, [pack] and [get] satisfy the following property: if the index [i] is within bounds, then [get (pack a) i] equals [a.(i)]. *) val get: t -> int -> int (* [get1 t i] returns the integer stored in the packed array [t] at index [i]. It assumes (and does not check) that the array's bit width is [1]. The parameter [t] is just a string. *) val get1: string -> int -> int (* [unflatten1 (n, data) i j] accesses the two-dimensional bitmap represented by [(n, data)] at indices [i] and [j]. The integer [n] is the width of the bitmap; the string [data] is the second component of the packed array obtained by encoding the table as a one-dimensional array. *) val unflatten1: int * string -> int -> int -> int menhir-20210929/lib/Printers.ml000066400000000000000000000067061412503066000161620ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) module Make (I : IncrementalEngine.EVERYTHING) (User : sig val print: string -> unit val print_symbol: I.xsymbol -> unit val print_element: (I.element -> unit) option end) = struct let arrow = " -> " let dot = "." let space = " " let newline = "\n" open User open I (* Printing a list of symbols. An optional dot is printed at offset [i] into the list [symbols], if this offset lies between [0] and the length of the list (included). *) let rec print_symbols i symbols = if i = 0 then begin print dot; print space; print_symbols (-1) symbols end else begin match symbols with | [] -> () | symbol :: symbols -> print_symbol symbol; print space; print_symbols (i - 1) symbols end (* Printing an element as a symbol. *) let print_element_as_symbol element = match element with | Element (s, _, _, _) -> print_symbol (X (incoming_symbol s)) (* Some of the functions that follow need an element printer. 
They use [print_element] if provided by the user; otherwise they use [print_element_as_symbol]. *) let print_element = match print_element with | Some print_element -> print_element | None -> print_element_as_symbol (* Printing a stack as a list of symbols. Stack bottom on the left, stack top on the right. *) let rec print_stack env = match top env, pop env with | Some element, Some env -> print_stack env; print space; print_element element | _, _ -> () let print_stack env = print_stack env; print newline (* Printing an item. *) let print_item (prod, i) = print_symbol (lhs prod); print arrow; print_symbols i (rhs prod); print newline (* Printing a list of symbols (public version). *) let print_symbols symbols = print_symbols (-1) symbols (* Printing a production (without a dot). *) let print_production prod = print_item (prod, -1) (* Printing the current LR(1) state. *) let print_current_state env = print "Current LR(1) state: "; match top env with | None -> print ""; (* TEMPORARY unsatisfactory *) print newline | Some (Element (current, _, _, _)) -> print (string_of_int (number current)); print newline; List.iter print_item (items current) let print_env env = print_stack env; print_current_state env; print newline end menhir-20210929/lib/Printers.mli000066400000000000000000000052011412503066000163200ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* This module is part of MenhirLib. *) module Make (I : IncrementalEngine.EVERYTHING) (User : sig (* [print s] is supposed to send the string [s] to some output channel. *) val print: string -> unit (* [print_symbol s] is supposed to print a representation of the symbol [s]. *) val print_symbol: I.xsymbol -> unit (* [print_element e] is supposed to print a representation of the element [e]. This function is optional; if it is not provided, [print_element_as_symbol] (defined below) is used instead. *) val print_element: (I.element -> unit) option end) : sig open I (* Printing a list of symbols. *) val print_symbols: xsymbol list -> unit (* Printing an element as a symbol. This prints just the symbol that this element represents; nothing more. *) val print_element_as_symbol: element -> unit (* Printing a stack as a list of elements. This function needs an element printer. It uses [print_element] if provided by the user; otherwise it uses [print_element_as_symbol]. (Ending with a newline.) *) val print_stack: 'a env -> unit (* Printing an item. (Ending with a newline.) *) val print_item: item -> unit (* Printing a production. (Ending with a newline.) *) val print_production: production -> unit (* Printing the current LR(1) state. The current state is first displayed as a number; then the list of its LR(0) items is printed. (Ending with a newline.) *) val print_current_state: 'a env -> unit (* Printing a summary of the stack and current state. This function just calls [print_stack] and [print_current_state] in succession. 
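As an illustration, the functor might be instantiated along the following lines (a sketch only; [Parser] stands for a hypothetical parser generated with --table and --inspection, and [string_of_symbol] for a user-written function):

   module P =
     MenhirLib.Printers.Make (Parser.MenhirInterpreter) (struct
       let print = output_string stderr
       let print_symbol s = print (string_of_symbol s)
       let print_element = None
     end)

After this, [P.print_env] can be used to display the parser's stack and current state.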
*) val print_env: 'a env -> unit end menhir-20210929/lib/RowDisplacement.ml000066400000000000000000000214211412503066000174430ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the *) (*  terms of the GNU Library General Public License version 2, with a        *) (*  special exception on linking, as described in the file LICENSE.          *) (*                                                                            *) (******************************************************************************) (* This module compresses a two-dimensional table, where some values are considered insignificant, via row displacement. *) (* This idea reportedly appears in Aho and Ullman's ``Principles of Compiler Design'' (1977). It is evaluated in Tarjan and Yao's ``Storing a Sparse Table'' (1979) and in Dencker, Dürre, and Heuft's ``Optimization of Parser Tables for Portable Compilers'' (1984). *) (* A compressed table is represented as a pair of arrays. The displacement array is an array of offsets into the data array. *) type 'a table = int array * (* displacement *) 'a array (* data *) (* In a natural version of this algorithm, displacements would be greater than (or equal to) [-n]. However, in the particular setting of Menhir, both arrays are intended to be compressed with [PackedIntArray], which does not efficiently support negative numbers. For this reason, we are careful not to produce negative displacements. *) (* In order to avoid producing negative displacements, we simply use the least significant bit as the sign bit. This is implemented by [encode] and [decode] below. *) (* One could also think, say, of adding [n] to every displacement, so as to ensure that all displacements are nonnegative. This would work, but would require [n] to be published, for use by the decoder. *) let encode (displacement : int) : int = if displacement >= 0 then displacement lsl 1 else (-displacement) lsl 1 + 1 let decode (displacement : int) : int = if displacement land 1 = 0 then displacement lsr 1 else -(displacement lsr 1) (* It is reasonable to assume that, as matrices grow large, their density becomes low, i.e., they have many insignificant entries. As a result, it is important to work with a sparse data structure for rows. We internally represent a row as a list of its significant entries, where each entry is a pair of a [j] index and an element. *) type 'a row = (int * 'a) list (* [compress equal insignificant dummy m n t] turns the two-dimensional table [t] into a compressed table. The parameter [equal] is equality of data values. The parameter [insignificant] tells which data values are insignificant, and can thus be overwritten with other values. The parameter [dummy] is used to fill holes in the data array. [m] and [n] are the integer dimensions of the table [t]. *) let compress (equal : 'a -> 'a -> bool) (insignificant : 'a -> bool) (dummy : 'a) (m : int) (n : int) (t : 'a array array) : 'a table = (* Be defensive. *) assert (Array.length t = m); assert begin for i = 0 to m - 1 do assert (Array.length t.(i) = n) done; true end; (* This turns a row-as-array into a row-as-sparse-list. The row is accompanied by its index [i] and by its rank (the number of its significant entries, that is, the length of the row-as-a-list).
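For example, if zero is insignificant, then the row [|0; 5; 0; 7|] at index [i] has rank 2 and becomes (i, 2, [(1, 5); (3, 7)]).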
*) let sparse (i : int) (line : 'a array) : int * int * 'a row (* index, rank, row *) = let rec loop (j : int) (rank : int) (row : 'a row) = if j < 0 then i, rank, row else let x = line.(j) in if insignificant x then loop (j - 1) rank row else loop (j - 1) (1 + rank) ((j, x) :: row) in loop (n - 1) 0 [] in (* Construct an array of all rows, together with their index and rank. *) let rows : (int * int * 'a row) array = (* index, rank, row *) Array.mapi sparse t in (* Sort this array by decreasing rank. This does not have any impact on correctness, but reportedly improves compression. The intuitive idea is that rows with few significant elements are easy to fit, so they should be inserted last, after the problem has become quite constrained by fitting the heavier rows. This heuristic is attributed to Ziegler. *) Array.fast_sort (fun (_, rank1, _) (_, rank2, _) -> compare rank2 rank1 ) rows; (* Allocate a one-dimensional array of displacements. *) let displacement : int array = Array.make m 0 in (* Allocate a one-dimensional, infinite array of values. Indices into this array are written [k]. *) let data : 'a InfiniteArray.t = InfiniteArray.make dummy in (* Determine whether [row] fits at offset [k] within the current [data] array, up to extension of this array. *) (* Note that this check always succeeds when [k] equals the length of the [data] array. Indeed, the loop is then skipped. This property guarantees the termination of the recursive function [fit] below. *) let fits k (row : 'a row) : bool = let d = InfiniteArray.extent data in let rec loop = function | [] -> true | (j, x) :: row -> (* [x] is a significant element. *) (* By hypothesis, [k + j] is nonnegative. If it is greater than or equal to the current length of the data array, stop -- the row fits. *) assert (k + j >= 0); if k + j >= d then true (* We now know that [k + j] is within bounds of the data array. Check whether it is compatible with the element [y] found there. If it is, continue. If it isn't, stop -- the row does not fit. *) else let y = InfiniteArray.get data (k + j) in if insignificant y || equal x y then loop row else false in loop row in (* Find the leftmost position where a row fits. *) (* If the leftmost significant element in this row is at offset [j], then we can hope to fit as far left as [-j] -- so this element lands at offset [0] in the data array. *) (* Note that displacements may be negative. This means that, for insignificant elements, accesses to the data array could fail: they could be out of bounds, either towards the left or towards the right. This is not a problem, as long as [get] is invoked only at significant elements. *) let rec fit k row : int = if fits k row then k else fit (k + 1) row in let fit row = match row with | [] -> 0 (* irrelevant *) | (j, _) :: _ -> fit (-j) row in (* Write [row] at (compatible) offset [k]. *) let rec write k = function | [] -> () | (j, x) :: row -> InfiniteArray.set data (k + j) x; write k row in (* Iterate over the sorted array of rows. Fit and write each row at the leftmost compatible offset. Update the displacement table. *) Array.iter (fun (i, _, row) -> let k = fit row in (* if [row] has leading insignificant elements, then [k] can be negative *) write k row; displacement.(i) <- encode k ) rows; (* Return the compressed tables. *) displacement, InfiniteArray.domain data (* [get ct i j] returns the value found at indices [i] and [j] in the compressed table [ct]. 
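For instance, with zeroes insignificant, if ct = compress (=) (fun x -> x = 0) 0 2 2 [| [|1; 0|]; [|0; 2|] |], then get ct 0 0 = 1 and get ct 1 1 = 2.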
This function call is permitted only if the value found at indices [i] and [j] in the original table is significant -- otherwise, it could fail abruptly. *) (* Together, [compress] and [get] have the property that, if the value found at indices [i] and [j] in an uncompressed table [t] is significant, then [get (compress t) i j] is equal to that value. *) let get (displacement, data) i j = assert (0 <= i && i < Array.length displacement); let k = decode displacement.(i) in assert (0 <= k + j && k + j < Array.length data); (* failure of this assertion indicates an attempt to access an insignificant element that happens to be mapped out of the bounds of the [data] array. *) data.(k + j) (* [getget] is a variant of [get] which only requires read access, via accessors, to the two components of the table. *) let getget get_displacement get_data (displacement, data) i j = let k = decode (get_displacement displacement i) in get_data data (k + j) menhir-20210929/lib/RowDisplacement.mli000066400000000000000000000050451412503066000176200ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the *) (*  terms of the GNU Library General Public License version 2, with a        *) (*  special exception on linking, as described in the file LICENSE.          *) (*                                                                            *) (******************************************************************************) (* This module compresses a two-dimensional table, where some values are considered insignificant, via row displacement. *) (* A compressed table is represented as a pair of arrays. The displacement array is an array of offsets into the data array. *) type 'a table = int array * (* displacement *) 'a array (* data *) (* [compress equal insignificant dummy m n t] turns the two-dimensional table [t] into a compressed table. The parameter [equal] is equality of data values. The parameter [insignificant] tells which data values are insignificant, and can thus be overwritten with other values. The parameter [dummy] is used to fill holes in the data array. [m] and [n] are the integer dimensions of the table [t]. *) val compress: ('a -> 'a -> bool) -> ('a -> bool) -> 'a -> int -> int -> 'a array array -> 'a table (* [get ct i j] returns the value found at indices [i] and [j] in the compressed table [ct]. This function call is permitted only if the value found at indices [i] and [j] in the original table is significant -- otherwise, it could fail abruptly. *) (* Together, [compress] and [get] have the property that, if the value found at indices [i] and [j] in an uncompressed table [t] is significant, then [get (compress t) i j] is equal to that value. *) val get: 'a table -> int -> int -> 'a (* [getget] is a variant of [get] which only requires read access, via accessors, to the two components of the table. *) val getget: ('displacement -> int -> int) -> ('data -> int -> 'a) -> 'displacement * 'data -> int -> int -> 'a menhir-20210929/lib/TableFormat.ml000066400000000000000000000140271412503066000165470ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved.
This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* This signature defines the format of the parse tables. It is used as an argument to [TableInterpreter.Make]. *) module type TABLES = sig (* This is the parser's type of tokens. *) type token (* This maps a token to its internal (generation-time) integer code. *) val token2terminal: token -> int (* This is the integer code for the error pseudo-token. *) val error_terminal: int (* This maps a token to its semantic value. *) val token2value: token -> Obj.t (* Traditionally, an LR automaton is described by two tables, namely, an action table and a goto table. See, for instance, the Dragon book. The action table is a two-dimensional matrix that maps a state and a lookahead token to an action. An action is one of: shift to a certain state, reduce a certain production, accept, or fail. The goto table is a two-dimensional matrix that maps a state and a non-terminal symbol to either a state or undefined. By construction, this table is sparse: its undefined entries are never looked up. A compression technique is free to overlap them with other entries. In Menhir, things are slightly different. If a state has a default reduction on token [#], then that reduction must be performed without consulting the lookahead token. As a result, we must first determine whether that is the case, before we can obtain a lookahead token and use it as an index in the action table. Thus, Menhir's tables are as follows. A one-dimensional default reduction table maps a state to either ``no default reduction'' (encoded as: 0) or ``by default, reduce prod'' (encoded as: 1 + prod). The action table is looked up only when there is no default reduction. *) val default_reduction: PackedIntArray.t (* Menhir follows Dencker, Dürre and Heuft, who point out that, although the action table is not sparse by nature (i.e., the error entries are significant), it can be made sparse by first factoring out a binary error matrix, then replacing the error entries in the action table with undefined entries. Thus: A two-dimensional error bitmap maps a state and a terminal to either ``fail'' (encoded as: 0) or ``do not fail'' (encoded as: 1). The action table, which is now sparse, is looked up only in the latter case. *) (* The error bitmap is flattened into a one-dimensional table; its width is recorded so as to allow indexing. The table is then compressed via [PackedIntArray]. The bit width of the resulting packed array must be [1], so it is not explicitly recorded. *) (* The error bitmap does not contain a column for the [#] pseudo-terminal. Thus, its width is [Terminal.n - 1]. We exploit the fact that the integer code assigned to [#] is greatest: the fact that the right-most column in the bitmap is missing does not affect the code for accessing it. *) val error: int (* width of the bitmap *) * string (* second component of [PackedIntArray.t] *) (* A two-dimensional action table maps a state and a terminal to one of ``shift to state s and discard the current token'' (encoded as: s | 10), ``shift to state s without discarding the current token'' (encoded as: s | 11), or ``reduce prod'' (encoded as: prod | 01). *) (* The action table is first compressed via [RowDisplacement], then packed via [PackedIntArray]. 
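For instance, before compression, the action ``shift to state 5 and discard the current token'' is represented by the integer 5 lsl 2 lor 0b10, that is, 22; the engine (see [TableInterpreter]) recovers the opcode as 22 land 0b11 = 0b10 and the parameter as 22 lsr 2 = 5.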
*) (* Like the error bitmap, the action table does not contain a column for the [#] pseudo-terminal. *) val action: PackedIntArray.t * PackedIntArray.t (* A one-dimensional lhs table maps a production to its left-hand side (a non-terminal symbol). *) val lhs: PackedIntArray.t (* A two-dimensional goto table maps a state and a non-terminal symbol to either undefined (encoded as: 0) or a new state s (encoded as: 1 + s). *) (* The goto table is first compressed via [RowDisplacement], then packed via [PackedIntArray]. *) val goto: PackedIntArray.t * PackedIntArray.t (* The number of start productions. A production [prod] is a start production if and only if [prod < start] holds. This is also the number of start symbols. A nonterminal symbol [nt] is a start symbol if and only if [nt < start] holds. *) val start: int (* A one-dimensional semantic action table maps productions to semantic actions. The calling convention for semantic actions is described in [EngineTypes]. This table contains ONLY NON-START PRODUCTIONS, so the indexing is off by [start]. Be careful. *) val semantic_action: ((int, Obj.t, token) EngineTypes.env -> (int, Obj.t) EngineTypes.stack) array (* The parser defines its own [Error] exception. This exception can be raised by semantic actions and caught by the engine, and raised by the engine towards the final user. *) exception Error (* The parser indicates whether to generate a trace. Generating a trace requires two extra tables, which respectively map a terminal symbol and a production to a string. *) val trace: (string array * string array) option end menhir-20210929/lib/TableInterpreter.ml000066400000000000000000000161541412503066000176250ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) module MakeEngineTable (T : TableFormat.TABLES) = struct type state = int let number s = s type token = T.token type terminal = int type nonterminal = int type semantic_value = Obj.t let token2terminal = T.token2terminal let token2value = T.token2value let error_terminal = T.error_terminal let error_value = Obj.repr () (* The function [foreach_terminal] exploits the fact that the first component of [T.error] is [Terminal.n - 1], i.e., the number of terminal symbols, including [error] but not [#]. *) (* There is similar code in [InspectionTableInterpreter]. The code there contains an additional conversion of the type [terminal] to the type [xsymbol]. *) let rec foldij i j f accu = if i = j then accu else foldij (i + 1) j f (f i accu) let foreach_terminal f accu = let n, _ = T.error in foldij 0 n (fun i accu -> f i accu ) accu type production = int (* In principle, only non-start productions are exposed to the user, at type [production] or at type [int]. This is checked dynamically. 
*) let non_start_production i = assert (T.start <= i && i - T.start < Array.length T.semantic_action) let production_index i = non_start_production i; i let find_production i = non_start_production i; i let default_reduction state defred nodefred env = let code = PackedIntArray.get T.default_reduction state in if code = 0 then nodefred env else defred env (code - 1) let is_start prod = prod < T.start (* This auxiliary function helps access a compressed, two-dimensional matrix, like the action and goto tables. *) let unmarshal2 table i j = RowDisplacement.getget PackedIntArray.get PackedIntArray.get table i j let action state terminal value shift reduce fail env = match PackedIntArray.unflatten1 T.error state terminal with | 1 -> let action = unmarshal2 T.action state terminal in let opcode = action land 0b11 and param = action lsr 2 in if opcode >= 0b10 then (* 0b10 : shift/discard *) (* 0b11 : shift/nodiscard *) let please_discard = (opcode = 0b10) in shift env please_discard terminal value param else (* 0b01 : reduce *) (* 0b00 : cannot happen *) reduce env param | c -> assert (c = 0); fail env let goto_nt state nt = let code = unmarshal2 T.goto state nt in (* code = 1 + state *) code - 1 let goto_prod state prod = goto_nt state (PackedIntArray.get T.lhs prod) let maybe_goto_nt state nt = let code = unmarshal2 T.goto state nt in (* If [code] is 0, there is no outgoing transition. If [code] is [1 + state], there is a transition towards [state]. *) assert (0 <= code); if code = 0 then None else Some (code - 1) exception Error = T.Error type semantic_action = (state, semantic_value, token) EngineTypes.env -> (state, semantic_value) EngineTypes.stack let semantic_action prod = (* Indexing into the array [T.semantic_action] is off by [T.start], because the start productions do not have entries in this array. *) T.semantic_action.(prod - T.start) (* [may_reduce state prod] tests whether the state [state] is capable of reducing the production [prod]. This information could be determined in constant time if we were willing to create a bitmap for it, but that would take up a lot of space. Instead, we obtain this information by iterating over a line in the action table. This is costly, but this function is not normally used by the LR engine anyway; it is supposed to be used only by programmers who wish to develop error recovery strategies. *) (* In the future, if desired, we could memoize this function, so as to pay the cost in (memory) space only if and where this function is actually used. We could also replace [foreach_terminal] with a function [exists_terminal] which stops as soon as the accumulator is [true]. *) let may_reduce state prod = (* Test if there is a default reduction of [prod]. *) default_reduction state (fun () prod' -> prod = prod') (fun () -> (* If not, then for each terminal [t], ... *) foreach_terminal (fun t accu -> accu || (* ... test if there is a reduction of [prod] on [t]. *) action state t () (* shift: *) (fun () _ _ () _ -> false) (* reduce: *) (fun () prod' -> prod = prod') (* fail: *) (fun () -> false) () ) false ) () (* If [T.trace] is [None], then the logging functions do nothing. *) let log = match T.trace with Some _ -> true | None -> false module Log = struct open Printf let state state = match T.trace with | Some _ -> fprintf stderr "State %d:\n%!" state | None -> () let shift terminal state = match T.trace with | Some (terminals, _) -> fprintf stderr "Shifting (%s) to state %d\n%!" 
terminals.(terminal) state | None -> () let reduce_or_accept prod = match T.trace with | Some (_, productions) -> fprintf stderr "%s\n%!" productions.(prod) | None -> () let lookahead_token token startp endp = match T.trace with | Some (terminals, _) -> fprintf stderr "Lookahead token is now %s (%d-%d)\n%!" terminals.(token) startp.Lexing.pos_cnum endp.Lexing.pos_cnum | None -> () let initiating_error_handling () = match T.trace with | Some _ -> fprintf stderr "Initiating error handling\n%!" | None -> () let resuming_error_handling () = match T.trace with | Some _ -> fprintf stderr "Resuming error handling\n%!" | None -> () let handling_error state = match T.trace with | Some _ -> fprintf stderr "Handling error in state %d\n%!" state | None -> () end end menhir-20210929/lib/TableInterpreter.mli000066400000000000000000000032531412503066000177720ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU Library General Public License version 2, with a *) (* special exception on linking, as described in the file LICENSE. *) (* *) (******************************************************************************) (* This module provides a thin decoding layer for the generated tables, thus providing an API that is suitable for use by [Engine.Make]. It is part of [MenhirLib]. *) (* The exception [Error] is declared within the generated parser. This is preferable to pre-declaring it here, as it ensures that each parser gets its own, distinct [Error] exception. This is consistent with the code-based back-end. *) (* This functor is invoked by the generated parser. *) module MakeEngineTable (T : TableFormat.TABLES) : EngineTypes.TABLE with type state = int and type token = T.token and type semantic_value = Obj.t and type production = int and type terminal = int and type nonterminal = int menhir-20210929/lib/dune000066400000000000000000000015121412503066000146660ustar00rootroot00000000000000;; Note: the library MenhirLib is not built here; it is built in lib/pack. ;; These rules generate the module [StaticVersion]. This module defines a ;; value of type [unit] whose name is [require_XXXXXXXX], where [XXXXXXXX] ;; is our 8-digit version number. This number is set in the [dune-project] ;; file. ;; When the [--table] switch is passed, Menhir produces a reference to ;; [MenhirLib.StaticVersion.require_XXXXXXXX] in the generated code. This ;; ensures that the generated code can be linked only with an appropriate ;; version of MenhirLib. This is important because we use unsafe casts: a ;; version mismatch could cause a crash. (rule (with-stdout-to StaticVersion.mli (echo "val require_%{version:menhir}: unit\n") ) ) (rule (with-stdout-to StaticVersion.ml (echo "let require_%{version:menhir} = ()\n") ) ) menhir-20210929/lib/pack/000077500000000000000000000000001412503066000147275ustar00rootroot00000000000000menhir-20210929/lib/pack/dune000066400000000000000000000010001412503066000155740ustar00rootroot00000000000000;; The helper script pack.ml generates menhirLib.{ml,mli} ;; by concatenating the modules listed in menhirLib.mlpack. 
(executable (name pack) (modules pack) ) (rule (targets menhirLib.ml menhirLib.mli) (deps (glob_files ../*.{ml,mli}) menhirLib.mlpack) (action (run ./pack.exe)) ) ;; We can then compile MenhirLib from menhirLib.{ml,mli} ;; in this directory. (library (name menhirLib) (public_name menhirLib) (synopsis "Runtime support for code generated by Menhir") (modules menhirLib) ) menhir-20210929/lib/pack/menhirLib.mlpack000066400000000000000000000006331412503066000200330ustar00rootroot00000000000000# This is the list of modules that must go into MenhirLib. # They must be listed in dependency order, as this list is # used to construct menhirLib.ml at installation time. General Convert IncrementalEngine EngineTypes Engine ErrorReports LexerUtil Printers InfiniteArray PackedIntArray RowDisplacement LinearizedArray TableFormat InspectionTableFormat InspectionTableInterpreter TableInterpreter StaticVersion menhir-20210929/lib/pack/pack.ml000066400000000000000000000052731412503066000162060ustar00rootroot00000000000000(* This script finds the names of the modules in MenhirLib by reading the file menhirLib.mlpack. It then finds the source files for these modules in the parent directory (lib/), and concatenates them to create menhirLib.{ml,mli} in the current directory (lib/pack). *) (* ------------------------------------------------------------------------- *) (* [up fn] is [../fn]. *) let up fn = Filename.concat Filename.parent_dir_name fn (* ------------------------------------------------------------------------- *) (* [cat_file oc fn] prints the content of the file [fn] on the channel [oc]. *) let cat_file oc fn = let ic = open_in fn in let rec loop () = match input_line ic with | s -> output_string oc s; output_char oc '\n'; loop () | exception End_of_file -> () in loop (); close_in ic (* ------------------------------------------------------------------------- *) (* The names of the modules in MenhirLib are obtained by reading the non-comment lines in the file menhirLib.mlpack. *) let menhirLib_modules : string list = let ic = open_in "menhirLib.mlpack" in let rec loop accu = match input_line ic with | exception End_of_file -> List.rev accu | s -> let s = String.trim s in let accu = if s <> "" && s.[0] <> '#' then s :: accu else accu in loop accu in let r = loop [] in close_in ic; r (* ------------------------------------------------------------------------- *) (* The source file menhirLib.ml is created by concatenating all of the source files that make up MenhirLib. This file is not needed to compile Menhir or MenhirLib. It is installed at the same time as MenhirLib and is copied by Menhir when the user requests a self-contained parser (one that is not dependent on MenhirLib). *) let () = print_endline "Creating menhirLib.ml..."; let oc = open_out "menhirLib.ml" in List.iter (fun m -> Printf.fprintf oc "module %s = struct\n" m; cat_file oc (up (m ^ ".ml")); Printf.fprintf oc "end\n" ) menhirLib_modules; close_out oc (* The source file menhirLib.mli is created in the same way. If a module does not have an .mli file, then we assume that its .ml file contains type (and module type) definitions only, so we copy it instead of the (non-existent) .mli file. 
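For instance (an illustrative sketch; the module name Foo is hypothetical), a module Foo that does have an interface file contributes a chunk of the following shape to the generated menhirLib.mli: module Foo : sig (* contents of ../Foo.mli *) end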
*) let () = print_endline "Creating menhirLib.mli..."; let oc = open_out "menhirLib.mli" in List.iter (fun m -> Printf.fprintf oc "module %s : sig\n" m; if Sys.file_exists (up (m ^ ".mli")) then cat_file oc (up (m ^ ".mli")) else cat_file oc (up (m ^ ".ml")); Printf.fprintf oc "end\n" ) menhirLib_modules; close_out oc menhir-20210929/pprint/000077500000000000000000000000001412503066000145575ustar00rootroot00000000000000menhir-20210929/pprint/.exclude000066400000000000000000000000471412503066000162120ustar00rootroot00000000000000OldPPrintEngine.* size.* PPrintBench.* menhir-20210929/pprint/AUTHORS000066400000000000000000000002231412503066000156240ustar00rootroot00000000000000PPrint was written by François Pottier and Nicolas Pouillard, with contributions by Yann Régis-Gianas, Gabriel Scherer, and Jonathan Protzenko. menhir-20210929/pprint/CHANGES.md000066400000000000000000000046471412503066000161640ustar00rootroot00000000000000# Changes ## 2020/04/10 * New function `PPrint.utf8format`. ## 2020/03/16 * New functions `PPrint.OCaml.flowing_list` and `PPrint.OCaml.flowing_array`. ## 2020/02/26 * Change the behavior of `PPrint.ToFormatter` to use `Format.pp_print_text` internally. This means that a newline character causes a call to `Format.pp_force_newline`; a space character causes a call to `Format.pp_print_space`; and every other character is printed using `Format.pp_print_char`. * Switch to `dune`. * Avoid a few compilation warnings. ## 2018/05/23 * Add a `line` field to the `state` record, which can be read by the code that implements custom documents. Add a `range` combinator that allows retrieving the start and end points of a (sub)document in the output. (Suggested by Victor Gomes.) ## 2017/10/03 * Update the code and build options to use `-safe-string`. This means that the library now requires OCaml 4.02 or later, and is compatible with 4.06. ## 2015/03/16 * Moved to github and changed the license to LGPL with an exception. ## 2014/04/25 * Minor changes in the implementation of `string` and `substring`. Initially committed on 2014/03/24, but left out of the 20140424 release due to a goof-up. ## 2014/04/11 * Changed the behavior of `align`, which was not consistent with its documentation. `align` now sets the indentation level to the current column. In particular, this means that `align (align d)` is equivalent to `align d`, which was not the case previously. Thanks to Dmitry Grebeniuk for reporting this issue. ## 2014/04/03 * The library is now extensible (in principle). A `custom` document constructor allows the user to define her own documents, as long as they fit the manner in which the current rendering engine works. * The `compact` rendering engine is now tail-recursive too. ## 2014/03/21 * Minor optimisation in the smart constructor `group`. ## 2014/03/13 * New (simpler) pretty-printing engine. The representation of documents in memory is slightly larger; document construction is perhaps slightly slower, while rendering is significantly faster. (Construction dominates rendering.) The rendering speed is now guaranteed to be independent of the width parameter. The price to pay for this simplification is that the primitive document constructors `column` and `nesting` are no longer supported. The API is otherwise unchanged. ## 2013/01/31 * First official release of PPrint. menhir-20210929/pprint/LICENSE000066400000000000000000000636661412503066000156050ustar00rootroot00000000000000In the following, "the Library" refers to the OCaml source files that form the PPrint library. 
The names of these files match the pattern PPrint*.{ml,mli}. The Library is distributed under the terms of the GNU Library General Public License version 2 (included below). As a special exception to the GNU Library General Public License, you may link, statically or dynamically, a "work that uses the Library" with a publicly distributed version of the Library to produce an executable file containing portions of the Library, and distribute that executable file under terms of your choice, without any of the additional requirements listed in clause 6 of the GNU Library General Public License. By "a publicly distributed version of the Library", we mean either the unmodified Library as distributed by INRIA, or a modified version of the Library that is distributed under the conditions defined in clause 2 of the GNU Library General Public License. This exception does not however invalidate any other reasons why the executable file might be covered by the GNU Library General Public License. ---------------------------------------------------------------------- GNU LIBRARY GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1991 Free Software Foundation, Inc. 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. [This is the first released version of the library GPL. It is numbered 2 because it goes with version 2 of the ordinary GPL.] Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This license, the Library General Public License, applies to some specially designated Free Software Foundation software, and to any other libraries whose authors decide to use it. You can use it for your libraries, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library, or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link a program with the library, you must provide complete object files to the recipients so that they can relink them with the library, after making changes to the library and recompiling it. And you must show them these terms so they know their rights. Our method of protecting your rights has two steps: (1) copyright the library, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the library. Also, for each distributor's protection, we want to make certain that everyone understands that there is no warranty for this free library. 
If the library is modified by someone else and passed on, we want its recipients to know that what they have is not the original version, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that companies distributing free software will individually obtain patent licenses, thus in effect transforming the program into proprietary software. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License, which was designed for utility programs. This license, the GNU Library General Public License, applies to certain designated libraries. This license is quite different from the ordinary one; be sure to read it in full, and don't assume that anything in it is the same as in the ordinary license. The reason we have a separate public license for some libraries is that they blur the distinction we usually make between modifying or adding to a program and simply using it. Linking a program with a library, without changing the library, is in some sense simply using the library, and is analogous to running a utility program or application program. However, in a textual and legal sense, the linked executable is a combined work, a derivative of the original library, and the ordinary General Public License treats it as such. Because of this blurred distinction, using the ordinary General Public License for libraries did not effectively promote software sharing, because most developers did not use the libraries. We concluded that weaker conditions might promote sharing better. However, unrestricted linking of non-free programs would deprive the users of those programs of all benefit from the free status of the libraries themselves. This Library General Public License is intended to permit developers of non-free programs to use free libraries, while preserving your freedom as a user of such programs to change the free libraries that are incorporated in them. (We have not seen how to achieve this as regards changes in header files, but we have achieved it as regards changes in the actual functions of the Library.) The hope is that this will lead to faster development of free libraries. The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, while the latter only works together with the library. Note that it is possible for a library to be covered by the ordinary General Public License rather than by this special one. GNU LIBRARY GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Library General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. The "Library", below, refers to any such software library or work which has been distributed under these terms. 
A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. 
But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. 
If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. As an exception to the Sections above, you may also compile or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. c) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. d) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. 
You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. 
Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. The Free Software Foundation may publish revised and/or new versions of the Library General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 
END OF TERMS AND CONDITIONS Appendix: How to Apply These Terms to Your New Libraries If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This library is free software; you can redistribute it and/or modify it under the terms of the GNU Library General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public License for more details. You should have received a copy of the GNU Library General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA Also add information on how to contact you by electronic and paper mail. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker. , 1 April 1990 Ty Coon, President of Vice That's all there is to it! menhir-20210929/pprint/dune-project000066400000000000000000000000201412503066000170710ustar00rootroot00000000000000(lang dune 1.3) menhir-20210929/pprint/src/000077500000000000000000000000001412503066000153465ustar00rootroot00000000000000menhir-20210929/pprint/src/PPrintCombinators.ml000066400000000000000000000203101412503066000213110ustar00rootroot00000000000000(**************************************************************************) (* *) (* PPrint *) (* *) (* François Pottier, Inria Paris *) (* Nicolas Pouillard *) (* *) (* Copyright 2007-2019 Inria. All rights reserved. This file is *) (* distributed under the terms of the GNU Library General Public *) (* License, with an exception, as described in the file LICENSE. *) (**************************************************************************) open PPrintEngine (* ------------------------------------------------------------------------- *) (* Predefined single-character documents. *) let lparen = char '(' let rparen = char ')' let langle = char '<' let rangle = char '>' let lbrace = char '{' let rbrace = char '}' let lbracket = char '[' let rbracket = char ']' let squote = char '\'' let dquote = char '"' let bquote = char '`' let semi = char ';' let colon = char ':' let comma = char ',' let space = char ' ' let dot = char '.' let sharp = char '#' let slash = char '/' let backslash = char '\\' let equals = char '=' let qmark = char '?' let tilde = char '~' let at = char '@' let percent = char '%' let dollar = char '$' let caret = char '^' let ampersand = char '&' let star = char '*' let plus = char '+' let minus = char '-' let underscore = char '_' let bang = char '!' 
let bar = char '|' (* ------------------------------------------------------------------------- *) (* Repetition. *) let twice doc = doc ^^ doc let repeat n doc = let rec loop n doc accu = if n = 0 then accu else loop (n - 1) doc (doc ^^ accu) in loop n doc empty (* ------------------------------------------------------------------------- *) (* Delimiters. *) let precede l x = l ^^ x let terminate r x = x ^^ r let enclose l r x = l ^^ x ^^ r let squotes = enclose squote squote let dquotes = enclose dquote dquote let bquotes = enclose bquote bquote let braces = enclose lbrace rbrace let parens = enclose lparen rparen let angles = enclose langle rangle let brackets = enclose lbracket rbracket (* ------------------------------------------------------------------------- *) (* Some functions on lists. *) (* A variant of [fold_left] that keeps track of the element index. *) let foldli (f : int -> 'b -> 'a -> 'b) (accu : 'b) (xs : 'a list) : 'b = let r = ref 0 in List.fold_left (fun accu x -> let i = !r in r := i + 1; f i accu x ) accu xs (* ------------------------------------------------------------------------- *) (* Working with lists of documents. *) let concat docs = (* We take advantage of the fact that [^^] operates in constant time, regardless of the size of its arguments. The document that is constructed is essentially a reversed list (i.e., a tree that is biased towards the left). This is not a problem; when pretty-printing this document, the engine will descend along the left branch, pushing the nodes onto its stack as it goes down, effectively reversing the list again. *) List.fold_left (^^) empty docs let separate sep docs = foldli (fun i accu doc -> if i = 0 then doc else accu ^^ sep ^^ doc ) empty docs let concat_map f xs = List.fold_left (fun accu x -> accu ^^ f x ) empty xs let separate_map sep f xs = foldli (fun i accu x -> if i = 0 then f x else accu ^^ sep ^^ f x ) empty xs let separate2 sep last_sep docs = let n = List.length docs in foldli (fun i accu doc -> if i = 0 then doc else accu ^^ (if i < n - 1 then sep else last_sep) ^^ doc ) empty docs let optional f = function | None -> empty | Some x -> f x (* ------------------------------------------------------------------------- *) (* Text. *) (* This variant of [String.index_from] returns an option. *) let index_from s i c = try Some (String.index_from s i c) with Not_found -> None (* [lines s] chops the string [s] into a list of lines, which are turned into documents. *) let lines s = let rec chop accu i = match index_from s i '\n' with | Some j -> let accu = substring s i (j - i) :: accu in chop accu (j + 1) | None -> substring s i (String.length s - i) :: accu in List.rev (chop [] 0) let arbitrary_string s = separate (break 1) (lines s) (* [split ok s] splits the string [s] at every occurrence of a character that satisfies the predicate [ok]. The substrings thus obtained are turned into documents, and a list of documents is returned. No information is lost: the concatenation of the documents yields the original string. This code is not UTF-8 aware. *) let split ok s = let n = String.length s in let rec index_from i = if i = n then None else if ok s.[i] then Some i else index_from (i + 1) in let rec chop accu i = match index_from i with | Some j -> let accu = substring s i (j - i) :: accu in let accu = char s.[j] :: accu in chop accu (j + 1) | None -> substring s i (String.length s - i) :: accu in List.rev (chop [] 0) (* [words s] chops the string [s] into a list of words, which are turned into documents. 
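For example (an illustrative sketch of the intended behaviour), [words "hello,\tpretty\n world"] yields three documents that render as "hello,", "pretty", and "world"; the tab, newline, and space characters are all discarded.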
*) let words s = let n = String.length s in (* A two-state finite automaton. *) (* In this state, we have skipped at least one blank character. *) let rec skipping accu i = if i = n then (* There was whitespace at the end. Drop it. *) accu else match s.[i] with | ' ' | '\t' | '\n' | '\r' -> (* Skip more whitespace. *) skipping accu (i + 1) | _ -> (* Begin a new word. *) word accu i (i + 1) (* In this state, we have skipped at least one non-blank character. *) and word accu i j = if j = n then (* Final word. *) substring s i (j - i) :: accu else match s.[j] with | ' ' | '\t' | '\n' | '\r' -> (* A new word has been identified. *) let accu = substring s i (j - i) :: accu in skipping accu (j + 1) | _ -> (* Continue inside the current word. *) word accu i (j + 1) in List.rev (skipping [] 0) let flow_map sep f docs = foldli (fun i accu doc -> if i = 0 then f doc else accu ^^ (* This idiom allows beginning a new line if [doc] does not fit on the current line. *) group (sep ^^ f doc) ) empty docs let flow sep docs = flow_map sep (fun x -> x) docs let url s = flow (break 0) (split (function '/' | '.' -> true | _ -> false) s) (* ------------------------------------------------------------------------- *) (* Alignment and indentation. *) let hang i d = align (nest i d) let ( !^ ) = string let ( ^/^ ) x y = x ^^ break 1 ^^ y let prefix n b x y = group (x ^^ nest n (break b ^^ y)) let (^//^) = prefix 2 1 let jump n b y = group (nest n (break b ^^ y)) (* Deprecated. let ( ^@^ ) x y = group (x ^/^ y) let ( ^@@^ ) x y = group (nest 2 (x ^/^ y)) *) let infix n b op x y = prefix n b (x ^^ blank b ^^ op) y let surround n b opening contents closing = group (opening ^^ nest n ( break b ^^ contents) ^^ break b ^^ closing ) let soft_surround n b opening contents closing = group (opening ^^ nest n (group (break b) ^^ contents) ^^ group (break b ^^ closing)) let surround_separate n b void opening sep closing docs = match docs with | [] -> void | _ :: _ -> surround n b opening (separate sep docs) closing let surround_separate_map n b void opening sep closing f xs = match xs with | [] -> void | _ :: _ -> surround n b opening (separate_map sep f xs) closing menhir-20210929/pprint/src/PPrintCombinators.mli000066400000000000000000000213251412503066000214710ustar00rootroot00000000000000(**************************************************************************) (* *) (* PPrint *) (* *) (* François Pottier, Inria Paris *) (* Nicolas Pouillard *) (* *) (* Copyright 2007-2019 Inria. All rights reserved. This file is *) (* distributed under the terms of the GNU Library General Public *) (* License, with an exception, as described in the file LICENSE. *) (**************************************************************************) open PPrintEngine (** A set of high-level combinators for building documents. *) (** {1 Single characters} *) (** The following constant documents consist of a single character. 
*) val lparen: document val rparen: document val langle: document val rangle: document val lbrace: document val rbrace: document val lbracket: document val rbracket: document val squote: document val dquote: document val bquote: document val semi: document val colon: document val comma: document val space: document val dot: document val sharp: document val slash: document val backslash: document val equals: document val qmark: document val tilde: document val at: document val percent: document val dollar: document val caret: document val ampersand: document val star: document val plus: document val minus: document val underscore: document val bang: document val bar: document (** {1 Delimiters} *) (** [precede l x] is [l ^^ x]. *) val precede: document -> document -> document (** [terminate r x] is [x ^^ r]. *) val terminate: document -> document -> document (** [enclose l r x] is [l ^^ x ^^ r]. *) val enclose: document -> document -> document -> document (** The following combinators enclose a document within a pair of delimiters. They are partial applications of [enclose]. No whitespace or line break is introduced. *) val squotes: document -> document val dquotes: document -> document val bquotes: document -> document val braces: document -> document val parens: document -> document val angles: document -> document val brackets: document -> document (** {1 Repetition} *) (** [twice doc] is the document obtained by concatenating two copies of the document [doc]. *) val twice: document -> document (** [repeat n doc] is the document obtained by concatenating [n] copies of the document [doc]. *) val repeat: int -> document -> document (** {1 Lists and options} *) (** [concat docs] is the concatenation of the documents in the list [docs]. *) val concat: document list -> document (** [separate sep docs] is the concatenation of the documents in the list [docs]. The separator [sep] is inserted between every two adjacent documents. *) val separate: document -> document list -> document (** [concat_map f xs] is equivalent to [concat (List.map f xs)]. *) val concat_map: ('a -> document) -> 'a list -> document (** [separate_map sep f xs] is equivalent to [separate sep (List.map f xs)]. *) val separate_map: document -> ('a -> document) -> 'a list -> document (** [separate2 sep last_sep docs] is the concatenation of the documents in the list [docs]. The separator [sep] is inserted between every two adjacent documents, except between the last two documents, where the separator [last_sep] is used instead. *) val separate2: document -> document -> document list -> document (** [optional f None] is the empty document. [optional f (Some x)] is the document [f x]. *) val optional: ('a -> document) -> 'a option -> document (** {1 Text} *) (** [lines s] is the list of documents obtained by splitting [s] at newline characters, and turning each line into a document via [substring]. This code is not UTF-8 aware. *) val lines: string -> document list (** [arbitrary_string s] is equivalent to [separate (break 1) (lines s)]. It is analogous to [string s], but is valid even if the string [s] contains newline characters. *) val arbitrary_string: string -> document (** [words s] is the list of documents obtained by splitting [s] at whitespace characters, and turning each word into a document via [substring]. All whitespace is discarded. This code is not UTF-8 aware. 
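For example (an illustrative sketch), {[
words "hello,  pretty world"
]}
is a list of three documents, which render as [hello,], [pretty], and [world], respectively.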
*) val words: string -> document list (** [split ok s] splits the string [s] before and after every occurrence of a character that satisfies the predicate [ok]. The substrings thus obtained are turned into documents, and a list of documents is returned. No information is lost: the concatenation of the documents yields the original string. This code is not UTF-8 aware. *) val split: (char -> bool) -> string -> document list (** [flow sep docs] separates the documents in the list [docs] with the separator [sep] and arranges for a new line to begin whenever a document does not fit on the current line. This is useful for typesetting free-flowing, ragged-right text. A typical choice of [sep] is [break b], where [b] is the number of spaces that must be inserted between two consecutive words (when displayed on the same line). *) val flow: document -> document list -> document (** [flow_map sep f docs] is equivalent to [flow sep (List.map f docs)]. *) val flow_map: document -> ('a -> document) -> 'a list -> document (** [url s] is a possible way of displaying the URL [s]. A potential line break is inserted immediately before and immediately after every slash and dot character. *) val url: string -> document (** {1 Alignment and indentation} *) (* [hang n doc] is analogous to [align], but additionally indents all lines, except the first one, by [n]. Thus, the text in the box forms a hanging indent. *) val hang: int -> document -> document (** [prefix n b left right] has the following flat layout: {[
left right
]}
and the following non-flat layout: {[
left
  right
]}
The parameter [n] controls the nesting of [right] (when not flat). The parameter [b] controls the number of spaces between [left] and [right] (when flat). *) val prefix: int -> int -> document -> document -> document (** [jump n b right] is equivalent to [prefix n b empty right]. *) val jump: int -> int -> document -> document (** [infix n b middle left right] has the following flat layout: {[
left middle right
]}
and the following non-flat layout: {[
left middle
  right
]}
The parameter [n] controls the nesting of [right] (when not flat). The parameter [b] controls the number of spaces between [left] and [middle] (always) and between [middle] and [right] (when flat). *) val infix: int -> int -> document -> document -> document -> document (** [surround n b opening contents closing] has the following flat layout: {[
opening contents closing
]}
and the following non-flat layout: {[
opening
  contents
closing
]}
The parameter [n] controls the nesting of [contents] (when not flat). The parameter [b] controls the number of spaces between [opening] and [contents] and between [contents] and [closing] (when flat). *) val surround: int -> int -> document -> document -> document -> document (** [soft_surround] is analogous to [surround], but involves more than one group, so it offers possibilities other than the completely flat layout (where [opening], [contents], and [closing] appear on a single line) and the completely developed layout (where [opening], [contents], and [closing] appear on separate lines). It tries to place the beginning of [contents] on the same line as [opening], and to place [closing] on the same line as the end of [contents], if possible. *) val soft_surround: int -> int -> document -> document -> document -> document (** [surround_separate n b void opening sep closing docs] is equivalent to [surround n b opening (separate sep docs) closing], except when the list [docs] is empty, in which case it reduces to [void].
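For example (an illustrative sketch, where [a], [b], and [c] stand for arbitrary documents), {[
surround_separate 2 1 (braces empty) lbrace (semi ^^ break 1) rbrace [a; b; c]
]}
has the flat layout {[
{ a; b; c }
]}
and, when not flat, places [a], [b], and [c] on separate lines, indented by two columns, between [{] and [}].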
*) val surround_separate: int -> int -> document -> document -> document -> document -> document list -> document (** [surround_separate_map n b void opening sep closing f xs] is equivalent to [surround_separate n b void opening sep closing (List.map f xs)]. *) val surround_separate_map: int -> int -> document -> document -> document -> document -> ('a -> document) -> 'a list -> document (** {1 Short-hands} *) (** [!^s] is a short-hand for [string s]. *) val ( !^ ) : string -> document (** [x ^/^ y] separates [x] and [y] with a breakable space. It is a short-hand for [x ^^ break 1 ^^ y]. *) val ( ^/^ ) : document -> document -> document (** [x ^//^ y] is a short-hand for [prefix 2 1 x y]. *) val ( ^//^ ) : document -> document -> document menhir-20210929/pprint/src/PPrintEngine.ml000066400000000000000000000533721412503066000202540ustar00rootroot00000000000000(**************************************************************************) (* *) (* PPrint *) (* *) (* François Pottier, Inria Paris *) (* Nicolas Pouillard *) (* *) (* Copyright 2007-2019 Inria. All rights reserved. This file is *) (* distributed under the terms of the GNU Library General Public *) (* License, with an exception, as described in the file LICENSE. *) (**************************************************************************) (** A point is a pair of a line number and a column number. *) type point = int * int (** A range is a pair of points. *) type range = point * point (* ------------------------------------------------------------------------- *) (* A type of integers with infinity. *) type requirement = int (* with infinity *) (* Infinity is encoded as [max_int]. *) let infinity : requirement = max_int (* Addition of integers with infinity. *) let (++) (x : requirement) (y : requirement) : requirement = if x = infinity || y = infinity then infinity else x + y (* Comparison between an integer with infinity and a normal integer. *) let (<==) (x : requirement) (y : int) = x <= y (* ------------------------------------------------------------------------- *) (* A uniform interface for output channels. *) class type output = object (** [char c] sends the character [c] to the output channel. *) method char: char -> unit (** [substring s ofs len] sends the substring of [s] delimited by the offset [ofs] and the length [len] to the output channel. *) method substring: string -> int (* offset *) -> int (* length *) -> unit end (* Three kinds of output channels are wrapped so as to satisfy the above interface: OCaml output channels, OCaml memory buffers, and OCaml formatters. *) class channel_output channel = object method char = output_char channel method substring = output_substring channel (* We used to use [output], but, as of OCaml 4.02 and with -safe-string enabled, the type of [output] has changed: this function now expects an argument of type [bytes]. The new function [output_substring] must be used instead. Furthermore, as of OCaml 4.06, -safe-string is enabled by default. In summary, we require OCaml 4.02, use [output_substring], and enable -safe-string. 
*) end class buffer_output buffer = object method char = Buffer.add_char buffer method substring = Buffer.add_substring buffer end class formatter_output fmt = object method char = function | '\n' -> Format.pp_force_newline fmt () | ' ' -> Format.pp_print_space fmt () | c -> Format.pp_print_char fmt c method substring str ofs len = Format.pp_print_text fmt ( if ofs = 0 && len = String.length str then str else String.sub str ofs len ) end (* ------------------------------------------------------------------------- *) (** The rendering engine maintains the following internal state. Its structure is subject to change in future versions of the library. Nevertheless, it is exposed to the user who wishes to define custom documents. *) type state = { width: int; (** The line width. This parameter is fixed throughout the execution of the renderer. *) ribbon: int; (** The ribbon width. This parameter is fixed throughout the execution of the renderer. *) mutable last_indent: int; (** The number of blanks that were printed at the beginning of the current line. This field is updated (only) when a hardline is emitted. It is used (only) to determine whether the ribbon width constraint is respected. *) mutable line: int; (** The current line. This field is updated (only) when a hardline is emitted. It is not used by the pretty-printing engine itself. *) mutable column: int; (** The current column. This field must be updated whenever something is sent to the output channel. It is used (only) to determine whether the width constraint is respected. *) } (* ------------------------------------------------------------------------- *) (* [initial rfrac width] creates a fresh initial state. *) let initial rfrac width = { width = width; ribbon = max 0 (min width (truncate (float_of_int width *. rfrac))); last_indent = 0; line = 0; column = 0 } (* ------------------------------------------------------------------------- *) (** A custom document is defined by implementing the following methods. *) class type custom = object (** A custom document must publish the width (i.e., the number of columns) that it would like to occupy if it is printed on a single line (that is, in flattening mode). The special value [infinity] means that this document cannot be printed on a single line; this value causes any groups that contain this document to be dissolved. This method should in principle work in constant time. *) method requirement: requirement (** The method [pretty] is used by the main rendering algorithm. It has access to the output channel and to the algorithm's internal state, as described above. In addition, it receives the current indentation level and the current flattening mode (on or off). If flattening mode is on, then the document must be printed on a single line, in a manner that is consistent with the requirement that was published ahead of time. If flattening mode is off, then there is no such obligation. The state must be updated in a manner that is consistent with what is sent to the output channel. *) method pretty: output -> state -> int -> bool -> unit (** The method [compact] is used by the compact rendering algorithm. It has access to the output channel only. *) method compact: output -> unit end (* ------------------------------------------------------------------------- *) (* Here is the algebraic data type of documents. 
It is analogous to Daan Leijen's version, but the binary constructor [Union] is replaced with the unary constructor [Group], and the constant [Line] is replaced with more general constructions, namely [IfFlat], which provides alternative forms depending on the current flattening mode, and [HardLine], which represents a newline character, and causes a failure in flattening mode. *) type document = (* [Empty] is the empty document. *) | Empty (* [Char c] is a document that consists of the single character [c]. We enforce the invariant that [c] is not a newline character. *) | Char of char (* [String s] is a document that consists of just the string [s]. We assume, but do not check, that this string does not contain a newline character. [String] is a special case of [FancyString], which takes up less space in memory. *) | String of string (* [FancyString (s, ofs, len, apparent_length)] is a (portion of a) string that may contain fancy characters: color escape characters, UTF-8 or multi-byte characters, etc. Thus, the apparent length (which corresponds to what will be visible on screen) differs from the length (which is a number of bytes, and is reported by [String.length]). We assume, but do not check, that fancystrings do not contain a newline character. *) | FancyString of string * int * int * int (* [Blank n] is a document that consists of [n] blank characters. *) | Blank of int (* When in flattening mode, [IfFlat (d1, d2)] turns into the document [d1]. When not in flattening mode, it turns into the document [d2]. *) | IfFlat of document * document (* When in flattening mode, [HardLine] causes a failure, which requires backtracking all the way until the stack is empty. When not in flattening mode, it represents a newline character, followed with an appropriate number of indentation. A common way of using [HardLine] is to only use it directly within the right branch of an [IfFlat] construct. *) | HardLine (* The following constructors store their space requirement. This is the document's apparent length, if printed in flattening mode. This information is computed in a bottom-up manner when the document is constructed. *) (* In other words, the space requirement is the number of columns that the document needs in order to fit on a single line. We express this value in the set of `integers extended with infinity', and use the value [infinity] to indicate that the document cannot be printed on a single line. *) (* Storing this information at [Group] nodes is crucial, as it allows us to avoid backtracking and buffering. *) (* Storing this information at other nodes allows the function [requirement] to operate in constant time. This means that the bottom-up computation of requirements takes linear time. *) (* [Cat (req, doc1, doc2)] is the concatenation of the documents [doc1] and [doc2]. The space requirement [req] is the sum of the requirements of [doc1] and [doc2]. *) | Cat of requirement * document * document (* [Nest (req, j, doc)] is the document [doc], in which the indentation level has been increased by [j], that is, in which [j] blanks have been inserted after every newline character. The space requirement [req] is the same as the requirement of [doc]. *) | Nest of requirement * int * document (* [Group (req, doc)] represents an alternative: it is either a flattened form of [doc], in which occurrences of [Group] disappear and occurrences of [IfFlat] resolve to their left branch, or [doc] itself. The space requirement [req] is the same as the requirement of [doc]. 
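For example (an illustrative sketch), [group (string "a" ^^ break 1 ^^ string "b")] carries the requirement 3; it is rendered as "a b" when three columns are available on the current line (and the ribbon constraint is met), and as "a" and "b" on two separate lines otherwise.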
*) | Group of requirement * document (* [Align (req, doc)] increases the indentation level to reach the current column. Thus, the document [doc] is rendered within a box whose upper left corner is the current position. The space requirement [req] is the same as the requirement of [doc]. *) | Align of requirement * document (* [Range (req, hook, doc)] is printed like [doc]. After it is printed, the function [hook] is applied to the range that is occupied by [doc] in the output. *) | Range of requirement * (range -> unit) * document (* [Custom (req, f)] is a document whose appearance is user-defined. *) | Custom of custom (* ------------------------------------------------------------------------- *) (* Retrieving or computing the space requirement of a document. *) let rec requirement = function | Empty -> 0 | Char _ -> 1 | String s -> String.length s | FancyString (_, _, _, len) | Blank len -> len | IfFlat (doc1, _) -> (* In flattening mode, the requirement of [ifflat x y] is just the requirement of its flat version, [x]. *) (* The smart constructor [ifflat] ensures that [IfFlat] is never nested in the left-hand side of [IfFlat], so this recursive call is not a problem; the function [requirement] has constant time complexity. *) requirement doc1 | HardLine -> (* A hard line cannot be printed in flattening mode. *) infinity | Cat (req, _, _) | Nest (req, _, _) | Group (req, _) | Align (req, _) | Range (req, _, _) -> (* These nodes store their requirement -- which is computed when the node is constructed -- so as to allow us to answer in constant time here. *) req | Custom c -> c#requirement (* ------------------------------------------------------------------------- *) (* The above algebraic data type is not exposed to the user. Instead, we expose the following functions. These functions construct a raw document and compute its requirement, so as to obtain a document. *) let empty = Empty let char c = assert (c <> '\n'); Char c let space = char ' ' let string s = String s let fancysubstring s ofs len apparent_length = if len = 0 then empty else FancyString (s, ofs, len, apparent_length) let substring s ofs len = fancysubstring s ofs len len let fancystring s apparent_length = fancysubstring s 0 (String.length s) apparent_length (* The following function was stolen from [Batteries]. *) let utf8_length s = let rec length_aux s c i = if i >= String.length s then c else let n = Char.code (String.unsafe_get s i) in let k = if n < 0x80 then 1 else if n < 0xe0 then 2 else if n < 0xf0 then 3 else 4 in length_aux s (c + 1) (i + k) in length_aux s 0 0 let utf8string s = fancystring s (utf8_length s) let utf8format f = Printf.ksprintf utf8string f let hardline = HardLine let blank n = match n with | 0 -> empty | 1 -> space | _ -> Blank n let ifflat doc1 doc2 = (* Avoid nesting [IfFlat] in the left-hand side of [IfFlat], as this is redundant. *) match doc1 with | IfFlat (doc1, _) | doc1 -> IfFlat (doc1, doc2) let internal_break i = ifflat (blank i) hardline let break0 = internal_break 0 let break1 = internal_break 1 let break i = match i with | 0 -> break0 | 1 -> break1 | _ -> internal_break i let (^^) x y = match x, y with | Empty, _ -> y | _, Empty -> x | _, _ -> Cat (requirement x ++ requirement y, x, y) let nest i x = assert (i >= 0); Nest (requirement x, i, x) let group x = let req = requirement x in (* Minor optimisation: an infinite requirement dissolves a group. 
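Indeed, a group whose requirement is [infinity] can never be rendered in flattening mode: the width test in the [Group] case of [pretty], below, always fails for it. Building the [Group] node would therefore serve no purpose.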
*) if req = infinity then x else Group (req, x) let align x = Align (requirement x, x) let range hook x = Range (requirement x, hook, x) let custom c = (* Sanity check. *) assert (c#requirement >= 0); Custom c (* ------------------------------------------------------------------------- *) (* Printing blank space (indentation characters). *) let blank_length = 80 let blank_buffer = String.make blank_length ' ' let rec blanks output n = if n <= 0 then () else if n <= blank_length then output#substring blank_buffer 0 n else begin output#substring blank_buffer 0 blank_length; blanks output (n - blank_length) end (* ------------------------------------------------------------------------- *) (* This function expresses the following invariant: if we are in flattening mode, then we must be within bounds, i.e., the width and ribbon width constraints must be respected. *) let ok state flatten : bool = not flatten || state.column <= state.width && state.column <= state.last_indent + state.ribbon (* ------------------------------------------------------------------------- *) (* The pretty rendering engine. *) (* The renderer is supposed to behave exactly like Daan Leijen's, although its implementation is quite radically different, and simpler. Our documents are constructed eagerly, as opposed to lazily. This means that we pay a large space overhead, but in return, we gain the ability to compute information bottom-up, as described above, which allows us to render documents without backtracking or buffering. *) (* The [state] record is never copied; it is just threaded through. In addition to it, the parameters [indent] and [flatten] influence the manner in which the document is rendered. *) (* The code is written in tail-recursive style, so as to avoid running out of stack space if the document is very deep. Each [KCons] cell in a continuation represents a pending call to [pretty]. Each [KRange] cell represents a pending call to a user-provided range hook. *) type cont = | KNil | KCons of int * bool * document * cont | KRange of (range -> unit) * point * cont let rec pretty (output : output) (state : state) (indent : int) (flatten : bool) (doc : document) (cont : cont) : unit = match doc with | Empty -> continue output state cont | Char c -> output#char c; state.column <- state.column + 1; (* assert (ok state flatten); *) continue output state cont | String s -> let len = String.length s in output#substring s 0 len; state.column <- state.column + len; (* assert (ok state flatten); *) continue output state cont | FancyString (s, ofs, len, apparent_length) -> output#substring s ofs len; state.column <- state.column + apparent_length; (* assert (ok state flatten); *) continue output state cont | Blank n -> blanks output n; state.column <- state.column + n; (* assert (ok state flatten); *) continue output state cont | HardLine -> (* We cannot be in flattening mode, because a hard line has an [infinity] requirement, and we attempt to render a group in flattening mode only if this group's requirement is met. *) assert (not flatten); (* Emit a hardline. *) output#char '\n'; blanks output indent; state.line <- state.line + 1; state.column <- indent; state.last_indent <- indent; (* Continue. *) continue output state cont | IfFlat (doc1, doc2) -> (* Pick an appropriate sub-document, based on the current flattening mode. *) pretty output state indent flatten (if flatten then doc1 else doc2) cont | Cat (_, doc1, doc2) -> (* Push the second document onto the continuation.
*) pretty output state indent flatten doc1 (KCons (indent, flatten, doc2, cont)) | Nest (_, j, doc) -> pretty output state (indent + j) flatten doc cont | Group (req, doc) -> (* If we already are in flattening mode, stay in flattening mode; we are committed to it. If we are not already in flattening mode, we have a choice of entering flattening mode. We enter this mode only if we know that this group fits on this line without violating the width or ribbon width constraints. Thus, we never backtrack. *) let flatten = flatten || let column = state.column ++ req in column <== state.width && column <== state.last_indent + state.ribbon in pretty output state indent flatten doc cont | Align (_, doc) -> (* The effect of this combinator is to set [indent] to [state.column]. Usually [indent] is equal to [state.last_indent], hence setting it to [state.column] increases it. However, if [nest] has been used since the current line began, then this could cause [indent] to decrease. *) (* assert (state.column > state.last_indent); *) pretty output state state.column flatten doc cont | Range (_, hook, doc) -> let start : point = (state.line, state.column) in pretty output state indent flatten doc (KRange (hook, start, cont)) | Custom c -> (* Invoke the document's custom rendering function. *) c#pretty output state indent flatten; (* Sanity check. *) assert (ok state flatten); (* Continue. *) continue output state cont and continue output state = function | KNil -> () | KCons (indent, flatten, doc, cont) -> pretty output state indent flatten doc cont | KRange (hook, start, cont) -> let finish : point = (state.line, state.column) in hook (start, finish); continue output state cont (* Publish a version of [pretty] that does not take an explicit continuation. This function may be used by authors of custom documents. We do not expose the internal [pretty] -- the one that takes a continuation -- because we wish to simplify the user's life. The price to pay is that calls that go through a custom document cannot be tail calls. *) let pretty output state indent flatten doc = pretty output state indent flatten doc KNil (* ------------------------------------------------------------------------- *) (* The compact rendering algorithm. *) let rec compact output doc cont = match doc with | Empty -> continue output cont | Char c -> output#char c; continue output cont | String s -> let len = String.length s in output#substring s 0 len; continue output cont | FancyString (s, ofs, len, _apparent_length) -> output#substring s ofs len; continue output cont | Blank n -> blanks output n; continue output cont | HardLine -> output#char '\n'; continue output cont | Cat (_, doc1, doc2) -> compact output doc1 (doc2 :: cont) | IfFlat (doc, _) | Nest (_, _, doc) | Group (_, doc) | Align (_, doc) | Range (_, _, doc) -> compact output doc cont | Custom c -> (* Invoke the document's custom rendering function. *) c#compact output; continue output cont and continue output cont = match cont with | [] -> () | doc :: cont -> compact output doc cont let compact output doc = compact output doc [] (* ------------------------------------------------------------------------- *) (* We now instantiate the renderers for the three kinds of output channels. *) (* This is just boilerplate. 
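As an illustration (this usage sketch is not part of the library's code), once the functor has been applied, one may write [ToBuffer.pretty 0.8 76 b doc], which renders [doc] into the buffer [b] with a line width of 76 columns and a ribbon width of [truncate (0.8 *. 76.)], that is, 60 columns.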
*) module MakeRenderer (X : sig type channel val output: channel -> output end) = struct type channel = X.channel type dummy = document type document = dummy let pretty rfrac width channel doc = pretty (X.output channel) (initial rfrac width) 0 false doc let compact channel doc = compact (X.output channel) doc end module ToChannel = MakeRenderer(struct type channel = out_channel let output = new channel_output end) module ToBuffer = MakeRenderer(struct type channel = Buffer.t let output = new buffer_output end) module ToFormatter = MakeRenderer(struct type channel = Format.formatter let output = new formatter_output end) menhir-20210929/pprint/src/PPrintEngine.mli000066400000000000000000000251401412503066000204150ustar00rootroot00000000000000(**************************************************************************) (*                                                                        *) (*                                 PPrint                                 *) (*                                                                        *) (*                     François Pottier, Inria Paris                      *) (*              Nicolas Pouillard                                         *) (*                                                                        *) (*  Copyright 2007-2019 Inria. All rights reserved. This file is         *) (*  distributed under the terms of the GNU Library General Public        *) (*  License, with an exception, as described in the file LICENSE.        *) (**************************************************************************) (** A pretty-printing engine and a set of basic document combinators. *) (** {1 Building documents} *) (** Documents must be built in memory before they are rendered. This may seem costly, but it is a simple approach, and works well. *) (** The following operations form a set of basic (low-level) combinators for building documents. On top of these combinators, higher-level combinators can be defined: see {!PPrintCombinators}. *) (** This is the abstract type of documents. *) type document (** The following basic (low-level) combinators allow constructing documents. *) (** [empty] is the empty document. *) val empty: document (** [char c] is a document that consists of the single character [c]. This character must not be a newline. *) val char: char -> document (** [string s] is a document that consists of the string [s]. This string must not contain a newline. *) val string: string -> document (** [substring s ofs len] is a document that consists of the portion of the string [s] delimited by the offset [ofs] and the length [len]. This portion must not contain a newline. *) val substring: string -> int -> int -> document (** [fancystring s apparent_length] is a document that consists of the string [s]. This string must not contain a newline. The string may contain fancy characters: color escape characters, UTF-8 or multi-byte characters, etc. Thus, its apparent length (which measures how many columns the text will take up on screen) differs from its length in bytes. *) val fancystring: string -> int -> document (** [fancysubstring s ofs len apparent_length] is a document that consists of the portion of the string [s] delimited by the offset [ofs] and the length [len]. This portion must not contain a newline. The string may contain fancy characters. *) val fancysubstring : string -> int -> int -> int -> document (** [utf8string s] is a document that consists of the UTF-8-encoded string [s]. This string must not contain a newline. *) val utf8string: string -> document (** [utf8format format ...] is equivalent to [utf8string (Printf.sprintf format ...)]. *) val utf8format: ('a, unit, string, document) format4 -> 'a (** [hardline] is a forced newline document. This document forces all enclosing groups to be printed in non-flattening mode. In other words, any enclosing groups are dissolved.
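For example, [string "a" ^^ group (hardline ^^ string "b")] is always printed on two lines: because [hardline] cannot be flattened, the group around it vanishes.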
*) val hardline: document (** [blank n] is a document that consists of [n] blank characters. *) val blank: int -> document (** [break n] is a document which consists of either [n] blank characters, when forced to display on a single line, or a single newline character, otherwise. Note that there is no choice at this point: choices are encoded by the [group] combinator. *) val break: int -> document (** [doc1 ^^ doc2] is the concatenation of the documents [doc1] and [doc2]. *) val (^^): document -> document -> document (** [nest j doc] is the document [doc], in which the indentation level has been increased by [j], that is, in which [j] blanks have been inserted after every newline character. Read this again: indentation is inserted after every newline character. No indentation is inserted at the beginning of the document. *) val nest: int -> document -> document (** [group doc] encodes a choice. If possible, then the entire document [group doc] is rendered on a single line. Otherwise, the group is dissolved, and [doc] is rendered. There might be further groups within [doc], whose presence will lead to further choices being explored. *) val group: document -> document (** [ifflat doc1 doc2] is rendered as [doc1] if part of a group that can be successfully flattened, and is rendered as [doc2] otherwise. Use this operation with caution. Because the pretty-printer is free to choose between [doc1] and [doc2], these documents should be semantically equivalent. *) val ifflat: document -> document -> document (** [align doc] is the document [doc], in which the indentation level has been set to the current column. Thus, [doc] is rendered within a box whose upper left corner is the current position. *) val align: document -> document (** A point is a pair of a line number and a column number. *) type point = int * int (** A range is a pair of points. *) type range = point * point (** [range hook doc] is printed exactly like the document [doc], but allows the caller to register a hook that is applied, when the document is printed, to the range occupied by this document in the output text. This offers a way of mapping positions in the output text back to (sub)documents. *) val range: (range -> unit) -> document -> document (** {1 Rendering documents} *) (** This renderer sends its output into an output channel. *) module ToChannel : PPrintRenderer.RENDERER with type channel = out_channel and type document = document (** This renderer sends its output into a memory buffer. *) module ToBuffer : PPrintRenderer.RENDERER with type channel = Buffer.t and type document = document (** This renderer sends its output into a formatter channel. *) module ToFormatter : PPrintRenderer.RENDERER with type channel = Format.formatter and type document = document (** {1 Defining custom documents} *) (** A width requirement is expressed as an integer, where the value [max_int] is reserved and represents infinity. *) type requirement = int val infinity : requirement (** An output channel is represented abstractly as an object equipped with methods for displaying one character and for displaying a substring. *) class type output = object (** [char c] sends the character [c] to the output channel. *) method char: char -> unit (** [substring s ofs len] sends the substring of [s] delimited by the offset [ofs] and the length [len] to the output channel. *) method substring: string -> int (* offset *) -> int (* length *) -> unit end (** The rendering engine maintains the following internal state. 
Its structure is subject to change in future versions of the library. Nevertheless, it is exposed to the user who wishes to define custom documents. *) type state = { width: int; (** The line width. This parameter is fixed throughout the execution of the renderer. *) ribbon: int; (** The ribbon width. This parameter is fixed throughout the execution of the renderer. *) mutable last_indent: int; (** The number of blanks that were printed at the beginning of the current line. This field is updated (only) when a hardline is emitted. It is used (only) to determine whether the ribbon width constraint is respected. *) mutable line: int; (** The current line. This field is updated (only) when a hardline is emitted. It is not used by the pretty-printing engine itself. *) mutable column: int; (** The current column. This field must be updated whenever something is sent to the output channel. It is used (only) to determine whether the width constraint is respected. *) } (** A custom document is defined by implementing the following methods. *) class type custom = object (** A custom document must publish the width (i.e., the number of columns) that it would like to occupy if it is printed on a single line (that is, in flattening mode). The special value [infinity] means that this document cannot be printed on a single line; this value causes any groups that contain this document to be dissolved. This method should in principle work in constant time. *) method requirement: requirement (** The method [pretty] is used by the main rendering algorithm. It has access to the output channel and to the algorithm's internal state, as described above. In addition, it receives the current indentation level and the current flattening mode (on or off). If flattening mode is on, then the document must be printed on a single line, in a manner that is consistent with the requirement that was published ahead of time. If flattening mode is off, then there is no such obligation. The state must be updated in a manner that is consistent with what is sent to the output channel. *) method pretty: output -> state -> int -> bool -> unit (** The method [compact] is used by the compact rendering algorithm. It has access to the output channel only. *) method compact: output -> unit end (** The function [custom] constructs a custom document. In other words, it converts an object of type [custom] to a document. *) val custom: custom -> document (** The key functions of the library are exposed, in the hope that they may be useful to authors of custom (leaf and non-leaf) documents. In the case of a leaf document, they can help perform certain basic functions; for instance, applying the function [pretty] to the document [hardline] is a simple way of printing a hardline, while respecting the indentation parameters and updating the state in a correct manner. Similarly, applying [pretty] to the document [blank n] is a simple way of printing [n] spaces. In the case of a non-leaf document (i.e., one which contains sub-documents), these functions are essential: they allow computing the width requirement of a sub-document and displaying a sub-document. *) (** [requirement doc] computes the width requirement of the document [doc]. It works in constant time. *) val requirement: document -> requirement (** [pretty output state indent flatten doc] prints the document [doc]. See the documentation of the method [pretty]. *) val pretty: output -> state -> int -> bool -> document -> unit (** [compact output doc] prints the document [doc]. 
See the documentation of the method [compact]. *) val compact: output -> document -> unit menhir-20210929/pprint/src/PPrintMini.ml000066400000000000000000000137001412503066000177320ustar00rootroot00000000000000(**************************************************************************) (* *) (* PPrint *) (* *) (* François Pottier, Inria Paris *) (* Nicolas Pouillard *) (* *) (* Copyright 2007-2019 Inria. All rights reserved. This file is *) (* distributed under the terms of the GNU Library General Public *) (* License, with an exception, as described in the file LICENSE. *) (**************************************************************************) (* -------------------------------------------------------------------------- *) (* A type of integers with infinity. *) type requirement = int (* with infinity *) (* Infinity is encoded as [max_int]. *) let infinity : requirement = max_int (* Addition of integers with infinity. *) let (++) (x : requirement) (y : requirement) : requirement = if x = infinity || y = infinity then infinity else x + y (* Comparison of requirements is just ordinary comparison. *) (* -------------------------------------------------------------------------- *) (* The type of documents. See [PPrintEngine] for documentation. *) type document = | Empty | FancyString of string * int * int * int | Blank of int | IfFlat of document * document | HardLine | Cat of requirement * document * document | Nest of requirement * int * document | Group of requirement * document (* -------------------------------------------------------------------------- *) (* Retrieving or computing the space requirement of a document. *) let rec requirement = function | Empty -> 0 | FancyString (_, _, _, len) | Blank len -> len | IfFlat (doc1, _) -> requirement doc1 | HardLine -> infinity | Cat (req, _, _) | Nest (req, _, _) | Group (req, _) -> req (* -------------------------------------------------------------------------- *) (* Document constructors. *) let empty = Empty let fancysubstring s ofs len apparent_length = if len = 0 then empty else FancyString (s, ofs, len, apparent_length) let fancystring s apparent_length = fancysubstring s 0 (String.length s) apparent_length let utf8_length s = let rec length_aux s c i = if i >= String.length s then c else let n = Char.code (String.unsafe_get s i) in let k = if n < 0x80 then 1 else if n < 0xe0 then 2 else if n < 0xf0 then 3 else 4 in length_aux s (c + 1) (i + k) in length_aux s 0 0 let utf8string s = fancystring s (utf8_length s) let char c = assert (c <> '\n'); fancystring (String.make 1 c) 1 let space = char ' ' let hardline = HardLine let blank n = match n with | 0 -> empty | 1 -> space | _ -> Blank n let ifflat doc1 doc2 = match doc1 with | IfFlat (doc1, _) | doc1 -> IfFlat (doc1, doc2) let internal_break i = ifflat (blank i) hardline let break0 = internal_break 0 let break1 = internal_break 1 let break i = match i with | 0 -> break0 | 1 -> break1 | _ -> internal_break i let (^^) x y = match x, y with | Empty, _ -> y | _, Empty -> x | _, _ -> Cat (requirement x ++ requirement y, x, y) let nest i x = assert (i >= 0); Nest (requirement x, i, x) let group x = let req = requirement x in if req = infinity then x else Group (req, x) (* -------------------------------------------------------------------------- *) (* Printing blank space (indentation characters). 
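For instance (an illustrative note), [blanks b 100], where [b] is the output buffer, emits one chunk of 80 blanks, then recursively emits the remaining 20.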
*) let blank_length = 80 let blank_buffer = String.make blank_length ' ' let rec blanks output n = if n <= 0 then () else if n <= blank_length then Buffer.add_substring output blank_buffer 0 n else begin Buffer.add_substring output blank_buffer 0 blank_length; blanks output (n - blank_length) end (* -------------------------------------------------------------------------- *) (* The rendering engine maintains the following internal state. *) (* For simplicity, the ribbon width is considered equal to the line width; in other words, there is no ribbon width constraint. *) (* For simplicity, the output channel is required to be an OCaml buffer. It is stored within the [state] record. *) type state = { (* The line width. *) width: int; (* The current column. *) mutable column: int; (* The output buffer. *) output: Buffer.t; } (* -------------------------------------------------------------------------- *) (* For simplicity, the rendering engine is *not* in tail-recursive style. *) let rec pretty state (indent : int) (flatten : bool) doc = match doc with | Empty -> () | FancyString (s, ofs, len, apparent_length) -> Buffer.add_substring state.output s ofs len; state.column <- state.column + apparent_length | Blank n -> blanks state.output n; state.column <- state.column + n | HardLine -> assert (not flatten); Buffer.add_char state.output '\n'; blanks state.output indent; state.column <- indent | IfFlat (doc1, doc2) -> pretty state indent flatten (if flatten then doc1 else doc2) | Cat (_, doc1, doc2) -> pretty state indent flatten doc1; pretty state indent flatten doc2 | Nest (_, j, doc) -> pretty state (indent + j) flatten doc | Group (req, doc) -> let flatten = flatten || state.column ++ req <= state.width in pretty state indent flatten doc (* -------------------------------------------------------------------------- *) (* The engine's entry point. *) let pretty width doc = let output = Buffer.create 512 in let state = { width; column = 0; output } in pretty state 0 false doc; Buffer.contents output menhir-20210929/pprint/src/PPrintOCaml.ml000066400000000000000000000111441412503066000200310ustar00rootroot00000000000000(**************************************************************************) (* *) (* PPrint *) (* *) (* François Pottier, Inria Paris *) (* Nicolas Pouillard *) (* *) (* Copyright 2007-2019 Inria. All rights reserved. This file is *) (* distributed under the terms of the GNU Library General Public *) (* License, with an exception, as described in the file LICENSE. *) (**************************************************************************) open Printf open PPrintEngine open PPrintCombinators type constructor = string type type_name = string type record_field = string type tag = int (* ------------------------------------------------------------------------- *) (* This internal [sprintf]-like function produces a document. We use [string], as opposed to [arbitrary_string], because the strings that we produce will never contain a newline character. *) let dsprintf format = ksprintf string format (* ------------------------------------------------------------------------- *) (* Nicolas prefers using this code as opposed to just [sprintf "%g"] or [sprintf "%f"]. The latter print [inf] and [-inf], whereas OCaml understands [infinity] and [neg_infinity]. [sprintf "%g"] does not add a trailing dot when the number happens to be an integral number. [sprintf "%F"] seems to lose precision and ignores the precision modifier. 
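For instance, [sprintf "%g" 1.0] produces "1", which OCaml reads back as an integer literal; the function [valid_float_lexeme] below appends a dot, so as to obtain the float literal "1.".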
*) let valid_float_lexeme (s : string) : string = let l = String.length s in let rec loop i = if i >= l then (* If we reach the end of the string and have found only characters in the set '0' .. '9' and '-', then this string will be considered as an integer literal by OCaml. Adding a trailing dot makes it a float literal. *) s ^ "." else match s.[i] with | '0' .. '9' | '-' -> loop (i + 1) | _ -> s in loop 0 (* This function constructs a string representation of a floating point number. This representation is supposed to be accepted by OCaml as a valid floating point literal. *) let float_representation (f : float) : string = match classify_float f with | FP_nan -> "nan" | FP_infinite -> if f < 0.0 then "neg_infinity" else "infinity" | _ -> (* Try increasing precisions and validate. *) let s = sprintf "%.12g" f in if f = float_of_string s then valid_float_lexeme s else let s = sprintf "%.15g" f in if f = float_of_string s then valid_float_lexeme s else sprintf "%.18g" f (* ------------------------------------------------------------------------- *) (* A few constants and combinators, used below. *) let some = string "Some" let none = string "None" let lbracketbar = string "[|" let rbracketbar = string "|]" let seq1 opening separator closing = surround_separate 2 0 (opening ^^ closing) opening (separator ^^ break 1) closing let seq2 opening separator closing = surround_separate_map 2 1 (opening ^^ closing) opening (separator ^^ break 1) closing (* ------------------------------------------------------------------------- *) (* The following functions are printers for many types of OCaml values. *) (* There is no protection against cyclic values. *) type representation = document let tuple = seq1 lparen comma rparen let variant _ cons _ args = match args with | [] -> !^cons | _ :: _ -> !^cons ^^ tuple args let record _ fields = seq2 lbrace semi rbrace (fun (k, v) -> infix 2 1 equals !^k v) fields let option f = function | None -> none | Some x -> some ^^ tuple [f x] let list f xs = seq2 lbracket semi rbracket f xs let flowing_list f xs = group (lbracket ^^ space ^^ nest 2 ( flow_map (semi ^^ break 1) f xs ) ^^ space ^^ rbracket) let array f xs = seq2 lbracketbar semi rbracketbar f (Array.to_list xs) let flowing_array f xs = group (lbracketbar ^^ space ^^ nest 2 ( flow_map (semi ^^ break 1) f (Array.to_list xs) ) ^^ space ^^ rbracketbar) let ref f x = record "ref" ["contents", f !x] let float f = string (float_representation f) let int = dsprintf "%d" let int32 = dsprintf "%ld" let int64 = dsprintf "%Ld" let nativeint = dsprintf "%nd" let char = dsprintf "%C" let bool = dsprintf "%B" let string = dsprintf "%S" (* The format string below was lost to HTML stripping in this copy; "<abstr:%s>" is a reconstruction from the upstream PPrint sources. *) let unknown tyname _ = dsprintf "<abstr:%s>" tyname menhir-20210929/pprint/src/PPrintOCaml.mli000066400000000000000000000112621412503066000202030ustar00rootroot00000000000000(**************************************************************************) (*                                                                        *) (*                                 PPrint                                 *) (*                                                                        *) (*                     François Pottier, Inria Paris                      *) (*              Nicolas Pouillard                                         *) (*                                                                        *) (*  Copyright 2007-2019 Inria. All rights reserved. This file is         *) (*  distributed under the terms of the GNU Library General Public        *) (*  License, with an exception, as described in the file LICENSE.        *) (**************************************************************************) (** A set of functions that construct representations of OCaml values. *) (** The string representations produced by these functions are supposed to be accepted by the OCaml parser as valid values.
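For instance, rendering the document [list int [1; 2; 3]] produces the text "[ 1; 2; 3 ]" when it fits on one line (an illustrative example; line breaks and exact spacing are determined by the underlying combinators).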
*) (** The signature of this module is compatible with that expected by the [camlp4] generator [Camlp4RepresentationGenerator]. *) (** These functions do {i not} distinguish between mutable and immutable values. They do {i not} recognize sharing, and do {i not} incorporate a protection against cyclic values. *) type constructor = string type type_name = string type record_field = string type tag = int (** A representation of a value is a [PPrint] document. *) type representation = PPrintEngine.document (** [variant _ dc _ args] is a description of a constructed value whose data constructor is [dc] and whose arguments are [args]. The other two parameters are presently unused. *) val variant : type_name -> constructor -> tag -> representation list -> representation (** [record _ fields] is a description of a record value whose fields are [fields]. The other parameter is presently unused. *) val record : type_name -> (record_field * representation) list -> representation (** [tuple args] is a description of a tuple value whose components are [args]. *) val tuple : representation list -> representation (** [string s] is a representation of the string [s]. *) val string : string -> representation (** [int i] is a representation of the integer [i]. *) val int : int -> representation (** [int32 i] is a representation of the 32-bit integer [i]. *) val int32 : int32 -> representation (** [int64 i] is a representation of the 64-bit integer [i]. *) val int64 : int64 -> representation (** [nativeint i] is a representation of the native integer [i]. *) val nativeint : nativeint -> representation (** [float f] is a representation of the floating-point number [f]. *) val float : float -> representation (** [char c] is a representation of the character [c]. *) val char : char -> representation (** [bool b] is a representation of the Boolean value [b]. *) val bool : bool -> representation (** [option f o] is a representation of the option [o], where the representation of the element, if present, is computed by the function [f]. *) val option : ('a -> representation) -> 'a option -> representation (** [list f xs] is a representation of the list [xs], where the representation of each element is computed by the function [f]. If the whole list fits on a single line, then it is printed on a single line; otherwise each element is printed on a separate line. *) val list : ('a -> representation) -> 'a list -> representation (** [flowing_list f xs] is a representation of the list [xs], where the representation of each element is computed by the function [f]. As many elements as possible are printed on each line. *) val flowing_list : ('a -> representation) -> 'a list -> representation (** [array f xs] is a representation of the array [xs], where the representation of each element is computed by the function [f]. If the whole array fits on a single line, then it is printed on a single line; otherwise each element is printed on a separate line. *) val array : ('a -> representation) -> 'a array -> representation (** [flowing_array f xs] is a representation of the array [xs], where the representation of each element is computed by the function [f]. As many elements as possible are printed on each line. *) val flowing_array : ('a -> representation) -> 'a array -> representation (** [ref f r] is a representation of the reference [r], where the representation of the content is computed by the function [f].
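For instance (an illustrative example), applying it to [int] and a reference that contains [3] yields a representation that is printed as "{ contents = 3 }" (up to spacing).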
*) val ref : ('a -> representation) -> 'a ref -> representation (** [unknown t _] is a representation of an unknown value of type [t]. *) val unknown : type_name -> 'a -> representation menhir-20210929/pprint/src/PPrintRenderer.ml000066400000000000000000000033061412503066000206050ustar00rootroot00000000000000(**************************************************************************) (* *) (* PPrint *) (* *) (* François Pottier, Inria Paris *) (* Nicolas Pouillard *) (* *) (* Copyright 2007-2019 Inria. All rights reserved. This file is *) (* distributed under the terms of the GNU Library General Public *) (* License, with an exception, as described in the file LICENSE. *) (**************************************************************************) (** A common signature for the multiple document renderers proposed by {!PPrintEngine}. *) module type RENDERER = sig (** Output channels. *) type channel (** Documents. *) type document (** [pretty rfrac width channel document] pretty-prints the document [document] into the output channel [channel]. The parameter [width] is the maximum number of characters per line. The parameter [rfrac] is the ribbon width, a fraction relative to [width]. The ribbon width is the maximum number of non-indentation characters per line. *) val pretty: float -> int -> channel -> document -> unit (** [compact channel document] prints the document [document] to the output channel [channel]. No indentation is used. All newline instructions are respected, that is, no groups are flattened. *) val compact: channel -> document -> unit end menhir-20210929/pprint/src/Vendored_pprint.ml000066400000000000000000000017021412503066000210420ustar00rootroot00000000000000(**************************************************************************) (* *) (* PPrint *) (* *) (* François Pottier, Inria Paris *) (* Nicolas Pouillard *) (* *) (* Copyright 2007-2019 Inria. All rights reserved. This file is *) (* distributed under the terms of the GNU Library General Public *) (* License, with an exception, as described in the file LICENSE. *) (**************************************************************************) (** A package of all of the above. *) include PPrintEngine include PPrintCombinators module OCaml = PPrintOCaml menhir-20210929/pprint/src/dune000066400000000000000000000001711412503066000162230ustar00rootroot00000000000000(library (name vendored_pprint) (public_name vendored_pprint) (modules :standard \ PPrintMini) (wrapped false) ) menhir-20210929/pprint/vendored_pprint.opam000066400000000000000000000014641412503066000206440ustar00rootroot00000000000000opam-version: "2.0" maintainer: "francois.pottier@inria.fr" authors: [ "François Pottier " "Nicolas Pouillard " ] homepage: "https://github.com/fpottier/pprint" dev-repo: "git+ssh://git@github.com/fpottier/pprint.git" bug-reports: "francois.pottier@inria.fr" build: [ ["dune" "build" "-p" name "-j" jobs] ] depends: [ "ocaml" {>= "4.02"} "dune" {>= "1.3"} ] synopsis: "A pretty-printing combinator library and rendering engine" description: "This library offers a set of combinators for building so-called documents as well as an efficient engine for converting documents to a textual, fixed-width format. The engine takes care of indentation and line breaks, while respecting the constraints imposed by the structure of the document and by the text width." 
menhir-20210929/sdk/000077500000000000000000000000001412503066000140245ustar00rootroot00000000000000menhir-20210929/sdk/cmly_api.ml000066400000000000000000000115541412503066000161610ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The following signatures describe the API offered by the functor [Cfmly_read.Read]. This functor reads in a .cmly file and gives access to the description of the grammar and automaton contained in this file. *) (* This API is currently entirely self-contained, except for a reference to the module [Keyword], which is also part of [MenhirSdk]. *) (* The module type [INDEXED] describes a type [t] whose elements are in a bijection with an integer interval of the form [0..count). *) module type INDEXED = sig type t val count : int val of_int : int -> t val to_int : t -> int val iter : (t -> unit) -> unit val fold : (t -> 'a -> 'a) -> 'a -> 'a val tabulate : (t -> 'a) -> t -> 'a end (* The module type [GRAMMAR] describes the grammar and automaton. *) module type GRAMMAR = sig type terminal = private int type nonterminal = private int type production = private int type lr0 = private int type lr1 = private int type item = production * int type ocamltype = string type ocamlexpr = string module Range : sig type t val startp: t -> Lexing.position val endp: t -> Lexing.position end module Attribute : sig type t val label : t -> string val has_label : string -> t -> bool val payload : t -> string val position : t -> Range.t end module Grammar : sig val basename : string val preludes : string list val postludes : string list val parameters : string list val entry_points : (nonterminal * production * lr1) list val attributes : Attribute.t list end module Terminal : sig include INDEXED with type t = terminal val name : t -> string val kind : t -> [`REGULAR | `ERROR | `EOF | `PSEUDO] val typ : t -> ocamltype option val attributes : t -> Attribute.t list end module Nonterminal : sig include INDEXED with type t = nonterminal val name : t -> string val mangled_name : t -> string val kind : t -> [`REGULAR | `START] val typ : t -> ocamltype option val positions : t -> Range.t list val nullable : t -> bool val first : t -> terminal list val attributes : t -> Attribute.t list end type symbol = | T of terminal | N of nonterminal val symbol_name : ?mangled:bool -> symbol -> string type identifier = string module Action : sig type t val expr : t -> ocamlexpr val keywords : t -> Keyword.keyword list end module Production : sig include INDEXED with type t = production val kind : t -> [`REGULAR | `START] val lhs : t -> nonterminal val rhs : t -> (symbol * identifier * Attribute.t list) array val positions : t -> Range.t list val action : t -> Action.t option val attributes : t -> Attribute.t list end module Lr0 : sig include INDEXED with type t = lr0 val incoming : t -> symbol option val items : t -> item list end module Lr1 : sig include INDEXED with type t = lr1 val lr0 : t -> lr0 val transitions : t -> (symbol * t) list val reductions : t -> (terminal * production list) list end module Print : sig open Format val terminal : formatter -> terminal -> 
unit val nonterminal : formatter -> nonterminal -> unit val symbol : formatter -> symbol -> unit val mangled_nonterminal : formatter -> nonterminal -> unit val mangled_symbol : formatter -> symbol -> unit val production : formatter -> production -> unit val item : formatter -> item -> unit val itemset : formatter -> item list -> unit val annot_item : string list -> formatter -> item -> unit val annot_itemset : string list list -> formatter -> item list -> unit end end menhir-20210929/sdk/cmly_format.ml000066400000000000000000000064151412503066000167000ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module defines the data that is stored in .cmly files. In short, a .cmly file contains a value of type [grammar], defined below. *) (* The type definitions in this module are used by [Cmly_write], which writes a .cmly file, and by [Cmly_read], which reads a .cmly file. They should not be used anywhere else. *) (* All entities (terminal symbols, nonterminal symbols, and so on) are represented as integers. These integers serve as indices into arrays. This enables simple and efficient hashing, comparison, indexing, etc. *) type terminal = int type nonterminal = int type production = int type lr0 = int type lr1 = int type ocamltype = string type ocamlexpr = string type range = { r_start: Lexing.position; r_end: Lexing.position; } type attribute = { a_label: string; a_payload: string; a_position: range; } type attributes = attribute list type terminal_def = { t_name: string; t_kind: [`REGULAR | `ERROR | `EOF | `PSEUDO]; t_type: ocamltype option; t_attributes: attributes; } type nonterminal_def = { n_name: string; n_kind: [`REGULAR | `START]; n_mangled_name: string; n_type: ocamltype option; n_positions: range list; n_nullable: bool; n_first: terminal list; n_attributes: attributes; } type symbol = | T of terminal | N of nonterminal type identifier = string type action = { a_expr: ocamlexpr; a_keywords: Keyword.keyword list; } type producer_def = symbol * identifier * attributes type production_def = { p_kind: [`REGULAR | `START]; p_lhs: nonterminal; p_rhs: producer_def array; p_positions: range list; p_action: action option; p_attributes: attributes; } type lr0_state_def = { lr0_incoming: symbol option; lr0_items: (production * int) list; } type lr1_state_def = { lr1_lr0: lr0; lr1_transitions: (symbol * lr1) list; lr1_reductions: (terminal * production list) list; } type grammar = { g_basename : string; g_preludes : string list; g_postludes : string list; g_terminals : terminal_def array; g_nonterminals : nonterminal_def array; g_productions : production_def array; g_lr0_states : lr0_state_def array; g_lr1_states : lr1_state_def array; g_entry_points : (nonterminal * production * lr1) list; g_attributes : attributes; g_parameters : string list; } menhir-20210929/sdk/cmly_read.ml000066400000000000000000000217641412503066000163270ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* 
*) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Cmly_format open Cmly_api (* ------------------------------------------------------------------------ *) (* Reading a .cmly file. *) exception Error of string let read (ic : in_channel) : grammar = (* .cmly file format: CMLY ++ version string ++ grammar *) let magic = "CMLY" ^ Version.version in try let m = really_input_string ic (String.length magic) in if m <> magic then raise (Error (Printf.sprintf "Invalid magic string in .cmly file.\n\ Expecting %S, but got %S." magic m)) else (input_value ic : grammar) with | End_of_file (* [really_input_string], [input_value] *) | Failure _ -> (* [input_value] *) raise (Error (Printf.sprintf "Invalid or damaged .cmly file.")) let read (filename : string) : grammar = let ic = open_in_bin filename in match read ic with | x -> close_in_noerr ic; x | exception exn -> close_in_noerr ic; raise exn (* ------------------------------------------------------------------------ *) (* Packaging the interval [0..count) as a module of type [INDEXED]. *) module Index (P : sig val name: string (* for error messages only *) val count: int end) : INDEXED with type t = int = struct type t = int let count = P.count let of_int n = if 0 <= n && n < count then n else invalid_arg (P.name ^ ".of_int: index out of bounds") let to_int n = n let iter f = for i = 0 to count - 1 do f i done let fold f x = let r = ref x in for i = 0 to count - 1 do r := f i !r done; !r let tabulate f = let a = Array.init count f in Array.get a end (* ------------------------------------------------------------------------ *) (* Packaging a data structure of type [Cmly_format.grammar] as a module of type [Cmly_api.GRAMMAR]. 
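A typical use (an illustrative sketch; the file name is hypothetical) is [module G = Read (struct let filename = "parser.cmly" end)], which yields a module [G : Cmly_api.GRAMMAR] that describes the grammar and automaton stored in this file.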
*) module Make (G : sig val grammar : grammar end) : GRAMMAR = struct open G type terminal = int type nonterminal = int type production = int type lr0 = int type lr1 = int type item = production * int type ocamltype = string type ocamlexpr = string module Range = struct type t = Cmly_format.range let startp range = range.r_start let endp range = range.r_end end module Attribute = struct type t = Cmly_format.attribute let label attr = attr.a_label let has_label label attr = label = attr.a_label let payload attr = attr.a_payload let position attr = attr.a_position end module Grammar = struct let basename = grammar.g_basename let preludes = grammar.g_preludes let postludes = grammar.g_postludes let entry_points = grammar.g_entry_points let attributes = grammar.g_attributes let parameters = grammar.g_parameters end module Terminal = struct let table = grammar.g_terminals let name i = table.(i).t_name let kind i = table.(i).t_kind let typ i = table.(i).t_type let attributes i = table.(i).t_attributes include Index(struct let name = "Terminal" let count = Array.length table end) end module Nonterminal = struct let table = grammar.g_nonterminals let name i = table.(i).n_name let mangled_name i = table.(i).n_mangled_name let kind i = table.(i).n_kind let typ i = table.(i).n_type let positions i = table.(i).n_positions let nullable i = table.(i).n_nullable let first i = table.(i).n_first let attributes i = table.(i).n_attributes include Index(struct let name = "Nonterminal" let count = Array.length table end) end type symbol = Cmly_format.symbol = | T of terminal | N of nonterminal let symbol_name ?(mangled=false) = function | T t -> Terminal.name t | N n -> if mangled then Nonterminal.mangled_name n else Nonterminal.name n type identifier = string module Action = struct type t = action let expr t = t.a_expr let keywords t = t.a_keywords end module Production = struct let table = grammar.g_productions let kind i = table.(i).p_kind let lhs i = table.(i).p_lhs let rhs i = table.(i).p_rhs let positions i = table.(i).p_positions let action i = table.(i).p_action let attributes i = table.(i).p_attributes include Index(struct let name = "Production" let count = Array.length table end) end module Lr0 = struct let table = grammar.g_lr0_states let incoming i = table.(i).lr0_incoming let items i = table.(i).lr0_items include Index(struct let name = "Lr0" let count = Array.length table end) end module Lr1 = struct let table = grammar.g_lr1_states let lr0 i = table.(i).lr1_lr0 let transitions i = table.(i).lr1_transitions let reductions i = table.(i).lr1_reductions include Index(struct let name = "Lr1" let count = Array.length table end) end module Print = struct let terminal ppf t = Format.pp_print_string ppf (Terminal.name t) let nonterminal ppf t = Format.pp_print_string ppf (Nonterminal.name t) let symbol ppf = function | T t -> terminal ppf t | N n -> nonterminal ppf n let mangled_nonterminal ppf t = Format.pp_print_string ppf (Nonterminal.name t) let mangled_symbol ppf = function | T t -> terminal ppf t | N n -> mangled_nonterminal ppf n let rec lengths l acc = function | [] -> if l = -1 then [] else l :: lengths (-1) [] acc | [] :: rows -> lengths l acc rows | (col :: cols) :: rows -> lengths (max l (String.length col)) (cols :: acc) rows let rec adjust_length lengths cols = match lengths, cols with | l :: ls, c :: cs -> let pad = l - String.length c in let c = if pad = 0 then c else c ^ String.make pad ' ' in c :: adjust_length ls cs | _, [] -> [] | [], _ -> assert false let align_tabular rows = let 
lengths = lengths (-1) [] rows in List.map (adjust_length lengths) rows let print_line ppf = function | [] -> () | x :: xs -> Format.fprintf ppf "%s" x; List.iter (Format.fprintf ppf " %s") xs let print_table ppf table = let table = align_tabular table in List.iter (Format.fprintf ppf "%a\n" print_line) table let annot_itemset annots ppf items = let last_lhs = ref (-1) in let prepare (p, pos) annot = let rhs = Array.map (fun (sym, id, _) -> if id <> "" && id.[0] <> '_' then "(" ^ id ^ " = " ^ symbol_name sym ^ ")" else symbol_name sym ) (Production.rhs p) in if pos >= 0 && pos < Array.length rhs then rhs.(pos) <- ". " ^ rhs.(pos) else if pos > 0 && pos = Array.length rhs then rhs.(pos - 1) <- rhs.(pos - 1) ^ " ."; let lhs = Production.lhs p in let rhs = Array.to_list rhs in let rhs = if !last_lhs = lhs then "" :: " |" :: rhs else begin last_lhs := lhs; Nonterminal.name lhs :: "::=" :: rhs end in if annot = [] then [rhs] else [rhs; ("" :: "" :: annot)] in let rec prepare_all xs ys = match xs, ys with | [], _ -> [] | (x :: xs), (y :: ys) -> let z = prepare x y in z :: prepare_all xs ys | (x :: xs), [] -> let z = prepare x [] in z :: prepare_all xs [] in print_table ppf (List.concat (prepare_all items annots)) let itemset ppf t = annot_itemset [] ppf t let annot_item annot ppf item = annot_itemset [annot] ppf [item] let item ppf t = annot_item [] ppf t let production ppf t = item ppf (t, -1) end end module Read (X : sig val filename : string end) = Make (struct let grammar = read X.filename end) menhir-20210929/sdk/cmly_read.mli000066400000000000000000000024461412503066000164740ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The functor [Read] reads a .cmly file. If the file is unreadable, the exception [Error] is raised. Otherwise, the functor builds a module of type [Cmly_api.GRAMMAR], which gives access to a description of the grammar and automaton. *) exception Error of string module Read (X : sig val filename : string end) : Cmly_api.GRAMMAR menhir-20210929/sdk/dune000066400000000000000000000010361412503066000147020ustar00rootroot00000000000000;; The library MenhirSdk is built here. ;; This rule generates the module [Version]. This module defines the value ;; [version] of type [string]. Its value is a string representation of our ;; 8-digit version number [XXXXXXXX]. This number is set in the [dune-project] ;; file. (rule (with-stdout-to version.ml (echo "let version = \"%{version:menhir}\"\n") ) ) ;; Compile MenhirSdk in this directory. (library (name menhirSdk) (public_name menhirSdk) (synopsis "Toolkit for postprocessing Menhir automata (.cmly files)") ) menhir-20210929/sdk/keyword.ml000066400000000000000000000063521412503066000160500ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. 
*) (* *) (******************************************************************************) (* This module provides some type and function definitions that help deal with the keywords that we recognize within semantic actions. *) (* ------------------------------------------------------------------------- *) (* Types. *) (* The user can request position information either at type [int] (a simple offset) or at type [Lexing.position]. *) type flavor = | FlavorOffset | FlavorPosition | FlavorLocation (* The user can request position information about the $start or $end of a symbol. Also, $symbolstart requests the computation of the start position of the first nonempty element in a production. *) type where = | WhereSymbolStart | WhereStart | WhereEnd (* The user can request position information about a production's left-hand side or about one of the symbols in its right-hand side, which he can refer to by position or by name. *) type subject = | Before | Left | RightNamed of string (* Keywords inside semantic actions. They allow access to semantic values or to position information. *) type keyword = | Position of subject * where * flavor | SyntaxError (* ------------------------------------------------------------------------- *) (* These auxiliary functions help map a [Position] keyword to the name of the variable that the keyword is replaced with. *) let where = function | WhereSymbolStart -> "symbolstart" | WhereStart -> "start" | WhereEnd -> "end" let subject = function | Before -> "__0_" | Left -> "" | RightNamed id -> Printf.sprintf "_%s_" id let flavor = function | FlavorPosition -> "pos" | FlavorOffset -> "ofs" | FlavorLocation -> "loc" let posvar s w f = match w, f with | _, (FlavorOffset | FlavorPosition) -> Printf.sprintf "_%s%s%s" (where w) (flavor f) (subject s) | WhereSymbolStart, FlavorLocation -> "_sloc" | WhereStart, FlavorLocation -> Printf.sprintf "_loc%s" (subject s) | _ -> assert false (* ------------------------------------------------------------------------- *) (* Sets of keywords. *) module KeywordSet = struct include Set.Make (struct type t = keyword let compare = compare end) let map f keywords = fold (fun keyword accu -> add (f keyword) accu ) keywords empty end menhir-20210929/sdk/keyword.mli000066400000000000000000000062371412503066000162230ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module provides some type and function definitions that help deal with the keywords that we recognize within semantic actions. *) (* The user can request position information either at several types: - a simple offset of type [int], e.g., via $startofs; - a position of type [Lexing.position], e.g., via $startpos; - a location, e.g., via $loc. A location is currently represented as a pair of positions, but this might change in the future; we may allow the user to choose a custom type of locations. *) type flavor = | FlavorOffset | FlavorPosition | FlavorLocation (* The user can request position information about the $start or $end of a symbol. 
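For example, $startpos selects [WhereStart], while $endpos selects [WhereEnd].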
Also, $symbolstart requests the computation of the start position of the first nonempty element in a production. *) type where = | WhereSymbolStart | WhereStart | WhereEnd (* The user can request position information about a production's left-hand side or about one of the symbols in its right-hand side, which he must refer to by name. (Referring to its symbol by its position, using [$i], is permitted in the concrete syntax, but the lexer eliminates this form.) We add a new subject, [Before], which corresponds to [$endpos($0)] in concrete syntax. We adopt the (slightly awkward) convention that when the subject is [Before], the [where] component must be [WhereEnd]. If [flavor] is [FlavorLocation], then [where] must be [WhereSymbolStart] or [WhereStart]. In the former case, [subject] must be [Left]; this corresponds to $sloc in concrete syntax. In the latter case, [subject] must be [Left] or [RightNamed _]; this corresponds to $loc and $loc(x) in concrete syntax. *) type subject = | Before | Left | RightNamed of string (* Keywords inside semantic actions. They allow access to semantic values or to position information. *) type keyword = | Position of subject * where * flavor | SyntaxError (* This maps a [Position] keyword to the name of the variable that the keyword is replaced with. *) val posvar: subject -> where -> flavor -> string (* Sets of keywords. *) module KeywordSet : sig include Set.S with type elt = keyword val map: (keyword -> keyword) -> t -> t end menhir-20210929/src/000077500000000000000000000000001412503066000140325ustar00rootroot00000000000000menhir-20210929/src/.merlin000066400000000000000000000007161412503066000153250ustar00rootroot00000000000000EXCLUDE_QUERY_DIR B ../_build/default/fix/src/.vendored_fix.objs/byte B ../_build/default/pprint/src/.vendored_pprint.objs/byte B ../_build/default/lib/pack/.menhirLib.objs/byte B ../_build/default/sdk/.menhirSdk.objs/byte B ../_build/default/src/stage2/.main.eobjs/byte S ../fix/src S ../pprint/src S ../lib/pack S ../sdk S . FLG -open Dune__exe -strict-sequence -strict-formats -short-paths -keep-locs -safe-string -g -w @1..66-4-9-41-44-45-60 -open MenhirSdk menhir-20210929/src/AtomicBitSet.ml000066400000000000000000000124721412503066000167210ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module offers bitsets that fit within an OCaml integer. This can be used to represent sets of integers in the semi-open interval [0, bound), where [bound] is [Sys.word_size - 1], that is, usually 63. *) type t = int type element = int let bound = Sys.word_size - 1 (* -------------------------------------------------------------------------- *) (* [bit i] is [2^i]. *) let bit i = assert (0 <= i && i < bound); 1 lsl i (* -------------------------------------------------------------------------- *) (* [tib x] is the base-2 logarithm of [x]. We may assume that [x] is a power of two, that is, a single bit is set. The function [tib] is so named because it is the inverse of [bit]: [tib (bit i) = i]. *) (* It would be nice if we could use gcc's __builtin_clz to do this. See caml_z.c in the library zarith. 
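As a quick worked example of the specification: tib 1 = 0, tib 2 = 1, tib 8 = 3, and, in general, tib (bit i) = i; the sanity check below verifies this exhaustively at startup.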
*) (* The following code is based on Jean-Christophe Filliâtre's Bitset. *) let log2 = Array.make 255 0 let () = for i = 0 to 7 do log2.(bit i) <- i done let tib16 accu x = if x land 0xFF = 0 then accu + 8 + log2.(x lsr 8) else accu + log2.(x) let tib32 accu x = if x land 0xFFFF = 0 then tib16 (accu + 16) (x lsr 16) else tib16 accu x let ffffffff = (0xffff lsl 16) lor 0xffff (* We cannot use the literal 0xffffffff because the OCaml compiler will reject it when compiling for a 32-bit machine. *) let tib64 x = if x land ffffffff = 0 then tib32 32 (x lsr 32) else tib32 0 x let tib x = match Sys.word_size with | 32 -> tib32 0 x | 64 -> tib64 x | _ -> assert false let () = (* A sanity check, executed once at startup time. *) for i = 0 to bound - 1 do assert (tib (bit i) = i) done (* -------------------------------------------------------------------------- *) (* [pop x] is the population count, that is, the number of bits that are set in [x]. *) (* Again, it would be nice if we could use gcc's __builtin_popcount. *) let b0 = 0x55555555 let b1 = 0x33333333 let b2 = 0xf0f0f0f let pop32 x = (* Count the bits inside each byte, in parallel. *) (* https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel *) let x = x - (x lsr 1) land b0 in let x = x land b1 + (x lsr 2) land b1 in let x = x land b2 + (x lsr 4) land b2 in (* Add up the four counts found in the four bytes. Each of these counts is at most 8, so the sum is at most 32, and fits in a byte. *) let x = x + x lsr 8 in let x = x + x lsr 16 in x land 0xff let pop64 x = pop32 x + pop32 (x lsr 32) (* Because [pop32] examines only the lower 32 bits, we pass [x] to [pop32] without bothering to mask off the higher 32 bits. *) let pop x = match Sys.word_size with | 32 -> pop32 x | 64 -> pop64 x | _ -> assert false (* -------------------------------------------------------------------------- *) (* Operations. *) let empty = 0 let is_empty s = s = 0 let singleton = bit let add i s = (bit i) lor s let remove i s = (lnot (bit i)) land s let rec fold_delta delta f s accu = if s = 0 then accu else let x = s land (-s) in let s = s lxor x in (* or: s - x *) let accu = f (delta + tib x) accu in fold_delta delta f s accu let rec iter_delta delta f s = if s <> 0 then let x = s land (-s) in let s = s lxor x in (* or: s - x *) f (delta + tib x); iter_delta delta f s let rec fold f s accu = if s = 0 then accu else let x = s land (-s) in let s = s lxor x in (* or: s - x *) let accu = f (tib x) accu in fold f s accu let rec iter f s = if s <> 0 then let x = s land (-s) in let s = s lxor x in (* or: s - x *) f (tib x); iter f s let is_singleton s = (* Test whether only one bit is set in [s]. We do this by turning off the rightmost bit, then comparing to zero. *) s land (s - 1) = 0 let cardinal s = pop s (* or: fold (fun _ m -> m + 1) s 0 *) let elements s = (* Note: the list is produced in decreasing order.
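For instance, a set containing 1, 5 and 9 yields the list 9 :: 5 :: 1 :: [], since [fold] visits elements in increasing order and conses each one onto the front of the accumulator.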
*) fold (fun tl hd -> tl :: hd) s [] let subset s1 s2 = s1 land s2 = s1 let mem i s = subset (singleton i) s let union s1 s2 = s1 lor s2 let inter s1 s2 = s1 land s2 let minimum s = if s = 0 then raise Not_found else let x = s land (-s) in tib x let choose = minimum let compare : t -> t -> int = compare (* this is [Generic.compare] *) let equal : t -> t -> bool = (=) let disjoint s1 s2 = is_empty (inter s1 s2) menhir-20210929/src/AtomicBitSet.mli000066400000000000000000000030631412503066000170660ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module offers bitsets that fit within an OCaml integer. This can be used to represent sets of integers in the semi-open interval [0, bound), where [bound] is [Sys.word_size - 1], that is, usually 63. *) val bound: int include GSet.S with type element = int (* [iter_delta] and [fold_delta] are slightly generalized variants of [iter] and [fold]. They add the constant [delta] on the fly to each set element before presenting this set element to the user function [f]. *) val iter_delta: int -> (element -> unit) -> t -> unit val fold_delta: int -> (element -> 'b -> 'b) -> t -> 'b -> 'b menhir-20210929/src/AutomatonGraph.ml000066400000000000000000000031451412503066000173200ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Printf open Grammar module P = Dot.Print (struct type vertex = Lr1.node let name node = sprintf "s%d" (Lr1.number node) let successors (f : ?style:Dot.style -> label:string -> vertex -> unit) source = SymbolMap.iter (fun symbol target -> let label = Symbol.print symbol in f ~label target ) (Lr1.transitions source) let iter (f : ?shape:Dot.shape -> ?style:Dot.style -> label:string -> vertex -> unit) = Lr1.iter (fun node -> let label = sprintf "%d" (Lr1.number node) in f ~label node ) end) let print_automaton_graph() = let f = open_out (Settings.base ^ ".dot") in P.print f; close_out f menhir-20210929/src/AutomatonGraph.mli000066400000000000000000000023361412503066000174720ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* Build and print the LR(1) automaton as a graph. Each state of the automaton gives rise to a node. Edges are labeled with nonterminal and terminal symbols. 
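For instance, a hypothetical transition from state 3 to state 7 over a terminal PLUS appears in the output as an edge from node s3 to node s7 labeled PLUS.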
The reduction actions that exist in each state are not shown. *) val print_automaton_graph: unit -> unit menhir-20210929/src/BoundedBitSet.ml000066400000000000000000000045061412503066000170640ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This functor takes an integer parameter [n] and provides an implementation of sets of integers, where every integer element must lie in the semi-open interval [0, n). *) (* We select an implementation based on [n]. If [n] is less than or equal to [AtomicBitSet.bound], then we can use a single-word bit set, which requires no memory allocation. If [n] is less than or equal to [DWordBitSet.bound], then we can use a double-word bit set. And so on, up to quad-word bit sets, which are about as far as I am willing to go. Otherwise, we fall back on [SparseBitSet], which can represent integers of unbounded magnitude. *) (* The functor [Make] must take a dummy argument [()] in order to indicate that it is not an applicative functor. Otherwise, we get a cryptic type error message: "This expression creates fresh types. It is not allowed inside applicative functors." *) module Make (N : sig val n: int end) () = struct (* An [if] construct in the module language would be welcome. This encoding is horrible. *) module type S = GSet.S with type element = int include (val if N.n <= AtomicBitSet.bound then (module AtomicBitSet : S) else if N.n <= DWordBitSet.bound then (module DWordBitSet : S) else if N.n <= QWordBitSet.bound then (module QWordBitSet : S) else (module SparseBitSet : S) : S) end menhir-20210929/src/BoundedBitSet.mli000066400000000000000000000023171412503066000172330ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This functor takes an integer parameter [n] and provides an implementation of sets of integers, where every integer element must lie in the semi-open interval [0, n). *) module Make (N : sig val n: int end) () : GSet.S with type element = int menhir-20210929/src/CheckSafeParameterizedGrammar.ml000066400000000000000000000240471412503066000222330ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. 
*) (* *) (******************************************************************************) let value = Positions.value open Syntax (* This test accepts a parameterized grammar, with the restriction that all parameters must have sort [*]. This implies that the head of every application must be a toplevel nonterminal symbol: it cannot be a formal parameter of the current rule. *) (* -------------------------------------------------------------------------- *) (* This flag causes graph edges to be logged on the standard error channel. *) let debug = false (* -------------------------------------------------------------------------- *) (* For syntactic convenience, the code is wrapped in a functor. *) module Run (G : sig val g : grammar end) = struct open G (* -------------------------------------------------------------------------- *) (* We build a graph whose vertices are all formal parameters of all rules. A formal parameter is represented as a pair of a nonterminal symbol and a 0-based integer index (the number of this parameter within this rule). We use OCaml's generic equality and hash functions at this type. *) type formal = symbol * int let formals (nt, rule) : formal list = let arity = List.length rule.pr_parameters in Misc.mapi arity (fun i -> nt, i) let formals : formal array = StringMap.bindings g.p_rules |> List.map formals |> List.concat |> Array.of_list (* If [nt/i] is a formal parameter, we may need to find the rule that defines the symbol [nt] as well as the name of the [i]-th formal parameter in this rule. *) let info ((nt, i) : formal) : parameterized_rule * symbol = let rule = try StringMap.find nt g.p_rules with Not_found -> assert false in let x = try List.nth rule.pr_parameters i with Failure _ -> assert false in rule, x (* -------------------------------------------------------------------------- *) (* For each formal parameter [nt/i], we want to know whether this parameter is actually used, that is, whether it occurs in the right-hand side. *) (* Note that we look for syntactic occurrences *anywhere* in the right-hand side. We do *not* ignore occurrences that appear as the actual argument of a parameterized symbol that happens to ignore its argument... That would probably require a fixed point computation, and might be unsound: expansion might diverge as soon as there is a syntactic occurrence in the right-hand side. *) let used_in_producer x ((_, param, _) : producer) = Parameters.occurs x param let used_in_branch x (branch : parameterized_branch) = List.exists (used_in_producer x) branch.pr_producers let used (formal : formal) : bool = let rule, x = info formal in List.exists (used_in_branch x) rule.pr_branches (* Memoize this function. *) let used : formal -> bool = let module M = Fix.Memoize.ForType(struct type t = formal end) in M.memoize used (* -------------------------------------------------------------------------- *) (* The graph edges are as follows. First, for every rule of the following form: F(..., X, ...): # where X is the i-th formal parameter of F ... G(..., X, ...) ... # where X is the j-th actual parameter of G there is a "safe" edge from the formal parameter F/i to the formal G/j. This reflects the fact that there is a flow from F/i to G/j. It is "safe" in the sense that it is not size-increasing: the same parameter X is passed from F to G. Second, for every rule of the following form: F(..., X, ...): # where X is the i-th formal parameter of F ... G(..., H(..., X, ...) , ...) ... # where H(...) 
is the j-th actual parameter of G there is a "dangerous" edge from the formal parameter F/i to the formal G/j. This reflects the fact that there is a flow from F/i to G/j. This flow is "dangerous" in the sense that it is size-increasing: X is transformed to H(..., X, ...). In this example, there should also be a safe edge from the formal F/i to the formal H/k. More generally, an occurrence of X in the right-hand side can occur deeply nested in a context K: F(..., X, ...): ... K[X] ... In that case, we must create as many edges as the context K is deep, and all of these edges should be dangerous, except the one that corresponds to the innermost layer of K, which is safe. Another way of putting this is, when the left-hand side is: F(..., X, ...): # where X is the i-th formal parameter of F and the right-hand side contains a possibly-nested occurrence of: G(..., K[X], ...) # where K[X] is the j-th actual parameter of G then we must create an edge of F/i to G/j, and this edge is safe if and only if the context K is empty, i.e., X occurs at depth 0 in K[x]. *) (* As an exception to the previous rule, if it is known that the parameterized symbol G does not use its [j]-th parameter, then the edge of F/i to G/j should not be created, nor should the applications inside K be inspected. *) (* The code below has quadratic complexity because [Parameters.occurs_deep] is expensive. In principle, we could achieve linear complexity by first annotating subterm (bottom-up) with a Boolean flag that indicates whether [x] occurs (shallowly/deeply) in it; that would allow us to implement [occurs_deep] in constant time. However, in practice, quadratic complexity is probably good enough. *) type edge = | Safe | Dangerous let rec successors_parameter (f : edge -> formal -> unit) x (param : parameter) = match param with | ParameterVar _ -> (* This is not an application. No successors. *) () | ParameterApp (sym, params) -> let nt = value sym in List.iteri (fun i param -> (* If it is known that [nt] does not use its [i]-th parameter, then there is nothing to do here. *) if used (nt, i) then begin (* Check, recursively, the applications that appear inside [param]. *) successors_parameter f x param; (* If [x] occurs in the [i]-th actual parameter of this application, then there is an edge to the formal [nt, i]. Whether it is a safe or dangerous edge depends on whether [x] occurs shallow or deep. *) if Parameters.occurs_shallow x param then f Safe (nt, i) else if Parameters.occurs_deep x param then f Dangerous (nt, i) end ) params | ParameterAnonymous _ -> assert false let successors_producer f x ((_, param, _) : producer) = successors_parameter f x param let successors_branch f x (branch : parameterized_branch) = List.iter (successors_producer f x) branch.pr_producers let successors f (formal : formal) = let rule, x = info formal in List.iter (successors_branch f x) rule.pr_branches (* -------------------------------------------------------------------------- *) (* We now have a full description of the graph. *) module G = struct type node = formal let n = Array.length formals let index = Misc.inverse formals let successors f = successors (fun _ target -> f target) let iter f = Array.iter f formals end (* -------------------------------------------------------------------------- *) (* Display the graph. 
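Each edge is logged on stderr in the format source ->(kind) target; for instance, a line F/0 ->(dangerous) G/1 (with hypothetical symbols F and G) means that the first formal parameter of F flows, with a size increase, into the second formal parameter of G.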
*) let () = if debug then G.iter (fun (x, i) -> successors (fun edge (y, j) -> let kind = match edge with Safe -> "safe" | Dangerous -> "dangerous" in Printf.eprintf "%s/%d ->(%s) %s/%d\n" x i kind y j ) (x, i) ) (* -------------------------------------------------------------------------- *) (* Compute its strongly connected components, ignoring the distinction between safe and dangerous edges. *) module T = Tarjan.Run(G) (* -------------------------------------------------------------------------- *) (* The safety criterion is: no dangerous edge is part of a cycle. Indeed, if this criterion is satisfied, then expansion must terminate: only a finite number of well-sorted terms (involving toplevel symbols and applications) can arise. (This sentence is not a proof!) Conversely, if a dangerous edge appears in a cycle, then expansion will not terminate. (That is, unless the dangerous cycle is unreachable. We choose to reject it anyway in that case.) In other words, this criterion is sound and complete. *) (* Checking that no dangerous edge is part of a cycle is done by examining the source and destination of every dangerous edge and ensuring that they lie in distinct components. *) let () = G.iter (fun source -> successors (fun edge target -> match edge with | Safe -> () | Dangerous -> if T.representative source = T.representative target then let (nt, i) = source in Error.error [] "the parameterized nonterminal symbols in this grammar\n\ cannot be expanded away: expansion would not terminate.\n\ The %s formal parameter of \"%s\" grows without bound." (Misc.nth (i + 1)) nt ) source ) end (* of the functor *) (* -------------------------------------------------------------------------- *) (* Re-package the above functor as a function. *) let check g = let module T = Run(struct let g = g end) in () menhir-20210929/src/CheckSafeParameterizedGrammar.mli000066400000000000000000000024621412503066000224010ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This test accepts a parameterized grammar, with the restriction that all parameters must have sort [*]. Parameters of higher sort must be eliminated prior to running this test: see [SelectiveExpansion]. *) (* This test succeeds if and only if the expansion of this grammar is safe, that is, terminates. *) val check: Syntax.grammar -> unit menhir-20210929/src/Compatibility.ml000066400000000000000000000051351412503066000172010ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) module Bytes = struct include Bytes let escaped s = let n = ref 0 in for i = 0 to length s - 1 do n := !n + (match unsafe_get s i with | '\"' | '\\' | '\n' | '\t' | '\r' | '\b' -> 2 | ' ' .. 
'~' -> 1 | _ -> 4) done; if !n = length s then copy s else begin let s' = create !n in n := 0; for i = 0 to length s - 1 do begin match unsafe_get s i with | ('\"' | '\\') as c -> unsafe_set s' !n '\\'; incr n; unsafe_set s' !n c | '\n' -> unsafe_set s' !n '\\'; incr n; unsafe_set s' !n 'n' | '\t' -> unsafe_set s' !n '\\'; incr n; unsafe_set s' !n 't' | '\r' -> unsafe_set s' !n '\\'; incr n; unsafe_set s' !n 'r' | '\b' -> unsafe_set s' !n '\\'; incr n; unsafe_set s' !n 'b' | (' ' .. '~') as c -> unsafe_set s' !n c | c -> let a = Char.code c in unsafe_set s' !n '\\'; incr n; unsafe_set s' !n (Char.chr (48 + a / 100)); incr n; unsafe_set s' !n (Char.chr (48 + (a / 10) mod 10)); incr n; unsafe_set s' !n (Char.chr (48 + a mod 10)); end; incr n done; s' end end module String = struct open String let escaped s = let rec escape_if_needed s n i = if i >= n then s else match unsafe_get s i with | '\"' | '\\' | '\000'..'\031' | '\127'.. '\255' -> Bytes.unsafe_to_string (Bytes.escaped (Bytes.unsafe_of_string s)) | _ -> escape_if_needed s n (i+1) in escape_if_needed s (length s) 0 end menhir-20210929/src/Compatibility.mli000066400000000000000000000026061412503066000173520ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The standard library function [String.escaped] in OCaml 4.02.3 depends on the operating system function [isprint] and therefore can have OS- dependent, locale-dependent behavior. This issue has been fixed in OCaml 4.03. We use a copy of the code found in OCaml 4.03 and higher, so as to avoid this issue. *) module Bytes : sig val escaped: bytes -> bytes end module String : sig val escaped: string -> string end menhir-20210929/src/CompletedNatWitness.ml000066400000000000000000000044711412503066000203260ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) type 'a t = | Finite of int * 'a Seq.seq | Infinity let equal p1 p2 = match p1, p2 with | Finite (i1, _), Finite (i2, _) -> i1 = i2 | Infinity, Infinity -> true | _, _ -> false let bottom = Infinity let epsilon = Finite (0, Seq.empty) let singleton x = Finite (1, Seq.singleton x) let is_maximal p = match p with | Finite (0, _) -> true | _ -> false let min p1 p2 = match p1, p2 with | Finite (i1, _), Finite (i2, _) -> if i1 <= i2 then p1 else p2 | p, Infinity | Infinity, p -> p let min_lazy p1 p2 = match p1 with | Finite (0, _) -> p1 | _ -> min p1 (p2()) let add p1 p2 = match p1, p2 with | Finite (i1, xs1), Finite (i2, xs2) -> Finite (i1 + i2, Seq.append xs1 xs2) | _, _ -> Infinity let add_lazy p1 p2 = match p1 with | Infinity -> Infinity | _ -> add p1 (p2()) let print conv p = match p with | Finite (0, _) -> (* Avoid producing a trailing space. 
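Thus [epsilon] prints as just the comment (* 0 *), with nothing after it, whereas a hypothetical Finite (2, xs) prints as (* 2 *) followed by the two elements of [xs], separated by single spaces.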
*) Printf.sprintf "(* 0 *)" | Finite (i, xs) -> Printf.sprintf "(* %d *) " i ^ String.concat " " (List.map conv (Seq.elements xs)) | Infinity -> "infinity" let to_int p = match p with | Finite (i, _) -> i | Infinity -> max_int let extract p = match p with | Finite (_, xs) -> Seq.elements xs | Infinity -> assert false menhir-20210929/src/CompletedNatWitness.mli000066400000000000000000000037651412503066000205040ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This is the lattice of the natural numbers, completed with [Infinity], and ordered towards zero (i.e. [Infinity] is [bottom], [Finite 0] is [top]). *) (* These numbers are further enriched with sequences of matching length. Thus, a lattice element is either [Finite (n, xs)], where [n] is a natural number and [xs] is a sequence of length [n]; or [Infinity]. The sequences [xs] are ignored by the ordering (e.g., [compare] ignores them) but are nevertheless constructed (e.g., [add] concatenates two sequences). They should be thought of as witnesses, or proofs, that explain why the number [n] was obtained. *) type 'a t = | Finite of int * 'a Seq.seq | Infinity val bottom: 'a t val equal: 'a t -> 'b t -> bool val is_maximal: 'a t -> bool val epsilon: 'a t val singleton: 'a -> 'a t val min: 'a t -> 'a t -> 'a t val add: 'a t -> 'a t -> 'a t val min_lazy: 'a t -> (unit -> 'a t) -> 'a t val add_lazy: 'a t -> (unit -> 'a t) -> 'a t val print: ('a -> string) -> 'a t -> string val to_int: 'a t -> int val extract: 'a t -> 'a list menhir-20210929/src/DFS.ml000066400000000000000000000040101412503066000147730ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. 
*) (* *) (******************************************************************************) module Run (G : sig type node type label val foreach_outgoing_edge: node -> (label -> node -> unit) -> unit val foreach_root: (node -> unit) -> unit end) (M : sig val mark: G.node -> unit val is_marked: G.node -> bool end) (D : sig val discover: G.node -> unit val traverse: G.node -> G.label -> G.node -> unit end) = struct open G open M open D let rec visit node = if not (is_marked node) then begin mark node; discover node; foreach_outgoing_edge node (fun label target -> traverse node label target; visit target ) end let () = foreach_root visit end module MarkSet (S : Set.S) = struct let marked = ref S.empty let is_marked x = S.mem x !marked let mark x = marked := S.add x !marked let marked () = !marked end module MarkArray (G : sig type node val n: int val number: node -> int end) = struct let marked = Array.make G.n false let is_marked x = marked.(G.number x) let mark x = marked.(G.number x) <- true let marked () = marked end menhir-20210929/src/DFS.mli000066400000000000000000000057601412503066000151610ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* A generic implementation of depth-first search. *) (* The graph [G] must be equipped with ways of iterating over the outgoing edges of a node and over the root notes. Edges can be labeled. If no labels are needed, then the type [label] should be defined as [unit]. *) (* The module [M] must offer a mechanism for marking a node and testing whether a node is marked. The functors [MarkSet] and [MarkArray] (below) can help implement it. *) (* The function [D.discover] is invoked at most once per node, when this node is newly discovered (after this node has been marked and before its outgoing edges are traversed). The function [D.traverse] is invoked at most once per edge, when this edge is traversed. *) (* The functor application [Run(G)(M)(D)] performs the search. No result is returned. *) module Run (G : sig type node type label val foreach_outgoing_edge: node -> (label -> node -> unit) -> unit val foreach_root: (node -> unit) -> unit end) (M : sig val mark: G.node -> unit val is_marked: G.node -> bool end) (D : sig val discover: G.node -> unit val traverse: G.node -> G.label -> G.node -> unit end) : sig end (* The module [MarkSet(S)] provides a fresh marking mechanism for elements of type [S.elt], where [S] is a set implementation. The functions [mark] and [is_marked] allow marking an element and testing whether an element is marked. The function [marked] returns the set of all marked elements. *) module MarkSet (S : Set.S) : sig val mark: S.elt -> unit val is_marked: S.elt -> bool val marked: unit -> S.t end (* The module [MarkArray(S)] provides a fresh marking mechanism for nodes of type [G.node], where [G] is a graph whose nodes are numbered. The functions [mark] and [is_marked] allow marking a node and testing whether a node is marked. The function [marked] returns an array of marks. 
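As a hypothetical usage sketch (module names invented for illustration): given a graph [G] whose nodes are numbered, one writes module M = MarkArray(G) and then module Search = Run(G)(M)(D), where [D] supplies the [discover] and [traverse] callbacks; once the functor application has run, M.marked() indicates which nodes were reached from the roots.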
*) module MarkArray (G : sig type node val n: int val number: node -> int end) : sig val mark: G.node -> unit val is_marked: G.node -> bool val marked: unit -> bool array end menhir-20210929/src/DWordBitSet.ml000066400000000000000000000113671412503066000165260ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module offers bitsets that fit within *two* OCaml integers. This can be used to represent sets of integers in the semi-open interval [0, bound), where [bound] is [2 * AtomicBitSet.bound], that is, usually 126. *) module A = AtomicBitSet (* As a special case, the empty set is represented by the data constructor [E]. Thus, the empty set requires no memory allocation. In order to maintain a unique representation of sets, we forbid the value [D (A.empty, A.empty)]. The smart constructor [construct] enforces this restriction. *) type t = | E | D of A.t * A.t let construct hi lo = if A.is_empty hi && A.is_empty lo then E else D (hi, lo) type element = int let bound = 2 * A.bound let empty = E let is_empty s = match s with | E -> true | D (_, _) -> (* Assuming every set is built by [construct] above, a set whose constructor is [D] cannot be empty. *) false let singleton i = if i < A.bound then D (A.empty, A.singleton i) else D (A.singleton (i - A.bound), A.empty) let add i s = match s with | E -> singleton i | D (hi, lo) -> if i < A.bound then let lo' = A.add i lo in if lo == lo' then s else D (hi, lo') else let hi' = A.add (i - A.bound) hi in if hi == hi' then s else D (hi', lo) let remove i s = match s with | E -> s | D (hi, lo) -> if i < A.bound then let lo' = A.remove i lo in if lo == lo' then s else construct hi lo' else let hi' = A.remove (i - A.bound) hi in if hi == hi' then s else construct hi' lo let fold f s accu = match s with | E -> accu | D (hi, lo) -> let accu = A.fold f lo accu in let accu = A.fold_delta A.bound f hi accu in accu let iter f s = match s with | E -> () | D (hi, lo) -> A.iter f lo; A.iter_delta A.bound f hi let is_singleton s = match s with | E -> false | D (hi, lo) -> A.is_singleton hi && A.is_empty lo || A.is_empty hi && A.is_singleton lo let cardinal s = match s with | E -> 0 | D (hi, lo) -> A.cardinal hi + A.cardinal lo let elements s = (* Note: the list is produced in decreasing order. *) fold (fun tl hd -> tl :: hd) s [] let subset s1 s2 = match s1, s2 with | E, _ -> true | D (_, _), E -> (* Assuming every set is built by [construct] above, a set whose constructor is [D] cannot be empty. 
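(For example, assuming [A.bound] is 63, the singleton 70 is represented as a [D] whose high word has bit 7 set and whose low word is empty, which is indeed nonempty.)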
*) false | D (hi1, lo1), D (hi2, lo2) -> A.subset hi1 hi2 && A.subset lo1 lo2 let mem i s = match s with | E -> false | D (hi, lo) -> if i < A.bound then A.mem i lo else A.mem (i - A.bound) hi let union s1 s2 = match s1, s2 with | E, s | s, E -> s | D (hi1, lo1), D (hi2, lo2) -> let hi = A.union hi1 hi2 and lo = A.union lo1 lo2 in if hi == hi2 && lo == lo2 then s2 else D (hi, lo) let inter s1 s2 = match s1, s2 with | E, _ | _, E -> E | D (hi1, lo1), D (hi2, lo2) -> construct (A.inter hi1 hi2) (A.inter lo1 lo2) let choose s = match s with | E -> raise Not_found | D (hi, lo) -> if A.is_empty lo then A.choose hi + A.bound else A.choose lo let compare s1 s2 = if s1 == s2 then 0 else match s1, s2 with | E , E -> 0 | D _, E -> 1 | E , D _ -> -1 | D (hi1, lo1), D (hi2, lo2) -> begin match A.compare hi1 hi2 with | 0 -> A.compare lo1 lo2 | n -> n end let equal s1 s2 = (s1 == s2) || match s1, s2 with | E , E -> true | D _, E | E , D _ -> false | D (hi1, lo1), D (hi2, lo2) -> A.equal hi1 hi2 && A.equal lo1 lo2 let disjoint s1 s2 = match s1, s2 with | E, _ | _, E -> true | D (hi1, lo1), D (hi2, lo2) -> A.disjoint hi1 hi2 && A.disjoint lo1 lo2 menhir-20210929/src/DWordBitSet.mli000066400000000000000000000023531412503066000166720ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module offers bitsets that fit within *two* OCaml integers. This can be used to represent sets of integers in the semi-open interval [0, bound), where [bound] is [2 * AtomicBitSet.bound], that is, usually 126. *) val bound: int include GSet.S with type element = int menhir-20210929/src/Driver.mli000066400000000000000000000022561412503066000157750ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The module [Driver] serves to offer a unified API to the parser, which could be produced by either ocamlyacc or Menhir. *) val grammar : (Lexing.lexbuf -> Parser.token) -> Lexing.lexbuf -> Syntax.partial_grammar menhir-20210929/src/Drop.ml000066400000000000000000000133131412503066000152710ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) let value = Positions.value (* The source. *) module S = Syntax (* The target. 
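The translation is a simple structural descent; for example, [drop_parameter] below maps a parameter of the form ParameterVar to the bare symbol that it carries, and fails on applications, which must have been expanded away beforehand.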
*) module T = BasicSyntax (* -------------------------------------------------------------------------- *) (* Most of the translation is straightforward. *) let drop_parameter (param : S.parameter) : S.symbol = match param with | S.ParameterVar sym -> value sym | S.ParameterApp _ -> (* The grammar should not have any parameterized symbols. *) assert false | S.ParameterAnonymous _ -> assert false let drop_producer ((id, param, attrs) : S.producer) : T.producer = { T.producer_identifier = id; T.producer_symbol = drop_parameter param; T.producer_attributes = attrs } let drop_branch (branch : S.parameterized_branch) : T.branch = { T.branch_position = branch.S.pr_branch_position; T.producers = List.map drop_producer branch.S.pr_producers; T.action = branch.S.pr_action; T.branch_prec_annotation = branch.S.pr_branch_prec_annotation; T.branch_production_level = branch.S.pr_branch_production_level } let drop_rule (rule : S.parameterized_rule) : T.rule = (* The grammar should not have any parameterized symbols. *) assert (rule.S.pr_parameters = []); (* The [%public] flag is dropped. *) { T.branches = List.map drop_branch rule.S.pr_branches; T.positions = rule.S.pr_positions; T.inline_flag = rule.S.pr_inline_flag; T.attributes = rule.S.pr_attributes; } (* -------------------------------------------------------------------------- *) (* We must store [%type] declarations and [%on_error_reduce] declarations in StringMaps, whereas so far they were represented as lists. *) let drop_declarations (kind : string) (f : 'info1 -> 'info2) (decls : (S.parameter * 'info1) list) : 'info2 StringMap.t = (* Now is as good a time as any to check against multiple declarations concerning a single nonterminal symbol. Indeed, if we did not rule out this situation, then we would have to keep only one (arbitrarily chosen) declaration. To do this, we first build a map of symbols to info *and* position... *) List.fold_left (fun accu (param, info) -> let symbol = drop_parameter param in begin match StringMap.find symbol accu with | exception Not_found -> () | (_, position) -> Error.error [position; Parameters.position param] "there are multiple %s declarations for the symbol %s." kind symbol end; StringMap.add symbol (f info, Parameters.position param) accu ) StringMap.empty decls (* ... then drop the positions. *) |> StringMap.map (fun (info, _) -> info) let drop_type_declarations = drop_declarations "%type" value let drop_on_error_reduce_declarations = drop_declarations "%on_error_reduce" (fun x -> x) (* -------------------------------------------------------------------------- *) (* We must eliminate (that is, desugar) [%attribute] declarations. We examine them one by one and attach these attributes with terminal or nonterminal symbols, as appropriate. This is entirely straightforward. *) let add_attribute (g : T.grammar) param attr : T.grammar = let symbol = drop_parameter param in match StringMap.find symbol g.T.tokens with | props -> (* This is a terminal symbol. *) let props = { props with S.tk_attributes = attr :: props.S.tk_attributes } in { g with T.tokens = StringMap.add symbol props g.T.tokens } | exception Not_found -> match StringMap.find symbol g.T.rules with | rule -> (* This is a nonterminal symbol. *) let rule = { rule with T.attributes = attr :: rule.T.attributes } in { g with T.rules = StringMap.add symbol rule g.T.rules } | exception Not_found -> (* This is an unknown symbol. This should not happen. 
*) assert false let add_attributes g (params, attrs) = List.fold_left (fun g param -> List.fold_left (fun g attr -> add_attribute g param attr ) g attrs ) g params let add_attributes (decls : (S.parameter list * S.attributes) list) g = List.fold_left add_attributes g decls (* -------------------------------------------------------------------------- *) (* Putting it all together. *) let drop (g : S.grammar) : T.grammar = { T.preludes = g.S.p_preludes; T.postludes = g.S.p_postludes; T.parameters = g.S.p_parameters; T.start_symbols = StringMap.domain g.S.p_start_symbols; T.types = drop_type_declarations g.S.p_types; T.tokens = g.S.p_tokens; T.on_error_reduce = drop_on_error_reduce_declarations g.S.p_on_error_reduce; T.gr_attributes = g.S.p_grammar_attributes; T.rules = StringMap.map drop_rule g.S.p_rules } |> add_attributes g.S.p_symbol_attributes menhir-20210929/src/Drop.mli000066400000000000000000000023331412503066000154420ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This function translates a grammar from the [Syntax] format to the [BasicSyntax] format. Naturally, the grammar must not have any parameterized symbols, since these are not allowed by the latter format. *) val drop: Syntax.grammar -> BasicSyntax.grammar menhir-20210929/src/Fix.ml000066400000000000000000000026251412503066000151170ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The library fix, which is found in the subdirectory fix/, has been renamed vendored_fix so as to prevent Dune from complaining about a conflict with a copy of fix that might be installed on the user's system. *) (* As a result, the library is now accessible under the name Vendored_fix. Because we do not want to pollute Menhir's sources with this name, we define the module Fix as an alias for Vendored_fix. *) include Vendored_fix menhir-20210929/src/FixSolver.ml000066400000000000000000000040021412503066000163010ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) module Make (M : Fix.IMPERATIVE_MAPS) (P : Fix.MINIMAL_SEMI_LATTICE) = struct type variable = M.key type property = P.property let join = P.leq_join (* A map of each variable to its upper bounds (its successors). 
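As a worked example with hypothetical variables: after record_ConVar p x and record_VarVar x y, the least solution computed by [Solve] maps both x and y to a property at least as large as p.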
*) let upper : variable list M.t = M.create() let successors x = try M.find x upper with Not_found -> [] let record_VarVar x y = M.add x (y :: successors x) upper (* A map of each variable to its lower bound (a constant). *) let lower : property M.t = M.create() let record_ConVar p y = match M.find y lower with | exception Not_found -> M.add y p lower | q -> M.add y (join p q) lower (* Running the analysis. *) module Solve () = struct module G = struct type nonrec variable = variable type nonrec property = property let foreach_root contribute = M.iter contribute lower let foreach_successor x p contribute = List.iter (fun y -> contribute y p) (successors x) end include Fix.DataFlow.Run(M)(P)(G) end end menhir-20210929/src/FixSolver.mli000066400000000000000000000031171412503066000164600ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Fix module Make (M : IMPERATIVE_MAPS) (P : MINIMAL_SEMI_LATTICE) : sig type variable = M.key type property = P.property (* [record_ConVar x y] records an inequality between a constant and a variable. *) val record_ConVar: property -> variable -> unit (* [record_VarVar x y] records an inequality between two variables. *) val record_VarVar: variable -> variable -> unit (* The functor [Solve] computes the least solution of the constraints. The value [None] represents bottom. *) module Solve () : SOLUTION with type variable = variable and type property = property option end menhir-20210929/src/Generic.ml000066400000000000000000000023771412503066000157510ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* Because the generic comparison function is named [Pervasives.compare] in early versions of OCaml and [Stdlib.compare] in recent versions, we cannot refer to it under either name. The following definition allows us to refer to it under the name [Generic.compare]. *) let compare = compare menhir-20210929/src/GroundSort.ml000066400000000000000000000021041412503066000164670ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. 
*) (* *) (******************************************************************************) type sort = | GArrow of sort list let star = GArrow [] let domain sort = let GArrow sorts = sort in sorts menhir-20210929/src/GroundSort.mli000066400000000000000000000023051412503066000166430ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The syntax of sorts is: sort ::= (sort, ..., sort) -> * where the arity (the number of sorts on the left-hand side of the arrow) can be zero. *) type sort = | GArrow of sort list val star: sort val domain: sort -> sort list menhir-20210929/src/IL.ml000066400000000000000000000157251412503066000147020ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Positions (* Abstract syntax of the language used for code production. *) type interface = interface_item list and interface_item = (* Functor. Called [Make]. No functor if no parameters. Very ad hoc! *) | IIFunctor of Stretch.t list * interface (* Exception declarations. *) | IIExcDecls of excdef list (* Algebraic data type declarations (mutually recursive). *) | IITypeDecls of typedef list (* Value declarations. *) | IIValDecls of (string * typescheme) list (* Include directive. *) | IIInclude of module_type (* Submodule. *) | IIModule of string * module_type (* Comment. *) | IIComment of string and module_type = | MTNamedModuleType of string | MTWithType of module_type * string list * string * with_kind * typ | MTSigEnd of interface and with_kind = | WKNonDestructive (* = *) | WKDestructive (* := *) and excdef = { (* Name of the exception. *) excname: string; (* Optional equality. *) exceq: string option; } and typedef = { (* Name of the algebraic data type. *) typename: string; (* Type parameters. This is a list of type variable names, without the leading quote, which will be added by the pretty-printer. Can also be "_". *) typeparams: string list; (* Data constructors. *) typerhs: typedefrhs; (* Constraint. *) typeconstraint: (typ * typ) option } and typedefrhs = | TDefRecord of fielddef list | TDefSum of datadef list | TAbbrev of typ and fielddef = { (* Whether the field is mutable. *) modifiable: bool; (* Name of the field. *) fieldname: string; (* Type of the field. *) fieldtype: typescheme } and datadef = { (* Name of the data constructor. *) dataname: string; (* Types of the value parameters. *) datavalparams: typ list; (* Instantiated type parameters, if this is a GADT -- [None] if this is an ordinary ADT. *) datatypeparams: typ list option; } and typ = (* Textual OCaml type. *) | TypTextual of Stretch.ocamltype (* Type variable, without its leading quote. Can also be "_". 
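(For instance, the OCaml type 'a list would be represented as TypApp ("list", [TypVar "a"]), a worked illustration of this constructor and of TypApp below.)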
*) | TypVar of string (* Application of an algebraic data type constructor. *) | TypApp of string * typ list (* Anonymous tuple. *) | TypTuple of typ list (* Arrow type. *) | TypArrow of typ * typ and typescheme = { (* Universal quantifiers, without leading quotes. *) quantifiers: string list; (* Body. *) body: typ; } and valdef = { (* Whether the value is public. Public values cannot be suppressed by the inliner. They serve as seeds for the dead code analysis. *) valpublic: bool; (* Definition's left-hand side. *) valpat: pattern; (* Value to which it is bound. *) valval: expr } and expr = (* Variable. *) | EVar of string (* Function. *) | EFun of pattern list * expr (* Function call. *) | EApp of expr * expr list (* Local definitions. This is a nested sequence of [let] definitions. *) | ELet of (pattern * expr) list * expr (* Case analysis. *) | EMatch of expr * branch list | EIfThen of expr * expr | EIfThenElse of expr * expr * expr (* Raising exceptions. *) | ERaise of expr (* Exception analysis. *) | ETry of expr * branch list (* Data construction. Tuples of length 1 are considered nonexistent, that is, [ETuple [e]] is considered the same expression as [e]. *) | EUnit | EIntConst of int | EStringConst of string | EData of string * expr list | ETuple of expr list (* Type annotation. *) | EAnnot of expr * typescheme (* Cheating on the typechecker. *) | EMagic of expr (* Obj.magic *) | ERepr of expr (* Obj.repr *) (* Records. *) | ERecord of (string * expr) list | ERecordAccess of expr * string | ERecordWrite of expr * string * expr (* Textual OCaml code. *) | ETextual of Stretch.t (* Comments. *) | EComment of string * expr | EPatComment of string * pattern * expr (* Arrays. *) | EArray of expr list | EArrayAccess of expr * expr and branch = { (* Branch pattern. *) branchpat: pattern; (* Branch body. *) branchbody: expr; } and pattern = (* Wildcard. *) | PWildcard (* Variable. *) | PVar of string | PVarLocated of string located (* The positions must not be dummies. Use [pvarlocated]. *) (* Data deconstruction. Tuples of length 1 are considered nonexistent, that is, [PTuple [p]] is considered the same pattern as [p]. *) | PUnit | PData of string * pattern list | PTuple of pattern list | PRecord of (string * pattern) list (* Disjunction. *) | POr of pattern list (* Type annotation. *) | PAnnot of pattern * typ (* Module expressions. *) and modexpr = | MVar of string | MStruct of structure | MApp of modexpr * modexpr (* Structures. *) and program = structure and structure = structure_item list and structure_item = (* Functor. Called [Make]. No functor if no parameters. Very ad hoc! *) | SIFunctor of Stretch.t list * structure (* Exception definitions. *) | SIExcDefs of excdef list (* Algebraic data type definitions (mutually recursive). *) | SITypeDefs of typedef list (* Value definitions (mutually recursive or not, as per the flag). *) | SIValDefs of bool * valdef list (* Raw OCaml code. *) | SIStretch of Stretch.t list (* Sub-module definition. *) | SIModuleDef of string * modexpr (* Module inclusion. *) | SIInclude of modexpr (* Comment. *) | SIComment of string (* A type of parameters, with injections both into patterns (formal parameters) and into expressions (actual parameters). 
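For instance, xparam2expr (XMagic (XVar x)) yields EMagic (EVar x), whereas xparam2pat simply drops the magic and yields PVar x, since patterns have no counterpart of Obj.magic.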
*) type xparam = | XVar of string | XMagic of xparam let xvar x = XVar x let xmagic xp = XMagic xp let rec xparam2expr = function | XVar x -> EVar x | XMagic xp -> EMagic (xparam2expr xp) let rec xparam2pat = function | XVar x -> PVar x | XMagic xp -> xparam2pat xp (* no magic *) type xparams = xparam list let xparams2exprs xps = List.map xparam2expr xps let xparams2pats xps = List.map xparam2pat xps menhir-20210929/src/IO.ml000066400000000000000000000106551412503066000147020ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* Input-output utilities. *) (* ------------------------------------------------------------------------- *) (* [try/finally] has the same semantics as in Java. *) let try_finally action handler = let result = try action() with e -> handler(); raise e in handler(); result (* ------------------------------------------------------------------------- *) (* [moving_away filename action] moves the file [filename] away (if it exists), performs [action], then moves the file back into place (if it was moved away). *) let moving_away filename action = if Sys.file_exists filename then let newname = filename ^ ".moved_by_menhir" in Sys.rename filename newname; try_finally action (fun () -> Sys.rename newname filename ) else action() (* ------------------------------------------------------------------------- *) (* [with_file filename creation action] creates the file [filename] by running [creation], then runs [action], and ensures that the file is removed in the end. *) let with_file filename creation action = creation(); try_finally action (fun () -> Sys.remove filename) (* ------------------------------------------------------------------------- *) (* [exhaust channel] reads all of the data that's available on [channel]. It does not assume that the length of the data is known ahead of time. It does not close the channel. *) let chunk_size = 16384 let exhaust channel = let buffer = Buffer.create chunk_size in let chunk = Bytes.create chunk_size in let rec loop () = let length = input channel chunk 0 chunk_size in if length = 0 then Buffer.contents buffer else begin Buffer.add_subbytes buffer chunk 0 length; loop() end in loop() (* ------------------------------------------------------------------------- *) (* [invoke command] invokes an external command (which expects no input) and returns its output, if the command succeeds. It returns [None] if the command fails. *) let invoke command = let ic = Unix.open_process_in command in (* 20130911 Be careful to read in text mode, so as to avoid newline translation problems (which would manifest themselves on Windows). *) set_binary_mode_in ic false; let result = exhaust ic in match Unix.close_process_in ic with | Unix.WEXITED 0 -> Some result | _ -> None (* ------------------------------------------------------------------------- *) (* [read_whole_file filename] reads the file [filename] in text mode and returns its contents as a string. *) let read_whole_file filename = (* Open the file in text mode, so that (under Windows) CRLF is converted to LF. 
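(Concretely, the two-byte sequence CR LF on disk is presented to us as the single character LF.)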
This guarantees that one byte is one character and seems to be required in order to report accurate positions. *) let channel = open_in filename in (* The standard library functions [pos_in] and [seek_in] do not work correctly when CRLF conversion is being performed, so we abandon their use. (They were used to go and extract the text of semantic actions.) Instead we load the entire file into memory up front, and work with a string. *) (* The standard library function [in_channel_length] does not work correctly when CRLF conversion is being performed, so we do not use it to read the whole file. And the standard library function [Buffer.add_channel] uses [really_input] internally, so we cannot use it either. Bummer. *) let s = exhaust channel in close_in channel; s menhir-20210929/src/IO.mli000066400000000000000000000041471412503066000150520ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* Input-output utilities. *) (* [try/finally] has the same semantics as in Java. *) val try_finally : (unit -> 'a) -> (unit -> unit) -> 'a (* [moving_away filename action] moves the file [filename] away (if it exists), performs [action], then moves the file back into place (if it was moved away). *) val moving_away: string -> (unit -> 'a) -> 'a (* [with_file filename creation action] creates the file [filename] by running [creation], then runs [action], and ensures that the file is removed in the end. *) val with_file: string -> (unit -> unit) -> (unit -> 'a) -> 'a (* [exhaust channel] reads all of the data that's available on [channel]. It does not assume that the length of the data is known ahead of time. It does not close the channel. *) val exhaust: in_channel -> string (* [invoke command] invokes an external command (which expects no input) and returns its output, if the command succeeds. It returns [None] if the command fails. *) val invoke: string -> string option (* [read_whole_file filename] reads the file [filename] in text mode and returns its contents as a string. *) val read_whole_file: string -> string menhir-20210929/src/InputFile.ml000066400000000000000000000072361412503066000162730ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* ---------------------------------------------------------------------------- *) (* The identity of the current input file. *) (* 2011/10/19: do not use [Filename.basename]. The [#] annotations that we insert in the [.ml] file must retain their full path. This does mean that the [#] annotations depend on how menhir is invoked -- e.g. [menhir foo/bar.mly] and [cd foo && menhir bar.mly] will produce different files. Nevertheless, this seems useful/reasonable. 
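(As a concrete illustration -- not a verbatim quote from any generated file -- such an annotation is an OCaml line directive of the form [# 42 "foo/bar.mly"]; after [cd foo && menhir bar.mly], the same directive would instead read [# 42 "bar.mly"].)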
*) (* This also influences the type error messages produced by [--infer]. *) (* 2016/08/25: in principle, the order in which file names appear on the command line (when there are several of them) does not matter. It is however used in [BasicPrinter] (see the problem description there). For this reason, we define a type [input_file] which includes the file's name as well as its index on the command line. *) type input_file = { input_file_name: string; input_file_index: int } let builtin_input_file = { input_file_name = ""; input_file_index = -1 } let dummy_input_file = { input_file_name = ""; input_file_index = 0 } let same_input_file file1 file2 = file1.input_file_index = file2.input_file_index (* could also use physical equality [file1 == file2] *) let compare_input_files file1 file2 = Generic.compare file1.input_file_index file2.input_file_index (* Ideally, this function should NOT be used, as it reflects the order of the input files on the command line. As of 2016/08/25, it is used by [BasicPrinter], for lack of a better solution. *) let current_input_file = ref dummy_input_file (* This declares that a new file is being processed. *) let new_input_file name : unit = current_input_file := { input_file_name = name; input_file_index = !current_input_file.input_file_index + 1 } let get_input_file () : input_file = assert (!current_input_file != dummy_input_file); !current_input_file let get_input_file_name () : string = (get_input_file()).input_file_name (* ---------------------------------------------------------------------------- *) (* The contents of the current input file. *) let get_initialized_ref ref = match !ref with | None -> assert false | Some contents -> contents let file_contents = ref (None : string option) let get_file_contents () = get_initialized_ref file_contents let with_file_contents contents f = file_contents := Some contents; let result = f() in file_contents := None; (* avoid memory leak *) result open Lexing let chunk (pos1, pos2) = let ofs1 = pos1.pos_cnum and ofs2 = pos2.pos_cnum in let contents = get_file_contents() in let len = ofs2 - ofs1 in String.sub contents ofs1 len menhir-20210929/src/InputFile.mli000066400000000000000000000054161412503066000164420ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module keeps track of which input file is currently being read. It defines a type [input_file] of input files, which is used to record the origin of certain elements (productions, declarations, etc.). *) (* ---------------------------------------------------------------------------- *) (* The identity of the current input file. *) type input_file (* [new_input_file filename] must be called when a new input file is about to be read. *) val new_input_file: string -> unit (* [get_input_file()] indicates which input file is currently being read. [get_input_file_name()] is the name of this file. *) val get_input_file: unit -> input_file val get_input_file_name: unit -> string (* This fictitious "built-in" input file is used as the origin of the start productions. This technical detail is probably irrelevant entirely. 
*) val builtin_input_file: input_file (* This equality test for input files is used (for instance) when determining which of two productions has greater priority. *) val same_input_file: input_file -> input_file -> bool (* This ordering between input files reflects their ordering on the command line. Ideally, it should NOT be used. *) val compare_input_files: input_file -> input_file -> int (* ---------------------------------------------------------------------------- *) (* The contents of the current input file. *) (* [with_file_contents contents f] records that the contents of the current input file is [contents] while the action [f] runs. The function [f] can then call [chunk] (below) to retrieve certain segments of [contents]. *) val with_file_contents: string -> (unit -> 'a) -> 'a (* [chunk pos1 pos2] extracts a chunk out of the current input file, delimited by the positions [pos1] and [pos2]. *) val chunk: (Lexing.position * Lexing.position) -> string menhir-20210929/src/LALR.ml000066400000000000000000000104401412503066000151160ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module constructs an LALR automaton for the grammar described by the module [Grammar]. *) (* In LALR mode, two LR(1) states are merged as soon as they have the same LR(0) core. *) open Grammar type lr1state = Lr0.lr1state module Run () = struct let () = () (* -------------------------------------------------------------------------- *) (* The LALR automaton has exactly the same states as the LR(0) automaton, up to lookahead information. Therefore, we can use the same state numbers. Thus, the states and the transitions of the LALR automaton are the same as those of the LR(0) automaton! *) (* This means that we have almost nothing to do: in fact, the only thing that we have to do is compute a mapping of LR(0) nodes to LR(1) states. *) (* This computation can be viewed as a fixed point computation. In fact, it is a special kind of fixed point computation: it can be viewed as a forward data flow analysis where the graph is the LR(0) automaton and a property is an LR(1) state. *) type node = int (* A property is an LR(1) state. The function [leq_join] is used to detect stabilization and to merge the contribution of a predecessor state into a successor state. We exploit the fact that [Lr0.union s' s] is physically equal to [s] if [s'] is a subset of [s]. (Yes, we live on the edge.) *) module P = struct type property = lr1state let leq_join = Lr0.union end (* The graph. *) module G = struct type variable = node type property = P.property (* The root nodes are the entry nodes of the LR(0) automaton. The properties associated with these nodes are given by the function [Lr0.start]. *) let foreach_root f = ProductionMap.iter (fun _prod node -> f node (Lr0.start node) ) Lr0.entry (* The edges are the edges of the LR(0) automaton, and the manner in which each edge contributes to the computation of a property is given by the function [Lr0.transition].
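To make this concrete, here is a hedged sketch -- with invented names, not code that appears in Menhir; the actual solver is [Fix.DataFlow], whose interface expects a combined [leq_join] function rather than separate [leq] and [join] -- of the generic worklist loop that such a forward data flow analysis performs:

     let forward_analysis foreach_root foreach_successor leq join =
       let property = Hashtbl.create 1024
       and queue = Queue.create () in
       let update node p =
         match Hashtbl.find_opt property node with
         | Some q when leq p q ->
             () (* nothing new: the property at [node] is unchanged *)
         | previous ->
             let p = match previous with Some q -> join p q | None -> p in
             Hashtbl.replace property node p;
             (* the successors of [node] must be (re)examined *)
             Queue.add node queue
       in
       foreach_root update;
       while not (Queue.is_empty queue) do
         let node = Queue.take queue in
         let p = Hashtbl.find property node in
         foreach_successor node p update
       done;
       fun node -> Hashtbl.find_opt property node

   A solution maps every reachable node to a property and unreachable nodes to nothing, which is why the solution below has type [property option].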
*) let foreach_successor node state f = SymbolMap.iter (fun symbol (successor_node : node) -> let successor_state : lr1state = Lr0.transition symbol state in f successor_node successor_state ) (Lr0.outgoing_edges node) end (* Run the data flow computation. *) module F = Fix.DataFlow.ForIntSegment(Lr0)(P)(G) (* [solution : variable -> property option]. *) (* Because every node is reachable, this function never returns [None]. *) (* -------------------------------------------------------------------------- *) (* Expose the mapping of nodes to LR(1) states. *) let n = Lr0.n let states : lr1state array = Array.init n (fun node -> Option.force (F.solution node)) let state : node -> lr1state = Array.get states (* -------------------------------------------------------------------------- *) (* Expose the entry nodes and transitions of the LALR automaton. *) (* Because we re-use LR(0) node numbers, these are exactly the same as those of the LR(0) automaton! *) let entry : node ProductionMap.t = Lr0.entry let transitions : node -> node SymbolMap.t = Lr0.outgoing_edges (* -------------------------------------------------------------------------- *) (* Expose the bijection between nodes and numbers. *) let number (i : node) : int = i let node (i : int) : node = i (* -------------------------------------------------------------------------- *) end (* Run *) menhir-20210929/src/LALR.mli000066400000000000000000000025411412503066000152710ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module constructs an LALR automaton for the grammar described by the module [Grammar]. *) (* In this construction, precedence declarations are not taken into account. Thus, conflicts are not resolved; no transitions or reductions are removed in order to resolve conflicts. As a result, every node is reachable from some entry node. *) open LR1Sigs module Run () : LR1_AUTOMATON menhir-20210929/src/LR1Canonical.ml000066400000000000000000000061671412503066000166040ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module formulates the construction of the canonical LR(1) automaton as a forward graph traversal. *) type lr0state = Lr0.node type lr1state = Lr0.lr1state open Grammar module Run () = struct (* -------------------------------------------------------------------------- *) (* Give an implicit definition of the graph that we wish to traverse. 
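(Its vertices are LR(1) states; its roots are the start states, one per entry production; and its edges are the LR(1) transitions computed by [Lr0.transition]. The graph is never materialized: it is presented only through the two iteration functions below.)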
*) module G = struct type t = lr1state let foreach_root f = ProductionMap.iter (fun _prod (c : lr0state) -> f (Lr0.start c) ) Lr0.entry let foreach_successor (state : lr1state) f = let symbols = Lr0.outgoing_symbols (Lr0.core state) in List.iter (fun symbol -> let successor = Lr0.transition symbol state in f successor ) symbols end (* -------------------------------------------------------------------------- *) (* Traversing this graph yields a numbering of the LR(1) states in the canonical automaton. *) type node = int include Fix.GraphNumbering.ForOrderedType(Lr0.Lr1StateAsOrderedType)(G) (* This defines [n : int], [encode : lr1state -> node], [decode : node -> lr1state]. *) (* -------------------------------------------------------------------------- *) (* Expose the mapping of nodes to LR(1) states. *) let state : node -> lr1state = decode (* -------------------------------------------------------------------------- *) (* Expose the entry nodes of the LR(1) automaton. *) let entry : node ProductionMap.t = ProductionMap.map (fun (c : lr0state) -> encode (Lr0.start c) ) Lr0.entry (* -------------------------------------------------------------------------- *) (* Expose the transitions of the LR(1) automaton. *) let transition symbol (i : node) : node = encode (Lr0.transition symbol (state i)) let outgoing_symbols (i : node) = Lr0.outgoing_symbols (Lr0.core (state i)) let transitions (i : node) : node SymbolMap.t = SymbolMap.init (fun symbol -> transition symbol i ) (outgoing_symbols i) (* -------------------------------------------------------------------------- *) (* Expose the bijection between nodes and numbers. *) let number (i : node) : int = i let node (i : int) : node = i end menhir-20210929/src/LR1Canonical.mli000066400000000000000000000020671412503066000167500ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module constructs a canonical LR(1) automaton. *) open LR1Sigs module Run () : LR1_AUTOMATON menhir-20210929/src/LR1Pager.ml000066400000000000000000000361371412503066000157530ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module implements the construction of an LR(1) automaton following a version of Pager's algorithm. *) (* This is a complete re-implementation of Pager's algorithm, by François Pottier, on 2020/01/31, following a suggestion by Jacques-Henri Jourdan. Our previous implementation of Pager's algorithm exhibited a rare bug that could create artificial (unexplainable) conflicts; see issue #21 at https://gitlab.inria.fr/fpottier/menhir/issues/21 . *) (* This code can be viewed as a variant of [LR1Canonical]. 
However, there, the algorithm is just a traversal of a well-defined graph, whereas here, the situation is more complex. Indeed, when the algorithm asks the question: * What state should be the target of the transition labeled [symbol] out of the state [state]? the answer *depends on which states have been discovered so far*. In fact, the answer to this question can involve an arbitrary choice, as there are sometimes several acceptable target states. So, the first phase of our algorithm resembles a graph traversal, but the graph's "edges" depend on the history of the traversal itself. *) (* The algorithm is in two phases: a nonstandard graph traversal, followed by a standard graph traversal. *) (* The first traversal is concerned solely with discovering states; it follows edges, but does not actually construct them in memory. This traversal can discover more states than will be present in the automaton at the end: indeed, some states that are discovered along the way can later become unreachable, as they are subsumed by larger states, and the graph's edges change over time. *) (* The second traversal determines which states are actually reachable, therefore present in the final automaton. It numbers these states. *) (* This implementation of Pager's algorithm is more flexible than our previous implementation in a subtle and interesting way. This explains why issue #21 was present and is now fixed. Imagine that the state B is created as a successor of A along some symbol [symbol]. (The previous algorithm would install an edge from A to B.) Now, imagine that, for some reason, the state A is subsumed by a larger state A', and for some independent reason, the state B is subsumed by a larger state B'. (The previous algorithm would grow the two states in place.) Now, the transition out of A' along [symbol] must be examined, and it is quite possible that its target B'' is incompatible with B', that is, the union of B' and B'' has a conflict that does not exist in the canonical automaton. In that case, the previous algorithm had painted itself into a corner; there was no way of detecting or avoiding this artificial conflict. The new algorithm, on the other hand, simply decides that the transition out of A' along [symbol] cannot lead to B' and must instead lead to B'' (or to a state that subsumes B''). Indeed, the new algorithm does not commit early to which states or edges will exist once the dust settles. *) type lr0state = Lr0.node type lr1state = Lr0.lr1state open Grammar module Run () = struct (* -------------------------------------------------------------------------- *) (* Give an implicit definition of the graph that we wish to traverse. *) (* This section is identical to the one found in [LR1Canonical]. *) module G = struct type t = lr1state let foreach_root f = ProductionMap.iter (fun _prod (c : lr0state) -> f (Lr0.start c) ) Lr0.entry let foreach_successor (state : lr1state) f = let symbols = Lr0.outgoing_symbols (Lr0.core state) in List.iter (fun symbol -> let successor = Lr0.transition symbol state in f successor ) symbols end (* -------------------------------------------------------------------------- *) (* Sets of LR(1) states. *) (* We expect these sets to have few elements, most of the time, as the automaton produced by Pager's algorithm typically has only marginally more states than the LALR automaton. So, one could perhaps use lists instead of sets. But it's easy to do the right thing here, so let's do it.
*) module S = struct include Set.Make(Lr0.Lr1StateAsOrderedType) (* [select p s] returns an element of the set [s] that satisfies the predicate [p], if such an element exists. It is deterministic: the least element that satisfies [p], according to the user-defined ordering on elements, is selected. *) exception Found of elt let select (p : elt -> bool) (s : t) : elt option = try iter (fun x -> if p x then raise (Found x) ) s; None with Found x -> Some x end (* -------------------------------------------------------------------------- *) (* Set up a mapping of LR(0) nodes to sets of LR(1) states. This allows us to efficiently find all existing LR(1) states that are core-compatible with a newly discovered LR(1) state. *) (* Within each family [families.(c)], all states have the same core [c], no two states subsume each other, and no two states are compatible with each other. (Two states in the subsumption relation are also compatible, so the latter statement is stronger.) *) let families : S.t array = Array.make Lr0.n S.empty (* -------------------------------------------------------------------------- *) (* The frontier of the first traversal. This is a set of states that are currently scheduled for later examination. *) let frontier : lr1state Stack.t = Stack.create() let schedule state = Stack.push state frontier (* -------------------------------------------------------------------------- *) (* [subsume candidate state] determines whether [candidate] is a subset of [state], in the sense of set-theoretic inclusion of sets of LR(1) items. *) (* [compatible candidate state] determines whether [candidate] and [state] are compatible according to Pager's weak compatibility criterion, modified so as to take end-of-stream conflicts into account. *) (* Since 2011/01/24, both criteria take error compatibility into account. *) let subsume candidate state = Lr0.subsume candidate state && Lr0.error_compatible candidate state let compatible candidate state = Lr0.compatible candidate state && Lr0.eos_compatible candidate state && Lr0.error_compatible candidate state (* In the construction mode [ModeInclusionOnly], the compatibility test is much weakened. In this mode, compatibility is defined as the symmetric closure of subsumption. This means that two states can be merged only if one subsumes the other. Thus, we get an LR(1) automaton where every state is a state that also exists in the canonical automaton. Thus, it is clear that no artificial conflicts can be created by this construction. *) let compatible candidate state = Settings.( match construction_mode with | ModePager -> compatible candidate state | ModeInclusionOnly -> subsume candidate state || subsume state candidate | ModeCanonical | ModeLALR -> (* We cannot be here. *) assert false ) (* -------------------------------------------------------------------------- *) (* Debugging code. *) let debug = false let rec no_two related xs = match xs with | [] -> true | x :: xs -> List.for_all (fun x' -> not (related x x')) xs && no_two related xs let well_formed (family : S.t) : bool = let members = S.elements family in no_two compatible members (* -------------------------------------------------------------------------- *) (* [examine candidate] is invoked whenever some state [state] has just been taken out of the frontier and the algorithm has determined that [candidate] should normally be its successor along a certain symbol. 
*) (* In a standard graph traversal, we would test whether [candidate] has been discovered already: if so (1), do nothing; otherwise (2), schedule it. *) (* Here, case (1) becomes more widely applicable, because we also test whether a state that subsumes [candidate] has been discovered already. Furthermore, a third case appears: if we find that [candidate] is compatible with an existing state, then we construct the union of these two states, and schedule it. *) let rec examine (candidate : lr1state) = (* Find out which already-discovered states are core-compatible with the candidate state. *) let c : lr0state = Lr0.core candidate in let family : S.t = families.(c) in if debug then assert (well_formed family); (* Is the candidate state a subset of an existing state? *) (* One might wish to first test [S.mem candidate family], because this test runs in logarithmic time, whereas the test [S.exists ..] below runs in linear time. However, in most cases, we expect the family to have size zero or one, rarely more, so adding such a redundant test does not seem to be a good idea. *) if S.exists (subsume candidate) family then (* Yes, it is. Then, the candidate state does not need to be created, and the existing state does not even need to be rescheduled. There is nothing to do. *) (* This covers the case where the candidate state has been discovered in the past. Therefore, a state is never scheduled twice. This implies that the algorithm terminates. *) () else fuse c family candidate and fuse (c : lr0state) (family : S.t) (candidate : lr1state) = (* Is the candidate state Pager-compatible with an existing state? *) (* (This covers the case where the candidate subsumes an existing state.) *) match S.select (compatible candidate) family with | Some state -> if debug then assert (not (subsume candidate state)); if debug then assert (compatible candidate state); (* Yes, it is. (The candidate might be compatible with several existing states; in that case, an arbitrary choice is made.) Then, we form the union of the candidate state and the pre-existing state, and we regard it as a new candidate state. *) let candidate : lr1state = Lr0.union candidate state in (* The existing state [state] is subsumed by the new candidate, so it must be removed from the family, so as to maintain the invariant that the family consists of pairwise-incompatible states. *) (* If there were any transitions whose endpoint was [state], then these transitions can safely be redirected towards [candidate], so it is fine to remove [state] from consideration; it definitely will not be part of the final automaton. *) let family = S.remove state family in if debug then assert (well_formed family); (* There might still be more opportunities for fusion, as the candidate state can be compatible with several members of the family. *) fuse c family candidate | None -> (* No, it is not. *) (* The new candidate is added to the family and scheduled. *) let family = S.add candidate family in if debug then assert (well_formed family); families.(c) <- family; schedule candidate (* -------------------------------------------------------------------------- *) (* Carry out the first traversal. First schedule the roots, then repeatedly extract a node out of the frontier and examine its candidate successors.
*) let () = G.foreach_root (fun state -> let c = Lr0.core state in if debug then assert (S.is_empty families.(c)); families.(c) <- S.singleton state; schedule state ) let () = while not (Stack.is_empty frontier) do let state = Stack.pop frontier in G.foreach_successor state examine done (* The first phase is now over. *) (* -------------------------------------------------------------------------- *) (* We are now ready for the second phase. *) (* The array [families] is now read-only. *) (* The function [examine] can be replaced by [redirect], where we are assured that the candidate state must be subsumed by some member of the family. *) let redirect candidate = let c = Lr0.core candidate in let family = families.(c) in match S.select (subsume candidate) family with | Some successor -> successor | None -> (* This cannot happen. Trust me, I have often been wrong before. *) assert false (* The composition of [G.foreach_successor] and [redirect] defines the edges of a new graph [G'], whose vertices form a subset of the vertices that we have discovered during the first phase. *) module G' = struct include G let foreach_successor (state : lr1state) f = G.foreach_successor state (fun candidate -> let successor = redirect candidate in f successor ) end (* Traversing the graph [G'] yields a numbering of its vertices, which are the states of the final LR(1) automaton. *) (* The remainder of this file is identical to [LR1Canonical], except for one use of [redirect] in the definition of [transition]. *) type node = int include Fix.GraphNumbering.ForOrderedType(Lr0.Lr1StateAsOrderedType)(G') (* This defines [n : int], [encode : lr1state -> node], [decode : node -> lr1state]. *) (* -------------------------------------------------------------------------- *) (* Expose the mapping of nodes to LR(1) states. *) let state : node -> lr1state = decode (* -------------------------------------------------------------------------- *) (* Expose the entry nodes of the LR(1) automaton. *) let entry : node ProductionMap.t = ProductionMap.map (fun (c : lr0state) -> encode (Lr0.start c) ) Lr0.entry (* -------------------------------------------------------------------------- *) (* Expose the transitions of the LR(1) automaton. *) let transition symbol (i : node) : node = encode (redirect (Lr0.transition symbol (state i))) (* note the use of [redirect] *) let outgoing_symbols (i : node) = Lr0.outgoing_symbols (Lr0.core (state i)) let transitions (i : node) : node SymbolMap.t = SymbolMap.init (fun symbol -> transition symbol i ) (outgoing_symbols i) (* -------------------------------------------------------------------------- *) (* Expose the bijection between nodes and numbers. *) let number (i : node) : int = i let node (i : int) : node = i end menhir-20210929/src/LR1Pager.mli000066400000000000000000000021561412503066000161160ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module implements the construction of an LR(1) automaton following a version of Pager's algorithm. 
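(Which merges are permitted depends on [Settings.construction_mode]: under [ModePager], Pager's weak compatibility criterion is used; under [ModeInclusionOnly], two states are merged only if one subsumes the other. See the discussion in the implementation.)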
*) open LR1Sigs module Run () : LR1_AUTOMATON menhir-20210929/src/LR1Sigs.ml000066400000000000000000000034161412503066000156140ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* The output signature of several LR(1) automaton construction algorithms. *) module type LR1_AUTOMATON = sig (* An abstract type of nodes, that is, states in the LR(1) automaton. *) type node (* The number of nodes. *) val n: int (* Nodes are numbered from 0 to [n-1]. *) val number: node -> int val node: int -> node (* To each start production corresponds an entry node. *) val entry : node ProductionMap.t (* Each node carries outgoing transitions towards other nodes. (Note to implementors of the signature [LR1_AUTOMATON]: there is no need to memoize this function; this is done a posteriori, in [Lr1].) *) val transitions: node -> node SymbolMap.t (* Each node represents an LR(1) state, that is, a set of LR(1) items. *) val state: node -> Lr0.lr1state end menhir-20210929/src/LRijkstra.ml000066400000000000000000000267211412503066000162730ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module implements [--list-errors]. Its purpose is to find, for each pair of a state [s] and a terminal symbol [z] such that looking at [z] in state [s] causes an error, a minimal path (starting in some initial state) that actually triggers this error. *) (* This is potentially useful for grammar designers who wish to better understand the properties of their grammar, or who wish to produce a list of all possible syntax errors (or, at least, one syntax error in each automaton state where an error may occur). *) (* The problem seems rather tricky. One might think that it suffices to compute shortest paths in the automaton, and to use [Analysis.minimal] to replace each non-terminal symbol in a path with a minimal word that this symbol generates. One can indeed do so, but this yields only a lower bound on the actual shortest path to the error at [s, z]. Indeed, several difficulties arise, including the fact that reductions are subject to a lookahead hypothesis; the fact that some states have a default reduction, hence will never trigger an error; the fact that conflict resolution removes some (shift or reduce) actions, hence may suppress the shortest path. *) (* ------------------------------------------------------------------------ *) (* To delay the side effects performed by this module, we wrap everything in a big functor. The functor also serves to pass verbosity parameters. *) module Run (X : sig (* If [verbose] is set, produce various messages on [stderr].
*) val verbose: bool (* If [statistics] is defined, it is interpreted as the name of a file to which one line of statistics is appended. *) val statistics: string option end) = struct open Grammar open Default (* ------------------------------------------------------------------------ *) (* Record our start time. *) let now () = match X.statistics with | Some _ -> Unix.((times()).tms_utime) | None -> 0.0 let start = now() (* ------------------------------------------------------------------------ *) (* Run the core reachability analysis, which finds out exactly under what conditions each nonterminal transition in the automaton can be taken. *) module Core = LRijkstraCore.Run(X) module W = Core.W (* ------------------------------------------------------------------------ *) (* The following code validates the fact that an error can be triggered in state [s'] by beginning at the start symbol [nt] and reading the sequence of terminal symbols [w]. We use this for debugging purposes. Furthermore, this gives us a list of spurious reductions, which we use to produce a comment. *) let fail msg = Printf.eprintf "LRijkstra: internal error: %s.\n%!" msg; exit 1 let fail format = Printf.ksprintf fail format let validate nt s' w : ReferenceInterpreter.target = let open ReferenceInterpreter in match check_error_path false nt (W.elements w) with | OInputReadPastEnd -> fail "input was read past its end" | OInputNotFullyConsumed -> fail "input was not fully consumed" | OUnexpectedAccept -> fail "input was unexpectedly accepted" | OK ((state, _) as target) -> if Lr1.Node.compare state s' <> 0 then fail "error occurred in state %d instead of %d" (Lr1.number state) (Lr1.number s') else target (* ------------------------------------------------------------------------ *) (* We now wish to determine, given a state [s'] and a terminal symbol [z], a minimal path that takes us from some entry state to state [s'] with [z] as the next (unconsumed) symbol. *) (* This can be formulated as a search for a shortest path in a graph. The graph is not just the automaton, though. It is a (much) larger graph whose vertices are pairs [s, z] and whose edges are obtained by querying the module [E] above. For this purpose, we use Dijkstra's algorithm, unmodified. Experiments show that the running time of this phase is typically 10x shorter than the running time of the main loop above. *) module A = Astar.Make(struct (* A vertex is a pair [s, z], where [z] is a real terminal symbol. *) type node = Lr1.node * Terminal.t let equal (s'1, z1) (s'2, z2) = Lr1.Node.compare s'1 s'2 = 0 && Terminal.compare z1 z2 = 0 let hash (s, z) = Hashtbl.hash (Lr1.number s, z) (* An edge is labeled with a word. *) type label = W.word (* We search forward from every [s, z], where [s] is an initial state. *) let sources f = Terminal.iter_real (fun z -> ProductionMap.iter (fun _ s -> f (s, z) ) Lr1.entry ) (* The successors of [s, z] are defined as follows. *) let successors (s, z) edge = assert (Terminal.real z); (* For every transition out of [s], labeled [sym], leading to [s']... *) Lr1.transitions s |> SymbolMap.iter (fun sym s' -> match sym with | Symbol.T t -> if Terminal.equal z t then (* If [sym] is the terminal symbol [z], then this transition matches our lookahead assumption, so we can take it. For every [z'], we have an edge to [s', z'], labeled with the singleton word [z]. 
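(Such an edge carries weight 1: the length of the singleton word [z], as reflected by the second argument of [edge] below.)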
*) let w = W.singleton z in Terminal.iter_real (fun z' -> edge w 1 (s', z') ) | Symbol.N nt -> (* If [sym] is a nonterminal symbol [nt], then we query [E] in order to find out which (minimal) words [w] allow us to take this transition. We must again try every [z'], and must respect the constraint that the first symbol of the word [w.z'] is [z]. For every [z'] and [w] that fulfill these requirements, we have an edge to [s', z'], labeled with the word [w]. *) Core.query s nt z (fun w z' -> edge w (W.length w) (s', z') ) ) (* Algorithm A*, used with a zero estimate, is Dijkstra's algorithm. We have experimented with a non-zero estimate, but the performance increase was minimal. *) let estimate _ = 0 end) (* ------------------------------------------------------------------------ *) (* [explored] counts how many graph nodes we have discovered during the search. *) let explored = ref 0 (* We wish to store a set of triples [nt, w, (s', spurious)], meaning that an error can be triggered in state [s'] by beginning in the initial state that corresponds to [nt] and by reading the sequence of terminal symbols [w]. We wish to store at most one such triple for every state [s'], so we organize the data as a set [domain] of states [s'] and a list [data] of triples [nt, w, (s', spurious)]. The list [spurious] documents the spurious reductions that are performed by the parser at the end. *) (* We could print this data as we go, which would naturally result in sorting the output by increasing word sizes. However, it seems preferable to sort the sentences lexicographically, so that similar sentences end up close to one another. (We could also sort them by state number. The result would be roughly similar.) This is why we store a list of triples and sort it before printing it out. *) let domain = ref Lr1.NodeSet.empty let data : (Nonterminal.t * W.word * ReferenceInterpreter.target) list ref = ref [] (* The set [reachable] stores every reachable state (regardless of whether an error can be triggered in that state). *) let reachable = ref Lr1.NodeSet.empty (* Perform the forward search. *) let _, _ = A.search (fun ((s', z), path) -> incr explored; reachable := Lr1.NodeSet.add s' !reachable; (* If [z] causes an error in state [s'] and this is the first time we are able to trigger an error in this state, ... *) if causes_an_error s' z && not (Lr1.NodeSet.mem s' !domain) then begin (* Reconstruct the initial state [s] and the word [w] that lead to this error. *) let (s, _), ws = A.reverse path in let w = List.fold_right W.append ws (W.singleton z) in (* Check that the reference interpreter confirms our finding. At the same time, compute a list of spurious reductions. *) let nt = Lr1.nt_of_entry s in let target = validate nt s' w in (* Store this new data. *) domain := Lr1.NodeSet.add s' !domain; data := (nt, w, target) :: !data end ) (* Sort and output the data. *) let () = !data |> List.fast_sort (fun (nt1, w1, _) (nt2, w2, _) -> let c = Nonterminal.compare nt1 nt2 in if c <> 0 then c else W.compare w2 w1 ) |> List.map (fun (nt, w, target) -> (nt, W.elements w, target)) |> List.iter Interpret.print_messages_item (* ------------------------------------------------------------------------ *) (* Verbosity. 
*) let max_heap_size = if X.verbose || X.statistics <> None then let stat = Gc.quick_stat() in (stat.Gc.top_heap_words * (Sys.word_size / 8) / 1024 / 1024) else 0 (* dummy *) let () = Time.tick "Forward search"; if X.verbose then begin Printf.eprintf "%d graph nodes explored by forward search.\n\ %d out of %d states are reachable.\n\ Found %d states where an error can occur.\n%!" !explored (Lr1.NodeSet.cardinal !reachable) Lr1.n (Lr1.NodeSet.cardinal !domain) end (* ------------------------------------------------------------------------ *) (* If requested by the client, write one line of statistics to a .csv file. *) let stop = now() let () = X.statistics |> Option.iter (fun filename -> let c = open_out_gen [ Open_creat; Open_append; Open_text ] 0o644 filename in Printf.fprintf c "%s,%d,%d,%d,%d,%d,%d,%d,%.2f,%d\n%!" (* Grammar name. *) Settings.base (* Number of terminal symbols. *) Terminal.n (* Number of nonterminal symbols. *) Nonterminal.n (* Grammar size (not counting the error productions). *) begin Production.foldx (fun prod accu -> let rhs = Production.rhs prod in if List.mem (Symbol.T Terminal.error) (Array.to_list rhs) then accu else accu + Array.length rhs ) 0 end (* Automaton size (i.e., number of states). *) Lr1.n (* Total trie size. *) Core.total_trie_size (* Size of [F]. *) Core.facts (* Size of [E]. *) Core.edge_facts (* Elapsed user time, in seconds. *) (stop -. start) (* Max heap size, in megabytes. *) max_heap_size ; close_out c ) (* ------------------------------------------------------------------------ *) end menhir-20210929/src/LRijkstra.mli000066400000000000000000000041171412503066000164450ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module implements [--list-errors]. Its purpose is to find, for each pair of a state [s] and a terminal symbol [z] such that looking at [z] in state [s] causes an error, a minimal path (starting in some initial state) that actually triggers this error. *) (* In this analysis, we explicitly ignore the [error] token. (We display a warning if the grammar uses this token.) Thus, we disregard any reductions or transitions that take place when the lookahead symbol is [error]. As a result, any state whose incoming symbol is [error] is found unreachable. It would be too complicated to have to create a first error in order to be able to take certain transitions or drop certain parts of the input. *) module Run (X : sig (* If [verbose] is set, produce various messages on [stderr]. *) val verbose: bool (* If [statistics] is defined, it is interpreted as the name of a file to which one line of statistics is appended. *) val statistics: string option end) : sig (* The result of this analysis is a [.messages] file. It is written to the standard output channel. No result is returned. 
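A hedged usage sketch (the actual call site in Menhir's driver may differ): [let module R = LRijkstra.Run (struct let verbose = false let statistics = None end) in ()].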
*) end menhir-20210929/src/LRijkstraCore.ml000066400000000000000000000765251412503066000171210ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* As announced in our specification, we ignore the [error] token. We never work with the terminal symbol [#] either. This symbol never appears in the maps returned by [Lr1.transitions] and [Lr1.reductions]. Thus, in principle, we work with real terminal symbols only. However, we encode [any] as [#] -- see below. *) (* NOTE: Because the performance impact of the assertions in this file is about 10%, they are turned off by default. Change the value of [debug] to [true] if you wish to enable assertions. *) let debug = false open Grammar open Default (* ------------------------------------------------------------------------ *) (* We introduce a pseudo-terminal symbol [any]. It is used in several places later on, in particular in the [lookahead] field of a fact, to encode the absence of a lookahead hypothesis -- i.e., any terminal symbol will do. *) (* We choose to encode [any] as [#]. There is no risk of confusion, since we do not use [#] anywhere. Thus, the assertion [Terminal.real z] implies [z <> any]. *) let any = Terminal.sharp (* [foreach_terminal f] applies the function [f] to every terminal symbol in turn, except [error] and [#]. *) let foreach_terminal = Terminal.iter_real (* [foreach_terminal_not_causing_an_error s f] applies the function [f] to every terminal symbol [z] such that [causes_an_error s z] is false. This could be implemented in a naive manner using [foreach_terminal] and [causes_an_error]. This implementation is significantly more efficient. *) let foreach_terminal_not_causing_an_error s f = match has_default_reduction s with | Some _ -> (* There is a default reduction. No symbol causes an error. *) foreach_terminal f | None -> (* Enumerate every terminal symbol [z] for which there is a reduction. *) TerminalMap.iter (fun z _ -> (* A reduction on [#] is always a default reduction. (See [lr1.ml].) *) if debug then assert (not (Terminal.equal z Terminal.sharp)); if Terminal.non_error z then f z ) (Lr1.reductions s); (* Enumerate every terminal symbol [z] for which there is a transition. *) SymbolMap.iter (fun sym _ -> match sym with | Symbol.T z -> if debug then assert (not (Terminal.equal z Terminal.sharp)); if Terminal.non_error z then f z | Symbol.N _ -> () ) (Lr1.transitions s) (* Let us say a state [s] is solid if its incoming symbol is a terminal symbol (or if it has no incoming symbol at all, i.e., it is an initial state). It is fragile if its incoming symbol is a non-terminal symbol. *) let is_solid s = match Lr1.incoming_symbol s with | None | Some (Symbol.T _) -> true | Some (Symbol.N _) -> false (* ------------------------------------------------------------------------ *) (* To delay the side effects performed by this module, we wrap everything in in a big functor. The functor also serves to pass verbosity parameters. *) module Run (X : sig (* If [verbose] is set, produce various messages on [stderr]. 
*) val verbose: bool end) = struct (* ------------------------------------------------------------------------ *) (* Because of our encoding of terminal symbols as 8-bit characters, this algorithm supports at most 256 terminal symbols. *) let () = if Terminal.n > 256 then Error.error [] "the reachability analysis supports at most 256 terminal symbols.\n\ The grammar has %d terminal symbols." Terminal.n (* ------------------------------------------------------------------------ *) (* Produce a warning if the grammar uses the [error] pseudo-token. *) let () = if grammar_uses_error_token then Error.warning [] "The reachability analysis ignores all productions that involve the error token." (* ------------------------------------------------------------------------ *) (* Build a module that represents words as (hash-consed) strings. Note: this functor application has a side effect (it allocates memory, and more importantly, it may fail). *) module W = Terminal.Word(struct end) (* ------------------------------------------------------------------------ *) (* Instantiate [Trie]. This allocates fresh mutable state, but otherwise has no effect. The construction of the tries actually takes place when [Trie.stars] is invoked below. *) module Trie = Trie.Make(struct end) (* ------------------------------------------------------------------------ *) (* The main algorithm, [LRijkstra], accumulates facts. A fact is a triple of a [position] (that is, a sub-trie), a [word], and a [lookahead] assumption. Such a fact means that this [position] can be reached, from the source state [Trie.source position], by consuming [word], under the assumption that the next input symbol is [lookahead]. *) (* We allow [lookahead] to be [any] so as to indicate that this fact does not have a lookahead assumption. *) (* type fact = { position: Trie.trie; word: W.word; lookahead: Terminal.t (* may be [any] *) } *) (* To save memory (and therefore time), we encode a fact in a single OCaml integer value. This is made possible by the fact that tries, words, and terminal symbols are represented as (or can be encoded as) integers. This admittedly horrible hack allows us to save roughly a factor of 2 in space, and to gain 10% in time. *) type fact = int let dummy : fact = -1 (* should never be accessed! *) (* Encoding and decoding facts. *) (* We encode [position|word|lookahead] in a single word of memory. *) (* The lookahead symbol fits in 8 bits. *) (* In the largest grammars that we have seen, the number of unique words is about 3.10^5, so a word should fit in about 19 bits (2^19 = 524288). In the largest grammars that we have seen, the total star size is about 64000, so a trie should fit in about 17 bits (2^17 = 131072). *) (* On a 64-bit machine, we have ample space in a 63-bit word! We allocate 30 bits for [word] and the rest (i.e., 25 bits) for [position]. *) (* On a 32-bit machine, we are a bit more cramped! In Menhir's own fancy-parser, the number of terminal symbols is 27, the number of unique words is 566, and the total star size is 546. We allocate 12 bits for [word] and 11 bits for [position]. This is better than refusing to work altogether, but still not great. A more satisfactory approach might be to revert to heap allocation of facts when in 32-bit mode, but that would make the code somewhat ugly. 
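As a worked example -- using the 64-bit layout defined just below, with arbitrarily chosen values -- a fact whose position has identity 5, whose word is encoded as 17, and whose lookahead symbol is numbered 3 is packed as [((5 lsl 30) lor 17) lsl 8 lor 3]. Decoding proceeds in reverse: the low 8 bits yield the lookahead (3), the next 30 bits yield the word (17), and the remaining high bits yield the position (5).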
*) let w_lookahead = 8 let w_word = if Sys.word_size < 64 then 12 else 30 let w_position = Sys.word_size - 1 - (w_word + w_lookahead) (* 25, on a 64-bit machine *) let identity (fact : fact) : int = if debug then assert (fact <> dummy); fact lsr (w_word + w_lookahead) let position (fact : fact) : Trie.trie = if debug then assert (fact <> dummy); Trie.decode (identity fact) let word (fact : fact) : W.word = if debug then assert (fact <> dummy); (fact lsr w_lookahead) land (1 lsl w_word - 1) let lookahead (fact : fact) : Terminal.t = Terminal.i2t (fact land (1 lsl w_lookahead - 1)) let mkfact position (word : W.word) lookahead = let position : int = Trie.encode position and word : int = word and lookahead : int = Terminal.t2i lookahead in if debug then begin assert (0 <= position && 0 <= word && 0 <= lookahead); assert (lookahead < 1 lsl w_lookahead); end; if position < 1 lsl w_position && word < 1 lsl w_word then (* [lsl] binds tighter than [lor] *) (position lsl w_word lor word) lsl w_lookahead lor lookahead else let advice = if Sys.word_size < 64 then "Please use a 64-bit machine." else "Please report this error to Menhir's developers." in Error.error [] "an internal limit was exceeded.\n\ Sys.word_size = %d. Position = %d. Word = %d.\n\ %s%!" Sys.word_size position word advice let mkfact p w l = let fact = mkfact p w l in if debug then begin assert (word fact == w); (* round-trip property *) assert (lookahead fact == l); (* round-trip property *) assert (position fact == p); (* round-trip property *) end; fact (* Two invariants reduce the number of facts that we consider: 1. If [lookahead] is a real terminal symbol [z] (i.e., not [any]), then [z] does not cause an error in the [current] state. It would be useless to consider a fact that violates this property; this cannot possibly lead to a successful reduction. In practice, this refinement allows reducing the number of facts that go through the queue by a factor of two. 2. [lookahead] is [any] iff the [current] state is solid. This sounds rather reasonable (when a state is entered by shifting, it is entered regardless of which symbol follows) and simplifies the implementation of the sub-module [F]. *) let invariant1 position _word lookahead = let current = Trie.current position in lookahead = any || not (causes_an_error current lookahead) let invariant2 position _word lookahead = let current = Trie.current position in (lookahead = any) = is_solid current (* [compatible z a] checks whether the terminal symbol [a] satisfies the lookahead assumption [z] -- which can be [any]. *) let compatible z a = if debug then begin assert (Terminal.non_error z); assert (Terminal.real a); end; z = any || z = a (* ------------------------------------------------------------------------ *) (* As in Dijkstra's algorithm, a priority queue contains the facts that await examination. The length of [word fact] serves as the priority of a fact. This guarantees that we discover shortest paths. (We never insert into the queue a fact whose priority is less than the priority of the last fact extracted out of the queue.) *) (* [LowIntegerPriorityQueue] offers very efficient operations (essentially constant time, for a small constant). It exploits the fact that priorities are low nonnegative integers. *) module Q = LowIntegerPriorityQueue let q = Q.create dummy (* In principle, there is no need to insert the fact into the queue if [F] already stores a comparable fact. We could perform this test in [enqueue]. 
However, a few experiments suggest that this is not worthwhile. The running time increases (because membership in [F] is tested twice, upon inserting and upon extracting) and the memory consumption does not seem to go down significantly. *) let enqueue position word lookahead = (* [lookahead] can be [any], but cannot be [error] *) if debug then begin assert (Terminal.non_error lookahead); assert (invariant1 position word lookahead); assert (invariant2 position word lookahead); end; (* The length of [word] serves as the priority of this fact. *) let priority = W.length word in (* Encode and enqueue this fact. *) Q.add q (mkfact position word lookahead) priority (* ------------------------------------------------------------------------ *) (* Construct the [star] of every state [s]. Initialize the priority queue. *) let () = (* For every state [s], if the trie rooted at [s] is nontrivial, ... *) Trie.stars (fun s position -> (* ...then insert an initial fact into the priority queue. *) (* In order to respect invariants 1 and 2, we must distinguish two cases. If [s] is solid, then we insert a single fact, whose lookahead assumption is [any]. Otherwise, we must insert one initial fact for every terminal symbol [z] that does not cause an error in state [s]. *) let word = W.epsilon in if is_solid s then enqueue position word any else foreach_terminal_not_causing_an_error s (fun z -> enqueue position word z ) ); if X.verbose then Trie.verbose() (* ------------------------------------------------------------------------ *) (* The module [F] maintains a set of known facts. *) (* Three aspects of a fact are of particular interest: - its position [position], given by [position fact]; - its first symbol [a], given by [W.first (word fact) (lookahead fact)]; - its lookahead assumption [z], given by [lookahead fact]. For every triple of [position], [a], and [z], we store at most one fact (whose word has minimal length). Indeed, we are not interested in keeping track of several words that produce the same effect. Only the shortest such word is of interest. Thus, the total number of facts accumulated by the algorithm is at most [T.n^2], where [T] is the total size of the tries that we have constructed, and [n] is the number of terminal symbols. (This number can be quite large. [T] can be in the tens of thousands, and [n] can be over one hundred. These figures lead to a theoretical upper bound of 100M. In practice, for T=25K and n=108, we observe that the algorithm gathers about 7M facts.) *) module F : sig (* [register fact] registers the fact [fact]. It returns [true] if this fact is new, i.e., no fact concerning the same triple of [position], [a], and [z] was previously known. *) val register: fact -> bool (* [query current z f] enumerates all known facts whose current state is [current] and whose lookahead assumption is compatible with [z]. The symbol [z] must be a real terminal symbol, i.e., cannot be [any]. *) val query: Lr1.node -> Terminal.t -> (fact -> unit) -> unit (* [size()] returns the number of facts currently stored in the set. *) val size: unit -> int (* [verbose()] outputs debugging & performance information. *) val verbose: unit -> unit end = struct (* We need to query the set of facts in two ways. In [register], we must test whether a proposed triple of [position], [a], [z] already appears in the set. In [query], we must find all facts that match a pair [current, z], where [current] is a state.
(Note that [position] determines [current], but the converse is not true: a position contains more information besides the current state.) To address these needs, we use a two-level table. The first level is a matrix indexed by [current] and [z]. At the second level, we find sets of facts, where two facts are considered equal if they have the same triple of [position], [a], and [z]. In fact, we know at this level that all facts have the same [z] component, so only [position] and [a] are compared. Because our facts satisfy invariant 2, [z] is [any] if and only if the state [current] is solid. This means that we are wasting quite a lot of space in the matrix (for a solid state, the whole line is empty, except for the [any] column). *) (* The level-2 sets. *) module M = MySet.Make(struct type t = fact let compare fact1 fact2 = if debug then assert (lookahead fact1 = lookahead fact2); (* Compare the two positions first. This can be done without going through [Trie.decode], by directly comparing the two integer identities. *) let c = Generic.compare (identity fact1) (identity fact2) in if debug then assert (c = Trie.compare (position fact1) (position fact2)); if c <> 0 then c else let z = lookahead fact1 in let a1 = W.first (word fact1) z and a2 = W.first (word fact2) z in (* note: [a1] and [a2] can be [any] here *) Terminal.compare a1 a2 end) (* The level-1 matrix. *) let table = Array.make (Lr1.n * Terminal.n) M.empty let index current z = Terminal.n * (Lr1.number current) + Terminal.t2i z let count = ref 0 let register fact = let current = Trie.current (position fact) in let z = lookahead fact in let i = index current z in let m = table.(i) in (* We crucially rely on the fact that [M.add] guarantees not to change the set if an ``equal'' fact already exists. Thus, a later, longer path is ignored in favor of an earlier, shorter path. *) let m' = M.add fact m in m != m' && begin incr count; table.(i) <- m'; true end let query current z f = if debug then assert (not (Terminal.equal z any)); (* If the state [current] is solid then the facts that concern it are stored in the column [any], and all of them are compatible with [z]. Otherwise, they are stored in all columns except [any], and only those stored in the column [z] are compatible with [z]. *) let i = index current (if is_solid current then any else z) in let m = table.(i) in M.iter f m let size () = !count let verbose () = Printf.eprintf "F stores %d facts.\n%!" (size()) end (* ------------------------------------------------------------------------ *) (* The module [E] is in charge of recording the non-terminal edges that we have discovered, or more precisely, the conditions under which these edges can be taken. It maintains a set of quadruples [s, nt, w, z], where such a quadruple means that in the state [s], the outgoing edge labeled [nt] can be taken by consuming the word [w], under the assumption that the next symbol is [z]. Again, the terminal symbol [a], given by [W.first w z], plays a role. For each quadruple [s, nt, a, z], we store at most one quadruple [s, nt, w, z]. Thus, internally, we maintain a mapping of [s, nt, a, z] to [w]. For greater simplicity, we do not allow [z] to be [any] in [register] or [query]. Allowing it would complicate things significantly, it seems. *) module E : sig (* [register s nt w z] records that, in state [s], the outgoing edge labeled [nt] can be taken by consuming the word [w], if the next symbol is [z]. 
It returns [true] if this information is new, i.e., if the underlying quadruple [s, nt, a, z] is new. The symbol [z] cannot be [any]. *) val register: Lr1.node -> Nonterminal.t -> W.word -> Terminal.t -> bool (* [query s nt a foreach] enumerates all words [w] and all real symbols [z] such that, in state [s], the outgoing edge labeled [nt] can be taken by consuming the word [w], under the assumption that the next symbol is [z], and the first symbol of the word [w.z] is [a]. The symbol [a] can be [any]. The function [foreach] can be either [foreach_terminal] or of the form [foreach_terminal_not_causing_an_error _]. It limits the symbols [z] that are considered. *) val query: Lr1.node -> Nonterminal.t -> Terminal.t -> (* foreach: *) ((Terminal.t -> unit) -> unit) -> (W.word -> Terminal.t -> unit) -> unit (* [size()] returns the number of edges currently stored in the set. *) val size: unit -> int (* [verbose()] outputs debugging & performance information. *) val verbose: unit -> unit end = struct (* At a high level, we must implement a mapping of [s, nt, a, z] to [w]. In practice, we can implement this specification using any combination of arrays, hash tables, balanced binary trees, and perfect hashing (i.e., packing several of [s], [nt], [a], [z] in one word.) Here, we choose to use an array, indexed by [s], of hash tables, indexed by a key that packs [nt], [a], and [z] in one word. According to a quick experiment, the final population of the hash table [table.(index s)] seems to be roughly [Terminal.n * Trie.size s]. We note that using an initial capacity of 0 and relying on the hash table's resizing mechanism has a significant cost, which is why we try to guess a good initial capacity. *) module H = Hashtbl let table = Array.init Lr1.n (fun i -> let size = Trie.size i in H.create (if size = 1 then 0 else Terminal.n * size) ) let index s = Lr1.number s let pack nt a z : int = (* We rely on the fact that we have at most 256 terminal symbols. *) (Nonterminal.n2i nt lsl 16) lor (Terminal.t2i a lsl 8) lor (Terminal.t2i z) let count = ref 0 let register s nt w z = if debug then assert (Terminal.real z); let i = index s in let m = table.(i) in let a = W.first w z in (* Note that looking at [a] in state [s] cannot cause an error. *) if debug then assert (not (causes_an_error s a)); let key = pack nt a z in if H.mem m key then false else begin incr count; H.add m key w; true end let rec query s nt a foreach f = if Terminal.equal a any then begin (* If [a] is [any], we query the table for every real symbol [a]. We can limit ourselves to symbols that do not cause an error in state [s]. Those that do certainly do not have an entry; see the assertion in [register] above. *) foreach_terminal_not_causing_an_error s (fun a -> query s nt a foreach f ) end else let i = index s in let m = table.(i) in foreach (fun z -> if debug then assert (Terminal.real z); let key = pack nt a z in match H.find m key with | w -> f w z | exception Not_found -> () ) let size () = !count let verbose () = Printf.eprintf "E stores %d edges.\n%!" (size()) end (* ------------------------------------------------------------------------ *) (* [new_edge s nt w z] is invoked when we discover that in the state [s], the outgoing edge labeled [nt] can be taken by consuming the word [w], under the assumption that the next symbol is [z]. We check whether this quadruple already exists in the set [E]. 
If not, then we add it, and we compute its consequences, in the form of new facts, which we insert into the priority queue for later examination. *) let new_edge s nt w z = if debug then assert (Terminal.real z); if E.register s nt w z then let sym = Symbol.N nt in (* Query [F] for existing facts which could be extended by following this newly discovered edge. They must be facts whose current state is [s] and whose lookahead assumption is compatible with [a]. For each such fact, ... *) F.query s (W.first w z) (fun fact -> if debug then assert (compatible (lookahead fact) (W.first w z)); (* ... try to take one step in the trie along an edge labeled [nt]. *) match Trie.step sym (position fact) with | position -> (* This takes us to a new state whose incoming symbol is [nt]. Hence, this state is not solid. In order to satisfy invariant 2, we must create a fact whose lookahead assumption is not [any]. That's fine, since our lookahead assumption is [z]. In order to satisfy invariant 1, we must check that [z] does not cause an error in this state. *) if debug then assert (not (is_solid (Trie.current position))); if not (causes_an_error (Trie.current position) z) then let word = W.append (word fact) w in enqueue position word z | exception Not_found -> (* Could not take a step in the trie. This means this branch leads nowhere of interest, and was pruned when the trie was constructed. *) () ) (* ------------------------------------------------------------------------ *) (* [new_fact fact] is invoked when we discover a new fact (i.e., one that was not previously known). It studies the consequences of this fact. These consequences are of two kinds: - As in Dijkstra's algorithm, the new fact can be viewed as a newly discovered vertex. We study its (currently known) outgoing edges, and enqueue new facts in the priority queue. - Sometimes, a fact can also be viewed as a newly discovered edge. This is the case when the word that took us from [source] to [current] represents a production of the grammar and [current] is willing to reduce this production. We record the existence of this edge, and re-inspect any previously discovered vertices which are interested in this outgoing edge. *) let new_fact fact = (* Throughout this rather long function, there is just one [fact]. Let's name its components right now, so as to avoid accessing them several times. (That could be costly, as it requires decoding the fact.) *) let position = position fact and lookahead = lookahead fact and word = word fact in let source = Trie.source position and current = Trie.current position in (* 1. View [fact] as a vertex. Examine the transitions out of [current]. For every transition labeled by a symbol [sym] and into a state [target], ... *) Lr1.transitions current |> SymbolMap.iter (fun sym target -> (* ... try to follow this transition in the trie [position], down to a child which we call [child]. *) match Trie.step sym position, sym with | exception Not_found -> (* Could not take a step in the trie. This means this transition leads nowhere of interest. *) () | child, Symbol.T t -> (* 1a. The transition exists in the trie, and [sym] is in fact a terminal symbol [t]. We note that [t] cannot be the [error] token, because the trie does not have any edges labeled [error].
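As an illustration (an editor's note): if [t] is, say, a semicolon token and the lookahead assumption [lookahead] is compatible with it, then the code below forms the word [W.append word (W.singleton t)], whose length is [W.length word + 1], and enqueues a new fact at that priority.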
*) if debug then begin assert (Lr1.Node.compare (Trie.current child) target = 0); assert (is_solid target); assert (Terminal.non_error t); end; (* If the lookahead assumption [lookahead] is compatible with [t], then we derive a new fact, where one more edge has been taken, and enqueue this new fact for later examination. *) (* The state [target] is solid, i.e., its incoming symbol is terminal. This state is always entered without consideration for the next lookahead symbol. Thus, we can use [any] as the lookahead assumption in the new fact that we produce. If we did not have [any], we would have to produce one fact for every possible lookahead symbol. *) if compatible lookahead t then let word = W.append word (W.singleton t) in enqueue child word any | child, Symbol.N nt -> (* 1b. The transition exists in the trie, and [sym] is in fact a nonterminal symbol [nt]. *) if debug then begin assert (Lr1.Node.compare (Trie.current child) target = 0); assert (not (is_solid target)); end; (* We need to know how this nonterminal edge can be taken. We query [E] for a word [w] that allows us to take this edge. In general, the answer depends on the terminal symbol [z] that comes *after* this word: we try all such symbols. We must make sure that the first symbol of the word [w.z] satisfies the lookahead assumption [lookahead]; this is ensured by passing this information to [E.query]. *) (* It could be the case that, due to a default reduction, the answer to our query does not depend on [z], and we are wasting work. However, allowing [z] to be [any] in [E.query], and taking advantage of this to increase performance, seems difficult. *) let foreach = foreach_terminal_not_causing_an_error target in E.query current nt lookahead foreach (fun w z -> if debug then assert (compatible lookahead (W.first w z)); let word = W.append word w in enqueue child word z ) ); (* 2. View [fact] as a possible edge. This is possible if the path from [source] to the [current] state represents a production [prod] and [current] is willing to reduce this production. Then, reducing [prod] takes us all the way back to [source]. Thus, this production gives rise to an edge labeled [nt] -- the left-hand side of [prod] -- out of [source]. *) let z = lookahead in if not (Terminal.equal z any) then begin (* 2a. The lookahead assumption [z] is a real terminal symbol. We check whether [current] is willing to reduce some production [prod] on [z], and whether the sub-trie [position] accepts [prod], which means that this reduction takes us back to the root of the trie. If so, we have discovered a new edge. *) match has_reduction current z with | Some prod when Trie.accepts prod position -> new_edge source (Production.nt prod) word z | _ -> () end else begin (* 2b. The lookahead assumption is [any]. We must consider every pair [prod, z] such that the [current] state can reduce [prod] on [z] and [position] accepts [prod]. *) match has_default_reduction current with | Some (prod, _) -> if Trie.accepts prod position then (* [new_edge] does not accept [any] as its 4th parameter, so we must iterate over all terminal symbols. *) foreach_terminal (fun z -> new_edge source (Production.nt prod) word z ) | None -> TerminalMap.iter (fun z prods -> if Terminal.non_error z then let prod = Misc.single prods in if Trie.accepts prod position then new_edge source (Production.nt prod) word z ) (Lr1.reductions current) end (* ------------------------------------------------------------------------ *) (* The main loop of the algorithm. 
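In outline (an editor's sketch), this is a Dijkstra-style worklist loop: repeatedly extract a minimum-priority fact out of [q]; if [F.register] reports that it is new, pass it to [new_fact], which may enqueue further facts; stop when [q] is exhausted.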
*) (* [level] is the length of [word fact] for the facts that we are examining at the moment. [extracted] counts how many facts we have extracted out of the priority queue. [considered] counts how many of these were found to be new, and subsequently passed to [new_fact]. *) let level, extracted, considered = ref 0, ref 0, ref 0 let done_with_level () = Printf.eprintf "Done with level %d.\n" !level; W.verbose(); F.verbose(); E.verbose(); Printf.eprintf "Q stores %d facts.\n" (Q.cardinal q); Printf.eprintf "%d facts extracted out of Q, of which %d considered.\n%!" !extracted !considered let () = Q.repeat q (fun fact -> incr extracted; if F.register fact then begin if X.verbose && W.length (word fact) > !level then begin done_with_level(); level := W.length (word fact); end; incr considered; new_fact fact end ); if X.verbose then done_with_level(); Time.tick "Running LRijkstra" (* ------------------------------------------------------------------------ *) (* We are done. Expose accessor functions. *) (* We expose [E.query], but simplify its interface by specializing it with [foreach_terminal]. We also restrict it to the case where [a] is real. *) let query s nt a = if debug then assert (Terminal.real a); E.query s nt a foreach_terminal (* Expose some numbers. *) let facts, edge_facts = F.size(), E.size() let total_trie_size = Trie.total_size() end menhir-20210929/src/LRijkstraCore.mli000066400000000000000000000057131412503066000172610ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* This is the core of the reachability analysis. After the automaton has been constructed, this (expensive) analysis determines exactly under which conditions each nonterminal edge in the automaton can be taken. This information can then be used to determine how to reach certain states in the automaton; see, e.g., [LRijkstra]. *) (* In this analysis, we explicitly ignore the [error] token. (We display a warning if the grammar uses this token.) Thus, we disregard any reductions or transitions that take place when the lookahead symbol is [error]. As a result, any state whose incoming symbol is [error] is found unreachable. It would be too complicated to have to create a first error in order to be able to take certain transitions or drop certain parts of the input. *) module Run (X : sig (* If [verbose] is set, produce various messages on [stderr]. *) val verbose: bool end) : sig (* A representation of words of terminal symbols. See [GrammarFunctor]. *) module W : sig type word val singleton: Terminal.t -> word val append: word -> word -> word val length: word -> int val elements: word -> Terminal.t list val compare: word -> word -> int end (* [query s nt a] enumerates all words [w] and all symbols [z] such that, in state [s], the outgoing edge labeled [nt] can be taken by consuming the word [w], under the assumption that the next symbol is [z], and the first symbol of the word [w.z] is [a]. 
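For instance (an editor's sketch, assuming a printing function [Terminal.print]), one can display every answer as follows: [query s nt a (fun w z -> Printf.eprintf "%s / %s\n" (String.concat " " (List.map Terminal.print (W.elements w))) (Terminal.print z))].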
*) val query: (* s: *) Lr1.node -> (* nt: *) Nonterminal.t -> (* a: *) Terminal.t -> (* f: *) (W.word -> Terminal.t -> unit) -> unit (* [facts] is the total number of facts discovered. [edge_facts] is the total number of edge facts discovered. [total_trie_size] is the sum of the sizes of the tries that are internally constructed in the module [Trie]. These numbers are provided for information only. *) val facts: int val edge_facts: int val total_trie_size: int end menhir-20210929/src/LoopDetection.ml000066400000000000000000000214521412503066000171400ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) open Grammar module Run () = struct (* -------------------------------------------------------------------------- *) (* Let us write A -> alpha when there exists a production A -> alpha, and let us write beta => gamma when the sentential form beta expands (in one step) to gamma. *) (* According to Aho and Ullman ("The Theory of Parsing, Translation, and Compiling -- Volume I: Parsing", page 150), a grammar is cycle-free if there is no derivation of the form A =>+ A. In other words, there is a cycle when a nonterminal symbol A expands, in one or more steps, to itself. *) (* Under the assumption that every nonterminal symbol is reachable and generates a nonempty language, the presence of a cycle implies that the grammar is infinitely ambiguous: for some inputs, there is an infinite number of parse trees. *) (* We reject such a grammar, on two grounds: first, it seems pathological, and is likely the result of a mistake; second, the algorithm that we use to speed up closure computations (in the module Item) does not tolerate the presence of certain cycles. *) (* Let us define a relation R as follows: A R B holds if and only if there is a production A -> alpha B beta where alpha and beta are nullable. Then, it is not difficult to see that the relations =>+ and R+ coincide. That is, to check that a grammar is cycle-free, it suffices to check that the relation R is acyclic. *) (* The relation R is defined as follows. Upon first reading, take [require_nullable_suffix] to be [true] and [require_nonempty_prefix] to be [false]. *) let nullable_suffix prod i = let nullable, _ = Analysis.nullable_first_prod prod i in nullable let successors ~require_nullable_suffix ~require_nonempty_prefix (yield : Production.index -> Nonterminal.t -> unit) (nt : Nonterminal.t) : unit = Production.iternt nt begin fun prod -> let rhs = Production.rhs prod in let n = Array.length rhs in let nullable_prefix = ref true in let i = ref 0 in while !nullable_prefix && !i < n do match rhs.(Misc.postincrement i) with | Symbol.T _ -> nullable_prefix := false | Symbol.N nt' -> if (not require_nullable_suffix || nullable_suffix prod !i) && (not require_nonempty_prefix || !i > 1) then yield prod nt'; nullable_prefix := Analysis.nullable nt' done end (* This adapter hides [prod] from the user function [yield]. *) let adapt successors yield nt = successors (fun _prod nt' -> yield nt') nt (* A detailed explanation of cycles whose length is greater than one.
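For instance (an editor's illustration), for a two-step cycle where [a] expands to [b] and [b] expands back to [a], the function below produces the string ["a expands to b,\nwhich expands to a.\n"].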
*) let show_cycle nts nt = assert (List.hd nts = nt); if List.length nts = 1 then "" else begin let nts = Array.of_list (nts @ [nt]) in let i = ref 0 in let next () = Nonterminal.print false nts.(Misc.postincrement i) and finished () = !i = Array.length nts in Misc.with_buffer 1024 begin fun b -> let out format = Printf.bprintf b format in out "%s" (next()); while not (finished()) do out " expands to %s" (next()); if finished() then out ".\n" else out ",\nwhich" done end end let fail nts nt = let positions = List.flatten (List.map Nonterminal.positions nts) in Error.error positions "the grammar is cyclic:\n\ the nonterminal symbol %s expands to itself.\n%s\ A cyclic grammar is ambiguous." (Nonterminal.print false nt) (show_cycle nts nt) (* To detect a cycle in a relation, we use the combinator [defensive_fix] that is provided by the library Fix. We define a function of type [Nonterminal.t -> unit] that computes nothing but calls itself recursively according to the pattern defined by the relation R. Then, we evaluate this function everywhere. If there is a cycle, it is detected and reported. *) (* The claim that "a cyclic grammar is ambiguous" implicitly assumes that every nonterminal symbol is reachable and inhabited. *) let () = let module M = Fix.Memoize.ForType(Nonterminal) in let successors_R = successors ~require_nullable_suffix:true ~require_nonempty_prefix:false |> adapt in let check = M.defensive_fix successors_R in try Nonterminal.iter check with M.Cycle (nts, nt) -> fail nts nt (* -------------------------------------------------------------------------- *) (* Another anomaly that we wish to detect is hidden left recursion. In the paper "Looping LR parsers" (1988), Soisalon-Soininen and Tarhio define hidden left recursion (although they do not explicitly use this term) and point out that: 1- a grammar that has hidden left recursion cannot be LR(k) for any k, and (worse) 2- if the shift/reduce conflict that it causes is resolved in favor of reduction, then the deterministic parser that is constructed can diverge by entering an infinite sequence of reductions. Conversely, they show that if a grammar exhibits no cycle and no hidden left recursion, then the parser must terminate, regardless of how conflicts are resolved. *) (* One possible definition of hidden left recursion, given by Nederhof and Sarbo in the paper "Increasing the Applicability of LR Parsing" (1993), is the existence of a production A -> B alpha where B is nullable and alpha expands (in zero or more steps) to A beta. *) (* Let us define a relation S as follows. A S B holds if and only if there is a production A -> alpha B beta where alpha is nullable. This relation can be viewed as the disjoint union of two smaller relations L and H, defined as follows: - A L B holds if and only if there is a production A -> B beta; - A H B holds if and only if there is a production A -> alpha B beta where alpha is nullable but is not epsilon. A cycle in the relation L is fine: it represents ordinary left recursion. A cycle that involves at least one H edge and any number of L and H edges, however, denotes hidden left recursion. *) (* An error message.
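For instance (an editor's illustration), a grammar with the productions [a -> b a X] and [b -> epsilon] exhibits hidden left recursion: [b] is nullable, and the remainder [a X] of the right-hand side expands to a sentential form that begins with [a].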
*) let fail prod = let nt, rhs = Production.def prod in let positions = Production.positions prod in Error.error positions "the grammar exhibits hidden left recursion: in the production\n\ %s,\n\ the nonterminal symbol %s is nullable,\n\ and the remainder of the right-hand side expands to a sentential form\n\ that begins with the nonterminal symbol %s.\n\ This implies that the grammar is not LR(k) for any k." (Production.print prod) (Symbol.print rhs.(0)) (Symbol.print (Symbol.N nt)) (* Furthermore, this creates a shift/reduce conflict, which (if resolved in favor of reduction) can cause the parser to diverge. *) (* To detect hidden left recursion in linear time, we first compute the strongly connected components of the relation S. Then, we check every edge in the relation H. If the source and destination vertices of this edge lie in the same component, then we have detected hidden left recursion. *) let () = let module T = Tarjan.Run (struct type node = Nonterminal.t let n = Nonterminal.n let index = Nonterminal.n2i let iter = Nonterminal.iter (* The relation S is computed as follows. *) let successors = successors ~require_nullable_suffix:false ~require_nonempty_prefix:false |> adapt end) in (* The relation H is computed as follows. *) let successors_H = successors ~require_nullable_suffix:false ~require_nonempty_prefix:true in (* Iterate on every edge in the relation H. *) Nonterminal.iter begin fun nt -> nt |> successors_H begin fun prod nt' -> (* If the source vertex [nt] and the destination vertex [nt'] lie in the same component, then we have detected hidden left recursion. *) if T.representative nt = T.representative nt' then fail prod end end (* -------------------------------------------------------------------------- *) let () = Time.tick "Running loop detection" end (* Run *) menhir-20210929/src/LoopDetection.mli000066400000000000000000000021501412503066000173030ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* [Run] detects and rejects certain anomalies in the grammar, which cause the grammar to be outside of the class LR(1). *) module Run () : sig end menhir-20210929/src/LowIntegerPriorityQueue.ml000066400000000000000000000101021412503066000212040ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module implements a simple-minded priority queue, under the assumption that priorities are low nonnegative integers. *) module MyArray = ResizableArray module MyStack = ResizableArray type 'a t = { (* A priority queue is represented as a resizable array, indexed by priorities, of stacks (implemented as resizable arrays). 
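(Schematically -- an editor's note: [a.(p)] is the stack of all elements that have been inserted with priority [p] and not yet removed.)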
There is no a priori bound on the size of the main array -- its size is increased if needed. It is up to the user to use priorities of reasonable magnitude. *) a: 'a MyStack.t MyArray.t; (* Index of lowest nonempty stack, if there is one; or lower (sub-optimal, but safe). If the queue is empty, [best] is arbitrary. *) mutable best: int; (* Current number of elements in the queue. Used in [remove] to stop the search for a nonempty bucket. *) mutable cardinal: int; } let create default = (* Set up the main array so that it initially has 16 priority levels and, whenever new levels are added, each of them is initialized with a fresh empty stack. The dummy stack is never accessed; it is used to fill empty physical slots in the main array. *) let dummy = MyStack.make_ 0 default in let a = MyArray.make 16 dummy (fun _ -> MyStack.make_ 1024 default) in { a; best = 0; cardinal = 0 } let add q x priority = assert (0 <= priority); q.cardinal <- q.cardinal + 1; (* Grow the main array if necessary. *) if MyArray.length q.a <= priority then MyArray.resize q.a (priority + 1); (* Find out which stack we should push into. *) let xs = MyArray.get q.a priority in (* assert (xs != MyArray.default q.a); *) (* Push. *) MyStack.push xs x; (* Decrease [q.best], if necessary, so as not to miss the new element. In the special case of Dijkstra's algorithm or A*, this never happens. *) if priority < q.best then q.best <- priority let is_empty q = q.cardinal = 0 let cardinal q = q.cardinal let rec remove_nonempty q = (* Look for the next nonempty bucket. We know there is one. This may seem inefficient, because it is a linear search. However, in applications where [q.best] never decreases, the cumulated cost of this loop is the maximum priority ever used, which is good. *) let xs = MyArray.get q.a q.best in if MyStack.length xs = 0 then begin (* As noted below, [MyStack.pop] does not physically shrink the stack. When we find that a priority level has become empty, we physically empty it, so as to free the (possibly large) space that it takes up. This strategy is good when the client is Dijkstra's algorithm or A*. *) let dummy = MyArray.default q.a in MyArray.set q.a q.best dummy; q.best <- q.best + 1; remove_nonempty q end else begin q.cardinal <- q.cardinal - 1; Some (MyStack.pop xs) (* Note: [MyStack.pop] does not shrink the physical array underlying the stack. This is good, because we are likely to push new elements into this stack. *) end let remove q = if q.cardinal = 0 then None else remove_nonempty q let rec repeat q f = match remove q with | None -> () | Some x -> f x; repeat q f menhir-20210929/src/LowIntegerPriorityQueue.mli000066400000000000000000000036551412503066000213740ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (** This module implements a simple-minded priority queue, under the assumption that priorities are low nonnegative integers. *) (** The type of priority queues. *) type 'a t (** [create default] creates an empty priority queue. The [default] value is used to fill empty physical slots, but is otherwise irrelevant. 
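For example (an editor's sketch): [let q = create (-1) in add q 10 2; add q 20 1; assert (remove q = Some 20); assert (remove q = Some 10); assert (remove q = None)].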
*) val create: 'a -> 'a t (** [add q x p] inserts the element [x], with priority [p], into the queue [q]. *) val add: 'a t -> 'a -> int -> unit (** [remove q] extracts out of [q] and returns an element with minimum priority. *) val remove: 'a t -> 'a option (** [is_empty q] tests whether the queue [q] is empty. *) val is_empty: 'a t -> bool (** [cardinal q] returns the number of elements in the queue [q]. *) val cardinal: 'a t -> int (** [repeat q f] repeatedly extracts an element with minimum priority out of [q] and passes it to [f] (which may insert new elements into [q]), until [q] is exhausted. *) val repeat: 'a t -> ('a -> unit) -> unit menhir-20210929/src/MArray.ml000066400000000000000000000107731412503066000155670ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) include Array let empty = [||] let last a = let n = length a in assert (n > 0); unsafe_get a (n - 1) let pop a = let n = length a in assert (n > 0); sub a 0 (n - 1) let push a x = let n = length a in init (n + 1) (fun i -> if i = n then x else a.(i)) let truncate k a = let n = length a in if n <= k then a else sub a (n-k) k let rec equal_segments equal a1 i1 a2 i2 n = n = 0 || equal a1.(i1) a2.(i2) && equal_segments equal a1 (i1 + 1) a2 (i2 + 1) (n - 1) let is_suffix equal a1 a2 = let n1 = length a1 and n2 = length a2 in n1 <= n2 && equal_segments equal a1 0 a2 (n2 - n1) n1 let rec greatest_suffix_forall p a n k = if k = n || not (p a.(n - 1 - k)) then k else greatest_suffix_forall p a n (k + 1) let greatest_suffix_forall p a = let k = greatest_suffix_forall p a (length a) 0 in truncate k a let rev a = let n = length a in if n = 0 then a else let r = make n a.(0) in for i = 0 to n - 2 do r.(i) <- a.(n - i - 1) done; r let rev_of_list xs = match xs with | [] -> [||] | x :: xs -> let n = 1 + List.length xs in let r = make n x in List.iteri (fun i x -> r.(n - i - 2) <- x) xs ; r let rev_to_list a = fold_left (fun xs x -> x :: xs) [] a let iter_rev f a = for i = length a - 1 downto 0 do f a.(i) done let existsi p a = let n = length a in let rec loop i = if i = n then false else if p i (unsafe_get a i) then true else loop (succ i) in loop 0 let count p a = let n = length a in let c = ref 0 in for i = 0 to n-1 do if p (unsafe_get a i) then c := !c + 1 done; !c (* To keep compatibility with OCaml 4.02, we copy [Array.for_all], which appeared in 4.03. *) let for_all p a = let n = length a in let rec loop i = if i = n then true else if p (unsafe_get a i) then loop (succ i) else false in loop 0 (* Similarly, we copy [Array.for_all2], which appeared in 4.11. 
*) let for_all2 p l1 l2 = let n1 = length l1 and n2 = length l2 in if n1 <> n2 then invalid_arg "Array.for_all2" else let rec loop i = if i = n1 then true else if p (unsafe_get l1 i) (unsafe_get l2 i) then loop (succ i) else false in loop 0 let fold_left2 f accu a1 a2 = let n1 = length a1 and n2 = length a2 in if n1 <> n2 then invalid_arg "Array.fold_left2"; let accu = ref accu in for i = 0 to n1 - 1 do accu := f !accu (unsafe_get a1 i) (unsafe_get a2 i) done; !accu let leq_join leq_join a1 a2 = let n = length a1 in assert (n = length a2); let a = init n (fun i -> leq_join (unsafe_get a1 i) (unsafe_get a2 i)) in if for_all2 (==) a2 a then a2 else a let test () = assert (pop [|1; 2; 3; 4|] = [|1; 2; 3|]) ; assert (push [|1; 2; 3|] 4 = [|1; 2; 3; 4|]) ; assert (truncate 2 [|1; 2; 3; 4|] = [|3; 4|]) ; assert (truncate 4 [|1; 2|] = [|1; 2|]) ; assert (is_suffix (=) [||] [||]) ; assert (is_suffix (=) [||] [|0;3;4|]) ; assert (is_suffix (=) [|2|] [|0;2|]) ; assert (is_suffix (=) [|3; 4|] [|0;3;4|]) ; assert (greatest_suffix_forall ((<) 4) [|1; 2; 3; 4|] = [||]) ; assert (greatest_suffix_forall ((<) 2) [|1; 2; 3; 4|] = [|3; 4|]) ; assert (greatest_suffix_forall ((<) 0) [|1; 2; 3; 4|] = [|1; 2; 3; 4|]) ; assert (greatest_suffix_forall ((<) 0) [|1; 2; 0; 4|] = [|4|]) ; assert (rev [|1; 2; 3; 4|] = [|4; 3; 2; 1|]) ; assert (rev_of_list [1; 2; 3; 4; 5] = [|5; 4; 3; 2; 1|]) ; assert (rev_to_list [|1; 2; 3; 4; 5|] = [5; 4; 3; 2; 1]) ; assert (count (fun x -> x mod 2 = 0) [| 1;2;3 |] = 1) ; assert (count (fun x -> x mod 2 = 0) [||] = 0) ; () menhir-20210929/src/MArray.mli000066400000000000000000000074361412503066000157420ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (** This module is an extension of Stdlib.Array *) include module type of Array (**[empty] is the empty array. *) val empty : 'a array (**[last a] is the last element of the array [a], which must be nonempty. *) val last : 'a array -> 'a (** [pop a] is [a] with its last element removed. [pop [|1; 2; 3; 4|] = [|1; 2; 3|]] *) val pop : 'a array -> 'a array (** [push a x] is [a] with [x] added at its end. [push [|1; 2; 3|] 4 = [|1; 2; 3; 4|]] *) val push : 'a array -> 'a -> 'a array (**If the array [a] has length at least [k], then [truncate k a] is the suffix of length [k] of the array [a]. Otherwise, [truncate k a] is [a]. *) val truncate : int -> 'a array -> 'a array (**[is_suffix equal a1 a2] tests whether [a1] is a suffix of [a2]. The elements are compared using the function [equal]. *) val is_suffix : ('a -> 'a -> bool) -> 'a array -> 'a array -> bool (**[greatest_suffix_forall p a] is the greatest suffix of the array [a] whose elements satisfy the predicate [p]. *) val greatest_suffix_forall : ('a -> bool) -> 'a array -> 'a array (**[rev a] is a new array whose elements are the elements of the array [a], in reverse order. [rev [|1; 2; 3; 4|] = [|4; 3; 2; 1|]]*) val rev : 'a array -> 'a array (**[rev_of_list] converts a list to an array. The list's head becomes the end of the array. 
[rev_of_list [1; 2; 3; 4; 5] = [|5; 4; 3; 2; 1|]] *) val rev_of_list : 'a list -> 'a array (**[rev_to_list] converts an array to a list. The end of the array becomes the list's head. [rev_to_list [|1; 2; 3; 4; 5|] = [5; 4; 3; 2; 1]] *) val rev_to_list : 'a array -> 'a list (**[iter_rev f a] is equivalent to [iter f (rev a)]. *) val iter_rev : ('a -> unit) -> 'a array -> unit (**[existsi p a] tests whether there exists an index [i] such that [p i a.(i)] holds. *) val existsi : (int -> 'a -> bool) -> 'a array -> bool (**[count p a] counts how many elements of the array [a] satisfy the predicate [p]. *) val count: ('a -> bool) -> 'a array -> int (**[for_all] is identical to the function by the same name in the OCaml standard library. *) val for_all : ('a -> bool) -> 'a array -> bool (**[for_all2] is identical to the function by the same name in the OCaml standard library. *) val for_all2 : ('a -> 'b -> bool) -> 'a array -> 'b array -> bool (**[fold_left2] performs left-to-right iteration over two arrays, which must have the same length, while carrying an accumulator. *) val fold_left2 : ('a -> 'b1 -> 'b2 -> 'a) -> 'a -> 'b1 array -> 'b2 array -> 'a (**Given a [leq_join] function on elements, [leq_join] constructs a [leq_join] function on arrays. The two arrays must have the same length. The specification of a [leq_join] is defined by the signature [Fix.MINIMAL_SEMI_LATTICE]. *) val leq_join : ('a -> 'b -> 'b) -> 'a array -> 'b array -> 'b array (** Unit tests. *) val test : unit -> unit menhir-20210929/src/MList.ml000066400000000000000000000030711412503066000154150ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) include List (** A list subject to a condition. (Be careful, though: the list is of course constructed even if the condition is false.) *) let ifn condition xs = if condition then xs else [] (** A list subject to a condition. (Be careful, though: the list is of course constructed even if the condition is false.) *) let if1 condition x = if condition then [ x ] else [] (** A lazy version of [ifn], where the list is constructed only if the condition is true. *) let ifnlazy condition xs = if condition then xs() else [] let sum li = fold_left (+) 0 li menhir-20210929/src/MList.mli000066400000000000000000000031001412503066000155570ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (** This module is an extension of Stdlib.List *) include module type of List (** A list subject to a condition. (Be careful, though: the list is of course constructed even if the condition is false.) *) val ifn : bool -> 'a list -> 'a list (** A list subject to a condition. 
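For example (an editor's note): [if1 verbose msg] is the one-element list [[msg]] when [verbose] is [true], and the empty list otherwise.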
(Be careful, though: the list is of course constructed even if the condition is false.) *) val if1 : bool -> 'a -> 'a list (** A lazy version of [ifn], where the list is constructed only if the condition is true. *) val ifnlazy : bool -> (unit -> 'a list) -> 'a list (** The sum of a list of integers. *) val sum : int list -> int menhir-20210929/src/Makefile000066400000000000000000000001011412503066000154620ustar00rootroot00000000000000# [make] compiles Menhir. .PHONY: all all: @ dune build @check menhir-20210929/src/MySet.ml000066400000000000000000000075511412503066000154350ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) module Make (Ord: Map.OrderedType) = struct type elt = Ord.t type t = Empty | Node of t * elt * t * int (* Sets are represented by balanced binary trees (the heights of the children differ by at most 2 *) let height = function Empty -> 0 | Node(_, _, _, h) -> h (* Creates a new node with left son l, value v and right son r. We must have all elements of l < v < all elements of r. l and r must be balanced and | height l - height r | <= 2. Inline expansion of height for better speed. *) let create l v r = let hl = match l with Empty -> 0 | Node(_,_,_,h) -> h in let hr = match r with Empty -> 0 | Node(_,_,_,h) -> h in Node(l, v, r, (if hl >= hr then hl + 1 else hr + 1)) (* Same as create, but performs one step of rebalancing if necessary. Assumes l and r balanced and | height l - height r | <= 3. Inline expansion of create for better speed in the most frequent case where no rebalancing is required. *) let bal l v r = let hl = match l with Empty -> 0 | Node(_,_,_,h) -> h in let hr = match r with Empty -> 0 | Node(_,_,_,h) -> h in if hl > hr + 2 then begin match l with Empty -> invalid_arg "Set.bal" | Node(ll, lv, lr, _) -> if height ll >= height lr then create ll lv (create lr v r) else begin match lr with Empty -> invalid_arg "Set.bal" | Node(lrl, lrv, lrr, _)-> create (create ll lv lrl) lrv (create lrr v r) end end else if hr > hl + 2 then begin match r with Empty -> invalid_arg "Set.bal" | Node(rl, rv, rr, _) -> if height rr >= height rl then create (create l v rl) rv rr else begin match rl with Empty -> invalid_arg "Set.bal" | Node(rll, rlv, rlr, _) -> create (create l v rll) rlv (create rlr rv rr) end end else Node(l, v, r, (if hl >= hr then hl + 1 else hr + 1)) (* [add x t] guarantees that it returns [t] (physically unchanged) if [x] is already a member of [t]. 
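This allows combining insertion with a membership test in a single traversal (an editor's sketch): [let t' = add x t in if t' == t then (* [x] was already present *) ... else (* [x] was just inserted *) ...].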
*) let rec add x = function Empty -> Node(Empty, x, Empty, 1) | Node(l, v, r, _) as t -> let c = Ord.compare x v in if c = 0 then t else if c < 0 then let l' = add x l in if l == l' then t else bal l' v r else let r' = add x r in if r == r' then t else bal l v r' let empty = Empty let rec find x = function Empty -> raise Not_found | Node(l, v, r, _) -> let c = Ord.compare x v in if c = 0 then v else find x (if c < 0 then l else r) let rec iter f = function Empty -> () | Node(l, v, r, _) -> iter f l; f v; iter f r end menhir-20210929/src/MySet.mli000066400000000000000000000030151412503066000155760ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) (* This is a stripped-down copy of the [Set] module from OCaml's standard library. The only difference is that [add x t] guarantees that it returns [t] (physically unchanged) if [x] is already a member of [t]. This yields fewer memory allocations and an easy way of testing whether the element was already present in the set before it was added. *) module Make (Ord: Map.OrderedType) : sig type elt = Ord.t type t val empty: t val add: elt -> t -> t val find: elt -> t -> elt (* may raise [Not_found] *) val iter: (elt -> unit) -> t -> unit end menhir-20210929/src/NatInfinityMax.ml000066400000000000000000000030131412503066000172630ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) type property = int let bottom = 0 let infinity = max_int let finite n = assert (0 <= n && n < infinity); n let to_int p = p let equal : property -> property -> bool = (=) let is_maximal p = p = infinity let max = max let max_lazy p q = if p = infinity then infinity else max p (q()) let add p q = if p = infinity || q = infinity then infinity else p + q let add_lazy p q = if p = infinity then infinity else let q = q() in if q = infinity then infinity else p + q let print p = if p = infinity then "infinity" else string_of_int p menhir-20210929/src/NatInfinityMax.mli000066400000000000000000000032371412503066000174440ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) (* This is the lattice of the natural numbers, completed with [infinity], and ordered towards infinity: that is, [bottom] is zero, [top] is infinity.
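(For instance -- an editor's note: [max (finite 3) infinity] is [infinity], while [add (finite 3) (finite 4)] is [finite 7].)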
*) (* Please be aware that this lattice has unbounded height, so a fixed point computation in this lattice does not necessarily terminate. *) type property val bottom: property val infinity: property val finite: int -> property val equal: property -> property -> bool val is_maximal: property -> bool val max: property -> property -> property val add: property -> property -> property val max_lazy: property -> (unit -> property) -> property val add_lazy: property -> (unit -> property) -> property val print: property -> string val to_int: property -> int menhir-20210929/src/PPrint.ml000066400000000000000000000026551412503066000156100ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The library pprint, which is found in the subdirectory pprint/, has been renamed vendored_pprint so as to prevent Dune from complaining about a conflict with a copy of pprint that might be installed on the user's system. *) (* As a result, the library is now accessible under the name Vendored_pprint. Because we do not want to pollute Menhir's sources with this name, we define the module PPrint as an alias for Vendored_pprint. *) include Vendored_pprint menhir-20210929/src/QWordBitSet.ml000066400000000000000000000163201412503066000165350ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module offers bitsets that fit within *four* OCaml integers. This can be used to represent sets of integers in the semi-open interval [0, bound), where [bound] is [4 * AtomicBitSet.bound], that is, usually 252. *) (* In principle, we could and should implement this as a pair of two sets of type [DWordBitSet.t]. However, we wish to avoid nesting heap-allocated pairs, so we have to manually adapt the code in [DWordBitSet]. 
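(Concretely -- an editor's note: an element [i] in the interval [0, bound) is stored in the quarter numbered [i / A.bound], at offset [i mod A.bound] within that quarter; for instance, with [A.bound = 63], the element [130] lives in the field [hlo], at offset [4].)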
*) module A = AtomicBitSet type t = | E | Q of A.t * A.t * A.t * A.t let construct hhi hlo lhi llo = if A.is_empty hhi && A.is_empty hlo && A.is_empty lhi && A.is_empty llo then E else Q (hhi, hlo, lhi, llo) type element = int let bound = 4 * A.bound let quarter3 = 3 * A.bound let middle = 2 * A.bound let quarter = 1 * A.bound let empty = E let is_empty s = match s with | E -> true | Q (_, _, _, _) -> false let singleton i = if i < middle then if i < quarter then Q (A.empty, A.empty, A.empty, A.singleton i) else let i = i - quarter in Q (A.empty, A.empty, A.singleton i, A.empty) else let i = i - middle in if i < quarter then Q (A.empty, A.singleton i, A.empty, A.empty) else let i = i - quarter in Q (A.singleton i, A.empty, A.empty, A.empty) let add i s = match s with | E -> singleton i | Q (hhi, hlo, lhi, llo) -> if i < middle then if i < quarter then let llo' = A.add i llo in if llo == llo' then s else Q (hhi, hlo, lhi, llo') else let i = i - quarter in let lhi' = A.add i lhi in if lhi == lhi' then s else Q (hhi, hlo, lhi', llo) else let i = i - middle in if i < quarter then let hlo' = A.add i hlo in if hlo == hlo' then s else Q (hhi, hlo', lhi, llo) else let i = i - quarter in let hhi' = A.add i hhi in if hhi == hhi' then s else Q (hhi', hlo, lhi, llo) let remove i s = match s with | E -> s | Q (hhi, hlo, lhi, llo) -> if i < middle then if i < quarter then let llo' = A.remove i llo in if llo == llo' then s else construct hhi hlo lhi llo' else let i = i - quarter in let lhi' = A.remove i lhi in if lhi == lhi' then s else construct hhi hlo lhi' llo else let i = i - middle in if i < quarter then let hlo' = A.remove i hlo in if hlo == hlo' then s else construct hhi hlo' lhi llo else let i = i - quarter in let hhi' = A.remove i hhi in if hhi == hhi' then s else construct hhi' hlo lhi llo let fold f s accu = match s with | E -> accu | Q (hhi, hlo, lhi, llo) -> let accu = A.fold f llo accu in let accu = A.fold_delta quarter f lhi accu in let accu = A.fold_delta middle f hlo accu in let accu = A.fold_delta quarter3 f hhi accu in accu let iter f s = match s with | E -> () | Q (hhi, hlo, lhi, llo) -> A.iter f llo; A.iter_delta quarter f lhi; A.iter_delta middle f hlo; A.iter_delta quarter3 f hhi let is_singleton s = match s with | E -> false | Q (hhi, hlo, lhi, llo) -> A.is_singleton hhi && A.is_empty hlo && A.is_empty lhi && A.is_empty llo || A.is_empty hhi && A.is_singleton hlo && A.is_empty lhi && A.is_empty llo || A.is_empty hhi && A.is_empty hlo && A.is_singleton lhi && A.is_empty llo || A.is_empty hhi && A.is_empty hlo && A.is_empty lhi && A.is_singleton llo let cardinal s = match s with | E -> 0 | Q (hhi, hlo, lhi, llo) -> A.cardinal hhi + A.cardinal hlo + A.cardinal lhi + A.cardinal llo let elements s = fold (fun tl hd -> tl :: hd) s [] let subset s1 s2 = match s1, s2 with | E, _ -> true | Q (_, _, _, _), E -> false | Q (hhi1, hlo1, lhi1, llo1), Q (hhi2, hlo2, lhi2, llo2) -> A.subset hhi1 hhi2 && A.subset hlo1 hlo2 && A.subset lhi1 lhi2 && A.subset llo1 llo2 let mem i s = match s with | E -> false | Q (hhi, hlo, lhi, llo) -> if i < middle then if i < quarter then A.mem i llo else let i = i - quarter in A.mem i lhi else let i = i - middle in if i < quarter then A.mem i hlo else let i = i - quarter in A.mem i hhi let union s1 s2 = match s1, s2 with | E, s | s, E -> s | Q (hhi1, hlo1, lhi1, llo1), Q (hhi2, hlo2, lhi2, llo2) -> let hhi = A.union hhi1 hhi2 and hlo = A.union hlo1 hlo2 and lhi = A.union lhi1 lhi2 and llo = A.union llo1 llo2 in if hhi == hhi2 && hlo == hlo2 && lhi == lhi2 && llo 
== llo2 then s2 else Q (hhi, hlo, lhi, llo) let inter s1 s2 = match s1, s2 with | E, _ | _, E -> E | Q (hhi1, hlo1, lhi1, llo1), Q (hhi2, hlo2, lhi2, llo2) -> construct (A.inter hhi1 hhi2) (A.inter hlo1 hlo2) (A.inter lhi1 lhi2) (A.inter llo1 llo2) let choose s = match s with | E -> raise Not_found | Q (hhi, hlo, lhi, llo) -> if not (A.is_empty llo) then A.choose llo else if not (A.is_empty lhi) then A.choose lhi + quarter else if not (A.is_empty hlo) then A.choose hlo + middle else A.choose hhi + quarter3 let compare s1 s2 = if s1 == s2 then 0 else match s1, s2 with | E , E -> 0 | Q _, E -> 1 | E , Q _ -> -1 | Q (hhi1, hlo1, lhi1, llo1), Q (hhi2, hlo2, lhi2, llo2) -> begin match A.compare hhi1 hhi2 with | 0 -> begin match A.compare hlo1 hlo2 with | 0 -> begin match A.compare lhi1 lhi2 with | 0 -> A.compare llo1 llo2 | n -> n end | n -> n end | n -> n end let equal s1 s2 = (s1 == s2) || match s1, s2 with | E , E -> true | Q _, E | E , Q _ -> false | Q (hhi1, hlo1, lhi1, llo1), Q (hhi2, hlo2, lhi2, llo2) -> A.equal hhi1 hhi2 && A.equal hlo1 hlo2 && A.equal lhi1 lhi2 && A.equal llo1 llo2 let disjoint s1 s2 = match s1, s2 with | E, _ | _, E -> true | Q (hhi1, hlo1, lhi1, llo1), Q (hhi2, hlo2, lhi2, llo2) -> A.disjoint hhi1 hhi2 && A.disjoint hlo1 hlo2 && A.disjoint lhi1 lhi2 && A.disjoint llo1 llo2 menhir-20210929/src/QWordBitSet.mli000066400000000000000000000023541412503066000167100ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module offers bitsets that fit within *four* OCaml integers. This can be used to represent sets of integers in the semi-open interval [0, bound), where [bound] is [4 * AtomicBitSet.bound], that is, usually 252. *) val bound: int include GSet.S with type element = int menhir-20210929/src/RandomSentenceGenerator.ml000066400000000000000000000153501412503066000211440ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* -------------------------------------------------------------------------- *) (* Random choice in a list. *) let choose (xs : 'a list) : 'a = let n = List.length xs in assert (n > 0); let i = Random.int n in List.nth xs i (* Random choice in a closed integer interval. *) let pick (i, j) = assert (i <= j); i + Random.int (j - i + 1) (* -------------------------------------------------------------------------- *) (* Computations on closed integer intervals. 
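For instance (an editor's note), with the operations defined below, [intersect [between 0 10; at_least 3; at_most 7]] is the interval [(3, 7)], and [nonempty (3, 7)] holds.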
*) let infinity = max_int let between i j = (i, j) let at_least i = between i infinity let at_most j = between 0 j let full = (0, max_int) let intersect (i1, j1) (i2, j2) = (max i1 i2, min j1 j2) let intersect intervals = List.fold_left intersect full intervals let nonempty (i, j) = i <= j (* -------------------------------------------------------------------------- *) (* A buffer is used to emit tokens. *) let buffer = ref [] let emit tok = buffer := tok :: !buffer let reset () = let toks = !buffer in buffer := []; List.rev toks (* -------------------------------------------------------------------------- *) (* [nonterminal nt goal k] emits a sentence that is generated by [nt] and whose length is at most [goal]. Then, it invokes the continuation [k]. The use of an explicit continuation allows us to use tail calls and avoid stack overflows. *) let rec nonterminal nt goal k : unit = assert (Analysis.minimal nt <= goal); assert (goal <= Analysis.maximal nt); (* We must choose between the productions that are associated with [nt]. There does not necessarily exist a production that is capable of producing a word of length [goal] exactly; there does not even necessarily exist a production such that [goal] lies in the interval of lengths that this production can generate. (E.g., we could have a goal of 3, and two productions that generate words of length 2 and 4, respectively.) We proceed as follows. First, we attempt to choose among the productions such that [goal] lies in the production's length window. If unfortunately there is no such production, then we must adjust our goal, by increasing or decreasing it. Increasing it is dangerous and can cause nontermination. We choose to decrease it; we set it to the minimum value [minimal nt] and therefore produce a word of minimal length. *) (* Attempt to choose a production whose length window contains [goal]. *) let prods = Production.foldnt nt (fun prod prods -> if Production.error_free prod && Analysis.minimal_prod prod 0 <= goal && goal <= Analysis.maximal_prod prod 0 then prod :: prods else prods ) [] in if prods = [] then (* Unsuccessful. Set [goal] to [minimal nt] and retry. *) nonterminal nt (Analysis.minimal nt) k else (* Successful. Pick a production and use it. *) production (choose prods) 0 goal k (* [production prod i goal k] emits a sentence that is generated by the production suffix [prod/i] and whose length is at most [goal]. Then, it invokes the continuation [k]. *) and production prod i goal k : unit = assert (Analysis.minimal_prod prod i <= goal); assert (goal <= Analysis.maximal_prod prod i); let n = Production.length prod in assert (0 <= i && i <= n); if i = n then begin assert (goal = 0); k() end else if i < n then begin let rhs = Production.rhs prod in match rhs.(i) with | Symbol.T tok -> (* A terminal symbol offers no choice. *) emit tok; production prod (i + 1) (goal - 1) k | Symbol.N nt -> (* A nonterminal symbol [nt] offers a choice: we must split the budget between this symbol and the rest of the right-hand side. *) let min1, max1 = Analysis.(minimal nt, maximal nt) and min2, max2 = Analysis.(minimal_prod prod (i + 1), maximal_prod prod (i + 1)) in (* This is where things get tricky. We have [goal <= max], that is, [goal <= max1 + max2], where [max1] and [max2] may be [infinity]. We wish to split [goal] as [goal1 + goal2], where the constraints [min1 <= goal1 <= max1] and [min2 <= goal2 <= max2] must be satisfied.
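For example (an editor's illustration): with [goal = 10], [min1, max1 = 2, 6], and [min2, max2 = 3, 7], the constraints computed below intersect to the interval [(3, 6)]; picking any [goal1] in this interval leaves [goal2 = goal - goal1] in [(4, 7)], which satisfies [min2 <= goal2 <= max2].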
*) assert (min1 <= max1 && min2 <= max2); assert (min1 + min2 <= goal); assert (max1 = infinity || max2 = infinity || goal <= max1 + max2); (* The constraints bearing on [goal1] are as follows. *) let constraints = intersect [ between 0 goal; between min1 max1; (if max2 = infinity then full else at_least (goal - max2)); at_most (goal - min2); ] in (* This assertion is not entirely obvious... *) assert (nonempty constraints); let goal1 = pick constraints in let goal2 = goal - goal1 in assert (min1 <= goal1); assert (goal1 <= max1); assert (min2 <= goal2); assert (goal2 <= max2); nonterminal nt goal1 (fun () -> production prod (i + 1) goal2 k) end (* -------------------------------------------------------------------------- *) (* A wrapper that takes care of resetting the buffer and providing an identity continuation. *) let nonterminal nt goal = assert (!buffer = []); let k () = () in match nonterminal nt goal k with | () -> reset() | exception e -> buffer := []; raise e (* A wrapper that takes care of adjusting the initial goal if it lies outside the window. *) let nonterminal nt goal = let min, max = Analysis.(minimal nt, maximal nt) in let goal = if min <= goal && goal <= max then goal else if goal < min then min else begin assert (max < goal); assert (max < infinity); max end in nonterminal nt goal menhir-20210929/src/RandomSentenceGenerator.mli000066400000000000000000000033311412503066000213110ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (**This module generates random sentences that are well-formed according to the grammar. If the grammar is not LR(1), these sentences are *not* necessarily accepted by the automaton. The distribution of sentences is *not* uniform. The goal length is not necessarily obeyed exactly; the generator produces a sentence whose length is at most [goal]. The time complexity is roughly linear with respect to the goal length. Because we do not wish to generate sentences that contain the [error] pseudo-token, any production that contains this token is ignored. This can cause a problem if the goal can be achieved only via such a production. This is hopefully unlikely. *) open Grammar val nonterminal: Nonterminal.t -> int -> Terminal.t list menhir-20210929/src/ReferenceGraph.ml000066400000000000000000000042311412503066000172440ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar let print_reference_graph() = (* Allocate. *) let forward : NonterminalSet.t NonterminalMap.t ref = ref NonterminalMap.empty in let successors nt = try NonterminalMap.find nt !forward with Not_found -> NonterminalSet.empty in (* Populate. 
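That is, for each production, add an edge from its left-hand symbol to every nonterminal symbol that appears in its right-hand side. For instance (a hypothetical grammar), if the only production of [expr] is expr: expr PLUS term, then [forward] maps [expr] to the set {expr, term}; the terminal symbol PLUS gives rise to no edge.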
*) Production.iter (fun prod -> let nt1 = Production.nt prod and rhs = Production.rhs prod in Array.iter (function | Symbol.T _ -> () | Symbol.N nt2 -> forward := NonterminalMap.add nt1 (NonterminalSet.add nt2 (successors nt1)) !forward ) rhs ); (* Print. *) let module P = Dot.Print (struct type vertex = Nonterminal.t let name nt = Printf.sprintf "nt%d" (Nonterminal.n2i nt) let successors (f : ?style:Dot.style -> label:string -> vertex -> unit) nt = NonterminalSet.iter (fun successor -> f ~label:"" successor ) (successors nt) let iter (f : ?shape:Dot.shape -> ?style:Dot.style -> label:string -> vertex -> unit) = Nonterminal.iter (fun nt -> f ~label:(Nonterminal.print false nt) nt ) end) in let f = open_out (Settings.base ^ ".dot") in P.print f; close_out f menhir-20210929/src/ReferenceGraph.mli000066400000000000000000000022721412503066000174200ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* Build and print the forward reference graph of the grammar. There is an edge from a nonterminal symbol [nt1] to every nonterminal symbol [nt2] that occurs in the definition of [nt1]. *) val print_reference_graph: unit -> unit menhir-20210929/src/SelectiveExpansion.ml000066400000000000000000000455641412503066000202060ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) let value = Positions.value let unknown = Positions.unknown_pos open Syntax open GroundSort (* -------------------------------------------------------------------------- *) (* Expansion modes. *) type mode = | ExpandHigherSort | ExpandAll (* -------------------------------------------------------------------------- *) (* Expansion can be understood as traversing a graph where every vertex is labeled with a pair of a nonterminal symbol [nt] and an instantiation of the formal parameters of [nt]. *) (* We allow partial instantiations, where some of the formal parameters of [nt] are instantiated, while others remain present. For this reason, we represent an instantiation as a list of *optional* actual parameters. *) (* The actual parameters that appear in an instantiation make sense *in the source namespace* (at the toplevel). That is, they refer to (terminal and nonterminal) symbols that exist (at the toplevel) in the original grammar. *) type instantiation = parameter option list type label = nonterminal * instantiation (* Equality and hashing for labels.
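For instance, two copies of the label (F, [Some X; None]), where F and X are hypothetical symbols, must be equal and must hash identically, as both denote the partial specialization F(X,_).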
*) module Label = struct type t = label let equal (nt1, inst1) (nt2, inst2) = nt1 = nt2 && List.for_all2 (Option.equal Parameters.equal) inst1 inst2 let hash (nt, inst) = Hashtbl.hash (nt, Misc.ListExtras.hash (Option.hash Parameters.hash) inst) end (* -------------------------------------------------------------------------- *) (* [mangle label] chooses a concrete name for the new nonterminal symbol that corresponds to the label [label]. *) (* We include parentheses and commas in this name, because that is readable and acceptable in many situations. We replace them with underscores in situations where these characters are not valid; see [Misc.normalize]. *) let mangle_po (po : parameter option) = match po with | None -> (* When a parameter remains uninstantiated, we put an underscore in its place. *) "_" | Some p -> Parameters.print false p let mangle ((nt, pos) : label) : nonterminal = if pos = [] then nt else Printf.sprintf "%s(%s)" nt (Misc.separated_list_to_string mangle_po "," pos) (* -------------------------------------------------------------------------- *) (* An environment maps all of the formal parameters of a rule to actual parameters, which make sense in the source namespace. *) module Env = StringMap type env = parameter Env.t let subst_symbol env sym : parameter = try Env.find (value sym) env with Not_found -> (* [x] is not a formal parameter. It is a toplevel symbol. *) ParameterVar sym let apply (param : parameter) (params : parameter list) : parameter = match param with | ParameterVar sym -> assert (params <> []); ParameterApp (sym, params) | ParameterApp _ -> (* In a well-sorted grammar, only a variable can have higher sort. Here, [param] must have higher sort, so [param] must be a variable. This case cannot arise. *) assert false | ParameterAnonymous _ -> (* Anonymous rules are eliminated early on. *) assert false let rec subst_parameter env param : parameter = match param with | ParameterVar sym -> subst_symbol env sym | ParameterApp (sym, params) -> assert (params <> []); apply (subst_symbol env sym) (subst_parameters env params) | ParameterAnonymous _ -> (* Anonymous rules are eliminated early on. *) assert false and subst_parameters env params = List.map (subst_parameter env) params (* -------------------------------------------------------------------------- *) (* -------------------------------------------------------------------------- *) (* For syntactic convenience, the rest of this file is a functor. *) module Run (G : sig (* Expansion mode. *) val mode: mode (* Sort information. *) val sorts: SortInference.sorts (* The grammar [g] whose expansion is desired. *) val g : grammar end) = struct open G (* -------------------------------------------------------------------------- *) (* Determining the sort of a symbol or parameter. *) (* Be careful: these functions use the toplevel sort environment [sorts], so they must not be used within a rule. (The sort environment would have to be extended with information about the formal parameters.) *) let sort symbol = try StringMap.find (value symbol) sorts with Not_found -> assert false let sort param = match param with | ParameterVar sym -> sort sym | ParameterApp (_, params) -> assert (params <> []); (* An application always has sort [*]. *) star | ParameterAnonymous _ -> assert false (* -------------------------------------------------------------------------- *) (* Looking up the [%attribute] declarations, looking for attributes attached with a nonterminal symbol [nt]. 
This is used when we create a specialized version of this symbol. *) (* We use an inefficient linear search, but that shouldn't be a problem. *) let global_attributes (nt : symbol) : attribute list = let param = ParameterVar (unknown nt) in List.concat (List.map (fun (params, attrs) -> if List.exists (Parameters.equal param) params then attrs else [] ) g.p_symbol_attributes) (* -------------------------------------------------------------------------- *) (* A queue keeps track of the graph vertices that have been discovered but not yet visited. *) let enqueue, repeatedly = let queue = Queue.create() in let enqueue label = Queue.add label queue and repeatedly visit = Misc.qiter visit queue in enqueue, repeatedly (* -------------------------------------------------------------------------- *) (* A hash table is used to mark the graph vertices that have been discovered. *) let mark, marked = let module H = Hashtbl.Make(Label) in let table = H.create 128 in let mark label = H.add table label () and marked label = H.mem table label in mark, marked (* -------------------------------------------------------------------------- *) (* The rules of the expanded grammar are gradually collected. *) let emit, rules = let rules = ref StringMap.empty in let emit rule = assert (not (StringMap.mem rule.pr_nt !rules)); rules := StringMap.add rule.pr_nt rule !rules and rules() = !rules in emit, rules (* -------------------------------------------------------------------------- *) (* On top of the function [mangle], we set up a mechanism that checks that every (normalized) mangled name is unique. (Indeed, in principle, there could be clashes, although in practice this is unlikely.) We must check that every application of [mangle] to a *new* argument yields a *new* (normalized) result. This is succinctly expressed by combining a claim and a memoizer. *) (* [new_claim()] creates a new service for claiming names. It returns a function [claim] of type [string -> unit] such that the call [claim x] succeeds if and only if [claim x] has never been called before. *) let new_claim () : (string -> unit) = let names = ref StringSet.empty in let claim name = if StringSet.mem name !names then Error.error [] "internal name clash over %s" name; names := StringSet.add name !names in claim let mangle : label -> nonterminal = let ensure_fresh = new_claim() in let module M = Fix.Memoize.ForHashedType(Label) in M.memoize (fun label -> let name = mangle label in ensure_fresh (Misc.normalize name); name ) (* -------------------------------------------------------------------------- *) (* [recognize] receives an actual parameter [param] that makes sense in the source namespace and transforms it into a parameter that makes sense in the target namespace. This involves examining each application and "recognizing" it as an application of a label to a sequence of residual actual parameters, as explained next. All labels thus recognized are enqueued. *) (* [recognize] governs how much specialization is performed. For instance, [F(X, Y, Z)] could be recognized as: - an application of the symbol [F] to the residual arguments [X, Y, Z]. Then, no specialization at all takes place. - an application of the symbol [F(X,Y,Z)] to no residual arguments. Then, [F] is fully specialized for [X, Y, Z]. - in between these extremes, say, an application of the symbol [F(X,_,Z)] to the residual argument [Y]. Then, [F] is partially specialized. If there are any residual arguments, then they must be recursively recognized.
For instance, [F(X,G(Y),Z)] could be recognized as an application of the symbol [F(X,_,Z)] to [G(Y)], which itself could be recognized as an application of the symbol [G(Y)] to no residual arguments. *) let rec recognize (param : parameter) : parameter = (* [param] must have sort [star], in an appropriate sort environment. *) match param with | ParameterAnonymous _ -> assert false | ParameterVar _ -> param | ParameterApp (sym, ps) -> assert (ps <> []); let x = value sym in (* This symbol is applied to at least one argument, so cannot be a terminal symbol. It must be either a nonterminal symbol or an (uninstantiated) formal parameter of the current rule. *) (* Actually, in both modes, formal parameters of higher sort are expanded away, so [sym] cannot be an uninstantiated parameter of the current rule. It must be a nonterminal symbol. We can therefore look up its sort in the toplevel environment [sorts]. *) let inst, residuals = match mode with | ExpandAll -> (* Expansion of all parameters. *) let inst = List.map (fun p -> Some p) ps and residuals = [] in inst, residuals | ExpandHigherSort -> (* Expansion of only the parameters of higher sort. *) let ss : sort list = domain (sort (ParameterVar sym)) in assert (List.length ps = List.length ss); let pss = List.combine ps ss in let inst = pss |> List.map (fun (param, sort) -> if sort = star then None else Some param) in let residuals = pss |> List.filter (fun (_, sort) -> sort = star) |> List.map (fun (param, _) -> recognize param) in inst, residuals in let label = (x, inst) in enqueue label; let sym = mangle label in Parameters.app (unknown sym) residuals (* -------------------------------------------------------------------------- *) (* The following functions take arguments in the source namespace and produce results in the target namespace. *) let subst_parameter env param = (* [param] must have sort [star], in an appropriate sort environment. *) recognize (subst_parameter env param) let subst_producer env (id, param, attrs) = let param = subst_parameter env param in (id, param, attrs) let subst_producers env producers = List.map (subst_producer env) producers let subst_branch env branch = { branch with pr_producers = subst_producers env branch.pr_producers } let subst_branches env branches = List.map (subst_branch env) branches (* -------------------------------------------------------------------------- *) (* A quick and dirty way of mapping a name to a fresh name. *) let freshen : string -> string = let c = ref 0 in fun x -> Printf.sprintf "%s__menhir__%d" x (Misc.postincrement c) (* -------------------------------------------------------------------------- *) (* [instantiation_env] expects the formal parameters of a rule, [formals], and an instantiation [inst] that dictates how this rule must be specialized. It returns an environment [env] that can be used to perform specialization and a list of residual formal parameters (those that are not specialized). *) let instantiation_env formals inst : env * symbol list = assert (List.length formals = List.length inst); let env, residuals = List.fold_right2 (fun formal po (env, residuals) -> let param, residuals = match po with | Some param -> (* This formal parameter is instantiated. *) param, residuals | None -> (* This formal parameter is not instantiated. *) (* We would like to map it to itself. 
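That is, we would map the formal parameter [formal] to the parameter [ParameterVar formal].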
*) (* However, we must in principle be a bit careful: if a toplevel symbol by the same name as [formal] appears free in the codomain of the environment that we are building, then we will run into trouble. We avoid this problem by systematically renaming every formal parameter to a fresh unlikely name. *) let formal = freshen formal in ParameterVar (unknown formal), formal :: residuals in Env.add formal param env, residuals ) formals inst (Env.empty, []) in env, residuals (* -------------------------------------------------------------------------- *) (* [visit label] visits a vertex labeled [label] in the graph. This label is a pair of a nonterminal symbol [nt] and an instantiation [inst]. Unless this vertex has been visited already, we create a specialized copy of [nt] for this instantiation. This involves a call to [subst_branches], which can cause more vertices to be discovered and enqueued. *) (* The specialized symbol retains any attributes carried by the original parameterized symbol. These attributes could be either attached with this rule ([rule.pr_attributes]) or specified via an [%attribute] declaration. We have to look up [%attribute] declarations now (as opposed to letting [Drop] handle them) if this is a parameterized symbol, as the connection between the original parameterized symbol and its specialized version is evident here but is lost afterwards. *) let visit label = if not (marked label) then begin mark label; let (nt, inst) = label in let rule = StringMap.find nt g.p_rules in let formals = rule.pr_parameters in let env, residuals = instantiation_env formals inst in emit { rule with pr_nt = mangle label; pr_parameters = residuals; pr_branches = subst_branches env rule.pr_branches; pr_attributes = (if formals = [] then [] else global_attributes nt) @ rule.pr_attributes } end (* -------------------------------------------------------------------------- *) (* The entry points of the graph traversal include the nonterminal symbols of sort [*]. (Not just the start symbols, as we haven't run the reachability analysis, and the grammar may contain unreachable parts, which we still want to expand.) Because a start symbol must have sort [*], this includes the start symbols. *) let () = StringMap.iter (fun nt prule -> if prule.pr_parameters = [] then let label = (nt, []) in enqueue label ) g.p_rules (* -------------------------------------------------------------------------- *) (* The parameters that appear in [%type] declarations and [%on_error_reduce] declarations are also considered entry points. They have sort [*]. *) let subst_parameter param = subst_parameter Env.empty param let subst_declaration (param, info) = assert (sort param = star); (subst_parameter param, info) let subst_declarations decls = List.map subst_declaration decls (* -------------------------------------------------------------------------- *) (* An [%attribute] declaration for a parameter of sort [*] is treated as an entry point. An [%attribute] declaration for a symbol of higher sort is not regarded as an entry point, and at the end, is kept only if this symbol still appears in the expanded grammar. *) (* This is done in two passes over the list of [%attribute] declarations, named [thingify] and [unthingify]. The first pass runs as part of the discovery of entry points, before the graph traversal. The second pass runs after the graph traversal is complete.
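For instance (hypothetical symbols), an [%attribute] declaration that bears on the parameter list(expr), of sort [*], is translated to the target namespace right away, before the traversal; a declaration that bears on the symbol list itself, whose sort is an arrow, is kept at the end only if list still appears in the expanded grammar.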
*) type thing = | TargetParameterOfSortStar of parameter | SourceParameterOfHigherSort of parameter let thingify_parameter param : thing = if sort param = star then TargetParameterOfSortStar (subst_parameter param) else SourceParameterOfHigherSort param let thingify_attribute_declaration (params, attrs) = (List.map thingify_parameter params, attrs) let thingify_attribute_declarations decls = List.map thingify_attribute_declaration decls let unthingify_parameter rules thing = match thing with | TargetParameterOfSortStar param -> (* This parameter has sort [star]. Keep it. *) Some param | SourceParameterOfHigherSort param -> (* This parameter has higher sort. It must be a symbol. Keep it if it still appears in the expanded grammar. *) let symbol = value (Parameters.unvar param) in if StringMap.mem symbol rules then Some param else None let unthingify_attribute_declaration rules (params, attrs) = (Misc.filter_map (unthingify_parameter rules) params, attrs) let unthingify_attribute_declarations rules decls = List.map (unthingify_attribute_declaration rules) decls (* -------------------------------------------------------------------------- *) (* Put everything together and construct a new grammar. *) let g = (* Discovery of entry points. *) let p_types = subst_declarations g.p_types and p_on_error_reduce = subst_declarations g.p_on_error_reduce and things = thingify_attribute_declarations g.p_symbol_attributes in (* Graph traversal. *) repeatedly visit; (* Construction of the new grammar. *) let p_rules = rules() in let p_symbol_attributes = unthingify_attribute_declarations p_rules things in { g with p_types; p_on_error_reduce; p_symbol_attributes; p_rules } end (* of the functor *) (* -------------------------------------------------------------------------- *) (* Re-package the above functor as a function. *) let expand mode sorts g = let module G = Run(struct let mode = mode let sorts = sorts let g = g end) in G.g menhir-20210929/src/SelectiveExpansion.mli000066400000000000000000000037761412503066000203540ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Syntax open SortInference (* [expand sorts g] expands away some or all of the parameterized nonterminal symbols in the grammar [g], producing a new grammar. [sorts] is the sort environment produced by [SortInference]. *) (* The mode [ExpandHigherSort] causes a partial expansion: only the parameters of higher sort (i.e., of sort other than [*]) are expanded away. This mode is safe, in the sense that expansion always terminates. A proof sketch is as follows: 1- an application always has sort [*]; 2- therefore, only a variable can have higher sort; 3- therefore, only a finite number of terms can appear during expansion. *) (* The mode [ExpandAll] causes a complete expansion: all parameters are expanded away. This process is potentially nonterminating. One must first run the termination test in [CheckSafeParameterizedGrammar] (which itself is applicable only after the parameters of higher sort have been expanded away).
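A typical example of nontermination (a sketch) is a rule such as seq(X) whose right-hand side refers to seq(pair(X,X)): its complete expansion would create the symbols seq(pair(X,X)), then seq(pair(pair(X,X),pair(X,X))), and so on, without end.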
*) type mode = | ExpandHigherSort | ExpandAll val expand: mode -> sorts -> grammar -> grammar menhir-20210929/src/Seq.ml000066400000000000000000000037211412503066000151170ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* Sequences with constant time concatenation and linear-time conversion to an ordinary list. *) (* We maintain the invariant that the left-hand side of [SConcat] is never an empty sequence. This allows a slight improvement in [first]. *) type 'a seq = | SZero | SOne of 'a | SConcat of 'a seq * 'a seq let empty = SZero let singleton x = SOne x let append xs ys = match xs with | SZero -> ys | SOne _ | SConcat _ -> SConcat (xs, ys) let rec elements xs accu = match xs with | SZero -> accu | SOne x -> x :: accu | SConcat (xs1, xs2) -> elements xs1 (elements xs2 accu) let elements xs = elements xs [] let rec concat xss = match xss with | [] -> empty | xs :: xss -> append xs (concat xss) let rec first xs = match xs with | SZero -> (* We disallow applying [first] to an empty sequence. *) assert false | SOne x -> x | SConcat (xs1, _) -> (* Our invariant guarantees [xs1] is nonempty. *) first xs1 menhir-20210929/src/Seq.mli000066400000000000000000000024211412503066000152640ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* Sequences with constant time concatenation and linear-time conversion to an ordinary list. *) type 'a seq val empty: 'a seq val singleton: 'a -> 'a seq val append: 'a seq -> 'a seq -> 'a seq val elements: 'a seq -> 'a list val concat: 'a seq list -> 'a seq val first: 'a seq -> 'a (* sequence must be nonempty *) menhir-20210929/src/SortInference.ml000066400000000000000000000223761412503066000171440ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) let value = Positions.value let position = Positions.position let error = Error.error open Syntax open SortUnification (* -------------------------------------------------------------------------- *) (* Error handling. *) (* In [check_arity], in principle, [arity1] is the expected arity and [arity2] is the actual arity. This distinction does not make much sense, though, as we do not know which is wrong, the declaration site or the use site. So, we display a neutral error message. 
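For instance (with a hypothetical symbol), if pair is declared with two formal parameters but is applied to just one actual argument, then [check_arity] reports: does the symbol "pair" expect 1 or 2 arguments?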
*) let check_arity sym arity1 arity2 = let plural = max arity1 arity2 > 1 in if arity1 <> arity2 then error [position sym] "does the symbol \"%s\" expect %d or %d argument%s?" (value sym) (min arity1 arity2) (max arity1 arity2) (if plural then "s" else "") (* This variant of [unify] is used when no unification error can arise. *) let unify_cannot_fail sort1 sort2 = try unify sort1 sort2 with | Unify _ | Occurs _ -> (* If the caller is right, this unification step cannot fail! *) assert false (* In [unify], in principle, [sort1] is the expected sort and [sort2] is the actual sort. Again, this distinction does not make much sense, so we display a neutral error message. *) let unify sym sort1 sort2 = try unify sort1 sort2 with | Unify (v1, v2) -> let print v = print (decode v) in error [position sym] "how is the symbol \"%s\" parameterized?\n\ It is used at sorts %s and %s.\n\ The sort %s is not compatible with the sort %s." (value sym) (print sort1) (print sort2) (print v1) (print v2) | Occurs (v1, v2) -> let print v = print (decode v) in error [position sym] "how is the symbol \"%s\" parameterized?\n\ It is used at sorts %s and %s.\n\ The sort %s cannot be unified with the sort %s." (value sym) (print sort1) (print sort2) (print v1) (print v2) (* -------------------------------------------------------------------------- *) (* An environment maps (terminal and nonterminal) symbols to unification variables. *) type symbol = string module Env = StringMap type env = variable Env.t let find x env : variable = try Env.find x env with Not_found -> assert false (* unbound terminal or nonterminal symbol *) let extend env (xvs : (symbol * variable) list) = List.fold_left (fun env (x, v) -> Env.add x v env ) env xvs (* -------------------------------------------------------------------------- *) (* [allocate xs] allocates a fresh unification variable [v] for every element [x] of the list [xs]. It returns the lists [xvs] and [vs]. *) let allocate (xs : 'a list) : ('a * variable) list * variable list = let xvs = List.map (fun x -> x, fresh()) xs in let vs = List.map snd xvs in xvs, vs (* -------------------------------------------------------------------------- *) (* [check_parameter env param expected] checks that the parameter [param] has sort [expected]. A parameter is either a symbol or an application of a symbol to a number of parameters. Every application is total -- the language does not have partial applications. The sort of every application is [star], but the sort of a variable is unrestricted. *) let rec check_parameter env (param : parameter) (expected : variable) = match param with | ParameterVar sym -> let x = value sym in unify sym expected (find x env) | ParameterApp (sym, actuals) -> let x = value sym in (* This application has sort [star]. *) unify sym expected star; (* Retrieve the expected sort of each parameter. Two cases arise: if [x] has already been assigned an arrow sort, then we can retrieve its domain, which gives us the expected sort of each actual parameter; otherwise, we just make up a fresh arrow sort of appropriate arity. We could avoid this case distinction and always use the latter method, but the former method, when applicable, yields better error messages. If [sym] is a toplevel (nonterminal or terminal) symbol, then we will be in the first case, as we have been careful to initially assign an arrow sort of appropriate arity to each such symbol. 
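For instance (hypothetical symbols), if the toplevel symbol F has two formal parameters, then its sort is already known to be an arrow whose domain has two elements, and this domain directly provides the expected sorts of the two actual parameters; if instead [x] is a formal parameter of the current rule, first encountered in an application x(Y), then we make up a fresh arrow sort of arity 1 and unify it with the sort of [x].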
*) let v = find x env in let expected = match domain v with | Some expected -> check_arity sym (List.length expected) (List.length actuals); expected | None -> let _, expected = allocate actuals in unify_cannot_fail v (arrow expected); expected in (* Check the sort of each actual parameter. *) List.iter2 (check_parameter env) actuals expected | ParameterAnonymous _ -> (* Anonymous rules have been eliminated already. *) assert false (* -------------------------------------------------------------------------- *) (* The following functions respectively check that a producer, a branch, a rule, and a grammar are well-sorted under an environment [env]. *) let check_producer env (producer : producer) = let (_, param, _) = producer in (* A producer must have sort [star]. *) check_parameter env param star let check_branch env (branch : parameterized_branch) = List.iter (check_producer env) branch.pr_producers let enter_rule env (nt : symbol) (rule : parameterized_rule) : env = (* For each formal parameter, allocate a fresh variable. *) let formals, domain = allocate rule.pr_parameters in (* Connect these variables with the sort of the symbol [nt]. *) (* Because it is performed first, this particular unification cannot fail. *) unify_cannot_fail (find nt env) (arrow domain); (* Extend the environment. *) extend env formals let check_rule env (nt : symbol) (rule : parameterized_rule) = (* Extend the environment within this rule. *) let env = enter_rule env nt rule in (* Check each branch in this extended environment. *) List.iter (check_branch env) rule.pr_branches let check_grammar env g = (* Each rule must be well-sorted. *) StringMap.iter (check_rule env) g.p_rules; (* The start symbols must have sort [star]. *) StringMap.iter (fun nt position -> let sym = Positions.with_pos position nt in unify sym star (find nt env) ) g.p_start_symbols; (* Every symbol that appears in a [%type] declaration must have sort [star]. *) List.iter (fun (param, _) -> check_parameter env param star ) g.p_types; (* Same rule for [%on_error_reduce] declarations. *) List.iter (fun (param, _) -> check_parameter env param star ) g.p_on_error_reduce; (* The symbols that appear in [%attribute] declarations must be well-sorted. Their sort is not necessarily [star]: it is legal to attach an attribute with a parameterized symbol. *) List.iter (fun (params, _) -> List.iter (fun param -> check_parameter env param (fresh()) ) params ) g.p_symbol_attributes (* -------------------------------------------------------------------------- *) type sorts = GroundSort.sort Env.t let infer (g : grammar) : sorts = (* For each (terminal or nonterminal) symbol, allocate a unification variable. The terminal symbols have sort [star], so we can use this particular variable. *) let env = StringMap.fold (fun tok _ env -> Env.add tok star env ) g.p_tokens Env.empty in let env = Env.add "error" star env in let env = StringMap.fold (fun nt rule env -> let env = Env.add nt (fresh()) env in (* The following line unifies the sort of [nt] with an arrow of appropriate arity. It cannot fail. This strategy should lead to slightly better unification error messages. *) let _ : env = enter_rule env nt rule in env ) g.p_rules env in (* Impose sort equality constraints. *) check_grammar env g; (* Decode the environment, so our user doesn't have to deal with unification variables. *) let env = Env.map decode env in (* Ground any unassigned sort variables. (These should occur only in unreachable parts of the grammar.) 
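Grounding replaces every remaining sort variable with the sort [*].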
This guarantees that the user does not have to deal with sort variables. *) let env = Env.map ground env in (* At log level 3, display the inferred sort of every symbol. *) Error.logG 3 (fun f -> Env.iter (fun x gsort -> Printf.fprintf f "%s :: %s\n" x (print (unground gsort)) ) env ); env menhir-20210929/src/SortInference.mli000066400000000000000000000023471412503066000173100ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Syntax open GroundSort (* [infer g] performs sort inference for the grammar [g], rejecting the grammar if it is ill-sorted. It returns a map of (terminal and nonterminal) symbols to ground sorts. *) type sorts = sort StringMap.t val infer: grammar -> sorts menhir-20210929/src/SortUnification.ml000066400000000000000000000076311412503066000175130ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module implements sort inference. *) (* -------------------------------------------------------------------------- *) (* The syntax of sorts is: sort ::= (sort, ..., sort) -> * where the arity (the number of sorts on the left-hand side of the arrow) can be zero. *) module S = struct type 'a structure = | Arrow of 'a list let map f (Arrow xs) = Arrow (List.map f xs) let iter f (Arrow xs) = List.iter f xs exception Iter2 let iter2 f (Arrow xs1) (Arrow xs2) = let n1 = List.length xs1 and n2 = List.length xs2 in if n1 = n2 then List.iter2 f xs1 xs2 else raise Iter2 end include S (* -------------------------------------------------------------------------- *) (* Instantiate the unification algorithm with the above signature. *) include Unifier.Make(S) type sort = term = | TVar of int | TNode of sort structure (* -------------------------------------------------------------------------- *) (* Sort constructors. *) let arrow (args : variable list) : variable = fresh (Some (Arrow args)) let star : variable = arrow [] let fresh () = fresh None (* Sort accessors. *) let domain (x : variable) : variable list option = match structure x with | Some (Arrow xs) -> Some xs | None -> None (* -------------------------------------------------------------------------- *) (* Converting between sorts and ground sorts. *) let rec ground s = match s with | TVar _ -> (* All variables are replaced with [*]. *) GroundSort.GArrow [] | TNode (Arrow ss) -> GroundSort.GArrow (List.map ground ss) let rec unground (GroundSort.GArrow ss) = TNode (Arrow (List.map unground ss)) (* -------------------------------------------------------------------------- *) (* A name generator for unification variables.
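It yields the infinite sequence of names a, b, ..., z, a1, b1, ..., z1, a2, and so on.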
*) let make_gensym () : unit -> string = let c = ref 0 in let gensym () = let n = Misc.postincrement c in Printf.sprintf "%c%s" (char_of_int (Char.code 'a' + n mod 26)) (let d = n / 26 in if d = 0 then "" else string_of_int d) in gensym (* A memoized name generator. *) let make_name () : int -> string = let gensym = make_gensym() in Fix.Memoize.Int.memoize (fun _x -> gensym()) (* -------------------------------------------------------------------------- *) (* A printer. *) let rec print name (b : Buffer.t) (sort : sort) = match sort with | TVar x -> Printf.bprintf b "%s" (name x) | TNode (S.Arrow []) -> Printf.bprintf b "*" | TNode (S.Arrow (sort :: sorts)) -> (* Always parenthesize the domain, so there is no ambiguity. *) Printf.bprintf b "(%a%a) -> *" (print name) sort (print_comma_sorts name) sorts and print_comma_sorts name b sorts = List.iter (print_comma_sort name b) sorts and print_comma_sort name b sort = Printf.bprintf b ", %a" (print name) sort let print sort : string = let b = Buffer.create 32 in print (make_name()) b sort; Buffer.contents b menhir-20210929/src/SortUnification.mli000066400000000000000000000044241412503066000176610ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module implements sort inference. *) (* -------------------------------------------------------------------------- *) (* The syntax of sorts is: sort ::= (sort, ..., sort) -> * where the arity (the number of sorts on the left-hand side of the arrow) can be zero. See [GroundSort]. *) type 'a structure = | Arrow of 'a list type sort = | TVar of int | TNode of sort structure (* -------------------------------------------------------------------------- *) (* Sort unification. *) type variable val star: variable val arrow: variable list -> variable val fresh: unit -> variable (* [domain] is the opposite of [arrow]. If [x] has been unified with an arrow, then [domain x] returns its domain. Otherwise, it returns [None]. Use with caution. *) val domain: variable -> variable list option exception Unify of variable * variable exception Occurs of variable * variable val unify: variable -> variable -> unit (* Once unification is over, a unification variable can be decoded as a sort. *) val decode: variable -> sort (* Grounding a sort replaces all sort variables with the sort [*]. *) val ground: sort -> GroundSort.sort val unground: GroundSort.sort -> sort (* -------------------------------------------------------------------------- *) (* A sort can be printed. *) val print: sort -> string menhir-20210929/src/SparseBitSet.ml000066400000000000000000000136711412503066000167440ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. 
*) (* *) (******************************************************************************) (* This data structure implements sets of integers (of unbounded magnitude). *) module A = AtomicBitSet (* A sparse bit set is a linked list of pairs of an index and a bit set. The list is sorted by order of increasing indices. *) type t = | N | C of int * A.t * t type element = int let empty = N let is_empty = function | N -> true | C _ -> false let rec add base offset s = match s with | N -> (* Insert at end. *) C (base, A.singleton offset, N) | C (addr, ss, qs) -> if base < addr then (* Insert in front. *) C (base, A.singleton offset, s) else if base = addr then (* Found appropriate cell, update bit field. *) let ss' = A.add offset ss in if A.equal ss' ss then s else C (addr, ss', qs) else (* Not there yet, continue. *) let qs' = add base offset qs in if qs == qs' then s else C (addr, ss, qs') let add i s = let offset = i mod A.bound in let base = i - offset in add base offset s let singleton i = (* This is [add i N], specialised. *) let offset = i mod A.bound in let base = i - offset in C (base, A.singleton offset, N) let rec remove base offset s = match s with | N -> N | C (addr, ss, qs) -> if base < addr then s else if base = addr then (* Found appropriate cell, update bit field. *) let ss' = A.remove offset ss in if A.is_empty ss' then qs else if A.equal ss' ss then s else C (addr, ss', qs) else (* Not there yet, continue. *) let qs' = remove base offset qs in if qs == qs' then s else C (addr, ss, qs') let remove i s = let offset = i mod A.bound in let base = i - offset in remove base offset s let rec mem base offset s = match s with | N -> false | C (addr, ss, qs) -> if base < addr then false else if base = addr then A.mem offset ss else mem base offset qs let mem i s = let offset = i mod A.bound in let base = i - offset in mem base offset s let rec fold f s accu = match s with | N -> accu | C (addr, ss, qs) -> let accu = A.fold_delta addr f ss accu in fold f qs accu let rec iter f s = match s with | N -> () | C (addr, ss, qs) -> A.iter_delta addr f ss; iter f qs let is_singleton s = match s with | C (_, ss, N) -> A.is_singleton ss | C (_, _, C _) | N -> false let rec cardinal accu s = match s with | C (_, ss, qs) -> let accu = accu + A.cardinal ss in cardinal accu qs | N -> accu let cardinal s = cardinal 0 s let elements s = fold (fun tl hd -> tl :: hd) s [] let rec subset s1 s2 = match s1, s2 with | N, _ -> true | _, N -> false | C (addr1, ss1, qs1), C (addr2, ss2, qs2) -> if addr1 < addr2 then false else if addr1 = addr2 then A.subset ss1 ss2 && subset qs1 qs2 else subset s1 qs2 (* [union] arbitrarily attempts to preserve sharing between its second argument and its result. *) let rec union s1 s2 = match s1, s2 with | N, s | s, N -> s | C (addr1, ss1, qs1), C (addr2, ss2, qs2) -> if addr1 < addr2 then C (addr1, ss1, union qs1 s2) else if addr1 > addr2 then let s = union s1 qs2 in if s == qs2 then s2 else C (addr2, ss2, s) else let ss = A.union ss1 ss2 in let s = union qs1 qs2 in if A.equal ss ss2 && s == qs2 then s2 else C (addr1, ss, s) (* [inter] arbitrarily attempts to preserve sharing between its first argument and its result.
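That is, when the intersection happens to coincide with [s1], we return [s1] itself rather than an equal copy; preserving physical equality in this way makes the [==] fast paths in [equal] and [compare] more likely to fire.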
*) let rec inter s1 s2 = match s1, s2 with | N, _ | _, N -> N | C (addr1, ss1, qs1), C (addr2, ss2, qs2) -> if addr1 < addr2 then inter qs1 s2 else if addr1 > addr2 then inter s1 qs2 else let ss = A.inter ss1 ss2 in let s = inter qs1 qs2 in if A.is_empty ss then s else if A.equal ss ss1 && s == qs1 then s1 else C (addr1, ss, s) let choose s = match s with | N -> raise Not_found | C (addr, ss, _) -> assert (not (A.is_empty ss)); addr + A.choose ss let rec compare x y = if x == y then 0 else match x, y with | C (a1, ss1, qs1), C (a2, ss2, qs2) -> begin match Generic.compare a1 a2 with | 0 -> begin match A.compare ss1 ss2 with | 0 -> compare qs1 qs2 | n -> n end | n -> n end | N, N -> 0 | C _, N -> 1 | N, C _ -> -1 let rec equal x y = (x == y) || match x, y with | C (a1, ss1, qs1), C (a2, ss2, qs2) -> a1 = a2 && A.equal ss1 ss2 && equal qs1 qs2 | N, N -> true | C _, N | N, C _ -> false let rec disjoint s1 s2 = match s1, s2 with | N, _ | _, N -> true | C (addr1, ss1, qs1), C (addr2, ss2, qs2) -> if addr1 = addr2 then A.disjoint ss1 ss2 && disjoint qs1 qs2 else if addr1 < addr2 then disjoint qs1 s2 else disjoint s1 qs2 menhir-20210929/src/SparseBitSet.mli000066400000000000000000000017671412503066000171200ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) include GSet.S with type element = int menhir-20210929/src/StackStates.ml000066400000000000000000000133671412503066000166270ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar module Run (S : sig (**[stack_height s] is the height of the known suffix of the stack at state [s]. *) val stack_height: Lr1.node -> int (**[production_height prod] is the height of the known suffix of the stack at a state where production [prod] can be reduced. *) val production_height: Production.index -> int (**[goto_height nt] is the height of the known suffix of the stack at a state where an edge labeled [nt] has just been followed. *) val goto_height: Nonterminal.t -> int end) = struct open S (* We now wish to compute, at each state [s], a vector of sets of states, whose length is [stack_height s]. *) (* Vectors of sets of states. *) module StateSetVector = struct (* We use arrays whose right end represents the top of the stack. *) (* The index 0 corresponds to the cell that lies deepest in the stack. 
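For instance, at a state whose known stack suffix has height 2, the vector [| Q1; Q2 |], where Q1 and Q2 are sets of states, means that the deeper of the two cells in this suffix holds a state in Q1 while the topmost cell holds a state in Q2.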
*) let empty, push = MArray.(empty, push) let truncate k v = assert (k <= Array.length v); MArray.truncate k v type property = Lr1.NodeSet.t array let bottom height = Array.make height Lr1.NodeSet.empty let leq_join v1 v2 = MArray.leq_join Lr1.NodeSet.leq_join v1 v2 (* Because all heights are known ahead of time, we are able (and careful) to compare and join only vectors of equal length. *) end open StateSetVector (* Define the data flow graph. *) (* Its vertices are the nodes of the LR(1) automaton. *) module G = struct type variable = Lr1.node type property = StateSetVector.property (* At each start state of the automaton, the stack is empty. *) let foreach_root contribute = Lr1.entry |> ProductionMap.iter (fun _prod root -> assert (stack_height root = 0); contribute root empty ) (* The edges of the data flow graph are the transitions of the automaton. *) let foreach_successor source stack contribute = Lr1.transitions source |> SymbolMap.iter (fun _symbol target -> (* The contribution of [source], through this edge, to [target], is the stack at [source], extended with a new cell for this transition, and truncated to the stack height at [target], so as to avoid obtaining a vector that is longer than expected/necessary. *) let cell = Lr1.NodeSet.singleton source and height = stack_height target in let stack = push stack cell in contribute target (truncate height stack) ) end (* Compute the least fixed point. *) let stack_states : Lr1.node -> property option = let module F = Fix.DataFlow.Run(Lr1.ImperativeNodeMap)(StateSetVector)(G) in F.solution (* If every state is reachable, then the least fixed point must be non-[None] everywhere, so we may view it as a function that produces a vector of sets of states. *) let stack_states (node : Lr1.node) : property = match stack_states node with | None -> (* Apparently this node is unreachable. *) assert false | Some v -> v (* [truncate_join height f nodes] computes a join of the images through [f] of the nodes in the set [nodes], truncated at height [height]. *) let truncate_join height f nodes = Lr1.NodeSet.fold (fun node accu -> leq_join (truncate height (f node)) accu ) nodes (bottom height) (* From the above information, deduce, for each production, the shape of the stack when this production is reduced. *) (* We produce a vector of states whose length is [production_height prod]. It is up to the user to provide an appropriate height oracle. *) let production_states : Production.index -> property = Production.tabulate (fun prod -> let sites = Lr1.production_where prod in let height = production_height prod in truncate_join height stack_states sites ) (* Compute the shape of the stack when a transition on the nonterminal symbol [nt] is taken. *) (* We produce a vector of states whose length is [goto_height nt]. It is up to the user to provide an appropriate height oracle. *) let goto_states : Nonterminal.t -> property = Nonterminal.tabulate (fun nt -> let symbol = Symbol.N nt in (* Compute the join of the stack shapes at every target of an edge labeled with [nt]. *) let targets = Lr1.all_targets symbol in let height = goto_height nt in truncate_join height stack_states targets ) type property = Lr1.NodeSet.t array (* Debugging output. 
*) let print (v : property) = if Array.length v = 0 then "epsilon" else Misc.separated_list_to_string Lr1.NodeSet.print "; " (Array.to_list v) let dump (prefix : string) f = Lr1.iter (fun node -> Printf.fprintf f "%sstack(%s) = %s\n" prefix (Lr1.print node) (print (stack_states node)) ); Production.iterx (fun prod -> Printf.fprintf f "%sprodstack(%s) = %s\n" prefix (Production.print prod) (print (production_states prod)) ) end (* Run *) menhir-20210929/src/StackStates.mli000066400000000000000000000065321412503066000167740ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (**This module performs a static analysis of the LR(1) automaton in order to determine which states might possibly be held in the known suffix of the stack at every state. We assume that the known suffix of the stack, a sequence of symbols, has already been computed at every state. All that is needed, actually, is the size of the known suffix, given by the function [stack_height]. This size information must be consistent: the size at a state [s] must be no greater than the minimum of the sizes at the predecessors of [s], plus one. *) module Run (S : sig (**[stack_height s] is the height of the known suffix of the stack at state [s]. *) val stack_height: Lr1.node -> int (**[production_height prod] is the height of the known suffix of the stack at a state where production [prod] can be reduced. *) val production_height: Production.index -> int (**[goto_height nt] is the height of the known suffix of the stack at a state where an edge labeled [nt] has just been followed. *) val goto_height: Nonterminal.t -> int end) : sig (**A property is a description of the known suffix of the stack at state [s]. It is represented as an array. By convention, the top of the stack is the end of the array. Each array element is a set of states that may appear in this stack cell. *) type property = Lr1.NodeSet.t array (**[print] prints a property. *) val print: property -> string (**[stack_states s] is the known suffix of the stack at state [s]. *) val stack_states: Lr1.node -> property (**[production_states prod] is the known suffix of the stack at a state where production [prod] can be reduced. In the short invariant, the length of this suffix is [Production.length prod]. In the long invariant, its length can be greater. *) val production_states: Production.index -> property (**[goto_states nt] is the known suffix of the stack at a state where an edge labeled [nt] has just been followed. In the short invariant, the length of this suffix is [1]. In the long invariant, its length can be greater. *) val goto_states: Nonterminal.t -> property (**[dump prefix f] dumps the result of the analysis to the output channel [f], in an unspecified format. The string [prefix] is emitted at the beginning of every line of output.
*) val dump: string -> out_channel -> unit end menhir-20210929/src/StackSymbols.ml000066400000000000000000000227121412503066000170060ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar module type STACK_SYMBOLS = sig (**A property is a description of the known suffix of the stack at state [s]. It is represented as an array of symbols. By convention, the top of the stack is the end of the array. *) type property = Symbol.t array (**[stack_symbols s] is the known suffix of the stack at state [s]. It is represented as an array of symbols. By convention, the top of the stack is the end of the array. *) val stack_symbols: Lr1.node -> property (**[stack_height s] is [Array.length (stack_symbols s)]. *) val stack_height: Lr1.node -> int (**[production_symbols prod] is the known suffix of the stack at a state where production [prod] can be reduced. In the short invariant, the length of this suffix is [Production.length prod]. In the long invariant, its length can be greater. *) val production_symbols: Production.index -> property (**[production_height prod] is [Array.length (production_symbols prod)]. *) val production_height: Production.index -> int (**[goto_symbols nt] is the known suffix of the stack at a state where an edge labeled [nt] has just been followed. In the short invariant, the length of this suffix is [1]. In the long invariant, its length can be greater. *) val goto_symbols: Nonterminal.t -> property (**[goto_height nt] is [Array.length (goto_symbols nt)]. *) val goto_height: Nonterminal.t -> int end (* We compute a lower bound on the height of the stack at every state, and at the same time, we compute which symbols are held in this stack prefix. *) (* In order to compute a lower bound on the height of the stack at a state [s], we examine the LR(0) items that compose [s]. For each item, if the bullet is at position [pos], then we can be assured that the height of the stack is at least [pos]. Thus, we compute the maximum of [pos] over all items (of which there is at least one). *) (* The set of items that we use is not closed, but this does not matter; the items that would be added by the closure would not add any information regarding the height of the stack, since the bullet is at position 0 in these items. *) (* Instead of computing just the stack height, we compute, in the same manner, which symbols are on the stack at a state [s]. This is an array of symbols whose length is the height of the stack at [s]. By convention, the top of the stack is the end of the array. *) (* This analysis is extremely fast: on an automaton with over 100,000 states, it takes under 0.01 second. *) module Run () = struct type property = Symbol.t array (* Compute and tabulate this information at the level of the LR(0) automaton. 
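For instance (a sketch), if a node contains an item with the bullet at position 2, say E -> E + . T, and another with the bullet at position 0, say T -> . T F, then the maximum bullet position is 2, and the known suffix of the stack consists of the two symbols E +.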
*) let stack_symbols : Lr0.node -> property = let dummy = Array.make 0 (Symbol.T Terminal.sharp) in Misc.tabulate Lr0.n (fun node -> Item.Set.fold (fun item accu -> let _prod, _nt, rhs, pos, _length = Item.def item in if pos > Array.length accu then Array.sub rhs 0 pos else accu ) (Lr0.items node) dummy ) (* Extend it to the LR(1) automaton. *) let stack_symbols (node : Lr1.node) : property = stack_symbols (Lr0.core (Lr1.state node)) let stack_height (node : Lr1.node) : int = Array.length (stack_symbols node) (* Add a trivial definition of [production_symbols]. *) let production_symbols = Production.rhs let production_height prod = Array.length (production_symbols prod) (* Add a trivial definition of [goto_symbols]. *) let goto_symbols nt = [| Symbol.N nt |] let goto_height nt = Array.length (goto_symbols nt) end (* ------------------------------------------------------------------------ *) (* The submodule [Long] computes the known suffix of the stack in each state, as a vector of symbols, and it computes a suffix that is as long as possible, in contrast with the above code, which computes a suffix whose length can be predicted based on the LR(0) items in each state. *) module Long () = struct (* Vectors of symbols. *) module SymbolVector = struct type property = Symbol.t array let empty, push = MArray.(empty, push) (* Given two arrays [v1] and [v2] of lengths [n1] and [n2], the function call [lcs v1 v2 n1 n2 (min n1 n2) 0] computes the greatest [k] such that [truncate k v1] and [truncate k v2] are equal. *) let rec lcs v1 v2 n1 n2 n k = (* [n] is [min n1 n2]. *) if k = n || v1.(n1 - 1 - k) <> v2.(n2 - 1 - k) then k else lcs v1 v2 n1 n2 n (k + 1) let leq_join v1 v2 = let n1 = Array.length v1 and n2 = Array.length v2 in let n = min n1 n2 in let k = lcs v1 v2 n1 n2 n 0 in if k = n2 then v2 else if k = n1 then v1 else MArray.truncate k v1 end include SymbolVector (* Define the data flow graph. *) (* We perform the data flow analysis at the level of the LR(0) automaton. *) module G = struct type variable = Lr0.node type property = SymbolVector.property (* At each start state of the automaton, the stack is empty. *) let foreach_root contribute = Lr0.entry |> ProductionMap.iter (fun _prod root -> contribute root empty ) (* The edges of the data flow graph are the transitions of the automaton. *) let foreach_successor source stack contribute = Lr0.outgoing_edges source |> SymbolMap.iter (fun symbol target -> (* The contribution of [source], through this edge, to [target], is the stack at [source], extended with a new cell for this transition. *) contribute target (push stack symbol) ) end (* Compute the least fixed point. *) let stack_symbols : Lr0.node -> property option = let module F = Fix.DataFlow.Run(Lr0.ImperativeNodeMap)(SymbolVector)(G) in F.solution (* If every state is reachable, then the least fixed point must be non-[None] everywhere, so we may view it as a function that produces a vector of symbols. *) let stack_symbols (node : Lr0.node) : property = match stack_symbols node with | None -> (* Apparently this node is unreachable. *) assert false | Some v -> v (* Move up to the level of the LR(1) automaton. *) let stack_symbols (node : Lr1.node) : property = stack_symbols (Lr0.core (Lr1.state node)) let stack_height (node : Lr1.node) : int = Array.length (stack_symbols node) (* [join1 f nodes] computes the join of the images through [f] of the nodes in the set [nodes]. Because our join does not have a bottom element, this set must be nonempty.
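This join is the longest common suffix of the two vectors: for instance (with made-up symbols), joining [A; B; C] and [D; B; C] yields [B; C], and joining that result with an empty vector yields the empty vector. This is exactly what [leq_join] above computes.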
*) let join1 f nodes = let node = Lr1.NodeSet.choose nodes in let nodes = Lr1.NodeSet.remove node nodes in Lr1.NodeSet.fold (fun node accu -> leq_join (f node) accu ) nodes (f node) (* From the above information, deduce, for each production, the shape of the stack when this production is reduced. *) (* We *can* produce a vector whose length is greater than that of the production [prod]. *) let production_symbols : Production.index -> property = Production.tabulate (fun prod -> let nodes = Lr1.production_where prod in if Lr1.NodeSet.is_empty nodes then (* This production is never reduced. It is not clear what vector should be returned. Using the right-hand side of the production seems reasonable. This is what the short invariant does. *) Production.rhs prod else (* Compute a join over the set of nodes where this production is reduced. *) join1 stack_symbols nodes ) let production_height prod = Array.length (production_symbols prod) (* Compute the shape of the stack when a transition on the nonterminal symbol [nt] is taken. *) let goto_symbols : Nonterminal.t -> property = Nonterminal.tabulate (fun nt -> let symbol = Symbol.N nt in (* Compute the join of the stack shapes at every target of an edge labeled with [nt]. *) let targets = Lr1.all_targets symbol in if Lr1.NodeSet.is_empty targets then (* No edge is labeled [nt]. *) [| symbol |] else join1 stack_symbols targets ) let goto_height nt = Array.length (goto_symbols nt) end (* ------------------------------------------------------------------------ *) (* Printing. *) let buffer = Buffer.create 1024 let print_symbols symbols = symbols |> Array.iter (fun symbol -> Printf.bprintf buffer " %s" (Symbol.print symbol) ); let s = Buffer.contents buffer in Buffer.clear buffer; s menhir-20210929/src/StackSymbols.mli000066400000000000000000000070121412503066000171530ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar module type STACK_SYMBOLS = sig (**A property is a description of the known suffix of the stack at state [s]. It is represented as an array of symbols. By convention, the top of the stack is the end of the array. *) type property = Symbol.t array (**[stack_symbols s] is the known suffix of the stack at state [s]. It is represented as an array of symbols. By convention, the top of the stack is the end of the array. *) val stack_symbols: Lr1.node -> property (**[stack_height s] is [Array.length (stack_symbols s)]. *) val stack_height: Lr1.node -> int (**[production_symbols prod] is the known suffix of the stack at a state where production [prod] can be reduced. In the short invariant, the length of this suffix is [Production.length prod]. In the long invariant, its length can be greater. *) val production_symbols: Production.index -> property (**[production_height prod] is [Array.length (production_symbols prod)]. *) val production_height: Production.index -> int (**[goto_symbols nt] is the known suffix of the stack at a state where an edge labeled [nt] has just been followed. In the short invariant, the length of this suffix is [1]. In the long invariant, its length can be greater. 
*) val goto_symbols: Nonterminal.t -> property (**[goto_height nt] is [Array.length (goto_symbols nt)]. *) val goto_height: Nonterminal.t -> int end (**This module computes the known suffix of the stack, a sequence of symbols, in each of the automaton's states. The length of this sequence can be predicted based on the LR(0) items present in this state: it is the maximum position of the bullet over all items. *) module Run () : STACK_SYMBOLS (**This module computes the known suffix of the stack, a sequence of symbols, in each of the automaton's states. The length of this sequence is determined by an analysis of the paths in the LR(0) automaton. At each state, the sequence computed by [Run] is always a suffix of the sequence computed by [Long]. *) module Long () : STACK_SYMBOLS (* The "long invariant" was used in Menhir until 2012/08/25. However, the extra information that it contains, compared to the "short invariant", was useless; computing it was a waste of time. As of 2012/08/25, the short invariant has been used. As of 2021/05/14, the long invariant is re-introduced, for possible use in the new code back-end. *) (**This utility function prints a sequence of symbols. Every symbol is preceded with a space. *) val print_symbols: Symbol.t array -> string menhir-20210929/src/Trie.ml000066400000000000000000000202711412503066000152710ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* -------------------------------------------------------------------------- *) (* We begin with a number of auxiliary functions that provide information about the LR(1) automaton. These functions could perhaps be moved elsewhere, e.g., inside [Default]. We keep them here, for now, because they are not used anywhere else. *) (* [can_reduce s prod] indicates whether state [s] is able to reduce production [prod] (either as a default reduction, or as a normal reduction). *) let can_reduce s prod = match Default.has_default_reduction s with | Some (prod', _) when prod = prod' -> true | _ -> TerminalMap.fold (fun z prods accu -> (* A reduction on [#] is always a default reduction. (See [lr1.ml].) *) assert (not (Terminal.equal z Terminal.sharp)); accu || Terminal.non_error z && List.mem prod prods ) (Lr1.reductions s) false (* [reduction_path_exists s w prod] tests whether the path determined by the sequence of symbols [w] out of the state [s] exists in the automaton and leads to a state where [prod] can be reduced. It further requires [w] to not contain the [error] token. *) let rec reduction_path_exists s (w : Symbol.t list) prod : bool = match w with | [] -> can_reduce s prod | a :: w -> Symbol.non_error a && match SymbolMap.find a (Lr1.transitions s) with | s -> reduction_path_exists s w prod | exception Not_found -> false (* -------------------------------------------------------------------------- *) (* Tries. *) module Make (X : sig end) = struct (* A trie has the following structure. *) type trie = { (* A unique identity, used by [compare]. The trie construction code ensures that these numbers are indeed unique: see [fresh], [insert], [star]. 
*) identity: int; (* The root state of this star: "where we come from". *) source: Lr1.node; (* The current state, i.e., the root of this sub-trie: "where we are". *) current: Lr1.node; (* The productions that we can reduce in the current state. In other words, if this list is nonempty, then the current state is the end of one (or several) branches. It can nonetheless have children. *) mutable productions: Production.index list; (* The children, or sub-tries. *) mutable transitions: trie SymbolMap.t (* The two fields above are written only during the construction of a trie. Once every trie has been constructed, they are frozen. *) } (* This counter is used by [mktrie] to produce unique identities. *) let c = ref 0 (* We keep a mapping of integer identities to tries. Whenever a new identity is assigned, this mapping must be updated. *) let tries = let s : Lr1.node = Obj.magic () in (* yes, this hurts *) let dummy = { identity = -1; source = s; current = s; productions = []; transitions = SymbolMap.empty } in MenhirLib.InfiniteArray.make dummy (* This smart constructor creates a new trie with a unique identity. *) let mktrie source current productions transitions = let identity = Misc.postincrement c in let t = { identity; source; current; productions; transitions } in MenhirLib.InfiniteArray.set tries identity t; t (* [insert t w prod] updates the trie (in place) by adding a new branch, corresponding to the sequence of symbols [w], and ending with a reduction of production [prod]. We assume [reduction_path_exists t.current w prod] holds, so we need not worry about this being a dead branch, and we can use destructive updates without having to set up an undo mechanism. *) let rec insert (t : trie) (w : Symbol.t list) prod : unit = match w with | [] -> assert (can_reduce t.current prod); t.productions <- prod :: t.productions | a :: w -> match SymbolMap.find a (Lr1.transitions t.current) with | exception Not_found -> assert false | successor -> (* Find our child at [a], or create it. *) let t' = try SymbolMap.find a t.transitions with Not_found -> let t' = mktrie t.source successor [] SymbolMap.empty in t.transitions <- SymbolMap.add a t' t.transitions; t' in (* Update our child. *) insert t' w prod (* [insert t prod] inserts a new branch, corresponding to production [prod], into the trie [t], which is updated in place. *) let insert t prod : unit = let w = Array.to_list (Production.rhs prod) in (* Check whether the path [w] leads to a state where [prod] can be reduced. If not, then some transition or reduction action must have been suppressed by conflict resolution; or the path [w] involves the [error] token. In that case, the branch is dead, and is not added. This test is superfluous (i.e., it would be OK to add a dead branch) but allows us to build a slightly smaller star in some cases. *) if reduction_path_exists t.current w prod then insert t w prod (* [fresh s] creates a new empty trie whose source is [s]. *) let fresh source = mktrie source source [] SymbolMap.empty (* The star at [s] is obtained by starting with a fresh empty trie and inserting into it every production [prod] whose left-hand side [nt] is the label of an outgoing edge at [s]. *) let star s = let t = fresh s in SymbolMap.iter (fun sym _ -> match sym with | Symbol.T _ -> () | Symbol.N nt -> Production.iternt nt (insert t) ) (Lr1.transitions s); t (* A trie [t] is nontrivial if it has at least one branch, i.e., contains at least one sub-trie whose [productions] field is nonempty.
Trivia: a trie of size greater than 1 is necessarily nontrivial, but the converse is not true: a nontrivial trie can have size 1. (This occurs if all productions have zero length.) *) let trivial t = t.productions = [] && SymbolMap.is_empty t.transitions (* Redefine [star] to record the size of the newly built trie. *) let size = Array.make Lr1.n (-1) let star s = let initial = !c in let t = star s in let final = !c in size.(Lr1.number s) <- final - initial; t (* Define [stars] to build all stars and pass all nontrivial ones to [f]. *) let stars f = (* For every state [s]... *) Lr1.iter (fun s -> (* Build the trie rooted at [s]. If it is nontrivial, invoke [f]. *) let t = star s in if not (trivial t) then f s t ) let size s = assert (size.(s) >= 0); size.(s) let total_size () = !c let compare t1 t2 = Generic.compare t1.identity t2.identity let source t = t.source let current t = t.current let accepts prod t = List.mem prod t.productions let step a t = SymbolMap.find a t.transitions (* careful: may raise [Not_found] *) let verbose () = Printf.eprintf "Total star size: %d\n%!" (total_size()) let decode i = let t = MenhirLib.InfiniteArray.get tries i in assert (t.identity = i); (* ensure we do not get the [dummy] trie *) t let encode t = assert (decode t.identity == t); (* round-trip property *) t.identity end menhir-20210929/src/Trie.mli000066400000000000000000000100651412503066000154420ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* Suppose [s] is a state that carries an outgoing edge labeled with a non-terminal symbol [nt]. We are interested in finding out how this edge can be taken. In order to do that, we must determine how, by starting in [s], one can follow a path that corresponds to (the right-hand side of) a production [prod] associated with [nt]. There are in general several such productions. The paths that they determine in the automaton form a "star". We represent the star rooted at [s] as a trie. A point in a trie (that is, a sub-trie) tells us where we come from, where we are, and which production(s) we are hoping to reduce in the future. *) (* This module depends on [Grammar], [Lr1], [Default]: that is, we assume that the automaton has been fully constructed. It is used by [LRijkstra]. *) module Make (X : sig end) : sig type trie (* [stars f] constructs the trie rooted at every state [s]. (There is one branch for every production [prod] associated with every non-terminal symbol [nt] for which [s] carries an outgoing edge.) If this trie [t] is nontrivial (i.e., it has at least one branch, leading to a state where a production can be reduced), then [f s t] is invoked. *) val stars: (Lr1.node -> trie -> unit) -> unit (* After [stars] has been called, [size (Lr1.number s)] reports the size of the trie that has been constructed for state [s]. *) val size: int -> int (* After [stars] has been called, [total_size()] reports the total size of the tries that have been constructed. *) val total_size: unit -> int (* Every (sub-)trie has a unique identity. (One can think of it as its address.) 
[compare] compares the identity of two tries. This can be used, e.g., to set up a map whose keys are tries. *) val compare: trie -> trie -> int (* [source t] returns the source state of the (sub-)trie [t]. This is the root of the star of which [t] is a sub-trie. In other words, this tells us "where we come from". *) val source: trie -> Lr1.node (* [current t] returns the current state of the (sub-)trie [t]. This is the root of the sub-trie [t]. In other words, this tells us "where we are". *) val current: trie -> Lr1.node (* [accepts prod t] tells whether the current state of the trie [t] is the end of a branch associated with production [prod]. If so, this means that we have successfully followed a path that corresponds to the right-hand side of production [prod]. *) val accepts: Production.index -> trie -> bool (* [step sym t] is the immediate sub-trie of [t] along the symbol [sym]. This function raises [Not_found] if [t] has no child labeled [sym]. *) val step: Symbol.t -> trie -> trie (* [verbose()] outputs debugging & performance information. *) val verbose: unit -> unit (* Since every (sub-)trie has a unique identity, its identity can serve as a unique integer code for this (sub-)trie. We allow this conversion, both ways. This mechanism is used only as a way of saving space in the encoding of facts. *) val encode: trie -> int val decode: int -> trie end menhir-20210929/src/Unifier.ml000066400000000000000000000144671412503066000160010ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module provides a simple-minded implementation of first-order unification over an arbitrary signature. *) (* -------------------------------------------------------------------------- *) (* The signature must be described by the client, as follows. *) module type STRUCTURE = sig (* The type ['a structure] should be understood as a type of shallow terms whose leaves have type ['a]. *) type 'a structure val map: ('a -> 'b) -> 'a structure -> 'b structure val iter: ('a -> unit) -> 'a structure -> unit (* [iter2] fails if the head constructors differ. *) exception Iter2 val iter2: ('a -> 'b -> unit) -> 'a structure -> 'b structure -> unit end (* -------------------------------------------------------------------------- *) (* The unifier. *) module Make (S : STRUCTURE) = struct type 'a structure = 'a S.structure (* The data structure maintained by the unifier is as follows. *) (* A unifier variable is a point of the union-find algorithm. *) type variable = descriptor UnionFind.point and descriptor = { (* Every equivalence class carries a globally unique identifier. When a new equivalence class is created, a fresh identifier is chosen, and when two classes are merged, one of the two identifiers is kept. This identifier can be used as a key in a hash table. One should be aware, though, that identifiers are stable only as long as no unions are performed. 
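For instance, if [x] and [y] are distinct variables and [unify x y] succeeds, then [id x] and [id y] subsequently return the same integer: the identifier of one of the two original classes survives, while the other is forgotten.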
*) id : int; (* Every equivalence class carries a structure, which is either [None], which means that the variable is just that, a variable; or [Some t], which means that the variable represents (has been equated with) the term [t]. *) structure : variable structure option; (* Every equivalence class carries a mutable mark, which is used only by the occurs check. We could also remove this field altogether and use a separate hash table, where [id]s serve as keys, but this should be faster. The occurs check is performed eagerly, so this could matter. *) mutable mark : Mark.t; } (* -------------------------------------------------------------------------- *) (* Accessors. *) let id v = (UnionFind.get v).id let structure v = (UnionFind.get v).structure (* -------------------------------------------------------------------------- *) (* [fresh] creates a fresh variable with specified structure. *) let fresh = let c = ref 0 in fun structure -> let id = Misc.postincrement c in let mark = Mark.none in UnionFind.fresh { id; structure; mark } (* -------------------------------------------------------------------------- *) (* [occurs_check x y] checks that [x] does not occur within [y]. *) exception Occurs of variable * variable let occurs_check x y = (* Generate a fresh color for this particular traversal. *) let black = Mark.fresh () in (* The traversal code -- a depth-first search. *) let rec visit z = let desc = UnionFind.get z in if not (Mark.same desc.mark black) then begin desc.mark <- black; (* We are looking for [x]. *) if UnionFind.equivalent x z then raise (Occurs (x, y)) else Option.iter (S.iter visit) desc.structure end in (* The root is [y]. *) visit y (* -------------------------------------------------------------------------- *) (* The internal function [unify v1 v2] equates the variables [v1] and [v2] and propagates the consequences of this equation until a cycle is detected, an inconsistency is found, or a solved form is reached. The exceptions that can be raised are [Occurs] and [S.Iter2]. *) let rec unify (v1 : variable) (v2 : variable) : unit = if not (UnionFind.equivalent v1 v2) then begin let desc1 = UnionFind.get v1 and desc2 = UnionFind.get v2 in (* Unify the two descriptors. *) let desc = match desc1.structure, desc2.structure with | None, None -> (* variable/variable *) desc1 | None, Some _ -> (* variable/term *) occurs_check v1 v2; desc2 | Some _, None -> (* term/variable *) occurs_check v2 v1; desc1 | Some s1, Some s2 -> (* term/term *) S.iter2 unify s1 s2; { desc1 with structure = Some s1 } in (* Merge the equivalence classes. Do this last, so we get more meaningful output if the recursive call (above) fails and we have to print the two terms. *) UnionFind.union v1 v2; UnionFind.set v1 desc end (* -------------------------------------------------------------------------- *) (* The public version of [unify]. *) exception Unify of variable * variable let unify v1 v2 = try unify v1 v2 with S.Iter2 -> raise (Unify (v1, v2)) (* -------------------------------------------------------------------------- *) (* Decoding an acyclic graph as a deep term. *) (* This is a simple-minded version of the code, where sharing is lost. Its cost could be exponential if there is a lot of sharing. In practice, its use is usually appropriate, especially in the scenario where the term is meant to be printed as a tree. 
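As an illustration of the worst case, consider (hypothetically) a binary constructor [F] and a chain of variables where each [xi] is equated with the term [F (x(i+1), x(i+1))]: the graph has size linear in the length of the chain, but decoding [x0] produces a tree whose number of leaves is exponential in this length.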
*) type term = | TVar of int | TNode of term structure let rec decode (v : variable) : term = match structure v with | None -> TVar (id v) | Some t -> TNode (S.map decode t) (* -------------------------------------------------------------------------- *) end menhir-20210929/src/Unifier.mli000066400000000000000000000053651412503066000161470ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module provides a simple-minded implementation of first-order unification over an arbitrary signature. *) (* -------------------------------------------------------------------------- *) (* The signature must be described by the client, as follows. *) module type STRUCTURE = sig (* The type ['a structure] should be understood as a type of shallow terms whose leaves have type ['a]. *) type 'a structure val map: ('a -> 'b) -> 'a structure -> 'b structure val iter: ('a -> unit) -> 'a structure -> unit (* [iter2] fails if the head constructors differ. *) exception Iter2 val iter2: ('a -> 'b -> unit) -> 'a structure -> 'b structure -> unit end (* -------------------------------------------------------------------------- *) (* The unifier. *) module Make (S : STRUCTURE) : sig (* The type of unification variables. *) type variable (* [fresh s] creates a fresh variable that carries the structure [s]. *) val fresh: variable S.structure option -> variable (* [structure x] returns the structure (currently) carried by variable [x]. *) val structure: variable -> variable S.structure option (* [unify x y] attempts to unify the terms represented by the variables [x] and [y]. The creation of cycles is not permitted; an eager occurs check rules them out. *) exception Unify of variable * variable exception Occurs of variable * variable val unify: variable -> variable -> unit (* This is the type of deep terms over the signature [S]. *) type term = | TVar of int (* the variable's unique identity *) | TNode of term S.structure (* [decode x] turns the variable [x] into the term that it represents. Sharing is lost, so this operation can in the worst case have exponential cost. *) val decode: variable -> term end menhir-20210929/src/action.ml000066400000000000000000000211721412503066000156440ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Keyword type t = { (* The code for this semantic action. *) expr: IL.expr; (* This Boolean flag indicates whether this semantic action originates from Menhir's standard library. Via inlining, several semantic actions can be combined into one; in that case, we take a conjunction. *) standard: bool; (* A superset of the free variables that this semantic action can use in order to refer to a semantic value. 
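For instance (a made-up grammar fragment), in a rule whose right-hand side is [e1 = expr PLUS e2 = expr], this set contains at least [e1] and [e2], which the semantic action may use to refer to the semantic values of the two occurrences of [expr].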
*) semvars: StringSet.t; (* The set of keywords that appear in this semantic action. They can be thought of as free variables that refer to positions. They must be renamed during inlining. *) keywords : KeywordSet.t; } (* -------------------------------------------------------------------------- *) (* Constructors. *) let from_stretch xs s = { expr = IL.ETextual s; standard = s.Stretch.stretch_filename = Settings.stdlib_filename; semvars = xs; keywords = KeywordSet.of_list s.Stretch.stretch_keywords } let rec from_il_expr e = { expr = e; standard = true; semvars = fv e; keywords = KeywordSet.empty; } (* We are lazy and write an incomplete and somewhat ad hoc [fv] function for IL expressions. *) and fv e = fv_aux StringSet.empty e and fv_aux accu e = IL.(match e with | EVar x -> StringSet.add x accu | ETextual _ -> (* Considering the manner in which [Action.from_il_expr] is used in the module NewRuleSyntax, this piece of text is expected to stand for a data constructor, not a semantic value, so we can take its set of free variables to be empty. This is rather ad hoc and fragile. *) accu | ETuple es -> List.fold_left fv_aux accu es | EApp (e, es) -> List.fold_left fv_aux accu (e :: es) | _ -> assert false (* unsupported *)) (* -------------------------------------------------------------------------- *) (* Building [let x = a1 in a2]. *) let compose x a1 a2 = (* 2015/07/20: there used to be a call to [parenthesize_stretch] here, which would insert parentheses around every stretch in [a1]. This is not necessary, as far as I can see, since every stretch that represents a semantic action is already parenthesized by the lexer. *) { expr = CodeBits.blet ([ IL.PVar x, a1.expr ], a2.expr); semvars = StringSet.union a1.semvars (StringSet.remove x a2.semvars); keywords = KeywordSet.union a1.keywords a2.keywords; standard = a1.standard && a2.standard; } (* Building [let p = x in a]. *) let rec bind p x a = { expr = CodeBits.blet ([ p, IL.EVar x ], a.expr); semvars = StringSet.add x (StringSet.diff a.semvars (bv p)); keywords = a.keywords; standard = a.standard; } (* We are lazy and write an incomplete and somewhat ad hoc [bv] function for IL patterns. See also [NewRuleSyntax.bv]. *) and bv p = bv_aux StringSet.empty p and bv_aux accu p = IL.(match p with | PWildcard | PUnit -> accu | PVar x -> StringSet.add x accu | PTuple ps -> List.fold_left bv_aux accu ps | _ -> assert false (* unsupported *)) (* -------------------------------------------------------------------------- *) (* Accessors. *) let to_il_expr action = action.expr let is_standard action = action.standard let semvars action = action.semvars let keywords action = action.keywords let has_syntaxerror action = KeywordSet.mem SyntaxError action.keywords let has_beforeend action = KeywordSet.mem (Position (Before, WhereEnd, FlavorPosition)) action.keywords let posvars action = KeywordSet.fold (fun keyword accu -> match keyword with | SyntaxError -> accu | Position (s, w, f) -> let x = Keyword.posvar s w f in StringSet.add x accu ) (keywords action) StringSet.empty let vars action = StringSet.union (semvars action) (posvars action) (* -------------------------------------------------------------------------- *) (* Defining a keyword in terms of other keywords. 
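For instance, keyword expansion can define [$startpos] in terms of [$startpos(x)], where [x] stands for the first element of the production's right-hand side: the former keyword is removed, the latter is added in its place, and the function [f] wraps the semantic action in a [let] binding that defines the position variable associated with the former in terms of the one associated with the latter. (This describes the intended usage pattern; it is a sketch, not the exact code.)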
*) let define keyword keywords f action = assert (KeywordSet.mem keyword action.keywords); { expr = f action.expr; standard = action.standard; semvars = action.semvars; keywords = KeywordSet.union keywords (KeywordSet.remove keyword action.keywords) } (* -------------------------------------------------------------------------- *) (* Simultaneous substitutions are represented as association lists, where no name appears twice in the domain. We distinguish renamings that affect semantic-value variables and renamings that affect position-keyword variables. The two are unfortunately somewhat linked because our position-keyword variables are named after semantic-value variables; this is quite a mess. *) type subst = { semvar: (string * string) list; posvar: (string * string) list; } let empty = { semvar = []; posvar = [] } let extend1 x y var = assert (not (List.mem_assoc x var)); if x <> y then (x, y) :: var else var let extend_semvar x y { semvar; posvar } = { semvar = extend1 x y semvar; posvar } let extend_posvar x y { semvar; posvar } = { semvar; posvar = extend1 x y posvar } let extend = extend_semvar let apply1 var x = try List.assoc x var with Not_found -> x let apply_semvar phi x = apply1 phi.semvar x let apply_subject phi subject = match subject with | Before | Left -> subject | RightNamed x -> RightNamed (apply_semvar phi x) let bindings phi = phi.posvar @ phi.semvar let restrict1 xs var = List.filter (fun (x, _y) -> StringSet.mem x xs) var let restrict_semvar xs { semvar; posvar } = { semvar = restrict1 xs semvar; posvar } (* -------------------------------------------------------------------------- *) (* [rename_keyword f phi keyword] applies [f] to possibly transform the keyword [keyword]. If [f] decides to change this keyword (by returning [Some _]) then this decision is obeyed. Otherwise, the keyword is renamed by the substitution [phi]. In either case, [phi] is extended with a renaming decision. *) let rename_keyword f (phi : subst ref) keyword : keyword = match keyword with | SyntaxError -> SyntaxError | Position (subject, where, flavor) -> let subject', where' = match f (subject, where) with | Some (subject', where') -> subject', where' | None -> apply_subject !phi subject, where in phi := extend_posvar (Keyword.posvar subject where flavor) (Keyword.posvar subject' where' flavor) !phi; Position (subject', where', flavor) let rename f phi a = (* Rename all keywords, growing [phi] as we go. *) let keywords = a.keywords in let phi = ref phi in let keywords = KeywordSet.map (rename_keyword f phi) keywords in let phi = !phi in let standard = a.standard in (* Restrict [phi.semvar] to the set [a.semvars], in order to avoid generating [let] bindings that would be both useless and harmful: if [x] is not free in [e], then we do not want to generate [let x = x' in e], as suddenly [x'] would be free in this new expression. *) let phi = restrict_semvar a.semvars phi in (* Apply [phi.semvar] to the set of free variables. *) let semvars = StringSet.map (apply_semvar phi) a.semvars in (* Construct a new semantic action, where [phi] is translated into a set of *simultaneous* [let] bindings. (We cannot use a series of nested [let] bindings, as that would cause a capture if the domain and codomain of [phi] have a nonempty intersection.) 
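For instance, if [phi] exchanges [x] and [y], then the nested bindings [let x = y in let y = x in e] are wrong: the second binding sees the newly bound [x], so [y] ends up bound to its own previous value. The simultaneous bindings [let x = y and y = x in e] perform the intended exchange.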
*) let phi = List.map (fun (x, y) -> IL.PVar x, IL.EVar y) (bindings phi) in let expr = CodeBits.eletand (phi, a.expr) in { expr; standard; semvars; keywords } menhir-20210929/src/action.mli000066400000000000000000000125731412503066000160200ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) open Keyword (* from sdk/ *) (** A semantic action is a piece of OCaml code together with information about the free variables that appear in this code (which refer to semantic values) and information about the keywords that appear in this code. The code can be represented internally as a piece of text or (more generally) as an IL expression. *) type t (* -------------------------------------------------------------------------- *) (* Constructors. *) (** [from_stretch xs s] builds an action out of a textual piece of code. The set [xs] must contain all of the variables that occur free in the semantic action and denote a semantic value. *) val from_stretch: StringSet.t -> Stretch.t -> t (** [from_il_expr] builds an action out of an IL expression. Not every IL expression is accepted; only the expressions built by NewRuleSyntax are accepted. *) val from_il_expr: IL.expr -> t (** [compose x a1 a2] builds the action [let x = a1 in a2]. This combinator is used during inlining (that is, while eliminating %inline symbols). *) val compose : string -> t -> t -> t (** [bind p x a] binds the OCaml pattern [p] to the OCaml variable [x] in the semantic action [a]. Therefore, it builds the action [let p = x in a]. Not every IL pattern is accepted; only those built by NewRuleSyntax are accepted. *) val bind: IL.pattern -> string -> t -> t (* -------------------------------------------------------------------------- *) (* Accessors. *) (** [to_il_expr] converts an action to an IL expression. *) val to_il_expr: t -> IL.expr (** [is_standard a] indicates whether the action [a] originates from Menhir's standard library. Via inlining, several actions can be combined into one; in that case, we take a conjunction. *) val is_standard: t -> bool (** [semvars a] is a superset of the free variables that may appear in the action [a] to denote a semantic value. *) val semvars: t -> StringSet.t (** [keywords a] is the set of keywords used in the action [a]. *) val keywords: t -> KeywordSet.t (** [has_syntaxerror a] tests whether the keyword [$syntaxerror] appears in the set [keywords a]. *) val has_syntaxerror: t -> bool (** [has_beforeend a] tests whether the keyword [$endpos($0)] appears in the set [keywords a]. *) val has_beforeend: t -> bool (** [posvars a] is the set of conventional position variables that correspond to the position keywords used in the action [a]. *) val posvars: t -> StringSet.t (** [vars a] is the union of [semvars a] and [posvars a]. *) val vars: t -> StringSet.t (* -------------------------------------------------------------------------- *) (* Keyword expansion. *) (** [define keyword keywords f a] defines away the keyword [keyword]. This keyword is removed from the set of keywords of the action [a]; the set [keywords] is added in its place.
The code of the action [a] is transformed by the function [f], which typically wraps its argument in some new [let] bindings. *) val define: keyword -> KeywordSet.t -> (IL.expr -> IL.expr) -> t -> t (* -------------------------------------------------------------------------- *) (* Variable renaming and keyword transformation. *) (* Some keywords contain names: [$startpos(foo)] is an example. If one wishes for some reason to rename the variable [foo] to [bar], then this keyword must be renamed to [$startpos(bar)]. Furthermore, during inlining, it may be necessary to transform a keyword into another keyword: e.g., if [x] is inlined away and replaced with a sequence of [y] and [z], then [$startpos(x)] must be renamed to [$startpos(y)] and [$endpos(x)] must be renamed to [$endpos(z)]. *) (** A variable-to-variable substitution is a finite map of variables to variables. It can be semantically interpreted as a simultaneous binding construct, that is, as a [let/and] construct. *) type subst (** The empty substitution. *) val empty: subst (** Extending a substitution. *) val extend: string -> string -> subst -> subst (** [rename f phi a] transforms the action [a] by applying the renaming [phi] as well as the keyword transformations determined by the function [f]. The function [f] is applied to each (not-yet-renamed) keyword and may decide to transform it into another keyword, by returning [Some _], or to not transform it, by returning [None]. In the latter case, [phi] still applies to the keyword. *) val rename: (subject * where -> (subject * where) option) -> subst -> t -> t menhir-20210929/src/anonymous.ml000066400000000000000000000135001412503066000164150ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) open Syntax (* For each anonymous rule, we define a fresh nonterminal symbol, and replace the anonymous rule with a reference to this symbol. If the anonymous rule appears inside a parameterized rule, then we must define a parameterized nonterminal symbol. *) (* ------------------------------------------------------------------------ *) (* Computing the free names of some syntactic categories. *) let rec fn_parameter accu (p : parameter) = (* [p] cannot be [ParameterAnonymous _]. *) let x, ps = Parameters.unapp p in let accu = StringSet.add (Positions.value x) accu in fn_parameters accu ps and fn_parameters accu ps = List.fold_left fn_parameter accu ps let fn_producer accu ((_, p, _) : producer) = fn_parameter accu p let fn_branch accu branch = List.fold_left fn_producer accu branch.pr_producers let fn_branches accu branches = List.fold_left fn_branch accu branches (* ------------------------------------------------------------------------ *) (* This functor makes it easy to share mutable internal state between the functions that follow. *) module Run (X : sig end) = struct (* ------------------------------------------------------------------------ *) (* A fresh name generator.
*) let fresh : unit -> string = let next = ref 0 in fun () -> Printf.sprintf "__anonymous_%d" (Misc.postincrement next) (* ------------------------------------------------------------------------ *) (* A rule accumulator. Used to collect the fresh definitions that we produce. *) let rules = ref [] (* ------------------------------------------------------------------------ *) (* [anonymous pos parameters branches] deals with an anonymous rule, at position [pos], which appears inside a possibly-parameterized rule whose parameters are [parameters], and whose body is [branches]. We assume that [branches] does not itself contain any anonymous rules. As a side effect, we create a fresh definition, and return its name. *) let var (symbol : symbol) : parameter = ParameterVar (Positions.unknown_pos symbol) let anonymous pos (parameters : symbol list) (branches : parameterized_branch list) : parameter = (* Compute the free symbols of [branches]. They should form a subset of [parameters], although we have not yet checked this. We create a definition that is parameterized only over the parameters that actually occur free in the definition -- i.e., a definition without useless parameters. This seems important, as (in some situations) it avoids duplication and leads to fewer states in the automaton. *) let used = fn_branches StringSet.empty branches in let parameters = List.filter (fun x -> StringSet.mem x used) parameters in (* Generate a fresh non-terminal symbol. *) let symbol = fresh() in (* Construct its definition. Note that it is implicitly marked %inline. Also, it does not carry any attributes; this is consistent with the fact that %inline symbols cannot carry attributes. *) let rule = { pr_public_flag = false; pr_inline_flag = true; pr_nt = symbol; pr_positions = [ pos ]; (* this list is not allowed to be empty *) pr_attributes = []; pr_parameters = parameters; pr_branches = branches } in (* Record this definition. *) rules := rule :: !rules; (* Return the symbol that stands for it. *) Parameters.app (Positions.with_pos pos symbol) (List.map var parameters) (* ------------------------------------------------------------------------ *) (* Traversal code. *) let rec transform_parameter (parameters : symbol list) (p : parameter) : parameter = match p with | ParameterVar _ -> p | ParameterApp (x, ps) -> ParameterApp (x, List.map (transform_parameter parameters) ps) | ParameterAnonymous branches -> let pos = Positions.position branches and branches = Positions.value branches in (* Do not forget the recursive invocation! *) let branches = List.map (transform_parameterized_branch parameters) branches in (* This is where the real work is done. *) anonymous pos parameters branches and transform_producer parameters ((x, p, attrs) : producer) = x, transform_parameter parameters p, attrs and transform_parameterized_branch parameters branch = let pr_producers = List.map (transform_producer parameters) branch.pr_producers in { branch with pr_producers } let transform_parameterized_rule rule = let pr_branches = List.map (transform_parameterized_branch rule.pr_parameters) rule.pr_branches in { rule with pr_branches } end (* ------------------------------------------------------------------------ *) (* The main entry point invokes the functor and reads its result. 
*) let transform_partial_grammar g = let module R = Run(struct end) in let pg_rules = List.map R.transform_parameterized_rule g.pg_rules in let pg_rules = !R.rules @ pg_rules in { g with pg_rules } menhir-20210929/src/anonymous.mli000066400000000000000000000020371412503066000165670ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Syntax val transform_partial_grammar: partial_grammar -> partial_grammar menhir-20210929/src/astar.ml000066400000000000000000000244111412503066000155000ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module implements A* search, following Hart, Nilsson, and Raphael (1968). To each visited graph node, the algorithm associates an internal record, carrying various information. For this reason, the algorithm's space complexity is, in the worst case, linear in the size of the graph. The mapping of nodes to internal records is implemented via a hash table, while the converse mapping is direct (via a record field). Nodes that remain to be examined are kept in a priority queue, where the priority of a node is the cost of the shortest known path from the start node to it plus the estimated cost of a path from this node to a goal node. (Lower priority nodes are considered first). It is the use of the second summand that makes A* more efficient than Dijkstra's standard algorithm for finding shortest paths in an arbitrary graph. In fact, when [G.estimate] is the constant zero function, A* coincides with Dijkstra's algorithm. One should note that A* is faster than Dijkstra's algorithm only when a path to some goal node exists. Otherwise, both algorithms explore the entire graph, and have similar time requirements. The priority queue is implemented as an array of doubly linked lists. *) module Make (G : sig (* Graph nodes. *) type node include Hashtbl.HashedType with type t := node (* Edge labels. *) type label (* The source node(s). *) val sources: (node -> unit) -> unit (* [successors n f] presents each of [n]'s successors, in an arbitrary order, to [f], together with the cost of the edge that was followed. *) val successors: node -> (label -> int -> node -> unit) -> unit (* An estimate of the cost of the shortest path from the supplied node to some goal node. For algorithms such as A* and IDA* to find shortest paths, this estimate must be a correct under-approximation of the actual cost. *) val estimate: node -> int end) = struct type cost = int (* Nodes with low priorities are dealt with first. *) type priority = cost (* Paths back to a source (visible by the user). 
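For instance, the value [Edge (l2, Edge (l1, Source s))] represents a path that starts at the source node [s], follows an edge labeled [l1], then follows an edge labeled [l2]; the function [reverse] below turns it into the pair of [s] and the list [l1; l2].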
*) type path = | Edge of G.label * path | Source of G.node let rec follow labels path = match path with | Source node -> node, labels | Edge (label, path) -> follow (label :: labels) path let reverse path = follow [] path type inode = { (* Graph node associated with this internal record. *) this: G.node; (* Cost of the best known path from a source node to this node. (ghat) *) mutable cost: cost; (* Estimated cost of the best path from this node to a goal node. (hhat) *) estimate: cost; (* Best known path from a source node to this node. *) mutable path: path; (* Previous node on doubly linked priority list *) mutable prev: inode; (* Next node on doubly linked priority list *) mutable next: inode; (* The node's priority, if the node is in the queue; -1 otherwise *) mutable priority: priority; } (* This auxiliary module maintains a mapping of graph nodes to internal records. *) module M : sig (* Adds a binding to the mapping. *) val add: G.node -> inode -> unit (* Retrieves the internal record for this node. Raises [Not_found] if no such record exists. *) val get: G.node -> inode end = struct module H = Hashtbl.Make(struct include G type t = node end) let t = H.create 100003 let add node inode = H.add t node inode let get node = H.find t node end (* This auxiliary module maintains a priority queue of internal records. *) module P : sig (* Adds this node to the queue. *) val add: inode -> priority -> unit (* Adds this node to the queue, or changes its priority, if it already was in the queue. It is assumed, in the second case, that the priority can only decrease. *) val add_or_decrease: inode -> priority -> unit (* Retrieve a node with lowest priority off the queue. *) val get: unit -> inode option end = struct module InfiniteArray = MenhirLib.InfiniteArray (* Array of pointers to the doubly linked lists, indexed by priorities. There is no a priori bound on the size of this array -- its size is increased if needed. It is up to the user to use a graph where paths have reasonable lengths. *) let a = InfiniteArray.make None (* Index of lowest nonempty list, if there is one; or lower (sub-optimal, but safe). If the queue is empty, [best] is arbitrary. *) let best = ref 0 (* Current number of elements in the queue. Used in [get] to stop the search for a nonempty bucket. *) let cardinal = ref 0 (* Adjust node's priority and insert into doubly linked list. *) let add inode priority = assert (0 <= priority); cardinal := !cardinal + 1; inode.priority <- priority; match InfiniteArray.get a priority with | None -> InfiniteArray.set a priority (Some inode); (* Decrease [best], if necessary, so as not to miss the new element. In the special case of A*, this never happens. *) assert (!best <= priority); (* if priority < !best then best := priority *) | Some inode' -> inode.next <- inode'; inode.prev <- inode'.prev; inode'.prev.next <- inode; inode'.prev <- inode (* Takes a node off its doubly linked list. Does not adjust [best], as this is not necessary in order to preserve the invariant. *) let remove inode = cardinal := !cardinal - 1; if inode.next == inode then InfiniteArray.set a inode.priority None else begin InfiniteArray.set a inode.priority (Some inode.next); inode.next.prev <- inode.prev; inode.prev.next <- inode.next; inode.next <- inode; inode.prev <- inode end; inode.priority <- -1 let rec get () = if !cardinal = 0 then None else get_nonempty() and get_nonempty () = (* Look for next nonempty bucket. We know there is one. This may seem inefficient, because it is a linear search.
However, in A*, [best] never decreases, so the total cost of this loop is the maximum priority ever used. *) match InfiniteArray.get a !best with | None -> best := !best + 1; get_nonempty() | Some inode as result -> remove inode; result let add_or_decrease inode priority = if inode.priority >= 0 then remove inode; add inode priority end (* Initialization. *) let estimate node = let e = G.estimate node in assert (0 <= e); (* failure means user error *) e let () = G.sources (fun node -> let rec inode = { this = node; cost = 0; estimate = estimate node; path = Source node; prev = inode; next = inode; priority = -1 } in M.add node inode; P.add inode inode.estimate ) (* Access to the search results (after the search is over). *) let distance node = try (M.get node).cost with Not_found -> max_int let path node = (M.get node).path (* let [Not_found] escape if no path was found *) (* Search. *) let rec search f = (* Pick the open node that currently has lowest fhat, that is, lowest estimated total cost of a source-to-goal path through this node. *) match P.get() with | None -> (* Finished. *) distance, path | Some inode -> let node = inode.this in (* Let the user know about this newly discovered node. *) f (node, inode.path); (* Then, examine its successors. *) G.successors node (fun label edge_cost son -> assert (0 <= edge_cost); (* failure means user error *) (* Determine the cost of the best known path from the start node, through this node, to this son. *) let new_cost = inode.cost + edge_cost in assert (0 <= new_cost); (* failure means overflow *) try let ison = M.get son in if new_cost < ison.cost then begin (* This son has been visited before, but this new path to it is shorter. If it was already open and waiting in the priority queue, decrease its priority, thus scheduling it earlier; otherwise, mark it as open and insert it into the queue. *) let new_fhat = new_cost + ison.estimate in assert (0 <= new_fhat); (* failure means overflow *) P.add_or_decrease ison new_fhat; ison.cost <- new_cost; ison.path <- Edge (label, inode.path) end with Not_found -> (* This son was never visited before. Allocate a new status record for it and mark it as open. *) let rec ison = { this = son; cost = new_cost; estimate = estimate son; path = Edge (label, inode.path); prev = ison; next = ison; priority = -1 } in M.add son ison; let fhat = new_cost + ison.estimate in assert (0 <= fhat); (* failure means overflow *) P.add ison fhat ); search f end menhir-20210929/src/astar.mli000066400000000000000000000053761412503066000156600ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) (* This signature defines an implicit representation for graphs where edges have integer costs, there is a distinguished start node, and there is a set of distinguished goal nodes. It is also assumed that some geometric knowledge of the graph allows safely estimating the cost of shortest paths to goal nodes. If no such knowledge is available, [estimate] should be the constant zero function. *) module Make (G : sig (* Graph nodes. *) type node include Hashtbl.HashedType with type t := node (* Edge labels. *) type label (* The source node(s).
*) val sources: (node -> unit) -> unit (* [successors n f] presents each of [n]'s successors, in an arbitrary order, to [f], together with the cost of the edge that was followed. *) val successors: node -> (label -> int -> node -> unit) -> unit (* An estimate of the cost of the shortest path from the supplied node to some goal node. This estimate must be a correct under-approximation of the actual cost. *) val estimate: node -> int end) : sig (* A path (from a target node back to some source node) is described by a series of labels and ends in a source node. *) type path = | Edge of G.label * path | Source of G.node (* A path can also be presented as a pair of a source node and a list of labels, which describe the edges from the source node to a target node. *) val reverse: path -> G.node * G.label list (* Search. Newly discovered nodes are presented to the user, in order of increasing distance from the source nodes, by invoking the user-supplied function [f]. At the end, a mapping of nodes to distances to the source nodes and a mapping of nodes to shortest paths are returned. *) val search: (G.node * path -> unit) -> (G.node -> int) * (G.node -> path) end menhir-20210929/src/back.ml000066400000000000000000000067331412503066000152730ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) (* Driver for the back-end. *) (* The automaton is now frozen and will no longer be modified. It is time to dump a new description of it, if requested by the user. *) let () = if Settings.dump_resolved then let module D = Dump.Make(Default) in D.dump (Settings.base ^ ".automaton.resolved") let () = if Settings.automaton_graph then AutomatonGraph.print_automaton_graph() (* Let [Interpret] handle the command line options [--interpret], [--interpret-error], [--compile-errors], [--compare-errors]. *) let () = Interpret.run() (* If [--list-errors] is set, produce a list of erroneous input sentences, then stop. *) let () = if Settings.list_errors then begin let module L = LRijkstra.Run(struct (* Undocumented: if [--log-automaton 2] is set, be verbose. *) let verbose = Settings.logA >= 2 (* For my own purposes, LRijkstra can print one line of statistics to a .csv file. *) let statistics = if false then Some "lr.csv" else None end) in exit 0 end (* Define an .ml file writer. *) let write program = let module P = Printer.Make (struct let filename = Settings.base ^ ".ml" let f = open_out filename let locate_stretches = (* 2017/05/09: always include line number directives in generated .ml files. Indeed, they affect the semantics of [assert] instructions in the semantic actions. *) (* 2011/10/19: do not use [Filename.basename]. The line number directives that we insert in the [.ml] file must retain their full path. This does mean that the line number directives depend on how menhir is invoked -- e.g. [menhir foo/bar.mly] and [cd foo && menhir bar.mly] will produce different files. Nevertheless, this seems useful/reasonable. *) Some filename end) in P.program program (* If requested, generate a .cmly file.
*) let () = if Settings.cmly then Cmly_write.write (Settings.base ^ ".cmly") (* Construct and print the code using an appropriate back-end. *) let () = if Settings.table then begin let module B = TableBackend.Run (struct end) in write B.program; Interface.write Front.grammar () end else if Settings.coq then begin let module B = CoqBackend.Run (struct end) in let filename = Settings.base ^ ".v" in let f = open_out filename in B.write_all f end else begin let module B = CodeBackend.Run (struct end) in write (CodeInliner.inline B.program); Interface.write Front.grammar () end let () = Time.tick "Printing" menhir-20210929/src/back.mli000066400000000000000000000020471412503066000154400ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module drives the back-end. No functionality is offered by this module. *) menhir-20210929/src/basicPrinter.ml000066400000000000000000000345251412503066000170220ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Printf open Positions open Syntax open Stretch open BasicSyntax open Settings (* When the original grammar is split over several files, it may be IMPOSSIBLE to print it out into a single file, as that would introduce a total ordering (between rules, between priority declarations, between %on_error_reduce declarations) that did not exist originally. We currently do not warn about this problem. Nobody has ever complained about it. *) (* -------------------------------------------------------------------------- *) (* The printing mode. *) (* [PrintNormal] is the normal mode: the result is a Menhir grammar. [PrintForOCamlyacc] is close to the normal mode, but attempts to produce ocamlyacc-compatible output. This means, in particular, that we cannot bind identifiers to semantic values, but must use [$i] instead. [PrintUnitActions _] causes all OCaml code to be suppressed: the semantic actions are replaced with unit actions, preludes and postludes disappear, %parameter declarations disappear. Every %type declaration carries the [unit] type. [PrintUnitActions true] in addition declares that every token carries a semantic value of type [unit]. *) module Print (X : sig val mode : Settings.print_mode end) = struct open X (* -------------------------------------------------------------------------- *) (* Printing an OCaml type. *) let print_ocamltype ty : string = Printf.sprintf " <%s>" ( match ty with | Declared stretch -> stretch.stretch_raw_content | Inferred t -> t ) let print_ocamltype ty : string = let s = print_ocamltype ty in match mode with | PrintForOCamlyacc -> (* ocamlyacc does not allow a %type declaration to contain a new line. Replace it with a space. 
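For instance, a declared type that spans two lines, with a line break after [int *], is printed on a single line, the line break having been replaced with a space.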
*) String.map (function '\r' | '\n' -> ' ' | c -> c) s | PrintNormal | PrintUnitActions _ -> s (* -------------------------------------------------------------------------- *) (* Printing the type of a terminal symbol. *) let print_token_type (prop : token_properties) = match mode with | PrintNormal | PrintForOCamlyacc | PrintUnitActions false -> Option.value (Option.map print_ocamltype prop.tk_ocamltype) ~default:"" | PrintUnitActions true -> "" (* omitted ocamltype after %token means *) (* -------------------------------------------------------------------------- *) (* Printing the type of a nonterminal symbol. *) let print_nonterminal_type ty = match mode with | PrintNormal | PrintForOCamlyacc -> print_ocamltype ty | PrintUnitActions _ -> " " (* -------------------------------------------------------------------------- *) (* Printing a binding for a semantic value. *) let print_binding id = match mode with | PrintNormal -> id ^ " = " | PrintForOCamlyacc | PrintUnitActions _ -> (* need not, or must not, bind a semantic value *) "" (* -------------------------------------------------------------------------- *) (* Testing whether it is permitted to print OCaml code (semantic actions, prelude, postlude). *) let if_ocaml_code_permitted f x = match mode with | PrintNormal | PrintForOCamlyacc -> f x | PrintUnitActions _ -> (* In these modes, all OCaml code is omitted: semantic actions, preludes, postludes, etc. *) () (* -------------------------------------------------------------------------- *) (* Testing whether attributes should be printed. *) let attributes_printed : bool = match mode with | PrintNormal | PrintUnitActions _ -> true | PrintForOCamlyacc -> false (* -------------------------------------------------------------------------- *) (* Printing a semantic action. *) let print_semantic_action f g branch = let e = Action.to_il_expr branch.action in match mode with | PrintUnitActions _ -> (* In the unit-action modes, we print a pair of empty braces, which is fine. *) () | PrintNormal -> Printer.print_expr f e | PrintForOCamlyacc -> (* In ocamlyacc-compatibility mode, the code must be wrapped in [let]-bindings whose right-hand side uses the [$i] keywords. *) let bindings = List.mapi (fun i producer -> let id = producer_identifier producer and symbol = producer_symbol producer in (* Test if [symbol] is a terminal symbol whose type is [unit]. *) let is_unit_token = try let prop = StringMap.find symbol g.tokens in prop.tk_ocamltype = None with Not_found -> symbol = "error" in (* Define the variable [id] as a synonym for [$(i+1)]. *) (* As an exception to this rule, if [symbol] is a terminal symbol which has been declared *not* to carry a semantic value, then we cannot use [$(i+1)] -- ocamlyacc does not allow it -- so we use the unit value instead. *) IL.PVar id, if is_unit_token then IL.EUnit else IL.EVar (sprintf "$%d" (i + 1)) ) branch.producers in (* The identifiers that we bind are pairwise distinct. *) (* We must use simultaneous bindings (that is, a [let/and] form), as opposed to a cascade of [let] bindings. Indeed, ocamlyacc internally translates [$i] to [_i] (just like us!), so name captures will occur unless we restrict the use of [$i] to the outermost scope. (Reported by Kenji Maillard.) *) let e = CodeBits.eletand (bindings, e) in Printer.print_expr f e (* -------------------------------------------------------------------------- *) (* Printing preludes and postludes. 
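A prelude is printed between %{ and %} delimiters; a postlude is printed verbatim.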
*) let print_preludes f g = List.iter (fun prelude -> fprintf f "%%{%s%%}\n" prelude.stretch_raw_content ) g.preludes let print_postludes f g = List.iter (fun postlude -> fprintf f "%s\n" postlude.stretch_raw_content ) g.postludes (* -------------------------------------------------------------------------- *) (* Printing %start declarations. *) let print_start_symbols f g = StringSet.iter (fun symbol -> fprintf f "%%start %s\n" (Misc.normalize symbol) ) g.start_symbols (* -------------------------------------------------------------------------- *) (* Printing %parameter declarations. *) let print_parameter f stretch = fprintf f "%%parameter<%s>\n" stretch.stretch_raw_content let print_parameters f g = match mode with | PrintNormal -> List.iter (print_parameter f) g.parameters | PrintForOCamlyacc | PrintUnitActions _ -> (* %parameter declarations are not supported by ocamlyacc, and presumably become useless when the semantic actions are removed. *) () (* -------------------------------------------------------------------------- *) (* Printing attributes. *) let print_attribute f ((name, payload) : attribute) = if attributes_printed then fprintf f " [@%s %s]" (Positions.value name) payload.stretch_raw_content let print_attributes f attrs = List.iter (print_attribute f) attrs (* -------------------------------------------------------------------------- *) (* Printing token declarations and precedence declarations. *) let print_assoc = function | LeftAssoc -> Printf.sprintf "%%left" | RightAssoc -> Printf.sprintf "%%right" | NonAssoc -> Printf.sprintf "%%nonassoc" | UndefinedAssoc -> "" let compare_pairs compare1 compare2 (x1, x2) (y1, y2) = let c = compare1 x1 y1 in if c <> 0 then c else compare2 x2 y2 let compare_tokens (_token, prop) (_token', prop') = match prop.tk_precedence, prop'.tk_precedence with | UndefinedPrecedence, UndefinedPrecedence -> 0 | UndefinedPrecedence, PrecedenceLevel _ -> -1 | PrecedenceLevel _, UndefinedPrecedence -> 1 | PrecedenceLevel (m, v, _, _), PrecedenceLevel (m', v', _, _) -> compare_pairs InputFile.compare_input_files Generic.compare (m, v) (m', v') let print_tokens f g = (* Print the %token declarations. *) StringMap.iter (fun token prop -> if prop.tk_is_declared then fprintf f "%%token%s %s%a\n" (print_token_type prop) token print_attributes prop.tk_attributes ) g.tokens; (* Sort the tokens wrt. precedence, and group them into levels. *) let levels : (string * token_properties) list list = Misc.levels compare_tokens (List.sort compare_tokens ( StringMap.bindings g.tokens )) in (* Print the precedence declarations: %left, %right, %nonassoc. *) List.iter (fun level -> let (_token, prop) = try List.hd level with Failure _ -> assert false in (* Do nothing about the tokens that have no precedence. *) if prop.tk_precedence <> UndefinedPrecedence then begin fprintf f "%s" (print_assoc prop.tk_associativity); List.iter (fun (token, _prop) -> fprintf f " %s" token ) level; fprintf f "\n" end ) levels (* -------------------------------------------------------------------------- *) (* Printing %type declarations. *) let print_types f g = StringMap.iter (fun symbol ty -> fprintf f "%%type%s %s\n" (print_nonterminal_type ty) (Misc.normalize symbol) ) g.types (* -------------------------------------------------------------------------- *) (* Printing branches and rules. 
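As a rough sketch (in normal mode, ignoring attributes and %prec annotations, and with made-up symbol names), a rule comes out in the following shape:

  expr:
    e1 = expr PLUS e2 = expr
      { ... }
  | i = INT
      { ... }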
*) let print_producer sep f producer = fprintf f "%s%s%s%a" (sep()) (print_binding (producer_identifier producer)) (Misc.normalize (producer_symbol producer)) print_attributes (producer_attributes producer) let print_branch f g branch = (* Print the producers. *) let sep = Misc.once "" " " in List.iter (print_producer sep f) branch.producers; (* Print the %prec annotation, if there is one. *) Option.iter (fun x -> fprintf f " %%prec %s" x.value ) branch.branch_prec_annotation; (* Newline, indentation, semantic action. *) fprintf f "\n {"; print_semantic_action f g branch; fprintf f "}\n" (* Because the resolution of reduce/reduce conflicts is implicitly dictated by the order in which productions appear in the grammar, the printer should be careful to preserve this order. *) (* 2016/08/25: As noted above, when two productions originate in different files, we have a problem. We MUST print them in some order, even though they should be incomparable. In that case, we use the order in which the source files are specified on the command line. However, this behavior is undocumented, and should not be exploited. (In previous versions of Menhir, the function passed to [List.sort] was not transitive, so it did not make any sense!) *) let compare_branch_production_levels bpl bpl' = match bpl, bpl' with | ProductionLevel (m, l), ProductionLevel (m', l') -> compare_pairs InputFile.compare_input_files Generic.compare (m, l) (m', l') let compare_branches (b : branch) (b' : branch) = compare_branch_production_levels b.branch_production_level b'.branch_production_level let compare_rules (_nt, (r : rule)) (_nt', (r' : rule)) = match r.branches, r'.branches with | [], [] -> 0 | [], _ -> -1 | _, [] -> 1 | b :: _, b' :: _ -> (* To compare two rules, it suffices to compare their first productions. *) compare_branches b b' let print_rule f g (nt, r) = fprintf f "\n%s%a:\n" (Misc.normalize nt) print_attributes r.attributes; (* Menhir accepts a leading "|", but bison does not. Let's not print it. So, we print a bar-separated list. *) let sep = Misc.once (" ") ("| ") in List.iter (fun br -> fprintf f "%s" (sep()); print_branch f g br ) r.branches let print_rules f g = let rules = List.sort compare_rules (StringMap.bindings g.rules) in List.iter (print_rule f g) rules (* -------------------------------------------------------------------------- *) (* Printing %on_error_reduce declarations. *) let print_on_error_reduce_declarations f g = let cmp (_nt, oel) (_nt', oel') = compare_branch_production_levels oel oel' in let levels : (string * on_error_reduce_level) list list = Misc.levels cmp (List.sort cmp ( StringMap.bindings g.on_error_reduce )) in List.iter (fun level -> fprintf f "%%on_error_reduce"; List.iter (fun (nt, _level) -> fprintf f " %s" (Misc.normalize nt) ) level; fprintf f "\n" ) levels let print_on_error_reduce_declarations f g = match mode with | PrintNormal | PrintUnitActions _ -> print_on_error_reduce_declarations f g | PrintForOCamlyacc -> (* %on_error_reduce declarations are not supported by ocamlyacc *) () (* -------------------------------------------------------------------------- *) (* Printing %attribute declarations. *) let print_grammar_attribute f ((name, payload) : attribute) = if attributes_printed then fprintf f "%%[@%s %s]\n" (Positions.value name) payload.stretch_raw_content let print_grammar_attributes f g = List.iter (print_grammar_attribute f) g.gr_attributes (* -------------------------------------------------------------------------- *) (* The main entry point. 
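The declarations are printed in a fixed order: %parameter declarations, preludes, %start declarations, %token and precedence declarations, %type declarations, %on_error_reduce declarations, grammar attributes, a %% separator, the rules, another %% separator, and finally the postludes. (Preludes and postludes appear only when OCaml code is permitted.)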
*) let print f g = print_parameters f g; if_ocaml_code_permitted (print_preludes f) g; print_start_symbols f g; print_tokens f g; print_types f g; print_on_error_reduce_declarations f g; print_grammar_attributes f g; fprintf f "%%%%\n"; print_rules f g; fprintf f "\n%%%%\n"; if_ocaml_code_permitted (print_postludes f) g end let print mode = let module P = Print(struct let mode = mode end) in P.print menhir-20210929/src/basicPrinter.mli000066400000000000000000000030201412503066000171550ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This is a pretty-printer for grammars. *) (* If the [mode] parameter requests ``unit actions'', then semantic actions are dropped: that is, they are replaced with trivial semantic actions that return unit. Accordingly, all [%type] declarations are changed to unit. The prologue and epilogue are dropped. All bindings for semantic values are suppressed. If, furthermore, the [mode] parameter requests ``unit tokens'', then the types carried by tokens are changed to unit. *) val print: Settings.print_mode -> out_channel -> BasicSyntax.grammar -> unit menhir-20210929/src/basicSyntax.ml000066400000000000000000000165041412503066000166620ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Syntax (* This is the abstract syntax for an unparameterized grammar, that is, a grammar that does not have any parameterized nonterminal symbols. Such a grammar is obtained as the result of an expansion phase, which is implemented in [ParameterizedGrammar]. *) (* In an unparameterized grammar, %attribute declarations can be desugared away. This is also done during the above-mentioned expansion phase. Thus, in an unparameterized grammar, attributes can be attached in the following places: - with the grammar: field [gr_attributes] of [grammar] - with a terminal symbol: field [tk_attributes] of [token_properties] - with a nonterminal symbol: field [attributes] of [rule] - with a producer: field [producer_attributes] of [producer] *) (* ------------------------------------------------------------------------ *) (* A producer is a pair of a (located) identifier and a symbol. In concrete syntax, it could be [e = expr], for instance. It carries a number of attributes. *) type producer = { producer_identifier : identifier located; producer_symbol : symbol; producer_attributes : attributes; } type producers = producer list (* ------------------------------------------------------------------------ *) (* A branch contains a series of producers and a semantic action. It is the same as in the surface syntax; see [Syntax]. 
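In concrete syntax, a branch could be [e1 = expr PLUS e2 = expr { e1 + e2 }]: its producers are [e1 = expr], [PLUS], and [e2 = expr], and its semantic action is [{ e1 + e2 }].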
*) type branch = { branch_position : Positions.t; producers : producers; action : action; branch_prec_annotation : branch_prec_annotation; branch_production_level : branch_production_level } type branches = branch list (* ------------------------------------------------------------------------ *) (* A rule consists mainly of several branches. In contrast with the surface syntax, it has no parameters. *) (* The [%inline] flag is no longer relevant after [NonTerminalInlining]. *) type rule = { branches : branches; positions : Positions.t list; inline_flag : bool; attributes : attributes; } (* ------------------------------------------------------------------------ *) (* A grammar is essentially the same as in the surface syntax; see [Syntax]. The main difference is that [%attribute] declarations, represented by the field [p_symbol_attributes] in the surface syntax, have disappeared. *) type grammar = { preludes : Stretch.t list; postludes : Syntax.postlude list; parameters : Stretch.t list; start_symbols : StringSet.t; types : Stretch.ocamltype StringMap.t; tokens : Syntax.token_properties StringMap.t; on_error_reduce : on_error_reduce_level StringMap.t; gr_attributes : attributes; rules : rule StringMap.t; } (* -------------------------------------------------------------------------- *) (* Accessors for the type [producer]. *) let producer_identifier { producer_identifier } : identifier = Positions.value producer_identifier let producer_identifier_located { producer_identifier } : identifier located = producer_identifier let producer_symbol { producer_symbol } = producer_symbol let producer_attributes { producer_attributes } = producer_attributes (* -------------------------------------------------------------------------- *) (* A getter and a transformer for the field [branches] of the type [rule]. *) let get_branches rule = rule.branches let transform_branches f rule = { rule with branches = f rule.branches } (* -------------------------------------------------------------------------- *) (* [tokens grammar] is a list of all (real) tokens in the grammar [grammar]. The special tokens "#" and "error" are not included. Pseudo-tokens (used in %prec declarations, but never declared using %token) are filtered out. *) let tokens grammar = StringMap.fold (fun token properties tokens -> if properties.tk_is_declared then token :: tokens else tokens ) grammar.tokens [] (* [typed_tokens grammar] is analogous, but includes the OCaml type of each token. *) let typed_tokens grammar = StringMap.fold (fun token properties tokens -> if properties.tk_is_declared then (token, properties.tk_ocamltype) :: tokens else tokens ) grammar.tokens [] (* [nonterminals grammar] is a list of all nonterminal symbols in the grammar [grammar]. *) let nonterminals grammar : nonterminal list = StringMap.fold (fun nt _ rules -> nt :: rules) grammar.rules [] (* [ocamltype_of_symbol grammar symbol] produces the OCaml type of the symbol [symbol] in the grammar [grammar], if it is known. *) let ocamltype_of_symbol grammar symbol : Stretch.ocamltype option = try Some (StringMap.find symbol grammar.types) with Not_found -> None (* [ocamltype_of_start_symbol grammar symbol] produces the OCaml type of the start symbol [symbol] in the grammar [grammar]. *) let ocamltype_of_start_symbol grammar symbol : Stretch.ocamltype = try StringMap.find symbol grammar.types with Not_found -> (* Every start symbol should have a type. 
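Indeed, the type of every start symbol must be known to Menhir, either via a %type declaration or via type inference, so this lookup cannot fail.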
*) assert false (* [is_inline_symbol grammar symbol] tells whether [symbol] is a nonterminal symbol (as opposed to a terminal symbol) and is marked %inline. *) let is_inline_symbol grammar symbol : bool = match StringMap.find symbol grammar.rules with | rule -> (* This is a nonterminal symbol. Test its %inline flag. *) rule.inline_flag | exception Not_found -> (* This is a terminal symbol. *) false (* [is_inline_symbol grammar producer] tells whether [producer] represents a nonterminal symbol (as opposed to a terminal) and is marked %inline. *) let is_inline_producer grammar producer = is_inline_symbol grammar (producer_symbol producer) (* -------------------------------------------------------------------------- *) (* [names producers] is the set of names of the producers [producers]. The name of a producer is the OCaml variable that is used to name its semantic value. *) (* This function checks on the fly that no two producers carry the same name. This check should never fail if we have performed appropriate renamings. It is a debugging aid. *) let names (producers : producers) : StringSet.t = List.fold_left (fun ids producer -> let id = producer_identifier producer in assert (not (StringSet.mem id ids)); StringSet.add id ids ) StringSet.empty producers menhir-20210929/src/chopInlined.mll000066400000000000000000000023521412503066000167760ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* Chopping [_inlined] off a name, if there is one, and returning the numeric suffix that follows, if there is one. *) rule chop = parse | (_* as x) "_inlined" (['0'-'9']+ as n) eof { x, int_of_string n } | (_* as x) "_inlined" eof { x, 0 } | (_* as x) eof { x, 0 } menhir-20210929/src/cmly_write.ml000066400000000000000000000136311412503066000165460ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. 
*) (* *) (******************************************************************************) open BasicSyntax open Grammar open Cmly_format let raw_content stretch = stretch.Stretch.stretch_raw_content let ocamltype (typ : Stretch.ocamltype) : ocamltype = match typ with | Stretch.Declared stretch -> raw_content stretch | Stretch.Inferred typ -> typ let ocamltype (typo : Stretch.ocamltype option) : ocamltype option = match typo with | None -> None | Some typ -> Some (ocamltype typ) let range (pos : Positions.t) : range = { r_start = Positions.start_of_position pos; r_end = Positions.end_of_position pos; } let ranges = List.map range let attribute (label, payload : Syntax.attribute) : attribute = { a_label = Positions.value label; a_payload = raw_content payload; a_position = range (Positions.position label); } let attributes : Syntax.attributes -> attributes = List.map attribute let terminal (t : Terminal.t) : terminal_def = { t_kind = ( if Terminal.equal t Terminal.error then `ERROR else if (match Terminal.eof with | None -> false | Some eof -> Terminal.equal t eof) then `EOF else if Terminal.pseudo t then `PSEUDO else `REGULAR ); t_name = Terminal.print t; t_type = ocamltype (Terminal.ocamltype t); t_attributes = attributes (Terminal.attributes t); } let nonterminal (nt : Nonterminal.t) : nonterminal_def = let is_start = Nonterminal.is_internal_start nt in { n_kind = if is_start then `START else `REGULAR; n_name = Nonterminal.print false nt; n_mangled_name = Nonterminal.print true nt; n_type = if is_start then None else ocamltype (Nonterminal.ocamltype nt); n_positions = if is_start then [] else ranges (Nonterminal.positions nt); n_nullable = Analysis.nullable nt; n_first = List.map Terminal.t2i (TerminalSet.elements (Analysis.first nt)); n_attributes = if is_start then [] else attributes (Nonterminal.attributes nt); } let symbol (sym : Symbol.t) : symbol = match sym with | Symbol.N n -> N (Nonterminal.n2i n) | Symbol.T t -> T (Terminal.t2i t) let action (a : Action.t) : action = { a_expr = Printer.string_of_expr (Action.to_il_expr a); a_keywords = Keyword.KeywordSet.elements (Action.keywords a); } let rhs (prod : Production.index) : producer_def array = match Production.classify prod with | Some n -> [| (N (Nonterminal.n2i n), "", []) |] | None -> Array.mapi (fun i sym -> let id = (Production.identifiers prod).(i) in let attrs = attributes (Production.rhs_attributes prod).(i) in symbol sym, id, attrs ) (Production.rhs prod) let production (prod : Production.index) : production_def = { p_kind = if Production.is_start prod then `START else `REGULAR; p_lhs = Nonterminal.n2i (Production.nt prod); p_rhs = rhs prod; p_positions = ranges (Production.positions prod); p_action = if Production.is_start prod then None else Some (action (Production.action prod)); p_attributes = attributes (Production.lhs_attributes prod); } let item (i : Item.t) : production * int = let p, i = Item.export i in (Production.p2i p, i) let itemset (is : Item.Set.t) : (production * int) list = List.map item (Item.Set.elements is) let lr0_state (node : Lr0.node) : lr0_state_def = { lr0_incoming = Option.map symbol (Lr0.incoming_symbol node); lr0_items = itemset (Lr0.items node) } let transition (sym, node) : symbol * lr1 = (symbol sym, Lr1.number node) let lr1_state (node : Lr1.node) : lr1_state_def = { lr1_lr0 = Lr0.core (Lr1.state node); lr1_transitions = List.map transition (SymbolMap.bindings (Lr1.transitions node)); lr1_reductions = let add t ps rs = (Terminal.t2i t, List.map Production.p2i ps) :: rs in 
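(* Fold over this state's reduction table, turning each entry into a pair of a terminal number and a list of production numbers. *)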
TerminalMap.fold_rev add (Lr1.reductions node) [] } let entry_point prod node nt _typ accu : (nonterminal * production * lr1) list = (Nonterminal.n2i nt, Production.p2i prod, Lr1.number node) :: accu let encode () : grammar = { g_basename = Settings.base; g_preludes = List.map raw_content Front.grammar.preludes; g_postludes = List.map raw_content Front.grammar.postludes; g_terminals = Terminal.init terminal; g_nonterminals = Nonterminal.init nonterminal; g_productions = Production.init production; g_lr0_states = Array.init Lr0.n lr0_state; g_lr1_states = Array.of_list (Lr1.map lr1_state); g_entry_points = Lr1.fold_entry entry_point []; g_attributes = attributes Analysis.attributes; g_parameters = List.map raw_content Front.grammar.parameters; } let write oc t = (* .cmly file format: CMLY ++ version string ++ grammar *) let magic = "CMLY" ^ Version.version in output_string oc magic; output_value oc (t : grammar) let write filename = (* Opening in binary mode is required. This is not a text file; we write to it using [output_value]. *) let oc = open_out_bin filename in write oc (encode()); close_out oc menhir-20210929/src/cmly_write.mli000066400000000000000000000023351412503066000167160ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* [write filename] queries the modules [Front] and [Grammar] for information about the grammar and queries the modules [Lr0] and [Lr1] for information about the automaton. It writes this information to the .cmly file [filename]. *) val write: string -> unit menhir-20210929/src/codeBackend.ml000066400000000000000000001555611412503066000165630ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The code generator. *) module Run (T : sig end) = struct open Grammar open Invariant (* only to access [cell] fields *) open IL open CodeBits open CodePieces open TokenType open Interface (* ------------------------------------------------------------------------ *) (* Here is a description of our code generation mechanism. Every internal function that we produce is parameterized by the parser environment [env], which contains (pointers to) the lexer, the lexing buffer, the last token read, etc. No global variables are exploited, so our parsers are reentrant. The functions that we export do not expect an environment as a parameter; they create a fresh one when invoked. Every state [s] is translated to a [run] function. To a first approximation, the only parameter of the [run] function, besides [env], is the stack. However, in some cases (consult the predicate [runpushes]), the top stack cell is not yet allocated when [run s] is called. 
The cell's contents are passed as extra parameters, and it is [run]'s responsibility to allocate that cell. The first thing in [run] is to discard a token, if the state was entered through a shift transition, and to peek at the lookahead token. When the current token is to be discarded, the [discard] function is invoked. It discards the current token, invokes the lexer to obtain a new token, and returns an updated environment. When we only wish to peek at the current token, without discarding it, we simply read [env.token]. (We have to be careful in cases where the current lookahead token might be [error], since, in those cases, [env.token] is meaningless; see below.) Once the lookahead token is obtained, [run] performs a case analysis of the lookahead token. Each branch performs one of the following. In shift branches, control is dispatched to another [run] function, with appropriate parameters, typically the current stack plus the information that should go into the new top stack cell (a state, a semantic value, locations). In reduce branches, a [reduce] function is invoked. In the default branch, error handling is initiated (see below). The [reduce] function associated with production [prod] pops as many stack cells as necessary, retrieving semantic values and the state [s] that initiated the reduction. It then evaluates the semantic action, which yields a new semantic value. (This is the only place where semantic actions are evaluated, so that semantic actions are never duplicated.) It then passes control on to the [goto] function associated with the nonterminal [nt], where [nt] is the left-hand side of the production [prod]. The [goto] function associated with nonterminal [nt] expects just one parameter besides the environment -- namely, the stack. However, in some cases (consult the predicate [gotopushes]), the top stack cell is not allocated yet, so its contents are passed as extra parameters. In that case, [goto] first allocates that cell. Then, it examines the state found in that cell and performs a goto transition, that is, a shift transition on the nonterminal symbol [nt]. This simply consists in passing control to the [run] function associated with the transition's target state. If this case analysis only has one branch, because all transitions for [nt] lead to the same target state, then no case analysis is required. In principle, a stack cell contains a state, a semantic value, and start and end positions. However, the state can be omitted if it is never consulted by a [goto] function. The semantic value can be omitted if it is associated with a token that was declared not to carry a semantic value. (One could also omit semantic values for nonterminals whose type was declared to be [unit], but that does not seem very useful.) The start or end position can be omitted if they are associated with a symbol that does not require keeping track of positions. When all components of a stack cell are omitted, the entire cell disappears, so that no memory allocation is required. For each start symbol [nt], an entry point function, named after [nt], is generated. Its parameters are a lexer and a lexing buffer. The function allocates and initializes a parser environment and transfers control to the appropriate [run] function. Our functions are grouped into one huge [let rec] definition. 
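As an illustration, the generated code has roughly the following shape. (This is a simplified sketch: the state numbers, the parameters, and the names shown here are made up, and the real code carries more information.)

  let rec run42 env stack =
    let env = discard env in
    match env.token with
    | INT i -> run43 env (stack, State42, i)
    | _ -> error42 env stack
  and run43 env stack = reduce7 env stack
  and reduce7 env stack =
    let (stack, state, i) = stack in
    let v = i (* the semantic action *) in
    goto_expr env stack state v
  and goto_expr env stack state v =
    match state with
    | State17 -> run44 env (stack, State17, v)
    | _ -> fail ()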
The inliner, implemented as a separate module, will inline functions that are called at most once, remove dead code (although there should be none or next to none), and possibly perform other transformations. I note that, if a state can be entered only through (nondefault) reductions, then, in that state, the lookahead token must be a member of the set of tokens that allow these reductions, and by construction, there must exist an action on that token in that state. Thus, the default branch (which signals an error when the lookahead token is not a member of the expected set) is in fact dead. It would be nice (but difficult) to exploit types to prove that. However, one could at least replace the code of that branch with a simple [assert false]. TEMPORARY do it *) (* ------------------------------------------------------------------------ *) (* At this time, the code back-end supports only the legacy error handling strategy. It does not (yet?) support the simplified strategy. Let the user know about this. *) let () = match Settings.strategy with | `Legacy -> () | `Simplified -> Error.error [] "The code back-end does not support --strategy simplified.\n\ Please use either --strategy legacy or --table." (* ------------------------------------------------------------------------ *) (* Here is a description of our error handling mechanism. With every state [s], we associate an [error] function. If [s] is willing to act when the lookahead token is [error], then this function tells how. This includes *both* shift *and* reduce actions. (For some reason, yacc/ocamlyacc/mule/bison can only shift on [error].) If [s] is unable to act when the lookahead token is [error], then this function pops a stack cell, extracts a state [s'] out of it, and transfers control, via a global [errorcase] dispatch function, to the [error] function associated with [s']. (Because some stack cells do not physically hold a state, this description is somewhat simpler than the truth, but that's the idea.) When an error is detected in state [s], then (see [initiate]) the [error] function associated with [s] is invoked. Immediately before invoking the [error] function, the flag [env.error] is set. By convention, this means that the current token is discarded and replaced with an [error] token. The [error] token transparently inherits the positions associated with the underlying concrete token. Whenever we attempt to consult the current token, we check whether [env.error] is set and, if that is the case, resume error handling by calling the [error] function associated with the current state. This allows a series of reductions to correctly take place when the lookahead token is [error]. In many states, though, it is possible to statically prove that [env.error] cannot be set. In that case, we produce a lookup of [env.token] without checking [env.error]. The flag [env.error] is cleared when a token is shifted. States with default reductions perform a reduction regardless of the current lookahead token, which can be either [error] or a regular token. A question that bothered me for a while was, when unwinding the stack, do we stop at a state that has a default reduction? Should it be considered able to handle the error token? I now believe that the answer is, this cannot happen. Indeed, if a state has a default reduction, then, whenever it is entered, reduction is performed and that state is exited, which means that it is never pushed onto the stack. 
So, it is fine to consider that a state with a default reduction is unable to handle errors. I note that a state that can handle [error] and has a default reduction must in fact have a reduction action on [error]. *) (* The variable that holds the environment. This is a parameter of all functions. *) let env = prefix "env" (* The type of environments. *) let tcenv = env let tenv = TypApp (tcenv, []) (* The [assertfalse] function. We have just one of these, in order to save code size. It should become unnecessary when we add GADTs. *) let assertfalse = prefix "fail" (* The [discard] function. *) let discard = prefix "discard" (* The [run] function associated with a state [s]. *) let run s = prefix (Printf.sprintf "run%d" (Lr1.number s)) (* The [goto] function associated with a nonterminal [nt]. *) let goto nt = prefix (Printf.sprintf "goto_%s" (Nonterminal.print true nt)) (* The [reduce] function associated with a production [prod]. *) let reduce prod = prefix (Printf.sprintf "reduce%d" (Production.p2i prod)) (* The [errorcase] function. *) let errorcase = prefix "errorcase" (* The [error] function associated with a state [s]. *) let error s = prefix (Printf.sprintf "error%d" (Lr1.number s)) (* The constant associated with a state [s]. *) let statecon s = dataprefix (Printf.sprintf "State%d" (Lr1.number s)) let estatecon s = EData (statecon s, []) let pstatecon s = PData (statecon s, []) let pstatescon ss = POr (List.map pstatecon ss) (* The type of states. *) let tcstate = prefix "state" let tstate = TypApp (tcstate, []) (* The [print_token] function. This automatically generated function is used in [--trace] mode. *) let print_token = prefix "print_token" (* Fields in the environment record. *) let flexer = prefix "lexer" let flexbuf = prefix "lexbuf" let ftoken = prefix "token" let ferror = prefix "error" (* The type variable that represents the stack tail. *) let tvtail = tvprefix "tail" let ttail = TypVar tvtail (* The result type for every function. TEMPORARY *) let tvresult = tvprefix "return" let tresult = TypVar tvresult (* ------------------------------------------------------------------------ *) (* Helpers for code production. *) (* The following assertion checks that [env.error] is [false]. *) let assertnoerror : pattern * expr = PUnit, eassert (EApp (EVar "not", [ ERecordAccess (EVar env, ferror) ])) let auto2scheme t = scheme [ tvtail; tvresult ] t (* ------------------------------------------------------------------------ *) (* Accessing the positions of the current token. *) (* There are two ways we can go about this. We can read the positions from the lexbuf immediately after we request a new token, or we can wait until we need the positions and read them at that point. As of 2014/12/12, we switch to the latter approach. The speed difference in a micro-benchmark is not measurable, but this allows us to save two fields in the [env] record, which should be a good thing, as it implies less frequent minor collections. *) let getstartp = ERecordAccess (ERecordAccess (EVar env, flexbuf), "Lexing.lex_start_p") let getendp = ERecordAccess (ERecordAccess (EVar env, flexbuf), "Lexing.lex_curr_p") (* ------------------------------------------------------------------------ *) (* Determine whether the [goto] function for nonterminal [nt] will push a new cell onto the stack. If it doesn't, then that job is delegated to the [run] functions called by [goto]. One could decide that [gotopushes] always returns true, and produce decent code. 
As a refinement, we decide to drive the [push] operation inside the [run] functions if all of them are able to eliminate this operation via shiftreduce optimization. This will be the case if all of these [run] functions implement a default reduction of a non-epsilon production. If that is not the case, then [gotopushes] returns true. In general, it is good to place the [push] operation inside [goto], because multiple [reduce] functions transfer control to [goto], and [goto] in turn transfers control to multiple [run] functions. Hence, this is where code sharing is maximal. All of the [run] functions that [goto] can transfer control to expect a stack cell of the same shape (indeed, the symbol [nt] is the same in every case, and the state is always represented), which makes this decision possible. *) let gotopushes : Nonterminal.t -> bool = Nonterminal.tabulate (fun nt -> not ( Lr1.targets (fun accu _ target -> accu && match Default.has_default_reduction target with | Some (prod, _) -> Production.length prod > 0 | None -> false ) true (Symbol.N nt) ) ) (* ------------------------------------------------------------------------ *) (* Determine whether the [run] function for state [s] will push a new cell onto the stack. Our convention is this. If this [run] function is entered via a shift transition, then it is in charge of pushing a new stack cell. If it is entered via a goto transition, then it is in charge of pushing a new cell if and only if the [goto] function that invoked it did not do so. Last, if this [run] function is invoked directly by an entry point, then it does not push a stack cell. *) let runpushes s = match Lr1.incoming_symbol s with | Some (Symbol.T _) -> true | Some (Symbol.N nt) -> not (gotopushes nt) | None -> false (* ------------------------------------------------------------------------ *) (* In some situations, we are able to fuse a shift (or goto) transition with a reduce transition, which means that we save the cost (in speed and in code size) of pushing and popping the top stack cell. This involves creating a modified version of the [reduce] function associated with a production [prod], where the contents of the top stack cell are passed as extra parameters. Because we wish to avoid code duplication, we perform this change only if all call sites for [reduce] agree on this modified calling convention. At the call site, the optimization is possible only if a stack cell allocation exists and is immediately followed by a call to [reduce]. This is the case inside the [run] function for state [s] when [run] pushes a stack cell and performs a default reduction. This optimization amounts to coalescing the push operation inside [run] with the pop operation that follows inside [reduce]. Unit production elimination, on the other hand, would coalesce the pop operation inside [reduce] with the push operation that follows inside [goto]. For this reason, the two are contradictory. As a result, we do not attempt to perform unit production elimination. In fact, we did implement it at one point and found that it was seldom applicable, because preference was given to the shiftreduce optimization. There are cases where shiftreduce optimization does not make any difference, for instance, if production [prod] is never reduced, or if the top stack cell is in fact nonexistent. *) let shiftreduce : Production.index -> bool = Production.tabulate (fun prod -> (* Check that this production pops at least one stack cell. 
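That is, its right-hand side must be nonempty: an epsilon production pops no cell, so there is no pop that could be fused with the push that precedes it.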
*) Production.length prod > 0 && (* Check that all call sites push a stack cell and have a default reduction. *) Lr1.NodeSet.fold (fun s accu -> accu && Option.defined (Default.has_default_reduction s) && runpushes s ) (Lr1.production_where prod) true ) let shiftreducecount = Production.sum (fun prod -> if shiftreduce prod then 1 else 0) let () = Error.logC 1 (fun f -> Printf.fprintf f "%d out of %d productions exploit shiftreduce optimization.\n" shiftreducecount Production.n) (* Check that, as predicted above, [gotopushes nt] returns [false] only when all of the [run] functions that follow it perform shiftreduce optimization. This can be proved as follows. If [gotopushes nt] returns [false], then every successor state [s] has a default reduction for some non-epsilon production [prod]. Furthermore, all states that can reduce [prod] must be successors of that same [goto] function: indeed, because the right-hand side of the production ends with symbol [nt], every state that can reduce [prod] must be entered through [nt]. So, at all such states, [runpushes] is true, which guarantees that [shiftreduce prod] is true as well. *) let () = assert ( Nonterminal.fold (fun nt accu -> accu && if gotopushes nt then true else Lr1.targets (fun accu _ target -> accu && match Default.has_default_reduction target with | Some (prod, _) -> shiftreduce prod | None -> false ) true (Symbol.N nt) ) true ) (* ------------------------------------------------------------------------ *) (* Type production. *) (* This is the type of states. Only states that are represented are declared. *) let statetypedef = { typename = tcstate; typeparams = []; typerhs = TDefSum ( Lr1.fold (fun defs s -> if Invariant.represented s then { dataname = statecon s; datavalparams = []; datatypeparams = None } :: defs else defs ) [] ); typeconstraint = None } (* The type of lexers. *) let tlexer = TypArrow (tlexbuf, ttoken) (* This is the type of parser environments. *) let envtypedef = { typename = tcenv; typeparams = []; typerhs = TDefRecord [ (* The lexer itself. *) field false flexer tlexer; (* The lexing buffer. *) field false flexbuf tlexbuf; (* The last token that was read from the lexer. This is the head of the token stream, unless [env.error] is set. *) field false ftoken ttoken; (* A flag which tells whether we currently have an [error] token at the head of the stream. When this flag is set, the head of the token stream is the [error] token, and the contents of the [token] field is irrelevant. The token following [error] is obtained by invoking the lexer again. *) field true ferror tbool; ]; typeconstraint = None } (* [curry] curries the top stack cell in a type [t] of the form [(stack type) arrow (result type)]. [t] remains unchanged if the stack type does not make at least one cell explicit. *) let curry = function | TypArrow (TypTuple (tstack :: tcell), tresult) -> TypArrow (tstack, marrow tcell tresult) | TypArrow _ as t -> t | _ -> assert false (* [curryif true] is [curry], [curryif false] is the identity. *) let curryif flag t = if flag then curry t else t (* Types for stack cells. [celltype tailtype holds_state symbol] returns the type of a stack cell. The parameter [tailtype] is the type of the tail of the stack. The flag [holds_state] tells whether the cell holds a state. The parameter [symbol] is used to determine whether the cell holds a semantic value and what its type is. 
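For instance, a cell that holds a state and a semantic value of type [t], but no positions, has type [tail * state * t].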
A subtlety here and in [curry] above is that singleton stack cells give rise to singleton tuple types, which the type printer eliminates, but which do exist internally. As a result, [curry] always correctly removes the top stack cell, even if it is a singleton tuple cell. *) let celltype tailtype cell = let { symbol; holds_state; holds_startp; holds_endp; _ } = cell in TypTuple ( tailtype :: if1 holds_endp tposition @ if1 holds_state tstate @ semvtype symbol @ if1 holds_startp tposition ) (* Types for stacks. [stacktype s] is the type of the stack at state [s]. [reducestacktype prod] is the type of the stack when about to reduce production [prod]. [gotostacktype nt] is the type of the stack when the [goto] function associated with [nt] is called. In all cases, the tail (that is, the unknown part) of the stack is represented by [ttail], currently a type variable. These stack types are obtained by folding [celltype] over a description of the stack provided by module [Invariant]. *) let stacktype s = Array.fold_left celltype ttail (Invariant.stack s) let reducestacktype prod = Array.fold_left celltype ttail (Invariant.prodstack prod) let gotostacktype nt = Array.fold_left celltype ttail (Invariant.gotostack nt) (* The type of the [run] function. As announced earlier, if [s] is the target of shift transitions, the type of the stack is curried, that is, the top stack cell is not yet allocated, so its contents are passed as extra parameters. If [s] is the target of goto transitions, the top stack cell is allocated. If [s] is a start state, this issue makes no difference. *) let runtypescheme s = auto2scheme ( arrow tenv ( curryif (runpushes s) ( arrow (stacktype s) tresult ) ) ) (* The type of the [goto] function. The top stack cell is curried. *) let gototypescheme nt = auto2scheme (arrow tenv (curry (arrow (gotostacktype nt) tresult))) (* If [prod] is an epsilon production and if the [goto] function associated with it expects a state parameter, then the [reduce] function associated with [prod] also requires a state parameter. *) let reduce_expects_state_param prod = let nt = Production.nt prod in Production.length prod = 0 && Invariant.((gotostack nt).(0).holds_state) (* The type of the [reduce] function. If shiftreduce optimization is performed for this production, then the top stack cell is not explicitly allocated. *) let reducetypescheme prod = auto2scheme ( arrow tenv ( curryif (shiftreduce prod) ( arrow (reducestacktype prod) ( arrowif (reduce_expects_state_param prod) tstate tresult ) ) ) ) (* The type of the [errorcase] function. The shape of the stack is unknown, and is determined by examining the state parameter. *) let errorcasetypescheme = auto2scheme (marrow [ tenv; ttail; tstate ] tresult) (* The type of the [error] function. The shape of the stack is the one associated with state [s]. *) let errortypescheme s = auto2scheme ( marrow [ tenv; stacktype s ] tresult) (* ------------------------------------------------------------------------ *) (* Code production preliminaries. *) (* This flag will be set to [true] if we ever raise the [Error] exception. This happens when we unwind the entire stack without finding a state that can handle errors. *) let can_die = ref false (* A code pattern for an exception handling construct where both alternatives are in tail position. Concrete syntax in OCaml 4.02 is [match e with x -> e1 | exception Error -> e2]. Earlier versions of OCaml do not support this construct. 
We continue to emulate it using a combination of [try/with], [match/with], and an [option] value. It is used only in a very rare case anyway. *) (* TEMPORARY either remove this or add support for [match with exception] *) let letunless e x e1 e2 = EMatch ( ETry ( EData ("Some", [ e ]), [ branch (PData (excdef.excname, [])) (EData ("None", [])) ] ), [ branch (PData ("Some", [ PVar x ])) e1 ; branch (PData ("None", [])) e2 ] ) (* ------------------------------------------------------------------------ *) (* Calling conventions. *) (* The layout of a stack cell is determined here. The first field in a stack cell is always a pointer to the rest of the stack; it is followed by the fields listed below, each of which may or may not appear. [runpushcell] and [gotopushcell] are the two places where stack cells are allocated. *) (* 2015/11/04. We make [endp] the first element in the list of optional fields, so we are able to access it at a fixed offset, provided we know that it exists. This is exploited when reducing an epsilon production. *) (* The contents of the top stack cell, exposed as individual parameters. The choice of identifiers is suitable for use in the definition of [run]. *) let runcellparams stack : xparams = Invariant.fold_top (fun cell -> let { holds_semv; holds_state; holds_startp; holds_endp; _ } = cell in if1 holds_endp (xvar endp) @ if1 holds_state (xvar state) @ if1 holds_semv (xvar semv) @ if1 holds_startp (xvar startp) ) [] stack (* May the semantic action associated with production [prod] refer to the variable [ids.(i)]? *) let action_may_refer_to_value prod i = Production.is_start prod || let ids = Production.identifiers prod and action = Production.action prod in StringSet.mem ids.(i) (Action.semvars action) (* The contents of a stack cell, exposed as individual parameters, again. The choice of identifiers is suitable for use in the definition of a [reduce] function. [prod] is the production's index. The integer [i] tells which symbol on the right-hand side we are focusing on, that is, which symbol this stack cell is associated with. *) let reducecellparams prod i cell = let { symbol; holds_state; holds_startp; holds_endp; _ } = cell in let ids = Production.identifiers prod in (* The semantic value is bound to the variable [ids.(i)]. Its type is [t]. As of 2016/03/11, we generate a type annotation. Indeed, because of our use of [magic], the semantic value would otherwise have an unknown type; and, if it is a function, the OCaml compiler could warn (incorrectly) that this function does not use its argument. As of 2020/12/28, we use a wildcard pattern if we can determine that the semantic action does not refer to the variable [ids.(i)]. *) let semvpat t = if action_may_refer_to_value prod i then PAnnot (PVar ids.(i), t) else PWildcard in if1 holds_endp (PVar (Printf.sprintf "_endpos_%s_" ids.(i))) @ if1 holds_state (if i = 0 then PVar state else PWildcard) @ List.map semvpat (semvtype symbol) @ if1 holds_startp (PVar (Printf.sprintf "_startpos_%s_" ids.(i))) (* The contents of a stack cell, exposed as individual parameters, again. The choice of identifiers is suitable for use in the definition of [error]. *) let errorcellparams (i, pat) cell = let { holds_semv; holds_state; holds_startp; holds_endp; _ } = cell in i + 1, ptuple ( pat :: if1 holds_endp PWildcard @ if1 holds_state (if i = 0 then PVar state else PWildcard) @ if1 holds_semv PWildcard @ if1 holds_startp PWildcard ) (* Calls to [run]. 
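The actual arguments are the environment and the stack (coerced via [magic]), followed, if [run s] pushes a cell, by the contents of that cell.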
*) let runparams s : xparams = xvar env :: xmagic (xvar stack) :: ifn (runpushes s) (runcellparams (Invariant.stack s)) let call_run s actuals = EApp (EVar (run s), actuals) (* The parameters to [reduce]. When shiftreduce optimization is in effect, the top stack cell is not allocated, so extra parameters are required. Note that [shiftreduce prod] and [reduce_expects_state_param prod] are mutually exclusive conditions, so the [state] parameter is never bound twice. *) let reduceparams prod = PVar env :: PVar stack :: ifn (shiftreduce prod) ( Invariant.fold_top (reducecellparams prod (Production.length prod - 1)) [] (Invariant.prodstack prod) ) @ if1 (reduce_expects_state_param prod) (PVar state) (* Calls to [reduce]. One must specify the production [prod] as well as the current state [s]. *) (* TEMPORARY use [let] bindings followed with [reduceparams], so as to reduce duplication *) let call_reduce prod s = let actuals = (EVar env) :: (EMagic (EVar stack)) :: ifn (shiftreduce prod) (xparams2exprs (runcellparams (Invariant.stack s))) (* compare with [runpushcell s] *) @ if1 (reduce_expects_state_param prod) (estatecon s) in EApp (EVar (reduce prod), actuals) (* Calls to [goto]. *) let gotoparams nt : xparams = xvar env :: xvar stack :: runcellparams (Invariant.gotostack nt) let call_goto nt = EApp (EVar (goto nt), xparams2exprs (gotoparams nt)) (* Calls to [errorcase]. *) let errorcaseparams : xparams = [ xvar env; xmagic (xvar stack); xvar state ] let call_errorcase = EApp (EVar errorcase, xparams2exprs errorcaseparams) (* Calls to [error]. *) let errorparams = [ xvar env; xmagic (xvar stack) ] let call_error s = EApp (EVar (error s), xparams2exprs errorparams) (* We prefer to call [errorcase] rather than [error], when possible, and even though [errorcase] itself calls [error], because this reduces the number of call sites of the [error] functions and typically allows all of these functions to be inlined into [errorcase]. This greatly reduces the number of functions that we produce. *) let call_error s = if Invariant.represented s then EApp (EVar errorcase, [ EVar env; EMagic (EVar stack); estatecon s ]) (* TEMPORARY use [let] binding and reduce duplication *) else call_error s (* Calls to [assertfalse]. *) let call_assertfalse = EApp (EVar assertfalse, [ EUnit ]) (* ------------------------------------------------------------------------ *) (* Code production for the automaton functions. *) (* Count how many states actually can peek at an error token. This figure is, in general, inferior or equal to the number of states at which [Invariant.errorpeeker] is true, because some of these states have a default reduction and will not consult the lookahead token. *) let errorpeekers = ref 0 (* Code for calling the reduction function for token [prod] upon finding a token within [toks]. This produces a branch, to be inserted in a [run] function for state [s]. *) let reducebranch toks prod s = branch (tokspat toks) (call_reduce prod s) (* Code for shifting from state [s] to state [s'] via the token [tok]. This produces a branch, to be inserted in a [run] function for state [s]. The callee, [run s'], is responsible for taking the current token off the input stream. (There is actually a case where the token is *not* taken off the stream: when [s'] has a default reduction on [#].) It is also responsible for pushing a new stack cell. 
The rationale behind this decision is that there may be multiple shift transitions into [s'], so we actually share that code by placing it inside [run s'] rather than inside every transition. *) let shiftbranchbody s tok s' = (* TEMPORARY avoid redundancy with [runparams]; use [let] bindings *) (* Construct the actual parameters for [run s']. *) let actuals = (EVar env) :: (EMagic (EVar stack)) :: Invariant.fold_top (fun cell -> let { holds_semv; holds_state; holds_startp; holds_endp; _ } = cell in if1 holds_endp getendp @ if1 holds_state (estatecon s) @ if1 holds_semv (EVar semv) @ if1 holds_startp getstartp ) [] (Invariant.stack s') in (* Call [run s']. *) tracecomment (Printf.sprintf "Shifting (%s) to state %d" (Terminal.print tok) (Lr1.number s')) (call_run s' actuals) (* If [--represent-values] is set and the token [tok] has no semantic value, then we must bind the variable [semv] to a unit value. Otherwise, this is unnecessary. *) let tok_bind_unit_if_necessary tok e = if Settings.represent_values then tok_bind_unit tok (PVar semv) e else e let shiftbranch s tok s' = assert (not (Terminal.pseudo tok)); branch (tokpat tok (PVar semv)) (tok_bind_unit_if_necessary tok (shiftbranchbody s tok s') ) (* This generates code for pushing a new stack cell upon entering the [run] function for state [s]. *) let runpushcell s e = if runpushes s then let contents = xvar stack :: runcellparams (Invariant.stack s) in mlet [ PVar stack ] [ etuple (xparams2exprs contents) ] e else e let runpushcellunless shiftreduce s e = if shiftreduce then EComment ("Not allocating top stack cell", e) else runpushcell s e (* This generates code for dealing with the lookahead token upon entering the [run] function for state [s]. If [s] is the target of a shift transition, then we must take the current token (which was consumed in the shift transition) off the input stream. Whether [s] was entered through a shift or a goto transition, we want to peek at the next token, unless we are performing a default reduction. The parameter [defred] tells which default reduction, if any, we are about to perform. *) (* 2014/12/06 New convention regarding initial states (i.e., states which have no incoming symbol). We do not invoke the lexer when we construct the initial environment, so the [run] function for an initial state must do it. (Except in the very special case where the initial state has a default reduction on [#] -- this means the grammar recognizes only the empty word. We have ruled out this case.) *) let gettoken s defred e = match Lr1.incoming_symbol s, defred with | (Some (Symbol.T _) | None), Some (_, toks) when TerminalSet.mem Terminal.sharp toks -> assert (TerminalSet.cardinal toks = 1); (* There is a default reduction on token [#]. We cannot request the next token, since that might drive the lexer off the end of the input stream, so we cannot call [discard]. Do nothing. *) e | (Some (Symbol.T _) | None), Some _ -> (* There is some other default reduction. Discard the first input token. *) blet ([ PVar env, EApp (EVar discard, [ EVar env ]) (* Note that we do not read [env.token]. *) ], e) | (Some (Symbol.T _) | None), None -> (* There is no default reduction. Discard the first input token and peek at the next one. *) blet ([ PVar env, EApp (EVar discard, [ EVar env ]); PVar token, ERecordAccess (EVar env, ftoken) ], e) | Some (Symbol.N _), Some _ -> (* There is some default reduction. Do not peek at the input token. *) e | Some (Symbol.N _), None -> (* There is no default reduction. 
Peek at the first input token, without taking it off the input stream. This is normally done by reading [env.token], unless the token might be [error]: then, we check [env.error] first. *) if Invariant.errorpeeker s then begin incr errorpeekers; EIfThenElse ( ERecordAccess (EVar env, ferror), tracecomment "Resuming error handling" (call_error s), blet ([ PVar token, ERecordAccess (EVar env, ftoken) ], e) ) end else blet ([ assertnoerror; PVar token, ERecordAccess (EVar env, ftoken) ], e) (* This produces the header of a [run] function. *) let runheader s body = let body = tracecomment (Printf.sprintf "State %d:" (Lr1.number s)) body in { valpublic = false; valpat = PVar (run s); valval = EAnnot (EFun (xparams2pats (runparams s), body), runtypescheme s) } (* This produces the comment attached with a default reduction. *) let defaultreductioncomment toks e = EPatComment ( "Reducing without looking ahead at ", tokspat toks, e ) (* This produces some bookkeeping code that is used when initiating error handling. We set the flag [env.error]. By convention, the field [env.token] becomes meaningless and one considers that the first token on the input stream is [error]. As a result, the next peek at the lookahead token will cause error handling to be resumed. The next call to [discard] will take the [error] token off the input stream and clear [env.error]. *) (* It seems convenient for [env.error] to be a mutable field, as this allows us to generate compact code. Re-allocating the whole record would produce less compact code. And speed is not an issue in this error-handling code. *) let errorbookkeeping e = tracecomment "Initiating error handling" (blet ( [ PUnit, ERecordWrite (EVar env, ferror, etrue) ], e )) (* This code is used to indicate that a new error has been detected in state [s]. If I am correct, [env.error] is never set here. Indeed, that would mean that we first found an error, and then signaled another error before being able to shift the first error token. My understanding is that this cannot happen: when the first error is signaled, we end up at a state that is willing to handle the error token, by a series of reductions followed by a shift. We initiate error handling by first performing the standard bookkeeping described above, then transferring control to the [error] function associated with [s]. *) let initiate s = blet ( [ assertnoerror ], errorbookkeeping (call_error s) ) (* This produces the body of the [run] function for state [s]. *) let rundef s : valdef = match Default.has_default_reduction s with | Some (prod, toks) as defred -> (* Perform reduction without looking ahead. If shiftreduce optimization is being performed, then no stack cell is allocated. The contents of the top stack cell are passed to [reduce] as extra parameters. *) runheader s ( runpushcellunless (shiftreduce prod) s ( gettoken s defred ( defaultreductioncomment toks ( call_reduce prod s ) ) ) ) | None -> (* If this state is willing to act on the error token, ignore that -- this is taken care of elsewhere. *) let transitions = SymbolMap.remove (Symbol.T Terminal.error) (Lr1.transitions s) and reductions = TerminalMap.remove Terminal.error (Lr1.reductions s) in (* Construct the main case analysis that determines what action should be taken next. A default branch, where an error is detected, is added if the analysis is not exhaustive. In the default branch, we initiate error handling. 
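(For intuition, the default branch built at the end of this case analysis is just the following; this is an illustrative, unused binding.) *)
let _sketch_default_branch : branch = branch PWildcard (initiate s) in
(*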
*) let covered, branches = ProductionMap.fold (fun prod toks (covered, branches) -> (* There is a reduction for these tokens. *) TerminalSet.union toks covered, reducebranch toks prod s :: branches ) (Lr0.invert reductions) (TerminalSet.empty, []) in let covered, branches = SymbolMap.fold (fun symbol s' (covered, branches) -> match symbol with | Symbol.T tok -> (* There is a shift transition for this token. *) TerminalSet.add tok covered, shiftbranch s tok s' :: branches | Symbol.N _ -> covered, branches ) transitions (covered, branches) in let branches = if TerminalSet.subset TerminalSet.universe covered then branches else branches @ [ branch PWildcard (initiate s) ] in (* Finally, construct the code for [run]. The former pushes things onto the stack, obtains the lookahead token, then performs the main case analysis on the lookahead token. *) runheader s ( runpushcell s ( gettoken s None ( EMatch ( EVar token, branches ) ) ) ) (* This is the body of the [reduce] function associated with production [prod]. *) let reducebody prod = (* Find out about the left-hand side of this production and about the identifiers that have been bound to the symbols in the right-hand side. These represent variables that we should bind to semantic values before invoking the semantic action. *) let nt, rhs = Production.def prod and ids = Production.identifiers prod and length = Production.length prod in (* Build a pattern that represents the shape of the stack. Out of the stack, we extract a state (except when the production is an epsilon production) and a number of semantic values. If shiftreduce optimization is being performed, then the top stack cell is not explicitly allocated, so we do not include it in the pattern that is built. *) let (_ : int), pat = Array.fold_left (fun (i, pat) cell -> i + 1, if i = length - 1 && shiftreduce prod then pat else ptuple (pat :: reducecellparams prod i cell) ) (0, PVar stack) (Invariant.prodstack prod) in (* If any identifiers refer to terminal symbols without a semantic value, then bind these identifiers to the unit value. This provides the illusion that every symbol, terminal or nonterminal, has a semantic value. This is more regular and allows applying operators such as ? to terminal symbols without a semantic value. *) let unitbindings = Misc.foldi length (fun i unitbindings -> match semvtype rhs.(i) with | [] when action_may_refer_to_value prod i -> (PVar ids.(i), EUnit) :: unitbindings | _ -> unitbindings ) [] in (* If necessary, determine start and end positions for the left-hand side of the production. If the right-hand side is nonempty, this is done by extracting position information out of the first and last symbols of the right-hand side. If it is empty, then (as of 2015/11/04) this is done by taking the end position stored in the top stack cell (whatever it is). The constraints imposed by the module [Invariant], the layout of cells, and our creation of a sentinel cell (see [entrydef] further on), ensure that this cell exists and has an [endp] field at offset 1. Yes, we live dangerously. You only live once. *) let extract x = (* Extract the end position (i.e., the field at offset 1) in the top stack cell and bind it to the variable [x]. 
*) PTuple [ PWildcard; PVar x ], EMagic (EVar stack) in let symbol = Symbol.N nt in let posbindings action = let bind_startp = Invariant.startp symbol in if1 (Action.has_beforeend action) ( extract beforeendp ) @ if1 bind_startp ( if length > 0 then PVar startp, EVar (Printf.sprintf "_startpos_%s_" ids.(0)) else extract startp ) @ if1 (Invariant.endp symbol) ( if length > 0 then PVar endp, EVar (Printf.sprintf "_endpos_%s_" ids.(length - 1)) else if bind_startp then PVar endp, EVar startp else extract endp ) in (* If this production is one of the start productions, then reducing it means accepting the input. In that case, we return a final semantic value and stop. Otherwise, we transfer control to the [goto] function, unless the semantic action raises [Error], in which case we transfer control to [errorcase]. *) if Production.is_start prod then tracecomment "Accepting" (blet ( [ pat, EVar stack ], EMagic (EVar ids.(0)) )) else let action = Production.action prod in let act = annotate (Action.to_il_expr action) (semvtypent nt) in tracecomment (Printf.sprintf "Reducing production %s" (Production.print prod)) (blet ( (pat, EVar stack) :: unitbindings @ posbindings action, (* If the semantic action may raise [Error], use a [let/unless] construct; otherwise use [let]. *) if Action.has_syntaxerror action then letunless act semv (call_goto nt) (errorbookkeeping call_errorcase) else blet ([ PVar semv, act ], call_goto nt) )) (* This is the definition of the [reduce] function associated with production [prod]. *) let reducedef prod = { valpublic = false; valpat = PVar (reduce prod); valval = EAnnot ( EFun ( reduceparams prod, reducebody prod ), reducetypescheme prod ) } (* This generates code for pushing a new stack cell inside [goto]. *) let gotopushcell nt e = if gotopushes nt then let contents = xvar stack :: runcellparams (Invariant.gotostack nt) in mlet [ PVar stack ] [ etuple (xparams2exprs contents) ] e else e (* This is the heart of the [goto] function associated with nonterminal [nt]. *) let gotobody nt = (* Examine the current state to determine where to go next. *) let branches = Lr1.targets (fun branches sources target -> branch (pstatescon sources) (call_run target (xparams2exprs (runparams target))) :: branches ) [] (Symbol.N nt) in match branches with | [] -> (* If there are no branches, then this [goto] function is never invoked. The inliner will drop it, so whatever we generate here is unimportant. *) call_assertfalse | [ branch ] -> (* If there is only one branch, no case analysis is required. This optimization is not strictly necessary if GADTs are used by the compiler to prove that the case analysis is exhaustive. It does improve readability, though, and is also useful if the compiler does not have GADTs. *) EPatComment ( "State should be ", branch.branchpat, branch.branchbody ) | _ -> (* In the general case, we keep the branches computed above and, unless [nt] is universal, add a default branch, which is theoretically useless but helps avoid warnings if the compiler does not have GADTs. *) let default = branch PWildcard call_assertfalse in EMatch ( EVar state, branches @ (if Invariant.universal (Symbol.N nt) then [] else [ default ]) ) (* This is the [goto] function associated with nonterminal [nt].
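Assembling the pieces above -- a hedged illustration only, with a hypothetical state constructor and hypothetical function names -- the body of a [goto] function with one represented source state resembles this IL term. *)

(* Hypothetical sketch of a [goto] body. *)
let _sketch_goto_body : expr =
  EMatch (EVar state,
          [ branch (PData ("MenhirState7", []))
              (EApp (EVar "_menhir_run8", [ EVar "_menhir_env" ]));
            branch PWildcard call_assertfalse ])

(*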
*) let gotodef nt = { valpublic = false; valpat = PVar (goto nt); valval = EAnnot (EFun (xparams2pats (gotoparams nt), gotopushcell nt (gotobody nt)), gototypescheme nt) } (* ------------------------------------------------------------------------ *) (* Code production for the error handling functions. *) (* This is the body of the [error] function associated with state [s]. *) (* This code does not check whether the state [s] has a default reduction. It is unclear whether this is intentional. See the corresponding comment in MenhirLib.Engine. *) let handle s e = tracecomment (Printf.sprintf "Handling error in state %d" (Lr1.number s)) e let errorbody s = match SymbolMap.find (Symbol.T Terminal.error) (Lr1.transitions s) with | s' -> (* There is a shift transition on error. *) handle s ( shiftbranchbody s Terminal.error s' ) | exception Not_found -> match TerminalMap.lookup Terminal.error (Lr1.reductions s) with | prods -> let prod = Misc.single prods in (* There is a reduce transition on error. If shiftreduce optimization is enabled for this production, then we must pop an extra cell for [reduce]'s calling convention to be met. *) let extrapop e = if shiftreduce prod then let contents = xvar stack :: runcellparams (Invariant.stack s) in let pat = ptuple (xparams2pats contents) in blet ([ pat, EVar stack ], e) else e in handle s ( extrapop ( call_reduce prod s ) ) | exception Not_found -> (* This state is unable to handle errors. Pop the stack to find a state that does handle errors, a state that can further pop the stack, or die. *) match Invariant.rewind s with | Invariant.Die -> can_die := true; ERaise errorval | Invariant.DownTo (w, st) -> let _, pat = Array.fold_left errorcellparams (0, PVar stack) w in blet ( [ pat, EVar stack ], match st with | Invariant.Represented -> call_errorcase | Invariant.UnRepresented s -> call_error s ) (* This is the [error] function associated with state [s]. *) let errordef s = { valpublic = false; valpat = PVar (error s); valval = EAnnot ( EFun ( xparams2pats errorparams, errorbody s ), errortypescheme s ) } (* This is the [errorcase] function. It examines its state parameter and dispatches control to an appropriate [error] function. *) let errorcasedef = let branches = Lr1.fold (fun branches s -> if Invariant.represented s then branch (pstatecon s) (EApp (EVar (error s), [ EVar env; EMagic (EVar stack) ])) :: branches else branches ) [] in { valpublic = false; valpat = PVar errorcase; valval = EAnnot ( EFun ( xparams2pats errorcaseparams, EMatch ( EVar state, branches ) ), errorcasetypescheme ) } (* ------------------------------------------------------------------------ *) (* Code production for the entry points. *) (* This is the entry point associated with a start state [s]. By convention, it is named after the nonterminal [nt] that corresponds to this state. This is a public definition. The code initializes a parser environment, an empty stack, and invokes [run]. 2015/11/11. If the state [s] can reduce an epsilon production whose left-hand symbol keeps track of its start or end position, or if [s] can reduce any production that mentions [$endpos($0)], then the initial stack should contain a sentinel cell with a valid [endp] field at offset 1. For simplicity, we always create a sentinel cell. *) (* When we allocate a fresh parser environment, the [token] field receives a dummy value. It will be overwritten by the first call to [run], which will invoke [discard]. This allows us to invoke the lexer in just one place. 
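As a hedged illustration -- concrete syntax, hypothetical identifiers, not literally what is emitted -- the generated entry point for a start symbol [main] reads roughly as follows:

     let main lexer lexbuf =
       let _menhir_env =
         { _menhir_lexer = lexer; _menhir_lexbuf = lexbuf;
           _menhir_token = Obj.magic (); _menhir_error = false }
       in
       Obj.magic (_menhir_run0 _menhir_env ((), lexbuf.Lexing.lex_curr_p))

   where the pair [((), ...)] is the sentinel stack cell described above.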
*) let entrydef s = let nt = Item.startnt (Lr1.start2item s) in let lexer = "lexer" and lexbuf = "lexbuf" in let initial_stack = let initial_position = getendp in etuple [ EUnit; initial_position ] in { valpublic = true; valpat = PVar (Nonterminal.print true nt); valval = EAnnot ( EFun ( [ PVar lexer; PVar lexbuf ], blet ( [ PVar env, ERecord ([ (flexer, EVar lexer); (flexbuf, EVar lexbuf); (ftoken, EMagic EUnit); (ferror, efalse) ]) ], EMagic (EApp (EVar (run s), [ EVar env; initial_stack ])) ) ), entrytypescheme Front.grammar (Nonterminal.print true nt) ) } (* ------------------------------------------------------------------------ *) (* Code production for auxiliary functions. *) (* This is [assertfalse], used when internal failure is detected. This should never happen if our tool is correct. *) let internal_failure = "Internal failure -- please contact the parser generator's developers." let assertfalsedef = { valpublic = false; valpat = PVar assertfalse; valval = EAnnot ( EFun ([ PUnit ], blet ( [ PUnit, eprintf internal_failure []], eassert efalse ) ), scheme [ "a" ] (arrow tunit (tvar "a")) ) } (* This is [print_token], used to print tokens in [--trace] mode. *) let printtokendef = destructuretokendef print_token tstring false (fun tok -> EStringConst (Terminal.print tok)) (* This is [discard], used to take a token off the input stream and query the lexer for a new one. The code queries the lexer for a new token and stores it into [env.token], overwriting the previous token. It also stores the start and end positions of the new token. Last, [env.error] is cleared. We use the lexer's [lex_start_p] and [lex_curr_p] fields to extract the start and end positions of the token that we just read. In practice, it seems that [lex_start_p] can be inaccurate (that is the case when the lexer calls itself recursively, instead of simply recognizing an atomic pattern and returning immediately). However, we are 100% compatible with ocamlyacc here, and there is no better solution anyway. As of 2014/12/12, we re-allocate the environment record instead of updating it. Perhaps surprisingly, this makes the code TWICE FASTER overall. The write barrier is really costly! *) let discardbody = let lexer = "lexer" and lexbuf = "lexbuf" in EFun ( [ PVar env ], blet ([ PVar lexer, ERecordAccess (EVar env, flexer); PVar lexbuf, ERecordAccess (EVar env, flexbuf); PVar token, EApp (EVar lexer, [ EVar lexbuf ]); ] @ trace "Lookahead token is now %s (%d-%d)" [ EApp (EVar print_token, [ EVar token ]); ERecordAccess (ERecordAccess (EVar lexbuf, "Lexing.lex_start_p"), "Lexing.pos_cnum"); ERecordAccess (ERecordAccess (EVar lexbuf, "Lexing.lex_curr_p"), "Lexing.pos_cnum") ], ERecord [ flexer, EVar lexer; flexbuf, EVar lexbuf; ftoken, EVar token; ferror, efalse ] ) ) let discarddef = { valpublic = false; valpat = PVar discard; valval = annotate discardbody (arrow tenv tenv) } (* ------------------------------------------------------------------------ *) (* Here is complete code for the parser.
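(To make the preceding definition of [discard] concrete, here is a hedged illustrative reading of the generated function, in concrete syntax and with hypothetical identifiers:

     let _menhir_discard _menhir_env =
       let lexer = _menhir_env._menhir_lexer in
       let lexbuf = _menhir_env._menhir_lexbuf in
       let _tok = lexer lexbuf in
       { _menhir_lexer = lexer; _menhir_lexbuf = lexbuf;
         _menhir_token = _tok; _menhir_error = false }

   Note how the environment record is rebuilt rather than mutated, as discussed above.)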
*) open BasicSyntax let grammar = Front.grammar let program = [ SIFunctor (grammar.parameters, mbasics grammar @ SITypeDefs [ envtypedef; statetypedef ] :: SIStretch grammar.preludes :: SIValDefs (true, ProductionMap.fold (fun _ s defs -> entrydef s :: defs ) Lr1.entry ( Lr1.fold (fun defs s -> rundef s :: errordef s :: defs ) ( Nonterminal.foldx (fun nt defs -> gotodef nt :: defs ) (Production.fold (fun prod defs -> if Lr1.NodeSet.is_empty (Lr1.production_where prod) then defs else reducedef prod :: defs ) [ discarddef; printtokendef; assertfalsedef; errorcasedef ]))) ) :: SIStretch grammar.postludes :: [])] (* ------------------------------------------------------------------------ *) (* We are done! *) let () = Error.logC 1 (fun f -> Printf.fprintf f "%d out of %d states can peek at an error.\n" !errorpeekers Lr1.n) let () = if not !can_die then Error.logC 1 (fun f -> Printf.fprintf f "The generated parser cannot raise Error.\n") let () = Time.tick "Producing abstract syntax" end menhir-20210929/src/codeBackend.mli000066400000000000000000000020701412503066000167160ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The (code-based) code generator. *) module Run (T : sig end) : sig val program: IL.program end menhir-20210929/src/codeBits.ml000066400000000000000000000167771412503066000161420ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module provides a number of tiny functions that help produce [IL] code. *) open IL (* A smart constructor for [PVarLocated]. *) let pvarlocated id = let x, pos = Positions.decompose id in let pos1 = Positions.start_of_position pos and pos2 = Positions.end_of_position pos in if pos1 == Lexing.dummy_pos || pos2 == Lexing.dummy_pos then PVar x else PVarLocated id (* Tuples. *) let etuple = function | [] -> EUnit | [ e ] -> e | es -> ETuple es let ptuple = function | [] -> PUnit | [ p ] -> p | ps -> PTuple ps (* A list subject to a condition. *) let ifn condition xs = if condition then xs else [] let if1 condition x = if condition then [ x ] else [] let ifnlazy condition xs = if condition then xs() else [] (* The unit type. *) let tunit = TypApp ("unit", []) (* The Boolean type. *) let tbool = TypApp ("bool", []) (* The integer type. *) let tint = TypApp ("int", []) (* The string type. *) let tstring = TypApp ("string", []) (* The exception type. *) let texn = TypApp ("exn", []) (* The type of pairs. *) let tpair typ1 typ2 = TypTuple [typ1; typ2] (* The type of lexer positions. *) let tposition = TypApp ("Lexing.position", []) (* The type of the $loc and $sloc keywords. *) (* A location is a pair of positions. This might change in the future. 
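Concretely -- a hedged illustration, not a definition used elsewhere in this module -- the type built just below denotes [Lexing.position * Lexing.position]. *)

(* Hypothetical illustration: the location type, spelled out directly. *)
let _illustration_tlocation : typ = TypTuple [ tposition; tposition ]

(*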
*) let tlocation = tpair tposition tposition (* The type of lexer buffers. *) let tlexbuf = TypApp ("Lexing.lexbuf", []) (* The type of untyped semantic values. *) let tobj = TypApp ("Obj.t", []) (* Building a type variable. *) let tvar x : typ = TypVar x (* Building a type scheme. *) let scheme qs t = { quantifiers = qs; body = t } (* Building a type scheme with no quantifiers out of a type. *) let type2scheme t = scheme [] t (* Constraining an expression to have a (monomorphic) type. *) let annotate e t = EAnnot (e, type2scheme t) let pat2var = function | PVar x -> x | _ -> assert false (* [simplify] removes bindings of the form [let v = v in ...] and [let _ = v in ...]. *) let rec simplify = function | [] -> [] | (PVar v1, EVar v2) :: bindings when v1 = v2 -> (* Avoid a useless let binding. *) simplify bindings | (PWildcard, EVar _) :: bindings -> (* Avoid a useless let binding. *) simplify bindings | binding :: bindings -> binding :: simplify bindings (* Building a [let] construct, with on-the-fly simplification. *) let blet (bindings, body) = let bindings = simplify bindings in match bindings, body with | [], _ -> body | [ PVar x1, e ], EVar x2 when x1 = x2 -> (* Reduce [let x = e in x] to just [e]. *) e | _, _ -> ELet (bindings, body) let mlet formals actuals body = blet (List.combine formals actuals, body) (* Simulating a [let/and] construct using tuples. *) let eletand (bindings, body) = let bindings = simplify bindings in match bindings, body with | [], _ -> (* special case: zero bindings *) body | [ PVar x1, e ], EVar x2 when x1 = x2 -> (* Reduce [let x = e in x] to just [e]. *) e | [ _ ], _ -> (* special case: one binding *) ELet (bindings, body) | _ :: _ :: _, _ -> (* general case: at least two bindings *) let pats, exprs = List.split bindings in ELet ([ PTuple pats, ETuple exprs ], body) (* [eraisenotfound] is an expression that raises [Not_found]. *) let eraisenotfound = ERaise (EData ("Not_found", [])) let eassert e = EApp (EVar "assert", [ e ]) (* [bottom] is an expression that has every type. Its semantics is irrelevant. *) let bottom = eraisenotfound (* Boolean constants. *) let efalse : expr = EData ("false", []) let etrue : expr = EData ("true", []) let eboolconst b = if b then etrue else efalse (* Option constructors. *) let enone = EData ("None", []) let esome e = EData ("Some", [e]) (* List constructors. *) let rec elist xs = match xs with | [] -> EData ("[]", []) | x :: xs -> EData ("::", [ x; elist xs ]) (* Integer constants as patterns. *) let pint k : pattern = PData (string_of_int k, []) (* These help build function types. *) let arrow typ body : typ = TypArrow (typ, body) let arrowif flag typ body : typ = if flag then arrow typ body else body let marrow typs body : typ = List.fold_right arrow typs body (* Tracing. *) let eprintf format args = EApp ( EVar "Printf.eprintf", (EStringConst (format ^ "\n%!")) :: args ) let trace (format : string) (args : expr list) : (pattern * expr) list = if Settings.trace then [ PUnit, eprintf format args ] else [] let tracecomment (comment : string) (body : expr) : expr = if Settings.trace then blet (trace comment [], body) else EComment (comment, body) (* ------------------------------------------------------------------------ *) (* Here is a bunch of naming conventions. Our names are chosen to minimize the likelihood that a name in a semantic action is captured. In other words, all global definitions as well as the parameters to [reduce] are given far-fetched names, unless [--no-prefix] was specified. 
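For example -- a hedged illustration; this binding is not used anywhere -- with the default settings, a variable named [env] is emitted as [_menhir_env]. *)

(* Hypothetical illustration of the naming convention defined below. *)
let _example_prefixed : string = "_menhir_" ^ "env"

(*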
Note that the prefix must begin with '_'. This allows avoiding warnings about unused variables with ocaml 3.09 and later. *) let prefix name = if Settings.noprefix then name else "_menhir_" ^ name let dataprefix name = if Settings.noprefix then name else "Menhir" ^ name let tvprefix name = if Settings.noprefix then name else "ttv_" ^ name (* ------------------------------------------------------------------------ *) (* Converting an interface to a structure. Only exception and type definitions go through. *) let interface_item_to_structure_item = function | IIExcDecls defs -> [ SIExcDefs defs ] | IITypeDecls defs -> [ SITypeDefs defs ] | IIFunctor (_, _) | IIValDecls _ | IIInclude _ | IIModule (_, _) | IIComment _ -> [] let interface_to_structure i = List.flatten (List.map interface_item_to_structure_item i) (* Constructing a named module type together with a list of "with type" constraints. *) let with_types wk name tys = List.fold_left (fun mt (params, name, ty) -> MTWithType (mt, params, name, wk, ty) ) (MTNamedModuleType name) tys let mapp me1 me2 = MApp (me1, me2) let mapp me1 mes2 = List.fold_left mapp me1 mes2 let field modifiable name t = { modifiable = modifiable; fieldname = name; fieldtype = type2scheme t } let branch branchpat branchbody = { branchpat; branchbody } menhir-20210929/src/codeBits.mli000066400000000000000000000104331412503066000162720ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module provides a number of tiny functions that help produce [IL] code. *) open Positions open IL (* A smart constructor for [PVarLocated]. *) val pvarlocated: string located -> pattern (* Tuples. *) val etuple: expr list -> expr val ptuple: pattern list -> pattern (* A list subject to a condition. (Be careful, though: the list is of course constructed even if the condition is false.) *) val ifn: bool -> 'a list -> 'a list val if1: bool -> 'a -> 'a list (* A lazy version of [ifn], where the list is constructed only if the condition is true. *) val ifnlazy: bool -> (unit -> 'a list) -> 'a list (* Standard types. *) val tunit: typ val tbool: typ val tint: typ val tstring: typ val texn: typ val tposition: typ val tlocation: typ val tlexbuf: typ val tobj : typ (* Building a type variable. *) val tvar: string -> typ (* Building a type scheme. *) val scheme: string list -> typ -> typescheme val type2scheme: typ -> typescheme (* Constraining an expression to have a (monomorphic) type. *) val annotate: expr -> typ -> expr (* Projecting out of a [PVar] pattern. *) val pat2var: pattern -> string (* Building a [let] construct, with on-the-fly simplification. These two functions construct a nested sequence of [let] definitions. *) val blet: (pattern * expr) list * expr -> expr val mlet: pattern list -> expr list -> expr -> expr (* Simulating a [let/and] construct. *) val eletand: (pattern * expr) list * expr -> expr (* [eraisenotfound] is an expression that raises [Not_found]. *) val eraisenotfound: expr (* [eassert] builds a runtime assertion [assert e]. *) val eassert: expr -> expr (* [bottom] is an expression that has every type. 
Its semantics is irrelevant. *) val bottom: expr (* Boolean constants. *) val etrue: expr val efalse: expr val eboolconst: bool -> expr (* Option constructors. *) val enone: expr val esome: expr -> expr (* List constructors. *) val elist: expr list -> expr (* Integer constants as patterns. *) val pint: int -> pattern (* These help build function types. *) val arrow: typ -> typ -> typ val arrowif: bool -> typ -> typ -> typ val marrow: typ list -> typ -> typ (* Tracing. *) (**[eprintf format args] constructs a call to [Printf.eprintf], which log a tracing message onto [stderr]. *) val eprintf: string -> expr list -> expr (**[trace format args] returns a list of (effectful) bindings which, when [--trace] is enabled, log a tracing message onto [stderr]. *) val trace: string -> expr list -> (pattern * expr) list (**[tracecomment c e] emits either a comment whose content is the string [c] or a tracing message whose content is also [c]. *) val tracecomment: string -> expr -> expr (* These functions are used to generate names in menhir's namespace. *) val prefix: string -> string val dataprefix: string -> string val tvprefix: string -> string (* Converting an interface to a structure. Only exception and type definitions go through. *) val interface_to_structure: interface -> structure (* Constructing a named module type together with a list of "with type" constraints. *) val with_types: with_kind -> string -> (string list * string * typ) list -> module_type (* Functor applications. *) val mapp: modexpr -> modexpr list -> modexpr (* Record fields. *) val field: bool -> string -> typ -> fielddef (* Branches. *) val branch: pattern -> expr -> branch menhir-20210929/src/codeInliner.ml000066400000000000000000000230431412503066000166210ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open IL open CodeBits (* In the following, we only inline global functions. In order to avoid unintended capture, as we traverse terms, we keep track of local identifiers that hide global ones. The following little class helps do that. (The pathological case where a local binding hides a global one probably does not arise very often. Fortunately, checking against it in this way is quite cheap, and lets me sleep safely.) *) class locals table = object method pvar (locals : StringSet.t) (id : string) = if Hashtbl.mem table id then StringSet.add id locals else locals end (* Here is the inliner. *) let inline_valdefs (defs : valdef list) : valdef list = (* Create a table of all global definitions. *) let before, table = Traverse.tabulate_defs defs in (* Prepare to count how many times each function is used, including inside its own definition. The public functions serve as starting points for this discovery phase. *) let queue : valdef Queue.t = Queue.create() and usage : int StringMap.t ref = ref StringMap.empty in (* [visit] is called at every identifier occurrence. *) let visit locals id = if StringSet.mem id locals then (* This is a local identifier. Do nothing. *) () else try let _, def = Hashtbl.find table id in (* This is a globally defined identifier. Increment its usage count. 
If it was never visited, enqueue its definition for exploration. *) let n = try StringMap.find id !usage with Not_found -> Queue.add def queue; 0 in usage := StringMap.add id (n + 1) !usage with Not_found -> (* This identifier is not global. It is either local or a reference to some external library, e.g. ocaml's standard library. *) () in (* Look for occurrences of identifiers inside expressions. *) let o = object inherit [ StringSet.t, unit ] Traverse.fold inherit locals table method! evar locals () id = visit locals id end in (* Initialize the queue with all public definitions, and work from there. We assume that the left-hand side of every definition is a variable. *) List.iter (fun { valpublic = public; valpat = p } -> if public then visit StringSet.empty (pat2var p) ) defs; Misc.qfold (o#valdef StringSet.empty) () queue; let usage = !usage in (* Now, inline every function that is called at most once. At the same time, every function that is never called is dropped. The public functions again serve as starting points for the traversal. *) let queue : valdef Queue.t = Queue.create() and emitted = ref StringSet.empty in let enqueue def = let id = pat2var def.valpat in if not (StringSet.mem id !emitted) then begin emitted := StringSet.add id !emitted; Queue.add def queue end in (* A simple application is an application of a variable to a number of variables, constants, or record accesses out of variables. *) let rec is_simple_arg = function | EVar _ | EData (_, []) | ERecordAccess (EVar _, _) -> true | EMagic e -> is_simple_arg e | _ -> false in let is_simple_app = function | EApp (EVar _, actuals) -> List.for_all is_simple_arg actuals | _ -> false in (* Taking a fresh instance of a type scheme. Ugly. *) let instance = let count = ref 0 in let fresh tv = incr count; tv, Printf.sprintf "freshtv%d" !count in fun scheme -> let mapping = List.map fresh scheme.quantifiers in let rec sub typ = match typ with | TypTextual _ -> typ | TypVar v -> begin try TypVar (List.assoc v mapping) with Not_found -> typ end | TypApp (f, typs) -> TypApp (f, List.map sub typs) | TypTuple typs -> TypTuple (List.map sub typs) | TypArrow (typ1, typ2) -> TypArrow (sub typ1, sub typ2) in sub scheme.body in (* Destructuring a type annotation. *) let rec annotate formals body typ = match formals, typ with | [], _ -> [], CodeBits.annotate body typ | formal :: formals, TypArrow (targ, tres) -> let formals, body = annotate formals body tres in PAnnot (formal, targ) :: formals, body | _ :: _, _ -> (* Type annotation has insufficient arity. *) assert false in (* The heart of the inliner: rewriting a function call to a [let] expression. If there was a type annotation at the function definition site, it is dropped, provided the semantic actions have been type-checked. Otherwise, it is kept, because, due to the presence of [EMagic] expressions in the code, dropping a type annotation could cause an ill-typed program to become apparently well-typed. Keeping a type annotation requires taking a fresh instance of the type scheme, because OCaml doesn't have support for locally and existentially bound type variables. Yuck. *) let inline formals actuals body oscheme = assert (List.length actuals = List.length formals); match oscheme with | Some scheme when not Front.ocaml_types_have_been_checked -> let formals, body = annotate formals body (instance scheme) in mlet formals actuals body | _ -> mlet formals actuals body in (* Look for occurrences of identifiers inside expressions, branches, etc. 
and replace them with their definitions if they have only one use site or if their definitions are sufficiently simple. *) let o = object (self) inherit [ StringSet.t ] Traverse.map as super inherit locals table method! eapp locals e actuals = match e with | EVar id when (Hashtbl.mem table id) && (* a global identifier *) (not (StringSet.mem id locals)) (* not hidden by a local identifier *) -> let _, def = Hashtbl.find table id in (* cannot fail, thanks to the above check *) let formals, body, oscheme = match def with | { valval = EFun (formals, body) } -> formals, body, None | { valval = EAnnot (EFun (formals, body), scheme) } -> formals, body, Some scheme | { valval = _ } -> (* The definition is not a function definition. This should not happen in the kind of code that we generate. *) assert false in assert (StringMap.mem id usage); if StringMap.find id usage = 1 || is_simple_app body then (* The definition can be inlined, with beta reduction. *) inline formals (self#exprs locals actuals) (EComment (id, self#expr locals body)) oscheme else begin (* The definition cannot be inlined. *) enqueue def; super#eapp locals e actuals end | _ -> (* The thing in function position is not a reference to a global. *) super#eapp locals e actuals end in (* Initialize the queue with all public definitions, and work from there. *) List.iter (function { valpublic = public } as def -> if public then enqueue def ) defs; let valdefs = Misc.qfold (fun defs def -> o#valdef StringSet.empty def :: defs ) [] queue in Error.logC 1 (fun f -> Printf.fprintf f "%d functions before inlining, %d functions after inlining.\n" before (List.length valdefs)); Time.tick "Inlining"; valdefs (* Dumb recursive traversal. *) let rec inline_structure_item item = match item with | SIValDefs (true, defs) -> (* A nest of recursive definitions. Act on it. *) SIValDefs (true, inline_valdefs defs) | SIFunctor (params, s) -> SIFunctor (params, inline_structure s) | SIModuleDef (name, e) -> SIModuleDef (name, inline_modexpr e) | SIInclude e -> SIInclude (inline_modexpr e) | SIExcDefs _ | SITypeDefs _ | SIValDefs (false, _) | SIStretch _ | SIComment _ -> item and inline_structure s = List.map inline_structure_item s and inline_modexpr = function | MVar x -> MVar x | MStruct s -> MStruct (inline_structure s) | MApp (e1, e2) -> MApp (inline_modexpr e1, inline_modexpr e2) (* The external entry point. *) let inline (p : program) : program = if Settings.code_inlining then inline_structure p else p menhir-20210929/src/codeInliner.mli000066400000000000000000000024161412503066000167730ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This transformer inlines every function that is called at most once. It also inlines some functions whose body consists of a single function call. At the same time, every function that is never called is dropped. Public functions are never inlined or dropped. 
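As a hedged miniature, for illustration only: if the generated code contains [let f x = x + 1] and [f] has exactly one call site, say [f u], then [inline] rewrites that call site to [let x = u in x + 1] and the definition of [f] is dropped.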
*) val inline: IL.program -> IL.program menhir-20210929/src/codePieces.ml000066400000000000000000000171111412503066000164300ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module defines many internal naming conventions for use by the two code generators, [CodeBackend] and [TableBackend]. It also offers a few code generation facilities. *) open IL open CodeBits open Grammar (* ------------------------------------------------------------------------ *) (* Naming conventions. *) (* The type variable associated with a nonterminal [nt]. *) let ntvar nt = Infer.ntvar (Nonterminal.print true nt) (* A variable used to hold a semantic value. *) let semv = "_v" (* A variable used to hold a stack. *) let stack = prefix "stack" (* A variable used to hold a state. *) let state = prefix "s" (* A variable used to hold a token. *) let token = "_tok" (* Variables used to hold start and end positions. Do not change these names! They are chosen to coincide with the $startpos and $endpos keywords, which the lexer rewrites to _startpos and _endpos, so that binding these variables before executing a semantic action is meaningful. *) (* These names should agree with the printing function [Keyword.posvar]. *) let beforeendp = Keyword.(posvar Before WhereEnd FlavorPosition) (* "_endpos__0_" *) let startp = Keyword.(posvar Left WhereStart FlavorPosition) (* "_startpos" *) let endp = Keyword.(posvar Left WhereEnd FlavorPosition) (* "_endpos" *) let startpos ids i = Keyword.(posvar (RightNamed ids.(i)) WhereStart FlavorPosition) (* sprintf "_startpos_%s_" ids.(i) *) let endpos ids i = Keyword.(posvar (RightNamed ids.(i)) WhereEnd FlavorPosition) (* sprintf "_endpos_%s_" ids.(i) *) (* ------------------------------------------------------------------------ *) (* Types for semantic values. *) (* [semvtypent nt] is the type of the semantic value associated with nonterminal [nt]. *) let semvtypent nt = match Nonterminal.ocamltype nt with | None -> (* [nt] has unknown type. If we have run [Infer], then this can't happen. However, running type inference is only an option, so we still have to deal with that case. *) ntvar nt | Some ocamltype -> (* [nt] has known type. *) TypTextual ocamltype (* [semvtypetok tok] is the type of the semantic value associated with token [tok]. There is no such type if the token does not have a semantic value. *) let semvtypetok tok = match Terminal.ocamltype tok with | None -> (* Token has unit type and is omitted in stack cell, unless [--represent-values] has been passed. *) if Settings.represent_values then [ tunit ] else [] | Some ocamltype -> (* Token has known type. *) [ TypTextual ocamltype ] (* [semvtype symbol] is the type of the semantic value associated with [symbol]. *) let semvtype = function | Symbol.T tok -> semvtypetok tok | Symbol.N nt -> [ semvtypent nt ] (* ------------------------------------------------------------------------ *) (* Patterns for tokens. *) (* [tokpat tok pat] is a pattern that matches the token [tok] and binds its semantic value (if it has one) to the pattern [pat]. 
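For instance -- a hedged, purely illustrative binding, with a hypothetical token constructor -- the pattern produced for a token [INT] that carries a semantic value has the following shape. *)

(* Hypothetical illustration of a token pattern. *)
let _sketch_tokpat : pattern = PData ("INT", [ PVar "_v" ])

(*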
*) let tokpat tok pat = let data = TokenType.tokendata (Terminal.print tok) in PData ( data, if Terminal.ocamltype tok = None then [] else [ pat ] ) (* [tok_bind_unit tok pat e] binds the pattern [pat] to the unit value in the expression [e] if the token [tok] has no semantic value. Otherwise, it returns just [e]. *) let tok_bind_unit tok pat e = if Terminal.ocamltype tok = None then blet ([ (pat, EUnit) ], e) else e (* [tokspat toks] is a pattern that matches any token in the set [toks], without binding its semantic value. *) let tokspat toks = POr ( TerminalSet.fold (fun tok pats -> tokpat tok PWildcard :: pats ) toks [] ) (* [destructuretokendef name codomain bindsemv branch] generates the definition of a function that destructures tokens. [name] is the name of the function that is generated. [codomain] is its return type. [bindsemv] tells whether the variable [semv] should be bound. [branch] is applied to each (non-pseudo) terminal and must produce code for each branch. *) let destructuretokendef name codomain bindsemv branch = { valpublic = false; valpat = PVar name; valval = annotate ( EFun ([ PVar token ], EMatch (EVar token, Terminal.fold (fun tok branches -> if Terminal.pseudo tok then branches else { branchpat = tokpat tok (if bindsemv then PVar semv else PWildcard); branchbody = branch tok } :: branches ) [] ) ) ) (arrow TokenType.ttoken codomain) } (* ------------------------------------------------------------------------ *) (* A global variable holds the exception [Error]. *) (* We preallocate the [Error] exception and store it into a global variable. This allows saving code at the sites where the exception is raised. Don't change the conventional name [_eRR], it is shared with the lexer, which replaces occurrences of the [$syntaxerror] keyword with [(raise _eRR)]. *) let parse_error = "_eRR" let errorval = EVar parse_error let basics = "MenhirBasics" (* 2017/01/20 The name [basics] must be an unlikely name, as it might otherwise hide a user-defined module by the same name. *) (* The global definition [let _eRR : exn = Error] includes a type annotation. This allows us to avoid warning 41, which warns about the existence of a data constructor named [Error] in the standard library. *) let excvaldef = { valpublic = false; valpat = PVar parse_error; valval = EAnnot (EData (Interface.excname, []), type2scheme texn) } (* ------------------------------------------------------------------------ *) (* Define the internal sub-module [Basics], which contains the definitions of the exception [Error] and of the type [token]. Then, include this sub-module. This is used both in the code and table back-ends. *) let mbasics grammar = [ (* The module [Basics]. *) SIModuleDef (basics, MStruct ( (* The exception [Error]. *) SIExcDefs [ Interface.excdef ] :: (* 2021/09/27 We now place the definition [let _eRR = Error] at this particular point so as to avoid the risk of a name collision. In previous versions of Menhir, if a token was named [Error], then the definition [let _eRR = Error] would not receive its intended meaning, and the code produced by the code back-end would be ill-typed. *) SIValDefs (false, [ excvaldef ]) :: (* The type [token]. *) interface_to_structure ( TokenType.tokentypedef grammar ) )); (* Include the above submodule. 
*) SIInclude (MVar basics); ] menhir-20210929/src/codePieces.mli000066400000000000000000000074321412503066000166060ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module defines many internal naming conventions for use by the two code generators, [CodeBackend] and [TableBackend]. It also offers a few code generation facilities. *) open IL open Grammar (* ------------------------------------------------------------------------ *) (* Naming conventions. *) (* A variable used to hold a semantic value. *) val semv : string (* A variable used to hold a stack. *) val stack: string (* A variable used to hold a state. *) val state: string (* A variable used to hold a token. *) val token: string (* Variables used to hold start and end positions. *) val beforeendp: string val startp: string val endp: string val startpos: string array -> int -> string val endpos: string array -> int -> string (* ------------------------------------------------------------------------ *) (* Types for semantic values. *) (* [semvtypent nt] is the type of the semantic value associated with nonterminal [nt]. *) val semvtypent : Nonterminal.t -> typ (* [semvtype symbol] is the type of the semantic value associated with [symbol]. *) val semvtype : Symbol.t -> typ list (* ------------------------------------------------------------------------ *) (* Patterns for tokens. *) (* [tokpat tok pat] is a pattern that matches the token [tok] and binds its semantic value (if it has one) to the pattern [pat]. *) val tokpat: Terminal.t -> pattern -> pattern (* [tokspat toks] is a pattern that matches any token in the set [toks], without binding its semantic value. *) val tokspat: TerminalSet.t -> pattern (* [tok_bind_unit tok pat e] binds the pattern [pat] to the unit value in the expression [e] if the token [tok] has no semantic value. Otherwise, it returns just [e]. *) val tok_bind_unit: Terminal.t -> pattern -> expr -> expr (* [destructuretokendef name codomain bindsemv branch] generates the definition of a function that destructure tokens. [name] is the name of the function that is generated. [codomain] is its return type. [bindsemv] tells whether the variable [semv] should be bound. [branch] is applied to each (non-pseudo) terminal and must produce code for each branch. *) val destructuretokendef: string -> typ -> bool -> (Terminal.t -> expr) -> valdef (* ------------------------------------------------------------------------ *) (* A global variable holds the exception [Error]. *) (* A reference to this global variable. *) val errorval: expr (* ------------------------------------------------------------------------ *) (* The structure items [mbasics grammar] define and include the internal sub-module [Basics], which contains the definitions of the exception [Error] and of the type [token]. Then, they define the global variable mentioned above, which holds the exception [Error]. 
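(Preallocating the exception means that every raise site in the generated parser can simply read this global variable instead of allocating a fresh [Error] value; the lexer relies on the same convention through the [_eRR] name.)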
*) val basics: string val mbasics: BasicSyntax.grammar -> structure menhir-20210929/src/conflict.ml000066400000000000000000000466561412503066000162060ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* -------------------------------------------------------------------------- *) (* I suppose now is as good a time as any to do this. *) let () = if Settings.reference_graph then ReferenceGraph.print_reference_graph() (* -------------------------------------------------------------------------- *) (* If [--dump] is present, honor it before performing conflict resolution. *) let () = if Settings.dump then (* The default reductions have not been decided yet at this point. *) let module Default = struct let has_default_reduction _node = None end in let module D = Dump.Make(Default) in D.dump (Settings.base ^ ".automaton") (* -------------------------------------------------------------------------- *) (* Explaining shift actions. *) (* The existence of a shift action stems from the existence of a shift item in the LR(0) core that underlies the LR(1) state of interest. That is, lookahead sets are not relevant. The existence of a shift item in the LR(0) core is explained by finding a path from a start item to the shift item in the LR(0) nondeterministic automaton, such that the symbols read along this path form the (previously fixed) symbol string that leads to the conflict state in the LR(1) automaton. There may be several such paths: a shortest one is chosen. There may also be several shift items in the conflict state: an arbitrary one is chosen. I believe it would not be interesting to display traces for several shift items: they would be identical except in their last line (where the desired shift item actually appears). *) (* Symbolic execution of the nondeterministic LR(0) automaton. *) (* Configurations are pairs of an LR(0) item and an offset into the input string, which indicates how much has been read so far. *) type configuration0 = Item.t * int (* This function builds a derivation out of a (nonempty, reversed) sequence of configurations. The derivation is constructed from bottom to top, that is, beginning at the last configuration and moving back towards to the start configuration. *) let rec follow derivation offset' = function | [] -> assert (offset' = 0); derivation | (item, offset) :: configs -> let _, _, rhs, pos, _ = Item.def item in let derivation = if offset = offset' then (* This is an epsilon transition. Put a new root node on top of the existing derivation. *) Derivation.build pos rhs derivation None else (* This was a shift transition. Tack symbol in front of the forest. *) Derivation.prepend rhs.(pos) derivation in follow derivation offset configs (* Symbolic execution begins with a start item (corresponding to one of the automaton's entry nodes), a fixed string of input symbols, to be fully consumed, and a goal item. The objective is to find a path through the automaton that leads from the start configuration [(start, 0)] to the goal configuration [(stop, n)], where [n] is the length of the input string. 
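Before that exploration is described, here is a hedged, generic miniature -- over integer graphs, not code used below -- of the search scheme that [explain_shift_item] instantiates: a queue-driven BFS whose hash table maps each visited node to its predecessor, so that a shortest path can be rebuilt afterwards. *)

(* Hypothetical miniature of the BFS-with-back-pointers technique. *)
let _sketch_bfs (succ : int -> int list) (start : int) (goal : int) : bool =
  let table : (int, int option) Hashtbl.t = Hashtbl.create 17 in
  let queue : int Queue.t = Queue.create () in
  let enqueue ancestor node =
    (* Visit each node at most once; record its BFS predecessor. *)
    if not (Hashtbl.mem table node) then begin
      Hashtbl.add table node ancestor;
      Queue.add node queue
    end
  in
  enqueue None start;
  let found = ref false in
  while not (Queue.is_empty queue) do
    let node = Queue.pop queue in
    if node = goal then found := true
    else List.iter (enqueue (Some node)) (succ node)
  done;
  !found

(*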
The automaton is explored via breadth-first search. A hash table is used to record which configurations have been visited and to build a spanning tree of shortest paths. *) exception Done let explain_shift_item (start : Item.t) (input : Symbol.t array) (stop : Item.t) : Derivation.t = let n = Array.length input in let table : (configuration0, configuration0 option) Hashtbl.t = Hashtbl.create 1023 in let queue : configuration0 Queue.t = Queue.create() in let enqueue ancestor config = try let _ = Hashtbl.find table config in () with Not_found -> Hashtbl.add table config ancestor; Queue.add config queue in enqueue None (start, 0); try Misc.qiter (function (item, offset) as config -> (* If the item we're looking at is the goal item and if we have read all of the input symbols, stop. *) if (Item.equal item stop) && (offset = n) then raise Done; (* Otherwise, explore the transitions out of this item. *) let prod, _, rhs, pos, length = Item.def item in (* Shift transition, followed only if the symbol matches the symbol found in the input string. *) if (pos < length) && (offset < n) && (Symbol.equal rhs.(pos) input.(offset)) then begin let config' = (Item.import (prod, pos+1), offset+1) in enqueue (Some config) config' end; (* Epsilon transitions. *) if pos < length then match rhs.(pos) with | Symbol.N nt -> Production.iternt nt (fun prod -> let config' = (Item.import (prod, 0), offset) in enqueue (Some config) config' ) | Symbol.T _ -> () ) queue; assert false with Done -> (* We have found a (shortest) path from the start configuration to the goal configuration. Turn it into an explicit derivation. *) let configs = Misc.materialize table (stop, n) in let _, _, rhs, pos, _ = Item.def stop in let derivation = Derivation.tail pos rhs in let derivation = follow derivation n configs in derivation (* -------------------------------------------------------------------------- *) (* Explaining reduce actions. *) (* The existence of a reduce action stems from the existence of a reduce item, whose lookahead set contains the token of interest, in the state of interest. Here, lookahead sets are relevant only insofar as they contain or do not contain the token of interest -- in other words, lookahead sets can be abstracted by Boolean values. The existence of the reduce item is explained by finding a path from a start item to the reduce item in the LR(1) nondeterministic automaton, such that the symbols read along this path form the (previously fixed) symbol string that leads to the conflict state in the LR(1) automaton. There may be several such paths: a shortest one is chosen. *) (* Symbolic execution of the nondeterministic LR(1) automaton. *) (* Configurations are pairs of an LR(1) item and an offset into the input string, which indicates how much has been read so far. An LR(1) item is itself represented as the combination of an LR(0) item and a Boolean flag, telling whether the token of interest appears or does not appear in the lookahead set. *) type configuration1 = Item.t * bool * int (* This function builds a derivation out of a sequence of configurations. The end of the sequence is dealt with specially -- we want to explain how the lookahead symbol appears and is inherited. Once that is done, the rest (that is, the beginning) of the derivation is dealt with as above. *) let config1toconfig0 (item, _, offset) = (item, offset) let rec follow1 tok derivation offset' = function | [] -> assert (Terminal.equal tok Terminal.sharp); (* One could emit a comment saying that the lookahead token is initially [#]. 
That comment would have to be displayed above the derivation, though, and there is no support for that at the moment, so let's skip it. *) derivation | (item, _, offset) :: configs -> let prod, _, rhs, pos, length = Item.def item in if offset = offset' then (* This is an epsilon transition. Attach a new line and add a comment that explains why the lookahead symbol is produced or inherited. *) let nullable, first = Analysis.nullable_first_prod prod (pos + 1) in if TerminalSet.mem tok first then (* The lookahead symbol is produced (and perhaps also inherited, but let's ignore that). *) let e = Analysis.explain_first_rhs tok rhs (pos + 1) in let comment = "lookahead token appears" ^ (if e = "" then "" else " because " ^ e) in let derivation = Derivation.build pos rhs derivation (Some comment) in (* Print the rest of the derivation without paying attention to the lookahead symbols. *) follow derivation offset (List.map config1toconfig0 configs) else begin (* The lookahead symbol is not produced, so it is definitely inherited. *) assert nullable; let comment = "lookahead token is inherited" ^ (if pos + 1 < length then Printf.sprintf " because %s can vanish" (Symbol.printao (pos + 1) rhs) else "") in let derivation = Derivation.build pos rhs derivation (Some comment) in follow1 tok derivation offset configs end else (* This is a shift transition. Tack the symbol in front of the forest. *) let derivation = Derivation.prepend rhs.(pos) derivation in follow1 tok derivation offset configs (* Symbolic execution is performed in the same manner as above. *) let explain_reduce_item (tok : Terminal.t) (start : Item.t) (input : Symbol.t array) (stop : Item.t) : Derivation.t = let n = Array.length input in let table : (configuration1, configuration1 option) Hashtbl.t = Hashtbl.create 1023 in let queue : configuration1 Queue.t = Queue.create() in let enqueue ancestor config = try let _ = Hashtbl.find table config in () with Not_found -> Hashtbl.add table config ancestor; Queue.add config queue in (* If the lookahead token is #, then it initially appears in the lookahead set; otherwise it doesn't. *) enqueue None (start, Terminal.equal tok Terminal.sharp, 0); try Misc.qiter (function (item, lookahead, offset) as config -> (* If the item we're looking at is the goal item and if we have read all of the input symbols, stop. *) if (Item.equal item stop) && lookahead && (offset = n) then raise Done; (* Otherwise, explore the transitions out of this item. *) let prod, _nt, rhs, pos, length = Item.def item in (* Shift transition, followed only if the symbol matches the symbol found in the input string. *) if (pos < length) && (offset < n) && (Symbol.equal rhs.(pos) input.(offset)) then begin let config' = (Item.import (prod, pos+1), lookahead, offset+1) in enqueue (Some config) config' end; (* Epsilon transitions. *) if pos < length then match rhs.(pos) with | Symbol.N nt -> let nullable, first = Analysis.nullable_first_prod prod (pos + 1) in let first : bool = TerminalSet.mem tok first in let lookahead' = if nullable then first || lookahead else first in Production.iternt nt (fun prod -> let config' = (Item.import (prod, 0), lookahead', offset) in enqueue (Some config) config' ) | Symbol.T _ -> () ) queue; assert false with Done -> (* We have found a (shortest) path from the start configuration to the goal configuration. Turn it into an explicit derivation.
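(Here, [Misc.materialize] can recover this path by following the ancestor links that [enqueue] recorded in [table], starting from the goal configuration.)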
*) let configs = Misc.materialize table (stop, true, n) in let derivation = Derivation.empty in let derivation = follow1 tok derivation n configs in derivation (* -------------------------------------------------------------------------- *) (* A counter of how many conflicts could *not* be explained. *) (* When this counter is nonzero, we display a message on the standard output channel. This can help us detect regressions via [make test]. *) let unexplainable = ref 0 let log_unexplainable () = if !unexplainable > 0 then Error.logA 2 (fun f -> Printf.fprintf f "%d conflict%s could not be explained.\n" !unexplainable (if !unexplainable > 1 then "s" else "") ) (* -------------------------------------------------------------------------- *) (* Putting it all together. *) let () = if Settings.explain then begin (* 2018/09/05: when [--explain] is enabled, always create a fresh .conflicts file (wiping out any pre-existing file), even if there are in fact no conflicts. This should avoid confusion with outdated .conflicts files. *) let out = open_out (Settings.base ^ ".conflicts") in Lr1.conflicts (fun toks node -> try (* Construct a partial LR(1) automaton, looking for a conflict in a state that corresponds to this node. Because Pager's algorithm can merge two states as soon as one of them has a conflict, we can't be too specific about the conflict that we expect to find in the canonical automaton. So, we must supply a set of conflict tokens and accept any kind of conflict that involves one of them. *) (* TEMPORARY with the new compatibility criterion, we can be sure that every conflict token is indeed involved in a conflict. Exploit that? Avoid focusing on a single token? *) let module P = Lr1partial.Run (struct let tokens = toks let goal = node end) in let closure = Lr0.closure P.goal in (* Determine what kind of conflict was found. *) let shift, reduce = Item.Map.fold (fun item toks (shift, reduce) -> match Item.classify item with | Item.Shift (Symbol.T tok, _) when Terminal.equal tok P.token -> shift + 1, reduce | Item.Reduce _ when TerminalSet.mem P.token toks -> shift, reduce + 1 | _ -> shift, reduce ) closure (0, 0) in let kind = if (shift > 0) && (reduce > 1) then "shift/reduce/reduce" else if (shift > 0) then "shift/reduce" else "reduce/reduce" in (* Explain how the conflict state is reached. *) Printf.fprintf out "\n\ ** Conflict (%s) in state %d.\n\ ** Token%s involved: %s\n%s\ ** This state is reached from %s after reading:\n\n%s\n" kind (Lr1.number node) (if TerminalSet.cardinal toks > 1 then "s" else "") (TerminalSet.print toks) (if TerminalSet.cardinal toks > 1 then Printf.sprintf "** The following explanations concentrate on token %s.\n" (Terminal.print P.token) else "") (Nonterminal.print false (Item.startnt P.source)) (Symbol.printa P.path); (* Examine the items in that state, focusing on one particular token. Out of the shift items, we explain just one -- this seems enough. We explain each of the reduce items. *) (* First, build a mapping of items to derivations. 
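The Boolean accumulator tracks whether we are still looking for a shift item to explain, so that at most one shift item -- but every relevant reduce item -- receives an explanation.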
*) let (_ : bool), derivations = Item.Map.fold (fun item toks (still_looking_for_shift_item, derivations) -> match Item.classify item with | Item.Shift (Symbol.T tok, _) when still_looking_for_shift_item && (Terminal.equal tok P.token) -> false, let derivation = explain_shift_item P.source P.path item in Item.Map.add item derivation derivations | Item.Reduce _ when TerminalSet.mem P.token toks -> still_looking_for_shift_item, let derivation = explain_reduce_item P.token P.source P.path item in Item.Map.add item derivation derivations | _ -> still_looking_for_shift_item, derivations ) closure (true, Item.Map.empty) in (* Factor out the common context among all derivations, so as to avoid repeating it. This helps prevent derivation trees from drifting too far away towards the right. It also helps produce sub-derivations that are quite compact. *) let context, derivations = Derivation.factor derivations in (* Display the common context. *) Printf.fprintf out "\n** The derivations that appear below have the following common factor:\ \n** (The question mark symbol (?) represents the spot where the derivations begin to differ.)\n\n"; Derivation.printc out context; (* Then, display the sub-derivations. *) Item.Map.iter (fun item derivation -> Printf.fprintf out "\n** In state %d, looking ahead at %s, " (Lr1.number node) (Terminal.print P.token); begin match Item.classify item with | Item.Shift _ -> Printf.fprintf out "shifting is permitted\n** because of the following sub-derivation:\n\n" | Item.Reduce prod -> Printf.fprintf out "reducing production\n** %s\n** is permitted because of the following sub-derivation:\n\n" (Production.print prod) end; Derivation.print out derivation ) derivations; flush out with Lr1partial.Oops -> (* Ha ha! We were unable to explain this conflict. This could happen because the automaton was butchered by conflict resolution directives, or because [--lalr] was enabled and we have unexplainable LALR conflicts. Anyway, send the error message to the .conflicts file and continue. *) incr unexplainable; Printf.fprintf out "\n\ ** Conflict (unexplainable) in state %d.\n\ ** Token%s involved: %s\n\ ** %s.\n%!" (Lr1.number node) (if TerminalSet.cardinal toks > 1 then "s" else "") (TerminalSet.print toks) (match Settings.construction_mode with | Settings.ModeLALR -> "This may be an artificial conflict caused by your use of --lalr" | Settings.ModeCanonical | Settings.ModeInclusionOnly | Settings.ModePager -> "Please send your grammar to Menhir's developers" ) ); log_unexplainable(); Time.tick "Explaining conflicts" end (* ------------------------------------------------------------------------ *) (* Resolve the conflicts that remain in the automaton. *) let () = Lr1.default_conflict_resolution(); Time.tick "Resolving remaining conflicts" (* ------------------------------------------------------------------------ *) (* Now is as good a time as any to add extra reductions, if requested by the user. This must be done after conflicts have been resolved. *) let () = Lr1.extra_reductions(); Time.tick "Adding extra reductions" (* ------------------------------------------------------------------------ *) (* If any warnings about the grammar have been emitted up to this point, and if [--strict] is enabled, now is the time to stop, before going into the back-end. 
*) let () = Error.exit_if Error.grammatical_error menhir-20210929/src/conflict.mli000066400000000000000000000026571412503066000163500ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module writes a description of the automaton, before conflict resolution, to .automaton. It writes conflict explanations to the file .conflicts. Then, it performs conflict resolution and introduces extra reductions. Finally, if any warnings about the grammar have been emitted up to this point, and if [--strict] is enabled, then it stops Menhir, before going into the back-end. No functionality is offered by this module. *) menhir-20210929/src/coqBackend.ml000066400000000000000000000555061412503066000164310ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Printf open Grammar module Run (T: sig end) = struct let from_menhirlib f = match Settings.coq_lib_path with | None -> () | Some path -> fprintf f "From %s " path let menhirlib_path = match Settings.coq_lib_path with | None -> "" | Some path -> path ^ "." let print_term t = assert (not (Terminal.pseudo t)); sprintf "%s't" (Terminal.print t) let print_nterm nt = sprintf "%s'nt" (Nonterminal.print true nt) let print_symbol = function | Symbol.N nt -> sprintf "NT %s" (print_nterm nt) | Symbol.T t -> sprintf "T %s" (print_term t) let print_cell_symbol cell = print_symbol cell.Invariant.symbol let print_word print_cell w = (* Convert to a list whose head is the top of the stack. 
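For instance (purely an illustration), a word of three cells that carry the symbols A, B and C, where C is the topmost cell, is printed as [C; B; A]%list.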
*) Array.fold_left (fun accu cell -> print_cell cell :: accu) [] w |> String.concat "; " |> sprintf "[%s]%%list" let print_type ty = if Settings.coq_no_actions then "unit" else match ty with | None -> "unit" | Some t -> match t with | Stretch.Declared s -> s.Stretch.stretch_content | Stretch.Inferred _ -> assert false (* We cannot infer coq types *) let is_final_state node = match Default.has_default_reduction node with | Some (prod, _) -> Production.is_start prod | None -> false let lr1_iter_nonfinal f = Lr1.iter (fun node -> if not (is_final_state node) then f node) let lr1_iterx_nonfinal f = Lr1.iterx (fun node -> if not (is_final_state node) then f node) let lr1_foldx_nonfinal f = Lr1.foldx (fun accu node -> if not (is_final_state node) then f accu node else accu) let print_nis nis = sprintf "Nis'%d" (Lr1.number nis) let print_init init = sprintf "Init'%d" (Lr1.number init) let print_st st = match Lr1.incoming_symbol st with | Some _ -> sprintf "Ninit %s" (print_nis st) | None -> sprintf "Init %s" (print_init st) let (prod_ids, _) = Production.foldx (fun p (prod_ids, counters) -> let lhs = Production.nt p in let id = try SymbolMap.find (Symbol.N lhs) counters with Not_found -> 0 in (ProductionMap.add p id prod_ids, SymbolMap.add (Symbol.N lhs) (id+1) counters)) (ProductionMap.empty, SymbolMap.empty) let print_prod p = sprintf "Prod'%s'%d" (Nonterminal.print true (Production.nt p)) (ProductionMap.find p prod_ids) let () = if not Settings.coq_no_actions then begin Nonterminal.iterx (fun nonterminal -> match Nonterminal.ocamltype nonterminal with | None -> Error.error [] "I don't know the type of the nonterminal symbol %s." (Nonterminal.print false nonterminal) | Some _ -> ()); Production.iterx (fun prod -> if not (Keyword.KeywordSet.is_empty (Action.keywords (Production.action prod))) then Error.error [] "the Coq back-end supports none of the $ keywords." ) end; Production.iterx (fun prod -> Array.iter (fun symb -> match symb with | Symbol.T t -> if t = Terminal.error then Error.error [] "the Coq back-end does not support the error token." | _ -> ()) (Production.rhs prod)); if Front.grammar.BasicSyntax.parameters <> [] then Error.error [] "the Coq back-end does not support %%parameter." 
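(* As an illustration (the symbols below are hypothetical, not tied to any particular grammar): a terminal symbol PLUS is printed as PLUS't, a nonterminal symbol expr is printed as expr'nt, and the productions whose left-hand side is expr are named Prod'expr'0, Prod'expr'1, and so on, in the order in which [Production.foldx] enumerates them. These names become the constructors of inductive types in the generated Coq file. *)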
let write_tokens f = fprintf f "Inductive token : Type :="; Terminal.iter_real (fun term -> fprintf f "\n| %s : %s%%type -> token" (Terminal.print term) (print_type (Terminal.ocamltype term)) ); fprintf f ".\n\n" let write_inductive_alphabet f name constrs = fprintf f "Inductive %s' : Set :=" name; List.iter (fprintf f "\n| %s") constrs; fprintf f ".\n"; fprintf f "Definition %s := %s'.\n\n" name name; if List.length constrs > 0 then begin let iteri f = ignore (List.fold_left (fun k x -> f k x; succ k) 1 constrs) in fprintf f "Global Program Instance %sNum : %sAlphabet.Numbered %s :=\n" name menhirlib_path name; fprintf f " { inj := fun x => match x return _ with"; iteri (fun k constr -> fprintf f "\n | %s => %d%%positive" constr k); fprintf f "\n end;\n"; fprintf f " surj := (fun n => match n return _ with"; iteri (fprintf f "\n | %d%%positive => %s"); fprintf f "\n | _ => %s\n end)%%Z;\n" (List.hd constrs); fprintf f " inj_bound := %d%%positive }.\n" (List.length constrs); end else begin fprintf f "Global Program Instance %sAlph : %sAlphabet.Alphabet %s :=\n" name menhirlib_path name; fprintf f " { AlphabetComparable := {| compare := fun x y =>\n"; fprintf f " match x, y return comparison with end |};\n"; fprintf f " AlphabetEnumerable := {| all_list := []%%list |} }."; end let write_terminals f = write_inductive_alphabet f "terminal" ( Terminal.fold (fun t l -> if Terminal.pseudo t then l else print_term t::l) []); fprintf f "Global Instance TerminalAlph : %sAlphabet.Alphabet terminal := _.\n\n" menhirlib_path let write_nonterminals f = write_inductive_alphabet f "nonterminal" ( Nonterminal.foldx (fun nt l -> (print_nterm nt)::l) []); fprintf f "Global Instance NonTerminalAlph : %sAlphabet.Alphabet nonterminal := _.\n\n" menhirlib_path let write_symbol_semantic_type f = fprintf f "Definition terminal_semantic_type (t:terminal) : Type:=\n"; fprintf f " match t with\n"; Terminal.iter_real (fun terminal -> fprintf f " | %s => %s%%type\n" (print_term terminal) (print_type (Terminal.ocamltype terminal)) ); fprintf f " end.\n\n"; fprintf f "Definition nonterminal_semantic_type (nt:nonterminal) : Type:=\n"; fprintf f " match nt with\n"; Nonterminal.iterx (fun nonterminal -> fprintf f " | %s => %s%%type\n" (print_nterm nonterminal) (print_type (Nonterminal.ocamltype nonterminal))); fprintf f " end.\n\n"; fprintf f "Definition symbol_semantic_type (s:symbol) : Type:=\n"; fprintf f " match s with\n"; fprintf f " | T t => terminal_semantic_type t\n"; fprintf f " | NT nt => nonterminal_semantic_type nt\n"; fprintf f " end.\n\n" let write_token_term f = fprintf f "Definition token_term (tok : token) : terminal :=\n"; fprintf f " match tok with\n"; Terminal.iter_real (fun terminal -> fprintf f " | %s _ => %s\n" (Terminal.print terminal) (print_term terminal)); fprintf f " end.\n\n" let write_token_sem f = fprintf f "Definition token_sem (tok : token) : symbol_semantic_type (T (token_term tok)) :=\n"; fprintf f " match tok with\n"; Terminal.iter_real (fun terminal -> fprintf f " | %s x => x\n" (Terminal.print terminal)); fprintf f " end.\n\n" let write_productions f = write_inductive_alphabet f "production" ( Production.foldx (fun prod l -> (print_prod prod)::l) []); fprintf f "Global Instance ProductionAlph : %sAlphabet.Alphabet production := _.\n\n" menhirlib_path let write_productions_contents f = fprintf f "Definition prod_contents (p:production) :\n"; fprintf f " { p:nonterminal * list symbol &\n"; fprintf f " %sGrammar.arrows_right\n" menhirlib_path; fprintf f " (symbol_semantic_type (NT (fst 
p)))\n"; fprintf f " (List.map symbol_semantic_type (snd p)) }\n"; fprintf f " :=\n"; fprintf f " let box := existT (fun p =>\n"; fprintf f " %sGrammar.arrows_right\n" menhirlib_path; fprintf f " (symbol_semantic_type (NT (fst p)))\n"; fprintf f " (List.map symbol_semantic_type (snd p)) )\n"; fprintf f " in\n"; fprintf f " match p with\n"; Production.iterx (fun prod -> fprintf f " | %s => box\n" (print_prod prod); fprintf f " (%s, [%s]%%list)\n" (print_nterm (Production.nt prod)) (String.concat "; " (List.map print_symbol (List.rev (Array.to_list (Production.rhs prod))))); if Production.length prod = 0 then fprintf f " (\n" else fprintf f " (fun %s =>\n" (String.concat " " (List.rev (Array.to_list (Production.identifiers prod)))); if Settings.coq_no_actions then fprintf f "tt" else Printer.print_expr f (Action.to_il_expr (Production.action prod)); fprintf f "\n)\n"); fprintf f " end.\n\n"; fprintf f "Definition prod_lhs (p:production) :=\n"; fprintf f " fst (projT1 (prod_contents p)).\n"; fprintf f "Definition prod_rhs_rev (p:production) :=\n"; fprintf f " snd (projT1 (prod_contents p)).\n"; fprintf f "Definition prod_action (p:production) :=\n"; fprintf f " projT2 (prod_contents p).\n\n" let write_nullable_first f = fprintf f "Definition nullable_nterm (nt:nonterminal) : bool :=\n"; fprintf f " match nt with\n"; Nonterminal.iterx (fun nt -> fprintf f " | %s => %b\n" (print_nterm nt) (Analysis.nullable nt)); fprintf f " end.\n\n"; fprintf f "Definition first_nterm (nt:nonterminal) : list terminal :=\n"; fprintf f " match nt with\n"; Nonterminal.iterx (fun nt -> let firstSet = Analysis.first nt in fprintf f " | %s => [" (print_nterm nt); let first = ref true in TerminalSet.iter (fun t -> if !first then first := false else fprintf f "; "; fprintf f "%s" (print_term t) ) firstSet; fprintf f "]%%list\n"); fprintf f " end.\n\n" let write_grammar f = fprintf f "Module Import Gram <: %sGrammar.T.\n\n" menhirlib_path; fprintf f "Local Obligation Tactic := let x := fresh in intro x; case x; reflexivity.\n\n"; write_terminals f; write_nonterminals f; fprintf f "Include %sGrammar.Symbol.\n\n" menhirlib_path; write_symbol_semantic_type f; fprintf f "Definition token := token.\n\n"; write_token_term f; write_token_sem f; write_productions f; write_productions_contents f; fprintf f "Include %sGrammar.Defs.\n\n" menhirlib_path; fprintf f "End Gram.\n\n" let write_nis f = write_inductive_alphabet f "noninitstate" ( lr1_foldx_nonfinal (fun l node -> (print_nis node)::l) []); fprintf f "Global Instance NonInitStateAlph : %sAlphabet.Alphabet noninitstate := _.\n\n" menhirlib_path let write_init f = write_inductive_alphabet f "initstate" ( ProductionMap.fold (fun _prod node l -> (print_init node)::l) Lr1.entry []); fprintf f "Global Instance InitStateAlph : %sAlphabet.Alphabet initstate := _.\n\n" menhirlib_path let write_start_nt f = fprintf f "Definition start_nt (init:initstate) : nonterminal :=\n"; fprintf f " match init with\n"; Lr1.fold_entry (fun _prod node startnt _t () -> fprintf f " | %s => %s\n" (print_init node) (print_nterm startnt) ) (); fprintf f " end.\n\n" let write_actions f = fprintf f "Definition action_table (state:state) : action :=\n"; fprintf f " match state with\n"; lr1_iter_nonfinal (fun node -> fprintf f " | %s => " (print_st node); match Default.has_default_reduction node with | Some (prod, _) -> fprintf f "Default_reduce_act %s\n" (print_prod prod) | None -> fprintf f "Lookahead_act (fun terminal:terminal =>\n"; fprintf f " match terminal return lookahead_action terminal with\n"; 
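(* For each real terminal symbol, emit a [Shift_act] when the state has an outgoing transition along this symbol, a [Reduce_act] when it has a reduction on it, and otherwise fall through to a single catch-all [Fail_act] branch. *)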
let has_fail = ref false in Terminal.iter_real (fun t -> try let target = SymbolMap.find (Symbol.T t) (Lr1.transitions node) in fprintf f " | %s => Shift_act %s (eq_refl _)\n" (print_term t) (print_nis target) with Not_found -> try let prod = Misc.single (TerminalMap.find t (Lr1.reductions node)) in fprintf f " | %s => Reduce_act %s\n" (print_term t) (print_prod prod) with Not_found -> has_fail := true); if !has_fail then fprintf f " | _ => Fail_act\n"; fprintf f " end)\n" ); fprintf f " end.\n\n" let write_gotos f = fprintf f "Definition goto_table (state:state) (nt:nonterminal) :=\n"; fprintf f " match state, nt return option { s:noninitstate | NT nt = last_symb_of_non_init_state s } with\n"; let has_none = ref false in lr1_iter_nonfinal (fun node -> Nonterminal.iterx (fun nt -> try let target = SymbolMap.find (Symbol.N nt) (Lr1.transitions node) in fprintf f " | %s, %s => " (print_st node) (print_nterm nt); if is_final_state target then fprintf f "None" else fprintf f "Some (exist _ %s (eq_refl _))\n" (print_nis target) with Not_found -> has_none := true)); if !has_none then fprintf f " | _, _ => None\n"; fprintf f " end.\n\n" let write_last_symb f = fprintf f "Definition last_symb_of_non_init_state (noninitstate:noninitstate) : symbol :=\n"; fprintf f " match noninitstate with\n"; lr1_iterx_nonfinal (fun node -> match Lr1.incoming_symbol node with | Some s -> fprintf f " | %s => %s\n" (print_nis node) (print_symbol s) | None -> assert false); fprintf f " end.\n\n" let write_past_symb f = fprintf f "Definition past_symb_of_non_init_state (noninitstate:noninitstate) : list symbol :=\n"; fprintf f " match noninitstate with\n"; lr1_iterx_nonfinal (fun node -> let w = Invariant.(pop (stack node)) in fprintf f " | %s => %s\n" (print_nis node) (print_word print_cell_symbol w) ); fprintf f " end.\n"; fprintf f "Extract Constant past_symb_of_non_init_state => \"fun _ -> assert false\".\n\n" module NodeSetMap = Map.Make(Lr1.NodeSet) let write_past_states f = let get_stateset_id = let memo = ref NodeSetMap.empty in let next_id = ref 1 in fun stateset -> try NodeSetMap.find stateset !memo with | Not_found -> let id = sprintf "state_set_%d" !next_id in memo := NodeSetMap.add stateset id !memo; incr next_id; fprintf f "Definition %s (s:state) : bool :=\n" id; fprintf f " match s with\n"; fprintf f " "; Lr1.NodeSet.iter (fun st -> fprintf f "| %s " (print_st st)) stateset; fprintf f "=> true\n"; fprintf f " | _ => false\n"; fprintf f " end.\n"; fprintf f "Extract Inlined Constant %s => \"assert false\".\n\n" id; id in let print_cell_stateset_id cell = get_stateset_id cell.Invariant.states in let b = Buffer.create 256 in bprintf b "Definition past_state_of_non_init_state (s:noninitstate) : list (state -> bool) :=\n"; bprintf b " match s with\n"; lr1_iterx_nonfinal (fun node -> let w = Invariant.stack node in bprintf b " | %s => %s\n" (print_nis node) (print_word print_cell_stateset_id w) ); bprintf b " end.\n"; Buffer.output_buffer f b; fprintf f "Extract Constant past_state_of_non_init_state => \"fun _ -> assert false\".\n\n" module TerminalSetMap = Map.Make(TerminalSet) let write_items f = if not Settings.coq_no_complete then begin let get_lookaheadset_id = let memo = ref TerminalSetMap.empty in let next_id = ref 1 in fun lookaheadset -> let lookaheadset = if TerminalSet.mem Terminal.sharp lookaheadset then TerminalSet.universe else lookaheadset in try TerminalSetMap.find lookaheadset !memo with Not_found -> let id = sprintf "lookahead_set_%d" !next_id in memo := TerminalSetMap.add lookaheadset id 
!memo; incr next_id; fprintf f "Definition %s : list terminal :=\n [" id; let first = ref true in TerminalSet.iter (fun lookahead -> if !first then first := false else fprintf f "; "; fprintf f "%s" (print_term lookahead) ) lookaheadset; fprintf f "]%%list.\nExtract Inlined Constant %s => \"assert false\".\n\n" id; id in let b = Buffer.create 256 in lr1_iter_nonfinal (fun node -> bprintf b "Definition items_of_state_%d : list item :=\n" (Lr1.number node); bprintf b " [ "; let first = ref true in Item.Map.iter (fun item lookaheads -> let prod, pos = Item.export item in if not (Production.is_start prod) then begin if !first then first := false else bprintf b ";\n "; bprintf b "{| prod_item := %s; dot_pos_item := %d; lookaheads_item := %s |}" (print_prod prod) pos (get_lookaheadset_id lookaheads); end ) (Lr0.closure (Lr0.export (Lr1.state node))); bprintf b " ]%%list.\n"; bprintf b "Extract Inlined Constant items_of_state_%d => \"assert false\".\n\n" (Lr1.number node) ); Buffer.output_buffer f b; fprintf f "Definition items_of_state (s:state) : list item :=\n"; fprintf f " match s with\n"; lr1_iter_nonfinal (fun node -> fprintf f " | %s => items_of_state_%d\n" (print_st node) (Lr1.number node)); fprintf f " end.\n"; end else fprintf f "Definition items_of_state (s:state): list item := []%%list.\n"; fprintf f "Extract Constant items_of_state => \"fun _ -> assert false\".\n\n" let write_state_helper f = fprintf f "Definition N_of_state (s:state) : N :=\n"; fprintf f " match s with\n"; lr1_iter_nonfinal (fun node -> fprintf f " | %s => %d%%N\n" (print_st node) (Lr1.number node) ); fprintf f " end.\n" let write_automaton f = fprintf f "Module Aut <: %sAutomaton.T.\n\n" menhirlib_path; fprintf f "Local Obligation Tactic := let x := fresh in intro x; case x; reflexivity.\n\n"; fprintf f "Module Gram := Gram.\n"; fprintf f "Module GramDefs := Gram.\n\n"; write_nullable_first f; write_nis f; write_last_symb f; write_init f; fprintf f "Include %sAutomaton.Types.\n\n" menhirlib_path; write_start_nt f; write_actions f; write_gotos f; write_past_symb f; write_past_states f; write_items f; write_state_helper f; fprintf f "End Aut.\n\n" let write_theorems f = fprintf f "Module MenhirLibParser := %sMain.Make Aut.\n" menhirlib_path; fprintf f "Theorem safe:\n"; fprintf f " MenhirLibParser.safe_validator tt = true.\n"; fprintf f "Proof eq_refl true<:MenhirLibParser.safe_validator tt = true.\n\n"; if not Settings.coq_no_complete then begin fprintf f "Theorem complete:\n"; fprintf f " MenhirLibParser.complete_validator tt = true.\n"; fprintf f "Proof eq_refl true<:MenhirLibParser.complete_validator tt = true.\n\n"; end; Lr1.fold_entry (fun _prod node startnt _t () -> let funName = Nonterminal.print true startnt in fprintf f "Definition %s : nat -> MenhirLibParser.Inter.buffer -> MenhirLibParser.Inter.parse_result %s := MenhirLibParser.parse safe Aut.%s.\n\n" funName (print_type (Nonterminal.ocamltype startnt)) (print_init node); fprintf f "Theorem %s_correct (log_fuel : nat) (buffer : MenhirLibParser.Inter.buffer):\n" funName; fprintf f " match %s log_fuel buffer with\n" funName; fprintf f " | MenhirLibParser.Inter.Parsed_pr sem buffer_new =>\n"; fprintf f " exists word (tree : Gram.parse_tree (%s) word),\n" (print_symbol (Symbol.N startnt)); fprintf f " buffer = MenhirLibParser.Inter.app_buf word buffer_new /\\\n"; fprintf f " Gram.pt_sem tree = sem\n"; fprintf f " | _ => True\n"; fprintf f " end.\n"; fprintf f "Proof. apply MenhirLibParser.parse_correct with (init:=Aut.%s). 
Qed.\n\n" (print_init node); if not Settings.coq_no_complete then begin fprintf f "Theorem %s_complete (log_fuel : nat) (word : list token) (buffer_end : MenhirLibParser.Inter.buffer) :\n" funName; fprintf f " forall tree : Gram.parse_tree (%s) word,\n" (print_symbol (Symbol.N startnt)); fprintf f " match %s log_fuel (MenhirLibParser.Inter.app_buf word buffer_end) with\n" funName; fprintf f " | MenhirLibParser.Inter.Fail_pr => False\n"; fprintf f " | MenhirLibParser.Inter.Parsed_pr output_res buffer_end_res =>\n"; fprintf f " output_res = Gram.pt_sem tree /\\\n"; fprintf f " buffer_end_res = buffer_end /\\ (Gram.pt_size tree <= PeanoNat.Nat.pow 2 log_fuel)%%nat\n"; fprintf f " | MenhirLibParser.Inter.Timeout_pr => (PeanoNat.Nat.pow 2 log_fuel < Gram.pt_size tree)%%nat\n"; fprintf f " end.\n"; fprintf f "Proof. apply MenhirLibParser.parse_complete with (init:=Aut.%s); exact complete. Qed.\n" (print_init node); end ) () let write_all f = if not Settings.coq_no_actions then List.iter (fun s -> fprintf f "%s\n\n" s.Stretch.stretch_content) Front.grammar.BasicSyntax.preludes; fprintf f "From Coq.Lists Require List.\n"; fprintf f "From Coq.PArith Require Import BinPos.\n"; fprintf f "From Coq.NArith Require Import BinNat.\n"; from_menhirlib f; fprintf f "Require Main.\n"; if not Settings.coq_no_version_check then begin from_menhirlib f; fprintf f "Require Version.\n" end; fprintf f "Import List.ListNotations.\n\n"; if not Settings.coq_no_version_check then fprintf f "Definition version_check : unit := %sVersion.require_%s.\n\n" menhirlib_path Version.version; fprintf f "Unset Elimination Schemes.\n\n"; write_tokens f; write_grammar f; write_automaton f; write_theorems f; if not Settings.coq_no_actions then List.iter (fun stretch -> fprintf f "\n\n%s" stretch.Stretch.stretch_raw_content) Front.grammar.BasicSyntax.postludes end menhir-20210929/src/coqBackend.mli000066400000000000000000000020701412503066000165660ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The coq code generator. *) module Run (T: sig end) : sig val write_all: out_channel -> unit end menhir-20210929/src/cst.ml000066400000000000000000000076571412503066000151740ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* Concrete syntax trees. *) (* A concrete syntax tree is one of a leaf -- which corresponds to a terminal symbol; a node -- which corresponds to a non-terminal symbol, and whose immediate descendants form an expansion of that symbol; or an error leaf -- which corresponds to a point where the [error] pseudo-token was shifted. 
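For instance (purely illustrative; the production and token names are hypothetical), if [p] is a production expr -> expr PLUS expr and [q] is a production expr -> INT, then the input 1 + 2 could be represented as CstNonTerminal (p, [| t1; CstTerminal PLUS; t2 |]), where [t1] and [t2] are subtrees of the form CstNonTerminal (q, [| CstTerminal INT |]).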
*) type cst = | CstTerminal of Terminal.t | CstNonTerminal of Production.index * cst array | CstError (* The fringe of a concrete syntax tree. *) let rec fringe cst accu = match cst with | CstTerminal tok -> tok :: accu | CstNonTerminal (_, csts) -> fringe_rhs csts (Array.length csts) 0 accu | CstError -> (* Not sure if this is appropriate. *) Terminal.error :: accu and fringe_rhs csts n i accu = if i = n then accu else fringe csts.(i) (fringe_rhs csts n (i + 1) accu) let fringe cst = fringe cst [] (* This is a (mostly) unambiguous printer for concrete syntax trees, in an sexp-like notation. *) let rec pcst b = function | CstTerminal tok -> (* A leaf is denoted by a terminal symbol. *) Printf.bprintf b "%s" (Terminal.print tok) | CstNonTerminal (prod, csts) -> (* A node is denoted by a bracketed, whitespace-separated list, whose head is a non-terminal symbol (followed with a colon) and whose tail consists of the node's descendants. *) (* There is in fact some ambiguity in this notation, since we only print the non-terminal symbol that forms the left-hand side of production [prod], instead of the production itself. This abuse makes things much more readable, and should be acceptable for the moment. The cases where ambiguity actually arises should be rare. *) Printf.bprintf b "[%s:%a]" (Nonterminal.print false (Production.nt prod)) pcsts csts | CstError -> (* An error leaf is denoted by [error]. *) Printf.bprintf b "error" and pcsts b (csts : cst array) = Array.iter (fun cst -> Printf.bprintf b " %a" pcst cst ) csts (* This is the public interface. *) let wrap print f x = let b = Buffer.create 32768 in print b x; Buffer.output_buffer f b let print = wrap pcst (* This is a pretty-printer for concrete syntax trees. The notation is the same as that used by the above printer; the only difference is that the [PPrint] library is used to manage indentation. *) open PPrint let rec build : cst -> document = function | CstTerminal tok -> string (Terminal.print tok) | CstNonTerminal (prod, csts) -> brackets ( group ( string (Nonterminal.print false (Production.nt prod)) ^^ colon ^^ group ( nest 2 ( Array.fold_left (fun doc cst -> doc ^/^ build cst ) empty csts ) ) ^^ break 0 ) ) | CstError -> string "error" let show f cst = ToChannel.pretty 0.8 80 f (build cst) menhir-20210929/src/cst.mli000066400000000000000000000035611412503066000153330ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* Concrete syntax trees. *) (* A concrete syntax tree is one of a leaf -- which corresponds to a terminal symbol; a node -- which corresponds to a non-terminal symbol, and whose immediate descendants form an expansion of that symbol; or an error leaf -- which corresponds to a point where the [error] pseudo-token was shifted. *) type cst = | CstTerminal of Terminal.t | CstNonTerminal of Production.index * cst array | CstError (* The fringe of a concrete syntax tree. *) val fringe: cst -> Terminal.t list (* This is a (mostly) unambiguous printer for concrete syntax trees, in an sexp-like notation. 
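For instance (an illustration based on hypothetical symbols), a node for a production expr -> expr PLUS expr whose descendants are a node for expr -> INT, a PLUS leaf, and another node for expr -> INT is printed as [expr: [expr: INT] PLUS [expr: INT]].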
*) val print: out_channel -> cst -> unit (* This is a pretty-printer for concrete syntax trees. The notation is the same as that used by the above printer; the only difference is that the [PPrint] library is used to manage indentation. *) val show: out_channel -> cst -> unit menhir-20210929/src/default.ml000066400000000000000000000131271412503066000160140ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) open Grammar module C = Conflict (* artificial dependency; ensures that [Conflict] runs first *) (* Here is how we check whether state [s] should have a default reduction. We check whether [s] has no outgoing shift transitions and only has one possible reduction action. In that case, we produce a default reduction action, that is, we perform reduction without consulting the lookahead token. This saves code, but can alter the parser's behavior in the presence of errors. The check for default actions subsumes the check for the case where [s] admits a reduce action with lookahead symbol "#". In that case, it must be the only possible action -- see [Lr1.default_conflict_resolution]. That is, we have reached a point where we have recognized a well-formed input and are now expecting an end-of-stream. In that case, performing reduction without looking at the next token is the right thing to do, since there should in fact be none. The state that we reduce to will also have the same property, and so on, so we will in fact end up rewinding the entire stack and accepting the input when the stack becomes empty. (New as of 2012/01/23.) A state where a shift/reduce conflict was solved in favor of neither (due to a use of the %nonassoc directive) must not perform a default reduction. Indeed, this would effectively mean that the failure that was requested by the user is forgotten and replaced with a reduction. This surprising behavior is present in ocamlyacc and was present in earlier versions of Menhir. See e.g. http://caml.inria.fr/mantis/view.php?id=5462 There is a chance that we might run into trouble if the ideas described in the above two paragraphs collide, that is, if we forbid a default reduction (due to a shift/reduce conflict solved by %nonassoc) in a node where we would like to have default reduction on "#". This situation seems unlikely to arise, so I will not do anything about it for the moment. (Furthermore, someone who uses precedence declarations is looking for trouble anyway.) Between 2012/05/25 and 2015/09/25, if [--canonical] has been specified, then we disallow default reductions on a normal token, because we do not want to introduce any spurious actions into the automaton. We do still allow default reductions on "#", since they are needed for the automaton to terminate properly. From 2015/09/25 on, we again always allow default reductions, as they seem to be beneficial when explaining syntax errors.
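To summarize the logic implemented below: a state receives a default reduction if and only if default reductions are not forbidden for it (see [Lr1.forbid_default_reduction]), all of its reduce actions (of which there must be at least one) concern a single production, and all of its outgoing transitions (if any) are labeled with nonterminal symbols.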
*) let has_default_reduction : Lr1.node -> (Production.index * TerminalSet.t) option = Lr1.tabulate (fun s -> if Lr1.forbid_default_reduction s then None else let reduction = ProductionMap.is_singleton (Lr0.invert (Lr1.reductions s)) in match reduction with | Some _ -> if SymbolMap.purelynonterminal (Lr1.transitions s) then reduction else None | None -> reduction ) let () = let count = Lr1.sum (fun s -> if has_default_reduction s = None then 0 else 1 ) in Error.logC 1 (fun f -> Printf.fprintf f "%d out of %d states have a default reduction.\n" count Lr1.n) let () = Time.tick "Computing default reductions" (* ------------------------------------------------------------------------ *) (* Here are a number of auxiliary functions that provide information about the LR(1) automaton. *) (* [reductions_on s z] is the list of reductions permitted in state [s] when the lookahead symbol is [z]. This is a list of zero or one elements. This does not take default reductions into account. [z] must be real. *) let reductions_on s z : Production.index list = assert (Terminal.real z); try TerminalMap.find z (Lr1.reductions s) with Not_found -> [] (* [has_reduction s z] tells whether state [s] is willing to reduce some production (and if so, which one) when the lookahead symbol is [z]. It takes a possible default reduction into account. [z] must be real. *) let has_reduction s z : Production.index option = assert (Terminal.real z); match has_default_reduction s with | Some (prod, _) -> Some prod | None -> match reductions_on s z with | prod :: prods -> assert (prods = []); Some prod | [] -> None (* [causes_an_error s z] tells whether state [s] will initiate an error on the lookahead symbol [z]. [z] must be real. *) let causes_an_error s z : bool = assert (Terminal.real z); match has_default_reduction s with | Some _ -> false | None -> reductions_on s z = [] && not (SymbolMap.mem (Symbol.T z) (Lr1.transitions s)) menhir-20210929/src/default.mli000066400000000000000000000032011412503066000161550ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* [has_default_reduction s] tells whether state [s] has a default reduction, and, if so, upon which set of tokens. *) val has_default_reduction : Lr1.node -> (Production.index * TerminalSet.t) option (* [has_reduction s z] tells whether state [s] is willing to reduce some production (and if so, which one) when the lookahead symbol is [z]. It takes a possible default reduction into account. [z] must be real. *) val has_reduction: Lr1.node -> Terminal.t -> Production.index option (* [causes_an_error s z] tells whether state [s] will initiate an error on the lookahead symbol [z]. [z] must be real. *) val causes_an_error: Lr1.node -> Terminal.t -> bool menhir-20210929/src/derivation.ml000066400000000000000000000233131412503066000165320ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. 
This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* -------------------------------------------------------------------------- *) (* This is a data structure for linear derivation trees. These are derivation trees that are list-like (that is, they do not branch), because a single path is of interest. A tree is either empty or formed of a non-terminal symbol at the root and a forest below the root. A forest is an ordered list of elements. However, its elements are not trees, as one would perhaps expect. Because we are interested in *linear* derivation trees, only one element of the forest receives focus and is a tree. All other elements remain un-expanded, so they are just symbols. In other words, a linear derivation tree is roughly just a list of levels, where each forest corresponds to one level. *) type 'focus level = { prefix: Symbol.t list; focus: 'focus; suffix: Symbol.t list; comment: string } type tree = | TEmpty | TRooted of Symbol.t * forest and forest = tree level (* We make use of contexts with a forest-shaped hole. We have tree contexts and forest contexts. Tree contexts do not have a case for holes, since we work with forest-shaped holes only. Forest contexts have one. *) type ctree = | CRooted of Symbol.t * cforest and cforest = | CHole | CCons of ctree level (* Make a few types visible to clients. *) type t = forest type context = cforest (* -------------------------------------------------------------------------- *) (* Construction. *) let rec array_to_list a i j = if i = j then [] else a.(i) :: array_to_list a (i + 1) j let empty = { prefix = []; focus = TEmpty; suffix = []; comment = "" } let tail pos rhs = let length = Array.length rhs in assert (pos < length); { prefix = []; focus = TEmpty; suffix = array_to_list rhs pos length; comment = "" } let build pos rhs forest comment = let length = Array.length rhs in assert (pos < length); match rhs.(pos) with | Symbol.T _ -> assert false | Symbol.N _ as symbol -> { prefix = []; focus = TRooted (symbol, forest); suffix = array_to_list rhs (pos + 1) length; comment = (match comment with None -> "" | Some comment -> comment) } let prepend symbol forest = { forest with prefix = symbol :: forest.prefix } (* -------------------------------------------------------------------------- *) (* Display. *) let buffer = Buffer.create 32768 let rec print_blank k = if k > 0 then begin Buffer.add_char buffer ' '; print_blank (k - 1) end let print_symbol symbol = let word = Symbol.print symbol in Buffer.add_string buffer word; Buffer.add_char buffer ' '; String.length word + 1 let print_symbols symbols = List.fold_left (fun offset symbol -> offset + print_symbol symbol ) 0 symbols let print_level print_focus_root print_focus_remainder offset forest = print_blank offset; let offset = offset + print_symbols forest.prefix in print_focus_root forest.focus; let (_ : int) = print_symbols forest.suffix in if String.length forest.comment > 0 then begin Buffer.add_string buffer "// "; Buffer.add_string buffer forest.comment end; Buffer.add_char buffer '\n'; print_focus_remainder offset forest.focus let print_tree_root = function | TEmpty -> Buffer.add_string buffer ". 
" | TRooted (symbol, _) -> let (_ : int) = print_symbol symbol in () let rec print_forest offset forest = print_level print_tree_root print_tree_remainder offset forest and print_tree_remainder offset = function | TEmpty -> () | TRooted (_, forest) -> print_forest offset forest let print_ctree_root = function | CRooted (symbol, _) -> let (_ : int) = print_symbol symbol in () let rec print_cforest offset cforest = match cforest with | CHole -> print_blank offset; Buffer.add_string buffer "(?)\n" | CCons forest -> print_level print_ctree_root print_ctree_remainder offset forest and print_ctree_remainder offset = function | CRooted (_, cforest) -> print_cforest offset cforest let wrap print channel x = Buffer.clear buffer; print 0 x; Buffer.output_buffer channel buffer let print = wrap print_forest let printc = wrap print_cforest (* -------------------------------------------------------------------------- *) (* [punch] turns a (tree or forest) into a pair of a (tree or forest) context and a residual forest, where the context is chosen maximal. In other words, the residual forest consists of a single level -- its focus is [TEmpty]. *) let rec punch_tree tree : (ctree * forest) option = match tree with | TEmpty -> None | TRooted (symbol, forest) -> let forest1, forest2 = punch_forest forest in Some (CRooted (symbol, forest1), forest2) and punch_forest forest : cforest * forest = match punch_tree forest.focus with | None -> CHole, forest | Some (ctree1, forest2) -> CCons { prefix = forest.prefix; focus = ctree1; suffix = forest.suffix; comment = forest.comment }, forest2 (* [fill] fills a (tree or forest) context with a forest so as to produce a new (tree or forest). *) let rec fill_tree ctree1 forest2 : tree = match ctree1 with | CRooted (symbol1, cforest1) -> TRooted (symbol1, fill_forest cforest1 forest2) and fill_forest cforest1 forest2 : forest = match cforest1 with | CHole -> forest2 | CCons level1 -> { prefix = level1.prefix; focus = fill_tree level1.focus forest2; suffix = level1.suffix; comment = level1.comment } (* [common] factors the maximal common (tree or forest) context out of a pair of a (tree or forest) context and a (tree or forest). It returns the (tree or forest) context as well as the residuals of the two parameters. *) let rec common_tree ctree1 tree2 : (ctree * cforest * forest) option = match ctree1, tree2 with | CRooted _, TEmpty -> None | CRooted (symbol1, cforest1), TRooted (symbol2, forest2) -> if Symbol.equal symbol1 symbol2 then let cforest, cforest1, forest2 = common_forest cforest1 forest2 in Some (CRooted (symbol1, cforest), cforest1, forest2) else None and common_forest cforest1 forest2 : cforest * cforest * forest = match cforest1 with | CHole -> CHole, cforest1, forest2 | CCons forest1 -> if Symbol.lequal forest1.prefix forest2.prefix && Symbol.lequal forest1.suffix forest2.suffix && forest1.comment = forest2.comment then begin match common_tree forest1.focus forest2.focus with | None -> CHole, cforest1, forest2 | Some (ctree, csubforest1, subforest2) -> let cforest = { prefix = forest1.prefix; focus = ctree; suffix = forest1.suffix; comment = forest1.comment } in CCons cforest, csubforest1, subforest2 end else CHole, cforest1, forest2 (* [factor] factors the maximal common forest context out of a nonempty family of forests. We assume that the family is represented as a map indexed by items, because this is convenient for the application that we have in mind, but this assumption is really irrelevant. 
*) let factor forests = match Item.Map.fold (fun item forest accu -> match accu with | None -> (* First time through the loop, so [forest] is the first forest that we examine. Punch it, so as to produce a maximal forest context and a residual forest. *) let context, residual = punch_forest forest in Some (context, Item.Map.singleton item residual) | Some (context, residuals) -> (* Another iteration through the loop. [context] and [residuals] are the maximal common context and the residuals of the forests examined so far. *) (* Combine the common context obtained so far with the forest at hand. This yields a new, smaller common context, as well as residuals for the previous common context and for the forest at hand. *) let context, contextr, forestr = common_forest context forest in (* The residual forests are now: (i) the residual forest [forestr]; and (ii) the previous residual forests [residuals], each of which must be placed with the residual context [contextr]. *) let residuals = Item.Map.add item forestr (Item.Map.map (fill_forest contextr) residuals) in Some (context, residuals) ) forests None with | None -> assert false (* parameter [forests] was an empty map *) | Some (context, residuals) -> context, residuals menhir-20210929/src/derivation.mli000066400000000000000000000054321412503066000167050ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* -------------------------------------------------------------------------- *) (* This is the type of derivations. Derivations are forests: see inside. *) type t (* This is the type of derivations contexts, or derivations with a derivation-shaped hole. *) type context (* -------------------------------------------------------------------------- *) (* Construction. *) (* [empty] is the forest that consists of a single empty tree. *) val empty: t (* [tail pos rhs] is the forest: (i) whose first element is the empty tree, and (ii) whose remaining elements are the symbols found at positions greater than or equal to [pos] in the array [rhs]. *) val tail: int -> Symbol.t array -> t (* [build pos rhs forest comment] is the forest: (i) whose first element is the tree that has the non-terminal symbol [rhs.(pos)] at its root and the forest [forest] below its root, and (ii) whose remaining elements are the symbols found at positions greater than [pos] in the array [rhs]. *) val build: int -> Symbol.t array -> t -> string option -> t (* [prepend symbol forest] is the forest: (i) whose first element is the symbol [symbol], and (ii) whose remaining elements form the forest [forest]. *) val prepend: Symbol.t -> t -> t (* -------------------------------------------------------------------------- *) (* Factoring. *) (* [factor] factors the maximal common derivation context out of a nonempty family of derivations. It produces a pair of the context and of the residual derivations. *) val factor: t Item.Map.t -> context * t Item.Map.t (* -------------------------------------------------------------------------- *) (* Display. *) (* [print] prints a derivation. 
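Each level of the derivation is printed on a line of its own, suitably indented; a level's comment, if present, appears at the end of its line, after //.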
*) val print: out_channel -> t -> unit (* [printc] prints a derivation context. *) val printc: out_channel -> context -> unit menhir-20210929/src/dot.ml000066400000000000000000000104051412503066000151520ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Printf (* ------------------------------------------------------------------------- *) (* Type definitions. *) type size = float * float (* in inches *) type orientation = | Portrait | Landscape type rankdir = | LeftToRight | TopToBottom type ratio = | Compress | Fill | Auto type style = (* Both nodes and edges. *) | Solid | Dashed | Dotted | Bold | Invisible (* Nodes only. *) | Filled | Diagonals | Rounded type shape = | Box | Oval | Circle | DoubleCircle (* there are many others, let's stop here *) (* ------------------------------------------------------------------------- *) (* Basic printers. *) let print_style = function | None -> "" | Some style -> let style = match style with | Solid -> "solid" | Dashed -> "dashed" | Dotted -> "dotted" | Bold -> "bold" | Invisible -> "invis" | Filled -> "filled" | Diagonals -> "diagonals" | Rounded -> "rounded" in sprintf ", style = %s" style let print_shape = function | None -> "" | Some shape -> let shape = match shape with | Box -> "box" | Oval -> "oval" | Circle -> "circle" | DoubleCircle -> "doublecircle" in sprintf ", shape = %s" shape (* ------------------------------------------------------------------------- *) (* The graph printer. 
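A typical use might look as follows. This is an illustrative sketch only: the module [G] below is hypothetical and does not exist anywhere in Menhir.

     module G = struct
       type vertex = int
       let name v = Printf.sprintf "v%d" v
       let successors (f : ?style:Dot.style -> label:string -> vertex -> unit) v =
         if v = 0 then f ~label:"edge" 1
       let iter (f : ?shape:Dot.shape -> ?style:Dot.style -> label:string -> vertex -> unit) =
         List.iter (fun v -> f ~label:(name v) v) [0; 1]
     end
     module P = Dot.Print(G)
     let () = P.print ~directed:true stdout

   This instantiation describes a two-vertex directed graph with a single edge from vertex 0 to vertex 1, labeled "edge", and prints it in dot format on the standard output channel.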
*) module Print (G : sig type vertex val name: vertex -> string val successors: (?style:style -> label:string -> vertex -> unit) -> vertex -> unit val iter: (?shape:shape -> ?style:style -> label:string -> vertex -> unit) -> unit end) = struct let print ?(directed = true) ?size ?(orientation = Landscape) ?(rankdir = LeftToRight) ?(ratio = Compress) (f : out_channel) = fprintf f "%s G {\n" (if directed then "digraph" else "graph"); Option.iter (fun (hsize, vsize) -> fprintf f "size=\"%f, %f\";\n" hsize vsize ) size; begin match orientation with | Portrait -> fprintf f "orientation = portrait;\n" | Landscape -> fprintf f "orientation = landscape;\n" end; begin match rankdir with | LeftToRight -> fprintf f "rankdir = LR;\n" | TopToBottom -> fprintf f "rankdir = TB;\n" end; begin match ratio with | Compress -> fprintf f "ratio = compress;\n" | Fill -> fprintf f "ratio = fill;\n" | Auto -> fprintf f "ratio = auto;\n" end; G.iter (fun ?shape ?style ~label vertex -> fprintf f "%s [ label=\"%s\"%s%s ] ;\n" (G.name vertex) label (print_style style) (print_shape shape) ); G.iter (fun ?shape ?style ~label source -> ignore shape; (* avoid unused variable warnings *) ignore style; ignore label; G.successors (fun ?style ~label destination -> fprintf f "%s %s %s [ label=\"%s\"%s ] ;\n" (G.name source) (if directed then "->" else "--") (G.name destination) label (print_style style) ) source ); fprintf f "\n}\n" end menhir-20210929/src/dot.mli000066400000000000000000000043341412503066000153270ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module displays graphs in graphviz dot format. It is much more basic than the one bundled with the ocamlgraph library, but offers the advantage of being stand-alone. *) (* ------------------------------------------------------------------------- *) (* Type definitions. *) type size = float * float (* in inches *) type orientation = | Portrait | Landscape type rankdir = | LeftToRight | TopToBottom type ratio = | Compress | Fill | Auto type style = (* Both nodes and edges. *) | Solid | Dashed | Dotted | Bold | Invisible (* Nodes only. *) | Filled | Diagonals | Rounded type shape = | Box | Oval | Circle | DoubleCircle (* there are many others, let's stop here *) (* ------------------------------------------------------------------------- *) (* The graph printer. *) module Print (G : sig type vertex val name: vertex -> string val successors: (?style:style -> label:string -> vertex -> unit) -> vertex -> unit val iter: (?shape:shape -> ?style:style -> label:string -> vertex -> unit) -> unit end) : sig val print: ?directed: bool -> ?size: size -> ?orientation: orientation -> ?rankdir: rankdir -> ?ratio: ratio -> out_channel -> unit end menhir-20210929/src/dump.ml000066400000000000000000000102571412503066000153360ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. 
This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Printf open Grammar module Make (Default : sig val has_default_reduction: Lr1.node -> (Production.index * TerminalSet.t) option end) = struct let dump_node out stack_symbols node = (* Print the state number. *) fprintf out "State %d:\n" (Lr1.number node); (* Print the known suffix of the stack. *) fprintf out "## Known stack suffix:\n\ ##%s\n" (StackSymbols.print_symbols (stack_symbols node)); (* Print the items. *) fprintf out "## LR(1) items:\n%s" (Lr0.print "" (Lr1.state node)); (* Print the transitions. *) fprintf out "## Transitions:\n"; SymbolMap.iter (fun symbol node -> fprintf out "-- On %s shift to state %d\n" (Symbol.print symbol) (Lr1.number node) ) (Lr1.transitions node); (* Print the reductions. *) begin match Default.has_default_reduction node with | Some (prod, toks) -> (* There is a default reduction. *) (* Because end-of-stream conflicts have been resolved, either [toks] is the singleton set that contains just the token [#], or it is a set of ordinary terminal symbols. In the former case, Menhir reduces without even asking for the next token; in the latter case, it first reads the next token, then reduces without looking at it. *) assert ( TerminalSet.equal toks (TerminalSet.singleton Terminal.sharp) || not (TerminalSet.mem Terminal.sharp toks) ); let keyword = if TerminalSet.mem Terminal.sharp toks then "Without" else "After" in fprintf out "## Default reduction:\n"; fprintf out "-- %s reading the next token, %s\n" keyword (Production.describe false prod); | None -> (* There is no default reduction. *) fprintf out "## Reductions:\n"; (* 2020/11/21: for better readability, we now group the symbols that lead to reducing the same production. *) let reductions = Lr0.invert (Lr1.reductions node) in ProductionMap.iter (fun prod toks -> fprintf out "-- On %s\n" (TerminalSet.print toks); fprintf out "-- %s\n" (Production.describe false prod) ) reductions end; (* Print the conflicts. *) if not (TerminalSet.is_empty (Lr1.conflict_tokens node)) then fprintf out "** Conflict on %s\n" (TerminalSet.print (Lr1.conflict_tokens node)); (* Print the end-of-stream conflicts. *) Lr1.has_eos_conflict node |> Option.iter begin fun (prods, toks) -> (* If this function is invoked before conflict resolution has been performed, then the list [prods] could have several elements. We pick one. *) assert (prods <> []); let prod = List.hd prods in fprintf out "** End-of-stream conflict on %s\n" (TerminalSet.print toks); fprintf out "** There is a tension between\n\ ** (1) %s\n\ ** without even requesting a lookahead token, and\n\ ** (2) testing whether the lookahead token is a member of the above set.\n" (Production.describe true prod) end; (* Skip a line. *) fprintf out "\n" let dump filename = let module SS = StackSymbols.Run() in let out = open_out filename in Lr1.iter (dump_node out SS.stack_symbols); close_out out; Time.tick "Dumping the LR(1) automaton" end menhir-20210929/src/dump.mli000066400000000000000000000026671412503066000155150ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. 
This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* [dump filename] writes a description of the LR(1) automaton to the file [filename]. This function can be invoked either before or after conflicts have been resolved and extra reductions have been added. In both cases, information about the automaton is provided by the modules Grammar and Lr1. *) open Grammar module Make (Default : sig val has_default_reduction: Lr1.node -> (Production.index * TerminalSet.t) option end) : sig val dump: string -> unit end menhir-20210929/src/dune000066400000000000000000000017371412503066000147200ustar00rootroot00000000000000;; Compilation flags for Menhir. ;; Warnings are enabled (and fatal) during development, ;; but are disabled in releases. ;; If you change these flags, please also update the file ;; src/.merlin, which currently is *not* auto-generated. (env (dev (flags :standard -safe-string -g -w @1..66-4-9-41-44-45-60 )) (release (flags :standard -safe-string -g )) ) ;; The following parsers are built by ocamlyacc. (ocamlyacc sentenceParser ) ;; The following lexers are built by ocamllex. (ocamllex lexer lineCount lexmli lexdep chopInlined sentenceLexer segment lexpointfree ) ;; The Menhir standard library "standard.mly" is embedded in the source code of ;; Menhir using the following rule. It generates a file "standard_mly.ml" with ;; contents "let contents = {||}". (rule (with-stdout-to standard_mly.ml (progn (echo "let contents = {|") (cat standard.mly) (echo "|}") ) ) ) menhir-20210929/src/error.ml000066400000000000000000000076041412503066000155240ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Printf (* ---------------------------------------------------------------------------- *) (* A mechanism to turn all display (logging, warnings, errors) on and off. *) let enabled = ref true let enable () = enabled := true let disable () = enabled := false (* ---------------------------------------------------------------------------- *) (* The new OCaml type inference protocol means that Menhir is called twice, first with [--infer-write-query], then with [--infer-read-reply]. This means that any information messages or warnings issued before OCaml type inference takes place are duplicated, unless we do something about it. To address this issue, when [--infer-read-reply] is set, we disable all output until the point where we read the inferred [.mli] file. Then, we enable it again and continue. *) (* An alternative idea would be to disable all output when [--infer-write-query] is set. However, we would then have no output at all if this command fails. *) let () = Settings.(match infer with | IMReadReply _ -> disable() | _ -> () ) (* ---------------------------------------------------------------------------- *) (* Logging and log levels. *) let log kind verbosity msg = if kind >= verbosity && !enabled then Printf.fprintf stderr "%t%!" 
msg let logG = log Settings.logG let logA = log Settings.logA let logC = log Settings.logC (* ---------------------------------------------------------------------------- *) (* Errors and warnings. *) let print_positions f positions = List.iter (fun position -> fprintf f "%s:\n" (Positions.string_of_pos position) ) positions let display continuation header positions format = let kprintf = if !enabled then Printf.kfprintf else Printf.ikfprintf in kprintf continuation stderr ("%a" ^^ header ^^ format ^^ "\n%!") print_positions positions let error positions format = display (fun _ -> exit 1) "Error: " positions format let warning positions format = display (fun _ -> ()) "Warning: " positions format let errorp v = error [ Positions.position v ] (* ---------------------------------------------------------------------------- *) (* Delayed error reports -- where multiple errors can be reported at once. *) type category = bool ref let new_category () = ref false let signal category positions format = display (fun _ -> category := true) "Error: " positions format let exit_if category = if !category then exit 1 let with_new_category f = let c = new_category() in match f c with | y -> exit_if c; y | exception e -> exit_if c; raise e (* ---------------------------------------------------------------------------- *) (* Certain warnings about the grammar can optionally be treated as errors. *) let grammatical_error = new_category() let grammar_warning pos = if Settings.strict then signal grammatical_error pos else warning pos menhir-20210929/src/error.mli000066400000000000000000000074331412503066000156740ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) (* This module helps report errors. *) (* ---------------------------------------------------------------------------- *) (* A mechanism to turn all display (logging, warnings, errors) on and off. *) val enable: unit -> unit val disable: unit -> unit (* ---------------------------------------------------------------------------- *) (* Logging and log levels. *) val logG: int -> (out_channel -> unit) -> unit val logA: int -> (out_channel -> unit) -> unit val logC: int -> (out_channel -> unit) -> unit (* ---------------------------------------------------------------------------- *) (* Errors and warnings. *) (* [error ps format ...] displays the list of positions [ps], followed by the error message [format ...], and exits. The strings "Error: " and "\n" are automatically added at the beginning and end of the error message. The message should begin with a lowercase letter and end with a dot. *) val error: Positions.positions -> ('a, out_channel, unit, 'b) format4 -> 'a (* [errorp v] is like [error], but uses the position range carried by [v]. *) val errorp: _ Positions.located -> ('a, out_channel, unit, 'b) format4 -> 'a (* [warning] is like [error], except it does not exit. *) val warning: Positions.positions -> ('a, out_channel, unit, unit) format4 -> 'a (* ---------------------------------------------------------------------------- *) (* Delayed error reports -- where multiple errors can be reported at once. 
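As an illustration, a client of this interface might write the following (a hypothetical sketch; [problems] stands for an assumed list of pairs of a position and a message):

     with_new_category (fun c ->
       List.iter (fun (pos, msg) ->
         signal c [pos] "%s" msg
       ) problems
     )

   Every problem in the list is reported; then, if at least one problem was reported, the program exits with code 1.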
*) (* A category of errors. *) type category (* [new_category()] creates a new category of errors. *) val new_category: unit -> category (* [signal category] is like [error], except it does not exit immediately. It records the fact that an error of this category has occurred. This can be later detected by [exit_if category]. *) val signal: category -> Positions.positions -> ('a, out_channel, unit, unit) format4 -> 'a (* [exit_if category] exits with exit code 1 if [signal category] was previously called. Together, [signal] and [exit_if] allow reporting multiple errors before aborting. *) val exit_if: category -> unit (* [with_new_category f] creates a new category [c], executes [f c], then invokes [exit_if c], and transmits the result returned by [f]. It works also if [f] raises an exception. *) val with_new_category: (category -> 'b) -> 'b (* ---------------------------------------------------------------------------- *) (* Certain warnings about the grammar can optionally be treated as errors. *) val grammatical_error: category (* [grammar_warning] emits a warning or error message, via either [warning] or [signal grammatical_error]. It does not stop the program; the client must at some point use [exit_if grammatical_error] and stop the program if any errors have been reported. *) val grammar_warning: Positions.positions -> ('a, out_channel, unit, unit) format4 -> 'a menhir-20210929/src/expandTokenAliases.ml000066400000000000000000000130371412503066000201520ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Syntax (* We first build an alias map, which records the token aliases declared across all partial grammars. This is a map of aliases to pairs of a terminal symbol and the position where this symbol is declared. Then, we walk the partial grammars before they are joined, expanding the token aliases along the way. *) type aliasmap = (terminal * Positions.t) StringMap.t (* -------------------------------------------------------------------------- *) (* Extend an alias map with the token aliases present in a declaration. *) let collect_aliases_from_declaration (aliasmap : aliasmap) decl : aliasmap = match Positions.value decl with | DToken (_, id, Some qid, _) -> begin match StringMap.find qid aliasmap with | exception Not_found -> (* Good: this alias does not exist yet. Record it. *) StringMap.add qid (id, Positions.position decl) aliasmap | id0, pos -> (* Oops: [qid] has already been declared as an alias for some other token. *) Error.error [Positions.position decl; pos] "%s cannot be declared as an alias for the symbol %s.\n\ It has already been declared as an alias for %s." qid id id0 end | _ -> aliasmap (* Extend an alias map with the token aliases present in a partial grammar. 
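This is a straightforward fold of [collect_aliases_from_declaration] over the grammar's declarations.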
*) let collect_aliases_from_grammar aliasmap g = List.fold_left collect_aliases_from_declaration aliasmap g.pg_declarations let collect_aliases_from_grammars gs : aliasmap = List.fold_left collect_aliases_from_grammar StringMap.empty gs (* -------------------------------------------------------------------------- *) (* Expand a possible alias, returning a name which definitely is not an alias (and may or may not be a valid terminal symbol). *) let dealias_terminal (aliasmap : aliasmap) pos (t : terminal) : terminal = (* [t] is either a terminal symbol or a token alias. If it starts with a double quote, then it must be a token alias. *) if t.[0] = '"' then match StringMap.find t aliasmap with | id, _ -> id | exception Not_found -> Error.error [pos] "the token alias %s was never declared." t else t (* Perform alias expansion throughout a partial grammar. (Visitors could be useful here!) *) let dealias_symbol aliasmap (sym : terminal Positions.located) = Positions.pmap (dealias_terminal aliasmap) sym let rec dealias_parameter aliasmap (param : parameter) = match param with | ParameterVar sym -> ParameterVar (dealias_symbol aliasmap sym) | ParameterApp (sym, params) -> ParameterApp ( dealias_symbol aliasmap sym, dealias_parameters aliasmap params ) | ParameterAnonymous branches -> ParameterAnonymous (Positions.map (dealias_branches aliasmap) branches) and dealias_parameters aliasmap params = List.map (dealias_parameter aliasmap) params and dealias_producer aliasmap (producer : producer) = let id, param, attrs = producer in id, (dealias_parameter aliasmap param), attrs and dealias_producers aliasmap producers = List.map (dealias_producer aliasmap) producers and dealias_branch aliasmap (branch : parameterized_branch) = { branch with pr_producers = dealias_producers aliasmap branch.pr_producers } and dealias_branches aliasmap branches = List.map (dealias_branch aliasmap) branches let dealias_rule aliasmap rule = { rule with pr_branches = dealias_branches aliasmap rule.pr_branches } let dealias_decl aliasmap (decl : declaration Positions.located) = Positions.pmap (fun pos (decl : declaration) -> match decl with | DCode _ | DParameter _ | DToken _ | DStart _ | DGrammarAttribute _ -> decl | DTokenProperties (t, assoc, prec) -> DTokenProperties (dealias_terminal aliasmap pos t, assoc, prec) | DType (ty, param) -> DType (ty, dealias_parameter aliasmap param) | DSymbolAttributes (params, attrs) -> DSymbolAttributes (dealias_parameters aliasmap params, attrs) | DOnErrorReduce (param, level) -> DOnErrorReduce (dealias_parameter aliasmap param, level) ) decl let dealias_grammar aliasmap g = { g with pg_declarations = List.map (dealias_decl aliasmap) g.pg_declarations; pg_rules = List.map (dealias_rule aliasmap) g.pg_rules } let dealias_grammars aliasmap gs = List.map (dealias_grammar aliasmap) gs (* -------------------------------------------------------------------------- *) (* The two phases above are combined as follows. *) let dealias_grammars gs = let aliasmap = collect_aliases_from_grammars gs in dealias_grammars aliasmap gs menhir-20210929/src/expandTokenAliases.mli000066400000000000000000000030271412503066000203210ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. 
This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) (* Token aliases are quoted strings that are used to provide syntactic sugar for terminal symbols, for example, to allow "+" to be used in grammar rules instead of PLUS, or to allow ")" instead of RPAREN. *) (* This transformation eliminates all references to token aliases in a list of partial grammars. (An alias declared in one partial grammar can be used in another partial grammar.) Declarations of token aliases are preserved, and could be used if desired (e.g. for printing). *) open Syntax val dealias_grammars: partial_grammar list -> partial_grammar list menhir-20210929/src/front.ml000066400000000000000000000207071412503066000155220ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) (* The front-end. This module performs a series of toplevel side effects. *) (* ------------------------------------------------------------------------- *) (* Reading a grammar from a file. *) let load_grammar_from_contents filename contents = InputFile.new_input_file filename; InputFile.with_file_contents contents (fun () -> let open Lexing in let lexbuf = Lexing.from_string contents in lexbuf.lex_curr_p <- { lexbuf.lex_curr_p with pos_fname = filename }; (* the grammar: *) { (Driver.grammar Lexer.main lexbuf) with Syntax.pg_filename = filename } ) let check_filename filename = let validExt = if Settings.coq then ".vy" else ".mly" in if not (Filename.check_suffix filename validExt) then Error.error [] "argument file names should end in %s. \"%s\" is not accepted." validExt filename let load_grammar_from_file filename : Syntax.partial_grammar = check_filename filename; try let contents = IO.read_whole_file filename in load_grammar_from_contents filename contents with Sys_error msg -> Error.error [] "%s" msg (* ------------------------------------------------------------------------- *) (* Read all of the grammar files that are named on the command line, plus the standard library, unless suppressed by [--no-stdlib] or [--coq]. *) let grammars () : Syntax.partial_grammar list = List.map load_grammar_from_file Settings.filenames let grammars : Syntax.partial_grammar list = if Settings.no_stdlib || Settings.coq then grammars() else (* As of 20190924, the standard library is no longer actually read from a file. Instead, its text is built into the Menhir executable: it is found in the string [Standard_mly.contents]. We parse it just as if it had been read from a file, and pretend that the file name is [Settings.stdlib_filename]. This file name can appear in generated parsers, because Menhir produces # directives that point back to source (.mly) files. *) (* Note that the [let] construct below is required in order to ensure that the standard library is read first.
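(In the expression [standard_library :: grammars()] below, the evaluation order of the two subexpressions is unspecified in OCaml; without this [let], the user's grammars could be read before the standard library.)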
*) let standard_library = load_grammar_from_contents Settings.stdlib_filename Standard_mly.contents in standard_library :: grammars() let () = Time.tick "Lexing and parsing" (* ------------------------------------------------------------------------- *) (* Eliminate anonymous rules. *) let grammars : Syntax.partial_grammar list = List.map Anonymous.transform_partial_grammar grammars (* ------------------------------------------------------------------------- *) (* If several grammar files were specified, merge them. *) let grammar : Syntax.grammar = PartialGrammar.join_partial_grammars grammars (* ------------------------------------------------------------------------- *) (* Check that the grammar is well-sorted; infer the sort of every symbol. *) let sorts = SortInference.infer grammar (* ------------------------------------------------------------------------- *) (* Expand away all applications of parameterized nonterminal symbols, so as to obtain a grammar without parameterized nonterminal symbols. *) let grammar : BasicSyntax.grammar = let module S = SelectiveExpansion in (* First, perform a selective expansion: expand away all parameters of higher sort, keeping the parameters of sort [*]. This process always terminates. *) let grammar1 = S.expand S.ExpandHigherSort sorts grammar in (* This "first-order parameterized grammar" can then be submitted to the termination check. *) CheckSafeParameterizedGrammar.check grammar1; (* If it passes the check, then full expansion is safe. We drop [grammar1] and start over from [grammar]. This is required in order to get correct names. (Expanding [grammar1] would yield an equivalent grammar, with more complicated names, reflecting the two steps of expansion.) *) let grammar = S.expand S.ExpandAll sorts grammar in (* This yields an unparameterized grammar. *) Drop.drop grammar let () = Time.tick "Joining and expanding" (* ------------------------------------------------------------------------- *) (* If [--only-tokens] was specified on the command line, produce the definition of the [token] type and stop. *) let () = TokenType.produce_tokentypes grammar (* ------------------------------------------------------------------------- *) (* Perform reachability analysis. *) let grammar = Reachability.trim grammar let () = Time.tick "Trimming" (* ------------------------------------------------------------------------- *) (* If [--infer] was specified on the command line, perform type inference. The OCaml type of every nonterminal symbol is then known. *) (* If [--depend] or [--raw-depend] was specified on the command line, perform dependency analysis and stop. *) (* The purpose of [--depend] and [--raw-depend] is to support [--infer]. Indeed, [--infer] is implemented by producing a mock [.ml] file (which contains just the semantic actions) and invoking [ocamlc]. This requires certain [.cmi] files to exist. So, [--(raw-)depend] is a way for us to announce which [.cmi] files we need. It is implemented by producing the mock [.ml] file and running [ocamldep] on it. We also produce a mock [.mli] file, even though in principle it should be unnecessary -- see comment in [nonterminalType.mli]. *) (* If [--infer-write-query] was specified on the command line, write a mock [.ml] file and stop. It is then up to the user (or build system) to invoke [ocamlc -i] on this file, so as to do type inference. *) (* If [--infer-read-reply] was specified on the command line, read the inferred [.mli] file. 
The OCaml type of every nonterminal symbol is then known, just as with [--infer]. *) let grammar, ocaml_types_have_been_checked = Settings.(match infer with | IMNone -> grammar, false | IMInfer -> let grammar = Infer.infer grammar in Time.tick "Inferring types for nonterminals"; grammar, true | IMDependRaw -> Infer.depend false grammar (* never returns *) | IMDependPostprocess -> Infer.depend true grammar (* never returns *) | IMWriteQuery filename -> Infer.write_query filename grammar (* never returns *) | IMReadReply filename -> let grammar = Infer.read_reply filename grammar in Time.tick "Reading inferred types for nonterminals"; grammar, true ) (* ------------------------------------------------------------------------- *) (* Expand away some of the position keywords. *) let grammar = KeywordExpansion.expand_grammar grammar (* ------------------------------------------------------------------------- *) (* If [--no-inline] was specified on the command line, skip the inlining of nonterminal definitions marked with %inline. *) let grammar = if Settings.inline then begin let grammar = Inlining.inline grammar in (* 2018/05/23 Removed the warning that was issued when %inline was used but --infer was turned off. Most people should use ocamlbuild or dune anyway. *) Time.tick "Inlining"; grammar end else grammar (* ------------------------------------------------------------------------- *) (* If [--only-preprocess] or [--only-preprocess-drop] was specified on the command line, print the grammar and stop. Otherwise, continue. *) let () = match Settings.preprocess_mode with | Settings.PMOnlyPreprocess mode -> BasicPrinter.print mode stdout grammar; exit 0 | Settings.PMNormal -> () menhir-20210929/src/front.mli000066400000000000000000000036711412503066000156730ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) (* This module drives the front-end. It opens and parses the input files, which yields a number of partial grammars. It joins these grammars, expands them to get rid of parameterized nonterminals, and performs reachability analysis. This yields a single unified grammar. It then performs type inference. This yields the grammar that the back-end works with (often through the interface provided by module [Grammar]). *) val grammar: BasicSyntax.grammar (* This flag tells whether the semantic actions have been type-checked. It is set if and only if either [--infer] or [--infer-read-reply] is in use. Note that the presence of a %type declaration for every nonterminal symbol is *not* sufficient for this flag to be set. Note also that, when [--infer-read-reply] is set, it could be the case that we have an out-of-date inferred [.mli] file, so the semantic actions could still be ill-typed. (The user is then at fault.) 
*) val ocaml_types_have_been_checked: bool menhir-20210929/src/gMap.ml000066400000000000000000000151321412503066000152520ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) module type S = sig (* Keys are assumed to have a natural total order. *) type key (* The type of maps whose data have type ['a]. *) type 'a t (* The empty map. *) val empty: 'a t (* [lookup k m] looks up the value associated to the key [k] in the map [m], and raises [Not_found] if no value is bound to [k]. *) val lookup: key -> 'a t -> 'a val find: key -> 'a t -> 'a (* [add k d m] returns a map whose bindings are all bindings in [m], plus a binding of the key [k] to the datum [d]. If a binding already exists for [k], it is overridden. *) val add: key -> 'a -> 'a t -> 'a t (* [strict_add k d m] returns a map whose bindings are all bindings in [m], plus a binding of the key [k] to the datum [d]. If a binding already exists for [k] then [Unchanged] is raised. *) exception Unchanged val strict_add: key -> 'a -> 'a t -> 'a t (* [fine_add decide k d m] returns a map whose bindings are all bindings in [m], plus a binding of the key [k] to the datum [d]. If a binding from [k] to [d0] already exists, then the resulting map contains a binding from [k] to [decide d0 d]. *) type 'a decision = 'a -> 'a -> 'a val fine_add: 'a decision -> key -> 'a -> 'a t -> 'a t (* [mem k m] tells whether the key [k] appears in the domain of the map [m]. *) val mem: key -> 'a t -> bool (* [singleton k d] returns a map whose only binding is from [k] to [d]. *) val singleton: key -> 'a -> 'a t (* [is_empty m] returns [true] if and only if the map [m] defines no bindings at all. *) val is_empty: 'a t -> bool (* [is_singleton s] returns [Some x] if [s] is a singleton containing [x] as its only element; otherwise, it returns [None]. *) val is_singleton: 'a t -> (key * 'a) option (* [cardinal m] returns [m]'s cardinal, that is, the number of keys it binds, or, in other words, the cardinal of its domain. *) val cardinal: 'a t -> int (* [choose m] returns an arbitrarily chosen binding in [m], if [m] is nonempty, and raises [Not_found] otherwise. *) val choose: 'a t -> key * 'a (* [lookup_and_remove k m] looks up the value [v] associated to the key [k] in the map [m], and raises [Not_found] if no value is bound to [k]. The call returns the value [v], together with the map [m] deprived from the binding from [k] to [v]. *) val lookup_and_remove: key -> 'a t -> 'a * 'a t val find_and_remove: key -> 'a t -> 'a * 'a t (* [remove k m] is the map [m] deprived from any binding for [k]. *) val remove: key -> 'a t -> 'a t (* [union m1 m2] returns the union of the maps [m1] and [m2]. Bindings in [m2] take precedence over those in [m1]. *) val union: 'a t -> 'a t -> 'a t (* [fine_union decide m1 m2] returns the union of the maps [m1] and [m2]. If a key [k] is bound to [x1] (resp. [x2]) within [m1] (resp. [m2]), then [decide] is called. It is passed [x1] and [x2], and must return the value that shall be bound to [k] in the final map. 
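For example (a hypothetical use), if the data are integer counts, then [fine_union (+) m1 m2] merges two multisets, adding up the multiplicities of the keys that occur in both maps.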
*) val fine_union: 'a decision -> 'a t -> 'a t -> 'a t (* [iter f m] invokes [f k x], in turn, for each binding from key [k] to element [x] in the map [m]. Keys are presented to [f] in increasing order. *) val iter: (key -> 'a -> unit) -> 'a t -> unit (* [fold f m seed] invokes [f k d accu], in turn, for each binding from key [k] to datum [d] in the map [m]. Keys are presented to [f] in increasing order. The initial value of [accu] is [seed]; then, at each new call, its value is the value returned by the previous invocation of [f]. The value returned by [fold] is the final value of [accu]. *) val fold: (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b (* [fold_rev] performs exactly the same job as [fold], but presents keys to [f] in the opposite order. *) val fold_rev: (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b (* [filter f m] returns a copy of the map [m] where only the bindings that satisfy [f] have been retained. *) val filter: (key -> 'a -> bool) -> 'a t -> 'a t (* It is valid to evaluate [iter2 f m1 m2] if and only if [m1] and [m2] have equal domains. Doing so invokes [f k x1 x2], in turn, for each key [k] bound to [x1] in [m1] and to [x2] in [m2]. Bindings are presented to [f] in increasing order. *) val iter2: (key -> 'a -> 'b -> unit) -> 'a t -> 'b t -> unit (* [map f m] returns the map obtained by composing the map [m] with the function [f]; that is, the map $k\mapsto f(m(k))$. *) val map: ('a -> 'b) -> 'a t -> 'b t (* [endo_map] is similar to [map], but attempts to physically share its result with its input. This saves memory when [f] is the identity function. *) val endo_map: ('a -> 'a) -> 'a t -> 'a t (* If [dcompare] is an ordering over data, then [compare dcompare] is an ordering over maps. *) val compare: ('a -> 'a -> int) -> 'a t -> 'a t -> int (* A map's domain is a set. Thus, to be able to perform operations on domains, we need set operations, provided by the [Domain] sub-module. The two-way connection between maps and their domains is given by two additional functions, [domain] and [lift]. [domain m] returns [m]'s domain. [lift f s] returns the map $k\mapsto f(k)$, where $k$ ranges over a set of keys [s]. *) module Domain : GSet.S with type element = key val domain: 'a t -> Domain.t val lift: (key -> 'a) -> Domain.t -> 'a t (* [corestrict m d] performs a co-restriction of the map [m] to the domain [d]. That is, it returns the map $k\mapsto m(k)$, where $k$ ranges over all keys bound in [m] but \emph{not} present in [d]. *) val corestrict: 'a t -> Domain.t -> 'a t end menhir-20210929/src/gSet.ml000066400000000000000000000071111412503066000152660ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This signature describes several implementations of sets, including [Patricia], [AtomicBitSet], and [SparseBitSet]. *) module type S = sig (* Elements are assumed to have a natural total order. *) type element (* Sets. *) type t (* The empty set. *) val empty: t (* [is_empty s] tells whether [s] is the empty set. *) val is_empty: t -> bool (* [singleton x] returns a singleton set containing [x] as its only element. 
*) val singleton: element -> t (* [is_singleton s] tests whether [s] is a singleton set. *) val is_singleton: t -> bool (* [cardinal s] returns the cardinal of [s]. *) val cardinal: t -> int (* [choose s] returns an arbitrarily chosen element of [s], if [s] is nonempty, and raises [Not_found] otherwise. *) val choose: t -> element (* [mem x s] returns [true] if and only if [x] appears in the set [s]. *) val mem: element -> t -> bool (* [add x s] returns a set whose elements are all elements of [s], plus [x]. *) val add: element -> t -> t (* [remove x s] returns a set whose elements are all elements of [s], except [x]. *) val remove: element -> t -> t (* [union s1 s2] returns the union of the sets [s1] and [s2]. *) val union: t -> t -> t (* [inter s t] returns the set intersection of [s] and [t], that is, $s\cap t$. *) val inter: t -> t -> t (* [disjoint s1 s2] returns [true] if and only if the sets [s1] and [s2] are disjoint, i.e. iff their intersection is empty. *) val disjoint: t -> t -> bool (* [iter f s] invokes [f x], in turn, for each element [x] of the set [s]. Elements are presented to [f] in increasing order. *) val iter: (element -> unit) -> t -> unit (* [fold f s seed] invokes [f x accu], in turn, for each element [x] of the set [s]. Elements are presented to [f] in increasing order. The initial value of [accu] is [seed]; then, at each new call, its value is the value returned by the previous invocation of [f]. The value returned by [fold] is the final value of [accu]. In other words, if $s = \{ x_1, x_2, \ldots, x_n \}$, where $x_1 < x_2 < \ldots < x_n$, then [fold f s seed] computes $([f]\,x_n\,\ldots\,([f]\,x_2\,([f]\,x_1\,[seed]))\ldots)$. *) val fold: (element -> 'b -> 'b) -> t -> 'b -> 'b (* [elements s] is a list of all elements in the set [s]. *) val elements: t -> element list (* [compare] is an ordering over sets. *) val compare: t -> t -> int (* [equal] implements equality over sets. *) val equal: t -> t -> bool (* [subset] implements the subset predicate over sets. *) val subset: (t -> t -> bool) end menhir-20210929/src/grammar.ml000066400000000000000000000022061412503066000160120ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module runs the grammar functor on the grammar produced by the front-end. *) include GrammarFunctor.Make(struct let grammar = Front.grammar let verbose = true end)() menhir-20210929/src/grammarFunctor.ml000066400000000000000000001603731412503066000173650ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open BasicSyntax open Syntax open Positions module Make (G : sig (* An abstract syntax tree for the grammar. 
*) val grammar: BasicSyntax.grammar (* This flag indicates whether it is OK to produce warnings, verbose information, etc., when this functor is invoked. If it is set to [false], then only serious errors can be signaled. *) val verbose: bool end) () = struct open G (* ------------------------------------------------------------------------ *) (* [index] indexes a list of (distinct) strings, that is, assigns an integer index to each string and builds mappings both ways between strings and indices. *) let index (strings : string list) : int * string array * int StringMap.t = let name = Array.of_list strings and n, map = List.fold_left (fun (n, map) s -> n+1, StringMap.add s n map ) (0, StringMap.empty) strings in n, name, map (* ------------------------------------------------------------------------ *) (* Precedence levels for tokens or pseudo-tokens alike. *) module TokPrecedence = struct (* This set records, on a token by token basis, whether the token's precedence level is ever useful. This allows emitting warnings about useless precedence declarations. *) let ever_useful : StringSet.t ref = ref StringSet.empty let use id = ever_useful := StringSet.add id !ever_useful (* This function is invoked when someone wants to consult a token's precedence level. This does not yet mean that this level is useful, though. Indeed, if it is subsequently compared against [UndefinedPrecedence], it will not allow solving a conflict. So, in addition to the desired precedence level, we return a delayed computation which, when evaluated, records that this precedence level was useful. *) let levelip id properties = lazy (use id), properties.tk_precedence let leveli id = let properties = try StringMap.find id grammar.tokens with Not_found -> assert false (* well-formedness check has been performed earlier *) in levelip id properties (* This function prints warnings about useless precedence declarations for terminal symbols (%left, %right, %nonassoc). It should be invoked only after the automaton has been constructed. *) let diagnostics () = StringMap.iter (fun id properties -> if not (StringSet.mem id !ever_useful) then match properties.tk_precedence with | UndefinedPrecedence -> () | PrecedenceLevel (_, _, pos1, pos2) -> Error.grammar_warning [Positions.import (pos1, pos2)] "the precedence level assigned to %s is never useful." id ) grammar.tokens end (* ------------------------------------------------------------------------ *) (* Nonterminals. *) module Nonterminal = struct type t = int let n2i i = i let equal (nt1 : t) (nt2 : t) = nt1 = nt2 let compare = (-) (* Determine how many nonterminals we have and build mappings both ways between names and indices. A new nonterminal is created for every start symbol. 
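For instance, a start symbol [expr] gives rise to a fresh nonterminal [expr'], which serves as the left-hand side of a new start production [expr' -> expr]; see the module [Production] below.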
*) let new_start_nonterminals = StringSet.fold (fun symbol ss -> (symbol ^ "'") :: ss) grammar.start_symbols [] let original_nonterminals = nonterminals grammar let start = List.length new_start_nonterminals let (n : int), (name : string array), (map : int StringMap.t) = index (new_start_nonterminals @ original_nonterminals) let () = if verbose then Error.logG 1 (fun f -> Printf.fprintf f "Grammar has %d nonterminal symbols, among which %d start symbols.\n" (n - start) start ) let is_internal_start nt = nt < start let is_user_start nt = StringSet.mem name.(nt) grammar.start_symbols let print normalize nt = if normalize then Misc.normalize name.(nt) else name.(nt) let lookup name = StringMap.find name map let positions nt = (StringMap.find (print false nt) grammar.rules).positions let init f = Array.init n f let iter f = Misc.iteri n f let fold f accu = Misc.foldi n f accu let map f = Misc.mapi n f let iterx f = for nt = start to n - 1 do f nt done let foldx f accu = Misc.foldij start n f accu let ocamltype nt = assert (not (is_internal_start nt)); try Some (StringMap.find (print false nt) grammar.types) with Not_found -> None let ocamltype_of_start_symbol nt = match ocamltype nt with | Some typ -> typ | None -> (* Every start symbol has a type. *) assert false let tabulate f = Array.get (Array.init n f) let attributes : Syntax.attributes array = Array.make n [] let () = StringMap.iter (fun nonterminal { attributes = attrs } -> let nt = lookup nonterminal in attributes.(nt) <- attrs ) grammar.rules let attributes nt = attributes.(nt) end (* Sets and maps over nonterminals. *) module NonterminalMap = Patricia.Big module NonterminalSet = Patricia.Big.Domain (* ------------------------------------------------------------------------ *) (* Terminals. *) module Terminal = struct type t = int let t2i i = i let i2t i = i let compare = (-) let equal (tok1 : t) (tok2 : t) = tok1 = tok2 (* Determine how many terminals we have and build mappings both ways between names and indices. A new terminal "#" is created. A new terminal "error" is created. The fact that the integer code assigned to the "#" pseudo-terminal is the last one is exploited in the table-based back-end. (The right-most row of the action table is not created.) Pseudo-tokens (used in %prec declarations, but never declared using %token) are filtered out. *) (* In principle, the number of the [error] token is irrelevant. It is currently 0, but we do not rely on that. *) let (n : int), (name : string array), (map : int StringMap.t) = let tokens = tokens grammar in match tokens with | [] when verbose -> Error.error [] "no tokens have been declared." 
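(* When [verbose] is [false], an empty list of tokens is tolerated: we fall through to the case below and index just the pseudo-tokens [error] and [#]. *)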
| _ -> index ("error" :: tokens @ [ "#" ]) let print tok = name.(tok) let lookup name = StringMap.find name map let sharp = lookup "#" let error = lookup "error" let pseudo tok = (tok = sharp) || (tok = error) let real t = error <> t && t <> sharp let non_error tok = tok <> error let token_properties = let not_so_dummy_properties = (* applicable to [error] and [#] *) { tk_filename = "__primitives__"; tk_precedence = UndefinedPrecedence; tk_associativity = UndefinedAssoc; tk_ocamltype = None; tk_is_declared = true; tk_position = Positions.dummy; tk_attributes = []; tk_alias = None; } in Array.init n (fun tok -> try StringMap.find name.(tok) grammar.tokens with Not_found -> assert (tok = sharp || tok = error); not_so_dummy_properties ) let () = if verbose then Error.logG 1 (fun f -> Printf.fprintf f "Grammar has %d terminal symbols.\n" (n - 2) ) let precedence_level tok = TokPrecedence.levelip (print tok) token_properties.(tok) let associativity tok = token_properties.(tok).tk_associativity let ocamltype tok = token_properties.(tok).tk_ocamltype let init f = Array.init n f let iter f = Misc.iteri n f let fold f accu = Misc.foldi n f accu let map f = Misc.mapi n f let () = assert (sharp = n - 1) let foldx f accu = Misc.foldi sharp f accu let mapx f = Misc.mapi sharp f let () = assert (error = 0) let iter_real f = for i = 1 to n-2 do f i done let tokens_without_an_alias = let accu = ref [] in iter_real begin fun tok -> let properties = token_properties.(tok) in if properties.tk_alias = None then accu := tok :: !accu end; List.rev !accu let () = if verbose && Settings.require_aliases then tokens_without_an_alias |> List.iter begin fun tok -> let properties = token_properties.(tok) in let pos = properties.tk_position in Error.grammar_warning [pos] "no alias has been defined for the token %s." (print tok) end let every_token_has_an_alias = tokens_without_an_alias = [] let alias tok = token_properties.(tok).tk_alias let unquoted_alias tok = alias tok |> Option.map (fun qid -> assert (qid.[0] = '"'); (* Skip the opening quote and decode the remainder using the lexer [decode_string]. *) let qid = String.sub qid 1 (String.length qid - 1) in let lexbuf = Lexing.from_string qid in Misc.with_buffer 8 (fun b -> Lexer.decode_string b lexbuf) ) let print_concrete t = match unquoted_alias t with | Some alias -> alias | None -> (* An alias is missing. Use the abstract name of the terminal instead. This is a best effort. *) print t (* If a token named [EOF] exists, then it is assumed to represent ocamllex's [eof] pattern. *) let eof = try Some (lookup "EOF") with Not_found -> None let attributes tok = token_properties.(tok).tk_attributes (* The sub-module [Word] offers an implementation of words (that is, sequences) of terminal symbols. It is used by [LRijkstra]. We make it a functor, because it has internal state (a hash table) and a side effect (failure if there are more than 256 terminal symbols). *) module Word (X : sig end) = struct (* We could use lists, or perhaps the sequences offered by the module [Seq], which support constant time concatenation. However, we need a much more compact representation: [LRijkstra] stores tens of millions of such words. We use strings, because they are very compact (8 bits per symbol), and on top of that, we use a hash-consing facility. In practice, hash-consing allows us to save 1000x in space. *) (* A drawback of this approach is that it works only if the number of terminal symbols is at most 256. For the moment, this is good enough. 
[LRijkstra] already has difficulty at 100 terminal symbols or so. *) let () = assert (n <= 256) let (encode : string -> int), (decode : int -> string), verbose = Misc.new_encode_decode 1024 type word = int let epsilon = encode "" let singleton t = encode (String.make 1 (Char.chr t)) let append i1 i2 = let w1 = decode i1 and w2 = decode i2 in if String.length w1 = 0 then i2 else if String.length w2 = 0 then i1 else encode (w1 ^ w2) let length i = String.length (decode i) let first i z = let w = decode i in if String.length w > 0 then Char.code w.[0] else z let rec elements i n w = if i = n then [] else Char.code w.[i] :: elements (i + 1) n w let elements i = let w = decode i in elements 0 (String.length w) w let print i = let w = decode i in Misc.separated_iter_to_string (fun c -> print (Char.code c)) " " (fun f -> String.iter f w) (* [Generic.compare] implements a lexicographic ordering on strings. *) let compare i1 i2 = Generic.compare (decode i1) (decode i2) end end (* Sets of terminals are used intensively in the LR(1) construction, so it is important that they be as efficient as possible. *) module TerminalSet = struct (* 2020/01/29: use [BoundedBitSet] instead of [SparseBitSet]. *) include BoundedBitSet.Make(Terminal)() let print toks = Misc.separated_iter_to_string Terminal.print " " (fun f -> iter f toks) let universe = remove Terminal.sharp ( remove Terminal.error ( Terminal.fold add empty ) ) (* The following definitions are used in the computation of FIRST sets below. They are not exported outside of this file. *) type property = t let bottom = empty let is_maximal _ = false let leq_join = union end (* Maps over terminals. *) module TerminalMap = Patricia.Big (* ------------------------------------------------------------------------ *) (* Symbols. *) module Symbol = struct type t = | N of Nonterminal.t | T of Terminal.t let is_terminal sym = match sym with | N _ -> false | T _ -> true let compare sym1 sym2 = match sym1, sym2 with | N nt1, N nt2 -> Nonterminal.compare nt1 nt2 | T tok1, T tok2 -> Terminal.compare tok1 tok2 | N _, T _ -> 1 | T _, N _ -> -1 let equal sym1 sym2 = compare sym1 sym2 = 0 let rec lequal syms1 syms2 = match syms1, syms2 with | [], [] -> true | sym1 :: syms1, sym2 :: syms2 -> equal sym1 sym2 && lequal syms1 syms2 | _ :: _, [] | [], _ :: _ -> false let non_error sym = match sym with | T tok -> Terminal.non_error tok | N _ -> true let print = function | N nt -> Nonterminal.print false nt | T tok -> Terminal.print tok let nonterminal = function | T _ -> false | N _ -> true (* Printing an array of symbols. [offset] is the start offset -- we print everything to its right. [dot] is the dot offset -- we print a dot at this offset, if we find it. Note that [dot] can be equal to [length]. *) let buffer = Buffer.create 1024 let printaod offset dot symbols = let length = Array.length symbols in let first = ref true in let separate () = if not !first then Printf.bprintf buffer " "; first := false in for i = offset to length do if i = dot then begin separate(); Printf.bprintf buffer "." 
end; if i < length then begin separate(); Printf.bprintf buffer "%s" (print symbols.(i)) end done; let s = Buffer.contents buffer in Buffer.clear buffer; s let printao offset symbols = printaod offset (-1) symbols let printa symbols = printao 0 symbols let printl symbols = printa (Array.of_list symbols) let lookup name = try T (Terminal.lookup name) with Not_found -> try N (Nonterminal.lookup name) with Not_found -> assert false (* well-formedness check has been performed earlier *) end (* Sets of symbols. *) module SymbolSet = struct include Set.Make(Symbol) let print symbols = Symbol.printl (elements symbols) (* The following definitions are used in the computation of symbolic FOLLOW sets below. They are not exported outside of this file. *) type property = t let bottom = empty let leq = subset let join = union end (* Maps over symbols. *) module SymbolMap = struct include Map.Make(Symbol) let domain m = fold (fun symbol _ accu -> symbol :: accu ) m [] let init f xs = List.fold_left (fun accu x -> add x (f x) accu ) empty xs let purelynonterminal m = fold (fun symbol _ accu -> accu && Symbol.nonterminal symbol ) m true end (* ------------------------------------------------------------------------ *) (* Productions. *) module Production = struct type index = int let compare = (-) (* A new production S' -> S is created for every start symbol S. It is known as a start production. *) (* Count how many productions we have, including the start productions. This is [n]. *) let n : int = let n = StringMap.fold (fun _ { branches = branches } n -> n + List.length branches ) grammar.rules 0 in if verbose then Error.logG 1 (fun f -> Printf.fprintf f "Grammar has %d productions.\n" n); n + StringSet.cardinal grammar.start_symbols let p2i prod = prod let i2p prod = assert (prod >= 0 && prod < n); prod (* Create a number of uninitialized tables that map a production index to information about this production. *) (* [table] maps a production to the left-hand side and right-hand side of this production. [identifiers] maps a production to an array of the identifiers that are used to name the elements of the right-hand side. [actions] maps a production to an optional semantic action. (Only the start productions have none.) [positions] maps a production to an array of the positions (in the .mly file) of the elements of the right-hand side. [rhs_attributes] maps a production to an array of the attributes attached to the elements of the right-hand side. [prec_decl] maps a production to an optional [%prec] annotation. [production_level] maps a production to a production level (see [ParserAux]). *) let table : (Nonterminal.t * Symbol.t array) array = Array.make n (-1, [||]) let identifiers : identifier array array = Array.make n [||] let actions : action option array = Array.make n None let positions : Positions.t list array = Array.make n [] let rhs_attributes : Syntax.attributes array array = Array.make n [||] let prec_decl : symbol located option array = Array.make n None let production_level : branch_production_level array = (* The start productions receive a level that pretends that they originate in a fictitious "builtin" file. So, a reduce/reduce conflict that involves a start production will not be solved. *) let dummy = ProductionLevel (InputFile.builtin_input_file, 0) in Array.make n dummy (* [ntprods] maps a nonterminal symbol to the interval of its productions. 
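This interval is semi-open: the pair [(k, k')] designates the productions whose index lies between [k] (inclusive) and [k'] (exclusive), as witnessed by the uses of [Misc.iterij] and [Misc.foldij] below.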
*) let ntprods : (int * int) array = Array.make Nonterminal.n (-1, -1) (* This Boolean flag records whether the grammar uses the [error] token. *) let grammar_uses_error_token = ref false (* Create the start productions, populating the above arrays as appropriate. [start] is the number of start productions, therefore also the index of the first non-start production. [startprods] is a mapping of the start symbols to the corresponding start productions. *) let (start : int), (startprods : index NonterminalMap.t) = StringSet.fold (fun nonterminal (k, startprods) -> let nt = Nonterminal.lookup nonterminal and nt' = Nonterminal.lookup (nonterminal ^ "'") in table.(k) <- (nt', [| Symbol.N nt |]); identifiers.(k) <- [| "_1" |]; ntprods.(nt') <- (k, k+1); positions.(k) <- Nonterminal.positions nt; k+1, NonterminalMap.add nt k startprods ) grammar.start_symbols (0, NonterminalMap.empty) (* Create the non-start productions, populating the above arrays. *) let producer_symbol producer = Symbol.lookup (producer_symbol producer) let (_ : int) = StringMap.fold (fun nonterminal { branches } k -> let nt = Nonterminal.lookup nonterminal in let k' = List.fold_left (fun k branch -> let producers = Array.of_list branch.producers in let rhs = Array.map producer_symbol producers in table.(k) <- (nt, rhs); identifiers.(k) <- Array.map producer_identifier producers; actions.(k) <- Some branch.action; rhs_attributes.(k) <- Array.map producer_attributes producers; production_level.(k) <- branch.branch_production_level; prec_decl.(k) <- branch.branch_prec_annotation; positions.(k) <- [ branch.branch_position ]; if not (MArray.for_all Symbol.non_error rhs) then grammar_uses_error_token := true; k+1 ) k branches in ntprods.(nt) <- (k, k'); k' ) grammar.rules start (* Iteration over the productions associated with a specific nonterminal. *) let iternt nt f = let k, k' = ntprods.(nt) in Misc.iterij k k' f let foldnt nt f accu = let k, k' = ntprods.(nt) in Misc.foldij k k' f accu let mapnt nt f = let k, k' = ntprods.(nt) in Misc.mapij k k' f let foldnt_lazy nt f accu = let k, k' = ntprods.(nt) in Misc.foldij_lazy k k' f accu (* Accessors. *) let def prod = table.(prod) let nt prod = let nt, _ = table.(prod) in nt let rhs prod = let _, rhs = table.(prod) in rhs let length prod = Array.length (rhs prod) let identifiers prod = identifiers.(prod) let is_start prod = prod < start let classify prod = if is_start prod then match (rhs prod).(0) with | Symbol.N nt -> Some nt | Symbol.T _ -> assert false else None let action prod = match actions.(prod) with | Some action -> action | None -> (* Start productions have no action. *) assert (is_start prod); assert false let positions prod = positions.(prod) let lhs_attributes prod = Nonterminal.attributes (nt prod) let rhs_attributes prod = rhs_attributes.(prod) let startsymbol2startprod nt = try NonterminalMap.find nt startprods with Not_found -> assert false (* [nt] is not a start symbol *) let error_free prod = MArray.for_all Symbol.non_error (rhs prod) (* Iteration. *) let init f = Array.init n f let iter f = Misc.iteri n f let fold f accu = Misc.foldi n f accu let map f = Misc.mapi n f let amap f = Array.init n f let iterx f = for prod = start to n - 1 do f prod done let foldx f accu = Misc.foldij start n f accu let mapx f = Misc.mapij start n f (* Printing a production. *) let print prod = assert (not (is_start prod)); let nt, rhs = table.(prod) in if Array.length rhs = 0 then (* Avoid producing a trailing space. 
*) Printf.sprintf "%s ->" (Nonterminal.print false nt) else Printf.sprintf "%s -> %s" (Nonterminal.print false nt) (Symbol.printao 0 rhs) let describe gerund prod = match classify prod with | Some nt -> let ending = if gerund then "ing" else "" in Printf.sprintf "accept%s %s" ending (Nonterminal.print false nt) | None -> let ending = if gerund then "ing" else "e" in Printf.sprintf "reduc%s production %s" ending (print prod) (* Tabulation and sum. *) let tabulate f = Misc.tabulate n f let sum f = Misc.sum n f (* This array allows recording, for each %prec declaration, whether it is ever useful. This allows us to emit a warning about useless %prec declarations. *) (* 2015/10/06: We take into account the fact that a %prec declaration can be duplicated by inlining or by the expansion of parameterized non-terminal symbols. Our table is not indexed by productions, but by positions (of %prec declarations in the source). Thus, if a %prec declaration is duplicated, at least one of its copies should be found useful for the warning to be suppressed. *) let ever_useful : (Positions.t, unit) Hashtbl.t = (* assuming that generic hashing and equality on positions are OK *) Hashtbl.create 16 let consult_prec_decl prod = let osym = prec_decl.(prod) in lazy ( Option.iter (fun sym -> (* Mark this %prec declaration as useful. *) let pos = Positions.position sym in Hashtbl.add ever_useful pos () ) osym ), osym (* This function prints warnings about useless precedence declarations for productions (%prec). It should be invoked only after the automaton has been constructed. *) let diagnostics () = iterx (fun prod -> let osym = prec_decl.(prod) in Option.iter (fun sym -> (* Check whether this %prec declaration was useless. *) let pos = Positions.position sym in if not (Hashtbl.mem ever_useful pos) then begin Error.grammar_warning [pos] "this %%prec declaration is never useful."; Hashtbl.add ever_useful pos () (* hack: avoid two warnings at the same position *) end ) osym ) (* Determining the precedence level of a production. If no %prec declaration was explicitly supplied, it is the precedence level of the rightmost terminal symbol in the production's right-hand side. *) type production_level = | PNone | PRightmostToken of Terminal.t | PPrecDecl of symbol let rightmost_terminal prod = Array.fold_left (fun accu symbol -> match symbol with | Symbol.T tok -> PRightmostToken tok | Symbol.N _ -> accu ) PNone (rhs prod) let combine e1 e2 = lazy (Lazy.force e1; Lazy.force e2) let precedence prod = let fact1, prec_decl = consult_prec_decl prod in let oterminal = match prec_decl with | None -> rightmost_terminal prod | Some { value = terminal } -> PPrecDecl terminal in match oterminal with | PNone -> fact1, UndefinedPrecedence | PRightmostToken tok -> let fact2, level = Terminal.precedence_level tok in combine fact1 fact2, level | PPrecDecl id -> let fact2, level = TokPrecedence.leveli id in combine fact1 fact2, level end let grammar_uses_error_token = !Production.grammar_uses_error_token (* ------------------------------------------------------------------------ *) (* Maps over productions. *) module ProductionMap = struct include Patricia.Big (* Iteration over the start productions only. *) let start f = Misc.foldi Production.start (fun prod m -> add prod (f prod) m ) empty end (* ------------------------------------------------------------------------ *) (* Support for analyses of the grammar, expressed as fixed point computations. We exploit the generic fixed point algorithm in [Fix]. 
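Each of the analyses that follow (NONEMPTY, NULLABLE, FIRST, MINIMAL) is an instance of the functor [GenericAnalysis] below: it is specified by a lattice of properties together with an interpretation of terminal symbols, of disjunction (a choice between the productions of a nonterminal symbol), of conjunction (the sequencing of symbols inside a production), and of the empty sequence.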
*) (* We perform memoization only at nonterminal symbols. We assume that the analysis of a symbol is the analysis of its definition (as opposed to, say, a computation that depends on the occurrences of this symbol in the grammar). *) module GenericAnalysis (P : Fix.PROPERTY) (S : sig open P (* An analysis is specified by the following functions. *) (* [shortcut] can be used to map a nonterminal symbol to a property. In that case, the definition of this nonterminal symbol is ignored. *) val shortcut: Nonterminal.t -> property option (* [terminal] maps a terminal symbol to a property. *) val terminal: Terminal.t -> property (* [disjunction] abstracts a binary alternative. That is, when we analyze an alternative between several productions, we compute a property for each of them independently, then we combine these properties using [disjunction]. *) val disjunction: property -> (unit -> property) -> property (* [P.bottom] should be a neutral element for [disjunction]. We use it in the analysis of an alternative with zero branches. *) (* [conjunction] abstracts a binary sequence. That is, when we analyze a sequence, we compute a property for each member independently, then we combine these properties using [conjunction]. In general, conjunction needs access to the first member of the sequence (a symbol), not just to its analysis (a property). *) val conjunction: Symbol.t -> property -> (unit -> property) -> property (* [epsilon] abstracts the empty sequence. It should be a neutral element for [conjunction]. *) val epsilon: property end) : sig open P (* The results of the analysis take the following form. *) (* To every nonterminal symbol, we associate a property. *) val nonterminal: Nonterminal.t -> property (* To every symbol, we associate a property. *) val symbol: Symbol.t -> property (* To every suffix of every production, we associate a property. The offset [i], which determines the beginning of the suffix, must be contained between [0] and [n], inclusive, where [n] is the length of the production. *) val production: Production.index -> int -> property end = struct open P (* The following analysis functions are parameterized over [get], which allows making a recursive call to the analysis at a nonterminal symbol. [get] maps a nonterminal symbol to a property. *) (* Analysis of a symbol. *) let symbol sym get : property = match sym with | Symbol.T tok -> S.terminal tok | Symbol.N nt -> (* Recursive call to the analysis, via [get]. *) get nt (* Analysis of (a suffix of) a production [prod], starting at index [i]. *) let production prod i get : property = let rhs = Production.rhs prod in let n = Array.length rhs in (* Conjunction over all symbols in the right-hand side. This can be viewed as a version of [Array.fold_right], which does not necessarily begin at index [0]. Note that, because [conjunction] is lazy, it is possible to stop early. *) let rec loop i = if i = n then S.epsilon else let sym = rhs.(i) in S.conjunction sym (symbol sym get) (fun () -> loop (i+1)) in loop i (* The analysis is the least fixed point of the following function, which analyzes a nonterminal symbol by looking up and analyzing its definition as a disjunction of conjunctions of symbols. *) let nonterminal nt get : property = match S.shortcut nt with | Some p -> p | None -> (* Disjunction over all productions for this nonterminal symbol. *) Production.foldnt_lazy nt (fun prod rest -> S.disjunction (production prod 0 get) rest ) P.bottom (* The least fixed point is taken as follows. 
Note that it is computed on demand, as [lfp] is called by the user. *) module F = Fix.Make (Fix.Glue.ArraysAsImperativeMaps(Nonterminal)) (P) let nonterminal = F.lfp nonterminal (* The auxiliary functions can be published too. *) let symbol sym = symbol sym nonterminal let production prod i = production prod i nonterminal end (* ------------------------------------------------------------------------ *) (* Compute which nonterminals are nonempty, that is, recognize a nonempty language. Also, compute which nonterminals are nullable. The two computations are almost identical. The only difference is in the base case: a single terminal symbol is not nullable, but is nonempty. *) module NONEMPTY = GenericAnalysis (Fix.Prop.Boolean) (struct let shortcut _nt = None (* A terminal symbol is nonempty. *) let terminal _ = true (* An alternative is nonempty if at least one branch is nonempty. *) let disjunction p q = p || q() (* A sequence is nonempty if both members are nonempty. *) let conjunction _ p q = p && q() (* The sequence epsilon is nonempty. It generates the singleton language {epsilon}. *) let epsilon = true end) module NULLABLE = GenericAnalysis (Fix.Prop.Boolean) (struct let shortcut _nt = None (* A terminal symbol is not nullable. *) let terminal _ = false (* An alternative is nullable if at least one branch is nullable. *) let disjunction p q = p || q() (* A sequence is nullable if both members are nullable. *) let conjunction _ p q = p && q() (* The sequence epsilon is nullable. *) let epsilon = true end) (* ------------------------------------------------------------------------ *) (* Compute FIRST sets. *) module FIRST = GenericAnalysis (TerminalSet) (struct let shortcut _nt = None (* A terminal symbol has a singleton FIRST set. *) let terminal = TerminalSet.singleton (* The FIRST set of an alternative is the union of the FIRST sets. *) let disjunction p q = TerminalSet.union p (q()) (* The FIRST set of a sequence is the union of: the FIRST set of the first member, and the FIRST set of the second member, if the first member is nullable. *) let conjunction symbol p q = if NULLABLE.symbol symbol then TerminalSet.union p (q()) else p (* The FIRST set of the empty sequence is empty. *) let epsilon = TerminalSet.empty end) (* ------------------------------------------------------------------------ *) (* For every nonterminal symbol [nt], compute a word of minimal length generated by [nt]. This analysis subsumes [NONEMPTY] and [NULLABLE]. Indeed, [nt] produces a nonempty language if and only if the minimal length is finite; [nt] is nullable if and only if the minimal length is zero. *) (* This analysis is in principle more costly than [NONEMPTY] and [NULLABLE], so it is performed only on demand. In practice, it seems to be very cheap: its cost is not measurable for any of the grammars in our benchmark suite. *) module MINIMAL = GenericAnalysis (struct include CompletedNatWitness type property = Terminal.t t end) (struct let shortcut _nt = None open CompletedNatWitness (* A terminal symbol has length 1. *) let terminal = singleton (* The length of an alternative is the minimum length of any branch. *) let disjunction = min_lazy (* The length of a sequence is the sum of the lengths of the members. *) let conjunction _ = add_lazy (* The epsilon sequence has length 0. *) let epsilon = epsilon end) (* ------------------------------------------------------------------------ *) (* For every nonterminal symbol [nt], we wish to compute the maximum length of a word generated by [nt].
This length can be either finite or [infty]. *) (* We assume that every symbol generates a nonempty language. This removes the need to watch out for productions whose right-hand side is empty (which we would have to ignore). A similar assumption is made in the computation of FIRST sets above. *) (* We need to determine which symbols generate a nonempty word, that is, a word of length at least 1. This could be done directly by a fixed point computation over a Boolean lattice, but it is easier to just rely on the FIRST sets, which we compute anyway. A symbol generates a nonempty word if and only if its FIRST set is nonempty. *) let generates_nonempty_word symbol : bool = not (TerminalSet.is_empty (FIRST.symbol symbol)) (* Then, we construct the reference graph of the grammar. The vertices of this graph are the nonterminal symbols, and, if there is a production [A -> alpha B gamma], then there is an edge from A to B. With each such edge, we associate a Boolean label, which is [true] if [alpha . gamma] generates a nonempty word. *) module G = struct type node = Nonterminal.t let n = Nonterminal.n let index nt = nt let iter = Nonterminal.iter let labeled_successors yield nt = Production.iternt nt (fun prod -> let rhs = Production.rhs prod in rhs |> Array.iteri (fun i symbol -> match symbol with | Symbol.T _ -> () | Symbol.N nt' -> (* There is an edge from [nt] to [nt'], whose Boolean label [gnw] is obtained by computing whether the right-hand side, deprived of [nt'], can generate a nonempty word. *) let gnw = MArray.existsi (fun j symbol -> i <> j && generates_nonempty_word symbol ) rhs in yield gnw nt' ) ) let successors yield nt = labeled_successors (fun _gnw nt' -> yield nt') nt end (* We now compute the strongly connected components of the reference graph described above. If a component contains an edge labeled [true], then every nonterminal symbol in this component can generate words of unbounded length. Mark these symbols in an array [unbounded]. (This computation is performed only when needed.) *) let unbounded : bool array Lazy.t = lazy begin let unbounded = Array.make Nonterminal.n false in let module T = Tarjan.Run(G) in (* For each edge from [nt] to [nt'] labeled [gnw], *) G.iter begin fun nt -> nt |> G.labeled_successors begin fun gnw nt' -> (* If [gnw] is set and if [nt] and [nt'] lie in the same component, *) if gnw && T.representative nt = T.representative nt' then begin (* Then mark every symbol in this component as unbounded. *) T.scc (T.representative nt) |> List.iter begin fun nt -> unbounded.(nt) <- true end end end end; unbounded end let unbounded nt : bool = (Lazy.force unbounded).(nt) (* We can finally perform a least fixed point computation, in the lattice [NatInfinityMax], to find out the maximum length of a word generated by each nonterminal symbol. Such a computation normally would not terminate, as the lattice has unbounded height: a cycle of strictly positive weight in the grammar will cause an endless climb. However, we have just identified the symbols that participate in these cycles: they are the symbols [nt] such that [unbounded nt] is [true]. We use the [shortcut] function to set these symbols directly to [infinity]. The cycles that may remain in the grammar must have zero weight and cannot cause divergence. The fixed point computation must therefore terminate and yield the desired information.
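As a concrete illustration (hypothetical symbols): with the productions [xs -> epsilon] and [xs -> xs X], where X is a terminal symbol, the reference graph carries an edge from xs to itself labeled true, since X generates a nonempty word; therefore xs is marked unbounded and is mapped directly to infinity by [shortcut]. A symbol defined only by [pair -> A B] lies on no such cycle and receives the finite value 2.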
*) module MAXIMAL = GenericAnalysis (NatInfinityMax) (struct open NatInfinityMax let shortcut nt = if unbounded nt then Some infinity else None (* A terminal symbol has length 1. *) let terminal _tok = finite 1 (* The length of an alternative is the maximum length of any branch. *) let disjunction = max_lazy (* The length of a sequence is the sum of the lengths of the members. *) let conjunction _ = add_lazy (* The epsilon sequence has length 0. *) let epsilon = bottom end) (* ------------------------------------------------------------------------ *) let () = if verbose then begin (* If a start symbol generates the empty language or generates the language {epsilon}, report an error. In principle, this could be just a warning. However, in [Engine], in the function [start], it is convenient to assume that neither of these situations can arise. This means that at least one token must be read. *) StringSet.iter (fun symbol -> let nt = Nonterminal.lookup symbol in if not (NONEMPTY.nonterminal nt) then Error.error (Nonterminal.positions nt) "%s generates the empty language." (Nonterminal.print false nt); if TerminalSet.is_empty (FIRST.nonterminal nt) then Error.error (Nonterminal.positions nt) "%s generates the language {epsilon}." (Nonterminal.print false nt) ) grammar.start_symbols; (* If a nonterminal symbol generates the empty language, issue a warning. *) for nt = Nonterminal.start to Nonterminal.n - 1 do if not (NONEMPTY.nonterminal nt) then Error.grammar_warning (Nonterminal.positions nt) "%s generates the empty language." (Nonterminal.print false nt); done end (* ------------------------------------------------------------------------ *) (* Dump the analysis results. *) let () = if verbose then Error.logG 2 (fun f -> for nt = Nonterminal.start to Nonterminal.n - 1 do Printf.fprintf f "nullable(%s) = %b\n" (Nonterminal.print false nt) (NULLABLE.nonterminal nt) done; for nt = Nonterminal.start to Nonterminal.n - 1 do Printf.fprintf f "first(%s) = %s\n" (Nonterminal.print false nt) (TerminalSet.print (FIRST.nonterminal nt)) done; for nt = Nonterminal.start to Nonterminal.n - 1 do Printf.fprintf f "minimal(%s) = %s\n" (Nonterminal.print false nt) (CompletedNatWitness.print Terminal.print (MINIMAL.nonterminal nt)) done; for nt = Nonterminal.start to Nonterminal.n - 1 do Printf.fprintf f "maximal(%s) = %s\n" (Nonterminal.print false nt) (NatInfinityMax.print (MAXIMAL.nonterminal nt)) done; ) let () = if verbose then Time.tick "Analysis of the grammar" (* ------------------------------------------------------------------------ *) (* Compute FOLLOW sets. Unnecessary for us, but requested by a user. Also, this is useful for the SLR(1) test. Thus, we perform this analysis only on demand. *) (* The computation of the symbolic FOLLOW sets follows exactly the same pattern as that of the traditional FOLLOW sets. We share code and parameterize this computation over a module [P]. The type [P.property] intuitively represents a set of symbols. *) module FOLLOW (P : sig include Fix.MINIMAL_SEMI_LATTICE val bottom: property val terminal: Terminal.t -> property val first: Production.index -> int -> property end) = struct module M = Fix.Glue.ArraysAsImperativeMaps(Nonterminal) module S = FixSolver.Make(M)(P) (* Build a system of constraints. *) (* Iterate over all start symbols. *) let () = let sharp = P.terminal Terminal.sharp in for nt = 0 to Nonterminal.start - 1 do assert (Nonterminal.is_internal_start nt); (* Add # to FOLLOW(nt). 
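Recall that # is the pseudo-token that denotes the end of the token stream; this constraint encodes the usual textbook convention that the end marker can follow every start symbol.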
*) S.record_ConVar sharp nt done (* We need to do this explicitly because our start productions are of the form S' -> S, not S' -> S #, so # will not automatically appear in FOLLOW(S) when the start productions are examined. *) (* Iterate over all productions. *) let () = Array.iteri (fun prod (nt1, rhs) -> (* Iterate over all nonterminal symbols [nt2] in the right-hand side. *) Array.iteri (fun i symbol -> match symbol with | Symbol.T _ -> () | Symbol.N nt2 -> let nullable = NULLABLE.production prod (i+1) and first = P.first prod (i+1) in (* The FIRST set of the remainder of the right-hand side contributes to the FOLLOW set of [nt2]. *) S.record_ConVar first nt2; (* If the remainder of the right-hand side is nullable, FOLLOW(nt1) contributes to FOLLOW(nt2). *) if nullable then S.record_VarVar nt1 nt2 ) rhs ) Production.table (* Second pass. Solve the equations (on demand). *) let follow : (Nonterminal.t -> P.property) Lazy.t = lazy ( let module S = S.Solve() in fun nt -> Option.value (S.solution nt) ~default:P.bottom ) let follow : Nonterminal.t -> P.property = fun nt -> (Lazy.force follow) nt end (* Use the above functor to obtain the standard (concrete) FOLLOW sets. *) let follow : Nonterminal.t -> TerminalSet.t = let module F = FOLLOW(struct include TerminalSet let terminal = singleton let first = FIRST.production end) in F.follow (* At log level 2, display the FOLLOW sets. *) let () = if verbose then Error.logG 2 (fun f -> for nt = Nonterminal.start to Nonterminal.n - 1 do Printf.fprintf f "follow(%s) = %s\n" (Nonterminal.print false nt) (TerminalSet.print (follow nt)) done ) (* Compute FOLLOW sets for the terminal symbols as well. Again, unnecessary for us, but requested by a user. This is done in a single pass over the grammar -- no new fixpoint computation is required. *) let tfollow : TerminalSet.t array Lazy.t = lazy ( let tfollow = Array.make Terminal.n TerminalSet.empty in (* Iterate over all productions. *) Array.iteri (fun prod (nt1, rhs) -> (* Iterate over all terminal symbols [t2] in the right-hand side. *) Array.iteri (fun i symbol -> match symbol with | Symbol.N _ -> () | Symbol.T t2 -> let nullable = NULLABLE.production prod (i+1) and first = FIRST.production prod (i+1) in (* The FIRST set of the remainder of the right-hand side contributes to the FOLLOW set of [t2]. *) tfollow.(t2) <- TerminalSet.union first tfollow.(t2); (* If the remainder of the right-hand side is nullable, FOLLOW(nt1) contributes to FOLLOW(t2). *) if nullable then tfollow.(t2) <- TerminalSet.union (follow nt1) tfollow.(t2) ) rhs ) Production.table; tfollow ) (* Define another accessor. *) let tfollow t = (Lazy.force tfollow).(t) (* At log level 3, display the FOLLOW sets for terminal symbols. *) let () = if verbose then Error.logG 3 (fun f -> for t = 0 to Terminal.n - 1 do Printf.fprintf f "follow(%s) = %s\n" (Terminal.print t) (TerminalSet.print (tfollow t)) done ) (* ------------------------------------------------------------------------ *) (* Compute symbolic FIRST and FOLLOW sets. *) (* The symbolic FIRST set of the word determined by [prod/i] is defined (and computed) as follows. *) let sfirst prod i = let rhs = Production.rhs prod in let n = Array.length rhs in let rec loop i = if i = n then (* If the word [prod/i] is empty, the set is empty. *) SymbolSet.empty else let sym = rhs.(i) in (* If the word [prod/i] begins with a symbol [sym], then [sym] itself is part of the symbolic FIRST set, unconditionally.
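For example (hypothetical symbols): if the word [prod/i] is [opt X ys], where opt is a nullable nonterminal symbol and X is a terminal symbol, then the symbolic FIRST set is { opt, X }: opt contributes unconditionally, X contributes because opt is nullable, and ys contributes nothing because X is not nullable.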
*) SymbolSet.union (SymbolSet.singleton sym) (* Furthermore, if [sym] is nullable, then the symbolic FIRST set of the sub-word [prod/i+1] contributes, too. *) (if NULLABLE.symbol sym then loop (i + 1) else SymbolSet.empty) in loop i (* The symbolic FOLLOW sets are computed just like the FOLLOW sets, except we use a symbolic FIRST set instead of a standard FIRST set. *) let sfollow : Nonterminal.t -> SymbolSet.t = let module F = FOLLOW(struct let bottom = SymbolSet.bottom include Fix.Glue.MinimalSemiLattice(SymbolSet) let terminal t = SymbolSet.singleton (Symbol.T t) let first = sfirst end) in F.follow (* At log level 3, display the symbolic FOLLOW sets. *) let () = if verbose then Error.logG 3 (fun f -> for nt = Nonterminal.start to Nonterminal.n - 1 do Printf.fprintf f "sfollow(%s) = %s\n" (Nonterminal.print false nt) (SymbolSet.print (sfollow nt)) done ) (* ------------------------------------------------------------------------ *) (* Provide explanations about FIRST sets. *) (* The idea is to explain why a certain token appears in the FIRST set for a certain sequence of symbols. Such an explanation involves basic assertions of the form (i) symbol N is nullable and (ii) the token appears in the FIRST set for symbol N. We choose to take these basic facts for granted, instead of recursively explaining them, so as to keep explanations short. *) (* We first produce an explanation in abstract syntax, then convert it to a human-readable string. *) type explanation = | EObvious (* sequence begins with desired token *) | EFirst of Terminal.t * Nonterminal.t (* sequence begins with a nonterminal that produces desired token *) | ENullable of Symbol.t list * explanation (* sequence begins with a list of nullable symbols and ... *) let explain (tok : Terminal.t) (rhs : Symbol.t array) (i : int) = let length = Array.length rhs in let rec loop i = assert (i < length); let symbol = rhs.(i) in match symbol with | Symbol.T tok' -> assert (Terminal.equal tok tok'); EObvious | Symbol.N nt -> if TerminalSet.mem tok (FIRST.nonterminal nt) then EFirst (tok, nt) else begin assert (NULLABLE.nonterminal nt); match loop (i + 1) with | ENullable (symbols, e) -> ENullable (symbol :: symbols, e) | e -> ENullable ([ symbol ], e) end in loop i let rec convert = function | EObvious -> "" | EFirst (tok, nt) -> Printf.sprintf "%s can begin with %s" (Nonterminal.print false nt) (Terminal.print tok) | ENullable (symbols, e) -> let e = convert e in Printf.sprintf "%s can vanish%s%s" (Symbol.printl symbols) (if e = "" then "" else " and ") e (* ------------------------------------------------------------------------ *) (* Package the analysis results. *) module Analysis = struct let nullable = NULLABLE.nonterminal let nullable_symbol = NULLABLE.symbol let first = FIRST.nonterminal let first_symbol = FIRST.symbol (* An initial definition of [nullable_first_prod]. *) let nullable_first_prod prod i = NULLABLE.production prod i, FIRST.production prod i (* A memoised version, so as to avoid recomputing along a production's right-hand side. 
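Concretely, the outer table below has one entry per production, and each inner table has one entry per offset [i] in the range [0, n], where [n] is the length of the production, so each pair of a production and an offset is computed at most once.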
*) let nullable_first_prod = Misc.tabulate Production.n (fun prod -> Misc.tabulate (Production.length prod + 1) (fun i -> nullable_first_prod prod i ) ) let first_prod_lookahead prod i z = let nullable, first = nullable_first_prod prod i in if nullable then TerminalSet.add z first else first let explain_first_rhs (tok : Terminal.t) (rhs : Symbol.t array) (i : int) = convert (explain tok rhs i) let follow = follow let attributes = grammar.gr_attributes let minimal nt = CompletedNatWitness.to_int (MINIMAL.nonterminal nt) let minimal_prod prod i = assert (0 <= i && i <= Production.length prod); CompletedNatWitness.to_int (MINIMAL.production prod i) let maximal nt = NatInfinityMax.to_int (MAXIMAL.nonterminal nt) let maximal_prod prod i = assert (0 <= i && i <= Production.length prod); NatInfinityMax.to_int (MAXIMAL.production prod i) end (* ------------------------------------------------------------------------ *) (* Conflict resolution via precedences. *) module Precedence = struct type choice = | ChooseShift | ChooseReduce | ChooseNeither | DontKnow type order = Lt | Gt | Eq | Ic let precedence_order p1 p2 = match p1, p2 with | UndefinedPrecedence, _ | _, UndefinedPrecedence -> Ic | PrecedenceLevel (m1, l1, _, _), PrecedenceLevel (m2, l2, _, _) -> if not (InputFile.same_input_file m1 m2) then Ic else if l1 > l2 then Gt else if l1 < l2 then Lt else Eq let production_order p1 p2 = match p1, p2 with | ProductionLevel (m1, l1), ProductionLevel (m2, l2) -> if not (InputFile.same_input_file m1 m2) then Ic else if l1 > l2 then Gt else if l1 < l2 then Lt else Eq let shift_reduce tok prod = let fact1, tokp = Terminal.precedence_level tok and fact2, prodp = Production.precedence prod in match precedence_order tokp prodp with (* Our information is inconclusive. Drop [fact1] and [fact2], that is, do not record that this information was useful. *) | Ic -> DontKnow (* Our information is useful. Record that fact by evaluating [fact1] and [fact2]. *) | (Eq | Lt | Gt) as c -> Lazy.force fact1; Lazy.force fact2; match c with | Ic -> assert false (* already dispatched *) | Eq -> begin match Terminal.associativity tok with | LeftAssoc -> ChooseReduce | RightAssoc -> ChooseShift | NonAssoc -> ChooseNeither | _ -> assert false (* If [tok]'s precedence level is defined, then its associativity must be defined as well. *) end | Lt -> ChooseReduce | Gt -> ChooseShift let reduce_reduce prod1 prod2 = let pl1 = Production.production_level.(prod1) and pl2 = Production.production_level.(prod2) in match production_order pl1 pl2 with | Lt -> Some prod1 | Gt -> Some prod2 | Eq -> (* The order is strict except in the presence of parameterized non-terminals and/or inlining. Two productions can have the same precedence level if they originate, via macro-expansion or via inlining, from a single production in the source grammar. *) None | Ic -> None end (* This function prints warnings about useless precedence declarations for terminal symbols (%left, %right, %nonassoc) and productions (%prec). It should be invoked only after the automaton has been constructed. *) let diagnostics () = if not Settings.ignore_all_unused_precedence_levels then begin TokPrecedence.diagnostics(); Production.diagnostics() end (* ------------------------------------------------------------------------ *) (* %on_error_reduce declarations. *) module OnErrorReduce = struct (* We keep a [StringMap] internally, and convert back and forth between the types [Nonterminal.t] and [string] when querying this map.
This is not very elegant, and could be changed if desired. *) let declarations : Syntax.on_error_reduce_level StringMap.t = grammar.on_error_reduce let print (nt : Nonterminal.t) : string = Nonterminal.print false nt let lookup (nt : string) : Nonterminal.t = try Nonterminal.lookup nt with Not_found -> (* If this fails, then we have an [%on_error_reduce] declaration for an invalid symbol. *) assert false let reduce prod = let nt = Production.nt prod in StringMap.mem (print nt) declarations let iter f = StringMap.iter (fun nt _prec -> f (lookup nt) ) declarations open Precedence let preferable prod1 prod2 = (* The two productions that we are comparing must be distinct. *) assert (prod1 <> prod2); let nt1 = Production.nt prod1 and nt2 = Production.nt prod2 in (* If they have the same left-hand side (which seems rather unlikely?), declare them incomparable. *) nt1 <> nt2 && (* Otherwise, look up the priority levels associated with their left-hand symbols. *) let prec1, prec2 = try StringMap.find (print nt1) declarations, StringMap.find (print nt2) declarations with Not_found -> (* [preferable] should be used to compare two symbols for which there exist [%on_error_reduce] declarations. *) assert false in match production_order prec1 prec2 with | Gt -> (* [prec1] is a higher integer than [prec2], therefore comes later in the file. By analogy with [%left] and friends, we give higher priority to later declarations. *) true | Lt -> false | Eq | Ic -> (* We could issue a warning or an information message in these cases. *) false end (* ------------------------------------------------------------------------ *) (* Facilities for printing sentences. *) module Sentence = struct type sentence = Nonterminal.t option * Terminal.t list open Printf let print_abstract (nto, terminals) : string = Misc.with_buffer 128 (fun b -> Option.iter (fun nt -> bprintf b "%s: " (Nonterminal.print false nt) ) nto; let separator = Misc.once "" " " in List.iter (fun t -> bprintf b "%s%s" (separator()) (Terminal.print t) ) terminals; bprintf b "\n"; ) let print_concrete (_nto, terminals) : string = Misc.with_buffer 128 (fun b -> let separator = Misc.once "" " " in List.iter (fun t -> bprintf b "%s%s" (separator()) (Terminal.print_concrete t) ) terminals ) let print style sentence = match style with | `Abstract -> print_abstract sentence | `Concrete -> print_concrete sentence end (* ------------------------------------------------------------------------ *) end (* module Make *) menhir-20210929/src/grammarFunctor.mli000066400000000000000000000564531412503066000175410ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The functor [Make] transforms an abstract syntax tree for the grammar into a rich internal representation of the grammar. *) (* The reason why this is now a functor, and the reason why its verbosity can be controlled, is that we may wish to invoke it several times, e.g. on the grammar before %inlining, and on the grammar after %inlining. 2015/11/10 *) module Make (G : sig (* An abstract syntax tree for the grammar. 
*) val grammar: BasicSyntax.grammar (* This flag indicates whether it is OK to produce warnings, verbose information, etc., when this functor is invoked. If it is set to [false], then only serious errors can be signaled. *) val verbose: bool end) () : sig (* ------------------------------------------------------------------------ *) (* Nonterminals. *) module Nonterminal : sig (* The type of nonterminals. *) type t (* Comparison. *) val equal: t -> t -> bool val compare: t -> t -> int (* The number of nonterminals. This includes the extra nonterminals that are internally generated for the grammar's entry points. *) val n: int (* [lookup] maps an identifier to a nonterminal, or raises [Not_found]. *) val lookup : string -> t (* Nonterminals can be converted to integers. This feature is exploited in the table-based back-end. *) val n2i: t -> int (* This produces a string representation of a nonterminal. It should in principle never be applied to one of the internally generated nonterminals, as we do not wish users to become aware of the existence of these extra nonterminals. However, we do sometimes violate this rule when it is difficult to do otherwise. The Boolean parameter tells whether the string representation should be normalized, that is, whether parentheses and commas should be eliminated. This is necessary if the string is intended for use as a valid nonterminal name or as a valid OCaml identifier. *) val print: bool -> t -> string (* This is the OCaml type associated with a nonterminal symbol. It is known only if a %type declaration was provided. This function is not applicable to the internally generated nonterminals. *) val ocamltype: t -> Stretch.ocamltype option (* A start symbol always has a type. This allows us to define a simplified version of [ocamltype] for start symbols. *) val ocamltype_of_start_symbol: t -> Stretch.ocamltype (* Creation of a table indexed by nonterminals. *) val init: (t -> 'a) -> 'a array (* Iteration over nonterminals. The order in which elements are examined, and the order of [map]'s output list, correspond to the numeric indices produced by [n2i] above. *) val iter: (t -> unit) -> unit val fold: (t -> 'a -> 'a) -> 'a -> 'a val map: (t -> 'a) -> 'a list (* Iteration over all nonterminals, except the start nonterminals. *) val iterx: (t -> unit) -> unit val foldx: (t -> 'a -> 'a) -> 'a -> 'a (* Tabulation of a function over nonterminals. *) val tabulate: (t -> 'a) -> (t -> 'a) (* [positions nt] is a list of the positions associated with the definition of [nt]. There can be more than one position because definitions can be split over multiple files. *) val positions: t -> Positions.t list (* [is_user_start nt] tells whether the nonterminal symbol [nt] is one of the start symbols that the user knows about. *) val is_user_start: t -> bool (* [is_internal_start nt] tells whether the nonterminal symbol [nt] is one of the internal start symbols, that is, one of the new symbols [S'] that are created for each user start symbol [S]. *) val is_internal_start: t -> bool (* [attributes nt] is the list of attributes attached with the nonterminal symbol [nt]. *) val attributes: t -> Syntax.attribute list end (* ------------------------------------------------------------------------ *) (* Sets of nonterminals. *) module NonterminalMap : GMap.S with type key = Nonterminal.t module NonterminalSet = NonterminalMap.Domain (* ------------------------------------------------------------------------ *) (* Terminals. *) module Terminal : sig (* The type of terminals. 
*) type t (* The number of terminals. This includes the two pseudo-tokens [#] and [error]. *) val n: int (* Comparison. *) val equal: t -> t -> bool val compare: t -> t -> int (* [lookup] maps an identifier to a terminal, or raises [Not_found]. *) val lookup : string -> t (* Terminals can be converted to integers. This feature is exploited in the table-based back-end and in [LRijkstra]. The reverse conversion, [i2t], is unsafe and should not be used. [LRijkstra] uses it :-) *) val t2i: t -> int val i2t: int -> t (* unsafe! *) (* This produces a string representation of a terminal. *) val print: t -> string (**[print_concrete t] prints the terminal symbol [t] under a concrete form, by relying on the token alias that has been declared for this symbol. *) val print_concrete: t -> string (* This is the OCaml type associated with a terminal symbol. It is known only if the %token declaration was accompanied with a type. *) val ocamltype: t -> Stretch.ocamltype option (* These are the two pseudo-tokens [#] and [error]. The former is used to denote the end of the token stream. The latter is accessible to the user and is used for handling errors. *) val sharp: t val error: t (* This is the programmer-defined [EOF] token, if there is one. It is recognized based solely on its name, which is fragile, but this behavior is documented. This token is assumed to represent [ocamllex]'s [eof] pattern. It is used only by the reference interpreter, and in a rather non-essential way. *) val eof: t option (* A terminal symbol is pseudo if it is [#] or [error]. It is real otherwise. *) val pseudo: t -> bool val real: t -> bool (* [non_error] returns [true] if its argument is not the [error] token. *) val non_error: t -> bool (* Creation of a table indexed by terminals. *) val init: (t -> 'a) -> 'a array (* Iteration over terminals. The order in which elements are examined, and the order of [map]'s output list, correspond to the numeric indices produced by [t2i] above. *) val iter: (t -> unit) -> unit val fold: (t -> 'a -> 'a) -> 'a -> 'a val map: (t -> 'a) -> 'a list (* Iteration over all terminals except [#]. *) val foldx: (t -> 'a -> 'a) -> 'a -> 'a val mapx: (t -> 'a) -> 'a list (* [iter_real] offers iteration over all real terminals. *) val iter_real: (t -> unit) -> unit (* [attributes t] is the list of attributes attached with the terminal symbol [t]. *) val attributes: t -> Syntax.attribute list (* [every_token_has_an_alias] is true if a token alias has been defined by the user for every token. *) val every_token_has_an_alias: bool (* [alias t] is the token alias that has been defined by the user for the terminal symbol [t], if there is one. It is a quoted escaped string literal. [unquoted_alias t] is the same string, deprived of its opening and closing quotes, and unescaped. *) val alias: t -> string option val unquoted_alias: t -> string option (* The sub-module [Word] offers an implementation of words (that is, sequences) of terminal symbols. It is used by [LRijkstra]. We make it a functor, because it has internal state (a hash table) and a side effect (failure if there are more than 256 terminal symbols). *) (* The type [word] should be treated, as much as possible, as an abstract type. In fact, for efficiency reasons, we represent a word as a unique integer code, and we allocate these integer codes sequentially, from 0 upwards. The conversion from [int] to [word] is of course unsafe and should be used wisely.
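A usage sketch (where [a] and [b] stand for two hypothetical terminal symbols):

     let module W = Word (struct end) in
     let w = W.append (W.singleton a) (W.singleton b) in
     assert (W.length w = 2);
     assert (W.elements w = [a; b])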
*) module Word (X : sig end) : sig type word = int val epsilon: word val singleton: t -> word val append: word -> word -> word val length: word -> int (* [first w z] returns the first symbol of the word [w.z]. *) val first: word -> t -> t val elements: word -> t list val print: word -> string (* [verbose()] prints statistics about the use of the internal hash-consing table so far. *) val verbose: unit -> unit (* Lexicographic ordering. *) val compare: word -> word -> int end end (* ------------------------------------------------------------------------ *) (* Sets and maps over terminals. *) module TerminalSet : sig (* All of the operations documented in [GSet] are available. *) include GSet.S with type element = Terminal.t (* This offers a string representation of a set of terminals. The symbols are simply listed one after the other and separated with spaces. *) val print: t -> string (* This is the set of all terminal symbols except the pseudo-tokens [#] and [error]. *) val universe: t end (* All of the operations documented in [GMap] are available. *) module TerminalMap : GMap.S with type key = Terminal.t (* ------------------------------------------------------------------------ *) (* Symbols. *) module Symbol : sig (* A symbol is either a nonterminal or a terminal. *) type t = | N of Nonterminal.t | T of Terminal.t val is_terminal: t -> bool (* [lookup] maps an identifier to a symbol, or raises [Not_found]. *) val lookup : string -> t (* Comparison. *) val equal: t -> t -> bool val lequal: t list -> t list -> bool (* [non_error] returns [true] if its argument is not the [error] token. *) val non_error: t -> bool (* These produce a string representation of a symbol, of a list of symbols, or of an array of symbols. The symbols are simply listed one after the other and separated with spaces. [printao] prints an array of symbols, starting at a particular offset. [printaod] is analogous, but can also print a single dot at a particular position between two symbols. *) val print: t -> string val printl: t list -> string val printa: t array -> string val printao: int -> t array -> string val printaod: int -> int -> t array -> string end (* ------------------------------------------------------------------------ *) (* Sets and maps over symbols. *) (* All of the operations documented in [Set] are available. *) module SymbolSet : Set.S with type elt = Symbol.t module SymbolMap : sig (* All of the operations documented in [Map] are available. *) include Map.S with type key = Symbol.t (* [domain m] is the domain of the map [m], that is, the list of keys for which an entry exists in the map [m]. *) val domain: 'a t -> key list (* [init f xs] creates a map whose keys are the elements [x] found in the list [xs] and the datum associated with [x] is [f x]. *) val init: (key -> 'a) -> key list -> 'a t (* This returns [true] if and only if all of the symbols in the domain of the map at hand are nonterminals. *) val purelynonterminal: 'a t -> bool end (* ------------------------------------------------------------------------ *) (* Productions. *) module Production : sig (* This is the type of productions. This includes user-defined productions as well as the internally generated productions associated with the start symbols. *) type index (* Comparison. *) val compare: index -> index -> int (* Productions can be converted to integers and back. This is unsafe and should be avoided as much as possible. This feature is exploited, for efficiency, in the encoding of items. 
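For instance (an illustration of the idea, not necessarily the exact scheme used elsewhere in Menhir), an LR(0) item, that is, a pair of a production and a bullet position, can be packed into a single integer by combining [p2i prod] with the position, and taken apart again via [i2p].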
*) val p2i: index -> int val i2p: int -> index (* The number of productions. *) val n: int (* These map a production index to the production's definition, that is, a nonterminal (the left-hand side) and an array of symbols (the right-hand side). *) val def: index -> Nonterminal.t * Symbol.t array val nt: index -> Nonterminal.t val rhs: index -> Symbol.t array val length: index -> int (* This maps a production index to an array of the identifiers that should be used for naming the semantic values of the symbols in the right-hand side. *) val identifiers: index -> Syntax.identifier array (* This maps a production index to the production's semantic action. This function is not applicable to a start production. *) val action: index -> Syntax.action (* [positions prod] is a list of the positions associated with production [prod]. This is usually a singleton list, but there can be more than one position for start productions when the definition of the corresponding start symbol is split over multiple files. *) val positions: index -> Positions.t list (* [lhs_attributes prod] returns the attributes attached with the head symbol of the production [prod]. It is equivalent to [Nonterminal.attributes (nt prod)]. [rhs_attributes prod] returns an array of the attributes attached with each element in the right-hand side of the production [prod]. *) val lhs_attributes: index -> Syntax.attributes val rhs_attributes: index -> Syntax.attributes array (* Creation of a table indexed by productions. *) val init: (index -> 'a) -> 'a array (* Iteration over all productions. The order in which elements are examined, and the order of [map]'s output list, correspond to the numeric indices produced by [p2i] above. *) val iter: (index -> unit) -> unit val fold: (index -> 'a -> 'a) -> 'a -> 'a val map: (index -> 'a) -> 'a list val amap: (index -> 'a) -> 'a array (* Iteration over all productions, except the start productions. *) val iterx: (index -> unit) -> unit val foldx: (index -> 'a -> 'a) -> 'a -> 'a val mapx: (index -> 'a) -> 'a list (* This maps a (user) non-terminal start symbol to the corresponding start production. *) val startsymbol2startprod: Nonterminal.t -> index (* Iteration over the productions associated with a specific nonterminal. *) val iternt: Nonterminal.t -> (index -> unit) -> unit val foldnt: Nonterminal.t -> (index -> 'a -> 'a) -> 'a -> 'a val mapnt: Nonterminal.t -> (index -> 'a) -> 'a list (* This allows determining whether a production is a start production. If it is a start production, the start symbol that it is associated with is returned. If it is a regular production, nothing is returned. *) val classify: index -> Nonterminal.t option (* [is_start] is easier to use than [classify] when the start symbol is not needed. *) val is_start: index -> bool (* The integer [start] is published so as to allow the table back-end to produce code for [is_start]. It should not be used otherwise. *) val start: int (* [error_free prod] returns [true] if the right-hand side of the production [prod] does *not* contain the [error] pseudo-token. *) val error_free: index -> bool (* This produces a string representation of a production. It should never be applied to a start production, as we do not wish users to become aware of the existence of these extra productions. *) val print: index -> string (* This produces a representation of a production as a string of the form "accepting <nt>" or "reducing <nt> -> <rhs>". The Boolean flag [gerund] allows choosing between gerund and infinitive forms.
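For instance, with [gerund] set to true, [describe] produces strings such as [accepting expr] or [reducing production expr -> expr PLUS expr]; with [gerund] set to false, it produces [accept expr] or [reduce production expr -> expr PLUS expr] instead. The symbol names in these examples are hypothetical.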
*) val describe: bool -> index -> string (* Tabulation of a function over productions. *) val tabulate: (index -> 'a) -> (index -> 'a) (* Sum of an integer function over productions. *) val sum: (index -> int) -> int end (* ------------------------------------------------------------------------ *) (* Maps over productions. *) module ProductionMap : sig include GMap.S with type key = Production.index (* Iteration over the start productions only. *) val start: (Production.index -> 'a) -> 'a t end (* ------------------------------------------------------------------------ *) (* This flag tells whether the [error] token appears in at least one production. *) val grammar_uses_error_token: bool (* ------------------------------------------------------------------------ *) (* Analysis of the grammar. *) module Analysis : sig (* [nullable nt] is the NULLABLE flag of the non-terminal symbol [nt]. That is, it is true if and only if this symbol produces the empty word [epsilon]. *) val nullable: Nonterminal.t -> bool val nullable_symbol: Symbol.t -> bool (* [first nt] is the FIRST set of the non-terminal symbol [nt]. *) val first: Nonterminal.t -> TerminalSet.t val first_symbol: Symbol.t -> TerminalSet.t (* [nullable_first_prod prod i] considers the suffix of the production [prod] defined by offset [i]. It returns its NULLABLE flag as well as its FIRST set. The offset [i] must be contained between [0] and [n], inclusive, where [n] is the length of production [prod]. *) val nullable_first_prod: Production.index -> int -> bool * TerminalSet.t (* [first_prod_lookahead prod i t] computes [FIRST(alpha.t)], where [alpha] is the suffix of the production defined by offset [i], and [t] is a terminal symbol. The offset [i] must be contained between [0] and [n], inclusive, where [n] is the length of production [prod]. *) val first_prod_lookahead: Production.index -> int -> Terminal.t -> TerminalSet.t (* [explain_first_rhs tok rhs i] explains why the token [tok] appears in the FIRST set for the string of symbols found at offset [i] in the array [rhs]. *) val explain_first_rhs: Terminal.t -> Symbol.t array -> int -> string (* [follow nt] is the FOLLOW set of the non-terminal symbol [nt], that is, the set of terminal symbols that could follow an expansion of [nt] in a valid sentence. *) val follow: Nonterminal.t -> TerminalSet.t (* [attributes] are the attributes attached with the grammar. *) val attributes: Syntax.attributes (* [minimal nt] is the minimal size of a sentence generated by the nonterminal symbol [nt]. If this symbol generates an empty language, then [minimal nt] is [max_int]. [minimal_prod prod i] is the minimal size of a sentence generated by the suffix of the production [prod] defined by the offset [i]. *) val minimal: Nonterminal.t -> int val minimal_prod: Production.index -> int -> int (**[maximal nt] is the maximal size of a sentence generated by the nonterminal symbol [nt]. An unbounded maximal size is represented by the special value [max_int]. This analysis is carried out under the assumption that every symbol generates a nonempty language. *) val maximal: Nonterminal.t -> int (**[maximal_prod prod i] is the maximal size of a sentence generated by the suffix of the production [prod] defined by the offset [i]. *) val maximal_prod: Production.index -> int -> int end (* ------------------------------------------------------------------------ *) (* Conflict resolution via precedences. 
*) module Precedence : sig (* Shift/reduce conflicts require making a choice between shifting a token and reducing a production. How these choices are made is of no concern to the back-end, but here is a rough explanation. Shifting is preferred when the token has higher precedence than the production, or they have the same precedence and the token is right-associative. Reducing is preferred when the token has lower precedence than the production, or they have the same precedence and the token is left-associative. Neither is allowed when the token and the production have the same precedence and the token is non-associative. No preference is explicitly specified when the token or the production has undefined precedence. In that case, the default choice is to prefer shifting, but a conflict will be reported. *) type choice = | ChooseShift | ChooseReduce | ChooseNeither | DontKnow val shift_reduce: Terminal.t -> Production.index -> choice (* Reduce/reduce conflicts require making a choice between reducing two distinct productions. This is done by exploiting a partial order on productions. For compatibility with ocamlyacc, this order should be total and should correspond to textual order when the two productions originate in the same source file. When they originate in different source files, the two productions should be incomparable. *) val reduce_reduce: Production.index -> Production.index -> Production.index option end (* ------------------------------------------------------------------------ *) (* [%on_error_reduce] declarations. *) module OnErrorReduce : sig (* [reduce prod] tells whether the left-hand side of [prod] (a nonterminal symbol) appears in an [%on_error_reduce] declaration. *) val reduce: Production.index -> bool (* [iter f] applies the function [f] in turn, in an arbitrary order, to every nonterminal symbol that appears in an [%on_error_reduce] declaration. *) val iter: (Nonterminal.t -> unit) -> unit (* When two productions could be reduced, in a single state, due to [%on_error_reduce] declarations, these productions can be compared, using [preferable], to test if one of them takes precedence over the other. This is a partial order; two productions may be incomparable. *) val preferable: Production.index -> Production.index -> bool end (* ------------------------------------------------------------------------ *) (* Sentences. *) module Sentence : sig (**A sentence is a pair of an optional start symbol and a sequence of terminal symbols. Our convention is that the start symbol can be omitted if this is unambiguous, that is, if the grammar has exactly one start symbol. *) type sentence = Nonterminal.t option * Terminal.t list (**[print style sentence] prints a sentence as a space-separated list of symbolic token names. The [style] parameter indicates whether the sentence should be displayed in concrete syntax; if it is [`Concrete], then every token must have a token alias. *) val print: [`Abstract | `Concrete] -> sentence -> string end (* ------------------------------------------------------------------------ *) (* Diagnostics. *) (* This function prints warnings about useless precedence declarations for terminal symbols (%left, %right, %nonassoc) and productions (%prec). It should be invoked only after the automaton has been constructed.
*) val diagnostics: unit -> unit (* ------------------------------------------------------------------------ *) end (* module Make *) menhir-20210929/src/infer.ml000066400000000000000000000355701412503066000155010ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Syntax open Stretch open BasicSyntax open IL open CodeBits open TokenType (* ------------------------------------------------------------------------- *) (* Naming conventions. *) (* The type variable associated with a nonterminal symbol. Its name begins with a prefix which ensures that it begins with a lowercase letter and cannot clash with OCaml keywords. *) let ntvar symbol = TypVar (Printf.sprintf "tv_%s" (Misc.normalize symbol)) (* The term variable associated with a nonterminal symbol. Its name begins with a prefix which ensures that it begins with a lowercase letter and cannot clash with OCaml keywords. *) let encode symbol = Printf.sprintf "xv_%s" (Misc.normalize symbol) let decode s = let n = String.length s in if not (n >= 3 && String.sub s 0 3 = "xv_") then Lexmli.fail(); String.sub s 3 (n - 3) (* The name of the temporary file. *) let base = Settings.base let mlname = base ^ ".ml" let mliname = base ^ ".mli" (* ------------------------------------------------------------------------- *) (* The type of a nonterminal symbol. *) (* 2020/05/25. We must be careful how we process the type information given to us by the user via [%type] and [%start] annotations. Duplicating this information without care can be unsound: e.g., some OCaml types, such as the open polymorphic variant type [> `A ], contain implicit type variables. When such a type annotation is duplicated, the two copies are unrelated: this can cause us to infer a type that is too general (issue #37). To avoid this issue, part of the solution is to enforce sharing by using a type variable everywhere (one per nonterminal symbol). At first sight, it looks as though we could use this type variable everywhere, and use the user-provided type annotation exactly once. However, another OCaml feature, type-directed disambiguation, depends on explicit type information, and does not always make a correct choice if this information is not immediately available (prior to unification). Thus, whenever we need to annotate an expression or a pattern with the type of a nonterminal symbol, we must annotate it *both* with the type variable that corresponds to this symbol (so as to enforce sharing) and with the user-provided type, if there is one (so as to allow type-directed disambiguation). Also, when reading the types inferred by the OCaml type-checker, we must read every type, even in the cases where a type was already provided by the user. Indeed, the type inferred by OCaml may be more specific than the type provided by the user. E.g., the user may declare the type [> `A ], while OCaml may infer the type [> `A | `B ]. *) (* [usertype grammar nt] is the type provided by the user for the nonterminal symbol [nt], if there is one. 
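For instance (a sketch, with a hypothetical symbol): if the user declares [%type <int list> expr], then a semantic value of [expr] is annotated both with the shared type variable ['tv_expr], which enforces sharing across all occurrences, and with the user-provided type [int list], which enables type-directed disambiguation.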
*) let usertype grammar nt = try Some (TypTextual (StringMap.find nt grammar.types)) with Not_found -> None (* [annotate_expr grammar e nt] annotates the expression [e] with type information associated with the nonterminal symbol [nt]. We annotate [e] with the type variable [ntvar nt], and additionally annotate it with the type [usertype grammar nt], if it exists. *) let annotate_expr grammar e nt = let e = annotate e (ntvar nt) in let e = Option.fold (fun t e -> annotate e t) (usertype grammar nt) e in e let annotate_pat grammar p nt = let p = PAnnot (p, ntvar nt) in let p = Option.fold (fun t p -> PAnnot (p, t)) (usertype grammar nt) p in p (* ------------------------------------------------------------------------- *) (* Code production. *) (* [tokentype grammar symbol] returns the type of the terminal symbol [symbol], if [symbol] is indeed a terminal symbol. Otherwise, it raises [Not_found]. *) let tokentype grammar symbol = let props = StringMap.find symbol grammar.tokens in match props.tk_ocamltype with | None -> tunit | Some ocamltype -> TypTextual ocamltype (* [actiondef] turns a branch into a function definition. *) (* The names and types of the conventional internal variables that correspond to keywords ($startpos, etc.) are hardwired in this code. It would be nice if these conventions were more clearly isolated and perhaps moved to the [Action] or [Keyword] module. *) let actiondef grammar symbol branch = (* Construct a list of the semantic action's formal parameters that depend on the production's right-hand side. *) let formals = List.fold_left (fun formals producer -> let symbol = producer_symbol producer and lid = producer_identifier_located producer in let id = Positions.value lid in let startp, endp, starto, endo, loc = Printf.sprintf "_startpos_%s_" id, Printf.sprintf "_endpos_%s_" id, Printf.sprintf "_startofs_%s_" id, Printf.sprintf "_endofs_%s_" id, Printf.sprintf "_loc_%s_" id in let pid = (* We use [PVarLocated] in preference to [PVar] so that this binding can be accurately located in the source file (not in the generated file) by the OCaml compiler. This is important when a variable declared by the user turns out to be unused in a semantic action. We want the unused-variable warning (or error) to be correctly located. *) pvarlocated lid in let pid = try PAnnot (pid, tokentype grammar symbol) with Not_found -> (* Symbol is a nonterminal. *) annotate_pat grammar pid symbol in pid :: PAnnot (PVar startp, tposition) :: PAnnot (PVar endp, tposition) :: PAnnot (PVar starto, tint) :: PAnnot (PVar endo, tint) :: PAnnot (PVar loc, tlocation) :: formals ) [] branch.producers in (* Extend the list with parameters that do not depend on the right-hand side. *) let formals = PAnnot (PVar "_eRR", texn) :: PAnnot (PVar "_startpos", tposition) :: PAnnot (PVar "_endpos", tposition) :: PAnnot (PVar "_endpos__0_", tposition) :: PAnnot (PVar "_symbolstartpos", tposition) :: PAnnot (PVar "_startofs", tint) :: PAnnot (PVar "_endofs", tint) :: PAnnot (PVar "_endofs__0_", tint) :: PAnnot (PVar "_symbolstartofs", tint) :: PAnnot (PVar "_sloc", tlocation) :: PAnnot (PVar "_loc", tlocation) :: formals in (* Construct a function definition out of the above bindings and the semantic action. *) let body = annotate_expr grammar (Action.to_il_expr branch.action) symbol in match formals with | [] -> body | _ -> EFun (formals, body) (* [program] turns an entire grammar into a test program. *) let program grammar = (* Turn the grammar into a bunch of function definitions.
Grammar productions that derive from the standard library are reflected first, so that type errors are not reported in them. *) let bindings1, bindings2 = StringMap.fold (fun symbol rule (bindings1, bindings2) -> List.fold_left (fun (bindings1, bindings2) branch -> if Action.is_standard branch.action then (PWildcard, actiondef grammar symbol branch) :: bindings1, bindings2 else bindings1, (PWildcard, actiondef grammar symbol branch) :: bindings2 ) (bindings1, bindings2) rule.branches ) grammar.rules ([], []) in (* Create entry points whose types are the unknowns that we are looking for. Mentioning just the type variable [ntvar nt] suffices here. *) let ps, ts = StringMap.fold (fun nt _ (ps, ts) -> PVar (encode (Misc.normalize nt)) :: ps, ntvar nt :: ts ) grammar.rules ([], []) in let def = { valpublic = true; valpat = PTuple ps; valval = ELet (bindings1 @ bindings2, annotate bottom (TypTuple ts)) } in (* Insert markers to delimit the part of the file that we are interested in. These markers are recognized by [Lexmli]. This helps skip the values, types, exceptions, etc. that might be defined by the prologue or postlude. *) let begindef = { valpublic = true; valpat = PVar "menhir_begin_marker"; valval = EIntConst 0 } and enddef = { valpublic = true; valpat = PVar "menhir_end_marker"; valval = EIntConst 0 } in (* Issue the test program. We include the definition of the type of tokens, because, in principle, the semantic actions may refer to it or to its data constructors. *) [ SIFunctor (grammar.parameters, interface_to_structure (tokentypedef grammar) @ SIStretch grammar.preludes :: SIValDefs (false, [ begindef; def; enddef ]) :: SIStretch grammar.postludes :: [])] (* ------------------------------------------------------------------------- *) (* Writing the program associated with a grammar to a file. *) let write grammar filename () = let ml = open_out filename in let module P = Printer.Make (struct let f = ml let locate_stretches = Some filename end) in P.program (program grammar); close_out ml (* ------------------------------------------------------------------------- *) (* Running ocamldep on the program. *) type entry = string (* basename *) * string (* filename *) type line = entry (* target *) * entry list (* dependencies *) let depend postprocess grammar = (* Create an [.ml] file and an [.mli] file, then invoke ocamldep to compute dependencies for us. *) (* If an old [.ml] or [.mli] file exists, we are careful to preserve it. We temporarily move it out of the way and restore it when we are done. There is no reason why dependency analysis should destroy existing files. *) let ocamldep_command = Printf.sprintf "%s %s %s" Settings.ocamldep (Filename.quote mlname) (Filename.quote mliname) in let output : string = Option.project ( IO.moving_away mlname (fun () -> IO.moving_away mliname (fun () -> IO.with_file mlname (write grammar mlname) (fun () -> IO.with_file mliname (Interface.write grammar) (fun () -> IO.invoke ocamldep_command ))))) in (* Echo ocamldep's output. *) print_string output; (* If [--raw-depend] was specified on the command line, stop here. This option is used by omake and by ocamlbuild, which perform their own postprocessing of [ocamldep]'s output. For normal [make] users, who use [--depend], some postprocessing is required, which is performed below. *) if postprocess then begin (* Make sense out of ocamldep's output.
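A typical line of this output has the shape

     parser.cmo : parser.cmi lexer.cmi

   where the file names are hypothetical; [Lexdep] turns each such line into a target entry together with a list of dependency entries, as described by the types [entry] and [line] above.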
*) let lexbuf = Lexing.from_string output in let lines : line list = try Lexdep.main lexbuf with Lexdep.Error msg -> (* Echo the error message, followed with ocamldep's output. *) Error.error [] "%s" (msg ^ output) in (* Look for the line that concerns the [.cmo] target, and echo a modified version of this line, where the [.cmo] target is replaced with [.ml] and [.mli] targets, and where the dependency over the [.cmi] file is dropped. In doing so, we assume that the user's [Makefile] supports bytecode compilation, so that it makes sense to request [bar.cmo] to be built, as opposed to [bar.cmx]. This is not optimal, but will do. [camldep] exhibits the same behavior. *) List.iter (fun ((_, target_filename), dependencies) -> if Filename.check_suffix target_filename ".cmo" then let dependencies = List.filter (fun (basename, _) -> basename <> base ) dependencies in if List.length dependencies > 0 then begin Printf.printf "%s.ml %s.mli:" base base; List.iter (fun (_basename, filename) -> Printf.printf " %s" filename ) dependencies; Printf.printf "\n%!" end ) lines end; (* Stop. *) exit 0 (* ------------------------------------------------------------------------- *) (* Augmenting a grammar with inferred type information. *) (* The parameter [output] is supposed to contain the output of [ocamlc -i]. *) let read_reply (output : string) grammar = (* See comment in module [Error]. *) Error.enable(); let env : (string * int * int) list = Lexmli.main (Lexing.from_string output) in let env : (string * ocamltype) list = List.map (fun (id, openingofs, closingofs) -> decode id, Inferred (String.sub output openingofs (closingofs - openingofs)) ) env in (* Augment the grammar with new %type declarations. *) let types = StringMap.fold (fun symbol _ types -> let ocamltype = try List.assoc (Misc.normalize symbol) env with Not_found -> (* No type information was inferred for this symbol. Perhaps the mock [.ml] file or the inferred [.mli] file are out of date. Fail gracefully. *) Error.error [] "found no inferred type for %s." symbol in (* Regardless of whether there was or wasn't a declared type, use the type inferred by the OCaml type-checker. *) StringMap.add symbol ocamltype types ) grammar.rules grammar.types in { grammar with types = types } (* ------------------------------------------------------------------------- *) (* Inferring types for a grammar's nonterminals. *) let infer grammar = (* Invoke ocamlc to do type inference for us. *) let ocamlc_command = Printf.sprintf "%s -c -i %s" Settings.ocamlc (Filename.quote mlname) in let output = write grammar mlname (); match IO.invoke ocamlc_command with | Some result -> Sys.remove mlname; result | None -> (* 2015/10/05: intentionally do not remove the [.ml] file if [ocamlc] fails. (Or if an exception is thrown.) *) exit 1 in (* Make sense out of ocamlc's output. *) read_reply output grammar (* ------------------------------------------------------------------------- *) let write_query filename grammar = write grammar filename (); exit 0 (* ------------------------------------------------------------------------- *) let read_reply filename grammar = read_reply (IO.read_whole_file filename) grammar menhir-20210929/src/infer.mli000066400000000000000000000043121412503066000156400ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. 
This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) open BasicSyntax (* [ntvar symbol] is the name of the type variable associated with a nonterminal symbol. *) val ntvar: string -> IL.typ (* [infer grammar] analyzes the grammar [grammar] and returns a new grammar, augmented with a [%type] declaration for every nonterminal symbol. The [ocamlc] compiler is used to infer types. *) val infer: grammar -> grammar (* [depend postprocess grammar] prints (on the standard output channel) the OCaml dependencies induced by the semantic actions. If [postprocess] is [true], then ocamldep's output is postprocessed, otherwise it is echoed unchanged. This function does not return; it terminates the program. *) val depend: bool -> grammar -> 'never_returns (* [write_query filename grammar] writes the grammar's semantic actions to a mock [.ml] file named [filename]. This file can then be submitted to [ocamlc] for type inference. See [--infer-write-query <filename>] in the manual. *) val write_query: string -> grammar -> 'never_returns (* [read_reply filename grammar] reads the types inferred by OCaml for the mock [.ml] file described above, and returns a new grammar, augmented with a [%type] declaration for every nonterminal symbol. *) val read_reply: string -> grammar -> grammar menhir-20210929/src/infix.ml000066400000000000000000000023011412503066000154750ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) let ( += ) r i = r := !r + i let ( *= ) r i = r := !r * i let ( -= ) r i = r := !r - i let ( /= ) r i = r := !r / i let ( ||= ) r i = r := !r || i let ( &&= ) r i = r := !r && i let ( @:= ) r f = r := f !r let ( @@> ) f g x = f (g x) menhir-20210929/src/infix.mli000066400000000000000000000031021412503066000156460ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.
*) (* *) (******************************************************************************) (** [r += n] sets [r] to [!r + n] *) val ( += ) : int ref -> int -> unit (** [r *= n] sets [r] to [!r * n] *) val ( *= ) : int ref -> int -> unit (** [r -= n] sets [r] to [!r - n] *) val ( -= ) : int ref -> int -> unit (** [r /= n] sets [r] to [!r / n] *) val ( /= ) : int ref -> int -> unit (** [r ||= n] sets [r] to [!r || n] *) val ( ||= ) : bool ref -> bool -> unit (** [r &&= n] sets [r] to [!r && n] *) val ( &&= ) : bool ref -> bool -> unit (** [r @:= f] sets [r] to [f !r] *) val ( @:= ) : 'a ref -> ('a -> 'a) -> unit (** [f @@> g] is [(fun x -> f (g x))] *) val ( @@> ) : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b menhir-20210929/src/inlining.ml000066400000000000000000000536641412503066000162110ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) let position = Positions.position open Keyword type sw = subject * where open BasicSyntax open ListMonad let drop = MenhirLib.General.drop let take = MenhirLib.General.take (* -------------------------------------------------------------------------- *) (* Throughout this file, branches (productions) are represented as lists of producers. We consider it acceptable to perform operations whose cost is linear in the length of a production, even when (with more complicated code) it would be possible to eliminate this cost. *) (* -------------------------------------------------------------------------- *) (* [search p i xs] searches the list [xs] for an element [x] that satisfies [p]. If successful, then it returns a pair of: - [i] plus the offset of [x] in the list, and - the element [x]. *) let rec search (p : 'a -> bool) (i : int) (xs : 'a list) : (int * 'a) option = match xs with | [] -> None | x :: xs -> if p x then Some (i, x) else search p (i+1) xs (* [search_at p i xs] searches the list [xs] for an element [x] that satisfies [p]. The search begins at index [i] in the list. If successful, then it returns a pair of: - the offset of [x] in the list, and - the element [x]. *) let search_at p i xs = search p i (drop i xs) (* -------------------------------------------------------------------------- *) (* [find grammar symbol] looks up the definition of [symbol], which must be a valid nonterminal symbol, in the grammar [grammar]. *) let find grammar symbol : rule = try StringMap.find symbol grammar.rules with Not_found -> (* This cannot happen. *) assert false (* -------------------------------------------------------------------------- *) (* [check_no_producer_attributes] checks that a producer, which represents a use site of an %inline symbol, does not carry any attributes. This ensures that we need not worry about propagating attributes through inlining. *) let check_no_producer_attributes producer = match producer_attributes producer with | [] -> () | (id, _payload) :: _attributes -> Error.error [position id] "the nonterminal symbol %s is declared %%inline.\n\ A use of it cannot carry an attribute." (producer_symbol producer) (* -------------------------------------------------------------------------- *) (* 2015/11/18. 
The interaction of %prec and %inline is not documented. It used to be the case that we would disallow marking a production both %inline and %prec. Now, we allow it, but we check that (1) it is inlined at the last position of the host production and (2) the host production does not already have a %prec annotation. *) let check_prec_inline caller producer nsuffix callee = callee.branch_prec_annotation |> Option.iter (fun callee_prec -> (* The callee has a %prec annotation. *) (* Check condition 1. *) if nsuffix > 0 then begin let symbol = producer_symbol producer in Error.error [ position callee_prec; caller.branch_position ] "this production carries a %%prec annotation,\n\ and the nonterminal symbol %s is marked %%inline.\n\ For this reason, %s can be used only in tail position." symbol symbol end; (* Check condition 2. *) caller.branch_prec_annotation |> Option.iter (fun caller_prec -> let symbol = producer_symbol producer in Error.error [ position callee_prec; position caller_prec ] "this production carries a %%prec annotation,\n\ and the nonterminal symbol %s is marked %%inline.\n\ For this reason, %s cannot be used in a production\n\ which itself carries a %%prec annotation." symbol symbol ) ) (* -------------------------------------------------------------------------- *) (* 2015/11/18. If the callee has a %prec annotation (which implies that the caller does not have one, and that the callee appears in tail position in the caller) then the annotation is inherited. This seems reasonable, but remains undocumented. *) let propagate_prec_annotation caller callee = match callee.branch_prec_annotation with | (Some _) as annotation -> assert (caller.branch_prec_annotation = None); annotation | None -> caller.branch_prec_annotation (* -------------------------------------------------------------------------- *) (* [new_candidate x] is a candidate fresh name, which is based on [x] in an unspecified way. A fairly arbitrary construction can be used here; we just need it to produce an infinite sequence of names, so that eventually we are certain to be able to escape any finite set of unavailable names. We also need this construction to produce reasonably concise names, as it can be iterated several times in practice; I have observed up to 9 iterations in real-world grammars. *) (* Here, the idea is to add a suffix of the form _inlined['0'-'9']+ to the name [x], if it does not already include such a suffix. If [x] already carries such a suffix, then we increment the integer number. *) let new_candidate x = let x, n = ChopInlined.chop (Lexing.from_string x) in Printf.sprintf "%s_inlined%d" x (n + 1) (* [fresh names x] returns a fresh name that is not in the set [names]. The new name is obtained by iterating [new_candidate] until we fall outside the set [names]. *) let rec fresh names x = if StringSet.mem x names then fresh names (new_candidate x) else x (* -------------------------------------------------------------------------- *) (* [rename used producers] renames the producers [producers] of the inlined branch (the callee) if necessary to avoid a clash with the set [used] of the names used by the producers of the host branch (the caller). This set need not contain the name of the producer that is inlined away. *) (* This function produces a pair of: 1. a substitution [phi], which represents the renaming that we have performed, and which must be applied to the semantic action of the callee; 2. the renamed [producers]. 
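As a concrete example of the freshening scheme described above (hypothetical names): if [used] contains "x" and "x_inlined1", then [fresh used "x"] first tries the candidate "x_inlined1", finds it unavailable, and returns "x_inlined2"; the substitution [phi] then maps "x" to "x_inlined2".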
*) let rename (used : StringSet.t) producers: Action.subst * producers = let phi, _used, producers = List.fold_left (fun (phi, used, producers) producer -> let id = producer_identifier_located producer in let x = Positions.value id in if StringSet.mem x used then let x' = fresh used x in let id' = Positions.map (fun _x -> x') id in Action.extend x x' phi, StringSet.add x' used, { producer with producer_identifier = id' } :: producers else (phi, StringSet.add x used, producer :: producers) ) (Action.empty, used, []) producers in phi, List.rev producers (* -------------------------------------------------------------------------- *) (* [define_positions] defines how the start and end positions of the callee should be computed once it is inlined into the caller. This information is used to transform [$startpos] and [$endpos] in the callee and to transform [$startpos(x)] and [$endpos(x)] in the caller. *) (* 2015/11/04. We ensure that positions are computed in the same manner, regardless of whether inlining is performed. *) (* The arguments of this function are as follows: [name] an array of the names of the producers of the new branch [nprefix] the length of the prefix of the caller, up to the inlining site [ncallee] the length of the callee The results are as follows: [startp] how to transform $startpos in the callee [endp] how to transform $endpos in the callee [beforeendp] how to transform $endpos($0) in the callee *) let define_positions (name : string array) nprefix ncallee : sw * sw * sw = let startp = if ncallee > 0 then (* If the inner production is non-epsilon, things are easy. The start position of the inner production is the start position of its first element. *) RightNamed name.(nprefix), WhereStart else if nprefix > 0 then (* If the inner production is epsilon, we are supposed to compute the end position of whatever comes in front of it. If the prefix is nonempty, then this is the end position of the last symbol in the prefix. *) RightNamed (name.(nprefix - 1)), WhereEnd else (* If the inner production is epsilon and the prefix is empty, then we need to look up the end position stored in the top stack cell. This is the reason why we need the keyword [$endpos($0)]. It is required in this case to preserve the semantics of $startpos and $endpos. *) Before, WhereEnd (* Note that, contrary to intuition perhaps, we do NOT have that if the prefix is empty, then the start position of the inner production is the start position of the outer production. This is true only if the inner production is non-epsilon. *) in let endp = if ncallee > 0 then (* If the inner production is non-epsilon, things are easy: its end position is the end position of its last element. *) RightNamed (name.(nprefix + ncallee - 1)), WhereEnd else (* If the inner production is epsilon, then its end position is equal to its start position. *) startp (* We must also transform [$endpos($0)] if it is used by the inner production. It refers to the end position of the stack cell that comes before the inner production. So, if the prefix is non-empty, then it translates to the end position of the last element of the prefix. Otherwise, it translates to [$endpos($0)]. *) and beforeendp = if nprefix > 0 then RightNamed (name.(nprefix - 1)), WhereEnd else Before, WhereEnd in startp, endp, beforeendp (* -------------------------------------------------------------------------- *) (* [rename_sw_outer] transforms the keywords in the outer production (the caller) during inlining.
It replaces [$startpos(x)] and [$endpos(x)], where [x] is the name of the callee, with [startpx] and [endpx], respectively. *) let rename_sw_outer (x, startpx, endpx) (sw : sw) : sw option = match sw with | Before, _ -> None | RightNamed x', where -> if x' = x then match where with | WhereStart -> Some startpx | WhereEnd -> Some endpx | WhereSymbolStart -> assert false (* has been expanded away *) else None | Left, _ -> (* [$startpos], [$endpos], and [$symbolstartpos] have been expanded away earlier; see [KeywordExpansion]. *) assert false (* -------------------------------------------------------------------------- *) (* [rename_sw_inner] transforms the keywords in the inner production (the callee) during inlining. It replaces [$endpos($0)] with [beforeendp]. *) let rename_sw_inner beforeendp (sw : sw) : sw option = match sw with | Before, where -> assert (where = WhereEnd); Some beforeendp | RightNamed _, _ -> None | Left, _ -> (* [$startpos] and [$endpos] have been expanded away earlier; see [KeywordExpansion]. *) assert false (* -------------------------------------------------------------------------- *) (* [inline_branch caller site callee] inlines the branch [callee] into the branch [caller] at the site [site]. By convention, a site is a pair of an integer index -- the index [i] of the producer that must be inlined away -- and a producer [producer] -- the producer itself. This is redundant, as [producer] can be recovered based on [caller] and [i], but convenient. *) type site = int * producer let inline_branch caller (i, producer : site) (callee : branch) : branch = (* The host branch (the caller) is divided into three sections: a prefix of length [nprefix], the producer that we wish to inline away, and a suffix of length [nsuffix]. *) (* Compute the length of the prefix and suffix. *) let nprefix = i in let nsuffix = List.length caller.producers - (i + 1) in (* Construct the prefix and suffix. *) let prefix = take nprefix caller.producers and suffix = drop (nprefix + 1) caller.producers in (* Apply the (undocumented) restrictions that concern the interaction between %prec and %inline. Then, (possibly) propagate a %prec annotation. *) check_prec_inline caller producer nsuffix callee; let branch_prec_annotation = propagate_prec_annotation caller callee in (* Compute the names of the producers in the host branch (the caller), minus the one that is being inlined away. Rename the producers of the inlined branch (the callee), if necessary, so as to avoid a clash with this set. The goal is to guarantee that, after inlining, all producers in the newly constructed branch have unique names. *) let used = StringSet.union (names prefix) (names suffix) in let phi, inlined_producers = rename used callee.producers in (* Construct (the producers of) the new branch. The prefix and suffix of the caller are preserved. In the middle, [producer] disappears and is replaced with [inlined_producers]. For debugging purposes, check that each producer in the new branch carries a unique name. *) let producers = prefix @ inlined_producers @ suffix in let (_ : StringSet.t) = names producers in (* Find out how the start and end positions of the callee should be computed once it is inlined into the caller. 
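As a worked sketch of [define_positions]: with [nprefix = 1] and [ncallee = 2], it returns [startp = RightNamed name.(1), WhereStart], [endp = RightNamed name.(2), WhereEnd], and [beforeendp = RightNamed name.(0), WhereEnd]; with [nprefix = 0] and [ncallee = 0], all three components are [Before, WhereEnd].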
*) let startp, endp, beforeendp = let name = producers |> Array.of_list |> Array.map producer_identifier in let ncallee = List.length callee.producers in define_positions name nprefix ncallee in (* Apply appropriate renamings to the semantic actions of the caller and callee, then compose them using a [let] binding. If [x] is the name of the producer that we wish to inline away, then the variable [x] in the caller's semantic action should refer to the semantic value produced by the callee's semantic action. *) let x = producer_identifier producer in let caller_action, callee_action = Action.rename (rename_sw_outer (x, startp, endp)) Action.empty caller.action, Action.rename (rename_sw_inner beforeendp) phi callee.action in let action = Action.compose x callee_action caller_action in (* We are done! Build a new branch. *) let { branch_position; branch_production_level; _ } = caller in { branch_position; producers; action; branch_prec_annotation; branch_production_level; } (* -------------------------------------------------------------------------- *) (* Inlining a list of branches [callees] into the branch [caller] at [site]. *) let inline_branches caller site (callees : branches) : branches = List.map (inline_branch caller site) callees (* -------------------------------------------------------------------------- *) (* For greater syntactic convenience, the main function is written as a functor, and re-packaged as a function at the very end. *) (* Roughly speaking, the transformation is implemented by two mutually recursive functions. [expand_branches] transforms a list of branches into a list of (expanded) branches; [expand_symbol] maps a nonterminal symbol (which may or may not be marked %inline) to its definition in the transformed grammar, an (expanded) rule. In order to avoid duplication of work, we memoize [expand_symbol]. Thus, the expansion of each symbol is computed at most once. (Expansions are demanded top-down, but are computed bottom-up.) Memoization is implemented without pain by using a ready-made fixed point combinator, [Memoize.defensive_fix]. Furthermore, this fixed point combinator dynamically detects cycles of %inline nonterminal symbols, allowing us to avoid divergence and display a nice error message. *) module Inline (G : sig val grammar: grammar end) = struct open G let is_inline_symbol = is_inline_symbol grammar let is_inline_producer = is_inline_producer grammar let find = find grammar (* In [--coq] mode, %inline is forbidden. There are two reasons for this. One technical reason is that inlining requires constructing composite semantic actions (using [Action.compose], etc.) and this construction is currently OCaml-specific. (This could be rather easily changed, though.) A more philosophical reason is that we don't want to have a large gap between the grammar written by the user in the .mly file and the grammar written by Menhir in the .v file. The latter grammar is the reference grammar, the one with respect to which the generated parser is proved correct. *) let () = if Settings.coq then StringMap.iter (fun _ rule -> if rule.inline_flag then Error.error rule.positions "%%inline is not supported by the Coq back-end." ) grammar.rules (* This is [expand_branches], parameterized by its companion function, [expand_symbol]. The parameter [i], an integer, is used to perform a left-to-right sweep: the precondition of [expand_branches] is that there are no inlining sites at indices less than [i] in [branches]. Thus, we can begin searching at index [i].
(Beginning to search at index 0 would work, too, but would cause redundant searches.) *) let rec expand_branches expand_symbol i branches : branches = (* For each branch [caller] in the list [branches], *) branches >>= fun (caller : branch) -> (* Search for an inlining site in the branch [caller]. We begin the search at position [i], as we know that every inlining site left of this position has been dealt with already. *) match search_at is_inline_producer i caller.producers with | None -> (* There is none; we are done. *) return caller | Some ((i, producer) as site) -> (* There is one. This is an occurrence of a nonterminal symbol [symbol] that is marked %inline. We look up its (expanded) definition (via a recursive call to [expand_symbol]), yielding a set of branches, which we inline into the branch [caller]. Then, we continue looking for inlining sites. *) check_no_producer_attributes producer; let symbol = producer_symbol producer in expand_symbol symbol |> get_branches |> inline_branches caller site |> expand_branches expand_symbol i (* This is [expand_symbol], parameterized by itself. *) let expand_symbol expand_symbol symbol : rule = (* Find the rule that defines this symbol. Then, transform this rule by applying [expand_branches] to its branches. The left-to-right sweep begins at index 0. *) find symbol |> transform_branches (expand_branches expand_symbol 0) (* Apply [defensive_fix] to obtain a closed function [expand_symbol]. *) let expand_symbol : Syntax.symbol -> rule = Fix.Memoize.String.defensive_fix expand_symbol (* Wrap [expand_symbol] in an exception handler, so that, when a cycle of %inline nonterminal symbols is detected, a good error message is displayed. *) let expand_symbol symbol = try expand_symbol symbol with Fix.Memoize.String.Cycle (symbols, symbol) -> let rule = find symbol in let b = Buffer.create 128 in Printf.bprintf b "there is a cycle of %%inline nonterminal symbols:\n"; begin match symbols with | [] -> assert false | head :: [] -> assert (head = symbol); Printf.bprintf b " %s refers to itself." symbol | head :: next :: symbols -> assert (head = symbol); Printf.bprintf b " %s refers to %s,\n" head next; List.iter (fun symbol -> Printf.bprintf b " which refers to %s,\n" symbol ) symbols; Printf.bprintf b " which refers back to %s." symbol end; Error.error rule.positions "%s" (Buffer.contents b) (* The rules of the transformed grammar are obtained by keeping only non-%inline symbols and expanding their rules. *) let rules = grammar.rules |> StringMap.filter (fun _ rule -> not rule.inline_flag) |> StringMap.mapi (fun symbol _rule -> expand_symbol symbol) (* Drop %type declarations that concern %inline symbols. *) let keep symbol _rule : bool = not (is_inline_symbol symbol) let types = StringMap.filter keep grammar.types (* Drop %on_error_reduce declarations that concern %inline symbols. At the same time, display a warning, as this seems strange: these declarations are useless. *) let keep_or_warn (symbol : string) _rule : bool = let keep = keep symbol _rule in if not keep then Error.grammar_warning [] "the declaration %%on_error_reduce %s\n\ has no effect: this symbol is marked %%inline and is expanded away." symbol; keep let on_error_reduce = StringMap.filter keep_or_warn grammar.on_error_reduce (* We are done. *) let grammar = { grammar with rules; types; on_error_reduce } end (* -------------------------------------------------------------------------- *) (* Re-package the above functor as a function. 
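The function [inline], defined below, is the sole entry point of this module (see inlining.mli). For the record, a cyclic %inline definition is rejected by [expand_symbol] above with a message of the following shape (hypothetical symbol names): there is a cycle of %inline nonterminal symbols: foo refers to bar, which refers back to foo.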
*) let inline grammar = let module I = Inline(struct let grammar = grammar end) in I.grammar menhir-20210929/src/inlining.mli000066400000000000000000000023051412503066000163440ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open BasicSyntax (** [inline g] traverses the grammar [g] and inlines away the nonterminal symbols whose definitions are marked [%inline]. The result is a grammar where no symbols are marked [%inline]. *) val inline: grammar -> grammar menhir-20210929/src/installation.ml000066400000000000000000000044771412503066000171010ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* [normalize] normalizes a file name by recognizing . and .. and treating them in an appropriate manner. *) let rec normalize fn = let dir = Filename.dirname fn in let base = Filename.basename fn in if dir = fn then (* This could be the case e.g. if [fn] is "/". *) dir else if base = Filename.current_dir_name then (* We have "." as the basename, that is, at the end. Remove it and continue. *) normalize dir else if base = Filename.parent_dir_name then (* We have ".." as the basename, that is, at the end. Normalize the rest. Once done, chop off the basename, thus moving to the parent directory. *) Filename.dirname (normalize dir) else (* We have a normal basename. Normalize the rest. *) Filename.concat (normalize dir) base (* The directory where (we think) MenhirLib is installed. *) (* This directory used to be hard-coded in the [menhir] executable. We now adopt a different strategy. We fetch the name of the [menhir] executable, and hope that it is of the form [.../bin/menhir]. We change this to [.../lib/menhirLib], and hope that this is where MenhirLib is installed. *) let libdir () = let root = Sys.executable_name |> normalize |> Filename.dirname (* remove [menhir] *) |> Filename.dirname (* remove [bin] *) in Filename.concat root (Filename.concat "lib" "menhirLib") menhir-20210929/src/installation.mli000066400000000000000000000020511412503066000172340ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The directory where (we think) MenhirLib is installed. 
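For example, a hedged sketch of the heuristic implemented in [installation.ml]: if [Sys.executable_name] is /usr/local/bin/menhir, then [libdir()] returns /usr/local/lib/menhirLib.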
*) val libdir: unit -> string menhir-20210929/src/interface.ml000066400000000000000000000136351412503066000163340ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open BasicSyntax open IL open CodeBits (* -------------------------------------------------------------------------- *) (* The [Error] exception. *) let excname = "Error" let excdef = { excname = excname; exceq = (if Settings.fixedexc then Some "Parsing.Parse_error" else None); } (* -------------------------------------------------------------------------- *) (* The type of the monolithic entry point for the start symbol [symbol]. *) let entrytypescheme grammar symbol = let typ = TypTextual (ocamltype_of_start_symbol grammar symbol) in type2scheme (marrow [ arrow tlexbuf TokenType.ttoken; tlexbuf ] typ) (* -------------------------------------------------------------------------- *) (* When the table back-end is active, the generated parser contains, as a sub-module, an application of [Engine.Make]. This sub-module is named as follows. *) let interpreter = "MenhirInterpreter" let checkpoint t = TypApp (interpreter ^ ".checkpoint", [ t ]) let lr1state = "lr1state" let tlr1state a : typ = TypApp (lr1state, [a]) (* -------------------------------------------------------------------------- *) (* The name of the sub-module that contains the incremental entry points. *) let incremental = "Incremental" (* The type of the incremental entry point for the start symbol [symbol]. *) let entrytypescheme_incremental grammar symbol = let t = TypTextual (ocamltype_of_start_symbol grammar symbol) in type2scheme (marrow [ tposition ] (checkpoint t)) (* -------------------------------------------------------------------------- *) (* The name of the sub-module that contains the inspection API. *) let inspection = "Inspection" (* -------------------------------------------------------------------------- *) (* The monolithic (traditional) API: the type [token], the exception [Error], and the parser's entry points. *) let monolithic_api grammar = TokenType.tokentypedef grammar @ IIComment "This exception is raised by the monolithic API functions." :: IIExcDecls [ excdef ] :: IIComment "The monolithic API." :: IIValDecls ( StringSet.fold (fun symbol decls -> (Misc.normalize symbol, entrytypescheme grammar symbol) :: decls ) grammar.start_symbols [] ) :: [] (* -------------------------------------------------------------------------- *) (* The inspection API. *) let inspection_api grammar () = let a = "a" in (* Define the types [terminal] and [nonterminal]. *) TokenType.tokengadtdef grammar @ NonterminalType.nonterminalgadtdef grammar @ (* Include the signature that lists the inspection functions, with appropriate type instantiations. *) IIComment "The inspection API." 
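(* A hedged sketch of how this [include] is printed in the generated [.mli] file (the exact output may differ): [include MenhirLib.IncrementalEngine.INSPECTION with type 'a lr1state := 'a lr1state and type production := production and ...], where [WKDestructive] requests destructive substitution. *)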
:: IIInclude ( with_types WKDestructive "MenhirLib.IncrementalEngine.INSPECTION" [ [ a ], "lr1state", tlr1state (TypVar a); [], "production", TypApp ("production", []); [ a ], TokenType.tctokengadt, TokenType.ttokengadt (TypVar a); [ a ], NonterminalType.tcnonterminalgadt, NonterminalType.tnonterminalgadt (TypVar a); [ a ], "env", TypApp ("env", [ TypVar a ]); ] ) :: [] (* -------------------------------------------------------------------------- *) (* The incremental API. *) let incremental_engine () : module_type = with_types WKNonDestructive "MenhirLib.IncrementalEngine.INCREMENTAL_ENGINE" [ [], "token", (* NOT [tctoken], which is qualified if [--external-tokens] is used *) TokenType.ttoken ] let incremental_entry_points grammar : interface = IIComment "The entry point(s) to the incremental API." :: IIModule (incremental, MTSigEnd [ IIValDecls ( StringSet.fold (fun symbol decls -> (symbol, entrytypescheme_incremental grammar symbol) :: decls ) grammar.start_symbols [] ) ]) :: [] let incremental_api grammar () : interface = IIModule ( interpreter, MTSigEnd ( IIComment "The incremental API." :: IIInclude (incremental_engine()) :: MList.ifnlazy Settings.inspection (inspection_api grammar) ) ) :: (* The entry points must come after the incremental API, because their type refers to the type [checkpoint]. *) incremental_entry_points grammar (* -------------------------------------------------------------------------- *) (* The complete interface of the generated parser. *) let interface grammar = [ IIFunctor (grammar.parameters, monolithic_api grammar @ MList.ifnlazy Settings.table (incremental_api grammar) ) ] (* -------------------------------------------------------------------------- *) (* Writing the interface to a file. *) let write grammar () = (* We have a dependency on [TokenType], which takes care of the case where [token_type_mode] is [TokenTypeOnly]. *) assert (Settings.token_type_mode <> Settings.TokenTypeOnly); let mli = open_out (Settings.base ^ ".mli") in let module P = Printer.Make (struct let f = mli let locate_stretches = None end) in P.interface (interface grammar); close_out mli menhir-20210929/src/interface.mli000066400000000000000000000033741412503066000165040ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module defines the interface of the generated parser. *) (* This is the [Error] exception. *) val excname: string val excdef: IL.excdef (* The type of the entry point for the start symbol [nt]. *) val entrytypescheme: BasicSyntax.grammar -> string -> IL.typescheme (* The name of the interpreter sub-module, when the table back-end is used. *) val interpreter: string (* The type ['a checkpoint], defined in the interpreter sub-module. *) val checkpoint: IL.typ -> IL.typ (* The name of the sub-module that contains the incremental entry points. *) val incremental: string (* The name of the sub-module that contains the inspection API. *) val inspection: string (* This writes the interface of the generated parser to the [.mli] file. 
*) val write: BasicSyntax.grammar -> unit -> unit menhir-20210929/src/interpret.ml000066400000000000000000001227051412503066000164070ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) (* --------------------------------------------------------------------------- *) open Printf open Grammar open SentenceParserAux (* A delimiter. *) type delimiter = string (* An error message. *) type message = string (* A target tells us which state a sentence leads to, as well as which spurious reductions are performed at the end. *) type target = ReferenceInterpreter.target let target2state (s, _spurious) = s (* A targeted sentence is a located sentence together with the target into which it leads. *) type targeted_sentence = located_sentence * target (* A run is a series of targeted sentences or comments, followed with a delimiter (composed of at least one blank line and possibly comments), followed with an error message. *) type run = { (* A list of sentences. *) elements: targeted_sentence or_comment list; (* A delimiter. *) delimiter: delimiter; (* A message. *) message: message; } (* --------------------------------------------------------------------------- *) (* [stream] turns a finite list of terminals into a stream of terminals, represented as a pair of a lexer and a lexing buffer, so as to be usable with Menhir's traditional API. *) (* A traditional lexer returns a token and updates the fields of the lexing buffer with new positions. Here, we have no position information, so we keep the dummy positions that exist at initialization time. *) (* When the finite list is exhausted, two plausible behaviors come to mind. One behavior consists in raising an exception. In that case, we are creating a finite stream, and it is up to the parser to not read past its end. Another behavior consists in returning a designated token. In that case, we are creating an infinite, eventually constant, stream. The choice between these two behaviors is somewhat arbitrary; furthermore, in the second case, the choice of the designated token is arbitrary as well. Here, we adopt the second behavior if and only if the grammar has an EOF token, and we use EOF as the designated token. Again, this is arbitrary, and could be changed in the future. *) exception EndOfStream include struct open Lexing let stream (toks : Terminal.t list) : (lexbuf -> Terminal.t) * lexbuf = let toks = ref toks in let lexbuf = from_string "" in lexbuf.lex_start_p <- dummy_pos; lexbuf.lex_curr_p <- dummy_pos; let lexer _lexbuf = match !toks with | tok :: more -> toks := more; tok | [] -> match Terminal.eof with | Some eof -> eof | None -> raise EndOfStream in lexer, lexbuf end (* --------------------------------------------------------------------------- *) (* [start sentence] returns the start symbol that we should use to interpret the sentence [sentence]. *) (* If a start symbol was explicitly provided as part of the sentence, we use it. Otherwise, we use the grammar's unique start symbol, if there is one.
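For example (hypothetical symbols): with a grammar that has the two start symbols [main] and [phrase], a sentence on the standard input channel must name its start symbol explicitly, as in: main: INT PLUS INT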
*) let start poss ((nto, _) : sentence) : Nonterminal.t = match nto with | Some nt -> nt | None -> match ProductionMap.is_singleton Lr1.entry with | None -> Error.error poss "because the grammar has multiple start symbols, each of the\n\ sentences provided on the standard input channel must be of the\n\ form: <start symbol>: <token>*" | Some (prod, _) -> match Production.classify prod with | Some nt -> nt | None -> assert false (* --------------------------------------------------------------------------- *) (* [interpret] interprets a sentence. *) let interpret ((_, toks) as sentence) : unit = let nt = start [] sentence in (* Run the reference interpreter. This can produce a concrete syntax tree ([Some cst]), fail with a parser error ([None]), or fail with a lexer error ([EndOfStream]). *) (* In every case, we produce just one line of output, so it should be clear to the user which outcomes correspond to which sentences (should multiple sentences be supplied). *) let lexer, lexbuf = stream toks in begin match ReferenceInterpreter.interpret nt Settings.trace lexer lexbuf with | Some cst -> (* Success. *) printf "ACCEPT"; if Settings.interpret_show_cst then begin print_newline(); Cst.show stdout cst end | None -> (* Parser failure. *) printf "REJECT" | exception EndOfStream -> (* Lexer failure. *) printf "OVERSHOOT" end; print_newline() (* --------------------------------------------------------------------------- *) (* [interpret_error_aux] interprets a sentence, expecting it to end in an error. Failure or success is reported via two continuations. *) let interpret_error_aux log poss ((_, terminals) as sentence) fail succeed = let nt = start poss sentence in let open ReferenceInterpreter in match check_error_path log nt terminals with | OInputReadPastEnd -> fail "no syntax error occurs." | OInputNotFullyConsumed -> fail "a syntax error occurs before the last token is reached." | OUnexpectedAccept -> fail "no syntax error occurs; in fact, this input is accepted." | OK target -> succeed nt terminals target (* --------------------------------------------------------------------------- *) (* This default error message is produced by [--list-errors] when it creates a [.messages] file, and is recognized by [--compare-errors] when it compares two such files. *) let default_message = "\n" (* This is needed in the following function. If [print_messages_auto] is never called, then we end up needlessly performing this analysis. Fortunately, it is extremely cheap. *) module SS = StackSymbols.Run() (* [print_messages_auto (nt, sentence, target)] displays the sentence defined by [nt] and [sentence], leading to the state [target]. It then displays a bunch of auto-generated comments. *) let print_messages_auto (nt, sentence, target) : unit = (* Print the sentence. *) print_string (Sentence.print `Abstract (Some nt, sentence)); (* If a token alias has been defined for every terminal symbol, then we can convert this sentence into concrete syntax. Do so. We make a few assumptions about the concrete syntax of the language: 1. It is permitted to insert one space between two tokens; 2. No token contains a newline character. (Our lexer enforces this assumption.) The name of the start symbol cannot be printed in a meaningful manner, so it is omitted. *) if Terminal.every_token_has_an_alias then printf "##\n\ ## Concrete syntax: %s\n" (Sentence.print `Concrete (Some nt, sentence)) ; (* Show which state this sentence leads to.
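Concretely (with a hypothetical state number), the lines emitted below read: ## Ends in an error in state: 42. -- followed by the LR(1) items of that state, each on its own ##-prefixed line.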
*) let (s', spurious) = target in printf "##\n\ ## Ends in an error in state: %d.\n\ ##\n\ %s##\n" (Lr1.number s') (* [Lr0.print] or [Lr0.print_closure] could be used here. The latter could sometimes be helpful, but is usually intolerably verbose. *) (Lr0.print "## " (Lr1.state s')) ; (* Show the known suffix of the stack in this state. *) printf "## The known suffix of the stack is as follows:\n\ ##%s\n\ ##\n" (StackSymbols.print_symbols (SS.stack_symbols s')) ; (* If interpreting this sentence causes spurious reductions (that is, reductions that take place after the last terminal symbol has been shifted), say so, and show them. *) if spurious <> [] then begin printf "## WARNING: This example involves spurious reductions.\n\ ## This implies that, although the LR(1) items shown above provide an\n\ ## accurate view of the past (what has been recognized so far), they\n\ ## may provide an INCOMPLETE view of the future (what was expected next).\n" ; List.iter (fun (s, prod) -> printf "## In state %d, spurious reduction of production %s\n" (Lr1.number s) (Production.print prod) ) spurious; printf "##\n" end (* [print_messages_item] displays one data item. The item is of the form [nt, sentence, target], which means that beginning at the start symbol [nt], the sentence [sentence] ends in an error in the target state given by [target]. [target] also contains information about which spurious reductions are performed at the end. The display obeys the [.messages] file format. *) let print_messages_item (nt, sentence, target) : unit = (* Print the sentence, followed with auto-generated comments. *) print_messages_auto (nt, sentence, target); (* Then, print a proposed error message, between two blank lines. *) printf "\n%s\n" default_message (* --------------------------------------------------------------------------- *) (* [write_run run] writes a run into a new [.messages] file. Manually-written comments are preserved. New auto-generated comments are produced. *) let write_run : run or_comment -> unit = function | Thing run -> (* First, print every sentence and human comment. *) List.iter (fun sentence_or_comment -> match sentence_or_comment with | Thing ((poss, ((_, toks) as sentence)), target) -> let nt = start poss sentence in (* Every sentence is followed with newly generated auto-comments. *) print_messages_auto (nt, toks, target) | Comment c -> print_string c ) run.elements; (* Then, print the delimiter, which must begin with a blank line and may include comments. *) print_string run.delimiter; (* Then, print the error message. *) print_string run.message (* No need for another blank line. It will be printed as part of a separate [Comment]. *) | Comment comments -> (* Must begin with a blank line. *) print_string comments (* --------------------------------------------------------------------------- *) (* [interpret_error] interprets a sentence, expecting it to end in an error. Failure or success is reported on the standard output channel. This is used by [--interpret-error]. *) let fail msg = Error.error [] "%s" msg let succeed nt terminals target = print_messages_item (nt, terminals, target); exit 0 let interpret_error sentence = interpret_error_aux Settings.trace [] sentence fail succeed (* --------------------------------------------------------------------------- *) (* The lexer [SentenceLexer] produces sentences that contain raw symbols, that is, strings that are not yet known to represent valid nonterminal or terminal symbols. This check is performed here. 
It either succeeds or signals an error in the category [c] and raises [Invalid]. *) (* We also check that every sentence leads to an error state. *) exception Invalid let validate_nonterminal_symbol c (lid, startpos, endpos) = match Nonterminal.lookup lid with | exception Not_found -> Error.signal c [Positions.import (startpos, endpos)] "\"%s\" is not a known non-terminal symbol." lid; raise Invalid | nt -> if Nonterminal.is_user_start nt then nt else begin Error.signal c [Positions.import (startpos, endpos)] "\"%s\" is not a start symbol." lid; raise Invalid end let validate_terminal_symbol c (uid, startpos, endpos) = try Terminal.lookup uid with Not_found -> Error.signal c [Positions.import (startpos, endpos)] "\"%s\" is not a known terminal symbol." uid; raise Invalid let validate_sentence c (sentence : raw_sentence) : sentence = let (nto, terminals) = sentence in Option.map (validate_nonterminal_symbol c) nto, List.map (validate_terminal_symbol c) terminals let validate_optional_sentence c = Option.map (validate_sentence c) let validate_located_sentence c (poss, sentence) : targeted_sentence = (* First, validate every symbol. *) let sentence = validate_sentence c sentence in (* Then, check that this sentence leads to an error state. *) interpret_error_aux false poss sentence (* failure: *) (fun msg -> Error.signal c poss "this sentence does not end with a syntax error, as it should:\n%s" msg; raise Invalid) (* success: *) (fun _nt _terminals target -> (poss, sentence), target) (* [validate_entry] validates a list of located sentences or comments, as returned by [SentenceParser.entry]. If a sentence contains an error, then an error message is emitted, this sentence is removed from the list, and the validation process continues. *) let validate_entry c entry : targeted_sentence or_comment list = Misc.filter_map (function | Thing sentence -> begin try Some (Thing (validate_located_sentence c sentence)) with Invalid -> None end | Comment c -> Some (Comment c) ) entry (* This wrapper causes Menhir to exit if at least one error was signaled during validation of [x] by [validate]. *) let strictly validate x = Error.with_new_category (fun c -> validate c x) (* --------------------------------------------------------------------------- *) (* [setup()] returns a function [read] which reads one sentence from the standard input channel and immediately validates it. *) let setup () : unit -> sentence option = let open Lexing in let lexbuf = from_channel stdin in lexbuf.lex_curr_p <- { lexbuf.lex_curr_p with pos_fname = "(stdin)" }; let read () = match SentenceParser.optional_sentence SentenceLexer.lex lexbuf with | exception Parsing.Parse_error -> Error.error (Positions.lexbuf lexbuf) "ill-formed input sentence." | osentence -> strictly validate_optional_sentence osentence in read (* --------------------------------------------------------------------------- *) (* Display an informational message about the contents of a [.messages] file. *) let stats (runs : run or_comment list) = (* [s] counts the sample input sentences. [m] counts the error messages. *) let s = ref 0 and m = ref 0 in List.iter (function | Thing { elements; _ } -> incr m; List.iter (function | Thing _ -> incr s | Comment _ -> () ) elements | Comment _ -> () ) runs; eprintf "Read %d sample input sentences and %d error messages.\n%!" !s !m; runs (* --------------------------------------------------------------------------- *) (* Reading a [.messages] file. 
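The layout that we expect, sketched with a hypothetical grammar and message: a non-empty series of sentences (possibly interleaved with comments), a delimiter that contains at least one blank line, and a free-form message, as in: [main: INT PLUS], a blank line, then [Expected an expression after PLUS.]. Auto-generated ## comments may accompany each sentence; manual comments are preserved.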
*) (* Our life is slightly complicated by the fact that the whitespace between two runs can contain comments, which we wish to preserve when performing [--update-errors]. *) (* Sentences that do not pass validation are removed (and error messages are emitted). If one or more validation errors have occurred and if [strict] is [true], then we stop at the end. *) let mkcomment c accu = if String.length c = 0 then accu else Comment c :: accu let read_messages strict filename : run or_comment list = let open Segment in let c = Error.new_category() in (* Read and segment the file. *) let segments : (tag * string * Lexing.lexbuf) list = segment filename in (* Process the segments, two by two. We expect one segment to contain a non-empty series of sentences, and the next segment to contain free-form text. *) let rec loop accu segments = match segments with | [] -> List.rev accu | (Whitespace, comments, _) :: segments -> loop (mkcomment comments accu) segments | (Segment, _, lexbuf) :: segments -> (* Read a series of located sentences. *) match SentenceParser.entry SentenceLexer.lex lexbuf with | exception Parsing.Parse_error -> Error.error [Positions.cpos lexbuf] "ill-formed sentence." | elements -> (* [elements] is a list of located raw sentences or comments. Validate it. Any sentences that do not pass validation are removed (and error messages are emitted). In an effort to be robust, we continue. If there remain zero sentences, then this entry is removed entirely. *) let elements = validate_entry c elements in (* In principle, we should now find a segment of whitespace followed with a segment of text. By construction, the two kinds of segments alternate. *) match segments with | (Whitespace, delimiter, _) :: (Segment, message, _) :: segments -> if count_things elements = 0 then (* There remain zero sentences. Skip this entry. *) loop accu segments else (* Accumulate this entry. *) let run = { elements; delimiter; message } in loop (Thing run :: accu) segments | [] | [ _ ] -> Error.error (Positions.one (Lexing.lexeme_end_p lexbuf)) "missing a final message. I may be desynchronized." | (Segment, _, _) :: _ | (Whitespace, _, _) :: (Whitespace, _, _) :: _ -> (* Should not happen, thanks to the alternation between the two kinds of segments. *) assert false in let runs = stats (loop [] segments) in if strict then Error.exit_if c; runs (* --------------------------------------------------------------------------- *) (* [foreach_targeted_sentence f accu runs] iterates over the targeted sentences in the list [runs]. The function [f] receives the current accumulator, a targeted sentence, and the corresponding message, and must return an updated accumulator. *) let foreach_targeted_sentence f accu (runs : run or_comment list) = List.fold_left (or_comment_fold (fun accu run -> List.fold_left (or_comment_fold (fun accu sentence -> f accu sentence run.message )) accu run.elements )) accu runs (* --------------------------------------------------------------------------- *) (* [message_table] converts a list of targeted runs to a table (a mapping) of states to located sentences and messages. Optionally, it can detect that two sentences lead to the same state, and report an error. 
*) let message_table (detect_redundancy : bool) (runs : run or_comment list) : (located_sentence * message) Lr1.NodeMap.t = Error.with_new_category (fun c -> foreach_targeted_sentence (fun table (sentence2, target) message -> let s = target2state target in match Lr1.NodeMap.find s table with | sentence1, _ -> if detect_redundancy then Error.signal c (fst sentence1 @ fst sentence2) "these sentences both cause an error in state %d." (Lr1.number s); table | exception Not_found -> Lr1.NodeMap.add s (sentence2, message) table ) Lr1.NodeMap.empty runs ) (* --------------------------------------------------------------------------- *) (* [compile_runs] converts a list of targeted runs to OCaml code that encodes a mapping of state numbers to error messages. The code is sent to the standard output channel. *) let compile_runs filename (runs : run or_comment list) : unit = (* We wish to produce a function that maps a state number to a message. By convention, we call this function [message]. *) let name = "message" in let open IL in let open CodeBits in let default = { branchpat = PWildcard; branchbody = eraisenotfound (* The default branch raises an exception, which can be caught by the user, who can then produce a generic error message. *) } in let branches = List.fold_left (or_comment_fold (fun branches run -> (* Create an or-pattern for these states. *) let states = Misc.filter_map (or_comment_filter_map (fun (_, target) -> let s = target2state target in pint (Lr1.number s) )) run.elements in (* Map all these states to this message. *) { branchpat = POr states; branchbody = EStringConst run.message } :: branches )) [ default ] runs in let messagedef = { valpublic = true; valpat = PVar name; valval = EFun ([ PVar "s" ], EMatch (EVar "s", branches)) } in let program = [ SIComment (sprintf "This file was auto-generated based on \"%s\"." filename); SIComment (sprintf "Please note that the function [%s] can raise [Not_found]." name); SIValDefs (false, [ messagedef ]); ] in (* Write this program to the standard output channel. *) let module P = Printer.Make (struct let f = stdout let locate_stretches = None end) in P.program program (* --------------------------------------------------------------------------- *) (* The rest of this file is the function [run], internally written as a functor [Run] for syntactic convenience. *) module Run (X : sig end) = struct (* --------------------------------------------------------------------------- *) (* If [--interpret] is set, interpret the sentences found on the standard input channel, then stop, without generating a parser. *) (* We read a series of sentences from the standard input channel. To allow interactive use, we interpret each sentence as soon as it is read. *) let () = if Settings.interpret then let read = setup() in printf "Ready!\n%!"; while true do match read() with | None -> exit 0 | Some sentence -> interpret sentence done (* --------------------------------------------------------------------------- *) (* If [--interpret-error] is set, interpret one sentence found on the standard input channel, then stop, without generating a parser. *) (* We read just one sentence, confirm that this sentence ends in an error, and (if that is the case) display the number of the state that is reached. 
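A hypothetical session: if the standard input channel carries the single line [main: INT PLUS], and if this sentence does end in a syntax error, then [interpret_error] prints the sentence in the [.messages] format, followed with the auto-generated ## comments and with the default (blank) message, and exits with code 0; otherwise, an error is reported.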
*) let () = if Settings.interpret_error then let read = setup() in match read() with | None -> exit 1 (* abnormal: no input *) | Some sentence -> interpret_error sentence (* never returns *)
(* --------------------------------------------------------------------------- *)
(* If [--compile-errors <filename>] is set, compile the error message descriptions found in file [filename] down to OCaml code, then stop. *)
let () = Settings.compile_errors |> Option.iter (fun filename -> (* Read the file. Compute the target state of every sentence. Stop if a sentence does not end in an error state, as expected. *) let strict = true in let runs : run or_comment list = read_messages strict filename in (* Build a mapping of states to located sentences. This allows us to detect if two sentences lead to the same state. *) let _ = message_table true runs in (* In principle, we would like to check whether this set of sentences is complete (i.e., covers all states where an error can arise), but this may be costly -- it requires running [LRijkstra]. Instead, we offer a separate facility for comparing two [.messages] files, one of which can be produced via [--list-errors]. This can be used to ensure completeness. *) (* Now, compile this information down to OCaml code. We wish to produce a function that maps a state number to a message. By convention, we call this function [message]. *) compile_runs filename runs; exit 0 )
(* --------------------------------------------------------------------------- *)
(* If two [--compare-errors <filename>] directives are provided, compare the two message descriptions files, and stop. We wish to make sure that every state that appears on the left-hand side appears on the right-hand side as well. *)
let compare_errors filename1 filename2 = (* Read both files. *) let strict = false in let runs1 = read_messages strict filename1 and runs2 = read_messages strict filename2 in (* Convert the right-hand file to a table for quick lookup. *) let table2 = message_table false runs2 in (* There is no need to convert the left-hand file. In fact, not converting it to a table allows us to produce error messages in an order that respects the left-hand file. Indeed, the left-hand file is processed by the following loop: *) Error.with_new_category begin fun c -> foreach_targeted_sentence begin fun () (sentence1, target1) message1 -> let s = target2state target1 in (* 1. Check that the target state [s] appears in [table2]. *) match Lr1.NodeMap.find s table2 with | exception Not_found -> let poss1 = fst sentence1 in Error.signal c poss1 "this sentence leads to an error in state %d.\n\ No sentence that leads to this state exists in \"%s\"." (Lr1.number s) filename2 (* 2. Check that [s] is mapped by [table1] and [table2] to the same error message. As an exception, if the message found in [table1] is the default message, then no comparison takes place. This allows using [--list-errors] and [--compare-errors] in conjunction to ensure that a [.messages] file is complete, without seeing warnings about different messages. *) | sentence2, message2 -> if message1 <> default_message && message1 <> message2 then begin let poss1 = fst sentence1 and poss2 = fst sentence2 in Error.warning (poss1 @ poss2) "these sentences lead to an error in state %d.\n\ The corresponding messages in \"%s\" and \"%s\" differ."
(Lr1.number s) filename1 filename2 end end () runs1 end
let () = Settings.compare_errors |> Option.iter (fun (filename1, filename2) -> compare_errors filename1 filename2; exit 0 )
(* --------------------------------------------------------------------------- *)
(* Auxiliary functions for [merge_errors]. *)
(* [is_blank c] determines whether the comment [c] is blank. *)
let is_blank_char c = match c with | ' ' | '\n' | '\r' | '\t' -> true | _ -> false
let rec is_blank c i n = i = n || is_blank_char c.[i] && is_blank c (i+1) n
let is_blank c = is_blank c 0 (String.length c)
(* [remove_leading_blank_comment] removes a leading blank comment from a list. *)
let remove_leading_blank_comment xs = match xs with | [] -> [] | Comment c :: xs when is_blank c -> xs | _ -> xs
(* A simple queue where [emit] inserts an element at the end and [elements] returns the current list of all elements and clears the queue. *)
module Q = struct let create () = let q = ref [] in let emit x = q := x :: !q and elements () = let xs = List.rev !q in q := []; xs in emit, elements end
let conflict_comment filename = sprintf "#@ WARNING:\n\ #@ The following sentence has been copied from \"%s\".\n\ #@ It is redundant with a sentence that appears earlier in this file,\n\ #@ so one of them must be removed.\n" filename
let toplevel_comment filename = sprintf "#@ WARNING:\n\ #@ The following comment has been copied from \"%s\".\n\ #@ It may need to be proofread, updated, moved, or removed.\n" filename
(* [is_default_run p run] tests whether [run] is a default run, that is, a run that consists of a single sentence and a default message. If so, it additionally tests whether the sentence's target state satisfies [p]. *)
let is_default_run (p : Lr1.node -> bool) (run : run) = run.message = default_message && let sentences : targeted_sentence list = List.fold_left (or_comment_fold (fun xs x -> x :: xs)) [] run.elements in match sentences with | [ (_sentence, target) ] -> let s = target2state target in p s | _ -> false
(* [remove_default_runs] removes from the list [runs] the default runs whose target state satisfies [p]. *)
(* We make the assumption that a default run does not contain interesting comments, so it is not a problem to lose these comments when the run is removed. *)
let rec remove_default_runs p (runs : run or_comment list) = match runs with | [] -> [] | Comment c :: runs -> Comment c :: remove_default_runs p runs | Thing run :: runs -> if is_default_run p run then remove_default_runs p (remove_leading_blank_comment runs) else Thing run :: remove_default_runs p runs
(* [keep_default_runs] keeps from the list [runs] just the default runs. *)
let keep_default_runs (runs : run or_comment list) = List.flatten (List.map (function | Comment _ -> [] | Thing run -> if is_default_run (fun _ -> true) run then [ Thing run ] else [] ) runs)
(* [targets run] is the set of target states of a run. *)
let targets (run : run) : Lr1.NodeSet.t = List.fold_left (or_comment_fold (fun states (_, target) -> let s = target2state target in Lr1.NodeSet.add s states )) Lr1.NodeSet.empty run.elements
(* [insert_runs inserts runs] inserts the content of the table [inserts] into the list [runs] at appropriate points that are determined by the target states. *)
let insert_runs (inserts : run or_comment list Lr1.NodeMap.t) (runs : run or_comment list) : run or_comment list = let emit, emitted = Q.create() in runs |> List.iter begin function | Thing run -> (* Emit this run. 
*) emit (Thing run); (* Then, check if the states reached by the sentences in this run appear in the table [inserts]. If so, emit the corresponding data. *) targets run |> Lr1.NodeSet.iter begin fun s -> match Lr1.NodeMap.find s inserts with | data -> List.iter emit data | exception Not_found -> () end | Comment c -> emit (Comment c) end; emitted()
(* [gather_followers] turns a list of things and comments into a list of things-followed-with-comments. Any leading comments are silently lost. *)
let rec gather_followers (xs : 'a or_comment list) : ('a * comment list) list = match xs with | Comment _ :: xs -> (* If there is a leading comment, ignore it. I believe that in a list of sentences, our current lexer never produces a leading comment. Indeed, a leading comment would be considered part of the previous toplevel comment. *) gather_followers xs | Thing x :: xs -> gather_followers_thing x [] xs | [] -> []
and gather_followers_thing x cs xs = match xs with | Comment c :: xs -> gather_followers_thing x (c :: cs) xs | _ -> (x, List.rev cs) :: gather_followers xs
(* [space xs] ensures that every thing is followed with at least one newline. If that is not the case, a blank line is inserted. This is unpleasant, but I have difficulty dealing with my own baroque file format. *)
let has_leading_newline = function | Comment c -> assert (c <> ""); c.[0] = '\n' | Thing _ -> false
let rec space (xs : 'a or_comment list) : 'a or_comment list = match xs with | [] -> [] | Thing x1 :: x2 :: xs when not (has_leading_newline x2) -> Thing x1 :: Comment "\n" :: space (x2 :: xs) | x :: xs -> x :: space xs
(* --------------------------------------------------------------------------- *)
(* If two [--merge-errors <filename>] directives are provided, compare the two message descriptions files and produce a merged .messages file. *)
(* The code is modeled after [compare_errors] above. When we find that an entry exists on the left-hand side yet is missing on the right-hand side, we note that it should be added. *)
(* If multiple sentences on the left-hand side share an error message, we attempt to preserve this feature when these sentences are copied to the right-hand side. This prevents us from using [foreach_targeted_sentence]; we use two nested loops instead. *)
(* If the target state of a sentence on the left-hand side does not exist on the right-hand side, then this sentence/message pair is inserted at the end of the right-hand side. If the target state of a sentence on the left-hand side exists also on the right-hand side, albeit with a different message, then the left-hand sentence/message pair must be inserted into the right-hand side at a suitable position (that is, after the sentence/message pair that already exists on the right-hand side). Furthermore, if the sentence/message pair on the right-hand side involves the default message, then it should be removed and replaced. *)
let merge_errors filename1 filename2 = let strict = false in let runs1 = read_messages strict filename1 and runs2 = read_messages strict filename2 in (* Remove the default runs on the right-hand side whose target state also appears on the left-hand side. We lose no information in doing so. *) let table1 = message_table false runs1 in let covered1 s = Lr1.NodeMap.mem s table1 in let runs2 = remove_default_runs covered1 runs2 in (* Remove the default runs on the left-hand side whose target state also appears on the right-hand side. Again, we lose nothing in doing so. 
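For instance, with hypothetical state numbers: if the left-hand file maps state 42 to the default message while the right-hand file maps state 42 to a hand-written message, then the left-hand entry is dropped at this point, and the hand-written message alone survives in the merged output.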
*) let table2 = message_table false runs2 in let covered2 s = Lr1.NodeMap.mem s table2 in let runs1 = remove_default_runs covered2 runs1 in (* The default runs that remain on either side are unique. Set them aside, to be copied at the end. *) let default1 = keep_default_runs runs1 and default2 = keep_default_runs runs2 and runs1 = remove_default_runs (fun _ -> true) runs1 and runs2 = remove_default_runs (fun _ -> true) runs2 in (* Use [append] when a run must be appended at the end. *) let (append : run or_comment -> unit), appended = Q.create() in (* Use [insert] when a run must be inserted at a specific point. *) let inserts : run or_comment list Lr1.NodeMap.t ref = ref Lr1.NodeMap.empty in let insert (s : Lr1.node) (newer : run or_comment list) = let earlier = try Lr1.NodeMap.find s !inserts with Not_found -> [] in inserts := Lr1.NodeMap.add s (earlier @ newer) !inserts in runs1 |> List.iter begin fun entry -> match entry with | Comment c -> (* We do not want to lose the toplevel comments in the left-hand file, so we append them. This is not great, as they may become badly placed. We cannot really do better, though, as we do not know with what sentence they should be attached. (It may even be the case that they should be split and attached partly with the previous sentence and partly with the next one.) *) if not (is_blank c) then begin append (Comment (toplevel_comment filename1)); append entry end | Thing run1 -> let message1 = run1.message in assert (message1 <> default_message); (* The sentences in the queue [retained] are to be associated with [message1], forming a run, which is to be inserted at the end. *) let retain, retained = Q.create() in (* The fact that [run1.elements] is a mixture of sentences and comments is problematic. We do not know which comments are intended to be paired with which sentences. We adopt the convention that a comment is associated with the sentence that precedes it. The auxiliary function [gather_followers] helps us follow this convention. *) run1.elements |> gather_followers |> List.iter begin fun ((sentence1, target1), comments) -> let comments = List.map (fun c -> Comment c) comments in let s = target2state target1 in match Lr1.NodeMap.find s table2 with | exception Not_found -> (* This sentence is missing on the right-hand side, so this pair of a sentence and message must be retained. The accompanying comments are preserved. *) retain (Thing (sentence1, target1)); List.iter retain comments | _sentence2, message2 -> assert (message2 <> default_message); if message1 <> message2 then begin (* This sentence exists on the right-hand side, with a different message, so this sentence and message must be inserted in the right-hand side. We construct a singleton run (consisting of just one sentence and one message) and schedule it for insertion. If this sentence was part of a group of several sentences that share a message, then this sharing is lost. Preserving it would be difficult. The user can manually recreate it if desired. *) let c = conflict_comment filename1 in let elements = Thing (sentence1, target1) :: comments in let run = { run1 with elements } in insert s [Comment c; Thing run] end end; (* end of the loop over the elements of this run *) (* If the queue [retained] is nonempty, then all of the sentences in it must be associated with [message1], forming a run, which must be inserted at the end. 
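For instance, under a hypothetical left-hand file: if three sentences share one message and exactly two of them have no counterpart on the right-hand side, then these two sentences are kept together and give rise to a single appended run that carries the shared message.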
*) let retained = retained() in if retained <> [] then begin let elements = retained in let run = { run1 with elements } in append (Thing run) end end; (* end of the loop over runs *)
(* The new data is constructed as follows: *)
let runs = (* The non-default runs in [runs2], into which we insert some runs from [runs1]. *) insert_runs !inserts runs2 @ (* The non-default runs from [runs1] that we have decided to append at the end. *) appended() @ (* The default runs from both sides. *) default1 @ default2 in (* Print. *) List.iter write_run (space runs)
let () = Settings.merge_errors |> Option.iter (fun (filename1, filename2) -> merge_errors filename1 filename2; exit 0 )
(* --------------------------------------------------------------------------- *)
(* If [--update-errors <filename>] is set, update the error message descriptions found in file [filename]. The idea is to re-generate the auto-comments, which are marked with ##, while leaving the rest untouched. *)
let () = Settings.update_errors |> Option.iter (fun filename -> (* Read the file. *) let strict = false in let runs : run or_comment list = read_messages strict filename in (* We might wish to detect if two sentences lead to the same state. We might also wish to detect if this set of sentences is incomplete, and complete it automatically. However, the first task is carried out by [--compile-errors] already, and the second task is carried out by [--list-errors] and [--compare-errors] together. For now, let's try and keep things as simple as possible. The task of [--update-errors] should be to update the auto-generated comments, without failing, and without adding or removing sentences. *) (* Now, write a new [.messages] to the standard output channel, with new auto-generated comments. *) List.iter write_run runs; exit 0 )
(* --------------------------------------------------------------------------- *)
(* If [--echo-errors <filename>] is set, echo the error sentences found in file [filename]. Do not echo the error messages or the comments. *)
(* In principle, we should be able to run this command without even giving an .mly file name on the command line, and without building the automaton. This is not possible at the moment, because our code is organized in too rigid a manner. *)
let () = Settings.echo_errors |> Option.iter (fun filename -> (* Read the file. *) let strict = false in let runs : run or_comment list = read_messages strict filename in (* Echo. *) List.iter (or_comment_iter (fun run -> List.iter (or_comment_iter (fun ((_, sentence), _target) -> print_string (Sentence.print `Abstract sentence) )) run.elements )) runs; exit 0 )
(* [--echo-errors-concrete] works like [--echo-errors], except every sentence is followed with an auto-generated comment that shows its concrete syntax. *)
let () = Settings.echo_errors_concrete |> Option.iter (fun filename -> (* Read the file. *) let strict = false in let runs : run or_comment list = read_messages strict filename in (* Echo. *) List.iter (or_comment_iter (fun run -> List.iter (or_comment_iter (fun ((_, sentence), _target) -> print_string (Sentence.print `Abstract sentence); if Terminal.every_token_has_an_alias then printf "## Concrete syntax: %s\n" (Sentence.print `Concrete sentence) )) run.elements )) runs; exit 0 )
(* --------------------------------------------------------------------------- *)
(* End of the functor [Run]. 
*) end let run () = let module R = Run(struct end) in () menhir-20210929/src/interpret.mli000066400000000000000000000040031412503066000165460ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* [run()] is in charge of handling several command line options, namely [--interpret], [--interpret-error], [--compile-errors], [--compare-errors]. If any of these options is present, the execution of Menhir stops here. *) val run: unit -> unit (* [print_messages_item] displays one data item. The item is of the form [nt, sentence, target], which means that beginning at the start symbol [nt], the sentence [sentence] ends in an error in the target state given by [target]. [target] also contains information about which spurious reductions are performed at the end. The display obeys the [.messages] file format. *) open Grammar val print_messages_item: Nonterminal.t * Terminal.t list * ReferenceInterpreter.target -> unit (* [stream] turns a finite list of terminals into a stream of terminals, represented as a pair of a lexer and a lexing buffer, so as to be usable with Menhir's traditional API. This lexer can raise [EndOfStream]. *) exception EndOfStream val stream: Terminal.t list -> (Lexing.lexbuf -> Terminal.t) * Lexing.lexbuf menhir-20210929/src/invariant.ml000066400000000000000000000650751412503066000163740ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module discovers information about the shape and content of the stack in each of the automaton's states. *) open Grammar module C = Conflict (* artificial dependency; ensures that [Conflict] runs first *) (* ------------------------------------------------------------------------ *) (* Compute the known suffix of the stack, a sequence of symbols, at every state. This is the "short invariant". *) module SSy = StackSymbols.Run() open SSy (* ------------------------------------------------------------------------ *) (* Now, compute which states may be held in the known suffix of the stack. *) module SSt = StackStates.Run(SSy) open SSt (* ------------------------------------------------------------------------ *) (* If requested, print the information that has been computed above. *) let () = Error.logC 3 (dump "short") (* ------------------------------------------------------------------------ *) (* We now determine which states must be represented, that is, explicitly pushed onto the stack. For simplicity, a state is either always represented or never represented. 
More fine-grained strategies, where a single state is sometimes pushed onto the stack and sometimes not pushed, depending on which outgoing transition is being taken, are conceivable, but quite tricky, and probably not worth the trouble. (1) If two states are liable to appear within a single stack cell, then one is represented if and only if the other is represented. This ensures that the structure of stacks is known everywhere and that we can propose types for stacks. (2) If a state [s] has an outgoing transition along nonterminal symbol [nt], and if the [goto] table for symbol [nt] has more than one target, then state [s] is represented. (3) If a stack cell contains more than one state and if at least one of these states is able to handle the [error] token, then these states are represented. (4) If the semantic action associated with a production mentions the [$syntaxerror] keyword, then the state that is being reduced to (that is, the state that initiated the recognition of this production) is represented. (Indeed, it will be passed as an argument to [errorcase].) *) (* Data. *) let rep : bool UnionFind.point array = Array.init Lr1.n (fun _ -> UnionFind.fresh false) (* Getter. *) let represented state = rep.(Lr1.number state) (* Setters. *) let represent state = UnionFind.set (represented state) true let represents states = represent (Lr1.NodeSet.choose states) (* Enforce condition (1) above. *) let share (v : property) = Array.iter (fun states -> let dummy = UnionFind.fresh false in Lr1.NodeSet.iter (fun state -> UnionFind.union dummy (represented state) ) states ) v let () = Lr1.iter (fun node -> share (stack_states node) ); Production.iter (fun prod -> share (production_states prod) ) (* Enforce condition (2) above. *) let () = Nonterminal.iter (fun nt -> let count = Lr1.targets (fun count _ _ -> count + 1 ) 0 (Symbol.N nt) in if count > 1 then Lr1.targets (fun () sources _ -> List.iter represent sources ) () (Symbol.N nt) ) (* Enforce condition (3) above. *) let handler state = try let _ = SymbolMap.find (Symbol.T Terminal.error) (Lr1.transitions state) in true with Not_found -> try let _ = TerminalMap.lookup Terminal.error (Lr1.reductions state) in true with Not_found -> false let handlers states = Lr1.NodeSet.exists handler states let () = Lr1.iter (fun node -> let v = stack_states node in Array.iter (fun states -> if Lr1.NodeSet.cardinal states >= 2 && handlers states then represents states ) v ) (* Enforce condition (4) above. *) let () = Production.iterx (fun prod -> if Action.has_syntaxerror (Production.action prod) then let sites = Lr1.production_where prod in let length = Production.length prod in if length = 0 then Lr1.NodeSet.iter represent sites else let states = (production_states prod).(0) in represents states ) (* Define accessors. *) (* If [--represent-states] is passed on the command line, then every state is represented. The above computation is still performed. *) let represented state = Settings.represent_states || UnionFind.get (represented state) let representeds states = Settings.represent_states || if Lr1.NodeSet.is_empty states then false else represented (Lr1.NodeSet.choose states) (* Statistics. *) let () = Error.logC 1 (fun f -> let count = Lr1.fold (fun count node -> if represented node then count + 1 else count ) 0 in Printf.fprintf f "%d out of %d states are represented.\n" count Lr1.n ) (* If requested, show a detailed table of which states are represented. 
*) let () = Error.logC 3 (fun f -> Lr1.iter (fun node -> Printf.fprintf f "represented(%s) = %b\n" (Lr1.print node) (represented node) ) ) (* ------------------------------------------------------------------------ *) (* Machinery for the computation of which symbols must keep track of their start or end positions. *) open Keyword type variable = Symbol.t * where (* WhereStart or WhereEnd *) module M : Fix.IMPERATIVE_MAPS with type key = variable = struct type key = variable type 'data t = { mutable startp: 'data SymbolMap.t; mutable endp: 'data SymbolMap.t; } open SymbolMap let create() = { startp = empty; endp = empty } let clear m = m.startp <- empty; m.endp <- empty let add (sym, where) data m = match where with | WhereStart -> m.startp <- add sym data m.startp | WhereEnd -> m.endp <- add sym data m.endp | WhereSymbolStart -> assert false let find (sym, where) m = match where with | WhereStart -> find sym m.startp | WhereEnd -> find sym m.endp | WhereSymbolStart -> assert false let iter f m = iter (fun sym -> f (sym, WhereStart)) m.startp; iter (fun sym -> f (sym, WhereEnd)) m.endp end (* ------------------------------------------------------------------------ *) (* We now determine which positions must be kept track of. For simplicity, we do this on a per-symbol basis. That is, for each symbol, either we never keep track of position information, or we always do. In fact, we do distinguish start and end positions. This leads to computing two sets of symbols -- those that keep track of their start position and those that keep track of their end position. A symbol on the right-hand side of a production must keep track of its (start or end) position if that position is explicitly requested by a semantic action. Furthermore, if the left-hand symbol of a production must keep track of its start (resp. end) position, then the first (resp. last) symbol of its right-hand side (if there is one) must do so as well. That is, unless the right-hand side is empty. *) (* 2015/11/11. When a production [prod] is reduced, the top stack cell may be consulted for its end position. This implies that this cell must exist and must store an end position! Now, when does this happen? 1- This happens if [prod] is an epsilon production and the left-hand symbol of the production, [nt prod], keeps track of its start or end position. 2- This happens if the semantic action explicitly mentions the keyword [$endpos($0)]. Now, if this happens, what should we do? a- If this happens in a state [s] whose incoming symbol is [sym], then [sym] must keep track of its end position. b- If this happens in an initial state, where the stack may be empty, then the sentinel cell at the bottom of the stack must contain an end position. Point (b) doesn't concern us here, but point (a) does. We must implement the constraint (1) \/ (2) -> (a). Point (b) is taken care of in the code back-end, where, for simplicity, we always create a sentinel cell. *) (* I will say that this is a lot more sophisticated than I would like. The code back-end has been known for its efficiency and I am trying to maintain this property -- in particular, I would like to keep track of no positions at all, if the user doesn't use any position keyword. But I am suffering. *) (* If [--represent-positions] is passed on the command line, then every position is stored. *) module F = FixSolver.Make(M)(Fix.Prop.Boolean) let () = (* We gather the constraints explained above in two loops. The first loop looks at every (non-start) production [prod]. 
The second loop looks at every (non-initial) state [s]. *) Production.iterx (fun prod -> let nt, rhs = Production.def prod and ids = Production.identifiers prod and action = Production.action prod in let length = Array.length rhs in if length > 0 then begin (* If [nt] keeps track of its start position, then the first symbol in the right-hand side must do so as well. *) F.record_VarVar (Symbol.N nt, WhereStart) (rhs.(0), WhereStart); (* If [nt] keeps track of its end position, then the last symbol in the right-hand side must do so as well. *) F.record_VarVar (Symbol.N nt, WhereEnd) (rhs.(length - 1), WhereEnd) end; KeywordSet.iter (function | SyntaxError -> () | Position (Before, _, _) -> (* Doing nothing here because [$endpos($0)] is dealt with in the second loop. *) () | Position (Left, _, _) -> (* [$startpos] and [$endpos] have been expanded away. *) assert false | Position (_, _, FlavorLocation) -> (* [$loc] and [$sloc] have been expanded away. *) assert false | Position (RightNamed _, WhereSymbolStart, _) -> (* [$symbolstartpos(x)] does not exist. *) assert false | Position (RightNamed id, where, _) -> (* If the semantic action mentions [$startpos($i)], then the [i]-th symbol in the right-hand side must keep track of its start position. Similarly for end positions. *) Array.iteri (fun i id' -> if id = id' then F.record_ConVar true (rhs.(i), where) ) ids ) (Action.keywords action) ); (* end of loop on productions *) Lr1.iterx (fun s -> (* Let [sym] be the incoming symbol of state [s]. *) let sym = Option.force (Lr1.incoming_symbol s) in (* Condition (1) in the long comment above (2015/11/11). If an epsilon production [prod] can be reduced in state [s], if its left-hand side [nt] keeps track of its start or end position, then [sym] must keep track of its end position. *) TerminalMap.iter (fun _ prods -> let prod = Misc.single prods in let nt, rhs = Production.def prod in let length = Array.length rhs in if length = 0 then begin F.record_VarVar (Symbol.N nt, WhereStart) (sym, WhereEnd); F.record_VarVar (Symbol.N nt, WhereEnd) (sym, WhereEnd) end ) (Lr1.reductions s); (* Condition (2) in the long comment above (2015/11/11). If a production can be reduced in state [s] and mentions [$endpos($0)], then [sym] must keep track of its end position. *) if Lr1.has_beforeend s then F.record_ConVar true (sym, WhereEnd) ) let track : variable -> bool option = let module S = F.Solve() in S.solution let track : variable -> bool = fun x -> Option.value (track x) ~default:false let startp symbol = Settings.represent_positions || track (symbol, WhereStart) let endp symbol = Settings.represent_positions || track (symbol, WhereEnd) let for_every_symbol (f : Symbol.t -> unit) : unit = Terminal.iter (fun t -> f (Symbol.T t)); Nonterminal.iter (fun nt -> f (Symbol.N nt)) let sum_over_every_symbol (f : Symbol.t -> bool) : int = let c = ref 0 in for_every_symbol (fun sym -> if f sym then c := !c + 1); !c let () = Error.logC 1 (fun f -> Printf.fprintf f "%d out of %d symbols keep track of their start position.\n\ %d out of %d symbols keep track of their end position.\n" (sum_over_every_symbol startp) (Terminal.n + Nonterminal.n) (sum_over_every_symbol endp) (Terminal.n + Nonterminal.n)) (* ------------------------------------------------------------------------ *) (* Constructors and accessors for information about the stack. *) (* Types. *) type cell = { symbol: Symbol.t; states: Lr1.NodeSet.t; holds_semv: bool; holds_state: bool; holds_startp: bool; holds_endp: bool; } type word = cell array (* Constructors. 
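As an illustration, based on a hypothetical grammar: for a production expr -> LPAREN expr RPAREN, where the tokens LPAREN and RPAREN are declared without a type, [prodstack] (defined below) yields a three-cell word; the cells for LPAREN and RPAREN have [holds_semv = false], while the cell for the nonterminal symbol expr has [holds_semv = true]. Whether a cell holds a state, a start position, and an end position is governed by [representeds], [startp], and [endp] above.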
*) (* If [--represent-values] is passed on the command line, then every semantic value is stored. *) let has_semv symbol = Settings.represent_values || match symbol with | Symbol.N _nt -> true | Symbol.T tok -> match Terminal.ocamltype tok with | None -> (* Token has unit type and is omitted in stack cell. *) false | Some _ocamltype -> true let cell symbol states = let holds_semv = has_semv symbol in let holds_state = representeds states in let holds_startp, holds_endp = startp symbol, endp symbol in { symbol; states; holds_semv; holds_state; holds_startp; holds_endp } (* Accessors. *) let similar cell1 cell2 = Symbol.equal cell1.symbol cell2.symbol && cell1.holds_state = cell2.holds_state (* The fields [holds_semv], [holds_startp] and [holds_endp] do not need to be compared, because they are determined by the field [symbol]. The field [states] does not need to be compared because it does not influence the layout of the cell; comparing the field [holds_state] suffices. *) let pop = MArray.pop let fold_top f default w = let n = Array.length w in if n = 0 then default else f w.(n-1) (* ------------------------------------------------------------------------ *) (* Publish the short invariant. *) module type STACK = sig (**[stack s] is the known suffix of the stack at state [s]. *) val stack: Lr1.node -> word (**[prodstack prod] is the known suffix of the stack at a state where production [prod] can be reduced. In the short invariant, the length of this suffix is [Production.length prod]. In the long invariant, its length can be greater. If there are no states where [prod] can be reduced, then every cell contains an empty set of states. *) val prodstack: Production.index -> word (**[gotostack nt] is the known suffix of the stack at a state where an edge labeled [nt] has just been followed. In the short invariant, the length of this suffix is [1]: indeed, it consists of just one cell, associated with the symbol [nt]. In the long invariant, its length can be greater. *) val gotostack: Nonterminal.t -> word end (* Suppose we have a function [foo] that maps things to vectors of foos and a function [bar] that maps things to vectors of bars. Suppose we have a function [cell] that builds a cell out of a foo and a bar. Then, we want to construct and tabulate a function that maps things to vectors of cells. This is done in a generic way as follows. *) let publish tabulate foo bar cell = tabulate (fun thing -> let foos, bars = foo thing, bar thing in assert (Array.length foos >= Array.length bars); (* We allow [bars] to be shorter than [foos]. This is required in the computation of the long invariant, where [validate] can reject sets of states that are not equi-represented. In that case, we truncate [foos] to match [bars]. *) let k = Array.length bars in let foos = MArray.truncate k foos in Array.init k (fun i -> cell foos.(i) bars.(i)) ) let stack : Lr1.node -> word = publish Lr1.tabulate stack_symbols stack_states cell let prodstack : Production.index -> word = publish Production.tabulate production_symbols production_states cell let gotostack : Nonterminal.t -> word = publish Nonterminal.tabulate goto_symbols goto_states cell (* ------------------------------------------------------------------------ *) (* Explain how the stack should be deconstructed when an error is found. We sometimes have a choice as to how many stack cells should be popped. Indeed, several cells in the known suffix of the stack may physically hold a state. If neither of these states handles errors, then we could jump to either. 
(Indeed, if we jump to one that's nearer, it will in turn pop further stack cells and jump to one that's farther.) In the interest of code size, we should pop as few stack cells as possible. So, we jump to the topmost represented state in the known suffix. *) type state = | Represented | UnRepresented of Lr1.node type instruction = | Die | DownTo of word * state let rewind node : instruction = let w = stack node in let rec rewind w = if Array.length w = 0 then (* I believe that every stack description either is definite (that is, ends with [TailEmpty]) or contains at least one represented state. Thus, if we find an empty [w], this means that the stack is definitely empty. *) Die else let { states; _ } as cell = MArray.last w in let w = MArray.pop w in if representeds states then (* Here is a represented state. We will pop this cell and no more. *) DownTo ([| cell |], Represented) else if handlers states then begin (* Here is an unrepresented state that can handle errors. The cell must hold a singleton set of states, so we know which state to jump to, even though it isn't represented. *) assert (Lr1.NodeSet.cardinal states = 1); let state = Lr1.NodeSet.choose states in DownTo ([| cell |], UnRepresented state) end else (* Here is an unrepresented state that does not handle errors. Pop this cell and look further. *) match rewind w with | Die -> Die | DownTo (w, st) -> DownTo (MArray.push w cell, st) in rewind w (* ------------------------------------------------------------------------- *) (* Miscellaneous. *) let universal symbol = Lr1.fold (fun universal s -> universal && (if represented s then SymbolMap.mem symbol (Lr1.transitions s) else true) ) true (* ------------------------------------------------------------------------ *) (* Discover which states can peek at an error. These are the states where an error token may be on the stream. These are the states that are targets of a reduce action on [error]. *) (* 2012/08/25 I am optimizing this code, whose original version I found had quadratic complexity. The problem is as follows. We can easily iterate over all states to find which states [s] have a reduce action on error. What we must find out, then, is into which state [t] this reduce action takes us. This is not easy to predict, as it depends on the contents of the stack. The original code used an overapproximation, as follows: if the reduction concerns a production whose head symbol is [nt], then all of the states that have an incoming transition labeled [nt] are potential targets. The new version of the code below relies on the same approximation, but uses two successive loops instead of two nested loops. *) let errorpeekers = (* First compute a set of symbols [nt]... *) let nts : SymbolSet.t = Lr1.fold (fun nts node -> try let prods = TerminalMap.lookup Terminal.error (Lr1.reductions node) in let prod = Misc.single prods in let nt = Production.nt prod in SymbolSet.add (Symbol.N nt) nts with Not_found -> nts ) SymbolSet.empty in (* ... then compute the set of all target states of all transitions labeled by some symbol in the set [nt]. *) SymbolSet.fold (fun nt errorpeekers -> Lr1.NodeSet.union errorpeekers (Lr1.all_targets nt) ) nts Lr1.NodeSet.empty let errorpeeker node = Lr1.NodeSet.mem node errorpeekers (* ------------------------------------------------------------------------ *) let () = Time.tick "Constructing the invariant" (* ------------------------------------------------------------------------ *) (* Compute and publish the long invariant. 
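As an illustration, in a hypothetical automaton: if every item of a state s is of the form B -> C . D, then the short invariant records a known suffix of length 1, namely the single cell for C, since the bullet is at position 1 in every item; the long invariant, which exploits more information about the paths that lead into s, may in addition describe the cells that lie deeper in the stack, say a cell for A beneath C, yielding a strictly longer known suffix.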
*) (* Fortunately, all of the building blocks are at hand, so this is easy. *) (* A caveat: it is not obvious that the sets of states computed here are equi-represented. (A set is equi-represented if all of its elements are represented *or* all of its elements are unrepresented.) Yet, we need this property, otherwise the long invariant cannot be safely translated to an OCaml GADT. One might think that this property is likely true, because every set of states that appears somewhere in the long invariant must also appear somewhere in the short invariant, and we know that every set of states in the short invariant is equi-represented, because we have explicitly imposed this requirement. However, this is *incorrect*: testing shows that not every set of states in the long invariant is equi-represented. To work around this problem, we truncate the long invariant so as to forget about any stack cells that are not equi-represented. *) module Long () = struct (* Compute. *) module SSy = StackSymbols.Long() module SSt = StackStates.Run(SSy) open SSy (* crucial! shadows the short invariant *) open SSt (* crucial! shadows the short invariant *) (* Validate. *) let unrepresented node = not (represented node) let equi_represented nodes = Lr1.NodeSet.for_all represented nodes || Lr1.NodeSet.for_all unrepresented nodes let validate states = MArray.greatest_suffix_forall equi_represented states let stack_states s = validate @@ stack_states s let production_states prod = validate @@ production_states prod let goto_states nt = validate @@ goto_states nt (* Dump. *) let () = Error.logC 3 (dump "long") (* Publish. *) let stack : Lr1.node -> word = publish Lr1.tabulate stack_symbols stack_states cell let prodstack : Production.index -> word = publish Production.tabulate production_symbols production_states cell let gotostack : Nonterminal.t -> word = publish Nonterminal.tabulate goto_symbols goto_states cell let () = Time.tick "Constructing the long invariant" end (* Long *) (* ------------------------------------------------------------------------ *) (* Compute which entry states can reach each [run], [reduce], and [goto] function. *) (* This information is computed only on demand. *) (* This information is used in the new code back-end to determine in which states we have static knowledge of the final result type of the parser, ['final]. This information can be built into the GADT that describes the states, and this in turn can be used to perform certain optimizations (such as removing case analyses that have only one branch) while preserving the well-typedness of the OCaml code. *) (* This information is computed via a forward data flow analysis. *) (* The join semi-lattice of properties is as follows. *) module P = struct (* [SingleOrigin s] means that we are reachable via a single entry state [s]. [Top] means that we are reachable via multiple entry states. *) type property = | SingleOrigin of Nonterminal.t | Top let leq_join p1 p2 = match p1, p2 with | _, Top | Top, _ -> Top | SingleOrigin start1, SingleOrigin start2 -> if Nonterminal.equal start1 start2 then p2 else Top end (* The call graph of the [run], [reduce] and [goto] functions. *) module G = struct include P type variable = | Run of Lr1.node | Reduce of Production.index | Goto of Nonterminal.t type t = variable let foreach_root yield = (* The entry points are the [run] functions associated with each of the entry states. 
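For instance, in a hypothetical grammar with two start symbols main and phrase, the entry state for main is seeded with the property SingleOrigin main. If some function is later found to be reachable from both entry states, then the join of the two properties is Top, which is translated below to MultipleOrigins.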
*) Lr1.entry |> ProductionMap.iter (fun prod node -> let nt = Option.force (Production.classify prod) in yield (Run node) (SingleOrigin nt) ) let foreach_successor v origin yield = match v with | Run node -> (* For each transition from [node] to [node'], the function [run node] calls the function [run node']. In the case of [goto] transitions, this is not a direct call (it goes through [reduce] and [goto] functions), but it is nevertheless accounted for here. *) Lr1.transitions node |> SymbolMap.iter begin fun _label node' -> yield (Run node') origin end; Lr1.reductions node |> TerminalMap.iter begin fun _tok prods -> let prod = Misc.single prods in yield (Reduce prod) origin end | Reduce prod -> (* A [reduce] function ends with a call to a [goto] function. *) let nt = Production.nt prod in yield (Goto nt) origin | Goto _nt -> (* A [goto] function appears to make no calls. The calls that it makes have already been accounted for above. *) () end (* Run the analysis on demand. *) let solution : (G.variable -> P.property option) Lazy.t = lazy ( let module D = Fix.DataFlow.ForType(G)(P)(G) in D.solution ) (* Convert a [property option] to something clearer for the end user. *) module Origin = struct type origin = | Dead | SingleOrigin of Nonterminal.t | MultipleOrigins let convert op = match op with | None -> Dead | Some (P.SingleOrigin nt) -> SingleOrigin nt | Some (P.Top) -> MultipleOrigins (* Publish the data. *) let run node = convert (Lazy.force solution (G.Run node)) let reduce prod = convert (Lazy.force solution (G.Reduce prod)) let goto nt = convert (Lazy.force solution (G.Goto nt)) end (* Origin *) menhir-20210929/src/invariant.mli000066400000000000000000000202621412503066000165320ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module discovers and publishes information about the automaton. It determines the shape of the stack when a state is about to be entered, when a production is about to be reduced, and when a goto transition is about to be taken. It also determines which states should be represented (that is, need to physically exist on the stack at runtime) and which symbols need to keep track of (start or end) positions. It also determines which automaton states could have to deal with an [error] token. The information computed in this module is used in the code back-end, in the Coq back-end, and in the automated production of .messages files. It is not used in the table back-end. *) open Grammar (* ------------------------------------------------------------------------- *) (* A representation of stack shapes. *) (**A cell is a representation of a stack cell. *) type cell = private { symbol: Symbol.t; (**The symbol associated with this cell. This symbol determines the presence and the type of the semantic value stored in this cell. It also determines whether a start position and an end position are stored in this cell. *) states: Lr1.NodeSet.t; (**A set of possible states such that the state that is stored in this cell (or would be stored in this cell) must be a member of this set. 
The states in this set have the property that either all of them are represented, in which case [holds_state] is [true], or none of them is represented, in which case [holds_state] is [false]. *) holds_semv: bool; (**Whether a semantic value is stored in this cell. By convention, if [symbol] is a nonterminal symbol, then a semantic value is stored. (We do not attempt to detect the situation where the semantic value could be omitted because it has type [unit], or the situation where it could be omitted because it is never used.) If [symbol] is a terminal symbol, then a semantic value is stored if and only if the [%token] declaration was annotated with a type. *) holds_state: bool; (**Whether a state is stored in this cell. *) holds_startp: bool; (**Whether a start position is stored in this cell. This decision is a function of [symbol]. *) holds_endp: bool; (**Whether an end position is stored in this cell. This decision is a function of [symbol]. *) } (**A word is a representation of a stack suffix. A word is an immutable array of cells, whose right end represents the top of the stack. Thus, the index 0 in the array corresponds to the cell that lies deepest in the stack. *) type word = cell array (**[similar] determines whether two stack cells have the same layout in memory, that is, the same OCaml type. This is equivalent to comparing all fields except [states]. *) val similar: cell -> cell -> bool (**[pop w] is the stack [w], deprived of its top element (if it exists). *) val pop: word -> word (**[fold_top f default w] returns [f cell], where [cell] is the top cell in the stack [w], if [w] is nonempty. Otherwise, it returns [default]. *) val fold_top: (cell -> 'a) -> 'a -> word -> 'a (* ------------------------------------------------------------------------- *) (* Information about the stack. *) module type STACK = sig (**[stack s] is the known suffix of the stack at state [s]. *) val stack: Lr1.node -> word (**[prodstack prod] is the known suffix of the stack at a state where production [prod] can be reduced. In the short invariant, the length of this suffix is [Production.length prod]. In the long invariant, its length can be greater. If there are no states where [prod] can be reduced, then every cell contains an empty set of states. *) val prodstack: Production.index -> word (**[gotostack nt] is the known suffix of the stack at a state where an edge labeled [nt] has just been followed. In the short invariant, the length of this suffix is [1]: indeed, it consists of just one cell, associated with the symbol [nt]. In the long invariant, its length can be greater. *) val gotostack: Nonterminal.t -> word end include STACK (* ------------------------------------------------------------------------- *) (* Information about error handling. *) type instruction = | Die | DownTo of word * state and state = | Represented | UnRepresented of Lr1.node (**[rewind s] explains how to rewind the stack when dealing with an error in state [s]. It produces an instruction to either die (because no state on the stack can handle errors) or pop a suffix of the stack. In the latter case, one reaches a state that is either represented (its identity is physically stored in the bottommost cell that is popped) or unrepresented (its identity is statically known). This function is used only in the [legacy] error-handling strategy. *) val rewind: Lr1.node -> instruction (**[errorpeeker s] tells whether state [s] can potentially peek at an error. This is the case if, in state [s], an error token may be on the stream. 
This function is used only in the [legacy] error-handling strategy. *) val errorpeeker: Lr1.node -> bool
(* ------------------------------------------------------------------------- *)
(* Information about which states and positions need to physically exist on the stack. *)
(**[represented s] tells whether state [s] must have an explicit representation, that is, whether it is pushed onto the stack. *) val represented: Lr1.node -> bool
(**[startp symbol] tells whether a start position must be recorded for the symbol [symbol]. *) val startp: Symbol.t -> bool
(**[endp symbol] tells whether an end position must be recorded for the symbol [symbol]. *) val endp: Symbol.t -> bool
(**[universal symbol] tells whether every represented state has an outgoing transition along [symbol]. *) val universal: Symbol.t -> bool
(* ------------------------------------------------------------------------- *)
(* More information about the stack. *)
(**[Long()] computes a "long invariant" where the known suffix of the stack is as long as possible, based on an analysis of the LR(1) automaton. It is possibly longer than the suffix proposed in the "short invariant", whose length is always the maximum position of the bullet in the items of the state at hand. *) module Long () : STACK
(* ------------------------------------------------------------------------- *)
(* Reachability from the entry states. *)
module Origin : sig (**The origin [SingleOrigin nt] indicates that the point of interest is reachable only via the start symbol [nt]. The origin [Dead] indicates that this point is unreachable. [MultipleOrigins] indicates that this point is reachable via several start symbols. *) type origin = | Dead | SingleOrigin of Nonterminal.t | MultipleOrigins (**[run s] determines via which start symbols the [run] function for state [s] is reachable. *) val run: Lr1.node -> origin (**[reduce prod] determines via which start symbols the [reduce] function for production [prod] is reachable. *) val reduce: Production.index -> origin (**[goto nt] determines via which start symbols the [goto] function for the nonterminal symbol [nt] is reachable. *) val goto: Nonterminal.t -> origin end
menhir-20210929/src/item.ml000066400000000000000000000260671412503066000153310ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*   François Pottier, Inria Paris                                            *) (*   Yann Régis-Gianas, PPS, Université Paris Diderot                         *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************)
open Grammar
(* ------------------------------------------------------------------------ *)
(* Items. *)
(* An LR(0) item encodes a pair of integers, namely the index of the production and the index of the bullet in the production's right-hand side. *)
(* Both integers are packed into a single integer, using 10 bits for the bullet position and the rest (21 bits on a 32-bit architecture, 53 bits on a 64-bit architecture) for the production index. This means that the length of a production must be at most 1023. This means that the number of productions must be at most: - 2^21, that is about 2 million, on a 32-bit architecture; - 2^53, that is practically unlimited, on a 64-bit architecture. *)
(* These static limits could be adjusted if necessary. 
It would also be possible to dynamically adjust the limits depending on the grammar at hand. In practice, the need for this has not been felt. *)
(* WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING *)
(* The constants [low_bits], [low_limit] and the function [export] are duplicated in [lib/InspectionTableInterpreter.ml]. Do not modify them, or modify them here and there in a consistent manner. *)
type t = int
let low_bits = 10 (* have you read the above warning? *)
let low_limit = 1 lsl low_bits
let export t = (Production.i2p (t lsr low_bits), t mod low_limit)
let import (prod, pos) = assert (pos < low_limit); (Production.p2i prod) lsl low_bits + pos
let marshal (item : t) : int = item
(* In order to guarantee that the assertion in [import] cannot fail, we check up front that every production is reasonably short. *)
let () = Production.iter (fun index -> let length = Production.length index in if low_limit <= length then Error.error (Production.positions index) "The length of this production is %d, which exceeds the limit of %d." length (low_limit - 1) )
(* Comparison. *)
let equal (item1 : t) (item2: t) = item1 = item2
(* [def item] looks up the production associated with this item in the grammar and returns [prod, nt, rhs, pos, length], where [prod] is the production's index, [nt] and [rhs] represent the production, [pos] is the position of the bullet in the item, and [length] is the length of the production's right-hand side. *)
let def t = let prod, pos = export t in let nt, rhs = Production.def prod in let length = Array.length rhs in assert ((pos >= 0) && (pos <= length)); prod, nt, rhs, pos, length
let startnt t = let _, _, rhs, pos, length = def t in assert (pos = 0 && length = 1); match rhs.(0) with | Symbol.N nt -> nt | Symbol.T _ -> assert false
(* Printing. *)
let print item = let _, nt, rhs, pos, _ = def item in Printf.sprintf "%s -> %s" (Nonterminal.print false nt) (Symbol.printaod 0 pos rhs)
(* Classifying items. *)
type kind = | Shift of Symbol.t * t | Reduce of Production.index
let classify item = let prod, _, rhs, pos, length = def item in if pos = length then Reduce prod else Shift (rhs.(pos), import (prod, pos + 1))
(* Sets of items and maps over items. Hashing these data structures is specifically allowed, so balanced trees (for instance) would not be applicable here. *)
module Map = Patricia.Big
module Set = Map.Domain
(* This functor performs precomputation that helps efficiently compute the closure of an LR(0) or LR(1) state. The precomputation requires time linear in the size of the grammar. The nature of the lookahead sets remains abstract. *)
(* The precomputation consists in building the LR(0) nondeterministic automaton. This is a graph whose nodes are items and whose edges are epsilon transitions. (We do not care about shift transitions here.) Lookahead information can be attached to nodes and is propagated through the graph during closure computations. *)
module Closure (L : Lookahead.S) = struct
type state = L.t Map.t
type node = { (* Each node is associated with an item. *) item: t; (* All of the epsilon transitions that leave a node have the same behavior with respect to lookahead information. *) (* The lookahead set transmitted along an epsilon transition is either a constant, or the union of a constant and the lookahead set at the source node. The former case corresponds to a source item whose trailer is not nullable, the latter to a source item whose trailer is nullable. 
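For instance, under a hypothetical grammar: out of a source item A -> . B C, there is an epsilon transition towards every item of the form B -> . gamma. If C is not nullable, then this transition carries the constant FIRST(C) and transmits nothing; if C is nullable, then it carries FIRST(C) and additionally transmits the lookahead set of the source item.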
*) epsilon_constant: L.t; epsilon_transmits: bool; (* Each node carries pointers to its successors through epsilon transitions. This field is never modified once initialization is over. *) mutable epsilon_transitions: node list; (* The following fields are transient, that is, only used temporarily during graph traversals. Marks are used to recognize which nodes have been traversed already. Lists of predecessors are used to record which edges have been traversed. Lookahead information is attached with each node. *) mutable mark: Mark.t; mutable predecessors: node list; mutable lookahead: L.t; } (* Allocate one graph node per item and build a mapping of items to nodes. *) let mapping : node array array = Array.make Production.n [||] let item2node item = let prod, pos = export item in mapping.(Production.p2i prod).(pos) let () = Production.iter (fun prod -> let _nt, rhs = Production.def prod in let length = Array.length rhs in mapping.(Production.p2i prod) <- Array.init (length+1) (fun pos -> let item = import (prod, pos) in (* The lookahead set transmitted through an epsilon transition is the FIRST set of the remainder of the source item, plus, if that is nullable, the lookahead set of the source item. *) let constant, transmits = if pos < length then let nullable, first = Analysis.nullable_first_prod prod (pos + 1) in L.constant first, nullable else (* No epsilon transitions leave this item. *) L.empty, false in { item = item; epsilon_constant = constant; epsilon_transmits = transmits; epsilon_transitions = []; (* temporary placeholder *) mark = Mark.none; predecessors = []; lookahead = L.empty; } ) ) (* At each node, compute transitions. *) let () = Production.iter (fun prod -> let _nt, rhs = Production.def prod in let length = Array.length rhs in Array.iteri (fun pos node -> node.epsilon_transitions <- if pos < length then match rhs.(pos) with | Symbol.N nt -> Production.mapnt nt (fun prod -> item2node (import (prod, 0)) ) | Symbol.T _ -> [] else [] ) mapping.(Production.p2i prod) ) (* We can be certain that there are no cycles of transitions that transmit a lookahead set. This guarantees that we can traverse these transitions in a topological order. Indeed, if there was such a cycle, then every item in this cycle would have to be of the form A -> . B beta, where beta is nullable. DeRemer and Pennello (1982) call this an includes cycle. An includes cycle is a special case of a cycle, as defined by Aho and Ullman. The module LoopDetection detects and rejects cycles, so we can be assured at this point that no such cycle exists. *) (* Closure computation. *) let closure (items : state) : state = (* Explore the graph forwards, starting from these items. Marks are used to tell which nodes have been visited. Build a list of all visited nodes; this is in fact the list of all items in the closure. At initial nodes and when reaching a node through a transition, record a lookahead set. When we reach a node through a transition that transmits the lookahead set found at its source, record its source, so as to allow re-traversing this transition backwards (below). *) let this = Mark.fresh() in let nodes = ref [] in let rec visit father transmits toks node = if Mark.same node.mark this then begin (* Node has been visited already. *) node.lookahead <- L.union toks node.lookahead; if transmits then node.predecessors <- father :: node.predecessors end else begin (* Node is new. 
*) node.predecessors <- if transmits then [ father ] else []; node.lookahead <- toks; follow node end and follow node = node.mark <- this; nodes := node :: !nodes; List.iter (visit node node.epsilon_transmits node.epsilon_constant) node.epsilon_transitions in Map.iter (fun item toks -> let node = item2node item in visit node (* dummy! *) false toks node ) items; let nodes = !nodes in (* Explore the graph of transmitting transitions backwards. By hypothesis, it is acyclic, so this is a topological walk. Lookahead sets are inherited through transitions. *) let this = Mark.fresh() in let rec walk node = if not (Mark.same node.mark this) then begin (* Node is new. *) node.mark <- this; (* Explore all predecessors and merge their lookahead sets into the current node's own lookahead set. *) List.iter (fun predecessor -> walk predecessor; node.lookahead <- L.union predecessor.lookahead node.lookahead ) node.predecessors end in List.iter walk nodes; (* Done. Produce a mapping of items to lookahead sets. Clear all transient fields so as to reduce pressure on the GC -- this does not make much difference. *) List.fold_left (fun closure node -> node.predecessors <- []; let closure = Map.add node.item node.lookahead closure in node.lookahead <- L.empty; closure ) Map.empty nodes (* End of closure computation *) end menhir-20210929/src/item.mli000066400000000000000000000057571412503066000155110ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* An LR(0) item encodes a pair of integers, namely the index of the production and the index of the bullet in the production's right-hand side. *) type t val import: Production.index * int -> t val export: t -> Production.index * int (* An item can be encoded as an integer. This is used in the table back-end only. The decoding function (really a copy of [export]) is in [TableInterpreter]. *) val marshal: t -> int (* Comparison. *) val equal: t -> t -> bool (* [def item] looks up the production associated with this item in the grammar and returns [prod, nt, rhs, pos, length], where [prod] is the production's index, [nt] and [rhs] represent the production, [pos] is the position of the bullet in the item, and [length] is the length of the production's right-hand side. *) val def: t -> Production.index * Nonterminal.t * Symbol.t array * int * int (* If [item] is a start item, [startnt item] returns the start nonterminal that corresponds to [item]. *) val startnt: t -> Nonterminal.t (* Printing. *) val print: t -> string (* Classifying items as shift or reduce items. A shift item is one where the bullet can still advance. A reduce item is one where the bullet has reached the end of the right-hand side. *) type kind = | Shift of Symbol.t * t | Reduce of Production.index val classify: t -> kind (* Sets of items and maps over items. Hashing these data structures is specifically allowed. *) module Set : GSet.S with type element = t module Map : GMap.S with type key = t and type Domain.t = Set.t (* This functor performs precomputation that helps efficiently compute the closure of an LR(0) or LR(1) state. 
The precomputation requires time linear in the size of the grammar. The nature of the lookahead sets remains abstract. *) module Closure (L : Lookahead.S) : sig (* A state maps items to lookahead information. *) type state = L.t Map.t (* This takes the closure of a state through all epsilon transitions. *) val closure: state -> state end menhir-20210929/src/keywordExpansion.ml000066400000000000000000000241561412503066000177430ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open BasicSyntax open Keyword open IL open CodeBits (* [posvar_ keyword] constructs the conventional name of the variable that stands for the position keyword [keyword]. *) let posvar_ = function | Position (subject, where, flavor) -> posvar subject where flavor | _ -> assert false (* [posvar_] should be applied to a position keyword *) (* [symbolstartpos producers i n] constructs an expression which, beginning at index [i], looks for the first non-empty producer and returns its start position. If none is found, this expression returns the end position of the right-hand side. This computation is modeled after the function [Parsing.symbol_start_pos] in OCaml's standard library. *) (* This cascade of [if] constructs could be quite big, and this could be a problem in terms of code size. Fortunately, we can optimize this code by computing, ahead of time, the outcome of certain comparisons. We assume that the lexer never produces a token whose start and end positions are the same. It follows that a non-nullable symbol cannot have the same start and end positions. Conversely, a symbol that generates (a subset of) the language {epsilon} must have the same start and end positions. *) (* Although this code is modeled after [Parsing.symbol_start_pos], we compare positions using physical equality, whereas they use structural equality. If for some reason a symbol has start and end positions that are structurally equal but physically different, then a difference will be observable. However, this is very unlikely. It would mean that a token has the same start and end positions (and furthermore, this position has been re-allocated). *) (* The reason why we expand [$symbolstartpos] away prior to inlining is that we want its meaning to be preserved by inlining. If we tried to preserve this keyword through the inlining phase, then (I suppose) we would have to introduce a family of keywords [$symbolstartpos(i, j)], computing over the interval from [i] to [j], and the preservation would not be exact -- because a nonempty symbol, once inlined, can be seen to be a sequence of empty and nonempty symbols. *) let rec symbolstartpos ((nullable, epsilon) as analysis) producers i n : IL.expr * KeywordSet.t = if i = n then (* Return [$endpos]. *) let keyword = Position (Left, WhereEnd, FlavorPosition) in EVar (posvar_ keyword), KeywordSet.singleton keyword else (* [symbol] is the symbol that appears in the right-hand side at position i. [x] is the identifier that is bound to it. We generate code that compares [$startpos($i)] and [$endpos($i)]. If they differ, we return [$startpos($i)].
Otherwise, we continue. Furthermore, as noted above, if [symbol] is not nullable, then we know that the start and end positions must differ, so we optimize this case. *) let producer = List.nth producers i in let symbol = producer_symbol producer and x = producer_identifier producer in let startp = Position (RightNamed x, WhereStart, FlavorPosition) and endp = Position (RightNamed x, WhereEnd, FlavorPosition) in if not (nullable symbol) then (* The start and end positions must differ. *) EVar (posvar_ startp), KeywordSet.singleton startp else let continue, keywords = symbolstartpos analysis producers (i + 1) n in if epsilon symbol then (* The start and end positions must be the same. *) continue, keywords else (* In the general case, a runtime test is required. *) EIfThenElse ( EApp (EVar "(!=)", [ EVar (posvar_ startp); EVar (posvar_ endp) ]), EVar (posvar_ startp), continue ), KeywordSet.add startp (KeywordSet.add endp keywords) (* [define keyword1 f keyword2] macro-expands [keyword1] as [f(keyword2)], where [f] is a function of expressions to expressions. *) let define keyword1 f keyword2 = Action.define keyword1 (KeywordSet.singleton keyword2) (mlet [ PVar (posvar_ keyword1) ] [ f (EVar (posvar_ keyword2)) ]) (* A [loc] keyword is expanded away. *) (* Since a location is represented as a pair of positions, $loc is sugar for the pair ($startpos, $endpos). (Similarly for $loc(x).) Furthermore, $sloc is sugar for the pair ($symbolstartpos, $endpos). *) let define_as_tuple keyword keywords = Action.define keyword (List.fold_right KeywordSet.add keywords KeywordSet.empty) (mlet [ PVar (posvar_ keyword) ] [ ETuple (List.map (fun keyword -> EVar (posvar_ keyword)) keywords) ]) let expand_loc keyword action = match keyword with | Position (Left, WhereSymbolStart, FlavorLocation) -> (* $sloc *) define_as_tuple keyword [ Position (Left, WhereSymbolStart, FlavorPosition); Position (Left, WhereEnd, FlavorPosition) ] action | Position (subject, WhereStart, FlavorLocation) -> (* $loc, $loc(x) *) define_as_tuple keyword [ Position (subject, WhereStart, FlavorPosition); Position (subject, WhereEnd, FlavorPosition) ] action | _ -> action (* An [ofs] keyword is expanded away. It is defined in terms of the corresponding [pos] keyword. *) let expand_ofs keyword action = match keyword with | Position (subject, where, FlavorOffset) -> define keyword (fun e -> ERecordAccess (e, "Lexing.pos_cnum")) (Position (subject, where, FlavorPosition)) action | _ -> action (* [$symbolstartpos] is expanded into a cascade of [if] constructs, modeled after [Parsing.symbol_start_pos]. *) let expand_symbolstartpos analysis producers n keyword action = match keyword with | Position (Left, WhereSymbolStart, FlavorPosition) -> let expansion, keywords = symbolstartpos analysis producers 0 n in Action.define keyword keywords (mlet [ PVar (posvar_ keyword) ] [ expansion ]) action | Position (RightNamed _, WhereSymbolStart, FlavorPosition) -> (* [$symbolstartpos(x)] does not exist. *) assert false | _ -> action (* [$startpos] and [$endpos] are expanded away. *) let expand_startend producers n keyword action = match keyword with | Position (Left, WhereStart, flavor) -> (* [$startpos] is defined as [$startpos($1)] if this production has nonzero length and [$endpos($0)] otherwise. 
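For example (an illustrative case), if the first producer in the right-hand side is named [e], then [$startpos] expands to [$startpos(e)], which is in turn rewritten into the OCaml variable [_startpos_e_].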
*) define keyword (fun e -> e) ( if n > 0 then let x = producer_identifier (List.hd producers) in Position (RightNamed x, WhereStart, flavor) else Position (Before, WhereEnd, flavor) ) action | Position (Left, WhereEnd, flavor) -> (* [$endpos] is defined as [$endpos($n)] if this production has nonzero length and [$endpos($0)] otherwise. *) define keyword (fun e -> e) ( if n > 0 then let x = producer_identifier (List.hd (List.rev producers)) in Position (RightNamed x, WhereEnd, flavor) else Position (Before, WhereEnd, flavor) ) action | _ -> action (* [expand_round] performs one round of expansion on [action], using [f] as a rewriting rule. *) let expand_round f action = KeywordSet.fold f (Action.keywords action) action (* [expand_action] performs macro-expansion in [action]. We do this in several rounds: first, expand the [loc] keywords away; then, expand the [ofs] keywords away; then, expand [symbolstart] away; then, expand the rest. We do this in this order because each round can cause new keywords to appear, which must be eliminated by the following rounds. *) let expand_action analysis producers action = let n = List.length producers in (* Expand [loc] keywords away first. *) let action = expand_round expand_loc action in (* The [ofs] keyword family is defined in terms of the [pos] family by accessing the [pos_cnum] field. Expand these keywords away first. *) let action = expand_round expand_ofs action in (* Expand [$symbolstartpos] away. *) let action = expand_round (expand_symbolstartpos analysis producers n) action in (* Then, expand away the non-[ofs] keywords. *) let action = expand_round (expand_startend producers n) action in action (* Silently analyze the grammar so as to find out which symbols are nullable and which symbols generate a subset of {epsilon}. This is used to optimize the expansion of $symbolstartpos. *) let analysis grammar = let module G = GrammarFunctor.Make(struct let grammar = grammar let verbose = false end)() in let lookup (nt : Syntax.symbol) : G.Symbol.t = try G.Symbol.lookup nt with Not_found -> assert false in let nullable nt : bool = G.Analysis.nullable_symbol (lookup nt) and epsilon nt : bool = G.TerminalSet.is_empty (G.Analysis.first_symbol (lookup nt)) in nullable, epsilon (* Put everything together. *) let expand_branch analysis branch = { branch with action = expand_action analysis branch.producers branch.action } let expand_rule analysis rule = { rule with branches = List.map (expand_branch analysis) rule.branches } let expand_grammar grammar = let analysis = analysis grammar in { grammar with rules = StringMap.map (expand_rule analysis) grammar.rules } menhir-20210929/src/keywordExpansion.mli000066400000000000000000000023301412503066000201040ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open BasicSyntax (* [expand_grammar] expands away the keywords [$startpos] and [$endpos], as well as the entire [ofs] family of keywords. Doing this early simplifies some aspects later on, in particular %inlining.
*) val expand_grammar: grammar -> grammar menhir-20210929/src/lexdep.mll000066400000000000000000000040711412503066000160230ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This code analyzes the output of [ocamldep] and returns the list of [.cmi] files that the [.cmo] file depends on. *) { open Lexing exception Error of string let fail lexbuf = raise (Error (Printf.sprintf "failed to make sense of ocamldep's output (character %d).\n" lexbuf.lex_curr_p.pos_cnum) ) } let newline = ('\n' | '\r' | "\r\n") let whitespace = ( ' ' | '\t' | ('\\' newline) ) let entrychar = [^ '\n' '\r' '\t' ' ' '\\' ':' ] let entry = ((entrychar+ as basename) ".cm" ('i' | 'o' | 'x') as filename) (* [main] recognizes a sequence of lines, where a line consists of an entry, followed by a colon, followed by a list of entries. *) rule main = parse | eof { [] } | entry whitespace* ":" { let bfs = collect [] lexbuf in ((basename, filename), bfs) :: main lexbuf } | _ { fail lexbuf } (* [collect] recognizes a list of entries, separated with spaces and ending in a newline. *) and collect bfs = parse | whitespace+ entry { collect ((basename, filename) :: bfs) lexbuf } | whitespace* newline { bfs } | _ | eof { fail lexbuf } menhir-20210929/src/lexer.mll000066400000000000000000000661201412503066000156640ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) { open Lexing open Parser open Positions open Keyword (* ------------------------------------------------------------------------ *) (* Short-hands. *) let error1 pos = Error.error (Positions.one pos) let error2 lexbuf = Error.error (Positions.lexbuf lexbuf) (* ------------------------------------------------------------------------ *) (* [int_of_string] raises [Failure] if its argument is too large. This is not a problem in practice, but causes false positives when fuzzing Menhir. We hide the problem by failing gracefully. *) let int_of_string (pos : Lexing.position) i = try int_of_string i with Failure _ -> error1 pos "unreasonably large integer." (* ------------------------------------------------------------------------ *) (* This wrapper saves the current lexeme start, invokes its argument, and restores it. This allows transmitting better positions to the parser. *) let savestart lexbuf f = let startp = lexbuf.lex_start_p in let token = f lexbuf in lexbuf.lex_start_p <- startp; token (* ------------------------------------------------------------------------ *) (* Overwrites an old character with a new one at a specified offset in a [bytes] buffer. 
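For instance, [overwrite content ofs '$' '_'] is used below to replace the leading '$' of a keyword with '_'; the assertion in [overwrite] checks that the old character is indeed the one that we expect to find at this offset.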
*) let overwrite content offset c1 c2 = assert (Bytes.get content offset = c1); Bytes.set content offset c2 (* ------------------------------------------------------------------------ *) (* Keyword recognition and construction. *) (* A monster is a spot where we have identified a keyword in concrete syntax. We describe a monster as an object with the following methods: *) type monster = { (* The position of the monster. *) pos: Positions.t; (* This method is passed an array of (optional) names for the producers, that is, the elements of the production's right-hand side. It is also passed a flag which tells whether [$i] syntax is allowed or disallowed. It may perform some checks and is allowed to fail. *) check: check; (* This method transforms the keyword (in place) into a conventional OCaml identifier. This is done by replacing '$', '(', and ')' with '_'. Bloody. The arguments are [ofs1] and [content]. [ofs1] is the offset where [content] begins in the source file. *) transform: int -> bytes -> unit; (* This is the keyword, in abstract syntax. *) keyword: keyword option; (* If this is a [$i] monster, then the identifier [_i] is stored here. *) oid: string option; } and check = Settings.dollars -> string option array -> unit (* No check. *) let none : check = fun _ _ -> () (* ------------------------------------------------------------------------ *) (* The [$syntaxerror] monster. *) let syntaxerror pos : monster = let check = none and transform ofs1 content = (* [$syntaxerror] is replaced with [(raise _eRR)]. Same length. *) let pos = start_of_position pos in let ofs = pos.pos_cnum - ofs1 in let source = "(raise _eRR)" in Bytes.blit_string source 0 content ofs (String.length source) and keyword = Some SyntaxError and oid = None in { pos; check; transform; keyword; oid } (* ------------------------------------------------------------------------ *) (* We check that every [$i] is within range. Also, we forbid using [$i] when a producer has been given a name; this is bad style and may be a mistake. (Plus, this simplifies our life, as we rewrite [$i] to [_i], and we would have to rewrite it to a different identifier otherwise.) *) let check_dollar pos i : check = fun dollars producers -> (* If [i] is out of range, say so. *) if not (0 <= i - 1 && i - 1 < Array.length producers) then Error.error [pos] "$%d refers to a nonexistent symbol." i; (* If [$i] could be referred to via a name, say so. *) producers.(i - 1) |> Option.iter (fun x -> Error.error [pos] "please do not say: $%d. Instead, say: %s." i x ); (* If [$i] syntax is disallowed, say so. *) match dollars with | Settings.DollarsDisallowed -> Error.error [pos] "please do not use $%d. Instead, name this value." i | Settings.DollarsAllowed -> () (* We check that every reference to a producer [x] in a position keyword, such as [$startpos(x)], exists. *) let check_producer pos x : check = fun _ producers -> if not (List.mem (Some x) (Array.to_list producers)) then Error.error [pos] "%s refers to a nonexistent symbol." x (* ------------------------------------------------------------------------ *) (* The [$i] monster. *) let dollar pos i : monster = let check : check = check_dollar pos i and transform ofs1 content = (* [$i] is replaced with [_i]. Thus, it is no longer a keyword. 
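For example, [$3] becomes the plain OCaml identifier [_3]; the [oid] field below records this name.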
*) let pos = start_of_position pos in let ofs = pos.pos_cnum - ofs1 in overwrite content ofs '$' '_' and keyword = None and oid = Some (Printf.sprintf "_%d" i) in { pos; check; transform; keyword; oid } (* ------------------------------------------------------------------------ *) (* The position-keyword monster. The most horrible of all. *) let position pos (where : string) (flavor : string) (i : string option) (x : string option) = let check_no_parameter () = if i <> None || x <> None then Error.error [pos] "$%s%s does not take a parameter." where flavor in let ofslpar = (* offset of the opening parenthesis, if there is one *) 1 + (* for the initial "$" *) String.length where + 3 (* for "pos" or "ofs" or "loc" *) in let where = match where with | "symbolstart" | "s" -> check_no_parameter(); WhereSymbolStart | "start" -> WhereStart | "end" -> WhereEnd | "" -> WhereStart | _ -> assert false in let flavor = match flavor with | "pos" -> FlavorPosition | "ofs" -> FlavorOffset | "loc" -> FlavorLocation | _ -> assert false in let subject, check = match i, x with | Some i, None -> let ii = int_of_string (start_of_position pos) i in if ii = 0 && where = WhereEnd then (* [$endpos($0)] *) Before, none else (* [$startpos($i)] is rewritten to [$startpos(_i)]. *) RightNamed ("_" ^ i), check_dollar pos ii | None, Some x -> (* [$startpos(x)] *) RightNamed x, check_producer pos x | None, None -> (* [$startpos] *) Left, none | Some _, Some _ -> assert false in let transform ofs1 content = let pos = start_of_position pos in let ofs = pos.pos_cnum - ofs1 in overwrite content ofs '$' '_'; let ofslpar = ofs + ofslpar in match i, x with | None, Some x -> overwrite content ofslpar '(' '_'; overwrite content (ofslpar + 1 + String.length x) ')' '_' | Some i, None -> overwrite content ofslpar '(' '_'; overwrite content (ofslpar + 1) '$' '_'; overwrite content (ofslpar + 2 + String.length i) ')' '_' | _, _ -> () in let keyword = Some (Position (subject, where, flavor)) and oid = None in { pos; check; transform; keyword; oid } (* ------------------------------------------------------------------------ *) (* In an OCaml header, there should be no monsters. This is just a sanity check. *) let no_monsters monsters = match monsters with | [] -> () | monster :: _ -> Error.error [monster.pos] "a Menhir keyword cannot be used in an OCaml header." (* ------------------------------------------------------------------------ *) (* Gathering all of the identifiers in an array of optional identifiers. *) let gather_oid xs oid = match oid with | Some x -> StringSet.add x xs | None -> xs let gather_oids oids = Array.fold_left gather_oid StringSet.empty oids (* Gathering all of the [oid] identifiers in a list of monsters. *) let gather_monsters monsters = List.fold_left (fun xs monster -> gather_oid xs monster.oid ) StringSet.empty monsters (* ------------------------------------------------------------------------ *) (* Creates a stretch. *) let mk_stretch pos1 pos2 parenthesize monsters = (* Read the specified chunk of the file. *) let raw_content : string = InputFile.chunk (pos1, pos2) in (* Transform the monsters, if there are any. (This explicit test allows saving one string copy and keeping just one live copy.) *) let content : string = match monsters with | [] -> raw_content | _ :: _ -> let content : bytes = Bytes.of_string raw_content in List.iter (fun monster -> monster.transform pos1.pos_cnum content) monsters; Bytes.unsafe_to_string content in (* Add whitespace so that the column numbers match those of the source file. 
If requested, add parentheses so that the semantic action can be inserted into other code without ambiguity. *) let content = if parenthesize then (* If [parenthesize] is true then we are at the beginning of a semantic action, just after the opening brace. This guarantees that we cannot be at the beginning of a line, so the subtraction [_ - 1] below cannot produce a negative result. *) (String.make (pos1.pos_cnum - pos1.pos_bol - 1) ' ') ^ "(" ^ content ^ ")" else (String.make (pos1.pos_cnum - pos1.pos_bol) ' ') ^ content in Stretch.({ stretch_filename = InputFile.get_input_file_name(); stretch_linenum = pos1.pos_lnum; stretch_linecount = pos2.pos_lnum - pos1.pos_lnum; stretch_content = content; stretch_raw_content = raw_content; stretch_keywords = Misc.filter_map (fun monster -> monster.keyword) monsters }) (* Creating a stretch from a located identifier. (This does not require the input file to be currently opened.) In this variant, [parenthesize] is false, [monsters] is empty. *) let stretch_of_id (id : string located) = let raw_content, pos = Positions.decompose id in let pos1 = Positions.start_of_position pos and pos2 = Positions.end_of_position pos and filename = Positions.filename_of_position pos in assert (pos1 != Lexing.dummy_pos); let padding = pos1.pos_cnum - pos1.pos_bol in let content = String.make padding ' ' ^ raw_content in Stretch.({ stretch_filename = filename; stretch_linenum = pos1.pos_lnum; stretch_linecount = pos2.pos_lnum - pos1.pos_lnum; stretch_content = content; stretch_raw_content = raw_content; stretch_keywords = [] }) (* ------------------------------------------------------------------------ *) (* OCaml's reserved words. *) let table words = let table = Hashtbl.create 149 in List.iter (fun word -> Hashtbl.add table word ()) words; table let reserved = table [ "and"; "as"; "assert"; "begin"; "class"; "constraint"; "do"; "done"; "downto"; "else"; "end"; "exception"; "external"; "false"; "for"; "fun"; "function"; "functor"; "if"; "in"; "include"; "inherit"; "initializer"; "lazy"; "let"; "match"; "method"; "module"; "mutable"; "new"; "object"; "of"; "open"; "or"; "parser"; "private"; "rec"; "sig"; "struct"; "then"; "to"; "true"; "try"; "type"; "val"; "virtual"; "when"; "while"; "with"; "mod"; "land"; "lor"; "lxor"; "lsl"; "lsr"; "asr"; ] (* ------------------------------------------------------------------------ *) (* Menhir's percent-directives. *) let table directives = let table = Hashtbl.create 149 in List.iter (fun (word, token) -> Hashtbl.add table word token) directives; table let directives = table [ "token", TOKEN; "type", TYPE; "left", LEFT; "right", RIGHT; "nonassoc", NONASSOC; "start", START; "prec", PREC; "public", PUBLIC; "parameter", PARAMETER; "inline", INLINE; "attribute", PERCENTATTRIBUTE; "on_error_reduce", ON_ERROR_REDUCE; ] (* ------------------------------------------------------------------------ *) (* Decoding escaped characters. *) let char_for_backslash = function | 'n' -> '\010' | 'r' -> '\013' | 'b' -> '\008' | 't' -> '\009' | c -> c } (* ------------------------------------------------------------------------ *) (* Patterns. *) let newline = ('\010' | '\013' | "\013\010") let whitespace = [ ' ' '\t' ] let lowercase = ['a'-'z' '\223'-'\246' '\248'-'\255' '_'] let uppercase = ['A'-'Z' '\192'-'\214' '\216'-'\222'] let identchar = ['A'-'Z' 'a'-'z' '_' '\192'-'\214' '\216'-'\246' '\248'-'\255' '0'-'9'] (* '\'' forbidden *) let attributechar = identchar | '.' 
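(* The patterns [subject] and [poskeyword] below recognize the position keyword family. For example (an illustrative, not exhaustive, list), [poskeyword] matches $startpos, $endpos(x), $startofs($2), $symbolstartpos, $loc(x), and $sloc. *)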
let subject = '$' (['0'-'9']+ as i) | ((lowercase identchar*) as x) let poskeyword = '$' ( (("symbolstart" | "start" | "end") as where) (("pos" | "ofs") as flavor) | (("s" | "") as where) ("loc" as flavor) ) ( '(' subject ')' )? let previouserror = "$previouserror" let syntaxerror = "$syntaxerror" (* ------------------------------------------------------------------------ *) (* The lexer. *) rule main = parse | "%" (identchar+ as directive) { try Hashtbl.find directives directive with Not_found -> error2 lexbuf "unknown directive: %s." directive } | "%%" { (* The token [PERCENTPERCENT] carries a stretch that contains everything that follows %% in the input file. This string must be created lazily. The parser decides (based on the context) whether this stretch is needed. If it is indeed needed, then constructing this stretch drives the lexer to the end of the file. *) PERCENTPERCENT (lazy ( let openingpos = lexeme_end_p lexbuf in let closingpos = finish lexbuf in mk_stretch openingpos closingpos false [] )) } | ";" { SEMI } | ":" { COLON } | "," { COMMA } | "=" { EQUAL } | "(" { LPAREN } | ")" { RPAREN } | "|" { BAR } | "?" { QUESTION } | "*" { STAR } | "+" { PLUS } | "~" { TILDE } | "_" { UNDERSCORE } | ":=" { COLONEQUAL } | "==" { EQUALEQUAL } | "let" { LET } | (lowercase identchar *) as id { if Hashtbl.mem reserved id then error2 lexbuf "this is an OCaml reserved word." else LID (with_pos (cpos lexbuf) id) } | (uppercase identchar *) as id { UID (with_pos (cpos lexbuf) id) } (* Quoted strings are used as aliases for tokens. *) (* A quoted string is stored as is -- with the quotes and with its escape sequences. *) | '"' { let buffer = Buffer.create 16 in let openingpos = lexeme_start_p lexbuf in let content = record_string openingpos buffer lexbuf in let id = Printf.sprintf "\"%s\"" content in let pos = import (openingpos, lexbuf.lex_curr_p) in QID (with_pos pos id) } | "//" [^ '\010' '\013']* newline (* skip C++ style comment *) | newline { new_line lexbuf; main lexbuf } | whitespace+ { main lexbuf } | "/*" { comment (lexeme_start_p lexbuf) lexbuf; main lexbuf } | "(*" { ocamlcomment (lexeme_start_p lexbuf) lexbuf; main lexbuf } | "<" { savestart lexbuf (ocamltype (lexeme_end_p lexbuf)) } | "%{" { savestart lexbuf (fun lexbuf -> let openingpos = lexeme_start_p lexbuf in let stretchpos = lexeme_end_p lexbuf in let closingpos, monsters = action true openingpos [] lexbuf in no_monsters monsters; HEADER (mk_stretch stretchpos closingpos false []) ) } | "{" { savestart lexbuf (fun lexbuf -> let openingpos = lexeme_start_p lexbuf in let stretchpos = lexeme_end_p lexbuf in let closingpos, monsters = action false openingpos [] lexbuf in ACTION ( fun dollars producers -> (* Check that the monsters are well-formed. *) List.iter (fun monster -> monster.check dollars producers) monsters; (* Gather all of the identifiers that the semantic action may use to refer to a semantic value. This includes the identifiers that are explicitly bound by the user (these appear in the array [producers]) and the identifiers [_i] when the semantic action uses [$i]. *) let ids = StringSet.union (gather_oids producers) (gather_monsters monsters) in (* Extract a stretch of text. *) let stretch = mk_stretch stretchpos closingpos true monsters in (* Build a semantic action. *) Action.from_stretch ids stretch ) ) } | ('%'? 
as percent) "[@" (attributechar+ as id) whitespace* { let openingpos = lexeme_start_p lexbuf in let stretchpos = lexeme_end_p lexbuf in let closingpos = attribute openingpos lexbuf in let pos = Positions.import (openingpos, lexeme_end_p lexbuf) in let attr = mk_stretch stretchpos closingpos false [] in if percent = "" then (* No [%] sign: this is a normal attribute. *) ATTRIBUTE (Positions.with_pos pos id, attr) else (* A [%] sign is present: this is a grammar-wide attribute. *) GRAMMARATTRIBUTE (Positions.with_pos pos id, attr) } | eof { EOF } | _ { error2 lexbuf "unexpected character(s)." } (* ------------------------------------------------------------------------ *) (* Skip C style comments. *) and comment openingpos = parse | newline { new_line lexbuf; comment openingpos lexbuf } | "*/" { () } | eof { error1 openingpos "unterminated comment." } | _ { comment openingpos lexbuf } (* ------------------------------------------------------------------------ *) (* Collect an O'Caml type delimited by angle brackets. Angle brackets can appear as part of O'Caml function types and variant types, so we must recognize them and *not* treat them as a closing bracket. *) and ocamltype openingpos = parse | "->" | "[>" { ocamltype openingpos lexbuf } | '>' { OCAMLTYPE (Stretch.Declared (mk_stretch openingpos (lexeme_start_p lexbuf) true [])) } | "(*" { ocamlcomment (lexeme_start_p lexbuf) lexbuf; ocamltype openingpos lexbuf } | newline { new_line lexbuf; ocamltype openingpos lexbuf } | eof { error1 openingpos "unterminated OCaml type." } | _ { ocamltype openingpos lexbuf } (* ------------------------------------------------------------------------ *) (* Collect O'Caml code delimited by curly brackets. The monsters that are encountered along the way are accumulated in the list [monsters]. Nested curly brackets must be properly counted. Nested parentheses are also kept track of, so as to better report errors when they are not balanced. *) and action percent openingpos monsters = parse | '{' { let _, monsters = action false (lexeme_start_p lexbuf) monsters lexbuf in action percent openingpos monsters lexbuf } | ("}" | "%}") as delimiter { match percent, delimiter with | true, "%}" | false, "}" -> (* This is the delimiter we were instructed to look for. *) lexeme_start_p lexbuf, monsters | _, _ -> (* This is not it. *) error1 openingpos "unbalanced opening brace." } | '(' { let _, monsters = parentheses (lexeme_start_p lexbuf) monsters lexbuf in action percent openingpos monsters lexbuf } | '$' (['0'-'9']+ as i) { let i = int_of_string (lexeme_start_p lexbuf) i in let monster = dollar (cpos lexbuf) i in action percent openingpos (monster :: monsters) lexbuf } | poskeyword { let monster = position (cpos lexbuf) where flavor i x in action percent openingpos (monster :: monsters) lexbuf } | previouserror { error2 lexbuf "$previouserror is no longer supported." } | syntaxerror { let monster = syntaxerror (cpos lexbuf) in action percent openingpos (monster :: monsters) lexbuf } | '"' { string (lexeme_start_p lexbuf) lexbuf; action percent openingpos monsters lexbuf } | "'" { char lexbuf; action percent openingpos monsters lexbuf } | "(*" { ocamlcomment (lexeme_start_p lexbuf) lexbuf; action percent openingpos monsters lexbuf } | newline { new_line lexbuf; action percent openingpos monsters lexbuf } | ')' | eof { error1 openingpos "unbalanced opening brace." 
} | _ { action percent openingpos monsters lexbuf } (* ------------------------------------------------------------------------ *) (* Inside a semantic action, we keep track of nested parentheses, so as to better report errors when they are not balanced. *) and parentheses openingpos monsters = parse | '(' { let _, monsters = parentheses (lexeme_start_p lexbuf) monsters lexbuf in parentheses openingpos monsters lexbuf } | ')' { lexeme_start_p lexbuf, monsters } | '{' { let _, monsters = action false (lexeme_start_p lexbuf) monsters lexbuf in parentheses openingpos monsters lexbuf } | '$' (['0'-'9']+ as i) { let i = int_of_string (lexeme_start_p lexbuf) i in let monster = dollar (cpos lexbuf) i in parentheses openingpos (monster :: monsters) lexbuf } | poskeyword { let monster = position (cpos lexbuf) where flavor i x in parentheses openingpos (monster :: monsters) lexbuf } | previouserror { error2 lexbuf "$previouserror is no longer supported." } | syntaxerror { let monster = syntaxerror (cpos lexbuf) in parentheses openingpos (monster :: monsters) lexbuf } | '"' { string (lexeme_start_p lexbuf) lexbuf; parentheses openingpos monsters lexbuf } | "'" { char lexbuf; parentheses openingpos monsters lexbuf } | "(*" { ocamlcomment (lexeme_start_p lexbuf) lexbuf; parentheses openingpos monsters lexbuf } | newline { new_line lexbuf; parentheses openingpos monsters lexbuf } | '}' | eof { error1 openingpos "unbalanced opening parenthesis." } | _ { parentheses openingpos monsters lexbuf } (* ------------------------------------------------------------------------ *) (* Collect an attribute payload, which is terminated by a closing square bracket. Nested square brackets must be properly counted. Nested curly brackets and nested parentheses are also kept track of, so as to better report errors when they are not balanced. *) and attribute openingpos = parse | '[' { let _ = attribute (lexeme_start_p lexbuf) lexbuf in attribute openingpos lexbuf } | ']' { lexeme_start_p lexbuf } | '{' { let _, _ = action false (lexeme_start_p lexbuf) [] lexbuf in attribute openingpos lexbuf } | '(' { let _, _ = parentheses (lexeme_start_p lexbuf) [] lexbuf in attribute openingpos lexbuf } | '"' { string (lexeme_start_p lexbuf) lexbuf; attribute openingpos lexbuf } | "'" { char lexbuf; attribute openingpos lexbuf } | "(*" { ocamlcomment (lexeme_start_p lexbuf) lexbuf; attribute openingpos lexbuf } | newline { new_line lexbuf; attribute openingpos lexbuf } | '}' | ')' | eof { error1 openingpos "unbalanced opening bracket." } | _ { attribute openingpos lexbuf } (* ------------------------------------------------------------------------ *) (* Skip O'Caml comments. Comments can be nested and can contain strings or characters, which must be correctly analyzed. (A string could contain begin-of-comment or end-of-comment sequences, which must be ignored; a character could contain a begin-of-string sequence.) *) and ocamlcomment openingpos = parse | "*)" { () } | "(*" { ocamlcomment (lexeme_start_p lexbuf) lexbuf; ocamlcomment openingpos lexbuf } | '"' { string (lexeme_start_p lexbuf) lexbuf; ocamlcomment openingpos lexbuf } | "'" { char lexbuf; ocamlcomment openingpos lexbuf } | newline { new_line lexbuf; ocamlcomment openingpos lexbuf } | eof { error1 openingpos "unterminated OCaml comment." } | _ { ocamlcomment openingpos lexbuf } (* ------------------------------------------------------------------------ *) (* Skip O'Caml strings. 
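No decoding takes place here: we only need to find out where the string ends, so escape sequences are merely skipped.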
*) and string openingpos = parse | '"' { () } | '\\' newline | newline { new_line lexbuf; string openingpos lexbuf } | '\\' _ (* Upon finding a backslash, skip the character that follows, unless it is a newline. Pretty crude, but should work. *) { string openingpos lexbuf } | eof { error1 openingpos "unterminated OCaml string." } | _ { string openingpos lexbuf } (* ------------------------------------------------------------------------ *) (* Recording an OCaml string. (This is used for token aliases.) *) and record_string openingpos buffer = parse | '"' { Buffer.contents buffer } | ('\\' ['\\' '\'' '"' 't' 'b' 'r' ' ']) as sequence { (* This escape sequence is recognized as such, but not decoded. *) Buffer.add_string buffer sequence; record_string openingpos buffer lexbuf } | '\\' 'n' (* We disallow this escape sequence in a token alias because we wish to use this string (unescaped) when we print a concrete sentence in a .messages file (see [Interpret]), and we want this sentence to fit on a single line. *) { error2 lexbuf "'\\n' is not permitted in a token alias." } | '\\' _ { error2 lexbuf "illegal backslash escape in string." } | newline { error2 lexbuf "illegal newline in string." } | eof { error1 openingpos "unterminated string." } | _ as c { Buffer.add_char buffer c; record_string openingpos buffer lexbuf } (* Decoding a string that may contain escaped characters. *) and decode_string buffer = parse | '"' { (* The final double quote is skipped. *) } | '\\' (['\\' '\'' '"' 'n' 't' 'b' 'r' ' '] as c) { Buffer.add_char buffer (char_for_backslash c); decode_string buffer lexbuf } | _ as c { Buffer.add_char buffer c; decode_string buffer lexbuf } (* ------------------------------------------------------------------------ *) (* Skip O'Caml characters. A lone quote character is legal inside a comment, so if we don't recognize the matching closing quote, we simply give up. *) and char = parse | '\\'? newline "'" { new_line lexbuf } | [^ '\\' '\''] "'" | '\\' _ "'" | '\\' ['0'-'9'] ['0'-'9'] ['0'-'9'] "'" | '\\' 'x' ['0'-'9' 'a'-'f' 'A'-'F'] ['0'-'9' 'a'-'f' 'A'-'F'] "'" | "" { () } (* ------------------------------------------------------------------------ *) (* Read until the end of the file. This is used after finding a %% that marks the end of the grammar specification. We update the current position as we go. This allows us to build a stretch for the postlude. *) and finish = parse | newline { new_line lexbuf; finish lexbuf } | eof { lexeme_start_p lexbuf } | _ { finish lexbuf } menhir-20210929/src/lexmli.mll000066400000000000000000000050431412503066000160340ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This code analyzes the output of [ocamlc -i] and returns a list of identifiers together with their types. Types are represented by offsets in the source string. *) { let fail () = Error.error [] "failed to make sense of ocamlc's output."
} let whitespace = [ ' ' '\t' '\n' '\r' ] let lowercase = ['a'-'z' '\223'-'\246' '\248'-'\255' '_'] let identchar = ['A'-'Z' 'a'-'z' '_' '\192'-'\214' '\216'-'\246' '\248'-'\255' '0'-'9'] (* '\'' forbidden *) (* Read a list of bindings. We start immediately after a [val] keyword, so we expect either an end marker, or an identifier, followed by a colon, followed by a type, followed by another list of bindings. In the latter case, we recognize the identifier and the colon, record where the type begins, and pass control to [type_then_bindings]. *) rule bindings env = parse | "menhir_end_marker : int" { env } | whitespace* ((lowercase identchar*) as id) whitespace* ':' whitespace* { type_then_bindings env id (Lexing.lexeme_end lexbuf) lexbuf } | _ | eof { fail() } (* Read a type followed by a list of bindings. *) and type_then_bindings env id openingofs = parse | whitespace+ "val" whitespace { let closingofs = Lexing.lexeme_start lexbuf in bindings ((id, openingofs, closingofs) :: env) lexbuf } | _ { type_then_bindings env id openingofs lexbuf } | eof { fail() } (* Skip up to the first [val] keyword that follows the begin marker, and start from there. *) and main = parse | _* "val menhir_begin_marker : int" whitespace+ "val" whitespace+ { bindings [] lexbuf } | _ | eof { fail() } menhir-20210929/src/lexpointfree.mll000066400000000000000000000037211412503066000172470ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) { exception InvalidPointFreeAction } (* See [ParserAux.validate_pointfree_action]. *) let lowercase = ['a'-'z' '\223'-'\246' '\248'-'\255' '_'] let uppercase = ['A'-'Z' '\192'-'\214' '\216'-'\222'] let identchar = ['A'-'Z' 'a'-'z' '_' '\192'-'\214' '\216'-'\246' '\248'-'\255' '0'-'9'] (* '\'' forbidden *) let symbolchar = ['!' '$' '%' '&' '*' '+' '-' '.' '/' ':' '<' '=' '>' '?' '@' '^' '|' '~'] let op = symbolchar+ (* An approximation of OCaml's rules. *) let whitespace = [ ' ' '\t' '\n' ] rule validate_pointfree_action = parse | whitespace* (lowercase | uppercase | '`') (identchar | '.')* whitespace* eof | whitespace* '(' op ')' whitespace* eof (* We have got a nonempty point-free action: . *) { true } | whitespace* eof (* We have got an empty point-free action: <>. *) { false } | _ { raise InvalidPointFreeAction } (* See [ParserAux.valid_ocaml_identifier]. *) and valid_ocaml_identifier = parse | lowercase identchar* eof { true } | _ | eof { false } menhir-20210929/src/lineCount.mll000066400000000000000000000023441412503066000165030ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This simple function counts the number of newline characters in a string. 
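For example (illustrative), [count 0 (Lexing.from_string "a\nb\r\nc")] returns 2, since "\r\n" counts as a single newline.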
*) let newline = ('\010' | '\013' | "\013\010") let ordinary = [^ '\010' '\013']+ rule count n = parse | eof { n } | newline { count (n + 1) lexbuf } | ordinary { count n lexbuf } menhir-20210929/src/listMonad.ml000066400000000000000000000034441412503066000163230ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) type 'a m = 'a list let return x = [ x ] let bind l f = List.flatten (List.map f l) let ( >>= ) l f = bind l f (* 1. (return x) >>= f == f x bind [ x ] f = List.flatten (List.map f [ x ]) = f x 2. m >>= return == m bind l return = List.flatten (List.map (fun x -> [ x ]) (x1::x2::..::xn)) = List.flatten ([x1]::...::[xn]) = x1::...::xn = l 3. (m >>= f) >>= g == m >>= (\x -> f x >>= g) bind (bind l f) g = List.flatten (List.map g (List.flatten (List.map f (x1::...::xn)))) = List.flatten (List.map g (f x1 :: f x2 :: ... :: f xn)) = List.flatten (List.map g ([fx1_1; fx1_2 ... ] :: [fx2_1; ... ] :: ...)) = List.flatten ([ g fx1_1; g fx_1_2 ... ] :: [ g fx_2_1; ... ] ...) = List.flatten (List.map (fun x -> List.flatten (List.map g (f x))) l) = bind l (fun x -> bind (f x) g) *) menhir-20210929/src/listMonad.mli000066400000000000000000000024241412503066000164710ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (** Monad type which represents a list of results. *) type 'a m = 'a list (** [bind x f] applies [f] to a list of results, returning a list of results. *) val bind: 'a m -> ('a -> 'b m) -> 'b m val ( >>= ) : 'a m -> ('a -> 'b m) -> 'b m (** [return x] is the left and right unit of [bind]. *) val return: 'a -> 'a m menhir-20210929/src/lookahead.ml000066400000000000000000000027151412503066000163200ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* These are the operations required of lookahead sets during a closure computation. This signature is exploited by the functor [Item.Closure]. *) module type S = sig (* The type of lookahead sets. *) type t (* The empty lookahead set. Redundant with the following, but convenient. *) val empty: t (* A concrete, constant set of terminal symbols. *) val constant: Grammar.TerminalSet.t -> t (* [union s1 s2] returns the union of [s1] and [s2]. 
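It is expected to be associative and commutative, with [empty] as its unit; the closure computation in [Item.Closure] implicitly relies on these properties when it merges lookahead information coming from several sources.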
*) val union: t -> t -> t end menhir-20210929/src/lr0.ml000066400000000000000000000624721412503066000150720ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar module InfiniteArray = MenhirLib.InfiniteArray (* ------------------------------------------------------------------------ *) (* Make sure that the middle-end runs before the automaton is constructed. *) let () = let module M = Middle.Run() in () (* ------------------------------------------------------------------------ *) (* Perform loop detection before attempting to build the LR(0) automaton. *) let () = let module L = LoopDetection.Run() in () (* ------------------------------------------------------------------------ *) (* Symbolic lookahead information. *) (* A symbolic lookahead set consists of an actual concrete set of terminal symbols and of a number of set variables. Set variables are encoded as integers. *) module VarSet = SparseBitSet module SymbolicLookahead = struct type t = TerminalSet.t * VarSet.t let constant toks = (toks, VarSet.empty) let empty = constant TerminalSet.empty let union (toks1, vars1) ((toks2, vars2) as s2) = let toks = TerminalSet.union toks1 toks2 and vars = VarSet.union vars1 vars2 in if toks2 == toks && vars2 == vars then s2 else (toks, vars) let variable (var : int) : t = (TerminalSet.empty, VarSet.singleton var) let project (toks, vars) = assert (VarSet.is_empty vars); toks end (* We will perform closure operations over symbolic lookahead sets. This allows us to later represent LR(1) states as pairs of an LR(0) node number and an array of concrete lookahead sets. *) module SymbolicClosure = Item.Closure(SymbolicLookahead) (* Closure operations over concrete lookahead sets are also used (when explaining conflicts). One could take another instance of the functor. The approach below is somewhat less elegant and makes each call to [closure] somewhat slower, but saves the cost of instantiating the functor again -- which is linear in the size of the grammar. *) type concretelr1state = TerminalSet.t Item.Map.t let closure (state : concretelr1state) : concretelr1state = Item.Map.map SymbolicLookahead.project (SymbolicClosure.closure (Item.Map.map SymbolicLookahead.constant state)) (* ------------------------------------------------------------------------ *) (* Finding which non-epsilon transitions leave a set of items. This code is parametric in the nature of lookahead sets. *) let transitions (state : 'a Item.Map.t) : 'a Item.Map.t SymbolMap.t = Item.Map.fold (fun item toks transitions -> match Item.classify item with | Item.Shift (symbol, item') -> let items : 'a Item.Map.t = try SymbolMap.find symbol transitions with Not_found -> Item.Map.empty in SymbolMap.add symbol (Item.Map.add item' toks items) transitions | Item.Reduce _ -> transitions ) state SymbolMap.empty (* ------------------------------------------------------------------------ *) (* Determining the reduction opportunities at a (closed) state. They are represented as a list of pairs of a lookahead set and a production index. This code is again parametric in the nature of lookahead sets.
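For instance (a hypothetical grammar), if the closed state contains the item E -> E + E . with lookahead set { '+', ')' }, then the resulting list contains the pair ({ '+', ')' }, E -> E + E).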
*) let reductions (state : 'a Item.Map.t) : ('a * Production.index) list = Item.Map.fold (fun item toks accu -> match Item.classify item with | Item.Reduce prod -> (toks, prod) :: accu | Item.Shift _ -> accu ) state [] (* ------------------------------------------------------------------------ *) (* Construction of the LR(0) automaton. *) (* Nodes are numbered sequentially. *) type node = int (* A symbolic transition is a pair of the target state number and an array of symbolic lookahead sets. The variables in these sets are numbered in [0,g) where g is the number of items in the source LR(0) state. Items are numbered in the order of presentation by [Item.Set.fold]. *) type symbolic_transition_target = node * SymbolicLookahead.t array (* The automaton is represented by (growing) arrays of states (sets of items), symbolic transition information, and symbolic reduction information, indexed by node numbers. Conversely, a hash table maps states (sets of items) to node numbers. *) let n = ref 0 let states : Item.Set.t InfiniteArray.t = InfiniteArray.make Item.Set.empty let _transitions : symbolic_transition_target SymbolMap.t InfiniteArray.t = InfiniteArray.make SymbolMap.empty let _reductions : (SymbolicLookahead.t * Production.index) list InfiniteArray.t = InfiniteArray.make [] let map : (Item.Set.t, node) Hashtbl.t = Hashtbl.create 50021 let incoming : Symbol.t option InfiniteArray.t = InfiniteArray.make None (* The automaton is built depth-first. *) let rec explore (symbol : Symbol.t option) (state : Item.Set.t) : node = (* Find out whether this state was already explored. *) try Hashtbl.find map state with Not_found -> (* If not, create a new node. *) let k = !n in n := k + 1; InfiniteArray.set states k state; Hashtbl.add map state k; (* Record its incoming symbol. *) InfiniteArray.set incoming k symbol; (* Build a symbolic version of the current state, where each item is associated with a distinct lookahead set variable, numbered consecutively. *) let (_ : int), (symbolic_state : SymbolicClosure.state) = Item.Set.fold (fun item (i, symbolic_state) -> i+1, Item.Map.add item (SymbolicLookahead.variable i) symbolic_state ) state (0, Item.Map.empty) in (* Compute the symbolic closure. *) let closure = SymbolicClosure.closure symbolic_state in (* Compute symbolic information about reductions. *) InfiniteArray.set _reductions k (reductions closure); (* Compute symbolic information about the transitions, and, by dropping the symbolic lookahead information, explore the transitions to further LR(0) states. *) InfiniteArray.set _transitions k (SymbolMap.mapi (fun symbol symbolic_state -> let (k : node) = explore (Some symbol) (Item.Map.domain symbolic_state) in let lookahead : SymbolicLookahead.t array = Array.make (Item.Map.cardinal symbolic_state) SymbolicLookahead.empty in let (_ : int) = Item.Map.fold (fun _ s i -> lookahead.(i) <- s; i+1 ) symbolic_state 0 in ((k, lookahead) : symbolic_transition_target) ) (transitions closure)); k (* Creating a start state out of a start production. It contains a single item, consisting of the start production, at position 0. *) let start prod : Item.Set.t = Item.Set.singleton (Item.import (prod, 0)) (* This starts the construction of the automaton and records the entry nodes in an array.
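There is exactly one entry node per start production.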
*) let entry : node ProductionMap.t = ProductionMap.start (fun prod -> explore None (start prod) ) let () = Hashtbl.clear map let n = !n let () = Error.logA 1 (fun f -> Printf.fprintf f "Built an LR(0) automaton with %d states.\n" n); Time.tick "Construction of the LR(0) automaton" (* ------------------------------------------------------------------------ *) (* Accessors. *) let items node : Item.Set.t = InfiniteArray.get states node let incoming_symbol node : Symbol.t option = InfiniteArray.get incoming node let outgoing_edges node : node SymbolMap.t = SymbolMap.map (fun (target, _) -> target) (InfiniteArray.get _transitions node) let outgoing_symbols node : Symbol.t list = SymbolMap.domain (InfiniteArray.get _transitions node) (* Efficient access to the predecessors of an LR(0) state requires building a reversed graph. This is done on the first invocation of the function [predecessors]. Our measurements show that it typically takes less than 0.01s anyway. *) let predecessors : node list array Lazy.t = lazy ( let predecessors = Array.make n [] in for source = 0 to n-1 do SymbolMap.iter (fun _symbol (target, _) -> predecessors.(target) <- source :: predecessors.(target) ) (InfiniteArray.get _transitions source) done; predecessors ) let incoming_edges (c : node) : node list = (Lazy.force predecessors).(c) module ImperativeNodeMap = Fix.Glue.ArraysAsImperativeMaps(struct let n = n end) (* ------------------------------------------------------------------------ *) (* Help for building the LR(1) automaton. *) (* An LR(1) state is represented as a pair of an LR(0) state number and an array of concrete lookahead sets (whose length depends on the LR(0) state). *) type lr1state = node * TerminalSet.t array (* A view of the type [lr1state] as an ordered type. This can be used in conjunction with [Fix.Memoize], [Fix.Numbering], etc. E.g. a numbering facility based on this mechanism is able to number 10000 states in about 0.01s. *) module Lr1StateAsOrderedType = struct type t = lr1state let compare (k1, toksr1) (k2, toksr2) = let c = k1 - k2 in if c <> 0 then c else Generic.compare toksr1 toksr2 (* In principle, we should use [Array.compare TerminalSet.compare], but the function [Array.compare] does not exist, and we happen to know that [TerminalSet.compare] is OCaml's generic comparison, so the whole comparison can be carried out using generic comparison. *) end (* An encoded LR(1) state can be turned into a concrete representation, that is, a mapping of items to concrete lookahead sets. *) let export (k, toksr) = let (_ : int), items = Item.Set.fold (fun item (i, items) -> i+1, Item.Map.add item toksr.(i) items ) (InfiniteArray.get states k) (0, Item.Map.empty) in items (* Displaying a concrete state. *) let print_concrete leading (state : concretelr1state) = Misc.with_buffer 1024 (fun buffer -> Item.Map.iter (fun item toks -> Printf.bprintf buffer "%s%s [ %s ]\n" leading (Item.print item) (TerminalSet.print toks) ) state ) (* Displaying a state. By default, only the kernel is displayed, not the closure. *) let print leading state = print_concrete leading (export state) let print_closure leading state = print_concrete leading (closure (export state)) (* The core of an LR(1) state is the underlying LR(0) state. *) let core (k, _) = k (* A sanity check. This well-formedness check is quite costly, due to the use of [Item.Set.cardinal]. Therefore, it is enabled only when [debug] is [true].
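The check verifies that the lookahead array has exactly one entry per item in the underlying LR(0) state.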
*) let debug = false let well_formed (k, toksr) = not debug || Array.length toksr = Item.Set.cardinal (InfiniteArray.get states k) (* An LR(1) start state is the combination of an LR(0) start state (which consists of a single item) with a singleton lookahead set that consists of the end-of-file pseudo-token. *) let start k = let state = (k, [| TerminalSet.singleton Terminal.sharp |]) in assert (well_formed state); state (* Interpreting a symbolic lookahead set with respect to a source state. The variables in the symbolic lookahead set (which are integers) are interpreted as indices into the state's array of concrete lookahead sets. The result is a concrete lookahead set. *) let interpret ((_, toksr) as state : lr1state) ((toks, vars) : SymbolicLookahead.t) : TerminalSet.t = assert (well_formed state); VarSet.fold (fun var toks -> assert (var >= 0 && var < Array.length toksr); TerminalSet.union toksr.(var) toks ) vars toks (* Out of an LR(1) state, one produces information about reductions and transitions. This is done in an efficient way by interpreting the precomputed symbolic information with respect to that state. *) let reductions ((k, _) as state : lr1state) : (TerminalSet.t * Production.index) list = List.map (fun (s, prod) -> interpret state s, prod ) (InfiniteArray.get _reductions k) let transitions ((k, _) as state : lr1state) : lr1state SymbolMap.t = SymbolMap.map (fun ((k, sr) : symbolic_transition_target) -> ((k, Array.map (interpret state) sr) : lr1state) ) (InfiniteArray.get _transitions k) let transition symbol ((k, _) as state : lr1state) : lr1state = let ((k, sr) : symbolic_transition_target) = try SymbolMap.find symbol (InfiniteArray.get _transitions k) with Not_found -> assert false (* no transition along this symbol *) in (k, Array.map (interpret state) sr) (* [transition_tokens transitions] returns the set of tokens (terminal symbols) that are labels of outgoing transitions in the table [transitions]. *) let transition_tokens transitions = SymbolMap.fold (fun symbol _target toks -> match symbol with | Symbol.T tok -> TerminalSet.add tok toks | Symbol.N _ -> toks ) transitions TerminalSet.empty (* Equality of states. *) let equal ((k1, toksr1) as state1) ((k2, toksr2) as state2) = assert (k1 = k2 && well_formed state1 && well_formed state2); let rec loop i = if i = 0 then true else let i = i - 1 in (TerminalSet.equal toksr1.(i) toksr2.(i)) && (loop i) in loop (Array.length toksr1) (* Subsumption between states. *) let subsume ((k1, toksr1) as state1) ((k2, toksr2) as state2) = assert (k1 = k2 && well_formed state1 && well_formed state2); let rec loop i = if i = 0 then true else let i = i - 1 in (TerminalSet.subset toksr1.(i) toksr2.(i)) && (loop i) in loop (Array.length toksr1) (* This function determines whether two (core-equivalent) states are compatible, according to a criterion that is close to Pager's weak compatibility criterion. Pager's criterion guarantees that if a merged state has a potential conflict at [(i, j)] -- that is, some token [t] appears within the lookahead sets of both item [i] and item [j] -- then there exists a state in the canonical automaton that also has a potential conflict at [(i, j)] -- that is, some token [u] appears within the lookahead sets of both item [i] and item [j]. Note that [t] and [u] can be distinct. Pager has shown that his weak compatibility criterion is stable, that is, preserved by transitions and closure. This means that, if two states can be merged, then so can their successors. 
This is important, because merging two states means committing to merging their successors, even though we have not even built these successors yet. The criterion used here is a slightly more restrictive version of Pager's criterion, which guarantees equality of the tokens [t] and [u]. This is done essentially by applying Pager's original criterion on a token-wise basis. Pager's original criterion states that two states can be merged if the new state has no conflict or one of the original states has a conflict. Our more restrictive criterion states that two states can be merged if, for every token [t], the new state has no conflict at [t] or one of the original states has a conflict at [t]. This modified criterion is also stable. My experiments show that it is almost as effective in practice: out of more than a hundred real-world sample grammars, only one automaton was affected, and only one extra state appeared as a result of using the modified criterion. Its advantage is to potentially make conflict explanations easier: if there appears to be a conflict at [t], then some conflict at [t] can be explained. This was not true when using Pager's original criterion. *) (* A word of caution: reasoning about compatibility is tricky and often counter-intuitive. Here is a list of properties and non-properties: - Compatibility is reflexive and symmetric. - Compatibility is *not* transitive. - If two states A and B are in the subsumption relation (i.e., one is a subset of the other), then A and B are compatible. - Compatibility is *not* monotonic. That is, it is *not* the case that if two states A and B are incompatible, then two larger states A' and B' must be incompatible as well. (The fact that the state A U B is compatible with itself shows that this is false.) In the contrapositive, it is *not* the case that if A and B are compatible, then two smaller states A' and B' must be compatible as well. - Compatibility is preserved by union of compatible states. That is, if A and B are compatible, then C is compatible with (A U B) if and only if C is compatible with both A and B. *) let compatible (k1, toksr1) (k2, toksr2) = assert (k1 = k2); let n = Array.length toksr1 in (* Two states are compatible if and only if they are compatible at every pair (i, j), where i and j are distinct. *) let rec loopi i = if i = n then true else let toksr1i = toksr1.(i) and toksr2i = toksr2.(i) in let rec loopj j = if j = i then true else let toksr1j = toksr1.(j) and toksr2j = toksr2.(j) in (* The two states are compatible at (i, j) if every conflict token in the merged state already was a conflict token in one of the two original states. This could be written as follows: TerminalSet.subset (TerminalSet.inter (TerminalSet.union toksr1i toksr2i) (TerminalSet.union toksr1j toksr2j)) (TerminalSet.union (TerminalSet.inter toksr1i toksr1j) (TerminalSet.inter toksr2i toksr2j)) but is easily seen (on paper) to be equivalent to: *) TerminalSet.subset (TerminalSet.inter toksr2i toksr1j) (TerminalSet.union toksr1i toksr2j) && TerminalSet.subset (TerminalSet.inter toksr1i toksr2j) (TerminalSet.union toksr2i toksr1j) && loopj (j+1) in loopj 0 && loopi (i+1) in loopi 0 (* This function determines whether two (core-equivalent) states can be merged without creating an end-of-stream conflict, now or in the future. The rule is, if an item appears in one state with the singleton "#" as its lookahead set, then its lookahead set in the other state must contain "#".
So, either the second lookahead set is also the singleton "#", and no end-of-stream conflict exists, or it is larger, and the second state already contains an end-of-stream conflict. Put another way, we do not want to merge two lookahead sets when one contains "#" alone and the other does not contain "#". I invented this rule to complement Pager's criterion. I believe, but I am not 100% sure, that it does indeed prevent end-of-stream conflicts and that it is stable. Thanks to Sébastien Hinderer for reporting the bug caused by the absence of this extra criterion. *) let eos_compatible (k1, toksr1) (k2, toksr2) = assert (k1 = k2); let n = Array.length toksr1 in let rec loop i = if i = n then true else let toks1 = toksr1.(i) and toks2 = toksr2.(i) in begin if TerminalSet.mem Terminal.sharp toks1 && TerminalSet.is_singleton toks1 then (* "#" is alone in one set: it must be a member of the other set. *) TerminalSet.mem Terminal.sharp toks2 else if TerminalSet.mem Terminal.sharp toks2 && TerminalSet.is_singleton toks2 then (* Symmetric condition. *) TerminalSet.mem Terminal.sharp toks1 else true end && loop (i+1) in loop 0 (* This function determines whether two (core-equivalent) states can be merged without creating spurious reductions on the [error] token. The rule is, we merge two states only if they agree on which reductions are permitted on the [error] token. Without this restriction, we might end up in a situation where we decide to introduce an [error] token into the input stream and perform a reduction, whereas a canonical LR(1) automaton, confronted with the same input string, would fail normally -- that is, it would introduce an [error] token into the input stream, but it would not be able to perform a reduction right away: the current state would be discarded. In the interest of more accurate (or sane, or predictable) error handling, I decided to introduce this restriction as of 20110124. This will cause an increase in the size of automata for grammars that use the [error] token. It might actually make the [error] token somewhat easier to use. Note that two sets can be in the subsumption relation and still be error-incompatible. Error-compatibility requires equality of the lookahead sets, restricted to [error]. Thanks to Didier Rémy for reporting a bug caused by the absence of this extra criterion. *) let error_compatible (k1, toksr1) (k2, toksr2) = assert (k1 = k2); let n = Array.length toksr1 in let rec loop i = if i = n then true else let toks1 = toksr1.(i) and toks2 = toksr2.(i) in begin if TerminalSet.mem Terminal.error toks1 then (* [error] is a member of one set: it must be a member of the other set. *) TerminalSet.mem Terminal.error toks2 else if TerminalSet.mem Terminal.error toks2 then (* Symmetric condition. *) TerminalSet.mem Terminal.error toks1 else true end && loop (i+1) in loop 0 (* Union of two states. The two states must have the same core. The new state is obtained by pointwise union of the lookahead sets. *) let union (k1, toksr1) ((k2, toksr2) as s2) = assert (k1 = k2); let toksr = MArray.leq_join TerminalSet.union toksr1 toksr2 in (* If the fresh array [toksr] has the same content as [toksr2], then we must return the state [s2], unchanged. *) if toksr2 == toksr then s2 else (k1, toksr) (* Restriction of a state to a set of tokens of interest. Every lookahead set is intersected with that set. 
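For example (an illustrative sketch), restricting a state [s] to the two tokens involved in some conflict could be written: let toks = TerminalSet.add tok1 (TerminalSet.add tok2 TerminalSet.empty) in restrict toks s Every lookahead set in the result is then a subset of {tok1, tok2}. This is how [Lr1partial] uses [restrict]: by abstracting away all tokens other than the tokens of interest, it makes its canonical construction much faster.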
*) let restrict toks (k, toksr) = k, Array.map (fun toksri -> TerminalSet.inter toksri toks ) toksr (* A (state-local, possibly nondeterministic) reduction table maps terminal symbols to lists of productions. *) type reductions = Production.index list TerminalMap.t (* [add_reduction prod tok reductions] adds a reduction of [prod] on [tok] to the table [reductions]. *) let add_reduction prod tok reductions = let prods = try TerminalMap.lookup tok reductions with Not_found -> [] in TerminalMap.add tok (prod :: prods) reductions (* [add_reductions prod toks reductions] adds a reduction of [prod] on every token in the set [toks] to the table [reductions]. *) let add_reductions prod toks reductions = TerminalSet.fold (add_reduction prod) toks reductions let reductions_table state = List.fold_left (fun reductions (toks, prod) -> add_reductions prod toks reductions ) TerminalMap.empty (reductions state) (* [reduction_tokens reductions] returns the domain of the reductions table [table], in the form of a set of tokens. *) let reduction_tokens reductions = TerminalMap.fold (fun tok _prods toks -> TerminalSet.add tok toks ) reductions TerminalSet.empty (* This inverts a mapping of tokens to productions into a mapping of productions to sets of tokens. *) (* This is needed, in [CodeBackend], to avoid producing two (or more) separate branches that call the same [reduce] function. Instead, we generate just one branch, guarded by a [POr] pattern. *) let invert reductions : TerminalSet.t ProductionMap.t = TerminalMap.fold (fun tok prods inverse -> List.fold_left (fun inverse prod -> let toks = try ProductionMap.lookup prod inverse with Not_found -> TerminalSet.empty in ProductionMap.add prod (TerminalSet.add tok toks) inverse ) inverse prods ) reductions ProductionMap.empty (* [has_eos_conflict transitions reductions] tells whether a state has an end-of-stream conflict, that is, a reduction action on [#] and at least one other (shift or reduce) action. *) let has_eos_conflict transitions reductions = match TerminalMap.lookup_and_remove Terminal.sharp reductions with | exception Not_found -> (* There is no reduction action on [#], thus no conflict. *) false | prods, reductions -> (* There is at least one reduction action on [#]. *) (* If there are two reduction actions on [#], then we have a conflict. *) List.length prods > 1 || (* If there is only one reduction on [#], then we have a conflict if and only if there exists another shift or reduce action. *) not (TerminalMap.is_empty reductions) || SymbolMap.exists (fun symbol _ -> Symbol.is_terminal symbol) transitions let has_eos_conflict_lr1state (state : lr1state) = has_eos_conflict (transitions state) (reductions_table state) menhir-20210929/src/lr0.mli000066400000000000000000000153651412503066000152440ustar00rootroot00000000000000(******************************************************************************) (*                                                                            *) (*                                   Menhir                                   *) (*                                                                            *) (*                       François Pottier, Inria Paris                        *) (*              Yann Régis-Gianas, PPS, Université Paris Diderot              *) (*                                                                            *) (*  Copyright Inria. All rights reserved. This file is distributed under the  *) (*  terms of the GNU General Public License version 2, as described in the    *) (*  file LICENSE.                                                             *) (*                                                                            *) (******************************************************************************) open Grammar (* This module builds the LR(0) automaton associated with the grammar, then provides access to it. It also provides facilities for efficiently performing LR(1) constructions.
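For instance, a client that builds an LR(1) automaton on top of this module typically creates an initial LR(1) state out of each entry node, then repeatedly follows transitions, along these lines (a sketch; [prod] and [symbol] are assumed to be given): let node = ProductionMap.find prod entry in let s : lr1state = start node in let s' : lr1state = transition symbol s in ...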
*) (* ------------------------------------------------------------------------ *) (* The LR(0) automaton. *) (* The nodes of the LR(0) automaton are numbered. *) type node = int (* This is the number of nodes in the LR(0) automaton. *) val n: int (* These are the automaton's entry states, indexed by the start productions. *) val entry: node ProductionMap.t (* A node can be converted to the underlying LR(0) set of items. This set is not closed. *) val items: node -> Item.Set.t (* The incoming symbol of an LR(0) node is the symbol carried by all of the edges that enter this node. A node has zero incoming edges (and, thus, no incoming symbol) if and only if it is a start node. *) val incoming_symbol: node -> Symbol.t option val incoming_edges: node -> node list (* The outgoing edges of a node. *) val outgoing_edges: node -> node SymbolMap.t val outgoing_symbols: node -> Symbol.t list (* Maps whose keys are nodes. *) module ImperativeNodeMap : Fix.MINIMAL_IMPERATIVE_MAPS with type key = node (* ------------------------------------------------------------------------ *) (* Help for building the LR(1) automaton. *) (* An LR(1) state is internally represented as a pair of an LR(0) state number and an array of concrete lookahead sets (whose length depends on the LR(0) state). *) type lr1state (* A view of the type [lr1state] as an ordered type. *) module Lr1StateAsOrderedType : sig type t = lr1state val compare: t -> t -> int end (* An encoded LR(1) state can be turned into a concrete representation, that is, a mapping of items to concrete lookahead sets. *) type concretelr1state = TerminalSet.t Item.Map.t val export: lr1state -> concretelr1state (* One can take the closure of a concrete LR(1) state. *) val closure: concretelr1state -> concretelr1state (* The core of an LR(1) state is the underlying LR(0) state. *) val core: lr1state -> node (* One can create an LR(1) start state out of an LR(0) start node. *) val start: node -> lr1state (* Information about the transitions at a state. *) val transitions: lr1state -> lr1state SymbolMap.t val transition: Symbol.t -> lr1state -> lr1state (* [transition_tokens transitions] returns the set of tokens (terminal symbols) that are labels of outgoing transitions in the table [transitions]. *) val transition_tokens: 'target SymbolMap.t -> TerminalSet.t (* Information about the reductions at a state. *) (* See also [reductions_table] further on. *) val reductions: lr1state -> (TerminalSet.t * Production.index) list (* Equality of states. The two states must have the same core. Then, they are equal if and only if their lookahead sets are pointwise equal. *) val equal: lr1state -> lr1state -> bool (* Subsumption between states. The two states must have the same core. Then, one subsumes the other if and only if their lookahead sets are (pointwise) in the subset relation. *) val subsume: lr1state -> lr1state -> bool (* A slightly modified version of Pager's weak compatibility criterion. The two states must have the same core. *) val compatible: lr1state -> lr1state -> bool (* This function determines whether two (core-equivalent) states can be merged without creating an end-of-stream conflict. *) val eos_compatible: lr1state -> lr1state -> bool (* This function determines whether two (core-equivalent) states can be merged without creating spurious reductions on the [error] token. *) val error_compatible: lr1state -> lr1state -> bool (* Union of two states. The two states must have the same core. The new state is obtained by pointwise union of the lookahead sets. 
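For instance, if two states share a two-item core and carry the lookahead arrays [| {a}; {b} |] and [| {b}; {b, c} |], then their union carries [| {a, b}; {b, c} |].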
*) (* If [s'] is a subset of [s], then [union s' s] is physically equal to [s]. *) val union: lr1state -> lr1state -> lr1state (* Restriction of a state to a set of tokens of interest. Every lookahead set is intersected with that set. *) val restrict: TerminalSet.t -> lr1state -> lr1state (* The following functions display: 1- a concrete state; 2- a state (only the kernel, not the closure); 3- the closure of a state. The first parameter is a fixed string that is added at the beginning of every line. *) val print_concrete: string -> concretelr1state -> string val print: string -> lr1state -> string val print_closure: string -> lr1state -> string (* A (state-local, possibly nondeterministic) reduction table maps terminal symbols to lists of productions. *) type reductions = Production.index list TerminalMap.t (* [add_reduction prod tok reductions] adds a reduction of [prod] on [tok] to the table [reductions]. *) val add_reduction: Production.index -> Terminal.t -> reductions -> reductions (* [add_reductions prod toks reductions] adds a reduction of [prod] on every token in the set [toks] to the table [reductions]. *) val add_reductions: Production.index -> TerminalSet.t -> reductions -> reductions (* A table of the reductions at a state. *) val reductions_table: lr1state -> reductions (* [invert] inverts a reduction table (that is, a mapping of tokens to lists of productions), producing a mapping of productions to sets of tokens. *) val invert : reductions -> TerminalSet.t ProductionMap.t (* [reduction_tokens reductions] returns the domain of the reductions table [table], in the form of a set of tokens. *) val reduction_tokens: reductions -> TerminalSet.t (* [has_eos_conflict transitions reductions] tells whether a state has an end-of-stream conflict, that is, a reduction action on [#] and at least one other (shift or reduce) action. *) val has_eos_conflict: 'target SymbolMap.t -> reductions -> bool val has_eos_conflict_lr1state: lr1state -> bool menhir-20210929/src/lr1.ml000066400000000000000000001044761412503066000150760ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar open LR1Sigs (* -------------------------------------------------------------------------- *) (* Run the SLR(1) check first. *) let () = Slr.check() (* -------------------------------------------------------------------------- *) (* Select a construction algorithm based on the command-line settings. *) module type ALGORITHM = sig module Run () : LR1_AUTOMATON end let algo, mode = Settings.(match construction_mode with | ModeCanonical -> (module LR1Canonical : ALGORITHM), "canonical" | ModeInclusionOnly -> (module LR1Pager : ALGORITHM), "no-pager" | ModePager -> (module LR1Pager : ALGORITHM), "pager" | ModeLALR -> (module LALR : ALGORITHM), "lalr" ) module Algorithm = (val algo : ALGORITHM) let () = Error.logA 1 (fun f -> Printf.fprintf f "The construction mode is %s.\n" mode ) (* -------------------------------------------------------------------------- *) (* Run the construction algorithm. 
*) module Raw = Algorithm.Run() let () = Error.logA 1 (fun f -> Printf.fprintf f "Built an LR(1) automaton with %d states.\n" Raw.n ) (* -------------------------------------------------------------------------- *) (* In the following, we perform a depth-first traversal of the raw automaton that was built above. As we go, we perform silent conflict resolution (which can remove some transitions and therefore make some raw nodes unreachable) and we assign consecutive numbers to the reachable nodes. *) (* We define our own type [node] to be the number of a reachable node in this new numbering system. *) type node = int (* -------------------------------------------------------------------------- *) (* All of the following mutable state is modified or initialized during the depth-first traversal below. *) (* All of the arrays below are indexed by raw node numbers. *) module M = struct let marked : bool array = Array.make Raw.n false let mark (node : Raw.node) = marked.(Raw.number node) <- true let is_marked (node : Raw.node) = marked.(Raw.number node) end (* This array is initialized during the traversal. We assign new consecutive numbers to the reachable nodes. *) let unreachable = -1 let _number : node array = Array.make Raw.n unreachable let transport (raw_node : Raw.node) : node = _number.(Raw.number raw_node) (* This array of transitions is initialized here with the data supplied by [Raw.transitions]. Then, some transitions are *removed* (because of conflict resolution) during the traversal. *) let transitions : Raw.node SymbolMap.t array = Array.init Raw.n (fun i -> Raw.transitions (Raw.node i) ) (* Transitions are also stored in reverse, so as to allow reverse traversals of the automaton. This array is populated during the traversal. *) let predecessors : node list array = Array.make Raw.n [] (* This array is initialized during the traversal. *) let reductions : Lr0.reductions array = Array.make Raw.n TerminalMap.empty (* dummy *) (* Tokens for which there are several possible actions in the raw LR(1) automaton are known as conflict tokens. This array is populated during the traversal. *) let _conflict_tokens : TerminalSet.t array = Array.make Raw.n TerminalSet.empty (* Information about end-of-stream conflicts in each state. *) let _has_eos_conflict : (Production.index list * TerminalSet.t) option array = Array.make Raw.n None (* (New as of 2012/01/23.) This flag records whether a shift/reduce conflict in this node was solved in favor of neither (%nonassoc). This is later used to forbid a default reduction at this node. *) let forbid_default_reduction : bool array = Array.make Raw.n false (* A count of all reachable nodes. *) let n = ref 0 (* A list of nodes with conflicts. *) let conflict_nodes : node list ref = ref [] (* Counts of nodes with shift/reduce and reduce/reduce conflicts. *) let shift_reduce = ref 0 let reduce_reduce = ref 0 (* Count of the shift/reduce conflicts that could be silently resolved. *) let silently_solved = ref 0 (* -------------------------------------------------------------------------- *) (* A view of the raw LR(1) automaton as a graph. *) (* This view relies on the [transitions] array, as opposed to the function [Raw.transitions]. This means that, once an edge has been removed, it can no longer be followed. 
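For instance, if conflict resolution removes the shift transition on a token [tok] out of the raw node number [i], that is, removes the entry at [Symbol.T tok] in [transitions.(i)], then the traversal below never follows this edge; if the target node thereby becomes unreachable, it is never discovered or numbered, and any conflicts that it carries are ignored, as intended.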
*) module RawForwardEdges = struct type node = Raw.node type label = Symbol.t let foreach_outgoing_edge node f = let i = Raw.number node in SymbolMap.iter f transitions.(i) let foreach_root f = ProductionMap.iter (fun _prod node -> f node) Raw.entry end (* -------------------------------------------------------------------------- *) (* This function is invoked during the traversal when a node is discovered. *) (* It is in charge of detecting and resolving conflicts at this node. *) let discover (raw_node : Raw.node) = let i = Raw.number raw_node and state = Raw.state raw_node in (* Number this node. *) let node : node = Misc.postincrement n in _number.(i) <- node; (* Detect conflicts. We iterate over the table [Lr0.reductions_table state], which gives us all potential reductions. The code is written in such a way that we are aware of multi-way shift/reduce/reduce conflicts. We solve conflicts, when unambiguously allowed by priorities, by removing certain transitions and reductions. *) let has_shift_reduce = ref false and has_reduce_reduce = ref false in let foreach_reduction f = TerminalMap.fold f (Lr0.reductions_table state) TerminalMap.empty in reductions.(i) <- foreach_reduction begin fun tok prods reductions -> assert (prods <> []); if SymbolMap.mem (Symbol.T tok) transitions.(i) then begin (* There is a transition in addition to the reduction(s). We have (at least) a shift/reduce conflict. *) assert (not (Terminal.equal tok Terminal.sharp)); if List.length prods = 1 then begin let prod = List.hd prods in (* This is a single shift/reduce conflict. If priorities tell us how to solve it, we follow that and modify the automaton. *) match Precedence.shift_reduce tok prod with | Precedence.ChooseShift -> (* Suppress the reduce action. *) incr silently_solved; reductions | Precedence.ChooseReduce -> (* Record the reduce action and suppress the shift transition. The automaton is modified in place. This can have the subtle effect of making some nodes unreachable. Any conflicts in these nodes are then ignored (as they should be). *) incr silently_solved; transitions.(i) <- SymbolMap.remove (Symbol.T tok) transitions.(i); TerminalMap.add tok prods reductions | Precedence.ChooseNeither -> (* Suppress both the reduce action and the shift transition. *) incr silently_solved; transitions.(i) <- SymbolMap.remove (Symbol.T tok) transitions.(i); forbid_default_reduction.(i) <- true; reductions | Precedence.DontKnow -> (* Priorities don't allow concluding. Record the existence of a shift/reduce conflict. *) _conflict_tokens.(i) <- Grammar.TerminalSet.add tok _conflict_tokens.(i); has_shift_reduce := true; TerminalMap.add tok prods reductions end else begin (* At least two reductions are enabled, so this is a shift/reduce/reduce conflict. If the priorities are such that each individual shift/reduce conflict is solved in favor of shifting or in favor of neither, then solve the entire composite conflict in the same way. Otherwise, report the conflict. *) let choices = List.map (Precedence.shift_reduce tok) prods in if List.for_all (fun choice -> match choice with | Precedence.ChooseShift -> true | _ -> false ) choices then begin (* Suppress the reduce action. *) silently_solved := !silently_solved + List.length prods; reductions end else if List.for_all ((=) Precedence.ChooseNeither) choices then begin (* Suppress the reduce action and the shift transition. 
*) silently_solved := !silently_solved + List.length prods; transitions.(i) <- SymbolMap.remove (Symbol.T tok) transitions.(i); reductions end else begin (* Record a shift/reduce/reduce conflict. Keep all reductions. *) _conflict_tokens.(i) <- Grammar.TerminalSet.add tok _conflict_tokens.(i); has_shift_reduce := true; has_reduce_reduce := true; TerminalMap.add tok prods reductions end end end else begin (* There is no transition in addition to the reduction(s). *) if List.length prods >= 2 then begin (* If there are multiple reductions, then we have a pure reduce/reduce conflict. Do nothing about it at this point. *) _conflict_tokens.(i) <- Grammar.TerminalSet.add tok _conflict_tokens.(i); has_reduce_reduce := true end; TerminalMap.add tok prods reductions end end; (* Detect end-of-stream conflicts at this node. If it has both a reduce action at [#] and some other (shift or reduce) action, this is an end-of-stream conflict. *) let transitions = transitions.(i) and reductions = reductions.(i) in if Lr0.has_eos_conflict transitions reductions then begin (* Conceptually suppress the reduce action at [#]. *) let prods, reductions = TerminalMap.lookup_and_remove Terminal.sharp reductions in (* Compute the tokens involved in the transitions and remaining reductions. *) let toks = TerminalSet.union (Lr0.transition_tokens transitions) (Lr0.reduction_tokens reductions) in (* Record this end-of-stream conflict. *) _has_eos_conflict.(i) <- Some (prods, toks) end; (* Record statistics about conflicts. *) if not (TerminalSet.is_empty _conflict_tokens.(i)) then begin conflict_nodes := node :: !conflict_nodes; if !has_shift_reduce then incr shift_reduce; if !has_reduce_reduce then incr reduce_reduce end (* -------------------------------------------------------------------------- *) (* This function is invoked during the traversal when an edge is traversed. *) (* It records an edge in the predecessor array. *) let traverse (source : Raw.node) _symbol (target : Raw.node) = (* The source node has been discovered and numbered already, so it can be transported. (This is not necessarily true of the target node.) *) let j = Raw.number target in predecessors.(j) <- transport source :: predecessors.(j) (* -------------------------------------------------------------------------- *) (* Perform the depth-first traversal of the raw automaton. *) let () = let module D = struct let traverse = traverse let discover = discover end in let module R = DFS.Run(RawForwardEdges)(M)(D) in () let () = if !silently_solved = 1 then Error.logA 1 (fun f -> Printf.fprintf f "One shift/reduce conflict was silently solved.\n" ) else if !silently_solved > 1 then Error.logA 1 (fun f -> Printf.fprintf f "%d shift/reduce conflicts were silently solved.\n" !silently_solved ); if !n < Raw.n then Error.logA 1 (fun f -> Printf.fprintf f "Only %d states remain after resolving shift/reduce conflicts.\n" !n ) let () = Grammar.diagnostics() (* -------------------------------------------------------------------------- *) (* Most of our mutable state becomes frozen at this point. Some transitions and reductions can still be removed further on, when default conflict resolution is performed. Also, some reductions can still be added further on, when [%on_error_reduce] declarations are obeyed. *) let n = !n let conflict_nodes = !conflict_nodes (* We need a mapping of nodes to raw node numbers -- the inverse of the array [number]. 
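In other words, the function [raw] defined below satisfies [number.(raw node) = node] for every node in the semi-open interval [0, n), and [raw (transport r) = Raw.number r] for every raw node [r] that was reached by the traversal.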
*) (* While building this mapping, we must remember that [number.(i)] is [unreachable] if the raw node number [i] has never been reached by the traversal. *) let raw : node -> int = let raw = Array.make n (-1) (* dummy *) in Array.iteri (fun i (* raw index *) (node : node) -> assert (0 <= i && i < Raw.n); if node <> unreachable then begin assert (0 <= node && node < n); raw.(node) <- i end ) _number; fun node -> assert (0 <= node && node < n); raw.(node) (* The array [transitions] is re-constructed so as to map nodes to nodes (instead of raw nodes to raw nodes). This array is now frozen; it is no longer modified. *) let transitions : node SymbolMap.t array = Array.init n (fun node -> SymbolMap.map transport transitions.(raw node) ) (* The array [predecessors] is now frozen. *) (* The array [reductions] is *not* yet frozen. *) (* The array [conflict_tokens] is now frozen. *) let () = Time.tick "Construction of the LR(1) automaton" (* -------------------------------------------------------------------------- *) (* Accessors. *) let number node = node let print node = Printf.sprintf "%d" (number node) let entry = ProductionMap.map transport Raw.entry let state node = Raw.state (Raw.node (raw node)) let transitions node = assert (0 <= node && node < n); transitions.(node) let set_reductions node table = reductions.(raw node) <- table let reductions node = reductions.(raw node) let predecessors node = predecessors.(raw node) let conflict_tokens node = _conflict_tokens.(raw node) let conflicts f = List.iter (fun node -> f (conflict_tokens node) node ) conflict_nodes let forbid_default_reduction node = forbid_default_reduction.(raw node) let has_eos_conflict node = _has_eos_conflict.(raw node) (* -------------------------------------------------------------------------- *) (* The incoming symbol of a node can be computed by going through its LR(0) core. For this reason, we do not need to explicitly record it here. *) let incoming_symbol node = Lr0.incoming_symbol (Lr0.core (state node)) let is_start node = match incoming_symbol node with | None -> true | Some _ -> false (* -------------------------------------------------------------------------- *) (* Graph views. *) module ForwardEdges = struct type nonrec node = node type label = Symbol.t let foreach_outgoing_edge node f = SymbolMap.iter f (transitions node) end module BackwardEdges = struct type nonrec node = node type label = unit (* could be changed to [Symbol.t option] if needed *) let foreach_outgoing_edge node f = List.iter (fun node -> f () node) (predecessors node) end (* -------------------------------------------------------------------------- *) (* With each start production [S' -> S], exactly two states are associated: a start state, which contains the item [S' -> . S [#]], and an exit state, which contains the item [S' -> S . [#]]. *) (* The following function recognizes these two states and returns the corresponding start symbol [S]. *) let is_start_or_exit node = let items = Lr0.items (Lr0.core (state node)) in if Item.Set.cardinal items = 1 then let item = Item.Set.choose items in let prod, _, _, _, _ = Item.def item in Production.classify prod else None (* -------------------------------------------------------------------------- *) (* Iteration over all nodes. 
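For example (a sketch), the number of nodes that carry a conflict can be computed as fold (fun accu node -> if TerminalSet.is_empty (conflict_tokens node) then accu else accu + 1) 0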
*) let fold f accu = let accu = ref accu in for node = 0 to n - 1 do accu := f !accu node done; !accu let iter f = for node = 0 to n - 1 do f node done let map f = List.rev ( fold (fun accu node -> f node :: accu ) [] ) let foldx f = fold (fun accu node -> match incoming_symbol node with | None -> accu | Some _ -> f accu node ) let iterx f = iter (fun node -> match incoming_symbol node with | None -> () | Some _ -> f node ) let tabulate (f : node -> 'a) = Misc.tabulate n f let sum (f : node -> int) = Misc.sum n f (* -------------------------------------------------------------------------- *) (* We build a map of each symbol to the (reachable) nodes that have this incoming symbol. *) let lookup symbol index = try SymbolMap.find symbol index with Not_found -> [] let index : node list SymbolMap.t = fold (fun index node -> match incoming_symbol node with | None -> index | Some symbol -> SymbolMap.add symbol (node :: lookup symbol index) index ) SymbolMap.empty (* This allows iterating over all nodes that are targets of edges carrying a certain symbol. The sources of the corresponding edges are also provided. *) let targets f accu symbol = (* There are no incoming transitions on the start symbols. *) let targets = lookup symbol index in List.fold_left (fun accu target -> f accu (predecessors target) target ) accu targets (* -------------------------------------------------------------------------- *) (* Converting a start node into the single item that it contains. *) let start2item node = let state : Lr0.lr1state = state node in let core : Lr0.node = Lr0.core state in let items : Item.Set.t = Lr0.items core in assert (Item.Set.cardinal items = 1); Item.Set.choose items (* -------------------------------------------------------------------------- *) (* [has_beforeend s] tests whether the state [s] can reduce a production whose semantic action uses [$endpos($0)]. Note that [$startpos] and [$endpos] have been expanded away already, so we need not worry about the fact that (in an epsilon production) they expand to [$endpos($0)]. *) let has_beforeend node = TerminalMap.fold (fun _ prods accu -> accu || let prod = Misc.single prods in not (Production.is_start prod) && let action = Production.action prod in Action.has_beforeend action ) (reductions node) false (* -------------------------------------------------------------------------- *) (* Computing which terminal symbols a state is willing to act upon. One must keep in mind that, due to the merging of states, a state might be willing to perform a reduction on a certain token, yet the reduction can take us to another state where this token causes an error. In other words, the set of terminal symbols that is computed here is really an over-approximation of the set of symbols that will not cause an error. And there seems to be no way of performing an exact computation, as we would need to know not only the current state, but the contents of the stack as well. *) let acceptable_tokens (s : node) = (* If this state is willing to act on the error token, ignore it -- we do not wish to report that an error would be accepted in this state :-) *) let transitions = SymbolMap.remove (Symbol.T Terminal.error) (transitions s) and reductions = TerminalMap.remove Terminal.error (reductions s) in (* Accumulate the tokens carried by outgoing transitions. 
*) let covered = SymbolMap.fold (fun symbol _ covered -> match symbol with | Symbol.T tok -> TerminalSet.add tok covered | Symbol.N _ -> covered ) transitions TerminalSet.empty in (* Accumulate the tokens that permit reduction. *) let covered = ProductionMap.fold (fun _ toks covered -> TerminalSet.union toks covered ) (Lr0.invert reductions) covered in (* That's it. *) covered (* -------------------------------------------------------------------------- *) (* Report statistics. *) (* Produce the reports. *) let () = if !shift_reduce = 1 then Error.grammar_warning [] "one state has shift/reduce conflicts." else if !shift_reduce > 1 then Error.grammar_warning [] "%d states have shift/reduce conflicts." !shift_reduce; if !reduce_reduce = 1 then Error.grammar_warning [] "one state has reduce/reduce conflicts." else if !reduce_reduce > 1 then Error.grammar_warning [] "%d states have reduce/reduce conflicts." !reduce_reduce (* -------------------------------------------------------------------------- *) (* Instantiate [Set] and [Map] on the type [node]. *) module Node = struct type t = node let compare = (-) end module NodeSet = struct include Set.Make(Node) (* [union] does not guarantee physical equality between its second argument and its result when a logical equality holds. We wrap it so as to obtain this property. *) let leq_join s1 s2 = if subset s1 s2 then s2 else union s1 s2 let print s = Printf.sprintf "{ %s }" ( Misc.separated_iter_to_string print ", " (fun f -> iter f s) ) end module NodeMap = Map.Make(Node) module ImperativeNodeMap = Fix.Glue.ArraysAsImperativeMaps(struct let n = n end) let all_sources symbol = targets (fun accu sources _target -> List.fold_left (fun accu source -> NodeSet.add source accu) accu sources ) NodeSet.empty symbol let all_targets symbol = targets (fun accu _sources target -> NodeSet.add target accu ) NodeSet.empty symbol (* -------------------------------------------------------------------------- *) (* For each production, compute where (that is, in which states) this production can be reduced. This computation is done AFTER default conflict resolution (see below). It is an error to call the accessor function [production_where] before default conflict resolution has taken place. *) let production_where : NodeSet.t ProductionMap.t option ref = ref None let initialize_production_where () = production_where := Some ( fold (fun accu node -> TerminalMap.fold (fun _ prods accu -> let prod = Misc.single prods in let nodes = try ProductionMap.lookup prod accu with Not_found -> NodeSet.empty in ProductionMap.add prod (NodeSet.add node nodes) accu ) (reductions node) accu ) ProductionMap.empty ) let production_where (prod : Production.index) : NodeSet.t = match !production_where with | None -> (* It is an error to call this function before conflict resolution. *) assert false | Some production_where -> try (* Production [prod] may be reduced at [nodes]. *) let nodes = ProductionMap.lookup prod production_where in assert (not (NodeSet.is_empty nodes)); nodes with Not_found -> (* The production [prod] is never reduced. *) NodeSet.empty (* -------------------------------------------------------------------------- *) (* Warn about productions that are never reduced. *) (* These are productions that can never, ever be reduced, because there is no state that is willing to reduce them. There could be other productions that are never reduced because the only states that are willing to reduce them are unreachable. We do not report those. 
In fact, through the use of the inspection API, it might be possible to bring the automaton into a state where one of those productions can be reduced. *) let warn_about_productions_never_reduced () = let count = ref 0 in Production.iter (fun prod -> if NodeSet.is_empty (production_where prod) then match Production.classify prod with | Some nt -> incr count; Error.grammar_warning (Nonterminal.positions nt) "symbol %s is never accepted." (Nonterminal.print false nt) | None -> incr count; Error.grammar_warning (Production.positions prod) "production %s is never reduced." (Production.print prod) ); if !count > 0 then let plural_mark, be = if !count > 1 then ("s", "are") else ("", "is") in Error.grammar_warning [] "in total, %d production%s %s never reduced." !count plural_mark be (* -------------------------------------------------------------------------- *) (* When requested by the code generator, apply default conflict resolution to ensure that the automaton is deterministic. *) (* [best prod prods] chooses which production should be reduced among the list [prod :: prods]. It fails if no best choice exists. *) let rec best choice = function | [] -> choice | prod :: prods -> match Precedence.reduce_reduce choice prod with | Some choice -> best choice prods | None -> (* The cause for not knowing which production is best could be: 1- the productions originate in different source files; 2- they are derived, via inlining, from the same production. *) Error.signal Error.grammatical_error (Production.positions choice @ Production.positions prod) "do not know how to resolve a reduce/reduce conflict\n\ between the following two productions:\n%s\n%s" (Production.print choice) (Production.print prod); choice (* dummy *) (* Go ahead. *) let default_conflict_resolution () = let shift_reduce = ref 0 and reduce_reduce = ref 0 in conflict_nodes |> List.iter (fun node -> _conflict_tokens.(raw node) <- TerminalSet.empty; set_reductions node ( TerminalMap.fold (fun tok prods reductions -> try let (_ : node) = SymbolMap.find (Symbol.T tok) (transitions node) in (* There is a transition at this symbol, so this is a (possibly multiway) shift/reduce conflict. Resolve in favor of shifting by suppressing all reductions. *) shift_reduce := List.length prods + !shift_reduce; reductions with Not_found -> (* There is no transition at this symbol. Check whether we have multiple reductions. *) match prods with | [] -> assert false | [ _ ] -> TerminalMap.add tok prods reductions | prod :: ((_ :: _) as prods) -> (* We have a reduce/reduce conflict. Resolve, if possible, in favor of a single reduction. This reduction must be preferable to each of the others. *) reduce_reduce := List.length prods + !reduce_reduce; TerminalMap.add tok [ best prod prods ] reductions ) (reductions node) TerminalMap.empty ) ); if !shift_reduce = 1 then Error.warning [] "one shift/reduce conflict was arbitrarily resolved." else if !shift_reduce > 1 then Error.warning [] "%d shift/reduce conflicts were arbitrarily resolved." !shift_reduce; if !reduce_reduce = 1 then Error.warning [] "one reduce/reduce conflict was arbitrarily resolved." else if !reduce_reduce > 1 then Error.warning [] "%d reduce/reduce conflicts were arbitrarily resolved." !reduce_reduce; (* Now, detect and remove end-of-stream conflicts. If a state has both a reduce action at [#] and some other (shift or reduce) action, this is an end-of-stream conflict. This conflict is resolved by suppressing the reduce action at [#].
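A classic instance: in a grammar with the productions E -> E PLUS E and E -> INT, the state reached after reading [E] typically contains the items [S' -> E . [#]] and [E -> E . PLUS E], that is, a reduction on [#] together with a shift on PLUS.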
*) (* Because we have already removed some reductions above, we may find fewer end-of-stream conflicts than we did during our first pass. *) let eos_conflicts = ref 0 in iter begin fun node -> let transitions = transitions node and reductions = reductions node in if Lr0.has_eos_conflict transitions reductions then begin (* Suppress the reduce action at [#]. *) let _, reductions = TerminalMap.lookup_and_remove Terminal.sharp reductions in set_reductions node reductions; (* Mark this end-of-stream conflict as resolved. *) _has_eos_conflict.(raw node) <- None; (* Count this end-of-stream conflict. *) incr eos_conflicts end end; if !eos_conflicts = 1 then Error.grammar_warning [] "one end-of-stream conflict was arbitrarily resolved." else if !eos_conflicts > 1 then Error.grammar_warning [] "%d end-of-stream conflicts were arbitrarily resolved." !eos_conflicts; (* We can now compute where productions are reduced. *) initialize_production_where(); warn_about_productions_never_reduced() (* -------------------------------------------------------------------------- *) (* Extra reductions. *) (* 2015/10/19 Original implementation. *) (* 2016/07/13 Use priority levels to choose which productions to reduce when several productions are eligible. *) (* If a state can reduce some productions whose left-hand symbol has been marked [%on_error_reduce], and if one such production [prod] is preferable to every other (according to the priority rules of [%on_error_reduce] declarations), then every error action in this state is replaced with a reduction of [prod]. This is done even though this state may have outgoing shift transitions: thus, we are forcing one interpretation of the past, among several possible interpretations. *) (* The code below looks like the decision on a default reduction in [Default], except we do not impose the absence of outgoing terminal transitions. Also, we actually modify the automaton, so the back-ends, the reference interpreter, etc., need not be aware of this feature, whereas they are aware of default reductions. *) (* This code can run before we decide on the default reductions; this does not affect which default reductions will be permitted. *) (* This code does not affect which productions can be reduced where. Thus, it is OK for it to run after [initialize_production_where()]. *) (* A count of how many states receive extra reductions through this mechanism. *) let extra = ref 0 (* A count of how many states have more than one eligible production, but one is preferable to every other (so priority plays a role). *) let prioritized = ref [] (* The set of nonterminal symbols in the left-hand side of an extra reduction. *) let extra_nts = ref NonterminalSet.empty let extra_reductions_in_node node = (* Compute the productions which this node can reduce. *) let productions : _ ProductionMap.t = Lr0.invert (reductions node) in let prods : Production.index list = ProductionMap.fold (fun prod _ prods -> prod :: prods) productions [] in (* Keep only those whose left-hand symbol is marked [%on_error_reduce]. *) let prods = List.filter OnErrorReduce.reduce prods in (* Check if one of them is preferable to every other one. *) match Misc.best OnErrorReduce.preferable prods with | None -> (* Either no production is marked [%on_error_reduce], or several of them are marked and none is preferable. *) () | Some prod -> let acceptable = acceptable_tokens node in (* An extra reduction is possible. Replace every error action with a reduction of [prod].
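(Here, an error action is a token on which the state would otherwise fail, that is, a token that does not appear in the set [acceptable] computed above.)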
If we replace at least one error action with a reduction, update [extra] and [extra_nts]. *) let triggered = lazy ( incr extra; if List.length prods > 1 then prioritized := node :: !prioritized; extra_nts := NonterminalSet.add (Production.nt prod) !extra_nts ) in Terminal.iter_real (fun tok -> if not (TerminalSet.mem tok acceptable) then begin set_reductions node (TerminalMap.add tok [ prod ] (reductions node)); Lazy.force triggered end ) let extra_reductions () = (* Examine every node. *) iter (fun node -> (* Just like a default reduction, an extra reduction should be forbidden (it seems) if [forbid_default_reduction] is set. *) if not (forbid_default_reduction node) then extra_reductions_in_node node ); (* Info message. *) if !extra > 0 then begin Error.logA 1 (fun f -> Printf.fprintf f "Extra reductions on error were added in %d states.\n" !extra; Printf.fprintf f "Priority played a role in %d of these states.\n" (List.length !prioritized) ); Error.logA 2 (fun f -> if !prioritized <> [] then Printf.fprintf f "These states are %s.\n" (NodeSet.print (NodeSet.of_list !prioritized)) ) end; (* Warn about useless %on_error_reduce declarations. *) OnErrorReduce.iter (fun nt -> if not (NonterminalSet.mem nt !extra_nts) then Error.grammar_warning [] "the declaration %%on_error_reduce %s is never useful." (Nonterminal.print false nt) ) (* -------------------------------------------------------------------------- *) (* Define [fold_entry], which in some cases facilitates the use of [entry]. *) let fold_entry f accu = ProductionMap.fold (fun prod state accu -> let nt : Nonterminal.t = match Production.classify prod with | Some nt -> nt | None -> assert false (* this is a start production *) in let t : Stretch.ocamltype = Nonterminal.ocamltype_of_start_symbol nt in f prod state nt t accu ) entry accu let entry_of_nt nt = (* Find the entry state that corresponds to [nt]. *) try ProductionMap.find (Production.startsymbol2startprod nt) entry with Not_found -> assert false exception Found of Nonterminal.t let nt_of_entry s = (* [s] should be an initial state. *) assert (incoming_symbol s = None); try ProductionMap.iter (fun prod entry -> if Node.compare s entry = 0 then match Production.classify prod with | None -> assert false | Some nt -> raise (Found nt) ) entry; (* This should not happen if [s] is indeed an initial state. *) assert false with Found nt -> nt menhir-20210929/src/lr1.mli000066400000000000000000000232361412503066000152410ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* This module first constructs an LR(1) automaton by using an appropriate construction method (LALR, Pager, canonical). Then, this automaton is further transformed (in place), in three steps: 1. Silent conflict resolution (without warnings), following the user's precedence declarations. This is done immediately. This can remove transitions and reductions. 2. Default conflict resolution (with warnings), following a fixed default policy. This is done via an explicit call to [default_conflict_resolution()]. This can remove reductions. 3. 
Addition of extra reductions, following the user's [%on_error_reduce] declarations. This is done via an explicit call to [extra_reductions()]. Conflicts are explained after step 1, and before steps 2 and 3. This is the main reason why these steps are separate. *) (* During step 1, shift/reduce conflicts are silently resolved if (and only if) that is allowed in a clean way by user-specified priorities. This includes multi-way shift/reduce/reduce conflicts if (and only if) there is agreement that the shift action should be preferred. Conflicts that cannot be silently resolved in this phase are reported, explained, then arbitrarily resolved in step 2. *) (* ------------------------------------------------------------------------- *) (* Accessors. *) (* This is the type of the automaton's nodes. *) type node module Node : Set.OrderedType with type t = node module NodeSet : sig include Set.S with type elt = node val leq_join: t -> t -> t val print: t -> string end module NodeMap : Map.S with type key = node module ImperativeNodeMap : Fix.MINIMAL_IMPERATIVE_MAPS with type key = node (* These are the automaton's entry states, indexed by the start productions. *) val entry: node ProductionMap.t (* [fold_entry] folds over [entry]. For convenience, it gives access not only to the start production and start state, but also to the nonterminal symbol and to the OCaml type associated with this production. *) val fold_entry: (Production.index -> node -> Nonterminal.t -> Stretch.ocamltype -> 'a -> 'a) -> 'a -> 'a (* [entry_of_nt] maps a (user) non-terminal start symbol to the corresponding start state. [nt_of_entry] does the reverse. *) val entry_of_nt: Nonterminal.t -> node val nt_of_entry: node -> Nonterminal.t (* Nodes are numbered sequentially from [0] to [n-1]. *) val n: int val number: node -> int (* A state is printed simply as its number. *) val print: node -> string (* This provides access to the LR(1) state that a node stands for. *) val state: node -> Lr0.lr1state (* This converts a start node into the single item that it contains. *) val start2item: node -> Item.t (* This maps a node to its incoming symbol, that is, the symbol carried by all of the edges that enter this node. A node has zero incoming edges (and, thus, no incoming symbol) if and only if it is a start node. *) val incoming_symbol: node -> Symbol.t option (* [is_start s] determines whether [s] is an initial state. *) val is_start: node -> bool (* With each start production [S' -> S], exactly two states are associated: a start state, which contains the item [S' -> . S [#]], and an exit state, which contains the item [S' -> S . [#]]. *) (* [is_start_or_exit node] determines whether [node] is one of these two states and, if so, returns the corresponding start symbol [S]. *) val is_start_or_exit: node -> Nonterminal.t option (* This maps a node to its predecessors. *) val predecessors: node -> node list (* A view of the forward edges as a graph. *) module ForwardEdges : sig type nonrec node = node type label = Symbol.t val foreach_outgoing_edge: node -> (label -> node -> unit) -> unit end (* A view of the backward (reverse) edges as a graph. *) module BackwardEdges : sig type nonrec node = node type label = unit val foreach_outgoing_edge: node -> (label -> node -> unit) -> unit end (* This provides access to a node's transitions and reductions. *) val transitions: node -> node SymbolMap.t val reductions: node -> Production.index list TerminalMap.t (* or: node -> Lr0.reductions *) (* (New as of 2012/01/23.) 
This tells whether a shift/reduce conflict in this node was solved in favor of neither (%nonassoc). This implies that one must forbid a default reduction at this node. *) val forbid_default_reduction: node -> bool (* [has_beforeend s] tests whether the state [s] can reduce a production whose semantic action uses [$endpos($0)]. Note that [$startpos] and [$endpos] have been expanded away already, so we need not worry about the fact that (in an epsilon production) they expand to [$endpos($0)]. *) val has_beforeend: node -> bool (* Computing which terminal symbols a state is willing to act upon. This function is currently unused, but could be used as part of an error reporting system. *) val acceptable_tokens: node -> TerminalSet.t (* Iteration over all nodes. The order in which elements are examined, and the order of [map]'s output list, correspond to the numeric indices produced by [number] above. *) val fold: ('a -> node -> 'a) -> 'a -> 'a val iter: (node -> unit) -> unit val map: (node -> 'a) -> 'a list (* Tabulation and sum of a function over nodes. *) val tabulate: (node -> 'a) -> (node -> 'a) val sum: (node -> int) -> int (* Iteration over non-start nodes *) val foldx: ('a -> node -> 'a) -> 'a -> 'a val iterx: (node -> unit) -> unit (* Iteration over all edges that carry a certain symbol. Edges are grouped in families, where all edges in a single family have the same target node. [targets f accu symbol] invokes [f accu sources target] once for every family, where [sources] are the sources of the edges in the family and [target] is their common target. *) val targets: ('a -> node list -> node -> 'a) -> 'a -> Symbol.t -> 'a (* [all_sources symbol] is the set of all sources of edges labeled with [symbol]. *) val all_sources: Symbol.t -> NodeSet.t (* [all_targets symbol] is the set of all targets of edges labeled with [symbol]. *) val all_targets: Symbol.t -> NodeSet.t (* Iteration over all nodes with conflicts. [conflicts f] invokes [f toks node] once for every node [node] with a conflict, where [toks] are the tokens involved in the conflicts at that node. If this function is invoked after conflicts have been resolved, then no conflicts are reported. *) val conflicts: (TerminalSet.t -> node -> unit) -> unit (* [conflict_tokens node] returns the set of tokens where [node] has a conflict. If this function is invoked after conflicts have been resolved, then no conflict tokens are reported. *) val conflict_tokens: node -> TerminalSet.t (* [has_eos_conflict node] indicates whether [node] has an end-of-stream conflict. If so, the list of productions and the lookahead tokens that are involved are returned. If this function is invoked after conflicts have been resolved, then no end-of-stream conflicts are reported. *) val has_eos_conflict: node -> (Production.index list * TerminalSet.t) option (* ------------------------------------------------------------------------- *) (* Modifications of the automaton. *) (* This function performs default conflict resolution. First, it resolves standard (shift/reduce and reduce/reduce) conflicts (thus ensuring that the automaton is deterministic) by removing some reduction actions. Second, it resolves end-of-stream conflicts by ensuring that states that have a reduce action at the pseudo-token "#" have no other action. It is called after conflicts have been explained and before code generation takes place. The automaton is modified in place. 
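Thus, on the driver's side, the expected order of operations is roughly: explain the conflicts, then call [default_conflict_resolution()], and finally call [extra_reductions()].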
*) val default_conflict_resolution: unit -> unit (* This function adds extra reduction actions in the face of an error, if requested by the user via [%on_error_reduce]. *) (* It must be called after conflict resolution has taken place. The automaton is modified in place. *) (* If a state can reduce only one production, whose left-hand symbol has been declared [%on_error_reduce], then every error action in this state is replaced with a reduction action. This is done even though this state may have outgoing shift transitions: thus, we are forcing one interpretation of the past, among several possible interpretations. *) val extra_reductions: unit -> unit (* ------------------------------------------------------------------------- *) (* Information about which productions are reduced and where. *) (* [production_where prod] is the set of all states [s] where production [prod] might be reduced. It is an error to call this function before default conflict resolution has taken place. *) val production_where: Production.index -> NodeSet.t menhir-20210929/src/lr1partial.ml000066400000000000000000000200741412503066000164420ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar exception Oops module Run (X : sig (* A restricted set of tokens of interest. *) val tokens: TerminalSet.t (* A state of the (merged) LR(1) automaton that we're trying to simulate. *) val goal: Lr1.node end) = struct (* First, let's restrict our interest to the nodes of the merged LR(1) automaton that can reach the goal node. Some experiments show that this can involve one tenth to one half of all nodes. This optimization seems minor, but is easy to implement. *) let relevant : Lr1.node -> bool = let module G = struct include Lr1.BackwardEdges let foreach_root f = f X.goal end in let module M = DFS.MarkArray(Lr1) in let module D = struct let discover _node = () let traverse _source _label _target = () end in let module R = DFS.Run(G)(M)(D) in M.is_marked (* Second, all of the states that we shall consider are restricted to the set of tokens of interest. This is an important idea: by abstracting away some information, we make the construction much faster. *) let restrict = Lr0.restrict X.tokens (* Constructing the automaton. The automaton is represented as a graph. States are never merged -- this is a canonical LR(1) construction! As we go, we record the correspondence between nodes in this automaton and nodes in the merged LR(1) automaton. This allows us to tell when we have reached the desired place. This also allows us not to follow transitions that have already been eliminated, in the merged automaton, via resolution of shift/reduce conflicts. Whenever we follow a transition in the canonical LR(1) automaton, we check that the corresponding transition is legal in the merged LR(1) automaton. The automaton is explored breadth-first and shortest paths from every node to one of the start nodes are recorded. *) type node = { state: Lr0.lr1state; ancestor: (Symbol.t * node) option; shadow: Lr1.node; } (* A queue of pending nodes, whose successors should be explored. 
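The code below follows the usual worklist discipline; as a purely illustrative sketch, where [root], [successors], and [visit] are invented names:

      Queue.add root queue;
      Misc.qiter (fun node -> List.iter visit (successors node)) queue

   Processing a node may enqueue newly created nodes, and [Misc.qiter] keeps going until the queue becomes empty.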
*) let queue : node Queue.t = Queue.create() (* Mapping of LR(0) state numbers to lists of nodes. *) let map : node list array = Array.make Lr0.n [] (* Exploring a state. This creates a new node, if necessary, and enqueues it for further exploration. *) exception Goal of node * Terminal.t let explore ancestor shadow (state : Lr0.lr1state) : unit = (* Find all existing nodes that share the same LR(0) core. *) let k = Lr0.core state in assert (k < Lr0.n); let similar = map.(k) in (* Check whether one of these nodes coincides with the candidate new node. If so, stop. This check requires comparing not only the states of the partial, canonical automaton, but also their shadows in the full, merged automaton. This is because a single state of the canonical automaton may be reached along several different paths, leading to distinct shadows in the merged automaton, and we must explore all of these paths in order to ensure that we eventually find a goal node. *) if not (List.exists (fun node -> Lr0.equal state node.state && shadow == node.shadow ) similar) then begin (* Otherwise, create a new node. *) let node = { state = state; ancestor = ancestor; shadow = shadow; } in map.(k) <- node :: similar; Queue.add node queue; (* Check whether this is a goal node. A node [N] is a goal node if (i) [N] has a conflict involving one of the tokens of interest and (ii) [N] corresponds to the goal node, that is, the path that leads to [N] in the canonical LR(1) automaton leads to the goal node in the merged LR(1) automaton. Note that these conditions do not uniquely define [N]. *) if shadow == X.goal then let can_reduce = ref TerminalSet.empty in let reductions1 : Production.index list TerminalMap.t = Lr1.reductions shadow in List.iter (fun (toks, prod) -> TerminalSet.iter (fun tok -> (* We are looking at a [(tok, prod)] pair -- a reduction in the canonical automaton state. *) (* Check that this reduction, which exists in the canonical automaton state, also exists in the merged automaton -- that is, it wasn't suppressed by conflict resolution. *) if List.mem prod (TerminalMap.lookup tok reductions1) then try let (_ : Lr1.node) = SymbolMap.find (Symbol.T tok) (Lr1.transitions shadow) in (* Shift/reduce conflict. *) raise (Goal (node, tok)) with Not_found -> let toks = !can_reduce in (* We rely on the property that [TerminalSet.add tok toks] preserves physical equality when [tok] is a member of [toks]. *) let toks' = TerminalSet.add tok toks in if toks == toks' then (* Reduce/reduce conflict. *) raise (Goal (node, tok)) else (* No conflict so far. *) can_reduce := toks' ) toks ) (Lr0.reductions state) end (* Populate the queue with the start nodes. Until we find a goal node, take a node out of the queue, construct the nodes that correspond to its successors, and enqueue them. *) let goal, token = try ProductionMap.iter (fun (prod : Production.index) (k : Lr0.node) -> let shadow = try ProductionMap.find prod Lr1.entry with Not_found -> assert false in if relevant shadow then explore None shadow (restrict (Lr0.start k)) ) Lr0.entry; Misc.qiter (fun node -> SymbolMap.iter (fun symbol state -> try let shadow = SymbolMap.find symbol (Lr1.transitions node.shadow) in if relevant shadow then explore (Some (symbol, node)) shadow (restrict state) with Not_found -> (* No shadow. This can happen if a shift/reduce conflict was resolved in favor of reduce. Ignore that transition. *) () ) (Lr0.transitions node.state) ) queue; (* We didn't find a goal node. This shouldn't happen!
If the goal node in the merged LR(1) automaton has a conflict, then there should exist a node with a conflict in the canonical automaton as well. Otherwise, Pager's construction is incorrect. *) raise Oops with Goal (node, tok) -> node, tok (* Query the goal node that was found about the shortest path from it to one of the entry nodes. *) let source, path = let rec follow path node = match node.ancestor with | None -> Lr1.start2item node.shadow, Array.of_list path | Some (symbol, node) -> follow (symbol :: path) node in follow [] goal let goal = Lr0.export goal.state end menhir-20210929/src/lr1partial.mli000066400000000000000000000041351412503066000166130ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar (* This exception is raised by [Run] if we fail to reach the goal state. This is known to happen in a few pathological cases (e.g., when a shift/reduce conflict is solved in favor of reduction, the only path towards the goal state may disappear). So we report this situation gracefully in the .conflicts file instead of failing abruptly. *) exception Oops module Run (X : sig (* A restricted set of tokens of interest. *) val tokens: TerminalSet.t (* A state of the (merged) LR(1) automaton that we're trying to simulate. *) val goal: Lr1.node end) : sig (* What we are after is a path, in the canonical LR(1) automaton, that leads from some entry node to a node [N] such that (i) [N] has a conflict involving one of the tokens of interest and (ii) [N] corresponds to the goal node, that is, the path that leads to [N] in the canonical LR(1) automaton leads to the goal node in the merged LR(1) automaton. *) val source: Item.t val path: Symbol.t array val goal: Lr0.concretelr1state (* An (arbitrarily chosen) conflict token in the goal state. *) val token: Terminal.t end menhir-20210929/src/main.ml000066400000000000000000000020661412503066000153140ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The main program. *) (* Everything is in [Back]. *) module B = Back (* artificial dependency *) menhir-20210929/src/mark.ml000066400000000000000000000024121412503066000153150ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (** This module implements a very simple notion of ``mark''. 
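As a quick, hypothetical illustration of the interface:

      let m1 = fresh() and m2 = fresh() in
      assert (same m1 m1);
      assert (not (same m1 m2))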
A mark is really a reference cell (without content). Creating a new mark requires allocating a new cell, and comparing marks requires comparing pointers. *) type t = unit ref let fresh = ref let same = (==) let none = fresh() menhir-20210929/src/mark.mli000066400000000000000000000027031412503066000154710ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (** This module implements a very simple notion of ``mark''. *) (** The type of marks. *) type t (** [fresh()] generates a fresh mark, that is, a mark that is guaranteed to be distinct from all existing marks. *) val fresh: unit -> t (** [same mark1 mark2] tells whether [mark1] and [mark2] are the same mark, that is, were created by the same call to [fresh]. *) val same: t -> t -> bool (** [none] is a distinguished mark, created via an initial call to [fresh()]. *) val none: t menhir-20210929/src/middle.ml000066400000000000000000000031661412503066000156300ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar module Run () = struct (* -------------------------------------------------------------------------- *) (* If [--random-sentence] was specified on the command line, obey it. *) let () = Settings.random_sentence |> Option.iter begin fun (nt, goal, style) -> match Nonterminal.lookup nt with | exception Not_found -> Error.error [] "the nonterminal symbol %s does not exist." nt | nt -> let sentence = RandomSentenceGenerator.nonterminal nt goal in print_endline (Sentence.print style (Some nt, sentence)); exit 0 end (* -------------------------------------------------------------------------- *) end (* Run *) menhir-20210929/src/middle.mli000066400000000000000000000021661412503066000160000ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The side effects in this module are executed after the grammar has been read and analyzed, but before the automaton is constructed. *) module Run () : sig end menhir-20210929/src/misc.ml000066400000000000000000000237711412503066000153310ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. 
All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) let single = function | [ x ] -> x | _ -> assert false let rec mapd f = function | [] -> [] | x :: xs -> let y1, y2 = f x in y1 :: y2 :: mapd f xs let tabulate n f = let a = Array.init n f in Array.get a let sum n (f : int -> int) : int = let sum = ref 0 in for x = 0 to n - 1 do sum := !sum + f x done; !sum let with_buffer n f = let b = Buffer.create n in f b; Buffer.contents b type 'a iter = ('a -> unit) -> unit let separated_iter_to_string printer separator iter = with_buffer 32 (fun b -> let first = ref true in iter (fun x -> if !first then begin Buffer.add_string b (printer x); first := false end else begin Buffer.add_string b separator; Buffer.add_string b (printer x) end ) ) let separated_list_to_string printer separator xs = separated_iter_to_string printer separator (fun f -> List.iter f xs) let inverse (a : 'a array) : 'a -> int = let table = Hashtbl.create (Array.length a) in Array.iteri (fun i data -> assert (not (Hashtbl.mem table data)); Hashtbl.add table data i ) a; fun data -> try Hashtbl.find table data with Not_found -> assert false let support_assoc l x = try List.assoc x l with Not_found -> x (* Turning an implicit list, stored using pointers through a hash table, into an explicit list. The head of the implicit list is not included in the explicit list. *) let materialize (table : ('a, 'a option) Hashtbl.t) (x : 'a) : 'a list = let rec loop x = match Hashtbl.find table x with | None -> [] | Some x -> x :: loop x in loop x (* [iteri] implements a [for] loop over integers, from 0 to [n-1]. *) let iterij i j f = for x = i to j - 1 do f x done let iteri n f = iterij 0 n f let rec foldij i j f accu = if i < j then foldij (i + 1) j f (f i accu) else accu let foldi n f accu = foldij 0 n f accu let rec foldij_lazy i j f accu = if i < j then f i (fun () -> foldij_lazy (i + 1) j f accu) else accu (* [mapij start n f] produces the list [ f start; ... f (n-1) ]. *) let mapij start n f = List.rev ( foldij start n (fun i accu -> f i :: accu ) [] ) (* [mapi n f] produces the list [ f 0; ... f (n-1) ]. *) let mapi n f = mapij 0 n f (* [qfold f accu q] repeatedly takes an element [x] off the queue [q] and applies [f] to the accumulator and to [x], until [q] becomes empty. Of course, [f] can add elements to [q] as a side-effect. We allocate an option to ensure that [qfold] is tail-recursive. *) let rec qfold f accu q = match try Some (Queue.take q) with Queue.Empty -> None with | Some x -> qfold f (f accu x) q | None -> accu (* [qiter f q] repeatedly takes an element [x] off the queue [q] and applies [f] to [x], until [q] becomes empty. Of course, [f] can add elements to [q] as a side-effect. 
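As an illustrative sketch, assuming a queue of integers, the following drains the queue while pushing further work onto it:

      let q = Queue.create() in
      Queue.add 0 q;
      qiter (fun x -> if x < 3 then Queue.add (x + 1) q) q

   Here the elements 0, 1, 2, 3 are processed in order, after which [q] is empty.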
*) let qiter f q = try while true do f (Queue.take q) done with Queue.Empty -> () let rec smap f = function | [] -> [] | (x :: xs) as l -> let x' = f x and xs' = smap f xs in if x == x' && xs == xs' then l else x' :: xs' let rec smapa f accu = function | [] -> accu, [] | (x :: xs) as l -> let accu, x' = f accu x in let accu, xs' = smapa f accu xs in accu, if x == x' && xs == xs' then l else x' :: xs' let normalize s = let s = Bytes.of_string s in let n = Bytes.length s in for i = 0 to n - 1 do match Bytes.get s i with | '(' | ')' | ',' -> Bytes.set s i '_' | _ -> () done; Bytes.unsafe_to_string s (* [postincrement r] increments [r] and returns its original value. *) let postincrement r = let x = !r in r := x + 1; x (* [filter_map f l] returns the list of [y]s such that [f x = Some y] where [x] is in [l], preserving the order of elements of [l]. *) let filter_map f l = List.(rev (fold_left (fun ys x -> match f x with | None -> ys | Some y -> y :: ys ) [] l)) let new_encode_decode capacity = (* Set up a hash table, mapping strings to unique integers. *) let module H = Hashtbl.Make(struct type t = string let equal = (=) let hash = Hashtbl.hash end) in let table = H.create capacity in (* Set up a resizable array, mapping integers to strings. *) let text = MenhirLib.InfiniteArray.make "" in (* This counts the calls to [encode]. *) let c = ref 0 in (* A string is mapped to a unique integer, as follows. *) let encode (s : string) : int = c := !c + 1; try H.find table s with Not_found -> (* The number of elements in the hash table is the next available unique integer code. *) let i = H.length table in H.add table s i; MenhirLib.InfiniteArray.set text i s; i (* An integer code can be mapped back to a string, as follows. *) and decode (i : int) : string = MenhirLib.InfiniteArray.get text i and verbose () = Printf.fprintf stderr "%d calls to intern; %d unique strings.\n%!" !c (H.length table) in encode, decode, verbose let rec best (preferable : 'a -> 'a -> bool) (xs : 'a list) : 'a option = match xs with | [] -> (* Special case: no elements at all, so no best element. This case does not participate in the recursion. *) None | [x] -> Some x | x :: xs -> (* If [x] is preferable to every element of [xs], then it is the best element of [x :: xs]. *) if List.for_all (preferable x) xs then Some x else (* [xs] is nonempty, so the recursive call is permitted. *) match best preferable xs with | Some y -> if preferable y x then (* If [y] is the best element of [xs] and [y] is preferable to [x], then [y] is the best element of [x :: xs]. *) Some y else (* There is no best element. *) None | None -> (* There is no best element. *) None let rec levels1 cmp x1 xs = match xs with | [] -> [x1], [] | x2 :: xs -> let ys1, yss = levels1 cmp x2 xs in if cmp x1 x2 = 0 then x1 :: ys1, yss else [x1], ys1 :: yss let levels cmp xs = match xs with | [] -> [] | x1 :: xs -> let ys1, yss = levels1 cmp x1 xs in ys1 :: yss (* Suppose [ys] is a list of elements that are pairwise incomparable with respect to the partial order [<=], and [x] is a new element. Then, [insert (<=) x ys] is the list obtained by inserting [x] and removing any non-maximal elements; so it is again a list of pairwise incomparable elements. *) let insert (<=) x ys = (* If [x] is subsumed by some element [y] of [ys], then there is nothing to do. In particular, no element [y] of [ys] can be subsumed by [x], since the elements of [ys] are pairwise incomparable.
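As an aside, here is a hypothetical illustration of the net effect of iterated insertion, that is, of [trim] below: under the divisibility ordering, where [x <= y] is taken to mean that [x] divides [y],

      trim (fun x y -> y mod x = 0) [2; 4; 3; 8; 6]

   yields [8; 6], since every other element divides 8 or 6.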
*) if List.exists (fun y -> x <= y) ys then ys (* Or [x] must be inserted, and any element [y] of [ys] that is subsumed by [x] must be removed. *) else x :: List.filter (fun y -> not (y <= x)) ys (* Suppose [xs] is an arbitrary list of elements. Then [trim (<=) xs] is the sublist of the elements of [xs] that are maximal with respect to the partial order [<=]. In other words, it is a sublist where every element that is less than some other element has been removed. *) (* One might wish to define [trim] using [List.filter] to keep just the maximal elements, but it is not so easy to say "keep an element only if it is not subsumed by some *other* element of the list". Instead, we iterate [insert]. *) let trim (<=) xs = List.fold_right (insert (<=)) xs [] let rec dup1 cmp x ys = match ys with | [] -> None | y :: ys -> if cmp x y = 0 then Some x else dup1 cmp y ys let dup cmp xs = match xs with | [] -> None | x :: xs -> dup1 cmp x xs let once x y = let s = ref x in fun () -> let result = !s in s := y; result module ListExtras = struct let rec equal (=) xs ys = match xs, ys with | [], [] -> true | x :: xs, y :: ys -> x = y && equal (=) xs ys | _ :: _, [] | [], _ :: _ -> false let hash hash xs = Hashtbl.hash (List.map hash xs) end let nth = function | 1 -> "first" | 2 -> "second" | 3 -> "third" | i -> Printf.sprintf "%dth" i let count = function | 1 -> "one" | 2 -> "two" | 3 -> "three" | i -> Printf.sprintf "%d" i let rec list_make n x = if n = 0 then [] else x :: list_make (n - 1) x (* [digits n] computes how many decimal digits are involved in the decimal representation of the integer [n]. *) let digits n = let rec loop accu n = if n < 10 then accu + 1 else loop (accu + 1) (n / 10) in loop 0 n (* [pad n s] pads the string [s] with zeroes in front so that its length is [n]. *) let pad n s = String.make (n - String.length s) '0' ^ s let padded_index n i = pad (digits n) (Printf.sprintf "%d" i) menhir-20210929/src/misc.mli000066400000000000000000000172571412503066000155040ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* Projection out of a singleton list. *) val single: 'a list -> 'a (* A variant of [List.map] where [f] returns a pair of elements, to be flattened into the new list. *) val mapd: ('a -> 'b * 'b) -> 'a list -> 'b list (* Tabulating a function using an internal array. [tabulate n f] returns a function that is extensionally equal to [f], but relies on an internal array. Arguments to [f] are of type [int] and are supposed to lie in the range [0..n). *) val tabulate: int -> (int -> 'a) -> (int -> 'a) (* [sum n f] computes the sum [f 0 + f 1 + ... + f (n-1)]. *) val sum: int -> (int -> int) -> int (* [with_buffer n f] creates a fresh buffer of size [n], passes it to [f], and returns the final content of the buffer. *) val with_buffer: int -> (Buffer.t -> unit) -> string (* [separated_list_to_string printer sep l] converts [l] into a string representation built by using [printer] on each element and [sep] as a separator. 
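For instance, hypothetically,

      separated_list_to_string string_of_int "; " [1; 2; 3]

   evaluates to the string "1; 2; 3".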
*) type 'a iter = ('a -> unit) -> unit val separated_iter_to_string: ('a -> string) -> string -> 'a iter -> string val separated_list_to_string: ('a -> string) -> string -> 'a list -> string (* If [a] is an array, that is, a mapping of integers to elements, then [inverse a] computes its inverse, a mapping of elements to integers. The type ['a] of elements must support the use of OCaml's generic equality and hashing functions. *) val inverse: 'a array -> ('a -> int) (* [support_assoc l x] returns the second component of the first pair in [l] whose first component is [x]. If it does not exist, it returns [x]. *) val support_assoc : ('a * 'a) list -> 'a -> 'a (* Turning an implicit list, stored using pointers through a hash table, into an explicit list. The head of the implicit list is not included in the explicit list. *) val materialize: ('a, 'a option) Hashtbl.t -> 'a -> 'a list (* [iteri n f] implements a [for] loop over integers, from 0 to [n-1]; [iterij i j f] is analogous, but iterates over the semi-open interval [i, j). *) val iteri: int -> (int -> unit) -> unit val iterij: int -> int -> (int -> unit) -> unit (* [foldij i j f accu] iterates on the semi-open interval [i, j), with an accumulator. [foldij_lazy i j f accu] is analogous, but is interruptible: if at some point the function [f] does not demand its second argument, then iteration stops early. [foldij] and [foldij_lazy] iterate in the same direction, from left to right, but do not build the accumulator in the same way: the calls to [f] are associated differently. (In that respect, [foldij] is a left fold, while [foldij_lazy] is a right fold.) *) (* [foldi] implements a [for] loop over integers, from 0 to [n-1], with an accumulator. *) val foldi: int -> (int -> 'a -> 'a) -> 'a -> 'a val foldij: int -> int -> (int -> 'a -> 'a) -> 'a -> 'a val foldij_lazy: int -> int -> (int -> (unit -> 'a) -> 'a) -> 'a -> 'a (* [mapij start n f] produces the list [ f start; ... f (n-1) ]. *) val mapij: int -> int -> (int -> 'a) -> 'a list (* [mapi n f] produces the list [ f 0; ... f (n-1) ]. *) val mapi: int -> (int -> 'a) -> 'a list (* [qfold f accu q] repeatedly takes an element [x] off the queue [q] and applies [f] to the accumulator and to [x], until [q] becomes empty. Of course, [f] can add elements to [q] as a side-effect. *) val qfold: ('a -> 'b -> 'a) -> 'a -> 'b Queue.t -> 'a (* [qiter f q] repeatedly takes an element [x] off the queue [q] and applies [f] to [x], until [q] becomes empty. Of course, [f] can add elements to [q] as a side-effect. *) val qiter: ('b -> unit) -> 'b Queue.t -> unit (* [smap] has the same semantics as [List.map], but attempts to physically return the input list when [f] is the identity. *) val smap: ('a -> 'a) -> 'a list -> 'a list (* [smapa] is a variant of [smap] that maintains an accumulator. *) val smapa: ('b -> 'a -> 'b * 'a) -> 'b -> 'a list -> 'b * 'a list (* [normalize s] returns a copy of [s] where parentheses and commas are replaced with underscores. *) val normalize: string -> string (* [postincrement r] increments [r] and returns its original value. *) val postincrement: int ref -> int (* [filter_map f l] returns the list of [y]s such that [f x = Some y] where [x] is in [l], preserving the order of elements of [l]. *) val filter_map : ('a -> 'b option) -> 'a list -> 'b list (* [new_encode_decode capacity] creates a new service for assigning unique integer codes to strings. [capacity] is the initial capacity of the internal hash table.
[new_encode_decode] returns a triple [encode, decode, verbose], where [encode] and [decode] translate between strings and unique integer codes and [verbose] prints statistics about the use of the service so far. *) val new_encode_decode: int -> (string -> int) * (int -> string) * (unit -> unit) (* If [preferable] is a partial order on elements, then [best preferable xs] returns the best (least) element of [xs], if there is one. Its complexity is quadratic. *) val best: ('a -> 'a -> bool) -> 'a list -> 'a option (* Assuming that the list [xs] is sorted with respect to the ordering [cmp], [levels cmp xs] is the list of levels of [xs], where a level is a maximal run of adjacent equal elements. Every level is a nonempty list. *) val levels: ('a -> 'a -> int) -> 'a list -> 'a list list (* Suppose [xs] is an arbitrary list of elements. Then [trim (<=) xs] is the sublist of the elements of [xs] that are maximal with respect to the partial order [<=]. In other words, it is a sublist where every element that is less than some other element has been removed. *) val trim: ('a -> 'a -> bool) -> 'a list -> 'a list (* Assuming that the list [xs] is sorted with respect to the ordering [cmp], [dup cmp xs] returns a duplicate element of the list [xs], if one exists. *) val dup: ('a -> 'a -> int) -> 'a list -> 'a option (* [once x y] produces a function [f] which produces [x] the first time it is called and produces [y] forever thereafter. *) val once: 'a -> 'a -> (unit -> 'a) (* Equality and hashing for lists, parameterized over equality and hashing for elements. *) module ListExtras : sig val equal: ('a -> 'a -> bool) -> 'a list -> 'a list -> bool val hash: ('a -> int) -> 'a list -> int end (* A nice way of printing [n] in English, for concrete values of [n]. *) val count: int -> string (* A nice way of printing "nth" in English, for concrete values of [n]. *) val nth: int -> string (* [List.make] *) val list_make: int -> 'a -> 'a list (* [padded_index n i] produces a padded string representation of the index [i], which must lie in the semi-open interval [0, n). It is defined in such a way that all indices are mapped to strings of equal length. This ensures that alphabetical ordering coincides with numeric ordering. *) val padded_index: int -> int -> string menhir-20210929/src/newRuleSyntax.ml000066400000000000000000000430031412503066000172140ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Syntax (* Because the main function, [NewRuleSyntax.rule], is called by the stage 2 parser (fancy-parser) and nowhere else, this file is type-checked and compiled only at stage 2, not at stage 1. Be sure to run [make bootstrap]. *) (* -------------------------------------------------------------------------- *) (* -------------------------------------------------------------------------- *) (* Function composition. *) let (>>) f g x = g (f x) (* -------------------------------------------------------------------------- *) (* Constructors for the new rule syntax. *) (* [return pos x] is the semantic action [{x}]. 
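For instance, hypothetically, [return pos "v"] stands for the semantic action [{ v }], which simply returns the value of the variable [v].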
*) let return pos x : seq_expression = let action = Action.from_il_expr (IL.EVar x) in let raw_action _ _ = action in Positions.with_pos pos (EAction (XATraditional raw_action, None)) (* -------------------------------------------------------------------------- *) (* -------------------------------------------------------------------------- *) (* Converting the new syntax to the old syntax. *) (* The new syntax is organized in several levels: choice expressions, sequence expressions, symbol expressions, action expressions. The code below reflects this organization. *) (* -------------------------------------------------------------------------- *) (* When a [~] pattern appears at the top level in [~ = e1; e2] and furthermore the expression [e1] is a symbol [x1], then this is considered a pun -- that is, [~] is sugar for [x1]. We resolve these puns in a first pass, before we check that patterns are linear, so a linearity violation that involves [~] will be correctly caught. *) (* There can still remain [~] patterns after puns are resolved, but they stand for fresh variables and cannot cause a linearity violation. *) let rec resolve_puns (e : seq_expression) : seq_expression = Positions.map (fun e -> match e with | ECons (SemPatTilde pos, (ESymbol (x1, [], _) as e1), e2) when ParserAux.valid_ocaml_identifier x1 -> (* This is a pun. Resolve it. *) let x1 = Positions.with_pos pos (Positions.value x1) in (* optional *) ECons (SemPatVar x1, e1, resolve_puns e2) | ECons (p1, e1, e2) -> ECons (p1, e1, resolve_puns e2) | ESingleton _ | EAction _ -> e ) e (* -------------------------------------------------------------------------- *) (* Checking that a new-syntax pattern is linear, i.e., that no variable is bound twice. *) (* We first build a mapping of variables to the places where they are bound, then check that every list in the image of this mapping is a singleton list. *) let check_linearity (ps : pattern list) = let rec build (m : Positions.positions StringMap.t) (p : pattern) = match p with | SemPatVar x -> let x, pos = Positions.decompose x in StringMap.multiple_add x pos m | SemPatWildcard | SemPatTilde _ -> m | SemPatTuple ps -> List.fold_left build m ps in let m = List.fold_left build StringMap.empty ps in StringMap.iter (fun x positions -> if List.length positions > 1 then Error.error positions "The variable %s is bound several times." x ) m let rec patterns (e : seq_expression) : pattern list = let e = Positions.value e in match e with | ECons (p, _, e) -> p :: patterns e | ESingleton _ | EAction _ -> [] let check_linearity : seq_expression -> unit = patterns >> check_linearity (* -------------------------------------------------------------------------- *) (* Determining whether a pattern contains a [~] subpattern. *) let rec tilde_used positions (p : pattern) = match p with | SemPatVar _ | SemPatWildcard -> positions | SemPatTilde pos -> pos :: positions | SemPatTuple ps -> List.fold_left tilde_used positions ps (* Emitting a warning when a [~] subpattern has been used but the sequence expression ends in something other than a point-free semantic action. *) let tilde_used_warning positions = let n = List.length positions in if n > 0 then let variables_have, tpatterns, wpatterns = if n = 1 then "variable has", "a ~ pattern", "a wildcard pattern _" else "variables have", "~ patterns", "wildcard patterns _" in Error.warning positions "%s nameless %s been introduced by %s,\n\ yet this sequence does not end in a point-free semantic action <...>.\n\ Perhaps %s should be used instead." 
(Misc.count n) variables_have tpatterns wpatterns (* -------------------------------------------------------------------------- *) (* Converting a new-syntax pattern to an IL pattern. *) (* Here, [x1] is the variable that holds the semantic value; it is typically named [_1], [_2], etc. When we encounter a [~] pattern, we convert it to a fresh name, using [x1] as a basis in the generation of this fresh name. *) let pattern (x1 : identifier) (p : pattern) : IL.pattern = let c = ref 1 in let fresh () = Printf.sprintf "%s_%d" x1 (Misc.postincrement c) in let rec pattern p = match p with | SemPatVar x -> IL.PVar (Positions.value x) | SemPatWildcard -> IL.PWildcard | SemPatTilde _ -> IL.PVar (fresh()) | SemPatTuple [] -> IL.PUnit | SemPatTuple [p] -> pattern p | SemPatTuple ps -> IL.PTuple (List.map pattern ps) in pattern p (* [bv accu p] accumulates the bound variables of a pattern [p] produced by the above function. The ordering is significant; variables must be accumulated left to right (so we get a reversed list). *) let rec bv accu p = match p with | IL.PVar x -> x :: accu | IL.PWildcard -> accu | IL.PUnit -> accu | IL.PTuple ps -> List.fold_left bv accu ps | _ -> (* Impossible; not produced above. *) assert false (* -------------------------------------------------------------------------- *) (* Extracting the attributes of a symbol expression. *) let attributes (e : symbol_expression) : attributes = match e with | ESymbol (_, _, attrs) -> attrs (* -------------------------------------------------------------------------- *) (* As we descend into a sequence expression and prepare to build a production, we maintain a context of the following form. *) type context = { (* The position of the complete sequence expression. *) pos: Positions.t; (* The prefix of the production's right-hand side that has been built so far. This is a reversed list of producers. Every producer carries an identifier, which is either user-supplied or auto-generated. *) producers: producer list; (* The user-supplied names under which the semantic values are known. Not every semantic value has such a name, as the user can supply no pattern, a wildcard pattern, or a tuple pattern; in any of these cases, there is no name for the semantic value. This is a reversed list. Its length is equal to the length of the list [producers] above. *) uxs: identifier option list; (* A set of (independent) bindings that must be wrapped around the semantic action. These are typically bindings of the form [let p = x in ...]. *) bindings: action -> action; (* A tuple of variables that the user has bound, either explicitly or via the [~] notation. This is a reversed list. Its length is unrelated to the length of the above lists, because one semantic value can be matched against a pattern that binds zero, one, or more variables. Once complete, this tuple becomes the argument to a point-free semantic action. *) tuple: identifier list; (* A list of positions indicating where [~] patterns appear. This list is maintained as we descend into a [seq_expression] whose puns have been resolved already. Thus, when this list is nonempty, we expect that this [seq_expression] ends with a point-free semantic action; otherwise, there is no point in using [~], and the user could have used [_] instead. We issue a warning if the [seq_expression] does not end with a point-free semantic action. *) tilde_used: Positions.positions; } (* The empty context.
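As a hypothetical illustration of how a context evolves, consider the sequence [x = a; ~ = b; <Pair>] after pun resolution: a producer is created for [a] and another for [b], the tuple grows to [x; b], and the final point-free action is compiled to the application [Pair (x, b)].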
*) let empty pos : context = { pos; producers = []; uxs = []; bindings = (fun a -> a); tuple = []; tilde_used = []; } (* Recording a user-supplied identifier. *) let user (x : identifier located) : identifier option = Some (Positions.value x) let auto : identifier option = None (* Accessing the producers. *) let producers context : producer list = List.rev context.producers (* Accessing the user-supplied identifiers. *) let uxs context : identifier option array = Array.of_list (List.rev context.uxs) (* Accessing the tuple. *) let tuple context : identifier list = List.rev context.tuple (* -------------------------------------------------------------------------- *) (* OCaml variables for semantic values. *) (* We do not use a fresh name generator. Instead, we use our [context] to generate names of the form [_1], [_2], etc., corresponding to the current index in the production that we are building. *) let semvar context : identifier = let i = List.length context.producers + 1 in Printf.sprintf "_%d" i (* -------------------------------------------------------------------------- *) (* Converting a symbol expression to a parameter. *) let rec parameter (e : symbol_expression) : parameter = match e with | ESymbol (sym, args, _attrs) -> (* Converting a symbol expression is easy. Note, however, that the arguments [args] are arbitrary expressions. *) Parameters.app sym (List.map nested_parameter args) (* Converting an arbitrary expression to a parameter. *) and nested_parameter (e : expression) : parameter = match Positions.value e with | EChoice [ Branch ({ Positions.value = ESingleton e }, _) ] -> (* A choice with only one branch, whose body is a trivial sequence consisting of just a symbol, is simplified on the fly. This is important, as it allows us to avoid falling into the default case below, where an anonymous rule is generated. E.g., when we have an application [f(x)] of a parameterized symbol [f] to a symbol [x], we don't want an anonymous rule to be generated for [x]. That would be wasteful and (more seriously) could cause the grammar-expansion-termination check to fail. *) parameter e | EChoice _ -> (* A choice expression is converted to an anonymous rule. *) let pos = Positions.position e in ParameterAnonymous (Positions.with_pos pos (productions e)) (* Converting the head of a sequence, a pair [x = e1] of a variable [x] and a symbol expression [e1], to a producer. *) and producer x (e1 : symbol_expression) : producer = x, parameter e1, attributes e1 (* Converting the head of a sequence, a pair [p = e1] of a pattern [p] and a symbol expression [e1], to a context extension. *) and extend (p : pattern) (e1 : symbol_expression) (context : context) : context = match p with | SemPatVar x1 -> (* The variable [x1] is bound to the semantic value of [e1]. *) (* Puns have been resolved already, so they are handled by this code. *) { pos = context.pos; producers = producer x1 e1 :: context.producers; uxs = user x1 :: context.uxs; bindings = context.bindings; tuple = Positions.value x1 :: context.tuple; tilde_used = context.tilde_used } | _ -> (* An arbitrary pattern [p] is used. We bind a variable [x1] to the semantic value of [e1], and wrap the semantic action in a binding [let p = x1 in ...]. Any [~] subpatterns within [p] are translated to fresh identifiers. 
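For instance, hypothetically, if this is the second element of the sequence and [p] is [(x, ~)], then the semantic value of [e1] is named [_2], the pattern is translated to the IL pattern [(x, _2_1)], and the binding [let (x, _2_1) = _2 in ...] is wrapped around the semantic action.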
*) let x1 = semvar context in let tilde_used = tilde_used context.tilde_used p in let p : IL.pattern = pattern x1 p in let binding = Action.bind p x1 in let x1 = Positions.unknown_pos x1 in { pos = context.pos; producers = producer x1 e1 :: context.producers; uxs = auto :: context.uxs; bindings = binding >> context.bindings; tuple = bv context.tuple p; tilde_used } (* Converting a sequence expression to a production. *) and production_aux (context : context) (e : seq_expression) (level : branch_production_level) : parameterized_branch = let e, pos = Positions.decompose e in match e with | ECons (p, e1, e2) -> (* A sequence expression [p = e1; e2]. Based on [p] and [e1], extend the context, then continue with [e2]. *) production_aux (extend p e1 context) e2 level | EAction (XATraditional raw_action, prec) -> (* An action expression. This is the end of the sequence. *) tilde_used_warning context.tilde_used; (* Check that the semantic action seems well-formed. *) let action = raw_action Settings.DollarsDisallowed (uxs context) in (* Build and return a complete production. *) { pr_branch_position = context.pos; pr_producers = producers context; pr_action = context.bindings action; pr_branch_prec_annotation = prec; pr_branch_production_level = level; } | EAction (XAPointFree oid, prec) -> (* A point-free semantic action, containing an OCaml identifier [id] between angle brackets. This is syntactic sugar for a traditional semantic action containing an application of [id] to a tuple of the semantic values that have been assigned a name by the user. *) (* As a special case, if [oid] is [None], then we must not build an application node -- we simply build a tuple. *) (* [id] is actually a stretch, not just a string, and this matters when there is an OCaml error (e.g., [id] is undeclared, or ill-typed). The stretch contains source code location information which allows the error to be reported in the source file. *) (* Build the tuple as an IL expression. *) let evar x = IL.EVar x in let evars xs = List.map evar xs in let tuple = CodeBits.etuple (evars (tuple context)) in (* Build an application of [id] to this tuple. *) (* We abuse the abstract syntax of IL and build an application node, regardless of whether [id] is a (possibly qualified) value, a (possibly qualified) data constructor, a polymorphic variant constructor, etc. *) let e = match oid with | Some id -> IL.EApp (IL.ETextual id, [tuple]) | None -> tuple in (* Build a traditional semantic action. *) let action = Action.from_il_expr e in let raw_action _ _ = action in let e = EAction (XATraditional raw_action, prec) in let e = Positions.with_pos pos e in (* Reset [tilde_used], to avoid triggering the warning via our recursive call. *) let context = { context with tilde_used = [] } in (* Done. *) production_aux context e level | ESingleton e -> tilde_used_warning context.tilde_used; (* When a symbol expression [e] appears as the last element of a sequence, this is understood as syntactic sugar for [x = e; {x}], where [x] is a fresh variable. *) (* Another option would be to view it as sugar for [~ = e; <>]. That would also make sense, but would depart from the lambda-calculus convention that in a sequence [e1; e2; e3] only the value of the last expression is returned. *) (* No %prec annotation can be supplied when this sugar is used. 
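For instance, hypothetically, a sequence such as [FOO; bar], whose last element is the symbol [bar], is treated as if the user had written [FOO; x = bar; { x }] for some fresh [x].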
*) let x = semvar context in let e = ECons (SemPatVar (Positions.with_pos pos x), e, return pos x) in let e = Positions.with_pos pos e in production_aux context e level and production (Branch (e, level) : branch) = let e = resolve_puns e in check_linearity e; let pos = Positions.position e in production_aux (empty pos) e level and productions (e : expression) : parameterized_branch list = match Positions.value e with | EChoice branches -> List.map production branches (* -------------------------------------------------------------------------- *) (* Converting a new rule to an old rule. *) let rule (rule : rule) : parameterized_rule = { pr_public_flag = rule.rule_public; pr_inline_flag = rule.rule_inline; pr_nt = Positions.value rule.rule_lhs; pr_positions = [ Positions.position rule.rule_lhs ]; pr_attributes = rule.rule_attributes; pr_parameters = List.map Positions.value rule.rule_formals; pr_branches = productions rule.rule_rhs } menhir-20210929/src/newRuleSyntax.mli000066400000000000000000000022611412503066000173660ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Syntax (* The new rule syntax is desugared to the old rule syntax. The translation exploits anonymous rules, so it must be performed before anonymous rules are eliminated. *) val rule: rule -> parameterized_rule menhir-20210929/src/nonterminalType.ml000066400000000000000000000100341412503066000175520ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open BasicSyntax open IL (* This is the conventional name of the nonterminal GADT, which describes the nonterminal symbols. *) let tcnonterminalgadt = "nonterminal" let tnonterminalgadt a = TypApp (tcnonterminalgadt, [ a ]) (* This is the conventional name of the data constructors of the nonterminal GADT. *) let tnonterminalgadtdata nt = "N_" ^ Misc.normalize nt (* This is the definition of the nonterminal GADT. Here, the data constructors have no value argument, but have a type index. *) exception MissingOCamlType of string let nonterminalgadtdef grammar = assert Settings.inspection; let comment, datadefs = try (* The ordering of this list matters. We want the data constructors to respect the internal ordering (as determined by [nonterminals] in [BasicSyntax]) of the nonterminal symbols. This may be exploited in the table back-end to allow an unsafe conversion of a data constructor to an integer code. See [n2i] in [InspectionTableInterpreter]. 
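For the sake of illustration, assuming a grammar whose nonterminal symbols [expr] and [stmt] have OCaml types [int] and [unit], the generated definition takes roughly this shape:

      type _ nonterminal =
        | N_expr : int nonterminal
        | N_stmt : unit nonterminal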
*) "The indexed type of nonterminal symbols.", List.map (fun nt -> let index = match ocamltype_of_symbol grammar nt with | Some t -> TypTextual t | None -> raise (MissingOCamlType nt) in { dataname = tnonterminalgadtdata nt; datavalparams = []; datatypeparams = Some [ index ] } ) (nonterminals grammar) with MissingOCamlType nt -> (* If the type of some nonterminal symbol is unknown, give up and define ['a nonterminal] as an abstract type. This is useful when we are in [--(raw)-depend] mode and we do not wish to fail. Instead, we produce a mock [.mli] file that is an approximation of the real [.mli] file. When we are not in [--(raw)-depend] mode, though, this is a problem. We display an error message and stop. *) Settings.(match infer with | IMDependRaw | IMDependPostprocess -> "The indexed type of nonterminal symbols (mock!).", [] | IMNone -> Error.error [] "\ the type of the nonterminal symbol %s is unknown.\n\ When --inspection is set, the type of every nonterminal symbol must be known.\n\ Please enable type inference (see --infer and --infer-read-reply)\n\ or specify the type of every symbol via %%type declarations." nt | IMInfer | IMReadReply _ -> (* This should not happen: when [--infer] or [--infer-read-reply] is set, the types of all nonterminal symbols should be known. *) assert false | IMWriteQuery _ -> (* This should not happen: when [--infer-write-query] is set, we write a mock [.ml] file, but no [.mli] file, so this function should never be called. *) assert false) in [ IIComment comment; IITypeDecls [{ typename = tcnonterminalgadt; typeparams = [ "_" ]; typerhs = TDefSum datadefs; typeconstraint = None }] ] menhir-20210929/src/nonterminalType.mli000066400000000000000000000060621412503066000177310ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module deals with the definition of the type that describes the nonterminal symbols. *) (* This is the conventional name of the [nonterminal] GADT. This is an indexed type (i.e., it has one type parameter). Its data constructors carry zero value arguments. *) val tcnonterminalgadt: string val tnonterminalgadt: IL.typ -> IL.typ (* [tnonterminalgadtdata nt] is the conventional name of the data constructor associated with the non-terminal symbol [nt]. *) val tnonterminalgadtdata: string -> string (* This is the definition of the [nonterminal] GADT, for use by the code generators. This definition can be constructed only if the type of every nonterminal symbol is known, either because the user has provided this information, or because [--infer] has been set and inference has been performed already. This definition is produced only in [--inspection] mode. *) val nonterminalgadtdef: BasicSyntax.grammar -> IL.interface (* When in [--(raw-)depend] mode, we are asked to produce a mock [.mli] file before [--infer] has run, which means that we are usually not able to construct the definition of the [nonterminal] GADT. This implies that the mock [.mli] file is a subset of the final [.mli] file. I believe that, when working with [ocamlbuild], this is not a problem. 
In fact, the mock [.mli] file could just as well be empty or absent, and things would still work: in principle, it is enough for us to publish which files we need in order to be able to type-check the real [.ml] file used by [--infer]. However, when working with [make], which is unable to mix the production of targets and the computation of dependencies, we additionally need to predict which files one will need in order to compile the real [.mli] and [.ml] files. Here, the fact that the mock [.mli] file is incomplete could in theory be a problem, leading to incomplete dependencies. The problem does not lie in the line [parser.ml parser.mli: ...] that we add; it lies in the lines produced by [ocamldep] itself, where the line [parser.cmi: ...] is missing some dependencies. *) menhir-20210929/src/option.ml000066400000000000000000000037271412503066000157050ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) let defined = function | None -> false | Some _ -> true let map f = function | None -> None | Some x -> Some (f x) let iter f o = match o with | None -> () | Some x -> f x let fold f o accu = match o with | None -> accu | Some x -> f x accu let force = function | Some x -> x | None -> assert false let project = function | Some x -> x | None -> (* Presumably, an error message has already been printed. *) exit 1 let equal equal o1 o2 = match o1, o2 with | None, None -> true | Some x1, Some x2 -> equal x1 x2 | None, Some _ | Some _, None -> false let hash hash = function | Some x -> hash x | None -> Hashtbl.hash None let value o ~default = match o with | Some x -> x | None -> default let rec first_value = function | Some e :: _ -> Some e | None :: li -> first_value li | [] -> None let simplify = function None -> None | Some o -> o menhir-20210929/src/option.mli000066400000000000000000000027211412503066000160470ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. 
*) (* *) (******************************************************************************) val defined: 'a option -> bool val map: ('a -> 'b) -> 'a option -> 'b option val iter: ('a -> unit) -> 'a option -> unit val fold: ('a -> 'b -> 'b) -> 'a option -> 'b -> 'b val force: 'a option -> 'a val project: 'a option -> 'a (* careful: calls [exit 1] in case of failure *) val equal: ('a -> 'b -> bool) -> 'a option -> 'b option -> bool val hash: ('a -> int) -> 'a option -> int val value: 'a option -> default:'a -> 'a val first_value: 'a option list -> 'a option val simplify: 'a option option -> 'a optionmenhir-20210929/src/parameters.ml000066400000000000000000000073051412503066000165340ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* TEMPORARY clean up and write an .mli file *) open Syntax open Positions let app p ps = match ps with | [] -> ParameterVar p | _ -> ParameterApp (p, ps) let unapp = function | ParameterVar x -> (x, []) | ParameterApp (p, ps) -> (p, ps) | ParameterAnonymous _ -> (* Anonymous rules are eliminated early on. *) assert false let unvar = function | ParameterVar x -> x | ParameterApp _ | ParameterAnonymous _ -> assert false let rec map f = function | ParameterVar x -> ParameterVar (f x) | ParameterApp (p, ps) -> ParameterApp (f p, List.map (map f) ps) | ParameterAnonymous _ -> (* Anonymous rules are eliminated early on. *) assert false let rec fold f init = function | ParameterVar x -> f init x | ParameterApp (p, ps) -> f (List.fold_left (fold f) init ps) p | ParameterAnonymous _ -> (* Anonymous rules are eliminated early on. *) assert false let identifiers m p = fold (fun accu x -> StringMap.add x.value x.position accu) m p let rec occurs (x : symbol) (p : parameter) = match p with | ParameterVar y -> x = y.value | ParameterApp (y, ps) -> x = y.value || List.exists (occurs x) ps | ParameterAnonymous _ -> assert false let occurs_shallow (x : symbol) (p : parameter) = match p with | ParameterVar y -> x = y.value | ParameterApp (y, _) -> assert (x <> y.value); false | ParameterAnonymous _ -> assert false let occurs_deep (x : symbol) (p : parameter) = match p with | ParameterVar _ -> false | ParameterApp (_, ps) -> List.exists (occurs x) ps | ParameterAnonymous _ -> assert false type t = parameter let rec equal x y = match x, y with | ParameterVar x, ParameterVar y -> x.value = y.value | ParameterApp (p1, p2), ParameterApp (p1', p2') -> p1.value = p1'.value && List.for_all2 equal p2 p2' | _ -> (* Anonymous rules are eliminated early on. *) false let hash = function | ParameterVar x | ParameterApp (x, _) -> Hashtbl.hash (Positions.value x) | ParameterAnonymous _ -> (* Anonymous rules are eliminated early on. 
*) assert false let position = function | ParameterVar x | ParameterApp (x, _) -> Positions.position x | ParameterAnonymous bs -> Positions.position bs let with_pos p = Positions.with_pos (position p) p let rec print with_spaces = function | ParameterVar x | ParameterApp (x, []) -> x.value | ParameterApp (x, ps) -> let separator = if with_spaces then ", " else "," in Printf.sprintf "%s(%s)" x.value (Misc.separated_list_to_string (print with_spaces) separator ps) | ParameterAnonymous _ -> assert false menhir-20210929/src/parserAux.ml000066400000000000000000000114701412503066000163410ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Positions open Stretch open Syntax type early_producer = Positions.t * identifier located option * parameter * attributes type early_producers = early_producer list type early_production = early_producers * string located option * (* optional precedence *) branch_production_level * Positions.t type early_productions = early_production list let new_precedence_level = let c = ref 0 in fun (pos1, pos2) -> incr c; PrecedenceLevel (InputFile.get_input_file (), !c, pos1, pos2) let new_production_level = let c = ref 0 in fun () -> incr c; ProductionLevel (InputFile.get_input_file (), !c) let new_on_error_reduce_level = new_production_level (* the counter is shared with [new_production_level], but this is irrelevant *) module IdSet = Set.Make (struct type t = identifier located let compare id1 id2 = compare (value id1) (value id2) end) let defined_identifiers (_, ido, _, _) accu = Option.fold IdSet.add ido accu let defined_identifiers (producers : early_producers) = List.fold_right defined_identifiers producers IdSet.empty let check_production_group (right_hand_sides : early_productions) = match right_hand_sides with | [] -> (* A production group cannot be empty. *) assert false | (producers, _, _, _) :: right_hand_sides -> let ids = defined_identifiers producers in List.iter (fun (producers, _, _, _) -> let ids' = defined_identifiers producers in try let id = IdSet.choose (IdSet.union (IdSet.diff ids ids') (IdSet.diff ids' ids)) in Error.error [Positions.position id] "two productions that share a semantic action must define exactly\n\ the same identifiers. Here, \"%s\" is defined\n\ in one production, but not in all of them." (Positions.value id) with Not_found -> () ) right_hand_sides (* [normalize_producer i p] assigns a name of the form [_i] to the unnamed producer [p]. *) let normalize_producer i (pos, opt_identifier, parameter, attrs) = let id = match opt_identifier with | Some id -> id | None -> Positions.with_pos pos ("_" ^ string_of_int (i + 1)) in (id, parameter, attrs) let normalize_producers (producers : early_producers) : producer list = List.mapi normalize_producer producers let override pos o1 o2 = match o1, o2 with | Some _, Some _ -> Error.error [ pos ] "this production carries two %%prec declarations." | None, Some _ -> o2 | _, None -> o1 (* Only unnamed producers can be referred to using positional identifiers. Besides, such positions must be taken in the interval [1 .. List.length producers]. 
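For instance (an illustrative case): in a right-hand side such as [e = expr PLUS expr], the first producer is named and must be referred to as [e], while the third, unnamed producer may be referred to as [$3].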
The output array [p] is such that [p.(idx) = Some x] if [idx] must be referred to using [x], not [$(idx + 1)]. *)

let producer_names (producers : early_producers) =
  producers
  |> List.map (fun (_, oid, _, _) -> Option.map Positions.value oid)
  |> Array.of_list

(* Check that a stretch contains an OCaml lowercase or uppercase identifier, and convert this stretch to a string. The stretch may be empty, too. *)

let validate_pointfree_action (ty : ocamltype) : Stretch.t option =
  match ty with
  | Inferred _ ->
      assert false
  | Declared stretch ->
      let s = stretch.stretch_raw_content in
      if Lexpointfree.validate_pointfree_action (Lexing.from_string s)
      then Some stretch
      else None

(* Test whether a string is a valid OCaml lowercase identifier. *)

(* [x] should be a LID, UID, or QID, produced by Menhir's main lexer. Testing its first character should suffice, but we are more cautious and validate it thoroughly. *)

let valid_ocaml_identifier (x : identifier located) : bool =
  Lexpointfree.valid_ocaml_identifier (Lexing.from_string (Positions.value x))
menhir-20210929/src/parserAux.mli000066400000000000000000000102411412503066000165060ustar00rootroot00000000000000(******************************************************************************)
(*                                                                            *)
(*                                   Menhir                                   *)
(*                                                                            *)
(*                       François Pottier, Inria Paris                        *)
(*              Yann Régis-Gianas, PPS, Université Paris Diderot              *)
(*                                                                            *)
(*  Copyright Inria. All rights reserved. This file is distributed under the  *)
(*  terms of the GNU General Public License version 2, as described in the    *)
(*  file LICENSE.                                                             *)
(*                                                                            *)
(******************************************************************************)

(* This module provides utilities that are shared by the two versions of the parser. *)

open Stretch
open Syntax

(* A few types used in the parser. *)

type early_producer =
  Positions.t * identifier located option * parameter * attributes

type early_producers =
  early_producer list

type early_production =
  early_producers * string located option * (* optional precedence *) branch_production_level * Positions.t

type early_productions =
  early_production list

(* [new_precedence_level (pos1, pos2)] creates a new precedence level, which is stronger than any levels previously created by this function. It should be called every time a [%left], [%right], or [%nonassoc] declaration is found. The positions are the positions of this declaration in the source code. The precedence levels created by this function are attached to tokens and (via %prec) to productions. They are used in solving shift/reduce and shift/reduce/reduce conflicts. *)

val new_precedence_level: Lexing.position * Lexing.position -> precedence_level

(* [new_production_level()] creates a new production level, which is stronger than any levels previously created by this function. It should be called every time a new production is found. The production levels created by this function are attached to productions. They are used in solving reduce/reduce conflicts: following ocamlyacc and bison, the production that appears first in the grammar receives preference. It may seem very strange that %prec annotations do not influence this process, but that's how it is, at least for the moment. *)

val new_production_level: unit -> branch_production_level

(* [new_on_error_reduce_level()] creates a new level, which is attached to an [%on_error_reduce] declaration.
*) val new_on_error_reduce_level: unit -> on_error_reduce_level (* [check_production_group] accepts a production group and checks that all productions in the group define the same set of identifiers. *) val check_production_group: early_productions -> unit (* [normalize_producers] accepts a list of producers where identifiers are optional and returns a list of producers where identifiers are mandatory. A missing identifier in the [i]-th position receives the conventional name [_i]. *) val normalize_producers: early_producers -> producer list (* [override pos oprec1 oprec2] decides which of the two optional %prec declarations [oprec1] and [oprec2] applies to a production. It signals an error if the two are present. *) val override: Positions.t -> 'a option -> 'a option -> 'a option (* [producer_names producers] returns an array [names] such that [names.(idx) = None] if the (idx + 1)-th producer is unnamed and [names.(idx) = Some id] if it is called [id]. *) val producer_names: early_producers -> identifier option array (* Check that a stretch represents valid content for a point-free semantic action, i.e., either just whitespace, or an OCaml lowercase or uppercase identifier. May raise [Lexpointfree.InvalidPointFreeAction]. *) val validate_pointfree_action: ocamltype -> Stretch.t option (* Test whether a string is a valid OCaml lowercase identifier. *) val valid_ocaml_identifier: identifier located -> bool menhir-20210929/src/partialGrammar.ml000066400000000000000000000652131412503066000173360ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Syntax open Positions (* ------------------------------------------------------------------------- *) (* This adds one declaration [decl], as found in file [filename], to the grammar [grammar]. *) let join_declaration filename (grammar : grammar) decl = match decl.value with (* Preludes are stored in an arbitrary order. The order of preludes within a single source file is preserved. Same treatment for functor parameters. *) | DCode code -> { grammar with p_preludes = grammar.p_preludes @ [ code ] } | DParameter (Stretch.Declared stretch) -> { grammar with p_parameters = grammar.p_parameters @ [ stretch ] } | DParameter (Stretch.Inferred _) -> assert false (* Token declarations are recorded. Things are made somewhat difficult by the fact that %token and %left-%right-%nonassoc declarations are independent. *) (* If a token carries an alias, it is recorded in the field [tk_alias]. *) | DToken (ocamltype, terminal, alias, attributes) -> let token_property = try (* Retrieve any previous definition for this token. *) let token_property = StringMap.find terminal grammar.p_tokens in (* If the previous definition was actually a %token declaration (as opposed to a %left, %right, or %nonassoc specification), signal an error. *) if token_property.tk_is_declared then Error.errorp decl "the token %s has multiple definitions." terminal; (* Otherwise, update the previous definition. 
*) { token_property with tk_is_declared = true; tk_ocamltype = ocamltype; tk_filename = filename; tk_position = decl.position; tk_attributes = attributes; tk_alias = alias; } with Not_found -> (* If no previous definition exists, create one. *) { tk_filename = filename; tk_ocamltype = ocamltype; tk_associativity = UndefinedAssoc; tk_precedence = UndefinedPrecedence; tk_position = decl.position; tk_attributes = attributes; tk_is_declared = true; tk_alias = alias; } in { grammar with p_tokens = StringMap.add terminal token_property grammar.p_tokens } (* Start symbols. *) | DStart nonterminal -> { grammar with p_start_symbols = StringMap.add nonterminal decl.position grammar.p_start_symbols } (* Type declarations for nonterminals. *) | DType (ocamltype, nonterminal) -> { grammar with p_types = (nonterminal, with_pos (position decl) ocamltype)::grammar.p_types } (* Reductions on error for nonterminals. *) | DOnErrorReduce (nonterminal, prec) -> { grammar with p_on_error_reduce = (nonterminal, prec) :: grammar.p_on_error_reduce } (* Token associativity and precedence. *) | DTokenProperties (terminal, assoc, prec) -> (* Retrieve the property record for this token, creating one if none existed (but without deeming the token to have been declared). *) let token_properties, grammar = try StringMap.find terminal grammar.p_tokens, grammar with Not_found -> let p = { tk_filename = filename; tk_ocamltype = None; tk_associativity = UndefinedAssoc; tk_precedence = prec; tk_is_declared = false; tk_attributes = []; tk_alias = None; (* Will be updated later. *) tk_position = decl.position; } in p, { grammar with p_tokens = StringMap.add terminal p grammar.p_tokens } in (* Reject duplicate precedence declarations. *) if token_properties.tk_associativity <> UndefinedAssoc then Error.error [ decl.position; token_properties.tk_position ] "there are multiple precedence declarations for token %s." terminal; (* Record the new declaration. *) token_properties.tk_precedence <- prec; token_properties.tk_associativity <- assoc; grammar | DGrammarAttribute attr -> { grammar with p_grammar_attributes = attr :: grammar.p_grammar_attributes } | DSymbolAttributes (actuals, attrs) -> { grammar with p_symbol_attributes = (actuals, attrs) :: grammar.p_symbol_attributes } (* ------------------------------------------------------------------------- *) (* This stores an optional postlude into a grammar. Postludes are stored in an arbitrary order. *) let join_postlude postlude grammar = match postlude with | None -> grammar | Some postlude -> { grammar with p_postludes = postlude :: grammar.p_postludes } (* ------------------------------------------------------------------------- *) (* We rewrite definitions when nonterminals are renamed. The renaming [phi] is an association list of names to names. 
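For instance (a hypothetical renaming), with [phi = [("expr", "private_expr")]], every occurrence of the nonterminal [expr] is rewritten to [private_expr]; names outside the domain of [phi] are presumably left unchanged by [Misc.support_assoc].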
*) type renaming = (nonterminal * nonterminal) list let identity_renaming = [] let rewrite_nonterminal (phi : renaming) nonterminal = Misc.support_assoc phi nonterminal let rewrite_parameter phi parameter = Parameters.map (Positions.map (Misc.support_assoc phi)) parameter let rewrite_producer phi ((ido, parameter, attrs) : producer) = ido, rewrite_parameter phi parameter, attrs let rewrite_branch phi ({ pr_producers = producers } as branch) = { branch with pr_producers = List.map (rewrite_producer phi) producers } let rewrite_branches phi branches = match phi with | [] -> branches | _ -> List.map (rewrite_branch phi) branches let fresh_counter = ref 0 let names = ref StringSet.empty let use_name name = names := StringSet.add name !names let used_name name = StringSet.mem name !names let rec fresh ?(hint = "v") () = let name = incr fresh_counter; hint ^ string_of_int !fresh_counter in if used_name name then fresh ~hint () else ( use_name name; name ) (* Alpha conversion of [prule]. We rename bound parameters using fresh names. *) let alphaconvert_rule parameters prule = let phi = List.combine parameters (List.map (fun x -> fresh ~hint:x ()) parameters) in { prule with pr_parameters = List.map (Misc.support_assoc phi) prule.pr_parameters; pr_branches = rewrite_branches phi prule.pr_branches } (* Rewrite a rule taking bound names into account. We rename parameters to avoid capture. *) let rewrite_rule phi prule = let ids = List.fold_left (fun acu (f, d) -> StringSet.add f (StringSet.add d acu)) StringSet.empty phi in let captured_parameters = List.filter (fun p -> StringSet.mem p ids) prule.pr_parameters in let prule = alphaconvert_rule captured_parameters prule in { prule with pr_nt = rewrite_nonterminal phi prule.pr_nt; pr_branches = rewrite_branches phi prule.pr_branches } let rewrite_rules phi rules = List.map (rewrite_rule phi) rules let rewrite_grammar phi grammar = (* We assume that [phi] affects only private symbols, so it does not affect the start symbols. *) if phi = identity_renaming then grammar else { grammar with pg_rules = rewrite_rules phi grammar.pg_rules } (* ------------------------------------------------------------------------- *) (* To rename (internalize) a nonterminal, we prefix it with its filename. This guarantees that names are unique. *) let is_valid_nonterminal_character = function | 'A' .. 'Z' | 'a' .. 'z' | '_' | '\192' .. '\214' | '\216' .. '\246' | '\248' .. '\255' | '0' .. '9' -> true | _ -> false let restrict filename = let m = Bytes.of_string (Filename.chop_suffix filename (if Settings.coq then ".vy" else ".mly")) in for i = 0 to Bytes.length m - 1 do if not (is_valid_nonterminal_character (Bytes.get m i)) then Bytes.set m i '_' done; Bytes.unsafe_to_string m let rename nonterminal filename = let name = restrict filename ^ "_" ^ nonterminal in if used_name name then fresh ~hint:name () else (use_name name; name) (* ------------------------------------------------------------------------- *) type symbol_kind = (* The nonterminal is declared public at a particular position. *) | PublicNonTerminal of Positions.t (* The nonterminal is declared (nonpublic) at a particular position. *) | PrivateNonTerminal of Positions.t (* The symbol is a token. *) | Token of token_properties (* We do not know yet what the symbol means. This is defined in the sequel or it is free in the partial grammar. 
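For instance, a symbol that occurs only in right-hand sides within this unit, and whose definition lives in another .mly file, is recorded as [DontKnow] until the partial grammars are merged.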
*) | DontKnow of Positions.t type symbol_table = (symbol, symbol_kind) Hashtbl.t let find_symbol (symbols : symbol_table) symbol = Hashtbl.find symbols symbol let add_in_symbol_table (symbols : symbol_table) symbol kind = use_name symbol; Hashtbl.add symbols symbol kind; symbols let replace_in_symbol_table (symbols : symbol_table) symbol kind = Hashtbl.replace symbols symbol kind; symbols let empty_symbol_table () : symbol_table = Hashtbl.create 13 let store_symbol (symbols : symbol_table) symbol kind = match find_symbol symbols symbol, kind with (* The symbol is not known so far. Add it. *) | exception Not_found -> add_in_symbol_table symbols symbol kind (* There are two definitions of this symbol in one grammatical unit (that is, one .mly file), and at least one of them is private. This is forbidden. *) | PrivateNonTerminal p, PrivateNonTerminal p' | PublicNonTerminal p, PrivateNonTerminal p' | PrivateNonTerminal p, PublicNonTerminal p' -> Error.error [ p; p'] "the nonterminal symbol %s is multiply defined.\n\ Only %%public symbols can have split definitions." symbol (* The symbol is known to be a token but declared as a nonterminal.*) | Token tkp, (PrivateNonTerminal p | PublicNonTerminal p) | (PrivateNonTerminal p | PublicNonTerminal p), Token tkp -> Error.error [ p; tkp.tk_position ] "the identifier %s is a reference to a token." symbol (* In the following cases, we do not gain any piece of information. As of 2017/03/29, splitting the definition of a %public nonterminal symbol is permitted. (It used to be permitted over multiple units, but forbidden within a single unit.) *) | _, DontKnow _ | Token _, Token _ | PublicNonTerminal _, PublicNonTerminal _ -> symbols (* We learn that the symbol is a nonterminal or a token. *) | DontKnow _, _ -> replace_in_symbol_table symbols symbol kind let store_used_symbol position tokens symbols symbol = let kind = try Token (StringMap.find symbol tokens) with Not_found -> DontKnow position in store_symbol symbols symbol kind let non_terminal_is_not_reserved symbol positions = if symbol = "error" then Error.error positions "%s is reserved and thus cannot be used \ as a non-terminal symbol." symbol let non_terminal_is_not_a_token tokens symbol positions = try let tkp = StringMap.find symbol tokens in Error.error (positions @ [ tkp.tk_position ]) "the identifier %s is a reference to a token." 
symbol
  with Not_found ->
    ()

let store_public_nonterminal tokens symbols symbol positions =
  non_terminal_is_not_reserved symbol positions;
  non_terminal_is_not_a_token tokens symbol positions;
  store_symbol symbols symbol (PublicNonTerminal (List.hd positions))

let store_private_nonterminal tokens symbols symbol positions =
  non_terminal_is_not_reserved symbol positions;
  non_terminal_is_not_a_token tokens symbol positions;
  store_symbol symbols symbol (PrivateNonTerminal (List.hd positions))

(* for debugging, presumably:

let string_of_kind = function
  | PublicNonTerminal p ->
      Printf.sprintf "public (%s)" (Positions.string_of_pos p)
  | PrivateNonTerminal p ->
      Printf.sprintf "private (%s)" (Positions.string_of_pos p)
  | Token tk ->
      Printf.sprintf "token (%s)" tk.tk_filename
  | DontKnow p ->
      Printf.sprintf "only used at (%s)" (Positions.string_of_pos p)

let string_of_symbol_table t =
  let b = Buffer.create 13 in
  let m = 1 + Hashtbl.fold (fun k v acu -> max (String.length k) acu) t 0 in
  let fill_blank s =
    let s' = String.make m ' ' in
    String.blit s 0 s' 0 (String.length s);
    s'
  in
  Hashtbl.iter (fun k v ->
    Buffer.add_string b (Printf.sprintf "%s: %s\n" (fill_blank k) (string_of_kind v))) t;
  Buffer.contents b
*)

let is_private_symbol t x =
  try
    match Hashtbl.find t x with
    | PrivateNonTerminal _ -> true
    | _ -> false
  with Not_found -> false

let fold_on_private_symbols f init t =
  Hashtbl.fold
    (fun k -> function PrivateNonTerminal _ -> (fun acu -> f acu k) | _ -> (fun acu -> acu))
    t init

let fold_on_public_symbols f init t =
  Hashtbl.fold
    (fun k -> function PublicNonTerminal _ -> (fun acu -> f acu k) | _ -> (fun acu -> acu))
    t init

let iter_on_only_used_symbols f t =
  Hashtbl.iter
    (fun k -> function DontKnow pos -> f k pos | _ -> ())
    t

let symbols_of grammar (pgrammar : Syntax.partial_grammar) =
  let tokens = grammar.p_tokens in
  let symbols_of_rule symbols prule =
    let rec store_except_rule_parameters symbols parameter =
      let symbol, parameters = Parameters.unapp parameter in
      (* Process the reference to [symbol]. *)
      let symbols =
        if List.mem symbol.value prule.pr_parameters then
          (* Rule parameters are bound locally, so they are not taken into account. *)
          symbols
        else
          store_used_symbol symbol.position tokens symbols symbol.value
      in
      (* Process the parameters. *)
      List.fold_left store_except_rule_parameters symbols parameters
    in
    (* Analyse each branch. *)
    let symbols =
      List.fold_left (fun symbols branch ->
        List.fold_left (fun symbols (_, p, _) ->
          store_except_rule_parameters symbols p
        ) symbols branch.pr_producers
      ) symbols prule.pr_branches
    in
    (* Store the symbol declaration. *)
    (* A nonterminal symbol is considered public if it is declared using %public or %start. *)
    if prule.pr_public_flag || StringMap.mem prule.pr_nt grammar.p_start_symbols then
      store_public_nonterminal tokens symbols prule.pr_nt prule.pr_positions
    else
      store_private_nonterminal tokens symbols prule.pr_nt prule.pr_positions
  in
  List.fold_left symbols_of_rule (empty_symbol_table ()) pgrammar.pg_rules

let merge_rules symbols pgs =
  (* Retrieve all the public symbols. *)
  let public_symbols =
    List.fold_left (fold_on_public_symbols (fun s k -> StringSet.add k s))
      (StringSet.singleton "error") symbols
  in
  (* We check that the references in each grammar can be bound to a public symbol. *)
  let _ =
    List.iter
      (iter_on_only_used_symbols
        (fun k pos ->
          if not (StringSet.mem k public_symbols) then
            Error.error [ pos ] "%s is undefined." k))
      symbols
  in
  (* Detect private symbol clashes and rename them if necessary.
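For instance (a hypothetical clash), if the units [a.mly] and [b.mly] both define a private symbol [aux], then one of the two occurrences is renamed via [rename] above, which prefixes the symbol with its file name, yielding something like [b_aux].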
*)
  let detect_private_symbol_clashes =
    fold_on_private_symbols
      (fun (defined, clashes) symbol ->
        if StringSet.mem symbol defined || StringSet.mem symbol public_symbols then
          (defined, StringSet.add symbol clashes)
        else
          (StringSet.add symbol defined, clashes))
  in
  let _private_symbols, clashes =
    List.fold_left detect_private_symbol_clashes (StringSet.empty, StringSet.empty) symbols
  in
  let rpgs =
    List.map
      (fun (symbol_table, pg) ->
        let renaming =
          StringSet.fold
            (fun x phi ->
              if is_private_symbol symbol_table x then begin
                let x' = rename x pg.pg_filename in
                Printf.fprintf stderr
                  "Note: the nonterminal symbol %s (from %s) is renamed %s.\n"
                  x pg.pg_filename x';
                (x, x') :: phi
              end
              else phi)
            clashes []
        in
        rewrite_grammar renaming pg)
      pgs
  in
  (* Merge public nonterminal definitions and copy private nonterminal definitions. Since the clashes between private symbols have already been resolved, these copies are safe. *)
  List.fold_left
    (fun rules rpg ->
      List.fold_left
        (fun rules r ->
          let r =
            try
              let r' = StringMap.find r.pr_nt rules in
              let positions = r.pr_positions @ r'.pr_positions in
              let ra, ra' = List.length r.pr_parameters, List.length r'.pr_parameters in
              (* The arity of the parameterized symbols must be constant. *)
              if ra <> ra' then
                Error.error positions
                  "the symbol %s is defined with arities %d and %d." r.pr_nt ra ra'
              else if r.pr_inline_flag <> r'.pr_inline_flag then
                Error.error positions
                  "not all definitions of %s are marked %%inline." r.pr_nt
              else
                (* We combine the different branches. The parameters could have different names; we rename them with the fresh names assigned earlier (see the next comment). *)
                let phi = List.combine r.pr_parameters r'.pr_parameters in
                let rbr = rewrite_branches phi r.pr_branches in
                { r' with
                  pr_positions = positions;
                  pr_branches = rbr @ r'.pr_branches;
                  pr_attributes = r.pr_attributes @ r'.pr_attributes;
                }
            with Not_found ->
              (* We alphaconvert the rule in order to avoid the capture of private symbols coming from another unit. *)
              alphaconvert_rule r.pr_parameters r
          in
          StringMap.add r.pr_nt r rules)
        rules rpg.pg_rules)
    StringMap.empty rpgs

let empty_grammar =
  {
    p_preludes = [];
    p_postludes = [];
    p_parameters = [];
    p_start_symbols = StringMap.empty;
    p_types = [];
    p_tokens = StringMap.empty;
    p_rules = StringMap.empty;
    p_on_error_reduce = [];
    p_grammar_attributes = [];
    p_symbol_attributes = [];
  }

let join grammar pgrammar =
  let filename = pgrammar.pg_filename in
  List.fold_left (join_declaration filename) grammar pgrammar.pg_declarations
  |> join_postlude pgrammar.pg_postlude

(* If a rule is marked %inline, then it must not carry an attribute. *)

let check_inline_attribute prule =
  match prule.pr_inline_flag, prule.pr_attributes with
  | true, (id, _payload) :: _attributes ->
      Error.error
        [Positions.position id]
        "the nonterminal symbol %s is declared %%inline.\n\
         It cannot carry an attribute." prule.pr_nt
  | true, []
  | false, _ ->
      ()

let reserved = [ "error" ]

let check_identifier_reference mark_token_as_used grammar prule is_prec s p =
  if not is_prec && List.mem s prule.pr_parameters then
    (* A parameter of this rule. *)
    ()
  else if not is_prec && List.mem s reserved then
    (* A reserved token. *)
    mark_token_as_used s
  else if not is_prec && StringMap.mem s grammar.p_rules then
    (* A nonterminal symbol. *)
    ()
  else
    match StringMap.find s grammar.p_tokens with
    | prop ->
        (* A token or pseudo-token. Mark it as used. *)
        mark_token_as_used s;
        if not is_prec && not prop.tk_is_declared then
          (* A pseudo-token, declared by %left, %right or %nonassoc, cannot be used as a normal identifier.
It can be used only in a %prec annotation. *)
          Error.error [ p ]
            "The symbol %s has not been declared by %%token,\n\
             so it cannot be used here." s
    | exception Not_found ->
        (* An unknown symbol. *)
        if is_prec then
          Error.error [ p ] "The terminal symbol %s is undefined." s
        else
          Error.error [ p ] "The symbol %s is undefined." s

let check_parameterized_grammar_is_well_defined grammar =
  (* Every start symbol is defined and has a %type declaration. *)
  StringMap.iter
    (fun nonterminal p ->
      if not (StringMap.mem nonterminal grammar.p_rules) then
        Error.error [p] "the start symbol %s is undefined." nonterminal;
      if not (List.exists
                (function
                  | ParameterVar { value = id }, _ -> id = nonterminal
                  | _ -> false)
                grammar.p_types) then
        Error.error [p] "the type of the start symbol %s is unspecified." nonterminal;
    ) grammar.p_start_symbols;
  (* Every %type definition refers to well-defined (terminal or nonterminal) symbols and has, at its head, a nonterminal symbol. *)
  (* Same check for %on_error_reduce definitions. *)
  let rec check (kind : string) (must_be_nonterminal : bool) (p : Syntax.parameter) =
    (* Destructure head and arguments. *)
    let head, ps = Parameters.unapp p in
    let head = value head in
    (* Check if [head] is a nonterminal or terminal symbol. *)
    let is_nonterminal = StringMap.mem head grammar.p_rules
    and is_terminal = StringMap.mem head grammar.p_tokens || List.mem head reserved in
    (* If [head] is not satisfactory, signal an error. *)
    if not (is_terminal || is_nonterminal) then
      Error.error [Parameters.position p] "%s is undefined." head;
    if (must_be_nonterminal && not is_nonterminal) then
      Error.error [Parameters.position p]
        "%s is a terminal symbol,\n\
         but %s declarations are applicable only to nonterminal symbols."
        (Parameters.print true p) kind;
    (* Then, check the arguments. *)
    List.iter (check kind false) ps
  in
  let check_fst kind must_be_nonterminal (p, _) =
    check kind must_be_nonterminal p
  in
  List.iter (check_fst "%type" true) grammar.p_types;
  List.iter (check_fst "%on_error_reduce" true) grammar.p_on_error_reduce;
  List.iter (fun (params, _) ->
    List.iter (check "%attribute" false) params
  ) grammar.p_symbol_attributes;
  (* Every reference to a symbol is well defined. *)
  let used_tokens = ref StringSet.empty in
  let mark_token_as_used token =
    used_tokens := StringSet.add token !used_tokens
  in
  StringMap.iter
    (fun k prule ->
      let check_identifier_reference =
        check_identifier_reference mark_token_as_used grammar prule
      in
      (* The formal parameters of each rule must have distinct names. *)
      prule.pr_parameters
      |> List.sort compare
      |> Misc.dup compare
      |> Option.iter (fun x ->
           Error.error prule.pr_positions
             "several parameters of this rule are named \"%s\"." x
         );
      (* Check each branch. *)
      List.iter
        (fun { pr_producers = producers; pr_branch_prec_annotation; } ->
          ignore
            (List.fold_left
              (* Check the producers. *)
              (fun already_seen (id, p, _) ->
                let symbol, parameters = Parameters.unapp p in
                let s = symbol.value and p = symbol.position in
                let already_seen =
                  (* Check that the producer id is unique. *)
                  if StringSet.mem id.value already_seen then
                    Error.error [ id.position ]
                      "there are multiple producers named %s in this sequence." id.value;
                  StringSet.add id.value already_seen
                in
                (* Check that the producer is defined somewhere.
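That is, it must denote a rule parameter, a reserved token, a nonterminal symbol, or a declared token; [check_identifier_reference] above distinguishes these cases.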
*) check_identifier_reference false s p; StringMap.iter (check_identifier_reference false) (List.fold_left Parameters.identifiers StringMap.empty parameters); already_seen ) StringSet.empty producers); Option.iter (fun terminal -> check_identifier_reference true terminal.value terminal.position ) pr_branch_prec_annotation) prule.pr_branches; (* It is forbidden to use %inline on a %start symbol. *) if (prule.pr_inline_flag && StringMap.mem k grammar.p_start_symbols) then Error.error prule.pr_positions "%s cannot be both a start symbol and inlined." k; (* If a rule is marked %inline, then it must not carry an attribute. *) check_inline_attribute prule ) grammar.p_rules; (* Check that every token is used. *) if not Settings.ignore_all_unused_tokens then begin match Settings.token_type_mode with | Settings.TokenTypeOnly -> () | Settings.TokenTypeAndCode | Settings.CodeOnly _ -> StringMap.iter (fun token { tk_position = p } -> if not (StringSet.mem token !used_tokens || StringSet.mem token Settings.ignored_unused_tokens) then Error.warning [p] "the token %s is unused." token ) grammar.p_tokens end let join_partial_grammars pgs = (* Prior to joining the partial grammars, remove all uses of token aliases. *) let pgs = ExpandTokenAliases.dealias_grammars pgs in (* Join the partial grammars. *) let grammar = List.fold_left join empty_grammar pgs in let symbols = List.map (symbols_of grammar) pgs in let tpgs = List.combine symbols pgs in let rules = merge_rules symbols tpgs in let grammar = { grammar with p_rules = rules } in (* Check well-formedness. *) check_parameterized_grammar_is_well_defined grammar; grammar menhir-20210929/src/partialGrammar.mli000066400000000000000000000020331412503066000174760ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Syntax val join_partial_grammars : partial_grammar list -> grammar menhir-20210929/src/patricia.ml000066400000000000000000000775711412503066000162010ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This is an implementation of Patricia trees, following Chris Okasaki's paper at the 1998 ML Workshop in Baltimore. Both big-endian and little-endian trees are provided. Both sets and maps are implemented on top of Patricia trees. *) (*i --------------------------------------------------------------------------------------------------------------- i*) (*s \mysection{Little-endian vs big-endian trees} *) (* A tree is little-endian if it expects the key's least significant bits to be tested first during a search. It is big-endian if it expects the key's most significant bits to be tested first. 
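For instance (a small worked example), consider the keys 1 (binary 01) and 2 (binary 10): a little-endian tree branches on the least significant bit where they differ, i.e. the mask 1, whereas a big-endian tree branches on the most significant such bit, i.e. the mask 2.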
Most of the code is independent of this design choice, so it is written as a functor, parameterized by a small structure which defines endianness. Here is the interface which must be adhered to by such a structure. *) module Endianness = struct module type S = sig (* A mask is an integer with a single one bit (i.e. a power of 2). *) type mask = int (* [branching_bit] accepts two distinct integers and returns a mask which identifies the first bit where they differ. The meaning of ``first'' varies according to the endianness being implemented. *) val branching_bit: int -> int -> mask (* [mask i m] returns an integer [i'], where all bits which [m] says are relevant are identical to those in [i], and all others are set to some unspecified, but fixed value. Which bits are ``relevant'' according to a given mask varies according to the endianness being implemented. *) val mask: int -> mask -> int (* [shorter m1 m2] returns [true] if and only if [m1] describes a shorter prefix than [m2], i.e. if it makes fewer bits relevant. Which bits are ``relevant'' according to a given mask varies according to the endianness being implemented. *) val shorter: mask -> mask -> bool end (* Now, let us define [Little] and [Big], two possible [Endiannness] choices. *) module Little = struct type mask = int let lowest_bit x = x land (-x) (* Performing a logical ``xor'' of [i0] and [i1] yields a bit field where all differences between [i0] and [i1] show up as one bits. (There must be at least one, since [i0] and [i1] are distinct.) The ``first'' one is the lowest bit in this bit field, since we are checking least significant bits first. *) let branching_bit i0 i1 = lowest_bit (i0 lxor i1) (* The ``relevant'' bits in an integer [i] are those which are found (strictly) to the right of the single one bit in the mask [m]. We keep these bits, and set all others to 0. *) let mask i m = i land (m-1) (* The smaller [m] is, the fewer bits are relevant. *) let shorter = (<) end module Big = struct type mask = int let lowest_bit x = x land (-x) let rec highest_bit x = let m = lowest_bit x in if x = m then m else highest_bit (x - m) (* Performing a logical ``xor'' of [i0] and [i1] yields a bit field where all differences between [i0] and [i1] show up as one bits. (There must be at least one, since [i0] and [i1] are distinct.) The ``first'' one is the highest bit in this bit field, since we are checking most significant bits first. In Okasaki's paper, this loop is sped up by computing a conservative initial guess. Indeed, the bit at which the two prefixes disagree must be somewhere within the shorter prefix, so we can begin searching at the least-significant valid bit in the shorter prefix. Unfortunately, to allow computing the initial guess, the main code has to pass in additional parameters, e.g. a mask which describes the length of each prefix. This ``pollutes'' the endianness-independent code. For this reason, this optimization isn't implemented here. *) let branching_bit i0 i1 = highest_bit (i0 lxor i1) (* The ``relevant'' bits in an integer [i] are those which are found (strictly) to the left of the single one bit in the mask [m]. We keep these bits, and set all others to 0. Okasaki uses a different convention, which allows big-endian Patricia trees to masquerade as binary search trees. This feature does not seem to be useful here. *) let mask i m = i land (lnot (2*m-1)) (* The smaller [m] is, the more bits are relevant. 
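For instance (a worked example), [mask 13 4] computes [13 land (lnot 7)]: of the binary representation 1101, only the bits strictly to the left of the mask bit survive, yielding 8 (binary 1000).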
*) let shorter = (>) end end (*i --------------------------------------------------------------------------------------------------------------- i*) (*s \mysection{Patricia-tree-based maps} *) module Make (X : Endianness.S) = struct (* Patricia trees are maps whose keys are integers. *) type key = int (* A tree is either empty, or a leaf node, containing both the integer key and a piece of data, or a binary node. Each binary node carries two integers. The first one is the longest common prefix of all keys in this sub-tree. The second integer is the branching bit. It is an integer with a single one bit (i.e. a power of 2), which describes the bit being tested at this node. *) type 'a t = | Empty | Leaf of int * 'a | Branch of int * X.mask * 'a t * 'a t (* The empty map. *) let empty = Empty (* [choose m] returns an arbitrarily chosen binding in [m], if [m] is nonempty, and raises [Not_found] otherwise. *) let rec choose = function | Empty -> raise Not_found | Leaf (key, data) -> key, data | Branch (_, _, tree0, _) -> choose tree0 (* [lookup k m] looks up the value associated to the key [k] in the map [m], and raises [Not_found] if no value is bound to [k]. This implementation takes branches \emph{without} checking whether the key matches the prefix found at the current node. This means that a query for a non-existent key shall be detected only when finally reaching a leaf, rather than higher up in the tree. This strategy is better when (most) queries are expected to be successful. *) let rec lookup key = function | Empty -> raise Not_found | Leaf (key', data) -> if key = key' then data else raise Not_found | Branch (_, mask, tree0, tree1) -> lookup key (if (key land mask) = 0 then tree0 else tree1) let find = lookup (* [mem k m] tells whether the key [k] appears in the domain of the map [m]. *) let mem k m = try let _ = lookup k m in true with Not_found -> false (* The auxiliary function [join] merges two trees in the simple case where their prefixes disagree. Assume $t_0$ and $t_1$ are non-empty trees, with longest common prefixes $p_0$ and $p_1$, respectively. Further, suppose that $p_0$ and $p_1$ disagree, that is, neither prefix is contained in the other. Then, no matter how large $t_0$ and $t_1$ are, we can merge them simply by creating a new [Branch] node that has $t_0$ and $t_1$ as children! *) let join p0 t0 p1 t1 = let m = X.branching_bit p0 p1 in let p = X.mask p0 (* for instance *) m in if (p0 land m) = 0 then Branch(p, m, t0, t1) else Branch(p, m, t1, t0) (* The auxiliary function [match_prefix] tells whether a given key has a given prefix. More specifically, [match_prefix k p m] returns [true] if and only if the key [k] has prefix [p] up to bit [m]. Throughout our implementation of Patricia trees, prefixes are assumed to be in normal form, i.e. their irrelevant bits are set to some predictable value. Formally, we assume [X.mask p m] equals [p] whenever [p] is a prefix with [m] relevant bits. This allows implementing [match_prefix] using only one call to [X.mask]. On the other hand, this requires normalizing prefixes, as done e.g. in [join] above, where [X.mask p0 m] has to be used instead of [p0]. *) let match_prefix k p m = X.mask k m = p (* [fine_add decide k d m] returns a map whose bindings are all bindings in [m], plus a binding of the key [k] to the datum [d]. If a binding from [k] to [d0] already exists, then the resulting map contains a binding from [k] to [decide d0 d]. 
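For instance (a hypothetical use), if [m] binds the key 1 to ["a"], then [fine_add (fun d0 d -> d0 ^ d) 1 "b" m] binds 1 to ["ab"], whereas [add 1 "b" m], defined below, simply binds 1 to ["b"].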
*) type 'a decision = 'a -> 'a -> 'a exception Unchanged let basic_add decide k d m = let rec add t = match t with | Empty -> Leaf (k, d) | Leaf (k0, d0) -> if k = k0 then let d' = decide d0 d in if d' == d0 then raise Unchanged else Leaf (k, d') else join k (Leaf (k, d)) k0 t | Branch (p, m, t0, t1) -> if match_prefix k p m then if (k land m) = 0 then Branch (p, m, add t0, t1) else Branch (p, m, t0, add t1) else join k (Leaf (k, d)) p t in add m let strict_add k d m = basic_add (fun _ _ -> raise Unchanged) k d m let fine_add decide k d m = try basic_add decide k d m with Unchanged -> m (* [add k d m] returns a map whose bindings are all bindings in [m], plus a binding of the key [k] to the datum [d]. If a binding already exists for [k], it is overridden. *) let add k d m = fine_add (fun _old_binding new_binding -> new_binding) k d m (* [singleton k d] returns a map whose only binding is from [k] to [d]. *) let singleton k d = Leaf (k, d) (* [is_singleton m] returns [Some (k, d)] if [m] is a singleton map that maps [k] to [d]. Otherwise, it returns [None]. *) let is_singleton = function | Leaf (k, d) -> Some (k, d) | Empty | Branch _ -> None (* [is_empty m] returns [true] if and only if the map [m] defines no bindings at all. *) let is_empty = function | Empty -> true | Leaf _ | Branch _ -> false (* [cardinal m] returns [m]'s cardinal, that is, the number of keys it binds, or, in other words, its domain's cardinal. *) let rec cardinal = function | Empty -> 0 | Leaf _ -> 1 | Branch (_, _, t0, t1) -> cardinal t0 + cardinal t1 (* [remove k m] returns the map [m] deprived from any binding involving [k]. *) let remove key m = let rec remove = function | Empty -> raise Not_found | Leaf (key', _) -> if key = key' then Empty else raise Not_found | Branch (prefix, mask, tree0, tree1) -> if (key land mask) = 0 then match remove tree0 with | Empty -> tree1 | tree0 -> Branch (prefix, mask, tree0, tree1) else match remove tree1 with | Empty -> tree0 | tree1 -> Branch (prefix, mask, tree0, tree1) in try remove m with Not_found -> m (* [lookup_and_remove k m] looks up the value [v] associated to the key [k] in the map [m], and raises [Not_found] if no value is bound to [k]. The call returns the value [v], together with the map [m] deprived from the binding from [k] to [v]. *) let rec lookup_and_remove key = function | Empty -> raise Not_found | Leaf (key', data) -> if key = key' then data, Empty else raise Not_found | Branch (prefix, mask, tree0, tree1) -> if (key land mask) = 0 then match lookup_and_remove key tree0 with | data, Empty -> data, tree1 | data, tree0 -> data, Branch (prefix, mask, tree0, tree1) else match lookup_and_remove key tree1 with | data, Empty -> data, tree0 | data, tree1 -> data, Branch (prefix, mask, tree0, tree1) let find_and_remove = lookup_and_remove (* [fine_union decide m1 m2] returns the union of the maps [m1] and [m2]. If a key [k] is bound to [x1] (resp. [x2]) within [m1] (resp. [m2]), then [decide] is called. It is passed [x1] and [x2], and must return the value which shall be bound to [k] in the final map. The operation returns [m2] itself (as opposed to a copy of it) when its result is equal to [m2]. 
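For instance (a hypothetical use), if [m1] and [m2] both bind the key 1, to [2] and [3] respectively, then [fine_union (+) m1 m2] binds 1 to [5]; and if [m1] is empty, then the result is [m2] itself.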
*) let reverse decision elem1 elem2 = decision elem2 elem1 let fine_union decide m1 m2 = let rec union s t = match s, t with | Empty, _ -> t | (Leaf _ | Branch _), Empty -> s | Leaf(key, value), _ -> fine_add (reverse decide) key value t | Branch _, Leaf(key, value) -> fine_add decide key value s | Branch(p, m, s0, s1), Branch(q, n, t0, t1) -> if (p = q) && (m = n) then (* The trees have the same prefix. Merge their sub-trees. *) let u0 = union s0 t0 and u1 = union s1 t1 in if t0 == u0 && t1 == u1 then t else Branch(p, m, u0, u1) else if (X.shorter m n) && (match_prefix q p m) then (* [q] contains [p]. Merge [t] with a sub-tree of [s]. *) if (q land m) = 0 then Branch(p, m, union s0 t, s1) else Branch(p, m, s0, union s1 t) else if (X.shorter n m) && (match_prefix p q n) then (* [p] contains [q]. Merge [s] with a sub-tree of [t]. *) if (p land n) = 0 then let u0 = union s t0 in if t0 == u0 then t else Branch(q, n, u0, t1) else let u1 = union s t1 in if t1 == u1 then t else Branch(q, n, t0, u1) else (* The prefixes disagree. *) join p s q t in union m1 m2 (* [union m1 m2] returns the union of the maps [m1] and [m2]. Bindings in [m2] take precedence over those in [m1]. *) let union m1 m2 = fine_union (fun _d d' -> d') m1 m2 (* [iter f m] invokes [f k x], in turn, for each binding from key [k] to element [x] in the map [m]. Keys are presented to [f] according to some unspecified, but fixed, order. *) let rec iter f = function | Empty -> () | Leaf (key, data) -> f key data | Branch (_, _, tree0, tree1) -> iter f tree0; iter f tree1 (* [fold f m seed] invokes [f k d accu], in turn, for each binding from key [k] to datum [d] in the map [m]. Keys are presented to [f] in increasing order according to the map's ordering. The initial value of [accu] is [seed]; then, at each new call, its value is the value returned by the previous invocation of [f]. The value returned by [fold] is the final value of [accu]. *) let rec fold f m accu = match m with | Empty -> accu | Leaf (key, data) -> f key data accu | Branch (_, _, tree0, tree1) -> fold f tree1 (fold f tree0 accu) (* [fold_rev] performs exactly the same job as [fold], but presents keys to [f] in the opposite order. *) let rec fold_rev f m accu = match m with | Empty -> accu | Leaf (key, data) -> f key data accu | Branch (_, _, tree0, tree1) -> fold_rev f tree0 (fold_rev f tree1 accu) (* It is valid to evaluate [iter2 f m1 m2] if and only if [m1] and [m2] have the same domain. Doing so invokes [f k x1 x2], in turn, for each key [k] bound to [x1] in [m1] and to [x2] in [m2]. Bindings are presented to [f] according to some unspecified, but fixed, order. *) let rec iter2 f t1 t2 = match t1, t2 with | Empty, Empty -> () | Leaf (key1, data1), Leaf (key2, data2) -> assert (key1 = key2); f key1 (* for instance *) data1 data2 | Branch (p1, m1, left1, right1), Branch (p2, m2, left2, right2) -> assert (p1 = p2); assert (m1 = m2); iter2 f left1 left2; iter2 f right1 right2 | _, _ -> assert false (* [map f m] returns the map obtained by composing the map [m] with the function [f]; that is, the map $k\mapsto f(m(k))$. *) let rec map f = function | Empty -> Empty | Leaf (key, data) -> Leaf(key, f data) | Branch (p, m, tree0, tree1) -> Branch (p, m, map f tree0, map f tree1) (* [endo_map] is similar to [map], but attempts to physically share its result with its input. This saves memory when [f] is the identity function. 
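For instance, if [f] returns every datum unchanged, then [endo_map f m] is physically equal to [m], and no new tree nodes are allocated.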
*) let rec endo_map f tree = match tree with | Empty -> tree | Leaf (key, data) -> let data' = f data in if data == data' then tree else Leaf(key, data') | Branch (p, m, tree0, tree1) -> let tree0' = endo_map f tree0 in let tree1' = endo_map f tree1 in if (tree0' == tree0) && (tree1' == tree1) then tree else Branch (p, m, tree0', tree1') (* [filter f m] returns a copy of the map [m] where only the bindings that satisfy [f] have been retained. *) let filter f m = fold (fun key data accu -> if f key data then add key data accu else accu ) m empty (* [iterator m] returns a stateful iterator over the map [m]. *) (* TEMPORARY performance could be improved, see JCF's paper *) let iterator m = let remainder = ref [ m ] in let rec next () = match !remainder with | [] -> None | Empty :: parent -> remainder := parent; next() | (Leaf (key, data)) :: parent -> remainder := parent; Some (key, data) | (Branch(_, _, s0, s1)) :: parent -> remainder := s0 :: s1 :: parent; next () in next (* If [dcompare] is an ordering over data, then [compare dcompare] is an ordering over maps. *) exception Got of int let compare dcompare m1 m2 = let iterator2 = iterator m2 in try iter (fun key1 data1 -> match iterator2() with | None -> raise (Got 1) | Some (key2, data2) -> let c = Generic.compare key1 key2 in if c <> 0 then raise (Got c) else let c = dcompare data1 data2 in if c <> 0 then raise (Got c) ) m1; match iterator2() with | None -> 0 | Some _ -> -1 with Got c -> c (*i --------------------------------------------------------------------------------------------------------------- i*) (*s \mysection{Patricia-tree-based sets} *) (* To enhance code sharing, it would be possible to implement maps as sets of pairs, or (vice-versa) to implement sets as maps to the unit element. However, both possibilities introduce some space and time inefficiency. To avoid it, we define each structure separately. *) module Domain = struct type element = int type t = | Empty | Leaf of int | Branch of int * X.mask * t * t (* The empty set. *) let empty = Empty (* [is_empty s] returns [true] if and only if the set [s] is empty. *) let is_empty = function | Empty -> true | Leaf _ | Branch _ -> false (* [singleton x] returns a set whose only element is [x]. *) let singleton x = Leaf x (* [is_singleton s] tests whether [s] is a singleton set. *) let is_singleton = function | Leaf _ -> true | Empty | Branch _ -> false (* [choose s] returns an arbitrarily chosen element of [s], if [s] is nonempty, and raises [Not_found] otherwise. *) let rec choose = function | Empty -> raise Not_found | Leaf x -> x | Branch (_, _, tree0, _) -> choose tree0 (* [cardinal s] returns [s]'s cardinal. *) let rec cardinal = function | Empty -> 0 | Leaf _ -> 1 | Branch (_, _, t0, t1) -> cardinal t0 + cardinal t1 (* [mem x s] returns [true] if and only if [x] appears in the set [s]. *) let rec mem x = function | Empty -> false | Leaf x' -> x = x' | Branch (_, mask, tree0, tree1) -> mem x (if (x land mask) = 0 then tree0 else tree1) (* The auxiliary function [join] merges two trees in the simple case where their prefixes disagree. *) let join p0 t0 p1 t1 = let m = X.branching_bit p0 p1 in let p = X.mask p0 (* for instance *) m in if (p0 land m) = 0 then Branch(p, m, t0, t1) else Branch(p, m, t1, t0) (* [add x s] returns a set whose elements are all elements of [s], plus [x]. 
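For instance, [add 3 (add 1 empty)] contains exactly the elements 1 and 3; adding 3 to this set a second time returns the set itself, unchanged, thanks to the [Unchanged] exception below.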
*) exception Unchanged let rec strict_add x t = match t with | Empty -> Leaf x | Leaf x0 -> if x = x0 then raise Unchanged else join x (Leaf x) x0 t | Branch (p, m, t0, t1) -> if match_prefix x p m then if (x land m) = 0 then Branch (p, m, strict_add x t0, t1) else Branch (p, m, t0, strict_add x t1) else join x (Leaf x) p t let add x s = try strict_add x s with Unchanged -> s (* [remove x s] returns a set whose elements are all elements of [s], except [x]. *) let remove x s = let rec strict_remove = function | Empty -> raise Not_found | Leaf x' -> if x = x' then Empty else raise Not_found | Branch (prefix, mask, tree0, tree1) -> if (x land mask) = 0 then match strict_remove tree0 with | Empty -> tree1 | tree0 -> Branch (prefix, mask, tree0, tree1) else match strict_remove tree1 with | Empty -> tree0 | tree1 -> Branch (prefix, mask, tree0, tree1) in try strict_remove s with Not_found -> s (* [union s1 s2] returns the union of the sets [s1] and [s2]. *) let rec union s t = match s, t with | Empty, _ -> t | _, Empty -> s | Leaf x, _ -> add x t | _, Leaf x -> add x s | Branch(p, m, s0, s1), Branch(q, n, t0, t1) -> if (p = q) && (m = n) then (* The trees have the same prefix. Merge their sub-trees. *) let u0 = union s0 t0 and u1 = union s1 t1 in if t0 == u0 && t1 == u1 then t else Branch(p, m, u0, u1) else if (X.shorter m n) && (match_prefix q p m) then (* [q] contains [p]. Merge [t] with a sub-tree of [s]. *) if (q land m) = 0 then Branch(p, m, union s0 t, s1) else Branch(p, m, s0, union s1 t) else if (X.shorter n m) && (match_prefix p q n) then (* [p] contains [q]. Merge [s] with a sub-tree of [t]. *) if (p land n) = 0 then let u0 = union s t0 in if t0 == u0 then t else Branch(q, n, u0, t1) else let u1 = union s t1 in if t1 == u1 then t else Branch(q, n, t0, u1) else (* The prefixes disagree. *) join p s q t (* [build] is a ``smart constructor''. It builds a [Branch] node with the specified arguments, but ensures that the newly created node does not have an [Empty] child. *) let build p m t0 t1 = match t0, t1 with | Empty, Empty -> Empty | Empty, _ -> t1 | _, Empty -> t0 | _, _ -> Branch(p, m, t0, t1) (* [inter s t] returns the set intersection of [s] and [t], that is, $s\cap t$. *) let rec inter s t = match s, t with | Empty, _ | _, Empty -> Empty | (Leaf x as s), t | t, (Leaf x as s) -> if mem x t then s else Empty | Branch(p, m, s0, s1), Branch(q, n, t0, t1) -> if (p = q) && (m = n) then (* The trees have the same prefix. Compute the intersections of their sub-trees. *) build p m (inter s0 t0) (inter s1 t1) else if (X.shorter m n) && (match_prefix q p m) then (* [q] contains [p]. Intersect [t] with a sub-tree of [s]. *) inter (if (q land m) = 0 then s0 else s1) t else if (X.shorter n m) && (match_prefix p q n) then (* [p] contains [q]. Intersect [s] with a sub-tree of [t]. *) inter s (if (p land n) = 0 then t0 else t1) else (* The prefixes disagree. *) Empty (* [disjoint s1 s2] returns [true] if and only if the sets [s1] and [s2] are disjoint, i.e. iff their intersection is empty. It is a specialized version of [inter], which uses less space. 
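For instance, [disjoint (singleton 1) (singleton 2)] is [true], whereas [disjoint s s] is [false] whenever [s] is nonempty.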
*) exception NotDisjoint let disjoint s t = let rec inter s t = match s, t with | Empty, _ | _, Empty -> () | Leaf x, _ -> if mem x t then raise NotDisjoint | _, Leaf x -> if mem x s then raise NotDisjoint | Branch(p, m, s0, s1), Branch(q, n, t0, t1) -> if (p = q) && (m = n) then begin inter s0 t0; inter s1 t1 end else if (X.shorter m n) && (match_prefix q p m) then inter (if (q land m) = 0 then s0 else s1) t else if (X.shorter n m) && (match_prefix p q n) then inter s (if (p land n) = 0 then t0 else t1) else () in try inter s t; true with NotDisjoint -> false (* [iter f s] invokes [f x], in turn, for each element [x] of the set [s]. Elements are presented to [f] according to some unspecified, but fixed, order. *) let rec iter f = function | Empty -> () | Leaf x -> f x | Branch (_, _, tree0, tree1) -> iter f tree0; iter f tree1 (* [fold f s seed] invokes [f x accu], in turn, for each element [x] of the set [s]. Elements are presented to [f] according to some unspecified, but fixed, order. The initial value of [accu] is [seed]; then, at each new call, its value is the value returned by the previous invocation of [f]. The value returned by [fold] is the final value of [accu]. In other words, if $s = \{ x_1, x_2, \ldots, x_n \}$, where $x_1 < x_2 < \ldots < x_n$, then [fold f s seed] computes $([f]\,x_n\,\ldots\,([f]\,x_2\,([f]\,x_1\,[seed]))\ldots)$. *) let rec fold f s accu = match s with | Empty -> accu | Leaf x -> f x accu | Branch (_, _, s0, s1) -> fold f s1 (fold f s0 accu) (* [elements s] is a list of all elements in the set [s]. *) let elements s = fold (fun tl hd -> tl :: hd) s [] (* [iterator s] returns a stateful iterator over the set [s]. That is, if $s = \{ x_1, x_2, \ldots, x_n \}$, where $x_1 < x_2 < \ldots < x_n$, then [iterator s] is a function which, when invoked for the $k^{\text{th}}$ time, returns [Some]$x_k$, if $k\leq n$, and [None] otherwise. Such a function can be useful when one wishes to iterate over a set's elements, without being restricted by the call stack's discipline. For more comments about this algorithm, please see module [Baltree], which defines a similar one. *) let iterator s = let remainder = ref [ s ] in let rec next () = match !remainder with | [] -> None | Empty :: parent -> remainder := parent; next() | (Leaf x) :: parent -> remainder := parent; Some x | (Branch(_, _, s0, s1)) :: parent -> remainder := s0 :: s1 :: parent; next () in next (* [compare] is an ordering over sets. *) exception Got of int let compare s1 s2 = let iterator2 = iterator s2 in try iter (fun x1 -> match iterator2() with | None -> raise (Got 1) | Some x2 -> let c = Generic.compare x1 x2 in if c <> 0 then raise (Got c) ) s1; match iterator2() with | None -> 0 | Some _ -> -1 with Got c -> c (* [equal] implements equality over sets. *) let equal s1 s2 = compare s1 s2 = 0 (* [subset] implements the subset predicate over sets. In other words, [subset s t] returns [true] if and only if $s\subseteq t$. It is a specialized version of [diff]. *) exception NotSubset let subset s t = let rec diff s t = match s, t with | Empty, _ -> () | _, Empty | Branch _, Leaf _ -> raise NotSubset | Leaf x, _ -> if not (mem x t) then raise NotSubset | Branch(p, m, s0, s1), Branch(q, n, t0, t1) -> if (p = q) && (m = n) then begin diff s0 t0; diff s1 t1 end else if (X.shorter n m) && (match_prefix p q n) then diff s (if (p land n) = 0 then t0 else t1) else (* Either [q] contains [p], which means at least one of [s]'s sub-trees is not contained within [t], or the prefixes disagree. 
In either case, the subset relationship cannot possibly hold. *) raise NotSubset in try diff s t; true with NotSubset -> false end (*i --------------------------------------------------------------------------------------------------------------- i*) (*s \mysection{Relating sets and maps} *) (* Back to the world of maps. Let us now describe the relationship which exists between maps and their domains. *) (* [domain m] returns [m]'s domain. *) let rec domain = function | Empty -> Domain.Empty | Leaf (k, _) -> Domain.Leaf k | Branch (p, m, t0, t1) -> Domain.Branch (p, m, domain t0, domain t1) (* [lift f s] returns the map $k\mapsto f(k)$, where $k$ ranges over a set of keys [s]. *) let rec lift f = function | Domain.Empty -> Empty | Domain.Leaf k -> Leaf (k, f k) | Domain.Branch (p, m, t0, t1) -> Branch(p, m, lift f t0, lift f t1) (* [build] is a ``smart constructor''. It builds a [Branch] node with the specified arguments, but ensures that the newly created node does not have an [Empty] child. *) let build p m t0 t1 = match t0, t1 with | Empty, Empty -> Empty | Empty, _ -> t1 | _, Empty -> t0 | _, _ -> Branch(p, m, t0, t1) (* [corestrict m d] performs a co-restriction of the map [m] to the domain [d]. That is, it returns the map $k\mapsto m(k)$, where $k$ ranges over all keys bound in [m] but \emph{not} present in [d]. Its code resembles [diff]'s. *) let rec corestrict s t = match s, t with | Empty, _ | _, Domain.Empty -> s | Leaf (k, _), _ -> if Domain.mem k t then Empty else s | _, Domain.Leaf k -> remove k s | Branch(p, m, s0, s1), Domain.Branch(q, n, t0, t1) -> if (p = q) && (m = n) then build p m (corestrict s0 t0) (corestrict s1 t1) else if (X.shorter m n) && (match_prefix q p m) then if (q land m) = 0 then build p m (corestrict s0 t) s1 else build p m s0 (corestrict s1 t) else if (X.shorter n m) && (match_prefix p q n) then corestrict s (if (p land n) = 0 then t0 else t1) else s end (*i --------------------------------------------------------------------------------------------------------------- i*) (*s \mysection{Instantiating the functor} *) module Little = Make(Endianness.Little) module Big = Make(Endianness.Big) menhir-20210929/src/patricia.mli000066400000000000000000000024301412503066000163300ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This is an implementation of Patricia trees, following Chris Okasaki's paper at the 1998 ML Workshop in Baltimore. Both big-endian and little-endian trees are provided. Both sets and maps are implemented on top of Patricia trees. *) module Little : GMap.S with type key = int module Big : GMap.S with type key = int menhir-20210929/src/positions.ml000066400000000000000000000074341412503066000164230ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. 
*) (* *) (******************************************************************************) open Lexing type t = (* Start and end positions. *) position * position type 'a located = { value : 'a; position : t; } let value { value = v } = v let position { position = p } = p let decompose { value; position } = (value, position) let with_pos p v = { value = v; position = p; } let with_loc = (* The location is converted from the type [position * position] to the type [t]. *) with_pos let map f v = { value = f v.value; position = v.position; } let pmap f v = { value = f v.position v.value; position = v.position } let iter f { value = v } = f v let mapd f v = let w1, w2 = f v.value in let pos = v.position in { value = w1; position = pos }, { value = w2; position = pos } let dummy = (dummy_pos, dummy_pos) let unknown_pos v = { value = v; position = dummy } let start_of_position (p, _) = p let end_of_position (_, p) = p let filename_of_position p = (start_of_position p).pos_fname let line p = p.pos_lnum let column p = p.pos_cnum - p.pos_bol let characters p1 p2 = (column p1, p2.pos_cnum - p1.pos_bol) (* intentionally [p1.pos_bol] *) let join x1 x2 = ( start_of_position (if x1 = dummy then x2 else x1), end_of_position (if x2 = dummy then x1 else x2) ) let import x = x let join_located l1 l2 f = { value = f l1.value l2.value; position = join l1.position l2.position; } let string_of_lex_pos p = let c = p.pos_cnum - p.pos_bol in (string_of_int p.pos_lnum)^":"^(string_of_int c) let string_of_pos p = let filename = filename_of_position p in (* [filename] is hopefully not "". *) let l = line (start_of_position p) in let c1, c2 = characters (start_of_position p) (end_of_position p) in Printf.sprintf "File \"%s\", line %d, characters %d-%d" filename l c1 c2 let pos_or_undef = function | None -> dummy | Some x -> x let cpos lexbuf = (lexeme_start_p lexbuf, lexeme_end_p lexbuf) let with_cpos lexbuf v = with_pos (cpos lexbuf) v let string_of_cpos lexbuf = string_of_pos (cpos lexbuf) let joinf f t1 t2 = join (f t1) (f t2) let ljoinf f = List.fold_left (fun p t -> join p (f t)) dummy let join_located_list ls f = { value = f (List.map (fun l -> l.value) ls); position = ljoinf (fun x -> x.position) ls } (* The functions that print error messages and warnings require a list of positions. The following auxiliary functions help build such lists. *) type positions = t list let one (pos : position) : positions = [ import (pos, pos) ] let lexbuf (lexbuf : lexbuf) : positions = [ import (lexbuf.lex_start_p, lexbuf.lex_curr_p) ] let print (pos : position) = Printf.printf "{ pos_fname = \"%s\"; pos_lnum = %d; pos_bol = %d; pos_cnum = %d }\n" pos.pos_fname pos.pos_lnum pos.pos_bol pos.pos_cnum menhir-20210929/src/positions.mli000066400000000000000000000107001412503066000165620ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* TEMPORARY clean up this over-complicated API? *) (** Extension of standard library's positions. *) (** {2 Extended lexing positions} *) (** Abstract type for pairs of positions in the lexing stream. *) type t (** Decoration of a value with a position. 
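For instance, an identifier, together with its location in the source file, can be represented as a value of type [string located].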
*) type 'a located = { value : 'a; position : t; } (** [value dv] returns the raw value that underlies the decorated value [dv]. *) val value: 'a located -> 'a (** [position dv] returns the position that decorates the decorated value [dv]. *) val position: 'a located -> t (** [decompose dv] returns a pair of the value and position. *) val decompose: 'a located -> 'a * t (** [with_pos p v] decorates [v] with a position [p]. *) val with_pos : t -> 'a -> 'a located val with_cpos: Lexing.lexbuf -> 'a -> 'a located val with_loc : (Lexing.position * Lexing.position) -> 'a -> 'a located val unknown_pos : 'a -> 'a located (** [map f v] extends the decoration from [v] to [f v]. *) val map: ('a -> 'b) -> 'a located -> 'b located val pmap: (t -> 'a -> 'b) -> 'a located -> 'b located (** [iter f dv] applies [f] to the value inside [dv]. *) val iter: ('a -> unit) -> 'a located -> unit (** [mapd f v] extends the decoration from [v] to both members of the pair [f v]. *) val mapd: ('a -> 'b1 * 'b2) -> 'a located -> 'b1 located * 'b2 located (** This value is used when an object does not come from a particular input location. *) val dummy: t (** {2 Accessors} *) (** [column p] returns the number of characters from the beginning of the line of the Lexing.position [p]. *) val column : Lexing.position -> int (** [line p] returns the line number of the Lexing.position [p]. *) val line : Lexing.position -> int (** [characters p1 p2] returns the character interval between [p1] and [p2], assuming they are located on the same line. *) val characters : Lexing.position -> Lexing.position -> int * int val start_of_position: t -> Lexing.position val end_of_position: t -> Lexing.position val filename_of_position: t -> string (** {2 Position handling} *) (** [join p1 p2] returns a position that starts where [p1] starts and stops where [p2] stops. *) val join : t -> t -> t val import : Lexing.position * Lexing.position -> t val ljoinf : ('a -> t) -> 'a list -> t val joinf : ('a -> t) -> 'a -> 'a -> t val join_located : 'a located -> 'b located -> ('a -> 'b -> 'c) -> 'c located val join_located_list : ('a located) list -> ('a list -> 'b list) -> ('b list) located (** [string_of_lex_pos p] returns a string representation for the lexing position [p]. *) val string_of_lex_pos : Lexing.position -> string (** [string_of_pos p] returns the standard (Emacs-like) representation of the position [p]. *) val string_of_pos : t -> string (** [pos_or_undef po] is the identity function, except when [po] is [None], in which case it returns [dummy]. *) val pos_or_undef : t option -> t (** {2 Interaction with the lexer runtime} *) (** [cpos lexbuf] returns the current position of the lexer. *) val cpos : Lexing.lexbuf -> t (** [string_of_cpos p] returns a string representation of the lexer's current position. *) val string_of_cpos : Lexing.lexbuf -> string (* The functions that print error messages and warnings require a list of positions. The following auxiliary functions help build such lists. *) type positions = t list val one: Lexing.position -> positions val lexbuf: Lexing.lexbuf -> positions (* Low-level printing function, for debugging. *) val print: Lexing.position -> unit menhir-20210929/src/printer.ml000066400000000000000000000531271412503066000160570ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved.
This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* A pretty-printer for [IL]. *) open IL module PreliminaryMake (X : sig (* We assume that the following types and functions are given. This allows us to work both with buffers of type [Buffer.t] and with output channels of type [out_channel]. *) type channel val fprintf: channel -> ('a, channel, unit) format -> 'a val output_substring: channel -> string -> int -> int -> unit (* This is the channel that is being written to. *) val f: channel (* [locate_stretches] controls the way we print OCaml stretches (types and semantic actions). If it is [Some dstfilename], where [dstfilename] is the name of the file that is being written, then we surround stretches with OCaml line number directives of the form # <line number> <file name>. If it is [None], then we don't. *) (* Providing line number directives allows the OCaml typechecker to report type errors in the .mly file, instead of in the generated .ml / .mli files. Line number directives also affect the dynamic semantics of any [assert] statements contained in semantic actions: when they are provided, the [Assert_failure] exception carries a location in the .mly file. As a general rule of thumb, line number directives should always be provided, except perhaps where we think that they decrease readability (e.g., in a generated .mli file). *) val locate_stretches: string option end) = struct open X let output_char f c = fprintf f "%c" c let output_string f s = fprintf f "%s" s let flush f = fprintf f "%!" (* ------------------------------------------------------------------------- *) (* Dealing with newlines and indentation. *) let maxindent = 120 let whitespace = String.make maxindent ' ' let indentation = ref 0 let line = ref 1 (* [rawnl] is, in principle, the only place where writing a newline character to the output channel is permitted. This ensures that the line counter remains correct. But see also [stretch] and [typ0]. *) let rawnl f = incr line; output_char f '\n' let nl f = rawnl f; output_substring f whitespace 0 !indentation let indent ofs producer f x = let old_indentation = !indentation in let new_indentation = old_indentation + ofs in if new_indentation <= maxindent then indentation := new_indentation; nl f; producer f x; indentation := old_indentation (* This produces a line number directive. *) let sharp f line file = fprintf f "%t# %d \"%s\"%t" rawnl line file rawnl (* ------------------------------------------------------------------------- *) (* Printers of atomic elements. *) let nothing _ = () let space f = output_char f ' ' let comma f = output_string f ", " let semi f = output_char f ';' let seminl f = semi f; nl f let times f = output_string f " * " let letrec f = output_string f "let rec " let letnonrec f = output_string f "let " let keytyp f = output_string f "type " let exc f = output_string f "exception " let et f = output_string f "and " let var f x = output_string f x let bar f = output_string f " | " (* ------------------------------------------------------------------------- *) (* List printers. *) (* A list with a separator in front of every element. *) let rec list elem sep f = function | [] -> () | e :: es -> fprintf f "%t%a%a" sep elem e (list elem sep) es (* A list with a separator between elements.
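For instance, with [elem = var] and [sep = comma], [seplist] prints the list [["x"; "y"; "z"]] as [x, y, z], whereas [list] would print [, x, y, z].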
*) let seplist elem sep f = function | [] -> () | e :: es -> fprintf f "%a%a" elem e (list elem sep) es (* OCaml type parameters. *) let typeparams p0 p1 f = function | [] -> () | [ param ] -> fprintf f "%a " p0 param | _ :: _ as params -> fprintf f "(%a) " (seplist p1 comma) params (* ------------------------------------------------------------------------- *) (* Expression printer. *) (* We use symbolic constants that stand for subsets of the expression constructors. We do not use numeric levels to stand for subsets, because our subsets do not form a linear inclusion chain. *) type subset = | All | AllButSeq | AllButFunTryMatch | AllButFunTryMatchSeq | AllButLetFunTryMatch | AllButLetFunTryMatchSeq | AllButIfThenSeq | OnlyAppOrAtom | OnlyAtom (* This computes the intersection of a subset with the constraint "should not be a sequence". *) let andNotSeq = function | All | AllButSeq -> AllButSeq | AllButFunTryMatch | AllButFunTryMatchSeq -> AllButFunTryMatchSeq | AllButLetFunTryMatch | AllButLetFunTryMatchSeq -> AllButLetFunTryMatchSeq | AllButIfThenSeq -> AllButIfThenSeq | OnlyAppOrAtom -> OnlyAppOrAtom | OnlyAtom -> OnlyAtom (* This defines the semantics of subsets by relating expressions with subsets. *) let rec member e k = match e with | EComment _ | EPatComment _ -> true | EFun _ | ETry _ | EMatch _ -> begin match k with | AllButFunTryMatch | AllButFunTryMatchSeq | AllButLetFunTryMatch | AllButLetFunTryMatchSeq | OnlyAppOrAtom | OnlyAtom -> false | _ -> true end | ELet ([], e) -> member e k | ELet ((PUnit, _) :: _, _) -> begin match k with | AllButSeq | AllButFunTryMatchSeq | AllButLetFunTryMatchSeq | AllButIfThenSeq | OnlyAppOrAtom | OnlyAtom -> false | _ -> true end | ELet (_ :: _, _) -> begin match k with | AllButLetFunTryMatch | AllButLetFunTryMatchSeq | OnlyAppOrAtom | OnlyAtom -> false | _ -> true end | EIfThen _ -> begin match k with | AllButIfThenSeq | OnlyAppOrAtom | OnlyAtom -> false | _ -> true end | EApp (_, _ :: _) | EData (_, _ :: _) | EMagic _ | ERepr _ | ERaise _ -> begin match k with | OnlyAtom -> false | _ -> true end | ERecordWrite _ | EIfThenElse _ -> begin match k with | OnlyAppOrAtom | OnlyAtom -> false | _ -> true end | EVar _ | ETextual _ | EApp (_, []) | EData (_, []) | ETuple _ | EAnnot _ | ERecord _ | ERecordAccess (_, _) | EIntConst _ | EStringConst _ | EUnit | EArray _ | EArrayAccess (_, _) -> true let rec exprlet k pes f e2 = match pes with | [] -> exprk k f e2 | (PUnit, e1) :: pes -> fprintf f "%a%t%a" (exprk AllButLetFunTryMatch) e1 seminl (exprlet k pes) e2 | (PVar id1, EAnnot (e1, ts1)) :: pes -> (* TEMPORARY current ocaml does not support type schemes here; drop quantifiers, if any *) fprintf f "let %s : %a = %a in%t%a" id1 typ ts1.body (* scheme ts1 *) expr e1 nl (exprlet k pes) e2 | (PVar id1, EFun (ps1, e1)) :: pes -> fprintf f "let %s%a = %a in%t%t%a" id1 (list pat0 space) ps1 (indent 2 expr) e1 nl nl (exprlet k pes) e2 | (p1, (ELet _ as e1)) :: pes -> fprintf f "let %a =%a%tin%t%a" pat p1 (indent 2 expr) e1 nl nl (exprlet k pes) e2 | (p1, e1) :: pes -> fprintf f "let %a = %a in%t%a" pat p1 expr e1 nl (exprlet k pes) e2 and atom f e = exprk OnlyAtom f e and app f e = exprk OnlyAppOrAtom f e and expr f e = exprk All f e and exprk k f e = if member e k then match e with | EComment (c, e) -> if Settings.comment then fprintf f "(* %s *)%t%a" c nl (exprk k) e else exprk k f e | EPatComment (s, p, e) -> if Settings.comment then fprintf f "(* %s%a *)%t%a" s pat p nl (exprk k) e else exprk k f e | ELet (pes, e2) -> exprlet k pes f e2 | ERecordWrite (e1, 
field, e2) -> fprintf f "%a.%s <- %a" atom e1 field (exprk (andNotSeq k)) e2 | EMatch (_, []) -> assert false | EMatch (e, brs) -> fprintf f "match %a with%a" expr e (branches k) brs | ETry (_, []) -> assert false | ETry (e, brs) -> fprintf f "try%a%twith%a" (indent 2 expr) e nl (branches k) brs | EIfThen (e1, e2) -> fprintf f "if %a then%a" expr e1 (indent 2 (exprk (andNotSeq k))) e2 | EIfThenElse (e0, e1, e2) -> fprintf f "if %a then%a%telse%a" expr e0 (indent 2 (exprk AllButIfThenSeq)) e1 nl (indent 2 (exprk (andNotSeq k))) e2 | EFun (ps, e) -> fprintf f "fun%a ->%a" (list pat0 space) ps (indent 2 (exprk k)) e | EApp (EVar op, [ e1; e2 ]) when op.[0] = '(' && op.[String.length op - 1] = ')' -> let op = String.sub op 1 (String.length op - 2) in fprintf f "%a %s %a" app e1 op app e2 | EApp (e, args) -> fprintf f "%a%a" app e (list atom space) args | ERaise e -> fprintf f "raise %a" atom e | EMagic e -> fprintf f "Obj.magic %a" atom e | ERepr e -> fprintf f "Obj.repr %a" atom e | EData (d, []) -> var f d | EData (d, [ arg ]) -> fprintf f "%s %a" d atom arg | EData ("::", [ arg1; arg2 ]) -> (* Special case for infix cons. *) fprintf f "%a :: %a" atom arg1 atom arg2 | EData (d, (_ :: _ :: _ as args)) -> fprintf f "%s (%a)" d (seplist app comma) args | EVar v -> var f v | ETextual action -> stretch false f action | EUnit -> fprintf f "()" | EIntConst k -> if k >= 0 then fprintf f "%d" k else fprintf f "(%d)" k | EStringConst s -> fprintf f "\"%s\"" (Compatibility.String.escaped s) | ETuple [] -> assert false | ETuple [ e ] -> atom f e | ETuple (_ :: _ :: _ as es) -> fprintf f "(%a)" (seplist app comma) es | EAnnot (e, s) -> (* TEMPORARY current ocaml does not support type schemes here; drop quantifiers, if any *) fprintf f "(%a : %a)" app e typ s.body (* should be scheme s *) | ERecordAccess (e, field) -> fprintf f "%a.%s" atom e field | ERecord fs -> fprintf f "{%a%t}" (indent 2 (seplist field nl)) fs nl | EArray fs -> fprintf f "[|%a%t|]" (indent 2 (seplist array_field nl)) fs nl | EArrayAccess (e, i) -> fprintf f "%a.(%a)" atom e expr i else fprintf f "(%a)" expr e (* When printing a stretch, the string [content] includes padding (that is, whitespace) so as to preserve the column numbers of the source file, whereas the string [raw_content] does not. When [X.locate_stretches] is [None], the parameter [raw] controls the choice between them. When [X.locate_stretches] is [Some _], we ignore [raw] and force the use of the [content] string, so as to have correct column numbers. *) and stretch raw f stretch = let content = stretch.Stretch.stretch_content and raw_content = stretch.Stretch.stretch_raw_content in match X.locate_stretches with | Some basename -> sharp f stretch.Stretch.stretch_linenum stretch.Stretch.stretch_filename; output_string f content; line := !line + stretch.Stretch.stretch_linecount; (* The addition [_ + 2] anticipates the effect on the line counter of the directive that we are just about to print.
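Indeed, [sharp] invokes [rawnl] once before and once after the directive, so printing the directive increases [!line] by 2; the first line that follows the directive is therefore line [!line + 2].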
*) sharp f (!line + 2) basename; output_substring f whitespace 0 !indentation | None -> output_string f (if raw then raw_content else content) and branches k f = function | [] -> () | [ br ] -> fprintf f "%t| %a" nl (branch k) br | br :: brs -> fprintf f "%t| %a%a" nl (branch AllButFunTryMatch) br (branches k) brs and branch k f br = fprintf f "%a ->%a" pat br.branchpat (indent 4 (exprk k)) br.branchbody and field f (label, e) = fprintf f "%s = %a%t" label app e semi and fpat f (label, p) = fprintf f "%s = %a%t" label pat p semi and array_field f e = fprintf f "%a%t" app e semi and pat0 f = function | PUnit -> fprintf f "()" | PWildcard -> fprintf f "_" | PVar x -> var f x | PVarLocated x -> (* Turn [x] on the fly into a stretch and print that. *) stretch true f (Lexer.stretch_of_id x) | PData (d, []) -> var f d | PTuple [] -> assert false | PTuple [ p ] -> pat0 f p | PTuple (_ :: _ :: _ as ps) -> fprintf f "(%a)" (seplist pat1 comma) ps | PAnnot (p, t) -> fprintf f "(%a : %a)" pat p typ t | PRecord fps -> (* 2018/10/19. In a record pattern, we used to omit bindings of the form [field = _]. However, this triggers OCaml's warning 9. We now print all bindings. *) fprintf f "{%a%t}" (indent 2 (seplist fpat nl)) fps nl | p -> fprintf f "(%a)" pat p and pat1 f = function | PData (d, [ arg ]) -> fprintf f "%s %a" d pat0 arg | PData (d, (_ :: _ :: _ as args)) -> fprintf f "%s (%a)" d (seplist pat1 comma) args | PTuple [ p ] -> pat1 f p | p -> pat0 f p and pat2 f = function | POr [] -> assert false | POr (_ :: _ as ps) -> seplist pat2 bar f ps | PTuple [ p ] -> pat2 f p | p -> pat1 f p and pat f p = pat2 f p and typevar f = function | "_" -> fprintf f "_" | v -> fprintf f "'%s" v and typ0 f = function | TypTextual (Stretch.Declared ocamltype) -> (* Parentheses are necessary to avoid confusion between a 1-ary data constructor carrying an n-tuple and an n-ary data constructor. *) fprintf f "(%a)" (stretch true) ocamltype | TypTextual (Stretch.Inferred t) -> line := !line + LineCount.count 0 (Lexing.from_string t); fprintf f "(%s)" t | TypVar v -> typevar f v | TypApp (t, params) -> fprintf f "%a%s" (typeparams typ0 typ) params t | t -> fprintf f "(%a)" typ t and typ1 f = function | TypTuple [] -> assert false | TypTuple (_ :: _ as ts) -> seplist typ0 times f ts | t -> typ0 f t and typ2 f = function | TypArrow (t1, t2) -> fprintf f "%a -> %a" typ1 t1 typ2 t2 | t -> typ1 f t and typ f = typ2 f and scheme f scheme = match scheme.quantifiers with | [] -> typ f scheme.body | qs -> fprintf f "%a. %a" (list typevar space) qs typ scheme.body (* ------------------------------------------------------------------------- *) (* Toplevel definition printer. *) (* The tuple of the arguments of a data constructor. *) let datavalparams f params = (* [typ1] because [type t = A of int -> int] is not allowed by OCaml *) (* [type t = A of (int -> int)] is allowed *) seplist typ1 times f params (* A data constructor definition.
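Such a definition can take one of four shapes, illustrated by the comments in the code below: [| A], [| A of t * u], [| A : (v, w) ty], and [| A : t * u -> (v, w) ty].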
*) let datadef typename f def = fprintf f " | %s" def.dataname; match def.datavalparams, def.datatypeparams with | [], None -> (* | A *) () | _ :: _, None -> (* | A of t * u *) fprintf f " of %a" datavalparams def.datavalparams | [], Some indices -> (* | A : (v, w) ty *) fprintf f " : %a%s" (typeparams typ0 typ) indices typename | _ :: _, Some indices -> (* | A : t * u -> (v, w) ty *) fprintf f " : %a -> %a%s" datavalparams def.datavalparams (typeparams typ0 typ) indices typename let fielddef f def = fprintf f " %s%s: %a" (if def.modifiable then "mutable " else "") def.fieldname scheme def.fieldtype let typerhs typename f = function | TDefRecord [] -> assert false | TDefRecord (_ :: _ as fields) -> fprintf f " = {%t%a%t}" nl (seplist fielddef seminl) fields nl | TDefSum [] -> () | TDefSum defs -> fprintf f " = %a" (list (datadef typename) nl) defs | TAbbrev t -> fprintf f " = %a" typ t let typeconstraint f = function | None -> () | Some (t1, t2) -> fprintf f "%tconstraint %a = %a" nl typ t1 typ t2 let typedef f def = fprintf f "%a%s%a%a" (typeparams typevar typevar) def.typeparams def.typename (typerhs def.typename) def.typerhs typeconstraint def.typeconstraint let rec pdefs pdef sep1 sep2 f = function | [] -> () | [ def ] -> fprintf f "%t%a" sep1 pdef def | def :: defs -> fprintf f "%t%a%t%t%a" sep1 pdef def (* Separate two successive items with two newlines. *) nl nl (pdefs pdef sep2 sep2) defs let valdef f = function | { valpat = PVar id; valval = EAnnot (e, ts) } -> (* TEMPORARY current ocaml does not support type schemes here; drop quantifiers, if any *) fprintf f "%s : %a =%a" id typ ts.body (* scheme ts *) (indent 2 expr) e | { valpat = p; valval = e } -> fprintf f "%a =%a" pat p (indent 2 expr) e let valdefs recursive = pdefs valdef (if recursive then letrec else letnonrec) et let typedefs = pdefs typedef keytyp et let excdef in_intf f def = match in_intf, def.exceq with | _, None | true, Some _ -> fprintf f "%s" def.excname | false, Some s -> fprintf f "%s = %s" def.excname s let excdefs in_intf = pdefs (excdef in_intf) exc exc let block format body f b = fprintf f format (fun f b -> indent 2 body f b; nl f ) b (* Convention: each structure (or interface) item prints a newline before and after itself. 
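Hence, two consecutive items end up separated by two newlines, that is, by one blank line.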
*) let rec structure_item f item = match item with | SIFunctor ([], s) -> structure f s | SIStretch stretches -> List.iter (stretch false f) stretches | _ -> nl f; begin match item with | SIFunctor (params, s) -> fprintf f "module Make%a%t= %a" (list (stretch false) nl) params nl structend s | SIExcDefs defs -> excdefs false f defs | SITypeDefs defs -> typedefs f defs | SIValDefs (recursive, defs) -> valdefs recursive f defs | SIStretch _ -> assert false (* already handled above *) | SIModuleDef (name, rhs) -> fprintf f "module %s = %a" name modexpr rhs | SIInclude e -> fprintf f "include %a" modexpr e | SIComment comment -> fprintf f "(* %s *)" comment end; nl f and structend f s = block "struct%aend" structure f s and structure f s = list structure_item nothing f s and modexpr f = function | MVar x -> fprintf f "%s" x | MStruct s -> structend f s | MApp (e1, e2) -> fprintf f "%a (%a)" modexpr e1 modexpr e2 let valdecl f (x, ts) = fprintf f "val %s: %a" x typ ts.body let with_kind f = function | WKNonDestructive -> output_string f "=" | WKDestructive -> output_string f ":=" let rec module_type f = function | MTNamedModuleType s -> output_string f s | MTWithType (mt, params, name, wk, t) -> fprintf f "%a%a" module_type mt (indent 2 with_type) (params, name, wk, t) | MTSigEnd i -> sigend f i and with_type f (params, name, wk, t) = fprintf f "with type %a %a %a" typ (TypApp (name, List.map (fun v -> TypVar v) params)) with_kind wk typ t and interface_item f item = match item with | IIFunctor ([], i) -> interface f i | _ -> nl f; begin match item with | IIFunctor (params, i) -> fprintf f "module Make%a%t: %a" (list (stretch false) nl) params nl sigend i | IIExcDecls defs -> excdefs true f defs | IITypeDecls defs -> typedefs f defs | IIValDecls decls -> pdefs valdecl nothing nothing f decls | IIInclude mt -> fprintf f "include %a" module_type mt | IIModule (name, mt) -> fprintf f "module %s : %a" name module_type mt | IIComment comment -> fprintf f "(* %s *)" comment end; nl f and sigend f i = block "sig%aend" interface f i and interface f i = list interface_item nothing f i let program s = structure X.f s; flush X.f let interface i = interface X.f i; flush X.f let expr e = expr X.f e; flush X.f end (* ------------------------------------------------------------------------- *) (* Instantiation with output channels. *) module Make (X : sig val f: out_channel val locate_stretches: string option end) = struct include PreliminaryMake(struct type channel = out_channel include X let fprintf = Printf.fprintf let output_substring = output_substring end) end (* ------------------------------------------------------------------------- *) (* Instantiation with buffers. *) module MakeBuffered (X : sig val f: Buffer.t val locate_stretches: string option end) = struct include PreliminaryMake(struct type channel = Buffer.t include X let fprintf = Printf.bprintf let output_substring = Buffer.add_substring end) end (* ------------------------------------------------------------------------- *) (* Common instantiations. 
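[print_expr] prints an expression directly to an output channel, while [string_of_expr] prints it to a temporary buffer and returns the buffer's contents. In both cases, [locate_stretches] is [None], so no line number directives are printed.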
*) let print_expr f e = let module P = Make (struct let f = f let locate_stretches = None end) in P.expr e let string_of_expr e = Misc.with_buffer 512 (fun b -> let module P = MakeBuffered (struct let f = b let locate_stretches = None end) in P.expr e ) menhir-20210929/src/printer.mli000066400000000000000000000045761412503066000162300ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* A pretty-printer for [IL]. *) module Make (X : sig (* This is the channel that is being written to. *) val f: out_channel (* [locate_stretches] controls the way we print OCaml stretches (types and semantic actions). If it is [Some dstfilename], where [dstfilename] is the name of the file that is being written, then we surround stretches with OCaml line number directives of the form # <line number> <file name>. If it is [None], then we don't. *) (* Providing line number directives allows the OCaml typechecker to report type errors in the .mly file, instead of in the generated .ml / .mli files. Line number directives also affect the dynamic semantics of any [assert] statements contained in semantic actions: when they are provided, the [Assert_failure] exception carries a location in the .mly file. As a general rule of thumb, line number directives should always be provided, except perhaps where we think that they decrease readability (e.g., in a generated .mli file). *) val locate_stretches: string option end) : sig val program: IL.program -> unit val expr: IL.expr -> unit val interface: IL.interface -> unit end (* Common instantiations. In the following two functions, [locate_stretches] is [None], so no line number directives are printed. *) val print_expr: out_channel -> IL.expr -> unit val string_of_expr: IL.expr -> string menhir-20210929/src/rawPrinter.ml000066400000000000000000000134321412503066000165240ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* A debugging pretty-printer for [IL]. Newlines are used liberally, so as to facilitate diffs. *) open IL module Make (X : sig (* This is the channel that is being written to. *) val f: out_channel end) = struct (* ------------------------------------------------------------------------- *) (* XML-style trees. *) type tree = | Node of string * tree list let node label ts = Node (label, ts) (* ------------------------------------------------------------------------- *) (* Dealing with newlines and indentation. *) let maxindent = 120 let whitespace = Bytes.make maxindent ' ' let indentation = ref 0 let line = ref 1 (* [rawnl] is, in principle, the only place where writing a newline character to the output channel is permitted. This ensures that the line counter remains correct.
But see also [stretch] and [typ0]. *) let rawnl f = incr line; output_char f '\n' let nl f = rawnl f; output f whitespace 0 !indentation let indent ofs producer f x = let old_indentation = !indentation in let new_indentation = old_indentation + ofs in if new_indentation <= maxindent then indentation := new_indentation; nl f; producer f x; indentation := old_indentation (* ------------------------------------------------------------------------- *) (* Tree printers. *) let rec print_tree f = function | Node (label, []) -> output_char f '<'; output_string f label; output_char f '/'; output_char f '>'; nl f | Node (label, ts) -> output_char f '<'; output_string f label; output_char f '>'; indent 2 print_trees f ts; output_char f '<'; output_char f '/'; output_string f label; output_char f '>'; nl f and print_trees f = function | [] -> () | t :: ts -> print_tree f t; print_trees f ts (* ------------------------------------------------------------------------- *) (* Expression-to-tree converter. *) let rec expr e = match e with | EComment (c, e) -> node "comment" [ string c; expr e ] | EPatComment (s, p, e) -> node "patcomment" [ string s; pat p; expr e ] | ELet (pes, e2) -> node "let" ( patexprs pes @ [ expr e2 ]) | ERecordWrite (e1, field, e2) -> node "recordwrite" [ expr e1; string field; expr e2 ] | EMatch (e, brs) -> node "match" ( expr e :: branches brs ) | ETry (e, brs) -> node "try" ( expr e :: branches brs ) | EIfThen (e1, e2) -> node "ifthen" [ expr e1; expr e2 ] | EIfThenElse (e0, e1, e2) -> node "ifthenelse" [ expr e0; expr e1; expr e2 ] | EFun (ps, e) -> node "fun" ( pats ps @ [ expr e ]) | EApp (e, args) -> node "app" ( expr e :: exprs args ) | ERaise e -> node "raise" [ expr e ] | EMagic e -> node "magic" [ expr e ] | ERepr e -> node "repr" [ expr e ] | EData (d, args) -> node "data" ( string d :: exprs args ) | EVar v -> node "var" [ string v ] | ETextual action -> node "text" [ stretch action ] | EUnit -> node "unit" [] | EIntConst k -> node "int" [ int k ] | EStringConst s -> node "string" [ string s ] | ETuple es -> node "tuple" ( exprs es ) | EAnnot (e, s) -> node "annot" [ expr e; scheme s ] | ERecordAccess (e, field) -> node "recordaccess" [ expr e; string field ] | ERecord fs -> node "record" (fields fs) | EArray fs -> node "array" (exprs fs) | EArrayAccess (e1, e2) -> node "arrayaccess" [ expr e1; expr e2 ] and exprs es = List.map expr es and stretch stretch = string stretch.Stretch.stretch_content and branches brs = List.map branch brs and branch br = node "branch" [ pat br.branchpat; expr br.branchbody ] and fields fs = List.map field fs and field (label, e) = node "field" [ string label; expr e ] and pats ps = List.map pat ps and pat = function | PUnit -> node "punit" [] | PWildcard -> node "pwildcard" [] | PVar x -> node "pvar" [ string x ] | PVarLocated x -> let x = Positions.value x in node "pvar" [ string x ] | PTuple ps -> node "ptuple" (pats ps) | PAnnot (p, t) -> node "pannot" [ pat p; typ t ] | PData (d, args) -> node "pdata" (string d :: pats args) | PRecord fps -> node "precord" (fpats fps) | POr ps -> node "por" (pats ps) and fpats fps = List.map fpat fps and fpat (_, p) = pat p and patexprs pes = List.map patexpr pes and patexpr (p, e) = node "patexpr" [ pat p; expr e ] and string s = node s [] and int k = node (string_of_int k) [] and scheme _s = string "omitted" (* TEMPORARY to be completed, someday *) and typ _t = string "omitted" (* TEMPORARY to be completed, someday *) (* ------------------------------------------------------------------------- *) (* 
Convert to a tree, then print the tree. *) let expr e = print_tree X.f (expr e) end menhir-20210929/src/rawPrinter.mli000066400000000000000000000023101412503066000166660ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* A debugging pretty-printer for [IL]. Newlines are used liberally, so as to facilitate diffs. *) module Make (X : sig (* This is the channel that is being written to. *) val f: out_channel end) : sig val expr: IL.expr -> unit end menhir-20210929/src/reachability.ml000066400000000000000000000045141412503066000170300ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open BasicSyntax let rec visit grammar visited symbol = try let rule = StringMap.find symbol grammar.rules in if not (StringSet.mem symbol visited) then let visited = StringSet.add symbol visited in List.fold_left (visitb grammar) visited rule.branches else visited with Not_found -> (* This is a terminal symbol. *) assert (symbol = "error" || StringMap.mem symbol grammar.tokens); visited and visitb grammar visited { producers = symbols } = List.fold_left (visits grammar) visited symbols and visits grammar visited producer = visit grammar visited (producer_symbol producer) let trim grammar = if StringSet.cardinal grammar.start_symbols = 0 then Error.error [] "no start symbol has been declared." else let reachable = StringSet.fold (fun symbol visited -> visit grammar visited symbol ) grammar.start_symbols StringSet.empty in StringMap.iter (fun symbol rule -> if not (StringSet.mem symbol reachable) then Error.grammar_warning rule.positions "symbol %s is unreachable from any of the start symbol(s)." symbol ) grammar.rules; { grammar with rules = StringMap.restrict reachable grammar.rules; types = StringMap.restrict reachable grammar.types; on_error_reduce = StringMap.restrict reachable grammar.on_error_reduce; } menhir-20210929/src/reachability.mli000066400000000000000000000022431412503066000171760ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This extremely simple analysis restricts a grammar to the set of nonterminals that are reachable, via productions, from the start nonterminals. 
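The rules, the types, and the [on_error_reduce] declarations associated with unreachable nonterminals are dropped, and a warning is emitted for every unreachable nonterminal.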
*) val trim: BasicSyntax.grammar -> BasicSyntax.grammar menhir-20210929/src/referenceInterpreter.ml000066400000000000000000000265611412503066000205600ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar open Cst (* ------------------------------------------------------------------------ *) (* Set up all of the information required by the LR engine. Everything is read directly from [Grammar] and [Lr1]. *) module T = struct type state = Lr1.node let number = Lr1.number type token = Terminal.t type terminal = Terminal.t type nonterminal = Nonterminal.t type semantic_value = cst let token2terminal (token : token) : terminal = token let token2value (token : token) : semantic_value = CstTerminal token let error_terminal = Terminal.error let error_value = CstError let foreach_terminal = Terminal.foldx type production = Production.index let production_index = Production.p2i let find_production = Production.i2p let default_reduction (s : state) defred nodefred env = match Default.has_default_reduction s with | Some (prod, _) -> defred env prod | None -> nodefred env let action (s : state) (tok : terminal) value shift reduce fail env = (* Check whether [s] has an outgoing shift transition along [tok]. *) try let s' : state = SymbolMap.find (Symbol.T tok) (Lr1.transitions s) in (* There is such a transition. Return either [ShiftDiscard] or [ShiftNoDiscard], depending on the existence of a default reduction on [#] at [s']. *) match Default.has_default_reduction s' with | Some (_, toks) when TerminalSet.mem Terminal.sharp toks -> shift env false tok value s' | _ -> shift env true tok value s' (* There is no such transition. Look for a reduction. *) with Not_found -> try let prod = Misc.single (TerminalMap.find tok (Lr1.reductions s)) in reduce env prod (* There is no reduction either. Fail. *) with Not_found -> fail env let goto_nt (s : state) (nt : nonterminal) : state = try SymbolMap.find (Symbol.N nt) (Lr1.transitions s) with Not_found -> assert false let goto_prod (s : state) (prod : production) : state = goto_nt s (Production.nt prod) let maybe_goto_nt (s : state) (nt : nonterminal) : state option = try Some (SymbolMap.find (Symbol.N nt) (Lr1.transitions s)) with Not_found -> None open MenhirLib.EngineTypes exception Error (* By convention, a semantic action returns a new stack. It does not affect [env]. *) let is_start = Production.is_start type semantic_action = (state, semantic_value, token) env -> (state, semantic_value) stack let semantic_action (prod : production) : semantic_action = fun env -> assert (not (Production.is_start prod)); (* Reduce. Pop a suffix of the stack, and use it to construct a new concrete syntax tree node. *) let n = Production.length prod in let values : semantic_value array = Array.make n CstError (* dummy *) and startp = ref Lexing.dummy_pos and endp= ref Lexing.dummy_pos and current = ref env.current and stack = ref env.stack in (* We now enter a loop to pop [k] stack cells and (after that) push a new cell onto the stack. *) (* This loop does not update [env.current]. 
Instead, the state in the newly pushed stack cell will be used (by our caller) as a basis for a goto transition, and [env.current] will be updated (if necessary) then. *) for k = n downto 1 do (* Fetch a semantic value. *) values.(k - 1) <- !stack.semv; (* Pop one cell. The stack must be non-empty. As we pop a cell, change the automaton's current state to the one stored within the cell. (It is sufficient to do this only when [k] is 1, since the last write overwrites any and all previous writes.) If this is the first (last) cell that we pop, update [endp] ([startp]). *) let next = !stack.next in assert (!stack != next); if k = n then begin endp := !stack.endp end; if k = 1 then begin current := !stack.state; startp := !stack.startp end; stack := next done; (* Done popping. *) (* Construct and push a new stack cell. The associated semantic value is a new concrete syntax tree. *) { state = !current; semv = CstNonTerminal (prod, values); startp = !startp; endp = !endp; next = !stack } let may_reduce node prod = Lr1.NodeSet.mem node (Lr1.production_where prod) (* The logging functions that follow are called only if [log] is [true]. *) module Log = struct open Printf let state s = fprintf stderr "State %d:" (Lr1.number s); prerr_newline() let shift tok s' = fprintf stderr "Shifting (%s) to state %d" (Terminal.print tok) (Lr1.number s'); prerr_newline() let reduce_or_accept prod = match Production.classify prod with | Some _ -> fprintf stderr "Accepting"; prerr_newline() | None -> fprintf stderr "Reducing production %s" (Production.print prod); prerr_newline() let lookahead_token tok startp endp = fprintf stderr "Lookahead token is now %s (%d-%d)" (Terminal.print tok) startp.Lexing.pos_cnum endp.Lexing.pos_cnum; prerr_newline() let initiating_error_handling () = fprintf stderr "Initiating error handling"; prerr_newline() let resuming_error_handling () = fprintf stderr "Resuming error handling"; prerr_newline() let handling_error s = fprintf stderr "Handling error in state %d" (Lr1.number s); prerr_newline() end end (* ------------------------------------------------------------------------ *) (* The strategy used by the reference interpreter is determined by the command line switch [--strategy]. *) let strategy = Settings.strategy (* ------------------------------------------------------------------------ *) (* Define a palatable user entry point. *) let interpret nt log lexer lexbuf = (* Instantiate the LR engine. *) let module E = MenhirLib.Engine.Make (struct include T let log = log end) in (* Run it. *) try Some (E.entry strategy (Lr1.entry_of_nt nt) lexer lexbuf) with T.Error -> None (* ------------------------------------------------------------------------ *) (* Another entry point, used internally by [LRijkstra] to check that the sentences that [LRijkstra] produces do lead to an error in the expected state. *) type spurious_reduction = Lr1.node * Production.index type target = Lr1.node * spurious_reduction list type check_error_path_outcome = (* Bad: the input was read past its end. *) | OInputReadPastEnd (* Bad: a syntax error occurred before all of the input was read. *) | OInputNotFullyConsumed (* Bad: the parser unexpectedly accepted (part of) this input. *) | OUnexpectedAccept (* Good: a syntax error occurred after reading the last input token. We report in which state the error took place, as well as a list of spurious reductions. A non-default reduction that takes place after looking at the last input token (i.e., the erroneous token) is spurious. 
Furthermore, any reduction that takes place after a spurious reduction is itself spurious. We note that a spurious reduction can take place only in a non-canonical LR automaton. *) | OK of target let check_error_path log nt input = (* Instantiate the LR engine. *) let module E = MenhirLib.Engine.Make (struct include T let log = log end) in (* Determine the initial state. *) let entry = Lr1.entry_of_nt nt in (* This function helps extract the current parser state out of [env]. It may become unnecessary if the [Engine] API offers it. *) let current env = (* Peek at the stack. If empty, then we must be in the initial state. *) match E.top env with | None -> entry | Some (E.Element (s, _, _, _)) -> s in (* Set up a function that delivers tokens one by one. *) let input = ref input in let next () = match !input with | [] -> None | t :: ts -> input := ts; Some t in let looking_at_last_token () : bool = !input = [] in (* Run. We wish to stop at the first error (without handling the error in any way) and report in which state the error occurred. A clean way of doing this is to use the incremental API, as follows. The main loop resembles the [loop] function in [Engine]. *) (* Another reason why we write our own loop is that we wish to detect spurious reductions. We accumulate these reductions in [spurious], a (reversed) list of productions. *) let rec loop (checkpoint : cst E.checkpoint) (spurious : spurious_reduction list) = match checkpoint with | E.InputNeeded _ -> begin match next() with | None -> OInputReadPastEnd | Some t -> loop (E.offer checkpoint (t, Lexing.dummy_pos, Lexing.dummy_pos)) spurious end | E.Shifting _ -> loop (E.resume ~strategy checkpoint) spurious | E.AboutToReduce (env, prod) -> (* If we have requested the last input token and if this is not a default reduction, then this is a spurious reduction. Furthermore, if a spurious reduction has taken place already, then this is also a spurious reduction. *) let spurious = if looking_at_last_token() && not (E.env_has_default_reduction env) || spurious <> [] then (current env, prod) :: spurious else spurious in loop (E.resume ~strategy checkpoint) spurious | E.HandlingError env -> (* Check that all of the input has been read. Otherwise, the error has occurred sooner than expected. *) if !input = [] then (* Return the current state and the list of spurious reductions. *) OK (current env, List.rev spurious) else OInputNotFullyConsumed | E.Accepted _ -> (* The parser has succeeded. This is unexpected. *) OUnexpectedAccept | E.Rejected -> (* The parser rejects this input. This should not happen; we should observe [HandlingError _] first. *) assert false in loop (E.start entry Lexing.dummy_pos) [] menhir-20210929/src/referenceInterpreter.mli000066400000000000000000000055541412503066000207300ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Grammar open Cst (* This reference interpreter animates the LR automaton. It uses the grammar and automaton descriptions, as provided by [Grammar] and [Lr1], as well as the generic LR engine in [MenhirLib.Engine]. 
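The strategy that it uses for error handling is dictated by the command line switch [--strategy].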
*) (* The interpreter requires a start symbol, a Boolean flag that tells whether a trace should be produced on the standard error channel, a lexer, and a lexing buffer. It either succeeds and produces a concrete syntax tree, or fails. *) val interpret: Nonterminal.t -> bool -> (Lexing.lexbuf -> Terminal.t) -> Lexing.lexbuf -> cst option (* This variant of the reference interpreter is used internally by us. We use it to debug [LRijkstra]. It checks that a sentence leads to a syntax error in the expected state. It is also used by several of the command line options [--interpret-error], [--compile-errors], etc. *) type spurious_reduction = Lr1.node * Production.index type target = Lr1.node * spurious_reduction list type check_error_path_outcome = (* Bad: the input was read past its end. *) | OInputReadPastEnd (* Bad: a syntax error occurred before all of the input was read. *) | OInputNotFullyConsumed (* Bad: the parser unexpectedly accepted (part of) this input. *) | OUnexpectedAccept (* Good: a syntax error occurred after reading the last input token. We report in which state the error took place, as well as a list of spurious reductions. A non-default reduction that takes place after looking at the last input token (i.e., the erroneous token) is spurious. Furthermore, any reduction that takes place after a spurious reduction is itself spurious. We note that a spurious reduction can take place only in a non-canonical LR automaton. *) | OK of target val check_error_path: bool -> (* --trace *) Nonterminal.t -> (* initial non-terminal symbol *) Terminal.t list -> (* input *) check_error_path_outcome menhir-20210929/src/resizableArray.ml000066400000000000000000000075771412503066000173630ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module implements resizable arrays, that is, arrays that can grow upon explicit request. *) type 'a t = { (* The default element is used to fill empty slots when growing or shrinking the physical array. *) default: 'a; (* The init function is used to initialize newly allocated slots when growing the logical array. *) init: int -> 'a; (* The logical size of this array. *) mutable size: int; (* The physical array, whose length is at least [size]. *) mutable table: 'a array } let make capacity default init = (* [capacity] must be nonzero, so that doubling it actually enlarges the array. *) assert (capacity >= 0); let capacity = if capacity = 0 then 1 else capacity in let table = Array.make capacity default in { default; init; size = 0; table } let make_ capacity default = make capacity default (fun _ -> default) let length a = a.size let get a i = assert (0 <= i && i < a.size); Array.unsafe_get a.table (i) let set a i x = assert (0 <= i && i < a.size); Array.unsafe_set a.table (i) x let shrink a s = (* This is [resize a s], assuming [0 <= s < a.size]. *) Array.fill a.table s (a.size - s) a.default; a.size <- s let grow a s = (* This is [resize a s], assuming [0 <= s && a.size < s]. *) let n = Array.length a.table in if s > n then begin (* The physical size of the array must increase. 
The new size is at least double of the previous size, and larger if requested. *) let table = Array.make (max (2 * n) s) a.default in Array.blit a.table 0 table 0 n; a.table <- table end; (* From [a.size] to [s], we have new logical slots. Initialize them. *) let init = a.init and table = a.table in for i = a.size to s - 1 do Array.unsafe_set table i (init i) done; (* Update the array's logical size. *) a.size <- s let resize a s = assert (0 <= s); if s < a.size then shrink a s else if s > a.size then grow a s let push a x = let s = a.size in (* equivalent to: [length a] *) begin (* equivalent to: [resize a (s + 1)] *) let s = s + 1 in let n = Array.length a.table in if s > n then begin (* assert (s = n + 1); *) (* assert (max (2 * n) s = 2 * n); *) let table = Array.make (2 * n) a.default in Array.blit a.table 0 table 0 n; a.table <- table end; (* No need to call [init], since there is just one new logical slot and we are about to write it anyway. *) a.size <- s end; Array.unsafe_set a.table (s) x (* equivalent to: [set a s x] *) let pop a = let s = a.size in (* equivalent to: [length a] *) assert (s > 0); let s = s - 1 in a.size <- s; let table = a.table in let x = Array.unsafe_get table (s) in (* equivalent to: [get a s] *) Array.unsafe_set table (s) a.default; (* equivalent to: [resize a s] *) x let default a = a.default menhir-20210929/src/resizableArray.mli000066400000000000000000000062461412503066000175240ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module implements resizable arrays, that is, arrays that can grow upon explicit request. *) type 'a t (* [make capacity default init] creates a resizable array of logical length 0, whose physical length is initially [capacity], and whose default element is [default]. The default element is used to fill empty slots in the physical array; it is otherwise irrelevant. The [init] function is used to initialize new logical slots when the logical size of the array grows, so, unlike [default], it is semantically meaningful. *) val make: int -> 'a -> (int -> 'a) -> 'a t (* [make_] is a simplified variant of [make] where the [init] function always returns [default], i.e., where new logical slots are initialized with [default] when the array is grown. *) val make_: int -> 'a -> 'a t (* [length a] returns the current logical length of the array [a]. *) val length: 'a t -> int (* [resize a n] changes the logical length of the array [a] to [n]. If the length decreases, any excess elements are lost. The capacity of the underlying physical array remains the same. If the length increases, the new positions are filled with the array's default element, as initially supplied to [make]. The capacity of the underlying physical array grows by at least a factor of two. *) val resize: 'a t -> int -> unit (* [get a i] returns the element contained at offset [i] in the array [a]. Slots are numbered 0 and up. [i] must be strictly less than the array's current logical length. *) val get: 'a t -> int -> 'a (* [set a i x] sets the element contained at offset [i] in the array [a] to [x]. Slots are numbered 0 and up. 
[i] must be strictly less than the array's current logical length. *) val set: 'a t -> int -> 'a -> unit (* [push a x] appends the element [x] at the end of the array [a], whose length increases by one. *) val push: 'a t -> 'a -> unit (* [pop a] removes the element [x] found at the end of the array [a], whose length decreases by one. The array must have nonzero length. *) val pop: 'a t -> 'a (* [default a] returns the default value that was used when the array [a] was created. This should be seldom useful, but can be convenient. *) val default: 'a t -> 'a menhir-20210929/src/segment.mll000066400000000000000000000113241412503066000162030ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This lexer is used to cut an input into segments, delimited by a blank line. (More precisely, by a run of at least one blank line and zero or more comment lines.) It produces a list of segments, where each segment is represented as a pair of positions. It is stand-alone and cannot fail. *) (* The whitespace in between two segments can contain comments, and the user may wish to preserve them. For this reason, we view a run of whitespace as a segment, too, and we accompany each segment with a tag which is either [Segment] or [Whitespace]. The two kinds of segments must alternate in the list that we produce. *) { type tag = | Segment | Whitespace open Lexing } let newline = ('\010' | '\013' | "\013\010") let whitespace = [ ' ' '\t' ';' ] let comment = '#' [^'\010''\013']* newline (* In the idle state, we skip whitespace, newlines and comments (while updating the line counter). If we reach the end of file, we return the list of all segments found so far. If we reach a non-blank non-comment character, we record its position and switch to the busy state. *) rule idle opening segments = parse | whitespace { idle opening segments lexbuf } | newline { new_line lexbuf; idle opening segments lexbuf } | comment { new_line lexbuf; idle opening segments lexbuf } | eof { let closing = lexbuf.lex_start_p in let segment = Whitespace, opening, closing in let segments = segment :: segments in List.rev segments } | _ { let closing = lexbuf.lex_start_p in let segment = Whitespace, opening, closing in let segments = segment :: segments in let opening = closing in busy segments opening false lexbuf } (* In the busy state, we skip everything, maintaining one bit [just_saw_a_newline], until [just_saw_a_newline] is true and we find a second newline. This marks the end of a segment, and we revert back to the idle state. If we reach the end of file, we consider that this is also the end of a segment. *) and busy segments opening just_saw_a_newline = parse | whitespace { busy segments opening just_saw_a_newline lexbuf } | newline { new_line lexbuf; (* The newline that we just saw is already included in the segment. This one is not included.
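Indeed, [closing] is [lexbuf.lex_start_p], that is, the position where the current newline begins.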
*) let closing = lexbuf.lex_start_p in if just_saw_a_newline then let segment = Segment, opening, closing in let segments = segment :: segments in let opening = closing in idle opening segments lexbuf else busy segments opening true lexbuf } | eof { let closing = lexbuf.lex_start_p in let segment = Segment, opening, closing in let segments = segment :: segments in List.rev segments } | _ { busy segments opening false lexbuf } { (* This wrapper function reads a file, cuts it into segments, and creates a fresh lexbuf for each segment, taking care to adjust its start position. *) let segment filename : (tag * string * lexbuf) list = let content = IO.read_whole_file filename in let lexbuf = from_string content in lexbuf.lex_curr_p <- { lexbuf.lex_curr_p with pos_fname = filename }; let segments : (tag * position * position) list = idle lexbuf.lex_curr_p [] lexbuf in List.map (fun (tag, startp, endp) -> let start = startp.pos_cnum in let length = endp.pos_cnum - start in let content = String.sub content start length in let lexbuf = from_string content in lexbuf.lex_start_p <- startp; lexbuf.lex_curr_p <- startp; lexbuf.lex_abs_pos <- startp.pos_cnum; (* That was tricky to find out. See [Lexing.engine]. [pos_cnum] is updated based on [buf.lex_abs_pos + buf.lex_curr_pos]. *) tag, content, lexbuf ) segments } menhir-20210929/src/sentenceLexer.mll000066400000000000000000000052401412503066000173450ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This lexer is used to read the sentences provided on the standard input channel when [--interpret] is enabled. *) { open Lexing open SentenceParser (* A short-hand. *) let error2 lexbuf = Error.error (Positions.lexbuf lexbuf) } let newline = ('\010' | '\013' | "\013\010") let whitespace = [ ' ' '\t' ';' ] let lowercase = ['a'-'z' '\223'-'\246' '\248'-'\255' '_'] let uppercase = ['A'-'Z' '\192'-'\214' '\216'-'\222'] let identchar = ['A'-'Z' 'a'-'z' '_' '\192'-'\214' '\216'-'\246' '\248'-'\255' '0'-'9'] (* '\'' forbidden *) let autocomment = "##" [^'\010''\013']* newline let comment = "#" [^'\010''\013']* newline let skip = newline whitespace* newline rule lex = parse (* An identifier that begins with a lowercase letter is considered a non-terminal symbol. It should be a start symbol. *) | (lowercase identchar *) as lid { NONTERMINAL (lid, lexbuf.lex_start_p, lexbuf.lex_curr_p) } (* An identifier that begins with an uppercase letter is considered a terminal symbol. *) | (uppercase identchar *) as uid { TERMINAL (uid, lexbuf.lex_start_p, lexbuf.lex_curr_p) } (* Whitespace is ignored. *) | whitespace { lex lexbuf } (* The end of a line is translated to [EOL]. *) | newline { new_line lexbuf; EOL } (* An auto-generated comment is ignored. *) | autocomment { new_line lexbuf; lex lexbuf } (* A manually-written comment is preserved. *) | comment as c { new_line lexbuf; COMMENT c } (* The end of file is translated to [EOF]. *) | eof { EOF } (* A colon.
     *)
  | ':'
      { COLON }
  | _
      { error2 lexbuf
          "unexpected character.\n\
           (I believe I am reading a sentence, but may be off.)" }
menhir-20210929/src/sentenceParser.mly000066400000000000000000000070521412503066000175420ustar00rootroot00000000000000/******************************************************************************/
/*                                                                            */
/*                                    Menhir                                  */
/*                                                                            */
/*   François Pottier, Inria Paris                                            */
/*   Yann Régis-Gianas, PPS, Université Paris Diderot                         */
/*                                                                            */
/*   Copyright Inria. All rights reserved. This file is distributed under the */
/*   terms of the GNU General Public License version 2, as described in the  */
/*   file LICENSE.                                                            */
/*                                                                            */
/******************************************************************************/

/* This is two parsers in one. */

/* This parser is used to read the sentences provided on the standard input
   channel when [--interpret] is set. The entry point is
   [optional_sentence]. */

/* It is also used to read a [.messages] file. The entry point is [entry]. */

/* This parser must be compatible with both ocamlyacc and menhir, so we use
   $ notation, do not use Menhir's standard library, and collect positions
   manually. */

/* ------------------------------------------------------------------------ */
/* Tokens. */

%token COLON EOF EOL
%token <string * Lexing.position * Lexing.position> TERMINAL
%token <string * Lexing.position * Lexing.position> NONTERMINAL
%token <string> COMMENT
  /* only manually-written comments, beginning with a single # */

/* ------------------------------------------------------------------------ */
/* Types. */

%{

  open SentenceParserAux

  (* Computing the start and end positions of a sentence. *)

  let locate_sentence (nto, terminals) =
    let opening =
      match nto, terminals with
      | Some (_, opening, _), _
      | None, (_, opening, _) :: _ ->
          opening
      | None, [] ->
          Lexing.dummy_pos (* cannot happen *)
    and closing =
      match nto, List.rev terminals with
      | _, (_, _, closing) :: _
      | Some (_, _, closing), _ ->
          closing
      | None, [] ->
          Lexing.dummy_pos (* cannot happen *)
    in
    [Positions.import (opening, closing)],
    (nto, terminals)

%}

%type <SentenceParserAux.raw_sentence> sentence
%type <SentenceParserAux.located_raw_sentence> located_sentence
%type <SentenceParserAux.raw_sentence option> optional_sentence
%type <SentenceParserAux.located_raw_sentence SentenceParserAux.or_comment list> entry

%start optional_sentence
%start entry

%%

/* ------------------------------------------------------------------------ */

/* An entry is a list of located sentences or comments. */
entry:
  located_sentences_or_comments EOF
    { $1 }

/* A list of located sentences or comments. */
located_sentences_or_comments:
    { [] }
| located_sentence located_sentences_or_comments
    { Thing $1 :: $2 }
| COMMENT located_sentences_or_comments
    { Comment $1 :: $2 }

/* A located sentence. */
located_sentence:
  sentence
    { locate_sentence $1 }

/* An optional sentence. */
optional_sentence:
| EOF
    { None }
| sentence
    { Some $1 }

/* A sentence is a pair of an optional non-terminal start symbol and a
   list of terminal symbols. It is terminated by a newline. */
sentence:
| NONTERMINAL COLON terminals EOL
    { Some $1, $3 }
| terminals EOL
    { None, $1 }

/* A list of terminal symbols. */
terminals:
|
    { [] }
| TERMINAL terminals
    { $1 :: $2 }
menhir-20210929/src/sentenceParserAux.ml000066400000000000000000000037401412503066000200270ustar00rootroot00000000000000(******************************************************************************)
(*                                                                            *)
(*                                    Menhir                                  *)
(*                                                                            *)
(*   François Pottier, Inria Paris                                            *)
(*   Yann Régis-Gianas, PPS, Université Paris Diderot                         *)
(*                                                                            *)
(*   Copyright Inria. All rights reserved. This file is distributed under the *)
(*   terms of the GNU General Public License version 2, as described in the  *)
(*   file LICENSE.
*) (* *) (******************************************************************************) open Grammar type raw_symbol = string * Lexing.position * Lexing.position type raw_nonterminal = raw_symbol type raw_terminal = raw_symbol type raw_sentence = raw_nonterminal option * raw_terminal list type located_raw_sentence = Positions.positions * raw_sentence type sentence = Nonterminal.t option * Terminal.t list type located_sentence = Positions.positions * sentence type comment = string type 'a or_comment = | Thing of 'a | Comment of comment let or_comment_iter f = function | Thing s -> f s | Comment _ -> () let or_comment_fold f accu = function | Thing s -> f accu s | Comment _ -> accu let or_comment_map f = function | Thing s -> Thing (f s) | Comment c -> Comment c let or_comment_filter_map f = function | Thing s -> Some (f s) | Comment _ -> None let or_comment_count accu = function | Thing _ -> accu + 1 | Comment _ -> accu let count_things (xs : 'a or_comment list) = List.fold_left or_comment_count 0 xs menhir-20210929/src/settings.ml000066400000000000000000000514041412503066000162300ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Printf (* ------------------------------------------------------------------------- *) (* Prepare for parsing the command line. *) type token_type_mode = | TokenTypeAndCode (* produce the definition of the [token] type and code for the parser *) | TokenTypeOnly (* produce the type definition only *) | CodeOnly of string (* produce the code only; import token type from specified module *) let token_type_mode = ref TokenTypeAndCode let tokentypeonly () = token_type_mode := TokenTypeOnly let is_uppercase_ascii c = c >= 'A' && c <= 'Z' let is_capitalized_ascii s = String.length s > 0 && is_uppercase_ascii s.[0] let codeonly m = if not (is_capitalized_ascii m) then begin (* Not using module [Error] to avoid a circular dependency. *) fprintf stderr "Error: %s is not a valid OCaml module name.\n" m; exit 1 end; token_type_mode := CodeOnly m let version = ref false type construction_mode = | ModeCanonical (* --canonical: canonical Knuth LR(1) automaton *) | ModeInclusionOnly (* --no-pager : states are merged when there is an inclusion relationship *) | ModePager (* normal mode: states are merged as per Pager's criterion *) | ModeLALR (* --lalr : states are merged as in an LALR generator, i.e. as soon as they have the same LR(0) core *) (* Note that --canonical overrides --no-pager. If both are specified, the result is a canonical automaton. 
*) let construction_mode = ref ModePager let explain = ref false let base = ref "" let dump = ref false let dump_resolved = ref false let reference_graph = ref false let automaton_graph = ref false let trace = ref false let noprefix = ref false type print_mode = | PrintNormal | PrintForOCamlyacc | PrintUnitActions of bool (* if true, declare unit tokens *) type preprocess_mode = | PMNormal (* preprocess and continue *) | PMOnlyPreprocess of print_mode (* preprocess, print grammar, stop *) let preprocess_mode = ref PMNormal let recovery = ref false let v () = dump := true; explain := true let inline = ref true type infer_mode = (* Perform no type inference. This is the default mode. *) | IMNone (* Perform type inference by invoking ocamlc directly. *) | IMInfer (* --infer *) | IMDependRaw (* --raw-depend *) | IMDependPostprocess (* --depend *) (* Perform type inference by writing a mock .ml file and reading the corresponding inferred .mli file. *) | IMWriteQuery of string (* --infer-write-query *) | IMReadReply of string (* --infer-read-reply *) let show_infer_mode = function | IMNone -> "" | IMInfer -> "--infer" | IMDependRaw -> "--raw-depend" | IMDependPostprocess -> "--depend" | IMWriteQuery _ -> "--infer-write-query" | IMReadReply _ -> "--infer-read-reply" let infer = ref IMNone let set_infer_mode mode2 = let mode1 = !infer in match mode1, mode2 with | IMNone, _ -> infer := mode2 (* It is valid to specify [--infer] in conjunction with [--depend] or [--raw-depend]. The latter command then takes precedence. This is for compatibility with Menhir prior to 2018/05/23. *) | IMInfer, (IMInfer | IMDependRaw | IMDependPostprocess) -> infer := mode2 | (IMDependRaw | IMDependPostprocess), IMInfer -> () | _, _ -> fprintf stderr "Error: you cannot use both %s and %s.\n" (show_infer_mode mode1) (show_infer_mode mode2); exit 1 let enable_infer () = set_infer_mode IMInfer let enable_depend () = set_infer_mode IMDependPostprocess let enable_raw_depend () = set_infer_mode IMDependRaw let enable_write_query filename = set_infer_mode (IMWriteQuery filename) let enable_read_reply filename = set_infer_mode (IMReadReply filename) let code_inlining = ref true let represent_positions = ref false let represent_states = ref false let represent_values = ref false let represent_everything () = represent_positions := true; represent_states := true; represent_values := true let comment = ref false let ocamlc = ref "ocamlc" let ocamldep = ref "ocamldep" let logG, logA, logC = ref 0, ref 0, ref 0 let timings = ref None let filenames = ref StringSet.empty let no_stdlib = ref false let insert name = filenames := StringSet.add name !filenames let interpret = ref false let interpret_show_cst = ref false let interpret_error = ref false let table = ref false let inspection = ref false let coq = ref false let coq_no_version_check = ref false let coq_no_complete = ref false let coq_no_actions = ref false let strict = ref false let fixedexc = ref false type suggestion = | SuggestNothing | SuggestCompFlags | SuggestLinkFlags of string (* "cmo" or "cmx" *) | SuggestWhereIsMenhirLibSource | SuggestUseOcamlfind let suggestion = ref SuggestNothing let ignored_unused_tokens = ref StringSet.empty let ignore_unused_token t = ignored_unused_tokens := StringSet.add t !ignored_unused_tokens let ignore_all_unused_tokens = ref false let ignore_all_unused_precedence_levels = ref false let list_errors = ref false let compile_errors = ref None let set_compile_errors filename = compile_errors := Some filename let compare_errors = ref [] let 
add_compare_errors filename = compare_errors := filename :: !compare_errors let merge_errors = ref [] let add_merge_errors filename = merge_errors := filename :: !merge_errors let update_errors = ref None let set_update_errors filename = update_errors := Some filename let echo_errors = ref None let set_echo_errors filename = echo_errors := Some filename let echo_errors_concrete = ref None let set_echo_errors_concrete filename = echo_errors_concrete := Some filename let cmly = ref false let coq_lib_path = ref (Some "MenhirLib") type dollars = | DollarsDisallowed | DollarsAllowed let dollars = ref DollarsAllowed let require_aliases = ref false let random_sentence_symbol = ref None let random_sentence_goal = ref 0 let random_sentence_style = ref `Abstract let random_sentence_abstract symbol = random_sentence_symbol := Some symbol; random_sentence_style := `Abstract let random_sentence_concrete symbol = random_sentence_symbol := Some symbol; random_sentence_style := `Concrete; require_aliases := true let strategy = ref `Legacy let set_strategy = function | "legacy" -> strategy := `Legacy | "simplified" -> strategy := `Simplified | _ -> eprintf "Error: --strategy should be followed with legacy | simplified.\n"; exit 1 (* When new command line options are added, please update both the manual in [doc/manual.tex] and the man page in [doc/menhir.1]. *) (* Please note that there is a very short length limit on the explanations here, since the output of [menhir -help] must fit in 80 columns. *) let options = Arg.align [ "--automaton-graph", Arg.Set automaton_graph, " (undocumented)"; "--base", Arg.Set_string base, " Specifies a base name for the output file(s)"; "--canonical", Arg.Unit (fun () -> construction_mode := ModeCanonical), " Construct a canonical Knuth LR(1) automaton"; "--cmly", Arg.Set cmly, " Write a .cmly file"; "--comment", Arg.Set comment, " Include comments in the generated code"; "--compare-errors", Arg.String add_compare_errors, " (used twice) Compare two .messages files"; "--compile-errors", Arg.String set_compile_errors, " Compile a .messages file to OCaml code"; "--coq", Arg.Set coq, " Generate a formally verified parser, in Coq"; "--coq-lib-path", Arg.String (fun path -> coq_lib_path := Some path), " How to qualify references to MenhirLib"; "--coq-lib-no-path", Arg.Unit (fun () -> coq_lib_path := None), " Do *not* qualify references to MenhirLib"; "--coq-no-version-check", Arg.Set coq_no_version_check, " Do not generate a version check."; "--coq-no-actions", Arg.Set coq_no_actions, " Ignore semantic actions in the Coq output"; "--coq-no-complete", Arg.Set coq_no_complete, " Do not generate a proof of completeness"; "--depend", Arg.Unit enable_depend, " Invoke ocamldep and display dependencies"; "--dump", Arg.Set dump, " Write an .automaton file"; "--dump-resolved", Arg.Set dump_resolved, " Write an .automaton.resolved file"; "--echo-errors", Arg.String set_echo_errors, " Echo the sentences in a .messages file"; "--echo-errors-concrete", Arg.String set_echo_errors_concrete, " Echo the sentences in a .messages file"; "--error-recovery", Arg.Set recovery, " (no longer supported)"; "--explain", Arg.Set explain, " Explain conflicts in .conflicts"; "--external-tokens", Arg.String codeonly, " Import token type definition from "; "--fixed-exception", Arg.Set fixedexc, " Declares Error = Parsing.Parse_error"; "--infer", Arg.Unit enable_infer, " Invoke ocamlc to do type inference"; "--infer-protocol-supported", Arg.Unit (fun () -> exit 0), " Stop with exit code 0"; "--infer-write-query", 
Arg.String enable_write_query, " Write mock .ml file"; "--infer-read-reply", Arg.String enable_read_reply, " Read inferred .mli file"; "--inspection", Arg.Set inspection, " Generate the inspection API"; "--interpret", Arg.Set interpret, " Interpret the sentences provided on stdin"; "--interpret-show-cst", Arg.Set interpret_show_cst, " Show a concrete syntax tree upon acceptance"; "--interpret-error", Arg.Set interpret_error, " Interpret an error sentence"; "--lalr", Arg.Unit (fun () -> construction_mode := ModeLALR), " Construct an LALR(1) automaton"; "--list-errors", Arg.Set list_errors, " Produce a list of erroneous inputs"; "--log-automaton", Arg.Set_int logA, " Log information about the automaton"; "--log-code", Arg.Set_int logC, " Log information about the generated code"; "--log-grammar", Arg.Set_int logG, " Log information about the grammar"; "--merge-errors", Arg.String add_merge_errors, " (used twice) Merge two .messages files"; "--no-code-inlining", Arg.Clear code_inlining, " (undocumented)"; "--no-dollars", Arg.Unit (fun () -> dollars := DollarsDisallowed), " Disallow $i in semantic actions"; "--no-inline", Arg.Clear inline, " Ignore the %inline keyword"; "--no-pager", Arg.Unit (fun () -> if !construction_mode = ModePager then construction_mode := ModeInclusionOnly), " (undocumented)"; "--no-prefix", Arg.Set noprefix, " (undocumented)"; "--no-stdlib", Arg.Set no_stdlib, " Do not load the standard library"; "--ocamlc", Arg.Set_string ocamlc, " Specifies how ocamlc should be invoked"; "--ocamldep", Arg.Set_string ocamldep, " Specifies how ocamldep should be invoked"; "--only-preprocess", Arg.Unit (fun () -> preprocess_mode := PMOnlyPreprocess PrintNormal), " Print grammar and exit"; "--only-preprocess-for-ocamlyacc", Arg.Unit (fun () -> preprocess_mode := PMOnlyPreprocess PrintForOCamlyacc), " Print grammar in ocamlyacc format and exit"; "--only-preprocess-u", Arg.Unit (fun () -> preprocess_mode := PMOnlyPreprocess (PrintUnitActions false)), " Print grammar with unit actions and exit"; "--only-preprocess-uu", Arg.Unit (fun () -> preprocess_mode := PMOnlyPreprocess (PrintUnitActions true)), " Print grammar with unit actions & tokens"; "--only-tokens", Arg.Unit tokentypeonly, " Generate token type definition only, no code"; "--random-seed", Arg.Int Random.init, " Set the random seed"; "--random-self-init", Arg.Unit Random.self_init, " Pick a random seed in a system-dependent way"; "--random-sentence-length", Arg.Set_int random_sentence_goal, " Set the goal length for a random sentence"; "--random-sentence", Arg.String random_sentence_abstract, " Generate a random valid sentence"; "--random-sentence-concrete", Arg.String random_sentence_concrete, " Generate a random valid sentence"; "--raw-depend", Arg.Unit enable_raw_depend, " Invoke ocamldep and echo its raw output"; "--reference-graph", Arg.Set reference_graph, " (undocumented)"; "--represent-states", Arg.Set represent_states, " (undocumented)"; "--represent-positions", Arg.Set represent_positions, " (undocumented)"; "--represent-values", Arg.Set represent_values, " (undocumented)"; "--represent-everything", Arg.Unit represent_everything, " (undocumented)"; "--require-aliases", Arg.Set require_aliases, " Check that every token has a token alias"; "--stdlib", Arg.String ignore, " Ignored (deprecated)"; "--strategy", Arg.String set_strategy, " Choose an error-handling strategy"; "--strict", Arg.Set strict, " Warnings about the grammar are errors"; "--suggest-comp-flags", Arg.Unit (fun () -> suggestion := SuggestCompFlags), " Suggest 
compilation flags for ocaml{c,opt}";
    "--suggest-link-flags-byte", Arg.Unit (fun () -> suggestion := SuggestLinkFlags "cma"),
    " Suggest link flags for ocamlc";
    "--suggest-link-flags-opt", Arg.Unit (fun () -> suggestion := SuggestLinkFlags "cmxa"),
    " Suggest link flags for ocamlopt";
    "--suggest-menhirLib", Arg.Unit (fun () -> suggestion := SuggestWhereIsMenhirLibSource),
    " Suggest where MenhirLib is installed";
    "--suggest-ocamlfind", Arg.Unit (fun () -> suggestion := SuggestUseOcamlfind),
    " (deprecated)";
    "--table", Arg.Set table,
    " Use the table-based back-end";
    "--timings", Arg.Unit (fun () -> timings := Some stderr),
    " Output internal timings to stderr";
    "--timings-to", Arg.String (fun filename -> timings := Some (open_out filename)),
    " Output internal timings to <filename>";
    "--trace", Arg.Set trace,
    " Generate tracing instructions";
    "--unused-precedence-levels", Arg.Set ignore_all_unused_precedence_levels,
    " Do not warn about unused precedence levels";
    "--unused-token", Arg.String ignore_unused_token,
    " Do not warn that <token> is unused";
    "--unused-tokens", Arg.Set ignore_all_unused_tokens,
    " Do not warn about any unused token";
    "--update-errors", Arg.String set_update_errors,
    " Update auto-comments in a .messages file";
    "--version", Arg.Set version,
    " Show version number and exit";
    "-b", Arg.Set_string base,
    " Synonymous with --base <basename>";
    "-lg", Arg.Set_int logG,
    " Synonymous with --log-grammar";
    "-la", Arg.Set_int logA,
    " Synonymous with --log-automaton";
    "-lc", Arg.Set_int logC,
    " Synonymous with --log-code";
    "-t", Arg.Set table,
    " Synonymous with --table";
    "-v", Arg.Unit v,
    " Synonymous with --dump --explain";
  ]

let usage =
  sprintf "Usage: %s <options> <filenames>" Sys.argv.(0)

(* ------------------------------------------------------------------------- *)

(* Parse the command line. *)

let () =
  Arg.parse options insert usage

(* ------------------------------------------------------------------------- *)

(* If required, print a version number and stop. *)

let () =
  if !version then begin
    printf "menhir, version %s\n" Version.version;
    exit 0
  end

(* ------------------------------------------------------------------------- *)

(* Menhir is able to suggest compile and link flags to be passed to the
   OCaml compilers. If required, do so and stop. *)

(* If [--table] is not passed, no flags are necessary. If [--table] is
   passed, then [MenhirLib] needs to be visible (at compile time) and
   linked in (at link time). *)

(* The compilation flags are in fact meant to be used both at compile- and
   link-time. *)

let () =
  match !suggestion with
  | SuggestNothing ->
      ()
  | SuggestCompFlags ->
      if !table then
        printf "-I %s\n%!" (Installation.libdir());
      exit 0
  | SuggestLinkFlags extension ->
      if !table then
        printf "menhirLib.%s\n%!" extension;
      exit 0
  | SuggestWhereIsMenhirLibSource ->
      printf "%s\n%!" (Installation.libdir());
      exit 0
  | SuggestUseOcamlfind ->
      printf "false\n";
      exit 0

(* ------------------------------------------------------------------------- *)

(* Export the settings.
   *)

let stdlib_filename =
  ""

let filenames =
  StringSet.elements !filenames

let base =
  if !base = "" then
    match filenames with
    | [] ->
        fprintf stderr "%s\n" usage; exit 1
    | [ filename ] ->
        Filename.chop_suffix filename (if !coq then ".vy" else ".mly")
    | _ ->
        fprintf stderr "Error: you must specify --base when providing multiple input files.\n";
        exit 1
  else
    !base

let token_type_mode = !token_type_mode
let construction_mode = !construction_mode
let explain = !explain
let dump = !dump
let dump_resolved = !dump_resolved
let reference_graph = !reference_graph
let automaton_graph = !automaton_graph
let trace = !trace

let () =
  if !recovery then begin
    fprintf stderr "Error: --error-recovery mode is no longer supported.\n";
    exit 1
  end

let noprefix = !noprefix
let code_inlining = !code_inlining
let represent_positions = !represent_positions
let represent_states = !represent_states
let represent_values = !represent_values
let inline = !inline
let comment = !comment
let preprocess_mode = !preprocess_mode
let ocamlc = !ocamlc
let ocamldep = !ocamldep
let logG, logA, logC = !logG, !logA, !logC
let timings = !timings
let interpret = !interpret
let interpret_show_cst = !interpret_show_cst
let interpret_error = !interpret_error
let table = !table
let inspection = !inspection

let () =
  if inspection && not table then begin
    fprintf stderr "Error: --inspection requires --table.\n";
    exit 1
  end

let no_stdlib = !no_stdlib
let coq = !coq
let coq_no_version_check = !coq_no_version_check
let coq_no_complete = !coq_no_complete
let coq_no_actions = !coq_no_actions
let strict = !strict
let fixedexc = !fixedexc
let ignored_unused_tokens = !ignored_unused_tokens
let ignore_all_unused_tokens = !ignore_all_unused_tokens
let ignore_all_unused_precedence_levels = !ignore_all_unused_precedence_levels
let list_errors = !list_errors
let compile_errors = !compile_errors

let compare_errors =
  match !compare_errors with
  | [] ->
      None
  | [ filename2; filename1 ] -> (* LIFO *)
      Some (filename1, filename2)
  | _ ->
      eprintf
        "To compare two .messages files, please use:\n\
         --compare-errors <filename1> --compare-errors <filename2>.\n";
      exit 1

let merge_errors =
  match !merge_errors with
  | [] ->
      None
  | [ filename2; filename1 ] -> (* LIFO *)
      Some (filename1, filename2)
  | _ ->
      eprintf
        "To merge two .messages files, please use:\n\
         --merge-errors <filename1> --merge-errors <filename2>.\n";
      exit 1

let update_errors = !update_errors
let echo_errors = !echo_errors
let echo_errors_concrete = !echo_errors_concrete
let cmly = !cmly
let coq_lib_path = !coq_lib_path
let dollars = !dollars
let require_aliases = !require_aliases

let random_sentence =
  match !random_sentence_symbol with
  | None ->
      None
  | Some nt ->
      let goal = !random_sentence_goal
      and style = !random_sentence_style in
      Some (nt, goal, style)

let strategy = !strategy
let infer = !infer

(* If some flags imply that we will NOT produce an OCaml parser, then there
   is no need to perform type inference, so [--infer] is ignored. This saves
   time and dependency nightmares.
*) let skipping_parser_generation = coq || compile_errors <> None || interpret_error || list_errors || compare_errors <> None || merge_errors <> None || update_errors <> None || echo_errors <> None || false (* maybe also: [preprocess_mode <> PMNormal] *) let infer = match infer with | IMInfer when skipping_parser_generation -> IMNone | _ -> infer menhir-20210929/src/settings.mli000066400000000000000000000237511412503066000164050ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module parses the command line. *) (* The list of file names that appear on the command line. *) val filenames: string list (* How to deal with the type of tokens. *) type token_type_mode = | TokenTypeAndCode (* produce the definition of the [token] type and code for the parser *) | TokenTypeOnly (* produce the type definition only *) | CodeOnly of string (* produce the code only, by relying on an external token type *) val token_type_mode: token_type_mode (* How to construct the automaton. *) type construction_mode = | ModeCanonical (* --canonical: canonical Knuth LR(1) automaton *) | ModeInclusionOnly (* --no-pager : states are merged when there is an inclusion relationship, default reductions are used *) | ModePager (* normal mode: states are merged as per Pager's criterion, default reductions are used *) | ModeLALR (* --lalr : states are merged as in an LALR generator, i.e. as soon as they have the same LR(0) core *) val construction_mode: construction_mode (* Whether conflicts should be explained. *) val explain: bool (* Whether the automaton should be dumped before conflict resolution. *) val dump: bool (* Whether the automaton should be dumped after conflict resolution. *) val dump_resolved: bool (* Whether the grammar's reference graph should be dumped. *) val reference_graph: bool (* Whether the automaton's graph should be dumped. *) val automaton_graph: bool (* Whether tracing instructions should be generated. *) val trace: bool (* Whether one should stop and print the grammar after joining and expanding the grammar. *) type print_mode = | PrintNormal | PrintForOCamlyacc | PrintUnitActions of bool (* if true, declare unit tokens *) type preprocess_mode = | PMNormal (* preprocess and continue *) | PMOnlyPreprocess of print_mode (* preprocess, print grammar, stop *) val preprocess_mode: preprocess_mode (* Whether and how OCaml type inference (for semantic actions and nonterminal symbols) should be performed. See the manual for details. *) type infer_mode = (* Perform no type inference. This is the default mode. *) | IMNone (* Perform type inference by invoking ocamlc directly. *) | IMInfer (* --infer *) | IMDependRaw (* --raw-depend *) | IMDependPostprocess (* --depend *) (* Perform type inference by writing a mock .ml file and reading the corresponding inferred .mli file. *) | IMWriteQuery of string (* --infer-write-query *) | IMReadReply of string (* --infer-read-reply *) val infer: infer_mode (* Whether one should inline the non terminal definitions marked with the %inline keyword. *) val inline: bool (* Whether comments should be printed or discarded. 
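   (This setting corresponds to the --comment command line switch.)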
   *)

val comment: bool

(* This undocumented flag suppresses prefixing of identifiers with an
   unlikely prefix in the generated code. This increases the code's
   readability, but can cause identifiers in semantic actions to be
   captured. *)

val noprefix: bool

(* This undocumented flag causes the code to be transformed by [Inline].
   It is on by default. *)

val code_inlining: bool

(**[represent_positions] forces every stack cell to contain a start position
   and an end position. This flag is [false] by default. It influences the
   code back-end only. *)
val represent_positions: bool

(**[represent_states] forces every stack cell to contain a state. This flag
   is [false] by default. It influences the code back-end only. *)
val represent_states: bool

(**[represent_values] forces every stack cell to contain a semantic value.
   This flag is [false] by default. It influences the code back-end only. *)
val represent_values: bool

(* How [ocamlc] and [ocamldep] should be invoked. *)

val ocamlc: string
val ocamldep: string

(* How verbose we should be. *)

val logG: int (* diagnostics on the grammar *)
val logA: int (* diagnostics on the automaton *)
val logC: int (* diagnostics on the generated code *)

(* Whether tasks should be timed. *)

val timings: out_channel option

(* The base name that should be used for the files that we create. This
   name can contain a path. *)

val base: string

(* The filename of the standard library. *)

val stdlib_filename : string

(* Whether Menhir should behave as an interpreter. *)

val interpret : bool

(* Whether the interpreter should build and display concrete syntax trees. *)

val interpret_show_cst : bool

(* Whether Menhir should behave as an interpreter, in a special mode where
   it checks one input sentence, expecting it to trigger an error at the
   last token, and displays which state was reached. *)

val interpret_error : bool

(* Whether to use the table-based back-end ([true]) or the code-based
   back-end ([false]). *)

val table : bool

(* Whether to generate the inspection API (which requires GADTs, and
   requires producing more tables). *)

val inspection : bool

(* Whether the standard menhir library should be used. *)

val no_stdlib : bool

(* Whether to generate a coq description of the grammar and automaton. *)

val coq : bool

(* Whether to generate a version check for MenhirLib in the generated
   parser. *)

val coq_no_version_check : bool

(* Whether the coq description must contain completeness proofs. *)

val coq_no_complete : bool

(* Whether the coq backend should ignore types and semantic actions. *)

val coq_no_actions : bool

(* Whether unresolved LR(1) conflicts, useless precedence declarations,
   productions that are never reduced, etc. should be treated as errors. *)

val strict: bool

(* This flag causes the exception [Error] to be declared equal to
   [Parsing.Parse_error]. This is useful when full compatibility with
   ocamlyacc is desired. In particular, this is used when building Menhir
   itself, since Menhir is compiled first using ocamlyacc, then using
   Menhir. *)

val fixedexc: bool

(* This is a set of tokens which may be unused and about which we should not
   emit a warning. *)

val ignored_unused_tokens: StringSet.t

(* This flag supersedes the set [ignored_unused_tokens]. If it is set, then
   we should not emit a warning about any unused tokens. *)

val ignore_all_unused_tokens: bool

(* This flag suppresses all warnings about unused precedence levels. *)

val ignore_all_unused_precedence_levels: bool

(* This flag causes Menhir to produce a list of erroneous input sentences.
Enough sentences are computed to produce exactly one error in every state where an error can occur. *) val list_errors: bool (* This flag causes Menhir to read the error message descriptions stored in [filename] and compile them to OCaml code. *) val compile_errors: string option (* If present, this is a pair of .messages files whose contents should be compared. *) val compare_errors: (string * string) option (* If present, this is a pair of .messages files whose contents should be merged. *) val merge_errors: (string * string) option (* This flag causes Menhir to read the error message descriptions stored in [filename] and re-generate the auto-generated comments, which begin with [##]. This allows bringing these comments up to date when the grammar evolves. *) val update_errors: string option (* This flag causes Menhir to read the error message descriptions stored in [filename] and echo the error sentences (and nothing else; no messages, no comments). *) val echo_errors: string option (* This flag causes Menhir to read the error message descriptions stored in [filename] and echo the error sentences, including the concrete syntax of each sentence, in an auto-comment. *) val echo_errors_concrete: string option (* This flag causes Menhir to produce a [.cmly] file, which contains a binary-format description of the grammar and automaton. *) val cmly: bool (* This name is used in --coq mode. It appears in the generated Coq file, and indicates under what name (or path) the Coq library MenhirLib is known. Its default value is [Some "MenhirLib"]. *) val coq_lib_path: string option (* This flag tells whether [$i] notation in semantic actions is allowed. *) type dollars = | DollarsDisallowed | DollarsAllowed val dollars: dollars (* This flag requires every token to come with a token alias. If that is not the case, warnings are emitted. *) val require_aliases : bool (* When [random_sentence] is [Some (nt, goal, style)], the user is asking Menhir to show a random sentence. The sentence must be generated by the nonterminal symbol [nt]. Its goal length is [goal]. The [style] parameter indicates whether the sentence should be displayed in concrete syntax; if it is [`Concrete], then every token must have a token alias. *) val random_sentence : (string * int * [`Abstract | `Concrete]) option (* The error handling strategy that should be used by the code back-end, the table back-end, and the reference interpreter. See [IncrementalEngine] for an explanation of the available strategies. *) val strategy: [`Legacy | `Simplified] menhir-20210929/src/slr.ml000066400000000000000000000125231412503066000151670ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module extends the LR(0) automaton with lookahead information in order to construct an SLR(1) automaton. The lookahead information is obtained by considering the FOLLOW sets. *) (* This construction is not used by Menhir, but can be used to check whether the grammar is in the class SLR(1). This check is performed when the log level [lg] is at least 1. 
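   As a classic textbook illustration, not taken from this codebase: in the
   grammar

     S -> L EQ R | R        L -> STAR R | ID        R -> L

   the token EQ belongs to FOLLOW(R), so the SLR(1) construction creates a
   shift/reduce conflict on EQ in the state reached after reading L, even
   though the grammar is LALR(1) and LR(1). Such a grammar thus fails this
   SLR(1) check while remaining acceptable to Menhir's usual constructions.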
*) open Grammar (* This flag, which is reserved for internal use, causes more information about SLR(1) conflict states to be printed. *) let tell_me_everything = false (* The following function turns an LR(0) state into an SLR(1) state. *) let make_slr_state (s : Lr0.node) : Lr0.concretelr1state = (* Obtain the set of LR(0) items associated with the state [s]. *) let items = Lr0.items s in (* Unfortunately, this set is not closed. We do not have a function that computes the closure of a set of LR(0) items -- we could build one using [Item.Closure], but that would be overkill. So, we first convert this set to a set of LR(1) items, then compute the closure at this level, and finally we turn this LR(1) state into an SLR(1) state by letting the lookahead sets be the FOLLOW sets. This is somewhat ugly and naïve, but seems to work. *) (* Convert this set to a set of LR(1) items. Here, we can use any set of tokens as the lookahead set. We use the empty set. *) let s = Item.Map.lift (fun _item -> TerminalSet.empty) items in (* Compute the LR(1) closure. *) let s = Lr0.closure s in (* We now have an LR(1) state that has the correct set of LR(0) items but phony lookahead information. We convert it into an SLR(1) state by deciding that, for each item, the lookahead set is the FOLLOW set of the symbol that appears on the left-hand side of the item. *) Item.Map.fold (fun item toks accu -> let _, nt, _, _, _ = Item.def item in let follow_nt = Analysis.follow nt in assert (TerminalSet.subset toks follow_nt); (* sanity check *) Item.Map.add item follow_nt accu ) s Item.Map.empty (* The following function turns a closed LR(1) state into a map of terminal symbols to reduction actions. Copied from a related function in [Lr0]. *) let reductions (s : Lr0.concretelr1state) : Production.index list TerminalMap.t = Item.Map.fold (fun item toks reductions -> match Item.classify item with | Item.Reduce prod -> Lr0.add_reductions prod toks reductions | Item.Shift _ -> reductions ) s TerminalMap.empty (* The following function turns a closed LR(1) state into a set of shift actions. *) let transitions (s : Lr0.concretelr1state) : TerminalSet.t = Item.Map.fold (fun item _ transitions -> match Item.classify item with | Item.Shift (Symbol.T tok, _) -> TerminalSet.add tok transitions | Item.Shift (Symbol.N _, _) | Item.Reduce _ -> transitions ) s TerminalSet.empty (* This function computes the domain of a terminal map, producing a terminal set. *) let domain (m : 'a TerminalMap.t) : TerminalSet.t = TerminalMap.fold (fun tok _ accu -> TerminalSet.add tok accu ) m TerminalSet.empty (* The following function checks whether a closed LR(1) state is free of conflicts. *) let state_is_ok (s : Lr0.concretelr1state) : bool = let reductions = reductions s and transitions = transitions s in (* Check for shift/reduce conflicts. *) TerminalSet.disjoint transitions (domain reductions) && (* Check for reduce/reduce conflicts. *) TerminalMap.fold (fun _ prods ok -> ok && match prods with | [] | [ _ ] -> true | _ :: _ :: _ -> false ) reductions true (* The following function counts the number of states in the SLR(1) automaton that have a conflict. *) let count_slr_violations () : int = let count = ref 0 in for s = 0 to Lr0.n - 1 do let s = make_slr_state s in if not (state_is_ok s) then begin incr count; if tell_me_everything then Printf.fprintf stderr "The following SLR(1) state has a conflict:\n%s" (Lr0.print_concrete "" s) end done; !count (* At log level 1, indicate whether the grammar is SLR(1). 
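   For instance, running [menhir --log-grammar 1 grammar.mly] (or,
   equivalently, [-lg 1]) causes this diagnostic to be printed.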
*) let check () = Error.logG 1 (fun f -> let count = count_slr_violations() in if count = 0 then Printf.fprintf f "The grammar is SLR(1).\n" else Printf.fprintf f "The grammar is not SLR(1) -- %d states have a conflict.\n" count ) menhir-20210929/src/slr.mli000066400000000000000000000025431412503066000153410ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module extends the LR(0) automaton with lookahead information in order to construct an SLR(1) automaton. The lookahead information is obtained by considering the FOLLOW sets. *) (* This construction is not used by Menhir, but can be used to check whether the grammar is in the class SLR(1). This check is performed when the log level [lg] is at least 1. *) val check: unit -> unit menhir-20210929/src/stage1/000077500000000000000000000000001412503066000152165ustar00rootroot00000000000000menhir-20210929/src/stage1/Driver.ml000066400000000000000000000030031412503066000167770ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The module [Driver] serves to offer a unified API to the parser, which could be produced by either ocamlyacc or Menhir. *) (* This is the ocamlyacc-specific driver. There is nothing special to do. We handle syntax errors in a minimalistic manner. This error handling code will be exercised only if there is a syntax error in [fancy-parser.mly], during stage 2 of the bootstrap process. *) let grammar lexer lexbuf = try Parser.grammar lexer lexbuf with Parsing.Parse_error -> Error.error (Positions.lexbuf lexbuf) "syntax error." menhir-20210929/src/stage1/dune000066400000000000000000000010561412503066000160760ustar00rootroot00000000000000;; Build the stage1 version of Menhir. During this stage, Menhir's parser ;; is generated by ocamlyacc. (ocamlyacc parser) ;; As dune cannot use the same OCaml module in two different libraries or ;; executables, we must copy the source files to the present directory. (copy_files# ../*.{ml,mli}) ;; The stage1 version of Menhir. This executable is later used to build the ;; stage2 version of Menhir. (executable (name main) (libraries unix vendored_fix vendored_pprint menhirLib menhirSdk ) (flags :standard -open MenhirSdk) ) menhir-20210929/src/stage1/parser.mly000066400000000000000000000270121412503066000172370ustar00rootroot00000000000000/******************************************************************************/ /* */ /* Menhir */ /* */ /* François Pottier, Inria Paris */ /* Yann Régis-Gianas, PPS, Université Paris Diderot */ /* */ /* Copyright Inria. All rights reserved. This file is distributed under the */ /* terms of the GNU General Public License version 2, as described in the */ /* file LICENSE. 
*/ /* */ /******************************************************************************/ /* This is the crude version of the parser. It is meant to be processed by ocamlyacc. Its existence is necessary for bootstrapping. It is kept in sync with [fancy-parser], with a few differences: 0. [yacc-parser] produces dummy position information; 1. [fancy-parser] exploits many features of Menhir; 2. [fancy-parser] performs slightly more refined error handling; 3. [fancy-parser] supports anonymous rules. 4. [fancy-parser] supports the new rule syntax. */ %{ open Syntax open Positions %} %token TOKEN TYPE LEFT RIGHT NONASSOC START PREC PUBLIC COLON BAR EOF EQUAL %token INLINE LPAREN RPAREN COMMA QUESTION STAR PLUS PARAMETER ON_ERROR_REDUCE %token PERCENTATTRIBUTE SEMI %token LID UID QID %token HEADER %token OCAMLTYPE %token PERCENTPERCENT %token ACTION %token ATTRIBUTE GRAMMARATTRIBUTE /* For the new rule syntax: */ %token LET TILDE UNDERSCORE COLONEQUAL EQUALEQUAL %start grammar %type producer %type production %type grammar /* These declarations solve a shift-reduce conflict in favor of shifting: when the declaration of a non-terminal symbol begins with a leading bar, it is understood as an (insignificant) leading optional bar, *not* as an empty right-hand side followed by a bar. This ambiguity arises due to the existence of a new notation for letting several productions share a single semantic action. */ %nonassoc no_optional_bar %nonassoc BAR %% /* ------------------------------------------------------------------------- */ /* A grammar consists of declarations and rules, followed by an optional postlude, which we do not parse. */ grammar: declarations PERCENTPERCENT rules postlude { { pg_filename = ""; (* filled in by the caller *) pg_declarations = List.rev $1; pg_rules = $3; pg_postlude = $4 } } postlude: EOF { None } | PERCENTPERCENT /* followed by actual postlude */ { Some (Lazy.force $1) } /* ------------------------------------------------------------------------- */ /* A declaration is an %{ OCaml header %}, or a %token, %start, %type, %left, %right, or %nonassoc declaration. */ declarations: /* epsilon */ { [] } | declarations declaration { $2 @ $1 } | declarations SEMI { $1 } declaration: | HEADER /* lexically delimited by %{ ... 
%} */ { [ unknown_pos (DCode $1) ] } | TOKEN optional_ocamltype terminals { let ty, ts = $2, $3 in List.map (Positions.map (fun (terminal, alias, attrs) -> DToken (ty, terminal, alias, attrs) )) ts } | START nonterminals { List.map (Positions.map (fun nonterminal -> DStart nonterminal)) $2 } | TYPE OCAMLTYPE actuals { List.map (Positions.map (fun nt -> DType ($2, nt))) (List.map Parameters.with_pos $3) } | START OCAMLTYPE nonterminals /* %start foo is syntactic sugar for %start foo %type foo */ { Misc.mapd (fun ntloc -> Positions.mapd (fun nt -> DStart nt, DType ($2, ParameterVar ntloc)) ntloc) $3 } | priority_keyword symbols { let prec = ParserAux.new_precedence_level (rhs_start_pos 1, rhs_end_pos 1) in List.map (Positions.map (fun symbol -> DTokenProperties (symbol, $1, prec))) $2 } | PARAMETER OCAMLTYPE { [ unknown_pos (DParameter $2) ] } | GRAMMARATTRIBUTE { [ unknown_pos (DGrammarAttribute $1) ] } | PERCENTATTRIBUTE actuals attributes { [ unknown_pos (DSymbolAttributes ($2, $3)) ] } | ON_ERROR_REDUCE actuals { let prec = ParserAux.new_on_error_reduce_level() in List.map (Positions.map (fun nt -> DOnErrorReduce (nt, prec))) (List.map Parameters.with_pos $2) } optional_ocamltype: /* epsilon */ { None } | OCAMLTYPE /* lexically delimited by angle brackets */ { Some $1 } priority_keyword: LEFT { LeftAssoc } | RIGHT { RightAssoc } | NONASSOC { NonAssoc } /* ------------------------------------------------------------------------- */ /* A symbol is a terminal or nonterminal symbol. */ /* One would like to require nonterminal symbols to begin with a lowercase letter, so as to lexically distinguish them from terminal symbols, which must begin with an uppercase letter. However, for compatibility with ocamlyacc, this is impossible. It can be required only for nonterminal symbols that are also start symbols. */ /* We also accept token aliases in place of ordinary terminal symbols. Token aliases are quoted strings. */ symbols: /* epsilon */ { [] } | symbols optional_comma symbol { $3 :: $1 } symbol: LID { $1 } | UID { $1 } | QID { $1 } optional_comma: /* epsilon */ { () } | COMMA { () } attributes: /* epsilon */ { [] } | ATTRIBUTE attributes { $1 :: $2 } /* ------------------------------------------------------------------------- */ /* Terminals must begin with an uppercase letter. Nonterminals that are declared to be start symbols must begin with a lowercase letter. */ terminals: /* epsilon */ { [] } | terminals optional_comma UID optional_alias attributes { let ts, uid, alias, attrs = $1, $3, $4, $5 in let alias = Option.map Positions.value alias in Positions.map (fun uid -> uid, alias, attrs) uid :: ts } nonterminals: /* epsilon */ { [] } | nonterminals LID { $2 :: $1 } optional_alias: /* epsilon */ { None } | QID { Some $1 } /* ------------------------------------------------------------------------- */ /* A rule defines a symbol. It is optionally declared %public, and optionally carries a number of formal parameters. The right-hand side of the definition consists of a list of production groups. 
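   For instance, the following hypothetical rule (not part of this file) is
   public, carries two formal parameters, and consists of a single
   production group:

     %public pair(X, Y): x = X; y = Y { (x, y) }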
*/

rules:
  /* epsilon */
    { [] }
| rules rule
    { $2 :: $1 }
| rules SEMI
    { $1 }

rule:
  flags
  symbol
  attributes
  optional_formal_parameters
  COLON
  optional_bar
  production_group
  production_groups
    { let public, inline = $1 in
      { pr_public_flag = public;
        pr_inline_flag = inline;
        pr_nt = Positions.value $2;
        pr_positions = [ Positions.position $2 ];
        pr_attributes = $3;
        pr_parameters = $4;
        pr_branches = List.flatten ($7 :: List.rev $8)
      }
    }

flags:
  /* epsilon */
    { false, false }
| PUBLIC
    { true, false }
| INLINE
    { false, true }
| PUBLIC INLINE
    { true, true }
| INLINE PUBLIC
    { true, true }

/* ------------------------------------------------------------------------- */
/* Parameters are surrounded with parentheses and delimited by commas.
   The syntax of actual parameters allows applications, whereas the syntax
   of formal parameters does not. It also allows use of the "?", "+", and
   "*" shortcuts. */

optional_formal_parameters:
  /* epsilon */
    { [] }
| LPAREN formal_parameters RPAREN
    { $2 }

formal_parameters:
  symbol
    { [ Positions.value $1 ] }
| symbol COMMA formal_parameters
    { Positions.value $1 :: $3 }

optional_actuals:
  /* epsilon */
    { [] }
| LPAREN actuals_comma RPAREN
    { $2 }

actuals_comma:
  actual
    { [ $1 ] }
| actual COMMA actuals_comma
    { $1 :: $3 }

actual:
  symbol optional_actuals
    { Parameters.app $1 $2 }
| actual modifier
    { ParameterApp ($2, [ $1 ]) }

actuals:
  /* epsilon */
    { [] }
| actuals optional_comma actual
    { $3::$1 }

optional_bar:
  /* epsilon */ %prec no_optional_bar
    { () }
| BAR
    { () }

/* ------------------------------------------------------------------------- */
/* The "?", "+", and "*" modifiers are short-hands for applications of
   certain parameterized nonterminals, defined in the standard library. */

modifier:
  QUESTION
    { unknown_pos "option" }
| PLUS
    { unknown_pos "nonempty_list" }
| STAR
    { unknown_pos "list" }

/* ------------------------------------------------------------------------- */
/* A production group consists of a list of productions, followed by a
   semantic action and an optional precedence specification. */

production_groups:
  /* epsilon */
    { [] }
| production_groups BAR production_group
    { $3 :: $1 }

production_group:
  productions ACTION /* action is lexically delimited by braces */ optional_precedence
    { let productions, action, oprec2 = $1, $2, $3 in
      (* If multiple productions share a single semantic action, check
         that all of them bind the same names. *)
      ParserAux.check_production_group productions;
      (* Then, *)
      List.map (fun (producers, oprec1, level, pos) ->
        (* Replace [$i] with [_i]. *)
        let pr_producers = ParserAux.normalize_producers producers in
        (* Distribute the semantic action. Also, check that every [$i]
           is within bounds. *)
        let names = ParserAux.producer_names producers in
        let pr_action = action Settings.dollars names in
        { pr_producers;
          pr_action;
          pr_branch_prec_annotation = ParserAux.override pos oprec1 oprec2;
          pr_branch_production_level = level;
          pr_branch_position = pos
        }) productions
    }

optional_precedence:
  /* epsilon */
    { None }
| PREC symbol
    { Some $2 }

/* ------------------------------------------------------------------------- */
/* A production is a list of producers, optionally followed by a
   precedence declaration. Lists of productions are nonempty and separated
   with bars.
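   For instance, in the following hypothetical definition (not part of this
   grammar), two productions share one semantic action, so both productions
   must bind the same names:

     expr: i = INT | LPAREN i = expr RPAREN { i }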
*/ productions: production { [ $1 ] } | production bar_productions { $1 :: $2 } bar_productions: BAR production { [ $2 ] } | BAR production bar_productions { $2 :: $3 } production: producers optional_precedence { List.rev $1, $2, ParserAux.new_production_level(), Positions.import (symbol_start_pos(), symbol_end_pos()) } producers: /* epsilon */ { [] } | producers producer { $2 :: $1 } /* ------------------------------------------------------------------------- */ /* A producer is an actual parameter, possibly preceded by a binding, and possibly followed with attributes. */ producer: | actual attributes optional_semis { Positions.import (symbol_start_pos(), symbol_end_pos()), None, $1, $2 } | LID EQUAL actual attributes optional_semis { Positions.import (symbol_start_pos(), symbol_end_pos()), Some $1, $3, $4 } /* ------------------------------------------------------------------------- */ /* Semicolons used to be considered whitespace by our lexer, but are no longer. We must allow optional semicolons in a few conventional places. */ optional_semis: /* empty */ { () } | optional_semis SEMI { () } %% menhir-20210929/src/stage2/000077500000000000000000000000001412503066000152175ustar00rootroot00000000000000menhir-20210929/src/stage2/Driver.ml000066400000000000000000000056331412503066000170130ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The module [Driver] serves to offer a unified API to the parser, which could be produced by either ocamlyacc or Menhir. *) (* This is the Menhir-specific driver. We wish to handle syntax errors in a more ambitious manner, so as to help our end users understand their mistakes. *) open Parser.MenhirInterpreter (* incremental API to our parser *) (* [fail buffer lexbuf s] is invoked if a syntax error is encountered in state [s]. *) let fail buffer lexbuf (s : int) = (* Display a nice error message. In principle, the table found in [ParserMessages] should be complete, so we should obtain a nice message. If [Not_found] is raised, we produce a generic message, which is better than nothing. Note that the OCaml code in [ParserMessages] is auto-generated based on the table in [ParserMessages.messages]. *) let message = try ParserMessages.message s with Not_found -> Printf.sprintf "Unknown syntax error (in state %d).\n" s in (* Show the two tokens between which the error took place. *) let where = MenhirLib.ErrorReports.show InputFile.chunk buffer in (* Hack: remove the final newline, because [Error.error] adds one. *) let message = String.sub message 0 (String.length message - 1) in (* Display our message and die. *) Error.error (Positions.lexbuf lexbuf) "syntax error %s.\n%s" where message (* Same as above, except we expect a checkpoint instead of a state [s]. *) let fail buffer lexbuf checkpoint = match checkpoint with | HandlingError env -> let s = current_state_number env in fail buffer lexbuf s | _ -> assert false (* this cannot happen *) (* The entry point. *) let grammar lexer lexbuf = (* Keep track of the last two tokens in a buffer. 
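     This buffer is the one that is consulted by [fail] above, via
     [MenhirLib.ErrorReports.show], in order to display the part of the
     input where the syntax error took place.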
*) let buffer, lexer = MenhirLib.ErrorReports.wrap lexer in loop_handle (fun v -> v) (fail buffer lexbuf) (lexer_lexbuf_to_supplier lexer lexbuf) (Parser.Incremental.grammar lexbuf.Lexing.lex_curr_p) menhir-20210929/src/stage2/Makefile000066400000000000000000000013741412503066000166640ustar00rootroot00000000000000# The directory used by dune to mirror this directory. BUILD_DIR := ../../_build/default/src/stage2 # [make update] is used under the programmer's manual control, after the # grammar in [parser.mly] has been modified. # It updates the file [parserMessages.messages] with new auto-generated # comments for all error states. .PHONY: update update: @ dune build --force parserMessages.messages.updated @ cp $(BUILD_DIR)/parserMessages.messages.updated parserMessages.messages # [make strip] strips away the auto-generated comments found in the file # parserMessages.messages. It is typically used after [make update], which # creates many such comments. .PHONY: strip strip: @ sed -e "/^##/d" -i.bak parserMessages.messages @ rm parserMessages.messages.bak menhir-20210929/src/stage2/dune000066400000000000000000000107321412503066000161000ustar00rootroot00000000000000;; Build the stage2 version of Menhir. During this stage, Menhir's parser ;; is generated by the stage1 Menhir executable. ;; ----------------------------------------------------------------------------- ;; The flags that are passed to every invocation of Menhir in the rules below ;; are set in the file "menhir_flags". Any flags that affect the construction ;; of the automaton, such as --canonical, *must* be listed there. ;; We need these flags in "s-expression" format in order to use them in the ;; "menhir" stanza below. The following rule generates a file in this format ;; by wrapping the list of arguments in parentheses. (rule (with-stdout-to menhir_flags.sexp (progn (echo "(") (cat %{dep:menhir_flags}) (echo ")") ) ) ) ;; ----------------------------------------------------------------------------- ;; Bind the name "menhir" to "../stage1/main.exe" within the present scope. ;; This is so that the "menhir" stanza below will use *that* executable ;; instead of whatever "menhir" executable (if any) is available on the ;; developer's machine. (env (_ (binaries ../stage1/main.exe (../stage1/main.exe as menhir))) ) ;; Menhir's parser is generated by Menhir. ;; We include the flags found in the file "menhir_flags" plus extra flags ;; specified here. (menhir (flags (:include menhir_flags.sexp) --strict -lg 1 -la 1 -lc 1 -v ) (modules parser) ) ;; ----------------------------------------------------------------------------- ;; As dune cannot use the same OCaml module in two different libraries or ;; executables, we must copy the source files to the present directory. (copy_files# ../*.{ml,mli}) ;; ----------------------------------------------------------------------------- ;; The stage2 version of Menhir. The stanza is identical to that used for the ;; stage1 version, but the [Driver] and [Parser] modules are different. ;; The link_deps field requires running the completeness check. (executable (name main) (libraries unix vendored_fix vendored_pprint menhirLib menhirSdk ) (flags :standard -open MenhirSdk) (link_deps parserMessages.check) ) ;; ----------------------------------------------------------------------------- ;; Install the Menhir executable under the "menhir" name. This would usually ;; be achieved by adding a "public_name" field in the "executable" stanza ;; above. 
However, we cannot do that here, because the public name "menhir" ;; would clash with the binding of this name to the stage1 version of Menhir ;; at the top of this file. Thus, we explicitly request its installation as ;; follows. (install (section bin) (package menhir) (files (./main.exe as menhir)) ) ;; ----------------------------------------------------------------------------- ;; This section deals with the .messages file. ;; The module [ParserMessages] is generated by Menhir based on the source file ;; "parserMessages.messages". The completeness check is performed first. (rule (deps parserMessages.check) (action (with-stdout-to parserMessages.ml (run menhir %{read-lines:menhir_flags} %{dep:parser.mly} --compile-errors %{dep:parserMessages.messages} ) )) ) ;; In order to perform the completeness check, we must first generate a file ;; "parserMessages.auto.messages" that contains a list of all error states. (rule (with-stdout-to parserMessages.auto.messages (run menhir %{read-lines:menhir_flags} %{dep:parser.mly} --list-errors ) ) ) ;; The completeness check verifies that all error messages are listed in the ;; ".messages" file. It compares the ".messages" file with that generated by ;; Menhir using the above rule. (rule (with-stdout-to parserMessages.check (run menhir %{read-lines:menhir_flags} %{dep:parser.mly} --compare-errors %{dep:parserMessages.auto.messages} --compare-errors %{dep:parserMessages.messages} )) ) ;; ----------------------------------------------------------------------------- ;; The following rule is used under the programmer's manual control, ;; after the grammar in [src/stage2/parser.mly] has been modified. ;; This rule updates the file [parserMessages.messages] with new ;; auto-generated comments for all error states. ;; It is invoked by running [make update] in the directory src/. (rule (with-stdout-to parserMessages.messages.updated (run menhir %{read-lines:menhir_flags} --update-errors %{dep:parserMessages.messages} %{dep:parser.mly} ) ) ) menhir-20210929/src/stage2/menhir_flags000066400000000000000000000001001412503066000175670ustar00rootroot00000000000000--no-pager --table --fixed-exception --require-aliases --strict menhir-20210929/src/stage2/parser.mly000066400000000000000000000621151412503066000172430ustar00rootroot00000000000000/******************************************************************************/ /* */ /* Menhir */ /* */ /* François Pottier, Inria Paris */ /* Yann Régis-Gianas, PPS, Université Paris Diderot */ /* */ /* Copyright Inria. All rights reserved. This file is distributed under the */ /* terms of the GNU General Public License version 2, as described in the */ /* file LICENSE. */ /* */ /******************************************************************************/ /* This is the fancy version of the parser, to be processed by menhir. It is kept in sync with [Parser], but exercises menhir's features. */ /* As of 2014/12/02, the $previouserror keyword and the --error-recovery mode no longer exist. Thus, we replace all calls to [Error.signal] with calls to [Error.error], and report just one error. */ /* ------------------------------------------------------------------------- */ /* Imports. */ %{ open Stretch open Syntax open Positions (* An injection of symbol expressions into choice expressions. 
*) let inject (e : symbol_expression located) : expression = Positions.pmap (fun pos e -> let branch = Branch ( Positions.with_pos pos (ESingleton e), ParserAux.new_production_level() ) in EChoice [ branch ] ) e (* When a stretch has been created by [Lexer.mk_stretch] with [parenthesize] set to [true], it includes parentheses. In some (rare) cases, this is undesirable. The following function removes the parentheses a posteriori. They are replaced with whitespace, so as to not alter column numbers. *) let rec find s n i = assert (i < n); if s.[i] = '(' then i else begin assert (s.[i] = ' '); find s n (i+1) end let unparenthesize (s : string) : string = let n = String.length s in (* The string [s] must end with a closing parenthesis. *) assert (n >= 2 && s.[n-1] = ')'); (* The string [s] must begin with a certain amount of spaces followed with an opening parenthesis. Find its offset [i]. *) let i = find s n 0 in (* Create a copy without the parentheses. *) let b = Bytes.of_string s in Bytes.set b i ' '; Bytes.set b (n-1) ' '; Bytes.to_string b let unparenthesize (s : Stretch.t) : Stretch.t = { s with stretch_content = unparenthesize s.stretch_content } let unparenthesize (o : Stretch.t option) : Stretch.t option = Option.map unparenthesize o %} /* ------------------------------------------------------------------------- */ /* Tokens. */ %token TOKEN "%token" TYPE "%type" LEFT "%left" RIGHT "%right" NONASSOC "%nonassoc" START "%start" PREC "%prec" PUBLIC "%public" COLON ":" BAR "|" EOF "" EQUAL "=" INLINE "%inline" LPAREN "(" RPAREN ")" COMMA "," QUESTION "?" STAR "*" PLUS "+" PARAMETER "%parameter" ON_ERROR_REDUCE "%on_error_reduce" PERCENTATTRIBUTE "%attribute" SEMI ";" %token LID "lident" UID "UIdent" QID "\"alias\"" %token HEADER "%{ header %}" %token OCAMLTYPE "" %token PERCENTPERCENT "%%" %token ACTION "{}" %token ATTRIBUTE "[@foo]" GRAMMARATTRIBUTE "%[@foo]" /* For the new rule syntax: */ %token LET "let" TILDE "~" UNDERSCORE "_" COLONEQUAL ":=" EQUALEQUAL "==" /* ------------------------------------------------------------------------- */ /* Type annotations and start symbol. */ %type producer %type production %start grammar /* ------------------------------------------------------------------------- */ /* Priorities. */ /* These declarations solve a shift-reduce conflict in favor of shifting: when the right-hand side of an old-style rule begins with a leading bar, this bar is understood as an (insignificant) leading optional bar, *not* as an empty right-hand side followed by a bar. This ambiguity arises due to the possibility for several productions to share a single semantic action. The new rule syntax does not have this possibility, and has no ambiguity. */ %nonassoc no_optional_bar %nonassoc BAR /* ------------------------------------------------------------------------- */ /* On-error-reduce declarations. */ /* These declarations reduce the number of states where an error can occur, thus reduce the number of syntax error messages that we have to write in parserMessages.messages. 
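Roughly speaking, when an error is about to be detected, Menhir is allowed to first reduce one of the productions named below; many error situations are thereby funneled into a single state, for which a single message suffices.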
*/ %on_error_reduce old_rule %on_error_reduce list(ATTRIBUTE) %on_error_reduce action_expression %on_error_reduce separated_nonempty_list(COMMA,symbol) %on_error_reduce separated_nonempty_list(COMMA,pattern) %on_error_reduce loption(delimited(LPAREN,separated_nonempty_list(COMMA,lax_actual),RPAREN)) %on_error_reduce loption(delimited(LPAREN,separated_nonempty_list(COMMA,expression),RPAREN)) %% /* ------------------------------------------------------------------------- */ /* A grammar consists of declarations and rules, followed by an optional postlude, which we do not parse. */ grammar: ds = flatten(declaration*) PERCENTPERCENT rs = rule* t = postlude { { pg_filename = ""; (* filled in by the caller *) pg_declarations = ds; pg_rules = rs; pg_postlude = t } } /* ------------------------------------------------------------------------- */ /* A declaration is an %{ OCaml header %}, or a %token, %start, %type, %left, %right, or %nonassoc declaration. */ declaration: | h = HEADER /* lexically delimited by %{ ... %} */ { [ with_loc $loc (DCode h) ] } | TOKEN ty = OCAMLTYPE? ts = clist(terminal_alias_attrs) { List.map (Positions.map (fun (terminal, alias, attrs) -> DToken (ty, terminal, alias, attrs) )) ts } | START t = OCAMLTYPE? nts = clist(nonterminal) /* %start foo is syntactic sugar for %start foo %type foo */ { match t with | None -> List.map (Positions.map (fun nonterminal -> DStart nonterminal)) nts | Some t -> Misc.mapd (fun ntloc -> Positions.mapd (fun nt -> DStart nt, DType (t, ParameterVar ntloc)) ntloc) nts } | TYPE t = OCAMLTYPE ss = clist(strict_actual) { List.map (Positions.map (fun nt -> DType (t, nt))) (List.map Parameters.with_pos ss) } | k = priority_keyword ss = clist(symbol) { let prec = ParserAux.new_precedence_level $loc(k) in List.map (Positions.map (fun symbol -> DTokenProperties (symbol, k, prec))) ss } | PARAMETER t = OCAMLTYPE { [ with_loc $loc (DParameter t) ] } | attr = GRAMMARATTRIBUTE { [ with_loc $loc (DGrammarAttribute attr) ] } | PERCENTATTRIBUTE actuals = clist(strict_actual) attrs = ATTRIBUTE+ { [ with_loc $loc (DSymbolAttributes (actuals, attrs)) ] } | ON_ERROR_REDUCE ss = clist(strict_actual) { let prec = ParserAux.new_on_error_reduce_level() in List.map (Positions.map (fun nt -> DOnErrorReduce (nt, prec))) (List.map Parameters.with_pos ss) } | SEMI { [] } /* This production recognizes tokens that are valid in the rules section, but not in the declarations section. This is a hint that a %% was forgotten. */ | rule_specific_token { Error.error [Positions.import $loc] "syntax error inside a declaration.\n\ Did you perhaps forget the %%%% that separates declarations and rules?" } priority_keyword: LEFT { LeftAssoc } | RIGHT { RightAssoc } | NONASSOC { NonAssoc } %inline rule_specific_token: | PUBLIC | INLINE | COLON | LET | EOF { () } /* ------------------------------------------------------------------------- */ /* Our lists of symbols are separated with optional commas. Order is irrelevant. */ %inline clist(X): xs = separated_nonempty_list(COMMA?, X) { xs } /* ------------------------------------------------------------------------- */ /* A symbol is a terminal or nonterminal symbol. */ /* One would like to require nonterminal symbols to begin with a lowercase letter, so as to lexically distinguish them from terminal symbols, which must begin with an uppercase letter. However, for compatibility with ocamlyacc, this is impossible. It can be required only for nonterminal symbols that are also start symbols. 
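(Accordingly, [nonterminal] below, which is used in %start declarations, accepts only LID, whereas [symbol] accepts LID, UID, and QID.)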
*/ /* We also accept token aliases in place of ordinary terminal symbols. Token aliases are quoted strings. */ symbol: id = LID | id = UID | id = QID { id } /* ------------------------------------------------------------------------- */ /* Terminals must begin with an uppercase letter. Nonterminals that are declared to be start symbols must begin with a lowercase letter. */ /* In declarations, terminals must be UIDs, but we may also declare token aliases, which are QIDs. */ %inline terminal_alias_attrs: id = UID alias = QID? attrs = ATTRIBUTE* { let alias = Option.map Positions.value alias in Positions.map (fun uid -> uid, alias, attrs) id } %inline nonterminal: id = LID { id } /* ------------------------------------------------------------------------- */ /* A rule is expressed either in the traditional (yacc-style) syntax or in the new syntax. */ %inline rule: old_rule { $1 } | new_rule /* The new syntax is converted on the fly to the old syntax. */ { NewRuleSyntax.rule $1 } /* ------------------------------------------------------------------------- */ /* A rule defines a symbol. It is optionally declared %public, and optionally carries a number of formal parameters. The right-hand side of the definition consists of a list of productions. */ old_rule: flags = flags /* flags */ symbol = symbol /* the symbol that is being defined */ attributes = ATTRIBUTE* params = plist(symbol) /* formal parameters */ COLON optional_bar branches = branches SEMI* { let public, inline = flags in let rule = { pr_public_flag = public; pr_inline_flag = inline; pr_nt = Positions.value symbol; pr_positions = [ Positions.position symbol ]; pr_attributes = attributes; pr_parameters = List.map Positions.value params; pr_branches = branches } in rule } %inline branches: prods = separated_nonempty_list(BAR, production_group) { List.flatten prods } flags: /* epsilon */ { false, false } | PUBLIC { true, false } | INLINE { false, true } | PUBLIC INLINE | INLINE PUBLIC { true, true } optional_bar: /* epsilon */ %prec no_optional_bar | BAR { () } /* ------------------------------------------------------------------------- */ /* A production group consists of a list of productions, followed by a semantic action and an optional precedence specification. */ production_group: productions = separated_nonempty_list(BAR, production) action = ACTION oprec2 = ioption(precedence) { (* If multiple productions share a single semantic action, check that all of them bind the same names. *) ParserAux.check_production_group productions; (* Then, *) List.map (fun (producers, oprec1, level, pos) -> (* Replace [$i] with [_i]. *) let pr_producers = ParserAux.normalize_producers producers in (* Distribute the semantic action. Also, check that every [$i] is within bounds. *) let names = ParserAux.producer_names producers in let pr_action = action Settings.dollars names in { pr_producers; pr_action; pr_branch_prec_annotation = ParserAux.override pos oprec1 oprec2; pr_branch_production_level = level; pr_branch_position = pos }) productions } precedence: PREC symbol = symbol { symbol } /* ------------------------------------------------------------------------- */ /* A production is a list of producers, optionally followed by a precedence declaration. 
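For instance, in [MINUS e = expr %prec UMINUS], the producers are [MINUS] and [e = expr], while [%prec UMINUS] is the precedence annotation.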
*/ production: producers = producer* oprec = ioption(precedence) { producers, oprec, ParserAux.new_production_level(), Positions.import $loc } /* ------------------------------------------------------------------------- */ /* A producer is an actual parameter, possibly preceded by a binding, and possibly followed with attributes. Because both [ioption] and [terminated] are defined as inlined by the standard library, this definition expands to two productions, one of which begins with id = LID, the other of which begins with p = actual. The token LID is in FIRST(actual), but the LR(1) formalism can deal with that. If [option] was used instead of [ioption], an LR(1) conflict would arise -- looking ahead at LID would not allow determining whether to reduce an empty [option] or to shift. */ producer: | id = ioption(terminated(LID, EQUAL)) p = actual attrs = ATTRIBUTE* SEMI* { position (with_loc $loc ()), id, p, attrs } /* ------------------------------------------------------------------------- */ /* The ideal syntax of actual parameters includes: 1. a symbol, optionally applied to a list of actual parameters; 2. an actual parameter followed with a modifier; 3. an anonymous rule. (Not delimited by parentheses! Otherwise one would often end up writing two pairs of parentheses.) */ /* In order to avoid a few ambiguities, we restrict this ideal syntax as follows: a. Within a %type declaration, we use [strict_actual], which allows 1- and 2- (this is undocumented; the documentation says we require a symbol) but not 3-, which would not make semantic sense anyway. b. Within a producer, we use [actual], which allows 1- and 2- but not 3-. Case 3- is allowed by switching to [lax_actual] within the actual arguments of an application, which are clearly delimited by parentheses and commas. c. In front of a modifier, we can never allow [lax_actual], as this would create an ambiguity: basically, [A | B?] could be interpreted either as [(A | B)?] or as [A | (B?)]. */ %inline generic_actual(A, B): (* 1- *) symbol = symbol actuals = plist(A) { Parameters.app symbol actuals } (* 2- *) | p = B m = located(modifier) { ParameterApp (m, [ p ]) } strict_actual: p = generic_actual(strict_actual, strict_actual) { p } actual: p = generic_actual(lax_actual, actual) { p } lax_actual: p = generic_actual(lax_actual, /* cannot be lax_ */ actual) { p } (* 3- *) | /* leading bar disallowed */ branches = located(branches) { ParameterAnonymous branches } (* 2016/05/18: we used to eliminate anonymous rules on the fly during parsing. However, when an anonymous rule appears in a parameterized definition, the fresh nonterminal symbol that is created should be parameterized. This was not done, and is not easy to do on the fly, as it requires inherited attributes (or a way of simulating them). We now use explicit abstract syntax for anonymous rules. *) /* ------------------------------------------------------------------------- */ /* The "?", "+", and "*" modifiers are short-hands for applications of certain parameterized nonterminals, defined in the standard library. */ modifier: QUESTION { "option" } | PLUS { "nonempty_list" } | STAR { "list" } /* ------------------------------------------------------------------------- */ /* A postlude is announced by %%, but is optional. 
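If this %% is absent, the grammar simply ends at EOF, and the postlude is [None].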
*/ postlude: EOF { None } | p = PERCENTPERCENT /* followed by actual postlude */ { Some (Lazy.force p) } /* -------------------------------------------------------------------------- */ /* -------------------------------------------------------------------------- */ /* The new rule syntax. */ /* Whereas the old rule syntax allows a nonterminal symbol to begin with an uppercase letter, the new rule syntax disallows it. The left-hand side of a new rule must be a lowercase identifier [LID]. */ /* A new rule *cannot* be terminated by a semicolon. (This is in contrast with a traditional rule, which can be followed with any number of semicolons.) We are forced to forbid the use of semicolons as a rule terminator because they are used already as a sequencing construct. Permitting both uses would give rise to a shift/reduce conflict that we would not be able to solve. */ new_rule: | rule_public = boption(PUBLIC) LET rule_lhs = LID rule_attributes = ATTRIBUTE* rule_formals = plist(symbol) rule_inline = equality_symbol rule_rhs = expression {{ rule_public; rule_inline; rule_lhs; rule_attributes; rule_formals; rule_rhs; }} /* A new rule is written [let foo := ...] or [let foo == ...]. In the former case, we get an ordinary nonterminal symbol; in the latter case, we get an %inline nonterminal symbol. */ equality_symbol: COLONEQUAL { false } | EQUALEQUAL { true } /* The right-hand side of a new rule is an expression. */ /* An expression is a choice expression. */ expression: e = located(choice_expression) { e } /* A choice expression is a bar-separated list of alternatives, with an optional leading bar, which is ignored. Each alternative is a sequence expression. */ /* We cannot allow a choice expression to be empty, even though that would make semantic sense (the empty sum is void). Indeed, that would create a shift/reduce conflict: after reading [def x = y], it would be unclear whether this is a definition of [x] as an alias for [y], or a definition of [x] as an alias for the empty sum, followed with an old-style rule that happens to begin with [y]. */ %inline choice_expression: branches = preceded_or_separated_nonempty_llist(BAR, branch) { EChoice branches } %inline branch: e = seq_expression { Branch (e, ParserAux.new_production_level()) } /* A sequence expression takes one of the following forms: e1; e2 a sequence that binds no variables (sugar for _ = e1; e2) p = e1; e2 a sequence that binds the variables in the pattern p or is a symbol expression or an action expression. */ /* Allowing a symbol expression [e] where a sequence expression is expected can be understood as syntactic sugar for [x = e; { x }]. */ /* In a sequence [e1; e2] or [p = e1; e2], the left-hand expression [e1] is *not* allowed to be an action expression. That would be a Bison-style midrule action. Instead, one must explicitly write [midrule({ ... })]. */ /* In a sequence, the semicolon cannot be omitted. This is in contrast with old-style rules, where semicolons are optional. Here, semicolons are required for disambiguation: indeed, in the absence of mandatory semicolons, when a sequence begins with x(y,z), it would be unclear whether 1- x is a parameterized symbol and (y,z) are its actual arguments, or 2- x is unparameterized and (y, z) is a tuple pattern which forms the beginning of the next element of the sequence. */ /* We *could* allow the semicolon to be omitted when it precedes an action expression (as opposed to a sequence expression).
This would be implemented in the definition of the nonterminal symbol [continuation]. We choose not to do this, as we wish to make it clear in this case that this is a sequence whose last element is the action expression. */ %inline seq_expression: e = located(raw_seq_expression) { e } raw_seq_expression: | e1 = symbol_expression e2 = continuation { ECons (SemPatWildcard, e1, e2) } | p1 = pattern EQUAL e1 = symbol_expression e2 = continuation { ECons (p1, e1, e2) } | e = symbol_expression { ESingleton e } | e = action_expression { e } %inline continuation: SEMI e2 = seq_expression /* | e2 = action_expression */ { e2 } /* A symbol expression takes one of the following forms: foo(...) a terminal or nonterminal symbol (with parameters) e* same as above e+ same as above e? same as above */ /* Note the absence of parenthesized expressions [(e)] in the syntax of symbol expressions. There are two reasons why they are omitted. At the syntactic level, introducing them would create a conflict. At a semantic level, they are both unnecessary and ambiguous, as one can instead write [endrule(e)] or [midrule(e)] and thereby indicate whether the anonymous nonterminal symbol that is generated should or should not be marked %inline. */ symbol_expression: | symbol = symbol es = plist(expression) attrs = ATTRIBUTE* { ESymbol (symbol, es, attrs) } | e = located(symbol_expression) m = located(modifier) attrs = ATTRIBUTE* (* We are forced by syntactic considerations to require a symbol expression in a position where an expression is expected. As a result, an injection must be applied. *) { ESymbol (m, [ inject e ], attrs) } /* An action expression is a semantic action, optionally preceded or followed with a precedence annotation. */ action_expression: | action = action { EAction (action, None) } | prec = precedence action = action { EAction (action, Some prec) } | action = action prec = precedence { EAction (action, Some prec) } /* A semantic action is either a traditional semantic action (an OCaml expression between curly braces) or a point-free semantic action (an optional OCaml identifier between angle brackets). */ /* The token OCAMLTYPE, which until now was supposed to denote an OCaml type between angle brackets, is re-used for this purpose. This is not very pretty. */ /* The stretch produced by the lexer is validated -- i.e., we check that it contains just an OCaml identifier, or is empty. The parentheses added by the lexer to the [stretch_content] field are removed (ugh!) because they are problematic when this identifier is a data constructor. */ action: action = ACTION { XATraditional action } | action = OCAMLTYPE { match ParserAux.validate_pointfree_action action with | os -> XAPointFree (unparenthesize os) | exception Lexpointfree.InvalidPointFreeAction -> Error.error [Positions.import $loc] "A point-free semantic action must consist \ of a single OCaml identifier." (* or whitespace *) } /* Patterns. */ pattern: | x = LID { SemPatVar x } | UNDERSCORE { SemPatWildcard } | TILDE { SemPatTilde (Positions.import $loc) } | LPAREN ps = separated_list(COMMA, pattern) RPAREN { SemPatTuple ps } (* -------------------------------------------------------------------------- *) (* -------------------------------------------------------------------------- *) (* Generic definitions. *) (* ------------------------------------------------------------------------- *) (* Formal and actual parameter lists can be absent. When present, they must be nonempty, and are delimited with parentheses and separated with commas. 
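For instance, [plist(X)] recognizes the empty string, producing the empty list, as well as (X) and (X, X), producing lists of one and two elements.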
*) %inline plist(X): params = loption(delimited(LPAREN, separated_nonempty_list(COMMA, X), RPAREN)) { params } (* -------------------------------------------------------------------------- *) (* [reversed_preceded_or_separated_nonempty_llist(delimiter, X)] recognizes a nonempty list of [X]s, separated with [delimiter]s, and optionally preceded with a leading [delimiter]. It produces an OCaml list in reverse order. Its definition is left-recursive. *) reversed_preceded_or_separated_nonempty_llist(delimiter, X): | ioption(delimiter) x = X { [x] } | xs = reversed_preceded_or_separated_nonempty_llist(delimiter, X) delimiter x = X { x :: xs } (* [preceded_or_separated_nonempty_llist(delimiter, X)] recognizes a nonempty list of [X]s, separated with [delimiter]s, and optionally preceded with a leading [delimiter]. It produces an OCaml list in direct order. *) %inline preceded_or_separated_nonempty_llist(delimiter, X): xs = rev(reversed_preceded_or_separated_nonempty_llist(delimiter, X)) { xs } (* [preceded_or_separated_llist(delimiter, X)] recognizes a possibly empty list of [X]s, separated with [delimiter]s, and optionally preceded with a leading [delimiter]. It produces an OCaml list in direct order. *) preceded_or_separated_llist(delimiter, X): | (* empty *) { [] } | xs = preceded_or_separated_nonempty_llist(delimiter, X) { xs } (* -------------------------------------------------------------------------- *) (* [located(X)] recognizes the same language as [X] and converts the resulting value from type ['a] to type ['a located]. *) located(X): x = X { with_loc $loc x } %% menhir-20210929/src/stage2/parserMessages.messages000066400000000000000000000502611412503066000217400ustar00rootroot00000000000000# ---------------------------------------------------------------------------- grammar: UID grammar: HEADER UID Either a declaration or '%%' is expected at this point. # ---------------------------------------------------------------------------- grammar: TYPE UID grammar: TYPE OCAMLTYPE TYPE grammar: TYPE OCAMLTYPE UID PREC grammar: TYPE OCAMLTYPE UID LPAREN TYPE grammar: TYPE OCAMLTYPE UID COMMA TYPE grammar: TYPE OCAMLTYPE UID LPAREN UID UID grammar: TYPE OCAMLTYPE UID LPAREN UID COMMA TYPE grammar: TYPE OCAMLTYPE UID PLUS RPAREN grammar: ON_ERROR_REDUCE TYPE # %type and %on_error_reduce are both followed with clist(strict_actual), # so they are not distinguished in the automaton. Ill-formed declaration. Examples of well-formed declarations: %type expression %type date time %type option(date) %on_error_reduce expression %on_error_reduce date time %on_error_reduce option(date) # ---------------------------------------------------------------------------- grammar: TOKEN TYPE grammar: TOKEN OCAMLTYPE TYPE grammar: TOKEN UID STAR grammar: TOKEN UID QID STAR grammar: TOKEN UID COMMA TYPE Ill-formed '%token' declaration. Examples of well-formed declarations: %token FOO %token BAR "|" %token DOT "." SEMICOLON ";" %token LID UID %token FOO [@cost 0] # ---------------------------------------------------------------------------- grammar: START UID grammar: START OCAMLTYPE LEFT grammar: START LID UID grammar: START LID COMMA UID Ill-formed '%start' declaration. A start symbol must begin with a lowercase letter. Examples of well-formed declarations: %start program %start expression phrase %start date time # ---------------------------------------------------------------------------- grammar: RIGHT TYPE grammar: RIGHT UID STAR grammar: RIGHT UID COMMA TYPE Ill-formed precedence declaration. 
Examples of well-formed declarations: %left PLUS %left PLUS MINUS %nonassoc unary_minus %right CONCAT # ---------------------------------------------------------------------------- grammar: PARAMETER UID Ill-formed '%parameter' declaration. Examples of well-formed declarations: %parameter # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT TYPE # Do not mention that %% or EOF would be accepted at this point. A rule is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID COLON ACTION SEMI UNDERSCORE # We have seen a semicolon, so we know that the previous rule is complete. # Do not mention that %% or EOF would be accepted at this point. Another rule is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID COLON ACTION TYPE # Do not mention that %% or EOF would be accepted at this point. Either another production '|' ... or another rule is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT INLINE TYPE # This is definitely old-style syntax. Ill-formed rule. Either '%public' or a non-terminal symbol is expected at this point. Examples of well-formed rules: %public option(X): { None } | x = X { Some x } %inline clist(X): xs = separated_nonempty_list(COMMA?, X) { xs } %public %inline pair(X, Y): x = X; y = Y { (x, y) } grammar: PERCENTPERCENT PUBLIC INLINE TYPE # This is definitely old-style syntax. Ill-formed rule. A non-terminal symbol is expected at this point. Examples of well-formed rules: %public option(X): { None } | x = X { Some x } %inline clist(X): xs = separated_nonempty_list(COMMA?, X) { xs } %public %inline pair(X, Y): x = X; y = Y { (x, y) } # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID UID Ill-formed rule. Either a parenthesized, comma-delimited list of formal parameters or an attribute or a colon ':' is expected at this point. Examples of well-formed rules: main: e = expr; EOL { e } expr: i = INT { i } | e1 = expr; PLUS; e2 = expr { e1 + e2 } option(X): { None } | x = X { Some x } main [@cost 0]: e = expr; EOL { e } # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID LPAREN TYPE grammar: PERCENTPERCENT UID LPAREN UID COMMA TYPE Ill-formed rule. A comma-delimited list of formal parameters is expected at this point. Examples of well-formed rules: option(X): { None } | x = X { Some x } pair(X, Y): x = X; y = Y { (x, y) } # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID LPAREN UID UID # Ignore the fact that the comma-delimited list of symbols could continue. Ill-formed rule. A closing parenthesis ')' is expected at this point. Examples of well-formed rules: option(X): { None } | x = X { Some x } pair(X, Y): x = X; y = Y { (x, y) } # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID COLON TYPE grammar: PERCENTPERCENT UID COLON BAR TYPE grammar: PERCENTPERCENT UID COLON ACTION BAR TYPE grammar: PERCENTPERCENT UID COLON UID BAR TYPE Ill-formed rule. A list of productions is expected at this point. 
Examples of well-formed rules: main: e = expr; EOL { e } expr: i = INT { i } | e1 = expr; PLUS; e2 = expr { e1 + e2 } symbol: s = LID | s = UID { s } # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID COLON UID TYPE grammar: PERCENTPERCENT UID COLON UID SEMI TYPE grammar: PERCENTPERCENT UID COLON LID TYPE grammar: PERCENTPERCENT UID COLON LID EQUAL TYPE grammar: PERCENTPERCENT UID COLON LID EQUAL UID PLUS TYPE # The following sentences are tricky. In front of us could be many things # (comma, closing parenthesis, identifier, modifier, %prec keyword, etc.). # We don't know which symbol we expect to reduce towards (e.g., it could be # [actual] or [lax_actual]). # # Let's just back up to a safe level of abstraction and say that this is an # ill-formed production. grammar: PERCENTPERCENT UID COLON UID LPAREN UID TYPE grammar: PERCENTPERCENT UID COLON UID LPAREN UID STAR TYPE Ill-formed production. A production is a sequence of producers, followed with a semantic action. Examples of well-formed producers: expr option(COMMA) separated_list(COMMA, expr) e = expr ds = declaration* es = list(terminated(expr, SEMI)) es = list(e = expr SEMI { e }) xs = list(x = var { Some x } | WILDCARD { None }) expr [@cost 0] # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID COLON UID LPAREN ACTION BAR TYPE # Here, we have seen a BAR, so we expect a production (group). A production is expected at this point. A production is a sequence of producers, followed with a semantic action. Examples of well-formed producers: expr option(COMMA) separated_list(COMMA, expr) e = expr ds = declaration* es = list(terminated(expr, SEMI)) es = list(e = expr SEMI { e }) xs = list(x = var { Some x } | WILDCARD { None }) # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID COLON UID LPAREN ACTION UID # In the non-canonical automaton, this is a tricky case where, looking at the # description of the state, it seems that only COMMA and RPAREN can follow # here. But in fact, other tokens are possible, such as BAR, simply because # they will NOT take us into this state. In the canonical automaton, the list # of possibilities is explicit in the lookahead sets. grammar: PERCENTPERCENT UID COLON UID LPAREN ACTION PREC UID UID # In the first case above, we may expect a %prec annotation, whereas in the # second case above, we have just seen it. In the error message, we merge # these two situations and do not mention the possibility of a %prec # annotation. Either another production '|' ... or a comma ',' or a closing parenthesis ')' is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID COLON PREC TYPE grammar: PERCENTPERCENT UID COLON UID LPAREN ACTION PREC TYPE grammar: PERCENTPERCENT UID COLON ACTION PREC TYPE grammar: PERCENTPERCENT LET LID COLONEQUAL PREC EOF # Conflate old rule syntax and new rule syntax. Ill-formed %prec annotation. A symbol is expected at this point. Examples of well-formed annotations: expr: MINUS e = expr %prec UMINUS { -e } # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID COLON PREC LID UID Ill-formed rule. Either a semantic action '{' ... '}' or another production '|' ... is expected at this point. 
Examples of well-formed rules: expr: MINUS e = expr %prec UMINUS { -e } # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID LPAREN UID RPAREN BAR Ill-formed rule. A colon ':' is expected at this point. Examples of well-formed rules: option(X): { None } | x = X { Some x } # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT UID COLON ACTION PREC UID TYPE Either another rule or another production '|' ... is expected at this point. Examples of well-formed rules: option(X): { None } | x = X { Some x } # ---------------------------------------------------------------------------- grammar: TYPE OCAMLTYPE UID LPAREN UID LPAREN TYPE grammar: PERCENTPERCENT UID COLON UID LPAREN UID COMMA TYPE grammar: PERCENTPERCENT UID COLON UID LPAREN UID LPAREN TYPE Ill-formed list of actual parameters. A comma-delimited list of actual parameters is expected at this point. Examples of well-formed actual parameters: expr expr+ option(expr) separated_list(COMMA, expr) # Omitting the fact that an anonymous rule is a valid actual parameter... # Also omitting the subtle distinctions between lax_actual, actual, etc. # ---------------------------------------------------------------------------- grammar: TYPE OCAMLTYPE UID LPAREN UID PLUS UID Ill-formed list of actual parameters. Either a modifier '*' or '+' or '?' or a closing parenthesis ')' or a comma ',' is expected at this point. Examples of well-formed actual parameters: expr expr+ option(expr) separated_list(COMMA, expr) # ------------------------------------------------------------------------------ grammar: PERCENTATTRIBUTE TYPE grammar: PERCENTATTRIBUTE UID COMMA TYPE grammar: PERCENTATTRIBUTE UID TYPE grammar: PERCENTATTRIBUTE UID PLUS TYPE grammar: PERCENTATTRIBUTE UID LPAREN TYPE grammar: PERCENTATTRIBUTE UID ATTRIBUTE UID Ill-formed '%attribute' declaration. An '%attribute' declaration should contain a nonempty list of symbols, followed with a nonempty list of attributes. Examples of well-formed declarations: %attribute FOO [@printer "foo"] %attribute bar BAZ [@printer "bar/BAZ"] [@cost 2.0] # ---------------------------------------------------------------------------- # ---------------------------------------------------------------------------- # The following error sentences concern both the old and new rule syntax. grammar: PERCENTPERCENT PUBLIC TYPE Ill-formed rule. 'let' or '%inline' or a non-terminal symbol is expected at this point. Examples of well-formed rules: %public option(X): { None } | x = X { Some x } %public let option(X) := { None } | x = X; { Some x } # ---------------------------------------------------------------------------- # ---------------------------------------------------------------------------- # The following error sentences have to do with the new rule syntax. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET EOF A lowercase identifier is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID EOF grammar: PERCENTPERCENT LET LID LPAREN UID RPAREN EOF # Ignore attributes. # In the first case, we have not seen a list of formal parameters yet, # so such a list could still appear; yet I choose not to mention it. # People are likely to write '=' whereas we expect ':=' or '=='. # We should remind them what these two symbols mean. An equality symbol ':=' or '==' is expected at this point. 
Examples of well-formed rules: let option(X) := { None } | x = X; { Some x } (* ordinary *) let ioption(X) == { None } | x = X; { Some x } (* inline *) # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID LPAREN EOF # By accident, the lookahead token (EQUALEQUAL or COLONEQUAL) reveals # that we are in the new rule syntax. A comma-delimited list of formal parameters is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID LPAREN UID EOF # By accident, the lookahead token (EQUALEQUAL or COLONEQUAL) reveals # that we are in the new rule syntax. At this point, one of the following is expected: a comma ',' followed with an expression, or a closing parenthesis ')'. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL EOF # A choice expression is expected. An expression is expected at this point. Examples of expressions: term t = term; { t } LPAREN; ~ = term; RPAREN; <> factor | term; MUL; factor # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL TILDE EOF grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN TILDE EOF An equals sign '=' is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL TILDE EQUAL EOF grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN TILDE EQUAL EOF # A symbol expression is expected. # A symbol expression always begins with a symbol, # so we can say that a symbol is expected. A symbol is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL TILDE EQUAL LID EOF grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN UNDERSCORE EQUAL UID ATTRIBUTE EOF # Ignore the fact that an attribute or a modifier is permitted. A semicolon ';' is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL PREC UID EOF grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN PREC UID EOF A semantic action is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL LPAREN EOF grammar: PERCENTPERCENT LET LID COLONEQUAL LPAREN LPAREN EOF This opening parenthesis seems to be the beginning of a tuple pattern. Thus, a comma-separated list of patterns is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL LPAREN LPAREN UNDERSCORE EOF grammar: PERCENTPERCENT LET LID COLONEQUAL LPAREN UNDERSCORE EOF The previous opening parenthesis seemed to be the beginning of a tuple pattern. Thus, either a comma ',' followed with a pattern or a closing parenthesis ')' is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL LPAREN UNDERSCORE COMMA EOF A pattern is expected at this point. Examples of patterns: x ~ _ (x, y, _) # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL LID TYPE # This is tricky. 
We have read a lowercase identifier, # but do not know yet whether it represents a pattern # (in which case an EQUAL sign is expected) # or a symbol # (in which case many continuations are possible) # (in fact, the rule could be complete, as we are # at the top level). # Ignore the fact that this symbol could be followed with a list of # actual parameters, or a modifier, or an attribute. At this point, one of the following is expected: an equals sign '=' followed with a symbol, or a semicolon ';' followed with an expression, or a bar '|' followed with an expression, or another rule. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN LID EOF # This is analogous to the previous case, # except we are not at the top level. At this point, one of the following is expected: an equals sign '=' followed with a symbol, or a semicolon ';' followed with an expression, or a bar '|' followed with an expression, or a closing parenthesis ')'. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL ACTION TYPE At this point, one of the following is expected: a bar '|' followed with an expression, or another rule. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL UID TYPE # Ignore modifiers and attributes. # We expect either SEMI; seq_expression or BAR; expression or another rule. At this point, one of the following is expected: a semicolon ';' followed with an expression, or a bar '|' followed with an expression, or another rule. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN TYPE grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN UID LPAREN TYPE A comma-separated list of expressions is expected at this point. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL BAR TYPE grammar: PERCENTPERCENT LET LID COLONEQUAL UID BAR TYPE grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN BAR TYPE grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN UID BAR TYPE # A sequence expression is expected. # We are inside a choice expression. # We can show examples that involve '|', # as our sequence expression can be part of a choice expression # and therefore followed with a BAR. An expression is expected at this point. Examples of expressions: term t = term; { t } LPAREN; ~ = term; RPAREN; <> factor | term; MUL; factor # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN UID SEMI TYPE grammar: PERCENTPERCENT LET LID COLONEQUAL UNDERSCORE EQUAL UID SEMI TYPE grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN UNDERSCORE EQUAL UID SEMI TYPE grammar: PERCENTPERCENT LET LID COLONEQUAL UID SEMI EOF # A sequence expression is expected. # We are inside a sequence expression. # In fact, we have just read a semicolon. # Maybe it is worth re-iterating that (in the new syntax) # a rule cannot be terminated with a semicolon. After a semicolon, an expression is expected. (A rule cannot be terminated with a semicolon.) 
Examples of expressions: term t = term; { t } LPAREN; ~ = term; RPAREN; <> # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN ACTION EOF At this point, one of the following is expected: a comma ',' followed with an expression, or a bar '|' followed with an expression, or a closing parenthesis ')'. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN UID EOF # Ignore modifiers and attributes. # We expect either SEMI; seq_expression or BAR; expression or COMMA; expression or RPAREN. At this point, one of the following is expected: a semicolon ';' followed with an expression, or a bar '|' followed with an expression, or a comma ',' followed with an expression, or a closing parenthesis ')'. # ---------------------------------------------------------------------------- grammar: PERCENTPERCENT LET LID COLONEQUAL UID LPAREN UID COMMA TYPE # A choice expression is expected (allowed). An expression is expected at this point. Examples of expressions: term t = term; { t } LPAREN; ~ = term; RPAREN; <> factor | ~ = term; ~ = op; ~ = factor; # ---------------------------------------------------------------------------- # Local Variables: # mode: shell-script # End: menhir-20210929/src/stage3/000077500000000000000000000000001412503066000152205ustar00rootroot00000000000000menhir-20210929/src/stage3/anonymize/000077500000000000000000000000001412503066000172315ustar00rootroot00000000000000menhir-20210929/src/stage3/anonymize/anonymize.ml000066400000000000000000000026531412503066000216020ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This script removes file names in # (line) directives. *) (* This is used to textually compare the parsers produced by the stage 2 and stage 3 executables. *) let line_directive = Str.regexp {|^# \([0-9]+\) ".*"$|} let process fn = let ic = open_in fn in try while true do let s = input_line ic in print_endline (Str.replace_first line_directive {|# \1 ""|} s) done with End_of_file -> close_in ic let () = Arg.parse [] process "" menhir-20210929/src/stage3/anonymize/dune000066400000000000000000000000631412503066000201060ustar00rootroot00000000000000(executable (name anonymize) (libraries str) ) menhir-20210929/src/stage3/dune000066400000000000000000000060711412503066000161020ustar00rootroot00000000000000;; Build the stage3 version of Menhir, based on a parser generated by the ;; stage2 version of Menhir. ;; One might think that one could save time by building only the stage3 ;; parser.ml and parser.mli, as opposed to a full stage3 executable. However, ;; because Menhir is used in --infer mode, building these files in a correct ;; manner actually requires type-checking a large part of Menhir's source ;; code. It's simpler to just build everything than to be smart. 
;; ----------------------------------------------------------------------------- ;; As dune cannot use the same OCaml module in two different libraries or ;; executables, we must copy the source files to the present directory. (copy_files# ../*.{ml,mli}) ;; The following files are copied from stage 2. (copy_files ../stage2/menhir_flags.sexp) (copy_files ../stage2/parser.mly) (copy_files ../stage2/Driver.ml) (copy_files ../stage2/parserMessages.ml) ;; ----------------------------------------------------------------------------- ;; Bind the name "menhir" to "../stage2/main.exe" within the present scope. (env (_ (binaries ../stage2/main.exe (../stage2/main.exe as menhir))) ) ;; Menhir's parser is generated by Menhir. ;; We include the flags found in the file "menhir_flags" plus extra flags ;; specified here. (menhir (flags (:include menhir_flags.sexp) ) (modules parser) ) ;; ----------------------------------------------------------------------------- ;; The stage3 version of Menhir. ;; The link_deps field below is an artificial way of requiring the @bootstrap ;; target to be automatically built. (executable (name main) (libraries unix vendored_fix vendored_pprint menhirLib menhirSdk ) (flags :standard -open MenhirSdk) (link_deps (alias bootstrap)) ) ;; ---------------------------------------------------------------------------- ;; The bootstrap check verifies that stage1-Menhir and stage2-Menhir produce ;; the same result when applied to Menhir's stage2 grammar. If this check ;; fails, then the ocamlyacc parser in stage1/parser.mly and the Menhir parser ;; in stage2/parser.mly have different semantics, a mistake that must be ;; fixed. ;; This check compares the [Parser] modules generated by the stage1 and stage2 ;; executables, and fails if they are not identical. ;; These parsers contain line directives that are necessarily different ;; because they were generated in different directories. A helper script is ;; used to remove the "filename" part of the line directives. ;; This check is run as part of [dune test]. (rule (with-stdout-to parser.stage2.ml (run anonymize/anonymize.exe %{dep:../stage2/parser.ml}) ) ) (rule (with-stdout-to parser.stage3.ml (run anonymize/anonymize.exe %{dep:parser.ml}) ) ) (rule (alias bootstrap) (action (progn (echo "Bootstrap check: comparing the stage 2 and stage 3 parsers...\n") (progn (diff parser.stage2.ml parser.stage3.ml) (diff ../stage2/parser.mli parser.mli) ) (echo "Bootstrap check: done.\n") )) ) (rule (alias test) (deps (alias bootstrap)) (action (progn)) ) menhir-20210929/src/standard.mly000066400000000000000000000170501412503066000163600ustar00rootroot00000000000000/******************************************************************************/ /* */ /* Menhir */ /* */ /* François Pottier, Inria Paris */ /* Yann Régis-Gianas, PPS, Université Paris Diderot */ /* */ /* Copyright Inria. All rights reserved. This file is distributed under the */ /* terms of the GNU Library General Public License version 2, with a */ /* special exception on linking, as described in the file LICENSE. */ /* */ /******************************************************************************/ (* This is menhir's standard library. It offers a number of parameterized nonterminal definitions, such as options and lists, that should be useful in a number of circumstances. *) %% (* ------------------------------------------------------------------------- *) (* The identity. *) (* [endrule(X)] is the same as [X]. 
*) (* This allows placing an anonymous subrule in the middle of a rule, as in: cat endrule(dog { action1 }) cow { action2 } Because [endrule] is marked %inline, everything is expanded away. So, this is equivalent to: cat dog cow { action1; action2 } Note that [action1] moves to the end of the rule. The anonymous subrule can even have several branches, as in: cat endrule(dog { action1a } | fox { action1b }) cow { action2 } This is expanded to: cat dog cow { action1a; action2 } | cat fox cow { action1b; action2 } *) %public %inline endrule(X): x = X { x } (* [anonymous(X)] is a deprecated synonym for [endrule(X)]. It was never documented. *) %public %inline anonymous(X): x = X { x } (* [midrule(X)] is the same as [X]. *) (* This allows placing an anonymous subrule in the middle of a rule, as in: cat midrule(dog { action1 }) cow { action2 } Because [midrule] is not marked %inline, this is equivalent to: cat xxx cow { action2 } where the fresh nonterminal symbol [xxx] is separately defined by: xxx: dog { action1 } In particular, if there is no [dog], what we get is a semantic action embedded in the middle of a rule. For instance, cat midrule({ action1 }) cow { action2 } is equivalent to: cat xxx cow { action2 } where [xxx] is separately defined by the rule: xxx: { action1 } *) %public midrule(X): x = X { x } (* [embedded(X)] is a deprecated synonym for [midrule(X)]. It was never documented. *) %public embedded(X): x = X { x } (* ------------------------------------------------------------------------- *) (* Options. *) (* [option(X)] recognizes either nothing or [X]. It produces a value of type ['a option] if [X] produces a value of type ['a]. *) %public option(X): /* nothing */ { None } | x = X { Some x } (* [ioption(X)] is identical to [option(X)], except its definition is inlined. This has the effect of duplicating the production that refers to it, possibly eliminating an LR(1) conflict. *) %public %inline ioption(X): /* nothing */ { None } | x = X { Some x } (* [boption(X)] recognizes either nothing or [X]. It produces a value of type [bool]. *) %public boption(X): /* nothing */ { false } | X { true } (* [loption(X)] recognizes either nothing or [X]. It produces a value of type ['a list] if [X] produces a value of type ['a list]. *) %public loption(X): /* nothing */ { [] } | x = X { x } (* ------------------------------------------------------------------------- *) (* Sequences. *) (* [epsilon] recognizes the empty word. It can be used instead of the traditional /* empty */ comment. *) (* NOT YET ADDED because we first need to remove the limitation that every symbol must be reachable from the start symbol! %public %inline epsilon: /* empty */ { () } *) (* [pair(X, Y)] recognizes the sequence [X Y]. It produces a value of type ['a * 'b] if [X] and [Y] produce values of type ['a] and ['b], respectively. *) %public %inline pair(X, Y): x = X; y = Y { (x, y) } (* [separated_pair(X, sep, Y)] recognizes the sequence [X sep Y]. It produces a value of type ['a * 'b] if [X] and [Y] produce values of type ['a] and ['b], respectively. *) %public %inline separated_pair(X, sep, Y): x = X; sep; y = Y { (x, y) } (* [preceded(opening, X)] recognizes the sequence [opening X]. It passes on the value produced by [X], so that it produces a value of type ['a] if [X] produces a value of type ['a]. *) %public %inline preceded(opening, X): opening; x = X { x } (* [terminated(X, closing)] recognizes the sequence [X closing]. 
It passes on the value produced by [X], so that it produces a value of type ['a] if [X] produces a value of type ['a]. *) %public %inline terminated(X, closing): x = X; closing { x } (* [delimited(opening, X, closing)] recognizes the sequence [opening X closing]. It passes on the value produced by [X], so that it produces a value of type ['a] if [X] produces a value of type ['a]. *) %public %inline delimited(opening, X, closing): opening; x = X; closing { x } (* ------------------------------------------------------------------------- *) (* Lists. *) (* [list(X)] recognizes a possibly empty list of [X]'s. It produces a value of type ['a list] if [X] produces a value of type ['a]. The front element of the list is the first element that was parsed. *) %public list(X): /* nothing */ { [] } | x = X; xs = list(X) { x :: xs } (* [nonempty_list(X)] recognizes a nonempty list of [X]'s. It produces a value of type ['a list] if [X] produces a value of type ['a]. The front element of the list is the first element that was parsed. *) %public nonempty_list(X): x = X { [ x ] } | x = X; xs = nonempty_list(X) { x :: xs } (* [separated_list(separator, X)] recognizes a possibly empty list of [X]'s, separated with [separator]'s. It produces a value of type ['a list] if [X] produces a value of type ['a]. The front element of the list is the first element that was parsed. *) %public %inline separated_list(separator, X): xs = loption(separated_nonempty_list(separator, X)) { xs } (* [separated_nonempty_list(separator, X)] recognizes a nonempty list of [X]'s, separated with [separator]'s. It produces a value of type ['a list] if [X] produces a value of type ['a]. The front element of the list is the first element that was parsed. *) %public separated_nonempty_list(separator, X): x = X { [ x ] } | x = X; separator; xs = separated_nonempty_list(separator, X) { x :: xs } (* ------------------------------------------------------------------------- *) (* List manipulation and transformation. *) (* [rev(XS)] recognizes the same language as [XS], but reverses the resulting OCaml list. (20181005) *) %public %inline rev(XS): xs = XS { List.rev xs } (* [flatten(XSS)] recognizes the same language as [XSS], and flattens the resulting OCaml list of lists. (20181005) *) %public %inline flatten(XSS): xss = XSS { List.flatten xss } (* [append(XS, YS)] recognizes [XS YS], and appends (concatenates) the resulting OCaml lists. (20181005) *) %public %inline append(XS, YS): xs = XS ys = YS { xs @ ys } %% menhir-20210929/src/stretch.ml000066400000000000000000000037351412503066000160500ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* A stretch is a fragment of a source file. It holds the file name, the line number, and the line count (that is, the length) of the fragment. These are used to generate line number directives when the fragment is copied to an output file. It also holds the textual content of the fragment, as a string. 
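(A line number directive takes the form [# 42 "foo.ml"], where 42 and "foo.ml" stand for the fragment's line number and file name; it causes the OCaml compiler to attribute the copied text to its original source location.)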
The [raw_content] field holds the text that was found in the source file, while the [content] field holds the same text after transformation by the lexer (which may substitute keywords, insert padding, insert parentheses, etc.). See [Lexer.mk_stretch] and its various call sites in [Lexer]. *) type t = { stretch_filename : string; stretch_linenum : int; stretch_linecount : int; stretch_raw_content : string; stretch_content : string; stretch_keywords : Keyword.keyword list } (* An OCaml type is either a stretch (if it was found in some source file) or a string (if it was inferred via [Infer]). *) type ocamltype = | Declared of t | Inferred of string menhir-20210929/src/stringMap.ml000066400000000000000000000030271412503066000163320ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) include Map.Make (String) let cardinal s = fold (fun _ _ x -> x + 1) s 0 let filter pred map = fold (fun key value map -> if pred key value then add key value map else map) map empty let restrict domain map = filter (fun k _ -> StringSet.mem k domain) map let domain map = fold (fun key _ acu -> StringSet.add key acu) map StringSet.empty let multiple_add k v m = let vs = try find k m with Not_found -> [] in add k (v :: vs) m let of_list xs = List.fold_left (fun m (x, v) -> add x v m) empty xs menhir-20210929/src/stringMap.mli000066400000000000000000000034341412503066000165050ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) include Map.S with type key = string (**[cardinal m] is the cardinal of the map [m]. *) val cardinal : 'a t -> int (**[restrict s m] restricts the domain of the map [m] to (its intersection with) the set [s]. *) val restrict: StringSet.t -> 'a t -> 'a t (**[filter pred m] restricts the domain of the map [m] to (key, value) couples that verify [pred]. *) val filter: (string -> 'a -> bool) -> 'a t -> 'a t (**[domain m] returns the domain of the map [m]. *) val domain: 'a t -> StringSet.t (**[multiple_add k v m] adds the key-value pair [k, v] to the map [m], which maps keys to *lists* of values. The list currently associated with [k] is extended with the value [v]. *) val multiple_add: key -> 'a -> 'a list t -> 'a list t (**[of_list] converts a list of key-value pairs to a map. *) val of_list: (key * 'a) list -> 'a t menhir-20210929/src/stringSet.ml000066400000000000000000000022411412503066000163450ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. 
This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) include Set.Make (String) (* [map] appears in OCaml 4.04. *) let map f xs = fold (fun x accu -> add (f x) accu) xs empty let print s = Misc.separated_iter_to_string (fun s -> s) ", " (fun f -> iter f s) menhir-20210929/src/stringSet.mli000066400000000000000000000022321412503066000165160ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) include Set.S with type elt = string val map: (elt -> elt) -> t -> t (* [print] prints a set of strings as a comma-separated list, without opening and closing delimiters. *) val print: t -> string menhir-20210929/src/syntax.ml000066400000000000000000000306721412503066000157220ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The type [partial_grammar] describes the abstract syntax that is produced by the parsers (yacc-parser and fancy-parser). The type [grammar] describes the abstract syntax that is obtained after one or more partial grammars are joined (see [PartialGrammar]). It differs in that declarations are organized in a more useful way and a number of well-formedness checks have been performed. *) type 'a located = 'a Positions.located (* ------------------------------------------------------------------------ *) (* Terminals and nonterminal symbols are strings. *) type terminal = string type nonterminal = string type symbol = string (* In a somewhat fragile convention, in a partial grammar, a reference to a terminal symbol either is a normal identifier [LID], in which case it is the name of the terminal symbol, or is a quoted identifier [QID], in which case it is a token alias. Token aliases are eliminated by replacing them with the corresponding terminal symbols very early on during the joining of the partial grammars; see the module [ExpandTokenAliases]. In a complete grammar, there are no token aliases any longer. That is, we keep track of the aliases that have been declared (they can be found via the field [tk_alias]), but we never use them, since they have been eliminated up front. *) type alias = string option (* Identifiers (which are used to refer to a symbol's semantic value) are strings. *) type identifier = string (* A file name is a string. *) type filename = string (* ------------------------------------------------------------------------ *) (* A postlude is a source file fragment. *) type postlude = Stretch.t (* ------------------------------------------------------------------------ *) (* OCaml semantic actions are represented as stretches. 
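   For instance, in a (hypothetical) grammar rule such as

     expr: e1 = expr PLUS e2 = expr { e1 + e2 }

   the text between braces is a semantic action; it is carried around as a
   stretch of the source file.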
*) type action = Action.t (* ------------------------------------------------------------------------ *) (* An attribute consists of an attribute name and an attribute payload. The payload is an uninterpreted stretch of source text. *) type attribute = string located * Stretch.t type attributes = attribute list (* Attributes allow the user to annotate the grammar with information that is ignored by Menhir, but can be exploited by other tools, via the SDK. *) (* Attributes can be attached in the following places: - with the grammar: %[@bar ...] - with a terminal symbol: %token FOO [@bar ...] - with a rule: foo(X) [@bar ...]: ... - with a producer: e = foo(quux) [@bar ...] - with an arbitrary symbol: %attribute FOO foo(quux) [@bar ...] After expanding away parameterized nonterminal symbols, things become a bit simpler, as %attribute declarations are desugared away. *) (* ------------------------------------------------------------------------ *) (* Information about tokens. (Only after joining.) *) type token_associativity = LeftAssoc | RightAssoc | NonAssoc | UndefinedAssoc type precedence_level = UndefinedPrecedence (* Items are incomparable when they originate in different files. A value of type [input_file] is used to record an item's origin. The positions allow locating certain warnings. *) | PrecedenceLevel of InputFile.input_file * int * Lexing.position * Lexing.position type token_properties = { tk_filename : filename; tk_ocamltype : Stretch.ocamltype option; tk_position : Positions.t; tk_alias : alias; tk_attributes : attributes; mutable tk_associativity : token_associativity; mutable tk_precedence : precedence_level; mutable tk_is_declared : bool; } (* ------------------------------------------------------------------------ *) (* A [%prec] annotation is optional. A production can carry at most one. If there is one, it is a symbol name. See [ParserAux]. *) type branch_prec_annotation = symbol located option (* ------------------------------------------------------------------------ *) (* A "production level" is used to solve reduce/reduce conflicts. It reflects which production appears first in the grammar. See [ParserAux]. *) type branch_production_level = | ProductionLevel of InputFile.input_file * int (* ------------------------------------------------------------------------ *) (* A level is attached to every [%on_error_reduce] declaration. It is used to decide what to do when several such declarations are applicable in a single state. *) type on_error_reduce_level = branch_production_level (* we re-use the above type, to save code *) (* ------------------------------------------------------------------------ *) (* ------------------------------------------------------------------------ *) (* The old rule syntax. Although old, still used internally. The new syntax is translated down to it. *) (* A parameter is either just a symbol or an application of a symbol to a nonempty tuple of parameters. Before anonymous rules have been eliminated, it can also be an anonymous rule, represented as a list of branches. *) type parameter = | ParameterVar of symbol located | ParameterApp of symbol located * parameters | ParameterAnonymous of parameterized_branch list located and parameters = parameter list (* ------------------------------------------------------------------------ *) (* A producer is a pair of identifier and a parameter. In concrete syntax, it could be [e = expr], for instance. The identifier [e] is always present. 
(A use of the keyword [$i] in a semantic action is turned by the lexer and parser into a reference to an identifier [_i].) A producer carries a number of attributes. *) and producer = identifier located * parameter * attributes (* ------------------------------------------------------------------------ *) (* A branch contains a series of producers and a semantic action. *) and parameterized_branch = { pr_branch_position : Positions.t; pr_producers : producer list; pr_action : action; pr_branch_prec_annotation : branch_prec_annotation; pr_branch_production_level : branch_production_level } (* ------------------------------------------------------------------------ *) (* A rule has a header and several branches. *) type parameterized_rule = { pr_public_flag : bool; pr_inline_flag : bool; pr_nt : nonterminal; pr_positions : Positions.t list; pr_attributes : attributes; pr_parameters : symbol list; pr_branches : parameterized_branch list; } (* ------------------------------------------------------------------------ *) (* ------------------------------------------------------------------------ *) (* The new rule syntax. *) (* In the user's eyes, this replaces the old rule syntax, which corresponds to the types [parameter], [producer], [parameterized_branch], and [parameterized_rule] above. *) (* Internally, the new rule syntax is translated down to the old rule syntax; see [NewRuleSyntax]. This is done on the fly during parsing. *) type pattern = | SemPatVar of identifier located | SemPatWildcard | SemPatTilde of Positions.t | SemPatTuple of pattern list (* Patterns: as in the manual. *) type raw_action = Settings.dollars -> identifier option array -> action (* Ugly type produced by the lexer for an ACTION token. *) type expression = choice_expression located (* A toplevel expression is a choice expression. *) and choice_expression = | EChoice of branch list (* A choice expression is a list of branches. *) and branch = | Branch of seq_expression * branch_production_level (* A branch is a sequence expression, plus an ugly [branch_production_level]. *) and seq_expression = raw_seq_expression located and raw_seq_expression = | ECons of pattern * symbol_expression * seq_expression | ESingleton of symbol_expression | EAction of extended_action * branch_prec_annotation (* A sequence is either a cons [p = e1; e2] or a lone symbol expression [e] or a semantic action. *) and symbol_expression = | ESymbol of symbol located * expression list * attributes (* A symbol expression is a symbol, possibly accompanied with actual parameters and attributes. *) and extended_action = | XATraditional of raw_action | XAPointFree of Stretch.t option (* A semantic action is either traditional { ... } or point-free. There are two forms of point-free actions, <> and . In the latter case, [id] is an OCaml identifier. *) type rule = { rule_public: bool; rule_inline: bool; rule_lhs: symbol located; rule_attributes: attributes; rule_formals: symbol located list; rule_rhs: expression; } (* ------------------------------------------------------------------------ *) (* ------------------------------------------------------------------------ *) (* A declaration. (Only before joining.) *) type declaration = (* Raw OCaml code. *) | DCode of Stretch.t (* Raw OCaml functor parameter. *) | DParameter of Stretch.ocamltype (* really a stretch *) (* Terminal symbol (token) declaration. *) | DToken of Stretch.ocamltype option * terminal * alias * attributes (* Start symbol declaration. 
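   This corresponds to a %start declaration in the grammar.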
*) | DStart of nonterminal (* Priority and associativity declaration. *) | DTokenProperties of terminal * token_associativity * precedence_level (* Type declaration. *) | DType of Stretch.ocamltype * parameter (* Grammar-level attribute declaration. *) | DGrammarAttribute of attribute (* Attributes shared among multiple symbols, i.e., [%attribute]. *) | DSymbolAttributes of parameter list * attributes (* On-error-reduce declaration. *) | DOnErrorReduce of parameter * on_error_reduce_level (* ------------------------------------------------------------------------ *) (* A partial grammar. (Only before joining.) *) type partial_grammar = { pg_filename : filename; pg_postlude : postlude option; pg_declarations : declaration located list; pg_rules : parameterized_rule list; } (* ------------------------------------------------------------------------ *) (* ------------------------------------------------------------------------ *) (* A grammar. (Only after joining.) *) (* The differences with partial grammars (above) are as follows: 1. the file name is gone (there could be several file names, anyway). 2. there can be several postludes. 3. declarations are organized by kind: preludes, postludes, functor %parameters, %start symbols, %types, %tokens, %on_error_reduce, grammar attributes, %attributes. 4. rules are stored in a map, indexed by symbol names, instead of a list. 5. token aliases have been replaced with ordinary named terminal symbols. *) type grammar = { p_preludes : Stretch.t list; p_postludes : postlude list; p_parameters : Stretch.t list; p_start_symbols : Positions.t StringMap.t; p_types : (parameter * Stretch.ocamltype located) list; p_tokens : token_properties StringMap.t; p_on_error_reduce : (parameter * on_error_reduce_level) list; p_grammar_attributes : attributes; p_symbol_attributes : (parameter list * attributes) list; p_rules : parameterized_rule StringMap.t; } menhir-20210929/src/tableBackend.ml000066400000000000000000000747431412503066000167420ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open CodeBits open Grammar open IL open Interface open Printf open TokenType open NonterminalType open CodePieces module Run (T : sig end) = struct (* ------------------------------------------------------------------------ *) (* Conventional names for modules, exceptions, record fields, functions. *) let menhirlib = "MenhirLib" let make_engine_table = menhirlib ^ ".TableInterpreter.MakeEngineTable" let make_engine = menhirlib ^ ".Engine.Make" let make_symbol = menhirlib ^ ".InspectionTableInterpreter.Symbols" let make_inspection = menhirlib ^ ".InspectionTableInterpreter.Make" let engineTypes = menhirlib ^ ".EngineTypes" let field x = engineTypes ^ "." ^ x let fstate = field "state" let fsemv = field "semv" let fstartp = field "startp" let fendp = field "endp" let fnext = field "next" let fstack = field "stack" let fcurrent = field "current" let entry = interpreter ^ ".entry" let start = interpreter ^ ".start" let staticVersion = menhirlib ^ ".StaticVersion" (* The following are names of internal sub-modules. 
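   In the generated parser, they give rise to [module Tables],
   [module Symbols], and so on; see the construction of [program] at the
   end of this file.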
*) let tables = "Tables" let symbols = "Symbols" let et = "ET" let ti = "TI" (* ------------------------------------------------------------------------ *) (* Statistics. *) (* Integer division, rounded up. *) let div a b = if a mod b = 0 then a / b else a / b + 1 (* [size] provides a rough measure of the size of its argument, in words. The [unboxed] parameter is true if we have already counted 1 for the pointer to the object. *) let rec size unboxed = function | EIntConst _ | ETuple [] | EData (_, []) -> if unboxed then 0 else 1 | EStringConst s -> 1 + div (String.length s * 8) Sys.word_size | ETuple es | EData (_, es) | EArray es -> 1 + List.length es + List.fold_left (fun s e -> s + size true e) 0 es | _ -> assert false (* not implemented *) let size = size false (* Optionally, print a measure of each of the tables that we are defining. *) let define (name, expr) = { valpublic = true; valpat = PVar name; valval = expr } let define_and_measure (x, e) = Error.logC 1 (fun f -> fprintf f "The %s table occupies roughly %d bytes.\n" x (size e * (Sys.word_size / 8)) ); define (x, e) (* ------------------------------------------------------------------------ *) (* Code generation for semantic actions. *) (* The functions [reducecellparams] and [reducebody] are adapted from [CodeBackend]. *) (* Things are slightly more regular here than in the code-based back-end, since there is no optimization: every stack cell has the same structure and holds a state, a semantic value, and a pair of positions. Because every semantic value is represented, we do not have a separate [unitbindings]. *) (* [reducecellparams] constructs a pattern that describes the contents of a stack cell. If this is the bottom cell, the variable [state] is bound to the state found in the cell. If [ids.(i)] is used in the semantic action, then it is bound to the semantic value. The position variables are always bound. *) let reducecellparams prod i _symbol (next : pattern) : pattern = let ids = Production.identifiers prod in PRecord [ fstate, (if i = 0 then PVar state else PWildcard); fsemv, PVar ids.(i); fstartp, PVar (Printf.sprintf "_startpos_%s_" ids.(i)); fendp, PVar (Printf.sprintf "_endpos_%s_" ids.(i)); fnext, next; ] (* The semantic values bound in [reducecellparams] have type [Obj.t]. They should now be cast to their real type. If we had [PMagic] in the syntax of patterns, we could do that in one swoop; since we don't, we have to issue a series of casts a posteriori. *) let reducecellcasts prod i symbol casts = let ids = Production.identifiers prod in let id = ids.(i) in let t : typ = match semvtype symbol with | [] -> tunit | [ t ] -> t | _ -> assert false in (* Cast: [let id = ((Obj.magic id) : t) in ...]. *) ( PVar id, annotate (EMagic (EVar id)) t ) :: casts (* 2015/11/04. The start and end positions of an epsilon production are obtained by taking the end position stored in the top stack cell (whatever it is). *) let endpos_of_top_stack_cell = ERecordAccess(EVar stack, fendp) (* This is the body of the [reduce] function associated with production [prod]. It assumes that the variable [stack] is bound. *) let reducebody prod = let nt, rhs = Production.def prod and ids = Production.identifiers prod and length = Production.length prod in (* Build a pattern that represents the shape of the stack. Out of the stack, we extract a state (except when the production is an epsilon production) and a number of semantic values. *) (* At the same time, build a series of casts. 
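   Each cast takes the form [let id = ((Obj.magic id) : t) in ...], as
   constructed by [reducecellcasts] above.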
*) (* We want a [fold] that begins with the deepest cells in the stack. Folding from left to right on [rhs] is appropriate. *) let (_ : int), pat, casts = Array.fold_left (fun (i, pat, casts) symbol -> i + 1, reducecellparams prod i symbol pat, reducecellcasts prod i symbol casts ) (0, PVar stack, []) rhs in (* Determine beforeend/start/end positions for the left-hand side of the production, and bind them to the conventional variables [beforeendp], [startp], and [endp]. These variables may be unused by the semantic action, in which case these bindings are dead code and can be ignored by the OCaml compiler. *) let posbindings = ( PVar beforeendp, endpos_of_top_stack_cell ) :: ( PVar startp, if length > 0 then EVar (Printf.sprintf "_startpos_%s_" ids.(0)) else endpos_of_top_stack_cell ) :: ( PVar endp, if length > 0 then EVar (Printf.sprintf "_endpos_%s_" ids.(length - 1)) else EVar startp ) :: [] in (* This cannot be one of the start productions. *) assert (not (Production.is_start prod)); (* This is a regular production. Perform a reduction. *) let action = Production.action prod in let act = annotate (Action.to_il_expr action) (semvtypent nt) in EComment ( Production.print prod, blet ( (pat, EVar stack) :: (* destructure the stack *) casts @ (* perform type casts *) posbindings @ (* bind [startp] and [endp] *) [ PVar semv, act ], (* run the user's code and bind [semv] *) (* Return a new stack, onto which we have pushed a new stack cell. *) ERecord [ (* the new stack cell *) fstate, EVar state; (* the current state after popping; it will be updated by [goto] *) fsemv, ERepr (EVar semv); (* the newly computed semantic value *) fstartp, EVar startp; (* the newly computed start and end positions *) fendp, EVar endp; fnext, EVar stack; (* this is the stack after popping *) ] ) ) (* This is the body of the semantic action associated with production [prod]. It takes just one parameter, namely the environment [env]. *) let semantic_action prod = let env = prefix "env" in EFun ( [ PVar env ], (* Access the stack and current state via the environment. *) (* In fact, the current state needs be bound here only if this is an epsilon production. Otherwise, the variable [state] will be bound by the pattern produced by [reducecellparams] above. *) ELet ( [ PVar stack, ERecordAccess (EVar env, fstack) ] @ (if Production.length prod = 0 then [ PVar state, ERecordAccess (EVar env, fcurrent) ] else []), reducebody prod ) ) (* Export the number of start productions. *) let start_def = define ( "start", EIntConst Production.start ) (* ------------------------------------------------------------------------ *) (* Table encodings. *) (* Encodings of entries in the default reduction table. *) let encode_DefRed prod = (* 1 + prod *) 1 + Production.p2i prod let encode_NoDefRed = (* 0 *) 0 (* Encodings of entries in the action table. *) let encode_Reduce prod = (* prod | 01 *) (Production.p2i prod lsl 2) lor 1 let encode_ShiftDiscard s = (* s | 10 *) ((Lr1.number s) lsl 2) lor 0b10 let encode_ShiftNoDiscard s = (* s | 11 *) ((Lr1.number s) lsl 2) lor 0b11 let encode_Fail = (* 00 *) 0 (* Encodings of entries in the goto table. *) let encode_Goto node = (* 1 + node *) 1 + Lr1.number node let encode_NoGoto = (* 0 *) 0 (* Encodings of the hole in the action and goto tables. *) let hole = assert (encode_Fail = 0); assert (encode_NoGoto = 0); 0 (* Encodings of entries in the error bitmap. 
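   The bitmap holds one bit per (state, terminal) pair: the bit is 0 when
   the corresponding entry in the action table is [Fail], and 1 otherwise;
   see the function [error] below.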
*) let encode_Error = (* 0 *) 0 let encode_NoError = (* 1 *) 1 (* Encodings of terminal and nonterminal symbols in the production table. *) let encode_no_symbol = 0 (* 0 | 0 *) let encode_terminal tok = (Terminal.t2i tok + 1) lsl 1 (* t + 1 | 0 *) let encode_nonterminal nt = ((Nonterminal.n2i nt) lsl 1) lor 1 (* nt | 1 *) let encode_symbol = function | Symbol.T tok -> encode_terminal tok | Symbol.N nt -> encode_nonterminal nt let encode_symbol_option = function | None -> encode_no_symbol | Some symbol -> encode_symbol symbol (* Encoding a Boolean as an integer value. *) let encode_bool b = if b then 1 else 0 (* ------------------------------------------------------------------------ *) (* Table compression. *) (* Our sparse, two-dimensional tables are turned into one-dimensional tables via [RowDisplacement]. *) (* The error bitmap, which is two-dimensional but not sparse, is made one-dimensional by simple flattening. *) (* Every one-dimensional table is then packed via [PackedIntArray]. *) (* Optionally, we print some information about the compression ratio. *) (* [population] counts the number of significant entries in a two-dimensional matrix. *) let population (matrix : int array array) = Array.fold_left (fun population row -> Array.fold_left (fun population entry -> if entry = hole then population else population + 1 ) population row ) 0 matrix (* [marshal1] marshals a one-dimensional array. *) let marshal1 (table : int array) = let (bits : int), (text : string) = MenhirLib.PackedIntArray.pack table in ETuple [ EIntConst bits; EStringConst text ] (* [marshal11] marshals a one-dimensional array whose bit width is statically known to be [1]. *) let marshal11 (table : int array) = let (bits : int), (text : string) = MenhirLib.PackedIntArray.pack table in assert (bits = 1); EStringConst text (* List-based versions of the above functions. *) let marshal1_list (table : int list) = marshal1 (Array.of_list table) let marshal11_list (table : int list) = marshal11 (Array.of_list table) (* [linearize_and_marshal1] marshals an array of integer arrays (of possibly different lengths). *) let linearize_and_marshal1 (table : int array array) = let data, entry = MenhirLib.LinearizedArray.make table in ETuple [ marshal1 data; marshal1 entry ] (* [flatten_and_marshal11_list] marshals a two-dimensional bitmap, whose width (for now) is assumed to be [Terminal.n - 1]. *) let flatten_and_marshal11_list (table : int list list) = ETuple [ (* Store the table width. *) EIntConst (Terminal.n - 1); (* View the table as a one-dimensional array, and marshal it. *) marshal11_list (List.flatten table) ] (* [marshal2] marshals a two-dimensional table, with row displacement. *) let marshal2 name m n (matrix : int list list) = let matrix : int array array = Array.of_list (List.map Array.of_list matrix) in let (displacement : int array), (data : int array) = MenhirLib.RowDisplacement.compress (=) (fun x -> x = hole) hole m n matrix in Error.logC 1 (fun f -> fprintf f "The %s table is %d entries; %d non-zero; %d compressed.\n" name (m * n) (population matrix) (Array.length displacement + Array.length data) ); ETuple [ marshal1 displacement; marshal1 data; ] (* ------------------------------------------------------------------------ *) (* Table generation. *) (* The action table. *) let action node t = match Default.has_default_reduction node with | Some _ -> (* [node] has a default reduction; in that case, the action table is never looked up. 
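   We can therefore store the insignificant value [hole] in this entry,
   which helps the row displacement scheme in [marshal2] achieve a better
   compression ratio.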
*) hole | None -> try let target = SymbolMap.find (Symbol.T t) (Lr1.transitions node) in (* [node] has a transition to [target]. If [target] has a default reduction on [#], use [ShiftNoDiscard], otherwise [ShiftDiscard]. *) match Default.has_default_reduction target with | Some (_, toks) when TerminalSet.mem Terminal.sharp toks -> assert (TerminalSet.cardinal toks = 1); encode_ShiftNoDiscard target | _ -> encode_ShiftDiscard target with Not_found -> try (* [node] has a reduction. *) let prod = Misc.single (TerminalMap.find t (Lr1.reductions node)) in encode_Reduce prod with Not_found -> (* [node] has no action. *) encode_Fail (* In the error bitmap and in the action table, the row that corresponds to the [#] pseudo-terminal is never accessed. Thus, we do not create this row. This does not create a gap in the table, because this is the right-most row. For sanity, we check this fact here. *) let () = assert (Terminal.t2i Terminal.sharp = Terminal.n - 1) (* The goto table. *) let goto node nt = try let target = SymbolMap.find (Symbol.N nt) (Lr1.transitions node) in encode_Goto target with Not_found -> encode_NoGoto (* The error bitmap reflects which entries in the action table are [Fail]. Like the action table, it is not accessed when [node] has a default reduction. *) let error node t = if action node t = encode_Fail then encode_Error else encode_NoError (* The default reductions table. *) let default_reduction node = match Default.has_default_reduction node with | Some (prod, _) -> encode_DefRed prod | None -> encode_NoDefRed (* Generate the table definitions. *) let action = define_and_measure ( "action", marshal2 "action" Lr1.n (Terminal.n - 1) ( Lr1.map (fun node -> Terminal.mapx (fun t -> action node t ) ) ) ) let goto = define_and_measure ( "goto", marshal2 "goto" Lr1.n Nonterminal.n ( Lr1.map (fun node -> Nonterminal.map (fun nt -> goto node nt ) ) ) ) let error = define_and_measure ( "error", flatten_and_marshal11_list ( Lr1.map (fun node -> Terminal.mapx (fun t -> error node t ) ) ) ) let default_reduction = define_and_measure ( "default_reduction", marshal1_list ( Lr1.map (fun node -> default_reduction node ) ) ) let lhs = define_and_measure ( "lhs", marshal1 ( Production.amap (fun prod -> Nonterminal.n2i (Production.nt prod) ) ) ) let semantic_action = define ( "semantic_action", (* Non-start productions only. *) EArray (Production.mapx semantic_action) ) (* ------------------------------------------------------------------------ *) (* When [--trace] is enabled, we need tables that map terminals and productions to strings. *) let stringwrap f x = EStringConst (f x) let reduce_or_accept prod = match Production.classify prod with | Some _ -> "Accepting" | None -> "Reducing production " ^ (Production.print prod) let trace = define_and_measure ( "trace", if Settings.trace then EData ("Some", [ ETuple [ EArray (Terminal.map (stringwrap Terminal.print)); EArray (Production.map (stringwrap reduce_or_accept)); ] ]) else EData ("None", []) ) (* ------------------------------------------------------------------------ *) (* Generate the two functions that map a token to its integer code and to its semantic value, respectively. 
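   In the generated code, each of these functions is a single match on the
   type [token]. As a rough sketch, for a hypothetical grammar with tokens
   PLUS and INT, [token2terminal] has the shape

     let token2terminal tok =
       match tok with
       | PLUS -> 1
       | INT _ -> 2
       ...

   where the integer codes are those assigned by [Terminal.t2i].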
*) let token2terminal = destructuretokendef "token2terminal" tint false (fun tok -> EIntConst (Terminal.t2i tok)) let token2value = destructuretokendef "token2value" tobj true (fun tok -> ERepr ( match Terminal.ocamltype tok with | None -> EUnit | Some _ -> EVar semv ) ) (* ------------------------------------------------------------------------ *) (* The client APIs invoke the interpreter with an appropriate start state. The monolithic API uses the function [entry], which performs the entire parsing process, while the incremental API relies on the function [start], which returns just an initial checkpoint. Both functions are defined in [lib/Engine.ml]. *) (* The function [entry] takes a [strategy] parameter, whose value is fixed at compile time, based on [Settings.strategy]. For users of the incremental API, the value of [Settings.strategy] is irrelevant; the functions [resume] and [loop] offered by the incremental API take a [strategy] parameter at runtime. *) let strategy = match Settings.strategy with | `Legacy -> EData ("`Legacy", []) | `Simplified -> EData ("`Simplified", []) (* An entry point to the monolithic API. *) let monolithic_entry_point state nt t = define ( Nonterminal.print true nt, let lexer = "lexer" and lexbuf = "lexbuf" in EFun ( [ PVar lexer; PVar lexbuf ], annotate ( EMagic ( EApp ( EVar entry, [ strategy; EIntConst (Lr1.number state); EVar lexer; EVar lexbuf ] ) ) ) (TypTextual t) ) ) (* The whole monolithic API. *) let monolithic_api : IL.valdef list = Lr1.fold_entry (fun _prod state nt t api -> monolithic_entry_point state nt t :: api ) [] (* An entry point to the incremental API. *) let incremental_entry_point state nt t = let initial = "initial_position" in define ( Nonterminal.print true nt, (* In principle the eta-expansion [fun initial_position -> start s initial_position] should not be necessary, since [start] is a pure function. However, when [--trace] is enabled, [start] will log messages to the standard error channel. *) EFun ( [ PVar initial ], annotate ( EMagic ( EApp ( EVar start, [ EIntConst (Lr1.number state); EVar initial; ] ) ) ) (checkpoint (TypTextual t)) ) ) (* The whole incremental API. *) let incremental_api : IL.valdef list = Lr1.fold_entry (fun _prod state nt t api -> incremental_entry_point state nt t :: api ) [] (* ------------------------------------------------------------------------ *) (* Constructing representations of symbols. *) (* [eterminal t] is a value of type ['a terminal] (for some ['a]) that encodes the terminal symbol [t]. It is just a data constructor of the terminal GADT. *) let eterminal (t : Terminal.t) : expr = EData (tokengadtdata (Terminal.print t), []) (* [enonterminal nt] is a value of type ['a nonterminal] (for some ['a]) that encodes the nonterminal symbol [nt]. It is just a data constructor of the nonterminal GADT. *) let enonterminal (nt : Nonterminal.t) : expr = EData (tnonterminalgadtdata (Nonterminal.print false nt), []) (* [esymbol symbol] is a value of type ['a symbol] (for some ['a]) that encodes the symbol [symbol]. It is built by applying the injection [T] or [N] to the terminal or nonterminal encoding. *) let dataT = "T" let dataN = "N" let esymbol (symbol : Symbol.t) : expr = match symbol with | Symbol.T t -> EData (dataT, [ eterminal t ]) | Symbol.N nt -> EData (dataN, [ enonterminal nt ]) (* [xsymbol symbol] is a value of type [xsymbol] that encodes the symbol [symbol]. It is built by applying the injection [X] (an existential quantifier) to [esymbol symbol]. 
*) let dataX = "X" let xsymbol (symbol : Symbol.t) : expr = EData (dataX, [ esymbol symbol ]) (* ------------------------------------------------------------------------ *) (* Produce a function that maps a terminal symbol (represented as an integer code) to its representation as an [xsymbol]. Include [error] but not [#], i.e., include all of the symbols which can appear in a production. *) (* Note that, instead of generating a function, we could (a) use an array or (b) use an unsafe conversion of an integer to a data constructor, then wrap it using [X] and [T/N]. Approach (b) is unsafe and causes memory allocation (due to the wrapping) at each call. *) let terminal () = assert Settings.inspection; let t = "t" in define ( "terminal", EFun ([ PVar t ], EMatch (EVar t, Terminal.mapx (fun tok -> branch (pint (Terminal.t2i tok)) (xsymbol (Symbol.T tok)) ) @ [ branch PWildcard (EComment ("This terminal symbol does not exist.", EApp (EVar "assert", [ efalse ]))) ] ) ) ) (* ------------------------------------------------------------------------ *) (* Produce a function that maps a (non-start) nonterminal symbol (represented as an integer code) to its representation as an [xsymbol]. *) let nonterminal () = assert Settings.inspection; let nt = "nt" in define ( "nonterminal", EFun ([ PVar nt ], EMatch (EVar nt, Nonterminal.foldx (fun nt branches -> branch (pint (Nonterminal.n2i nt)) (xsymbol (Symbol.N nt)) :: branches ) [ branch PWildcard (EComment ("This nonterminal symbol does not exist.", EApp (EVar "assert", [ efalse ]))) ] ) ) ) (* ------------------------------------------------------------------------ *) (* Produce a mapping of every LR(0) state to its incoming symbol (encoded as an integer value). (Note that the initial states do not have one.) *) let lr0_incoming () = assert Settings.inspection; define_and_measure ( "lr0_incoming", marshal1 (Array.init Lr0.n (fun node -> encode_symbol_option (Lr0.incoming_symbol node) )) ) (* ------------------------------------------------------------------------ *) (* A table that maps a production (i.e., an integer index) to the production's right-hand side. In principle, we use this table for ordinary productions only, as opposed to the start productions, whose existence is not exposed to the user. However, it is simpler (and not really costly) to include all productions in this table. *) let rhs () = assert Settings.inspection; let productions : int array array = Production.amap (fun prod -> Array.map encode_symbol (Production.rhs prod) ) in define_and_measure ( "rhs", linearize_and_marshal1 productions ) (* ------------------------------------------------------------------------ *) (* A table that maps an LR(1) state to its LR(0) core. *) let lr0_core () = assert Settings.inspection; define_and_measure ( "lr0_core", marshal1_list (Lr1.map (fun (node : Lr1.node) -> Lr0.core (Lr1.state node) )) ) (* A table that maps an LR(0) state to a set of LR(0) items. *) let lr0_items () = assert Settings.inspection; let items : int array array = Array.init Lr0.n (fun node -> Array.map Item.marshal (Array.of_list (Item.Set.elements (Lr0.items node))) ) in define_and_measure ( "lr0_items", linearize_and_marshal1 items ) (* ------------------------------------------------------------------------ *) (* A table that tells which nonterminal symbols are nullable. (For simplicity, this table includes the start symbols.) 
*) let nullable () = assert Settings.inspection; define_and_measure ( "nullable", marshal11_list ( Nonterminal.map (fun nt -> encode_bool (Analysis.nullable nt) ) ) ) (* ------------------------------------------------------------------------ *) (* A two-dimensional bitmap, indexed first by nonterminal symbols, then by terminal symbols, encodes the FIRST sets. *) let first () = assert Settings.inspection; define_and_measure ( "first", flatten_and_marshal11_list ( Nonterminal.map (fun nt -> Terminal.mapx (fun t -> encode_bool (TerminalSet.mem t (Analysis.first nt)) ) ) ) ) (* ------------------------------------------------------------------------ *) (* A reference to [MenhirLib.StaticVersion.require_XXXXXXXX], where [XXXXXXXX] is our 8-digit version number. This ensures that the generated code can be linked only with an appropriate version of MenhirLib. This is important because we use unsafe casts, and a version mismatch could cause a crash. *) let versiondef = { valpublic = true; valpat = PUnit; valval = EVar (staticVersion ^ ".require_" ^ Version.version); } (* ------------------------------------------------------------------------ *) (* Let's put everything together. *) open BasicSyntax let grammar = Front.grammar let program = [ SIFunctor (grammar.parameters, (* Make a reference to [MenhirLib.StaticVersion.require_XXXXXXXX], where [XXXXXXXX] is our 8-digit version number. This ensures that the generated code can be linked only with an appropriate version of MenhirLib. This is important because we use unsafe casts, and a version mismatch could cause a crash. *) SIComment "This generated code requires the following version of MenhirLib:" :: SIValDefs (false, [ versiondef ]) :: (* Define the internal sub-module [basics], which contains the definitions of the exception [Error] and of the type [token]. Then, include this sub-module. This sub-module is used again below, as part of the application of the functor [TableInterpreter.Make]. *) mbasics grammar @ (* In order to avoid hiding user-defined identifiers, only the exception [Error] and the type [token] should be defined (at top level, with non-mangled names) above this line. We also define the value [_eRR] above this line so that we do not have a problem if a user prelude hides the name [Error]. *) SIStretch grammar.preludes :: (* Define the tables. *) SIModuleDef (tables, MStruct [ (* The internal sub-module [basics] contains the definitions of the exception [Error] and of the type [token]. *) SIInclude (MVar basics); (* This is a non-recursive definition, so none of the names defined here are visible in the semantic actions. *) SIValDefs (false, [ token2terminal; define ("error_terminal", EIntConst (Terminal.t2i Terminal.error)); token2value; default_reduction; error; start_def; action; lhs; goto; semantic_action; trace; ]) ] ) :: SIModuleDef (interpreter, MStruct ( (* Apply the functor [TableInterpreter.MakeEngineTable] to the tables. *) SIModuleDef (et, MApp (MVar make_engine_table, MVar tables)) :: (* Apply the functor [Engine.Make] to obtain an engine. *) SIModuleDef (ti, MApp (MVar make_engine, MVar et)) :: SIInclude (MVar ti) :: MList.ifnlazy Settings.inspection (fun () -> (* Define the internal sub-module [symbols], which contains type definitions. Then, include this sub-module. This sub-module is used again below, as part of the application of the functor [TableInterpreter.MakeInspection]. 
*) SIModuleDef (symbols, MStruct ( interface_to_structure ( tokengadtdef grammar @ nonterminalgadtdef grammar ) )) :: SIInclude (MVar symbols) :: (* Apply the functor [InspectionTableInterpreter.Make], which expects four arguments. *) SIInclude (mapp (MVar make_inspection) [ (* Argument 1, of type [TableFormat.TABLES]. *) MVar tables; (* Argument 2, of type [InspectionTableFormat.TABLES]. *) MStruct ( (* [lr1state] *) SIInclude (MVar ti) :: (* [terminal], [nonterminal]. *) SIInclude (MVar symbols) :: (* This functor application builds the types [symbol] and [xsymbol] in terms of the types [terminal] and [nonterminal]. This saves us the trouble of generating these definitions. *) SIInclude (MApp (MVar make_symbol, MVar symbols)) :: SIValDefs (false, terminal() :: nonterminal() :: lr0_incoming() :: rhs() :: lr0_core() :: lr0_items() :: nullable() :: first() :: [] ) :: [] ); (* Argument 3, of type [EngineTypes.TABLE]. *) MVar et; (* Argument 4, of type [EngineTypes.ENGINE with ...]. *) MVar ti; ]) :: [] ) )) :: SIValDefs (false, monolithic_api) :: SIModuleDef (incremental, MStruct [ SIValDefs (false, incremental_api) ]) :: SIStretch grammar.postludes :: [])] let () = Time.tick "Producing abstract syntax" end menhir-20210929/src/tableBackend.mli000066400000000000000000000020711412503066000170740ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* The (table-based) code generator. *) module Run (T : sig end) : sig val program: IL.program end menhir-20210929/src/tarjan.ml000066400000000000000000000161501412503066000156460ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module provides an implementation of Tarjan's algorithm for finding the strongly connected components of a graph. The algorithm runs when the functor is applied. Its complexity is $O(V+E)$, where $V$ is the number of vertices in the graph $G$, and $E$ is the number of edges. *) module Run (G : sig type node (* We assume each node has a unique index. Indices must range from $0$ to $n-1$, where $n$ is the number of nodes in the graph. *) val n: int val index: node -> int (* Iterating over a node's immediate successors. *) val successors: (node -> unit) -> node -> unit (* Iterating over all nodes. *) val iter: (node -> unit) -> unit end) = struct (* Define the internal data structure associated with each node. *) type data = { (* Each node carries a flag which tells whether it appears within the SCC stack (which is defined below). *) mutable stacked: bool; (* Each node carries a number. Numbers represent the order in which nodes were discovered. 
*) mutable number: int; (* Each node [x] records the lowest number associated to a node already detected within [x]'s SCC. *) mutable low: int; (* Each node carries a pointer to a representative element of its SCC. This field is used by the algorithm to store its results. *) mutable representative: G.node; (* Each representative node carries a list of the nodes in its SCC. This field is used by the algorithm to store its results. *) mutable scc: G.node list } (* Define a mapping from external nodes to internal ones. Here, we simply use each node's index as an entry into a global array. *) let table = (* Create the array. We initially fill it with [None], of type [data option], because we have no meaningful initial value of type [data] at hand. *) let table = Array.make G.n None in (* Initialize the array. *) G.iter (fun x -> table.(G.index x) <- Some { stacked = false; number = 0; low = 0; representative = x; scc = [] } ); (* Define a function which gives easy access to the array. It maps each node to its associated piece of internal data. *) function x -> match table.(G.index x) with | Some dx -> dx | None -> assert false (* Indices do not cover the range $0\ldots n$, as expected. *) (* Create an empty stack, used to record all nodes which belong to the current SCC. *) let scc_stack = Stack.create() (* Initialize a function which allocates numbers for (internal) nodes. A new number is assigned to each node the first time it is visited. Numbers returned by this function start at 1 and increase. Initially, all nodes have number 0, so they are considered unvisited. *) let mark = let counter = ref 0 in fun dx -> incr counter; dx.number <- !counter; dx.low <- !counter (* This reference will hold a list of all representative nodes. The components that have been identified last appear at the head of the list. *) let representatives = ref [] (* Look at all nodes of the graph, one after the other. Any unvisited nodes become roots of the search forest. *) let () = G.iter (fun root -> let droot = table root in if droot.number = 0 then begin (* This node hasn't been visited yet. Start a depth-first walk from it. *) mark droot; droot.stacked <- true; Stack.push droot scc_stack; let rec walk x = let dx = table x in G.successors (fun y -> let dy = table y in if dy.number = 0 then begin (* $y$ hasn't been visited yet, so $(x,y)$ is a regular edge, part of the search forest. *) mark dy; dy.stacked <- true; Stack.push dy scc_stack; (* Continue walking, depth-first. *) walk y; if dy.low < dx.low then dx.low <- dy.low end else if (dy.low < dx.low) && dy.stacked then begin (* The first condition above indicates that $y$ has been visited before $x$, so $(x, y)$ is a backwards or transverse edge. The second condition indicates that $y$ is inside the same SCC as $x$; indeed, if it belongs to another SCC, then the latter has already been identified and moved out of [scc_stack]. *) if dy.number < dx.low then dx.low <- dy.number end ) x; (* We are done visiting $x$'s neighbors. *) if dx.low = dx.number then begin (* $x$ is the entry point of a SCC. The whole SCC is now available; move it out of the stack. We pop elements out of the SCC stack until $x$ itself is found. *) let rec loop () = let element = Stack.pop scc_stack in element.stacked <- false; dx.scc <- element.representative :: dx.scc; element.representative <- x; if element != dx then loop() in loop(); representatives := x :: !representatives end in walk root end ) (* There only remains to make our results accessible to the outside. 
*) let representative x = (table x).representative let scc x = (table x).scc let representatives = Array.of_list !representatives (* The array [representatives] contains a representative for each component. The components that have been identified last appear first in this array. A component is identified only after its successors have been identified; therefore, this array is naturally in topological order. *) let yield action x = let data = table x in assert (data.representative == x); (* a sanity check *) assert (data.scc <> []); (* a sanity check *) action x data.scc let iter action = Array.iter (yield action) representatives let rev_topological_iter action = MArray.iter_rev (yield action) representatives end menhir-20210929/src/tarjan.mli000066400000000000000000000055061412503066000160220ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module provides an implementation of Tarjan's algorithm for finding the strongly connected components of a graph. The algorithm runs when the functor is applied. Its complexity is $O(V+E)$, where $V$ is the number of vertices in the graph $G$, and $E$ is the number of edges. *) module Run (G : sig type node (* We assume each node has a unique index. Indices must range from $0$ to $n-1$, where $n$ is the number of nodes in the graph. *) val n: int val index: node -> int (* Iterating over a node's immediate successors. *) val successors: (node -> unit) -> node -> unit (* Iterating over all nodes. *) val iter: (node -> unit) -> unit end) : sig open G (* This function maps each node to a representative element of its component. *) val representative: node -> node (* This function maps each representative element to a list of all members of its component. Non-representative elements are mapped to an empty list. *) val scc: node -> node list (* [iter action] iterates over all components. For each component, the [action] function is applied to the component's representative element and to a list of the component's elements. (This must be a nonempty list.) The components are presented in topological order: that is, a component is examined before its successors are examined. *) val iter: (node -> node list -> unit) -> unit (* [rev_topological_iter action] iterates over all components. For each component, the [action] function is applied to the component's representative element and to a list of the component's elements. (This must be a nonempty list.) The components are presented in reverse topological order: that is, a component is examined after its successors have been examined. *) val rev_topological_iter: (node -> node list -> unit) -> unit end menhir-20210929/src/time.ml000066400000000000000000000034441412503066000153270ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. 
This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) open Unix open Printf let clock = ref (times()) let tick msg = match Settings.timings with | None -> () | Some channel -> let times1 = !clock in let times2 = times() in fprintf channel "%s: %.02fs\n%!" msg (times2.tms_utime -. times1.tms_utime); clock := times() type chrono = float ref let fresh () = ref 0. let chrono (chrono : float ref) (task : unit -> 'a) : 'a = match Settings.timings with | None -> task() | Some _channel -> let times1 = times() in let result = task() in let times2 = times() in chrono := !chrono +. times2.tms_utime -. times1.tms_utime; result let display (chrono : float ref) msg = match Settings.timings with | None -> () | Some channel -> fprintf channel "%s: %.02fs\n" msg !chrono menhir-20210929/src/time.mli000066400000000000000000000026651412503066000155040ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* Call [tick msg] to stop timing a task and start timing the next task. A message is displayed. The message includes [msg] as well as timing information. The very first task is deemed to begin when this module is initialized. *) val tick: string -> unit (* Another timing method, with separate chronometers; useful for more precise profiling. *) type chrono val fresh: unit -> chrono val chrono: chrono -> (unit -> 'a) -> 'a val display: chrono -> string -> unit menhir-20210929/src/tokenType.ml000066400000000000000000000151471412503066000163560ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module deals with a few details regarding the definition of the [token] type. In particular, if [--only-tokens] was specified, it emits the type definition and exits. *) open BasicSyntax open IL open CodeBits (* This is the conventional name of the [token] type, with no prefix. A prefix is possibly appended to it below, where [tctoken] is redefined before being exported. *) let tctoken = "token" let ttoken = TypApp (tctoken, []) (* This is the conventional name of the token GADT, which describes the tokens. Same setup as above. *) let tctokengadt = "terminal" let ttokengadt a = TypApp (tctokengadt, [ a ]) (* This is the conventional name of the data constructors of the token GADT. *) let ttokengadtdata token = "T_" ^ token (* This is the definition of the type of tokens. It is defined as an algebraic data type, unless [--external-tokens M] is set, in which case it is defined as an abbreviation for the type [M.token]. 
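   For instance (a sketch, for a hypothetical grammar), the declarations
   [%token PLUS] and [%token <int> INT] give rise to the definition

     type token =
       | PLUS
       | INT of int

   whereas, under [--external-tokens M], the emitted definition is simply
   [type token = M.token].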
*) let tokentypedef grammar = let typerhs = match Settings.token_type_mode with | Settings.TokenTypeOnly | Settings.TokenTypeAndCode -> (* Algebraic data type. *) TDefSum ( List.map (fun (tok, typo) -> { dataname = tok; datavalparams = (match typo with None -> [] | Some t -> [ TypTextual t ]); datatypeparams = None }) (typed_tokens grammar) ) | Settings.CodeOnly m -> (* Type abbreviation. *) TAbbrev (TypApp (m ^ "." ^ tctoken, [])) in [ IIComment "The type of tokens."; IITypeDecls [{ typename = tctoken; typeparams = []; typerhs; typeconstraint = None }] ] (* This is the definition of the token GADT. Here, the data constructors have no value argument, but have a type index. *) (* The token GADT is produced only when [Settings.inspection] is true. Thus, when [Settings.inspection] is false, we remain compatible with old versions of OCaml, without GADTs. *) (* Although the [token] type does not include the [error] token (because this token is never produced by the lexer), the token GADT must include the [error] token (because this GADT must describe all of the tokens that are allowed to appear in a production). *) (* It is defined as a generalized algebraic data type, unless [--external-tokens M] is set, in which case it is defined as an abbreviation for the type ['a M.tokengadt]. *) let tokengadtdef grammar = assert Settings.inspection; let param, typerhs = match Settings.token_type_mode with | Settings.TokenTypeOnly | Settings.TokenTypeAndCode -> (* Generalized algebraic data type. *) let param = "_" in param, TDefSum ( (* The ordering of this list matters. We want the data constructors to respect the internal ordering (as determined by [typed_tokens] in [BasicSyntax]) of the terminal symbols. This may be exploited in the table back-end to allow an unsafe conversion of a data constructor to an integer code. See [t2i] in [InspectionTableInterpreter]. *) { dataname = ttokengadtdata "error"; datavalparams = []; datatypeparams = Some [ tunit ] (* the [error] token has a semantic value of type [unit] *) } :: List.map (fun (token, typo) -> { dataname = ttokengadtdata token; datavalparams = []; datatypeparams = Some [ match typo with None -> tunit | Some t -> TypTextual t ] }) (typed_tokens grammar) ) | Settings.CodeOnly m -> (* Type abbreviation. *) let param = "a" in param, TAbbrev (TypApp (m ^ "." ^ tctokengadt, [ TypVar param ])) in [ IIComment "The indexed type of terminal symbols."; IITypeDecls [{ typename = tctokengadt; typeparams = [ param ]; typerhs; typeconstraint = None }] ] (* If we were asked to only produce a type definition, then do so and stop. *) let produce_tokentypes grammar = match Settings.token_type_mode with | Settings.TokenTypeOnly -> (* Create both an .mli file and an .ml file. This is made necessary by the fact that the two can be different when there are functor parameters. *) let i = tokentypedef grammar @ ifnlazy Settings.inspection (fun () -> tokengadtdef grammar ) in let module P = Printer.Make (struct let f = open_out (Settings.base ^ ".mli") let locate_stretches = None end) in P.interface [ IIFunctor (grammar.parameters, i) ]; let module P = Printer.Make (struct let f = open_out (Settings.base ^ ".ml") let locate_stretches = None end) in P.program [ SIFunctor (grammar.parameters, interface_to_structure i ) ]; exit 0 | Settings.CodeOnly _ | Settings.TokenTypeAndCode -> () (* The token type and the token GADTs can be referred to via a short (unqualified) name, regardless of how they have been defined (either directly or as an abbreviation). 
However, their data constructors must be qualified if [--external-tokens] is set. *) let tokenprefix id = match Settings.token_type_mode with | Settings.CodeOnly m -> m ^ "." ^ id | Settings.TokenTypeAndCode -> id | Settings.TokenTypeOnly -> id (* irrelevant, really *) let tokendata = tokenprefix let tokengadtdata token = tokenprefix (ttokengadtdata token) menhir-20210929/src/tokenType.mli000066400000000000000000000057221412503066000165250ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* This module deals with the definitions of the type(s) that describe the tokens and the terminal symbols. *) (* By default, following [ocamlyacc], we produce just one type, [token], which describes the tokens. A token contains a tag (a terminal symbol) and possibly a semantic value. *) (* In addition to that, in [--inspection] mode only, we produce a GADT which describes the terminal symbols. A terminal symbol is just a tag; it does not carry a semantic value. *) (* In this module, we also deal with [--only-tokens] and [--external-tokens]. If [--only-tokens] is specified on the command line, [produce_tokentypes] emits the type definition(s) and exit. If [--external-tokens M] is set, then the token type and the token GADT are defined as abbreviations for [M.token] and ['a M.terminal]. *) (* The conventional name of the [token] type, for use by the code generators. *) val ttoken: IL.typ (* [tokendata] maps the name of a token to a data constructor of the [token] type. (If [--external-tokens] is set, then it prefixes its argument with an appropriate OCaml module name. Otherwise, it is the identity.) *) val tokendata: string -> string (* The conventional name of the [terminal] type, a.k.a. the token GADT. This is an indexed type (i.e., it has one type parameter). Its data constructors carry zero value arguments. *) val tctokengadt: string val ttokengadt: IL.typ -> IL.typ (* [tokengadtdata] maps the name of a token to a data constructor of the token GADT. *) val tokengadtdata: string -> string (* The definitions of the token type and of the token GADT, for use by the code generators. Each of these lists defines zero or one type. *) val tokentypedef: BasicSyntax.grammar -> IL.interface val tokengadtdef: BasicSyntax.grammar -> IL.interface (* If [--only-tokens] is set, then [produce_tokentypes] writes the type definitions to the [.ml] and [.mli] files and stops Menhir. Otherwise, it does nothing. *) val produce_tokentypes: BasicSyntax.grammar -> unit menhir-20210929/src/traverse.ml000066400000000000000000000330361412503066000162240ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (* Code for traversing or transforming [IL] terms. 
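   ([IL] is the intermediate language in which Menhir builds the OCaml code
   that it emits.) The classes defined below, such as [map] and [fold],
   traverse an expression while threading an environment of type ['env],
   which keeps track of local variable bindings.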
*) open IL open CodeBits (* This turns a list of value definitions into a hash table. It also counts and numbers the definitions. We assume that the left-hand side of every definition is a variable. *) let tabulate_defs (defs : valdef list) : int * (string, int * valdef) Hashtbl.t = let count = ref 0 in let table = Hashtbl.create 1023 in List.iter (fun def -> let k = !count in count := k + 1; Hashtbl.add table (pat2var def.valpat) (k, def) ) defs; !count, table (* This mixin class, used by [map] and [fold] below, helps maintain environments, which can be used to keep track of local variable bindings. *) class virtual ['env] env = object(self) (* The virtual method [pvar] records a local variable binding in the environment. *) method virtual pvar: 'env -> string -> 'env method pat env = function | PWildcard | PUnit -> env | PVar id -> self#pvar env id | PVarLocated id -> let id = Positions.value id in self#pvar env id | PTuple ps | POr ps | PData (_, ps) -> self#pats env ps | PAnnot (p, _) -> self#pat env p | PRecord fps -> self#fpats env fps method pats env ps = List.fold_left self#pat env ps method fpats env fps = List.fold_left self#fpat env fps method fpat env (_, p) = self#pat env p end (* A class that helps transform expressions. The environment [env] can be used to keep track of local variable bindings. *) exception NoChange class virtual ['env] map = object (self) inherit ['env] env method expr (env : 'env) e = try match e with | EVar x -> self#evar env x | EFun (ps, e) -> self#efun env ps e | EApp (e, es) -> self#eapp env e es | ELet (bs, e) -> self#elet env bs e | EMatch (e, bs) -> self#ematch env e bs | EIfThen (e, e1) -> self#eifthen env e e1 | EIfThenElse (e, e1, e2) -> self#eifthenelse env e e1 e2 | ERaise e -> self#eraise env e | ETry (e, bs) -> self#etry env e bs | EUnit -> self#eunit env | EIntConst k -> self#eintconst env k | EStringConst s -> self#estringconst env s | EData (d, es) -> self#edata env d es | ETuple es -> self#etuple env es | EAnnot (e, t) -> self#eannot env e t | EMagic e -> self#emagic env e | ERepr e -> self#erepr env e | ERecord fs -> self#erecord env fs | ERecordAccess (e, f) -> self#erecordaccess env e f | ERecordWrite (e, f, e1) -> self#erecordwrite env e f e1 | ETextual action -> self#etextual env action | EComment (s, e) -> self#ecomment env s e | EPatComment (s, p, e) -> self#epatcomment env s p e | EArray es -> self#earray env es | EArrayAccess (e, i) -> self#earrayaccess env e i with NoChange -> e method evar _env _x = raise NoChange method efun env ps e = let e' = self#expr (self#pats env ps) e in if e == e' then raise NoChange else EFun (ps, e') method eapp env e es = let e' = self#expr env e and es' = self#exprs env es in if e == e' && es == es' then raise NoChange else EApp (e', es') method elet env bs e = let env, bs' = self#bindings env bs in let e' = self#expr env e in if bs == bs' && e == e' then raise NoChange else ELet (bs', e') method ematch env e bs = let e' = self#expr env e and bs' = self#branches env bs in if e == e' && bs == bs' then raise NoChange else EMatch (e', bs') method eifthen env e e1 = let e' = self#expr env e and e1' = self#expr env e1 in if e == e' && e1 == e1' then raise NoChange else EIfThen (e', e1') method eifthenelse env e e1 e2 = let e' = self#expr env e and e1' = self#expr env e1 and e2' = self#expr env e2 in if e == e' && e1 == e1' && e2 == e2' then raise NoChange else EIfThenElse (e', e1', e2') method eraise env e = let e' = self#expr env e in if e == e' then raise NoChange else ERaise e' method etry env e bs
= let e' = self#expr env e and bs' = self#branches env bs in if e == e' && bs == bs' then raise NoChange else ETry (e', bs') method eunit _env = raise NoChange method eintconst _env _k = raise NoChange method estringconst _env _s = raise NoChange method edata env d es = let es' = self#exprs env es in if es == es' then raise NoChange else EData (d, es') method etuple env es = let es' = self#exprs env es in if es == es' then raise NoChange else ETuple es' method eannot env e t = let e' = self#expr env e in if e == e' then raise NoChange else EAnnot (e', t) method emagic env e = let e' = self#expr env e in if e == e' then raise NoChange else EMagic e' method erepr env e = let e' = self#expr env e in if e == e' then raise NoChange else ERepr e' method erecord env fs = let fs' = self#fields env fs in if fs == fs' then raise NoChange else ERecord fs' method erecordaccess env e f = let e' = self#expr env e in if e == e' then raise NoChange else ERecordAccess (e', f) method erecordwrite env e f e1 = let e' = self#expr env e and e1' = self#expr env e1 in if e == e' && e1 == e1' then raise NoChange else ERecordWrite (e', f, e1') method earray env es = let es' = self#exprs env es in if es == es' then raise NoChange else EArray es' method earrayaccess env e i = let e' = self#expr env e in if e == e' then raise NoChange else EArrayAccess (e', i) method etextual _env _action = raise NoChange method ecomment env s e = let e' = self#expr env e in if e == e' then raise NoChange else EComment (s, e') method epatcomment env s p e = let e' = self#expr env e in if e == e' then raise NoChange else EPatComment (s, p, e') method exprs env es = Misc.smap (self#expr env) es method fields env fs = Misc.smap (self#field env) fs method field env ((f, e) as field) = let e' = self#expr env e in if e == e' then field else (f, e') method branches env bs = Misc.smap (self#branch env) bs method branch env b = let e = b.branchbody in let e' = self#expr (self#pat env b.branchpat) e in if e == e' then b else { b with branchbody = e' } (* The method [binding] produces a pair of an updated environment and a transformed binding. *) method binding env ((p, e) as b) = let e' = self#expr env e in self#pat env p, if e == e' then b else (p, e') (* For nested non-recursive bindings, the environment produced by each binding is used to traverse the following bindings. The method [binding] produces a pair of an updated environment and a transformed list of bindings. *) method bindings env bs = Misc.smapa self#binding env bs method valdef env def = let e = def.valval in let e' = self#expr env e in if e == e' then def else { def with valval = e' } method valdefs env defs = Misc.smap (self#valdef env) defs end (* A class that helps iterate, or fold, over expressions. 
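It is defined further down. But first, as an illustration of the [map] class above, here is a sketch (not part of Menhir's sources) of a transformer that renames every free occurrence of the variable [x] into [y]. The environment keeps track of the locally bound variables, so that bound occurrences are left untouched; the auxiliary module [StringSet] is introduced here purely for the sake of the example. *)

module StringSet = Set.Make(String)

class rename (x : string) (y : string) = object
  inherit [StringSet.t] map
  (* Record a local variable binding in the environment. *)
  method pvar env id =
    StringSet.add id env
  (* Rename a free occurrence of [x]; otherwise, signal via [NoChange]
     that this expression is unchanged. *)
  method! evar env id =
    if id = x && not (StringSet.mem id env) then EVar y
    else raise NoChange
end

(* One would invoke it as [(new rename "x" "y")#expr StringSet.empty e]. Here, now, is the [fold] class.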
*) class virtual ['env, 'a] fold = object (self) inherit ['env] env method expr (env : 'env) (accu : 'a) e = match e with | EVar x -> self#evar env accu x | EFun (ps, e) -> self#efun env accu ps e | EApp (e, es) -> self#eapp env accu e es | ELet (bs, e) -> self#elet env accu bs e | EMatch (e, bs) -> self#ematch env accu e bs | EIfThen (e, e1) -> self#eifthen env accu e e1 | EIfThenElse (e, e1, e2) -> self#eifthenelse env accu e e1 e2 | ERaise e -> self#eraise env accu e | ETry (e, bs) -> self#etry env accu e bs | EUnit -> self#eunit env accu | EIntConst k -> self#eintconst env accu k | EStringConst s -> self#estringconst env accu s | EData (d, es) -> self#edata env accu d es | ETuple es -> self#etuple env accu es | EAnnot (e, t) -> self#eannot env accu e t | EMagic e -> self#emagic env accu e | ERepr e -> self#erepr env accu e | ERecord fs -> self#erecord env accu fs | ERecordAccess (e, f) -> self#erecordaccess env accu e f | ERecordWrite (e, f, e1) -> self#erecordwrite env accu e f e1 | ETextual action -> self#etextual env accu action | EComment (s, e) -> self#ecomment env accu s e | EPatComment (s, p, e) -> self#epatcomment env accu s p e | EArray es -> self#earray env accu es | EArrayAccess (e, i) -> self#earrayaccess env accu e i method evar (_env : 'env) (accu : 'a) _x = accu method efun (env : 'env) (accu : 'a) ps e = let accu = self#expr (self#pats env ps) accu e in accu method eapp (env : 'env) (accu : 'a) e es = let accu = self#expr env accu e in let accu = self#exprs env accu es in accu method elet (env : 'env) (accu : 'a) bs e = let env, accu = self#bindings env accu bs in let accu = self#expr env accu e in accu method ematch (env : 'env) (accu : 'a) e bs = let accu = self#expr env accu e in let accu = self#branches env accu bs in accu method eifthen (env : 'env) (accu : 'a) e e1 = let accu = self#expr env accu e in let accu = self#expr env accu e1 in accu method eifthenelse (env : 'env) (accu : 'a) e e1 e2 = let accu = self#expr env accu e in let accu = self#expr env accu e1 in let accu = self#expr env accu e2 in accu method eraise (env : 'env) (accu : 'a) e = let accu = self#expr env accu e in accu method etry (env : 'env) (accu : 'a) e bs = let accu = self#expr env accu e in let accu = self#branches env accu bs in accu method eunit (_env : 'env) (accu : 'a) = accu method eintconst (_env : 'env) (accu : 'a) _k = accu method estringconst (_env : 'env) (accu : 'a) _s = accu method edata (env : 'env) (accu : 'a) _d es = let accu = self#exprs env accu es in accu method etuple (env : 'env) (accu : 'a) es = let accu = self#exprs env accu es in accu method eannot (env : 'env) (accu : 'a) e _t = let accu = self#expr env accu e in accu method emagic (env : 'env) (accu : 'a) e = let accu = self#expr env accu e in accu method erepr (env : 'env) (accu : 'a) e = let accu = self#expr env accu e in accu method erecord (env : 'env) (accu : 'a) fs = let accu = self#fields env accu fs in accu method erecordaccess (env : 'env) (accu : 'a) e _f = let accu = self#expr env accu e in accu method erecordwrite (env : 'env) (accu : 'a) e _f e1 = let accu = self#expr env accu e in let accu = self#expr env accu e1 in accu method earray (env : 'env) (accu : 'a) es = let accu = self#exprs env accu es in accu method earrayaccess (env : 'env) (accu : 'a) e _i = let accu = self#expr env accu e in accu method etextual (_env : 'env) (accu : 'a) _action = accu method ecomment (env : 'env) (accu : 'a) _s e = let accu = self#expr env accu e in accu method epatcomment (env : 'env) (accu : 'a) _s _p e = let accu =
self#expr env accu e in accu method exprs (env : 'env) (accu : 'a) es = List.fold_left (self#expr env) accu es method fields (env : 'env) (accu : 'a) fs = List.fold_left (self#field env) accu fs method field (env : 'env) (accu : 'a) (_f, e) = let accu = self#expr env accu e in accu method branches (env : 'env) (accu : 'a) bs = List.fold_left (self#branch env) accu bs method branch (env : 'env) (accu : 'a) b = let accu = self#expr (self#pat env b.branchpat) accu b.branchbody in accu method binding ((env, accu) : 'env * 'a) (p, e) = let accu = self#expr env accu e in self#pat env p, accu method bindings (env : 'env) (accu : 'a) bs = List.fold_left self#binding (env, accu) bs method valdef (env : 'env) (accu : 'a) def = let accu = self#expr env accu def.valval in accu method valdefs (env : 'env) (accu : 'a) defs = List.fold_left (self#valdef env) accu defs end menhir-20210929/src/unionFind.ml000066400000000000000000000120111412503066000163100ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (** This module implements a simple and efficient union/find algorithm. See Robert E. Tarjan, ``Efficiency of a Good But Not Linear Set Union Algorithm'', JACM 22(2), 1975. *) (** The abstraction defined by this module is a set of points, partitioned into equivalence classes. With each equivalence class, a piece of information, of abstract type ['a], is associated; we call it a descriptor. A point is implemented as a cell, whose (mutable) contents consist of a single link to either information about the equivalence class, or another point. Thus, points form a graph, which must be acyclic, and whose connected components are the equivalence classes. In every equivalence class, exactly one point has no outgoing edge, and carries information about the class instead. It is the class's representative element. Information about a class consists of an integer weight (the number of elements in the class) and of the class's descriptor. *) type 'a point = { mutable link: 'a link } and 'a link = | Info of 'a info | Link of 'a point and 'a info = { mutable weight: int; mutable descriptor: 'a } (** [fresh desc] creates a fresh point and returns it. It forms an equivalence class of its own, whose descriptor is [desc]. *) let fresh desc = { link = Info { weight = 1; descriptor = desc } } (** [repr point] returns the representative element of [point]'s equivalence class. It is found by starting at [point] and following the links. For efficiency, the function performs path compression at the same time. *) let rec repr point = match point.link with | Link point' -> let point'' = repr point' in if point'' != point' then (* [point''] is [point']'s representative element. Because we just invoked [repr point'], [point'.link] must be [Link point'']. We write this value into [point.link], thus performing path compression. Note that this function never performs memory allocation. *) point.link <- point'.link; point'' | Info _ -> point (** [get point] returns the descriptor associated with [point]'s equivalence class. 
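The companion function [set], defined below, updates this descriptor.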
*) let rec get point = (* By not calling [repr] immediately, we optimize the common cases where the path starting at [point] has length 0 or 1, at the expense of the general case. *) match point.link with | Info info | Link { link = Info info } -> info.descriptor | Link { link = Link _ } -> get (repr point) let rec set point v = match point.link with | Info info | Link { link = Info info } -> info.descriptor <- v | Link { link = Link _ } -> set (repr point) v (** [union point1 point2] merges the equivalence classes associated with [point1] and [point2] into a single class whose descriptor is that originally associated with [point2]. It does nothing if [point1] and [point2] already are in the same class. The weights are used to determine whether [point1] should be made to point to [point2], or vice-versa. By making the representative of the smaller class point to that of the larger class, we guarantee that paths remain of logarithmic length (not accounting for path compression, which makes them yet smaller). *) let union point1 point2 = let point1 = repr point1 and point2 = repr point2 in if point1 != point2 then match point1.link, point2.link with | Info info1, Info info2 -> let weight1 = info1.weight and weight2 = info2.weight in if weight1 >= weight2 then begin point2.link <- Link point1; info1.weight <- weight1 + weight2; info1.descriptor <- info2.descriptor end else begin point1.link <- Link point2; info2.weight <- weight1 + weight2 end | _, _ -> assert false (* [repr] guarantees that [link] matches [Info _]. *) (** [equivalent point1 point2] tells whether [point1] and [point2] belong to the same equivalence class. *) let equivalent point1 point2 = repr point1 == repr point2 menhir-20210929/src/unionFind.mli000066400000000000000000000042711412503066000164720ustar00rootroot00000000000000(******************************************************************************) (* *) (* Menhir *) (* *) (* François Pottier, Inria Paris *) (* Yann Régis-Gianas, PPS, Université Paris Diderot *) (* *) (* Copyright Inria. All rights reserved. This file is distributed under the *) (* terms of the GNU General Public License version 2, as described in the *) (* file LICENSE. *) (* *) (******************************************************************************) (** This module implements a simple and efficient union/find algorithm. See Robert E. Tarjan, ``Efficiency of a Good But Not Linear Set Union Algorithm'', JACM 22(2), 1975. *) (** The abstraction defined by this module is a set of points, partitioned into equivalence classes. With each equivalence class, a piece of information, of abstract type ['a], is associated; we call it a descriptor. *) type 'a point (** [fresh desc] creates a fresh point and returns it. It forms an equivalence class of its own, whose descriptor is [desc]. *) val fresh: 'a -> 'a point (** [get point] returns the descriptor associated with [point]'s equivalence class. *) val get: 'a point -> 'a (** [union point1 point2] merges the equivalence classes associated with [point1] and [point2] into a single class whose descriptor is that originally associated with [point2]. It does nothing if [point1] and [point2] already are in the same class. *) val union: 'a point -> 'a point -> unit (** [equivalent point1 point2] tells whether [point1] and [point2] belong to the same equivalence class. *) val equivalent: 'a point -> 'a point -> bool (** [set p d] updates the descriptor of [p] to [d]. *) val set: 'a point -> 'a -> unit
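(** As an illustration (a sketch, not part of Menhir's sources), here is
    how this module might be used by a client, assuming it is visible
    under the name [UnionFind]:

    {[
      let () =
        let p1 = UnionFind.fresh "a"
        and p2 = UnionFind.fresh "b"
        and p3 = UnionFind.fresh "c" in
        (* Merge the classes of [p1] and [p2]. The merged class receives
           the descriptor of [union]'s second argument, here "b". *)
        UnionFind.union p1 p2;
        assert (UnionFind.equivalent p1 p2);
        assert (UnionFind.get p1 = "b");
        assert (not (UnionFind.equivalent p1 p3));
        (* [set] updates the descriptor of the entire class. *)
        UnionFind.set p1 "d";
        assert (UnionFind.get p2 = "d")
    ]}
*)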