*Megaparsec follows [SemVer](https://semver.org/).*

## Megaparsec 9.3.1

* Fixed a bug related to the processing of tabs when error messages are rendered. [Issue 524](https://github.com/mrkkrp/megaparsec/issues/524).

## Megaparsec 9.3.0

* `label` can now override more than one group of hints in the parser it wraps. [Issue 482](https://github.com/mrkkrp/megaparsec/issues/482).
* `takeP n` now returns the empty chunk of the input stream when `n` is negative, similar to when `n == 0`. [Issue 497](https://github.com/mrkkrp/megaparsec/issues/497).
* Added the `MonadParsecDbg` type class in `Text.Megaparsec.Debug`. The type class allows us to use `dbg` in MTL monad transformers. [Issue 488](https://github.com/mrkkrp/megaparsec/issues/488).
* Introduced the `ShareInput` and `NoShareInput` newtype wrappers in `Text.Megaparsec.Stream` in order to allow the user to choose how the input should be sliced and shared during parsing. [Issue 492](https://github.com/mrkkrp/megaparsec/issues/492).

## Megaparsec 9.2.2

* Fixed a space leak in the implementations of the `reachOffset` and `reachOffsetNoLine` methods of `TraversableStream`. [Issue 486](https://github.com/mrkkrp/megaparsec/issues/486).

## Megaparsec 9.2.1

* Builds with `mtl-2.3` and `transformers-0.6`.

## Megaparsec 9.2.0

* Added parsers for binary representations (little/big endian) of numbers in `Text.Megaparsec.Byte.Binary`.

## Megaparsec 9.1.0

* Added `dbg'` in `Text.Megaparsec.Debug` for debugging parsers that have unshowable return values.
* Documentation improvements.

## Megaparsec 9.0.1

* Added [Safe Haskell](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/safe_haskell.html) support.

## Megaparsec 9.0.0

* Split the `Stream` type class.
  The methods `showTokens` and `tokensLength` have been put into a separate type class `VisualStream`, while `reachOffset` and `reachOffsetNoLine` are now in `TraversableStream`. This should make defining `Stream` instances for custom streams easier.
* Defined `Stream` instances for lists and `Seq`s.
* Added the functions `hspace` and `hspace1` to the `Text.Megaparsec.Char` and `Text.Megaparsec.Byte` modules.

## Megaparsec 8.0.0

* The methods `failure` and `fancyFailure` of `MonadParsec` are now ordinary functions and live in `Text.Megaparsec`. They are defined in terms of the new `parseError` method of `MonadParsec`. This method allows us to signal parse errors at a given offset without manipulating the parser state manually.
* Megaparsec now supports registration of “delayed” parse errors. At a lower level we added a new field called `stateParseErrors` to the `State` record; its type also had to change from `State s` to `State s e`. This field contains the list of registered `ParseError`s that do not end parsing immediately but still cause failure in the end if the list is not empty. Users are expected to register parse errors using three functions: `registerParseError`, `registerFailure`, and `registerFancyFailure`. These functions are analogous to those without the `register` prefix, except that they have a “delayed” effect.
* Added the `tokensLength` method to the `Stream` type class to improve support for custom input streams.
* Added the `setErrorOffset` function to set the offset of `ParseError`s.
* Changed the type signatures of the `reachOffset` and `reachOffsetNoLine` methods of the `Stream` type class. Instead of a three-tuple `reachOffset` now returns a two-tuple, because `SourcePos` is already contained in the returned `PosState` record.
* Generalized the `decimal`, `binary`, `octal`, and `hexadecimal` parsers in the lexer modules so that they work with any `Num` instance instead of just `Integral`.
* Dropped support for GHC 8.2.x and older.

## Megaparsec 7.0.5

* Dropped support for GHC 7.10.
* Adapted the code to `MonadFail` changes in `base-4.13`.
* Separated the test suite into its own package. The reason is that we can avoid a circular dependency on `hspec-megaparsec` and thus avoid keeping copies of its source files in our test suite, as we had to do before. Another benefit is that we can export some auxiliary functions in `megaparsec-tests` which can be used by other test suites, for example in the `parser-combinators-tests` package. Versions of `megaparsec-tests` will be kept in sync with versions of `megaparsec` from now on.

## Megaparsec 7.0.4

* Numerous documentation corrections.

## Megaparsec 7.0.3

* Fixed the build with `mtl` older than `2.2.2`.

## Megaparsec 7.0.2

* Fixed the property test for `char'`, which was failing in the case when there is a character with different upper and title cases.
* More descriptive error messages when `elabel` or `ulabel` from `Text.Megaparsec.Error.Builder` are used with empty strings.
* Typo fixes in the docs.

## Megaparsec 7.0.1

* Fixed a bug in `errorBundlePretty`. Previously the question mark `?` was erroneously inserted before the offending line in the 2nd and later parse errors.

## Megaparsec 7.0.0

### General

* Dropped the `Text.Megaparsec.Perm` module. Use `Control.Applicative.Permutations` from `parser-combinators` instead.
* Dropped the `Text.Megaparsec.Expr` module. Use `Control.Monad.Combinators.Expr` from `parser-combinators` instead.
* The debugging function `dbg` has been moved from `Text.Megaparsec` to its own module `Text.Megaparsec.Debug`.
* Dropped support for GHC 7.8.

### Combinators

* Moved some general combinators from `Text.Megaparsec.Char` and `Text.Megaparsec.Byte` to `Text.Megaparsec`, renaming some of them for clarity. Practical consequences:
    * There is now the `single` combinator that is a generalization of `char` for arbitrary streams. `Text.Megaparsec.Char` and `Text.Megaparsec.Byte` still contain `char` as type-constrained versions of `single`.
    * Similarly, there is now the `chunk` combinator that is a generalization of `string` for arbitrary streams. The `string` combinator is still re-exported from `Text.Megaparsec.Char` and `Text.Megaparsec.Byte` for compatibility.
    * `satisfy` does not depend on the type of token, and so it now lives in `Text.Megaparsec`.
    * `anyChar` was renamed to `anySingle` and moved to `Text.Megaparsec`.
    * `notChar` was renamed to `anySingleBut` and moved to `Text.Megaparsec`.
    * `oneOf` and `noneOf` were moved to `Text.Megaparsec`.
* Simplified the type of the `token` primitive. It now takes just a matching function `Token s -> Maybe a` as the first argument and the collection of expected items `Set (ErrorItem (Token s))` as the second argument. This makes sense because the collection of expected items cannot depend on what we see in the input stream.
* The `label` primitive no longer prepends the phrase “the rest of” to the label when its inner parser produces hints after consuming input. In that case `label` has no effect.
* Fixed `Text.Megaparsec.Char.Lexer.charLiteral` so it can accept longer escape sequences (the maximal length is now 10).
* Added the `binDigitChar` functions in `Text.Megaparsec.Byte` and `Text.Megaparsec.Char`.
* Added the `binary` functions in `Text.Megaparsec.Byte.Lexer` and `Text.Megaparsec.Char.Lexer`.
* Improved case-insensitive character matching in the cases when e.g. `isLower` and `isUpper` both return `False`. Functions affected: `Text.Megaparsec.Char.char'`.
* Renamed `getPosition` to `getSourcePos`.
* Renamed `getTokensProcessed` to `getOffset` and `setTokensProcessed` to `setOffset`.
* Dropped `getTabWidth` and `setTabWidth` because tab width is now irrelevant to the parsing process; it is only relevant for pretty-printing of parse errors, which is handled separately.
* Added `withParsecT` in `Text.Megaparsec.Internal` to allow changing the type of the custom data component in parse errors.
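The generalized combinators above can be sketched in use; this is a minimal example against the Megaparsec 7+ API (the `keyValue` parser and its syntax are invented for illustration):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type Parser = Parsec Void Text

-- 'single' and 'chunk' work for any stream; 'char' and 'string'
-- are their type-constrained versions for character streams.
keyValue :: Parser (Char, Text)
keyValue = do
  k <- satisfy (/= '=')   -- any single token except '='
  _ <- single '='         -- equivalent to: char '='
  v <- chunk "yes" <|> chunk "no"
  pure (k, v)
```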
### Parser state and input stream

* Dropped stacks of source positions. Accordingly, the functions `pushPosition` and `popPosition` from `Text.Megaparsec` and `sourcePosStackPretty` from `Text.Megaparsec.Error` were removed. The reason for this simplification is that I could not find any code that uses the feature, and it makes manipulation of source positions hairy.
* Introduced `PosState`, which is used for calculating `SourcePos` from offsets and for getting the offending line when pretty-printing parse errors. It is now contained in both `State` and `ParseErrorBundle`.
* Dropped the `positionAt1`, `positionAtN`, `advance1`, and `advanceN` methods from `Stream`. They are no longer necessary because `reachOffset` (and its specialized version `reachOffsetNoLine`) takes care of `SourcePos` calculation.

### Parse errors

* `ParseError` now contains the raw offset in the input stream instead of `SourcePos`. `errorPos` was dropped from `Text.Megaparsec.Error`.
* `ParseError` is now parametrized over the stream type `s` instead of the token type `t`.
* Introduced `ParseErrorBundle`, which contains one or more `ParseError`s equipped with all the information that is necessary to pretty-print them together with the offending lines from the input stream. Functions like `runParser` now return `ParseErrorBundle` instead of a plain `ParseError`. By default there will be only one `ParseError` in such a bundle, but it is possible to add more parse errors to a bundle manually. During pretty-printing, the input stream is traversed only once.
* The primary function for pretty-printing of parse errors, `errorBundlePretty`, always prints the offending lines now. `parseErrorPretty` is still there, but it probably won't see a lot of use from now on. `parseErrorPretty'` and `parseErrorPretty_` were removed. `parseTest'` was removed because `parseTest` always prints the offending lines now.
* Added the `attachSourcePos` function in `Text.Megaparsec.Error`.
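The bundle-based error workflow can be exercised as follows; a minimal sketch (the `digits` parser, file name, and input are invented for illustration):

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (digitChar)

type Parser = Parsec Void String

digits :: Parser String
digits = some digitChar <* eof

main :: IO ()
main =
  case runParser digits "example.txt" "12a" of
    -- errorBundlePretty renders every error in the bundle
    -- together with the offending line from the input.
    Left bundle -> putStr (errorBundlePretty bundle)
    Right xs    -> print xs
```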
* The `ShowToken` type class has been removed and its method `showTokens` now lives in the `Stream` type class.
* The `LineToken` type class is no longer necessary because the new `reachOffset` method of the `Stream` type class does its job.
* In `Text.Megaparsec.Error` the following functions were added: `mapParseError`, `errorOffset`.
* Implemented continuous highlighting in parse errors. For this we added the `errorComponentLen` method to the `ShowErrorComponent` type class.

### Parse error builder

* The functions `err` and `errFancy` now accept the offsets at which the parse errors are expected to have happened, i.e. `Int`s. Thus `posI` and `posN` are no longer necessary and were removed.
* `ET` is now parametrized over the stream type `s` instead of the token type `t`.
* Combinators like `utoks` and `etoks` now accept chunks of the input stream directly, i.e. `Tokens s` instead of `[Token s]`, which should be more natural and convenient.

## Megaparsec 6.5.0

* Added `Text.Megaparsec.Internal`, which exposes some internal data structures and the data constructor of `ParsecT`.

## Megaparsec 6.4.1

* `scientific` now correctly backtracks after attempting to parse the fractional and exponent parts of a number. `float` correctly backtracks after attempting to parse the optional exponent part (it is optional when it comes after the fractional part, otherwise it is obligatory).

## Megaparsec 6.4.0

* `Text.Megaparsec` now re-exports `Control.Monad.Combinators` instead of `Control.Applicative.Combinators` from `parser-combinators`, because the monadic counterparts of the familiar combinators are more efficient and not as leaky. This may cause minor breakage in certain cases:
    * You import `Control.Applicative`, in which case there will now be a name conflict between `Control.Applicative.many` and `Control.Monad.Combinators.many` (the same for `some`).
    * You define a polymorphic helper in terms of combinator(s) from `Control.Applicative.Combinators` and use an `Applicative` or `Alternative` constraint.
      In this case you'll have to adjust the constraint to be `Monad` or `MonadPlus` respectively. Also note that the new `Control.Monad.Combinators` module we re-export now re-exports `empty` from `Control.Applicative`.
* Fixed the `atEnd` parser. It no longer produces hints, so when you use it, it won't contribute to the “expecting end of input” component of a parse error.

## Megaparsec 6.3.0

* Added an `IsString` instance for `ParsecT`. Now it is possible to write `"abc"` rather than `string "abc"`.
* Added the `customFailure` combinator, which is a special case of `fancyFailure`.
* Made the implementations of `sconcat` and `mconcat` of `ParsecT` more efficient.

## Megaparsec 6.2.0

* `float` in `Text.Megaparsec.Char.Lexer` and `Text.Megaparsec.Byte.Lexer` no longer accepts plain integers. This is the behavior we had in version 5 of the library.

## Megaparsec 6.1.1

* Fixed the bug where `tokens` used the `cok` continuation even when matching an empty chunk. Now it correctly uses `eok` in this case.

## Megaparsec 6.1.0

* Improved rendering of the offending line in `parseErrorPretty'` in the presence of tab characters.
* Added `parseErrorPretty_`, which is just like `parseErrorPretty'` but allows the user to specify the tab width to use.
* Adjusted hint generation so that when we backtrack a consuming parser with `try`, we do not create hints from its parse error (because it is further in the input stream!). This was a quite subtle bug that apparently stayed unnoticed for several years.

## Megaparsec 6.0.2

* Allow `parser-combinators-0.2.0`.

## Megaparsec 6.0.1

* Fixed a typo in `README.md`.
* Added some text that clarifies how to parametrize the `ParseError` type.

## Megaparsec 6.0.0

### General

* Re-organized the module hierarchy. Some modules such as `Text.Megaparsec.Prim` do not exist anymore. Stream definitions were moved to `Text.Megaparsec.Stream`. Generic combinators are now re-exported from `Control.Applicative.Combinators` from the package `parser-combinators`.
  Just import `Text.Megaparsec` and you should be OK. Add `Text.Megaparsec.Char` if you are working with a stream of `Char`s or `Text.Megaparsec.Byte` if you intend to parse binary data, then add the qualified modules you need (permutation parsing, lexing, expression parsing, etc.). `Text.Megaparsec.Lexer` was renamed to `Text.Megaparsec.Char.Lexer` because many functions in it have the `Token s ~ Char` constraint. There is also `Text.Megaparsec.Byte.Lexer` now, although it has fewer functions.
* Dropped the per-stream modules; the `Parser` type synonym is to be defined manually by the user.
* Added a `MonadFix` instance for `ParsecT`.
* More lightweight dependency tree; dropped the `exceptions` and `QuickCheck` dependencies.
* Added a dependency on `case-insensitive`.

### Source positions

* `Pos` now contains an `Int` inside, not a `Word`.
* Dropped `unsafePos` and changed the type of `mkPos` so that it throws from pure code if its argument is not a positive `Int`.
* Added the `pos1` constant that represents the `Pos` with the value 1 inside.
* Made `InvalidPosException` contain the invalid `Int` value that was passed to `mkPos`.

### Parse errors

* Changed the definition of `ParseError` to have separate data constructors for “trivial” errors (unexpected/expected tokens) and “fancy” errors (everything else).
* Removed the `ErrorComponent` type class and added `ErrorFancy` instead. `ErrorFancy` is a sum type which can represent `fail` messages, incorrect indentation, and custom data (we use `Void` for that by default to “disable” it). This is better than the typeclass-based approach because every instance of `ErrorComponent` needed to have constructors for `fail` and indentation messages anyway, leading to duplication of code (for example for parse error component rendering).
* Added `Functor` instances for `ErrorItem` and `ErrorFancy`.
* Added the function `errorPos` to get error positions from a `ParseError` (previously it was a record selector in `ParseError`).
* Control characters in parse errors are displayed in a readable form even when they are part of strings, for example: `{<newline>` (`{` followed by the newline character). Previously control characters were rendered in readable form only as standalone tokens.
* Added the `Text.Megaparsec.Error.Builder` module to help construct `ParseError`s easily. It is useful for testing and debugging. Previously we had something like that in the `hspec-megaparsec` package, but it does not hurt to ship it with the library.
* Added `parseErrorPretty'`, which allows the user to display the offending line in parse errors.
* Added the `LineToken` type class for tokens that support the operations necessary for selecting and displaying the relevant line of input (used in `parseErrorPretty'`).
* Added the `parseTest'` function that is just like `parseTest`, but also prints the offending line in parse errors. This is powered by the new `parseErrorPretty'`.

### Stream

* Introduced the new `Text.Megaparsec.Stream` module that is the home of the `Stream` type class. In version 6, the type class has been extended significantly to improve performance and make some combinators more general.

### Combinators

* Changed the signatures of `failure` and `token`; they can only signal trivial errors now.
* Added a new method of the `MonadParsec` type class called `fancyFailure` for signalling non-trivial failures. Signatures of some functions (`failure`, `token`) have been changed accordingly.
* Added `takeWhileP`, `takeWhile1P`, and `takeP` to `MonadParsec`.
* Added the `takeRest` non-primitive combinator to consume the rest of the input.
* Added `atEnd`, which returns `True` when the end of input has been reached.
* Dropped `oneOf'` and `noneOf'` from `Text.Megaparsec.Char`. These were seldom (if ever) used and are easily re-implemented.
* Added `notChar` in `Text.Megaparsec.Char`.
* Added `space1` in `Text.Megaparsec.Char`. This parser is like `space` but requires at least one space character to be present to succeed.
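The `takeWhileP` family above consumes matching tokens as a single chunk of the stream, which is what makes it fast; a minimal sketch (the identifier syntax is invented for illustration):

```haskell
import Data.Char (isAlphaNum, isSpace)
import Data.Void (Void)
import Text.Megaparsec

type Parser = Parsec Void String

-- takeWhile1P/takeWhileP take an optional label for error messages
-- and a predicate; the label shows up as the "expected" item.
identifier :: Parser String
identifier = takeWhile1P (Just "identifier character") isAlphaNum

spaceChunk :: Parser String
spaceChunk = takeWhileP (Just "white space") isSpace
```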
* Added the new module `Text.Megaparsec.Byte`, which is similar to `Text.Megaparsec.Char`, but for token streams of the type `Word8` instead of `Char`.
* `integer` was dropped from `Text.Megaparsec.Char.Lexer`. Use `decimal` instead.
* `number` was dropped from `Text.Megaparsec.Char.Lexer`. Use `scientific` instead.
* `decimal`, `octal`, and `hexadecimal` are now polymorphic in their return type and can be used to parse any instance of `Integral`.
* `float` is now polymorphic in its return type and can be used to parse any instance of `RealFloat`.
* Added the new module `Text.Megaparsec.Byte.Lexer`, which provides some functions (white space and numeric helpers) from `Text.Megaparsec.Char.Lexer` for streams with the token type `Word8`.

## Megaparsec 5.3.1

* Various updates to the docs.
* Allowed `QuickCheck-2.10`.

## Megaparsec 5.3.0

* Added the `match` combinator that allows the user to get the collection of consumed tokens along with the result of parsing.
* Added the `region` combinator, which allows the user to process parse errors happening when its argument parser is run.
* Added `getNextTokenPosition`, which returns the position where the next token in the stream begins.
* Defined `Semigroup` and `Monoid` instances of `ParsecT`.
* Dropped support for GHC 7.6.
* Added an `ErrorComponent` instance for `()`.

## Megaparsec 5.2.0

* Added a `MonadParsec` instance for `RWST`.
* Allowed `many` to run parsers that do not consume input. Previously this signalled an `error`, which was ugly. Of course, in most cases giving `many` a parser that does not consume input will lead to non-termination bugs, but there are legal cases when this should be allowed. The test suite now contains an example of this. Non-termination issues are something inherited from the power Megaparsec gives (with more power comes more responsibility), so the `error` case in `many` really did not solve the problem; it was just a little ad-hoc guard we got from Parsec's past.
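The return-type polymorphism of the numeric lexers mentioned above lets the caller pick the result type via annotation; a minimal sketch against the modern `Text.Megaparsec.Char.Lexer` API:

```haskell
import Data.Void (Void)
import Data.Word (Word8)
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- The same 'decimal' produces any Integral type;
-- the type annotation selects the instance.
asInt :: Parser Int
asInt = L.decimal

asWord8 :: Parser Word8
asWord8 = L.decimal

-- 'float' is likewise polymorphic over RealFloat.
asDouble :: Parser Double
asDouble = L.float
```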
* The criterion benchmark was completely re-written and a new weigh benchmark for analyzing memory consumption was added.
* Performance improvements: `count` (marginal improvement, simpler implementation), `count'` (considerable improvement), and `many` (marginal improvement, simpler implementation).
* Added the `stateTokensProcessed` field to the parser state and the helper functions `getTokensProcessed` and `setTokensProcessed`. The field contains the number of tokens processed so far. This allows, for example, creating wrappers that return the just-parsed fragment of the input stream alongside the result of parsing. (It was possible before, but very inefficient because it required traversing the entire input stream twice.)
* The `IndentNone` option of `indentBlock` now picks up whitespace after it like its sisters `IndentMany` and `IndentSome` do, see #161.
* Fixed a couple of quite subtle bugs in `indentBlock` introduced by the change in behaviour of `skipLineComment` in version 5.1.0. See #178 for more information.

## Megaparsec 5.1.2

* Stopped using property tests with the `dbg` helper to avoid a flood of debugging info when the test suite is run.
* Fixed the build with `QuickCheck` versions older than 2.9.0.

## Megaparsec 5.1.1

* Exported the `observing` primitive from `Text.Megaparsec`.

## Megaparsec 5.1.0

* Defined `displayException` for `ParseError`, so exceptions are displayed in human-friendly form now. This works with GHC 7.10 and later.
* Line comments parsed by `skipLineComment` now may end at the end of input and do not necessarily require a newline to be parsed correctly. See #119.
* Exposed the `parseErrorTextPretty` function in `Text.Megaparsec.Error` to allow rendering of `ParseError`s without the stack of source positions.
* Eliminated the `old-tests` test suite (a Parsec legacy). The cases that were not already *obviously* covered in the main test suite were included into it.
* Added `Arbitrary` instances for the following data types: `Pos`, `SourcePos`, `ErrorItem`, `Dec`, `ParseError`, and `State`.
  This should make testing easier without the need to add orphan instances every time. The drawback is that we start to depend on `QuickCheck`, but that's a fair price.
* The test suite now uses the combination of Hspec and the `hspec-megaparsec` package, which also improved the latter (that package is the recommended way to test Megaparsec parsers).
* The `try` combinator now truly backtracks the parser state when its argument parser fails (whether it consumes input or not). Most users will never notice the difference, though. See #142.
* Added the `dbg` function that should be helpful for debugging.
* Added the `observing` primitive combinator that allows the user to “observe” parse errors without ending parsing (they are returned in `Left`, while normal results are wrapped in `Right`).
* Further documentation improvements.

## Megaparsec 5.0.1

* Derived `NFData` instances for `Pos`, `InvalidPosException`, `SourcePos`, `ErrorItem`, `Dec`, `ParseError`, and `State`.
* Derived a `Data` instance for `ParseError`, and `Data` and `Typeable` instances for `SourcePos` and `State`.
* Minor documentation improvements.

## Megaparsec 5.0.0

### General changes

* Removed `parseFromFile` and the `StorableStream` type class that was necessary for it. The reason for the removal is that reading from a file and then parsing its contents is trivial for every instance of `Stream`, and this function provides no way to use newer methods for running a parser, such as `runParser'`. So, simply put, it added little value and was included in the 4.x versions for compatibility reasons.
* Moved the position-advancing function from the arguments of the `token` and `tokens` functions to the `Stream` type class (named `updatePos`). The new function allows better handling of custom streams of tokens where every token contains information about its position in the stream (for example when the stream of tokens is produced with happy/alex).
* Added support for include files (a stack of positions instead of a flat position).
  The new functions `pushPosition` and `popPosition` can be used to move “vertically” in the stack of positions. `getPosition` and `setPosition` still work on the top (“current file”) level, but the user can get the full stack via `getParserState` if necessary. Note that `ParseError` and pretty-printing for it also support the new feature.
* Added the type function `Token`, associated with the `Stream` type class. The function returns the type of token corresponding to a specific token stream.
* The type `ParsecT` (and also the type synonym `Parsec`) is now parametrized over the type of the custom component in parse errors.
* The parameters of the `MonadParsec` type class are: `e`, the type of the custom component in parse errors; `s`, the type of the input stream; and `m`, the type of the underlying monad.
* The type of the `failure` primitive combinator was changed; it now accepts three arguments: the set of unexpected items, the set of expected items, and the set of custom data.
* The type of the `token` primitive combinator was changed; now, in case of failure, a triple-tuple is returned with elements corresponding to the arguments of the `failure` primitive. The `token` primitive can also optionally be given an argument of the token type to use in error messages (as the expected item) in case of end of input.
* The `unexpected` combinator now accepts an argument of type `ErrorItem` instead of a plain `String`.
* General performance improvements and improvements in the speed of some combinators, `manyTill` in particular.

### Error messages

* The module `Text.Megaparsec.Pos` was completely rewritten. The new module uses the `Pos` data type with smart constructors to ensure that things like line and column numbers can only be positive. `SourcePos`, on the other hand, does not require smart constructors anymore and its constructors are exported. `Show` and `Read` instances of `SourcePos` are derived and pretty-printing is done with the help of the `sourcePosPretty` function.
* The module `Text.Megaparsec.Error` was completely rewritten.
  A number of new types and type classes are introduced: `ErrorItem`, `Dec`, `ErrorComponent`, and `ShowErrorComponent`. `ParseError` does not need smart constructors anymore and its constructor and field selectors are exported. It uses sets (from the `containers` package) instead of sorted lists to enumerate unexpected and expected items. The new definition is also parametrized over the token type and the custom data type which can be passed around as part of a parse error. The default “custom data” component is `Dec`, which see. All in all, we have completely well-typed and extensible error messages now. `Show` and `Read` instances of `ParseError` are derived and pretty-printing is done with the help of `parseErrorPretty`.
* The module `Text.Megaparsec.ShowToken` was eliminated and the type class `ShowToken` was moved to `Text.Megaparsec.Error`. The only method of that class is now named `showTokens` and it works on streams of tokens, where single tokens are represented by a `NonEmpty` list with a single element.

### Built-in combinators

* The combinators `oneOf`, `oneOf'`, `noneOf`, and `noneOf'` now accept any instance of `Foldable`, not only `String`.

### Lexer

* Error messages about incorrect indentation levels were greatly improved. Now every such message contains information about the desired ordering between the “reference” indentation level and the actual indentation level, as well as the values of these levels. The information is stored in `ParseError` in well-typed form and can be pretty-printed when necessary. As part of this improvement, the type of `indentGuard` was changed.
* The `incorrectIndent` combinator is introduced in the `Text.Megaparsec.Lexer` module. It allows the parser to fail with detailed information regarding incorrect indentation.
* Introduced the `scientific` parser that can parse arbitrarily big numbers without error or memory overflow. `float` still returns `Double`, but it is defined in terms of `scientific` now.
  Since the `Scientific` type can reliably represent integer values as well as floating point values, `number` now returns `Scientific` instead of `Either Integer Double` (an `Integer` or `Double` can be extracted from a `Scientific` value anyway). This in turn makes the `signed` parser more natural and general, because we do not need the ad-hoc `Signed` type class anymore.
* Added the `skipBlockCommentNested` function that should help parse possibly nested block comments.
* Added the `lineFold` function that helps parse line folds.

## Megaparsec 4.4.0

* The state returned on failure is now the exact state of the parser at the moment when it failed, which makes the incremental parsing feature much better and opens possibilities for features like “on-the-fly” recovery from parse errors.
* The `count` combinator now works with `Applicative` instances (previously it worked only with instances of `Alternative`). It is now also faster.
* `tokens` and parsers built upon it (such as `string` and `string'`) now backtrack automatically on failure, that is, when they fail, they never consume any input. This is done to make their consumption model match how error messages are reported (which becomes an important thing as the user gets more control with primitives like `withRecovery`). This means, in particular, that it is no longer necessary to use `try` with `tokens`-based parsers. This new feature *does not* affect performance in any way.
* Added the new primitive parser `withRecovery`. The parser allows the user to recover from parse errors “on-the-fly” and continue parsing. Once parsing is finished, several parse errors may be reported or ignored altogether.
* Added the `eitherP` combinator.
* Removed the `Enum` instance of the `Message` type. This was Parsec's legacy that we should eliminate now. `Message` does not constitute an enumeration; `toEnum` was never properly defined for it.
  The idea of using `fromEnum` to determine the type of `Message` is also ugly; for this purpose the new functions `isUnexpected`, `isExpected`, and `isMessage` are defined in `Text.Megaparsec.Error`.
* Minor tweak in the signature of the `MonadParsec` type class. The collection of constraints changed from `Alternative m, Monad m, Stream s t` to `Alternative m, MonadPlus m, Stream s t`. This is done to make it easier to write more abstract code with older GHCs where such primitives as `guard` are defined for instances of `MonadPlus`, not `Alternative`.

## Megaparsec 4.3.0

* Canonicalized `Applicative`/`Monad` instances. Thanks to Herbert Valerio Riedel.
* Custom messages in `ParseError` are now printed each on its own line.
* Accumulated hints are no longer used with `ParseError` records that have only custom messages in them (created with the `Message` constructor, as opposed to `Unexpected` or `Expected`). This strips the “expected” line from custom error messages where it is unlikely to be relevant anyway.
* Added higher-level combinators for indentation-sensitive grammars: `indentLevel`, `nonIndented`, and `indentBlock`.

## Megaparsec 4.2.0

* Made the `newPos` constructor and other functions in `Text.Megaparsec.Pos` smarter. Now it is impossible to create a `SourcePos` with a non-positive line number or column number. Unfortunately we cannot use `Numeric.Natural` because we need to support older versions of `base`.
* `ParseError` is now a monoid. `mergeError` is used as `mappend`.
* Added the functions `addErrorMessages` and `newErrorMessages` to add several messages to an existing error and to construct an error with several attached messages, respectively.
* `parseFromFile` now lives in `Text.Megaparsec.Prim`. Previously we had 5 nearly identical definitions of the function, varying only in the type-specific `readFile` function. Now the problem is solved by the introduction of the `StorableStream` type class. All supported stream types are instances of the class out of the box and thus we have a polymorphic version of `parseFromFile`.
* `ParseError` is now an instance of `Exception` (and `Typeable`).
* Introduced the `runParser'` and `runParserT'` functions that take and return parser state. This makes it possible to partially parse input, resume parsing, specify a non-standard initial textual position, etc.
* Introduced the `failure` function that allows the parser to fail with an arbitrary collection of messages. `unexpected` is now defined in terms of `failure`. One consequence of this design decision is that `failure` is now a method of `MonadParsec`, while `unexpected` is not.
* Removed deprecated combinators from `Text.Megaparsec.Combinator`:
    * `chainl`
    * `chainl1`
    * `chainr`
    * `chainr1`
* The `number` parser in `Text.Megaparsec.Lexer` can now be used with the `signed` combinator to parse either signed `Integer` or signed `Double`.

## Megaparsec 4.1.1

* Fixed a bug in the implementation of `sepEndBy` and `sepEndBy1` and removed the deprecation notes for these functions.
* Added tests for `sepEndBy` and `sepEndBy1`.

## Megaparsec 4.1.0

* Relaxed the dependency on `base`, so that the minimal required version of `base` is now 4.6.0.0. This allows Megaparsec to compile with GHC 7.6.x.
* `Text.Megaparsec` and `Text.Megaparsec.Prim` no longer export the data types `Consumed` and `Reply`, because they are rather low-level implementation details that should not be visible to the end user.
* The representation of file name and textual position in error messages was made conventional.
* Fixed some typos in documentation and other materials.

## Megaparsec 4.0.0

### General changes

* Renamed `many1` → `some` as well as other parsers that had the `many1` part in their names.
* The following functions are now re-exported from `Control.Applicative`: `(<|>)`, `many`, `some`, `optional`. See #9.
* Introduced the type class `MonadParsec` in the style of MTL monad transformers. Eliminated the built-in user state since it was not flexible enough and can be emulated via a stack of monads. Now all tools in Megaparsec work with any instance of `MonadParsec`, not only with `ParsecT`.
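A helper written against the `MonadParsec` class (rather than the concrete `ParsecT`) works in any parser monad stack; a minimal sketch using the modern class signature (not the 4.0.0 one), with a hypothetical `parens` helper:

```haskell
{-# LANGUAGE TypeFamilies #-}
import Text.Megaparsec
import Text.Megaparsec.Char (char)

-- Polymorphic over the parser monad: any MonadParsec instance
-- works, including ParsecT wrapped in StateT, WriterT, etc.
parens :: (MonadParsec e s m, Token s ~ Char) => m a -> m a
parens p = char '(' *> p <* char ')'
```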
* Added the new function `parseMaybe` for lightweight parsing, where error messages (and thus the file name) are not important and the entire input should be parsed. For example, it can be used to parse a single number according to a specification of its format.
* Fixed a bug where `notFollowedBy` always succeeded with parsers that don't consume input, see #6.
* Flipped the order of arguments in the primitive combinator `label`, see #21.
* Renamed `tokenPrim` → `token` and removed the old `token`, because `tokenPrim` is more general and the original `token` was little used.
* Made the `token` parser more powerful: its second argument can now return `Either [Message] a` instead of `Maybe a`, so it can influence the error message when parsing of a token fails. See #29.
* Added the new primitive combinator `hidden p`, which hides “expected” tokens in the error message when the parser `p` fails.
* Tab width is no longer hard-coded. It can be manipulated via `getTabWidth` and `setTabWidth`. The default tab width is `defaultTabWidth`, which is 8.

### Error messages

* Introduced the type class `ShowToken` and improved the representation of characters and strings in error messages, see #12.
* Greatly improved the quality of error messages. Reworked the entire `Text.Megaparsec.Error` module, see #14 for more information. Made normal analysis of error messages possible, without the “render and re-parse” approach that previous maintainers had to practice to write even the simplest tests; see the module `Utils.hs` in `old-tests` for an example.
* Reduced the number of `Message` constructors (now there are only `Unexpected`, `Expected`, and `Message`). Empty “magic” message strings are now ignored; the entire library uses explicit error messages.
* Introduced a hint system that greatly improves the quality of error messages and makes the code of `Text.Megaparsec.Prim` a lot clearer.

### Built-in combinators

* All built-in combinators in `Text.Megaparsec.Combinator` now work with any instance of `Alternative` (some of them even with `Applicative`).
* Added the more powerful `count'` parser. This parser can be told to parse from `m` to `n` occurrences of something. `count` is defined in terms of `count'`.
* Removed the `optionMaybe` parser, because `optional` from `Control.Applicative` does the same thing.
* Added the combinator `someTill`.
* These combinators are considered deprecated and will be removed in the future:
    * `chainl`
    * `chainl1`
    * `chainr`
    * `chainr1`
    * `sepEndBy`
    * `sepEndBy1`

### Character parsing

* Renamed some parsers:
    * `alphaNum` → `alphaNumChar`
    * `digit` → `digitChar`
    * `endOfLine` → `eol`
    * `hexDigit` → `hexDigitChar`
    * `letter` → `letterChar`
    * `lower` → `lowerChar`
    * `octDigit` → `octDigitChar`
    * `space` → `spaceChar`
    * `spaces` → `space`
    * `upper` → `upperChar`
* Added new character parsers in `Text.Megaparsec.Char`:
    * `asciiChar`
    * `charCategory`
    * `controlChar`
    * `latin1Char`
    * `markChar`
    * `numberChar`
    * `printChar`
    * `punctuationChar`
    * `separatorChar`
    * `symbolChar`
* Descriptions of the old parsers have been updated to highlight some Unicode-specific aspects. For example, the old description of `letter` stated that it parses letters from “a” to “z” and from “A” to “Z”. This is wrong, since it used the `Data.Char.isAlpha` predicate internally and thus parsed many more characters (letters of non-Latin languages, for example).
* Added the combinators `char'`, `oneOf'`, `noneOf'`, and `string'`, which are case-insensitive variants of `char`, `oneOf`, `noneOf`, and `string` respectively.

### Lexer

* Rewrote parsing of numbers, fixed #2 and #3 (numbers 35 and 39 in the old Parsec project, respectively), and added per-bug tests.
* Since the Haskell report doesn't say anything about sign, `integer` and `float` now parse numbers without a sign.
* Removed the `natural` parser; it's equal to the new `integer`.
* Renamed `naturalOrFloat` → `number`; this doesn't parse a sign either.
* Added the new combinator `signed` to parse all sorts of signed numbers.
* Transformed `Text.Parsec.Token` into `Text.Megaparsec.Lexer`.
Little of Parsec's code remains in the new lexer module. New module doesn't impose any assumptions on user and should be vastly more useful and general. Hairy stuff from original Parsec didn't get here, for example built-in Haskell functions are used to parse escape sequences and the like instead of trying to re-implement the whole thing. ### Other * Renamed the following functions: * `permute` → `makePermParser` * `buildExpressionParser` → `makeExprParser` * Added comprehensive QuickCheck test suite. * Added benchmarks. ## Parsec 3.1.9 * Many and various updates to documentation and package description (including the homepage links). * Add an `Eq` instance for `ParseError`. * Fixed a regression from 3.1.6: `runP` is again exported from module `Text.Parsec`. ## Parsec 3.1.8 * Fix a regression from 3.1.6 related to exports from the main module. ## Parsec 3.1.7 * Fix a regression from 3.1.6 related to the reported position of error messages. See bug #9 for details. * Reset the current error position on success of `lookAhead`. ## Parsec 3.1.6 * Export `Text` instances from `Text.Parsec`. * Make `Text.Parsec` exports more visible. * Re-arrange `Text.Parsec` exports. * Add functions `crlf` and `endOfLine` to `Text.Parsec.Char` for handling input streams that do not have normalized line terminators. * Fix off-by-one error in `Token.charControl`. ## Parsec 3.1.4 & 3.1.5 * Bump dependency on `text`. ## Parsec 3.1.3 * Fix a regression introduced in 3.1.2 related to positions reported by error messages. megaparsec-9.3.1/LICENSE.md0000644000000000000000000000251607346545000013434 0ustar0000000000000000Copyright © 2015–present Megaparsec contributors\ Copyright © 2007 Paolo Martini\ Copyright © 1999–2000 Daan Leijen All rights reserved. 
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
megaparsec-9.3.1/README.md0000644000000000000000000003544707346545000013310 0ustar0000000000000000# Megaparsec

[![License FreeBSD](https://img.shields.io/badge/license-FreeBSD-brightgreen.svg)](http://opensource.org/licenses/BSD-2-Clause)
[![Hackage](https://img.shields.io/hackage/v/megaparsec.svg?style=flat)](https://hackage.haskell.org/package/megaparsec)
[![Stackage Nightly](http://stackage.org/package/megaparsec/badge/nightly)](http://stackage.org/nightly/package/megaparsec)
[![Stackage LTS](http://stackage.org/package/megaparsec/badge/lts)](http://stackage.org/lts/package/megaparsec)
![CI](https://github.com/mrkkrp/megaparsec/workflows/CI/badge.svg?branch=master)

* [Features](#features)
    * [Core features](#core-features)
    * [Error messages](#error-messages)
    * [External lexers](#external-lexers)
    * [Character and binary parsing](#character-and-binary-parsing)
    * [Lexer](#lexer)
* [Documentation](#documentation)
* [Tutorials](#tutorials)
* [Performance](#performance)
* [Comparison with other solutions](#comparison-with-other-solutions)
    * [Megaparsec vs Attoparsec](#megaparsec-vs-attoparsec)
    * [Megaparsec vs Parsec](#megaparsec-vs-parsec)
    * [Megaparsec vs Trifecta](#megaparsec-vs-trifecta)
    * [Megaparsec vs Earley](#megaparsec-vs-earley)
* [Related packages](#related-packages)
* [Prominent projects that use Megaparsec](#prominent-projects-that-use-megaparsec)
* [Links to announcements and blog posts](#links-to-announcements-and-blog-posts)
* [Contribution](#contribution)
* [License](#license)

This is an industrial-strength monadic parser combinator library. Megaparsec is a feature-rich package that tries to find a nice balance between speed, flexibility, and quality of parse errors.

## Features

The project provides flexible solutions to satisfy common parsing needs. This section describes them briefly. If you're looking for comprehensive documentation, see the [section about documentation](#documentation).
### Core features

The package is built around `MonadParsec`, an MTL-style monad transformer. Most features work with all instances of `MonadParsec`. One can achieve various effects by combining monad transformers, i.e. building a monadic stack.

Since the common monad transformers like `WriterT`, `StateT`, `ReaderT` and others are instances of the `MonadParsec` type class, one can also wrap `ParsecT` *in* these monads, achieving, for example, backtracking state.

On the other hand, `ParsecT` is an instance of many type classes as well. The most useful ones are `Monad`, `Applicative`, `Alternative`, and `MonadParsec`.

Megaparsec includes all functionality that is typically available in Parsec-like libraries and also features some special combinators:

* `parseError` allows us to end parsing and report an arbitrary parse error.
* `withRecovery` can be used to recover from parse errors “on-the-fly” and continue parsing. Once parsing is finished, several parse errors may be reported or ignored altogether.
* `observing` makes it possible to “observe” parse errors without ending parsing.

In addition to that, Megaparsec features high-performance combinators similar to those found in [Attoparsec][attoparsec]:

* `tokens` makes it easy to parse several tokens in a row (`string` and `string'` are built on top of this primitive). This is about 100 times faster than matching a string token by token. `tokens` returns a “chunk” of the original input, meaning that if you parse `Text`, it'll return `Text` without repacking.
* `takeWhile` and `takeWhile1` are about 150 times faster than approaches involving `many`, `manyTill` and other similar combinators.
* `takeP` allows us to grab `n` tokens from the stream and returns them as a “chunk” of the stream.

Megaparsec is about as fast as Attoparsec if you write your parser carefully (see also [the section about performance](#performance)).
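To make the chunk-based combinators concrete, here is a minimal sketch (not part of the README; it assumes the standard megaparsec API, and the `identifier`/input names are made up for illustration) that slices identifiers out of a `Text` input with `takeWhile1P` instead of parsing character by character:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Char (isAlphaNum, isSpace)
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type Parser = Parsec Void Text

-- 'takeWhile1P' consumes a run of matching tokens in one step, so the
-- result is a slice of the original 'Text' (no repacking).
identifier :: Parser Text
identifier = takeWhile1P (Just "identifier character") isAlphaNum

-- A comma-separated list of identifiers; white space after each comma is
-- skipped with 'takeWhileP', which may succeed on zero tokens.
identifiers :: Parser [Text]
identifiers = identifier `sepBy` (char ',' <* takeWhileP (Just "white space") isSpace)

main :: IO ()
main = parseTest (identifiers <* eof) "foo, bar,baz"
```

Running this prints the list of parsed identifiers; replacing the `takeWhile1P` call with `some (satisfy isAlphaNum)` would parse the same language, only slower.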
The library can currently work with the following types of input stream out-of-the-box:

* `String = [Char]`
* `ByteString` (strict and lazy)
* `Text` (strict and lazy)

It's also possible to make it work with custom token streams by making them an instance of the `Stream` type class.

### Error messages

* Megaparsec has typed error messages and the ability to signal custom parse errors that better suit the user's domain of interest.
* Since version 8, the location of parse errors can be independent of the current offset in the input stream. This is useful when you want a parse error to point to a particular position after performing some checks.
* Instead of a single parse error, Megaparsec produces the so-called `ParseErrorBundle` data type that helps to manage multi-error messages and pretty-print them. Since version 8, reporting multiple parse errors at once has become easier.

### External lexers

Megaparsec works well with streams of tokens produced by tools like Alex. The design of the `Stream` type class has changed significantly in recent versions, but users can still work with custom streams of tokens.

### Character and binary parsing

Megaparsec has decent support for Unicode-aware character parsing. Functions for character parsing live in the [`Text.Megaparsec.Char`][tm-char] module. Similarly, there is the [`Text.Megaparsec.Byte`][tm-byte] module for parsing streams of bytes.

### Lexer

[`Text.Megaparsec.Char.Lexer`][tm-char-lexer] is a module that should help you write your lexer. If you have used `Parsec` in the past, this module “fixes” its particularly inflexible `Text.Parsec.Token`.

[`Text.Megaparsec.Char.Lexer`][tm-char-lexer] is intended to be imported using a qualified import; it's not included in [`Text.Megaparsec`][tm]. The module doesn't impose how you should write your parser, but certain approaches may be more elegant than others. An especially important theme is parsing of white space, comments, and indentation.
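To make the white-space theme concrete, here is a minimal sketch (not part of the README; it assumes the standard `Text.Megaparsec.Char.Lexer` API, and the `sc`/`lexeme` names are merely conventional):

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (space1)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void Text

-- The space consumer: defined once, it states what counts as white space
-- (here: actual spaces, "--" line comments, and "{- -}" block comments).
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "--") (L.skipBlockComment "{-" "-}")

-- Wrap a parser so that it also consumes trailing white space.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

integer :: Parser Integer
integer = lexeme L.decimal

main :: IO ()
main = parseTest (sc *> some integer <* eof) "1 2 -- comment\n3"
```

The point of the design is that white-space handling is specified in exactly one place (`sc`) and every token parser is wrapped in `lexeme`, instead of sprinkling `spaces` calls throughout the grammar as in old Parsec.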
The design of the module allows one to quickly solve simple tasks and doesn't get in the way when the need to implement something less standard arises.

[`Text.Megaparsec.Byte.Lexer`][tm-byte-lexer] is also available for users who wish to parse binary data.

## Documentation

Megaparsec is well-documented. See the [current version of Megaparsec documentation on Hackage][hackage].

## Tutorials

You can find the most complete Megaparsec tutorial [here][the-tutorial]. It should provide sufficient guidance to help you start with your parsing tasks.

## Performance

Despite being flexible, Megaparsec is also fast. Here is how Megaparsec compares to [Attoparsec][attoparsec] (the fastest widely used parsing library in the Haskell ecosystem):

Test case         | Execution time | Allocated | Max residency
------------------|---------------:|----------:|-------------:
CSV (Attoparsec)  | 76.50 μs       | 397,784   | 10,544
CSV (Megaparsec)  | 64.69 μs       | 352,408   | 9,104
Log (Attoparsec)  | 302.8 μs       | 1,150,032 | 10,912
Log (Megaparsec)  | 337.8 μs       | 1,246,496 | 10,912
JSON (Attoparsec) | 18.20 μs       | 128,368   | 9,032
JSON (Megaparsec) | 25.45 μs       | 203,824   | 9,176

You can run the benchmarks yourself by executing:

```
$ nix-build -A benches.parsers-bench
$ cd result/bench
$ ./bench-memory
$ ./bench-speed
```

More information about benchmarking and development can be found [here][hacking].

## Comparison with other solutions

There are quite a few libraries that can be used for parsing in Haskell; let's compare Megaparsec with some of them.

### Megaparsec vs Attoparsec

[Attoparsec][attoparsec] is another prominent Haskell library for parsing. Although both libraries deal with parsing, it's usually easy to decide which one you will need in a particular project:

* *Attoparsec* is sometimes faster but not as feature-rich. It should be used when you want to process large amounts of data where performance matters more than quality of error messages.
* *Megaparsec* is good for parsing source code or other human-readable texts. It has better error messages and it's implemented as a monad transformer.

So, if you work with something human-readable where the size of the input data is moderate, it makes sense to go with Megaparsec; otherwise Attoparsec may be a better choice.

### Megaparsec vs Parsec

Since Megaparsec is a fork of [Parsec][parsec], we are bound to list the main differences between the two libraries:

* Better error messages. Megaparsec has typed error messages and custom error messages, and it can also report multiple parse errors at once.
* Megaparsec can show the line on which a parse error happened as part of the parse error. This makes it a lot easier to figure out where the error happened.
* Some quirks and bugs of Parsec are fixed.
* Better support for Unicode parsing in [`Text.Megaparsec.Char`][tm-char].
* Megaparsec has more powerful combinators and can parse languages where indentation matters.
* Better documentation.
* Megaparsec can recover from parse errors “on the fly” and continue parsing.
* Megaparsec allows us to conditionally process parse errors inside a running parser. In particular, it's possible to define regions in which parse errors, should they happen, will get a “context tag”, e.g. we could build a context stack like “in function definition foo”, “in expression x”, etc.
* Megaparsec is faster and supports the efficient operations `tokens`, `takeWhileP`, `takeWhile1P`, and `takeP`, like Attoparsec.

If you want to see a detailed change log, `CHANGELOG.md` may be helpful. Also see [this original announcement][original-announcement] for another comparison.

### Megaparsec vs Trifecta

[Trifecta][trifecta] is another Haskell library featuring good error messages. These are the common reasons why Trifecta may be problematic to use:

* It is complicated, doesn't have any tutorials available, and its documentation doesn't help much.
* Trifecta can parse `String` and `ByteString` natively, but not `Text`.
* It depends on `lens`, which is a very heavy dependency. If you're not into `lens`, you may not like the API.

[Idris][idris] has switched from Trifecta to Megaparsec, which allowed it to [have better error messages and fewer dependencies][idris-testimony].

### Megaparsec vs Earley

[Earley][earley] is a newer library that allows us to safely parse context-free grammars (CFG). Megaparsec is a lower-level library compared to Earley, but there are still enough reasons to choose it:

* Megaparsec is faster.
* Your grammar may not be context-free, or you may want to introduce some sort of state into the parsing process. Almost all non-trivial parsers require state. Even if your grammar is context-free, state may allow for additional niceties. Earley does not support that.
* Megaparsec's error messages are more flexible, allowing you to include arbitrary data in them, return multiple error messages, mark regions that affect any error that happens in those regions, etc.

In other words, Megaparsec is less safe but also more powerful.

## Related packages

The following packages are designed to be used with Megaparsec (open a PR if you want to add something to the list):

* [`hspec-megaparsec`](https://hackage.haskell.org/package/hspec-megaparsec)—utilities for testing Megaparsec parsers with [Hspec](https://hackage.haskell.org/package/hspec).
* [`replace-megaparsec`](https://hackage.haskell.org/package/replace-megaparsec)—stream editing and find-and-replace with Megaparsec.
* [`cassava-megaparsec`](https://hackage.haskell.org/package/cassava-megaparsec)—Megaparsec parser of CSV files that plays nicely with [Cassava](https://hackage.haskell.org/package/cassava).
* [`tagsoup-megaparsec`](https://hackage.haskell.org/package/tagsoup-megaparsec)—a library for easily using [TagSoup](https://hackage.haskell.org/package/tagsoup) as a token type in Megaparsec.
* [`parser-combinators`](https://hackage.haskell.org/package/parser-combinators)—provides permutation and expression parsers [previously bundled with Megaparsec](https://markkarpov.com/post/megaparsec-7.html#parsercombinators-grows-megaparsec-shrinks).

## Prominent projects that use Megaparsec

Some prominent projects that use Megaparsec:

* [Idris](https://github.com/idris-lang/Idris-dev)—a general-purpose functional programming language with dependent types
* [Dhall](https://github.com/dhall-lang/dhall-haskell)—an advanced configuration language
* [hnix](https://github.com/haskell-nix/hnix)—a re-implementation of the Nix language in Haskell
* [Hledger](https://github.com/simonmichael/hledger)—an accounting tool
* [MMark](https://github.com/mmark-md/mmark)—a strict markdown processor for writers

## Links to announcements and blog posts

Here are some blog posts, mainly announcing new features of the project and describing what sort of things are now possible:

* [Megaparsec 8](https://markkarpov.com/post/megaparsec-8.html)
* [Megaparsec 7](https://markkarpov.com/post/megaparsec-7.html)
* [Evolution of error messages](https://markkarpov.com/post/evolution-of-error-messages.html)
* [A major upgrade to Megaparsec: more speed, more power](https://markkarpov.com/post/megaparsec-more-speed-more-power.html)
* [Latest additions to Megaparsec](https://markkarpov.com/post/latest-additions-to-megaparsec.html)
* [Announcing Megaparsec 5](https://markkarpov.com/post/announcing-megaparsec-5.html)
* [Megaparsec 4 and 5](https://markkarpov.com/post/megaparsec-4-and-5.html)
* [The original Megaparsec 4.0.0 announcement][original-announcement]

## Contribution

Issues (bugs, feature requests, or other feedback) may be reported in [the GitHub issue tracker for this project](https://github.com/mrkkrp/megaparsec/issues).

Pull requests are also welcome. If you would like to contribute to the project, you may find [this document][hacking] helpful.
## License Copyright © 2015–present Megaparsec contributors\ Copyright © 2007 Paolo Martini\ Copyright © 1999–2000 Daan Leijen Distributed under FreeBSD license. [hackage]: https://hackage.haskell.org/package/megaparsec [the-tutorial]: https://markkarpov.com/tutorial/megaparsec.html [hacking]: ./HACKING.md [tm]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec.html [tm-char]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Char.html [tm-byte]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Byte.html [tm-char-lexer]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Char-Lexer.html [tm-byte-lexer]: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Byte-Lexer.html [attoparsec]: https://hackage.haskell.org/package/attoparsec [parsec]: https://hackage.haskell.org/package/parsec [trifecta]: https://hackage.haskell.org/package/trifecta [earley]: https://hackage.haskell.org/package/Earley [idris]: https://www.idris-lang.org/ [idris-testimony]: https://twitter.com/edwinbrady/status/950084043282010117?s=09 [parsers-bench]: https://github.com/mrkkrp/parsers-bench [fast-parser]: https://markkarpov.com/megaparsec/writing-a-fast-parser.html [original-announcement]: https://mail.haskell.org/pipermail/haskell-cafe/2015-September/121530.html megaparsec-9.3.1/Setup.hs0000644000000000000000000000012707346545000013460 0ustar0000000000000000module Main (main) where import Distribution.Simple main :: IO () main = defaultMain megaparsec-9.3.1/Text/0000755000000000000000000000000007346545000012750 5ustar0000000000000000megaparsec-9.3.1/Text/Megaparsec.hs0000644000000000000000000005035507346545000015363 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE MultiParamTypeClasses #-} {-# LANGUAGE RankNTypes #-} {-# LANGUAGE Safe #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE UndecidableInstances #-} -- | -- 
Module      : Text.Megaparsec
-- Copyright   : © 2015–present Megaparsec contributors
--               © 2007 Paolo Martini
--               © 1999–2001 Daan Leijen
-- License     : FreeBSD
--
-- Maintainer  : Mark Karpov
-- Stability   : experimental
-- Portability : portable
--
-- This module includes everything you need to get started writing a parser.
-- If you are new to Megaparsec and don't know where to begin, take a look
-- at the tutorial.
--
-- In addition to the "Text.Megaparsec" module, which exports and re-exports
-- almost everything that you may need, we advise importing
-- "Text.Megaparsec.Char" if you plan to work with a stream of 'Char' tokens
-- or "Text.Megaparsec.Byte" if you intend to parse binary data.
--
-- It is common to start working with the library by defining a type synonym
-- like this:
--
-- > type Parser = Parsec Void Text
-- >                        ^    ^
-- >                        |    |
-- > Custom error component    Input stream type
--
-- Then you can write type signatures like @Parser 'Int'@ for a parser that
-- returns an 'Int', for example.
--
-- Similarly (since it's known to cause confusion), you should use the
-- 'ParseErrorBundle' type parametrized like this:
--
-- > ParseErrorBundle Text Void
-- >                    ^    ^
-- >                    |    |
-- >    Input stream type    Custom error component (the same you used in Parser)
--
-- Megaparsec uses some type-level machinery to provide flexibility without
-- compromising on type safety. Thus type signatures are sometimes necessary
-- to avoid ambiguous types. If you're seeing an error message that reads
-- like “Type variable @e0@ is ambiguous …”, you need to give an explicit
-- signature to your parser to resolve the ambiguity. It's a good idea to
-- provide type signatures for all top-level definitions.
module Text.Megaparsec
  ( -- * Re-exports
    -- $reexports
    module Text.Megaparsec.Pos,
    module Text.Megaparsec.Error,
    module Text.Megaparsec.Stream,
    module Control.Monad.Combinators,

    -- * Data types
    State (..),
    PosState (..),
    Parsec,
    ParsecT,

    -- * Running parser
    parse,
    parseMaybe,
    parseTest,
    runParser,
    runParser',
    runParserT,
    runParserT',

    -- * Primitive combinators
    MonadParsec (..),

    -- * Signaling parse errors
    -- $parse-errors
    failure,
    fancyFailure,
    unexpected,
    customFailure,
    region,
    registerParseError,
    registerFailure,
    registerFancyFailure,

    -- * Derivatives of primitive combinators
    single,
    satisfy,
    anySingle,
    anySingleBut,
    oneOf,
    noneOf,
    chunk,
    (<?>),
    match,
    takeRest,
    atEnd,

    -- * Parser state combinators
    getInput,
    setInput,
    getSourcePos,
    getOffset,
    setOffset,
    setParserState,
  )
where

import Control.Monad.Combinators
import Control.Monad.Identity
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE
import Data.Maybe (fromJust)
import Data.Set (Set)
import qualified Data.Set as E
import Text.Megaparsec.Class
import Text.Megaparsec.Error
import Text.Megaparsec.Internal
import Text.Megaparsec.Pos
import Text.Megaparsec.State
import Text.Megaparsec.Stream

-- $reexports
--
-- Note that we re-export monadic combinators from
-- "Control.Monad.Combinators" because these are more efficient than
-- 'Applicative'-based ones. Thus 'many' and 'some' may clash with the
-- functions from "Control.Applicative". You need to hide the functions like
-- this:
--
-- > import Control.Applicative hiding (many, some)
--
-- Also note that you can import "Control.Monad.Combinators.NonEmpty" if you
-- want combinators like 'some' to return 'NonEmpty' lists. The module
-- lives in the @parser-combinators@ package (you need at least version
-- /0.4.0/).
-- -- This module is intended to be imported qualified: -- -- > import qualified Control.Monad.Combinators.NonEmpty as NE -- -- Other modules of interest are: -- -- * "Control.Monad.Combinators.Expr" for parsing of expressions. -- * "Control.Applicative.Permutations" for parsing of permutations -- phrases. ---------------------------------------------------------------------------- -- Data types -- | 'Parsec' is a non-transformer variant of the more general 'ParsecT' -- monad transformer. type Parsec e s = ParsecT e s Identity ---------------------------------------------------------------------------- -- Running a parser -- | @'parse' p file input@ runs parser @p@ over 'Identity' (see -- 'runParserT' if you're using the 'ParsecT' monad transformer; 'parse' -- itself is just a synonym for 'runParser'). It returns either a -- 'ParseErrorBundle' ('Left') or a value of type @a@ ('Right'). -- 'errorBundlePretty' can be used to turn 'ParseErrorBundle' into the -- string representation of the error message. See "Text.Megaparsec.Error" -- if you need to do more advanced error analysis. -- -- > main = case parse numbers "" "11,2,43" of -- > Left bundle -> putStr (errorBundlePretty bundle) -- > Right xs -> print (sum xs) -- > -- > numbers = decimal `sepBy` char ',' -- -- 'parse' is the same as 'runParser'. parse :: -- | Parser to run Parsec e s a -> -- | Name of source file String -> -- | Input for parser s -> Either (ParseErrorBundle s e) a parse = runParser -- | @'parseMaybe' p input@ runs the parser @p@ on @input@ and returns the -- result inside 'Just' on success and 'Nothing' on failure. This function -- also parses 'eof', so if the parser doesn't consume all of its input, it -- will fail. -- -- The function is supposed to be useful for lightweight parsing, where -- error messages (and thus file names) are not important and entire input -- should be consumed. For example, it can be used for parsing of a single -- number according to a specification of its format. 
parseMaybe :: (Ord e, Stream s) => Parsec e s a -> s -> Maybe a parseMaybe p s = case parse (p <* eof) "" s of Left _ -> Nothing Right x -> Just x -- | The expression @'parseTest' p input@ applies the parser @p@ on the -- input @input@ and prints the result to stdout. Useful for testing. parseTest :: ( ShowErrorComponent e, Show a, VisualStream s, TraversableStream s ) => -- | Parser to run Parsec e s a -> -- | Input for parser s -> IO () parseTest p input = case parse p "" input of Left e -> putStr (errorBundlePretty e) Right x -> print x -- | @'runParser' p file input@ runs parser @p@ on the input stream of -- tokens @input@, obtained from source @file@. The @file@ is only used in -- error messages and may be the empty string. Returns either a -- 'ParseErrorBundle' ('Left') or a value of type @a@ ('Right'). -- -- > parseFromFile p file = runParser p file <$> readFile file -- -- 'runParser' is the same as 'parse'. runParser :: -- | Parser to run Parsec e s a -> -- | Name of source file String -> -- | Input for parser s -> Either (ParseErrorBundle s e) a runParser p name s = snd $ runParser' p (initialState name s) -- | The function is similar to 'runParser' with the difference that it -- accepts and returns the parser state. This allows us e.g. to specify -- arbitrary textual position at the beginning of parsing. This is the most -- general way to run a parser over the 'Identity' monad. -- -- @since 4.2.0 runParser' :: -- | Parser to run Parsec e s a -> -- | Initial state State s e -> (State s e, Either (ParseErrorBundle s e) a) runParser' p = runIdentity . runParserT' p -- | @'runParserT' p file input@ runs parser @p@ on the input list of tokens -- @input@, obtained from source @file@. The @file@ is only used in error -- messages and may be the empty string. Returns a computation in the -- underlying monad @m@ that returns either a 'ParseErrorBundle' ('Left') or -- a value of type @a@ ('Right'). 
runParserT :: (Monad m) => -- | Parser to run ParsecT e s m a -> -- | Name of source file String -> -- | Input for parser s -> m (Either (ParseErrorBundle s e) a) runParserT p name s = snd <$> runParserT' p (initialState name s) -- | This function is similar to 'runParserT', but like 'runParser'' it -- accepts and returns parser state. This is thus the most general way to -- run a parser. -- -- @since 4.2.0 runParserT' :: (Monad m) => -- | Parser to run ParsecT e s m a -> -- | Initial state State s e -> m (State s e, Either (ParseErrorBundle s e) a) runParserT' p s = do (Reply s' _ result) <- runParsecT p s let toBundle es = ParseErrorBundle { bundleErrors = NE.sortWith errorOffset es, bundlePosState = statePosState s } return $ case result of OK x -> case NE.nonEmpty (stateParseErrors s') of Nothing -> (s', Right x) Just de -> (s', Left (toBundle de)) Error e -> (s', Left (toBundle (e :| stateParseErrors s'))) -- | Given the name of source file and the input construct the initial state -- for a parser. initialState :: String -> s -> State s e initialState name s = State { stateInput = s, stateOffset = 0, statePosState = PosState { pstateInput = s, pstateOffset = 0, pstateSourcePos = initialPos name, pstateTabWidth = defaultTabWidth, pstateLinePrefix = "" }, stateParseErrors = [] } ---------------------------------------------------------------------------- -- Signaling parse errors -- $parse-errors -- -- The most general function to fail and end parsing is 'parseError'. These -- are built on top of it. The section also includes functions starting with -- the @register@ prefix which allow users to register “delayed” -- 'ParseError's. -- | Stop parsing and report a trivial 'ParseError'. 
-- -- @since 6.0.0 failure :: (MonadParsec e s m) => -- | Unexpected item (if any) Maybe (ErrorItem (Token s)) -> -- | Expected items Set (ErrorItem (Token s)) -> m a failure us ps = do o <- getOffset parseError (TrivialError o us ps) {-# INLINE failure #-} -- | Stop parsing and report a fancy 'ParseError'. To report a single custom -- parse error, see 'Text.Megaparsec.customFailure'. -- -- @since 6.0.0 fancyFailure :: (MonadParsec e s m) => -- | Fancy error components Set (ErrorFancy e) -> m a fancyFailure xs = do o <- getOffset parseError (FancyError o xs) {-# INLINE fancyFailure #-} -- | The parser @'unexpected' item@ fails with an error message telling -- about unexpected item @item@ without consuming any input. -- -- > unexpected item = failure (Just item) Set.empty unexpected :: (MonadParsec e s m) => ErrorItem (Token s) -> m a unexpected item = failure (Just item) E.empty {-# INLINE unexpected #-} -- | Report a custom parse error. For a more general version, see -- 'fancyFailure'. -- -- > customFailure = fancyFailure . Set.singleton . ErrorCustom -- -- @since 6.3.0 customFailure :: (MonadParsec e s m) => e -> m a customFailure = fancyFailure . E.singleton . ErrorCustom {-# INLINE customFailure #-} -- | Specify how to process 'ParseError's that happen inside of this -- wrapper. This applies to both normal and delayed 'ParseError's. -- -- As a side-effect of the implementation the inner computation will start -- with an empty collection of delayed errors and they will be updated and -- “restored” on the way out of 'region'. 
-- -- @since 5.3.0 region :: (MonadParsec e s m) => -- | How to process 'ParseError's (ParseError s e -> ParseError s e) -> -- | The “region” that the processing applies to m a -> m a region f m = do deSoFar <- stateParseErrors <$> getParserState updateParserState $ \s -> s {stateParseErrors = []} r <- observing m updateParserState $ \s -> s {stateParseErrors = (f <$> stateParseErrors s) ++ deSoFar} case r of Left err -> parseError (f err) Right x -> return x {-# INLINEABLE region #-} -- | Register a 'ParseError' for later reporting. This action does not end -- parsing and has no effect except for adding the given 'ParseError' to the -- collection of “delayed” 'ParseError's which will be taken into -- consideration at the end of parsing. Only if this collection is empty the -- parser will succeed. This is the main way to report several parse errors -- at once. -- -- @since 8.0.0 registerParseError :: (MonadParsec e s m) => ParseError s e -> m () registerParseError e = updateParserState $ \s -> s {stateParseErrors = e : stateParseErrors s} {-# INLINE registerParseError #-} -- | Like 'failure', but for delayed 'ParseError's. -- -- @since 8.0.0 registerFailure :: (MonadParsec e s m) => -- | Unexpected item (if any) Maybe (ErrorItem (Token s)) -> -- | Expected items Set (ErrorItem (Token s)) -> m () registerFailure us ps = do o <- getOffset registerParseError (TrivialError o us ps) {-# INLINE registerFailure #-} -- | Like 'fancyFailure', but for delayed 'ParseError's. -- -- @since 8.0.0 registerFancyFailure :: (MonadParsec e s m) => -- | Fancy error components Set (ErrorFancy e) -> m () registerFancyFailure xs = do o <- getOffset registerParseError (FancyError o xs) {-# INLINE registerFancyFailure #-} ---------------------------------------------------------------------------- -- Derivatives of primitive combinators -- | @'single' t@ only matches the single token @t@. 
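A sketch of the delayed-error workflow: register an error, skip the offending token, and keep parsing, so several problems can be reported at once. The `letters` parser and its recovery scheme are hypothetical:

```haskell
import Control.Applicative ((<|>))
import qualified Data.Set as Set
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (letterChar)

type Parser = Parsec Void String

-- On a non-letter, record a delayed error at the bad offset, consume
-- the token, and continue.  The whole parse only succeeds if the
-- collection of delayed errors is empty at the end.
letters :: Parser [Char]
letters = many $
  letterChar <|> do
    o <- getOffset
    _ <- anySingle
    registerParseError
      (FancyError o (Set.singleton (ErrorFail "letter expected")))
    return '?'
```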
-- -- > semicolon = single ';' -- -- See also: 'token', 'anySingle', 'Text.Megaparsec.Byte.char', -- 'Text.Megaparsec.Char.char'. -- -- @since 7.0.0 single :: (MonadParsec e s m) => -- | Token to match Token s -> m (Token s) single t = token testToken expected where testToken x = if x == t then Just x else Nothing expected = E.singleton (Tokens (t :| [])) {-# INLINE single #-} -- | The parser @'satisfy' f@ succeeds for any token for which the supplied -- function @f@ returns 'True'. -- -- > digitChar = satisfy isDigit "digit" -- > oneOf cs = satisfy (`elem` cs) -- -- __Performance note__: when you need to parse a single token, it is often -- a good idea to use 'satisfy' with the right predicate function instead of -- creating a complex parser using the combinators. -- -- See also: 'anySingle', 'anySingleBut', 'oneOf', 'noneOf'. -- -- @since 7.0.0 satisfy :: (MonadParsec e s m) => -- | Predicate to apply (Token s -> Bool) -> m (Token s) satisfy f = token testChar E.empty where testChar x = if f x then Just x else Nothing {-# INLINE satisfy #-} -- | Parse and return a single token. It's a good idea to attach a 'label' -- to this parser. -- -- > anySingle = satisfy (const True) -- -- See also: 'satisfy', 'anySingleBut'. -- -- @since 7.0.0 anySingle :: (MonadParsec e s m) => m (Token s) anySingle = satisfy (const True) {-# INLINE anySingle #-} -- | Match any token but the given one. It's a good idea to attach a 'label' -- to this parser. -- -- > anySingleBut t = satisfy (/= t) -- -- See also: 'single', 'anySingle', 'satisfy'. -- -- @since 7.0.0 anySingleBut :: (MonadParsec e s m) => -- | Token we should not match Token s -> m (Token s) anySingleBut t = satisfy (/= t) {-# INLINE anySingleBut #-} -- | @'oneOf' ts@ succeeds if the current token is in the supplied -- collection of tokens @ts@. Returns the parsed token. 
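A minimal sketch contrasting these combinators (assuming `Parsec Void String`; the parser names are illustrative):

```haskell
import Data.Char (isAlphaNum)
import Data.Void (Void)
import Text.Megaparsec

type Parser = Parsec Void String

-- 'single' matches one exact token...
semicolon :: Parser Char
semicolon = single ';'

-- ...while 'satisfy' matches by predicate; attaching a label supplies
-- the “expected” component of error messages.
identChar :: Parser Char
identChar = satisfy (\c -> isAlphaNum c || c == '_') <?> "identifier character"

-- 'anySingleBut' consumes any token except the given one.
notNewline :: Parser Char
notNewline = anySingleBut '\n'
```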
Note that this -- parser cannot automatically generate the “expected” component of error -- message, so usually you should label it manually with 'label' or ('<?>'). -- -- > oneOf cs = satisfy (`elem` cs) -- -- See also: 'satisfy'. -- -- > digit = oneOf ['0'..'9'] <?> "digit" -- -- __Performance note__: prefer 'satisfy' when you can because it's faster -- when you have only a couple of tokens to compare to: -- -- > quoteFast = satisfy (\x -> x == '\'' || x == '\"') -- > quoteSlow = oneOf "'\"" -- -- @since 7.0.0 oneOf :: (Foldable f, MonadParsec e s m) => -- | Collection of matching tokens f (Token s) -> m (Token s) oneOf cs = satisfy (`elem` cs) {-# INLINE oneOf #-} -- | As the dual of 'oneOf', @'noneOf' ts@ succeeds if the current token is -- /not/ in the supplied list of tokens @ts@. Returns the parsed character. -- Note that this parser cannot automatically generate the “expected” -- component of error message, so usually you should label it manually with -- 'label' or ('<?>'). -- -- > noneOf cs = satisfy (`notElem` cs) -- -- See also: 'satisfy'. -- -- __Performance note__: prefer 'satisfy' and 'anySingleBut' when you can -- because they are faster. -- -- @since 7.0.0 noneOf :: (Foldable f, MonadParsec e s m) => -- | Collection of tokens we should not match f (Token s) -> m (Token s) noneOf cs = satisfy (`notElem` cs) {-# INLINE noneOf #-} -- | @'chunk' chk@ only matches the chunk @chk@. -- -- > divOrMod = chunk "div" <|> chunk "mod" -- -- See also: 'tokens', 'Text.Megaparsec.Char.string', -- 'Text.Megaparsec.Byte.string'. -- -- @since 7.0.0 chunk :: (MonadParsec e s m) => -- | Chunk to match Tokens s -> m (Tokens s) chunk = tokens (==) {-# INLINE chunk #-} -- | A synonym for 'label' in the form of an operator. infix 0 <?> (<?>) :: (MonadParsec e s m) => m a -> String -> m a (<?>) = flip label {-# INLINE (<?>) #-} -- | Return both the result of a parse and a chunk of input that was -- consumed during parsing.
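A sketch of these three combinators in use (assuming `Parsec Void String`; parser names are illustrative):

```haskell
import Control.Applicative ((<|>))
import Data.Void (Void)
import Text.Megaparsec

type Parser = Parsec Void String

-- 'oneOf' and 'noneOf' need a manual label, as the docs above note.
vowel :: Parser Char
vowel = oneOf "aeiou" <?> "vowel"

unquoted :: Parser Char
unquoted = noneOf "'\"" <?> "unquoted character"

-- 'chunk' matches a fixed sequence of tokens.
divOrMod :: Parser String
divOrMod = chunk "div" <|> chunk "mod"
```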
This relies on the change of the 'stateOffset' -- value to evaluate how many tokens were consumed. If you mess with it -- manually in the argument parser, prepare for troubles. -- -- @since 5.3.0 match :: (MonadParsec e s m) => m a -> m (Tokens s, a) match p = do o <- getOffset s <- getInput r <- p o' <- getOffset -- NOTE The 'fromJust' call here should never fail because if the stream -- is empty before 'p' (the only case when 'takeN_' can return 'Nothing' -- as per its invariants), (tp' - tp) won't be greater than 0, and in that -- case 'Just' is guaranteed to be returned as per another invariant of -- 'takeN_'. return ((fst . fromJust) (takeN_ (o' - o) s), r) {-# INLINEABLE match #-} -- | Consume the rest of the input and return it as a chunk. This parser -- never fails, but may return the empty chunk. -- -- > takeRest = takeWhileP Nothing (const True) -- -- @since 6.0.0 takeRest :: (MonadParsec e s m) => m (Tokens s) takeRest = takeWhileP Nothing (const True) {-# INLINE takeRest #-} -- | Return 'True' when end of input has been reached. -- -- > atEnd = option False (True <$ hidden eof) -- -- @since 6.0.0 atEnd :: (MonadParsec e s m) => m Bool atEnd = option False (True <$ hidden eof) {-# INLINE atEnd #-} ---------------------------------------------------------------------------- -- Parser state combinators -- | Return the current input. getInput :: (MonadParsec e s m) => m s getInput = stateInput <$> getParserState {-# INLINE getInput #-} -- | @'setInput' input@ continues parsing with @input@. setInput :: (MonadParsec e s m) => s -> m () setInput s = updateParserState (\(State _ o pst de) -> State s o pst de) {-# INLINE setInput #-} -- | Return the current source position. This function /is not cheap/, do -- not call it e.g. on matching of every token, that's a bad idea. Still you -- can use it to get 'SourcePos' to attach to things that you parse. 
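A small sketch of 'match', 'atEnd', and 'takeRest' together (names illustrative, assuming `Parsec Void String`):

```haskell
import Data.Void (Void)
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- 'match' pairs a parser's result with the exact text it consumed,
-- e.g. to keep the original rendering of a numeric literal.
numberWithSource :: Parser (String, Integer)
numberWithSource = match L.decimal

-- 'atEnd' and 'takeRest' never fail; handy for handling trailing input.
trailing :: Parser (Bool, String)
trailing = (,) <$> atEnd <*> takeRest
```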
-- -- The function works under the assumption that we move in the input stream -- only forwards and never backwards, which is always true unless the user -- abuses the library. -- -- @since 7.0.0 getSourcePos :: (TraversableStream s, MonadParsec e s m) => m SourcePos getSourcePos = do st <- getParserState let pst = reachOffsetNoLine (stateOffset st) (statePosState st) setParserState st {statePosState = pst} return (pstateSourcePos pst) {-# INLINE getSourcePos #-} -- | Get the number of tokens processed so far. -- -- See also: 'setOffset'. -- -- @since 7.0.0 getOffset :: (MonadParsec e s m) => m Int getOffset = stateOffset <$> getParserState {-# INLINE getOffset #-} -- | Set the number of tokens processed so far. -- -- See also: 'getOffset'. -- -- @since 7.0.0 setOffset :: (MonadParsec e s m) => Int -> m () setOffset o = updateParserState $ \(State s _ pst de) -> State s o pst de {-# INLINE setOffset #-} -- | @'setParserState' st@ sets the parser state to @st@. -- -- See also: 'getParserState', 'updateParserState'. setParserState :: (MonadParsec e s m) => State s e -> m () setParserState st = updateParserState (const st) {-# INLINE setParserState #-} megaparsec-9.3.1/Text/Megaparsec/0000755000000000000000000000000007346545000015017 5ustar0000000000000000megaparsec-9.3.1/Text/Megaparsec/Byte.hs0000644000000000000000000001701007346545000016255 0ustar0000000000000000{-# LANGUAGE Safe #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeOperators #-} -- | -- Module : Text.Megaparsec.Byte -- Copyright : © 2015–present Megaparsec contributors -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Commonly used binary parsers. 
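Given the cost warning above, a common pattern is to call 'getSourcePos' once per syntactic node rather than per token. A sketch (the `Located` wrapper is hypothetical):

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (letterChar)

type Parser = Parsec Void String

-- Hypothetical wrapper attaching a source position to a result.
data Located a = Located SourcePos a deriving (Show)

-- Record the position once, at the start of the node.
located :: Parser a -> Parser (Located a)
located p = Located <$> getSourcePos <*> p

identifier :: Parser (Located String)
identifier = located (some letterChar)
```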
-- -- @since 6.0.0 module Text.Megaparsec.Byte ( -- * Simple parsers newline, crlf, eol, tab, space, hspace, space1, hspace1, -- * Categories of characters controlChar, spaceChar, upperChar, lowerChar, letterChar, alphaNumChar, printChar, digitChar, binDigitChar, octDigitChar, hexDigitChar, asciiChar, -- * Single byte char, char', -- * Sequence of bytes string, string', ) where import Control.Applicative import Data.Char hiding (isSpace, toLower, toUpper) import Data.Functor (void) import Data.Proxy import Data.Word (Word8) import Text.Megaparsec import Text.Megaparsec.Common ---------------------------------------------------------------------------- -- Simple parsers -- | Parse a newline byte. newline :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) newline = char 10 {-# INLINE newline #-} -- | Parse a carriage return character followed by a newline character. -- Return the sequence of characters parsed. crlf :: forall e s m. (MonadParsec e s m, Token s ~ Word8) => m (Tokens s) crlf = string (tokensToChunk (Proxy :: Proxy s) [13, 10]) {-# INLINE crlf #-} -- | Parse a CRLF (see 'crlf') or LF (see 'newline') end of line. Return the -- sequence of characters parsed. eol :: forall e s m. (MonadParsec e s m, Token s ~ Word8) => m (Tokens s) eol = (tokenToChunk (Proxy :: Proxy s) <$> newline) <|> crlf "end of line" {-# INLINE eol #-} -- | Parse a tab character. tab :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) tab = char 9 {-# INLINE tab #-} -- | Skip /zero/ or more white space characters. -- -- See also: 'skipMany' and 'spaceChar'. space :: (MonadParsec e s m, Token s ~ Word8) => m () space = void $ takeWhileP (Just "white space") isSpace {-# INLINE space #-} -- | Like 'space', but does not accept newlines and carriage returns. -- -- @since 9.0.0 hspace :: (MonadParsec e s m, Token s ~ Word8) => m () hspace = void $ takeWhileP (Just "white space") isHSpace {-# INLINE hspace #-} -- | Skip /one/ or more white space characters. 
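In this module the token type is 'Word8', so parsers return bytes and chunks of bytes. A sketch of line-oriented use over a strict 'ByteString' (the `line` parser is illustrative):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import Data.Void (Void)
import Text.Megaparsec
import qualified Text.Megaparsec.Byte as MB

type Parser = Parsec Void B.ByteString

-- Take everything up to a LF (byte 10), then consume the line ending
-- with 'eol', which also accepts CRLF.
line :: Parser B.ByteString
line = takeWhileP (Just "character") (/= 10) <* MB.eol
```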
-- -- See also: 'skipSome' and 'spaceChar'. space1 :: (MonadParsec e s m, Token s ~ Word8) => m () space1 = void $ takeWhile1P (Just "white space") isSpace {-# INLINE space1 #-} -- | Like 'space1', but does not accept newlines and carriage returns. -- -- @since 9.0.0 hspace1 :: (MonadParsec e s m, Token s ~ Word8) => m () hspace1 = void $ takeWhile1P (Just "white space") isHSpace {-# INLINE hspace1 #-} ---------------------------------------------------------------------------- -- Categories of characters -- | Parse a control character. controlChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) controlChar = satisfy (isControl . toChar) <?> "control character" {-# INLINE controlChar #-} -- | Parse a space character, and the control characters: tab, newline, -- carriage return, form feed, and vertical tab. spaceChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) spaceChar = satisfy isSpace <?> "white space" {-# INLINE spaceChar #-} -- | Parse an upper-case character. upperChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) upperChar = satisfy (isUpper . toChar) <?> "uppercase letter" {-# INLINE upperChar #-} -- | Parse a lower-case alphabetic character. lowerChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) lowerChar = satisfy (isLower . toChar) <?> "lowercase letter" {-# INLINE lowerChar #-} -- | Parse an alphabetic character: lower-case or upper-case. letterChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) letterChar = satisfy (isLetter . toChar) <?> "letter" {-# INLINE letterChar #-} -- | Parse an alphabetic or digit character. alphaNumChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) alphaNumChar = satisfy (isAlphaNum . toChar) <?> "alphanumeric character" {-# INLINE alphaNumChar #-} -- | Parse a printable character: letter, number, mark, punctuation, symbol -- or space. printChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) printChar = satisfy (isPrint .
toChar) <?> "printable character" {-# INLINE printChar #-} -- | Parse an ASCII digit, i.e. between “0” and “9”. digitChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) digitChar = satisfy isDigit' <?> "digit" where isDigit' x = x >= 48 && x <= 57 {-# INLINE digitChar #-} -- | Parse a binary digit, i.e. “0” or “1”. -- -- @since 7.0.0 binDigitChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) binDigitChar = satisfy isBinDigit <?> "binary digit" where isBinDigit x = x == 48 || x == 49 {-# INLINE binDigitChar #-} -- | Parse an octal digit, i.e. between “0” and “7”. octDigitChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) octDigitChar = satisfy isOctDigit' <?> "octal digit" where isOctDigit' x = x >= 48 && x <= 55 {-# INLINE octDigitChar #-} -- | Parse a hexadecimal digit, i.e. between “0” and “9”, or “a” and “f”, or -- “A” and “F”. hexDigitChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) hexDigitChar = satisfy (isHexDigit . toChar) <?> "hexadecimal digit" {-# INLINE hexDigitChar #-} -- | Parse a character from the first 128 characters of the Unicode -- character set, corresponding to the ASCII character set. asciiChar :: (MonadParsec e s m, Token s ~ Word8) => m (Token s) asciiChar = satisfy (< 128) <?> "ASCII character" {-# INLINE asciiChar #-} ---------------------------------------------------------------------------- -- Single byte -- | A type-constrained version of 'single'. -- -- > newline = char 10 char :: (MonadParsec e s m, Token s ~ Word8) => Token s -> m (Token s) char = single {-# INLINE char #-} -- | The same as 'char' but case-insensitive. This parser returns the -- actually parsed character preserving its case.
-- -- >>> parseTest (char' 101) "E" -- 69 -- 'E' -- >>> parseTest (char' 101) "G" -- 1:1: -- unexpected 'G' -- expecting 'E' or 'e' char' :: (MonadParsec e s m, Token s ~ Word8) => Token s -> m (Token s) char' c = choice [ char (toLower c), char (toUpper c) ] {-# INLINE char' #-} ---------------------------------------------------------------------------- -- Helpers -- | 'Word8'-specialized version of 'Data.Char.isSpace'. isSpace :: Word8 -> Bool isSpace x | x >= 9 && x <= 13 = True | x == 32 = True | x == 160 = True | otherwise = False {-# INLINE isSpace #-} -- | Like 'isSpace', but does not accept newlines and carriage returns. isHSpace :: Word8 -> Bool isHSpace x | x == 9 = True | x == 11 = True | x == 12 = True | x == 32 = True | x == 160 = True | otherwise = False {-# INLINE isHSpace #-} -- | Convert a byte to char. toChar :: Word8 -> Char toChar = chr . fromIntegral {-# INLINE toChar #-} -- | Convert a byte to its upper-case version. toUpper :: Word8 -> Word8 toUpper x | x >= 97 && x <= 122 = x - 32 | x == 247 = x -- division sign | x == 255 = x -- latin small letter y with diaeresis | x >= 224 = x - 32 | otherwise = x {-# INLINE toUpper #-} -- | Convert a byte to its lower-case version. 
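The byte-level case flip used by 'char'' only shifts letters within their ASCII and Latin-1 ranges (the two excluded code points, ÷ and ×, sit in the middle of those ranges). A pure, ASCII-only sketch of that logic, reimplemented here for illustration since the real helpers are private:

```haskell
import Data.Word (Word8)

-- ASCII lower-case letters live exactly 32 positions above their
-- upper-case counterparts, so flipping case is a fixed offset.
asciiToUpper :: Word8 -> Word8
asciiToUpper x
  | x >= 97 && x <= 122 = x - 32 -- 'a'..'z' -> 'A'..'Z'
  | otherwise = x

asciiToLower :: Word8 -> Word8
asciiToLower x
  | x >= 65 && x <= 90 = x + 32 -- 'A'..'Z' -> 'a'..'z'
  | otherwise = x
```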
toLower :: Word8 -> Word8 toLower x | x >= 65 && x <= 90 = x + 32 | x == 215 = x -- multiplication sign | x >= 192 && x <= 222 = x + 32 | otherwise = x {-# INLINE toLower #-} megaparsec-9.3.1/Text/Megaparsec/Byte/0000755000000000000000000000000007346545000015722 5ustar0000000000000000megaparsec-9.3.1/Text/Megaparsec/Byte/Binary.hs0000644000000000000000000001304307346545000017503 0ustar0000000000000000{-# LANGUAGE AllowAmbiguousTypes #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE Safe #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TypeApplications #-} {-# LANGUAGE TypeFamilies #-} -- | -- Module : Text.Megaparsec.Byte.Binary -- Copyright : © 2021–present Megaparsec contributors -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Binary-format number parsers. -- -- @since 9.2.0 module Text.Megaparsec.Byte.Binary ( -- * Generic parsers BinaryChunk (..), anyLE, anyBE, -- * Parsing unsigned values word8, word16le, word16be, word32le, word32be, word64le, word64be, -- * Parsing signed values int8, int16le, int16be, int32le, int32be, int64le, int64be, ) where import Data.Bits import qualified Data.ByteString as B import qualified Data.ByteString.Lazy as BL import Data.Int import Data.Word import Text.Megaparsec -- | Data types that can be converted to little- or big- endian numbers. class BinaryChunk chunk where convertChunkBE :: (Bits a, Num a) => chunk -> a convertChunkLE :: (Bits a, Num a) => chunk -> a instance BinaryChunk B.ByteString where convertChunkBE = B.foldl' go 0 where go acc byte = (acc `unsafeShiftL` 8) .|. fromIntegral byte convertChunkLE = B.foldl' go 0 where go acc byte = (acc .|. fromIntegral byte) `rotateR` 8 instance BinaryChunk BL.ByteString where convertChunkBE = BL.foldl' go 0 where go acc byte = (acc `unsafeShiftL` 8) .|. fromIntegral byte convertChunkLE = BL.foldl' go 0 where go acc byte = (acc .|. 
fromIntegral byte) `rotateR` 8 ---------------------------------------------------------------------------- -- Generic parsers -- | Parse a little-endian number. -- -- You may wish to call this with a visible type application: -- -- > number <- anyLE (Just "little-endian 32 bit word") @Word32 anyLE :: forall a e s m. (MonadParsec e s m, FiniteBits a, Num a, BinaryChunk (Tokens s)) => -- | Label, if any Maybe String -> m a anyLE mlabel = convertChunkLE <$> takeP mlabel (finiteByteSize @a) {-# INLINE anyLE #-} -- | Parse a big-endian number. -- -- You may wish to call this with a visible type application: -- -- > number <- anyBE (Just "big-endian 32 bit word") @Word32 anyBE :: forall a e s m. (MonadParsec e s m, FiniteBits a, Num a, BinaryChunk (Tokens s)) => -- | Label, if any Maybe String -> m a anyBE mlabel = convertChunkBE <$> takeP mlabel (finiteByteSize @a) {-# INLINE anyBE #-} -------------------------------------------------------------------------------- -- Parsing unsigned values -- | Parse a 'Word8'. word8 :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Word8 word8 = anyBE (Just "8 bit word") {-# INLINE word8 #-} -- | Parse a little-endian 'Word16'. word16le :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Word16 word16le = anyLE (Just "little-endian 16 bit word") {-# INLINE word16le #-} -- | Parse a big-endian 'Word16'. word16be :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Word16 word16be = anyBE (Just "big-endian 16 bit word") {-# INLINE word16be #-} -- | Parse a little-endian 'Word32'. word32le :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Word32 word32le = anyLE (Just "little-endian 32 bit word") {-# INLINE word32le #-} -- | Parse a big-endian 'Word32'. word32be :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Word32 word32be = anyBE (Just "big-endian 32 bit word") {-# INLINE word32be #-} -- | Parse a little-endian 'Word64'. 
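A sketch of these word parsers combined into a fixed binary layout (the `header` format is hypothetical):

```haskell
import qualified Data.ByteString as B
import Data.Void (Void)
import Data.Word (Word16, Word32)
import Text.Megaparsec
import Text.Megaparsec.Byte.Binary

type Parser = Parsec Void B.ByteString

-- A hypothetical record header: a big-endian 16 bit tag followed by a
-- little-endian 32 bit length.
header :: Parser (Word16, Word32)
header = (,) <$> word16be <*> word32le

-- runParser header "" (B.pack [0x00, 0x01, 0x10, 0x00, 0x00, 0x00])
--   == Right (1, 16)
```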
word64le :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Word64 word64le = anyLE (Just "little-endian 64 word") {-# INLINE word64le #-} -- | Parse a big-endian 'Word64'. word64be :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Word64 word64be = anyBE (Just "big-endian 64 word") {-# INLINE word64be #-} ---------------------------------------------------------------------------- -- Parsing signed values -- | Parse a 'Int8'. int8 :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Int8 int8 = anyBE (Just "8 bit int") {-# INLINE int8 #-} -- | Parse a little-endian 'Int16'. int16le :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Int16 int16le = anyLE (Just "little-endian 16 bit int") {-# INLINE int16le #-} -- | Parse a big-endian 'Int16'. int16be :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Int16 int16be = anyBE (Just "big-endian 16 bit int") {-# INLINE int16be #-} -- | Parse a little-endian 'Int32'. int32le :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Int32 int32le = anyLE (Just "little-endian 32 bit int") {-# INLINE int32le #-} -- | Parse a big-endian 'Int32'. int32be :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Int32 int32be = anyBE (Just "big-endian 32 bit int") {-# INLINE int32be #-} -- | Parse a little-endian 'Int64'. int64le :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Int64 int64le = anyLE (Just "little-endian 64 int") {-# INLINE int64le #-} -- | Parse a big-endian 'Int64'. int64be :: (MonadParsec e s m, BinaryChunk (Tokens s)) => m Int64 int64be = anyBE (Just "big-endian 64 int") {-# INLINE int64be #-} -------------------------------------------------------------------------------- -- Helpers -- | Return the number of bytes in the argument. -- -- Performs ceiling division, so byte-unaligned types (bitsize not a -- multiple of 8) should work, but further usage is not tested. finiteByteSize :: forall a. 
(FiniteBits a) => Int finiteByteSize = finiteBitSize @a undefined `ceilDiv` 8 where ceilDiv x y = (x + y - 1) `div` y {-# INLINE finiteByteSize #-} megaparsec-9.3.1/Text/Megaparsec/Byte/Lexer.hs0000644000000000000000000002215607346545000017343 0ustar0000000000000000{-# LANGUAGE Safe #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeOperators #-} -- | -- Module : Text.Megaparsec.Byte.Lexer -- Copyright : © 2015–present Megaparsec contributors -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Stripped-down version of "Text.Megaparsec.Char.Lexer" for streams of -- bytes. -- -- This module is intended to be imported qualified: -- -- > import qualified Text.Megaparsec.Byte.Lexer as L module Text.Megaparsec.Byte.Lexer ( -- * White space space, lexeme, symbol, symbol', skipLineComment, skipBlockComment, skipBlockCommentNested, -- * Numbers decimal, binary, octal, hexadecimal, scientific, float, signed, ) where import Control.Applicative import Data.Functor (void) import Data.List (foldl') import Data.Proxy import Data.Scientific (Scientific) import qualified Data.Scientific as Sci import Data.Word (Word8) import Text.Megaparsec import qualified Text.Megaparsec.Byte as B import Text.Megaparsec.Lexer ---------------------------------------------------------------------------- -- White space -- | Given a comment prefix this function returns a parser that skips line -- comments. Note that it stops just before the newline character but -- doesn't consume the newline. Newline is either supposed to be consumed by -- 'space' parser or picked up manually. 
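The ceiling-division trick in `finiteByteSize` can be checked in isolation. A pure sketch, reimplementing the private helper in value-argument style for illustration:

```haskell
import Data.Bits (FiniteBits, finiteBitSize)
import Data.Int (Int16)
import Data.Word (Word32)

-- Ceiling division: how many whole y-sized units cover x.
ceilDiv :: Int -> Int -> Int
ceilDiv x y = (x + y - 1) `div` y

-- Size of a fixed-width type in bytes, rounding byte-unaligned bit
-- widths up, as the helper above does.
byteSize :: (FiniteBits a) => a -> Int
byteSize x = finiteBitSize x `ceilDiv` 8
```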
skipLineComment :: (MonadParsec e s m, Token s ~ Word8) => -- | Line comment prefix Tokens s -> m () skipLineComment prefix = B.string prefix *> void (takeWhileP (Just "character") (/= 10)) {-# INLINEABLE skipLineComment #-} -- | @'skipBlockComment' start end@ skips a non-nested block comment starting -- with @start@ and ending with @end@. skipBlockComment :: (MonadParsec e s m, Token s ~ Word8) => -- | Start of block comment Tokens s -> -- | End of block comment Tokens s -> m () skipBlockComment start end = p >> void (manyTill anySingle n) where p = B.string start n = B.string end {-# INLINEABLE skipBlockComment #-} -- | @'skipBlockCommentNested' start end@ skips a possibly nested block -- comment starting with @start@ and ending with @end@. -- -- @since 5.0.0 skipBlockCommentNested :: (MonadParsec e s m, Token s ~ Word8) => -- | Start of block comment Tokens s -> -- | End of block comment Tokens s -> m () skipBlockCommentNested start end = p >> void (manyTill e n) where e = skipBlockCommentNested start end <|> void anySingle p = B.string start n = B.string end {-# INLINEABLE skipBlockCommentNested #-} ---------------------------------------------------------------------------- -- Numbers -- | Parse an integer in the decimal representation according to the format -- of integer literals described in the Haskell report. -- -- If you need to parse signed integers, see the 'signed' combinator. -- -- __Warning__: this function does not perform range checks. decimal :: forall e s m a. (MonadParsec e s m, Token s ~ Word8, Num a) => m a decimal = decimal_ <?> "integer" {-# INLINEABLE decimal #-} -- | A non-public helper to parse decimal integers. decimal_ :: forall e s m a. (MonadParsec e s m, Token s ~ Word8, Num a) => m a decimal_ = mkNum <$> takeWhile1P (Just "digit") isDigit where mkNum = foldl' step 0 . chunkToTokens (Proxy :: Proxy s) step a w = a * 10 + fromIntegral (w - 48) {-# INLINE decimal_ #-} -- | Parse an integer in the binary representation.
The binary number is -- expected to be a non-empty sequence of zeroes “0” and ones “1”. -- -- You could of course parse some prefix before the actual number: -- -- > binary = char 48 >> char' 98 >> L.binary -- -- __Warning__: this function does not perform range checks. -- -- @since 7.0.0 binary :: forall e s m a. (MonadParsec e s m, Token s ~ Word8, Num a) => m a binary = mkNum <$> takeWhile1P Nothing isBinDigit "binary integer" where mkNum = foldl' step 0 . chunkToTokens (Proxy :: Proxy s) step a w = a * 2 + fromIntegral (w - 48) isBinDigit w = w == 48 || w == 49 {-# INLINEABLE binary #-} -- | Parse an integer in the octal representation. The format of the octal -- number is expected to be according to the Haskell report except for the -- fact that this parser doesn't parse “0o” or “0O” prefix. It is a -- responsibility of the programmer to parse correct prefix before parsing -- the number itself. -- -- For example you can make it conform to the Haskell report like this: -- -- > octal = char 48 >> char' 111 >> L.octal -- -- __Warning__: this function does not perform range checks. octal :: forall e s m a. (MonadParsec e s m, Token s ~ Word8, Num a) => m a octal = mkNum <$> takeWhile1P Nothing isOctDigit "octal integer" where mkNum = foldl' step 0 . chunkToTokens (Proxy :: Proxy s) step a w = a * 8 + fromIntegral (w - 48) isOctDigit w = w - 48 < 8 {-# INLINEABLE octal #-} -- | Parse an integer in the hexadecimal representation. The format of the -- hexadecimal number is expected to be according to the Haskell report -- except for the fact that this parser doesn't parse “0x” or “0X” prefix. -- It is a responsibility of the programmer to parse correct prefix before -- parsing the number itself. -- -- For example you can make it conform to the Haskell report like this: -- -- > hexadecimal = char 48 >> char' 120 >> L.hexadecimal -- -- __Warning__: this function does not perform range checks. hexadecimal :: forall e s m a. 
(MonadParsec e s m, Token s ~ Word8, Num a) => m a hexadecimal = mkNum <$> takeWhile1P Nothing isHexDigit "hexadecimal integer" where mkNum = foldl' step 0 . chunkToTokens (Proxy :: Proxy s) step a w | w >= 48 && w <= 57 = a * 16 + fromIntegral (w - 48) | w >= 97 = a * 16 + fromIntegral (w - 87) | otherwise = a * 16 + fromIntegral (w - 55) isHexDigit w = (w >= 48 && w <= 57) || (w >= 97 && w <= 102) || (w >= 65 && w <= 70) {-# INLINEABLE hexadecimal #-} -- | Parse a floating point value as a 'Scientific' number. 'Scientific' is -- great for parsing of arbitrary precision numbers coming from an untrusted -- source. See documentation in "Data.Scientific" for more information. -- -- The parser can be used to parse integers or floating point values. Use -- functions like 'Data.Scientific.floatingOrInteger' from "Data.Scientific" -- to test and extract integer or real values. -- -- This function does not parse sign, if you need to parse signed numbers, -- see 'signed'. scientific :: forall e s m. (MonadParsec e s m, Token s ~ Word8) => m Scientific scientific = do c' <- decimal_ SP c e' <- option (SP c' 0) (try $ dotDecimal_ (Proxy :: Proxy s) c') e <- option e' (try $ exponent_ e') return (Sci.scientific c e) {-# INLINEABLE scientific #-} data SP = SP !Integer {-# UNPACK #-} !Int -- | Parse a floating point number according to the syntax for floating -- point literals described in the Haskell report. -- -- This function does not parse sign, if you need to parse signed numbers, -- see 'signed'. -- -- __Note__: in versions /6.0.0/–/6.1.1/ this function accepted plain integers. 
float :: (MonadParsec e s m, Token s ~ Word8, RealFloat a) => m a float = do c' <- decimal_ Sci.toRealFloat <$> ( ( do SP c e' <- dotDecimal_ (Proxy :: Proxy s) c' e <- option e' (try $ exponent_ e') return (Sci.scientific c e) ) <|> (Sci.scientific c' <$> exponent_ 0) ) {-# INLINEABLE float #-} dotDecimal_ :: (MonadParsec e s m, Token s ~ Word8) => Proxy s -> Integer -> m SP dotDecimal_ pxy c' = do void (B.char 46) let mkNum = foldl' step (SP c' 0) . chunkToTokens pxy step (SP a e') w = SP (a * 10 + fromIntegral (w - 48)) (e' - 1) mkNum <$> takeWhile1P (Just "digit") isDigit {-# INLINE dotDecimal_ #-} exponent_ :: (MonadParsec e s m, Token s ~ Word8) => Int -> m Int exponent_ e' = do void (B.char' 101) (+ e') <$> signed (return ()) decimal_ {-# INLINE exponent_ #-} -- | @'signed' space p@ parser parses an optional sign character (“+” or -- “-”), then if there is a sign it consumes optional white space (using -- @space@ parser), then it runs parser @p@ which should return a number. -- Sign of the number is changed according to the previously parsed sign -- character. -- -- For example, to parse signed integer you can write: -- -- > lexeme = L.lexeme spaceConsumer -- > integer = lexeme L.decimal -- > signedInteger = L.signed spaceConsumer integer signed :: (MonadParsec e s m, Token s ~ Word8, Num a) => -- | How to consume white space after the sign m () -> -- | How to parse the number itself m a -> -- | Parser for signed numbers m a signed spc p = option id (lexeme spc sign) <*> p where sign = (id <$ B.char 43) <|> (negate <$ B.char 45) {-# INLINEABLE signed #-} ---------------------------------------------------------------------------- -- Helpers -- | A fast predicate to check if the given 'Word8' is a digit in ASCII. 
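A sketch of the usual lexer setup built from these pieces: a space consumer that also skips line comments, then signed decimal lexemes on top of it (the `sc` and `integer` names are illustrative):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (empty)
import qualified Data.ByteString as B
import Data.Void (Void)
import Text.Megaparsec
import qualified Text.Megaparsec.Byte as MB
import qualified Text.Megaparsec.Byte.Lexer as L

type Parser = Parsec Void B.ByteString

-- Space consumer: white space plus "#" line comments, no block comments.
sc :: Parser ()
sc = L.space MB.space1 (L.skipLineComment "#") empty

-- Signed decimal integers as lexemes (trailing space consumed).
integer :: Parser Integer
integer = L.lexeme sc (L.signed sc L.decimal)
```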
isDigit :: Word8 -> Bool isDigit w = w - 48 < 10 {-# INLINE isDigit #-} megaparsec-9.3.1/Text/Megaparsec/Char.hs0000644000000000000000000002431007346545000016230 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE LambdaCase #-} {-# LANGUAGE Safe #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeOperators #-} -- | -- Module : Text.Megaparsec.Char -- Copyright : © 2015–present Megaparsec contributors -- © 2007 Paolo Martini -- © 1999–2001 Daan Leijen -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : non-portable -- -- Commonly used character parsers. module Text.Megaparsec.Char ( -- * Simple parsers newline, crlf, eol, tab, space, hspace, space1, hspace1, -- * Categories of characters controlChar, spaceChar, upperChar, lowerChar, letterChar, alphaNumChar, printChar, digitChar, binDigitChar, octDigitChar, hexDigitChar, markChar, numberChar, punctuationChar, symbolChar, separatorChar, asciiChar, latin1Char, charCategory, categoryName, -- * Single character char, char', -- * Sequence of characters string, string', ) where import Control.Applicative import Data.Char import Data.Functor (void) import Data.Proxy import Text.Megaparsec import Text.Megaparsec.Common ---------------------------------------------------------------------------- -- Simple parsers -- | Parse a newline character. newline :: (MonadParsec e s m, Token s ~ Char) => m (Token s) newline = char '\n' {-# INLINE newline #-} -- | Parse a carriage return character followed by a newline character. -- Return the sequence of characters parsed. crlf :: forall e s m. (MonadParsec e s m, Token s ~ Char) => m (Tokens s) crlf = string (tokensToChunk (Proxy :: Proxy s) "\r\n") {-# INLINE crlf #-} -- | Parse a CRLF (see 'crlf') or LF (see 'newline') end of line. Return the -- sequence of characters parsed. eol :: forall e s m. 
(MonadParsec e s m, Token s ~ Char) => m (Tokens s) eol = (tokenToChunk (Proxy :: Proxy s) <$> newline) <|> crlf <?> "end of line" {-# INLINE eol #-} -- | Parse a tab character. tab :: (MonadParsec e s m, Token s ~ Char) => m (Token s) tab = char '\t' {-# INLINE tab #-} -- | Skip /zero/ or more white space characters. -- -- See also: 'skipMany' and 'spaceChar'. space :: (MonadParsec e s m, Token s ~ Char) => m () space = void $ takeWhileP (Just "white space") isSpace {-# INLINE space #-} -- | Like 'space', but does not accept newlines and carriage returns. -- -- @since 9.0.0 hspace :: (MonadParsec e s m, Token s ~ Char) => m () hspace = void $ takeWhileP (Just "white space") isHSpace {-# INLINE hspace #-} -- | Skip /one/ or more white space characters. -- -- See also: 'skipSome' and 'spaceChar'. -- -- @since 6.0.0 space1 :: (MonadParsec e s m, Token s ~ Char) => m () space1 = void $ takeWhile1P (Just "white space") isSpace {-# INLINE space1 #-} -- | Like 'space1', but does not accept newlines and carriage returns. -- -- @since 9.0.0 hspace1 :: (MonadParsec e s m, Token s ~ Char) => m () hspace1 = void $ takeWhile1P (Just "white space") isHSpace {-# INLINE hspace1 #-} ---------------------------------------------------------------------------- -- Categories of characters -- | Parse a control character (a non-printing character of the Latin-1 -- subset of Unicode). controlChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) controlChar = satisfy isControl <?> "control character" {-# INLINE controlChar #-} -- | Parse a Unicode space character, and the control characters: tab, -- newline, carriage return, form feed, and vertical tab. spaceChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) spaceChar = satisfy isSpace <?> "white space" {-# INLINE spaceChar #-} -- | Parse an upper-case or title-case alphabetic Unicode character. Title -- case is used by a small number of letter ligatures like the -- single-character form of Lj.
upperChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) upperChar = satisfy isUpper <?> "uppercase letter" {-# INLINE upperChar #-} -- | Parse a lower-case alphabetic Unicode character. lowerChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) lowerChar = satisfy isLower <?> "lowercase letter" {-# INLINE lowerChar #-} -- | Parse an alphabetic Unicode character: lower-case, upper-case, or -- title-case letter, or a letter of case-less scripts\/modifier letter. letterChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) letterChar = satisfy isLetter <?> "letter" {-# INLINE letterChar #-} -- | Parse an alphabetic or numeric digit Unicode character. -- -- Note that the numeric digits outside the ASCII range are parsed by this -- parser but not by 'digitChar'. Such digits may be part of identifiers but -- are not used by the printer and reader to represent numbers. alphaNumChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) alphaNumChar = satisfy isAlphaNum <?> "alphanumeric character" {-# INLINE alphaNumChar #-} -- | Parse a printable Unicode character: letter, number, mark, punctuation, -- symbol or space. printChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) printChar = satisfy isPrint <?> "printable character" {-# INLINE printChar #-} -- | Parse an ASCII digit, i.e. between “0” and “9”. digitChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) digitChar = satisfy isDigit <?> "digit" {-# INLINE digitChar #-} -- | Parse a binary digit, i.e. "0" or "1". -- -- @since 7.0.0 binDigitChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) binDigitChar = satisfy isBinDigit <?> "binary digit" where isBinDigit x = x == '0' || x == '1' {-# INLINE binDigitChar #-} -- | Parse an octal digit, i.e. between “0” and “7”. octDigitChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) octDigitChar = satisfy isOctDigit <?> "octal digit" {-# INLINE octDigitChar #-} -- | Parse a hexadecimal digit, i.e. between “0” and “9”, or “a” and “f”, or -- “A” and “F”.
hexDigitChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) hexDigitChar = satisfy isHexDigit <?> "hexadecimal digit" {-# INLINE hexDigitChar #-} -- | Parse a Unicode mark character (accents and the like), which combines -- with preceding characters. markChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) markChar = satisfy isMark <?> "mark character" {-# INLINE markChar #-} -- | Parse a Unicode numeric character, including digits from various -- scripts, Roman numerals, etc. numberChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) numberChar = satisfy isNumber <?> "numeric character" {-# INLINE numberChar #-} -- | Parse a Unicode punctuation character, including various kinds of -- connectors, brackets and quotes. punctuationChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) punctuationChar = satisfy isPunctuation <?> "punctuation" {-# INLINE punctuationChar #-} -- | Parse a Unicode symbol character, including mathematical and currency -- symbols. symbolChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) symbolChar = satisfy isSymbol <?> "symbol" {-# INLINE symbolChar #-} -- | Parse a Unicode space or separator character. separatorChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) separatorChar = satisfy isSeparator <?> "separator" {-# INLINE separatorChar #-} -- | Parse a character from the first 128 characters of the Unicode -- character set, corresponding to the ASCII character set. asciiChar :: (MonadParsec e s m, Token s ~ Char) => m (Token s) asciiChar = satisfy isAscii <?> "ASCII character" {-# INLINE asciiChar #-} -- | Parse a character from the first 256 characters of the Unicode -- character set, corresponding to the ISO 8859-1 (Latin-1) character set. latin1Char :: (MonadParsec e s m, Token s ~ Char) => m (Token s) lat1Char = satisfy isLatin1 <?> "Latin-1 character" {-# INLINE latin1Char #-} -- | @'charCategory' cat@ parses a character in Unicode General Category -- @cat@, see 'Data.Char.GeneralCategory'.
charCategory :: (MonadParsec e s m, Token s ~ Char) => GeneralCategory -> m (Token s) charCategory cat = satisfy ((== cat) . generalCategory) <?> categoryName cat {-# INLINE charCategory #-} -- | Return the human-readable name of Unicode General Category. categoryName :: GeneralCategory -> String categoryName = \case UppercaseLetter -> "uppercase letter" LowercaseLetter -> "lowercase letter" TitlecaseLetter -> "titlecase letter" ModifierLetter -> "modifier letter" OtherLetter -> "other letter" NonSpacingMark -> "non-spacing mark" SpacingCombiningMark -> "spacing combining mark" EnclosingMark -> "enclosing mark" DecimalNumber -> "decimal number character" LetterNumber -> "letter number character" OtherNumber -> "other number character" ConnectorPunctuation -> "connector punctuation" DashPunctuation -> "dash punctuation" OpenPunctuation -> "open punctuation" ClosePunctuation -> "close punctuation" InitialQuote -> "initial quote" FinalQuote -> "final quote" OtherPunctuation -> "other punctuation" MathSymbol -> "math symbol" CurrencySymbol -> "currency symbol" ModifierSymbol -> "modifier symbol" OtherSymbol -> "other symbol" Space -> "white space" LineSeparator -> "line separator" ParagraphSeparator -> "paragraph separator" Control -> "control character" Format -> "format character" Surrogate -> "surrogate character" PrivateUse -> "private-use Unicode character" NotAssigned -> "non-assigned Unicode character" ---------------------------------------------------------------------------- -- Single character -- | A type-constrained version of 'single'. -- -- > semicolon = char ';' char :: (MonadParsec e s m, Token s ~ Char) => Token s -> m (Token s) char = single {-# INLINE char #-} -- | The same as 'char' but case-insensitive. This parser returns the -- actually parsed character preserving its case.
-- -- >>> parseTest (char' 'e') "E" -- 'E' -- >>> parseTest (char' 'e') "G" -- 1:1: -- unexpected 'G' -- expecting 'E' or 'e' char' :: (MonadParsec e s m, Token s ~ Char) => Token s -> m (Token s) char' c = choice [ char (toLower c), char (toUpper c), char (toTitle c) ] {-# INLINE char' #-} ---------------------------------------------------------------------------- -- Helpers -- | Is it a horizontal space character? isHSpace :: Char -> Bool isHSpace x = isSpace x && x /= '\n' && x /= '\r' megaparsec-9.3.1/Text/Megaparsec/Char/0000755000000000000000000000000007346545000015674 5ustar0000000000000000megaparsec-9.3.1/Text/Megaparsec/Char/Lexer.hs0000644000000000000000000004316107346545000017314 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE MultiWayIf #-} {-# LANGUAGE Safe #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeOperators #-} -- | -- Module : Text.Megaparsec.Char.Lexer -- Copyright : © 2015–present Megaparsec contributors -- © 2007 Paolo Martini -- © 1999–2001 Daan Leijen -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : non-portable -- -- High-level parsers to help you write your lexer. The module doesn't -- impose how you should write your parser, but certain approaches may be -- more elegant than others. -- -- Parsing of white space is an important part of any parser. We propose a -- convention where __every lexeme parser assumes no spaces before the__ -- __lexeme and consumes all spaces after the lexeme__; this is what the -- 'lexeme' combinator does, and so it's enough to wrap every lexeme parser -- with 'lexeme' to achieve this. Note that you'll need to call 'space' -- manually to consume any white space before the first lexeme (i.e. at the -- beginning of the file). -- -- This module is intended to be imported qualified: -- -- > import qualified Text.Megaparsec.Char.Lexer as L -- -- To do lexing of byte streams, see "Text.Megaparsec.Byte.Lexer". 
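As a concrete illustration of the space-consumer convention described above, here is a minimal, hypothetical setup (the names `sc` and `integer` are ours, not part of the module):

```haskell
import Control.Applicative (empty)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (space1)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- The space consumer: skips white space, "--" line comments and
-- "{- -}" block comments.  Supply 'empty' for the comment slots when
-- the language has no comments.
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "--") (L.skipBlockComment "{-" "-}")

-- Wrapping with 'L.lexeme' makes the parser consume trailing white
-- space, per the convention: no spaces before, all spaces after.
integer :: Parser Integer
integer = L.lexeme sc L.decimal
```

Remember that leading white space at the very start of input still has to be consumed with one explicit `sc`.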
module Text.Megaparsec.Char.Lexer ( -- * White space space, lexeme, symbol, symbol', skipLineComment, skipBlockComment, skipBlockCommentNested, -- * Indentation indentLevel, incorrectIndent, indentGuard, nonIndented, IndentOpt (..), indentBlock, lineFold, -- * Character and string literals charLiteral, -- * Numbers decimal, binary, octal, hexadecimal, scientific, float, signed, ) where import Control.Applicative import Control.Monad (void) import qualified Data.Char as Char import Data.List (foldl') import Data.List.NonEmpty (NonEmpty (..)) import Data.Maybe (fromMaybe, isJust, listToMaybe) import Data.Proxy import Data.Scientific (Scientific) import qualified Data.Scientific as Sci import qualified Data.Set as E import Text.Megaparsec import qualified Text.Megaparsec.Char as C import Text.Megaparsec.Lexer ---------------------------------------------------------------------------- -- White space -- | Given a comment prefix this function returns a parser that skips line -- comments. Note that it stops just before the newline character but -- doesn't consume the newline. Newline is either supposed to be consumed by -- 'space' parser or picked up manually. skipLineComment :: (MonadParsec e s m, Token s ~ Char) => -- | Line comment prefix Tokens s -> m () skipLineComment prefix = C.string prefix *> void (takeWhileP (Just "character") (/= '\n')) {-# INLINEABLE skipLineComment #-} -- | @'skipBlockComment' start end@ skips non-nested block comment starting -- with @start@ and ending with @end@. skipBlockComment :: (MonadParsec e s m, Token s ~ Char) => -- | Start of block comment Tokens s -> -- | End of block comment Tokens s -> m () skipBlockComment start end = p >> void (manyTill anySingle n) where p = C.string start n = C.string end {-# INLINEABLE skipBlockComment #-} -- | @'skipBlockCommentNested' start end@ skips possibly nested block -- comment starting with @start@ and ending with @end@. 
-- -- @since 5.0.0 skipBlockCommentNested :: (MonadParsec e s m, Token s ~ Char) => -- | Start of block comment Tokens s -> -- | End of block comment Tokens s -> m () skipBlockCommentNested start end = p >> void (manyTill e n) where e = skipBlockCommentNested start end <|> void anySingle p = C.string start n = C.string end {-# INLINEABLE skipBlockCommentNested #-} ---------------------------------------------------------------------------- -- Indentation -- | Return the current indentation level. -- -- The function is a simple shortcut defined as: -- -- > indentLevel = sourceColumn <$> getPosition -- -- @since 4.3.0 indentLevel :: (TraversableStream s, MonadParsec e s m) => m Pos indentLevel = sourceColumn <$> getSourcePos {-# INLINE indentLevel #-} -- | Fail reporting incorrect indentation error. The error has attached -- information: -- -- * Desired ordering between reference level and actual level -- * Reference indentation level -- * Actual indentation level -- -- @since 5.0.0 incorrectIndent :: (MonadParsec e s m) => -- | Desired ordering between reference level and actual level Ordering -> -- | Reference indentation level Pos -> -- | Actual indentation level Pos -> m a incorrectIndent ord ref actual = fancyFailure . E.singleton $ ErrorIndentation ord ref actual {-# INLINEABLE incorrectIndent #-} -- | @'indentGuard' spaceConsumer ord ref@ first consumes all white space -- (indentation) with @spaceConsumer@ parser, then it checks the column -- position. Ordering between current indentation level and the reference -- indentation level @ref@ should be @ord@, otherwise the parser fails. On -- success the current column position is returned. -- -- When you want to parse a block of indentation, first run this parser with -- arguments like @'indentGuard' spaceConsumer 'GT' 'pos1'@—this will make -- sure you have some indentation. Use returned value to check indentation -- on every subsequent line according to syntax of your language. 
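The usage pattern described above can be sketched as follows (`scn` and `indentedNames` are illustrative names, not library code): establish the reference column with `GT`-against-`pos1`, then demand `EQ`-against-that-column for every further line.

```haskell
import Control.Applicative (empty, many, some)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (letterChar, space1)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- A space consumer that also eats newlines, as indentation parsing
-- requires.
scn :: Parser ()
scn = L.space space1 empty empty

-- One indented name per line: the first item fixes the indentation
-- level, the rest must start at exactly the same column.
indentedNames :: Parser [String]
indentedNames = do
  lvl <- L.indentGuard scn GT pos1 -- require some indentation
  let item = some letterChar
  first <- item
  rest <- many (try (L.indentGuard scn EQ lvl) *> item)
  return (first : rest)
```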
indentGuard :: (TraversableStream s, MonadParsec e s m) => -- | How to consume indentation (white space) m () -> -- | Desired ordering between reference level and actual level Ordering -> -- | Reference indentation level Pos -> -- | Current column (indentation level) m Pos indentGuard sc ord ref = do sc actual <- indentLevel if compare actual ref == ord then return actual else incorrectIndent ord ref actual {-# INLINEABLE indentGuard #-} -- | Parse a non-indented construction. This ensures that there is no -- indentation before actual data. Useful, for example, as a wrapper for -- top-level function definitions. -- -- @since 4.3.0 nonIndented :: (TraversableStream s, MonadParsec e s m) => -- | How to consume indentation (white space) m () -> -- | How to parse actual data m a -> m a nonIndented sc p = indentGuard sc EQ pos1 *> p {-# INLINEABLE nonIndented #-} -- | Behaviors for parsing of indented tokens. This is used in -- 'indentBlock', which see. -- -- @since 4.3.0 data IndentOpt m a b = -- | Parse no indented tokens, just return the value IndentNone a | -- | Parse many indented tokens (possibly zero), use given indentation -- level (if 'Nothing', use level of the first indented token); the -- second argument tells how to get the final result, and the third -- argument describes how to parse an indented token IndentMany (Maybe Pos) ([b] -> m a) (m b) | -- | Just like 'IndentMany', but requires at least one indented token to -- be present IndentSome (Maybe Pos) ([b] -> m a) (m b) -- | Parse a “reference” token and a number of other tokens that have a -- greater (but the same for all of them) level of indentation than that of -- the “reference” token. The reference token can influence parsing, see -- 'IndentOpt' for more information. -- -- __Note__: the first argument of this function /must/ consume newlines -- among other white space characters. 
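A sketch of how 'indentBlock' is typically driven, under assumed names (`scn`, `sc`, `itemList` are ours): the reference parser returns an 'IndentOpt' telling 'indentBlock' how to treat the indented lines that follow.

```haskell
import Control.Applicative (empty, some)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (hspace1, letterChar, space1)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

scn :: Parser ()
scn = L.space space1 empty empty -- consumes newlines, as required

sc :: Parser ()
sc = L.space hspace1 empty empty -- horizontal space only

-- Parses "items:" followed by one or more indented names, e.g.
--   items:
--     foo
--     bar
itemList :: Parser [String]
itemList = L.indentBlock scn $ do
  _ <- L.symbol sc "items:"
  return (L.IndentSome Nothing return (L.lexeme sc (some letterChar)))
```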
-- -- @since 4.3.0 indentBlock :: (TraversableStream s, MonadParsec e s m, Token s ~ Char) => -- | How to consume indentation (white space) m () -> -- | How to parse “reference” token m (IndentOpt m a b) -> m a indentBlock sc r = do sc ref <- indentLevel a <- r case a of IndentNone x -> x <$ sc IndentMany indent f p -> do mlvl <- (optional . try) (C.eol *> indentGuard sc GT ref) done <- isJust <$> optional eof case (mlvl, done) of (Just lvl, False) -> indentedItems ref (fromMaybe lvl indent) sc p >>= f _ -> sc *> f [] IndentSome indent f p -> do pos <- C.eol *> indentGuard sc GT ref let lvl = fromMaybe pos indent x <- if | pos <= ref -> incorrectIndent GT ref pos | pos == lvl -> p | otherwise -> incorrectIndent EQ lvl pos xs <- indentedItems ref lvl sc p f (x : xs) {-# INLINEABLE indentBlock #-} -- | Grab indented items. This is a helper for 'indentBlock', it's not a -- part of the public API. indentedItems :: (TraversableStream s, MonadParsec e s m) => -- | Reference indentation level Pos -> -- | Level of the first indented item ('lookAhead'ed) Pos -> -- | How to consume indentation (white space) m () -> -- | How to parse indented tokens m b -> m [b] indentedItems ref lvl sc p = go where go = do sc pos <- indentLevel done <- isJust <$> optional eof if done then return [] else if | pos <= ref -> return [] | pos == lvl -> (:) <$> p <*> go | otherwise -> incorrectIndent EQ lvl pos -- | Create a parser that supports line-folding. The first argument is used -- to consume white space between components of line fold, thus it /must/ -- consume newlines in order to work properly. The second argument is a -- callback that receives a custom space-consuming parser as an argument. -- This parser should be used after separate components of line fold that -- can be put on different lines. 
-- -- An example should clarify the usage pattern: -- -- > sc = L.space (void spaceChar) empty empty -- > -- > myFold = L.lineFold sc $ \sc' -> do -- > L.symbol sc' "foo" -- > L.symbol sc' "bar" -- > L.symbol sc "baz" -- for the last symbol we use normal space consumer -- -- @since 5.0.0 lineFold :: (TraversableStream s, MonadParsec e s m) => -- | How to consume indentation (white space) m () -> -- | Callback that uses provided space-consumer (m () -> m a) -> m a lineFold sc action = sc >> indentLevel >>= action . void . indentGuard sc GT {-# INLINEABLE lineFold #-} ---------------------------------------------------------------------------- -- Character and string literals -- | The lexeme parser parses a single literal character without quotes. The -- purpose of this parser is to help with parsing of conventional escape -- sequences. It's your responsibility to take care of character literal -- syntax in your language (by surrounding it with single quotes or -- similar). -- -- The literal character is parsed according to the grammar rules defined in -- the Haskell report. -- -- Note that you can use this parser as a building block to parse various -- string literals: -- -- > stringLiteral = char '"' >> manyTill L.charLiteral (char '"') -- -- __Performance note__: the parser is not particularly efficient at the -- moment. charLiteral :: (MonadParsec e s m, Token s ~ Char) => m Char charLiteral = label "literal character" $ do -- The @~@ is needed to avoid requiring a MonadFail constraint, -- and we do know that r will be non-empty if count' succeeds. 
r <- lookAhead (count' 1 10 anySingle) case listToMaybe (Char.readLitChar r) of Just (c, r') -> c <$ skipCount (length r - length r') anySingle Nothing -> unexpected (Tokens (head r :| [])) {-# INLINEABLE charLiteral #-} ---------------------------------------------------------------------------- -- Numbers -- | Parse an integer in the decimal representation according to the format -- of integer literals described in the Haskell report. -- -- If you need to parse signed integers, see the 'signed' combinator. -- -- __Note__: before the version /6.0.0/ the function returned 'Integer', -- i.e. it wasn't polymorphic in its return type. -- -- __Warning__: this function does not perform range checks. decimal :: (MonadParsec e s m, Token s ~ Char, Num a) => m a decimal = decimal_ <?> "integer" {-# INLINEABLE decimal #-} -- | A non-public helper to parse decimal integers. decimal_ :: forall e s m a. (MonadParsec e s m, Token s ~ Char, Num a) => m a decimal_ = mkNum <$> takeWhile1P (Just "digit") Char.isDigit where mkNum = foldl' step 0 . chunkToTokens (Proxy :: Proxy s) step a c = a * 10 + fromIntegral (Char.digitToInt c) {-# INLINE decimal_ #-} -- | Parse an integer in binary representation. The binary number is -- expected to be a non-empty sequence of zeroes “0” and ones “1”. -- -- You could of course parse some prefix before the actual number: -- -- > binary = char '0' >> char' 'b' >> L.binary -- -- __Warning__: this function does not perform range checks. -- -- @since 7.0.0 binary :: forall e s m a. (MonadParsec e s m, Token s ~ Char, Num a) => m a binary = mkNum <$> takeWhile1P Nothing isBinDigit <?> "binary integer" where mkNum = foldl' step 0 . chunkToTokens (Proxy :: Proxy s) step a c = a * 2 + fromIntegral (Char.digitToInt c) isBinDigit x = x == '0' || x == '1' {-# INLINEABLE binary #-} -- | Parse an integer in the octal representation.
The format of the octal -- number is expected to be according to the Haskell report except for the -- fact that this parser doesn't parse “0o” or “0O” prefix. It is the -- responsibility of the programmer to parse the correct prefix before -- parsing the number itself. -- -- For example, you can make it conform to the Haskell report like this: -- -- > octal = char '0' >> char' 'o' >> L.octal -- -- __Note__: before version /6.0.0/ the function returned 'Integer', i.e. it -- wasn't polymorphic in its return type. -- -- __Warning__: this function does not perform range checks. octal :: forall e s m a. (MonadParsec e s m, Token s ~ Char, Num a) => m a octal = mkNum <$> takeWhile1P Nothing Char.isOctDigit <?> "octal integer" where mkNum = foldl' step 0 . chunkToTokens (Proxy :: Proxy s) step a c = a * 8 + fromIntegral (Char.digitToInt c) {-# INLINEABLE octal #-} -- | Parse an integer in the hexadecimal representation. The format of the -- hexadecimal number is expected to be according to the Haskell report -- except for the fact that this parser doesn't parse “0x” or “0X” prefix. -- It is the responsibility of the programmer to parse the correct prefix -- before parsing the number itself. -- -- For example, you can make it conform to the Haskell report like this: -- -- > hexadecimal = char '0' >> char' 'x' >> L.hexadecimal -- -- __Note__: before version /6.0.0/ the function returned 'Integer', i.e. it -- wasn't polymorphic in its return type. -- -- __Warning__: this function does not perform range checks. hexadecimal :: forall e s m a. (MonadParsec e s m, Token s ~ Char, Num a) => m a hexadecimal = mkNum <$> takeWhile1P Nothing Char.isHexDigit <?> "hexadecimal integer" where mkNum = foldl' step 0 . chunkToTokens (Proxy :: Proxy s) step a c = a * 16 + fromIntegral (Char.digitToInt c) {-# INLINEABLE hexadecimal #-} -- | Parse a floating point value as a 'Scientific' number. 'Scientific' is -- great for parsing of arbitrary precision numbers coming from an untrusted -- source.
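The prefix idiom shown for the integer parsers above can be combined into a single literal parser; a sketch under assumed names (`integerLit` is ours, not part of the library):

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char, char')
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Haskell-report-style integer literal: 0x.., 0o.., 0b.., or plain
-- decimal.  'try' backtracks over a consumed '0' when the second
-- prefix character is missing, so "0" still parses as decimal.
integerLit :: Parser Integer
integerLit =
  choice
    [ try (char '0' *> char' 'x') *> L.hexadecimal,
      try (char '0' *> char' 'o') *> L.octal,
      try (char '0' *> char' 'b') *> L.binary,
      L.decimal
    ]
```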
See documentation in "Data.Scientific" for more information. -- -- The parser can be used to parse integers or floating point values. Use -- functions like 'Data.Scientific.floatingOrInteger' from "Data.Scientific" -- to test and extract integer or real values. -- -- This function does not parse sign, if you need to parse signed numbers, -- see 'signed'. -- -- @since 5.0.0 scientific :: forall e s m. (MonadParsec e s m, Token s ~ Char) => m Scientific scientific = do c' <- decimal_ SP c e' <- option (SP c' 0) (try $ dotDecimal_ (Proxy :: Proxy s) c') e <- option e' (try $ exponent_ e') return (Sci.scientific c e) {-# INLINEABLE scientific #-} data SP = SP !Integer {-# UNPACK #-} !Int -- | Parse a floating point number according to the syntax for floating -- point literals described in the Haskell report. -- -- This function does not parse sign, if you need to parse signed numbers, -- see 'signed'. -- -- __Note__: before version /6.0.0/ the function returned 'Double', i.e. it -- wasn't polymorphic in its return type. -- -- __Note__: in versions /6.0.0/–/6.1.1/ this function accepted plain -- integers. float :: (MonadParsec e s m, Token s ~ Char, RealFloat a) => m a float = do c' <- decimal_ Sci.toRealFloat <$> ( ( do SP c e' <- dotDecimal_ (Proxy :: Proxy s) c' e <- option e' (try $ exponent_ e') return (Sci.scientific c e) ) <|> (Sci.scientific c' <$> exponent_ 0) ) {-# INLINEABLE float #-} dotDecimal_ :: (MonadParsec e s m, Token s ~ Char) => Proxy s -> Integer -> m SP dotDecimal_ pxy c' = do void (C.char '.') let mkNum = foldl' step (SP c' 0) . 
chunkToTokens pxy step (SP a e') c = SP (a * 10 + fromIntegral (Char.digitToInt c)) (e' - 1) mkNum <$> takeWhile1P (Just "digit") Char.isDigit {-# INLINE dotDecimal_ #-} exponent_ :: (MonadParsec e s m, Token s ~ Char) => Int -> m Int exponent_ e' = do void (C.char' 'e') (+ e') <$> signed (return ()) decimal_ {-# INLINE exponent_ #-} -- | @'signed' space p@ parses an optional sign character (“+” or “-”), then -- if there is a sign it consumes optional white space (using the @space@ -- parser), then it runs the parser @p@ which should return a number. Sign -- of the number is changed according to the previously parsed sign -- character. -- -- For example, to parse signed integer you can write: -- -- > lexeme = L.lexeme spaceConsumer -- > integer = lexeme L.decimal -- > signedInteger = L.signed spaceConsumer integer signed :: (MonadParsec e s m, Token s ~ Char, Num a) => -- | How to consume white space after the sign m () -> -- | How to parse the number itself m a -> -- | Parser for signed numbers m a signed spc p = option id (lexeme spc sign) <*> p where sign = (id <$ C.char '+') <|> (negate <$ C.char '-') {-# INLINEABLE signed #-} megaparsec-9.3.1/Text/Megaparsec/Class.hs0000644000000000000000000004227207346545000016427 0ustar0000000000000000{-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE FunctionalDependencies #-} {-# LANGUAGE MultiParamTypeClasses #-} {-# LANGUAGE Safe #-} {-# LANGUAGE TupleSections #-} {-# LANGUAGE UndecidableInstances #-} -- | -- Module : Text.Megaparsec.Class -- Copyright : © 2015–present Megaparsec contributors -- © 2007 Paolo Martini -- © 1999–2001 Daan Leijen -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Definition of 'MonadParsec'—type class describing monads that implement -- the full set of primitive parsers. 
-- -- @since 6.5.0 module Text.Megaparsec.Class ( MonadParsec (..), ) where import Control.Monad import Control.Monad.Identity import qualified Control.Monad.RWS.Lazy as L import qualified Control.Monad.RWS.Strict as S import Control.Monad.Trans import qualified Control.Monad.Trans.Reader as L import qualified Control.Monad.Trans.State.Lazy as L import qualified Control.Monad.Trans.State.Strict as S import qualified Control.Monad.Trans.Writer.Lazy as L import qualified Control.Monad.Trans.Writer.Strict as S import Data.Set (Set) import Text.Megaparsec.Error import Text.Megaparsec.State import Text.Megaparsec.Stream -- | Type class describing monads that implement the full set of primitive -- parsers. -- -- __Note__ that the following primitives are “fast” and should be taken -- advantage of as much as possible if your aim is a fast parser: 'tokens', -- 'takeWhileP', 'takeWhile1P', and 'takeP'. class (Stream s, MonadPlus m) => MonadParsec e s m | m -> e s where -- | Stop parsing and report the 'ParseError'. This is the only way to -- control position of the error without manipulating the parser state -- manually. -- -- @since 8.0.0 parseError :: ParseError s e -> m a -- | The parser @'label' name p@ behaves as parser @p@, but whenever the -- parser @p@ fails /without consuming any input/, it replaces names of -- “expected” tokens with the name @name@. label :: String -> m a -> m a -- | @'hidden' p@ behaves just like parser @p@, but it doesn't show any -- “expected” tokens in error message when @p@ fails. -- -- Please use 'hidden' instead of the old @'label' ""@ idiom. hidden :: m a -> m a hidden = label "" -- | The parser @'try' p@ behaves like the parser @p@, except that it -- backtracks the parser state when @p@ fails (either consuming input or -- not). -- -- This combinator is used whenever arbitrary look ahead is needed. 
Since -- it pretends that it hasn't consumed any input when @p@ fails, the -- ('A.<|>') combinator will try its second alternative even if the first -- parser failed while consuming input. -- -- For example, here is a parser that is supposed to parse the word “let” -- or the word “lexical”: -- -- >>> parseTest (string "let" <|> string "lexical") "lexical" -- 1:1: -- unexpected "lex" -- expecting "let" -- -- What happens here? The first parser consumes “le” and fails (because it -- doesn't see a “t”). The second parser, however, isn't tried, since the -- first parser has already consumed some input! 'try' fixes this behavior -- and allows backtracking to work: -- -- >>> parseTest (try (string "let") <|> string "lexical") "lexical" -- "lexical" -- -- 'try' also improves error messages in case of overlapping alternatives, -- because Megaparsec's hint system can be used: -- -- >>> parseTest (try (string "let") <|> string "lexical") "le" -- 1:1: -- unexpected "le" -- expecting "let" or "lexical" -- -- __Note__ that as of Megaparsec 4.4.0, 'Text.Megaparsec.Char.string' -- backtracks automatically (see 'tokens'), so it does not need 'try'. -- However, the examples above demonstrate the idea behind 'try' so well -- that it was decided to keep them. You still need to use 'try' when your -- alternatives are complex, composite parsers. try :: m a -> m a -- | If @p@ in @'lookAhead' p@ succeeds (either consuming input or not) -- the whole parser behaves like @p@ succeeded without consuming anything -- (parser state is not updated as well). If @p@ fails, 'lookAhead' has no -- effect, i.e. it will fail consuming input if @p@ fails consuming input. -- Combine with 'try' if this is undesirable. lookAhead :: m a -> m a -- | @'notFollowedBy' p@ only succeeds when the parser @p@ fails. This -- parser /never consumes/ any input and /never modifies/ parser state. It -- can be used to implement the “longest match” rule. 
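The “longest match” rule mentioned above can be sketched like this: a keyword must not be immediately followed by another identifier character, so that `"let"` is accepted but the prefix of `"lettuce"` is not (`pKeyword` is an illustrative helper, not library code).

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (alphaNumChar, string)

type Parser = Parsec Void String

-- Succeeds on "let" but fails on "lettuce": 'notFollowedBy' rejects
-- the match when an identifier character follows, and 'try' makes the
-- whole thing backtrack cleanly so alternatives can still be tried.
pKeyword :: String -> Parser String
pKeyword kw = try (string kw <* notFollowedBy alphaNumChar)
```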
notFollowedBy :: m a -> m () -- | @'withRecovery' r p@ allows us to continue parsing even if the parser -- @p@ fails. In this case @r@ is called with the actual 'ParseError' as -- its argument. Typical usage is to return a value signifying failure to -- parse this particular object and to consume some part of the input up -- to the point where the next object starts. -- -- Note that if @r@ fails, the original error message is reported as if -- without 'withRecovery'. In no way recovering parser @r@ can influence -- error messages. -- -- @since 4.4.0 withRecovery :: -- | How to recover from failure (ParseError s e -> m a) -> -- | Original parser m a -> -- | Parser that can recover from failures m a -- | @'observing' p@ allows us to “observe” failure of the @p@ parser, -- should it happen, without actually ending parsing but instead getting -- the 'ParseError' in 'Left'. On success parsed value is returned in -- 'Right' as usual. Note that this primitive just allows you to observe -- parse errors as they happen, it does not backtrack or change how the -- @p@ parser works in any way. -- -- @since 5.1.0 observing :: -- | The parser to run m a -> m (Either (ParseError s e) a) -- | This parser only succeeds at the end of input. eof :: m () -- | The parser @'token' test expected@ accepts tokens for which the -- matching function @test@ returns 'Just' results. If 'Nothing' is -- returned the @expected@ set is used to report the items that were -- expected. -- -- For example, the 'Text.Megaparsec.satisfy' parser is implemented as: -- -- > satisfy f = token testToken Set.empty -- > where -- > testToken x = if f x then Just x else Nothing -- -- __Note__: type signature of this primitive was changed in the version -- /7.0.0/. 
token :: -- | Matching function for the token to parse (Token s -> Maybe a) -> -- | Used in the error message to mention the items that were expected Set (ErrorItem (Token s)) -> m a -- | The parser @'tokens' test chk@ parses a chunk of input @chk@ and -- returns it. The supplied predicate @test@ is used to check equality of -- given and parsed chunks after a candidate chunk of correct length is -- fetched from the stream. -- -- This can be used for example to write 'Text.Megaparsec.chunk': -- -- > chunk = tokens (==) -- -- Note that beginning from Megaparsec 4.4.0, this is an auto-backtracking -- primitive, which means that if it fails, it never consumes any input. -- This is done to make its consumption model match how error messages for -- this primitive are reported (which becomes an important thing as user -- gets more control with primitives like 'withRecovery'): -- -- >>> parseTest (string "abc") "abd" -- 1:1: -- unexpected "abd" -- expecting "abc" -- -- This means, in particular, that it's no longer necessary to use 'try' -- with 'tokens'-based parsers, such as 'Text.Megaparsec.Char.string' and -- 'Text.Megaparsec.Char.string''. This feature /does not/ affect -- performance in any way. tokens :: -- | Predicate to check equality of chunks (Tokens s -> Tokens s -> Bool) -> -- | Chunk of input to match against Tokens s -> m (Tokens s) -- | Parse /zero/ or more tokens for which the supplied predicate holds. -- Try to use this as much as possible because for many streams this -- combinator is much faster than parsers built with -- 'Control.Monad.Combinators.many' and 'Text.Megaparsec.satisfy'. -- -- > takeWhileP (Just "foo") f = many (satisfy f "foo") -- > takeWhileP Nothing f = many (satisfy f) -- -- The combinator never fails, although it may parse the empty chunk. 
-- -- @since 6.0.0 takeWhileP :: -- | Name for a single token in the row Maybe String -> -- | Predicate to use to test tokens (Token s -> Bool) -> -- | A chunk of matching tokens m (Tokens s) -- | Similar to 'takeWhileP', but fails if it can't parse at least one -- token. Try to use this as much as possible because for many streams -- this combinator is much faster than parsers built with -- 'Control.Monad.Combinators.some' and 'Text.Megaparsec.satisfy'. -- -- > takeWhile1P (Just "foo") f = some (satisfy f <?> "foo") -- > takeWhile1P Nothing f = some (satisfy f) -- -- Note that the combinator either succeeds or fails without consuming any -- input, so 'try' is not necessary with it. -- -- @since 6.0.0 takeWhile1P :: -- | Name for a single token in the row Maybe String -> -- | Predicate to use to test tokens (Token s -> Bool) -> -- | A chunk of matching tokens m (Tokens s) -- | Extract the specified number of tokens from the input stream and -- return them packed as a chunk of stream. If there are not enough tokens -- in the stream, a parse error will be signaled. It's guaranteed that if -- the parser succeeds, the requested number of tokens will be returned. -- -- The parser is roughly equivalent to: -- -- > takeP (Just "foo") n = count n (anySingle <?> "foo") -- > takeP Nothing n = count n anySingle -- -- Note that if the combinator fails due to an insufficient number of tokens -- in the input stream, it backtracks automatically. No 'try' is necessary -- with 'takeP'. -- -- @since 6.0.0 takeP :: -- | Name for a single token in the row Maybe String -> -- | How many tokens to extract Int -> -- | A chunk of matching tokens m (Tokens s) -- | Return the full parser state as a 'State' record. getParserState :: m (State s e) -- | @'updateParserState' f@ applies the function @f@ to the parser state. 
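The all-or-nothing behavior of `takeP`, including the treatment of non-positive counts mentioned in the 9.3.0 changelog, can be modeled on a list stream with `splitAt`. `takePLike` is an invented name for this sketch:

```haskell
-- List-stream model of 'takeP': either exactly the requested number of
-- tokens is returned, or the parser fails without consuming input
-- (modeled here as Nothing, i.e. automatic backtracking).
-- A negative count is treated like zero, mirroring megaparsec >= 9.3.0.
takePLike :: Int -> [t] -> Maybe ([t], [t])
takePLike n s =
  let n' = max 0 n
      (chk, rest) = splitAt n' s
   in if length chk == n' then Just (chk, rest) else Nothing

main :: IO ()
main = do
  print (takePLike 2 "abc")     -- Just ("ab","c")
  print (takePLike 5 "abc")     -- Nothing, not enough tokens
  print (takePLike (-1) "abc")  -- Just ("","abc")
```

Note how failure leaves the input untouched, which is why no `try` is needed around `takeP`.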
updateParserState :: (State s e -> State s e) -> m () ---------------------------------------------------------------------------- -- Lifting through MTL instance (MonadParsec e s m) => MonadParsec e s (L.StateT st m) where parseError e = lift (parseError e) label n (L.StateT m) = L.StateT $ label n . m try (L.StateT m) = L.StateT $ try . m lookAhead (L.StateT m) = L.StateT $ \s -> (,s) . fst <$> lookAhead (m s) notFollowedBy (L.StateT m) = L.StateT $ \s -> notFollowedBy (fst <$> m s) >> return ((), s) withRecovery r (L.StateT m) = L.StateT $ \s -> withRecovery (\e -> L.runStateT (r e) s) (m s) observing (L.StateT m) = L.StateT $ \s -> fixs s <$> observing (m s) eof = lift eof token test mt = lift (token test mt) tokens e ts = lift (tokens e ts) takeWhileP l f = lift (takeWhileP l f) takeWhile1P l f = lift (takeWhile1P l f) takeP l n = lift (takeP l n) getParserState = lift getParserState updateParserState f = lift (updateParserState f) instance (MonadParsec e s m) => MonadParsec e s (S.StateT st m) where parseError e = lift (parseError e) label n (S.StateT m) = S.StateT $ label n . m try (S.StateT m) = S.StateT $ try . m lookAhead (S.StateT m) = S.StateT $ \s -> (,s) . fst <$> lookAhead (m s) notFollowedBy (S.StateT m) = S.StateT $ \s -> notFollowedBy (fst <$> m s) >> return ((), s) withRecovery r (S.StateT m) = S.StateT $ \s -> withRecovery (\e -> S.runStateT (r e) s) (m s) observing (S.StateT m) = S.StateT $ \s -> fixs s <$> observing (m s) eof = lift eof token test mt = lift (token test mt) tokens e ts = lift (tokens e ts) takeWhileP l f = lift (takeWhileP l f) takeWhile1P l f = lift (takeWhile1P l f) takeP l n = lift (takeP l n) getParserState = lift getParserState updateParserState f = lift (updateParserState f) instance (MonadParsec e s m) => MonadParsec e s (L.ReaderT r m) where parseError e = lift (parseError e) label n (L.ReaderT m) = L.ReaderT $ label n . m try (L.ReaderT m) = L.ReaderT $ try . m lookAhead (L.ReaderT m) = L.ReaderT $ lookAhead . 
m notFollowedBy (L.ReaderT m) = L.ReaderT $ notFollowedBy . m withRecovery r (L.ReaderT m) = L.ReaderT $ \s -> withRecovery (\e -> L.runReaderT (r e) s) (m s) observing (L.ReaderT m) = L.ReaderT $ observing . m eof = lift eof token test mt = lift (token test mt) tokens e ts = lift (tokens e ts) takeWhileP l f = lift (takeWhileP l f) takeWhile1P l f = lift (takeWhile1P l f) takeP l n = lift (takeP l n) getParserState = lift getParserState updateParserState f = lift (updateParserState f) instance (Monoid w, MonadParsec e s m) => MonadParsec e s (L.WriterT w m) where parseError e = lift (parseError e) label n (L.WriterT m) = L.WriterT $ label n m try (L.WriterT m) = L.WriterT $ try m lookAhead (L.WriterT m) = L.WriterT $ (,mempty) . fst <$> lookAhead m notFollowedBy (L.WriterT m) = L.WriterT $ (,mempty) <$> notFollowedBy (fst <$> m) withRecovery r (L.WriterT m) = L.WriterT $ withRecovery (L.runWriterT . r) m observing (L.WriterT m) = L.WriterT $ fixs mempty <$> observing m eof = lift eof token test mt = lift (token test mt) tokens e ts = lift (tokens e ts) takeWhileP l f = lift (takeWhileP l f) takeWhile1P l f = lift (takeWhile1P l f) takeP l n = lift (takeP l n) getParserState = lift getParserState updateParserState f = lift (updateParserState f) instance (Monoid w, MonadParsec e s m) => MonadParsec e s (S.WriterT w m) where parseError e = lift (parseError e) label n (S.WriterT m) = S.WriterT $ label n m try (S.WriterT m) = S.WriterT $ try m lookAhead (S.WriterT m) = S.WriterT $ (,mempty) . fst <$> lookAhead m notFollowedBy (S.WriterT m) = S.WriterT $ (,mempty) <$> notFollowedBy (fst <$> m) withRecovery r (S.WriterT m) = S.WriterT $ withRecovery (S.runWriterT . 
r) m observing (S.WriterT m) = S.WriterT $ fixs mempty <$> observing m eof = lift eof token test mt = lift (token test mt) tokens e ts = lift (tokens e ts) takeWhileP l f = lift (takeWhileP l f) takeWhile1P l f = lift (takeWhile1P l f) takeP l n = lift (takeP l n) getParserState = lift getParserState updateParserState f = lift (updateParserState f) -- | @since 5.2.0 instance (Monoid w, MonadParsec e s m) => MonadParsec e s (L.RWST r w st m) where parseError e = lift (parseError e) label n (L.RWST m) = L.RWST $ \r s -> label n (m r s) try (L.RWST m) = L.RWST $ \r s -> try (m r s) lookAhead (L.RWST m) = L.RWST $ \r s -> do (x, _, _) <- lookAhead (m r s) return (x, s, mempty) notFollowedBy (L.RWST m) = L.RWST $ \r s -> do notFollowedBy (void $ m r s) return ((), s, mempty) withRecovery n (L.RWST m) = L.RWST $ \r s -> withRecovery (\e -> L.runRWST (n e) r s) (m r s) observing (L.RWST m) = L.RWST $ \r s -> fixs' s <$> observing (m r s) eof = lift eof token test mt = lift (token test mt) tokens e ts = lift (tokens e ts) takeWhileP l f = lift (takeWhileP l f) takeWhile1P l f = lift (takeWhile1P l f) takeP l n = lift (takeP l n) getParserState = lift getParserState updateParserState f = lift (updateParserState f) -- | @since 5.2.0 instance (Monoid w, MonadParsec e s m) => MonadParsec e s (S.RWST r w st m) where parseError e = lift (parseError e) label n (S.RWST m) = S.RWST $ \r s -> label n (m r s) try (S.RWST m) = S.RWST $ \r s -> try (m r s) lookAhead (S.RWST m) = S.RWST $ \r s -> do (x, _, _) <- lookAhead (m r s) return (x, s, mempty) notFollowedBy (S.RWST m) = S.RWST $ \r s -> do notFollowedBy (void $ m r s) return ((), s, mempty) withRecovery n (S.RWST m) = S.RWST $ \r s -> withRecovery (\e -> S.runRWST (n e) r s) (m r s) observing (S.RWST m) = S.RWST $ \r s -> fixs' s <$> observing (m r s) eof = lift eof token test mt = lift (token test mt) tokens e ts = lift (tokens e ts) takeWhileP l f = lift (takeWhileP l f) takeWhile1P l f = lift (takeWhile1P l f) takeP l n = 
lift (takeP l n) getParserState = lift getParserState updateParserState f = lift (updateParserState f) instance (MonadParsec e s m) => MonadParsec e s (IdentityT m) where parseError e = lift (parseError e) label n (IdentityT m) = IdentityT $ label n m try = IdentityT . try . runIdentityT lookAhead (IdentityT m) = IdentityT $ lookAhead m notFollowedBy (IdentityT m) = IdentityT $ notFollowedBy m withRecovery r (IdentityT m) = IdentityT $ withRecovery (runIdentityT . r) m observing (IdentityT m) = IdentityT $ observing m eof = lift eof token test mt = lift (token test mt) tokens e ts = lift $ tokens e ts takeWhileP l f = lift (takeWhileP l f) takeWhile1P l f = lift (takeWhile1P l f) takeP l n = lift (takeP l n) getParserState = lift getParserState updateParserState f = lift $ updateParserState f fixs :: s -> Either a (b, s) -> (Either a b, s) fixs s (Left a) = (Left a, s) fixs _ (Right (b, s)) = (Right b, s) {-# INLINE fixs #-} fixs' :: (Monoid w) => s -> Either a (b, s, w) -> (Either a b, s, w) fixs' s (Left a) = (Left a, s, mempty) fixs' _ (Right (b, s, w)) = (Right b, s, w) {-# INLINE fixs' #-} megaparsec-9.3.1/Text/Megaparsec/Common.hs0000644000000000000000000000206207346545000016603 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE Safe #-} -- | -- Module : Text.Megaparsec.Common -- Copyright : © 2018–present Megaparsec contributors -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Common token combinators. This module is not public, the functions from -- it are re-exported in "Text.Megaparsec.Byte" and "Text.Megaparsec.Char". -- -- @since 7.0.0 module Text.Megaparsec.Common ( string, string', ) where import qualified Data.CaseInsensitive as CI import Data.Function (on) import Text.Megaparsec -- | A synonym for 'chunk'. string :: (MonadParsec e s m) => Tokens s -> m (Tokens s) string = chunk {-# INLINE string #-} -- | The same as 'string', but case-insensitive. 
On success returns string -- cased as the parsed input. -- -- >>> parseTest (string' "foobar") "foObAr" -- "foObAr" string' :: (MonadParsec e s m, CI.FoldCase (Tokens s)) => Tokens s -> m (Tokens s) string' = tokens ((==) `on` CI.mk) {-# INLINE string' #-} megaparsec-9.3.1/Text/Megaparsec/Debug.hs0000644000000000000000000002432607346545000016410 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE MultiParamTypeClasses #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE Unsafe #-} -- | -- Module : Text.Megaparsec.Debug -- Copyright : © 2015–present Megaparsec contributors -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Debugging helpers. -- -- @since 7.0.0 module Text.Megaparsec.Debug ( MonadParsecDbg (..), dbg', ) where import Control.Monad.Identity (IdentityT, mapIdentityT) import qualified Control.Monad.Trans.RWS.Lazy as L import qualified Control.Monad.Trans.RWS.Strict as S import qualified Control.Monad.Trans.Reader as L import qualified Control.Monad.Trans.State.Lazy as L import qualified Control.Monad.Trans.State.Strict as S import qualified Control.Monad.Trans.Writer.Lazy as L import qualified Control.Monad.Trans.Writer.Strict as S import Data.Bifunctor (Bifunctor (first)) import qualified Data.List.NonEmpty as NE import Data.Proxy import Debug.Trace import Text.Megaparsec.Class (MonadParsec) import Text.Megaparsec.Error import Text.Megaparsec.Internal import Text.Megaparsec.State import Text.Megaparsec.Stream -- | Type class describing parser monads that can trace during evaluation. -- -- @since 9.3.0 class (MonadParsec e s m) => MonadParsecDbg e s m where -- | @'dbg' label p@ parser works exactly like @p@, but when it's evaluated -- it prints information useful for debugging. The @label@ is only used to -- refer to this parser in the debugging output. This combinator uses the -- 'trace' function from "Debug.Trace" under the hood. 
-- -- Typical usage is to wrap every sub-parser of a misbehaving parser with -- 'dbg', assigning meaningful labels. Then give it a shot and go through the -- print-out. As of the current version, this combinator prints all available -- information except for /hints/, which are probably only interesting to -- the maintainer of Megaparsec itself and may be quite verbose to output in -- general. Let me know if you would like to be able to see hints in the -- debugging output. -- -- The output itself is pretty self-explanatory, although the following -- abbreviations should be clarified (they are derived from the low-level -- source code): -- -- * @COK@—“consumed OK”. The parser consumed input and succeeded. -- * @CERR@—“consumed error”. The parser consumed input and failed. -- * @EOK@—“empty OK”. The parser succeeded without consuming input. -- * @EERR@—“empty error”. The parser failed without consuming input. -- -- __Note__: up until version /9.3.0/ this was a non-polymorphic -- function that worked only in 'ParsecT'. It was first introduced in -- version /7.0.0/. 
dbg :: (Show a) => -- | Debugging label String -> -- | Parser to debug m a -> -- | Parser that prints debugging messages m a -- | @dbg (p :: StateT st m)@ prints state __after__ running @p@: -- -- >>> p = modify succ >> dbg "a" (single 'a' >> modify succ) -- >>> parseTest (runStateT p 0) "a" -- a> IN: 'a' -- a> MATCH (COK): 'a' -- a> VALUE: () (STATE: 2) -- ((),2) instance (Show st, MonadParsecDbg e s m) => MonadParsecDbg e s (L.StateT st m) where dbg str sma = L.StateT $ \s -> dbgWithComment "STATE" str $ L.runStateT sma s -- | @dbg (p :: StateT st m)@ prints state __after__ running @p@: -- -- >>> p = modify succ >> dbg "a" (single 'a' >> modify succ) -- >>> parseTest (runStateT p 0) "a" -- a> IN: 'a' -- a> MATCH (COK): 'a' -- a> VALUE: () (STATE: 2) -- ((),2) instance (Show st, MonadParsecDbg e s m) => MonadParsecDbg e s (S.StateT st m) where dbg str sma = S.StateT $ \s -> dbgWithComment "STATE" str $ S.runStateT sma s instance (MonadParsecDbg e s m) => MonadParsecDbg e s (L.ReaderT r m) where dbg = L.mapReaderT . 
dbg -- | @dbg (p :: WriterT st m)@ prints __only__ log produced by @p@: -- -- >>> p = tell [0] >> dbg "a" (single 'a' >> tell [1]) -- >>> parseTest (runWriterT p) "a" -- a> IN: 'a' -- a> MATCH (COK): 'a' -- a> VALUE: () (LOG: [1]) -- ((),[0,1]) instance (Monoid w, Show w, MonadParsecDbg e s m) => MonadParsecDbg e s (L.WriterT w m) where dbg str wma = L.WriterT $ dbgWithComment "LOG" str $ L.runWriterT wma -- | @dbg (p :: WriterT st m)@ prints __only__ log produced by @p@: -- -- >>> p = tell [0] >> dbg "a" (single 'a' >> tell [1]) -- >>> parseTest (runWriterT p) "a" -- a> IN: 'a' -- a> MATCH (COK): 'a' -- a> VALUE: () (LOG: [1]) -- ((),[0,1]) instance (Monoid w, Show w, MonadParsecDbg e s m) => MonadParsecDbg e s (S.WriterT w m) where dbg str wma = S.WriterT $ dbgWithComment "LOG" str $ S.runWriterT wma -- | @RWST@ works like @StateT@ inside a @WriterT@: subparser's log and its -- final state is printed: -- -- >>> p = tell [0] >> modify succ >> dbg "a" (single 'a' >> tell [1] >> modify succ) -- >>> parseTest (runRWST p () 0) "a" -- a> IN: 'a' -- a> MATCH (COK): 'a' -- a> VALUE: () (STATE: 2) (LOG: [1]) -- ((),2,[0,1]) instance (Monoid w, Show w, Show st, MonadParsecDbg e s m) => MonadParsecDbg e s (L.RWST r w st m) where dbg str sma = L.RWST $ \r s -> do let smth = (\(a, st, w) -> ShowComment "LOG" (ShowComment "STATE" (a, st), w)) <$> L.runRWST sma r s ((a, st), w) <- first unComment . 
unComment <$> dbg str smth pure (a, st, w) -- | @RWST@ works like @StateT@ inside a @WriterT@: subparser's log and its -- final state is printed: -- -- >>> p = tell [0] >> modify succ >> dbg "a" (single 'a' >> tell [1] >> modify succ) -- >>> parseTest (runRWST p () 0) "a" -- a> IN: 'a' -- a> MATCH (COK): 'a' -- a> VALUE: () (STATE: 2) (LOG: [1]) -- ((),2,[0,1]) instance (Monoid w, Show w, Show st, MonadParsecDbg e s m) => MonadParsecDbg e s (S.RWST r w st m) where dbg str sma = S.RWST $ \r s -> do let smth = (\(a, st, w) -> ShowComment "LOG" (ShowComment "STATE" (a, st), w)) <$> S.runRWST sma r s ((a, st), w) <- first unComment . unComment <$> dbg str smth pure (a, st, w) instance (MonadParsecDbg e s m) => MonadParsecDbg e s (IdentityT m) where dbg = mapIdentityT . dbg -- | @'dbgWithComment' label_a label_c m@ traces the first component of the -- result produced by @m@ with @label_a@ and the second component with -- @label_b@. dbgWithComment :: (MonadParsecDbg e s m, Show a, Show c) => -- | Debugging label (for @a@) String -> -- | Extra component label (for @c@) String -> -- | Parser to debug m (a, c) -> -- | Parser that prints debugging messages m (a, c) dbgWithComment lbl str ma = unComment <$> dbg str (ShowComment lbl <$> ma) -- | A wrapper with a special show instance: -- -- >>> show (ShowComment "STATE" ("Hello, world!", 42)) -- Hello, world! 
(STATE: 42) data ShowComment c a = ShowComment String (a, c) unComment :: ShowComment c a -> (a, c) unComment (ShowComment _ val) = val instance (Show c, Show a) => Show (ShowComment c a) where show (ShowComment lbl (a, c)) = show a ++ " (" ++ lbl ++ ": " ++ show c ++ ")" instance (VisualStream s, ShowErrorComponent e) => MonadParsecDbg e s (ParsecT e s m) where dbg lbl p = ParsecT $ \s cok cerr eok eerr -> let l = dbgLog lbl unfold = streamTake 40 cok' x s' hs = flip trace (cok x s' hs) $ l (DbgIn (unfold (stateInput s))) ++ l (DbgCOK (streamTake (streamDelta s s') (stateInput s)) x) cerr' err s' = flip trace (cerr err s') $ l (DbgIn (unfold (stateInput s))) ++ l (DbgCERR (streamTake (streamDelta s s') (stateInput s)) err) eok' x s' hs = flip trace (eok x s' hs) $ l (DbgIn (unfold (stateInput s))) ++ l (DbgEOK (streamTake (streamDelta s s') (stateInput s)) x) eerr' err s' = flip trace (eerr err s') $ l (DbgIn (unfold (stateInput s))) ++ l (DbgEERR (streamTake (streamDelta s s') (stateInput s)) err) in unParser p s cok' cerr' eok' eerr' -- | A single piece of info to be rendered with 'dbgLog'. data DbgItem s e a = DbgIn [Token s] | DbgCOK [Token s] a | DbgCERR [Token s] (ParseError s e) | DbgEOK [Token s] a | DbgEERR [Token s] (ParseError s e) -- | Render a single piece of debugging info. dbgLog :: forall s e a. (VisualStream s, ShowErrorComponent e, Show a) => -- | Debugging label String -> -- | Information to render DbgItem s e a -> -- | Rendered result String dbgLog lbl item = prefix msg where prefix = unlines . fmap ((lbl ++ "> ") ++) . 
lines pxy = Proxy :: Proxy s msg = case item of DbgIn ts -> "IN: " ++ showStream pxy ts DbgCOK ts a -> "MATCH (COK): " ++ showStream pxy ts ++ "\nVALUE: " ++ show a DbgCERR ts e -> "MATCH (CERR): " ++ showStream pxy ts ++ "\nERROR:\n" ++ parseErrorPretty e DbgEOK ts a -> "MATCH (EOK): " ++ showStream pxy ts ++ "\nVALUE: " ++ show a DbgEERR ts e -> "MATCH (EERR): " ++ showStream pxy ts ++ "\nERROR:\n" ++ parseErrorPretty e -- | Pretty-print a list of tokens. showStream :: (VisualStream s) => Proxy s -> [Token s] -> String showStream pxy ts = case NE.nonEmpty ts of Nothing -> "" Just ne -> let (h, r) = splitAt 40 (showTokens pxy ne) in if null r then h else h ++ " <…>" -- | Calculate number of consumed tokens given 'State' of parser before and -- after parsing. streamDelta :: -- | State of parser before consumption State s e -> -- | State of parser after consumption State s e -> -- | Number of consumed tokens Int streamDelta s0 s1 = stateOffset s1 - stateOffset s0 -- | Extract a given number of tokens from the stream. streamTake :: forall s. (Stream s) => Int -> s -> [Token s] streamTake n s = case fst <$> takeN_ n s of Nothing -> [] Just chk -> chunkToTokens (Proxy :: Proxy s) chk -- | Just like 'dbg', but doesn't require the return value of the parser to -- be 'Show'-able. -- -- @since 9.1.0 dbg' :: (MonadParsecDbg e s m) => -- | Debugging label String -> -- | Parser to debug m a -> -- | Parser that prints debugging messages m a dbg' lbl p = unBlind <$> dbg lbl (Blind <$> p) -- | A wrapper type with a dummy 'Show' instance. 
newtype Blind x = Blind {unBlind :: x} instance Show (Blind x) where show _ = "NOT SHOWN" megaparsec-9.3.1/Text/Megaparsec/Error.hs0000644000000000000000000003566507346545000016463 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} {-# LANGUAGE DeriveDataTypeable #-} {-# LANGUAGE DeriveFunctor #-} {-# LANGUAGE DeriveGeneric #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE LambdaCase #-} {-# LANGUAGE RecordWildCards #-} {-# LANGUAGE Safe #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE StandaloneDeriving #-} {-# LANGUAGE UndecidableInstances #-} -- | -- Module : Text.Megaparsec.Error -- Copyright : © 2015–present Megaparsec contributors -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Parse errors. The current version of Megaparsec supports typed errors -- instead of 'String'-based ones. This gives a lot of flexibility in -- describing what exactly went wrong as well as a way to return arbitrary -- data in case of failure. -- -- You probably do not want to import this module directly because -- "Text.Megaparsec" re-exports it anyway. 
module Text.Megaparsec.Error ( -- * Parse error type ErrorItem (..), ErrorFancy (..), ParseError (..), mapParseError, errorOffset, setErrorOffset, ParseErrorBundle (..), attachSourcePos, -- * Pretty-printing ShowErrorComponent (..), errorBundlePretty, parseErrorPretty, parseErrorTextPretty, ) where import Control.DeepSeq import Control.Exception import Control.Monad.State.Strict import Data.Data (Data) import Data.List (intercalate) import Data.List.NonEmpty (NonEmpty (..)) import qualified Data.List.NonEmpty as NE import Data.Maybe (isNothing) import Data.Proxy import Data.Set (Set) import qualified Data.Set as E import Data.Typeable (Typeable) import Data.Void import GHC.Generics import Text.Megaparsec.Pos import Text.Megaparsec.State import Text.Megaparsec.Stream ---------------------------------------------------------------------------- -- Parse error type -- | A data type that is used to represent “unexpected\/expected” items in -- 'ParseError'. It is parametrized over the token type @t@. -- -- @since 5.0.0 data ErrorItem t = -- | Non-empty stream of tokens Tokens (NonEmpty t) | -- | Label (cannot be empty) Label (NonEmpty Char) | -- | End of input EndOfInput deriving (Show, Read, Eq, Ord, Data, Typeable, Generic, Functor) instance (NFData t) => NFData (ErrorItem t) -- | Additional error data, extendable by user. When no custom data is -- necessary, the type is typically indexed by 'Void' to “cancel” the -- 'ErrorCustom' constructor. 
-- -- @since 6.0.0 data ErrorFancy e = -- | 'fail' has been used in parser monad ErrorFail String | -- | Incorrect indentation error: desired ordering between reference -- level and actual level, reference indentation level, actual -- indentation level ErrorIndentation Ordering Pos Pos | -- | Custom error data ErrorCustom e deriving (Show, Read, Eq, Ord, Data, Typeable, Generic, Functor) instance (NFData a) => NFData (ErrorFancy a) where rnf (ErrorFail str) = rnf str rnf (ErrorIndentation ord ref act) = ord `seq` rnf ref `seq` rnf act rnf (ErrorCustom a) = rnf a -- | @'ParseError' s e@ represents a parse error parametrized over the -- stream type @s@ and the custom data @e@. -- -- 'Semigroup' and 'Monoid' instances of the data type allow us to merge -- parse errors from different branches of parsing. When merging two -- 'ParseError's, the longest match is preferred; if positions are the same, -- custom data sets and collections of message items are combined. Note that -- fancy errors take precedence over trivial errors in merging. -- -- @since 7.0.0 data ParseError s e = -- | Trivial errors, generated by the Megaparsec's machinery. The data -- constructor includes the offset of error, unexpected token (if any), -- and expected tokens. -- -- Type of the first argument was changed in the version /7.0.0/. TrivialError Int (Maybe (ErrorItem (Token s))) (Set (ErrorItem (Token s))) | -- | Fancy, custom errors. -- -- Type of the first argument was changed in the version /7.0.0/. 
FancyError Int (Set (ErrorFancy e)) deriving (Typeable, Generic) deriving instance ( Show (Token s), Show e ) => Show (ParseError s e) deriving instance ( Eq (Token s), Eq e ) => Eq (ParseError s e) deriving instance ( Data s, Data (Token s), Ord (Token s), Data e, Ord e ) => Data (ParseError s e) instance ( NFData (Token s), NFData e ) => NFData (ParseError s e) instance (Stream s, Ord e) => Semigroup (ParseError s e) where (<>) = mergeError {-# INLINE (<>) #-} instance (Stream s, Ord e) => Monoid (ParseError s e) where mempty = TrivialError 0 Nothing E.empty mappend = (<>) {-# INLINE mappend #-} instance ( Show s, Show (Token s), Show e, ShowErrorComponent e, VisualStream s, Typeable s, Typeable e ) => Exception (ParseError s e) where displayException = parseErrorPretty -- | Modify the custom data component in a parse error. This could be done -- via 'fmap' if not for the 'Ord' constraint. -- -- @since 7.0.0 mapParseError :: (Ord e') => (e -> e') -> ParseError s e -> ParseError s e' mapParseError _ (TrivialError o u p) = TrivialError o u p mapParseError f (FancyError o x) = FancyError o (E.map (fmap f) x) -- | Get the offset of a 'ParseError'. -- -- @since 7.0.0 errorOffset :: ParseError s e -> Int errorOffset (TrivialError o _ _) = o errorOffset (FancyError o _) = o -- | Set the offset of a 'ParseError'. -- -- @since 8.0.0 setErrorOffset :: Int -> ParseError s e -> ParseError s e setErrorOffset o (TrivialError _ u p) = TrivialError o u p setErrorOffset o (FancyError _ x) = FancyError o x -- | Merge two error data structures into one joining their collections of -- message items and preferring the longest match. In other words, earlier -- error message is discarded. This may seem counter-intuitive, but -- 'mergeError' is only used to merge error messages of alternative branches -- of parsing and in this case longest match should be preferred. 
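The offset-based preference implemented by `mergeError` below can be shown with a drastically simplified error type. `ToyError` and `mergeToy` are invented for this sketch and gloss over the trivial-vs-fancy distinction and the unexpected-item merging:

```haskell
import Data.List (nub)

-- A simplified parse error: an offset plus a list of expected items.
data ToyError = ToyError Int [String] deriving (Eq, Show)

-- Longest match wins; at equal offsets the expected sets are combined.
mergeToy :: ToyError -> ToyError -> ToyError
mergeToy e1@(ToyError o1 xs) e2@(ToyError o2 ys) =
  case compare o1 o2 of
    LT -> e2
    GT -> e1
    EQ -> ToyError o1 (nub (xs ++ ys))

main :: IO ()
main = do
  print (mergeToy (ToyError 3 ["'a'"]) (ToyError 7 ["'b'"]))  -- ToyError 7 ["'b'"]
  print (mergeToy (ToyError 5 ["'a'"]) (ToyError 5 ["'b'"]))  -- ToyError 5 ["'a'","'b'"]
```

This is why the error from the branch that got furthest into the input is the one reported when alternatives are merged with `<|>`.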
mergeError :: (Stream s, Ord e) => ParseError s e -> ParseError s e -> ParseError s e mergeError e1 e2 = case errorOffset e1 `compare` errorOffset e2 of LT -> e2 EQ -> case (e1, e2) of (TrivialError s1 u1 p1, TrivialError _ u2 p2) -> TrivialError s1 (n u1 u2) (E.union p1 p2) (FancyError {}, TrivialError {}) -> e1 (TrivialError {}, FancyError {}) -> e2 (FancyError s1 x1, FancyError _ x2) -> FancyError s1 (E.union x1 x2) GT -> e1 where -- NOTE The logic behind this merging is that since we only combine -- parse errors that happen at exactly the same position, all the -- unexpected items will be prefixes of input stream at that position or -- labels referring to the same thing. Our aim here is to choose the -- longest prefix (merging with labels and end of input is somewhat -- arbitrary, but is necessary because otherwise we can't make -- ParseError lawful Monoid and have nice parse errors at the same -- time). n Nothing Nothing = Nothing n (Just x) Nothing = Just x n Nothing (Just y) = Just y n (Just x) (Just y) = Just (max x y) {-# INLINE mergeError #-} -- | A non-empty collection of 'ParseError's equipped with 'PosState' that -- allows us to pretty-print the errors efficiently and correctly. 
-- -- @since 7.0.0 data ParseErrorBundle s e = ParseErrorBundle { -- | A collection of 'ParseError's that is sorted by parse error offsets bundleErrors :: NonEmpty (ParseError s e), -- | The state that is used for line\/column calculation bundlePosState :: PosState s } deriving (Generic) deriving instance ( Show s, Show (Token s), Show e ) => Show (ParseErrorBundle s e) deriving instance ( Eq s, Eq (Token s), Eq e ) => Eq (ParseErrorBundle s e) deriving instance ( Typeable s, Typeable (Token s), Typeable e ) => Typeable (ParseErrorBundle s e) deriving instance ( Data s, Data (Token s), Ord (Token s), Data e, Ord e ) => Data (ParseErrorBundle s e) instance ( NFData s, NFData (Token s), NFData e ) => NFData (ParseErrorBundle s e) instance ( Show s, Show (Token s), Show e, ShowErrorComponent e, VisualStream s, TraversableStream s, Typeable s, Typeable e ) => Exception (ParseErrorBundle s e) where displayException = errorBundlePretty -- | Attach 'SourcePos'es to items in a 'Traversable' container given that -- there is a projection allowing us to get an offset per item. -- -- Items must be in ascending order with respect to their offsets. -- -- @since 7.0.0 attachSourcePos :: (Traversable t, TraversableStream s) => -- | How to project offset from an item (e.g. 'errorOffset') (a -> Int) -> -- | The collection of items t a -> -- | Initial 'PosState' PosState s -> -- | The collection with 'SourcePos'es added and the final 'PosState' (t (a, SourcePos), PosState s) attachSourcePos projectOffset xs = runState (traverse f xs) where f a = do pst <- get let pst' = reachOffsetNoLine (projectOffset a) pst put pst' return (a, pstateSourcePos pst') {-# INLINEABLE attachSourcePos #-} ---------------------------------------------------------------------------- -- Pretty-printing -- | The type class defines how to print a custom component of 'ParseError'. -- -- @since 5.0.0 class (Ord a) => ShowErrorComponent a where -- | Pretty-print a component of 'ParseError'. 
showErrorComponent :: a -> String -- | Length of the error component in characters, used for highlighting of -- parse errors in input string. -- -- @since 7.0.0 errorComponentLen :: a -> Int errorComponentLen _ = 1 instance ShowErrorComponent Void where showErrorComponent = absurd -- | Pretty-print a 'ParseErrorBundle'. All 'ParseError's in the bundle will -- be pretty-printed in order together with the corresponding offending -- lines by doing a single pass over the input stream. The rendered 'String' -- always ends with a newline. -- -- @since 7.0.0 errorBundlePretty :: forall s e. ( VisualStream s, TraversableStream s, ShowErrorComponent e ) => -- | Parse error bundle to display ParseErrorBundle s e -> -- | Textual rendition of the bundle String errorBundlePretty ParseErrorBundle {..} = let (r, _) = foldl f (id, bundlePosState) bundleErrors in drop 1 (r "") where f :: (ShowS, PosState s) -> ParseError s e -> (ShowS, PosState s) f (o, !pst) e = (o . (outChunk ++), pst') where (msline, pst') = reachOffset (errorOffset e) pst epos = pstateSourcePos pst' outChunk = "\n" <> sourcePosPretty epos <> ":\n" <> offendingLine <> parseErrorTextPretty e offendingLine = case msline of Nothing -> "" Just sline -> let rpadding = if pointerLen > 0 then replicate rpshift ' ' else "" pointerLen = if rpshift + elen > slineLen then slineLen - rpshift + 1 else elen pointer = replicate pointerLen '^' lineNumber = (show . unPos . sourceLine) epos padding = replicate (length lineNumber + 1) ' ' rpshift = unPos (sourceColumn epos) - 1 slineLen = length sline in padding <> "|\n" <> lineNumber <> " | " <> sline <> "\n" <> padding <> "| " <> rpadding <> pointer <> "\n" pxy = Proxy :: Proxy s elen = case e of TrivialError _ Nothing _ -> 1 TrivialError _ (Just x) _ -> errorItemLength pxy x FancyError _ xs -> E.foldl' (\a b -> max a (errorFancyLength b)) 1 xs -- | Pretty-print a 'ParseError'. The rendered 'String' always ends with a -- newline. 
-- -- @since 5.0.0 parseErrorPretty :: (VisualStream s, ShowErrorComponent e) => -- | Parse error to render ParseError s e -> -- | Result of rendering String parseErrorPretty e = "offset=" <> show (errorOffset e) <> ":\n" <> parseErrorTextPretty e -- | Pretty-print a textual part of a 'ParseError', that is, everything -- except for its position. The rendered 'String' always ends with a -- newline. -- -- @since 5.1.0 parseErrorTextPretty :: forall s e. (VisualStream s, ShowErrorComponent e) => -- | Parse error to render ParseError s e -> -- | Result of rendering String parseErrorTextPretty (TrivialError _ us ps) = if isNothing us && E.null ps then "unknown parse error\n" else messageItemsPretty "unexpected " (showErrorItem pxy `E.map` maybe E.empty E.singleton us) <> messageItemsPretty "expecting " (showErrorItem pxy `E.map` ps) where pxy = Proxy :: Proxy s parseErrorTextPretty (FancyError _ xs) = if E.null xs then "unknown fancy parse error\n" else unlines (showErrorFancy <$> E.toAscList xs) ---------------------------------------------------------------------------- -- Helpers -- | Pretty-print an 'ErrorItem'. showErrorItem :: (VisualStream s) => Proxy s -> ErrorItem (Token s) -> String showErrorItem pxy = \case Tokens ts -> showTokens pxy ts Label label -> NE.toList label EndOfInput -> "end of input" -- | Get length of the “pointer” to display under a given 'ErrorItem'. errorItemLength :: (VisualStream s) => Proxy s -> ErrorItem (Token s) -> Int errorItemLength pxy = \case Tokens ts -> tokensLength pxy ts _ -> 1 -- | Pretty-print an 'ErrorFancy'. 
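The pointer (caret) arithmetic used in `errorBundlePretty` above can be checked in isolation. `caretLine` is an invented helper that reproduces the `rpshift`/`pointerLen` clamping for a 1-based column, omitting the surrounding line-number gutter:

```haskell
-- Build the "^^^" pointer line for an error at column col, with error
-- length elen, on a source line of length slineLen. The pointer is
-- clamped so that it never extends past the end of the offending line.
caretLine :: Int -> Int -> Int -> String
caretLine col elen slineLen =
  let rpshift = col - 1
      pointerLen =
        if rpshift + elen > slineLen
          then slineLen - rpshift + 1
          else elen
   in replicate rpshift ' ' ++ replicate pointerLen '^'

main :: IO ()
main = do
  putStrLn (caretLine 3 2 10)  -- two spaces, then "^^"
  putStrLn (caretLine 9 5 10)  -- clamped at the end of the line
```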
showErrorFancy :: (ShowErrorComponent e) => ErrorFancy e -> String showErrorFancy = \case ErrorFail msg -> msg ErrorIndentation ord ref actual -> "incorrect indentation (got " <> show (unPos actual) <> ", should be " <> p <> show (unPos ref) <> ")" where p = case ord of LT -> "less than " EQ -> "equal to " GT -> "greater than " ErrorCustom a -> showErrorComponent a -- | Get length of the “pointer” to display under a given 'ErrorFancy'. errorFancyLength :: (ShowErrorComponent e) => ErrorFancy e -> Int errorFancyLength = \case ErrorCustom a -> errorComponentLen a _ -> 1 -- | Transform a list of error messages into their textual representation. messageItemsPretty :: -- | Prefix to prepend String -> -- | Collection of messages Set String -> -- | Result of rendering String messageItemsPretty prefix ts | E.null ts = "" | otherwise = prefix <> (orList . NE.fromList . E.toAscList) ts <> "\n" -- | Print a pretty list where items are separated with commas and the word -- “or” according to the rules of English punctuation. 
orList :: NonEmpty String -> String orList (x :| []) = x orList (x :| [y]) = x <> " or " <> y orList xs = intercalate ", " (NE.init xs) <> ", or " <> NE.last xs megaparsec-9.3.1/Text/Megaparsec/Error.hs-boot0000644000000000000000000000025207346545000017404 0ustar0000000000000000{-# LANGUAGE RoleAnnotations #-} {-# LANGUAGE Safe #-} module Text.Megaparsec.Error ( ParseError, ) where type role ParseError nominal nominal data ParseError s e megaparsec-9.3.1/Text/Megaparsec/Error/0000755000000000000000000000000007346545000016110 5ustar0000000000000000megaparsec-9.3.1/Text/Megaparsec/Error/Builder.hs0000644000000000000000000001243107346545000020033 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} {-# LANGUAGE DeriveGeneric #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE Safe #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE StandaloneDeriving #-} {-# LANGUAGE UndecidableInstances #-} -- | -- Module : Text.Megaparsec.Error.Builder -- Copyright : © 2015–present Megaparsec contributors -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- A set of helpers that should make construction of 'ParseError's more -- concise. This is primarily useful in test suites and for debugging. -- -- @since 6.0.0 module Text.Megaparsec.Error.Builder ( -- * Top-level helpers err, errFancy, -- * Error components utok, utoks, ulabel, ueof, etok, etoks, elabel, eeof, fancy, -- * Data types ET, EF, ) where import Data.Data (Data) import Data.List.NonEmpty (NonEmpty (..)) import qualified Data.List.NonEmpty as NE import Data.Proxy import Data.Set (Set) import qualified Data.Set as E import Data.Typeable (Typeable) import GHC.Generics import Text.Megaparsec.Error import Text.Megaparsec.Stream ---------------------------------------------------------------------------- -- Data types -- | Auxiliary type for construction of trivial parse errors. 
data ET s = ET (Maybe (ErrorItem (Token s))) (Set (ErrorItem (Token s))) deriving (Typeable, Generic) deriving instance (Eq (Token s)) => Eq (ET s) deriving instance (Ord (Token s)) => Ord (ET s) deriving instance ( Data s, Data (Token s), Ord (Token s) ) => Data (ET s) instance (Stream s) => Semigroup (ET s) where ET us0 ps0 <> ET us1 ps1 = ET (n us0 us1) (E.union ps0 ps1) where n Nothing Nothing = Nothing n (Just x) Nothing = Just x n Nothing (Just y) = Just y n (Just x) (Just y) = Just (max x y) instance (Stream s) => Monoid (ET s) where mempty = ET Nothing E.empty mappend = (<>) -- | Auxiliary type for construction of fancy parse errors. newtype EF e = EF (Set (ErrorFancy e)) deriving (Eq, Ord, Data, Typeable, Generic) instance (Ord e) => Semigroup (EF e) where EF xs0 <> EF xs1 = EF (E.union xs0 xs1) instance (Ord e) => Monoid (EF e) where mempty = EF E.empty mappend = (<>) ---------------------------------------------------------------------------- -- Top-level helpers -- | Assemble a 'ParseError' from the offset and the @'ET' t@ value. @'ET' -- t@ is a monoid and can be assembled by combining primitives provided by -- this module, see below. err :: -- | 'ParseError' offset Int -> -- | Error components ET s -> -- | Resulting 'ParseError' ParseError s e err p (ET us ps) = TrivialError p us ps -- | Like 'err', but constructs a “fancy” 'ParseError'. errFancy :: -- | 'ParseError' offset Int -> -- | Error components EF e -> -- | Resulting 'ParseError' ParseError s e errFancy p (EF xs) = FancyError p xs ---------------------------------------------------------------------------- -- Error components -- | Construct an “unexpected token” error component. utok :: (Stream s) => Token s -> ET s utok = unexp . Tokens . nes -- | Construct an “unexpected tokens” error component. Empty chunk produces -- 'EndOfInput'. utoks :: forall s. (Stream s) => Tokens s -> ET s utoks = unexp . canonicalizeTokens (Proxy :: Proxy s) -- | Construct an “unexpected label” error component. 
Do not use with empty -- strings (for empty strings it's bottom). ulabel :: (Stream s) => String -> ET s ulabel label | label == "" = error "Text.Megaparsec.Error.Builder.ulabel: empty label" | otherwise = unexp . Label . NE.fromList $ label -- | Construct an “unexpected end of input” error component. ueof :: (Stream s) => ET s ueof = unexp EndOfInput -- | Construct an “expected token” error component. etok :: (Stream s) => Token s -> ET s etok = expe . Tokens . nes -- | Construct an “expected tokens” error component. Empty chunk produces -- 'EndOfInput'. etoks :: forall s. (Stream s) => Tokens s -> ET s etoks = expe . canonicalizeTokens (Proxy :: Proxy s) -- | Construct an “expected label” error component. Do not use with empty -- strings. elabel :: (Stream s) => String -> ET s elabel label | label == "" = error "Text.Megaparsec.Error.Builder.elabel: empty label" | otherwise = expe . Label . NE.fromList $ label -- | Construct an “expected end of input” error component. eeof :: (Stream s) => ET s eeof = expe EndOfInput -- | Construct a custom error component. fancy :: ErrorFancy e -> EF e fancy = EF . E.singleton ---------------------------------------------------------------------------- -- Helpers -- | Construct the appropriate 'ErrorItem' representation for the given -- token stream. The empty string produces 'EndOfInput'. canonicalizeTokens :: (Stream s) => Proxy s -> Tokens s -> ErrorItem (Token s) canonicalizeTokens pxy ts = case NE.nonEmpty (chunkToTokens pxy ts) of Nothing -> EndOfInput Just xs -> Tokens xs -- | Lift an unexpected item into 'ET'. unexp :: (Stream s) => ErrorItem (Token s) -> ET s unexp u = ET (pure u) E.empty -- | Lift an expected item into 'ET'. expe :: (Stream s) => ErrorItem (Token s) -> ET s expe p = ET Nothing (E.singleton p) -- | Make a singleton non-empty list from a value. 
nes :: a -> NonEmpty a nes x = x :| [] megaparsec-9.3.1/Text/Megaparsec/Internal.hs0000644000000000000000000005075207346545000017140 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE LambdaCase #-} {-# LANGUAGE MultiParamTypeClasses #-} {-# LANGUAGE RankNTypes #-} {-# LANGUAGE Safe #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeOperators #-} {-# LANGUAGE UndecidableInstances #-} -- | -- Module : Text.Megaparsec.Internal -- Copyright : © 2015–present Megaparsec contributors -- © 2007 Paolo Martini -- © 1999–2001 Daan Leijen -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Internal definitions. Versioning rules do not apply here. Please do not -- rely on these unless you really know what you're doing. -- -- @since 6.5.0 module Text.Megaparsec.Internal ( -- * Data types Hints (..), Reply (..), Consumption (..), Result (..), ParsecT (..), -- * Helper functions toHints, withHints, accHints, refreshHints, runParsecT, withParsecT, ) where import Control.Applicative import Control.Monad import Control.Monad.Cont.Class import Control.Monad.Error.Class import qualified Control.Monad.Fail as Fail import Control.Monad.Fix import Control.Monad.IO.Class import Control.Monad.Reader.Class import Control.Monad.State.Class import Control.Monad.Trans import Data.List.NonEmpty (NonEmpty (..)) import qualified Data.List.NonEmpty as NE import Data.Proxy import Data.Semigroup import Data.Set (Set) import qualified Data.Set as E import Data.String (IsString (..)) import Text.Megaparsec.Class import Text.Megaparsec.Error import Text.Megaparsec.State import Text.Megaparsec.Stream ---------------------------------------------------------------------------- -- Data types -- | 'Hints' represent a collection of 'ErrorItem's to be included into -- 'ParseError' (when it's a 'TrivialError') as “expected” message items -- when a parser fails 
without consuming input right after successful parser -- that produced the hints. -- -- For example, without hints you could get: -- -- >>> parseTest (many (char 'r') <* eof) "ra" -- 1:2: -- unexpected 'a' -- expecting end of input -- -- We're getting better error messages with the help of hints: -- -- >>> parseTest (many (char 'r') <* eof) "ra" -- 1:2: -- unexpected 'a' -- expecting 'r' or end of input newtype Hints t = Hints (Set (ErrorItem t)) instance (Ord t) => Semigroup (Hints t) where Hints xs <> Hints ys = Hints $ xs <> ys instance (Ord t) => Monoid (Hints t) where mempty = Hints mempty -- | All information available after parsing. This includes consumption of -- input, success (with the returned value) or failure (with the parse -- error), and parser state at the end of parsing. -- -- See also: 'Consumption', 'Result'. data Reply e s a = Reply (State s e) Consumption (Result s e a) -- | Whether the input has been consumed or not. -- -- See also: 'Result', 'Reply'. data Consumption = -- | Some part of input stream was consumed Consumed | -- | No input was consumed Virgin -- | Whether the parser has failed or not. On success we include the -- resulting value, on failure we include a 'ParseError'. -- -- See also: 'Consumption', 'Reply'. data Result s e a = -- | Parser succeeded OK a | -- | Parser failed Error (ParseError s e) -- | @'ParsecT' e s m a@ is a parser with custom data component of error -- @e@, stream type @s@, underlying monad @m@ and return type @a@. newtype ParsecT e s m a = ParsecT { unParser :: forall b. State s e -> (a -> State s e -> Hints (Token s) -> m b) -> -- consumed-OK (ParseError s e -> State s e -> m b) -> -- consumed-error (a -> State s e -> Hints (Token s) -> m b) -> -- empty-OK (ParseError s e -> State s e -> m b) -> -- empty-error m b } -- | @since 5.3.0 instance (Stream s, Semigroup a) => Semigroup (ParsecT e s m a) where (<>) = liftA2 (<>) {-# INLINE (<>) #-} sconcat = fmap sconcat . 
sequence {-# INLINE sconcat #-} -- | @since 5.3.0 instance (Stream s, Monoid a) => Monoid (ParsecT e s m a) where mempty = pure mempty {-# INLINE mempty #-} mappend = (<>) {-# INLINE mappend #-} mconcat = fmap mconcat . sequence {-# INLINE mconcat #-} -- | @since 6.3.0 instance (a ~ Tokens s, IsString a, Eq a, Stream s, Ord e) => IsString (ParsecT e s m a) where fromString s = tokens (==) (fromString s) instance Functor (ParsecT e s m) where fmap = pMap pMap :: (a -> b) -> ParsecT e s m a -> ParsecT e s m b pMap f p = ParsecT $ \s cok cerr eok eerr -> unParser p s (cok . f) cerr (eok . f) eerr {-# INLINE pMap #-} -- | 'pure' returns a parser that __succeeds__ without consuming input. instance (Stream s) => Applicative (ParsecT e s m) where pure = pPure (<*>) = pAp p1 *> p2 = p1 `pBind` const p2 p1 <* p2 = do x1 <- p1; void p2; return x1 pPure :: (Stream s) => a -> ParsecT e s m a pPure x = ParsecT $ \s _ _ eok _ -> eok x s mempty {-# INLINE pPure #-} pAp :: (Stream s) => ParsecT e s m (a -> b) -> ParsecT e s m a -> ParsecT e s m b pAp m k = ParsecT $ \s cok cerr eok eerr -> let mcok x s' hs = unParser k s' (cok . x) cerr (accHints hs (cok . x)) (withHints hs cerr) meok x s' hs = unParser k s' (cok . x) cerr (accHints hs (eok . x)) (withHints hs eerr) in unParser m s mcok cerr meok eerr {-# INLINE pAp #-} -- | 'empty' is a parser that __fails__ without consuming input. instance (Ord e, Stream s) => Alternative (ParsecT e s m) where empty = mzero (<|>) = mplus -- | 'return' returns a parser that __succeeds__ without consuming input. 
instance (Stream s) => Monad (ParsecT e s m) where return = pure (>>=) = pBind pBind :: (Stream s) => ParsecT e s m a -> (a -> ParsecT e s m b) -> ParsecT e s m b pBind m k = ParsecT $ \s cok cerr eok eerr -> let mcok x s' hs = unParser (k x) s' cok cerr (accHints hs cok) (withHints hs cerr) meok x s' hs = unParser (k x) s' cok cerr (accHints hs eok) (withHints hs eerr) in unParser m s mcok cerr meok eerr {-# INLINE pBind #-} instance (Stream s) => Fail.MonadFail (ParsecT e s m) where fail = pFail pFail :: String -> ParsecT e s m a pFail msg = ParsecT $ \s@(State _ o _ _) _ _ _ eerr -> let d = E.singleton (ErrorFail msg) in eerr (FancyError o d) s {-# INLINE pFail #-} instance (Stream s, MonadIO m) => MonadIO (ParsecT e s m) where liftIO = lift . liftIO instance (Stream s, MonadReader r m) => MonadReader r (ParsecT e s m) where ask = lift ask local f p = mkPT $ \s -> local f (runParsecT p s) instance (Stream s, MonadState st m) => MonadState st (ParsecT e s m) where get = lift get put = lift . put instance (Stream s, MonadCont m) => MonadCont (ParsecT e s m) where callCC f = mkPT $ \s -> callCC $ \c -> runParsecT (f (\a -> mkPT $ \s' -> c (pack s' a))) s where pack s a = Reply s Virgin (OK a) instance (Stream s, MonadError e' m) => MonadError e' (ParsecT e s m) where throwError = lift . throwError p `catchError` h = mkPT $ \s -> runParsecT p s `catchError` \e -> runParsecT (h e) s mkPT :: (Stream s, Monad m) => (State s e -> m (Reply e s a)) -> ParsecT e s m a mkPT k = ParsecT $ \s cok cerr eok eerr -> do (Reply s' consumption result) <- k s case consumption of Consumed -> case result of OK x -> cok x s' mempty Error e -> cerr e s' Virgin -> case result of OK x -> eok x s' mempty Error e -> eerr e s' -- | 'mzero' is a parser that __fails__ without consuming input. -- -- __Note__: strictly speaking, this instance is unlawful. The right -- identity law does not hold, e.g. 
in general this is not true: -- -- > v >> mzero = mzero -- -- However the following holds: -- -- > try v >> mzero = mzero instance (Ord e, Stream s) => MonadPlus (ParsecT e s m) where mzero = pZero mplus = pPlus pZero :: ParsecT e s m a pZero = ParsecT $ \s@(State _ o _ _) _ _ _ eerr -> eerr (TrivialError o Nothing E.empty) s {-# INLINE pZero #-} pPlus :: (Ord e, Stream s) => ParsecT e s m a -> ParsecT e s m a -> ParsecT e s m a pPlus m n = ParsecT $ \s cok cerr eok eerr -> let meerr err ms = let ncerr err' s' = cerr (err' <> err) (longestMatch ms s') neok x s' hs = eok x s' (toHints (stateOffset s') err <> hs) neerr err' s' = eerr (err' <> err) (longestMatch ms s') in unParser n s cok ncerr neok neerr in unParser m s cok cerr eok meerr {-# INLINE pPlus #-} -- | From two states, return the one with the greater number of processed -- tokens. If the numbers of processed tokens are equal, prefer the second -- state. longestMatch :: State s e -> State s e -> State s e longestMatch s1@(State _ o1 _ _) s2@(State _ o2 _ _) = case o1 `compare` o2 of LT -> s2 EQ -> s2 GT -> s1 {-# INLINE longestMatch #-} -- | @since 6.0.0 instance (Stream s, MonadFix m) => MonadFix (ParsecT e s m) where mfix f = mkPT $ \s -> mfix $ \(~(Reply _ _ result)) -> do let a = case result of OK a' -> a' Error _ -> error "mfix ParsecT" runParsecT (f a) s instance (Stream s) => MonadTrans (ParsecT e s) where lift amb = ParsecT $ \s _ _ eok _ -> amb >>= \a -> eok a s mempty instance (Ord e, Stream s) => MonadParsec e s (ParsecT e s m) where parseError = pParseError label = pLabel try = pTry lookAhead = pLookAhead notFollowedBy = pNotFollowedBy withRecovery = pWithRecovery observing = pObserving eof = pEof token = pToken tokens = pTokens takeWhileP = pTakeWhileP takeWhile1P = pTakeWhile1P takeP = pTakeP getParserState = pGetParserState updateParserState = pUpdateParserState pParseError :: ParseError s e -> ParsecT e s m a pParseError e = ParsecT $ \s _ _ _ eerr -> eerr e s {-# INLINE pParseError #-}
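The tie-breaking rule used by `longestMatch` above can be sketched in isolation: of two offsets of processed tokens, the greater wins, and on a tie the second is preferred. The `pickByOffset` helper below is a hypothetical, base-only illustration (not part of megaparsec) that operates on bare `Int` offsets instead of full `State` values.

```haskell
-- Hypothetical sketch of the 'longestMatch' tie-break (not library code):
-- 'Left' tags the first offset as the winner, 'Right' the second.
pickByOffset :: Int -> Int -> Either Int Int
pickByOffset o1 o2 = case o1 `compare` o2 of
  LT -> Right o2 -- second offset is further along
  EQ -> Right o2 -- tie: prefer the second, mirroring 'longestMatch'
  GT -> Left o1  -- first offset is further along
```

Preferring the second state on a tie means that when both branches of `(<|>)` fail at the same offset, the state coming from the second branch is reported.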
pLabel :: String -> ParsecT e s m a -> ParsecT e s m a pLabel l p = ParsecT $ \s cok cerr eok eerr -> let el = Label <$> NE.nonEmpty l cok' x s' hs = case el of Nothing -> cok x s' (refreshHints hs Nothing) Just _ -> cok x s' hs eok' x s' hs = eok x s' (refreshHints hs el) eerr' err = eerr $ case err of (TrivialError pos us _) -> TrivialError pos us (maybe E.empty E.singleton el) _ -> err in unParser p s cok' cerr eok' eerr' {-# INLINE pLabel #-} pTry :: ParsecT e s m a -> ParsecT e s m a pTry p = ParsecT $ \s cok _ eok eerr -> let eerr' err _ = eerr err s in unParser p s cok eerr' eok eerr' {-# INLINE pTry #-} pLookAhead :: (Stream s) => ParsecT e s m a -> ParsecT e s m a pLookAhead p = ParsecT $ \s _ cerr eok eerr -> let eok' a _ _ = eok a s mempty in unParser p s eok' cerr eok' eerr {-# INLINE pLookAhead #-} pNotFollowedBy :: (Stream s) => ParsecT e s m a -> ParsecT e s m () pNotFollowedBy p = ParsecT $ \s@(State input o _ _) _ _ eok eerr -> let what = maybe EndOfInput (Tokens . nes . 
fst) (take1_ input) unexpect u = TrivialError o (pure u) E.empty cok' _ _ _ = eerr (unexpect what) s cerr' _ _ = eok () s mempty eok' _ _ _ = eerr (unexpect what) s eerr' _ _ = eok () s mempty in unParser p s cok' cerr' eok' eerr' {-# INLINE pNotFollowedBy #-} pWithRecovery :: (Stream s) => (ParseError s e -> ParsecT e s m a) -> ParsecT e s m a -> ParsecT e s m a pWithRecovery r p = ParsecT $ \s cok cerr eok eerr -> let mcerr err ms = let rcok x s' _ = cok x s' mempty rcerr _ _ = cerr err ms reok x s' _ = eok x s' (toHints (stateOffset s') err) reerr _ _ = cerr err ms in unParser (r err) ms rcok rcerr reok reerr meerr err ms = let rcok x s' _ = cok x s' (toHints (stateOffset s') err) rcerr _ _ = eerr err ms reok x s' _ = eok x s' (toHints (stateOffset s') err) reerr _ _ = eerr err ms in unParser (r err) ms rcok rcerr reok reerr in unParser p s cok mcerr eok meerr {-# INLINE pWithRecovery #-} pObserving :: (Stream s) => ParsecT e s m a -> ParsecT e s m (Either (ParseError s e) a) pObserving p = ParsecT $ \s cok _ eok _ -> let cerr' err s' = cok (Left err) s' mempty eerr' err s' = eok (Left err) s' (toHints (stateOffset s') err) in unParser p s (cok . Right) cerr' (eok . Right) eerr' {-# INLINE pObserving #-} pEof :: forall e s m. (Stream s) => ParsecT e s m () pEof = ParsecT $ \s@(State input o pst de) _ _ eok eerr -> case take1_ input of Nothing -> eok () s mempty Just (x, _) -> let us = (pure . Tokens . nes) x ps = E.singleton EndOfInput in eerr (TrivialError o us ps) (State input o pst de) {-# INLINE pEof #-} pToken :: forall e s m a. (Stream s) => (Token s -> Maybe a) -> Set (ErrorItem (Token s)) -> ParsecT e s m a pToken test ps = ParsecT $ \s@(State input o pst de) cok _ _ eerr -> case take1_ input of Nothing -> let us = pure EndOfInput in eerr (TrivialError o us ps) s Just (c, cs) -> case test c of Nothing -> let us = (Just . Tokens . 
nes) c in eerr (TrivialError o us ps) (State input o pst de) Just x -> cok x (State cs (o + 1) pst de) mempty {-# INLINE pToken #-} pTokens :: forall e s m. (Stream s) => (Tokens s -> Tokens s -> Bool) -> Tokens s -> ParsecT e s m (Tokens s) pTokens f tts = ParsecT $ \s@(State input o pst de) cok _ eok eerr -> let pxy = Proxy :: Proxy s unexpect pos' u = let us = pure u ps = (E.singleton . Tokens . NE.fromList . chunkToTokens pxy) tts in TrivialError pos' us ps len = chunkLength pxy tts in case takeN_ len input of Nothing -> eerr (unexpect o EndOfInput) s Just (tts', input') -> if f tts tts' then let st = State input' (o + len) pst de in if chunkEmpty pxy tts then eok tts' st mempty else cok tts' st mempty else let ps = (Tokens . NE.fromList . chunkToTokens pxy) tts' in eerr (unexpect o ps) (State input o pst de) {-# INLINE pTokens #-} pTakeWhileP :: forall e s m. (Stream s) => Maybe String -> (Token s -> Bool) -> ParsecT e s m (Tokens s) pTakeWhileP ml f = ParsecT $ \(State input o pst de) cok _ eok _ -> let pxy = Proxy :: Proxy s (ts, input') = takeWhile_ f input len = chunkLength pxy ts hs = case ml >>= NE.nonEmpty of Nothing -> mempty Just l -> (Hints . E.singleton . Label) l in if chunkEmpty pxy ts then eok ts (State input' (o + len) pst de) hs else cok ts (State input' (o + len) pst de) hs {-# INLINE pTakeWhileP #-} pTakeWhile1P :: forall e s m. (Stream s) => Maybe String -> (Token s -> Bool) -> ParsecT e s m (Tokens s) pTakeWhile1P ml f = ParsecT $ \(State input o pst de) cok _ _ eerr -> let pxy = Proxy :: Proxy s (ts, input') = takeWhile_ f input len = chunkLength pxy ts el = Label <$> (ml >>= NE.nonEmpty) hs = case el of Nothing -> mempty Just l -> (Hints . 
E.singleton) l in if chunkEmpty pxy ts then let us = pure $ case take1_ input of Nothing -> EndOfInput Just (t, _) -> Tokens (nes t) ps = maybe E.empty E.singleton el in eerr (TrivialError o us ps) (State input o pst de) else cok ts (State input' (o + len) pst de) hs {-# INLINE pTakeWhile1P #-} pTakeP :: forall e s m. (Stream s) => Maybe String -> Int -> ParsecT e s m (Tokens s) pTakeP ml n' = ParsecT $ \s@(State input o pst de) cok _ _ eerr -> let n = max 0 n' pxy = Proxy :: Proxy s el = Label <$> (ml >>= NE.nonEmpty) ps = maybe E.empty E.singleton el in case takeN_ n input of Nothing -> eerr (TrivialError o (pure EndOfInput) ps) s Just (ts, input') -> let len = chunkLength pxy ts in if len /= n then eerr (TrivialError (o + len) (pure EndOfInput) ps) (State input o pst de) else cok ts (State input' (o + len) pst de) mempty {-# INLINE pTakeP #-} pGetParserState :: (Stream s) => ParsecT e s m (State s e) pGetParserState = ParsecT $ \s _ _ eok _ -> eok s s mempty {-# INLINE pGetParserState #-} pUpdateParserState :: (Stream s) => (State s e -> State s e) -> ParsecT e s m () pUpdateParserState f = ParsecT $ \s _ _ eok _ -> eok () (f s) mempty {-# INLINE pUpdateParserState #-} nes :: a -> NonEmpty a nes x = x :| [] {-# INLINE nes #-} ---------------------------------------------------------------------------- -- Helper functions -- | Convert a 'ParseError' record into 'Hints'. toHints :: (Stream s) => -- | Current offset in input stream Int -> -- | Parse error to convert ParseError s e -> Hints (Token s) toHints streamPos = \case TrivialError errOffset _ ps -> -- NOTE This is important to check here that the error indeed has -- happened at the same position as current position of stream because -- there might have been backtracking with 'try' and in that case we -- must not convert such a parse error to hints. 
if streamPos == errOffset then Hints (if E.null ps then E.empty else ps) else mempty FancyError _ _ -> mempty {-# INLINE toHints #-} -- | @'withHints' hs c@ makes “error” continuation @c@ use given hints @hs@. -- -- __Note__ that if resulting continuation gets 'ParseError' that has custom -- data in it, hints are ignored. withHints :: (Stream s) => -- | Hints to use Hints (Token s) -> -- | Continuation to influence (ParseError s e -> State s e -> m b) -> -- | First argument of resulting continuation ParseError s e -> -- | Second argument of resulting continuation State s e -> m b withHints (Hints ps') c e = case e of TrivialError pos us ps -> c (TrivialError pos us (E.union ps ps')) _ -> c e {-# INLINE withHints #-} -- | @'accHints' hs c@ results in “OK” continuation that will add given -- hints @hs@ to third argument of original continuation @c@. accHints :: (Stream s) => -- | 'Hints' to add Hints (Token s) -> -- | An “OK” continuation to alter (a -> State s e -> Hints (Token s) -> m b) -> -- | Altered “OK” continuation (a -> State s e -> Hints (Token s) -> m b) accHints hs1 c x s hs2 = c x s (hs1 <> hs2) {-# INLINE accHints #-} -- | Replace the hints with the given 'ErrorItem' (or delete it if 'Nothing' -- is given). This is used in the 'label' primitive. refreshHints :: Hints t -> Maybe (ErrorItem t) -> Hints t refreshHints (Hints _) Nothing = Hints E.empty refreshHints (Hints hs) (Just m) = if E.null hs then Hints hs else Hints (E.singleton m) {-# INLINE refreshHints #-} -- | Low-level unpacking of the 'ParsecT' type. runParsecT :: (Monad m) => -- | Parser to run ParsecT e s m a -> -- | Initial state State s e -> m (Reply e s a) runParsecT p s = unParser p s cok cerr eok eerr where cok a s' _ = return $ Reply s' Consumed (OK a) cerr err s' = return $ Reply s' Consumed (Error err) eok a s' _ = return $ Reply s' Virgin (OK a) eerr err s' = return $ Reply s' Virgin (Error err) -- | Transform any custom errors thrown by the parser using the given -- function. 
Similar in function and purpose to @withExceptT@. -- -- __Note__ that the inner parser will start with an empty collection of -- “delayed” 'ParseError's. Any delayed 'ParseError's produced in the inner -- parser will be lifted by applying the provided function and added to the -- collection of delayed parse errors of the outer parser. -- -- @since 7.0.0 withParsecT :: forall e e' s m a. (Monad m, Ord e') => (e -> e') -> -- | Inner parser ParsecT e s m a -> -- | Outer parser ParsecT e' s m a withParsecT f p = ParsecT $ \s cok cerr eok eerr -> let s' = s { stateParseErrors = [] } adjustState :: State s e -> State s e' adjustState st = st { stateParseErrors = (mapParseError f <$> stateParseErrors st) ++ stateParseErrors s } cok' x st hs = cok x (adjustState st) hs cerr' e st = cerr (mapParseError f e) (adjustState st) eok' x st hs = eok x (adjustState st) hs eerr' e st = eerr (mapParseError f e) (adjustState st) in unParser p s' cok' cerr' eok' eerr' {-# INLINE withParsecT #-} megaparsec-9.3.1/Text/Megaparsec/Lexer.hs0000644000000000000000000001017507346545000016436 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE Safe #-} -- | -- Module : Text.Megaparsec.Common -- Copyright : © 2018–present Megaparsec contributors -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Common token combinators. This module is not public, the functions from -- it are re-exported in "Text.Megaparsec.Byte" and "Text.Megaparsec.Char". -- -- @since 7.0.0 module Text.Megaparsec.Lexer ( -- * White space space, lexeme, symbol, symbol', ) where import qualified Data.CaseInsensitive as CI import Text.Megaparsec import Text.Megaparsec.Common ---------------------------------------------------------------------------- -- White space -- | @'space' sc lineComment blockComment@ produces a parser that can parse -- white space in general. 
It's expected that you create such a parser once -- and pass it to other functions in this module as needed (when you see -- @spaceConsumer@ in documentation, usually it means that something like -- 'space' is expected there). -- -- @sc@ is used to parse blocks of space characters. You can use -- 'Text.Megaparsec.Char.space1' from "Text.Megaparsec.Char" for this -- purpose as well as your own parser (if you don't want to automatically -- consume newlines, for example). Make sure that the parser does not -- succeed on the empty input though. In an earlier version of the library -- 'Text.Megaparsec.Char.spaceChar' was recommended, but now parsers based -- on 'takeWhile1P' are preferred because of their speed. -- -- @lineComment@ is used to parse line comments. You can use -- @skipLineComment@ if you don't need anything special. -- -- @blockComment@ is used to parse block (multi-line) comments. You can use -- @skipBlockComment@ or @skipBlockCommentNested@ if you don't need anything -- special. -- -- If you don't want to allow a kind of comment, simply pass 'empty' which -- will fail instantly when parsing of that sort of comment is attempted and -- 'space' will just move on or finish depending on whether there is more -- white space for it to consume. space :: (MonadParsec e s m) => -- | A parser for space characters which does not accept empty -- input (e.g. 'Text.Megaparsec.Char.space1') m () -> -- | A parser for a line comment (e.g. 'skipLineComment') m () -> -- | A parser for a block comment (e.g. 'skipBlockComment') m () -> m () space sp line block = skipMany $ choice [hidden sp, hidden line, hidden block] {-# INLINEABLE space #-} -- | This is a wrapper for lexemes. The typical usage is to supply the first -- argument (parser that consumes white space, probably defined via 'space') -- and use the resulting function to wrap parsers for every lexeme. 
-- -- > lexeme = L.lexeme spaceConsumer -- > integer = lexeme L.decimal lexeme :: (MonadParsec e s m) => -- | How to consume white space after lexeme m () -> -- | How to parse actual lexeme m a -> m a lexeme spc p = p <* spc {-# INLINEABLE lexeme #-} -- | This is a helper to parse symbols, i.e. verbatim strings. You pass the -- first argument (parser that consumes white space, probably defined via -- 'space') and then you can use the resulting function to parse strings: -- -- > symbol = L.symbol spaceConsumer -- > -- > parens = between (symbol "(") (symbol ")") -- > braces = between (symbol "{") (symbol "}") -- > angles = between (symbol "<") (symbol ">") -- > brackets = between (symbol "[") (symbol "]") -- > semicolon = symbol ";" -- > comma = symbol "," -- > colon = symbol ":" -- > dot = symbol "." symbol :: (MonadParsec e s m) => -- | How to consume white space after lexeme m () -> -- | Symbol to parse Tokens s -> m (Tokens s) symbol spc = lexeme spc . string {-# INLINEABLE symbol #-} -- | A case-insensitive version of 'symbol'. This may be helpful if you're -- working with case-insensitive languages. symbol' :: (MonadParsec e s m, CI.FoldCase (Tokens s)) => -- | How to consume white space after lexeme m () -> -- | Symbol to parse (case-insensitive) Tokens s -> m (Tokens s) symbol' spc = lexeme spc . string' {-# INLINEABLE symbol' #-} megaparsec-9.3.1/Text/Megaparsec/Pos.hs0000644000000000000000000000723307346545000016121 0ustar0000000000000000{-# LANGUAGE DeriveAnyClass #-} {-# LANGUAGE DeriveDataTypeable #-} {-# LANGUAGE DeriveGeneric #-} {-# LANGUAGE Safe #-} -- | -- Module : Text.Megaparsec.Pos -- Copyright : © 2015–present Megaparsec contributors -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Textual source position. The position includes name of file, line number, -- and column number. 
-- -- You probably do not want to import this module directly because -- "Text.Megaparsec" re-exports it anyway. module Text.Megaparsec.Pos ( -- * Abstract position Pos, mkPos, unPos, pos1, defaultTabWidth, InvalidPosException (..), -- * Source position SourcePos (..), initialPos, sourcePosPretty, ) where import Control.DeepSeq import Control.Exception import Data.Data (Data) import Data.Typeable (Typeable) import GHC.Generics ---------------------------------------------------------------------------- -- Abstract position -- | 'Pos' is the type for positive integers. This is used to represent line -- number, column number, and similar things like indentation level. -- 'Semigroup' instance can be used to safely and efficiently add 'Pos'es -- together. -- -- @since 5.0.0 newtype Pos = Pos Int deriving (Show, Eq, Ord, Data, Generic, Typeable, NFData) -- | Construction of 'Pos' from 'Int'. The function throws -- 'InvalidPosException' when given a non-positive argument. -- -- @since 6.0.0 mkPos :: Int -> Pos mkPos a = if a <= 0 then throw (InvalidPosException a) else Pos a {-# INLINE mkPos #-} -- | Extract 'Int' from 'Pos'. -- -- @since 6.0.0 unPos :: Pos -> Int unPos (Pos w) = w {-# INLINE unPos #-} -- | Position with value 1. -- -- @since 6.0.0 pos1 :: Pos pos1 = mkPos 1 -- | Value of tab width used by default. Always prefer this constant when -- you want to refer to the default tab width because actual value /may/ -- change in future. -- -- Currently: -- -- > defaultTabWidth = mkPos 8 -- -- @since 5.0.0 defaultTabWidth :: Pos defaultTabWidth = mkPos 8 instance Semigroup Pos where (Pos x) <> (Pos y) = Pos (x + y) {-# INLINE (<>) #-} instance Read Pos where readsPrec d = readParen (d > 10) $ \r1 -> do ("Pos", r2) <- lex r1 (x, r3) <- readsPrec 11 r2 return (mkPos x, r3) -- | The exception is thrown by 'mkPos' when its argument is not a positive -- number. 
-- -- @since 5.0.0 newtype InvalidPosException = -- | Contains the actual value that was passed to 'mkPos' InvalidPosException Int deriving (Eq, Show, Data, Typeable, Generic) instance Exception InvalidPosException instance NFData InvalidPosException ---------------------------------------------------------------------------- -- Source position -- | The data type 'SourcePos' represents source positions. It contains the -- name of the source file, a line number, and a column number. Source line -- and column positions change intensively during parsing, so we need to -- make them strict to avoid memory leaks. data SourcePos = SourcePos { -- | Name of source file sourceName :: FilePath, -- | Line number sourceLine :: !Pos, -- | Column number sourceColumn :: !Pos } deriving (Show, Read, Eq, Ord, Data, Typeable, Generic) instance NFData SourcePos -- | Construct initial position (line 1, column 1) given name of source -- file. initialPos :: FilePath -> SourcePos initialPos n = SourcePos n pos1 pos1 -- | Pretty-print a 'SourcePos'. -- -- @since 5.0.0 sourcePosPretty :: SourcePos -> String sourcePosPretty (SourcePos n l c) | null n = showLC | otherwise = n <> ":" <> showLC where showLC = show (unPos l) <> ":" <> show (unPos c) megaparsec-9.3.1/Text/Megaparsec/State.hs0000644000000000000000000000467407346545000016446 0ustar0000000000000000{-# LANGUAGE DeriveDataTypeable #-} {-# LANGUAGE DeriveGeneric #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE Safe #-} {-# LANGUAGE StandaloneDeriving #-} {-# LANGUAGE UndecidableInstances #-} -- | -- Module : Text.Megaparsec.State -- Copyright : © 2015–present Megaparsec contributors -- © 2007 Paolo Martini -- © 1999–2001 Daan Leijen -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Definition of Megaparsec's 'State'. 
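The 'Pos' and 'SourcePos' machinery above can be mirrored in a small self-contained sketch. These are local stand-ins, not the library types (and @mkPos'@ returns 'Maybe' instead of throwing 'InvalidPosException', to keep the sketch total):

```haskell
-- Local stand-ins mirroring the definitions above; an illustration
-- of their behaviour, not the megaparsec types themselves.
newtype Pos = Pos Int deriving (Eq, Ord, Show)

-- Like mkPos, but total: Nothing instead of InvalidPosException.
mkPos' :: Int -> Maybe Pos
mkPos' n = if n <= 0 then Nothing else Just (Pos n)

unPos :: Pos -> Int
unPos (Pos n) = n

-- The Semigroup instance adds the underlying Ints, so positions can
-- be combined without leaving the positive range.
instance Semigroup Pos where
  Pos x <> Pos y = Pos (x + y)

data SourcePos = SourcePos FilePath Pos Pos

-- Same logic as sourcePosPretty: the file name is omitted when empty.
sourcePosPretty :: SourcePos -> String
sourcePosPretty (SourcePos n l c)
  | null n    = showLC
  | otherwise = n <> ":" <> showLC
  where
    showLC = show (unPos l) <> ":" <> show (unPos c)
```

For example, @sourcePosPretty (SourcePos "Main.hs" (Pos 3) (Pos 14))@ yields @"Main.hs:3:14"@, while an empty file name yields just @"3:14"@.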
-- -- @since 6.5.0 module Text.Megaparsec.State ( State (..), PosState (..), ) where import Control.DeepSeq (NFData) import Data.Data (Data) import Data.Typeable (Typeable) import GHC.Generics import {-# SOURCE #-} Text.Megaparsec.Error (ParseError) import Text.Megaparsec.Pos -- | This is Megaparsec's state, parametrized over the stream type @s@ and -- the custom error component type @e@. data State s e = State { -- | The rest of input to process stateInput :: s, -- | Number of processed tokens so far -- -- @since 7.0.0 stateOffset :: {-# UNPACK #-} !Int, -- | State that is used for line\/column calculation -- -- @since 7.0.0 statePosState :: PosState s, -- | Collection of “delayed” 'ParseError's in reverse order. This means -- that the last registered error is the first element of the list. -- -- @since 8.0.0 stateParseErrors :: [ParseError s e] } deriving (Typeable, Generic) deriving instance ( Show (ParseError s e), Show s ) => Show (State s e) deriving instance ( Eq (ParseError s e), Eq s ) => Eq (State s e) deriving instance ( Data e, Data (ParseError s e), Data s ) => Data (State s e) instance (NFData s, NFData (ParseError s e)) => NFData (State s e) -- | A special kind of state that is used to calculate line\/column -- positions on demand.
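The reverse-order convention for 'stateParseErrors' can be sketched with a plain list (the helper names here are made up for illustration; they are not library functions): registration prepends, and reporting reverses.

```haskell
-- Plain-list sketch of the reverse-order convention used by
-- stateParseErrors: register by prepending (O(1)), report by
-- reversing once at the end.
register :: e -> [e] -> [e]
register e es = e : es

reportOrder :: [e] -> [e]
reportOrder = reverse
```

So after registering errors 1 and then 2, the list is @[2, 1]@, and reversing restores registration order for reporting.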
-- -- @since 7.0.0 data PosState s = PosState { -- | The rest of input to process pstateInput :: s, -- | Offset corresponding to beginning of 'pstateInput' pstateOffset :: !Int, -- | Source position corresponding to beginning of 'pstateInput' pstateSourcePos :: !SourcePos, -- | Tab width to use for column calculation pstateTabWidth :: Pos, -- | Prefix to prepend to offending line pstateLinePrefix :: String } deriving (Show, Eq, Data, Typeable, Generic) instance (NFData s) => NFData (PosState s) megaparsec-9.3.1/Text/Megaparsec/Stream.hs0000644000000000000000000006662507346545000016625 0ustar0000000000000000{-# LANGUAGE BangPatterns #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE LambdaCase #-} {-# LANGUAGE MultiWayIf #-} {-# LANGUAGE RankNTypes #-} {-# LANGUAGE RecordWildCards #-} {-# LANGUAGE Safe #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE UndecidableInstances #-} -- | -- Module : Text.Megaparsec.Stream -- Copyright : © 2015–present Megaparsec contributors -- License : FreeBSD -- -- Maintainer : Mark Karpov -- Stability : experimental -- Portability : portable -- -- Megaparsec's input stream facilities. -- -- You probably do not want to import this module directly because -- "Text.Megaparsec" re-exports it anyway. 
-- -- @since 6.0.0 module Text.Megaparsec.Stream ( Stream (..), ShareInput (..), NoShareInput (..), VisualStream (..), TraversableStream (..), ) where import Data.Bifunctor (second) import qualified Data.ByteString as B import qualified Data.ByteString.Char8 as B8 import qualified Data.ByteString.Lazy as BL import qualified Data.ByteString.Lazy.Char8 as BL8 import Data.Char (chr) import Data.Foldable (foldl', toList) import Data.Kind (Type) import Data.List.NonEmpty (NonEmpty (..)) import qualified Data.List.NonEmpty as NE import Data.Maybe (fromMaybe) import Data.Proxy import qualified Data.Sequence as S import qualified Data.Text as T import qualified Data.Text.Lazy as TL import Data.Word (Word8) import Text.Megaparsec.Pos import Text.Megaparsec.State -- | Type class for inputs that can be consumed by the library. -- -- Note that the 'Stream' instances for 'Text' and 'ByteString' (strict and -- lazy) default to "input sharing" (see 'ShareInput', 'NoShareInput'). We plan -- to move away from input sharing in a future major release; if you want to -- retain the current behaviour and are concerned with maximum performance you -- should consider using the 'ShareInput' wrapper explicitly. -- -- __Note__: before the version /9.0.0/ the class included the methods from -- 'VisualStream' and 'TraversableStream'. class (Ord (Token s), Ord (Tokens s)) => Stream s where -- | Type of token in the stream. type Token s :: Type -- | Type of “chunk” of the stream. type Tokens s :: Type -- | Lift a single token to chunk of the stream. The default -- implementation is: -- -- > tokenToChunk pxy = tokensToChunk pxy . pure -- -- However for some types of stream there may be a more efficient way to -- lift. tokenToChunk :: Proxy s -> Token s -> Tokens s tokenToChunk pxy = tokensToChunk pxy . pure -- | The first method that establishes isomorphism between list of tokens -- and chunk of the stream. 
A valid implementation should satisfy: -- -- > chunkToTokens pxy (tokensToChunk pxy ts) == ts tokensToChunk :: Proxy s -> [Token s] -> Tokens s -- | The second method that establishes isomorphism between list of tokens -- and chunk of the stream. A valid implementation should satisfy: -- -- > tokensToChunk pxy (chunkToTokens pxy chunk) == chunk chunkToTokens :: Proxy s -> Tokens s -> [Token s] -- | Return the length of a chunk of the stream. chunkLength :: Proxy s -> Tokens s -> Int -- | Check if a chunk of the stream is empty. The default implementation -- is in terms of the more general 'chunkLength': -- -- > chunkEmpty pxy ts = chunkLength pxy ts <= 0 -- -- However for many streams there may be a more efficient implementation. chunkEmpty :: Proxy s -> Tokens s -> Bool chunkEmpty pxy ts = chunkLength pxy ts <= 0 -- | Extract a single token from the stream. Return 'Nothing' if the -- stream is empty. take1_ :: s -> Maybe (Token s, s) -- | @'takeN_' n s@ should try to extract a chunk of length @n@, or, if the -- stream is too short, the rest of the stream. A valid implementation -- should follow these rules: -- -- * If the requested length @n@ is 0 (or less), 'Nothing' should -- never be returned; instead @'Just' (\"\", s)@ should be returned, -- where @\"\"@ stands for the empty chunk and @s@ is the original -- stream (second argument). -- * If the requested length is greater than 0 and the stream is -- empty, 'Nothing' should be returned, indicating end of input. -- * In other cases, take a chunk of length @n@ (or shorter if the -- stream is not long enough) from the input stream and return the -- chunk along with the rest of the stream. takeN_ :: Int -> s -> Maybe (Tokens s, s) -- | Extract a chunk of the stream by taking tokens while the supplied -- predicate returns 'True'. Return the chunk and the rest of the stream.
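The three 'takeN_' rules can be exercised with a standalone copy of the list logic (the same code as the @Stream [a]@ instance elsewhere in this module):

```haskell
-- Standalone copy of takeN_ for lists, mirroring the Stream [a]
-- instance in this module.
takeN_ :: Int -> [a] -> Maybe ([a], [a])
takeN_ n s
  | n <= 0    = Just ([], s)        -- rule 1: empty chunk, stream untouched
  | null s    = Nothing             -- rule 2: end of input
  | otherwise = Just (splitAt n s)  -- rule 3: chunk may be shorter than n
```

For instance, @takeN_ 0 "abc"@ is @Just ("", "abc")@ and @takeN_ 5 "abc"@ is @Just ("abc", "")@, while @takeN_ 2 ""@ is @Nothing@.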
-- -- For many types of streams, the method allows for significant -- performance improvements, although it is not strictly necessary from a -- conceptual point of view. takeWhile_ :: (Token s -> Bool) -> s -> (Tokens s, s) -- | @since 9.0.0 instance (Ord a) => Stream [a] where type Token [a] = a type Tokens [a] = [a] tokenToChunk Proxy = pure tokensToChunk Proxy = id chunkToTokens Proxy = id chunkLength Proxy = length chunkEmpty Proxy = null take1_ [] = Nothing take1_ (t : ts) = Just (t, ts) takeN_ n s | n <= 0 = Just ([], s) | null s = Nothing | otherwise = Just (splitAt n s) takeWhile_ = span -- | @since 9.0.0 instance (Ord a) => Stream (S.Seq a) where type Token (S.Seq a) = a type Tokens (S.Seq a) = S.Seq a tokenToChunk Proxy = pure tokensToChunk Proxy = S.fromList chunkToTokens Proxy = toList chunkLength Proxy = length chunkEmpty Proxy = null take1_ S.Empty = Nothing take1_ (t S.:<| ts) = Just (t, ts) takeN_ n s | n <= 0 = Just (S.empty, s) | null s = Nothing | otherwise = Just (S.splitAt n s) takeWhile_ = S.spanl -- | This wrapper selects the input-sharing 'Stream' implementation for -- 'T.Text' ('TL.Text') and 'B.ByteString' ('BL.ByteString'). By input -- sharing we mean that our parsers will use slices whenever possible to -- avoid having to copy parts of the input. See also the documentation of -- 'T.split'. -- -- Note that using slices is in general faster than copying; on the other -- hand it also has the potential for causing surprising memory leaks: if -- any slice of the input survives in the output, holding on to the output -- will force the entire input 'T.Text'/'B.ByteString' to stay in memory! -- Even when using lazy 'TL.Text'/'BL.ByteString' we will hold on to whole -- chunks at a time, leading to significantly worse memory residency in -- some cases. -- -- See 'NoShareInput' for a somewhat slower implementation that avoids this -- memory leak scenario.
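The leak scenario described above can also be avoided selectively by hand with 'T.copy'. A sketch (the 'Data.Text' functions are real; the helper names are made up for illustration):

```haskell
import qualified Data.Text as T

-- Taking a prefix of a large Text yields a slice that shares the
-- original buffer, so retaining the slice retains the whole input.
firstField :: T.Text -> T.Text
firstField = T.takeWhile (/= ',')

-- T.copy produces an independent Text with its own buffer, so the
-- input can be garbage collected even if the result is retained.
firstFieldCopied :: T.Text -> T.Text
firstFieldCopied = T.copy . firstField
```

Both functions return equal values; they differ only in whether the result keeps the input alive.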
-- -- @since 9.3.0 newtype ShareInput a = ShareInput {unShareInput :: a} instance Stream (ShareInput B.ByteString) where type Token (ShareInput B.ByteString) = Word8 type Tokens (ShareInput B.ByteString) = B.ByteString tokenToChunk Proxy = B.singleton tokensToChunk Proxy = B.pack chunkToTokens Proxy = B.unpack chunkLength Proxy = B.length chunkEmpty Proxy = B.null take1_ (ShareInput s) = second ShareInput <$> B.uncons s takeN_ n (ShareInput s) | n <= 0 = Just (B.empty, ShareInput s) | B.null s = Nothing | otherwise = Just . second ShareInput $ B.splitAt n s takeWhile_ p (ShareInput s) = second ShareInput $ B.span p s instance Stream (ShareInput BL.ByteString) where type Token (ShareInput BL.ByteString) = Word8 type Tokens (ShareInput BL.ByteString) = BL.ByteString tokenToChunk Proxy = BL.singleton tokensToChunk Proxy = BL.pack chunkToTokens Proxy = BL.unpack chunkLength Proxy = fromIntegral . BL.length chunkEmpty Proxy = BL.null take1_ (ShareInput s) = second ShareInput <$> BL.uncons s takeN_ n (ShareInput s) | n <= 0 = Just (BL.empty, ShareInput s) | BL.null s = Nothing | otherwise = Just . second ShareInput $ BL.splitAt (fromIntegral n) s takeWhile_ p (ShareInput s) = second ShareInput $ BL.span p s instance Stream (ShareInput T.Text) where type Token (ShareInput T.Text) = Char type Tokens (ShareInput T.Text) = T.Text tokenToChunk Proxy = T.singleton tokensToChunk Proxy = T.pack chunkToTokens Proxy = T.unpack chunkLength Proxy = T.length chunkEmpty Proxy = T.null take1_ (ShareInput s) = second ShareInput <$> T.uncons s takeN_ n (ShareInput s) | n <= 0 = Just (T.empty, ShareInput s) | T.null s = Nothing | otherwise = Just . 
second ShareInput $ T.splitAt n s takeWhile_ p (ShareInput s) = second ShareInput $ T.span p s instance Stream (ShareInput TL.Text) where type Token (ShareInput TL.Text) = Char type Tokens (ShareInput TL.Text) = TL.Text tokenToChunk Proxy = TL.singleton tokensToChunk Proxy = TL.pack chunkToTokens Proxy = TL.unpack chunkLength Proxy = fromIntegral . TL.length chunkEmpty Proxy = TL.null take1_ (ShareInput s) = second ShareInput <$> TL.uncons s takeN_ n (ShareInput s) | n <= 0 = Just (TL.empty, ShareInput s) | TL.null s = Nothing | otherwise = Just . second ShareInput $ TL.splitAt (fromIntegral n) s takeWhile_ p (ShareInput s) = second ShareInput $ TL.span p s -- | This wrapper selects the no-input-sharing 'Stream' implementation for -- 'T.Text' ('TL.Text') and 'B.ByteString' ('BL.ByteString'). This means -- that our parsers will create independent copies rather than using slices -- of the input. See also the documentation of 'T.copy'. -- -- More importantly, any parser output will be independent of the input, and -- holding on to parts of the output will never prevent the input from being -- garbage collected. -- -- For maximum performance you might consider using 'ShareInput' instead, -- but beware of its pitfalls! -- -- @since 9.3.0 newtype NoShareInput a = NoShareInput {unNoShareInput :: a} instance Stream (NoShareInput B.ByteString) where type Token (NoShareInput B.ByteString) = Word8 type Tokens (NoShareInput B.ByteString) = B.ByteString tokenToChunk Proxy = B.singleton tokensToChunk Proxy = B.pack chunkToTokens Proxy = B.unpack chunkLength Proxy = B.length chunkEmpty Proxy = B.null take1_ (NoShareInput s) = second NoShareInput <$> B.uncons s takeN_ n (NoShareInput s) | n <= 0 = Just (B.empty, NoShareInput s) | B.null s = Nothing | otherwise = let (result, rest) = B.splitAt n s -- To avoid sharing the entire input we create a clean copy of the result. 
unSharedResult = B.copy result in Just (unSharedResult, NoShareInput rest) takeWhile_ p (NoShareInput s) = let (result, rest) = B.span p s -- Ditto. unSharedResult = B.copy result in (unSharedResult, NoShareInput rest) instance Stream (NoShareInput BL.ByteString) where type Token (NoShareInput BL.ByteString) = Word8 type Tokens (NoShareInput BL.ByteString) = BL.ByteString tokenToChunk Proxy = BL.singleton tokensToChunk Proxy = BL.pack chunkToTokens Proxy = BL.unpack chunkLength Proxy = fromIntegral . BL.length chunkEmpty Proxy = BL.null take1_ (NoShareInput s) = second NoShareInput <$> BL.uncons s takeN_ n (NoShareInput s) | n <= 0 = Just (BL.empty, NoShareInput s) | BL.null s = Nothing | otherwise = let (result, rest) = BL.splitAt (fromIntegral n) s -- To avoid sharing the entire input we create a clean copy of the result. unSharedResult = BL.copy result in Just (unSharedResult, NoShareInput rest) takeWhile_ p (NoShareInput s) = let (result, rest) = BL.span p s -- Ditto. unSharedResult = BL.copy result in (unSharedResult, NoShareInput rest) instance Stream (NoShareInput T.Text) where type Token (NoShareInput T.Text) = Char type Tokens (NoShareInput T.Text) = T.Text tokenToChunk Proxy = T.singleton tokensToChunk Proxy = T.pack chunkToTokens Proxy = T.unpack chunkLength Proxy = T.length chunkEmpty Proxy = T.null take1_ (NoShareInput s) = second NoShareInput <$> T.uncons s takeN_ n (NoShareInput s) | n <= 0 = Just (T.empty, NoShareInput s) | T.null s = Nothing | otherwise = let (result, rest) = T.splitAt n s -- To avoid sharing the entire input we create a clean copy of the result. 
unSharedResult = T.copy result in Just (unSharedResult, NoShareInput rest) takeWhile_ p (NoShareInput s) = let (result, rest) = T.span p s unSharedResult = T.copy result in (unSharedResult, NoShareInput rest) instance Stream (NoShareInput TL.Text) where type Token (NoShareInput TL.Text) = Char type Tokens (NoShareInput TL.Text) = TL.Text tokenToChunk Proxy = TL.singleton tokensToChunk Proxy = TL.pack chunkToTokens Proxy = TL.unpack chunkLength Proxy = fromIntegral . TL.length chunkEmpty Proxy = TL.null take1_ (NoShareInput s) = second NoShareInput <$> TL.uncons s takeN_ n (NoShareInput s) | n <= 0 = Just (TL.empty, NoShareInput s) | TL.null s = Nothing | otherwise = let (result, rest) = TL.splitAt (fromIntegral n) s -- To avoid sharing the entire input we create a clean copy of the result. unSharedResult = tlCopy result in Just (unSharedResult, NoShareInput rest) takeWhile_ p (NoShareInput s) = let (result, rest) = TL.span p s unSharedResult = tlCopy result in (unSharedResult, NoShareInput rest) -- | Create an independent copy of a TL.Text, akin to BL.copy. tlCopy :: TL.Text -> TL.Text tlCopy = TL.fromStrict . T.copy . TL.toStrict {-# INLINE tlCopy #-} -- Since we are using @{-# LANGUAGE Safe #-}@ we can't use deriving via in -- these cases. 
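Since the plain 'T.Text'\/'B.ByteString' instances below delegate to 'ShareInput', a caller opts in or out of sharing simply by wrapping the input stream at the call site. A sketch assuming megaparsec is in scope (the parser and names here are illustrative, not library code):

```haskell
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec

-- A parser over NoShareInput Text: every chunk it returns is an
-- independent copy of the matched input.
type CopyParser = Parsec Void (NoShareInput Text)

word :: CopyParser Text
word = takeWhile1P (Just "non-space") (/= ' ')

-- Wrap the input to select the no-sharing instance; with plain Text
-- the default (sharing) instance would be used instead.
runCopied :: Text -> Either (ParseErrorBundle (NoShareInput Text) Void) Text
runCopied = runParser word "" . NoShareInput
```

The same parser definition works with @ShareInput Text@ by changing the wrapper; only the stream type in the signatures changes.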
instance Stream B.ByteString where type Token B.ByteString = Token (ShareInput B.ByteString) type Tokens B.ByteString = Tokens (ShareInput B.ByteString) tokenToChunk Proxy = tokenToChunk (Proxy :: Proxy (ShareInput B.ByteString)) tokensToChunk Proxy = tokensToChunk (Proxy :: Proxy (ShareInput B.ByteString)) chunkToTokens Proxy = chunkToTokens (Proxy :: Proxy (ShareInput B.ByteString)) chunkLength Proxy = chunkLength (Proxy :: Proxy (ShareInput B.ByteString)) chunkEmpty Proxy = chunkEmpty (Proxy :: Proxy (ShareInput B.ByteString)) take1_ s = second unShareInput <$> take1_ (ShareInput s) takeN_ n s = second unShareInput <$> takeN_ n (ShareInput s) takeWhile_ p s = second unShareInput $ takeWhile_ p (ShareInput s) instance Stream BL.ByteString where type Token BL.ByteString = Token (ShareInput BL.ByteString) type Tokens BL.ByteString = Tokens (ShareInput BL.ByteString) tokenToChunk Proxy = tokenToChunk (Proxy :: Proxy (ShareInput BL.ByteString)) tokensToChunk Proxy = tokensToChunk (Proxy :: Proxy (ShareInput BL.ByteString)) chunkToTokens Proxy = chunkToTokens (Proxy :: Proxy (ShareInput BL.ByteString)) chunkLength Proxy = chunkLength (Proxy :: Proxy (ShareInput BL.ByteString)) chunkEmpty Proxy = chunkEmpty (Proxy :: Proxy (ShareInput BL.ByteString)) take1_ s = second unShareInput <$> take1_ (ShareInput s) takeN_ n s = second unShareInput <$> takeN_ n (ShareInput s) takeWhile_ p s = second unShareInput $ takeWhile_ p (ShareInput s) instance Stream T.Text where type Token T.Text = Token (ShareInput T.Text) type Tokens T.Text = Tokens (ShareInput T.Text) tokenToChunk Proxy = tokenToChunk (Proxy :: Proxy (ShareInput T.Text)) tokensToChunk Proxy = tokensToChunk (Proxy :: Proxy (ShareInput T.Text)) chunkToTokens Proxy = chunkToTokens (Proxy :: Proxy (ShareInput T.Text)) chunkLength Proxy = chunkLength (Proxy :: Proxy (ShareInput T.Text)) chunkEmpty Proxy = chunkEmpty (Proxy :: Proxy (ShareInput T.Text)) take1_ s = second unShareInput <$> take1_ (ShareInput s) takeN_ n s = 
second unShareInput <$> takeN_ n (ShareInput s) takeWhile_ p s = second unShareInput $ takeWhile_ p (ShareInput s) instance Stream TL.Text where type Token TL.Text = Token (ShareInput TL.Text) type Tokens TL.Text = Tokens (ShareInput TL.Text) tokenToChunk Proxy = tokenToChunk (Proxy :: Proxy (ShareInput TL.Text)) tokensToChunk Proxy = tokensToChunk (Proxy :: Proxy (ShareInput TL.Text)) chunkToTokens Proxy = chunkToTokens (Proxy :: Proxy (ShareInput TL.Text)) chunkLength Proxy = chunkLength (Proxy :: Proxy (ShareInput TL.Text)) chunkEmpty Proxy = chunkEmpty (Proxy :: Proxy (ShareInput TL.Text)) take1_ s = second unShareInput <$> take1_ (ShareInput s) takeN_ n s = second unShareInput <$> takeN_ n (ShareInput s) takeWhile_ p s = second unShareInput $ takeWhile_ p (ShareInput s) -- | Type class for inputs that can also be used for debugging. -- -- @since 9.0.0 class (Stream s) => VisualStream s where -- | Pretty-print non-empty stream of tokens. This function is also used -- to print single tokens (represented as singleton lists). -- -- @since 7.0.0 showTokens :: Proxy s -> NonEmpty (Token s) -> String -- | Return the number of characters that a non-empty stream of tokens -- spans. The default implementation is sufficient if every token spans -- exactly 1 character. -- -- @since 8.0.0 tokensLength :: Proxy s -> NonEmpty (Token s) -> Int tokensLength Proxy = NE.length instance VisualStream String where showTokens Proxy = stringPretty instance VisualStream B.ByteString where showTokens Proxy = stringPretty . fmap (chr . fromIntegral) instance VisualStream BL.ByteString where showTokens Proxy = stringPretty . fmap (chr . fromIntegral) instance VisualStream T.Text where showTokens Proxy = stringPretty instance VisualStream TL.Text where showTokens Proxy = stringPretty -- | Type class for inputs that can also be used for error reporting. 
-- -- @since 9.0.0 class (Stream s) => TraversableStream s where {-# MINIMAL reachOffset | reachOffsetNoLine #-} -- | Given an offset @o@ and an initial 'PosState', adjust the state in such -- a way that it starts at the offset. -- -- Return two values (in order): -- -- * 'Maybe' 'String' representing the line on which the given offset -- @o@ is located. It can be omitted (i.e. 'Nothing'); in that case -- error reporting functions will not show offending lines. If -- returned, the line should satisfy a number of conditions that are -- described below. -- * The updated 'PosState' which can be in turn used to locate -- another offset @o'@ given that @o' >= o@. -- -- The 'String' representing the offending line in the input stream should -- satisfy the following: -- -- * It should adequately represent the location of the token at the offset of -- interest, that is, the character at 'sourceColumn' of the returned -- 'SourcePos' should correspond to the token at the offset @o@. -- * It should not include the newline at the end. -- * It should not be empty; if the line happens to be empty, it -- should be replaced with the string @\"\<empty line\>\"@. -- * Tab characters should be replaced by an appropriate number of -- spaces, which is determined by the 'pstateTabWidth' field of -- 'PosState'. -- -- __Note__: type signature of the function was changed in the version -- /9.0.0/. -- -- @since 7.0.0 reachOffset :: -- | Offset to reach Int -> -- | Initial 'PosState' to use PosState s -> -- | See the description of the function (Maybe String, PosState s) reachOffset o pst = (Nothing, reachOffsetNoLine o pst) -- | A version of 'reachOffset' that may be faster because it doesn't need -- to fetch the line at which the given offset is located. -- -- The default implementation is this: -- -- > reachOffsetNoLine o pst = -- > snd (reachOffset o pst) -- -- __Note__: type signature of the function was changed in the version -- /8.0.0/.
-- -- @since 7.0.0 reachOffsetNoLine :: -- | Offset to reach Int -> -- | Initial 'PosState' to use PosState s -> -- | Reached source position and updated state PosState s reachOffsetNoLine o pst = snd (reachOffset o pst) instance TraversableStream String where -- NOTE Do not eta-reduce these (breaks inlining) reachOffset o pst = reachOffset' splitAt foldl' id id ('\n', '\t') o pst reachOffsetNoLine o pst = reachOffsetNoLine' splitAt foldl' ('\n', '\t') o pst instance TraversableStream B.ByteString where -- NOTE Do not eta-reduce these (breaks inlining) reachOffset o pst = reachOffset' B.splitAt B.foldl' B8.unpack (chr . fromIntegral) (10, 9) o pst reachOffsetNoLine o pst = reachOffsetNoLine' B.splitAt B.foldl' (10, 9) o pst instance TraversableStream BL.ByteString where -- NOTE Do not eta-reduce these (breaks inlining) reachOffset o pst = reachOffset' splitAtBL BL.foldl' BL8.unpack (chr . fromIntegral) (10, 9) o pst reachOffsetNoLine o pst = reachOffsetNoLine' splitAtBL BL.foldl' (10, 9) o pst instance TraversableStream T.Text where -- NOTE Do not eta-reduce (breaks inlining of reachOffset'). reachOffset o pst = reachOffset' T.splitAt T.foldl' T.unpack id ('\n', '\t') o pst reachOffsetNoLine o pst = reachOffsetNoLine' T.splitAt T.foldl' ('\n', '\t') o pst instance TraversableStream TL.Text where -- NOTE Do not eta-reduce (breaks inlining of reachOffset'). reachOffset o pst = reachOffset' splitAtTL TL.foldl' TL.unpack id ('\n', '\t') o pst reachOffsetNoLine o pst = reachOffsetNoLine' splitAtTL TL.foldl' ('\n', '\t') o pst ---------------------------------------------------------------------------- -- Helpers -- | An internal helper state type combining a difference 'String' and an -- unboxed 'SourcePos'. data St = St {-# UNPACK #-} !SourcePos ShowS -- | A helper definition to facilitate defining 'reachOffset' for various -- stream types. reachOffset' :: forall s. 
(Stream s) => -- | How to split input stream at given offset (Int -> s -> (Tokens s, s)) -> -- | How to fold over input stream (forall b. (b -> Token s -> b) -> b -> Tokens s -> b) -> -- | How to convert chunk of input stream into a 'String' (Tokens s -> String) -> -- | How to convert a token into a 'Char' (Token s -> Char) -> -- | Newline token and tab token (Token s, Token s) -> -- | Offset to reach Int -> -- | Initial 'PosState' to use PosState s -> -- | Line at which 'SourcePos' is located, updated 'PosState' (Maybe String, PosState s) reachOffset' splitAt' foldl'' fromToks fromTok (newlineTok, tabTok) o PosState {..} = ( Just $ case expandTab pstateTabWidth . addPrefix . f . fromToks . fst $ takeWhile_ (/= newlineTok) post of "" -> "" xs -> xs, PosState { pstateInput = post, pstateOffset = max pstateOffset o, pstateSourcePos = spos, pstateTabWidth = pstateTabWidth, pstateLinePrefix = if sameLine then -- NOTE We don't use difference lists here because it's -- desirable for 'PosState' to be an instance of 'Eq' and -- 'Show'. So we just do appending here. Fortunately several -- parse errors on the same line should be relatively rare. pstateLinePrefix ++ f "" else f "" } ) where addPrefix xs = if sameLine then pstateLinePrefix ++ xs else xs sameLine = sourceLine spos == sourceLine pstateSourcePos (pre, post) = splitAt' (o - pstateOffset) pstateInput St spos f = foldl'' go (St pstateSourcePos id) pre go (St apos g) ch = let SourcePos n l c = apos c' = unPos c w = unPos pstateTabWidth in if | ch == newlineTok -> St (SourcePos n (l <> pos1) pos1) id | ch == tabTok -> St (SourcePos n l (mkPos $ c' + w - ((c' - 1) `rem` w))) (g . (fromTok ch :)) | otherwise -> St (SourcePos n l (c <> pos1)) (g . (fromTok ch :)) {-# INLINE reachOffset' #-} -- | Like 'reachOffset'' but for 'reachOffsetNoLine'. reachOffsetNoLine' :: forall s. (Stream s) => -- | How to split input stream at given offset (Int -> s -> (Tokens s, s)) -> -- | How to fold over input stream (forall b. 
(b -> Token s -> b) -> b -> Tokens s -> b) -> -- | Newline token and tab token (Token s, Token s) -> -- | Offset to reach Int -> -- | Initial 'PosState' to use PosState s -> -- | Updated 'PosState' PosState s reachOffsetNoLine' splitAt' foldl'' (newlineTok, tabTok) o PosState {..} = ( PosState { pstateInput = post, pstateOffset = max pstateOffset o, pstateSourcePos = spos, pstateTabWidth = pstateTabWidth, pstateLinePrefix = pstateLinePrefix } ) where spos = foldl'' go pstateSourcePos pre (pre, post) = splitAt' (o - pstateOffset) pstateInput go (SourcePos n l c) ch = let c' = unPos c w = unPos pstateTabWidth in if | ch == newlineTok -> SourcePos n (l <> pos1) pos1 | ch == tabTok -> SourcePos n l (mkPos $ c' + w - ((c' - 1) `rem` w)) | otherwise -> SourcePos n l (c <> pos1) {-# INLINE reachOffsetNoLine' #-} -- | Like 'BL.splitAt' but accepts the index as an 'Int'. splitAtBL :: Int -> BL.ByteString -> (BL.ByteString, BL.ByteString) splitAtBL n = BL.splitAt (fromIntegral n) {-# INLINE splitAtBL #-} -- | Like 'TL.splitAt' but accepts the index as an 'Int'. splitAtTL :: Int -> TL.Text -> (TL.Text, TL.Text) splitAtTL n = TL.splitAt (fromIntegral n) {-# INLINE splitAtTL #-} -- | @stringPretty s@ returns pretty representation of string @s@. This is -- used when printing string tokens in error messages. stringPretty :: NonEmpty Char -> String stringPretty (x :| []) = charPretty x stringPretty ('\r' :| "\n") = "crlf newline" stringPretty xs = "\"" <> concatMap f (NE.toList xs) <> "\"" where f ch = case charPretty' ch of Nothing -> [ch] Just pretty -> "<" <> pretty <> ">" -- | @charPretty ch@ returns user-friendly string representation of given -- character @ch@, suitable for using in error messages. charPretty :: Char -> String charPretty ' ' = "space" charPretty ch = fromMaybe ("'" <> [ch] <> "'") (charPretty' ch) -- | If the given character has a pretty representation, return that, -- otherwise 'Nothing'. This is an internal helper. 
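Both folds above advance the column over a tab with the expression @c' + w - ((c' - 1) \`rem\` w)@. A standalone copy shows that it always lands one past the next multiple-of-@w@ boundary (columns are 1-based):

```haskell
-- Standalone copy of the tab-advance arithmetic used in reachOffset'
-- and reachOffsetNoLine': w is the tab width, c the current 1-based
-- column; the result is the column after the tab.
nextTabStop :: Int -> Int -> Int
nextTabStop w c = c + w - ((c - 1) `rem` w)
```

With the default tab width of 8, columns 1 through 8 all advance to 9, column 9 advances to 17, and so on.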
charPretty' :: Char -> Maybe String charPretty' = \case '\NUL' -> Just "null" '\SOH' -> Just "start of heading" '\STX' -> Just "start of text" '\ETX' -> Just "end of text" '\EOT' -> Just "end of transmission" '\ENQ' -> Just "enquiry" '\ACK' -> Just "acknowledge" '\BEL' -> Just "bell" '\BS' -> Just "backspace" '\t' -> Just "tab" '\n' -> Just "newline" '\v' -> Just "vertical tab" '\f' -> Just "form feed" '\r' -> Just "carriage return" '\SO' -> Just "shift out" '\SI' -> Just "shift in" '\DLE' -> Just "data link escape" '\DC1' -> Just "device control one" '\DC2' -> Just "device control two" '\DC3' -> Just "device control three" '\DC4' -> Just "device control four" '\NAK' -> Just "negative acknowledge" '\SYN' -> Just "synchronous idle" '\ETB' -> Just "end of transmission block" '\CAN' -> Just "cancel" '\EM' -> Just "end of medium" '\SUB' -> Just "substitute" '\ESC' -> Just "escape" '\FS' -> Just "file separator" '\GS' -> Just "group separator" '\RS' -> Just "record separator" '\US' -> Just "unit separator" '\DEL' -> Just "delete" '\160' -> Just "non-breaking space" _ -> Nothing -- | Replace tab characters with given number of spaces. 
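How 'charPretty' combines the table with the quoted fallback can be seen with a trimmed standalone copy (only a few of the control characters from the full table above are included):

```haskell
import Data.Maybe (fromMaybe)

-- Trimmed standalone copy of charPretty'/charPretty from above,
-- covering just a few control characters from the full table.
charPretty' :: Char -> Maybe String
charPretty' '\t' = Just "tab"
charPretty' '\n' = Just "newline"
charPretty' '\r' = Just "carriage return"
charPretty' _    = Nothing

-- Space gets a name; characters without a pretty name are quoted.
charPretty :: Char -> String
charPretty ' ' = "space"
charPretty ch  = fromMaybe ("'" <> [ch] <> "'") (charPretty' ch)
```

So @charPretty '\n'@ is @"newline"@ while @charPretty 'a'@ falls back to @"'a'"@.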
expandTab :: Pos -> String -> String expandTab w' = go 0 0 where go _ 0 [] = [] go !i 0 ('\t' : xs) = go i (w - (i `rem` w)) xs go !i 0 (x : xs) = x : go (i + 1) 0 xs go !i n xs = ' ' : go (i + 1) (n - 1) xs w = unPos w' megaparsec-9.3.1/bench/memory/0000755000000000000000000000000007346545000014413 5ustar0000000000000000megaparsec-9.3.1/bench/memory/Main.hs0000644000000000000000000001605307346545000015640 0ustar0000000000000000{-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE TypeFamilies #-} module Main (main) where import Control.DeepSeq import Control.Monad import Data.ByteString (ByteString) import qualified Data.ByteString as B import Data.List.NonEmpty (NonEmpty (..)) import qualified Data.List.NonEmpty as NE import qualified Data.Set as E import Data.Text (Text) import qualified Data.Text as T import Data.Void import Text.Megaparsec import qualified Text.Megaparsec.Byte.Binary as Binary import Text.Megaparsec.Char import qualified Text.Megaparsec.Char.Lexer as L import Weigh -- | The type of parser that consumes 'Text'. type Parser = Parsec Void Text -- | The type of parser that consumes 'ByteString'. type ParserBs = Parsec Void ByteString main :: IO () main = mainWith $ do setColumns [Case, Allocated, GCs, Max] bparser "string" manyAs (string . fst) bparser "string'" manyAs (string' . fst) bparser "many" manyAs (const $ many (char 'a')) bparser "some" manyAs (const $ some (char 'a')) bparser "choice" (const "b") (choice . fmap char . manyAsB' . 
snd) bparser "count" manyAs (\(_, n) -> count n (char 'a')) bparser "count'" manyAs (\(_, n) -> count' 1 n (char 'a')) bparser "endBy" manyAbs' (const $ endBy (char 'a') (char 'b')) bparser "endBy1" manyAbs' (const $ endBy1 (char 'a') (char 'b')) bparser "manyTill" manyAsB (const $ manyTill (char 'a') (char 'b')) bparser "someTill" manyAsB (const $ someTill (char 'a') (char 'b')) bparser "sepBy" manyAbs (const $ sepBy (char 'a') (char 'b')) bparser "sepBy1" manyAbs (const $ sepBy1 (char 'a') (char 'b')) bparser "sepEndBy" manyAbs' (const $ sepEndBy (char 'a') (char 'b')) bparser "sepEndBy1" manyAbs' (const $ sepEndBy1 (char 'a') (char 'b')) bparser "skipMany" manyAs (const $ skipMany (char 'a')) bparser "skipSome" manyAs (const $ skipSome (char 'a')) bparser "skipCount" manyAs (\(_, n) -> skipCount n (char 'a')) bparser "skipManyTill" manyAsB (const $ skipManyTill (char 'a') (char 'b')) bparser "skipSomeTill" manyAsB (const $ skipSomeTill (char 'a') (char 'b')) bparser "takeWhileP" manyAs (const $ takeWhileP Nothing (== 'a')) bparser "takeWhile1P" manyAs (const $ takeWhile1P Nothing (== 'a')) bparser "decimal" mkInt (const (L.decimal :: Parser Integer)) bparser "octal" mkInt (const (L.octal :: Parser Integer)) bparser "hexadecimal" mkInt (const (L.hexadecimal :: Parser Integer)) bparser "scientific" mkInt (const L.scientific) bparserBs "word32be" many0x33 (const $ many Binary.word32be) bparserBs "word32le" many0x33 (const $ many Binary.word32le) forM_ stdSeries $ \n -> bbundle "single error" n [n] bbundle "2 errors" 1000 [1, 1000] bbundle "4 errors" 1000 [1, 500, 1000] bbundle "100 errors" 1000 [10, 20 .. 1000] breachOffset 0 1000 breachOffset 0 2000 breachOffset 0 4000 breachOffset 1000 1000 breachOffsetNoLine 0 1000 breachOffsetNoLine 0 2000 breachOffsetNoLine 0 4000 breachOffsetNoLine 1000 1000 -- | Perform a series of measurements with the same parser. 
bparser ::
  (NFData a) =>
  -- | Name of the benchmark group
  String ->
  -- | How to construct input
  (Int -> Text) ->
  -- | The parser receiving its future input
  ((Text, Int) -> Parser a) ->
  Weigh ()
bparser name f p = forM_ stdSeries $ \i -> do
  let arg = (f i, i)
      p' (s, n) = parse (p (s, n)) "" s
  func (name ++ "-" ++ show i) p' arg

-- | Perform a series of measurements with the same parser.
bparserBs ::
  (NFData a) =>
  -- | Name of the benchmark group
  String ->
  -- | How to construct input
  (Int -> ByteString) ->
  -- | The parser receiving its future input
  ((ByteString, Int) -> ParserBs a) ->
  Weigh ()
bparserBs name f p = forM_ stdSeries $ \i -> do
  let arg = (f i, i)
      p' (s, n) = parse (p (s, n)) "" s
  func (name ++ "-" ++ show i) p' arg

-- | Benchmark the 'errorBundlePretty' function.
bbundle ::
  -- | Name of the benchmark
  String ->
  -- | Number of lines in input stream
  Int ->
  -- | Lines with parse errors
  [Int] ->
  Weigh ()
bbundle name totalLines sps = do
  let s = take (totalLines * 80) (cycle as)
      as = replicate 79 'a' ++ "\n"
      f l =
        TrivialError
          (20 + l * 80)
          (Just $ Tokens ('a' :| ""))
          (E.singleton $ Tokens ('b' :| ""))
      bundle :: ParseErrorBundle String Void
      bundle =
        ParseErrorBundle
          { bundleErrors = f <$> NE.fromList sps,
            bundlePosState =
              PosState
                { pstateInput = s,
                  pstateOffset = 0,
                  pstateSourcePos = initialPos "",
                  pstateTabWidth = defaultTabWidth,
                  pstateLinePrefix = ""
                }
          }
  func
    ("errorBundlePretty-" ++ show totalLines ++ "-" ++ name)
    errorBundlePretty
    bundle

-- | Benchmark the 'reachOffset' function.
breachOffset ::
  -- | Starting offset in 'PosState'
  Int ->
  -- | Offset to reach
  Int ->
  Weigh ()
breachOffset o0 o1 =
  func
    ("reachOffset-" ++ show o0 ++ "-" ++ show o1)
    f
    (o0 * 80, o1 * 80)
  where
    f :: (Int, Int) -> PosState Text
    f (startOffset, targetOffset) =
      snd $
        reachOffset
          targetOffset
          PosState
            { pstateInput = manyAs (targetOffset - startOffset),
              pstateOffset = startOffset,
              pstateSourcePos = initialPos "",
              pstateTabWidth = defaultTabWidth,
              pstateLinePrefix = ""
            }

-- | Benchmark the 'reachOffsetNoLine' function.
breachOffsetNoLine ::
  -- | Starting offset in 'PosState'
  Int ->
  -- | Offset to reach
  Int ->
  Weigh ()
breachOffsetNoLine o0 o1 =
  func
    ("reachOffsetNoLine-" ++ show o0 ++ "-" ++ show o1)
    f
    (o0 * 80, o1 * 80)
  where
    f :: (Int, Int) -> PosState Text
    f (startOffset, targetOffset) =
      reachOffsetNoLine
        targetOffset
        PosState
          { pstateInput = manyAs (targetOffset - startOffset),
            pstateOffset = startOffset,
            pstateSourcePos = initialPos "",
            pstateTabWidth = defaultTabWidth,
            pstateLinePrefix = ""
          }

-- | The series of sizes to try as part of 'bparser'.
stdSeries :: [Int]
stdSeries = [500, 1000, 2000, 4000]

----------------------------------------------------------------------------
-- Helpers

-- | Generate that many \'a\' characters.
manyAs :: Int -> Text
manyAs n = T.replicate n "a"

-- | Like 'manyAs' but the result is a 'ByteString'.
many0x33 :: Int -> ByteString
many0x33 n = B.replicate n 0x33

-- | Like 'manyAs', but interspersed with \'b\'s.
manyAbs :: Int -> Text
manyAbs n = T.take (if even n then n + 1 else n) (T.replicate n "ab")

-- | Like 'manyAs', but with a \'b\' added to the end.
manyAsB :: Int -> Text
manyAsB n = manyAs n <> "b"

-- | Like 'manyAsB', but returns a 'String'.
manyAsB' :: Int -> String
manyAsB' n = replicate n 'a' ++ "b"

-- | Like 'manyAbs', but ends in a \'b\'.
manyAbs' :: Int -> Text
manyAbs' n = T.take (if even n then n else n + 1) (T.replicate n "ab")

-- | Render an 'Integer' with the number of digits linearly dependent on the
-- argument.
mkInt :: Int -> Text
mkInt n = (T.pack . show) ((10 :: Integer) ^ (n `quot` 100))

megaparsec-9.3.1/bench/speed/Main.hs

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeFamilies #-}

module Main (main) where

import Control.DeepSeq
import Criterion.Main
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE
import qualified Data.Set as E
import Data.Text (Text)
import qualified Data.Text as T
import Data.Void
import Text.Megaparsec
import qualified Text.Megaparsec.Byte.Binary as Binary
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

-- | The type of parser that consumes 'Text'.
type Parser = Parsec Void Text

-- | The type of parser that consumes 'ByteString'.
type ParserBs = Parsec Void ByteString

main :: IO ()
main =
  defaultMain
    [ bparser "string" manyAs (string . fst),
      bparser "string'" manyAs (string' . fst),
      bparser "many" manyAs (const $ many (char 'a')),
      bparser "some" manyAs (const $ some (char 'a')),
      bparser "choice" (const "b") (choice . fmap char . manyAsB' .
          snd),
      bparser "count" manyAs (\(_, n) -> count n (char 'a')),
      bparser "count'" manyAs (\(_, n) -> count' 1 n (char 'a')),
      bparser "endBy" manyAbs' (const $ endBy (char 'a') (char 'b')),
      bparser "endBy1" manyAbs' (const $ endBy1 (char 'a') (char 'b')),
      bparser "manyTill" manyAsB (const $ manyTill (char 'a') (char 'b')),
      bparser "someTill" manyAsB (const $ someTill (char 'a') (char 'b')),
      bparser "sepBy" manyAbs (const $ sepBy (char 'a') (char 'b')),
      bparser "sepBy1" manyAbs (const $ sepBy1 (char 'a') (char 'b')),
      bparser "sepEndBy" manyAbs' (const $ sepEndBy (char 'a') (char 'b')),
      bparser "sepEndBy1" manyAbs' (const $ sepEndBy1 (char 'a') (char 'b')),
      bparser "skipMany" manyAs (const $ skipMany (char 'a')),
      bparser "skipSome" manyAs (const $ skipSome (char 'a')),
      bparser "skipCount" manyAs (\(_, n) -> skipCount n (char 'a')),
      bparser "skipManyTill" manyAsB (const $ skipManyTill (char 'a') (char 'b')),
      bparser "skipSomeTill" manyAsB (const $ skipSomeTill (char 'a') (char 'b')),
      bparser "takeWhileP" manyAs (const $ takeWhileP Nothing (== 'a')),
      bparser "takeWhile1P" manyAs (const $ takeWhile1P Nothing (== 'a')),
      bparser "decimal" mkInt (const (L.decimal :: Parser Integer)),
      bparser "octal" mkInt (const (L.octal :: Parser Integer)),
      bparser "hexadecimal" mkInt (const (L.hexadecimal :: Parser Integer)),
      bparser "scientific" mkInt (const L.scientific),
      bparserBs "word32be" many0x33 (const $ many Binary.word32be),
      bparserBs "word32le" many0x33 (const $ many Binary.word32le),
      bgroup "" [bbundle "single error" n [n] | n <- stdSeries],
      bbundle "2 errors" 1000 [1, 1000],
      bbundle "4 errors" 1000 [1, 500, 1000],
      bbundle "100 errors" 1000 [10, 20 .. 1000],
      breachOffset 0 1000,
      breachOffset 0 2000,
      breachOffset 0 4000,
      breachOffset 1000 1000,
      breachOffsetNoLine 0 1000,
      breachOffsetNoLine 0 2000,
      breachOffsetNoLine 0 4000,
      breachOffsetNoLine 1000 1000
    ]

-- | Perform a series of measurements with the same parser.
bparser ::
  (NFData a) =>
  -- | Name of the benchmark group
  String ->
  -- | How to construct input
  (Int -> Text) ->
  -- | The parser receiving its future input
  ((Text, Int) -> Parser a) ->
  -- | The benchmark
  Benchmark
bparser name f p = bgroup name (bs <$> stdSeries)
  where
    bs n = env (return (f n, n)) (bench (show n) . nf p')
    p' (s, n) = parse (p (s, n)) "" s

-- | Perform a series of measurements with the same parser.
bparserBs ::
  (NFData a) =>
  -- | Name of the benchmark group
  String ->
  -- | How to construct input
  (Int -> ByteString) ->
  -- | The parser receiving its future input
  ((ByteString, Int) -> ParserBs a) ->
  -- | The benchmark
  Benchmark
bparserBs name f p = bgroup name (bs <$> stdSeries)
  where
    bs n = env (return (f n, n)) (bench (show n) . nf p')
    p' (s, n) = parse (p (s, n)) "" s

-- | Benchmark the 'errorBundlePretty' function.
bbundle ::
  -- | Name of the benchmark
  String ->
  -- | Number of lines in input stream
  Int ->
  -- | Lines with parse errors
  [Int] ->
  Benchmark
bbundle name totalLines sps =
  let s = take (totalLines * 80) (cycle as)
      as = replicate 79 'a' ++ "\n"
      f l =
        TrivialError
          (20 + l * 80)
          (Just $ Tokens ('a' :| ""))
          (E.singleton $ Tokens ('b' :| ""))
      bundle :: ParseErrorBundle String Void
      bundle =
        ParseErrorBundle
          { bundleErrors = f <$> NE.fromList sps,
            bundlePosState =
              PosState
                { pstateInput = s,
                  pstateOffset = 0,
                  pstateSourcePos = initialPos "",
                  pstateTabWidth = defaultTabWidth,
                  pstateLinePrefix = ""
                }
          }
   in bench
        ("errorBundlePretty-" ++ show totalLines ++ "-" ++ name)
        (nf errorBundlePretty bundle)

-- | Benchmark the 'reachOffset' function.
breachOffset ::
  -- | Starting offset in 'PosState'
  Int ->
  -- | Offset to reach
  Int ->
  Benchmark
breachOffset o0 o1 =
  bench
    ("reachOffset-" ++ show o0 ++ "-" ++ show o1)
    (nf f (o0 * 80, o1 * 80))
  where
    f :: (Int, Int) -> PosState Text
    f (startOffset, targetOffset) =
      snd $
        reachOffset
          targetOffset
          PosState
            { pstateInput = manyAs (targetOffset - startOffset),
              pstateOffset = startOffset,
              pstateSourcePos = initialPos "",
              pstateTabWidth = defaultTabWidth,
              pstateLinePrefix = ""
            }

-- | Benchmark the 'reachOffsetNoLine' function.
breachOffsetNoLine ::
  -- | Starting offset in 'PosState'
  Int ->
  -- | Offset to reach
  Int ->
  Benchmark
breachOffsetNoLine o0 o1 =
  bench
    ("reachOffsetNoLine-" ++ show o0 ++ "-" ++ show o1)
    (nf f (o0 * 80, o1 * 80))
  where
    f :: (Int, Int) -> PosState Text
    f (startOffset, targetOffset) =
      reachOffsetNoLine
        targetOffset
        PosState
          { pstateInput = manyAs (targetOffset - startOffset),
            pstateOffset = startOffset,
            pstateSourcePos = initialPos "",
            pstateTabWidth = defaultTabWidth,
            pstateLinePrefix = ""
          }

-- | The series of sizes to try as part of 'bparser'.
stdSeries :: [Int]
stdSeries = [500, 1000, 2000, 4000]

----------------------------------------------------------------------------
-- Helpers

-- | Generate that many \'a\' characters.
manyAs :: Int -> Text
manyAs n = T.replicate n "a"

-- | Like 'manyAs' but the result is a 'ByteString'.
many0x33 :: Int -> ByteString
many0x33 n = B.replicate n 0x33

-- | Like 'manyAs', but interspersed with \'b\'s.
manyAbs :: Int -> Text
manyAbs n = T.take (if even n then n + 1 else n) (T.replicate n "ab")

-- | Like 'manyAs', but with a \'b\' added to the end.
manyAsB :: Int -> Text
manyAsB n = manyAs n <> "b"

-- | Like 'manyAsB', but returns a 'String'.
manyAsB' :: Int -> String
manyAsB' n = replicate n 'a' ++ "b"

-- | Like 'manyAbs', but ends in a \'b\'.
manyAbs' :: Int -> Text
manyAbs' n = T.take (if even n then n else n + 1) (T.replicate n "ab")

-- | Render an 'Integer' with the number of digits linearly dependent on the
-- argument.
mkInt :: Int -> Text
mkInt n = (T.pack . show) ((10 :: Integer) ^ (n `quot` 100))

megaparsec-9.3.1/megaparsec.cabal

cabal-version: 2.4
name: megaparsec
version: 9.3.1
license: BSD-2-Clause
license-file: LICENSE.md
maintainer: Mark Karpov
author:
  Megaparsec contributors, Paolo Martini, Daan Leijen
tested-with: ghc ==9.0.2, ghc ==9.2.5, ghc ==9.4.4
homepage: https://github.com/mrkkrp/megaparsec
bug-reports: https://github.com/mrkkrp/megaparsec/issues
synopsis: Monadic parser combinators
description:
  This is an industrial-strength monadic parser combinator library.
  Megaparsec is a feature-rich package that tries to find a nice balance
  between speed, flexibility, and quality of parse errors.
category: Parsing
build-type: Simple
extra-doc-files:
  CHANGELOG.md
  README.md

source-repository head
  type: git
  location: https://github.com/mrkkrp/megaparsec.git

flag dev
  description: Turn on development settings.
  default: False
  manual: True

library
  exposed-modules:
    Text.Megaparsec
    Text.Megaparsec.Byte
    Text.Megaparsec.Byte.Binary
    Text.Megaparsec.Byte.Lexer
    Text.Megaparsec.Char
    Text.Megaparsec.Char.Lexer
    Text.Megaparsec.Debug
    Text.Megaparsec.Error
    Text.Megaparsec.Error.Builder
    Text.Megaparsec.Internal
    Text.Megaparsec.Pos
    Text.Megaparsec.Stream

  other-modules:
    Text.Megaparsec.Class
    Text.Megaparsec.Common
    Text.Megaparsec.Lexer
    Text.Megaparsec.State

  default-language: Haskell2010
  build-depends:
    base >=4.15 && <5.0,
    bytestring >=0.2 && <0.12,
    case-insensitive >=1.2 && <1.3,
    containers >=0.5 && <0.7,
    deepseq >=1.3 && <1.5,
    mtl >=2.2.2 && <3.0,
    parser-combinators >=1.0 && <2.0,
    scientific >=0.3.7 && <0.4,
    text >=0.2 && <2.1,
    transformers >=0.4 && <0.7

  if flag(dev)
    ghc-options: -O0 -Wall -Werror
  else
    ghc-options: -O2 -Wall

  if flag(dev)
    ghc-options:
      -Wcompat -Wincomplete-record-updates -Wincomplete-uni-patterns
      -Wnoncanonical-monad-instances -Wno-missing-home-modules

benchmark bench-speed
  type: exitcode-stdio-1.0
  main-is: Main.hs
  hs-source-dirs: bench/speed
  default-language: Haskell2010
  build-depends:
    base >=4.15 && <5.0,
    bytestring >=0.2 && <0.12,
    containers >=0.5 && <0.7,
    criterion >=0.6.2.1 && <1.7,
    deepseq >=1.3 && <1.5,
    megaparsec,
    text >=0.2 && <2.1

  if flag(dev)
    ghc-options: -O2 -Wall -Werror
  else
    ghc-options: -O2 -Wall

benchmark bench-memory
  type: exitcode-stdio-1.0
  main-is: Main.hs
  hs-source-dirs: bench/memory
  default-language: Haskell2010
  build-depends:
    base >=4.15 && <5.0,
    bytestring >=0.2 && <0.12,
    containers >=0.5 && <0.7,
    deepseq >=1.3 && <1.5,
    megaparsec,
    text >=0.2 && <2.1,
    weigh >=0.0.4

  if flag(dev)
    ghc-options: -O2 -Wall -Werror
  else
    ghc-options: -O2 -Wall
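
The `expandTab` helper at the top of this section (part of the error-rendering code that the 9.3.1 release fixed for tab handling) can be exercised on its own. Below is a minimal, self-contained sketch of the same logic, with `Pos`/`unPos` replaced by a plain `Int` tab width so it compiles without depending on megaparsec; that substitution is an assumption for illustration only.

```haskell
{-# LANGUAGE BangPatterns #-}

module Main (main) where

-- Same recursion as megaparsec's expandTab, but the tab width is a plain
-- Int here instead of Pos (illustration-only simplification).
expandTab :: Int -> String -> String
expandTab w = go 0 0
  where
    -- i is the current visual column, n counts spaces still owed for a tab
    go _ 0 [] = []
    go !i 0 ('\t' : xs) = go i (w - (i `rem` w)) xs -- pad to the next tab stop
    go !i 0 (x : xs) = x : go (i + 1) 0 xs
    go !i n xs = ' ' : go (i + 1) (n - 1) xs

main :: IO ()
main = do
  putStrLn (expandTab 8 "\tfoo") -- a leading tab becomes a full 8-column pad
  putStrLn (expandTab 4 "ab\tc") -- the tab pads from column 2 up to column 4
```

Note that the padding depends on the current column, not on a fixed number of spaces per tab, which is exactly why the benchmarks above measure `reachOffset` on inputs of varying width.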