-
Notifications
You must be signed in to change notification settings - Fork 274
Feat: Verified and easy-to-use parser combinators #6143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
A.AboutDrop(input, A.Length(input)-A.Length(remaining1), A.Length(remaining1)-A.Length(remaining2)); | ||
} | ||
|
||
// Cannot express this predicate if Input is allocated. Add once we accept quantification over unallocated objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be interested in the "proof trait" pattern I used in the Actions library: https://github.com/dafny-lang/dafny/pull/6074/files#diff-cb5145d4b584ea7656360f05c946523c48c5651815cea269b469a855844c3ed2R105
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some initial thoughts based on the offline demo, will return for a more thorough review.
FailureData("expected a "+name, input, Option.None)) | ||
} | ||
|
||
opaque function IntToString(n: int): string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the existing utilities like Std.Strings.OfInt()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember why I did not do that. In the project that was using these parsers, I was only import Std.Wrappers and these parsers.
By using Std.Strings.OfInt(), I now need to include all these files
"Std/Strings.dfy",
"Std/Arithmetic/Power.dfy",
"Std/Arithmetic/Mul.dfy",
"Std/Arithmetic/DivMod.dfy",
"Std/Arithmetic/Logarithm.dfy",
"Std/Arithmetic/Internal/DivInternals.dfy",
"Std/Arithmetic/Internal/GeneralInternals.dfy",
"Std/Arithmetic/Internal/ModInternals.dfy",
"Std/Arithmetic/Internal/MulInternalsNonlinear.dfy",
"Std/Arithmetic/Internal/ModInternalsNonlinear.dfy",
"Std/Arithmetic/Internal/DivInternalsNonlinear.dfy",
"Std/Arithmetic/Internal/MulInternals.dfy",
"Std/Arithmetic/LittleEndianNat.dfy",
much of which is useless to my application. Looking forward being able to just enable standard libraries soon :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sure, but as you move this code in here it will become part of the project/pre-built doo file, so there's no need for includes anywhere!
a.e_I(b.e_I(c)) | ||
|
||
will parse a, b, c but only return the result of `c`. */ | ||
function e_I<U>(other: B<U>): (p: B<U>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just _I
and I_
, since the "e" can be inferred by the lack of "I"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_I
is not a valid user-defined Dafny identifier because it starts with underscore. I also wanted that initially. That's also why I made the e
lowercase. Also, the e
lowercase makes me think of the interdiction sign on the road, a bit bended :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, damn. I suppose you could also do '_I
and I_'
? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could but then that means the resulting generated code will look like _q__I
and I___q
in most backends, and even __q__I
and _I___q
in Rust, which does not look as symmetrical. Plus, quotes make it hard to select the entire identifier in any regular IDE. So no, I don't think I want to add quotes in identifiers in a builder pattern.
expect r.ParseFailure?; | ||
var failureMessage := P.FailureToString(input, r); | ||
expect failureMessage | ||
== "Error:" + "\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use @"..."
verbatim strings for multi-line constants in a few places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with @"..."
is that indentation is treated as part of the constant, the first line requires two more characters than the remaining lines (because of the @"
initial) and, because of the way Windows checks out lines, the expect would automatically fail if it expects "\r\n"
instead of "\n"
.
The format I used above makes it possible to preserve the indentation while still having a visual about what the output should look like, and having control over all the characters.
Alternatively, I could use @"
and still have the right indentation, by writing something like:
StripIdent(@"
FirstLine()
SecondLine()");
where strip ident would
- Be a method in the standard library, somewhere?
- Remove the first newline
- Detect the indentation of the first line
- Erase the indentation on all subsequent lines
- Erase Windows' carriage return
\r
.
I would prefer the current approach because it does not require any special handling of the string.
I'm personally using @""
mostly for regular expressions, to avoid escaping backslashes, not to avoid newlines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair that this is well out of scope for this PR at least. But those stripping utilities definitely sound like great things to add to Std.Strings
in the future, so all Dafny users can get more value out of verbatim strings.
@@ -5,7 +5,7 @@ standard-libraries = true | |||
# but consumers don't have to have this option set | |||
# to use the standard libraries in general. | |||
reads-clauses-on-methods = true | |||
enforce-determinism = true | |||
#enforce-determinism = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should change this as it means we won't catch cases of libraries that accidentally can't be used in this mode. Is the Parsers library fundamentally non-deterministic or is this something fixable?
The good news is my PR creates a separate examples project with this off since I hit this as well: https://github.com/dafny-lang/dafny/pull/6074/files#diff-8ae9e889ff28306ca6c425716f35f7ef20201f4c15d7128363b49411e724bb3b :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was accidentally committed. Thanks for pointing it out.
@@ -0,0 +1,756 @@ | |||
abstract module Std.Parsers.AbstractInput { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure you add the license header to each source file.
It's expecting a digit where it should have just accepted the end of the string. To figure out what went wrong, | ||
let's first define two debug functions like below - they can't be defined in the library because they have print effects. You could change the type of the output of string if you were to customize them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a decent solution given the current constraints, but I definitely wish it was easier to flip on for users somehow, and it's not great that it just plain doesn't work if users have --track-print-effects
on.
Would it be possible to have an alternate evaluation mode that also carries a separate debugging information type in parallel, so you could do something like:
var r, log := ApplyWithLog(p, input);
print log;
(I'll know better myself how feasible that is once I review this more deeply)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the way we could do it would be to define a new module ParserBuildersLog
that would keep track of everything like this
datatype B<P> = B(p: Parser<(P, string)>) = {
... All operators but adding a log to it
}
... All operators but taking into account the log
Then, to import these, one could simply replace import Std.Parsers.StringParsersBuilders
by adding Log
at the end of the last one.
However, it would have the extra hassle that it has to be maintained separately.
To avoid the separation, one would have to create another abstract module
abstract module StringBuildersAbstract refines ParserBuilders {
type Returning<P>
function Extract(r: Returning<P>): P
function Combine(a: Returning<P>, r: Returning<Q>): Returning<P>
}
module StringBuilders StringBuildersAbstract {
type Returning<P> = P
}
module StringBuildersLog StringBuildersAbstract {
type Returning<P> = (P, string)
}
Let me know if it's important for you that we do it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, that solution would have a lot of overhead. I'll keep this in mind when I review the implementation to see if I can think of a simpler solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed README and examples as requested, will start reviewing the rest next week.
@@ -0,0 +1,336 @@ | |||
# Verified Parser Combinators | |||
|
|||
Parser combinators in Dafny, inspired from the model (Meijer&Hutton 1996). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hyperlink to paper perhaps?
|
||
This library offers two styles of functional parser combinators. | ||
|
||
- The first parsers style is a synonym for `seq<character> -> ParseResult<Result>` that supports monadic styles, is straightforward to use, but results in lots of closing parentheses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should that be literally char
, or is character
more like a type parameter here?
|
||
/** Tic tac toe using string parsers */ | ||
method Main() { | ||
var x := OrSeq([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider var cell
instead of var x
// A parser is valid iff for any input, it never returns a fatal error | ||
// and always returns a suffix of its input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many parsers are NOT valid? (Begs the question again of what a fatal error is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reformulated to:
// A parser is NeverFatal iff for any input, it never returns a fatal error
// Since fatal errors occur only when a parser returns more characters than it was given,
// NeverFatal means the remaining unparsed must be smaller or equal to the input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like that you're showing this example with both styles of constructing parsers, but since the rest is identical, I'd say combine them into one file and just run the tests on both copies.
expect r.ParseFailure?; | ||
var failureMessage := P.FailureToString(input, r); | ||
expect failureMessage | ||
== "Error:" + "\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair that this is well out of scope for this PR at least. But those stripping utilities definitely sound like great things to add to Std.Strings
in the future, so all Dafny users can get more value out of verbatim strings.
FailureData("expected a "+name, input, Option.None)) | ||
} | ||
|
||
opaque function IntToString(n: int): string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sure, but as you move this code in here it will become part of the project/pre-built doo file, so there's no need for includes anywhere!
a.e_I(b.e_I(c)) | ||
|
||
will parse a, b, c but only return the result of `c`. */ | ||
function e_I<U>(other: B<U>): (p: B<U>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, damn. I suppose you could also do '_I
and I_'
? :)
Co-authored-by: Robin Salkeld <[email protected]>
Fixes #5418
This PR adds to the standard library support for functional parsers and parsers combinators.
A parser is simply a
Input -> ParseResult<T>
whereParseResult
is either a success or a failure that contains the remaining unparsed input. A parsing failure is "committed" if the remaining unparsed input is strictly smaller than the input. Disjunctions can happen only if a failure is not committed. Combinators can help modify the consumption of the input, which makes it possible to create a rich variety of parsers.The Parsers module is abstract as parsers operate on an
Input
that can be converted at any time to aseq<C>
. InStringParsers
,C
is a char andInput
is a datatype simulating lazy slicing on the input string. That lazy slicing resulted in a +16% speedup on a real benchmark compared to just havingseq<char>
forInput
.You can browse the example to understand how parsers combinators can be used.
4*(1+3)*x^3+x
The easiest way to use parsers is to use the parsers builders. Parser builds are a datatype wrapper around parsers to make it possible to have suffix and infix combinators. This allows chaining and avoid nested parentheses. See the following examples:
4*(1+3)*x^3+x
but the syntax seems better to me.abc((de|f((g))*))ml"
, create another parser out of that, and apply this parser to an input string like"abcdeml"
.Notable features include:
IntToString
andStringToInt
are reverse of each other in both ways.How has this been tested?
By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.