Add source names (via new `Stream` and `SourceSpan` classes) and `.span()` combinator #83
Conversation
Force-pushed from e64a7f5 to 6579cae
This primarily wraps the `str`/`bytes`/`list` that is the data to parse, but also adds a `source` metadata field to hold a filename, URL, etc. identifying where the data is from. Introducing this class also paves the way for eventually supporting streaming input data.
Force-pushed from cf9c189 to 58b00dd
Wrap the string, bytes, list into a Stream before calling parse.
Force-pushed from 58b00dd to 52ac956
Sorry about the million force-pushes to my local branch obscuring the history above. I wasn't able to get tox working locally, so I debugged using GH Actions on my fork. Everything should be good to go now. I made a couple of changes to the workflows: namely removing Python 3.7, which is de jure unsupported by parsy at this point and is unavailable in GH Actions now anyway. I added runners for Python 3.12 and 3.13, for which I added a kludge in setup.py since setuptools is no longer bundled with Python from 3.12 onward.
Hi @tsani - thanks so much for your work on this. At this point in its life, parsy is a pretty mature library, so breaking API compatibility is a really big deal, and not something that I would consider at this point for this feature. Breaking the `.parse()` interface would affect every existing user.

However, I think there shouldn't be a need to do that. The first thing I think we can do is make the wrapper a subclass of `str` (see the proof of concept below), so that it still behaves exactly like the raw data and existing code keeps working.

Somewhat harder is that we need to keep the interface for hand-written parsers built with `@Parser`, which index the stream directly, for example:

```python
def consume(n):
    @Parser
    def consumer(stream, index):
        items = stream[index:index + n]
        if len(items) == n:
            return Result.success(index + n, items)
        else:
            return Result.failure(index, "{0} items".format(n))
    return consumer
```

This means that whatever we pass in as `stream` must still support indexing, slicing and `len()` just like a plain `str`, `bytes` or `list`. This is a harder constraint, but there are some ways forward.
Proof of concept code:

```python
class StrStream(str):
    def __new__(cls, string, source):
        instance = super().__new__(cls, string)
        instance.source = source
        return instance
```

```python
>>> s = StrStream('some text', 'myfile.txt')
>>> s
'some text'
>>> s.split(' ')
['some', 'text']
>>> s.source
'myfile.txt'
>>> isinstance(s, str)
True
```

My expectation is that there shouldn't be any need to change any of the existing test suite - it should run without modification; any breakage is telling us that we've possibly broken someone else's code too.
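For what it's worth, the same trick would presumably extend to `bytes` input as well. A minimal sketch, using a hypothetical `BytesStream` class that is not part of this PR:

```python
# Hypothetical sketch (not from the PR): the same subclassing trick applied
# to bytes input, so binary data can carry a source name too.
class BytesStream(bytes):
    def __new__(cls, data, source):
        instance = super().__new__(cls, data)
        instance.source = source  # extra metadata riding along with the data
        return instance


b = BytesStream(b"\x00\x01\x02", "dump.bin")
assert isinstance(b, bytes)    # still usable anywhere bytes is expected
assert b[1:] == b"\x01\x02"    # slicing behaves exactly like plain bytes
assert b.source == "dump.bin"
```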
BTW - I have done some work on the CI etc., and switched to uv for packaging, and merged that to master, so those parts of the PR shouldn't be needed any more. You might want to start a new branch and cherry-pick what you need. Sorry for the extra work!
Hey @spookylukey, thanks for the input on this! I made a new PR with the changes here: #85
Following from #82:

I've gone ahead with the name `source`. That makes the most sense to me, as it could be something more abstract like `<stdin>` or a URL, as you mentioned.

I opted against changing `mark` at all, since this would cause parsers involving it to break when parsing a data stream equipped with a source.

This approach seemed the best to me. I created a dataclass `SourceSpan` to hold the start & end row & column alongside the `source`, adjusted the `line_info` and `line_info_at` helpers to account for the `source`, and introduced the method `.span()` as the improved version of `.mark()`, to augment the result of the parser with a `SourceSpan` object.
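For illustration, a dataclass of that shape might look roughly like the sketch below; the field names and index conventions here are assumptions, not necessarily the PR's actual definition:

```python
from dataclasses import dataclass


@dataclass
class SourceSpan:
    """Sketch only: the region of the input that a parsed value came from."""
    source: str      # filename, URL, "<stdin>", ...
    start_row: int   # row/column of the first consumed position
    start_col: int
    end_row: int     # row/column just past the last consumed position
    end_col: int
```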
The tricky part of the PR was that it wasn't so simple to "just thread a source name through the parsers." The parser objects themselves are completely stateless -- all the state is held within the data stream, which is just a string, list, or bytes object.
I created a class `Stream` to wrap the underlying data stream, making it possible to add extra fields; in this case, that's just `source`. Then `.parse()` takes a `Stream` instead of "raw" data. This does create a breaking change in the API, as anyone calling `parse` must pass a `Stream` as input. (Fixing the tests to account for this was super fun.)

I believe that introducing `Stream` is important for the future, since it's common for parsers to work on data in a truly streaming fashion. The current design of parsy requires all the data to parse to be buffered up front, so adding genuine streaming will take a lot more effort.
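To make that concrete, a wrapper along these lines could look roughly like the following sketch; the fields, default value and delegation methods shown here are assumptions based on the description above, not the PR's actual code:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Stream:
    """Sketch of a wrapper around the raw data to parse, plus a source name."""
    data: Union[str, bytes, list]
    source: str = "<anonymous>"

    # Parsers index and slice the stream directly, so delegate to the data.
    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)
```

With something along these lines, a call could look like `parser.parse(Stream(text, source="config.ini"))`, letting error messages and `SourceSpan` values name the input they refer to.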
To eliminate the API breakage, I added a somewhat ugly `isinstance` check in `.parse()` to convert to a `Stream` (with no source name) when the user provides something else, so that this can be a patch release instead of a minor release.
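That check might look something like the helper below (a sketch that assumes the hypothetical `Stream` wrapper sketched earlier; the PR's actual code may differ):

```python
def ensure_stream(stream_or_data):
    """Accept either a Stream or raw str/bytes/list input.

    Raw input is wrapped in a Stream with no source name, so existing
    callers of .parse() keep working unchanged.
    """
    if isinstance(stream_or_data, Stream):
        return stream_or_data
    return Stream(stream_or_data)
```

With a shim like this in place, `parser.parse("raw text")` would behave as before, while `parser.parse(Stream("raw text", source="notes.txt"))` would opt into source tracking.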