Conversation

@marhop
Contributor

@marhop marhop commented Nov 12, 2025

Hi,

The new xmlDeclaration function introduced by this PR makes the solution proposed in #129 part of the API.

I wondered how to add an XML declaration to the rendered XML produced by the streaming API, found nothing in the docs and finally stumbled upon this issue where it is explained. I figured this should be properly documented in the Haddocks or even provided as an - albeit very simple - function.

Should you be opposed to adding new functions to the API we should at least add a note to the docs so the issue mentioned above is not the only place where this info can be found.

Cheers,
Martin

@k0ral
Collaborator

k0ral commented Nov 14, 2025

If we add xmlDeclaration, I'm wondering whether the existence of rsXMLDeclaration is still warranted.

From a practical perspective, in its current semantics:

  • when it's true, it has no effect
  • when it's false, it drops any EventBeginDocument from the stream of events... but that can already be achieved with a filter at Conduit level
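
A minimal sketch of the Conduit-level filter mentioned in the second point. The helper name dropBeginDocument and the sample event list are hypothetical and not part of xml-conduit; the sketch only assumes the Event type from xml-types and the standard combinators from the conduit package.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit (ConduitT, filterC, runConduitPure, sinkList, yieldMany, (.|))
import Data.XML.Types (Event (..))

-- | Hypothetical helper (not part of xml-conduit): remove every
-- 'EventBeginDocument', so the renderer finds no event that could
-- trigger an XML declaration.
dropBeginDocument :: Monad m => ConduitT Event Event m ()
dropBeginDocument = filterC (/= EventBeginDocument)

-- | Hand-written sample stream, used for illustration only.
sampleEvents :: [Event]
sampleEvents =
  [ EventBeginDocument
  , EventBeginElement "a" []
  , EventEndElement "a"
  , EventEndDocument
  ]

-- | The sample stream after filtering, collected into a list.
filteredEvents :: [Event]
filteredEvents =
  runConduitPure $ yieldMany sampleEvents .| dropBeginDocument .| sinkList
```

Note that this filter leaves the matching EventEndDocument in place; whether that matters depends on the consumer downstream.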

From a theoretical perspective, xml-conduit seems to be designed to work with XML at a high level of abstraction, focusing primarily on document semantics. The few configuration knobs, like rsXMLDeclaration, seem to exist mainly to handle edge cases. By exposing xmlDeclaration, we effectively acknowledge that users may need lower-level control over document formatting as well, which makes those rsXMLDeclaration options feel unnecessary.

What do you think ?

@marhop
Contributor Author

marhop commented Nov 15, 2025

Good point! You're right, if we add xmlDeclaration the rsXMLDeclaration setting becomes somewhat redundant and could be removed.

However, having reflected on your thoughts for a while, maybe we should move away from the pragmatic solution in #129 (manually yielding an EventBeginDocument in order to provoke an XML declaration being rendered) altogether and instead make the rsXMLDeclaration setting work as expected in the first place? That is, how about changing the PR as follows?

  • Remove the xmlDeclaration function again.
  • Change the rendering implementation so that if rsXMLDeclaration == True, an EventBeginDocument is automatically injected into the stream (leading to a rendered XML declaration), but only if the first element on the stream isn't already an EventBeginDocument (otherwise we would get two XML declarations).

If I don't miss anything, this should make the rsXMLDeclaration setting work as expected, rendering an XML declaration if set to True and not rendering an XML declaration if set to False, regardless of any manually added EventBeginDocument elements in the stream (well, at least at the start of the stream, but another position would be weird anyway).
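
The injection step proposed above could look roughly like the following sketch. It is written as a standalone conduit rather than as a change to the renderer, and the name ensureBeginDocument is hypothetical. It peeks at the first event and only yields an EventBeginDocument when the stream does not already start with one.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit (ConduitT, awaitForever, peekC, runConduitPure, sinkList, yield, yieldMany, (.|))
import Data.XML.Types (Event (..))

-- | Hypothetical helper (not part of xml-conduit): make sure the stream
-- starts with 'EventBeginDocument', without duplicating an existing one.
ensureBeginDocument :: Monad m => ConduitT Event Event m ()
ensureBeginDocument = do
  firstEvent <- peekC            -- look at the first event without consuming it
  case firstEvent of
    Just EventBeginDocument -> pure ()
    _                       -> yield EventBeginDocument
  awaitForever yield             -- pass everything else through unchanged

-- | Sample streams for illustration: one without and one with document events.
bare, complete :: [Event]
bare = [EventBeginElement "a" [], EventEndElement "a"]
complete = [EventBeginDocument] ++ bare ++ [EventEndDocument]

-- | Run the helper over a list of events.
applyEnsure :: [Event] -> [Event]
applyEnsure evs = runConduitPure $ yieldMany evs .| ensureBeginDocument .| sinkList
```

In this sketch an empty input still produces a single EventBeginDocument, and no EventEndDocument is appended; a real implementation would have to decide how to treat both cases.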

If you're fine with this suggestion I can update the PR and you can have another look. OK?

@k0ral
Collaborator

k0ral commented Nov 16, 2025

I imagine users might want any one of the following semantics:

  1. preserve presence/absence of XML declaration
  2. ensure absence of XML declaration
  3. ensure presence of XML declaration

Today, rsXMLDeclaration lets users choose between (1) or (2). Your last proposal would let users choose between (2) and (3) instead. Either way, some users are going to be let down, and will resort to the lower-level, event-based API to achieve what they want.

Thus, I have the feeling rsXMLDeclaration should go, and the issue should be fixed another way. xmlDeclaration is one way, but we could also consider providing higher-level helpers instead like ensureXMLDeclaration :: Monad m => ConduitT Event Event m () and dropXMLDeclaration :: Monad m => ConduitT Event Event m ().

What do you think ? 🙂

@marhop
Contributor Author

marhop commented Nov 17, 2025

Well, personally, I'm fine with having only options 2 and 3 because they allow me to be explicit about the presence of an XML declaration in my rendered output. But that may be just me. ;-)

As to option 1, is this really possible today (or at all)? Maybe I don't understand the use case correctly, but if it is about parsing XML, processing it, and rendering it again, and about preserving an XML declaration that is (or is not) present in the input so that it appears again (or still doesn't) in the output - I'm not sure how this would work. For example, if I run something like runConduitRes $ parseFile def "a.xml" .| mapM_C (liftIO . print) on the following two inputs ...

<a>...</a>

vs.

<?xml version="1.0"?>
<a>...</a>

... I get the exact same output, in both cases starting with EventBeginDocument, EventBeginElement(...), ..., so the information about an XML declaration being present in the input is already lost after parsing. Consequently, if I pipe this stream of Events into renderBytes def .| stdoutC the output does or does not include an XML declaration depending only on the value of rsXMLDeclaration. (OK, and on the fact that there is an EventBeginDocument in the stream which the current implementation relies on to trigger rendering an XML declaration at all.)
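
For readers who want to repeat this comparison without files on disk, here is a small sketch that parses two in-memory inputs with parseLBS and compares the resulting event lists. The names parsedEvents, withoutDecl, withDecl and sameEvents are made up for this example, and the inputs are simplified to a single line each. No expected result is hard-coded; if the observation above holds, sameEvents yields True.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit (runConduit, sinkList, (.|))
import qualified Data.ByteString.Lazy as L
import Data.XML.Types (Event)
import Text.XML.Stream.Parse (def, parseLBS)

-- | Parse an in-memory document and collect all events into a list.
parsedEvents :: L.ByteString -> IO [Event]
parsedEvents input = runConduit $ parseLBS def input .| sinkList

-- | The two inputs from the comment above, as single-line byte strings.
withoutDecl, withDecl :: L.ByteString
withoutDecl = "<a>...</a>"
withDecl = "<?xml version=\"1.0\"?><a>...</a>"

-- | Do both inputs produce the same event stream?
sameEvents :: IO Bool
sameEvents = (==) <$> parsedEvents withoutDecl <*> parsedEvents withDecl
```

Printing the two lists side by side (for example with mapM_ print) shows which events the parser emits for each input.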

But if the information about the presence or absence of an XML declaration in the input is already lost after parsing I don't see how it can be preserved, making option 1 impossible anyway. (Or am I missing something?) However, if you ask me, this is no problem because I don't think the XML declaration has any semantic value that must be preserved. I would rather consider it a syntactic detail, in a similar area as pretty printing; which, come to think of it, somehow justifies it being configured in the RenderSettings instead of dealing with it in the stream (like my initial xmlDeclaration would do) ...

So, I have no strong feelings about this but I lean a little more towards keeping the rsXMLDeclaration setting (but making it work as described above). Sorry. ;-)

@marhop
Contributor Author

marhop commented Nov 17, 2025

Oh, maybe there's yet another option. :-)

Today, when working with the Text.XML.Stream.Render module, one usually builds a stream out of high-level tag components, resulting in a stream of EventBeginElement etc. Now, the initial problem (no XML declaration being rendered) was caused by the fact that this stream of events usually contains no EventBeginDocument, unless it is added by a low-level yield EventBeginDocument.

So maybe we could add another high-level function document :: Monad m => ConduitT i Event m () -> ConduitT i Event m () similar to tag that is intended to "wrap" all the tag components in a document (similar to a parent tag wrapping its child elements)? That way we would keep the high-level feeling of the API (other than with xmlDeclaration proposed above) while offering the possibility to generate a complete document (i.e., including EventBeginDocument and EventEndDocument) instead of just a stream of elements. Apart from looking more tidy to me (which may not be a very compelling argument) this would let the current implementation pick up on the EventBeginDocument to generate an XML declaration.

Another, maybe too drastic thought regarding rsXMLDeclaration, since you would like to get rid of it: Do we need this setting at all? Would it cause any harm if it was removed (or moved to the Internal module) and always set to True? I can't imagine a scenario where an XML declaration being included in the rendered output would ever be a problem. (I may be too unimaginative though.)

@k0ral
Collaborator

k0ral commented Nov 17, 2025

the information about an XML declaration being present in the input is already lost after parsing

I hadn't realized that the parser behaved like that, fair point.
Nevertheless, that only covers pipelines like XML document | parse XML | transform | render XML ; we still need to care about pipelines like non-XML data source | parse | transform into XML | render XML, for which we cannot assume an EventBeginDocument will always be present, unless I am mistaken.
(EDIT: I realize now that's what your last message mentions so I guess we're on the same page 🙂 .)

I can't imagine a scenario where an XML declaration being included in the rendered output would ever be a problem.

Not clear to me what the intent was exactly, but you can check the description of the commit that introduced that feature.

Off the top of my head, I could imagine a scenario where one needs to render an XML document and then nest it inside another XML document, in which case the intermediate XML declaration becomes undesired.

So maybe we could add another high-level function document :: Monad m => ConduitT i Event m () -> ConduitT i Event m () similar to tag that is intended to "wrap" all the tag components in a document (similar to a parent tag wrapping its child elements)?

That sounds like an elegant and practical solution. And again, I have the feeling it makes rsXMLDeclaration redundant, as it makes no sense to use document and rsXMLDeclaration = False together (or the other way around). Or am I missing something ?

@marhop
Contributor Author

marhop commented Nov 17, 2025

Off the top of my head, I could imagine a scenario where one needs to render an XML document and then nest it inside another XML document, in which case the intermediate XML declaration becomes undesired.

Ha, I knew I was too unimaginative! ;-) Yes, that sounds like something that happens in real life. This could be approached by not wrapping the stream in the proposed document function but producing only a stream of tags then.

And again, I have the feeling it makes rsXMLDeclaration redundant, as it makes no sense to use document and rsXMLDeclaration = False together (or the other way around). Or am I missing something ?

Yes, I think that makes rsXMLDeclaration (mostly?) redundant. If you want an XML declaration, wrap your conduit in a document function; if you don't want one, omit the document function. I can only think of two scenarios where I would still reach for the rsXMLDeclaration setting, but both appear very contrived to me:

  • We receive an "unknown" stream of events that may or may not contain an EventBeginDocument, but we want to make sure that no XML declaration is rendered. I guess we could still achieve this by low-level filtering as you suggested above.
  • We'd like to include an EventBeginDocument (and an EventEndDocument) in our event stream because we want it to have a "proper" structure, but we don't want an XML declaration in our rendered output. This seems like a rather academic problem though. ;-)

A less contrived reason for keeping rsXMLDeclaration might be simply to have the same RenderSettings in the Text.XML.Stream.Render and Text.XML modules, but I don't think that alone is a good justification.

So let me suggest to proceed as follows then:

  • I start by removing the xmlDeclaration function and instead introducing the document function. This makes it possible to include the required EventBeginDocument and EventEndDocument events without resorting to lower-level functions like yield, and it triggers an XML declaration being rendered on EventBeginDocument events (unless suppressed by rsXMLDeclaration = False).
  • If this works out well and nobody shows up with a good reason for keeping rsXMLDeclaration it can (now, or later down the road) be deprecated, moved to the Internal module like rsPretty, and eventually be removed.

If this sounds good, I would like your thoughts on the document function. I see two possible implementations:

  1. document :: Monad m => ConduitT i Event m () -> ConduitT i Event m () which yields an EventBeginDocument, then everything from the inner conduit, and then an EventEndDocument.
  2. document :: Monad m => Name -> Attributes -> ConduitT i Event m () -> ConduitT i Event m () which yields an EventBeginDocument, then yields the same as applying the tag function with the same arguments would (i.e., create an element with the given name and attributes and everything from the inner conduit as element content), and then an EventEndDocument.

The first variant allows creating an empty document (but who needs that?), while the second ensures that a document contains exactly one root element (which in my head resonates with the "make invalid states unrepresentable" mantra). I very much prefer the second variant, but maybe that's not flexible enough?
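
To make the two variants concrete, here is a rough sketch of both. The names documentSketch and rootedDocumentSketch are placeholders chosen to avoid suggesting a final API; only tag and Attributes are taken from Text.XML.Stream.Render as it exists today.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit (ConduitT, runConduitPure, sinkList, yield, (.|))
import Data.XML.Types (Event (..), Name)
import Text.XML.Stream.Render (Attributes, tag)

-- | Variant 1 (sketch): wrap an arbitrary inner conduit in document events.
documentSketch :: Monad m => ConduitT i Event m () -> ConduitT i Event m ()
documentSketch inner = do
  yield EventBeginDocument
  inner
  yield EventEndDocument

-- | Variant 2 (sketch): additionally force exactly one root element by
-- delegating to 'tag' with the given name and attributes.
rootedDocumentSketch
  :: Monad m => Name -> Attributes -> ConduitT i Event m () -> ConduitT i Event m ()
rootedDocumentSketch name attrs inner = documentSketch (tag name attrs inner)

-- | Events produced by variant 2 for an empty root element "a".
rootedEvents :: [Event]
rootedEvents = runConduitPure $ rootedDocumentSketch "a" mempty (pure ()) .| sinkList
```

Written this way, variant 2 is just variant 1 applied to a single tag, so offering only variant 1 would not prevent users from building the stricter form themselves.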

@k0ral
Collaborator

k0ral commented Nov 18, 2025

So let me suggest to proceed as follows then

Your plan sounds good to me 🙂 .

I see two possible implementations:

Although well-formed XML documents must indeed have a unique root tag/element, users might want to insert extra events between EventBeginDocument and the root EventBeginElement:

  • EventBeginDoctype/EventEndDoctype
  • EventInstruction
  • EventComment

Thus, I suggest going with the first signature document :: Monad m => ConduitT i Event m () -> ConduitT i Event m ().
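
As an illustration of why the first signature is more flexible, the sketch below places a comment between EventBeginDocument and the root element and renders the result. documentSketch is a local stand-in for the proposed combinator, and the comment text and element name are invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit (ConduitT, foldC, runConduit, yield, (.|))
import qualified Data.Text as T
import Data.XML.Types (Event (..))
import Text.XML.Stream.Render (content, def, renderText, tag)

-- | Local stand-in for the proposed combinator (first signature).
documentSketch :: Monad m => ConduitT i Event m () -> ConduitT i Event m ()
documentSketch inner = do
  yield EventBeginDocument
  inner
  yield EventEndDocument

-- | A document whose prolog carries a comment before the root element.
events :: Monad m => ConduitT i Event m ()
events = documentSketch $ do
  yield (EventComment " generated ")   -- extra event ahead of the root element
  tag "root" mempty (content "hello")

-- | Render the event stream to strict Text with the default settings.
rendered :: IO T.Text
rendered = runConduit $ events .| renderText def .| foldC
```

With the second signature the comment could only be placed inside the root element, since the combinator itself would emit the root's opening tag directly after EventBeginDocument.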

If this works out well and nobody shows up with a good reason for keeping rsXMLDeclaration it can (now, or later down the road) be deprecated, moved to the Internal module like rsPretty, and eventually be removed.

Even though it's down the road, I'd like to clarify one point: the moment we remove rsXMLDeclaration, shouldn't we also change the behavior of the parsing logic to stop inserting EventBeginDocument/EventEndDocument when they're not present in the parsed XML document ?

That way, in a pipeline like parse XML | transform | render XML, the possibilities become (emphasis on last row):

| Has parsed document an XML declaration? | Does user want rendered document to have an XML declaration? | How to achieve |
|---|---|---|
| Yes | Yes | nothing to do, XML declaration events are parsed, then rendered transparently |
| Yes | No | filter out event |
| No | Yes | use the new document combinator |
| No | No | nothing to do |

@marhop changed the title from "Add Stream.Render.xmlDeclaration function" to "Add Stream.Render.document function" on Nov 19, 2025
@marhop
Contributor Author

marhop commented Nov 19, 2025

Although well-formed XML documents must indeed have a unique root tag/element, users might want to insert extra events between EventBeginDocument and the root EventBeginElement:

Fair enough! I have implemented document :: Monad m => ConduitT i Event m () -> ConduitT i Event m ().

I haven't yet touched rsXMLDeclaration; I don't think removing it should necessarily be part of this same PR anyway. I do have some thoughts regarding your latest comment, but I fear I'm a little too tired right now to communicate them clearly, so I will get back to you tomorrow!

@k0ral k0ral merged commit c6b2225 into snoyberg:master Nov 23, 2025
18 checks passed
@marhop
Contributor Author

marhop commented Nov 24, 2025

Thanks for merging, and thanks for your quick answers, your thoughts, and your openness for discussion! Contributing to this project was a very pleasant experience. 👍

Anyway, I think I still owe you an answer regarding the rsXMLDeclaration setting.

Even though it's down the road, I'd like to clarify one point: the moment we remove rsXMLDeclaration, shouldn't we also change the behavior of the parsing logic to stop inserting EventBeginDocument/EventEndDocument when they're not present in the parsed XML document ?

After some digging in the parser code I don't think this is how it works today. For the parser, EventBeginDocument has not much to do with the XML declaration. This is different from the renderer which uses EventBeginDocument as a trigger to emit an XML declaration if it is configured to do so - although I guess this is more of an implementation detail than by necessity and could possibly just as well be implemented differently.

The parser (except in the case of severe errors, I suppose) pretty much always returns an EventBeginDocument/EventEndDocument pair - when the input contains an XML declaration, when it doesn't, and even when the input is completely empty. Furthermore, there is nothing like an EventXMLDeclaration that would signal the parser encountering an XML declaration. In other words, the parser simply ignores the presence or absence of an XML declaration. It tolerates it, but forgets about it immediately. Therefore, the first column of your table (the one labeled "Has parsed document an XML declaration?") can only take the value "unknown" once parsing is done, reducing the table to two rows:

| Does user want rendered document to have an XML declaration? | How to achieve with rsXMLDeclaration | How to achieve without rsXMLDeclaration |
|---|---|---|
| Yes | Make sure the stream contains an EventBeginDocument. [1] [2] | Make sure the stream contains an EventBeginDocument. [1] |
| No | Set rsXMLDeclaration = False. | Filter out EventBeginDocument. |

So essentially, removing rsXMLDeclaration makes no difference if users want an XML declaration in their output. If they wish to suppress it however, setting rsXMLDeclaration = False looks clearer to me than filtering out a lower-level event, don't you think? Besides, while filtering out the EventBeginDocument from a stream does indeed have the effect of suppressing an XML declaration in the rendered output it might - after unrelated code changes in the future, or in some specific application that relies on an EventBeginDocument for other things - have unintended other effects beside suppressing the XML declaration.

Therefore, I now think having the rsXMLDeclaration setting is in fact justified.
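
For comparison, this is roughly what the settings-based route looks like in code. The sketch renders the same hand-written event stream twice, once with the default settings and once with rsXMLDeclaration = False; the names sampleEvents, renderWith, withDeclaration and withoutDeclaration are invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit (foldC, runConduit, yieldMany, (.|))
import qualified Data.Text as T
import Data.XML.Types (Event (..))
import Text.XML.Stream.Render (RenderSettings, def, renderText, rsXMLDeclaration)

-- | Hand-written event stream that includes the document events.
sampleEvents :: [Event]
sampleEvents =
  [ EventBeginDocument
  , EventBeginElement "a" []
  , EventEndElement "a"
  , EventEndDocument
  ]

-- | Render the sample stream to strict Text with the given settings.
renderWith :: RenderSettings -> IO T.Text
renderWith settings = runConduit $ yieldMany sampleEvents .| renderText settings .| foldC

-- | Default settings: rsXMLDeclaration is True.
withDeclaration :: IO T.Text
withDeclaration = renderWith def

-- | Same stream, but with the declaration suppressed via the setting.
withoutDeclaration :: IO T.Text
withoutDeclaration = renderWith (def { rsXMLDeclaration = False })
```

The event stream itself stays untouched in both cases; only the RenderSettings value differs, which is the property argued for above.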

Footnotes

  1. This is trivial if the stream originates from parsed XML input - as mentioned above, there's always an EventBeginDocument then. Otherwise, use the document function.

  2. And set rsXMLDeclaration = True, but this is the default anyway.

@k0ral
Collaborator

k0ral commented Nov 25, 2025

Included in release 1.10.1.0.

For the parser, EventBeginDocument has not much to do with the XML declaration.

That's my point: maybe that behavior of the parser should change, so that EventBeginDocument is generated if, and only if, the parsed document actually has an XML declaration. But maybe that would make xml-conduit care too much about low-level XML tokens and deviate from the main goal of just preserving document semantics.

Therefore, I now think having the rsXMLDeclaration setting is in fact justified.

It is justified given the current behavior of the parser, hence my line of reasoning: if we change that behavior, maybe we can afford to remove rsXMLDeclaration. Why am I so intent on getting rid of it, you might ask ? Because I still find it unsatisfying that there's an overlap between the semantics of document and rsXMLDeclaration: they both control whether an XML declaration is rendered, one is opt-in, the other is opt-out. When used together, they may conflict and lead to confusion for users. Apologies for quoting the Zen of Python here, but: "there should be one-- and preferably only one --obvious way to do it."

Anyway, that's drifting out of the scope of the original pull-request, and it can be dealt with later, so don't mind me.

Thanks for merging, and thanks for your quick answers, your thoughts, and your openness for discussion! Contributing to this project was a very pleasant experience. 👍

Thank you for contributing, and for taking the time to discuss 🙂 .
