Work on asdf-format/libasdf-gwcs#6 turned up quite a few bugs in `asdf_write_to_mem`, which I hadn't thought about in a while. It's used quite a bit in the test suite, but only in trivial cases where it happens to just work.
But there are actually a few logic errors and unhandled TODOs in the code for it that come up as soon as you try to write something less trivial:
- The existing logic writes the YAML tree first -- it doesn't presuppose the size in bytes of the tree, and the memory stream logic in `stream.c` handles `realloc` calls while writing -- this works OK. However, `asdf_write_to_mem` has an optimization where, after it has output the YAML and before it outputs the binary blocks, it checks how much space is needed for the blocks and, if necessary, does a single `realloc` call to reserve space for them. This makes basic sense, but the implementation has two serious problems:
  - After the `realloc` it calls `asdf_emitter_set_output_malloc` with the new buffer returned by `realloc`. This, however, has the effect of resetting the emitter state. All of the `asdf_emitter_set_output_*` functions have this bug. The intention was that the emitter state should not move to `ASDF_EMITTER_STATE_INITIAL` until an output stream has been set on the emitter, but there was also the intention that it should be possible to change the stream at any point between emitter states, and this fails on that intent. The `asdf_emitter_set_output_*` functions are also kinda copypasta of each other -- they are all basically the same, just wrapping different `asdf_stream` constructors. That's fine for convenience, I guess, but I think we could simplify this and fix the bug. Need tests for this.
  - The other issue stems from this comment:

    > In order to reasonably estimate the size to reserve for each
    > block we also need to know its compressed size if compression is
    > enabled on that block. We haven't implemented compressed writing
    > yet so fix that later. Probably we'll need to compress each block
    > individually first in order to do that; lots of refactoring
    > involved...
    At the time I wrote that it was true, but compressed writing has since been implemented and I forgot to update the comment / revisit this issue. I don't think that much refactoring is really needed; in fact, thinking harder about it, I don't think it's that complicated. The compressed writing already writes the compressed data to a temporary buffer that's copied/written to the stream. That implementation is not ideal either, because sometimes even the compressed data can take up a lot of space in memory. It was done this way in part as a quick solution, and in part because the Python asdf library already does the same thing. For now we don't have to fix that.

    The fact remains that writing compressed data to a memory stream will generally necessitate a `realloc` per compressed block written. But looking at the code, it should just work as-is. It only means we can't use the aforementioned pre-`realloc` optimization on any compressed blocks (we could still use it for non-compressed blocks). This could still be optimized heuristically later using known typical compression ratios.
- There is also a TODO:

  > // TODO: We should also take into account the block index here; ideally have
  > // a separate routine for just pre-rendering the block index to a buffer

  This is probably less critical, but would also be worth doing to avoid an otherwise entirely avoidable `realloc`.
- I also forgot at the time of writing this issue that it does already reserve one `_SC_PAGESIZE` for the block index. There are some pathological cases where that might not be enough, but I'm not going to optimize for that right now.
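To make the `asdf_emitter_set_output_*` state bug concrete, here's a minimal sketch of the intended behavior. The `emitter`/`emitter_state` types and the `emitter_set_stream` helper below are hypothetical stand-ins, not the real libasdf API: the idea is that all the `set_output_*` wrappers funnel through one internal helper, and only the *first* stream assignment moves the state out of its no-output value, so later swaps (like the one after the `realloc`) leave the emitter state alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified mirror of the emitter state machine; the real
 * libasdf names and fields will differ. */
typedef enum {
    EMITTER_STATE_NO_OUTPUT,  /* no output stream has been set yet */
    EMITTER_STATE_INITIAL,    /* stream set, nothing emitted */
    EMITTER_STATE_YAML_DONE,  /* YAML tree emitted, blocks pending */
} emitter_state;

typedef struct {
    void *stream;  /* stand-in for an asdf_stream pointer */
    emitter_state state;
} emitter;

/* Single internal helper shared by all the set_output_* wrappers: swap the
 * stream, and only bump the state out of NO_OUTPUT on the first assignment.
 * Later swaps leave the state untouched, which is what lets the caller
 * change streams between emitter states without resetting the emitter. */
static void emitter_set_stream(emitter *em, void *stream) {
    em->stream = stream;
    if (em->state == EMITTER_STATE_NO_OUTPUT)
        em->state = EMITTER_STATE_INITIAL;
}
```

With this shape, the per-stream-type `asdf_emitter_set_output_*` functions reduce to thin wrappers that construct their respective `asdf_stream` and call the one helper, removing the copypasta along with the bug.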
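The compressed-block point suggests the pre-`realloc` reservation should simply skip compressed blocks, since their on-disk size is unknown until the codec has run, and let those fall back to realloc-as-you-go in the memory stream. A rough sketch of that size calculation, with a made-up `block` struct rather than the real libasdf block type:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block descriptor; field names are illustrative only. */
typedef struct {
    size_t data_size;  /* uncompressed payload size in bytes */
    int compressed;    /* nonzero if a compression codec is set */
} block;

/* Sum only the sizes knowable up front: uncompressed blocks.  Compressed
 * blocks are skipped, so they will trigger their own reallocs when written,
 * which the memory stream already handles. */
static size_t reserve_size_for_blocks(const block *blocks, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (!blocks[i].compressed)
            total += blocks[i].data_size;  /* + per-block header in the real code */
    }
    return total;
}
```

The heuristic optimization mentioned above would then just be a matter of adding an estimated `data_size * typical_ratio` term for compressed blocks instead of skipping them.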
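For the block-index TODO, pre-rendering to a buffer could look roughly like the sketch below. The function name and signature are hypothetical; the output format follows the ASDF standard's block index (a `#ASDF BLOCK INDEX` header, then a YAML document listing byte offsets). Once the exact rendered size is known, it can be folded into the single reservation `realloc` instead of relying on the one-page guess.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Render the block index to a malloc'd buffer so its exact size is known
 * before reserving space in the memory stream.  Returns the buffer (caller
 * frees) and stores the rendered length in *out_len; NULL on failure. */
static char *render_block_index(const size_t *offsets, size_t n, size_t *out_len) {
    /* Generous upper bound: fixed header/footer plus one line per offset
     * (at most 20 digits for a 64-bit value, plus "- " and "\n"). */
    size_t cap = 64 + 24 * n;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    size_t len = (size_t)snprintf(buf, cap, "#ASDF BLOCK INDEX\n%%YAML 1.1\n---\n");
    for (size_t i = 0; i < n; i++)
        len += (size_t)snprintf(buf + len, cap - len, "- %zu\n", offsets[i]);
    len += (size_t)snprintf(buf + len, cap - len, "...\n");
    *out_len = len;
    return buf;
}
```

This also sidesteps the pathological `_SC_PAGESIZE` cases, since the reservation is computed from the actual index rather than a fixed page.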