-
Notifications
You must be signed in to change notification settings - Fork 51
lazy parsing of iterations #938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7bbea9b
to
1992d29
Compare
051ede0
to
18cbf90
Compare
18cbf90
to
b1da52a
Compare
0b0ddaf
to
9f5cbc1
Compare
33ebc35
to
2671df4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added inline :)
include/openPMD/Iteration.hpp
Outdated
private: | ||
Iteration(); | ||
|
||
struct DeferredRead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a doxygen string here, just to clarify what this switch does.
Would DeferredReadAccess
be more explicit? Not sure, ... maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we rename this one anyway, I'd go for DeferredParseAccess
to avoid confusion with deferred load_chunks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of the lazy/deferred parsing reads well. One general comment, I would unify the naming for this in the code base, at the moment we use:
- lazy
- deferred read
- deferred access
- (not) yet accessed
To describe the same thing, which can be confusing as the same concept in the code is meant.
Should we skip the introduction of the term "lazy" and instead use "deferred (iteration) access" in all places?
Do we want to make this a Series constructor parameter or JSON option?
Should we introduce the builder pattern in a separate PR and/or use it more consistently throughout the code base & examples?
7e2e3f6
to
0c481b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of the lazy/deferred parsing reads well. One general comment, I would unify the naming for this in the code base, at the moment we use:
* lazy * deferred read * deferred access * (not) yet accessed
To describe the same thing, which can be confusing as the same concept in the code is meant.
Should we skip the introduction of the term "lazy" and instead use "deferred (iteration) access" in all places?
I've renamed things now:
parse_lazily -> defer_iteration_parsing
NotYetAccessed -> ParseAccessDeferred
Iteration::deferRead -> Iteration::deferParseAccess
DeferredRead -> DeferredParseAccess
m_deferredRead -> m_deferredParseAccess
parseLazily -> runDeferredParseAccess
Do we want to make this a Series constructor parameter or JSON option?
Good idea, done
Should we introduce the builder pattern in a separate PR and/or use it more consistently throughout the code base & examples?
I introduced the builder pattern since it made for a better way to have a constructor with more than one defaulted argument. Since we're now back at one defaulted argument, we don't necessarily need it anymore. I've isolated the builder pattern to a branch franzpoeschel/topic-builder
now, should I do a PR?
Also, we can definitely use it more in the examples and tests. In the code base itself probably not so much since we seldomly create a Series
object ourselves.
include/openPMD/Iteration.hpp
Outdated
private: | ||
Iteration(); | ||
|
||
struct DeferredRead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we rename this one anyway, I'd go for DeferredParseAccess
to avoid confusion with deferred load_chunks
include/openPMD/Series.hpp
Outdated
std::unique_ptr< ParsedInput > parseInput(std::string); | ||
// use a template in order not to expose nlohmann_json to users | ||
template< typename JSON > | ||
void parseJsonOptions( JSON const & ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you only call this in the .cpp
file, you can define this helper function only in the Series.cpp
file and in a namespace {}
(as you already do) and avoid that it gets exposed as a function and symbol at all :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that easy because that function accesses private members. But we can theoretically make all members of SeriesData
public and inherit it privately in SeriesInternal
. So, the data struct can be passed to non-friend functions for manipulation?
Edit: Pushed a commit that does just that.
Edit: Not so easy either, this breaks dynamic casting. I've made it a public inheritance now, meaning that SeriesInternal
has all members public now. Series
still hides them.
uint64_t iteration{}; | ||
|
||
// support for std::tie | ||
operator std::tuple< bool &, int &, uint64_t & >() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever :)
First attempt at an implementation of lazy parsing todo: * file-based layout * iterator access Basic file-based deferred parsing Remove changes to Container class Read eagerly by default Read iteration upon Iteration::open Expose to frontend and add some tests Use Builder Pattern for Series
In that case, we must not attempt to piece back together the filename by filebased methods, but instead directly use the filename specified by the user.
parse_lazily -> defer_iteration_parsing NotYetAccessed -> ParseAccessDeferred Iteration::deferRead -> Iteration::deferParseAccess DeferredRead -> DeferredParseAccess m_deferredRead -> m_deferredParseAccess parseLazily -> runDeferredParseAccess
This makes all members of SeriesData public, but Series still hides them.
Co-authored-by: Axel Huebl <[email protected]>
6ded0af
to
e99630c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is great!
Opening a Series with many iterations is currently a time-intensive procedure due to eager parsing.
This adds an option to lazily parse a Series. If that option is chosen,
Iteration::open()
must be called before accessing.ReadIterations
will do that automatically.I previously attempted parsing iterations automatically upon accessing them via the
Container
interface, but there were too many edge cases and it's probably better to do such things explicitly in parallel situations.TODO:
including removal of failed first attemptauxiliary::Option
instead of NaNs to represent no scaling.