Merged
34 changes: 34 additions & 0 deletions docs/source/usage/workflow.rst
@@ -1,5 +1,39 @@
.. _workflow:

Access modes
============

The openPMD-api distinguishes between a number of different access modes:

* **Create mode**: Used for creating a new Series from scratch.
  Any existing file at the specified location will be overwritten.
Member

For file-based I/O, I think there might be a potentially confusing corner case:
if one writes the same series again, but with a different interval, some files might be overwritten while others remain untouched.

I think about this from time to time and see pros and cons for, e.g., actively cleaning up or moving away all potential files (even for iterations that are not created again) when opening in truncating Create mode.

More of a thing we could discuss next time.

Contributor Author

Fair point. I guess the most sensible solution would be within the bounds of a potential truncate option.

* **Read-only mode**: Used for reading from an existing Series.
  No modifications will be made.
* **Read/Write mode**: Creates a new Series if none exists yet; otherwise, opens the existing Series for reading and writing.
  New datasets and iterations will be inserted as needed.
  Not fully supported by all backends:

  * ADIOS1: Automatically coerced to *Create* mode if the file does not exist yet and to *Read-only* mode if it exists.
  * ADIOS2: Automatically coerced to *Create* mode if the file does not exist yet and to *Read-only* mode if it exists.
    Since this happens on a per-file level, this mode allows reading from existing iterations and writing new iterations at the same time in file-based iteration encoding.

* **Append mode**: Restricted mode for appending new iterations to an existing Series.
  It is supported by all backends at least in file-based iteration encoding, and by all backends except ADIOS1 in the other encodings.
  The API is equivalent to that of the *Create* mode, meaning that no reading is supported whatsoever.
  If the Series does not exist yet, this behaves equivalently to the *Create* mode.
  Existing iterations will not be deleted; newly written iterations will be inserted.

**Warning:** When writing an iteration that already exists, the behavior is implementation-defined and depends on the chosen backend and iteration encoding:

* The new iteration might fully replace the old one.
* The new iteration might be merged into the old one.
* (To be removed in a future update) The old and new iteration might coexist in the resulting dataset.

We suggest fully defining iterations when using Append mode (i.e. as if using Create mode) to avoid implementation-specific behavior.
Appending to an openPMD Series is only supported on a per-iteration level.
Member

Maybe this is a bit too strict to convey what is meant, or is it repetitive with what is said above?
One could add here: extending the shape of existing datasets is possible, see the examples.

Contributor Author

In that case, let's maybe restrict the wording to make clear that this refers to the Append mode only:

The Append mode is intended for adding new iterations to a Series, not for the modification of existing iterations.


**Warning:** There is no reading involved in using Append mode.
It is the user's responsibility to ensure that the appended dataset and the appended-to dataset are compatible with each other.
The results of using incompatible backend configurations are undefined.

Workflow
========

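The four access modes documented in `workflow.rst` differ in three respects: whether the frontend offers reading, whether writing is allowed, and whether data already on disk survives. The following is a minimal, self-contained C++ sketch of that summary. `ModeCapabilities` and `capabilities()` are hypothetical names, not part of the openPMD-api, and the per-backend coercions of Read/Write mode (ADIOS1, ADIOS2) are deliberately not modeled.

```cpp
#include <cassert>
#include <stdexcept>

// Local copy of the four modes; mirrors openPMD::Access for illustration.
enum class Access { READ_ONLY, READ_WRITE, CREATE, APPEND };

// Hypothetical summary type, not part of the openPMD-api.
struct ModeCapabilities
{
    bool canRead;       // frontend offers reading functionality
    bool canWrite;      // new iterations and datasets may be written
    bool keepsExisting; // data already on disk is preserved
};

ModeCapabilities capabilities(Access mode)
{
    switch (mode)
    {
    case Access::READ_ONLY:
        return {true, false, true};
    case Access::READ_WRITE:
        return {true, true, true};
    case Access::CREATE:
        // Any file at the target location is overwritten.
        return {false, true, false};
    case Access::APPEND:
        // Same write-only API as CREATE, but existing iterations survive.
        return {false, true, true};
    }
    throw std::logic_error("unknown access mode");
}
```

Append mode is the only mode that combines "no reading" with "keep existing data", which is why the documentation warns that compatibility with the appended-to dataset is the user's responsibility.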
11 changes: 11 additions & 0 deletions include/openPMD/Error.hpp
@@ -69,5 +69,16 @@ namespace error

BackendConfigSchema(std::vector<std::string>, std::string what);
};

/**
* @brief Internal errors that should not happen. Please report.
*
* Example: A nullpointer is observed somewhere.
*/
class Internal : public Error
{
public:
Internal(std::string const &what);
};
} // namespace error
} // namespace openPMD
18 changes: 18 additions & 0 deletions include/openPMD/IO/AbstractIOHandler.hpp
@@ -121,6 +121,23 @@ namespace internal
*/
class AbstractIOHandler
{
friend class Series;

private:
void setIterationEncoding(IterationEncoding encoding)
{
/*
* In file-based iteration encoding, the APPEND mode is handled entirely
* by the frontend, the backend should just treat it as CREATE mode
*/
if (encoding == IterationEncoding::fileBased &&
m_backendAccess == Access::APPEND)
{
// do we really want to have those as const members..?
*const_cast<Access *>(&m_backendAccess) = Access::CREATE;
}
}

public:
#if openPMD_HAVE_MPI
AbstractIOHandler(std::string path, Access at, MPI_Comm)
@@ -153,6 +170,7 @@ class AbstractIOHandler
virtual std::string backendName() const = 0;

std::string const directory;
// why do these need to be separate?
Member

Do we want to answer this question? :)

I am not sure we documented this well when this was introduced (also visible in the missing doxygen strings for those two members).

Contributor

I ran into that a few times. I believe it would be consistent to treat all files in the file mode as if they were one file. When it's time to overwrite, remove all existing files related to the series.

Contributor Author

> Do we want to answer this question? :)

I'd suggest doing it in a separate PR; we need to figure out what the answer is first, though.
I wasn't always quite sure which one of these to use when writing this PR.

> I ran into that a few times. I believe it would be consistent to treat all files in the file mode as if they were one file. When it's time to overwrite, remove all existing files related to the series.

We could think about this for CREATE mode, since its semantics are "overwrite anything that exists". The purpose of READ_WRITE and APPEND modes is explicitly not to delete existing data. For APPEND mode, the most sensible solution seems to be truncation (i.e. "delete anything past iteration 500"), which would be a follow-up PR.

Access const m_backendAccess;
Access const m_frontendAccess;
std::queue<IOTask> m_work;
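The `setIterationEncoding()` hook lets the frontend resolve APPEND itself in file-based encoding, while the backend only sees CREATE. The sketch below models that split between frontend and backend access with hypothetical stand-in types (`HandlerModel` is not the real `AbstractIOHandler`). One design difference: it stores the backend mode in a non-`const` member instead of writing through `const_cast`, because modifying an object declared `const` through `const_cast` is undefined behavior in C++.

```cpp
#include <cassert>

// Illustrative stand-ins for openPMD::Access and openPMD::IterationEncoding.
enum class Access { READ_ONLY, READ_WRITE, CREATE, APPEND };
enum class IterationEncoding { fileBased, groupBased, variableBased };

// Hypothetical model of the frontend/backend access split; not the real
// AbstractIOHandler.
class HandlerModel
{
public:
    explicit HandlerModel(Access at)
        : m_frontendAccess(at), m_backendAccess(at)
    {}

    // With file-based encoding, every iteration lives in its own file, so the
    // frontend can handle APPEND and the backend only has to create files.
    void setIterationEncoding(IterationEncoding encoding)
    {
        if (encoding == IterationEncoding::fileBased &&
            m_backendAccess == Access::APPEND)
        {
            m_backendAccess = Access::CREATE;
        }
    }

    Access frontendAccess() const { return m_frontendAccess; }
    Access backendAccess() const { return m_backendAccess; }

private:
    Access const m_frontendAccess; // what the user requested
    Access m_backendAccess;        // what the backend is asked to do
};
```

Keeping the frontend mode `const` still documents that the user's request never changes; only the backend view is adjusted.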
19 changes: 11 additions & 8 deletions include/openPMD/IO/AbstractIOHandlerImpl.hpp
@@ -266,14 +266,17 @@ class AbstractIOHandlerImpl
* file.
*
* The operation should fail if m_handler->m_frontendAccess is
- * Access::READ_ONLY. The new file should be located in
- * m_handler->directory. The new file should have the filename
- * parameters.name. The filename should include the correct corresponding
- * filename extension. Any existing file should be overwritten if
- * m_handler->m_frontendAccess is Access::CREATE. The Writables file
- * position should correspond to the root group "/" of the hierarchy. The
- * Writable should be marked written when the operation completes
- * successfully.
+ * Access::READ_ONLY. If m_handler->m_frontendAccess is Access::APPEND, a
+ * possibly existing file should not be overwritten. Instead, written
+ * updates should then either occur in-place or in form of new IO steps.
+ * Support for reading is not necessary in Append mode.
+ * The new file should be located in m_handler->directory.
+ * The new file should have the filename parameters.name.
+ * The filename should include the correct corresponding filename extension.
+ * Any existing file should be overwritten if m_handler->m_frontendAccess is
+ * Access::CREATE. The Writables file position should correspond to the root
+ * group "/" of the hierarchy. The Writable should be marked written when
+ * the operation completes successfully.
*/
virtual void
createFile(Writable *, Parameter<Operation::CREATE_FILE> const &) = 0;
3 changes: 2 additions & 1 deletion include/openPMD/IO/Access.hpp
@@ -28,7 +28,8 @@ enum class Access
{
READ_ONLY, //!< open series as read-only, fails if series is not found
READ_WRITE, //!< open existing series as writable
- CREATE //!< create new series and truncate existing (files)
+ CREATE, //!< create new series and truncate existing (files)
+ APPEND //!< write new iterations to an existing series without reading
}; // Access

// deprecated name (used prior to 0.12.0)
7 changes: 7 additions & 0 deletions src/Error.cpp
@@ -45,5 +45,12 @@ namespace error
concatVector(errorLocation_in) + "': " + std::move(what))
, errorLocation(std::move(errorLocation_in))
{}

Internal::Internal(std::string const &what)
: Error(
"Internal error: " + what +
"\nThis is a bug. Please report at ' "
"https://github.com/openPMD/openPMD-api/issues'.")
{}
} // namespace error
} // namespace openPMD
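The new `error::Internal` type wraps the caller's message in a fixed prefix and a "please report" suffix. A self-contained sketch of that pattern follows; the `Error` base class here is a simplified stand-in (assumed to store a message and return it from `what()`), not the actual openPMD class. Note that the sketch places the opening quote directly before the URL, whereas the string literal in the diff (`"... report at ' "`) leaves a space between them.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>

// Simplified stand-in for openPMD::Error (assumption: it stores a message
// and returns it from what()).
class Error : public std::exception
{
public:
    char const *what() const noexcept override
    {
        return m_what.c_str();
    }

protected:
    explicit Error(std::string what) : m_what(std::move(what))
    {}

private:
    std::string m_what;
};

// Sketch of the Internal error pattern: the caller's message is wrapped in a
// fixed prefix and a request to report the bug.
class Internal : public Error
{
public:
    explicit Internal(std::string const &what)
        : Error(
              "Internal error: " + what +
              "\nThis is a bug. Please report at "
              "'https://github.com/openPMD/openPMD-api/issues'.")
    {}
};
```

Callers can catch the common base type, as the ADIOS2 flush code's `throw error::Internal(...)` would be caught by a handler for `Error`.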
23 changes: 7 additions & 16 deletions src/IO/ADIOS/ADIOS2IOHandler.cpp
@@ -315,9 +315,6 @@ void ADIOS2IOHandlerImpl::createFile(
m_iterationEncoding = parameters.encoding;
associateWithFile(writable, shared_name);
this->m_dirty.emplace(shared_name);
- getFileData(shared_name, IfFileNotOpen::OpenImplicitly).m_mode =
- adios2::Mode::Write; // WORKAROUND
- // ADIOS2 does not yet implement ReadWrite Mode

writable->written = true;
writable->abstractFilePosition = std::make_shared<ADIOS2FilePosition>();
@@ -1074,21 +1071,16 @@ adios2::Mode ADIOS2IOHandlerImpl::adios2AccessMode(std::string const &fullPath)
if (auxiliary::directory_exists(fullPath) ||
auxiliary::file_exists(fullPath))
{
- std::cerr << "ADIOS2 does currently not yet implement ReadWrite "
- "(Append) mode. "
- << "Replacing with Read mode." << std::endl;
return adios2::Mode::Read;
Contributor

Now that APPEND mode has been added, should READ_WRITE mode be rejected for ADIOS? That might be semantically cleaner.

Contributor Author

READ_WRITE is still useful for file-based iteration encoding, since it allows you to read old iterations while writing new ones at the same time. This feature is not available in any other iteration encoding. For the other iteration encodings, there is indeed no use in having READ_WRITE mode for ADIOS2.
We should, however, not deactivate it immediately, but give users time to transition to APPEND mode, since removing it is a breaking change.

}
else
{
- std::cerr << "ADIOS2 does currently not yet implement ReadWrite "
- "(Append) mode. "
- << "Replacing with Write mode." << std::endl;
return adios2::Mode::Write;
}
- default:
- return adios2::Mode::Undefined;
+ case Access::APPEND:
+ return adios2::Mode::Append;
}
+ throw std::runtime_error("Unreachable!");
}

json::TracingJSON ADIOS2IOHandlerImpl::nullvalue = {
@@ -2235,6 +2227,7 @@ namespace detail
delayOpeningTheFirstStep = true;
break;
case adios2::Mode::Write:
+ case adios2::Mode::Append:
/*
* File engines, write mode:
* Default for old layout is no steps.
@@ -2442,6 +2435,7 @@ namespace detail
{
switch (m_mode)
{
+ case adios2::Mode::Append:
case adios2::Mode::Write: {
// usesSteps attribute only written upon ::advance()
// this makes sure that the attribute is only put in case
@@ -2686,17 +2680,14 @@ namespace detail
switch (ba.m_mode)
{
case adios2::Mode::Write:
+ case adios2::Mode::Append:
eng.PerformPuts();
break;
case adios2::Mode::Read:
eng.PerformGets();
break;
- case adios2::Mode::Append:
- // TODO order?
- eng.PerformGets();
- eng.PerformPuts();
- break;
default:
+ throw error::Internal("[ADIOS2] Unexpected access mode.");
break;
}
},
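After this change, ADIOS2 APPEND maps directly onto the engine's append mode, READ_WRITE is still coerced per path, and flushing in Write or Append mode only performs deferred puts. The model below uses local enums in place of `openPMD::Access` and `adios2::Mode`; it is not ADIOS2 code, and the CREATE and READ_ONLY branches, which lie outside the visible diff lines, are assumed to map to Write and Read.

```cpp
#include <cassert>
#include <stdexcept>

// Local stand-ins: Access mirrors openPMD::Access, Mode mirrors the
// adios2::Mode values used in the diff. This is a model, not ADIOS2 code.
enum class Access { READ_ONLY, READ_WRITE, CREATE, APPEND };
enum class Mode { Read, Write, Append };
enum class FlushAction { PerformPuts, PerformGets };

// Access -> engine mode after the change. READ_WRITE is still coerced per
// path (the removed warning noted that ADIOS2 did not implement ReadWrite).
Mode adios2AccessMode(Access access, bool pathExists)
{
    switch (access)
    {
    case Access::CREATE:
        return Mode::Write; // assumed, not visible in the diff
    case Access::READ_ONLY:
        return Mode::Read; // assumed, not visible in the diff
    case Access::READ_WRITE:
        return pathExists ? Mode::Read : Mode::Write;
    case Access::APPEND:
        return Mode::Append;
    }
    throw std::logic_error("Unreachable!");
}

// Deferred operations executed on flush: Append is write-only now, so it
// shares the Write branch instead of calling PerformGets() first.
FlushAction flushActionFor(Mode mode)
{
    switch (mode)
    {
    case Mode::Write:
    case Mode::Append:
        return FlushAction::PerformPuts;
    case Mode::Read:
        return FlushAction::PerformGets;
    }
    throw std::logic_error("Unexpected access mode.");
}
```

Dropping the `default:` label lets compilers warn when a new enumerator is added but not handled, which is why a throw after the switch is needed to satisfy the return-path check.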
10 changes: 10 additions & 0 deletions src/IO/ADIOS/CommonADIOS1IOHandler.cpp
@@ -20,6 +20,7 @@
*/

#include "openPMD/IO/ADIOS/CommonADIOS1IOHandler.hpp"
#include "openPMD/Error.hpp"

#if openPMD_HAVE_ADIOS1

@@ -406,6 +407,15 @@ void CommonADIOS1IOHandlerImpl<ChildClass>::createFile(
if (!auxiliary::ends_with(name, ".bp"))
name += ".bp";

if (m_handler->m_backendAccess == Access::APPEND &&
auxiliary::file_exists(name))
{
throw error::OperationUnsupportedInBackend(
"ADIOS1",
"Appending to existing file on disk (use Access::CREATE to "
"overwrite)");
}

writable->written = true;
writable->abstractFilePosition =
std::make_shared<ADIOS1FilePosition>("/");
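The ADIOS1 check rejects APPEND only when the target file already exists on disk. Because `setIterationEncoding()` rewrites the backend access from APPEND to CREATE for file-based encoding, this check can only trigger for Series that do not use file-based encoding, consistent with the documentation saying ADIOS1 supports Append mode only there. A sketch with a hypothetical helper `checkAdios1Create()`; the `fileExists` flag stands in for `auxiliary::file_exists(name)`, and a plain `std::runtime_error` stands in for `error::OperationUnsupportedInBackend`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class Access { READ_ONLY, READ_WRITE, CREATE, APPEND };

// Hypothetical helper modeling the check in ADIOS1 createFile(); returns the
// normalized file name or throws if appending to an existing file.
std::string checkAdios1Create(Access backendAccess, std::string name, bool fileExists)
{
    // Same suffix normalization as in the diff: ensure a ".bp" extension.
    std::string const suffix = ".bp";
    bool const hasSuffix = name.size() >= suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
    if (!hasSuffix)
        name += suffix;

    if (backendAccess == Access::APPEND && fileExists)
        throw std::runtime_error(
            "[ADIOS1] Appending to existing file on disk is unsupported "
            "(use Access::CREATE to overwrite): " + name);
    return name;
}
```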
77 changes: 71 additions & 6 deletions src/IO/HDF5/HDF5IOHandler.cpp
@@ -237,13 +237,41 @@ void HDF5IOHandlerImpl::createFile(
std::string name = m_handler->directory + parameters.name;
if (!auxiliary::ends_with(name, ".h5"))
name += ".h5";
- unsigned flags;
- if (m_handler->m_backendAccess == Access::CREATE)
+ unsigned flags{};
+ switch (m_handler->m_backendAccess)
+ {
+ case Access::CREATE:
flags = H5F_ACC_TRUNC;
- else
+ break;
+ case Access::APPEND:
+ if (auxiliary::file_exists(name))
+ {
+ flags = H5F_ACC_RDWR;
+ }
+ else
+ {
+ flags = H5F_ACC_TRUNC;
+ }
+ break;
+ case Access::READ_WRITE:
flags = H5F_ACC_EXCL;
- hid_t id =
- H5Fcreate(name.c_str(), flags, H5P_DEFAULT, m_fileAccessProperty);
+ break;
+ case Access::READ_ONLY:
+ // condition has been checked above
+ throw std::runtime_error(
+ "[HDF5] Control flow error in createFile backend access mode.");
+ }
+
+ hid_t id{};
+ if (flags == H5F_ACC_RDWR)
+ {
+ id = H5Fopen(name.c_str(), flags, m_fileAccessProperty);
+ }
+ else
+ {
+ id = H5Fcreate(
+ name.c_str(), flags, H5P_DEFAULT, m_fileAccessProperty);
+ }
VERIFY(id >= 0, "[HDF5] Internal error: Failed to create HDF5 file");

writable->written = true;
@@ -409,6 +437,36 @@ void HDF5IOHandlerImpl::createDataset(
"[HDF5] Internal error: Failed to open HDF5 group during dataset "
"creation");

+ if (m_handler->m_backendAccess == Access::APPEND)
+ {
+ // The dataset might already exist in the file from a previous run
+ // We delete it, otherwise we could not create it again with
+ // possibly different parameters.
+ if (htri_t link_id = H5Lexists(node_id, name.c_str(), H5P_DEFAULT);
+ link_id > 0)
+ {
+ // This only unlinks, but does not delete the dataset
+ // Deleting the actual dataset physically is now up to HDF5:
+ // > when removing an object with H5Ldelete, the HDF5 library
+ // > should be able to detect and recycle the file space when no
+ // > other reference to the deleted object exists
+ // https://github.com/openPMD/openPMD-api/pull/1007#discussion_r867223316
+ herr_t status = H5Ldelete(node_id, name.c_str(), H5P_DEFAULT);
+ VERIFY(
+ status == 0,
+ "[HDF5] Internal error: Failed to delete old dataset '" +
+ name + "' from group for overwriting.");
+ }
+ else if (link_id < 0)
+ {
+ throw std::runtime_error(
+ "[HDF5] Internal error: Failed to check for link existence "
+ "of '" +
+ name + "' inside group for overwriting.");
+ }
+ // else: link_id == 0: Link does not exist, nothing to do
+ }
+
Datatype d = parameters.dtype;
if (d == Datatype::UNDEFINED)
{
@@ -702,7 +760,14 @@ void HDF5IOHandlerImpl::openFile(
Access at = m_handler->m_backendAccess;
if (at == Access::READ_ONLY)
flags = H5F_ACC_RDONLY;
- else if (at == Access::READ_WRITE || at == Access::CREATE)
+ /*
+ * Within the HDF5 backend, APPEND and READ_WRITE mode are
+ * equivalent, but the openPMD frontend exposes no reading
+ * functionality in APPEND mode.
+ */
+ else if (
+ at == Access::READ_WRITE || at == Access::CREATE ||
+ at == Access::APPEND)
flags = H5F_ACC_RDWR;
else
throw std::runtime_error("[HDF5] Unknown file Access");
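The HDF5 `createFile()` logic now distinguishes all four access modes, and `createDataset()` unlinks an existing dataset in APPEND mode before recreating it. Below is a library-free sketch of both decisions; `Hdf5Open`, `hdf5CreateFileAction()` and `createDatasetModel()` are hypothetical names, and a `std::map` stands in for an HDF5 group. For reference, `H5F_ACC_EXCL` makes `H5Fcreate` fail if the file already exists, whereas `H5F_ACC_TRUNC` discards existing contents.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

enum class Access { READ_ONLY, READ_WRITE, CREATE, APPEND };

// Which HDF5 call createFile() makes, following the switch in the diff.
enum class Hdf5Open
{
    CreateTruncate,  // H5Fcreate with H5F_ACC_TRUNC: discard any old file
    CreateExclusive, // H5Fcreate with H5F_ACC_EXCL: fail if the file exists
    OpenReadWrite    // H5Fopen with H5F_ACC_RDWR: keep existing contents
};

Hdf5Open hdf5CreateFileAction(Access backendAccess, bool fileExists)
{
    switch (backendAccess)
    {
    case Access::CREATE:
        return Hdf5Open::CreateTruncate;
    case Access::APPEND:
        return fileExists ? Hdf5Open::OpenReadWrite : Hdf5Open::CreateTruncate;
    case Access::READ_WRITE:
        return Hdf5Open::CreateExclusive;
    case Access::READ_ONLY:
        break; // creating files is not allowed in read-only mode
    }
    throw std::runtime_error("[HDF5] createFile called in read-only mode");
}

// Model of createDataset(): a std::map stands in for an HDF5 group.
// In APPEND mode, an existing entry is removed first (like H5Ldelete)
// so the dataset can be recreated with possibly different parameters.
void createDatasetModel(
    std::map<std::string, std::string> &group,
    Access backendAccess,
    std::string const &name,
    std::string const &params)
{
    if (backendAccess == Access::APPEND)
        group.erase(name); // no-op if the name does not exist
    bool const inserted = group.emplace(name, params).second;
    if (!inserted)
        throw std::runtime_error("[HDF5] dataset '" + name + "' already exists");
}
```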
20 changes: 17 additions & 3 deletions src/IO/JSON/JSONIOHandlerImpl.cpp
@@ -117,7 +117,7 @@ void JSONIOHandlerImpl::createFile(
file.invalidate();
}

- std::string const dir(m_handler->directory);
+ std::string const &dir(m_handler->directory);
if (!auxiliary::directory_exists(dir))
{
auto success = auxiliary::create_directories(dir);
@@ -126,8 +126,14 @@ void JSONIOHandlerImpl::createFile(

associateWithFile(writable, shared_name);
this->m_dirty.emplace(shared_name);
- // make sure to overwrite!
- this->m_jsonVals[shared_name] = std::make_shared<nlohmann::json>();

+ if (m_handler->m_backendAccess != Access::APPEND)
+ {
+ // make sure to overwrite!
+ this->m_jsonVals[shared_name] = std::make_shared<nlohmann::json>();
+ }
+ // else: the JSON value is not available in m_jsonVals and will be
+ // read from the file later on before overwriting
+
writable->written = true;
writable->abstractFilePosition = std::make_shared<JSONFilePosition>();
@@ -910,6 +916,14 @@ JSONIOHandlerImpl::getFilehandle(File fileName, Access access)
{
case Access::CREATE:
case Access::READ_WRITE:
+ case Access::APPEND:
+ /*
+ * Always truncate when writing, we alway write entire JSON
+ * datasets, never partial ones.
+ * Within the JSON backend, APPEND and READ_WRITE mode are
+ * equivalent, but the openPMD frontend exposes no reading
+ * functionality in APPEND mode.
+ */
fs->open(path, std::ios_base::out | std::ios_base::trunc);
break;
case Access::READ_ONLY:
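In the JSON backend, `createFile()` no longer resets the in-memory document in APPEND mode; the existing file is read later, and the whole document is rewritten with `std::ios_base::trunc`. The sketch below models that read-modify-write cycle without nlohmann::json: a flat `key=value` text file stands in for the JSON document, and `loadDocument()` and `createAndWrite()` are hypothetical helpers, not backend functions.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <stdexcept>
#include <string>

enum class Access { READ_ONLY, READ_WRITE, CREATE, APPEND };

// Stand-in for a JSON document: flat "key=value" lines instead of JSON.
using Document = std::map<std::string, std::string>;

Document loadDocument(std::string const &path)
{
    Document doc;
    std::ifstream in(path); // stays empty if the file does not exist
    std::string line;
    while (std::getline(in, line))
    {
        auto const eq = line.find('=');
        if (eq != std::string::npos)
            doc[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return doc;
}

// Models createFile() followed by a flush. Only APPEND keeps what is already
// on disk; CREATE and READ_WRITE start from an empty document in createFile().
// The file is always rewritten completely (trunc), as the whole document is
// serialized on every write.
void createAndWrite(std::string const &path, Access access, Document const &updates)
{
    if (access == Access::READ_ONLY)
        throw std::runtime_error("cannot create files in read-only mode");
    Document doc = access == Access::APPEND ? loadDocument(path) : Document{};
    for (auto const &[key, value] : updates)
        doc[key] = value;
    std::ofstream out(path, std::ios_base::out | std::ios_base::trunc);
    for (auto const &[key, value] : doc)
        out << key << '=' << value << '\n';
}
```

Because the full document is rewritten on every flush, truncating the file is safe in APPEND mode: nothing that was on disk is lost, since it was merged into the in-memory document first.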