-
Notifications
You must be signed in to change notification settings - Fork 3.7k
GH-45908: [C++][Docs] Rename and expose basic {Array,...}FromJSON helpers as public APIs #46180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-45908: [C++][Docs] Rename and expose basic {Array,...}FromJSON helpers as public APIs #46180
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
This moves the following functions from the IPC namespace to util to make it clear these are useful outside of their use in Arrow IPC. - ArrayFromJSON - ChunkedArrayFromJSON - DictArrayFromJSON - ScalarFromJSON - DictScalarFromJSON
3b6015e
to
cf808f9
Compare
@github-actions crossbow submit preview-docs |
Revision: cf808f9 Submitted crossbow builds: ursacomputing/crossbow @ actions-7a9d15701e
|
Docs previews: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for bringing this up! I have some minor comments.
Revision: 5cedcfe Submitted crossbow builds: ursacomputing/crossbow @ actions-b687d19c46
|
I would guess this failure is caused by use of é (e with a diacritic). The bytes of a string literal aren't guaranteed to be utf-8 encoded without a |
Interesting. It would be nice to know what it is about that particular job that makes it fail but I can file a minor PR for it since it sounds like a fix we should make. |
After the latest round of reviews, I noticed some errors with my inline example C++ code and realized it was probably better to write an actual example source file and Latest docs preview
Can you look again @bkietz? Thanks. |
The rest of the PR seems ready to go, for now could you comment out any cases in the GDB test which reference utf-8 strings to verify that's the only thing failing? (And keep CI green until you make the minor PR; I'll help review) |
Well, those cases appear to have been responsible for that failure. After looking more closely, we've got an issue open for this already: #46343 |
This reverts commit b5ed9b1.
Nice catch. I commented there. I reverted the testing commit so the PR is in a merge-able state. Would you be okay if we merged with that one job failing or should we wait on #46343 and rebase before merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments.
Co-authored-by: Enrico Minack <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Merged. Thanks all for the reviews, it's greatly appreciated. |
@@ -2140,8 +2140,8 @@ class WriteFileSystemDatasetMixin : public MakeFileSystemDatasetMixin { | |||
actual_struct = std::dynamic_pointer_cast<Array>(struct_array); | |||
} | |||
|
|||
auto expected_struct = ArrayFromJSON(struct_(expected_physical_schema_->fields()), | |||
file_contents->second); | |||
auto expected_struct = arrow::ArrayFromJSON( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious to me why you needed to add explicit arrow::
prefixes here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, nice catch. I can remove this. Is there an automated way to catch things like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to my knowledge, no.
namespace json { | ||
|
||
using ::arrow::internal::checked_cast; | ||
using ::arrow::internal::checked_pointer_cast; | ||
|
||
namespace { | ||
namespace internal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not keep this in the anonymous namespace as it was?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this change to avoid a conflict with our other Converter class. Now that I see what I've done here, maybe it's better just to rename it to give it a better name and keep it in an anonymous namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would be better to rename IMHO.
/// ); | ||
/// \endcode | ||
ARROW_EXPORT | ||
Status ChunkedArrayFromJSONString(const std::shared_ptr<DataType>& type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not return a Result<std::shared_ptr<ChunkedArray>>
instead of taking an out-pointer parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I saw the inconsistency in these and don't love it. I haven't looked yet but it seems like we could make all of these helpers consistent (return Result<std::share_ptr<T>>
) instead of using out-params in some and Result in others. I'll take a look now so we don't have to make a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was easy to do, I put up a draft PR and converted ChunkedArrayFromJSONString in ede205e.
I can do the rest if you think that makes sense (I do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we should convert all of them.
I posted a couple comments, it would be nice to do a post-commit polish PR. |
I'll file an issue to follow-up on those. |
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 07778d9. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
These functions are generally useful and stable so it would be a good idea to clearly include them in the public API. I'm starting here with just the basic ones in order to make the PR small. See #45908 for more information.
What changes are included in this PR?
ArrayFromJSON
,ChunkedArrayFromJSON
,DictArrayFromJSON
,ScalarFromJSON
,DictScalarFromJSON
fromarrow::ipc::internal
namespace toarrow::json
so it's clearer they're part of the public API and that they're more useful than just for IPC{Array,...}FromJSON
to{Array,...}FromJSONString
to avoid confusion between these helpers and the main JSON(L) readerarrow/util/json_simple.{h,cc}
toarrow/json/from_string.{h,cc}
both because of the namespace jump but also because the filename is more clear.Are these changes tested?
Yes.
Are there any user-facing changes?
This expands the scope of our public API but does not break any existing public APIs though it's possible users
{Array,...}FromJSON
as public APIs #45908