GH-45908: [C++][Docs] Rename and expose basic {Array,...}FromJSON helpers as public APIs #46180

Merged
merged 33 commits into from
May 14, 2025

amoeba
Member

@amoeba amoeba commented Apr 18, 2025

Rationale for this change

These functions are generally useful and stable so it would be a good idea to clearly include them in the public API. I'm starting here with just the basic ones in order to make the PR small. See #45908 for more information.

What changes are included in this PR?

  • Moves ArrayFromJSON, ChunkedArrayFromJSON, DictArrayFromJSON, ScalarFromJSON, DictScalarFromJSON from arrow::ipc::internal namespace to arrow::json so it's clearer they're part of the public API and that they're more useful than just for IPC
  • Renames each of the above from {Array,...}FromJSON to {Array,...}FromJSONString to avoid confusion between these helpers and the main JSON(L) reader
  • Renames arrow/util/json_simple.{h,cc} to arrow/json/from_string.{h,cc} both because of the namespace jump but also because the filename is more clear.
  • Adds User Guide and adds a listing in the API docs for moved functions

Are these changes tested?

Yes.

Are there any user-facing changes?

This expands the scope of our public API but does not break any existing public APIs though it's possible users

amoeba added 2 commits April 20, 2025 18:00
This moves the following functions from the IPC namespace to util to make it clear these are useful outside of their use in Arrow IPC.

- ArrayFromJSON
- ChunkedArrayFromJSON
- DictArrayFromJSON
- ScalarFromJSON
- DictScalarFromJSON
@amoeba amoeba force-pushed the feature/GH-45908--expose-from-json branch from 3b6015e to cf808f9 Compare April 21, 2025 01:00
@amoeba amoeba changed the title GH-45908: [C++] Expose {Array,...}FromJSON as public APIs GH-45908: [C++] Expose basic {Array,...}FromJSON helpers as public APIs Apr 21, 2025
@amoeba amoeba marked this pull request as ready for review April 21, 2025 02:36
@amoeba amoeba requested review from bkietz, zanmato1984 and pitrou April 21, 2025 02:37
@amoeba
Member Author

amoeba commented Apr 21, 2025

@github-actions crossbow submit preview-docs

@amoeba amoeba changed the title GH-45908: [C++] Expose basic {Array,...}FromJSON helpers as public APIs GH-45908: [C++][Docs] Expose basic {Array,...}FromJSON helpers as public APIs Apr 21, 2025

Revision: cf808f9

Submitted crossbow builds: ursacomputing/crossbow @ actions-7a9d15701e

Task Status
preview-docs GitHub Actions

@amoeba
Member Author

amoeba commented Apr 21, 2025

Docs previews:

Contributor

@zanmato1984 zanmato1984 left a comment

Thank you for bringing this up! I have some minor comments.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Apr 21, 2025

github-actions bot commented May 8, 2025

Revision: 5cedcfe

Submitted crossbow builds: ursacomputing/crossbow @ actions-b687d19c46

Task Status
preview-docs GitHub Actions

@bkietz
Member

bkietz commented May 8, 2025

I would guess this failure is caused by the use of é (e with a diacritic). The bytes of a string literal aren't guaranteed to be UTF-8 encoded without a u8 or u8R prefix (see [lex.string]/7), like u8"héhé". We will probably need to revisit that.
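To illustrate the point about literal encoding, here is a minimal sketch that checks at compile time that a `u8`-prefixed literal holds the UTF-8 bytes for é (0xC3 0xA9). The helper names `ByteLength` and `ByteAt` are hypothetical and not part of Arrow. The literal is spelled with `\u00e9` so the example does not depend on the encoding of the source file itself.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes in a string literal, excluding the terminating NUL.
// Accepts both `char` (C++17) and `char8_t` (C++20) u8 literals.
template <typename CharT, std::size_t N>
constexpr std::size_t ByteLength(const CharT (&)[N]) {
  return (N - 1) * sizeof(CharT);
}

// Returns byte `i` of a literal as an unsigned value for easy comparison.
template <typename CharT, std::size_t N>
constexpr unsigned ByteAt(const CharT (&lit)[N], std::size_t i) {
  return static_cast<unsigned char>(lit[i]);
}

// \u00e9 is "é". With the u8 prefix each "é" is encoded as 0xC3 0xA9,
// independent of the compiler's ordinary literal encoding.
static_assert(ByteLength(u8"h\u00e9h\u00e9") == 6, "1 + 2 + 1 + 2 bytes");
static_assert(ByteAt(u8"h\u00e9h\u00e9", 1) == 0xC3, "first byte of é");
static_assert(ByteAt(u8"h\u00e9h\u00e9", 2) == 0xA9, "second byte of é");
```

Without the prefix, the same checks may or may not hold, depending on the compiler and its flags, which is consistent with only one CI job failing.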

@amoeba
Member Author

amoeba commented May 8, 2025

Interesting. It would be nice to know what it is about that particular job that makes it fail but I can file a minor PR for it since it sounds like a fix we should make.

@amoeba
Member Author

amoeba commented May 8, 2025

After the latest round of reviews, I noticed some errors with my inline example C++ code and realized it was probably better to write an actual example source file and literalinclude that from the docs. See 5cedcfe.

Latest docs preview

Can you look again @bkietz? Thanks.

@bkietz
Member

bkietz commented May 8, 2025

> It would be nice to know what it is about that particular job that makes it fail but I can file a minor PR for it since it sounds like a fix we should make.

The rest of the PR seems ready to go. For now, could you comment out any cases in the GDB test which reference UTF-8 strings to verify that's the only thing failing? (And keep CI green until you make the minor PR; I'll help review.)

@bkietz
Member

bkietz commented May 8, 2025

Well, those cases appear to have been responsible for that failure. After looking more closely, we've got an issue open for this already: #46343

@amoeba
Member Author

amoeba commented May 8, 2025

Nice catch. I commented there. I reverted the testing commit so the PR is in a merge-able state. Would you be okay if we merged with that one job failing or should we wait on #46343 and rebase before merge?

Contributor

@EnricoMi EnricoMi left a comment

Some minor comments.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels May 9, 2025
@amoeba
Member Author

amoeba commented May 9, 2025

Thanks @bkietz. There were a couple of comments from @EnricoMi and I'll merge this once those are resolved.

Contributor

@EnricoMi EnricoMi left a comment

LGTM!

@amoeba amoeba merged commit 07778d9 into apache:main May 14, 2025
41 of 42 checks passed
@amoeba amoeba removed the awaiting merge Awaiting merge label May 14, 2025
@amoeba
Member Author

amoeba commented May 14, 2025

Merged. Thanks all for the reviews, it's greatly appreciated.

@@ -2140,8 +2140,8 @@ class WriteFileSystemDatasetMixin : public MakeFileSystemDatasetMixin {
actual_struct = std::dynamic_pointer_cast<Array>(struct_array);
}

-auto expected_struct = ArrayFromJSON(struct_(expected_physical_schema_->fields()),
-file_contents->second);
+auto expected_struct = arrow::ArrayFromJSON(
Member

It's not obvious to me why you needed to add explicit arrow:: prefixes here?

Member Author

You're right, nice catch. I can remove this. Is there an automated way to catch things like this?

Member

Not to my knowledge, no.
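As background for why the explicit prefix is redundant, here is a minimal sketch of unqualified name lookup from a nested namespace. The namespaces and functions (`demo`, `demo::dataset`, `MakeLabel`) are hypothetical stand-ins, not Arrow code, and the sketch assumes no competing declaration of the same name in the inner namespace.

```cpp
#include <cassert>
#include <string>

namespace demo {

// Stand-in for a helper declared in the outer namespace.
inline std::string MakeLabel(const std::string& s) { return "[" + s + "]"; }

namespace dataset {

// Unqualified lookup searches the enclosing namespaces, so `demo::` is
// not needed here.
inline std::string Unqualified() { return MakeLabel("x"); }

// The explicitly qualified call resolves to the same function.
inline std::string Qualified() { return demo::MakeLabel("x"); }

}  // namespace dataset
}  // namespace demo
```

Both functions call the same helper, so the qualified form only adds noise unless another `MakeLabel` is introduced closer to the call site.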

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label May 14, 2025
namespace json {

using ::arrow::internal::checked_cast;
using ::arrow::internal::checked_pointer_cast;

-namespace {
+namespace internal {
Member

Why not keep this in the anonymous namespace as it was?

Member Author

I made this change to avoid a conflict with our other Converter class. Now that I see what I've done here, maybe it's better just to rename it to give it a better name and keep it in an anonymous namespace?

Member

Yes, it would be better to rename IMHO.
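The rename-and-keep-anonymous option discussed in this thread could look roughly like the following sketch. All names here (`demo::json`, `JSONStringConverter`, `DescribeHelper`) are hypothetical; the actual class name chosen in Arrow is not stated in this conversation.

```cpp
#include <cassert>
#include <string>

namespace demo {
namespace json {

// Stand-in for an existing, externally visible class named Converter.
class Converter {
 public:
  std::string Name() const { return "reader converter"; }
};

namespace {

// File-local helper with a distinct (hypothetical) name. The unnamed
// namespace gives it internal linkage, and the new name avoids clashing
// with json::Converter above.
class JSONStringConverter {
 public:
  std::string Name() const { return "from-string converter"; }
};

}  // namespace

// Public entry point that uses the file-local helper internally.
std::string DescribeHelper() { return JSONStringConverter().Name(); }

}  // namespace json
}  // namespace demo
```

Compared with moving the helper into a named `internal` namespace, this keeps the symbol out of the library's exported names entirely.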

/// );
/// \endcode
ARROW_EXPORT
Status ChunkedArrayFromJSONString(const std::shared_ptr<DataType>& type,
Member

Why not return a Result<std::shared_ptr<ChunkedArray>> instead of taking an out-pointer parameter?

Member Author

Yeah, I saw the inconsistency in these and don't love it. I haven't looked yet but it seems like we could make all of these helpers consistent (return Result<std::shared_ptr<T>>) instead of using out-params in some and Result in others. I'll take a look now so we don't have to make a breaking change.

Member Author

This was easy to do, I put up a draft PR and converted ChunkedArrayFromJSONString in ede205e.

I can do the rest if you think that makes sense (I do).

Member

Yes, I think we should convert all of them.
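To make the two calling conventions in this thread concrete, here is a self-contained sketch contrasting an out-parameter signature with a value-returning one. The `Status` and `Result` types below are deliberately tiny stand-ins written for this example, not Arrow's real classes, and `ParseIntOut` / `ParseInt` are hypothetical functions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Minimal stand-ins, NOT Arrow's real Status/Result classes.
class Status {
 public:
  Status() = default;  // default-constructed means success
  explicit Status(std::string message) : message_(std::move(message)) {}
  bool ok() const { return message_.empty(); }
  const std::string& message() const { return message_; }

 private:
  std::string message_;
};

template <typename T>
class Result {
 public:
  Result(T value) : state_(std::move(value)) {}
  Result(Status error) : state_(std::move(error)) {}
  bool ok() const { return std::holds_alternative<T>(state_); }
  const T& value() const { return std::get<T>(state_); }
  std::string error() const {
    return ok() ? std::string() : std::get<Status>(state_).message();
  }

 private:
  std::variant<T, Status> state_;
};

// Out-parameter style: the caller declares the output first and passes a
// pointer to it. (std::stoi throws on non-numeric text; ignored here.)
Status ParseIntOut(const std::string& text, std::shared_ptr<int>* out) {
  if (text.empty()) return Status("empty input");
  *out = std::make_shared<int>(std::stoi(text));
  return Status();
}

// Result style: the value or the error comes back in one return value.
Result<std::shared_ptr<int>> ParseInt(const std::string& text) {
  if (text.empty()) return Status("empty input");
  return std::make_shared<int>(std::stoi(text));
}
```

With the second form a caller writes `auto r = ParseInt("42");` and checks `r.ok()`, without declaring an output variable beforehand, which is the consistency being asked for across the helpers.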

@pitrou
Member

pitrou commented May 14, 2025

I posted a couple comments, it would be nice to do a post-commit polish PR.

@amoeba
Member Author

amoeba commented May 14, 2025

I'll file an issue to follow-up on those.

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 07778d9.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

7 participants