Adding native support of AWS Cloudtrail input format #24479

zhaner08 · 2024-12-15T22:41:10Z

Description

Adding native support of AWS Cloudtrail input format

Additional context and related issues

Publishing this revision to get some feedback while testing is ongoing and tests are being added.

Specific question regarding the implementation:

The current code seems geared toward mapping multiple input formats to a single SerDe, rather than mapping a single InputFormat to multiple SerDes as this CR does. Is there a better way to do this? Or do we want to pass the input format down to TextLineReaderFactory so that it can create the CloudTrail line reader at creation time?
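To make the question concrete, here is a minimal sketch of resolving a format from the (SerDe, InputFormat) pair instead of from the SerDe alone. Everything in it is hypothetical: the class `FormatLookupSketch`, the record `FormatKey`, the placeholder class-name strings, and the returned format names are illustrative and are not taken from Trino or from this PR.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch: look up a storage format by the pair
 * (SerDe class name, InputFormat class name).
 */
final class FormatLookupSketch
{
    // Key pairing a table's SerDe with its InputFormat.
    record FormatKey(String serde, String inputFormat) {}

    // Placeholder class names, for illustration only.
    static final String HIVE_JSON_SERDE = "example.HiveJsonSerDe";
    static final String OPENX_JSON_SERDE = "example.OpenXJsonSerDe";
    static final String TEXT_INPUT = "example.TextInputFormat";
    static final String CLOUDTRAIL_INPUT = "example.CloudTrailInputFormat";

    // The same SerDe appears under two input formats, and the same
    // input format could later be registered with more than one SerDe.
    private static final Map<FormatKey, String> FORMATS = Map.of(
            new FormatKey(HIVE_JSON_SERDE, TEXT_INPUT), "JSON",
            new FormatKey(OPENX_JSON_SERDE, TEXT_INPUT), "OPENX_JSON",
            new FormatKey(HIVE_JSON_SERDE, CLOUDTRAIL_INPUT), "CLOUDTRAIL");

    private FormatLookupSketch() {}

    static Optional<String> resolve(String serde, String inputFormat)
    {
        return Optional.ofNullable(FORMATS.get(new FormatKey(serde, inputFormat)));
    }
}
```

With a pair-based key, an unsupported combination simply has no entry, so the caller can report it explicitly rather than silently falling back to the plain JSON reader.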

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( X) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

github-actions · 2025-01-06T17:03:04Z

This pull request has gone a while without any activity. Tagging for triage help: @mosabua

github-actions · 2025-01-28T17:02:54Z

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

mosabua · 2025-01-28T17:05:19Z

@zhaner08 and others attended the contributor call to ask for help with this PR. @pettyjamesm, @dain, and @electrum will be able to help drive this, so I added stale-ignore since we know this PR will come to completion over time.

mosabua · 2025-01-28T17:09:22Z

@zhaner08 let us know if you have any further questions or work planned on this PR, or if you are waiting for a first review beyond the input we provided during the contributor call.

zhaner08 · 2025-02-04T19:35:10Z

I will work on another revision of this PR this week.

zhaner08 · 2025-02-06T02:34:18Z

This PR is ready to be reviewed. As discussed during the call, it currently only supports the CloudTrail + Hive JSON combination.

mosabua · 2025-02-13T21:48:44Z

Added @electrum as a reviewer since he is the file system lead and was part of the discussion in the contributor call.

findinpath · 2025-02-14T14:37:05Z

plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveStorageFormat.java

@@ -101,6 +102,10 @@ public enum HiveStorageFormat
    REGEX(
            REGEX_SERDE_CLASS,
            TEXT_INPUT_FORMAT_CLASS,
+            HIVE_IGNORE_KEY_OUTPUT_FORMAT_CLASS),
+    CLOUDTRAIL(


Let's document it in hive.md so that Trino users know about it.

findinpath · 2025-02-14T14:40:06Z

plugin/trino-hive/src/test/java/io/trino/plugin/hive/HiveTestUtils.java

@@ -189,6 +190,7 @@ public static Set<HivePageSourceFactory> getDefaultHivePageSourceFactories(Trino
                .add(new RcFilePageSourceFactory(fileSystemFactory, hiveConfig))
                .add(new OrcPageSourceFactory(new OrcReaderConfig(), fileSystemFactory, stats, hiveConfig))
                .add(new ParquetPageSourceFactory(fileSystemFactory, stats, new ParquetReaderConfig(), hiveConfig))
+                .add(new CloudTrailJsonPageSourceFactory(fileSystemFactory, hiveConfig))


Let's add test coverage for this functionality.
Consider working with a resource directory to add coverage for the new code and showcase the new functionality.

dain · 2025-03-01T02:32:23Z

...o-hive-formats/src/main/java/io/trino/hive/formats/line/cloudtrail/CloudTrailLineReader.java

+import static java.nio.charset.StandardCharsets.UTF_8;
+import static java.util.Objects.requireNonNull;
+
+/*


Use a Javadoc comment.

dain · 2025-03-01T02:50:25Z

plugin/trino-hive/src/main/java/io/trino/plugin/hive/line/CloudTrailJsonPageSourceFactory.java

+    public CloudTrailJsonPageSourceFactory(TrinoFileSystemFactory trinoFileSystemFactory, HiveConfig config)
+    {
+        super(trinoFileSystemFactory,
+                new JsonDeserializerFactory(),


Is this correct? Is the CloudTrail reader code in Hive expected to be just a simple wrapper around the Hive JsonSerDe format, or does it have custom code? I ask because the exact Hive JsonSerDe has some strange behavior, and we need to be sure that this is the correct behavior.

zhaner08 · 2025-03-14

On the Athena side, we updated our docs about two years ago to suggest that customers use this Hive JSON wrapper. It simply extracts the JSON within 'records' and uses whatever SerDe the customer chooses to process the JSON itself. Customers can also choose to use OpenX JSON. We haven't heard any complaints about using Hive/OpenX JSON underneath, likely because CloudTrail logs are mostly properly formatted JSON. With this native wrapper, we are also planning to migrate people who use the emr.cloudtrail SerDe over to this; we are not planning to write an exact bug-for-bug copy of that SerDe.
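The "extract each record, then hand it to a JSON SerDe" idea can be sketched roughly as follows. This is a hypothetical illustration, not the `CloudTrailLineReader` from this PR: the class `CloudTrailRecordSplitter` and its method are invented names, and the sketch assumes the log document is a single JSON object whose top-level `"Records"` key holds an array of objects and that the text `"Records"` does not appear earlier in the document. It scans for balanced braces while skipping string contents, so it does not validate the JSON.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: splits {"Records":[{...},{...}]} into one JSON
 * string per record, so each can be passed to a line-oriented JSON reader.
 */
final class CloudTrailRecordSplitter
{
    private CloudTrailRecordSplitter() {}

    static List<String> splitRecords(String document)
    {
        List<String> records = new ArrayList<>();
        int key = document.indexOf("\"Records\"");
        if (key < 0) {
            return records;
        }
        int arrayStart = document.indexOf('[', key);
        if (arrayStart < 0) {
            return records;
        }

        int depth = 0;            // object nesting depth inside the array
        int recordStart = -1;     // index of the '{' that opens the current record
        boolean inString = false; // true while scanning inside a JSON string
        boolean escaped = false;  // true right after a backslash inside a string

        for (int i = arrayStart + 1; i < document.length(); i++) {
            char c = document.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;
                }
                else if (c == '\\') {
                    escaped = true;
                }
                else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            }
            else if (c == '{') {
                if (depth == 0) {
                    recordStart = i;
                }
                depth++;
            }
            else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    records.add(document.substring(recordStart, i + 1));
                }
            }
            else if (c == ']' && depth == 0) {
                // End of the Records array itself.
                break;
            }
        }
        return records;
    }
}
```

Each returned string is one complete record object, which is the unit a line-based JSON deserializer would then process. A real reader would stream the file rather than hold the whole document in memory.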

Initial implementation for supporting AWS Cloudtrail input format

c5c53d4

cla-bot bot added the cla-signed label Dec 15, 2024

github-actions bot added the hive label Dec 15, 2024

zhaner08 requested a review from dain December 15, 2024 22:41

github-actions bot added the stale label Jan 6, 2025

github-actions bot closed this Jan 28, 2025

mosabua added the stale-ignore label and removed the stale label Jan 28, 2025

mosabua reopened this Jan 28, 2025

zhaner08 added 4 commits February 5, 2025 21:17

Merge branch 'trinodb:master' into support_aws_cloudtrail_input_format

c4c910a

Cleanup and adding tests for the format

2215e8a

Update class name

a5776d0

Remove non related changes

12867a1

zhaner08 changed the title ~~[WIP] Adding native support of AWS Cloudtrail input format~~ Adding native support of AWS Cloudtrail input format Feb 6, 2025

zhaner08 self-assigned this Feb 6, 2025

zhaner08 requested a review from pettyjamesm February 6, 2025 02:35

Fix unit tests

0888235

mosabua requested a review from electrum February 13, 2025 21:48

findinpath reviewed Feb 14, 2025


dain reviewed Mar 1, 2025
