Commit b423dc1

xiaoxmeng authored and meta-codesync[bot] committed
feat: Add Paimon connector data structures and scaffolding (facebookincubator#16674)
Summary:
Pull Request resolved: facebookincubator#16674

CONTEXT: Paimon is a lake format that organizes data files in an LSM-tree structure per partition/bucket. This diff bootstraps the Paimon integration under velox/connectors/hive/paimon/, following the same pattern as Iceberg (hive/iceberg/).

WHAT: Adds the foundational data structures and connector scaffolding for Paimon table format support:
- PaimonConnectorSplit: Extends ConnectorSplit. Represents a Paimon DataSplit (one partition x bucket) containing multiple data files across LSM-tree levels. Includes rawConvertible validation (deleteRowCount must be 0).
- PaimonConnectorSplitBuilder: Fluent builder for test convenience.
- PaimonDataFile: Per-file metadata including path, size, rowCount, level, sequence numbers, deleteRowCount, Type (kData/kChangelog), Source (kAppend/kCompact), and optional PaimonDeletionFile.
- PaimonDeletionFile: Deletion bitmap metadata (path, offset, length, cardinality) with constructor validation.
- PaimonRowKind: Row-level change type enum (+I/-U/+U/-D) matching Paimon Java's RowKind values, with kRowKindColumn constant for the hidden _rowkind system column.
- PaimonChangelogMode: Table-level changelog producer mode enum (kNone/kInput/kLookup/kFullCompaction).
- PaimonConnector: Extends HiveConnector, creates PaimonDataSource instances.
- PaimonDataSource: Extends HiveDataSource with addSplit/next stubs (VELOX_NYI). Implementation in follow-up diff.
- PaimonConfig: Placeholder for Paimon-specific configuration.
- Full serialization/deserialization support for all types.

Reviewed By: srsuryadev, zzhao0, tanjialiang

Differential Revision: D95760777

fbshipit-source-id: ce2bef799dabaf936a485590110177dbc2247902
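For readers who want a concrete picture of the split-level types listed above, here is a minimal, self-contained sketch. It is not the code from this commit: the `split_sketch` namespace, the struct layouts, and the specific constructor checks are assumptions made for illustration. The only parts taken from the summary are the four row kinds with their short strings, the deletion file fields (path, offset, length, cardinality), and the rule that a raw-convertible split must have `deleteRowCount == 0`. The numeric enum values follow Paimon Java's `RowKind` byte values.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

namespace split_sketch {

// Numeric values follow Paimon Java's RowKind byte values.
enum class RowKind : int8_t {
  kInsert = 0,
  kUpdateBefore = 1,
  kUpdateAfter = 2,
  kDelete = 3,
};

// Returns the short string form of a row kind.
inline std::string_view shortString(RowKind kind) {
  switch (kind) {
    case RowKind::kInsert:
      return "+I";
    case RowKind::kUpdateBefore:
      return "-U";
    case RowKind::kUpdateAfter:
      return "+U";
    case RowKind::kDelete:
      return "-D";
  }
  throw std::invalid_argument("Unknown RowKind");
}

// Deletion bitmap metadata. The exact checks are assumed for illustration.
struct DeletionFile {
  DeletionFile(
      std::string path_,
      int64_t offset_,
      int64_t length_,
      int64_t cardinality_)
      : path(std::move(path_)),
        offset(offset_),
        length(length_),
        cardinality(cardinality_) {
    if (path.empty() || offset < 0 || length <= 0 || cardinality < 0) {
      throw std::invalid_argument("Invalid deletion file metadata");
    }
  }

  std::string path;
  int64_t offset;
  int64_t length;
  int64_t cardinality;
};

// Hypothetical, trimmed-down per-file metadata.
struct DataFile {
  std::string path;
  int64_t rowCount{0};
  int32_t level{0};
  int64_t deleteRowCount{0};
  std::optional<DeletionFile> deletionFile;
};

// Throws if a split flagged as raw-convertible contains deleted rows.
inline void validateRawConvertible(
    bool rawConvertible,
    const std::vector<DataFile>& files) {
  if (!rawConvertible) {
    return;
  }
  for (const auto& file : files) {
    if (file.deleteRowCount != 0) {
      throw std::invalid_argument(
          "rawConvertible requires deleteRowCount == 0: " + file.path);
    }
  }
}

// Usage example; call from a test or from main().
inline void demo() {
  assert(shortString(RowKind::kUpdateBefore) == "-U");
  std::vector<DataFile> files;
  files.push_back(DataFile{"bucket-0/data-0.orc", 100, 0, 0, std::nullopt});
  validateRawConvertible(true, files); // Passes: no deleted rows.
}

} // namespace split_sketch
```

Rejecting inconsistent metadata at construction time means a reader implementation can rely on the invariant later instead of re-checking it for every file.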
1 parent 4d93698 commit b423dc1

21 files changed: 2747 additions, 0 deletions

velox/connectors/hive/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ velox_add_library(velox_hive_config OBJECT HiveConfig.cpp)
 velox_link_libraries(velox_hive_config velox_core velox_exception)

 add_subdirectory(iceberg)
+add_subdirectory(paimon)

 velox_add_library(
   velox_hive_connector
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+velox_add_library(
+  velox_hive_paimon_split
+  PaimonConnectorSplit.cpp
+  PaimonDataFileMeta.cpp
+  PaimonDeletionFile.cpp
+  PaimonRowKind.cpp
+)
+
+velox_link_libraries(
+  velox_hive_paimon_split
+  velox_connector
+  velox_hive_connector
+  fmt::fmt
+)
+
+velox_add_library(
+  velox_hive_paimon_connector
+  PaimonConnector.cpp
+  PaimonDataSource.cpp
+)
+
+velox_link_libraries(
+  velox_hive_paimon_connector
+  velox_hive_paimon_split
+  velox_hive_connector
+)
+
+if(${VELOX_BUILD_TESTING})
+  add_subdirectory(tests)
+endif()
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+/*
+ * Copyright (c) Facebook, Inc. and its affiliates.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#pragma once
+
+#include "velox/common/config/Config.h"
+
+namespace facebook::velox::connector::hive::paimon {
+
+/// Paimon-specific connector configuration.
+/// Wraps the shared ConfigBase and provides accessors for Paimon settings.
+class PaimonConfig {
+ public:
+  explicit PaimonConfig(std::shared_ptr<const config::ConfigBase> config)
+      : config_(std::move(config)) {
+    VELOX_CHECK_NOT_NULL(config_);
+  }
+
+  const std::shared_ptr<const config::ConfigBase>& config() const {
+    return config_;
+  }
+
+ private:
+  const std::shared_ptr<const config::ConfigBase> config_;
+};
+
+} // namespace facebook::velox::connector::hive::paimon
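PaimonConfig is a placeholder in this commit, so the following is only a sketch of how a typed accessor could later sit on top of the wrapped config. Everything here is hypothetical: `FakeConfigBase` is a stand-in for `config::ConfigBase` so the example compiles without Velox, and the key `paimon.max-files-per-split` and its default of 64 are invented for illustration and do not appear in the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

namespace config_sketch {

// Stand-in for config::ConfigBase: an immutable string-to-string map.
class FakeConfigBase {
 public:
  explicit FakeConfigBase(std::unordered_map<std::string, std::string> values)
      : values_(std::move(values)) {}

  std::optional<std::string> get(const std::string& key) const {
    auto it = values_.find(key);
    if (it == values_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

 private:
  const std::unordered_map<std::string, std::string> values_;
};

// Same wrapper shape as PaimonConfig, plus one hypothetical typed accessor.
class PaimonConfigSketch {
 public:
  // Hypothetical key and default; not part of the commit.
  static constexpr const char* kMaxFilesPerSplit = "paimon.max-files-per-split";
  static constexpr int32_t kDefaultMaxFilesPerSplit = 64;

  explicit PaimonConfigSketch(std::shared_ptr<const FakeConfigBase> config)
      : config_(std::move(config)) {
    // Plays the role of VELOX_CHECK_NOT_NULL in the real class.
    if (config_ == nullptr) {
      throw std::invalid_argument("config must not be null");
    }
  }

  // Parses the setting if present, otherwise falls back to the default.
  int32_t maxFilesPerSplit() const {
    const auto value = config_->get(kMaxFilesPerSplit);
    return value.has_value() ? std::stoi(*value) : kDefaultMaxFilesPerSplit;
  }

 private:
  const std::shared_ptr<const FakeConfigBase> config_;
};

// Usage example; call from a test or from main().
inline void demo() {
  std::unordered_map<std::string, std::string> values;
  values[PaimonConfigSketch::kMaxFilesPerSplit] = "8";
  auto base = std::make_shared<const FakeConfigBase>(std::move(values));
  PaimonConfigSketch config(base);
  assert(config.maxFilesPerSplit() == 8);
}

} // namespace config_sketch
```

Keeping key names and defaults inside the wrapper gives call sites a typed method to use instead of raw string lookups.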
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+/*
+ * Copyright (c) Facebook, Inc. and its affiliates.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include "velox/connectors/hive/paimon/PaimonConnector.h"
+
+#include "velox/connectors/hive/paimon/PaimonDataSource.h"
+
+namespace facebook::velox::connector::hive::paimon {
+
+PaimonConnector::PaimonConnector(
+    const std::string& id,
+    std::shared_ptr<const config::ConfigBase> config,
+    folly::Executor* ioExecutor)
+    : HiveConnector(id, config, ioExecutor),
+      paimonConfig_(std::make_shared<PaimonConfig>(connectorConfig())) {}
+
+std::unique_ptr<DataSource> PaimonConnector::createDataSource(
+    const RowTypePtr& outputType,
+    const ConnectorTableHandlePtr& tableHandle,
+    const ColumnHandleMap& columnHandles,
+    ConnectorQueryCtx* connectorQueryCtx) {
+  return std::make_unique<PaimonDataSource>(
+      outputType,
+      tableHandle,
+      columnHandles,
+      &fileHandleFactory_,
+      ioExecutor_,
+      connectorQueryCtx,
+      hiveConfig_);
+}
+
+} // namespace facebook::velox::connector::hive::paimon
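The connector above only swaps in a different data source, and per the commit summary that data source currently has `addSplit`/`next` stubs that raise VELOX_NYI. The sketch below mimics that scaffolding pattern without depending on Velox. All names in it (`DataSourceBase`, `ConnectorBase`, `NotYetImplemented`, and so on) are stand-ins invented for this example, and the method signatures are deliberately simplified rather than matching the real `DataSource` interface.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

namespace datasource_sketch {

// Stand-in for the error raised by VELOX_NYI.
class NotYetImplemented : public std::logic_error {
 public:
  explicit NotYetImplemented(const std::string& what)
      : std::logic_error(what + " is not yet implemented") {}
};

// Simplified stand-in for the DataSource interface.
class DataSourceBase {
 public:
  virtual ~DataSourceBase() = default;
  virtual void addSplit(const std::string& splitId) = 0;
  virtual bool next() = 0;
};

// Scaffolding: the type exists and can be created, but reading is stubbed.
class StubPaimonDataSource final : public DataSourceBase {
 public:
  void addSplit(const std::string& /*splitId*/) override {
    throw NotYetImplemented("addSplit");
  }

  bool next() override {
    throw NotYetImplemented("next");
  }
};

// Simplified stand-in for the base connector.
class ConnectorBase {
 public:
  explicit ConnectorBase(std::string id) : id_(std::move(id)) {}
  virtual ~ConnectorBase() = default;

  const std::string& id() const {
    return id_;
  }

  virtual std::unique_ptr<DataSourceBase> createDataSource() = 0;

 private:
  const std::string id_;
};

// Overrides only the factory method, like PaimonConnector does.
class PaimonLikeConnector final : public ConnectorBase {
 public:
  using ConnectorBase::ConnectorBase;

  std::unique_ptr<DataSourceBase> createDataSource() override {
    return std::make_unique<StubPaimonDataSource>();
  }
};

// Usage example: returns true if the stub rejects addSplit as expected.
inline bool demoAddSplitIsStubbed() {
  PaimonLikeConnector connector("test-paimon");
  auto source = connector.createDataSource();
  assert(source != nullptr);
  try {
    source->addSplit("partition=1/bucket-0");
  } catch (const NotYetImplemented&) {
    return true;
  }
  return false;
}

} // namespace datasource_sketch
```

Landing the types with failing stubs first lets build wiring, registration, and serialization be reviewed and tested before any read logic exists.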
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+/*
+ * Copyright (c) Facebook, Inc. and its affiliates.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#pragma once
+
+#include "velox/connectors/hive/HiveConnector.h"
+#include "velox/connectors/hive/paimon/PaimonConfig.h"
+
+namespace facebook::velox::connector::hive::paimon {
+
+/// Provides Paimon table format support by extending HiveConnector.
+///
+/// Creates PaimonDataSource instances that handle Paimon's multi-file splits
+/// (one split = one bucket with multiple data files across LSM-tree levels).
+/// Reuses HiveConnector's ORC/Parquet readers directly — no Arrow bridge.
+class PaimonConnector final : public HiveConnector {
+ public:
+  PaimonConnector(
+      const std::string& id,
+      std::shared_ptr<const config::ConfigBase> config,
+      folly::Executor* ioExecutor);
+
+  /// Creates PaimonDataSource for reading from Paimon tables.
+  std::unique_ptr<DataSource> createDataSource(
+      const RowTypePtr& outputType,
+      const ConnectorTableHandlePtr& tableHandle,
+      const ColumnHandleMap& columnHandles,
+      ConnectorQueryCtx* connectorQueryCtx) override;
+
+ private:
+  const std::shared_ptr<PaimonConfig> paimonConfig_;
+};
+
+class PaimonConnectorFactory final : public ConnectorFactory {
+ public:
+  static constexpr const char* kPaimonConnectorName = "paimon";
+
+  PaimonConnectorFactory() : ConnectorFactory(kPaimonConnectorName) {}
+
+  std::shared_ptr<Connector> newConnector(
+      const std::string& id,
+      std::shared_ptr<const config::ConfigBase> config,
+      folly::Executor* ioExecutor = nullptr,
+      [[maybe_unused]] folly::Executor* cpuExecutor = nullptr) override {
+    return std::make_shared<PaimonConnector>(id, config, ioExecutor);
+  }
+};
+
+} // namespace facebook::velox::connector::hive::paimon
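PaimonConnectorFactory follows the named-factory pattern: a factory is identified by a fixed name ("paimon") and produces connector instances for a given id. The sketch below shows that pattern in isolation, assuming a simple name-keyed registry. The `FactoryRegistry` class and every other type here are hypothetical stand-ins and are not Velox's registration API; only the name "paimon" comes from the header above.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace factory_sketch {

// Simplified stand-in for Connector.
class Connector {
 public:
  explicit Connector(std::string id) : id_(std::move(id)) {}
  virtual ~Connector() = default;

  const std::string& id() const {
    return id_;
  }

 private:
  const std::string id_;
};

// Simplified stand-in for ConnectorFactory: a name plus a creation method.
class ConnectorFactory {
 public:
  explicit ConnectorFactory(std::string name) : name_(std::move(name)) {}
  virtual ~ConnectorFactory() = default;

  const std::string& name() const {
    return name_;
  }

  virtual std::shared_ptr<Connector> newConnector(const std::string& id) = 0;

 private:
  const std::string name_;
};

class PaimonLikeConnector final : public Connector {
 public:
  using Connector::Connector;
};

class PaimonLikeFactory final : public ConnectorFactory {
 public:
  static constexpr const char* kName = "paimon";

  PaimonLikeFactory() : ConnectorFactory(kName) {}

  std::shared_ptr<Connector> newConnector(const std::string& id) override {
    return std::make_shared<PaimonLikeConnector>(id);
  }
};

// Hypothetical registry keyed by factory name.
class FactoryRegistry {
 public:
  // Returns false if a factory with the same name is already registered.
  bool add(std::shared_ptr<ConnectorFactory> factory) {
    const std::string name = factory->name();
    return factories_.emplace(name, std::move(factory)).second;
  }

  // Returns nullptr if no factory with the given name exists.
  std::shared_ptr<Connector> create(
      const std::string& factoryName,
      const std::string& connectorId) const {
    auto it = factories_.find(factoryName);
    if (it == factories_.end()) {
      return nullptr;
    }
    return it->second->newConnector(connectorId);
  }

 private:
  std::map<std::string, std::shared_ptr<ConnectorFactory>> factories_;
};

// Usage example; call from a test or from main().
inline void demo() {
  FactoryRegistry registry;
  const bool added = registry.add(std::make_shared<PaimonLikeFactory>());
  assert(added);
  auto connector = registry.create(PaimonLikeFactory::kName, "test-paimon");
  assert(connector != nullptr && connector->id() == "test-paimon");
}

} // namespace factory_sketch
```

Looking factories up by name lets callers create connectors from configuration without referring to the concrete connector class.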
