
Commit 78ba99b

[filesystem] Support use Hadoop dependencies from environment variables HADOOP_CLASSPATH (#1359)
1 parent 7ffcbf0 commit 78ba99b

File tree: 7 files changed (+50 lines, −2 lines)

fluss-common/src/main/java/org/apache/fluss/config/ConfigOptions.java

Lines changed: 7 additions & 1 deletion
```diff
@@ -105,8 +105,14 @@ public class ConfigOptions {
                     .asList()
                     .defaultValues(
                             ArrayUtils.concat(
+                                    // TODO: remove core-site after implement fluss hdfs security
+                                    // utils
                                     new String[] {
-                                        "java.", "org.apache.fluss.", "javax.annotation."
+                                        "java.",
+                                        "org.apache.fluss.",
+                                        "javax.annotation.",
+                                        "org.apache.hadoop.",
+                                        "core-site",
                                     },
                                     PARENT_FIRST_LOGGING_PATTERNS))
                     .withDescription(
```
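These entries are matched as plain name prefixes: a class (or resource, in the case of `core-site`) whose fully qualified name starts with one of them is resolved by the parent classloader first, so the Hadoop classes supplied via `HADOOP_CLASSPATH` are shared rather than reloaded per plugin. A rough illustration of that matching rule in shell (a sketch only, not the Fluss implementation):

```bash
# Illustrative sketch: parent-first patterns are simple name prefixes.
class="org.apache.hadoop.fs.FileSystem"
for prefix in "java." "org.apache.fluss." "javax.annotation." \
        "org.apache.hadoop." "core-site"; do
    case "$class" in
        "$prefix"*) echo "load parent-first: '$class' matches '$prefix'" ;;
    esac
done
```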

fluss-dist/src/main/resources/bin/config.sh

Lines changed: 5 additions & 0 deletions
```diff
@@ -32,6 +32,11 @@ constructFlussClassPath() {
         fi
     done < <(find "$FLUSS_LIB_DIR" ! -type d -name '*.jar' -print0 | sort -z)
 
+    # Add Hadoop dependencies from environment variables HADOOP_CLASSPATH
+    if [ -n "${HADOOP_CLASSPATH}" ]; then
+        FLUSS_CLASSPATH="$FLUSS_CLASSPATH":"$HADOOP_CLASSPATH"
+    fi
+
     local FLUSS_SERVER_COUNT
     FLUSS_SERVER_COUNT="$(echo "$FLUSS_SERVER" | tr -s ':' '\n' | grep -v '^$' | wc -l)"
```

fluss-lake/fluss-lake-iceberg/pom.xml

Lines changed: 6 additions & 0 deletions
```diff
@@ -115,6 +115,12 @@
             <artifactId>hadoop-mapreduce-client-core</artifactId>
             <version>2.8.5</version>
             <scope>test</scope>
+            <exclusions>
+                <exclusion>
+                    <artifactId>commons-io</artifactId>
+                    <groupId>commons-io</groupId>
+                </exclusion>
+            </exclusions>
         </dependency>
 
         <dependency>
```
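The exclusion keeps a second copy of commons-io off the test classpath. To see which dependency paths would otherwise pull it in, the standard Maven dependency tree works (module path assumed from this repo layout; run from the repository root):

```bash
# Show every dependency path that brings in commons-io for this module.
mvn -pl fluss-lake/fluss-lake-iceberg dependency:tree -Dincludes=commons-io:commons-io
```

The same check applies to `fluss-lake/fluss-lake-paimon`, which receives the identical exclusion below.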

fluss-lake/fluss-lake-paimon/pom.xml

Lines changed: 6 additions & 0 deletions
```diff
@@ -81,6 +81,12 @@
             <artifactId>hadoop-mapreduce-client-core</artifactId>
             <version>2.8.5</version>
             <scope>test</scope>
+            <exclusions>
+                <exclusion>
+                    <artifactId>commons-io</artifactId>
+                    <groupId>commons-io</groupId>
+                </exclusion>
+            </exclusions>
         </dependency>
 
         <dependency>
```

fluss-server/pom.xml

Lines changed: 6 additions & 0 deletions
```diff
@@ -133,6 +133,12 @@
                                 <include>*:*</include>
                             </includes>
                         </artifactSet>
+                        <relocations>
+                            <relocation>
+                                <pattern>org.apache.commons</pattern>
+                                <shadedPattern>org.apache.fluss.shaded.org.apache.commons</shadedPattern>
+                            </relocation>
+                        </relocations>
                     </configuration>
                 </execution>
             </executions>
```
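Relocating `org.apache.commons` into the `org.apache.fluss.shaded` namespace prevents the server's bundled commons classes from clashing with whatever versions arrive via `HADOOP_CLASSPATH`. One way to confirm the relocation after a build (the jar path is an assumption based on standard Maven layout):

```bash
# List a few relocated commons classes inside the shaded server jar.
jar tf fluss-server/target/fluss-server-*.jar \
    | grep 'org/apache/fluss/shaded/org/apache/commons' | head
```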

website/docs/maintenance/filesystems/hdfs.md

Lines changed: 14 additions & 0 deletions
````diff
@@ -53,3 +53,17 @@ fluss.hadoop.dfs.web.authentication.kerberos.principal: HTTP/[email protected]
 # Client principal and keytab (adjust paths as needed)
 fluss.hadoop.hadoop.security.kerberos.ticket.cache.path: /tmp/krb5cc_1000
 ```
+
+#### Use Machine Hadoop Environment Configuration
+
+Fluss ships with bundled Hadoop 3.3.4 libraries so that it can be deployed on machines without a Hadoop installation.
+For most use cases these work well. However, you should use your machine's native Hadoop environment if:
+1. Your HDFS uses Kerberos security
+2. You need to avoid version conflicts between Fluss's bundled Hadoop libraries and your HDFS cluster
+
+Fluss automatically loads the machine's HDFS dependencies via the `HADOOP_CLASSPATH` environment variable.
+Make sure that `HADOOP_CLASSPATH` is set (you can check by running `echo $HADOOP_CLASSPATH`).
+If it is not, set it with:
+```bash
+export HADOOP_CLASSPATH=`hadoop classpath`
+```
````
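As a quick sanity check before starting Fluss (assuming the `hadoop` CLI is on your `PATH`), confirm that the variable expands to your cluster's jars rather than the bundled 3.3.4 libraries:

```bash
hadoop version                                 # should report your cluster's Hadoop version
echo "$HADOOP_CLASSPATH" | tr ':' '\n' | head  # first few classpath entries
```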

website/docs/maintenance/tiered-storage/lakehouse-storage.md

Lines changed: 6 additions & 1 deletion
````diff
@@ -44,9 +44,14 @@ datalake.paimon.metastore: hive
 datalake.paimon.uri: thrift://<hive-metastore-host-name>:<port>
 datalake.paimon.warehouse: hdfs:///path/to/warehouse
 ```
+
 #### Add other jars required by datalake
 While Fluss includes the core Paimon library, additional jars may still need to be manually added to `${FLUSS_HOME}/plugins/paimon/` according to your needs.
-For example, for OSS filesystem support, you need to put `paimon-oss-<paimon_version>.jar` into directory `${FLUSS_HOME}/plugins/paimon/`.
+For example:
+- If you are using the Paimon filesystem catalog with the OSS filesystem, you need to put `paimon-oss-<paimon_version>.jar` into the directory `${FLUSS_HOME}/plugins/paimon/`.
+- If you are using the Paimon Hive catalog, you need to put [the Flink SQL Hive connector jar](https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/hive/overview/#using-bundled-hive-jar) into the directory `${FLUSS_HOME}/plugins/paimon/`.
+
+Additionally, when using Paimon with HDFS, you must also configure the Fluss server with the Hadoop environment. See the [HDFS setup guide](/docs/maintenance/filesystems/hdfs.md) for detailed instructions.
 
 ### Start The Datalake Tiering Service
 Then, you must start the datalake tiering service to tier Fluss's data to the lakehouse storage.
````
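Installing one of the jars mentioned in the diff above could look like this (the version number is a placeholder; use the Paimon version matching your deployment):

```bash
# Hypothetical example: add OSS filesystem support to the Paimon plugin directory.
cp paimon-oss-1.2.0.jar "${FLUSS_HOME}/plugins/paimon/"
```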
