
Commit d649042

[docs] Add a section to configure hadoop related configuration in hdfs remote storage (#1501)
Co-authored-by: luoyuxia <[email protected]>
Parent: 13930df

File tree

  • website/docs/maintenance/filesystems/hdfs.md

1 file changed (+39, −11 lines)

website/docs/maintenance/filesystems/hdfs.md

Lines changed: 39 additions & 11 deletions
@@ -9,19 +9,47 @@ supports HDFS as a remote storage.
## Configurations setup

To enable HDFS as remote storage, you need to define the HDFS path as the remote storage in Fluss' `server.yaml`:

```yaml title="conf/server.yaml"
# The directory used as the remote storage of Fluss
remote.data.dir: hdfs://namenode:50010/path/to/remote/storage
```

### Configure Hadoop-related configurations

Sometimes, you may want to configure how Fluss accesses your Hadoop filesystem. Fluss supports three methods for loading the Hadoop configuration, listed in order of priority (highest to lowest):

1. **Fluss Configuration with `fluss.hadoop.*` Prefix.** Any configuration key prefixed with `fluss.hadoop.` in your `server.yaml` will be passed directly to the Hadoop configuration, with the prefix stripped (see the sketch after this list).
2. **Environment Variables.** The system automatically searches for Hadoop configuration files in these locations:
   - `$HADOOP_CONF_DIR` (if set)
   - `$HADOOP_HOME/conf` (if `HADOOP_HOME` is set)
   - `$HADOOP_HOME/etc/hadoop` (if `HADOOP_HOME` is set)
3. **Classpath Loading.** Configuration files (`core-site.xml`, `hdfs-site.xml`) found on the classpath are loaded automatically.
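
To make the first rule concrete, here is a minimal, hypothetical Java sketch of the prefix stripping; the class and method names are invented for illustration and this is not Fluss' actual implementation:

```java
// A minimal sketch of the fluss.hadoop.* prefix-stripping rule: keys that
// start with "fluss.hadoop." are copied into a Hadoop Configuration with
// the prefix removed. Names here are hypothetical, not Fluss internals.
import org.apache.hadoop.conf.Configuration;

import java.util.Map;

public class HadoopConfFromFluss {
    private static final String PREFIX = "fluss.hadoop.";

    public static Configuration toHadoopConf(Map<String, String> flussServerYaml) {
        Configuration hadoopConf = new Configuration();
        for (Map.Entry<String, String> entry : flussServerYaml.entrySet()) {
            if (entry.getKey().startsWith(PREFIX)) {
                // e.g. "fluss.hadoop.fs.defaultFS" -> "fs.defaultFS"
                hadoopConf.set(entry.getKey().substring(PREFIX.length()), entry.getValue());
            }
        }
        return hadoopConf;
    }
}
```

Under this rule, `fluss.hadoop.fs.defaultFS: hdfs://mycluster` in `server.yaml` ends up as `fs.defaultFS=hdfs://mycluster` in the resulting Hadoop configuration.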

#### Configuration Examples

Here's an example of setting up the Hadoop configuration in `server.yaml`:

```yaml title="conf/server.yaml"
# All of the following Hadoop-related configurations are just a demonstration of
# how to configure Hadoop settings in `server.yaml`; you may not need to configure them.

# Basic HA Hadoop configuration using the fluss.hadoop.* prefix
fluss.hadoop.fs.defaultFS: hdfs://mycluster
fluss.hadoop.dfs.nameservices: mycluster
fluss.hadoop.dfs.ha.namenodes.mycluster: nn1,nn2
fluss.hadoop.dfs.namenode.rpc-address.mycluster.nn1: namenode1:9000
fluss.hadoop.dfs.namenode.rpc-address.mycluster.nn2: namenode2:9000
fluss.hadoop.dfs.namenode.http-address.mycluster.nn1: namenode1:9870
fluss.hadoop.dfs.namenode.http-address.mycluster.nn2: namenode2:9870
fluss.hadoop.dfs.ha.automatic-failover.enabled: true
fluss.hadoop.dfs.client.failover.proxy.provider.mycluster: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

# Optional: Kerberos authentication, if your cluster requires it
fluss.hadoop.hadoop.security.authentication: kerberos
fluss.hadoop.hadoop.security.authorization: true
fluss.hadoop.dfs.namenode.kerberos.principal: hdfs/[email protected]
fluss.hadoop.dfs.datanode.kerberos.principal: hdfs/[email protected]
fluss.hadoop.dfs.web.authentication.kerberos.principal: HTTP/[email protected]
# Client ticket cache (adjust the path as needed)
fluss.hadoop.hadoop.security.kerberos.ticket.cache.path: /tmp/krb5cc_1000
```
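
When Kerberos is involved, it can save time to verify the principal, keytab, and HA settings outside of Fluss first. The following is a small, self-contained Java sketch using Hadoop's standard `UserGroupInformation` and `FileSystem` APIs; the cluster name, principal, keytab path, and storage path are placeholders modeled on the example above, not values Fluss requires:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosHdfsCheck {
    public static void main(String[] args) throws Exception {
        // Mirror the fluss.hadoop.* keys from server.yaml, without the prefix.
        // For an HA cluster, the dfs.nameservices / namenode address keys from
        // the example above (or a core-site.xml / hdfs-site.xml on the
        // classpath) are also required for "hdfs://mycluster" to resolve.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("hadoop.security.authentication", "kerberos");

        // Placeholder principal and keytab; replace with your own.
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
                "fluss/[email protected]", "/etc/security/keytabs/fluss.keytab");

        // A successful exists() call confirms that authentication and name
        // resolution both work before pointing remote.data.dir at this path.
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/path/to/remote/storage")));
    }
}
```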
