
Commit f77ee9e

[docs] Add a section to configure hadoop related configuration in hdfs remote storage (#1518)
Co-authored-by: luoyuxia <[email protected]>
1 parent: cac54c5

website/docs/maintenance/filesystems/hdfs.md

Lines changed: 39 additions & 11 deletions
## Configurations setup

To enable HDFS as remote storage, you need to define the HDFS path as the remote storage in Fluss' `server.yaml`:

```yaml title="conf/server.yaml"
# The directory used as the remote storage of Fluss
remote.data.dir: hdfs://namenode:50010/path/to/remote/storage
```
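The value of `remote.data.dir` is an ordinary Hadoop filesystem URI. The following is a minimal sketch, not part of Fluss itself; it only assumes the Hadoop client libraries on the classpath and a reachable namenode, and shows roughly how such a URI resolves to a `FileSystem`:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteDirProbe {
    public static void main(String[] args) throws Exception {
        // The same URI as remote.data.dir in server.yaml above.
        Path remoteDir = new Path("hdfs://namenode:50010/path/to/remote/storage");
        // Path.getFileSystem selects the implementation for the scheme
        // (hdfs:// -> DistributedFileSystem) from the Hadoop configuration.
        FileSystem fs = remoteDir.getFileSystem(new Configuration());
        System.out.println("remote.data.dir reachable: " + fs.exists(remoteDir));
    }
}
```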
### Configure Hadoop related configurations

Sometimes you may want to configure how Fluss accesses your Hadoop filesystem. Fluss supports three methods for loading Hadoop configuration, listed in order of priority (highest to lowest); a sketch of this loading order follows the list:

1. **Fluss Configuration with `fluss.hadoop.*` Prefix.** Any configuration key prefixed with `fluss.hadoop.` in your `server.yaml` will be passed directly to the Hadoop configuration, with the prefix stripped.
2. **Environment Variables.** The system automatically searches for Hadoop configuration files in these locations:
   - `$HADOOP_CONF_DIR` (if set)
   - `$HADOOP_HOME/conf` (if `HADOOP_HOME` is set)
   - `$HADOOP_HOME/etc/hadoop` (if `HADOOP_HOME` is set)
3. **Classpath Loading.** Configuration files (`core-site.xml`, `hdfs-site.xml`) found on the classpath are loaded automatically.
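Here is a minimal sketch of that loading order, not Fluss's actual implementation: the `HadoopConfLoader` class and its `load` method are hypothetical names, while `Configuration.addResource` and `Configuration.set` are standard Hadoop APIs. Classpath defaults are applied first so the higher-priority sources can override them:

```java
import java.io.File;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Hypothetical helper illustrating the documented priority order.
public final class HadoopConfLoader {

    private static final String PREFIX = "fluss.hadoop.";

    public static Configuration load(Map<String, String> flussServerYaml) {
        // 3. Classpath loading (lowest priority): `new Configuration(true)`
        //    reads core-default.xml and core-site.xml found on the classpath.
        Configuration conf = new Configuration(true);

        // 2. Environment variables: probe the documented locations in order
        //    and add the first directory that contains core-site.xml.
        String hadoopHome = System.getenv("HADOOP_HOME");
        String[] candidates = {
            System.getenv("HADOOP_CONF_DIR"),
            hadoopHome == null ? null : hadoopHome + "/conf",
            hadoopHome == null ? null : hadoopHome + "/etc/hadoop",
        };
        for (String dir : candidates) {
            if (dir != null && new File(dir, "core-site.xml").exists()) {
                conf.addResource(new Path(dir, "core-site.xml"));
                if (new File(dir, "hdfs-site.xml").exists()) {
                    conf.addResource(new Path(dir, "hdfs-site.xml"));
                }
                break;
            }
        }

        // 1. fluss.hadoop.* keys (highest priority): strip the prefix and
        //    set the value directly; Configuration.set overrides resources.
        for (Map.Entry<String, String> e : flussServerYaml.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                conf.set(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return conf;
    }
}
```

The merge details in a real deployment may differ; the point is only that prefixed keys override file-based settings, which in turn override classpath defaults.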
#### Configuration Examples

Here's an example of setting up the Hadoop configuration in `server.yaml`:

```yaml title="conf/server.yaml"
# All the Hadoop-related configurations below are only a demonstration of how
# to configure Hadoop in `server.yaml`; you may not need to configure them.

# Basic HA Hadoop configuration using the fluss.hadoop.* prefix
fluss.hadoop.fs.defaultFS: hdfs://mycluster
fluss.hadoop.dfs.nameservices: mycluster
fluss.hadoop.dfs.ha.namenodes.mycluster: nn1,nn2
fluss.hadoop.dfs.namenode.rpc-address.mycluster.nn1: namenode1:9000
fluss.hadoop.dfs.namenode.rpc-address.mycluster.nn2: namenode2:9000
fluss.hadoop.dfs.namenode.http-address.mycluster.nn1: namenode1:9870
fluss.hadoop.dfs.namenode.http-address.mycluster.nn2: namenode2:9870
fluss.hadoop.dfs.ha.automatic-failover.enabled: true
fluss.hadoop.dfs.client.failover.proxy.provider.mycluster: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

# Optional: Kerberos authentication may be required
fluss.hadoop.hadoop.security.authentication: kerberos
fluss.hadoop.hadoop.security.authorization: true
fluss.hadoop.dfs.namenode.kerberos.principal: hdfs/[email protected]
fluss.hadoop.dfs.datanode.kerberos.principal: hdfs/[email protected]
fluss.hadoop.dfs.web.authentication.kerberos.principal: HTTP/[email protected]
# Client Kerberos ticket cache (adjust the path as needed)
fluss.hadoop.hadoop.security.kerberos.ticket.cache.path: /tmp/krb5cc_1000
```
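To sanity-check a setup like the HA example above, you can feed the same keys (prefix stripped) into a plain Hadoop client and resolve the logical nameservice. This is a hedged sketch using standard Hadoop APIs; the cluster and host names are taken from the example yaml, and the Kerberos login lines are illustrative placeholders:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteStorageCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The same keys as the fluss.hadoop.* entries above, prefix stripped.
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1:9000");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2:9000");
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // For a kerberized cluster, log in before touching HDFS, e.g.:
        // org.apache.hadoop.security.UserGroupInformation.setConfiguration(conf);
        // org.apache.hadoop.security.UserGroupInformation
        //         .loginUserFromKeytab("user@REALM", "/path/to/user.keytab");

        // Resolving the logical nameservice confirms the HA mapping works.
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
        for (FileStatus status : fs.listStatus(new Path("/path/to/remote/storage"))) {
            System.out.println(status.getPath());
        }
    }
}
```

If this lists the remote storage directory, Fluss's servers should be able to reach it with the same settings in `server.yaml`.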
