Releases: steveloughran/cloudstore
2022-11-04 enhanced mkcsv
Enhanced mkcsv with a structured record format; consult the documentation.
2022-11-02 mkcsv command
New Command mkcsv
Creates a CSV file (technically a TSV file,...) at a given path; useful
for scale testing CSV processing through Hive and Spark.
hadoop jar cloudstore-1.0.jar mkcsv -verbose 10000 s3a://bucket/file.csv
The CSV has column 1 == row ID; column 2 is a subset of a 1K string;
the length of the subset increases with every row, then wraps around.
This gives the file variable-length rows, complicating split calculation
etc.
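The row layout described above can be sketched like this — a hypothetical Python reimplementation of the idea, not the actual mkcsv code (the source string and wrap logic here are assumptions):

```python
def make_rows(row_count):
    # Hypothetical sketch: column 1 is the row ID, column 2 is a
    # growing slice of a fixed ~1K string; the slice length wraps
    # around, so row lengths vary and split calculation is non-trivial.
    base = "abcdefghij" * 102 + "abcd"  # a 1024-character source string
    rows = []
    for row_id in range(row_count):
        width = (row_id % len(base)) + 1  # grows each row, then wraps
        rows.append(f"{row_id}\t{base[:width]}")  # tab-separated ("technically a TSV")
    return rows
```

Each row is therefore a different length, which is exactly what exercises split calculation in Hive and Spark.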
cloudup improved when uploading multi-GB files
- progress reporting
- better read buffering
- knows how long the file is, so fails if less data is read (!)
- more IO statistics
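The short-read check above can be sketched as follows — an illustrative Python version of the idea, not the cloudup source:

```python
import io

def copy_exact(src, dst, expected_len, bufsize=64 * 1024):
    """Copy exactly expected_len bytes from src to dst in buffered chunks;
    raise if the source stream ends early. Illustrative sketch only."""
    copied = 0
    while copied < expected_len:
        chunk = src.read(min(bufsize, expected_len - copied))
        if not chunk:
            # the stream ended before the known file length was reached
            raise EOFError(f"expected {expected_len} bytes but only read {copied}")
        dst.write(chunk)
        copied += len(chunk)
    return copied
```

Because the expected length is known up front, a truncated multi-GB read fails fast instead of silently uploading a short file.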
DiagnosticsAWSCredentialsProvider release
A new credential provider which prints obfuscated and MD5 values of the AWS secrets.
This leaks some information, so the logs must be considered as sensitive as the output
of storediag commands.
It does not attempt to do any authentication; it simply prints the values used by the
temporary/simple credential providers.
Usage
- Get into the same classloader as the s3a FS, which means into share/hadoop/common/lib
- Add to the list of credential providers
<property>
<name>fs.s3a.aws.credentials.provider</name>
<value>
org.apache.hadoop.fs.store.s3a.DiagnosticsAWSCredentialsProvider,
org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
</value>
</property>

Notes
- If using S3A Delegation Tokens, the delegation token binding takes over
  authenticating with S3; the values in fs.s3a.aws.credentials.provider may not be read.
- It's not enough to set this option and invoke via cloudstore commands; the class isn't found.
This may be related to the change for HADOOP-17372, but that was forced by odd things happening
if a HiveConfig or similar was passed in.
Output from an operation
2022-10-03 16:41:15,135 [main] INFO s3a.DiagnosticsAWSCredentialsProvider (DiagnosticsAWSCredentialsProvider.java:printSecretOption(135))
- Option fs.s3a.access.key = "AK**************66YB" [20] D51E40E203A4137FFE7CAB1BA000000 from [core-site.xml]
2022-10-03 16:41:15,135 [main] INFO s3a.DiagnosticsAWSCredentialsProvider (DiagnosticsAWSCredentialsProvider.java:printSecretOption(135))
- Option fs.s3a.secret.key = "Bq**********************************dfix" [40] BAA1DCAB58875154AA0B77A000000E0 from [core-site.xml]
2022-10-03 16:41:15,135 [main] INFO s3a.DiagnosticsAWSCredentialsProvider (DiagnosticsAWSCredentialsProvider.java:printSecretOption(138)) -
Option fs.s3a.session.token unset
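The obfuscation pattern visible in those log lines (a few leading and trailing characters kept, the middle masked, plus the secret's length and a digest) could look roughly like this — a Python sketch of the idea, not the provider's actual code, and the front/back character counts are inferred from the sample output:

```python
import hashlib

def obfuscate(secret, front=2, back=4):
    """Mask a secret for logging: keep a few characters at each end,
    and report the length plus an uppercase MD5 hex digest so two
    configurations can be compared without revealing the value.
    Sketch only; not the DiagnosticsAWSCredentialsProvider source."""
    masked = secret[:front] + "*" * (len(secret) - front - back) + secret[-back:]
    digest = hashlib.md5(secret.encode("utf-8")).hexdigest().upper()
    return f'"{masked}" [{len(secret)}] {digest}'
```

The digest lets you verify that two deployments picked up the same secret, while the masked form and length help spot truncated or pasted-with-whitespace values.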
2022-09-30-release: -e for env vars
- storediag adds a -e option to print all the env vars; obfuscates AWS_ ones - but nothing else
- locates and prints the core, yarn, hdfs, and mapred default and site XML files; only core-default.xml is required
- s3adiag highlights that when DTs are enabled, the normal credential provider chain is ignored
2022-09-21-tlsinfo
new command tlsinfo to print the TLS information
2022-09-20 maintenance release
Maintenance release
- print "HADOOP_CREDSTORE_PASSWORD" (with obfuscation)
- s3a diag aware of hboss
- s3a diag reports on seek/read tuning better
- input and output streams created during diagnostics probed for stream capabilities
2022-09-01 stream capabilities
storediag prints known stream capabilities of input and output streams, including iocontext and vectored io
2022-08-08-bandwidth
- bandwidth command now works on hadoop 3.0-derived releases (cdh6); it only handles a size in MB, without a suffix
- in secure mode, use current user as principal for token renewal
- prints s3a prefetch options
time bin/hadoop jar cloudstore-1.0.jar bandwidth 64 $BUCKET/testfile2

2022-07-22-bandwidth
Update bandwidth documentation on readme; slightly tune reporting
2022-07-18-robustness
storediag is resilient to RuntimeExceptions raised when trying to fetch delegation tokens.