Releases: steveloughran/cloudstore

2022-11-04 enhanced mkcsv

04 Nov 19:23
95facb2

Enhanced mkcsv with a structured record format; consult the documentation for details.

2022-11-02 mkcsv command

02 Nov 15:26
8b9825a

New Command mkcsv

Creates a CSV file (technically a TSV file,...) with a given path; useful
for scale testing CSV processing through Hive and Spark.

hadoop jar cloudstore-1.0.jar mkcsv -verbose 10000 s3a://bucket/file.csv

Column 1 of the CSV is the row ID; column 2 is a subset of a 1K string.
The length of the subset increases with every row, then wraps around.
This gives the file variable-length rows, which complicates split
calculation, etc.
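
As a rough illustration of that row layout (not the mkcsv implementation itself), the sketch below generates tab-separated rows in Python. The contents of the 1K string, the use of a prefix as the "subset", the starting row ID of 1 and the names make_rows and write_tsv are all assumptions made for the example.

```python
import io

# Hypothetical 1K source string; the string mkcsv really uses is not shown here.
SOURCE = "".join(chr(ord("a") + i % 26) for i in range(1024))


def make_rows(count, source=SOURCE):
    """Yield (row_id, text) pairs where text is a prefix of `source`.

    The prefix grows by one character per row and wraps back to length 1
    after reaching len(source). Using a prefix is an assumption; the
    release note only says "a subset".
    """
    for row_id in range(1, count + 1):
        length = (row_id - 1) % len(source) + 1
        yield row_id, source[:length]


def write_tsv(out, count):
    """Write `count` tab-separated rows to a text stream."""
    for row_id, text in make_rows(count):
        out.write(f"{row_id}\t{text}\n")


buffer = io.StringIO()
write_tsv(buffer, 3)
print(buffer.getvalue(), end="")
```

Because the row length cycles rather than staying constant, fixed-size byte splits will not line up with row boundaries, which is the property the test file is meant to exercise.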

cloudup improved when uploading multi-GB files

  • progress
  • better read buffering
  • knows the file length, so fails if less data is read (!)
  • some more iostats
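
The length check can be sketched as follows. This is a minimal Python illustration of the idea, not cloudup's code: the function name copy_exact, the 1 MB default buffer and the choice of EOFError are assumptions.

```python
import io


def copy_exact(src, dst, expected_length, buffer_size=1024 * 1024):
    """Copy `expected_length` bytes from src to dst.

    Raises EOFError if src runs out of data before the expected
    length has been copied, instead of silently writing a short file.
    """
    copied = 0
    while copied < expected_length:
        chunk = src.read(min(buffer_size, expected_length - copied))
        if not chunk:
            raise EOFError(
                f"expected {expected_length} bytes, read only {copied}")
        dst.write(chunk)
        copied += len(chunk)
    return copied


# A complete source copies cleanly.
print(copy_exact(io.BytesIO(b"x" * 10), io.BytesIO(), 10))

# A truncated source is reported rather than ignored.
try:
    copy_exact(io.BytesIO(b"x" * 10), io.BytesIO(), 20)
except EOFError as e:
    print("failed:", e)
```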

DiagnosticsAWSCredentialsProvider release

03 Oct 16:01
94b29c9

A new credential provider which prints obfuscated and MD5 values of the AWS secrets.

This leaks some information and so the logs must be considered as sensitive as the output
of storediag commands.

It does not attempt any authentication; it simply prints the values used by the temporary/simple
credential providers.

Usage

  1. Get the JAR into the same classloader as the s3a FS, which means into share/hadoop/common/lib
  2. Add the class to the list of credential providers:
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>
    org.apache.hadoop.fs.store.s3a.DiagnosticsAWSCredentialsProvider,
    org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
    org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
  </value>
</property>

Notes

  • If using S3A Delegation Tokens, the delegation token binding takes over
    authenticating with S3; the values in fs.s3a.aws.credentials.provider may not be read.
  • It's not enough to set this option and invoke via cloudstore commands; the class isn't found.
    This may be related to the change for HADOOP-17372, but that was forced by odd things happening
    if a HiveConfig or similar was passed in.

Output from an operation

2022-10-03 16:41:15,135 [main] INFO  s3a.DiagnosticsAWSCredentialsProvider (DiagnosticsAWSCredentialsProvider.java:printSecretOption(135))
 - Option fs.s3a.access.key = "AK**************66YB" [20] D51E40E203A4137FFE7CAB1BA000000 from [core-site.xml]
2022-10-03 16:41:15,135 [main] INFO  s3a.DiagnosticsAWSCredentialsProvider (DiagnosticsAWSCredentialsProvider.java:printSecretOption(135))
 - Option fs.s3a.secret.key = "Bq**********************************dfix" [40] BAA1DCAB58875154AA0B77A000000E0 from [core-site.xml]
2022-10-03 16:41:15,135 [main] INFO  s3a.DiagnosticsAWSCredentialsProvider (DiagnosticsAWSCredentialsProvider.java:printSecretOption(138))
 - Option fs.s3a.session.token unset
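
Judging by the log lines above, each secret is reported as an obfuscated value (first two and last four characters kept), its length, and an MD5 checksum. The Python sketch below reproduces that shape so the output is easier to interpret. It is not the provider's Java code: the function names are hypothetical, and hashing the UTF-8 bytes of the secret is an assumption.

```python
import hashlib


def obfuscate(secret, head=2, tail=4):
    """Keep the first `head` and last `tail` characters; star out the rest."""
    if len(secret) <= head + tail:
        # Too short to show anything safely: hide every character.
        return "*" * len(secret)
    return secret[:head] + "*" * (len(secret) - head - tail) + secret[-tail:]


def describe_secret(name, secret, origin):
    """Format one option in the style of the log lines above."""
    if not secret:
        return f"Option {name} unset"
    # Assumption: the checksum is taken over the UTF-8 bytes of the secret.
    digest = hashlib.md5(secret.encode("utf-8")).hexdigest().upper()
    return (f'Option {name} = "{obfuscate(secret)}" '
            f"[{len(secret)}] {digest} from [{origin}]")


# AWS's documented example access key, not a real credential.
print(describe_secret("fs.s3a.access.key", "AKIAIOSFODNN7EXAMPLE", "core-site.xml"))
print(describe_secret("fs.s3a.session.token", None, "core-site.xml"))
```

Comparing the checksums from two hosts shows whether they hold the same secret without printing it. The checksum, length and visible characters still narrow down the value, which is why the logs must be treated as sensitive.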

2022-09-30-release: -e for env vars

30 Sep 12:22
86539cb

  • storediag adds a -e option to print all the env vars; it obfuscates the AWS_ ones, but nothing else
  • locates and prints the core, yarn, hdfs and mapred default and site XML files; only core-default.xml is required
  • s3adiag highlights that when delegation tokens (DTs) are enabled, the normal credential provider chain is ignored
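
The env var behaviour can be pictured with the Python sketch below. It is an illustration only: the function names, the output format and the obfuscation style (first two and last four characters kept) are assumptions, not what storediag necessarily prints.

```python
import os


def obfuscate(value, head=2, tail=4):
    """Keep the first `head` and last `tail` characters; star out the rest."""
    if len(value) <= head + tail:
        return "*" * len(value)
    return value[:head] + "*" * (len(value) - head - tail) + value[-tail:]


def printable_env(env):
    """Return sorted 'NAME = "value"' lines, obfuscating only AWS_ variables."""
    lines = []
    for name in sorted(env):
        value = env[name]
        if name.startswith("AWS_"):
            value = obfuscate(value)
        lines.append(f'{name} = "{value}"')
    return lines


for line in printable_env(dict(os.environ)):
    print(line)
```

Since only the AWS_ prefix is matched, any other secret held in an environment variable is printed in the clear, so the output of -e should be reviewed before sharing.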

2022-09-21-tlsinfo

21 Sep 16:58
da99e41

New command tlsinfo prints the TLS information.

2022-09-20 maintenance release

20 Sep 09:15
4408399

Maintenance release

  • print "HADOOP_CREDSTORE_PASSWORD" (with obfuscation)
  • s3a diag aware of hboss
  • s3a diag reports on seek/read tuning better
  • input and output streams created during diagnostics probed for stream capabilities

2022-09-01 stream capabilities

02 Sep 11:01
b560b94

storediag prints known stream capabilities of input and output streams, including iocontext and vectored io

2022-08-08-bandwidth

08 Aug 16:01
c26bfb8

  1. bandwidth command now works on Hadoop 3.0-derived releases (CDH6). This only handles a size in MB, without a suffix
  2. in secure mode, uses the current user as the principal for token renewal
  3. prints s3a prefetch options

time bin/hadoop jar cloudstore-1.0.jar bandwidth 64 $BUCKET/testfile2

2022-07-22-bandwidth

22 Jul 18:50
d7eb962

Updated the bandwidth documentation in the README; slightly tuned reporting.

2022-07-18-robustness

18 Jul 17:38
70cf0e7

storediag is resilient to RuntimeExceptions (RTEs) raised when trying to fetch delegation tokens.