
"Keep getting: open `': Change reported by S3 during open at position 0. ETag was unavailable" when reading from S3 #242

Description

@iwb-vhuysmans

Hi,

I'm currently using the following Maven dependencies in my project:

        <dependency>
            <groupId>com.dimafeng</groupId>
            <artifactId>testcontainers-scala_2.12</artifactId>
            <version>0.40.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.dimafeng</groupId>
            <artifactId>testcontainers-scala-dynalite_2.12</artifactId>
            <version>0.40.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.dimafeng</groupId>
            <artifactId>testcontainers-scala-localstack_2.12</artifactId>
            <version>0.40.12</version>
            <scope>test</scope>
        </dependency>

I have some code where I set up an AmazonS3 client using a LocalStackContainer (imports shown for completeness):

  import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
  import com.amazonaws.client.builder.AwsClientBuilder
  import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}
  import com.dimafeng.testcontainers.LocalStackContainer
  import org.testcontainers.containers.localstack.LocalStackContainer.Service.S3

  override val container: LocalStackContainer = new LocalStackContainer(services = List(S3))
  implicit var client: AmazonS3 = null
  var sparkCsvReader: SparkCsvReader = null

  override protected def beforeAll(): Unit = {
    container.start()

    // Build an AmazonS3 client pointed at the LocalStack S3 endpoint
    client = AmazonS3ClientBuilder
      .standard()
      .withEndpointConfiguration(
        new AwsClientBuilder.EndpointConfiguration(
          container.container.getEndpointOverride(S3).toString,
          container.container.getRegion
        )
      )
      .withCredentials(
        new AWSStaticCredentialsProvider(
          new BasicAWSCredentials(container.container.getAccessKey, container.container.getSecretKey)
        )
      )
      .build()

    // Point Spark's S3A connector at the same endpoint (ss is the test's SparkSession)
    ss.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", container.container.getEndpointOverride(S3).toString)
    ss.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", container.container.getAccessKey)
    ss.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", container.container.getSecretKey)
    ss.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    ss.sparkContext.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

    sparkCsvReader = new SparkCsvReader() // Class I would like to test
  }

When I use the client to create a bucket and upload two files, everything works fine. But when I try to read them back from the S3 bucket through s3a, I keep getting the following error (a rough sketch of the sequence follows the error):

open `s3a://bucket1/file1.csv': Change reported by S3 during open at position 0. ETag 079a45cc9a4cda24698dddf8f6263cdd was unavailable
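For reference, the failing sequence looks roughly like this; bucket and file names are placeholders, and the s3a read here stands in for what SparkCsvReader does internally:

    // Uploading through the SDK client works fine
    client.createBucket("bucket1")
    client.putObject("bucket1", "file1.csv", "col1,col2\na,b")
    client.putObject("bucket1", "file2.csv", "col1,col2\nc,d")

    // Reading back through S3A is what triggers the ETag error
    val df = ss.read.option("header", "true").csv("s3a://bucket1/file1.csv")
    df.show()

From what I can tell, the message comes from S3A's change detection, which compares ETags across requests on an open stream. I haven't verified that this is the right fix, but the Hadoop S3A documentation describes settings that relax it, e.g.:

    // Untested sketch: relax S3A change detection for the test run
    ss.sparkContext.hadoopConfiguration.set("fs.s3a.change.detection.mode", "none")
    ss.sparkContext.hadoopConfiguration.set("fs.s3a.change.detection.version.required", "false")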
