Failed to read S3 config when duration without unit #12753

Open
@wenwj0

Bug description

When I set the following config, Velox throws an error:

spark.hadoop.fs.s3a.connection.timeout=200000

Error message:

org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Invalid duration '200000'
Retriable: False
Context: Split [Hive: s3a://xxxxxxxxxx/part-00000-xxxxx.zstd.parquet 0 - 1442] Task Gluten_Stage_1_TID_1_VTID_0
Function: toDuration
File: /home/gitlab-runner/builds/2Grm8K_1/0/gluten/ep/build-velox/build/velox_ep/velox/common/config/Config.cpp
Line: 88

The error comes from the toDuration function, which throws when the value string has no time unit.

https://github.com/facebookincubator/velox/blob/main/velox/common/config/Config.cpp#L88

  static const RE2 kPattern(R"(^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*)");

  double value;
  std::string unit;
  if (!RE2::FullMatch(str, kPattern, &value, &unit)) {
    VELOX_USER_FAIL("Invalid duration '{}'", str);
  }

Expected behavior:
In vanilla Spark, setting spark.hadoop.fs.s3a.connection.timeout=200000 works.
Velox should also accept this value without a time unit.
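
One possible direction, as a minimal sketch only: make the unit group optional and fall back to a default unit when none is given. The helper name `parseDurationWithDefaultUnit`, the default of milliseconds, and the unit table below are assumptions for illustration and are not Velox's actual API. The sketch uses `std::regex` rather than RE2 so that it compiles with the standard library alone.

```cpp
#include <chrono>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Velox): parses "<number>[unit]" and uses
// `defaultUnit` when the string carries no unit. Milliseconds is an assumed
// default, chosen here because the failing value is a timeout.
std::chrono::nanoseconds parseDurationWithDefaultUnit(
    const std::string& str,
    const std::string& defaultUnit = "ms") {
  // Same shape as the pattern quoted above, but the unit group may be empty.
  static const std::regex kPattern(
      R"(^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*$)");

  std::smatch match;
  if (!std::regex_match(str, match, kPattern)) {
    throw std::invalid_argument("Invalid duration '" + str + "'");
  }

  const double value = std::stod(match[1].str());
  const std::string unit =
      match[2].length() > 0 ? match[2].str() : defaultUnit;

  // Illustrative unit table; the real set of accepted units may differ.
  double nanosPerUnit = 0;
  if (unit == "ns") {
    nanosPerUnit = 1;
  } else if (unit == "us") {
    nanosPerUnit = 1e3;
  } else if (unit == "ms") {
    nanosPerUnit = 1e6;
  } else if (unit == "s") {
    nanosPerUnit = 1e9;
  } else if (unit == "m") {
    nanosPerUnit = 60e9;
  } else if (unit == "h") {
    nanosPerUnit = 3600e9;
  } else {
    throw std::invalid_argument("Invalid duration '" + str + "'");
  }

  return std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::duration<double, std::nano>(value * nanosPerUnit));
}
```

With this sketch, `parseDurationWithDefaultUnit("200000")` and `parseDurationWithDefaultUnit("200000ms")` return the same duration, while a string such as `"abc"` is still rejected. Whether the default unit should be fixed or chosen per config key is a design question for the maintainers.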
