[BUG] Excel currency format is ignored when reading Excel files with inferSchema set to "true" #1023

@AadithyaNirmal

Description

Am I using the newest version of the library?

  • I have made sure that I'm using the latest version of the library.

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When reading Excel files with spark-excel version 3.5.6_0.31.2 and inferSchema set to "true", the currency symbols (such as $, ₹, ¥) are dropped from the values.

With inferSchema set to "false", all values are read as strings and the symbols are retained.

Expected Behavior

With inferSchema set to "true", the reader should retain the currency symbol while still reading the values as INT/FLOAT.

Steps To Reproduce

Excel file content

A       B       C       D        E
₹ 1.00  $6.00   ¥11.00  16.00 ₽  21.00
₹ 2.00  $7.00   ¥12.00  17.00 ₽  22.00
₹ 3.00  $8.00   ¥13.00  18.00 ₽  23.00
₹ 4.00  $9.00   ¥14.00  19.00 ₽  24.00
₹ 5.00  $10.00  ¥15.00  20.00 ₽  25.00

Currently, when inferSchema is set to "true", all currency-formatted cells are read as INT/FLOAT and the symbols are ignored:

# Parentheses allow the chained calls to span multiple lines.
df = (
    spark.read.format("excel")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("Files/currencytest/testcurrencynew.xlsx")
)

df.show()

Results:
+---+---+---+---+---+
|  A|  B|  C|  D|  E|
+---+---+---+---+---+
|  1|  6| 11| 16| 21|
|  2|  7| 12| 17| 22|
|  3|  8| 13| 18| 23|
|  4|  9| 14| 19| 24|
|  5| 10| 15| 20| 25|
+---+---+---+---+---+

When inferSchema is set to "false", every value is read as a string: the currency symbols are retained, but the columns are no longer INT/FLOAT.

# Parentheses allow the chained calls to span multiple lines.
df = (
    spark.read.format("excel")
    .option("inferSchema", "false")
    .option("header", "true")
    .load("Files/currencytest/testcurrencynew.xlsx")
)

df.show()

+------+------+------+-------+-----+
|     A|     B|     C|      D|    E|
+------+------+------+-------+-----+
|₹ 1.00| $6.00|¥11.00|16.00 ₽|21.00|
|₹ 2.00| $7.00|¥12.00|17.00 ₽|22.00|
|₹ 3.00| $8.00|¥13.00|18.00 ₽|23.00|
|₹ 4.00| $9.00|¥14.00|19.00 ₽|24.00|
|₹ 5.00|$10.00|¥15.00|20.00 ₽|25.00|
+------+------+------+-------+-----+
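
A possible interim workaround is to read with inferSchema set to "false" and split each string into a symbol and a numeric amount afterwards. The sketch below is plain Python and is not part of spark-excel; `split_currency` is a hypothetical helper name, and it assumes the amount uses "." as the decimal separator and "," as an optional thousands separator, as in the sample file above. Negative amounts written in parentheses are not handled.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Hypothetical helper, not provided by spark-excel.
# Assumption: "." is the decimal separator and "," an optional thousands separator.
_AMOUNT = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def split_currency(cell: Optional[str]) -> Tuple[Optional[str], Optional[Decimal]]:
    """Split a formatted cell such as '₹ 1.00' or '16.00 ₽' into (symbol, amount)."""
    if cell is None:
        return None, None
    match = _AMOUNT.search(cell)
    if match is None:
        # No numeric part found: treat the whole cell as the symbol/text.
        return cell.strip() or None, None
    amount = Decimal(match.group().replace(",", ""))
    # Whatever surrounds the number (prefix or suffix) is taken as the symbol.
    symbol = (cell[:match.start()] + cell[match.end():]).strip()
    return symbol or None, amount


if __name__ == "__main__":
    for raw in ["₹ 1.00", "$6.00", "¥11.00", "16.00 ₽", "21.00"]:
        print(raw, "->", split_currency(raw))
```

A function like this could be wrapped in a Spark UDF and applied per column, producing a symbol column and an amount column for each currency column. That keeps the symbol available without giving up numeric types, at the cost of an extra parsing step.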

Environment

- Spark version: 3.5
- Spark-Excel version: 3.5.6_0.31.2 
- Scala version: 2.12
- OS: linux

Anything else?

NA
