Am I using the newest version of the library?
Is there an existing issue for this?
Current Behavior
When we read Excel files with spark-excel version 3.5.6_0.31.2 and inferSchema set to "true", the currency symbols (such as $, ₹, ¥) are dropped and only the numeric values are returned.
When we set inferSchema to "false", the symbols are retained, but every value is read as a string.
Expected Behavior
With inferSchema set to "true", the currency symbol should be retained while the values are still read as INT/FLOAT.
Steps To Reproduce
Excel file content
| A | B | C | D | E |
| --- | --- | --- | --- | --- |
| ₹ 1.00 | $6.00 | ¥11.00 | 16.00 ₽ | 21.00 |
| ₹ 2.00 | $7.00 | ¥12.00 | 17.00 ₽ | 22.00 |
| ₹ 3.00 | $8.00 | ¥13.00 | 18.00 ₽ | 23.00 |
| ₹ 4.00 | $9.00 | ¥14.00 | 19.00 ₽ | 24.00 |
| ₹ 5.00 | $10.00 | ¥15.00 | 20.00 ₽ | 25.00 |
Currently, when inferSchema is set to "true", all the currency-formatted columns are read as INT/FLOAT and the symbols are dropped:
# inferSchema=true: currency cells come back as numbers, symbols dropped
df = (
    spark.read.format("excel")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("Files/currencytest/testcurrencynew.xlsx")
)
df.show()
Results:
+---+---+---+---+---+
| A| B| C| D| E|
+---+---+---+---+---+
| 1| 6| 11| 16| 21|
| 2| 7| 12| 17| 22|
| 3| 8| 13| 18| 23|
| 4| 9| 14| 19| 24|
| 5| 10| 15| 20| 25|
+---+---+---+---+---+
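One plausible cause, offered as an assumption and not confirmed against the library source: because inferSchema yields numbers, these cells are probably stored in Excel as numeric values, and the currency symbol exists only in each cell's number format code (for example `"$"#,##0.00` or `[$₹-4009] #,##0.00`). Keeping the symbol would then require carrying that format string through to Spark. The hypothetical helper `currency_symbol_from_format` below sketches how a symbol could be pulled out of such a format code; the format strings used with it are illustrative, not read from the attached file.

```python
import re
from typing import Optional

# Locale currency tokens look like [$₹-4009]: "[$", the symbol, an optional "-LCID", "]".
_LOCALE_TOKEN = re.compile(r"\[\$([^\]-]+)(?:-[0-9A-Fa-f]+)?\]")
# Quoted literals such as "$" or "₽" inside the format code.
_QUOTED = re.compile(r'"([^"]+)"')
# Any bracketed token ([Red], [$-409], ...), removed before checking for a bare "$".
_BRACKETED = re.compile(r"\[[^\]]*\]")


def currency_symbol_from_format(fmt: str) -> Optional[str]:
    """Return the currency symbol embedded in an Excel number format code, if any.

    Hypothetical helper for illustration only; it inspects just the first
    section of the format (the part before any ';', used for positive numbers).
    """
    section = fmt.split(";", 1)[0]
    match = _LOCALE_TOKEN.search(section)
    if match:
        return match.group(1)
    match = _QUOTED.search(section)
    if match:
        return match.group(1).strip() or None
    # Some codes use an unquoted "$" as a literal character.
    if "$" in _BRACKETED.sub("", section):
        return "$"
    return None


for fmt in ['[$₹-4009] #,##0.00', '"$"#,##0.00', '#,##0.00 [$₽-419]', "0.00"]:
    print(repr(fmt), "->", currency_symbol_from_format(fmt))
```

Note that a locale-only token such as `[$-409]` (common in date formats) carries no symbol, which is why bracketed tokens are stripped before the bare `$` check.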
When inferSchema is set to "false", the symbols are retained, but every value is read as a string rather than INT/FLOAT:
# inferSchema=false: values come back as formatted strings, symbols kept
df = (
    spark.read.format("excel")
    .option("inferSchema", "false")
    .option("header", "true")
    .load("Files/currencytest/testcurrencynew.xlsx")
)
df.show()
+------+------+------+-------+-----+
| A| B| C| D| E|
+------+------+------+-------+-----+
|₹ 1.00| $6.00|¥11.00|16.00 ₽|21.00|
|₹ 2.00| $7.00|¥12.00|17.00 ₽|22.00|
|₹ 3.00| $8.00|¥13.00|18.00 ₽|23.00|
|₹ 4.00| $9.00|¥14.00|19.00 ₽|24.00|
|₹ 5.00|$10.00|¥15.00|20.00 ₽|25.00|
+------+------+------+-------+-----+
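Until the library keeps both, one possible workaround is to read with inferSchema set to "false" and split each formatted string into a symbol and a numeric amount ourselves. The sketch below is plain Python; `split_currency` and its regex are hypothetical, written only for the layouts in this report (symbol before or after the number, optional space). It does not handle thousands separators, negatives in parentheses, or ISO codes like "USD".

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Optional symbol before the number, the number itself, optional symbol after it.
_AMOUNT = re.compile(
    r"^\s*(?P<pre>[^\d\s.+-]*)\s*(?P<num>[+-]?\d+(?:\.\d+)?)\s*(?P<post>[^\d\s.]*)\s*$"
)


def split_currency(text: str) -> Tuple[Optional[str], Decimal]:
    """Split a formatted amount such as "₹ 1.00" or "16.00 ₽" into (symbol, amount).

    Hypothetical helper; returns None as the symbol when the value has none.
    """
    match = _AMOUNT.match(text)
    if match is None:
        raise ValueError(f"unrecognised amount: {text!r}")
    symbol = match.group("pre") or match.group("post") or None
    return symbol, Decimal(match.group("num"))


for value in ["₹ 1.00", "$6.00", "¥11.00", "16.00 ₽", "21.00"]:
    print(value, "->", split_currency(value))
```

In Spark this could be wrapped in a UDF over the string columns, or expressed natively with `regexp_extract` plus a cast to a decimal type; either way it re-derives the number from the display string, so it is a stopgap rather than a fix for the inference behavior.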
Environment
- Spark version: 3.5
- Spark-Excel version: 3.5.6_0.31.2
- Scala version: 2.12
- OS: Linux
Anything else?
NA