Description
Motivation
When users upload CSV files to create datasets in Superset, data type mismatches (such as text values in numeric columns or incorrectly formatted dates) cause the upload to fail with generic error messages like "Database upload file failed" or "Table already exists".
This poor error reporting creates significant friction for users trying to import their data: without knowing which column, row, or value caused the failure, they cannot tell what to fix.
Current Behavior
When uploading a CSV file with data type mismatches, the system displays vague error messages that don't help users identify or fix the problem. For example, if a numeric column contains text values or a date column has incorrectly formatted dates, users only see generic failure messages without any context about which column or row caused the issue.
Reproduction Steps:
- Create a CSV file with a column that should contain numeric values but includes some text values (e.g., a "Score" column with values: 25.5, 30.2, "invalid", 45.0)
- Navigate to the Superset UI and attempt to upload this CSV file as a new dataset
- Specify that the "Score" column should be treated as a numeric type (float or integer)
- Observe: The upload fails with a generic error message like "Database upload file failed" that provides no information about which column, row, or value caused the problem
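For convenience, the CSV from the first step can be generated with a short script. This is only a sketch: the "Score" values come from the steps above, while the "Name" column and the suggested file name are made-up filler, not anything Superset requires.

```python
import csv
import io

# "Score" values are taken from the reproduction steps.
# The "Name" column is hypothetical filler so the file has more than one column.
ROWS = [
    ("Name", "Score"),
    ("a", "25.5"),
    ("b", "30.2"),
    ("c", "invalid"),  # text value in a column meant to be numeric
    ("d", "45.0"),
]


def build_repro_csv() -> str:
    """Return the reproduction CSV as a string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(ROWS)
    return buffer.getvalue()


repro_csv = build_repro_csv()
```

Writing `repro_csv` to a file (for example with `pathlib.Path("score_repro.csv").write_text(repro_csv)`) gives a file that can be uploaded through the UI.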
Expected Behavior
When CSV upload fails due to data type conversion errors, the system should provide detailed, actionable error messages that include:
- The column name where the error occurred
- The expected data type for that column
- The specific value(s) that couldn't be converted
- The line number(s) in the CSV file where the invalid values appear
- A reasonable limit on the number of errors shown (to avoid overwhelming users with thousands of errors or creating large payloads)
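To make the requested details concrete, here is a minimal, standard-library-only sketch of how the column, expected type, value, and line number could be collected. It is not Superset's actual upload code; `ConversionError`, `CONVERTERS`, and `find_conversion_errors` are hypothetical names, and the sketch assumes the header is line 1 of the file and that empty cells are treated as missing rather than invalid.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ConversionError:
    column: str
    expected_type: str
    value: str
    line: int  # 1-based line in the CSV file; the header is line 1


# Hypothetical mapping from a type name to a Python converter.
CONVERTERS = {"int": int, "float": float}


def find_conversion_errors(csv_text: str, column_types: dict) -> list:
    """Return one ConversionError per cell that fails conversion."""
    reader = csv.DictReader(io.StringIO(csv_text))
    errors = []
    for row in reader:
        # line_num counts physical lines read so far, including the header.
        line = reader.line_num
        for column, type_name in column_types.items():
            value = row.get(column)
            if value is None or value == "":
                continue  # assumption: empty cells are missing, not invalid
            try:
                CONVERTERS[type_name](value)
            except ValueError:
                errors.append(ConversionError(column, type_name, value, line))
    return errors


sample = "Name,Score\na,25.5\nb,30.2\nc,invalid\nd,45.0\n"
errors = find_conversion_errors(sample, {"Score": "float"})
```

For the sample above, `errors` holds a single entry pointing at the "Score" column, the value `invalid`, and line 4 of the file.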
Acceptance Criteria:
- Error messages include the column name and expected data type when type conversion fails
- Error messages show the specific invalid value and its line number in the CSV file
- When multiple errors exist, the system displays a limited number (e.g., first 5) with an indication of how many total errors were found
- The error detection works for numeric types (integers, floats, big integers) and non-numeric types (strings, dates, categories)
- Existing CSV upload functionality continues to work correctly for valid files
- The improved error handling doesn't significantly impact upload performance
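The limit described in the criteria above could look roughly like the following. This is an illustrative sketch only: `format_error_summary`, its parameters, and the message wording are assumptions, and the default of 5 simply mirrors the example in the criteria.

```python
def format_error_summary(column, expected_type, bad_values, max_shown=5):
    """Build one message for a column.

    bad_values is a list of (line_number, value) tuples in file order.
    """
    shown = bad_values[:max_shown]
    details = ", ".join(f"line {line}: {value!r}" for line, value in shown)
    message = (
        f"Column '{column}' could not be converted to {expected_type}. "
        f"Invalid values: {details}"
    )
    # Only mention the total when some errors were left out.
    if len(bad_values) > len(shown):
        message += f" (showing {len(shown)} of {len(bad_values)} errors)"
    return message


many = [(n, "invalid") for n in range(2, 9)]  # 7 bad rows on lines 2..8
summary = format_error_summary("Score", "float", many)
```

Capping the list keeps the response payload small while the total count still tells the user how much cleanup the file needs.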
Steps to Test
- Create test CSV files with various data type mismatches:
  - A numeric column containing text values
  - A date column with incorrectly formatted dates
  - Multiple columns with multiple errors
- Upload each test file through the Superset UI
- Verify that error messages clearly identify the column, invalid value, and line number
- Verify that when there are many errors, only a reasonable number are displayed with a total count
Submission
Download https://cap.so/ to record your screen (use Studio mode). Export as an mp4, and drag and drop into an issue comment below.
Guide to submitting pull requests: https://hackmd.io/@timothy1ee/Hky8kV3hlx