Suggestion on page 432 #121
baifanhorst
started this conversation in
General
Replies: 1 comment
I left out the option na_values='?' in the new code that I suggested. It should be: df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data', ...), with na_values='?' added to the arguments. Without this option, later code that drops the rows containing '?' won't work.
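To illustrate why the option matters, here is a minimal sketch. It assumes a two-row inline sample typed in the same layout as auto-mpg.data (so it runs without network access) and uses sep=r"\s+" as the whitespace separator; the helper name load is hypothetical.

```python
import io
import pandas as pd

# Assumption: a tiny inline sample in the same layout as auto-mpg.data
# (fields separated by spaces, a tab before the quoted car name).
sample = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)
names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
         'acceleration', 'model year', 'origin', 'car name']

def load(**kwargs):
    """Hypothetical helper: parse the sample with whitespace as the separator."""
    return pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None,
                       names=names, **kwargs)

without_na = load()               # '?' stays in the column as a plain string
with_na = load(na_values="?")     # '?' is converted to NaN

# dropna() only removes the incomplete row when '?' was read as NaN.
rows_without = len(without_na.dropna())
rows_with = len(with_na.dropna())
print(rows_without, rows_with)
```

In this sketch, the row with the unknown horsepower is only dropped in the second case, because dropna() looks for NaN values rather than for the literal string '?'.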
On this page, the Auto MPG dataset is loaded with the following code:
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']
df = pd.read_csv(url, names=column_names,
                 na_values="?", comment='\t',
                 sep=" ", skipinitialspace=True)
However, the data file has one more column, which contains the car names. The names are written as "abc", with the quotation marks explicitly present in the file. I tried modifying the above code by adding a new entry to the 'column_names' list and passing additional options to 'pd.read_csv', such as quotechar = '"' and escapechar='', but still could not load the car names. In contrast, the following simpler code does load them:
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data',
                 delim_whitespace=True, header=None,
                 names=['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
                        'acceleration', 'model year', 'origin', 'car name'])
I suggest that this code replace the original one in a future edition.
I also found the option comment='\t' in the original code confusing. I examined the data file and could not find any comment line beginning with '\t'. However, without this option, pd.read_csv fails to read the data. Later I realized that each car name is preceded by a tab, so the effect of "comment='\t'" is to make the parser ignore the car name. I am not sure whether my understanding is correct.
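One way to check this reading is to parse a small sample both ways. The sketch below assumes a two-row inline sample typed in the same layout as the data file, and a 'Car Name' column label that is my own choice. It compares the book-style call with a whitespace-separated call that keeps the ninth column.

```python
import io
import pandas as pd

# Assumption: a tiny inline sample in the same layout as auto-mpg.data
# (fields separated by spaces, a tab before the quoted car name).
sample = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

# Book-style call: comment='\t' discards everything from the first tab to
# the end of each line, so the car name never reaches the parser.
book_df = pd.read_csv(io.StringIO(sample), names=column_names,
                      na_values="?", comment='\t',
                      sep=" ", skipinitialspace=True)

# Whitespace-separated call: the tab counts as a separator and the quoted
# name is read as a single field ('Car Name' is a hypothetical label).
full_df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None,
                      names=column_names + ['Car Name'], na_values="?")

print(book_df.shape, full_df.shape)
print(full_df['Car Name'].tolist())
```

On this sample the first call yields eight columns and the second yields nine, which is consistent with the idea that comment='\t' is there to cut off the car name rather than to skip real comment lines.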