- The first "column" in the DataFrame is the index, which defaults to incrementing integers
- Just as each column has a name, the index gives each row its "name"
- We can assign a column to be the index of a DataFrame:
listings_df = listings_df.set_index('id')
listings_df
Why do we need to assign the result of set_index()?
- Calling .set_index() does not change the original DataFrame value
- Calling .set_index() returns a new DataFrame value with the index changed, which we then assign to the original variable.
- Most Pandas methods return a new value rather than changing the original value.
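The points above can be checked with a minimal sketch. The `toy_df` table and its values are hypothetical stand-ins for `listings_df`, which has many more rows and columns.

```python
import pandas as pd

# Hypothetical toy data standing in for listings_df.
toy_df = pd.DataFrame({
    "id": ["a1", "b2", "c3"],
    "name": ["Loft", "Studio", "Cabin"],
    "price": [120, 80, 95],
})

# Calling set_index() without assigning the result leaves toy_df unchanged.
toy_df.set_index("id")
index_before = list(toy_df.index)  # still the default incrementing integers

# Assigning the result rebinds the variable to the new DataFrame.
toy_df = toy_df.set_index("id")
index_after = list(toy_df.index)   # now the values from the id column

print(index_before)
print(index_after)
```

Note that once `id` becomes the index, it is no longer listed among the regular columns.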
We can perform indexing and slicing on DataFrames by integer position using .iloc:
To get the first row:
listings_df.iloc[0]
To get the second column in the first row:
listings_df.iloc[0, 1]
To get the second column of the first five rows:
listings_df.iloc[0:5, 1]
To get the second column of all rows:
listings_df.iloc[:, 1]
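To see what each of these expressions returns, here is a small sketch on a hypothetical three-row table (`toy_df` and its columns are assumptions, not the real listings data). Positions start at 0, and the end of a position slice is excluded, as with Python lists.

```python
import pandas as pd

# Hypothetical toy data; the index plays the role of the listing id.
toy_df = pd.DataFrame(
    {
        "name": ["Loft", "Studio", "Cabin"],
        "price": [120, 80, 95],
        "beds": [2, 1, 3],
    },
    index=["a1", "b2", "c3"],
)

first_row = toy_df.iloc[0]        # a Series holding every column of row 0
single_value = toy_df.iloc[0, 1]  # row 0, column 1 (price)
first_two = toy_df.iloc[0:2, 1]   # column 1 of rows 0 and 1; position 2 is excluded
whole_column = toy_df.iloc[:, 1]  # column 1 of every row

print(first_row)
print(single_value)
print(first_two)
print(whole_column)
```

The index itself does not count as a column here: position 0 refers to `name`, the first regular column.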
We can also index and slice rows and columns by their names using .loc:
To get a single row by its name in the index:
listings_df.loc['l9995141']
To get several rows by their names:
listings_df.loc[['l9995141', 'l12026015', 'l44688136']]
While you can use : slicing to specify start and end names for a range, it is more common to specify a list of names.
To get the name column of all rows:
listings_df.loc[:, 'name']
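The name-based forms can be sketched on a hypothetical four-row table (`toy_df` and its labels are made up for illustration). One difference from .iloc is worth noting: a : slice of names includes the end name.

```python
import pandas as pd

# Hypothetical toy data with string labels in the index.
toy_df = pd.DataFrame(
    {"name": ["Loft", "Studio", "Cabin", "Barn"], "price": [120, 80, 95, 60]},
    index=["a1", "b2", "c3", "d4"],
)

one_row = toy_df.loc["b2"]           # a Series for the row named b2
several = toy_df.loc[["a1", "d4"]]   # a DataFrame with rows in the order listed
label_slice = toy_df.loc["b2":"c3"]  # includes both b2 and c3
names = toy_df.loc[:, "name"]        # the name column of every row

print(one_row)
print(several)
print(label_slice)
print(names)
```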
Use sorting and indexing on listings_df to find:
- The value in the third column of the fifth row.
- The name of the listing with an id of 'l6113'
- The review_scores_rating of the most reviewed listing.
- The latitude and longitude of the least expensive listing.