This is my self-designed project to predict residential real estate prices in Taipei, Taiwan.
The major techniques involved include PCA, K-means, multivariate regression, human mobility analysis, and network connectivity analysis.
Taipei, like other compact cities in East Asia, has a dense population and therefore very high house prices.
Among the factors that affect the price of a house, besides its intrinsic properties (for example its size, layout, and condition), the surrounding environment is of vital importance.
But how can we describe a house and its surroundings statistically?
We decided to use POIs (Points of Interest) to build a model that predicts house prices.
To construct the model and compose a sound story, we collected datasets covering the following aspects:
- The transaction data in Taipei in the past 3 years (2019-2022)
- The POIs around each transaction (details below)
- District and Sub-district geographical information
- Population Distribution
- Income distribution
- Education distribution
- Daytime/Nighttime population flow
- Age Distribution
- etc.
- We use the Google Maps API to acquire the data, mainly through two of its APIs.
- We first use the name of the transaction location to find its latitude and longitude, then use the coordinates to acquire its surrounding POIs.
- We count POIs of each kind within several radii. The buffers we chose are:
- 500m: A walking distance
- 1000m: A public transportation or scooter distance
- 3000m: A private transportation distance
- We chose the following list of POI types to count
- Subway Station
- Bus Station
- Police Station
- Hospital
- Supermarket
- Library
- University
- Primary School
- Church
- Nightclub
- Shopping Mall
- Park
- Caution: the Google API can be costly.
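
The buffer counting described above can be sketched offline, once POI coordinates have been downloaded. This is a minimal sketch and not the project's actual code: it assumes POIs are stored as `(poi_type, lat, lon)` tuples and uses the haversine formula for distance, whereas the notebooks may rely on the API's own radius parameter instead. The function names, the tuple layout, and the sample coordinates are all hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000     # mean Earth radius in metres
BUFFERS_M = (500, 1000, 3000)  # the three buffer radii listed above


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_pois(lat, lon, pois, buffers=BUFFERS_M):
    """Return {radius: Counter({poi_type: count})} for one transaction location.

    `pois` is an iterable of (poi_type, lat, lon) tuples (hypothetical layout).
    """
    counts = {r: Counter() for r in buffers}
    for poi_type, plat, plon in pois:
        d = haversine_m(lat, lon, plat, plon)
        for r in buffers:
            if d <= r:
                counts[r][poi_type] += 1
    return counts


# Made-up coordinates near central Taipei, for illustration only.
sample_pois = [
    ("subway_station", 25.0330, 121.5680),  # roughly 260 m east
    ("park", 25.0410, 121.5654),            # roughly 890 m north
    ("hospital", 25.0530, 121.5654),        # roughly 2.2 km north
    ("library", 25.0830, 121.5654),         # roughly 5.6 km north, outside all buffers
]
counts = count_pois(25.0330, 121.5654, sample_pois)
```

Because the buffers are nested, a POI inside 500m is also counted in the 1000m and 3000m buffers. Counting locally from one large query per location, rather than issuing one query per radius, is one way to keep the number of paid API calls down.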
We cluster the transactions using the following feature combinations
- 500m POI
- 1000m POI
- 3000m POI
- 1, 2, 3 + Unit Price of Transaction ($\mathrm{NTD}/\mathrm{m}^2$)
- House Properties
- House Properties + Unit Price of Transaction ($\mathrm{NTD}/\mathrm{m}^2$)
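
As an illustration of clustering one of these combinations (500m POI counts plus unit price), here is a minimal sketch of K-means written with NumPy only. It is not the notebook's implementation, which may use a library such as scikit-learn. The column choice, the values, and the function names are made up for the example, and standardizing the columns before clustering is an assumption, made so that prices in NTD do not dominate small POI counts.

```python
import numpy as np


def standardize(X):
    """Z-score each column so counts and prices share a comparable scale."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of dividing by 0
    return (X - X.mean(axis=0)) / std


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy rows with made-up values: columns are [subway_500m, park_500m].
poi_500m = np.array([[3, 2], [4, 3], [3, 3], [0, 0], [1, 0], [0, 1]])
unit_price = np.array([[300_000], [320_000], [310_000],
                       [150_000], [160_000], [155_000]])

# "500m POI + Unit Price" combination: stack the columns, then standardize.
features = standardize(np.hstack([poi_500m, unit_price]))
labels, centroids = kmeans(features, k=2)
```

The other combinations follow the same pattern: swap in the 1000m or 3000m counts, or the house-property columns, before calling `standardize`.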
We run linear regressions with the following combinations
- All POIs vs. Unit Price
- 500m POI vs. Unit Price
- 1000m POI vs. Unit Price
- 3000m POI vs. Unit Price
- House Properties vs. Unit Price
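
Each combination above amounts to fitting a multivariate linear model of unit price on a chosen set of columns. The following is a minimal sketch of one such fit using ordinary least squares via `numpy.linalg.lstsq`; the notebook may use a different library, and the helper name `fit_ols`, the column choice, and the toy numbers are all assumptions for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, r_squared)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


# Made-up toy data: columns are [subway_500m, park_500m].
poi_500m = np.array([[0, 1], [1, 0], [2, 3], [3, 1], [4, 4], [5, 2]])
# Synthetic price built from a known linear rule, so the fit can be checked.
unit_price = 100_000 + 20_000 * poi_500m[:, 0] + 5_000 * poi_500m[:, 1]

coef, r2 = fit_ols(poi_500m, unit_price)
# coef[0] is the intercept; coef[1:] are the per-POI-type slopes.
```

Fitting the same helper on each feature set and comparing the resulting R² values is one simple way to see which buffer explains the most price variation.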
We consider the following factors that affect house prices
- Education distribution by level for each sub-district in Taipei
- Income distribution (total/average/median income) for each sub-district in Taipei
- Daytime and nighttime population of each district in Taipei, on weekdays and weekends
Below is the repository tree of the project
- README.md
- Final_DataClean_Regression.ipynb
This is the main code for data cleaning, PCA, human mobility, network connectivity, and multivariate regression.
- Final_Cluster_Visualization.ipynb
This is the main code for K-means clustering and map visualization.
- socioSHAPES.zip
This is the ArcGIS project for network connectivity and other geographical analysis.
- LICENSE
- Output_CSV/
This folder contains the numeric outputs in the form of CSV.
- Raw_data/
This folder contains the raw data used in the project.
- 01Transaction/
This folder contains the raw transaction data.
- 02POI/
This folder contains the POIs queried from the Google API and their processed versions.
- 03Census/
This folder contains the census data: education, income, and population flow.
- 04Geographic_data/
This folder contains the shapefiles of Taiwan districts and sub-districts.
- Figures/
This folder contains figures generated from the code.
- Report/
This folder contains both the presentation slides and the final report.
Refer to the Report/ folder.
