Statistical-Learning

NTU 107-02 Statistical Learning: Theory and Applications

Course Information: Syllabus

Final Project - Pet Finder

Estimate adoption efficiency of stray pets by using regression and tree-based models
Presentation
Data and Analysis

Kaggle Competition

Classify industry and occupation by analyzing text data
Kaggle Page (Team: 老師笑話好笑喔)
Code

HW1: K-nearest-neighbors Model

This question uses a subset of the "Million Song Dataset". The dataset has been pre-processed, and the training and test sets have already been split and stored in a dictionary. You can load the data from msd_data1.pickle using pickle.load(). The dictionary has four elements: X_train, Y_train, X_test, and Y_test. As their names indicate, these hold the training and test data. The outcome variable (i.e., y) is the year a song was released, and the features are variables that characterize the sound of a song. The goal is to predict the release year from the sound features.
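
As a rough illustration of this workflow, the sketch below round-trips a toy dictionary through pickle and predicts release years with a hand-written k-nearest-neighbors regressor. Only the four key names and pickle.load() come from the assignment text. The toy values, the helper `knn_predict`, and the choice of k are assumptions for illustration, and the real data would be read from msd_data1.pickle instead.

```python
import io
import math
import pickle


def knn_predict(X_train, Y_train, x, k=3):
    """Predict a target for x as the mean target of its k nearest training rows."""
    # Euclidean distance from x to every training row, paired with its target.
    dists = sorted((math.dist(row, x), y) for row, y in zip(X_train, Y_train))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy stand-in (hypothetical values) with the four keys the assignment names.
toy = {
    "X_train": [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]],
    "Y_train": [1990, 1992, 2005, 2007],
    "X_test": [[0.05, 0.0], [1.05, 1.0]],
    "Y_test": [1991, 2006],
}

# Round-trip through pickle to mirror calling pickle.load() on the real file.
data = pickle.load(io.BytesIO(pickle.dumps(toy)))

preds = [knn_predict(data["X_train"], data["Y_train"], x, k=2) for x in data["X_test"]]

# Root mean squared error between predicted and true release years.
rmse = math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, data["Y_test"])) / len(preds))
```

In practice k would be chosen on a validation split rather than fixed in advance.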

Code

HW2: All About Regressions

We are going to use a dataset in which the outcome value is predicted from 44 features. The dataset was collected from a social media platform. The goal is to understand how a post on a company fan page reaches consumers. The first variable, life_post_consumer, is the number of people who clicked anywhere in the post. We want to build a model that predicts this variable from the values of the other variables.
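
To make the setup concrete, here is a minimal sketch of ordinary least squares, one of several regression models that could be fit to this kind of data. The helpers `fit_ols` and `predict` and the two-feature toy data are hypothetical. The real task would use life_post_consumer as the target and the remaining columns as features, and the solver assumes the features are not collinear.

```python
def fit_ols(X, y):
    """Least-squares weights (intercept first) from the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a constant column for the intercept
    p = len(rows[0])
    # Augmented system [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting (assumes X^T X is invertible).
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def predict(weights, x):
    """Apply fitted weights (intercept first) to one feature row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


# Toy data generated from y = 2 + 3*x1 - x2 (hypothetical, not the course data).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [2 + 3 * a - b for a, b in X]
weights = fit_ols(X, y)
new_prediction = predict(weights, [3, 2])
```

With 44 features, a regularized variant such as ridge or lasso may behave better than plain least squares.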

Code

HW3: Classification

Generative Models

We are going to explore the problem of identifying smartphone position through probabilistic generative models. Motion sensors in smartphones provide valuable information that helps researchers understand their owners. An interesting (and more challenging) task is to identify human activities from the data recorded by motion sensors. For example, we may want to know whether the smartphone owner is walking, running, or biking. In this homework problem, we tackle a simpler problem: we want to know the static position of the smartphone.
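
A minimal sketch of one probabilistic generative classifier, Gaussian naive Bayes, is shown below. The assignment may use a different generative model. The three-axis sensor readings, the labels "face_up" and "face_down", and the helper names are assumptions for illustration only.

```python
import math
from collections import defaultdict


def fit_gnb(X, labels):
    """Estimate a log prior and per-feature mean/variance for each class."""
    groups = defaultdict(list)
    for row, lab in zip(X, labels):
        groups[lab].append(row)
    model = {}
    for lab, rows in groups.items():
        columns = list(zip(*rows))
        means = [sum(c) / len(rows) for c in columns]
        # A small variance floor avoids division by zero for constant features.
        variances = [
            max(sum((v - m) ** 2 for v in c) / len(rows), 1e-6)
            for c, m in zip(columns, means)
        ]
        model[lab] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gnb(model, x):
    """Return the class with the largest log joint probability log p(x, class)."""
    def log_joint(params):
        log_prior, means, variances = params
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        return log_prior + log_lik
    return max(model, key=lambda lab: log_joint(model[lab]))


# Hypothetical accelerometer readings (x, y, z) for two static positions.
X = [
    [0.1, 0.0, 9.8], [-0.1, 0.1, 9.7], [0.0, -0.1, 9.9],
    [0.1, 0.0, -9.8], [0.0, 0.1, -9.7], [-0.1, -0.1, -9.9],
]
labels = ["face_up"] * 3 + ["face_down"] * 3

model = fit_gnb(X, labels)
pred_up = predict_gnb(model, [0.0, 0.0, 9.6])
pred_down = predict_gnb(model, [0.0, 0.0, -9.5])
```

The model is generative in the sense that it fits p(x | class) and p(class) separately and classifies through Bayes' rule.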

Logistic Regression with L2 Regularization

We are going to use the "Adult" dataset from the UCI Machine Learning Repository. The goal is to predict the value of the income column, which is either '>50K' or '<=50K'. The dataset is already split into training and test data, and we respect this train-test split when testing the model.
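
The sketch below shows how an L2 penalty enters logistic regression trained by batch gradient descent. It is an illustration under assumptions, not the homework solution: the one-feature toy data, the encoding of '>50K' as 1 and '<=50K' as 0, the helper names, and the hyperparameters `lam`, `lr`, and `epochs` are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg_l2(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Minimize mean log loss + (lam / 2) * ||w||^2 by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - target
            grad_b += err / n
            for j, xi in enumerate(row):
                grad_w[j] += err * xi / n
        # The penalty contributes lam * w to the gradient; the bias is not penalized.
        w = [wi - lr * (g + lam * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data: label 1 stands in for '>50K' and 0 for '<=50K' (assumed encoding).
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]

w_weak, b_weak = fit_logreg_l2(X, y, lam=0.01)
w_strong, b_strong = fit_logreg_l2(X, y, lam=1.0)

# Hard predictions from the weakly regularized model.
predicted = [1 if sigmoid(b_weak + w_weak[0] * row[0]) >= 0.5 else 0 for row in X]
```

Comparing the two fits shows the usual effect of the penalty: the larger `lam` shrinks the weight toward zero.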

Code

HW4: Data Visualization via Dimensionality Reduction

A large portion of high school students are admitted to universities through an application and screening process in which each university department must first offer admission to applicants before students can choose where they want to go. If we think of applicants as the customers of an academic department, then the overlap in applicants offered admission by different departments can be used to understand the competitive relationships between departments. We are going to visualize these competitive relationships using the University Department Offer of Admission Dataset (UDOAD).
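
One way to picture this idea is sketched below: build a department-by-applicant offer matrix and project the departments to two dimensions with PCA, computed here by power iteration on the Gram matrix. The (applicant, department) record format, the department names, the helper names, and the choice of PCA are all assumptions. The layout of UDOAD is not described here, and the assignment may call for a different reduction method.

```python
import math


def offer_matrix(offers):
    """Rows are departments, columns are applicants, 1.0 marks an offer."""
    depts = sorted({d for _, d in offers})
    applicants = sorted({a for a, _ in offers})
    col = {a: j for j, a in enumerate(applicants)}
    rows = {d: [0.0] * len(applicants) for d in depts}
    for a, d in offers:
        rows[d][col[a]] = 1.0
    return depts, [rows[d] for d in depts]


def pca_2d(M, iters=500):
    """2-D PCA scores for the rows of M (assumes centered data has rank >= 2)."""
    n, p = len(M), len(M[0])
    means = [sum(r[j] for r in M) / n for j in range(p)]
    Z = [[r[j] - means[j] for j in range(p)] for r in M]
    # Gram matrix Z Z^T; its eigenvectors scaled by sqrt(eigenvalue) are the scores.
    G = [[sum(a * b for a, b in zip(Z[i], Z[k])) for k in range(n)] for i in range(n)]
    coords = []
    for _ in range(2):
        v = [1.0 / (i + 1) for i in range(n)]  # fixed, non-uniform start vector
        for _ in range(iters):
            w = [sum(G[i][k] * v[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(G[i][k] * v[k] for k in range(n)) for i in range(n))
        coords.append([math.sqrt(max(lam, 0.0)) * x for x in v])
        # Deflate so the next pass finds the following component.
        G = [[G[i][k] - lam * v[i] * v[k] for k in range(n)] for i in range(n)]
    return list(zip(*coords))


# Hypothetical offers: CS and EE share applicants, as do History and Law.
offers = [
    ("a1", "CS"), ("a1", "EE"), ("a2", "CS"), ("a2", "EE"),
    ("a3", "CS"), ("a3", "EE"), ("a4", "History"), ("a4", "Law"),
    ("a5", "History"), ("a5", "Law"), ("a6", "CS"), ("a7", "Law"),
]

depts, M = offer_matrix(offers)
embedding = dict(zip(depts, pca_2d(M)))

# Departments that compete for the same applicants should land close together.
d_cs_ee = math.dist(embedding["CS"], embedding["EE"])
d_cs_law = math.dist(embedding["CS"], embedding["Law"])
```

The 2-D coordinates in `embedding` can then be passed to any scatter-plot tool for the actual visualization.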

Code

HW5: Time Series Prediction

The dataset contains 104 weeks of training data and 39 weeks of test data. The time series records a supermarket's product sales over a particular period. The goal is to predict sales in the test period.
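
As a simple point of reference for this split, the sketch below forecasts the 39 test weeks with a seasonal-naive baseline and scores it by mean absolute error. The 52-week seasonality, the synthetic sales series, and the helper names are assumptions. The actual data and the models used in the homework are not shown here.

```python
import math


def seasonal_naive_forecast(history, horizon, period=52):
    """Forecast each future week with the value from the last observed cycle."""
    start = len(history) - period
    return [history[start + (h % period)] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic weekly sales with a yearly cycle (hypothetical, not the course data).
series = [100 + 20 * math.sin(2 * math.pi * t / 52) for t in range(104 + 39)]
train, test = series[:104], series[104:]

forecast = seasonal_naive_forecast(train, horizon=len(test))
mae_seasonal = mean_absolute_error(test, forecast)

# Comparison baseline: predict the training mean for every test week.
train_mean = sum(train) / len(train)
mae_mean = mean_absolute_error(test, [train_mean] * len(test))
```

A baseline like this gives a floor that any fitted model should beat on the test period.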

Code


hsiehkl/Statistical-Learning
