hsiehkl/Statistical-Learning

Statistical-Learning

NTU 107-02 Statistical Learning: Theory and Applications

Course Information: Syllabus

Final Project - Pet Finder

Kaggle Competition

  • Classify industry and occupation by analyzing text data
  • Kaggle Page (Team: 老師笑話好笑喔, "The Teacher's Jokes Are So Funny")
  • Code

HW1: K-nearest-neighbors Model

We are going to use a subset of the "Million Songs Dataset" in this question. The dataset has been pre-processed and split into training and testing sets, which are stored in a dictionary. You can load the data from msd_data1.pickle using pickle.load(). The dictionary has four elements: X_train, Y_train, X_test, and Y_test. As their names indicate, they hold the training and testing data. The outcome variable (i.e., y) is the year a song was released, and the features are variables that characterize the sound of a song. The goal is to predict the release year from the sound features.
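A minimal sketch of a k-nearest-neighbors regressor for this task. It assumes the features are numeric and already suitably scaled, and that the pickle holds arrays or lists under the four keys above. Predicting the mean release year of the k closest training songs by Euclidean distance is one common choice; the assignment may specify a different distance, neighbor weighting, or k. `load_msd`, `knn_predict`, and `rmse` are hypothetical helper names.

```python
import math
import pickle


def load_msd(path="msd_data1.pickle"):
    """Load the pre-split dictionary (keys X_train, Y_train, X_test, Y_test)."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["X_train"], data["Y_train"], data["X_test"], data["Y_test"]


def knn_predict(X_train, y_train, x, k=5):
    """Predict a release year as the mean year of the k training songs closest to x."""
    # Euclidean distance from the query song to every training song, sorted ascending.
    dists = sorted(
        ((math.dist(x, xi), yi) for xi, yi in zip(X_train, y_train)),
        key=lambda pair: pair[0],
    )
    nearest = dists[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


def rmse(y_true, y_pred):
    """Root-mean-squared error, in years."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

With the real file, `X_tr, y_tr, X_te, y_te = load_msd()` followed by `knn_predict(X_tr, y_tr, row, k)` for each test row gives predictions to score with `rmse`. This brute-force search costs one pass over the training set per query, so a vectorized or tree-based neighbor search is preferable at full scale.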

HW2: All About Regressions

We are going to use a dataset with 44 features, collected from a social media platform, to predict an outcome value. The goal is to understand how a post on a company's fan page reaches consumers. The first variable, life_post_consumer, is the number of people who clicked anywhere in the post. We want to construct a model that predicts this variable from the values of the other variables.
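The README does not list which regressions the homework covers. As one hedged example, the sketch below fits ordinary least squares with an intercept, plus an optional ridge (L2) penalty, by solving the normal equations (A^T A + ridge * I) beta = A^T y, where A is the feature matrix with a leading column of ones and the intercept is left unpenalized. It is written in plain Python to stay self-contained; `numpy.linalg.lstsq` would be the practical tool. All function names are hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    aug = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(M)]
    for col in range(p):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("singular system; try a positive ridge penalty")
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (aug[r][p] - sum(aug[r][c] * x[c] for c in range(r + 1, p))) / aug[r][r]
    return x


def fit_linear_regression(X, y, ridge=0.0):
    """Return [intercept, w_1, ..., w_d] minimizing ||y - A beta||^2 + ridge * ||w||^2."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # leading 1 = intercept column
    p = len(A[0])
    AtA = [[sum(a[r] * a[c] for a in A) for c in range(p)] for r in range(p)]
    Aty = [sum(a[r] * yi for a, yi in zip(A, y)) for r in range(p)]
    for j in range(1, p):  # penalize the slopes only, not the intercept
        AtA[j][j] += ridge
    return solve(AtA, Aty)


def predict(beta, X):
    """Apply fitted coefficients to new rows."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]
```

Setting `ridge=0.0` gives plain least squares; a positive value shrinks the 44 slope coefficients toward zero, which can help when features are correlated.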

HW3: Classification

Generative Models

We are going to explore the problem of identifying a smartphone's position with probabilistic generative models. Motion sensors in smartphones give researchers valuable information for understanding the phones' owners. An interesting (and more challenging) task is to identify human activities from the data recorded by motion sensors; for example, we may want to know whether the smartphone's owner is walking, running, or biking. In this homework problem, we are going to tackle a simpler problem: we want to know the static position of the smartphone.
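As a hedged illustration of a probabilistic generative classifier, the sketch below models each class (phone position) with a Gaussian that has a diagonal covariance, which is Gaussian naive Bayes, and predicts the class with the largest unnormalized log posterior, log p(class) + log p(x | class). The homework may instead call for full-covariance Gaussians or other class-conditional distributions. The function names and the integer class labels used in testing are hypothetical stand-ins for sensor features and positions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a class prior plus per-feature Gaussian mean and variance for each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        m = len(rows)
        means = [sum(col) / m for col in zip(*rows)]
        variances = [
            max(sum((v - mu) ** 2 for v in col) / m, var_floor)  # MLE variance, floored
            for col, mu in zip(zip(*rows), means)
        ]
        model[label] = (math.log(m / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class maximizing log p(class) + sum_j log N(x_j | mean_j, var_j)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
            for xj, mu, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)
```

Working in log space avoids underflow when many small per-feature densities are multiplied together.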

Logistic Regression with L2 Regularization

We are going to use the "Adult" dataset from the UCI Machine Learning Repository. The goal is to predict the label in the income column, which is either '>50K' or '<=50K'. The dataset comes with a predefined train-test split, and we are going to respect this split when testing the model.
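A minimal sketch of L2-regularized logistic regression trained by batch gradient descent on J(w, b) = mean log-loss + (lam / 2) * ||w||^2, with the intercept b left unpenalized. It assumes the Adult features have already been converted to numbers (for example, one-hot encoding the categorical columns and standardizing the numeric ones) and that income is mapped to 1 for '>50K' and 0 for '<=50K'. The hyperparameters and function names are illustrative, not the course's.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_l2(X, y, lam=1.0, lr=0.1, n_iter=2000):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one example is (p - y) * x.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The L2 term adds lam * w to the weight gradient; the intercept is not penalized.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that income is '>50K'."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

Larger values of `lam` pull the weights toward zero; on the real data, `lam` would typically be chosen by cross-validation within the training split so that the provided test split stays untouched.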

HW4: Data Visualization via Dimensionality Reduction

A large portion of high school students are admitted to universities through an application and screening process in which each university department first offers admission to applicants, and the students then choose where they want to go. If we think of applicants as the customers of an academic department, then the overlap between the applicants offered admission by different departments can be used to understand the competitive relationships between academic departments. We are going to visualize these competitive relationships using the University Department Offer of Admission Dataset (UDOAD).
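One way to turn offer overlaps into a picture, sketched under stated assumptions: represent each applicant by the set of departments that offered them admission (a hypothetical input format), measure how different two departments' pools of offered applicants are with 1 minus their Jaccard similarity, and embed the departments in 2-D with classical (Torgerson) multidimensional scaling so that departments competing for the same applicants land near each other. The homework may use a different similarity or reduction method, such as PCA. The eigenvectors here come from plain power iteration with deflation to keep the block dependency-free; `numpy.linalg.eigh` would be the practical choice for hundreds of departments. All names are hypothetical.

```python
import math
import random


def co_offer_dissimilarity(offers, departments):
    """1 - Jaccard similarity between the sets of applicants each department offered."""
    offered = {d: set() for d in departments}
    for applicant, depts in enumerate(offers):
        for d in depts:
            offered[d].add(applicant)
    D = []
    for a in departments:
        row = []
        for b in departments:
            union = offered[a] | offered[b]
            if a == b:
                row.append(0.0)
            elif not union:
                row.append(1.0)
            else:
                row.append(1.0 - len(offered[a] & offered[b]) / len(union))
        D.append(row)
    return D


def classical_mds(D, dims=2, n_iter=1000, seed=0):
    """Embed items so Euclidean distances between coordinates approximate D."""
    n = len(D)
    sq = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(row) / n for row in sq]
    grand = sum(mean) / n
    # Double centering: B = -1/2 * J * D^2 * J, with J the centering matrix.
    B = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Shift by a Gershgorin bound so power iteration finds the largest
        # (not the largest-magnitude) eigenvalue of B.
        shift = max(sum(abs(v) for v in row) for row in B)
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(n_iter):
            w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, Bv))  # Rayleigh quotient = eigenvalue
        if lam <= 1e-12:
            break  # no positive eigenvalues left, so no further real dimensions
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next eigenvector.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

Plotting `classical_mds(co_offer_dissimilarity(offers, departments))` with one labeled point per department would give the competition map; the axes themselves carry no meaning, only the relative distances.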

HW5: Time Series Prediction

The dataset contains 104 weeks of training data and 39 weeks of test data. The time series records a supermarket's product sales over a particular period. The goal is to predict sales in the test period.
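Two simple baselines that a forecasting model for this task should beat, sketched under the assumption that sales follow a roughly yearly (52-week) pattern, which the 104 training weeks (two full years) make it possible to check. The naive forecast repeats the last observed week. The seasonal-naive forecast reuses the value from the same week one season earlier. The 52-week season and the function names are assumptions, not part of the assignment.

```python
import math


def naive_forecast(train, horizon):
    """Repeat the last observed value for every future week."""
    return [train[-1]] * horizon


def seasonal_naive_forecast(train, horizon, season=52):
    """Forecast each future week with the value from the same week one season earlier."""
    if len(train) < season:
        raise ValueError("need at least one full season of history")
    last_season = train[-season:]
    # Future week h (0-based) maps to the same position within the last observed season.
    return [last_season[h % season] for h in range(horizon)]


def rmse(actual, predicted):
    """Root-mean-squared forecast error over the test weeks."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

For the 39 test weeks, `seasonal_naive_forecast(train, 39)` uses training weeks 53 through 91, which are the same calendar weeks one year earlier.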