project
README.md
hw2.pdf
hw3.pdf
hw4.pdf
hw5.pdf
hw6.pdf
hw7.pdf

CS7304H Statistical Learning Theory and Methods

This file is for code archival only and is not part of the original course submission.

Introduction

Textbook: The Elements of Statistical Learning
project: course project
- Course project requirements:
  
  In this project, you are required to complete a classification task on high-dimensional sparse data. The data come from an anonymous text classification dataset in which each text is assigned to one of 20 predefined categories. Because our course focuses on statistical learning rather than feature extraction from raw data, we provide pre-extracted features instead of the original texts. There are 11314 texts for training and another 7532 texts for testing. Each text is represented as a 10000-dimensional vector. You can load these features and their corresponding labels with the numpy.load or pickle.load function. Note that the text features are quite sparse and high-dimensional; handling such data properly may be the key to satisfactory performance. You may split off a validation set yourself, using whatever ratio you prefer. Your objective is to train statistical learning models on the provided data and achieve as high a test accuracy as you can. Detailed descriptions are listed below.
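
The requirement above mentions holding out a validation set from the training data. A minimal sketch of a stratified split follows, assuming NumPy is available. The synthetic `X` and `y` only stand in for the provided features and labels (the real file names are not given here), and `stratified_split` is a hypothetical helper, not part of the course materials.

```python
import numpy as np


def stratified_split(labels, val_ratio=0.2, seed=0):
    """Return (train_idx, val_idx) so each class keeps roughly val_ratio in validation."""
    rng = np.random.default_rng(seed)
    train_parts, val_parts = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(idx)                     # shuffle in place
        n_val = max(1, int(round(len(idx) * val_ratio)))
        val_parts.append(idx[:n_val])
        train_parts.append(idx[n_val:])
    return np.concatenate(train_parts), np.concatenate(val_parts)


# Synthetic stand-in for the provided data (assumption): in practice X and y
# would come from numpy.load or pickle.load on the course files.
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 200, 50, 20
mask = rng.random((n_samples, n_features)) < 0.05   # keep about 5% of entries
X = rng.random((n_samples, n_features)) * mask      # mostly-zero feature matrix
y = np.arange(n_samples) % n_classes                # 10 samples per class

train_idx, val_idx = stratified_split(y, val_ratio=0.2)
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

# Fraction of zero entries, a quick check of how sparse the features are.
sparsity = 1.0 - np.count_nonzero(X) / X.size
```

Splitting per class keeps all 20 categories represented in the validation set, which a plain random split does not guarantee for small classes. The 0.2 ratio is only an illustrative choice.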