
CS7304H Statistical Learning Theory and Methods

This file exists only for code archiving and is not part of the original course submission.

Introduction

  • Textbook: The Elements of Statistical Learning
  • project: course project
    • Course project requirements:

      In this project, you are required to complete a classification task on high-dimensional sparse data. The data come from an anonymous text classification dataset in which each text is classified into one of 20 predefined categories. As our course focuses on statistical learning rather than feature extraction from raw data, we provide pre-extracted features instead of the original texts. There are 11314 texts for training and another 7532 texts for testing. Each text is represented as a 10000-dimensional vector. You may load these features and their corresponding labels with the numpy.load or pickle.load function. Note that the text features are quite sparse and high-dimensional; handling such data properly might be the key to satisfactory performance. You may split off a validation set with a ratio of your choosing. Your objective is to train statistical learning models on the provided data and achieve as high a test accuracy as you can. Detailed descriptions are listed below.
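The workflow described in the requirements (load features, hold out a validation set, fit a classifier, measure accuracy) can be sketched roughly as follows. This is only an illustration and not the code archived in this repository: the file names in the comments are hypothetical, the data are a small synthetic stand-in for the real 11314 × 10000 features, and multinomial logistic regression trained by plain gradient descent is an assumed model choice.

```python
import numpy as np

# With the real data one would load the arrays instead, for example:
#   X = np.load("train_feature.npy")   # hypothetical file name
#   y = np.load("train_labels.npy")    # hypothetical file name

def make_sparse_toy(n=600, d=200, k=4, seed=0):
    """Synthetic sparse data: each class activates features in its own block."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, k, size=n)
    X = np.zeros((n, d))
    block = d // k
    for i, c in enumerate(y):
        own = rng.choice(block, size=5, replace=False) + c * block
        noise = rng.choice(d, size=2, replace=False)
        X[i, own] = 1.0
        X[i, noise] = 1.0
    return X, y

def train_val_split(X, y, val_ratio=0.2, seed=0):
    """Shuffle the indices and hold out a fraction for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = int(len(y) * val_ratio)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

def fit_softmax(X, y, n_classes, lr=0.5, epochs=200, l2=1e-4):
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                         # one-hot labels
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                              # gradient w.r.t. logits
        W -= lr * (X.T @ G + l2 * W)                 # L2 penalty on weights only
        b -= lr * G.sum(axis=0)
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)

X, y = make_sparse_toy()
sparsity = 1.0 - np.count_nonzero(X) / X.size        # fraction of zero entries
X_tr, y_tr, X_val, y_val = train_val_split(X, y, val_ratio=0.2)
W, b = fit_softmax(X_tr, y_tr, n_classes=4)
val_acc = float(np.mean(predict(X_val, W, b) == y_val))
print(f"sparsity={sparsity:.3f}, validation accuracy={val_acc:.3f}")
```

At the real data size, a dense float64 array of shape 11314 × 10000 takes roughly 900 MB, so storing the features in a sparse matrix format such as those in scipy.sparse, or using a library solver that accepts sparse input, would likely be more practical than the dense arrays used in this sketch.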

Directory structure

  • hw2-hw7: seven course assignments

References