Commit cdb4c7c

added "data design" section
1 parent 1d6db0a commit cdb4c7c

1 file changed: +24 -21 lines changed

README.md

Lines changed: 24 additions & 21 deletions
Original file line number | Diff line number | Diff line change
@@ -85,7 +85,7 @@ Out of personal preference and need for focus, I geared the original curriculum
8585
[★ What are some good resources for learning about numerical analysis? / Quora ]
8686
(http://www.quora.com/What-are-some-good-resources-for-learning-about-numerical-analysis)
8787

88-
* **Linear Algebra & Programming**
88+
#### **Linear Algebra & Programming**
8989
* Linear Algebra [Khan Academy / Videos](http://bit.ly/khanlinalg)
9090
* Linear Algebra / Levandosky [Stanford / Book ```$10```](http://amzn.to/1kIfmmI)
9191
* Linear Programming (Math 407) [University of Washington / Course](http://bit.ly/course-uw-linearprogramming)
@@ -95,45 +95,48 @@ Out of personal preference and need for focus, I geared the original curriculum
9595
* Vector Calculus: Understanding the Cross Product [Better Explained / Article](https://betterexplained.com/articles/cross-product/)
9696
* Vector Calculus: Understanding the Dot Product [Better Explained / Article](https://betterexplained.com/articles/vector-calculus-understanding-the-dot-product/)
9797

98-
* **Convex Optimization**
98+
#### **Convex Optimization**
9999
* Convex Optimization / Boyd [Stanford / Lectures](http://stanford.edu/class/ee364a/index.html) / [Book](http://stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf)
100100

101-
* **Statistics**
101+
#### **Statistics**
102102
* Stats in a Nutshell [Book ```$29```](http://amzn.to/1iMnx2X)
103103
* Think Stats: Probability and Statistics for Programmers [Digital](http://bit.ly/ebook-thinkstats) & [Book ```$25```](http://amzn.to/RcVnTf)
104104
* Think Bayes [Digital](http://bit.ly/ebook-thinkbayes) & [Book ```$25```](http://amzn.to/1hmy4Cr)
105105

106-
* **Differential Equations & Calculus**
106+
#### **Differential Equations & Calculus**
107107
* Differential Equations in Data Science [Python Tutorial](http://bit.ly/ipynb-differentialeq)
108-
109-
* **Problem Solving**
108+
#### **Problem Solving**
110109
* Problem-Solving Heuristics "How To Solve It" [Polya / Book ```$10```](http://amzn.to/1mqJRSi)
111110

112111
### Computing
113112

114113
Get your environment up and running with the [Data Science Toolbox](http://bit.ly/datascitoolbox)
115114

116-
* **Algorithms**
115+
#### **Algorithms**
117116
* Algorithms Design & Analysis I [Stanford / Coursera](http://bit.ly/coursera-algo)
118117
* Algorithm Design, Kleinberg & Tardos [Book ```$125```](http://amzn.to/1iMnWm5)
119118

120-
* **Distributed Computing Paradigms**
119+
#### **Distributed Computing Paradigms**
121120
* *See Intro to Data Science [UW / Lectures on MapReduce](http://bit.ly/uwintrodatascience)
122121
* Intro to Hadoop and MapReduce [Cloudera / Udacity Course](http://bit.ly/udacity-hadoopmapreduce) *includes select free excerpts of Hadoop: The Definitive Guide [Book ```$29```](http://amzn.to/1i7wgLv)
123122

124-
* **Databases**
123+
#### **Databases**
125124
* Introduction to Databases [Stanford / Online Course](https://bit.ly/introdatabases)
126125
* SQL School [Mode Analytics / Tutorials](http://bit.ly/sqlschool)
127126
* SQL Tutorials [SQLZOO / Tutorials](http://bit.ly/tut-sqlzoo)
128127

129-
* **Data Mining**
128+
#### **Data Mining**
130129
* Mining Massive Data Sets / Stanford [Coursera](https://www.coursera.org/course/mmds) & [Digital](http://bit.ly/ebook-miningmassivedata) & [Book ```$58```](http://amzn.to/1txocpo)
131130
* Mining The Social Web [Book ```$30```](http://amzn.to/1mqxAsB)
132131
* Introduction to Information Retrieval / Stanford [Digital](http://bit.ly/ebook-stanford-inforetrieval) & [Book ```$56```](http://amzn.to/1mWbnUT)
133132

133+
#### **Data Design**
134+
How does the real world get translated into data? How should one structure that data to make it understandable and usable? Extends beyond database design to usability of schemas and models.
135+
* [Tidy Data in Python](http://www.jeannicholashould.com/tidy-data-in-python.html)
136+
134137
_OSDSM Specialization: [Web Scraping & Crawling](https://github.com/datasciencemasters/go/blob/master/specializations.md#web-scraping--crawling)_
135138

136-
* **Machine Learning**
139+
### **Machine Learning**
137140

138141
_Foundational & Theoretical_
139142
* Machine Learning [Ng Stanford / Coursera](http://bit.ly/stanford-ml) & [Stanford CS 229](http://bit.ly/stanfordcs229)
@@ -146,7 +149,7 @@ _OSDSM Specialization: [Web Scraping & Crawling](https://github.com/datasciencem
146149
* Machine Learning for Hackers [ipynb / digital book](http://bit.ly/mlforhackers)
147150
* Intro to scikit-learn, SciPy2013 [youtube tutorials](http://bit.ly/scikit-video-tuts)
148151

149-
* **Probabilistic Modeling**
152+
### **Probabilistic Modeling**
150153
* Probabilistic Programming and Bayesian Methods for Hackers [Github / Tutorials](http://bit.ly/ipnb-probabilisticprogramming)
151154
* Probabilistic Graphical Models [Stanford / Coursera](http://bit.ly/stanford-pgm)
152155

@@ -177,7 +180,7 @@ One of the "unteachable" skills of data science is an intuition for analysis. Wh
177180

178181
### Data Communication and Design
179182

180-
* **Visualization**
183+
#### **Visualization**
181184

182185
_Data Visualization and Communication_
183186
* The Truthful Art: Data, Charts, and Maps for Communication [Cairo / Book ```$21```](http://amzn.to/1UydGAc)
@@ -218,32 +221,32 @@ Installing Basic Packages [Python, virtualenv, NumPy, SciPy, matplotlib and IPyt
218221

219222
_More Libraries can be found in the ["awesome machine learning"](https://github.com/josephmisiti/awesome-machine-learning#python) repo & in related [specializations](https://github.com/datasciencemasters/go/blob/master/specializations.md)_
220223

221-
* **Data Structures & Analysis Packages**
224+
#### **Data Structures & Analysis Packages**
222225
* Flexible and powerful data analysis / manipulation library with labeled data structures objects, statistical functions, etc [pandas](http://bit.ly/py-pandas) & Tutorials [Python for Data Analysis / Book](http://amzn.to/Q2pI5I)
223226

224-
* **Machine Learning Packages**
227+
#### **Machine Learning Packages**
225228
* [scikit-learn](http://bit.ly/py-scikit) - Tools for Data Mining & Analysis
226229

227-
* **Networks Packages**
230+
#### **Networks Packages**
228231
* [networkx](http://bit.ly/py-networkx) - Network Modeling & Viz
229232

230-
* **Statistical Packages**
233+
#### **Statistical Packages**
231234
* [PyMC](http://bit.ly/py-pymc) - Bayesian Inference & Markov Chain Monte Carlo sampling toolkit
232235
* [Statsmodels](http://bit.ly/py-statsmodel) - Python module that allows users to explore data, estimate statistical models, and perform statistical tests
233236
* [PyMVPA](http://bit.ly/py-mvpa) - Multivariate Pattern Analysis in Python
234237

235-
* **Natural Language Processing & Understanding**
238+
#### **Natural Language Processing & Understanding**
236239
* [NLTK](http://bit.ly/py-nltk) - Natural Language Toolkit
237240
* [Gensim](http://bit.ly/py-gensim) - Python library for topic modeling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.
238241

239-
* **Data APIs**
242+
#### **Data APIs**
240243
* [twython](http://bit.ly/py-twython) - Python wrapper for the Twitter API
241244

242-
* **Visualization Packages**
245+
#### **Visualization Packages**
243246
* [matplotlib](http://bit.ly/matplotlib-docs) - well-integrated with analysis and data manipulation packages like numpy and pandas
244247
* [Seaborn](http://bit.ly/seaborn-python) - a high-level statistical visualization package built on top of matplotlib
245248

246-
* **iPython Data Science Notebooks**
249+
#### **iPython Data Science Notebooks**
247250
* [Data Science in IPython Notebooks](http://bit.ly/ipynb-ds) (Linear Regression, Logistic Regression, Random Forests, K-Means Clustering)
248251
* [A Gallery of Interesting IPython Notebooks - Pandas for Data Analysis](http://bit.ly/ipyfordataanalysis)
249252

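The "Data Design" section added in this commit asks how data should be structured to be understandable and usable, and links to a tidy-data tutorial. As a rough illustration of that idea, here is a minimal sketch that reshapes a "wide" table into a tidy one (one observation per row). The sample records, column names, and the `melt` helper are all hypothetical and are not taken from the README or the linked tutorial; the helper only mimics the general idea behind pandas' `melt` using the standard library.

```python
# Hypothetical "wide" table: one row per person, one column per treatment.
# These records are made up for illustration only.
wide_rows = [
    {"name": "Ann", "treatment_a": 4, "treatment_b": 7},
    {"name": "Bob", "treatment_a": 9, "treatment_b": None},
]


def melt(rows, id_key, var_name="variable", value_name="value"):
    """Reshape wide rows into tidy rows: one observation per row.

    Every column other than `id_key` becomes a (variable, value) pair.
    """
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the identifier is repeated on each tidy row
            tidy.append(
                {id_key: row[id_key], var_name: column, value_name: value}
            )
    return tidy


tidy_rows = melt(wide_rows, "name", var_name="treatment", value_name="result")
for record in tidy_rows:
    print(record)
```

In the tidy form each row holds a single (name, treatment, result) observation, which tends to make filtering, grouping, and plotting more uniform than the wide layout.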