* See Intro to Data Science [UW / Lectures on MapReduce](http://bit.ly/uwintrodatascience)
* Intro to Hadoop and MapReduce [Cloudera / Udacity Course](http://bit.ly/udacity-hadoopmapreduce), *includes select free excerpts of Hadoop: The Definitive Guide* [Book ```$29```](http://amzn.to/1i7wgLv)
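
MapReduce splits a job into a map phase that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that combines each group. The classic word-count job can be sketched in plain, single-process Python to show that data flow; this is an illustration only, not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every lower-cased, whitespace-separated token.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group every emitted value by its key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values emitted for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
```

On a real cluster the map and reduce calls run in parallel on different machines, which is why the map step must not depend on any other line and the reduce step must only see one key's values.
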
#### **Databases**
* Introduction to Databases [Stanford / Online Course](https://bit.ly/introdatabases)
* SQL School [Mode Analytics / Tutorials](http://bit.ly/sqlschool)
* Mining The Social Web [Book ```$30```](http://amzn.to/1mqxAsB)
* Introduction to Information Retrieval / Stanford [Digital](http://bit.ly/ebook-stanford-inforetrieval) & [Book ```$56```](http://amzn.to/1mWbnUT)
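
Much of the database material above comes down to declarative queries over tables. A small, self-contained illustration using Python's built-in `sqlite3` module and an in-memory database follows; the `orders` table, its columns, and the sample rows are made up for the example.

```python
import sqlite3


def revenue_by_customer(rows: list[tuple[str, float]]) -> list[tuple[str, float]]:
    # An in-memory database keeps the example self-contained.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT NOT NULL, amount REAL NOT NULL)")
        # Parameterized inserts avoid building SQL strings by hand.
        conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)", rows)
        # GROUP BY aggregates per customer; ORDER BY makes the result deterministic.
        cur = conn.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM orders GROUP BY customer ORDER BY total DESC, customer"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("ada", 30.0), ("bob", 12.5), ("ada", 20.0), ("cy", 12.5)]
    for customer, total in revenue_by_customer(sample):
        print(customer, total)
```

The same `SELECT ... GROUP BY ... ORDER BY` query runs largely unchanged on other SQL engines; only the connection setup is SQLite-specific.
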
#### **Data Design**
How does the real world get translated into data? How should that data be structured to make it understandable and usable? This goes beyond database design to the usability of schemas and models.
* [Tidy Data in Python](http://www.jeannicholashould.com/tidy-data-in-python.html)
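
In tidy data each variable is a column and each observation is a row. A common fix for a "wide" table, where one variable (here, year) is spread across column headers, is to melt it into long form. The sketch below does that on plain dicts; the `melt` helper, column names, and numbers are hypothetical, and pandas provides the same reshaping as `pandas.melt`.

```python
from typing import Any


def melt(rows: list[dict[str, Any]], id_vars: list[str],
         var_name: str = "variable", value_name: str = "value") -> list[dict[str, Any]]:
    """Turn wide records into long (tidy) records, one per non-id column."""
    tidy = []
    for row in rows:
        ids = {k: row[k] for k in id_vars}
        for column, value in row.items():
            if column in id_vars:
                continue
            # Each (id, column) pair becomes its own observation row.
            tidy.append({**ids, var_name: column, value_name: value})
    return tidy


if __name__ == "__main__":
    # Made-up wide table: one row per country, one column per year.
    wide = [
        {"country": "A", "1999": 10, "2000": 20},
        {"country": "B", "1999": 30, "2000": 40},
    ]
    for record in melt(wide, id_vars=["country"], var_name="year", value_name="cases"):
        print(record)
```

After melting, "year" is a proper column that can be filtered, grouped, or joined on, which is the practical payoff of the tidy layout.
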
_More libraries can be found in the ["awesome machine learning"](https://github.com/josephmisiti/awesome-machine-learning#python) repo & in related [specializations](https://github.com/datasciencemasters/go/blob/master/specializations.md)_
#### **Data Structures & Analysis Packages**
* [pandas](http://bit.ly/py-pandas) - Flexible and powerful data analysis / manipulation library with labeled data structures, statistical functions, etc. Tutorials: [Python for Data Analysis / Book](http://amzn.to/Q2pI5I)
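
pandas' labeled structures turn split-apply-combine operations into one-liners, for example `df.groupby("team")["score"].mean()`. To show what that computes without requiring pandas, here is a plain-Python sketch of the same grouping; the `group_mean` helper and the column names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_mean(records: list[dict], key: str, value: str) -> dict:
    # Split: collect each record's value under its group label.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[value])
    # Apply + combine: one mean per group, with labels sorted
    # (mirroring groupby's default sort=True).
    return {label: mean(vals) for label, vals in sorted(groups.items())}


if __name__ == "__main__":
    rows = [
        {"team": "b", "score": 4},
        {"team": "a", "score": 1},
        {"team": "a", "score": 3},
    ]
    print(group_mean(rows, "team", "score"))
```

In pandas the labels become the index of the resulting Series, and missing values are skipped by default, which this sketch does not handle.
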
#### **Machine Learning Packages**
* [scikit-learn](http://bit.ly/py-scikit) - Tools for Data Mining & Analysis
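
scikit-learn packages standard algorithms behind a common `fit`/`predict` interface; k-means clustering, for instance, is `sklearn.cluster.KMeans`. Below is a minimal plain-Python sketch of Lloyd's algorithm, the alternating assign/update loop that k-means uses. It assumes 2-D points and takes the initial centroids explicitly rather than using scikit-learn's k-means++ initialization; the `kmeans` function is hypothetical, not scikit-learn's code.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], centroids: list[Point],
           max_iter: int = 100) -> tuple[list[Point], list[int]]:
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: centroids (and therefore assignments) stopped changing
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(pts, [(0, 0), (10, 10)]))
```

Because the result depends on the starting centroids, libraries typically run several initializations and keep the one with the lowest within-cluster sum of squares.
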
#### **Networks Packages**
* [networkx](http://bit.ly/py-networkx) - Network Modeling & Viz
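
For an unweighted graph, networkx's `nx.shortest_path(G, source, target)` returns a path with the fewest edges. The breadth-first search behind that idea can be sketched on a plain adjacency dict; the graph and the `bfs_shortest_path` helper are hypothetical, and this is not networkx's implementation.

```python
from collections import deque
from typing import Optional


def bfs_shortest_path(adj: dict[str, list[str]], source: str,
                      target: str) -> Optional[list[str]]:
    # parent records how each node was first reached; it doubles as the visited set.
    parent: dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the source, then reverse.
            path = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for neighbor in adj.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None  # target is unreachable from source


if __name__ == "__main__":
    # Directed adjacency lists; list each edge both ways for an undirected graph.
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
    print(bfs_shortest_path(graph, "a", "e"))
```

BFS visits nodes in order of hop count, so the first time the target is dequeued its parent chain is a shortest path; weighted graphs need Dijkstra's algorithm instead.
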
#### **Statistical Packages**
* [PyMC](http://bit.ly/py-pymc) - Bayesian Inference & Markov Chain Monte Carlo sampling toolkit
* [Statsmodels](http://bit.ly/py-statsmodel) - Python module that allows users to explore data, estimate statistical models, and perform statistical tests
* [PyMVPA](http://bit.ly/py-mvpa) - Multivariate Pattern Analysis in Python
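
Statsmodels estimates models such as ordinary least squares, e.g. `sm.OLS(y, X).fit()` after `import statsmodels.api as sm` (with an intercept column added via `sm.add_constant`). For a single predictor the OLS estimates have a closed form, and the sketch below computes them in plain Python as an illustration; the `simple_ols` helper is hypothetical and is not statsmodels' code.

```python
from statistics import fmean


def simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    # slope = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    print(simple_ols([1, 2, 3, 4], [3, 5, 7, 9]))
```

Statsmodels adds what this sketch omits: standard errors, confidence intervals, and hypothesis tests on the coefficients, plus support for many predictors.
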
#### **Natural Language Processing & Understanding**
* [NLTK](http://bit.ly/py-nltk) - Natural Language Toolkit
* [Gensim](http://bit.ly/py-gensim) - Python library for topic modeling, document indexing, and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.
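
Similarity retrieval of the kind Gensim targets rests on representing each document as a vector (bag-of-words, TF-IDF, or topic weights) and comparing vectors, commonly by cosine similarity. A dependency-free sketch of the simplest version, raw term counts plus cosine similarity, follows; the regex tokenizer is a deliberately naive assumption, the helper names are hypothetical, and this is not Gensim's API.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Naive tokenizer: lower-cased runs of letters, digits, and apostrophes.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Missing terms count as 0 (Counter lookups do not insert keys).
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, docs: list[str]) -> list[tuple[float, str]]:
    # Score every document against the query, best match first.
    q = bag_of_words(query)
    scored = [(cosine_similarity(q, bag_of_words(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = ["the cat sat", "dogs bark loudly", "a cat and a dog"]
    for score, doc in rank("cat", docs):
        print(f"{score:.3f}  {doc}")
```

Swapping raw counts for TF-IDF weights down-weights common words such as "the" and "a", which usually improves the ranking.
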
#### **Data APIs**
* [twython](http://bit.ly/py-twython) - Python wrapper for the Twitter API
#### **Visualization Packages**
* [matplotlib](http://bit.ly/matplotlib-docs) - Plotting library, well integrated with analysis and data manipulation packages like NumPy and pandas
* [Seaborn](http://bit.ly/seaborn-python) - A high-level statistical visualization package built on top of matplotlib
#### **IPython Data Science Notebooks**
* [Data Science in IPython Notebooks](http://bit.ly/ipynb-ds) (Linear Regression, Logistic Regression, Random Forests, K-Means Clustering)
* [A Gallery of Interesting IPython Notebooks - Pandas for Data Analysis](http://bit.ly/ipyfordataanalysis)