Data Science

R에서 파이썬까지…데이터과학 학습 사이트 8곳
리다, 기업을 위한 데이터과학 강의 공개
데이터과학에 입문하고 싶다면, 이곳부터
데이터과학을 시작할 때 도움되는 것들
헬로 데이터 과학
- 헬로 데이터 과학: 당신의 삶과 업무를 바꾸는 데이터 과학 (데이터 사이언스)
- 데이터 과학자의 데이터로 책 쓰기: 데이터는 기획력과 감수성이다
- 온라인 서비스 개선을 데이터 활용법 (How We Use Data 발표)
인정받는 데이터 분석가 되기 – 외부 세미나 요약 –
Overfitting을 피해보자!
손에 잡히는 데이터 과학 이야기
How to Become a Data Scientist for Free
- 데이터 과학을 무료로 공부해보자
따라 하는 데이터 과학 – 강의 PPT
datasciencetech.institute
mindscale.kr
How to actually learn data science
Skills You Need for that Data Science Job
GitHub Special: Data Scientists to Follow & Best Tutorials on GitHub
How to Become a Data Scientist
So You Want To Be a Data Scientist: A Guide for College Grads
데이터과학 자료모음
A curated list of data science blogs
Data Science Courses
Pascal Poupart's Homepage
dataquest.io
Linear Algebra for Data Scientists
Reading Between the Lines: How We Make Sense of Users’ Searches
Research papers that changed the world of Big Data
Data Analysis (1): Neuroimaging Data loading using SPM8 toolbox
The last-mile problem: How data science and behavioral science can work together
공공데이터를 연결하라…‘LOD’
GE산업인터넷 플랫폼, 프레딕스™(Predix™)에 대해 알아야 할 모든 것
articles
트위터로 들여다보는 빅데이터 분석
버즈피드의 교훈: 분산 미디어와 데이터 분석
실리콘 밸리 데이터 사이언티스트의 하루
[데이터사이언티스트를 찾아서] “데이터의 잡음 속 숨겨진 진실을 찾아라”
Data Science From Scratch: First Principles with Python
Three Things About Data Science You Won't Find In the Books
Weekly Digest, June 15
The democratization of predictive analytics
Grepping logs is terrible
Grepping logs is still terrible
Why Topological Data Analysis Works
HyperLogSandwich
Pipelining - A Successful Data Processing Model
How I Became a Data Scientist Despite Having Been a Math Major
NASA'S DATA PORTAL
Top 10 data mining algorithms in plain English
Tracking Economic Development with Open Data and Predictive Algorithms
신선한 데이터를 냉장고에서 꺼내기
Algorithm reduces size of data sets while preserving their mathematical properties
A BEGINNER'S GUIDE TO DATA ANALYSIS WITH UNIX UTILITIES
Data Scientist: The Sexiest Job of the 21st Century
Enterprise Data Analysis and Visualization: An Interview Study
Prologue to Data Science
Statistical Data Mining Tutorials
pubdata.tistory.com/category/Lecture_DataMining
Data Mining and Statistics: What's the Connection?
Data Science in Clojure at Yieldbot [VIDEO]
Introduction to Data Mining
Mining the Web to Predict Future Events
Using Data Science to Measure a Musical Revolution
Data Science Career Alert - June 12
Comparing Python and R for Data Science
Introducing ShArc: Shot Arc Analysis
Inside Data@Scale 2015
- Dato
DataLake
- A Data Lake Architecture With Hadoop and Open Source Search Engines
Data Maven
ryd.io - A data science exploration of the NYC Taxi data set via clustering and time-series analysis
Nonnegative Matrix Factorization via Rank-One Downdate
- This chapter introduces a new technique, NMF (Non-negative Matrix Factorization)
프레임드, 예측 분석 기술 클라우드 서비스로 출시
11 Facts about Data Science that you must know
The Data Science Workflow
퇴물개발자가 생각하는 빅데이터 기술
Predicting winners of the Rugby World Cup
Building Analytics at 500px
2015 Data Science Salary Survey / 2015 데이터과학 소득 조사
[B2B스타트업] 데이터과학자들의 실험실, 넘버웍스
50 years of Data Science
기획자·마케터가 알아둘 데이터과학 원칙 6가지
우리 식당 김사장이 데이터 과학자가 된 사연은?
e커머스 데이터 파헤치기-6편
github.com/caesar0301/awesome-public-datasets
데이터와 관련하여 기업들이 공개한 기술은 어떤게 있을까?
The Automatic Statistician - An artificial intelligence for data science
Common Probability Distributions: The Data Scientist’s Crib Sheet
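- Probability distributions commonly used in data science
  - Bernoulli
    - The distribution of an event with only two outcomes, 0 or 1, like heads or tails on a coin
    - A fair coin has probabilities 0.5/0.5, but other splits are possible
  - Uniform
    - A distribution in which every outcome is equally likely, like a die roll
  - Binomial
    - If a coin is tossed n times, what is the probability that heads comes up exactly k times?
    - Binomial is the distribution of the number of 1s across n independent 0-or-1 events, each following the same Bernoulli probability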
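  - Poisson
    - Suppose phone calls arrive at an average of 10 per hour. What is the probability of getting exactly 12 calls in an hour? That is the Poisson probability
    - One way to picture it: 60 one-minute slots with 48 failures (0) and 12 successes (1). Or slice the hour more finely into 1,000 slots with 988 failures and 12 successes
    - As the number of trials grows and the per-trial probability shrinks (with the mean held fixed), the binomial distribution converges to the Poisson distribution, which is why Poisson is also used as an approximation to binomial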
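  - Hypergeometric
    - Suppose an urn holds black and white balls in equal numbers and you draw from it several times. Is this the same as binomial?
    - No. If a drawn ball is not put back, the probabilities for the remaining balls change
    - Unlike binomial, hypergeometric is the distribution for draws without replacement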
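  - Geometric
    - When rolling a die, what is the probability that the first 6 appears on the first roll? On the second roll? On the third, the fourth...
    - The geometric distribution gives the probability of the number of trials needed until an event first occurs
    - Whatever the probability of the event, the single most likely outcome is always that it occurs on the very first trial
  - Negative Binomial
    - If geometric is the distribution of the number of trials until the first success, negative binomial is the distribution of the number of trials until the n-th success (couldn't the two have been given similar names?)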
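  - Exponential
    - Just as Poisson is the limiting case of binomial, exponential is the continuous counterpart of geometric
    - In other words: "if a call arrives every 5 minutes on average, what is the probability that the next call takes more than 7 minutes to arrive?"
  - Weibull
    - Exponential models the waiting time until the next event when the event rate is constant. Weibull generalizes it to the time until a failure when the failure rate can rise or fall over time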
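  - Gaussian (Normal)
    - The best-known probability distribution
    - By the central limit theorem, the arithmetic mean of a large number of independent samples from the same distribution with finite variance converges to a Gaussian, whatever that distribution is (binomial, exponential, or anything else). This is what makes it applicable in so many places
  - Log-normal
    - The distribution of a variable whose logarithm is Gaussian
    - In other words, what you get by exponentiating a Gaussian variable
  - Student’s t-distribution
    - Used for inference about the mean of a normal distribution, typically when the sample is small and the variance is unknown
  - Chi-squared distribution
    - The distribution of a sum of squares of Gaussian random variables
    - For example, chi-squared with k degrees of freedom is the distribution of the sum of the squares of k independent standard Gaussian variables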
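
The relationships in the notes above can be checked numerically. The following is a minimal sketch using only Python's standard library. The helper names (`binomial_pmf`, `poisson_pmf`, `hypergeom_pmf`, `geometric_pmf`, `exponential_sf`) are made up for illustration and come from neither the linked article nor any library, and the urn of 10 black and 10 white balls is an assumed example size.

```python
import math


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(exactly k events when the mean number of events is lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def hypergeom_pmf(k, population, successes, draws):
    """P(k successes in `draws` draws WITHOUT replacement)."""
    return (
        math.comb(successes, k)
        * math.comb(population - successes, draws - k)
        / math.comb(population, draws)
    )


def geometric_pmf(k, p):
    """P(the first success happens on trial k), for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p


def exponential_sf(t, mean):
    """P(waiting time > t) for an exponential distribution with this mean."""
    return math.exp(-t / mean)


# Binomial -> Poisson: 12 calls in an hour when the mean is 10 per hour.
# Slicing the hour into more slots keeps n * p = 10 while p shrinks.
for n in (60, 1000, 100_000):
    print(f"binomial n={n}: {binomial_pmf(12, n, 10 / n):.5f}")
print(f"poisson lam=10: {poisson_pmf(12, 10):.5f}")

# Hypergeometric vs binomial: assumed urn of 10 black + 10 white, 5 draws.
print("3 black, without replacement:", hypergeom_pmf(3, 20, 10, 5))
print("3 black, with replacement:   ", binomial_pmf(3, 5, 0.5))

# Geometric: the first trial is always the most likely place for a first 6.
print([round(geometric_pmf(k, 1 / 6), 4) for k in range(1, 5)])

# Exponential: mean 5 minutes between calls, next call takes more than 7.
print("P(wait > 7 min):", exponential_sf(7, 5))
```

The exponential example computes the probability of waiting more than 7 minutes, because for a continuous distribution the probability of waiting exactly 7 minutes is zero.
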
[우리가 데이터를 쓰는 법] 좋다는 건 알겠는데 좀 써보고 싶소. 데이터! - 넘버웍스 하용호 대표
‘데이터’를 똑똑하게 만드는 오픈소스 기술 12종
Google Data Studio (beta) provides everything you need to turn your data into beautiful, informative reports that are easy to read, easy to share, and fully customizable
쉽게 이해하는 모바일 데이타 분석
데이터 사이언티스트로 성장하기
Data School
A User’s Guide To FiveThirtyEight’s 2016 General Election Forecast
어떻게 하면 싱싱한 데이터를 모형에 바로 적용할 수 있을까? – Bayesian Online Learning
데이터 과학 여름 학교 2016
데이터에 현혹되지 않고, 데이터를 잘 활용할수 있는 14가지 룰
Demystifying Different Roles in Data Team
Causal Data Science
Announcing the general availability of the Microsoft Excel API to expand the power of Office 365
Difference between classification and clustering in data mining?
16 analytic disciplines compared to data science
PyData Paris 2016 - Round table: "How to become a data scientist"
Renee Teate | Becoming a Data Scientist Advice From My Podcast Guests
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
데이터 전처리에 대한 모든 것
데이터 사이언스 스쿨 - Python 데이터 핸들링과 시각화 라이브러리 실무
데이터 과학을 공부하는 이유
데이터는 차트가 아니라 돈이 되어야 한다
Practical Data Science at Honestbee - DataScienceSG
빅데이터의 대중화
이론의 종말: 데이터 홍수가 과학적 연구방법을 구닥다리로 만든다
[데이터로 풀어보는 나] 이메일로 분석해 보는 나의 3년
E-Mail 데이터 곱씹어보기
스터디뽀개기.zip
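- GNMT로 알아보는 신경망 기반 기계번역 / A review of Google's neural machine translation system
- Spark + R / Basic usage of Spark with R, with its characteristics, strengths, and weaknesses
- Spark를 이용한 분산 컴퓨팅 / How to use Spark and the cloud as a foundation for running machine learning in a distributed environment
- 강화학습을 활용한 대화형 시스템 / A review of how reinforcement learning can be used to build a conversational system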
How to Make Your Database 200x Faster Without Having to Pay More?
- Argues that when an analysis needs trends or ratios rather than exact figures, it can run on a sample instead of the full data set
- Also briefly introduces data processing solutions that support sampling, such as Presto, BlinkDB / G-OLA, and SnappyData
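
The sampling idea in the notes above can be sketched in a few lines. This is a minimal illustration with plain uniform random sampling over synthetic data, not how Presto, BlinkDB, or SnappyData implement approximate queries. The function name `approximate_ratio`, the data set, and the sample size are all assumptions made for the example.

```python
import random


def approximate_ratio(rows, predicate, sample_size, seed=0):
    """Estimate the fraction of rows matching `predicate` from a random sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(rows, sample_size)
    hits = sum(1 for row in sample if predicate(row))
    return hits / sample_size


# Synthetic data (assumption): 200,000 rows, 30% of which are flagged True.
rows = [i % 10 < 3 for i in range(200_000)]

# Exact answer: scans every row.
exact = sum(rows) / len(rows)

# Approximate answer: scans only 10,000 sampled rows (5% of the data).
approx = approximate_ratio(rows, lambda flagged: flagged, sample_size=10_000)

print(f"exact ratio:  {exact:.4f}")
print(f"approx ratio: {approx:.4f}")
```

The estimate only touches a fraction of the rows, which is the trade the article describes: a small, quantifiable loss of precision in exchange for much less work when only a ratio or trend is needed.
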
3 methods to deal with outliers
Visual Information Theory
[전병국의 데이터스토리] 가장 위대한 데이터 분석가
Tutorial 1: Protein - DNA interaction
A survey on predicting the popularity of web content
The Rise of the Data Engineer
Data analysis in excel
dataplatforms.com
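Big data paradox: a larger sample seems like it should be more accurate, but when selection bias is present its actual accuracy is equivalent to that of a probability sample of 400 people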
이야기 12. 당신은 데이터 문맹(Data Illiterate) 인가?
Data science with R - 1. 오해
Q&A with leading Data Scientists
Forrester vs Gartner on Data Science Platforms and Machine Learning Solutions
sooyongshin.wordpress.com
- Healthcare Data? Data! Data!! (0) – 왜 데이터 이야기를 하나..
Data Science Ontology
Automated Machine Learning — A Paradigm Shift That Accelerates Data Scientist Productivity @ Airbnb
A list of artificial intelligence tools you can use today — for personal use (1/3)
Data Science Bowl 2017, Predicting Lung Cancer: Solution Write-up, Team Deep Breath
How to Start a Data Science Project in Python
- How to set up a basic Python environment for data analysis
- Setting up isolated environments with Anaconda's Conda
- How to lay out the directory structure of a Python data analysis project
Strata Data Conference
Data Science Resources : Cheat Sheets
Top 28 Cheat Sheets for Machine Learning, Data Science, Probability, SQL & Big Data
Getting started: the 3 stages of data infrastructure
데이터를 얻으려는 노오오력
#2.5. Intra/Inter-Class Variability 데이터의 '질'이란?
Analyzing GitHub, how developers change programming languages over time
Regression 모델 평가 방법
OPENDATAMINER - THE DATA MINING COMPANY THAT TURNS YOUR DATA INTO VALUES
- opendataminer object mapper
7 Techniques to Handle Imbalanced Data

Book

시스템 트레이딩을 위한 데이터 사이언스 (파이썬 활용편)
27 free data mining books
Foundations of Data Science
The Data Science Handbook
16 Free Data Science Books
Free Data Science Books
50+ Free Data Science Books
밑바닥부터 시작하는 데이터 과학
- 밑바닥부터 시작하는 데이터 사이언스
- 밑바닥부터 시작하는 데이터 과학 ch.03 데이터 시각화

Library

academictorrents.com
Announcing FsLab: Data science package
Beaker
Digdag - a simple tool that helps you to build, run, schedule, and monitor complex pipelines of tasks (open source data workflow management engine)
GRID - Global Research Identifier Database, cataloging the world's research organisations
Mirador is a tool for visual exploration of complex datasets
Mockaroo - Mockaroo lets you generate up to 1,000 rows of realistic test data in CSV, JSON, SQL, and Excel formats
Piwik - Open Analytics Platform
sampleclean - Data Cleaning With Algorithms, Machines, and People
Weld: A common runtime for high performance data analytics
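- Similar to Numba, it speeds up data analysis scripts by optimizing them with a Rust-based compiler
- According to the article, it improves speed for certain data analysis workloads
- Can be combined with Pandas, TensorFlow, Spark SQL, and others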

News

The Best of Big Data: New Articles Published This Month (June 2017)

Public Data

Python

Recommendation

Topic Modeling