Commit e5d72a4

markdown linting
1 parent 442a7c6 commit e5d72a4

11 files changed

+105
-46
lines changed

0_domain_study/guide.md

Lines changed: 7 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,10 @@
11
# Domain Study: Guide
22

3-
To do meaningful research in a domain, you need to learn what others already do and don't understand in this area. Use this folder to organize your group's understanding of your research domain including: your own summaries, helpful PDFs, links you found helpful, ...
3+
To do meaningful research in a domain, you need to learn what others already do
4+
and don't understand in this area. Use this folder to organize your group's
5+
understanding of your research domain including: your own summaries, helpful
6+
PDFs, links you found helpful, ...
47

5-
This folder is different from `/notes` because it contains _only_ information about your research domain. When deciding what goes here, ask yourself this question: _Would someone need to know this to understand our research?_
8+
This folder is different from `/notes` because it contains _only_ information
9+
about your research domain. When deciding what goes here, ask yourself this
10+
question: _Would someone need to know this to understand our research?_

1_datasets/guide.md

Lines changed: 50 additions & 22 deletions
@@ -1,14 +1,25 @@
11
# Datasets: Guide
22

3-
Store your local datasets in this folder (`.csv`, `.xlsx`, `.json`, `.sqlite`, ...). You can use the README to document each dataset (where it's from, what data & types it contains, what you use it for, ...).
3+
Store your local datasets in this folder (`.csv`, `.xlsx`, `.json`, `.sqlite`,
4+
...). You can use the README to document each dataset (where it's from, what
5+
data & types it contains, what you use it for, ...).
46

5-
One of the primary goals of this repository is that anyone can clone and replicate your research. To make this possible **DO NOT modify or overwrite your raw datasets**! You should keep them _exactly_ as they were when you downloaded them; you may even want to name them `dataset.raw.ext` (e.g. `daily_temperatures.raw.csv`).
7+
One of the primary goals of this repository is that anyone can clone and
8+
replicate your research. To make this possible **DO NOT modify or overwrite your
9+
raw datasets**! You should keep them _exactly_ as they were when you downloaded
10+
them; you may even want to name them `dataset.raw.ext` (e.g.
11+
`daily_temperatures.raw.csv`).
612

7-
When cleaning and processing your datasets, you should save the prepared data to a _new_ file with a descriptive name. This approach will result in many dataset files, but that's ok!
13+
When cleaning and processing your datasets, you should save the prepared data to
14+
a _new_ file with a descriptive name. This approach will result in many dataset
15+
files, but that's ok!
816

917
## Types of Dataset
1018

11-
A dataset is "simply" a collection of related measurements or observations. To create a good model of your problem using data you must understand what _kinds_ of data exist, how to understand them, and the best ways to analyze each one. The kind of data you choose impacts:
19+
A dataset is "simply" a collection of related measurements or observations. To
20+
create a good model of your problem using data you must understand what
21+
_kinds_ of data exist, how to understand them, and the best ways to analyze each
22+
one. The kind of data you choose impacts:
1223

1324
- The tools you use for exploration and analysis
1425
- How we visualize the data
@@ -32,17 +43,20 @@ Data that represents quantities and can be represented as numbers.
3243

3344
#### Continuous Data
3445

35-
- **Definition**: Can take any value within a range (including fractions and decimals)
46+
- **Definition**: Can take any value within a range (including fractions and
47+
decimals)
3648
- **Examples**: Height, weight, temperature, time, distance
3749
- **Analysis**: Mean, median, standard deviation, histograms, scatter plots
38-
- **Real-world example**: Recording daily temperature over a month (72.5°F, 68.3°F, etc.)
50+
- **Real-world example**: Recording daily temperature over a month (72.5°F,
51+
68.3°F, etc.)
3952

4053
#### Discrete Data
4154

4255
- **Definition**: Countable values, typically whole numbers
4356
- **Examples**: Number of children, items sold, count of occurrences
4457
- **Analysis**: Frequency tables, bar charts, mode
45-
- **Real-world example**: Number of customers visiting a store each day (45, 52, 38, etc.)
58+
- **Real-world example**: Number of customers visiting a store each day (45, 52,
59+
38, etc.)
4660

4761
### Qualitative (Categorical) Data
4862

@@ -53,14 +67,16 @@ Data that describes qualities or characteristics of what you want to study.
5367
- **Definition**: Categories with no inherent order or ranking
5468
- **Examples**: Gender, blood type, country, color, product type
5569
- **Analysis**: Frequency counts, mode, chi-square tests, pie charts
56-
- **Real-world example**: Survey responses for favorite color (red, blue, green, etc.)
70+
- **Real-world example**: Survey responses for favorite color (red, blue, green,
71+
etc.)
5772

5873
#### Ordinal Data
5974

6075
- **Definition**: Categories with a meaningful order or ranking
6176
- **Examples**: Education level, satisfaction ratings (1-5), economic status
6277
- **Analysis**: Median, percentiles, rank correlations, stacked bar charts
63-
- **Real-world example**: Customer satisfaction ratings (very dissatisfied, dissatisfied, neutral, satisfied, very satisfied)
78+
- **Real-world example**: Customer satisfaction ratings (very dissatisfied,
79+
dissatisfied, neutral, satisfied, very satisfied)
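
The analysis suggestions listed for continuous, discrete, nominal, and ordinal data can be sketched with Python's standard library. This is only an illustration: every sample value below is hypothetical, and `ordinal_median` is a helper invented for this sketch, not part of the guide.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical sample values, chosen only for illustration.
temperatures_f = [72.5, 68.3, 70.1, 74.8, 69.9]      # continuous
daily_customers = [45, 52, 38, 45, 41]               # discrete
favorite_colors = ["red", "blue", "green", "blue"]   # nominal
ratings = ["satisfied", "neutral", "very satisfied",
           "dissatisfied", "satisfied"]              # ordinal

# Ordinal categories have a meaningful order, so we state it explicitly.
SATISFACTION_ORDER = ["very dissatisfied", "dissatisfied", "neutral",
                      "satisfied", "very satisfied"]


def ordinal_median(values, order):
    """Return the middle category after ranking values by `order`.

    For an even number of values this picks the upper of the two middle ranks.
    """
    ranks = sorted(order.index(value) for value in values)
    return order[ranks[len(ranks) // 2]]


# Continuous: mean, median, and standard deviation are all meaningful.
continuous_summary = {
    "mean": mean(temperatures_f),
    "median": median(temperatures_f),
    "stdev": stdev(temperatures_f),
}

# Discrete: frequency tables and the mode.
discrete_frequencies = Counter(daily_customers)
discrete_mode = mode(daily_customers)

# Nominal: only counts and the most common category make sense.
most_common_color = Counter(favorite_colors).most_common(1)[0]

# Ordinal: a median is meaningful, an arithmetic mean is not.
median_rating = ordinal_median(ratings, SATISFACTION_ORDER)

print(continuous_summary, discrete_frequencies, discrete_mode,
      most_common_color, median_rating)
```

Note that the nominal and ordinal examples never compute a mean: averaging category labels has no meaning, which is why the guide lists different analyses for each kind of data.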
6480

6581
### Binary Data
6682

@@ -116,7 +132,8 @@ Data that describes qualities or characteristics of what you want to study.
116132
- **Examples**: Surveys, experiments, interviews, direct observations
117133
- **Advantages**: Tailored to research needs, higher control over quality
118134
- **Disadvantages**: Time-consuming, potentially expensive
119-
- **Real-world example**: Market research survey designed specifically for a new product
135+
- **Real-world example**: Market research survey designed specifically for a new
136+
product
120137

121138
### Secondary Data
122139

@@ -128,21 +145,26 @@ Data that describes qualities or characteristics of what you want to study.
128145

129146
### [Proxy Data](https://centerforgov.gitbooks.io/benchmarking/content/Proxy.html)
130147

131-
- **Definition**: Data that is
132-
- **Examples**: Tree rings to proxy historical weather patterns, tax data to proxy incomes
133-
- **Advantages**: Helps you understand phenomena that are difficult or impossible to study directly.
148+
- **Definition**: Data that is
149+
- **Examples**: Tree rings to proxy historical weather patterns, tax data to
150+
proxy incomes
151+
- **Advantages**: Helps you understand phenomena that are difficult or
152+
impossible to study directly.
134153
- **Disadvantages**: You cannot draw conclusions with the same confidence.
135-
- **Real-world example**: Using the stock market + unemployment rates as a proxy for the economy.
154+
- **Real-world example**: Using the stock market + unemployment rates as a proxy
155+
for the economy.
136156

137157
### Experimental Data
138158

139-
- **Definition**: Generated from controlled experiments with manipulated variables
159+
- **Definition**: Generated from controlled experiments with manipulated
160+
variables
140161
- **Examples**: A/B tests, clinical trials, laboratory experiments
141162
- **Characteristics**:
142163
- Control and treatment groups
143164
- Controlled conditions
144165
- Designed to establish causality
145-
- **Real-world example**: Testing whether a new website design increases conversion rates
166+
- **Real-world example**: Testing whether a new website design increases
167+
conversion rates
146168

147169
### Observational Data
148170

@@ -152,7 +174,8 @@ Data that describes qualities or characteristics of what you want to study.
152174
- Natural setting
153175
- No manipulation of variables
154176
- Good for establishing correlation (not causation)
155-
- **Real-world example**: Observing and recording consumer shopping behaviors in a store
177+
- **Real-world example**: Observing and recording consumer shopping behaviors in
178+
a store
156179

157180
## Classification by Size and Complexity
158181

@@ -186,8 +209,10 @@ Data that describes qualities or characteristics of what you want to study.
186209
- Curse of dimensionality
187210
- Feature selection importance
188211
- Visualization difficulties
189-
- **Analysis**: Dimension reduction techniques (PCA, t-SNE), specialized algorithms
190-
- **Real-world example**: Gene expression data with thousands of genes measured for each sample
212+
- **Analysis**: Dimension reduction techniques (PCA, t-SNE), specialized
213+
algorithms
214+
- **Real-world example**: Gene expression data with thousands of genes measured
215+
for each sample
191216

192217
## Classification by Access Type
193218

@@ -204,7 +229,8 @@ Data that describes qualities or characteristics of what you want to study.
204229
### Private Data
205230

206231
- **Definition**: Access restricted to authorized users
207-
- **Examples**: Company internal data, personal health records, proprietary research
232+
- **Examples**: Company internal data, personal health records, proprietary
233+
research
208234
- **Characteristics**:
209235
- Security measures required
210236
- Often subject to privacy regulations
@@ -251,7 +277,8 @@ Data that describes qualities or characteristics of what you want to study.
251277
- Reference data
252278
- Shared across systems
253279
- Requires governance
254-
- **Real-world example**: Product master list with SKUs, descriptions, and categories
280+
- **Real-world example**: Product master list with SKUs, descriptions, and
281+
categories
255282

256283
### Metadata
257284

@@ -276,7 +303,8 @@ Data that describes qualities or characteristics of what you want to study.
276303

277304
### Hierarchical Data
278305

279-
- **Definition**: Organized in a tree-like structure with parent-child relationships
306+
- **Definition**: Organized in a tree-like structure with parent-child
307+
relationships
280308
- **Examples**: XML, JSON, file systems
281309
- **Characteristics**:
282310
- Nested structure

2_data_preparation/guide.md

Lines changed: 6 additions & 2 deletions
@@ -1,9 +1,13 @@
11
# Data Preparation: Guide
22

3-
This folder is for any Python scripts or notebooks you use to clean & prepare your datasets. These files should:
3+
This folder is for any Python scripts or notebooks you use to clean & prepare
4+
your datasets. These files should:
45

56
1. Read in datasets from `0_datasets`
67
2. Clean, reformat, or otherwise process the datasets for later.
78
3. Write the processed dataset into `0_datasets` with a helpful file name.
89

9-
**DO NOT modify an existing dataset in `0_datasets`! Instead, save your processed data to a _new_ file.** This is critical to open research: Someone should be able to clone this repository and run your scripts to replicate your research. If you modify an original dataset, others cannot replicate your work.
10+
**DO NOT modify an existing dataset in `0_datasets`! Instead, save your
11+
processed data to a _new_ file.** This is critical to open research: Someone
12+
should be able to clone this repository and run your scripts to replicate your
13+
research. If you modify an original dataset, others cannot replicate your work.
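
The three steps in this guide (read, process, write to a new file) could look roughly like the sketch below. It assumes a CSV dataset and uses "drop rows with an empty field" as a stand-in cleaning step; the function name `prepare_dataset` and the file names in the usage note are hypothetical.

```python
import csv
from pathlib import Path


def prepare_dataset(raw_path: Path, prepared_path: Path) -> int:
    """Copy a raw CSV to a new file, dropping rows that have an empty field.

    Returns the number of data rows written. The raw file is only read.
    """
    # Guard against accidentally overwriting the raw dataset.
    if prepared_path.resolve() == raw_path.resolve():
        raise ValueError("Refusing to overwrite the raw dataset")

    # Step 1: read the raw dataset.
    with raw_path.open(newline="", encoding="utf-8") as source:
        reader = csv.DictReader(source)
        fieldnames = reader.fieldnames or []
        # Step 2: keep only rows where every column has a non-blank value.
        rows = [
            row for row in reader
            if all((row[name] or "").strip() for name in fieldnames)
        ]

    # Step 3: write the processed data to a new, descriptively named file.
    with prepared_path.open("w", newline="", encoding="utf-8") as target:
        writer = csv.DictWriter(target, fieldnames=fieldnames,
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as `prepare_dataset(Path("daily_temperatures.raw.csv"), Path("daily_temperatures.no_missing.csv"))` leaves the raw file untouched, so anyone who clones the repository can rerun the script and get the same prepared file.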

3_data_exploration/guide.md

Lines changed: 16 additions & 6 deletions
@@ -1,13 +1,23 @@
11
# Data Exploration: Guide
22

3-
This folder is for any Python scripts or notebooks you use to _explore and understand_ your datasets. These files should:
3+
This folder is for any Python scripts or notebooks you use to _explore and
4+
understand_ your datasets. These files should:
45

56
1. Read in prepared datasets from `0_datasets`
67
2. Explore and understand the dataset without running a deep analysis:
7-
- Generate some visualizations (in a notebook, or in a separate image file saved to this folder)
8-
- Run some descriptive statistics (_[beware](https://www.researchgate.net/publication/316652618_Same_Stats_Different_Graphs_Generating_Datasets_with_Varied_Appearance_and_Identical_Statistics_through_Simulated_Annealing) the [Datasaurus Dozen](https://www.research.autodesk.com/publications/same-stats-different-graphs/)!_)
9-
- ... let your curiosity guide you, but _avoid_ running any inferential statistics or using any machine learning at this stage.
8+
- Generate some visualizations (in a notebook, or in a separate image file
9+
saved to this folder)
10+
- Run some descriptive statistics
11+
(_[beware](https://www.researchgate.net/publication/316652618_Same_Stats_Different_Graphs_Generating_Datasets_with_Varied_Appearance_and_Identical_Statistics_through_Simulated_Annealing)
12+
the
13+
[Datasaurus Dozen](https://www.research.autodesk.com/publications/same-stats-different-graphs/)!_)
14+
- ... let your curiosity guide you, but _avoid_ running any inferential
15+
statistics or using any machine learning at this stage.
1016

11-
**DO NOT modify an existing dataset in `0_datasets`!** This is critical to open research: Someone should be able to clone this repository and run your scripts to replicate your research. If you modify an original dataset, others cannot replicate your work.
17+
**DO NOT modify an existing dataset in `0_datasets`!** This is critical to open
18+
research: Someone should be able to clone this repository and run your scripts
19+
to replicate your research. If you modify an original dataset, others cannot
20+
replicate your work.
1221

13-
> [Chapter 4 - Exploratory Data Analysis](https://bookdown.org/rdpeng/artofdatascience/exploratory-data-analysis.html) from the Art of Data Science is a good starting reference.
22+
> [Chapter 4 - Exploratory Data Analysis](https://bookdown.org/rdpeng/artofdatascience/exploratory-data-analysis.html)
23+
> from the Art of Data Science is a good starting reference.
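
The warning about descriptive statistics can be made concrete with a small sketch. Rather than the Datasaurus Dozen itself, it uses the y-values of the first two sets of Anscombe's quartet, an older dataset that makes the same point; the `describe` helper is invented for this illustration.

```python
from statistics import mean, median, stdev

# y-values of sets I and II of Anscombe's quartet (Anscombe, 1973).
anscombe_y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
anscombe_y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def describe(values):
    """Return a few descriptive statistics, rounded to two decimals."""
    return {
        "mean": round(mean(values), 2),
        "stdev": round(stdev(values), 2),
        "median": round(median(values), 2),
        "min": min(values),
        "max": max(values),
    }


summary_1 = describe(anscombe_y1)
summary_2 = describe(anscombe_y2)

# The mean and standard deviation agree to two decimals,
# while the median and range do not.
print(summary_1)
print(summary_2)
```

Two datasets can share a mean and standard deviation and still be distributed very differently, which is why this guide asks for visualizations alongside the summary numbers.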

4_data_analysis/README.md

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
# Data Analysis
1+
# Data Analysis

4_data_analysis/guide.md

Lines changed: 11 additions & 4 deletions
@@ -1,10 +1,17 @@
11
# Data Analysis: Guide
22

3-
This folder is for any Python scripts or notebooks you use to gain insights from your data through modeling, inferential statistics, and other analytical techniques. These files should:
3+
This folder is for any Python scripts or notebooks you use to gain insights from
4+
your data through modeling, inferential statistics, and other analytical
5+
techniques. These files should:
46

57
1. Read in prepared datasets from `0_datasets`
6-
2. Learn from your datasets using methods that are appropriate to your research question, dataset and team's constraints.
8+
2. Learn from your datasets using methods that are appropriate to your research
9+
question, dataset and team's constraints.
710

8-
**DO NOT modify an existing dataset in `0_datasets`!** This is critical to open research: Someone should be able to clone this repository and run your scripts to replicate your research. If you modify an original dataset, others cannot replicate your work.
11+
**DO NOT modify an existing dataset in `0_datasets`!** This is critical to open
12+
research: Someone should be able to clone this repository and run your scripts
13+
to replicate your research. If you modify an original dataset, others cannot
14+
replicate your work.
915

10-
> [Chapters 5-8](https://bookdown.org/rdpeng/artofdatascience) from the Art of Data Science are a good starting reference.
16+
> [Chapters 5-8](https://bookdown.org/rdpeng/artofdatascience) from the Art of
17+
> Data Science are a good starting reference.

5_communication_strategy/guide.md

Lines changed: 4 additions & 1 deletion
@@ -1,3 +1,6 @@
11
# Communication Strategy: Guide
22

3-
This folder is here to organize the communication strategy for your research findings. You can use it however you like. Your communication artefact doesn't need to be stored here - it could be a video hosted on YouTube, a SM campaign, ... don't constrain yourself to something that can be stored on GitHub!
3+
This folder is here to organize the communication strategy for your research
4+
findings. You can use it however you like. Your communication artefact doesn't
5+
need to be stored here - it could be a video hosted on YouTube, a SM campaign,
6+
... don't constrain yourself to something that can be stored on GitHub!

6_final_presentation/guide.md

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
11
# Final Presentation: Guide
22

3-
You can use this folder to plan your final presentation including presentation outlines, scripts, ...
3+
You can use this folder to plan your final presentation including presentation
4+
outlines, scripts, ...
45

56
Don't forget to link to your final presentation in the repository README!

collaboration/communication.md

Lines changed: 5 additions & 5 deletions
@@ -9,7 +9,7 @@
99

1010
# Communication
1111

12-
______________________________________________________________________
12+
---
1313

1414
## Communication Schedule
1515

@@ -25,15 +25,15 @@ how often will we get in touch on each channel, and what we will discuss there:
2525
- **Slack/Discord**:
2626
- **Video Calls**:
2727

28-
______________________________________________________________________
28+
---
2929

3030
## Availability
3131

3232
### Availability for calling/messaging
3333

34-
| Day | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday | |
35-
------ | :----: | :-----: | :-------: | :------: | :----: | :------: | :----: |
36-
| _name_ | | | | | | | |
34+
| Day | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday | |
35+
| ------ | :----: | :-----: | :-------: | :------: | :----: | :------: | :----: | --- |
36+
| _name_ | | | | | | | |
3737

3838
### How many hours everyone has per day
3939

collaboration/retrospective.md

Lines changed: 1 addition & 1 deletion
@@ -28,4 +28,4 @@
2828

2929
### Name
3030

31-
<!-- write a 2-3 sentence reflection on your contributions, challenges and progress in this milestone -->
31+
<!-- reflect on your contributions, challenges and progress in this milestone -->
