Commit 07725e6

main guide draft, update other guides
1 parent e5d72a4 commit 07725e6

6 files changed, 179 additions & 1 deletion

0_domain_study/guide.md

Lines changed: 5 additions & 0 deletions
@@ -8,3 +8,8 @@ PDFs, links you found helpful, ...
 This folder is different from `/notes` because it contains _only_ information
 about your research domain. When deciding what goes here, ask yourself this
 question: _Would someone need to know this to understand our research?_
+
+## README.md
+
+Use this folder's README to document all the notes and resources in this folder.
+Someone shouldn't need to read through _everything_ to find what they need.

1_datasets/guide.md

Lines changed: 6 additions & 0 deletions
@@ -14,6 +14,12 @@ When cleaning and processing your datasets, you should save the prepared data to
 a _new_ file with a descriptive name. This approach will result in many dataset
 files, but that's ok!

+## README.md
+
+Use the README in this folder to document each dataset in the folder. Include
+information such as: Where is the data from? How was it collected? How does it
+relate to your problem? ...
+
 ## Types of Dataset

 A dataset is "simply" a collection of related measurements or observations. To

2_data_preparation/guide.md

Lines changed: 6 additions & 0 deletions
@@ -11,3 +11,9 @@ your datasets. These files should:
 processed data to a _new_ file.** This is critical to open research: Someone
 should be able to clone this repository and run your scripts to replicate your
 research. If you modify an original dataset, others cannot replicate your work.
+
+## README.md
+
+Use this folder's README to give a quick summary of each script/notebook: Which
+dataset(s) does it process, and how? Which datasets does it create and save to
+`/1_datasets`?

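The rule in the data preparation guide (read the original dataset, never modify it, and save processed data to a _new_ file in `/1_datasets`) can be sketched as a small script. This is a minimal, hypothetical example under stated assumptions: Python with only the standard library, CSV data, and a script stored in `/2_data_preparation` (the main guide's tips mention a file like `clean_survey_data.py`). The cleaning rule, the helper names `default_datasets_dir`, `clean_rows` and `prepare`, and all file names are illustrative, not part of the project.

```python
"""Hypothetical /2_data_preparation/clean_survey_data.py (illustrative sketch).

Reads a raw CSV from /1_datasets and writes a cleaned copy to a NEW file in
the same folder. The original dataset is never modified.
"""
import csv
from pathlib import Path


def default_datasets_dir():
    # Assumes this script sits in /2_data_preparation, one level below the
    # repository root, so the path works on any teammate's clone.
    return Path(__file__).resolve().parent.parent / "1_datasets"


def clean_rows(rows):
    """Hypothetical cleaning rule: trim whitespace, drop rows with empty fields."""
    cleaned = []
    for row in rows:
        # Skip the None key csv.DictReader uses for surplus columns, and treat
        # missing values (None) as empty strings.
        stripped = {key: (value or "").strip()
                    for key, value in row.items() if key is not None}
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def prepare(raw_name, output_name, datasets_dir):
    """Clean datasets_dir/raw_name and save the result as datasets_dir/output_name."""
    raw_path = Path(datasets_dir) / raw_name
    output_path = Path(datasets_dir) / output_name
    # Refuse to overwrite the original dataset.
    if output_path.resolve() == raw_path.resolve():
        raise ValueError("output must be a new file, not the raw dataset")

    with raw_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = clean_rows(reader)

    with output_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return output_path


# Example entry point (hypothetical file names):
# if __name__ == "__main__":
#     prepare("survey_raw.csv", "survey_cleaned.csv", default_datasets_dir())
```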
3_data_exploration/guide.md

Lines changed: 5 additions & 0 deletions
@@ -21,3 +21,8 @@ replicate your work.

 > [Chapter 4 - Exploratory Data Analysis](https://bookdown.org/rdpeng/artofdatascience/exploratory-data-analysis.html)
 > from the Art of Data Science is a good starting reference.
+
+## README.md
+
+Use the README in this folder to give a quick summary of each script/notebook:
+which dataset(s) it explores, and how.

4_data_analysis/guide.md

Lines changed: 6 additions & 0 deletions
@@ -15,3 +15,9 @@ replicate your work.

 > [Chapters 5-8](https://bookdown.org/rdpeng/artofdatascience) from the Art of
 > Data Science are a good starting reference.
+
+## README.md
+
+Use the README in this folder to document your analysis strategy and provide a
+quick summary of each script/notebook. You can also explain your research
+results in depth in this folder's README.

guide.md

Lines changed: 151 additions & 1 deletion
@@ -1 +1,151 @@
-# ET6 CDSP Starter
+# ET6 CDSP Starter: Guide
+
+This repository is here to help guide you through the
+[Collaborative Data Science Project (CDSP)](https://docs.google.com/document/d/1TaoVVqJD5EqmBGLw6_qzph8EZnuL6uhY/edit?usp=sharing&ouid=100638458423869369523&rtpof=true&sd=true).
+
+This repository's structure roughly follows the CDSP milestones. It's also
+designed to help you do **reproducible** research. If your repository is well
+organized, others should be able to clone it, run all scripts (without
+errors!), and evaluate your conclusions for themselves.
+
+```
+/
+├── README.md # Project overview and main findings
+├── /collaboration/ # Team norms, strategies, and retrospectives
+├── /notes/ # Shared resources and learning materials
+├── /0_domain_study/ # Domain research and background
+├── /1_datasets/ # Raw and processed datasets
+├── /2_data_preparation/ # Scripts for cleaning and processing data
+├── /3_data_exploration/ # Scripts for initial data understanding
+├── /4_data_analysis/ # Scripts for in-depth analysis
+├── /5_communication_strategy/ # Materials for communicating findings
+└── /6_final_presentation/ # Final presentation materials
+```
+
+Below are some suggestions on how to use the folders/files in this repository,
+but they're just suggestions! Your group should find a system that works for you.
+
+---
+
+## 1. Cross-Cultural Collaboration
+
+- Use `/collaboration` to document team norms (in `README.md`), collaboration
+  strategies, project goals, and your milestone retrospectives. This folder is a
+  living document: keep updating it throughout the project.
+- Use `/notes` to share useful resources: tools, tutorials, examples, or
+  anything else that helps your team learn and work better.
+- In the main `README.md`, write a short intro to your team and what you're
+  hoping to study.
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## 2. Problem Identification
+
+- Use `/0_domain_study` to build a reference folder for your research domain. A
+  new teammate should be able to catch up using this folder.
+- Use `/0_domain_study/README.md` to index the notes in this folder so people
+  don't have to read every single file to find what they need.
+- In the main `README.md`, write a summary of your research question, relevant
+  background, and why you think this is a meaningful problem to work on.
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## 3. Data Collection
+
+- Store **all datasets** (raw or processed) in `/1_datasets`. This folder is for
+  data only, not code (unless you happen to have a dataset that's inside a
+  `.py` file…).
+- In `/1_datasets/README.md`, document each dataset: where it's from, how it was
+  collected, how it connects to your research question, and any limitations or
+  caveats.
+- Use `/2_data_preparation` to keep all your cleaning, transformation, and prep
+  scripts. These scripts should read data from `/1_datasets` and write new
+  datasets back to that same folder.
+- In `/2_data_preparation/README.md`, explain what each script does: which
+  datasets it reads, what it does to them, and what outputs it creates.
+- Use `/3_data_exploration` to explore, visualize, and get a feel for your
+  datasets. This isn't the place for answering research questions; it's for
+  understanding your data.
+- In `/3_data_exploration/README.md`, summarize what each script/notebook
+  explores and which datasets it uses.
+- In the main `README.md`, describe how you're modeling your research question
+  with data, what datasets you're using, and how you prepared them.
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## 4. Data Analysis
+
+- Use `/4_data_analysis` for scripts and notebooks that actually analyze your
+  data to answer your research question. Don’t try to cram everything into one
+  file; you can have many scripts/notebooks in here as long as they are clearly
+  named. Expect your findings and conclusions to come from many smaller
+  analyses, so trying to fit everything into a single notebook won't help. You
+  can always cite different scripts/notebooks to support different parts of
+  your conclusions.
+- In `/4_data_analysis/README.md`, outline your analysis strategy and summarize
+  what each script or notebook does.
+- In the main `README.md`, include:
+  - A short summary of your analysis approach
+  - A clear statement of your research conclusions (right at the top)
+  - How confident you are in your results
+  - What limitations your work has
+  - Any ideas for future research
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## 5. Communication Strategy
+
+- Use `/5_communication_strategy` for planning and drafting your communication
+  artefact. That includes audience research, message development, and assets
+  like images or scripts.
+- You don’t need to store the final artefact here if that doesn’t make sense. For
+  example, in cohort 6, a group created an Instagram account and meme campaign as
+  their communication strategy! You can't push that to a folder on GitHub.
+- In `/5_communication_strategy/README.md`, summarize your strategy: who you’re
+  reaching, what you’re saying, and why.
+- In the main `README.md`, include a summary of your communication strategy and
+  a link (if possible) to your final artefact.
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## 6. Final Presentation
+
+- Use `/6_final_presentation` to store slides, scripts, or notes from preparing
+  your final presentation.
+- In `/6_final_presentation/README.md`, list what’s in the folder and link to
+  your actual presentation.
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## General Tips
+
+- Keep README files updated as you go. They’re for humans. Future-you is a
+  human.
+- Reproducibility is key. Someone else should be able to run your pipeline
+  without tweaking your code or guessing what goes where.
+- Use clear, consistent file names; you don’t want to waste time figuring out
+  what `final_final_revised3.ipynb` was supposed to do.
+- Document your work as you’re doing it. Waiting until the end = pain.
+- Cross-reference when needed. ("This analysis uses the cleaned data produced by
+  `/2_data_preparation/clean_survey_data.py`.")
+- Commit early and often. Write commit messages that your teammates (and your
+  future self) will understand.
+- Do regular repo reviews as a team. Is everything findable? Understandable?
+
+---
+
+---
+
+Happy studies!

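The main guide asks that others be able to clone the repository and run all scripts without errors. One way a team might check this is a small runner like the hypothetical sketch below. It assumes each step is a standalone Python script in one folder, named so that alphabetical order matches run order (for example `01_clean_survey.py`, `02_merge_sources.py`); it does not execute notebooks, and `run_scripts` is an illustrative name, not an existing tool. Calling `run_scripts("2_data_preparation")` from the repository root after a fresh clone would run each script with its own folder as the working directory and stop at the first one that exits with an error.

```python
"""Hypothetical reproducibility check: run every script in a folder, in order.

Assumes steps are plain .py files whose names sort into the order they must
run in. Notebooks are not covered by this sketch.
"""
import subprocess
import sys
from pathlib import Path


def run_scripts(folder):
    """Run each .py file in `folder` in name order; stop at the first failure."""
    scripts = sorted(Path(folder).glob("*.py"))
    for script in scripts:
        print(f"Running {script.name} ...")
        # check=True raises CalledProcessError if a script exits with a
        # non-zero status, so a broken step is reported, not silently skipped.
        subprocess.run([sys.executable, script.name], cwd=script.parent, check=True)
    return [script.name for script in scripts]
```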