main guide draft, update other guides

colevandersWands · colevandersWands · commit 07725e66421e · 2025-05-16T15:52:08.000-04:00
diff --git a/0_domain_study/guide.md b/0_domain_study/guide.md
@@ -8,3 +8,8 @@ PDFs, links you found helpful, ...
 This folder is different from `/notes` because it contains _only_ information
 about your research domain. When deciding what goes here, ask yourself this
 question: _Would someone need to know this to understand our research?_
+
+## README.md
+
+Use this folder's README to document all the notes and resources in this folder.
+Someone shouldn't need to read through _everything_ to find what they need.
diff --git a/1_datasets/guide.md b/1_datasets/guide.md
@@ -14,6 +14,12 @@ When cleaning and processing your datasets, you should save the prepared data to
 a _new_ file with a descriptive name. This approach will result in many dataset
 files, but that's ok!
 
+## README.md
+
+Use the README in this folder to document each dataset in the folder. Include
+information like: where is the data from? how was it collected? how does it
+relate to your problem? ...
+
 ## Types of Dataset
 
 A dataset is "simply" a collection of related measurements or observations. To
diff --git a/2_data_preparation/guide.md b/2_data_preparation/guide.md
@@ -11,3 +11,9 @@ your datasets. These files should:
 processed data to a _new_ file.** This is critical to open research: Someone
 should be able to clone this repository and run your scripts to replicate your
 research. If you modify an original dataset, others cannot replicate your work.
+
+## README.md
+
+Use this folder's README to give a quick summary of each script/notebook - which
+dataset(s) does it process and how? which datasets does it create and save to
+`/1_datasets`?
diff --git a/3_data_exploration/guide.md b/3_data_exploration/guide.md
@@ -21,3 +21,8 @@ replicate your work.
 
 > [Chapter 4 - Exploratory Data Analysis](https://bookdown.org/rdpeng/artofdatascience/exploratory-data-analysis.html)
 > from the Art of Data Science is a good starting reference.
+
+## README.md
+
+Use the README in this folder to give a quick summary of each script/notebook -
+which dataset(s) it explores, and how.
diff --git a/4_data_analysis/guide.md b/4_data_analysis/guide.md
@@ -15,3 +15,9 @@ replicate your work.
 
 > [Chapters 5-8](https://bookdown.org/rdpeng/artofdatascience) from the Art of
 > Data Science are a good starting reference.
+
+## README.md
+
+Use the README in this folder to document your analysis strategy and provide a
+quick summary of each script/notebook. You can also explain your research
+results in depth in this folder's README.
diff --git a/guide.md b/guide.md
@@ -1 +1,151 @@
-# ET6 CDSP Starter
+# ET6 CDSP Starter: Guide
+
+This repository is here to help guide you through the
+[Collaborative Data Science Project (CDSP)](https://docs.google.com/document/d/1TaoVVqJD5EqmBGLw6_qzph8EZnuL6uhY/edit?usp=sharing&ouid=100638458423869369523&rtpof=true&sd=true).
+
+This repository's structure roughly follows the CDSP milestones. It's also
+designed to help you do **reproducible** research. If your research is well
+organized, others should be able to clone the repository, run all scripts
+(without errors!), and evaluate your conclusions for themselves.
+
+```
+/
+├── README.md                   # Project overview and main findings
+├── /collaboration/             # Team norms, strategies, and retrospectives
+├── /notes/                     # Shared resources and learning materials
+├── /0_domain_study/            # Domain research and background
+├── /1_datasets/                # Raw and processed datasets
+├── /2_data_preparation/        # Scripts for cleaning and processing data
+├── /3_data_exploration/        # Scripts for initial data understanding
+├── /4_data_analysis/           # Scripts for in-depth analysis
+├── /5_communication_strategy/  # Materials for communicating findings
+└── /6_final_presentation/      # Final presentation materials
+```
+
+Below are some suggestions on how to use the folders/files in this repository,
+but they're just suggestions! Your group should find a system that works for you.
+
+---
+
+## 1. Cross-Cultural Collaboration
+
+- Use `/collaboration` to document team norms (in `README.md`), collaboration
+  strategies, project goals, and your milestone retrospectives. This folder is a
+  living document — you should update it through the whole project.
+- Use `/notes` to share useful resources: tools, tutorials, examples, or
+  anything else that helps your team learn and work better.
+- In the main `README.md`, write a short intro to your team and what you're
+  hoping to study.
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## 2. Problem Identification
+
+- Use `/0_domain_study` to build a reference folder for your research domain. A
+  new teammate should be able to catch up using this folder.
+- Use `/0_domain_study/README.md` to organize your notes in `/0_domain_study` so
+  people don't have to read every single file to find what they need.
+- In the main `README.md`, write a summary of your research question, relevant
+  background, and why you think this is a meaningful problem to work on.
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## 3. Data Collection
+
+- Store **all datasets** (raw or processed) in `/1_datasets`. This folder is for
+  data only — not code. (Unless you happen to have a dataset that's inside a
+  `.py` file…)
+- In `/1_datasets/README.md`, document each dataset: where it's from, how it was
+  collected, how it connects to your research question, and any limitations or
+  caveats.
+- Use `/2_data_preparation` to keep all your cleaning, transformation, and prep
+  scripts. These scripts should read data from `/1_datasets` and write new
+  datasets back to that same folder.
+- In `/2_data_preparation/README.md`, explain what each script does: which
+  datasets it reads, what it does to them, and what outputs it creates.
+- Use `/3_data_exploration` to explore, visualize, and get a feel for your
+  datasets. This isn't the place for answering research questions — it's just to
+  understand your data.
+- In `/3_data_exploration/README.md`, summarize what each script/notebook
+  explores, and which datasets it uses.
+- In the main `README.md`, describe how you're modeling your research question
+  with data, what datasets you're using, and how you prepared them.
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## 4. Data Analysis
+
+- Use `/4_data_analysis` for scripts and notebooks that actually analyze your
+  data to answer your research question. Don’t try to cram everything into one
+  file — you can have many scripts/notebooks in here as long as they are clearly
+  named. It's expected that your research findings and conclusions will be the
+  result of many smaller analyses, so trying to fit everything into a single
+  notebook will be unhelpful. You can always cite different scripts/notebooks to
+  support different parts of your conclusions.
+- In `/4_data_analysis/README.md`, outline your analysis strategy and summarize
+  what each script or notebook does.
+- In the main `README.md`, include:
+  - A short summary of your analysis approach
+  - A clear statement of your research conclusions (right at the top)
+  - How confident you are in your results
+  - What limitations your work has
+  - Any ideas for future research
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## 5. Communication Strategy
+
+- Use `/5_communication_strategy` for planning and drafting your communication
+  artefact. That includes audience research, message development, and assets
+  like images or scripts.
+- You don’t need to store the final artefact here if that doesn’t make sense. For
+  example, in cohort 6 a group created an Instagram account and meme campaign as
+  their communication strategy! You can't push that to a folder on GitHub.
+- In `/5_communication_strategy/README.md`, summarize your strategy: who you’re
+  reaching, what you’re saying, and why.
+- In the main `README.md`, include a summary of your communication strategy and
+  a link (if possible) to your final artefact.
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## 6. Final Presentation
+
+- Use `/6_final_presentation` to store slides, scripts, or notes from preparing
+  your final presentation.
+- In `/6_final_presentation/README.md`, list what’s in the folder and link to
+  your actual presentation.
+- Create a retrospective for this milestone in `/collaboration` using the
+  template.
+
+---
+
+## General Tips
+
+- Keep README files updated as you go. They’re for humans. Future-you is a
+  human.
+- Reproducibility is key. Someone else should be able to run your pipeline
+  without tweaking your code or guessing what goes where.
+- Use clear, consistent file names — you don’t want to waste time figuring out
+  what `final_final_revised3.ipynb` was supposed to do.
+- Document your work as you’re doing it. Waiting until the end = pain.
+- Cross-reference when needed. ("This analysis uses the cleaned data produced by
+  `/2_data_preparation/clean_survey_data.py`.")
+- Commit early and often. Write commit messages that your teammates (and your
+  future self) will understand.
+- Do regular repo reviews as a team. Is everything findable? Understandable?
+
+---
+
+---
+
+Happy studies!