| 1 | | -# ET6 CDSP Starter |
| 1 | +# ET6 CDSP Starter: Guide |
| 2 | + |
| 3 | +This repository is here to help guide you through the |
| 4 | +[Collaborative Data Science Project (CDSP)](https://docs.google.com/document/d/1TaoVVqJD5EqmBGLw6_qzph8EZnuL6uhY/edit?usp=sharing&ouid=100638458423869369523&rtpof=true&sd=true). |
| 5 | + |
| 6 | +This repository's structure roughly follows the CDSP milestones. It's also |
| 7 | +designed to help you do **reproducible** research. If your research process is |
| 8 | +well organized, others should be able to clone it, run all scripts (without |
| 9 | +errors!), and evaluate your conclusions for themselves. |
| 10 | + |
| 11 | +``` |
| 12 | +/ |
| 13 | +├── README.md # Project overview and main findings |
| 14 | +├── /collaboration/ # Team norms, strategies, and retrospectives |
| 15 | +├── /notes/ # Shared resources and learning materials |
| 16 | +├── /0_domain_study/ # Domain research and background |
| 17 | +├── /1_datasets/ # Raw and processed datasets |
| 18 | +├── /2_data_preparation/ # Scripts for cleaning and processing data |
| 19 | +├── /3_data_exploration/ # Scripts for initial data understanding |
| 20 | +├── /4_data_analysis/ # Scripts for in-depth analysis |
| 21 | +├── /5_communication_strategy/ # Materials for communicating findings |
| 22 | +└── /6_final_presentation/ # Final presentation materials |
| 23 | +``` |
| 24 | + |
| 25 | +Below are some suggestions on how to use the folders/files in this repository, |
| 26 | +but they're just suggestions! Your group should find a system that works for you. |
| 27 | + |
| 28 | +--- |
| 29 | + |
| 30 | +## 1. Cross-Cultural Collaboration |
| 31 | + |
| 32 | +- Use `/collaboration` to document team norms (in `README.md`), collaboration |
| 33 | + strategies, project goals, and your milestone retrospectives. This folder is a |
| 34 | + living document — you should update it through the whole project. |
| 35 | +- Use `/notes` to share useful resources: tools, tutorials, examples, or |
| 36 | + anything else that helps your team learn and work better. |
| 37 | +- In the main `README.md`, write a short intro to your team and what you're |
| 38 | + hoping to study. |
| 39 | +- Create a retrospective for this milestone in `/collaboration` using the |
| 40 | + template. |
| 41 | + |
| 42 | +--- |
| 43 | + |
| 44 | +## 2. Problem Identification |
| 45 | + |
| 46 | +- Use `/0_domain_study` to build a reference folder for your research domain. A |
| 47 | + new teammate should be able to catch up using this folder. |
| 48 | +- Use `/0_domain_study/README.md` to organize your notes in `0_domain_study` so |
| 49 | + people don't have to read every single file to find what they need. |
| 50 | +- In the main `README.md`, write a summary of your research question, relevant |
| 51 | + background, and why you think this is a meaningful problem to work on. |
| 52 | +- Create a retrospective for this milestone in `/collaboration` using the |
| 53 | + template. |
| 54 | + |
| 55 | +--- |
| 56 | + |
| 57 | +## 3. Data Collection |
| 58 | + |
| 59 | +- Store **all datasets** (raw or processed) in `/1_datasets`. This folder is for |
| 60 | + data only — not code. (Unless you happen to have a dataset that's inside a |
| 61 | + `.py` file…) |
| 62 | +- In `/1_datasets/README.md`, document each dataset: where it's from, how it was |
| 63 | + collected, how it connects to your research question, and any limitations or |
| 64 | + caveats. |
| 65 | +- Use `/2_data_preparation` to keep all your cleaning, transformation, and prep |
| 66 | + scripts. These scripts should read data from `/1_datasets` and write new |
| 67 | + datasets back to that same folder. |
| 68 | +- In `/2_data_preparation/README.md`, explain what each script does: which |
| 69 | + datasets it reads, what it does to them, and what outputs it creates. |
| 70 | +- Use `/3_data_exploration` to explore, visualize, and get a feel for your |
| 71 | + datasets. This isn't the place for answering research questions — it's just to |
| 72 | + understand your data. |
| 73 | +- In `/3_data_exploration/README.md`, summarize what each script/notebook |
| 74 | + explores, and which datasets it uses. |
| 75 | +- In the main `README.md`, describe how you're modeling your research question |
| 76 | + with data, what datasets you're using, and how you prepared them. |
| 77 | +- Create a retrospective for this milestone in `/collaboration` using the |
| 78 | + template. |
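
As a rough illustration of the `/2_data_preparation` pattern described above (read from `/1_datasets`, write a new dataset back to the same folder), here is a minimal sketch. The function names, the CSV format, and the cleaning rule (trim whitespace, drop rows with any empty value) are assumptions made for this example, not part of the CDSP requirements; your own scripts will likely need different rules.

```python
import csv
from pathlib import Path


def clean_rows(rows):
    """Trim whitespace in keys and values; drop rows that have any empty value.

    This cleaning rule is a hypothetical example, not a CDSP requirement.
    """
    cleaned = []
    for row in rows:
        stripped = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None  # ignore surplus unnamed columns
        }
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def prepare(raw_path, out_path):
    """Read a raw CSV, clean it, and write the result as a new CSV file.

    Returns the number of rows kept. The raw file is never modified.
    """
    raw_path, out_path = Path(raw_path), Path(out_path)
    with raw_path.open(newline="", encoding="utf-8") as source:
        reader = csv.DictReader(source)
        fieldnames = [name.strip() for name in (reader.fieldnames or [])]
        cleaned = clean_rows(reader)
    with out_path.open("w", newline="", encoding="utf-8") as target:
        writer = csv.DictWriter(target, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(cleaned)
    return len(cleaned)
```

A script like this might be called as `prepare("1_datasets/survey_raw.csv", "1_datasets/survey_clean.csv")`, where both file names are placeholders. Writing to a new file rather than overwriting the raw one keeps the original data available, so anyone can rerun the preparation step from scratch.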
| 79 | + |
| 80 | +--- |
| 81 | + |
| 82 | +## 4. Data Analysis |
| 83 | + |
| 84 | +- Use `/4_data_analysis` for scripts and notebooks that actually analyze your |
| 85 | + data to answer your research question. Don’t try to cram everything into one |
| 86 | + file — you can have many scripts/notebooks in here as long as they are clearly |
| 87 | + named. It's expected that your research findings and conclusions will be the |
| 88 | + result of many smaller analyses; trying to fit everything into a single |
| 89 | + notebook will be unhelpful. You can always cite different scripts/notebooks to |
| 90 | + support different parts of your conclusions. |
| 91 | +- In `/4_data_analysis/README.md`, outline your analysis strategy and summarize |
| 92 | + what each script or notebook does. |
| 93 | +- In the main `README.md`, include: |
| 94 | + - A short summary of your analysis approach |
| 95 | + - A clear statement of your research conclusions (right at the top) |
| 96 | + - How confident you are in your results |
| 97 | + - What limitations your work has |
| 98 | + - Any ideas for future research |
| 99 | +- Create a retrospective for this milestone in `/collaboration` using the |
| 100 | + template. |
| 101 | + |
| 102 | +--- |
| 103 | + |
| 104 | +## 5. Communication Strategy |
| 105 | + |
| 106 | +- Use `/5_communication_strategy` for planning and drafting your communication |
| 107 | + artefact. That includes audience research, message development, and assets |
| 108 | + like images or scripts. |
| 109 | +- You don’t need to store the final artefact here if that doesn’t make sense. For |
| 110 | + example, in cohort 6 a group created an Instagram account and meme campaign as |
| 111 | + their communication strategy! You can't push that to a folder on GitHub. |
| 112 | +- In `/5_communication_strategy/README.md`, summarize your strategy: who you’re |
| 113 | + reaching, what you’re saying, and why. |
| 114 | +- In the main `README.md`, include a summary of your communication strategy and |
| 115 | + a link (if possible) to your final artefact. |
| 116 | +- Create a retrospective for this milestone in `/collaboration` using the |
| 117 | + template. |
| 118 | + |
| 119 | +--- |
| 120 | + |
| 121 | +## 6. Final Presentation |
| 122 | + |
| 123 | +- Use `/6_final_presentation` to store slides, scripts, or notes from preparing |
| 124 | + your final presentation. |
| 125 | +- In `/6_final_presentation/README.md`, list what’s in the folder and link to |
| 126 | + your actual presentation. |
| 127 | +- Create a retrospective for this milestone in `/collaboration` using the |
| 128 | + template. |
| 129 | + |
| 130 | +--- |
| 131 | + |
| 132 | +## General Tips |
| 133 | + |
| 134 | +- Keep README files updated as you go. They’re for humans. Future-you is a |
| 135 | + human. |
| 136 | +- Reproducibility is key. Someone else should be able to run your pipeline |
| 137 | + without tweaking your code or guessing what goes where. |
| 138 | +- Use clear, consistent file names — you don’t want to waste time figuring out |
| 139 | + what `final_final_revised3.ipynb` was supposed to do. |
| 140 | +- Document your work as you’re doing it. Waiting until the end = pain. |
| 141 | +- Cross-reference when needed. ("This analysis uses the cleaned data from |
| 142 | + `/2_data_preparation/clean_survey_data.py`.") |
| 143 | +- Commit early and often. Write commit messages that your teammates (and your |
| 144 | + future self) will understand. |
| 145 | +- Do regular repo reviews as a team. Is everything findable? Understandable? |
| 146 | + |
| 147 | +--- |
| 150 | + |
| 151 | +Happy studies! |