Building a Data Analysis pipeline tutorial

This example data analysis project counts the occurrences of every word in 4 novels and reports the 10 most frequent words in each book.

Recreate the computational environment

1. Clone repo

Clone this repo and, using the command line, navigate to the root of the project.

git clone <repo_name>
cd <folder_name>

2. Recreate the computational environment

Option 1: Use `conda-lock.yml`

2.1.1 Run the following command to create the conda environment:

conda-lock install --name ia4 conda-lock.yml

2.1.2 Activate the conda environment:

conda activate ia4

2.1.3 Run the analysis:

bash runall.sh

Option 2: Use `environment.yml`

2.2.1 Create the conda environment from `environment.yml`:

conda env create -n ia4 -f environment.yml

2.2.2 Activate the conda environment:

conda activate ia4

2.2.3 Run the analysis:

bash runall.sh

Option 3: Use `docker-compose.yml`

2.3.1 Pull and launch the Docker container. This drops you directly into the container's terminal (there is no GUI):

docker compose run --rm ia4

2.3.2 From the container's terminal, run the analysis:

bash runall.sh

2.3.3 When you are done, type `exit` to leave the Docker container.

Exercise:

Your task is to add a "smarter" data analysis pipeline using GNU Make! Typing `make all` should accomplish the same task as `bash runall.sh`.

Typing `make clean` should reset the analysis to its starting point (the state when you first copied this repo).
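
What makes Make "smarter" than a plain shell script is that it reruns a step only when the step's output is missing or older than one of its inputs. The sketch below mimics that timestamp rule in bash, purely to illustrate the idea. It is not a substitute for a Makefile, and the function name `needs_rebuild` is a hypothetical helper, not part of this repo.

```bash
#!/usr/bin/env bash
# Sketch of the timestamp rule Make applies to each target.
# Usage: needs_rebuild TARGET DEP [DEP...]
# Returns 0 (true) when TARGET is missing or older than any DEP.
needs_rebuild() {
    local target="$1"
    shift
    # A missing target always has to be built.
    [ -e "$target" ] || return 0
    local dep
    for dep in "$@"; do
        # -nt is true when the left-hand file was modified more recently.
        if [ "$dep" -nt "$target" ]; then
            return 0
        fi
    done
    # The target exists and no dependency is newer than it.
    return 1
}
```

For example, with a hypothetical output `out.txt` produced from `in.txt`, `needs_rebuild out.txt in.txt && echo "rebuild"` prints `rebuild` only when `out.txt` is absent or stale. In a Makefile the same relationship is declared once per target, and `make` performs this check for you.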

Dependencies

GNU Make
Quarto
Python & Python libraries:
- click
- matplotlib
- pandas
