BigDataIA-Summer2022-Team04/Assignment_04

Assignment_04

Links:

  • Airflow - UI Deployed on Google Compute Engine
  • Streamlit - UI Deployed on Streamlit Cloud

Architecture:

Architecture


Part 1: Automating Benchmarking of datasets

  • DataPerf Source - Source code and dataset are stored in the dataperf_src branch

  • Airflow DAG

    • clean_dir - Cleans the working directory
    • download_code - Downloads the source code and the dataset from the dataperf_src branch
    • generate_template - Generates the task_setup.yml file from the template.yml file by substituting the user's input
    • run_python_script - Runs the computation scripts: create_baselines.py, main.py and plotter.py
    • send_report_email - Sends an email with the generated report image attached
    • send_failure_email - Sends an email reporting that the job failed
    • send_ack_email - Sends an email acknowledging the job run and announcing that a further update email will follow
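
The clean_dir and run_python_script steps can be sketched in plain Python as below. This is a minimal illustration, not the repository's DAG code: the function names `clean_dir` and `run_python_scripts`, the `runner` parameter and the working-directory argument are assumptions made for the example, and the script order is simply the order in which the scripts are listed above.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Script names come from the task list above; running them in this order
# is an assumption based on the order in which they are listed.
SCRIPTS = ["create_baselines.py", "main.py", "plotter.py"]


def clean_dir(work_dir: Path) -> None:
    """Remove the working directory (if present) and recreate it empty."""
    shutil.rmtree(work_dir, ignore_errors=True)
    work_dir.mkdir(parents=True, exist_ok=True)


def run_python_scripts(work_dir: Path, runner=subprocess.run) -> list:
    """Run each script in order inside work_dir, stopping at the first failure."""
    executed = []
    for script in SCRIPTS:
        # check=True raises CalledProcessError on a non-zero exit code,
        # which is the point where a failure email could be triggered.
        runner([sys.executable, script], cwd=work_dir, check=True)
        executed.append(script)
    return executed
```

In an Airflow DAG each of these functions would typically be wrapped in its own task, so that a failing script marks the task as failed and lets the failure email task take over.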

Screenshots:

1. Airflow DAGs

Airflow DAGs

2. Template file - link

paths:
  embedding_folder: embeddings/
  groundtruth_folder: data/
  submission_folder: submissions/
  results_folder: results/

tasks:
  - data_id: {{ data_id }}
    train_size: {{ train_size }}
    noise_level: {{ noise_level }}
    test_size: {{ test_size }}
    val_size: {{ val_size }}

baselines:
  {% for element in baselines %}
  - name: {{ element }}
  {% endfor %}
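
To make the substitution concrete, the sketch below builds the text that a filled-in template would contain. It uses plain string formatting as a stand-in for Jinja rendering so that it runs without extra packages. The function name `render_task_setup` and all example values are hypothetical; only the field names are taken from the template.

```python
def render_task_setup(data_id, train_size, noise_level, test_size, val_size, baselines):
    """Return task_setup.yml text with the user's values filled in."""
    lines = [
        "paths:",
        "  embedding_folder: embeddings/",
        "  groundtruth_folder: data/",
        "  submission_folder: submissions/",
        "  results_folder: results/",
        "",
        "tasks:",
        f"  - data_id: {data_id}",
        f"    train_size: {train_size}",
        f"    noise_level: {noise_level}",
        f"    test_size: {test_size}",
        f"    val_size: {val_size}",
        "",
        "baselines:",
    ]
    # One "- name:" entry per baseline, mirroring the template's for loop.
    lines += [f"  - name: {name}" for name in baselines]
    return "\n".join(lines) + "\n"


# Hypothetical user input, for illustration only.
rendered = render_task_setup(
    data_id="example-dataset",
    train_size=300,
    noise_level=0.3,
    test_size=500,
    val_size=100,
    baselines=["baseline_a", "baseline_b"],
)
print(rendered)
```

With Jinja2 installed, the same result can be obtained from the template text itself through `jinja2.Template(template_text).render(...)`, passing the same keyword arguments.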

3. Streamlit UI

Streamlit UI

4. Job Run Acknowledgement Email

Acknowledgement Email

5. Job Run Report Email

Report Email
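
A report email with the generated image attached could be assembled as sketched below. This uses Python's standard `email` package and is not the repository's code, which may rely on Airflow's own email support. The function name `build_report_email`, the subject line, the body text, the file name and the addresses are all placeholders.

```python
from email.message import EmailMessage


def build_report_email(sender, recipient, image_bytes, image_name="report.png"):
    """Build an email message carrying the report image as a PNG attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Benchmark report"  # placeholder subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The benchmarking job finished. The report image is attached.")
    # Raw bytes need an explicit MIME type; PNG is assumed here.
    msg.add_attachment(
        image_bytes, maintype="image", subtype="png", filename=image_name
    )
    return msg


# Placeholder addresses and image bytes, for illustration only.
message = build_report_email("jobs@example.com", "user@example.com", b"\x89PNG")
```

The message can then be handed to `smtplib.SMTP(...).send_message(message)` against whichever mail server is configured.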

6. Job Run Failure Email

Failure Email

7. Report Sample

Sample Report


Part 2: Technology evaluation

Snoopy - Observations recorded here


Git Branches:
