work-absense-forecaster

A machine learning project for forecasting employee work absence hours.

Project Overview

This project builds a machine learning system to forecast employee absenteeism hours using structured HR and productivity data.
The goal is to identify patterns in employee behavior and workplace conditions to anticipate absenteeism risk, enabling better planning and cost reduction.

Setup

python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
pip install -r requirements.txt

MLflow Integration

Run the MLflow server for experiment tracking:

mlflow server --backend-store-uri sqlite:///my.db --default-artifact-root ./mlruns --host 0.0.0.0 --port 5000

The MLflow UI will be available at http://localhost:5000

🚀 API Server

The project includes a FastAPI-based REST API for making absenteeism predictions.

Running with Docker (Recommended)

Build and start the API:

docker-compose up -d

Stop the API:

docker-compose down

API Endpoints

The API will be available at http://localhost:8000

  • GET / - API information and available endpoints
  • GET /health - Health check endpoint
  • POST /predict - Make absenteeism predictions
  • GET /docs - Interactive Swagger UI documentation

Testing the API

Health check:

curl http://localhost:8000/health

Get API information:

curl http://localhost:8000/

Make a prediction:

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "reason_for_absence": 23,
    "month_of_absence": 7,
    "day_of_the_week": 3,
    "seasons": 1,
    "transportation_expense": 289,
    "distance_from_residence_to_work": 36,
    "service_time": 13,
    "age": 33,
    "work_load_average/day": 239.554,
    "hit_target": 97,
    "disciplinary_failure": 0,
    "education": 1,
    "son": 2,
    "social_drinker": 1,
    "social_smoker": 0,
    "pet": 1,
    "weight": 90,
    "height": 172
  }'

Expected response:

{
  "prediction": 1,
  "prediction_label": "High",
  "confidence": 0.7343
}
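The same prediction request can also be made from Python. The sketch below builds the payload from the curl example using only the standard library; the actual HTTP call is shown commented out because it assumes the server is already running at localhost:8000:

```python
import json

# Payload matching the curl example above (18 input features).
payload = {
    "reason_for_absence": 23,
    "month_of_absence": 7,
    "day_of_the_week": 3,
    "seasons": 1,
    "transportation_expense": 289,
    "distance_from_residence_to_work": 36,
    "service_time": 13,
    "age": 33,
    "work_load_average/day": 239.554,
    "hit_target": 97,
    "disciplinary_failure": 0,
    "education": 1,
    "son": 2,
    "social_drinker": 1,
    "social_smoker": 0,
    "pet": 1,
    "weight": 90,
    "height": 172,
}
body = json.dumps(payload).encode("utf-8")

# With the API server up, the request itself would look like:
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8000/predict",
#       data=body,
#       headers={"Content-Type": "application/json"},
#   )
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read()))
```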

Interactive Documentation

Once the server is running, access the interactive API documentation:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

These interfaces allow you to:

  • View all available endpoints
  • See request/response schemas
  • Test the API directly from your browser
  • Download OpenAPI specification

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── Dockerfile         <- Docker configuration for the API server
├── docker-compose.yml <- Docker Compose configuration for easy deployment
├── requirements.txt   <- Full requirements file for development and training
├── requirements-api.txt <- Minimal requirements for the API server
│
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── setup.py           <- Makes the project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── api            <- FastAPI REST API for predictions
│   │   └── server.py  <- API server implementation
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   ├── train_model.py
│   │   └── preprocessors.py
│   │
│   └── visualization  <- Scripts to create exploratory and results-oriented visualizations
│       └── visualize.py
│
├── tests              <- Unit and integration tests
│   ├── unit/          <- Unit tests for individual components
│   └── integration/   <- End-to-end integration tests
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

Project based on the cookiecutter data science project template. #cookiecutterdatascience

🧪 Testing Guide for Work Absenteeism Forecaster

📋 Test Structure

A comprehensive test suite with both unit tests and integration tests has been created:

tests/
├── __init__.py                    # Package initialization
├── conftest.py                    # Shared pytest fixtures (root level)
│
├── unit/                          # Unit tests
│   ├── __init__.py
│   ├── conftest.py                # Unit-specific fixtures
│   ├── test_preprocessors.py      # Tests for custom transformers
│   ├── test_train_model.py        # Tests for training pipeline
│   ├── test_predict_model.py      # Tests for prediction pipeline
│   ├── test_data_utils.py         # Tests for data utilities
│   └── test_evaluation.py         # Tests for model evaluation
│
└── integration/                   # Integration tests
    ├── __init__.py
    ├── conftest.py                # Integration-specific fixtures
    └── test_pipeline_integration.py  # End-to-end pipeline tests

Additional files:
├── pytest.ini                     # Pytest configuration
├── Dockerfile.test                # Docker image for testing
└── docker-compose.test.yml        # Docker compose for running tests

Test Coverage

Unit Tests (tests/unit/)

  1. Preprocessors (test_preprocessors.py)

    • DropColumnsTransformer: Column dropping functionality
    • IQRClippingTransformer: Outlier handling using IQR method
    • ToStringTransformer: Type conversion to strings
    • Integration with sklearn pipelines
  2. Model Training (test_train_model.py)

    • Data loading and preparation
    • Pipeline construction
    • Model creation (Logistic Regression, Random Forest, Neural Network)
    • Training and evaluation
    • Multiple model training
    • Model persistence
  3. Model Prediction (test_predict_model.py)

    • Model loading
    • Making predictions on new data
    • Data handling in prediction pipeline
  4. Data Utilities (test_data_utils.py)

    • CSV file loading
    • Column name normalization
    • Data shape validation
    • Data value preservation
  5. Model Evaluation (test_evaluation.py)

    • Metrics calculation (accuracy, F1, recall, precision)
    • Classification reports
    • Confusion matrix creation
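The real transformers live in src/models/preprocessors.py. As an illustration only, an IQR-clipping transformer of the kind exercised by test_preprocessors.py might look like this (class name matches the list above; the `factor` parameter and internals are assumptions):

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class IQRClippingTransformer(BaseEstimator, TransformerMixin):
    """Clip numeric columns to [Q1 - factor*IQR, Q3 + factor*IQR]."""

    def __init__(self, factor=1.5):
        self.factor = factor

    def fit(self, X, y=None):
        # Learn per-column clipping bounds from the training data.
        q1, q3 = X.quantile(0.25), X.quantile(0.75)
        iqr = q3 - q1
        self.lower_ = q1 - self.factor * iqr
        self.upper_ = q3 + self.factor * iqr
        return self

    def transform(self, X):
        return X.clip(lower=self.lower_, upper=self.upper_, axis=1)


# Example: the outlier 100.0 is pulled down to the upper IQR bound.
df = pd.DataFrame({"hours": [1.0, 2.0, 2.0, 3.0, 100.0]})
clipped = IQRClippingTransformer().fit_transform(df)
```

Because it implements fit/transform via the sklearn base classes, a transformer like this can be dropped straight into an sklearn Pipeline, which is what the pipeline-integration tests check.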

Integration Tests (tests/integration/)

End-to-End Pipeline (test_pipeline_integration.py)

  1. Complete ML Workflow (test_realistic_ml_workflow)
    • Data loading and preparation
    • Train/test split
    • Preprocessing pipeline creation
    • Model training
    • Model persistence (save/load)
    • Prediction on new data
    • Metrics evaluation
    • Confusion matrix generation
    • File artifact verification
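In miniature, the workflow this test exercises can be sketched as follows; the toy data, model choice, and file name are illustrative stand-ins, not the project's actual fixtures:

```python
import os
from tempfile import mkdtemp

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data standing in for the processed absenteeism features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Train/test split and model training.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Persist, reload, and verify the round trip -- the save/load step the test covers.
path = os.path.join(mkdtemp(), "model.joblib")
joblib.dump(model, path)
reloaded = joblib.load(path)

# Predict on held-out data and evaluate.
acc = accuracy_score(y_test, reloaded.predict(X_test))
assert os.path.exists(path)  # file artifact verification
```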

🔄 Continuous Integration

This project uses GitHub Actions to automatically run tests on every push and pull request.

Workflow Overview

The CI pipeline runs:

  • Unit tests on all test files in tests/unit/
  • Integration tests on all test files in tests/integration/
  • Coverage reports with XML and HTML output
  • Tests on Python 3.9

Viewing Test Results

  1. Navigate to the Actions tab in the GitHub repository
  2. Click on any workflow run to see detailed test results
  3. Coverage reports are uploaded as artifacts (available for 30 days)

Workflow Configuration

The workflow is defined in .github/workflows/tests.yml and triggers on:

  • Pushes to main and develop branches
  • Pull requests to main and develop branches
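A trimmed-down sketch of a workflow with these triggers is shown below; the job layout and step details are illustrative, not a copy of the repository's actual tests.yml:

```yaml
name: Tests

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.9"
      - run: pip install -r requirements.txt
      - run: pytest tests/unit/ tests/integration/ --cov=src --cov-report=xml
```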

🚀 Running Tests

Build Docker Image

Build the docker image that contains the required environment to run the tests:

docker build -f Dockerfile.test -t work-absenteeism-test:latest .

Run All Tests

Run all tests (unit + integration):

docker-compose -f docker-compose.test.yml run --rm test

Run Specific Test Suites

Run only unit tests:

docker-compose -f docker-compose.test.yml run --rm test pytest tests/unit/ -v

Run only integration tests:

docker-compose -f docker-compose.test.yml run --rm test pytest tests/integration/ -v

Run with Coverage

Run tests with coverage report:

docker-compose -f docker-compose.test.yml run --rm test-coverage

Test Markers

Use pytest markers to run specific test categories:

# Run only unit tests
pytest -m unit

# Run only integration tests  
pytest -m integration

# Run only slow tests
pytest -m slow
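For these markers to be used without warnings, they must be registered in pytest.ini. A typical registration (illustrative; the repository's actual file may differ) looks like:

```ini
[pytest]
markers =
    unit: unit tests for individual components
    integration: end-to-end integration tests
    slow: tests that take a long time to run
```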
