kushalrathod32/fampay-data-intern-assignment

Data Engineering Intern Hiring Assignment

Project Overview

This project is a data pipeline built with Python and Pandas to process historical stock data. It converts daily trading records into monthly summaries and calculates technical indicators (SMA & EMA) using manual mathematical formulas rather than high-level library functions.

Installation & Execution

  1. Clone the repository: git clone <repository-url>, then cd fampay_assignment

  2. Set up a virtual environment: python -m venv venv, then source venv/bin/activate (on Windows: venv\Scripts\activate)

  3. Install Dependencies: pip install -r requirements.txt

  4. Run the Script: python src/main.py

Project Structure

  • src/main.py: The entry point that handles data downloading and triggers the processing.
  • src/processor.py: Contains the logic for monthly resampling and manual indicator math.
  • data/: Stores the raw stock_data.csv.
  • output/: Stores the 10 resulting CSV files (one per ticker).

Logic & Assumptions

1. Monthly Resampling

Daily data is aggregated into monthly rows.

  • Open/Close: Open is the opening price of the month's first trading day; Close is the closing price of the month's last trading day.
  • High/Low: We pick the maximum and minimum price seen during that month.
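The aggregation rules above can be sketched with a pandas group-by on the calendar month. This is a minimal illustration, not the code in src/processor.py: the function name `to_monthly` and the column names `date`, `open`, `high`, `low`, and `close` are assumptions about the CSV layout.

```python
import pandas as pd


def to_monthly(daily: pd.DataFrame) -> pd.DataFrame:
    """Collapse daily rows into one row per calendar month.

    Column names are assumed for illustration; adjust to the real CSV.
    """
    # Sort first so "first" and "last" mean earliest and latest trading day.
    daily = daily.sort_values("date")
    month = daily["date"].dt.to_period("M")
    return daily.groupby(month).agg(
        open=("open", "first"),   # open of the first trading day
        high=("high", "max"),     # highest price seen in the month
        low=("low", "min"),       # lowest price seen in the month
        close=("close", "last"),  # close of the last trading day
    )


# Tiny made-up sample covering two months.
daily = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2023-01-02", "2023-01-03", "2023-01-31", "2023-02-01", "2023-02-28"]
        ),
        "open": [10.0, 11.0, 12.0, 13.0, 14.0],
        "high": [10.5, 12.5, 12.2, 13.9, 14.8],
        "low": [9.5, 10.8, 11.0, 12.7, 13.5],
        "close": [10.2, 12.0, 11.5, 13.5, 14.1],
    }
)
monthly = to_monthly(daily)
print(monthly)
```

With several tickers in one file, the same function could be applied per ticker, for example by grouping on a ticker column first.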

2. Indicator Start Points (The "Why")

If you open the result files, you will see empty (NaN) values at the top. This is intentional, because an N-month indicator is undefined until N months of history exist:

  • SMA 10 & EMA 10: These start after 9 empty rows. You cannot calculate a 10-month average until you have at least 10 months of data history.
  • SMA 20 & EMA 20: These start after 19 empty rows because a 20-month window is required.

3. Manual Formulas Used

To demonstrate logic accuracy, we implemented the formulas manually:

  • Simple Moving Average (SMA): Formula: Sum of closing prices (over N periods) / N

  • Exponential Moving Average (EMA):

    • Multiplier: 2 / (N + 1)
    • The Seed: As required, the very first EMA value is initialized as the SMA of the first N closing prices.
    • Recursive Formula: EMA = (Current Price - Previous EMA) * Multiplier + Previous EMA
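The formulas above can be sketched in plain Python as follows. This is an illustrative version under stated assumptions, not the repository's implementation: the function names `sma` and `ema` are hypothetical, and `None` stands in for the empty (NaN) rows during the warm-up period.

```python
from typing import List, Optional


def sma(closes: List[float], n: int) -> List[Optional[float]]:
    """Simple moving average: sum of the last n closes divided by n."""
    out: List[Optional[float]] = [None] * len(closes)
    for i in range(n - 1, len(closes)):
        out[i] = sum(closes[i - n + 1 : i + 1]) / n
    return out


def ema(closes: List[float], n: int) -> List[Optional[float]]:
    """Exponential moving average seeded with the SMA of the first n closes."""
    out: List[Optional[float]] = [None] * len(closes)
    if len(closes) < n:
        return out  # not enough history for even one value
    multiplier = 2 / (n + 1)
    prev = sum(closes[:n]) / n  # the seed: SMA of the first n closes
    out[n - 1] = prev
    for i in range(n, len(closes)):
        # EMA = (Current Price - Previous EMA) * Multiplier + Previous EMA
        prev = (closes[i] - prev) * multiplier + prev
        out[i] = prev
    return out


closes = [10.0, 11.0, 12.0, 13.0, 14.0]
sma3 = sma(closes, 3)
ema3 = ema(closes, 3)
print(sma3)
print(ema3)
```

With a window of 3 the first two entries are `None`, which mirrors the 9 and 19 empty rows produced by the 10-month and 20-month windows.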

Assumptions

  • Data Source: The dataset is fetched dynamically from the provided GitHub link.
  • Relative Paths: All file paths are relative, so the code should run on Mac, Windows, or Linux without path changes.
  • Validation: Every stock ticker results in exactly 24 rows, representing the full 2-year period provided in the CSV.
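The row-count validation could be checked with a short script like the one below. It is a sketch under assumptions: the helper names `count_data_rows` and `find_bad_files` are hypothetical, and each output CSV is assumed to have exactly one header line.

```python
import csv
from pathlib import Path
from typing import Dict


def count_data_rows(csv_path: Path) -> int:
    """Count rows in a CSV file, excluding an assumed single header line."""
    with csv_path.open(newline="") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def find_bad_files(output_dir: Path, expected_rows: int = 24) -> Dict[str, int]:
    """Return {file name: row count} for files that miss the expected count."""
    bad: Dict[str, int] = {}
    for path in sorted(output_dir.glob("*.csv")):
        rows = count_data_rows(path)
        if rows != expected_rows:
            bad[path.name] = rows
    return bad
```

Calling `find_bad_files(Path("output"))` from the project root would return an empty dictionary when every ticker file has 24 monthly rows.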

Developed by: Kushal Rathod
