Data Engineering Intern Hiring Assignment

Project Overview

This project is a data pipeline built with Python and Pandas to process historical stock data. It converts daily trading records into monthly summaries and calculates technical indicators (SMA & EMA) using manual mathematical formulas rather than high-level library functions.

Installation & Execution

Clone the repository: git clone <repository-url>, then cd fampay_assignment
Set up a virtual environment: python -m venv venv, then source venv/bin/activate (on Windows: venv\Scripts\activate)
Install dependencies: pip install -r requirements.txt
Run the script: python src/main.py

Project Structure

src/main.py: The entry point that handles data downloading and triggers the processing.
src/processor.py: Contains the logic for monthly resampling and manual indicator math.
data/: Stores the raw stock_data.csv.
output/: Stores the 10 resulting CSV files (one per ticker).
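
As an illustration of the one-file-per-ticker layout, here is a minimal sketch of how a combined monthly table could be split into separate CSV files. The function name `write_per_ticker`, the `ticker` column name, and the `<ticker>.csv` file naming are assumptions made for this example and may not match the actual code in src/.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_per_ticker(monthly: pd.DataFrame, out_dir) -> list:
    """Write one CSV per ticker into out_dir and return the written paths.

    Assumes a 'ticker' column (hypothetical name) identifies each stock.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for ticker, rows in monthly.groupby("ticker"):
        path = out_dir / f"{ticker}.csv"  # hypothetical naming scheme
        rows.to_csv(path, index=False)
        written.append(path)
    return written


# Made-up sample data, written to a temporary directory for the demo.
monthly = pd.DataFrame(
    {"ticker": ["AAA", "AAA", "BBB"], "close": [1.0, 2.0, 3.0]}
)
with tempfile.TemporaryDirectory() as tmp:
    written = write_per_ticker(monthly, Path(tmp) / "output")
    names = sorted(p.name for p in written)
    row_counts = {p.stem: len(pd.read_csv(p)) for p in written}
```

Building the path with pathlib keeps the output location relative and portable across operating systems.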

Logic & Assumptions

1. Monthly Resampling

Daily data is aggregated into monthly rows.

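Open/Close: Open is the opening price on the month's first trading day; Close is the closing price on the month's last trading day. Neither value is averaged.
High/Low: High is the highest price and Low is the lowest price recorded on any trading day in that month.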
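
The aggregation rules above can be sketched with Pandas as follows. This is a minimal sketch, not necessarily the code in src/processor.py: the function name `to_monthly` and the lowercase column names `open`, `high`, `low`, and `close` are assumptions, and the sample prices are made up.

```python
import pandas as pd


def to_monthly(daily: pd.DataFrame) -> pd.DataFrame:
    """Aggregate daily OHLC rows (indexed by date) into one row per month."""
    daily = daily.sort_index()  # first/last depend on chronological order
    month = daily.index.to_period("M")  # e.g. 2020-01-15 -> 2020-01
    return daily.groupby(month).agg(
        open=("open", "first"),   # open of the first trading day
        high=("high", "max"),     # highest price in the month
        low=("low", "min"),       # lowest price in the month
        close=("close", "last"),  # close of the last trading day
    )


# Made-up sample: three trading days in January, one in February.
daily = pd.DataFrame(
    {
        "open": [10.0, 11.0, 12.0, 20.0],
        "high": [10.5, 13.0, 12.5, 21.0],
        "low": [9.5, 10.0, 11.0, 19.0],
        "close": [10.2, 12.0, 11.5, 20.5],
    },
    index=pd.to_datetime(
        ["2020-01-02", "2020-01-15", "2020-01-31", "2020-02-03"]
    ),
)
monthly = to_monthly(daily)
```

For the January rows this gives open 10.0, high 13.0, low 9.5, and close 11.5. With several tickers in one file, the same aggregation would be applied to each ticker separately.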

2. Indicator Start Points (The "Why")

If you open the result files, you will see empty (NaN) values at the top. This is intentional and follows financial logic:

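SMA 10 & EMA 10: The first 9 rows are empty and the first value appears in row 10. A 10-month average cannot be calculated until 10 months of history exist, and the EMA starts at the same row because it is seeded with that first SMA value.
SMA 20 & EMA 20: The first 19 rows are empty and the first value appears in row 20, because a 20-month window is required.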
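
To make the warm-up period concrete, here is a small sketch that computes a simple moving average by hand and counts the leading empty values. The helper names `manual_sma` and `leading_empty` and the 24 made-up closing prices are assumptions for illustration only.

```python
import math


def manual_sma(closes, n):
    """N-period simple moving average; NaN until n values are available."""
    out = []
    for i in range(len(closes)):
        if i + 1 < n:
            out.append(math.nan)  # not enough history yet
        else:
            window = closes[i + 1 - n : i + 1]  # the last n closes
            out.append(sum(window) / n)
    return out


def leading_empty(values):
    """Count NaN values at the start of a list."""
    count = 0
    for v in values:
        if not math.isnan(v):
            break
        count += 1
    return count


closes = [float(m) for m in range(1, 25)]  # 24 made-up monthly closes
sma10 = manual_sma(closes, 10)
sma20 = manual_sma(closes, 20)
empty10 = leading_empty(sma10)
empty20 = leading_empty(sma20)
```

With 24 monthly rows, `empty10` is 9 and `empty20` is 19, matching the empty rows at the top of the result files.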

3. Manual Formulas Used

To demonstrate logic accuracy, we implemented the formulas manually:

Simple Moving Average (SMA): SMA = (Sum of closing prices over the last N periods) / N
Exponential Moving Average (EMA):
- Multiplier: 2 / (N + 1)
- The Seed: As required, the very first EMA value is initialized as the SMA of that period.
- Recursive Formula: EMA = (Current Price - Previous EMA) * Multiplier + Previous EMA
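
The three EMA steps above can be sketched in plain Python as follows. This is an illustrative sketch under the stated formulas, not necessarily the implementation in src/processor.py; the function name `manual_ema` and the sample prices are assumptions.

```python
import math


def manual_ema(closes, n):
    """N-period EMA seeded with the SMA of the first n closes."""
    multiplier = 2 / (n + 1)
    out = [math.nan] * len(closes)
    if len(closes) < n:
        return out  # not enough history for even the seed value
    # The seed: the first EMA value is the SMA of the first n closes.
    out[n - 1] = sum(closes[:n]) / n
    # Recursive step for every later row.
    for i in range(n, len(closes)):
        prev = out[i - 1]
        out[i] = (closes[i] - prev) * multiplier + prev
    return out


# Made-up closes with N = 3, so the multiplier is 2 / (3 + 1) = 0.5.
ema = manual_ema([1.0, 2.0, 3.0, 4.0, 5.0], 3)
```

Working through the sample by hand: the seed is (1 + 2 + 3) / 3 = 2.0, the next value is (4 - 2.0) * 0.5 + 2.0 = 3.0, and the last is (5 - 3.0) * 0.5 + 3.0 = 4.0. The first two positions stay empty.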

Assumptions

Data Source: The dataset is fetched dynamically from the provided GitHub link.
Relative Paths: All file paths are relative, ensuring the code runs on any machine (Mac, Windows, or Linux) without modification.
Validation: Each ticker's output contains exactly 24 monthly rows, one for each month of the 2-year period covered by the CSV.
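
A row-count check of this kind could look like the sketch below. The function name `tickers_with_wrong_row_count`, the dictionary of per-ticker DataFrames, and the sample tickers are hypothetical; the real script may validate its output differently.

```python
import pandas as pd


def tickers_with_wrong_row_count(frames, expected_rows=24):
    """Return the tickers whose monthly table does not have expected_rows rows."""
    return sorted(
        ticker for ticker, frame in frames.items() if len(frame) != expected_rows
    )


# Made-up per-ticker results: one complete, one missing a month.
frames = {
    "AAA": pd.DataFrame({"close": range(24)}),
    "BBB": pd.DataFrame({"close": range(23)}),
}
bad = tickers_with_wrong_row_count(frames)
```

An empty result means every ticker passed; here `bad` contains only "BBB".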

Developed by: Kushal Rathod
