Large Scale Data Management Systems / Big Data Analytics coursework repository from NTUA, covering RDD, Spark SQL, and parquet-based processing exercises.
- 1_4_Q1_RDD.py, 1_4_Q1_SQL.py, 1_4_Q1_SQL_parquet.py: question 1 implementations
- 1_4_Q2_RDD.py, 1_4_Q2_SQL.py, 1_4_Q2_SQL_parquet.py: question 2 implementations
- 1_5.py: follow-up exercise
- 2.py, 2.log, 2.results: second exercise and captured outputs
- csv_to_parquet.py: conversion helper
- logs/, results/, tmp/: generated artefacts and intermediate output
- Big_Data_Exercise_report.pdf: report

- The repository is organized around coursework questions rather than a single application entry point.
- The report provides the clearest overview of the datasets, tasks, and evaluation context.
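
The file list describes csv_to_parquet.py only as a conversion helper. For readers unfamiliar with that step, the following is a minimal sketch of how a CSV-to-Parquet conversion can be written with PySpark. It is not the repository's actual script: the function name `csv_to_parquet`, the `header` default, the use of schema inference, and the example paths are all assumptions made for illustration.

```python
def csv_to_parquet(spark, csv_path, parquet_path, header=True):
    """Read a CSV file through a SparkSession and rewrite it as Parquet.

    Hypothetical helper: `spark` is an existing pyspark.sql.SparkSession,
    and both paths are placeholders chosen by the caller.
    """
    # inferSchema=True asks Spark to guess column types from the data
    # (an assumption; an explicit schema avoids the extra pass over the file).
    df = spark.read.csv(csv_path, header=header, inferSchema=True)

    # Overwrite any earlier output so the conversion can be re-run safely.
    df.write.mode("overwrite").parquet(parquet_path)
    return parquet_path


# Example usage (requires pyspark; the paths are placeholders):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("csv-to-parquet-sketch").getOrCreate()
#   csv_to_parquet(spark, "data/input.csv", "data/input.parquet")
#   spark.stop()
```

Parquet stores the schema alongside the data in a columnar layout, which is presumably why the repository keeps separate `_SQL_parquet.py` variants of each question next to the CSV-based ones.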