Big Data Coursework

Coursework repository for the Large Scale Data Management Systems / Big Data Analytics course at NTUA. It covers exercises on Spark RDDs, Spark SQL, and Parquet-based processing.

Repository contents

  • 1_4_Q1_RDD.py, 1_4_Q1_SQL.py, 1_4_Q1_SQL_parquet.py: question 1 implementations
  • 1_4_Q2_RDD.py, 1_4_Q2_SQL.py, 1_4_Q2_SQL_parquet.py: question 2 implementations
  • 1_5.py: follow-up exercise
  • 2.py, 2.log, 2.results: second exercise and captured outputs
  • csv_to_parquet.py: helper for converting CSV data to Parquet
  • logs/, results/, tmp/: generated artefacts and intermediate output
  • Big_Data_Exercise_report.pdf: report

Notes

  • The repository is organized around coursework questions rather than a single application entry point.
  • The report (Big_Data_Exercise_report.pdf) provides the clearest overview of the datasets, tasks, and evaluation context.