giorgosouz/Big-data

Big Data Coursework

Coursework repository for Large Scale Data Management Systems / Big Data Analytics at NTUA, covering exercises on RDDs, Spark SQL, and Parquet-based processing.

Repository contents

  • 1_4_Q1_RDD.py, 1_4_Q1_SQL.py, 1_4_Q1_SQL_parquet.py: question 1, implemented in three variants (RDD, Spark SQL, Spark SQL with Parquet)
  • 1_4_Q2_RDD.py, 1_4_Q2_SQL.py, 1_4_Q2_SQL_parquet.py: question 2, implemented in the same three variants
  • 1_5.py: follow-up exercise
  • 2.py, 2.log, 2.results: second exercise and captured outputs
  • csv_to_parquet.py: conversion helper
  • logs/, results/, tmp/: generated artefacts and intermediate output
  • Big_Data_Exercise_report.pdf: report
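
For readers unfamiliar with what a CSV-to-Parquet conversion helper typically does, here is a minimal sketch using PySpark. It is not the contents of csv_to_parquet.py. The function names (`parquet_path_for`, `csv_to_parquet`), the app name, and the assumption that the CSV has a header row are all hypothetical and chosen for illustration only.

```python
def parquet_path_for(csv_path: str) -> str:
    """Derive an output path by swapping a trailing .csv for .parquet.

    Plain string handling is used so that URI-style paths
    (for example hdfs://...) are left otherwise untouched.
    """
    base = csv_path[:-4] if csv_path.lower().endswith(".csv") else csv_path
    return base + ".parquet"


def csv_to_parquet(csv_path: str, header: bool = True) -> str:
    """Read one CSV file with Spark and write it back out as Parquet.

    Assumption: the input has a header row and its column types can be
    inferred; the real coursework script may define an explicit schema.
    """
    # Imported lazily so the path helper above works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-to-parquet-sketch").getOrCreate()
    out_path = parquet_path_for(csv_path)

    # inferSchema triggers an extra pass over the data to guess column types.
    df = spark.read.csv(csv_path, header=header, inferSchema=True)

    # Overwrite any previous output so the conversion can be re-run safely.
    df.write.mode("overwrite").parquet(out_path)
    return out_path
```

Calling `csv_to_parquet("data/input.csv")` (a hypothetical path) would write `data/input.parquet` next to the input. Parquet is columnar and stores the schema with the data, so the Parquet variants of the queries can skip CSV parsing and schema inference.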

Notes

  • The repository is organized around coursework questions rather than a single application entry point.
  • The report provides the clearest overview of the datasets, tasks, and evaluation context.
