Repository for development of the second project of the course DATA WAREHOUSES (INFO-H-419) from the program BDMA, Fall 2022.
Development of a TPC-DI Benchmark in Oracle with several scale factors.
Group members:
Professor: Zimányi, Esteban
- Download TPC-DI Tools (https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp)
- Install java-8 if not already installed and set java-8 as the default version of java
- Install python-oracledb library: (More information: https://python-oracledb.readthedocs.io/en/latest/user_guide/installation.html)
- Rename folder PDGF to pdgf
- Generate the data to a folder "staging/sf" where sf is the chosen scale factor
- e.g. for scale factor 3:
$ java -jar DIGen.jar -sf 3 -o ../staging/3
-
If implementing on Oracle Linux, convert the generated files to Linux format:
find staging/3/ -type f -exec dos2unix {} \; > results.out -
Create a database on Oracle
CREATE DATABASE <db-name> -
Set the information for connecting to your Oracle Database, in particular
host(e.g.localhost),port(e.g.:1521),user,password,db-namethat is set in step 7. -
Do the Historical Load with
$ python main.py -sf 'sf'where:
- 'sf' is Scale factor to be used in benchmark, the same as the one used to generate the data