3 files changed: +20 -2 lines changed
@@ -62,13 +62,16 @@ Project is divided into two modules:
 
 ### bigfiles
 
+- [How to Run](bigfiles/README.md#how-to-run)
 - bigfile is file that does not fit to RAM
 - module for comparing big files
 - written in Scala
 - more about bigfiles module could be found in [bigfiles README](bigfiles/README.md)
 
+
 ### smallfiles
 
+- [How to Run](smallfiles/README.md#how-to-run)
 - smallfile is file that fits to RAM
 - module for comparing small files
 - written in Python
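
The split above hinges on whether a file fits in RAM. As a rough illustration only, here is a minimal Python sketch of that decision. The function names `module_for_size` and `module_for_file` and the explicit `ram_budget_bytes` parameter are hypothetical and not part of the project. On-disk size is also only a proxy, since a compressed format such as Parquet can expand considerably once loaded.

```python
import os


def module_for_size(file_size_bytes: int, ram_budget_bytes: int) -> str:
    """Return the module name suggested by the README's definition.

    Hypothetical helper: a file that fits the given memory budget is a
    "smallfile", anything larger is a "bigfile".
    """
    if file_size_bytes < 0 or ram_budget_bytes < 0:
        raise ValueError("sizes must be non-negative")
    return "smallfiles" if file_size_bytes <= ram_budget_bytes else "bigfiles"


def module_for_file(path: str, ram_budget_bytes: int) -> str:
    """Same decision, using the on-disk size of `path` as a rough proxy."""
    return module_for_size(os.path.getsize(path), ram_budget_bytes)
```

For example, with an assumed budget of 8 GiB, `module_for_size(2 * 1024**3, 8 * 1024**3)` suggests `smallfiles`, while a 50 GiB input would be routed to `bigfiles`.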

 # Scala CPS-Dataset-Comparison
 
-This is scala implementation of the project. It is used for comparing big files.
+This is scala implementation of the project. It is used for comparing big files (files that can not fit to RAM).
 
 - [How to run](#how-to-run)
     - [Requirements](#requirements)
@@ -15,6 +15,7 @@ Then run:
 
 ```bash
 spark-submit target/scala-2.12/dataset-comparison-assembly-1.0.jar -o <output-path> --inputA <A-file-path> --inputB <B-file-path>
+
 ```
 ### Parameters:
 | Parameter | Description | Required |
@@ -26,6 +27,18 @@ spark-submit target/scala-2.12/dataset-comparison-assembly-1.0.jar -o <output-pa
 | `-d` or `--diff` [Row] | difference compute type| **optional** |
 | `-e` or `--exclude` | columns to exclude| **optional** |
 
+Example:
+```bash
+spark-submit --class africa.absa.cps.DatasetComparison \
+  --conf "spark.driver.extraJavaOptions=-Dconfig.file=/../bigfiles/src/main/resources/application.conf" \
+  target/scala-2.11/dataset-comparison-assembly-0.1.0.jar \
+  -o "/test_files/output_names$(date '+%Y-%m-%d_%H%M%S')" \
+  --inputA /test_files/namesA.parquet \
+  --inputB /test_files/namesB.parquet \
+  -d Row
+
+```
+
 ### Run with specific config
 
 ```bash

 # Python CPS-Dataset-Comparison
 
-This is python implementation of the project. It is used for comparing small files.
+> This module is not yet implemented.
+
+This is python implementation of the project. It is used for comparing small files (files fitting into RAM).
 
 - [Create and run environment](#create-and-run-environment)
 - [Run main](#run-main)