Commit 2d33594

Update READMEs
1 parent 13295d1 commit 2d33594

3 files changed, 20 additions and 2 deletions

README.md

Lines changed: 3 additions & 0 deletions
@@ -62,13 +62,16 @@ Project is divided into two modules:
 
 ### bigfiles
 
+- [How to Run](bigfiles/README.md#how-to-run)
 - bigfile is file that does not fit to RAM
 - module for comparing big files
 - written in Scala
 - more about bigfiles module could be found in [bigfiles README](bigfiles/README.md)
 
+
 ### smallfiles
 
+- [How to Run](smallfiles/README.md#how-to-run)
 - smallfile is file that fits to RAM
 - module for comparing small files
 - written in Python

bigfiles/README.md

Lines changed: 14 additions & 1 deletion
@@ -1,6 +1,6 @@
 # Scala CPS-Dataset-Comparison
 
-This is scala implementation of the project. It is used for comparing big files.
+This is scala implementation of the project. It is used for comparing big files (files that can not fit to RAM).
 
 - [How to run](#how-to-run)
 - [Requirements](#requirements)
@@ -15,6 +15,7 @@ Then run:
 
 ```bash
 spark-submit target/scala-2.12/dataset-comparison-assembly-1.0.jar -o <output-path> --inputA <A-file-path> --inputB <B-file-path>
+
 ```
 ### Parameters:
 | Parameter | Description | Required |
@@ -26,6 +27,18 @@ spark-submit target/scala-2.12/dataset-comparison-assembly-1.0.jar -o <output-pa
 |`-d` or `--diff` [Row] |difference compute type| **optional**|
 |`-e` or `--exclude`|columns to exclude|**optional**|
 
+Example:
+```bash
+spark-submit --class africa.absa.cps.DatasetComparison \
+--conf "spark.driver.extraJavaOptions=-Dconfig.file=/../bigfiles/src/main/resources/application.conf" \
+target/scala-2.11/dataset-comparison-assembly-0.1.0.jar \
+-o "/test_files/output_names$(date '+%Y-%m-%d_%H%M%S')" \
+--inputA /test_files/namesA.parquet \
+--inputB /test_files/namesB.parquet \
+-d Row
+
+```
+
 ### Run with specific config
 
 ```bash

smallfiles/README.md

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,8 @@
 # Python CPS-Dataset-Comparison
 
-This is python implementation of the project. It is used for comparing small files.
+> This module is not yet implemented.
+
+This is python implementation of the project. It is used for comparing small files (files fitting into RAM).
 
 - [Create and run environment](#create-and-run-environment)
 - [Run main](#run-main)
