Commit 007a41e

small fixes about distributed processing (#88)
1 parent 9715b76 commit 007a41e

2 files changed

Lines changed: 9 additions & 8 deletions

Lines changed: 7 additions & 3 deletions
@@ -16,7 +16,7 @@ Distributed processing allows you to scale document indexing across multiple mac
 
 ### 1. Prepare Your Configuration File
 
-Create or modify a configuration file that includes the distributed settings:
+Check your processing configuration file ([example](/examples/process/config.yaml)), to include the distributed settings:
 
 ```yaml
 dispatcher_config:
@@ -38,8 +38,12 @@ On each node, run:
 git clone <repository-url>
 cd mmore
 
+# Make a virtual environment
+python -m venv .venv
+source .venv/bin/activate
+
 # Install dependencies
-pip install -e '.[all]'
+pip install -e .
 ```
 
 ### 3. Launch the Distributed Processing
@@ -71,7 +75,7 @@ Once all nodes are running, return to the master node and type `go` when prompte
 
 ## Monitoring Progress
 
-You can monitor the processing using the dashboard, just check its [documentation](/docs/dashboard_readme.md).
+You can monitor the processing using the dashboard, just check its [documentation](./dashboard.md).
 
 The dashboard provides:
 - Real-time progress visualization

docs/process.md

Lines changed: 2 additions & 5 deletions
@@ -38,10 +38,7 @@ bash scripts/process_distributed.sh -f /path/to/my/input/folder
 Getting a sense of the overall progress of the pipeline can be challenging when running on a large dataset, and especially in a distributed environment. You can optionally use the dashboard to monitor the progress of the pipeline.
 You will be able to visualize results :chart_with_upwards_trend:. The dashboard also lets you gently stop workers :chart_with_downwards_trend: and monitor their progression.
 
-1. Start the backend on the cluster [backend README](/src/mmore/dashboard/backend/README.md).
-2. Specify the backend URL in the frontend as an environment variable.
-3. Start the frontend on your local machine [frontend README](/src/mmore/dashboard/frontend/README.md).
-4. Specify the backend URL in the `process_config.yaml` file and finally execute `run_process.py` as usual.
+Check the docs in the [dashboard documentation](./dashboard.md).
 
 #### :scroll: Examples
 You can find more examples scripts in [the `/examples` directory](/examples).
@@ -55,7 +52,7 @@ Be aware that the fast mode might not be as accurate as the default mode, especi
 
 ### :rocket: Distributed mode
 
-The project is designed to be easily scalable to a multi GPU / multi node environment. To use it, To use it, set the `distribued` to `true` in the config file, and follow the steps described in the [](/README.md) section.
+The project is designed to be easily scalable to a multi GPU / multi node environment. To use it, To use it, set the `distribued` to `true` in the config file, and follow the steps described in the [distributed processing](./distributed_processing.md) section.
 
 ### :wrench: File type parameters tuning
