Commit b9be9fb

Merge pull request #194 from julienrf/add-tutorial
Add tutorial showing how to migrate from DynamoDB
2 parents: 0d9d818 + 24eb472

File tree: 19 files changed, +341 −8 lines
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
name: "Tests / Tutorials"
on:
  push:
    branches:
      - master
  pull_request:

env:
  TUTORIAL_DIR: docs/source/tutorials/dynamodb-to-scylladb-alternator

jobs:
  test:
    name: DynamoDB migration
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Cache Docker images
        uses: ScribeMD/[email protected]
        with:
          key: docker-${{ runner.os }}-${{ hashFiles('docker-compose-tests.yml') }}
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: 8
          cache: sbt
      - name: Build migrator
        run: |
          ./build.sh
          mv migrator/target/scala-2.13/scylla-migrator-assembly.jar "$TUTORIAL_DIR/spark-data"
      - name: Set up services
        run: |
          cd $TUTORIAL_DIR
          docker compose up -d
      - name: Wait for the services to be up
        run: |
          .github/wait-for-port.sh 8000 # DynamoDB
          .github/wait-for-port.sh 8001 # ScyllaDB Alternator
          .github/wait-for-port.sh 8080 # Spark master
          .github/wait-for-port.sh 8081 # Spark worker
      - name: Run tutorial
        run: |
          cd $TUTORIAL_DIR
          aws configure set region us-west-1
          aws configure set aws_access_key_id dummy
          aws configure set aws_secret_access_key dummy
          sed -i 's/seq 1 40000/seq 1 40/g' ./create-data.sh
          ./create-data.sh
          . ./run-migrator.sh
      - name: Stop services
        run: |
          cd $TUTORIAL_DIR
          docker compose down
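The readiness checks above call .github/wait-for-port.sh, whose contents are not part of this diff. A minimal sketch of such a helper, assuming it simply polls a local TCP port until it accepts connections, might look like this:

    #!/usr/bin/env sh
    # Hypothetical sketch of .github/wait-for-port.sh, not the actual script:
    # poll the given local TCP port until it accepts connections, or time out.
    port="$1"
    for attempt in $(seq 1 60); do
      if nc -z localhost "$port"; then
        echo "Port $port is up"
        exit 0
      fi
      sleep 5
    done
    echo "Timed out waiting for port $port" >&2
    exit 1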

docs/source/getting-started/ansible.rst
Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ The Ansible playbook expects to be run in an Ubuntu environment where the direct
   - Ensure networking is configured to allow you to access the Spark master node via TCP ports 8080 and 4040
   - visit ``http://<spark-master-hostname>:8080``

- 9. `Review and modify config.yaml <../#configure-the-migration>`_ based on whether you're performing a migration to CQL or Alternator
+ 9. `Review and modify config.yaml <./#configure-the-migration>`_ based on whether you're performing a migration to CQL or Alternator

   - If you're migrating to the ScyllaDB CQL interface (from Apache Cassandra, ScyllaDB, or another CQL source), make a copy, review the comments in ``config.yaml.example``, and edit as directed.
   - If you're migrating to Alternator (from DynamoDB or another ScyllaDB Alternator deployment), make a copy, review the comments in ``config.dynamodb.yml``, and edit as directed.

docs/source/getting-started/aws-emr.rst
Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ This page describes how to use the Migrator in `Amazon EMR <https://aws.amazon.c
      --output-document=config.yaml

- 2. `Configure the migration <../#configure-the-migration>`_ according to your needs.
+ 2. `Configure the migration <./#configure-the-migration>`_ according to your needs.

 3. Download the latest release of the Migrator.

@@ -67,7 +67,7 @@ This page describes how to use the Migrator in `Amazon EMR <https://aws.amazon.c
      spark-submit --deploy-mode cluster --class com.scylladb.migrator.Migrator --conf spark.scylla.config=/mnt1/config.yaml /mnt1/scylla-migrator-assembly.jar

-    See also our `general recommendations to tune the Spark job <../#run-the-migration>`_.
+    See also our `general recommendations to tune the Spark job <./#run-the-migration>`_.

 - Add a Bootstrap action to download the Migrator and the migration configuration:

docs/source/getting-started/docker.rst
Lines changed: 2 additions & 2 deletions
@@ -33,7 +33,7 @@ This page describes how to set up a Spark cluster locally on your machine by usi
      http://localhost:8080

- 5. Rename the file ``config.yaml.example`` to ``config.yaml``, and `configure <../#configure-the-migration>`_ it according to your needs.
+ 5. Rename the file ``config.yaml.example`` to ``config.yaml``, and `configure <./#configure-the-migration>`_ it according to your needs.

 6. Finally, run the migration.

@@ -47,7 +47,7 @@ This page describes how to set up a Spark cluster locally on your machine by usi
    The ``spark-master`` container mounts the ``./migrator/target/scala-2.13`` dir on ``/jars`` and the repository root on ``/app``.

-    See also our `general recommendations to tune the Spark job <../#run-the-migration>`_.
+    See also our `general recommendations to tune the Spark job <./#run-the-migration>`_.

 7. You can monitor progress by observing the Spark web console you opened in step 4. Additionally, after the job has started, you can track progress via ``http://localhost:4040``.

docs/source/getting-started/index.rst
Lines changed: 4 additions & 0 deletions
@@ -12,6 +12,10 @@ A Spark cluster is made of several *nodes*, which can contain several *workers*
 We recommend provisioning at least 2 GB of memory per CPU on each node. For instance, a cluster node with 4 CPUs should have at least 8 GB of memory.

+ .. caution::
+
+   Make sure the Spark version, the Scala version, and the Migrator version you use are `mutually compatible <../#compatibility-matrix>`_.
+
 The following pages describe various alternative ways to set up a Spark cluster:

 * :doc:`on your infrastructure, using Ansible </getting-started/ansible>`,

docs/source/getting-started/spark-standalone.rst
Lines changed: 2 additions & 2 deletions
@@ -21,7 +21,7 @@ This page describes how to set up a Spark cluster on your infrastructure and to
      wget https://github.com/scylladb/scylla-migrator/raw/master/config.yaml.example \
        --output-document=config.yaml

- 4. `Configure the migration <../#configure-the-migration>`_ according to your needs.
+ 4. `Configure the migration <./#configure-the-migration>`_ according to your needs.

 5. Finally, run the migration as follows from the Spark master node.

@@ -32,6 +32,6 @@ This page describes how to set up a Spark cluster on your infrastructure and to
        --conf spark.scylla.config=<path to config.yaml> \
        <path to scylla-migrator-assembly.jar>

-    See also our `general recommendations to tune the Spark job <../#run-the-migration>`_.
+    See also our `general recommendations to tune the Spark job <./#run-the-migration>`_.

 6. You can monitor progress from the `Spark web UI <https://spark.apache.org/docs/latest/spark-standalone.html#monitoring-and-logging>`_.

docs/source/index.rst
Lines changed: 2 additions & 1 deletion
@@ -9,7 +9,7 @@ The ScyllaDB Migrator is a Spark application that migrates data to ScyllaDB. Its
 * It can rename columns along the way.
 * When migrating from DynamoDB, it can transfer a snapshot of the source data, or continuously migrate new data as it arrives.

- Read over the :doc:`Getting Started </getting-started/index>` page to set up a Spark cluster for a migration.
+ Read over the :doc:`Getting Started </getting-started/index>` page to set up a Spark cluster and to configure your migration. Alternatively, follow our :doc:`step-by-step tutorial to perform a migration between fake databases using Docker </tutorials/dynamodb-to-scylladb-alternator/index>`.

 --------------------
 Compatibility Matrix
@@ -33,3 +33,4 @@ Migrator Spark Scala
    rename-columns
    validate
    configuration
+   tutorials/index
docs/source/tutorials/dynamodb-to-scylladb-alternator/create-25-items.sh
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
#!/usr/bin/env sh

generate_25_items() {
  local items=""
  for i in `seq 1 25`; do
    items="${items}"'{
      "PutRequest": {
        "Item": {
          "id":   { "S": "'"$(uuidgen)"'" },
          "col1": { "S": "'"$(uuidgen)"'" },
          "col2": { "S": "'"$(uuidgen)"'" },
          "col3": { "S": "'"$(uuidgen)"'" },
          "col4": { "S": "'"$(uuidgen)"'" },
          "col5": { "S": "'"$(uuidgen)"'" }
        }
      }
    },'
  done
  echo "${items%,}" # remove trailing comma
}

aws \
  --endpoint-url http://localhost:8000 \
  dynamodb batch-write-item \
  --request-items '{
    "Example": ['"$(generate_25_items)"']
  }' > /dev/null
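Each run of this script issues a single batch-write-item call, since DynamoDB's BatchWriteItem API accepts at most 25 put requests per call. To sanity-check the load afterwards (a suggested check, not part of this change), you can count the items through the same local endpoint:

    # Count items written to the local DynamoDB endpoint (suggestion, not in this diff).
    aws --endpoint-url http://localhost:8000 \
      dynamodb scan --table-name Example --select COUNT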
docs/source/tutorials/dynamodb-to-scylladb-alternator/create-data.sh
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
#!/usr/bin/env sh

# Create table
aws \
  --endpoint-url http://localhost:8000 \
  dynamodb create-table \
  --table-name Example \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100

# Add items in parallel
# Change 40000 into 400 below for a faster demo (10,000 items instead of 1,000,000)
seq 1 40000 | xargs --max-procs=8 --max-args=1 ./create-25-items.sh
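With 40,000 invocations of 25 items each, the script loads 1,000,000 items in total (or 10,000 with the suggested 400). The workflow's final step then sources run-migrator.sh, which is not shown in this diff. Based on the spark-submit invocation documented elsewhere in this change, a hedged sketch of what such a script could do is given below; the service name, mount paths, and master URL are assumptions, not confirmed by this diff:

    #!/usr/bin/env sh
    # Hypothetical sketch, not the actual run-migrator.sh: submit the migration
    # job to the Dockerized Spark master. The "spark-master" service name, the
    # /app and /spark-data paths, and the master URL are assumptions.
    docker compose exec spark-master \
      spark-submit \
        --class com.scylladb.migrator.Migrator \
        --master spark://spark-master:7077 \
        --conf spark.scylla.config=/app/config.yaml \
        /spark-data/scylla-migrator-assembly.jar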
