Closed
136 commits
7f2fc6e
multiplier test using same cluster
jugal-chauhan Apr 3, 2025
96d0fc0
add python variable as build with parameters
jugal-chauhan Apr 11, 2025
ce0df5f
Fix pytest to use env var from jenkins and more logging to inspect
jugal-chauhan Apr 14, 2025
edea285
Make index name as const var
jugal-chauhan Apr 14, 2025
ceea0d8
Fix error caused by export env vars
jugal-chauhan Apr 14, 2025
743e33a
Fix env vars in ECS container
jugal-chauhan Apr 14, 2025
87d7cfd
Revert back pytest command
jugal-chauhan Apr 14, 2025
28892d2
Remove duplication of backfill logic
jugal-chauhan Apr 14, 2025
78841f4
Logic to get existing or create new S3 bucket
jugal-chauhan Apr 14, 2025
a4bb0ae
fetch account id from role ARN
jugal-chauhan Apr 14, 2025
a9bc16a
Fix bucket creation logic
jugal-chauhan Apr 14, 2025
bc8c29d
Revert unwanted changes
jugal-chauhan Apr 14, 2025
1fe54c6
Add positional args support to run aws commands
jugal-chauhan Apr 14, 2025
0c3e31a
Catch error as a boolean flag
jugal-chauhan Apr 14, 2025
ff77c97
Add error catching for clear clusters error
jugal-chauhan Apr 14, 2025
316b281
Fetch correct parameters
jugal-chauhan Apr 14, 2025
ffc5743
Increase stuck count threshold value from 3 to 10 checks
jugal-chauhan Apr 14, 2025
f8b3788
Revert unwanted files
jugal-chauhan Apr 14, 2025
662661d
Revert lib stack changes for S3 resources
jugal-chauhan Apr 14, 2025
bc2f04f
allow Snapshot role all resources
jugal-chauhan Apr 14, 2025
94a82e3
Add logic to get values from conftest
jugal-chauhan Apr 14, 2025
3024fd8
Update test to use both fixtures
jugal-chauhan Apr 14, 2025
3444208
Fix errors on fixtures
jugal-chauhan Apr 14, 2025
93ae015
Reached success and remove natGateway
jugal-chauhan Apr 14, 2025
3172799
Rename files for PR and add natGateway
jugal-chauhan Apr 14, 2025
b793cae
Add cleanup steps post job execution
jugal-chauhan Apr 15, 2025
88df222
Drop custom es56 yaml from PR
jugal-chauhan Apr 15, 2025
a74d440
Add parameter for scaling RFS workers
jugal-chauhan Apr 15, 2025
f6cda43
Rename large snapshot S3 bucket name to include stage
jugal-chauhan Apr 15, 2025
d362553
Allow arbitrary stage name
jugal-chauhan Apr 15, 2025
491f5a5
Fix deploy MA stacks to use the provided stage name
jugal-chauhan Apr 15, 2025
5dc1d2f
Add cluster version parameter and dynamically preload data using valu…
jugal-chauhan Apr 15, 2025
5f640a9
Explicitly delete existing index at start
jugal-chauhan Apr 15, 2025
45f458f
Edit lockable resource name
jugal-chauhan Apr 15, 2025
9aa7bcf
Increase assumed role duration for expired AWS credentials
jugal-chauhan Apr 16, 2025
4ea6e41
Rename snapshot and update S3 folder name deletion
jugal-chauhan Apr 16, 2025
a7b994e
Remove final snapshot config repo name
jugal-chauhan Apr 16, 2025
86dbf8f
Revert change for credentials duration
jugal-chauhan Apr 16, 2025
2b374ef
Increase duration for credentials correctly
jugal-chauhan Apr 16, 2025
0b22442
Enable custom domain creation using parameter
jugal-chauhan Apr 17, 2025
a598923
Test for os2x parameters
jugal-chauhan Apr 17, 2025
1b45e04
Update cdk engineversion
jugal-chauhan Apr 17, 2025
3083d7f
Test fix adds versiontype
jugal-chauhan Apr 17, 2025
76da9a2
Test correction distVersion
jugal-chauhan Apr 17, 2025
1fd3476
Test skip source deploy
jugal-chauhan Apr 17, 2025
df6511a
Test shorten domain name
jugal-chauhan Apr 17, 2025
ff6deda
Test fix zoneawareness
jugal-chauhan Apr 17, 2025
4cfba97
Test avoid validationerror
jugal-chauhan Apr 17, 2025
dcf78a2
Test fix remove source config
jugal-chauhan Apr 17, 2025
d0a1d0e
Test readd source context with placeholder
jugal-chauhan Apr 17, 2025
7b239c5
Test reduce vpcAZCount to 2
jugal-chauhan Apr 17, 2025
9f9ef23
Test replace source context
jugal-chauhan Apr 17, 2025
be809b9
Test remove source cdk
jugal-chauhan Apr 17, 2025
69266c0
Test deploy in custom region
jugal-chauhan Apr 17, 2025
a49b3b4
Test skip source deploy
jugal-chauhan Apr 17, 2025
c6f4291
Test add region support
jugal-chauhan Apr 17, 2025
66b26b0
Test update region support
jugal-chauhan Apr 17, 2025
b2d20f5
Test reduce instance type
jugal-chauhan Apr 17, 2025
e36edd8
Test add region for script
jugal-chauhan Apr 17, 2025
aed3cc6
Add more logging for param
jugal-chauhan Apr 17, 2025
e7a5021
Fix test to use target cluster
jugal-chauhan Apr 17, 2025
ad6e6d4
Update test to use sigv4 auth
jugal-chauhan Apr 17, 2025
6f60eae
Fix services file for sigv4
jugal-chauhan Apr 17, 2025
3a0d8a1
Edit logic to fetch stack region
jugal-chauhan Apr 17, 2025
42536dc
Revert update test to use sigv4 auth
jugal-chauhan Apr 17, 2025
f080f0a
Update inc timeout
jugal-chauhan Apr 17, 2025
02948e0
Update increase timeout
jugal-chauhan Apr 17, 2025
1b6402c
Remove duplicate import
jugal-chauhan Apr 17, 2025
408d4b4
Remove connection check for test
jugal-chauhan Apr 17, 2025
3272403
Add logging to debug
jugal-chauhan Apr 17, 2025
eeee01b
Test log cat-indices
jugal-chauhan Apr 17, 2025
28e496d
Allow target snapshot if no source
jugal-chauhan Apr 18, 2025
2e10d52
CDK deploy snapshotRole
jugal-chauhan Apr 18, 2025
88def5d
Pass default snapshot role
jugal-chauhan Apr 18, 2025
18d00f5
Test add targetcluster cdk
jugal-chauhan Apr 18, 2025
44c7af9
Remove conditions for snapshot role
jugal-chauhan Apr 18, 2025
991a58d
Remove flag
jugal-chauhan Apr 18, 2025
cd9d108
Replace source_cluster with target
jugal-chauhan Apr 18, 2025
e0baaeb
Region specific S3 bucket
jugal-chauhan Apr 18, 2025
9697be5
Remove region during cleanup
jugal-chauhan Apr 18, 2025
211adb0
Add os1x support
jugal-chauhan Apr 18, 2025
0e340c9
Change deploy_region var and param name
jugal-chauhan Apr 18, 2025
0cb8208
Add snapshot region param
jugal-chauhan Apr 19, 2025
aacfcab
Add endpoint in createSnapshot
jugal-chauhan Apr 21, 2025
cedb3a5
Add broader perm in requestingRole instead
jugal-chauhan Apr 21, 2025
7569b72
Add logging for final snapshot
jugal-chauhan Apr 21, 2025
5f1c684
Limit param options for regions
jugal-chauhan Apr 21, 2025
42fb3bc
Add snapshot request timeout
jugal-chauhan Apr 22, 2025
f4c343a
Update logging
jugal-chauhan Apr 22, 2025
7477856
Changing instance type to avoid ICEing
jugal-chauhan Apr 22, 2025
9ede8da
Update master nodes to be graviton
jugal-chauhan Apr 22, 2025
8a16f56
Add strict wait for backfill stop
jugal-chauhan Apr 23, 2025
d9ff8b2
Update race condition and connection error
jugal-chauhan Apr 23, 2025
3c35f55
Fix 404 error handling
jugal-chauhan Apr 23, 2025
9a4c65e
Update instance type based on engine
jugal-chauhan Apr 23, 2025
f30d344
Avoid ICEing error on es6
jugal-chauhan Apr 23, 2025
3671ed5
Add ebs volume for c5 instance type
jugal-chauhan Apr 23, 2025
b2aecd0
Test debug RFS failure
jugal-chauhan Apr 24, 2025
110c499
Revert debug change
jugal-chauhan Apr 24, 2025
0a24f0c
Test es5 on diff region
jugal-chauhan Apr 24, 2025
2c90e95
Test os1 ICE error
jugal-chauhan Apr 24, 2025
13fc1d3
Add source version extra args
jugal-chauhan Apr 24, 2025
944f4f4
Add dynamic data node count
jugal-chauhan Apr 24, 2025
7892ba9
Add us-east-2 as deploy region option
jugal-chauhan Apr 24, 2025
2ebcee2
Add documentation for source version
jugal-chauhan Apr 25, 2025
aa1ab64
Add log4j debugs to debug pipeline
jugal-chauhan May 1, 2025
962a702
Debug remove default sigv4 from target
jugal-chauhan May 1, 2025
f9bb0d0
Target sigv4 auth from the migration cdk
jugal-chauhan May 2, 2025
ad05473
Merge branch 'main' into test-k8s-large-snapshot
jugal-chauhan May 2, 2025
ffb9a8c
Merge remote-tracking branch 'jugalupstream/main' into test-k8s-large…
jugal-chauhan May 2, 2025
85f3b35
Bring in conflicting target sigv4 auth
jugal-chauhan May 2, 2025
b1c2516
Add correct args due to merge conflicts
jugal-chauhan May 2, 2025
00b069b
Add Optional due to conflict
jugal-chauhan May 2, 2025
d94193c
Debug abs snapshot method
jugal-chauhan May 5, 2025
a638b91
Fix method declaration for jenkins test error
jugal-chauhan May 6, 2025
c5adb0c
Fix flake8 results
jugal-chauhan May 6, 2025
2ffccb5
Revert back default noAuth on target
jugal-chauhan May 7, 2025
8c4f213
Fix target cluster auth from cdk
jugal-chauhan May 8, 2025
56c0322
Remove unused S3-snapshot-schema
jugal-chauhan May 8, 2025
98aef4e
Add new method to update cluster auth
jugal-chauhan May 8, 2025
22d63db
Merge branch 'main' into test-k8s-large-snapshot
jugal-chauhan May 8, 2025
ccdae5c
Change default auth on target to be SIGv4
jugal-chauhan May 8, 2025
50fd2ad
Clean up snapshot deletion logic
jugal-chauhan May 8, 2025
b6a216a
Fix all flake8 issues
jugal-chauhan May 8, 2025
db94d0c
Fix pytest error due to timeout
jugal-chauhan May 8, 2025
ee41d25
Remove natgateway from network stack
jugal-chauhan May 8, 2025
a19d639
Refactor calc expected doc count
jugal-chauhan May 8, 2025
b380d70
Fix flake8 on transformer config
jugal-chauhan May 8, 2025
061b8cc
Enable natgateway from network stack
jugal-chauhan May 8, 2025
5449c88
Revert natgateway to 0
jugal-chauhan May 9, 2025
f586bce
Remove debug logs, add natgateway, fix test snapshot
jugal-chauhan May 9, 2025
f90e888
Add logging in pytest for cleanup
jugal-chauhan May 9, 2025
53fba36
Fix tests by adding timeout param
jugal-chauhan May 9, 2025
8d81239
Remove logging which were added to debug
jugal-chauhan May 9, 2025
761eb9d
Merge branch 'main' into test-k8s-large-snapshot
jugal-chauhan May 9, 2025
8619d90
Merge branch 'main' into test-k8s-large-snapshot
jugal-chauhan May 12, 2025
@@ -84,6 +84,12 @@ public static class Args {
             description = "The role ARN the cluster will assume to write a snapshot to S3")
         public String s3RoleArn;

+        @Parameter(
+            names = {"--s3-endpoint" },
+            required = false,
+            description = "The S3 endpoint URL to use for the S3 bucket, like: s3.us-west-2.amazonaws.com")
+        public String s3Endpoint;
+
         @Parameter(
             names = {"--index-allowlist"},
             required = false,
@@ -163,6 +169,7 @@ public void run() {
                 arguments.indexAllowlist,
                 arguments.maxSnapshotRateMBPerNode,
                 arguments.s3RoleArn,
+                arguments.s3Endpoint,
                 context
             );
         }
@@ -14,6 +14,7 @@ public class S3SnapshotCreator extends SnapshotCreator {
     private final String s3Region;
     private final Integer maxSnapshotRateMBPerNode;
     private final String snapshotRoleArn;
+    private final String s3Endpoint;

     public S3SnapshotCreator(
         String snapshotName,
@@ -24,7 +25,7 @@ public S3SnapshotCreator(
         List<String> indexAllowlist,
         IRfsContexts.ICreateSnapshotContext context
     ) {
-        this(snapshotName, snapshotRepoName, client, s3Uri, s3Region, indexAllowlist, null, null, context);
+        this(snapshotName, snapshotRepoName, client, s3Uri, s3Region, indexAllowlist, null, null, null, context);
     }

     public S3SnapshotCreator(
@@ -36,13 +37,15 @@ public S3SnapshotCreator(
         List<String> indexAllowlist,
         Integer maxSnapshotRateMBPerNode,
         String snapshotRoleArn,
+        String s3Endpoint,
         IRfsContexts.ICreateSnapshotContext context
     ) {
         super(snapshotName, snapshotRepoName, indexAllowlist, client, context);
         this.s3Uri = s3Uri;
         this.s3Region = s3Region;
         this.maxSnapshotRateMBPerNode = maxSnapshotRateMBPerNode;
         this.snapshotRoleArn = snapshotRoleArn;
+        this.s3Endpoint = s3Endpoint;
     }

     @Override
@@ -60,6 +63,10 @@ public ObjectNode getRequestBodyForRegisterRepo() {
         if (maxSnapshotRateMBPerNode != null) {
             settings.put("max_snapshot_bytes_per_sec", maxSnapshotRateMBPerNode + "mb");
         }
+
+        if (s3Endpoint != null) {
+            settings.put("endpoint", s3Endpoint);
+        }

         ObjectNode body = mapper.createObjectNode();
         body.put("type", "s3");
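The optional-setting pattern in getRequestBodyForRegisterRepo() above can be illustrated with a minimal Python sketch. The function name, parameter names, and the "bucket" placeholder are hypothetical and not part of this PR; the nesting under a "settings" key follows the standard snapshot repository registration request shape and is assumed here, since the diff does not show where the Java code attaches its settings node.

```python
from typing import Any, Dict, Optional


def build_register_repo_body(
    base_settings: Dict[str, Any],
    max_snapshot_rate_mb_per_node: Optional[int] = None,
    s3_endpoint: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a repository-registration body, adding optional settings only when set."""
    settings = dict(base_settings)  # copy so the caller's dict is not mutated
    if max_snapshot_rate_mb_per_node is not None:
        # Mirrors the Java code: the numeric rate is suffixed with "mb".
        settings["max_snapshot_bytes_per_sec"] = f"{max_snapshot_rate_mb_per_node}mb"
    if s3_endpoint is not None:
        # New in this PR: the endpoint is only sent when one was supplied.
        settings["endpoint"] = s3_endpoint
    # Assumption: settings are nested under "settings", as in the standard API.
    return {"type": "s3", "settings": settings}


body = build_register_repo_body(
    {"bucket": "example-bucket"},  # hypothetical placeholder setting
    s3_endpoint="s3.us-west-2.amazonaws.com",
)
```

Leaving the key out when no endpoint is given keeps the request identical to the pre-change behavior for callers that do not pass --s3-endpoint.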
@@ -96,7 +96,8 @@ def __init__(self, config_file: str):

         if 'snapshot' in self.config:
             self.snapshot: Snapshot = get_snapshot(self.config["snapshot"],
-                                                   source_cluster=self.source_cluster)
+                                                   source_cluster=self.source_cluster,
+                                                   target_cluster=self.target_cluster)
             logger.info(f"Snapshot initialized: {self.snapshot}")
         else:
             logger.info("No snapshot provided")
@@ -34,7 +34,7 @@ def connection_check(cluster: Cluster) -> ConnectionResult:
     caught_exception = None
     r = None
     try:
-        r = cluster.call_api(cluster_details_path, timeout=3)
+        r = cluster.call_api(cluster_details_path, timeout=20)
     except Exception as e:
         caught_exception = e
         logging.debug(f"Unable to access cluster: {cluster} with exception: {e}")
@@ -143,7 +143,7 @@ def _generate_auth_object(self) -> requests.auth.AuthBase | None:
         raise NotImplementedError(f"Auth type {self.auth_type} not implemented")

     def call_api(self, path, method: HttpMethod = HttpMethod.GET, data=None, headers=None,
-                 timeout=None, session=None, raise_error=True, **kwargs) -> requests.Response:
+                 timeout=300, session=None, raise_error=True, **kwargs) -> requests.Response:
         """
         Calls an API on the cluster.
         """
@@ -16,7 +16,12 @@ def __init__(self, command_root: str, command_args: Dict[str, Any], sensitive_fi
                  run_as_detatched: bool = False, log_file: Optional[str] = None):
         self.command_args = command_args
         self.command = [command_root]
+        if "__positional__" in command_args:

AndreKurait (Member), May 8, 2025:
Instead of __positional__, would it make sense to have this in command_root? Maybe with command_root as an array.

+            self.command.extend(command_args["__positional__"])
+
         for key, value in command_args.items():
+            if key == "__positional__":
+                continue
             self.command.append(key)
             if value is not FlagOnlyArgument:
                 if type(value) is not str:
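The reviewer's suggestion above (carrying positional arguments in command_root instead of a reserved "__positional__" key) could look roughly like the following sketch. It is not the code in this PR: build_command and FLAG_ONLY are hypothetical stand-ins for the runner's constructor logic and for FlagOnlyArgument, and the str() conversion of non-string values is an assumption, because the diff is cut off before showing how such values are handled.

```python
from typing import Any, Dict, List, Union

# Hypothetical stand-in for the FlagOnlyArgument sentinel used by the real class.
FLAG_ONLY = object()


def build_command(command_root: Union[str, List[str]],
                  command_args: Dict[str, Any]) -> List[str]:
    """Assemble an argv list; command_root may be one string or a list of tokens."""
    # A list root carries the positional tokens, so no reserved key is needed.
    command = [command_root] if isinstance(command_root, str) else list(command_root)
    for key, value in command_args.items():
        command.append(key)
        if value is not FLAG_ONLY:
            # Assumption: non-string values are converted with str().
            command.append(value if isinstance(value, str) else str(value))
    return command


argv = build_command(
    ["aws", "s3", "ls"],
    {"--region": "us-west-2", "--recursive": FLAG_ONLY},
)
```

With this shape the argument dictionary holds only real flags, at the cost of changing the type of command_root for existing callers.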
@@ -46,11 +46,17 @@ def __init__(self, supplied_backfill: str):
         super().__init__("Unsupported backfill type", supplied_backfill)


-def get_snapshot(config: Dict, source_cluster: Cluster):
+def get_snapshot(config: Dict, source_cluster: Cluster = None, target_cluster: Cluster = None):

Member:
It seems a bit strange to pass two clusters into this function given that it only operates on one. What about passing in just one cluster?

Collaborator Author:
It uses only one cluster for any operation. cluster_to_use falls back to the target cluster when the source cluster is not present.

+    # Use target_cluster as a fallback when source_cluster is not available
+    cluster_to_use = source_cluster if source_cluster is not None else target_cluster
+
+    if cluster_to_use is None:
+        raise ValueError("Either source_cluster or target_cluster must be provided for snapshot operations")
+
     if 'fs' in config:
-        return FileSystemSnapshot(config, source_cluster)
+        return FileSystemSnapshot(config, cluster_to_use)
     elif 's3' in config:
-        return S3Snapshot(config, source_cluster)
+        return S3Snapshot(config, cluster_to_use)
     logger.error(f"An unsupported snapshot type was provided: {config.keys()}")
     if len(config.keys()) > 1:
         raise UnsupportedSnapshotError(', '.join(config.keys()))
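One way to reconcile the two positions in the thread above is to resolve the fallback at the call site and keep a single-cluster factory. The sketch below is illustrative only: pick_snapshot_cluster and get_snapshot_single are hypothetical names, and the factory returns a (kind, cluster) tuple in place of the real FileSystemSnapshot and S3Snapshot classes, which are not reproduced here.

```python
from typing import Any, Dict, Optional, Tuple


def pick_snapshot_cluster(source_cluster: Optional[Any],
                          target_cluster: Optional[Any]) -> Any:
    """Prefer the source cluster; fall back to the target when no source exists."""
    cluster = source_cluster if source_cluster is not None else target_cluster
    if cluster is None:
        raise ValueError(
            "Either source_cluster or target_cluster must be provided for snapshot operations"
        )
    return cluster


def get_snapshot_single(config: Dict[str, Any], cluster: Any) -> Tuple[str, Any]:
    """Single-cluster factory; the tuple stands in for the real snapshot objects."""
    if "fs" in config:
        return ("fs", cluster)
    if "s3" in config:
        return ("s3", cluster)
    raise ValueError(f"Unsupported snapshot type: {', '.join(config.keys())}")


# The caller decides which cluster to use, then passes exactly one.
snapshot = get_snapshot_single({"s3": {}}, pick_snapshot_cluster(None, "target"))
```

This keeps the fallback rule from the PR while giving the factory the one-cluster signature the reviewer asked about.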