Add Scale Testing Pipeline for Cilium L7 & Kubernetes Network Policies #554
…date related references in configuration files
…edundant parameters
…pdating argument flags to be optional
…er2 function by removing unused parameters for improved clarity
…yml for improved command visibility
…hon script file reference
…s for consistency
…updating cloud_info parameter for consistency
modules/python/clusterloader2/netpol-scale/config/modules/cilium-envoy-measurments.yaml
…sterloader image comment in netpol-scale-testing.yml
… collect.yml to use variable for cloud_info
…um-envoy-measurments.yaml measurments typo in the file name, fixed and pushed new file.
…hours to every 8 hours
@@ -10,6 +10,8 @@ Repository Bloat Risks
*.gz
bin/
debug/
venv/
.gitignore
I don't think we want the gitignore in this file
displayName: "Every 8 hours"
branches:
  include:
    - sarathsa/cilium-l7-scale
Should this still be present?
- $(LOCATION)
engine: clusterloader2
engine_input:
  image: "ghcr.io/sanamsarath/clusterloader2:vtest" # TODO: Fix this after perf-tests PR is merged
Order of operations here?
# TODO: Remove aks once CL2 update provider name to be azure


def configure_clusterloader2(
Please add unit tests for all these new methods.
DAEMONSETS_PER_NODE = {"aws": 2, "azure": 6, "aks": 6}
CPU_CAPACITY = {"aws": 0.94, "azure": 0.87, "aks": 0.87}
You could import it from telescope/modules/python/clusterloader2/slo/slo.py (lines 14 to 23 in ea4f9d5):
DAEMONSETS_PER_NODE = {
    "aws": 2,
    "azure": 6,
    "aks": 6
}
CPU_CAPACITY = {
    "aws": 0.94,
    "azure": 0.87,
    "aks": 0.87
}
# test config
# add "s" at the end of test_duration_secs
file.write("# Test config\n")
test_duration = str(test_duration_secs) + "s"
# Test config
# add "s" at the end of test_duration_secs
file.write("# Test config\n")
test_duration = f"{test_duration_secs}s"
Duplicated
file.close()


def validate_clusterloader2(node_count=2, operation_timeout_in_minutes=10):
)


def execute_clusterloader2(
content = ""
for f in os.listdir(cl2_report_dir):
    file_path = os.path.join(cl2_report_dir, f)
    with open(file_path, "r", encoding="utf-8") as file:
Add a try/except so it can continue reading the other files if some of them fail.
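One way this suggestion could look is sketched below. It is only an illustration, not the code from the PR: the helper name read_report_files and the returned list of failures are assumptions, and only the loop over os.listdir and the open(..., encoding="utf-8") call come from the snippet above.

```python
import os


def read_report_files(report_dir):
    """Concatenate every readable file in report_dir, skipping ones that fail.

    Hypothetical helper: the name and the (content, failed) return shape are
    assumptions made for this sketch.
    """
    content = ""
    failed = []
    for name in sorted(os.listdir(report_dir)):
        file_path = os.path.join(report_dir, name)
        try:
            with open(file_path, "r", encoding="utf-8") as file:
                content += file.read()
        except (OSError, UnicodeDecodeError) as err:
            # Record the failure and keep going with the remaining files.
            failed.append((file_path, str(err)))
    return content, failed
```

Catching OSError and UnicodeDecodeError rather than a bare Exception keeps unrelated bugs visible, and returning the failed paths lets the caller decide whether a partial result is acceptable.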
    help="Number of workers per client",
)
parser_configure.add_argument(
    "--netpol_type", type=str, required=True, help="Type of network policy"
Could add the valid options
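A minimal sketch of how the valid options might be surfaced, using argparse's choices parameter together with the help text. The values in NETPOL_TYPES are hypothetical placeholders, since the thread does not say which policy types the script accepts; the subcommand name "configure" is likewise assumed from the parser_configure variable.

```python
import argparse

# Hypothetical values: the real set of supported policy types is not
# stated in this thread.
NETPOL_TYPES = ("k8s", "cnp", "ccnp")


def build_parser():
    """Build a parser whose --netpol_type flag only accepts known values."""
    parser = argparse.ArgumentParser(description="netpol scale (sketch)")
    subparsers = parser.add_subparsers(dest="command")
    parser_configure = subparsers.add_parser("configure")
    parser_configure.add_argument(
        "--netpol_type",
        type=str,
        required=True,
        choices=NETPOL_TYPES,
        help="Type of network policy. Valid options: " + ", ".join(NETPOL_TYPES),
    )
    return parser
```

With choices set, argparse rejects any other value with a usage error before the script runs, so the options are both documented and enforced in one place.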
- script: |
    run_id=$(Build.BuildId)-$(System.JobId)
    echo "Run ID: $run_id"
    echo "##vso[task.setvariable variable=RUN_ID]$run_id"
This is not required as the run id is already set here
matrix:
  azure_cilium:
    cl2_config_file: netpol-scale-config.yaml
What about the other parameters? Example: telescope/pipelines/perf-eval/CNI Benchmark/slo-servicediscovery-cilium-nodesubnet.yml (lines 28 to 40 in ea4f9d5):
matrix:
  azure_cilium:
    cpu_per_node: 4
    node_count: 1000
    node_per_step: 1000
    max_pods: 20
    repeats: 10
    scale_timeout: "15m"
    cilium_enabled: True
    network_policy: cilium
    network_dataplane: cilium
    service_test: True
    cl2_config_file: load-config.yaml
This pull request establishes a new pipeline for scale testing Cilium L7 network policies and Kubernetes network policies. It can also be configured to run feature, soak, and load tests. Currently, it handles network policies that match HTTP traffic, with plans to extend support to benchmarking and scale testing other L4 and L7 network policies.
[Copilot generated Summary]
This pull request introduces several new configurations and functionalities for Cilium and network policy scale testing in the clusterloader2 framework. The changes include adding new measurement modules, updating configuration files, and enhancing the main script to support these new features.

Key changes include:

New Measurement Modules:
cilium-envoy-measurments.yaml to collect various Cilium Envoy HTTP and memory metrics using Prometheus queries.
cilium-measurements.yaml to gather additional Cilium metrics such as queueing delay, CPU usage, and memory usage using Prometheus queries.

Configuration Updates:
netpol-scale-config.yaml to include parameters for enabling Cilium and Cilium Envoy, and to define the steps for starting and gathering measurements.

Script Enhancements:
netpol_scale.py to include functions for configuring, validating, executing, and collecting results from clusterloader2 tests. The script now supports command-line arguments for various test parameters and configurations.

Pipeline Configuration:
netpol-scale-testing.yml to define the CI/CD pipeline for network policy scale testing using clusterloader2.