feat(reports): configure reports for presigned recording transfer #1102

Open: 15 commits to be merged into base: main

@andrewazores
Member

andrewazores commented Apr 25, 2025

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PRs, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your PR branch on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits: git commit -S -m "YOUR_COMMIT_MESSAGE"

Related to cryostatio/cryostat#856
Depends on #1101
Based on #1101
Depends on cryostatio/cryostat-reports#345

Description of the change:

Adjusts the report deployment configuration to account for the Cryostat change above.

Rather than Cryostat acting as a network middleman that streams JFR files out of storage and into the report generator(s), Cryostat now simply gives the report generator a presigned storage URL for the archived recording file, and the report generator retrieves the file on its own. This requires a few extra pieces of configuration on the report generator side to tell it where the storage container is. It also requires relaxing the storage network policy to allow these requests, and ensuring that the report generator correctly trusts the storage SSL/TLS cert (TODO).
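
As a rough illustration of what the report generator now receives, the sketch below pulls individual query parameters out of a presigned URL, for example to check how long the link stays valid. The helper name `presign_param` and the example URL are hypothetical; only the `X-Amz-*` parameter names come from the standard AWS Signature Version 4 query format.

```shell
# Hypothetical helper: print the value of one query parameter from a URL.
# $1 = presigned URL, $2 = parameter name (e.g. X-Amz-Expires)
presign_param() {
  query=${1#*\?}    # drop everything up to and including the first '?'
  printf '%s\n' "$query" | tr '&' '\n' | sed -n "s/^$2=//p"
}

# Example URL shape only; host, path and signature are made up.
url='https://storage.example:8333/archivedrecordings/app/test.jfr?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=60&X-Amz-Signature=abc123'

presign_param "$url" X-Amz-Expires      # validity window, in seconds
presign_param "$url" X-Amz-Algorithm    # signing algorithm
```

A short expiry means the report generator has to start its download soon after Cryostat issues the URL, which may be worth keeping in mind when reading timestamps in the logs.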

Motivation for the change:

Support the new reports presigned transfer feature, which should reduce the overall network bandwidth used for report generation and likely also decrease report generation latency.

How to manually test:

  1. Check out and build this PR. For the Reports image, build the linked Reports PR or use my pre-built one: REPORTS_IMG=quay.io/andrewazores/cryostat-reports:presign-ssl-certs-1
  2. Deploy Operator, create Cryostat CR with reportOptions: { replicas: 1 }
  3. make sample_app in a target namespace
  4. Open Cryostat UI and ensure report generation works as usual. Check Reports Pod(s) logs and ensure there are lines like Attempting to download presigned recording from ...
  5. Tear down, then set up a new test scenario with TLS disabled (no cert manager) and repeat the test. Everything should still work.
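
The deploy and verify steps above could be scripted roughly as follows. This is a sketch, not part of the PR: the function names are made up, the CR name `cryostat-sample` is illustrative, and the `operator.cryostat.io/v1beta2` API version is an assumption to be checked against the installed CRD.

```shell
# Print a minimal Cryostat CR that enables one report generator replica.
# apiVersion and metadata.name are assumptions; adjust to your install.
cryostat_cr_manifest() {
  cat <<'EOF'
apiVersion: operator.cryostat.io/v1beta2
kind: Cryostat
metadata:
  name: cryostat-sample
spec:
  reportOptions:
    replicas: 1
EOF
}

# Succeed if the log text on stdin shows a presigned download attempt.
saw_presigned_download() {
  grep -q 'Attempting to download presigned recording from'
}
```

For example, `cryostat_cr_manifest | kubectl apply -n <namespace> -f -` creates the CR, and `kubectl logs -n <namespace> <reports-pod> | saw_presigned_download` checks a Reports Pod's logs. The namespace and Pod name are placeholders.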

@andrewazores added the feat and safe-to-test labels Apr 25, 2025
@andrewazores
Member Author

/build_test

/build_test : At least one test failed ❌.
View Actions Run.

@andrewazores
Member Author

/build_test

/build_test : At least one test failed ❌.
View Actions Run.

@andrewazores
Member Author

2025-04-28 15:25:34,215 ERROR [io.cry.dis.KubeApiDiscovery] (executor-thread-8) Failed to syncronize Endpoints in namespace cryostat-operator-scorecard: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://10.96.0.1:443/api/v1/namespaces/cryostat-operator-scorecard/pods/cryostat-recording-d9456cd8b-8zhpd. Message: pods "cryostat-recording-d9456cd8b-8zhpd" is forbidden: User "system:serviceaccount:cryostat-operator-scorecard:cryostat-recording" cannot get resource "pods" in API group "" in the namespace "cryostat-operator-scorecard". Received status: Status(apiVersion=v1, code=403, details=StatusDetails(causes=[], group=null, kind=pods, name=cryostat-recording-d9456cd8b-8zhpd, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "cryostat-recording-d9456cd8b-8zhpd" is forbidden: User "system:serviceaccount:cryostat-operator-scorecard:cryostat-recording" cannot get resource "pods" in API group "" in the namespace "cryostat-operator-scoreca
			at io.fabric8.kubernetes.client.KubernetesClientException.copyAsCause(KubernetesClientException.java:238)
			at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:507)
			at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:524)
			at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleGet(OperationSupport.java:467)
			at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleGet(BaseOperation.java:792)
			at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:193)
			at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:149)
			at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:98)
			at io.cryostat.discovery.KubeDiscoveryNodeType.lambda$static$15(KubeApiDiscovery.java:661)
			at io.cryostat.discovery.KubeApiDiscovery.queryForNode(KubeApiDiscovery.java:505)
			at io.cryostat.discovery.KubeApiDiscovery.tuplesFromEndpoints(KubeApiDiscovery.java:232)
			at io.cryostat.discovery.KubeApiDiscovery.getTargetTuplesFrom(KubeApiDiscovery.java:243)
			at io.cryostat.discovery.KubeApiDiscovery.lambda$handleQueryEvent$4(KubeApiDiscovery.java:300)
			at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)

Seems like a strange failure, and I'm not sure how/why that would be connected to this PR.

@andrewazores
Member Author

/build_test

/build_test : At least one test failed ❌.
View Actions Run.

github-actions bot commented Apr 29, 2025

@andrewazores
Member Author

I'm actually not sure if that Endpoints synchronization message above is a real failure. Running the tests locally with other dependencies satisfied results in a passed scorecard suite, even with that message printed. I wonder if that's also just an artefact of the container shutdown/cleanup process.

$ export REPORTS_IMG=quay.io/andrewazores/cryostat-reports:presign-ssl-certs-1
$ export IMAGE_NAMESPACE=quay.io/andrewazores
$ export OPERATOR_VERSION=4.1.0-reports-presign-31 # yea, 31, don't judge
$ SKIP_TESTS=true make generate manifests manager oci-build bundle bundle-build  ; podman image prune -f ; podman push $IMAGE_NAMESPACE/cryostat-operator:$OPERATOR_VERSION ; podman push $IMAGE_NAMESPACE/cryostat-operator-bundle:$OPERATOR_VERSION
$ make test-scorecard 2>&1 | tee scorecard.log

@andrewazores
Member Author

scorecard.log

@ebaron
Member

ebaron left a comment

I got this when testing using the PR description:

Reports pod:

2025-05-01 18:49:51,964 INFO  [io.cry.rep.ReportResource] (executor-thread-1) Attempting to download presigned recording from https://cryostat-sample-storage.cryostat-operator-system.svc.cluster.local:8333/archivedrecordings/9Y5wflb2w0C3dysM-Wg5c3urPYFliz0u3A_SWvt5xUo=/quarkus-cryostat-agent-6f5746dc9f-c9wg2_test_20250501T184951Z.jfr?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250501T184951Z&X-Amz-SignedHeaders=host&X-Amz-Credential=cryostat/20250501/us-east-1/s3/aws4_request&X-Amz-Expires=60&X-Amz-Signature=446cf11f5fe77486003c331b27e2359907ccce0ecd12d625eab886b79cf0921f
2025-05-01 18:49:52,182 INFO  [io.qua.htt.access-log] (executor-thread-1) 10.128.0.50 - - [01/May/2025:18:49:52 +0000] "POST /remote_report HTTP/1.1" 500 72

Cryostat pod:

2025-05-01 18:49:52,185 WARN  [com.git.ben.caf.cac.LocalAsyncCache] (vert.x-eventloop-thread-2) Exception thrown during asynchronous load: java.util.concurrent.CompletionException: org.jboss.resteasy.reactive.ClientWebApplicationException: Received: 'Internal Server Error, status code 500' when invoking REST Client method: 'io.cryostat.reports.ReportSidecarService#generatePresigned'
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:874)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
	at io.smallrye.context.impl.wrappers.SlowContextualConsumer.accept(SlowContextualConsumer.java:21)
	at io.smallrye.mutiny.helpers.UniCallbackSubscriber.onFailure(UniCallbackSubscriber.java:62)
	at io.smallrye.mutiny.operators.uni.UniOperatorProcessor.onFailure(UniOperatorProcessor.java:55)
	at io.smallrye.mutiny.operators.uni.UniOperatorProcessor.onFailure(UniOperatorProcessor.java:55)
	at io.smallrye.mutiny.operators.uni.UniOnItemConsume$UniOnItemComsumeProcessor.onFailure(UniOnItemConsume.java:65)
	at io.smallrye.mutiny.operators.uni.UniOnFailureTransform$UniOnFailureTransformProcessor.onFailure(UniOnFailureTransform.java:64)
	at io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage$CompletionStageUniSubscription.forwardResult(UniCreateFromCompletionStage.java:58)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
	at org.jboss.resteasy.reactive.client.impl.RestClientRequestContext.handleUnrecoverableError(RestClientRequestContext.java:378)
	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.handleException(AbstractResteasyReactiveContext.java:329)
	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:175)
	at org.jboss.resteasy.reactive.client.impl.RestClientRequestContext$1.lambda$execute$0(RestClientRequestContext.java:324)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:270)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:252)
	at io.vertx.core.impl.ContextInternal.lambda$runOnContext$0(ContextInternal.java:50)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:405)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.jboss.resteasy.reactive.ClientWebApplicationException: Received: 'Internal Server Error, status code 500' when invoking REST Client method: 'io.cryostat.reports.ReportSidecarService#generatePresigned'
	at org.jboss.resteasy.reactive.client.impl.RestClientRequestContext.unwrapException(RestClientRequestContext.java:205)
	... 14 more
Caused by: jakarta.ws.rs.WebApplicationException: Internal Server Error, status code 500
	at io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
	at io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage.subscribe(UniCreateFromCompletionStage.java:24)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniOnFailureTransform.subscribe(UniOnFailureTransform.java:31)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniOnItemConsume.subscribe(UniOnItemConsume.java:30)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniOnItemTransformToUni$UniOnItemTransformToUniProcessor.performInnerSubscription(UniOnItemTransformToUni.java:81)
	at io.smallrye.mutiny.operators.uni.UniOnItemTransformToUni$UniOnItemTransformToUniProcessor.onItem(UniOnItemTransformToUni.java:57)
	at io.smallrye.mutiny.operators.uni.builders.UniCreateFromItemSupplier.subscribe(UniCreateFromItemSupplier.java:29)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniOnItemTransformToUni.subscribe(UniOnItemTransformToUni.java:25)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniOnItemTransform.subscribe(UniOnItemTransform.java:22)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.groups.UniSubscribe.withSubscriber(UniSubscribe.java:51)
	at io.smallrye.mutiny.groups.UniSubscribe.with(UniSubscribe.java:110)
	at io.smallrye.mutiny.operators.uni.UniSubscribeToCompletionStage.subscribe(UniSubscribeToCompletionStage.java:30)
	at io.smallrye.mutiny.groups.UniSubscribe.asCompletionStage(UniSubscribe.java:174)
	at io.smallrye.mutiny.groups.UniSubscribe.asCompletionStage(UniSubscribe.java:162)
	at io.smallrye.mutiny.Uni.subscribeAsCompletionStage(Uni.java:141)
	at io.quarkus.cache.runtime.caffeine.CaffeineCacheImpl$4$1.apply(CaffeineCacheImpl.java:122)
	at io.quarkus.cache.runtime.caffeine.CaffeineCacheImpl$4$1.apply(CaffeineCacheImpl.java:116)
	at com.github.benmanes.caffeine.cache.LocalAsyncCache$AsyncAsMapView.lambda$computeIfAbsent$0(LocalAsyncCache.java:366)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2677)
	at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2675)
	at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2658)
	at com.github.benmanes.caffeine.cache.LocalAsyncCache$AsyncAsMapView.computeIfAbsent(LocalAsyncCache.java:365)
	at com.github.benmanes.caffeine.cache.LocalAsyncCache$AsyncAsMapView.computeIfAbsent(LocalAsyncCache.java:293)
	at io.quarkus.cache.runtime.caffeine.CaffeineCacheImpl$4.get(CaffeineCacheImpl.java:115)
	at io.quarkus.cache.runtime.caffeine.CaffeineCacheImpl$4.get(CaffeineCacheImpl.java:109)
	at io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
	at io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage.subscribe(UniCreateFromCompletionStage.java:24)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniOnItemTransform.subscribe(UniOnItemTransform.java:22)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniEmitOn.subscribe(UniEmitOn.java:22)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniOnItemConsume.subscribe(UniOnItemConsume.java:30)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniFailOnTimeout.subscribe(UniFailOnTimeout.java:36)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniOnItemConsume.subscribe(UniOnItemConsume.java:30)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.operators.uni.UniOnTermination.subscribe(UniOnTermination.java:21)
	at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
	at io.smallrye.mutiny.groups.UniSubscribe.withSubscriber(UniSubscribe.java:51)
	at io.smallrye.mutiny.groups.UniSubscribe.with(UniSubscribe.java:110)
	at io.smallrye.mutiny.operators.uni.UniSubscribeToCompletionStage.subscribe(UniSubscribeToCompletionStage.java:30)
	at io.smallrye.mutiny.groups.UniSubscribe.asCompletionStage(UniSubscribe.java:174)
	at io.smallrye.mutiny.groups.UniSubscribe.asCompletionStage(UniSubscribe.java:162)
	at io.smallrye.mutiny.Uni.subscribeAsCompletionStage(Uni.java:141)
	at io.cryostat.recordings.LongRunningRequestGenerator_onMessage_Invoker_izeA4IbSRjunhFJnQw1q0NJka-4.invoke(Unknown Source)
	at io.cryostat.recordings.LongRunningRequestGenerator_onMessage_LazyInvoker_izeA4IbSRjunhFJnQw1q0NJka-4.invoke(Unknown Source)
	at io.quarkus.vertx.runtime.EventConsumerInvoker.invokeBean(EventConsumerInvoker.java:79)
	at io.quarkus.vertx.runtime.EventConsumerInvoker.invoke(EventConsumerInvoker.java:51)
	at io.quarkus.vertx.runtime.VertxEventBusConsumerRecorder$3$1$2.call(VertxEventBusConsumerRecorder.java:158)
	at io.quarkus.vertx.runtime.VertxEventBusConsumerRecorder$3$1$2.call(VertxEventBusConsumerRecorder.java:154)
	at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$4(ContextImpl.java:192)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:270)
	at io.vertx.core.impl.ContextImpl$1.execute(ContextImpl.java:221)
	at io.vertx.core.impl.WorkerTask.run(WorkerTask.java:56)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:635)
	at org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2516)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2495)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1521)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:11)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:11)
	... 2 more

@andrewazores
Member Author

Was that with the PR body description or the steps in the most recent comment?

@ebaron
Member

ebaron commented May 1, 2025

Was that with the PR body description or the steps in the most recent comment?

It was the body description

@ebaron
Member

ebaron commented May 1, 2025

Oh whoops, forgot to override REPORTS_IMG. Is that still needed, or has that change been merged?

@andrewazores
Member Author

That change hasn't been merged yet on the -reports side.

ebaron previously approved these changes May 1, 2025
@andrewazores
Member Author

/build_test

github-actions bot commented May 2, 2025

/build_test completed successfully ✅.
View Actions Run.

Labels
dependent, feat, safe-to-test
2 participants