
Testing the new NVCF feature enable-gateway-timeout#2576

Closed
sacpis wants to merge 2 commits into NVIDIA:main from sacpis:add_enable_gateway_timeout_header


@sacpis
Collaborator

@sacpis sacpis commented Feb 2, 2025

Testing the new NVCF feature enable-gateway-timeout.

Currently, when a client invokes an API, the job is placed in a queue on the server side, and that queue has a timeout. If no worker picks up the job within the queue timeout, the client still receives an HTTP 202 (Accepted) response, even though no worker has picked up the job.

The new feature returns an HTTP response that correctly tells the client what happened to its request. If no worker picks up the job within the queue timeout, the client receives an HTTP 504 (Gateway Timeout) response, indicating that no worker picked up the job within the configured queue timeout. In that case, the client can resend the request, or a retry mechanism can resend it automatically.
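The two behaviors can be modeled with a short C++ sketch. It is illustrative only: `QueuedJob`, `gatewayStatus`, and `submitWithRetry` are hypothetical names, not NVCF or CUDA-Q APIs, and the retry helper is just one possible shape for the retry mechanism mentioned above.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>

using Clock = std::chrono::steady_clock;

// Hypothetical model of a queued job: when it entered the queue and, if a
// worker took it, when that happened.
struct QueuedJob {
  Clock::time_point enqueuedAt;
  std::optional<Clock::time_point> pickedUpAt;
};

// Status the client sees once the queue timeout has been evaluated.
// Without the feature the client always gets 202 Accepted; with it, a job
// that no worker picked up in time yields 504 Gateway Timeout.
int gatewayStatus(const QueuedJob &job, Clock::duration queueTimeout,
                  bool gatewayTimeoutEnabled) {
  const bool pickedUpInTime =
      job.pickedUpAt.has_value() &&
      *job.pickedUpAt - job.enqueuedAt <= queueTimeout;
  if (!pickedUpInTime && gatewayTimeoutEnabled)
    return 504;
  return 202;
}

// Hypothetical client-side retry: resubmit while the gateway answers 504,
// up to maxAttempts submissions in total. `submit` sends the request and
// returns the HTTP status code.
int submitWithRetry(const std::function<int()> &submit, int maxAttempts) {
  int status = 0;
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    status = submit();
    if (status != 504)
      break; // accepted, or another outcome the caller must handle
  }
  return status;
}
```

For example, a job enqueued at `t0` and picked up at `t0 + 90s` with a 60 s queue timeout yields 202 with the flag off and 504 with it on. A production retry loop would likely also wait with a backoff between attempts, which this sketch omits.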

With the new NVCF feature, an HTTP 202 response is sent back to the client once a worker picks up the job within the queue timeout. The client then needs to poll the server for the result. The poll interval can be set with the NVCF-POLL-SECONDS request header, to a value between 1 minute (the default) and 20 minutes (the maximum). For long-running jobs, a longer polling value is recommended.
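A minimal sketch of the client side, assuming a C++ client that keeps request headers in a `std::map`. The header names are the ones used in this PR, and clamping to 60-1200 seconds restates the range given above; `nvcfHeaders`, `pollUntilDone`, and the `pollOnce` callback are hypothetical. It also assumes a poll returns 202 while the job is still running, which should be checked against the NVCF documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Request headers as described above. The 60-1200 second bounds restate the
// 1-minute default and 20-minute maximum from the PR description.
std::map<std::string, std::string> nvcfHeaders(int pollSeconds) {
  const int clamped = std::clamp(pollSeconds, 60, 1200);
  return {{"nvcf-feature-enable-gateway-timeout", "true"},
          {"NVCF-POLL-SECONDS", std::to_string(clamped)}};
}

// Poll until the job finishes or maxPolls requests have been made.
// `pollOnce` is a hypothetical callback that issues one poll request (which
// the server may hold open for up to NVCF-POLL-SECONDS) and returns the HTTP
// status. Assumption: 202 means "still running"; any other status is final.
int pollUntilDone(const std::function<int()> &pollOnce, int maxPolls) {
  int status = 202;
  for (int i = 0; i < maxPolls && status == 202; ++i)
    status = pollOnce();
  return status;
}
```

A longer NVCF-POLL-SECONDS value means fewer round trips for a long-running job, since each poll can wait longer for the result before returning.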

Please refer to the NVCF documentation here.

Signed-off-by: Sachin Pisal <spisal@nvidia.com>
@github-actions

github-actions bot commented Feb 2, 2025

CUDA Quantum Docs Bot: A preview of the documentation can be found here.

github-actions bot pushed a commit that referenced this pull request Feb 2, 2025
@sacpis
Collaborator Author

sacpis commented Feb 2, 2025

I am planning to test this using long-running examples with a larger number of shots. These will keep the workers busy, so incoming jobs will wait in the queue. Once a queued job exceeds the queue timeout, the client should receive a 504.
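The expected outcome of this test plan can be reasoned about with a small simulation. This is a hypothetical model, not part of the PR: each worker is busy until a known time (in whole seconds), a new job is taken by the first worker to become free, and the gateway-timeout feature is assumed to be enabled.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical simulation of the test plan (times in whole seconds).
// busyUntil[i] is when worker i finishes its current long-running job; a job
// submitted at `submitAt` is picked up by the first worker to become free.
// Returns the status the client would see with the feature enabled.
int expectedStatus(const std::vector<long> &busyUntil, long submitAt,
                   long queueTimeout) {
  if (busyUntil.empty())
    return 504; // no worker can ever pick the job up
  const long firstFree = *std::min_element(busyUntil.begin(), busyUntil.end());
  const long pickup = std::max(submitAt, firstFree);
  return pickup - submitAt <= queueTimeout ? 202 : 504;
}
```

With every worker occupied by a many-shot job that outlasts the queue timeout, the simulation predicts 504, which is the case the test aims to trigger.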

Please let me know if you have any other thoughts for testing this.

@sacpis sacpis changed the title from [WIP] Testing the new NVCF feature enable-gateway-timeout to Testing the new NVCF feature enable-gateway-timeout Feb 4, 2025
@sacpis sacpis marked this pull request as ready for review February 4, 2025 19:33
@github-actions

github-actions bot commented Feb 4, 2025

CUDA Quantum Docs Bot: A preview of the documentation can be found here.

github-actions bot pushed a commit that referenced this pull request Feb 4, 2025
```cpp
{"nvcf-feature-enable-gateway-timeout", "true"},
// The max timeout for the polling response is 20 minutes
// https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/api.html#http-polling
{"NVCF-POLL-SECONDS", "1200"}};
```
Collaborator


What exactly is this doing? I think the prior behavior was that the user would get some sort of "heartbeat" polling message approximately every 5 seconds. Is that going away? Do they have to wait 1200 seconds for that heartbeat now? Does it still run correctly if their job takes ~1 hour?

@sacpis
Collaborator Author

sacpis commented Dec 13, 2025

Closing this PR as we are not supporting NVQC anymore.

@sacpis sacpis closed this Dec 13, 2025
github-actions bot pushed a commit that referenced this pull request Dec 13, 2025

Labels

None yet

Projects

None yet


2 participants