cc.api_post_start_healthcheck_timeout_in_seconds doesn't seem to matter

## Issue

The `cc.api_post_start_healthcheck_timeout_in_seconds` field doesn't seem to have any effect.


## Context

The description of the `cc.api_post_start_healthcheck_timeout_in_seconds` config field implies that post-start will fail if CCNG fails to report healthy within that time frame:

https://github.com/cloudfoundry/capi-release/blob/6f0f64cd6738fa260294c00315a58658b052f121/jobs/cloud_controller_ng/spec#L349-L351
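To make that expectation concrete, the behavior the property description implies can be sketched as a polling loop with a deadline. This is only an illustration in Python, not the actual logic in `post-start.sh.erb`; `wait_until_healthy`, `is_healthy`, and the other parameter names are hypothetical and chosen here for clarity.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout_seconds: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_healthy until it returns True or timeout_seconds elapse.

    Returns True if the service became healthy in time, False otherwise.
    clock and sleep are injectable so the loop can be exercised without
    actually waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            # Hypothetical equivalent of post-start exiting non-zero.
            return False
        sleep(poll_interval)
```

Under this reading, a CCNG that never becomes healthy should cause a failure roughly `timeout_seconds` after post-start begins, regardless of how long the process itself sleeps.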

However, we pushed a change to CCNG that sleeps for 10 hours before starting the Thin server (in [the runner](https://github.com/cloudfoundry/cloud_controller_ng/blob/main/lib%2Fcloud_controller%2Frunner.rb)), and this timeout was not triggered. Instead, 20 minutes passed before the deploy failed, reporting both `ccng_monit_http_healthcheck` and `nginx_cc` as failed jobs.

```
Task 277 | 22:03:30 | L starting jobs: api/abf448d3-74e1-4086-80c2-130814373e14 (0) (canary)
Task 277 | 22:03:57 | Updating instance scheduler: scheduler/7d611e5a-1ada-41a4-b811-49b5dbcb2b2f (0) (canary) (00:02:09)
Task 277 | 22:23:32 | Updating instance api: api/abf448d3-74e1-4086-80c2-130814373e14 (0) (canary) (00:21:44)
                    L Error: 'api/abf448d3-74e1-4086-80c2-130814373e14 (0)' is not running after update. Review logs for failed jobs: ccng_monit_http_healthcheck, nginx_cc
Task 277 | 22:23:32 | Error: 'api/abf448d3-74e1-4086-80c2-130814373e14 (0)' is not running after update. Review logs for failed jobs: ccng_monit_http_healthcheck, nginx_cc
```
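As a quick sanity check on the timing, the gap between the `starting jobs` line and the error line in the log above can be computed directly. The sketch below assumes the spec default of 60 seconds was in effect for the timeout; the variable names are illustrative only.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps copied from the task log above.
started = datetime.strptime("22:03:30", FMT)  # "starting jobs: api/..."
failed = datetime.strptime("22:23:32", FMT)   # "is not running after update"

elapsed = failed - started

# Assumption: the spec default (60) was the configured timeout.
configured_timeout_seconds = 60

print(elapsed)  # 0:20:02
print(elapsed.total_seconds() > configured_timeout_seconds)  # True
```

The failure arrived about 20 minutes after the jobs started, well past a 60-second timeout, which is consistent with the post-start check not being what ended the deploy.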

So this check in the post-start script:

https://github.com/cloudfoundry/capi-release/blob/6f0f64cd6738fa260294c00315a58658b052f121/jobs/cloud_controller_ng/templates/post-start.sh.erb#L71

no longer seems to have any effect.

However, when we try this experiment on older versions of CAPI, we observe that the deploy fails in the post-start script after the configured time.

We believe this regression was introduced in this PR: https://github.com/cloudfoundry/capi-release/pull/195/files, which reconfigured the monit dependencies between the processes.

What we are not clear on is whether this regression reintroduces the issue that prompted the post-start check in the first place: https://github.com/cloudfoundry/capi-release/issues/125.

## Steps to Reproduce

Add a long sleep at this line: https://github.com/cloudfoundry/cloud_controller_ng/blob/main/lib%2Fcloud_controller%2Frunner.rb#L88, then deploy.

## Expected result

The post-start script fails after the configured `cc.api_post_start_healthcheck_timeout_in_seconds` value.

## Current result

The deploy fails after 20 minutes, with `ccng_monit_http_healthcheck` and `nginx_cc` reported as failed jobs.

## Possible Fix

Not sure. Is this something we need to address?


Spec excerpt referenced in Context:

	cc.api_post_start_healthcheck_timeout_in_seconds:
	  default: 60
	  description: "Maximum time (in seconds) for cloud_controller_ng to report healthy"
