Added information about Online project hibernation #4470

Closed

ahardin-rh
Contributor

No description provided.

@ahardin-rh
Contributor Author

ahardin-rh commented May 24, 2017

@damemi @sallyom @luciddreamz PTAL. Some questions:

  1. Would the user run oc status to check on hibernation status, or are there any other user tasks related to this feature? If so, perhaps I can add more detail to https://docs.openshift.com/online/dev_guide/projects.html#check-project-status
  2. Auto-idling is different, but we may also want to mention it in the docs, even though it is something that the user does not set. What does the user (customer) need to know about this? Are there rules for idling, for example, that should be documented?

Huge thanks!

@damemi

damemi commented May 24, 2017

@ahardin-rh oc status won't give a solid indication of hibernation, but the presence of a force-sleep pods=0 quota in the project would (we can also change the name of the quota to hibernate if that's desired)

@ahardin-rh
Contributor Author

@damemi Okay, so the user could look in the project for the force-sleep value to see where it is at in the scale-down process? When at 0, it hibernates. Is that correct?

Also, what about # 2? What, if anything, should the user know about auto-idling? Even if there's nothing to set, how are they experiencing it and is there any guidance they need to follow?

@damemi

damemi commented May 24, 2017

@ahardin-rh the user will only see a force-sleep value if it's already hibernating, and the quota object is deleted when awake. there isn't a way to tell how close a project is to hibernating, for example

As for auto-idling, @sallyom can probably speak more in detail but it basically monitors network traffic to pods in the project and if it falls below a certain threshold every service in the project is idled. again, there's no monitor to see if they're approaching this threshold or not (it's all handled in the controller)
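To make the check described above concrete, here is a minimal Python sketch. It assumes the project's quotas have already been fetched as a list of ResourceQuota objects (for example, the `items` array from `oc get quota -o json`) and that the quota is still named `force-sleep`. The function name `is_hibernating` and the sample quota `compute-resources` are made up for illustration.

```python
def is_hibernating(resource_quotas):
    """Return True if a quota named 'force-sleep' sets a hard pod limit of 0.

    resource_quotas: list of dicts shaped like Kubernetes ResourceQuota objects.
    The quota name 'force-sleep' comes from this thread and may be renamed.
    """
    for quota in resource_quotas:
        name = quota.get("metadata", {}).get("name")
        hard = quota.get("spec", {}).get("hard", {})
        # Quantities are usually serialized as strings, so compare as text.
        if name == "force-sleep" and str(hard.get("pods")) == "0":
            return True
    return False


# Hypothetical sample data: an awake project has no force-sleep quota.
awake = [{"metadata": {"name": "compute-resources"},
          "spec": {"hard": {"pods": "10"}}}]
asleep = awake + [{"metadata": {"name": "force-sleep"},
                   "spec": {"hard": {"pods": "0"}}}]

print(is_hibernating(awake))   # False
print(is_hibernating(asleep))  # True
```

Because the quota object is deleted when the project wakes up, its absence only says the project is not hibernating right now; it says nothing about how close the project is to hibernating.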

@ahardin-rh
Contributor Author

@damemi Thanks. Is that something that the user would know to check? Is that something worth documenting, or no?

As mentioned in today's scrum, there's a lot of information on force-sleep/hibernation and auto-idling, but I need guidance from you and @sallyom to help me distill down what the user actually needs to know. It seems like a lot of this is behind the scenes and not focused on user action. However, I would like to know what else, if anything, impacts user experience and would be worth capturing in the docs.

@damemi

damemi commented May 26, 2017

I would say the user should know what triggers force-sleep, how to tell that their project is in force-sleep, and what causes it to be removed. You're right that it's very behind-the-scenes as of now, but I would probably refer to @abhgupta to find out what we want to publicly document about it.

@ahardin-rh
Contributor Author

@sallyom @abhgupta Thoughts? Thanks.

@abhgupta
Member

@ahardin-rh I haven't followed this PR/discussion very closely. I will take a look at this tomorrow (on mostly-pto today).

@ahardin-rh
Contributor Author

@abhgupta @sallyom When you get a moment, can you please guide me on what we need to add to this PR? Thanks!

@ahardin-rh
Contributor Author

@abhgupta @sallyom I'm currently blocked. Can you please let me know what else is needed?

ifdef::openshift-online[]
[[projects-hibenation]]
=== Project Hibernation
In {product-title} Starter, projects hibernate after 30 minutes of inactivity.

@sallyom sallyom Jun 13, 2017

Something like this? "In {product-title} Starter, projects with 30 minutes of inactivity are placed in an idled state with resources scaled to zero.
Upon receiving network activity, project resources are un-idled (scaled back up). In addition to auto-idling, projects must hibernate 18 hours in a 72-hour period. During the hibernation, all project resources are given a hard quota of zero (they cannot be scaled up)."

@ahardin-rh @sallyom There is actually an idle_threshold: if the network activity received is less than the idle_threshold, the project will be idled. I think this might need to be referred to in the doc.

@yasun1 yasun1 Jun 19, 2017

@ahardin-rh @sallyom

  1. In both the design and the introduction, hibernation is described as 8 hours in a 24-hour period, but here it is 18 hours in a 72-hour period. Has our design changed to this?
  2. About the last sentence: from testing on devenv, the project is given an extra quota, force-sleep, which only hard-codes the pod limit to zero, not all resources (DCs, RCs, and PVCs can still be created successfully). So during hibernation, no pod is active (pods cannot be scaled up or created).

@ahardin, @yasun1 is correct. During hibernation, pods are given a hard quota of 0, not 'all project resources'

@ahardin-rh @yasun1 @damemi
"In {product-title} Starter, inactive projects are placed in an idled state with resources scaled to zero. A project is considered inactive when that project's cumulative network traffic received over 30 minutes is below a configured threshold. Upon receiving any network activity, an idled project's resources are un-idled (scaled back up)."
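The inactivity rule in this wording can be sketched roughly as follows. This is only an illustration of "cumulative network traffic received over 30 minutes is below a configured threshold", not the controller's actual code. The function name, the sample format, and the threshold values are all assumptions.

```python
from datetime import datetime, timedelta


def is_inactive(samples, now, threshold_bytes, window=timedelta(minutes=30)):
    """Return True if the bytes received inside the window total less than the threshold.

    samples: iterable of (timestamp, bytes_received) pairs (hypothetical format).
    threshold_bytes: stands in for the configured idle threshold.
    """
    cutoff = now - window
    total = sum(received for ts, received in samples if cutoff < ts <= now)
    return total < threshold_bytes


now = datetime(2017, 6, 19, 12, 0)
samples = [
    (now - timedelta(minutes=45), 5000),  # outside the 30-minute window, ignored
    (now - timedelta(minutes=20), 100),
    (now - timedelta(minutes=5), 50),
]

print(is_inactive(samples, now, threshold_bytes=1000))  # True  (150 < 1000)
print(is_inactive(samples, now, threshold_bytes=100))   # False (150 >= 100)
```

As noted earlier in the thread, users have no way to observe this counter themselves, so the sketch is only meant to clarify the wording.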

@ahardin-rh ahardin-rh force-pushed the force-sleep-auto-idler branch from 48694d8 to fc8b332 Compare June 15, 2017 16:38
@ahardin-rh
Contributor Author

@sallyom Thanks! This is now updated 🌟

@ahardin-rh
Contributor Author

@adellape @bmcelvee Please peer review 💟

Contributor

@adellape adellape left a comment

LGTM!

an idled state with resources scaled to zero. Upon receiving network activity,
project resources are un-idled (scaled back up). In addition to auto-idling,
projects must hibernate 18 hours in a 72-hour period. During the hibernation,
all project resources are given a hard quota of zero (they cannot be scaled up).
Contributor

Numerals for the two "zeros" since there's > 10 numerals in the paragraph? Stylepedia

an idled state with resources scaled to zero. Upon receiving network activity,
project resources are un-idled (scaled back up). In addition to auto-idling,
projects must hibernate 18 hours in a 72-hour period. During the hibernation,
all project resources are given a hard quota of zero (they cannot be scaled up).

@ahardin-rh 'all project resources are given a...' should be changed to 'pods are given a..' my bad, sorry :)

@sallyom sallyom Jun 29, 2017

@damemi @ahardin-rh - how about 'projects will hibernate for a configurable amount of time in a configurable time period.' or perhaps, 'projects will hibernate for a configurable amount of time in a configurable time period, currently set to 18 hours in a 72-hour period.' (if we want to give more information)
edit: should keep that sentence 'projects must hibernate 18 hours in a 72-hour period.' as is

@damemi damemi Jun 29, 2017

I think to be clearer something like this:

Any project exceeding 54 cumulative quota-hours of usage in a rolling 72-hour period must hibernate for the next 18 hours. Quota-hours are calculated as the maximum between percentage of terminating and non-terminating pod resource quota consumed, multiplied by the running time of those pods. For example, 2 compute pods each using half of the available memory quota for 1 hour will be counted as 1 Quota-hour.

if we want to be specific, since technically not all projects must hibernate 18 hours every 72 hours. the original description might just be easier to understand though
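One possible reading of the quota-hour rule, as a small Python sketch. It assumes each usage period is recorded as the fraction of terminating and non-terminating pod quota consumed plus the hours run, and that only periods inside the rolling 72-hour window are passed in. The function names and the tuple format are hypothetical; the 54-hour limit is the figure quoted above.

```python
def quota_hours(terminating_fraction, non_terminating_fraction, hours_running):
    """Quota-hours for one period: the larger quota fraction times the running time."""
    return max(terminating_fraction, non_terminating_fraction) * hours_running


def must_hibernate(usage_periods, limit=54.0):
    """Return True if cumulative quota-hours in the window exceed the limit.

    usage_periods: list of (terminating_fraction, non_terminating_fraction, hours)
    tuples, already restricted to the rolling 72-hour window (an assumption).
    """
    return sum(quota_hours(*period) for period in usage_periods) > limit


# Two pods each using half of the memory quota for 1 hour: 0.5 + 0.5 = 1.0 of quota.
print(quota_hours(1.0, 0.0, 1.0))            # 1.0
print(must_hibernate([(1.0, 0.0, 54.0)]))    # False (exactly 54 does not exceed 54)
print(must_hibernate([(1.0, 0.0, 55.0)]))    # True
```

Under this reading, a project that stays at a small fraction of its quota never reaches 54 quota-hours in 72 hours, which is why not every project has to hibernate.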

@ahardin-rh ahardin-rh force-pushed the force-sleep-auto-idler branch 2 times, most recently from 0b895c5 to 9f2070f Compare June 29, 2017 17:49
@ahardin-rh
Contributor Author

@sallyom @sallyom @ychww @yasun1 Comments addressed

@yasun1

yasun1 commented Jun 30, 2017

@ahardin-rh @sallyom @damemi
For the new update, I think

  1. The 'For example' sentence should be more rigorous so that it is easier to understand, e.g. '2 compute pods' should be '2 terminating pods', and 'counted as 1 Quota-hour' can be 'counted as 1 terminating Quota-hour'.
  2. Another thing is that, as a customer, I'd like to know how I will be affected if the project hibernates. I like the sentence used in the design:
    the replica count will be set to 0 and all individual pods will be deleted, all PVCs and PVs in the project will be left untouched.

@ahardin-rh ahardin-rh force-pushed the force-sleep-auto-idler branch from 9f2070f to e95a12d Compare June 30, 2017 17:35
@ahardin-rh
Contributor Author

@yasun1 Thanks. This is updated.

@vikram-redhat vikram-redhat modified the milestones: Next Release, Staging Jan 8, 2018
@vikram-redhat
Contributor

@sallyom @abhgupta can we now publish these docs since Starter is at 3.7?

@sallyom

sallyom commented Jan 15, 2018

@ahardin-rh getting closer but hibernation is not deployed in clusters yet. When it is, I'll update here, there may be more to add for docs, explanation on what to expect when a project is 'unidled' will need to be added.

@ahardin-rh
Contributor Author

@sallyom Okay, thanks! We'll stand by.
cc @vikram-redhat

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 4, 2018
@sallyom

sallyom commented Apr 26, 2018

@ahardin-rh auto-idling is now deployed to starter clusters.

individual pods are deleted. All PVCs and PVs in the project are left untouched.
After the force-sleep period is over, a project is put in an idled state, where
the replica count is `0`, but the force-sleep resource quota is removed. Upon
receiving network traffic, the project's replica counts will be restored to

*** maybe should note: 'If network traffic does not restore a project's replica counts, then a user may have to manually scale up the deployment.' This is bc we've been seeing issues regarding unidling times, unidling in general

the replica count is `0`, but the force-sleep resource quota is removed. Upon
receiving network traffic, the project's replica counts will be restored to
their pre-sleep value and pods will be created.

Maybe note that in web-console, you'll see your deployment as 'Idled due to inactivity' whereupon you can manually scale the deployment back up.
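For reviewers trying to picture the sequence, the lifecycle described in this hunk could be modeled roughly like this. It is a simplified sketch, not the controller's logic: the `Project` class and all function names are hypothetical, and it treats the force-sleep quota as blocking any scale-up, whereas in practice only pod creation is blocked.

```python
from dataclasses import dataclass


@dataclass
class Project:
    replicas: int                    # current replica count
    pre_sleep_replicas: int = 0      # remembered so it can be restored later
    force_sleep_quota: bool = False  # True while the pods=0 quota exists


def start_hibernation(project):
    """Force-sleep begins: replicas go to 0 and the pods=0 quota is added."""
    project.pre_sleep_replicas = project.replicas
    project.replicas = 0
    project.force_sleep_quota = True


def end_hibernation(project):
    """Force-sleep ends: the quota is removed, but the project stays idled at 0."""
    project.force_sleep_quota = False


def on_network_traffic(project):
    """Un-idle on traffic; returns False while the quota still blocks pods."""
    if project.force_sleep_quota:
        return False
    if project.replicas == 0:
        project.replicas = project.pre_sleep_replicas
    return True


def manual_scale_up(project, replicas):
    """Fallback if traffic does not un-idle the project."""
    if project.force_sleep_quota:
        return False
    project.replicas = replicas
    return True


p = Project(replicas=3)
start_hibernation(p)
print(on_network_traffic(p), p.replicas)  # False 0
end_hibernation(p)
print(p.replicas)                         # 0
print(on_network_traffic(p), p.replicas)  # True 3
```

The manual path mirrors the suggestion above: if traffic does not restore the replica counts, the user scales the deployment back up themselves.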

@ahardin-rh ahardin-rh force-pushed the force-sleep-auto-idler branch from 6ea8edc to 7838d7e Compare April 30, 2018 21:16
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2018
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 30, 2018
@ahardin-rh ahardin-rh force-pushed the force-sleep-auto-idler branch from 7838d7e to fcd8c65 Compare May 1, 2018 20:55
@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 1, 2018
@ahardin-rh
Contributor Author

@abhgupta @sallyom This PR is updated to only focus on Hibernation. Idling and Pruning are now discussed separately in #8991. Please review to ensure that I got the correct details for each. Thanks!

@openshift-bot

@ahardin-rh: PR needs rebase.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 4, 2018
@vikram-redhat
Contributor

@abhgupta @sallyom - please take a look.

@ahardin-rh - are you able to rebase?

@ahardin-rh
Contributor Author

@abhgupta @sallyom Since this is still in motion and has been open for so long, I am going to close this PR. We can create a new one when you're ready to document this feature.

@ahardin-rh ahardin-rh closed this Aug 3, 2018
Labels
branch/enterprise-3.9 branch/enterprise-3.10 branch/online needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
10 participants