Added information about Online project hibernation #4470

ahardin-rh · 2017-05-24T14:43:50Z

No description provided.

ahardin-rh · 2017-05-24T15:16:59Z

@damemi @sallyom @luciddreamz PTAL. Some questions:

1. Would the user run oc status to check on hibernation status, or are there any other user tasks related to this feature? If so, perhaps I can add more detail to https://docs.openshift.com/online/dev_guide/projects.html#check-project-status
2. Auto-idling is different, but we may also want to mention it in the docs, even though it is something that the user does not set. What does the user (customer) need to know about this? Are there rules for idling, for example, that should be documented?

Huge thanks!

damemi · 2017-05-24T15:45:40Z

@ahardin-rh oc status won't give a solid indication of hibernation, but the presence of a force-sleep pods=0 quota in the project would (we can also change the name of the quota to hibernate if that's desired)
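
As a rough illustration of that check, the sketch below scans JSON shaped like the output of `oc get resourcequota -o json` for a quota named force-sleep whose hard pods limit is 0. The function name `is_hibernating` and the sample payload are hypothetical; only the quota name and the pods=0 limit come from this thread.

```python
import json

FORCE_SLEEP_QUOTA = "force-sleep"  # quota name mentioned in this thread


def is_hibernating(quota_list_json: str) -> bool:
    """Return True if a force-sleep quota with a hard limit of 0 pods exists."""
    quota_list = json.loads(quota_list_json)
    for quota in quota_list.get("items", []):
        name = quota.get("metadata", {}).get("name")
        hard = quota.get("spec", {}).get("hard", {})
        # Quota quantities are serialized as strings, for example "0".
        if name == FORCE_SLEEP_QUOTA and str(hard.get("pods")) == "0":
            return True
    return False


# Hypothetical sample shaped like a ResourceQuota list; not real cluster output.
sample = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "compute-resources"}, "spec": {"hard": {"pods": "4"}}},
        {"metadata": {"name": "force-sleep"}, "spec": {"hard": {"pods": "0"}}},
    ],
})
print(is_hibernating(sample))
```

If the quota were renamed (for example to hibernate, as floated above), the constant would need to change accordingly.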

ahardin-rh · 2017-05-24T20:02:10Z

@damemi Okay, so the user could look in the project for the force-sleep value to see where it is at in the scale-down process? When at 0, it hibernates. Is that correct?

Also, what about question 2? What, if anything, should the user know about auto-idling? Even if there's nothing to set, how are they experiencing it and is there any guidance they need to follow?

damemi · 2017-05-24T20:28:01Z

@ahardin-rh The user will only see a force-sleep value if the project is already hibernating; the quota object is deleted when the project is awake. There isn't a way to tell how close a project is to hibernating, for example.

As for auto-idling, @sallyom can probably speak in more detail, but it basically monitors network traffic to pods in the project, and if that traffic falls below a certain threshold, every service in the project is idled. Again, there's no monitor to see whether a project is approaching this threshold (it's all handled in the controller).

ahardin-rh · 2017-05-25T20:42:04Z

@damemi Thanks. Is that something that the user would know to check? Is that something worth documenting, or no?

As mentioned in today's scrum, there's a lot of information on force-sleep/hibernation and auto-idling, but I need guidance from you and @sallyom to help me distill down what the user actually needs to know. It seems like a lot of this is behind the scenes and not focused on user action. However, I would like to know what else, if anything, impacts user experience and would be worth capturing in the docs.

damemi · 2017-05-26T15:05:29Z

I would say the user should know what triggers force-sleep, how to identify that a project is in force-sleep, and what causes it to be removed. You're right that it's very behind-the-scenes as of now, but I would probably refer to @abhgupta to find out what we want to publicly document about it.

ahardin-rh · 2017-05-30T15:08:25Z

@sallyom @abhgupta Thoughts? Thanks.

abhgupta · 2017-05-30T15:56:22Z

@ahardin-rh I haven't followed this PR/discussion very closely. I will take a look at this tomorrow (on mostly-pto today).

ahardin-rh · 2017-06-05T15:10:38Z

@abhgupta @sallyom When you get a moment, can you please guide me on what we need to add to this PR? Thanks!

ahardin-rh · 2017-06-08T15:40:22Z

@abhgupta @sallyom I'm currently blocked. Can you please let me know what else is needed?

sallyom · 2017-06-13T21:48:27Z

architecture/core_concepts/projects_and_users.adoc

+ifdef::openshift-online[]
+[[projects-hibenation]]
+=== Project Hibernation
+In {product-title} Starter, projects hibernate after 30 minutes of inactivity.


Something like this? "In {product-title} Starter, projects with 30 minutes of inactivity are placed in an idled state with resources scaled to zero.
Upon receiving network activity, project resources are un-idled (scaled back up). In addition to auto-idling, projects must hibernate 18 hours in a 72-hour period. During the hibernation, all project resources are given a hard quota of zero (they cannot be scaled up)."

@ahardin-rh @sallyom There is actually an idle_threshold: if the network activity received is less than the idle_threshold, the project will be idled. I think this might need to be mentioned in the doc.
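
A minimal sketch of that threshold check, assuming traffic is sampled per project and measured in bytes (the thread does not state the unit). The `TrafficSample` type and `should_idle` function are hypothetical names, and this is not the controller's actual code; only the 30-minute window and the idle_threshold comparison come from the discussion.

```python
from dataclasses import dataclass
from typing import Iterable

WINDOW_MINUTES = 30  # inactivity window quoted in the proposed doc text


@dataclass
class TrafficSample:
    minutes_ago: float   # age of the sample
    bytes_received: int  # traffic received by the project's pods (assumed unit)


def should_idle(samples: Iterable[TrafficSample], idle_threshold: int) -> bool:
    """Idle when cumulative traffic inside the window is below the threshold."""
    recent = sum(s.bytes_received for s in samples if s.minutes_ago <= WINDOW_MINUTES)
    return recent < idle_threshold


# Only the 5-minute-old sample falls inside the window, so 100 < 1000.
print(should_idle([TrafficSample(5, 100), TrafficSample(45, 10_000)], idle_threshold=1000))
```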

@ahardin-rh @sallyom

In both the design and the introduction, hibernation is described as 8 hours in a 24-hour period, but here it is 18 hours in 72. Has our design changed to this?

About the last sentence: from testing on devenv, the project is given an extra quota, force-sleep, which only hard-codes the pod limit to zero, not all resources (DCs, RCs, and PVCs can still be created successfully). So during hibernation, no pod is active (pods can't be scaled up or created).

@ahardin, @yasun1 is correct. During hibernation, pods are given a hard quota of 0, not 'all project resources'

@ahardin-rh @yasun1 @damemi
"In {product-title} Starter, inactive projects are placed in an idled state with resources scaled to zero. A project is considered inactive when that project's cumulative network traffic received over 30 minutes is below a configured threshold. Upon receiving any network activity, an idled project's resources are un-idled (scaled back up)."

ahardin-rh · 2017-06-15T16:46:29Z

@sallyom Thanks! This is now updated 🌟

ahardin-rh · 2017-06-15T16:49:52Z

@adellape @bmcelvee Please peer review 💟

adellape

LGTM!

adellape · 2017-06-19T17:12:59Z

architecture/core_concepts/projects_and_users.adoc

+an idled state with resources scaled to zero. Upon receiving network activity,
+project resources are un-idled (scaled back up). In addition to auto-idling,
+projects must hibernate 18 hours in a 72-hour period. During the hibernation,
+all project resources are given a hard quota of zero (they cannot be scaled up).


Numerals for the two "zeros" since there's > 10 numerals in the paragraph? Stylepedia

sallyom · 2017-06-29T15:34:06Z

architecture/core_concepts/projects_and_users.adoc

+an idled state with resources scaled to zero. Upon receiving network activity,
+project resources are un-idled (scaled back up). In addition to auto-idling,
+projects must hibernate 18 hours in a 72-hour period. During the hibernation,
+all project resources are given a hard quota of zero (they cannot be scaled up).


@ahardin-rh 'all project resources are given a...' should be changed to 'pods are given a..' my bad, sorry :)

@damemi @ahardin-rh - how about 'projects will hibernate for a configurable amount of time in a configurable time period.' or perhaps, ' projects will hibernate for a configurable amount of time in a configurable time period, currently set to 18 hours in a 72-hour period.' (if we want to give more information)
Edit: we should keep the sentence 'projects must hibernate 18 hours in a 72-hour period.' as is.

I think to be clearer something like this:

Any project exceeding 54 cumulative quota-hours of usage in a rolling 72-hour period must hibernate for the next 18 hours. Quota-hours are calculated as the larger of the percentages of terminating and non-terminating pod resource quota consumed, multiplied by the running time of those pods. For example, 2 compute pods each using half of the available memory quota for 1 hour will be counted as 1 quota-hour.

That is, if we want to be specific, since technically not all projects must hibernate 18 hours every 72 hours. The original description might just be easier to understand, though.
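
To make the arithmetic concrete, here is a small sketch of the quota-hour calculation as worded above. The `UsageInterval` type and function names are hypothetical, the intervals are assumed to already be restricted to the rolling 72-hour window, and which quota scope the example pods fall under is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable

QUOTA_HOUR_LIMIT = 54   # cumulative quota-hours allowed per rolling period
PERIOD_HOURS = 72       # rolling period length
HIBERNATION_HOURS = 18  # forced hibernation once the limit is exceeded


@dataclass
class UsageInterval:
    hours: float                     # how long the pods ran
    terminating_fraction: float      # share of terminating pod quota used (0.0 to 1.0)
    non_terminating_fraction: float  # share of non-terminating pod quota used


def quota_hours(interval: UsageInterval) -> float:
    """Larger of the two quota shares, multiplied by the running time."""
    share = max(interval.terminating_fraction, interval.non_terminating_fraction)
    return share * interval.hours


def must_hibernate(intervals: Iterable[UsageInterval]) -> bool:
    """True when usage in the (pre-filtered) rolling window exceeds the limit."""
    return sum(quota_hours(i) for i in intervals) > QUOTA_HOUR_LIMIT


# Two pods, each using half of the memory quota for 1 hour: 0.5 + 0.5 = 1.0 share.
example = UsageInterval(hours=1, terminating_fraction=0.0, non_terminating_fraction=0.5 + 0.5)
print(quota_hours(example))
```

Under this reading, a project that uses its full quota continuously reaches the limit after 54 hours, which leaves exactly 18 hours of the 72-hour period for hibernation.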

ahardin-rh · 2017-06-29T18:11:10Z

@sallyom @ychww @yasun1 Comments addressed.

yasun1 · 2017-06-30T05:40:37Z

@ahardin-rh @sallyom @damemi
For the new update, I think:

The 'For example' sentence should be more precise so that it is easier to understand: '2 compute pods' should be '2 terminating pods', and 'counted as 1 Quota-hour' can be 'counted as 1 terminating Quota-hour'.
Another thing: as a customer, I'd like to know how I will be affected if the project hibernates. I like the sentence from the design:
the replica count will be set to 0 and all individual pods will be deleted, all PVCs and PVs in the project will be left untouched.

ahardin-rh · 2017-06-30T17:38:07Z

@yasun1 Thanks. This is updated.

vikram-redhat · 2018-01-12T02:28:25Z

@sallyom @abhgupta can we now publish these docs since Starter is at 3.7?

sallyom · 2018-01-15T19:23:54Z

@ahardin-rh Getting closer, but hibernation is not deployed in clusters yet. When it is, I'll update here. There may be more to add to the docs; in particular, an explanation of what to expect when a project is 'unidled' will need to be added.

ahardin-rh · 2018-01-15T19:25:24Z

@sallyom Okay, thanks! We'll stand by.
cc @vikram-redhat

sallyom · 2018-04-26T17:24:13Z

@ahardin-rh auto-idling is now deployed to starter clusters.

sallyom · 2018-04-26T17:31:07Z

architecture/core_concepts/projects_and_users.adoc

+individual pods are deleted. All PVCs and PVs in the project are left untouched.
+After the force-sleep period is over, a project is put in an idled state, where
+the replica count is `0`, but the force-sleep resource quota is removed. Upon
+receiving network traffic, the project's replica counts will be restored to


Maybe we should note: 'If network traffic does not restore a project's replica counts, then a user may have to manually scale up the deployment.' This is because we've been seeing issues with unidling times, and with unidling in general.

sallyom · 2018-04-26T17:32:21Z

architecture/core_concepts/projects_and_users.adoc

+the replica count is `0`, but the force-sleep resource quota is removed. Upon
+receiving network traffic, the project's replica counts will be restored to
+their pre-sleep value and pods will be created.
+


Maybe note that in the web console, you'll see your deployment as 'Idled due to inactivity', whereupon you can manually scale the deployment back up.
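
For reviewers who want the sequence in one place, the following toy model walks through the transitions described in the quoted doc text: hibernation sets replicas to 0 and adds the force-sleep quota, the end of force-sleep removes the quota but leaves the project idled, and network traffic restores the pre-sleep replica count. The `Project` class and its method names are hypothetical and do not reflect how the controller is implemented.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    AWAKE = "awake"
    FORCE_SLEEP = "force-sleep"
    IDLED = "idled"


@dataclass
class Project:
    replicas: int
    state: State = State.AWAKE
    pre_sleep_replicas: int = 0
    force_sleep_quota: bool = False

    def hibernate(self) -> None:
        # Remember the replica count, scale to 0, and add the pods=0 quota.
        self.pre_sleep_replicas = self.replicas
        self.replicas = 0
        self.force_sleep_quota = True
        self.state = State.FORCE_SLEEP

    def end_force_sleep(self) -> None:
        # The quota is removed, but replicas stay at 0 until traffic arrives.
        if self.state is State.FORCE_SLEEP:
            self.force_sleep_quota = False
            self.state = State.IDLED

    def on_network_traffic(self) -> None:
        # Traffic wakes only an idled project; during force-sleep the quota blocks pods.
        if self.state is State.IDLED:
            self.replicas = self.pre_sleep_replicas
            self.state = State.AWAKE


project = Project(replicas=3)
project.hibernate()
project.on_network_traffic()  # ignored while force-sleep is in effect
project.end_force_sleep()
project.on_network_traffic()  # restores the pre-sleep replica count
print(project.state, project.replicas)
```

The manual scale-up fallback mentioned in these review comments is deliberately left out of the model.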

ahardin-rh · 2018-05-01T21:07:30Z

@abhgupta @sallyom This PR is updated to focus only on Hibernation. Idling and Pruning are now discussed separately in #8991. Please review to ensure that I got the correct details for each. Thanks!

openshift-bot · 2018-06-04T18:27:26Z

@ahardin-rh: PR needs rebase.

vikram-redhat · 2018-08-01T08:19:17Z

@abhgupta @sallyom - please take a look.

@ahardin-rh - are you able to rebase?

ahardin-rh · 2018-08-03T18:19:47Z

@abhgupta @sallyom Since this is still in motion and has been open for so long, I am going to close this PR. We can create a new one when you're ready to document this feature.

ahardin-rh added branch/dedicated branch/enterprise-3.3 branch/enterprise-3.4 branch/enterprise-3.5 branch/enterprise-3.6 branch/online labels May 24, 2017

ahardin-rh added this to the Online Next 2 milestone May 24, 2017

ahardin-rh self-assigned this May 24, 2017

sallyom reviewed Jun 13, 2017

ahardin-rh force-pushed the force-sleep-auto-idler branch from 48694d8 to fc8b332 Compare June 15, 2017 16:38

adellape reviewed Jun 19, 2017

sallyom reviewed Jun 29, 2017

ahardin-rh force-pushed the force-sleep-auto-idler branch 2 times, most recently from 0b895c5 to 9f2070f Compare June 29, 2017 17:49

ahardin-rh force-pushed the force-sleep-auto-idler branch from 9f2070f to e95a12d Compare June 30, 2017 17:35

ahardin-rh removed the branch/enterprise-3.6 label Dec 8, 2017

vikram-redhat modified the milestones: Next Release, Staging Jan 8, 2018

openshift-bot added the needs-rebase label Mar 4, 2018

sallyom reviewed Apr 26, 2018

ahardin-rh added branch/enterprise-3.10 and removed branch/enterprise-3.7 labels Apr 26, 2018

ahardin-rh force-pushed the force-sleep-auto-idler branch from 6ea8edc to 7838d7e Compare April 30, 2018 21:16

openshift-bot removed the needs-rebase label Apr 30, 2018

openshift-ci-robot added size/M and removed size/S labels Apr 30, 2018

ahardin-rh force-pushed the force-sleep-auto-idler branch from 7838d7e to fcd8c65 Compare May 1, 2018 20:55

openshift-ci-robot added size/S and removed size/M labels May 1, 2018

ahardin-rh mentioned this pull request May 1, 2018

Added Project Idling and Account Pruning sections #8991

Merged

ahardin-rh removed the branch/dedicated label May 1, 2018

Added information about Online project hibernation

fcd8c65

openshift-bot added the needs-rebase label Jun 4, 2018

ahardin-rh closed this Aug 3, 2018
