5 changes: 5 additions & 0 deletions .github/workflows/ci.yaml
@@ -14,6 +14,11 @@ concurrency:
permissions:
  contents: read
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v6
    - uses: rvben/rumdl@v0
  ci:
    runs-on: ubuntu-latest
    steps:
23 changes: 23 additions & 0 deletions .rumdl.toml
@@ -0,0 +1,23 @@
[global]
# https://rumdl.dev/global-settings/#toml-configuration-rumdltoml
disable = [
"first-line-h1", # Disabled as these pages are embedded into manageiq.org
"line-length",

"descriptive-link-text", # TODO: Give each link proper link text
"no-alt-text", # TODO: Give each image Alt text
"no-emphasis-as-heading" # TODO: rumdl is inaccurately autofixing some of these -
# Need to determine if they are emphasis or headings
]
flavor = "gfm"

[code-block-style]
style = "fenced"

[code-fence-style]
style = "backtick"

[no-inline-html]
allowed-elements = ["details", "summary"]
fix = true
table-allowed-elements = ["br"]
14 changes: 7 additions & 7 deletions README.md
@@ -3,39 +3,39 @@
[![CI](https://github.com/ManageIQ/guides/actions/workflows/ci.yaml/badge.svg)](https://github.com/ManageIQ/guides/actions/workflows/ci.yaml)

### [Setup](developer_setup.md)

* [Development Appliance Setup](https://github.com/ManageIQ/manageiq-appliance-dev-setup)
* [Development using a Vagrant VM](vagrant_developer_vm.md)
* [Plugin development](developer_setup/plugins.md) aka working with the split `manageiq*` repositories
* [Provider development guide](providers/dev-guide.md)
* Provider setup instructions
- [Amazon AWS](providers/amazon_aws_config.md)
- [Openshift](providers/openshift.md)
- [Openstack Infra](providers/openstack_infra_provider.md)
- [Interactive debugging with Pry-Remote](developer_setup/debugging.md)
* [Running in minimal mode](developer_setup/minimal_mode.md)
* [Amazon AWS](providers/amazon_aws_config.md)
* [Openshift](providers/openshift.md)
* [Openstack Infra](providers/openstack_infra_provider.md)
* [Interactive debugging with Pry-Remote](developer_setup/debugging.md)
* [Running the test suites](developer_setup/running_test_suites.md)
* [Running Cypress tests](ui/cypress.md)
* [Setting up Kubernetes for use with ManageIQ](providers/kubernetes.md)
* [Testing logical replication with migrations](logical_replication_migrations.md)

### Developer Guidelines

* [Backport Process](backport_process.md)
* [Coding Style and Standards](coding_style_and_standards.md)
* [Contributing to the API](https://github.com/ManageIQ/manageiq-api/blob/master/CONTRIBUTING.md)
* [External Authentication (httpd)](external_auth.md)
* [GIT Helpers](git_utils/README.md)
* [Issue and PR Triage Process](triage_process.md)
* [Internationalization Guidelines](i18n.md)
* [Merger Guidelines](mergers_guidelines.md)
* [Project Roadmap](https://github.com/orgs/ManageIQ/projects/13)
* [Repository Labels and Colors](labels.md)
* [Reviewer Guidelines](reviewers_guidelines.md)
* [Sprint Boundaries](sprint_boundaries.md)
* [UI Patterns](ui/patterns.md)
* [UI Plugins](ui/plugins.md)
* [Updating this Documentation](writing_guides.md)

### Technical documentation

* [Architecture](architecture.md)
* [Opening custom URLs via Custom Buttons and Automate](automate_url_open.md)
* [Report data API](ui/report_data_api.md)
23 changes: 12 additions & 11 deletions architecture/capacity_and_utilization_collection_explanation.md
@@ -8,20 +8,22 @@
1. A processor worker will pick up one of these rollup work items and queue up ANOTHER rollup for the next stage of the rollup chain.
1. This continues until we hit the end of the chain.

After enabling the Capacity & Utilization Collector Role, data collection begins immediately. However, the first collection begins 5 minutes after the CFME Server is started, and every 10 minutes after that. Therefore, the longest the collection will take after enabling the Capacity & Utilization Collector CFME Server Role is 10 minutes. The first collection from a particular management system may take a few minutes since CFME is gathering data points going one month back in time.
After enabling the Capacity & Utilization Collector Role, data collection begins immediately. However, the first collection begins 5 minutes after the CFME Server is started, and every 10 minutes after that. Therefore, the longest the collection will take after enabling the Capacity & Utilization Collector CFME Server Role is 10 minutes. The first collection from a particular management system may take a few minutes since CFME is gathering data points going one month back in time.

## Rollups

There are two types of rollups, **time-based** and **infrastructure-based**.

* **Time-based** rollups go from realtime → hourly → daily.
* **Infrastructure-based** rollups for hourly go from `Vm` → `Host` → `EmsCluster` → `ExtManagementSystem` → `MiqRegion` (and maybe to `MiqEnterprise`?).
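
The two kinds of rollup can be combined into a single "what gets queued next" rule. The following is a minimal sketch in plain Ruby, not ManageIQ's actual implementation: the `PARENT` table and the `next_rollups` helper are hypothetical names introduced for illustration, and the chain stops at `MiqRegion` because the text leaves the `MiqEnterprise` step open.

```ruby
# Hypothetical parent lookup for the infrastructure-based chain listed above.
PARENT = {
  "Vm"                  => "Host",
  "Host"                => "EmsCluster",
  "EmsCluster"          => "ExtManagementSystem",
  "ExtManagementSystem" => "MiqRegion"
}.freeze

# Given a rollup stage that just finished, return the stages to queue next
# as [class_name, interval] pairs.
def next_rollups(class_name, interval)
  case interval
  when "realtime"
    # Time-based: realtime data feeds the hourly rollup of the same object.
    [[class_name, "hourly"]]
  when "hourly"
    # Time-based: hourly feeds daily for the same object.
    following = [[class_name, "daily"]]
    # Infrastructure-based: hourly also feeds the parent's hourly rollup.
    parent = PARENT[class_name]
    following << [parent, "hourly"] if parent
    following
  else
    # Daily is the end of the time-based chain.
    []
  end
end

next_rollups("Vm", "hourly") # => [["Vm", "daily"], ["Host", "hourly"]]
```

Keeping the parent relationship in a lookup table makes it easy to see that only hourly rollups fan out in two directions.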

### Example

Say we do a capture on a VM and we get back data with timestamps between 4:05 and 4:15. This would cause records to be written in the metrics for each interval. Then, it would put a rollup on the queue for that VM for the 4:00 hour *(time-based)*. A processor worker will pick up that queue item, gather all of the real-time records for that VM for the 4:00 hour, and write a rollup hourly record for that VM. Then it will queue up 2 more rollups. One rollup is for the parent Host of that VM for the 4:00 hour *(infrastructure-based)*, and another is for that VM for the day *(time-based)*.
Say we do a capture on a VM and we get back data with timestamps between 4:05 and 4:15. This would cause records to be written in the metrics for each interval. Then, it would put a rollup on the queue for that VM for the 4:00 hour *(time-based)*. A processor worker will pick up that queue item, gather all of the real-time records for that VM for the 4:00 hour, and write a rollup hourly record for that VM. Then it will queue up 2 more rollups. One rollup is for the parent Host of that VM for the 4:00 hour *(infrastructure-based)*, and another is for that VM for the day *(time-based)*.

Below is the full tree of rollups that will occur:

~~~
```text
Vm (realtime collected)
Vm (hourly)
Vm (daily)
@@ -33,22 +35,22 @@ Vm (realtime collected)
ExtManagementSystem (daily)
MiqRegion (hourly)
MiqRegion (daily)
~~~

```

That is the simplest description. In reality, there are some nuances that should be mentioned.

* **We collect data that spans hours** (e.g. we collect 3:50-4:15), so a separate rollup is put on the queue for each hour in question, and the chain begins separately for each.

* **Rollups are queued up for a particular hour or day, but don't actually execute until the end of that time period**. As captures queue up rollups for the same parameters (e.g. for Vm:1 for the 4:00 hour), they are merged on the queue, so you only end up with one rollup record. If a rollup is being executed and new data comes in for that hour (e.g. from a gap collection, or collection falls behind), then a new queue item will be placed for those same parameters. This is ok as the rollup code recognizes when it has to update existing records.
* **Rollups are queued up for a particular hour or day, but don't actually execute until the end of that time period**. As captures queue up rollups for the same parameters (e.g. for Vm:1 for the 4:00 hour), they are merged on the queue, so you only end up with one rollup record. If a rollup is being executed and new data comes in for that hour (e.g. from a gap collection, or collection falls behind), then a new queue item will be placed for those same parameters. This is ok as the rollup code recognizes when it has to update existing records.

* **The definition of a day is actually more complex**. A day is a different range of hours depending on your time zone. We accomplish managing this with Time Profiles. Time Profiles represent a time zone as well as a set of hours and days that are considered valid. This way a customer can create a Time Profile that represents an entire time zone or just their business hours in a particular time zone (e.g. Eastern Time 9-5 M-F). Part of setting up a time profile is to choose whether or not it will participate in daily rollups. UTC is provided OOtB with daily rollups enabled. Therefore, as far as daily rollups are concerned, the truth is that a daily rollup is queued for each Time Profile that has daily rollups enabled.
* **The definition of a day is actually more complex**. A day is a different range of hours depending on your time zone. We accomplish managing this with Time Profiles. Time Profiles represent a time zone as well as a set of hours and days that are considered valid. This way a customer can create a Time Profile that represents an entire time zone or just their business hours in a particular time zone (e.g. Eastern Time 9-5 M-F). Part of setting up a time profile is to choose whether or not it will participate in daily rollups. UTC is provided OOtB with daily rollups enabled. Therefore, as far as daily rollups are concerned, the truth is that a daily rollup is queued for each Time Profile that has daily rollups enabled.

* **Storages are slightly different. Their information is collected from our storage scans**, so if you've never scanned a Storage, you won't get any data. Storages roll up directly to their EMS, I think. Also, they are run on a different schedule.
* **Storages are slightly different. Their information is collected from our storage scans**, so if you've never scanned a Storage, you won't get any data. Storages roll up directly to their EMS, I think. Also, they are run on a different schedule.

* **Cloud rollups** *(coming soon)* will go from `Vm` → `Availability Zone` → `ExtManagementSystem` → `MiqRegion`.
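
The first two nuances (one rollup per hour spanned, and identical rollups merging on the queue) can be illustrated with a small sketch. This is an assumption-laden toy, not the real queue: `hour_bucket` and `queue_hourly_rollups` are hypothetical helpers, a Ruby `Set` stands in for the queue, and the dates are arbitrary.

```ruby
require "set"

# Truncate a timestamp to the start of its hour (in UTC).
def hour_bucket(time)
  t = time.getutc
  Time.utc(t.year, t.month, t.day, t.hour)
end

# Hypothetical stand-in for the queue: a Set keeps a single entry per
# [resource, hour] pair, mimicking how identical rollups are merged.
def queue_hourly_rollups(queue, resource, timestamps)
  timestamps.each { |ts| queue << [resource, hour_bucket(ts)] }
  queue
end

queue = Set.new

# A capture spanning 3:50-4:15 touches two hours, so two rollups are queued.
capture = [
  Time.utc(2024, 1, 1, 3, 50),
  Time.utc(2024, 1, 1, 4, 5),
  Time.utc(2024, 1, 1, 4, 15)
]
queue_hourly_rollups(queue, "Vm:1", capture)

# A later capture for the same hour merges with the existing 4:00 entry.
queue_hourly_rollups(queue, "Vm:1", [Time.utc(2024, 1, 1, 4, 25)])

queue.size # => 2
```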

## Notes on Testing Rollups

* **Rollups are automatic behind the scenes and are triggered by a collection**. Therefore, if you try to manually inject data, you are not really running a collection and thus won't get rollups.

* **A capture of a Vm can be kicked off in a rails console with `vm.perf_capture("realtime")`**. The rollups on the queue can be executed without a worker by just delivering them from the queue
@@ -58,9 +60,8 @@ That is the simplest description. In reality, there are some nuances that should
q.delivered(*q.deliver)
```

If you want to fake creating rollups, you can just do `vm.perf_rollup_to_parent("realtime", start_time, end_time)`, which queues them up and starts the chain.
If you want to fake creating rollups, you can just do `vm.perf_rollup_to_parent("realtime", start_time, end_time)`, which queues them up and starts the chain.

* **Database tables are not ordered sets of data, so if you did a straight query they are not guaranteed to appear in any particular order.** In addition, due to the nature of multiple workers, data may get written in different orders, especially if records have to be updated. It may be helpful to order by timestamp and filter against `resource_type`, `resource_id`, `capture_interval_name`.


## TODO: Notes on why we use Postgres inheritance, and why metrics and metrics_rollups are in separate tables.
## TODO: Notes on why we use Postgres inheritance, and why metrics and metrics_rollups are in separate tables
43 changes: 22 additions & 21 deletions architecture/enterprise.md
@@ -1,8 +1,8 @@
# Enterprise Architecture

To facilitate scaling it is possible to use multiple ManageIQ appliances each
performing different roles. To understand how this is accomplished some new
terms and concepts must be learned. This document gives a high level overview
performing different roles. To understand how this is accomplished some new
terms and concepts must be learned. This document gives a high level overview
of the enterprise architecture and hierarchy.

* [Appliance](#appliance)
@@ -12,67 +12,68 @@ of the enterprise architecture and hierarchy.
## Appliance

Also known as a "Server" or an "MiqServer", an appliance is a virtual machine
with the ManageIQ executable code. It is delivered as a preconfigured virtual
with the ManageIQ executable code. It is delivered as a preconfigured virtual
appliance that can run on either VMware vSphere, RHEV, oVirt, or OpenStack.

Appliances are added for horizontal scalability as well as for dividing up work
by roles. An appliance can be configured to handle work for one or many roles,
by roles. An appliance can be configured to handle work for one or many roles,
with workers within the appliance carrying out the duties for which they are
configured.

## Zone

Multiple appliances are logically grouped into zones. Typically, zones are
Multiple appliances are logically grouped into zones. Typically, zones are
configured to provide specific functionalities; however, the grouping is
completely up to the user.

Some examples of zones are:

* A UI zone
* A reporting zone
* A test zone
* A production zone
* A vSphere zone

ManageIQ has the ability to create an affinity between a zone and a particular
provider. In this way, a provider specific zone can be created.
provider. In this way, a provider specific zone can be created.

## Region / Enterprise

A region is a full installation of ManageIQ, containing one database appliance,
and potentially many other appliances. A region is the collection of all zones
and potentially many other appliances. A region is the collection of all zones
that share the same database.

In a typical enterprise installation, a separate region is used for each
geographical region where WAN access to the database would be detrimental to
performance. For example, for an international corporation, one region may be
placed in North America, a second in Europe, and a third in Asia. When multiple
performance. For example, for an international corporation, one region may be
placed in North America, a second in Europe, and a third in Asia. When multiple
regions are involved, we refer to the collection of all regions as the
enterprise.

To give a worldwide "single pane of glass" view, one extra region is usually
added to act as a "master" region. This region is referred to as the Enterprise
region. The other regions then enable the database synchronization role in
added to act as a "master" region. This region is referred to as the Enterprise
region. The other regions then enable the database synchronization role in
order to replicate their data into the "master" region. In this way, individual
regions get the benefit of being co-located with the database, whereas the
enterprise region can provide a high-level reporting view where needed. See the
enterprise region can provide a high-level reporting view where needed. See the
database synchronization role for more information.

Only one appliance per region can provide the database. The ManageIQ appliance
comes preconfigured with a default database. If a second appliance is added, it
Only one appliance per region can provide the database. The ManageIQ appliance
comes preconfigured with a default database. If a second appliance is added, it
must be configured to point to the first appliance's database.

Each region is identified by a unique number. When a new region is created, a
unique number must be chosen. The ManageIQ appliance comes preconfigured with
the default region number of 0. The region number is used to set up database id
numbers in ranges of 1 trillion. Thus region 0 will contain ids 0 through
Each region is identified by a unique number. When a new region is created, a
unique number must be chosen. The ManageIQ appliance comes preconfigured with
the default region number of 0. The region number is used to set up database id
numbers in ranges of 1 trillion. Thus region 0 will contain ids 0 through
999,999,999,999 and region 1 will contain ids 1,000,000,000,000 through
1,999,999,999,999. By having each region be a specific range, there are no
1,999,999,999,999. By having each region be a specific range, there are no
collisions when database synchronization combines the various regions into the
"master" region's database.
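
The id arithmetic described above can be checked with a short sketch. It assumes only what the text states (ranges of 1 trillion ids per region); `REGION_FACTOR`, `region_id_range`, and `region_for_id` are hypothetical names chosen for illustration, not ManageIQ's own API.

```ruby
# Size of each region's id range, as described above: 1 trillion.
REGION_FACTOR = 1_000_000_000_000

# Range of database ids that belong to a given region number.
def region_id_range(region)
  start = region * REGION_FACTOR
  start..(start + REGION_FACTOR - 1)
end

# Region number that owns a given database id.
def region_for_id(id)
  id / REGION_FACTOR
end

region_id_range(0)               # => 0..999999999999
region_id_range(1)               # => 1000000000000..1999999999999
region_for_id(1_999_999_999_999) # => 1
```

Because the ranges never overlap, an id alone is enough to tell which region a replicated record came from.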

### Visual example

<pre>
```text
Region
+--------------------------------------------------------------+
| |
@@ -91,4 +92,4 @@ collisions when database synchronization combines the various regions into the
| +--------------------------+ +--------------------------+ |
| |
+--------------------------------------------------------------+
</pre>
```
12 changes: 6 additions & 6 deletions architecture/logging.md
@@ -2,8 +2,8 @@

ManageIQ uses the standard Ruby [Logger](https://ruby-doc.org/stdlib-2.6.6/libdoc/logger/rdoc/Logger.html)
interface, with a custom formatter that is mostly just minor changes from the
default formatter. Additionally, we use a logger abstraction library, called
[manageiq-loggers](github.com/ManageIQ/manageiq-loggers) in order to support
default formatter. Additionally, we use a logger abstraction library, called
[manageiq-loggers](https://github.com/ManageIQ/manageiq-loggers) in order to support
multiple log targets and formats.

An example log message looks like:
@@ -26,7 +26,7 @@ where:
### Container log format

In container deployments, ManageIQ also broadcasts logs to STDOUT in structured
JSON format, so that it can be consumed by a cluster-level log aggregator. For
JSON format, so that it can be consumed by a cluster-level log aggregator. For
example, OpenShift has a feature called cluster logging, which consumes STDOUT and
feeds those lines to ElasticSearch as part of an EFK stack (ElasticSearch /
Fluentd / Kibana). However, because it is simple JSON, the output could be
@@ -50,13 +50,13 @@ as a single line.
### Development

In development, a number of log objects are available, with `$log` being the
primary log object. There are a number of separate log objects created for
various purposes, particularly for provider clients. The Rails logger is also
primary log object. There are a number of separate log objects created for
various purposes, particularly for provider clients. The Rails logger is also
available via `$rails_log` (or the standard `Rails.logger`). See
[lib/vmdb/loggers.rb](https://github.com/ManageIQ/manageiq/blob/master/lib/vmdb/loggers.rb)
for the complete list of loggers.

Additionally, if the [`Vmdb::Logging`](https://github.com/ManageIQ/manageiq/blob/master/lib/vmdb/logging.rb)
module is mixed into a class, then the `_log` method is available. This method will
module is mixed into a class, then the `_log` method is available. This method will
automatically prefix the code location to the message, and so is the preferred
way to do logging.
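
To illustrate the idea of prefixing the code location, here is a minimal sketch using only Ruby's standard library. It is not the real `Vmdb::Logging` module: `LocationLogging`, `log_with_location`, and `Refresher` are hypothetical names, and the prefix format is an assumption made for the example.

```ruby
require "logger"
require "stringio"

# Hypothetical mixin, not the real Vmdb::Logging: it prefixes each message
# with the including class and the name of the calling method.
module LocationLogging
  def log_with_location(logger, message)
    # Frame 1 is the method that called log_with_location.
    caller_name = caller_locations(1, 1).first.base_label
    logger.info("#{self.class.name}##{caller_name}: #{message}")
  end
end

class Refresher
  include LocationLogging

  def refresh(logger)
    log_with_location(logger, "starting refresh")
  end
end

# Log to an in-memory buffer so the result can be inspected.
output = StringIO.new
Refresher.new.refresh(Logger.new(output))
output.string # includes "Refresher#refresh: starting refresh"
```

Deriving the prefix from the call stack means callers never have to repeat their own class or method name in the message.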