Support arm #2

Draft: wants to merge 44 commits into master

@tsorya (Owner) commented Jun 28, 2021

No description provided.

omertuc and others added 30 commits July 14, 2021 03:53
Older versions of Go are out of support, so for security compliance we are trying to move all components to the latest version. Go 1.14 is already out of support; see https://endoflife.date/go
Support both APIs. The distinction is made by the flags --cluster-id and --infra-env-id.
If --cluster-id is present, the v1 APIs are used; if --infra-env-id is present, the v2
APIs are used.
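A minimal sketch of the flag-based selection described above, assuming the standard library `flag` package. The `apiVersion` type and `selectAPI` function are hypothetical, and because the description does not say what happens when both flags are given, this sketch simply follows the wording literally and lets --cluster-id win.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// apiVersion is a hypothetical type used only for this sketch.
type apiVersion int

const (
	apiV1 apiVersion = 1
	apiV2 apiVersion = 2
)

// selectAPI picks v1 when --cluster-id is set and v2 when --infra-env-id is
// set. Giving --cluster-id precedence when both are set is an assumption.
func selectAPI(args []string) (apiVersion, error) {
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	clusterID := fs.String("cluster-id", "", "cluster ID (v1 APIs)")
	infraEnvID := fs.String("infra-env-id", "", "infra-env ID (v2 APIs)")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	switch {
	case *clusterID != "":
		return apiV1, nil
	case *infraEnvID != "":
		return apiV2, nil
	default:
		return 0, errors.New("one of --cluster-id or --infra-env-id is required")
	}
}

func main() {
	v, err := selectAPI([]string{"--infra-env-id", "abc"})
	fmt.Println(v, err) // 2 <nil>
}
```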
This PR adds the output of `lsblk` to the logs related to mounts and
disks on the target node. This is useful because the information currently
gathered contains details only for mounted devices, and says nothing about
any additional devices that are present but not mounted.

With this output, apart from the existence of a device, we can also see
its size and type.
To add oVirt provider support, some modifications need to
be made on the agent side.
The oVirt platform is detected by the family parameter rather
than by the product name, which can vary. To detect the right
platform and add it to the isVirtual list, a new function
was added that changes the host product type to match the
oVirt one. This ensures that the rest of the flow remains the
same as for other types of hosts.
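A sketch of the idea, under loud assumptions: `systemInfo`, `virtualProducts`, `normalizeOVirt` and `isVirtual` are hypothetical stand-ins for the agent's real types, and matching "oVirt" as a case-insensitive prefix of the family field is a guess at the detection rule. It shows the pattern described above: detect by family, then rewrite the product name so later checks see one stable value.

```go
package main

import (
	"fmt"
	"strings"
)

// systemInfo is a hypothetical subset of the DMI fields the agent reads.
type systemInfo struct {
	Manufacturer string
	ProductName  string
	Family       string
}

// virtualProducts stands in for the list of virtual platforms; entries are illustrative.
var virtualProducts = map[string]bool{
	"kvm":   true,
	"ovirt": true,
}

// normalizeOVirt rewrites the product name when the family identifies oVirt,
// because the product name itself can vary between oVirt deployments.
func normalizeOVirt(s systemInfo) systemInfo {
	if strings.HasPrefix(strings.ToLower(s.Family), "ovirt") {
		s.ProductName = "ovirt"
	}
	return s
}

// isVirtual reports whether the (normalized) product is a known virtual platform.
func isVirtual(s systemInfo) bool {
	return virtualProducts[strings.ToLower(s.ProductName)]
}

func main() {
	host := normalizeOVirt(systemInfo{Manufacturer: "Red Hat", ProductName: "RHEL", Family: "oVirt"})
	fmt.Println(host.ProductName, isVirtual(host)) // ovirt true
}
```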
The TPM version is validated by the service to make sure the specific
TPM hardware is supported by OpenShift.

Signed-off-by: Yoni Bettan <[email protected]>
NO-ISSUE: Add flaper87 to approvers list
The original plan was to move all images to ubi8. This is not possible due to the lack
of some packages that are needed by other projects. We are now going to switch all images
to stream8, in the hope that consistency across repos will help prevent, or at least help
debug, current and future issues in CI.

The goal is to keep the components' builds as consistent as possible across the channels we
release them on.

Signed-off-by: Flavio Percoco <[email protected]>

Co-authored-by: Flavio Percoco <[email protected]>
…dLogs command (openshift#239)

In order to move the Agent to the V2SendLogs command, we must first prepare the Agent
to receive an additional argument (InfraEnvID); otherwise it may crash on the unknown
flag. Only after that can we change the command passed between Assisted-Service
and the Agent, and then change the Agent code to actually use the V2 API.
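Why the ordering matters can be shown with Go's `flag` package: a `FlagSet` rejects any flag it has not registered. In this sketch `parseSendLogs` and its `acceptInfraEnv` switch are hypothetical, simulating an agent build before and after it learned --infra-env-id; the real sendLogs argument handling may differ.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseSendLogs parses sendLogs-style arguments. acceptInfraEnv simulates an
// agent build that does (true) or does not (false) register --infra-env-id.
func parseSendLogs(args []string, acceptInfraEnv bool) (clusterID, infraEnvID string, err error) {
	fs := flag.NewFlagSet("send-logs", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the usage text printed on parse errors
	fs.StringVar(&clusterID, "cluster-id", "", "cluster ID")
	if acceptInfraEnv {
		fs.StringVar(&infraEnvID, "infra-env-id", "", "infra-env ID (optional)")
	}
	err = fs.Parse(args)
	return clusterID, infraEnvID, err
}

func main() {
	newArgs := []string{"-cluster-id", "c1", "-infra-env-id", "e1"}

	// An old agent receiving the new command fails on the unknown flag.
	_, _, err := parseSendLogs(newArgs, false)
	fmt.Println(err) // flag provided but not defined: -infra-env-id

	// A prepared agent accepts it.
	_, env, err := parseSendLogs(newArgs, true)
	fmt.Println(env, err) // e1 <nil>
}
```

The prepared agent also keeps accepting the old command without --infra-env-id, which is what lets the service switch commands afterwards.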
…enshift#240)

This is the final commit for this issue. In previous ones we
changed the Agent to extract InfraEnvID from the sendLogs command parameters (if present)
and changed AssistedService to send the host's InfraEnvID in the sendLogs command.
It will make the agent send an invalid TPM version format to the service
and therefore fail.

Signed-off-by: Yoni Bettan <[email protected]>
…shift#246)

This PR changes the Agent code to use the V2 API for HostLogProgress
instead of the V1 API.
k4e-device-worker uses the inventory to collect hardware
information. However, k4e-device-worker runs as a process on the
host, not inside a container, so the path to its root filesystem
doesn't need to be chrooted.
This PR enables the caller to provide the chroot root folder.

Signed-off-by: Moti Asayag <[email protected]>
This commit introduces a dry run mode to the agent. The mode is
activated when the `DRY_ENABLE=true` environment variable is set (or when the
`--dry-run` flag is passed to the installer).

The dry run mode disables several destructive actions performed by the
agent. Its main purpose is to allow running many agents on the same machine
without harming that machine, while still communicating with the service.
This is useful when load testing the service.

Other than disabling destructive actions, it also disables actions that
take too long due to networking / processing reasons.

See the diff for the exact changes, but here's a summary -
- Next step runner retry delay during dry run reduced from 1h to 1m. I believe
  1h was chosen to reduce load on production servers, and it's
  annoyingly long when dealing with crashes during load testing.
- Added journal `DRY_AGENT_ID` field to help separate between
  logs of multiple dry run agents running at the same time on the
  same machine
- When registering a host, the host ID defined by the `DRY_HOST_ID`
  environment variable is used rather than the host ID retrieved from
  the hardware.
- `diagnoseSystem` is skipped, it's not necessary and consumes CPU
- Image availability is skipped, dummy results are returned instead
- Disk speed check - 1ms is returned immediately rather than doing
  destructive/slow disk checks
- `getBmcAddress`/`getBmcV6Address` are skipped, the default "not found"
  `0.0.0.0` or `::/0` are immediately returned
- `smartctl` calls are skipped, it's too slow and not useful, dummy
  hard-coded smartctl JSON is returned
- The first interface's MAC address is overridden with the MAC address
  specified in the `DRY_MAC_ADDRESS` env variable. This is useful
  when you want multiple agents to run on the same machine but present
  different MAC addresses to the service, so they can be individually
  identified and separated by BMAC.
- NTP sync - returns an empty list immediately, it's too slow and not
  really necessary when doing a fake installation
- Added the `--cgroup` namespace to `nsenter`. The reasoning for that
  is explained in detail in a code comment in this commit's diff.
- Agent and next_step_runner will now halt when the file whose path
  is configured by DRY_FAKE_REBOOT_MARKER_PATH gets created. This file
  is used by the installer to signal that a "fake reboot" happened.

Other unrelated changes -
- Added some test artifacts to `.dockerignore` to prevent them from
  causing cache misses after doing a `COPY . .` in the Dockerfile
omertuc and others added 14 commits November 7, 2021 14:46
When the installer launches the logs-sender binary from the agent, we
encounter this error:

`Logs were sent\n1 error occurred:\n\t* /usr/bin/lsblk failed: 1 lsblk: unknown column:  NAME,MAJ:MIN,SIZE,TYPE,FSTYPE,KNAME,MODEL,UUID,WWN,HCTL,VENDOR,STATE,TRAN,PKNAME\n\n\n\n\n`

It is caused by the column list and the `-o` flag being passed in the same
parameter.

This commit separates them to avoid this error.
Currently, when logs-sender fails, we don't know why or at which step, because there is no output and the agent has nothing to send to the service. This change writes the logs-sender logs to stdout, which allows the agent to gather them on failure.
…ft#254)

The agent binary only checked whether a reboot happened when next_step_runner
returned an error. That is not what happens on reboot: next_step_runner exits
cleanly with a 0 exit code, so the agent should check for a dry reboot only
when next_step_runner had no error.
…dry mode (openshift#256)

Dry run mode was only tried so far with single-node. This commit makes
the IP and hostname configurable, allowing the swarm to launch
multi-node clusters without having hosts clash.

It also disables the connectivity check in dry run mode, because with
different fake IP addresses that check would obviously not work.
Assisted Installer Service already adds the suffix "config/worker".

Remove it on the Agent side to avoid requesting this URL:
http://<API IP>:22624/config/worker/config/worker
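The commit simply stops appending the suffix. As an alternative illustration, a hypothetical defensive helper could append "/config/worker" only when it is missing, so the doubled path above cannot occur either way. `workerConfigURL` is not part of the agent; 192.0.2.10 is a documentation address.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

const workerSuffix = "/config/worker"

// workerConfigURL appends "/config/worker" only if the path does not already end with it.
func workerConfigURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	p := strings.TrimRight(u.Path, "/")
	if !strings.HasSuffix(p, workerSuffix) {
		p += workerSuffix
	}
	u.Path = p
	return u.String(), nil
}

func main() {
	for _, in := range []string{
		"http://192.0.2.10:22624/config/worker", // URL as sent by the service
		"http://192.0.2.10:22624",               // base URL only
	} {
		out, err := workerConfigURL(in)
		fmt.Println(out, err)
	}
}
```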
The CIDR validation, the majorityGroup check, and the L2 and L3 validations, are
already performed by assisted service when validating the cluster, host, and network
data. There should not be a need to replicate this validation in the agent itself.

Signed-off-by: Flavio Percoco <[email protected]>
Added an 'Accept' header to the ignition download request to
explicitly specify version 3.2.0. This is required to
avoid a redundant conversion to v2.2[1] and a failure
in machine-config-operator when LUKS is enabled[2].

[1] https://github.com/openshift/machine-config-operator/blob/9c6c2bfd7ed498bfbc296d530d1839bd6a177b0b/pkg/server/api.go#L154

[2] error: failed to convert config from spec v3.2 to v2.2: unable to convert Ignition spec v3 config to v2: LUKS is not supported on 2.2
Add support of Authorization token and CA cert to the API VIP connectivity validation.
…API VIP connectivity failure (openshift#264)

The apivip request passes a base64-encoded CA cert; the agent should decode
it before trying to parse it.
Return ignition as part of APIVipConnectivityResponse
for exposing LUKS (disk encryption) information to the
service. This is required as part of disk encryption
validation for day2 hosts.