Skip to content

shelldriver: fixes and improvements #945

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/usage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -542,6 +542,8 @@ environment config:
'_run',
Mock(return_value=(['OK'], [], 0))
).start()
patch.object(ShellDriver, "_silence_kernel", Mock()).start()
patch.object(ShellDriver, "_inject_run", Mock()).start()

.. testcode:: pytest-example
:hide:
Expand Down
63 changes: 53 additions & 10 deletions labgrid/driver/shelldriver.py
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,35 @@ def _run(self, cmd, *, timeout=30.0, codec="utf-8", decodeerrors="strict"):
def run(self, cmd, timeout=30.0, codec="utf-8", decodeerrors="strict"):
return self._run(cmd, timeout=timeout, codec=codec, decodeerrors=decodeerrors)

def _raw_run(self, cmd, timeout=2, codec="utf-8", decodeerrors="strict"):
"""Runs command without run() wrapper as it's not yet available"""
marker = gen_marker()
self.console.sendline(f"echo {marker[:4]}; {cmd} ; echo {marker[4:]}")
_, _, match, _ = self.console.expect(
rf"{marker[:4]}(.*){marker[4:]}\s+{self.prompt}",
timeout=timeout
)
data = self.re_vt100.sub('', match.group(1).decode(codec, decodeerrors)).split('\r\n')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
data = self.re_vt100.sub('', match.group(1).decode(codec, decodeerrors)).split('\r\n')
data = re_vt100.sub('', match.group(1).decode(codec, decodeerrors)).split('\r\n')

This seems to have changed in 6743c69

if data and not data[-1]:
del data[-1]
return data

def _silence_kernel(self, timeout, kernel_log_level=1, codec="utf-8", decodeerrors="strict"):
"""Tries to set kernel log level to KERN_ALERT log level to make it less chatty."""

while True:
self.console.sendline(f"dmesg -n {kernel_log_level}")
try:
data = self._raw_run("cat /proc/sys/kernel/printk")
if data[2] and data[2].startswith(f'{kernel_log_level}\t'):
return

except TIMEOUT:
pass

if timeout.expired:
raise TIMEOUT(f"Timeout of {self.login_timeout} seconds exceeded during waiting for kernel silencer") # pylint: disable=line-too-long

Comment on lines +113 to +141
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want more retry code for certain devices behaving strangely. Shouldn't these devices wait until input can be handled reliably before starting login/shell?

From the commit message:

Those issues mostly happen frequently during early boot process, where there is a lot of output
going on.

Could await_login_timeout help here instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want more retry code for certain devices behaving strangely.

Ok, fair enough.

Shouldn't these devices wait until input can be handled reliably before starting login/shell?

Sure, if you have complete control over the build/settings of bootloader, kernel, drivers and operating system, then this is certainly doable.

For example, in OpenWrt project I'm testing 3 releases, this means 3-4 different kernel versions, 5.10, 5.15, 6.1 and 6.6. I've already learned during the years, that each of this kernel versions changed the behavior of the device, driver probing/loading and thus influence the serial console behavior. I don't want to handle this differences manually, thus I'm silencing the kernel ASAP, then continue bringing the device into SSH over LAN network and profit.

On some slow devices there might be 10s silence on the console and then ath11k is being probed and if you're lucky enough, exactly at the point when you're trying to detect LAN network availability of the device. In some builds, there are strange 3rd party applications which are not afraid to output debug logging into /dev/console directly etc.

Could await_login_timeout help here instead?

Its quite some time, but I guess, that I've tried all available features/config options, indeed. I'll rebase, retest and see how it goes today.

@step()
def _await_login(self):
"""Awaits the login prompt and logs the user in"""
Expand All @@ -136,14 +165,14 @@ def _await_login(self):
)

if index == 0:
if did_login and not did_silence_kernel:
if not did_silence_kernel:
# Silence the kernel and wait for another prompt
self.console.sendline("dmesg -n 1")
self._silence_kernel(timeout)
did_silence_kernel = True
else:
# we got a prompt. no need for any further action to
# activate this driver.
self.status = 1
self._status = 1
Comment on lines -146 to +175
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's already fixed with 8bc7858.

break

elif index == 1:
Expand Down Expand Up @@ -177,9 +206,10 @@ def _await_login(self):
if timeout.expired:
raise TIMEOUT(f"Timeout of {self.login_timeout} seconds exceeded during waiting for login") # pylint: disable=line-too-long

if self.post_login_settle_time > 0:
self.console.settle(self.post_login_settle_time, timeout=timeout.remaining)

if did_login:
if self.post_login_settle_time > 0:
self.console.settle(self.post_login_settle_time, timeout=timeout.remaining)
self._check_prompt()

@step()
Expand All @@ -206,11 +236,24 @@ def _check_prompt(self):
self._status = 0
raise

def _inject_run(self):
self.console.sendline(
'''run() { echo -n "$MARKER"; sh -c "$@"; echo "$MARKER $?"; }'''
)
self.console.expect(self.prompt)
def _inject_run(self, inject_timeout=10.0):
"""Tries to inject run() command wrapper into shell on the console."""
timeout = Timeout(inject_timeout)

while True:
self.console.sendline(
'''run() { echo -n "$MARKER"; sh -c "$@"; echo "$MARKER $?"; }'''
)
self.console.expect(self.prompt)
try:
data = self._raw_run("type run")
if data[2] and "function" in data[2]:
return
except TIMEOUT:
pass

if timeout.expired:
raise TIMEOUT(f"Timeout of {inject_timeout} seconds exceeded during waiting for run injection") # pylint: disable=line-too-long
Comment on lines -209 to +256
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your serial console does not work reliably (even after specifying a certain settle time), I don't think retrying is the correct approach. Even if this works after the nth try, too many other things will fail in unexpected ways later. Better make your serial console work reliably or prefer the SSHDriver in these situations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your serial console does not work reliably (even after specifying a certain settle time)

Ok, fair enough, but my goal is to make it generic and reusable across different use cases, instead of fiddling with variable settle time, fighting crazy applications logging into /dev/console directly etc.

If your serial console does not work reliably (even after specifying a certain settle time), I don't think retrying is the correct approach.

Well, this approach works reliably for over than 2 years, with a mix of dozen different devices, bootloaders, kernels/drivers and operating systems. So the idea here is to get as fast (and reliably) as possible into the SSH and profit.

SSHDriver in these situations.

Indeed, its the only reliable solution if there is no dedicated serial console available.


@step(args=['keyfile_path'])
def _put_ssh_key(self, keyfile_path):
Expand Down