Skip to content

[Bug] Snapshot failed time to time because of IUCV failure #908

@dvjsharma

Description

@dvjsharma

Summary

Snapshot/capture requests occasionally fail when the SDK sends the initial IUCV command (pwd) to the target VM immediately after power-on. SMUT logs show rc:4 rs:111 and Feilong returns internal error 300/5.

Logs

[2018-05-10 19:44:24] [ERROR] Failed to check iucv status on capture source vm QYR00011 with error (Error) ULTVMU0315E IUCV socket error sending command to QYR00011. cmd: pwd, rc: 4, rs: 111, out: ERROR: ERROR connecting socket:
[2018-05-10 19:44:24] [ERROR] Failed to capture guest 'QYR00011' to generate image 'd1b4ce01-daf3-4704-bd6e-a50be9fafc69'.
[2018-05-10 19:44:24] [ERROR] (127.0.0.1:46488) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/zvmsdk/sdkserver.py", line 146, in serve_API
    return_data = api_func(*api_args, **api_kwargs)
  File "/usr/lib/python2.7/site-packages/zvmsdk/api.py", line 327, in guest_capture
    compress_level=compress_level)
  File "/usr/lib/python2.7/site-packages/zvmsdk/vmops.py", line 211, in guest_capture
    compress_level=compress_level)
  File "/usr/lib/python2.7/site-packages/zvmsdk/smutclient.py", line 522, in guest_capture
    msg=msg)
SDKGuestOperationError: Failed to capture userid QYR00011 to generate image, error: Failed to check iucv status on capture source vm QYR00011 with error (Error) ULTVMU0315E IUCV socket error sending command to QYR00011. cmd: pwd, rc: 4, rs: 111, out: ERROR: ERROR connecting socket:

[2018-05-10 19:44:24] [INFO] The msg <{u'rs': 5, u'overallRC': 300, u'modID': 10, u'rc': 300, u'output': u'', u'errmsg': u'Failed to capture userid QYR00011 to generate image, error: Failed to check iucv status on capture source vm QYR00011 with error (Error) ULTVMU0315E IUCV socket error sending command to QYR00011. cmd: pwd, rc: 4, rs: 111, out: ERROR: ERROR connecting socket:'}> lead to return internal error
[2018-05-10 19:44:24] [INFO] 10.32.202.49 "POST /guests/QYR00011/action" status: 500 length: 337 headers: [('Content-Length', '337'), ('Content-Type', 'application/json'), ('cache-control', 'no-cache')] exc_info: None

Findings by @jichenjc

[2018-05-10 19:44:19] [INFO] 9.114.224.65 "PUT /guests/QYR00011/nic/1000" status: 200 length: 77 headers: [('Content-Length', '77'), ('Content-Type', 'application/json'), ('cache-control', 'no-cache')] exc_info: None
[2018-05-10 19:44:20] [INFO] 10.32.202.49 "POST /guests/QYR00011/action" status: 200 length: 77 headers: [('Content-Length', '77'), ('Content-Type', 'application/json'), ('cache-control', 'no-cache')] exc_info: None
...
[2018-05-10 19:44:23] [INFO] 10.32.202.49 "GET /guests/QYR00011/power_state" status: 200 length: 79 headers: [('Content-Length', '79'), ('Content-Type', 'application/json'), ('cache-control', 'no-cache')] exc_info: None

"I doubt whether it's because instance startup and IUCV not able to talk now so I'd suggest we reconsider IUCV connection to have some retry mechanism."

Launchpad Ref - https://bugs.launchpad.net/python-zvm-sdk/+bug/1770555

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions