Skip to content

S3 origin experiences timeouts when being contacted by the cache while under load #118

@turetske

Description

@turetske

This only occurs when the cache is talking to the origin, it does not occur with direct reads.

This also seems to occur in version 0.4.1, which is when we had previously fixed this issue, so I'm unsure why it's occurring again. Looking at the code, the specific error that was fixed in that version is still fixed, so it's another error that's behaving the same way.

How it appears:

On the cache end, it will make a request for the specific object:

time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 http_Protocol:  rc:159 got hdr line: GET /dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00066-of-00107.parquet HTTP/1.1" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 http_Protocol:  Parsing first line: GET /dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00066-of-00107.parquet HTTP/1.1" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 http_Protocol:  rc:49 got hdr line: Host: uwdf-pelican-cache.uwdf-prod.chtc.io:8443" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 http_Protocol:  rc:13 got hdr line: Accept: */*" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 http_Protocol:  rc:32 got hdr line: Accept-Encoding: gzip, deflate" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 http_Protocol:  rc:41 got hdr line: User-Agent: Python/3.12 aiohttp/3.12.14" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 http_Protocol:  rc:2 got hdr line: " daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 http_Protocol:  rc:2 detected header end." daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 unknown.10590:322@[::128.104.153.58] Xrootd_Protocol: 0000 Bridge req=3010 dlen=145 blen=145" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 unknown.10590:322@[::128.104.153.58] http_Protocol: Process is exiting rc:0" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 acc_Audit: unknown.10590:322@[::128.104.153.58] grant https *@[::ffff:128.104.153.58] read /dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00066-of-00107.parquet" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8425 unknown.10590:322@[::128.104.153.58] Pss_Open: url=pelican://[email protected]:443//dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00066-of-00107.parquet?" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8397 http_Req:  XrdHttpReq::Data! final=False" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8397 unknown.10580:371@[::128.104.153.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 4 final_:False" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8397 unknown.10580:371@[::128.104.153.58] http_Req: Got data vectors to send:1" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 http_Req:  XrdHttpReq::Data! final=False" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 unknown.10582:191@[::128.104.153.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 4 final_:False" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 unknown.10582:191@[::128.104.153.58] http_Req: Got data vectors to send:1" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8397 http_Req:  XrdHttpReq::Data! final=False" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8397 unknown.10580:371@[::128.104.153.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 4 final_:False" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8397 unknown.10580:371@[::128.104.153.58] http_Req: Got data vectors to send:1" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8707 http_Req:  XrdHttpReq::Data! final=True" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8707 unknown.10586:202@[::128.104.153.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 4 final_:False" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8707 unknown.10586:202@[::128.104.153.58] http_Req: Got data vectors to send:1" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8707 http_Protocol: Sending 2097152 bytes" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 http_Req:  XrdHttpReq::Data! final=False" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 unknown.10582:191@[::128.104.153.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 4 final_:True" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 unknown.10582:191@[::128.104.153.58] http_Req: Got data vectors to send:1" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 unknown.10582:191@[::128.104.153.58] http_Protocol:  Process. lp:(nil) reqstate: 4" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 unknown.10582:191@[::128.104.153.58] Xrootd_Protocol: 0000 Bridge req=3003 dlen=0 blen=0" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 unknown.10582:191@[::128.104.153.58] http_Protocol: Process is exiting rc:1" daemon=xrootd.cache

Later, it will get:

time="2025-10-21T17:26:04Z" level=info msg="251021 17:26:04 8425 Posix_Open: [ERROR] Operation expired: Operation expired while in queue open pelican://[email protected]:443//dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00066-of-00107.parquet?" daemon=xrootd.cache
time="2025-10-21T17:26:04Z" level=info msg="251021 17:26:04 8425 ofs_open: unknown.10590:322@[::128.104.153.58] Unable to open /dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00066-of-00107.parquet; timer expired" daemon=xrootd.cache
time="2025-10-21T17:26:04Z" level=info msg="251021 17:26:04 8425 unknown.10590:322@[::128.104.153.58] Xrootd_Response: 0000 sending err 3035: Unable to open /dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00066-of-00107.parquet; timer expired" daemon=xrootd.cache

It appears to be an operation expired issue.

I should note that between these two code blocks, we can see:

time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 http_Req:  XrdHttpReq::Done" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 unknown.10582:191@[::128.104.153.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 5 final_:True" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 http_Req:  XrdHttpReq request ended." daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 http_Protocol:  Cleanup" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 http_Protocol: SSL server failed to send the shutdown message to the client" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 http_Protocol:  Reset" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 http_Req:  XrdHttpReq request ended." daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8401 XrootdXeq: unknown.10582:191@[::128.104.153.58] disc 0:00:00 (send failure)" daemon=xrootd.cache
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 8707 XrdLink: Unable to send to unknown.10586:202@[::128.104.153.58]; broken pipe" daemon=xrootd.cache

The (send failure) may be related since 10582:191@[::128.104.153.58] matches, but I'm unsure if that can be considered equivalent a job ID. The broken pipe afterwards for XrdLink is interesting, but I think it for a different job.

On the origin side, the only mentions of the object are

time="2025-10-21T17:24:55Z" level=info msg="251021 17:24:55 191 unknown.613:54@[::128.104.153.58] ofs_readdir: data-00069-of-00107.parquet fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/" daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg=data-00069-of-00107.parquet daemon=xrootd.origin
time="2025-10-21T17:24:55Z" level=info msg="251021 17:24:55 191 unknown.613:54@[::128.104.153.58] http_Req: Dirlist /dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/ entry=data-00069-of-00107.parquet stat=4294967296 168433180 64 0 0 0 3000 bin bin" daemon=xrootd.origin

Looking at a section of the logs at around the time of the timeout, we can see:

ss is exiting rc:0" daemon=xrootd.origin
time="2025-10-21T17:25:53Z" level=info msg="251021 17:25:53 508 363ea68c.1092:80@[::10.129.173.58] ofs_open: 0-600 (600) fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/alloy/data-00000-of-00001.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:53Z" level=info msg="251021 17:25:53 508 scitokens_GenerateAcls: Failed to deserialize SciToken: token verification failed: Token issuer is not in list of allowed issuers." daemon=xrootd.origin
time="2025-10-21T17:25:53Z" level=info msg="251021 17:25:53 508 scitokens_Access: Failed to generate ACLs for token" daemon=xrootd.origin
time="2025-10-21T17:25:53Z" level=info msg="251021 17:25:53 508 acc_Audit: 363ea68c.1092:80@[::10.129.173.58] grant https 363ea68c.0@[::ffff:10.129.173.58] read /dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/alloy/data-00000-of-00001.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:53Z" level=info msg="251021 17:25:53 508 s3_S3File::Open: Opening file /dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/alloy/data-00000-of-00001.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:53Z" level=info msg="251021 17:25:53 063 s3_SetupHandle: Sending HTTP request https://web.s3.wisc.edu/pelican-data-loader/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/alloy/data-00000-of-00001.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:53Z" level=info msg="251021 17:25:53 063 s3_SetupHandle: Checking if curl verbose logging is enabled" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 505 363ea68c.1083:213@[::10.129.173.58] ofs_read: 2097152@4194304 fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/csv/data-00205-of-00630.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 505 s3_cache: Read request for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/csv/data-00205-of-00630.parquet, offset=4194304, size=2097152" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 504 363ea68c.1094:87@[::10.129.173.58] http_Protocol:  Process. lp:(nil) reqstate: 4" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 504 http_Req:  XrdBridge::SetSF(false) failed." daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 504 363ea68c.1094:87@[::10.129.173.58] http_Protocol: Process is exiting rc:0" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 504 363ea68c.1094:87@[::10.129.173.58] ofs_read: 2097152@8388608 fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/xml/data-00103-of-00239.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 504 s3_cache: Read request for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/xml/data-00103-of-00239.parquet, offset=8388608, size=2097152" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 504 s3_cache: Issuing GET for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/xml/data-00103-of-00239.parquet, offset=12582912, size=2097152" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 504 http_Req:  XrdHttpReq::Data! final=True" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 504 363ea68c.1094:87@[::10.129.173.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 5 final_:False" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 504 363ea68c.1094:87@[::10.129.173.58] http_Req: Got data vectors to send:1" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 064 s3_SetupHandle: Sending HTTP request https://web.s3.wisc.edu/pelican-data-loader/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/xml/data-00103-of-00239.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 504 http_Protocol: Sending 2097152 bytes" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 064 s3_SetupHandle: Checking if curl verbose logging is enabled" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 XrdLink: Unable to send to 363ea68c.968:104@[::10.129.173.58]; resource temporarily unavailable" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 XrdLink: Unable to send to 363ea68c.969:77@[::10.129.173.58]; resource temporarily unavailable" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 432 XrdLink: Unable to send to 363ea68c.1015:150@[::10.129.173.58]; resource temporarily unavailable" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 434 XrdLink: Unable to send to 363ea68c.1017:152@[::10.129.173.58]; resource temporarily unavailable" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 435 XrdLink: Unable to send to 363ea68c.1018:153@[::10.129.173.58]; resource temporarily unavailable" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 363ea68c.969:77@[::10.129.173.58] http_Protocol:  Process. lp:(nil) reqstate: 5" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 363ea68c.968:104@[::10.129.173.58] http_Protocol:  Process. lp:(nil) reqstate: 5" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 363ea68c.968:104@[::10.129.173.58] http_Protocol: Process is exiting rc:1" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 432 363ea68c.1015:150@[::10.129.173.58] ofs_read: 2097152@14680064 fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00000-of-00107.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 363ea68c.969:77@[::10.129.173.58] http_Protocol: Process is exiting rc:1" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 432 s3_cache: Read request for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00000-of-00107.parquet, offset=14680064, size=2097152" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 363ea68c.968:104@[::10.129.173.58] ofs_close: use=1 fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/unity3d-asset/data-00003-of-00016.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 435 363ea68c.1018:153@[::10.129.173.58] ofs_read: 2097152@18874368 fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/json/data-00569-of-00704.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 435 s3_cache: Read request for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/json/data-00569-of-00704.parquet, offset=18874368, size=2097152" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 363ea68c.969:77@[::10.129.173.58] ofs_close: use=1 fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/json/data-00228-of-00704.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 434 363ea68c.1017:152@[::10.129.173.58] ofs_read: 2097152@18874368 fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/text/data-00380-of-00535.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 363ea68c.968:104@[::10.129.173.58] ofs_close: use=0 fn=dummy" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 434 s3_cache: Read request for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/text/data-00380-of-00535.parquet, offset=18874368, size=2097152" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 363ea68c.969:77@[::10.129.173.58] ofs_close: use=0 fn=dummy" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 http_Req:  XrdHttpReq::Done" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 363ea68c.968:104@[::10.129.173.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 6 final_:True" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 http_Req:  XrdHttpReq request ended." daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 http_Req:  XrdHttpReq::Done" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 363ea68c.969:77@[::10.129.173.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 6 final_:True" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 http_Protocol:  Cleanup" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 http_Req:  XrdHttpReq request ended." daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 http_Protocol: SSL server failed to send the shutdown message to the client" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 http_Protocol:  Cleanup" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 http_Protocol: SSL server failed to send the shutdown message to the client" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 http_Protocol:  Reset" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 http_Req:  XrdHttpReq request ended." daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 XrootdXeq: 363ea68c.968:104@[::10.129.173.58] disc 0:00:32 (send failure)" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 http_Protocol:  Reset" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 http_Req:  XrdHttpReq request ended." daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 XrootdXeq: 363ea68c.969:77@[::10.129.173.58] disc 0:00:32 (send failure)" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 432 s3_cache: Issuing GET for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00000-of-00107.parquet, offset=18874368, size=2097152" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 434 s3_cache: Issuing GET for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/text/data-00380-of-00535.parquet, offset=23068672, size=2097152" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 432 http_Req:  XrdHttpReq::Data! final=False" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 432 363ea68c.1015:150@[::10.129.173.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 5 final_:True" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 065 s3_SetupHandle: Sending HTTP request https://web.s3.wisc.edu/pelican-data-loader/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00000-of-00107.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 432 363ea68c.1015:150@[::10.129.173.58] http_Req: Got data vectors to send:1" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 432 363ea68c.1015:150@[::10.129.173.58] http_Protocol:  Process. lp:(nil) reqstate: 5" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 435 s3_cache: Issuing GET for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/json/data-00569-of-00704.parquet, offset=23068672, size=2097152" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 065 s3_SetupHandle: Checking if curl verbose logging is enabled" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 432 363ea68c.1015:150@[::10.129.173.58] http_Protocol: Process is exiting rc:1" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 065 s3_SetupHandle: Sending HTTP request https://web.s3.wisc.edu/pelican-data-loader/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/text/data-00380-of-00535.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 432 363ea68c.1015:150@[::10.129.173.58] ofs_close: use=1 fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/c-sharp/data-00000-of-00107.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 434 http_Req:  XrdHttpReq::Data! final=False" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 434 363ea68c.1017:152@[::10.129.173.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 6 final_:False" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 065 s3_SetupHandle: Checking if curl verbose logging is enabled" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 434 363ea68c.1017:152@[::10.129.173.58] http_Req: Got data vectors to send:1" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 434 363ea68c.1017:152@[::10.129.173.58] ofs_read: 2097152@20971520 fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/text/data-00380-of-00535.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 434 s3_cache: Read request for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/text/data-00380-of-00535.parquet, offset=20971520, size=2097152" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 435 http_Req:  XrdHttpReq::Data! final=False" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 435 363ea68c.1018:153@[::10.129.173.58] http_Req: PostProcessHTTPReq req: 2 reqstate: 6 final_:False" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 435 363ea68c.1018:153@[::10.129.173.58] http_Req: Got data vectors to send:1" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 066 s3_SetupHandle: Sending HTTP request https://web.s3.wisc.edu/pelican-data-loader/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/json/data-00569-of-00704.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 435 363ea68c.1018:153@[::10.129.173.58] ofs_read: 2097152@20971520 fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/json/data-00569-of-00704.parquet" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 435 s3_cache: Read request for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/json/data-00569-of-00704.parquet, offset=20971520, size=2097152" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 066 s3_SetupHandle: Checking if curl verbose logging is enabled" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 064 s3_Run: Processing result from curl" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 064 s3_cache: Finished GET for object=data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/jupyter-notebook/data-00111-of-00312.parquet, offset=2097152, size=2097152, duration_ms=161; succeeded" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 063 s3_Run: Processing result from curl" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 508 363ea68c.1092:80@[::10.129.173.58] ofs_fstat:  fn=/dsi/pytorch/data/datasets--bigcode--the-stack-dedup/snapshots/17cad72c886a2858e08d4c349a00d6466f54df63/data/alloy/data-00000-of-00001.parquet" daemon=xrootd.origin

I think these strings are very telling:

time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 376 XrdLink: Unable to send to 363ea68c.968:104@[::10.129.173.58]; resource temporarily unavailable" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 385 XrdLink: Unable to send to 363ea68c.969:77@[::10.129.173.58]; resource temporarily unavailable" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 432 XrdLink: Unable to send to 363ea68c.1015:150@[::10.129.173.58]; resource temporarily unavailable" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 434 XrdLink: Unable to send to 363ea68c.1017:152@[::10.129.173.58]; resource temporarily unavailable" daemon=xrootd.origin
time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 435 XrdLink: Unable to send to 363ea68c.1018:153@[::10.129.173.58]; resource temporarily unavailable" daemon=xrootd.origin

I'm unsure if this string means anything:

time="2025-10-21T17:25:54Z" level=info msg="251021 17:25:54 504 http_Req:  XrdBridge::SetSF(false) failed." daemon=xrootd.origin

I am unsure what I need to be looking for in these logs to diagnose the problem or what to search with from here.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions