-
Notifications
You must be signed in to change notification settings - Fork 457
Open
Description
I am reaching out regarding an intermittent issue I have observed while running a view code. Occasionally, the code executes successfully, but at other times it fails with the following error in the syslogs, causing the service to restart:
Jun 26 09:34:46 startheavy[3664]: /opt/heavyai/scripts/innerstartheavy: line 206: 3669 Segmentation fault ./bin/heavydb $MAPD_DATA $RO --port $MAPD_TCP_PORT --http-port $MAPD_HTTP_PORT --calcite-port $MAPD_CALCITE_PORT $CONFIG $VERBOSE ${FORWARDED_ARGS[@]}
Jun 26 09:34:46 startheavy[3670]: Failed to write to log, write storage/log/heavy_web_server.Server_name.root.log.ALL.20250626-093357.3670: file already closed
Jun 26 09:34:48 startheavy[3657]: startheavy 3657 exited
Jun 26 09:34:48 systemd[1]: heavydb_all.service: Main process exited, code=exited, status=139/n/a
Jun 26 09:34:48 systemd[1]: heavydb_all.service: Failed with result 'exit-code'.
Jun 26 09:34:48 systemd[1]: heavydb_all.service: Consumed 17.898s CPU time.
Jun 26 09:34:48 systemd[1]: heavydb_all.service: Scheduled restart job, restart counter is at 1.
Jun 26 09:34:48 systemd[1]: Stopped heavydb_all.service - HEAVY.AI String.
Jun 26 09:34:48 systemd[1]: heavydb_all.service: Consumed 17.898s CPU time.
Jun 26 09:34:48 systemd[1]: Started heavydb_all.service - HEAVY.AI String.
Jun 26 09:34:48 startheavy[3895]: startheavy 3895 running
Jun 26 09:34:48 startheavy[3899]: Backend TCP: localhost:6274
Jun 26 09:34:48 startheavy[3899]: Backend HTTP: localhost:6278
Jun 26 09:34:48 startheavy[3899]: Frontend Web: localhost:6273
Jun 26 09:34:48 startheavy[3899]: Calcite TCP: localhost:6279
Jun 26 09:34:48 startheavy[3899]: WARN: libgeos path not specified. Some SQL 'ST' functions will not be available.
Jun 26 09:34:48 startheavy[3899]: - heavydb 3904 started
Jun 26 09:34:48 startheavy[3899]: - heavy_web_server 3905 started
Jun 26 09:34:54 startheavy[3905]: ⇨ http server started on [::]:6273
Interestingly, the view sometimes runs without errors, and during those successful runs, I have confirmed that the query is running on the GPU. However, I have not yet identified the scenarios in which the view code works and those in which it does not.
View code:
SELECT DISTINCT
((((((((CAST(v1.colA AS VARCHAR) || '-')
|| CAST(v1.colB AS VARCHAR)) || '-')
|| CAST(v1.colC AS VARCHAR)) || '-')
|| v1.expr_1) || '-')
|| v1.expr_2) AS Account_ID,
'' AS Acct_Type_ID,
'' AS Acct_Desc,
CAST(v1.colB AS VARCHAR) AS Branch_Code,
CASE
WHEN ((CAST(v1.colA AS VARCHAR) || CAST(v1.colB AS VARCHAR) || CAST(v1.colC AS VARCHAR)) LIKE '005500%')
THEN NULL
WHEN t3.colA IS NOT NULL
THEN t3.colCurrency
ELSE NULL
END AS Acct_Currency,
CASE
WHEN t3.colType = '55' THEN 'C'
WHEN t4.orgType = '99' AND t5.clientId IS NOT NULL THEN 'C'
WHEN t3.colCategory = '6633' AND t5.clientId IS NOT NULL THEN 'D'
WHEN t6.clientId IS NOT NULL THEN 'A'
ELSE 'B'
END AS Acct_Type,
CASE
WHEN t3.colType = '55' THEN ''
WHEN t4.orgType = '99' AND t5.clientId IS NOT NULL THEN ''
WHEN t3.colCategory = '6633' AND t5.clientId IS NOT NULL THEN 'TRUST'
WHEN t6.clientId IS NOT NULL THEN ''
ELSE ''
END AS Acct_Other_Desc,
COALESCE(t4.legalName, (t6.lastName || ' ' || t6.firstName)) AS Acct_Name,
'' AS Acct_Name2,
'' AS Acct_Name3,
'ADMIN' AS Session_Id,
CURRENT_DATE AS Receive_Time,
'' AS Hash,
v1.expr_1,
v1.expr_2
FROM (
SELECT
t0.eventId,
t1.colA,
t1.colB,
t1.colC,
t1.accountType,
t1.sequenceNo,
CASE
WHEN COALESCE(CAST(t1.accountType AS VARCHAR), '') = ''
THEN 'DEFAULT_TYPE'
ELSE CAST(t1.accountType AS VARCHAR)
END AS expr_1,
CASE
WHEN COALESCE(CAST(t1.sequenceNo AS VARCHAR), '') = ''
THEN '00'
ELSE CAST(t1.sequenceNo AS VARCHAR)
END AS expr_2
FROM SOURCE_EVENT t0
JOIN SOURCE_ACCOUNTS t1
ON t0.eventId = t1.eventId
AND t1.usageType IN ('PRIMARY', 'UNIQUE')
AND t0.appSource IN ('APP1', 'WEB', 'API', 'TOOL', 'GATEWAY')
) AS v1
INNER JOIN SOURCE_OPERATIONS t2 ON v1.eventId = t2.eventId
LEFT JOIN SOURCE_PARTIES t7 ON v1.eventId = t7.eventId
LEFT JOIN SOURCE_ENTITIES t3
ON (CAST(v1.colA AS VARCHAR) || CAST(v1.colB AS VARCHAR) || CAST(v1.colC AS VARCHAR))
= (CAST(t3.colA AS VARCHAR) || CAST(t3.colB AS VARCHAR) || CAST(t3.colC AS VARCHAR))
LEFT JOIN (
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (
PARTITION BY roleCode, isContact, colA, colB, colC
ORDER BY Receive_Time DESC
) AS rn
FROM ENTITY_PARTIES
) AS sub
WHERE rn = 1
) AS t5
ON v1.colA = t5.colA
AND v1.colB = t5.colB
AND v1.colC = t5.colC
AND t5.isContact = 'TRUE'
LEFT JOIN ORG_CLIENT t4 ON t5.clientId = t4.clientId
LEFT JOIN PERS_CLIENT t6 ON t5.clientId = t6.clientId;
Server configuration:
CPU: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz - 24 cores
RAM: 32GB
GPU: 2× Tesla P100-PCIE-16GB
OS: Debian 12
Can you confirm if this is a known bug, whether there's a patch or workaround, and help investigate further with additional logging or debugging?
Thank you.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels