Skip to content

Segmentation Fault When Executing View Query - HeavyAi Version: 9.0.0 #857

@Chintan-24

Description

@Chintan-24

I am reaching out regarding an intermittent issue I have observed while running a view code. Occasionally, the code executes successfully, but at other times it fails with the following error in the syslogs, causing the service to restart:

Jun 26 09:34:46  startheavy[3664]: /opt/heavyai/scripts/innerstartheavy: line 206: 3669 Segmentation fault ./bin/heavydb $MAPD_DATA $RO --port $MAPD_TCP_PORT --http-port $MAPD_HTTP_PORT --calcite-port $MAPD_CALCITE_PORT $CONFIG $VERBOSE ${FORWARDED_ARGS[@]}
Jun 26 09:34:46  startheavy[3670]: Failed to write to log, write storage/log/heavy_web_server.Server_name.root.log.ALL.20250626-093357.3670: file already closed
Jun 26 09:34:48  startheavy[3657]: startheavy 3657 exited
Jun 26 09:34:48  systemd[1]: heavydb_all.service: Main process exited, code=exited, status=139/n/a
Jun 26 09:34:48  systemd[1]: heavydb_all.service: Failed with result 'exit-code'.
Jun 26 09:34:48  systemd[1]: heavydb_all.service: Consumed 17.898s CPU time.
Jun 26 09:34:48  systemd[1]: heavydb_all.service: Scheduled restart job, restart counter is at 1.
Jun 26 09:34:48  systemd[1]: Stopped heavydb_all.service - HEAVY.AI String.
Jun 26 09:34:48  systemd[1]: heavydb_all.service: Consumed 17.898s CPU time.
Jun 26 09:34:48  systemd[1]: Started heavydb_all.service - HEAVY.AI String.
Jun 26 09:34:48  startheavy[3895]: startheavy 3895 running
Jun 26 09:34:48  startheavy[3899]: Backend TCP: localhost:6274
Jun 26 09:34:48  startheavy[3899]: Backend HTTP: localhost:6278
Jun 26 09:34:48  startheavy[3899]: Frontend Web: localhost:6273
Jun 26 09:34:48  startheavy[3899]: Calcite TCP: localhost:6279
Jun 26 09:34:48  startheavy[3899]: WARN: libgeos path not specified. Some SQL 'ST' functions will not be available.
Jun 26 09:34:48  startheavy[3899]: - heavydb 3904 started
Jun 26 09:34:48  startheavy[3899]: - heavy_web_server 3905 started
Jun 26 09:34:54  startheavy[3905]: ⇨ http server started on [::]:6273

Interestingly, the view sometimes runs without errors, and during those successful runs, I have confirmed that the query is running on the GPU. However, I have not yet identified the scenarios in which the view code works and those in which it does not.

View code:

SELECT DISTINCT
    ((((((((CAST(v1.colA AS VARCHAR) || '-') 
    || CAST(v1.colB AS VARCHAR)) || '-') 
    || CAST(v1.colC AS VARCHAR)) || '-') 
    || v1.expr_1) || '-') 
    || v1.expr_2) AS Account_ID,
    
    '' AS Acct_Type_ID,
    '' AS Acct_Desc,
    CAST(v1.colB AS VARCHAR) AS Branch_Code,
    
    CASE 
        WHEN ((CAST(v1.colA AS VARCHAR) || CAST(v1.colB AS VARCHAR) || CAST(v1.colC AS VARCHAR)) LIKE '005500%')
            THEN NULL
        WHEN t3.colA IS NOT NULL
            THEN t3.colCurrency
        ELSE NULL
    END AS Acct_Currency,
    
    CASE 
        WHEN t3.colType = '55' THEN 'C'
        WHEN t4.orgType = '99' AND t5.clientId IS NOT NULL THEN 'C'
        WHEN t3.colCategory = '6633' AND t5.clientId IS NOT NULL THEN 'D'
        WHEN t6.clientId IS NOT NULL THEN 'A'
        ELSE 'B'
    END AS Acct_Type,
    
    CASE 
        WHEN t3.colType = '55' THEN ''
        WHEN t4.orgType = '99' AND t5.clientId IS NOT NULL THEN ''
        WHEN t3.colCategory = '6633' AND t5.clientId IS NOT NULL THEN 'TRUST'
        WHEN t6.clientId IS NOT NULL THEN ''
        ELSE ''
    END AS Acct_Other_Desc,
    
    COALESCE(t4.legalName, (t6.lastName || ' ' || t6.firstName)) AS Acct_Name,
    '' AS Acct_Name2,
    '' AS Acct_Name3,
    'ADMIN' AS Session_Id,
    CURRENT_DATE AS Receive_Time,
    '' AS Hash,    
    v1.expr_1,
    v1.expr_2

FROM (
    SELECT 
        t0.eventId,
        t1.colA,
        t1.colB,
        t1.colC,
        t1.accountType,
        t1.sequenceNo,
        
        CASE 
            WHEN COALESCE(CAST(t1.accountType AS VARCHAR), '') = ''
                THEN 'DEFAULT_TYPE'
            ELSE CAST(t1.accountType AS VARCHAR)
        END AS expr_1,
        
        CASE 
            WHEN COALESCE(CAST(t1.sequenceNo AS VARCHAR), '') = ''
                THEN '00'
            ELSE CAST(t1.sequenceNo AS VARCHAR)
        END AS expr_2

    FROM SOURCE_EVENT t0
    JOIN SOURCE_ACCOUNTS t1 
        ON t0.eventId = t1.eventId
        AND t1.usageType IN ('PRIMARY', 'UNIQUE')
        AND t0.appSource IN ('APP1', 'WEB', 'API', 'TOOL', 'GATEWAY')
) AS v1

INNER JOIN SOURCE_OPERATIONS t2 ON v1.eventId = t2.eventId
LEFT JOIN SOURCE_PARTIES t7 ON v1.eventId = t7.eventId

LEFT JOIN SOURCE_ENTITIES t3 
    ON (CAST(v1.colA AS VARCHAR) || CAST(v1.colB AS VARCHAR) || CAST(v1.colC AS VARCHAR)) 
    = (CAST(t3.colA AS VARCHAR) || CAST(t3.colB AS VARCHAR) || CAST(t3.colC AS VARCHAR))

LEFT JOIN (
    SELECT * FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY roleCode, isContact, colA, colB, colC
            ORDER BY Receive_Time DESC
        ) AS rn
        FROM ENTITY_PARTIES
    ) AS sub
    WHERE rn = 1
) AS t5 
    ON v1.colA = t5.colA
    AND v1.colB = t5.colB
    AND v1.colC = t5.colC
    AND t5.isContact = 'TRUE'

LEFT JOIN ORG_CLIENT t4 ON t5.clientId = t4.clientId
LEFT JOIN PERS_CLIENT t6 ON t5.clientId = t6.clientId;

Server configuration:

CPU: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz - 24 cores
RAM: 32GB
GPU: 2× Tesla P100-PCIE-16GB
OS: Debian 12

Can you confirm if this is a known bug, whether there's a patch or workaround, and help investigate further with additional logging or debugging?

Thank you.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions