[column lineage]  Something wrong with column lineage(spark  sql)

**Describe the bug**
First we want to get the column lineage with insert sql(spark sql, running in hadoop Hadoop ).But In some cases, the results of field lineage analysis based on Spark SQL are incorrect，such as when no target table field information is specified after **_insert table  xxx_** expression:

**SQL**
Paste the SQL text here. For example:
```sql
INSERT OVERWRITE table data_ds.data_metric_di PARTITION(ds=20251013)
SELECT uin,
         device,
         play_cnt,
         sum(if(play_progress_cnt>90,
         1,
         0)) play_progress90_cnt,
        
FROM data_dwd.data_metric_ori
WHERE ds=20251013
```

**To Reproduce**
*Note here we refer to SQL provided in prior step as stored in a file named `test.sql`*

 API (Application Programming Interface): provide the python code you're using and the output. 
For example:
```python
from sqllineage.runner import LineageRunner
from sqllineage.core.metadata.dummy import DummyMetaDataProvider

if __name__ == '__main__':
    sql ='''
    INSERT OVERWRITE table data_ds.data_metric_di PARTITION(ds=20251013)
    SELECT uin,
            device,
            play_cnt,
            sum(if(play_progress_cnt>90,
            1,
            0)) play_progress90_cnt,
        
    FROM data_dwd.data_metric_ori
    WHERE ds=20251013
    '''
    metadata = dict()
    metadata = {"data_ds.data_metric_di":["ds","uin","device","pcr_cnt","pcr90_cnt"], "data_dwd.data_metric_ori":["ds","uin","device","play_cnt","play_progress_cnt"]}
    provider = DummyMetaDataProvider(metadata)
    rs = LineageRunner(sql=sql, dialect='sparksql', metadata_provider=provider,verbose=True)
    columns = rs.get_column_lineage(exclude_subquery_columns=True)
    print("column_lineage:", columns)
```
```
[Table: <default>.analyze]
```
the output is as follows:
`[(Column: data_dwd.data_metric_ori.device, Column: data_ds.data_metric_di.device), (Column: data_dwd.data_metric_ori.play_cnt, Column: data_ds.data_metric_di.play_cnt), (Column: data_dwd.data_metric_ori.play_progress_cnt, Column: data_ds.data_metric_di.play_progress90_cnt), (Column: data_dwd.data_metric_ori.uin, Column: data_ds.data_metric_di.uin)]`

**Expected behavior**
A clear and concise description of what you expected to happen, and the output in accordance with the `To Reproduce` section.

The sql is legal and can be used to insert data correctly in the Hadoop environment .But we can see that   _**(Column: data_dwd.data_metric_ori.play_progress_cnt, Column: data_ds.data_metric_di.play_progress90_cnt)**_ is wrong，it should be  _**(Column: data_dwd.data_metric_ori.play_progress_cnt, Column: data_ds.data_metric_di.pcr90_cnt)**_ ，as we have already declared the columns of the target table data_ds.data_metric_di.Is there any way to solve this problem?



**Python version (available via `python --version`)**
 - 3.10

**SQLLineage version (available via `sqllineage --version`):**
 - 1.5.3

**Additional context**
Add any other context about the problem here.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[column lineage] Something wrong with column lineage(spark sql) #733

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[column lineage] Something wrong with column lineage(spark sql) #733

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions