This repository was archived by the owner on Sep 2, 2025. It is now read-only.

[CT-1109] ParseException - mismatch input table #446

@twcardenas

Description

Describe the bug


I am running dbt-spark 1.2.0 and connecting to an AWS EMR Spark cluster over Thrift. The cluster runs Spark 3.1.2.
I ran `dbt run` and got this error:

Error while compiling statement: FAILED: ParseException line 3:24 mismatched input 'table' expecting KW_VIEW near 'replace' in create view statement
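The `expecting KW_VIEW` token suggests the statement is being parsed by Hive's grammar, which only allows `OR REPLACE` on views, rather than by Spark 3's SQL parser, which does accept that form for Delta tables. A rough illustration of the difference (statements below are examples, not from the logs):

    -- Valid in HiveQL: OR REPLACE exists only for views
    CREATE OR REPLACE VIEW db.v AS SELECT 1;

    -- Valid in Spark 3.x with Delta: OR REPLACE is allowed for tables
    CREATE OR REPLACE TABLE db.t USING delta AS SELECT 1;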

The SQL statement generated was:

      create or replace table target_database_name.target_table_name
      using delta
      location '<S3_LOCATION>'
      as
        select * from source_database_name.source_table_name

# profiles.yml
db_test:
  outputs:
    dev:
      type: spark
      method: thrift
      host: IP_ADDRESS
      port: 10000
      user: user
      schema: target_database_name
      connect_retries: 0
      connect_timeout: 10
      retry_all: true
  target: dev

# dbt_project.yml

name: 'db_test'
version: '1.0.0'
config-version: 2

profile: 'db_test'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  # directory which will store compiled SQL files
clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"

models:
  +file_format: delta
  +materialized: table
  +location_root: S3_PATH
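
For reference, dbt-spark's location_clause appends the model identifier to location_root, so (if I'm reading the adapter macro correctly) the config above should render a clause like this for the model below:

    location 'S3_PATH/target_table_name'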

Steps To Reproduce


# Model File
select * from {{ source("source_database_name", "source_table_name") }}
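
For the source() call to resolve, the project would also need a sources definition along these lines; this file is not in the original report, so the names and layout here are assumed:

# models/sources.yml (assumed)
version: 2

sources:
  - name: source_database_name
    schema: source_database_name
    tables:
      - name: source_table_name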

Expected behavior

I expected a new table, a copy of the source table, to be created in the target database.

Screenshots and log output


System information

The output of dbt --version:

Core:
  - installed: 1.2.1
  - latest:    1.2.1 - Up to date!

Plugins:
  - spark: 1.2.0 - Up to date!

The operating system you're using:
macOS

The output of python --version:
local machine: Python 3.9.1
EMR Spark cluster: Python 3.7

Additional context


I pasted the generated SQL into spark.sql("") on the cluster and it ran fine.
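
For what it's worth, that check looks roughly like the snippet below (a sketch; it assumes a pyspark shell on the cluster with Delta support enabled, and uses the placeholder names from above):

# Run on the EMR cluster to confirm Spark's own parser accepts the DDL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Same statement dbt generated; it parses and runs under Spark 3's SQL parser.
spark.sql("""
    create or replace table target_database_name.target_table_name
    using delta
    location '<S3_LOCATION>'
    as select * from source_database_name.source_table_name
""")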

I also added the macro below, which I found on an older issue; it fixed a separate problem I was seeing:

{% macro spark__list_relations_without_caching(relation) %}
  {# Build an explicit LIKE pattern from the model names in the graph,
     instead of the default '*', to work around SHOW TABLE EXTENDED
     failures on some Spark/Hive deployments. #}
  {% set rels = [] %}
  {% for node in graph.nodes.values() | selectattr("resource_type", "equalto", "model") %}
      {% do rels.append(node.fqn[1]) %}
  {% endfor %}

  {% if rels | length > 1 %}
    {% set suffix = rels | join('|') %}
  {% else %}
    {% set suffix = '*' %}
  {% endif %}

  {% call statement('list_relations_without_caching', fetch_result=True) -%}
    show table extended in {{ relation }} like '{{ suffix }}'
  {% endcall %}
  {% do return(load_result('list_relations_without_caching').table) %}
{% endmacro %}
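
If I understand dbt's macro dispatch correctly, a macro named spark__list_relations_without_caching in the project's own macros/ directory takes precedence over the adapter's built-in implementation, which is how this override takes effect. The spark__create_table_as macro below, which appears to match the adapter's built-in in 1.2, shows where the create or replace table form comes from: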

{%- macro spark__create_table_as(temporary, relation, compiled_code, language='sql') -%}
  {%- if language == 'sql' -%}
    {%- if temporary -%}
      {{ create_temporary_view(relation, compiled_code) }}
    {%- else -%}
      {% if config.get('file_format', validator=validation.any[basestring]) == 'delta' %}
        create or replace table {{ relation }}
      {% else %}
        create table if not exists {{ relation }}
      {% endif %}
      {{ file_format_clause() }}
      {{ options_clause() }}
      {{ partition_cols(label="partitioned by") }}
      {{ clustered_cols(label="clustered by") }}
      {{ location_clause() }}
      {{ comment_clause() }}
      as
      {{ compiled_code }}
    {%- endif -%}
  {%- elif language == 'python' -%}
    {#--
    N.B. Python models _can_ write to temp views; however, they use a different
    session, which has already expired by the time the views are needed
    (e.g. in merges for incremental models).

    TODO: Deep dive into Spark sessions to see if we can reuse a single session
    for an entire dbt invocation.
     --#}
    {{ py_write_table(compiled_code=compiled_code, target_relation=relation) }}
  {%- endif -%}
{%- endmacro -%}
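
Tracing the sql branch: with +file_format: delta, the macro emits create or replace table, which is exactly the form the Thrift endpoint rejected, while any other file format takes the create table if not exists branch, a form older parsers generally accept as well. Rendered roughly (values assumed, clauses abbreviated):

    -- file_format: delta  (requires Spark 3's parser)
    create or replace table target_database_name.target_table_name
    using delta
    location 'S3_PATH/target_table_name'
    as
    select * from source_database_name.source_table_name

    -- any other file_format
    create table if not exists target_database_name.target_table_name
    as
    select * from source_database_name.source_table_name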

Metadata

Labels: type:bug (Something isn't working)