[CT-1109] ParseException - mismatch input table #446
Describe the bug
I am running dbt-spark 1.2.0 and submitting to an AWS EMR Spark cluster (Spark 3.1.2) via the thrift method.
I ran dbt run and got the following error:
Error while compiling statement: FAILED: ParseException line 3:24 mismatched input 'table' expecting KW_VIEW near 'replace' in create view statement
The generated SQL statement was:
create or replace table target_database_name.target_table_name
using delta
location '<S3_LOCATION>'
as
select * from source_database_name.source_table_name
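A note on the error text: create or replace table is Spark 3 DDL, whereas in Hive's grammar "or replace" is only valid for views, which matches the parser expecting KW_VIEW here. That makes me suspect (an assumption on my part, not something I have confirmed) that the statement is being parsed by a Hive-style parser rather than Spark's. For comparison, here is a minimal sketch of what the macro's non-Delta branch would emit for the same model (same names and location as above), which avoids the rejected construct:
create table if not exists target_database_name.target_table_name
using delta
location '<S3_LOCATION>'
as
select * from source_database_name.source_table_name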
# profiles.yml
db_test:
  outputs:
    dev:
      type: spark
      method: thrift
      host: IP_ADDRESS
      port: 10000
      user: user
      schema: target_database_name
      connect_retries: 0
      connect_timeout: 10
      retry_all: true
  target: dev
# dbt_project.yml
name: 'db_test'
version: '1.0.0'
config-version: 2
profile: 'db_test'
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"
models:
  +file_format: delta
  +materialized: table
  +location_root: S3_PATH
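Note: the +file_format: delta setting above is what routes table creation through the create or replace table branch of spark__create_table_as (the macro is reproduced under Additional context below).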
Steps To Reproduce
# Model File
select * from {{ source("source_database_name", "source_table_name") }}
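For completeness, the source() call above assumes a source definition roughly like the following. This is an illustrative sketch; the file name and the schema mapping are my shorthand, not copied verbatim from the project:
# models/sources.yml (illustrative)
version: 2
sources:
  - name: source_database_name
    schema: source_database_name # assumes the source name doubles as the schema
    tables:
      - name: source_table_name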
Expected behavior
I expected a new table to be created in the target database as a copy of the source table.
System information
The output of dbt --version:
Core:
- installed: 1.2.1
- latest: 1.2.1 - Up to date!
Plugins:
- spark: 1.2.0 - Up to date!
The operating system you're using:
macOS
The output of python --version:
Local machine: Python 3.9.1
The EMR Spark cluster runs Python 3.7.
Additional context
I pasted the generated SQL into spark.sql("") on the cluster and it worked fine.
I added this macro from an older GitHub issue; it fixed a separate problem I was seeing:
{% macro spark__list_relations_without_caching(relation) %}
  {#-- Build a pattern of model names so `show table extended` does not scan every table --#}
  {% set rels = [] %}
  {% for node in graph.nodes.values() | selectattr("resource_type", "equalto", "model") %}
    {% do rels.append(node.fqn[1]) %}
  {% endfor %}
  {% if rels | length > 1 %}
    {% set suffix = rels | join('|') %}
  {% else %}
    {% set suffix = '*' %}
  {% endif %}
  {% call statement('list_relations_without_caching', fetch_result=True) -%}
    show table extended in {{ relation }} like '{{ suffix }}'
  {% endcall %}
  {% do return(load_result('list_relations_without_caching').table) %}
{% endmacro %}
{%- macro spark__create_table_as(temporary, relation, compiled_code, language='sql') -%}
  {%- if language == 'sql' -%}
    {%- if temporary -%}
      {{ create_temporary_view(relation, compiled_code) }}
    {%- else -%}
      {% if config.get('file_format', validator=validation.any[basestring]) == 'delta' %}
        create or replace table {{ relation }}
      {% else %}
        create table if not exists {{ relation }}
      {% endif %}
      {{ file_format_clause() }}
      {{ options_clause() }}
      {{ partition_cols(label="partitioned by") }}
      {{ clustered_cols(label="clustered by") }}
      {{ location_clause() }}
      {{ comment_clause() }}
      as
      {{ compiled_code }}
    {%- endif -%}
  {%- elif language == 'python' -%}
    {#--
    N.B. Python models _can_ write to temp views HOWEVER they use a different session
    and have already expired by the time they need to be used (I.E. in merges for incremental models)
    TODO: Deep dive into spark sessions to see if we can reuse a single session for an entire
    dbt invocation.
    --#}
    {{ py_write_table(compiled_code=compiled_code, target_relation=relation) }}
  {%- endif -%}
{%- endmacro -%}
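Both overrides live in my project's macros/ directory. As I understand dbt's dispatch, spark__-prefixed macros in the root project take precedence over the adapter package's built-ins, so these shadow the stock dbt-spark implementations.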