Commit df1d2d9

DBT-556 Added support for materializing Kudu table through impala adapter (#207)
* DBT-556 Added support for materializing Kudu table through impala adapter.
* DBT-556 Addressed review comment.
* DBT-556 Incorporated a review comment in CONTRIBUTING.md file.
* DBT-556 Updated README.md with correct set of available tests.
1 parent eb66439 commit df1d2d9

7 files changed: +246 -34 lines changed

CONTRIBUTING.md

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ When `dbt-impala` is installed this way, any changes you make to the `dbt-impala
 `dbt-impala` contains [functional](https://github.com/cloudera/dbt-impala/tree/master/tests/functional/) tests. Functional tests require an actual Impala warehouse to test against.
 
 - You can run functional tests "locally" by configuring a `test.env` file with appropriate `ENV` variables.
+- To run the Kudu functional tests as part of the test suite when the underlying storage is Kudu, set the `ENV` variable `DISABLE_KUDU_TEST` to `false`. Kudu tests are disabled by default because this variable is set to `true`.
 
 ```
 cp test.env.example test.env

KUDU_INTEGRATION.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# Kudu Integration using dbt-impala
+
+The `dbt-impala` adapter allows you to use [dbt](https://www.getdbt.com/) along with [Apache Kudu](https://kudu.apache.org) and [Cloudera Data Platform](https://cloudera.com).
+
+
+## Getting started
+
+- [Install dbt](https://docs.getdbt.com/docs/installation)
+- Read the [introduction](https://docs.getdbt.com/docs/introduction/) and [viewpoint](https://docs.getdbt.com/docs/about/viewpoint/)
+
+### Requirements
+
+- In a CDP public cloud deployment, Kudu is available as one of the many Cloudera Runtime services within the Real-time Data Mart template.
+- To use Kudu, you can create a Data Hub cluster by selecting the Real-time Data Mart template in the Management Console.
+- Follow this [article](https://blog.cloudera.com/integrating-cloudera-data-warehouse-with-kudu-clusters) on integrating the created Kudu service with Impala CDW.
+
+
+For general instructions, please follow the [README](README.md) guidelines.

README.md

Lines changed: 34 additions & 33 deletions
@@ -7,6 +7,7 @@ The `dbt-impala` adapter allows you to use [dbt](https://www.getdbt.com/) along
 
 - [Install dbt](https://docs.getdbt.com/docs/installation)
 - Read the [introduction](https://docs.getdbt.com/docs/introduction/) and [viewpoint](https://docs.getdbt.com/docs/about/viewpoint/)
+- To use the `dbt-impala` adapter against [Apache Kudu](https://kudu.apache.org), please follow the [Kudu Integration](KUDU_INTEGRATION.md) guidelines.
 
 ### Requirements
 
@@ -40,40 +41,40 @@ demo_project:
 ```
 
 ## Supported features
-| Name | Supported | Iceberg |
-|------|-----------|---------|
-|Materialization: View|Yes| N/A |
-|Materialization: Table|Yes| Yes |
-|Materialization: Table with Partitions |Yes| Yes |
-|Materialization: Incremental - Append|Yes| Yes |
-|Materialization: Incremental - Append with Partitions |Yes| Yes |
-|Materialization: Incremental - Insert+Overwrite |Yes| Yes |
-|Materialization: Incremental - Insert+Overwrite with Partition |Yes| Yes |
-|Materialization: Incremental - Merge|No| No |
-|Materialization: Ephemeral|Yes| Yes |
-|Seeds|Yes| Yes |
-|Tests|Yes| Yes |
-|Snapshots|No| No |
-|Documentation|Yes| Yes |
-|Authentication: LDAP|Yes| Yes |
-|Authentication: Kerberos|Yes| No |
+| Name | Supported | Iceberg | Kudu |
+|------|-----------|---------|------|
+|Materialization: View|Yes| N/A | N/A |
+|Materialization: Table|Yes| Yes | Yes |
+|Materialization: Table with Partitions |Yes| Yes | No |
+|Materialization: Incremental - Append|Yes| Yes | Yes |
+|Materialization: Incremental - Append with Partitions |Yes| Yes | No |
+|Materialization: Incremental - Insert+Overwrite |Yes| Yes | Yes |
+|Materialization: Incremental - Insert+Overwrite with Partition |Yes| Yes | No |
+|Materialization: Incremental - Merge|No| No | No |
+|Materialization: Ephemeral|Yes| Yes | No |
+|Seeds|Yes| Yes | Yes |
+|Tests|Yes| Yes | Yes |
+|Snapshots|No| No | No |
+|Documentation|Yes| Yes | Yes |
+|Authentication: LDAP|Yes| Yes | Yes |
+|Authentication: Kerberos|Yes| No | No |
 
 ### Tests Coverage
 
 #### Functional Tests
-| Name | Base | Iceberg |
-|------|------|---------|
-|Materialization: View|Yes| N/A |
-|Materialization: Table|Yes| Yes |
-|Materialization: Table with Partitions |Yes| Yes |
-|Materialization: Incremental - Append|Yes| Yes |
-|Materialization: Incremental - Append with Partitions |Yes| Yes |
-|Materialization: Incremental - Insert+Overwrite |Yes| Yes |
-|Materialization: Incremental - Insert+Overwrite with Partition |Yes| Yes |
-|Materialization: Ephemeral|Yes| Yes |
-|Seeds|Yes| Yes |
-|Tests|Yes| Yes |
-|Snapshots|No| No |
-|Documentation| Yes | Yes |
-|Authentication: LDAP|Yes| Yes |
-|Authentication: Kerberos|No| No |
+| Name | Base | Iceberg | Kudu |
+|------|------|---------|------|
+|Materialization: View|Yes| N/A | N/A |
+|Materialization: Table|Yes| Yes | Yes |
+|Materialization: Table with Partitions |Yes| Yes | No |
+|Materialization: Incremental - Append|Yes| Yes | Yes |
+|Materialization: Incremental - Append with Partitions |Yes| Yes | No |
+|Materialization: Incremental - Insert+Overwrite |Yes| No | No |
+|Materialization: Incremental - Insert+Overwrite with Partition |Yes| Yes | No |
+|Materialization: Ephemeral|Yes| Yes | No |
+|Seeds|Yes| Yes | Yes |
+|Tests|Yes| Yes | Yes |
+|Snapshots|No| No | No |
+|Documentation| Yes | Yes | Yes |
+|Authentication: LDAP|Yes| Yes | Yes |
+|Authentication: Kerberos|No| No | No |

dbt/adapters/impala/relation.py

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ class ImpalaIncludePolicy(Policy):
 class ImpalaRelation(BaseRelation):
     quote_policy: ImpalaQuotePolicy = field(default_factory=lambda: ImpalaQuotePolicy())
     include_policy: ImpalaIncludePolicy = field(default_factory=lambda: ImpalaIncludePolicy())
-    quote_character: str = None
+    quote_character: str = "`"
     information: str = None
 
     def __post_init__(self):
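
The effect of the `quote_character` change can be illustrated with a small standalone sketch. `SketchRelation` and its methods are hypothetical names invented for this illustration; this is not the `ImpalaRelation` or `BaseRelation` implementation, only a model of wrapping identifiers in a configurable quote character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchRelation:
    """Hypothetical stand-in for a relation; not the dbt BaseRelation API."""

    schema: str
    identifier: str
    quote_character: str = "`"  # same default the diff introduces

    def quoted(self, name: str) -> str:
        # Wrap a single identifier part in the quote character.
        return f"{self.quote_character}{name}{self.quote_character}"

    def render(self) -> str:
        # Join the quoted schema and identifier with a dot.
        return f"{self.quoted(self.schema)}.{self.quoted(self.identifier)}"


if __name__ == "__main__":
    print(SketchRelation("my_schema", "base").render())
```

In this sketch a `None` quote character would be interpolated as the literal text `None`, which is one reason a concrete string default is easier to reason about.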

dbt/include/impala/macros/adapters.sql

Lines changed: 9 additions & 0 deletions
@@ -68,6 +68,14 @@
   {%- endif %}
 {%- endmacro -%}
 
+{% macro ct_option_primary_key(label, required=false) %}
+  {%- set primaryKey = config.get('primary_key', validator=validation[basestring]) -%}
+
+  {%- if primaryKey is not none %}
+    {{label}} {{primaryKey}}
+  {%- endif %}
+{%- endmacro -%}
+
 {% macro ct_option_stored_as(label, required=false) %}
   {%- set storedAs = config.get('stored_as', validator=validation[basestring]) -%}
 
@@ -180,6 +188,7 @@
     {{ ct_option_row_format(label="row format") }}
     {{ ct_option_with_serdeproperties(label="with serdeproperties") }}
     {%- if table_type == 'iceberg' -%} STORED BY ICEBERG {%- endif -%}
+    {{ ct_option_primary_key(label="PRIMARY KEY") }}
     {{ ct_option_stored_as(label="stored as") }}
     {{ ct_option_location_clause(label="location") }}
     {{ ct_option_cached_in(label="cached in") }}
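
The new macro emits its label followed by the configured value, and emits nothing when `primary_key` is unset. A rough Python model of that behavior is sketched below. The helper names `ct_option` and `table_options` are hypothetical, and the real `stored_as` macro may do more than this stand-in; only the ordering (primary key before `stored as`) is taken from the diff.

```python
from typing import Optional


def ct_option(label: str, value: Optional[str]) -> str:
    # Mirror the macro: render "<label> <value>" only when a value is configured.
    return "" if value is None else f"{label} {value}"


def table_options(config: dict) -> str:
    # Assumed simplification: only the two options relevant to Kudu are modeled,
    # in the order they appear in the create-table macro.
    parts = [
        ct_option("PRIMARY KEY", config.get("primary_key")),
        ct_option("stored as", config.get("stored_as")),
    ]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    print(table_options({"primary_key": "(id)", "stored_as": "kudu"}))
    print(table_options({"stored_as": "parquet"}))
```

Because the value is inserted verbatim, the model config has to supply the parentheses itself, as the tests do with `primary_key="(id)"`.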

test.env.example

Lines changed: 1 addition & 0 deletions
@@ -17,3 +17,4 @@ IMPALA_SCHEMA=my_schema
 IMPALA_USER=my_user
 IMPALA_PASSWORD=my_password
 IMPALA_HTTP_PATH=my_http_path
+DISABLE_KUDU_TEST=true
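
How this flag gates the Kudu tests can be sketched as follows. The function name `kudu_tests_disabled` is hypothetical; the comparison itself (unset defaults to `"true"`, exact string match) follows the `skipif` condition in the Kudu test module of this commit.

```python
import os
from typing import Mapping, Optional


def kudu_tests_disabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the Kudu tests would be skipped."""
    env = os.environ if environ is None else environ
    # An unset variable falls back to "true"; the match is exact and case-sensitive.
    return env.get("DISABLE_KUDU_TEST", "true") == "true"


if __name__ == "__main__":
    print(kudu_tests_disabled({}))
    print(kudu_tests_disabled({"DISABLE_KUDU_TEST": "false"}))
    print(kudu_tests_disabled({"DISABLE_KUDU_TEST": "True"}))
```

Note that under this comparison any value other than the exact string `true`, including `True`, leaves the Kudu tests enabled.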
Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
+# Copyright 2024 Cloudera Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import pytest
+import os
+from dbt.tests.util import run_dbt, relation_from_name, check_relations_equal
+
+from dbt.tests.adapter.basic.test_incremental import (
+    BaseIncremental,
+    BaseIncrementalNotSchemaChange,
+)
+
+from dbt.tests.adapter.basic.files import (
+    schema_base_yml,
+    model_incremental,
+)
+
+pytestmark = pytest.mark.skipif(
+    os.getenv(key="DISABLE_KUDU_TEST", default="true") == "true",
+    reason="Kudu tests will be run when DISABLE_KUDU_TEST is set to false in test.env",
+)
+
+incremental_kudu_sql = (
+    """
+{{
+    config(
+        materialized="incremental",
+        stored_as="kudu",
+        primary_key="(id)"
+    )
+}}
+""".strip()
+    + model_incremental
+)
+
+
+class TestIncrementalKudu(BaseIncremental):
+    @pytest.fixture(scope="class")
+    def project_config_update(self):
+        return {"name": "incremental_test_model"}
+
+    @pytest.fixture(scope="class")
+    def models(self):
+        return {"incremental_test_model.sql": incremental_kudu_sql, "schema.yml": schema_base_yml}
+
+    def test_incremental(self, project):
+        # seed command
+        results = run_dbt(["seed"])
+        assert len(results) == 2
+
+        # base table rowcount
+        relation = relation_from_name(project.adapter, "base")
+        result = project.run_sql(f"select count(*) as num_rows from {relation}", fetch="one")
+        assert result[0] == 10
+
+        # added table rowcount
+        relation = relation_from_name(project.adapter, "added")
+        result = project.run_sql(f"select count(*) as num_rows from {relation}", fetch="one")
+        assert result[0] == 20
+
+        # run command
+        # the "seed_name" var changes the seed identifier in the schema file
+        results = run_dbt(["run", "--vars", "seed_name: base"])
+        assert len(results) == 1
+
+        # check relations equal
+        check_relations_equal(project.adapter, ["base", "incremental_test_model"])
+
+        # change seed_name var
+        # the "seed_name" var changes the seed identifier in the schema file
+        results = run_dbt(["run", "--vars", "seed_name: added"])
+        assert len(results) == 1
+
+        # check relations equal
+        check_relations_equal(project.adapter, ["added", "incremental_test_model"])
+
+        # run full-refresh and compare with base table again
+        results = run_dbt(
+            [
+                "run",
+                "--select",
+                "incremental_test_model",
+                "--full-refresh",
+                "--vars",
+                "seed_name: base",
+            ]
+        )
+        assert len(results) == 1
+
+        check_relations_equal(project.adapter, ["base", "incremental_test_model"])
+
+        # get catalog from docs generate
+        catalog = run_dbt(["docs", "generate"])
+        assert len(catalog.nodes) == 3
+        assert len(catalog.sources) == 1
+
+
+insertoverwrite_sql = """
+{{
+    config(
+        materialized="incremental",
+        incremental_strategy="insert_overwrite",
+        partition_by="id_partition",
+        stored_as="kudu",
+        primary_key="(id)"
+    )
+}}
+select *, id as id_partition from {{ source('raw', 'seed') }}
+{% if is_incremental() %}
+    where id > (select max(id) from {{ this }})
+{% endif %}
+""".strip()
+
+
+@pytest.mark.skip(reason="Need to fix partition by syntax for Kudu")
+class TestInsertoverwriteKudu(TestIncrementalKudu):
+    @pytest.fixture(scope="class")
+    def models(self):
+        return {"incremental_test_model.sql": insertoverwrite_sql, "schema.yml": schema_base_yml}
+
+
+incremental_single_partitionby_sql = """
+{{
+    config(
+        materialized="incremental",
+        partition_by="id_partition",
+        stored_as="kudu",
+        primary_key="(id)"
+    )
+}}
+select *, id as id_partition from {{ source('raw', 'seed') }}
+{% if is_incremental() %}
+    where id > (select max(id) from {{ this }})
+{% endif %}
+""".strip()
+
+
+@pytest.mark.skip(reason="Need to fix partition by syntax for Kudu")
+class TestIncrementalWithSinglePartitionKeyKudu(TestIncrementalKudu):
+    @pytest.fixture(scope="class")
+    def models(self):
+        return {
+            "incremental_test_model.sql": incremental_single_partitionby_sql,
+            "schema.yml": schema_base_yml,
+        }
+
+
+incremental_multiple_partitionby_sql = """
+{{
+    config(
+        materialized="incremental",
+        partition_by=["id_partition1", "id_partition2"],
+        stored_as="kudu",
+        primary_key="(id)"
+    )
+}}
+select *, id as id_partition1, id as id_partition2 from {{ source('raw', 'seed') }}
+{% if is_incremental() %}
+    where id > (select max(id) from {{ this }})
+{% endif %}
+""".strip()
+
+
+@pytest.mark.skip(reason="Need to fix partition by syntax for Kudu")
+class TestIncrementalWithMultiplePartitionKeyKudu(TestIncrementalKudu):
+    @pytest.fixture(scope="class")
+    def models(self):
+        return {
+            "incremental_test_model.sql": incremental_multiple_partitionby_sql,
+            "schema.yml": schema_base_yml,
+        }
