Commit 6632fd0

jankiram84 authored and wzhou-code committed
IMPALA-12754: [DOCS] External JDBC table support
Created the docs for Impala external JDBC table support

Change-Id: I5360389037ae9ee675ab406d87617d55d476bf8f
Reviewed-on: http://gerrit.cloudera.org:8080/21539
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: gaurav singh <[email protected]>
Reviewed-by: Wenzhe Zhou <[email protected]>
1 parent 4c00cbf commit 6632fd0

2 files changed

+231
-0
lines changed

docs/impala.ditamap

Lines changed: 1 addition & 0 deletions
@@ -328,6 +328,7 @@ under the License.
328328
<topicref href="topics/impala_rcfile.xml"/>
329329
<topicref href="topics/impala_seqfile.xml"/>
330330
</topicref>
331+
<topicref href="topics/impala_jdbc_external_table.xml"/>
331332
<topicref href="topics/impala_kudu.xml"/>
332333
<topicref href="topics/impala_hbase.xml"/>
333334
<topicref rev="4.1.0" href="topics/impala_iceberg.xml"/>
Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
1+
<?xml version="1.0" encoding="UTF-8"?>
2+
<!--
3+
Licensed to the Apache Software Foundation (ASF) under one
4+
or more contributor license agreements. See the NOTICE file
5+
distributed with this work for additional information
6+
regarding copyright ownership. The ASF licenses this file
7+
to you under the Apache License, Version 2.0 (the
8+
"License"); you may not use this file except in compliance
9+
with the License. You may obtain a copy of the License at
10+
11+
http://www.apache.org/licenses/LICENSE-2.0
12+
13+
Unless required by applicable law or agreed to in writing,
14+
software distributed under the License is distributed on an
15+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
16+
KIND, either express or implied. See the License for the
17+
specific language governing permissions and limitations
18+
under the License.
19+
-->
20+
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
21+
<concept id="impala_jdbc_external_table_support">
22+
<title id="jdbc_external_table">Using Impala to Query External JDBC Data Sources</title>
23+
<prolog>
24+
<metadata>
25+
<data name="Category" value="Impala"/>
26+
<data name="Category" value="JDBC external tables"/>
27+
<data name="Category" value="Remote data sources"/>
28+
<data name="Category" value="SQL"/>
29+
<data name="Category" value="JDBC External table properties"/>
30+
<data name="Category" value="Querying JDBC external tables"/>
31+
<data name="Category" value="Modifying JDBC external tables"/>
32+
<data name="Category" value="Analyst"/>
33+
<data name="Category" value="Developers"/>
34+
</metadata>
35+
</prolog>
36+
<conbody>
37+
<p>Apache Impala now supports reading from external JDBC data sources. An external JDBC table represents a table or a view in a remote relational database or in another Impala cluster. Using external JDBC tables, you can connect Impala to a remote database, such as MySQL or PostgreSQL, or to another Impala cluster, and read the data in the remote tables.</p>
38+
<section>
39+
<title>Syntax</title>
40+
<p>To connect to a remote database, you create an external JDBC table with the appropriate table properties, such as the database type, JDBC URL, driver class, driver file location, JDBC username and password, and name of the remote table to be mapped to the Impala external JDBC table.</p>
41+
<codeblock id="codeblock_klg_tcd_5bc">CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name
42+
(<i>col_name data_type</i>,
43+
....)
44+
STORED BY JDBC
45+
TBLPROPERTIES (
46+
"database.type"="<i>value</i>",
47+
"jdbc.url"="<i>value</i>",
48+
"jdbc.driver"="<i>value</i>",
49+
"driver.url"="<i>value</i>",
50+
"dbcp.username"="<i>value</i>",
51+
"dbcp.password"="<i>value</i>",
52+
"table"="<i>table.name</i>");</codeblock>
53+
<p><b>Examples:</b></p>
54+
<p><b>Creating an external JDBC table to map a table in a remote PostgreSQL
55+
database:</b><codeblock id="codeblock_f3m_jdd_5bc">CREATE EXTERNAL TABLE student_jdbc (
56+
id INT,
57+
bool_col BOOLEAN,
58+
tinyint_col TINYINT,
59+
smallint_col SMALLINT,
60+
int_col INT,
61+
bigint_col BIGINT,
62+
float_col FLOAT,
63+
double_col DOUBLE,
64+
date_col DATE,
65+
string_col STRING,
66+
timestamp_col TIMESTAMP)
67+
STORED BY JDBC
68+
TBLPROPERTIES (
69+
"database.type"="POSTGRES",
70+
"jdbc.url"="jdbc:postgresql://<i>IP_address</i>:5432/<i>database_name</i>",
71+
"jdbc.driver"="org.postgresql.Driver",
72+
"driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
73+
"dbcp.username"="user",
74+
"dbcp.password"="password",
75+
"table"="student");
76+
</codeblock></p>
77+
<p><b>Creating an external JDBC table to map a table in another Impala
78+
cluster:</b><codeblock id="codeblock_csx_qdd_5bc">CREATE EXTERNAL TABLE student_jdbc (
79+
id INT,
80+
bool_col BOOLEAN,
81+
tinyint_col TINYINT,
82+
smallint_col SMALLINT,
83+
int_col INT,
84+
bigint_col BIGINT,
85+
float_col FLOAT,
86+
double_col DOUBLE,
87+
date_col DATE,
88+
string_col STRING,
89+
timestamp_col TIMESTAMP)
90+
STORED BY JDBC
91+
TBLPROPERTIES (
92+
"database.type"="IMPALA",
93+
"jdbc.url"="jdbc:impala://<i>IP_address</i>:21050/<i>database_name</i>",
94+
"jdbc.auth"="AuthMech=3",
95+
"jdbc.properties"="MEM_LIMIT=1000000000, MAX_ERRORS = 10000",
96+
"jdbc.driver"="com.cloudera.impala.jdbc.Driver",
97+
"driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
98+
"dbcp.username"="user",
99+
"dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
100+
"dbcp.password.key"="password-key",
101+
"table"="student");</codeblock></p>
102+
</section>
103+
<section id="section_mtj_z2d_5bc">
104+
<title>Table Properties</title>
105+
<p>When creating an external JDBC table, you must specify the following
106+
table properties:</p>
107+
<ul id="ul_q55_hfd_5bc">
108+
<li dir="ltr"><codeph>database.type</codeph>: POSTGRES, MYSQL, or IMPALA</li>
109+
<li dir="ltr"><codeph>jdbc.url</codeph>: JDBC connection string with the required
110+
parameters: database type, hostname or IP address, port number, and database name.<p
111+
dir="ltr">Example: <codeph>jdbc:impala://10.96.132.138:21050/sample_db</codeph></p></li>
112+
<li dir="ltr"><codeph>jdbc.driver</codeph>: Class name of the JDBC driver</li>
113+
<li dir="ltr"><codeph>driver.url</codeph>: URL to download the JAR file package that
114+
is used to access the external database</li>
115+
<li><codeph>table</codeph>: Name of the table in the remote database that you want
116+
to map in Impala</li>
117+
</ul>
118+
<p>In addition to the required properties, you can specify optional properties that
119+
let you use different authentication methods, handle case-sensitive column names
120+
in remote tables, or specify additional database properties:</p>
121+
<ul id="ul_mmg_4fd_5bc">
122+
<li dir="ltr"><codeph>jdbc.auth</codeph>: Authentication mechanism of the JDBC
123+
driver</li>
124+
<li dir="ltr"><codeph>dbcp.username</codeph>: JDBC username</li>
125+
<li dir="ltr"><codeph>dbcp.password</codeph>: JDBC password in clear text.<note
126+
id="note_bqh_tfd_5bc">Storing JDBC passwords in clear text is not
127+
recommended in production environments. The recommended way is to store
128+
the password in a Java keystore file.</note></li>
129+
<li dir="ltr"><codeph>dbcp.password.key</codeph>: Key under which the password is stored in the Java keystore file</li>
130+
<li dir="ltr"><codeph>dbcp.password.keystore</codeph>: URI of the keystore
131+
file</li>
132+
<li dir="ltr"><codeph>jdbc.properties</codeph>: Additional properties passed to the
133+
remote database engine, such as Impala query options. Specify the properties as
134+
comma-separated <codeph>key=value</codeph> pairs.</li>
135+
<li dir="ltr"><codeph>jdbc.fetch.size</codeph>: Number of rows to fetch in a
136+
batch</li>
137+
<li><codeph>column.mapping</codeph>: Mapping of column names between the Impala external
138+
JDBC table and the remote table.</li>
139+
</ul>
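<p>As a minimal sketch, an optional property such as <codeph>jdbc.fetch.size</codeph> can be set on an existing table with <codeph>ALTER TABLE ... SET TBLPROPERTIES</codeph>. The table name <codeph>student_jdbc</codeph> comes from the examples above; the value 1000 is only an illustrative assumption, not a recommended setting:</p>
<codeblock>-- Hypothetical value: fetch 1000 rows per batch from the remote database
ALTER TABLE student_jdbc
SET TBLPROPERTIES ("jdbc.fetch.size"="1000");</codeblock>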
140+
</section>
141+
<section id="section_bhv_zfd_5bc">
142+
<title>Supported Data Types</title>
143+
<p>The following column data types are supported for an Impala external JDBC table:</p>
144+
<ul id="ul_ky1_cgd_5bc">
145+
<li dir="ltr">Numeric data type: boolean, tinyint, smallint, int, bigint, float,
146+
double</li>
147+
<li dir="ltr">Decimal with scale and precision</li>
148+
<li dir="ltr">String type: string</li>
149+
<li dir="ltr">Date</li>
150+
<li dir="ltr">Timestamp</li>
151+
</ul>
152+
</section>
153+
<section id="section_e1g_fgd_5bc">
154+
<title>Limitations</title>
155+
<p>Be aware of the following limitations when using Impala external JDBC tables:</p>
156+
<ul id="ul_xkf_ggd_5bc">
157+
<li dir="ltr">The following column data types are not supported: char, varchar, binary,
158+
and complex data types (struct, map, array, and nested types)</li>
159+
<li dir="ltr">JDBC tables have to be defined one table at a time</li>
160+
<li dir="ltr">Writing to a JDBC table is not supported</li>
161+
<li dir="ltr">Only binary predicates with the operators =, !=, &lt;=, >=,
162+
&lt;, and > are pushed down to the remote RDBMS</li>
163+
</ul>
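<p>To illustrate the last limitation, the following sketch uses the <codeph>student_jdbc</codeph> table from the examples above. The filter values are assumptions chosen for illustration. Both conditions are binary predicates that use the listed operators, so they are candidates for being pushed to the remote database:</p>
<codeblock>-- Binary predicates with >= and = (illustrative values)
SELECT id, string_col
FROM student_jdbc
WHERE int_col >= 10 AND bool_col = true;</codeblock>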
164+
</section>
165+
<section id="secure_jdbc_password">
166+
<title>Securing the JDBC Password</title>
167+
<p>The <codeph>dbcp.password</codeph> table property stores the JDBC password in clear
168+
text. To avoid the risk of a password leak, the <codeph>SHOW CREATE TABLE
169+
&lt;table-name></codeph> and <codeph>DESCRIBE FORMATTED | EXTENDED
170+
&lt;table-name></codeph> statements mask the value of the
171+
<codeph>dbcp.password</codeph> table property in their outputs.</p>
172+
<p>In production environments, do not store the JDBC password
173+
in clear text using the <codeph>dbcp.password</codeph> table property. Instead,
174+
store the password in a Java keystore file on HDFS or on cloud storage such as
175+
Amazon S3, using the <codeph>hadoop credential</codeph> command:</p>
176+
<p><b>Creating a Java keystore file on HDFS with the key as "host1.password" and
177+
password as
178+
"passwd1":</b><codeblock id="codeblock_fgg_qgd_5bc">hadoop credential create host1.password -provider jceks://hdfs/user/foo/test.jceks -v passwd1</codeblock></p>
179+
<p><b>Creating a Java keystore file on Amazon S3 with the key as "impala" and password
180+
as
181+
"passwd2":</b><codeblock id="codeblock_gll_rgd_5bc">hadoop credential create impala -provider jceks://s3a@dw-impala-test/jceks/demo.jceks -v passwd2</codeblock></p>
182+
<p>For more information, see the <xref
183+
href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html"
184+
format="html" scope="external">Apache Hadoop CredentialProvider API
185+
Guide</xref>.</p>
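<p>The keystore created by the HDFS command above could then be referenced from a table instead of <codeph>dbcp.password</codeph>. This is a sketch that assumes the <codeph>student_jdbc</codeph> table from the earlier examples and reuses the keystore URI and key from that command:</p>
<codeblock>-- Point the table at the keystore entry rather than a clear-text password
ALTER TABLE student_jdbc
SET TBLPROPERTIES (
"dbcp.password.keystore"="jceks://hdfs/user/foo/test.jceks",
"dbcp.password.key"="host1.password");</codeblock>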
186+
</section>
187+
<section id="section_oxg_m3d_5bc">
188+
<title>Support for case-sensitive table and column names</title>
189+
<p>The column names of tables in the remote database can differ from those in the external
190+
JDBC table schema. For example, PostgreSQL allows case-sensitive column names,
191+
whereas Impala stores column names in lowercase. In such situations, you can set the
192+
<codeph>column.mapping</codeph> table property to map column names between
193+
Impala external JDBC tables and the remote tables.</p>
194+
<p><b>Example:</b><codeblock id="codeblock_jfs_v3d_5bc">"column.mapping"="id=id, bool_col=Bool_col, tinyint_col=Tinyint_col,
195+
smallint_col=Smallint_col, int_col=Int_col, bigint_col=Bigint_col, float_col=Float_col, double_col=Double_col, date_col=date_col, string_col=String_col, timestamp_col=Timestamp_col"</codeblock></p>
196+
</section>
197+
<section id="section_b2t_1jd_5bc">
198+
<title>Modifying the external JDBC table</title>
199+
<p>You can use the <codeph>ALTER TABLE</codeph> statement to add, drop, or modify columns, or to modify the
200+
table properties of existing external JDBC tables. The syntax is the same as for
201+
other Impala tables.</p>
202+
<p><b>To add, drop, or modify
203+
columns</b><codeblock id="codeblock_gcv_fjd_5bc">ALTER TABLE student_jdbc ADD COLUMN IF NOT EXISTS date_col DATE;
204+
ALTER TABLE student_jdbc DROP COLUMN int_col;
205+
ALTER TABLE student_jdbc CHANGE COLUMN date_col timestamp_col TIMESTAMP;</codeblock></p>
206+
<p><b>To modify table
207+
properties</b><codeblock id="codeblock_er4_jjd_5bc">ALTER TABLE student_jdbc
208+
SET TBLPROPERTIES ("dbcp.username"="impala", "dbcp.password"="password");</codeblock></p>
209+
</section>
210+
<section id="section_yz3_kjd_5bc">
211+
<title>Querying external JDBC tables</title>
212+
<p>Querying external JDBC tables is the same as querying regular tables in
213+
Impala. You can use <codeph>SELECT</codeph> statements to query data, and you can join the external
214+
table with other tables across databases. Note, however, that the metadata for the
215+
external tables is not persisted in Hive Metastore (HMS).</p>
216+
<p><b>Example:</b><codeblock id="codeblock_yyb_njd_5bc">SELECT * FROM student_jdbc;</codeblock></p>
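<p>A join with a regular Impala table can be sketched the same way. The <codeph>enrollment</codeph> table and its columns are hypothetical and stand in for any local table that shares a key with the remote data:</p>
<codeblock>-- enrollment is a hypothetical local Impala table with a student_id column
SELECT s.id, s.string_col, e.course_name
FROM student_jdbc s
JOIN enrollment e ON e.student_id = s.id;</codeblock>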
217+
</section>
218+
<section id="section_nbg_w3d_5bc">
219+
<title>Query options for external JDBC tables</title>
220+
<p>The <codeph>CLEAN_DBCP_DS_CACHE</codeph> query option controls how long DBCP SQL DataSource
221+
objects are kept in the cache, which allows the DBCP connection
222+
pools to be reused across multiple queries. When the value is set to false, the DBCP
223+
SQL DataSource object is not closed when its reference count reaches 0. Instead, the SQL
224+
DataSource object is kept in the cache until it has been idle for more than 5
225+
minutes.</p>
226+
<p><b>Type</b>: BOOLEAN</p>
227+
<p><b>Default</b>: True (1)</p>
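<p>For example, the option could be turned off for the current session with the <codeph>SET</codeph> statement before querying an external JDBC table. This is a minimal sketch; the table name is taken from the earlier examples:</p>
<codeblock>-- Keep idle DataSource objects cached so the connection pool can be reused
SET CLEAN_DBCP_DS_CACHE=false;
SELECT COUNT(*) FROM student_jdbc;</codeblock>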
228+
</section>
229+
</conbody>
230+
</concept>
