Skip to content

[feature](RoutineLoad) Support RoutineLoad IAM auth#61324

Open
0AyanamiRei wants to merge 2 commits intoapache:masterfrom
0AyanamiRei:support-IAM-auth
Open

[feature](RoutineLoad) Support RoutineLoad IAM auth#61324
0AyanamiRei wants to merge 2 commits intoapache:masterfrom
0AyanamiRei:support-IAM-auth

Conversation

@0AyanamiRei
Copy link
Contributor

@0AyanamiRei 0AyanamiRei commented Mar 13, 2026

What problem does this PR solve?

Overview:
This PR adds AWS MSK IAM authentication for Kafka Routine Load in Apache Doris. You can connect to Amazon MSK using IAM credentials (including Assume Role and cross-account) with SASL_SSL and OAUTHBEARER.

What It Solves:

  1. Consume AWS MSK data from Doris via Routine Load.
  2. Support three credential modes: explicit AK/SK, same-account Instance Profile Assume Role, and cross-account AK/SK Assume Role.
  3. Align with AWS MSK IAM (SigV4-signed OAUTHBEARER tokens).

SQL Examples

  1. MSK IAM with explicit Access Key and Secret Key (same account)
CREATE ROUTINE LOAD my_msk_load ON my_db.my_table
COLUMNS (id, name, dt)
PROPERTIES (
    "desired_concurrent_number" = "2",
    "max_error_number" = "1000"
)
FROM KAFKA
(
    "kafka_broker_list" = "b-1.xxx.kafka.us-east-1.amazonaws.com:9098,b-2.xxx.kafka.us-east-1.amazonaws.com:9098",
    "kafka_topic" = "my-topic",
    "property.security.protocol" = "SASL_SSL",
    "property.sasl.mechanism" = "OAUTHBEARER",
    "aws.region" = "us-east-1",
    "aws.access.key" = "AKIAIOSFODNN7EXAMPLE",
    "aws.secret.key" = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
);
  1. MSK IAM with Assume Role (e.g. EC2 Instance Profile, same account)
CREATE ROUTINE LOAD my_msk_load ON my_db.my_table
COLUMNS (id, name, dt)
PROPERTIES (
    "desired_concurrent_number" = "2"
)
FROM KAFKA
(
    "kafka_broker_list" = "b-1.xxx.kafka.us-east-1.amazonaws.com:9098",
    "kafka_topic" = "my-topic",
    "property.security.protocol" = "SASL_SSL",
    "property.sasl.mechanism" = "OAUTHBEARER",
    "aws.region" = "us-east-1",
    "aws.msk.iam.role.arn" = "arn:aws:iam::123456789012:role/MyMSKConsumerRole"
);
  1. MSK IAM with cross-account Assume Role (AK/SK of account B to assume role in account A)
CREATE ROUTINE LOAD my_msk_load ON my_db.my_table
COLUMNS (id, name, dt)
PROPERTIES (
    "desired_concurrent_number" = "2"
)
FROM KAFKA
(
    "kafka_broker_list" = "b-1.xxx.kafka.us-east-1.amazonaws.com:9098",
    "kafka_topic" = "my-topic",
    "property.security.protocol" = "SASL_SSL",
    "property.sasl.mechanism" = "OAUTHBEARER",
    "aws.region" = "us-east-1",
    "aws.msk.iam.role.arn" = "arn:aws:iam::111111111111:role/CrossAccountMSKRole",
    "aws.access.key" = "AKIAIOSFODNN7EXAMPLE",
    "aws.secret.key" = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
);

Validation rules (FE): When any aws.* property is set, aws.region is required, and property.security.protocol must be SASL_SSL and property.sasl.mechanism must be OAUTHBEARER. If you use explicit credentials, both aws.access.key and aws.secret.key must be set together.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Copy link
Contributor

Thearas commented Mar 13, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@0AyanamiRei
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 26674 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a52972e3af5fca8edb5a7d0440d274594b53ce75, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17647	4506	4281	4281
q2	q3	10648	788	509	509
q4	4674	355	251	251
q5	7539	1195	1040	1040
q6	171	173	148	148
q7	798	848	692	692
q8	9460	1470	1278	1278
q9	4854	4699	4697	4697
q10	6332	1903	1669	1669
q11	462	252	241	241
q12	750	567	464	464
q13	18045	2955	2183	2183
q14	232	230	219	219
q15	q16	733	734	661	661
q17	705	838	438	438
q18	5888	5450	5196	5196
q19	1357	971	615	615
q20	542	491	379	379
q21	4504	1826	1393	1393
q22	427	451	320	320
Total cold run time: 95768 ms
Total hot run time: 26674 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4865	4526	4559	4526
q2	q3	3872	4384	3968	3968
q4	1124	1212	805	805
q5	4101	4379	4318	4318
q6	186	173	138	138
q7	1742	1662	1515	1515
q8	2520	2697	2569	2569
q9	7499	7520	7351	7351
q10	3725	3945	3635	3635
q11	533	469	434	434
q12	522	614	456	456
q13	2719	3293	2327	2327
q14	316	321	307	307
q15	q16	763	781	711	711
q17	1169	1416	1355	1355
q18	7308	6868	6675	6675
q19	886	888	973	888
q20	2050	2124	2006	2006
q21	3909	3501	3379	3379
q22	445	425	377	377
Total cold run time: 50254 ms
Total hot run time: 47740 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 168526 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a52972e3af5fca8edb5a7d0440d274594b53ce75, data reload: false

query5	4334	654	495	495
query6	328	223	212	212
query7	4219	457	273	273
query8	338	247	239	239
query9	8752	2761	2737	2737
query10	511	396	342	342
query11	6941	5094	4892	4892
query12	190	136	126	126
query13	1266	445	356	356
query14	5751	3744	3474	3474
query14_1	2874	2820	2832	2820
query15	219	194	177	177
query16	989	478	463	463
query17	903	721	622	622
query18	2460	450	353	353
query19	215	212	183	183
query20	136	132	130	130
query21	218	132	114	114
query22	13302	14160	14387	14160
query23	16576	15863	15537	15537
query23_1	15677	16106	15842	15842
query24	7477	1625	1223	1223
query24_1	1241	1235	1262	1235
query25	651	484	452	452
query26	1238	284	205	205
query27	2696	486	323	323
query28	4588	1864	1847	1847
query29	858	571	481	481
query30	302	246	189	189
query31	1058	938	865	865
query32	84	71	70	70
query33	509	328	273	273
query34	894	888	532	532
query35	634	685	585	585
query36	1045	1099	1033	1033
query37	138	97	84	84
query38	2964	2949	2862	2862
query39	861	837	821	821
query39_1	771	795	790	790
query40	231	150	133	133
query41	60	59	58	58
query42	261	249	258	249
query43	238	253	218	218
query44	
query45	195	189	183	183
query46	879	970	613	613
query47	2542	2177	3097	2177
query48	351	314	229	229
query49	626	454	366	366
query50	676	275	219	219
query51	4077	4050	4095	4050
query52	264	269	260	260
query53	287	335	289	289
query54	306	271	264	264
query55	95	91	86	86
query56	327	323	331	323
query57	1965	1675	1570	1570
query58	284	276	269	269
query59	2800	2946	2760	2760
query60	346	340	321	321
query61	150	143	145	143
query62	636	597	541	541
query63	320	278	274	274
query64	5052	1287	989	989
query65	
query66	1463	458	347	347
query67	24298	24327	24357	24327
query68	
query69	406	317	285	285
query70	987	953	976	953
query71	346	314	305	305
query72	2881	2648	1948	1948
query73	559	532	317	317
query74	9597	9566	9402	9402
query75	2830	2763	2472	2472
query76	2290	1027	652	652
query77	366	379	319	319
query78	10870	11039	10511	10511
query79	3059	790	584	584
query80	1718	626	539	539
query81	576	255	235	235
query82	1003	150	118	118
query83	335	265	248	248
query84	301	116	97	97
query85	891	489	446	446
query86	483	328	296	296
query87	3143	3137	3030	3030
query88	3589	2676	2640	2640
query89	425	375	342	342
query90	2042	179	174	174
query91	163	162	134	134
query92	86	74	77	74
query93	1456	823	501	501
query94	639	316	256	256
query95	590	403	315	315
query96	646	517	232	232
query97	2439	2515	2435	2435
query98	247	224	219	219
query99	999	951	906	906
Total cold run time: 253048 ms
Total hot run time: 168526 ms

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 80.65% (25/31) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants