ATLAS-4922: Atlas Async Import using Kafka #307

Merged: 33 commits, merged May 21, 2025

DishaTalreja3 (Contributor)

What changes were proposed in this pull request?

The existing synchronous import API causes performance bottlenecks with large or multiple requests. This patch introduces asynchronous imports using Kafka to queue requests via dedicated topics, enabling users to submit multiple import requests concurrently.
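The core idea, decoupling request submission from request processing through a queue, can be illustrated with a minimal Java sketch. It assumes an in-memory LinkedBlockingQueue standing in for the dedicated Kafka topic and a single daemon thread standing in for the consumer; the class and method names (AsyncImportQueueSketch, submit, awaitProcessed) are hypothetical and are not Atlas APIs.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not Atlas code: a LinkedBlockingQueue stands in for the Kafka import topic.
class AsyncImportQueueSketch {
    // One queued import request (hypothetical message shape).
    private static final class ImportMessage {
        final String importId;
        final String payload;

        ImportMessage(String importId, String payload) {
            this.importId = importId;
            this.payload  = payload;
        }
    }

    private final BlockingQueue<ImportMessage> topic     = new LinkedBlockingQueue<>();
    private final Map<String, Integer>         processed = new ConcurrentHashMap<>();

    AsyncImportQueueSketch() {
        Thread consumer = new Thread(this::consumeLoop, "async-import-consumer");
        consumer.setDaemon(true); // the demo consumer should not keep the JVM alive
        consumer.start();
    }

    // Producer side: enqueue and return an import ID immediately, instead of
    // blocking the caller for the whole import as a synchronous API would.
    String submit(String payload) {
        String importId = UUID.randomUUID().toString();
        topic.add(new ImportMessage(importId, payload));
        return importId;
    }

    // Consumer side: take requests one at a time, independently of the callers.
    private void consumeLoop() {
        try {
            while (true) {
                ImportMessage msg = topic.take();
                processed.put(msg.importId, msg.payload.length()); // placeholder for the real import work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit the loop when interrupted
        }
    }

    // Demo helper: poll until the request has been handled or the timeout elapses.
    boolean awaitProcessed(String importId, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!processed.containsKey(importId)) {
            if (System.currentTimeMillis() > deadline) {
                return false;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Here submit() returns before the import has run, so several callers can queue requests concurrently; a caller later looks up the outcome by the returned ID, which is the role a status API plays.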

New APIs added to:

  • Perform asynchronous imports
  • Check import status (all operations or a specific import ID)
  • Abort a queued import operation
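The status and abort operations imply a small state machine in which abort succeeds only while a request is still queued. A minimal Java sketch follows; the state names (QUEUED, PROCESSING, COMPLETED, ABORTED) and method names are hypothetical illustrations, not taken from the Atlas code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not the Atlas API: tracks per-import status and allows
// abort only while an import is still queued.
class ImportStatusTrackerSketch {
    enum Status { QUEUED, PROCESSING, COMPLETED, ABORTED }

    private final Map<String, AtomicReference<Status>> statuses = new ConcurrentHashMap<>();

    // Called when a request is accepted and queued.
    void register(String importId) {
        statuses.putIfAbsent(importId, new AtomicReference<>(Status.QUEUED));
    }

    // Status of one import, or null if the ID is unknown.
    Status getStatus(String importId) {
        AtomicReference<Status> ref = statuses.get(importId);
        return ref == null ? null : ref.get();
    }

    // Snapshot of the status of all imports.
    Map<String, Status> getAllStatuses() {
        Map<String, Status> snapshot = new HashMap<>();
        statuses.forEach((id, ref) -> snapshot.put(id, ref.get()));
        return snapshot;
    }

    // Abort succeeds only if the import has not been picked up yet.
    boolean abort(String importId) {
        return transition(importId, Status.QUEUED, Status.ABORTED);
    }

    // The consumer claims a queued import; fails if it was aborted first.
    boolean startProcessing(String importId) {
        return transition(importId, Status.QUEUED, Status.PROCESSING);
    }

    boolean complete(String importId) {
        return transition(importId, Status.PROCESSING, Status.COMPLETED);
    }

    // Atomic compare-and-set, so abort() and startProcessing() cannot both win.
    private boolean transition(String importId, Status from, Status to) {
        AtomicReference<Status> ref = statuses.get(importId);
        return ref != null && ref.compareAndSet(from, to);
    }
}
```

Using compareAndSet means a race between an abort request and the consumer claiming the same import has exactly one winner.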

How was this patch tested?

Manual Testing, Unit Tests

DishaTalreja3 marked this pull request as ready for review on March 12, 2025 18:41
mneethiraj (Contributor) left a comment

@DishaTalreja3 - I haven't completed the review yet; sending the comments I have so far. Please review them and update the patch.

new SynchronousQueue<>(), // Direct handoff queue
new ThreadFactoryBuilder().setNameFormat(THREADNAME_PREFIX + " thread-%d").build());
}
executors = executorService;
Contributor

Review all writes to executors, i.e., in the following methods:

  • startInternal()
  • startConsumers()
  • stop()

Prior to this patch, these methods were called only once per Atlas instance. Now, however, executors can be overwritten on every async import call. Review the usage carefully and update it to avoid inappropriate overwrites.

Contributor

In this new approach, executors are overwritten only in startConsumers(), and only when the passed-in executorService is null (essentially at instance startup or at the start of the first async import request). Please correct me if I am wrong.
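The pattern described here, creating the consumer executor only when none exists yet and reusing it for later async import requests, can be sketched as follows. Assumptions: the class name, the startConsumers(int) signature and the thread-name prefix are hypothetical; a plain ThreadFactory replaces Guava's ThreadFactoryBuilder so the sketch has no external dependency; synchronized methods keep two concurrent requests from both seeing null and each creating a pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the actual Atlas code: the executor is created at most
// once, however many async import requests call startConsumers().
class ConsumerExecutorHolderSketch {
    private static final String THREADNAME_PREFIX = "async-import"; // assumed prefix

    private ExecutorService executors; // guarded by "this"

    synchronized ExecutorService startConsumers(int maxThreads) {
        if (executors == null) { // only the first caller creates the pool
            AtomicInteger counter = new AtomicInteger();
            ThreadFactory factory = r -> {
                Thread t = new Thread(r, THREADNAME_PREFIX + " thread-" + counter.getAndIncrement());
                t.setDaemon(true);
                return t;
            };
            executors = new ThreadPoolExecutor(
                    0, maxThreads, 60L, TimeUnit.SECONDS,
                    new SynchronousQueue<>(), // direct handoff: tasks are not buffered in the pool
                    factory);
        }
        return executors; // later callers reuse the pool instead of overwriting it
    }

    synchronized void stop() {
        if (executors != null) {
            executors.shutdown();
            executors = null; // allow a fresh start after a full stop
        }
    }
}
```

With a SynchronousQueue the pool holds no backlog: once maxThreads threads are busy, execute() throws RejectedExecutionException under the default AbortPolicy, which fits a design where pending requests wait in the Kafka topic rather than in the pool.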

@JsonIgnoreProperties(ignoreUnknown = true)
public class AtlasAsyncImportRequest extends AtlasBaseModelObject implements Serializable {
private static final long serialVersionUID = 1L;
public static final String ASYNC_IMPORT_TYPE_NAME = "__AtlasAsyncImportRequest";
Contributor

The constants defined in lines 47-49 seem to be used/needed only on the Atlas server side. If so, please move them to the server side (e.g., the repository module).

Contributor Author

These constants are used in this model class as well. Moving them to the repository module might introduce unnecessary dependencies.

Contributor

None of the classes in the model package references ASYNC_IMPORT_TYPE_NAME. It is referenced only from AsyncImportService and AtlasAsyncImportRequestDTO.

KafkaProducer producer = getOrCreateProducer(topic);

sendInternalToProducer(producer, topic, messages);
sendInternal(topic, messages, SORT_NOT_NEEDED);
Contributor

OK, I get it now. SORT_NOT_NEEDED doesn't add much readability, given that the called method is right above. I suggest simply passing false here.

mneethiraj (Contributor)

@DishaTalreja3 - can you please resolve the conversations you have addressed?

mneethiraj merged commit f3d302a into apache:master on May 21, 2025
1 check passed
mneethiraj pushed a commit that referenced this pull request on May 22, 2025
Co-authored-by: jackhalfalltrades <[email protected]>
(cherry picked from commit f3d302a)

3 participants