-
Notifications
You must be signed in to change notification settings - Fork 5.4k
Add rest endpoints to dynamically add a catalog #12605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed. If you have received this in error or have any questions, please contact us at [email protected]. Thanks! |
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks! |
Please provide any comment on this pull request. |
String filePath = getPropertyFilePath(catalogVo.getCatalogName()); | ||
log.info("filepath: " + filePath); | ||
File propertiesFile = new File(filePath); | ||
try (OutputStream out = new FileOutputStream(propertiesFile)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: you might consider using Files.newOuputStream here instead. See #12614 for details about why File{Input,Output}Stream should be avoided in cases where any {Input,Output}Stream will do.
@ramveer93 Unloading dynamically loaded plugins is very memory leak prone in Java. It is very hard to enforce that all the threads spawned by a plugin are being correctly stopped at the lifecycle end. I'm pretty sure it is very easy to find instances even in Presto when thread pools are not being closed (e.g.: https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/HiveClientModule.java#L128). Introducing a runtime method that removes catalogs creates a false impression of "removal". In fact there's a high chance that the catalog won't be correctly unloaded, and will still consume resources. Not so sure about dynamic adding of catalogs. But it seems like since these methods are needed - frequent registrations of new catalogs are expected. If there's no reliable way of "deregistering" them - eventually the cluster will run out of memory. |
@DELETE | ||
@Consumes(MediaType.APPLICATION_JSON) | ||
@Produces(MediaType.APPLICATION_JSON) | ||
public Response deleteCatalog(@QueryParam("catalogName") String catalogName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this will try to delete whatever catalog is in StaticCatalogStoreConfig
. I think it would be operationally better to let whatever is specified as a static resource to be immutable, but still allow one to delete or add on the fly. I don't think it should delete the files that are statically shipped with the Presto deployment. Worst case scenario, you don't specify any static catalogs and you need to add them via the REST endpoint. But that seems better than accidentally deleting an important catalog config and effectively not having a way to undo that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment,
Not sure I followed your comment fully.
But I am adding my explanation: Delete catalog endpoint will call connectorManager.dropConnection(catalogName);
which will eventually call removeCatalog
method of catalogManager
which deletes the catalog from concurrentMap.
The endpoint will also delete the file associated with this catalog from etc/catalog which was created when we registered the catalog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the catalogs that this endpoint allows one to delete guaranteed to be registered through the same endpoint? If not, then I am questioning if it's really desirable to allow one to delete catalog configs which were statically loaded from etc/catalog
. It seems like it's better, at the very least, to load and unload them in an isolated directory, and whatever is loaded on startup in etc/catalog
should be immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The endpoint deletes catalogs irrespective of added using the endpoint or already existing , because any new catalog will be added to etc/catalog and will be removed from same directory.
When presto is being install first time, it doesn't have any etc/catalog folder ,so there are no catalogs provided as a part of installation , it is up to the user to add whatever catalog they want to use, hence there is nothing to be immutable.
If the question is deletion existing catalog during development,there is already the etc/catalog folder checked in presto main which will sync back the etc/catalog.
@arhimondr But still if restart(which is the only solution as of now to do cleanup of catalogs in memory) is performed before server goes out of memory, that will clear the catalogs from memory and take fresh from etc/catalog which were already added by endpoint, which is still better then adding each and every catalog and restarting the cluster every time. |
@ramveer93 This slightly reminds me a classical J2EE server, with the Personally i would prefer not to have this kind of functionality, as it is very unreliable and error prone. But let's see what others say. |
Couldn't Presto also reload it's config files on a signal instead? So many orchestration systems can just update config files on disk and then all that's needed is to tell presto to reload them. |
@chancez Allowing changing configuration in runtime is not a simple change. Currently Presto reads the configuration at the startup and initialized the internal state based on the configuration read. I'm not even sure if it is feasible in general. Usually if there's a need to change some behaviour in runtime we use session properties. What configuration properties would you like to change in runtime? |
I'm speaking about catalogs specifically at this time, as that's the topic of this PR. If you're able to add catalogs via an HTTP request, you should be able to add them by re-reading the files in the catalog directory on a signal. |
I can't deny the outOfmemory error completely if |
When new catalog is created, the It is very hard (not impossible, but hard) to ensure that all the threads are stopped when catalog removal is made. |
Any update on this? This will be a great feature for Presto |
Nothing pending from my side. I can do the changes suggested by @pettyjamesm only if i get confidence that the pull request will be merge else no point of wasting time. |
@ramveer93 Given the concerns around leaking resources when dropping a catalog maybe we should focus on just adding them and leave deleting them for later? I think we would need to test manually, and then with automation, that resources created by a catalog are released properly when the catalog is deleted. If we want to support potentially broken (or untested) deletion it would have to be hidden behind a config option and even that is a tough sell. I'll help you get this over the line. It's been sitting because of the deletion issues @arhimondr mentioned. |
Thanks for reply. Also if there is no delete there will be lots of catalogs and if not the API call to delete, the only option is to restart the Presto server to delete all the catalogs, which will delete catalogs which user doesn't want to delete as well. I have tested it manually for basic behaviour only i.e after deleting catalog , adding the same catalog shouldn't say catalog already exists.and querying for deleted catalog should give the error as catalog doesn't exists etc. Also I relly on
for releasing the resources of deleted catalogs. |
@ramveer93 I’ve been looking at a very similar approach lately, and will tell you that basically every connector (catalog) fails to even attempt to clean up after itself. There are hard assumptions that initialization happens once and cleanup is unnecessary because the JVM must be about to exit. If you want to find the surface issues, I recommend creating and deleting catalogs in a loop and taking periodic heap dumps. You’ll find pretty serious memory leaks and a growing number of threads hanging around indefinitely. That’s just scratching the surface though, see unmerged PR airlift/airlift#739 to add best effort cleanup after an exception is raised during connector shutdown. |
As @pettyjamesm points out it might work for a while, but that's not good enough for functionality we include in a public interface people expect to work. It's very bad for it to work right up until it doesn't. Then people start asking why we added this broken interface and when are we going to fix it. I think the suggestion to try it in a loop to see what it does is a good one. Might want to run a few queries against it to see if that triggers additional initialization. What are you trying to accomplish here? How often do you intend to add and remove catalogs? |
@aweisberg , About why this feature should be there , it's not about how often you register or remove the catalog, but every time we do , presto needs to be restarted, we have to do it manually for EMR every time when we want to add new catalog. I thought of this feature do remove this manual process and do it via API which doesn't require presto restart. We used the Presto and it used to be frustration everytime we wanted to add new catalog to it as it needs to be restarted manually logging into the server. If not for removing catalog , I don't see any challenge for adding catalog via this feature. Again, the feature doesn't introduce any new bug , it's just the way Presto's connectorManager.dropConnection(catalogName); works , which doesn't release resources even after executing, same thing I tried explaining from starting. |
@ramveer93 Could you please explain the use case when you need to add / remove catalogs frequently? Maybe there's an other way of achieving of what you are trying to do then adding / removing catalogs. |
Introducing a new endpoint that leaks memory when invoked is a new bug. Before when this was not accessible people couldn't do it and leak memory. If we merged this then they would be able to.
Option 4 is sketchy. I would need to ask a few of the more experienced developers to gauge whether that is something that is appropriate to do. |
For me I wish to have the ability to add/remove catalogs without having to restart the cluster, through rest api or not is not that important. Having to restart the cluster introduces downtime, which could be a problem if it links to some production servers. My users wouldn't be able to view their dashboard or generate their report in that period. |
We had API catalog which used to call the rest APIs via Presto to get the result, there used to be many APIs everyday which we register, every time we add new API one restart was required,i thought if we can register/delete catalog without restart , we can avoid the downtime as well as registering any no of catalogs without restart. |
when I add a mysql catalog dynamically, it told me "No available nodes run query", but it can run query: "select" and "insert" if I restart Presto server, can you have a try? |
Great to see this moving again. I noticed in the other PR this requires the catalog be added at every worker and to the coordinator? Additionally if a worker restarts it's going to forget that the catalog has been changed? |
take 'mysql' for example, reproducible steps: Now I'm trying to solve the problem, can you look at it too? |
@arhimondr could you give some tips to slove the Travis CI build failed? And, why Jenkins build never end? |
Hi, I am also having same kind of requirement for Presto. Is it possible to create a catalog using RestAPI? Please suggest. |
Hi, I read all this complex issue and can't find a good solution for my use case. I'm looking for a way to use presto as a query runner for different catalogs. The goal is to query presto with the query & the catalog/connector to use for the query. Sequentially when presto receives a request it should create connections to catalogs, run the query, delete the catalogs. Consider a use cases with thousands of catalogs/connector and thousands of queries per hour. If deletion is not possible, what behavior will have presto if catalog.properties contains thousands of lines by only provide adding stuff? What do you guys feel about this kind of use case @aweisberg @arhimondr? Can we work together to fix the memory leak on the dropConnection call? |
This has been a long and resurrect discussion , e.g. #14964 . Perhaps we want to have some document / wiki list all the pros/cons and the (current) decision? |
Point is good, but the way is hard. |
Hi Team, I am having the same kind of requirement with Presto. Whenever my application configures a Jdbc source, the same time I need to create a catalog on Presto automatically using some Rest APIs. |
As of now, I created a Presto Agent App which is actually a Spring Boot App and providing Rest APIs to manager catalogs on Presto Server. For example, I can add, edit and delete catalog on Presto Server using my agent app. Also, I am restarting the presto server using my agent app after adding, editing and deleting catalogs. As of now, I am able to add catalogs for the below connectors: Hive For my use-case, presto should load catalogs dynamically and also I am feeling that this feature is very much required in presto. |
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions! |
这是来自QQ邮箱的假期自动回复邮件。
好,我最近正在休假中
您,无法亲自回复您的邮件。我将在假期结束后,尽快给您回复。
|
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions! |
As of now Presto doesn't support adding catalog on runtime, every time a new catalog is added we have to restart Presto to detect that catalog, this pull request solves the problem by exposing rest endpoints which will allow user to add and delete catalog without even restarting Presto server.