Skip to content

Add rest endpoints to dynamically add a catalog #12605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

ramveer93
Copy link

As of now Presto doesn't support adding catalog on runtime, every time a new catalog is added we have to restart Presto to detect that catalog, this pull request solves the problem by exposing rest endpoints which will allow user to add and delete catalog without even restarting Presto server.

@facebook-github-bot
Copy link
Collaborator

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot
Copy link
Collaborator

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@ramveer93
Copy link
Author

Please provide any comment on this pull request.

String filePath = getPropertyFilePath(catalogVo.getCatalogName());
log.info("filepath: " + filePath);
File propertiesFile = new File(filePath);
try (OutputStream out = new FileOutputStream(propertiesFile)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit: you might consider using Files.newOuputStream here instead. See #12614 for details about why File{Input,Output}Stream should be avoided in cases where any {Input,Output}Stream will do.

@arhimondr
Copy link
Member

@ramveer93 Unloading dynamically loaded plugins is very memory leak prone in Java. It is very hard to enforce that all the threads spawned by a plugin are being correctly stopped at the lifecycle end. I'm pretty sure it is very easy to find instances even in Presto when thread pools are not being closed (e.g.: https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/HiveClientModule.java#L128). Introducing a runtime method that removes catalogs creates a false impression of "removal". In fact there's a high chance that the catalog won't be correctly unloaded, and will still consume resources.

Not so sure about dynamic adding of catalogs. But it seems like since these methods are needed - frequent registrations of new catalogs are expected. If there's no reliable way of "deregistering" them - eventually the cluster will run out of memory.

@DELETE
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public Response deleteCatalog(@QueryParam("catalogName") String catalogName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this will try to delete whatever catalog is in StaticCatalogStoreConfig. I think it would be operationally better to let whatever is specified as a static resource to be immutable, but still allow one to delete or add on the fly. I don't think it should delete the files that are statically shipped with the Presto deployment. Worst case scenario, you don't specify any static catalogs and you need to add them via the REST endpoint. But that seems better than accidentally deleting an important catalog config and effectively not having a way to undo that.

Copy link
Author

@ramveer93 ramveer93 Apr 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment,
Not sure I followed your comment fully.
But I am adding my explanation: Delete catalog endpoint will call connectorManager.dropConnection(catalogName); which will eventually call removeCatalog method of catalogManager which deletes the catalog from concurrentMap.
The endpoint will also delete the file associated with this catalog from etc/catalog which was created when we registered the catalog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the catalogs that this endpoint allows one to delete guaranteed to be registered through the same endpoint? If not, then I am questioning if it's really desirable to allow one to delete catalog configs which were statically loaded from etc/catalog. It seems like it's better, at the very least, to load and unload them in an isolated directory, and whatever is loaded on startup in etc/catalog should be immutable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The endpoint deletes catalogs irrespective of added using the endpoint or already existing , because any new catalog will be added to etc/catalog and will be removed from same directory.

When presto is being install first time, it doesn't have any etc/catalog folder ,so there are no catalogs provided as a part of installation , it is up to the user to add whatever catalog they want to use, hence there is nothing to be immutable.
If the question is deletion existing catalog during development,there is already the etc/catalog folder checked in presto main which will sync back the etc/catalog.

@ramveer93
Copy link
Author

@arhimondr
Thanks for the comment.
Regarding unloading dynamically loaded catalogs, the code relies on connectorManager.dropConnection(catalogName);, it assumes that it will drop the connection for that particular catalog and perform cleanup stuff required to clean the catalog from memory, then only the code deletes the catalog file from file system. If connectorManager.dropConnection(catalogName); is not taking care about the cleaning of the catalog from memory, then yes, as you wrote there will be memory leak and it needs to be fixed if at all it is duable.

But still if restart(which is the only solution as of now to do cleanup of catalogs in memory) is performed before server goes out of memory, that will clear the catalogs from memory and take fresh from etc/catalog which were already added by endpoint, which is still better then adding each and every catalog and restarting the cluster every time.

@arhimondr
Copy link
Member

@ramveer93 This slightly reminds me a classical J2EE server, with the WAR deployment over the web interface. Every time a WAR archive is deployed that way there's a chance that this time is gonna be the one that will run the server out of memory. So, at the end of the day server is restarted after each deployment, just to be sure.

Personally i would prefer not to have this kind of functionality, as it is very unreliable and error prone. But let's see what others say.

CC: @tdcmeehan @wenleix @highker

@chancez
Copy link

chancez commented Apr 11, 2019

Couldn't Presto also reload it's config files on a signal instead? So many orchestration systems can just update config files on disk and then all that's needed is to tell presto to reload them.

@arhimondr
Copy link
Member

@chancez Allowing changing configuration in runtime is not a simple change. Currently Presto reads the configuration at the startup and initialized the internal state based on the configuration read. I'm not even sure if it is feasible in general.

Usually if there's a need to change some behaviour in runtime we use session properties. What configuration properties would you like to change in runtime?

@chancez
Copy link

chancez commented Apr 11, 2019

I'm speaking about catalogs specifically at this time, as that's the topic of this PR. If you're able to add catalogs via an HTTP request, you should be able to add them by re-reading the files in the catalog directory on a signal.

@ramveer93
Copy link
Author

@ramveer93 This slightly reminds me a classical J2EE server, with the WAR deployment over the web interface. Every time a WAR archive is deployed that way there's a chance that this time is gonna be the one that will run the server out of memory. So, at the end of the day server is restarted after each deployment, just to be sure.

Personally i would prefer not to have this kind of functionality, as it is very unreliable and error prone. But let's see what others say.

CC: @tdcmeehan @wenleix @highker

Personally i would prefer not to have this kind of functionality, as it is very unreliable and error prone. But let's see what others say.

I can't deny the outOfmemory error completely if connectorManager.dropConnection(catalogName); doesn't work fully , But I just wanted to understand why do we have a method which doesn't perform what it needs to be or it perform partially. I am talking about connectorManager.dropConnection(catalogName);. Although I tested the same multiple times adding same catalog deleted prior to adding , but not even single time getting the error or catalog already exists. So just wanted to understand where else it will store the previous catalogs in code.

@arhimondr
Copy link
Member

Although I tested the same multiple times adding same catalog deleted prior to adding , but not even single time getting the error or catalog already exists. So just wanted to understand where else it will store the previous catalogs in code.

When new catalog is created, the ConnectorFactory#create is called. It effectively creates another instance of a Connector. Connector is allowed to create thread pools. And new thread pools are created each time a new catalog based on the connector is added. These internal threads may hold the back references to some internal data structures. So, even if the catalog is removed from the CatalogManager, there still threads that hold some references to the internal data structures of that catalog.

It is very hard (not impossible, but hard) to ensure that all the threads are stopped when catalog removal is made.

@tchunwei
Copy link

Any update on this? This will be a great feature for Presto

@ramveer93
Copy link
Author

Any update on this? This will be a great feature for Presto

Nothing pending from my side. I can do the changes suggested by @pettyjamesm only if i get confidence that the pull request will be merge else no point of wasting time.

@aweisberg
Copy link
Contributor

@ramveer93 Given the concerns around leaking resources when dropping a catalog maybe we should focus on just adding them and leave deleting them for later?

I think we would need to test manually, and then with automation, that resources created by a catalog are released properly when the catalog is deleted. If we want to support potentially broken (or untested) deletion it would have to be hidden behind a config option and even that is a tough sell.

I'll help you get this over the line. It's been sitting because of the deletion issues @arhimondr mentioned.

@ramveer93
Copy link
Author

ramveer93 commented Jul 12, 2019

@ramveer93 Given the concerns around leaking resources when dropping a catalog maybe we should focus on just adding them and leave deleting them for later?

I think we would need to test manually, and then with automation, that resources created by a catalog are released properly when the catalog is deleted. If we want to support potentially broken (or untested) deletion it would have to be hidden behind a config option and even that is a tough sell.

I'll help you get this over the line. It's been sitting because of the deletion issues @arhimondr mentioned.

Thanks for reply.
User can ignore the delete API call if they don't want to delete added catalogs,

Also if there is no delete there will be lots of catalogs and if not the API call to delete, the only option is to restart the Presto server to delete all the catalogs, which will delete catalogs which user doesn't want to delete as well.

I have tested it manually for basic behaviour only i.e after deleting catalog , adding the same catalog shouldn't say catalog already exists.and querying for deleted catalog should give the error as catalog doesn't exists etc.

Also I relly on

connectorManager.dropConnection(catalogName);

for releasing the resources of deleted catalogs.
Please let me know how can we run more manual/automation test cases.

@pettyjamesm
Copy link
Contributor

@ramveer93 I’ve been looking at a very similar approach lately, and will tell you that basically every connector (catalog) fails to even attempt to clean up after itself. There are hard assumptions that initialization happens once and cleanup is unnecessary because the JVM must be about to exit. If you want to find the surface issues, I recommend creating and deleting catalogs in a loop and taking periodic heap dumps. You’ll find pretty serious memory leaks and a growing number of threads hanging around indefinitely. That’s just scratching the surface though, see unmerged PR airlift/airlift#739 to add best effort cleanup after an exception is raised during connector shutdown.

@aweisberg
Copy link
Contributor

As @pettyjamesm points out it might work for a while, but that's not good enough for functionality we include in a public interface people expect to work. It's very bad for it to work right up until it doesn't. Then people start asking why we added this broken interface and when are we going to fix it.

I think the suggestion to try it in a loop to see what it does is a good one. Might want to run a few queries against it to see if that triggers additional initialization.

What are you trying to accomplish here? How often do you intend to add and remove catalogs?

@ramveer93
Copy link
Author

@aweisberg , About why this feature should be there , it's not about how often you register or remove the catalog, but every time we do , presto needs to be restarted, we have to do it manually for EMR every time when we want to add new catalog. I thought of this feature do remove this manual process and do it via API which doesn't require presto restart.

We used the Presto and it used to be frustration everytime we wanted to add new catalog to it as it needs to be restarted manually logging into the server.

If not for removing catalog , I don't see any challenge for adding catalog via this feature.

Again, the feature doesn't introduce any new bug , it's just the way Presto's connectorManager.dropConnection(catalogName); works , which doesn't release resources even after executing, same thing I tried explaining from starting.

@arhimondr
Copy link
Member

@ramveer93 Could you please explain the use case when you need to add / remove catalogs frequently? Maybe there's an other way of achieving of what you are trying to do then adding / removing catalogs.

@aweisberg
Copy link
Contributor

Again, the feature doesn't introduce any new bug , it's just the way Presto's connectorManager.dropConnection(catalogName); works , which doesn't release resources even after executing, same thing I tried explaining from starting.

Introducing a new endpoint that leaks memory when invoked is a new bug. Before when this was not accessible people couldn't do it and leak memory. If we merged this then they would be able to.

If not for removing catalog , I don't see any challenge for adding catalog via this feature.
I think there are a couple of ways to get this PR merged.

  1. Remove dropping catalogs
  2. Demonstrate that dropping catalogs doesn't leak threads, memory, or other resources
  3. Fix dropping catalogs so that is doesn't leak.
  4. Demonstrate that dropping catalogs doesn't leak that much in the way of resources and that we can add a config option to allow people to do this that is disabled by default. It should produce an error explaining why dropping catalogs is disabled by default.

Option 4 is sketchy. I would need to ask a few of the more experienced developers to gauge whether that is something that is appropriate to do.

@tchunwei
Copy link

@ramveer93 Could you please explain the use case when you need to add / remove catalogs frequently? Maybe there's an other way of achieving of what you are trying to do then adding / removing catalogs.

For me I wish to have the ability to add/remove catalogs without having to restart the cluster, through rest api or not is not that important. Having to restart the cluster introduces downtime, which could be a problem if it links to some production servers. My users wouldn't be able to view their dashboard or generate their report in that period.

@ramveer93
Copy link
Author

For me I wish to have the ability to add/remove catalogs without having to restart the cluster, through rest api or not is not that important. Having to restart the cluster introduces downtime, which could be a problem if it links to some production servers. My users wouldn't be able to view their dashboard or generate their report in that period.

We had API catalog which used to call the rest APIs via Presto to get the result, there used to be many APIs everyday which we register, every time we add new API one restart was required,i thought if we can register/delete catalog without restart , we can avoid the downtime as well as registering any no of catalogs without restart.

@ramveer93 ramveer93 closed this Jul 16, 2019
@ramveer93 ramveer93 reopened this Jul 16, 2019
@wsjz
Copy link

wsjz commented Mar 18, 2020

when I add a mysql catalog dynamically, it told me "No available nodes run query", but it can run query: "select" and "insert" if I restart Presto server, can you have a try?

@aweisberg
Copy link
Contributor

Great to see this moving again. I noticed in the other PR this requires the catalog be added at every worker and to the coordinator? Additionally if a worker restarts it's going to forget that the catalog has been changed?

@wsjz
Copy link

wsjz commented Mar 30, 2020

when I add a mysql catalog dynamically, it told me "No available nodes run query", but it can run query: "select" and "insert" if I restart Presto server, can you have a try?

take 'mysql' for example, reproducible steps:
1、clear all mysql catalog file on all nodes' catalog dir
2、restart presto
3、add mysql catalog dynamically
4、run "presto --catalog mysql --debug" and query by select, you can see the problem

Now I'm trying to solve the problem, can you look at it too?

@jianghuzai
Copy link

@arhimondr could you give some tips to slove the Travis CI build failed? And, why Jenkins build never end?

@isonusingh
Copy link

Hi, I am also having same kind of requirement for Presto. Is it possible to create a catalog using RestAPI? Please suggest.

@KannarFr
Copy link

KannarFr commented Aug 18, 2020

Hi,

I read all this complex issue and can't find a good solution for my use case. I'm looking for a way to use presto as a query runner for different catalogs. The goal is to query presto with the query & the catalog/connector to use for the query.

Sequentially when presto receives a request it should create connections to catalogs, run the query, delete the catalogs.

Consider a use cases with thousands of catalogs/connector and thousands of queries per hour.

If deletion is not possible, what behavior will have presto if catalog.properties contains thousands of lines by only provide adding stuff?

What do you guys feel about this kind of use case @aweisberg @arhimondr?

Can we work together to fix the memory leak on the dropConnection call?

@wenleix
Copy link
Contributor

wenleix commented Aug 24, 2020

This has been a long and resurrect discussion , e.g. #14964 . Perhaps we want to have some document / wiki list all the pros/cons and the (current) decision?

@stone-wlg
Copy link

Point is good, but the way is hard.

@isonusingh
Copy link

isonusingh commented Oct 20, 2020

Hi Team, I am having the same kind of requirement with Presto. Whenever my application configures a Jdbc source, the same time I need to create a catalog on Presto automatically using some Rest APIs.
As of now, I am planning to create a Spring Boot Agent App which can help me to automatic manage catalogs on Presto.

@aweisberg @arhimondr

@isonusingh
Copy link

As of now, I created a Presto Agent App which is actually a Spring Boot App and providing Rest APIs to manager catalogs on Presto Server. For example, I can add, edit and delete catalog on Presto Server using my agent app. Also, I am restarting the presto server using my agent app after adding, editing and deleting catalogs.

As of now, I am able to add catalogs for the below connectors:

Hive
MySQL
SQLServer
MongoDB
Kafka
MemSQL
Oracle
PostgreSQL
RedShift
Google BigQuery
This approach is working fine with a single user but when multiple users will use Agent App that will be a problem. So, I added a locking mechanism in my agent app and now one user can only add/edit/delete catalogs.

For my use-case, presto should load catalogs dynamically and also I am feeling that this feature is very much required in presto.

@stale
Copy link

stale bot commented Mar 2, 2022

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

@stale stale bot added the stale label Mar 2, 2022
@wsjz
Copy link

wsjz commented Mar 2, 2022 via email

@stale stale bot removed the stale label Mar 2, 2022
@stale
Copy link

stale bot commented Sep 21, 2022

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.