feat: add catalog config to separate directory and REST type catalogs #13


Open — wants to merge 4 commits into main

Conversation

jackye1995
Collaborator

No description provided.

@jackye1995 jackye1995 marked this pull request as draft April 15, 2025 05:45
@github-actions github-actions bot added the enhancement New feature or request label Apr 15, 2025
@jackye1995
Collaborator Author

jackye1995 commented Apr 17, 2025

@yanghua please take a look if you have time. Here is the updated user experience based on my understanding of the discussion so far (this mainly focuses on the Spark SQL experience, without the need to register a temp view):

For Lance directories, the user is expected to start the session with:

SparkSession spark =
    SparkSession.builder()
        .appName("spark-lance-connector-test")
        .master("local")
        .config("spark.sql.catalog.lance", "com.lancedb.lance.spark.LanceCatalog")
        .config("spark.sql.catalog.lance.type", "dir")
        .config("spark.sql.catalog.lance.path", dbPath)
        .getOrCreate();

which maps the default Spark namespace to the tables in dbPath, so the user can run:

spark.sql("SELECT * FROM t1")

spark.sql("SELECT * FROM default.t1")

Alternatively, for Lance directories, multiple namespaces can be mapped to separate paths:

SparkSession spark =
    SparkSession.builder()
        .appName("spark-lance-connector-test")
        .master("local")
        .config("spark.sql.catalog.lance", "com.lancedb.lance.spark.LanceCatalog")
        .config("spark.sql.catalog.lance.type", "dir")
        .config("spark.sql.catalog.lance.paths.ns1", dbPath1)
        .config("spark.sql.catalog.lance.paths.ns2", dbPath2)
        .getOrCreate();

This maps the Spark namespace ns1 to dbPath1 and ns2 to dbPath2, so the user can run:

spark.sql("SELECT t1.c1, t2.c2 FROM ns1.t1 as t1, ns2.t2 as t2 WHERE t1.c3 = t2.c3")
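Under this design, the directory-type catalog presumably derives its namespace-to-path mapping from the catalog options: a single `path` option backs the `default` namespace, while each `paths.<namespace>` option backs a named namespace. A minimal sketch of that option parsing, assuming the option keys shown above — the class and method names here are hypothetical, not the actual LanceCatalog implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds a namespace -> directory path map from the
// catalog options (keys are given to the catalog with the
// "spark.sql.catalog.lance." prefix already stripped by Spark).
public class NamespacePaths {
    private static final String PATHS_PREFIX = "paths.";

    public static Map<String, String> fromOptions(Map<String, String> options) {
        Map<String, String> namespaceToPath = new HashMap<>();
        // A single "path" option maps the default namespace.
        String singlePath = options.get("path");
        if (singlePath != null) {
            namespaceToPath.put("default", singlePath);
        }
        // Each "paths.<namespace>" option maps that namespace to its own path.
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (e.getKey().startsWith(PATHS_PREFIX)) {
                namespaceToPath.put(
                    e.getKey().substring(PATHS_PREFIX.length()), e.getValue());
            }
        }
        return namespaceToPath;
    }
}
```

With the two-namespace config above, this would yield a map of ns1 to dbPath1 and ns2 to dbPath2, which the catalog could consult when resolving `ns1.t1`.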

For a REST catalog, the user should run something like:

SparkSession spark =
    SparkSession.builder()
        .appName("spark-lance-connector-test")
        .master("local")
        .config("spark.sql.catalog.lance", "com.lancedb.lance.spark.LanceCatalog")
        .config("spark.sql.catalog.lance.type", "rest")
        .config("spark.sql.catalog.lance.uri", "https://my.lancecatalog.com")
        .getOrCreate();

which will connect to the corresponding endpoint (unit tests for this are pending the HMS server implementation).
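The `type` option is what separates the two catalog backends at initialization. A hedged sketch of that dispatch, assuming the `dir`/`rest` values shown above — the enum and method names are illustrative, not the actual LanceCatalog code:

```java
// Hypothetical sketch of dispatching on the "type" catalog option to select
// between the directory-backed and REST-backed catalog implementations.
public class CatalogType {
    public enum Type { DIR, REST }

    public static Type fromOption(String type) {
        switch (type == null ? "" : type.toLowerCase()) {
            case "dir":
                return Type.DIR;
            case "rest":
                return Type.REST;
            default:
                // Failing fast on an unknown type gives a clear config error.
                throw new IllegalArgumentException("Unknown catalog type: " + type);
        }
    }
}
```

Failing fast on an unrecognized value surfaces misconfiguration at session start rather than at first query.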

@jackye1995 jackye1995 requested a review from yanghua April 17, 2025 00:42
@jackye1995 jackye1995 marked this pull request as ready for review April 17, 2025 01:17
@jackye1995
Collaborator Author

If we agree with the general direction, I can remove the CreateNamespace part so the updated structure can be merged first. There is also a follow-up fix needed for LanceDataSource.extractCatalog, which is currently hard-coded to lance and needs to be updated.

@jackye1995 jackye1995 changed the title feat: support create namespace from Lance REST catalog feat: add catalog config to separate directory and REST type catalogs Apr 17, 2025
@jackye1995 jackye1995 requested a review from SaintBacchus April 17, 2025 16:26