Apps #32
base: dev
Conversation
Could you take a look at the items I highlighted, and we can sync on it?
| Catalog Size | Discovery Time* | Generation Time** | UI Response*** |
|--------------|-----------------|-------------------|----------------|
| Small (1K objects) | <2s | 5-15min | <100ms |
So, you've benchmarked this at a thousand tables taking 5-15 minutes to tag?
So it extrapolated values from the amount of time it took to generate 20 objects. And there wasn't any tagging here, just metadata generation. I could do a test run of 100-200 pretty easily, and that may be a better baseline to benchmark off of. The "Discovery Time" portion is a bit misleading, as it's just a quick query on the information_schema.
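For readers following along, the "discovery" step described here is a single metadata query against `information_schema`, not a per-table scan. A minimal sketch of what such a query might look like; the function name and column choices are illustrative assumptions, not the app's actual code:

```python
# Hypothetical sketch of the discovery step: one query against
# information_schema listing schemas, tables, and columns in a single pass.
def build_discovery_query(catalog: str) -> str:
    """Return a query that enumerates all columns in a catalog."""
    return (
        f"SELECT table_schema, table_name, column_name "
        f"FROM {catalog}.information_schema.columns "
        f"ORDER BY table_schema, table_name, ordinal_position"
    )

# In a Databricks notebook this would run as, e.g.:
#   spark.sql(build_discovery_query("main"))
```

Because it reads only catalog metadata, this scales with the number of objects rather than the data volume, which is why it completes in seconds.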
Got it, so it's just a query on the information schema itself; that makes sense.
I was wondering because the batch version is much slower, but it runs a wide variety of queries against different components of the metadata and data.
class PIIDetector:
    """
    Self-contained PII detection with pattern matching and data analysis.
    Provides similar functionality to Presidio but embedded and lightweight.
Have you benchmarked this against Presidio out of the box? I'd be very interested in exploring this further if the metrics are similar.
I should remove that, actually. I used vibe coding to resemble Presidio, but to be candid, it's not nearly as good per my review of Presidio. The most functional part of the PII detection at this moment is regex pattern detection; the LLM has been hit and miss, but I haven't prompted it well and it's an area I have to build out better.
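For context, the regex-based portion described as most functional here typically amounts to a set of named patterns matched against values. A minimal sketch under assumed patterns and names; this is not the actual `PIIDetector` implementation:

```python
import re

# Illustrative regex patterns for common PII types. Real detectors use
# many more patterns plus validation (e.g. checksums for card numbers).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def detect_pii(text: str) -> list[str]:
    """Return the names of all PII patterns that match the given text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```

Regex matching like this is cheap and deterministic, which is why it tends to be the most reliable layer before the LLM-based pass matures.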
I see. I absolutely have the same challenges with Presidio: the installs on customers' networks can be a trial. But from what we've seen in our benchmarking, an LLM can beat Presidio, though it has to be well prompted. Presidio tends to have really low precision on its own, but it gets really high recall when combined with a well-prompted LLM.
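For readers less familiar with the metrics in this exchange, a quick illustration (with made-up numbers, not the benchmarks referenced above) of how low precision can coexist with high recall:

```python
# precision = of everything flagged, how much was actually PII
# recall    = of all actual PII, how much was flagged
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    return tp / (tp + fp), tp / (tp + fn)


# An aggressive detector: 90 true hits, 60 false alarms, 10 misses.
p, r = precision_recall(tp=90, fp=60, fn=10)  # precision 0.6, recall 0.9
```

A detector that over-flags (like the Presidio behavior described above) sacrifices precision to drive recall up, and a second, well-prompted LLM pass can then filter the false positives.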
    'confidence': confidence
}


def _determine_classification(self, pii_types: List[str]) -> str:
I think before putting this in main, we should align on how we should be classifying data
Maybe that's the wrong approach though, happy to discuss
I'm open to either approach. In its current design, it encapsulates the entire PII detection logic, and classification is just a task within it. Classification in the app is really just informative, which is less useful than dbxmetagen, which I understand facilitates identifying the domains of data objects. My only concern about merging it into main is the maintenance and extension challenges it may introduce. But yeah, let's discuss.
The **Quality tab** provides comprehensive metadata quality assessment and governance analytics:

**🏆 Metadata Quality Assessment Header**:
- **Trophy icon** emphasizing excellence and quality focus
Could you review some of this? I think there's a lot of content here that could be cleaned out - I'm working on doing the same with the main readme
Sure, let me take a look this week and whittle it down
- Llama models: ~3-8s per batch
- Gemma models: ~1-3s per batch (fastest)
- Claude models: ~2-6s per batch
- **Batch Size Impact**: Larger batches (20-50 objects) reduce total time but increase individual request time
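The per-batch latencies quoted here imply a simple back-of-envelope model for total generation time. A sketch with assumed example numbers, not measurements from this PR:

```python
import math


# Total generation time is roughly (number of batches) x (per-batch latency);
# larger batches mean fewer round trips at a higher per-request cost.
def estimate_total_seconds(n_objects: int, batch_size: int, secs_per_batch: float) -> float:
    return math.ceil(n_objects / batch_size) * secs_per_batch


# e.g. 1000 objects in batches of 25 at ~8s/batch: 40 batches -> 320s
estimate_total_seconds(1000, 25, 8.0)
```

At ~5 minutes for 1K objects, this rough model is at least consistent with the 5-15 min "Generation Time" figure in the table under review.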
are objects columns or tables here?
objects represent schemas, tables, and columns (basically just the load)
Ok got it, so 5 tables with 800 columns each, plus the schema, would be like 4006 'objects'?
I think it'd be helpful to clarify in the readme what objects are in this case
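The arithmetic in the question above checks out under the stated definition of "objects" (schemas + tables + columns); the helper name below is illustrative, not from the app:

```python
# Objects = schemas + tables + columns (the total generation load).
def count_objects(n_schemas: int, tables_per_schema: int, cols_per_table: int) -> int:
    n_tables = n_schemas * tables_per_schema
    return n_schemas + n_tables + n_tables * cols_per_table


# 1 schema, 5 tables of 800 columns each: 1 + 5 + 4000 = 4006
count_objects(1, 5, 800)
```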
@@ -0,0 +1 @@
This folder contains apps that support organizations seeking to streamline metadata generation. The first is UC Metadata Assistant, a self-contained app designed for business users looking to generate metadata. The second is built off of dbxmetagen and provides an interface for executing the utilities contained in the remainder of this repo and is provided for organizations with enterprise demands including CI/CD support, domain identification, and mature PII detection and data classification.
I think we need a gap analysis or a differentiator for this in its current state - what is different about the UC Metadata Assistant?
It's not that business users are generating metadata, that's PART of it, but really, your governance management tooling is what is particularly useful there, right?
Well, I think in its current state, the app is really designed as a lightweight mechanism for business users to populate metadata. That's where it's strongest (the remaining components need to be built out better, though they are functional today...I have ideas on where I want to take them). We can put this down as a discussion item for when we connect.
That makes sense. I think we want to state this clearly in the readme before we share this widely, and we should clarify the distinction between the two apps in the main readme.
@@ -0,0 +1,870 @@
# 🏢 Unity Catalog Metadata Assistant
Ideally, could we remove emojis that don't serve any purpose, just as a style-guide item?
In the app, sure, but in the readme I'd really prefer removing them. It reads as too AI-generated to me. Happy to discuss this.
Yeah, it was largely AI-generated; I just provided inputs along the way, had it build out sections, and then reviewed them here and there. I don't really have a strong opinion on emojis lol, so we can cut them out, that's fine.
Not trying to be picky about these things, just thinking it would be good to have a general style. In the app it's one thing, but in readmes and regular code, not ideal.
added apps folder, basic readme with description of the two apps, and then the uc-metadata-assistant app