Add ai_parse_document job fixes by bali0019 · Pull Request #115 · databricks/bundle-examples

bali0019 · 2025-10-15T16:35:28Z

Folder Rename

- Renamed workflow_with_ai_parse_document → job_with_ai_parse_document to align with the "job" terminology used throughout the project

Remove v1.0 Parser References

- Removed the CASE statement that handled both v1.0 and v2.0 formats in the text extraction task
- The task now uses only the v2.0 format (parsed:document:elements), since v1.0 may be deprecated
- Updated the README to remove mention of v1.0 support
- Kept the explicit version: 2.0 parameter in the ai_parse_document call for clarity
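A minimal Python sketch of what v2.0-only extraction does, for readers unfamiliar with the output shape. It assumes each entry under document.elements carries its text in a "content" field (an assumption about the v2.0 schema, not taken from this PR) and joins the non-empty ones. The sample is hand-written, not real ai_parse_document output, and the job itself does this step in SQL against parsed:document:elements.

```python
import json
from typing import Any


def extract_text(parsed: dict[str, Any], separator: str = "\n\n") -> str:
    """Join the text of every element under document.elements.

    Assumes the (hypothetical here) v2.0 layout
    document -> elements -> [{"content": "..."}];
    elements without a non-empty string "content" are skipped.
    """
    elements = parsed.get("document", {}).get("elements", []) or []
    parts = [
        el["content"]
        for el in elements
        if isinstance(el, dict)
        and isinstance(el.get("content"), str)
        and el["content"].strip()
    ]
    return separator.join(parts)


# Hand-written sample shaped like a v2.0 result (not real output).
sample = json.loads("""
{
  "document": {
    "elements": [
      {"type": "title", "content": "Invoice 42"},
      {"type": "text", "content": "Total due: 100 USD"},
      {"type": "figure", "content": ""}
    ]
  }
}
""")

print(extract_text(sample))
```

Skipping empty elements keeps figures or other non-text regions from adding blank paragraphs to the extracted text.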

This example demonstrates incremental document processing using:
- ai_parse_document for extracting structured data from PDFs/images
- ai_query for LLM-based content analysis
- Databricks Workflows with Structured Streaming and serverless compute

Key features:
- Python notebooks with Structured Streaming for incremental processing
- Serverless compute for cost efficiency
- Parameterized workflow with catalog, schema, and table names
- Checkpointed streaming to process only new data
- Visual debugging notebook with interactive bounding boxes
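To illustrate the idea behind checkpointed processing (each run handles only files it has not seen before), here is a toy pure-Python sketch. It is not Structured Streaming and does not reflect how Spark stores checkpoints. The JSON checkpoint file and the process_new_files function are hypothetical, chosen only to show the bookkeeping.

```python
import json
import tempfile
from pathlib import Path


def process_new_files(input_dir: Path, checkpoint: Path) -> list[str]:
    """Return files not yet recorded in the checkpoint, then record them."""
    # Load the set of already-processed files (empty on the first run).
    seen = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    # Discover files recursively, as relative paths.
    current = sorted(
        str(p.relative_to(input_dir)) for p in input_dir.rglob("*") if p.is_file()
    )
    new_files = [f for f in current if f not in seen]
    # A real job would parse each new file here before committing progress.
    checkpoint.write_text(json.dumps(sorted(seen | set(new_files))))
    return new_files


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    inputs = root / "inputs"
    inputs.mkdir()
    ckpt = root / "checkpoint.json"  # kept outside the input directory

    (inputs / "a.pdf").write_bytes(b"%PDF")
    first = process_new_files(inputs, ckpt)   # ['a.pdf']
    (inputs / "b.pdf").write_bytes(b"%PDF")
    second = process_new_files(inputs, ckpt)  # ['b.pdf']
    third = process_new_files(inputs, ckpt)   # []
```

Rerunning without new input yields an empty list, which is the property checkpointing provides: reruns do not reprocess documents that were already handled.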

- Move directory from knowledge_base/ to contrib/
- Replace "workflow" terminology with "job" throughout
- Add recursiveFileLookup option for subdirectory file discovery
- Make all parameters explicit at job level
- Update checkpoint paths to use Unity Catalog volumes instead of /tmp
- Remove hardcoded personal paths from notebooks
- Add 6 visual screenshots showing parsing capabilities
- Update README with Example Output section
- Remove HTML output file in favor of Python notebook
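A small sketch of building a checkpoint location inside a Unity Catalog volume instead of /tmp. Files in a volume are addressed as /Volumes/<catalog>/<schema>/<volume>/<path>. The volume name, the checkpoints/ subfolder, and the helper name below are hypothetical and not taken from the PR.

```python
def volume_checkpoint_path(catalog: str, schema: str, volume: str, stream_name: str) -> str:
    """Build a checkpoint path inside a Unity Catalog volume (hypothetical layout)."""
    # Reject empty components and embedded slashes so each argument stays one path segment.
    for part in (catalog, schema, volume, stream_name):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"/Volumes/{catalog}/{schema}/{volume}/checkpoints/{stream_name}"


print(volume_checkpoint_path("main", "docs", "checkpoints_vol", "parse_documents"))
```

Deriving the path from the same catalog and schema parameters the job already receives avoids hardcoded personal or temporary locations.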

bali0019 added 10 commits October 7, 2025 12:43

Address PR feedback for workflow example

63ec694

- Add job-level parameters block for catalog and schema (shared across all tasks)
- Move optional schedule configuration to top of job definition
- Replace Python notebook with Jupyter notebook format including visual outputs
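A sketch of how shared job-level catalog and schema values might be turned into fully qualified Unity Catalog table names (catalog.schema.table) inside each task. The JobParams class and the table names are hypothetical. In a Databricks notebook task the values would typically be read from the job parameters (for example with dbutils.widgets.get) rather than hard-coded as they are here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobParams:
    """Job-level parameters shared by every task (hypothetical container)."""

    catalog: str
    schema: str

    def table(self, name: str) -> str:
        """Return a three-level Unity Catalog name: catalog.schema.table."""
        return f"{self.catalog}.{self.schema}.{name}"


params = JobParams(catalog="main", schema="ai_docs")
# Each task reuses the same catalog/schema and only picks its own table name.
print(params.table("parsed_documents"))  # main.ai_docs.parsed_documents
print(params.table("extracted_text"))    # main.ai_docs.extracted_text
```

Defining catalog and schema once at the job level means switching environments changes a single pair of values instead of every task.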

Replace ipynb with py and html formats for better output visualization

9d61646

The ipynb format was not preserving the visual output properly. Using .py notebook source and .html export to show outputs correctly.
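For context on the .py choice: a Databricks notebook exported as Python source is plain text that starts with a "# Databricks notebook source" header and separates cells with "# COMMAND ----------" lines. It stores code only, not rendered outputs, which is why outputs are shown through a separate export. The sketch below splits such a file into cells; the function name is hypothetical and it ignores special lines such as "# MAGIC" markdown cells.

```python
HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"


def split_cells(source: str) -> list[str]:
    """Split Databricks .py notebook source into non-empty cell bodies."""
    lines = source.splitlines()
    if not lines or lines[0].strip() != HEADER:
        raise ValueError("not a Databricks notebook source file")
    cells: list[str] = []
    current: list[str] = []
    for line in lines[1:]:
        if line.strip() == CELL_SEPARATOR:
            # Close the current cell at each separator line.
            cells.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    cells.append("\n".join(current).strip())
    return [cell for cell in cells if cell]


notebook = """# Databricks notebook source
catalog = "main"

# COMMAND ----------

print(catalog)
"""
print(split_cells(notebook))  # ['catalog = "main"', 'print(catalog)']
```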

Update HTML output with latest visualization

8578b05

Remove v1.0 parser references as it will be deprecated

5caf2cc

- Remove CASE statement handling v1.0 vs v2.0 in text extraction
- Use only v2.0 format (parsed:document:elements)
- Update README to remove mention of v1.0 support
- Keep explicit version parameter in ai_parse_document call

Rename folder from workflow_with_ai_parse_document to job_with_ai_parse_document

408bc87

Aligns folder name with job terminology used throughout the project

Merge remote-tracking branch 'origin/main' into add-ai-document-workflow-fixes

a5f13e8

Merge branch 'main' into add-ai-document-workflow-fixes

391c466

Remove old workflow_with_ai_parse_document folder

36284bd

Replaced by job_with_ai_parse_document with updated terminology

bali0019 closed this Oct 15, 2025

bali0019 wants to merge 10 commits into databricks:main from bali0019:add-ai-document-workflow-fixes