AI HTML recipe scrape of unsupported websites #674

Draft
TomBursch wants to merge 1 commit into main from full-ai-recipe-scrape

@TomBursch
Owner

Part of #670

@jmylchreest
Contributor

I work with an extraction solution that does something very similar to this, and I recently decided to build a small C-based library and model registry. I'm working on a small fine-tuned model based on Qwen at the moment. All of this will allow the project to run local inference within the app using a small (~200MB) model, avoiding inference costs and letting users do everything on-device. I'll keep you informed if it goes somewhere quickly :)

I mainly wanted to put a note here, simply because it has a lot of overlap.

My other solution approaches this from the API side in a similar way. Cleaning content before sending it for inference is quite important, and the refyne library provides that too. This isn't meant as an ad; I just have a brief demo app that does much the same thing as this PR: https://recipeapp-demo.refyne.uk. I'd be happy to share it or offer a refyne key if you're interested. The work mentioned above includes a re-implementation of the cleaner in Rust that provides C bindings, which the kitchenowl app and API could both use directly.
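
To illustrate what "cleaning content before inference" can mean in practice, here is a minimal sketch in Python. It is not refyne's cleaner or anything from this PR; the `clean_html` function, the `SKIP_TAGS` set, and the `max_chars` limit are all hypothetical names chosen for the example. It drops non-content elements and collapses whitespace so that less markup is sent to the model.

```python
import re
from html.parser import HTMLParser

# Tags whose contents are dropped entirely. This list is an illustrative
# assumption, not taken from refyne or from the PR.
SKIP_TAGS = {"script", "style", "noscript", "svg", "nav", "footer", "header", "form", "iframe"}


class _TextExtractor(HTMLParser):
    """Collects visible text while ignoring everything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join text fragments and collapse runs of whitespace.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def clean_html(html: str, max_chars: int = 20000) -> str:
    """Return the page's visible text, truncated to max_chars (hypothetical limit)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]


if __name__ == "__main__":
    page = "<body><nav>Home</nav><h1>Pancakes</h1><p>2 eggs,   1 cup flour</p></body>"
    print(clean_html(page))
```

A real cleaner would likely need more care, for example keeping list and heading structure so that ingredients and steps stay distinguishable.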

@TomBursch TomBursch force-pushed the full-ai-recipe-scrape branch from 7d7a349 to 4c2f8d7 Compare April 3, 2026 20:10

Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

2 participants