@emanuelebeffa emanuelebeffa commented Nov 27, 2025

Adds AI-powered auto-tagging using OpenAI-compatible APIs. Users can configure their API key, model, and base URL, define a vocabulary of allowed tags, and then use the "Refresh AI tags" action to generate tag suggestions for their bookmarks automatically, based on each bookmark's content (URL, title, description).

The feature includes:

  • API key validation
  • Base URL support for any OpenAI-compatible API (e.g. Ollama)
  • Configurable tag vocabulary to constrain AI suggestions
  • Bulk tagging
  • Unit tests
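
The vocabulary constraint listed above can be pictured as a post-filter on whatever the model suggests. The following is a minimal sketch, not the PR's actual code: the helper name `filter_to_vocabulary` and the case-insensitive matching are assumptions made for illustration.

```python
def filter_to_vocabulary(suggested: list[str], vocabulary: list[str]) -> list[str]:
    """Keep only suggestions that appear in the allowed vocabulary.

    Hypothetical helper: matching is case-insensitive, the vocabulary's own
    spelling is returned, duplicates are dropped, and order is preserved.
    """
    # Map the lowercase form of each allowed tag to its canonical spelling.
    canonical = {tag.strip().lower(): tag.strip() for tag in vocabulary if tag.strip()}
    result: list[str] = []
    seen: set[str] = set()
    for tag in suggested:
        key = tag.strip().lower()
        # Drop anything outside the vocabulary (e.g. a hallucinated tag)
        # and anything already emitted.
        if key in canonical and key not in seen:
            seen.add(key)
            result.append(canonical[key])
    return result


if __name__ == "__main__":
    vocab = ["python", "django", "self-hosting"]
    print(filter_to_vocabulary(["Python", "rust", "django", "python"], vocab))
```

Filtering after the response, rather than trusting the prompt alone, means a tag outside the vocabulary never reaches the bookmark even if the model ignores the instruction.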

Estimated cost analysis

Using OpenAI's gpt-5-nano model at $0.05 per 1M input tokens and $0.4 per 1M output tokens:

Average cost per bookmark: ~$0.000043

  • Input: ~250 tokens (cost: $0.0000125)
  • Output: ~100 tokens (cost: $0.00004)

Volume             Cost
100 bookmarks      ~$0.0043
1,000 bookmarks    ~$0.043
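
The arithmetic behind these estimates can be reproduced with a short sketch. The prices and token counts are the ones quoted above; the function name `cost_per_bookmark` is made up for illustration. With the rounded token counts, the two components sum to about $0.0000525, a little above the quoted ~$0.000043 average; the gap is presumably down to rounding of the token counts.

```python
INPUT_PRICE_PER_M = 0.05   # USD per 1M input tokens (quoted above)
OUTPUT_PRICE_PER_M = 0.40  # USD per 1M output tokens (quoted above)


def cost_per_bookmark(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of tagging one bookmark."""
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Token counts are the rounded estimates from the description.
    per_bookmark = cost_per_bookmark(input_tokens=250, output_tokens=100)
    for volume in (1, 100, 1_000):
        print(f"{volume:>5} bookmarks: ${per_bookmark * volume:.6f}")
```

Output tokens dominate the total at these prices, so the size of the model's response matters more to cost than the size of the prompt.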

Possible future evolution

  • Batch processing optimizations for large bulk operations, further reducing costs
  • Prompt customization
  • List supported models

@emanuelebeffa emanuelebeffa mentioned this pull request Nov 27, 2025

oliexe commented Dec 2, 2025

This is awesome. Lgtm, hopefully this will get merged soon.

@7jrxt42BxFZo4iAnN4CX

@sissbruecker we are waiting for this

Eragos commented Dec 8, 2025

@7jrxt42BxFZo4iAnN4CX Sadly, this only supports OpenAI itself, not a self-hosted Ollama or other AI API solutions. The difference isn't huge, but it should be thought through carefully. Just my 2 cents.

@7jrxt42BxFZo4iAnN4CX

@Eragos Then why not expand it to any OpenAI-compatible API? I'd like to use OpenRouter.

@emanuelebeffa
Author

@7jrxt42BxFZo4iAnN4CX Sadly, this only supports OpenAI itself, not a self-hosted Ollama or other AI API solutions. The difference isn't huge, but it should be thought through carefully. Just my 2 cents.

I’m already working on the Ollama integration; I was just waiting for this PR to be merged first and checking whether there were any blockers.

@7jrxt42BxFZo4iAnN4CX

@emanuelebeffa Great news, that would be great.
We are waiting for the merge.

@emanuelebeffa
Author

I’ve almost finished the Ollama integration; I’ll be reopening the PR soon.

@emanuelebeffa emanuelebeffa reopened this Dec 9, 2025
@emanuelebeffa emanuelebeffa changed the title from "feat: AI auto tagging" to "feat: AI auto-tagging" Dec 9, 2025
@7jrxt42BxFZo4iAnN4CX

@emanuelebeffa

This review was generated with AI assistance

Review Summary

Thanks for this feature! The implementation is well-structured with good test coverage. I found a few issues worth addressing before merge.


🔴 Critical

Bulk "Refresh AI tags" doesn't work for bookmarks with existing tags

The documentation states:

"This will replace the existing tags with new AI-generated suggestions"

However, _auto_tag_bookmark_task skips bookmarks that already have tags:

if bookmark.tags.exists():
    logger.info(f"Skipping AI tagging - bookmark {bookmark_id} already has tags")
    return

Suggested fix: Add a force parameter to allow bulk refresh to override existing tags:

def auto_tag_bookmark(user: User, bookmark: Bookmark, force: bool = False):
    # ...
    _auto_tag_bookmark_task(bookmark.id, user.id, force)

@task()
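def _auto_tag_bookmark_task(bookmark_id: int, user_id: int, force: bool = False):
    # ... (bookmark lookup elided)
    # Skip only when tags already exist and no forced refresh was requested.
    if bookmark.tags.exists() and not force:
        return
    # ...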

Then in refresh_ai_tags, pass force=True.


🟠 Important

1. API key exposed in API response

ai_api_key is included in UserProfileSerializer fields without write_only=True. While only the profile owner can access it, this is still risky (XSS, logging, etc.).

Suggested fix:

ai_api_key = serializers.CharField(write_only=True, required=False, allow_blank=True)

2. No rate limiting for bulk AI operations

Selecting many bookmarks for bulk refresh could trigger hundreds of API calls, potentially exceeding provider rate limits or incurring unexpected costs.

Suggestion: Consider adding a limit or at least a warning in the UI.
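
One way to picture such a limit is a small guard that runs before any tagging tasks are queued. This is only a sketch under assumptions: `MAX_BULK_AI_TAGS`, `BulkPlan`, `plan_bulk_refresh`, and the cap of 50 are hypothetical names and values, not taken from the PR.

```python
from dataclasses import dataclass
from typing import Optional

MAX_BULK_AI_TAGS = 50  # hypothetical cap, not a value from the PR


@dataclass
class BulkPlan:
    accepted_ids: list[int]   # bookmarks that would be sent to the AI provider
    skipped_ids: list[int]    # bookmarks over the cap, left untouched
    warning: Optional[str]    # message to surface in the UI, if any


def plan_bulk_refresh(bookmark_ids: list[int], limit: int = MAX_BULK_AI_TAGS) -> BulkPlan:
    """Split a bulk selection into the part within the cap and the remainder."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    accepted = bookmark_ids[:limit]
    skipped = bookmark_ids[limit:]
    warning = None
    if skipped:
        warning = (
            f"Only the first {limit} of {len(bookmark_ids)} selected "
            "bookmarks will be tagged."
        )
    return BulkPlan(accepted, skipped, warning)


if __name__ == "__main__":
    plan = plan_bulk_refresh(list(range(120)))
    print(len(plan.accepted_ids), len(plan.skipped_ids), plan.warning)
```

Returning the skipped IDs alongside the warning lets the caller decide whether to drop them or offer a second run, instead of silently truncating the selection.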


🟡 Minor / Nice-to-have

  • Timeout: Consider adding an explicit timeout to the OpenAI client (the default is 10 minutes, which is quite long)
  • Vocabulary size limit: Large tag lists increase API costs per request

Overall, great work! The Pydantic structured outputs, error handling with retry logic for 5xx errors, and hallucination filtering are all well done. 👍

@emanuelebeffa
Author

Thanks for the review! I've pushed fixes for "Bulk 'Refresh AI tags' doesn't work for bookmarks with existing tags", "API key exposed in API response", and "Timeout". I've also pushed UI warnings for "No rate limiting for bulk AI operations" and "Vocabulary size limit".

@7jrxt42BxFZo4iAnN4CX

Looks great! I've reviewed the latest changes, and they successfully address all the previous points.

The implementation looks solid and ready to go.
Waiting for @sissbruecker for the final review and merge. 🚀

4 participants