Knowledge (v1) – Product Requirements
Goal
Enable Budibase agents to answer questions using company knowledge stored in common systems, with the simplest possible setup for SME teams.
Users should be able to connect their knowledge sources, select what the agent can access, and immediately ask questions grounded in their documents.
The system should be simple to configure, reliable, and transparent, without exposing complex AI or retrieval infrastructure.
Problem
Operations teams store critical knowledge across multiple systems such as:
- SharePoint
- Google Drive
- Confluence
- PDFs and documents
Today:
- Agents do not have access to this knowledge.
- Users must manually copy information into instructions or prompts.
- There is no reliable way for agents to reference internal documents.
Building a full custom RAG infrastructure (embeddings, vector DB, retrieval pipelines) would add significant engineering complexity and slow delivery.
We need a simple knowledge layer that allows agents to retrieve relevant information from documents while keeping the system easy to operate and maintain.
Solution
Provide a Knowledge system for agents built on:
1. Gemini Flash for reasoning
Agents use Gemini Flash to interpret questions and generate answers.
2. Google File Search for retrieval
Google File Search will manage:
- document chunking
- embeddings
- indexing
- semantic retrieval
Budibase uploads synced documents to File Search and retrieves relevant context during agent responses.
3. First-party knowledge connectors
Support three initial connectors:
- SharePoint
- Google Drive
- Confluence
Each connector follows the same pattern:
Connection
- OAuth connection to the system
Scope
- User selects which folders, spaces, or libraries to include
Sync
- Manual sync button
- Automatic refresh on a schedule
Access
- User selects which agents can access the knowledge source
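The shared connector pattern above could be captured in a single record per knowledge source. The following is a minimal sketch only: the type name `KnowledgeSource`, its fields, and the `validateSource` helper are hypothetical and do not come from the Budibase codebase.

```typescript
// Hypothetical shape for one knowledge source; every name here is an assumption.
type Provider = "sharepoint" | "google_drive" | "confluence";

interface KnowledgeSource {
  id: string;
  provider: Provider;
  connectionId: string; // Connection: reference to the stored OAuth connection
  scope: string[]; // Scope: selected folder, space, or library identifiers
  refreshIntervalHours: number; // Sync: interval for the scheduled refresh
  agentIds: string[]; // Access: agents allowed to use this source
}

// Returns a list of problems; an empty list means the source is ready to sync.
function validateSource(source: KnowledgeSource): string[] {
  const errors: string[] = [];
  if (source.connectionId.trim() === "") errors.push("missing connection");
  if (source.scope.length === 0) errors.push("scope is empty");
  if (source.refreshIntervalHours <= 0) errors.push("refresh interval must be positive");
  return errors;
}
```

Keeping all four concerns on one record means each connector only has to differ in how it lists and fetches files.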
4. Manual + scheduled sync (not live sync)
For v1:
- user triggers Sync now
- automatic refresh runs periodically (interval still to be decided: every 12 or 24 hours)
- only changed files are reindexed
This avoids building complex real-time sync infrastructure.
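One way to reindex only changed files is to compare each remote file's modification timestamp with the value recorded at the last sync. This is a sketch under stated assumptions: `RemoteFile`, `planSync`, and the use of a modification timestamp as the change signal are illustrative choices, and removing files that have disappeared upstream is an assumption the requirements above do not spell out.

```typescript
// Hypothetical types; the change signal (modifiedAt) is an assumption.
interface RemoteFile {
  id: string;
  modifiedAt: string; // ISO 8601 timestamp reported by the source system
}

interface SyncPlan {
  toIndex: string[]; // new or changed files
  toRemove: string[]; // files no longer present upstream (assumed behaviour)
  unchanged: string[]; // files that can be skipped
}

// indexed maps file id -> modifiedAt value recorded at the last successful sync.
function planSync(remote: RemoteFile[], indexed: Record<string, string>): SyncPlan {
  const plan: SyncPlan = { toIndex: [], toRemove: [], unchanged: [] };
  const seen: Record<string, boolean> = {};
  for (const file of remote) {
    seen[file.id] = true;
    if (indexed[file.id] === file.modifiedAt) {
      plan.unchanged.push(file.id);
    } else {
      plan.toIndex.push(file.id);
    }
  }
  for (const id of Object.keys(indexed)) {
    if (!seen[id]) plan.toRemove.push(id);
  }
  return plan;
}
```

The same plan can serve both the Sync now button and the scheduled refresh, since neither needs to know why a file changed.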
No-gos
The following will not be included in v1:
- Live or real-time document sync
- Webhook-driven updates
- Fine-grained upstream permission mirroring
- Custom vector databases
- Retrieval configuration such as top-k, chunk size, or embeddings
- User-facing search UI
- Per-document access control
The system should prioritize simplicity over completeness.
Prior art
Examples of similar systems:
- OpenAI File Search – Managed RAG layer for assistants
- Glean – Enterprise knowledge retrieval platform
- Dust – AI assistants with connected knowledge sources
- ChatGPT Enterprise – Connectors for Google Drive, SharePoint, etc.
Most modern AI platforms follow the same pattern:
Knowledge sources
↓
Connector + sync
↓
Managed retrieval system
↓
LLM reasoning
Budibase follows this model while integrating knowledge directly into agents.
User flow
1. Add knowledge source
User navigates to:
Agents → Knowledge → Add source
Select provider:
- SharePoint
- Google Drive
- Confluence
- Upload files
2. Connect account
User completes OAuth connection.
Example:
Connect SharePoint
→ Microsoft login
→ grant access
Connection is stored for the workspace.
3. Choose scope
User selects what the agent can access.
SharePoint
- site
- library
- folder
Google Drive
- specific folders
Confluence
- spaces
4. Sync knowledge
User runs:
Sync now
System:
- fetches documents
- uploads them to File Search
- indexes them
UI shows:
- Files indexed
- Last synced
- Sync status
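The three values shown in the UI could be derived from the per-file results of a sync run. The sketch below is illustrative: `FileResult`, `SyncSummary`, and `summarizeSync` are hypothetical names, and treating any failed file as a failed run is an assumption.

```typescript
// Hypothetical types for reporting a finished sync run.
type SyncStatus = "succeeded" | "failed";

interface FileResult {
  fileId: string;
  ok: boolean; // true if the file was uploaded and indexed
}

interface SyncSummary {
  filesIndexed: number;
  filesFailed: number;
  lastSynced: string; // ISO 8601 timestamp
  status: SyncStatus;
}

function summarizeSync(results: FileResult[], finishedAt: Date): SyncSummary {
  const filesIndexed = results.filter((r) => r.ok).length;
  const filesFailed = results.length - filesIndexed;
  return {
    filesIndexed,
    filesFailed,
    lastSynced: finishedAt.toISOString(),
    // Assumption: a single failed file marks the whole run as failed.
    status: filesFailed === 0 ? "succeeded" : "failed",
  };
}
```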
5. Grant agent access
User selects which agents can use the knowledge source.
Example:
HR Agent → HR policies
IT Agent → IT runbooks
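The agent-to-source mapping in the example above amounts to a lookup performed before retrieval. A minimal sketch, assuming access is stored as a map from source id to allowed agent ids; `AccessMap` and `sourcesForAgent` are hypothetical names.

```typescript
// Hypothetical storage shape: knowledge source id -> ids of agents allowed to use it.
type AccessMap = Record<string, string[]>;

// Returns the ids of the knowledge sources an agent may query, in sorted order.
function sourcesForAgent(access: AccessMap, agentId: string): string[] {
  return Object.keys(access)
    .filter((sourceId) => (access[sourceId] || []).indexOf(agentId) !== -1)
    .sort();
}
```

Because v1 excludes per-document access control, a check at the source level like this one is the only gate an agent's query would pass through.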
6. Agent answers questions
User asks:
What is our vacation policy?
System flow:
User question
↓
Agent query
↓
Google File Search retrieves relevant document chunks
↓
Gemini Flash generates answer
↓
Agent responds with grounded information
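The step between retrieval and generation can be pictured as assembling a prompt from the retrieved chunks. This sketch assumes the chunks are returned to Budibase before the model call, which may not match how File Search is actually wired into Gemini; `RetrievedChunk`, `buildGroundedPrompt`, and the prompt wording are all hypothetical, and the calls to File Search and Gemini Flash are deliberately left out.

```typescript
// Hypothetical shape of a chunk returned by retrieval.
interface RetrievedChunk {
  sourceTitle: string;
  text: string;
}

// Builds the text sent to the model; the retrieval and generation calls are out of scope here.
function buildGroundedPrompt(question: string, chunks: RetrievedChunk[]): string {
  if (chunks.length === 0) {
    // Assumed behaviour: prefer an explicit "not found" over an ungrounded answer.
    return `No relevant documents were found. Say so rather than guessing.\n\nQuestion: ${question}`;
  }
  const context = chunks
    .map((chunk, i) => `[${i + 1}] ${chunk.sourceTitle}\n${chunk.text}`)
    .join("\n\n");
  return `Answer using only the context below and cite sources by number.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

Numbering the chunks gives the agent a way to point back at the documents it used, which supports the transparency goal.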
Success criteria
- Users can connect a knowledge source in under 3 minutes
- Agents can answer questions grounded in documents
- Sync process is transparent and reliable
- No AI infrastructure configuration required from users