[Project] Agent Knowledge v1 - Budibase AI first approach #18294

@joebudi

Knowledge (v1) – Product Requirements

Goal

Enable Budibase agents to answer questions using company knowledge stored in common systems, with the simplest possible setup for SME teams.

Users should be able to connect their knowledge sources, select what the agent can access, and immediately ask questions grounded in their documents.

The system should be simple to configure, reliable, and transparent, without exposing complex AI or retrieval infrastructure.


Problem

Operations teams store critical knowledge across multiple systems and file formats, such as:

  • SharePoint
  • Google Drive
  • Confluence
  • PDFs and documents

Today:

  • Agents do not have access to this knowledge.
  • Users must manually copy information into instructions or prompts.
  • There is no reliable way for agents to reference internal documents.

Building a full custom RAG stack (embeddings, vector database, retrieval pipelines) would add significant engineering complexity and slow delivery.

We need a simple knowledge layer that allows agents to retrieve relevant information from documents while keeping the system easy to operate and maintain.


Solution

Provide a Knowledge system for agents built on:

1. Gemini Flash for reasoning

Agents use Gemini Flash to interpret questions and generate answers.

2. Google File Search for retrieval

Google File Search will manage:

  • document chunking
  • embeddings
  • indexing
  • semantic retrieval

Budibase uploads synced documents to File Search and retrieves relevant context during agent responses.

3. First-party knowledge connectors

Support three initial connectors:

  • SharePoint
  • Google Drive
  • Confluence

Each connector follows the same pattern:

Connection

  • OAuth connection to the system

Scope

  • User selects which folders, spaces, or libraries to include

Sync

  • Manual sync button
  • Automatic refresh on a schedule

Access

  • User selects which agents can access the knowledge source
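
The four-part pattern above (Connection, Scope, Sync, Access) could be captured as one record per knowledge source. The sketch below is illustrative only: `KnowledgeSource`, `validateSource`, and every field name are assumptions made for this example, not Budibase's actual schema.

```typescript
// Hypothetical data model for one knowledge source; all names are illustrative.
type Provider = "sharepoint" | "google_drive" | "confluence";

interface KnowledgeSource {
  id: string;
  provider: Provider;
  // Connection: reference to the OAuth connection stored for the workspace.
  connectionId: string;
  // Scope: IDs of the folders, spaces, or libraries the user selected.
  scopeIds: string[];
  // Sync: interval for the scheduled refresh (manual sync is always available).
  refreshIntervalHours: number;
  // Access: agents allowed to use this source.
  agentIds: string[];
}

// Returns human-readable problems; an empty list means the source is fully configured.
function validateSource(source: KnowledgeSource): string[] {
  const problems: string[] = [];
  if (source.connectionId.trim() === "") {
    problems.push("No account connected");
  }
  if (source.scopeIds.length === 0) {
    problems.push("No folders, spaces, or libraries selected");
  }
  if (source.refreshIntervalHours <= 0) {
    problems.push("Refresh interval must be positive");
  }
  if (source.agentIds.length === 0) {
    problems.push("No agents have access");
  }
  return problems;
}
```

Keeping all four parts on one record would let the UI show a single checklist per source, though the real storage layout may differ.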

4. Manual + scheduled sync (not live sync)

For v1:

  • user triggers Sync now
  • automatic refresh runs periodically (interval to be decided: every 12 or 24 hours)
  • only changed files are reindexed

This avoids building complex real-time sync infrastructure.
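
One way to reindex only changed files is to compare the file listing from the previous sync with the current one. This is a minimal sketch under stated assumptions: `FileEntry`, `SyncPlan`, and `planSync` are hypothetical names, `fingerprint` stands in for whatever change marker a provider exposes (for example an ETag, version, or modified timestamp), and handling of files deleted upstream is an assumption not covered by the requirements above.

```typescript
// Hypothetical listing entry for one upstream file.
interface FileEntry {
  id: string;
  fingerprint: string; // assumed change marker, e.g. ETag or modified timestamp
}

interface SyncPlan {
  toUpload: string[]; // new or changed files to (re)index
  toDelete: string[]; // files that no longer exist upstream (assumed behaviour)
  unchanged: string[]; // files that can be skipped
}

function planSync(previous: FileEntry[], current: FileEntry[]): SyncPlan {
  // Index the previous listing by file ID for constant-time lookups.
  const previousById = new Map<string, string>();
  for (const file of previous) {
    previousById.set(file.id, file.fingerprint);
  }

  const currentIds = new Set<string>();
  const plan: SyncPlan = { toUpload: [], toDelete: [], unchanged: [] };

  for (const file of current) {
    currentIds.add(file.id);
    if (previousById.get(file.id) === file.fingerprint) {
      plan.unchanged.push(file.id);
    } else {
      // Either a new file or a file whose fingerprint changed.
      plan.toUpload.push(file.id);
    }
  }

  for (const file of previous) {
    if (!currentIds.has(file.id)) {
      plan.toDelete.push(file.id);
    }
  }
  return plan;
}
```

The same plan could serve both the manual Sync now action and the scheduled refresh, since neither needs to know why a sync was triggered.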


No-gos

The following will not be included in v1:

  • Live or real-time document sync
  • Webhook-driven updates
  • Fine-grained upstream permission mirroring
  • Custom vector databases
  • Retrieval configuration such as top-k, chunk size, or embeddings
  • User-facing search UI
  • Per-document access control

The system should prioritize simplicity over completeness.


Prior art

Examples of similar systems:

  • OpenAI File Search – Managed RAG layer for assistants
  • Glean – Enterprise knowledge retrieval platform
  • Dust – AI assistants with connected knowledge sources
  • ChatGPT Enterprise – Connectors for Google Drive, SharePoint, etc.

These platforms follow the same pattern:

Knowledge sources
→ Connector + sync
→ Managed retrieval system
→ LLM reasoning

Budibase follows this model while integrating knowledge directly into agents.


User flow

1. Add knowledge source

User navigates to:

Agents → Knowledge → Add source

Select provider:

  • SharePoint
  • Google Drive
  • Confluence
  • Upload files

2. Connect account

User completes OAuth connection.

Example:

Connect SharePoint
→ Microsoft login
→ grant access

The connection is stored at the workspace level.


3. Choose scope

User selects what the agent can access.

SharePoint

  • site
  • library
  • folder

Google Drive

  • specific folders

Confluence

  • spaces
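
Because each provider scopes access differently, the selection could be modelled as a tagged union with one variant per provider. The type `Scope`, its field names, and `describeScope` are hypothetical and only illustrate the shape of the data.

```typescript
// Hypothetical scope selections, one variant per provider.
type Scope =
  | { provider: "sharepoint"; site: string; library?: string; folder?: string }
  | { provider: "google_drive"; folderIds: string[] }
  | { provider: "confluence"; spaceKeys: string[] };

// Produces a short label for the UI describing what was selected.
function describeScope(scope: Scope): string {
  switch (scope.provider) {
    case "sharepoint": {
      // Site is required; library and folder narrow the selection further.
      const parts = [scope.site, scope.library, scope.folder].filter(
        (part): part is string => part !== undefined
      );
      return `SharePoint: ${parts.join(" / ")}`;
    }
    case "google_drive":
      return `Google Drive: ${scope.folderIds.length} folder(s)`;
    case "confluence":
      return `Confluence: ${scope.spaceKeys.join(", ")}`;
  }
}
```

For example, `describeScope({ provider: "sharepoint", site: "HR", library: "Policies" })` returns `SharePoint: HR / Policies`.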

4. Sync knowledge

User runs:

Sync now

System:

  • fetches documents
  • uploads them to File Search
  • indexes them

UI shows:

  • Files indexed
  • Last synced
  • Sync status
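
The three values shown in the UI, together with the scheduled refresh, suggest a small status record per source. The following is a sketch, not a specification: `SyncStatus`, the state names, `summarizeStatus`, and `isRefreshDue` are assumptions, and the refresh interval is passed in because it has not been decided yet.

```typescript
// Hypothetical sync states; the real set of states may differ.
type SyncState = "never_synced" | "syncing" | "synced" | "failed";

interface SyncStatus {
  state: SyncState;
  filesIndexed: number;
  lastSyncedAt: Date | null; // null until the first successful sync
}

// Builds the three lines shown in the UI.
function summarizeStatus(status: SyncStatus): string[] {
  const lastSynced =
    status.lastSyncedAt === null ? "never" : status.lastSyncedAt.toISOString();
  return [
    `Files indexed: ${status.filesIndexed}`,
    `Last synced: ${lastSynced}`,
    `Sync status: ${status.state}`,
  ];
}

// Decides whether the scheduled refresh should run now.
function isRefreshDue(status: SyncStatus, now: Date, intervalHours: number): boolean {
  if (status.state === "syncing") {
    return false; // avoid overlapping sync runs
  }
  if (status.lastSyncedAt === null) {
    return true; // assumption: a never-synced source is picked up by the schedule
  }
  const elapsedMs = now.getTime() - status.lastSyncedAt.getTime();
  return elapsedMs >= intervalHours * 60 * 60 * 1000;
}
```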

5. Grant agent access

User selects which agents can use the knowledge source.

Example:

HR Agent → HR policies
IT Agent → IT runbooks
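
Since access is granted per knowledge source rather than per document, the check at query time can stay very simple. In this sketch, `AccessGrant`, `sourcesForAgent`, and `canAccess` are hypothetical names, and the IDs are made up to mirror the example above.

```typescript
// Hypothetical grant record: one agent may use one knowledge source.
interface AccessGrant {
  agentId: string;
  sourceId: string;
}

// Lists the knowledge sources an agent may search.
function sourcesForAgent(grants: AccessGrant[], agentId: string): string[] {
  return grants
    .filter((grant) => grant.agentId === agentId)
    .map((grant) => grant.sourceId);
}

// Checks a single agent/source pair before retrieval runs.
function canAccess(grants: AccessGrant[], agentId: string, sourceId: string): boolean {
  return grants.some(
    (grant) => grant.agentId === agentId && grant.sourceId === sourceId
  );
}

// Mirrors the example: HR Agent → HR policies, IT Agent → IT runbooks.
const exampleGrants: AccessGrant[] = [
  { agentId: "hr-agent", sourceId: "hr-policies" },
  { agentId: "it-agent", sourceId: "it-runbooks" },
];
```

With these grants, `sourcesForAgent(exampleGrants, "hr-agent")` yields only `hr-policies`, so the HR agent never searches the IT runbooks.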


6. Agent answers questions

User asks:

What is our vacation policy?

System flow:

User question
→ Agent query
→ Google File Search retrieves relevant document chunks
→ Gemini Flash generates answer
→ Agent responds with grounded information
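
To make the data flow between retrieval and generation concrete, the sketch below assembles a grounded prompt from retrieved chunks. It is an assumption-laden illustration: `RetrievedChunk` and `buildGroundedPrompt` are hypothetical, the prompt wording is invented, and a managed retrieval service may perform this step internally rather than leaving it to Budibase.

```typescript
// Hypothetical shape of a chunk returned by the retrieval layer.
interface RetrievedChunk {
  documentTitle: string;
  text: string;
}

// Combines the user's question with retrieved chunks into one prompt for the model.
function buildGroundedPrompt(question: string, chunks: RetrievedChunk[]): string {
  if (chunks.length === 0) {
    // Without sources, instruct the model not to guess.
    return [
      "No relevant documents were found.",
      "Tell the user the answer is not available in the connected knowledge.",
      "",
      `Question: ${question}`,
    ].join("\n");
  }

  // Number each chunk so the answer can cite its sources.
  const context = chunks
    .map((chunk, index) => `[${index + 1}] ${chunk.documentTitle}\n${chunk.text}`)
    .join("\n\n");

  return [
    "Answer the question using only the sources below. Cite sources by number.",
    "",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the sources is one way to support the transparency goal, because the agent's answer can point back to the documents it relied on.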


Success criteria

  • Users can connect a knowledge source in under 3 minutes
  • Agents can answer questions grounded in documents
  • Sync process is transparent and reliable
  • No AI infrastructure configuration required from users


Status

In progress
