refactor to r2r (#46)
emrgnt-cmplxty authored Feb 21, 2024
1 parent a0ee251 commit 9cc532a
Showing 50 changed files with 57 additions and 57 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -12,8 +12,8 @@ To get started with this project, you'll be using Poetry for managing dependenci
2. **Clone and Install Dependencies:**
- Clone the project repository and navigate to the project directory:
```bash
-git clone git@github.com:SciPhi-AI/sciphi_r2r.git
-cd sciphi_r2r
+git clone git@github.com:SciPhi-AI/r2r.git
+cd r2r
```
- Install the project dependencies with Poetry:
```bash
@@ -33,7 +33,7 @@ This guide should help you set up the project with minimal hassle. Ensure you fo

## Demonstration

-https://github.com/SciPhi-AI/sciphi_r2r/assets/68796651/c648ab67-973a-416a-985e-2eafb0a41ef0
+https://github.com/SciPhi-AI/r2r/assets/68796651/c648ab67-973a-416a-985e-2eafb0a41ef0

## Community
[Join our Discord server!](https://discord.gg/p6KqD2kjtB)
@@ -42,11 +42,11 @@ https://github.com/SciPhi-AI/sciphi_r2r/assets/68796651/c648ab67-973a-416a-985e-

The framework primarily revolves around three core abstractions:

-- The **Ingestion Pipeline**: Facilitates the preparation of embeddable 'Documents' from various data formats (json, txt, pdf, html, etc.). The abstraction can be found in [`ingestion.py`](sciphi_r2r/core/pipelines/ingestion.py).
+- The **Ingestion Pipeline**: Facilitates the preparation of embeddable 'Documents' from various data formats (json, txt, pdf, html, etc.). The abstraction can be found in [`ingestion.py`](r2r/core/pipelines/ingestion.py).

-- The **Embedding Pipeline**: Manages the transformation of text into stored vector embeddings, interacting with embedding and vector database providers through a series of steps (e.g., extract_text, transform_text, chunk_text, embed_chunks, etc.). The abstraction can be found in [`embedding.py`](sciphi_r2r/core/pipelines/embedding.py).
+- The **Embedding Pipeline**: Manages the transformation of text into stored vector embeddings, interacting with embedding and vector database providers through a series of steps (e.g., extract_text, transform_text, chunk_text, embed_chunks, etc.). The abstraction can be found in [`embedding.py`](r2r/core/pipelines/embedding.py).

-- The **RAG Pipeline**: Works similarly to the embedding pipeline but incorporates an LLM provider to produce text completions. The abstraction can be found in [`rag.py`](sciphi_r2r/core/pipelines/rag.py).
+- The **RAG Pipeline**: Works similarly to the embedding pipeline but incorporates an LLM provider to produce text completions. The abstraction can be found in [`rag.py`](r2r/core/pipelines/rag.py).

Each pipeline incorporates a logging database for operation tracking and observability.
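
Reviewer note: the step names listed for the Embedding Pipeline (extract_text, transform_text, chunk_text, embed_chunks) suggest a template-method shape. The sketch below only illustrates that shape, assuming each step is an overridable method. `EmbeddingPipelineSketch`, `ToyEmbeddingPipeline`, `run`, the dict-shaped document, and the two-number placeholder "embedding" are all hypothetical; they are not the classes in `r2r/core/pipelines/embedding.py`, and the vector database and logging steps are left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class EmbeddingPipelineSketch(ABC):
    """Hypothetical outline only; not the class defined in the repository."""

    @abstractmethod
    def extract_text(self, document: Dict[str, str]) -> str:
        """Pull raw text out of an ingested document."""

    @abstractmethod
    def transform_text(self, text: str) -> str:
        """Clean or normalize the text before chunking."""

    @abstractmethod
    def chunk_text(self, text: str) -> List[str]:
        """Split the text into embeddable pieces."""

    @abstractmethod
    def embed_chunks(self, chunks: List[str]) -> List[List[float]]:
        """Turn each chunk into a vector."""

    def run(self, document: Dict[str, str]) -> List[Tuple[str, List[float]]]:
        # Run the steps in the order the README lists them.
        text = self.transform_text(self.extract_text(document))
        chunks = self.chunk_text(text)
        return list(zip(chunks, self.embed_chunks(chunks)))


class ToyEmbeddingPipeline(EmbeddingPipelineSketch):
    """Toy implementation: fixed-width chunks and a fake 2-d 'embedding'."""

    def __init__(self, chunk_size: int = 8) -> None:
        self.chunk_size = chunk_size

    def extract_text(self, document: Dict[str, str]) -> str:
        return document["text"]

    def transform_text(self, text: str) -> str:
        # Collapse runs of whitespace into single spaces.
        return " ".join(text.split())

    def chunk_text(self, text: str) -> List[str]:
        size = self.chunk_size
        return [text[i:i + size] for i in range(0, len(text), size)]

    def embed_chunks(self, chunks: List[str]) -> List[List[float]]:
        # Placeholder vector: [chunk length, lowercase vowel count].
        # A real pipeline would call an embedding provider here.
        return [
            [float(len(c)), float(sum(ch in "aeiou" for ch in c))]
            for c in chunks
        ]
```

Calling `ToyEmbeddingPipeline(chunk_size=8).run({"text": "hello   world  again"})` returns one `(chunk, vector)` pair per chunk; a real implementation would write those pairs to the vector database provider instead of returning them.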

2 changes: 1 addition & 1 deletion config.json
@@ -1,7 +1,7 @@
{
"logging": {
"level": "INFO",
-"name": "sciphi_r2r",
+"name": "r2r",
"database": "demo_logs_v1"
},
"embedding": {
14 changes: 7 additions & 7 deletions examples/basic/app.py
@@ -3,17 +3,17 @@
import dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter

-from sciphi_r2r.core import GenerationConfig, LoggingDatabaseConnection
-from sciphi_r2r.datasets import HuggingFaceDataProvider
-from sciphi_r2r.embeddings import OpenAIEmbeddingProvider
-from sciphi_r2r.llms import OpenAIConfig, OpenAILLM
-from sciphi_r2r.main import create_app, load_config
-from sciphi_r2r.pipelines import (
+from r2r.core import GenerationConfig, LoggingDatabaseConnection
+from r2r.datasets import HuggingFaceDataProvider
+from r2r.embeddings import OpenAIEmbeddingProvider
+from r2r.llms import OpenAIConfig, OpenAILLM
+from r2r.main import create_app, load_config
+from r2r.pipelines import (
BasicEmbeddingPipeline,
BasicIngestionPipeline,
BasicRAGPipeline,
)
-from sciphi_r2r.vector_dbs import PGVectorDB, QdrantDB
+from r2r.vector_dbs import PGVectorDB, QdrantDB

dotenv.load_dotenv()

12 changes: 6 additions & 6 deletions examples/basic/embedding_pipeline.py
@@ -5,12 +5,12 @@
import dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter

-from sciphi_r2r.core import DatasetConfig, LoggingDatabaseConnection
-from sciphi_r2r.datasets import HuggingFaceDataProvider
-from sciphi_r2r.embeddings import OpenAIEmbeddingProvider
-from sciphi_r2r.main import load_config
-from sciphi_r2r.pipelines import BasicDocument, BasicEmbeddingPipeline
-from sciphi_r2r.vector_dbs import PGVectorDB, QdrantDB
+from r2r.core import DatasetConfig, LoggingDatabaseConnection
+from r2r.datasets import HuggingFaceDataProvider
+from r2r.embeddings import OpenAIEmbeddingProvider
+from r2r.main import load_config
+from r2r.pipelines import BasicDocument, BasicEmbeddingPipeline
+from r2r.vector_dbs import PGVectorDB, QdrantDB

if __name__ == "__main__":
dotenv.load_dotenv()
12 changes: 6 additions & 6 deletions examples/basic/rag_pipeline.py
@@ -4,12 +4,12 @@

import dotenv

-from sciphi_r2r.core import GenerationConfig, LoggingDatabaseConnection
-from sciphi_r2r.embeddings import OpenAIEmbeddingProvider
-from sciphi_r2r.llms import OpenAIConfig, OpenAILLM
-from sciphi_r2r.main import load_config
-from sciphi_r2r.pipelines import BasicRAGPipeline
-from sciphi_r2r.vector_dbs import PGVectorDB, QdrantDB
+from r2r.core import GenerationConfig, LoggingDatabaseConnection
+from r2r.embeddings import OpenAIEmbeddingProvider
+from r2r.llms import OpenAIConfig, OpenAILLM
+from r2r.main import load_config
+from r2r.pipelines import BasicRAGPipeline
+from r2r.vector_dbs import PGVectorDB, QdrantDB


class DemoRAGPipeline(BasicRAGPipeline):
2 changes: 1 addition & 1 deletion examples/client/test_client.py
@@ -1,6 +1,6 @@
import uuid

-from sciphi_r2r.client import SciPhiR2RClient
+from r2r.client import SciPhiR2RClient

# Initialize the client with the base URL of your API
base_url = "http://localhost:8000" # Change this to your actual API base URL
12 changes: 6 additions & 6 deletions examples/web_search/rag_pipeline.py
@@ -3,12 +3,12 @@

import dotenv

-from sciphi_r2r.core import GenerationConfig, LoggingDatabaseConnection
-from sciphi_r2r.embeddings import OpenAIEmbeddingProvider
-from sciphi_r2r.llms import OpenAIConfig, OpenAILLM
-from sciphi_r2r.main import load_config
-from sciphi_r2r.pipelines import WebSearchRAGPipeline
-from sciphi_r2r.vector_dbs import PGVectorDB, QdrantDB
+from r2r.core import GenerationConfig, LoggingDatabaseConnection
+from r2r.embeddings import OpenAIEmbeddingProvider
+from r2r.llms import OpenAIConfig, OpenAILLM
+from r2r.main import load_config
+from r2r.pipelines import WebSearchRAGPipeline
+from r2r.vector_dbs import PGVectorDB, QdrantDB

vector_db_provider = "qdrant"
if __name__ == "__main__":
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -1,7 +1,7 @@
import logging
from typing import Generator, Optional, Tuple

-from sciphi_r2r.core import DatasetConfig, DatasetProvider
+from r2r.core import DatasetConfig, DatasetProvider

logger = logging.getLogger(__name__)

@@ -2,7 +2,7 @@
import os
from typing import Generator, Optional, Tuple

-from sciphi_r2r.core import DatasetConfig, DatasetProvider
+from r2r.core import DatasetConfig, DatasetProvider

logger = logging.getLogger(__name__)

File renamed without changes.
@@ -4,7 +4,7 @@

from openai import OpenAI

-from sciphi_r2r.core import EmbeddingProvider
+from r2r.core import EmbeddingProvider

logger = logging.getLogger(__name__)

@@ -1,7 +1,7 @@
import logging
from typing import Optional

-from sciphi_r2r.core import EmbeddingProvider
+from r2r.core import EmbeddingProvider

logger = logging.getLogger(__name__)

File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion sciphi_r2r/llms/openai/base.py → r2r/llms/openai/base.py
@@ -6,7 +6,7 @@
from openai.types import Completion
from openai.types.chat import ChatCompletion

-from sciphi_r2r.core import GenerationConfig, LLMConfig, LLMProvider
+from r2r.core import GenerationConfig, LLMConfig, LLMProvider

logger = logging.getLogger(__name__)

File renamed without changes.
6 changes: 3 additions & 3 deletions sciphi_r2r/main/app.py → r2r/main/app.py
@@ -7,15 +7,15 @@
from fastapi import FastAPI, File, Form, HTTPException, UploadFile
from pydantic import BaseModel

-from sciphi_r2r.core import (
+from r2r.core import (
EmbeddingPipeline,
IngestionPipeline,
LoggingDatabaseConnection,
RAGPipeline,
)
-from sciphi_r2r.main.utils import configure_logging, find_project_root
+from r2r.main.utils import configure_logging, find_project_root

-logger = logging.getLogger("sciphi_r2r")
+logger = logging.getLogger("r2r")

# Current directory where this script is located
CURRENT_DIR = Path(__file__).resolve().parent
4 changes: 2 additions & 2 deletions sciphi_r2r/main/utils.py → r2r/main/utils.py
@@ -54,12 +54,12 @@ def configure_logging():
os.makedirs(logs_dir)

# Create a custom logger
-logger = logging.getLogger("sciphi_r2r")
+logger = logging.getLogger("r2r")
logger.setLevel(logging.DEBUG) # Set the logging level

# Create handlers (console and file handler with rotation)
c_handler = logging.StreamHandler()
-log_file_path = os.path.join(logs_dir, "sciphi_r2r.log")
+log_file_path = os.path.join(logs_dir, "r2r.log")
f_handler = RotatingFileHandler(
log_file_path, maxBytes=1000000, backupCount=5
)
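
Reviewer note: a minimal sketch of the logger setup this hunk renames, assuming the same console-plus-rotating-file layout and the `maxBytes` and `backupCount` values shown above. `configure_logging_sketch`, its parameters, the duplicate-handler guard, and the format string are hypothetical and are not the body of `r2r/main/utils.py`.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def configure_logging_sketch(logs_dir: str, name: str = "r2r") -> logging.Logger:
    """Attach a console handler and a rotating file handler to one named logger."""
    os.makedirs(logs_dir, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)

    # Assumption: skip setup if handlers already exist, so repeated calls
    # do not emit every record twice.
    if logger.handlers:
        return logger

    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )
    c_handler = logging.StreamHandler()
    f_handler = RotatingFileHandler(
        os.path.join(logs_dir, f"{name}.log"), maxBytes=1000000, backupCount=5
    )
    for handler in (c_handler, f_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Because the package modules call `logging.getLogger(__name__)`, their logger names now begin with `r2r.`, so they are children of the `"r2r"` logger and their records propagate to these handlers by default. Any external configuration that still targets the `"sciphi_r2r"` logger name would no longer match.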
File renamed without changes.
@@ -8,14 +8,14 @@

from langchain.text_splitter import TextSplitter

-from sciphi_r2r.core import (
+from r2r.core import (
BasicDocument,
EmbeddingPipeline,
LoggingDatabaseConnection,
VectorDBProvider,
VectorEntry,
)
-from sciphi_r2r.embeddings import OpenAIEmbeddingProvider
+from r2r.embeddings import OpenAIEmbeddingProvider

logger = logging.getLogger(__name__)

@@ -8,7 +8,7 @@
from enum import Enum
from typing import Optional, Union

-from sciphi_r2r.core import (
+from r2r.core import (
BasicDocument,
IngestionPipeline,
LoggingDatabaseConnection,
@@ -4,7 +4,7 @@
import logging
from typing import Optional

-from sciphi_r2r.core import (
+from r2r.core import (
GenerationConfig,
LLMProvider,
LoggingDatabaseConnection,
@@ -13,7 +13,7 @@
VectorSearchResult,
log_execution_to_db,
)
-from sciphi_r2r.embeddings import OpenAIEmbeddingProvider
+from r2r.embeddings import OpenAIEmbeddingProvider

logger = logging.getLogger(__name__)

@@ -4,15 +4,15 @@
import logging
from typing import Optional

-from sciphi_r2r.core import (
+from r2r.core import (
GenerationConfig,
LLMProvider,
LoggingDatabaseConnection,
VectorDBProvider,
log_execution_to_db,
)
-from sciphi_r2r.embeddings import OpenAIEmbeddingProvider
-from sciphi_r2r.integrations import SerperClient
+from r2r.embeddings import OpenAIEmbeddingProvider
+from r2r.integrations import SerperClient

from ..basic.rag import BasicRAGPipeline

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -2,9 +2,9 @@
import os
from typing import Optional, Union

-from sciphi_r2r.core import VectorDBProvider, VectorEntry, VectorSearchResult
-from sciphi_r2r.vecs.client import Client
-from sciphi_r2r.vecs.collection import Collection
+from r2r.core import VectorDBProvider, VectorEntry, VectorSearchResult
+from r2r.vecs.client import Client
+from r2r.vecs.collection import Collection

logger = logging.getLogger(__name__)

@@ -21,7 +21,7 @@ def __init__(self, provider: str = "pgvector") -> None:
"PGVectorDB must be initialized with provider `pgvector`."
)
try:
-import sciphi_r2r.vecs
+import r2r.vecs
except ImportError:
raise ValueError(
f"Error, PGVectorDB requires the vecs library. Please run `poetry add vecs`."
@@ -36,7 +36,7 @@ def __init__(self, provider: str = "pgvector") -> None:
DB_CONNECTION = (
f"postgresql://{user}:{password}@{host}:{port}/{db_name}"
)
-self.vx: Client = sciphi_r2r.vecs.create_client(DB_CONNECTION)
+self.vx: Client = r2r.vecs.create_client(DB_CONNECTION)
except Exception as e:
raise ValueError(
f"Error {e} occurred while attempting to connect to the pgvector provider with {DB_CONNECTION}."
@@ -2,7 +2,7 @@
import os
from typing import Optional, Union

-from sciphi_r2r.core import VectorDBProvider, VectorEntry, VectorSearchResult
+from r2r.core import VectorDBProvider, VectorEntry, VectorSearchResult

logger = logging.getLogger(__name__)
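
Reviewer note: downstream code that still imports `sciphi_r2r` will need the same mechanical change as the files in this commit. The following is a minimal sketch of one way to apply it to a source string, assuming a regex over import statements is sufficient. `rewrite_imports` and `_OLD_IMPORT` are hypothetical helpers, not part of this repository.

```python
import re

# Match the old package name only where it is the first component of an
# import target, e.g. "from sciphi_r2r.core import X" or "import sciphi_r2r.vecs".
_OLD_IMPORT = re.compile(
    r"^([ \t]*(?:from|import)[ \t]+)sciphi_r2r(?=[.\s]|$)",
    re.MULTILINE,
)


def rewrite_imports(source: str, new_name: str = "r2r") -> str:
    """Return `source` with `sciphi_r2r` import targets renamed to `new_name`."""
    # Keep the captured "from "/"import " prefix and swap only the package name.
    return _OLD_IMPORT.sub(lambda match: match.group(1) + new_name, source)
```

The pattern deliberately leaves string literals and similarly named packages alone. It also leaves attribute references such as `sciphi_r2r.vecs.create_client(...)` untouched, so those need a separate pass, as this commit did by hand in the pgvector provider.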
