Commit 65cd2c9

Update README.md
1 parent ee497f4 commit 65cd2c9

1 file changed

+96
-47
lines changed

README.md

Lines changed: 96 additions & 47 deletions
@@ -1,59 +1,75 @@
1-
# Document Question Answering System
2-
3-
This system provides a document-based question answering capability for the Smalltalk application. It allows users to upload documents, which are processed, split into fragments, and converted into vector embeddings for semantic search.
4-
5-
## Overview
6-
7-
The document QA system integrates with the existing Smalltalk application and enables:
8-
9-
1. Document upload and processing
10-
2. Automatic splitting of documents into semantically meaningful fragments
11-
3. Generation of vector embeddings for each fragment
12-
4. Retrieval of relevant document fragments based on user queries
13-
5. Enhanced responses that incorporate information from uploaded documents
14-
15-
## Key Components
16-
17-
### 1. DocumentFragment
1+
smalltalk
2+
==
3+
smalltalk is a tinystruct-based project that provides instant messaging functionality. It allows users to send text and share images, documents, and other content.
4+
It also lets you interact with ChatGPT, a language model developed by OpenAI, through a command-line interface (CLI) or a web interface.
5+
6+
[![Star History Chart](https://api.star-history.com/svg?repos=tinystruct/smalltalk&type=Date)](https://star-history.com/#tinystruct/smalltalk&Date)
7+
8+
Installation
9+
---
10+
1. Download the project from GitHub by clicking the "Clone or download" button, then selecting "Download ZIP".
11+
2. Extract the downloaded ZIP file to your local machine.
12+
3. Alternatively, if you use git, clone the repository instead of following steps 1 and 2:
13+
```bash
14+
git clone https://github.com/tinystruct/smalltalk.git
15+
```
16+
4. Install the Java Development Kit (JDK 11 or later) by following this [tutorial](https://openjdk.org/install/). To download and install it manually, see the [OpenJDK Archive](https://jdk.java.net/archive/). A Java development environment such as Eclipse or IntelliJ IDEA is helpful but not required.
17+
18+
If your current environment uses JDK 8, you can run the command below to upgrade it quickly.
19+
```
20+
bin/openjdk-upgrade
21+
```
22+
5. Import the extracted or cloned project into your Java development environment.
23+
6. Open `src/main/resources/application.properties` and set `openai.api_key` to your own key, or set the environment variable `OPENAI_API_KEY` instead.
24+
7. Finally, compile the project:
25+
```tcsh
26+
./mvnw compile
27+
```
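Before compiling, it can help to confirm that a key is configured in one of the two places named in step 6. The following is a minimal sketch of such a check, not part of smalltalk: the function names `key_from_properties` and `key_source` and the `PROPS_FILE` override are hypothetical, and checking the environment variable first is this sketch's own choice, not a statement about which source the application prefers.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not shipped with smalltalk):
# reports where an OpenAI API key is configured.

# Properties path from step 6; PROPS_FILE is an assumed override for testing.
PROPS_FILE="${PROPS_FILE:-src/main/resources/application.properties}"

# Print the value of openai.api_key from the properties file, or nothing if absent.
key_from_properties() {
  [ -f "$PROPS_FILE" ] || return 0
  # Accept "openai.api_key=value" and "openai.api_key = value"; keep the last match.
  sed -n 's/^[[:space:]]*openai\.api_key[[:space:]]*=[[:space:]]*//p' "$PROPS_FILE" | tail -n 1
}

# Print "environment", "properties", or "missing".
key_source() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "environment"
  elif [ -n "$(key_from_properties)" ]; then
    echo "properties"
  else
    echo "missing"
  fi
}

key_source
```

Run from the project's root directory, the script prints one word. It only checks that a non-empty value exists, so a placeholder left in the properties file would still be reported as `properties`.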
1828

19-
This class represents a piece of text from a document, with metadata such as:
20-
- Document ID
21-
- Content
22-
- Fragment index
23-
- File path
24-
- MIME type
25-
- Creation timestamp
29+
Usage
30+
---
31+
You can run smalltalk in different ways:
2632

27-
### 2. DocumentEmbedding
33+
CLI mode
34+
1. Open a terminal and navigate to the project's root directory.
35+
2. To execute it in CLI mode, run the following command:
36+
```tcsh
37+
bin/dispatcher --version
38+
```
39+
To see the available commands, run the following command:
40+
```tcsh
41+
bin/dispatcher --help
42+
```
43+
To interact with ChatGPT, use the chat command, for example:
44+
```tcsh
45+
bin/dispatcher chat
46+
```
47+
![CLI](https://github.com/tinystruct/smalltalk/assets/3631818/b49bab05-0135-4383-b252-0ca9c011f6e8)
2848

29-
This class stores the vector representation of a document fragment:
30-
- Fragment ID (references DocumentFragment)
31-
- Embedding vector (stored as serialized byte array)
32-
- Embedding dimension
33-
- Creation timestamp
49+
Web mode
3450

35-
### 3. DocumentProcessor
51+
1. Run the project in a servlet container or in an HTTP server:
52+
2. To run it in a servlet container, you need to compile the project first:
3653

37-
Handles the processing of uploaded documents:
38-
- Extracts textual content using Apache Tika for rich document formats
39-
- Splits content into appropriately sized fragments
40-
- Creates DocumentFragment objects for each fragment
41-
- Supports PDF, Word, Excel, PowerPoint, and other document formats through Tika
54+
Then run it on a Tomcat server with the following command:
4255

43-
### 4. EmbeddingManager
56+
```tcsh
57+
sudo bin/dispatcher start --import org.tinystruct.system.TomcatServer --server-port 777
58+
```
59+
Or run it on a Netty HTTP server with the following command:
4460

45-
Manages the generation and retrieval of embeddings:
46-
- Generates embeddings using OpenAI's embedding API
47-
- Caches query embeddings to reduce API calls
48-
- Implements the cosine similarity function for semantic search
49-
- Provides methods to find similar documents based on query
61+
```tcsh
62+
sudo bin/dispatcher start --import org.tinystruct.system.NettyHttpServer --server-port 777
63+
```
64+
3. To run it in a Docker container, you can use the command below:
5065

51-
### 5. DocumentQA
66+
```tcsh
67+
docker run -d -p 777:777 -e "OPENAI_API_KEY=[YOUR-OPENAI-API-KEY]" -e "STABILITY_API_KEY=[YOUR-STABILITY-API-KEY]" m0ver/smalltalk
68+
```
69+
4. Access the application by navigating to http://localhost:777/?q=talk in your web browser.
70+
5. To talk with ChatGPT, include @ChatGPT in the conversation topic when you set it up.
5271

53-
Implements the question answering functionality:
54-
- Finds relevant document fragments for a given query
55-
- Formats document context for inclusion in AI responses
56-
- Enhances queries with document context
72+
![Web](https://github.com/tinystruct/smalltalk/assets/3631818/32e50145-a5be-41d6-9cea-5b25e76e9f1b)
5773

5874
## Database Schema (SQLite)
5975

@@ -144,3 +160,36 @@ The system relies on the following key libraries:
144160
- OpenAI API - For generating embeddings
145161
- tinystruct - For application framework and database access
146162

163+
Demonstration
164+
---
165+
A demonstration of Comet technology, which works without WebSocket and supports any web browser:
166+
167+
https://tinystruct.herokuapp.com/?q=talk
168+
169+
Troubleshooting
170+
---
171+
* If you encounter any problems during the installation or usage of the project, please check the project's documentation or build files for information about how to set up and run the project.
172+
* If you still have problems, please open an issue on GitHub or contact the project maintainers for help.
173+
174+
Contribution
175+
---
176+
We welcome contributions to the smalltalk project. If you are interested in contributing, please read the CONTRIBUTING.md file for more information about the project's development process and coding standards.
177+
178+
Acknowledgements
179+
---
180+
smalltalk uses the OpenAI API to interact with the ChatGPT language model. We would like to thank OpenAI for providing this powerful tool to the community.
181+
182+
License
183+
---
184+
185+
Licensed under the Apache License, Version 2.0 (the "License");
186+
you may not use this file except in compliance with the License.
187+
You may obtain a copy of the License at
188+
189+
http://www.apache.org/licenses/LICENSE-2.0
190+
191+
Unless required by applicable law or agreed to in writing, software
192+
distributed under the License is distributed on an "AS IS" BASIS,
193+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
194+
See the License for the specific language governing permissions and
195+
limitations under the License.
