BeepGPT keeps you in the loop without disturbing your focus. Its personalized, intelligent AI continuously monitors your Slack workspace, alerting you to important conversations and freeing you to concentrate on what matters most.
BeepGPT reads the full history of your (public) Slack workspace and trains a Generative AI model to predict when you need to engage with a conversation. This training process gives the AI a deep understanding of your interests, expertise, and relationships. Using this understanding, BeepGPT watches conversations in real time and notifies you when an important conversation is happening without you. With BeepGPT you can focus on getting things done without worrying about missing out.
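As a rough illustration of the training step, the history can be turned into fine-tuning examples where the prompt is a snippet of conversation and the completion is a single token standing in for the user who engaged next. The sketch below is a minimal, hypothetical example assuming OpenAI's legacy prompt/completion fine-tuning format; the field contents and token naming are invented, not the exact schema used in the training notebooks.

```python
import json

# Hypothetical training example: the prompt is recent conversation text, and
# the completion is a single token representing the user who replied next
# (see the labels_*.json files described below).
example = {
    "prompt": "alice: the deploy failed again\nbob: looks like a config issue\n\n###\n\n",
    "completion": " user_42",
}

# OpenAI fine-tuning consumes examples as JSON Lines.
with open("training-data.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```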
This repo provides a notebook for you to train your own model using your Slack data, as well as a script to run the alerting bot in production.
- To just experiment with Kaskada, feel free to use the Example Slack Export included in the repo.
- To also experiment with training a model, you will need an OpenAI API key. See Getting an OpenAI API key
- To train a model on your Slack, you will need a JSON export of your Slack history. An export can be initiated at https://<your-workspace>.slack.com/services/export (substitute your workspace's subdomain). An Admin-level user of your Slack workspace will need to do the export.
- To run the "Production" code and receive alerts from a bot, you will need to create a Slack App. See Creating a Slack App
- `slack-export/` contains an example Slack workspace export. To learn more about the format of the export, see: https://slack.com/help/articles/220556107-How-to-read-Slack-data-exports
  - Note that some PII has been removed from the export, but this doesn't affect the files for our use case.
- `slack-export.parquet` contains the data from the example export, in the proper format to be consumed by Kaskada (see the conversion sketch below).
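For orientation, a Slack export contains one directory per public channel, with one JSON file of messages per day. A flattening step along these lines produces such a Parquet file; this is a minimal sketch, and the column names are assumptions rather than the exact schema the notebooks consume.

```python
import json
from pathlib import Path

import pandas as pd  # to_parquet requires pyarrow (or fastparquet)

rows = []
for day_file in Path("slack-export").glob("*/*.json"):
    channel = day_file.parent.name
    for msg in json.loads(day_file.read_text()):
        # Keep ordinary user messages; skip joins, bots, and other subtypes.
        if msg.get("type") == "message" and "user" in msg and "subtype" not in msg:
            rows.append({
                "channel": channel,
                "user": msg["user"],
                "ts": float(msg["ts"]),
                "text": msg.get("text", ""),
            })

pd.DataFrame(rows).to_parquet("slack-export.parquet")
```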
- The files inside `slack-generation/` include:
  - `notebook.ipynb`: a Jupyter notebook which describes how to use OpenAI to generate historical Slack data.
  - `projects.json` and `schedule.jsonl`: used in the above notebook.
  - `generated.jsonl`: the raw generated historical data.
- `slack-generation.parquet` contains all the generated Slack data in the proper format to be consumed by Kaskada. It is used as example data in the `v2` training notebook below.
- `slack-generation.users.json` contains an example `users.json` file for user lookup in various notebooks.
- `FineTuning_v2.ipynb` is a Jupyter notebook which contains all the details of how we successfully trained a model to power BeepGPT.
- `human.py` is a Python script used in the v2 training process. See section 2.1 in the `FineTuning_v2.ipynb` notebook for more info.
- `FineTuning_v1.ipynb` is an earlier version of the training procedure. Models trained with this notebook don't generalize as well as those trained with the "v2" notebook.
- `labels_v2.json` / `labels_v1.json` are lists of userIds from the training notebooks. These files are used in the production code to convert single-token user representations back to their original userId (see the lookup sketch below).
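Assuming each labels file is a plain JSON array of userIds, where a user's position in the array corresponds to their single token, the conversion back is just an index lookup. A minimal sketch:

```python
import json

# Load the userId list produced by the training notebook.
with open("labels_v2.json") as f:
    user_ids = json.load(f)

def token_to_user_id(token_index: int) -> str:
    """Map a predicted single-token user representation back to its Slack userId."""
    return user_ids[token_index]
```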
- `beep-gpt.py` contains the code that watches Slack in real time and alerts you about important conversations. This code uses `messages.parquet` and `labels_*.json` as inputs. Note that this code is not production-ready, but it functions well enough to demo the full application path.
  - To run this code, first make sure you are using at least Python 3.8 (3.11 recommended).
  - Next, install the required libraries:

    ```sh
    pip install -r requirements.txt
    ```

  - Then set the following environment variables:
    - `OPEN_AI_KEY`: found here: https://platform.openai.com/account/api-keys
    - `SLACK_APP_TOKEN`: found at https://api.slack.com/apps/<your-app-id>/general; should start with `xapp-`
    - `SLACK_BOT_TOKEN`: found at https://api.slack.com/apps/<your-app-id>/oauth; should start with `xoxb-`
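    For example, in a POSIX shell (placeholder values shown):

    ```sh
    export OPEN_AI_KEY="sk-..."          # OpenAI API key
    export SLACK_APP_TOKEN="xapp-..."    # App-Level Token (Socket Mode)
    export SLACK_BOT_TOKEN="xoxb-..."    # Bot User OAuth Token
    ```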
  - Finally, start the script:

    ```sh
    python beep-gpt.py
    ```
- `manifest.yaml` contains a template for creating a new App in Slack.
The following files are related to a new approach we are taking on the project:
- Let users specify what topics they are interested in following.
- The system can recommend topics based on previous history.
Files:
- `ChatCompletion_v1.ipynb`: a Jupyter notebook that uses chat completion to determine if a user should be notified (sketched below).
  - This notebook was primarily created as a baseline for comparing against the results of other methods.
- `ChatCompletion_v1_results_*.jsonl` contains the results from the above notebook across various runs.
  - The results are different on each run.
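The core of that baseline looks roughly like the following. This is a minimal sketch using the current `openai` Python client; the model name and prompt wording are assumptions, not the notebook's exact choices.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def should_notify(user_interests: str, conversation: str) -> bool:
    """Ask the LLM whether a conversation is relevant to a user's stated interests."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Answer YES or NO: is this Slack conversation relevant "
                           f"to a user interested in: {user_interests}?",
            },
            {"role": "user", "content": conversation},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```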
- `Vectors_v0.ipynb`: a Jupyter notebook that uses embeddings and vector search to determine if a user should be notified.
  - This notebook creates embeddings for the conversations and tries to match them to topics.
- `Vectors_v1.ipynb`: a Jupyter notebook that evaluates numerous embedding models to try to determine which works best for this scenario (the topic-matching idea is sketched after this list).
  - This notebook creates embeddings for the topics and tries to match them to conversations.
  - Outputs are compared to the results from `ChatCompletion_v1.ipynb` to determine which embedding models work best.
- `Vectors_v1_<model_name>_scores.jsonl`: output from running retrieval against the topic embeddings for each conversation.
  - One file for each embedding model.
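The topic-matching idea behind these notebooks can be sketched as follows; the embedding model, similarity threshold, and example topics here are placeholder assumptions, not values from the notebooks.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

topics = ["kubernetes incidents", "Q3 launch planning"]  # user-specified topics
topic_vecs = [embed(t) for t in topics]

def matching_topic(conversation: str, threshold: float = 0.35):
    """Return the best-matching topic and score, or None if nothing clears the threshold."""
    conv_vec = embed(conversation)
    sims = [
        float(np.dot(conv_vec, tv) / (np.linalg.norm(conv_vec) * np.linalg.norm(tv)))
        for tv in topic_vecs
    ]
    best = int(np.argmax(sims))
    return (topics[best], sims[best]) if sims[best] >= threshold else None
```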
In most of the examples above, we used "10 minutes with no new messages" as the separator between conversations. But in real life this is probably a bad way to determine whether a conversation has ended. In the following files we experiment with the idea of using few-shot learning and chat completion "in the loop" to determine the end of a conversation (a minimal sketch follows the file list).
- `ConversationEnding_v1.ipynb`: in this notebook, we use Kaskada to gather the previous messages in a channel and a UDF to call the LLM to make the determination. This method includes the past 5 messages in the channel, regardless of whether they are part of the current conversation.
- `ConversationEnding_v1_results.jsonl` contains the results from this experiment.
- `ConversationEnding_v1_input.jsonl` contains a small set of data to test whether we can determine the end of a conversation based on the string output from a previous step.
- `ConversationEnding_v2.ipynb`: in this notebook, we use Ray remote Actors to call the LLM in parallel. This method only includes messages that are part of the current conversation.
  - For this method we just proved out the technique; no results file is available.
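The "LLM in the loop" check amounts to something like the sketch below, which pairs a yes/no question with a few-shot example; the model, prompt wording, and example are assumptions rather than the notebooks' exact configuration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# An invented few-shot example showing the model what a finished conversation looks like.
FEW_SHOT = [
    {"role": "user", "content": "alice: thanks, that fixed it!\nbob: great, anytime"},
    {"role": "assistant", "content": "YES"},
]

def conversation_ended(recent_messages: list[str]) -> bool:
    """Ask the LLM whether the recent messages look like a finished conversation."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Answer YES or NO: does this Slack conversation appear to be over?",
            },
            *FEW_SHOT,
            {"role": "user", "content": "\n".join(recent_messages)},
        ],
    )
    return resp.choices[0].message.content.strip().upper() == "YES"
```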
In order to experiment with training a model, you will need an OpenAI API key.
If you don't yet have an OpenAI account, go here to sign up: https://platform.openai.com/signup?launch
After signing up, you will need to add billing details in order to obtain an API key. After doing so, you can create a key here: https://platform.openai.com/account/api-keys
If you want to run the Production code, you will need to create a Slack App and install it into a Slack workspace that you have access to.
- Start here: https://api.slack.com/apps, and click `Create New App`. Choose `From an App Manifest`.
- Choose the workspace to install the App in.
- Copy the contents of `manifest.yaml` and paste it into the window (make sure to paste it in the `yaml` section).
- Click `Next`, then `Create`.
- Then, on the `Basic Information` page, click `Install to Workspace` and follow the auth flow.
- Finally, under `App-Level Tokens`, click `Generate Tokens and Scopes`. Add the `connections:write` scope, name it `SocketToken`, and click `Generate`. Don't worry about saving the token somewhere safe; you can always access it again later.
After creating the Slack App, any user who wants to be notified by the App needs to first add it to their personal Apps list. Additionally, the App needs to be manually added to any channel you want it to watch.
To add the App to your list: in Slack, in the sidebar, go to Apps -> Manage -> Browse Apps. Click on BeepGPT to add it to your app list.

To add the App to a channel: in Slack, go to the channel, click the `v` next to the channel name, go to Integrations -> Apps -> Add Apps. Add BeepGPT.