BeepGPT keeps you in the loop without disturbing your focus. Its personalized, intelligent AI continuously monitors your Slack workspace, alerting you to important conversations and freeing you to concentrate on what matters most.
BeepGPT reads the full history of your (public) Slack workspace and trains a Generative AI model to predict when you need to engage with a conversation. This training process gives the AI a deep understanding of your interests, expertise, and relationships. Using this understanding, BeepGPT watches conversations in real time and notifies you when an important conversation is happening without you. With BeepGPT you can focus on getting things done without worrying about missing out.
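As a rough illustration of the training step, the history can be turned into fine-tuning examples where the prompt is a snippet of conversation and the completion is a single token standing in for the user who engaged next. The sketch below is a minimal, hypothetical example assuming OpenAI's legacy prompt/completion fine-tuning format; the field contents and token naming are invented, not the exact schema used in the training notebooks.

```python
import json

# Hypothetical training example: the prompt is recent conversation text, and
# the completion is a single token representing the user who replied next
# (see the labels_*.json files described below).
example = {
    "prompt": "alice: the deploy failed again\nbob: looks like a config issue\n\n###\n\n",
    "completion": " user_42",
}

# OpenAI fine-tuning consumes examples as JSON Lines.
with open("training-data.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```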
This repo provides a notebook for you to train your own model using your Slack data, as well as a script to run the alerting bot in production.
- To just experiment with Kaskada, feel free to use the Example Slack Export included in the repo.
- To also experiment with training a model, you will need an OpenAI API key. See Getting an OpenAI API key
- To train a model on your Slack, you will need a JSON export of your Slack history. An export can be initiated at https://<your-workspace>.slack.com/services/export (substitute your workspace's subdomain). An Admin-level user of your Slack workspace will need to do the export.
- To run the "Production" code and receive alerts from a bot, you will need to create a Slack App. See Creating a Slack App
- `slack-export/` contains an example Slack workspace export. To learn more about the format of the export, see: https://slack.com/help/articles/220556107-How-to-read-Slack-data-exports
  - Note that some PII has been removed from the export, but this doesn't affect the files for our use case.
- `slack-export.parquet` contains the data from the example export, in the proper format to be consumed by Kaskada (see the conversion sketch below).
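For orientation, a Slack export contains one directory per public channel, with one JSON file of messages per day. A flattening step along these lines produces such a Parquet file; this is a minimal sketch, and the column names are assumptions rather than the exact schema the notebooks consume.

```python
import json
from pathlib import Path

import pandas as pd  # to_parquet requires pyarrow (or fastparquet)

rows = []
for day_file in Path("slack-export").glob("*/*.json"):
    channel = day_file.parent.name
    for msg in json.loads(day_file.read_text()):
        # Keep ordinary user messages; skip joins, bots, and other subtypes.
        if msg.get("type") == "message" and "user" in msg and "subtype" not in msg:
            rows.append({
                "channel": channel,
                "user": msg["user"],
                "ts": float(msg["ts"]),
                "text": msg.get("text", ""),
            })

pd.DataFrame(rows).to_parquet("slack-export.parquet")
```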
- The files inside `slack-generation/` include:
  - `notebook.ipynb`: a Jupyter notebook which describes how to use OpenAI to generate historical Slack data.
  - `projects.json` and `schedule.jsonl`: used in the above notebook.
  - `generated.jsonl`: the raw generated historical data.
- `slack-generation.parquet` contains all the generated Slack data in the proper format to be consumed by Kaskada. It is used as example data in the `v2` training notebook below.
- `slack-generation.users.json` contains an example `users.json` file for user lookup in various notebooks.
- `FineTuning_v2.ipynb` is a Jupyter notebook which contains all the details of how we successfully trained a model to power BeepGPT.
- `human.py` is a Python script used in the v2 training process. See section 2.1 in the `FineTuning_v2.ipynb` notebook for more info.
- `FineTuning_v1.ipynb` is an earlier version of the training procedure. Models trained with this notebook don't generalize as well as those trained with the "v2" notebook.
- `labels_v2.json` / `labels_v1.json` are lists of userIds from the training notebooks. These files are used in the production code to convert single-token user representations back to their original userId (see the lookup sketch below).
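Assuming each labels file is a plain JSON array of userIds, where a user's position in the array corresponds to their single token, the conversion back is just an index lookup. A minimal sketch:

```python
import json

# Load the userId list produced by the training notebook.
with open("labels_v2.json") as f:
    user_ids = json.load(f)

def token_to_user_id(token_index: int) -> str:
    """Map a predicted single-token user representation back to its Slack userId."""
    return user_ids[token_index]
```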
- `beep-gpt.py` contains the code that watches Slack in real time and alerts you about important conversations. This code uses `messages.parquet` and `labels_*.json` as inputs. Note that this code is not production-ready, but it functions well enough to demo the full application path.
  - To run this code, first make sure you are using at least Python 3.8 (3.11 recommended).
  - Next, install the required libraries:

    ```sh
    pip install -r requirements.txt
    ```

  - Then set the following environment variables:
    - `OPEN_AI_KEY`: found here: https://platform.openai.com/account/api-keys
    - `SLACK_APP_TOKEN`: found at https://api.slack.com/apps/<your-app-id>/general; should start with `xapp-`
    - `SLACK_BOT_TOKEN`: found at https://api.slack.com/apps/<your-app-id>/oauth; should start with `xoxb-`
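    For example, in a POSIX shell (placeholder values shown):

    ```sh
    export OPEN_AI_KEY="sk-..."          # OpenAI API key
    export SLACK_APP_TOKEN="xapp-..."    # App-Level Token (Socket Mode)
    export SLACK_BOT_TOKEN="xoxb-..."    # Bot User OAuth Token
    ```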
  - Finally, start the script:

    ```sh
    python beep-gpt.py
    ```
- `manifest.yaml` contains a template for creating a new App in Slack.
The following files are related to a new approach we are taking on the project:
- Let users specify what topics they are interested in following.
- The system can recommend topics based on previous history.
Files:
- `ChatCompletion_v1.ipynb`: a Jupyter notebook that uses chat completion to determine if a user should be notified (sketched below).
  - This notebook was primarily created as a baseline for comparing against the results of other methods.
- `ChatCompletion_v1_results_*.jsonl` contains the results from the above notebook across various runs.
  - The results are different on each run.
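The core of that baseline looks roughly like the following. This is a minimal sketch using the current `openai` Python client; the model name and prompt wording are assumptions, not the notebook's exact choices.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def should_notify(user_interests: str, conversation: str) -> bool:
    """Ask the LLM whether a conversation is relevant to a user's stated interests."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Answer YES or NO: is this Slack conversation relevant "
                           f"to a user interested in: {user_interests}?",
            },
            {"role": "user", "content": conversation},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```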
- `Vectors_v0.ipynb`: a Jupyter notebook that uses embeddings and vector search to determine if a user should be notified.
  - This notebook creates embeddings for the conversations and tries to match them to topics.
- `Vectors_v1.ipynb`: a Jupyter notebook that evaluates numerous embedding models to try to determine which works best for this scenario (the topic-matching idea is sketched after this list).
  - This notebook creates embeddings for the topics and tries to match them to conversations.
  - Outputs are compared to the results from `ChatCompletion_v1.ipynb` to determine which embedding models work best.
- `Vectors_v1_<model_name>_scores.jsonl`: output from running retrieval against the topic embeddings for each conversation.
  - One file for each embedding model.
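The topic-matching idea behind these notebooks can be sketched as follows; the embedding model, similarity threshold, and example topics here are placeholder assumptions, not values from the notebooks.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

topics = ["kubernetes incidents", "Q3 launch planning"]  # user-specified topics
topic_vecs = [embed(t) for t in topics]

def matching_topic(conversation: str, threshold: float = 0.35):
    """Return the best-matching topic and score, or None if nothing clears the threshold."""
    conv_vec = embed(conversation)
    sims = [
        float(np.dot(conv_vec, tv) / (np.linalg.norm(conv_vec) * np.linalg.norm(tv)))
        for tv in topic_vecs
    ]
    best = int(np.argmax(sims))
    return (topics[best], sims[best]) if sims[best] >= threshold else None
```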
In most of the examples above, we used "10 minutes with no new messages" as the separator between conversations. But in real life this is probably a bad way to determine whether a conversation has ended. In the following files we experiment with the idea of using few-shot learning and chat completion "in the loop" to determine the end of a conversation (a minimal sketch follows the file list).
- `ConversationEnding_v1.ipynb`: in this notebook, we use Kaskada to gather the previous messages in a channel and a UDF to call the LLM to make the determination. This method includes the past 5 messages in the channel, regardless of whether they are part of the current conversation.
- `ConversationEnding_v1_results.jsonl` contains the results from this experiment.
- `ConversationEnding_v1_input.jsonl` contains a small set of data to test whether we can determine the end of a conversation based on the string output from a previous step.
- `ConversationEnding_v2.ipynb`: in this notebook, we use Ray remote Actors to call the LLM in parallel. This method only includes messages that are part of the current conversation.
  - For this method we just proved out the technique; no results file is available.
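The "LLM in the loop" check amounts to something like the sketch below, which pairs a yes/no question with a few-shot example; the model, prompt wording, and example are assumptions rather than the notebooks' exact configuration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# An invented few-shot example showing the model what a finished conversation looks like.
FEW_SHOT = [
    {"role": "user", "content": "alice: thanks, that fixed it!\nbob: great, anytime"},
    {"role": "assistant", "content": "YES"},
]

def conversation_ended(recent_messages: list[str]) -> bool:
    """Ask the LLM whether the recent messages look like a finished conversation."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Answer YES or NO: does this Slack conversation appear to be over?",
            },
            *FEW_SHOT,
            {"role": "user", "content": "\n".join(recent_messages)},
        ],
    )
    return resp.choices[0].message.content.strip().upper() == "YES"
```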
In order to experiment with training a model, you will need an OpenAI API key.
If you don't yet have an OpenAI account, go here to sign up: https://platform.openai.com/signup?launch
After signing up, you will need to add billing details in order to obtain an API key. After doing so, you can create a key here: https://platform.openai.com/account/api-keys
If you want to run the Production code, you will need to create a Slack App and install it into a Slack workspace that you have access to.
- Start here: https://api.slack.com/apps, and click `Create New App`. Choose `From an App Manifest`.
- Choose the workspace to install the App in.
- Copy the contents of `manifest.yaml` and paste it into the window (make sure to paste it in the `yaml` section).
- Click `Next`, then `Create`.
- Then, on the `Basic Information` page, click `Install to Workspace` and follow the auth flow.
- Finally, under `App-Level Tokens`, click `Generate Tokens and Scopes`. Add the `connections:write` scope, name it `SocketToken`, and click `Generate`. Don't worry about saving the token somewhere safe; you can always access it again later.
After creating the Slack App, any user who wants to be notified by the App needs to first add it to their personal Apps list. Additionally, the App needs to be manually added to any channel you want it to watch.
To add the App to your list: in Slack, in the sidebar, go to Apps -> Manage -> Browse Apps. Click on BeepGPT to add it to your app list.

To add the App to a channel: in Slack, go to the channel, click the `v` next to the channel name, go to Integrations -> Apps -> Add Apps. Add BeepGPT.