This tool allows you to process tweet data extracted from Twitter Analytics and generate relevant questions based on the content of each tweet. It consists of two main steps: cleaning the tweet data and then generating questions from the cleaned data.
- Step 1: Clean and preprocess your tweet data.
- Step 2: Generate questions based on the cleaned tweet data.
- Python 3.x
- Required libraries:
pandas,groq
You will also need an GROQ API key to generate questions from tweets.
In this step, you will clean and preprocess the tweet data by removing unwanted elements like mentions, hashtags, and URLs. Additionally, tweets shorter than a specified character limit will be filtered out. Use clean_data.py for this purpose.
- A CSV file containing tweets downloaded from Twitter Analytics. Ensure the file includes a column named
Post text, which contains the tweet text.
MAX_CHARS: Minimum number of characters a tweet must have to be included.REMOVE_MENTIONS: Set toTrueto remove mentions (@username).REMOVE_HASHTAGS: Set toTrueto remove hashtags (#hashtag).REMOVE_WEBLINKS: Set toTrueto remove URLs (web links).
- Input CSV: Load the CSV file containing tweets.
- Sorting: Optionally, sort the data by impressions.
- Filtering: Filter out tweets with fewer than
MAX_CHARScharacters. - Cleaning: Remove mentions, hashtags, and URLs based on control variables.
- Output: Save the cleaned data into a new CSV file.
After cleaning the tweet data, the next step is to generate questions based on the content of each tweet. Use generate_questions.py for this purpose
- The cleaned tweet CSV from Step 1.
- An GROQ API key to generate questions.
- Input CSV: Use the cleaned CSV from Step 1.
- Question Generation: The tool will use the OpenAI API to generate relevant questions for each tweet.
- Output: A new CSV will be created containing the original tweet and the generated question.
After running the scripts, your final output CSV will include two columns:
| tweet | Generated Question |
|---|---|
| "I am learning machine learning!" | "What aspects of machine learning are you focusing on?" |
| "What's the future of AI?" | "How do you envision the future of AI impacting industries?" |
| "How do you use GPT-3 effectively?" | "What are the best practices for using GPT-3 in applications?" |
-
Clone this repository to your local machine:
git clone https://github.com/hiteshbandhu/tweetune.git
-
Navigate to the project directory:
cd tweetune -
Install the required dependencies:
pip install -r requirements.txt
-
Obtain your GROQ API key and set it up for use with the question generation script.
Run the cleaning script on your downloaded Twitter Analytics CSV file to preprocess the data:
python clean_data.pyAfter this step, you’ll have a CSV file with cleaned tweet data.
Run the question generation script to create questions from your cleaned tweet data:
python generate_questions.pyYou will receive a CSV file containing the original tweet and the generated question for each tweet.
This project is licensed under the MIT License. See the LICENSE file for more details.
By following these steps, you can clean your tweet data and generate meaningful questions that can be used for engagement, analysis, or research purposes. If you encounter any issues or have suggestions for improvements, feel free to open an issue or contribute to the project.