We have worked on the catalog queries found in the file `pairs.py` and classified them into four levels:

Level 1: Easy - Nearly identical code.

Level 2: Medium - More than 50% of lines are the same.

Level 3: Hard - Less than 50% similarity but with the same functionality.

Level 4: Undetermined - Sometimes the bot provides code (not necessarily correct), sometimes not.
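The text does not state how line similarity was measured. As a minimal sketch, assuming it is computed with `difflib.SequenceMatcher` over the stripped, non-empty lines of a reference snippet and the bot's output, the Easy and Medium thresholds could be approximated as shown here. The function names and the 0.95 cutoff for "nearly identical" are hypothetical. The Hard and Undetermined levels depend on functional equivalence and on whether the bot returned code at all, so they still need manual review.

```python
from difflib import SequenceMatcher


def line_similarity(reference: str, generated: str) -> float:
    """Return a 0.0-1.0 similarity score over stripped, non-empty lines."""
    ref_lines = [ln.strip() for ln in reference.splitlines() if ln.strip()]
    gen_lines = [ln.strip() for ln in generated.splitlines() if ln.strip()]
    if not ref_lines and not gen_lines:
        return 1.0
    # ratio() is 2 * matched_lines / (len(ref_lines) + len(gen_lines)).
    return SequenceMatcher(None, ref_lines, gen_lines).ratio()


def suggest_level(reference: str, generated: str, near_identical: float = 0.95) -> str:
    """Suggest a level; the 0.95 cutoff is an assumption, not from the source."""
    if not generated.strip():
        # The bot returned no code at all.
        return "Undetermined"
    score = line_similarity(reference, generated)
    if score >= near_identical:
        return "Easy"
    if score > 0.5:
        return "Medium"
    # Below 50% similarity: functionality has to be checked by hand.
    return "Hard or Undetermined (needs manual review)"
```

A suggestion from this helper is only a starting point: two snippets can share few lines and still behave identically, which is exactly the Hard case.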

Easy level queries:

Q1: I would like to use RTDIP components to read from an eventhub using 'connection string' as the connection string, and 'consumer group' as the consumer group, transform using binary to string, and edge x transformer then write to delta.

Q18: Read customer purchase history from a Parquet file, perform a customer segmentation analysis, and save the segments to a Delta Lake.

Medium level queries:

Q2: I need to read data from Kafka using a specific bootstrap server and topic, then apply a JSON parser, and finally write the results to a Hive table.

Q9: Load sales data from an FTP (file transfer protocol) server, perform currency conversion, and append the results to an existing Parquet file.

Q14: Access weather data stored in an HDFS cluster, normalize temperature readings, and store the results in an Elasticsearch index.

Q19: Aggregate financial transaction data from a SQL database, calculate the monthly average transaction amount, and store the results in a Delta Lake.

Q20: Fetch log data from an Elasticsearch index, filter logs with error severity, and archive them in a Delta Lake.

Hard level queries:

Q3: Fetch sensor data from an Azure Blob Storage in CSV format, aggregate the data on sensor ID, and save it to a SQL database.

Q4: Stream data from an MQTT broker, filter out readings below a threshold value, and store the data in Elasticsearch.

Q6: Retrieve temperature data from a REST API, normalize the data, and write it into a MongoDB collection.

Q7: Connect to a Google Cloud Storage, download logs in JSON format, conduct sentiment analysis, and then store the results in a Google BigQuery table.

Q8: Stream Twitter data using API credentials, extract hashtags from tweets, and save the data into a Cassandra database.

Q10: Connect to an IoT device using MQTT protocol, apply a low-pass filter to sensor readings, and upload the filtered data to an InfluxDB instance.

Q12: Aggregate temperature and humidity data from a CSV file stored in an Azure Data Lake, calculate average values per day, and upload to a Snowflake database.

Q13: Extract stock market data from a REST API, calculate moving averages, and save the data in an Amazon Redshift cluster.

Q16: Stream social media data from a JSON file, deduplicate the entries based on user ID, and store the results in a Delta Lake.

Q21: Read weather data from a RESTful API, convert temperature from Celsius to Fahrenheit, and store the results in a JSON file.

Q22: Connect to a MySQL database, retrieve order data, group by product category, and insert the grouped data into a new table.

Q23: Extract text data from a series of PDF files stored in an SFTP server, perform named entity recognition, and index the entities in an Apache Solr collection.

Undetermined level queries:

Q5: Import financial data from an S3 bucket in Parquet format, apply a standard scaler transformation, and then upload it to a Redshift database.

Q11: Read customer feedback from a Google Sheets document, apply sentiment analysis, and store the results in a PostgreSQL database for further analysis.

Q15: Load sales records from a MongoDB collection, filter out records with sales below $500, and export the data to a CSV file.

Q17: Load IoT sensor data from a CSV file, apply a smoothing filter to the readings, and write to a Delta Lake for time-series analysis.
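For quick reference, the grouping above can be captured in a small lookup table. This is only a sketch: the `LEVELS` name, its layout, and the `level_of` helper are hypothetical and are not taken from `pairs.py`; the query numbers are the ones listed above.

```python
# Query numbers grouped by level, as listed in this document.
LEVELS = {
    "Easy": [1, 18],
    "Medium": [2, 9, 14, 19, 20],
    "Hard": [3, 4, 6, 7, 8, 10, 12, 13, 16, 21, 22, 23],
    "Undetermined": [5, 11, 15, 17],
}


def level_of(query_number: int) -> str:
    """Return the level name for a query number, e.g. 20 -> "Medium"."""
    for level, numbers in LEVELS.items():
        if query_number in numbers:
            return level
    raise KeyError(f"Q{query_number} is not in the catalog")


# Number of queries per level.
counts = {level: len(numbers) for level, numbers in LEVELS.items()}
print(counts)  # {'Easy': 2, 'Medium': 5, 'Hard': 12, 'Undetermined': 4}
```

The four groups together cover Q1 through Q23 exactly once, so more than half of the catalog falls into the Hard level.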

To set up the required dependencies, create a virtual environment and run the following command:

```
pip install -r requirements.txt
```

* Run the Application with Docker

Navigate to the `src` folder by using:

```
cd src
```

Ensure that the Docker daemon is running before proceeding. Follow these steps to run the application:

Step 1: Build the Docker Image

Execute the following command to build the Docker image. Replace `<your-image-name>` with your chosen image name:

```
docker build -t <your-image-name> .
```

Step 2: Run the Docker Container

Launch the container using the following command, specifying port 8501 for Streamlit:

```
docker run -dp 8501:8501 <your-image-name>
```

Once the container is successfully running, access the application by clicking the link displayed in the `Ports` column within the Docker Desktop interface. This link corresponds to the port mapping configured during the container's launch and serves as the entry point to interact with the application.

_Note: Remember to replace `<your-image-name>` with the actual name you assigned to your Docker image._
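If the Docker Desktop interface is not at hand, one way to confirm that the container is serving is to request the mapped port directly. The sketch below assumes the container runs on the local machine, so the URL `http://localhost:8501` is an assumption derived from the `8501:8501` mapping, and `app_is_up` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def app_is_up(url: str = "http://localhost:8501", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `app_is_up()` shortly after `docker run` may return `False` while the application is still starting, so retrying after a few seconds is reasonable.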

* Accessing the Chatbot Application

Open your web browser and navigate to the link. The chatbot application is displayed and prompts you for your OpenAI API key. Ask RTDIP-related questions to explore the application's capabilities.