OpenSearch and Spark Integration P0 Demo #316
dai-chen started this conversation in Show and tell
Background
Please find more context about this feature in https://github.com/opensearch-project/sql/issues/1116.
Demo Use Case
Architecture: On the customer side, an existing ingestion pipeline already pushes ALB logs to an S3 bucket. The dataset is extremely large and keeps growing. A monitoring system raises alarms on suspicious client IPs.
Workflow: Once they receive a notification, the customer wants to quickly load only the corresponding data into a predefined OpenSearch index and dashboard, so they can diagnose and troubleshoot fast using the full-text analytics and visualization offered by OpenSearch.
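The workflow depends on pulling out only the log entries for the flagged client IP. As a rough, hypothetical illustration (not part of the demo code), the Python sketch below parses ALB access log lines and keeps only the entries from one IP. It assumes the standard ALB log layout, in which the fourth space-separated field is `client:port`. The helper names `client_ip` and `filter_by_client_ip` and the two sample lines are made up for illustration.

```python
import shlex
from typing import Iterable, Iterator, List


def client_ip(line: str) -> str:
    """Return the client IP of one ALB access log entry.

    Assumes the standard ALB layout, where the fourth space-separated field is
    "client:port"; shlex keeps double-quoted fields (request, user agent) intact.
    """
    fields = shlex.split(line)
    ip, _, _port = fields[3].rpartition(":")  # "192.168.131.39:2817" -> "192.168.131.39"
    return ip


def filter_by_client_ip(lines: Iterable[str], suspicious_ip: str) -> Iterator[str]:
    """Yield only the non-empty entries whose client IP equals suspicious_ip."""
    for line in lines:
        if line.strip() and client_ip(line) == suspicious_ip:
            yield line


# Two made-up ALB log entries, for illustration only.
SAMPLE_LOGS: List[str] = [
    'http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188 '
    '192.168.131.39:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 '
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.46.0" - - '
    'arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 '
    '"Root=1-58337262-36d228ad5d99923122bbe354"',
    'http 2018-07-02T22:24:10.000000Z app/my-loadbalancer/50dc6c495c0c9188 '
    '10.1.2.3:5432 10.0.0.1:80 0.000 0.002 0.000 404 404 34 120 '
    '"GET http://www.example.com:80/missing HTTP/1.1" "curl/7.46.0" - - '
    'arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 '
    '"Root=1-58337262-00000000000000000000000a"',
]

# Print only the entries coming from the (hypothetical) suspicious IP.
for entry in filter_by_client_ip(SAMPLE_LOGS, "192.168.131.39"):
    print(entry)
```

In the demo, this selection is meant to run in Spark over the data in S3 rather than line by line in Python. The sketch only shows the selection logic. A real parser would also need to tolerate malformed quoting, which would make `shlex.split` raise an error.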
Solution for the demo: we propose a new Maximus table format on which secondary indexes and materialized views are built. For this use case:
- `client_ip` as first acceleration

Prerequisites

- `$HOME/.aws`
- `alb_logs_temp` to simulate customer ingestion
- `deltalog`, `alb_logs_raw` and `alb_logs_metrics` for Maximus metadata and MV data
- `alb_logs_raw` and `alb_logs_metrics` index created previously

Please run with DevTools in OpenSearch Dashboards.
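The `client_ip` acceleration in the solution relies on a secondary index that lets a query skip data that cannot match. The Maximus format's actual index layout is not described here, so the following Python sketch is only an assumption-laden illustration of the general idea behind a file-level skipping index: record the distinct client IPs seen in each S3 object, then read only the objects that may contain the suspicious IP. `ValueSetSkippingIndex`, the bucket name, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class ValueSetSkippingIndex:
    """Hypothetical file-level skipping index: S3 object key -> distinct client IPs."""

    ips_by_file: Dict[str, Set[str]] = field(default_factory=dict)

    def build(self, rows: Iterable[Tuple[str, str]]) -> None:
        """Record every (file_key, client_ip) pair seen in one scan of the raw data."""
        for file_key, ip in rows:
            self.ips_by_file.setdefault(file_key, set()).add(ip)

    def candidate_files(self, ip: str) -> List[str]:
        """Return only the files that contain `ip`; every other file can be skipped."""
        return sorted(key for key, ips in self.ips_by_file.items() if ip in ips)


# Hypothetical (file, client_ip) pairs, as if extracted from ALB logs in S3.
ROWS = [
    ("s3://example-bucket/alb/part-0001.log", "192.168.131.39"),
    ("s3://example-bucket/alb/part-0001.log", "10.1.2.3"),
    ("s3://example-bucket/alb/part-0002.log", "10.1.2.3"),
    ("s3://example-bucket/alb/part-0003.log", "172.16.0.7"),
]

index = ValueSetSkippingIndex()
index.build(ROWS)
# Only two of the three files need to be read for this IP.
print(index.candidate_files("10.1.2.3"))
```

A value set like this is exact, but its size grows with the number of distinct IPs per file. A per-file Bloom filter would keep the index small at the cost of occasional false positives, which only cause an unnecessary file read, never a missed match.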
Demo
Workflow
Steps
Please run with DevTools in OpenSearch Dashboards.
Cleanup
Remove the `docker_os_data` and `docker_spark_data` Docker volumes. Alternatively, if you don't want to lose everything you created in OpenSearch, start Docker, run the following commands in the CLI, and re-create `alb_logs_temp` only.

Video
OpenSearch.Spark.demo.part.1.mp4
OpenSearch.Spark.demo.part.2.mp4