Data ingestion refers to the process of collecting data from different sources speaking different languages, translating it into a common language, and storing it on a destination node where it can be leveraged for further analysis or hunting.
In practical terms, this means receiving information from a variety of heterogeneous security sensors and storing it on a unified system.
The OXA framework considers that ingestion is made easier when four issues are solved:
- the sensor produces events with structured content, ideally following a standard
- the produced events are conveyed (from the sensor to the analytics) over a transport protocol, ideally a standard one
- the incoming events (on the analytics side) are normalized into structured content, ideally following a standard
- the normalized events are stored on a system that is scalable and accessible, ideally a standard one
OXA addresses these issues by combining existing approaches and providing new ones.
At this stage, OXA does not intend to replace existing standards. Several already exist and perform well:
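The four issues listed above map onto a simple pipeline: produce, transport, normalize, store. The sketch below is only an illustration of that chain, not OXA code. The function names, the raw event fields, and the in-memory list standing in for a data store are all hypothetical, and the normalized keys merely borrow ECS-style naming.

```python
import json

def produce() -> dict:
    """Stage 1: the sensor emits a structured event (hypothetical fields)."""
    return {"ts": "2024-01-01T00:00:00Z", "user": "alice",
            "action": "login", "result": "failure"}

def transport(event: dict) -> bytes:
    """Stage 2: serialize the event for the wire (JSON here; the carrier
    could be HTTP or syslog)."""
    return json.dumps(event).encode("utf-8")

def normalize(payload: bytes) -> dict:
    """Stage 3: map sensor-specific fields onto a common schema
    (ECS-style names, for illustration only)."""
    raw = json.loads(payload)
    return {
        "@timestamp": raw["ts"],
        "user.name": raw["user"],
        "event.action": raw["action"],
        "event.outcome": raw["result"],
    }

def store(event: dict, datastore: list) -> None:
    """Stage 4: persist the normalized event (a list stands in for the
    real data store)."""
    datastore.append(event)

datastore: list = []
store(normalize(transport(produce())), datastore)
```

Keeping each stage behind its own function reflects the idea that a standard can be chosen independently for each of the four issues.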
- Open Cybersecurity Schema Framework - OCSF
- OpenTelemetry (which has integrated Elastic Common Schema - ECS)
- Common Event Format - CEF
- OASIS Heimdall Data Format - OHDF
- Intrusion Detection Message Exchange Format - IDMEF
OCSF, OpenTelemetry/ECS, and CEF are widely adopted by the community.
OCSF and OpenTelemetry/ECS are the preferred OXA choices for sensor solutions, given the needs of the next stages and current community adoption.
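To give a feel for the two preferred schemas, the sketch below expresses the same failed logon as an OCSF-style and an ECS-style event. The field names follow the published schemas as understood here (the OCSF Authentication class, and the ECS `event.*`, `user.*`, and `source.*` field sets). The values, the schema version strings, and the helper name `ocsf_type_uid` are illustrative assumptions, so check the current schema releases before relying on them.

```python
def ocsf_type_uid(class_uid: int, activity_id: int) -> int:
    """OCSF derives type_uid as class_uid * 100 + activity_id."""
    return class_uid * 100 + activity_id

# OCSF-style Authentication event (class 3002, activity 1 = Logon).
ocsf_event = {
    "category_uid": 3,            # Identity & Access Management
    "class_uid": 3002,            # Authentication
    "activity_id": 1,             # Logon
    "type_uid": ocsf_type_uid(3002, 1),
    "time": 1704067200000,        # milliseconds since the Unix epoch
    "severity_id": 1,             # Informational
    "status_id": 2,               # Failure
    "user": {"name": "alice"},
    "src_endpoint": {"ip": "10.0.0.5"},
    "metadata": {
        "version": "1.1.0",       # assumed schema version
        "product": {"name": "ExampleSensor", "vendor_name": "ExampleVendor"},
    },
}

# ECS-style event describing the same failed logon.
ecs_event = {
    "@timestamp": "2024-01-01T00:00:00Z",
    "ecs": {"version": "8.11.0"},  # assumed schema version
    "event": {
        "kind": "event",
        "category": ["authentication"],
        "type": ["start"],
        "outcome": "failure",
    },
    "user": {"name": "alice"},
    "source": {"ip": "10.0.0.5"},
}
```

OCSF leans on numeric identifiers that classify the event, while ECS uses keyword values in nested field sets. Both give downstream analytics a predictable place to find the user, the source address, and the outcome.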
Syslog (and its secure variants) has historically been the solution for shipping logs to other systems. Modern log transport relies increasingly on HTTP, which can be used in two ways:
- The sensor-producer pushes the events to the analytics-consumer
- The analytics-consumer pulls the events from the sensor-producer's management console
OXA has no preference regarding the transport being used.
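The two HTTP models can be sketched as follows. This is a minimal illustration using only the Python standard library. Both URLs, the `since` query parameter, and the function names are hypothetical, and no request is actually sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoints, for illustration only.
COLLECTOR_URL = "https://analytics.example.com/ingest"
CONSOLE_URL = "https://console.example.com/api/events"

def build_push_request(events: list[dict]) -> urllib.request.Request:
    """Push model: the sensor POSTs a batch of events to the analytics side."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        COLLECTOR_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_pull_request(since: str) -> urllib.request.Request:
    """Pull model: the analytics side GETs events newer than a cursor
    from the sensor's management console."""
    query = urllib.parse.urlencode({"since": since})
    return urllib.request.Request(f"{CONSOLE_URL}?{query}", method="GET")

push = build_push_request([{"event.action": "login"}])
pull = build_pull_request("2024-01-01T00:00:00Z")
# Either request would be sent with urllib.request.urlopen(...).
```

In the push model the sensor decides when data leaves, whereas in the pull model the analytics side controls the polling rate and has to keep track of its own cursor.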
When incoming events are already structured in a rich format, valuable cybersecurity use cases become possible:
- turnkey enrichment,
- real-time detection analytics,
- complex queries for future hunting activities
OCSF and OpenTelemetry/ECS have a real advantage because they both use a rich schema. OXA recognizes these two solutions as the preferred ones for an analytics solution.
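As an illustration of normalization and of the use cases it unlocks, the sketch below parses a CEF line into a flat dictionary with ECS-style keys, enriches it from a lookup table, and runs a simple detection query. The mapping table, the asset table, and the sample line are assumptions made for this example. The parser is deliberately naive: it does not handle escaped pipes or extension values containing spaces.

```python
import re

# Illustrative mapping from CEF extension keys to ECS-style names (assumption).
EXTENSION_MAP = {
    "src": "source.ip",
    "dst": "destination.ip",
    "suser": "user.name",
    "outcome": "event.outcome",
}

def normalize_cef(line: str) -> dict:
    """Parse one CEF line (7 pipe-separated header fields plus an extension)."""
    parts = line.split("|", 7)
    if len(parts) != 8 or not parts[0].startswith("CEF:"):
        raise ValueError("not a CEF line")
    _, vendor, product, version, signature_id, name, severity, extension = parts
    event = {
        "observer.vendor": vendor,
        "observer.product": product,
        "observer.version": version,
        "event.code": signature_id,
        "event.action": name,
        # CEF severity may be numeric or a word such as "High".
        "event.severity": int(severity) if severity.isdigit() else severity,
    }
    # Unmapped extension keys are kept under a "cef." prefix.
    for key, value in re.findall(r"(\w+)=(\S+)", extension):
        event[EXTENSION_MAP.get(key, f"cef.{key}")] = value
    return event

raw = ("CEF:0|ExampleVendor|ExampleSensor|1.0|100|login|5|"
       "src=10.0.0.5 suser=alice outcome=failure")
events = [normalize_cef(raw)]

# Enrichment: tag each event from a hypothetical asset inventory.
ASSETS = {"10.0.0.5": "finance-laptop"}
for event in events:
    event["labels.asset"] = ASSETS.get(event.get("source.ip"), "unknown")

# Detection or hunting query: failed logins.
failed_logins = [
    e for e in events
    if e.get("event.action") == "login" and e.get("event.outcome") == "failure"
]
```

Once every source is mapped onto the same field names, the enrichment and the query are written once and apply to all sensors.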
Two main approaches are strong contenders at this stage:
- the first one relies on Elasticsearch as a scalable data store and distributed search engine. Elasticsearch is relevant for incoming events structured as OpenTelemetry/ECS
- the second one relies on S3 from AWS, now widely adopted as the de facto distributed object storage. S3 is very relevant for incoming events structured as OCSF, or for any security solution deployed in an AWS ecosystem.
OXA has no preference regarding the storage being used for the analytics solution.
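The difference between the two storage approaches shows in how a batch of events is prepared. The sketch below builds a request body in the newline-delimited JSON form expected by the Elasticsearch bulk API, and a JSON Lines object with a date-partitioned key for an object store such as S3. The index name, the key layout, and the function names are assumptions. Actually sending the data would require a client library, which is not shown.

```python
import json
from datetime import datetime, timezone

def to_bulk_ndjson(events: list[dict], index: str) -> str:
    """Elasticsearch bulk body: one action line plus one document line
    per event."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # the body must end with a newline

def to_s3_object(events: list[dict], when: datetime,
                 prefix: str = "ocsf") -> tuple[str, bytes]:
    """Object store: a date-partitioned key (hypothetical layout) and a
    JSON Lines body."""
    key = f"{prefix}/year={when:%Y}/month={when:%m}/day={when:%d}/events.jsonl"
    body = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    return key, body

events = [{"user.name": "alice"}, {"user.name": "bob"}]
bulk_body = to_bulk_ndjson(events, "logs-security")  # hypothetical index name
key, body = to_s3_object(events, datetime(2024, 1, 1, tzinfo=timezone.utc))
```

The search engine indexes each document for immediate querying, while the object store keeps cheap, partitioned files that a separate query engine reads later.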
For an effective ingestion capability, two frameworks are preferred: OCSF and OpenTelemetry/ECS.
OXA provides resources to accelerate the adoption of OCSF and/or OpenTelemetry/ECS.