Last Updated: 2025-11-30
The Harmony proxy supports automatic parsing of multiple content types beyond JSON, enabling seamless data transformation across formats in healthcare and data-integration pipelines.
The HTTP adapter automatically detects and parses incoming request bodies based on the Content-Type header, converting them to a normalized JSON structure for pipeline processing. This enables middleware and backends to work with a consistent data model regardless of the original format.
Content-Type Headers:
- application/json
- application/fhir+json (FHIR resources)
- application/dicom+json (DICOM JSON)
Example Request:
curl -X POST http://localhost:8080/api/data \
-H "Content-Type: application/json" \
-d '{"name": "Alice", "age": 30}'

Normalized Structure:
{
"name": "Alice",
"age": 30
}

Notes:
- Default format when Content-Type is missing or unrecognized
- Direct pass-through to normalized_data
- Validates JSON syntax
Content-Type Headers:
- application/xml
- text/xml
- application/soap+xml
Example Request:
curl -X POST http://localhost:8080/api/data \
-H "Content-Type: application/xml" \
-d '<person><name>Bob</name><age>25</age></person>'

Normalized Structure:
{
"person": {
"name": "Bob",
"age": "25"
}
}

XML Features:
- Text-only elements: Converted to simple string values
- Attributes: Prefixed with @ (e.g., "@id": "123")
- Nested elements: Preserved as nested objects
- Multiple elements with same name: Converted to arrays
- Mixed content: Text stored in a #text field when attributes are present
Example with Attributes:
<person id="123" type="customer">
<name>Charlie</name>
</person>

Becomes:
{
"person": {
"@id": "123",
"@type": "customer",
"name": "Charlie"
}
}

Security: XXE (XML External Entity) attacks are prevented; quick-xml does not support external entities by default.
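The mapping rules above can be sketched with a toy recursive structure. This is an illustrative model only, not Harmony's actual implementation (which parses via quick-xml); the `Value` enum and `to_json` function are hypothetical:

```rust
use std::collections::BTreeMap;

// Toy model of the XML-to-JSON mapping: attributes become "@"-prefixed
// keys, and repeated child elements with the same name collapse into an
// array. Hypothetical sketch, not Harmony's actual code.
enum Value {
    Text(String),
    Object(BTreeMap<String, Vec<Value>>), // Vec absorbs repeated names
}

fn to_json(v: &Value) -> String {
    match v {
        Value::Text(s) => format!("\"{}\"", s),
        Value::Object(map) => {
            let fields: Vec<String> = map
                .iter()
                .map(|(k, vs)| {
                    if vs.len() == 1 {
                        format!("\"{}\": {}", k, to_json(&vs[0]))
                    } else {
                        // Multiple elements with the same name -> array
                        let items: Vec<String> =
                            vs.iter().map(to_json).collect();
                        format!("\"{}\": [{}]", k, items.join(", "))
                    }
                })
                .collect();
            format!("{{{}}}", fields.join(", "))
        }
    }
}
```

Feeding `<person id="123"><name>A</name><name>B</name></person>` through this model yields `{"@id": "123", "name": ["A", "B"]}`, matching the attribute and array rules above.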
Content-Type Header:
text/csv
Example Request:
curl -X POST http://localhost:8080/api/data \
-H "Content-Type: text/csv" \
-d 'name,age,city
Alice,30,NYC
Bob,25,LA'

Normalized Structure:
{
"rows": [
{"name": "Alice", "age": "30", "city": "NYC"},
{"name": "Bob", "age": "25", "city": "LA"}
],
"row_count": 2
}

CSV Features:
- First row treated as header
- All values parsed as strings
- Empty fields supported
- Handles quoted fields with commas
Security: Formula injection prevention - fields starting with =, +, -, or @ are automatically prefixed with a single quote (') to prevent execution in spreadsheet applications.
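This guard can be sketched as a small function; `sanitize_csv_field` is a hypothetical name for illustration, not Harmony's actual API:

```rust
// Neutralize CSV formula injection: a leading =, +, -, or @ can be
// interpreted as a formula by spreadsheet applications, so prefix it
// with a single quote. Hypothetical sketch, not Harmony's actual code.
fn sanitize_csv_field(field: &str) -> String {
    match field.chars().next() {
        Some('=' | '+' | '-' | '@') => format!("'{field}"),
        _ => field.to_string(),
    }
}
```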
Example:
name,formula
Alice,=SUM(A1:A10)

Becomes:
{
"rows": [
{"name": "Alice", "formula": "'=SUM(A1:A10)"}
]
}

Content-Type Header:
application/x-www-form-urlencoded
Example Request:
curl -X POST http://localhost:8080/api/data \
-H "Content-Type: application/x-www-form-urlencoded" \
-d 'name=Alice&age=30&city=NYC'

Normalized Structure:
{
"name": "Alice",
"age": "30",
"city": "NYC"
}

Array Support:
Use [] notation for arrays:
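A minimal, hypothetical parser for the `[]` convention might look like this (percent-decoding omitted; `parse_form` and `FormValue` are illustrative names, not Harmony's API):

```rust
use std::collections::BTreeMap;

// A form value is either a single string or an array accumulated from
// repeated "name[]" keys. Hypothetical sketch, not Harmony's code.
enum FormValue {
    Single(String),
    Array(Vec<String>),
}

fn parse_form(body: &str) -> BTreeMap<String, FormValue> {
    let mut out: BTreeMap<String, FormValue> = BTreeMap::new();
    for pair in body.split('&') {
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        if let Some(name) = key.strip_suffix("[]") {
            // Repeated `name[]` keys accumulate into an array
            match out
                .entry(name.to_string())
                .or_insert_with(|| FormValue::Array(Vec::new()))
            {
                FormValue::Array(items) => items.push(value.to_string()),
                _ => {}
            }
        } else {
            out.insert(key.to_string(), FormValue::Single(value.to_string()));
        }
    }
    out
}
```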
curl -X POST http://localhost:8080/api/data \
-H "Content-Type: application/x-www-form-urlencoded" \
-d 'name=Alice&interests[]=coding&interests[]=music'

Becomes:
{
"name": "Alice",
"interests": ["coding", "music"]
}

Content-Type Header:
multipart/form-data; boundary=<boundary>
Example Request:
curl -X POST http://localhost:8080/api/upload \
-F "name=Alice" \
-F "age=30" \
-F "file=@document.pdf"

Normalized Structure:
{
"fields": {
"name": "Alice",
"age": "30"
},
"files": [
{
"name": "file",
"filename": "document.pdf",
"content_type": "application/pdf",
"size": 12345,
"checksum": "a1b2c3d4..."
}
]
}

File Handling:
- Files are NOT saved to disk automatically
- File metadata captured for pipeline processing
- SHA256 checksum computed for integrity verification
- Middleware/backends can access file data via envelope
Content-Type Headers:
- image/* (JPEG, PNG, GIF, etc.)
- video/*
- audio/*
- application/pdf
- application/zip
- application/octet-stream
Example Request:
curl -X POST http://localhost:8080/api/upload \
-H "Content-Type: image/jpeg" \
--data-binary @photo.jpg

Normalized Structure:
{
"format": "binary",
"content_type": "image/jpeg",
"size": 45678,
"checksum": "abc123..."
}

Notes:
- Binary data preserved in the original_data field of the envelope
- Checksum allows integrity verification
- Middleware can process binary data directly
Configure size and complexity limits to prevent resource exhaustion:
[proxy.content_limits]
max_body_size = 10485760 # 10MB maximum request body
max_csv_rows = 10000 # Maximum CSV rows to parse
max_xml_depth = 100 # Maximum XML nesting depth
max_multipart_files = 10 # Maximum files in multipart upload
max_form_fields = 1000 # Maximum form fields

Defaults:
- max_body_size: 10MB (10,485,760 bytes)
- max_csv_rows: 10,000 rows
- max_xml_depth: 100 levels
- max_multipart_files: 10 files
- max_form_fields: 1,000 fields
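How such a limit is enforced during parsing can be sketched as follows; `parse_csv_rows` is an illustrative function, not Harmony's actual implementation (which also handles quoted fields):

```rust
// Parse CSV data rows while enforcing a row-count limit; stops as soon
// as the limit would be exceeded. Hypothetical sketch of the documented
// max_csv_rows behavior, not Harmony's actual code.
fn parse_csv_rows(body: &str, max_csv_rows: usize) -> Result<Vec<Vec<String>>, String> {
    let mut rows = Vec::new();
    // The first line is the header row, so skip it.
    for line in body.lines().skip(1) {
        if rows.len() >= max_csv_rows {
            return Err(format!("CSV row count exceeds limit of {}", max_csv_rows));
        }
        rows.push(line.split(',').map(str::to_string).collect());
    }
    Ok(rows)
}
```

Note that the error text mirrors the example error shown later in this guide ("CSV row count exceeds limit of 10000").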
Example configuration:

[proxy]
id = "content-aware-proxy"
store_dir = "./data"
[proxy.content_limits]
max_body_size = 20971520 # 20MB for larger uploads
max_csv_rows = 50000 # Support larger CSV files
max_xml_depth = 50 # Limit XML complexity
[logging]
log_level = "info"
[network.default]
enable_wireguard = false
interface = "wg0"
[network.default.http]
bind_address = "0.0.0.0"
bind_port = 8080
[pipelines.api]
description = "Multi-format API pipeline"
networks = ["default"]
endpoints = ["api_endpoint"]
backends = ["processing_backend"]
middleware = []
[endpoints.api_endpoint]
service = "http"
[endpoints.api_endpoint.options]
path_prefix = "/api"
[backends.processing_backend]
service = "http"
[backends.processing_backend.options]
base_url = "http://backend-service:8080"
[services.http]
module = ""

Each request includes content metadata in the envelope for tracking parsing status and format details:
pub struct ContentMetadata {
pub content_type: String, // Original Content-Type header
pub charset: Option<String>, // Character encoding if specified
pub format: String, // Detected format: json, xml, csv, etc.
pub parse_status: ParseStatus, // Success, Failed, NotAttempted, Unsupported
pub original_size: usize, // Size of original payload in bytes
pub checksum: Option<String>, // SHA256 checksum (for binary content)
}

Parse Status Values:
- Success: Content parsed successfully
- Failed: Parsing attempted but failed (malformed data)
- NotAttempted: No parsing attempted (empty payload)
- Unsupported: Content-Type not supported
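These four states might be modeled as a plain Rust enum; the derive list shown here is illustrative rather than Harmony's exact definition:

```rust
// The four parse states, as a Rust enum. Sketch only; the derives are
// illustrative, not necessarily Harmony's actual definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ParseStatus {
    Success,      // Content parsed successfully
    Failed,       // Parsing attempted but failed (malformed data)
    NotAttempted, // No parsing attempted (e.g., empty payload)
    Unsupported,  // Content-Type not supported
}
```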
Accessing in Middleware:
async fn process(envelope: RequestEnvelope<Vec<u8>>) -> Result<RequestEnvelope<Vec<u8>>, Error> {
if let Some(metadata) = &envelope.request_details.content_metadata {
tracing::info!(
"Processing {} content ({}), parse_status: {:?}",
metadata.format,
metadata.content_type,
metadata.parse_status
);
}
Ok(envelope)
}

Threat: XML External Entity (XXE) attacks allow attackers to read local files or perform SSRF attacks.
Mitigation: The quick-xml parser does not support external entities by default. External entity declarations are ignored.
Example Attack (Blocked):
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>

This will parse as if the entity doesn't exist, preventing file disclosure.
Threat: CSV formula injection occurs when spreadsheet applications execute formulas in CSV cells.
Mitigation: Fields starting with =, +, -, or @ are automatically prefixed with a single quote.
Before:
name,command
Alice,=cmd|'/c calc'!A1

After Parsing:
{"name": "Alice", "command": "'=cmd|'/c calc'!A1"}

Threat: Billion Laughs attack (XML bomb) causes exponential entity expansion.
Mitigation:
- Maximum XML depth limit (default: 100)
- No entity expansion support
- Maximum body size limit (default: 10MB)
Threats:
- Path traversal via malicious filenames
- Resource exhaustion via many small files
- Memory exhaustion via large files
Mitigations:
- Filename sanitization (automatic by multer)
- Maximum file count limit (default: 10)
- Maximum body size limit (default: 10MB)
- Files not automatically written to disk
All content types respect the max_body_size limit. Additional per-format limits:
- CSV: Row count limit prevents memory exhaustion
- XML: Depth limit prevents stack overflow
- Multipart: File count limit prevents descriptor exhaustion
- Form: Field count limit prevents hash collision attacks
When content parsing fails:
- parse_status set to Failed
- Warning logged with error details
- normalized_data set to None
- Pipeline continues with original_data available
- Middleware can check parse_status and handle accordingly
Example log:
WARN harmony: Failed to parse XML: XML parsing error: unexpected EOF
When Content-Type is unknown:
- Attempts to parse as JSON (fallback behavior)
- If JSON parsing fails, parse_status set to Unsupported
- Request continues through pipeline
- Backend receives raw data in original_data
When limits are exceeded:
- Parsing terminates immediately
- Error returned to client (400 Bad Request)
- Descriptive error message includes limit value
Example error:
{
"error": "CSV row count exceeds limit of 10000"
}

Missing Content-Type: Defaults to application/json
Unknown Content-Type:
- Attempts JSON parsing
- If JSON parsing fails, marks as
Unsupported - Pipeline continues with raw data
Empty Payload:
- parse_status set to NotAttempted
- normalized_data set to None
- No parsing attempted
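The detection and fallback behavior described above can be sketched as a single dispatch function; `detect_format` and its string return values are illustrative, not Harmony's actual API:

```rust
// Map a Content-Type header to a parsing format, with the documented
// fallbacks: missing or unknown types get a JSON parse attempt.
// Hypothetical sketch, not Harmony's actual code.
fn detect_format(content_type: Option<&str>) -> &'static str {
    let Some(raw) = content_type else {
        return "json"; // Missing Content-Type defaults to JSON
    };
    // Strip parameters such as "; charset=utf-8" or "; boundary=..."
    let ct = raw.split(';').next().unwrap_or("").trim();
    match ct {
        "application/json" | "application/fhir+json" | "application/dicom+json" => "json",
        "application/xml" | "text/xml" | "application/soap+xml" => "xml",
        "text/csv" => "csv",
        "application/x-www-form-urlencoded" => "form",
        "multipart/form-data" => "multipart",
        "application/pdf" | "application/zip" | "application/octet-stream" => "binary",
        t if t.starts_with("image/") || t.starts_with("video/") || t.starts_with("audio/") => {
            "binary"
        }
        // Unknown types fall back to a JSON parse attempt
        _ => "json",
    }
}
```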
Explicitly set the Content-Type header to ensure correct parsing:
# Good
curl -H "Content-Type: text/csv" -d @data.csv http://...
# Avoid (will try JSON parsing)
curl -d @data.csv http://...

Check parse status in middleware before processing:
if envelope.request_details.content_metadata
.as_ref()
.map_or(false, |m| m.parse_status != ParseStatus::Success)
{
return Err(Error::from("Content parsing failed"));
}

Set limits based on your use case:
- Small API endpoints: Lower limits (1MB, 100 rows)
- File upload services: Higher limits (100MB, more files)
- Untrusted inputs: Conservative limits
- Internal services: Relaxed limits
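For example, a conservative profile for an endpoint handling untrusted input might look like this (values are illustrative; the keys are the documented [proxy.content_limits] options):

```toml
# Conservative limits for untrusted inputs (example values)
[proxy.content_limits]
max_body_size = 1048576 # 1MB
max_csv_rows = 100
max_xml_depth = 20
max_multipart_files = 2
max_form_fields = 100
```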
Track parsing failures in logs and metrics:
if let Some(metadata) = &envelope.request_details.content_metadata {
if metadata.parse_status != ParseStatus::Success {
metrics::counter!("parse_failures", "format" => &metadata.format).increment(1);
}
}For large files, use binary content types rather than base64-encoded JSON:
# Efficient
curl -H "Content-Type: application/pdf" --data-binary @large.pdf http://...
# Inefficient (33% overhead)
curl -H "Content-Type: application/json" -d '{"file":"<base64>"}' http://...

Symptoms: CSV data appears in original_data but not in normalized_data
Solutions:
- Verify the Content-Type: text/csv header is set
- Check the CSV has a valid header row
- Ensure the CSV is properly formatted (no unquoted commas in fields)
- Check the row count doesn't exceed the max_csv_rows limit
Symptoms: parse_status: Failed for XML content
Solutions:
- Validate XML syntax with external tool
- Check for unsupported features (external entities, DTDs)
- Verify XML depth doesn't exceed the max_xml_depth limit
- Ensure proper UTF-8 encoding
Symptoms: Files array is empty in normalized_data
Solutions:
- Verify Content-Type: multipart/form-data is set with a boundary parameter
- Ensure the boundary in the header matches the boundary in the body
- Check that file fields have a filename attribute for file detection
- Verify the file count doesn't exceed the max_multipart_files limit
Symptoms: 400 Bad Request with size limit error
Solutions:
- Increase the relevant limit in the [proxy.content_limits] configuration
- Split large requests into smaller chunks
- Use streaming endpoints for very large files
- Compress data before upload
Approximate parsing overhead by content type:
- JSON: ~10-20μs for small payloads (<1KB)
- XML: ~50-100μs (includes structure conversion)
- CSV: ~100μs per 100 rows
- Form URL-encoded: ~20-30μs
- Multipart: ~500μs per file (includes checksum)
- Binary: ~1ms per MB (checksum calculation)
Memory overhead during parsing:
- JSON: ~1x payload size (serde_json)
- XML: ~2-3x payload size (DOM structure)
- CSV: ~2x payload size (row objects)
- Multipart: ~1.5x payload size (field buffers)
- Use JSON when possible: Fastest parsing, lowest overhead
- Stream large files: Don't parse entire body if processing in chunks
- Disable checksums: For trusted sources, skip checksum calculation
- Tune limits: Set limits to actual use case requirements
- Monitor metrics: Track parsing times and adjust limits
See examples/content-types/ directory for:
- Sample configuration files
- Example requests for each content type
- JOLT transforms for format conversion
- Integration test examples
- Endpoints Guide - Endpoint configuration
- Middleware Guide - Processing middleware
- Security Guide - Security best practices
- Configuration Reference - Complete config options