
Error in handling charsets different from UTF-8 #132

Open
@andsel

Description

  • Version: 3.3.5
  • Operating System:
  • Config File (if you have sensitive info, please remove it):
input {
	http {
		port => 9006
		codec => plain {
			charset => "CP1254"
		}
	}
}	

output {
	stdout {
		codec => json {charset => "UTF-8"}
	}
}
  • Sample Data:
    Python script used as a client to send CP1254-encoded data:
import requests

API_ENDPOINT = "http://127.0.0.1:9006"
message = 'TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç'

# Send the message as raw CP1254 bytes; no charset is declared in the request
r = requests.post(url=API_ENDPOINT, data=message.encode('cp1254'))
  • Steps to Reproduce:
    • run Logstash with the pipeline above
    • run the Python script
    • the console output is:
{"message":"T�RK�E karakter test : ������������","@version":"1","@timestamp":"2020-11-30T10:38:55.338Z","headers":{"connection":"keep-alive","request_method":"POST","http_accept":"*/*","http_user_agent":"python-requests/2.21.0","content_length":"35","http_version":"HTTP/1.1","http_host":"127.0.0.1:9006","request_path":"/","accept_encoding":"gzip, deflate"},"host":"127.0.0.1"}
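The garbled "message" field is exactly what you get when the CP1254 bytes sent by the client are decoded as UTF-8 with invalid bytes replaced by U+FFFD. This suggests (but does not prove) that the http input interprets the body as UTF-8 before the codec's charset setting takes effect. The sketch below only simulates the two decodings in Python; it is not the plugin's actual code path.

```python
# Simulation only: compare decoding the client's payload with the codec's
# charset (CP1254) versus decoding it as UTF-8.
message = 'TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç'

# Raw bytes sent by the client: one byte per character in CP1254.
payload = message.encode('cp1254')
print(len(payload))  # 35, matching the content_length header in the event

# Decoding with the charset configured on the plain codec restores the text.
correct = payload.decode('cp1254')
print(correct)

# Decoding as UTF-8: each non-ASCII byte in this payload is an invalid or
# incomplete UTF-8 sequence, so each one is replaced by U+FFFD.
garbled = payload.decode('utf-8', errors='replace')
print(garbled)  # same text as the "message" field in the event above
```

The byte count also lines up with the reported content_length of 35, which is consistent with the body arriving intact as CP1254 and being mis-decoded afterwards rather than being re-encoded by the client.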

This does not seem to be a problem in the codec, because I tried the following pipeline (same codec, different input):

input {
	file {
		path => "/tmp/cp1254_encoded.txt"
		mode => "read"
		sincedb_path => "/dev/null"
		file_completed_log_path => "/tmp/file_actions.log"
		file_completed_action => "log"
		codec => plain {
			charset => "CP1254"
		}
	}
}	

output {
	stdout {
		codec => json {charset => "UTF-8"}
	}
}

using the attached file cp1254_encoded.txt as input data,

and the console output is the expected one (TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç).

NB: to reproduce the text file, copy and paste the string above into a text editor and save it with the CP1254 encoding.
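If no editor with CP1254 support is at hand, the file can also be generated with a short Python script. This is a sketch assuming a Unix-like system where /tmp is writable; the trailing newline is an assumption on my part, added so the file input sees a complete line.

```python
# Write the sample string to the path used by the file input, encoded as CP1254.
message = 'TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç'
path = '/tmp/cp1254_encoded.txt'  # same path as in the file input config above

# newline='' disables newline translation, so exactly one '\n' byte is written
# (assumption: a newline-terminated line is the safest input for the file input).
with open(path, 'w', encoding='cp1254', newline='') as f:
    f.write(message + '\n')

# Sanity check: the file holds single-byte CP1254 characters, not UTF-8.
with open(path, 'rb') as f:
    raw = f.read()
print(len(raw))  # 36: 35 single-byte characters plus the newline
print(raw.decode('cp1254') == message + '\n')  # True
```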
