-
Notifications
You must be signed in to change notification settings - Fork 50
Open
Description
Hello. Yesterday the process has stuck in a strange state without doing anything.
I did check the CH logs, there wasn't any suspicious.
There's a log from the stuck time:
> grep -C15 2020-02-06T20:12:33.379Z TMOP-CCH.log
[2020-02-06T20:12:27.381Z] INFO [upload] start handle {"name": "graphite_tagged", "filename": "/var/lib/carbon-clickhouse/graphite_tagged/default.1581019944197075408.lz4"}
[2020-02-06T20:12:27.403Z] INFO [upload] start handle {"name": "graphite_index", "filename": "/var/lib/carbon-clickhouse/graphite_index/default.1581019944197075408.lz4"}
[2020-02-06T20:12:27.617Z] ERROR [upload] handle failed {"name": "graphite_tagged", "filename": "/var/lib/carbon-clickhouse/graphite_tagged/default.1581019944197075408.lz4", "error": "Post http://ch-host:8123/?query=INSERT+INTO+graphite.tagged+%28Date%2C+Tag1%2C+Path%2C+Tags%2C+Version%29+FORMAT+RowBinary: EOF", "time": 0.235977166}
[2020-02-06T20:12:27.754Z] INFO [upload] handle success {"name": "graphite", "filename": "/var/lib/carbon-clickhouse/graphite/default.1581019944197075408.lz4", "time": 0.374817767}
[2020-02-06T20:12:28.129Z] INFO [upload] handle success {"name": "graphite_index", "filename": "/var/lib/carbon-clickhouse/graphite_index/default.1581019944197075408.lz4", "time": 0.727357581}
[2020-02-06T20:12:28.380Z] INFO [upload] start handle {"name": "graphite_tagged", "filename": "/var/lib/carbon-clickhouse/graphite_tagged/default.1581019944197075408.lz4"}
[2020-02-06T20:12:28.486Z] INFO [upload] handle success {"name": "graphite_tagged", "filename": "/var/lib/carbon-clickhouse/graphite_tagged/default.1581019944197075408.lz4", "time": 0.105951873}
[2020-02-06T20:12:30.379Z] INFO [upload] start handle {"name": "graphite", "filename": "/var/lib/carbon-clickhouse/graphite/default.1581019947197180216.lz4"}
[2020-02-06T20:12:30.380Z] INFO [upload] start handle {"name": "graphite_tagged", "filename": "/var/lib/carbon-clickhouse/graphite_tagged/default.1581019947197180216.lz4"}
[2020-02-06T20:12:30.380Z] INFO [upload] start handle {"name": "graphite_index", "filename": "/var/lib/carbon-clickhouse/graphite_index/default.1581019947197180216.lz4"}
[2020-02-06T20:12:30.478Z] INFO [upload] handle success {"name": "graphite_tagged", "filename": "/var/lib/carbon-clickhouse/graphite_tagged/default.1581019947197180216.lz4", "time": 0.098053346}
[2020-02-06T20:12:30.625Z] INFO [upload] handle success {"name": "graphite", "filename": "/var/lib/carbon-clickhouse/graphite/default.1581019947197180216.lz4", "time": 0.245967823}
[2020-02-06T20:12:30.875Z] ERROR [upload] handle failed {"name": "graphite_index", "filename": "/var/lib/carbon-clickhouse/graphite_index/default.1581019947197180216.lz4", "error": "Post http://ch-host:8123/?query=INSERT+INTO+graphite.index+%28Date%2C+Level%2C+Path%2C+Version%29+FORMAT+RowBinary: EOF", "time": 0.495387526}
[2020-02-06T20:12:31.380Z] INFO [upload] start handle {"name": "graphite_index", "filename": "/var/lib/carbon-clickhouse/graphite_index/default.1581019947197180216.lz4"}
[2020-02-06T20:12:31.639Z] INFO [upload] handle success {"name": "graphite_index", "filename": "/var/lib/carbon-clickhouse/graphite_index/default.1581019947197180216.lz4", "time": 0.25905138}
[2020-02-06T20:12:33.379Z] INFO [upload] start handle {"name": "graphite", "filename": "/var/lib/carbon-clickhouse/graphite/default.1581019950197305764.lz4"}
[2020-02-06T20:12:33.380Z] INFO [upload] start handle {"name": "graphite_index", "filename": "/var/lib/carbon-clickhouse/graphite_index/default.1581019950197305764.lz4"}
[2020-02-06T20:12:33.380Z] INFO [upload] start handle {"name": "graphite_tagged", "filename": "/var/lib/carbon-clickhouse/graphite_tagged/default.1581019950197305764.lz4"}
[2020-02-06T20:12:45.198Z] INFO [writer] chunk interval changed {"interval": "10s"}
[2020-02-06T20:12:49.234Z] WARN [tcp] unfinished line {"peer": "[2a00:1f78:fffd:402e::101]:35402", "line": "servers.a"}
[2020-02-06T20:12:50.012Z] WARN [tcp] unfinished line {"peer": "[2a00:1f78:fffd:4002::a9d]:56246", "line": "metric.name"}
[2020-02-06T20:12:50.709Z] WARN [tcp] failed to accept connection {"error": "accept tcp [::]:2003: accept4: too many open files"}
[2020-02-06T20:12:50.709Z] WARN [tcp] failed to accept connection {"error": "accept tcp [::]:2003: accept4: too many open files"}
[2020-02-06T20:12:50.709Z] WARN [tcp] failed to accept connection {"error": "accept tcp [::]:2003: accept4: too many open files"}
**** the line repeats next 1.3GiB****
Another strange thing - disk space was allocated in df output, but du didn't show anything: (upd: looks like logrotate has removed the file, but carbon-clickhouse hasn't release it by some reason)
> df
Filesystem 1K-blocks Used Available Use% Mounted on
udev 980608 0 980608 0% /dev
tmpfs 198304 22024 176280 12% /run
/dev/vda 7329792 7329772 20 100% /
# cuted
> sudo du -hscx / | sort -h
1.7G /
1.7G total
> sudo lsof | grep -i del
COMMAND PID TID USER FD TYPE DEVICE SIZE/OFF NODE NAME
carbon-cl 19199 carbon-clickhouse 13w REG 254,0 5603639296 4872077 /var/log/carbon-clickhouse/carbon-clickhouse.log.1 (deleted)
carbon-cl 19199 6028 carbon-clickhouse 13w REG 254,0 5603639296 4872077 /var/log/carbon-clickhouse/carbon-clickhouse.log.1 (deleted)
carbon-cl 19199 19200 carbon-clickhouse 13w REG 254,0 5603639296 4872077 /var/log/carbon-clickhouse/carbon-clickhouse.log.1 (deleted)
*** dozens of line with different TID ***
I'll increase nofiles limit for sure because it was the reason for the outage. But maybe you'd find a way to avoid such deadlocks. Even panic and die would be better behavior IMHO.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels