Releases: scalyr/scalyr-agent-2
Lasso
Features:
- Add copy truncate log rotation support. This is enabled by default. It does not support copy truncate with compression unless the `delaycompress` option is used. This feature can be disabled by setting `enable_copy_truncate_log_rotation_support` to false.
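To illustrate what copy truncate rotation looks like from a log reader's side, here is a minimal Python sketch. It treats a file as truncated in place when the inode is unchanged but the size has dropped below the last read offset. The helper name and its arguments are hypothetical, and this is not the agent's actual implementation.

```python
import os
import tempfile


def was_copy_truncated(path, last_inode, last_offset):
    """Return True if the file at `path` looks like it was truncated in place.

    Hypothetical helper: with copy truncate rotation the file keeps its inode,
    but its size drops below the offset that was already read.
    """
    st = os.stat(path)
    return st.st_ino == last_inode and st.st_size < last_offset


# Small demonstration using a temporary file.
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("line 1\nline 2\n")
    demo_path = f.name

inode = os.stat(demo_path).st_ino
offset = os.path.getsize(demo_path)
before = was_copy_truncated(demo_path, inode, offset)

# Simulate the "truncate" half of copy truncate rotation.
with open(demo_path, "r+") as f:
    f.truncate(0)
after = was_copy_truncated(demo_path, inode, offset)
os.unlink(demo_path)
```

A reader that sees this condition would typically restart reading from offset 0 instead of waiting for the file to grow past the old offset.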
Improvements:
- Add new `tcp_request_parser` and `tcp_message_delimiter` config options. Valid values for `tcp_request_parser` include `default` and `batch`. The new TCP recv batch oriented request parser is much more efficient than the default one and should be the preferred choice in most situations. For backward compatibility reasons, the default parser hasn't been changed yet.
- `shell_monitor` now outputs two additional metrics during each sample gather interval: `duration` and `exit_code`. The first represents how many seconds it took to execute the shell command / script and the second represents the script's exit status.
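The idea behind a batch oriented parser can be sketched as follows: split everything received so far on the message delimiter in one pass and carry any incomplete tail over to the next read. The class and function names are hypothetical and the newline delimiter is an assumption. This is an illustration of the general technique, not the agent's parser.

```python
def split_batch(buffer, delimiter=b"\n"):
    """Split a receive buffer into complete messages plus a leftover remainder."""
    *messages, remainder = buffer.split(delimiter)
    return messages, remainder


class BatchParser:
    """Hypothetical parser that handles all complete messages per recv() at once."""

    def __init__(self, delimiter=b"\n"):
        self._delimiter = delimiter
        self._pending = b""  # incomplete message carried over between reads

    def feed(self, data):
        messages, self._pending = split_batch(self._pending + data, self._delimiter)
        return messages


parser = BatchParser()
first = parser.feed(b"msg one\nmsg two\npartial")
second = parser.feed(b" message\n")
```

Here `first` holds the two complete messages from the first read, and the partial third message is only returned once its delimiter arrives in the second read.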
Misc:
- On startup and when parsing a config file, agent now emits a warning if the config file is readable by others.
- Add the config option `enable_worker_process_metrics_gather` to enable the 'linux_process_metrics' monitor for each multiprocess worker.
- Each session, which runs in a separate process, periodically writes its stats to the log file. The interval between writes can be changed using the `default_worker_session_status_message_interval` option.
- Rename some of the configuration parameters: `use_multiprocess_copying_workers` to `use_multiprocess_workers`, and `default_workers_per_api_key` to `default_sessions_per_api_key`. The previous option names are preserved for backward compatibility but are marked as deprecated. NOTE: The corresponding environment variable names have changed too.
- Update docker monitor so we don't log some non-fatal errors under warning log level when consuming logs using Docker API.
- Add support for the `compression_type: none` config option, which completely disables compression for outgoing requests. Right now, one of the main bottlenecks in the agent in high volume scenarios is the compression operation. Disabling it can, in some scenarios, lead to a large increase in overall throughput (up to 2x). Disabling compression will in most cases result in larger data egress traffic, which may incur additional charges from your infrastructure provider, so this option should never be set to `none` unless explicitly advised by technical support.
- Linux system metrics monitor has been updated to also ignore `/var/lib/docker/*` and `/snap/*` mount points by default. Capturing metrics for those mount points usually offers no additional insight to the end user. For information on how to change the ignore list via configuration option, please see RELEASE_NOTES.
- The agent install bash script now adds the Scalyr repositories directly without installing the `scalyr-repo` packages. This also eliminates errors caused by re-acquiring the package manager's lock file during the pre/post install/uninstall scripts. The issue occurred with both the `apt` and `rpm` package managers.
Security fixes and improvements:
- Agent installation artifacts have been updated so the default `agent.json` file which is bundled with the agent is no longer readable by "other" system users by default. For more context, details and impact, please see RELEASE_NOTES.
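For readers who want to check their own config file permissions, the following Python sketch tests the "other" read bit on a POSIX system. The helper name is hypothetical and this is not necessarily how the agent performs its check.

```python
import os
import stat
import tempfile


def is_readable_by_others(path):
    """Return True if the 'other' read permission bit is set on `path` (POSIX)."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


# Demonstration on a temporary file standing in for a config file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    demo_path = f.name

os.chmod(demo_path, 0o644)  # owner rw, group r, others r
world_readable = is_readable_by_others(demo_path)

os.chmod(demo_path, 0o640)  # owner rw, group r, others none
restricted = is_readable_by_others(demo_path)

os.unlink(demo_path)
```

A mode such as `0o640` keeps the file readable by its owner and group while hiding any API keys it contains from other local users.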
Endora
Features:
- Ability to upload logs to different Scalyr team accounts by specifying different API keys for different log files. See RELEASE_NOTES for more details.
- New configuration option `default_workers_per_api_key` which creates more than one session with the Scalyr servers to increase upload throughput. This may be set using the `SCALYR_DEFAULT_WORKERS_PER_API_KEY` environment variable.
- New configuration option `use_multiprocess_copying_workers` which uses separate processes for each upload session, thereby providing more CPU resources to the agent. This may be set using the `SCALYR_USE_MULTIPROCESS_COPYING_WORKERS` environment variable.
Improvements:
- Linux system metrics monitor now ignores the following special mount points by default: `/sys/*`, `/dev*`, `/run*`. If you still want to capture `df.*` metrics for those mount points, please refer to RELEASE_NOTES.
- Update `url_monitor` so it sends the correct `User-Agent` header, which identifies requests as originating from the agent.
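The effect of the default mount point ignore list can be sketched with shell-style glob matching. Whether the agent matches patterns exactly this way is an assumption, and the constant and function names below are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns taken from the release note above; matching semantics are assumed.
IGNORED_MOUNT_POINT_GLOBS = ["/sys/*", "/dev*", "/run*"]


def is_ignored(mount_point, globs=IGNORED_MOUNT_POINT_GLOBS):
    """Return True if the mount point matches any of the ignore globs."""
    return any(fnmatchcase(mount_point, pattern) for pattern in globs)


mounts = ["/", "/sys/fs/cgroup", "/dev/shm", "/run/lock", "/home"]
kept = [m for m in mounts if not is_ignored(m)]
```

With these example mounts, only `/` and `/home` would still produce `df.*` metrics.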
Misc:
- The default value for the `k8s_cri_query_filesystem` Kubernetes monitor config option (set via the `SCALYR_K8S_CRI_QUERY_FILESYSTEM` environment variable) has changed to `True`. This means that by default when in CRI mode, the monitor will only query the filesystem for the list of active containers, rather than first querying the Kubelet API. If you wish to revert to the original default and prefer the Kubelet API, set the `SCALYR_K8S_CRI_QUERY_FILESYSTEM` environment variable to "false" for the Scalyr Agent daemonset.
- A new `global_monitor_sample_interval_enable_jitter` config option has been added, which is enabled by default. When this option is enabled, a random sleep of between 2/10 and 8/10 of the configured monitor sample gather interval is used before gathering the sample for the first time. This ensures that sample gathering for all the monitors doesn't run at the same time. This comes in handy when running an agent configured with many monitors on lower powered devices, since it spreads the load spike from monitor sample gathering across a longer time frame.
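The jitter rule described above amounts to drawing a random initial delay between 20% and 80% of the sample interval. A minimal sketch, with a hypothetical function name and an assumed uniform distribution, could look like this:

```python
import random


def initial_jitter_seconds(sample_interval, rng=random):
    """Pick a random first-run delay between 2/10 and 8/10 of the interval.

    The uniform distribution is an assumption; the release note only states
    the lower and upper bounds.
    """
    return rng.uniform(0.2 * sample_interval, 0.8 * sample_interval)


# With a 30 second sample interval, each monitor would first sleep for
# somewhere between roughly 6 and 24 seconds.
delays = [initial_jitter_seconds(30.0) for _ in range(1000)]
```

Because each monitor draws its own delay, their first samples are spread over a window instead of all landing at the same moment.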
Bug fixes:
- Fix to make sure we don't expect a valid Docker socket when running Kubernetes monitor in CRI mode. This fixes an issue preventing the K8s monitor from running in CRI mode if Docker is not available.
- Fix line grouping code and make sure we don't throw if line data contains a bad or partial unicode escape sequence.
- Fix the `scalyr_agent/run_monitor.py` script so it also works correctly out of the box when using a source code installation.
- Update Windows System Metrics monitor to better handle a situation when disk io counters are not available.
- Docker monitor has been fixed so that when running in "API mode" (`docker_raw_logs: false`) it also correctly ingests logs from container `stderr`. Previously only logs from `stdout` were ingested.
Hydrus
Features:
- Add new `initial_stopped_container_collection_window` configuration option to the Kubernetes monitor, which can be configured by setting the `SCALYR_INITIAL_STOPPED_CONTAINER_COLLECTION_WINDOW` environment variable. By default, the Scalyr Agent does not collect the logs from any pods stopped before the agent was started. To override this, set this parameter to the number of seconds the agent will look in the past (before it was started). It will collect logs for any pods that were started and stopped during this window. This can be useful in autoscaling environments to ensure all pod logs are captured since node creation, even if the Scalyr Agent daemonset starts just after other pods.
Improvements:
- Improve logging in the Kubernetes monitor.
- On agent start up we now also log the locale (language code and encoding) used by the agent process. This will make it easier to troubleshoot issues which are related to the agent process not using UTF-8 encoding.
- Default value for the `tcp_buffer_size` Syslog monitor config option has been increased from 2048 to 8192 bytes.
- New `message_size_can_exceed_tcp_buffer` config option has been added to Syslog monitor. When set to True, the monitor will support messages which are larger than `tcp_buffer_size` bytes, and the `tcp_buffer_size` config option will determine how many bytes we try to read from the socket at once, in a single recv() call. For backward compatibility reasons, it defaults to False.
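To show how a message can be larger than the per-call read size, the sketch below accumulates several `recv()` calls until a delimiter is seen. The function name and the single-byte newline delimiter are assumptions, and the code is a simplified illustration, not the Syslog monitor's implementation.

```python
import socket


def read_message(sock, buffer_size, delimiter=b"\n"):
    """Read one delimited message, using as many recv() calls as needed.

    Simplified: assumes a single-byte delimiter and discards any bytes that
    follow the first delimiter.
    """
    chunks = []
    while True:
        chunk = sock.recv(buffer_size)
        if not chunk:  # peer closed the connection
            break
        chunks.append(chunk)
        if delimiter in chunk:
            break
    message, _, _rest = b"".join(chunks).partition(delimiter)
    return message


# Demonstration: a 20 byte message read through an 8 byte buffer.
left, right = socket.socketpair()
left.sendall(b"x" * 20 + b"\n")
left.close()
message = read_message(right, buffer_size=8)
right.close()
```

Here the buffer size only bounds how much is read per call, while the message length is bounded by where the delimiter appears.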
Bug fixes:
- Fix a bug / race-condition in Docker monitor which could cause, under some scenarios, when monitoring containers running on the same host, logs to stop being ingested after the container restart. There was a relatively short time window when this could happen and it was more likely to affect containers which take longer to stop / start.
- Update code for all the monitors to correctly use UTC timezone everywhere. Previously some of the code incorrectly used local server time instead of UTC. This means some of those monitors could exhibit incorrect / undefined behavior when running the agent on a server which has local time set to something other than UTC.
- Fix `docker_raw_logs: false` functionality in the Docker monitor which has been broken for a while now.
- Update Windows System Metrics monitor to better handle a situation when disk io counters are not available.
Celaeno
Bug fixes:
- Fix a non-fatal error in the `scalyr-agent-2 status` command when running the status command multiple times concurrently or in a short time frame.
- Fix the `scalyr-agent-status` command to not log a config override warning to stdout since it may interfere with consumers of the status command output.
- Fix merging of active-checkpoints.json and checkpoints.json checkpoint file data. Previously, data from the active checkpoints file was not correctly merged into the full checkpoint data file, which means that under some scenarios (e.g. the agent crashed after the active checkpoint file was written, but before the full checkpoint file was written), data which was already sent to the server could be sent twice. The actual time window when this could happen was relatively small since full checkpoint data is written out every 60 seconds by default.
- Fix Postgres monitor error when specifying the Postgres `database_port` in the agent config.
Betelgeuze
- Upgrade `psutil` dependency which incorporates many critical fixes. As part of the change, Windows Server 2003/XP is no longer supported.
- Small fix for the `pywin32` library which is used in the Windows version.
Aqua
Features:
- Add new `win32_max_open_fds` configuration option which allows the user to override the maximum open file limit on Windows for the Scalyr Agent process.
Bug fixes:
- Fix bug in packaging which would cause agent to sometimes crash on Windows when using windows event log monitor.
Alcor
Bug fixes:
- Fix formatting of the "Health Check:" line in `scalyr-agent-2 status -v` command output and make sure the value is left padded and consistent with other lines.
- Fix reporting of the "Last successful communication with Scalyr" line value in the `scalyr-agent-2 status -v` command output if we never successfully establish a connection with the Scalyr API.
- Fix a regression in `scalyr-agent-2-config --upgrade-windows` functionality which would sometimes throw an exception, depending on the configuration values.
Security fixes and improvements:
- Fix a bug with the agent not correctly validating that the hostname which is stored inside the certificate returned by the server matches the one the agent is trying to connect to (`scalyr_config` option). This would open up a possibility for a MITM attack in case the attacker was able to spoof or control the DNS.
- Fix a bug with the agent not correctly validating the server certificate and hostname when using `scalyr-agent-2-config --upgrade-windows` functionality under Python < 2.7.9. This would open up a possibility for a MITM attack in case the attacker was able to spoof or control the DNS.
- When connecting to the Scalyr API, the agent now explicitly requests TLS v1.2 and aborts the connection if the server doesn't support it or tries to use an older version. The Scalyr API recently deprecated support for TLS v1.1, which allows us to implement this change and makes the agent more robust against potential downgrade attacks. Due to lack of required functionality in older Python versions, this is only true when running the agent under Python >= 2.7.9.
- When connecting to the Scalyr API, the agent now sends an SNI header which matches the host specified in the agent config. Due to lack of required functionality in older Python versions, this is only true when running the agent under Python >= 2.7.9.
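The hardening steps above (certificate and hostname validation, a TLS v1.2 floor, and SNI) map onto the standard library `ssl` module roughly as follows. This sketch uses the Python 3.7+ API rather than the Python 2.7.9 era API the agent targeted at the time. The function name is hypothetical and this is not the agent's own connection code.

```python
import ssl


def build_strict_context():
    """Build a client context that verifies certificates and requires TLS >= 1.2."""
    # create_default_context() enables certificate verification and hostname
    # checking for client connections.
    context = ssl.create_default_context()
    # Refuse to negotiate anything older than TLS v1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = build_strict_context()
```

When such a context wraps a socket with `context.wrap_socket(sock, server_hostname=host)`, the `server_hostname` value is sent as the SNI name and is also the name checked against the server certificate.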
Ursa
Bug fixes:
- Fixed a regression in Scalyr Windows Agent cmdlet script (`ScalyrShell.cmd`) which prevents the agent from starting.
Titan
Features:
- The `status -v` command now contains health check information, and will have a return code of `2` if the health check has failed. A new optional flag for the `status` CLI command, `-H`, returns a short status with only health check info. A new configuration feature, `healthy_max_time_since_last_copy_attempt`, defines how many seconds the Agent may go without attempting to send logs before the health check fails, defaulting to `60.0`. For more information, please refer to the release notes document.
- Kubernetes yaml has been updated to include a liveness check based on the new health check info, which will cause a pod restart if the agent is considered unhealthy.
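The health check rule can be pictured as a comparison between the time since the last copy attempt and the configured maximum. In the sketch below, the function name is hypothetical and the exit code of `0` for a healthy agent is an assumption; only the failure code `2` and the `60.0` second default come from the note above.

```python
import time

# Default taken from the release note above.
HEALTHY_MAX_TIME_SINCE_LAST_COPY_ATTEMPT = 60.0


def health_check_exit_code(last_copy_attempt_time, now=None,
                           max_age=HEALTHY_MAX_TIME_SINCE_LAST_COPY_ATTEMPT):
    """Return 2 if the last copy attempt is too old, otherwise 0 (assumed)."""
    now = time.time() if now is None else now
    healthy = (now - last_copy_attempt_time) <= max_age
    return 0 if healthy else 2


recent = health_check_exit_code(last_copy_attempt_time=100.0, now=130.0)
stale = health_check_exit_code(last_copy_attempt_time=100.0, now=200.0)
```

A liveness probe that runs the status command only needs to treat a non-zero return code as a failure.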
Bug fixes:
- Fixed race condition in pipelined requests which could lead to duplicate log upload, especially for systems with a large number of inactive log files. Log files would be reuploaded from their start over a short period of time (seconds to minutes). This bug is triggered when pipelining is enabled, either by explicitly setting the `pipeline_threshold` config option or by using a Scalyr Agent release >= 2.1.6 (pipelining was turned on by default in 2.1.6).
- Fixed the misconfiguration in the Windows packager which caused some of the monitors to not be included in the Windows version. This generates import errors when attempting to use monitors like the syslog or shell monitor.
Misc:
- The `compression_level` configuration option now defaults to `6` when using `deflate` `compression_type` (`deflate` is the default value for the `compression_type` configuration option). `6` offers the best trade off between compression ratio and CPU usage. For more information, please refer to the release notes document.
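The trade off between compression settings can be explored with the standard library `zlib` module. The sketch below assumes that `deflate` corresponds to zlib-style compression and uses a made-up payload. The `encode` function is hypothetical and only mirrors the option names from the notes.

```python
import zlib

# Made-up, highly repetitive payload standing in for a batch of log lines.
payload = b'{"ts": 1, "message": "GET /index.html 200"}\n' * 2000


def encode(body, compression_type="deflate", compression_level=6):
    """Return the request body encoded according to the compression settings."""
    if compression_type == "none":
        return body  # no CPU spent on compression, but more bytes sent
    if compression_type == "deflate":
        return zlib.compress(body, compression_level)
    raise ValueError("unsupported compression_type: %r" % (compression_type,))


compressed = encode(payload)
uncompressed = encode(payload, compression_type="none")
```

Comparing `len(compressed)` with `len(uncompressed)` for representative log data gives a rough sense of the extra egress traffic that disabling compression would cause.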
Serenity
Features:
- New configuration feature `k8s_logs` allows configuring of Kubernetes logs similarly to the `logs` configuration but matches based on Kubernetes pod, namespace, and container name. Please see the RELEASE_NOTES for more details.
Bug fixes:
- Fixed race condition that sometimes resulted in duplicated K8s logs being uploaded on agent restart or configuration update.
Misc:
- The Windows package is now built using `pyInstaller` instead of `py2exe`. As part of the change, we are no longer supporting 32-bit Windows systems. Nothing else should change due to the move to `pyInstaller`.