|`DIAL_LOG_PARSER_FILENAME_REGEX`| optional | Overrides the regular expression used to match log file names (default: `date=(\d{4}-\d{2}-\d{2})(\d+)-(\w{8}-\w{4}-\w{4}-\w{4}-\w{12}).log(.gz)?`) |
|`DIAL_LOG_PARSER_INPUT_COMPRESSION`| optional | Compression type for input log files. Possible values: <br/> `infer` - infer compression from the file extension (default), <br/> `none` - no compression, <br/> or any well-known compression type [supported by fsspec](https://filesystem-spec.readthedocs.io/en/latest/features.html#transparent-text-mode-and-compression) (e.g. `gzip`). |
|`DIAL_LOG_PARSER_INPUT_CACHE`| optional | Cache type for the input filesystem. Possible values: <br/> `default` - use the default caching behavior (default), <br/> `none` - disable caching, <br/> or any cache type supported by fsspec (e.g. `readahead`, `bytes`). <br/> See [fsspec read buffering](https://filesystem-spec.readthedocs.io/en/latest/api.html#read-buffering) and the documentation of the specific filesystem for details. |
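
To illustrate how the settings in the table combine, here is a minimal sketch that matches a file name against the default `DIAL_LOG_PARSER_FILENAME_REGEX` and resolves the compression and cache values. The helper `resolve_input_options`, the small extension-to-codec table, the use of `re.search`, and the returned dictionary are all hypothetical and chosen for illustration; they are not the parser's actual code, and fsspec's own `infer` logic recognizes more extensions.

```python
import re
from typing import Optional

# Default value of DIAL_LOG_PARSER_FILENAME_REGEX, copied verbatim from the table.
DEFAULT_FILENAME_REGEX = (
    r"date=(\d{4}-\d{2}-\d{2})(\d+)-(\w{8}-\w{4}-\w{4}-\w{4}-\w{12}).log(.gz)?"
)

# Hypothetical subset of an extension -> codec table used for `infer`.
EXTENSION_CODECS = {".gz": "gzip", ".bz2": "bz2"}


def resolve_input_options(
    filename: str,
    compression: str = "infer",
    cache: str = "default",
    filename_regex: str = DEFAULT_FILENAME_REGEX,
) -> Optional[dict]:
    """Hypothetical helper, not the parser's code.

    Returns None when `filename` does not match the pattern, otherwise the
    captured date and UUID plus the resolved compression and cache settings.
    """
    match = re.search(filename_regex, filename)
    if match is None:
        return None  # not a log file: skip it

    if compression == "infer":
        # Pick the codec from the file extension; None if nothing matches.
        codec = next(
            (name for ext, name in EXTENSION_CODECS.items() if filename.endswith(ext)),
            None,
        )
    elif compression == "none":
        codec = None
    else:
        codec = compression  # explicit codec such as "gzip", passed through

    return {
        "date": match.group(1),
        "uuid": match.group(3),
        "compression": codec,
        # None = keep the filesystem's default cache;
        # "none" and named cache types are passed through unchanged.
        "cache_type": None if cache == "default" else cache,
    }


# Made-up file name constructed to fit the default pattern.
print(resolve_input_options(
    "date=2024-01-1512-123e4567-e89b-12d3-a456-426614174000.log.gz"
))
```

With the defaults, a `.log.gz` name resolves to `gzip`; passing `compression="none"` keeps the codec unset even though the name ends with `.gz`.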

### Storage specific environment variables

Specific storage implementations may require additional environment variables to be set.

For example, S3 may require `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. See the [s3fs credentials documentation](https://s3fs.readthedocs.io/en/latest/#credentials).

Any fsspec-compatible implementation should be supported, although some may require installing extra packages in the Docker image.
For the available implementations, see [Built-in Fsspec Implementations](https://filesystem-spec.readthedocs.io/en/latest/api.html#implementations) and [Other Known Fsspec Implementations](https://filesystem-spec.readthedocs.io/en/latest/api.html#external-implementations).

#### Azure Blob Storage

For Azure Blob Storage, see the [adlfs documentation](https://github.com/fsspec/adlfs?tab=readme-ov-file#setting-credentials) for the list of required environment variables.

**Note**: `AZURE_STORAGE_ANON` should be explicitly set to `false` to use authenticated access. The adlfs library defaults to `true` (anonymous access), which may lead to authentication errors when accessing private blobs.
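
To show why leaving the variable unset results in anonymous access, here is a minimal sketch of the resolution rule as the adlfs documentation describes it: the values `false`, `0`, or `f` disable anonymous access, and anything else, including an unset variable, enables it. The function `resolve_anon` is hypothetical and only mirrors that documented rule; it is not adlfs code, and the case-insensitive comparison is an assumption.

```python
import os
from typing import Mapping, Optional

# Values that, per the adlfs documentation, turn anonymous access off.
FALSE_VALUES = ("false", "0", "f")


def resolve_anon(
    anon: Optional[bool] = None, env: Mapping[str, str] = os.environ
) -> bool:
    """Hypothetical mirror of adlfs's documented `anon` resolution."""
    if anon is not None:
        return anon  # an explicit argument wins over the environment
    raw = env.get("AZURE_STORAGE_ANON")
    if raw is None:
        return True  # unset: anonymous access is attempted
    # Case-insensitive comparison is an assumption of this sketch.
    return raw.strip().lower() not in FALSE_VALUES


print(resolve_anon(env={}))                               # True
print(resolve_anon(env={"AZURE_STORAGE_ANON": "false"}))  # False
```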

If you store the logs compressed as `.log.gz` and the blob's `Content-Encoding` header is set to `gzip`, adlfs may return the decompressed file content while reporting the size of the compressed file. This mismatch confuses the caching and decompression logic in fsspec and may cause an error when the parser reads the file content.

To work around this issue, set `DIAL_LOG_PARSER_INPUT_COMPRESSION=none` to disable decompression in the parser even when the file name ends with `.gz`, and set `DIAL_LOG_PARSER_INPUT_CACHE=none` to disable caching and avoid problems caused by the size mismatch. The parser then reads the file content as is, without trying to decompress or cache it.
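
The failure mode and the workaround can be reproduced with a small, standard-library-only model. Everything in it is hypothetical: `fetch_blob` stands in for adlfs serving an already decoded blob while reporting the compressed size, and the three readers are simplified stand-ins for fsspec's caching and decompression layers, not their real code paths.

```python
import gzip
import io
from typing import Tuple


def fetch_blob(original: bytes) -> Tuple[bytes, int]:
    """Hypothetical stand-in for adlfs reading a blob stored gzip-compressed
    with `Content-Encoding: gzip`: the body arrives already decoded, while
    the reported size is that of the stored (compressed) object."""
    stored = gzip.compress(original)
    return original, len(stored)


def read_trusting_size(body: bytes, reported_size: int) -> bytes:
    # Simplified model of a size-aware cache: it never reads past the
    # reported size, so the larger decoded body is silently truncated.
    return io.BytesIO(body).read(reported_size)


def read_with_decompression(body: bytes) -> bytes:
    # Simplified model of `infer` on a `.gz` name: gunzip the body.
    # The body is plain text by now, so this raises gzip.BadGzipFile.
    return gzip.decompress(body)


def read_as_is(body: bytes) -> bytes:
    # Model of the workaround (compression=none, cache=none):
    # take the bytes exactly as served.
    return io.BytesIO(body).read()


if __name__ == "__main__":
    log = b'{"message": "example log line"}\n' * 1000
    body, size = fetch_blob(log)
    print(len(log), size, len(read_trusting_size(body, size)))
    try:
        read_with_decompression(body)
    except gzip.BadGzipFile as exc:
        print("decompression failed:", exc)
    print(read_as_is(body) == log)
```

In this model, both the truncated read and the failed gunzip disappear once the reader neither trusts the reported size nor tries to decompress, which is what the two `none` settings are meant to achieve.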