Commit ce84415

ICU: make optional
1 parent 38c533b commit ce84415

3 files changed: 56 additions and 14 deletions

client/README.md

Lines changed: 27 additions & 7 deletions
@@ -34,17 +34,37 @@ but it is not officially supported.
 
 **Download and extract [the latest release zip `dwh-migration-tools-vX.X.X.zip`](https://github.com/google/dwh-migration-tools/releases/latest).**
 
-Python ≥ 3.7.2 is required, as well as the additional local dependencies
-`pkg-config` and `libicu-dev`. Typical commands for installing these
-dependencies on a Debian-based Linux distribution such as Ubuntu would be:
+### Python
+
+Python ≥ 3.7.2 is required.
+You can check whether you have a recent enough version of Python installed by
+running the command `python3 --version`.
+
+You must also have the Python `pip` and `venv` modules installed. Altogether,
+the distribution-specific commands to install these are:
+
+* Debian-based distros: `sudo apt install python3-pip python3-venv`
+* Red Hat-based distros: `sudo yum install python38 python38-pip` (for e.g. Python
+3.8)
+
+### Support for Encodings other than UTF-8
+
+If all of the files you wish to translate are UTF-8 encoded
+(this is commonly the case), you can skip this section.
+Otherwise, you will need to install additional system dependencies:
+
+* Debian-based distros: `sudo apt install pkg-config libicu-dev`
+* RedHat-based distros: `sudo yum install gcc gcc-c++ libicu-devel
+python38-devel`
+
+**You must also remember**, upon reaching the step to `pip install` further down
+in the Quickstart section below, to use this command instead:
 
 ```shell
-sudo apt update
-sudo apt install -y python3-pip python3-venv pkg-config libicu-dev
+pip install ../dwh-migration-tools/client[icu]
 ```
 
-You can check whether you have a recent enough version of Python installed by
-running the command `python3 --version`.
+### GCP
 
 You need a GCP project and a Google Cloud Storage bucket to use for uploading
 your input SQL files and downloading the translated output. [Learn how to
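
Note on the new README section: it lets users skip the ICU dependencies when every input file is UTF-8 encoded. One way to check that condition before choosing an install command is sketched below. This is not part of the commit; the helper name `find_non_utf8_files` is hypothetical, and the check only tests whether the bytes decode as strict UTF-8 (pure ASCII files pass too, which is fine for this purpose).

```python
from pathlib import Path
from typing import Iterable, List


def find_non_utf8_files(paths: Iterable[Path]) -> List[Path]:
    """Return the paths whose contents do not decode as strict UTF-8.

    Hypothetical helper, not part of dwh-migration-tools.
    """
    non_utf8 = []
    for path in paths:
        try:
            # bytes.decode raises UnicodeDecodeError on invalid UTF-8
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            non_utf8.append(path)
    return non_utf8


# Example usage (the directory name "input" is an assumption):
# offenders = find_non_utf8_files(Path("input").rglob("*.sql"))
# An empty list suggests the ICU extra can be skipped.
```

If the returned list is non-empty, the README's `[icu]` install path is probably the one to follow.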

client/bqms_run/encoding.py

Lines changed: 25 additions & 6 deletions
@@ -15,8 +15,6 @@
 
 import logging
 
-import icu
-
 logger = logging.getLogger(__name__)
 
 
@@ -25,14 +23,35 @@ class EncodingDetector:
     An encoding detector.
     """
 
+    DEFAULT_ENCODING = "utf-8"
+    has_logged_fallback_warning = False
+
     def detect(self, data: bytes) -> str:
         """
         Detect the encoding of the provided bytes, return the encoding name.
         """
-        encoding = icu.CharsetDetector(data).detect().getName()
-        if not isinstance(encoding, str):
-            return "utf-8"
-        return encoding
+        try:
+            # pylint: disable-next=import-outside-toplevel
+            import icu
+
+            encoding = icu.CharsetDetector(data).detect().getName()
+            if not isinstance(encoding, str):
+                return EncodingDetector.DEFAULT_ENCODING
+            return encoding
+        # pylint: disable-next=broad-exception-caught
+        except Exception as ex:
+            # any ICU-related exceptions should not halt execution,
+            # just assume UTF-8
+            if not EncodingDetector.has_logged_fallback_warning:
+                # only print a single per-process warning to avoid log noise
+                EncodingDetector.has_logged_fallback_warning = True
+                logger.warning(
+                    # pylint: disable-next=line-too-long
+                    "PyICU is either not available or misconfigured; assuming default encoding of %s (cause: %s)",
+                    EncodingDetector.DEFAULT_ENCODING,
+                    ex,
+                )
+            return EncodingDetector.DEFAULT_ENCODING
 
     def decode(self, data: bytes) -> str:
         """
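
Note on exercising the fallback path: because `import icu` now happens inside `detect`, the no-PyICU branch can be tested even on a machine where PyICU is installed. The sketch below is not part of the commit. `detect_or_default` is a hypothetical, simplified stand-in for `EncodingDetector.detect` (it omits the one-time warning), and it relies on the standard Python behavior that a `None` entry in `sys.modules` makes the corresponding import raise `ImportError`.

```python
import sys
from unittest import mock


def detect_or_default(data: bytes, default: str = "utf-8") -> str:
    """Hypothetical, simplified stand-in for EncodingDetector.detect."""
    try:
        # Deferred import: a missing PyICU is caught by the except below.
        import icu

        name = icu.CharsetDetector(data).detect().getName()
        return name if isinstance(name, str) else default
    except Exception:  # mirrors the commit: any failure falls back
        return default


# A None entry in sys.modules makes `import icu` raise ImportError,
# simulating an environment without PyICU. patch.dict restores the
# original sys.modules contents when the block exits.
with mock.patch.dict(sys.modules, {"icu": None}):
    result = detect_or_default(b"SELECT 1;")

print(result)  # prints: utf-8
```

The same `mock.patch.dict` trick could be applied to the real `EncodingDetector` in a unit test, assuming the package is importable in the test environment.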

client/pyproject.toml

Lines changed: 4 additions & 1 deletion
@@ -15,7 +15,10 @@ PyYAML = "^6.0"
 marshmallow = "^3.17.1"
 cloudpathlib = "^0.10.0"
 typing-extensions = "^4.4.0"
-PyICU = "^2.10.2"
+PyICU = { version = "^2.10.2", optional = true }
+
+[tool.poetry.extras]
+icu = ["PyICU"]
 
 [tool.poetry.dev-dependencies]
 black = "22.6.0"
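
Note on verifying the extra: with PyICU marked optional, a plain install no longer pulls it in, so it can be useful to confirm whether the `icu` extra actually landed in the active environment. A minimal sketch follows, assuming only that PyICU's import name is `icu` (as used in `encoding.py`). The helper `is_module_available` is hypothetical and not part of the commit.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_module_available("icu"):
    print("PyICU found: encoding detection is enabled")
else:
    print("PyICU not found: input files will be assumed to be UTF-8")
```

Note that this only checks that the module can be found. A misconfigured PyICU build could still fail at import time, which is the case the broad `except` in `detect` covers.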
