Commit 7a0b4f5

Update emoji and pandas dependency constraints
Widen version constraints to allow emoji 2.x and pandas 2.x (fixes #37, #38). Update code to use EMOJI_DATA instead of removed UNICODE_EMOJI and drop removed use_aliases parameter from demojize/emojize calls.
1 parent 5c8f1be commit 7a0b4f5
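
Dropping `use_aliases=True` means `demojize`/`emojize` fall back to their default names rather than the alias names. A minimal sketch of a wrapper that keeps alias names on both sides of the 2.x boundary, assuming emoji 2.x exposes the old behaviour as `language="alias"`. The wrapper `demojize_with_aliases` is hypothetical and not part of this commit, and the function to call is passed in so the sketch is not tied to one installed version:

```python
def demojize_with_aliases(text, demojize_fn):
    """Hypothetical helper: request alias names from old and new emoji releases.

    demojize_fn would normally be emoji.demojize; it is a parameter here so
    the sketch can be exercised without a particular emoji version installed.
    """
    try:
        # Assumption: emoji 2.x spells the old use_aliases=True as language="alias".
        return demojize_fn(text, language="alias")
    except (TypeError, KeyError):
        # Assumption: releases that reject the call above still accept use_aliases.
        return demojize_fn(text, use_aliases=True)
```

Usage would be `demojize_with_aliases(text, emoji.demojize)`. Whether alias names matter depends on what downstream code expects from the demojized text.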

3 files changed: +1745 -528 lines changed
cleantext/clean.py

Lines changed: 12 additions & 3 deletions
@@ -7,7 +7,14 @@
 import sys
 from unicodedata import category
 
-from emoji import UNICODE_EMOJI, demojize, emojize
+from emoji import demojize, emojize
+
+try:
+    from emoji import EMOJI_DATA
+except ImportError:
+    from emoji import UNICODE_EMOJI
+
+    EMOJI_DATA = None
 from ftfy import fix_text
 
 from . import constants
@@ -74,7 +81,7 @@ def to_ascii_unicode(text, lang="en", no_emoji=False):
     text = fix_strange_quotes(text)
 
     if not no_emoji:
-        text = demojize(text, use_aliases=True)
+        text = demojize(text)
 
     lang = lang.lower()
     # special handling for German text to preserve umlauts
@@ -88,7 +95,7 @@ def to_ascii_unicode(text, lang="en", no_emoji=False):
         text = save_replace(text, lang=lang, back=True)
 
     if not no_emoji:
-        text = emojize(text, use_aliases=True)
+        text = emojize(text)
 
     return text
 
@@ -196,6 +203,8 @@ def remove_punct(text):
 
 
 def remove_emoji(text):
+    if EMOJI_DATA is not None:
+        return remove_substrings(text, EMOJI_DATA)
     return remove_substrings(text, UNICODE_EMOJI["en"])
 
 