Description
When trying to use hazm with Python 3.11 on a macOS system with Apple Silicon (ARM64), I encountered issues due to its dependency on an outdated version of numpy (1.5.0). This version is incompatible with Python 3.11 and fails to install due to changes in distutils.sysconfig (missing _init_posix attribute). Additionally, newer versions of numpy (e.g., 1.26.4 or 2.2.5) seem to cause import errors, such as ModuleNotFoundError: No module named 'numpy.rec'.
Environment
- Operating System: macOS
- Architecture: Apple Silicon (ARM64)
- Python Version: 3.11
- Hazm Version: 0.10.0
- numpy Version Tried: 1.5.0 (required by hazm), 1.26.4, 2.2.5
- Other Relevant Packages: scikit-learn, pandas, gensim==4.3.3
Error Details
Attempt 1: Installing hazm with newer numpy (e.g., 2.2.5)
When importing hazm.Normalizer, I get the following error:
ModuleNotFoundError: No module named 'numpy.rec'
This seems to be because hazm or its dependencies (e.g., sklearn) expect an older numpy version.
Attempt 2: Installing numpy==1.5.0
When trying to install numpy==1.5.0 (as required by hazm), I get an installation error:
AttributeError: module 'distutils.sysconfig' has no attribute '_init_posix'
This is likely because numpy==1.5.0 is too old for Python 3.11 and does not support ARM64 architecture.
Steps to Reproduce
- Create a virtual environment with Python 3.11 on macOS (Apple Silicon).
- Run pip install hazm.
- Try importing from hazm import Normalizer (fails with a numpy.rec error if numpy>=1.18 is installed).
- Alternatively, try pip install numpy==1.5.0 (fails with a distutils error).
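To narrow down which combination of packages triggers the import failure, a small diagnostic like the following could be run inside the virtual environment before importing hazm. This is only a sketch: the helper names installed_version and module_available are hypothetical, and it uses the standard library only, so it reports what is installed without asserting why hazm fails.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def module_available(name):
    """Return True if the import system can locate the named module."""
    try:
        # For a dotted name, find_spec imports the parent package first,
        # so a missing parent surfaces as ImportError here.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # Packages named in this report; adjust the list as needed.
    for package in ("numpy", "scikit-learn", "gensim", "hazm"):
        print(f"{package}: {installed_version(package) or 'not installed'}")
    print("numpy.rec locatable:", module_available("numpy.rec"))
```

Comparing this output across environments where the import works and where it fails should show which numpy version is actually active when the numpy.rec error appears.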
What I Tried
- Installed newer numpy versions (1.26.4, 2.2.5), but got ModuleNotFoundError: No module named 'numpy.rec'.
- Tried installing numpy==1.5.0, but it fails due to distutils incompatibility with Python 3.11.
- Considered using alternative libraries like parsivar or virastar, which work fine with modern numpy versions.
Sample Code:
import pandas as pd
import torch
import transformers
import chromadb
from hazm import Normalizer
import numpy as np
import sklearn
print("pandas:", pd.__version__)
print("torch:", torch.__version__)
print("MPS available:", torch.backends.mps.is_available())  # should be True
print("transformers:", transformers.__version__)
print("chromadb:", chromadb.__version__)
print("numpy:", np.__version__)
print("scikit-learn:", sklearn.__version__)
print("hazm:", Normalizer().normalize("تست نرمالسازی"))
Expected Behavior
hazm should be compatible with recent numpy versions (e.g., 1.26.4 or higher) and installable on Python 3.11 with Apple Silicon without requiring an outdated numpy==1.5.0.
Suggested Fix
- Update hazm dependencies to support newer numpy versions (e.g., >=1.24).
- Remove or update reliance on numpy.rec in hazm or its dependencies (e.g., sklearn).
- Test and document compatibility with Python 3.11 and ARM64 architectures.
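Before changing any pins, it may help to confirm which numpy constraint the installed hazm release actually declares. The sketch below reads the package metadata with the standard library; the helper names requirement_name and declared_requirements are hypothetical, and the name parsing is a simplified approximation of requirement strings, not a full parser.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the lowercased project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return match.group(1).lower() if match else ""


def declared_requirements(dist_name, dependency):
    """List the requirement strings that dist_name declares for dependency."""
    try:
        # metadata.requires returns None when no requirements are declared.
        requirements = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    wanted = dependency.lower()
    return [r for r in requirements if requirement_name(r) == wanted]


if __name__ == "__main__":
    pins = declared_requirements("hazm", "numpy")
    print("hazm declares for numpy:", pins or "nothing found")
```

If the printed constraint is an exact pin, relaxing it to a range would be the first dependency change to test.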
Additional Notes
I’m happy to provide more details or test any proposed fixes. This issue is critical for projects relying on hazm for Persian NLP on modern systems.
Thanks for maintaining this great library!