Welcome! This guide walks you through installing, running, and using simple_NER for the first time.
- Python 3.10 or later
- pip or uv package manager
```bash
pip install simple_NER
```

Or with development tools:

```bash
pip install "simple_NER[dev]"
```

Here's the simplest example: extract email, phone, and dates from text:
```python
from simple_NER import create_pipeline

# Create a pipeline with specific entity types
pipe = create_pipeline(["email", "phone", "temporal"])

# Process some text
text = "Call me at +1-800-555-0100 or email john@example.com by 2025-06-01"

# Extract entities
for entity in pipe.process(text):
    print(f"{entity.entity_type:12} | {entity.value:20} | confidence: {entity.confidence}")
```

Output:

```
phone        | +1-800-555-0100      | confidence: 0.9
email        | john@example.com     | confidence: 1.0
date         | 2025-06-01           | confidence: 0.85
```
Each `Entity` contains:
- `value`: The extracted text
- `entity_type`: The label (e.g., `"email"`, `"phone"`)
- `confidence`: 0.0–1.0 (higher = more certain)
- `data`: Extra metadata specific to that entity type (e.g., for email: `local_part`, `domain`)
- `spans`: Character positions in the original text
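To make the fields above concrete without needing the library installed, here is a minimal sketch using a stand-in dataclass. `EntitySketch` and `filter_confident` are hypothetical names for illustration only, and the `(start, end)` shape of `spans` is an assumption, not something this guide confirms about simple_NER.

```python
from dataclasses import dataclass, field


@dataclass
class EntitySketch:
    """Hypothetical stand-in mirroring the fields listed above (not the library's class)."""
    value: str
    entity_type: str
    confidence: float
    data: dict = field(default_factory=dict)
    spans: list = field(default_factory=list)  # assumed here to hold (start, end) pairs


def filter_confident(entities, threshold=0.9):
    """Keep only entities whose confidence meets the threshold."""
    return [e for e in entities if e.confidence >= threshold]


text = "Call me at +1-800-555-0100 or email john@example.com by 2025-06-01"
start = text.index("john@example.com")

sample = [
    EntitySketch("+1-800-555-0100", "phone", 0.9),
    EntitySketch(
        "john@example.com", "email", 1.0,
        data={"local_part": "john", "domain": "example.com"},
        spans=[(start, start + len("john@example.com"))],
    ),
    EntitySketch("2025-06-01", "date", 0.85),
]

kept = filter_confident(sample, threshold=0.9)
print([e.entity_type for e in kept])

# Under the (start, end) assumption, a span slices the original text back out
span_start, span_end = sample[1].spans[0]
print(text[span_start:span_end])
```

The same filtering idea should carry over to real results by replacing `sample` with the output of `pipe.process(text)`, provided the real objects expose the fields as listed.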
```python
for entity in pipe.process(text):
    print(f"Found '{entity.value}' at position {entity.spans}")
    print(f"Extra data: {entity.data}")
```

| Type | What it finds | Example |
|---|---|---|
| `email` | Email addresses | john@example.com |
| `phone` | Phone numbers | +1-800-555-0100 |
| `temporal` | Dates, times, durations | 2025-06-01, in 3 days |
| `numbers` | Numeric and written numbers | 42, seventy-three |
| `currency` | Money amounts | $99.99, 100 EUR |
| `locations` | Countries, cities, capitals | New York, France |
| `names` | Person names | John Smith, Mary Johnson |
| `organization` | Company names | Apple Inc, Google LLC |
| `url` | HTTP/HTTPS URLs | https://example.com |
| `hashtag` | #hashtags | #python, #NLP |

See all 16 annotators: docs/index.md#all-annotators
Some annotators let you tweak how confident they need to be to return entities. For example, LocationNER can distinguish between cities and countries:
```python
from simple_NER import create_pipeline

pipe = create_pipeline(
    ["locations"],
    annotator_params={
        "locations": {
            "include_cities": True,
            "label_confidence": {
                "City": 0.7,     # Less strict for cities
                "Country": 0.95  # Very strict for countries
            }
        }
    }
)

for entity in pipe.process("Paris is in France"):
    print(entity.entity_type, entity.value, entity.confidence)
```

When multiple annotators find overlapping text (e.g., both "5" as a number and as part of a date), the pipeline uses a dedup strategy:
```python
# Keep only the longest span
pipe = create_pipeline(["numbers", "temporal"], dedup_strategy="keep_longest")

# Keep the one with highest confidence
pipe = create_pipeline(["numbers", "temporal"], dedup_strategy="keep_higher_confidence")

# Keep all (no dedup)
pipe = create_pipeline(["numbers", "temporal"], dedup_strategy="keep_all")
```

For processing many texts, use the async pipeline:
```python
import asyncio

from simple_NER.pipeline import AsyncNERPipeline


async def process_batch():
    pipe = AsyncNERPipeline()
    pipe.add_annotator("email")
    pipe.add_annotator("phone")

    texts = [
        "Email: alice@example.com",
        "Phone: +1-555-0100",
        "Both: bob@test.com and 555-1234",
    ]

    results = await pipe.process_batch_async(texts, max_concurrency=5)

    for text, entities in zip(texts, results):
        print(f"Text: {text}")
        for entity in entities:
            print(f"  - {entity.entity_type}: {entity.value}")


asyncio.run(process_batch())
```

Pass `lang` to the pipeline; it is forwarded to all annotators that support it:
```python
# German date and number parsing
pipe = create_pipeline(
    ["temporal", "numbers", "currency"],
    lang="de-de"
)

for entity in pipe.process("Das Datum ist 15.03.2025 und der Betrag ist 99,99 EUR"):
    print(entity.value, entity.entity_type)
```

Languages supported per annotator:
- `temporal`, `numbers`, `date`, `currency`, `organization`: see `docs/FAQ.md#Q-What-languages-are-supported` for details
- `locations`, `email`, `phone`, `url`, `hashtag`: language-agnostic
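To illustrate why a language code matters for number and currency parsing, here is a minimal standalone sketch of locale-dependent decimal separators. It is not how simple_NER parses amounts internally; the `SEPARATORS` table and `parse_amount` helper are hypothetical and cover only two languages for the sake of the example.

```python
from decimal import Decimal

# Hypothetical lookup for illustration: (decimal separator, thousands separator)
SEPARATORS = {
    "en-us": (".", ","),
    "de-de": (",", "."),
}


def parse_amount(raw: str, lang: str) -> Decimal:
    """Normalize a locale-formatted number string and parse it exactly."""
    decimal_sep, thousands_sep = SEPARATORS[lang]
    # Drop grouping separators first, then switch the decimal mark to "."
    normalized = raw.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)


print(parse_amount("99,99", "de-de"))     # 99.99
print(parse_amount("1.234,50", "de-de"))  # 1234.50
print(parse_amount("1,234.50", "en-us"))  # 1234.50
```

The same string can mean different values under different conventions, which is why text like "99,99 EUR" is best processed with the matching `lang`.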
To recognize your own word lists, use the `lookup` annotator:

```python
from simple_NER import create_pipeline

pipe = create_pipeline(["lookup"])

# Add words to recognize
pipe.annotators[0].add_wordlist("color", ["red", "blue", "green", "yellow"])

for entity in pipe.process("I like blue cars"):
    print(entity.value, entity.entity_type)  # blue color
```

Or register example values with `SimpleNER` directly:

```python
from simple_NER.annotators.simple_ner import SimpleNER

ner = SimpleNER()
ner.add_entity_examples("color", ["red", "blue", "green"])

for entity in ner.extract_entities("the ball is bright red"):
    print(entity.value, entity.entity_type)  # red color
```

- Deep dive: docs/index.md (complete API reference)
- Tutorials: docs/TUTORIALS.md (step-by-step guides)
- Examples: examples/README.md (15+ runnable scripts)
- FAQ: docs/FAQ.md (common questions answered)
- API Reference: docs/API.md (class and method details)

```bash
pip install simple_NER --upgrade
```

- Check confidence: add `print(entity.confidence)` to see how certain the detector is
- Adjust parameters: pass `annotator_params` to `create_pipeline` (see Customizing Entity Confidence above)
- Check language: if using non-English text, pass `lang="de-de"` (or your language code)
- See examples: browse examples/ for use cases similar to yours

Use the async pipeline (see Async Processing) with max_concurrency tuned to your CPU cores.
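
As a rough picture of what a concurrency cap does, the sketch below limits in-flight work with `asyncio.Semaphore`. It is a generic asyncio pattern, not simple_NER's implementation; `run_bounded` and `fake_extract` are hypothetical names standing in for the pipeline and its per-text work.

```python
import asyncio


async def run_bounded(items, worker, max_concurrency=5):
    """Run worker(item) for every item with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        # Each task waits here until a slot is free
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_extract(text):
    """Hypothetical stand-in for per-text extraction work."""
    await asyncio.sleep(0.01)
    return text.upper()


results = asyncio.run(run_bounded(["a", "b", "c"], fake_extract, max_concurrency=2))
print(results)  # ['A', 'B', 'C']
```

When tuning the cap, `os.cpu_count()` is a reasonable starting value to experiment from; measure before settling on a number.
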
- Questions? Check docs/FAQ.md
- API details? See docs/API.md
- Working examples? Browse examples/README.md
- Found a bug? Open an issue on GitHub