TigreGotico · github-actions · Feb 25, 2026 · Feb 25, 2026 · Feb 25, 2026 · May 29, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,28 +1,20 @@
 # Changelog
 
-## [0.2.0a2](https://github.com/TigreGotico/tugaphone/tree/0.2.0a2) (2026-02-06)
+## [0.2.2a2](https://github.com/TigreGotico/tugaphone/tree/0.2.2a2) (2026-05-29)
 
-[Full Changelog](https://github.com/TigreGotico/tugaphone/compare/0.2.0a1...0.2.0a2)
+[Full Changelog](https://github.com/TigreGotico/tugaphone/compare/0.2.2a1...0.2.2a2)
 
 **Merged pull requests:**
 
-- Configure Renovate [\#3](https://github.com/TigreGotico/tugaphone/pull/3) ([renovate[bot]](https://github.com/apps/renovate))
+- docs: add docs/ and examples/ [\#22](https://github.com/TigreGotico/tugaphone/pull/22) ([JarbasAl](https://github.com/JarbasAl))
 
-## [0.2.0a1](https://github.com/TigreGotico/tugaphone/tree/0.2.0a1) (2026-02-06)
+## [0.2.2a1](https://github.com/TigreGotico/tugaphone/tree/0.2.2a1) (2026-02-25)
 
-[Full Changelog](https://github.com/TigreGotico/tugaphone/compare/0.1.0a1...0.2.0a1)
+[Full Changelog](https://github.com/TigreGotico/tugaphone/compare/0.2.1...0.2.2a1)
 
 **Merged pull requests:**
 
-- feat: regional accent transformations [\#1](https://github.com/TigreGotico/tugaphone/pull/1) ([JarbasAl](https://github.com/JarbasAl))
-
-## [0.1.0a1](https://github.com/TigreGotico/tugaphone/tree/0.1.0a1) (2026-02-06)
-
-[Full Changelog](https://github.com/TigreGotico/tugaphone/compare/0.0.2...0.1.0a1)
-
-**Merged pull requests:**
-
-- feat: new phonemizer + postag backends [\#4](https://github.com/TigreGotico/tugaphone/pull/4) ([JarbasAl](https://github.com/JarbasAl))
+- refactor: upstream some logic to dependencies [\#16](https://github.com/TigreGotico/tugaphone/pull/16) ([JarbasAl](https://github.com/JarbasAl))
 
 
 

diff --git a/README.md b/README.md
@@ -37,6 +37,14 @@ pip install tugaphone
 
 ## 🧰 Usage
 
+### Companion libraries
+
+The following libraries are dependencies of tugaphone and may be useful on their own:
+
+- [Tugalex](https://github.com/TigreGotico/tugalex) - Lexicon of words and exceptions
+- [TugaTagger](https://github.com/TigreGotico/tugatagger) - Portuguese POS tagger
+- [silabificador](https://github.com/TigreGotico/silabificador) - Portuguese syllabifier
+
 ### Basic Phonemization
 
 ```python
@@ -100,24 +108,6 @@ print(normalize_numbers("897654356789098", "pt-PT"))  # long-scale (biliões)
 print(normalize_numbers("897654356789098", "pt-BR"))  # short-scale (trilhões)
 ```
 
-### Syllabification
-
-```python
-from tugaphone.syl import syllabify
-
-words = ["casa", "Brasil", "extraordinário", "português"]
-
-for word in words:
-    syllables = syllabify(word)
-    print(f"{word} → {'.'.join(syllables)}")
-
-# Output:
-# casa → ca.sa
-# Brasil → bra.sil
-# extraordinário → ex.tra.or.di.ná.rio
-# português → por.tu.guês
-```
-
 ### Advanced: Tokenization and Features
 
 ```python

diff --git a/docs/advanced.md b/docs/advanced.md
@@ -0,0 +1,121 @@
+# Advanced recipes
+
+Once the basic `phonemize_sentence` loop is clear, these are the knobs worth
+knowing.
+
+## POS engines and homographs
+
+Portuguese homographs change pronunciation by part of speech. `tugaphone` tags
+the sentence first and feeds the tags into transcription, so the engine you pick
+affects accuracy:
+
+```python
+from tugaphone import TugaPhonemizer
+
+ph = TugaPhonemizer(postag_engine="spacy")    # most accurate, needs pt_core_news_lg
+ph.phonemize_sentence("Vou para casa.")        # 'para' as preposition
+ph.phonemize_sentence("Ele para o carro.")     # 'para' as verb
+```
+
+Engine options, from heaviest to lightest, with `auto` choosing among them:
+
+| Engine | Needs | Notes |
+|--------|-------|-------|
+| `spacy` | `spacy` + `pt_core_news_lg` | Most accurate. |
+| `brill` | `brill-postaggers` | Lighter, faster; installed via `tugatagger[brill]`. |
+| `lexicon` | nothing extra | Built-in lookup, limited coverage. |
+| `dummy` | nothing | Rule-based fallback, no dependencies. |
+| `auto` | n/a | Default. Falls back through whichever engines are installed. |
+
+If you only need deterministic output with no optional dependencies, construct
+with `postag_engine="dummy"`.
+
+## Regional accents
+
+Beyond the five dialect codes, `tugaphone.regional` ships sub-regional accent
+presets as `RegionalTransforms` instances. Pass one through `regional_dialect`;
+it is applied on top of the `lang` dialect:
+
+```python
+from tugaphone import TugaPhonemizer
+from tugaphone.regional import (PortoDialect, MinhoDialect, BragaDialect,
+                                TrasMontanoDialect, FafeDialect)
+
+ph = TugaPhonemizer()
+sentence = "a gente sente o que sabe"
+for name, accent in [("porto", PortoDialect), ("minho", MinhoDialect),
+                     ("braga", BragaDialect), ("trasmontano", TrasMontanoDialect),
+                     ("fafe", FafeDialect)]:
+    print(name, "→", ph.phonemize_sentence(sentence, "pt-PT", regional_dialect=accent))
+```
+
+| Preset | Signature features |
+|--------|--------------------|
+| `CoimbraDialect` | Diphthong retention only (neutral baseline). |
+| `MinhoDialect` | Vowel-centralization resistance, open vowels, alveolar rhotic. |
+| `BragaDialect` | Palatal epenthesis (`abelha` → `abeilha`) on top of northern rules. |
+| `FamalicaoDialect` | Conservative `o`-nasal retention (`Famalicão` → `Famalicoum`). |
+| `TrasMontanoDialect` | `ch` affrication, s-voicing, final nasal denasalization. |
+| `PortoDialect` | Rising `o` diphthong (`Porto` → `Puorto`). |
+| `FafeDialect` | Nasal diphthongization of `e` (`gente` → `geinte`). |
+
+These presets are explicitly experimental: real variation is messier than any rule set.
+
+### Serializing an accent
+
+`RegionalTransforms` round-trips through a plain dict, so an accent config can
+live in JSON or YAML:
+
+```python
+from tugaphone.regional import PortoDialect, RegionalTransforms
+
+cfg = PortoDialect.as_dict
+# {'morpheme_rules': [], 'ipa_rules': ['rising_diphthong_o', ...]}
+
+clone = RegionalTransforms.from_dict(cfg)
+print([r.__name__ for r in clone.ipa_rules])  # rule names, e.g. 'rising_diphthong_o'
+```
+
+`from_dict` raises `ValueError` on an unknown IPA rule name. Only rules listed in
+`tugaphone.regional.RULE_MAP` survive the round-trip; accents that use other rule
+functions serialize a subset of their behaviour.
+
+## Number normalization
+
+`normalize_numbers` spells digits out before transcription and is independently
+useful for any TTS front-end:
+
+```python
+from tugaphone.number_utils import normalize_numbers
+
+normalize_numbers("vou comprar 1 casa")      # 'vou comprar uma casa'  (feminine)
+normalize_numbers("vou adotar 1 cão")        # 'vou adotar um cão'      (masculine)
+normalize_numbers("897654356789098", "pt-PT")  # long scale (biliões)
+normalize_numbers("897654356789098", "pt-BR")  # short scale (trilhões)
+```
+
+Gender is inferred from preceding articles (`a`, `as`, `da`, `das`) and from the
+shape of the following noun (`-a`, `-dade`, `-agem` endings lean feminine). Pass
+`strict=False` to leave unparseable tokens in place instead of raising.
+
+## Integration with sibling libraries
+
+`tugaphone` composes three TigreGotico Portuguese NLP libraries; each is usable
+on its own:
+
+- [`tugalex`](https://github.com/TigreGotico/tugalex) — the phonetic lexicon
+  (`LEXICON` in `tugaphone.dialects`). `LEXICON.get_ipa_map(region=...)` returns
+  the per-region exception table.
+- [`tugatagger`](https://github.com/TigreGotico/tugatagger) — the POS tagger
+  behind `postag_engine`.
+- [`silabificador`](https://github.com/TigreGotico/silabificador) — the
+  syllabifier behind `WordToken.syllables`.
+
+A TTS front-end typically wires `tugaphone` as the G2P stage: normalize text,
+phonemize per target dialect, hand the IPA string to the acoustic model.
+
+## Where next
+
+- [api.md](api.md) — full signatures
+- [tokenizer.md](tokenizer.md) — inspect syllables, stress and graphemes directly
+- [quickstart.md](quickstart.md) — the basics
diff --git a/docs/api.md b/docs/api.md
@@ -0,0 +1,164 @@
+# API Reference
+
+Every public symbol, with the signatures and return shapes as they exist in the
+source.
+
+## `tugaphone.TugaPhonemizer`
+
+The main entry point for phonemization.
+
+```python
+TugaPhonemizer(postag_engine="auto", postag_model="pt_core_news_lg")
+```
+
+| Argument | Meaning |
+|----------|---------|
+| `postag_engine` | POS tagging backend passed to `TugaTagger`: `"auto"`, `"spacy"`, `"brill"`, `"lexicon"`, `"dummy"`. |
+| `postag_model` | Model identifier for engines that take one (e.g. the spaCy model name). |
+
+Construction builds the `TugaTagger` and warms the lexicon so the first
+transcription is fast.
+
+### `phonemize_sentence`
+
+```python
+phonemize_sentence(sentence: str,
+                   lang: str = "pt-PT",
+                   regional_dialect: Optional[RegionalTransforms] = None) -> str
+```
+
+Transcribes `sentence` to IPA for the target dialect. Returns a space-separated
+phoneme string — one token per word, with `ˈ` for primary stress and `·` for
+syllable boundaries; punctuation tokens are preserved.
+
+`lang` is one of `pt-PT`, `pt-BR`, `pt-AO`, `pt-MZ`, `pt-TL`; any other value
+falls back to European Portuguese.
+
+When `regional_dialect` is given, each word is first run through the preset's
+morpheme rules, transcribed, then run through its IPA rules. See
+[`RegionalTransforms`](#tugaphoneregionalregionaltransforms).
+
+```python
+ph = TugaPhonemizer()
+ph.phonemize_sentence("O gato dorme.", "pt-BR")   # 'ˈu gˈa·tʊ ˈdɔh·me'
+```
+
+### `get_dialect_inventory` (staticmethod)
+
+```python
+TugaPhonemizer.get_dialect_inventory(lang: str = "pt-PT") -> DialectInventory
+```
+
+Maps a dialect code to its `DialectInventory` instance (`EuropeanPortuguese`,
+`BrazilianPortuguese`, `AngolanPortuguese`, `MozambicanPortuguese`,
+`TimoresePortuguese`).
+
+## `tugaphone.number_utils`
+
+### `normalize_numbers`
+
+```python
+normalize_numbers(text: str, lang: str = "pt-PT", strict: bool = True) -> str
+```
+
+Replaces numeric tokens in a sentence with their Portuguese written form,
+inferring gender and ordinality from the surrounding words. `pt-PT` uses the
+long scale (biliões), `pt-BR` the short scale (trilhões). With `strict=False`,
+tokens that fail to format are left untouched instead of raising.
+
+```python
+from tugaphone.number_utils import normalize_numbers
+normalize_numbers("vou comprar 1 casa")    # 'vou comprar uma casa'
+normalize_numbers("vou adotar 2 cães")     # 'vou adotar dois cães'
+normalize_numbers("1.5e10")                # 'um vírgula cinco vezes dez elevado a dez'
+```
+
+### `NumberParser`
+
+A classmethod-based helper underneath `normalize_numbers`. Useful when you need
+finer control or want to interrogate a single token.
+
+| Method | Returns |
+|--------|---------|
+| `pronounce_number_word(word, prev_word=None, next_word=None, gender=None, as_ordinal=None, is_brazilian=False)` | Spelled-out form of one numeric token. |
+| `to_int(word)` / `is_int(word)` | Integer value (ordinal markers stripped) / membership test. |
+| `to_float(word)` / `is_float(word)` | Float value / membership test. |
+| `is_scientific_notation(word)` | `True` for forms like `"1.5e10"`. |
+| `pronounce_scientific(word, is_brazilian=False)` | Spoken form of scientific notation. |
+| `is_ordinal(word, next_word=None)` | Detects `º`/`ª` markers, attached or separate. |
+| `get_number_gender(word, prev_word=None, next_word=None)` | `"feminine"` or `"masculine"`. |
+
+```python
+from tugaphone.number_utils import NumberParser
+NumberParser.pronounce_number_word("19", is_brazilian=True)   # 'dezenove'
+NumberParser.pronounce_number_word("19", is_brazilian=False)  # 'dezanove'
+NumberParser.pronounce_number_word("1", next_word="º")        # 'primeiro' (ordinal)
+NumberParser.get_number_gender("1", next_word="casa")         # 'feminine'
+```
+
+## `tugaphone.regional.RegionalTransforms`
+
+A serializable dataclass holding the rules for a sub-regional accent.
+
+```python
+@dataclass
+class RegionalTransforms:
+    morpheme_rules: List[MorphemeTransform] = field(default_factory=list)  # applied to the word before G2P
+    ipa_rules: List[IPATransform] = field(default_factory=list)            # applied to the IPA after G2P
+```
+
+| Member | Behaviour |
+|--------|-----------|
+| `apply_morpheme(word, postag="NOUN")` | Runs every morpheme rule in order, returns the rewritten word. |
+| `apply_ipa(word, phonemes, postag="NOUN")` | Runs every IPA rule in order, returns the rewritten phoneme string. |
+| `as_dict` (property) | Serializes the rule lists to rule-name strings. |
+| `from_dict(data)` (staticmethod) | Rebuilds an instance from `{"ipa_rules": [...], "morpheme_rules": [...]}`; raises `ValueError` on an unknown IPA rule name. |
+
+```python
+from tugaphone.regional import PortoDialect, RegionalTransforms
+
+cfg = PortoDialect.as_dict
+clone = RegionalTransforms.from_dict(cfg)
+[r.__name__ for r in clone.ipa_rules]   # ['rising_diphthong_o', ...]
+```
+
+### Preset accents
+
+Importable from `tugaphone.regional`: `CoimbraDialect`, `MinhoDialect`,
+`BragaDialect`, `FamalicaoDialect`, `TrasMontanoDialect`, `PortoDialect`,
+`FafeDialect`. Each is a ready-built `RegionalTransforms`. Pass any of them to
+`phonemize_sentence(..., regional_dialect=...)`.
+
+Only the IPA rules listed in `RULE_MAP` round-trip through `as_dict`/`from_dict`;
+accents built from other rule functions serialize a subset.
+
+## `tugaphone.dialects`
+
+| Symbol | Role |
+|--------|------|
+| `DialectInventory` | Base class: phoneme maps, character sets, stress/punctuation tokens. `dialect_code` attribute carries the tag. |
+| `EuropeanPortuguese`, `BrazilianPortuguese`, `AngolanPortuguese`, `MozambicanPortuguese`, `TimoresePortuguese` | The five dialect inventories. |
+| `LisbonPortuguese`, `RioJaneiroPortuguese`, `SaoPauloPortuguese` | City-specific inventories layered on the base dialects. |
+| `LEXICON` | Module-level `TugaLexicon()` instance; `LEXICON.get_ipa_map(region=...)` returns the per-region exception map. |
+
+You rarely instantiate these directly — `TugaPhonemizer` does it for you — but
+they are the `dialect` argument the tokenizer accepts.
+
+## `tugaphone.tokenizer`
+
+The hierarchical model. See [tokenizer.md](tokenizer.md) for the full walkthrough;
+the public surface is:
+
+| Symbol | Role |
+|--------|------|
+| `Sentence(surface, words=[], dialect=EuropeanPortuguese())` | Top-level container; `.ipa`, `.words`, `.n_words`, `.features`. |
+| `Sentence.from_postagged(surface, tags, dialect=None)` | Build from `(token, pos)` pairs (the path `TugaPhonemizer` uses). |
+| `WordToken` | `.surface`, `.syllables`, `.graphemes`, `.stressed_syllable_idx`, `.ipa`, `.features`. |
+| `GraphemeToken` | `.surface`, `.ipa`, `.is_diphthong`, `.is_nasal`, `.is_digraph`, `.features`, and more predicates. |
+| `CharToken` | character-level predicates (`.is_vowel`, `.is_consonant`, `.ipa`, ...). |
+
+## Where next
+
+- [quickstart.md](quickstart.md) — install and first call
+- [advanced.md](advanced.md) — recipes for accents, POS engines, numbers
+- [tokenizer.md](tokenizer.md) — the token tree and feature extraction
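Reviewer sketch for the `phonemize_sentence` return shape in `docs/api.md`: because the result is documented as a space-separated string with `ˈ` marking primary stress and `·` marking syllable boundaries, downstream code can take it apart with plain string operations. The helper name `split_phonemes` is hypothetical and not part of tugaphone. The input is the example string from the docs, used as a literal so the block runs without the library installed.

```python
from typing import List, Optional, Tuple

STRESS = "ˈ"     # primary stress marker, per docs/api.md
SYLLABLE = "·"   # syllable boundary marker, per docs/api.md

def split_phonemes(ipa: str) -> List[Tuple[List[str], Optional[int]]]:
    """Return (syllables, stressed_index) for each space-separated token."""
    words = []
    for token in ipa.split():
        syllables = token.split(SYLLABLE)
        # Index of the first syllable carrying the stress marker, if any.
        stressed = next(
            (i for i, syl in enumerate(syllables) if STRESS in syl), None
        )
        # Strip the marker so only the segmental phonemes remain.
        words.append(([syl.replace(STRESS, "") for syl in syllables], stressed))
    return words

for syllables, stressed in split_phonemes("ˈu gˈa·tʊ ˈdɔh·me"):
    print(syllables, stressed)
```

Tokens without a stress marker, such as preserved punctuation, come back with `None` as the stressed index, so callers should handle that case explicitly.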