Commit 06f006b

Prosodic v3.0.0: DataFrame-first architecture, 42x faster (#76)
2 parents 31db244 + a00b5c2 commit 06f006b

38 files changed

Lines changed: 15906 additions & 5013 deletions

CLAUDE.md

Lines changed: 202 additions & 0 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 53 additions & 1 deletion
@@ -4,7 +4,7 @@

 Prosodic is a metrical-phonological parser written in Python. Currently, it can parse English and Finnish text, but adding additional languages is easy with a pronunciation dictionary or a custom python function. Prosodic was built by [Ryan Heuser](https://github.com/quadrismegistus), [Josh Falk](https://github.com/jsfalk), and [Arto Anttila](http://web.stanford.edu/~anttila/). Josh also maintains [another repository](https://github.com/jsfalk/prosodic1b), in which he has rewritten the part of this project that does phonetic transcription for English and Finnish. [Sam Bowman](https://github.com/sleepinyourhat) has contributed to the codebase as well, adding several new metrical constraints.

-This version, Prosodic 2.x, is a near-total rewrite of the original Prosodic.
+Prosodic 3.x features a DataFrame-first architecture with vectorized numpy constraint evaluation, GPU-accelerated harmonic bounding, and a Maximum Entropy weight learner for training constraint weights from annotated data. See [CLAUDE.md](CLAUDE.md) for full architecture docs.

 Supports Python>=3.9.

@@ -13,6 +13,20 @@ Supports Python>=3.9.
 You can view and use a web app demo of the current Prosodic app at **[prosodic.dev](https://prosodic.dev/)**.


+## Performance
+
+Shakespeare sonnets (2155 lines, Apple M1). Run `python -m prosodic.profiling` to regenerate.
+
+| Step | v2 | v3 | Speedup |
+|---|---|---|---|
+| Init (tokenize + pronunciations + entities) | 5.29s | 1.80s | 3x |
+| Parse (CPU) | 72.97s | 5.0s | 15x |
+| Parse (GPU) | 72.97s | 1.3s | 57x |
+| **End-to-end (CPU)** | **78.3s** | **6.8s** | **12x** |
+| **End-to-end (GPU)** | **78.3s** | **3.1s** | **26x** |
+| **DF-only (no entities, GPU)** | **78.3s** | **1.8s** | **42x** |
+| Syntax (dep parse) | 160.2s | 2.7s | 58x |
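
The Speedup column in the table above appears to be the v2 time divided by the v3 time. As a quick sanity check, the sketch below recomputes those ratios from the timings as printed. The `TIMINGS` dictionary and the `speedup` helper are illustrative names, not part of Prosodic. Because the listed timings are rounded, the recomputed ratios can differ from the published column by a unit or so.

```python
# Timings in seconds, copied from the README table: step -> (v2, v3).
TIMINGS = {
    "Init": (5.29, 1.80),
    "Parse (CPU)": (72.97, 5.0),
    "Parse (GPU)": (72.97, 1.3),
    "End-to-end (CPU)": (78.3, 6.8),
    "End-to-end (GPU)": (78.3, 3.1),
    "DF-only (no entities, GPU)": (78.3, 1.8),
    "Syntax (dep parse)": (160.2, 2.7),
}


def speedup(v2_seconds, v3_seconds):
    """Return how many times faster the v3 timing is than the v2 timing."""
    if v3_seconds <= 0:
        raise ValueError("v3 time must be positive")
    return v2_seconds / v3_seconds


# Print one recomputed ratio per table row.
for step, (v2, v3) in TIMINGS.items():
    print(f"{step:<28} {speedup(v2, v3):5.1f}x")
```

For fresh numbers rather than arithmetic on the rounded ones, the README's own suggestion of `python -m prosodic.profiling` is the authoritative route.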
+
 ## Install

 ### 1. Install python package
@@ -851,6 +865,44 @@ line_from_richardIII



+#### Phrasal stress (syntax)
+
+Prosodic can optionally compute phrasal stress from dependency parsing (Liberman & Prince 1977), using spaCy. This adds a `phrasal_stress` column to the syllable DataFrame, indicating each word's syntactic prominence (0 = sentence root, more negative = more deeply embedded).
+
+```bash
+# Install spaCy (optional dependency)
+pip install prosodic[syntax]
+python -m spacy download en_core_web_sm
+```
+
+```python
+# Enable with syntax=True
+t = prosodic.Text("Shall I compare thee to a summers day", syntax=True)
+
+# Phrasal stress values per word
+df = t._syll_df
+df[['word_txt', 'phrasal_stress']].drop_duplicates('word_num')
+# word_txt phrasal_stress
+# Shall -1
+# I -1
+# compare 0 # ROOT (most prominent)
+# thee -1
+# to -1
+# a -3
+# summers -2
+# day -1
+```
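
The description above ("0 = sentence root, more negative = more deeply embedded") is consistent with phrasal stress being the negated depth of each word in the dependency tree. The sketch below illustrates that reading without spaCy, using a hand-written list of head indices. Both the `negative_depths` helper and the `heads` list are hypothetical: the heads are one parse chosen to match the README output, and Prosodic's actual computation may differ.

```python
def negative_depths(heads):
    """Return 0 for the root and -k for a token k arcs below the root.

    `heads[i]` is the index of token i's syntactic head; the root
    points at itself (the convention spaCy uses for `token.head`).
    """
    depths = []
    for i in range(len(heads)):
        depth, node = 0, i
        while heads[node] != node:  # climb until we reach the root
            node = heads[node]
            depth += 1
            if depth > len(heads):
                raise ValueError("cycle in head indices")
        depths.append(-depth)
    return depths


words = ["Shall", "I", "compare", "thee", "to", "a", "summers", "day"]
# Hypothetical parse: "compare" is the root, "summers" modifies "day",
# and "a" attaches to "summers".
heads = [2, 2, 2, 2, 2, 6, 7, 2]

for word, stress in zip(words, negative_depths(heads)):
    print(f"{word:<8} {stress}")
```

Under this assumed parse the values come out as -1, -1, 0, -1, -1, -3, -2, -1, matching the `phrasal_stress` column shown in the README example.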
+
+Two metrical constraints use phrasal stress (both inert when `syntax=False`):
+- `w_prom`: penalizes phrasally prominent words (root/direct dependents) on weak metrical positions
+- `s_demoted`: penalizes deeply embedded words on strong metrical positions
+
+```python
+from prosodic.parsing.meter import Meter
+m = Meter(constraints=['w_stress', 's_unstress', 'w_peak', 'w_prom', 's_demoted'])
+t.parse(meter=m)
+```
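
To make the two constraint descriptions concrete, here is a toy sketch of how such checks could be expressed over per-unit phrasal stress values and weak/strong position labels. Everything in it is an assumption for illustration: the function names, the `"w"`/`"s"` labels, and especially the thresholds (`prominent_min=-1` for "root/direct dependents", `demoted_max=-2` for "deeply embedded") are guesses, and this is not Prosodic's vectorized implementation.

```python
def w_prom_violations(stress, positions, prominent_min=-1):
    """Mark 1 where a prominent unit (stress >= prominent_min) is in a weak position.

    The threshold is a hypothetical reading of "root/direct dependents".
    """
    return [int(p == "w" and s >= prominent_min) for s, p in zip(stress, positions)]


def s_demoted_violations(stress, positions, demoted_max=-2):
    """Mark 1 where a deeply embedded unit (stress <= demoted_max) is in a strong position.

    The threshold is a hypothetical reading of "deeply embedded".
    """
    return [int(p == "s" and s <= demoted_max) for s, p in zip(stress, positions)]


# Toy input: phrasal stress values and a candidate weak/strong scansion.
stress = [0, -1, -3, -2]
positions = ["w", "s", "w", "s"]

print(w_prom_violations(stress, positions))
print(s_demoted_violations(stress, positions))
```

With the toy input, the root on a weak position is flagged by the first check and the depth-2 unit on a strong position by the second; a real parser would then weight and sum such violations per candidate scansion.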
+
 #### Metrical parsing

 ##### Parsing lines

_version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__='2.1.2'
+__version__='3.0.0'

codecov.yml

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
-codecov:
+codecov:
 token: a6cb4510-38d6-4a03-bea2-4fd132e2a6ad
-ignore:
-- "lib"
-- "tests"
+ignore:
+- "prosodic/lib/**"
+- "tests/**"
