Add duration distributions #150
Pull request overview
Begins the HSMM implementation (per #149) by introducing duration distributions as a small, self-contained PR. Adds an AbstractDurationDistribution interface (logdensityof, rand, fit!) and three concrete implementations (GeometricDuration, PoissonDuration, NegBinomialDuration), keeping the package dependency-light by hand-rolling samplers and fitters. SpecialFunctions is added as a dependency for trigamma.
Changes:
- New `AbstractDurationDistribution` type and three shifted-support concrete subtypes on `{1,2,...}`, each with custom `rand` (log-space Knuth Poisson, Marsaglia–Tsang gamma for NB) and MLE-based `fit!` (closed-form for Geometric/Poisson, profile-likelihood Newton for NB).
- Module-level export of the new types and import of `digamma`/`trigamma`/`nbinomlogpdf`/`poislogpdf`.
- New `test/duration.jl` suite covering construction validation, `show`, PMF support/normalization, sample mean recovery, and weighted `fit!` recovery, wired into `test/runtests.jl`.
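To make the "shifted support" and "log-space Knuth" ideas concrete, here is a minimal, dependency-free sketch. It is not the PR's code: `ShiftedPoisson` and `logpmf` are hypothetical names standing in for `PoissonDuration` and `logdensityof`, and the duration is assumed to be `1 + Poisson(λ)`.

```julia
using Random

# Hypothetical stand-in for a Poisson duration on {1, 2, ...}: d = 1 + Poisson(λ).
struct ShiftedPoisson
    λ::Float64
    function ShiftedPoisson(λ::Real)
        λ > 0 || throw(ArgumentError("λ must be positive"))
        return new(Float64(λ))
    end
end

# Knuth's method multiplies uniforms until the product drops below exp(-λ).
# In log space this becomes: add log(u) until the running sum drops below -λ,
# which avoids underflow of exp(-λ) for large λ.
function Base.rand(rng::AbstractRNG, d::ShiftedPoisson)
    k = 0
    s = log(rand(rng))
    while s > -d.λ
        k += 1
        s += log(rand(rng))
    end
    return k + 1  # shift so the smallest possible duration is 1
end

# Log-PMF of the shifted distribution: Poisson log-PMF evaluated at k - 1.
function logpmf(d::ShiftedPoisson, k::Integer)
    k >= 1 || return -Inf
    n = k - 1
    # log(n!) computed as a sum of logs to stay dependency-free
    return n * log(d.λ) - d.λ - sum(log, 1:n; init=0.0)
end
```

Knuth's method takes time proportional to λ per draw, so it suits the short-to-moderate durations typical of sojourn times; for very large λ a different sampler would likely be preferable.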
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| Project.toml | Adds SpecialFunctions = "2" dep for trigamma. |
| src/HiddenMarkovModels.jl | Imports new special/log-pmf functions, exports new duration types, includes the new source file. |
| src/types/duration_distribution.jl | Defines AbstractDurationDistribution + Geometric/Poisson/NegBinomial duration types with logdensityof, rand, and fit! implementations. |
| test/duration.jl | New tests for construction, display, PMF support/normalization, sampling means, and fit! parameter recovery. |
| test/runtests.jl | Wires the new Duration distributions test set into the top-level runner. |
Thanks for this starting point!
In other words, I could see a package where the sojourn time distribution has to be brought by the user, in the same way that the user currently brings the emission distributions. If the additional effort starting from Distributions.jl is light, this brings the best of both worlds in terms of code maintenance and flexibility.
I think the middle ground of starting from Distributions.jl might be the best then. Given that HSMMs disallow self-transitions, one can never sample a 0 from the sojourn time distributions. So all of these dists need to be offset by 1, and I think having some small wrappers of the relevant Distributions objects, plus clear instructions for anyone who wants to overload, would be best. What do you think?
Or we don't ask the user to wrap and we directly state we expect a distribution with support on {0,1,2...}?
Do you mean support on {1,2,3...}? But nonetheless, yes, this is an option. We could always include one in a tutorial so there would at least be an example. I think I largely viewed this as a way to ensure the most common sojourn distributions would be available, to reduce boilerplate. But to your point, this leaves it more generic, and it shouldn't be that hard to generate the correct support. We could also include a SupportError type that is thrown on incorrect support of the sojourn distribution (e.g., continuous/negative/includes 0?).
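A rough sketch of what such a support check could look like, under loud assumptions: `SupportError` and `check_sojourn_support` are hypothetical names, and the user-supplied object is assumed to implement `eltype` and `minimum` (as Distributions.jl univariate distributions do). The two toy types exist only to exercise the check.

```julia
# Hypothetical exception type for sojourn distributions with invalid support.
struct SupportError <: Exception
    msg::String
end
Base.showerror(io::IO, e::SupportError) = print(io, "SupportError: ", e.msg)

# Toy user-supplied distributions (illustration only, not from the package).
struct ShiftedGeometric
    p::Float64
end
Base.eltype(::Type{ShiftedGeometric}) = Int
Base.minimum(::ShiftedGeometric) = 1      # support {1, 2, ...}

struct UnshiftedGeometric
    p::Float64
end
Base.eltype(::Type{UnshiftedGeometric}) = Int
Base.minimum(::UnshiftedGeometric) = 0    # support {0, 1, ...}: invalid sojourn

# Reject anything that is not integer-valued with smallest value >= 1.
function check_sojourn_support(d)
    eltype(d) <: Integer ||
        throw(SupportError("sojourn times must be integer-valued"))
    minimum(d) >= 1 ||
        throw(SupportError("sojourn support must start at 1, got minimum $(minimum(d))"))
    return d
end
```

A check like this would run once at model construction, so the cost is negligible; the trade-off is that it relies on the user's type reporting its support honestly.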
Given our discussion--do you think closing this PR and revisiting later if needed is the right move? I think at most, adding thin wrappers over certain Distributions objects would be what we supply. Otherwise we can set up some logic in the
I meant that we could accept any
Oh got it--yes, that makes sense to me and is a better approach than what I suggested.
As discussed in #149, starting the implementation of HSMMs via smaller PRs. This PR does the following:
- Adds an `AbstractDurationDistribution` type that requires `rand`, `logdensityof`, and `fit!` methods.
- Adds `NegBinomialDuration`, `PoissonDuration`, `GeometricDuration` (theoretically an HMM).
- Adds `SpecialFunctions` to deps.

A design choice I made was to keep this as dependency-free as possible given the general ethos of the package. As such, I recreated some common distributions available in `Distributions.jl` and also made the relevant `rand`/`fit!` functions use hand-written algos (e.g., Knuth) and fitting (Newton for NegBinomial; could be cleaner).
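For reference, the closed-form weighted MLEs for the two simpler shifted families can be sketched in a few lines. This is an illustration rather than the PR's `fit!`: the function names are hypothetical, and the parameterizations are assumed to be P(d = k) = p(1-p)^(k-1) for the geometric case and d = 1 + Poisson(λ) for the Poisson case.

```julia
# Weighted MLE for a geometric duration on {1, 2, ...}:
# p̂ = Σ w_i / Σ w_i d_i  (the reciprocal of the weighted mean duration).
function fit_geometric_p(durations, weights)
    all(>=(1), durations) || throw(ArgumentError("durations must be >= 1"))
    return sum(weights) / sum(weights .* durations)
end

# Weighted MLE for a shifted Poisson duration d = 1 + Poisson(λ):
# λ̂ = Σ w_i (d_i - 1) / Σ w_i  (the weighted mean of the unshifted counts).
function fit_poisson_rate(durations, weights)
    all(>=(1), durations) || throw(ArgumentError("durations must be >= 1"))
    return sum(weights .* (durations .- 1)) / sum(weights)
end
```

Note that `fit_poisson_rate` returns 0 when every observed duration equals 1, which a real fitter would need to guard against if the type requires λ > 0.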