Skip to content

Add duration distributions#150

Open
rsenne wants to merge 3 commits into
mainfrom
hsmm-duration-distributions
Open

Add duration distributions#150
rsenne wants to merge 3 commits into
mainfrom
hsmm-duration-distributions

Conversation

@rsenne
Copy link
Copy Markdown
Collaborator

@rsenne rsenne commented Jun 2, 2026

As discussed in #149, starting the implementation of HSMMs, via smaller PRs. This PR does the following:

  • Adds the AbstractDurationDistribution type that requires rand, logdensityof, and fit! methods.
  • Adds three common DDs: NegBinomialDuration, PoissonDuration, GeometricDuration (theoretically an HMM)
  • Adds testing for these distributions.
  • Adds SpecialFunctions to deps.

A design choice i made was the keep this as dependency free as possible given the general ethos of the package. As such I recreated some common distributions available in Distributions.jl and also made the relevant rand/fit! functions use hand written algos (e.g., Knuth) and fitting (Newton for NegBinomial. Could be cleaner.)

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Begins the HSMM implementation (per #149) by introducing duration distributions as a small, self-contained PR. Adds an AbstractDurationDistribution interface (logdensityof, rand, fit!) and three concrete implementations (GeometricDuration, PoissonDuration, NegBinomialDuration), keeping the package dependency-light by hand-rolling samplers and fitters. SpecialFunctions is added as a dependency for trigamma.

Changes:

  • New AbstractDurationDistribution type and three shifted-support concrete subtypes on {1,2,...}, each with custom rand (log-space Knuth Poisson, Marsaglia–Tsang gamma for NB) and MLE-based fit! (closed-form for Geometric/Poisson, profile-likelihood Newton for NB).
  • Module-level export of the new types and import of digamma/trigamma/nbinomlogpdf/poislogpdf.
  • New test/duration.jl suite covering construction validation, show, PMF support/normalization, sample mean recovery, and weighted fit! recovery, wired into test/runtests.jl.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
Project.toml Adds SpecialFunctions = "2" dep for trigamma.
src/HiddenMarkovModels.jl Imports new special/log-pmf functions, exports new duration types, includes the new source file.
src/types/duration_distribution.jl Defines AbstractDurationDistribution + Geometric/Poisson/NegBinomial duration types with logdensityof, rand, and fit! implementations.
test/duration.jl New tests for construction, display, PMF support/normalization, sampling means, and fit! parameter recovery.
test/runtests.jl Wires the new Duration distributions test set into the top-level runner.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@rsenne rsenne requested a review from gdalle June 2, 2026 01:38
@gdalle
Copy link
Copy Markdown
Member

gdalle commented Jun 2, 2026

Thanks for this starting point!
I do enjoy minimal dependencies, but I don't enjoy code duplication, especially since Distributions is a very old and dependable package (most of the time). Do we really have to provide these duration distributions in the first place?

@gdalle
Copy link
Copy Markdown
Member

gdalle commented Jun 2, 2026

In other words, I could see a package where the sojourn time distribution has to be brought by the user, in the same way that the user currently brings the emission distributions. If the additional effort starting from Distributions.jl is light, this brings the best of both worlds in terms of code maintenance and flexibility

@rsenne
Copy link
Copy Markdown
Collaborator Author

rsenne commented Jun 2, 2026

I think the middle ground of starting from Distributions.jl might be the best then. Given HSMMs disallow self-transitions, one can never sample a 0 from the sojourn time distributions. So all of these dists need to be offset by 1, and so I think having some small wrappers of the relevant Distributions objects, and then clear instructions for anyone who wants to overload would be best. What do you think?

@gdalle
Copy link
Copy Markdown
Member

gdalle commented Jun 2, 2026

Or we don't ask the user to wrap and we directly state we expect a distribution with support on {0,1,2...}?

@rsenne
Copy link
Copy Markdown
Collaborator Author

rsenne commented Jun 2, 2026

Do you mean support on {1,2,3...}? But nonetheless, yes this is an option. We could always include in a tutorial then there would at least be an example.

I think I largely viewed this as a way to ensure the most common sojourn distributions would be available to reduce boilerplate generation. But I think to your point, this leaves it more generic and it shouldn't be that hard to generate the correct support.

We could also include a SupportError type that throws on incorrect support of the sojourn distribution. (E.g., continuous/negative/includes 0?)

@rsenne
Copy link
Copy Markdown
Collaborator Author

rsenne commented Jun 3, 2026

Given our discussion--do you think closing this PR and revisiting later if needed is the right move? I think at most, adding thin wrappers over certain Distributions objects would be what we supply. Otherwise we can set up some logic in the is_valid_hmm and tutorials would work too

@gdalle
Copy link
Copy Markdown
Member

gdalle commented Jun 3, 2026

I meant that we could accept any {0, 1, ...}-supported distribution that implements rand and logdensityof, and explicitly state in the documentation that "we will shift everything by 1, so the distribution you give us is the distribution of [the sojourn time - 1]". That way:

  • We don't have to depend on Distributions
  • People can use a Distributions.Distribution out of the box
  • They don't have to pass it to us by wrapping it in a MinusOneShift

@rsenne
Copy link
Copy Markdown
Collaborator Author

rsenne commented Jun 3, 2026

Oh got it--yes that makes sense to me and is a better approach than what i suggested

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants