Complete re-write of rectangular binnings to use ranges #246
@Datseris This seems reasonable, but I think it breaks upstream code at the moment. To get the joint histograms for multi-argument functions I simply do (with the old code)
For a |
from source code of
if e.precise
    # Don't know how to make this faster unfortunately...
    cartidx = CartesianIndex(map(searchsortedlast, ranges, Tuple(point)))
else
    bin = floor.(Int, (point .- e.mini) ./ e.widths) .+ 1
    cartidx = CartesianIndex(Tuple(bin))
end |
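The two branches quoted above can be compared with a small, self-contained sketch. The helper names `precise_bin` and `fast_bin` and the example ranges are assumptions made for illustration; they are not taken from the package, and only Base functions are used.

```julia
# Hypothetical helpers for illustration; only Base functions are used.

# "Precise" lookup: search each coordinate in its range of left bin edges.
precise_bin(ranges, point) =
    CartesianIndex(map(searchsortedlast, ranges, Tuple(point)))

# "Fast" lookup: divide the offset from the minimum by the bin width.
function fast_bin(mini, widths, point)
    bin = floor.(Int, (point .- mini) ./ widths) .+ 1
    return CartesianIndex(Tuple(bin))
end

ranges = (0:0.1:1, 0:0.25:1)   # left bin edges, one range per dimension
mini   = first.(ranges)        # (0.0, 0.0)
widths = step.(ranges)         # (0.1, 0.25)

# A point well inside its bins: both lookups agree.
a_precise = precise_bin(ranges, (0.55, 0.3))   # CartesianIndex(6, 2)
a_fast    = fast_bin(mini, widths, (0.55, 0.3)) # CartesianIndex(6, 2)

# A point sitting exactly on a bin edge: 0.3 / 0.1 evaluates to
# 2.9999999999999996 in Float64, so the division puts it one bin too low.
b_precise = precise_bin(ranges, (0.3, 0.3))    # CartesianIndex(4, 2)
b_fast    = fast_bin(mini, widths, (0.3, 0.3)) # CartesianIndex(3, 2)
```

The edge case is where the two strategies part ways: the range lookup compares against the edge value `0.3` stored in the range, while the division accumulates floating point error before flooring.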
I'll extract this into a function |
Excellent. |
Fixing the tests of the Transfer Operator is very hard. I am getting
|
There is just so much in this source code that isn't used, which makes it so hard to read. In this block
# Count how many points jump from the i-th bin to each of
# the unique target bins, and use that to calculate the transition
# probability from bᵢ to bⱼ.
for (j, bᵤ) in enumerate(unique(target_bins))
    n_transitions_i_to_j = sum(target_bins .== bᵤ)
    push!(I, i)
    push!(J, bᵤ)
    push!(P, n_transitions_i_to_j / n_visitsᵢ)
end
|
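For reference, the counting step described in the comment of the quoted block can be sketched in isolation. The function name `transition_row` and the toy input are hypothetical, and the sketch assumes that `n_visitsᵢ` equals the number of recorded target bins for bin `i`; it is not the package's implementation.

```julia
# Hypothetical helper (not from the package): transition probabilities out of
# bin `i`, given the bins that points starting in bin `i` land in after one step.
function transition_row(i::Int, target_bins::AbstractVector{Int})
    # Assumption: every visit to bin `i` has exactly one recorded target bin.
    n_visitsᵢ = length(target_bins)
    I, J, P = Int[], Int[], Float64[]
    for bᵤ in unique(target_bins)
        # Number of points that jump from bin `i` to bin `bᵤ`.
        n_transitions_i_to_j = count(==(bᵤ), target_bins)
        push!(I, i)
        push!(J, bᵤ)
        push!(P, n_transitions_i_to_j / n_visitsᵢ)
    end
    return I, J, P
end

I, J, P = transition_row(1, [2, 3, 2, 2, 5])
# J == [2, 3, 5] and P == [0.6, 0.2, 0.2]
```

The triplets `(I, J, P)` are in the form one would typically pass to a sparse matrix constructor, and each row's probabilities sum to one as long as every point has a valid target bin.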
I'll have a look. Tag me when you're done changing things, so we don't do overlapping work |
Yes, I know. This code is ancient and is a direct rewrite of some messy MATLAB code from back in the day. As we talked about before, it will be fixed as part of #55. But the issue shouldn't be in the loop. If the bins are computed correctly and have the expected format before the loops, then the transfer operator approximation should be correct. |
I found the issue. Something fishy is going on with the encodings:
@testset "All points covered" begin
    # Ensure that given a `RectangularBinning` no point is in invalid bin
    x = Dataset(rand(100, 2))
    binnings = [
        RectangularBinning(3),
        RectangularBinning(0.2),
        RectangularBinning([2, 3]),
        RectangularBinning([0.2, 0.3]),
    ]
    for bin in binnings
        rbe = RectangularBinEncoding(bin, x)
        visited_bins = map(pᵢ -> encode(rbe, pᵢ), x)
        @test -1 ∉ visited_bins
    end
end
This errors. I'll fix this now. Or at least I'll try. |
Well, to be precise, this is also a problem in the Transfer Operator code. If you allow for |
The transfer operator is approximated by how a locally linear map transforms points. An implicit assumption here is that the points are supported on the grid on which the approximation is made. It should be fine to just drop any point where one or more components are encoded as I've always made sure that the binning used covers all the points a priori, so this hadn't crossed my mind before. My mistake. |
This should be done in a different PR. For now I found the obvious problem. When making the range |
Our binning code was really bad when it came down to real world usage. When preparing the workshop, showing the outputs of the value histogram was always unintuitive. This thing we do with `n_eps` and `nextfloat` always leads to completely random and unintuitive numbers for the histogram edges. It is also very hard to get "the expected histogram" for different distributions, and hence to compute the KL divergence. Furthermore, it is fundamentally inaccurate.

A much better approach is to give up trying to "hack up" some accuracy ourselves, and instead take advantage of Julia's base `range` system that operates using `TwicePrecision` to always keep range step sizes what the user expects without dealing with floating point precision. So the range `0:0.1:1` has a step of exactly 0.1 and a length of exactly 11.

I have fully re-written the internals of rectangular binnings to utilize ranges. This has led to many, many benefits:

- `RectangularBinning` is an intermediate struct that gets cast into a `FixedRectangularBinning`. This reduces the code a lot.
- `n_eps` have been completely removed. They were never accurate to begin with; they just changed the histogram sizes but they were just as inaccurate. To be accurate you need double precision.
- `FixedRectangularBinning` now takes in standard Julia `range`s as input. One `range` for each dimension, with convenience constructors. This allows us to utilize Julia's internal double precision system without any hacky stuff. This also means that the outcome space has nice, simple edges and bin widths, which is what a user would like.
- `.precise` option: if true, they use `searchsortedlast`, which internally uses the double precision, to map data to the correct bin according to the ranges. If `false` they use our standard division with the bin width.

To give an example of how much of a change this makes, here we go: