Skip to content

Conversation

@pgunn
Copy link
Member

@pgunn pgunn commented Dec 18, 2025

For now this is meant as a conversation piece to decide what we think of this design of a provenance system; it hasn't seen any testing (although it builds), and the corresponding changes to OnACID have not been made

Copy link
Contributor

@ethanbb ethanbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this field can provide some important information - it seems like a good idea! I have some suggestions, mostly about naming & consistency.

# the dicts all having at least the fields "event":str, "time":int, and "text":str
# Other fields are permitted. Time is Unix epochtime. The semantics used here will be mirrored in
# the OnACID class
self.provenance.append({'event': 'init', 'time': int(time.time()), 'text': 'CNMF Object initialized'})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'text' seems too generic to me - how about 'description'? Or 'action', though that might be too similar to 'event'.

Also, should it be in a consistent tense? To match the others, this could be "Initialize CNMF object".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I toyed with description as I wrote it; still open to it. I don't have strong feelings between the two. Will revise in that direction.

cnmf object with the current estimates
"""

# TODO: This both modifies self and returns a new cnmf object. Consider revising semantics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, since this does the initial fit and refit in one call, it doesn't make much sense to me for self to become the result after just the initial fit. Self should either remain unchanged, or become the re-fit object (in which case there should be no return value). To do the former, I think you could just deepcopy the object first. To do the latter, there would have to be some version of refit that worked in-place rather than returning a new CNMF object. But of course either way it would be a breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I did a change in this direction for fit() earlier; painful but worth it. It sucks that it has the potential to break notebooks as well as any code out there that builds on caiman, but I think the API being better-behaved is worth it.

A new CNMF object
"""
self.provenance.append({'event': 'refit', 'time': int(time.time()), 'text': f'Run refit', 'data_target': str(images)})
cnm = CNMF(self.params.patch['n_processes'], params=self.params, dview=dview)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the new object needs to inherit the provenance from self.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, whoops, good catch.

"""
logger = logging.getLogger("caiman")
# Todo : to compartment
self.provenance.append({'event': 'fit', 'time': int(time.time()), 'text': f'Run fit', 'data_target': str(images)})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is str(images) really useful? I think it trims to just show the beginning and the end of the movie, right? (Otherwise obviously this would be impractical.)

I suppose comparing string representations could be used as a very rough heuristic for whether 2 movies were actually the same. To do so more rigorously would require iterating through the whole movie and computing a hash, I think, which is probably too costly for the benefit. So maybe this is a decent compromise, although it seems a bit hacky. Alternatively, maybe just save the first few frames as an array? Then at least you could compare those frames with another movie as floats, with whatever precision you wanted, rather than as strings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is totally a goof on my part; wasn't reading carefully enough and thought images was a list of filenames as it is in some other methods. I wasn't meaning to include any kind of serialisation of data in this.

I haven't tested this code at all; probably would've been caught during testing.

indices of components to be removed
"""

self.provenance.append({'event': 'remove_components', 'time': int(time.time()), 'text': f'Remove named components', 'data_target': str(ind_rm)})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data_target seems too generic, and it's not really the same as the data_target in fit, so how about inds_removed or similar?

Also, how about 'Remove specified components' rather than 'named'? Maybe this is just me, but I think 'named' implies they have actual names (i.e., strings).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both sound fine. I was writing this with the idea that we'd try very hard to keep 'event' and any custom parameters stable over time, but I was going back and forth over where custom parameters should also be very custom to the event or try to bend towards being generic.

Yr : np.ndarray
movie in format pixels (d) x frames (T)
"""
self.provenance.append({'event': 'compute_residuals', 'time': int(time.time()), 'text': f'Compute and store residuals inline'})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be specific and say "Compute residuals and store in YrA"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup.

args_in = [(F[jj], None, jj, None, None, None, None,
args) for jj in range(F.shape[0])]

self.provenance.append({'event': 'deconvolve', 'time': int(time.time()), 'text': f'Deconvolve on traces', 'method_used': str(method_deconvolution)})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about 'method_deconvolution' instead of 'method_used' to match the parameter name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@pgunn
Copy link
Member Author

pgunn commented Dec 21, 2025

Overall I think this field can provide some important information - it seems like a good idea! I have some suggestions, mostly about naming & consistency.

Thanks - the main intent here is to provide some reasonably strong hints to the user how a given dataset came about; I think it may be impossible to chase it all the way to 100% reproducibility given sometimes people will reach into the objects through accessors, different versions of the algorithms, and even some amount of nondeterminism in the algorithms, but any improvement along these lines should be a plus if it doesn't cost too much in other parts of the design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants