WIP: CNMF object: first draft of provenance #1551

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

pgunn wants to merge 1 commit into dev from dev-provenance

+30 −0

Member

pgunn commented Dec 18, 2025

For now this is meant as a conversation piece to decide what we think of this design of a provenance system; it hasn't seen any testing (although it builds), and the corresponding changes to OnACID have not been made


          CNMF object: first draft of provenance

92c8070

ethanbb suggested changes

View reviewed changes

Contributor

ethanbb left a comment

Overall I think this field can provide some important information - it seems like a good idea! I have some suggestions, mostly about naming & consistency.

caiman/source_extraction/cnmf/cnmf.py

    
                                           # the dicts all having at least the fields "event":str, "time":int, and "text":str

                                           # Other fields are permitted. Time is Unix epochtime. The semantics used here will be mirrored in

                                           # the OnACID class

                      self.provenance.append({'event': 'init', 'time': int(time.time()), 'text': 'CNMF Object initialized'})

Contributor

ethanbb Dec 20, 2025

'text' seems too generic to me - how about 'description'? Or 'action', though that might be too similar to 'event'.

Also, should it be in a consistent tense? To match the others, this could be "Initialize CNMF object".

Member Author

pgunn Dec 21, 2025

I toyed with description as I wrote it; still open to it. I don't have strong feelings between the two. Will revise in that direction.

caiman/source_extraction/cnmf/cnmf.py

    
                          cnmf object with the current estimates

                      """

                      # TODO: This both modifies self and returns a new cnmf object. Consider revising semantics.

Contributor

ethanbb Dec 21, 2025

I agree, since this does the initial fit and refit in one call, it doesn't make much sense to me for self to become the result after just the initial fit. Self should either remain unchanged, or become the re-fit object (in which case there should be no return value). To do the former, I think you could just deepcopy the object first. To do the latter, there would have to be some version of refit that worked in-place rather than returning a new CNMF object. But of course either way it would be a breaking change.

Member Author

pgunn Dec 21, 2025

Yeah, I did a change in this direction for fit() earlier; painful but worth it. It sucks that it has the potential to break notebooks as well as any code out there that builds on caiman, but I think the API being better-behaved is worth it.

caiman/source_extraction/cnmf/cnmf.py

    
                              A new CNMF object

                      """

                      self.provenance.append({'event': 'refit', 'time': int(time.time()), 'text': f'Run refit', 'data_target': str(images)})

                      cnm = CNMF(self.params.patch['n_processes'], params=self.params, dview=dview)

Contributor

ethanbb Dec 21, 2025

It seems like the new object needs to inherit the provenance from self.

Member Author

pgunn Dec 21, 2025

Oh, whoops, good catch.

caiman/source_extraction/cnmf/cnmf.py

    
                      """

                      logger = logging.getLogger("caiman")

                      # Todo : to compartment

                      self.provenance.append({'event': 'fit', 'time': int(time.time()), 'text': f'Run fit', 'data_target': str(images)})

Contributor

ethanbb Dec 21, 2025

Is str(images) really useful? I think it trims to just show the beginning and the end of the movie, right? (Otherwise obviously this would be impractical.)

I suppose comparing string representations could be used as a very rough heuristic for whether 2 movies were actually the same. To do so more rigorously would require iterating through the whole movie and computing a hash, I think, which is probably too costly for the benefit. So maybe this is a decent compromise, although it seems a bit hacky. Alternatively, maybe just save the first few frames as an array? Then at least you could compare those frames with another movie as floats, with whatever precision you wanted, rather than as strings.

Member Author

pgunn Dec 21, 2025

Oh, this is totally a goof on my part; wasn't reading carefully enough and thought images was a list of filenames as it is in some other methods. I wasn't meaning to include any kind of serialisation of data in this.

I haven't tested this code at all; probably would've been caught during testing.

caiman/source_extraction/cnmf/cnmf.py

    
                                      indices of components to be removed

                      """

                      self.provenance.append({'event': 'remove_components', 'time': int(time.time()), 'text': f'Remove named components', 'data_target': str(ind_rm)})

Contributor

ethanbb Dec 21, 2025

data_target seems too generic, and it's not really the same as the data_target in fit, so how about inds_removed or similar?

Also, how about 'Remove specified components' rather than 'named'? Maybe this is just me, but I think 'named' implies they have actual names (i.e., strings).

Member Author

pgunn Dec 21, 2025

Both sound fine. I was writing this with the idea that we'd try very hard to keep 'event' and any custom parameters stable over time, but I was going back and forth over where custom parameters should also be very custom to the event or try to bend towards being generic.

caiman/source_extraction/cnmf/cnmf.py

    
                           Yr :    np.ndarray

                                   movie in format pixels (d) x frames (T)

                      """

                      self.provenance.append({'event': 'compute_residuals', 'time': int(time.time()), 'text': f'Compute and store residuals inline'})

Contributor

ethanbb Dec 21, 2025

Can we be specific and say "Compute residuals and store in YrA"?

Member Author

pgunn Dec 21, 2025

Yup.

caiman/source_extraction/cnmf/cnmf.py

    
                      args_in = [(F[jj], None, jj, None, None, None, None,

                                  args) for jj in range(F.shape[0])]

                      self.provenance.append({'event': 'deconvolve', 'time': int(time.time()), 'text': f'Deconvolve on traces', 'method_used': str(method_deconvolution)})

Contributor

ethanbb Dec 21, 2025

How about 'method_deconvolution' instead of 'method_used' to match the parameter name?

Member Author

pgunn Dec 21, 2025

Sure

Member Author

pgunn commented Dec 21, 2025

Overall I think this field can provide some important information - it seems like a good idea! I have some suggestions, mostly about naming & consistency.

Thanks - the main intent here is to provide some reasonably strong hints to the user how a given dataset came about; I think it may be impossible to chase it all the way to 100% reproducibility given sometimes people will reach into the objects through accessors, different versions of the algorithms, and even some amount of nondeterminism in the algorithms, but any improvement along these lines should be a plus if it doesn't cost too much in other parts of the design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet