Description
This issue was originally created at: 2002-04-07 13:05:02.
This issue was reported by: nyber
.
nyber said at 2002-04-07 13:05:02
I would like to request having an option of not taking source code comments into account when computing the md5 checksum.
I have a cvs log comment block in my source files that gets added to every time I do a cvs checkin. So after a checkin scons will rebuild all of those files because the md5 checksum changes. But all that is modified is a comment block at the top of the file.
---The following is Anthony Roach's comments on how this could be done.
That would be a pretty easy change. We'd just have to add some sort of filtering function to either Node.get_contents()
, Sig.Calculator.csig()
, or Sig.MD5.signature()
. This filtering function would be given the contents of the file, and it would return the contents with all the C comments stripped out. Node.get_contents()
would probably be the best place to do this, because we'd want the filter function to be somehow configurable based on the file extension or something. In a SConstruct file, maybe you'd do this:
def comment_filter(contents):
# filter out comments
SignatureFilter(comment_filter, ".c .h .cpp .hpp")
omynous said at 2005-01-31 22:42:02
Duplicates 649436
issues@scons said at 2005-01-31 22:42:02
Converted from SourceForge tracker item 540682
gregnoel said at 2008-05-26 00:11:57
Assigning to MatiGruca.
Welcome aboard. Here's the issue that corresponds to your GSoC project.
matigruca said at 2008-05-31 14:40:56
Re-post from scons-dev (so that it won't get lost) + a little update at the bottom:
I've been thinking how to implement the comments stripper (http://www.scons.org/wiki/GSoC2008/MatiGruca), and that's what I've come up with.
- Add new module (
Comments.py
?Stripper.py
?) containing (a) all the stripping functions and (b) a control function to decide which function to use in particular case. - Modify
changed_content()
method (from 'File' class fromNode/FS.py
) to run the control function from - let's say -Comments.py
module to check if we can strip part of the file or run the good oldget_csig()
to check the MD5 signature for the whole file otherwise.
The changed_content()
method could look like this:
def changed_content(self, target, prev_ni):
ret = SCons.Comments.Stripper(self, target, prev_ni)
if ret == SIGS_DIFFER: # different signatures
return 1
elif ret == SIGS_EQUAL: # equal signatures
return 0
# stripper didn't know what to do, so go the usual way...
cur_csig = self.get_csig()
try:
return cur_csig != prev_ni.csig
except AttributeError:
return 1
I have hacked it in this way (only a provisory C-comments/code stripping functions) and it works fine, but there is one thing I haven't thought about before.
I have to store (max) three checksums for every file in the .sconsign.dblite:
- checksum for the whole file
- checksum for the code
- checksum for the comments
I have added two additional elements to the field_list (let's call them 'psig' and 'osig' as for now) list in FileNodeInfo class (Node/FS.py
) and it works fine, but, I guess, I should also addget_psig()
andget_osig()
functions toNode/__init__.py
(just likeget_csig()
in 'Node' class inNode/__init__.py
).
Any comments?
I've already created C-code/comments stripping functions + tests (but I'm still working to improve them). Now I will go for D's nested comments.
Cheers,
Mateusz
gregnoel said at 2008-05-31 16:48:41
I have no record of having received a message with this content; it must have been dropped somewhere.
Add new module (Comments.py? Stripper.py?) containing (a) all the stripping functions and (b) a control function to decide which function to use in particular case.
A place to put all the stripping functions is a good idea, but it's the Builder that knows both the source and the target, so it's the Builder that should say which stripping function to use. If it doesn't nominate one, the default would be get_contents()
(i.e., the whole file). This is much simpler than a control function that has to know and understand all possible sources and targets (and probably faster, to boot).
There's no need to change anything in the signature calculations other than to select the stripping function in the correct places. (In fact, it would be bad design to do otherwise: the signature subsystem is already complex enough, so let it figure out what it wants to keep.) Check with Steven about where to call the stripping function and whether the cache index needs to be tweaked.
Have you created a branch to contain this work yet? Oh, and you should change the status of this issue to STARTED.
matigruca said at 2008-06-02 11:38:34
I have no record of having received a message with this content; it must have been dropped somewhere.
This is really strange. Right after sending this mail to scons-dev I've checked scons.org mail archives and my mail was there, but now it is gone! Now I know, why there was no response :)
There's no need to change anything in the signature calculations other than to select the stripping function in the correct places.
Do you mean adding new database fields (psig and osig in my previous post) for new MD5 checksums?
If yes, I am in trouble, since I can't see a way to compare the checksums for code/comments if all I got is MD5 sum for whole file. I agree that signature subsystem is too complex and I know, that adding new checksum fields to the database is a pain for current SCons's users, but what else can I do?
Have you created a branch to contain this work yet?
No, can I do this by myself? I thought that I need some special write priviledges.
Thanks for comments!
Mateusz
gregnoel said at 2008-06-02 15:08:48
You don't need to add any fields to the signature subsystem; you don't have to do much beyond providing the portion of the file to be checksummed. You don't need to store the MD5 sum of the whole file; you don't need to store the MD5 sum of the excluded portions. All you need to do is provide the piece that should be checked for changes and let the signature subsystem take care of the rest.
Nag Steven to do a brain dump of the signature subsystem into the Developers' Guide in the wiki (at least the internal API) and then bug him about anything you don't understand. Record all of the questions and answers in the wiki and one of us will clean it up later. Although I can't seem to find it now, he did part of this when he was describing the proposed Decider() functionality on the mailing list, so you can start there. In essence, the idea is that if you ask for a bit of signature, the subsystem will notice what's being used and remember it for future use. You don't have to know or care how it's done.
As for SVN permissions, as far as I know, you should be good to go. Drop us a line if it doesn't work.
matigruca said at 2008-07-07 10:04:48
Hi,
nagging Steven failed (please, don't treat it as objection, I understand that he is busy and moreover it made me to sit more, learn more and think more - which is good), so I am coming up with my new idea about where to place the comments stripping framework.
I added new attribute to BuilderBase class called 'stripper' along with two methods: add_stripper()
and get_stripper()
.
Now, for every initialized builder I can add adequate stripper in tool-specific initialization files.
For example in Tool/cc.py:
71 from SCons.Comments import CComments
72
73 for suffix in CSuffixes:
74 static_obj.add_action(suffix, SCons.Defaults.CAction)
75 shared_obj.add_action(suffix, SCons.Defaults.ShCAction)
76 static_obj.add_emitter(suffix, SCons.Defaults.StaticObjectEmitter)
77 shared_obj.add_emitter(suffix, SCons.Defaults.SharedObjectEmitter)
78 static_obj.add_stripper(suffix, CComments) ### NEW!
79 shared_obj.add_stripper(suffix, CComments) ### NEW!
(should CComments
function be called StripCComments
?)
or in Tool/FortranCommon.py
:
126 from SCons.Comments import FortranComments
127 for suffix in suffixes:
128 static_obj.add_action(suffix, compaction)
129 shared_obj.add_action(suffix, shcompaction)
130 static_obj.add_emitter(suffix, FortranEmitter)
131 shared_obj.add_emitter(suffix, ShFortranEmitter)
132 static_obj.add_stripper(suffix, FortranComments) ### NEW!
133 shared_obj.add_stripper(suffix, FortranComments) ### NEW!
134
135
136 for suffix in ppsuffixes:
137 static_obj.add_action(suffix, compppaction)
138 shared_obj.add_action(suffix, shcompppaction)
139 static_obj.add_emitter(suffix, FortranEmitter)
140 shared_obj.add_emitter(suffix, ShFortranEmitter)
141 static_obj.add_stripper(suffix, FortranComments) ### NEW!
142 shared_obj.add_stripper(suffix, FortranComments) ### NEW!
(FortranComments
or StripFortranComments
?)
OK, so now I got the right strippers connected to the right suffixes.
I think that changed_content()
method in Node/FS.py
is the right place
to check for the checksum, so now I do it like that (simplified version):
2697 def changed_content(self, target, prev_ni):
2698 try: ### NEW!
2699 stripper = target.builder.get_stripper(self.suffix)
2700 code = stripper(str(self))
2701 cur_csig = SCons.Util.MD5signature(code)
2702 ninfo = self.get_ninfo()
2703 ninfo.csig = cur_csig
2704 except: ### OLD
2705 cur_csig = self.get_csig()
2706
2707 try:
2708 return cur_csig != prev_ni.csig
2709 except AttributeError:
2710 return 1
That's all. I know that it is a lot better than previous propositions and I hope that it'll good enough to be accepted :)
Greg, thanks for comments, when I have read them for the first time, I didn't understand them as good as I do now.
Looking back at my work I think that I should've started with Python Binary Builder[1]. Working on PBB gave me some knowledge I needed to propose the above. Greg, how do you think - should I reschedule my project on my wiki page, or should I just mention it in my mid-term evaluation
report?
[1] - http://www.scons.org/wiki/InstallPython
Regards,
Mateusz Gruca
gregnoel said at 2008-07-07 11:09:49
(Gary, you should add your comments to the issue, not reply via email. Steven, consider yourself heartily nagged.)
Yes, either the Builder or the Executor is the right place to attach the stripper function; attaching to the Executor is a bit more complicated, but would have the advantage that it could be overridden on a case-by-case basis if desired.
As for the names of the stripper functions, note that some are more widely usable than others. A stripper for #-comments could not only be used for the shell and Python but also dozens of languages; its name should reflect that.
matigruca said at 2008-07-08 14:29:51
Hi Greg,
Yes, either the Builder or the Executor is the right place to attach the stripper function; attaching to the Executor is a bit more complicated, but would have the advantage that it could be overridden on a case-by-case basis if desired.
OK, I will investigate the Executor variant.
As for the names of the stripper functions, note that some are more widely usable than others. A stripper for #-comments could not only be used for the shell and Python but also dozens of languages; its name should reflect that.
I agree. What about StripHashComments
and StripHashCode
?
Regards,
Mati Gruca
gregnoel said at 2008-08-05 06:24:51
GSoC issues should be assigned to the 'anytime' milestone, since they are outside the normal release cycle.