-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Python: Modernize File Not Always Closed query #18845
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Modernize File Not Always Closed query #18845
Conversation
7bc2978
to
2f2e755
Compare
QHelp previews: python/ql/src/Resources/FileNotAlwaysClosed.qhelpFile is not always closedWhen a file is opened, it should always be closed. A file opened for writing that is not closed when the application exits may result in data loss, where not all of the data written may be saved to the file. A file opened for reading or writing that is not closed may also use up file descriptors, which is a resource leak that in long running applications could lead to a failure to open additional files. RecommendationEnsure that opened files are always closed, including when an exception could be raised. The best practice is often to use a ExampleIn the following examples, in the case marked BAD, the file may not be closed if an exception is raised. In the cases marked GOOD, the file is always closed. def bad():
f = open("filename", "w")
f.write("could raise exception") # BAD: This call could raise an exception, leading to the file not being closed.
f.close()
def good1():
with open("filename", "w") as f:
f.write("always closed") # GOOD: The `with` statement ensures the file is always closed.
def good2():
f = open("filename", "w")
try:
f.write("always closed")
finally:
f.close() # GOOD: The `finally` block always ensures the file is closed.
References
|
…oc and annotate tests.
a2fbf85
to
3707f10
Compare
3d08e52
to
bdbdcf8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff! I've added a couple of comments, though many of them are more me musing about how this fits in the grander scheme of things. In the interest of expedience, I would be happy to leave some of the more broad changes I suggest as potential future work. (Once we have more of these quality queries ported over, we'll probably have a better feel for what would make sense as a framework for this sort of thing.)
|
||
/** A node where a file is closed. */ | ||
abstract class FileClose extends DataFlow::CfgNode { | ||
/** Holds if this file close will occur if an exception is thrown at `e`. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By e
do you mean raises
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; I had changed the name at some point and missed that comment.
private predicate fileLocalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { | ||
DataFlow::localFlowStep(nodeFrom, nodeTo) | ||
or | ||
exists(FileWrapperCall fw | nodeFrom = fw.getWrapped() and nodeTo = fw) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extension of local flow makes me wonder if it would make more sense to rewrite this part of the query as a proper data-flow query (with an additional step for file wrapper calls). My main worry is that calculating the fileLocalFlow
relation might result in bad performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, full dataflow could improve precison on the trickier cases like the file being closed through a wrapper. I wasn't sure of the performance impact compared to local reasoning, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main worry about using global dataflow would be if there were a lot of sources, but my gut feeling is that there probably aren't that many open
calls in your average codebase. You probably have a better intuition for this than me, though.
private predicate fileLocalFlow(DataFlow::Node source, DataFlow::Node sink) { | ||
fileLocalFlowStep*(source, sink) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I've misread the rest of the code, source
will in fact always be a FileOpen
instance. If this is true, it might make sense to specialise the argument to that class (a kind of manual "magic"), as this would certainly reduce the size of the fileLocalFlow
predicate (assuming the compiler hasn't already figured that this kind of magic is available).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this change caused DCA to timeout on FreeCAD. I'm not sure why, but I'll revert it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that's very concerning. There is a reason why the definition of localStep
is marked as pragma[inline]
. I'm not entirely comfortable with this query (potentially) materialising an even bigger predicate.
In this case, I think there might be a better solution: restrict the fileLocalFlowStep
relation to LocalSourceNode
s. I think this should work, since FileOpen
must be a LocalSourceNode
, and so must the wrappers (as they are CallCfgNode
s). Of course, the external interface to fileLocalFlow
must then use flowsTo
to get to the final sink node. And of course, the step from a LocalSourceNode
to its FileWrapperCall
should also use flowsTo
.
Does the above strategy make sense to you? If not, let me know and we can have a look at it together. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this.
It looks like outside of FreeCAD, which is timing out on main
regardless of these changes, performance seems fine.
This appeared to cause timeouts on DCA.
279e351
to
d23c3b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes. I have one more suggestion, otherwise I think this is ready to merge.
private predicate fileLocalFlow(DataFlow::Node source, DataFlow::Node sink) { | ||
fileLocalFlowStep*(source, sink) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that's very concerning. There is a reason why the definition of localStep
is marked as pragma[inline]
. I'm not entirely comfortable with this query (potentially) materialising an even bigger predicate.
In this case, I think there might be a better solution: restrict the fileLocalFlowStep
relation to LocalSourceNode
s. I think this should work, since FileOpen
must be a LocalSourceNode
, and so must the wrappers (as they are CallCfgNode
s). Of course, the external interface to fileLocalFlow
must then use flowsTo
to get to the final sink node. And of course, the step from a LocalSourceNode
to its FileWrapperCall
should also use flowsTo
.
Does the above strategy make sense to you? If not, let me know and we can have a look at it together. 🙂
private predicate fileLocalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { | ||
DataFlow::localFlowStep(nodeFrom, nodeTo) | ||
or | ||
exists(FileWrapperCall fw | nodeFrom = fw.getWrapped() and nodeTo = fw) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main worry about using global dataflow would be if there were a lot of sources, but my gut feeling is that there probably aren't that many open
calls in your average codebase. You probably have a better intuition for this than me, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the additional changes. LGTM! 👍
Rewrites
py/file-not-closed
query to not rely onpointsTo
analysis.Reviewing per-commit may be helpful.