Skip to content

Python: Modernize File Not Always Closed query #18845

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 28, 2025

Conversation

joefarebrother
Copy link
Contributor

@joefarebrother joefarebrother commented Feb 24, 2025

Rewrites py/file-not-closed query to not rely on pointsTo analysis.

Reviewing per-commit may be helpful.

@joefarebrother joefarebrother force-pushed the python-qual-file-not-closed branch from 7bc2978 to 2f2e755 Compare March 10, 2025 11:23
@joefarebrother joefarebrother marked this pull request as ready for review March 10, 2025 13:43
@joefarebrother joefarebrother requested a review from a team as a code owner March 10, 2025 13:43
@joefarebrother joefarebrother changed the title [Draft] Python: Modernize File Not Always Closed query Python: Modernize File Not Always Closed query Mar 10, 2025
Copy link
Contributor

github-actions bot commented Mar 10, 2025

QHelp previews:

python/ql/src/Resources/FileNotAlwaysClosed.qhelp

File is not always closed

When a file is opened, it should always be closed.

A file opened for writing that is not closed when the application exits may result in data loss, where not all of the data written may be saved to the file. A file opened for reading or writing that is not closed may also use up file descriptors, which is a resource leak that in long running applications could lead to a failure to open additional files.

Recommendation

Ensure that opened files are always closed, including when an exception could be raised. The best practice is often to use a with statement to automatically clean up resources. Otherwise, ensure that .close() is called in a try...except or try...finally block to handle any possible exceptions.

Example

In the following examples, in the case marked BAD, the file may not be closed if an exception is raised. In the cases marked GOOD, the file is always closed.

def bad():
    f = open("filename", "w")
    f.write("could raise exception") # BAD: This call could raise an exception, leading to the file not being closed.
    f.close()


def good1():
    with open("filename", "w") as f:
        f.write("always closed") # GOOD: The `with` statement ensures the file is always closed.

def good2():
    f = open("filename", "w")
    try:
       f.write("always closed")
    finally:
        f.close() # GOOD: The `finally` block always ensures the file is closed.
   

References

@joefarebrother joefarebrother added the no-change-note-required This PR does not need a change note label Mar 10, 2025
@joefarebrother joefarebrother force-pushed the python-qual-file-not-closed branch from a2fbf85 to 3707f10 Compare March 20, 2025 11:47
@joefarebrother joefarebrother force-pushed the python-qual-file-not-closed branch from 3d08e52 to bdbdcf8 Compare March 20, 2025 14:29
tausbn
tausbn previously approved these changes Mar 21, 2025
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff! I've added a couple of comments, though many of them are more me musing about how this fits in the grander scheme of things. In the interest of expedience, I would be happy to leave some of the more broad changes I suggest as potential future work. (Once we have more of these quality queries ported over, we'll probably have a better feel for what would make sense as a framework for this sort of thing.)


/** A node where a file is closed. */
abstract class FileClose extends DataFlow::CfgNode {
/** Holds if this file close will occur if an exception is thrown at `e`. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By e do you mean raises?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; I had changed the name at some point and missed that comment.

Comment on lines 87 to 91
private predicate fileLocalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
DataFlow::localFlowStep(nodeFrom, nodeTo)
or
exists(FileWrapperCall fw | nodeFrom = fw.getWrapped() and nodeTo = fw)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extension of local flow makes me wonder if it would make more sense to rewrite this part of the query as a proper data-flow query (with an additional step for file wrapper calls). My main worry is that calculating the fileLocalFlow relation might result in bad performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, full dataflow could improve precison on the trickier cases like the file being closed through a wrapper. I wasn't sure of the performance impact compared to local reasoning, however.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main worry about using global dataflow would be if there were a lot of sources, but my gut feeling is that there probably aren't that many open calls in your average codebase. You probably have a better intuition for this than me, though.

Comment on lines 94 to 96
private predicate fileLocalFlow(DataFlow::Node source, DataFlow::Node sink) {
fileLocalFlowStep*(source, sink)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I've misread the rest of the code, source will in fact always be a FileOpen instance. If this is true, it might make sense to specialise the argument to that class (a kind of manual "magic"), as this would certainly reduce the size of the fileLocalFlow predicate (assuming the compiler hasn't already figured that this kind of magic is available).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this change caused DCA to timeout on FreeCAD. I'm not sure why, but I'll revert it for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that's very concerning. There is a reason why the definition of localStep is marked as pragma[inline]. I'm not entirely comfortable with this query (potentially) materialising an even bigger predicate.

In this case, I think there might be a better solution: restrict the fileLocalFlowStep relation to LocalSourceNodes. I think this should work, since FileOpen must be a LocalSourceNode, and so must the wrappers (as they are CallCfgNodes). Of course, the external interface to fileLocalFlow must then use flowsTo to get to the final sink node. And of course, the step from a LocalSourceNode to its FileWrapperCall should also use flowsTo.

Does the above strategy make sense to you? If not, let me know and we can have a look at it together. 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this.
It looks like outside of FreeCAD, which is timing out on main regardless of these changes, performance seems fine.

This appeared to cause timeouts on DCA.
@joefarebrother joefarebrother force-pushed the python-qual-file-not-closed branch from 279e351 to d23c3b8 Compare March 26, 2025 09:24
@joefarebrother joefarebrother requested a review from tausbn March 26, 2025 15:06
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes. I have one more suggestion, otherwise I think this is ready to merge.

Comment on lines 94 to 96
private predicate fileLocalFlow(DataFlow::Node source, DataFlow::Node sink) {
fileLocalFlowStep*(source, sink)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that's very concerning. There is a reason why the definition of localStep is marked as pragma[inline]. I'm not entirely comfortable with this query (potentially) materialising an even bigger predicate.

In this case, I think there might be a better solution: restrict the fileLocalFlowStep relation to LocalSourceNodes. I think this should work, since FileOpen must be a LocalSourceNode, and so must the wrappers (as they are CallCfgNodes). Of course, the external interface to fileLocalFlow must then use flowsTo to get to the final sink node. And of course, the step from a LocalSourceNode to its FileWrapperCall should also use flowsTo.

Does the above strategy make sense to you? If not, let me know and we can have a look at it together. 🙂

Comment on lines 87 to 91
private predicate fileLocalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
DataFlow::localFlowStep(nodeFrom, nodeTo)
or
exists(FileWrapperCall fw | nodeFrom = fw.getWrapped() and nodeTo = fw)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main worry about using global dataflow would be if there were a lot of sources, but my gut feeling is that there probably aren't that many open calls in your average codebase. You probably have a better intuition for this than me, though.

@joefarebrother joefarebrother requested a review from tausbn March 28, 2025 14:27
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the additional changes. LGTM! 👍

@joefarebrother joefarebrother merged commit 4356766 into github:main Mar 28, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation no-change-note-required This PR does not need a change note Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants