Python: Modernize File Not Always Closed query #18845

Open: wants to merge 10 commits into main

joefarebrother (Contributor) commented Feb 24, 2025

Rewrites the py/file-not-closed query so that it no longer relies on points-to analysis.

Reviewing commit by commit may be helpful.

joefarebrother force-pushed the python-qual-file-not-closed branch from 7bc2978 to 2f2e755 on March 10, 2025 11:23
joefarebrother marked this pull request as ready for review March 10, 2025 13:43
joefarebrother requested a review from a team as a code owner March 10, 2025 13:43
joefarebrother changed the title from "[Draft] Python: Modernize File Not Always Closed query" to "Python: Modernize File Not Always Closed query" Mar 10, 2025
github-actions bot commented Mar 10, 2025

QHelp previews:

python/ql/src/Resources/FileNotAlwaysClosed.qhelp

File is not always closed

When a file is opened, it should always be closed.

A file opened for writing that is not closed when the application exits may lose data, because not all of the written data may have been flushed to the file. A file opened for reading or writing that is not closed also keeps its file descriptor in use; in long-running applications this resource leak can eventually cause attempts to open further files to fail.
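To make the data-loss point concrete: writes to a Python file object are buffered inside the process and only reach the file on a flush or a close. The sketch below (plain CPython, a temporary file, and a hypothetical helper name) reads the file through a second handle before and after close(); with default buffering, the first read typically sees an empty file.

```python
import os
import tempfile


def read_back_around_close():
    """Return (contents before close, contents after close) as seen by a second reader."""
    fd, path = tempfile.mkstemp()
    os.close(fd)  # only the path is needed; the file is reopened in text mode below
    try:
        f = open(path, "w")
        f.write("hello")  # a short write stays in Python's in-process buffer
        with open(path) as reader:
            before = reader.read()  # pending data has not been flushed to the OS yet
        f.close()  # close() flushes pending data, then releases the descriptor
        with open(path) as reader:
            after = reader.read()
        return before, after
    finally:
        os.remove(path)


before, after = read_back_around_close()
print(repr(before), repr(after))
```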

Recommendation

Ensure that opened files are always closed, including when an exception could be raised. The best practice is often to use a with statement to automatically clean up resources. Otherwise, ensure that .close() is called in a try...except or try...finally block to handle any possible exceptions.

Example

In the following examples, in the case marked BAD, the file may not be closed if an exception is raised. In the cases marked GOOD, the file is always closed.

def bad():
    f = open("filename", "w")
    f.write("could raise exception") # BAD: This call could raise an exception, leading to the file not being closed.
    f.close()


def good1():
    with open("filename", "w") as f:
        f.write("always closed") # GOOD: The `with` statement ensures the file is always closed.

def good2():
    f = open("filename", "w")
    try:
        f.write("always closed")
    finally:
        f.close() # GOOD: The `finally` block always ensures the file is closed.
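A runnable check of these three patterns, adapted so that the write deliberately fails: the exception class and failing_write helper are hypothetical, and each function records its file object so it can be inspected afterwards.

```python
import os
import tempfile


class SimulatedWriteError(Exception):
    """Hypothetical stand-in for an I/O error raised by f.write()."""


def failing_write(f):
    f.write("partial")
    raise SimulatedWriteError("disk full")


def bad(path, opened):
    f = open(path, "w")
    opened.append(f)  # keep a reference so f can be inspected afterwards
    failing_write(f)  # raises, so the close() below is never reached
    f.close()


def good1(path, opened):
    with open(path, "w") as f:
        opened.append(f)
        failing_write(f)  # raises, but leaving the with block still closes f


def good2(path, opened):
    f = open(path, "w")
    opened.append(f)
    try:
        failing_write(f)
    finally:
        f.close()  # runs whether or not failing_write raised


def closed_after_error(func):
    """Run func on a temporary file and report whether its file ended up closed."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    opened = []
    try:
        func(path, opened)
    except SimulatedWriteError:
        pass
    was_closed = opened[0].closed
    opened[0].close()  # tidy up the handle leaked by bad()
    os.remove(path)
    return was_closed


for func in (bad, good1, good2):
    print(func.__name__, closed_after_error(func))
```

Only bad leaves its file open after the error; the with statement and the finally block both close it.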
   

References

joefarebrother added the no-change-note-required label (This PR does not need a change note) Mar 10, 2025
joefarebrother force-pushed the python-qual-file-not-closed branch from a2fbf85 to 3707f10 on March 20, 2025 11:47
joefarebrother force-pushed the python-qual-file-not-closed branch from 3d08e52 to bdbdcf8 on March 20, 2025 14:29
tausbn (Contributor) left a comment

Good stuff! I've added a couple of comments, though many of them are really just me musing about how this fits into the grander scheme of things. In the interest of expedience, I would be happy to leave some of the broader changes I suggest as potential future work. (Once we have ported more of these quality queries, we'll probably have a better feel for what would make sense as a framework for this sort of thing.)

Comment on lines +17 to +22
private DataFlow::TypeTrackingNode fileOpenInstance(DataFlow::TypeTracker t) {
  t.start() and
  result instanceof FileOpenSource
  or
  exists(DataFlow::TypeTracker t2 | result = fileOpenInstance(t2).track(t2, t))
}

Is a type tracker actually needed here? As far as I can tell FileOpenSource only contains API graph nodes at the moment, so I would hope that the type tracking done within the API graph calculation would be sufficient.
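For reference, this is the kind of Python flow that either mechanism has to follow: the file is opened in one function and closed in another, with the file object crossing a return. The sketch only illustrates the pattern; whether API-graph tracking alone covers it is exactly the question above, and the function names are hypothetical.

```python
import os
import tempfile


def open_report(path):
    # The open() call here is where the file object is created.
    return open(path, "w")


def write_report(path):
    f = open_report(path)  # the file object reaches the caller through a return
    try:
        f.write("report")
    finally:
        f.close()  # to see this close, tracking must connect it back to open_report
    return f


fd, path = tempfile.mkstemp()
os.close(fd)
report = write_report(path)
print(report.closed)
os.remove(path)
```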


/** A node where a file is closed. */
abstract class FileClose extends DataFlow::CfgNode {
  /** Holds if this file close will occur if an exception is thrown at `e`. */

By `e`, do you mean `raises`?
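Reading the QLDoc at face value, the predicate separates the two kinds of close below: for an exception raised at the action(f) call, only the close in the finally block still runs. In this sketch io.StringIO stands in for a real file, and the helper names are hypothetical.

```python
import io


def close_in_finally(f, action):
    try:
        action(f)  # if this raises...
    finally:
        f.close()  # ...this close still runs


def close_afterwards(f, action):
    action(f)  # if this raises...
    f.close()  # ...this close is skipped


def explode(f):
    raise ValueError("simulated failure")


def closed_after(closer):
    buf = io.StringIO()  # stands in for a real file object
    try:
        closer(buf, explode)
    except ValueError:
        pass
    return buf.closed


print(closed_after(close_in_finally), closed_after(close_afterwards))
```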

Comment on lines +87 to +91
private predicate fileLocalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
  DataFlow::localFlowStep(nodeFrom, nodeTo)
  or
  exists(FileWrapperCall fw | nodeFrom = fw.getWrapped() and nodeTo = fw)
}

This extension of local flow makes me wonder if it would make more sense to rewrite this part of the query as a proper data-flow query (with an additional step for file wrapper calls). My main worry is that calculating the fileLocalFlow relation might result in bad performance.
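An example of the wrapping that the extra step is meant to see through: the only close() call is on the wrapper, yet it also closes the original file, so local flow from the open() call alone would never reach it. This uses the standard library's io.TextIOWrapper; whether FileWrapperCall matches this particular call depends on classTracker, which isn't shown in the diff.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"line\n")
os.close(fd)

raw = open(path, "rb")  # the file-open source
text = io.TextIOWrapper(raw, encoding="utf-8")  # raw flows into a wrapper call
first_line = text.readline()

text.close()  # closing the wrapper also closes the wrapped binary file
print(repr(first_line), raw.closed)
os.remove(path)
```

In a data-flow formulation, the same link would be an additional flow step from the wrapped argument to the wrapper call.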

Comment on lines +94 to +96
private predicate fileLocalFlow(DataFlow::Node source, DataFlow::Node sink) {
  fileLocalFlowStep*(source, sink)
}

Unless I've misread the rest of the code, source will in fact always be a FileOpen instance. If this is true, it might make sense to specialise the argument to that class (a kind of manual "magic"), as this would certainly reduce the size of the fileLocalFlow predicate (assuming the compiler hasn't already figured that this kind of magic is available).
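The size difference can be seen in a small Python analogy (this is not how the CodeQL evaluator actually computes fileLocalFlowStep*): the unrestricted reflexive-transitive closure holds a pair for every node and everything it reaches, whereas restricting the first column to file-open sources keeps only the pairs the query actually uses. The graph and node names are hypothetical.

```python
from collections import defaultdict, deque


def successors(edges):
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    return succ


def reachable(succ, start):
    """Nodes reachable from start in zero or more steps (reflexive-transitive)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def closure_all(edges):
    """Like an unrestricted step*: pairs for every node and everything it reaches."""
    succ = successors(edges)
    nodes = {n for edge in edges for n in edge}
    return {(s, t) for s in nodes for t in reachable(succ, s)}


def closure_from(edges, sources):
    """The same relation restricted to known sources (the manual "magic" version)."""
    succ = successors(edges)
    return {(s, t) for s in sources for t in reachable(succ, s)}


# Hypothetical local-flow graph: one chain from a file open, one unrelated chain.
edges = [("open", "a"), ("a", "b"), ("b", "close"), ("x", "y"), ("y", "z")]
everything = closure_all(edges)
from_open = closure_from(edges, {"open"})
print(len(everything), len(from_open))
```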


  FileWrapperCall() {
    wrapped = this.getArg(_).getALocalSource() and
    this.getFunction() = classTracker(_)

I'm slightly puzzled by this very weak restriction. Do we really not require anything of the class that's wrapping it?
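To illustrate the concern: both constructor calls below pass a file as an argument to a class, so (as the quoted characteristic predicate reads) both would count as a FileWrapperCall, but only one of the wrappers actually closes the file. The class names are hypothetical.

```python
import os
import tempfile


class OwningWrapper:
    """Hypothetical wrapper that owns the file: closing it closes the file."""

    def __init__(self, f):
        self._f = f

    def close(self):
        self._f.close()


class BorrowingWrapper:
    """Hypothetical wrapper that only borrows the file: close() leaves it open."""

    def __init__(self, f):
        self._f = f

    def close(self):
        pass  # deliberately does not close the wrapped file


fd, path = tempfile.mkstemp()
os.close(fd)
owned = open(path)
OwningWrapper(owned).close()
borrowed = open(path)
BorrowingWrapper(borrowed).close()
owned_closed, borrowed_closed = owned.closed, borrowed.closed
print(owned_closed, borrowed_closed)
borrowed.close()  # clean up the file the borrowing wrapper left open
os.remove(path)
```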

Comment on lines +104 to +110
(
  retVal = ret.getValue()
  or
  retVal = ret.getValue().(List).getAnElt()
  or
  retVal = ret.getValue().(Tuple).getAnElt()
) and

This feels like it's a subset of a more generic concept of "returning some structure containing a thing of interest". For instance, what if the file object is put in a dict that's returned?
If we rewrite the query to use data-flow, I could see this potentially being more widely useful as a standard set of additional flow steps.
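Concretely, the quoted alternatives recognise the first two returns below but not the third, even though in every case the caller receives an open file and is responsible for closing it. The function names are hypothetical.

```python
import os
import tempfile


def open_in_list(path):
    return [open(path)]  # list element: matches the (List).getAnElt() case


def open_in_tuple(path):
    return (open(path), path)  # tuple element: matches the (Tuple).getAnElt() case


def open_in_dict(path):
    return {"handle": open(path)}  # dict value: none of the three quoted cases match


fd, path = tempfile.mkstemp()
os.close(fd)
handles = [
    open_in_list(path)[0],
    open_in_tuple(path)[0],
    open_in_dict(path)["handle"],
]
for h in handles:
    h.close()  # the caller is responsible for closing each returned file
print(all(h.closed for h in handles))
os.remove(path)
```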


predicate fileMayNotBeClosedOnException(FileOpen fo, DataFlow::Node raises) {
  fileIsClosed(fo) and
  exists(DataFlow::CfgNode fileRaised |

Maybe move the raises argument into this exists (as it doesn't seem to be used anywhere outside of this predicate)?

Labels: documentation, no-change-note-required (This PR does not need a change note), Python
2 participants