CFG cycles cause memory leak in clang extractor #7

@bennofs

I noticed that memory usage increases continuously when using the clang extractor. Here's a simple demo program to reproduce the issue:

#include <clang/Frontend/FrontendActions.h>
#include <clang/Frontend/CompilerInstance.h>

#include <fstream>
#include <iostream>
#include <memory>

#include "clang_ast/clang_extractor.h"
#include "common/clang_driver.h"

using namespace compy;

constexpr char kProgramLoop[] =
    "int cyclic(int a) {"
    "while (1) {"
    "  if (a == 4) return cyclic(a + 1);"
    "  a += 10;"
    "  a /= 2;"
    "}"
    "}";

int main(int, char**) {
  // Init extractor
  std::shared_ptr<ClangDriver> clang_;

  clang_.reset(new ClangDriver(ClangDriver::ProgrammingLanguage::C,
                               ClangDriver::OptimizationLevel::O0,
                               {}, {}));

  compy::clang::ClangExtractor extractor(clang_);
  std::cout << "iter" << "," << "bytes" << std::endl;
  for (int i = 0; i < 10000; ++i) {
    //auto fa = std::make_unique<::clang::SyntaxOnlyAction>();
    //clang_->Invoke(kProgramLoop, {fa.get()}, {});
    extractor.GraphFromString(kProgramLoop);
    // Fields of /proc/self/statm, in pages: size resident shared text lib data dt
    std::ifstream statm("/proc/self/statm");
    long stat_total, stat_rss, stat_shared, stat_text, stat_library, stat_data, stat_dirty;
    statm >> stat_total >> stat_rss >> stat_shared >> stat_text >> stat_library >> stat_data >> stat_dirty;
    // Convert resident pages to bytes (assumes 4 KiB pages)
    std::cout << i << "," << (stat_rss << 12) << std::endl;
  }
}

This is caused by cyclic control flow. Cycles in the CFG lead to reference cycles in the ExtractionInfo graph, so the nodes in a cycle never reach a reference count of zero when the top-level object is discarded, and their memory is never freed.
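The mechanism can be reproduced in isolation. The following is a minimal sketch, not the extractor's actual code: `Node`, `g_live_nodes`, and `LiveNodesAfterDrop` are hypothetical names, and it assumes the graph nodes are held through `std::shared_ptr`. It builds a two-node loop and shows that owning back edges keep both nodes alive, while a `std::weak_ptr` back edge lets them be freed.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Count of Node objects currently alive (illustration only).
int g_live_nodes = 0;

// Hypothetical stand-in for a graph node; not the actual compy type.
struct Node {
  std::vector<std::shared_ptr<Node>> strong_succs;  // owning edges
  std::vector<std::weak_ptr<Node>> weak_succs;      // non-owning edges
  Node() { ++g_live_nodes; }
  ~Node() { --g_live_nodes; }
};

// Builds the loop head -> body -> head, drops every external reference,
// and returns how many of the two nodes are still alive afterwards.
int LiveNodesAfterDrop(bool back_edge_is_weak) {
  const int before = g_live_nodes;
  {
    auto head = std::make_shared<Node>();
    auto body = std::make_shared<Node>();
    head->strong_succs.push_back(body);  // forward edge, always owning
    assert(body.use_count() == 2);       // local handle + edge from head
    if (back_edge_is_weak) {
      body->weak_succs.push_back(head);    // back edge does not own head
    } else {
      body->strong_succs.push_back(head);  // back edge closes an owning cycle
    }
  }  // head and body handles go out of scope here
  return g_live_nodes - before;
}
```

With an owning back edge, `LiveNodesAfterDrop(false)` returns 2, because each node keeps the other's reference count at one. With a weak back edge, `LiveNodesAfterDrop(true)` returns 0. Whether a weak back edge is the right fix for the extractor depends on who else owns the nodes; an alternative would be to let a single container own all nodes and use raw pointers for edges.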

To confirm that theory, I plotted memory usage for three variants of the above code:

  1. extractor, the code above
  2. syntax-only, using the driver to run the SyntaxOnly action (to see whether the issue is caused by our frontend action or something in the driver)
  3. no-loop, where kProgramLoop was modified to remove the cyclic control flow (changing the while into an if and adding a return statement to silence warnings)
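For reference, the no-loop input might look like the sketch below. This is an assumed reconstruction from the description of variant 3, since the exact text used for the measurement is not shown; the name `kProgramNoLoop` and the helper `MentionsWhile` are hypothetical.

```cpp
#include <cstring>

// Assumed reconstruction of the "no-loop" variant: the while is replaced
// by an if, and a trailing return is added so every path returns a value.
constexpr char kProgramNoLoop[] =
    "int cyclic(int a) {"
    "if (1) {"
    "  if (a == 4) return cyclic(a + 1);"
    "  a += 10;"
    "  a /= 2;"
    "}"
    "return a;"
    "}";

// Hypothetical sanity check: true if the source text still contains "while".
bool MentionsWhile(const char* src) {
  return std::strstr(src, "while") != nullptr;
}
```

The recursive call remains in this variant, but recursion does not create a back edge inside the function's own CFG, so the graph for this input should be acyclic.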

The result clearly shows that the memory leak only happens when using our extractor with cyclic control flow:

[Figure: memusage (memory usage per iteration for the three variants)]
