Add a threadpool and run the comparison of each file with all files a… #72

cgkantidis · 2025-05-17T23:30:15Z

…fter it in a different thread

Duplo's job is embarrassingly parallel and thus the usage of a threadpool is the easiest way to speed up Duplo's runtime.

This is a classic case of trading larger memory consumption for faster runtime.

Even the most basic laptops nowadays have at least 8GB of RAM, and not using all of the available memory is a waste.

For enterprise machines, which can have 100s of GB of RAM is trade-off of memory for runtime reduction is especially appreciated.

Users can elect to run the tool with a single thread (-j 1) in order to revert to the old behavior, with no extra cost in memory.

The only synchronization between the threads is the logging messages by the exporter, which are printed by each thread when it finishes comparing a file with all other files after it.

Quake 2 runtime and memory scaling:

GCC 15.1.0 runtime and memory scaling:

…fter it in a different thread Duplo's job is [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) and thus the usage of a threadpool is the easiest way to speed up Duplo's runtime. This is a classic case of trading larger memory consumption for faster runtime. Even the most basic laptops nowadays have at least 8GB of RAM, and not using all of the available memory is a waste. For enterprise machines, which can have 100s of GB of RAM is trade-off of memory for runtime reduction is especially appreciated. Users can elect to run the tool with a single thread (-j 1) in order to revert to the old behavior, with no extra cost in memory. The only synchronization between the threads is the logging messages by the exporter, which are printed by each thread when it finishes comparing a file with all other files after it.

clang 15.0.0 has a bug that requires us to define the constructor for the Block class in order to be able to emplace_back at the std::vector<Block>. This is fixed in clang 16.0.0

dlidstrom · 2025-05-18T13:49:55Z

Hi @cgkantidis ! Very nice work, I'm impressed (understatement!). I will merge this when I get a chance.

dlidstrom · 2025-05-18T20:52:19Z

I ran some tests and this seems to work very well. I just think a few minor additions would be nice to have. What do you think?

diff --git a/src/Main.cpp b/src/Main.cpp
index e3ff9f8..e73341a 100644
--- a/src/Main.cpp
+++ b/src/Main.cpp
@@ -5,6 +5,7 @@
 #include <algorithm>
 #include <iostream>
 #include <iterator>
+#include <thread>
 
 namespace {
     int run(long argc, const char* argv[]) {
@@ -22,7 +23,7 @@ namespace {
 
             unsigned char blockPercentThreshold = static_cast<unsigned char>(blockPercentThresholdValue);
             auto numberOfFiles = ap.getInt("-n", 0);
-            auto numThreads = ap.getInt("-j", 1);
+            auto numThreads = ap.getInt("-j", std::thread::hardware_concurrency());
             auto minChars = ap.getInt("-mc", MIN_CHARS);
             bool ignorePrepStuff = ap.is("-ip");
             bool outputXml = ap.is("-xml");
@@ -48,7 +49,7 @@ namespace {
             std::cout << "       Duplo " << VERSION << " - duplicate source code block finder\n\n";
 
             std::cout << "\nSYNOPSIS\n";
-            std::cout << "       duplo [OPTIONS] [INTPUT_FILELIST] [OUTPUT_FILE]\n";
+            std::cout << "       duplo [OPTIONS] [INPUT_FILELIST] [OUTPUT_FILE]\n";
 
             std::cout << "\nDESCRIPTION\n";
             std::cout << "       Duplo is a tool to find duplicated code blocks in large\n";
@@ -65,6 +66,7 @@ namespace {
             std::cout << "       -d               ignore file pairs with same name\n";
             std::cout << "       -xml             output file in XML\n";
             std::cout << "       -json            output file in JSON format\n";
+            std::cout << "       -j               number of threads to use (default is number of cores: " << std::thread::hardware_concurrency() << ")\n";
             std::cout << "       INPUT_FILELIST   input filelist (specify '-' to read from stdin)\n";
             std::cout << "       OUTPUT_FILE      output file (specify '-' to output to stdout)\n";
 
@@ -76,6 +78,7 @@ namespace {
                 "       Daniel Lidstrom ([email protected])",
                 "       Christian M. Ammann ([email protected])",
                 "       Trevor D'Arcy-Evans ([email protected])",
+                "       Christos Gkantidis"
             };
             std::copy(std::begin(authors), std::end(authors), std::ostream_iterator<const char*>(std::cout, "\n"));
             std::cout << "\n";
diff --git a/src/compile_flags.txt b/src/compile_flags.txt
index 69dec9d..38b9326 100644
--- a/src/compile_flags.txt
+++ b/src/compile_flags.txt
@@ -4,3 +4,5 @@
 include/
 -I
 ../build/_deps/nlohmann_json-src/single_include/
+-I
+../build/_deps/thread_pool-src/include/

compile_flags.txt is used by the clangd VSCode extension.

cgkantidis · 2025-05-18T21:01:00Z

I'm fine with these changes!

Thanks for adding my name to the authors list! I really appreciate it! You can add my email next to the name if you want to match the format ([email protected])

You don't need the compile_flags.txy by the way... The most canonical way would be to have cmake generate the compile_commands.json and have clangd pick this up... Then your IDE will also show the warnings enabled by the flags in cmake etc...

cgkantidis · 2025-05-18T21:06:05Z

The reason I left the default number of threads to 1 is because for a large project (like gcc or llvm code) there might not be enough memory to accommodate all threads, which might either report an error, or might lead to memory thrashing, continually storing memory to swap (disk) and RAM, whichight slow down the program...

With -j 1 it will not be as fast as possible but it's up to the user to determine how many threads are enough for them.

dlidstrom · 2025-05-18T21:09:47Z

Ok, let's leave it at 1 for safety then, it's good enough as it is 👍

cgkantidis added 2 commits May 17, 2025 21:41

Fix clang 15.0.0 build

b6759f0

clang 15.0.0 has a bug that requires us to define the constructor for the Block class in order to be able to emplace_back at the std::vector<Block>. This is fixed in clang 16.0.0

cgkantidis force-pushed the threadpool branch from b3e501a to b6759f0 Compare May 18, 2025 10:47

dlidstrom merged commit 883c9e7 into dlidstrom:main May 18, 2025
6 of 7 checks passed

cgkantidis deleted the threadpool branch May 19, 2025 21:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add a threadpool and run the comparison of each file with all files a… #72

Add a threadpool and run the comparison of each file with all files a… #72

Uh oh!

cgkantidis commented May 17, 2025 •

edited

Loading

Uh oh!

dlidstrom commented May 18, 2025

Uh oh!

dlidstrom commented May 18, 2025

Uh oh!

cgkantidis commented May 18, 2025

Uh oh!

cgkantidis commented May 18, 2025

Uh oh!

dlidstrom commented May 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add a threadpool and run the comparison of each file with all files a… #72

Add a threadpool and run the comparison of each file with all files a… #72

Uh oh!

Conversation

cgkantidis commented May 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dlidstrom commented May 18, 2025

Uh oh!

dlidstrom commented May 18, 2025

Uh oh!

cgkantidis commented May 18, 2025

Uh oh!

cgkantidis commented May 18, 2025

Uh oh!

dlidstrom commented May 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cgkantidis commented May 17, 2025 •

edited

Loading