Skip to content

Conversation

@cgkantidis
Copy link
Contributor

@cgkantidis cgkantidis commented May 17, 2025

…fter it in a different thread

Duplo's job is embarrassingly parallel and thus the usage of a threadpool is the easiest way to speed up Duplo's runtime.

This is a classic case of trading larger memory consumption for faster runtime.

Even the most basic laptops nowadays have at least 8GB of RAM, and not using all of the available memory is a waste.

For enterprise machines, which can have 100s of GB of RAM is trade-off of memory for runtime reduction is especially appreciated.

Users can elect to run the tool with a single thread (-j 1) in order to revert to the old behavior, with no extra cost in memory.

The only synchronization between the threads is the logging messages by the exporter, which are printed by each thread when it finishes comparing a file with all other files after it.

Quake 2 runtime and memory scaling:
quake2

GCC 15.1.0 runtime and memory scaling:
gcc-15 1 0

…fter it in a different thread

Duplo's job is [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel)
and thus the usage of a threadpool is the easiest way to speed up Duplo's
runtime.

This is a classic case of trading larger memory consumption for faster runtime.

Even the most basic laptops nowadays have at least 8GB of RAM, and not using all
of the available memory is a waste.

For enterprise machines, which can have 100s of GB of RAM is trade-off of memory
for runtime reduction is especially appreciated.

Users can elect to run the tool with a single thread (-j 1) in order to revert
to the old behavior, with no extra cost in memory.

The only synchronization between the threads is the logging messages by the
exporter, which are printed by each thread when it finishes comparing a file
with all other files after it.
clang 15.0.0 has a bug that requires us to define the constructor for the Block
class in order to be able to emplace_back at the std::vector<Block>.

This is fixed in clang 16.0.0
@dlidstrom
Copy link
Owner

Hi @cgkantidis ! Very nice work, I'm impressed (understatement!). I will merge this when I get a chance.

@dlidstrom
Copy link
Owner

I ran some tests and this seems to work very well. I just think a few minor additions would be nice to have. What do you think?

diff --git a/src/Main.cpp b/src/Main.cpp
index e3ff9f8..e73341a 100644
--- a/src/Main.cpp
+++ b/src/Main.cpp
@@ -5,6 +5,7 @@
 #include <algorithm>
 #include <iostream>
 #include <iterator>
+#include <thread>
 
 namespace {
     int run(long argc, const char* argv[]) {
@@ -22,7 +23,7 @@ namespace {
 
             unsigned char blockPercentThreshold = static_cast<unsigned char>(blockPercentThresholdValue);
             auto numberOfFiles = ap.getInt("-n", 0);
-            auto numThreads = ap.getInt("-j", 1);
+            auto numThreads = ap.getInt("-j", std::thread::hardware_concurrency());
             auto minChars = ap.getInt("-mc", MIN_CHARS);
             bool ignorePrepStuff = ap.is("-ip");
             bool outputXml = ap.is("-xml");
@@ -48,7 +49,7 @@ namespace {
             std::cout << "       Duplo " << VERSION << " - duplicate source code block finder\n\n";
 
             std::cout << "\nSYNOPSIS\n";
-            std::cout << "       duplo [OPTIONS] [INTPUT_FILELIST] [OUTPUT_FILE]\n";
+            std::cout << "       duplo [OPTIONS] [INPUT_FILELIST] [OUTPUT_FILE]\n";
 
             std::cout << "\nDESCRIPTION\n";
             std::cout << "       Duplo is a tool to find duplicated code blocks in large\n";
@@ -65,6 +66,7 @@ namespace {
             std::cout << "       -d               ignore file pairs with same name\n";
             std::cout << "       -xml             output file in XML\n";
             std::cout << "       -json            output file in JSON format\n";
+            std::cout << "       -j               number of threads to use (default is number of cores: " << std::thread::hardware_concurrency() << ")\n";
             std::cout << "       INPUT_FILELIST   input filelist (specify '-' to read from stdin)\n";
             std::cout << "       OUTPUT_FILE      output file (specify '-' to output to stdout)\n";
 
@@ -76,6 +78,7 @@ namespace {
                 "       Daniel Lidstrom ([email protected])",
                 "       Christian M. Ammann ([email protected])",
                 "       Trevor D'Arcy-Evans ([email protected])",
+                "       Christos Gkantidis"
             };
             std::copy(std::begin(authors), std::end(authors), std::ostream_iterator<const char*>(std::cout, "\n"));
             std::cout << "\n";
diff --git a/src/compile_flags.txt b/src/compile_flags.txt
index 69dec9d..38b9326 100644
--- a/src/compile_flags.txt
+++ b/src/compile_flags.txt
@@ -4,3 +4,5 @@
 include/
 -I
 ../build/_deps/nlohmann_json-src/single_include/
+-I
+../build/_deps/thread_pool-src/include/

compile_flags.txt is used by the clangd VSCode extension.

@cgkantidis
Copy link
Contributor Author

I'm fine with these changes!

Thanks for adding my name to the authors list! I really appreciate it! You can add my email next to the name if you want to match the format ([email protected])

You don't need the compile_flags.txy by the way... The most canonical way would be to have cmake generate the compile_commands.json and have clangd pick this up... Then your IDE will also show the warnings enabled by the flags in cmake etc...

@cgkantidis
Copy link
Contributor Author

The reason I left the default number of threads to 1 is because for a large project (like gcc or llvm code) there might not be enough memory to accommodate all threads, which might either report an error, or might lead to memory thrashing, continually storing memory to swap (disk) and RAM, whichight slow down the program...

With -j 1 it will not be as fast as possible but it's up to the user to determine how many threads are enough for them.

@dlidstrom
Copy link
Owner

Ok, let's leave it at 1 for safety then, it's good enough as it is 👍

@dlidstrom dlidstrom merged commit 883c9e7 into dlidstrom:main May 18, 2025
6 of 7 checks passed
@cgkantidis cgkantidis deleted the threadpool branch May 19, 2025 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants