Commit 802cfb3

Merge pull request #28 from dlidstrom/original-line-numbers
Original line numbers
2 parents d26c2f7 + f72b1b1 commit 802cfb3

11 files changed, +184 -62 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ duplo
 build
 .vscode
 out.txt
+out.xml
 files.lst
 CMakeFiles
 CMakeCache.txt

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ cmake_minimum_required(VERSION 3.15.5)
 project(duplo)
 file(GLOB SOURCES src/*.cpp)

-SET(DUPLO_VERSION "1.0.0" CACHE STRING "Duplo version")
+SET(DUPLO_VERSION "\"v1.0.0\"" CACHE STRING "Duplo version")

 if(MSVC)
 else()

README.md

Lines changed: 64 additions & 44 deletions
@@ -1,42 +1,44 @@
-# 1. Duplo (C/C++/Java Duplicate Source Code Block Finder)
+# Duplo (C/C++/Java Duplicate Source Code Block Finder) <!-- omit in toc -->

 ![C/C++ CI](https://github.com/dlidstrom/Duplo/workflows/C/C++%20CI/badge.svg)

-- [1. Duplo (C/C++/Java Duplicate Source Code Block Finder)](#1-duplo-ccjava-duplicate-source-code-block-finder)
-  - [1.1. General Information](#11-general-information)
-  - [1.2. Maintainer](#12-maintainer)
-  - [1.3. File Format Support](#13-file-format-support)
-  - [1.4. Installation](#14-installation)
-    - [1.4.1. Docker](#141-docker)
-    - [1.4.2. Pre-built binaries](#142-pre-built-binaries)
-  - [1.5. Usage](#15-usage)
-    - [1.5.1. Passing files using `stdin`](#151-passing-files-using-stdin)
-    - [1.5.2. Passing files using file](#152-passing-files-using-file)
-    - [1.5.3. Xml output](#153-xml-output)
-  - [1.6. Feedback and Bug Reporting](#16-feedback-and-bug-reporting)
-  - [1.7. Algorithm Background](#17-algorithm-background)
-    - [1.7.1. Performance Measurements](#171-performance-measurements)
-  - [1.8. Developing](#18-developing)
-    - [1.8.1. Unix](#181-unix)
-    - [1.8.2. Windows](#182-windows)
-    - [1.8.3. Additional Language Support](#183-additional-language-support)
-    - [1.8.4. Language Suggestions](#184-language-suggestions)
-  - [1.9. Changes](#19-changes)
-  - [1.10. License](#110-license)
-
-## 1.1. General Information
+- [1. General Information](#1-general-information)
+- [2. Maintainer](#2-maintainer)
+- [3. File Format Support](#3-file-format-support)
+- [4. Installation](#4-installation)
+  - [4.1. Docker](#41-docker)
+  - [4.2. Pre-built binaries](#42-pre-built-binaries)
+- [5. Usage](#5-usage)
+  - [5.1. Passing files using `stdin`](#51-passing-files-using-stdin)
+    - [5.1.1. Bash](#511-bash)
+    - [5.1.2. Windows](#512-windows)
+    - [5.1.3. Docker](#513-docker)
+  - [5.2. Passing files using file](#52-passing-files-using-file)
+  - [5.3. Xml output](#53-xml-output)
+- [6. Feedback and Bug Reporting](#6-feedback-and-bug-reporting)
+- [7. Algorithm Background](#7-algorithm-background)
+  - [7.1. Performance Measurements](#71-performance-measurements)
+- [8. Developing](#8-developing)
+  - [8.1. Unix](#81-unix)
+  - [8.2. Windows](#82-windows)
+  - [8.3. Additional Language Support](#83-additional-language-support)
+  - [8.4. Language Suggestions](#84-language-suggestions)
+- [9. Changes](#9-changes)
+- [10. License](#10-license)
+
+## 1. General Information

 Duplicated source code blocks can harm maintainability of software systems.
 Duplo is a tool to find duplicated code blocks in large C, C++, Java, C# and
 VB.Net systems.

-## 1.2. Maintainer
+## 2. Maintainer

 Duplo was originally developed by Christian
 M. Ammann and is now maintained and developed by Daniel
 Lidström.

-## 1.3. File Format Support
+## 3. File Format Support

 Duplo has built in support for the following
 file formats:
@@ -75,9 +77,9 @@ src\engine\geometry\SkinnedMeshGeometry.cpp(45)
 ...
 ```

-## 1.4. Installation
+## 4. Installation

-### 1.4.1. Docker
+### 4.1. Docker

 If you have Docker, the way to run Duplo is to use this command:

@@ -88,34 +90,52 @@ If you have Docker, the way to run Duplo is to use this command:

 This pulls the latest image and runs duplo. Note that you'll have to pipe the filenames into this command. A complete commandline sample will be shown below.

-### 1.4.2. Pre-built binaries
+### 4.2. Pre-built binaries

 Duplo is also available as a pre-built binary for (alpine) linux and macos. Grab the executable from the [releases](https://github.com/dlidstrom/Duplo/releases) page.

 You can of course build from source as well, and you'll have to do so to get a binary for Windows.

-## 1.5. Usage
+## 5. Usage

 Duplo works with a list of files. You can either specify a file that contains the list of files, or you can pass them using `stdin`.

 Run `duplo --help` on the command line to see the detailed options.

-### 1.5.1. Passing files using `stdin`
+### 5.1. Passing files using `stdin`
+
+In each of the following commands, `duplo` will write the duplicated blocks into `out.txt` in addition to the information written to stdout.
+
+#### 5.1.1. Bash

 ```bash
 # unix
 > find . -type f \( -iname "*.cpp" -o -iname "*.h" \) | duplo - out.txt
+```

+Let's break this down. `find . -type f \( -iname "*.cpp" -o -iname "*.h" \)` is a syntax to look recursively in the current directory (the `.` part) for files (the `-type f` part) matching `*.cpp` or `*.h` (case insensitive). The output from `find` is piped into `duplo` which then reads the filenames from `stdin` (the `-` tells `duplo` to get the filenames from `stdin`, a common unix convention in many commandline applications). The result of the analysis is then written to `out.txt`.
+
+#### 5.1.2. Windows
+
+```bash
 # windows
 > Get-ChildItem -Include "*.cpp", "*.h" -Recurse | % { $_.FullName } | Duplo.exe - out.txt
+```
+
+This works similarly to the Bash command, but uses PowerShell commands to achieve the same effect.
+
+#### 5.1.3. Docker

+```bash
 # Docker on unix
 > find . -type f \( -iname "*.cpp" -or -iname "*.h" \) | docker run --rm -i -w /src -v $(pwd):/src dlidstrom/duplo - out.txt
 ```

-In each of the above commands, `duplo` will write the duplicated blocks into `out.txt` in addition to the information written to stdout.
+This command also works in a similar fashion to the Bash command, but instead of piping into a local `duplo` executable, it will pipe into `duplo` running inside Docker. This is very convenient as you do not have to install `duplo` separately. You will have to install Docker though, if you haven't already. That is a good thing to do anyway, since it opens up a lot of possibilities apart from running `duplo`.
+
+Again, similarly to the Bash command, this uses `find` to find files in the current directory, then passes the file list to Docker which will pass it further into an instance of the latest version of `duplo`. The working directory in the `duplo` container should be `/src` (that's where the `duplo` executable is located) and the current path of your host machine will be mapped to `/src` when the container is running. The `-i` allows `stdin` of your host machine to be passed into Docker to allow `duplo` to read the filenames. Any parameters to `duplo` can be placed at the end of the command as you can see `- out.txt` has been.

-### 1.5.2. Passing files using file
+### 5.2. Passing files using file

 `duplo` can analyze files specified in a separate file:

@@ -135,30 +155,30 @@ In each of the above commands, `duplo` will write the duplicated blocks into `ou

 Again, the duplicated blocks are written to `out.txt`.

-### 1.5.3. Xml output
+### 5.3. Xml output

 Duplo can also output xml and there is a stylesheet that will format the result for viewing in a browser. This can be used as a report tab in your continuous integration tool (TeamCity, etc).

-## 1.6. Feedback and Bug Reporting
+## 6. Feedback and Bug Reporting

 Please open an issue to discuss feedback,
 feature requests and bug reports.

-## 1.7. Algorithm Background
+## 7. Algorithm Background

 Duplo uses the same techniques as Duploc to detect duplicated code blocks. See
 [Duca99bCodeDuplication](http://scg.unibe.ch/archive/papers/Duca99bCodeDuplication.pdf) for
 further information.

-### 1.7.1. Performance Measurements
+### 7.1. Performance Measurements

 | System | Files | Loc's | Time |
 |-|-|-|-|
 | Quake2 | 266 | 102740 | 18sec |

-## 1.8. Developing
+## 8. Developing

-### 1.8.1. Unix
+### 8.1. Unix

 You need `CMake` and preferrably `fswatch` for the best experience.

@@ -183,11 +203,11 @@ build/> popd
 > ./watch.sh
 ```

-### 1.8.2. Windows
+### 8.2. Windows

 Use Visual Studio 2019 to open the included solution file (or try `CMake`).

-### 1.8.3. Additional Language Support
+### 8.3. Additional Language Support

 Duplo can analyze all text files regardless of format, but it has special support for some programming languages (C++, C#, Java, for example). This allows Duplo to improve the duplication detection as it can ignore preprocessor directives and/or comments.

@@ -196,7 +216,7 @@ To implement support for a new language, there are a couple of options (in order
 1. Implement `FileTypeBase` which has support for handling comments and preprocessor directives. You just need to decide what is a comment. With this option you need to implement a couple of methods, one which is `CreateLineFilter`. This is to remove multiline comments. Look at `CstyleCommentsFilter` for an example.
 2. Implement `IFileType` interface directly. This gives you the most freedom but also is the hardest option of course.

-### 1.8.4. Language Suggestions
+### 8.4. Language Suggestions

 - JavaScript (easy, just look at the existing C-based ones)
 - Ruby
@@ -212,7 +232,7 @@ To implement support for a new language, there are a couple of options (in order

 Send me a pull request!

-## 1.9. Changes
+## 9. Changes

 - 0.5
   - Fixed malformed xml (thanks [@ArsMasiuk](https://github.com/ArsMasiuk)!)
@@ -228,7 +248,7 @@ Send me a pull request!
   - Fixed limitation of total number of lines of code
   - Checking of arbitrary files

-## 1.10. License
+## 10. License

 Duplo is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by

compile.sh

Lines changed: 3 additions & 0 deletions
@@ -1,4 +1,7 @@
 #!/bin/bash
+# to run this, first set the DUPLO_VERSION environment variable. Otherwise
+# some of the tests might fail. So, do this:
+# > export DUPLO_VERSION=v1.0.0
 p() {
   now="$(date +'%r')"
   printf "$(tput setaf 1)%s$(tput sgr0) | $(tput bold)$1$(tput sgr0)\n" "$now";

src/Duplo.cpp

Lines changed: 30 additions & 8 deletions
@@ -172,10 +172,26 @@ namespace {
         std::ostream& outFile) {
         unsigned duplicateLines = 0;
         if (xml) {
-            outFile << " <set LineCount=\"" << count << "\">" << std::endl;
-            outFile << " <block SourceFile=\"" << source1.GetFilename() << "\" StartLineNumber=\"" << source1.GetLine(line1).GetLineNumber() << "\"/>" << std::endl;
-            outFile << " <block SourceFile=\"" << source2.GetFilename() << "\" StartLineNumber=\"" << source2.GetLine(line2).GetLineNumber() << "\"/>" << std::endl;
-            outFile << " <lines xml:space=\"preserve\">" << std::endl;
+            outFile
+                << " <set LineCount=\"" << count << "\">"
+                << std::endl;
+            int startLineNumber1 = source1.GetLine(line1).GetLineNumber();
+            int endLineNumber1 = source1.GetLine(line1 + count).GetLineNumber();
+            outFile
+                << " <block SourceFile=\"" << source1.GetFilename()
+                << "\" StartLineNumber=\"" << startLineNumber1
+                << "\" EndLineNumber=\"" << endLineNumber1 << "\"/>"
+                << std::endl;
+            int startLineNumber2 = source2.GetLine(line2).GetLineNumber();
+            int endLineNumber2 = source2.GetLine(line2 + count).GetLineNumber();
+            outFile
+                << " <block SourceFile=\"" << source2.GetFilename()
+                << "\" StartLineNumber=\"" << startLineNumber2
+                << "\" EndLineNumber=\"" << endLineNumber2 << "\"/>"
+                << std::endl;
+            outFile
+                << " <lines xml:space=\"preserve\">"
+                << std::endl;
             for (int j = 0; j < count; j++) {
                 // replace various characters/ strings so that it doesn't upset the XML parser
                 std::string tmpstr = source1.GetLine(j + line1).GetLine();
@@ -199,8 +215,14 @@ namespace {
             outFile << " </lines>" << std::endl;
             outFile << " </set>" << std::endl;
         } else {
-            outFile << source1.GetFilename() << "(" << source1.GetLine(line1).GetLineNumber() << ")" << std::endl;
-            outFile << source2.GetFilename() << "(" << source2.GetLine(line2).GetLineNumber() << ")" << std::endl;
+            outFile
+                << source1.GetFilename()
+                << "(" << source1.GetLine(line1).GetLineNumber() << ")"
+                << std::endl;
+            outFile
+                << source2.GetFilename()
+                << "(" << source2.GetLine(line2).GetLineNumber() << ")"
+                << std::endl;
             for (int j = 0; j < count; j++) {
                 outFile << source1.GetLine(j + line1).GetLine() << std::endl;
                 duplicateLines++;
@@ -426,7 +448,7 @@ void Duplo::Run(const Options& options) {
             << std::endl;
     } else {
         outfile
-            << "Configuration: "
+            << "Configuration:"
             << std::endl
             << " Number of files: "
             << files
@@ -444,7 +466,7 @@ void Duplo::Run(const Options& options) {
             << options.GetIgnoreSameFilename()
             << std::endl
             << std::endl
-            << "Results: "
+            << "Results:"
             << std::endl
             << " Lines of code: "
             << locsTotal

src/SourceLine.cpp

Lines changed: 8 additions & 8 deletions
@@ -1,8 +1,8 @@
 #include "SourceLine.h"
 #include "HashUtil.h"
 #include "SourceFile.h"
-
-#include <algorithm>
+
+#include <algorithm>

 SourceLine::SourceLine(const std::string& line, int lineNumber) {
     m_line = line;
@@ -11,16 +11,16 @@ SourceLine::SourceLine(const std::string& line, int lineNumber) {
     std::string cleanLine;

     // Remove all white space and noise (tabs etc)
-    std::copy_if(
-        std::begin(line),
-        std::end(line),
-        std::back_inserter(cleanLine),
+    std::copy_if(
+        std::begin(line),
+        std::end(line),
+        std::back_inserter(cleanLine),
         [](char c) { return c > ' '; });
     m_hash = HashUtil::Hash(cleanLine.c_str(), cleanLine.size());
 }

 int SourceLine::GetLineNumber() const {
-    return m_lineNumber;
+    return m_lineNumber + 1;
 }

 bool SourceLine::operator==(const SourceLine& other) const {
@@ -30,7 +30,7 @@ bool SourceLine::operator==(const SourceLine& other) const {
 const std::string& SourceLine::GetLine() const {
     return m_line;
 }
-
+
 unsigned long SourceLine::GetHash() const {
     return m_hash;
 }
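
As a rough illustration of the `SourceLine` behaviour touched in this file, the sketch below reproduces the whitespace filter and the one-based line number outside of Duplo. `StripNoise`, `HashLine` and `ToReportedLineNumber` are hypothetical names, `std::hash` is only a stand-in for `HashUtil::Hash` (whose implementation is not part of this diff), and the reading that `m_lineNumber` is stored zero-based is an inference from the `+ 1` change, not something the diff states.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <string>

// Same predicate as in the diff: keep only characters greater than ' ',
// which drops spaces, tabs and other control characters.
inline std::string StripNoise(const std::string& line) {
    std::string cleanLine;
    std::copy_if(std::begin(line), std::end(line),
                 std::back_inserter(cleanLine),
                 [](char c) { return c > ' '; });
    return cleanLine;
}

// Assumption: std::hash stands in for HashUtil::Hash, which is not shown.
inline std::size_t HashLine(const std::string& line) {
    return std::hash<std::string>{}(StripNoise(line));
}

// Assumption: the stored line number is zero-based, so the reported
// (original) line number is one higher, as in the new GetLineNumber().
inline int ToReportedLineNumber(int storedLineNumber) {
    return storedLineNumber + 1;
}
```

Because the hash is taken over the stripped text, two lines that differ only in spacing or indentation hash identically, which is presumably what lets reformatted copies of a block still be matched.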

tests/Simple/LineNumbers.c

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+AAAAA
+BBBBB
+CCCCC
+DDDDD
+EEEEE
+/* some comment to offset the line numbers */
+AAAAA
+BBBBB
+CCCCC
+// skip this line
+DDDDD
+EEEEE
+FFFFF

tests/Simple/LineNumbers.lst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+tests/Simple/LineNumbers.c

tests/Simple/test-xml.bats

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+setup() {
+  run ./build/duplo -xml tests/Simple/LineNumbers.lst out.xml
+}
+
+@test "LineNumbers.c" {
+  [ "$status" -eq 0 ]
+  [ "${lines[0]}" = "Loading and hashing files ... 2 done." ]
+  [ "${lines[1]}" = "tests/Simple/LineNumbers.c found: 1 block(s)" ]
+}
+
+@test "LineNumbers.c out.xml" {
+  run cat out.xml
+  printf 'Lines:\n'
+  printf 'lines %s\n' "${lines[@]}" >&2
+  [ "${lines[0]}" = "<?xml version=\"1.0\"?>" ]
+  [ "${lines[1]}" = "<duplo>" ]
+  [ "${lines[2]}" = " <set LineCount=\"5\">" ]
+  [ "${lines[3]}" = " <block SourceFile=\"tests/Simple/LineNumbers.c\" StartLineNumber=\"7\" EndLineNumber=\"13\"/>" ]
+  [ "${lines[4]}" = " <block SourceFile=\"tests/Simple/LineNumbers.c\" StartLineNumber=\"1\" EndLineNumber=\"7\"/>" ]
+  [ "${lines[5]}" = " <lines xml:space=\"preserve\">" ]
+  [ "${lines[6]}" = " <line Text=\"AAAAA\"/>" ]
+  [ "${lines[7]}" = " <line Text=\"BBBBB\"/>" ]
+  [ "${lines[8]}" = " <line Text=\"CCCCC\"/>" ]
+  [ "${lines[9]}" = " <line Text=\"DDDDD\"/>" ]
+  [ "${lines[10]}" = " <line Text=\"EEEEE\"/>" ]
+  [ "${lines[11]}" = " </lines>" ]
+  [ "${lines[12]}" = " </set>" ]
+  [ "${lines[13]}" = "</duplo>" ]
+}
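
The `StartLineNumber` and `EndLineNumber` values expected by this test can be traced by hand. The sketch below assumes that the two comment lines of `LineNumbers.c` (6 and 10) are dropped before matching, which is inferred from the expected values and is not shown in this diff. `FilteredLineNumbers` and `BlockRange` are hypothetical helpers, not Duplo functions. The end value is read from the entry at index `first + count`, mirroring `GetLine(line1 + count)` in `src/Duplo.cpp`, so it is the line number of the first retained line after the block.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Original (one-based) line numbers of the lines assumed to survive
// filtering in tests/Simple/LineNumbers.c: lines 6 and 10 are comments.
inline std::vector<int> FilteredLineNumbers() {
    return {1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 13};
}

// Returns {StartLineNumber, EndLineNumber} for a block of `count` retained
// lines starting at index `first`. The end is taken from index
// first + count, i.e. one entry past the block.
inline std::pair<int, int> BlockRange(const std::vector<int>& lineNumbers,
                                      std::size_t first, std::size_t count) {
    // Hypothetical guard: the entry after the block must exist.
    assert(first + count < lineNumbers.size());
    return {lineNumbers[first], lineNumbers[first + count]};
}
```

With this data, `BlockRange(FilteredLineNumbers(), 5, 5)` gives 7 and 13, and `BlockRange(FilteredLineNumbers(), 0, 5)` gives 1 and 7, which are the two `<block>` lines the test expects. A block that ran to the last retained line would index one past the end, which may be why the fixture ends with `FFFFF`. Whether Duplo guards that case is not visible in this diff.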