Commit 3494175

Merge branch 'release/v0.4.1-alpha'
2 parents a3e9041 + d7d2c18 commit 3494175

5 files changed: +33 -4 lines changed

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/)
 and this project adheres to [Semantic Versioning](http://semver.org/).
 
+## [v0.4.1-alpha] - 2020-11-16
+
+### Fixed
+- Don't visit URLs twice (introduced with v0.4.0-alpha)
+
 ## [v0.4.0-alpha] - 2020-11-05
 
 Logging

README.md

Lines changed: 17 additions & 1 deletion
@@ -61,8 +61,24 @@ Date and time #worker Status Code Bytes Response Time URL
 
 You can download binaries for Linux, macOS and Windows from [github.com »andreaskoch » gargantua » releases](https://github.com/andreaskoch/gargantua/releases):
 
+Linux:
+
+```bash
+curl -L https://github.com/andreaskoch/gargantua/releases/download/v0.4.1-alpha/gargantua_linux_amd64 -o gargantua
+chmod +x gargantua
+```
+
+macOS:
+
+```bash
+curl -L https://github.com/andreaskoch/gargantua/releases/download/v0.4.1-alpha/gargantua_darwin_amd64 -o gargantua
+chmod +x gargantua
+```
+
+Windows:
+
 ```bash
-wget https://github.com/andreaskoch/gargantua/releases/download/v0.3.0-alpha/gargantua_linux_amd64
+curl -L https://github.com/andreaskoch/gargantua/releases/download/v0.4.1-alpha/gargantua_windows_amd64 -o gargantua.exe
 ```
 
 ## Docker Image

crawler.go

Lines changed: 2 additions & 2 deletions
@@ -50,14 +50,14 @@ func crawl(xmlSitemapURL url.URL, options CrawlOptions, stop chan bool) error {
 
 		case targetURL := <-urls:
 			// skip URLs we have already seen
-			_, alreadyVisited := visitedURLs[targetURL.String()]
+			_, alreadyVisited := visitedURLs[targetURL.getUrl()]
 
 			if alreadyVisited {
 				continue
 			}
 
 			// mark the URL as visited
-			visitedURLs[targetURL.String()] = targetURL
+			visitedURLs[targetURL.getUrl()] = targetURL
 
 			debugf("Sending URL to work queue: %s", targetURL.String())

http.go

Lines changed: 8 additions & 0 deletions
@@ -93,6 +93,14 @@ type crawlerUrl struct {
 	parent url.URL
 }
 
+func (u crawlerUrl) getUrl() string {
+	return u.url.String()
+}
+
+func (u crawlerUrl) getParent() string {
+	return u.parent.String()
+}
+
 func (u crawlerUrl) String() string {
 	return fmt.Sprintf("%s (%s)", u.url.String(), u.parent.String())
 }

main.go

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ import (
 )
 
 const applicationName = "gargantua"
-const applicationVersion = "v0.4.0-alpha"
+const applicationVersion = "v0.4.1-alpha"
 
 var defaultUserAgent = fmt.Sprintf("%s bot (https://github.com/andreaskoch/gargantua)", applicationName)
