Commit

Merge branch 'release/v0.4.1-alpha'
andreaskoch committed Nov 16, 2020
2 parents a3e9041 + d7d2c18 commit 3494175
Showing 5 changed files with 33 additions and 4 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).

## [v0.4.1-alpha] - 2020-11-16

### Fixed
- Don't visit URLs twice (introduced with v0.4.0-alpha)

## [v0.4.0-alpha] - 2020-11-05

Logging
18 changes: 17 additions & 1 deletion README.md
@@ -61,8 +61,24 @@ Date and time #worker Status Code Bytes Response Time URL

You can download binaries for Linux, macOS and Windows from [github.com »andreaskoch » gargantua » releases](https://github.com/andreaskoch/gargantua/releases):

Linux:

```bash
curl -L https://github.com/andreaskoch/gargantua/releases/download/v0.4.1-alpha/gargantua_linux_amd64 -o gargantua
chmod +x gargantua
```

macOS:

```bash
curl -L https://github.com/andreaskoch/gargantua/releases/download/v0.4.1-alpha/gargantua_darwin_amd64 -o gargantua
chmod +x gargantua
```

Windows:

```bash
-wget https://github.com/andreaskoch/gargantua/releases/download/v0.3.0-alpha/gargantua_linux_amd64
+curl -L https://github.com/andreaskoch/gargantua/releases/download/v0.4.1-alpha/gargantua_windows_amd64 -o gargantua.exe
```

## Docker Image
4 changes: 2 additions & 2 deletions crawler.go
@@ -50,14 +50,14 @@ func crawl(xmlSitemapURL url.URL, options CrawlOptions, stop chan bool) error {

		case targetURL := <-urls:
			// skip URLs we have already seen
-			_, alreadyVisited := visitedURLs[targetURL.String()]
+			_, alreadyVisited := visitedURLs[targetURL.getUrl()]

			if alreadyVisited {
				continue
			}

			// mark the URL as visited
-			visitedURLs[targetURL.String()] = targetURL
+			visitedURLs[targetURL.getUrl()] = targetURL

			debugf("Sending URL to work queue: %s", targetURL.String())

8 changes: 8 additions & 0 deletions http.go
@@ -93,6 +93,14 @@ type crawlerUrl struct {
	parent url.URL
}

func (u crawlerUrl) getUrl() string {
	return u.url.String()
}

func (u crawlerUrl) getParent() string {
	return u.parent.String()
}

func (u crawlerUrl) String() string {
	return fmt.Sprintf("%s (%s)", u.url.String(), u.parent.String())
}
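
Judging from the diff alone, the duplicate visits appear to come from the map key: `String()` includes the parent URL, so the same target reached from two different parents produced two distinct keys in `visitedURLs`. The following is a minimal sketch of that effect, not the project's actual code. The `crawlerUrl` type is reduced to what the diff shows, while `mustParse`, `countUnique`, and the `example.com` URLs are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// crawlerUrl is a reduced copy of the struct shown in http.go:
// a target URL plus the page on which the link was found.
type crawlerUrl struct {
	url    url.URL
	parent url.URL
}

// getUrl returns only the target URL (the key used after the fix).
func (u crawlerUrl) getUrl() string {
	return u.url.String()
}

// String returns target and parent (the key used before the fix).
func (u crawlerUrl) String() string {
	return fmt.Sprintf("%s (%s)", u.url.String(), u.parent.String())
}

// mustParse is a hypothetical helper for this sketch only.
func mustParse(raw string) url.URL {
	parsed, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return *parsed
}

// countUnique is a hypothetical helper: it counts how many distinct
// keys the given key function yields for the items.
func countUnique(items []crawlerUrl, key func(crawlerUrl) string) int {
	seen := make(map[string]struct{})
	for _, item := range items {
		seen[key(item)] = struct{}{}
	}
	return len(seen)
}

func main() {
	target := mustParse("https://example.com/page")

	// The same target, discovered on two different parent pages.
	found := []crawlerUrl{
		{url: target, parent: mustParse("https://example.com/a")},
		{url: target, parent: mustParse("https://example.com/b")},
	}

	fmt.Println("keys via String():", countUnique(found, crawlerUrl.String))
	fmt.Println("keys via getUrl():", countUnique(found, crawlerUrl.getUrl))
}
```

Keying on `getUrl()` collapses both entries into one, which matches the changelog entry "Don't visit URLs twice". `String()` remains useful for debug output, where the parent is worth showing.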
2 changes: 1 addition & 1 deletion main.go
@@ -11,7 +11,7 @@ import (
)

const applicationName = "gargantua"
-const applicationVersion = "v0.4.0-alpha"
+const applicationVersion = "v0.4.1-alpha"

var defaultUserAgent = fmt.Sprintf("%s bot (https://github.com/andreaskoch/gargantua)", applicationName)
