[Bug] Map results include extra links #1426

Closed
@sawyerclemmons

Description

Describe the Bug
When calling the map endpoint of the Firecrawl API, extra URLs are sometimes included in the results. This seems to happen after calling the map endpoint for a child URL that does not exist and then calling it again for the parent URL.

To Reproduce
Steps to reproduce the issue:

  1. Call the /map endpoint for a parent URL, for example:
curl --location 'https://api.firecrawl.dev/v1/map' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ***' \
--data '{
    "url": "https://faculty.cs.byu.edu/~rodham/cs240"
}'

The results do not include the URL https://faculty.cs.byu.edu/~rodham/cs240/this-path-is-invalid

  2. Call the /map endpoint with a child URL that does not exist and resolves to a 404:
curl --location 'https://api.firecrawl.dev/v1/map' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ***' \
--data '{
    "url": "https://faculty.cs.byu.edu/~rodham/cs240/this-path-is-invalid"
}'

This returns a map result containing only the provided URL, because the path is not valid and leads to a forbidden page.

  3. Call the /map endpoint for the parent URL again:
curl --location 'https://api.firecrawl.dev/v1/map' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ***' \
--data '{
    "url": "https://faculty.cs.byu.edu/~rodham/cs240"
}'

Now the results do include the URL https://faculty.cs.byu.edu/~rodham/cs240/this-path-is-invalid, even though that path does not exist.
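
The three calls above can also be scripted so the difference between the first and last parent map is printed directly. This is only a rough sketch using the Python standard library: it assumes the /map response is JSON with a "links" array, and the helper names (build_map_request, map_links, extra_links, reproduce) are made up for illustration and are not part of Firecrawl.

```python
import json
import urllib.request

MAP_ENDPOINT = "https://api.firecrawl.dev/v1/map"  # endpoint used in the curl commands


def build_map_request(url, api_key):
    """Build the same POST request as the curl commands in this report."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        MAP_ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def map_links(url, api_key):
    """Call /map and return the mapped URLs as a set.

    Assumption: the response body is JSON containing a "links" array.
    """
    with urllib.request.urlopen(build_map_request(url, api_key)) as resp:
        payload = json.load(resp)
    return set(payload.get("links", []))


def extra_links(before, after):
    """URLs present in the second parent map but absent from the first."""
    return sorted(set(after) - set(before))


def reproduce(api_key, parent, invalid_child):
    """Run steps 1 to 3 and return the URLs that newly appeared."""
    before = map_links(parent, api_key)   # step 1: map the parent
    map_links(invalid_child, api_key)     # step 2: map the non-existent child
    after = map_links(parent, api_key)    # step 3: map the parent again
    return extra_links(before, after)
```

Calling reproduce(api_key, "https://faculty.cs.byu.edu/~rodham/cs240", "https://faculty.cs.byu.edu/~rodham/cs240/this-path-is-invalid") with a real key would be expected to list the invalid child URL if the behavior described here still occurs.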

Expected Behavior
Non-existent paths should not be included in the map results.
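
For anyone who needs to guard against this on the client side, one possible check is to drop mapped URLs that do not resolve. The sketch below is an assumption-laden illustration, not Firecrawl functionality: http_status and drop_dead_links are hypothetical helpers, the check issues one HEAD request per link, and it assumes the target server answers HEAD requests sensibly. Network failures (urllib.error.URLError) are not handled.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request to url."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still useful
        return err.code


def drop_dead_links(links, get_status=http_status):
    """Keep only links whose status code is below 400.

    get_status is injectable so the filter can be exercised without network access.
    """
    return [link for link in links if get_status(link) < 400]
```

Under this check, a link that answers 403 or 404, such as the invalid child path in this report, would be filtered out of the list.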

Screenshots
N/A

Environment
Using the Firecrawl API here, so no specific environment info.

Logs
N/A

Additional Context
N/A
