Commit e108978

metadata: synchronize tool metadata

Author: termuxhub-bot
1 parent dc4fd38 commit e108978

3 files changed

Lines changed: 194 additions & 137 deletions

metadata/readme/0156.md

Lines changed: 58 additions & 1 deletion
@@ -32,13 +32,14 @@
 - **JavaScript** parsing / crawling
 - Customizable **automatic form filling**
 - **Scope control** - Preconfigured field / Regex
+- **Knowledge base** - ML page-type / form classification (auto-downloaded model)
 - **Customizable output** - Preconfigured fields
 - INPUT - **STDIN**, **URL** and **LIST**
 - OUTPUT - **STDOUT**, **FILE** and **JSON**
 
 ## Installation
 
-katana requires Go 1.25+ to install successfully. If you encounter any installation issues, we recommend trying with the latest available version of Go, as the minimum required version may have changed. Run the command below or download a pre-compiled binary from the [release page](https://github.com/projectdiscovery/katana/releases).
+katana requires Go 1.26+ to install successfully. If you encounter any installation issues, we recommend trying with the latest available version of Go, as the minimum required version may have changed. Run the command below or download a pre-compiled binary from the [release page](https://github.com/projectdiscovery/katana/releases).
 
 ```console
 CGO_ENABLED=1 go install github.com/projectdiscovery/katana/cmd/katana@latest
@@ -620,6 +621,62 @@ Option to limit the number of pages crawled per domain. Prevents any single doma
 katana -u https://tesla.com -mdp 100
 ```
 
+## Knowledge Base Classification
+
+Katana can enrich crawl results with a **knowledge base** — machine-learning classification of each crawled page powered by [dit](https://github.com/HappyHackingSpace/dit). When enabled, every response is classified by **page type** (e.g. `login`, `error`, `captcha`, `parked`) and any forms on the page are identified, with the result attached to the `knowledgebase` field of the JSONL output. This works across **all engines** (standard and headless).
+
+> **Note**: The classification model is **downloaded automatically** on first use to `~/.dit/model.json` (from [Hugging Face](https://huggingface.co/datasets/happyhackingspace/dit)). This is a one-time, per-machine cost — subsequent runs reuse the cached model. No manual installation of `dit` is required.
+
+*`-knowledge-base`*
+----
+
+Enable knowledge base classification. Page-type and form classification is added to the `knowledgebase` field of each result.
+
+```console
+katana -u https://example.com -kb -jsonl
+```
+
+```json
+{
+  "timestamp": "...",
+  "request": { "...": "..." },
+  "response": {
+    "...": "...",
+    "knowledgebase": {
+      "PageType": "login",
+      "Forms": [{ "type": "login", "fields": { "username": "username or email", "password": "password" } }]
+    }
+  }
+}
+```
+
+*`-filter-page-type`*
+----
+
+Filter results to only the given page type(s). Enabling this implies `-kb` (the classifier is initialized automatically).
+
+```console
+katana -u https://example.com -fpt login,error
+```
+
+*`-kb-secrets`*
+----
+
+Enable the secrets extractor in the knowledge base, surfacing detected secrets (API keys, tokens, etc.) under the `secrets` key. Add `-kb-validate-secrets` to validate detected secrets against their provider — note this **sends live API calls**.
+
+```console
+katana -u https://example.com -kb-secrets
+```
+
+*`-kb-endpoints`*
+----
+
+Enable the endpoints extractor, which classifies requests as REST, GraphQL, SOAP, or XHR under the `endpoints` key.
+
+```console
+katana -u https://example.com -kb-endpoints
+```
+
 ## Authenticated Crawling
 
 Authenticated crawling involves including custom headers or cookies in HTTP requests to access protected resources. These headers provide authentication or authorization information, allowing you to crawl authenticated content / endpoint. You can specify headers directly in the command line or provide them as a file with katana to perform authenticated crawling.
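
For anyone reviewing the Knowledge Base Classification section added in this diff, the following is a minimal Python sketch of how the `knowledgebase` field might be consumed downstream. It assumes the JSONL shape shown in the README sample (`response.knowledgebase.PageType`); the function name `filter_by_page_type`, the sample records, and their timestamps are hypothetical, and real katana output may carry additional or differently named fields.

```python
import json
from typing import Dict, Iterable, Iterator, Set


def filter_by_page_type(lines: Iterable[str], wanted: Set[str]) -> Iterator[Dict]:
    """Yield parsed JSONL records whose classified page type is in `wanted`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Field path assumed from the README sample:
        # response -> knowledgebase -> PageType
        kb = record.get("response", {}).get("knowledgebase") or {}
        if kb.get("PageType") in wanted:
            yield record


# Hypothetical sample lines, shaped like the README's JSON example.
SAMPLE = [
    json.dumps({
        "timestamp": "2024-01-01T00:00:00Z",
        "response": {"knowledgebase": {"PageType": "login", "Forms": []}},
    }),
    json.dumps({
        "timestamp": "2024-01-01T00:00:01Z",
        "response": {"knowledgebase": {"PageType": "error", "Forms": []}},
    }),
    # A record without classification data is skipped rather than raising.
    json.dumps({"timestamp": "2024-01-01T00:00:02Z", "response": {}}),
]

if __name__ == "__main__":
    for rec in filter_by_page_type(SAMPLE, {"login"}):
        print(rec["timestamp"], rec["response"]["knowledgebase"]["PageType"])
```

This mirrors on the consumer side what the diff describes `-fpt` doing inside katana. It can be useful when a crawl was saved with `-kb -jsonl` and needs to be re-filtered later without crawling again.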
