29 changes: 27 additions & 2 deletions features/llm-extract.mdx
Original file line number Diff line number Diff line change
@@ -14,7 +14,14 @@ import ExtractNoSchemaPython from "/snippets/v2/scrape/json/no-schema/python.mdx
import ExtractNoSchemaNode from "/snippets/v2/scrape/json/no-schema/js.mdx";
import ExtractNoSchemaCURL from "/snippets/v2/scrape/json/no-schema/curl.mdx";
import ExtractNoSchemaOutput from "/snippets/v2/scrape/json/no-schema/output.mdx";
+import EventExampleCURL from "/snippets/v2/scrape/json/events-example/curl.mdx";
+import EventExamplePython from "/snippets/v2/scrape/json/events-example/python.mdx";
+import EventExampleNode from "/snippets/v2/scrape/json/events-example/js.mdx";
+import EventExampleOutput from "/snippets/v2/scrape/json/events-example/output.mdx";

+<Note>
+**v2 API Change:** JSON schema extraction is fully supported in v2, but the API format has changed. In v2, the schema is embedded directly inside the format object as `formats: [{type: "json", schema: {...}}]`. The v1 `jsonOptions` parameter no longer exists in v2.
+</Note>
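
To make the change described in the note concrete, here is a minimal migration sketch. It assumes a v1-style request body that lists `"json"` as a plain string in `formats` and carries `schema` and `prompt` in a top-level `jsonOptions` object. That v1 layout and the helper name `migrate_v1_payload` are assumptions for illustration; neither is part of the Firecrawl SDK.

```python
from copy import deepcopy


def migrate_v1_payload(v1_body: dict) -> dict:
    """Return a v2-style body with jsonOptions folded into the formats array.

    Hypothetical helper: the v1 layout handled here is an assumption.
    """
    body = deepcopy(v1_body)  # leave the caller's dict untouched
    json_options = body.pop("jsonOptions", None) or {}

    migrated = []
    for fmt in body.get("formats", []):
        if fmt == "json":
            # v2 embeds the schema (and optional prompt) in the format object.
            json_format = {"type": "json"}
            for key in ("schema", "prompt"):
                if key in json_options:
                    json_format[key] = json_options[key]
            migrated.append(json_format)
        else:
            migrated.append(fmt)
    body["formats"] = migrated
    return body


v1_body = {
    "url": "https://firecrawl.dev",
    "formats": ["json"],
    "jsonOptions": {
        "schema": {"type": "object", "properties": {"company_mission": {"type": "string"}}},
    },
}
print(migrate_v1_payload(v1_body))
```

The helper copies the input first, so an existing v1 body can be kept around for comparison while requests are being moved over.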

## Scrape and extract structured data with Firecrawl

@@ -66,13 +73,31 @@ Output:

<ExtractNoSchemaOutput />

+### Real-world example: Extracting company information
+
+The following example extracts structured company information from a website using a JSON schema:
+
+<CodeGroup>
+
+<EventExamplePython />
+<EventExampleNode />
+<EventExampleCURL />
+
+</CodeGroup>
+
+Output:
+
+<EventExampleOutput />
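
As a small sketch of how a client might consume this output, the snippet below parses the example response body from `events-example/output.mdx` and reads the extracted fields from `data.json`. The helper name `extracted_json` is hypothetical, and the sketch applies to the raw HTTP response body only; SDK clients may wrap the payload in their own result objects.

```python
import json

# Example response body, copied from the events-example output snippet.
raw = """
{
  "success": true,
  "data": {
    "json": {
      "company_mission": "Turn websites into LLM-ready data",
      "supports_sso": true,
      "is_open_source": true,
      "is_in_yc": true
    }
  }
}
"""


def extracted_json(response: dict) -> dict:
    """Return the structured result from a scrape response (hypothetical helper)."""
    if not response.get("success"):
        raise ValueError("scrape request did not succeed")
    return response["data"]["json"]


company = extracted_json(json.loads(raw))
print(company["company_mission"])  # prints: Turn websites into LLM-ready data
```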

### JSON format options

-When using JSON mode, include an object in `formats`, for example:
+When using JSON mode in v2, include an object in `formats` with the schema embedded directly:

`formats: [{ type: 'json', schema: { ... }, prompt: '...' }]`

Parameters:

-- `schema`: JSON Schema describing the structured output you want.
+- `schema`: JSON Schema describing the structured output you want (required for schema-based extraction).
- `prompt`: Optional prompt to guide extraction (also used for no-schema extraction).

+**Important:** Unlike v1, v2 has no separate `jsonOptions` parameter. The schema and the optional prompt go directly inside the format object in the `formats` array.
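
As a rough illustration of these options, the sketch below assembles a v2 request body from plain dictionaries. The helper names `json_format` and `scrape_body` are hypothetical and not part of the Firecrawl SDK; only the `type`, `schema`, and `prompt` keys come from the parameter list above.

```python
from typing import Optional


def json_format(schema: Optional[dict] = None, prompt: Optional[str] = None) -> dict:
    """Build a JSON format object, leaving out options that were not given."""
    fmt = {"type": "json"}
    if schema is not None:
        fmt["schema"] = schema  # the schema lives inside the format object
    if prompt is not None:
        fmt["prompt"] = prompt  # used on its own for no-schema extraction
    return fmt


def scrape_body(url: str, fmt: dict) -> dict:
    """Wrap a single format object in a request body (hypothetical helper)."""
    return {"url": url, "formats": [fmt]}


schema = {
    "type": "object",
    "properties": {"company_mission": {"type": "string"}},
    "required": ["company_mission"],
}

with_schema = scrape_body("https://firecrawl.dev", json_format(schema=schema))
prompt_only = scrape_body(
    "https://firecrawl.dev",
    json_format(prompt="Extract the company mission from the page."),
)
print(with_schema)
print(prompt_only)
```

Both bodies keep every extraction option inside the single format object, which is the layout the v2 curl snippets in this change use.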
4 changes: 2 additions & 2 deletions snippets/v2/scrape/json/base/curl.mdx
@@ -3,7 +3,7 @@ curl -X POST https://api.firecrawl.dev/v2/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
-      "url": "https://docs.firecrawl.dev/",
+      "url": "https://firecrawl.dev",
      "formats": [ {
        "type": "json",
        "schema": {
@@ -31,4 +31,4 @@ curl -X POST https://api.firecrawl.dev/v2/scrape \
        }
      } ]
    }'
-```
+```
2 changes: 1 addition & 1 deletion snippets/v2/scrape/json/base/js.mdx
@@ -14,7 +14,7 @@ const schema = z.object({
  is_in_yc: z.boolean()
});

-const result = await app.scrape("https://docs.firecrawl.dev/", {
+const result = await app.scrape("https://firecrawl.dev", {
  formats: [{
    type: "json",
    schema: schema
7 changes: 4 additions & 3 deletions snippets/v2/scrape/json/base/python.mdx
@@ -1,9 +1,10 @@
```python Python
from firecrawl import Firecrawl
+from pydantic import BaseModel

app = Firecrawl(api_key="fc-YOUR-API-KEY")

-class JsonSchema(BaseModel):
+class CompanyInfo(BaseModel):
    company_mission: str
    supports_sso: bool
    is_open_source: bool
@@ -13,11 +14,11 @@ result = app.scrape(
    'https://firecrawl.dev',
    formats=[{
        "type": "json",
-        "schema": JsonSchema
+        "schema": CompanyInfo.model_json_schema()
    }],
    only_main_content=False,
    timeout=120000
)

print(result)
-```
+```
34 changes: 34 additions & 0 deletions snippets/v2/scrape/json/events-example/curl.mdx
@@ -0,0 +1,34 @@
```bash cURL
curl -X POST https://api.firecrawl.dev/v2/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
      "url": "https://firecrawl.dev/",
      "formats": [{
        "type": "json",
        "schema": {
          "type": "object",
          "properties": {
            "company_mission": {
              "type": "string"
            },
            "supports_sso": {
              "type": "boolean"
            },
            "is_open_source": {
              "type": "boolean"
            },
            "is_in_yc": {
              "type": "boolean"
            }
          },
          "required": [
            "company_mission",
            "supports_sso",
            "is_open_source",
            "is_in_yc"
          ]
        }
      }]
    }'
```
24 changes: 24 additions & 0 deletions snippets/v2/scrape/json/events-example/js.mdx
@@ -0,0 +1,24 @@
```js Node
import FirecrawlApp from "@mendable/firecrawl-js";
import { z } from "zod";

const app = new FirecrawlApp({
  apiKey: "fc-YOUR_API_KEY"
});

const companyInfoSchema = z.object({
  company_mission: z.string(),
  supports_sso: z.boolean(),
  is_open_source: z.boolean(),
  is_in_yc: z.boolean()
});

const result = await app.scrape("https://firecrawl.dev/", {
  formats: [{
    type: "json",
    schema: companyInfoSchema
  }]
});

console.log(result);
```
13 changes: 13 additions & 0 deletions snippets/v2/scrape/json/events-example/output.mdx
@@ -0,0 +1,13 @@
```json Output
{
  "success": true,
  "data": {
    "json": {
      "company_mission": "Turn websites into LLM-ready data",
      "supports_sso": true,
      "is_open_source": true,
      "is_in_yc": true
    }
  }
}
```
22 changes: 22 additions & 0 deletions snippets/v2/scrape/json/events-example/python.mdx
@@ -0,0 +1,22 @@
```python Python
from firecrawl import Firecrawl
from pydantic import BaseModel

app = Firecrawl(api_key="fc-YOUR-API-KEY")

class CompanyInfo(BaseModel):
    company_mission: str
    supports_sso: bool
    is_open_source: bool
    is_in_yc: bool

result = app.scrape(
    'https://firecrawl.dev/',
    formats=[{
        "type": "json",
        "schema": CompanyInfo.model_json_schema()
    }]
)

print(result)
```
4 changes: 2 additions & 2 deletions snippets/v2/scrape/json/no-schema/curl.mdx
@@ -3,10 +3,10 @@ curl -X POST https://api.firecrawl.dev/v2/scrape \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer YOUR_API_KEY' \
    -d '{
-      "url": "https://docs.firecrawl.dev/",
+      "url": "https://firecrawl.dev",
      "formats": [{
        "type": "json",
        "prompt": "Extract the company mission from the page."
      }]
    }'
-```
+```
2 changes: 1 addition & 1 deletion snippets/v2/scrape/json/no-schema/js.mdx
@@ -5,7 +5,7 @@ const app = new FirecrawlApp({
  apiKey: "fc-YOUR_API_KEY"
});

-const result = await app.scrape("https://docs.firecrawl.dev/", {
+const result = await app.scrape("https://firecrawl.dev", {
  formats: [{
    type: "json",
    prompt: "Extract the company mission from the page."