Description
Hi, I ran into a problem. After running the scraper, I found that the content of some links is not being crawled, and the logs report 0 records. I have tried several approaches, but the content is still not crawled.
Steps to reproduce
Here is part of my config (the lvl0 default values 文档 and 开发指南 mean "Documentation" and "Developer Guide"):
{
  "index_name": "docs",
  "sitemap_urls": [
    "https://mydomain/sitemap.xml"
  ],
  "start_urls": [
    {
      "url": "https://mydomain/guides",
      "tags": [
        "guides"
      ],
      "selectors_key": "guides"
    }
  ],
  "stop_urls": [],
  "selectors": {
    "default": {
      "lvl0": {
        "selector": "",
        "global": true,
        "default_value": "文档"
      },
      "lvl1": "article h1",
      "lvl2": "article h2",
      "lvl3": "article h3",
      "lvl4": "article h4",
      "lvl5": "article h5, article th, article td:first-child",
      "lvl6": "article h6",
      "text": "article p, article li, article td"
    },
    "guides": {
      "lvl0": {
        "selector": "",
        "global": true,
        "default_value": "开发指南"
      },
      "lvl1": "article h1",
      "lvl2": "article h2",
      "lvl3": "article h3",
      "lvl4": "article h4",
      "lvl5": "article h5, article th, article td:first-child",
      "lvl6": "article h6",
      "text": "article p, article li, article td"
    }
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": [
      "language",
      "version",
      "type",
      "docusaurus_tag",
      "tags"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  },
  "nb_hits": 2227
}
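One possible cause of the 0 records: every selector in the config above is scoped to `article` (for example `article h1`, `article p`), so a page whose content is not inside an `<article>` element would match nothing and would likely produce no records. The following is a hypothetical diagnostic helper, not part of the scraper or Typesense, that uses only Python's standard `html.parser` to count the heading and text elements nested inside `<article>`. It approximates the descendant selectors and ignores `:first-child`.

```python
from html.parser import HTMLParser

# Tags targeted by the "article ..." selectors in the config
# (lvl1-lvl6 headings plus the p, li, td, th text/lvl5 selectors).
WATCHED_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "td", "th"}


class ArticleSelectorCounter(HTMLParser):
    """Counts watched tags that appear anywhere inside an <article> element.

    Hypothetical helper: approximates descendant selectors like "article h1"
    with the standard library; it is not the scraper's own matching logic.
    """

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # > 0 while inside one or more <article> elements
        self.counts = {tag: 0 for tag in sorted(WATCHED_TAGS)}

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif self.article_depth > 0 and tag in WATCHED_TAGS:
            self.counts[tag] += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1


def count_article_matches(html: str) -> dict:
    """Return how many watched tags occur inside <article> in the given HTML."""
    parser = ArticleSelectorCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

For example, feeding it the HTML of an affected page, fetched with `urllib.request.urlopen(url).read().decode("utf-8")`, and getting all zeros would suggest the selectors do not fit that page's markup.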
Expected Behavior
I expect the content of all the links in the configuration to be crawled and indexed into Typesense.
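The config defines a single start_urls entry, `https://mydomain/guides`, alongside the sitemap. If the scraper only crawls sitemap URLs that match a start_urls entry (an assumption here; the scraper's documentation on `sitemap_urls` should confirm how the two settings interact), pages outside `/guides` would be skipped. The hypothetical helper below parses a standard sitemap (a `<urlset>` file, not a sitemap index) with `xml.etree.ElementTree` and lists the URLs that do not start with any configured start URL, using a plain string-prefix match.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list:
    """Return the <loc> values of the <url> entries in a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def uncovered_urls(urls, start_urls) -> list:
    """Return URLs that do not begin with any start_urls entry.

    Hypothetical check: assumes prefix matching against start_urls decides
    what gets crawled; entries may be plain strings or {"url": ...} objects.
    """
    prefixes = [entry["url"] if isinstance(entry, dict) else entry for entry in start_urls]
    return [u for u in urls if not any(u.startswith(p) for p in prefixes)]
```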
Actual Behavior
The content of the affected pages cannot be found in search.
Metadata
Typesense Version: possibly 0.24 (I'm not sure how to check the version)
OS: x86_64 GNU/Linux
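To check the Typesense version, the server's `/debug` endpoint returns a JSON body with a `version` field; the request needs an API key in the `X-TYPESENSE-API-KEY` header. A minimal sketch using only Python's standard library follows; the host (8108 is Typesense's default port) and the key are placeholders.

```python
import json
import urllib.request


def parse_version(body: str) -> str:
    """Extract the "version" field from a /debug JSON response body."""
    return json.loads(body)["version"]


def typesense_version(host: str, api_key: str) -> str:
    """Query the Typesense /debug endpoint and return the reported version.

    host is a placeholder such as "http://localhost:8108"; api_key is your key.
    """
    request = urllib.request.Request(
        host.rstrip("/") + "/debug",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_version(response.read().decode("utf-8"))
```

For example, `typesense_version("http://localhost:8108", "<admin-api-key>")` would return the version string reported by the running server.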