Have you searched if there is an existing feature request for this?
Feature description
Most modern sites embed structured metadata (<script type="application/ld+json">, <meta property="og:*">, Twitter Cards, microdata) for SEO and social sharing. This data is intentionally stable across UI redesigns, which aligns well with Scrapling's adaptive-by-default philosophy.
Today users have to parse it manually:
import json

# Collect every JSON-LD node on the page into one flat list.
scripts = page.css('script[type="application/ld+json"]::text').getall()
data = []
for s in scripts:
    try:
        parsed = json.loads(s)
        if isinstance(parsed, list):
            data.extend(parsed)
        elif isinstance(parsed, dict) and "@graph" in parsed:
            data.extend(parsed["@graph"])
        else:
            data.append(parsed)
    except json.JSONDecodeError:
        pass  # skip malformed blocks

# Open Graph tags, keyed without the "og:" prefix.
og = {
    m.attrib["property"].replace("og:", ""): m.attrib.get("content")
    for m in page.css('meta[property^="og:"]')
}
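The Open Graph comprehension above needs a near-identical twin for Twitter Cards, which conventionally use `name="twitter:*"` rather than `property=`. Below is a rough, dependency-free sketch of a shared helper. `MetaCollector` and `collect_meta` are hypothetical names, and it uses the stdlib `html.parser` only so it runs standalone; a real implementation would presumably sit on Scrapling's own tree. Keeping the first occurrence of a repeated key is also an assumption.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> tags whose property/name starts with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Open Graph conventionally uses property=, Twitter Cards name=;
        # accept either, since real pages mix the two.
        key = attrs.get("property") or attrs.get("name") or ""
        content = attrs.get("content")
        if key.startswith(self.prefix) and content is not None:
            # Assumption: the first occurrence wins. Only the leading
            # prefix is stripped, not every "og:" inside the key.
            self.found.setdefault(key[len(self.prefix):], content)


def collect_meta(html, prefix):
    """Return {suffix: content} for all meta tags matching the prefix."""
    parser = MetaCollector(prefix)
    parser.feed(html)
    parser.close()
    return parser.found
```

With something like this, `collect_meta(html, "og:")` and `collect_meta(html, "twitter:")` would cover both cases with one code path.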
Proposed first-class API on Selector:
page.json_ld() # list[dict]; flattens @graph; tolerant of malformed JSON
page.opengraph() # dict of og:* meta
page.twitter_card() # dict of twitter:* meta
page.microdata() # list[dict] parsed from itemscope/itemprop
page.structured_data() # everything above, grouped by source
page.metadata() # normalized summary: {title, description, image, type, ...} fused across sources
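To make the "fused across sources" idea in metadata() concrete, here is a minimal sketch of one possible merge. Everything in it is an assumption open to feedback: the function name `fuse_metadata`, the precedence order (JSON-LD, then Open Graph, then Twitter Cards), and the simplification of reading only the first JSON-LD node and treating `image` as a plain string.

```python
def fuse_metadata(json_ld, opengraph, twitter):
    """Merge already-extracted sources into one summary dict.

    json_ld is a list of dicts; opengraph and twitter are flat dicts
    keyed without their "og:" / "twitter:" prefixes.
    """
    # Simplification: only the first JSON-LD node is consulted.
    node = json_ld[0] if json_ld else {}

    # Hypothetical precedence: candidates are listed from most to least
    # trusted, and the first non-empty value wins.
    candidates = {
        "title": [node.get("headline"), node.get("name"),
                  opengraph.get("title"), twitter.get("title")],
        "description": [node.get("description"),
                        opengraph.get("description"),
                        twitter.get("description")],
        "image": [node.get("image"), opengraph.get("image"),
                  twitter.get("image")],
        "type": [node.get("@type"), opengraph.get("type")],
    }
    return {
        field: next((value for value in values if value), None)
        for field, values in candidates.items()
    }
```

Fields that no source provides come back as `None`, so callers always get the same keys regardless of what the page embeds.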
Why this fits Scrapling:
- Aligns with the existing ROADMAP item "Add the ability to auto-detect schemas in pages and manipulate them".
- No new heavy deps (json is stdlib; microdata can be done with the existing lxml).
- Pure addition on Selector; it doesn't touch fetchers or the adaptive parser.
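As a rough feasibility check for the microdata point, here is a simplified sketch of an itemscope/itemprop reader. It uses the stdlib `html.parser` instead of lxml purely so it runs with no dependencies. The class name, the `{"type", "properties"}` output shape, and the choice to store every property as a list are all assumptions, and `itemref`/`itemid` are not handled.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose property value comes from an attribute, not their text.
VALUE_ATTR = {"meta": "content", "a": "href", "link": "href",
              "img": "src", "time": "datetime"}


class MicrodataSketch(HTMLParser):
    """Hypothetical, simplified microdata reader (no itemref/itemid)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []    # top-level items, in document order
        self._open = []    # one frame per open non-void element
        self._scopes = []  # items currently being filled, innermost last

    def _add(self, owner, names, value):
        # itemprop may list several space-separated names for one value.
        for name in names.split():
            owner["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        owner = self._scopes[-1] if self._scopes else None
        names = attrs.get("itemprop")
        frame = {"tag": tag, "closes_item": False,
                 "owner": None, "names": None, "text": []}
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if names and owner is not None:
                self._add(owner, names, item)  # nested item
            else:
                self.items.append(item)        # top-level item
            if tag not in VOID_TAGS:
                self._scopes.append(item)
                frame["closes_item"] = True
        elif names and owner is not None:
            attr = VALUE_ATTR.get(tag)
            if attr and attrs.get(attr) is not None:
                self._add(owner, names, attrs[attr])
            elif tag not in VOID_TAGS:
                # Value is the element's text, known only at the end tag.
                frame["owner"], frame["names"] = owner, names
        if tag not in VOID_TAGS:
            self._open.append(frame)

    def handle_data(self, data):
        # Text belongs to every open property element that encloses it.
        for frame in self._open:
            if frame["names"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if not any(frame["tag"] == tag for frame in self._open):
            return  # stray closing tag, or a self-closed void element
        # Pop until the matching element, tolerating unclosed children.
        while self._open:
            frame = self._open.pop()
            if frame["names"]:
                text = "".join(frame["text"]).strip()
                self._add(frame["owner"], frame["names"], text)
            if frame["closes_item"]:
                self._scopes.pop()
            if frame["tag"] == tag:
                break


def parse_microdata(html):
    """Return the list of top-level microdata items found in html."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items
```

The whole thing is a single pass with two small stacks, which suggests the lxml version would not need any new machinery either.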
Reference implementations in the ecosystem: extruct, metascraper.
Happy to send a PR against dev if there's interest — would love early feedback on naming / scope before implementing.