Commit 4c3ad5c

Add README [skip ci]
1 parent 6120b1a commit 4c3ad5c

1 file changed

Lines changed: 140 additions & 0 deletions

README.md

@@ -0,0 +1,140 @@
## Brick\StructuredData

<img src="https://raw.githubusercontent.com/brick/brick/master/logo.png" alt="" align="left" height="64">

A PHP library to read Microdata, RDFa Lite & JSON-LD structured data in HTML pages.

This library is a foundation to read schema.org structured data in [brick/schema](https://github.com/brick/schema),
but may be used with other vocabularies.

[![Build Status](https://secure.travis-ci.org/brick/structured-data.svg?branch=master)](http://travis-ci.org/brick/structured-data)
[![Coverage Status](https://coveralls.io/repos/brick/structured-data/badge.svg?branch=master)](https://coveralls.io/r/brick/structured-data?branch=master)
[![Latest Stable Version](https://poser.pugx.org/brick/structured-data/v/stable)](https://packagist.org/packages/brick/structured-data)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](http://opensource.org/licenses/MIT)

### Installation

This library is installable via [Composer](https://getcomposer.org/):

```bash
composer require brick/structured-data
```

### Requirements

This library requires PHP 7.2 or later. It makes use of the following extensions:

- [dom](https://www.php.net/manual/en/book.dom.php)
- [json](https://www.php.net/manual/en/book.json.php)
- [libxml](https://www.php.net/manual/en/book.libxml.php)

These extensions are enabled by default, and should be available in most PHP installations.

### Project status & release process

This library is under development. It is likely to change fast in the early `0.x` releases. However, the library follows a strict backward-compatibility (BC) break convention:

The current releases are numbered `0.x.y`. When a non-breaking change is introduced (adding new methods, fixing bugs,
optimizing existing code, etc.), `y` is incremented.

**When a breaking change is introduced, a new `0.x` version cycle is always started.**

It is therefore safe to lock your project to a given release cycle, such as `0.1.*`.

If you need to upgrade to a newer release cycle, check the [release history](https://github.com/brick/structured-data/releases)
for a list of changes introduced by each subsequent `0.x.0` version.

### Introduction

The library unifies reading the 3 supported formats (Microdata, RDFa Lite & JSON-LD) under a common interface:

```php
namespace Brick\StructuredData;

use DOMDocument;

interface Reader
{
    /**
     * Reads the items contained in the given document.
     *
     * @param DOMDocument $document The DOM document to read.
     * @param string      $url      The URL the document was retrieved from. This will be used only to resolve relative
     *                              URLs in property values. No attempt will be performed to connect to this URL.
     *
     * @return Item[] The top-level items.
     */
    public function read(DOMDocument $document, string $url) : array;
}
```

There are 3 implementations of this interface, one for each format:

- `MicrodataReader`
- `RdfaLiteReader`
- `JsonLdReader`

The `read()` method returns the top-level items found in the document. Every `Item` consists of:

- An optional id (`itemid` in Microdata, `resource` in RDFa Lite, `@id` in JSON-LD)
- An array of zero or more types; each type is a URL, for example `http://schema.org/Product`
- An associative array of zero or more properties; each property has a URL as a key, for example `http://schema.org/price`,
  and maps to an array of one or more values; values can be plain strings, or nested `Item` objects
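
To make this shape concrete, here is a minimal sketch that models an item as a plain PHP array and walks it recursively. The array keys (`id`, `types`, `properties`) and the `describeItem()` helper are assumptions made for illustration only; they are not part of the library, and real code would work with `Item` objects instead.

```php
// Hypothetical stand-in for an Item: plain arrays instead of library objects
$product = [
    'id'         => null, // the id is optional
    'types'      => ['http://schema.org/Product'],
    'properties' => [
        'http://schema.org/name'   => ['Example widget'],
        'http://schema.org/offers' => [
            [
                'id'         => null,
                'types'      => ['http://schema.org/Offer'],
                'properties' => [
                    'http://schema.org/price' => ['9.99'],
                ],
            ],
        ],
    ],
];

/**
 * Renders an item and its nested items as indented text lines.
 * Illustrative helper only; not part of the library.
 */
function describeItem(array $item, int $depth = 0): array
{
    $indent = str_repeat('  ', $depth);
    $lines  = [$indent . implode(', ', $item['types'])];

    foreach ($item['properties'] as $name => $values) {
        foreach ($values as $value) {
            if (is_array($value)) {
                // A nested item: print the property name, then recurse
                $lines[] = "$indent  - $name:";
                array_push($lines, ...describeItem($value, $depth + 2));
            } else {
                // A plain string value
                $lines[] = "$indent  - $name: $value";
            }
        }
    }

    return $lines;
}

echo implode(PHP_EOL, describeItem($product)), PHP_EOL;
```

Each property maps to an array even when it holds a single value, which is why the inner loop is always needed.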

### Quickstart

Here is a working example that reads Microdata from a web page. Just change the URL and give it a try:

```php
use Brick\StructuredData\Reader\MicrodataReader;
use Brick\StructuredData\HTMLReader;
use Brick\StructuredData\Item;

// Let's read Microdata here;
// You could also use RdfaLiteReader, JsonLdReader,
// or even use all of them by chaining them in a ReaderChain
$microdataReader = new MicrodataReader();

// Wrap into HTMLReader to be able to read HTML strings or files directly,
// i.e. without manually converting them to DOMDocument instances first
$htmlReader = new HTMLReader($microdataReader);

// Replace this URL with that of a website you know is using Microdata
$url = 'http://www.example.com/';
$html = file_get_contents($url);

// Read the document and return the top-level items found
// Note: the URL is only required to resolve relative URLs; no attempt will be made to connect to it
$items = $htmlReader->read($html, $url);

// Loop through the top-level items
foreach ($items as $item) {
    echo implode(',', $item->getTypes()), PHP_EOL;

    foreach ($item->getProperties() as $name => $values) {
        foreach ($values as $value) {
            if ($value instanceof Item) {
                // We're only displaying the types in this example; you would typically
                // recurse through nested Items to get the information you need
                $value = '(' . implode(', ', $value->getTypes()) . ')';
            }

            // If $value is not an Item, then it's a plain string

            echo "  - $name: $value", PHP_EOL;
        }
    }
}
```
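
If you would rather call a `Reader` directly, you need a `DOMDocument`. The sketch below builds one from an HTML string using only PHP's dom and libxml extensions, then counts the top-level Microdata elements with XPath as a quick sanity check. The inline HTML and the XPath query are assumptions made for illustration; this is not necessarily how `HTMLReader` works internally.

```php
// Sample markup for illustration; in practice this would come from a web page
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<body>
  <div itemscope itemtype="http://schema.org/Product">
    <span itemprop="name">Example widget</span>
    <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
      <span itemprop="price">9.99</span>
    </div>
  </div>
</body>
</html>
HTML;

// Collect libxml parse warnings instead of emitting them as PHP warnings
$previous = libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);

// Discard collected warnings and restore the previous error handling mode
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Top-level Microdata items have an itemscope attribute but no itemprop attribute
$xpath    = new DOMXPath($document);
$topLevel = $xpath->query('//*[@itemscope and not(@itemprop)]');

echo $topLevel->length, ' top-level item(s) found', PHP_EOL;

// $document can now be passed to a Reader: $reader->read($document, $url)
```

Real-world HTML is rarely valid, so silencing libxml warnings while parsing is usually worth the two extra lines.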

### Known issues

- No support for the `itemref` attribute in `MicrodataReader`
- No support for the `prefix` attribute in `RdfaLiteReader`; only [predefined prefixes](https://www.w3.org/2011/rdfa-context/rdfa-1.1) are supported right now
- No proper support for `@context` in `JsonLdReader`; right now, only strings are accepted in `@context`, and they are treated as a vocabulary identifier; this works fine with simple markup like the one used in the examples on [schema.org](https://schema.org/), but may fail with more complex documents

#### Note about JSON-LD's `@context`

While `JsonLdReader` should be able to handle a proper context object in the future, it will never aim to be a
fully compliant JSON-LD parser; in particular, it will *never* attempt to fetch a JSON-LD context referenced by a URL.

This is consistent with how indexing robots typically crawl the web: they do not fetch remote contexts, which spares
them from fetching additional documents to extract structured data from a web page.

The aim of `JsonLdReader`, and the other `Reader` implementations for that matter, is to be able to parse a document with the same capabilities as [Google Structured Data Testing Tool](https://search.google.com/structured-data/testing-tool/) or [Yandex Structured data validator](https://webmaster.yandex.com/tools/microtest/), no more, no less. These tools [do not load external context files](https://webmasters.stackexchange.com/q/123425/18342).
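
To illustrate what treating a string `@context` as a vocabulary identifier means, here is a rough sketch using only PHP's json extension. The `expandTerm()` helper and the way the vocabulary and term are joined are assumptions made for this example; they are not taken from the `JsonLdReader` implementation. Note that the context URL is only used as a string prefix and is never fetched.

```php
// Simple JSON-LD markup in the style of the schema.org examples
$json = <<<'JSON'
{
    "@context": "http://schema.org",
    "@type": "Product",
    "name": "Example widget"
}
JSON;

$data = json_decode($json, true);

/**
 * Expands a short term against a string @context, treated as a vocabulary.
 * Hypothetical helper for illustration; not the library's implementation.
 */
function expandTerm(string $vocabulary, string $term): string
{
    return rtrim($vocabulary, '/') . '/' . $term;
}

$context = $data['@context'] ?? null;

if (! is_string($context)) {
    // Context objects and arrays are out of scope for this sketch
    throw new RuntimeException('Only string contexts are handled here.');
}

$type       = expandTerm($context, $data['@type']);
$properties = [];

foreach ($data as $key => $value) {
    if (strpos($key, '@') === 0) {
        continue; // skip JSON-LD keywords such as @context and @type
    }

    // Every property maps to an array of values
    $properties[expandTerm($context, $key)] = (array) $value;
}

echo $type, PHP_EOL;
print_r($properties);
```

A document whose `@context` is an object or a remote context URL with custom term mappings cannot be expanded this way, which is the limitation described above.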
