|
1 | | -# HTML AST |
2 | | -An HTML AST (Abstract Syntax Tree) parser written in php. Inspired by the AST parser in TempestPHP (written by Brett Roose). |
| 1 | +# html-ast |
3 | 2 |
|
4 | | -It has a built-in lexer to parse the html, and then a AST parser to convert it into a tree structure. |
5 | | -Finally, it comes with a printer to output properly formatted HTML (indented). |
| 3 | +An HTML AST (Abstract Syntax Tree) parser written in PHP. |
| 4 | +Inspired by the AST parser in TempestPHP (by Brent Roose), this library provides a built-in lexer to tokenize HTML strings, an AST parser to convert tokens into a tree structure, and a printer to output well-formatted (indented) HTML. |
| 5 | + |
| 6 | +> **Note:** This package requires PHP 8.2 or higher. |
| 7 | +
|
| 8 | +## Table of Contents |
| 9 | + |
| 10 | +- [Features](#features) |
| 11 | +- [Requirements](#requirements) |
| 12 | +- [Installation](#installation) |
| 13 | +- [Usage](#usage) |
| 14 | + - [Lexing](#lexing) |
| 15 | + - [Parsing](#parsing) |
| 16 | + - [Printing](#printing) |
| 17 | +- [Testing](#testing) |
| 18 | +- [Todo](#todo) |
| 19 | +- [Contributing](#contributing) |
| 20 | +- [License](#license) |
| 21 | + |
| 22 | +## Features |
| 23 | + |
| 24 | +- **Built-in Lexer:** Tokenizes raw HTML input. |
| 25 | +- **AST Parser:** Converts tokenized HTML into an Abstract Syntax Tree for easier analysis and manipulation. |
| 26 | +- **HTML Printer:** Renders the AST back into properly indented HTML code. |
| 27 | + |
| 28 | +## Requirements |
| 29 | + |
| 30 | +- PHP version **8.2** or later. |
| 31 | +- Composer (for installation via [Packagist](https://packagist.org/)). |
| 32 | + |
| 33 | +## Installation |
| 34 | + |
| 35 | +You can install **html-ast** via Composer. From your project root, run: |
| 36 | + |
| 37 | +```bash |
| 38 | +composer require sinnbeck/html-ast |
| 39 | +``` |
| 40 | + |
| 41 | +Alternatively, if you prefer to clone the repository directly: |
| 42 | + |
| 43 | +```bash |
| 44 | +git clone https://github.com/sinnbeck/html-ast.git |
| 45 | +cd html-ast |
| 46 | +composer install |
| 47 | +``` |
6 | 48 |
|
7 | 49 | ## Usage |
| 50 | + |
| 51 | +The package is organized into three main components: the Lexer, the AST Parser, and the Printer. Below are basic examples of how to use each. |
| 52 | + |
| 53 | +### Lexing |
| 54 | + |
| 55 | +The lexer tokenizes an HTML string. Tokens represent the smallest meaningful elements of the HTML (such as tags, attributes, and text). |
| 56 | + |
8 | 57 | ```php |
9 | 58 | use Sinnbeck\HtmlAst\Lexer\Lexer; |
10 | | -use Sinnbeck\HtmlAst\Ast\Parser; |
11 | | -use Sinnbeck\HtmlAst\Printer; |
12 | 59 |
|
13 | | -$lexer = Lexer::fromString($html) |
| 60 | +// Provide your HTML string |
| 61 | +$html = '<div class="container"><p>Hello, world!</p></div>'; |
| 62 | + |
| 63 | +// Create a Lexer instance from the string |
| 64 | +$lexer = Lexer::fromString($html); |
| 65 | + |
| 66 | +// Lex the HTML string into tokens |
14 | 67 | $tokens = $lexer->lex(); |
| 68 | + |
| 69 | +// Optionally, inspect the tokens: |
| 70 | +print_r($tokens); |
| 71 | +``` |
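To make the idea of tokens more concrete, here is a minimal, self-contained sketch of a toy tokenizer. It is an illustration only and does not reflect how `Sinnbeck\HtmlAst\Lexer\Lexer` is implemented; the function name `toyTokenize` and the token shape (`type`/`value` arrays) are assumptions made for this example, and attributes are left inside the tag token rather than split out.

```php
<?php

declare(strict_types=1);

/**
 * Toy tokenizer (hypothetical helper, not part of html-ast).
 * Splits an HTML string into opening-tag, closing-tag, and text tokens.
 *
 * @return list<array{type: string, value: string}>
 */
function toyTokenize(string $html): array
{
    // Split on anything that looks like a tag, keeping the tags themselves
    // and discarding the empty strings between adjacent tags.
    $parts = preg_split(
        '/(<[^>]+>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    ) ?: [];

    $tokens = [];
    foreach ($parts as $part) {
        if (str_starts_with($part, '</')) {
            $type = 'close_tag';
        } elseif (str_starts_with($part, '<')) {
            $type = 'open_tag';
        } else {
            $type = 'text';
        }

        $tokens[] = ['type' => $type, 'value' => $part];
    }

    return $tokens;
}

// Same sample input as the Lexing example above.
$tokens = toyTokenize('<div class="container"><p>Hello, world!</p></div>');

foreach ($tokens as $token) {
    echo $token['type'], ': ', $token['value'], PHP_EOL;
}
```

A real lexer typically goes further than this sketch, for example by emitting separate tokens for attributes and comments and by coping with malformed input.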
| 72 | + |
| 73 | +### Parsing |
| 74 | + |
| 75 | +The AST parser converts the token list into a tree structure, where each node represents an HTML element, text node, or comment. |
| 76 | + |
| 77 | +```php |
| 78 | +use Sinnbeck\HtmlAst\Ast\Parser; |
| 79 | + |
| 80 | +// Create an AST parser instance with the tokens from the lexer |
15 | 81 | $ast = Parser::make($tokens); |
| 82 | + |
| 83 | +// Parse tokens into an AST (node tree) |
16 | 84 | $nodeTree = $ast->parse(); |
17 | 85 |
|
18 | | -//and if you want to output the resulting HTML |
| 86 | +// Optionally, inspect the node tree: |
| 87 | +print_r($nodeTree); |
| 88 | +``` |
| 89 | + |
| 90 | +### Printing |
| 91 | + |
| 92 | +The printer takes an HTML input or the resulting AST and renders it as neatly formatted HTML. This is useful for ensuring consistent formatting after transformations. |
| 93 | + |
| 94 | +```php |
| 95 | +use Sinnbeck\HtmlAst\Printer; |
| 96 | + |
| 97 | +// Create a Printer instance and render the HTML string |
19 | 98 | echo Printer::make()->render($html); |
20 | 99 | ``` |
21 | 100 |
|
| 101 | +## Testing |
| 102 | + |
| 103 | +The repository includes tests under the `tests` directory, using [Pest PHP](https://pestphp.com/) as the testing framework and Symfony's VarDumper for debugging. To run tests, execute: |
| 104 | + |
| 105 | +```bash |
| 106 | +composer test |
| 107 | +``` |
| 108 | + |
| 109 | +This command runs all tests to ensure the lexing, parsing, and printing functionalities work as expected. |
| 110 | + |
22 | 111 | ## Todo |
23 | | -* [ ] Add line numbers to lexer |
24 | | -* [ ] Add html validator to ensure HTMl structure is valid |
25 | | -* [ ] Add node visitors to allow changing HTML ? |
| 112 | + |
| 113 | +* [ ] Add line numbers to tokens (Lexer) |
| 114 | +* [ ] Introduce an HTML validator to ensure that the HTML structure conforms to expected standards |
| 115 | +* [ ] Implement a node visitor pattern to allow modification or transformation of the AST |
| 116 | + |
| 117 | +## Contributing |
| 118 | + |
| 119 | +Contributions to **html-ast** are welcome. If you would like to contribute, please follow these steps: |
| 120 | + |
| 121 | +1. Fork the repository. |
| 122 | +2. Create a feature branch: |
| 123 | + ```bash |
| 124 | + git checkout -b feature/your-feature-name |
| 125 | + ``` |
| 126 | +3. Make your changes and add tests. |
| 127 | +4. Format all files: |
| 128 | + ```bash |
| 129 | + ./vendor/bin/pint |
| 130 | + ``` |
| 131 | +5. Commit your changes: |
| 132 | + ```bash |
| 133 | + git commit -am 'Add new feature' |
| 134 | + ``` |
| 135 | +6. Push the branch: |
| 136 | + ```bash |
| 137 | + git push origin feature/your-feature-name |
| 138 | + ``` |
| 139 | +7. Open a pull request explaining your changes. |
| 140 | + |
| 141 | +Please adhere to the coding standards and test all changes before submitting a pull request. |
| 142 | + |
| 143 | +## License |
| 144 | + |
| 145 | +This project is licensed under the [MIT License](https://opensource.org/license/MIT). |