Commit 7ffb99f

Merge branches 'fix-values-module-indent', 'fix-readme-usage-snippet', 'optimize-any-value-parsing-1', 'fix-readme-usage-snippet-syntax-error', 'fix-setup-comment', 'improve-readme-text', 'optimize-some-if-statements', 'fix-formatter-issue-1', 'fix-formatter-issue-2', 'fix-formatter-issue-3', 'rewrite-https-hyperlinks', and 'fix-selectors-parsing'
13 parents 3fd2325 + 94f1ac9 + 7680929 + 2f5275e + 62b56e3 + 0d1923d + 83d3b56 + 29a482e + 28ab7dc + 27d21ca + 1b79b9b + b0402c2 + 9e4ce7b commit 7ffb99f

7 files changed (+246 -229 lines)

README.md

+4 -4

@@ -29,8 +29,8 @@ It should go without saying that whether you choose to install the package with
 The code snippet below demonstrates obtaining of a _parse tree_ (in the `stylesheet` variable) by parsing the file `example.css`:
 
 ```python
-from csspring.parsing import normalize_input, parse_stylesheet
-stylesheet = parse_stylesheet(normalize_input(open('example.css', newline=''))))) # The `newline=''` argument prevents default re-writing of newline sequences in input — per the CSS Syntax spec., parsing does filtering of newline sequences so no rewriting by `open` is necessary or desirable
+from csspring.parsing import parse_stylesheet
+stylesheet = parse_stylesheet(open('example.css', newline='')) # The `newline=''` argument prevents default re-writing of newline sequences in input — per the CSS Syntax spec., parsing does filtering of newline sequences so no rewriting by `open` is necessary or desirable
 ```
 
 ## Documentation
@@ -71,15 +71,15 @@ Parsing is offered only in the form of Python modules — no "command-line" prog
 
 ### Why?
 
-We wanted a "transparent" CSS parser — one that could be used in different configurations without it imposing limitations that would strictly speaking go beyond parsing. Put differently, we wanted a parser that does not assume any particular application, a software _library_ in the classical sense of the term, or a true _API_ if you will.
+We wanted a "transparent" CSS parser — one that could be used in different configurations without it imposing limitations that would strictly speaking go beyond parsing. Put differently, we wanted a parser that does not assume any particular application a software _library_ in the classical sense of the term, or a true _API_ if you will.
 
 For instance, the popular [Less](http://lesscss.org) software seems to rather effortlessly parse CSS [3] text, but it invariably re-arranges white-space in the output, without giving the user any control over the latter. Less is not _transparent_ like that — there is no way to use it with recovery of the originally parsed text from the parse tree — parsing with Less is a one-way street for at least _some_ applications (specifically those that "transform" CSS but need to preserve all of the original input as-is).
 
 In comparison, this library was written to preserve _all_ input, _as-is_. This became one of the requirements defining the library, contributing to its _reason d'etre_.
 
 ### Why Python?
 
-As touched upon in [the disclaimer above](#disclaimer), the parser was written "from the bottom up" - if it ever adopts a top layer exposing its features with a "command line" tool, said layer will invariably have to tap into the rest of it, the library, and so in the very least a library is offered. Without a command-line tool (implying switches and other facilities commonly associated with command-line tools) the utility of the parser is tightly bound to the capabilities of e.g. the programming language it was written in, since the language effectively functions as the interface to the library (you can hardly use a library offered in the form of a C code without a C compiler and/or a dynamic linker). A parser is seldom used in isolation, after all — its output, the parse tree, is normally fed to another component in a larger application. Python is currently ubiquitous and attractive looking at a set of metrics that are relevant here. The collective amount of Python code is currently growing steadily, which drives adoption, which makes the prospect of offering CSS parsing written in specifically Python ever more enticing.
+As touched upon in [the disclaimer above](#disclaimer), the parser was written "from the bottom up" - if it ever adopts a top layer exposing its features with a "command line" tool, said layer will invariably have to tap into the rest of it, the library, and so in the very least a library is offered. Without a command-line tool (implying switches and other facilities commonly associated with command-line tools) the utility of the parser is tightly bound to the capabilities of e.g. the programming language it was written in, since the language effectively functions as the interface to the library (you can hardly use a library offered in the form of a C code without a C compiler and/or a dynamic linker). A parser is seldom used in isolation, after all — its output, the parse tree, is normally fed to another component in a larger application. Python is ubiquitous and attractive on a number of metrics relevant to us. The collective amount of Python code is growing steadily, which drives adoption, both becoming factors for choosing to offer CSS parsing written in specifically Python.
 
 Another factor for choosing Python was the fact we couldn't find any _sufficiently capable_ CSS parsing libraries written specifically as [reusable] Python module(s). While there _are_ a few CSS parsing libraries available, none declared compliance with or de-facto support CSS 3 (including features like nested rules etc). In comparison, this library was written in close alignment with CSS 3 standard specification(s) (see [the compliance declaration](#compliance)).
 
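
The comment on the README usage snippet says that `newline=''` stops `open` from rewriting newline sequences before the parser sees them. The sketch below illustrates that effect using only the standard library; it does not call `csspring`, and the temporary file and its contents are made up for the illustration.

```python
import os
import tempfile

# Write a tiny stylesheet with CRLF line endings, byte for byte.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'example.css')
    with open(path, 'wb') as file:
        file.write(b'a { color: red }\r\nb { color: blue }\r\n')

    # Default text mode: universal newlines, so '\r\n' is translated to '\n' on read.
    with open(path) as file:
        translated = file.read()

    # With newline='' the newline sequences are passed through untranslated.
    with open(path, newline='') as file:
        untranslated = file.read()

print(repr(translated))
print(repr(untranslated))
```

A parser that aims to preserve all input as-is would presumably want the second form, since the first has already lost the original line endings.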

expand/csspring/syntax/tokenizing.py

+25 -1

@@ -9,6 +9,7 @@
 from ..utils import CP, BufferedPeekingReader, is_surrogate_code_point_ordinal, IteratorReader, join, parser_error, PeekingUnreadingReader
 
 from abc import ABC
+import builtins
 from collections.abc import Callable, Iterable, Iterator
 from dataclasses import dataclass
 from decimal import Decimal
@@ -46,7 +47,7 @@ def next(n: int) -> str:
     def consume(n: int) -> None:
         """Consume the next code point from the stream.
 
-        Consuming removes a [filtered] code point from the stream. If no code points are available for consumption (the stream is "exhausted"), an empty string signifying the so-called EOF ("end of file", see https://drafts.csswg.org/css-syntax/#eof-code-point) value, is consumed instead.
+        Consuming removes a [filtered] code point from the stream. If no code points are available for consumption (the stream is "exhausted"), an empty string signifying the so-called EOF ("end of file", see http://drafts.csswg.org/css-syntax/#eof-code-point) value, is consumed instead.
         """
         nonlocal consumed # required for the `+=` to work for mutable non-locals like lists (despite the fact that the equivalent `extend` does _not_ require the statement)
         consumed += input.read(n) or [ FilteredCodePoint('', source='') ]
@@ -502,3 +503,26 @@ def is_non_printable_code_point(cp: CP) -> bool:
 def is_whitespace(cp: CP) -> bool:
     """See http://drafts.csswg.org/css-syntax/#whitespace."""
     return is_newline(cp) or cp in ('\t', ' ')
+
+# Map of values by token type, for types of tokens which do _not_ have the `value` attribute
+token_values = { # For the `token_value` procedure to work as intended, subtypes should be listed _before_ their supertype(s)
+    OpenBraceToken: '{',
+    OpenBracketToken: '[',
+    OpenParenToken: '(',
+    CloseBraceToken: '}',
+    CloseBracketToken: ']',
+    CloseParenToken: ')',
+    ColonToken: ':',
+    CommaToken: ',',
+    SemicolonToken: ';',
+    CDCToken: '->',
+    CDOToken: '!--',
+}
+
+def token_value(type: builtins.type[Token]) -> str:
+    """Get the value of a token by its type for types of tokens that do _not_ feature a `value` attribute.
+
+    :param type: Type of token to get the value of
+    :returns: The value common to the type of tokens
+    """
+    return next(value for key, value in token_values.items() if issubclass(type, key))
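
The `token_value` procedure added above walks the mapping in insertion order and returns the value of the first key that the given type is a subclass of, which is why the comment asks for subtypes to be listed before their supertypes. A minimal, self-contained sketch of that behavior follows. The token classes are hypothetical stand-ins for the library's own (`SpecialCommaToken` and its value are invented purely for illustration), and the parameter is named `token_type` rather than `type` so the built-in is not shadowed.

```python
class Token:
    """Hypothetical stand-in for the library's `Token` base class."""

class OpenBraceToken(Token):
    pass

class CommaToken(Token):
    pass

class SpecialCommaToken(CommaToken):
    """Hypothetical subtype carrying its own value, for illustration only."""

# Dicts preserve insertion order, so the subtype must come before its supertype.
token_values: dict[type[Token], str] = {
    SpecialCommaToken: ',!',
    CommaToken: ',',
    OpenBraceToken: '{',
}

def token_value(token_type: type[Token]) -> str:
    """Return the value of the first listed type that `token_type` is a subclass of."""
    return next(value for key, value in token_values.items() if issubclass(token_type, key))

class NestedOpenBraceToken(OpenBraceToken):
    """Hypothetical unlisted subtype; it resolves through its listed supertype."""

print(token_value(SpecialCommaToken))     # matched by its own entry
print(token_value(NestedOpenBraceToken))  # matched by the `OpenBraceToken` entry
```

Because `next` is called without a default, a type with no matching entry (the bare `Token` in this sketch) raises `StopIteration` rather than returning an empty string. If `SpecialCommaToken` were listed after `CommaToken`, the supertype's entry would match first and the subtype's own value would never be returned.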

setup.py

+1 -1

@@ -22,6 +22,6 @@ def run(self, *args, **kwargs) -> None:
         subprocess.check_call(('make', '-C', self.build_lib, '-f', os.path.realpath('Makefile')))
 
 class BuildCommand(setuptools.command.build.build):
-    sub_commands = [ ('build_make', None) ] + setuptools.command.build.build.sub_commands # Makes the `build_make` command a sub-command of the `build_command`, which has the effect of the former being invoked when the latter is invoked (which is invoked in turn when the wheel must be built, through the `bdist_wheel` command)
+    sub_commands = [ ('build_make', None) ] + setuptools.command.build.build.sub_commands # Makes the `build_make` command a sub-command of the `build` command, which has the effect of the former being invoked when the latter is invoked (which is invoked in turn when the wheel must be built, through the `bdist_wheel` command)
 
 setup(cmdclass={ 'build': BuildCommand, 'build_make': MakeCommand })
