From b75f35be86b7e884d57ebe45411b0dfe8791b854 Mon Sep 17 00:00:00 2001
From: Wes Turner <50891+westurner@users.noreply.github.com>
Date: Sun, 5 Feb 2023 11:51:15 -0500
Subject: [PATCH] DOC: README.rst: slight modifications, conda install; badges

---
 README.rst | 97 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 71 insertions(+), 26 deletions(-)

diff --git a/README.rst b/README.rst
index 793f80a..ab13752 100644
--- a/README.rst
+++ b/README.rst
@@ -1,23 +1,37 @@
+
+=======================================
 Requests-HTML: HTML Parsing for Humans™
 =======================================
 
-.. image:: https://farm5.staticflickr.com/4695/39152770914_a3ab8af40d_k_d.jpg
-
 .. image:: https://travis-ci.com/psf/requests-html.svg?branch=master
     :target: https://travis-ci.com/psf/requests-html
 
-This library intends to make parsing HTML (e.g. scraping the web) as
-simple and intuitive as possible.
+.. image:: https://img.shields.io/pypi/v/requests-html
+    :target: https://pypi.org/project/requests-html/
+
+.. image:: https://img.shields.io/pypi/dm/requests-html
+    :target: https://pypi.org/project/requests-html/
+
+.. image:: https://img.shields.io/conda/dn/conda-forge/requests-html
+    :target: https://github.com/conda-forge/requests-html-feedstock/blob/main/recipe/meta.yaml
+
+.. image:: https://img.shields.io/badge/python-3.7+-important
+    :target: https://python.org/
 
-When using this library you automatically get:
+A Python library for requesting and parsing HTML with psf/requests and Chromium.
 
-- **Full JavaScript support**!
-- *CSS Selectors* (a.k.a jQuery-style, thanks to PyQuery).
-- *XPath Selectors*, for the faint of heart.
-- Mocked user-agent (like a real web browser).
-- Automatic following of redirects.
-- Connection–pooling and cookie persistence.
-- The Requests experience you know and love, with magical parsing abilities.
+.. contents::
+
+Features
+========
+
+- **Full JS support**!
+- *CSS Selectors* with pyquery: ``.find()``, ``.pq``
+- *XPath Selectors* with lxml: ``.xpath()``, ``.lxml``
+- Mocked user-agent (like a real web browser)
+- Follows HTTP redirects
+- HTTP Connection pooling and cookie persistence
+- Downloads Chromium on first request or ``pyppeteer-install``
 - **Async Support**
 
 .. Other nice features include:
@@ -28,7 +42,7 @@ When using this library you automatically get:
 Tutorial & Usage
 ================
 
-Make a GET request to 'python.org', using Requests:
+Make a (blocking) GET request to 'python.org' with Requests and `HTMLSession`:
 
 .. code-block:: pycon
 
@@ -36,7 +50,7 @@ Make a GET request to 'python.org', using Requests:
     >>> session = HTMLSession()
     >>> r = session.get('https://python.org/')
 
-Try async and get some sites at the same time:
+Make multiple concurrent HTTP GET requests with `AsyncHTMLSession`:
 
 .. code-block:: pycon
 
@@ -65,7 +79,7 @@ Try async and get some sites at the same time:
     https://www.google.com/
     https://www.reddit.com/
 
-Note that the order of the objects in the results list represents the order they were returned in, not the order that the coroutines are passed to the ``run`` method, which is shown in the example by the order being different.
+Note that the order of the objects in the `results` list represents the order the HTTP requests were returned in, not the order that the coroutines are passed to the ``AsyncHTMLSession.run()`` method (due to variable network and server latency).
 
 Grab a list of all links on the page, as–is (anchors excluded):
 
@@ -154,10 +168,10 @@ XPath is also supported:
     []
 
 
-JavaScript Support
+JS Support
 ==================
 
-Let's grab some text that's rendered by JavaScript. Until 2020, the Python 2.7 countdown clock (https://pythonclock.org) will serve as a good test page:
+Let's grab some text that requires JS to render. Until 2020, the Python 2.7 countdown clock (https://pythonclock.org) will serve as a good test page:
 
 .. code-block:: pycon
 
@@ -178,7 +192,7 @@ Notice the clock is missing. The ``render()`` method takes the response and rend
     >>> r.html.search('Python 2.7 will retire in...{}Enable Guido Mode')[0]
     '\n \n
1Year2Months28Days16Hours52Minutes46Seconds
\n
\n
\n