DOC: io.rst description and code inconsistent, plus the description is for deprecated behaviour #60705
Open
Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/dev/user_guide/io.html#reading-html-content
Read in the content of the file from the above URL and pass it to read_html as a string:
In [317]: html_str = """
   .....: <table>
   .....:     <tr>
   .....:         <th>A</th>
   .....:         <th colspan="1">B</th>
   .....:         <th rowspan="1">C</th>
   .....:     </tr>
   .....:     <tr>
   .....:         <td>a</td>
   .....:         <td>b</td>
   .....:         <td>c</td>
   .....:     </tr>
   .....: </table>
   .....: """
   .....: 
In [318]: with open("tmp.html", "w") as f:
   .....:     f.write(html_str)
   .....: 
In [319]: df = pd.read_html("tmp.html")
In [320]: df[0]
Out[320]: 
   A  B  C
0  a  b  c
Documentation problems
Problem 1
The "above URL" is
url = 'https://www.sump.org/notes/request/' # HTTP request reflector
but data from that URL is not what's used in the code.
Problem 2
"pass it to read_html as a string" is not what's being demonstrated in the code, which writes the HTML to tmp.html and passes that file path to read_html.
Problem 3
read_html can take an HTML string, but that behaviour is deprecated, per its docs:
Deprecated since version 2.1.0: Passing html literal strings is deprecated. Wrap literal string/bytes input in io.StringIO/io.BytesIO instead.
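For illustration, here is a minimal sketch of what the deprecation note describes: the same html_str as in the docs example, wrapped in io.StringIO before being passed to read_html. It assumes an HTML parser that pandas can use (for example lxml) is installed, and it is only one way the docs example could be written, not a proposed final wording.

```python
import io

import pandas as pd

# Same table as in the io.rst example.
html_str = """
<table>
    <tr>
        <th>A</th>
        <th colspan="1">B</th>
        <th rowspan="1">C</th>
    </tr>
    <tr>
        <td>a</td>
        <td>b</td>
        <td>c</td>
    </tr>
</table>
"""

# Wrap the literal HTML in StringIO rather than passing the raw string,
# as the deprecation note recommends.
dfs = pd.read_html(io.StringIO(html_str))

# read_html returns a list of DataFrames, one per table found.
df = dfs[0]
print(df)
```

This avoids both the temporary file and the deprecated literal-string path, so the description and the code would agree.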
Suggested fix for documentation
I'm not sure!