DOC: io.rst description and code inconsistent, plus the description is for deprecated behaviour #60705

Open
@wjandrea

Description

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/dev/user_guide/io.html#reading-html-content

Read in the content of the file from the above URL and pass it to read_html as a string:

In [317]: html_str = """
   .....:          <table>
   .....:              <tr>
   .....:                  <th>A</th>
   .....:                  <th colspan="1">B</th>
   .....:                  <th rowspan="1">C</th>
   .....:              </tr>
   .....:              <tr>
   .....:                  <td>a</td>
   .....:                  <td>b</td>
   .....:                  <td>c</td>
   .....:              </tr>
   .....:          </table>
   .....:      """
   .....: 

In [318]: with open("tmp.html", "w") as f:
   .....:     f.write(html_str)
   .....: 

In [319]: df = pd.read_html("tmp.html")

In [320]: df[0]
Out[320]: 
   A  B  C
0  a  b  c

Documentation problems

Problem 1

The "above URL" is

url = 'https://www.sump.org/notes/request/' # HTTP request reflector

but the code never fetches that URL. It uses the hard-coded HTML table in html_str instead.

Problem 2

"pass it to read_html as a string" is not what the code demonstrates: it writes the HTML to tmp.html and passes that file path to read_html.

Problem 3

read_html can take an HTML string, but that behaviour is deprecated, per its docs:

Deprecated since version 2.1.0: Passing html literal strings is deprecated. Wrap literal string/bytes input in io.StringIO/io.BytesIO instead.
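
For reference, here is a minimal sketch of what the non-deprecated form described in that note might look like, reusing the same table as the docs example. It is only an illustration of the io.StringIO wrapping, not a proposed wording for io.rst, and it assumes an HTML parser that read_html can use (lxml, or bs4 with html5lib) is installed:

```python
from io import StringIO

import pandas as pd

# Same table as in the docs example quoted above.
html_str = """
<table>
    <tr>
        <th>A</th>
        <th colspan="1">B</th>
        <th rowspan="1">C</th>
    </tr>
    <tr>
        <td>a</td>
        <td>b</td>
        <td>c</td>
    </tr>
</table>
"""

# Wrap the literal HTML in StringIO rather than passing the bare string,
# which is what the deprecation note asks for.
dfs = pd.read_html(StringIO(html_str))

# read_html returns a list of DataFrames, one per <table> it finds.
df = dfs[0]
print(df)
```

With this form the description ("pass it to read_html as a string") and the code would at least agree in spirit, without relying on the deprecated bare-string input.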

Suggested fix for documentation

I'm not sure!
