Releases: tidyverse/rvest
rvest 1.0.5
- Fixes LiveHTML objects returning 'could not find node with given id' errors
  if a page navigation occurred after a $click() event (#405).
- New example vignette displays the same starwars data but rendered
  dynamically using JS, so you need to use read_html_live() to get the data.
rvest 1.0.4
- New read_html_live() reads HTML into a real, live, HTML browser, meaning
  that you can scrape HTML generated by JavaScript. It returns a LiveHTML
  object which you can also use to simulate user interactions with the page,
  like clicking, typing, and scrolling (#245).
rvest 1.0.3
- Re-document to fix HTML issues in .Rd.
rvest 1.0.2
rvest 1.0.1
rvest 1.0.0
(to be released as rvest 1.0.0)
New features
- New html_text2() provides a more natural rendering of HTML nodes into text,
  converting <br> into "\n", and removing non-significant whitespace (#175).
  By default, it also converts &nbsp; into regular spaces, which you can
  suppress with preserve_nbsp = TRUE (#284).
- html_table() has been re-written from scratch to more closely mimic the
  algorithm that browsers use for parsing tables. This should mean that there
  are far fewer tables for which it fails to produce some output (#63, #204,
  #215). The fill argument has been deprecated since it is no longer needed.
  html_table() now returns a tibble rather than a data frame to be compatible
  with the rest of the tidyverse (#199). Its performance has been considerably
  improved (#237). It also gains a na.strings argument to control what values
  are converted to NA (#107), and a convert argument to control whether to
  run the conversion (#311).
- New html_form_submit() allows you to submit a form directly, without
  needing to create a session (#300).
- rvest is now licensed as MIT (#287).
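As a minimal sketch of the html_text2() and html_table() changes listed above: the HTML fragment, its values, and the use of "unknown" as a missing-value marker are made up for illustration and are not part of the release notes.

```r
library(rvest)

# minimal_html() wraps an HTML fragment in a complete document,
# so the example needs no network access.
page <- minimal_html("
  <p>First line<br>second   line</p>
  <table>
    <tr><th>name</th><th>height</th></tr>
    <tr><td>Luke</td><td>172</td></tr>
    <tr><td>Leia</td><td>unknown</td></tr>
  </table>
")

para <- html_element(page, "p")
text_old <- html_text(para)   # raw text content, <br> is ignored
text_new <- html_text2(para)  # <br> becomes a newline, runs of spaces collapse

tbl_node <- html_element(page, "table")

# Treat the (illustrative) marker "unknown" as NA during type conversion.
people <- html_table(tbl_node, na.strings = "unknown")

# Skip type conversion entirely: every column stays character.
people_raw <- html_table(tbl_node, convert = FALSE)
```

Printing people should show a tibble whose height column is numeric with one missing value, while people_raw keeps the original strings.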
API changes
Since this is the 1.0.0 release, I included a large number of API changes to make rvest more compatible with current tidyverse conventions. Older functions have been deprecated, so existing code will continue to work (albeit with a few new warnings).
- rvest now imports xml2 rather than depending on it. This is cleaner because
  it avoids attaching all the xml2 functions that you're less likely to use.
  To reduce the chance of breakage, rvest re-exports the xml2 functions
  read_html() and url_absolute(), but your code may now need an explicit
  library(xml2).
- html_form() now returns an object with class rvest_form (instead of form).
  Fields within a form now have class rvest_field, instead of a
  variety of classes that were lacking the rvest_ prefix. All functions for
  working with forms have a common html_form_ prefix: set_values() became
  html_form_set(). submit_form() was renamed to session_submit() because
  it returns a session.
- html_node() and html_nodes() have been superseded in favor of
  html_element() and html_elements() since they (almost) always return
  elements, not nodes (#298).
- html_session() is now session() and returns an object of class
  rvest_session (instead of session). All functions that work with session
  objects now have a common session_ prefix.
- Long deprecated html(), html_tag(), xml() functions have been removed.
- minimal_html() (which doesn't appear to be used by any other package)
  has had its arguments flipped to make it more intuitive.
- guess_encoding() has been renamed to html_encoding_guess() to avoid
  a clash with stringr::guess_encoding() (#209). repair_encoding() has
  been deprecated because it doesn't appear to work.
- pluck() is no longer exported to avoid a clash with purrr::pluck();
  if you need it, use purrr::map_chr() and friends instead (#209).
- xml_tag(), xml_node(), and xml_nodes() have been formally deprecated
  in favor of their html_ equivalents.
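A short migration sketch for the renames above, assuming a made-up HTML fragment. It compares the superseded html_nodes() spelling with html_elements(), and calls an xml2 function through an explicit namespace because rvest no longer attaches xml2.

```r
library(rvest)

# Illustrative fragment; the list items are placeholders.
page <- minimal_html("<ul><li>A</li><li>B</li></ul>")

# Superseded spelling: still works, kept for existing code.
old <- html_text2(html_nodes(page, "li"))

# Preferred spelling since 1.0.0.
new <- html_text2(html_elements(page, "li"))

# html_element() (singular) returns only the first match.
first <- html_text2(html_element(page, "li"))

# xml2 functions other than the re-exported ones need library(xml2)
# or an explicit xml2:: prefix.
tag <- xml2::xml_name(html_element(page, "ul"))
```

Both spellings return the same character vector here, so switching is a matter of renaming calls.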
Minor improvements and bug fixes
- The "harvesting the web" vignette has been rewritten to focus more on the
  basics of rvest, eliminating the screenshots to keep the installed package
  as svelte as possible. It's also been renamed to vignette("rvest") since
  it's the vignette that you should read first.
- The SelectorGadget vignette is now a web-only article,
  https://rvest.tidyverse.org/articles/articles/selectorgadget.html,
  so we can be more generous with screenshots since they're no longer bundled
  with every install of the package. Together with the rewrite of the other
  vignette, this means that rvest is now ~90 Kb instead of ~1.1 Mb.
- All uses of IMDB have been eliminated since the site explicitly prohibits
  scraping (#195).
- session_submit() errors if form doesn't have a url (#288).
- New session_forward() function to complement session_back().
  It now allows you to pick the submission button by position (#156).
  The ... argument is deprecated; please use config instead.
- html_form_set() can now accept character vectors allowing you to select
  multiple checkboxes in a set or select multiple values from a multi-<select>
  (#127, with help from @juba). It also uses dynamic dots so that you can use
  !!! if you have a list of values (#189).
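The html_form_set() behavior described in the last item might be used roughly as follows. The form markup, the field names q and lang, and the base URL https://example.com are placeholders invented for this sketch; nothing is submitted.

```r
library(rvest)

# Illustrative form with a text input and a multi-<select>.
page <- minimal_html('
  <form action="/search" method="get">
    <input type="text" name="q">
    <select name="lang" multiple>
      <option value="en">English</option>
      <option value="fr">French</option>
    </select>
  </form>
')

# html_form() on a document returns a list of forms; take the first.
# base_url is a placeholder used to resolve the relative action.
form <- html_form(page, base_url = "https://example.com")[[1]]

# A character vector selects several values of the multi-<select>.
form <- html_form_set(form, q = "rvest", lang = c("en", "fr"))

# Dynamic dots: splice a list of values with !!!.
values <- list(q = "scraping", lang = "fr")
form2 <- html_form_set(form, !!!values)
```

Splicing with !!! is convenient when the field values are assembled programmatically rather than typed out as named arguments.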
rvest 0.3.6
- Remove failing example
rvest 0.3.5
- Use web archive to fix broken example.
rvest 0.3.4
- Remove unneeded read_xml.response() method (#242).