Releases: jdepoix/youtube-transcript-api

v1.0.3

25 Mar 18:12
b706276

What's Changed

  • Refactored parsing of the JS var containing the transcript data, to make it more robust to changes in the formatting of the returned HTML
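
One way to make this kind of parsing less sensitive to formatting is to locate the variable assignment with a whitespace-tolerant regular expression and let a JSON decoder find the end of the object, instead of splitting on fixed strings. The sketch below only illustrates that idea and is not the library's implementation; the variable name `exampleData` and the helper `extract_js_var` are hypothetical.

```python
import json
import re


def extract_js_var(html: str, var_name: str) -> dict:
    """Return the JSON object assigned to `var_name` inside `html`.

    Hypothetical helper for illustration; not part of youtube-transcript-api.
    """
    # Tolerate arbitrary whitespace around the variable name and the "=".
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        raise ValueError(f"variable {var_name!r} not found")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (";", "</script>", more JS, ...).
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


# Two renderings of the same (made-up) page that differ only in formatting.
compact = '<script>var exampleData={"captions": {"tracks": []}};</script>'
spaced = '<script>\n  var exampleData  =  {"captions": {"tracks": []}} ;\n</script>'
assert extract_js_var(compact, "exampleData") == extract_js_var(spaced, "exampleData")
```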

Full Changelog: v1.0.2...v1.0.3

v1.0.2

17 Mar 18:17
dc08c3f

What's Changed

  • Added retry mechanism, which will retry requests when Webshare proxies are used and RequestBlocked is raised, to trigger an IP rotation in case a user encounters a blocked residential IP
  • Added new error messages when RequestBlocked is raised despite proxies being used, to assist users in figuring out what the issue is
  • Fixed PEP-8 warning by @afourney in #396
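
The retry idea can be sketched roughly as follows. This is a simplified illustration and not the library's code: `RequestBlocked` is redefined locally so the example runs offline, and `fetch_with_retries` and its `retries` parameter are hypothetical names.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RequestBlocked(Exception):
    """Local stand-in for the library's RequestBlocked, so the sketch runs offline."""


def fetch_with_retries(fetch: Callable[[], T], retries: int = 3) -> T:
    """Call `fetch`, retrying up to `retries` more times when it is blocked.

    With a rotating proxy, each new attempt has a chance of going out
    through a different IP, which is the reason for retrying at all.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except RequestBlocked:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
    raise AssertionError("unreachable for retries >= 0")


# Simulated fetch: blocked twice, then succeeds on the third attempt.
attempts: List[int] = []


def flaky_fetch() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise RequestBlocked()
    return "transcript"


result = fetch_with_retries(flaky_fetch, retries=3)
```

If every attempt is blocked, the last `RequestBlocked` propagates to the caller, which is where a more specific error message can help.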

Full Changelog: v1.0.1...v1.0.2

v1.0.1

12 Mar 20:30
aad8621

What's Changed

  • Adds a feature to allow proxy configs to prevent the HTTP client from keeping TCP connections open, as keeping TCP connections alive can prevent proxy providers from rotating your IP
    • Adds the prevent_keeping_connections_alive() -> bool method to ProxyConfig objects
    • When initializing YouTubeTranscriptApi a Connection: close header will be added to the HTTP client, if a proxy config with prevent_keeping_connections_alive() == True is used
  • Added py.typed by @jkawamoto in #390
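
The interaction between the proxy config and the HTTP headers can be pictured as in the sketch below. Only the method name `prevent_keeping_connections_alive()` and the `Connection: close` header come from the notes above; `ExampleProxyConfig`, its `rotating` field, `build_headers`, and the default header are hypothetical stand-ins, not the library's classes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ExampleProxyConfig:
    """Hypothetical stand-in for a proxy config object."""

    rotating: bool = False

    def prevent_keeping_connections_alive(self) -> bool:
        # Keeping a TCP connection open can prevent the provider from
        # rotating the IP, so rotating configs ask for it to be closed.
        return self.rotating


def build_headers(proxy_config: Optional[ExampleProxyConfig]) -> Dict[str, str]:
    """Build the default headers for an HTTP client (illustrative only)."""
    headers = {"Accept-Language": "en-US"}  # assumed default header
    if proxy_config is not None and proxy_config.prevent_keeping_connections_alive():
        # Ask for the connection to be closed after each response.
        headers["Connection"] = "close"
    return headers


rotating_headers = build_headers(ExampleProxyConfig(rotating=True))
static_headers = build_headers(ExampleProxyConfig(rotating=False))
```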

Full Changelog: v1.0.0...v1.0.1

v1.0.0

11 Mar 18:27
bf45008

What's Changed

  • Overhaul of the public API to move away from the static methods get_transcript, get_transcripts and list_transcripts
    • YouTubeTranscriptApi.get_transcript(video_id) is replaced with YouTubeTranscriptApi().fetch(video_id)
    • YouTubeTranscriptApi.list_transcripts(video_id) is replaced with YouTubeTranscriptApi().list(video_id)
    • There is no equivalent for YouTubeTranscriptApi.get_transcripts in the new interface, as it doesn't provide any meaningful utility over just running [ytt_api.fetch(video_id) for video_id in video_ids]
    • By calling .fetch and .list on a YouTubeTranscriptApi instance, we can share an HTTP session between all requests, which allows us to share cookies and reduces redundant requests, thereby saving bandwidth and proxy costs.
    • transcript.fetch() now returns a FetchedTranscript object instead of a list of dictionaries. This allows for adding metadata and utility methods to the returned object. You can still convert a FetchedTranscript object to the previously used format by calling fetched_transcript.to_raw_data().
    • You'll find more details on the updated API in the README. The old static methods can still be used, but have been deprecated and will be removed in a future version!
  • Added new exception types to make the cause of some common errors clearer and to allow for catching/handling them
    • RequestBlocked is now raised if the request has been blocked by YouTube due to a blacklisted IP (this would previously raise TranscriptsDisabled, #303)
    • AgeRestricted is raised if the video is age restricted and requires cookie authentication (#111)
    • VideoUnplayable is raised if the video is unplayable for an unknown reason. When this happens the error message that YouTube would display on the WebPlayer is returned by the exception, which should make unknown errors more useful. (#219)
  • Added type hierarchy to configure proxies, which can now be passed into the constructor of YouTubeTranscriptApi. All proxy configs are located in the new module youtube_transcript_api.proxies.
    • Generic HTTP/HTTPS/SOCKS proxy can be configured using the GenericProxyConfig class (similarly to how it was done before using the requests dict)
    • Added integration of the proxy provider Webshare, which allows for easily setting up rotating residential proxies using the WebshareProxyConfig
    • You'll find more details on the proxy config classes and how to use them in the README
  • Added the option to pass an HTTP session into the YouTubeTranscriptApi constructor
    • Allows for setting a path to CA_BUNDLE file (#362, #312)
    • Allows for setting custom headers (#316)
    • Allows for sharing HTTP sessions between multiple instances of YouTubeTranscriptApi
  • Added type signatures to all interfaces

Contributors

Due to the rewrite of some interfaces I wasn't able to merge their PRs directly, but special thanks to the work done by @crhowell in #219 and by @andre-c-andersen in #337, as their PRs have been very useful in implementing the new exception types! 😊🙏

Full Changelog: v0.6.3...v1.0.0

v0.6.3

18 Nov 09:52
97522b7

What's Changed

  • Fix grammatical mistakes in README by @Jai0401 in #287
  • Update README.md - cookies extension and instructions for export by @samfisherirl in #339
  • [security] defusedxml.ElementTree instead of xml.etree.ElementTree by @vasiliadi in #352

Full Changelog: v0.6.2...v0.6.3

v0.6.2

27 Dec 13:23
f5c9e16

Fixes

  • YouTube has made some changes which caused the translationLanguages key to sometimes be missing from the captions json. This release adjusts the fetching process to initialize translation_languages with an empty list in case that happens.
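
The fix amounts to treating a missing key as "no translation languages". A minimal sketch of that pattern is shown below; it assumes the captions JSON is an ordinary dict, and the helper name `get_translation_languages` and the example entry are hypothetical.

```python
from typing import Any, Dict, List


def get_translation_languages(captions_json: Dict[str, Any]) -> List[Any]:
    """Return the translation languages, or an empty list if the key is absent."""
    # dict.get with a default avoids a KeyError when YouTube omits the key.
    return captions_json.get("translationLanguages", [])


# Made-up payloads: one with the key, one without it.
with_key = {"captionTracks": [], "translationLanguages": [{"languageCode": "de"}]}
without_key = {"captionTracks": []}

languages_present = get_translation_languages(with_key)
languages_missing = get_translation_languages(without_key)
```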

v0.6.1

16 Jun 13:27

Fixes

  • Fixed transcript list not showing display names for languages in English

v0.6.0

17 Apr 13:51

Features

  • The optional parameter preserve_formatting has been added to YouTubeTranscriptApi.get_transcript, YouTubeTranscriptApi.get_transcripts, and Transcript.fetch. If this is set to True, formatting elements such as <i> (italics) and <b> (bold) are no longer removed from the transcript. (thanks to @eseiver!)
  • Using the URL of a YouTube video instead of its video ID will now throw an InvalidVideoId exception.
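
The effect of preserve_formatting can be pictured with the sketch below, which strips all tags by default and keeps a small set of formatting tags when the flag is set. It is an illustration, not the library's implementation; `clean_text` is a hypothetical helper, and the tag set is limited to the two tags named above.

```python
import re

# Assumed set of formatting tags; the notes above name <i> and <b> as examples.
FORMATTING_TAGS = ("i", "b")


def clean_text(text: str, preserve_formatting: bool = False) -> str:
    """Remove HTML tags from `text`, optionally keeping formatting tags."""
    if preserve_formatting:
        keep = "|".join(FORMATTING_TAGS)
        # Match every tag except opening/closing tags listed in FORMATTING_TAGS.
        # The \b stops <b> from also protecting <br>, or <i> protecting <img>.
        pattern = re.compile(r"<(?!/?(?:" + keep + r")\b)[^>]*>", re.IGNORECASE)
    else:
        # Match every tag.
        pattern = re.compile(r"<[^>]*>")
    return pattern.sub("", text)


raw = "<font color='red'><i>Hi</i></font> <b>there</b>"
plain = clean_text(raw)
formatted = clean_text(raw, preserve_formatting=True)
```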

v0.5.0

26 Oct 10:14

Features

  • Added support for formatting .srt files using the SRTFormatter (thanks to @liamrs222!)
  • get_transcript and get_transcripts now assert that their input type is correct, as users commonly passed a video id (string) into get_transcripts although it expects a list. Since a string is an iterable the module tried to find a video for each character of that string, which failed with a not-so-helpful error message. (thanks to @majamil16!)
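
The pitfall behind the second item is that iterating over a string yields its characters. The sketch below shows the problem and one way an input check could look; `check_video_ids` is a hypothetical helper, not the library's code.

```python
from typing import List


def check_video_ids(video_ids: List[str]) -> List[str]:
    """Fail early with a readable message if `video_ids` is not a list."""
    assert isinstance(video_ids, list), "`video_ids` must be a list of strings"
    return video_ids


# A single ID passed by mistake is silently treated as one "ID" per character.
per_character = [video_id for video_id in "abc"]

# Wrapping the ID in a list gives the intended single-element input.
checked = check_video_ids(["abc"])
```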

v0.4.4

30 Mar 15:35

Fixes

  • Transcript language list is now properly escaped, thereby fixing a decoding error which would occur on transcripts for languages containing " in their name (like Estnisch - "Raev")