Reading WARC files from HDFS currently depends on pydoop. Its latest release dates back to 2019, and the module is becoming increasingly difficult to install on newer Python versions.
Pydoop should be replaced, ideally by a utility method that reads from HDFS using only Spark functions.
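
One possible direction, as a rough sketch rather than a finished design: `SparkContext.binaryFiles` returns an RDD of `(path, bytes)` pairs for any file system Spark can read, so no extra HDFS client is needed. The helper names `stream_processor` and `process_hdfs_files`, and the `process_stream(path, stream)` callback, are hypothetical and only serve as illustration. Note the trade-off: `binaryFiles` loads each file completely into executor memory, which may be too expensive for large WARC files.

```python
import io


def stream_processor(process_stream):
    """Wrap a (path, stream) callback so it can be passed to RDD.flatMap.

    Hypothetical helper: `process_stream(path, stream)` is assumed to be a
    function that yields records read from a binary file-like object.
    """
    def _apply(path_and_content):
        path, content = path_and_content
        # binaryFiles delivers the whole file as bytes; expose it as a stream
        with io.BytesIO(content) as stream:
            yield from process_stream(path, stream)
    return _apply


def process_hdfs_files(sc, path, process_stream):
    """Apply `process_stream` to every file found under `path`.

    `sc` is an existing SparkContext, `path` a URI Spark can read
    (for example an hdfs:// location).
    """
    return sc.binaryFiles(path).flatMap(stream_processor(process_stream))
```

The callback could, for instance, wrap the stream in a WARC record iterator. Reading the file in chunks instead of as a whole would require going through the Hadoop `FileSystem` API on the JVM side, which this sketch does not attempt.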