
Scrape FIPS algorithm data #276

@J08nY

Description


Initial description by @J08nY

Data from the FIPS algorithm dataset is not fully utilized or mined. Each entry in the algorithm list links to a dedicated page for that algorithm, and following these links would give us additional attributes. This could help with certificate ID cleanup, where references to algorithm certificates need to be removed.
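A minimal sketch of the first step, collecting the link from each table row to its algorithm page, using only the standard library. It assumes the links are ordinary `<a href=…>` elements inside the list table. `AlgorithmLinkCollector` and `extract_algorithm_links` are hypothetical names, not part of sec-certs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlgorithmLinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs for anchors inside <table> elements."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.table_depth = 0      # > 0 while we are inside a <table>
        self.current_href = None  # absolute URL of the <a> being read, if any
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the list page URL
                self.current_href = urljoin(self.base_url, href)
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        elif tag == "a" and self.current_href is not None:
            self.links.append(("".join(self.current_text).strip(), self.current_href))
            self.current_href = None


def extract_algorithm_links(html: str, base_url: str) -> list[tuple[str, str]]:
    """Return (link text, URL) for every link found in a table of the list page."""
    parser = AlgorithmLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Links outside tables (navigation, footers) are ignored, so only the per-row links to algorithm pages are returned.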

Details

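Currently, each `FIPSAlgorithm` object is built from a row of a pandas DataFrame that is constructed solely from the list of algorithms:

```python
import pandas as pd

# html_path points to the HTML page with the list of algorithms; only its first
# table is used, and read_html keeps cell text only, so links to algorithm pages are dropped
df = pd.read_html(html_path)[0]
```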

This table lacks valuable attributes that are only available on each algorithm's individual page. The proposed enhancement should:

```python
class FIPSAlgorithm(PandasSerializableType, ComplexSerializableType):
    ...  # class body omitted
```
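Extra attributes from an individual algorithm page could be collected with a parser along these lines. This is a sketch under a loud assumption: the page layout is not described in this issue, so the code simply assumes that attributes are rendered as two-cell table rows (label, value). `parse_algorithm_detail_page` is a hypothetical helper, not existing sec-certs code.

```python
from html.parser import HTMLParser


class TwoColumnRowParser(HTMLParser):
    """Record the text of every table row as a list of cell strings."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []
        self.row = None   # cells of the <tr> currently being read
        self.cell = None  # text fragments of the <td>/<th> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            # Collapse runs of whitespace inside the cell
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None


def parse_algorithm_detail_page(html: str) -> dict[str, str]:
    """Map label -> value for every two-cell row (assumed attribute layout)."""
    parser = TwoColumnRowParser()
    parser.feed(html)
    parser.close()
    return {row[0].rstrip(": "): row[1] for row in parser.rows if len(row) == 2}
```

Rows that do not have exactly two cells are skipped, so unrelated tables on the same page are unlikely to pollute the result; if the real pages use a different structure, only this parser would need to change.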

Further guidance

The pipeline stage that processes the algorithm dataset can be run in isolation:

```python
from sec_certs.dataset.fips_algorithm import FIPSAlgorithmDataset

# Build the algorithm dataset on its own and save it as JSON for inspection
alg_dset = FIPSAlgorithmDataset.from_web()
alg_dset.to_json("/path/to/some/file.json")
```

The PR implementing this enhancement should modify the `parse_algorithms_from_html` method.
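One possible shape for the enriched parsing flow, sketched under assumptions: the list page yields `(cert_id, detail_url)` pairs, and each detail page is downloaded and parsed into a dict of extra attributes. `AlgorithmEntry` and `enrich_algorithms` are hypothetical names, not the sec-certs API; downloading and parsing are passed in as callables so the logic can be tested offline. In sec-certs itself, equivalent logic would presumably go into `parse_algorithms_from_html` and populate `FIPSAlgorithm` objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class AlgorithmEntry:
    """Hypothetical record: list-page data plus attributes from the detail page."""

    cert_id: str
    detail_url: str
    details: dict[str, str] = field(default_factory=dict)


def enrich_algorithms(
    links: Iterable[tuple[str, str]],
    fetch: Callable[[str], str],
    parse_detail: Callable[[str], dict[str, str]],
) -> list[AlgorithmEntry]:
    """Fetch and parse the detail page of every algorithm in the list."""
    entries = []
    for cert_id, url in links:
        try:
            html = fetch(url)
        except OSError:
            # Keep the entry even if its detail page cannot be downloaded
            entries.append(AlgorithmEntry(cert_id, url))
            continue
        entries.append(AlgorithmEntry(cert_id, url, parse_detail(html)))
    return entries
```

In practice `fetch` could wrap `urllib.request.urlopen`; its errors (`urllib.error.URLError` and its subclass `HTTPError`) derive from `OSError`, so a single failed download leaves that entry without details instead of aborting the whole run.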

Metadata



Labels

fips: Related to FIPS 140 certification

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
