Judgments parser

This parser converts UK judgments from .docx format to XML. It is written in C# and requires .NET 8.0.

Release process

Find Case Law

Important

This section only covers Find Case Law. If you are trying to create a new release for another project, you should instead follow that project's release process.

  1. Update the code
    • Make a new branch for the release
    • Update version.targets in the root of the repo with the new version number (the parser uses this value to add <uk:parser>x.x.x</uk:parser> to its XML output)
    • Push the branch and open a new PR against main
    • Merge the PR
  2. Create a GitHub Release
    • Create a new tag on main with the same version number as version.targets
    • Generate release notes
    • Publish the release
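
Before publishing, it can be worth confirming that the tag really matches version.targets. The following is a minimal sketch, not part of the repo: the function name tag_matches_version is hypothetical, and it assumes version.targets contains the version as a plain x.y.z string (the file's actual layout is not shown here), taking the first such string it finds.

```shell
# Hypothetical helper: succeeds only if the given tag equals the first
# x.y.z version string found in the file (default: version.targets).
# Assumption: the version appears in the file as plain digits and dots.
tag_matches_version() {
  tag="$1"
  file="${2:-version.targets}"
  # Take the first three-part version number that appears in the file.
  version=$(grep -oE '[0-9]+\.[0-9]+\.[0-9]+' "$file" | head -n 1)
  if [ -n "$version" ] && [ "$tag" = "$version" ]; then
    return 0
  fi
  echo "tag '$tag' does not match version '$version' in $file" >&2
  return 1
}
```

For example, run from the root of the repo, tag_matches_version 1.2.3 returns success only when version.targets also says 1.2.3 (1.2.3 is a placeholder here).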

Deployment

Find Case Law

Important

This section only covers Find Case Law. If you are trying to deploy the parser for another project, you should instead follow that project's deployment process.

  1. Wait for the next day

Validating a deployment

  1. Go to Find Case Law and check that a new judgment has the latest <uk:parser> version in it.
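
If the judgment's XML has been saved locally, the same check can be scripted. The following is a minimal sketch: the function name parser_version and the file name judgment.xml are hypothetical, and it assumes the element appears on a single line in the form <uk:parser>x.x.x</uk:parser>.

```shell
# Hypothetical helper: print the value of the first <uk:parser> element
# read from standard input, or nothing if the element is absent.
parser_version() {
  # Keep only the matching elements, take the first, then strip the tags.
  grep -o '<uk:parser>[^<]*</uk:parser>' | head -n 1 | sed 's/<[^>]*>//g'
}
```

It could then be used as parser_version < judgment.xml, comparing the printed value with the version in version.targets.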

Using the parser

C# API

To invoke the parser programmatically, clients should use the classes in the UK.Gov.NationalArchives.Judgments.Api namespace.

  1. Create a Request object, with the following properties:
    • Content (required), a byte array, the content of the judgment, in .docx format
    • Filename (optional), a string, the name of the .docx file containing the judgment
    • Attachments (optional), an array of Attachment objects, having the following properties:
      • Content (required), a byte array, the content of the attachment, in .docx format
      • Type (required), an enum, with the following possible values: Order
      • Filename (optional), a string, the name of the .docx file containing the attachment
    • Meta (optional), a Meta object, with the following properties:
      • Court (optional), a string, the identifier of the court
      • Cite (optional), a string, the natural citation of the case
      • Date (optional), a date, the date of the judgment
      • Name (optional), a string, the case name
      • Uri (optional), a string, a URI for the judgment
      • Attachments (optional), an array of ExternalAttachment objects, having the following properties:
        • Name (required), a string, the name of the attachment for display
        • Link (optional), a string, a URL for the attachment
    • Hint (optional), an enum, with the following possible values: UKSC, UKCA, UKHC, UKUT, Judgment, PressSummary. If present, the parser will attempt to parse a judgment only of the specified type.
  2. Pass it to the Parse method in the Parser class.
  3. Receive a Response object, which will have the following properties:
    • Xml, a string, the judgment in LegalDocML
    • Images, an array of Image objects, having the following properties:
      • Content, a byte array, the content of the image
      • Type, a string, the MIME type of the image
      • Name, a string, the name of the image as referred to in the XML
    • Meta, a Meta object, as above

CLI

The parser can also be invoked from the command line, as follows:

dotnet run --input path/to/file.docx

So, for example, the following command will parse the included test document and direct the output to the console:

dotnet run --input test/judgments/test1.docx

To direct the XML output to a file, use the --output option, like so:

dotnet run --input test/judgments/test1.docx --output something.xml

To save the XML and all of the embedded images to a .zip file, use the --output-zip option, like so:

dotnet run --input test/judgments/test1.docx --output-zip something.zip

If the --log option is used, the parser will log its progress to the specified file. For example:

dotnet run --input test/judgments/test1.docx --output something.xml --log log.txt

If the --test option is used, the parser will perform a few tests and write the results to the console or, if logging is enabled, to the log file.
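
To parse several documents in one go, the --input and --output options can be combined in a loop. The following is a minimal sketch using only those two options: the function names xml_path_for and parse_all are hypothetical, and it assumes it is run from the same directory in which the dotnet run commands above work.

```shell
# Hypothetical helper: derive the XML output path for a .docx input
# (same directory and base name, with an .xml extension).
xml_path_for() {
  printf '%s\n' "${1%.docx}.xml"
}

# Hypothetical helper: parse every .docx file in the given directory,
# writing each result next to its input file.
parse_all() {
  dir="$1"
  for docx in "$dir"/*.docx; do
    [ -e "$docx" ] || continue   # skip when the glob matches nothing
    dotnet run --input "$docx" --output "$(xml_path_for "$docx")"
  done
}
```

For example, parse_all test/judgments would write test/judgments/test1.xml alongside test/judgments/test1.docx.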

Tests

There is a mixture of unit, integration and end-to-end tests, which together give good coverage of the codebase. These run in CI and should be updated or extended when changes are made.

To run all the tests use your IDE or run:

dotnet test tna-judgments-parser.sln

When significant changes are made to the parser, some tests may fail because the parser's output no longer matches the expected XML. The expected XML files can be updated en masse by running:

dotnet test tna-judgments-parser.sln --filter test.UpdateXmlFiles.UpdateJudgmentXmls -e UPDATE_XML="true"

Dev Containers

You can run this code in a Dev Container in VS Code or other IDEs.

  • Install the ms-vscode-remote.remote-containers extension

  • Press F1, and select Dev Containers: Open Folder in Container.... Select the parser folder.

You can now run tests in debug mode from the flask (Testing) icon on the left.

Configuration lives in devcontainer.json.