Skip to content

Latest commit

 

History

History
84 lines (59 loc) · 3.24 KB

README.md

File metadata and controls

84 lines (59 loc) · 3.24 KB

sorrow

rhinocratic License

A Clojure/ClojureScript implementation of an error-correcting coding scheme for alphanumeric data, as described by A.S. Sethi, V. Rajaraman and P.S. Kenjale in their 1977 paper.

The encoding appends two check characters to an alphanumeric string in order to enable the detection and correction of the following types of error:

  • a single error in transcription of a character OR
  • a single transposition of adjacent characters

For an alphabet of (prime) size p, the maximum length of an encoded word will be (p - 1)/2. The check characters are calculated by one of two different methods according to the desired length n of the encoded word. For n <= ⌊(p + 2)/3⌋, the simpler of the two methods described in the paper will be used, the more complex second method being reserved for cases when longer words are required.

Encoding

To create an encoder for an alphabet and a desired encoded word length n:

(require '[sorrow.core :refer [encoder corrector alphanumeric-upper-case]])

(def enc (encoder alphanumeric-upper-case 8))

The cardinality of the alphabet must be prime. The returned encoding function will accept words of length n-2 formed from the alphabet and append the two check characters:

(enc "ABC123")

;; => "ABC1235K"

Sorrow pre-defines a couple of alphabets:

  • alphanumeric-upper-case - The digits 0-9, letters A-Z plus * to make the cardinality prime (37 characters)
  • alphanumeric-mixed-case - The digits 0-9, letters A-Za-z minus capital 'O' (61 characters)

However, any alphabet of prime size and with distinct characters may be supplied as a string:

(def my-alphabet "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ~@#!*")
(def enc (encoder my-alphabet 8))

Validation/correction

To obtain a corrector for encoded words of length n:

(def cor (corrector alphanumeric-upper-case 8))

For an encoded string of length n, the corrector function will return a map containing the following:

  • :status - :correct, :corrected or :uncorrectable
  • :original - the uncorrected word, as supplied to the corrector
  • :correct - the corrected word (absent if :status = :uncorrectable, same as :original if :status = :correct)
  • :error-type - :transcription or :transposition if :status = :corrected
  • :error-pos - position of the error in the word if :status = :corrected

For example:

(cor "4H9SC512")

;; => {:status :corrected, :original "4H9SC512", :correct "4H9SC510", :error-type :transcription, :error-pos 7}

Running the project's tests

$ clojure -T:build test

License

Copyright © 2023 Andrew Baxter

Distributed under the Eclipse Public License version 1.0.