Three potential additions to stdlib

I now find myself with some free time that I would like to put into extending stdlib. I have some preliminary work on three topics that might be useful to stdlib users: error codes, hash tables/unordered maps, and the Unicode database. I would appreciate some guidance as to which topic people would find most useful, and whether interest in any of them is high enough to justify inclusion in stdlib.

For error codes, the idea is to standardize how stdlib identifies and distinguishes error conditions. I have a list of 100+ error code names, each with a unique integer value. The names currently follow a pattern language in which a suffix indicates the general source of the error: `_failure`, in which the outside world is not as expected; `_error`, in which the programmer likely made a mistake; and `_fault`, in which the processor is likely at fault. This pattern language makes the names a little unwieldy, so I am considering dropping it. I have also defined a subroutine that takes an error code as an argument and stops processing with an `error stop`, using an error-code-dependent string as the stop code. The resulting codes could be made part of `stdlib_error.f90` and the subroutine part of an overloaded `error_stop`. FWIW, this approach was inspired by a [posting by Richard A. O'Keefe on the Erlang mailing list](http://erlang.org/pipermail/erlang-questions/2015-March/083608.html).
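To make the suffix convention concrete, here is a minimal sketch. It is written in Python purely for illustration (the real proposal would be Fortran in `stdlib_error.f90`), and every code name and integer value below is hypothetical rather than taken from my actual list. `SystemExit` stands in for Fortran's `error stop`.

```python
from enum import IntEnum


class ErrorCode(IntEnum):
    # Hypothetical names and values, chosen only to show the suffix pattern.
    SUCCESS = 0
    OPEN_FAILURE = 1      # _failure: the outside world is not as expected
    READ_FAILURE = 2
    INDEX_ERROR = 3       # _error: the programmer likely made a mistake
    ARGUMENT_ERROR = 4
    OVERFLOW_FAULT = 5    # _fault: the processor is likely at fault


def error_category(code):
    """Return the category implied by the name's suffix, or 'none'."""
    suffix = code.name.rsplit("_", 1)[-1].lower()
    return suffix if suffix in ("failure", "error", "fault") else "none"


def error_stop(code):
    """Stop processing with a stop string that depends on the error code."""
    raise SystemExit(f"{code.name.lower()} (code {int(code)})")
```

Dropping the suffixes would shorten the names, at the cost of losing the category information that `error_category` recovers here.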

The hash tables/unordered maps would be composed of two or three modules: a module of hash functions, a module of hash tables using key/value pairs to implement an unordered map, and perhaps a module of hash tables using keys alone to implement an unordered set. The hash functions would have to be [non-cryptographic](https://en.wikipedia.org/wiki/List_of_hash_functions) to be legal in the US. Ideally there should be several different functions, so that the function can be changed in the event of excessive collisions. My idea is to use 64 bit hashes as reasonable for large data sets, though I can be persuaded to use 32 bit hashes. I have tentatively identified the [Fowler–Noll–Vo (FNV)](https://en.wikipedia.org/wiki/Fowler–Noll–Vo_hash_function) hash, the [Murmur hash](https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp), and the [Spooky hash](http://www.burtleburtle.net/bob/hash/spooky.html) as 64 bit public domain hash functions of interest. If anyone believes that any of these hashes is cryptographic, or knows of "better" hash functions, please let me know. The hashes will be transliterated to signed arithmetic, assuming two's complement arithmetic and that overflow detection can be turned off. The hash table will store the key and the value as (transferred) eight bit integer arrays. The tables can come in two implementations: chaining and open tables. They will use power of two table sizes, which assumes that the hash functions do an excellent job of randomizing bits.
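As a rough illustration of the hashing scheme, here is a sketch of the 64 bit FNV-1a variant over a byte-array key, followed by the reduction to a slot in a power of two sized table. It is in Python for brevity, not the proposed Fortran. The function names are hypothetical, and the explicit masking stands in for the wraparound that a Fortran version would get from two's complement overflow. The offset basis and prime are the published FNV 64 bit constants.

```python
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(key: bytes) -> int:
    """FNV-1a over the key's bytes, wrapped to 64 bits after each multiply."""
    h = FNV_OFFSET_BASIS_64
    for byte in key:
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64  # emulate 64 bit wraparound
    return h


def to_signed_64(h: int) -> int:
    """Reinterpret an unsigned 64 bit pattern as a two's complement integer."""
    return h - (1 << 64) if h >= (1 << 63) else h


def slot_index(h: int, table_bits: int) -> int:
    """Pick a slot in a table of size 2**table_bits by keeping the low bits."""
    return h & ((1 << table_bits) - 1)
```

For example, `slot_index(fnv1a_64(b"a"), 4)` selects one of 16 slots. Because the mask keeps only the low bits of the hash, any weakness in how a function mixes those bits shows up directly as collisions, which is why the power of two sizing leans on the quality of the hash.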

The Unicode Consortium maintains a [database](https://unicode.org/ucd/) of character properties for the UCS character set. This database is the basis of the Python module [unicodedata](https://docs.python.org/3/library/unicodedata.html). It is necessary for assigning code points to categories (e.g., number, letter, and punctuation), converting strings to one of the four normalization forms, and case folding. I consider this data necessary for the implementation of a UCS-based varying string module. The initial implementation would be a module with procedures that use 32 bit integer scalars and arrays to represent UCS code points and strings. Several of the Unicode categories ("Bidi", "General Category", "Decomposition", and "Numerical") would be represented by enumerations in category-specific types whose only components are eight bit integers. The text files in the UCD and Unihan directories that make up the database take up about 70.5 MBytes, so supporting this database requires a very significant download. The rank one arrays implementing this database take up a comparable amount of runtime storage.
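For a feel of the queries such a module would answer, the sketch below uses Python's existing `unicodedata` module, which is built on the same database. The helper names `describe` and `to_code_points` are hypothetical. The lists of integer code points only mirror the 32 bit integer representation described above; they are not a proposed Fortran interface.

```python
import unicodedata


def describe(ch):
    """Look up a few UCD properties for a single character."""
    return {
        "code_point": ord(ch),
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "decomposition": unicodedata.decomposition(ch),
        "numeric_value": unicodedata.numeric(ch, None),  # None if not numeric
    }


def to_code_points(text, form="NFC"):
    """Normalize text to one of NFC, NFD, NFKC, NFKD; return its code points."""
    return [ord(c) for c in unicodedata.normalize(form, text)]


# "e" followed by a combining acute accent composes to U+00E9 under NFC.
composed = to_code_points("e\u0301", "NFC")

# Case folding is available on Python strings directly.
folded = "Stra\u00dfe".casefold()
```

Each of these lookups depends on tables derived from the UCD text files, which is where the storage cost mentioned above comes from.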

