Releases: lex-lingo/lingo
v1.10.2
v1.10.1
v1.10.0
- Dropped support for Ruby 2.0.
- Updated dependency versions.
v1.9.0
- Dropped support for Ruby 1.9.
- Removed support for deprecated options and attendee names (old → new):
  - Lingo::Language::Grammar: compositum → compound
  - Lingo::Attendee::TextReader: lir-record-pattern → records
  - Lingo::Config: multiworder → multi_worder, objectfilter → object_filter, textreader → text_reader, textwriter → text_writer, vectorfilter → vector_filter, wordsearcher → word_searcher
- Lingo::Attendee::TextWriter learned format directives for ext option (currently supported are: %c = config name, %l = language name, %d = current date, %t = current time).
- Lingo::Attendee::Sequencer remembers word form of sequences.
- Updated and extended English system dictionary and suffix list.
- Fixed errors with XML input (issue #15 by Thomas Berger).
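
For configurations that still use the names removed in 1.9.0, the old → new table above can be applied mechanically. The following is only a sketch: `migrate_keys` is a hypothetical helper that is not part of Lingo, and the nested `meeting`/`attendees` layout of the sample hash is an assumption made for illustration.

```ruby
# Old => new names, copied from the v1.9.0 notes above.
RENAMED_KEYS = {
  'compositum'         => 'compound',
  'lir-record-pattern' => 'records',
  'multiworder'        => 'multi_worder',
  'objectfilter'       => 'object_filter',
  'textreader'         => 'text_reader',
  'textwriter'         => 'text_writer',
  'vectorfilter'       => 'vector_filter',
  'wordsearcher'       => 'word_searcher'
}.freeze

# Hypothetical helper (not part of Lingo): walks a parsed config
# (nested hashes and arrays) and renames every key found in the table.
def migrate_keys(node)
  case node
  when Hash
    node.each_with_object({}) do |(key, value), out|
      out[RENAMED_KEYS.fetch(key, key)] = migrate_keys(value)
    end
  when Array
    node.map { |item| migrate_keys(item) }
  else
    node
  end
end

# Sample input; the structure is assumed, not taken from a real config.
old_config = {
  'meeting' => {
    'attendees' => [
      { 'textreader'   => { 'lir-record-pattern' => true } },
      { 'wordsearcher' => {} }
    ]
  }
}

new_config = migrate_keys(old_config)
```

The helper renames keys wherever they occur, so it does not check that, say, `compositum` really sits under a grammar section; a stricter migration would need to know the actual config schema.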
v1.8.7
- Added Lingo::Attendee::LsiFilter to correlate semantically related terms (LSI) over the "corpus" of all files processed during a single program invocation; requires lsi4r which in turn requires rb-gsl. [EXPERIMENTAL: Interface may be changed or removed in next release.]
- Added Lingo::Attendee::HalFilter to correlate semantically related terms (HAL) over individual documents; requires hal4r which in turn requires rb-gsl. [EXPERIMENTAL: Interface may be changed or removed in next release.]
- Added Lingo::Attendee::AnalysisFilter and associated lingoctl tooling.
- Multiword dictionaries can now identify hyphenated variants (e.g. automatic data-processing); set hyphenate: true in the dictionary config.
- Lingo::Attendee::Tokenizer no longer considers hyphens at word edges as part of the word. As a consequence, Lingo::Attendee::Dehyphenizer has been dropped.
- Dropped Lingo::Attendee::NonewordFilter; use Lingo::Attendee::VectorFilter with option lexicals: '\?' instead.
- Lingo::Attendee::TextReader and Lingo::Attendee::TextWriter learned encoding option to read/write text that is not UTF-8 encoded; configuration files and dictionaries still need to be UTF-8, though.
- Lingo::Attendee::TextReader and Lingo::Attendee::TextWriter learned to read/write Gzip-compressed files (file extension .gz or .gzip).
- Lingo::Attendee::Sequencer learned to recognize 0 in the pattern to match number tokens.
- Fixed Lingo::Attendee::TextReader to recognize BOM in input files; does not apply to input read from STDIN.
- Fixed regression introduced in 1.8.6 where Lingo::Attendee::Debugger would no longer work immediately behind Lingo::Attendee::TextReader.
- Fixed lingoctl copy commands when overwriting existing files.
- Refactored Lingo::Database::Crypter into a module.
- JRuby 9000 compatibility.
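
To illustrate what Gzip detection by file extension and BOM handling amount to, here is a minimal standalone sketch using only Ruby's standard library. It is not Lingo's implementation: `read_text` is a hypothetical helper, and it assumes UTF-8 input.

```ruby
require 'zlib'

# Hypothetical helper (not Lingo's API): reads a text file as UTF-8,
# transparently decompressing *.gz / *.gzip files and dropping a
# leading byte order mark if one is present.
def read_text(path)
  if path =~ /\.gz(ip)?\z/i
    raw = Zlib::GzipReader.open(path) { |gz| gz.read }
    # Decompressed bytes carry no encoding information; assume UTF-8.
    raw.force_encoding(Encoding::UTF_8).sub(/\A\uFEFF/, '')
  else
    # The "BOM|UTF-8" mode makes Ruby skip a UTF-8 BOM when reading.
    File.open(path, 'r:BOM|UTF-8') { |file| file.read }
  end
end
```

A call such as `read_text('corpus/sample.txt.gz')` (the path is made up) would return the decompressed text without the BOM.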
v1.8.6
- Lingo::Attendee::VectorFilter learned pos option to print position and byte offset with each word.
- Lingo::Attendee::VectorFilter learned tfidf option to sort results based on their tf–idf score; the document frequencies are calculated over the "corpus" of all files processed during a single program invocation.
- Lingo::Attendee::VectorFilter learned tokens option to filter on Lingo::Language::Token in addition to Lingo::Language::Word.
- Lingo::Attendee::VectorFilter no longer supports debug (as well as prompt and preamble); use Lingo::Attendee::DebugFilter instead.
- Lingo::Attendee::TextReader no longer removes line endings; option chomp is obsolete.
- Lingo::Attendee::TextReader passes byte offset to the following attendee.
- Lingo::Attendee::Tokenizer records token's byte offset.
- Lingo::Attendee::Tokenizer records token's sequence position.
- Lingo::Attendee::Tokenizer learned skip-tags option to skip over specified tags' contents.
- Lingo::Attendee subclasses warn when invalid or obsolete options or names are used.
- Changed German infix substitution /en to ch/chen in order to prevent overly aggressive identifications.
- Internal refactoring and API changes.
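
As background for the tfidf option, the sketch below computes textbook tf–idf scores with document frequencies taken over a whole corpus. It is an illustration under assumptions: the `tfidf` method is hypothetical, and the exact weighting Lingo applies may differ from this formula.

```ruby
# Hypothetical helper (not Lingo's API).
# documents: array of term arrays, one inner array per processed file.
# Returns one { term => score } hash per document.
def tfidf(documents)
  doc_count = documents.size.to_f

  # Document frequency: in how many documents does each term occur?
  df = Hash.new(0)
  documents.each { |terms| terms.uniq.each { |term| df[term] += 1 } }

  documents.map do |terms|
    counts = Hash.new(0)
    terms.each { |term| counts[term] += 1 }

    counts.each_with_object({}) do |(term, n), scores|
      tf  = n / terms.size.to_f              # relative term frequency
      idf = Math.log(doc_count / df[term])   # rarer terms weigh more
      scores[term] = tf * idf
    end
  end
end

corpus = [%w[index term index], %w[index query]]
scores = tfidf(corpus)

# Terms of the first document, highest score first.
ranked = scores[0].sort_by { |_term, score| -score }.map(&:first)
```

A term that occurs in every document gets an idf of zero, which is why the document frequencies have to be collected across all files before any single file can be ranked.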
v1.8.5
- Dictionary values (projections) are no longer sorted; hence, order of definition affects processing.
- Lexicals in Lingo::Language::Word are no longer sorted; in particular, compound parts keep their original order.
- Lexicals in Lingo::Language::Word are no longer cleaned from duplicates.
- Compiled dictionaries are updated whenever the Lingo version or their configuration changes, not only when the source file's size or modification time changes.
- Lingo::Attendee::Synonymer learned compound-parts option to also generate synonyms for compound parts when set to true.
- Lingo::Attendee::TextReader learned better PDF-to-text conversion using the pdftotext command; specify filter: pdftotext in the config.
- Lingo::Attendee::VectorFilter learned dict option to print words in dictionary format (viz. Lingo::Database::Source::WordClass).
- Lingo::Attendee::VectorFilter learned preamble option to print current configuration to the beginning of the log file (debug: 'true'); set preamble: false to disable.
- Multiword dictionaries compiled from base forms can now generate inflected adjectives based on the gender of the head noun; set inflect: true in the dictionary config.
- Lingo::Database::Source::WordClass supports gender information being encoded in the dictionary as well as shorthand notation for multiple word classes/genders.
- Lingo::Database::Source::WordClass supports compounds being encoded in the dictionary (appending + to their parts' word classes is recommended).
- Lingo::Database::Source removes leading and trailing whitespace from dictionary lines.
- Lingo::Database::Crypter uses OpenSSL to encrypt/decrypt dictionaries. Note: Can't decrypt dictionaries encrypted with the old scheme anymore.
- Lingo::Attendee::Tokenizer learned subset of MediaWiki syntax.
- Eliminated pathological behaviour of the URLS rule in Lingo::Attendee::Tokenizer.
- Fixed regression introduced in 1.8.2 where combine: all would no longer work in Lingo::Attendee::MultiWorder.
- Updated and extended Russian dictionaries. (Yulia Dorokhova, Thomas Müller)
- lingoctl no longer overwrites existing files without confirmation.
- lingoctl learned archive command.
- Dictionary cleanup.
v1.8.4
- Lingo::Attendee::Sequencer accepts regular expression patterns.
- Lingo::Attendee::Sequencer substitutes 0 in the format string for the matched pattern.
- Lingo::Attendee::NonewordFilter learned dict option to print nonewords in dictionary format.
- Added progress reporting to Lingo::Attendee::TextReader for STDIN.
- lingoctl demo reports successful initialization.
- Russian localization for Lingo::Web. (Yulia Dorokhova, Thomas Müller)
- Lingo::Web learned parameter hl to set UI language.
- Lingo::Web displays the configuration in use.
- Lingo::Srv accepts array of query strings in addition to single query string.
- Meeting config takes precedence over language config.
- When dictionary entries are rejected during conversion, the location of the reject file will be shown.
- LIR record number defaults to match string in absence of capture group.
- Optionally prevent Lingo from sorting any results by setting the LINGO_NO_SORT environment variable.
v1.8.3
- Fixed regression introduced in 1.8.2 where reading input from STDIN was no longer possible.
- Fixed regression introduced in 1.8.2 where Lingo would no longer run on Ruby 1.9.2.
- Fixed length limit handling for multibyte characters in SDBM store.
- Fixed encoding issue in SDBM store.
- Fixed issue with BOM in config files.
- Modified character handling to accept any Unicode letter (Alphabetic) and digit (Decimal Number).
- Modified Lingo::Attendee::Tokenizer to use only hard-coded tokenization rules.
- Modified Lingo::Attendee::VectorFilter option lexicals to be case-sensitive.
- Improved overall performance and memory usage; Lingo::Attendee::Sequencer changed the order sequences are inserted into the stream.
- Eliminated performance penalty caused by Lingo::Attendee::Abbreviator.
- Added Russian language support. (Yulia Dorokhova, Thomas Müller)
- Added fields option to Lingo::Attendee::TextReader to cut off field labels; defaults to true in record (LIR) mode.
- Added skip option to Lingo::Attendee::TextReader to skip lines matching the given pattern.
- Added src option to Lingo::Attendee::VectorFilter to print "source" part of compounds.
- Added lingosrv and lingoweb executables. The former provides a simple HTTP endpoint with JSON output; the latter serves a demo web interface.
- Refactored internal caching.
- Made dependency on Ruby version >= 1.9.2 explicit.
- Removed reporting facility (options --perfmon and --status).
- Learned --profile option to collect profiling information while running.
- Deprecated Lingo::Language::Grammar option compositum (now compound), Lingo::Config option textreader (now text_reader), and Lingo::Attendee::TextReader option lir-record-pattern (now records); they will be removed in Lingo 1.9.
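
The character handling change in 1.8.3 (any Unicode letter and decimal digit) can be pictured with Ruby's Unicode-aware character classes. This is a rough sketch, not Lingo's tokenizer: the `TOKEN` pattern and the `tokens` method are hypothetical, and `\p{Alpha}` is used here as an approximation of the Alphabetic property mentioned in the note.

```ruby
# Runs of Unicode letters (\p{Alpha}) and decimal digits (\p{Nd}).
# Hypothetical pattern for illustration; Lingo's own rules may differ.
TOKEN = /[\p{Alpha}\p{Nd}]+/

# Hypothetical helper: splits text into letter/digit runs.
def tokens(text)
  text.scan(TOKEN)
end

russian = tokens("\u041F\u0440\u0438\u0432\u0435\u0442, \u043C\u0438\u0440 42!")
french  = tokens("caf\u00E9 na\u00EFve")
```

With an ASCII-only class such as `[A-Za-z0-9]`, the Cyrillic and accented words above would be split apart or dropped, which is the kind of limitation this change removes.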
v1.8.2
- Performance improvements regarding Lingo::Attendee::VectorFilter (as well as Lingo::Attendee::NonewordFilter) memory usage; set sort: false in the config.
- Added Lingo::Attendee::Stemmer (implementing Porter's algorithm for suffix stripping).
- Added progress reporting to Lingo::Attendee::TextReader; set progress: true in the config.
- Added directory and glob processing to Lingo::Attendee::TextReader (new options glob and recursive).
- Renamed Lingo::Attendee::TextReader option lir-record-pattern to records.
- Fixed Lingo::Attendee::Debugger to forward all objects so it can be inserted between any two attendees in the config.
- Fixed regression introduced in 1.8.0 where Lingo would not use existing compiled dictionary when source file is not present.
- Fixed "invalid byte sequence in UTF-8" on Windows for SDBM store.
- Enabled pluggable (compiled) dictionaries and storage backends.
- Extensive internal refactoring and cleanup. (Finished for now.)