Description
Compared with fra, deu, ita, and spa, the word list in eng.traineddata contains a relatively large number of ambiguous words (checked with https://gist.github.com/jbarth-ubhd/8d5ceb4035bf2d89700117a311209f20 ):
AMBIGIOUS (EXCERPT): Abstract;In addRole Alberta.ca AngMarTV AppSight aXe BarCap Betting| BioTalent BOX/VPOWER B|S|T BTsites CafeMom CATEGORY:NONE ChemGrout classi®cation CMDs CyberCoders d’Alzon Disc™ DomainTools EARTHWEBNEWS.COM ebizQ EBV-infected Elly_Brown ESPN.com Fire).gba FishBowlDC GEO's getFieldType GFP-Fes GOV/PGC/A GreatSeats.com HKFlix HMSHost icon.gif IconLover image/file JobList KCAL/MOL kgw.com KrF LFTs liveCD load_five MbePoint McBurney McGrady MESSAGE Metz® MOVIES/HDTV NCN-pincer NetFlix ~NEW NotesViewColumn NowBuy NowVisit om/fresh PollDaddy <POSSIBLE <<PREVIOUS PRICES|TIPS ProGrad QCard Quotes.net RakionSEA Re:finlay RTDs SciencesLocation Security| >see SEOs ServerBeach Services/Armed Solution™ <STDIO.H> TheBlackElf T/L UNjobs.org usawallpaper.com Ventolin® ViewVC VivirLatino vWD WebCopier www.ask.com <?xml
338080 lines
0.00 % lines with »ſ«
27.71 % lines all-UPPERCASE
8.68 % lines ambigious
PS: fra, deu, ita, and spa also contain ~30% all-UPPERCASE words. Is this intended?
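
For anyone who wants to reproduce this kind of statistic without the gist, here is a minimal Python sketch. It assumes the word list has already been extracted from the traineddata file into a plain UTF-8 text file with one word per line. The `is_flagged` heuristic (internal capital letters, or characters other than letters, digits, hyphens, and apostrophes) is only my assumption for illustration and is not the criterion the gist uses for "ambiguous"; the function names are hypothetical as well.

```python
def is_flagged(word: str) -> bool:
    """Hypothetical heuristic, NOT the gist's definition of 'ambiguous'."""
    # A lowercase letter directly followed by an uppercase one, e.g. "NetFlix".
    internal_capital = any(a.islower() and b.isupper() for a, b in zip(word, word[1:]))
    # Any character that is not a letter, digit, hyphen, or apostrophe, e.g. "ESPN.com".
    odd_symbol = any(not (c.isalnum() or c in "-'\u2019") for c in word)
    return internal_capital or odd_symbol


def word_stats(words):
    """Return the line count and percentages comparable to the report above."""
    words = [w for w in words if w]  # ignore empty lines
    total = len(words)
    if total == 0:
        return {"lines": 0, "long_s": 0.0, "uppercase": 0.0, "flagged": 0.0}

    def pct(count):
        return 100.0 * count / total

    return {
        "lines": total,
        "long_s": pct(sum("\u017f" in w for w in words)),  # words containing the long s
        "uppercase": pct(sum(w.isupper() for w in words)),  # all cased characters uppercase
        "flagged": pct(sum(is_flagged(w) for w in words)),
    }


def load_words(path):
    """Read one word per line from a UTF-8 text file (assumed format)."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]
```

Calling `word_stats(load_words("wordlist.txt"))` on each language's extracted list (the file name is a placeholder) would give numbers that can be compared across languages, though the "flagged" share will differ from the gist's "ambiguous" share because the criterion is different.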