English: Short words ending -s #246

@ojwb

Spun off from #165 (comment)

@pramsey reported that the two plurals of bus are not conflated:

$ printf 'bus\nbuses\nbusses\n'|./stemwords -l en -p
bus -> bus
buses -> buse
busses -> buss

There is a cost to exceptions, especially ones that need to be checked for every word stemmed, so we don't generally worry about irregular cases if we're no worse off than we would be without any stemming. However, if the irregular forms get conflated with words which have different (or different enough) meanings, that's another matter.

Here the only potentially unwanted conflation seems to be with buss (archaic word for a kiss/to kiss). If we're worrying about buss then busses is also the plural of the noun and third person singular of the verb, so it's inherently ambiguous.

If this is part of a wider pattern for which we can come up with a sensible rule, then it might be worth an exception. So far I've spotted these other words ending -s which add -es for the plural and may or may not double the s:

  • The plural of bias can be biases (stem bias) or biasses (stem biass, same as biassed and biassing)
  • The plural of gas can be gases (stem gase) or gasses (stem gass, same as gassed and gassing)
  • The plural of yes (as a noun) can be yeses (stem yese) or yesses (stem yess)

(We already have an exceptional invariant entry for bias to prevent us removing the s.)

It seems any new rule for this can't just look at the ending since the current handling of e.g. vases->vase and masses->mass is what we want.
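To illustrate why an ending-only rule falls short, here is a minimal sketch of one. The function `naive_s_plural` is hypothetical (it is not part of Snowball or of any proposed change); it simply maps -sses and -ses to -s and shows the regressions that follow:

```python
def naive_s_plural(word: str) -> str:
    """Hypothetical ending-only rule: -sses -> -s, otherwise -ses -> -s."""
    if word.endswith("sses"):
        return word[:-3]
    if word.endswith("ses"):
        return word[:-2]
    return word


# Forms we would like to conflate with their singular.
helped = ["buses", "busses", "gases", "gasses"]
# Forms the current stemmer already handles the way we want (vase, mass).
harmed = ["vases", "masses"]

for word in helped + harmed:
    print(f"{word} -> {naive_s_plural(word)}")
```

Under this sketch buses and busses both become bus, but vases becomes vas and masses becomes mas, so any real rule would need more than the ending to go on.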

I checked the mailing list archives and gas/gases/gasses has been noted at least twice before (and gas improved to not stem to ga). Martin summarised that change (probably the last in this area):

-s removal has been changed. You now need a vowel somewhere before the letter before the s. So 'gas', 'this', 'has', 'was' keep the s, 'dogs', 'cats', 'woos', 'kiwis' lose the s. Usefully, the s is not removed from non-words like 'cvs', 'spss', 'lms' etc.

In general there is a problem identifying plurals of words ending Xs, where X is vowel other than e. As you know, porter2 leaves -us alone but removes s after a,i,o. This works fairly well.
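The -s removal rule quoted above can be sketched roughly as follows. This is a simplified illustration of only the plural-suffix step, written for this discussion rather than taken from the Snowball sources: the names `is_vowel` and `strip_plural_s` are made up, the treatment of y is an approximation, and the exception list (such as the invariant entry for bias) and all later steps of the stemmer are left out.

```python
def is_vowel(word: str, i: int) -> bool:
    """Approximate vowel test: y counts as a consonant at the start of the
    word or after a vowel, and as a vowel otherwise."""
    ch = word[i]
    if ch == "y":
        return i > 0 and not is_vowel(word, i - 1)
    return ch in "aeiou"


def strip_plural_s(word: str) -> str:
    """Sketch of the plural-suffix step only (no exceptions, no later steps)."""
    if len(word) <= 2:
        return word
    if word.endswith("sses"):
        return word[:-2]  # masses -> mass
    if word.endswith(("ied", "ies")):
        # -i if preceded by more than one letter, otherwise -ie
        return word[:-2] if len(word) > 4 else word[:-1]
    if word.endswith(("us", "ss")):
        return word  # -us and -ss are left alone
    if word.endswith("s"):
        # Need a vowel somewhere before the letter before the s.
        if any(is_vowel(word, i) for i in range(len(word) - 2)):
            return word[:-1]
    return word


for w in ["gas", "this", "dogs", "kiwis", "cvs", "bus", "buses", "busses"]:
    print(f"{w} -> {strip_plural_s(w)}")
```

With this sketch gas, this, cvs and bus are left unchanged, dogs and kiwis lose the s, and buses and busses come out as buse and buss, matching the stemwords output at the top of this issue.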
