Talk:Z13402

From Wikifunctions

Definition of “word”

See also words from string (Z13402). Tokenization by whitespace could be generalized to tokenization by delimiter(s). If punctuation is suppressed by whitespace substitution or inclusion within delimiters, we converge on a common function.
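As a rough illustration of that convergence, here is a minimal Python sketch. The function names and the punctuation set are hypothetical and chosen only for this example; they are not taken from any existing Wikifunctions implementation. It compares substituting whitespace for punctuation against treating punctuation as extra delimiters.

```python
import re

WHITESPACE = " \t\n"

def split_on_delimiters(text, delimiters=WHITESPACE):
    """Split text on any run of the given delimiter characters,
    dropping empty tokens."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return [token for token in re.split(pattern, text) if token]

def words_by_substitution(text, punctuation=".,;:!?"):
    """Replace each punctuation character with a space,
    then tokenize by whitespace."""
    table = str.maketrans({ch: " " for ch in punctuation})
    return split_on_delimiters(text.translate(table))

def words_by_delimiters(text, punctuation=".,;:!?"):
    """Treat punctuation characters as additional delimiters."""
    return split_on_delimiters(text, WHITESPACE + punctuation)

sample = "Hello, world! This is a test."
print(words_by_substitution(sample))
print(words_by_delimiters(sample))
```

Both calls return the same list for this sample, which is the sense in which the two approaches reduce to one function parameterized by a delimiter set. The whitespace set here is deliberately limited to space, tab and newline; a real function would need to decide how to treat other Unicode whitespace.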

In the domain of lexical forms, conventions vary by language. In English we have a particular difficulty with hyphens and apostrophes (occasionally described by the misnomer “interpunction”).

  • The string “don’t” is generally regarded as equivalent to “do not”, which is two words, not one.
  • The string “can’t” is generally regarded as equivalent to “cannot”, which might be considered a single word.
  • Contraction of “is” to “’s” may be indistinguishable from a possessive, so a whitespace-delimited string ending in ’s may be considered either one word or two (whereas such a string ending in s’ is always a single word, if correct).
  • Compound words are typically hyphenated in some contexts and left as separate words in others. A “well-known” distinction is one that is well known. Sometimes a form with neither hyphens nor spaces may be used (see, for example, https://books.google.com/ngrams/graph?content=wellknown%2Cwell-known%2Cwell+known&year_start=1800&year_end=2000&corpus=en-2019&smoothing=3).
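To make the dependence on convention concrete, the following Python sketch counts words under different choices for hyphens and apostrophes. The function name and both flags are hypothetical, introduced only for this illustration. Note that splitting on an apostrophe yields fragments such as “don” and “t”, so it changes the count without recovering “do not”.

```python
import re

def count_words(text, split_hyphens=False, split_apostrophes=False):
    """Count tokens, optionally treating hyphens and apostrophes
    (straight or typographic) as delimiters alongside whitespace."""
    delimiters = r"\s"
    if split_apostrophes:
        delimiters += "'\u2019"
    if split_hyphens:
        delimiters += r"\-"  # escaped so it is never read as a range
    tokens = re.split("[" + delimiters + "]+", text)
    return len([token for token in tokens if token])

phrase = "a well-known fact"
print(count_words(phrase))                      # hyphenated compound kept whole
print(count_words(phrase, split_hyphens=True))  # compound split in two

contraction = "don\u2019t"
print(count_words(contraction))
print(count_words(contraction, split_apostrophes=True))
```

The same string gives a different count depending only on the flags, so any definition of “word” for this function has to state which convention it adopts.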

GrounderUK (talk) 13:40, 30 March 2024 (UTC)