Wikifunctions:Type proposals/Grapheme

From Wikifunctions

Summary

A user-perceived character: the smallest functional unit of writing (see w:grapheme). A value includes not just the "base" character but also any combining characters that follow it.

Uses

We often need to process Strings by user-perceived character. "1️⃣" should be processed as one grapheme, not as three characters (1 + variation selector + combining keycap). When users want to select the last character of "I won't decline this proposal 1️⃣", they want the full 1️⃣, not just the trailing combining keycap. Likewise, we don't want nonsense like ⃣1 to emerge when we reverse the String. This doesn't apply only to keycap emoji; it also applies to natural-language writing systems, as with the grapheme "A̧" (A+̧).

To do all of that, we need to split a String into graphemes, and to do that, we need the grapheme type. Such a splitter would fix characters with diacritics (Z22735).
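A rough Python sketch of what such a splitter does, and why it fixes the "last character" and reversal cases. It is a deliberate simplification rather than the proposed implementation: it only attaches combining marks (general categories Mn, Mc, Me) to the preceding code point, whereas the full grapheme cluster rules in Unicode Standard Annex #29 also cover Hangul syllables, flag sequences, emoji joined by ZWJ, and CR LF. The names `split_graphemes` and `reverse_by_grapheme` are hypothetical.

```python
import unicodedata

# Assumption: a code point in one of these categories attaches to the
# code point before it. This approximates, but does not fully implement,
# the grapheme cluster rules of Unicode Standard Annex #29.
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def split_graphemes(text: str) -> list[str]:
    """Split text into approximate graphemes (hypothetical helper)."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch) in MARK_CATEGORIES:
            clusters[-1] += ch   # extend the current grapheme
        else:
            clusters.append(ch)  # start a new grapheme
    return clusters


def reverse_by_grapheme(text: str) -> str:
    """Reverse the order of graphemes while keeping each one intact."""
    return "".join(reversed(split_graphemes(text)))


# 1 + variation selector-16 + combining enclosing keycap
keycap = "1\ufe0f\u20e3"
sentence = "I won't decline this proposal " + keycap

last = split_graphemes(sentence)[-1]          # the whole keycap sequence
flipped = reverse_by_grapheme("ab" + keycap)  # the keycap followed by "ba"
```

Reversing by code point instead (`("ab" + keycap)[::-1]`) would put the combining keycap first, which is exactly the nonsense described above.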

Structure

Structured like the String type: a single key, K1, whose value is a String containing the full sequence of characters that make up this one grapheme.

Example values

{
  "type": "grapheme",
  "value": "A̧"
}
{
  "Z1K1": "Zxyz",
  "ZxyzK1": "A̧"
}

Validator

The validator ensures that all characters under the "value" field combine to form exactly one grapheme.
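A minimal sketch of such a check in Python, under a simplifying assumption: a value counts as one grapheme if it is non-empty and every code point after the first is a combining mark (general category Mn, Mc, or Me). A real validator would need the full grapheme cluster rules of Unicode Standard Annex #29, so this version wrongly rejects, for example, flag sequences and emoji joined by ZWJ. The name `is_single_grapheme` is hypothetical.

```python
import unicodedata

# Assumption: only combining marks may follow the first code point.
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def is_single_grapheme(value: str) -> bool:
    """Approximate check that value holds exactly one grapheme."""
    if not value:
        return False  # an empty String holds no grapheme at all
    return all(unicodedata.category(ch) in MARK_CATEGORIES for ch in value[1:])


ok_cedilla = is_single_grapheme("A\u0327")       # A + combining cedilla
ok_keycap = is_single_grapheme("1\ufe0f\u20e3")  # keycap sequence
two_letters = is_single_grapheme("AB")           # two separate graphemes
```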

Identity

Two graphemes are the same if their value Strings are the same.
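One consequence of this rule, sketched in Python using the readable form from the example values: two values that render identically but use different code point sequences are not the same grapheme. The helper `same_grapheme` is hypothetical, and the NFC comparison at the end is shown only for contrast; the proposal does not say that values are normalized.

```python
import unicodedata


def same_grapheme(a: dict, b: dict) -> bool:
    """Identity as stated above: compare the value Strings directly."""
    return a["value"] == b["value"]


precomposed = {"type": "grapheme", "value": "\u00e9"}  # single code point U+00E9
decomposed = {"type": "grapheme", "value": "e\u0301"}  # e + combining acute accent

identical = same_grapheme(precomposed, decomposed)

# Not part of the proposal: after NFC normalization the two Strings match.
equal_after_nfc = (
    unicodedata.normalize("NFC", decomposed["value"]) == precomposed["value"]
)
```

Here `identical` is `False` while `equal_after_nfc` is `True`, so whether values should be normalized before comparison is a question the identity rule leaves open.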

Converting to code

Probably just return the String value, K1?

Display function

Display K1.

Read function

The input should be K1. Another function can split an arbitrary String into a typed array of graphemes; see Special:Permalink/32335, which can also be reengineered into a validator.

Alternatives

We could represent a grapheme as a plain String, but that is awkward: it could require bundling a validator and an equality check with each function that deals with graphemes.

Comments