Wikifunctions:Type proposals/Alphabet

From Wikifunctions

This type would be a list of Z86 (code points) associated with one Z60 (natural language). A language may have multiple alphabets associated with it for different purposes.

Uses

Sorting

The most obvious use case would be language-aware sorting, as even Latin-based alphabets disagree on the order of letters.
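
As a rough illustration of alphabet-driven sorting, the Python sketch below ranks each character by its position in an ordered alphabet instead of by its code point value. The names `SWEDISH` and `make_sort_key` are hypothetical and not part of any existing Wikifunctions function; the sketch handles single-code-point letters only and sorts unknown characters last, which is an assumption rather than a proposed rule.

```python
import unicodedata

# Hypothetical alphabet value: an ordered list of letters for one language.
# In Swedish, å, ä and ö come after z.
SWEDISH = list("abcdefghijklmnopqrstuvwxyzåäö")

def make_sort_key(alphabet):
    """Return a key function that ranks words by the given letter order."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    unknown = len(alphabet)  # assumption: characters outside the alphabet sort last

    def key(word):
        # Normalize so that precomposed letters such as "å" are one code point.
        normalized = unicodedata.normalize("NFC", word).lower()
        return [rank.get(ch, unknown) for ch in normalized]

    return key

words = ["öl", "zebra", "ärta", "apa", "år"]
print(sorted(words, key=make_sort_key(SWEDISH)))
# ['apa', 'zebra', 'år', 'ärta', 'öl']
```

Plain code point sorting would place "ärta" before "år", because ä (U+00E4) precedes å (U+00E5) in Unicode, whereas the Swedish alphabet orders them the other way round.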

Language dependent string evaluation/manipulation

This covers miscellaneous cases where an alphabet is passed as an argument to a function. Some existing functions where this could be useful:

  • Z11693
  • Z13119
  • Caesar ciphers: general case Z12812, and Z10846, Z10627 and Z10851
  • Z10096 has many issues, but one raised on its discussion page is the handling of multi-character letters. Breton is used as an example, and this also applies to many other languages, such as the Dutch IJ and the Welsh CH, DD, FF, NG, LL, PH, RH and TH. And those languages are still using "the Latin alphabet".
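
To make the multi-character letter issue concrete, here is a minimal Python sketch that splits a word into letters by greedy longest match against an alphabet given as a list of strings. The names `WELSH` and `split_letters` are hypothetical, and greedy matching is only one possible strategy: it will misread words in which two adjacent letters merely happen to spell a digraph, so real use would need per-language exceptions.

```python
# Hypothetical alphabet value: Welsh letters in order, digraphs included.
WELSH = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i",
    "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t", "th",
    "u", "w", "y",
]

def split_letters(word, alphabet):
    """Split a word into letters, preferring the longest matching letter."""
    # Try longer letters first so that "ll" wins over "l".
    candidates = sorted(alphabet, key=len, reverse=True)
    lowered = word.lower()
    letters = []
    i = 0
    while i < len(lowered):
        for letter in candidates:
            if lowered.startswith(letter, i):
                letters.append(letter)
                i += len(letter)
                break
        else:
            # Assumption: characters outside the alphabet count as one letter.
            letters.append(lowered[i])
            i += 1
    return letters

print(split_letters("Llanddwyn", WELSH))
# ['ll', 'a', 'n', 'dd', 'w', 'y', 'n']
```

A letter count built on this split reports 7 letters for "Llanddwyn", where a count of code points reports 9.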

Comments

This still leaves some sorting-related issues unresolved, like transliteration of foreign orthography. In Swedish, the Danish Ø and ø are treated like the native Ö and ö in sorting, like in this Wikipedia category. But those could be handled using language-specific replacement maps, since an alphabet passed to the function would identify which natural language to use. --Autom (talk) 01:33, 30 March 2024 (UTC)

Do you think this should be a String, rather than an (ordered) list of code points? Jdforrester (WMF) (talk) 18:17, 1 April 2024 (UTC)
@Jdforrester (WMF): I wrote it like that because some languages treat double letters differently for sorting (like how Aa is sorted under Å in w:da:Kategori:Købstæder). Using single code points would be more elegant and intuitive, but a small string can do all the same things and more. --Autom (talk) 11:42, 10 May 2024 (UTC)
Sorry, this wouldn't solve my example, as they are treated as equivalent. The Dutch IJ is already in its alphabetical position, but I'm certain there are other exceptions I haven't thought about. I have only limited knowledge of European languages, after all.
You have me convinced that it might be best to use code points and solve edge cases on a per-language basis instead. --Autom (talk) 11:51, 10 May 2024 (UTC)