Wikifunctions:Status updates/2024-05-30/lb
A single singular or a plurality of plurals?
We are working towards enabling functions to access data from Wikidata. The first use case we are aiming for is accessing the lexicographic Forms of a Lexeme, given a Lexeme ID. For example, consider a function that creates sentences such as “There are four apples.”, where both the number “four” and the noun “apple” are arguments to the function that creates the sentence.
What should the Functions for accessing Lexeme Forms look like? If you have ideas, please sketch them out, unconstrained by technical limitations. We look forward to seeing your ideas and using them as inspiration to shape what we can achieve technically on top of the platform we have.
In the above example, we could have a Function that creates “four” from the natural number 4, and “apples” from the Lexeme ID L3257. That Lexeme has two Forms: L3257-F1, “apple”, marked with the grammatical feature singular, and L3257-F2, “apples”, marked with the grammatical feature plural. To get the right Form, we can either look up the relevant Form ID manually, which hardly scales, or use the grammatical features to request the right Form. In other words, there could be a Function, e.g. "return form", which takes a Lexeme ID and a list of grammatical features and returns all matching Forms. For example, the call
return form(L3257, [plural])
would return “apples”. Similarly, for the Estonian verb “amüseerima” (to amuse), we would make the call
return form(L350582, [third person, plural, present tense, indicative])
to return “amüseerivad”.
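As a rough illustration only, here is a minimal Python sketch of how a "return form" Function might match requested grammatical features against a Lexeme's Forms. The `Form` record with plain-string features, the in-memory `LEXEMES` table, the `return_form` and `NUMBER_WORDS` names are all assumptions made for this sketch; real Wikidata Forms carry grammatical features as Items, and actual Wikifunctions access to Wikidata may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    """Simplified stand-in for a Wikidata Lexeme Form (hypothetical model)."""
    form_id: str
    representation: str
    # Assumption: features are plain strings here; in Wikidata they are Items.
    features: frozenset[str]


# Hypothetical in-memory copy of the Forms of Lexeme L3257 ("apple").
LEXEMES: dict[str, list[Form]] = {
    "L3257": [
        Form("L3257-F1", "apple", frozenset({"singular"})),
        Form("L3257-F2", "apples", frozenset({"plural"})),
    ],
}


def return_form(lexeme_id: str, features: list[str]) -> list[str]:
    """Return the representations of all Forms carrying every requested feature."""
    wanted = frozenset(features)
    return [form.representation
            for form in LEXEMES[lexeme_id]
            if wanted <= form.features]


# Composing the example sentence; NUMBER_WORDS is a hypothetical stand-in
# for a Function that turns natural numbers into English words.
NUMBER_WORDS = {4: "four"}
sentence = f"There are {NUMBER_WORDS[4]} {return_form('L3257', ['plural'])[0]}."
print(sentence)  # There are four apples.
```

Because matching uses a subset test, a call with fewer features can match more Forms, which is one reason such a Function would return a list of all matching Forms rather than a single one.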
One question we will have to answer is whether the English plural and the Estonian plural should be the same Object in Wikifunctions, or two different Objects. In Wikidata, the answer is that they are (in general) the same Item, plural: the form for more than one and, depending on the language, for zero. Several languages have other grammatical numbers, such as paucal, dual, and trial, which are not used in English or in other languages that have only singular and plural. Even among languages that use only these two values, there are differences; for instance, English uses the plural form for zero ('He ate zero eggs'), whereas French uses the singular ('Il a mangé zéro œuf', or more idiomatically 'Il n'a mangé aucun œuf').
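The zero example shows that even when two languages share the same "plural" value, which value to request for a given count still depends on the language. A minimal Python sketch, assuming a hypothetical `grammatical_number` helper that only knows English and French and only singular and plural:

```python
def grammatical_number(language: str, count: int) -> str:
    """Pick the grammatical number to request for a count of items.

    Hypothetical helper covering only English and French, and only
    singular/plural; languages with dual, paucal, trial, etc. need more rules.
    """
    if count < 0:
        raise ValueError("count must be a natural number")
    if count == 1:
        return "singular"
    if language == "en":
        return "plural"  # English: 'zero eggs', 'two eggs'
    if language == "fr":
        # French: zero takes the singular ('zéro œuf'); two or more the plural.
        return "singular" if count == 0 else "plural"
    raise ValueError(f"no rule for language {language!r}")
```

Both languages map 2 to the same shared value "plural" but map 0 to different values, so sharing one plural Object does not by itself decide which Form a sentence needs.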
In Wikifunctions, we could choose to have individual enumerations for each language. This would have the advantage of allowing simpler, if different, user interfaces for each language, which display only the features relevant to that language: for English, we would not show the other number classes, nor ask for grammatical features that are not relevant to English.
There are several different solutions, and the following list is not even exhaustive:
- a single enumeration of all grammatical features, as in Wikidata
- shared enumerations for each group of grammatical features, such as cases, numbers, etc.
- enumerations for the groups of grammatical features that actually appear for languages, i.e. one shared enumeration for all languages that use only the singular and plural
- one enumeration for each language and each group of grammatical features
We have been discussing this question in the Natural Language Generation Special Interest Group. Another solution that was mentioned was to use sub-typing, for example to have “English numbers” be a subtype of the “Grammatical numbers” type, with shared elements. But since Wikifunctions does not support sub-types, this is not currently an option.
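To make the trade-off more concrete, here is a rough Python sketch combining a shared enumeration for one group of features (grammatical number) with per-language subsets that restrict what a user interface offers, loosely approximating the “English numbers” sub-type idea. The `GrammaticalNumber` enumeration, the `NUMBERS_BY_LANGUAGE` table and the `choices_for` and `check` helpers are hypothetical; this is not how Wikifunctions Types work.

```python
from enum import Enum


class GrammaticalNumber(Enum):
    """Shared enumeration for one group of grammatical features."""
    SINGULAR = "singular"
    DUAL = "dual"
    TRIAL = "trial"
    PAUCAL = "paucal"
    PLURAL = "plural"


# Hypothetical per-language restriction, standing in for "English numbers"
# being a sub-type of "Grammatical numbers" that shares its elements.
NUMBERS_BY_LANGUAGE: dict[str, frozenset[GrammaticalNumber]] = {
    "en": frozenset({GrammaticalNumber.SINGULAR, GrammaticalNumber.PLURAL}),
}


def choices_for(language: str) -> list[GrammaticalNumber]:
    """Values a user interface would offer for this language, in declaration order.

    Languages without a restriction fall back to the full shared enumeration.
    """
    allowed = NUMBERS_BY_LANGUAGE.get(language, frozenset(GrammaticalNumber))
    return [n for n in GrammaticalNumber if n in allowed]


def check(language: str, value: GrammaticalNumber) -> GrammaticalNumber:
    """Reject values outside the language's subset, as a sub-type check would."""
    if value not in choices_for(language):
        raise ValueError(f"{value.value} is not a grammatical number in {language}")
    return value
```

Because the values stay shared, an English plural and an Estonian plural remain the same Object, while the per-language subset keeps the English interface down to two choices.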
It is very likely that we won’t be able to resolve this issue fully until we have actually built it and seen how it works out in practice. We may even change some of these decisions later, as we discover usage patterns that would make Wikifunctions friendlier and easier to use. But it would be good to start thinking about what we would like to aim for, and about the principles along which we align our design decisions.
Recent Changes in the software
The big piece of work that we landed this week was a comprehensive rebuild of the front-end code that shows labels. As is standard in MediaWiki-based tools, if your language is set to French but an Object has no French label and does have an English one, we show the English label as a fallback. Previously, we resolved the label into a string based on your view language and displayed that. This mostly worked, but it meant that the label was not marked with its language or directionality when these differed from the surrounding context. We now pass down the label's language, and thus its directionality, everywhere (T343464, T342661). In the future, we may adjust how fallback labels like this are displayed, possibly explaining inline which language is being shown and/or adding a call to action to translate the label; you can add ideas to the top-level task (T343460).
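The difference between resolving a label to a bare string and keeping its language can be sketched in a few lines of Python. This is not the WikiLambda front-end code; the `ResolvedLabel` record, the `resolve_label` and `render` helpers, the fallback chain (French, then English) and the small set of right-to-left language codes are all assumptions for illustration.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional

# Assumption: a small, deliberately incomplete set of right-to-left language codes.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}


@dataclass(frozen=True)
class ResolvedLabel:
    text: str
    language: str   # the language the label is actually in
    direction: str  # "rtl" or "ltr", derived from that language


def resolve_label(labels: dict[str, str],
                  fallback_chain: list[str]) -> Optional[ResolvedLabel]:
    """Return the first available label along the chain, keeping its language."""
    for language in fallback_chain:
        if language in labels:
            direction = "rtl" if language in RTL_LANGUAGES else "ltr"
            return ResolvedLabel(labels[language], language, direction)
    return None


def render(label: ResolvedLabel, view_language: str) -> str:
    """Add lang/dir markup only when the label's language differs from the view."""
    text = escape(label.text)
    if label.language == view_language:
        return text
    return f'<span lang="{label.language}" dir="{label.direction}">{text}</span>'


label = resolve_label({"en": "apple"}, ["fr", "en"])
print(render(label, "fr"))  # <span lang="en" dir="ltr">apple</span>
```

Returning only `label.text` would reproduce the earlier behaviour: the English text would still appear, but nothing would tell the browser that it is English or which direction it runs.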
Alongside the above work, we adjusted the function-calling API code to correctly pass along the activity tracing headers (T365053), and made a few code quality improvements. Additionally, we have been investigating performance-related issues with the back-end services, and hope to have more to report soon.
Function of the Week: days in month when not a leap year
Do you use your knuckles whenever you try to remember how many days a given calendar month has? No need for that anymore! Welcome the function days in month when not a leap year (Z16316).
OK, admittedly, your knuckles may often be more readily available than access to Wikifunctions, but let’s ignore that wrinkle for a moment.
The new Function takes a Gregorian calendar month, which we introduced as a new Type last week, and returns a natural number: how many days that month has in a year that is not a leap year (a complementary function for leap years also exists).
The function has twelve Tests, one for each month (making it a completely covered Function), and currently two Implementations, both in Python:
- One Implementation using a lookup in a Python dictionary, where for each month number we have the number of days,
- One Implementation using an unusual formula that I’ve never seen before
- Update: according to 99of9, who created that implementation, that formula was indeed invented for Wikifunctions, using the knuckle method as inspiration: the "modulo 2" is due to the knuckle/valley alternation, and the "integer division by 8" represents where we switch from one hand to the other.
One nice thing about good, or even complete, test coverage, as in this case, is that you don’t even have to understand or prove the formula in order to trust it (although that certainly doesn’t hurt): you can simply check that all test cases expect what you would expect, and that they pass (as they do).
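For readers who want to see the idea in code, here is a Python sketch of the two kinds of Implementation described above, plus twelve test cases. These are illustrations, not the Implementations on Z16316: the function names are hypothetical, month numbers 1 to 12 stand in for the Gregorian calendar month Type, and the formula is one knuckle-inspired variant using "modulo 2" and "integer division by 8", not necessarily the one 99of9 wrote.

```python
# Implementation 1: dictionary lookup from month number to days in a non-leap year.
DAYS_IN_MONTH = {1: 31, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30,
                 7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31}


def days_by_lookup(month: int) -> int:
    return DAYS_IN_MONTH[month]


# Implementation 2: an arithmetic formula in the knuckle spirit.
def days_by_formula(month: int) -> int:
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if month == 2:
        return 28  # February does not fit the knuckle/valley pattern
    # month % 2 alternates knuckle (31) and valley (30); adding month // 8
    # flips that parity from August on, where counting switches hands.
    return 30 + (month + month // 8) % 2


# Twelve test cases, one per month, cover every possible input, so any
# implementation passing all of them agrees with the calendar.
TEST_CASES = [(1, 31), (2, 28), (3, 31), (4, 30), (5, 31), (6, 30),
              (7, 31), (8, 31), (9, 30), (10, 31), (11, 30), (12, 31)]

for month, expected in TEST_CASES:
    assert days_by_lookup(month) == expected
    assert days_by_formula(month) == expected
```

The loop at the end plays the role of the Function's Tests: the formula can be trusted because it matches all twelve expected values, whether or not one has worked through why it holds.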
Thanks to the community for so swiftly adopting the new Type, and for having created more than a dozen new Functions using the new Type.