User:Dv103/Writing Gregorian calendar date readers
Writing Gregorian calendar date readers
This is an essay. It contains the advice or opinions of one or more Wikifunctions contributors. This page is not one of Wikifunctions' policies or guidelines, as it has not been thoroughly vetted by the community. Some essays represent widespread norms; others only represent minority viewpoints.
If you wish to add an implementation of read Gregorian Calendar Date (Z20808) for your language (first check whether one already exists in Gregorian date readers (Z23981)), you can write one from scratch; if you would rather not reinvent the wheel, this guide explains how to adapt a preexisting implementation. (Note: this guide is written to be understandable to less experienced users, but it is still useful for everyone; it does require you to know JavaScript.)
Create the function
First, as usual, you need to create a function. Since a reader parses text into a date, it should have one input of type String (Z6) and an output of type Gregorian calendar date (Z20420).
Then you should write some testcases. Think of all the formats that your language uses for writing dates, including the edge cases (for example: how would a speaker of your language interpret the date "1 2 3"?). Some tips:
- The output of display date (Z20780) for your language should be a valid input of your function. This is not only a reasonable expectation, but the Wikifunctions UI also works with the assumption that this is true.
- You don't have to worry about supporting the ISO 8601 formats. They are handled by another function. If Wikifunctions needs to parse a valid ISO 8601 string as a date, your function won't even be called.
Now you can create a JavaScript implementation and copy-paste this code. Then change the function and parameter names to match your function's ZID. Finally, change the line `var input=Z23976K1;` so that it uses the name of the parameter passed to your function. You should now have a working clone of the mul reading function. Check that it works.
Localisation
Separators
The line of code `const separators=[" ","\t","/","\\","'",'"',"|"];` defines the separators used to identify the components (here called tokens) of the input. The separators themselves won't be part of any token. This list should be fine for most languages, but if your language uses other characters (or strings) to separate the parts of the date, you should add them to this list.
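To make the role of the separators concrete, here is a minimal sketch of how an input could be split into tokens. The helper `tokenize` and the extra separator `"_"` are assumptions made for illustration; the actual template may split its input differently.

```javascript
// Separator list as quoted above from the template.
const separators = [" ", "\t", "/", "\\", "'", '"', "|"];

// Hypothetical helper, NOT the template's own tokenizer: it cuts the input at
// every separator and keeps only the non-empty pieces in between.
function tokenize(input, seps) {
  // Ignore empty separators and try longer ones first, so that
  // multi-character separators take precedence over their prefixes.
  const ordered = seps
    .filter((s) => s.length > 0)
    .sort((a, b) => b.length - a.length);
  const tokens = [];
  let current = "";
  let i = 0;
  while (i < input.length) {
    const sep = ordered.find((s) => input.startsWith(s, i));
    if (sep !== undefined) {
      if (current !== "") tokens.push(current);
      current = "";
      i += sep.length;
    } else {
      current += input[i];
      i += 1;
    }
  }
  if (current !== "") tokens.push(current);
  return tokens;
}

console.log(tokenize("12/ 3\t2024", separators)); // [ '12', '3', '2024' ]

// A language that also separated date parts with "_" (a made-up example)
// would simply extend the list.
console.log(tokenize("1_2_3", [...separators, "_"])); // [ '1', '2', '3' ]
```

Consecutive separators produce no empty tokens in this sketch, which matches the statement that separators never become part of a token.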
Order of elements
The line `const prefOptions=["yemd","eymd","dmye","dmey","mdye","mdey","yedm","eydm",];` defines the possible orders of the components, in decreasing order of priority. `"e"` stands for Era (BC/AD). You should change this list to match the formats used in your language, adding the missing ones and removing the ones that are not used. Some tips:
- The function already handles the fact that the Era may be absent. Don't worry about it.
- The `"ymd"` format should be present somewhere and should come before the `"ydm"` format, if the latter is present at all (unless in your language `"ydm"` is actually more common than `"ymd"`). This ensures that inputs that are almost in the ISO 8601 `yyyy-mm-dd` format (but not exactly compliant, so that your function has to process them) can be properly handled.
- If in your language dates written only with numbers follow a different format from the ones with some component written in literal form, the numeric-only format should have priority over the literal ones, which still need to be present. This is because dates with literal components are usually intrinsically less ambiguous. As a rule of thumb, the answer to the `"1 2 3"` question should be the format with the highest priority.
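As an illustration of these tips, the sketch below shows a possible list for a hypothetical day-first language, together with a small check that an order equivalent to "ymd" comes before any "ydm" order. The helper `ymdBeforeYdm` is not part of the template; it is an assumption added here only to make the rule testable, and it treats the era letter as optional because the function already handles a missing era.

```javascript
// Hypothetical list for a day-first language: a numeric-only input such as
// "1 2 3" would be read as day 1, month 2, year 3.
const prefOptions = ["dmye", "dmey", "yemd", "eymd"];

// Hypothetical lint helper (not part of the template): returns true when an
// order equivalent to "ymd" is listed and no "ydm" order precedes it.
function ymdBeforeYdm(options) {
  // Drop the era letter, since the era may be absent from the input.
  const withoutEra = options.map((option) => option.replace("e", ""));
  const ymd = withoutEra.indexOf("ymd");
  const ydm = withoutEra.indexOf("ydm");
  return ymd !== -1 && (ydm === -1 || ymd < ydm);
}

console.log(ymdBeforeYdm(prefOptions));      // true
console.log(ymdBeforeYdm(["yedm", "yemd"])); // false: "ydm" comes first
```

Running such a check by hand before saving the implementation is optional; it only guards against accidentally reordering the list.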
Language specific options
The function `languageSpecificOptions()` should contain the actual specifications to properly localise your code.
Defining literals
You can see that there is already a definition for Era literals: you can start by editing the object to reflect your language's literals for Eras. As you can see, the eras are encoded as either `-1n` for Before Christ or `1n` for Anno Domini. As a rule, all the numeric values must be of type `BigInt`.
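For instance, an edited era object might look like the sketch below. The English keys are only illustrative, and the helper `allBigInt` is a hypothetical sanity check that does not exist in the template; it simply shows the difference between `1n` (a BigInt) and `1` (a Number).

```javascript
// Hypothetical era literals for English; adapt the keys to your language.
const eraLiterals = {
  "bc": -1n,
  "bce": -1n,
  "before christ": -1n,
  "ad": 1n,
  "ce": 1n,
  "anno domini": 1n,
};

// Hypothetical sanity check (not part of the template): every value in a
// literals object must be a BigInt, written with the "n" suffix.
function allBigInt(literals) {
  return Object.values(literals).every((value) => typeof value === "bigint");
}

console.log(allBigInt(eraLiterals));  // true
console.log(allBigInt({ "ad": 1 }));  // false: 1 is a Number, not a BigInt
```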
You can also add other calls to `defineLiterals(literals,meaning,compMode=LENIENT, forced=false)`. Here is a quick explanation of the arguments:
- `literals`: an object with the literals as keys and the corresponding values as values. Each value must be of type BigInt. A literal can consist of more than one word: in that case, separate the words with a single space. Several literals can correspond to the same value. The only possible values for Era are `-1n` and `1n`. Months must be between `0n` (January) and `11n` (December). The year 1 BC should be represented as `0n`, the year 2 BC as `-1n`, and so on.
- `meaning`: what the literals represent. The possible values are `DAY`, `MONTH`, `YEAR`, `ERA`. If the literals can represent more than one element, you can pass an Array containing all the possible meanings.
- `compMode`: the mode of comparison. Can be `LENIENT` or `NORMAL`. The `LENIENT` comparison also accepts inputs that are prefixes of your literals on a word-by-word basis (useful, for example, to handle all the possible abbreviations of the months). It should be avoided if one literal is the prefix of another literal with a different meaning. The `NORMAL` comparison instead only accepts inputs that contain the entire word. Both comparison modes are case-insensitive and ignore the characters `.,:;`.
- `forced`: if `true`, the tokens that match one of the literals are forced to take one of the `meaning`s.
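The difference between the two comparison modes can be sketched as follows. This is an illustrative re-implementation based only on the description above, not the template's actual code: the names `normalise` and `matchesLiteral` are hypothetical, and the sketch assumes that a lenient match needs the same number of words as the literal.

```javascript
// Lowercase the text and strip the ignored characters . , : ;
function normalise(text) {
  return text.toLowerCase().replace(/[.,:;]/g, "");
}

// Hypothetical matcher: with lenient=true every input word may be a prefix
// of the corresponding literal word; otherwise the words must be identical.
function matchesLiteral(input, literal, lenient) {
  const inputWords = normalise(input).split(" ");
  const literalWords = normalise(literal).split(" ");
  if (inputWords.length !== literalWords.length) return false;
  return inputWords.every((word, i) =>
    word !== "" &&
    (lenient ? literalWords[i].startsWith(word) : literalWords[i] === word)
  );
}

console.log(matchesLiteral("Sept.", "september", true));        // true
console.log(matchesLiteral("Sept.", "september", false));       // false
console.log(matchesLiteral("SEPTEMBER", "september", false));   // true
console.log(matchesLiteral("b. Chr.", "before christ", true));  // true
```

Under these assumptions, an input such as "ma" would leniently match both "march" and "may", which is the kind of ambiguity to keep in mind when choosing `LENIENT`.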
If you need something more sophisticated than a list of literals, you can use the function `defineLiteralFunction(f,meaning, forced=false)`. It is similar to the previous one, except that `f` is a custom function that accepts two parameters (the list of tokens, of type list of strings, and the index of the token currently being analysed, of type Number). If it finds a literal, it returns a list of two elements (the value corresponding to the literal, of type BigInt, and the length of the literal in number of tokens, of type Number); otherwise it returns `null`.
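As a sketch of such a custom function, the example below recognises months written as Roman numerals (as in "5 IX 2020"). The name `romanMonth` and the choice of Roman numerals are assumptions made for illustration; only the shape of the arguments and of the return value follows the description above.

```javascript
// Roman numerals for the twelve months, lowercase, in calendar order.
const ROMAN_MONTHS = [
  "i", "ii", "iii", "iv", "v", "vi",
  "vii", "viii", "ix", "x", "xi", "xii",
];

// Hypothetical literal function: takes the token list and the index of the
// current token, returns [value, lengthInTokens] or null.
function romanMonth(tokens, index) {
  const token = tokens[index];
  if (typeof token !== "string") return null;
  const position = ROMAN_MONTHS.indexOf(token.toLowerCase());
  if (position === -1) return null;
  // Months are zero-based BigInt values; this literal spans one token.
  return [BigInt(position), 1];
}

console.log(romanMonth(["5", "IX", "2020"], 1)); // [ 8n, 1 ]  (September)
console.log(romanMonth(["5", "IX", "2020"], 0)); // null

// Inside languageSpecificOptions() of the template, it would then be
// registered with something like:
//   defineLiteralFunction(romanMonth, MONTH);
```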
Defining constraints
It is possible to add constraints to make the interpretation of the input less ambiguous. For example, in the English reading function it is possible to require that "st", "nd", "rd" and "th" come right after the day number. The function that you can call to impose a constraint has the signature `defineConstraint(literals, meaning, position, compMode=NORMAL)`. It is very similar to `defineLiterals`. Here is a description of the parameters:
- `literals`: an Array of strings containing the literals that should trigger the constraint. If a literal contains more than one word, the words must be separated by a single space.
- `meaning`: as before, it can take the values `DAY`, `MONTH`, `YEAR`, `ERA` and can be an array of those values: in this case, ALL the specified values must be next to the constraint. This means that if you pass more than two values in `meaning`, it becomes impossible to satisfy them, which makes it impossible for the entire function to execute properly.
- `position`: can be `BEFORE`, `AFTER` or `BEFORE_OR_AFTER`. It specifies the position of the constraint relative to the constrained element (right before it, right after it, or either).
- `compMode`: `LENIENT` or `NORMAL`. They work exactly the same way as before.
Note that the constraints themselves and meaningless words are transparent to other constraints.
Here is an example:
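```javascript
const prefOptions=["yemd","eymd","yedm","eydm","dmye","dmey","mdye","mdey"];

function languageSpecificOptions(){
  // "a" and "b" must come right before a day
  defineConstraint(["a","b"],DAY,BEFORE);
  // "c" must come right before or right after a month
  defineConstraint(["c"],MONTH,BEFORE_OR_AFTER);
}
```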
This means that `"4 A B C meaningless text 3 5"` is interpreted as "3 April 5 AD": `mdy` is a valid format, `A` and `B` must each be right before a day (ignoring `B`, `C` and `meaningless text`, since they are transparent), and `C` can be right after or right before a month (in this case it is after).
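The notion of transparency can be sketched with a small helper that looks for the nearest date component next to a constraint, skipping other constraints and meaningless words. The token labels and the function `neighbourRole` are hypothetical and only mirror the example above; they are not taken from the template.

```javascript
// Hypothetical labelling of "4 A B C meaningless text 3 5" read as "mdy".
// role is null for meaningless words.
const tokens = [
  { text: "4", role: "MONTH" },
  { text: "A", role: "CONSTRAINT" },
  { text: "B", role: "CONSTRAINT" },
  { text: "C", role: "CONSTRAINT" },
  { text: "meaningless", role: null },
  { text: "text", role: null },
  { text: "3", role: "DAY" },
  { text: "5", role: "YEAR" },
];

// Walk from index i in direction step (+1 forwards, -1 backwards) and return
// the role of the first token that is neither a constraint nor meaningless.
function neighbourRole(tokens, i, step) {
  for (let j = i + step; j >= 0 && j < tokens.length; j += step) {
    const role = tokens[j].role;
    if (role !== null && role !== "CONSTRAINT") return role;
  }
  return null;
}

console.log(neighbourRole(tokens, 1, +1)); // "DAY":   A is right before a day
console.log(neighbourRole(tokens, 2, +1)); // "DAY":   B is right before a day
console.log(neighbourRole(tokens, 3, -1)); // "MONTH": C is right after a month
```

With the "mdy" reading, all three constraints are satisfied, which is why that reading is accepted in the example.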
For constraints too, you can instead use `defineConstraintFunction(f,meaning,position)` to specify a custom parsing function `f` that identifies the constraints. `f` must take two arguments (the token list and the current position) and must return a Number (the number of tokens occupied by the constraint), or `null` if the token currently being analysed isn't a constraint.
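A custom constraint function could look like the sketch below, which recognises the English markers "year", "the year" and "in the year" as a single constraint of one to three tokens. The function `yearMarker` and the choice of phrase are assumptions made for illustration; only the argument and return conventions follow the description above.

```javascript
// Hypothetical constraint function: returns the number of tokens occupied by
// the marker starting at index, or null if there is no marker there.
function yearMarker(tokens, index) {
  // Lowercased token at position i, or "" when i is out of range.
  const word = (i) => (tokens[i] || "").toLowerCase();
  let i = index;
  if (word(i) === "in") i += 1;   // optional "in"
  if (word(i) === "the") i += 1;  // optional "the"
  if (word(i) !== "year") return null;
  return i - index + 1;
}

const sample = ["4", "July", "in", "the", "year", "1776"];
console.log(yearMarker(sample, 2)); // 3  ("in the year")
console.log(yearMarker(sample, 4)); // 1  ("year")
console.log(yearMarker(sample, 0)); // null

// Inside languageSpecificOptions() of the template, it would then be
// registered with something like:
//   defineConstraintFunction(yearMarker, YEAR, BEFORE);
```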
Final checks
Now the function should be properly localised, so you can check whether it passes all the testcases. If it does, once the implementation is connected you can add it to Gregorian date readers (Z23981).
Some examples
If you want to check some examples (other than the basic mul implementation), here they are:
- read Gregorian date, refactored, it, js (Z24947): this is a basic but complete implementation. Note that Italian dates don't require fancy constraints or literal definitions (just the eras and the months' names). The only interesting feature is that `prefOptions` only lists formats with the era after the year, since in Italian it would be very weird to place the era indicator before the year. This is functionally an implicit constraint.
- refactored code, en, dag (mdy), js (Z24945): a far more complex example that uses most of the available features (owing to the complexity of the Dagbani date conventions).
If you encounter problems or have questions, don't hesitate to ping me or to write on the talk page of this guide.
Writing Day of Roman year readers
Once you have implemented a Gregorian calendar date reader, implementing a Day of Roman year (Z20342) reader is trivial:
- Create a new function
- Create the testcases
- Add a JavaScript implementation
- Copy-paste this implementation, correcting the function name, the function parameter and the line `var input=Z25022K1;`
- Copy-paste the `prefOptions` value and the `languageSpecificOptions()` body from your implementation of the Gregorian calendar date reader
- Check if it works properly
- When everything is connected, you can add your function to Day of roman year readers (Z25030)