Wikifunctions:Determinism

"It seems hard to sneak a look at God's cards. But that He plays dice and uses 'telepathic' methods... is something that I cannot believe for a single moment." ― Albert Einstein
"God not only plays dice, He also sometimes throws the dice where they cannot be seen." ― Stephen Hawking

While providing execution results, the Wikifunctions engine uses aggressive caching. This is to reduce the server load and enable more people to take advantage of the project.

Because of caching, functions should be pure, that is, their result should depend only on their inputs. Such a function can be executed once for a given set of arguments and its result cached virtually forever. Addition is an example of a pure function: every time you add 1 to 2, you get 3. A function returning the current time, on the other hand, is not pure: executed now and an hour ago, it returns different results even though its parameters are unchanged. Pure functions are also easier to debug and test, because they have neither hidden state (e.g. a counter that is incremented after every execution) nor external dependencies.
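The difference can be sketched in Python. This is only an illustration and not how the Wikifunctions engine is implemented; the names add and current_time are made up for the example, and functools.lru_cache stands in for the engine's cache.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def add(a: int, b: int) -> int:
    """Pure: the result depends only on the arguments, so caching is safe."""
    return a + b


def current_time() -> float:
    """Impure: takes no arguments, yet the result changes between calls."""
    return time.time()


add(1, 2)  # computed
add(1, 2)  # same arguments, so the result is read from the cache
```

Wrapping current_time in the same cache would freeze the first value it returned, which is exactly the problem impure functions pose for caching.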

However, the aims of Wikifunctions could not all be fulfilled if the project forbade impure functions completely. Therefore, some guidelines regarding function determinism and purity had to be put in place.

All the rules outlined below have a common goal: to reduce the number of nondeterministic functions, so that the execution engine can support them effectively.

Date and time

Functions that perform an analysis based on the current date or time are undoubtedly useful. However, they should be decoupled from reading the current time and take it as a parameter instead. Any calculation based on the current time is just a special case of the same calculation for a given date. For example, instead of creating a function get today's day of week (with no arguments), one should create a function get day of week for a date (with the date passed as an argument).

This doesn't mean, however, that the former function cannot exist. It can be defined as a composition: get day of week for a date(get current date()). This way, date-related nondeterminism is limited to a single function that returns the current date (and/or time), and the evaluator can tell which parts of the composition have to be re-run and which can be read from the cache.
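A minimal Python sketch of this composition follows. The function names mirror the ones used above, but the code is an illustration using Python's standard datetime module, not Wikifunctions code.

```python
from datetime import date

DAY_NAMES = (
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
)


def get_day_of_week_for_date(d: date) -> str:
    """Pure: the same date always yields the same weekday name."""
    return DAY_NAMES[d.weekday()]  # date.weekday() returns 0 for Monday


def get_current_date() -> date:
    """The only nondeterministic piece: reads the clock."""
    return date.today()


def get_todays_day_of_week() -> str:
    """Composition: only the inner call has to be re-run on each evaluation."""
    return get_day_of_week_for_date(get_current_date())
```

Here the result of get_day_of_week_for_date can be cached per date, while get_current_date is the single place that must be evaluated afresh.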

Note: Currently, during the early days of Wikifunctions, date and time are not supported as data types, so this whole category does not apply yet.

Environment properties

Functions should not depend on properties of the execution environment. The environment can be updated or otherwise altered at any time, and functions are expected to keep behaving as they did before.
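As one possible illustration in Python (the bucket functions are hypothetical and not taken from Wikifunctions): the built-in hash() of a string is salted per interpreter process unless PYTHONHASHSEED is fixed, so a function built on it depends on the environment, whereas a digest from hashlib depends only on the input.

```python
import hashlib


def bucket_env_dependent(text: str, buckets: int) -> int:
    """Fragile: str hashes are randomized per process by default,
    so the result can differ from one run of the interpreter to the next."""
    return hash(text) % buckets


def bucket_stable(text: str, buckets: int) -> int:
    """Stable: SHA-256 of the UTF-8 bytes depends only on the input."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets
```

Both functions return a value between 0 and buckets - 1, but only the second one is safe to cache across restarts or upgrades of the environment.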

Randomness

Random values are troublesome for caching and therefore should be used sparingly. For any algorithm that makes use of randomness, it's recommended to accept a seed as well. That seed should be used for initializing any random generators invoked by the function.
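A sketch of the seed recommendation in Python, assuming a hypothetical function shuffle_with_seed. It uses a private random.Random instance so that no global generator state is involved.

```python
import random


def shuffle_with_seed(items: list, seed: int) -> list:
    """Return a shuffled copy of items; the order is fixed by the seed."""
    rng = random.Random(seed)  # generator initialized only from the argument
    result = list(items)       # copy, so the caller's list is not mutated
    rng.shuffle(result)
    return result
```

Calling the function twice with the same list and the same seed gives the same order, so the result can be cached like that of any other pure function. A caller who wants a different order passes a different seed.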

Since random number generators have to be deterministic as well, they should follow one of the following patterns:

  • To obtain the next random number, one has to pass the previous value as the seed, for example:
    • Assume seed is passed to our function as an argument.
    • Let random_value1 be the result of random(seed).
    • Let random_value2 be the result of random(random_value1).
    • etc.
  • The random generator accepts two arguments: seed and the previous value, for example:
    • Assume seed is passed to our function as an argument.
    • Choose some initial value.
    • Let random_value1 be the result of random(seed, initial).
    • Let random_value2 be the result of random(seed, random_value1).
    • etc.
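Both patterns can be sketched in Python with a simple linear congruential generator. The constants, the function names and the way the seed is mixed in are assumptions chosen for illustration; they are not a Wikifunctions API and are not meant to provide good statistical quality.

```python
MODULUS = 2**31
MULTIPLIER = 1103515245
INCREMENT = 12345


def random_from_previous(previous: int) -> int:
    """Pattern 1: the previous value (initially the seed) is the only input."""
    return (MULTIPLIER * previous + INCREMENT) % MODULUS


def random_with_seed(seed: int, previous: int) -> int:
    """Pattern 2: the seed stays fixed and selects the sequence,
    while the previous value advances along it."""
    return (MULTIPLIER * previous + 2 * seed + 1) % MODULUS


seed = 42

# Pattern 1: chain each value into the next call.
random_value1 = random_from_previous(seed)
random_value2 = random_from_previous(random_value1)

# Pattern 2: choose some initial value, then keep passing the same seed.
initial = 0
seeded_value1 = random_with_seed(seed, initial)
seeded_value2 = random_with_seed(seed, seeded_value1)
```

In both variants every call is a pure function of its arguments, so each individual result can be cached.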