StringComparator

The Dictionary intrinsic class allows the program to customize how strings in the dictionary are compared to input strings using a "comparator" object. StringComparator provides an implementation of the comparator interface that's fast and efficient, since it's implemented natively in the interpreter.

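StringComparator objects can be customized in several ways. You can set a truncation length, which lets players abbreviate long words. You can choose whether comparisons are case-sensitive. You can also define equivalence mappings, which let certain characters in input stand in for other characters in dictionary words.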

StringComparator objects are immutable once created, so you cannot change any of the comparison rules after creating one of these objects. This is important because it satisfies the Dictionary class's requirement that a comparator's comparison rules must never change once the comparator is installed. If you want to change a dictionary's comparison rules dynamically, simply create a new StringComparator (or a custom comparator object of your own) and install it in the dictionary.
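As a rough illustration of this "replace, don't modify" rule, here is a small Python model (not TADS code). The comparison rules live in a frozen record that rejects changes after construction, so changing the rules means building a new record and installing it. The names ComparatorRules, DictionaryModel and install_comparator are hypothetical stand-ins for StringComparator and the Dictionary class, not part of any real API.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ComparatorRules:
    """Hypothetical immutable stand-in for a StringComparator's settings."""
    trunc_len: int
    case_sensitive: bool
    mappings: tuple = ()  # a tuple, so the mapping list can't be mutated either


class DictionaryModel:
    """Hypothetical stand-in for a dictionary that uses a comparator."""

    def __init__(self, comparator):
        self.comparator = comparator

    def install_comparator(self, comparator):
        # The dictionary never edits its comparator; it swaps in a new one.
        self.comparator = comparator


rules = ComparatorRules(trunc_len=6, case_sensitive=False)
dictionary = DictionaryModel(rules)

try:
    rules.case_sensitive = True  # rejected: the rules are frozen
except FrozenInstanceError:
    pass

# To change the rules, build a new object and install it.
dictionary.install_comparator(replace(rules, case_sensitive=True))
```

The original rules object is untouched, so anything that still holds it keeps seeing the same, consistent rules.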

When using the StringComparator class, you must #include <strcomp.h> in your source files.

Equivalence mappings

The StringComparator class lets you specify that certain sequences of characters in an input string can match other characters in reference (dictionary) strings. This is done through "equivalence mappings," which specify characters that are considered equivalent for the purposes of matching strings.

Equivalence mappings are designed primarily to make things easier for authors and players in games written in languages using accented and other special characters. Here are the specific cases that went into the StringComparator's design:

The simplest way to allow this sort of approximation would be to add a dictionary entry for each alternative spelling; so in a French game, if we put "élan" in the dictionary, we'd also include "elan". This approach has two disadvantages, though. First, it's obviously a lot of extra work for the author. Second, in languages that use accented letters, it's common for two words to differ only in their accents. When two objects have names that differ only in accents, allowing unaccented input necessarily makes those names ambiguous. If the unaccented spellings are added to the dictionary, however, that ambiguity becomes permanent: there's no way to turn it off for a player who uses a properly localized keyboard and types the accents correctly.

Equivalence mappings address these problems by allowing an author to enter only the exact form of a word into the dictionary, using all of the proper accents, but still match unaccented characters (or other approximations) in player input. The dictionary is not affected by the approximations, so the dictionary retains full information on the correct form of each word; the input isn't affected, either, so we can tell whether the user typed an approximation or an exact match.

A StringComparator object can define many equivalence mappings. Each mapping defines an association between one "reference character" and a corresponding "input string." A reference character is a single character that can appear in reference strings; when a StringComparator is used with a Dictionary object, reference strings are simply the strings that are stored in the dictionary. An input string defines the character or characters that will be considered equivalent to the reference character when the input string appears in input. Each mapping also defines two "result flags" values: one for an upper-case input string and one for a lower-case input string; these are bit flag values that are combined into a matchValues() result when the mapping is actually used.

For example, in a French game, we might want to allow unaccented characters in input to match the corresponding accented characters in dictionary words. To do this, we could provide mappings from reference character "à" to input string "a", from "á" to "a", from "â" to "a", from "é" to "e", from "è" to "e", and so on.
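To show what these mappings buy the author, the Python sketch below lists every input spelling that the French mappings above would let match a single dictionary word. These are the spellings an author would otherwise have to add to the dictionary by hand. This enumeration is for illustration only: a comparator checks each input against the reference string directly and need not generate spellings at all. The function accepted_spellings and the FRENCH table are hypothetical, and result flags are omitted for brevity.

```python
from itertools import product

# (reference character, input string) pairs from the French example;
# result flags are left out to keep the sketch short.
FRENCH = [("à", "a"), ("á", "a"), ("â", "a"), ("é", "e"), ("è", "e")]


def accepted_spellings(word, mappings):
    """List every input spelling that the mappings let match `word`."""
    choices = []
    for ch in word:
        # Each character may be typed exactly, or as any mapped input string.
        alternatives = [ch] + [inp for ref, inp in mappings if ref == ch]
        choices.append(alternatives)
    return ["".join(parts) for parts in product(*choices)]


print(accepted_spellings("été", FRENCH))
```

The dictionary still holds only "été"; the other three spellings are matched on the fly.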

There are two important constraints on the allowed mappings:

Result flags

The result flags values convey information about the use of equivalence mappings to the matchValues() caller. They're important because they give the caller a simple way to determine whether an input string matched its dictionary word exactly or only by way of equivalence mappings; furthermore, since each mapping has its own separate result flags, different mappings can indicate different results. For example, in a German game, we might want to allow unaccented characters in input to match accented dictionary words, but count these as weaker matches than if the exact accents were used; we could do this by including a bit flag in each accented-to-unaccented equivalence mapping, and then testing for that flag in the matchValues() result. However, we might want to consider "ss" exactly equivalent to "ß"; to do this, we would use 0 as that mapping's result flags, so that as far as the matchValues() caller is concerned, a match of "ss" to "ß" is exact.
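A minimal Python sketch of the German example follows. Each mapping is modeled as a tuple of (reference character, input string, upper-case flags, lower-case flags). WEAK_ACCENT is a hypothetical application flag, and MATCH is an assumed stand-in for the pre-defined "strings match" flag. The functions result_for and classify are also hypothetical: they show how the flags of the mappings used in a match combine, and how a caller might interpret the result.

```python
MATCH = 0x0001        # assumed stand-in for the pre-defined match flag
WEAK_ACCENT = 0x0100  # hypothetical application flag (0x0100 and up is free)

# (reference char, input string, upper-case flags, lower-case flags)
GERMAN_MAPPINGS = [
    ("ä", "a", WEAK_ACCENT, WEAK_ACCENT),
    ("ö", "o", WEAK_ACCENT, WEAK_ACCENT),
    ("ü", "u", WEAK_ACCENT, WEAK_ACCENT),
    ("ß", "ss", 0, 0),  # "ss" counts as an exact spelling of "ß"
]


def result_for(used_mappings):
    """OR together MATCH and the lower-case flags of each mapping used."""
    result = MATCH
    for _ref, _inp, _upper, lower in used_mappings:
        result |= lower
    return result


def classify(result):
    """How a caller might read a matchValues()-style result."""
    if result == 0:
        return "no match"
    return "weak" if result & WEAK_ACCENT else "exact"


# "strasse" for "straße" uses only the ß mapping, so it still counts as exact.
print(classify(result_for([GERMAN_MAPPINGS[3]])))
```

Because the "ss" mapping contributes no bits, the caller can't tell it apart from an exact spelling, which is the intended effect.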

The result flags differentiate upper-case and lower-case input strings. Each mapping has an upper-case result flags value, and a lower-case result flags value. When an equivalence mapping is used to match a string, only one of the flags is used, based on the first character of the matching input string: if the first character is an upper-case letter, the upper-case result flags value is used; otherwise, the lower-case value is used. (Note that this means that if a non-alphabetic character is the first character of the input string, the lower-case value is used.) This distinction is meant to allow mappings to assign different strengths based on the case of the input. This is useful in French, for example: accents are typically removed in French writing when a letter is capitalized, hence we would not want to flag an unaccented capital as a weak match for an accented letter, as we would for an unaccented minuscule.
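The first-character rule can be modeled in a few lines of Python. The function select_result_flags is hypothetical. It relies on the standard str.isupper(), which returns False for characters that have no case (digits, punctuation), so a non-alphabetic first character selects the lower-case flags, as described above. The French values here (no flag for an unaccented capital, a hypothetical WEAK_ACCENT flag for an unaccented minuscule) follow the example in the paragraph.

```python
WEAK_ACCENT = 0x0100  # hypothetical application flag


def select_result_flags(matched_input, upper_flags, lower_flags):
    """Pick one mapping's flags from the first character of the matched input."""
    first = matched_input[:1]
    # isupper() is False for caseless characters, so digits and
    # punctuation fall through to the lower-case flags.
    return upper_flags if first.isupper() else lower_flags


# French "é" -> "e": an unaccented capital is normal, a minuscule is weak.
print(select_result_flags("E", 0, WEAK_ACCENT))
print(select_result_flags("e", 0, WEAK_ACCENT))
```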

Important: the StringComparator class reserves the low-order 8 bits of the result flags for its own use. Therefore, any flags defined in equivalence mappings should use values 0x0100 and above.
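A small Python helper can keep application flags clear of the reserved range. The names app_flag and check_app_flags are hypothetical. The only rule they encode is the one stated above: the low-order 8 bits (0x0001 through 0x0080) belong to StringComparator.

```python
RESERVED_MASK = 0x00FF  # low-order 8 bits reserved by StringComparator


def app_flag(n):
    """Return the n-th application flag bit, starting at 0x0100."""
    return 0x0100 << n


def check_app_flags(flags):
    """Reject flag values that overlap the reserved low-order bits."""
    if flags & RESERVED_MASK:
        raise ValueError(f"flags {flags:#06x} overlap the reserved low-order bits")
    return flags


WEAK_ACCENT = check_app_flags(app_flag(0))  # 0x0100
LIGATURE = check_app_flags(app_flag(1))     # 0x0200, a second hypothetical flag
```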

Construction

To create a StringComparator, use the new operator:

new StringComparator(truncLen, caseSensitive, mappings)

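The parameters are:

truncLen: the truncation length. If this is 0, truncation is not allowed, so an input string must match the entire reference string. Otherwise, an input string that is at least truncLen characters long can match a longer reference string, as long as the input matches the leading part of the reference string.

caseSensitive: true if comparisons are case-sensitive; nil if upper-case and lower-case letters are considered equivalent.

mappings: a list of equivalence mappings (which can be empty). Each mapping is itself a list of four elements: the reference character (a one-character string), the input string, the upper-case result flags, and the lower-case result flags.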

StringComparator Methods

For more information on how the Dictionary class uses comparators, refer to the Dictionary section.

In addition to the standard Object methods, StringComparator provides the following methods:

calcHash(str)

Calculate a hash value for the given string. Returns an integer giving the hash value. The hash calculation conforms to the requirement that, for two strings s1 and s2, if matchValues(s1, s2) indicates a match, then calcHash(s1) will equal calcHash(s2).
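One way to meet this requirement is to hash a canonical form that erases every difference the comparator is willing to overlook. This includes letter case (when matching is case-insensitive), mapped characters, and anything beyond the first truncLen characters. The Python model below is a hypothetical sketch of that idea, not a description of how the native calcHash() computes its value. It assumes each reference character has at most one mapping, and that mapping input strings contain no reference characters.

```python
import zlib


class HashModel:
    """Hypothetical model of a match-consistent hash; not the native calcHash()."""

    def __init__(self, trunc_len, case_sensitive, mappings):
        self.trunc_len = trunc_len
        self.case_sensitive = case_sensitive
        # reference char -> input string (assumes one mapping per reference char)
        self.canon = {ref: inp for ref, inp, _upper, _lower in mappings}

    def canonical(self, s):
        """Erase the differences that a match is allowed to overlook."""
        if not self.case_sensitive:
            s = s.lower()
        # Rewrite reference characters as their input strings, in both the
        # input and the reference string, so either side canonicalizes alike.
        s = "".join(self.canon.get(ch, ch) for ch in s)
        if not self.case_sensitive:
            s = s.lower()
        if self.trunc_len > 0:
            s = s[: self.trunc_len]  # a truncated match only guarantees this prefix
        return s

    def calc_hash(self, s):
        # crc32 gives the same value on every run, unlike Python's str hash()
        return zlib.crc32(self.canonical(s).encode("utf-8"))


h = HashModel(6, False, [("é", "e", 0, 0x0100), ("ß", "ss", 0, 0)])
print(h.calc_hash("Straße") == h.calc_hash("strasse"))
```

Unequal strings may still share a hash; the requirement only runs one way, from "match" to "equal hash".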

matchValues(inputStr, refStr)

Compares the two strings, and returns a non-zero integer if the two strings match, according to the rules defined when the StringComparator was constructed, or 0 (the integer zero) if the strings do not match. inputStr is the "input" string, which typically will come from user input or a similar source; refStr is the "reference" string, which is the string against which the input is to be tested. When used with a Dictionary, the reference string is the string stored in the dictionary.

The return value for a match will always be a non-zero integer. This value is formed by combining, using bitwise OR, all of the applicable flags for the match, including the pre-defined flags and the result flags for all equivalence mappings used to make the match. The following flag values are pre-defined:

Note that, in addition to the pre-defined flags listed above, StringComparator reserves all flag values from 0x0001 to 0x0080, to allow for future expansion; equivalence mappings should use flag values 0x0100 and above.
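To make the flag-combining rule concrete, here is a simplified Python model of a matchValues()-style comparison. It is a sketch, not the native algorithm. It is greedy (it takes the first mapping that fits, without backtracking). The constants MATCH, CASE_FOLD and TRUNC are hypothetical stand-ins chosen from the reserved 0x0001 to 0x0080 range, not values taken from strcomp.h. The function name match_values and its keyword arguments are invented for the sketch. Mappings are tuples of (reference character, input string, upper-case flags, lower-case flags).

```python
# Hypothetical stand-ins for pre-defined flags (values assumed, within 0x0001-0x0080).
MATCH = 0x0001
CASE_FOLD = 0x0002
TRUNC = 0x0004


def match_values(input_str, ref_str, trunc_len=0, case_sensitive=True, mappings=()):
    """Return 0 for no match, else MATCH OR'ed with every flag the match used."""
    flags = MATCH
    i = 0  # current position in input_str
    for ref_ch in ref_str:
        if i == len(input_str):
            # Input ended early: accept only as an abbreviation of at
            # least trunc_len characters.
            if trunc_len > 0 and len(input_str) >= trunc_len:
                return flags | TRUNC
            return 0
        in_ch = input_str[i]
        if in_ch == ref_ch:
            i += 1
            continue
        if not case_sensitive and in_ch.lower() == ref_ch.lower():
            flags |= CASE_FOLD
            i += 1
            continue
        for m_ref, m_inp, upper, lower in mappings:
            if m_ref != ref_ch:
                continue
            seg = input_str[i:i + len(m_inp)]
            if seg == m_inp or (not case_sensitive and seg.lower() == m_inp.lower()):
                # First character of the matched input picks upper or lower flags.
                flags |= upper if seg[:1].isupper() else lower
                i += len(m_inp)
                break
        else:
            return 0  # no exact, case-folded or mapped match for ref_ch
    # Leftover input characters mean the input is longer than the reference.
    return flags if i == len(input_str) else 0


WEAK = 0x0100  # hypothetical application flag
print(hex(match_values("elan", "élan", mappings=[("é", "e", 0, WEAK)])))
```

A caller typically tests individual bits of the result, for example `result & WEAK`, rather than comparing the whole value, since any combination of flags may be present.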