Design Decisions

This page records design decisions.

2006-10-21T10:55:55Z. Logging. Results are recorded to the file specified in resources/logger.properties [logfile]. Decision: output only errors and fatals to the console. Impact: ErrHandler class.

Error handling. Provisional decision: if an error is fatal to the operation of the engine, use eh.fatal(), not eh.error(). That is, eh.error() reports the problem but does not cause the program to terminate.
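The split between eh.error() and eh.fatal() could be sketched roughly as below. This is only an illustration of the decision, not the project's ErrHandler class: the class name ErrHandlerSketch, the constructor arguments, the error counter and the use of java.util.logging are all assumptions made to keep the example self-contained. The termination step is passed in as a Runnable so that the behaviour can be exercised without actually stopping the JVM.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch only; the real ErrHandler class may look quite different. */
class ErrHandlerSketch {
    private final Logger log;
    private final Runnable terminate; // what fatal() does after reporting (assumption)
    private int errorCount = 0;

    ErrHandlerSketch(Logger log, Runnable terminate) {
        this.log = log;
        this.terminate = terminate;
    }

    /** Default wiring: fatal() ends the JVM with a non-zero exit code. */
    ErrHandlerSketch(Logger log) {
        this(log, () -> System.exit(1));
    }

    /** Reports a recoverable problem; the caller carries on afterwards. */
    void error(String message) {
        errorCount++;
        log.log(Level.SEVERE, "ERROR: " + message);
    }

    /** Reports a problem the engine cannot survive, then terminates. */
    void fatal(String message) {
        log.log(Level.SEVERE, "FATAL: " + message);
        terminate.run();
    }

    int getErrorCount() {
        return errorCount;
    }
}
```

With this shape, a caller that hits a missing but optional resource would call eh.error(...) and continue, while a caller that cannot load anything it needs to run would call eh.fatal(...).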

Braille table selection codes. Currently favouring the language code from iso639.txt and the locale (nearly country) code from ISO 3166. I don't think Java has a way of validating these.

Update: 2006-11-03T09:51:32.0Z. Wrong again! I had the temerity to ask the people who know about these things (the I18N group in W3C), and had my lack of information corrected. This only involves a minor change to table selection. The advisory document is a W3C page, which updated me on quite a few things, I18N-wise. I asked about UEBC, which is English but has no locale! I was advised:

You might very well tag English braille texts as "en-Brai". US variants would be "en-Brai-US". British variants would be "en-Brai-GB".

The references were:

RFC 4647, which, in combination with RFC 4646, replaces RFC 3066 (which in turn replaced RFC 1766). It describes a syntax, called a "language-range", for specifying items in a user's list of language preferences. It also describes different mechanisms for comparing and matching these to language tags.
RFC 4646. This describes the structure, content, construction, and semantics of language tags for use in cases where it is desirable to indicate the language used in an information object.
IANA registry, which is a central location for finding a language.
ISO 15924, unicode.org. Alphabetical list of four-letter script codes.
Rather than link to each one, which will become outdated, try rfc3066bis, which tracks the latest! Nice idea, W3C.
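The RFC 4647 matching mentioned above is available in Java 8 and later through Locale.LanguageRange and Locale.lookupTag, so table selection with fallback could look something like the sketch below. The class name BrailleTableLookup, the method bestTable and the list of available tables are made up for the example. Note that some Java versions return the matched tag in lower case, so it is safer to compare the result case-insensitively.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper; names and the table list are illustrative only. */
class BrailleTableLookup {
    /**
     * Returns the best available table tag for the requested tag using the
     * RFC 4647 "lookup" scheme (progressive truncation), or null if none matches.
     */
    static String bestTable(String requested, List<String> available) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(requested);
        return Locale.lookupTag(ranges, available);
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("en-Brai", "en-Brai-GB", "fr-Brai");
        // No US table exists, so lookup falls back to the plain English braille table.
        System.out.println(bestTable("en-Brai-US", tables));
        // No German table at all, so the result is null.
        System.out.println(bestTable("de-Brai", tables));
    }
}
```

The fallback from "en-Brai-US" to "en-Brai" is what makes short tags attractive: a table tagged without a region serves every request for that language and script.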

W3C provide some simple advice:

The golden rule when creating language tags is to keep the tag as short as possible. Avoid region, script or other subtags except where they add useful distinguishing information. For instance, use ja for Japanese and not ja-JP, unless there is a particular reason that you need to say that this is Japanese as spoken in Japan.

The sequence is:


    language-script-region-variant-extension-privateuse
    

The language is obtained from the IANA registry (see above), the script from ISO 15924, the region from the IANA registry (see above), and variants (rare) from rfc4646.txt (with constraints). The extension is tightly constrained by rfc4646.txt (there are currently none). Finally, the private use area is prefixed by x and terminates the sequence.
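As an illustration of assembling the first three parts of the sequence, Java 7 and later can build and parse such tags with Locale.Builder and Locale.forLanguageTag. The class name LanguageTagBuilder and its build method are invented for this sketch. Locale.Builder only checks that each subtag is well formed (for example, that a script is exactly four letters); it does not check the subtag against the IANA registry.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

/** Hypothetical helper showing language-script-region assembly. */
class LanguageTagBuilder {
    /** Builds language[-script][-region]; pass "" to leave a subtag out. */
    static String build(String language, String script, String region) {
        return new Locale.Builder()
                .setLanguage(language)
                .setScript(script)
                .setRegion(region)
                .build()
                .toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(build("en", "Brai", "US"));
        System.out.println(build("en", "Brai", ""));

        // Going the other way: split an existing tag into its subtags.
        Locale parsed = Locale.forLanguageTag("en-Brai-GB");
        System.out.println(parsed.getLanguage() + " / "
                + parsed.getScript() + " / " + parsed.getCountry());

        try {
            build("en", "Braille", ""); // a script subtag must be four letters
        } catch (IllformedLocaleException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```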

To build such a sequence, see rfc4646.txt, sections 2.1 and 2.2. Section 4.4 describes canonicalization. Formally, the ABNF from RFC 4646 is:


   Language-Tag  = langtag
                 / privateuse             ; private use tag
                 / grandfathered          ; grandfathered registrations

   langtag       = (language
                    ["-" script]
                    ["-" region]
                    *("-" variant)
                    *("-" extension)
                    ["-" privateuse])

   language      = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
                 / 4ALPHA                 ; reserved for future use
                 / 5*8ALPHA               ; registered language subtag

   extlang       = *3("-" 3ALPHA)         ; reserved for future use

   script        = 4ALPHA                 ; ISO 15924 code

   region        = 2ALPHA                 ; ISO 3166 code
                 / 3DIGIT                 ; UN M.49 code

   variant       = 5*8alphanum            ; registered variants
                 / (DIGIT 3alphanum)

   extension     = singleton 1*("-" (2*8alphanum))

   singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
                 ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
                 ; Single letters: x/X is reserved for private use

   privateuse    = ("x"/"X") 1*("-" (1*8alphanum))

   grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum))
                   ; grandfathered registration
                   ; Note: i is the only singleton
                   ; that starts a grandfathered tag

   alphanum      = (ALPHA / DIGIT)       ; letters and numbers

                        Figure 1: Language Tag ABNF
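The grammar above translates fairly directly into a regular expression. The sketch below is one possible transcription covering the langtag and privateuse productions; it deliberately leaves out grandfathered tags, and it checks syntax only, not whether a subtag is actually registered. The class name LangTagSyntax and the method isWellFormed are invented for the example.

```java
import java.util.regex.Pattern;

/** Hypothetical syntax check transcribed from the RFC 4646 ABNF (no grandfathered tags). */
class LangTagSyntax {
    private static final String ALNUM = "[a-z0-9]";
    // language = 2*3ALPHA [extlang] / 4ALPHA / 5*8ALPHA
    private static final String LANGUAGE =
            "(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})";
    // script = 4ALPHA
    private static final String SCRIPT = "(?:-[a-z]{4})?";
    // region = 2ALPHA / 3DIGIT
    private static final String REGION = "(?:-(?:[a-z]{2}|[0-9]{3}))?";
    // variant = 5*8alphanum / (DIGIT 3alphanum)
    private static final String VARIANTS =
            "(?:-(?:" + ALNUM + "{5,8}|[0-9]" + ALNUM + "{3}))*";
    // extension = singleton 1*("-" (2*8alphanum)); singleton excludes x
    private static final String EXTENSIONS =
            "(?:-[0-9a-wy-z](?:-" + ALNUM + "{2,8})+)*";
    // privateuse = "x" 1*("-" (1*8alphanum))
    private static final String PRIVATEUSE = "x(?:-" + ALNUM + "{1,8})+";

    private static final Pattern TAG = Pattern.compile(
            "(?:" + LANGUAGE + SCRIPT + REGION + VARIANTS + EXTENSIONS
                    + "(?:-" + PRIVATEUSE + ")?"
                    + "|" + PRIVATEUSE + ")",
            Pattern.CASE_INSENSITIVE);

    /** True if the whole string matches the langtag or privateuse production. */
    static boolean isWellFormed(String tag) {
        return TAG.matcher(tag).matches();
    }
}
```

For example, isWellFormed("en-Brai-GB") and isWellFormed("ja") both hold, while isWellFormed("en_US") does not, because the underscore is not a valid separator.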