To create, as an open source community, an international braille engine that will run on all major platforms using component based APIs and an XML input.
A braille engine for the conversion of XML into an embosser ready braille file. I want it to work for any language.
The project was first opened up 2006-09-19T13:37:54.0Z. All material will be open source, using the GPL license. With that in mind, any contributions must be GPL. Instigated by Dave Pawson, so ©Dave Pawson 2006. The project is very new, so we need developers to help with the work! The goal is to obtain a working product for the 80% and then refine to manage the remainder. A secondary goal is for an automated process, since the main complaint about braille is that there isn't enough of it. A nice-to-have would be support for UBC, sorry UEBC :-)
The project currently uses Java.
For the mailing list, please follow Ubuntu and its Code of Conduct.
Outline
My initial thoughts are that the input XML would be processed by an XSLT stylesheet through to an XML vocabulary dedicated to formatted braille. This is basically blocks and inlines with a very simple structure. The architecture is along these lines.
An XML input document is transformed to this braille vocabulary. The XSLT emitter is used to capture output and transcribe the braille using a language dependent translation table. Content is then passed to a page object which builds pages from the transcribed output. These pages are then written to disk ready for embossing or other means of access. Now a bit more detail.
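As a rough illustration of the first step above (input XML transformed by XSLT into a braille vocabulary), here is a minimal sketch using the JDK's built-in `javax.xml.transform` API. The class name `XsltStageSketch`, the `<doc>`/`<p>` input vocabulary and the stylesheet itself are assumptions made up for this example; the real stylesheet and the namespace of the real vocabulary are not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltStageSketch {
    // Hypothetical stylesheet: maps a made-up <doc>/<p> input onto
    // <brxml>/<body>/<para> (namespace omitted to keep the sketch short).
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/doc'><brxml><body>"
      + "<xsl:apply-templates select='p'/></body></brxml></xsl:template>"
      + "<xsl:template match='p'><para><xsl:value-of select='.'/></para></xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the input and returns the transformed XML as text.
    static String toBrailleVocabulary(String inputXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT stage failed", e);
        }
    }
}
```

In the architecture described above the result would be handed to the emitter rather than returned as a string; the string form is only for ease of inspection.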
Transcription. Mark Frodsham wrote a braille engine in C#. It is a really nice piece of work, 99% data driven from a table based source. I ported it to Java and can currently run and pass all his tests. Mark favours test driven development (TDD), which is great for developing confidence in the development process. I've converted the braille table into an XML format so that generating a new table won't be such a hassle. I'll provide a schema and description for that. The only elements (that I've found) which are language dependent are the table itself, a list of acronyms and date formatting. There could be more for your language.
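To give a feel for what "data driven from a table" can mean, here is a minimal sketch of a longest-match rule lookup. It is not Mark's engine or the Java port: the class `TableTranslatorSketch`, its methods and the three sample rules are assumptions for illustration only, and a real braille table carries context conditions that this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class TableTranslatorSketch {
    // One table entry: a run of print characters and its braille (ASCII) form.
    static final class Rule {
        final String match;
        final String braille;
        Rule(String match, String braille) {
            this.match = match;
            this.braille = braille;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Adds a rule and keeps the list sorted longest match first.
    TableTranslatorSketch addRule(String match, String braille) {
        if (match.isEmpty()) {
            throw new IllegalArgumentException("empty match would never advance");
        }
        rules.add(new Rule(match, braille));
        rules.sort(Comparator.comparingInt((Rule r) -> r.match.length()).reversed());
        return this;
    }

    // At each position apply the longest matching rule; otherwise copy the character.
    String translate(String text) {
        String in = text.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length()) {
            Rule hit = null;
            for (Rule r : rules) {
                if (in.startsWith(r.match, i)) {
                    hit = r;
                    break;
                }
            }
            if (hit != null) {
                out.append(hit.braille);
                i += hit.match.length();
            } else {
                out.append(in.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

With the illustrative rules `ER` to `]`, `OU` to `\` and `GH` to `<`, the word "delivery" comes out as `DELIV]Y`. Because everything language specific lives in the rule list, swapping the table is the only change needed for another language in this simplified picture.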
Formatting.
I wrote some Java to implement paragraph formatting. It has methods such as
prPara (int startCell, int restCells)
which produces a block of braille whose first line starts in cell 'startCell', with all remaining lines starting in cell 'restCells'. Again this is all tested. Hopefully completed.
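The behaviour described for prPara can be sketched as a simple word-wrapping routine. This is a minimal sketch, not the project's formatter: the class `ParagraphSketch`, the `format` method, the explicit `width` parameter and the choice of zero-based cell numbers are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class ParagraphSketch {
    // Returns n spaces, used as the left indent of a line.
    private static String indent(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Wraps text to 'width' cells. The first line starts at cell 'startCell',
    // every following line at cell 'restCells' (both zero-based here).
    // A word longer than the available space is left on its own line, unbroken.
    static List<String> format(String text, int startCell, int restCells, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder(indent(startCell));
        boolean empty = true; // true while the current line holds no word yet
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (!empty && line.length() + 1 + word.length() > width) {
                lines.add(line.toString());
                line = new StringBuilder(indent(restCells));
                empty = true;
            }
            if (!empty) {
                line.append(' ');
            }
            line.append(word);
            empty = false;
        }
        if (!empty) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

For example, `ParagraphSketch.format("aa bb cc dd", 0, 3, 8)` yields two lines: `aa bb cc` and then `dd` indented by three cells.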
I have thought about, but not implemented, the build-up of pages from blocks of text. That's next on the list, or maybe the integration step should come first.
I now have an untested object to load a braille table. I've coded an abstract BrailleTable.java and instantiated it with ENBrailleTable.java. I've used ERH's XOM to read the XML braille table and load both the hashmap which is keyed into the groups for faster access, and the main rules, an array of Rule objects.
One or two not nice items, but it seems logical. I've moved all the language dependencies out into the abstract or concrete class of table, i.e. removing them totally from the Translator, which uses the factory to obtain a BrailleTable.
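The split between an abstract table, a concrete language table and a factory might look roughly like the sketch below. Only the class names `BrailleTable` and `ENBrailleTable` come from the description above; the factory name `BrailleTableFactory`, the methods `languageTag` and `isAcronym`, and the acronym list are hypothetical and chosen purely for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Everything language dependent hides behind this abstract type.
abstract class BrailleTable {
    abstract String languageTag();
    abstract boolean isAcronym(String word);
}

// Concrete table for British English; the acronym list is a made-up sample.
class ENBrailleTable extends BrailleTable {
    private static final Set<String> ACRONYMS =
        new HashSet<>(Arrays.asList("RNIB", "XML"));

    @Override
    String languageTag() {
        return "en-GB";
    }

    @Override
    boolean isAcronym(String word) {
        return ACRONYMS.contains(word);
    }
}

// The translator asks the factory for a table and never names a concrete class.
class BrailleTableFactory {
    private static final Map<String, Supplier<BrailleTable>> REGISTRY = new HashMap<>();

    static {
        REGISTRY.put("en-GB", ENBrailleTable::new);
    }

    static BrailleTable forLanguage(String tag) {
        Supplier<BrailleTable> supplier = REGISTRY.get(tag);
        if (supplier == null) {
            throw new IllegalArgumentException("No braille table for " + tag);
        }
        return supplier.get();
    }
}
```

Adding a language in this sketch means writing one more subclass and registering it; nothing in the translator changes.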
2006-10-03T13:33:59.0Z. Progress. Using Elliotte Harold's XOM I'm now loading the braille table via a factory class. Lots more I18N isolation, which, in true dev fashion, has broken the design. I'm now back to single class testing. A further update from Mark has meant wrapping that into the mix, so it could be some time before I'm up and passing tests again. I've now resolved the I18N such that all messages can be used from resource files, which can be tucked away in the jar file, so adding another language is simply a matter of adding a text file. Tested in Swedish and Czech - for which thanks to Jirka and Markus.
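Messages held in resource files can be handled with the JDK's `ResourceBundle` classes. The sketch below is an assumption about how such a lookup could work, not the project's own I18N code: the class `MessagesSketch`, the key `table.missing` and the message texts are made up. It reads the properties text from a string so that it is self-contained; in a jar one would more likely call `ResourceBundle.getBundle(baseName, locale)`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class MessagesSketch {
    private final ResourceBundle bundle;

    // Builds a bundle from properties-file text (key=value lines).
    MessagesSketch(String propertiesText) {
        try {
            this.bundle = new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up the message for 'key' and fills in {0}, {1}, ... placeholders.
    String get(String key, Object... args) {
        return MessageFormat.format(bundle.getString(key), args);
    }
}
```

Each language then needs only its own text file with the same keys, for example `table.missing=No braille table for {0}` in the English file.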
All class files are looking more at home now, I've added an impl package. I need to add Localizer from James Clark to the mix, then that's it for files.
Two major pieces of work remaining. Handling acronyms in bulk, rather than via Regex, and the Sax parser interface.
I now have a working braille engine. The formatter is partly tested, hence not integrated. Finished testing. All existing 90+ tests now pass, another twenty added for more recent work (Roman numerals caught me out). Prior to release I want to define a 'starter' rules file (no translation, just copies input to output) which can be built on for another language. I think it will be ready for criticism then.
Update. 2006-10-16T08:56:06.0Z. I've started work on the XML input side. Two classes, org.dpawson.Handler which is the sax2 output handler, and org.dpawson.Ubraille which will form the basic input class for XML. So far so good. Still a bit too raw to be released, but the logic seems about right.
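For readers unfamiliar with SAX2, here is a minimal sketch of a handler that collects the text of each para element. It is not `org.dpawson.Handler`; the class `ParaCollector` and its `parse` helper are assumptions for illustration, and inline markup such as em is simply flattened to its text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParaCollector extends DefaultHandler {
    final List<String> paragraphs = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <para>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("para".equals(localName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so always append.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("para".equals(localName)) {
            paragraphs.add(current.toString());
            current = null;
        }
    }

    // Parses the XML text with a namespace-aware SAX parser and returns the paragraphs.
    static List<String> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            ParaCollector handler = new ParaCollector();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.paragraphs;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
    }
}
```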
Created an ant script. Can now generate the jar file, zip everything up and compile it all. See the download directory. Starting to plan the I18N move next.
I'm starting to think about scope too, to try and answer the basic questions: what is this software supposed to do? Who is it for? At the back of my mind I have a fairly clear belief that this software is unlikely to be able to translate literally any old text into braille, e.g. math, foreign language text, poetry, chemistry etc. My goal is for a braille engine that can handle the Pareto 80% effortlessly and with no mistakes which make it unreadable for the end user. Hence I've added a passthrough element to the XML, such that a professional transcriber can do that work and not have it messed up by the engine. I'm told that transcribers do both transcription and formatting naturally, so I'm making the assumption that such content will be appropriately formatted, hence only needs to be added to the output stream. My main goal is that the braille engine is wholly language neutral, hence new languages can be added to the engine by a Java programmer and a braillist working together. As to the audience? I'm less sure today. Tomorrow I want non-programmers to be able to pick it up and use it without problems. I don't think it's ready for that just yet. Today I'm in need of other programmers to look at the code and find some of the bugs.
Update 2006-10-17T18:28:15Z. I18N incorporated without trouble. Two classes added, utils.Localizer and utils.I18N. I'll gradually move the current suite of messages over to this use. I'm currently holding all messages in the resources directory, with a hierarchy matching the classes. I think this can be improved. I'm less sure how.
Update: 2006-10-21T15:26:17Z. The logging and Error Handler are now implemented using log4j and an extension of the SAX Error Handler. Seems to be working well. A few more classes to move to this way of logging. The logger is configurable such that messages go to stdout and/or to a logfile (set in the logging.properties file). Ant build file now updated to generate zipfileset.
The outline of the system is available as
ubraille.png
Update 2006-10-27T15:13:07.0Z. Not a lot of progress. More work on the move to full I18N and the use of the ErrHandler class. Development work on using special characters (code in the ENBrailleTable class and changes to the translate method in TextToBraille class). See rules for a fuller description.
Update 2006-10-30T16:01:24Z. The use of special characters (see the English braille table for an example) brought out an error in the acronym regex. All to the better. Uploaded 1.03 as a result. I've been using a simple Java backtranslator (for en-GB braille) for some time. I'm thinking of including that as a utility. I can do that now that I've found a way of reading files (resources) from within a Jar file! Boy that took some doing :-). I'll post it once I figure a way of doing so. Perhaps just a zip file in the download directory? Suggestions please.
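One common way to read a file packaged inside a jar is `Class.getResourceAsStream`, which resolves the path against the classpath rather than the file system. The sketch below is an assumption about the general technique, not necessarily the approach used here; the class `ResourceReaderSketch` and the method `readResource` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ResourceReaderSketch {
    // Reads a classpath resource (for example one packed in the jar) as UTF-8 text.
    // A leading '/' makes the path absolute within the classpath.
    // Returns null when the resource does not exist.
    static String readResource(String path) {
        InputStream in = ResourceReaderSketch.class.getResourceAsStream(path);
        if (in == null) {
            return null;
        }
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same call works whether the classes run from a directory or from a single jar, which is what makes a one-file distribution possible.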
Update 2006-11-04T11:36:28Z. Resolved the jar file issue, btrans now working as it should, from a single file, the jar. Special character processing now finished, although the full list of special characters for en-GB needs completing. Added to the to do list.
Update 2006-11-19T18:56:05Z. A good weekend's work. Inline elements have been bothering me for some time now. I've been thinking over various approaches, and tried two this week. Initially I wanted to process the inlines directly from the SAX handler, using the translator. This meant I needed a passthrough within the translator, which I think is not a good design. I eventually tested out the opposite approach: passing XML markup for the inlines through to the translator. This way I have to recognise XML mid-translation, which is not perfect, but controllable. This has to be done in a locale specific manner since (for at least en-GB) the rules are peculiar, to say the least. I finally finished coding and minimally testing this today. I think it will be relatively easy to add other inlines; not that I intend to have many.
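To illustrate the second approach (inline markup reaching the translator as text), here is a deliberately simplified sketch that recognises em markup inside a string and prefixes each emphasised word with an italic sign. The class `InlineSketch`, the method `markEmphasis` and the use of `.` as the italic sign are assumptions for the example; the real en-GB emphasis rules are more involved than this.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InlineSketch {
    // Matches one <em>...</em> span, non-greedy so adjacent spans stay separate.
    private static final Pattern EM = Pattern.compile("<em>(.*?)</em>", Pattern.DOTALL);

    // Assumption: '.' stands for the italic sign in the ASCII braille output.
    static final String ITALIC_SIGN = ".";

    // Replaces each <em> span with its words, each prefixed by the italic sign.
    // Text outside the spans is copied through unchanged.
    static String markEmphasis(String text) {
        Matcher m = EM.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String inner = m.group(1).trim();
            if (!inner.isEmpty()) {
                String[] words = inner.split("\\s+");
                for (int i = 0; i < words.length; i++) {
                    if (i > 0) {
                        out.append(' ');
                    }
                    out.append(ITALIC_SIGN).append(words[i]);
                }
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }
}
```

Keeping the sign and the per-word rule in one place is what allows the behaviour to be moved into a locale specific table class later.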
Since I have also commenced modifying the Ubraille class to strengthen the user interface and provide output file handling, I won't post another release until it is at least workable and tested. One more step forward! More on the todo list though; I've left en-GB code within the TextToBraille class. This needs moving over to ENBrailleTable next week.
Update 2006-11-27T18:00:00.0Z. Captured a list of issues. Progress on XML processing; bugs remain on the interface between XML and translator, and the braille table seems to be corrupted.
Update 2006-11-29T18:14:41.0Z. Added XOM to the mix. It's LGPL. The mix is coming together. Within the XML Sax parse phase, I'm now building both local block objects and Paragraph objects which are used to format the contracted braille. Incomplete as yet, but the source has been updated to show the progress. Two new Classes, Block and BlockType. BlockType is the same as Mode, a replacement of the enumeration type. Block uses this to capture the metadata about the block whilst analysing the incoming source.
Update 2006-12-01T16:32:09.0Z! Milestone!!! Today I processed some XML end to end. XML in, Braille ready for the embosser out. Buggy, but complete, and no crashing. Cleared a nasty bug whereby any newlines in the XML were crashing the table lookup algorithm (still needs investigation). Implemented a normalize() method in Utils.util and applied it to one block, the default. I'm now getting output via the Paragraph object, though not using the Page object. It's very slow, but that's an issue that can be addressed when it has increased functionality and a better feature list.
The resulting output looks like this:
P>A ) DEFAULT 9D5T ( #J'J4 I4;E4 NO
9D5T,N
EMPHASIS.^W & .TWO .^WS
PREFIX ^W ..F\R ^WS 9 EMPHASIS.'1 !N M
^WS
;HDHDH
CR1T+ DIGITAL AUDIO FILES = DELIV]Y TO
PET]BOR\< ON REMOVA# DISKS4
OV]VIEW4 A PROCESS 6DELIV] DIGITAL AUDIO
FILES 6;RNIB PET]BOR\<4
Incorrect indenting (the good old zero-based versus one-based error) but it's there. The corresponding input was
<brxml xmlns="http://www.dpawson.co.uk/ns#" xml:lang="en-GB" grade="1">
<meta>
<page>
<width>40</width>
<height>28</height>
<header>Fixed header text
<pagenum prefix="Page" align="end"/></header>
</page>
</meta>
<body>
<para s="0" r="3">Paragraph First line start at column 0,
remainder of lines are indented 3 cells.</para>
<para>Para with default indent of 0,0. I.e. no indentation</para>
<break type="page"/>
<para align="center">Centred header</para>
<para>Emphasis<em>word</em> and <em>two words</em></para>
<para>Prefix word <em>four words in emphasis</em>, then more
words</para>
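The input above contains newlines inside para elements, which is the situation the normalize() step mentioned earlier has to cope with. A minimal sketch of such a step is shown below, assuming it behaves like XPath's normalize-space (trim the ends, collapse each run of whitespace to one space). The class name `NormalizeSketch` is hypothetical and the real Utils.util method may differ.

```java
class NormalizeSketch {
    // Trims leading and trailing whitespace and collapses every internal run of
    // spaces, tabs and newlines into a single space.
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }
}
```

Applied to the first paragraph of the input, the line break after "column 0," becomes a single space, so the table lookup only ever sees ordinary word separators.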
Real progress. I now believe we have a working prototype!
I shall not be posting more progress reports to this site. I'll use the mailing list now that we have a small group interested.
Source and javadoc are available as a zip here. Please let me know what you think. DaveP
To let others have their say, I've created a mailing list at freelists.org. Subscribe and unsubscribe address at the bottom of the page.