Transolution XLIFF Filter Documentation
| Version: | 0.1 |
|---|---|
| Date: | 2005-08-03 |
| Authors: | Fredrik Corneliusson |
Introduction
Filter to convert from tagged formats such as HTML and XML to XLIFF and back. The filter is configured with an INI file, so it can be adapted to a number of different formats.
- sgml2xliff.py - Script to convert to XLIFF
- xliff2sgml.py - Script to convert XLIFF back to the original format
At the moment, INI files for the following formats exist in transolution/filters/filter_settings:
- DOCBOOK (docbook.ini)
- HTML (HTML.ini)
- StarOffice/OpenOffice content.xml files (OOffice.ini)
The filter produces sentence-segmented XLIFF files.
Here are some example XLIFF files created with the filter: example_xliffs
Usage
First of all, you need to have the transolution directory in your Python path, or you have to be in the directory containing the transolution folder. If you can start the XLIFF editor, everything should be OK. To get help on the arguments the script accepts, run:
>python sgml2xliff.py -h
usage: sgml2xliff.exe [options] inifile path

options:
  -h, --help            show this help message and exit
  -e ENCODING, --encoding=ENCODING
                        set source file encoding
  -r, --recursive       Process files recursive.
  -f FMASK, --fmask=FMASK
                        File mask to use when running recursive.
  -l SLANG, --slang=SLANG
                        Language of source document(s).
  -s SKIPWORDS, --skipwords=SKIPWORDS
                        a file containing a the words not to segment after.
                        Example: vs. Mr.
  -z, --xlz             Create a xlz (zip file containing xliff and skeleton
                        files).
Tip
If the source file does not contain newlines, as is the case with OpenOffice's content.xml, it is a good idea to insert newlines where appropriate, e.g. after paragraph tags (</text:p>), before converting to XLIFF. Otherwise the editor becomes sluggish when you view the document with skeleton context turned on.
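The preprocessing step described in the tip could be sketched as follows. This is only an illustration, not part of the filter: the function name add_newlines_after_paragraphs is hypothetical, and the sketch assumes that adding a line break after each closing paragraph tag does not change the meaning of the document.

```python
import re


def add_newlines_after_paragraphs(xml_text, closing_tag="</text:p>"):
    """Insert a newline after every closing tag that is not already followed by one.

    Hypothetical helper for illustration; it is not shipped with the filter.
    """
    # Negative lookahead: skip tags that already have a line break after them.
    pattern = re.escape(closing_tag) + r"(?!\r?\n)"
    return re.sub(pattern, lambda match: closing_tag + "\n", xml_text)


if __name__ == "__main__":
    sample = "<text:p>One.</text:p><text:p>Two.</text:p>"
    print(add_newlines_after_paragraphs(sample))
```

To use it on a real file, read content.xml into a string, pass it through the function and write the result back before running sgml2xliff.py.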
Convert to XLIFF
For example, to convert an HTML file:
>python sgml2xliff.py -z ./transolution/filters/filter_settings/html.ini ../test/test.html
The -z switch creates an .xlz file (a zip archive containing the XLIFF and the skeleton file).
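Because an .xlz file is an ordinary zip archive, its contents can be inspected with Python's standard zipfile module. The sketch below only lists the member names. The helper name list_xlz_members is hypothetical, and the names of the files stored inside the archive are not documented here, so none are assumed.

```python
import zipfile


def list_xlz_members(xlz_path):
    """Return the names of the files stored in an .xlz (zip) archive.

    Hypothetical helper for illustration; accepts a path or a file-like object.
    """
    with zipfile.ZipFile(xlz_path) as archive:
        return archive.namelist()


if __name__ == "__main__":
    import sys

    # Example: python list_xlz.py ../test/test.html.xlz
    for name in list_xlz_members(sys.argv[1]):
        print(name)
```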
Convert back to original format
>python xliff2sgml.py ../test/test.html.xlz
Filter INI files
Configuration file syntax
The configuration file consists of sections, led by a "[section]" header and followed by "name= value" entries.
Tags Section
In this section you define how tags are treated.
The syntax for the Tags section is:
TAGNAME=TYPE,FLAGS
The TAGNAME should be just the name as it appears in the file (case insensitive). If the tag contains a colon, you have to replace the colon with : e.g.
text:p => text:p
The TYPE can be either "External" or "Internal". The rule is that if a tag can appear inside a sentence, it should be "Internal", e.g.:
Here's some <b>bold</b> text.
If you don't want the tag to appear in translation segments, set the TYPE to "External", e.g.:
<header>This is a header</header>
If you want all content between the start and end tag to be treated as Internal or External, set the Group flag, e.g.:
<script language="javascript">
var Open = "";
function preload() { Open = new Image(16,16); Closed = new Image(16,16); }
</script>
This is all that is supported at the moment. Other settings, such as Translatable Attributes, are only there because I plan to implement support for them in the future.
The INI file Tags section for the tags above would be:
[Tags]
b=Internal
header=External
script=External,Group
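To make the TAGNAME=TYPE,FLAGS syntax concrete, here is a minimal sketch that reads the Tags entries above with Python's standard configparser module and splits each value into a type and a set of flags. It is not the filter's own parsing code. The function name parse_tag_rules is hypothetical, and the sketch assumes one entry per line.

```python
import configparser

SAMPLE_INI = """
[Tags]
b=Internal
header=External
script=External,Group
"""


def parse_tag_rules(ini_text):
    """Map each tag name to a (type, flags) pair read from the [Tags] section.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    rules = {}
    # configparser lowercases option names, which matches the
    # case-insensitive tag names described in the text.
    for name, value in parser.items("Tags"):
        parts = [part.strip() for part in value.split(",")]
        tag_type = parts[0]
        flags = {flag for flag in parts[1:] if flag}
        rules[name] = (tag_type, flags)
    return rules


if __name__ == "__main__":
    for tag, (tag_type, flags) in parse_tag_rules(SAMPLE_INI).items():
        print(tag, tag_type, sorted(flags))
```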
FilterSettings section
The FilterSettings section has two settings. If you set KeepLineBreaks to True, all line breaks in the file are kept in the translation segments. If it is not set, line breaks are removed from sentences. DefaultTagStyle is the style used for tags that are not specified in the Tags section.
[FilterSettings]
KeepLineBreaks=False
DefaultTagStyle=External
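The following sketch shows one way the two settings could be read and how DefaultTagStyle acts as a fallback for tags missing from the Tags section. It uses the standard configparser module rather than the filter's own code. The helper names read_filter_settings and tag_style are hypothetical. Returning None when DefaultTagStyle is absent is an assumption, since the behaviour in that case is not described here.

```python
import configparser


def read_filter_settings(ini_text):
    """Return (keep_line_breaks, default_tag_style) from [FilterSettings].

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # An unset KeepLineBreaks means line breaks are removed, so default to False.
    keep_line_breaks = parser.getboolean(
        "FilterSettings", "KeepLineBreaks", fallback=False
    )
    # Assumption: no default is invented when DefaultTagStyle is missing.
    default_style = parser.get("FilterSettings", "DefaultTagStyle", fallback=None)
    return keep_line_breaks, default_style


def tag_style(tag_name, tag_types, default_style):
    """Look up a tag's type, falling back to DefaultTagStyle for unlisted tags."""
    return tag_types.get(tag_name.lower(), default_style)


if __name__ == "__main__":
    ini = "[FilterSettings]\nKeepLineBreaks=False\nDefaultTagStyle=External\n"
    keep, style = read_filter_settings(ini)
    print(keep, style)
    print(tag_style("div", {"b": "Internal"}, style))
```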
Skip words
Sometimes abbreviations (Mr., etc.) cause the filter to end a segment in the wrong place. The solution is to pass the filter a text file of skip words with -s or --skipwords. There is a very incomplete English abbreviation file at transolution/filters/skipwords/en_skipwords.txt. Add every abbreviation you don't want the filter to segment after, and specify the file when running the filter.
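The effect of a skip word list can be illustrated with a small sentence splitter. This is a simplified sketch and not the filter's actual segmentation logic. The function names load_skipwords and segment are hypothetical, and the skip word file is assumed to contain whitespace-separated abbreviations such as "vs. Mr.".

```python
import re


def load_skipwords(text):
    """Parse skip words, assuming whitespace-separated abbreviations."""
    return set(text.split())


def segment(text, skipwords):
    """Split text after '.', '!' or '?' unless the preceding word is a skip word."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.end()].rstrip()
        last_word = candidate.split()[-1]
        if last_word in skipwords:
            # Abbreviation such as "Mr.": do not end the segment here.
            continue
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    skip = load_skipwords("vs. Mr.")
    for sentence in segment("Mr. Smith arrived. He left.", skip):
        print(sentence)
```

Without the skip words, the same input would be split after "Mr." as well, which is the problem the -s option is meant to avoid.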
