Generation in 7
What is the WinDi Translation Search Engine (WTSE)?
Just as existing search engines look up results across different web sites, the purpose of this WinDi
function is to provide a Translation Search Engine in 7 languages. (Please note that the screenshots below were taken at the beginning of March '07; the number of translations
mentioned is different today.)
The WinDi Translation Search Engine thus lets you retrieve already translated sentences from the WinDi Translation Database.
[Screenshots: Search without grammatical info. | Search with grammatical info. | Translation in 7 languages]
All sentences contained in the WinDi Translation Database were generated by the
WinDi Translation Robot (WSL-Batch)
under the supervision of the WinDi Linguistic Team.
This team carefully selects the vocabulary and all grammatical parameters of each group of sentences (named "jobs").
The WinDi Translation Robot then generates the translations in 7 languages of each chosen combination
(named a "translation unit"). The resulting translation units are then indexed in real time in the WinDi Translation Search Engine.
On average, the WinDi Translation Robot generates 2 million sentences per week.
This process is therefore completely different from automatic translation: the stored translations have been prepared
by a team of linguists "asking" the WinDi Translation Robot to execute repetitive tasks at a speed of one translation
every 150 milliseconds (this delay arises because each translated sentence involves between
3 and 6 database accesses among roughly 15 million conjugation and grammar records).
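As a rough cross-check of these figures, the sketch below compares the stated pace of one translation every 150 ms with the stated weekly average of 2 million sentences. It assumes that one "translation" corresponds to one generated sentence (in all 7 languages) and that the robot works at a steady rate; both are assumptions made for illustration, and the function names are hypothetical.

```python
# Figures taken from the text; the steady-rate model itself is an assumption.
MS_PER_TRANSLATION = 150      # one translation every 150 ms
WEEKLY_OUTPUT = 2_000_000     # average number of sentences generated per week

MS_PER_DAY = 24 * 60 * 60 * 1000


def max_translations_per_day(ms_per_translation: int = MS_PER_TRANSLATION) -> int:
    """Upper bound on daily output if the robot never paused."""
    return MS_PER_DAY // ms_per_translation


def weekly_utilization(weekly_output: int = WEEKLY_OUTPUT) -> float:
    """Share of the theoretical weekly maximum that the stated average uses."""
    return weekly_output / (7 * max_translations_per_day())


if __name__ == "__main__":
    print("max per day:", max_translations_per_day())
    print("weekly utilization:", round(weekly_utilization(), 2))
```

Under these assumptions, the weekly average amounts to about half of what a robot running around the clock at 150 ms per translation could produce.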
The WinDi Translation Robot (WSL-Batch) and WTSE are based on the WinDi aligned grammar and corpora across 7 languages.
A demonstration of the WinDi Translation Search Engine
is available so that you can quickly discover the WTSE concept.
This linguistic function is absolutely unprecedented on the Internet. Moreover, before
long it will become the largest 'Translation Memory in 7 languages' ever available online!
By way of comparison, 1 million translated sentences represent about 10,000 translated pages, or 10 translation
books (of 1,000 pages each). As the WinDi Translation Robot generates an average of about 300,000
sentences in 7 languages per day, 3 'virtual 1,000-page translation books' are added daily to the WinDi Translation
Database available online.
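The comparison above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given in the text; the 100 sentences per page is not stated directly but follows from "1 million sentences is about 10,000 pages", and the function names are hypothetical.

```python
# Figures from the text; SENTENCES_PER_PAGE is implied by
# "1 million translated sentences represent about 10,000 pages".
SENTENCES_PER_PAGE = 100
PAGES_PER_BOOK = 1000
DAILY_SENTENCES = 300_000


def pages(sentences: int) -> float:
    """Number of printed pages a given number of sentences would fill."""
    return sentences / SENTENCES_PER_PAGE


def books(sentences: int) -> float:
    """Number of 1,000-page books a given number of sentences would fill."""
    return pages(sentences) / PAGES_PER_BOOK


if __name__ == "__main__":
    print("pages per million sentences:", pages(1_000_000))
    print("books per million sentences:", books(1_000_000))
    print("books added per day:", books(DAILY_SENTENCES))
```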
However, the number of translated pages or books mentioned above is a considerable underestimate because it does not
include the 6 translations that come with each sentence. In reality, the WinDi Translation Database represents an
even larger volume of printed documents while offering great ease of search.
Technical information about the WinDi Translation Database
The database format was designed by the WinDi Development Team to be platform-independent. However, at this
time, the database runs under Windows 2000 as a standard 32-bit application. The hard disks used are NTFS-formatted
without any specific settings.
The database is based on 'translation units', each of which is a sentence translated into 7 languages. A translation unit
includes all the vocabulary and grammatical information required by the WSL-Batch process.
All of this vocabulary and grammatical information is then available to users in their search results.
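To make the idea of a translation unit more concrete, here is a minimal sketch of how such a record could be modelled. It is not the actual WinDi format, which the text does not describe: the class, the field names, the placeholder language codes and the linear `search` helper are all hypothetical, and a real engine would use an index rather than scanning every unit.

```python
from dataclasses import dataclass, field

# Placeholder codes: the text says "7 languages" without listing them here.
LANGUAGES = tuple(f"lang{i}" for i in range(1, 8))


@dataclass
class TranslationUnit:
    """One sentence in all 7 languages plus its grammatical information."""
    sentences: dict                              # language code -> sentence text
    grammar: dict = field(default_factory=dict)  # hypothetical, e.g. {"tense": "present"}

    def __post_init__(self):
        # A unit is only valid if every one of the 7 languages is present.
        missing = set(LANGUAGES) - set(self.sentences)
        if missing:
            raise ValueError(f"missing languages: {sorted(missing)}")


def search(units, language, text, with_grammar=False):
    """Return matches for `text` in one language, optionally with grammar info."""
    results = []
    for unit in units:
        if text.lower() in unit.sentences[language].lower():
            hit = dict(unit.sentences)
            if with_grammar:
                hit["grammar"] = dict(unit.grammar)
            results.append(hit)
    return results


if __name__ == "__main__":
    unit = TranslationUnit(
        {lang: f"example sentence in {lang}" for lang in LANGUAGES},
        {"tense": "present"},
    )
    print(search([unit], "lang1", "example", with_grammar=True))
```

The `with_grammar` flag mirrors the two search modes offered to users, with and without grammatical information.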
You will notice that a request to the WinDi Database is answered within a few milliseconds (from 10 ms, 50 ms on
average). This average is measured from the moment the WinDi Internet front end receives
a request (from a user) until the WinDi Database cluster responds. It includes the transmission
delay over the 100 Mbit/s Ethernet link between the Internet front end and the cluster.
Unfortunately, the Internet itself adds to the response time. In addition, since the WinDi Database is updated in real time (each
time WSL-Batch has processed a job), you may notice that the delays mentioned above get a bit longer from time to time;
otherwise we would have to stop all Internet access whenever updates are available.
This is how the WinDi Database works today. However, if we could install the database and its associated
consultation program on the same device (for example without any LAN and its transmission protocol), the database
consultation delays would decrease dramatically to around a microsecond, enabling future real-time applications,
including speech-to-speech devices.
Finally, the size of the WinDi Database and its number of records will not influence the response times, even in the case
of tera-records (trillions of records). The WinDi Database format is limited only by the OS itself and by the size of the hard disks or
memory. Unlike the individual devices it is built from, the WinDi Database Cluster has no limitation in terms of size or
number of records.
This was the WinDi Development Team's challenge when the design of the WinDi Translation Search Engine was launched...
However, at this time the WinDi Database Cluster could support up to 10,000 PCs, each with
2 terabytes of hard disk space, representing a database size of 20 petabytes, or more than 1,000 billion sentences in 7
languages. Even in this case, the database access delays would remain the same as today's 50 ms average, thanks
to the selected database design, which exactly matches our project's requirements!
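The cluster figures can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and uses the average translation unit size of about 1.5 KBytes given in the portability section; it ignores any file system or index overhead, which the text does not quantify. The function names are hypothetical.

```python
# Figures from the text; decimal units are an assumption.
PCS = 10_000
BYTES_PER_PC = 2 * 10**12   # 2 TB of hard disk space per PC
UNIT_BYTES = 1500           # ~1.5 KBytes per translation unit (average)


def cluster_bytes() -> int:
    """Total raw storage across the whole cluster, in bytes."""
    return PCS * BYTES_PER_PC


def cluster_units() -> int:
    """Translation units that fit in the cluster, ignoring any overhead."""
    return cluster_bytes() // UNIT_BYTES


if __name__ == "__main__":
    print("petabytes:", cluster_bytes() / 10**15)
    print("billions of translation units:", cluster_units() / 10**9)
```

Under these assumptions the raw capacity works out to 20 petabytes and roughly 13,000 billion translation units, which is consistent with the "more than 1,000 billion" stated above.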
If you think such technology is impossible, you should know that WE DO HAVE THIS TECHNOLOGY TODAY, ready to grow,
and we are working each day to fill this linguistic tank in order to help overcome the language barrier.
Portability of the WinDi Translation Database
As explained above, the WinDi Translation Database format is platform-independent, which makes it possible to use it in
many other ways.
Since the average size of a translation unit is about 1.5 KBytes, a database of 1 million translation units will represent only
about 1.5 gigabytes of raw data, or at most a few gigabytes of disk/memory space depending on the OS and its hard disk or memory formatting technology.
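The size estimate follows directly from the average unit size. In the sketch below, the `overhead` factor is a hypothetical parameter standing in for OS and formatting effects, which the text mentions but does not quantify; decimal units (1 GB = 10^9 bytes) are assumed.

```python
UNIT_BYTES = 1500  # ~1.5 KBytes per translation unit (average, from the text)


def database_size_gb(units: int, overhead: float = 1.0) -> float:
    """Estimated size in GB; `overhead` is a hypothetical formatting factor."""
    return units * UNIT_BYTES * overhead / 10**9


if __name__ == "__main__":
    print("raw size for 1 million units (GB):", database_size_gb(1_000_000))
    print("with a 2x overhead (GB):", database_size_gb(1_000_000, overhead=2.0))
```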
The WinDi Development Team is able to design other programs that access the WinDi Translation Database, like the one available
through the WTSE, and to do so on different platforms, such as handheld devices.
There are numerous applications for such a translation database in 7 languages. It could be stored on a flash memory card or
USB key with enough space for the intended application.
For instance, more and more features are being integrated into cell phones (GSM). Why not also embed the WinDi Translation
Database so that anyone can access translations in 7 languages anytime and anywhere?
Today, there are MP3 players with 80 GBytes of memory/disk space, and 80-GByte devices will soon be replaced by 100- or
200-GByte devices. Capacities of this kind will allow us to store about 100 million translations in a small handheld
device such as a small MP3 player.
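As a rough illustration of these device capacities, the sketch below estimates how many translation units fit on a device of a given size. It assumes decimal gigabytes, the 1.5-KByte average unit size quoted earlier, and a device used entirely for the database with no overhead; the function name is hypothetical.

```python
UNIT_BYTES = 1500  # ~1.5 KBytes per translation unit (average, from the text)


def units_that_fit(capacity_gb: int) -> int:
    """Translation units that fit in `capacity_gb` decimal gigabytes."""
    return capacity_gb * 10**9 // UNIT_BYTES


if __name__ == "__main__":
    for capacity in (80, 100, 200):
        millions = units_that_fit(capacity) / 10**6
        print(f"{capacity} GB: about {millions:.0f} million translation units")
```

Under these assumptions, 100 to 200 GBytes corresponds to roughly 67 to 133 million translation units, which brackets the "about 100 million" figure.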