From lojbab@lojban.org Wed Aug 23 07:13:28 2000
Return-Path: <lojbab@lojban.org>
Received: (qmail 5681 invoked from network); 23 Aug 2000 14:13:28 -0000
Received: from unknown (10.1.10.27) by m2.onelist.org with QMQP; 23 Aug 2000 14:13:28 -0000
Received: from unknown (HELO stmpy-4.cais.net) (205.252.14.74) by mta2 with SMTP; 23 Aug 2000 14:13:28 -0000
Received: from bob (4.dynamic.cais.com [207.226.56.4]) by stmpy-4.cais.net (8.10.1/8.9.3) with ESMTP id e7NEDOJ77838 for <lojban@egroups.com>; Wed, 23 Aug 2000 10:13:24 -0400 (EDT) (envelope-from lojbab@lojban.org)
Message-Id: <4.2.2.20000823084322.00a24cb0@127.0.0.1>
X-Sender: vir1036/pop.cais.com@127.0.0.1
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Wed, 23 Aug 2000 10:11:09 -0400
To: Lojban List <lojban@egroups.com>
Subject: Re: [lojban] le stura be la gihuste
In-Reply-To: <Pine.LNX.4.21.0008230846590.3571-100000@burp.n>
References: <4.2.2.20000822154547.00b30f00@127.0.0.1>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
From: "Bob LeChevalier (lojbab)" <lojbab@lojban.org>

At 09:08 AM 08/23/2000 +0200, Elrond wrote:
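> > >.iku'i oio'onaisai mi pu fapro le stura be la gihuste le ka klijmi je
> > >dikni je ke zu'o galfi ke'ebo frili
> > >[Roughly: However (with strong, angry complaint), I fought the structure
> > >of the gismu list over clarity, regularity, and ease of modification.]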
> >
> > The current structure of the gismu list is designed for
> > LogFlash. Translations need not abide by the structure, and perhaps should
> > not try to do so. AFTER a translation is done, a LogFlash-compatible
> > structure could be devised for French Lojbanists who might wish to use
> > LogFlash with French to Lojban keywords.
>AFAIK (or, I hope?) the gismu list was not done only for LogFlash's
>purposes!

Yes and no. Originally it was indeed so. The structure of the list
(fixed-length fields of particular sizes) was chosen for LogFlash purposes. At 
the time we were making words, computer usages other than LogFlash did not 
exist and we were not connected to the Internet. We pretty much assumed 
that the dictionary gismu list would look somewhat different than the 
LogFlash list, both in structure and in definition style, and indeed would 
be a text and not a structured file. But the dictionary wasn't written, 
and it was the LogFlash list that was baselined. But translations of the 
gismu list into other languages are only weakly constrained by the baseline 
- we are smart enough to know that literally translating the English will 
not give the best definitions in other languages, so we have to trust that 
translators will maintain the integrity of the meanings (and that reviewers 
will catch any flaws).

The cmavo list is more clearly designed for LogFlash - the definitions are 
minimal and not very standardized in style. More importantly, the 
compounds that are in the cmavo list were chosen as teaching examples to 
appear in LogFlash, and there is no other justification for the particular 
selection that appears in the list.

LogFlash requires unique keywords, and we do suggest having some sort of 
keyword for each gismu, but it need not be a translation of the English 
keyword. In the text definition, we suggest including alternative word 
choices and synonyms in some manner, as we have for the English. If 
nothing else, a word search of the gismu list then serves some of the 
function of a dictionary, and a key-word-in-context (KWIC) list like we 
used to prepare the English-Lojban dictionary file becomes a simple 
computer task preparatory to a real dictionary.
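
For illustration, here is a minimal Python sketch of such a KWIC pass, 
assuming the list has already been read into (gismu, definition) pairs. 
The function names kwic and format_kwic, the stop-word set, and the sample 
definitions are hypothetical; this is not the program actually used to 
prepare the English-Lojban dictionary file.

```python
import re

# Hypothetical stop words to leave out of the index; adjust per language.
STOP_WORDS = frozenset({"a", "an", "the", "of", "to", "and", "or", "by"})

# Alphabetic words (internal apostrophes allowed); place tags like "x1" never match.
WORD = re.compile(r"\b[A-Za-z][A-Za-z']*\b")


def kwic(entries, width=30, stop_words=STOP_WORDS):
    """Return (key, left, word, right, headword) rows sorted by key, then headword."""
    rows = []
    for headword, definition in entries:
        for m in WORD.finditer(definition):
            key = m.group().lower()
            if key in stop_words:
                continue
            left = definition[max(0, m.start() - width):m.start()]
            right = definition[m.end():m.end() + width]
            rows.append((key, left, m.group(), right, headword))
    rows.sort(key=lambda row: (row[0], row[4]))
    return rows


def format_kwic(rows, width=30):
    """Render rows as aligned lines: left context, [keyword], right context, headword."""
    return [f"{left:>{width}} [{word}] {right:<{width}} -> {head}"
            for _key, left, word, right, head in rows]


if __name__ == "__main__":
    # Abbreviated sample definitions, for illustration only.
    sample = [("prami", "x1 loves x2"),
              ("citka", "x1 eats x2"),
              ("klama", "x1 goes to x2 from x3")]
    for line in format_kwic(kwic(sample, width=12), width=12):
        print(line)
```

Sorting the rows by keyword puts every gismu whose definition mentions, say, 
"eats" next to each other, which is the raw material for the English (or 
French) to Lojban side of a dictionary.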

There certainly was no attempt to make the gismu list usable for any other 
computer applications besides LogFlash. We never figured at the time that 
baselining would constrain one to use only the one format, and indeed Nora 
and I have no qualms about generating a new format or file/fields if we need 
them for a new application. That new file is not limited by baseline 
considerations, unless it somehow becomes part of the language definition.

>Indeed, while jbofi'e does its work quite well (thanks to
>Richard), it does so with a great deal of difficulty: as for any automated
>translation tool, extracting place names/keywords and the grammar of
>relationships from the current gismu list is a f... mess.

I haven't used jbofi'e, but we found that simply adding prepositions for 
each gismu into the current parser/glosser improved readability a lot.

>But this state of things is not only an issue for automated
>translation tools; indeed, while thinking about a possible translation of
>the gismu list into my mother tongue, French, it became clear that the current
>format makes it tremendously hard to translate it into languages such as
>French, where words have several different forms (verb, noun, and so on):
>such a translation would impose, if using the current format, a painful
>choice between
> a) a very verbose file where all forms of words are listed
>for the sake of easing searches,
> or b) a compact (like now) file where only a few forms of words are
>listed, and where searches are made difficult because searching for a
>concept implies searching for many different words before finding the
>right one.
><rant>if English had different forms for verbs and their
>associated nouns, I bet some people would have thought a little bit
>more about it *before* writing the gismu list...</rant>

Not likely. The keyword would probably have been a standard form of noun 
or verb as appropriate. Definitions were written to be read and understood 
by English speakers trying to grasp the word meaning in no more than 2 
lines on a screen (or 1 line in text), and not for computer word 
searches. We presumed that English speakers know when and how to turn a 
verb into a noun and vice versa, and tended to only give alternative forms 
when they had different roots, or where connotations might lead to 
misunderstandings of the meanings. Not much thought was given to 
non-English native speakers using the gismu list in lieu of a translation 
into their native language - we did not have the luxury of thinking so far 
ahead back then (note that it has taken around 10 years before anyone tried 
to do more than translate the keywords).

> No, seriously. Of course I could start a translation in whatever
>format suits my needs. However, from a computer hobbyist standpoint, I feel
>that having as many different formats as there are translations
>is a major mistake. At the very thought of having two versions
>of every Lojban-related program to study in French or English, I feel a
>strong headache coming.

The point is to do the translation in any format you choose and THEN 
conform that translation to some format. If you have a good French 
language gismu list in ANY format, making a LogFlash-compatible version of 
that list shouldn't be hard. Definitions might need to be tweaked 
(shortened if you've been wordy), and if you haven't done keywords we would 
need to add them, but these are adjustments rather less in scope than the 
original translation.
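
A minimal Python sketch of that conforming step: it pads each field to a 
fixed width and reports, rather than silently fixing, the kinds of problems 
mentioned above (definitions too long for their field, missing keywords, and 
duplicate keywords, since LogFlash requires unique ones). The field names and 
widths in FIELDS are placeholders, NOT the actual LogFlash column layout, and 
the French sample entries are illustrative only.

```python
# Placeholder layout: (field name, width in columns). NOT the real LogFlash spec.
FIELDS = [("gismu", 6), ("keyword", 20), ("definition", 96)]


def to_fixed_width(records, fields=FIELDS):
    """Render dict records as fixed-width lines; return (lines, problems)."""
    lines, problems, seen = [], [], set()
    for rec in records:
        name = rec.get("gismu", "?")
        keyword = rec.get("keyword", "")
        if not keyword:
            problems.append(f"{name}: missing keyword")
        elif keyword in seen:
            problems.append(f"{name}: duplicate keyword {keyword!r}")
        seen.add(keyword)
        parts = []
        for field, width in fields:
            value = rec.get(field, "")
            if len(value) > width:
                # Flag for human editing; truncate only so the line stays aligned.
                problems.append(f"{name}: {field} too long ({len(value)} > {width})")
                value = value[:width]
            parts.append(value.ljust(width))
        lines.append("".join(parts))
    return lines, problems


if __name__ == "__main__":
    french = [
        {"gismu": "prami", "keyword": "aimer", "definition": "x1 aime x2"},
        {"gismu": "citka", "keyword": "manger", "definition": "x1 mange x2"},
    ]
    lines, problems = to_fixed_width(french)
    print(problems)
```

The output of such a pass is a to-do list for a human editor; the wording 
fixes themselves still have to be done by hand.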

Note that all the stuff beyond column 160 is totally free format. I 
adopted conventions to make my computer manipulations of the list easier, 
but LogFlash ignores that text completely.
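
Since only the first 160 columns matter to LogFlash, a program that wants 
both the fixed fields and the free-format notes can simply cut each line at 
that column. A minimal Python sketch; the names split_line and read_list are 
hypothetical, and it assumes lines contain no tab characters (a tab would 
throw the column count off).

```python
FREE_FORMAT_COLUMN = 160  # LogFlash ignores everything after this column


def split_line(line, cut=FREE_FORMAT_COLUMN):
    """Split one list line into (fixed-format part, free-format notes)."""
    line = line.rstrip("\n")
    return line[:cut], line[cut:].strip()


def read_list(path, cut=FREE_FORMAT_COLUMN):
    """Yield (fixed, notes) pairs for every line of a list file."""
    # latin-1 decodes any byte, so stray non-ASCII characters won't stop the read.
    with open(path, encoding="latin-1") as f:
        for line in f:
            yield split_line(line, cut)
```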

> What I want to stress here is the fact that the various lists
>*must* be reformatted to improve the efficiency and simplicity of
>automated tools, be they translation tools, typesetting programs, word
>lookups, and so on.

I don't see how this is so. The difficulty is all on the human end - 
preparing the files. Computer memory and speed are cheap and hardly 
challenged by the size of the lists that are being searched for Lojban 
processing (and if they are, then indexing a file isn't difficult), and any 
format can be manipulated into any other format by a computer program as 
part of setup, if the original format is regular. But writing clear and 
understandable English or French text that defines the words is MUCH more 
difficult, and cannot be automated.
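
To make the indexing point concrete: a word index over a list of a few 
thousand lines is just a mapping from each word to the line numbers that 
contain it, built once at startup. A minimal Python sketch, with hypothetical 
function names and a deliberately naive notion of "word" (a run of letters, 
lowercased; accented French letters count as letters).

```python
import re
from collections import defaultdict

# Unicode-aware "word": a run of letters (digits and underscores excluded).
WORD = re.compile(r"[^\W\d_]+")


def build_index(lines):
    """Map each lowercased word to the set of 1-based line numbers containing it."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines, start=1):
        for word in WORD.findall(line.lower()):
            index[word].add(lineno)
    return index


def search(index, *words):
    """Return sorted line numbers containing ALL of the given words."""
    if not words:
        return []
    hits = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*hits))
```

Each lookup is then a dictionary access instead of a scan of the file, which 
is why the size of the lists is not a real obstacle.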

In the case of the English list, there already are multiple lists in a 
sense. Colin Fine came up with a list giving English keywords for each 
place of each gismu, and Nora used this to generate a "gismu list" of 
prepositions and case tags for each gismu. That list exists separately 
from the baselined list (and in fact is not baselined). It also went 
through at least three complete revisions to reach its current form.

>It also *should* be reformatted for any translation
>(of english words into another natural language) in order to create a
>standardized format readable by a single version of any automated tool.

Like I said: do the translation, and THEN worry about conforming it to some 
standardized format.

>I have several ideas about what would be the important criteria to
>be considered when choosing a new format for the various lists (the gismu
>list is not the only problem, of course; the current lujvo and cmavo lists
>are no easier to feed into automated tools). These ideas might just be
>complete crap and/or bullshit, but I have tried to find a consistent
>scheme: while several days ago, when I first started to think seriously
>about translating the gismu list into my native tongue (I do not know English
>well enough to make good use of LogFlash), I could not do anything more than
>translate the keywords, because the syntax of the translation field is
>obnoxious;

If you mean the textual definition, it is free-format, human-readable 
English text, subject only to the field size and to the need for a space at 
an appropriate place in order to divide it into two lines to fit an 
80-column screen. Any other syntax conventions you care to devise are your 
prerogative - there is no real standard (and the cmavo list definitions, as 
you will surely note, have much more severe syntax problems - problems I 
have never figured out how to resolve for the English dictionary short of 
rewriting the list).
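
For the one mechanical constraint on the definition text, a short Python 
sketch that breaks a definition at spaces into at most two lines of at most 
80 columns, using the standard textwrap module, and complains if the text 
cannot fit. It assumes the only layout rule is the 80-column screen width; 
the real LogFlash field may require the break at a particular position, 
which this sketch does not model.

```python
import textwrap

SCREEN_WIDTH = 80


def split_for_screen(definition, width=SCREEN_WIDTH, max_lines=2):
    """Break a definition at spaces into <= max_lines lines of <= width columns."""
    lines = textwrap.wrap(definition, width=width,
                          break_long_words=False, break_on_hyphens=False)
    # A single word longer than the width, or too much text overall, cannot fit.
    if len(lines) > max_lines or any(len(line) > width for line in lines):
        raise ValueError(f"definition does not fit in {max_lines} x {width} columns")
    return lines
```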

>now with those several ideas, I can already think about
>having standardized tools, more complex translation capabilities and so
>on, both for French AND English versions of the list. Ask for further
>details.

Consider yourself asked, since I cannot see what you find missing without 
much more detail. Indeed, a few gismu in French or English would be best 
of all. I can't read French, but Nora has some rusty skill.

>However, I cannot, and do not want to start working on anything
>before further comments from other people: I want to know whether there
>are other people interested in a standard format or if it is actually
>preferable to translate in whatever format suits the new language's
>purposes. Working blind, in fear of re-doing everything
>someday (like what will be needed for the English version at one point)
>just sinks my will completely.

The English gismu list WON'T be redone, whether it is needed or not for 
computer applications. That is what the baseline means. Any other format 
that is generated will not be the baselined gismu list. Similarly, while 
many people use the EBNF grammar, it is the YACC grammar that is baselined, 
and the EBNF is merely an alternate format that hopefully is identical to 
the baseline in meaning.

The English cmavo list definitions will have to be rewritten for the 
dictionary, and thus it is a riskier translation effort. But by the nature 
of cmavo, it is unlikely that a non-English definition of the words could 
be formed by merely translating the English, anyway. It is not clear 
whether the rewritten dictionary definitions will be plowed back into the 
"cmavo list" that is currently used as the baseline and in LogFlash 3.

As for fear of redoing - if you have translated the list into good French 
ignoring formatting considerations, then the redoing will be from the 
French translation, and not from any reformatting of the English. That 
redoing SHOULD be just an editing job.

On the other hand, I have to note that the current English gismu list has 
been polished by at least a dozen complete editorial passes through the 
entire list rewriting and standardizing styles and wording and format, 
which took place intermittently over some 6 years. And yes I had several 
"sinkings of will" when I faced yet another pass through the gismu list 
looking to make certain kinds of changes. To get a polished French list 
with everything needed for all manner of applications will likely need at 
least as many passes, and it almost certainly is a job beyond the 
capability of any one person. So again, you are faced with the need to 
have a minimally useful French list sooner, saving the polished 
multi-application list for some future date.

Dictionary/lexicon work is extremely time consuming and in some ways 
mind-numbingly depressing because there is always more work that could be 
done. Let us see what you have in mind, and we can comment, then you can 
decide what you will do and do it. Do not worry about whether it will need 
revisions; it will. But if you have made a good effort, then any revisions 
will be editorial rather than starting over from the English, and in the 
meantime, French Lojbanists will have a word list that they currently do 
not have.

lojbab
--
lojbab lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273
Artificial language Loglan/Lojban: http://www.lojban.org


