Re: [lojban] le stura be la gihuste
- To: Lojban List <lojban@egroups.com>
- Subject: Re: [lojban] le stura be la gihuste
- From: "Bob LeChevalier (lojbab)" <lojbab@lojban.org>
- Date: Wed, 23 Aug 2000 10:11:09 -0400
- In-reply-to: <Pine.LNX.4.21.0008230846590.3571-100000@burp.n>
- References: <4.2.2.20000822154547.00b30f00@127.0.0.1>
At 09:08 AM 08/23/2000 +0200, Elrond wrote:
> > >However (with strong complaint and anger), I objected to the structure of
> > >the gismu list as regards being clearly understandable, regular, and easy to modify.
> >
> > The current structure of the gismu list is designed for
> > LogFlash. Translations need not abide by the structure, and perhaps should
> > not try to do so. AFTER a translation is done, a LogFlash-compatible
> > structure could be devised for French Lojbanists who might wish to use
> > LogFlash with French to Lojban keywords.
> AFAIK (or at least I hope!), the gismu list was not made only for LogFlash's
> purposes!
Yes and no. Originally it was indeed so. The structure of the list (fixed
length fields of particular sizes) was chosen for LogFlash purposes. At
the time we were making words, computer usages other than LogFlash did not
exist and we were not connected to the Internet. We pretty much assumed
that the dictionary gismu list would look somewhat different than the
LogFlash list, both in structure and in definition style, and indeed would
be a text and not a structured file. But the dictionary wasn't written,
and it was the LogFlash list that was baselined. But translations of the
gismu list into other languages are only weakly constrained by the baseline
- we are smart enough to know that literally translating the English will
not give the best definitions in other languages, so we have to trust that
translators will maintain the integrity of the meanings (and that reviewers
will catch any flaws).
The cmavo list is more clearly designed for LogFlash - the definitions are
minimal and not very standardized in style. More importantly, the
compounds that are in the cmavo list were chosen as teaching examples to
appear in LogFlash, and there is no other justification for the particular
selection that appears in the list.
LogFlash requires unique keywords, and we do suggest having some sort of
keyword for each gismu, but it need not be a translation of the English
keyword. In the text definition, we suggest including alternative word
choices and synonyms in some manner, as we have for the English. If
nothing else, a word search of the gismu list then serves some of the
function of a dictionary, and a key-word-in-context (KWIC) list like we
used to prepare the English-Lojban dictionary file becomes a simple
computer task preparatory to a real dictionary.
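As a rough illustration of how simple the KWIC step is, here is a minimal Python sketch. It assumes the list has already been read into (gismu, definition) pairs; the two sample entries are illustrative paraphrases, not quotations from the baselined list, and the stopword set and context width are arbitrary choices made for the example.

```python
from collections import defaultdict

# Illustrative entries only (paraphrased, NOT quoted from the baselined list).
SAMPLE = [
    ("klama", "x1 comes/goes to destination x2 from origin x3"),
    ("dikni", "x1 is regular/periodic/cyclical in property x2"),
]

# Arbitrary stopword choice: place tags and short function words.
STOPWORDS = {"x1", "x2", "x3", "x4", "x5", "is", "to", "from", "in", "of"}


def kwic(records, width=3):
    """Map each indexed word to (gismu, left context, right context) tuples.

    Slash-separated synonyms such as "comes/goes" are split so that each
    alternative gets its own index entry.
    """
    index = defaultdict(list)
    for gismu, text in records:
        words = text.replace("/", " ").split()
        for i, word in enumerate(words):
            key = word.lower()
            if key in STOPWORDS:
                continue
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            index[key].append((gismu, left, right))
    # Sort by keyword so the result reads like a dictionary index.
    return dict(sorted(index.items()))


if __name__ == "__main__":
    for key, hits in kwic(SAMPLE).items():
        for gismu, left, right in hits:
            print(f"{left:>25} [{key}] {right:<25} {gismu}")
```

Because synonyms are split on "/", listing alternative word choices in the definition text directly yields extra index entries, which is the point made above about including synonyms.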
There certainly was no attempt to make the gismu list usable for any other
computer applications besides LogFlash. We never figured at the time that
baselining would constrain one to use only the one format, and indeed Nora
and I have no qualms about generating a new format or file/fields if we need
them for a new application. That new file is not limited by baseline
considerations, unless it somehow becomes part of the language definition.
> Indeed, while jbofi'e does its work quite well (thanks to
> Richard), it does so with considerable difficulty: as for any automated
> translation tool, extracting place names/keywords and the grammar of
> relationships from the current gismu list is a f... mess.
I haven't used jbofi'e, but we found that simply adding prepositions for
each gismu into the current parser/glosser improved readability a lot.
> But this state of things is not only an issue for automated
> translation tools; indeed, while I was thinking about a possible translation
> of the gismu list into my mother tongue, French, it became clear that the
> current format makes it tremendously hard to translate the list into
> languages such as French, where words have several different forms (verb,
> noun, and so on): such a translation would impose, if using the current
> format, a painful choice between
> a) a very verbose file where all forms of words are listed
> for the sake of easing searches,
> or b) a compact (like now) file where only a few forms of words are
> listed, and where searches are made difficult because searching for a
> concept implies searching for many different words before finding the
> right one.
> <rant>If English had different forms for verbs and their
> associated nouns, I bet some people would have thought a little bit
> more about it *before* writing the gismu list...</rant>
Not likely. The keyword would have probably been a standard form of noun
or verb as appropriate. Definitions were written to be read and understood
by English speakers trying to grasp the word meaning in no more than 2
lines on a screen (or 1 line in text), and not for computer word
searches. We presumed that English speakers know when and how to turn a
verb into a noun and vice versa, and tended to only give alternative forms
when they had different roots, or where connotations might lead to
misunderstandings of the meanings. Not much thought was given to
non-English native speakers using the gismu list in lieu of a translation
into their native language - we did not have the luxury of thinking so far
ahead back then (note that it has taken around 10 years before anyone tried
to do more than translate the keywords).
> No, seriously. Of course I could start a translation in whatever
> format suits my needs. However, from a computer hobbyist's standpoint, I feel
> that having as many different formats as there are translations is a
> major mistake. At the mere thought of having two versions of every
> Lojban-related program for studying in French or English, I feel a
> strong headache coming.
The point is to do the translation in any format you choose and THEN
conform that translation to some format. If you have a good French
language gismu list in ANY format, making a LogFlash-compatible version of
that list shouldn't be hard. Definitions might need to be tweaked
(shortened if you've been wordy), and if you haven't done keywords we would
need to add them, but these are adjustments rather less in scope than the
original translation.
Note that all the stuff beyond column 160 is totally free format. I
adopted conventions to make my computer manipulations of the list easier,
but LogFlash ignores that text completely.
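A minimal Python sketch of reading such a fixed-field line, assuming a hypothetical column layout: only the rule that text beyond column 160 is free format comes from the message above; the gismu, keyword, and definition positions below are made up for illustration and do not match the real list. The `to_tsv` helper (also hypothetical) shows that once a regular format is split into fields, emitting another format is a trivial step.

```python
from dataclasses import dataclass

# HYPOTHETICAL 0-based slice bounds for the fixed-length fields. The real
# gismu list uses different fields and widths; only the rule that text
# beyond column 160 is free format comes from the message.
FIELDS = {
    "gismu": (0, 6),
    "keyword": (6, 26),
    "definition": (26, 160),
}
FREE_FORMAT_START = 160  # 0-based index of column 161


@dataclass
class Entry:
    gismu: str
    keyword: str
    definition: str
    notes: str  # free-format tail that LogFlash ignores


def parse_line(line: str) -> Entry:
    """Split one fixed-field line; short lines are padded, not rejected."""
    padded = line.rstrip("\n").ljust(FREE_FORMAT_START)
    values = {name: padded[a:b].strip() for name, (a, b) in FIELDS.items()}
    return Entry(notes=padded[FREE_FORMAT_START:].strip(), **values)


def to_tsv(entry: Entry) -> str:
    """Re-emit the same entry in a different (tab-separated) format."""
    return "\t".join([entry.gismu, entry.keyword, entry.definition, entry.notes])


if __name__ == "__main__":
    sample = ("klama".ljust(6) + "come".ljust(20)
              + "x1 comes/goes to x2".ljust(134) + "my own notes")
    print(to_tsv(parse_line(sample)))
```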
> What I want to stress here is the fact that the various lists
> *must* be reformatted to improve the efficiency and simplicity of
> automated tools, be they translation tools, typesetting programs, word
> lookups, and so on.
I don't see how this is so. The difficulty is all on the human end -
preparing the files. Computer memory and speed are cheap and hardly
challenged by the size of the lists that are being searched for Lojban
processing (and if they are, then indexing a file isn't difficult), and any
format can be manipulated into any other format by a computer program as
part of setup, if the original format is regular. But writing clear and
understandable English or French text that defines the words is MUCH more
difficult, and cannot be automated.
In the case of the English list, there already are multiple lists in a
sense. Colin Fine came up with a list giving English keywords for each
place of each gismu, and Nora used this to generate a "gismu list" of
prepositions and case tags for each gismu. That list exists separately
from the baselined list (and in fact is not baselined). It also went
through at least three complete revisions to reach its current form.
> It also *should* be reformatted for any translation
> (of English words into another natural language) in order to create a
> standardized format readable by a single version of any automated tool.
Like I said: do the translation, and THEN worry about conforming it to some
standardized format.
> I have several ideas about what the important criteria would be
> when choosing a new format for the various lists (the gismu
> list is not the only problem, of course; the current lujvo and cmavo lists
> are no easier to feed into automated tools). These ideas might just be
> complete crap and/or bullshit, but I tried to find a consistent
> scheme: while several days ago, when I first started to think seriously
> about translating the gismu list into my native tongue (I do not know English
> well enough to master LogFlash), I could not do anything more than translate
> the keywords, because the syntax of the translation field is
> obnoxious;
If you mean the textual definition, it is free format human-readable
English-text, subject only to the field size, and the need for a space at
an appropriate place in order to divide into two lines to fit an 80 column
screen. Any other syntax conventions you care to devise are your
prerogative - there is no real standard (and the cmavo list definitions, as
you will surely note, have much more severe syntax problems - problems I
have never figured out how to resolve for the English dictionary short of
rewriting the list).
> now, with those several ideas, I can already think about
> having standardized tools, more complex translation capabilities and so
> on, for both the French AND English versions of the list. Ask for further
> details.
Consider yourself asked, since I cannot see what you find missing without
much more detail. Indeed, a few gismu in French or English would be best
of all. I can't read French, but Nora has some rusty skill.
> However, I cannot, and do not want to, start working on anything
> before getting further comments from other people: I want to know whether
> there are other people interested in a standard format or whether it is
> actually preferable to translate in whatever format suits the new language's
> purposes. Working blind and in fear of redoing everything
> someday (like what will be needed for the English version at some point)
> just sinks my will completely.
The English gismu list WON'T be redone, whether it is needed or not for
computer applications. That is what the baseline means. Any other format
that is generated will not be the baselined gismu list. Similarly, while
many people use the EBNF grammar, it is the YACC grammar that is baselined,
and the EBNF is merely an alternate format that hopefully is identical to
the baseline in meaning.
The English cmavo list definitions will have to be rewritten for the
dictionary, and thus it is a riskier translation effort. But by the nature
of cmavo, it is unlikely that a non-English definition of the words could
be formed by merely translating the English, anyway. It is not clear
whether the rewritten dictionary definitions will be plowed back into the
"cmavo list" that is currently used as the baseline and in LogFlash 3.
As for fear of redoing - if you have translated the list into good French
ignoring formatting considerations, then the redoing will be from the
French translation, and not from any reformatting of the English. That
redoing SHOULD be just an editing job.
On the other hand, I have to note that the current English gismu list has
been polished by at least a dozen complete editorial passes through the
entire list rewriting and standardizing styles and wording and format,
which took place intermittently over some 6 years. And yes I had several
"sinkings of will" when I faced yet another pass through the gismu list
looking to make certain kinds of changes. To get a polished French list
with everything needed for all manner of applications will likely need at
least as many passes, and it almost certainly is a job beyond the
capability of any one person. So again, you are faced with the need to
have a minimally useful French list sooner, saving the polished
multi-application list for some future date.
Dictionary/lexicon work is extremely time consuming and in some ways
mind-numbingly depressing because there is always more work that could be
done. Let us see what you have in mind, and we can comment, then you can
decide what you will do and do it. Do not worry about whether it will need
revisions; it will. But if you have made a good effort, then any revisions
will be editorial rather than starting over from the English, and in the
meantime, French Lojbanists will have a word list that they currently do
not have.
lojbab
--
lojbab lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273
Artificial language Loglan/Lojban: http://www.lojban.org