From: "Bob LeChevalier (lojbab)" <lojbab@lojban.org>
To: Lojban List
Date: Wed, 23 Aug 2000 10:11:09 -0400
Subject: Re: [lojban] le stura be la gihuste

At 09:08 AM 08/23/2000 +0200, Elrond wrote:
> > >.iku'i oio'onaisai mi pu fapro le stura be la gihuste le ka klijmi je
> > >dikni je ke zu'o galfi ke'ebo frili
> >
> > The current structure of the gismu list is designed for
> > LogFlash. Translations need not abide by the structure, and perhaps should
> > not try to do so. AFTER a translation is done, a LogFlash-compatible
> > structure could be devised for French Lojbanists who might wish to use
> > LogFlash with French to Lojban keywords.
>
>AFAIK (or, I hope?) the gismu list was not done only for LogFlash's
>purposes!

Yes and no. Originally it was indeed so. The structure of the list (fixed-length fields of particular sizes) was chosen for LogFlash purposes. At the time we were making words, we had no computer applications other than LogFlash, and we were not connected to the Internet.
We pretty much assumed that the dictionary gismu list would look somewhat different from the LogFlash list, both in structure and in definition style, and indeed would be a text and not a structured file. But the dictionary wasn't written, and it was the LogFlash list that was baselined. Translations of the gismu list into other languages, however, are only weakly constrained by the baseline - we are smart enough to know that literally translating the English will not give the best definitions in other languages, so we have to trust that translators will maintain the integrity of the meanings (and that reviewers will catch any flaws).

The cmavo list is more clearly designed for LogFlash - the definitions are minimal and not very standardized in style. More importantly, the compounds that are in the cmavo list were chosen as teaching examples to appear in LogFlash, and there is no other justification for the particular selection that appears in the list.

LogFlash requires unique keywords, and we do suggest having some sort of keyword for each gismu, but it need not be a translation of the English keyword. In the text definition, we suggest including alternative word choices and synonyms in some manner, as we have for the English. If nothing else, a word search of the gismu list then serves some of the function of a dictionary, and a keyword-in-context (KWIC) list like the one we used to prepare the English-Lojban dictionary file becomes a simple computer task preparatory to a real dictionary.

There certainly was no attempt to make the gismu list usable for any computer applications besides LogFlash. We never figured at the time that baselining would constrain us to use only the one format, and indeed Nora and I have no qualms about generating a new format or new files/fields if we need them for a new application. Such a new file is not limited by baseline considerations, unless it somehow becomes part of the language definition.
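As a rough illustration of how a KWIC list falls out of definitions that carry synonyms, here is a minimal Python sketch. The sample entries are paraphrased for illustration (not quoted from the baselined list), and the stopword set and context width are arbitrary choices, not anything we actually used.

```python
import re

# Illustrative entries only: paraphrased definitions, not quoted from the baselined gismu list.
SAMPLE_ENTRIES = [
    ("stura", "x1 is a structure/arrangement/organization of x2"),
    ("frili", "x1 is easy/simple for x2 under conditions x3"),
    ("dikni", "x1 is regular/cyclic/periodic in x2 with interval x3"),
]

# Words too common to be useful index terms (an arbitrary, illustrative choice).
STOPWORDS = {"a", "for", "in", "is", "of", "the", "under", "with"}
# Place tags such as x1, x2 are structural, not vocabulary.
PLACE_TAG = re.compile(r"x\d")


def kwic(entries, width=20):
    """Return (keyword, left context, right context, gismu) rows sorted by keyword."""
    rows = []
    for gismu, definition in entries:
        tokens = definition.split()
        for i, token in enumerate(tokens):
            left = " ".join(tokens[:i])[-width:]
            right = " ".join(tokens[i + 1:])[:width]
            # Slash-separated alternatives ("easy/simple") each get their own index line.
            for alt in token.split("/"):
                word = alt.lower().strip(".,;()[]")
                if not word or word in STOPWORDS or PLACE_TAG.fullmatch(word):
                    continue
                rows.append((word, left, right, gismu))
    rows.sort(key=lambda row: (row[0], row[3]))
    return rows


for word, left, right, gismu in kwic(SAMPLE_ENTRIES):
    print(f"{left:>20} [{word}] {right:<20} ({gismu})")
```

Sorting the rows by keyword is what turns a pile of definitions into something a dictionary editor can scan: every gismu whose definition mentions "structure" or "easy" lands next to its neighbors.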
>Indeed, while the jbofi'e does its work quite well (thanks to
>Richard), it does so with a great deal of difficulty: as for any automated
>translation tool, extracting place names/keywords and the grammar of
>relationships from the current gismu list is a f... mess.

I haven't used jbofi'e, but we found that simply adding prepositions for each gismu into the current parser/glosser improved readability a lot.

>But this state of things is not only an issue for automated
>translation tools; indeed, while thinking about a possible translation of
>the gismu list into my mother tongue, French, it proved that the current
>format makes it tremendously hard to translate it into languages such as
>French, where words have several different forms (verb, noun, and so on):
>such a translation would impose, if using the current format, a painful
>choice between
> a) a very verbose file where all forms of words are listed
>for the sake of easing searches,
> or b) a compact (like now) file where only a few forms of words are
>listed, and where searches are made difficult because searching for a
>concept implies searching for many different words before finding the
>right one.
>If English had different forms for verbs and their
>associated nouns, I bet some people would have thought a little bit
>more about it *before* writing the gismu list...

Not likely. The keyword would probably have been a standard form of noun or verb as appropriate. Definitions were written to be read and understood by English speakers trying to grasp the word meaning in no more than 2 lines on a screen (or 1 line in text), and not for computer word searches. We presumed that English speakers know when and how to turn a verb into a noun and vice versa, and tended to give alternative forms only when they had different roots, or where connotations might lead to misunderstandings of the meanings.
Not much thought was given to non-English native speakers using the gismu list in lieu of a translation into their native language - we did not have the luxury of thinking so far ahead back then (note that it has taken around 10 years before anyone tried to do more than translate the keywords).

> No, seriously. Of course I could start a translation in whatever
>format suits my needs. However, from a computer hobbyist standpoint, I feel
>that having as many different formats as there are different
>translations is a major mistake. At the very thought of having two versions
>of every Lojban-related program to study in French or English, I feel a
>strong headache coming.

The point is to do the translation in any format you choose and THEN conform that translation to some format. If you have a good French-language gismu list in ANY format, making a LogFlash-compatible version of that list shouldn't be hard. Definitions might need to be tweaked (shortened if you've been wordy), and if you haven't done keywords we would need to add them, but these adjustments are much smaller in scope than the original translation. Note that all the stuff beyond column 160 is totally free format. I adopted conventions to make my computer manipulations of the list easier, but LogFlash ignores that text completely.

> What I want to stress here is the fact that the various lists
>*must* be reformatted to improve the efficiency and simplicity of
>automated tools, be they translation tools, typesetting programs, word
>lookups, and so on.

I don't see how this is so. The difficulty is all on the human end - preparing the files. Computer memory and speed are cheap and hardly challenged by the size of the lists that are being searched for Lojban processing (and if they are, then indexing a file isn't difficult), and any format can be manipulated into any other format by a computer program as part of setup, if the original format is regular.
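To show how mechanical such a conversion is once the format is regular, here is a minimal Python sketch that reads one fixed-width line and re-emits it as tab-separated fields. The field names and column offsets are invented for illustration and are NOT the real gismu-list layout; the only rule taken from the discussion is that text beyond column 160 is free format and is passed through untouched.

```python
# Hypothetical layout: (field name, start, end) as 0-based, end-exclusive slices.
# These offsets are invented for illustration; they are NOT the real gismu-list layout.
FIELDS = [("gismu", 0, 6), ("keyword", 6, 26), ("definition", 26, 160)]
# Everything after column 160 is treated as free-format text (LogFlash ignores it).
FREE_TEXT_START = 160


def parse_record(line):
    """Split one fixed-width line into named fields plus the trailing free-format text."""
    line = line.rstrip("\n")
    record = {name: line[start:end].strip() for name, start, end in FIELDS}
    record["notes"] = line[FREE_TEXT_START:].strip()
    return record


def to_tsv(record):
    """Re-emit a parsed record as one tab-separated line, one possible target format."""
    names = [name for name, _, _ in FIELDS] + ["notes"]
    return "\t".join(record[name] for name in names)


def make_record(values, notes=""):
    """Build a demo line in the hypothetical layout, padding or truncating each field."""
    parts = [values[name][: end - start].ljust(end - start) for name, start, end in FIELDS]
    return "".join(parts) + notes


line = make_record(
    {"gismu": "stura", "keyword": "structure", "definition": "x1 is a structure/arrangement of x2"},
    notes="free-format notes",
)
print(to_tsv(parse_record(line)))
```

Because the trailing free text is split off rather than parsed, whatever personal conventions live in that area survive the conversion unchanged.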
But writing clear and understandable English or French text that defines the words is MUCH more difficult, and cannot be automated.

In the case of the English list, there already are multiple lists, in a sense. Colin Fine came up with a list giving English keywords for each place of each gismu, and Nora used this to generate a "gismu list" of prepositions and case tags for each gismu. That list exists separately from the baselined list (and in fact is not baselined). It also went through at least three complete revisions to reach its current form.

>It also *should* be reformatted for any translation
>(of English words into another natural language) in order to create a
>standardized format readable by a single version of any automated tool.

Like I said: do the translation, and THEN worry about conforming it to some standardized format.

>I have several ideas about what would be the important criteria to
>be considered when choosing a new format for the various lists (the gismu
>list is not the only problem, of course, the current lujvo and cmavo lists
>are no easier to feed into automated tools). These ideas might just be
>complete crap and/or bullshit, but yet I tried to find a consistent
>scheme: while several days ago, when I first started to think seriously
>about translating the gismu list into my native tongue (I do not master English
>enough to master LogFlash), I could not do anything more than translate
>the keywords, because the syntax of the translation field is
>obnoxious;

If you mean the textual definition, it is free-format, human-readable English text, subject only to the field size, and the need for a space at an appropriate place in order to divide it into two lines to fit an 80-column screen.
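A minimal Python sketch of that two-line rule, assuming the break goes at the last space that keeps the first line within the screen width. The function name is hypothetical, and the 80-character default stands in for the full screen; the room actually available to the definition may be narrower if other fields share the line.

```python
def split_for_screen(definition, width=80):
    """Split a definition into at most two display lines of at most `width` characters.

    The break is made at the last space that keeps the first line within `width`;
    a ValueError signals a definition that cannot fit in two lines this way.
    """
    if len(definition) <= width:
        return definition, ""
    cut = definition.rfind(" ", 0, width + 1)
    if cut == -1:
        raise ValueError("no space within the first line's width")
    first, second = definition[:cut], definition[cut + 1:]
    if len(second) > width:
        raise ValueError("definition is too long for two lines")
    return first, second


first, second = split_for_screen("x1 is easy/simple for x2 under conditions x3", width=30)
print(first)   # x1 is easy/simple for x2 under
print(second)  # conditions x3
```

A translator could run a check like this over a whole draft list to flag definitions that need shortening before any LogFlash-compatible file is built.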
Any other syntax conventions you care to devise are your prerogative - there is no real standard (and the cmavo list definitions, as you will surely note, have much more severe syntax problems - problems I have never figured out how to resolve for the English dictionary short of rewriting the list).

>now with those several ideas, I can already think about
>having standardized tools, more complex translation capabilities and so
>on, both for French AND English versions of the list. Ask for further
>details.

Consider yourself asked, since I cannot see what you find missing without much more detail. Indeed, a few sample gismu entries in French or English would be best of all. I can't read French, but Nora has some rusty skill.

>However, I cannot, and do not want to start working on anything
>before further comments from other people: I want to know whether there
>are other people interested in a standard format or if it is actually
>preferable to translate in whatever format suits the new language's
>purposes. Working blind and in fear of redoing everything
>someday (like what will be needed for the English version at one point)
>just sinks my will completely.

The English gismu list WON'T be redone, whether or not that is needed for computer applications. That is what the baseline means. Any other format that is generated will not be the baselined gismu list. Similarly, while many people use the EBNF grammar, it is the YACC grammar that is baselined, and the EBNF is merely an alternate format that hopefully is identical to the baseline in meaning.

The English cmavo list definitions will have to be rewritten for the dictionary, and thus translating them is a riskier effort. But by the nature of cmavo, it is unlikely that a non-English definition of the words could be formed by merely translating the English, anyway. It is not clear whether the rewritten dictionary definitions will be plowed back into the "cmavo list" that is currently used as the baseline and in LogFlash 3.
As for fear of redoing - if you have translated the list into good French ignoring formatting considerations, then the redoing will be from the French translation, and not from any reformatting of the English. That redoing SHOULD be just an editing job.

On the other hand, I have to note that the current English gismu list has been polished by at least a dozen complete editorial passes through the entire list, rewriting and standardizing styles, wording and format, which took place intermittently over some 6 years. And yes, I had several "sinkings of will" when I faced yet another pass through the gismu list looking to make certain kinds of changes. To get a polished French list with everything needed for all manner of applications will likely need at least as many passes, and it almost certainly is a job beyond the capability of any one person. So again, you are faced with the need to have a minimally useful French list sooner, saving the polished multi-application list for some future date. Dictionary/lexicon work is extremely time consuming and in some ways mind-numbingly depressing because there is always more work that could be done.

Let us see what you have in mind, and we can comment; then you can decide what you will do, and do it. Do not worry about whether it will need revisions; it will. But if you have made a good effort, then any revisions will be editorial rather than starting over from the English, and in the meantime, French Lojbanists will have a word list that they currently do not have.

lojbab

-- 
lojbab lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273
Artificial language Loglan/Lojban: http://www.lojban.org