Re: [lojban] le stura be la gihuste

To: Lojban List <lojban@egroups.com>
Subject: Re: [lojban] le stura be la gihuste
From: "Bob LeChevalier (lojbab)" <lojbab@lojban.org>
Date: Thu, 24 Aug 2000 10:23:56 -0400
In-reply-to: <Pine.LNX.4.21.0008231614470.3775-100000@burp.n>
References: <4.2.2.20000823084322.00a24cb0@127.0.0.1>

At 05:28 PM 08/23/2000 +0200, Elrond wrote:

> I don't see how this is so.  The difficulty is all on the human end -
> preparing the files.  Computer memory and speed are cheap and hardly
> challenged by the size of the lists that are being searched for Lojban
> processing (and if they are, then indexing a file isn't difficult), and any
> format can be manipulated into any other format by a computer program as
> part of setup, if the original format is regular.

Setting aside how difficult any translation work is (that is for me to
evaluate, and obviously not open to discussion),
there is still *much* work to be done on the lists' format before anyone
can write *SIMPLE* yet *efficient* programs that can, for example, convert
Lojban text to correct English. This work includes *adding* fields,
keywords, prepositions, connect words, grammar information, and so on, to
the gismu, cmavo and lujvo lists.

OK, I understand and agree these added fields may be necessary. But they are not (as yet) planned to be part of the baselined list. Currently most of the stuff, to the extent that it has been devised, exists in separate lists, and not in a single file.

Once this work is done, meaning that the various lists, considered as
*computer files*, are rewritten so that parsing/reading them is easy,
precise, and unambiguous from a *computer program*'s standpoint, the
material for an easier "field filling" translation of the lists
themselves will be in place.

I guess what you are saying is that it would be easiest to translate all of the field information for a single word at one time, and not to require multiple passes through multiple files. If so, I can understand this.

This is what I meant. I am not considering changing the gismu list, only
modifying the structure of the files that contain it.

> >now with those several ideas, I can already think about
> >having standardized tools, more complex translation capabilities and so
> >on, both for French AND English versions of the list. Ask for further
> >details.
>
> Consider yourself asked
li'o

> Like I said: do the translation, and THEN worry about conforming it to some
> standardized format.

        I do not want this. I want to think about and devise a format that
will make it easy to modify/revise the translated list *before* starting
the translation. Having used computer material that was created from
the ground up without any structural design, and knowing how difficult
it is to "patch", or even *use* (in an automated fashion), a file with a
bogus/random structure, I cannot just impose the same thing on a
newly written computer entity. Look, this is just what everyone teaches
database and program writers: write with regular patterns and document
them, for the sake of maintainability!

The question is whether we know the regular patterns that we shall eventually want/need, and furthermore whether it is possible to write them with regular patterns and end up with consistency. More on this below.

        I think I should now lay out the ideas I have been thinking about.

My goals were the following:

* The files should be formatted in plain text, with (preferably) only
ASCII characters, or, when the character set is different, have it
specified at the beginning (in a "header") in a standard fashion.

OK. Obviously we have to do different things when working with Cyrillic for Russian. The first line of the gismu list serves as such a header now.
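A minimal sketch of the header idea, in Python. It assumes a hypothetical first-line convention "charset=NAME" (the actual first line of the gismu list is not in this form), with plain ASCII assumed when no such line is present; a Russian list could then declare an encoding such as koi8-r.

```python
# Sketch only: "charset=NAME" on the first line is a hypothetical header
# convention, not the actual header of the gismu list.
from typing import List


def read_list(raw: bytes) -> List[str]:
    """Decode a word-list file, honoring a charset declared on line one."""
    first, _, rest = raw.partition(b"\n")
    header = first.decode("ascii", errors="replace").strip()
    if header.lower().startswith("charset="):
        encoding = header.split("=", 1)[1].strip()
        body = rest
    else:
        encoding = "ascii"  # no header: plain ASCII, per the stated goal
        body = raw          # the first line is data, not a header
    return body.decode(encoding).splitlines()
```

With this, an ASCII list needs no header at all, and only lists in other character sets pay the cost of one extra line.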

* Fields should not be fixed-width, not because that wastes computer
space, but because any fixed width often proves too narrow later on.
Instead, fields should be separated by "control" characters/tags,
preferably tabs or newlines, so that
   1) it is easily parsable (main goal)
   2) it formats nicely on its own when displayed on a standard
computer display.

I am with you up until the last point. I understand that Unix-based tools use control-character delimiters for fields, but I have seldom seen DOS/Windows programs do so. As such, while what you produced below is quite readable on a standard display, as a database display, it is hard to use because I cannot easily match fields from multiple words.

I think this is perhaps the main difference between the two approaches. When translating, you are focusing on translating all material for a single word at one time. On the other hand, when creating such lists, we found it essential to be able to look at the same fields for large numbers of words at one time, which allows me to sort on the fly on any of the fields, and to compare the same fields of all words of certain types; to the extent that the place structure definitions are consistent, it has been because we were able to do such multi-word comparisons easily in a single page display. This necessitates either fixed length fields, or perhaps a spreadsheet-style database (I haven't used spreadsheets in years, so I am not sure of the state of the art in user-friendliness of display, but I am thinking of Excel, which Nora has used on occasion), which does allow for variable length fields (but I have no idea how the data is stored internally).
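To make the two positions concrete, here is a rough Python sketch (the field names and column widths are hypothetical): records are stored one per line with tab-separated fields, as Elrond proposes, and a small tool then sorts them on any field and prints fixed-width columns, giving the single-page, multi-word view lojbab describes. Long values are truncated only in the display, not in the stored data.

```python
# Sketch with hypothetical field names and widths: tab-separated storage,
# fixed-width columns only for display.
FIELDS = ("word", "rafsi", "keyword")


def parse_line(line):
    """Split one tab-delimited record into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}: {line!r}")
    return dict(zip(FIELDS, values))


def column_view(records, sort_field, widths):
    """Sort records on any one field and lay them out in fixed-width columns."""
    rows = []
    for rec in sorted(records, key=lambda r: r[sort_field]):
        # Truncate and pad each value for display; the records are unchanged.
        cells = [rec[f][:w].ljust(w) for f, w in zip(FIELDS, widths)]
        rows.append(" ".join(cells).rstrip())
    return "\n".join(rows)


lines = ["gapru\tgap\tabove", "betfu\tbef be'u\tabdomen", "kakne\tka'e\table"]
print(column_view([parse_line(ln) for ln in lines], "keyword", (6, 9, 8)))
# betfu  bef be'u  abdomen
# kakne  ka'e      able
# gapru  gap       above
```

Sorting on "word" or "rafsi" instead is just a different argument; the stored file never has to change width when a longer entry turns up.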

We have never had a qualm about adding a field later, when it was shown to be needed, and I don't think we have the design competence at this point to be certain what fields would be needed for a variety of tools, or even for a variety of natural languages. It seems that you want to add all the fields now and do the translation for each word for all fields, but practically speaking, I don't think that we can reasonably produce even the English form of such a list with added fields for grammatical information any time soon.

* In addition to the "unique" keyword for each Lojban word, place
keywords should be added.


That currently exists as a separate file.

The grammar class of the sumti fitting in each place
should, when needed, be specified clearly and separately from the
translation.

It is a matter of design principle that in Lojban, all sumti have the same grammar class. I understand that in natural languages, the corresponding places of the corresponding predicates will often have grammatical restrictions, but I think these restrictions differ with different languages, and perhaps even with the choice of word used to translate the Lojban.

* The translation should be done in two parts: the first part translating
what fits in each place (creating these "place keywords"), and the other
part specifying the relationship stated by the gismu between the places,
together with all the connection words needed for translating from Lojban
into another language.

I don't think this is simple. I think a quality translation tool will sometimes need more than one template of relationship and connection words. But I think the data exists more or less for English for the parser/glosser, just not in the master gismu file.

This is about the gismu, lujvo, and possibly fu'ivla lists (I have not
yet thought much about the cmavo lists, though I am considering doing
so soon).

We cannot even get people to do keywords for the lujvo and fu'ivla lists. The dikyjvo place structure analysis per the Book is immensely time-consuming.

In short, generating data for each of these fields of information is a major project in itself. We have generally done this by getting one field/data-category filled in for several hundred words at a time, then verifying it for consistency in style and content over all the words, before going on to a new word. You are in effect proposing that we add several more fields to each word, and the easy answer is that it will likely be years before they are done for the lujvo and fu'ivla even if we keep it real simple.

As for the implementation of these ideas, the best thing would be an
XML-like database format. Unfortunately, I do not (yet) know much about
XML parsers and therefore did not bother working on an appropriate set of
XML tags.

I have no idea even what XML is, and the only kind of parser I know anything about is a YACC parser (and not much even then).
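For readers who, like lojbab, have not met XML: it wraps each field in a named tag, so a program can pick out a field by name rather than by column position. A short sketch using Python's standard xml.etree.ElementTree module; the tag names gismu, rafsi and place are hypothetical, and no particular schema is implied.

```python
import xml.etree.ElementTree as ET


# Hypothetical tag names (gismu, rafsi, place); no schema is implied.
def record_to_xml(word, rafsi, places):
    """Serialize one word-list record with a named tag for every field."""
    rec = ET.Element("gismu", word=word)
    for r in rafsi:
        ET.SubElement(rec, "rafsi").text = r
    for number, keyword in sorted(places.items()):
        ET.SubElement(rec, "place", n=str(number)).text = keyword
    return ET.tostring(rec, encoding="unicode")


xml_text = record_to_xml("betfu", ["bef", "be'u"], {1: "abdomen", 2: "body"})
# A program can then read any field back by its tag, e.g. the x2 keyword:
root = ET.fromstring(xml_text)
print(root.find("place[@n='2']").text)  # prints: body
```

Adding a new field later means adding a new tag; existing readers that look fields up by tag simply ignore it.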

Instead, I tried a simplified syntax to first format, and then
translate, the first few gismu. Here is what I got; an explanation
follows:

betfu (bef, be'u): "abdomen", "belly"
        1: "abdomen", "belly", "lower trunk" \
           [body part; \
            metaphor: midsection; \
            also: "stomach" (= djaruntyrango); \
            also: "digestive tract" (= befctirango, befctirangyci'e)]
        2: "body"
        r 1* : $1 is [!:an] [2:a/the] abdomen [2:of body $2]
        r 2* : $2 has [!:an] [1:for] abdomen [1:$1]
        related: cutne; livga; canti; djaruntyrango; befctirango; \
                befctirangyci'e

kakne (ka'e): "able", "can"
        1: "able", "capable" [also: "talentuous"] (1)

I don't recognize "talentuous" as a valid English word, and based on its roots, would not associate it with mere ability. The use of "talent" in the current gismu definition is out in the related information area, and makes sense only if one immediately contrasts it with stati, which is the normal word used to refer to talent.

        2 (event, state): "ability", "capacity"
        3 (event, state): "cond. of ability"
        r 1* : $1 is/are able [2:to do/be $2] [3: under cond. $3]
        r 2* : $2 is/are ability [1:of $1] [3: under cond. $3]
        r 3* : $3 is/are cond. of ability [1:of $1] [2:to do/be $2]
        n 1 : also: "has talent", "know how to"
        n 2 : also: "know how to use" (= plika'e)
        related: stati; certu; gasnu [in the time-free potential sense]; \
                djuno; zifre; plika'e; ka'e; nu'o; pu'i

gapru (gap): "above", "up"
        1: "thing directly above", "thing vertically above", \
           "thing upwards"
        2: "origin of above-ness"
        3: "frame of ref.", "gravity of ref."
        r 1* : $1 is/are above [2:$2] [3:in frame of ref. $3]
        r 2, 23 : $2 has s/g above it [3:in frame of ref. $3]
        r 21* : $2 has, above it, $1 [3: in frame of ref. $3]
        r 3+ : $3 is frame of ref. [*:in which !]
        r 3 : $3 is frame of ref. in which s/g is above s/g else
        related: tsani; galtu; cnita; drudi; gacri; dizlo; farna

This is very good, almost ideal, for what I originally had in mind for the dictionary form of the gismu list (but which I've concluded cannot be produced in any reasonable amount of time). But I think it fails as a computer tool. The keywords listed for the places contain a lot of human information, but the computer program needs a single keyword, and not a choice.

The "also:" information in x1 of betfu is "related" info - indeed all of the stuff in square brackets in the regular gismu list is "related" stuff. Related stuff is important for a human translation (but is likely to be very natlang-specific), and yet is not useful for computer application.
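Since the draft syntax is not documented, the following Python reader is only a guess at it, inferred from the sample records: a header line "word (rafsi): keywords", numbered place lines with an optional grammar class such as "(event, state)", "r" relationship lines, "n" note lines, a "related:" list split on semicolons, and a trailing backslash meaning that the line continues. Its output also illustrates lojbab's point: each place holds a free-text string of alternatives, not a single keyword a program could use directly.

```python
import re

# Guessed line types of the draft record syntax (an assumption, not a spec).
HEADER = re.compile(r"^(\S+)\s*(?:\(([^)]*)\))?\s*:\s*(.*)$")  # word (rafsi): keywords
PLACE = re.compile(r"^(\d+)\s*(?:\(([^)]*)\))?\s*:\s*(.*)$")   # 2 (event, state): ...
RELATION = re.compile(r"^r\s+([^:]+?)\s*:\s*(.*)$")            # r 21* : ...
NOTE = re.compile(r"^n\s+(\d+)\s*:\s*(.*)$")                   # n 1 : ...


def logical_lines(text):
    """Yield non-empty lines, joining a line ending in '\\' to the next one."""
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            pending += line[:-1].strip() + " "
            continue
        line = (pending + line).strip()
        pending = ""
        if line:
            yield line
    if pending.strip():
        yield pending.strip()


def parse_record(text):
    """Parse one record into a dict of header, places, relations, notes."""
    lines = list(logical_lines(text))
    word, rafsi, keywords = HEADER.match(lines[0]).groups()
    rec = {"word": word, "keywords": keywords,
           "rafsi": [r.strip() for r in (rafsi or "").split(",") if r.strip()],
           "places": {}, "relations": {}, "notes": {}, "related": []}
    for line in lines[1:]:
        if line.startswith("related:"):
            items = line[len("related:"):].split(";")
            rec["related"] = [w.strip() for w in items if w.strip()]
            continue
        m = RELATION.match(line)
        if m:
            rec["relations"][m.group(1)] = m.group(2)
            continue
        m = NOTE.match(line)
        if m:
            rec["notes"].setdefault(int(m.group(1)), []).append(m.group(2))
            continue
        m = PLACE.match(line)
        if m:
            # Stored as (grammar class or None, free-text place keywords).
            rec["places"][int(m.group(1))] = (m.group(2), m.group(3))
            continue
        raise ValueError(f"unrecognized line: {line!r}")
    return rec


rec = parse_record('gapru (gap): "above", "up"\n        2: "origin of above-ness"\n')
print(rec["places"][2])  # (None, '"origin of above-ness"')
```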

Enough for now.

Yes, that is a sufficient sample. I think others can and should comment, and I will have Nora look (she may very well disagree with me). Comments from others such as R Curnow, who have done computer tools based on the existing list, would be especially informative.

Of course this bit might seem much less clear and/or
obvious than a straight translation as in the current list. However, this
"format" makes it easy to *convert* it, for example, to the current format,
and thus to print the whole list in a more understandable way, all with a
*single* tool.

Unix people talk about such tools. DOS/Windows people tend to use screen editors and not tools, and indeed seldom think in terms of a tool to do what you describe. As such, if I wanted to do anything with your list in a different format, it would be a severe pain for me to convert it to anything else.

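Such a syntax allows easy translation both of

   da de di gapru
into
   "da" (ent. au-dessus) est au-dessus de "de" (obj. surplombé) dans le
réf. "di" (référentiel)
   [in English: "da" (entity above) is above "de" (overhung object) in
the frame of ref. "di" (frame of reference)]

and
   da de di te gapru
into
   "da" (référentiel) est réf. dans lequel "di" (ent. au-dessus) est
au-dessus de "de" (obj. surplombé)
   [in English: "da" (frame of reference) is the frame of ref. in which
"di" (entity above) is above "de" (overhung object)]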

The process, while not immediately obvious, follows intuitively from the
information provided in each gismu record.


I'll have to believe you on this.
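One possible reading of the process, sketched in Python under loudly stated assumptions: "$n" stands for the sumti in place n, "[n:text]" is kept only when place n is filled, and the other bracket forms (such as "[!:an]") are left untouched because their meaning is not explained. The conversion step uses the standard Lojban rule that se, te, ve and xe swap x1 with x2, x3, x4 and x5 respectively.

```python
import re

# Assumptions: "[n:text]" means "include text only when place n is filled"
# and "$n" is replaced by the sumti in place n. Other bracket forms in the
# draft, such as [!:an] or [*:in which !], are undocumented and left as-is.
OPTIONAL = re.compile(r"\[(\d+):([^\]]*)\]")
PLACE_REF = re.compile(r"\$(\d+)")

# Lojban conversion cmavo: se, te, ve, xe swap x1 with x2, x3, x4, x5.
CONVERSIONS = {"se": 2, "te": 3, "ve": 4, "xe": 5}


def places_for(sumti, conversion=None):
    """Number the sumti x1, x2, ... and apply an optional conversion."""
    places = {i + 1: s for i, s in enumerate(sumti)}
    if conversion:
        k = CONVERSIONS[conversion]
        places[1], places[k] = places.get(k), places.get(1)
    return {n: s for n, s in places.items() if s is not None}


def fill(template, places):
    """Render one relationship template for the given filled places."""
    def keep_if_filled(m):
        return m.group(2) if int(m.group(1)) in places else ""
    text = OPTIONAL.sub(keep_if_filled, template)
    text = PLACE_REF.sub(lambda m: places.get(int(m.group(1)), "something"), text)
    return " ".join(text.split())


T = "$1 is/are above [2:$2] [3:in frame of ref. $3]"  # gapru's "r 1*" line
print(fill(T, places_for(["da", "de", "di"])))        # da is/are above de in frame of ref. di
print(fill(T, places_for(["da", "de", "di"], "te")))  # di is/are above de in frame of ref. da
```

Elrond's French rendering of "te gapru" keeps "da" first by using a template that starts with the frame of reference; how a tool would choose among the "r" lines is not specified in the draft, and is not shown here.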

        As you may have noticed, I tried to use the "two-part
translation" pattern I described earlier.

The first part of each record gives the place keywords, with notes for
each particular place.
The second part gives the different possible foreign-language translations
of the *relationship* itself, one for each useful place structure. For
example purposes, I tried to specify every relevant relationship
translation for each gismu; of course it is possible to specify only one
and complete the rest later.
The last part gives notes and related information for the gismu record as
a whole.

This part sounds very useful as an additional template. But it is going to be language specific which place structures are "useful" or which have a translation, so I question that this needs to be done in English and then translated to French - I am sure there are idiomatic French phrasings for some gismu that there is no corresponding English for, and vice versa.

I won't include a detailed explanation of each piece of syntax here, even
though the bracket notation, for example, is truly not obvious. This was a
draft idea.

This would have been useful. It is not clear what syntax information you are trying to communicate with the various symbols and codes.

I also do know that these choices are suboptimal -- I feel like there
would be much less to write if any modification/translation of the list
could be written as a "patch" to a previous version. This is why tagged
syntax is nice, btw.

I do not know the conventions of syntax tagging, and had the impression that there are numerous ways to do it, all mutually incompatible.
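As an illustration of the "patch" idea: once every field carries a tag, an update such as a French translation can be shipped as a list of (word, field tag, new value) triples against a given version of the list. The triple format and the field tag names in this Python sketch are hypothetical.

```python
# Hypothetical patch format: (word, field tag, new value) triples.
def apply_patch(records, patch):
    """Return a copy of records with each field-level change applied."""
    updated = {word: dict(fields) for word, fields in records.items()}
    for word, field, value in patch:
        if word not in updated:
            raise KeyError(f"patch refers to unknown word {word!r}")
        updated[word][field] = value
    return updated


base = {"kakne": {"keyword": "able"}}
new = apply_patch(base, [("kakne", "keyword.fr", "capable")])
```

The base records are copied rather than modified, so the version the patch was written against stays available for comparison.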

So, what do you think of it? Does it suggest any further ideas?

I'll let others tackle improvements. From my perspective, just producing the English files would take man-weeks of effort, and we have no one to spare for it; the result would then be used without being checked by the years of multiple reviewers and proofreading that have gone into the existing gismu list. And we get no French gismu list until the English is complete. This merely heightens the existing dependence of Lojban on its English roots, when I think the goal of translation into other languages is as much as possible to cut Lojban loose from English.

The tradeoff is that in cutting loose from English, the solidity of the baseline is weakened - will a French Lojbanist working from a French translation of the gismu list communicate well with a Russian Lojbanist working from a Russian translation, with neither resorting to the English lists? We cannot know until we have those translations.

I am more hungry now for the tools that people need to learn Lojban in the different languages, and less focused on the tools that might be needed for computer translation applications.

Thanks for your attention


You definitely have my attention.

If I sound negative, it is not that your ideas are bad, but rather that I think the job is too big for the people we have and their low levels of time-availability, and ill-suited for the diverse methods different volunteers will use in working on it on different computer platforms using different software. (Nora, for example, does all of her word list work on paper while riding on the subway, ideally entering it later into the computer; she doesn't use any standard format for her notes and comments, so only she can enter them into the machine, and she usually does that as straight unformatted text.)


lojbab
--
lojbab                                             lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA                    703-385-0273
Artificial language Loglan/Lojban:                 http://www.lojban.org
