Re: Parsing lujvo




la djordj. cusku di'e

> I'd like to know whether or not a machine parser for lujvo already exists.
> I think it's quite a useful thing to have; it would convert a lujvo into
> its component rafsi, and list the meanings of the rafsi.

Yes, there is, but additional implementations are always useful.

> At present it recursively attempts to extract rafsi from the left hand end
> of the lujvo, each time testing whether or not the remaining word (minus
> any hyphenating letters) is still a valid lujvo.  In doing this, it first
> tries to look up the first five letters in a gismu dictionary; if they're
> present then if no letters remain it returns success (i.e. parsed the
> whole word), otherwise it recurses on the remaining letters.

That's too simple an algorithm, as your counterexample below illustrates.

> A couple of important points present themselves.  Firstly, is it possible
> to resolve an arbitrary lujvo into component rafsi without needing to look
> them up in a dictionary?

Yes, it is always possible.

>   tavta'atavlytavla
>
> it will look up "tavta" as a 5L rafsi, despite the following apostrophe,
> then "tavt" as a 4L rafsi, despite the following `a' (which should be a
> `y', no?), then finally "tav" as a 3L rafsi, which will finally resolve.
> If it could split the word up sensibly to begin with then there would be
> fewer dictionary searches (not that they take long) and it would just be a
> nicer algorithm.

Absolutely no dictionary searching is required: the algorithm works
independently of what rafsi exist or don't exist.  The only
possible analysis is CVC-CV'V-CVCCy-CVCCV, i.e. tav-ta'a-tavly-tavla.
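
To illustrate splitting by letter shape alone, here is a minimal Python
sketch.  It is not the existing parser mentioned above, and the names
(shape, split_lujvo, NONFINAL, FINAL) are invented for this example.  It
assumes 4-letter rafsi always carry a `y' hyphen, 5-letter rafsi occur only
at the end, and a CVC rafsi may take a `y' hyphen; it deliberately ignores
r/n hyphens, consonant-pair validity, and whether any given rafsi is
actually assigned.

```python
VOWELS = set("aeiou")
CONSONANTS = set("bcdfgjklmnprstvxz")

# Shapes allowed before the last rafsi; 4-letter shapes must carry the y hyphen.
NONFINAL = ["CVCCy", "CCVCy", "CVCy", "CVC", "CCV", "CVV", "CV'V"]
# Shapes allowed as the last rafsi (a lujvo must end in a vowel).
FINAL = ["CVCCV", "CCVCV", "CCV", "CVV", "CV'V"]


def shape(word):
    """Map each letter to C or V; keep y and the apostrophe literally."""
    out = []
    for ch in word:
        if ch in VOWELS:
            out.append("V")
        elif ch in CONSONANTS:
            out.append("C")
        else:
            out.append(ch)
    return "".join(out)


def split_lujvo(word):
    """Return every shape-based split of word into two or more rafsi."""
    sh = shape(word)
    results = []

    def walk(pos, parts):
        # Try to finish the word with a final-position shape.
        for pat in FINAL:
            if parts and sh[pos:] == pat:
                results.append(parts + [word[pos:]])
        # Otherwise peel one non-final rafsi (with its hyphen) and recurse.
        for pat in NONFINAL:
            end = pos + len(pat)
            if sh.startswith(pat, pos) and end < len(word):
                walk(end, parts + [word[pos:end]])

    walk(0, [])
    return results


print(split_lujvo("tavta'atavlytavla"))
```

Under these assumptions the example word yields exactly one split,
tav / ta'a / tavly / tavla, and no dictionary is consulted at any point.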

> Secondly, and partially mentioned above, the reference grammar describes
> lujvo creation twice; the first time it is very general, and the second
> time it is more strict.  Specifically, the second time it says that all 4L
> rafsi should be followed by a `y' hyphen.  Is this generally true then?

Yes.

> It seems to me that a CCVC rafsi could be followed by a CVCCV gismu, say,
> provided they fit together, i.e. the last C of the first forms an
> allowable consonant pair with the first C of the last.

No, that is forbidden.

> Thirdly, are there any circumstances in which a 5 letter rafsi (other than
> a rafsi fu'ivla, which I'm not dealing with anyway) can appear other than
> at the end of the word?

No.

> Fourthly, do cmavo count as rafsi?  There seem to be some in the gismu
> list (CV'V form), which struck me as odd; these are cmavo, aren't they,
> not gismu?  I thought gismu were always five letters long, CCVCV or CVCCV.

Some cmavo have rafsi, though most don't.  Sometimes the rafsi of a cmavo
is identical to the cmavo, and sometimes it isn't (typically because it
is CVC, which no cmavo can be).

> And finally, is "ta'a" a three letter rafsi or a four letter rafsi?

Three-letter.

> PS: "The Complete Lojban" is an excellent book -- well worth waiting for.
> I'm very happy to own a copy.

Thank you!

--
John Cowan                                      cowan@ccil.org
                e'osai ko sarji la lojban.