[lojban-beginners] Re: welcome and question about brivla recognizing
On Sunday 20 July 2008 03:54:55 Mateusz Grotek wrote:
> Hello.
> I'm new to Lojban and recently started reading CLL, but I have some questions
> about recognizing brivla in a speech stream.
> What is the exact algorithm for doing it? I tried to create one, but it
> looks like I have to count letters before the stress, which I don't want to do.
> Is it really needed? (Because of something called "tosmabru
> failure" in the book.) And point 5b) in the draft looks somehow wrong to me, but
> maybe it's my fault. Could you explain it to me, please?
I found a bug in the algorithm, though I don't remember what it was. I modified
the algorithm, implemented it (the implementation is called valfendi), and added
support for fu'ivla rafsi. The following is the part of it that resolves brivla:
----
3.  Pick the first piece that has not been resolved.
li'o
  E.  If the piece contains a consonant followed two letters later, not
      counting apostrophes and commas, by a vowel, and there is no 'y' after
      the letter between the consonant and the vowel, then there is a (possibly
      invalid) brivla in the piece.
    I.  Make a copy of the string, decapitalize all consonants, remove all
        commas adjacent to consonants, and insert commas before consonant
        clusters, between adjacent nondiphthong vowels, and after each pair
        of vowels without a comma between them.
    II. If the stress option is set and no vowel in the piece is stressed,
        stress the vowel in the next-to-last syllable not counting syllables
        which have 'y' in them.
    III.Capitalize all letters in all syllables which have at least one capital
        letter in them.
    IV. Scan forward for a stressed vowel other than 'Y' after or at the first
        CC or CyC consonant cluster, then scan forward to the end of the next
        syllable, ignoring syllables with 'y' in them. If the next syllable is
        itself stressed, reset the count.
    V.  If you reached the end of the word looking for a stressed vowel or the
        next syllable, resolve the piece as an error. If the next syllable
        begins with a non-initial consonant cluster, a vowel, or an apostrophe,
        go back to IV and keep looking. If the next syllable begins with a
        valid consonant cluster or single consonant, break before it and
        consider the first part, which is a brivlavau. If there is no next
        syllable, the whole piece is a brivlavau.
    VI. If the piece does not have a CC or CyC consonant cluster in the first
        five letters, not counting apostrophe, comma, or 'y', find the first
        consonant after the first letter and break before it.
    VI. If the piece has a consonant cluster in the first five letters, find
        the first CC or CyC consonant cluster and check whether it is a valid
        initial cluster (if it contains 'y' it is not) and whether the part
        beginning there is a slinku'i (see below) or monosyllabic. If the CyC
        contains a stressed 'Y', it is not a consonant cluster; thus
        /ledYcIlta/ is {le dy cilta} "D's thread", but /ledycIlta/ is
        {ledycilta} "hypha". If the part contains a diphthong and no other
        vowels, it is monosyllabic even if there is a comma in the diphthong,
        e.g. {pru,a}.
      a.  If the brivlavau begins with a consonant cluster, it is a valid
          initial cluster, and the brivlavau is not a slinku'i and is not
          monosyllabic, resolve it as a brivla.
      b.  If the brivlavau begins with a consonant cluster but the cluster is
          not a valid initial cluster or the brivlavau is a slinku'i or
          monosyllabic, resolve it as an error.
      c.  If the brivlavau does not begin with a consonant cluster, the cluster
          is a valid initial cluster, and the part beginning there is not a
          slinku'i and is not monosyllabic, break before the consonant cluster
          and resolve the second part as a brivla.
      d.  If the brivlavau does not begin with a consonant cluster and the
          cluster is not a valid initial cluster or the part beginning there
          is a slinku'i, resolve the brivlavau as a brivla.
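
As a rough illustration of steps II and III only, here is a minimal Python
sketch. It is not taken from valfendi. It assumes the piece has already been
split into syllables with commas (step I is not reproduced), that an uppercase
vowel, including 'Y', marks stress, and that the function name and the sample
input are hypothetical.

```python
VOWELS = "aeiouy"  # assumption: 'y' counts as a vowel when checking for stress


def apply_default_stress(piece: str) -> str:
    """Sketch of steps II and III on a comma-separated piece."""
    syllables = piece.split(",")

    # Step II: only add stress if no vowel is already marked as stressed.
    already_stressed = any(
        ch.isupper() and ch.lower() in VOWELS for ch in piece
    )
    if not already_stressed:
        # Syllables containing 'y' are not counted when looking for
        # the next-to-last syllable.
        countable = [i for i, s in enumerate(syllables) if "y" not in s.lower()]
        if len(countable) >= 2:
            target = countable[-2]
            syllables[target] = syllables[target].upper()

    # Step III: any syllable with a capital letter is capitalized entirely.
    syllables = [
        s.upper() if any(c.isupper() for c in s) else s for s in syllables
    ]
    return ",".join(syllables)


print(apply_default_stress("le,dy,cil,ta"))  # le,dy,CIL,ta
```

With the hypothetical input "le,dy,cil,ta", the 'y' syllable is skipped, so
the stress lands on "cil".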
A slinku'i, as far as word breaking is concerned, is anything that matches
the regex
^C[raf3]*([gim]?$|[raf4]?y)
but does not match the regex
^[raf3]*([gim]?$|[raf4]?y)
where
C matches any consonant
[raf3] matches any 3-letter rafsi, meaningful or not (any CCV where CC is a
valid initial pair, CVC, or CVV where the VV is a diphthong allowed in lujvo)
[raf4] matches any 4-letter rafsi, meaningful or not (any CCVC where CC is a
valid initial pair, or CVCC for any CC)
[gim] matches any gismu, meaningful or not ([raf4]V).
Anything after the first 'y' is ignored. It has no effect on where to break the
word, only on whether the word is valid.
----
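
The slinku'i test above can be sketched with Python's re module. This is only
an illustration, not valfendi's code. It assumes the 48 valid initial pairs
listed in CLL and takes ai, au, ei, oi as the diphthongs allowed in lujvo. It
follows the [raf3] definition literally, so rafsi of the form CV'V are not
recognized. The function name is made up.

```python
import re

C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"

# Assumption: the valid initial consonant pairs as listed in CLL.
INITIAL = "(?:" + "|".join(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv "
    "kl kr ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr "
    "zb zd zg zm zv".split()
) + ")"

# Assumption: the diphthongs allowed in lujvo.
DIPHTHONG = "(?:ai|au|ei|oi)"

RAF3 = f"(?:{INITIAL}{V}|{C}{V}{C}|{C}{DIPHTHONG})"  # CCV, CVC, CVV
RAF4 = f"(?:{INITIAL}{V}{C}|{C}{V}{C}{C})"           # CCVC, CVCC
GIM = f"(?:{RAF4}{V})"                               # any gismu shape
TAIL = f"(?:{GIM}?$|{RAF4}?y)"

WITH_C = re.compile(f"^{C}{RAF3}*{TAIL}")
WITHOUT_C = re.compile(f"^{RAF3}*{TAIL}")


def is_slinkuhi(part: str) -> bool:
    """True if the part matches the first regex but not the second."""
    return bool(WITH_C.match(part)) and not WITHOUT_C.match(part)


print(is_slinkuhi("slinkai"))  # True: s + lin + kai, but no parse without the s
print(is_slinkuhi("smabru"))   # False: also parses as sma + bru
```

Neither pattern is anchored after 'y', which matches the remark that anything
after the first 'y' is ignored.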
The tosmabru is handled by rule 3.E.VI.c. It is possible to find arbitrarily
long sequences in which changing one of the last five letters changes it from
brivla to tosmabru, but finding a long one in which both the brivla and the
valsrsmabru (the valsrtosmabru minus the CV cmavo) are sensible is pretty
difficult. So when listening to speech, you'll probably lex such words by
recognizing them, not by counting letters backward from valid or invalid pairs.
Pierre