From: Pierre Abbat
To: lojban-beginners@lojban.org
Subject: [lojban-beginners] Re: welcome and question about brivla recognizing
Date: Sun, 20 Jul 2008 09:53:26 -0400

On Sunday 20 July 2008 03:54:55 Mateusz Grotek wrote:
> Hello.
> I'm new to lojban, recently started reading CLL, but have some questions
> about brivla recognition from speech stream.
> What is exact algorithm for doing it? I tried to create one, but it
> looks like i have to count letters before stress, what i don't wanna do.
> Is it really needed? (Because of something what is called "tosmabru
> failure" in book).
> And point 5b) in draft look for me somehow wrong, but
> maybe it's my fault. Could you explain it to me please?

I found a bug in the algorithm; I don't remember what it was. I modified it,
implemented it (called valfendi), and added support for fu'ivla rafsi. The
following is the part of it that resolves brivla:
----
3. Pick the first piece that has not been resolved.
li'o
 E. If the piece contains a consonant followed two letters later, not
    counting apostrophes and commas, by a vowel, and there is no 'y' after
    the letter between the consonant and the vowel, then there is a
    (possibly invalid) brivla in the piece.
  I.   Make a copy of the string, decapitalize all consonants, remove all
       commas adjacent to consonants, and insert commas before consonant
       clusters, between adjacent nondiphthong vowels, and after each pair
       of vowels without a comma between them.
  II.  If the stress option is set and no vowel in the piece is stressed,
       stress the vowel in the next-to-last syllable, not counting
       syllables which have 'y' in them.
  III. Capitalize all letters in all syllables which have at least one
       capital letter in them.
  IV.  Scan forward for a stressed vowel other than 'Y' after or at the
       first CC or CyC consonant cluster, then scan forward to the end of
       the next syllable, ignoring syllables with 'y' in them. If the next
       syllable is itself stressed, reset the count.
  V.   If you reached the end of the word looking for a stressed vowel or
       the next syllable, resolve the piece as an error. If the next
       syllable begins with a non-initial consonant cluster, a vowel, or an
       apostrophe, go back to IV and keep looking. If the next syllable
       begins with a valid consonant cluster or single consonant, break
       before it and consider the first part, which is a brivlavau. If
       there is no next syllable, the whole piece is a brivlavau.
  VI.  If the piece does not have a CC or CyC consonant cluster in the
       first five letters, not counting apostrophe, comma, or 'y', find the
       first consonant after the first letter and break before it.
       If the piece has a consonant cluster in the first five letters, find
       the first CC or CyC consonant cluster and check whether it is a
       valid initial cluster (if it contains 'y' it is not) and whether the
       part beginning there is a slinku'i (see below) or monosyllabic. If
       the CyC contains a stressed 'Y', it is not a consonant cluster; thus
       /ledYcIlta/ is {le dy cilta} "D's thread", but /ledycIlta/ is
       {ledycilta} "hypha". If the part contains a diphthong and no other
       vowels, it is monosyllabic even if there is a comma in the
       diphthong, e.g. {pru,a}.
   a. If the brivlavau begins with a consonant cluster, it is a valid
      initial cluster, and the brivlavau is not a slinku'i and is not
      monosyllabic, resolve it as a brivla.
   b. If the brivlavau begins with a consonant cluster but the cluster is
      not a valid initial cluster or the brivlavau is a slinku'i or
      monosyllabic, resolve it as an error.
   c. If the brivlavau does not begin with a consonant cluster, the cluster
      is a valid initial cluster, and the part beginning there is not a
      slinku'i and is not monosyllabic, break before the consonant cluster
      and resolve the second part as a brivla.
   d. If the brivlavau does not begin with a consonant cluster and the
      cluster is not a valid initial cluster or the part beginning there is
      a slinku'i, resolve the brivlavau as a brivla.
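
The cluster check in the last step can be sketched in Python roughly as
follows. This is not valfendi's code, only an illustration under several
assumptions: the function name first_cluster and its return format are made
up here; a capital letter marks stress, as in the /ledYcIlta/ example; "in
the first five letters" is read as "both consonants of the cluster fall
within the first five letters"; and the list of valid initial pairs is the
standard 48-pair list.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")

# The 48 valid initial consonant pairs (assumed: the standard list).
INITIAL_PAIRS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr "
    "jb jd jg jm jv kl kr ml mr pl pr sf sk sl sm sn sp sr st "
    "tc tr ts vl vr xl xr zb zd zg zm zv".split()
)

def first_cluster(piece):
    """Find the first CC or CyC cluster in a piece (hypothetical helper).

    A lowercase 'y' between two consonants still forms a cluster; a
    stressed (capital) 'Y' does not. Returns None if there is no cluster.
    """
    # (index, char) pairs, skipping apostrophes and commas.
    letters = [(i, ch) for i, ch in enumerate(piece) if ch not in "',"]
    counted = 0  # letters seen so far, not counting 'y' or 'Y'
    for k, (i, a) in enumerate(letters):
        if a.lower() != "y":
            counted += 1
        if a.lower() not in CONSONANTS:
            continue
        rest = letters[k + 1:k + 3]
        if rest and rest[0][1].lower() in CONSONANTS:
            has_y, b = False, rest[0][1]
        elif (len(rest) == 2 and rest[0][1] == "y"
              and rest[1][1].lower() in CONSONANTS):
            has_y, b = True, rest[1][1]
        else:
            continue
        pair = (a + b).lower()
        return {
            "start": i,                          # index of first consonant
            "pair": pair,                        # the two consonants
            "has_y": has_y,                      # True for CyC
            "in_first_five": counted + 1 <= 5,   # second consonant's position
            "valid_initial": not has_y and pair in INITIAL_PAIRS,
        }
    return None

for word in ("ledycIlta", "ledYcIlta", "tosmabru"):
    print(word, first_cluster(word))
```

With unstressed 'y', "ledycIlta" has the cluster d-y-c inside the first five
letters, and it is not a valid initial, which leads to case d. With stressed
'Y', the first cluster found is "lt", outside the first five letters, so the
piece is instead broken after "le".
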
A slinku'i, as far as word breaking is concerned, is anything that matches
the regex
    ^C[raf3]*([gim]?$|[raf4]?y)
but does not match the regex
    ^[raf3]*([gim]?$|[raf4]?y)
where
    C      matches any consonant
    [raf3] matches any 3-letter rafsi, meaningful or not (any CCV where CC
           is a valid initial pair, CVC, or CVV where the VV is a diphthong
           allowed in lujvo)
    [raf4] matches any 4-letter rafsi, meaningful or not (any CCVC where CC
           is a valid initial pair, or CVCC for any CC)
    [gim]  matches any gismu, meaningful or not ([raf4]V).
Anything after the first 'y' is ignored. It has no effect on where to break
the word, only on whether the word is valid.
----

The tosmabru is handled by rule 3.E.VI.c. It is possible to find arbitrarily
long sequences in which changing one of the last five letters changes it
from brivla to tosmabru, but finding a long one in which both the brivla and
the valsrsmabru (the valsrtosmabru minus the CV cmavo) are sensible is
pretty difficult. So when listening to speech, you'll probably lex such
words by recognizing them, not by counting letters backward from valid or
invalid pairs.

Pierre
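
For anyone who wants to experiment with the slinku'i test, the two regexes
can be expanded into ordinary Python regexes along these lines. This is a
sketch rather than valfendi's code: the name is_slinkuhi is made up, the
input is assumed to be lowercase with apostrophes and commas already
removed (so CV'V rafsi are not modelled), the initial pairs are the standard
48, and the lujvo diphthongs are taken to be ai, au, ei, oi.

```python
import re

C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"

# Assumed: the standard 48 valid initial consonant pairs.
CC_INIT = "(?:" + "|".join(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr "
    "jb jd jg jm jv kl kr ml mr pl pr sf sk sl sm sn sp sr st "
    "tc tr ts vl vr xl xr zb zd zg zm zv".split()
) + ")"

# Assumed: the diphthongs allowed in lujvo.
DIPHTHONG = "(?:ai|au|ei|oi)"

RAF3 = f"(?:{CC_INIT}{V}|{C}{V}{C}|{C}{DIPHTHONG})"  # CCV, CVC, CVV
RAF4 = f"(?:{CC_INIT}{V}{C}|{C}{V}{C}{C})"           # CCVC, CVCC
GIM = f"(?:{RAF4}{V})"                               # [raf4]V
TAIL = f"(?:{GIM}?$|{RAF4}?y)"

WITH_C = re.compile(f"^{C}{RAF3}*{TAIL}")
WITHOUT_C = re.compile(f"^{RAF3}*{TAIL}")

def is_slinkuhi(part):
    """True if part matches the first regex but not the second."""
    return bool(WITH_C.match(part)) and not WITHOUT_C.match(part)

for part in ("slinmabru", "slinkymabru", "smabru", "mabru"):
    print(part, is_slinkuhi(part))
```

Here "slinmabru" and "slinkymabru" come out as slinku'i (s + lin + mabru and
s + link + y respectively), while "smabru" does not, because it also splits
as sma + bru and so matches the second regex. Neither unanchoring the tail
after 'y' nor the backtracking over [raf3]* needs special handling; the
regex engine takes care of both.
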