Date: Wed, 2 Jun 1993 02:59:36 EDT
From: Logical Language Group
Reply-To: Logical Language Group
Sender: Lojban list

06/01/93 Lojban baseline rafsi list part 5 of 5

This list is in the public domain. However, we ask that this header be retained on all distributed copies, so that people have some idea what they are looking at and how to get more information.

For information about the artificial language Loglan/Lojban, please contact The Logical Language Group, Inc. (LLG). We ask that you provide a paper-mail address as well as an email address if appropriate.

The LLG is funded solely by your contributions, which are encouraged for the purpose of defraying our costs (for both electronic and paper distribution).

Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA
703-385-0273
email: lojbab@grebyn.com

THE lujvo-MAKING ALGORITHM

The following is the official algorithm for generating Lojban lujvo (complex brivla, or predicate words), given a known tanru (metaphor) and a complete list of gismu (Lojban primitive roots) and their assigned rafsi (affixes). Note that Lojban does not require use of the optimal, or "best", form of a word. Poetic usage allows any of the valid word forms created by this algorithm to be used under appropriate circumstances.
Given an n-term tanru and the instruction to find the highest-scoring lujvo:

1) For all terms except the final term, look up or generate all of the rafsi (3- and 4-letter forms). Three-letter forms will be of the structure CVC, CCV, CVV, or CV'V (the apostrophe is not counted as a letter in any Lojban rule). A standard gismu list gives the three-letter rafsi for each gismu and for each cmavo with an assigned rafsi. You can also memorize the list. This is not difficult if you use the language much: the set of possible rafsi for each word is limited, and because almost all possible rafsi have an assigned meaning, the more you know, the easier it is to learn the rest by elimination.

   - Given a CCVCV gismu C1C2V1C3V2, the CVC rafsi, if any, will be C1V1C3 or C2V1C3. The CVV/CV'V rafsi, if any, will be C1V1(')V2 or C2V1(')V2. The CCV rafsi, if any, will be C1C2V1. Very few gismu have both a CCV and a CVV/CV'V assigned.

   - Given a CVCCV gismu C1V1C2C3V2, the CVC rafsi, if any, will be C1V1C2. The CVV/CV'V rafsi, if any, will be C1V1(')V2. The CCV rafsi, if any, will be C1C2V2, or rarely, C1C2V1.

   - The rafsi for cmavo are assigned more arbitrarily. A CVV/CV'V form cmavo will often be its own rafsi, but when this isn't possible, the final letter is changed. A single letter, usually an arbitrary consonant, is added to a CV cmavo to make its rafsi.

   - The four-letter rafsi form for any gismu is formed by dropping the final vowel from the gismu (which is then effectively replaced by "y" in the lujvo).

2) For the final term, look up or generate all of the three-letter rafsi, omitting any CVC-form rafsi since a lujvo cannot end in a consonant. Then, for this position only, add in the full gismu itself as a '5-letter rafsi'.

3) Since most cmavo with rafsi have CVC rafsi and none has a 5-letter form, few cmavo can occur in the final position of a tanru used as the basis of a lujvo. cmavo in those positions are rare anyway, the exceptions being PA+MOI numbers.
   If a cmavo in any position has no rafsi, then it cannot be incorporated into the lujvo. Consider rephrasing or using zei to form an 'any-word' compound.

4) Form all of the ordered combinations of these rafsi, one rafsi per corresponding term, ordered in the sequence of their corresponding terms.

5) Audible 'hyphens' may be necessary between some adjacent rafsi to make the word pronounceable, understandable, well-formed, and not prone to breaking up into two or more smaller words. Hyphens are never optional; they are not permitted between rafsi unless they are required. Right-to-left testing is recommended for reasons discussed below:

   a) If there are more than two terms, an initial CVV or CV'V rafsi will fall off and be heard as a separate cmavo. It must therefore be glued on with the letter 'r', which nominally stands in a syllable by itself. For example, sai + zba + ta'u becomes sairzbata'u (syllabized as sai,r,zba,TA'u). If the initial rafsi is a CV'V, the 'r' may be joined onto the second syllable. Thus sa'i + zba + ta'u becomes sa'irzbata'u (syllabized as sa,'ir,zba,TA'u).

      If the first consonant of the second syllable is an 'r', the gluing 'hyphen' must be the letter 'n' instead of 'r', because doubled consonants are not permitted in Lojban. Thus sai + rai + ta'u becomes sainraita'u (syllabized as sai,n,rai,TA'u and NOT sain,rai,TA'u). 'n' is NOT permitted unless the adjacent 'r' forces it.

      If there are exactly two terms, and the initial term is a CVV or CV'V rafsi AND the final term is a 5-letter rafsi, an 'r' hyphen is needed as described above to prevent the initial rafsi from falling off into a separate CVV or CV'V cmavo. As above, an 'n' is used as glue if and only if an 'r' cannot be used. Thus sai + taxfu needs hyphen 'r' to become sairtaxfu (sai,r,TAX,fu). sai + ranji needs hyphen 'n' to become sainranji (sai,n,RAN,ji).
      If there are exactly two terms, and the initial term is a CVV or CV'V rafsi AND the final term is a CVV or CV'V rafsi, an 'r' hyphen is needed, because the lujvo is not well-formed, lacking a consonant cluster, and will fall apart into two CVV or CV'V cmavo. As above, an 'n' is used as glue if and only if an 'r' cannot be used. Thus sai + ta'u needs hyphen 'r' to become sairta'u (sai,r,TA,'u). sai + rai needs hyphen 'n' to become sainrai (SAI,n,rai). Note that a hyphen in a syllable by itself is not counted in determining penultimate stress. However, if it is joined onto a vowel syllable, as when ta'u + sai forms ta'ursai, the vowel syllable is counted and is stressed if penultimate (ta,'UR,sai).

      If there are exactly two terms, and the initial term is a CVV or CV'V rafsi AND the final term is a CCV rafsi, no hyphen is needed, because the lujvo is well-formed, having a consonant cluster, and penultimate stress falls on part of the CVV/CV'V rafsi, preventing it from falling off into a separate word. Thus sai + zba needs no hyphen 'r' to form saizba.

   b) Put y after any 4-letter rafsi form (e.g. zbasysai). Do not count a syllable centered on this hyphen in determining penultimate stress (e.g. ZBAS,y,sai or ZBA,sy,sai).

   c) Put y at any proscribed C/C joint (impermissible medial consonant pair, e.g. nunynau). The rules summarizing proscribed medials follow. The consonant pair is written C1C2; b, d, g, j, v, and z are voiced consonants; c, f, k, p, s, t, and x are unvoiced consonants; and l, m, n, and r are nasal/liquid consonants.

      1. C1 cannot be the same as C2 (e.g. *kk).
      2. If C1 is voiced, then C2 must be either voiced or nasal/liquid. If C1 is unvoiced, then C2 must be either unvoiced or nasal/liquid (e.g. *bf).
      3. C1 and C2 cannot both be among c, j, s, and z (e.g. *cs).
      4. *cx, *kx, *xc, *xk, and *mz are not permitted.

      Do not count a syllable centered on this hyphen in determining penultimate stress (e.g. NUN,y,nau or NU,ny,nau).
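The four pair rules in c) can be sketched as a small Python predicate. This is only an illustration of the rules as stated here, not part of the official algorithm text; the name is_proscribed_pair and the set names are hypothetical, and the consonant-triple rule is not covered.

```python
# Consonant classes as listed in rule c).
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")
SIBILANTS = set("cjsz")                      # rule 3: not both from this set
FORBIDDEN = {"cx", "kx", "xc", "xk", "mz"}   # rule 4: explicit exclusions


def is_proscribed_pair(c1, c2):
    """Return True if the medial pair c1+c2 needs a 'y' hyphen (hypothetical helper)."""
    if c1 == c2:                              # rule 1: no doubled consonants
        return True
    if c1 in VOICED and c2 in UNVOICED:       # rule 2: voicing must agree;
        return True                           # nasal/liquid consonants are unrestricted
    if c1 in UNVOICED and c2 in VOICED:
        return True
    if c1 in SIBILANTS and c2 in SIBILANTS:   # rule 3
        return True
    return (c1 + c2) in FORBIDDEN             # rule 4


# nun + nau: the joint is n/n, so a hyphen is required (nunynau).
print(is_proscribed_pair("n", "n"))   # True
# zba + sai has no C/C joint; a joint such as s/m would be allowed.
print(is_proscribed_pair("s", "m"))   # False
```

A caller would apply this to the last letter of one rafsi and the first letter of the next whenever both are consonants.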
   d) Put y at any proscribed C/CC joint (e.g. nunydji). The rules for proscribed triples are as follows.

      The first two consonants of a consonant triple in a Lojban brivla must satisfy the rules for permissible medial consonant pairs given above. The second pair within the triple must be a permissible initial consonant pair. Since you cannot get a triple in a lujvo unless the latter two consonants are part of a CCV rafsi, testing the first two consonants per c) is sufficient for this part of the test.

      In addition, there are a few triples that meet the above conditions but still cannot be pronounced so as to be easily and uniquely resolvable from other combinations. Hence they are also not permitted, and require a hyphen. These triples are:

         n,dj   n,dz   n,tc   n,ts

      Do not count a syllable centered on this hyphen in determining penultimate stress (e.g. NUN,y,dji or NU,ny,dji).

   e) Test all forms starting with a series of CVC rafsi for "tosmabru failure", which means that the first CV will fall off into a separate cmavo, leaving the rest a valid lujvo. ("*tosmabru" was a trial word that was found to break up in this way, and is used as the archetypal example of an invalid lujvo according to this rule.) This is a tricky rule, but the circumstance is not that common, because the CV falls off only if a valid lujvo remains. The following is a set of simple shortcuts to test for and correct all "tosmabru" situations. (The same situation with an apparent le'avla form remaining does not break up, simply because such forms are forbidden to le'avla. This is the so-called "*slinku'i" rule for le'avla: if you stick a CV cmavo on the front of a le'avla and it forms a valid lujvo, then the le'avla is NOT valid.)

      If a series of rafsi has the pattern 'CVC ... CVC + X', where no 'y' hyphens have been installed between any two of the CVC, there may be a "tosmabru" problem.
      - If X is a CVCCV long rafsi with a permissible initial as the consonant cluster, then even a single CVC rafsi on the front requires a "tosmabru test" (as in tos + mabru, which would break up into to + smabru). You are specifically testing here to ensure that the CV on the front does not fall off, leaving a lujvo composed of a series of CCV rafsi.

      - If X is any rafsi or partial lujvo that causes a y hyphen to be installed between the previous CVC and itself by one of the above rules, and there are at least two CVC rafsi preceding, you must also test for "tosmabru" break-up (as in tos + mab + bai, which would have added a 'y' hyphen between the last two terms, and would break up into to + smabybai, where "smab" is a hypothetical 4-letter rafsi form). You are testing here to avoid the initial CV falling off to leave a lujvo with a spurious CCVC 4-letter rafsi form just before the X component.

      NOTE THAT THE RULES DO NOT DEPEND ON THERE ACTUALLY BEING RAFSI THAT WOULD MAKE THE BROKEN UP WORD POSSIBLE (smab- is not the 4-letter form for any gismu currently assigned, but the rules do not presume that the listener knows which rafsi are real; they are based ONLY on the forms of the words).

      The "tosmabru" test is: Examine all the C/C joints between the CVC rafsi, and between the last CVC and the X term. If ALL of those C/C joints, as well as the CC in X when X is the CVCCV case, are "bridged" by permissible initials (listed in Section III or the back of the gismu list), then the trial word will break up into a cmavo and a shorter brivla. If any C/C joint is unbridged, i.e., is impermissible as an initial CC, the trial word will not break up: it has passed the "tosmabru test". ("tosmaktu" is thus valid, unlike "tosmabru", because "kt" is not a permissible initial.)

      Only the first joint in a trial word needs to be unbridged in order to ensure resolvability. Thus: install y as a hyphen at the first bridged joint if the "tosmabru" test fails (e.g. tosymabru).
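      As a rough illustration, the first case (a run of CVC rafsi followed by a CVCCV gismu) might be coded as below. This is a minimal sketch under stated assumptions, not the official program: the function names are hypothetical, the caller is assumed to have applied every other hyphen rule already and found no 'y' necessary, and the set of permissible initials is the commonly published list of 48 pairs, which should be checked against the gismu list.

```python
# Assumption: the standard list of 48 permissible initial consonant pairs.
PERMISSIBLE_INITIALS = set(
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr "
    "jb jd jg jm jv kl kr ml mr pl pr sf sk sl sm sn sp sr st "
    "tc tr ts vl vr xl xr zb zd zg zm zv".split()
)


def breaks_up(cvc_rafsi, final_gismu):
    """True if CVC ... CVC + CVCCV gismu fails the "tosmabru" test."""
    following = cvc_rafsi[1:] + [final_gismu]
    # C/C joints between successive components.
    joints = [a[-1] + b[0] for a, b in zip(cvc_rafsi, following)]
    # The consonant cluster inside the CVCCV gismu counts as well.
    joints.append(final_gismu[2:4])
    # The word breaks up only if every joint is a permissible initial.
    return all(j in PERMISSIBLE_INITIALS for j in joints)


def join_with_tosmabru_fix(cvc_rafsi, final_gismu):
    """Join the parts, adding 'y' at the first joint if the test fails."""
    parts = cvc_rafsi + [final_gismu]
    if breaks_up(cvc_rafsi, final_gismu):
        return parts[0] + "y" + "".join(parts[1:])
    return "".join(parts)


print(join_with_tosmabru_fix(["tos"], "mabru"))  # tosymabru
print(join_with_tosmabru_fix(["tos"], "maktu"))  # tosmaktu
```

      When the test fails, every joint is bridged, so the first bridged joint is simply the one after the first rafsi. The second case, where X forces a 'y' hyphen, would need the same joint check restricted to the CVC run.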
      The 'lazy Lojbanist' "tosmabru test" is to add a hyphen any time you have a CVC rafsi followed by a CV... of 5 or more letters, where the first C/C joint forms a permissible initial. This is NOT a correct algorithm: it will put in hyphens that are not necessary, resulting in words that are technically invalid. However, for nonce lujvo-making, if an unnecessary hyphen is present, the word can still be successfully and unambiguously analyzed. If a "tosmabru" hyphen is omitted, the word is likely to be incorrectly analyzed.

      Note that the "tosmabru test" requires all hyphens based on other rules to have been determined before conducting the test. This is why this step occurs last.

6) Evaluate all combinations and select the word with the highest score, using some algorithm.

SCORING ALGORITHM

This algorithm was devised by Bob and Nora LeChevalier in 1989. It is not the only permitted algorithm, but it usually gives a choice that people find preferable. This is the algorithm encoded in the lujvo-making program sold by la lojbangirz. The algorithm may be changed in the future. Note that the algorithm basically encodes a hierarchy of priorities, preferring short words (counting an apostrophe as half of a letter), then words with fewer hyphens, then words with fewer syllables and/or more vowels.

Values are attached to various properties of the lujvo. The score is the sum of these values.

1. Count the number of hyphens (h), including 'y', 'r', and 'n'.
2. Count the number of vowels (v), not including 'y'.
3. Count the number of apostrophes (a).
4. Count the total number of characters, including hyphens and apostrophes (l).
5. For each rafsi component, find the value in the following list. Sum these values to get the total (r):

      CVV   (sai)   8        CCVC    (zbas)    4
      CCV   (zba)   7        -CCVCV  (-zbasu)  3
      CV'V  (ta'u)  6        CVCC    (sarj)    2
      CVC   (nun)   5        -CVCCV  (-sarji)  1

The score is then

      32500 - (1000 * l) + (500 * a) - (100 * h) + (10 * r) + v

In case of ties, there is no preference. This should be rare.
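The scoring formula can be sketched in a few lines of Python. This is an illustrative reading of the five counts above, not the program sold by la lojbangirz; the names lujvo_score, shape, and RAFSI_VALUE are hypothetical, and the input is assumed to be a list of components already split into rafsi and single-letter hyphens.

```python
VOWELS = set("aeiou")        # 'y' is deliberately excluded
HYPHENS = {"y", "r", "n"}    # single-letter components treated as hyphens

# Value of each rafsi shape, from the list in step 5.
RAFSI_VALUE = {
    "CVV": 8, "CCV": 7, "CV'V": 6, "CVC": 5,
    "CCVC": 4, "CCVCV": 3, "CVCC": 2, "CVCCV": 1,
}


def shape(rafsi):
    """Map a rafsi to its C/V pattern, keeping the apostrophe."""
    return "".join(
        "'" if ch == "'" else "V" if ch in VOWELS else "C" for ch in rafsi
    )


def lujvo_score(parts):
    """Score a lujvo given as components, e.g. ["nun", "y", "nau"]."""
    word = "".join(parts)
    h = sum(1 for p in parts if p in HYPHENS)        # hyphens
    v = sum(1 for ch in word if ch in VOWELS)        # vowels, not 'y'
    a = word.count("'")                              # apostrophes
    l = len(word)                                    # all characters
    r = sum(RAFSI_VALUE[shape(p)] for p in parts if p not in HYPHENS)
    return 32500 - 1000 * l + 500 * a - 100 * h + 10 * r + v


print(lujvo_score(["zba", "sai"]))                # 26653
print(lujvo_score(["sai", "r", "zba", "ta'u"]))   # 22115
```

Step 6 then amounts to taking the maximum of this score over all candidate forms.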
The following examples use the rafsi:

      CVC   = nun             CCV   = zba
      CVV   = nau, sai        CVCCV = sarji
      CCVC- = zbas-           CV'V  = ta'u

Stress is shown explicitly using capitalization in these examples. Being algorithmic (always penultimate), it does not have to be explicitly shown when these words are actually used.

zba + sai                    ZBAsai
      32500 - (1000 * 6) + (500 * 0) - (100 * 0) + (10 * 15) + 3 = 26653

nun + y + nau                NUNynau
      32500 - (1000 * 7) + (500 * 0) - (100 * 1) + (10 * 13) + 3 = 25533

sai + r + zba + ta'u         sairzbaTA'u
      32500 - (1000 * 11) + (500 * 1) - (100 * 1) + (10 * 21) + 5 = 22115

zba + zbas + y + sarji       zbazbasySARji
      32500 - (1000 * 13) + (500 * 0) - (100 * 1) + (10 * 12) + 4 = 19524