Date: Sun, 11 Sep 2011 22:20:05 +0200
Subject: [lojban] valsi processor
From: Remo Dentato
To: lojban

Some of you may have noticed that I've lately asked many questions about morphology. The reason is that I am writing a tool to analyze lojban text word by word. A tool like this could, for example, generate statistics about a text or augment it. It all started from a thread here on the list where we discussed how to automatically add typographic elements to a text (e.g. when converting it to TeX).

I've written a Lua module based on LPeg, called "jbo", that offers a function jbo.rafske(s). It analyzes the text "s" and invokes a callback function for each word it finds.
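The statistics use case mentioned above can be sketched with the same callback idea. This is only an illustration under stated assumptions: the module's code is not shown here, so a small table named jbo_stub (hypothetical) stands in for require("jbo"), and its rafske() does no morphological analysis at all. It simply replays a hard-coded list of pre-classified words, where the real function takes a text string. The handlers are assumed to be fields named "f" plus the word type (as in jbo.fgismu), and they ignore their arguments, so it does not matter how many the real module passes.

```lua
-- Sketch: count words per category through callbacks.
-- ASSUMPTION: jbo_stub is a hypothetical stand-in for require("jbo").
-- Its rafske() only replays pre-classified words; it is not the real analyzer.
jbo_stub = {}

function jbo_stub.rafske(words)
  for _, w in ipairs(words) do
    -- Look up the handler for this word type, e.g. "fgismu" for a gismu.
    local handler = jbo_stub["f" .. w.kind]
    if handler then handler(w.text) end
  end
end

-- Tally of how many words of each type were reported.
counts = {}

-- Build a handler that increments the tally for one word type.
local function counter(kind)
  return function()
    counts[kind] = (counts[kind] or 0) + 1
  end
end

for _, kind in ipairs({ "cmavo", "cmevla", "gismu", "lujvo", "fuhivla" }) do
  jbo_stub["f" .. kind] = counter(kind)
end

-- Pre-classified words of "mi klama le zarci" (illustrative input only).
jbo_stub.rafske({
  { kind = "cmavo", text = "mi" },
  { kind = "gismu", text = "klama" },
  { kind = "cmavo", text = "le" },
  { kind = "gismu", text = "zarci" },
})

-- Print one line per category that occurred (order is unspecified).
for kind, n in pairs(counts) do
  print(kind, n)
end
```

With the real module, the same counter handlers would presumably be assigned to the jbo table itself before calling jbo.rafske(text).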
For example, the script:

-------------
-- Read a file with lojban text and categorize each word in it
jbo = require("jbo")

function jbo.fcmavo(sel,cma)    print("CMAVO", sel, cma)    end
function jbo.fcmevla(cme)       print("CMEVLA", cme)        end
function jbo.fgismu(gis)        print("GISMU", gis)         end
function jbo.ffuhivla(gis)      print("FUhIVLA", gis)       end
function jbo.flujvo(gis)        print("LUJVO", gis)         end
function jbo.ftosmabru(sel,cma) print("TOSMABRU", sel, cma) end
function jbo.fslinkuhi(s)       print("SLINKUhI", s)        end
function jbo.fnalvla(s)         print("NALVLA", s)          end
function jbo.fcomma(s) end
function jbo.fpause(s) end

text = io.stdin:read("*a")
jbo.rafske(text)
-------------------------

will just print each word preceded by its type. The function jbo.fxxxx is called whenever an element of type xxxx is found.

For those who don't know Lua (http://www.lua.org), it's a very fast scripting language used in major products like "World of Warcraft" or "Adobe Lightroom". LPeg is a Lua module that uses PEG expressions as a pattern matching tool.

As a guide for the module, I've used the "rafske.peg" file by alyn post found in the jbogenturfa'i repository. However, LPeg does not lend itself to a straightforward translation of a generic PEG grammar, so the module is not an exact translation of the peg file (unfortunately).

The module "jbo" is in its very early alpha stage. It *seems* to correctly handle all the words found in jbovlaste, but I'm pretty sure it will fail for some cases I have not considered properly. Handling stress, in particular, might be a weak point. Someone mentioned a large test file but I was not able to find it; I would be very interested in any list of words that would stress the morphology rules.

If anyone is interested, I'll be happy to share the code too, of course. It's just 470 lines of code, including the full list of cmavo (3 per row).

remod