Date: Wed, 20 Apr 2011 10:16:07 -0600
From: ".alyn.post."
To: lojban-beginners@googlegroups.com
Subject: Re: [lojban-beginners] vlastezba: First beta version released!

Ha! I took a look at your parser and I can see how both those mistakes
could be made. :-)

I haven't updated my .jar file; instead I tried to work around the bugs
you report below:

  $ echo "^Mba'e ba'er ba'ercatra broda"|java -jar vlastezba.jar /dev/fd/0
  Read file [/dev/fd/0], got [4] unique words.
  ba'er
  ba'e
  ba'ercatra
  broda

Ok, that seems pretty reasonable. If I remove the spaces between the
first three words, I still expect that run to split into two words:

  $ echo "^Mba'eba'erba'ercatra broda"|java -jar vlastezba.jar /dev/fd/0
  Read file [/dev/fd/0], got [2] unique words.
  ba'eba'erba'ercatra
  broda

That "ba'eba'erba'ercatra" should be two words, "ba'e" and the lujvo
"ba'erba'ercatra".

It also appears I can break things by using '.':

  $ echo "^Mba'e.ba'erba'ercatra broda"|java -jar vlastezba.jar /dev/fd/0
  Read file [/dev/fd/0], got [2] unique words.
  ba'e.ba'erba'ercatra
  broda

There aren't any Lojban words with '.' in them.
This problem is perhaps better demonstrated here:

  $ echo "^Mcoi.ro.do broda"|java -jar vlastezba.jar /dev/fd/0
  lojban.vlastezba.TokenizerFailure: Could not find any cmavo in [coi.ro.do] - last candidate cmavo was [.r], cmavo list is: {coi}
      at lojban.vlastezba.LojbanTokenizer.breakOutCmavo(LojbanTokenizer.java:292)
      at lojban.vlastezba.LojbanTokenizer.getNextWord(LojbanTokenizer.java:182)
      at lojban.vlastezba.LojbanTokenizer.nextWord(LojbanTokenizer.java:473)
      at lojban.vlastezba.GlossaryCreator.loadHashMap(GlossaryCreator.java:128)
      at lojban.vlastezba.GlossaryCreator.createGlossary(GlossaryCreator.java:32)
      at lojban.vlastezba.GlossaryCreator.main(GlossaryCreator.java:183)

Canonically, whitespace in Lojban is all or some of the whitespace
character class in your locale (until we get our own locale, probably
the English whitespace character class) plus the '.' character.
Informally, it often also includes any punctuation other than the
apostrophe ' (and the UTF-8 version of that...) and the comma.

-Alan

On Wed, Apr 20, 2011 at 06:02:09PM +0200, Johan Pretorius wrote:
> To be honest, I sucked that example out of my ear based on how it was
> meant to work. The reason you didn't get that result is that there
> were two bugs in the code:
> - We were ignoring the first line of any file (fixed it)
> - When the last word of the file is a compound cmavo, it gets
>   misparsed and you only end up getting the first cmavo from the
>   cluster; the rest are ignored. This one needs more careful thought.
>
> So, with the new jar file that I'm uploading as I speak, you should be
> able to do what you tried to, just make sure you tack a gismu or
> something to the end of your file, so that you get accurate results.
> This time I tested it before making any wild claims :-)
>
> On Wed, Apr 20, 2011 at 5:22 PM, .alyn.post.
> <alyn.post@lodockikumazvati.org> wrote:
>
> I'm not getting the result you report:
>
>   $ echo "coirodo"|java -jar vlastezba.jar /dev/fd/0
>   Read file [/dev/fd/0], got [0] unique words.
>
> This is also happening if I write the file and try it:
>
>   $ cat test.txt
>   coirodo
>   $ java -jar vlastezba.jar test.txt
>   Read file [test.txt], got [0] unique words.
>
> Here is my java version:
>
>   $ java -version
>   java version "1.6.0_24"
>   Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326)
>   Java HotSpot(TM) Client VM (build 19.1-b02-334, mixed mode)
>
> -Alan
> On Wed, Apr 20, 2011 at 04:51:51PM +0200, Johan Pretorius wrote:
> > Hi Alan,
> >
> > That would indeed be an interesting experiment, I'd be quite keen to
> > see the results myself.
> >
> > Right now, if you just call
> >
> >   java -jar vlastezba.jar test.txt
> >
> > with some Lojban text (legal or otherwise) in test.txt, it will
> > return (on stdout) one valsi per line. So "coirodo" would result in:
> >
> >   coi
> >   ro
> >   do
> >
> > (you can make it go look up the definitions by passing a second
> > parameter, but it will just add junk to the output that I don't
> > think you'd want)
> >
> > Right now it doesn't check grammar at all, so you can throw any
> > random collection of words at it (I don't intend for it to ever
> > check grammar; there are tools out there that are far better at
> > this than I could ever hope to make it).
> >
> > It also won't give you a classification of valsi - it doesn't "know"
> > when it's dealing with a cmavo (or indeed what class), or a gismu,
> > or a lujvo. This I DO intend to fix.
> >
> > I want to add other output formats anyway, so if you want me to do
> > something specific to make your comparison easier, let me know. Now
> > would be a good time, as I'm going away on holiday for a week, and
> > wanted to spend at least a little bit of time on vlastezba.
> >
> > In fact, if you are comfortable with Java, feel free to make it do
> > what you need; the source code is on sourceforge.net
> > (http://sourceforge.net/projects/vlastezba/), and is GPL'ed :-)
> >
> > mu'o mi'e iu'an
> >
> > On Wed, Apr 20, 2011 at 4:29 PM, .alyn.post.
> > <alyn.post@lodockikumazvati.org> wrote:
> >
> > Do you have an external representation for your valsi parsing
> > result? If I hand you the string "coirodo" is there a print
> > form of that along the lines of ("coi" "ro" "do")?
> >
> > I would be interested in seeing the result from processing a large
> > data set of words and phrases and comparing that to jbogenturfa'i.
> > In order to do this I'd need some output format from your program
> > that I could parse.
> >
> > jbogenturfa'i uses the morphology PEG grammar that xorxes developed,
> > so it contains code which I think is similar (and should be
> > identical in result) to what you are doing:
> >
> >   $ echo "coirodo"|jbogenturfahi --rafske
> >   ((cmavo (COI "coi")) (cmavo (PA "ro")) (cmavo (KOhA "do")))
> >
> > I'd be curious to know whether they are in fact producing identical
> > results.
> >
> > -Alan
> > On Wed, Apr 20, 2011 at 11:02:28AM +0200, Johan Pretorius wrote:
> > > Hi all
> > >
> > > You can download it from here:
> > >
> > > http://sourceforge.net/projects/vlastezba/files/vlastezba.jar/download
> > >
> > > I have completed the cmavo cluster breakout code, and tested it as
> > > far as I was able.
> > >
> > > It should be easy enough to run if you have Java 1.6 installed,
> > > just go java -jar vlastezba.jar and it will print out usage
> > > instructions.
> > >
> > > Please download it and test it to pieces! I'd love all your
> > > feedback.
> > >
> > > Note that it doesn't get very smart at this stage - for instance,
> > > it won't know what to do if you feed it a string of lojban that
> > > doesn't have any spaces in. The only clever bit is that it's able
> > > to break apart cmavo clusters if they don't have any spaces.
> > >
> > > Regards,
> > > Johan
> > >
> > > --
> > > Johan Pretorius
> > > Cell: 0829268327
> > > pretoriusjf@gmail.com
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Lojban Beginners" group.
> > > To post to this group, send email to
> > > lojban-beginners@googlegroups.com.
> > > To unsubscribe from this group, send email to
> > > lojban-beginners+unsubscribe@googlegroups.com.
> > > For more options, visit this group at
> > > http://groups.google.com/group/lojban-beginners?hl=en.
--
.i ma'a lo bradi ku penmi gi'e du
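
P.S. For what it's worth, here is a minimal sketch of the whitespace rule described above, in case it helps with the '.' handling. It is not taken from the vlastezba source: the class name LojbanWhitespace and the method chunks are made up for illustration, and it assumes that "whitespace" means the ASCII whitespace matched by Java's \s plus the '.' character. The apostrophe and comma are left alone. It only splits text into space-or-dot-delimited chunks; breaking a chunk such as "coirodo" into individual cmavo is a separate step that this does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of vlastezba. */
final class LojbanWhitespace {

    /**
     * Splits text into chunks, treating ASCII whitespace and '.' as
     * separators (an assumption based on the rule described above).
     * Apostrophes and commas stay inside the chunks.
     */
    static List<String> chunks(String text) {
        List<String> out = new ArrayList<String>();
        // "[\\s.]+" matches one or more whitespace or '.' characters.
        for (String piece : text.split("[\\s.]+")) {
            // split() yields an empty first element when the text
            // starts with a separator (e.g. a leading carriage return).
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [coi, ro, do, broda]
        System.out.println(chunks("coi.ro.do broda"));
    }
}
```

With something like this in front of the tokenizer, "coi.ro.do" would arrive as three separate chunks rather than one string containing dots, and "ba'e.ba'erba'ercatra" would arrive as "ba'e" and "ba'erba'ercatra".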