Re: [lojban-beginners] vlastezba: First beta version released!
Ha! I took a look at your parser and I can see how both those
mistakes could be made. :-)
I haven't updated my .jar file; instead I tried to work around the
bugs you report below:
$ echo "^Mba'e ba'er ba'ercatra broda"|java -jar vlastezba.jar /dev/fd/0
Read file [/dev/fd/0], got [4] unique words.
ba'er
ba'e
ba'ercatra
broda
Ok, that seems pretty reasonable. If I remove the spaces between the
first three words, I still expect that run to come out as two words:
$ echo "^Mba'eba'erba'ercatra broda"|java -jar vlastezba.jar /dev/fd/0
Read file [/dev/fd/0], got [2] unique words.
ba'eba'erba'ercatra
broda
That "ba'eba'erba'ercatra" should be two words: "ba'e" and the lujvo
"ba'erba'ercatra".
It also appears I can break things by using '.':
$ echo "^Mba'e.ba'erba'ercatra broda"|java -jar vlastezba.jar /dev/fd/0
Read file [/dev/fd/0], got [2] unique words.
ba'e.ba'erba'ercatra
broda
There aren't any Lojban words with '.' in them.
This problem is perhaps better demonstrated here:
$ echo "^Mcoi.ro.do broda"|java -jar vlastezba.jar /dev/fd/0
lojban.vlastezba.TokenizerFailure: Could not find any cmavo in [coi.ro.do] - last candidate cmavo was [.r], cmavo list is: {coi}
at lojban.vlastezba.LojbanTokenizer.breakOutCmavo(LojbanTokenizer.java:292)
at lojban.vlastezba.LojbanTokenizer.getNextWord(LojbanTokenizer.java:182)
at lojban.vlastezba.LojbanTokenizer.nextWord(LojbanTokenizer.java:473)
at lojban.vlastezba.GlossaryCreator.loadHashMap(GlossaryCreator.java:128)
at lojban.vlastezba.GlossaryCreator.createGlossary(GlossaryCreator.java:32)
at lojban.vlastezba.GlossaryCreator.main(GlossaryCreator.java:183)
Canonically, whitespace in Lojban is all or some of the whitespace
character class in your locale (until we get our own locale, probably the
English whitespace character class) plus the '.' character. Informally,
it often also includes any punctuation other than ' (and the UTF-8
version of that...) and ,
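
To make that separator rule concrete, here is a minimal Java sketch. It is
not code from vlastezba: the class name LojbanWhitespace and the method
splitChunks are made up for this example. It assumes that only whitespace
and '.' act as separators (the stricter, canonical reading above), and it
only cuts the input into chunks. Breaking a chunk such as "coirodo" into
individual valsi is still the tokenizer's job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of vlastezba.
class LojbanWhitespace {

    // Split text into chunks, treating whitespace and '.' as separators.
    // Note: by default Java's \s matches only ASCII whitespace
    // (space, \t, \n, \u000B, \f, \r), not the full locale class.
    static List<String> splitChunks(String text) {
        List<String> chunks = new ArrayList<String>();
        for (String chunk : text.split("[\\s.]+")) {
            // A leading separator yields one empty string; skip it.
            if (!chunk.isEmpty()) {
                chunks.add(chunk);
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // The leading \r mirrors the ^M used in the shell examples.
        System.out.println(splitChunks("\rcoi.ro.do broda"));
        // prints [coi, ro, do, broda]
        System.out.println(splitChunks("\rba'e.ba'erba'ercatra broda"));
        // prints [ba'e, ba'erba'ercatra, broda]
    }
}
```

With a pre-pass like this, "coi.ro.do" would reach the cmavo breakout code
as three separate chunks rather than as one string containing '.'.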
-Alan
On Wed, Apr 20, 2011 at 06:02:09PM +0200, Johan Pretorius wrote:
> To be honest, I sucked that example out of my ear based on how it was
> meant to work. The reason you didn't get that result is that there
> were two bugs in the code:
> - We were ignoring the first line of any file (fixed).
> - When the last word of the file is a compound cmavo, it gets misparsed:
>   you only get the first cmavo from the cluster and the rest are
>   ignored. This one needs more careful thought.
>
> So, with the new jar file that I'm uploading as I speak, you should be
> able to do what you tried to; just make sure you tack a gismu or
> something onto the end of your file, so that you get accurate results.
> This time I tested it before making any wild claims :-)
>
> On Wed, Apr 20, 2011 at 5:22 PM, .alyn.post.
> <[1]alyn.post@lodockikumazvati.org> wrote:
>
> I'm not getting the result you report:
>
> $ echo "coirodo"|java -jar vlastezba.jar /dev/fd/0
> Read file [/dev/fd/0], got [0] unique words.
>
> This is also happening if I write the file and try it:
>
> $ cat test.txt
> coirodo
> $ java -jar vlastezba.jar test.txt
> Read file [test.txt], got [0] unique words.
>
> Here is my java version:
>
> $ java -version
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326)
> Java HotSpot(TM) Client VM (build 19.1-b02-334, mixed mode)
>
> -Alan
> On Wed, Apr 20, 2011 at 04:51:51PM +0200, Johan Pretorius wrote:
> > Hi Alan,
> >
> > That would indeed be an interesting experiment; I'd be quite keen to
> > see the results myself.
> >
> > Right now, if you just call
> >
> > java -jar vlastezba.jar test.txt
> >
> > with some Lojban text (legal or otherwise) in test.txt, it will return
> > (on stdout) one valsi per line. So "coirodo" would result in:
> > coi
> > ro
> > do
> > (you can make it go look up the definitions by passing a second
> > parameter, but it will just add junk to the output that I don't think
> > you'd want)
> >
> > Right now it doesn't check grammar at all, so you can throw any random
> > collection of words at it (I don't intend for it to ever check grammar;
> > there are tools out there that are far better at this than I could ever
> > hope to make it).
> >
> > It also won't give you a classification of valsi - it doesn't "know"
> > when it's dealing with a cmavo (or indeed what class), or a gismu, or a
> > lujvo. This I DO intend to fix.
> >
> > I want to add other output formats anyway, so if you want me to do
> > something specific to make your comparison easier, let me know. Now
> > would be a good time, as I'm going away on holiday for a week, and
> > wanted to spend at least a little bit of time on vlastezba.
> >
> > In fact, if you are comfortable with Java, feel free to make it do
> > what you need; the source code is on [2]sourceforge.net
> > ([3]http://sourceforge.net/projects/vlastezba/), and is GPL'ed :-)
> >
> > mu'o mi'e iu'an
> >
> > On Wed, Apr 20, 2011 at 4:29 PM, .alyn.post.
> > <[4]alyn.post@lodockikumazvati.org> wrote:
> >
> > Do you have an external representation for your valsi parsing
> > result? If I hand you the string "coirodo" is there a print
> > form of that along the lines of ("coi" "ro" "do")?
> >
> > I would be interested in seeing the result from processing a large
> > data set of words and phrases and comparing that to jbogenturfa'i.
> > In order to do this I'd need some output format from your program
> > that I could parse.
> >
> > jbogenturfa'i uses the morphology PEG grammar that xorxes developed,
> > so it contains code which I think is similar (and should be
> > identical in result) to what you are doing:
> >
> > $ echo "coirodo"|jbogenturfahi --rafske
> > ((cmavo (COI "coi")) (cmavo (PA "ro")) (cmavo (KOhA "do")))
> >
> > I'd be curious to know whether they are in fact producing identical
> > results.
> >
> > -Alan
> > On Wed, Apr 20, 2011 at 11:02:28AM +0200, Johan Pretorius wrote:
> > > Hi all
> > >
> > > You can download it from here:
> > >
> > > [5]http://sourceforge.net/projects/vlastezba/files/vlastezba.jar/download
> > >
> > > I have completed the cmavo cluster breakout code, and tested it as
> > > far as I was able.
> > >
> > > It should be easy enough to run if you have Java 1.6 installed: just
> > > run java -jar vlastezba.jar and it will print out usage instructions.
> > >
> > > Please download it and test to pieces! I'd love all your feedback.
> > >
> > > Note that it doesn't get very smart at this stage - for instance, it
> > > won't know what to do if you feed it a string of Lojban that doesn't
> > > have any spaces in. The only clever bit is that it's able to break
> > > apart cmavo clusters if they don't have any spaces.
> > >
> > > Regards,
> > > Johan
> > >
> > > --
> > > Johan Pretorius
> > > Cell: 0829268327
> > > [6]pretoriusjf@gmail.com
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Lojban Beginners" group.
> > > To post to this group, send email to
> > > [7]lojban-beginners@googlegroups.com.
> > > To unsubscribe from this group, send email to
> > > [8]lojban-beginners+unsubscribe@googlegroups.com.
> > > For more options, visit this group at
> > > [9]http://groups.google.com/group/lojban-beginners?hl=en.
> > >
> >
> > --
> > .i ma'a lo bradi ku penmi gi'e du
> --
> .i ma'a lo bradi ku penmi gi'e du
>
>
> References
>
> Visible links
> 1. mailto:alyn.post@lodockikumazvati.org
> 2. http://sourceforge.net/
> 3. http://sourceforge.net/projects/vlastezba/
> 4. mailto:alyn.post@lodockikumazvati.org
> 5. http://sourceforge.net/projects/vlastezba/files/vlastezba.jar/download
> 6. mailto:pretoriusjf@gmail.com
> 7. mailto:lojban-beginners@googlegroups.com
> 8. mailto:lojban-beginners%2Bunsubscribe@googlegroups.com
> 9. http://groups.google.com/group/lojban-beginners?hl=en
> 10. http://sourceforge.net/projects/vlastezba/files/vlastezba.jar/download
> 11. mailto:pretoriusjf@gmail.com
> 12. mailto:lojban-beginners@googlegroups.com
> 13. mailto:lojban-beginners%2Bunsubscribe@googlegroups.com
> 14. http://groups.google.com/group/lojban-beginners?hl=en
> 15. mailto:pretoriusjf@gmail.com
> 16. mailto:lojban-beginners@googlegroups.com
> 17. mailto:lojban-beginners%2Bunsubscribe@googlegroups.com
> 18. http://groups.google.com/group/lojban-beginners?hl=en
> 19. http://sourceforge.net/
> 20. http://sourceforge.net/projects/vlastezba/
> 21. mailto:alyn.post@lodockikumazvati.org
> 22. http://sourceforge.net/projects/vlastezba/files/vlastezba.jar/download
> 23. mailto:pretoriusjf@gmail.com
> 24. mailto:lojban-beginners@googlegroups.com
> 25. mailto:lojban-beginners%2Bunsubscribe@googlegroups.com
> 26. http://groups.google.com/group/lojban-beginners?hl=en
> 27. http://sourceforge.net/projects/vlastezba/files/vlastezba.jar/download
> 28. mailto:pretoriusjf@gmail.com
> 29. mailto:lojban-beginners@googlegroups.com
> 30. mailto:lojban-beginners%2Bunsubscribe@googlegroups.com
> 31. http://groups.google.com/group/lojban-beginners?hl=en
> 32. mailto:pretoriusjf@gmail.com
> 33. mailto:lojban-beginners@googlegroups.com
> 34. mailto:lojban-beginners%2Bunsubscribe@googlegroups.com
> 35. http://groups.google.com/group/lojban-beginners?hl=en
> 36. mailto:pretoriusjf@gmail.com
--
.i ma'a lo bradi ku penmi gi'e du