Re: [lojban-beginners] vlastezba: First beta version released!
Hi Alan, all
Alan, can I please ask you to run the attached four files through
jbogenturfa'i, and send me back the results? I have a visual tool
(kdiff3) to compare them to my results, which makes it easier for me to
figure out what is going on.
New release! Get it here: http://sourceforge.net/projects/vlastezba/files/vlastezba_21.jar/download
In this release, I have fixed a bunch of things:
- Dots are no longer assumed to be an integral part of a word. In fact, now, if a dot is found, it is assumed to be a word separator, in exactly the same way as a space. Beyond this they are completely ignored, and indeed, removed from the input stream.
- "ybu" and "y'y" now parse. Since no clarity was to be had about whether y is a vowel, a consonant, neither, or both, I just added those two as special cases. I already had a free-standing "y" as a special case in there, because it is explicitly mentioned in CLL (section 4.3, I think).
- The last cmavo cluster in a file is no longer misparsed. Specifically, I added a regression test and unit test for "coirodo" appearing on a single line in its own file, and it finds 3 words as you would expect it to.
- Output is now always ordered alphabetically. Previously it was in any old order because I used an unordered HashMap to store them in.
- Previously we seemed to produce some duplicates (I guess this could happen if there was extra whitespace in the words). This only happened in about 0.5% of cases. I did not consciously fix this, but it no longer seems to happen.
- Internally, the logic is much better organized. The parsing logic is no longer all stuffed into a single class. Instead there is a class hierarchy with a class to represent each word class, the idea being that each will have its own specialized processing. The main point of doing this was to enrich the results returned by the tokenizer, which means that in future we can get all flexible (for example, if we find a lujvo, we will know what its rafsi are, so we can decide to give the user a list of those, look up the definitions of their gismu, or whatever).
- Added regression tests. There are 4 files: the Terry the Tiger story, the Berenstein Bears story, a file containing only "coirodo" on a single line, and a file containing a list of all recognized cmavo (about 1000 lines). I also added a script that runs all of these through vlastezba, compares the outputs against "expected" results, and spits the diffs into a single file (test_result.txt). Note that the "expected" results are baselined off this release, so right now the tests cannot report any problems. However, next time a change is made, it will be possible to see how the regression tests are affected. The expected results can then be manually updated to be more correct, so the tests become more correct over time.
- Added 2 unit tests to the ones already existing, specifically to test these two cases: "coirodo" and "ybu"... since both were problems that got fixed in this release.
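To make the dot handling and the alphabetical output concrete, here is a minimal sketch in Java. It is not vlastezba's actual code: the class and method names (DotSplitSketch, splitChunks, sortedUnique) are made up for illustration, and it only covers the first step of cutting the input into chunks. Breaking a cmavo cluster such as "coirodo" into its 3 words is a separate step that is not shown. The sketch assumes a TreeSet for the output, which keeps the words sorted and also drops duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class DotSplitSketch {

    // Treat a dot exactly like whitespace: either one separates chunks,
    // and neither survives into the output.
    static List<String> splitChunks(String input) {
        List<String> chunks = new ArrayList<>();
        for (String piece : input.split("[.\\s]+")) {
            // A separator at the very start yields an empty first piece; skip it.
            if (!piece.isEmpty()) {
                chunks.add(piece);
            }
        }
        return chunks;
    }

    // A TreeSet iterates in natural (alphabetical) String order and
    // silently discards duplicates, unlike an unordered HashMap.
    static TreeSet<String> sortedUnique(List<String> words) {
        return new TreeSet<>(words);
    }

    public static void main(String[] args) {
        List<String> chunks = splitChunks(".i coi .djan.  coi");
        System.out.println(chunks);               // [i, coi, djan, coi]
        System.out.println(sortedUnique(chunks)); // [coi, djan, i]
    }
}
```

If the word counts or other per-word data need to be kept alongside each word, a TreeMap would give the same ordering guarantee with keys and values.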
By the way, does anybody know how to do a formal release on SourceForge? Aside from just uploading the jar file, which is what I'm doing currently.
Regards,
iu'an
Attachment:
tests.zip
Description: Zip archive