
Re: [lojban-beginners] vlastezba: First beta version released!



I have run these four files through jbogenturfahi with the --rafske
option.  I have attached both the raw output[1] and the post-processed
output[2].

The post-processed output is hopefully what you want: a sorted list
of the words that appear in each input file, one word per line.

[1] The raw output is in Scheme. It contains more information, but is
    also more difficult to parse without a Scheme reader.
[2] The program I used for post-processing is attached as well, though
    it also requires Scheme.  I include it for informational purposes.
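For anyone without Scheme at hand, the post-processing step amounts to
something like the following. This is only a rough Java sketch of the
idea, not a translation of the attached program. The names
(WordListSketch, sortedUnique) are made up for illustration, and it
assumes the words have already been extracted from the raw output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class WordListSketch {
    // Hypothetical helper: drops blanks and duplicates, then returns the
    // remaining words in sorted order.
    public static List<String> sortedUnique(List<String> words) {
        // A TreeSet keeps its elements unique and in natural (code unit) order.
        TreeSet<String> unique = new TreeSet<>();
        for (String word : words) {
            String trimmed = word.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ro", "coi", "do", "coi");
        // Print one word per line.
        for (String word : sortedUnique(words)) {
            System.out.println(word);
        }
    }
}
```

Java compares strings by UTF-16 code unit here, so an apostrophe sorts
before any letter. That ordering may not match the attached output
exactly.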

-Alan

On Fri, Apr 22, 2011 at 06:45:28PM +0200, Johan Pretorius wrote:
>    Hi Alan, all
> 
>    Alan, can I please ask you to run the attached four files through
>    jbogenturfa'i, and send me back the results? I have a visual tool (kdiff3)
>    to compare them to my results, which makes it easier for me to figure out
>    what is going on.
> 
>    New release! Get it here:
>    [1]http://sourceforge.net/projects/vlastezba/files/vlastezba_21.jar/download
> 
>    In this release, I have fixed a bunch of things:
>    - Dots are no longer assumed to be an integral part of a word. In fact,
>    now, if a dot is found, it is assumed to be a word separator, in exactly
>    the same way as a space. Beyond this they are completely ignored, and
>    indeed, removed from the input stream.
>    - "ybu" and "y'y" now parse. Since no clarity was to be had about whether
>    or not y is a vowel, a consonant, neither or both, I just added those two as
>    special cases... I already had a free-standing "y" as a special case in
>    there, because it is explicitly mentioned in CLL (section 4.3, I think).
>    - The last cmavo cluster in a file is no longer misparsed. Specifically, I
>    added a regression test and unit test for "coirodo" appearing on a single
>    line in its own file, and it finds 3 words as you would expect it to.
>    - Output is now always ordered alphabetically. Previously it came out in
>    arbitrary order because I stored the words in an unordered HashMap.
>    - Previously we seemed to produce some duplicates (I guess this could
>    happen if there were extra whitespace in the words). This only happened in
>    about 0.5% of cases. I did not consciously fix this, but it seems to no
>    longer happen.
>    - Internally, the logic is much better organized. The parsing logic is no
>    longer all stuffed into a single class; instead there is a class hierarchy
>    with a class for each word class, the idea being that each will have
>    its own specialized processing. The main point of doing this was to enrich
>    the results returned by the tokenizer, which means that in future we can be
>    more flexible (for example, if we find a lujvo, we will know what its rafsi
>    are, so that we can decide to give the user a list of those, look up the
>    definitions of their gismu, and so on).
>    - Added regression tests. There are 4 files: the Terry the Tiger story,
>    the Berenstain Bears story, a file containing only "coirodo" on a single
>    line, and a file containing a list of all recognized cmavo (about 1000
>    lines). I also added a script that runs all of these through vlastezba,
>    compares the outputs against "expected" results, and writes the diffs into
>    a single file (test_result.txt). It should be noted that the "expected"
>    results are baselined off this release, so for now the tests cannot
>    report any problems. However, the next time a change is made, it will
>    be possible to see how the regression tests are affected. The expected
>    results can then be manually updated to be more correct, so that the
>    tests become more correct over time.
>    - Added 2 unit tests to the ones already existing, specifically to test
>    these two cases: "coirodo" and "ybu"... since both were problems that got
>    fixed in this release.
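To make the dot handling and the regression comparison described above
concrete, here is a rough Java sketch. It is not taken from vlastezba:
the names (RegressionSketch, splitChunks, diff) and the "- " / "+ "
report format are assumptions for illustration. Note that splitting on
dots and whitespace only yields chunks. Breaking a cluster such as
"coirodo" into separate cmavo needs real morphology, which this sketch
does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class RegressionSketch {
    // Hypothetical helper: treats a dot exactly like whitespace, i.e. as a
    // separator that is dropped from the result.
    public static List<String> splitChunks(String input) {
        List<String> chunks = new ArrayList<>();
        for (String chunk : input.split("[.\\s]+")) {
            // split() yields a leading empty string when the input starts
            // with a separator, so skip empties.
            if (!chunk.isEmpty()) {
                chunks.add(chunk);
            }
        }
        return chunks;
    }

    // Hypothetical helper: reports expected words that are missing ("- ")
    // and actual words that were not expected ("+ "), each group sorted.
    public static List<String> diff(List<String> expected, List<String> actual) {
        TreeSet<String> missing = new TreeSet<>(expected);
        missing.removeAll(actual);
        TreeSet<String> extra = new TreeSet<>(actual);
        extra.removeAll(expected);

        List<String> report = new ArrayList<>();
        for (String word : missing) {
            report.add("- " + word);
        }
        for (String word : extra) {
            report.add("+ " + word);
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("coi", "do", "ro");
        // Dots and spaces both separate words, so nothing is reported.
        System.out.println(diff(expected, splitChunks("coi.ro do")));
        // An unsplit cluster shows up as three missing words and one extra.
        System.out.println(diff(expected, splitChunks("coirodo")));
    }
}
```

An empty report means the actual output matches the baseline. Anything
else is what would end up in a file like test_result.txt.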
> 
>    By the way, does anybody know how to do a formal release on SourceForge?
>    Aside from just uploading the jar file, which is what I'm doing currently.
> 
>    Regards,
>    iu'an
> 
>    --
>    You received this message because you are subscribed to the Google Groups
>    "Lojban Beginners" group.
>    To post to this group, send email to lojban-beginners@googlegroups.com.
>    To unsubscribe from this group, send email to
>    lojban-beginners+unsubscribe@googlegroups.com.
>    For more options, visit this group at
>    http://groups.google.com/group/lojban-beginners?hl=en.
> 
> References
> 
>    Visible links
>    1. http://sourceforge.net/projects/vlastezba/files/vlastezba_21.jar/download



-- 
.i ma'a lo bradi ku penmi gi'e du


Attachment: jbogenturfahi-cipra.zip
Description: Zip compressed data