Re: [lojban-beginners] vlastezba: First beta version released!

To: lojban-beginners@googlegroups.com
Subject: Re: [lojban-beginners] vlastezba: First beta version released!
From: ".alyn.post." <alyn.post@lodockikumazvati.org>
Date: Fri, 22 Apr 2011 11:23:38 -0600

I have run these four files through jbogenturfahi with the --rafske
option.  I have attached both the raw output[1] and the post-processed
output[2].

The post-processed output is hopefully what you want: a sorted list
of the words that appear in each input file, one per line.

1: The raw output is in Scheme, and contains more information but is
   also more difficult to parse without a Scheme reader.
2: The program I used to perform post-processing is attached as
   well, though running it also requires Scheme.  I include it for
   informational purposes.
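
For readers without a Scheme system, the shape of the post-processed
output (distinct words, sorted, one per line) can be illustrated with a
minimal Python sketch. This is not the attached program: the function
name `word_list` is made up for illustration, and the input is assumed
to be a list of words that has already been extracted from the raw
output.

```python
def word_list(words):
    """Return the distinct words in sorted order, one per line.

    `words` is assumed to be an iterable of already-tokenized words;
    extracting them from the raw Scheme output is not shown here.
    """
    # set() drops duplicates, sorted() orders the remaining words.
    return "\n".join(sorted(set(words)))


if __name__ == "__main__":
    print(word_list(["coi", "do", "ro", "coi"]))
```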

-Alan

On Fri, Apr 22, 2011 at 06:45:28PM +0200, Johan Pretorius wrote:
>    Hi Alan, all
> 
>    Alan, can I please ask you to run the attached four files through
>    jbogenturfa'i, and send me back the results? I have a visual tool (kdiff3)
>    to compare them to my results, which makes it easier for me to figure out
>    what is going on.
> 
>    New release! Get it here:
>    [1]http://sourceforge.net/projects/vlastezba/files/vlastezba_21.jar/download
> 
>    In this release, I have fixed a bunch of things:
>    - Dots are no longer assumed to be an integral part of a word. In fact,
>    now, if a dot is found, it is assumed to be a word separator, in exactly
>    the same way as a space. Beyond this they are completely ignored, and
>    indeed, removed from the input stream.
>    - "ybu" and "y'y" now parse. Since no clarity was to be had about whether
>    or not y is a vowel, consonant, neither or both, I just added those two as
>    special cases... I already had a free-standing "y" as a special case in
>    there, because it is explicitly mentioned in CLL (section 4.3, I think).
>    - The last cmavo cluster in a file is no longer misparsed. Specifically, I
>    added a regression test and unit test for "coirodo" appearing on a single
>    line in its own file, and it finds 3 words as you would expect it to.
>    - Output is now always ordered alphabetically. Previously the order was
>    arbitrary because I stored the words in an unordered HashMap.
>    - Previously we seemed to produce some duplicates (I guess this could
>    happen if there were extra whitespace in the words). This only happened in
>    about 0.5% of cases. I did not consciously fix this, but it seems to no
>    longer happen.
>    - Internally, the logic is much better organized. The parsing logic is no
>    longer all stuffed into a single class; instead there is a class hierarchy
>    with one class per word class, the idea being that each will have its own
>    specialized processing. The main point of doing this was to enrich the
>    results returned by the tokenizer, which means that in future we can be
>    more flexible (for example, if we find a lujvo, we will know what its
>    rafsi are, so that we can decide to give the user a list of those, look
>    up their gismu definitions, and so on).
>    - Added regression tests. There are 4 files: the Terry the Tiger story,
>    the Berenstain Bears story, a file containing only "coirodo" on a single
>    line, and a file containing a list of all recognized cmavo (about 1000
>    lines). I also added a script that runs all of these through vlastezba,
>    compares the outputs against "expected" results, and writes the diffs into
>    a single file (test_result.txt). It should be noted that the "expected"
>    results are baselined off of this release, so it is impossible for there
>    to be any reported problems. However, next time a change is made, it will
>    be possible to see how the regression tests are affected. The expected
>    results can then be manually updated to be more correct, thus causing the
>    test to become more correct over time.
>    - Added 2 unit tests to the ones already existing, specifically to test
>    these two cases: "coirodo" and "ybu"... since both were problems that got
>    fixed in this release.
> 
>    By the way, does anybody know how to do a formal release on SourceForge?
>    Aside from just uploading the jar file, which is what I'm doing currently.
> 
>    Regards,
>    iu'an
> 

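Two of the points in the quoted release notes, dots treated as word
separators and outputs compared against an "expected" baseline, can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions, not vlastezba's code (which is Java): the names
`split_on_separators` and `diff_against_baseline` are hypothetical, and
the sketch only splits on separators. It does not break a compound such
as "coirodo" into its component cmavo, which needs real morphology
rules.

```python
import difflib
import re

# Assumption: a run of dots and/or whitespace acts as a single separator.
SEPARATORS = re.compile(r"[.\s]+")


def split_on_separators(text):
    """Split text on dots and whitespace, dropping the separators."""
    # re.split yields empty strings at the edges when the text starts
    # or ends with a separator, so those are filtered out.
    return [chunk for chunk in SEPARATORS.split(text) if chunk]


def diff_against_baseline(actual_words, expected_words):
    """Return unified-diff lines; an empty list means no regression."""
    return list(
        difflib.unified_diff(
            expected_words,
            actual_words,
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


if __name__ == "__main__":
    words = sorted(set(split_on_separators(".i coi.rodo  .ui")))
    for line in diff_against_baseline(words, ["coi", "i", "rodo", "ui"]):
        print(line)
```

When the baseline is generated from the same release that is being
tested, the diff is empty by construction, which is the situation the
release notes describe.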


-- 
.i ma'a lo bradi ku penmi gi'e du

-- 
You received this message because you are subscribed to the Google Groups "Lojban Beginners" group.
To post to this group, send email to lojban-beginners@googlegroups.com.
To unsubscribe from this group, send email to lojban-beginners+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban-beginners?hl=en.

Attachment: jbogenturfahi-cipra.zip
Description: Zip compressed data
