[lojban] CLL diffs
On Tue, Sep 07, 2010 at 09:59:51PM -0600, Alan Post wrote:
> My favorite change so far is the following:
>
> [-forbidden.-] {+forbilien .+}
>
> Someone changed forbidden to forbilien, twice no less.
>
> My largest challenge in this project is the fact that I did not
> get consistent conversion of non-ASCII characters, so the wdiff
> patch is very noisy--any time a non-ASCII character, or an ASCII
> character with a non-ASCII representation (e.g., single and double
> quote) appears, it shows up as a diff. I've managed to remove
> certain classes of these, and am still finding patterns as I go.
There's a Unix command called "recode" which can almost certainly
fix those problems, just so you know.
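
If recode isn't handy, the same idea can be roughed out in a few
lines of Python: normalize both texts to one form before running
wdiff, so that curly quotes and the like stop showing up as changes.
This is only a sketch. The function name and the character table are
made up for illustration, and the table would need extending with
whatever patterns actually turn up in the two CLL sources.

```python
import unicodedata

# Hypothetical table: typographic punctuation -> plain ASCII stand-ins.
# Extend it with whatever shows up as noise in the wdiff output.
PUNCT_MAP = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
})


def normalize_for_diff(text: str) -> str:
    """Return text with composed accents and ASCII punctuation.

    NFC makes "e" + combining accent and the precomposed letter compare
    equal; the translate step then flattens typographic punctuation.
    """
    text = unicodedata.normalize("NFC", text)
    return text.translate(PUNCT_MAP)


if __name__ == "__main__":
    sample = "\u201cforbidden\u201d isn\u2019t \u201cforbilien\u201d"
    print(normalize_for_diff(sample))
```

Run both inputs through the same function and diff the results; any
remaining differences should then be real edits rather than encoding
noise.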
> How would you like to proceed? Looking at my numbers, I'm wondering
> if I've taken the wrong strategy regarding character encoding and
> questioning whether I should revisit my pipeline and try to
> systematically solve the character encoding problem, rather than
> fixing the results of it.
That seems like a good plan, yes.
> I don't know how many changes to expect to see at the end. My
> guess is that I should see 500 legitimate changes, but the error
> bar on that makes the number a bit specious.
*nod*
Not sure how I can help, but if you want to send me the two things
you're working from I can try my own hand at the encoding problems.
-Robin
--
http://singinst.org/ : Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei". My personal page: http://www.digitalkingdom.org/rlp/