
[lojban] CLL diffs



On Tue, Sep 07, 2010 at 09:59:51PM -0600, Alan Post wrote:
> My favorite change so far is the following:
> 
> [-forbidden.-] {+forbilien .+}
> 
> Someone changed forbidden to forbilien, twice no less.
> 
> My largest challenge in this project is the fact that I did not
> get consistent conversion of non-ASCII characters, so the wdiff
> patch is very noisy--anytime a non-ASCII character, or an ASCII
> character with a non-ASCII representation (e.g., single and double
> quote) appears, it shows up as a diff.  I've managed to remove
> certain classes of these, and am still finding patterns as I go.

There's a Unix command called "recode" which can almost certainly
fix those problems, just so you know.
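
If recode doesn't cover every case, the same idea can be sketched in
Python: normalize both texts to one canonical form before running
wdiff, so only real wording changes show up.  This is a minimal
sketch, not what recode does internally; the PUNCT_MAP table and the
normalize() name are made up for illustration, and the table only
lists the characters I'd guess are the usual offenders.

```python
import unicodedata

# Hypothetical mapping (an assumption, not taken from the CLL sources):
# typographic punctuation and its plain-ASCII stand-in.  Extend it as
# new noise patterns turn up in the diff.
PUNCT_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
    "\u00a0": " ",   # no-break space
})


def normalize(text: str) -> str:
    """Return text in a canonical form suitable for diffing."""
    # NFC makes composed and decomposed accented letters compare equal.
    text = unicodedata.normalize("NFC", text)
    # Fold typographic punctuation down to ASCII.
    return text.translate(PUNCT_MAP)
```

Run both versions of the text through normalize() (after decoding
each with whatever encoding it was actually saved in), write the
results out, and diff those instead of the originals.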

> How would you like to proceed?  Looking at my numbers, I'm wondering
> if I've taken the wrong strategy regarding character encoding and
> questioning whether I should revisit my pipeline and try to
> systematically solve the character encoding problem, rather than
> fixing the results of it.  

That seems like a good plan, yes.

> I don't know how many changes to expect to see at the end.  My
> guess is that I should see 500 legitimate changes, but the error
> bar on that makes the number a bit specious.

*nod*

Not sure how I can help, but if you want to send me the two things
you're working from I can try my own hand at the encoding problems.

-Robin

-- 
http://singinst.org/ :  Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei".   My personal page: http://www.digitalkingdom.org/rlp/
