[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lojban] Need some jbovlaste programming help.



On Tue, Mar 26, 2013 at 09:15:47AM -0400, Pierre Abbat wrote:
> On Monday, March 25, 2013 11:08:21 Robin Lee Powell wrote:
> > Some bad data has snuck its way in to jbovlaste (a good chunk from
> > an import script I screwed up that we can't just re-run, but some of
> > it isn't from that, so not sure what's going on) and it needs
> > cleaning.
> > 
> > I have neither the time nor inclination.
> > 
> > I don't much care what it's written in as long as it's UTF-8 safe
> > (i.e. bash isn't going to cut it), but we need something that does
> > the following:
> > 
> > For every natlang word:
> > 
> >   if a duplicate (same word, meaning, and langid) exists,
> >   consolidate them.  This means deleting the duplicate, combining
> >   the "notes" field for the two of them, and updating all instances
> >   of the id you just deleted to point to the one that still exists
> >   in the tables threads, keywordmapping, natlangwordbestguesses, and
> >   natlangwordvotes.  natlangwordbestguesses has to be handled
> >   specially there, as it shouldn't end up with two identical rows
> >   (identical across all 3 fields); that shouldn't be possible given
> >   that manipulation, but check anyway.
> 
> Is PHP okay? All the database programming I've done is in PHP,
> except a tiny bit that's in bash.

It's better than it not getting done.  :)  You will want to read
http://tympanus.net/codrops/2009/08/31/solving-php-mysql-utf-8-issues/
and
http://stackoverflow.com/questions/60174/how-to-prevent-sql-injection-in-php
, noting that the database postgres, not MySQL.

> Can I (or anyone else who's going to do this) have a copy of the
> database to practice on? I don't want to delete from the live
> database a word by mistake.

Except for the users table you certainly can, yes.

http://users.digitalkingdom.org/~rlpowell/media/public/jbovlaste.20130326.sql.gz

If you need a scrubbed users table, let me know.

The problems are mostly in Russian, due to an import script I fucked
up.  You can see the issue thuswise:

select word,meaning,langid from natlangwords where word in (select word from natlangwords group by word, meaning, langid having count(*) > 1) order by langid;

That should never, ever happen.

There are also a bunch of natlangwords, some related to this bug and
some not, which aren't in use at all.

-Robin

-- 
http://intelligence.org/ :  Our last, best hope for a fantastic future.
.i ko na cpedu lo nu stidi vau loi jbopre .i danfu lu na go'i li'u .e
lu go'i li'u .i ji'a go'i lu na'e go'i li'u .e lu go'i na'i li'u .e
lu no'e go'i li'u .e lu to'e go'i li'u .e lu lo mamta be do cu sofybakni li'u

-- 
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.