Date: Mon, 25 Mar 2013 11:08:21 -0700
From: Robin Lee Powell
To: lojban-list@lojban.org, jbovlaste@lojban.org
Subject: [lojban] Need some jbovlaste programming help.
Message-ID: <20130325180820.GU6328@stodi.digitalkingdom.org>

Some bad data has snuck its way into jbovlaste and it needs cleaning. A
good chunk of it came from an import script I screwed up, which we can't
just re-run, but some of it didn't come from that, so I'm not sure what's
going on.

I have neither the time nor the inclination to do this myself.

I don't much care what it's written in as long as it's UTF-8 safe (i.e.
bash isn't going to cut it), but we need something that does the
following:

For every natlang word:

    if a duplicate (same word, meaning, and langid) exists,
    consolidate them. This means deleting the duplicate, combining
    the "notes" fields of the two, and updating every reference to
    the id you just deleted, in the tables threads, keywordmapping,
    natlangwordbestguesses, and natlangwordvotes, so that it points
    to the id that still exists. natlangwordbestguesses has to be
    handled specially: it must not end up with two identical rows
    (identical across all 3 fields). That shouldn't be possible
    given this manipulation, but check anyway.

    if the word is unused, delete it. Unused means that its id does
    not occur in the appropriate column of threads, keywordmapping,
    natlangwordbestguesses, or natlangwordvotes.
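The rules above can be sketched in a few lines of Python. This is only an illustration on in-memory dicts, not code against the real database. The `natlangwordid` key, the shape of the rows, the function name `consolidate`, and the choice that the lowest id in each duplicate group survives are all assumptions made for the sketch; the real column names are in the jbovlaste schema.

```python
from collections import defaultdict

# Tables named in the request. The "natlangwordid" key used below is an
# assumed column name, not taken from the real schema.
REF_TABLES = ("threads", "keywordmapping",
              "natlangwordbestguesses", "natlangwordvotes")


def consolidate(words, refs):
    """Merge duplicate natlang words, then delete unused ones.

    words: {word_id: {"word", "meaning", "langid", "notes"}}
    refs:  {table_name: [row_dict, ...]}, each row has "natlangwordid".
    Both are modified in place. Returns {deleted_id: surviving_id}.
    """
    # Group ids by (word, meaning, langid); equal keys are duplicates.
    groups = defaultdict(list)
    for word_id in sorted(words):
        w = words[word_id]
        groups[(w["word"], w["meaning"], w["langid"])].append(word_id)

    remap = {}
    for ids in groups.values():
        keep, dupes = ids[0], ids[1:]  # assumption: lowest id survives
        if not dupes:
            continue
        # Combine the notes, skipping empty and repeated values.
        notes = []
        for word_id in ids:
            note = words[word_id]["notes"]
            if note and note not in notes:
                notes.append(note)
        words[keep]["notes"] = "\n".join(notes) or None
        for dupe in dupes:
            remap[dupe] = keep
            del words[dupe]

    # Point every reference to a deleted id at the surviving id.
    for table in REF_TABLES:
        for row in refs.get(table, []):
            row["natlangwordid"] = remap.get(row["natlangwordid"],
                                             row["natlangwordid"])

    # natlangwordbestguesses must not keep fully identical rows.
    seen, unique = set(), []
    for row in refs.get("natlangwordbestguesses", []):
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    refs["natlangwordbestguesses"] = unique

    # A word is unused if no referencing table mentions its id.
    used = {row["natlangwordid"]
            for table in REF_TABLES for row in refs.get(table, [])}
    for word_id in list(words):
        if word_id not in used:
            del words[word_id]
    return remap
```

The sketch consolidates before it deletes unused words, so an unreferenced duplicate still contributes its notes to the surviving row. Note that Python treats two `None` meanings as equal; in SQL a plain `=` comparison would not match two NULL meanings, so a real implementation would have to decide how to handle that case.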
For context:

    the code: https://github.com/lojban/jbovlaste

    a script that works, in case you want to keep to the same code
    style (it is the import script in question, in fact, but fixed):
    https://github.com/lojban/jbovlaste/blob/master/bin/snarfgismu_tabs

    the schema:
    https://github.com/lojban/jbovlaste/blob/master/help/schema.txt

Looking forward to some help.

-Robin

--
http://intelligence.org/ : Our last, best hope for a fantastic future.
.i ko na cpedu lo nu stidi vau loi jbopre .i danfu lu na go'i li'u .e
lu go'i li'u .i ji'a go'i lu na'e go'i li'u .e lu go'i na'i li'u .e
lu no'e go'i li'u .e lu to'e go'i li'u .e lu lo mamta be do cu sofybakni li'u