From rlpowell@csclub.uwaterloo.ca Sun May 21 16:24:42 2000
Return-Path:
Received: (qmail 32678 invoked from network); 21 May 2000 23:24:42 -0000
Received: from unknown (10.1.10.26) by m2.onelist.org with QMQP; 21 May 2000 23:24:42 -0000
Received: from unknown (HELO calum.csclub.uwaterloo.ca) (129.97.134.11) by mta1 with SMTP; 21 May 2000 23:24:42 -0000
Received: from calum.csclub.uwaterloo.ca (localhost [127.0.0.1]) by calum.csclub.uwaterloo.ca (8.9.3+Sun/8.9.3) with ESMTP id TAA04851 for ; Sun, 21 May 2000 19:24:29 -0400 (EDT)
Message-Id: <200005212324.TAA04851@calum.csclub.uwaterloo.ca>
To: lojban@egroups.com
Subject: [LONG] UNIX shell script for word lookup
Date: Sun, 21 May 2000 19:24:29 -0400
X-eGroups-From: Robin Lee Powell
From: Robin Lee Powell

Here's a little script I wrote that looks up Lojban words in the cmavo
and gismu files, and makes the output nice and pretty and readable.
I'm quite proud of it; feedback welcomed.  :-)

Note that it assumes that the wordlists are gzip'd: the first call to
grep on each file is actually to zgrep.  If you have the space
available to keep them in plain text, remove the 'z' at the beginning
of both lines that start with 'z$GREP'.

Edit the defines at the top as appropriate.  If you are missing any of
the programs used, fmt in particular, you're on your own.

I call it 'll', but you can call it whatever you want.  Just don't call
me Shirley.

It's a ksh script, but is probably sh compatible; most of my scripts
are.

-Robin

#!/usr/bin/ksh

##############
# ll = Lojban Lookup
##############

GISMU=~/lojban/gismu
CMAVO=~/lojban/cmavo
TEMP=/tmp/ll.1
WORD="[] .,'\"()[]"	#Word boundary pattern
GREP="grep"

USAGE="
$0 [-a] [-n] [-i] [-g] [-c] <word>

$0 looks through the gismu and cmavo word files for the given word, in
that order, as described below.  The gismu and cmavo entries can be told
apart by the (always uppercase) selma'o name in the rafsi columns (in
effect, column 2).

By default, $0 looks for <word> as either the exact word in the first
column (i.e. the exact lojbanic word) or an exact word in the rafsi,
gloss or mnemonic columns.  IOW, $0 by default only looks in the first
line of the paragraphs it outputs (this was made a reasonable
proposition by the discovery that neither file contains the character
'\$'.)

-a: Find the word anywhere, but still an exact word match.

-n: No space checking, find anywhere at all.  'mil' finds 'milti'.
    Note that, keeping shell quoting issues in mind, an arbitrary
    grep expression can be passed to -n with the expected results.

-i: Case insensitive.

-c: cmavo only.

-g: gismu only.

-n overrides -a.  Using both -g and -c is a NOP.

The difference between the arguments is well exemplified by running
'$0 coi', '$0 -a coi', '$0 -ai coi', '$0 -n coi'.
"

while getopts anigc c
do
    case $c in
    a)  ANY="ANY";;
    n)  NS="NS";;
    i)  GREP="$GREP -i";;
    g)  GONLY="g";;
    c)  CONLY="c";;
    \?) echo "$USAGE"
        exit 2;;
    esac
done
shift `expr $OPTIND - 1`

if [ $# != 1 ]
then
    echo "$USAGE"
    exit 2
fi

pat=$1

# Start from an empty file, so that output left behind by an
# interrupted run is not appended to.
: > $TEMP

# First pass: pull every raw line that mentions the pattern at all.
if [ -n "$GONLY" -o -z "$CONLY" ]
then
    z$GREP "$pat" $GISMU >> $TEMP
fi

if [ -n "$CONLY" -o -z "$GONLY" ]
then
    z$GREP "$pat" $CMAVO >> $TEMP
fi

# Drop the header lines and fold each fixed-width record into a single
# line, with '$' standing in for the paragraph's line breaks.
cat $TEMP | $GREP -v '^ [0-9]* Lojban' | \
    nawk '{ printf("$%s$%20s%s$$%20s%s\n", \
	substr($0, 1, 62), "", substr($0, 63, 96), "", substr( $0, 170)); }' \
    > $TEMP.2

# -n: no further filtering.
if [ "$NS" ]
then
    ANY=""
    cat $TEMP.2 | \
	tr '$' '\n' | fmt > $TEMP
fi

# -a: exact word match anywhere in the record.
if [ "$ANY" ]
then
    cat $TEMP.2 | \
	$GREP "$WORD$pat$WORD" | \
	tr '$' '\n' | fmt > $TEMP
fi

# Default: exact word match between the first two '$' only, i.e. in the
# first line of the paragraph.  The literal-'$' parts are single-quoted
# because bash reads "$[" inside double quotes as arithmetic expansion.
if [ -z "$ANY" -a -z "$NS" ]
then
    cat $TEMP.2 | \
	$GREP '^\$[^$]*'"$WORD$pat$WORD"'[^$]*\$' | \
	tr '$' '\n' | fmt > $TEMP
fi

num=`cat $TEMP | wc -l`

if [ $num -gt 10 ]
then
    less $TEMP
else
    cat $TEMP
fi

rm -f $TEMP $TEMP.2
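
The '$' trick the script relies on (fold each record into one line, filter on the first segment only, then unfold) can be tried in isolation. The following is only a minimal sketch under stated assumptions: the function names, the 10- and 12-character column widths and the two sample records are all made up for illustration and are not the real gismu/cmavo layout, and plain awk is used in place of nawk.

```shell
#!/bin/sh
# Sketch of the fold / filter / unfold idea on hypothetical data.
# Column widths (10 + 12) and the records below are invented for this
# example; they are NOT the layout of the real word lists.

# sample: emit two fixed-width records (word, tag, definition).
sample() {
    printf '%-10s%-12s%s\n' coi COI 'greetings; said on meeting someone'
    printf '%-10s%-12s%s\n' klama kla 'come, go; unlike coi, a gismu'
}

# fold_records: rewrite each record as  $<head>$<definition>  on one
# line, with '$' standing in for the newlines restored later.
fold_records() {
    awk '{ printf("$%s$%s\n", substr($0, 1, 22), substr($0, 23)) }'
}

# match_head: keep records whose FIRST '$'-delimited segment contains
# the pattern in $1.  [^$]* cannot cross the second '$', so text in the
# definition is never examined.
match_head() {
    grep '^\$[^$]*'"$1"'[^$]*\$'
}

# unfold_records: turn every '$' back into a newline.
unfold_records() {
    tr '$' '\n'
}

sample | fold_records | match_head coi | unfold_records
```

Run as-is, this prints a blank line, the head line for coi, and its definition. The klama record is dropped even though its definition mentions coi, which is the behaviour the script's default mode is after.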