To: lojban-list@lojban.org
Subject: [lojban] Re: Parsing NIhO sections of text
References: <86my6csrga.fsf@cmarib.ramside>	<20090806200854.GA9738@sdf.lonestar.org>
From: sunrise2000@comcast.net
Date: 08 Aug 2009 04:30:52 +0000
In-Reply-To: <20090806200854.GA9738@sdf.lonestar.org>
Message-ID: <86prb7sz9f.fsf@cmarib.ramside>

Minimiscience <minimiscience@gmail.com> writes:

> On 06-08-2009, sunrise2000@comcast.net wrote:
> > I'm trying to parse out sections of Lojban text delimited by sequences
> > of NIhO cmavo into their respective paragraphs, sections, chapters,
> > etc.
> ...
> > Does anyone here know how I could use context-free grammar rules to
> > parse the different sections separated by NIhO sequences?
>
> If the length of a NIhO sequence exceeds the maximum depth of the parse
> tree/list/structure, can't you just enclose the list in another list until the
> depths match?  E.g., when you have the list X=[[broda, broda], [broda]], and
> you encounter four NIhOs in a row, let X=[[X]] (two levels of lists because
> four minus the depth of X is two), and then append to X whatever comes after
> that.
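Here is a minimal Prolog sketch of that suggestion. It assumes the parse tree is a nested list whose leaves are atoms (broda, brodi) and that depth is counted in list levels. The predicate names tree_depth/2, wrap_to_depth/3 and niho_grow/4 are hypothetical and do not come from either post. The sketch covers only the case in the example, where the NIhO count is at least the current depth.

```prolog
% Hypothetical sketch of "enclose the list until the depths match".
% Names are illustrative only. Leaves are atoms other than [] (depth 0);
% a list is one level deeper than its deepest member; [] has depth 1.

tree_depth(Leaf, 0) :- atom(Leaf), Leaf \== [].
tree_depth([], 1).
tree_depth([H|T], D) :-
    tree_depth(H, DH),
    tree_depth(T, DT),          % T is the list of H's siblings
    D is max(DH + 1, DT).

% wrap_to_depth(+Tree, +N, -Wrapped): enclose Tree in one-element
% lists until it is at least N levels deep.
wrap_to_depth(Tree, N, Tree) :-
    tree_depth(Tree, D), D >= N.
wrap_to_depth(Tree, N, Wrapped) :-
    tree_depth(Tree, D), D < N,
    wrap_to_depth([Tree], N, Wrapped).

% niho_grow(+Tree, +N, +Para, -NewTree): N NIhOs followed by Para, for
% the case N >= depth(Tree) only. Deepen Tree to N levels, then append
% Para wrapped to N-1 levels as a new top-level member.
niho_grow(Tree, N, Para, NewTree) :-
    tree_depth(Tree, D), D =< N,
    wrap_to_depth(Tree, N, Wrapped),
    N1 is N - 1,
    wrap_to_depth(Para, N1, Item),
    append(Wrapped, [Item], NewTree).
```

With the example values, niho_grow([[broda,broda],[broda]], 4, brodi, T) binds T to [[[[[broda,broda],[broda]]]],[[[brodi]]]]. That is X wrapped as [[X]], followed by a new three-level unit holding brodi. This closed-list version recomputes depths and copies the top-level list on every step. Keeping the unfinished lists open, as the code in the reply does, avoids that.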

The code I included in my original post used a recursive approach to
parsing.  Following your advice, I transformed the code to use an
iterative approach, and have come up with a series of clauses that
will parse NIhO-delimited text properly in (I think) all cases.

It's interesting to note that the original (recursive) code consisted
of just three lines of Prolog.  The iterative version is 53 lines long
(including comments), nearly 18 times the size!  The code is, however,
well behaved and written with zero cuts.  (That's the important part!)
It finds the correct parse, doesn't find any incorrect solutions, and
is guaranteed to terminate, even when backtracking.  In other words,
it works like it should. :D

Thanks for the suggestion!

/*  ++ HERE IS WHAT I CAME UP WITH ++  */

/* Prolog code to parse NIhO-delimited text into paragraphs, sections,
   chapters, etc.  This code is released under the GNU General Public
   License, Version 3.0. */

/* For simplicity, this code uses "p" to represent a paragraph, and
   "n" to represent a member of selma'o NIhO. */

para(p) --> [p].

'n*'(0) --> [].
'n*'(N) --> [n], 'n*'(M), {N is M+1}.

/* wrap Tree (currently Depth levels deep) in open-ended lists until it
   is at least N levels deep; Tails lists every open tail, outermost
   first */
deepen(Depth,N,Tree,Tails,Tree,Tails) --> {N =< Depth}.
deepen(Depth,N,TreeIn,TailsIn,TreeOut,TailsOut) -->
	{N > Depth,
	TreeTmp = [TreeIn|NewTail],
	TailsTmp = [NewTail|TailsIn],
	NewDepth is Depth + 1},
	deepen(NewDepth,N,TreeTmp,TailsTmp,TreeOut,TailsOut).

/* unifies each member of a list with [].  this is used to terminate
lists in a nested list. */
closelists([]) --> [].
closelists([H|T]) --> {H = []}, closelists(T).

/* if the number of NIhOs is no greater than the depth of the parse
tree, then deepen P to depth N-1, install it at level N in the tail
list, find all lower tails in the tail list, close them, and replace
them with the tails from the deepening */

niho(TreeIn, TailList, TreeOut) --> 'n*'(N), para(P),
	{length(TailList,Depth),
	N =< Depth,
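	length(NTails,N),
	/* NTails = the innermost N tails.  append/3 alone finds this
	   suffix because NTails has a fixed length; suffix/2 is not
	   provided by every Prolog system. */
	append(Prefix,NTails,TailList),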
	append([ThisTail],TgtTails,NTails),
	TgtDepth is N - 1},
	closelists(TgtTails),
	deepen(0,TgtDepth,P,[],SubTree,DeepTails),
	{ThisTail = [SubTree|NewTail],
	append(Prefix,[NewTail|DeepTails],NewTailList)},
	niho(TreeIn, NewTailList, TreeOut).

/* if the number of NIhOs is greater than the depth of the parse tree,
then add levels to the parse tree until the depth equals the number of
NIhOs, then proceed as above */

niho(TreeIn, TailList, TreeOut) --> 'n*'(N), para(P),
	{length(TailList,Depth),
	N > Depth,
	N2 is N - 1},
	deepen(Depth,N,TreeIn,TailList,NewTree,[ThisTail|CloseTails]),
	closelists(CloseTails),
	deepen(0,N2,P,[],SubTree,TailsTmp),
	{ThisTail = [SubTree|NewTail],
	NewTailList = [NewTail|TailsTmp]},
	niho(NewTree, NewTailList, TreeOut).

/* termination case */
niho(Tree, TailList, Tree) --> closelists(TailList).

/* the actual top level non-terminal for parsing a NIhO tree */
niho(Tree) --> 'n*'(N), para(P),
	deepen(0,N,P,[],TreeTmp,TailsTmp),
	niho(TreeTmp, TailsTmp, Tree).
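
The listing above never rebuilds the tree. Instead, it keeps each unfinished list open, ending in an unbound tail:

- deepen//6 collects those tails, outermost first.
- A paragraph is installed by binding a tail to [Item|NewTail].
- closelists//1 ends a level by binding its tail to [].

As a standalone illustration, the sketch below performs those same bindings by hand for the token list [p,n,p,n,n,p]. demo_tree/1 and close_tails/1 are hypothetical helper names, not part of the posted code.

```prolog
% Hypothetical illustration of the open-tail technique; not part of
% the posted parser.

% close_tails(+Tails): bind every open tail to [] (cf. closelists//1).
close_tails([]).
close_tails([[]|Ts]) :- close_tails(Ts).

% demo_tree(-Tree): the bindings made for p n p n n p, step by step.
demo_tree(Tree) :-
    % "p n p": wrap the first p in one open list and install the
    % second p through the open tail T1.
    Level1 = [p|T1],
    T1 = [p|T1b],
    % "n n p": wrap the tree in a second level (cf. deepen//6), close
    % the level-1 tail, then install a new, still open, level-1 list
    % through the level-2 tail T2.
    Tree = [Level1|T2],
    close_tails([T1b]),
    T2 = [[p|T3]|T2b],
    % end of text: close every tail that is still open.
    close_tails([T2b, T3]).
```

Querying ?- demo_tree(T). gives T = [[p,p],[p]]. This is the same tree that phrase(niho(T), [p,n,p,n,n,p]) builds with the code above.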

