From sabren@xxxxxxxxxxxxx.xxxx Sun Oct 24 15:22:57 1999
X-Digest-Num: 265
Message-ID: <44114.265.1436.959273825@eGroups.com>
Date: Sun, 24 Oct 1999 18:22:57 -0400 (EDT)
From: "Michal Wallace (sabren)"
To: perl-ai list
Subject: perl and lojban, sitting in a tree..

This one's long. Synopsis: computers can't yet grok most human concepts and languages, but they do just fine with computer science. Why not tackle a slightly simpler problem: translating computer languages, with lojban as the intermediary language?

Hey all,

This talk about language has got me thinking. John Nolan made an excellent point when he asked: why should a computer (AI robot) talk to you at all? It seems to me that language doesn't make much sense without something to communicate, and someone willing and able to listen. It doesn't make sense for computers and humans to talk about baseball, because computers don't really care about baseball in the real world. However, given a virtual world, computers can *play* baseball, keep score, and interact with humans.

The Sapir-Whorf hypothesis states that language limits experience. But isn't the reverse also true? Consider the visible spectrum. I've been told that 24-bit RGB offers more colors than the human eye can perceive. I'm not sure it can account for every color the eye can see, but it offers more shades of "red", for example, than the human eye can visually distinguish. With so many possible colors, most languages allow us to "see" only a handful of them. You can test this yourself: just visit http://www.lynda.com/hexh.html and see how many colors you can name. [*]

The Sapir-Whorf hypothesis would suggest that, for the most part, we usually only experience "red" - not the many individual shades of red - because experience is shaped by our ability to code it in language. But the opposite is also true: at one point in time, it would have been outlandish to talk about "colors" we can't actually see.
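To make that "handful of colors" point concrete, here's a toy perl sketch - NOT the AI::Fuzzy example program mentioned in the footnote, just a crude nearest-neighbor lookup against a few named colors I picked for illustration:

```perl
#!/usr/bin/perl -w
use strict;

# a handful of named reference colors (24-bit RGB)
my %named = (
    red    => [255,   0,   0],
    orange => [255, 165,   0],
    yellow => [255, 255,   0],
    green  => [  0, 128,   0],
    blue   => [  0,   0, 255],
    purple => [128,   0, 128],
    black  => [  0,   0,   0],
    white  => [255, 255, 255],
    gray   => [128, 128, 128],
);

# name_color: collapse one of 16.7 million RGB values into
# the nearest of our handful of words (squared euclidean distance)
sub name_color {
    my ($r, $g, $b) = @_;
    my ($best, $bestdist);
    for my $name (keys %named) {
        my ($nr, $ng, $nb) = @{ $named{$name} };
        my $dist = ($r - $nr)**2 + ($g - $ng)**2 + ($b - $nb)**2;
        if (!defined $bestdist or $dist < $bestdist) {
            ($best, $bestdist) = ($name, $dist);
        }
    }
    return $best;
}

# many visually distinct shades all come out as the same word:
print name_color(255, 0, 0), "\n";   # red
print name_color(250, 10, 5), "\n";  # still "red"
```

A fuzzy-logic version (like AI::Fuzzy's) would give partial memberships instead of one winner, but even this crude lookup shows the point: the language side has nine words where the RGB side has millions of values.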
Yet once science delivered us a theory of electromagnetics, we can talk about microwaves, radio waves, infra-red, ultraviolet, X-rays... all of which are invisible "colors". We can talk about these things now because we experience them. But those words would have been wasted on the ancient Greeks, because they never experienced the phenomena the words describe.

Computers don't normally experience the physical world. Yes, you can attach a microphone, a couple of quick-cams for binocular vision, a robotic arm, and some motorized wheels - and with enough knowledge and work, it might even find its way around the room. :) But that's a robot, not a computer. A "computer" entity would have experiences completely different from your average human's, and therefore an "intelligent" computer's language would reflect that. That is, it would be far more natural for AIs to talk about databases, algorithms, logic, and applications than to talk about the Atlanta Braves. And in fact, we talk with computers about these things all the time. We use languages such as perl, java, SQL, lisp... [**]

The more I read about lojban, the more I think it makes sense as a "native language" for computers. It's logic-based (and seems to share a lot in common with lambda calculus / lisp / prolog - but I've only a superficial understanding of any of those OR lojban, so don't take my word for it). It's got a well-defined YACC grammar, and requires no particular inflections or stress (good for text-to-speech readers). It's supposedly always obvious where each word ends (good for speech-to-text listeners). It requires only a handful of ASCII characters. It is said to be quite expressive and easy to learn (although there's only a handful of resources for learning it). Finally, it has evolved over the past thirty or so years with input and interest from the AI community.

There's been talk on this list about translating natural language with perl. I suggested esperanto or lojban as an intermediary language.
I've done some reading, and found out that others have had the same idea: http://www.lojban.org/files/why-lojban/mactrans.txt

The problem, of course, is that reliably parsing English or Japanese is a long way off. The computer doesn't really even know what it's translating. However, just about every computer on the planet can parse source code. Perhaps an interesting project would be an automated translator for computer programs.

Right now, perl can be compiled into C, and python can be compiled into java bytecode or C source. Just about anything can be compiled into assembly language. In all of these cases, the compiler chunks downward, breaking the high-level language into low-level steps. There are also some lateral chunkers: programs that translate awk or sed to perl, for example, or assemblers that convert opcodes into machine language. For the most part, these are just search-and-replace methods. They work because the conceptual gap between the languages is not large (at least in one direction). The translation article I linked above compares this kind of thing to a first-year language student simply looking up words in a translating dictionary and writing the translation down.

But what about translating a lisp or python program into perl? Or (even with a perl grammar) doing the opposite? As long as the two languages are turing-complete, it's possible. It simply requires an understanding of what the programs are doing. The translator needs to recognize the algorithm being used and map it to the other language. It needs to understand things like recursion, sorting, function calls, loops, and design patterns: how to simulate them if translating from an expressive language to a less expressive one, and how to recognize the workarounds when going from a less expressive language to a more expressive one.
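To show what I mean by "just search-and-replace", here's a toy lateral chunker in perl. It translates a tiny, made-up subset of sed into perl statements by pattern matching alone - real sed-to-perl translators like s2p handle far more, but the dictionary-lookup flavor is the same:

```perl
#!/usr/bin/perl -w
use strict;

# sed2perl: translate a few sed commands into perl,
# purely by recognizing surface patterns. no understanding
# of what the script *does* is needed - or gained.
sub sed2perl {
    my ($cmd) = @_;
    # s/pat/rep/flags  ->  s/pat/rep/flags;   (any delimiter)
    if ($cmd =~ m{^s(.)(.*?)\1(.*?)\1([gip]*)$}) {
        my ($pat, $rep, $flags) = ($2, $3, $4);
        return "s/$pat/$rep/$flags;";
    }
    # /pat/d  ->  next if /pat/;   (delete matching lines)
    if ($cmd =~ m{^/(.*)/d$}) {
        return "next if /$1/;";
    }
    # q  ->  last;   (quit)
    return "last;" if $cmd eq 'q';
    die "can't translate sed command: $cmd\n";
}

print sed2perl('s/foo/bar/g'), "\n";  # s/foo/bar/g;
print sed2perl('/^#/d'), "\n";        # next if /^#/;
```

Note what's missing: nothing here recognizes a *loop* or a *sort* or any algorithm at all. That's exactly the gap between lateral chunking and real translation.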
A universal source code translator would have to "chunk up" and make comments about what a particular program was doing, then chunk back down into a different language. If an intermediary language were used, it would have to be expressive enough to handle any statement in any other language. (Even weird stuff like regexps, or the cut ("!") in prolog.) ... And perhaps (like a human translator) it might have to be expressive enough to ask for help.

No current computer language is expressive enough to account for all the thought forms or experiences available in computer science. Because they're turing-complete, they *CAN* express any particular operation, but often it's in the manner of someone taking ten paragraphs to describe a single experience for which one's language has no word. (Like me, right now! Imagine if there were a single word that had the exact meaning of this entire message - including this sentence!) But all of these concepts could be described in a human language, such as english, or even lojban.

So, what would lojban buy or cost us as an intermediary language for translating source code? If someone were to actually implement a translator like this, would perl be a sensible implementation language? Why or why not? How might we approach the issue of "chunking up" and recognizing patterns?

I've rambled enough. :)

------------
[*] Incidentally, I wrote a perl program that will take an RGB color value and give you an english description. It's the example program for AI::Fuzzy, and you can find it by grabbing "fuzco" and AI-Fuzzy-*.tar.gz at http://www.sabren.com/code/perl/

[**] Yes, we can look up the Braves on the net, but the computer doesn't have any clue who they are. It might not "understand" a perl script, either, but it reacts as if it understands. If humans spoke perl, the turing test would be a snap.
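P.S. - here's a minimal perl sketch of the "chunk up, then chunk down" shape, on a deliberately tiny made-up expression language: parse infix arithmetic into an intermediary tree, then emit it in two different surface syntaxes. A real universal translator would need a vastly richer intermediary (lojban-sized, maybe), but the architecture is the same:

```perl
#!/usr/bin/perl -w
use strict;

# chunk up: "1 + 2 * 3" -> [ '+', 1, [ '*', 2, 3 ] ]
# (a recursive-descent parser with the usual precedence)
sub parse {
    my @tok = split ' ', shift;
    return expr(\@tok);
}
sub expr {            # term (('+'|'-') term)*
    my $tok  = shift;
    my $node = term($tok);
    while (@$tok and $tok->[0] =~ /^[-+]$/) {
        my $op = shift @$tok;
        $node = [ $op, $node, term($tok) ];
    }
    return $node;
}
sub term {            # number (('*'|'/') number)*
    my $tok  = shift;
    my $node = shift @$tok;
    while (@$tok and $tok->[0] =~ m{^[*/]$}) {
        my $op = shift @$tok;
        $node = [ $op, $node, shift @$tok ];
    }
    return $node;
}

# chunk down into two target languages from the same tree:
sub to_lisp {         # prefix: (+ 1 (* 2 3))
    my $n = shift;
    return $n unless ref $n;
    return '(' . join(' ', $n->[0], to_lisp($n->[1]), to_lisp($n->[2])) . ')';
}
sub to_perl {         # infix, fully parenthesized
    my $n = shift;
    return $n unless ref $n;
    return '(' . to_perl($n->[1]) . " $n->[0] " . to_perl($n->[2]) . ')';
}

my $ast = parse("1 + 2 * 3");
print to_lisp($ast), "\n";   # (+ 1 (* 2 3))
print to_perl($ast), "\n";   # (1 + (2 * 3))
```

The tree in the middle is the "intermediary language" for this toy: both emitters read it, neither knows the other exists. The hard part I'm asking about is doing this at the level of algorithms, not arithmetic.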
Cheers,
- Michal

-------------------------------------------------------------------------
http://www.manifestation.com/
http://www.linkwatcher.com/metalog/
-------------------------------------------------------------------------