Sunday, May 08, 2011

Hours of Sunshine Annually in USA

I made a Google Earth map of hours of sunshine per year for various places in the U.S. I got the data from "Historical Sunshine and Cloud Data in the United States, CDIAC NDP-021/R1", a database from the Global Change Master Directory - a government website.  I matched the place names in the database with locations I found in another database published by the USGS.



I constructed the document with two Perl scripts: one to match place names with locations and another to write the KML.  I had to match some places names with locations by hand.

Sunday, April 10, 2011

Language Recognition with gzip

So I'm watching a lecture in CS221, and Prof. Norvig shows a slide with this Unix command on it:
  cat | gzip | wc
and informs us that it can be used to identify what language a text is written in.  Basically the command is "combine some files, zip them up, then measure the size of the resulting zip file.  Jiggawhat?  Yes, there is a thing called "compression-based classification", and it's actually pretty effective.  What happens when you zip a file?  A compression algorithm finds repeated patterns in the file and replaces instances of them with shorter patterns.  The more and longer the repeated patterns, the more the file can be compressed.  What happens when you zip two files together?  If they have similar content, they will compress well.  If they have different content, they won't compress as well.  So, if you have some text and you want to find out what language it is, zip it with example texts for any language it might be.  The language it compresses best with is the language it is mostly likely to be.  If the languages you are considering aren't too similar (e.g. Dutch & German, not Norwegian & Swedish), this technique is actually very reliable.

People have made serious studies of this.  I tried it out informally, and it actually works.  Here's a Perl script that does compression-based classification:
Note the system call in size_when_zipped.  I threw together a "corpus" for each of several languages by picking text from books on Project Gutenberg; English, French, German and Dutch and found the script to work consistently for various texts.

Changing the topic: I rewrote the script in Python and Haskell, just to see how the code would compare.  The translation from Perl to Python was pretty direct.  Again, I use Unix utilities to zip the files together and measure their length.  The Python version is both fewer lines and fewer characters, but only because I used less whitespace and no lines were devoted beginnings and endings of code blocks.  The translation to Haskell was less direct.  I don't have a Unix or GNU system to run Haskell programs on, so, when I wrote the Haskell version,  I had to write functions to perform the globbing and concatenation done by cat and the zipping done by gzip.  It it weren't for that, the Haskell code would have been the smallest.  Haskell blows both Python and Perl away in terms of terseness, without being (IMO) less readable.
I must say that Python is ridiculously easy to code with.  I've only written a few hundred lines of Python, and I already feel like I've got the language figured out.  Still, Perl is more fun.  Probably because in Perl "There's more than one way to do it", and using the ins and outs of the language to find a way to do that is both elegant and readable is an interesting challenge.  With Python "There should be one—and preferably only one—obvious way to do it"  and there are no challenges if you already know how to write conventional C#/Java code.

Thanks to Job Vranish for the lovely definition of bestOf in the Haskell version. The version I wrote was several lines longer.  With Haskell "There is more than one way to do it, the obvious way to do it is with recursion, but you should never use recursion because somebody already wrote higher-order function to do it".

    Wednesday, August 18, 2010

    Disagreement

    I can't disagree with you unless you disagree with me.  If it's rude to contradict someone, then whoever speaks first wins.

    Sunday, August 08, 2010

    Monadic Predicate Logic

    Suppose we restricted our thoughts to the concepts of  something, everything, nothing, logical connectives like and, or, if...then or not, and monadic predicates, which are attributions of a property to an individual object, like "the sky is blue" or "tomorrow is Tuesday". (In the first statement, the sky is the object and blueness is its property.  In the second statement, tomorrow is the object and being a Tuesday is its property.  That's pretty restrictive, but it's enough to capture the logic behind such truths as the argument:
    Socrates is a man.
    Man is mortal.
    Therefore, Socrates is mortal.
    Which can be restated in a way that makes its logic more obvious:
    If Socrates is a man and if something is a man then it is mortal, then Socrates is mortal.
    The validity of any proposition that can be constructed from only the above-mentioned concepts is decidable.  By "validity" I mean the proposition is true independent of anything else.  "If something is both large and round, then it is large" is true whether or not anything large and/or round actually exists.  The above-mentioned proof of Socrates' mortality is valid, too.  By "proposition" I mean the meaning of a declarative, true-or-false statement.  By "decidable" I mean someone could write a computer program that would the determine any one of these proposition's validity, and it would work for any proposition.

    So why have I been explaining all this?  Because I wrote a computer program that does just that.  It's not the first to do so, and it only works for propositions that contain five or fewer predicates, but I spend a fair amount working on it, so I'm going to share it, anyway.
     
    The Application

    The application runs from the command line.  It takes one or more text files as command-line arguments.  When you execute it, it will read each file and try to interpret it as a statement of monadic predicate logic.  If it can, it will decide whether the contents are necessarily true (valid), possibly true (valid only if additional premises were added), or self-contradictory (necessarily false).

    So, if I had an input file that contains the following text:
    // All men are mortal.
    x,Hx->Mx
    // Socrates is a man.
    Hs
    // Therefore,
    ->
    // Socrates is mortal.
    Ms
    I would test it through the command prompt like this:


    The Language

    mpl.exe recognizes a particular language of symbolic logic.  The language uses only ASCII characters available on American English keyboard.  Its elements resemble either elements of ordinary symbolic logic or C-style bitwise operators.  My intent is for the language to be unoriginal enough for someone who knows symbolic logic to be able to use it without much trouble.  I'm too lazy to give a precise definition of the language, so I'll give a brief synopsis.  If anyone has questions, I'll be happy to answer them.

    The elements of the language are as follows:
    • Ax - a predicate, A, predicated on one variable x.  Predicates can be any character from A to Z.  Variables can be any character from a to z.  These are what represents attributions of a property to an object.  mpl.exe will handle statements with an unlimited number of these, so long as they use no more than five different predicates.
    • (,) - grouping by parentheses, just like in algebra.
    • .,:,:., etc. - old school grouping, like in Mathematical Logic.
    • x,... - universal quantification, i.e. "for all x, ... is true".  The generalization extends to the end of the group it is in, e.g. "x,Ax&y,By->Cx" means the same as "(x,Ax&(y,By->Cx))", but not the same as "(x,Ax)&(y,By->Cx)".
    • 3x,... - existential quantification, i.e. "there is an x such that ... is true".  It also extends to the end of the group it is in.
    • ~... - negation.
    • ...&.. - logical AND.
    • ...|... - logical OR.
    • ...->... - material conditional, i.e. IF...THEN...
    • ...<=>... - logical equivalence. &,|,-> and <=> all have the same precedence and are left-associative.
    • // ... - comments; anything after them is ignored by mpl.exe.
    Basically, mpl.exe expects an input file to contain a single statement.  The contents of a file might look like this:
    ((x,Hx->Mx)&Hs)->Ms
    This is the basic categorical syllogism.  You can add comments by adding "//"  and text behind it:
    ((x,Hx->Mx)&Hs)->Ms //the basic categorical syllogism
    mpl.exe will ignore anything behind  "//" .  If a line only contains a comment, mpl.exe will ignore it completely:
    // the basic categorical syllogism
    ((x,Hx->Mx)&Hs)->Ms
    A statement can be spread across multiple lines.  mpl.exe treats separate lines as if they were separate operand in a logical conjunction, so
    Ax|Bx
    Bx->Cx
    Cx
    means the same as
    (Ax|Bx)&(Bx->Cx)&(Cx)
    If a binary operator is on a line by itself, every line before it is conjoined and treated as its left operand; everything after it is conjoined and treated as its right operand.  So
    x,H->Mx
    Hs
    ->
    Ms
    means the same as
    ((x,Hx->Mx)&Hs)->Ms
    This allows us to symbolize logical arguments and test their validity.  Think of the first two lines in the text above as the premises of an argument and the last line as the conclusion.  As with computer programs, comments can be helpful:
    // All men are mortal.
    x,Hx->Mx
    // Socrates is a man.
    Hs
    // Therefore,
    ->
    // Socrates is mortal.
    Ms

    The Algorithm

    I won't say much about the algorithm right now, but I will say it is based on "possible worlds" semantics for logic.  I will also say that it runs in O(2^n^n) time, where n is the number of predicates mentioned in a statement; it does not scale well.  On my computer, mpl.exe finishes instantaneously for statements with three or fewer predicates and finishes in about a second for a statement with four predicates.  I haven't tried it, but I estimate it would take about a day to finish evaluating a statement with five predicates.  I didn't bother to implement mpl.exe in such a way that it would support more than five predicates, considering how long it would take to do that.

    The Download

    Click here to download the application.

    The Future


    I plan on extending mpl.exe to handle identity and modal operators for possibility and necessity.  Maybe soon.

    Sunday, May 02, 2010

    Litany Writer

    Here's an MS Excel spreadsheet that take scripture references, looks them up on Biblegateway.com and writes them to an MS Word document.  I made it to help my dad write litanies, but it's also good for looking up several passages of scripture at once.  To use it, you'll need MS Office and you'll need to enable macros in Excel.

    Tuesday, March 16, 2010

    Ancestors

    For all my Cumingses: I've found images of the baptismal records of Isaac and John Cummin/Commen on-line. According to Isaac Cummings Family Association and my dad's research, they are the first two generations in our line to live in America. They emigrated from Essex in Southeast England in 1635.

    The baptismal records are in registers photographed by Seax - Essex Archives Online. The register with Isaac's name is here and the register with John's name is here.

    The following are images of the records themselves and my (hopefully mostly accurate) transcriptions of what they say:


    Anno 1601
    Aprill 5 Izecht Cummin the sonne of Jhon Cummin baptised




    5. John Commen sonne of Izaack & Anne his wiffe was baptised on the 9th daye of May

    Thursday, October 29, 2009

    Calvinism, According to Custardy

    I like this guy's explanation of TULIP: http://custardy.blogspot.com/2009/10/calvinism.html. Of the few who read my blog, I think the majority are Reformed.  What do you all think of it?