<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[EDOOFUS]]></title>
  <link href="http://kisom.github.com/atom.xml" rel="self"/>
  <link href="http://kisom.github.com/"/>
  <updated>2012-04-26T16:53:13+03:00</updated>
  <id>http://kisom.github.com/</id>
  <author>
    <name><![CDATA[Kyle Isom]]></name>
    <email><![CDATA[coder@kyleisom.net]]></email>
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[So, You Want To Unit Test in Xcode (Part 2)]]></title>
    <link href="http://kisom.github.com/blog/2012/03/16/so-you-want-to-unit-test-in-xcode-part-2/"/>
    <updated>2012-03-16T12:13:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/03/16/so-you-want-to-unit-test-in-xcode-part-2</id>
    <content type="html"><![CDATA[<p>In the <a href="http://kisom.github.com/blog/2012/03/15/so-you-want-to-unit-test-in-xcode">last post</a>, I
talked about getting unit testing set up in Xcode, why you should write
unit tests, and what kinds of things you should unit test. Now, I&#8217;d like
to talk a bit more about <em>how</em> to write unit tests. If you come from a
background doing unit testing, as I did, it&#8217;s very straightforward. If not,
I&#8217;ll spend a little time explaining things a bit more.</p>

<!-- more -->


<p>When you generate a test case, you get a test class (which is a subclass of
<code>SenTestCase</code>). Just like any other class, you can declare members and methods,
which are used to perform helper tasks and carry state.</p>

<p>A very basic test case only requires test methods.
<a href="https://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/UnitTesting/00-About_Unit_Testing/about.html">OCUnit</a>
will load any method prefixed by <code>test</code>. These methods must return <code>void</code> and
take no parameters. After setting up the test in the method, you can use the
<code>ST...</code> macros to actually test the results. Here&#8217;s a contrived example:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='objc'><span class='line'><span class="o">-</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">testAdder</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="kt">int</span> <span class="n">result</span> <span class="o">=</span> <span class="n">add</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span> <span class="c1">// should return 2</span>
</span><span class='line'>    <span class="n">STAssertTrue</span><span class="p">(</span><span class="n">result</span> <span class="o">==</span> <span class="mi">2</span><span class="p">,</span> <span class="s">@&quot;1+1 should be 2!&quot;</span><span class="p">);</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>There are a number of test macros, which are listed in
<a href="https://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/UnitTesting/AB-Unit-Test_Result_Macro_Reference/result_macro_reference.html#//apple_ref/doc/uid/TP40002143-CH9-SW1">Appendix B</a>
of the <a href="https://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/UnitTesting/00-About_Unit_Testing/about.html">Xcode Unit Testing Guide</a>.</p>

<p>A real example taken from <a href="https://github.com/kisom/flexargs">FlexArgs</a>:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
</pre></td><td class='code'><pre><code class='objc'><span class='line'><span class="o">-</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">test_init_with_NSArray</span>
</span><span class='line'><span class="p">{</span>
</span><span class='line'>    <span class="n">NSArray</span> <span class="o">*</span><span class="n">testArgs</span> <span class="o">=</span> <span class="p">[</span><span class="n">NSArray</span> <span class="nl">arrayWithObjects:</span>
</span><span class='line'>                         <span class="s">@&quot;foo=bar&quot;</span><span class="p">,</span>
</span><span class='line'>                         <span class="s">@&quot;baz=1&quot;</span><span class="p">,</span>
</span><span class='line'>                         <span class="s">@&quot;quux=-2.5&quot;</span><span class="p">,</span>
</span><span class='line'>                         <span class="s">@&quot;spam=footastic&quot;</span><span class="p">,</span>
</span><span class='line'>                         <span class="s">@&quot;eggs=false&quot;</span><span class="p">,</span> <span class="nb">nil</span><span class="p">];</span>
</span><span class='line'>    <span class="n">NSDictionary</span> <span class="o">*</span><span class="n">expected</span> <span class="o">=</span> <span class="p">[</span><span class="n">NSDictionary</span> <span class="nl">dictionaryWithObjectsAndKeys:</span>
</span><span class='line'>                              <span class="s">@&quot;bar&quot;</span><span class="p">,</span> <span class="s">@&quot;foo&quot;</span><span class="p">,</span>
</span><span class='line'>                              <span class="p">[</span><span class="n">NSNumber</span> <span class="nl">numberWithLongLong:</span><span class="mi">1</span><span class="p">],</span> <span class="s">@&quot;baz&quot;</span><span class="p">,</span>
</span><span class='line'>                              <span class="p">[</span><span class="n">NSNumber</span> <span class="nl">numberWithDouble:</span><span class="o">-</span><span class="mf">2.5</span><span class="p">],</span> <span class="s">@&quot;quux&quot;</span><span class="p">,</span>
</span><span class='line'>                              <span class="s">@&quot;footastic&quot;</span><span class="p">,</span> <span class="s">@&quot;spam&quot;</span><span class="p">,</span>
</span><span class='line'>                              <span class="p">[</span><span class="n">NSNumber</span> <span class="nl">numberWithBool:</span><span class="n">false</span><span class="p">],</span> <span class="s">@&quot;eggs&quot;</span><span class="p">,</span> <span class="nb">nil</span><span class="p">];</span>
</span><span class='line'>    <span class="n">DNFlexArgs</span> <span class="o">*</span><span class="n">flexArgs</span> <span class="o">=</span> <span class="p">[[</span><span class="n">DNFlexArgs</span> <span class="n">alloc</span><span class="p">]</span> <span class="nl">initParserWithNSArray:</span><span class="n">testArgs</span><span class="p">];</span>
</span><span class='line'>    <span class="n">NSDictionary</span> <span class="o">*</span><span class="n">parsed</span> <span class="o">=</span> <span class="p">[</span><span class="n">flexArgs</span> <span class="n">retrieveArgs</span><span class="p">];</span>
</span><span class='line'>
</span><span class='line'>    <span class="n">STAssertTrue</span><span class="p">([</span><span class="n">parsed</span> <span class="nl">isEqualToDictionary:</span><span class="n">expected</span><span class="p">],</span>
</span><span class='line'>                 <span class="s">@&quot;initParserWithNSArray failed to return the expected NSDictionary!&quot;</span><span class="p">);</span>
</span><span class='line'><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>This test is set up to verify that the dictionary returned by <code>DNFlexArgs</code> is
what we&#8217;d expect it to be. <code>[flexArgs initParserWithNSArray:]</code> expects an
<code>NSArray</code> of <code>NSString</code>s to be passed to it, so I&#8217;ve initialised an <code>NSArray</code> of
<code>NSString</code>s. Because <code>DNFlexArgs</code> returns an <code>NSDictionary</code>, I&#8217;ve set up one
with the expected results. Then I set up the instance, pass it the initial
test arguments, and retrieve the results. The last step is to assert that the
returned <code>NSDictionary</code> matches the expected one. I&#8217;ve made sure to hit one of
each kind of possible argument that can be passed in.</p>

<p>In the last post, I listed some questions to help develop tests:</p>

<blockquote><ul>
<li>Right now, I&#8217;m only testing what I expect possible input <em>could</em> be. What
if someone passes in just <code>arg</code> or <code>arg=</code> - do I know that my code will handle
that gracefully?</li>
<li>What happens if I pass in an overflowed numeric value? I&#8217;ve tried to prepare
for this by using <code>long long</code> values, but how do I know my code is doing the
right thing?</li>
<li>What if I craft a special string that does something shifty, like embedding
a null byte? What happens then? What if the string has more than one <code>=</code> in it?</li>
<li>What happens if I <a href="http://pages.cs.wisc.edu/~bart/fuzz/">feed the parser random data</a>?</li>
</ul>
</blockquote>

<p>If you look in the source, you&#8217;ll see I&#8217;ve covered all but the last. That&#8217;s
because I haven&#8217;t yet found a good fuzzing library for Objective-C. However,
the new tests allowed me to improve FlexArgs and verify its behaviour in a
few ways:</p>

<ul>
<li>I&#8217;ve verified that having an argument with no <code>=value</code> yields a boolean
value (i.e. <code>arg</code> results in an <code>arg = 1;</code>)</li>
<li>I&#8217;ve verified that having an argument with an empty value (i.e. <code>arg=</code>)
yields an empty string value (i.e. <code>arg = @"";</code>)</li>
<li>I was able to add support for multiple <code>=</code> in the class. Previously, only
the part of the value after the first <code>=</code> and before any other <code>=</code>&#8217;s was
captured. For example, passing <code>foo=bar=baz</code> resulted in <code>foo = @"bar";</code>
and <code>operator===</code> resulted in <code>operator = @"";</code>. Now, I get <code>foo = @"bar=baz";</code>
and <code>operator = @"==";</code>.</li>
<li>I was able to verify that passing a null byte in the middle of the string
just cuts the string off at the null byte instead of causing problems.</li>
</ul>
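
<p>The splitting rules in that list can be sketched in a few lines. This is a
Python illustration rather than the Objective-C in FlexArgs, and <code>parse_arg</code>
is a hypothetical helper, not part of the library. It only covers the <code>=</code>
and null-byte handling, not the numeric conversion.</p>

```python
def parse_arg(arg):
    """Split one command-line argument into a (key, value) pair.

    Hypothetical helper: it mirrors the behaviour described above,
    not the actual DNFlexArgs implementation.
    """
    # Cut the argument at the first NUL byte, as a C string would be.
    arg = arg.split("\0", 1)[0]
    # partition() splits on the first '=' only, so later '=' stay in the value.
    key, sep, value = arg.partition("=")
    if not sep:
        # A bare argument with no '=' is treated as a boolean flag.
        return key, True
    return key, value


if __name__ == "__main__":
    for example in ["arg", "arg=", "foo=bar=baz", "operator===", "foo=ba\0r"]:
        print(repr(example), "maps to", parse_arg(example))
```

<p>Splitting on the first <code>=</code> only is what keeps <code>foo=bar=baz</code> from
losing the tail of its value.</p>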


<p>You can see the new code coverage output (as detailed in the
<a href="http://kisom.github.com/blog/2012/03/15/so-you-want-to-unit-test-in-xcode/">last post</a>)
<a href="http://kisom.github.com/downloads/FlexArgsCoverage2/">here</a>.</p>

<h2>More advanced test cases with <code>setUp</code>, <code>tearDown</code>, and class members</h2>

<p>OCUnit gives us more control over our test cases. Just like any other class,
we can include our own members in the class by putting their definitions
and any <code>@property</code> declarations in the interface. For example, if we&#8217;re
testing network code, we might want to create a socket.</p>

<p>If your members need to be set up for every test, or if certain preparation
needs to be done before each test (like clearing out a temporary directory),
you can reduce code duplication by putting the code in the <code>setUp</code> and
<code>tearDown</code> methods. The <code>setUp</code> method is called before each test method,
and the <code>tearDown</code> method is called after each. If you&#8217;re calling the same
code before each test, consider moving it into <code>setUp</code>. If most of your tests
call the same code and a few don&#8217;t, consider creating a new test case
for the ones that don&#8217;t, and moving the duplicated code into these methods and
members.</p>
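
<p>For anyone coming from Python, the same lifecycle exists in <code>unittest</code>,
which uses the same <code>setUp</code> and <code>tearDown</code> names. The sketch below is an
analogy in Python, not OCUnit code; <code>ScratchDirTests</code> and <code>run_suite</code>
are made-up names for illustration.</p>

```python
import os
import shutil
import tempfile
import unittest


class ScratchDirTests(unittest.TestCase):
    """Hypothetical test case; each test gets a fresh temporary directory."""

    def setUp(self):
        # Runs before every test method.
        self.workdir = tempfile.mkdtemp()

    def tearDown(self):
        # Runs after every test method, even when the test fails.
        shutil.rmtree(self.workdir)

    def test_directory_starts_empty(self):
        self.assertEqual(os.listdir(self.workdir), [])

    def test_can_write_file(self):
        path = os.path.join(self.workdir, "out.txt")
        with open(path, "w") as handle:
            handle.write("hello")
        self.assertEqual(os.listdir(self.workdir), ["out.txt"])


def run_suite():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScratchDirTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

<p>Neither test has to create or remove the directory itself, which is the
duplication the two methods are meant to absorb.</p>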

<h2>guard-xcode</h2>

<p>In the <a href="http://kisom.github.com/blog/2012/03/15/so-you-want-to-unit-test-in-xcode/">last post</a>, I
mentioned <a href="https://github.com/guard/guard/">guard</a>. What is guard?</p>

<blockquote><p>Guard is a command line tool to easily handle events on file system modifications.</p></blockquote>

<p>We can use this to trigger a build every time a source file is changed.
Unfortunately, I couldn&#8217;t find any good Guards (the term for specific tasks to
be done on a changed-file event) to handle running configurable builds. To
address this, I wrote a Guard called <code>guard-xcode</code>, which kicks off an Xcode
build based on the options you configure it with. The source is, of course,
<a href="https://github.com/kisom/guard-xcode">on GitHub</a> and it&#8217;s on
<a href="https://rubygems.org/gems/guard-xcode">RubyGems.org</a>, so it can be installed
via <code>gem</code> or <code>bundle install</code>. The
<a href="https://github.com/kisom/guard-xcode/blob/master/README.md"><code>README</code></a>
explains how to get started.</p>

<p>For FlexArgs, the setup is fairly straightforward. I already have
<a href="http://growl.info/downloads#generaldownloads">growl-notify</a> installed, so all
I have to do is create my Gemfile:</p>

<figure class='code'><figcaption><span>Gemfile </span></figcaption>
 <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='ruby'><span class='line'><span class="n">source</span> <span class="ss">:rubygems</span>
</span><span class='line'>
</span><span class='line'><span class="n">gem</span> <span class="s2">&quot;guard-xcode&quot;</span>
</span><span class='line'><span class="n">gem</span> <span class="s2">&quot;guard&quot;</span>
</span><span class='line'><span class="n">gem</span> <span class="s2">&quot;growl&quot;</span>
</span><span class='line'><span class="n">gem</span> <span class="s2">&quot;rb-readline&quot;</span>   <span class="c1"># better interface for MRI</span>
</span></code></pre></td></tr></table></div></figure>


<p>I&#8217;m using <a href="http://rvm.beginrescueend.com">rvm</a>, so I&#8217;ve already got the
<code>bundler</code> gem installed. You can install it with <code>gem install bundler</code> if you
need to. The next step is to run <code>bundle install</code>, and then <code>guard init xcode</code>.
Of course, the Guardfile doesn&#8217;t know the name of your target, so you&#8217;ll need
to open the Guardfile and edit it. Mine looks like this:</p>

<figure class='code'><figcaption><span>Guardfile </span></figcaption>
 <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class='ruby'><span class='line'><span class="c1"># template Xcode guard</span>
</span><span class='line'>
</span><span class='line'><span class="n">notification</span> <span class="ss">:growl</span>
</span><span class='line'>
</span><span class='line'><span class="n">guard</span> <span class="ss">:xcode</span><span class="p">,</span> <span class="ss">:target</span> <span class="o">=&gt;</span> <span class="s1">&#39;FlexArgs&#39;</span><span class="p">,</span> <span class="ss">:quiet</span> <span class="o">=&gt;</span> <span class="kp">true</span><span class="p">,</span> <span class="ss">:clean</span> <span class="o">=&gt;</span> <span class="kp">true</span> <span class="k">do</span>
</span><span class='line'>  <span class="n">watch</span><span class="p">(</span><span class="sr">/^.+\.[hmc]$/</span><span class="p">)</span>
</span><span class='line'><span class="k">end</span>
</span></code></pre></td></tr></table></div></figure>


<p>Now all you need to do is run <code>guard</code> in your project root. Once files change,
<code>guard</code> will kick off a build. I&#8217;ve set <code>:quiet =&gt; true</code>, so I only get Growl
notifications if the build has warnings or errors.</p>
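
<p>To check which paths the <code>watch</code> pattern would pick up, the same regular
expression can be tried out in Python. This is only a sketch: <code>triggers_build</code>
is a made-up name, and it assumes Guard hands the pattern plain relative file
paths.</p>

```python
import re

# Same pattern as the Guardfile's watch(); for single-line paths the
# Ruby and Python regex engines treat it the same way.
WATCH_PATTERN = re.compile(r"^.+\.[hmc]$")


def triggers_build(path):
    """Return True if a changed path would match the watch pattern."""
    return WATCH_PATTERN.match(path) is not None


if __name__ == "__main__":
    for path in ["FlexArgs/DNFlexArgs.m", "DNFlexArgs.h", "main.c",
                 "README.md", "Parser.mm"]:
        print(path, triggers_build(path))
```

<p>Only files ending in <code>.h</code>, <code>.m</code>, or <code>.c</code> match, so a project with
<code>.mm</code> or <code>.cpp</code> sources would need a different pattern.</p>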

<h2>Some final ideas for a test-based development cycle</h2>

<p>There are a few options you can set in your project build settings that I&#8217;ve
found quite useful:</p>

<ol>
<li>Setting <strong>Test After Build</strong> to <strong>Yes</strong> runs tests any time a build is done.</li>
<li>Setting <strong>Treat Warnings as Errors</strong> to <strong>Yes</strong> adds more emphasis to
writing good code.</li>
<li>Adding the Test product to the main build makes testing easier as well, and
means you can set your build target to the main target, and still test.</li>
</ol>


<h2>Conclusion</h2>

<p>This concludes the two-part series on Unit Testing in Xcode. I&#8217;ve tried to
document what I learned trying to get testing set up, and hopefully other
people will find it helpful as well.</p>

<h2>References</h2>

<ul>
<li><a href="https://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/UnitTesting/00-About_Unit_Testing/about.html">Xcode Unit Testing Guide</a></li>
<li><a href="http://www.infinite-loop.dk/blog/2011/12/code-coverage-with-xcode-4-2/">Code Coverage with Xcode 4.2</a></li>
<li><a href="https://github.com/kisom/flexargs/zipball/blog-post2">FlexArgs tag for this project</a></li>
<li>guard-xcode on <a href="https://github.com/kisom/guard-xcode">GitHub</a> and <a href="http://rubygems.org/gems/guard-xcode">RubyGems</a></li>
<li><a href="https://github.com/guard/guard">Guard on GitHub</a></li>
<li><a href="http://growl.info/downloads#generaldownloads">growl-notify</a></li>
<li><a href="http://rvm.beginrescueend.com">rvm</a></li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[So, You Want To Unit Test in Xcode]]></title>
    <link href="http://kisom.github.com/blog/2012/03/15/so-you-want-to-unit-test-in-xcode/"/>
    <updated>2012-03-15T19:19:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/03/15/so-you-want-to-unit-test-in-xcode</id>
    <content type="html"><![CDATA[<p>One of my personal preferences when testing
<a href="http://heim.ifi.uio.no/~trygver/themes/mvc/mvc-index.html">MVC</a> code is to
test my model using a commandline test driver, so when
<a href="http://samuelgoodwin.tumblr.com">Samuel Goodwin</a>
and I were talking about testing code and he brought that up, we started
discussing ways to more effectively write those commandline drivers. Long story
short, we decided a useful strategy would be to provide a library to parse
arguments like &#8216;key=value&#8217; into a dictionary. Since we will be doing a lot of
iOS app work, and quite possibly desktop Cocoa work later on, we decided
writing a class to do this in Objective-C would be useful, and about 24 hours
later, <a href="https://github.com/kisom/flexargs/">FlexArgs</a> was dumped onto the
world. But that&#8217;s not what I want to talk about here. Rather, I&#8217;d like to
discuss what I&#8217;ve learned about unit testing in Objective-C. As a developer
who does a lot of testing in C and Python already, I immediately made it a
priority to learn how to do this. In this post, I&#8217;ll go over basic unit
testing, doing code coverage, and writing better tests. I&#8217;m particularly
aiming this post at people who know how to code and test, but want to do it
more effectively in Objective-C or want to learn how to get started.</p>

<!-- more -->


<p>Through version 0.9.0, which was a functional version of the code missing a few
other pieces, I was using the autotools suite to manage the build. (Why? I know
autotools, and can set up the build environment quickly, whereas I don&#8217;t use
Xcode enough to be terribly good at it.) Unfortunately, I couldn&#8217;t find any
good libraries to do unit testing that way. One of the goals of the project was to learn Objective-C
better, so I buckled down and imported the project into Xcode. I subscribe to
the idea that it&#8217;s a good idea to start writing tests before you write code,
so writing tests after the bulk of the code was written felt a little janky.
I digress.</p>

<p>The first thing you&#8217;ll need to do is add a new target (<code>File</code>-><code>New</code>-><code>Target</code>),
name it <code>(@"%@Tests", ProjectName)</code> (e.g. MyClassTests), and save it.
Because this is a library I want to give other people, it&#8217;s set up as a CLI
application with a minimal main that gives an example of how to use the code, and
so we don&#8217;t really need coverage for that. However, the test suite should be
exercising large portions of our code, so setting up code coverage for that is
a good way to make sure your tests are exercising your code fully.</p>

<p>I&#8217;d like to point out here that while code coverage is great for making sure
all of your code is being touched, it&#8217;s not a replacement for well-thought-out
tests. It&#8217;s a useful tool in the tool box while you&#8217;re developing code, and
great for profiling code to determine bottlenecks (but not
<a href="http://c2.com/cgi/wiki?PrematureOptimization">prematurely</a>!), but you still
need to make sure you&#8217;re testing your code fully. I try to take the time to
consider edge cases, places where the code might act inappropriately, and try
to implement some fuzzing to throw unexpected things at the code.</p>

<p>You can set up your main target to run tests as well. You&#8217;ll need to
edit the scheme:</p>

<p><img src="http://kisom.github.com/images/unit_testing_xcode/xcode4_edit_scheme.png" alt="editing the scheme in xcode4" /></p>

<p>Once there, select <code>Test</code>, and click the <code>+</code> to add the test case bundle.</p>

<p>Now, we can set up code coverage. Under the project settings, select the test
target. Under the build settings, change the <strong>Generate Test Coverage Files</strong>
and <strong>Instrument Program Flow</strong> options to <strong>Yes</strong>. Now the fun part is adding
in <code>libprofile_rt.dylib</code>. I found it under
<code>/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib</code>
in OS X 10.8. Under the <code>Build Phases</code> tab, you&#8217;ll need to add this to the
<code>Link Binary with Libraries</code> section. Once you start running tests, you&#8217;ll start
getting coverage data.</p>

<p>I lied about there being only one fun part. The other fun part is getting that
data, which you will find in the <code>Projects</code> tab in the Organizer. There is a
line called <code>Derived Data</code> with an arrow next to it to open that folder in
Finder. You&#8217;ll want to open that folder in your terminal emulator of choice.
For FlexArgs, I had to navigate to
<code>DERIVED_DATA/Intermediates/FlexArgs.build/Debug/FlexArgsTest.build/Objects-normal/x86_64/</code>.
There, you should see some files named <code>*.gcno</code>, <code>*.gcda</code>, and so forth.</p>

<p>At this point you&#8217;ll want to install <code>lcov</code>, which is, most fortuitously, in
Homebrew. <code>lcov</code> gives us pretty HTML output of our code coverage (via the
included <code>genhtml</code> program). You&#8217;ll also want to stick this small script in
your path:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="c">#!/bin/sh</span>
</span><span class='line'><span class="c"># ccovhtml.sh</span>
</span><span class='line'><span class="c"># usage:</span>
</span><span class='line'><span class="c">#   ccovhtml.sh ClassName OutputDirectory</span>
</span><span class='line'><span class="c"># example:</span>
</span><span class='line'><span class="c">#   ccovhtml.sh FlexArgsTests ~/tmp/FlexArgsCoverage</span>
</span><span class='line'>
</span><span class='line'>lcov --base-directory . --directory . -c -o <span class="s2">&quot;$1.info&quot;</span>
</span><span class='line'>genhtml -o <span class="s2">&quot;$2&quot;</span> -t <span class="s2">&quot;$1 code coverage&quot;</span> --num-spaces 4 <span class="s2">&quot;$1.info&quot;</span>
</span></code></pre></td></tr></table></div></figure>


<p>The first argument is the base name of the output file to generate (<code>.info</code> is appended), and is also used
to generate the title for the HTML output. The second is the output directory
to store the files in. Once you&#8217;ve run your tests, run that script and check
out the results. For my code, it built <a href="http://kisom.github.com/downloads/FlexArgsCoverage/">this page</a>.
Note that this is with two tests that don&#8217;t yet test the full functionality
(specifically <code>+(id)parserWithNSArray:(NSArray *)inargv</code> and
<code>-(id)initParser:(char **)inargv nargs:(int)nargs</code>), so you can see that the
output highlights those lines. If you take a look at the <a href="https://github.com/kisom/FlexArgs">FlexArgs</a>
source at <a href="https://github.com/kisom/flexargs/tree/bec86374f3876e8a8c44a17849a3f49c76245d1e">commit bec86374f3</a>
or the helpful tag <code>blog-post</code> (you can grab a <a href="https://github.com/kisom/flexargs/zipball/blog-post">zipfile snapshot</a>)
you&#8217;ll see I only have two tests, and they both touch basic functionality.</p>

<p>Intelligent, well-planned, and well-executed tests offer many benefits:</p>

<ol>
<li>Validate the program&#8217;s logic</li>
<li>Drive development by letting you see what&#8217;s not implemented yet</li>
<li>Perform <a href="https://en.wikipedia.org/wiki/Regression_testing">regression tests</a>
to let you know if changes broke your code, or if they&#8217;ve broken other parts
of the codebase</li>
<li>Test edge cases to make sure your code doesn&#8217;t do anything unexpected</li>
<li>Assist in identifying possible security issues.</li>
</ol>


<p>So given this code, what kinds of tests could I write to improve the functionality?</p>

<ul>
<li>Right now, I&#8217;m only testing what I expect possible input <em>could</em> be. What
if someone passes in just <code>arg</code> or <code>arg=</code> - do I know that my code will handle
that gracefully?</li>
<li>What happens if I pass in an overflowed numeric value? I&#8217;ve tried to prepare
for this by using <code>long long</code> values, but how do I know my code is doing the
right thing?</li>
<li>What if I craft a special string that does something shifty, like embedding
a null byte? What happens then? What if the string has more than one <code>=</code> in it?</li>
<li>What happens if I <a href="http://pages.cs.wisc.edu/~bart/fuzz/">feed the parser random data</a>?</li>
</ul>


<p>As you can see, there&#8217;s a lot to think about. While writing tests ahead of time
to verify basic functionality is a great idea (I wrote about
<a href="http://www.kyleisom.net/blog/2011/07/04/rgtdd/">README-Generated Test-Driven Development</a>
previously), your tests need to go further to fully verify your code. Just by
looking at the questions above and thinking about the tests, I can already
see that my code needs some work to address some of those questions. I can
write the tests to validate the changes I&#8217;ll need to make.</p>

<p>I mentioned at the beginning that I like to test models using command line
test drivers. What this means is that I write a small command line target that
I can call from something like <code>make tests</code> or even <code>python testrunner.py</code> so
I can constantly run my tests. This way, I don&#8217;t need to worry about the view
or the controllers to develop the model. This follows my ideal of developing
model first, and letting the controller and view follow from that. In Xcode,
we can do this from the commandline inside the project using
<code>xcodebuild -target FlexArgsTest -configuration Debug clean build</code>.
Before you run this, set <em>Test After Build</em> to <em>Yes</em> in the Build Settings
to ensure the tests will run after building. (At some point, I&#8217;ll write a
<a href="https://github.com/guard/guard">Guardfile</a> to automate testing.)</p>
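
<p>A runner along the lines of <code>python testrunner.py</code> could simply wrap that
<code>xcodebuild</code> call. A minimal sketch, assuming <code>xcodebuild</code> is on the path
and that a failing test run makes it exit non-zero; the function names are made
up for illustration.</p>

```python
import subprocess


def xcodebuild_command(target, configuration="Debug"):
    """Build the argument list for the xcodebuild invocation shown above."""
    return ["xcodebuild", "-target", target,
            "-configuration", configuration, "clean", "build"]


def run_tests(target):
    """Run the build and return its exit status (assumed non-zero on failure)."""
    return subprocess.call(xcodebuild_command(target))
```

<p>Calling <code>run_tests("FlexArgsTest")</code> from the project root would then kick
off the same build as the command above.</p>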

<p>I hope you find this useful. Now, if you&#8217;ll excuse me - I have more tests to write&#8230;</p>

<p>Update: I&#8217;ve written a <a href="http://kisom.github.com/blog/2012/03/16/so-you-want-to-unit-test-in-xcode-part-2/">part 2</a>,
which covers a bit more and includes <del>black magic where the dark lord has destroyed
my soul and brought death, destruction, and chaos upon the world</del> a Ruby
gem I wrote to assist in testing.</p>

<h3>References</h3>

<ul>
<li><a href="https://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/UnitTesting/00-About_Unit_Testing/about.html">Xcode Unit Testing Guide</a></li>
<li><a href="http://www.infinite-loop.dk/blog/2011/12/code-coverage-with-xcode-4-2/">Code Coverage with Xcode 4.2</a></li>
<li><a href="http://drdobbs.com/tools/206105233">Regression Testing</a></li>
<li><a href="http://pages.cs.wisc.edu/~bart/fuzz/">Fuzzing</a></li>
<li><a href="http://www.kyleisom.net/blog/2011/07/04/rgtdd/">README-Generated Test-Driven Development</a></li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Autonomous Vehicles]]></title>
    <link href="http://kisom.github.com/blog/2012/02/26/autonomous-vehicle/"/>
    <updated>2012-02-26T17:33:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/02/26/autonomous-vehicle</id>
    <content type="html"><![CDATA[<p>I just finished up the first unit of <a href="http://www.udacity.com">Udacity&#8217;s</a>
<a href="">CS373 (Programming a Robotic Car)</a>.
It&#8217;s been a lot of fun, and reminds me of why I love Python so
much. In this post, I&#8217;m just going to go over what the end result of
the first week has been.</p>

<!-- more -->


<p><a href="http://xkcd.com/353/"><img src="http://imgs.xkcd.com/comics/python.png" alt="Obligatory XKCD" /></a></p>

<p>(Obligatory xkcd&#8230;)</p>

<p>This unit was all Monte Carlo localisation, which generates
probability distributions as to where the robot is in the world. This,
of course, is all very simplified but it is still quite a fascinating
learning experience.</p>

<p>The final homework question has us determining a probability
distribution (i.e. a matrix of probabilities guessing where the robot
is in the world) based on sensor readings and movements in a world of
red / green cells. For example, one of the first worlds we&#8217;re
given in the examples looks like this:</p>

<p><img src="http://kisom.github.com/images/cs373/unit1/simple_world.png" alt="simple world example" /></p>

<p>You can see each cell in the world has a colour, and the robot&#8217;s
worldview consists of a 2D matrix of probabilities.</p>

<p>Given a list of sensor readings, e.g. <code>['green', 'red']</code>, and a list
of corresponding motions in the form <code>[y, x]</code> such that <code>[1, 0]</code> is a
movement downwards and <code>[0, 1]</code> is a movement right, the robot should
be able to figure out where in the world it is. All the example code
in the class is done as simple functions operating in the <code>'__main__'</code>
namespace, but I used a Robot class to simulate everything. Testing
was painful, because I had to hand-type in most of the reference
probability distributions. Then,
<a href="https://bitbucket.org/kisom/cs373/changeset/70b3d80194ee">early Sunday morning</a>
(actually late Saturday night), I discovered the
<a href="http://pypi.python.org/pypi/colorama"><code>colorama</code></a> module, which
facilitates pretty terminal output (colors, bolding, etc.). I
was able to write a method to print out the map (aka the <code>.showmap()</code>
method in the screenshots) that made testing where the robot was a lot
more fun.</p>
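
<p>The sense/move cycle described above can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not the
<code>Robot2D</code> class used in the road test: the function names, the <code>p_hit</code>
(chance the sensor reads the right colour) and <code>p_move</code> (chance a motion
actually happens) parameters, the wrap-around world, and the tiny 2x2 map are
all made up for the example.</p>

<figure class='code'> <div class="highlight"><table><tr><td class='code'><pre><code class='python'>def sense(p, world, reading, p_hit):
    """Weight each cell by how well its colour matches the reading, then normalise."""
    q = [[p[r][c] * (p_hit if world[r][c] == reading else 1.0 - p_hit)
          for c in range(len(p[r]))]
         for r in range(len(p))]
    total = sum(sum(row) for row in q)
    return [[cell / total for cell in row] for row in q]


def move(p, motion, p_move):
    """Shift the belief by motion = [dy, dx]; the world wraps around (assumption).

    With probability p_move the robot moves, otherwise it stays where it was.
    """
    rows, cols = len(p), len(p[0])
    dy, dx = motion
    return [[p_move * p[(r - dy) % rows][(c - dx) % cols]
             + (1.0 - p_move) * p[r][c]
             for c in range(cols)]
            for r in range(rows)]


def localise(world, motions, readings, p_hit, p_move):
    """Start from a uniform belief, then alternate move and sense."""
    rows, cols = len(world), len(world[0])
    p = [[1.0 / (rows * cols)] * cols for _ in range(rows)]
    for motion, reading in zip(motions, readings):
        p = move(p, motion, p_move)
        p = sense(p, world, reading, p_hit)
    return p


# Hypothetical 2x2 world: stay put and see green, then move right and see red.
world = [['green', 'red'],
         ['green', 'green']]
belief = localise(world, [[0, 0], [0, 1]], ['green', 'red'], 0.8, 0.6)
</code></pre></td></tr></table></div></figure>


<p>With these inputs, the largest probability ends up on the only red cell,
<code>belief[0][1]</code>, and the whole matrix still sums to one.</p>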

<p>The red/green world felt very contrived. I had a pretty good unit
testing framework set up, so I decided to test my code using a road
test: I built a world with three lanes that used black to represent
the asphalt, yellow to indicate lane markers, and white to indicate
the shoulder marker. I simulated having the robot start at the
shoulder, drive forward and then switch to the middle lane. I expected
the robot to believe itself to be somewhere in the middle
lane.</p>

<p>The world looks something like this:</p>

<p><img src="http://kisom.github.com/images/cs373/unit1/road_test_world.png" alt="road test world" /></p>

<p>With that in mind, I wrote the road test:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="kn">import</span> <span class="nn">localisation</span>         <span class="c"># the module with the robot code</span>
</span><span class='line'>
</span><span class='line'><span class="k">def</span> <span class="nf">road_test</span><span class="p">():</span>
</span><span class='line'>        <span class="n">world</span> <span class="o">=</span> <span class="p">[[</span><span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;yellow&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;yellow&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;white&#39;</span><span class="p">],</span>
</span><span class='line'>                 <span class="p">[</span><span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;white&#39;</span><span class="p">],</span>
</span><span class='line'>                 <span class="p">[</span><span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;yellow&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;yellow&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;white&#39;</span><span class="p">],</span>
</span><span class='line'>                 <span class="p">[</span><span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;white&#39;</span><span class="p">],</span>
</span><span class='line'>                 <span class="p">[</span><span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;yellow&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;yellow&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;white&#39;</span><span class="p">],</span>
</span><span class='line'>                 <span class="p">[</span><span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;white&#39;</span><span class="p">]]</span>
</span><span class='line'>        <span class="n">sensor</span> <span class="o">=</span> <span class="n">localisation</span><span class="o">.</span><span class="n">Sensor</span><span class="p">(</span><span class="mf">0.8</span><span class="p">)</span>
</span><span class='line'>        <span class="n">robot</span> <span class="o">=</span> <span class="n">localisation</span><span class="o">.</span><span class="n">Robot2D</span><span class="p">(</span><span class="n">world</span><span class="p">,</span> <span class="n">sensor</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">)</span>
</span><span class='line'>        <span class="k">print</span> <span class="s">&#39;robot initalised.&#39;</span>
</span><span class='line'>
</span><span class='line'>        <span class="n">motion</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]]</span>
</span><span class='line'>        <span class="n">measurements</span> <span class="o">=</span> <span class="p">[</span><span class="s">&#39;white&#39;</span><span class="p">,</span> <span class="s">&#39;white&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;yellow&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">,</span> <span class="s">&#39;black&#39;</span><span class="p">]</span>
</span><span class='line'>        <span class="k">print</span> <span class="s">&#39;driving!&#39;</span>
</span><span class='line'>        <span class="n">robot</span><span class="o">.</span><span class="n">localise</span><span class="p">(</span><span class="n">motion</span><span class="p">,</span> <span class="n">measurements</span><span class="p">)</span>
</span><span class='line'>        <span class="k">print</span> <span class="n">translate_drive</span><span class="p">(</span><span class="n">motion</span><span class="p">,</span> <span class="n">measurements</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'>        <span class="n">guess</span> <span class="o">=</span> <span class="n">robot</span><span class="o">.</span><span class="n">locate</span><span class="p">()[</span><span class="s">&#39;guess&#39;</span><span class="p">]</span>
</span><span class='line'>        <span class="n">q</span> <span class="o">=</span> <span class="p">[(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">guess</span><span class="p">]</span>
</span><span class='line'>        <span class="n">q</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="o">==</span> <span class="mi">2</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">q</span><span class="p">]</span>
</span><span class='line'>        <span class="k">return</span> <span class="n">robot</span>
</span></code></pre></td></tr></table></div></figure>


<p>The <code>translate_drive</code> function maps the <code>motion</code> and <code>measurements</code>
into a pretty output form to make it more intuitive as to what&#8217;s being
simulated:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="k">def</span> <span class="nf">translate_drive</span><span class="p">(</span><span class="n">motion_seq</span><span class="p">,</span> <span class="n">readings</span><span class="p">):</span>
</span><span class='line'>    <span class="n">directions</span> <span class="o">=</span> <span class="p">{</span>
</span><span class='line'>                    <span class="s">&#39;[0, 0]&#39;</span><span class="p">:</span>   <span class="s">&#39;nowhere&#39;</span><span class="p">,</span>
</span><span class='line'>                    <span class="s">&#39;[0, 1]&#39;</span><span class="p">:</span>   <span class="s">&#39;right&#39;</span><span class="p">,</span>
</span><span class='line'>                    <span class="s">&#39;[0, -1]&#39;</span><span class="p">:</span>  <span class="s">&#39;left&#39;</span><span class="p">,</span>
</span><span class='line'>                    <span class="s">&#39;[1, 0]&#39;</span><span class="p">:</span>   <span class="s">&#39;down&#39;</span><span class="p">,</span>
</span><span class='line'>                    <span class="s">&#39;[-1, 0]&#39;</span><span class="p">:</span>  <span class="s">&#39;up&#39;</span>
</span><span class='line'>    <span class="p">}</span>
</span><span class='line'>
</span><span class='line'>    <span class="n">drive_str</span> <span class="o">=</span> <span class="s">&#39;&#39;</span>
</span><span class='line'>    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">motion_seq</span><span class="p">)):</span>
</span><span class='line'>        <span class="n">motion</span> <span class="o">=</span> <span class="n">motion_seq</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span><span class='line'>        <span class="n">reading</span> <span class="o">=</span> <span class="n">readings</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
</span><span class='line'>        <span class="n">drive_str</span> <span class="o">+=</span> <span class="s">&#39;</span><span class="se">\t</span><span class="s">go &#39;</span> <span class="o">+</span> <span class="n">directions</span><span class="p">[</span><span class="nb">str</span><span class="p">(</span><span class="n">motion</span><span class="p">)]</span>
</span><span class='line'>        <span class="n">drive_str</span> <span class="o">+=</span> <span class="s">&#39;, saw &#39;</span> <span class="o">+</span> <span class="n">reading</span> <span class="o">+</span> <span class="s">&#39;</span><span class="se">\n</span><span class="s">&#39;</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">return</span> <span class="n">drive_str</span>
</span></code></pre></td></tr></table></div></figure>


<p>Here&#8217;s an example of the road test in action:</p>

<p><img src="http://kisom.github.com/images/cs373/unit1/road_test.png" alt="road test screenshot" /></p>

<p>As expected, the robot thinks it is somewhere in the middle lane.</p>
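
<p>That check can also be made in code rather than by eye. A minimal sketch,
assuming <code>robot.locate()['guess']</code> is a list of three-element tuples whose
last element is the column index, and that the middle lane is column 2;
<code>in_lane</code> is a hypothetical helper and the sample guesses are made up for
illustration:</p>

<figure class='code'> <div class="highlight"><table><tr><td class='code'><pre><code class='python'>def in_lane(guess, lane_column):
    """True when every most-likely cell lies in the given column.

    guess is assumed to be a list of (probability, row, column) tuples.
    """
    return bool(guess) and all(column == lane_column
                               for _, _, column in guess)


# Made-up output: two equally likely cells, both in column 2.
guess = [(0.31, 3, 2), (0.31, 4, 2)]
assert in_lane(guess, 2)

# A guess on the shoulder (column 5) should fail the check.
assert not in_lane([(0.5, 0, 5)], 2)
</code></pre></td></tr></table></div></figure>
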

<p>Next week, we start learning
<a href="https://en.wikipedia.org/wiki/Particle_filter">particle filters</a>
and <a href="https://en.wikipedia.org/wiki/Kalman_filter">Kalman filters</a>.</p>

<p>I&#8217;m actively working to develop a physical AGV (autonomous ground
vehicle). The objective is to develop a platform I can use to build
later, more practical robots on; the AGV platform will be focused on
navigation. As part of this task, I&#8217;m also working to translate the
Python code to C++ (suitable for the Arduino, for example).</p>

<p>Stay tuned!</p>

<h3>References:</h3>

<ul>
<li><a href="http://robots.stanford.edu/papers/thrun.robust-mcl.html">Robust Monte Carlo Localization for Mobile Robots</a></li>
<li><a href="https://bitbucket.org/kisom/cs373">Python version of the repo</a> (private until the 28th)</li>
<li><a href="https://github.com/kisom/cs373">C++ version</a></li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[carefree git and hg]]></title>
    <link href="http://kisom.github.com/blog/2012/02/22/carefree-git-and-hg/"/>
    <updated>2012-02-22T16:52:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/02/22/carefree-git-and-hg</id>
    <content type="html"><![CDATA[<p>I was at an <a href="http://www.appsterdam.rs">Appsterdam</a> lunch meetup today, and
before the presentation I was talking with some people about source control.
They worked for Atlassian, and so of course bitbucket v. github came up.
(It didn&#8217;t help that I was wearing a GitHub shirt. Atlassian - I want to give
you money to get a bitbucket shirt but I don&#8217;t see any for sale. Why?)
Regardless of why I typically use github more, or what my usage profiles are
for the two, they were interested to hear my solution to a problem I had:
how to simplify working in various source control systems, particularly in
both mercurial and git.</p>

<!-- more -->


<p>For a long time, I used mostly git and far less mercurial. I wrote a bunch
of aliases in my <code>.zprofile</code> that looked something like:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nb">alias </span><span class="nv">commit</span><span class="o">=</span><span class="s1">&#39;git commit&#39;</span>
</span><span class='line'><span class="nb">alias </span><span class="nv">checkout</span><span class="o">=</span><span class="s1">&#39;git checkout&#39;</span>
</span><span class='line'>...
</span></code></pre></td></tr></table></div></figure>


<p>For mercurial, I just entered everything normally. However, as I started to
use mercurial more, I wanted to use those aliases for both systems. I ended
up writing a bunch of shell functions to do this. They are all strictly
POSIX-compatible, so they work under at least <code>zsh</code>, <code>ksh</code>, and <code>bash</code>. I haven&#8217;t
tested any others, so your mileage may vary. The latest version is
available in my <a href="https://github.com/kisom/dotconf">dotconf GitHub repo</a>; you
can view it <a href="https://github.com/kisom/dotconf/blob/master/.sourcecon.zsh">here</a>.</p>

<p>The core of the code is the pair of functions:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>get_repo_type <span class="o">()</span> <span class="o">{</span>
</span><span class='line'>    git status 2&gt;/dev/null 1&gt;/dev/null
</span><span class='line'>    <span class="k">if</span> <span class="o">[</span> 0 -eq <span class="nv">$?</span> <span class="o">]</span>; <span class="k">then</span>
</span><span class='line'><span class="k">        </span><span class="nb">echo </span>1
</span><span class='line'>        <span class="k">return </span>1
</span><span class='line'>    <span class="k">else</span>
</span><span class='line'><span class="k">        </span>hg status 2&gt;/dev/null 1&gt;/dev/null
</span><span class='line'>        <span class="k">if</span> <span class="o">[</span> 0 -eq <span class="nv">$?</span> <span class="o">]</span>; <span class="k">then</span>
</span><span class='line'><span class="k">            </span><span class="nb">echo </span>2
</span><span class='line'>            <span class="k">return </span>2
</span><span class='line'>        <span class="k">else</span>
</span><span class='line'><span class="k">            </span><span class="nb">echo </span>0
</span><span class='line'>            <span class="k">return </span>0
</span><span class='line'>        <span class="k">fi</span>
</span><span class='line'><span class="k">    fi</span>
</span><span class='line'><span class="o">}</span>
</span><span class='line'>
</span><span class='line'>not_a_repo <span class="o">()</span> <span class="o">{</span>
</span><span class='line'>    <span class="nb">echo</span> <span class="s2">&quot;not a git or mercurial repo!&quot;</span>
</span><span class='line'><span class="o">}</span>
</span></code></pre></td></tr></table></div></figure>


<p><code>get_repo_type</code> does exactly what it says: it outputs a number that
identifies what type of source control the repo uses. <code>not_a_repo</code>
simply provides a shortcut for displaying the error message. All of the
commands use these two functions. The commands are implemented in a similar
style, so let&#8217;s take a look at the first defined function, <code>pull</code>:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>pull <span class="o">()</span> <span class="o">{</span>
</span><span class='line'>    <span class="nv">repo_type</span><span class="o">=</span><span class="k">$(</span>get_repo_type<span class="k">)</span>
</span><span class='line'>    <span class="k">if</span> <span class="o">[</span> <span class="s2">&quot;1&quot;</span> <span class="o">=</span> <span class="s2">&quot;$repo_type&quot;</span> <span class="o">]</span>; <span class="k">then</span>
</span><span class='line'><span class="k">        </span>git pull <span class="s2">&quot;$@&quot;</span>
</span><span class='line'>    <span class="k">elif</span> <span class="o">[</span> <span class="s2">&quot;2&quot;</span> <span class="o">=</span> <span class="s2">&quot;$repo_type&quot;</span> <span class="o">]</span>; <span class="k">then</span>
</span><span class='line'><span class="k">        </span>hg pull <span class="s2">&quot;$@&quot;</span>
</span><span class='line'>    <span class="k">else</span>
</span><span class='line'><span class="k">        </span>not_a_repo
</span><span class='line'>    <span class="k">fi</span>
</span><span class='line'><span class="o">}</span>
</span></code></pre></td></tr></table></div></figure>


<p>Unfortunately, shell scripting isn&#8217;t a terribly advanced programming language,
so there&#8217;s a lot of redundancy in the code; in fact all of the commands use the
same basic template of</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'><span class="nv">repo_type</span><span class="o">=</span><span class="k">$(</span>get_repo_type<span class="k">)</span>
</span><span class='line'><span class="k">if</span> <span class="o">[</span> <span class="s2">&quot;1&quot;</span> <span class="o">=</span> <span class="s2">&quot;$repo_type&quot;</span> <span class="o">]</span>; <span class="k">then</span>
</span><span class='line'>   <span class="c"># git commands go here</span>
</span><span class='line'><span class="k">elif</span> <span class="o">[</span> <span class="s2">&quot;2&quot;</span> <span class="o">=</span> <span class="s2">&quot;$repo_type&quot;</span> <span class="o">]</span>; <span class="k">then</span>
</span><span class='line'>   <span class="c"># hg commands go here</span>
</span><span class='line'><span class="k">else</span>
</span><span class='line'><span class="k">   </span>not_a_repo
</span><span class='line'><span class="k">fi</span>
</span></code></pre></td></tr></table></div></figure>


<p>I thought of some other ways to do this, but they all ended up being far more
complex and time-consuming than just knocking it out like this. This style is
also POSIX-compatible, meaning it can be used with any POSIX-compliant shell.</p>

<p>Another feature of note is that I&#8217;ve made sure to pass through the shell variable
<code>$@</code>, which means any arguments are passed directly to the command; this
still gives you the full use of the specialised commands without having to
mentally switch context between just typing the shortened command and the full
one.</p>

<p>So, let&#8217;s look at what commands are supported (use <code>vcshelp</code> to list them):</p>

<ul>
<li>commit</li>
<li>add</li>
<li>pull</li>
<li>push</li>
<li>checkout</li>
<li>fetch</li>
<li>clog</li>
<li>which_dvcs</li>
<li>vcdiff</li>
</ul>


<p>For the most part, they are wrappers around the $scm version of the command,
passing through any arguments as before. The last three aren&#8217;t (but do pass
through any command line options as appropriate):</p>

<ul>
<li><code>clog</code> is a shortcut for &#8216;commit log&#8217;, and shows the $scm log. For mercurial,
it will pipe it to less (by default, hg doesn&#8217;t).</li>
<li><code>which_dvcs</code> is a wrapper around get_repo_type to print the name of the SCM
instead of the numeric value used in the functions.</li>
<li><code>vcdiff</code> is a version control diff; like <code>clog</code>, it will pipe hg diff to less.</li>
</ul>


<p>There are a few commands that aren&#8217;t documented in <code>vcshelp</code>:</p>

<ul>
<li><code>co</code> is an alias for <code>checkout</code></li>
<li><code>st</code> is a variant of status that shows only tracked files</li>
</ul>


<p>I&#8217;ve found this system to work out pretty well for me, mostly because it
requires less mental power to handle the basic SCM workflow. It also satisfies
my coder&#8217;s itch to remove unnecessary code (i.e. always having to prefix <code>git</code>
or <code>hg</code> to source control commands) by making the shell &#8220;aware&#8221; of which SCM
I&#8217;m using at the time.</p>

<p>There is, of course, the caveat that <a href="http://jrick.devio.us">Josh Rickmar</a>
pointed out. I&#8217;ve grown used to a lot of the specifics of working
with the different SCMs. Two common idioms I use a lot with this setup are
<code>commit -a</code> in a git repo and <code>pull -u</code> in a mercurial repo. If you are using
an SCM, you should definitely get to know it before using it for serious
work. Of course, you can also take my code and tweak it so that it behaves
differently. The code is yours.</p>

<p>Thanks go to Chris LePetit, who suggested I write the article.
<a href="http://samuelgoodwin.tumblr.com">Samuel Goodwin</a> and
<a href="https://twitter.com/imwally">Wally Jones</a> proofread it for me.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Analytics Win]]></title>
    <link href="http://kisom.github.com/blog/2012/02/21/analytics-win/"/>
    <updated>2012-02-21T12:15:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/02/21/analytics-win</id>
    <content type="html"><![CDATA[<p>Inexplicably, for the longest time I was reluctant to enable any sort of
analytics on my personal site, partly because, to be honest, it&#8217;s not
as if my blog is well read (or so I assume, but soon I&#8217;ll have numbers
to back that claim up). As I try to get more involved in the world, I&#8217;ve
found my site is useful as a portfolio of sorts - not so much in the way
of &#8220;look at my sexy site&#8221; as &#8220;here&#8217;s the cool things I do&#8221;. I&#8217;ve noticed
that <a href="https://github.com">GitHub</a> has had some DDoS issues lately, and as
I host this site on my <a href="http://pages.github.com/">GitHub pages</a>, I wanted
to minimise any potential downtimes. I&#8217;d also noticed that some of my
pages were a bit on the slow side to load, as <a href="http://octopress.org/">Octopress</a>
appears to load quite a bit of javascript. I admit to being a fan of many of
the asides, and to have written some of my own.</p>

<!-- more -->


<p>In order to improve this situation, I took two steps:</p>

<ol>
<li>I enabled Google Analytics:
while I&#8217;m not a particular fan of feeding the privacy black hole with even more
data, it appears to be the only viable option at this time. I have been eyeing
<a href="http://haveamint.com/">Mint</a>, but I am abstaining from purchasing anything new
until I leave the Netherlands (in a little over a week) just to stay on the
safe side of my bank account. (30 USD may not sound like much, but that&#8217;s about
two days worth of döner or shawarma for dinner.)</li>
<li>I set up the site on CloudFlare.</li>
</ol>


<p>What I didn&#8217;t realise is that several of my older posts actually rank high on
Google&#8217;s search results; however, the link on Google points to the old url
from when I was using <a href="http://blaze.blackened.cz">blazeblogger</a>. Because the
content is actually useful documentation, I was able to set up redirects so
that the page is back online and people can use the information now.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Thoughts on Mountain Lion]]></title>
    <link href="http://kisom.github.com/blog/2012/02/17/thoughts-on-mountain-lion/"/>
    <updated>2012-02-17T20:16:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/02/17/thoughts-on-mountain-lion</id>
    <content type="html"><![CDATA[<p>One of the great things about paradigm shifts is we can throw out the old
and start from scratch, getting rid of all the old cruft that&#8217;s built up over
time. Computers are no different, and the tablet revolution has allowed us
to rethink a few things. It looks like Apple is finally converging some of the
lessons learned with iOS and OS X. So, let&#8217;s take a look at some of these
ideas:</p>

<!-- more -->


<h3>The App Store as a sole source of software</h3>

<p>I&#8217;ve heard a lot of complaints about this. The fact of the matter is that for
the average user, this makes sense. It limits the exposure due to malware. For
users who need expanded privileges, opening things up is <em>extremely</em> easy. This follows
a principle of secure by default and requiring a decision on the user&#8217;s part
to open up their system. Does Apple benefit? Surely. However, in this case the
decision also benefits users.</p>

<h3>Messages</h3>

<p>We are surrounded by a variety of chat systems. On mobile phones, SMS reigns
supreme; while on the desktop, there are more chat protocols than you can
shake a stick at. Bringing Messages onto the desktop starts to unify the
two systems. While Jabber and AIM / OSCAR are two very common protocols, it
would be good to see Twitter and perhaps even IRC. (While I would love to
see SILC supported, I don&#8217;t expect to see that anytime soon). One other thing
missing is OTR support. While the iMessage compatibility is interesting, lack
of Twitter and OTR support make it not compelling enough for me to switch over
as my chat client<a href="#footnotes">*</a>.</p>

<h3>Xcode</h3>

<p>Apple is now providing a much smaller download with just the compiler and
commandline tools needed to use <a href="http://mxcl.github.com/homebrew/">Homebrew</a>
or other package management systems.</p>

<h3>Conclusion</h3>

<p>We&#8217;re starting to see some of the benefits of the tablet revolution folded back
into the desktop and laptop realm. For normal users, these changes create a
more streamlined experience and improve the overall security of the system.
For developers, there are some changes to simplify non-iOS / Cocoa developers&#8217;
setups. I think this is a step in the right direction.</p>

<p>Except that it&#8217;s called &#8220;Mountain Lion.&#8221;</p>

<h3><a name="footnotes">Footnotes</a></h3>

<ol>
<li>I currently use <a href="http://www.adium.im/">Adium</a> as my chat and Twitter client.</li>
</ol>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Setting Up Aquamacs for Clojure]]></title>
    <link href="http://kisom.github.com/blog/2012/02/02/setting-up-aquamacs-for-clojure/"/>
    <updated>2012-02-02T20:03:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/02/02/setting-up-aquamacs-for-clojure</id>
    <content type="html"><![CDATA[<p>It took me a bit to get my <a href="http://www.aquamacs.org">Aquamacs</a> install
up and ready to do <a href="http://www.clojure.org">Clojure</a>
and <a href="http://common-lisp.net/project/slime/">SLIME</a>, so I figured I&#8217;d jot
some notes down for future me and anyone who happens to be listening.</p>

<!-- more -->


<p>I assume Aquamacs has been downloaded and
<a href="https://github.com/technomancy/leiningen">leiningen</a> is installed. First,
in a terminal, you&#8217;ll need to install swank-clojure. As of today, the
current version is 1.4.0, but I strongly recommend you check the README
to see if there&#8217;s a new version out. In the shell, run
<code>lein plugin install swank-clojure "1.4.0"</code>.</p>

<p>I use <a href="http://marmalade-repo.org/">Marmalade</a> for package management, so
the first thing to do is to add Marmalade to Aquamacs. Open up
<code>"~/Library/Preferences/Aquamacs\ Emacs/Preferences.el"</code> in your editor
of choice (I used <a href="https://code.google.com/p/macvim/">MacVim</a>), and add
the following:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="c1">;; Marmalade</span>
</span><span class='line'><span class="p">(</span><span class="nf">require</span> <span class="ss">&#39;package</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nf">add-to-list</span> <span class="ss">&#39;package-archives</span>
</span><span class='line'>             <span class="o">&#39;</span><span class="p">(</span><span class="s">&quot;marmalade&quot;</span> <span class="o">.</span> <span class="s">&quot;http://marmalade-repo.org/packages/&quot;</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="nf">package-initialize</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>I&#8217;m assuming you don&#8217;t have <code>package.el</code> installed yet, so make sure to download it:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>curl "http://repo.or.cz/w/emacs.git/blob_plain/1a0a666f941c99882093d7bd08ced15033bc3f0c:/lisp/emacs-lisp/package.el" > ~/Library/Preferences/Aquamacs\ Emacs/package.el</span></code></pre></td></tr></table></div></figure>


<p>Now fire up Aquamacs (or evaluate the additions to <code>Preferences.el</code> with
<code>C-x C-e</code>). <code>clojure-mode</code> needs to be installed, either via <code>M-x package-list-packages</code>,
marking <code>clojure-mode</code> for installation (with <code>i</code>) and installing
(with <code>x</code>), or with <code>M-x package-refresh-contents</code> followed by
<code>M-x package-install clojure-mode</code>. I also like <code>paredit</code>, but you
might not; it takes some getting used to.</p>

<p>Now, open up a file in your lein&#8217;d project and use <code>M-x clojure-jack-in</code>.
You might see some errors pop up in your <code>*Compile-Log*</code> buffer, but you
should be greeted with a REPL very shortly.</p>

<p>Happy hacking!</p>

<h2>The End Result</h2>

<p>Here&#8217;s a screenshot of how it turned out (click to view it full-size):
<a href="http://kisom.github.com/images/aquamacs-clojure.png"><img src="http://kisom.github.com/images/aquamacs-clojure.t.png" alt="aquamacs-clojure thumbnail" /></a></p>

<p>I usually run Aquamacs full-screen with two panes: the left side for editing
source code and the right side for SLIME.</p>

<h2>References</h2>

<p>I patched together my knowledge from a few pages:</p>

<ul>
<li>Incanter&#8217;s article <a href="http://data-sorcery.org/2009/12/20/getting-started/">Setting up Clojure, Incanter, Emacs, Slime, Swank, and Paredit</a></li>
<li>The Doctor What&#8217;s article <a href="http://docwhat.org/2011/08/aquamacs-2-3a-and-marmalade/">Aquamacs 2.3a and Marmalade</a></li>
<li>Phil Hagelberg&#8217;s <a href="https://github.com/technomancy/swank-clojure">swank-clojure</a> <a href="https://github.com/technomancy/swank-clojure/blob/master/README.md">README</a></li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Using Set Theory]]></title>
    <link href="http://kisom.github.com/blog/2012/02/01/using-set-theory/"/>
    <updated>2012-02-01T20:45:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/02/01/using-set-theory</id>
    <content type="html"><![CDATA[<p>In the <a href="http://kisom.github.com/blog/2012/01/23/basic-set-theory/">last post</a>, we took a look at the
basics of set theory. Now, I&#8217;d like to take a look at how to actually make use
of it in your code.</p>

<p>One of the issues with practically using the code in the last post is that the
initial subsets were defined arbitrarily and not derived from the superset. In
this post, all the examples are derived from the superset. We&#8217;ll use a couple
of techniques to illustrate some of the various ways to do it.</p>

<p>In Python, we&#8217;ll use an object-oriented approach, creating a few classes and
working on Book objects. In Clojure, we&#8217;ll use records. Though we&#8217;ll approach
each language a little differently, I hope both still bring clarity to the subject.</p>

<!-- more -->


<h2>Foundation: A Collection of Books</h2>

<p>The first thing we need to do in a useful system is determine what we mean by
a book. The last post represented each book as a string denoting the title; while
that worked for a brief introduction, in practise it gives us very limited
options for building subsets. What we need to do is identify additional details,
called attributes or fields, that give us the information we need to build our
subsets.</p>

<h3>Python</h3>

<p>In Python, we&#8217;ll approach this using classes, which I&#8217;ve saved in <code>library.py</code>
in the <a href="http://kisom.github.com/downloads/code/using-set-theory/py_example.tar.gz">Python example code</a>.</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="c"># used to validate the list of formats passed to a book</span>
</span><span class='line'><span class="n">SUPPORTED_FORMATS</span> <span class="o">=</span> <span class="p">[</span> <span class="s">&#39;epub&#39;</span><span class="p">,</span> <span class="s">&#39;mobi&#39;</span><span class="p">,</span> <span class="s">&#39;pdf&#39;</span> <span class="p">]</span>
</span><span class='line'>
</span><span class='line'>
</span><span class='line'><span class="k">class</span> <span class="nc">Book</span><span class="p">:</span>
</span><span class='line'>    <span class="sd">&quot;&quot;&quot;</span>
</span><span class='line'><span class="sd">    Represents a book, with title, author, and summary text fields. A book</span>
</span><span class='line'><span class="sd">    should be given a list of formats supported as a dictionary in the form</span>
</span><span class='line'><span class="sd">    {fmt: True}, and optionally a list of tags.</span>
</span><span class='line'><span class="sd">    &quot;&quot;&quot;</span>
</span><span class='line'>    <span class="n">title</span> <span class="o">=</span> <span class="bp">None</span>
</span><span class='line'>    <span class="n">author</span> <span class="o">=</span> <span class="bp">None</span>
</span><span class='line'>    <span class="n">summary</span> <span class="o">=</span> <span class="bp">None</span>
</span><span class='line'>    <span class="n">formats</span> <span class="o">=</span> <span class="bp">None</span>
</span><span class='line'>    <span class="n">tags</span> <span class="o">=</span> <span class="bp">None</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">title</span><span class="p">,</span> <span class="n">author</span><span class="p">,</span> <span class="n">summary</span><span class="p">,</span> <span class="n">formats</span><span class="p">):</span>
</span><span class='line'>        <span class="sd">&quot;&quot;&quot;Initialise a new book. The format should be a dictionary in</span>
</span><span class='line'><span class="sd">        the form { &#39;epub&#39;: True } where each key is a format that we</span>
</span><span class='line'><span class="sd">        have the book in.&quot;&quot;&quot;</span>
</span><span class='line'>
</span><span class='line'>        <span class="bp">self</span><span class="o">.</span><span class="n">title</span> <span class="o">=</span> <span class="n">title</span>
</span><span class='line'>        <span class="bp">self</span><span class="o">.</span><span class="n">author</span> <span class="o">=</span> <span class="n">author</span>
</span><span class='line'>        <span class="bp">self</span><span class="o">.</span><span class="n">summary</span> <span class="o">=</span> <span class="n">summary</span>
</span><span class='line'>
</span><span class='line'>        <span class="k">assert</span><span class="p">(</span><span class="ow">not</span> <span class="bp">False</span> <span class="ow">in</span> <span class="p">[</span><span class="n">fmt</span> <span class="ow">in</span> <span class="n">SUPPORTED_FORMATS</span> <span class="k">for</span> <span class="n">fmt</span> <span class="ow">in</span> <span class="n">formats</span><span class="p">])</span>
</span><span class='line'>        <span class="bp">self</span><span class="o">.</span><span class="n">formats</span> <span class="o">=</span> <span class="n">formats</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span><span class='line'>        <span class="sd">&quot;&quot;&quot;</span>
</span><span class='line'><span class="sd">        Return string representation of a book.</span>
</span><span class='line'><span class="sd">        &quot;&quot;&quot;</span>
</span><span class='line'>        <span class="n">out</span> <span class="o">=</span> <span class="s">&quot;</span><span class="si">%s</span><span class="se">\n\t</span><span class="s">by </span><span class="si">%s</span><span class="se">\n\t</span><span class="si">%s</span><span class="se">\n\t</span><span class="s">formats: </span><span class="si">%s</span><span class="s">&quot;</span>
</span><span class='line'>        <span class="n">out</span> <span class="o">=</span> <span class="n">out</span> <span class="o">%</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">title</span><span class="p">,</span> <span class="s">&#39;, &#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">author</span><span class="p">),</span>
</span><span class='line'>                     <span class="bp">self</span><span class="o">.</span><span class="n">summary</span><span class="p">,</span> <span class="s">&#39;, &#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">formats</span><span class="p">))</span>
</span><span class='line'>        <span class="k">return</span> <span class="n">out</span>
</span></code></pre></td></tr></table></div></figure>
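
<p>The <code>assert</code> in <code>Book.__init__</code> boils down to &#8220;every format passed in
must be a supported one.&#8221; Here&#8217;s a minimal sketch of that same check pulled out
into standalone functions. The names <code>formats_supported</code> and
<code>unsupported_formats</code> are my own for illustration and are not part of
<code>library.py</code>; the second one assumes it&#8217;s useful to know <em>which</em> formats
were rejected, and uses a set difference to find them.</p>

```python
# Same list the Book class validates against.
SUPPORTED_FORMATS = ['epub', 'mobi', 'pdf']


def formats_supported(formats):
    """Return True when every entry in formats is a supported format.

    Works for a list of names or a dictionary keyed by format name,
    since iterating over a dictionary yields its keys.
    """
    return all(fmt in SUPPORTED_FORMATS for fmt in formats)


def unsupported_formats(formats):
    """Return the rejected entries, sorted, using a set difference."""
    return sorted(set(formats) - set(SUPPORTED_FORMATS))
```

<p>Calling <code>unsupported_formats</code> before constructing a book would let you report
the offending formats rather than just failing the assertion.</p>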


<p>We&#8217;ll also want a <code>BookCollection</code> class to store a set of books and provide
some utility methods for dealing with the collection:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="k">class</span> <span class="nc">BookCollection</span><span class="p">:</span>
</span><span class='line'>    <span class="sd">&quot;&quot;&quot;Representation of a collection of books. Internally, they are stored</span>
</span><span class='line'><span class="sd">    as a set. Its main utility is in its helper methods that make accessing</span>
</span><span class='line'><span class="sd">    the books easier.&quot;&quot;&quot;</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">books</span><span class="p">,</span> <span class="n">book_filter</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
</span><span class='line'>        <span class="sd">&quot;&quot;&quot;Instantiate a collection of books. It expects a collection of</span>
</span><span class='line'><span class="sd">        books, e.g. a list or set, and optionally takes a filter to</span>
</span><span class='line'><span class="sd">        only put some of the books into the collection.&quot;&quot;&quot;</span>
</span><span class='line'>
</span><span class='line'>        <span class="k">if</span> <span class="n">book_filter</span><span class="p">:</span>
</span><span class='line'>            <span class="bp">self</span><span class="o">.</span><span class="n">books</span> <span class="o">=</span> <span class="nb">set</span><span class="p">([</span><span class="n">book</span> <span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">books</span> <span class="k">if</span> <span class="n">book_filter</span><span class="p">(</span><span class="n">book</span><span class="p">)])</span>
</span><span class='line'>        <span class="k">else</span><span class="p">:</span>
</span><span class='line'>            <span class="bp">self</span><span class="o">.</span><span class="n">books</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">books</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">def</span> <span class="nf">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span><span class='line'>        <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">books</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">def</span> <span class="nf">show_titles</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">description</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
</span><span class='line'>        <span class="sd">&quot;&quot;&quot;Print a list of titles in the collection. If the description</span>
</span><span class='line'><span class="sd">        argument is supplied, it is printed first and all the books are</span>
</span><span class='line'><span class="sd">        printed with a preceding tab.&quot;&quot;&quot;</span>
</span><span class='line'>        <span class="k">if</span> <span class="n">description</span><span class="p">:</span>
</span><span class='line'>            <span class="k">print</span> <span class="n">description</span>
</span><span class='line'>            <span class="n">fmt</span> <span class="o">=</span> <span class="s">&#39;</span><span class="se">\t</span><span class="si">%s</span><span class="s">&#39;</span>
</span><span class='line'>        <span class="k">else</span><span class="p">:</span>
</span><span class='line'>            <span class="n">fmt</span> <span class="o">=</span> <span class="s">&#39;</span><span class="si">%s</span><span class="s">&#39;</span>
</span><span class='line'>
</span><span class='line'>        <span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">books</span><span class="p">:</span>
</span><span class='line'>            <span class="k">print</span> <span class="n">fmt</span> <span class="o">%</span> <span class="p">(</span><span class="n">book</span><span class="o">.</span><span class="n">title</span><span class="p">,</span> <span class="p">)</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">def</span> <span class="nf">get_titles</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span><span class='line'>        <span class="sd">&quot;&quot;&quot;Return a list of titles in the collection.&quot;&quot;&quot;</span>
</span><span class='line'>        <span class="k">return</span> <span class="p">[</span><span class="n">book</span><span class="o">.</span><span class="n">title</span> <span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">books</span><span class="p">]</span>
</span></code></pre></td></tr></table></div></figure>
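
<p>The optional <code>book_filter</code> argument is what lets us derive a subset from the
superset: it is any function that takes a book and returns something truthy
when the book belongs in the collection. Below is a small sketch of the idea.
To keep it self-contained, it uses a cut-down stand-in for <code>Book</code> and a
hypothetical <code>filter_books</code> function that mirrors the logic in
<code>BookCollection.__init__</code>; the predicate <code>has_epub</code> and the placeholder
titles are also made up for the example.</p>

```python
class Book(object):
    """Stand-in for the real Book, with only the fields the filter needs."""

    def __init__(self, title, formats):
        self.title = title
        self.formats = formats


def has_epub(book):
    """Filter predicate: True when the book is available as an epub."""
    return 'epub' in book.formats


def filter_books(books, book_filter=None):
    """Mirror BookCollection.__init__: keep the books the filter accepts."""
    if book_filter:
        return set(book for book in books if book_filter(book))
    return set(books)


# Placeholder library; the titles are illustrative only.
library = [
    Book('Book A', ['epub', 'pdf']),
    Book('Book B', ['mobi']),
    Book('Book C', ['epub']),
]

# The epub subset is derived from the superset rather than typed in by hand.
epub = filter_books(library, has_epub)
epub_titles = sorted(book.title for book in epub)
```

<p>With the real classes, the equivalent call would be along the lines of
<code>BookCollection(books, has_epub)</code>.</p>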


<p>These two classes are very short (and we&#8217;ll extend them later to make them
more useful) but provide a solid foundation to build on. Next, you&#8217;ll want
to load some books into a collection.</p>

<p>To load a couple of example books, you would use code similar to this:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">books</span> <span class="o">=</span> <span class="nb">set</span><span class="p">([</span>
</span><span class='line'>    <span class="n">Book</span><span class="p">(</span><span class="s">&quot;Natural Language Processing with Python&quot;</span><span class="p">,</span>
</span><span class='line'>         <span class="p">[</span><span class="s">&#39;Steven Bird&#39;</span><span class="p">,</span> <span class="s">&#39;Ewan Klein&#39;</span><span class="p">,</span> <span class="s">&#39;Edward Loper&#39;</span><span class="p">],</span>
</span><span class='line'>         <span class="s">&#39;A highly accessible introduction to natural language processing.&#39;</span><span class="p">,</span>
</span><span class='line'>         <span class="p">[</span><span class="s">&#39;mobi&#39;</span><span class="p">,</span> <span class="p">]),</span>
</span><span class='line'>    <span class="n">Book</span><span class="p">(</span><span class="s">&#39;Learning OpenCV&#39;</span><span class="p">,</span> <span class="p">[</span><span class="s">&#39;Gary Bradski&#39;</span><span class="p">,</span> <span class="s">&#39;Adrian Kaehler&#39;</span><span class="p">],</span>
</span><span class='line'>         <span class="s">&#39;Puts you in the middle of the rapidly expanding field of &#39;</span> <span class="o">+</span>
</span><span class='line'>         <span class="s">&#39;computer vision.&#39;</span><span class="p">,</span>
</span><span class='line'>         <span class="p">[</span><span class="s">&#39;pdf&#39;</span><span class="p">,])</span>
</span><span class='line'>    <span class="p">])</span>
</span></code></pre></td></tr></table></div></figure>


<p>Manually entering all these details is tedious. Fortunately for you, I put up
with the tedium to create a sample dataset in <code>sample_library.py</code>. Call the
function <code>get_library()</code> from that file to load it.</p>

<h3>Clojure</h3>

<p>In Clojure, we&#8217;ll use a record to define a book:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="c1">;; define a book record</span>
</span><span class='line'><span class="p">(</span><span class="nf">defrecord</span> <span class="nv">Book</span>
</span><span class='line'>  <span class="o">#</span><span class="nv">^</span><span class="p">{</span> <span class="nv">:doc</span> <span class="s">&quot;Representation of a book. title is a string, authors a vector, </span>
</span><span class='line'><span class="s">summary is text, and formats is a vector.&quot;</span> <span class="p">}</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">title</span> <span class="nv">authors</span> <span class="nv">summary</span> <span class="nv">formats</span><span class="p">])</span>
</span></code></pre></td></tr></table></div></figure>


<p>We&#8217;re not using objects, so we don&#8217;t need a record to store a collection.
(If we wanted to validate formats, we could do it using a Ref and a
:validator argument - that&#8217;s left as an exercise for the reader). I have,
however, defined a few helper functions.</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="k">defn </span><span class="nv">in?</span>
</span><span class='line'>  <span class="s">&quot;Check whether val is in coll.&quot;</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">coll</span> <span class="nv">val</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">map? </span><span class="nv">coll</span><span class="p">)</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">val </span><span class="nv">coll</span><span class="p">)</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">not= </span><span class="mi">-1</span> <span class="p">(</span><span class="o">.</span><span class="nv">indexOf</span> <span class="nv">coll</span> <span class="nv">val</span><span class="p">))))</span>
</span><span class='line'>
</span><span class='line'><span class="c1">;; format validation</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">valid-format?</span>
</span><span class='line'>  <span class="s">&quot;Check a record or object with a :formats key to ensure it fits the list</span>
</span><span class='line'><span class="s">of valid formats.&quot;</span>
</span><span class='line'>  <span class="o">#</span><span class="p">(</span><span class="nv">or</span> <span class="p">(</span><span class="nf">in?</span> <span class="p">(</span><span class="nf">:formats</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">:epub</span><span class="p">)</span>
</span><span class='line'>       <span class="p">(</span><span class="nf">in?</span> <span class="p">(</span><span class="nf">:formats</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">:mobi</span><span class="p">)</span>
</span><span class='line'>       <span class="p">(</span><span class="nf">in?</span> <span class="p">(</span><span class="nf">:formats</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">:pdf</span><span class="p">)))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">defn </span><span class="nv">list-titles</span>
</span><span class='line'>  <span class="s">&quot;Print a list of titles in a book collection.&quot;</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">books</span> <span class="nv">&amp;</span> <span class="nv">description</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">titles</span>  <span class="p">(</span><span class="nb">map </span><span class="nv">:title</span> <span class="nv">books</span><span class="p">)]</span>
</span><span class='line'>    <span class="p">(</span><span class="k">if </span><span class="nv">description</span>
</span><span class='line'>      <span class="p">(</span><span class="nf">do</span>
</span><span class='line'>        <span class="p">(</span><span class="nb">println </span><span class="p">(</span><span class="nb">first </span><span class="nv">description</span><span class="p">))</span>
</span><span class='line'>        <span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">title</span> <span class="nv">titles</span><span class="p">]</span>
</span><span class='line'>          <span class="p">(</span><span class="nb">println </span><span class="s">&quot;\t&quot;</span> <span class="nv">title</span><span class="p">)))</span>
</span><span class='line'>      <span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">title</span> <span class="nv">titles</span><span class="p">]</span>
</span><span class='line'>        <span class="p">(</span><span class="nb">println </span><span class="nv">title</span><span class="p">)))))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">defn </span><span class="nv">get-titles</span>
</span><span class='line'>  <span class="s">&quot;Get a list of titles of a book collection.&quot;</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">books</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">map </span><span class="nv">:title</span> <span class="nv">books</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">defn </span><span class="nv">book-str</span>
</span><span class='line'>  <span class="s">&quot;Return a book as a string.&quot;</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">book</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">format</span> <span class="s">&quot;%s\n\tby %s\n\t%s\n\tformats: %s\n&quot;</span>
</span><span class='line'>          <span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nf">:title</span> <span class="nv">book</span><span class="p">))</span>
</span><span class='line'>          <span class="p">(</span><span class="nb">join </span><span class="s">&quot;, &quot;</span> <span class="p">(</span><span class="nf">:authors</span> <span class="nv">book</span><span class="p">))</span>
</span><span class='line'>          <span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nf">:summary</span> <span class="nv">book</span><span class="p">))</span>
</span><span class='line'>          <span class="p">(</span><span class="nb">join </span><span class="s">&quot;, &quot;</span> <span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="ss">&#39;name</span> <span class="p">(</span><span class="nf">:formats</span> <span class="nv">book</span><span class="p">)))))</span>
</span></code></pre></td></tr></table></div></figure>


<p>Adding books is a simple affair:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nb">set </span>
</span><span class='line'>   <span class="p">[(</span><span class="nf">Book</span><span class="o">.</span> <span class="s">&quot;Natural Language Processing with Python&quot;</span>
</span><span class='line'>           <span class="p">[</span><span class="s">&quot;Steven Bird&quot;</span> <span class="s">&quot;Ewan Klein&quot;</span> <span class="s">&quot;Edward Loper&quot;</span> <span class="p">]</span>
</span><span class='line'>           <span class="s">&quot;A highly accessible introduction to natural language processing.&quot;</span>
</span><span class='line'>           <span class="p">[</span> <span class="nv">:mobi</span> <span class="p">])</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">Book</span><span class="o">.</span> <span class="s">&quot;Learning OpenCV&quot;</span> <span class="p">[</span><span class="s">&quot;Gary Bradski&quot;</span> <span class="s">&quot;Adrian Kaehler&quot;</span><span class="p">]</span>
</span><span class='line'>           <span class="p">(</span><span class="nb">str </span><span class="s">&quot;Puts you in the middle of the rapidly expanding field of &quot;</span>
</span><span class='line'>                <span class="s">&quot;computer vision&quot;</span><span class="p">)</span>
</span><span class='line'>      <span class="p">[</span> <span class="nv">:pdf</span> <span class="p">])])</span>
</span></code></pre></td></tr></table></div></figure>


<p>I&#8217;ve loaded a sample dataset into the <code>sample_library.clj</code> source file, available
from the <a href="http://kisom.github.com/downloads/code/using-set-theory/clj-example.tar.gz">Clojure example code</a>.</p>

<h2>Building Subsets</h2>

<p>Now that we have a way to represent a book (with more useful information than
simply the title), we can start to build some subsets. Let&#8217;s start by
looking at <em>set notation</em> (aka how to write a set both mathematically and
in code), and then continue on to recreate the two subsets in the previous
article, <code>epub</code> and <code>mobi</code>.</p>

<h3>Set Notation</h3>

<p>In <a href="https://en.wikipedia.org/wiki/Set_notation">set notation</a>, we denote
a set by writing:</p>

<blockquote><p>A = { x | x ∈ N, x &lt; 10 }</p></blockquote>

<p>which means the set of numbers that are members of (∈ means <em>&#8216;element of&#8217;</em>)
the set of non-negative integers and are less than 10. You might generalise this
as follows:</p>

<blockquote><p>given the universal set S, which defines all the elements under
consideration, and some predicate P, which is a function that returns
true if the element satisfies the predicate (and thus should be included
in the set) and false otherwise:<br>
{ x | x ∈ S, P(x) }</p></blockquote>

<p>Written out in full, the set A from the first example is:</p>

<blockquote><p>A = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }</p></blockquote>

<p>In Python, this is easily expressed with a
<a href="http://www.python.org/dev/peps/pep-0202/">list comprehension</a> (see also
the <a href="http://docs.python.org/reference/expressions.html#list-displays">Python documentation</a>):</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="c"># a Python list comprehension isn&#39;t aware that once x reaches 10, it should</span>
</span><span class='line'><span class="c"># terminate, so we cheat and use a finite list of integers from 0 to 99.</span>
</span><span class='line'>
</span><span class='line'><span class="c"># define N</span>
</span><span class='line'><span class="n">N</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="c"># build the set</span>
</span><span class='line'><span class="n">A</span> <span class="o">=</span> <span class="p">[</span> <span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">N</span> <span class="k">if</span> <span class="n">x</span> <span class="o">&lt;</span> <span class="mi">10</span> <span class="p">]</span>
</span></code></pre></td></tr></table></div></figure>
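
<p>As a rough sketch of how the general form { x | x ∈ S, P(x) } maps onto code,
the snippet below uses a set comprehension (Python 3). The names <code>build_set</code>
and <code>below_ten</code> are made up for illustration and are not part of the example
code. It also shows one way around the termination problem:
<code>itertools.takewhile</code> stops at the first element that fails the predicate,
which only gives the right answer here because the universe is generated in
increasing order.</p>

```python
import itertools
import operator


def build_set(universe, predicate):
    """Return { x | x in universe, predicate(x) } as a Python set."""
    return {x for x in universe if predicate(x)}


def below_ten(x):
    # operator.lt(x, 10) is the function form of "x is less than 10"
    return operator.lt(x, 10)


# finite universe: the integers 0 through 99
A = build_set(range(100), below_ten)

# unbounded universe: takewhile stops at the first element that fails
# the predicate, which is only safe because count() is increasing
B = set(itertools.takewhile(below_ten, itertools.count()))

print(sorted(A))
print(A == B)
```

<p>Unlike a list, a Python <code>set</code> also discards duplicates, which is closer to
the mathematical definition.</p>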


<p>And in Clojure, we could use something similar:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="c1">;; define N</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="o">^</span><span class="p">{</span><span class="nv">:doc</span> <span class="s">&quot;Representation of the set of non-negative integers.&quot;</span><span class="p">}</span>
</span><span class='line'>  <span class="nv">N</span> <span class="p">(</span><span class="nb">iterate </span><span class="nv">inc</span> <span class="mi">0</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="c1">;; build the set; N is infinite, so stop at the first element that fails</span>
</span><span class='line'><span class="p">(</span><span class="nb">take-while </span><span class="o">#</span><span class="p">(</span><span class="nb">&lt; </span><span class="nv">%</span> <span class="mi">10</span><span class="p">)</span> <span class="nv">N</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<h3>Building the Subsets</h3>

<p>As mentioned earlier, I have already built sample datasets for both Python
and Clojure, so be sure to use those and save yourself from having to build
your own just yet!</p>

<h4>Python</h4>

<p>In Python, we can use the built-in <code>filter</code> function to build a list. We
pass it a predicate function that decides which books belong in the subset.</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="kn">import</span> <span class="nn">library</span>
</span><span class='line'><span class="kn">import</span> <span class="nn">sample_library</span>
</span><span class='line'>
</span><span class='line'><span class="c"># MY_LIBRARY is our superset</span>
</span><span class='line'><span class="n">MY_LIBRARY</span> <span class="o">=</span> <span class="n">sample_library</span><span class="o">.</span><span class="n">get_library</span><span class="p">()</span>
</span><span class='line'>
</span><span class='line'><span class="c"># build our filters</span>
</span><span class='line'><span class="n">IS_EPUB</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">book</span><span class="p">:</span> <span class="s">&#39;epub&#39;</span> <span class="ow">in</span> <span class="n">book</span><span class="o">.</span><span class="n">formats</span>
</span><span class='line'><span class="n">IS_MOBI</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">book</span><span class="p">:</span> <span class="s">&#39;mobi&#39;</span> <span class="ow">in</span> <span class="n">book</span><span class="o">.</span><span class="n">formats</span>
</span><span class='line'>
</span><span class='line'><span class="c"># build the subsets</span>
</span><span class='line'><span class="n">EPUB</span> <span class="o">=</span> <span class="n">library</span><span class="o">.</span><span class="n">BookCollection</span><span class="p">(</span><span class="n">MY_LIBRARY</span><span class="o">.</span><span class="n">books</span><span class="p">,</span> <span class="n">IS_EPUB</span><span class="p">)</span>
</span><span class='line'><span class="n">MOBI</span> <span class="o">=</span> <span class="n">library</span><span class="o">.</span><span class="n">BookCollection</span><span class="p">(</span><span class="n">MY_LIBRARY</span><span class="o">.</span><span class="n">books</span><span class="p">,</span> <span class="n">IS_MOBI</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>This gives me the output:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">In</span> <span class="p">[</span><span class="mi">7</span><span class="p">]:</span> <span class="n">formats</span><span class="o">.</span><span class="n">EPUB</span><span class="o">.</span><span class="n">show_titles</span><span class="p">(</span><span class="s">&quot;books in epub format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="n">books</span> <span class="ow">in</span> <span class="n">epub</span> <span class="n">format</span><span class="p">:</span>
</span><span class='line'>  <span class="n">Code</span> <span class="n">Complete</span>
</span><span class='line'>  <span class="n">The</span> <span class="n">Joy</span> <span class="n">of</span> <span class="n">Clojure</span>
</span><span class='line'>  <span class="n">Mining</span> <span class="n">the</span> <span class="n">Social</span> <span class="n">Web</span>
</span><span class='line'>
</span><span class='line'><span class="n">In</span> <span class="p">[</span><span class="mi">8</span><span class="p">]:</span> <span class="n">formats</span><span class="o">.</span><span class="n">MOBI</span><span class="o">.</span><span class="n">show_titles</span><span class="p">(</span><span class="s">&quot;books in mobi format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="n">books</span> <span class="ow">in</span> <span class="n">mobi</span> <span class="n">format</span><span class="p">:</span>
</span><span class='line'>  <span class="n">Introduction</span> <span class="n">to</span> <span class="n">Information</span> <span class="n">Retrieval</span>
</span><span class='line'>  <span class="n">Code</span> <span class="n">Complete</span>
</span><span class='line'>  <span class="n">Natural</span> <span class="n">Language</span> <span class="n">Processing</span> <span class="k">with</span> <span class="n">Python</span>
</span><span class='line'>
</span><span class='line'><span class="n">In</span> <span class="p">[</span><span class="mi">9</span><span class="p">]:</span>
</span></code></pre></td></tr></table></div></figure>


<p>If you recall the definition of <code>BookCollection</code>, the filter method is
called as <code>filter(predicate, collection)</code>. In the case of the <code>mobi</code>
subset, it filters out anything that fails the test
<code>'mobi' in book.formats</code>. We might write this as</p>

<blockquote><p>{ book | book ∈ <code>MY_LIBRARY</code>, <code>IS_MOBI(book)</code> }</p></blockquote>

<p>in set notation. I&#8217;ve predefined some filters in the file <code>formats.py</code>,
which is again in the <a href="http://kisom.github.com/downloads/code/using-set-theory/py_example.tar.gz">example code</a>.</p>
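
<p>If you don&#8217;t have the example code to hand, here is a minimal, self-contained
sketch of the same idea (Python 3). The <code>Book</code> namedtuple and the four-book
list are hypothetical stand-ins for the real <code>library.py</code> and
<code>sample_library.py</code>, so treat the details as assumptions. It shows that
<code>filter(predicate, collection)</code> and the set notation above describe the
same subset.</p>

```python
from collections import namedtuple

# hypothetical stand-in for the Book class in library.py
Book = namedtuple("Book", ["title", "formats"])

# stand-in data; the real collection comes from sample_library.get_library()
MY_LIBRARY = [
    Book("Code Complete", ("epub", "mobi")),
    Book("The Joy of Clojure", ("epub",)),
    Book("Natural Language Processing with Python", ("mobi",)),
    Book("Learning OpenCV", ("pdf",)),
]


def is_mobi(book):
    """Predicate: is this book available in mobi format?"""
    return "mobi" in book.formats


# filter(predicate, collection) keeps the elements that pass the predicate
with_filter = set(filter(is_mobi, MY_LIBRARY))

# the same subset, written to mirror the set notation
with_comprehension = {book for book in MY_LIBRARY if is_mobi(book)}

print(with_filter == with_comprehension)
print(sorted(book.title for book in with_filter))
```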

<h4>Clojure</h4>

<p>Likewise, Clojure has a built-in filter function, in the form
<code>(filter pred coll)</code>. We&#8217;ll use two
<a href="http://clojuredocs.org/clojure_core/clojure.core/fn">anonymous functions</a>
to do our filtering:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">use</span> <span class="ss">&#39;using_set_theory</span><span class="o">.</span><span class="nv">library</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nf">use</span> <span class="ss">&#39;using_set_theory</span><span class="o">.</span><span class="nv">sample_library</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">epub?</span> <span class="o">#</span><span class="p">(</span><span class="nv">in?</span> <span class="p">(</span><span class="nf">:formats</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">:epub</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">mobi?</span> <span class="o">#</span><span class="p">(</span><span class="nv">in?</span> <span class="p">(</span><span class="nf">:formats</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">:mobi</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">my-library</span> <span class="p">(</span><span class="nf">get-library</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">epub</span> <span class="p">(</span><span class="nb">filter </span><span class="nv">epub?</span> <span class="nv">my-library</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">mobi</span> <span class="p">(</span><span class="nb">filter </span><span class="nv">mobi?</span> <span class="nv">my-library</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">list-titles</span> <span class="nv">epub</span> <span class="s">&quot;list of books in epub format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nf">list-titles</span> <span class="nv">mobi</span> <span class="s">&quot;list of books in mobi format:&quot;</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>In the REPL, this gives me:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="nv">using_set_theory</span><span class="o">.</span><span class="nv">core=&gt;</span> <span class="p">(</span><span class="nf">list-titles</span> <span class="nv">epub</span> <span class="s">&quot;list of books in epub format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nb">list </span><span class="nv">of</span> <span class="nv">books</span> <span class="nv">in</span> <span class="nv">epub</span> <span class="nv">format:</span><span class="p">)</span>
</span><span class='line'>   <span class="nv">The</span> <span class="nv">Joy</span> <span class="nv">of</span> <span class="nv">Clojure</span>
</span><span class='line'>   <span class="nv">Mining</span> <span class="nv">the</span> <span class="nv">Social</span> <span class="nv">Web</span>
</span><span class='line'>   <span class="nv">Code</span> <span class="nv">Complete</span>
</span><span class='line'><span class="nv">nil</span>
</span><span class='line'><span class="nv">using_set_theory</span><span class="o">.</span><span class="nv">core=&gt;</span> <span class="p">(</span><span class="nf">list-titles</span> <span class="nv">mobi</span> <span class="s">&quot;list of books in mobi format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nb">list </span><span class="nv">of</span> <span class="nv">books</span> <span class="nv">in</span> <span class="nv">mobi</span> <span class="nv">format:</span><span class="p">)</span>
</span><span class='line'>   <span class="nv">Introduction</span> <span class="nv">to</span> <span class="nv">Information</span> <span class="nv">Retrieval</span>
</span><span class='line'>   <span class="nv">Natural</span> <span class="nv">Language</span> <span class="nv">Processing</span> <span class="nv">with</span> <span class="nv">Python</span>
</span><span class='line'>   <span class="nv">Code</span> <span class="nv">Complete</span>
</span><span class='line'><span class="nv">nil</span>
</span><span class='line'><span class="nv">using_set_theory</span><span class="o">.</span><span class="nv">core=&gt;</span>
</span></code></pre></td></tr></table></div></figure>


<p>I&#8217;ve put these filters in the <code>filters.clj</code> source file, along with definitions
for <code>epub-books</code> and <code>mobi-books</code>:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">ns</span> <span class="nv">using_set_theory</span><span class="o">.</span><span class="nv">filters</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">:use</span> <span class="p">[</span><span class="nv">using_set_theory</span><span class="o">.</span><span class="nv">library</span><span class="p">]</span>
</span><span class='line'>        <span class="p">[</span><span class="nv">using_set_theory</span><span class="o">.</span><span class="nv">sample_library</span><span class="p">]))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="o">^</span><span class="p">{</span><span class="nv">:doc</span> <span class="s">&quot;Returns true if the book is available in epub format.&quot;</span><span class="p">}</span>
</span><span class='line'>  <span class="nv">epub?</span>
</span><span class='line'>  <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">book</span><span class="p">]</span> <span class="p">(</span><span class="nf">in?</span> <span class="p">(</span><span class="nf">:formats</span> <span class="nv">book</span><span class="p">)</span> <span class="nv">:epub</span><span class="p">)))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="o">^</span><span class="p">{</span><span class="nv">:doc</span> <span class="s">&quot;Returns true if the book is available in mobi format.&quot;</span><span class="p">}</span>
</span><span class='line'>  <span class="nv">mobi?</span>
</span><span class='line'>  <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">book</span><span class="p">]</span> <span class="p">(</span><span class="nf">in?</span> <span class="p">(</span><span class="nf">:formats</span> <span class="nv">book</span><span class="p">)</span> <span class="nv">:mobi</span><span class="p">)))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">defn- </span><span class="nv">get-epub</span>
</span><span class='line'>  <span class="s">&quot;Takes a collection of books and returns the list of books in epub format.&quot;</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">books</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">filter </span><span class="nv">epub?</span> <span class="nv">books</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">defn- </span><span class="nv">get-mobi</span>
</span><span class='line'>  <span class="s">&quot;Takes a collection of books and returns the list of books in mobi format.&quot;</span>
</span><span class='line'>  <span class="p">[</span><span class="nv">books</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">filter </span><span class="nv">mobi?</span> <span class="nv">books</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">epub-books</span> <span class="p">(</span><span class="nb">set </span><span class="p">(</span><span class="nf">get-epub</span> <span class="p">(</span><span class="nf">get-library</span><span class="p">))))</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">mobi-books</span> <span class="p">(</span><span class="nb">set </span><span class="p">(</span><span class="nf">get-mobi</span> <span class="p">(</span><span class="nf">get-library</span><span class="p">))))</span>
</span></code></pre></td></tr></table></div></figure>


<h2>Parallels with SQL</h2>

<p>This introduction of filters might remind you of SQL, and for good reason.
<a href="https://en.wikipedia.org/wiki/Edgar_F._Codd">Edgar Codd</a> grounded the relational
model, on which SQL is based, in set theory. You can think of tables as sets (provided, of course,
proper data preparation is done to ensure there are no duplicates in the
database), and operations like <code>SELECT</code> return subsets. For example, if we
were storing the books of our library in a table, we could select the epub subset with something like</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='sql'><span class='line'><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">books</span> <span class="k">WHERE</span> <span class="n">has_epub</span> <span class="o">=</span> <span class="k">TRUE</span><span class="p">;</span>
</span></code></pre></td></tr></table></div></figure>
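
<p>To see the parallel in action, here is a small sketch using Python&#8217;s built-in
<code>sqlite3</code> module with an in-memory database (Python 3). The table layout
and the three rows are assumptions made up for this illustration; only the
<code>books</code> table name and the <code>has_epub</code> column come from the query
above. The flag is stored as an integer and compared against <code>1</code>, since
older SQLite versions do not recognise the <code>TRUE</code> keyword. The primary key
on <code>title</code> is what guarantees the table behaves like a set.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# PRIMARY KEY on title rules out duplicate rows, so the table acts as a set
conn.execute("CREATE TABLE books (title TEXT PRIMARY KEY, has_epub INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("Code Complete", 1),
        ("The Joy of Clojure", 1),
        ("Learning OpenCV", 0),
    ],
)

# the whole table is the superset; the WHERE clause is the predicate
all_titles = {row[0] for row in conn.execute("SELECT title FROM books")}
epub_titles = {
    row[0] for row in conn.execute("SELECT title FROM books WHERE has_epub = 1")
}

print(sorted(epub_titles))
print(epub_titles.issubset(all_titles))
conn.close()
```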


<h2>Moving On</h2>

<p>Now that we have a programmatic way to build subsets, we can automate the
set operations from the <a href="http://kisom.github.com/blog/2012/01/23/basic-set-theory/">last post</a>:</p>

<h3>Python</h3>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="kn">import</span> <span class="nn">formats</span>
</span><span class='line'><span class="kn">import</span> <span class="nn">library</span>
</span><span class='line'>
</span><span class='line'><span class="n">either_format</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">union</span><span class="p">(</span><span class="n">formats</span><span class="o">.</span><span class="n">EPUB</span><span class="o">.</span><span class="n">books</span><span class="p">,</span> <span class="n">formats</span><span class="o">.</span><span class="n">MOBI</span><span class="o">.</span><span class="n">books</span><span class="p">)</span>
</span><span class='line'><span class="n">either_format</span> <span class="o">=</span> <span class="n">library</span><span class="o">.</span><span class="n">BookCollection</span><span class="p">(</span><span class="n">either_format</span><span class="p">)</span>
</span><span class='line'><span class="n">either_format</span><span class="o">.</span><span class="n">show_titles</span><span class="p">(</span><span class="s">&quot;books in either format:&quot;</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="n">both_formats</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">intersection</span><span class="p">(</span><span class="n">formats</span><span class="o">.</span><span class="n">EPUB</span><span class="o">.</span><span class="n">books</span><span class="p">,</span> <span class="n">formats</span><span class="o">.</span><span class="n">MOBI</span><span class="o">.</span><span class="n">books</span><span class="p">)</span>
</span><span class='line'><span class="n">both_formats</span> <span class="o">=</span> <span class="n">library</span><span class="o">.</span><span class="n">BookCollection</span><span class="p">(</span><span class="n">both_formats</span><span class="p">)</span>
</span><span class='line'><span class="n">both_formats</span><span class="o">.</span><span class="n">show_titles</span><span class="p">(</span><span class="s">&quot;books in both formats:&quot;</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>which gives me the results:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">In</span> <span class="p">[</span><span class="mi">31</span><span class="p">]:</span> <span class="n">either_format</span><span class="o">.</span><span class="n">show_titles</span><span class="p">(</span><span class="s">&quot;books in either format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="n">Out</span><span class="p">[</span><span class="mi">31</span><span class="p">]:</span>
</span><span class='line'><span class="n">books</span> <span class="ow">in</span> <span class="n">either</span> <span class="n">format</span><span class="p">:</span>
</span><span class='line'>  <span class="n">Code</span> <span class="n">Complete</span>
</span><span class='line'>  <span class="n">Mining</span> <span class="n">the</span> <span class="n">Social</span> <span class="n">Web</span>
</span><span class='line'>  <span class="n">Natural</span> <span class="n">Language</span> <span class="n">Processing</span> <span class="k">with</span> <span class="n">Python</span>
</span><span class='line'>  <span class="n">Introduction</span> <span class="n">to</span> <span class="n">Information</span> <span class="n">Retrieval</span>
</span><span class='line'>  <span class="n">The</span> <span class="n">Joy</span> <span class="n">of</span> <span class="n">Clojure</span>
</span><span class='line'><span class="n">In</span> <span class="p">[</span><span class="mi">32</span><span class="p">]:</span> <span class="n">both_formats</span><span class="o">.</span><span class="n">show_titles</span><span class="p">(</span><span class="s">&quot;books in both formats:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="n">Out</span><span class="p">[</span><span class="mi">32</span><span class="p">]:</span>
</span><span class='line'><span class="n">books</span> <span class="ow">in</span> <span class="n">both</span> <span class="n">formats</span><span class="p">:</span>
</span><span class='line'>  <span class="n">Code</span> <span class="n">Complete</span>
</span></code></pre></td></tr></table></div></figure>
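
<p>The same operations can be tried without the example code by working on plain
sets of titles (Python 3). This is only a sketch: the two sets below are copied
from the <code>show_titles</code> output earlier in this post, and it skips the
<code>BookCollection</code> wrapper entirely.</p>

```python
# titles copied from the epub and mobi listings shown earlier
epub = {"Code Complete", "The Joy of Clojure", "Mining the Social Web"}
mobi = {
    "Introduction to Information Retrieval",
    "Code Complete",
    "Natural Language Processing with Python",
}

# union: books available in at least one of the two formats
either_format = epub.union(mobi)

# intersection: books available in both formats
both_formats = epub.intersection(mobi)

print(len(either_format))
print(sorted(both_formats))

# the intersection is always a subset of the union
print(both_formats.issubset(either_format))
```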


<h3>Clojure</h3>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">require</span> <span class="ss">&#39;clojure</span><span class="o">.</span><span class="nv">set</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nf">use</span> <span class="ss">&#39;using_set_theory</span><span class="o">.</span><span class="nv">filters</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nf">use</span> <span class="ss">&#39;using_set_theory</span><span class="o">.</span><span class="nv">library</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">either-format</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/union</span> <span class="nv">epub-books</span> <span class="nv">mobi-books</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">both-formats</span>
</span><span class='line'>    <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/intersection</span> <span class="nv">epub-books</span> <span class="nv">mobi-books</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">show-titles</span> <span class="nv">either-format</span> <span class="s">&quot;books in either format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nf">show-titles</span> <span class="nv">both-formats</span> <span class="s">&quot;books in both formats:&quot;</span><span class="p">)</span>
</span></code></pre></td></tr></table></div></figure>


<p>In the Clojure REPL, I get the following output:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="nv">using_set_theory</span><span class="o">.</span><span class="nv">core=&gt;</span> <span class="p">(</span><span class="nf">show-titles</span> <span class="nv">either-format</span> <span class="s">&quot;books in either format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nf">books</span> <span class="nv">in</span> <span class="nv">either</span> <span class="nv">format:</span><span class="p">)</span>
</span><span class='line'>   <span class="nv">Introduction</span> <span class="nv">to</span> <span class="nv">Information</span> <span class="nv">Retrieval</span>
</span><span class='line'>   <span class="nv">The</span> <span class="nv">Joy</span> <span class="nv">of</span> <span class="nv">Clojure</span>
</span><span class='line'>   <span class="nv">Natural</span> <span class="nv">Language</span> <span class="nv">Processing</span> <span class="nv">with</span> <span class="nv">Python</span>
</span><span class='line'>   <span class="nv">Code</span> <span class="nv">Complete</span>
</span><span class='line'>   <span class="nv">Mining</span> <span class="nv">the</span> <span class="nv">Social</span> <span class="nv">Web</span>
</span><span class='line'><span class="nv">nil</span>
</span><span class='line'><span class="nv">using_set_theory</span><span class="o">.</span><span class="nv">core=&gt;</span> <span class="p">(</span><span class="nf">show-titles</span> <span class="nv">both-formats</span> <span class="s">&quot;books in both formats:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nf">books</span> <span class="nv">in</span> <span class="nv">both</span> <span class="nv">formats:</span><span class="p">)</span>
</span><span class='line'>        <span class="nv">Code</span> <span class="nv">Complete</span>
</span><span class='line'><span class="nv">nil</span>
</span><span class='line'><span class="nv">using_set_theory</span><span class="o">.</span><span class="nv">core=&gt;</span>
</span></code></pre></td></tr></table></div></figure>


<h2>Sets v. Lists</h2>

<p>Remember that one of the key attributes of a set is that each member is distinct.
Let&#8217;s compare a set with a list; we&#8217;ll do this with a union.</p>

<h3>Python</h3>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="kn">import</span> <span class="nn">library</span>
</span><span class='line'><span class="kn">import</span> <span class="nn">formats</span>
</span><span class='line'><span class="kn">from</span> <span class="nn">sample_library</span> <span class="kn">import</span> <span class="n">get_library</span>
</span><span class='line'>
</span><span class='line'><span class="n">epub_list</span> <span class="o">=</span> <span class="p">[</span><span class="n">book</span> <span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">get_library</span><span class="p">()</span><span class="o">.</span><span class="n">books</span>
</span><span class='line'>             <span class="k">if</span> <span class="s">&#39;epub&#39;</span> <span class="ow">in</span> <span class="n">book</span><span class="o">.</span><span class="n">formats</span><span class="p">]</span>
</span><span class='line'><span class="n">mobi_list</span> <span class="o">=</span> <span class="p">[</span><span class="n">book</span> <span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">get_library</span><span class="p">()</span><span class="o">.</span><span class="n">books</span>
</span><span class='line'>             <span class="k">if</span> <span class="s">&#39;mobi&#39;</span> <span class="ow">in</span> <span class="n">book</span><span class="o">.</span><span class="n">formats</span><span class="p">]</span>
</span><span class='line'>
</span><span class='line'><span class="n">both_formats</span> <span class="o">=</span> <span class="p">[]</span>
</span><span class='line'><span class="n">both_formats</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">epub_list</span><span class="p">))</span>
</span><span class='line'><span class="n">both_formats</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">mobi_list</span><span class="p">))</span>
</span><span class='line'><span class="k">print</span> <span class="s">&#39;books in both formats:&#39;</span>
</span><span class='line'><span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">both_formats</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span> <span class="s">&#39;</span><span class="se">\t</span><span class="si">%s</span><span class="s">&#39;</span> <span class="o">%</span> <span class="n">book</span><span class="o">.</span><span class="n">title</span>
</span><span class='line'>  
</span></code></pre></td></tr></table></div></figure>


<p>The result:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>books in both formats:
</span><span class='line'>  Mining the Social Web
</span><span class='line'>  Code Complete
</span><span class='line'>  The Joy of Clojure
</span><span class='line'>  Code Complete
</span><span class='line'>  Natural Language Processing with Python
</span><span class='line'>  Introduction to Information Retrieval</span></code></pre></td></tr></table></div></figure>


<h3>Clojure</h3>

<p>In Clojure, we&#8217;ll use the vector type, which is like a list but the first
element isn&#8217;t treated as a function to call:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">use</span> <span class="ss">&#39;using_set_theory</span><span class="o">.</span><span class="nv">library</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nf">use</span> <span class="ss">&#39;using_set_theory</span><span class="o">.</span><span class="nv">sample_library</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nf">use</span> <span class="ss">&#39;using_set_theory</span><span class="o">.</span><span class="nv">filters</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="nf">use</span> <span class="o">&#39;</span><span class="p">[</span><span class="nv">clojure</span><span class="o">.</span><span class="nv">contrib</span><span class="o">.</span><span class="nv">seq-utils</span> <span class="nv">:only</span> <span class="p">[</span><span class="nv">includes?</span><span class="p">]])</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">epub-list</span> <span class="p">(</span><span class="nf">vec</span> <span class="p">(</span><span class="nb">map </span><span class="nv">:title</span> <span class="nv">epub-books</span><span class="p">)))</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">mobi-list</span> <span class="p">(</span><span class="nf">vec</span> <span class="p">(</span><span class="nb">map </span><span class="nv">:title</span> <span class="nv">mobi-books</span><span class="p">)))</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">both-list</span> <span class="p">(</span><span class="nb">concat </span><span class="nv">epub-list</span> <span class="nv">mobi-list</span><span class="p">))</span>
</span></code></pre></td></tr></table></div></figure>


<p>Which yields:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>clojure.core=> (doseq [title (union epub-list mobi-list)] (println title))
</span><span class='line'>The Joy of Clojure
</span><span class='line'>Code Complete
</span><span class='line'>Mining the Social Web
</span><span class='line'>Introduction to Information Retrieval
</span><span class='line'>Natural Language Processing with Python
</span><span class='line'>nil
</span><span class='line'>clojure.core=></span></code></pre></td></tr></table></div></figure>


<h3>So what?</h3>

<p>You&#8217;ll notice &#8220;Code Complete&#8221; shows up twice in the list. The advantage of sets
here is that only unique items are returned. A union is effectively the combined
list of elements from both sets, <em>minus</em> the extra copies of the items that
appear in both sets (the intersection).</p>
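
<p>As a quick check of that idea, here is a minimal Python 3 sketch using the
built-in <code>set</code> type. The two title lists are made-up stand-ins (the split
between formats is an assumption for illustration, not the real library data):</p>

```python
# Hypothetical title lists; which book is in which format is assumed.
epub_titles = ["Mining the Social Web", "Code Complete", "The Joy of Clojure"]
mobi_titles = ["Code Complete", "Natural Language Processing with Python"]

# Concatenating the lists keeps the duplicate copy of the shared title.
combined = epub_titles + mobi_titles

# A set union keeps each title exactly once.
union = set(epub_titles) | set(mobi_titles)
intersection = set(epub_titles).intersection(mobi_titles)

# The concatenation is as long as the union plus the intersection.
print(len(combined), len(union), len(intersection))
```

<p>As long as neither input list repeats a title on its own, the concatenated
length equals the size of the union plus the size of the intersection.</p>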

<h3>A Second Stab: Python</h3>

<p>Implementing the set operations:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">in_both</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">a</span> <span class="ow">and</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">b</span>
</span><span class='line'>
</span><span class='line'><span class="k">def</span> <span class="nf">intersect</span><span class="p">(</span><span class="n">seta</span><span class="p">,</span> <span class="n">setb</span><span class="p">):</span>
</span><span class='line'>    <span class="n">both_list</span> <span class="o">=</span> <span class="p">[]</span>
</span><span class='line'>    <span class="n">both_list</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">seta</span><span class="p">)</span>
</span><span class='line'>    <span class="n">both_list</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">setb</span><span class="p">)</span>
</span><span class='line'>    <span class="n">intersect_list</span> <span class="o">=</span> <span class="p">[]</span>
</span><span class='line'>    <span class="n">temp_list</span> <span class="o">=</span> <span class="n">both_list</span><span class="p">[:]</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">while</span> <span class="ow">not</span> <span class="n">temp_list</span> <span class="o">==</span> <span class="p">[]:</span>
</span><span class='line'>        <span class="n">element</span> <span class="o">=</span> <span class="n">temp_list</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
</span><span class='line'>        <span class="k">if</span> <span class="ow">not</span> <span class="n">element</span> <span class="ow">in</span> <span class="n">intersect_list</span><span class="p">:</span>
</span><span class='line'>            <span class="k">if</span> <span class="n">in_both</span><span class="p">(</span><span class="n">element</span><span class="p">,</span> <span class="n">seta</span><span class="p">,</span> <span class="n">setb</span><span class="p">):</span>
</span><span class='line'>                <span class="n">intersect_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">element</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">return</span> <span class="n">intersect_list</span>
</span><span class='line'>
</span><span class='line'><span class="k">def</span> <span class="nf">union</span><span class="p">(</span><span class="n">seta</span><span class="p">,</span> <span class="n">setb</span><span class="p">):</span>
</span><span class='line'>    <span class="n">both_list</span> <span class="o">=</span> <span class="p">[]</span>
</span><span class='line'>    <span class="n">both_list</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">seta</span><span class="p">)</span>
</span><span class='line'>    <span class="n">both_list</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">setb</span><span class="p">)</span>
</span><span class='line'>    <span class="n">intersect_list</span> <span class="o">=</span> <span class="n">intersect</span><span class="p">(</span><span class="n">seta</span><span class="p">,</span> <span class="n">setb</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">while</span> <span class="ow">not</span> <span class="n">intersect_list</span> <span class="o">==</span> <span class="p">[]:</span>
</span><span class='line'>        <span class="n">element</span> <span class="o">=</span> <span class="n">intersect_list</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
</span><span class='line'>        <span class="k">while</span> <span class="n">both_list</span><span class="o">.</span><span class="n">count</span><span class="p">(</span><span class="n">element</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">1</span><span class="p">:</span>
</span><span class='line'>            <span class="n">both_list</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">element</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'>    <span class="k">return</span> <span class="n">both_list</span>
</span></code></pre></td></tr></table></div></figure>


<p>Applying this to our lists:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">In</span> <span class="p">[</span><span class="mi">30</span><span class="p">]:</span> <span class="n">union</span><span class="p">(</span><span class="n">epub_list</span><span class="p">,</span> <span class="n">mobi_list</span><span class="p">)</span>
</span><span class='line'><span class="n">Out</span><span class="p">[</span><span class="mi">30</span><span class="p">]:</span>
</span><span class='line'><span class="p">[</span><span class="s">&#39;Mining the Social Web&#39;</span><span class="p">,</span>
</span><span class='line'> <span class="s">&#39;The Joy of Clojure&#39;</span><span class="p">,</span>
</span><span class='line'> <span class="s">&#39;Code Complete&#39;</span><span class="p">,</span>
</span><span class='line'> <span class="s">&#39;Introduction to Information Retrieval&#39;</span><span class="p">,</span>
</span><span class='line'> <span class="s">&#39;Natural Language Processing with Python&#39;</span><span class="p">]</span>
</span><span class='line'>
</span><span class='line'><span class="n">In</span> <span class="p">[</span><span class="mi">31</span><span class="p">]:</span>
</span></code></pre></td></tr></table></div></figure>
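
<p>For comparison, the same result can be sketched more compactly. This is a
Python 3 illustration rather than the code used in this post;
<code>ordered_union</code> is a hypothetical name, and the sketch assumes the
items are hashable (plain title strings are):</p>

```python
def ordered_union(list_a, list_b):
    """Return the items of both lists, keeping only the first occurrence of each."""
    seen = set()
    result = []
    for item in list_a + list_b:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(ordered_union(["a", "b"], ["b", "c"]))
```

<p>Tracking already-seen items in a set avoids rescanning the result list for
every element, while the list keeps the original ordering.</p>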


<h3>A Second Stab: Clojure</h3>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nf">ns</span> <span class="nv">myset</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">:use</span> <span class="p">[</span><span class="nv">clojure</span><span class="o">.</span><span class="nv">contrib</span><span class="o">.</span><span class="nv">seq-utils</span> <span class="nv">:only</span> <span class="p">[</span><span class="nv">includes?</span><span class="p">]]))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">defn </span><span class="nv">unique?</span> <span class="p">[</span><span class="nv">el</span> <span class="nv">ulst</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nb">= </span><span class="mi">0</span> <span class="p">(</span><span class="nb">count </span><span class="p">(</span><span class="nb">filter </span><span class="o">#</span><span class="p">(</span><span class="nv">=</span> <span class="nv">%</span> <span class="nv">el</span><span class="p">)</span> <span class="nv">ulst</span><span class="p">))))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">defn </span><span class="nv">get-intersect</span> <span class="p">[</span><span class="nv">ilist</span> <span class="nv">seta</span> <span class="nv">setb</span> <span class="nv">both-list</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">empty?</span> <span class="nv">both-list</span><span class="p">)</span>
</span><span class='line'>    <span class="nv">ilist</span>
</span><span class='line'>    <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">element</span> <span class="p">(</span><span class="nb">first </span><span class="nv">both-list</span><span class="p">)]</span>
</span><span class='line'>      <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">and </span><span class="p">(</span><span class="nf">includes?</span> <span class="nv">seta</span> <span class="nv">element</span><span class="p">)</span>
</span><span class='line'>               <span class="p">(</span><span class="nf">includes?</span> <span class="nv">setb</span> <span class="nv">element</span><span class="p">)</span>
</span><span class='line'>               <span class="p">(</span><span class="nb">not </span><span class="p">(</span><span class="nf">includes?</span> <span class="nv">ilist</span> <span class="nv">element</span><span class="p">)))</span>
</span><span class='line'>        <span class="p">(</span><span class="nf">get-intersect</span> <span class="p">(</span><span class="nb">conj </span><span class="nv">ilist</span> <span class="nv">element</span><span class="p">)</span> <span class="nv">seta</span> <span class="nv">setb</span> <span class="p">(</span><span class="nb">rest </span><span class="nv">both-list</span><span class="p">))</span>
</span><span class='line'>        <span class="p">(</span><span class="nf">get-intersect</span> <span class="nv">ilist</span> <span class="nv">seta</span> <span class="nv">setb</span> <span class="p">(</span><span class="nb">rest </span><span class="nv">both-list</span><span class="p">))))))</span>
</span><span class='line'>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">defn </span><span class="nv">check-unique</span> <span class="p">[</span><span class="nv">ilist</span> <span class="nv">both</span> <span class="nv">ulist</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">empty?</span> <span class="nv">ilist</span><span class="p">)</span>
</span><span class='line'>   <span class="nv">ulist</span>
</span><span class='line'>   <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">element</span> <span class="p">(</span><span class="nb">first </span><span class="nv">ilist</span><span class="p">)]</span>
</span><span class='line'>     <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">includes?</span> <span class="nv">ulist</span> <span class="nv">element</span><span class="p">)</span>
</span><span class='line'>       <span class="p">(</span><span class="nf">check-unique</span> <span class="p">(</span><span class="nb">rest </span><span class="nv">ilist</span><span class="p">)</span> <span class="nv">both</span> <span class="nv">ulist</span><span class="p">)</span>
</span><span class='line'>       <span class="p">(</span><span class="nf">check-unique</span> <span class="p">(</span><span class="nb">rest </span><span class="nv">ilist</span><span class="p">)</span> <span class="nv">both</span> <span class="p">(</span><span class="nb">conj </span><span class="nv">ulist</span> <span class="nv">element</span><span class="p">))))))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">defn </span><span class="nv">intersect</span> <span class="p">[</span><span class="nv">seta</span> <span class="nv">setb</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="nf">get-intersect</span> <span class="p">[]</span> <span class="nv">seta</span> <span class="nv">setb</span> <span class="p">(</span><span class="nb">concat </span><span class="nv">seta</span> <span class="nv">setb</span><span class="p">)))</span>
</span><span class='line'>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="k">defn </span><span class="nv">union</span> <span class="p">[</span><span class="nv">seta</span> <span class="nv">setb</span><span class="p">]</span>
</span><span class='line'>  <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">both-sets</span> <span class="p">(</span><span class="nb">concat </span><span class="nv">seta</span> <span class="nv">setb</span><span class="p">)</span>
</span><span class='line'>        <span class="nv">intersection</span> <span class="p">(</span><span class="nf">intersect</span> <span class="nv">seta</span> <span class="nv">setb</span><span class="p">)]</span>
</span><span class='line'>        <span class="p">(</span><span class="nf">check-unique</span> <span class="nv">both-sets</span> <span class="nv">intersection</span> <span class="p">[])))</span>
</span></code></pre></td></tr></table></div></figure>


<p>Applying this:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="nv">clojure</span><span class="o">.</span><span class="nv">core=&gt;</span> <span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">title</span> <span class="p">(</span><span class="nb">union </span><span class="nv">epub-list</span> <span class="nv">mobi-list</span><span class="p">)]</span> <span class="p">(</span><span class="nb">println </span><span class="nv">title</span><span class="p">))</span>
</span><span class='line'><span class="nv">The</span> <span class="nv">Joy</span> <span class="nv">of</span> <span class="nv">Clojure</span>
</span><span class='line'><span class="nv">Code</span> <span class="nv">Complete</span>
</span><span class='line'><span class="nv">Mining</span> <span class="nv">the</span> <span class="nv">Social</span> <span class="nv">Web</span>
</span><span class='line'><span class="nv">Introduction</span> <span class="nv">to</span> <span class="nv">Information</span> <span class="nv">Retrieval</span>
</span><span class='line'><span class="nv">Natural</span> <span class="nv">Language</span> <span class="nv">Processing</span> <span class="nv">with</span> <span class="nv">Python</span>
</span><span class='line'><span class="nv">nil</span>
</span><span class='line'><span class="nv">clojure</span><span class="o">.</span><span class="nv">core=&gt;</span>
</span></code></pre></td></tr></table></div></figure>


<h2>Applications</h2>

<p>This has been just a quick introduction to the topic, but hopefully you
can see the relevance to areas like data mining. Coincidentally, datasets
tend to conform to the mathematical idea of sets, and typically with some
data massaging (e.g. filtering out duplicates), those that don&#8217;t can
be made more like mathematical sets. Once appropriately represented in the
computer, they can be acted upon with the basic set operations.</p>
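
<p>A rough Python 3 sketch of that kind of data massaging follows. The rows and
the <code>titles_in</code> helper are hypothetical, invented for illustration:</p>

```python
# Hypothetical raw rows of (title, format); the repeated row is deliberate.
rows = [
    ("Code Complete", "epub"),
    ("Code Complete", "mobi"),
    ("Code Complete", "epub"),
    ("The Joy of Clojure", "epub"),
]

def titles_in(fmt):
    """Build the set of titles available in the given format."""
    return {title for title, row_fmt in rows if row_fmt == fmt}

epub = titles_in("epub")
mobi = titles_in("mobi")

print(sorted(epub | mobi))              # titles in either format
print(sorted(epub.intersection(mobi)))  # titles in both formats
```

<p>Building each set with a comprehension drops the repeated row automatically,
after which the usual set operations apply.</p>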

<p>I&#8217;ve created an additional example: a web service providing a REST API to
the book collection. As with the code in this post, there is an example in
<a href="https://bitbucket.org/kisom/py_web_service/get/release-1.0.2.tar.gz">Python</a>
and in
<a href="https://github.com/kisom/clj_web_service/tarball/release-1.0.2">Clojure</a>. The
README in either example explains what dependencies are required. You can also
view the <a href="https://bitbucket.org/kisom/py_web_service/">Bitbucket repo</a> for the
Python example, or the <a href="https://github.com/kisom/clj_web_service">GitHub repo</a>
for the Clojure example.</p>

<h2>Acknowledgements</h2>

<p><a href="https://www.github.com/saolsen">Stephen Olsen</a> reviewed many iterations of this
article and helped me to properly articulate the important points (like illustrating
that unions require the subtraction of the intersection). I originally wrote
the bulk of this article on the 25th, but it took me until the 28th to finish
writing the API example code, until the 31st to add in the additional union
explanation, and until the 1st to polish it up.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Noir v. Flask]]></title>
    <link href="http://kisom.github.com/blog/2012/01/28/noir-v-flask/"/>
    <updated>2012-01-28T18:49:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/01/28/noir-v-flask</id>
    <content type="html"><![CDATA[<p>Noir v. Flask: the shootout</p>

<p>I wrote a quick REST API server as an illustration for a blog post, in both
a Python and a Clojure version. I wrote a test suite to
cover the entire API (of course - you <em>do</em> write tests too, right?),
and I figured while I was at it, I might as well benchmark the two. Here
are the average times for one run of the test suite, taken over 1,000 runs:</p>

<!-- more -->


<ul>
<li>noir: average run time: <code>0:00:00.171184</code> (0.171184 seconds)</li>
<li>flask: average run time: <code>0:00:00.147073</code> (0.147073 seconds)</li>
</ul>


<p>Notes:</p>

<ul>
<li>the noir server takes much longer to start than the flask server, about
5-10 seconds on my 2011 MacBook Air (1.6 GHz Intel Core i5 with 4 GB of RAM
and a 64 GB SSD)</li>
<li>both servers were running on the same machine at the same time,
obviously just listening on different ports</li>
<li>I tested this with the Python test suite</li>
</ul>
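
<p>For anyone wanting to reproduce a measurement like this, here is a minimal
Python 3 sketch of averaging the run time of a callable. It is not the harness
used for the numbers above; <code>average_run_time</code> and the stand-in workload
are assumptions for illustration:</p>

```python
import datetime

def average_run_time(run_suite, runs=1000):
    """Call run_suite() `runs` times and return the mean duration as a timedelta."""
    total = datetime.timedelta()
    for _ in range(runs):
        start = datetime.datetime.now()
        run_suite()
        total += datetime.datetime.now() - start
    return total / runs

# Stand-in workload; a real benchmark would run the API test suite here.
avg = average_run_time(lambda: sum(range(1000)), runs=10)
print("average run time: %s (%f seconds)" % (avg, avg.total_seconds()))
```

<p>Wall-clock timing like this includes whatever else the machine is doing, so
the figures are only a rough comparison.</p>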


<p>Source Code:</p>

<ul>
<li><a href="https://github.com/kisom/clj_web_service">clojure / noir example</a></li>
<li><a href="https://bitbucket.org/kisom/py_web_service">Python / flask example</a></li>
</ul>


<p>References:</p>

<ul>
<li><a href="http://www.webnoir.org">Noir</a></li>
<li><a href="http://flask.pocoo.org/">Flask</a></li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Basic Set Theory]]></title>
    <link href="http://kisom.github.com/blog/2012/01/23/basic-set-theory/"/>
    <updated>2012-01-23T17:44:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/01/23/basic-set-theory</id>
    <content type="html"><![CDATA[<p>Recently, I was explaining to someone the basics of set theory and how the
various basic operations translate to the real world. I used the example of the
project I&#8217;m currently working on, which is a web front end to my ebook library.
This is a very quick introduction aimed at people with a programming background
but who don&#8217;t have a strong math background; the goal is to help you to learn
to use them without having to delve deep into the math behind them.</p>

<!-- more -->


<h2>Basic Properties of Sets</h2>

<p>The first thing we have to do is to explain what is meant by a <em>set</em>:</p>

<blockquote><p>definition: set<br>
A set is any collection of items where each item is unique and the order of
items in the collection is not important.</p></blockquote>

<p>The uniqueness property is very important to sets: there are no duplicates in
a set.</p>

<p>So what does a set look like? In my database, I have a list of all the books
I have electronic copies of. Each book comes in at least one of three formats:
PDF, epub, or mobi. The <em>superset</em> (the universal set of all the items
under consideration) is the set of all the books in the library; we&#8217;ll call
it &#8216;L&#8217; (for Library). Part of the set might look like:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>L = { 'Natural Language Processing with Python', 'Learning OpenCV', 
</span><span class='line'>      'Code Complete', 'Mastering Algorithms with C', 
</span><span class='line'>      'The Joy of Clojure', 'Mining the Social Web', 
</span><span class='line'>      'Algorithms In A Nutshell', 'Introduction to Information Retrieval', 
</span><span class='line'>      ... }</span></code></pre></td></tr></table></div></figure>


<p>We use <code>'{}'</code> to denote the members of a set. The order of books in the library
doesn&#8217;t matter here, and it doesn&#8217;t make sense to have more than one entry for
a book in the library.</p>

<p>Building a set in Python is very easy:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">library</span> <span class="o">=</span> <span class="nb">set</span><span class="p">([</span><span class="s">&#39;Natural Language Processing with Python&#39;</span><span class="p">,</span> <span class="s">&#39;Learning OpenCV&#39;</span><span class="p">,</span>
</span><span class='line'>               <span class="s">&#39;Code Complete&#39;</span><span class="p">,</span> <span class="s">&#39;Mastering Algorithms with C&#39;</span><span class="p">,</span>
</span><span class='line'>               <span class="s">&#39;The Joy of Clojure&#39;</span><span class="p">,</span> <span class="s">&#39;Mining the Social Web&#39;</span><span class="p">,</span>
</span><span class='line'>               <span class="s">&#39;Algorithms In A Nutshell&#39;</span><span class="p">,</span>
</span><span class='line'>               <span class="s">&#39;Introduction to Information Retrieval&#39;</span><span class="p">,</span>
</span><span class='line'>               <span class="s">&#39;Network Security With OpenSSL&#39;</span><span class="p">,</span> <span class="s">&#39;RADIUS&#39;</span><span class="p">])</span>
</span></code></pre></td></tr></table></div></figure>
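<p>The two defining properties (no duplicates, no ordering) can be checked
directly. This is a small sketch with made-up variable names, not part of the
library code:</p>

```python
# Duplicates collapse: the title repeated in the list appears once in the set.
titles = ['Code Complete', 'RADIUS', 'Code Complete']
unique_titles = set(titles)

print(len(titles))         # 3 entries in the list
print(len(unique_titles))  # 2 members in the set

# Order is not part of a set's identity, so these two sets are equal.
same_books = set(['RADIUS', 'Code Complete'])
print(unique_titles == same_books)  # True
```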


<p>Clojure has set notation built in using the <code>#{ }</code> syntax, and any collection
can be turned into a set with <code>(set coll)</code>:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">library</span> <span class="o">#</span><span class="p">{</span> <span class="s">&quot;Natural Language Processing with Python&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Learning OpenCV&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Code Complete&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Mastering Algorithms with C&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;The Joy of Clojure&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Mining the Social Web&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Algorithms In A Nutshell&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Introduction to Information Retrieval&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Network Security With OpenSSL&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;RADIUS&quot;</span><span class="p">})</span>
</span></code></pre></td></tr></table></div></figure>


<p>So now we need to build some subsets.</p>

<blockquote><p>definition: subset<br>
A subset is a set whose members all belong to another set; it may be the whole of that set.</p>

<p>definition: proper subset<br>
A proper subset is a subset that is not the whole set.</p></blockquote>

<p>For example, we&#8217;ll create a subset P of books that are about or use Python.
We&#8217;ll also create a subset E of books that are in the English language. In my
library, not all of my books are about or use Python, so P has fewer members
than L. However, all of my books are in English, so E has exactly the same
members as L. Therefore P is a proper subset of L, while E is a subset of L
but not a proper one.</p>
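<p>In Python, the subset relation can be tested with <code>issubset</code>. The sketch
below uses a shortened library and invented set contents purely for
illustration; <code>is_proper_subset</code> is a hypothetical helper, not a built-in:</p>

```python
library = set(['Natural Language Processing with Python', 'Learning OpenCV',
               'Code Complete', 'The Joy of Clojure'])

# P: books about or using Python (only part of the library).
python_books = set(['Natural Language Processing with Python'])
# E: books in English (here, every book in the library).
english_books = set(library)


def is_proper_subset(candidate, superset):
    """True when candidate is a subset of superset but not equal to it."""
    return candidate.issubset(superset) and candidate != superset


print(python_books.issubset(library))            # True
print(is_proper_subset(python_books, library))   # True
print(english_books.issubset(library))           # True
print(is_proper_subset(english_books, library))  # False
```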

<h2>The Basic Set Operations</h2>

<p>Now let&#8217;s consider two proper subsets of the library to explain some of the basic
set operations: M is the subset of ebooks that I have in mobi format, and we&#8217;ll
redefine E to be the subset of ebooks in epub format. For the rest of this
article, assume the following:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>M = { 'Natural Language Processing with Python', 'Code Complete',
</span><span class='line'>      'Introduction to Information Retrieval' }
</span><span class='line'>E = { 'The Joy of Clojure', 'Mining the Social Web', 'Code Complete' }</span></code></pre></td></tr></table></div></figure>


<p>In practical terms, this means in my library I have copies of:</p>

<ul>
<li>&#8220;Natural Language Processing with Python,&#8221; &#8220;Introduction to Information
Retrieval,&#8221; and &#8220;Code Complete&#8221; in mobi format</li>
<li>&#8220;The Joy of Clojure,&#8221; &#8220;Mining the Social Web,&#8221; and &#8220;Code Complete&#8221; in epub
format.</li>
</ul>


<p>In Python:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">mobi</span> <span class="o">=</span> <span class="nb">set</span><span class="p">([</span><span class="s">&#39;Natural Language Processing with Python&#39;</span><span class="p">,</span> <span class="s">&#39;Code Complete&#39;</span><span class="p">,</span>
</span><span class='line'>           <span class="s">&#39;Introduction to Information Retrieval&#39;</span><span class="p">])</span>
</span><span class='line'><span class="n">epub</span> <span class="o">=</span> <span class="nb">set</span><span class="p">([</span><span class="s">&#39;The Joy of Clojure&#39;</span><span class="p">,</span> <span class="s">&#39;Mining the Social Web&#39;</span><span class="p">,</span> <span class="s">&#39;Code Complete&#39;</span><span class="p">])</span>
</span></code></pre></td></tr></table></div></figure>


<p>In Clojure:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">mobi</span> <span class="o">#</span><span class="p">{</span><span class="s">&quot;Natural Language Processing with Python&quot;</span><span class="o">,</span> <span class="s">&quot;Code Complete&quot;</span><span class="o">,</span>
</span><span class='line'>           <span class="s">&quot;Introduction to Information Retrieval&quot;</span><span class="p">})</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">epub</span> <span class="o">#</span><span class="p">{</span><span class="s">&quot;The Joy of Clojure&quot;</span><span class="o">,</span> <span class="s">&quot;Mining the Social Web&quot;</span><span class="o">,</span> <span class="s">&quot;Code Complete&quot;</span><span class="p">})</span>
</span></code></pre></td></tr></table></div></figure>


<h3>Union</h3>

<p>A union is the set of members that appear in either set - if it&#8217;s in at least
one of the sets, it will appear in the union of the two sets. So we could define
a subset of L that contains all the books I have in a mobile format, which for
our purposes means copies exist in epub or mobi format. In Python, you can
use the <code>set.union</code> method, and in Clojure you can use the <code>union</code> function in the
<code>clojure.set</code> namespace (load it first with <code>(require 'clojure.set)</code>).</p>

<p>In Python:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">mobile</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">union</span><span class="p">(</span><span class="n">mobi</span><span class="p">,</span> <span class="n">epub</span><span class="p">)</span>
</span><span class='line'><span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">mobile</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span> <span class="n">book</span>
</span></code></pre></td></tr></table></div></figure>


<p>which yields the output:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>Natural Language Processing with Python
</span><span class='line'>Code Complete
</span><span class='line'>Introduction to Information Retrieval
</span><span class='line'>The Joy of Clojure
</span><span class='line'>Mining the Social Web</span></code></pre></td></tr></table></div></figure>


<p>Remember that one of the properties of sets is that order is irrelevant, so you
might get the books in a different order (this applies to Clojure as well).</p>

<p>The same thing, in Clojure:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">mobile</span> <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/union</span> <span class="nv">mobi</span> <span class="nv">epub</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">book</span> <span class="nv">mobile</span><span class="p">]</span> <span class="p">(</span><span class="nb">println </span><span class="p">(</span><span class="nb">str </span><span class="nv">book</span><span class="p">)))</span>
</span></code></pre></td></tr></table></div></figure>


<p>You would see a similar output to the Python example.</p>

<p>Again, the practical result of this is a set of all the books I have in my
library in a mobile format.</p>
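<p>Since a set has no inherent order, one way to get repeatable output is to
sort the members before printing. A minimal sketch, repeating the two sample
sets so it runs on its own:</p>

```python
mobi = set(['Natural Language Processing with Python', 'Code Complete',
            'Introduction to Information Retrieval'])
epub = set(['The Joy of Clojure', 'Mining the Social Web', 'Code Complete'])

# Calling the method on an instance gives the same set as set.union(mobi, epub).
mobile = mobi.union(epub)

# sorted() returns a list, so the titles always print alphabetically.
for book in sorted(mobile):
    print(book)
```

<p>&#8220;Code Complete&#8221; is in both sets but appears only once in the union, so
<code>mobile</code> has five members.</p>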

<h3>Intersection</h3>

<p>The intersection of two sets is the set of all the members that appear in
both sets. In the library example, taking the intersection of the mobi and epub
sets gives me the set of books that I have in both epub and mobi format. The
<code>intersection</code> function gives me this result.</p>

<p>The Python example:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">both_formats</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">intersection</span><span class="p">(</span><span class="n">mobi</span><span class="p">,</span> <span class="n">epub</span><span class="p">)</span>
</span><span class='line'><span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">both_formats</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span> <span class="n">book</span>
</span></code></pre></td></tr></table></div></figure>


<p>And in Clojure:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">both-formats</span> <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/intersection</span> <span class="nv">mobi</span> <span class="nv">epub</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">book</span> <span class="nv">both-formats</span><span class="p">]</span> <span class="p">(</span><span class="nb">println </span><span class="p">(</span><span class="nb">str </span><span class="nv">book</span><span class="p">)))</span>
</span></code></pre></td></tr></table></div></figure>


<p>For either example, the output should be just one book, given the sample sets:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>Code Complete</span></code></pre></td></tr></table></div></figure>


<p>I could use this result to know which books I can use on any mobile device.</p>
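<p>Python&#8217;s <code>intersection</code> also accepts more than two sets, which would let
the same check cover the third format. In this sketch the <code>pdf</code> set is
hypothetical; its contents are invented for illustration:</p>

```python
mobi = set(['Natural Language Processing with Python', 'Code Complete',
            'Introduction to Information Retrieval'])
epub = set(['The Joy of Clojure', 'Mining the Social Web', 'Code Complete'])
# Hypothetical PDF subset, made up for this example.
pdf = set(['Code Complete', 'Learning OpenCV', 'The Joy of Clojure'])

# Books available in every one of the three formats.
all_formats = set.intersection(mobi, epub, pdf)
for book in all_formats:
    print(book)
```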

<h3>Difference</h3>

<p>The difference of one set from another is the set of all the members in the first
set that are not in the second set. This operation is a bit different from the
first two: union and intersection are
<a href="http://en.wikipedia.org/wiki/Commutative_property">commutative</a>,
but the result of a difference depends on the order of the sets. I&#8217;ll
illustrate this with some code examples:</p>

<p>In Python:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">only_mobi</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">difference</span><span class="p">(</span><span class="n">mobi</span><span class="p">,</span> <span class="n">epub</span><span class="p">)</span>
</span><span class='line'><span class="k">print</span> <span class="s">&#39;books only in mobi format:&#39;</span>
</span><span class='line'><span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">only_mobi</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span> <span class="s">&#39;</span><span class="se">\t</span><span class="s">&#39;</span> <span class="o">+</span> <span class="n">book</span>
</span><span class='line'>
</span><span class='line'><span class="n">only_epub</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">difference</span><span class="p">(</span><span class="n">epub</span><span class="p">,</span> <span class="n">mobi</span><span class="p">)</span>
</span><span class='line'><span class="k">print</span> <span class="s">&#39;books only in epub format:&#39;</span>
</span><span class='line'><span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">only_epub</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span> <span class="s">&#39;</span><span class="se">\t</span><span class="s">&#39;</span> <span class="o">+</span> <span class="n">book</span>
</span></code></pre></td></tr></table></div></figure>


<p>In Clojure:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nb">println </span><span class="s">&quot;books only in mobi format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">only-mobi</span> <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/difference</span> <span class="nv">mobi</span> <span class="nv">epub</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">book</span> <span class="nv">only-mobi</span><span class="p">]</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">println </span><span class="s">&quot;\t&quot;</span> <span class="nv">book</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nb">println </span><span class="s">&quot;books only in epub format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">only-epub</span> <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/difference</span> <span class="nv">epub</span> <span class="nv">mobi</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">book</span> <span class="nv">only-epub</span><span class="p">]</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">println </span><span class="s">&quot;\t&quot;</span> <span class="nv">book</span><span class="p">))</span>
</span></code></pre></td></tr></table></div></figure>


<p>As the output messages show, this gives us the set of books that are only
in mobi and the set of books that are only in epub. The output should look
something like:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>books only in mobi format:
</span><span class='line'>  Introduction to Information Retrieval
</span><span class='line'>  Natural Language Processing with Python
</span><span class='line'>books only in epub format:
</span><span class='line'>  The Joy of Clojure
</span><span class='line'>  Mining the Social Web</span></code></pre></td></tr></table></div></figure>
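<p>The commutativity point can be checked directly by swapping the arguments.
A small sketch, with variable names chosen just for this example:</p>

```python
mobi = set(['Natural Language Processing with Python', 'Code Complete',
            'Introduction to Information Retrieval'])
epub = set(['The Joy of Clojure', 'Mining the Social Web', 'Code Complete'])

# Union and intersection give the same set whichever argument comes first.
union_commutes = set.union(mobi, epub) == set.union(epub, mobi)
intersection_commutes = (set.intersection(mobi, epub) ==
                         set.intersection(epub, mobi))

# Difference does not: swapping the arguments gives a different set.
difference_commutes = (set.difference(mobi, epub) ==
                       set.difference(epub, mobi))

print(union_commutes)         # True
print(intersection_commutes)  # True
print(difference_commutes)    # False
```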


<h3>Complements</h3>

<p>When discussing complements, we do so when considering a subset and its
superset. The complement of a subset is the difference of the subset from the
superset; i.e., the set of all members in the superset that are not in the
subset. For example, if I wanted to check my library for all ebooks I have
that are not in mobi format, I would use the superset <code>library</code> and take the
difference of mobi from library:</p>

<p>In Python:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">not_mobi</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">difference</span><span class="p">(</span><span class="n">library</span><span class="p">,</span> <span class="n">mobi</span><span class="p">)</span>
</span><span class='line'><span class="k">print</span> <span class="s">&#39;books not in mobi format, using the library superset:&#39;</span>
</span><span class='line'><span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">not_mobi</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span> <span class="s">&#39;</span><span class="se">\t</span><span class="s">&#39;</span> <span class="o">+</span> <span class="n">book</span>
</span></code></pre></td></tr></table></div></figure>


<p>and in Clojure:</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class='clojure'><span class='line'><span class="p">(</span><span class="nb">println </span><span class="s">&quot;books not in mobi format, using the library superset:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">not-mobi</span> <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/difference</span> <span class="nv">library</span> <span class="nv">mobi</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">book</span> <span class="nv">not-mobi</span><span class="p">]</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">println </span><span class="s">&quot;\t&quot;</span> <span class="nv">book</span><span class="p">))</span>
</span></code></pre></td></tr></table></div></figure>


<p>This gives us the output:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>books not in mobi format, using the library superset:
</span><span class='line'>   Mining the Social Web
</span><span class='line'>   Algorithms In A Nutshell
</span><span class='line'>   Mastering Algorithms with C
</span><span class='line'>   RADIUS
</span><span class='line'>   The Joy of Clojure
</span><span class='line'>   Network Security With OpenSSL
</span><span class='line'>   Learning OpenCV</span></code></pre></td></tr></table></div></figure>
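<p>Two properties of a complement make a handy sanity check: a subset and its
complement share no members, and together they rebuild the superset. The
<code>complement</code> helper below is a hypothetical wrapper written for this sketch,
not a built-in:</p>

```python
library = set(['Natural Language Processing with Python', 'Learning OpenCV',
               'Code Complete', 'Mastering Algorithms with C',
               'The Joy of Clojure', 'Mining the Social Web',
               'Algorithms In A Nutshell',
               'Introduction to Information Retrieval',
               'Network Security With OpenSSL', 'RADIUS'])
mobi = set(['Natural Language Processing with Python', 'Code Complete',
            'Introduction to Information Retrieval'])


def complement(superset, subset):
    """Return the members of superset that are not in subset."""
    if not subset.issubset(superset):
        raise ValueError('subset has members outside the superset')
    return superset.difference(subset)


not_mobi = complement(library, mobi)

print(len(not_mobi))                    # 7
print(not_mobi.isdisjoint(mobi))        # True: no shared members
print(not_mobi.union(mobi) == library)  # True: together they give the superset
```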


<h2>Conclusion</h2>

<p>This has been a very basic look at set theory and what it means in practice.
There is a lot more to set theory (see the references) but this should help
get you started. There are a lot of applications for set theory, such as in
data mining and natural language processing; it is a powerful tool that is
worth spending some time to get to know.</p>

<p>Stay tuned for the next post, which will be on how to use sets in your code.
We&#8217;ll develop the library idea a bit more.</p>

<p><em>UPDATE</em>: The <a href="http://kyleisom.net/blog/2012/02/01/using-set-theory/">next post</a> is up!</p>

<h2>References</h2>

<ul>
<li>I&#8217;ve been reading <a href="https://en.wikipedia.org/wiki/Alfred_Aho">Alfred Aho&#8217;s</a> <a href="https://en.wikipedia.org/wiki/Special:BookSources/0139145567"><em>The Theory of Parsing, Translating, and Compiling (Volume I: Parsing)</em></a>
(<a href="http://www.amazon.com/dp/0139145567/">Amazon link</a>)</li>
<li>There is, of course, a good <a href="https://en.wikipedia.org/wiki/Set_(mathematics)">Wikipedia article</a>.</li>
</ul>


<h2>Reviewers</h2>

<p>I&#8217;d like to thank the following people for reviewing this:</p>

<ul>
<li><a href="https://www.twitter.com/imwally">Wally Jones</a></li>
<li><a href="https://saolsen.github.com/">Stephen Olsen</a></li>
<li><a href="https://www.twitter.com/qb1t">Aaron Bieber</a></li>
<li><a href="https://twitter.com/#!/Slaughterhut">Jason Barbier</a></li>
<li><a href="http://shawnmeier.com/">Shawn Meier</a></li>
<li>Matt Sowers</li>
</ul>


<h2>Code Samples</h2>

<p>The complete Python source code, which you can save to a file and run directly:</p>

<figure class='code'><figcaption><span> (set_theory.py)</span> <a href='http://kisom.github.com/downloads/code/set_theory/set_theory.py'>download</a></figcaption>
 <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
<span class='line-number'>49</span>
<span class='line-number'>50</span>
<span class='line-number'>51</span>
<span class='line-number'>52</span>
<span class='line-number'>53</span>
<span class='line-number'>54</span>
<span class='line-number'>55</span>
<span class='line-number'>56</span>
</pre></td><td class='code'><pre><code class='py'><span class='line'><span class="c">#!/usr/bin/env python</span>
</span><span class='line'><span class="c"># -*- coding: utf-8 -*-</span>
</span><span class='line'><span class="c">#</span>
</span><span class='line'><span class="c"># author: kyle isom &lt;coder@kyleisom.net&gt;</span>
</span><span class='line'><span class="c"># date: 2012-01-23</span>
</span><span class='line'><span class="c"># license: ISC / public domain (brokenlcd.net/license.txt)</span>
</span><span class='line'><span class="c">#</span>
</span><span class='line'><span class="sd">&quot;&quot;&quot;</span>
</span><span class='line'><span class="sd">Python illustrations for blog article &quot;Basic Set Theory&quot;</span>
</span><span class='line'><span class="sd">    (see http://kisom.github.com/blog/2012/01/23/basic-set-theory/)</span>
</span><span class='line'>
</span><span class='line'><span class="sd">Note that this is slightly tweaked from the examples in the article:</span>
</span><span class='line'><span class="sd">    1. PEP 8 recommends all caps for module-level constants; as all the variables</span>
</span><span class='line'><span class="sd">    in this illustration are module-level constants, they are in all caps.</span>
</span><span class='line'><span class="sd">    2. There is a little extra output to explain what is going on; namely,</span>
</span><span class='line'><span class="sd">    tabs are added before printing books and there is an output line showing</span>
</span><span class='line'><span class="sd">    which example the book set is associated with.</span>
</span><span class='line'><span class="sd">&quot;&quot;&quot;</span>
</span><span class='line'>
</span><span class='line'><span class="c"># variables are in all caps because they are module-level constants, and</span>
</span><span class='line'><span class="c"># PEP 8 recommends that constants be in caps.</span>
</span><span class='line'>
</span><span class='line'><span class="c"># the superset</span>
</span><span class='line'><span class="n">LIBRARY</span> <span class="o">=</span> <span class="nb">set</span><span class="p">([</span><span class="s">&#39;Natural Language Processing with Python&#39;</span><span class="p">,</span> <span class="s">&#39;Learning OpenCV&#39;</span><span class="p">,</span>
</span><span class='line'>               <span class="s">&#39;Code Complete&#39;</span><span class="p">,</span> <span class="s">&#39;Mastering Algorithms with C&#39;</span><span class="p">,</span>
</span><span class='line'>               <span class="s">&#39;The Joy of Clojure&#39;</span><span class="p">,</span> <span class="s">&#39;Mining the Social Web&#39;</span><span class="p">,</span>
</span><span class='line'>               <span class="s">&#39;Algorithms In A Nutshell&#39;</span><span class="p">,</span>
</span><span class='line'>               <span class="s">&#39;Introduction to Information Retrieval&#39;</span><span class="p">,</span>
</span><span class='line'>               <span class="s">&#39;Network Security With OpenSSL&#39;</span><span class="p">,</span> <span class="s">&#39;RADIUS&#39;</span><span class="p">])</span>
</span><span class='line'>
</span><span class='line'><span class="c"># the subsets</span>
</span><span class='line'><span class="n">MOBI</span> <span class="o">=</span> <span class="nb">set</span><span class="p">([</span><span class="s">&#39;Natural Language Processing with Python&#39;</span><span class="p">,</span> <span class="s">&#39;Code Complete&#39;</span><span class="p">,</span>
</span><span class='line'>           <span class="s">&#39;Introduction to Information Retrieval&#39;</span><span class="p">])</span>
</span><span class='line'><span class="n">EPUB</span> <span class="o">=</span> <span class="nb">set</span><span class="p">([</span><span class="s">&#39;The Joy of Clojure&#39;</span><span class="p">,</span> <span class="s">&#39;Mining the Social Web&#39;</span><span class="p">,</span>
</span><span class='line'>            <span class="s">&#39;Code Complete&#39;</span><span class="p">])</span>
</span><span class='line'>
</span><span class='line'>
</span><span class='line'><span class="k">print</span> <span class="s">&#39;[+] list all the books in a mobile format (union example):&#39;</span>
</span><span class='line'><span class="n">MOBILE</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">union</span><span class="p">(</span><span class="n">MOBI</span><span class="p">,</span> <span class="n">EPUB</span><span class="p">)</span>
</span><span class='line'><span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">MOBILE</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span> <span class="s">&#39;</span><span class="se">\t</span><span class="s">&#39;</span> <span class="o">+</span> <span class="n">book</span>
</span><span class='line'>
</span><span class='line'><span class="k">print</span> <span class="s">&#39;[+] list all the books in both mobile formats (intersection example)&#39;</span>
</span><span class='line'><span class="n">BOTH_FORMATS</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">intersection</span><span class="p">(</span><span class="n">MOBI</span><span class="p">,</span> <span class="n">EPUB</span><span class="p">)</span>
</span><span class='line'><span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">BOTH_FORMATS</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span> <span class="s">&#39;</span><span class="se">\t</span><span class="s">&#39;</span> <span class="o">+</span> <span class="n">book</span>
</span><span class='line'>
</span><span class='line'><span class="n">ONLY_MOBI</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">difference</span><span class="p">(</span><span class="n">MOBI</span><span class="p">,</span> <span class="n">EPUB</span><span class="p">)</span>
</span><span class='line'><span class="k">print</span> <span class="s">&#39;[+] books only in mobi format:&#39;</span>
</span><span class='line'><span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">ONLY_MOBI</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span> <span class="s">&#39;</span><span class="se">\t</span><span class="s">&#39;</span> <span class="o">+</span> <span class="n">book</span>
</span><span class='line'>
</span><span class='line'><span class="n">ONLY_EPUB</span> <span class="o">=</span> <span class="nb">set</span><span class="o">.</span><span class="n">difference</span><span class="p">(</span><span class="n">EPUB</span><span class="p">,</span> <span class="n">MOBI</span><span class="p">)</span>
</span><span class='line'><span class="k">print</span> <span class="s">&#39;[+] books only in epub format:&#39;</span>
</span><span class='line'><span class="k">for</span> <span class="n">book</span> <span class="ow">in</span> <span class="n">ONLY_EPUB</span><span class="p">:</span>
</span><span class='line'>    <span class="k">print</span> <span class="s">&#39;</span><span class="se">\t</span><span class="s">&#39;</span> <span class="o">+</span> <span class="n">book</span>
</span></code></pre></td></tr></table></div></figure>


<p>You&#8217;ll want to run this with <code>python set_theory.py</code> (or whatever you choose to
name the file, obviously).</p>

<p>The complete Clojure source code, which you can likewise save and run:</p>

<figure class='code'><figcaption><span> (set-theory.clj)</span> <a href='http://kisom.github.com/downloads/code/set_theory/set-theory.clj'>download</a></figcaption>
 <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
<span class='line-number'>26</span>
<span class='line-number'>27</span>
<span class='line-number'>28</span>
<span class='line-number'>29</span>
<span class='line-number'>30</span>
<span class='line-number'>31</span>
<span class='line-number'>32</span>
<span class='line-number'>33</span>
<span class='line-number'>34</span>
<span class='line-number'>35</span>
<span class='line-number'>36</span>
<span class='line-number'>37</span>
<span class='line-number'>38</span>
<span class='line-number'>39</span>
<span class='line-number'>40</span>
<span class='line-number'>41</span>
<span class='line-number'>42</span>
<span class='line-number'>43</span>
<span class='line-number'>44</span>
<span class='line-number'>45</span>
<span class='line-number'>46</span>
<span class='line-number'>47</span>
<span class='line-number'>48</span>
<span class='line-number'>49</span>
<span class='line-number'>50</span>
<span class='line-number'>51</span>
<span class='line-number'>52</span>
<span class='line-number'>53</span>
<span class='line-number'>54</span>
<span class='line-number'>55</span>
<span class='line-number'>56</span>
</pre></td><td class='code'><pre><code class='clj'><span class='line'><span class="c1">;; set-theory.clj</span>
</span><span class='line'><span class="c1">;; author: kyle isom &lt;coder@kyleisom.net&gt;</span>
</span><span class='line'><span class="c1">;; date: 2012-01-23</span>
</span><span class='line'><span class="c1">;; license: ISC / public domain (brokenlcd.net/license.txt)</span>
</span><span class='line'><span class="c1">;;</span>
</span><span class='line'><span class="c1">;; code examples for blog post &quot;Basic Set Theory&quot;</span>
</span><span class='line'><span class="c1">;;     http://kisom.github.com/blog/2012/01/23/basic-set-theory/</span>
</span><span class='line'><span class="c1">;;</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nf">require</span> <span class="ss">&#39;clojure</span><span class="o">.</span><span class="nv">set</span><span class="p">)</span>
</span><span class='line'>
</span><span class='line'><span class="c1">; the superset</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">library</span> <span class="o">#</span><span class="p">{</span> <span class="s">&quot;Natural Language Processing with Python&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Learning OpenCV&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Code Complete&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Mastering Algorithms with C&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;The Joy of Clojure&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Mining the Social Web&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Algorithms In A Nutshell&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;Introduction to Information Retrieval&quot;</span>
</span><span class='line'>                <span class="s">&quot;Network Security With OpenSSL&quot;</span><span class="o">,</span>
</span><span class='line'>                <span class="s">&quot;RADIUS&quot;</span> <span class="p">})</span>
</span><span class='line'>
</span><span class='line'><span class="c1">; the subsets</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">mobi</span> <span class="o">#</span><span class="p">{</span><span class="s">&quot;Natural Language Processing with Python&quot;</span><span class="o">,</span> <span class="s">&quot;Code Complete&quot;</span><span class="o">,</span>
</span><span class='line'>           <span class="s">&quot;Introduction to Information Retrieval&quot;</span><span class="p">})</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">epub</span> <span class="o">#</span><span class="p">{</span><span class="s">&quot;The Joy of Clojure&quot;</span><span class="o">,</span> <span class="s">&quot;Mining the Social Web&quot;</span><span class="o">,</span> <span class="s">&quot;Code Complete&quot;</span><span class="p">})</span>
</span><span class='line'>
</span><span class='line'><span class="c1">; union illustration</span>
</span><span class='line'><span class="p">(</span><span class="nb">println </span><span class="s">&quot;union illustration (books in either mobile format)&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">mobile</span> <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/union</span> <span class="nv">mobi</span> <span class="nv">epub</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">book</span> <span class="nv">mobile</span><span class="p">]</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">println </span><span class="s">&quot;\t&quot;</span> <span class="nv">book</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="c1">; intersection illustration</span>
</span><span class='line'><span class="p">(</span><span class="nb">println </span><span class="s">&quot;intersection illustration (books in both mobile formats)&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">both-formats</span> <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/intersection</span> <span class="nv">mobi</span> <span class="nv">epub</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">book</span> <span class="nv">both-formats</span><span class="p">]</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">println </span><span class="s">&quot;\t&quot;</span> <span class="nv">book</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nb">println </span><span class="s">&quot;books only in mobi format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">only-mobi</span> <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/difference</span> <span class="nv">mobi</span> <span class="nv">epub</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">book</span> <span class="nv">only-mobi</span><span class="p">]</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">println </span><span class="s">&quot;\t&quot;</span> <span class="nv">book</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="p">(</span><span class="nb">println </span><span class="s">&quot;books only in epub format:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">only-epub</span> <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/difference</span> <span class="nv">epub</span> <span class="nv">mobi</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">book</span> <span class="nv">only-epub</span><span class="p">]</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">println </span><span class="s">&quot;\t&quot;</span> <span class="nv">book</span><span class="p">))</span>
</span><span class='line'>
</span><span class='line'><span class="c1">; complement illustration</span>
</span><span class='line'><span class="p">(</span><span class="nb">println </span><span class="s">&quot;books not in mobi format, using the library superset:&quot;</span><span class="p">)</span>
</span><span class='line'><span class="p">(</span><span class="k">def </span><span class="nv">not-mobi</span> <span class="p">(</span><span class="nf">clojure</span><span class="o">.</span><span class="nv">set/difference</span> <span class="nv">library</span> <span class="nv">mobi</span><span class="p">))</span>
</span><span class='line'><span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">book</span> <span class="nv">not-mobi</span><span class="p">]</span>
</span><span class='line'>    <span class="p">(</span><span class="nb">println </span><span class="s">&quot;\t&quot;</span> <span class="nv">book</span><span class="p">))</span>
</span></code></pre></td></tr></table></div></figure>


<p>You&#8217;ll want to run this with <code>clj set-theory.clj</code> - I&#8217;ve deliberately chosen
not to make this a lein project in order to make it easier to share, but I did
<a href="http://kisom.github.com/downloads/set_theory.tar.gz">upload a lein project</a>.
You should be able to just run <code>lein deps, test, run</code>.</p>
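
<p>The Clojure listing ends with a complement illustration that the Python
listing stops short of. Here is a minimal sketch of the same step in Python. The
<code>LIBRARY</code> and <code>MOBI</code> sets are repeated so the snippet runs on its own, and
the name <code>NOT_MOBI</code> is an assumption chosen to mirror <code>not-mobi</code>:</p>

```python
# complement illustration: books in the library that are not in mobi format
LIBRARY = set(['Natural Language Processing with Python', 'Learning OpenCV',
               'Code Complete', 'Mastering Algorithms with C',
               'The Joy of Clojure', 'Mining the Social Web',
               'Algorithms In A Nutshell',
               'Introduction to Information Retrieval',
               'Network Security With OpenSSL', 'RADIUS'])
MOBI = set(['Natural Language Processing with Python', 'Code Complete',
            'Introduction to Information Retrieval'])

# the complement of MOBI, taken relative to the LIBRARY superset
NOT_MOBI = set.difference(LIBRARY, MOBI)
print('[+] books not in mobi format, using the library superset:')
for book in sorted(NOT_MOBI):
    print('\t' + book)
```

<p>As in the Clojure version, the complement only makes sense relative to a
superset, which is why <code>LIBRARY</code> has to be passed in explicitly.</p>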
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[on SOPA and PIPA]]></title>
    <link href="http://kisom.github.com/blog/2012/01/18/on-sopa-and-pipa/"/>
    <updated>2012-01-18T00:00:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/01/18/on-sopa-and-pipa</id>
    <content type="html"><![CDATA[<p>Imagine you are the owner of a small restaurant. The neighbourhood is of mixed
quality, but this is where you live so you try to make do anyways. One day,
new laws are passed such that if anyone in your restaurant conducts any sort of
illegal activity (like a drug deal), the police blockade your restaurant and
force everyone out. Furthermore, this new law isn&#8217;t clear about how to get your
restaurant back. The politicians who made this law have absolutely no experience
in the restaurant industry, but still expect you to continually monitor all your
patrons and do the work of the police and law enforcement for them. At any time,
one of your patrons can call the police and claim something happened in your
restaurant and you get shut down. On top of all this, there are people actively
looking for anything untoward happening so as to shut you down. What do you
do? You can&#8217;t really afford to hire more waiters and waitresses or security
personnel to monitor (and don&#8217;t really want to establish that kind of atmosphere
in your business anyhow); CCTV and other technical measures have too long of a
delay (or require you to suspect something happened so you can check the tapes).
Really, the only thing you can do is to move out of town.</p>

<!-- more -->


<p>Of course I&#8217;m talking about <a href="http://www.govtrack.us/congress/bill.xpd?bill=s112-968">PIPA</a>
and <a href="http://www.govtrack.us/congress/bill.xpd?bill=h112-3261">SOPA</a>. The story I
told isn&#8217;t entirely <a href="http://en.wikipedia.org/wiki/G%C3%B6del%2C_Escher%2C_Bach">isomorphic</a>
to the current situation, but it gets the idea across. The
politicians enacting this legislation have admittedly no technical knowledge, despite the
fact that the vast majority of the people working in the tech industry have decried this
as a universally uneducated and ineffective decision that will do more to hurt the innocent
than to accomplish its stated objectives.</p>

<p>This is the exact reason why I would not start a tech business in the US anymore.
Starting a business is hard enough with everything else; worrying about the legal
environment is too much and there are plenty of places where law enforcement does
its job instead of placing the burden on you.</p>

<p>Unfortunately, just leaving the country isn&#8217;t going to let me just dodge the
effects. What happens when you lose a large part of your market share (i.e. the
US market)? <a href="http://www.arstechnica.com">Ars Technica</a> has a
<a href="http://arstechnica.com/tech-policy/news/2012/01/what-does-sopa-mean-for-us-foreigners.ars">good writeup</a>
on how SOPA/PIPA affect foreign users.</p>

<p>So what could the government do instead?</p>

<ul>
<li>work on improving the technical skills of their workforce (not driving away the people with these skills with the government&#8217;s current overwhelming ineptitude would be a good start)</li>
<li>be far more transparent about the process of taking down a site</li>
</ul>


<p>Unfortunately, I don&#8217;t have enough money to pay the government to listen to me.</p>

<p>Further reading:</p>

<ol>
<li>The EFF has a good <a href="https://www.eff.org/takedowns">Takedown Wall of Shame</a>
if you don&#8217;t believe the government would possibly abuse or misuse their
takedown powers.</li>
<li>The EFF also has a <a href="https://www.eff.org/deeplinks/2012/01/how-pipa-and-sopa-violate-white-house-principles-supporting-free-speech">good writeup</a> on SOPA.</li>
<li><a href="http://americancensorship.org/">Stop American Censorship</a></li>
<li>Fight For the Future has a <a href="http://fightforthefuture.org/pipa">good video</a></li>
</ol>


<p>Thanks to <a href="http://samuelgoodwin.tumblr.com">Samuel Goodwin</a>, Beau Holton,
<a href="https://twitter.com/#!/Slaughterhut">Jason Barbier</a>,
<a href="http://twitter.com/qb1t">Aaron Bieber</a>,
and <a href="http://twitter.com/imwally">Wally Jones</a> for reviewing this.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[back to lisp]]></title>
    <link href="http://kisom.github.com/blog/2012/01/02/back-to-lisp/"/>
    <updated>2012-01-02T00:00:00+03:00</updated>
    <id>http://kisom.github.com/blog/2012/01/02/back-to-lisp</id>
    <content type="html"><![CDATA[

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>commit e358120dd3760e64436f5652895c751b39148ebd
</span><span class='line'>   Author: Kyle Isom &lt;coder@kyleisom.net>
</span><span class='line'>   Date:   Wed Dec 28 19:22:59 2011 +0300
</span><span class='line'>   
</span><span class='line'>    initial commit</span></code></pre></td></tr></table></div></figure>


<p>A brief stint playing with clojure made me miss common lisp, so I&#8217;m working
through <a href="http://www.paulgraham.com">Paul Graham&#8217;s</a>
<a href="http://paulgraham.com/acl.html">ANSI Common Lisp</a> with a copy of
<a href="http://paulgraham.com/onlisp.html">On Lisp</a>. On my last foray, I learned
from <a href="http://www.cs.cmu.edu/~dst/">David Touretzky&#8217;s</a>
<a href="http://www.cs.cmu.edu/~dst/LispBook/index.html">A Gentle Introduction to Symbolic Computation</a>,
so this time I&#8217;m trying PG&#8217;s book. So far I&#8217;ve done more useful things,
mostly by actually reading a bit more of the <a href="http://www.sbcl.org">sbcl</a>
<a href="http://www.sbcl.org/manual/">user manual</a> (from which I learned some
useful things such as <code>sb-ext:*posix-argv*</code> and <code>sb-ext:save-lisp-and-die</code>)
and by the immensely useful site
<a href="http://rosettacode.org/wiki/Rosetta_Code">Rosetta Code</a>, from which I
learned about the <a href="http://www.weitz.de/drakma/">DRAKMA</a> HTTP client
library. I&#8217;ve also been aided quite a bit by
<a href="http://xach.com">Zach Beane&#8217;s</a> <a href="http://www.quicklisp.org/">quicklisp</a>;
in fact, one of the things I&#8217;ve done is to write a short
<a href="https://gist.github.com/1548276">script</a> to build an sbcl image with
quicklisp and my most commonly used libraries built-in.</p>

<!-- more -->


<script src="https://gist.github.com/1548276.js?file=build-image.lisp"></script>


<p>One of the things I love about functional programming is the idea that
instead of relying on a lot of variables, you use functions as sort of
&#8220;organic variables&#8221; that provide immutable data based on some input. That
gives you the ability to build something that feels more organic and less static. I think
<a href="https://en.wikipedia.org/wiki/Steve_Yegge">Steve Yegge&#8217;s</a>
blog post <a href="http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html">Execution in the Kingdom of Nouns</a>
is spot on.</p>

<p>I anticipate this to be the year of Lisp for me, as I delve into
Common Lisp, Scheme, and Clojure.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[my docs got dropped]]></title>
    <link href="http://kisom.github.com/blog/2011/12/31/my-docs-got-dropped/"/>
    <updated>2011-12-31T00:00:00+03:00</updated>
    <id>http://kisom.github.com/blog/2011/12/31/my-docs-got-dropped</id>
    <content type="html"><![CDATA[<p>My docs are in the stratfor leak.</p>

<!-- more -->


<p>I&#8217;m not too worried though; everything in there is either out of date
or of little use elsewhere: that credit card already expired and I have a new
one now, my password was used only on that site (and was a 22 character phrase
with punctuation), and that email address was only used for stratfor. My
address was also already published due to my domain names (most of which
are now privately registered, but I couldn&#8217;t always afford that).</p>

<p>Now that I have a doc drop that actually affects me (the mtgox break-in
gathered a similarly difficult and unique password, and besides some spamming
from idiots trying to exploit the situation with new bitcoin services
there was no fallout), I can talk about what I think of the spate of
LulzSec-style attacks this year. And that is - getting mad at Anonymous,
LulzSec, AntiSec, or whatever the nom du jour is, is an exercise in futility
and ignorance. It&#8217;s like getting mad at your five year old for getting into
the cookie jar you left on the kitchen table. Your antiquated and
ineffective security mechanism (the kid couldn&#8217;t <em>possibly</em> get on the
table) should have been replaced by something more effective (maybe locking
it in the pantry, overkill for a cookie jar though).</p>

<p>The issue is it&#8217;s 2011 (almost 2012) and we&#8217;ve been doing this for a while
now. The rampant incompetence of people setting up these sites should be
made a crime. That would be a far better use of legislative effort than the
brain dead attempts at anti-piracy we&#8217;re seeing now. The script kiddies
aren&#8217;t exhibiting any serious talent; the security (I use the term loosely
here) people are setting up is juvenile.</p>

<p><strong> Afterthoughts: </strong></p>

<p>As <a href="http://hackerne.ws/user?id=gyardley">gyardley</a> pointed out on
<a href="http://hackerne.ws/item?id=3411236">hackerne.ws</a>, when I compare the
LulzSec-type script kiddies to five-year-olds, I don&#8217;t mean to create
the impression that they shouldn&#8217;t be held legally liable. I do fear,
however, that focusing on legal actions against those responsible will
cause us to lose focus on the bigger problem.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[suddenly enlightenment]]></title>
    <link href="http://kisom.github.com/blog/2011/12/03/suddenly-enlightenment/"/>
    <updated>2011-12-03T00:00:00+03:00</updated>
    <id>http://kisom.github.com/blog/2011/12/03/suddenly-enlightenment</id>
    <content type="html"><![CDATA[<p>It&#8217;s been almost 28 hours since I last slept, so I apologise if this
post contains a few spelling or grammatical errors. As soon as I
become aware of them, rest assured I will quickly put them to right.</p>

<p><a href="http://www.kyleisom.net/blog/2011/11/35-dot_emacs">Today&#8217;s git commit</a> occurred
while I was working on getting a web development test VM / environment working. The
goal was to update a CGI script when I pushed to the dev vm. The commit log:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>commit 2de6f8444c68b0dd5ad31dd815d71a5590c5120e
</span><span class='line'>   Author: Kyle Isom &lt;coder@kyleisom.net>
</span><span class='line'>   Date:   Sat Dec 3 00:24:34 2011 +0300
</span><span class='line'>   
</span><span class='line'>       suddenly enlightenment</span></code></pre></td></tr></table></div></figure>




<!-- more -->


<p>It took a while for me to grok what was happening with the hook, but finally it clicked.
I did a lot of reading online, and was greatly helped by the <a href="https://www.ora.com">O&#8217;Reilly</a>
book <a href="https://shop.oreilly.com/product/9780596620137.do">Version Control with Git</a>
and the <code>githooks(5)</code> man page.</p>

<p>My remote repository was a bare git repo (one initialised with <code>git init --bare</code>) that I
pushed my local changes to. I created a staging directory (<code>${HOME}/stage/cgitest</code>)
and created the following hook:</p>

<pre><code>kyle@www-dev:~/code/cgitest/hooks$ cat post-update
#!/bin/sh
export GIT_DIR=/home/kyle/code/zipcgi
export GIT_WORK_TREE=/home/kyle/stage/zipcgi
git reset --hard
git checkout -f
cp ${GIT_WORK_TREE}/zipcgi.py ~/bin/cgi/
</code></pre>

<p>As a side note, make sure the script is <code>chmod +x</code>&#8216;d.</p>

<p>The reason why we have to specify the git dir is that by default,
because this is in the bare repository, git will assume the git
directory is the repository directory. The problem is, that directory
doesn&#8217;t have a working tree. A working tree is required to checkout
the repository - i.e. so we have a named file to work with. To work
around this, I explicitly specify a working tree. Then I copy the CGI
script to my CGI directory.</p>

<p>Why not just link the file? Well, hard links work on inodes. This
allows multiple names to refer to the same file, but it does mean that
even though the checked-out file is in the same directory and shares the
same name, it is not guaranteed the same inode number. The git checkout
can, in essence, unlink the old file and create a completely new
file. The end result is that your link will likely be left pointing at
the old inode, and so at a stale copy of the script. (A symlink resolves
by path rather than by inode, so it does not have this particular
problem.) The safest method is just to
copy the new version on top of the old one.</p>
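
<p>The inode behaviour is easy to check. A minimal sketch, assuming a POSIX
filesystem and throwaway file names (<code>linked.py</code> is hypothetical): it gives a
file a second name bound to the same inode with <code>os.link</code>, replaces the file
by unlinking and recreating it, and then compares the two names.</p>

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
script = os.path.join(workdir, 'zipcgi.py')
linked = os.path.join(workdir, 'linked.py')  # hypothetical second name

with open(script, 'w') as f:
    f.write('old')
os.link(script, linked)  # both names now refer to one inode
same_before = os.stat(script).st_ino == os.stat(linked).st_ino

# replace the file the way a checkout might: unlink it, then create it anew
os.unlink(script)
with open(script, 'w') as f:
    f.write('new')
same_after = os.stat(script).st_ino == os.stat(linked).st_ino

with open(linked) as f:
    stale = f.read()
with open(script) as f:
    fresh = f.read()
print('same inode before: %s, after: %s' % (same_before, same_after))
print('linked name reads %r, real file reads %r' % (stale, fresh))

shutil.rmtree(workdir)
```

<p>The second name keeps the old inode alive, so the recreated file has to get a
different one, and the old name keeps serving the old contents.</p>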

<p>Why do we have to manipulate the environment variables
<code>GIT_DIR</code> (which points to the directory containing the actual git
repository, more on that in a second) and <code>GIT_WORK_TREE</code>, which
represents the working tree? To really understand this, you need to
understand the difference between the working tree and the
repository. You could take the long route and read the excellent book
I mentioned above and wade through man pages (which are pretty well
written, but there is a lot of information to keep track of). An
alternative is to buckle in and keep reading for my crash course.</p>

<p>Still here? Buckled in? Let&#8217;s do this. A git repository is basically a
filesystem-based database that uses hashes for identification and
great success. If you poke around in your git repository (which in a
standard local repository is in <code>${PROJECT}/.git</code>), particularly under
objects, you will see what I mean. Everything is stored as a hash
object. Git uses <a href="https://en.wikipedia.org/wiki/SHA-1">SHA-1</a>, and
under <code>.git/objects</code> you will see a list of subdirectories. These
subdirectories (with the exception of <code>pack</code> and <code>info</code>) are named after
the first byte of the SHA-1 hash (which is two bytes when stored as a
semi-human readable hex digest). Under these subdirectories, git
stores each object in a file named after the remaining 19 bytes (again,
38 bytes when stored as a hex digest) of the hash. The file is
zlib-compressed. Don&#8217;t believe me? Clone my
<a href="https://github.com/kisom/woofs">woofs project</a> and look up</p>

<p><code>.git/objects/bf/2f7383ca7343f85f1308fc6dc3c34dbd047d90</code>.</p>

<p>Then try the following Python code from inside <code>.git/objects/bf</code>:</p>

<pre><code>import zlib
# the object file is compressed binary data, so open it in binary mode
print zlib.decompress(open('2f7383ca7343f85f1308fc6dc3c34dbd047d90', 'rb').read())
</code></pre>

<p>You should see a working version of the script (and the latest version
as of this writing). This is how git sees everything. (If you want to
see what git sees a file as, use <code>git hash-object &lt;FILE&gt;</code>.)</p>
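
<p>The naming scheme can be reproduced in a few lines. This is a minimal sketch
for loose blob objects only (not packed ones), where git hashes the contents
prefixed with a <code>blob &lt;size&gt;</code> header and a NUL byte. The function names
<code>blob_object</code>, <code>blob_id</code> and <code>object_path</code> are my own, not part of git:</p>

```python
import hashlib
import zlib


def blob_object(data):
    """Return the bytes git hashes and compresses for a file's contents."""
    header = b'blob ' + str(len(data)).encode('ascii') + b'\0'
    return header + data


def blob_id(data):
    """SHA-1 hex digest of the header plus contents."""
    return hashlib.sha1(blob_object(data)).hexdigest()


def object_path(hex_digest):
    """First byte (two hex characters) names the directory, the rest the file."""
    return '.git/objects/%s/%s' % (hex_digest[:2], hex_digest[2:])


contents = b'hello world\n'
digest = blob_id(contents)
stored = zlib.compress(blob_object(contents))  # roughly what lands on disk
print(object_path(digest))
```

<p>Running <code>git hash-object</code> on a file with the same contents should print
the same digest. Decompressing an object file gives back the header followed
by the contents.</p>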

<p>The working directory is where you, the developer or end user,
interact with the contents of the database. This is where things can
be staged to be committed, and in a bare repo (typically found
on remote repos), there won&#8217;t be a working directory because you
aren&#8217;t working directly on that copy of the repo. Try this:</p>

<pre><code>mkdir -p ~/tmp/stage/woofs_working
export GIT_DIR=~/Code/woofs/.git 
export GIT_WORK_TREE=~/tmp/stage/woofs_working
cd ~
git reset --hard
ls ~/tmp/stage/woofs_working
</code></pre>

<p>Voilà! You should see the contents of the repo there. (I&#8217;d recommend
either closing out that terminal session or running</p>

<pre><code>unset GIT_DIR GIT_WORK_TREE
</code></pre>

<p>to prevent problems later on. Also, while I&#8217;m using a repo I chose at
random from my <code>~/Code</code> directory, you could (and should) be trying
with a repo of your own.)</p>
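
<p>If you would rather not export anything into your shell at all, the same
experiment can be scoped to a single child process. This is a minimal Python
sketch, assuming <code>git</code> is on your <code>PATH</code>; the function names
<code>git_env</code> and <code>checkout_into</code> are hypothetical, and the paths
in the closing comment are the ones from the example above:</p>

```python
import os
import subprocess


def git_env(git_dir, work_tree, base=None):
    """Copy the environment and point git at an explicit repo and work tree."""
    env = dict(os.environ if base is None else base)
    env["GIT_DIR"] = os.path.expanduser(git_dir)
    env["GIT_WORK_TREE"] = os.path.expanduser(work_tree)
    return env


def checkout_into(git_dir, work_tree):
    """Run `git reset --hard` with the variables set for that one process only."""
    env = git_env(git_dir, work_tree)
    if not os.path.isdir(env["GIT_WORK_TREE"]):
        os.makedirs(env["GIT_WORK_TREE"])
    subprocess.check_call(["git", "reset", "--hard"], env=env)
    return sorted(os.listdir(env["GIT_WORK_TREE"]))


# Example (not run here):
# checkout_into("~/Code/woofs/.git", "~/tmp/stage/woofs_working")
```

<p>Because the variables live only in the child&#8217;s environment, there is nothing
to <code>unset</code> afterwards.</p>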

<p>It should be clear now why I had to explicitly specify the two. The
next two commands just reset the working directory to the latest
commit (i.e. the one that was just pushed) and check out a fresh copy,
to make sure everything that should be present is present.</p>

<p>This turned out to be a longer post than I had expected, but my hope
is that it helps other people quickly get their hooks operational. The
cool thing about hooks is they are just executable shell scripts,
which means:</p>

<ol>
<li>the script&#8217;s <code>${PWD}</code> is the repo directory when the hook is triggered by a push (not the hooks directory itself).</li>
<li>the <code>${GIT_DIR}</code> is by default &#8216;.&#8217; and is the repo directory. For
example, if we had a bare woofs repo, it would be something like
<code>/home/kyle/code/woofs</code>, while in a local repo it would be
<code>/home/kyle/code/woofs/.git</code>.</li>
<li>because it&#8217;s just an executable script, you can use any language you can
write a shebang for.</li>
</ol>
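
<p>To illustrate the last point, here is a sketch of a do-nothing hook written in
Python instead of shell. It only reports the directory, <code>${GIT_DIR}</code>, and
arguments git hands it, so you can check the points above against your own
repo. The function name <code>describe</code> is made up for illustration:</p>

```python
#!/usr/bin/env python
"""A do-nothing hook that reports the environment git runs it in."""
import os
import sys


def describe(argv, environ, cwd):
    """Return the report lines for one hook invocation."""
    return [
        "hook    = %s" % os.path.basename(argv[0]),
        "args    = %s" % " ".join(argv[1:]),
        "PWD     = %s" % cwd,
        "GIT_DIR = %s" % environ.get("GIT_DIR", "(unset)"),
    ]


if __name__ == "__main__":
    for line in describe(sys.argv, os.environ, os.getcwd()):
        print(line)
```

<p>Saved in the hooks directory under a hook name (for example
<code>post-update</code>) and made executable with <code>chmod +x</code>, it should print
its report the next time that hook fires.</p>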


<p>Git hooks are a powerful tool and can greatly boost your productivity,
automatically deploy code, and help us fight SkyNet. You should
consider using them in your next project.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[add .emacs.d/init.el]]></title>
    <link href="http://kisom.github.com/blog/2011/11/28/add-emacsd/"/>
    <updated>2011-11-28T00:00:00+03:00</updated>
    <id>http://kisom.github.com/blog/2011/11/28/add-emacsd</id>
    <content type="html"><![CDATA[<p>In the spirit of many of my online profiles which proudly declare &#8220;my
commit log is my blog,&#8221; I&#8217;ve decided to start using that in my posts.
Here is the first such attempt.</p>

<pre><code> commit 40bbc533313a43192506b682fe546304d8603d11 
 Author: Kyle Isom &lt;coder@kyleisom.net&gt;
 Date:   Mon Nov 28 17:34:30 2011 +0300

    add .emacs.d/init.el
</code></pre>

<p>I&#8217;ve started using emacs, which is an act of such great blasphemy for
a red-blooded stalwart vim-wielding hacker such as myself that I find
it difficult to come to grips with sometimes. But there is a method to
my madness, and it isn&#8217;t just that my morals are so compromised right
now in this nadir of my life that I&#8217;ve even started learning
javascript (a running joke).</p>

<!-- more -->


<p>Due to my current work situation, and the prospect of traveling to and
spending several months in a region with little to no network
connectivity, I&#8217;ve purchased a new 11&#8221; MacBook Air. I chose the 11&#8221;
model solely for price reasons; I would much prefer a larger
laptop. I&#8217;ve previously owned two EeePCs (the 7&#8221;
<a href="http://en.wikipedia.org/wiki/ASUS_Eee_PC#Eee_700_series">Eee PC 701</a>
and the 11&#8221;
<a href="http://en.wikipedia.org/wiki/ASUS_Eee_PC#Specifications">Eee PC 1101HAB</a>),
so I&#8217;m familiar with the smaller form-factor, and not a huge fan to be
honest.</p>

<p>Enter emacs - I can do all my work in emacs, with an integrated python
development environment incorporating a shell and pdb (with a pane that
shows the current line of the file being executed as you&#8217;re stepping
through code). I&#8217;ll still have the OS X desktop,
<a href="http://www.iterm2.com/">iterm2</a>,
<a href="http://tmux.sourceforge.net/">tmux</a>, and
<a href="http://code.google.com/p/macvim/">macvim</a>. But for getting things
done, I think that emacs is going to help out a lot.</p>

<p>Plus it&#8217;s backed by a Lisp flavour.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[on the police response to the occupy movement]]></title>
    <link href="http://kisom.github.com/blog/2011/10/29/on-the-police-response-to-the-occupy-movement/"/>
    <updated>2011-10-29T00:00:00+03:00</updated>
    <id>http://kisom.github.com/blog/2011/10/29/on-the-police-response-to-the-occupy-movement</id>
    <content type="html"><![CDATA[<p>(Originally written on 2011-10-29 and an unusual departure from the usual technical content.)</p>

<p>TL;DR One more combat veteran is disgusted and appalled by the actions of a few police officers during the course of the Occupy movement.</p>

<!-- more -->


<p>I just got done watching <a href="http://www.youtube.com/watch?v=WmEHcOc0Sys">this video</a>, and the outrage I feel towards those peace officers who disrespect the freedoms that I, and many better men than I, have ostensibly fought to protect, welled up in my chest like a Midwest thunderstorm. While we were told that our actions in Iraq were in the name of freedom, while many people back home who supported us with letters and care packages were told we were defending freedom, the basic rights of <strong>American citizens</strong> are being trampled on by a few officers and police departments.</p>

<p>Right up front: I don&#8217;t have anything against police officers. Yes, they do unpopular things. I understand that. When I came home, there were times when anti-war protesters took their protests to a personal level. For a 19- or 20-year-old returning home from war, having seen friends and brothers killed, having experienced the trauma of having your vehicle hit by an IED, and having had to decide whether to take another man&#8217;s life or not (despite not being considered old enough to be responsible enough to consume alcohol), to have people tell you that you are a baby killer is a fairly difficult and emotionally trying experience. I took several of my best years out of my life to serve my country and gave up quite a number of opportunities in order to do so. I understand many of the feelings that I am sure many police officers feel when ordinary citizens harass and rail on them for actions they take that are unpopular. I get it. I know that as despicable as resorting to violence is, the fact of the matter is that humanity forces the need. Would that we were all enlightened, instead of being little more than clever apes with Facebook, but I understand that is not the case. Trust me.</p>

<p>But as a fighting man in an actual war zone - and despite the so-called &#8220;threat of terror&#8221; that supposedly &#8220;looms&#8221; here, America is not a war zone - had I fired even non-lethal rounds at unarmed civilians unprovoked because I felt scared due to the loud noises they were making, I would have faced military punishments. Many soldiers were punished for less. As a professional soldier, I am trained and expected to make reliable snap shoot/no-shoot judgements. During the surge in Iraq, when I was in a very kinetic (military word for an area of active fighting) environment, we still were held accountable for our actions. If use of force was justified (and it was on many occasions), there was no issue.</p>

<p>So to see American police officers engaging American citizens who are not wielding weapons is aggravating in no small way. As I mentioned in the opening paragraph, rage is a much better term for the emotion. It isn&#8217;t all the police officers, but just as a few soldiers cast shame over the entire US military, a few idiot police officers are casting shame over their departments and the governments they represent. If those officers were in my platoon, in addition to military judgement they would be taken out back and beaten.</p>

<p>I don&#8217;t want to rant, I don&#8217;t want to say anything stupid, and I don&#8217;t want to end up resorting to ad-hominem attacks. Suffice it to say that many of us in the military, regardless of our agreement (or lack thereof) with the protesters&#8217; cause, stood up and defended the right of those protesters to have their voice heard. It is a sad day when the populace needs protection from the very police officers whose job is allegedly to protect that self-same populace. Let us not forget who pays the salary of those police officers.</p>

<p>For those police officers who maintain their calm and professional bearing, I applaud you. I understand the desire to smash stupid people in the face for saying and thinking stupid things. Unfortunately, as a police officer, you are bound to protect those people. I have been in similar situations in the military, and I understand the frustration. If you feel so strongly about it, do it the old-fashioned way - go on your own, out of uniform, and throw down. If it&#8217;s a departmental policy to carry out those types of actions, then the government no longer serves its people and no longer upholds the Constitution, and it&#8217;s time for you to be replaced.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[generating patchfiles with git and hg]]></title>
    <link href="http://kisom.github.com/blog/2011/09/28/generating-patchfiles-with-git-and-hg/"/>
    <updated>2011-09-28T00:00:00+03:00</updated>
    <id>http://kisom.github.com/blog/2011/09/28/generating-patchfiles-with-git-and-hg</id>
    <content type="html"><![CDATA[<p>UPDATE: originally this post was only about doing this in git. Since I use
mercurial almost as much as I use git, I decided to look into how to do it
with mercurial too.</p>

<p>i found myself needing to generate a patchfile today from a git repo. it turns
out to be a very easy task.</p>

<!-- more -->


<ul>
<li><p>first, commit to a clean working directory. i&#8217;ll assume you are on the
&#8216;master&#8217; (git) or &#8216;tip&#8217; (hg) branch, but s/master/$branch/ as appropriate.</p></li>
<li><p>if you have only one commit between you and the commit you need to diff
against:</p></li>
</ul>


<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>git format-patch master^ --stdout &gt; my.patch
</span></code></pre></td></tr></table></div></figure>


<p>or</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>hg <span class="nb">export </span>tip &gt; my.patch
</span></code></pre></td></tr></table></div></figure>


<ul>
<li><p>otherwise, substitute in the appropriate commit</p></li>
<li><p>to apply the patch, it&#8217;s</p></li>
</ul>


<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>git apply my.patch
</span></code></pre></td></tr></table></div></figure>


<p>or</p>

<figure class='code'> <div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='bash'><span class='line'>hg patch my.patch
</span></code></pre></td></tr></table></div></figure>


<p>I did say it was a very easy task&#8230; You&#8217;ll notice mercurial makes this easier
(or at least I think so) than git.</p>
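
<p>If you find yourself doing this often, the export step is easy to wrap. A
minimal Python sketch, assuming <code>git</code> or <code>hg</code> is on your
<code>PATH</code> and that you run it from inside the repository; the names
<code>export_command</code> and <code>write_patch</code> are made up for
illustration, and the commands are the ones shown above:</p>

```python
import subprocess


def export_command(vcs, rev):
    """Return the argv that writes a patch for `rev` to stdout."""
    if vcs == "git":
        return ["git", "format-patch", rev, "--stdout"]
    if vcs == "hg":
        return ["hg", "export", rev]
    raise ValueError("unsupported vcs: %r" % vcs)


def write_patch(vcs, rev, path="my.patch"):
    """Run the export command in the current repo and save its output."""
    with open(path, "wb") as out:
        subprocess.check_call(export_command(vcs, rev), stdout=out)
    return path


# Example (run inside a repository): write_patch("git", "master^")
```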
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[coders (should) do much more than code]]></title>
    <link href="http://kisom.github.com/blog/2011/09/10/coders-should-do-much-more-than-code/"/>
    <updated>2011-09-10T00:00:00+03:00</updated>
    <id>http://kisom.github.com/blog/2011/09/10/coders-should-do-much-more-than-code</id>
    <content type="html"><![CDATA[<p>I recently was explaining to someone that as a coder, I do (or should do)
a lot more than just code. I figured since I hadn&#8217;t written anything here in
a while, I&#8217;d put my thoughts down here.</p>

<h2>the tl;dr</h2>

<p>Coders code. That much is obvious from the title, but there is much more that
can and should be involved for anyone writing real code, at least for UNIX
coders.</p>

<h2>intro</h2>

<p>So you&#8217;ve spent the last couple weeks / months / years writing some really
brilliant bit of software that you think would benefit a lot of people. Or
maybe, just a few, but you still are of the mindset that since you did the
work to solve this problem, other people might have the same problem and if
they had the solution, they could concentrate on other problems. Regardless
of the quality of code and the development process you followed (subjects on which
endless books have been written), there is still a <strong>lot</strong> more work to
be done if you intend to make your software both useful and accessible to
other people. You still need to make sure you have a reasonably portable
(for the scope of the usefulness of your code) build system, good documentation,
an easily accessible online place for people to get your code, and proper
follow-through. Let&#8217;s talk through these bits.</p>

<!-- more -->


<h2>the build system</h2>

<p>No matter how wizard your code is, if it&#8217;s more work for other people to build
it than it&#8217;s worth, it won&#8217;t be used. That&#8217;s a simple fact. By now, users have
come to expect the proverbial <code>./configure &amp;&amp; make &amp;&amp; make install</code> (or
perhaps <code>scons</code> or <code>waf</code> or <code>jam</code> or one of the other solutions). Regardless,
the build process should not require much work for end users, except in cases
where the code is a very purposeful bit of code that requires careful
configuration. I personally have begun making use of the <code>autotools</code> suite
(my personal stance on the GPL notwithstanding, a rant for another day but
the curious can take a look at the license for most of my code on my
<a href="https://github.com/kisom/">github page</a>). This comprises
<a href="http://www.gnu.org/software/autoconf/">autoconf</a> and
<a href="http://www.gnu.org/software/automake/">automake</a> primarily. You will easily
spend many hours just writing out the configuration files on your end to
properly support and build the software, determining what needs to be checked
on the user&#8217;s system so that they can be sure the code will run on their node.
Once this is set up and functioning, for the most part and in theory, users
will be able to just do the typical configure-and-make pattern they have come
to know and love. The autotools are really designed for C and C++. For python,
there&#8217;s always the <a href="http://pypi.python.org/pypi/setuptools">Python setuptools</a>,
and of course for Perl there&#8217;s <a href="http://www.cpan.org/">CPAN</a>.</p>

<p>Of course, these tools are quite often in a different language than your code
is. For example, the autotools use POSIX shell, M4, and POSIX Makefiles to
generate the configure script and Makefiles for distribution. This takes time
to learn, especially given some of the nuances involved. There is of course
some debate (see <a href="http://freshmeat.net/articles/stop-the-autoconf-insanity-why-we-need-a-new-build-system">&#8220;Stop the autoconf insanity! Why we need a new build system&#8221;</a>)
as to how useful these are, but for the most part the reward is worth the work.
For the autotools suite, take a look at the No Starch Press book
<a href="http://nostarch.com/autotools.htm">Autotools: A Practitioner&#8217;s Guide to GNU Autoconf, Automake, and Libtool</a>.
I found this book indispensable in learning the tool suite.</p>

<h2>documentation</h2>

<p>Documentation extends much further (or should) than the typical README and
INSTALL files found in many distributions. Many developers learn the basics
of TeX or LaTeX typesetting to produce aesthetically pleasing manuals; Texinfo
is also quite common. Markdown is becoming popular as well and, with the advent
of tools like <a href="http://johnmacfarlane.net/pandoc/">pandoc</a>, is even easier to
convert to other formats (pandoc supports HTML and LaTeX). Besides
just the technical side of writing documentation and learning the typesetting
language used, there&#8217;s the art of technical writing as well. Many companies
have full-time technical writers whose sole purpose is writing documentation.
This is because of another simple fact: your software is of no use if the users
can&#8217;t figure out how to use it. While many users may be technically savvy
enough to read the code to figure out how to use it, for your code to be truly
useful, they should not have to resort to this. This is what I see as the Apple
factor: many developers use Apple&#8217;s hardware and operating system because not
only do things Just Work, but there is also excellent documentation available.
Another operating system leading the way in documentation is my beloved
<a href="http://www.openbsd.org">OpenBSD</a>. Users should have a clear set of instructions
covering not only how to use the software, but also ways to extend it, what things it
can do that they may not realise, and how to solve problems that may crop up.
So a truly good coder is at least a proficient typesetter and also a
proficient writer of whatever human language the software is in (or aimed at).</p>

<p>Some projects go further and include a full copy of the license the software
is released under (which you should do for the safety / peace of mind /
convenience of your users - it took <a href="http://lteo.devio.us/">lteo</a> constantly
reminding me of this for many of my projects before I started doing it out
of habit) which is most often in a file called LICENSE or COPYING; a copy of
the ChangeLog, which could also be gotten from source control such as
<code>git log</code>; an AUTHORS file to list contributors; a README and INSTALL file to
give a quick usage and overview as well as installation instructions; and
perhaps a HACKING document to explain how to modify the code to be useful.</p>

<p>The README file is still rather useful; in fact, many times I will
<a href="http://kyleisom.net/blog/2011/07/31-rgtdd">write the README first</a> as part
of my development process.</p>

<p>No matter how you approach it or what you use to write and format your user
manuals, you should still have them included.</p>

<h2>distribution</h2>

<p>Today, distribution is one of the easiest aspects of coding. Numerous websites
exist for the sole purpose of distributing your software, such as
<a href="https://github.com">github</a>, <a href="https://www.bitbucket.org">bitbucket</a>,
<a href="https://www.sourceforge.net">sourceforge</a>,
<a href="https://www.freshmeat.net">freshmeat</a>, among others. Typically, such sites
will also host a remote version of your version control system (you <em>are</em>
version controlling, <em>right</em>?) in addition to supporting release downloads. A
well set up build system offers the ability to build a distribution release,
often in tarball or tarred bzip2 format as well. Some sites still offer just
a release tarball (for a while, this is how I released my
<a href="https://github.com/kisom/libdaemon">libdaemon</a> project, via my
<a href="http://kisom.devio.us/src.html">devio.us homepage</a>). In fact, this is rapidly
becoming one of the easiest pieces of the project lifecycle. If you haven&#8217;t
already, take a look at one of the sites that works as a remote repo for
whatever source control you are using. You will probably see that besides
distribution, these sites are extremely useful for the last important additional
part of coding I want to talk about.</p>

<h2>support and maintenance</h2>

<p>Once the user has a copy of your software and knows how to use it, they will
inevitably encounter bugs or find that while they would really like to see
a feature in the software, they don&#8217;t have the technical skills to implement
it themselves (or perhaps the courage to look through your code&#8230;). Still other
users might fix the bugs or add new features themselves, and would like to
offer you those changes so you can incorporate them into the software. So the
last important additional part of being a coder is support and maintenance.</p>

<p>Many of the sites that offer to host releases of your code provide additional
tools, like wikis, bug reporting (aka trouble tickets), and feature requests.
Users may also provide patchfiles or a git pull request to give you their
contribution (and accordingly, you credit them in the documentation as well).
A good coder needs to be able to support and maintain the software - users are
more apt to use software if it gets patched or updated with new features (or
if it just works and they don&#8217;t need new features or bugs patched, which is
less likely but still possible).</p>

<h2>conclusion</h2>

<p>As I&#8217;ve explained, being a good coder and providing useful software encompasses
so much more than just good technical skills or great development processes.
There&#8217;s the administrative side (i.e. the build system, feature request and bug
tracking) and the human side (i.e. documentation and responding to support
requests). While it may not be as much fun as the actual coding, it is still
integral to the development process.</p>

<h2>update (2012-03-25)</h2>

<p>One of the things I&#8217;ve completely neglected to talk about in this discussion is
the use of tests. Functional tests, unit tests, regression testing, continuous
integration, basically &#8211; TEST ALL THE THINGS. Why? First - it helps you write
better code, and to ensure that changes don&#8217;t break everything (or if they do, the
breakage is the expected breakage). Second, it&#8217;s a form of literate coding where
users can see how to use your code in practise (if it&#8217;s a library) or can get a
warm fuzzy knowing you cared enough to validate and test your code as you went.
You might think, well - this is a binary for end users. They won&#8217;t know or won&#8217;t
care about unit tests and so forth. Maybe that&#8217;s true. However, part of the
craft of writing good code is paying attention to detail. Any open source
project that wants to be open to contributions should have tests so quality is
enforced (i.e. don&#8217;t bother submitting a patch or pull request if your changes
don&#8217;t pass the tests) and so they can see how you are using your code. Yes, you
should be writing your code so that it&#8217;s obvious from reading it what it does.
If there&#8217;s a lot of it, and a developer wants to make some quick changes to fix
a bug, tests provide a good way for them to see where things happen and how they
happen. I assert that good coders write good test code. (Testing joke!)</p>
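
<p>As a small illustration of tests doubling as documentation, here is a sketch
using Python&#8217;s <code>doctest</code> module. The function <code>slugify</code> is a
made-up example, not code from any of my projects; the point is that the usage
examples in the docstring are also the tests:</p>

```python
import doctest


def slugify(title):
    """Turn a post title into a URL slug.

    The examples below double as documentation and as tests:

    >>> slugify("woofs released")
    'woofs-released'
    >>> slugify("  Coders (should) do MORE than code!  ")
    'coders-should-do-more-than-code'
    """
    # Lowercase letters and digits, turn everything else into a separator.
    words = "".join(c.lower() if c.isalnum() else " " for c in title).split()
    return "-".join(words)


# Runs the examples in the docstring; prints nothing unless one of them fails.
doctest.run_docstring_examples(slugify, {"slugify": slugify})
```

<p>A contributor reading the docstring sees how the function is meant to be
called, and a broken change shows up as a failed example.</p>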
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[RGTDD]]></title>
    <link href="http://kisom.github.com/blog/2011/07/04/rgtdd/"/>
    <updated>2011-07-04T00:00:00+03:00</updated>
    <id>http://kisom.github.com/blog/2011/07/04/rgtdd</id>
    <content type="html"><![CDATA[<p>One of the most important parts of becoming a useful developer is to find a
workflow that maximises productivity. There are plenty of methodologies and tools
people have come up with just for this - Agile, XP, BDD, TDD, RDD, and many
others. Of course, most everyone has their own unique flavour, and of course
I&#8217;m going to talk about mine. I&#8217;ve spent a lot of time trying different things
(and too much time going back to just grinding out code). This is the first post
of a multi-part series on how I&#8217;ve increased my productivity and what I do to
get things done. Unfortunately, my personal projects are sort of haphazard still,
but I have enjoyed success with this at work.</p>

<p>RGTDD has made a difference in how I develop code and stay productive;
furthermore, above just using a specific development methodology, I&#8217;ve found
certain tools to assist me in being productive.</p>

<!-- more -->


<p>So, RGTDD. It stands for README-generated Test-Driven Development. I took Tom
Preston-Werner&#8217;s <a href="http://tom.preston-werner.com/2010/08/23/readme-driven-development.html">README-driven development</a>
and adapted it to my own use. Without rehashing his post, READMEs factor in like
this: once the project is started, the first task is to write the README. Once
the README is written, you lock it in, and it should not be changed except to fix
typos and spelling errors. This has several advantages:</p>

<ol>
<li><p>First, you end up having a single introductory piece of documentation. This
file contains the justification for your program, a quick introduction to its
features, and usage information.</p></li>
<li><p>Second, locking in the README means you prevent feature creep. You contract
yourself to the end user (yourself, future end users, the client, etc&#8230;) to
implement specifically the features in the README. As a rule of thumb, each
iteration of the README should have no more than five features - each feature
should be a concrete task. If the tasks are particularly complex, I generally
avoid implementing more than three. I have found for myself that trying to
implement much more than that results in code and projects that stagnate or
quickly become spaghetti code.</p></li>
<li><p>Third, you know what you need to code and what the code should specifically
do. I&#8217;ve found that a lot of times I have an idea for project X, but only have
a vague idea of what it will do. It is part of good engineering to have a
well-laid-out design and path for the project to direct development. I have found
this helps me to keep my code from getting overgrown.</p></li>
</ol>


<p>The README at this point now specifies an interface for users to interact with
the code. From here, you begin writing tests that cover the features. As you
begin writing these tests, you will likely figure out other components to the
code. You start writing tests from these. At this point, using the README as a
guiding document, I switch to a test-driven mode. Once the tests perform as
they should (including tests that are expected to fail), I consider this a
release. (I&#8217;ll write a post later on about what I&#8217;m doing for releases).</p>
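
<p>As a sketch of that step, suppose the README for a hypothetical tool promises
that sizes can be given as plain byte counts or with a <code>k</code> or
<code>m</code> suffix, and that anything else is rejected. Each promise becomes a
test, including one for the case that is supposed to fail. The names
<code>parse_size</code> and <code>ReadmeFeatures</code> are made up for
illustration:</p>

```python
import unittest

UNITS = {"k": 1024, "m": 1024 ** 2}


def parse_size(text):
    """Hypothetical README feature: '10k' means 10240 bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # raises ValueError on anything that is not a number


class ReadmeFeatures(unittest.TestCase):
    """One test per feature promised in the README."""

    def test_plain_byte_counts(self):
        self.assertEqual(parse_size("512"), 512)

    def test_unit_suffixes(self):
        self.assertEqual(parse_size("10k"), 10240)
        self.assertEqual(parse_size("2M"), 2097152)

    def test_garbage_is_rejected(self):
        # The operation is expected to fail here, and the test checks that it does.
        with self.assertRaises(ValueError):
            parse_size("lots")


suite = unittest.TestLoader().loadTestsFromTestCase(ReadmeFeatures)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

<p>In practice the tests come first and <code>parse_size</code> is written until
they pass; they are shown together here only to keep the sketch runnable.</p>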

<p>That&#8217;s a quick introduction to the project management methodology I use. I&#8217;ll
cover some specific tools to help out with various languages, my version
control and release methods, things I wasn&#8217;t taught in school, and what I&#8217;ve
found to help me keep my life organised.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[woofs released]]></title>
    <link href="http://kisom.github.com/blog/2011/06/20/woofs-released/"/>
    <updated>2011-06-20T00:00:00+03:00</updated>
    <id>http://kisom.github.com/blog/2011/06/20/woofs-released</id>
    <content type="html"><![CDATA[<h2>web one-time offer file securely</h2>

<p>in the past, i found <a href="http://www.home.unix-ag.org/simon/woof.html">simon budig&#8217;s woof script</a>
but i wanted an SSL-secured version. i&#8217;d started one in december, while i was
still fairly new to python, and i finally got around to finishing it. the repo can be found
<a href="https://github.com/kisom/woofs">here</a>.</p>

<p>interestingly enough, if you look at the git commit logs there are three
activity clusters: when i started the project in december, a brief period in
may when i started the major rewrite to include my own http server, and a
flurry of activity today when i added in ssl support.</p>

<p>so what does it do? as the name implies, it serves a file via https and by
default serves it only once. it&#8217;s designed to allow quick filesharing between
two systems; the transfer is protected by SSL. i won&#8217;t rewrite the documentation
here, so be sure to check it for usage details. perhaps
it will be useful to you as well.</p>
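
<p>to make the idea concrete, here is a rough sketch of a serve-once server
using python&#8217;s standard library. this is not the woofs code; the names
<code>make_handler</code> and <code>serve_once</code> are made up for illustration,
and the certificate and key paths are whatever you supply:</p>

```python
import http.server
import ssl
import threading


def make_handler(payload, filename):
    """Build a request handler class that answers any GET with `payload`."""

    class OneFileHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Disposition",
                             'attachment; filename="%s"' % filename)
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return OneFileHandler


def serve_once(payload, filename, host="127.0.0.1", port=0,
               certfile=None, keyfile=None):
    """Serve `payload` for exactly one request, then shut down.

    Returns the port actually bound and the thread doing the serving.
    """
    server = http.server.HTTPServer((host, port),
                                    make_handler(payload, filename))
    if certfile:
        # Wrap the listening socket so the single transfer goes over TLS.
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        context.load_cert_chain(certfile, keyfile)
        server.socket = context.wrap_socket(server.socket, server_side=True)

    def run():
        server.handle_request()  # exactly one request
        server.server_close()

    thread = threading.Thread(target=run)
    thread.daemon = True
    thread.start()
    return server.server_address[1], thread
```

<p>with <code>certfile</code> left unset the sketch falls back to plain http, which
is only useful for trying it out on localhost.</p>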
]]></content>
  </entry>
  
</feed>

