
Pair Programming Might Be Great

When I first heard about pair programming, years ago, I regarded it as a fad and a scam: experts in the often-fluffy field of software engineering proclaiming the one true way to build software in an attempt to sell books and conferences. That said, I've recently been convinced that pair programming might be really great. Here's why:

  1. Pair programming helps you become mindful of how you, and others, solve problems.
  2. The communication loop is tight: you get instant feedback on your ideas, thoughts, and actions.
  3. You reason about a task in two different modes at the same time. Two minds, one goal.

     

     

    Becoming Mindful

    Pair programming seemed to limit my freedom to solve problems in my own way. One of the things I love about programming, and abstract problem solving, is that there are few limits. In my head, pair programming was intrusive and disrupted creativity. It is now becoming clear that the opposite is true. Instead of solving problems our own way, we should strive to solve problems the best way. Sure, you might think you know your hacking tools inside and out, but how do you know there isn't some trick to improve your workflow that you're missing?

    How do you know the process you use to solve various types of problems is the best? If you have experience, and especially if you were lucky enough to work with a great mentor, you may have picked up some problem-solving tactics. Can you list the tactics you use to solve a problem? I've found that we sometimes are unaware of our own problem-solving process. Next, have you ever spent time testing out the effect each of those tactics has on the outcome of solving problems? That's a lot of data for one person to gather. Not only would you need to record all the observations, but you would need a lifetime to generate enough samples to draw any conclusions.

    While conducting a research study to determine which tactics are the most crucial for solving different problems is a tremendous challenge, we can approximate the findings by being mindful and observing others. Each of us has built up an arsenal of tactics, our tricks of the trade, and you can get a snapshot of any engineer's strategy by watching him solve a few problems. How does he start: pen and paper, whiteboard, or straight to code? What next? Writing a test, defining interfaces, or taking a walk?

    Becoming mindful of our own and others’ problem-solving strategies is a process that likely never ends. Thankfully, we have mentors, peers, and students to point out steps we take without notice. Consider a student seeing something for the first time, asking why you perform a particular action; or a mentor opening a dialogue to challenge a misstep.

    Tight Loop

    Sharing ideas and intent with other programmers can spark design discussions that lead to clarity and solutions. Great, but can't design meetings and code reviews accomplish the same thing? Sort of, but not efficiently. Design meetings work well for only the broadest discussions; the details that begin to drive the design are mostly unknown until there's an implementation. Interlacing design discussion with writing code permits a higher fidelity in the design.

    Similarly, separate code reviews are useful but frequently occur too late. Peer code review should be interactive and occur as code is written, because more time is saved when poor decisions are detected early. This prevents scenarios where large commits, perhaps a full day's work, are based on misunderstandings or bad engineering. When pairing, the partners should be thinking out loud. That verbal communication can prompt a design review to occur before any code is ever written. However, actually reviewing the code is also important to catch any assumptions or bad choices that are not verbalized.

    Dual Modes

    This hypothetical benefit of pair programming excites me the most. When people talk about having awesome pairing experiences where the pairing team has momentum and flow, I think they are taking advantage of what I call dual modes. In most pair programming literature this concept isn't given a name; instead, they talk about two roles: driver and navigator. The driver is whoever has the keyboard and the navigator is the other. These names are deceptive: they imply there is only one active role, the driver, and then some pansy navigator trying to follow the action and pointing out spelling mistakes. Sounds lame.

     

     

    If either of the roles should be considered more active, it should be the one not banging on the keyboard. Instead of a driver and a navigator, there should be a grunt and a commander. The grunt is trying to figure out how the task can be solved given a limited set of assumptions. Those assumptions free the grunt from worrying about the big picture, leaving only the current task to implement in the best way possible. The commander is the mastermind: she should be providing the assumptions, revising them based on reports from the grunt, and watching the grunt's activity for any signs that an assumption is invalid. The importance of the navigator/commander role cannot be overstated.

    Aside from poorly named roles, much of the pair programming literature also places too much emphasis on role definitions. My hypothesis is that the real power of pair programming is two people working on the same problem but operating in complementary mental modes. The exact responsibilities of each partner will shift throughout a session.

    To Be Continued

    I'll be pairing with a few people over the next month and will try to return with a follow-up about the experiment. If you've had an experience with pair programming, please share!

    What is Social Complexity?

    Social complexity is the study of social phenomena through the lens of complex systems.  This often includes building computational models of social behavior, some of which are related to computer science topics such as cellular automata, genetic algorithms, or neural networks.  Social network analysis is also related to social complexity, as it is used to quantify relationships, roles, and organization amongst agents.

    A few research organizations involved in social complexity:

    A few books of interest:

    Two notable blogs:

      This is a brief post that I hope to extend later.  Please feel free to make suggestions via comments, email, or twitter.

      (Figure source: Wikipedia)

      Connecting Tweets

      Distant view of a network of tweets based on shared retweeters.  This is shortly after the earthquake in Haiti earlier this year.

      Distributed Processing and Unix Philosophy

      A few months back I started working on a news classifier for Highput. I had originally planned to use Clojure to write Hadoop jobs, but found that most of the input data was available as a stream and the allure of continuously producing results was too much. RSS feeds, Twitter streams, and collaborative ranking (e.g., HN, reddit) are continuous, so the classifier should be as well. The data retrieval, extraction, transformation, and loading (ETL) was done using a distributed workflow built from three simple components: Python, Beanstalk, and Tokyo Tyrant. The producer/consumer processes are Python programs, Beanstalk is a message queue that divvies up jobs among consumers, and Tokyo Tyrant stores intermediate results.

      Python is a great programming language for getting things done: it has loads of libraries, pseudocode-like syntax, and keeps simple things simple. Aside from gaining throughput by distributing across multiple machines, one of the advantages of using a workflow for data processing is the reduction of complexity that comes from dividing a larger problem. Each processing step in the workflow was implemented as a separate Python program. Each instance of the program requests a job from Beanstalk, does its work, then throws the result into a different job queue. Any of the processes can be killed at any time without losing the task. That's because Beanstalk keeps track of jobs even after a worker has reserved one: if the worker fails to report back, the job is returned to a ready state and made available to other workers. Beanstalk also supports persistent queues, protecting against losing jobs if the beanstalkd process dies. Finally, there is Tokyo Tyrant, the network interface to Tokyo Cabinet. Tokyo Cabinet received well-deserved attention in 2009; it's the best little database, supporting multiple modes useful in various circumstances: simple key/value pairs, fixed-size arrays, B-trees, and schema-free tables. Ultimately, the news classifier in production would use something with an automated process for balancing storage nodes, but Tokyo Cabinet is great for prototyping even if it's not the long-term solution.
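
The reserve-then-report-back behavior described above can be sketched in a few lines of Python. This is only an illustration: `TinyQueue` and `run_step` are hypothetical names, the queue is an in-memory stand-in rather than the Beanstalk client API, and the explicit `release` call simulates what the queue server would do by itself once a dead worker's reservation expires.

```python
import collections


class TinyQueue:
    """In-memory stand-in for one job queue (hypothetical, not the Beanstalk API)."""

    def __init__(self):
        self.ready = collections.deque()  # jobs waiting for a worker
        self.reserved = {}                # job id -> body, currently held by a worker
        self.next_id = 0

    def put(self, body):
        self.next_id += 1
        self.ready.append((self.next_id, body))
        return self.next_id

    def reserve(self):
        """Hand the oldest ready job to a worker, or None if nothing is ready."""
        if not self.ready:
            return None
        job_id, body = self.ready.popleft()
        self.reserved[job_id] = body      # the queue still remembers the job
        return job_id, body

    def delete(self, job_id):
        """The worker reported success: forget the job."""
        del self.reserved[job_id]

    def release(self, job_id):
        """The worker never reported back: make the job ready again."""
        self.ready.append((job_id, self.reserved.pop(job_id)))


def run_step(source, sink, transform, crash_on=None):
    """Move one job from source to sink; simulate a worker dying on one body."""
    job = source.reserve()
    if job is None:
        return False
    job_id, body = job
    if body == crash_on:
        source.release(job_id)            # stands in for the reservation timing out
        return True
    sink.put(transform(body))             # result goes into a different queue
    source.delete(job_id)
    return True


fetched, extracted = TinyQueue(), TinyQueue()
for name in ["feed-a", "feed-b"]:
    fetched.put(name)

run_step(fetched, extracted, str.upper, crash_on="feed-a")  # worker dies on feed-a
while run_step(fetched, extracted, str.upper):              # a healthy worker drains the rest
    pass

results = [body for _, body in extracted.ready]
```

Even though the first worker dies while holding `feed-a`, the job goes back to the ready state and a later worker picks it up, so both inputs end up in the second queue.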

      With these tools, a distributed system prototype for ETL can be built in a day. And not just a hacked-up system that should be thrown away, but one that is nearly good enough for production: add deployment management, failure detection, and recovery, and it's complete. That is significant work, but those are concerns for users of most heavier distributed system frameworks too.  While this solution is ad hoc and inappropriate for some situations, it's amazing what can be done with a small set of well-designed, task-specific tools in a short period of time.

      Logo Draft

      Been working on a logo for Lightpost Software, think it's getting closer... Have a couple other layout and color variants, but this is my current pick.

      Pictograph in Progress

      A new draft for Lightpost Software's pictograph:

      Or, alternate coloring:

      Hint: it's supposed to be a lightpost.

      Finger, decentralized social networks, and ownership

      A group of developers has started WebFinger, an open source project to revive finger by implementing it atop HTTP.  Although finger was historically used for supplying contact information and personal news, similar to personal blogs, a new WebFinger combined with online storage gives users the ability to consolidate their online persona and retain ownership of content such as photos, videos, blog posts, and personal opinions. This is a step towards reclaiming personal ownership of online content and no longer filling out the same personal information and preferences for every site or social network platform.
       
      Decentralization and privacy, while still leveraging collaboration tools such as social networks, need to be the future of the Internet.

      Reactable in Chicago

      Got to play with a Reactable in Chicago's Museum of Science & Industry.

      Clojure's RestFn

      Earlier today on #clojure, there was a brief discussion on how larger (including infinite!) arity functions are implemented in Clojure. An example similar to what started the discussion: 

      (apply + (range 100)) 
      The interesting bit happens in RestFn and how the compiler lays out the bytecode.
       
      The implementation of + is: 
      (defn + 
        "Returns the sum of nums. (+) returns 0." 
        {:inline (fn [x y] `(. clojure.lang.Numbers (add ~x ~y))) 
         :inline-arities #{2}} 
        ([] 0) 
        ([x] (cast Number x)) 
        ([x y] (. clojure.lang.Numbers (add x y))) 
        ([x y & more] 
        (reduce + (+ x y) more))) 
      There are four implementations of +; which one gets used depends on the number of arguments provided. The + function is a RestFn instance and its applyTo method is called from apply.

      So far we have: 

      (apply f xs) -> f.applyTo(xs). 
      This then calls: 
      f.doInvoke(xs.first(), (xs = xs.next()).first(), xs.next()) 
      This matches the implementation that handles the parameter list [x y & more]. The Clojure compiler, when emitting the bytecode for +, places the implementation defined for [x y & more] under the appropriate doInvoke method, overriding RestFn's default implementation, which throws an exception about an unsupported arity.
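
To make the peel-off-the-required-arguments step concrete, here is a rough Python analogy. It is not Clojure's Java code: `RestFnSketch`, `apply_to`, and `plus_rest` are hypothetical names, and the sketch covers only the rest-args path, ignoring the fixed arities.

```python
class RestFnSketch:
    """Python analogy for a variadic function with a fixed number of required args."""

    def __init__(self, required_arity, do_invoke):
        self.required_arity = required_arity
        self.do_invoke = do_invoke  # called as do_invoke(x, y, ..., rest)

    def apply_to(self, xs):
        """Take the required args off the front and pass the remainder as the rest."""
        it = iter(xs)
        required = []
        for _ in range(self.required_arity):
            try:
                required.append(next(it))
            except StopIteration:
                # The sketch only models the variadic arity.
                raise TypeError("too few arguments for this sketch") from None
        return self.do_invoke(*required, it)  # the rest stays lazy, like a seq


def plus_rest(x, y, more):
    # Mirrors ([x y & more] (reduce + (+ x y) more))
    total = x + y
    for n in more:
        total += n
    return total


plus = RestFnSketch(2, plus_rest)
total = plus.apply_to(range(100))  # analogous to (apply + (range 100))
```

Here `apply_to` plays the role of applyTo and `plus_rest` plays the role of the doInvoke body the compiler emits for [x y & more].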

      The amsmath matrix environment for alignment

      I just spent a good hour searching for a way to left-align a couple of lines of type definitions with amsmath.  The matrix environment, which provides matrix formatting without delimiters, is a solution.