Dynamic Typing

The Mason Inn and What Could Have Been

The Mason Inn Conference Center and Hotel is a conference facility and hotel on the George Mason University (GMU) campus in Fairfax, VA. The University built the center to host academic and industry conferences. A major goal of the University is to significantly increase research activity, and the Mason Inn Conference Center helps to accomplish this by facilitating the following activities:

Building relationships. Most academic networking occurs at conferences, and new research partners are often met there.
Increasing funding. Increased research activity will only occur if Mason faculty are able to get funding. It's difficult for Mason to win grant awards alone – the University is young and still proving itself.
Growing reputation. Collaboration with established universities is part of a strategy toward increasing our own research reputation and activity.

The Mason Inn helps to foster these important research goals because hosting conferences on campus permits more faculty and students to attend and thus increases the potential for research collaboration with other organizations. Quite simply: the Conference Center and Hotel is a significant resource to the Mason community that is just beginning to be realized.

Unfortunately, the Mason Inn is scheduled to close this summer. The University invested $50 million in the facility, and the decision to close it came after three years of losing $2 million per year. The administration attributes the loss to underuse.

What remains unknown is why this loss surprised the University administration. The Washington Business Journal [1] quoted J.J. Davis, senior vice president of GMU, last November as stating: "The location for an overnight traveler was not ideal. It's not off a major corridor. It was hard to bring people here when you're competing for business with facilities in D.C. and the like. The conference center's model was based on meetings that would fill the hotel's rooms, rent out all the conference space, and use the in-house catering." From the outset, the Conference Center and Hotel was built to provide a space that would facilitate research opportunities. For a new facility, such use requires marketing and outreach efforts to inform research communities about the facility, as well as time to allow professional organizations to schedule conferences there. Professional research conferences are often scheduled years in advance; such an effort would therefore require an investment of time and marketing before these opportunities could be fulfilled.

This prompts a few questions. Was research done to understand how long it would take to win bids for conferences? Could the $2 million initial annual loss have been avoided? Even if the administration did not know the Conference Center and Hotel required continued investment, was there any attempt to innovate and attract new hotel customers?

The University recently announced a partnership with INTO, a for-profit international student recruitment corporation. Following that announcement the University administration shared plans to convert the Mason Inn into residential housing for INTO students. This is an opportunity to make the hotel profitable.

However, I suggest that there are options that would not cause the Mason community to permanently lose access to this important resource. For example, instead of retrofitting the Mason Inn and losing the $50 million investment, the hotel could offer extended-stay housing for INTO students. The University can adjust rates as needed to ensure INTO students have affordable housing while reducing the hotel's loss. As GMU builds more dormitories and hosts more conferences, the extended-stay rooms in the hotel could be converted back to standard rooms.

I am concerned the loss of the Mason Inn Conference Center and Hotel will hinder University goals and reflect negatively on GMU over the long term. The expansion of University research and GMU's role as an economic and cultural engine are greatly advanced by establishing Mason as a host of academic and industry conferences. Closing the facility is contrary to the commitments in Mason's published goals [2] and the abandoned investment is a waste of University funds. The Mason Inn – as a site for research, collaboration, and connection – could have launched our transformation to a top-tier research university.

[1] GMU to shutter Mason Inn. Washington Business Journal. http://www.bizjournals.com/washington/blog/top-shelf/2013/11/gmu-to-shutter-mason-inn.html?page=all
[2] Our Goal. George Mason University. http://vision.gmu.edu/the-mason-vision/our-goal/

Pair Programming Might Be Great

When I first heard about pair programming, years ago, I regarded it as a fad and a scam. Experts in the often-fluffy field of software engineering proclaiming the one true way to build software in an attempt to sell books and conferences. That said, I've recently been convinced that pair programming might be really great. Here's why:

Pair programming helps you become mindful of how you, and others, solve problems.
The communication loop is tight: you get instant feedback on your ideas, thoughts, and actions.
Reasoning about a task, in two different modes, at the same time. Two minds, one goal.

Becoming Mindful

Pair programming seemed to limit my freedom to solve problems in my own way. One of the things I love about programming, and abstract problem solving, is that there are few limits. In my head, pair programming was intrusive and disrupted creativity. It is now becoming clear that the opposite is true. Instead of solving problems our own way, we should strive to solve problems the best way. Sure, you might think you know your hacking tools inside and out, but how do you know there isn't some trick to improve your workflow that you're missing?

How do you know the process you use to solve various types of problems is the best? If you have experience, and especially if you were lucky enough to work with a great mentor, you may have picked up some problem-solving tactics. Can you list the tactics you use to solve a problem? I've found that we sometimes are unaware of our own problem-solving process. Next, have you ever spent time testing out the effect each of those tactics has on the outcome of solving problems? That's a lot of data for one person to gather. Not only would you need to record all the observations, but you would need a lifetime to generate enough samples to draw any conclusions.

While conducting a research study to determine which tactics are the most crucial for solving different problems is a tremendous challenge, we can approximate the findings by being mindful and observing others. Each of us has built up an arsenal of tactics, our tricks of the trade, and you can get a snapshot of any engineer's strategy by watching him solve a few problems. How does he start: pen and paper, whiteboard, or straight to code? What next? Writing a test, defining interfaces, or taking a walk?

Becoming mindful of our own and others’ problem-solving strategies is a process that likely never ends. Thankfully, we have mentors, peers, and students to point out steps we take without notice. Consider a student seeing something for the first time, asking why you perform a particular action; or a mentor opening a dialogue to challenge a misstep.

Tight Loop

Sharing ideas and intent with other programmers can spark design discussions that lead to clarity and solutions. Great, but can't design meetings and code reviews accomplish the same effect? Sort of, but not efficiently. Design meetings work well for only the broadest discussions; the details that begin to drive the design are mostly unknown until there's an implementation. Interlacing design discussion with writing code permits a higher fidelity in the design.

Similarly, separate code reviews are useful but frequently occur too late. Peer code review should be interactive and occur as code is written, because more time is saved when poor decisions are detected early. This prevents scenarios where large commits, perhaps a full day's work, are based on misunderstandings or bad engineering. When pairing, the partners should be thinking out loud. That verbal communication can prompt a design review to occur before any code is ever written. However, actually reviewing the code is also important to catch any assumptions or bad choices that are not verbalized.

Dual Modes

This hypothetical benefit of pair programming excites me the most. When people talk about having awesome pairing experiences where the pairing team has momentum and flow, I think they are taking advantage of what I call dual modes. In most pair programming literature this concept isn't given a name; instead, authors talk about two roles: driver and navigator. The driver is whoever has the keyboard and the navigator is the other. These names are deceptive: they imply there is only one active role, the driver, and then some pansy navigator trying to follow the action and pointing out spelling mistakes. Sounds lame.

If either of the roles should be considered more active, it should be the one not banging on the keyboard. Instead of a driver and a navigator, there should be a grunt and a commander. The grunt is trying to figure out how the task can be solved given a limited set of assumptions. Those assumptions free the grunt from worrying about the big picture, so the grunt can focus only on implementing the current task in the best way possible. The commander is the mastermind: she should be providing the assumptions, revising them based on reports from the grunt, and watching the grunt's activity for any signs that an assumption is invalid. The importance of the navigator/commander role cannot be overstated.

Aside from poorly named roles, much of the pair programming literature also places too much emphasis on role definitions. My hypothesis is that the real power of pair programming is two people working on the same problem but operating in complementary mental modes. The exact responsibilities of each partner will shift throughout a session.

To Be Continued

I'll be pairing with a few people over the next month and will try to return with a follow-up about the experiment. If you've had an experience with pair programming, please share!

What is Social Complexity?

Social complexity is the study of social phenomena through the lens of complex systems. This often includes building computational models of social behavior, some of which are related to computer science topics such as cellular automata, genetic algorithms, or neural networks. Social network analysis is also related to social complexity, as it is used to quantify relationships, roles, and organization amongst agents.

A few research organizations involved in social complexity:

Santa Fe Institute
CASOS at CMU
Center for Social Complexity at GMU
Center for the Study of Complex Systems at UMich

A few books of interest:

The Evolution of Cooperation, by Robert Axelrod
Generative Social Science: Studies in Agent-Based Computational Modeling, by Joshua M. Epstein
Growing Artificial Societies: Social Science from the Bottom Up, by Joshua M. Epstein and Robert L. Axtell

Two notable blogs:

Zero Intelligence Agents, by Drew Conway
Three-Toed Sloth, by Cosma Shalizi

This is a brief post that I hope to extend later. Please feel free to make suggestions via comments, email, or twitter.

(Figure source: Wikipedia)

Connecting Tweets

A distant view of a network of tweets, linked by shared retweeters. The tweets are from shortly after the earthquake in Haiti earlier this year.

Distributed Processing and Unix Philosophy

A few months back I started working on a news classifier for Highput. I had originally planned to use Clojure to write Hadoop jobs, but found that most of the input data was available as a stream, and the allure of continuously producing results was too much. RSS feeds, twitter streams, and collaborative ranking (e.g., HN, reddit) are continuous, so the classifier should be as well. The data retrieval, extraction, transformation, and loading (ETL) was done using a distributed workflow built from three simple components: Python, Beanstalk, and Tokyo Tyrant. The producer/consumer processes are Python programs, Beanstalk is a message queue that divvies up jobs among consumers, and Tokyo Tyrant stores intermediate results.

Python is a great programming language for getting things done: it has loads of libraries, pseudocode-like syntax, and keeps simple things simple. Aside from gaining throughput by distributing across multiple machines, one of the advantages of using a workflow for data processing is the reduction of complexity that comes from dividing a larger problem. Each processing step in the workflow was implemented as a separate Python program.

Each instance of the program requests a job from Beanstalk, does its work, then throws the result into a different job queue. Any of the processes can be killed at any time without losing the task. That's because Beanstalk keeps track of a job even after a worker has reserved it: if the worker fails to report back, the job is returned to a ready state and made available to other workers. Beanstalk also supports persistent queues, protecting against losing jobs if the beanstalkd process dies.

Finally, there is Tokyo Tyrant, the network interface to Tokyo Cabinet. Tokyo Cabinet received deserved attention in 2009. It's the best little database, supporting multiple modes useful in various circumstances: simple key/value pairs, fixed-size arrays, B-trees, and schema-free tables. Ultimately, the news classifier in production would use something with an automated process for balancing storage nodes, but Tokyo Cabinet is great for prototyping even if it's not the long-term solution.
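The reserve-and-report-back behavior described above can be sketched in a few lines of Python. This is a toy, in-memory stand-in and not Beanstalk's client API: the names ToyQueue and run_step, the integer clock passed in as now, and the ttr deadline of 5 ticks are all assumptions made for illustration. The point is only to show why a killed worker does not lose its job.

```python
import itertools
from collections import deque


class ToyQueue:
    """In-memory stand-in for one job queue (hypothetical, not Beanstalk's API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._ready = deque()   # (job_id, body) pairs waiting for a worker
        self._reserved = {}     # job_id -> (body, deadline)

    def put(self, body):
        job_id = next(self._ids)
        self._ready.append((job_id, body))
        return job_id

    def reserve(self, now, ttr=5):
        """Hand the oldest ready job to a worker, due back before now + ttr."""
        self._expire(now)
        if not self._ready:
            return None
        job_id, body = self._ready.popleft()
        self._reserved[job_id] = (body, now + ttr)
        return job_id, body

    def delete(self, job_id):
        """The worker reports success, so the job is removed for good."""
        self._reserved.pop(job_id, None)

    def _expire(self, now):
        """Move jobs whose worker never reported back to the ready state."""
        for job_id, (body, deadline) in list(self._reserved.items()):
            if now >= deadline:
                del self._reserved[job_id]
                self._ready.append((job_id, body))


def run_step(source, sink, transform, now, crash=False):
    """One worker iteration: reserve a job, transform it, queue the result."""
    job = source.reserve(now)
    if job is None:
        return False
    job_id, body = job
    if crash:
        return False            # simulate a killed worker: nothing is reported
    sink.put(transform(body))
    source.delete(job_id)
    return True


fetched, extracted = ToyQueue(), ToyQueue()
fetched.put("<title>Hello</title>")

run_step(fetched, extracted, str.upper, now=0, crash=True)    # worker dies
retry_too_early = run_step(fetched, extracted, str.upper, now=1)
retry_after_ttr = run_step(fetched, extracted, str.upper, now=10)
result = extracted.reserve(now=10)
```

In this sketch the second worker finds nothing at tick 1 because the job is still reserved by the dead worker; by tick 10 the deadline has passed, the job is ready again, and the step completes. A real deployment would use wall-clock time and a networked queue rather than a simulated clock.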

With these tools, a distributed system prototype for ETL can be built in a day. And not just a hacked-up system that should be thrown away, but one that is nearly good enough for production: add deployment management, failure detection, and recovery, and it's complete. That is significant work, but all of these are concerns for users of most heavier distributed system frameworks too. While this solution is ad hoc and inappropriate for some situations, it's amazing what can be done with a small set of well-designed, task-specific tools in a short period of time.

Logo Draft

Been working on a logo for Lightpost Software, think it's getting closer... Have a couple other layout and color variants, but this is my current pick.

Pictograph in Progress

A new draft for Lightpost Software's pictograph:

Or, alternate coloring:

Hint: it's supposed to be a lightpost.

Finger, decentralized social networks, and ownership

A group of developers has started WebFinger, an open source project to revive finger by implementing it atop HTTP. Although finger was historically used for supplying contact information and personal news, similar to personal blogs, the combination of WebFinger and online storage gives users the ability to consolidate their online persona and retain ownership of content such as photos, videos, blog posts, and personal opinions. This is a step towards reclaiming personal ownership of online content and no longer filling out the same personal information and preferences for every site or social network platform.

Decentralization and privacy, combined with collaboration tools such as social networks, need to be the future of the Internet.

Reactable in Chicago

Got to play with a Reactable in Chicago's Museum of Science & Industry.

Clojure's RestFn

Earlier today on #clojure, there was a brief discussion on how larger (including infinite!) arity functions are implemented in Clojure. An example similar to what started the discussion:

(apply + (range 100))

The interesting bit happens in RestFn and how the compiler lays out the bytecode.

The implementation of + is:

(defn +
  "Returns the sum of nums. (+) returns 0."
  {:inline (fn [x y] `(. clojure.lang.Numbers (add ~x ~y)))
   :inline-arities #{2}}
  ([] 0)
  ([x] (cast Number x))
  ([x y] (. clojure.lang.Numbers (add x y)))
  ([x y & more]
   (reduce + (+ x y) more)))

There are four implementations of +; which one gets used depends on the number of arguments provided. The + function is a RestFn instance, and its applyTo method is called from apply.

So far we have:

(apply f xs) -> f.applyTo(xs).

This then calls:

f.doInvoke(xs.first(), (xs = xs.next()).first(), xs.next())

This matches the implementation that handles the parameter list [x y & more]. When emitting the bytecode for +, the Clojure compiler places the implementation defined for [x y & more] under the appropriate doInvoke method, overriding RestFn's default implementation of throwing an exception about an unsupported arity.
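The dispatch described above can be modeled in a few lines. The sketch below is a toy written in Python, not Clojure's Java source: ToyRestFn, apply_to, and do_invoke are hypothetical names that only mirror the roles of RestFn, applyTo, and doInvoke, and the lookup of fixed arities in a dictionary is an assumption made to keep the example short.

```python
from functools import reduce
from itertools import chain, islice


class ToyRestFn:
    """Toy model of a function with fixed arities plus one variadic arity."""

    def __init__(self, fixed, required, do_invoke):
        self.fixed = fixed          # {argument count: implementation}
        self.required = required    # positional args before the "& more" part
        self.do_invoke = do_invoke  # implementation of the variadic arity

    def apply_to(self, xs):
        it = iter(xs)
        # Look at most one element past the required count, so a very long
        # (or unbounded) argument sequence is never fully realized here.
        head = list(islice(it, self.required + 1))
        if len(head) <= self.required:
            impl = self.fixed.get(len(head))
            if impl is None:
                raise TypeError("wrong number of args: %d" % len(head))
            return impl(*head)
        # Pass the required args positionally and the remainder as one seq.
        rest = chain(head[self.required:], it)
        return self.do_invoke(*head[:self.required], rest)


def add2(x, y):
    return x + y


# Four arities, mirroring ([] 0), ([x] ...), ([x y] ...), ([x y & more] ...).
plus = ToyRestFn(
    fixed={0: lambda: 0, 1: lambda x: x, 2: add2},
    required=2,
    do_invoke=lambda x, y, more: reduce(add2, more, add2(x, y)),
)

total = plus.apply_to(range(100))
```

Here apply_to(range(100)) takes the variadic path: the first two values are passed positionally and the remaining values travel as a single sequence, which is the shape of the doInvoke call shown earlier.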