PyconAU, Go, and a bit of l'esprit de l'escalier

Yesterday I went to Pycon Australia in Sydney. It was a great conference: well organized, with great speakers and a fun crowd. I used to write a lot of Python code professionally, so it was both interesting and rewarding for me to get back in touch with the Python community.

I signed up for one of the lightning talk slots at the end of the day, with the contentious title “Love Python? Try Go!” When I asked a friend if it was inappropriate to give a talk about Go at a Python conference, he replied “Giving your talk without pants on is inappropriate. Talking about Go is just obscene!” Undeterred, I prepared and delivered my talk anyway (see the slides).

The talk was surprisingly well-received (I didn’t get booed off the stage). There was the requisite heckling from my friends and colleagues in the front row, but when they judged each of the talks by applause I got a reasonable showing (I expected stony silence).

My talk sparked a number of conversations at the pub afterwards. Were I to do the talk again, there are some things I would do differently:

  • I wouldn’t make such a big deal about static typing.

It turns out that Python programmers like dynamic typing. A lot. Python’s run-time type errors just don’t seem to be a big deal to most Python programmers.

This surprised me, because my number one peeve with dynamic languages is how fragile the code feels. In Go, I can fearlessly refactor code with the confidence that most silly mistakes will be caught by the compiler. I have never had the same confidence when writing Python code.

One counter-argument is that good tests will catch those same kinds of errors. This is true to some extent, but I contend that static type checking saves time during development, too. Unless you write your tests up front (I don’t) you won’t get that important benefit in a dynamic language.

But I digress. The “static vs dynamic typing” discussion is deep and subtle, which makes it a poor choice for a 5-minute lightning talk. I should have left it out.

  • I would have somebody thoroughly proofread my slides.

On the slide about fast compilation I stated that “Typical programs build in <1 second,” except I used a greater-than sign by mistake. That got a laugh, at least, even if I didn’t know why at the time.

  • I would make the point that programming languages are not a religion and that language choice is not a zero-sum game.

Learning a new programming language does not mean you forget the other ones; it just makes you a better programmer by giving you more options and experience.

The intent of my talk was not to coax programmers away from Python to Go. I wanted to tell people about Go and show that it offers some real benefits over other languages for some tasks. While I personally prefer to use Go over other languages for pretty much anything, I certainly don’t expect that everyone should feel the same. I’m not a Go zealot (or a troll), and I hope I didn’t come across that way. :–)

Software systems are mostly heterogeneous, particularly in the web world. For example, Python programs consist of a lot of C code, and most Python web apps are deployed behind a web server written in C. Were I giving the talk again, I would focus on the benefits of adding Go to your software development toolset, and give some real examples of systems that use both Python and Go.

So, in conclusion, if you’re a Pythonista with the need for speed and/or concurrency – or you’re just interested in learning something quite different and new – I (once more) encourage you to try Go. It may not replace Python in your day-to-day life but – like any new tool – it might just make your life easier.

Good starting points are my Real World Go talk and the Go documentation. And, if you’re in Sydney, come along to the Sydney GTUG on Tuesday the 30th of August where Rob Pike and I will be giving a couple of Go presentations.

Go release vs weekly

We just rolled out Go release.r58, the third official “stable” release of Go. Back in March, I announced our new release process. The plan was to tag a new release every couple of months (instead of once a week). The last stable release was r57.1 at the start of May (although there was the security-related r57.2 point release in the interim). I’m happy that we have stuck to our promised release cycle thus far.

Since tagging the release this morning I have had some confused enquiries as to why – when switching from the latest weekly to this release – there appeared to be some regressions. This is by design.

release.r58 is based on weekly.2011-06-09, while the latest weekly is in fact weekly.2011-06-29. Releases are tagged retroactively, and we judged 06-09 to be the most stable weekly in recent history. This means many fixes and changes present in subsequent weeklies didn’t make it into this release. If you were using the latest weekly and then switched to the release, you might have seen some changes effectively “reversed”.

The lesson here is to choose a path – release or weekly – and stick with it:

  • With the release tag you won’t need to upgrade as often and will have a relatively consistent and stable experience.
  • With the weekly tag you can try out the latest improvements and fixes, but you should be prepared for things to break.

And if you choose to switch between paths, don’t be surprised if you’re confused. ;–)

Roll your own gzip-encoded HTTP handler

While I’m sure we’ll have “out of the box” support for gzip-compressed HTTP responses pretty soon, it’s quite easy to do it yourself.

package main

import (
    "compress/gzip"
    "http"
    "io"
    "os"
    "strings"
)

type gzipResponseWriter struct {
    io.Writer
    http.ResponseWriter
}

func (w gzipResponseWriter) Write(b []byte) (int, os.Error) {
    return w.Writer.Write(b)
}

func makeGzipHandler(fn http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
            fn(w, r)
            return
        }
        w.Header().Set("Content-Encoding", "gzip")
        gz, err := gzip.NewWriter(w)
        if err != nil {
            http.Error(w, err.String(), http.StatusInternalServerError)
            return
        }
        defer gz.Close()
        fn(gzipResponseWriter{Writer: gz, ResponseWriter: w}, r)
    }
}

func handler(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("This is a test."))
}

func main() {
    http.ListenAndServe(":8081", makeGzipHandler(handler))
}

One nicety this demonstrates is how easy it is to substitute a subset of one interface’s methods with those of another. The gzipResponseWriter has two embedded values. The methods from http.ResponseWriter that don’t conflict with the io.Writer are simply inherited. But because http.ResponseWriter and io.Writer both have a Write method, we must write a shim Write method to pass through to the io.Writer.
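For readers on a current Go toolchain, here is a minimal sketch of the same embedding trick. It assumes today's package paths and signatures (net/http, the built-in error type, and a gzip.NewWriter that returns only the writer), which differ from the listing above. The roundTrip helper is hypothetical and exists only to exercise the handler in memory.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// gzipResponseWriter embeds both interfaces. Header and WriteHeader are
// promoted from http.ResponseWriter; Write exists on both embedded values,
// so the explicit shim below picks the io.Writer (the gzip stream).
type gzipResponseWriter struct {
	io.Writer
	http.ResponseWriter
}

func (w gzipResponseWriter) Write(b []byte) (int, error) {
	return w.Writer.Write(b)
}

func makeGzipHandler(fn http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
			fn(w, r)
			return
		}
		w.Header().Set("Content-Encoding", "gzip")
		gz := gzip.NewWriter(w)
		defer gz.Close()
		fn(gzipResponseWriter{Writer: gz, ResponseWriter: w}, r)
	}
}

// roundTrip (hypothetical helper) sends one in-memory request through the
// wrapped handler and returns the Content-Encoding header and decoded body.
func roundTrip(acceptGzip bool) (string, string, error) {
	h := makeGzipHandler(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "This is a test.")
	})
	req := httptest.NewRequest("GET", "/", nil)
	if acceptGzip {
		req.Header.Set("Accept-Encoding", "gzip")
	}
	rec := httptest.NewRecorder()
	h(rec, req)
	res := rec.Result()
	defer res.Body.Close()

	enc := res.Header.Get("Content-Encoding")
	var rd io.Reader = res.Body
	if enc == "gzip" {
		zr, err := gzip.NewReader(res.Body)
		if err != nil {
			return enc, "", err
		}
		defer zr.Close()
		rd = zr
	}
	b, err := io.ReadAll(rd)
	return enc, string(b), err
}

func main() {
	enc, body, err := roundTrip(true)
	fmt.Println(enc, body, err)
}
```

Removing the shim Write method should make the struct fail to satisfy http.ResponseWriter, because the selector Write would then be ambiguous between the two embedded values.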

Collecting and plotting live data with Go

My recent talk, Practical Go Programming, describes the construction of the URL shortener program goto, and concludes with a graphical demonstration of the program being tested under load. It tests goto running as a single master, and one, two, and three slaves in front of one master. Another Go program plots the number of Get (redirect) and Put (shorten) operations served by each process on a line graph. In this post I will describe how those statistics are gathered, collated, and displayed.

First off, I should let you know that I did this in a hurry. The night before I first delivered the talk (at OSDC in Melbourne) I realised it should end with something visual, some display that would show just how capable and efficient this simple Go program is. So after just under an hour of hacking I had produced the stat package, the corresponding stat server, and the stress-testing program bench. This disclaimer serves both as an excuse for the quality of the code and as a testament to Go’s utility as a language.

bench

The bench program simply fires off repeated HTTP requests to one or more goto servers. When bench starts it launches several goroutines:

func main() {
    for i := 0; i < 20; i++ {
        go loop(get, getDelay)
    }
    for i := 0; i < 2; i++ {
        go loop(post, postDelay)
    }
}

func loop(fn func(), delay int64) {
    for {
        fn()
        time.Sleep(delay) // use the delay passed in, not always getDelay
    }
}

The get function makes a URL redirect request to the goto server, and post makes a URL shortening request. To make the GET requests to valid short URLs, bench records each short URL returned by post and provides them to get as needed. This is coordinated through two channels, newURL and randURL:

var (
    newURL  = make(chan string)
    randURL = make(chan string)
)

The post function sends each shortened URL it receives from goto to the newURL channel:

func post() {
    url := "http://master/add"
    r, err := http.PostForm(url, map[string]string{"url": fooUrl})
    if err != nil {
        log.Println("post:", err)
        return
    }
    defer r.Body.Close()
    b, err := ioutil.ReadAll(r.Body)
    if err != nil {
        log.Println("post:", err)
        return
    }
    newURL <- string(b)
}

While get receives random short URLs from randURL:

func get() {
    url := <-randURL
    r, err := http.Get(url)
    if err != nil {
        log.Println("get:", err)
        return
    }
    r.Body.Close() // only the request matters here; discard the response
}

The keeper function runs in its own goroutine and maintains a slice of short URLs. When keeper receives a value from newURL it adds the value to the slice. At the same time, it attempts to send URLs picked randomly from the slice to randURL.

func keeper() {
    var urls []string
    urls = append(urls, <-newURL)
    for {
        r := urls[rand.Intn(len(urls))] // choose random url
        select {
        case u := <-newURL:
            urls = append(urls, u)
        case randURL <- r:
            // random url sent
        }
    }
}

This is a nice example of Go’s select statement: it attempts to receive and send data at the same time, and proceeds with whichever operation is ready first.
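The keeper pattern can be tried in isolation. The following is a minimal sketch for a current Go toolchain, not the bench program itself: the channels are passed as parameters, a done channel (an addition for this example) lets the goroutine exit, and sample is a hypothetical driver that feeds in a few URLs and then draws random ones back out.

```go
package main

import (
	"fmt"
	"math/rand"
)

// keeper stores every URL received on newURL and, in the same select,
// offers a randomly chosen stored URL on randURL. It returns when done
// is closed.
func keeper(newURL <-chan string, randURL chan<- string, done <-chan struct{}) {
	var urls []string
	// Wait for the first URL so there is something to hand out.
	select {
	case u := <-newURL:
		urls = append(urls, u)
	case <-done:
		return
	}
	for {
		r := urls[rand.Intn(len(urls))] // choose a random stored URL
		select {
		case u := <-newURL:
			urls = append(urls, u)
		case randURL <- r:
			// a reader was ready and took the random URL
		case <-done:
			return
		}
	}
}

// sample (hypothetical driver) sends the given URLs to a keeper and then
// reads n random URLs back. It expects at least one input URL.
func sample(in []string, n int) []string {
	newURL := make(chan string)
	randURL := make(chan string)
	done := make(chan struct{})
	defer close(done)
	go keeper(newURL, randURL, done)

	for _, u := range in {
		newURL <- u
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, <-randURL)
	}
	return out
}

func main() {
	fmt.Println(sample([]string{"http://goto/a", "http://goto/b"}, 5))
}
```

Because both channels are unbuffered, each select blocks until either a writer or a reader shows up, so keeper consumes no CPU while idle.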

There’s a little more to the bench program, such as configuration through command-line flags, but I’ll leave their discovery as an exercise for the reader.

stat

The stat package enables any Go program to count events that occur during a specified time period (in this case, each second) and report them to a statistics server via RPC. It exposes a simple interface:

func Monitor(addr string)

var (
    In      = make(chan string, 100)
    Process = "default"
)

To use stat, a process should launch a new goroutine and call Monitor, its first argument being the address of the statistics server.

The stat.Process variable can be set to a string that describes the process, which will eventually be used to name the data series on the line chart. In the case of goto, I set it to the process' HTTP listen port.

For each event to be counted, the process sends an identifying string to stat.In. For example, each time goto handles a Get request it sends the string "get" to In.

The Monitor function opens an RPC connection to the server, and then receives events from In and counts them, while periodically sending updates to the stats server. It uses a map of counter values to keep track of the data for each series, and a time.Ticker to trigger server updates.

type counter struct {
    total, period, cycles int64
}

func Monitor(addr string) {
    counters := make(map[string]*counter)
    client, err := rpc.DialHTTP("tcp", addr)
    if err != nil {
        log.Fatal(err)
    }
    t := time.NewTicker(period)
    for {
        select {
        case <-t.C:
            update(client, counters)
        case s := <-In:
            c, ok := counters[s]
            if !ok {
                c = &counter{}
                counters[s] = c
            }
            c.period++
        }
    }
}

The update function constructs a Point for each series, and sends it to the stats server via a Server.Update RPC request.

type Point struct {
    Process string
    Series  string
    Value   int64
}

func update(client *rpc.Client, counters map[string]*counter) {
    for series, c := range counters {
        c.total += c.period
        c.cycles++
        p := Point{Process, series, c.period}
        err := client.Call("Server.Update", &p, &struct{}{})
        if err != nil {
            log.Println("stat update:", err)
        }
        c.period = 0
    }
}

The stat package exports the Point struct (by naming it with a capital letter) so that the stat server itself can use it.
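The counting loop at the heart of Monitor can be sketched without the RPC plumbing. This is a simplified illustration for a current Go toolchain, not the stat package: the tick channel is passed in (in real use it would be the C field of a time.Ticker), each period's counts are sent on an out channel instead of over RPC, and countPeriods is a hypothetical driver that makes the example deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// monitor counts event names received on in. On every tick it sends the
// counts for the period just ended on out and starts a fresh period.
// It closes out and returns when in is closed.
func monitor(in <-chan string, tick <-chan time.Time, out chan<- map[string]int64) {
	period := make(map[string]int64)
	for {
		select {
		case s, ok := <-in:
			if !ok {
				close(out)
				return
			}
			period[s]++
		case <-tick:
			out <- period
			period = make(map[string]int64)
		}
	}
}

// countPeriods (hypothetical driver) replays groups of events, ticking
// manually after each group, and collects the per-period counts.
func countPeriods(periods [][]string) []map[string]int64 {
	in := make(chan string)
	tick := make(chan time.Time)
	out := make(chan map[string]int64)
	go monitor(in, tick, out)

	var res []map[string]int64
	for _, events := range periods {
		for _, e := range events {
			in <- e
		}
		tick <- time.Now()
		res = append(res, <-out)
	}
	close(in)
	return res
}

func main() {
	fmt.Println(countPeriods([][]string{{"get", "get", "put"}, {"get"}}))
	// prints: [map[get:2 put:1] map[get:1]]
}
```

Passing the tick channel in, rather than creating the ticker inside the loop, is a choice made here so the behaviour can be checked without waiting on a real clock.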

stat/server

The stat server is an RPC server that exposes one method, Server.Update, and an HTTP server that serves some static HTML and JavaScript to display the chart, and the live chart data as JSON.

The Server type contains a map that holds slices of coordinates keyed by series name:

type Server struct {
    series map[string][][2]int64
    start  int64 // store points relative to start time
    mu     sync.Mutex
}

func NewServer() *Server {
    return &Server{
        series: make(map[string][][2]int64),
        start:  time.Nanoseconds(),
    }
}

Its Update method accepts updates via RPC from client programs using the stat library:

func (s *Server) Update(args *stat.Point, r *struct{}) os.Error {
    s.mu.Lock()
    defer s.mu.Unlock()

    // append point to series
    key := args.Process + " " + args.Series
    second := (time.Nanoseconds() - s.start) / 100e6
    point := [2]int64{second, args.Value}
    s.series[key] = append(s.series[key], point)

    // trim series to maxLen
    if sk := s.series[key]; len(sk) > *maxLen {
        s.series[key] = sk[len(sk)-*maxLen:] // store the trimmed slice back in the map
    }

    return nil
}

Not the cleanest piece of code I’ve written, but in essence it just tracks data series keyed by the Process and Series provided by the client, and only maintains the last maxLen points (configurable by a command-line flag).
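The append-and-trim step is easy to get subtly wrong, because re-slicing a local copy of a map entry does not change the entry itself. A minimal sketch of the idea as a standalone function follows; the names appendTrimmed and point are made up for this example, and maxLen is assumed to be non-negative.

```go
package main

import "fmt"

// point is an (x, y) pair, matching the [2]int64 coordinates used above.
type point [2]int64

// appendTrimmed appends p to series[key], keeps only the newest maxLen
// points, and stores the result back in the map.
func appendTrimmed(series map[string][]point, key string, p point, maxLen int) {
	s := append(series[key], p)
	if len(s) > maxLen {
		s = s[len(s)-maxLen:]
	}
	series[key] = s // the map entry only changes through this assignment
}

func main() {
	series := make(map[string][]point)
	for i := int64(0); i < 5; i++ {
		appendTrimmed(series, "8080 get", point{i, i * 10}, 3)
	}
	fmt.Println(series["8080 get"])
	// prints: [[2 20] [3 30] [4 40]]
}
```

Re-slicing keeps the older points alive in the underlying array until append reallocates, which is fine for a short-lived demo but worth knowing for a long-running server.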

The ServeHTTP method simply serves the current data as a JSON blob, allowing us to use a *Server as an http.Handler.

func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    s.mu.Lock()
    defer s.mu.Unlock()
    w.SetHeader("Content-Type", "application/json")
    json.NewEncoder(w).Encode(s.series)
}
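For comparison, here is a rough equivalent of that handler for a current Go toolchain, where headers are set through w.Header().Set. It is a sketch rather than the stat server: the Server type is cut down to the fields the handler needs, and render is a hypothetical helper that calls the handler in memory.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// Server holds chart data keyed by series name, guarded by a mutex.
type Server struct {
	mu     sync.Mutex
	series map[string][][2]int64
}

// ServeHTTP writes the current data as JSON, which makes *Server usable
// as an http.Handler.
func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	s.mu.Lock()
	defer s.mu.Unlock()
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(s.series); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

// render (hypothetical helper) calls the handler with an in-memory
// request and returns the Content-Type header and response body.
func render(s *Server) (string, string) {
	rec := httptest.NewRecorder()
	s.ServeHTTP(rec, httptest.NewRequest("GET", "/get", nil))
	return rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	s := &Server{series: map[string][][2]int64{"8080 get": {{0, 3}, {1, 5}}}}
	ct, body := render(s)
	fmt.Print(ct, "\n", body)
}
```

Each [2]int64 is encoded as a two-element JSON array, so the body is already in the list-of-pairs shape a plotting library expects.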

The server’s main function sets all this up, as well as the Static handler for serving the HTML and JavaScript.

func main() {
    flag.Parse()
    server := NewServer()
    rpc.Register(server)
    rpc.HandleHTTP()
    http.HandleFunc("/", Static)
    http.Handle("/get", server)
    http.ListenAndServe(*listenAddr, nil)
}

The front-end uses the Prototype and Flotr libraries to draw the graph. When the document loads, it kicks off a timer to make Ajax requests to /get every second:

document.observe('dom:loaded', function(){
    setInterval(function() {
        new Ajax.Request('/get', { onSuccess: draw })
    }, 1000)
});

The draw function then massages that data into the appropriate format to be plotted by Flotr:

function draw(data) {
    var series = [];
    Object.keys(data.responseJSON).each(function(key) {
        series.push({
            label: key,
            data: data.responseJSON[key]
        })
    });
    var f = Flotr.draw($('container'), series);
}

And that, in essence, is all there is to it.

The stat package and server are pretty limited right now, but I would be interested in extending it further. If you use this code I’d be curious to hear about it.

Me at FOSDEM 2011: Practical Go Programming

"Practical Go Programming"

This is a variation on the talk I gave at OSDC in Melbourne last year. The slides are available here.

I had a great time at FOSDEM. It was a very well-organised event, with thousands of open-source hackers from all over Europe coming together to learn and get things done. I'll be back next year for sure.

Update: I've written a post describing the internals of the final demonstration.

2010

In 2010 I:

  • got a job at Google working on an amazing project with a great team,
  • visited 8 countries for work,
  • visited 2 countries for pleasure,
  • made it to my 6th continent, Africa,
  • gave more than a dozen public presentations, including one to a packed room at Google I/O,
  • implemented a ridiculous internal feature that impacted all of Google’s >20k employees,
  • went diving on a World War II wreck in Papua New Guinea,
  • went hang gliding for the first time,
  • started and finished a bunch of programming projects,
  • and met and forged friendships with more people than in the preceding 5 years combined (the best bit).

2011 will be even better. Happy New Year.

On learning Go

I’m often asked about the best way to learn Go. My advice is to take the same approach you should to learn any language: pick a real problem and solve it. It doesn’t have to be big.

My first Go programs were those I wrote working through the exercises of Jon Bentley’s Programming Pearls in preparation for my interviews at Google. (I joined the team a little after Go was launched.) The exercises consisted of various search and sort routines, a hash table implementation, and more along those lines. They were a great way to become familiar with the basic Go syntax, but provided little scope for learning about the standard library or the more interesting language features.

The first real Go program I wrote is a formatting tool for Mercurial change logs, to assist in preparing the Go project’s release notes. In writing it I learned a bit about I/O, strings, maps, and a few parts of the standard library.

Around the same time I began contributing bug fixes and minor features to the Go project itself. Reading and working on the Go core is the single most effective way to familiarize oneself with Go idiom. We have spent a lot of time polishing and refining the Go core, making it an invaluable resource. Next time you look up a function in the documentation you should go a step further and look at its implementation, too.

Things continued in this vein for a while. Any time I wrote a program – any program – I did it in Go. At first I thought this would slow me down, but I was surprised to find that I got over the initial hump quickly. Before long, the kinds of small tasks I would typically write in Python (or even bash) I was writing in Go – and they were better for it. Go being a statically typed and compiled language, these tools were unequivocally more reliable and efficient than they would have been as scripts.

(I’m reminded of something a colleague said to me earlier this year: “Of all the code I’ve written, I’m most proud of my Go code.” With Go I never feel like I’m writing a “throw away” program; they each seem like polished little jewels.)

So if you are looking to learn Go, don’t look too far. Your first Go project is probably right in front of you.

Go Resources: (by no means an exhaustive list)

Go at Amped 2010

About two months ago I attended Amped “The Hack Day: Reloaded” at the Powerhouse Museum in Sydney. A group of developers got together and formed teams to compete in an array of challenges. Among the challenges was to build something using the Powerhouse Museum API which they had unveiled that morning. I chose to take that challenge, and competed as a team of one.

Earlier in the week I had written and released a back-end for Rumpetroll, an engaging and unusual web experiment created by a group of Scandinavians (“with love”). Rumpetroll is a canvas and websocket-driven web site with a simple concept: you control a tadpole that can swim around in a shared environment with others.

My first thought on seeing it was that Go would be ideally suited to powering its back-end. Go has a websocket package in its standard library, and its concurrency primitives are perfect for this kind of application. My implementation was about 160 lines in all (50 lines shorter than the original Ruby/EventMachine implementation).

My take on the Powerhouse challenge was to extend the Rumpetroll interface and my back-end to provide a collaborative interface for exploring the museum’s collection. There was a small amount of JavaScript programming involved, but the bulk of the work was in Go. I wrote a simple library for pulling data from the Powerhouse API, and extended the Rumpetroll back-end to retrieve museum items and serve them. You can see the result running at powerhouse.nf.id.au. The code is in the powerhouse branch of my Rumpetroll repository.

The basic functionality of the back-end works like this: each user has a websocket connection through which it sends and receives positional messages as the tadpoles move around the space. Each of those connections is handled by two goroutines; one for reading, and another for writing. The reader goroutine receives data from the websocket, decodes the JSON updates, labels them with a user ID, and sends them to a global muxer goroutine on a single “Incoming” channel. The muxer receives messages from all clients and re-sends them to each client on their own, unique return channel. Each client’s writer goroutine receives each message from its unique channel, JSON-encodes the message, and writes it to the websocket connection.
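The fan-out at the centre of that design can be sketched in a few lines. This is a deliberately simplified illustration, not the Rumpetroll back-end: the set of clients is fixed up front (the real server has clients joining and leaving), there are no websockets or JSON, and the message type and the broadcast driver are made up for the example.

```go
package main

import "fmt"

// message is a positional update labelled with the sender's ID.
type message struct {
	From int
	Body string
}

// mux receives every message from the single incoming channel and
// re-sends it to each client's own channel. When incoming is closed it
// closes the client channels so their readers can finish.
func mux(incoming <-chan message, clients []chan message) {
	for m := range incoming {
		for _, c := range clients {
			c <- m
		}
	}
	for _, c := range clients {
		close(c)
	}
}

// broadcast (hypothetical driver) pushes msgs through mux to nClients
// clients and returns what each client received.
func broadcast(msgs []message, nClients int) [][]message {
	incoming := make(chan message)
	clients := make([]chan message, nClients)
	for i := range clients {
		// Buffered so this demo can drain the clients after mux is done.
		clients[i] = make(chan message, len(msgs))
	}
	go func() {
		for _, m := range msgs {
			incoming <- m
		}
		close(incoming)
	}()
	mux(incoming, clients)

	out := make([][]message, nClients)
	for i, c := range clients {
		for m := range c {
			out[i] = append(out[i], m)
		}
	}
	return out
}

func main() {
	fmt.Println(broadcast([]message{{1, "x=3 y=4"}, {2, "x=9 y=1"}}, 2))
}
```

In a live server each client channel would be drained by that client's writer goroutine, so one slow connection can stall the muxer unless the channels are buffered or sends are made non-blocking.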

I then mapped the museum’s items in two-dimensional space. A ContentLayer goroutine listens to the incoming messages, and, when a user’s tadpole is within range of a content item, sends the item’s data to the user.

The whole thing was startlingly easy. A few hours in, I found myself grinning widely as I coloured in the necessary pieces of code. There were no dead ends; whenever I had to change or extend something, the path forward was obvious and painless. Even when doubling back on fundamental design decisions, Go’s lightweight syntax made my life easier than it would have been in other languages. The only time I really spent swearing was at mysterious bugs in the JavaScript code (mostly typos that would have been caught by static typing). Perhaps most telling of all is that I’m not even embarrassed by the rushed code I produced under pressure, even two months later.

I look forward to the next event of this kind so that I may once again use Go as my not-so-secret weapon.

Deploying Go web services behind Nginx under Debian or Ubuntu

A couple of days ago I wrote a simple URL shortening service named Goto for my own personal use. I deployed it on a Virtual Private Server that I rent from prgmr.com. The server already had an Apache installation serving some simple web sites, so I had to find a way to continue to serve them while serving requests to the new Goto service.

One option is to get another IP address from my VPS provider and use that for serving Goto requests. But, being the responsible netizen that I am, I’d rather not contribute to IPv4 address exhaustion without good cause. Another option is to run Goto on a port other than 80, but that would add at least a few more characters to every URL, defeating the purpose of a URL shortening service.

The third option is to use named virtual hosts. Practically all browsers send a Host header in their HTTP requests that specifies the hostname the request is destined for. This enables a web server to serve many sites on the same IP address and port. To run Goto on the same port as my other web sites, I’d need to configure the web server to forward requests with a specific Host to the Goto service.

The first step was to throw Apache out. It was already consuming more than half the VPS’s available RAM; complete overkill for serving simple static web pages. In its stead, I installed the wonderful Russian-born Nginx web server (as root):

$ /etc/init.d/apache2 stop      # stop apache2
$ update-rc.d -f apache2 remove # disable apache2
$ apt-get install nginx         # install nginx

After familiarising myself with Nginx’s simple configuration language, I had the web server back up and running in a few minutes (and using a mere 900 KB of memory, a stark contrast to Apache’s >100 MB).

(Admittedly, this would have been trickier if my original Apache set-up were more complex – involving PHP, for example. In that case I may have opted to put both Apache and Goto behind an Nginx installation.)

Here’s how to configure Nginx to proxy requests to the Goto service:

Goto runs as a stand-alone web server. Tell it to listen on 127.0.0.1:9980. Listening on localhost prevents outsiders from connecting to it directly. (The specific port number was chosen arbitrarily.)

$ goto -http=127.0.0.1:9980

Configure Nginx to forward requests to the service. Do this by creating a site configuration file under /etc/nginx/sites-available. In this case I named it goto. This configuration forwards all requests to r.nf.id.au to the Goto service running at http://127.0.0.1:9980:

server {
    listen 80;
    server_name r.nf.id.au;
    access_log /var/log/nginx/goto.access.log;
    location / {
        proxy_pass http://127.0.0.1:9980;
    }
}

Enable the site configuration by creating a symlink to it in sites-enabled, and telling Nginx to reload (as root):

$ cd /etc/nginx/sites-enabled
$ ln -s ../sites-available/goto
$ /etc/init.d/nginx reload

You should have already modified your DNS settings to point the hostname to the web server. (Sometimes DNS changes can take a while to propagate; you might want to add a line to your local /etc/hosts to set the hostname manually for the time being.) Check that this worked by loading the URL in a web browser.

We’ve now accomplished what we set out to do. But what if the machine restarts? We need to configure the system to run the goto process on start-up.

Debian makes this pretty straightforward with its init.d system. You could write the init.d script yourself (if you know how), but it’s simpler to base it on an existing script. Create a copy of /etc/init.d/nginx, name it /etc/init.d/goto, and modify it to manage the Goto program instead:

#!/bin/sh

### BEGIN INIT INFO
# Provides:          goto
# Required-Start:    $all
# Required-Stop:     $all
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: starts the goto server
# Description:       starts goto using start-stop-daemon
### END INIT INFO

PATH=/sbin:/bin:/usr/sbin:/usr/bin

BIN=/usr/sbin/goto
PIDFILE=/var/run/goto.pid
USER=nobody
GROUP=nogroup

HOST=r.nf.id.au
HTTP=127.0.0.1:9980
FILE=/var/spool/goto/store.gob
PASS=password
BINARGS="-host=$HOST -http=$HTTP -file=$FILE -pass=$PASS"

test -f $BIN || exit 0
set -e
case "$1" in
  start)
    echo -n "Starting goto server: "
    start-stop-daemon --start --chuid $USER:$GROUP \
        --make-pidfile --background --pidfile $PIDFILE \
        --exec $BIN -- $BINARGS
    echo "goto."
    ;;
  stop)
    echo -n "Stopping goto server: "
    start-stop-daemon --stop --quiet --pidfile $PIDFILE --exec $BIN
    rm -f $PIDFILE
    echo "goto."
    ;;
  restart)
    echo -n "Restarting goto server: "
    $0 stop
    sleep 1
    $0 start
    echo "goto."
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}" >&2
    exit 1
    ;;
esac
exit 0

There’s some configuration information in there (hostname, http port, etc). It’s not considered a good practice to put that kind of information in an init.d script, but it’ll do for now. (The alternatives are to modify Goto to read from a configuration file, or to have the init.d script call another shell script – named something like /etc/goto.conf – to set the environment variables.)
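For context, this is roughly how a Go program might accept the four options passed in BINARGS. It is a sketch for a current Go toolchain under stated assumptions: the flag names come from the init.d script above, but the defaults, help strings, config type, and parseArgs function are invented for this example and are not taken from the Goto source.

```go
package main

import (
	"flag"
	"fmt"
)

// config holds the settings the init.d script passes on the command line.
type config struct {
	Host string // hostname used in generated short URLs (assumed meaning)
	HTTP string // address the HTTP server listens on
	File string // path of the data store file
	Pass string // password (assumed to protect adding URLs)
}

// parseArgs parses command-line arguments into a config. The defaults
// are placeholders chosen for this example.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("goto", flag.ContinueOnError)
	fs.StringVar(&c.Host, "host", "localhost", "hostname for short URLs")
	fs.StringVar(&c.HTTP, "http", "127.0.0.1:9980", "HTTP listen address")
	fs.StringVar(&c.File, "file", "store.gob", "data store file")
	fs.StringVar(&c.Pass, "pass", "", "password")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseArgs([]string{
		"-host=r.nf.id.au",
		"-http=127.0.0.1:9980",
		"-file=/var/spool/goto/store.gob",
		"-pass=password",
	})
	fmt.Printf("%+v %v\n", c, err)
}
```

Using a FlagSet instead of the package-level flag functions keeps parsing in one function, which would make it simple to add a configuration-file source later.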

The data file is to be stored in /var/spool/goto, and, as a security precaution, the goto process runs as user nobody in group nogroup. Create the directory and set its ownership appropriately:

$ mkdir /var/spool/goto
$ chown nobody.nogroup /var/spool/goto

Start the service using the init.d script:

$ /etc/init.d/goto start
Starting goto server: goto.

Finally, add the service to rc.d so that it is launched on start-up:

$ update-rc.d goto defaults
 Adding system startup for /etc/init.d/goto ...
   /etc/rc0.d/K20goto -> ../init.d/goto
   /etc/rc1.d/K20goto -> ../init.d/goto
   /etc/rc6.d/K20goto -> ../init.d/goto
   /etc/rc2.d/S20goto -> ../init.d/goto
   /etc/rc3.d/S20goto -> ../init.d/goto
   /etc/rc4.d/S20goto -> ../init.d/goto
   /etc/rc5.d/S20goto -> ../init.d/goto

And that’s it, for now. There is more you could do in the way of logging and monitoring, but these are the bare essentials to reliably run a Go (or any other) web service behind Nginx under Debian.