Concurrency patterns: a source of unique numbers
Working with some of Russ' code recently, I came across a cute pattern for implementing a unique id provider using a goroutine and a channel.
The goal is to have some source of unique integers. Let’s look at a naive implementation first. We could use a state variable and a function:
var uniqId int func uniq() int { uniqId++ return uniqId }
Each call to uniq
yields one of a seqeunce of unique numbers. These
couldbe used to name temporary files, for example:
fn := "/tmp/build" + strconv.Itoa(uniq())
However, there’s a problem: this code isn’t thread-safe. If two goroutines try to modify the uniqId variable simultaneously, memory corruption might occur. To prevent this we can add a mutex:
var ( uniqLock sync.Mutex uniqId int ) func uniq() int { uniqLock.Lock() defer uniqLock.Unlock() uniqId++ return uniqId }
This code is simple enough, but now there is as much code concerned with synchronisation as there is that provides the actual functionality.
Let’s see how we might do this in a more straightforward manner using a
goroutine and a channel. In the scope of package main
, we declare a channel:
var uniq = make(chan int)
To send the sequence of numbers on that channel, we use this for loop:
for i := 1;; i++ { uniq <- i }
This code should run in a goroutine, which we launch from the package’s
init
function (a function executed before main
). Here’s the whole thing:
var uniq = make(chan int) func init() { go func() { for i := 1;; i++ { uniq <- i } }() }
To use this new approach – returning to our file naming example above – we replace the function call with a channel read:
fn := "/tmp/build" + strconv.Itoa(<-uniq)
This approach is nice for a few reasons: it’s shorter and semantically simpler in its implementation, only one name is introduced to the global namespace (instead of the 3 from the previous example), and the code on the page is almost entirely concerned with its actual purpose: providing a stream of numbers.
You might be thinking “That’s all well and good, but in practice couldn’t I just build a little object to encapsulate this functionality?” You could, of course, but my intention is to demonstrate the usefulness of Go’s concurrency primitives “in the small.” This pattern I’ve shown here was just a small part of a simple program; the kind of thing that, once you’ve internalised the concepts of goroutines and channels, you find yourself writing without thinking much about it.
55917 views and 11 responses
-
Aug 26 2010, 3:54 AMnsf responded:I don't understand sometimes why people are so easily get hooked by a goroutine usage for simple tasks. Goroutines are cheap, but they are not free. Launching that kind of infinite loop goroutine throws 4 kilobytes of memory basically to /dev/null.
The implementation looks simpler, but it takes 4 kilobytes more memory and underneath it all uses the same mutex (most likely) for communication anyway.
I guess I'm just a too C guy, who is always deeply concerned about the memory. And I dislike that aspect of Go, which together with GC technology pretends that memory means nothing and is infinite. I think in the long term it is a wrong approach for doing programming in general. GC to me is a temporary measure for dealing with a burden of manual memory management. For some reason most programmers don't like to deal with that or aren't capable of.
Anyway, I think the mutex version is better.
-
Aug 26 2010, 4:14 AMAndrew Gerrand responded:Hey, thanks for the comment.
I like using goroutines and channels for simple tasks like this one because they make my code simpler. And, although this is a trivial example, the concepts behind it can be extended to more complex algorithms where the clarity really starts to pay off.
As for memory usage: what's 4kb between friends? ;-) But seriously, in the context of the tool in which this was used, the overhead of the goroutine is basically irrelevantĀ compared to the other stuff going on.
I disagree that Go pretends that memory is infinite. On the contrary, Go was designed to give the programmer a lot of control over allocation without making it too complicated. Take a look at slices - there's a reason why they don't grow automatically, and it's not because it was hard to implement. Similarly, the Reader and Writer interface types from the io package were designed to make it natural to re-use buffers and not thrash the GC.
Go isn't C. But if you want C, it's there, isn't it?
I'm glad you like the mutex version. I do, too. It's nice that Go makes it so easy to express it. ;-)
-
Aug 26 2010, 5:28 AMnsf responded:I guess you're right.
-
Aug 28 2010, 11:53 AMBrian Slesinsky responded:Shouldn't the channel be buffered? Otherwise you get a context switch every time you ask for another number.
-
Aug 28 2010, 7:02 PMAndrew Gerrand responded:If the channel were buffered then the goroutine would fill the buffer immediately. Any read from the channel would free space in the buffer, causing the goroutine to send the next value. The net effect would be more memory consumption (the size of the buffer), and no less context switching. These context switches are relatively cheap, anyway.
Just for the hell of it I benchmarked the two approaches. Here are the numbers:
uniq.BenchmarkMutex 10000000 173 ns/op
uniq.BenchmarkChannel 10000000 224 ns/opThe channel-based approach takes 1.29x the time of the mutex-based approach. Depending on the specific application, and your opinion on the aesthetics of the two approaches, this can be an acceptable tradeoff (it is, IMO).
To measure them yourself, get the code here:
http://github.com/nf/uniq -
Sep 19 2010, 7:47 AMJune Kim responded:This is a well-known technique in the CSP(Communicating Sequential Process) field. You'll find many useful and insightful gems in the CSP field, too.
-
Sep 19 2010, 10:48 AMQuestion: responded:This method still isn't guaranteed to return unique numbers, right? Because the uniqId is a finite sized variable, it will eventually wrap around. Does Go have something like Big Integers? You'd probably want to use something like that instead.
-
Sep 19 2010, 4:50 PMAndrew Gerrand responded:June: I'm well aware of the CSP field - Go has a history steeped in CSP. This article by Russ Cox, one of Go's core developers, discusses the history Go's ancestors w.r.t. CSP: http://swtch.com/~rsc/thread/
"Question:": Yes, this will eventually wrap. The use-case I had for this would never run that long. Go's "big" package provides support for arbitrary-precision arithmetic, and could be used for this purpose.
-
Sep 19 2010, 4:54 PMPat responded:Based on my benchmarks, it looks like if you replace the deferred "Unlock" with an explicit unlock (adds just a couple lines), the function version ends up being ~7.6x faster.
Whoa, if I change GOMAXPROCS from 1 to 2, it looks to be 74x faster to just use a mutex. As an added bonus, if we wrap the mutex source up and give it a method, you get the same speed improvement, but unlike the channel, it actually gets GC'd when you no longer need it.
I think channels are great, and in so many cases the speed difference is irrelevant, but when it is just as easy (as in this case) probably better to use a channel.
I sorta wish there were a "sync.AtomicInt" that used a cas under the hood. Then, it'd be the optimal and easy solution.
-
Sep 19 2010, 5:54 PMAndrew Gerrand responded:I assume you meant "probably better to use a mutex," and I agree. Your note about the slowdown using defer is an interesting one. I'd like to look into that further.
sync.AtomicInt is an interesting idea...
-
Sep 20 2010, 11:23 AMPat responded:The slowdown from defer is mainly a consequence of every call to "defer" allocating memory on the heap. It adds a few function calls too, but the "malloc" probably accounts for a vast majority of the time taken.