| tags:stdout streams slow disk i/o categories:programming

stdout is slow
or rather, any sort of disk i/o is slow

Reading or writing any sort of stream can be slow - I cannot stress enough how true that is.

It's incredibly noticeable, especially when you have some sort of GUI program in debug mode printing out a load of stuff. Each printout, whether it's piped somewhere else or goes to the console, adds time that matters to your program's execution.

Why bring it up now?

Take, for example, libdpx. I haven't silenced any of the debug output; rather, I use it to my advantage so that I can trace where things are going. I've caught several rather significant bugs this way, since tracing with gdb is rather hard on a coroutine library.

But what about performance?

According to ØMQ, their latency (the only metric I have tested with duplex) for 100000 1-byte messages is a mere 30.9 microseconds, a figure I was able to replicate locally. Jeff mentioned that I could look into libdpx's performance, and I said "sure, why not?"

The tests are practically identical to the ØMQ ones - I have a receiver and a sender of messages. And what I got initially stunned me:

{% highlight shell-session %}
$ ./send_lat 127.0.0.1 9999 1 100000

lots of stdout here

message size: 1 bytes
message count: 100000
avg latency: 6788597.849 nanoseconds
{% endhighlight %}

That's 6788.6 microseconds - over 200 times ØMQ's figure. Yikes!

save the output for debugging only

I went ahead today and created a DEBUG_FUNC(...) macro which, when DEBUG is defined, expands to whatever is inside it, and otherwise expands to nothing, so the preprocessor deletes the call entirely.
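A minimal sketch of what such a macro can look like (the exact definition in libdpx may differ; the counter here is just to make the effect visible):

{% highlight c %}
#include <stdio.h>

/* With -DDEBUG, DEBUG_FUNC(x) expands to x; without it, the call
 * vanishes at preprocessing time and costs nothing at runtime. */
#ifdef DEBUG
#define DEBUG_FUNC(x) x
#else
#define DEBUG_FUNC(x)
#endif

static int debug_calls = 0;

int main(void) {
    /* Compiled without -DDEBUG, this line produces no code at all. */
    DEBUG_FUNC(debug_calls++; printf("tracing: entered main\n"));

    /* Errors still go out unconditionally, on stderr. */
    fprintf(stderr, "something went wrong\n");
    return 0;
}
{% endhighlight %}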

So then I wrapped all my unimportant printf statements in DEBUG_FUNC, and separated the errors out into fprintf(stderr, ...). And then what did I find?

{% highlight shell-session %}
$ ./send_lat 127.0.0.1 9999 1 100000
message size: 1 bytes
message count: 100000
avg latency: 129588.328 nanoseconds
{% endhighlight %}

129.6 microseconds - a roughly 50x improvement, all because I no longer wrote to stdout at all.

writing means disk i/o

And this is true for any stream - every write carries overhead, because the buffer eventually has to be flushed to disk or wherever else the stream points.

Remember that advice about writing to files in C? You have to flush (or properly close) the stream, or your changes may never reach disk. It's the same thing with stdout - stdout is a file, after all.

This is because streams like stdout are line buffered by default (at least when attached to a terminal; redirected to a file or pipe, they are typically fully buffered) - so anything written stays in the buffer until it hits a newline, at which point the program essentially pauses to force everything out.

From what I know, there are three types of buffering:

  1. No buffering, so anything written goes out immediately.
  2. Block (full) buffering, so the stream flushes once its buffer fills, or on a manual fflush().
  3. Line buffering, so the stream flushes after each \n.

As was noted on Stack Overflow, stderr is unbuffered precisely because it's standard error - it wants to get the error to you as quickly as possible, even if that means slowing everything else down.

You can set any stream's buffering with setvbuf() - full buffering, line buffering, or none. Practically all files are opened fully buffered (hence the flushing advice above), terminal-attached streams like stdout are line buffered, and stderr isn't buffered at all.
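For instance, here's a small sketch that switches stdout to full buffering; setvbuf() has to be called before any other operation on the stream:

{% highlight c %}
#include <stdio.h>

static char buf[BUFSIZ];
static int rc;

int main(void) {
    /* _IOFBF = full (block) buffering, _IOLBF = line buffering,
     * _IONBF = no buffering. Returns 0 on success. */
    rc = setvbuf(stdout, buf, _IOFBF, sizeof buf);

    printf("held in the buffer until it fills, fflush(), or exit\n");
    fflush(stdout);  /* force the write out now */
    return 0;
}
{% endhighlight %}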

And that flushing takes CPU cycles. The program has to empty out the buffer by handing its contents to the kernel for writing - costly, because it's disk i/o - and then reset the buffer before execution can resume.

While SSDs speed up disk i/o by a noticeable margin, not everyone has one. So please don't printf unconditionally in your library - it makes an impact.

how should I handle logging then in a high performance application?

This is a difficult topic. One suggestion I've made in the past is to write on a dedicated thread - queue whatever you need to print, and let a single thread do the actual writing, so that it alone is the slow one while the others continue.
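A minimal pthread sketch of that dedicated-thread idea (all names here are mine, and a real implementation would need to handle queue overflow and errors):

{% highlight c %}
#include <pthread.h>
#include <stdio.h>

/* Fixed-size ring buffer of messages; worker threads enqueue,
 * one logger thread does all the (slow) writing. */
#define QCAP 64

static char queue[QCAP][128];
static int head, tail, done;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

static void log_send(const char *msg) {
    pthread_mutex_lock(&mu);
    snprintf(queue[tail % QCAP], sizeof queue[0], "%s", msg);
    tail++;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&mu);
}

static void *logger(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&mu);
        while (head == tail && !done)
            pthread_cond_wait(&nonempty, &mu);
        if (head == tail && done) {
            pthread_mutex_unlock(&mu);
            return NULL;
        }
        char msg[128];
        snprintf(msg, sizeof msg, "%s", queue[head % QCAP]);
        head++;
        pthread_mutex_unlock(&mu);
        /* Only this thread ever blocks on i/o. */
        fprintf(stderr, "[log] %s\n", msg);
    }
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, logger, NULL);

    log_send("request handled");
    log_send("another message");

    pthread_mutex_lock(&mu);
    done = 1;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&mu);
    pthread_join(t, NULL);
    return 0;
}
{% endhighlight %}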

But it really depends on what you're making. Sometimes forwarding output to a logging server might be better. Maybe block buffer stdout instead of line buffering it. Write your own functions. I don't know.

One interesting approach is to make a channel across threads and send log messages, tagged with your thread name and a timestamp, to another thread to be processed and written out. In a worker thread:

{% highlight go %}
mylog.Send(fmt.Sprintf("my logging message"))
{% endhighlight %}

which sends it through a channel to the logging thread, which then:

{% highlight go %}
for log := range logChan {
    fmt.Printf("%s %s: %s\n", log.Time, log.Threadname, log.Message)
}
{% endhighlight %}

Or something like that. Again, it really depends on what you want to do.