29 Dec 2011

Terminal tip: Pipe Viewer

A couple of weeks ago, I held an elementary Linux/Unix course. One of the toughest concepts in that course is pipes and redirection.

I usually begin by explaining a pipe as "the output of one command becomes the input to the next", and show it with an example:

 $ zcat pureftpd.log.gz | cut -f1 -d' ' | sort | uniq | wc -l
 1259073

This command reads a compressed pureftpd log file of about 550MB (from ftp.uio.no) and counts the unique visitors, taking the first space-separated field of each line as the visitor. Several commands are linked together by pipes, so the output of one command becomes the input to the next.
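
To try the same idea without a large log file, here is a minimal sketch that runs the pipeline on three made-up log lines. The addresses and paths are invented for illustration and are not taken from the real log:

```shell
# Hypothetical sample data: three log lines from two distinct addresses.
count=$(printf '%s\n' \
  '10.0.0.1 - GET /a' \
  '10.0.0.2 - GET /b' \
  '10.0.0.1 - GET /c' |
  cut -f1 -d' ' |   # keep only the first space-separated field
  sort |            # bring identical addresses next to each other
  uniq |            # collapse adjacent duplicates
  wc -l)            # count the lines that remain
echo "$count"
```

Note that uniq only removes adjacent duplicates, which is why sort has to come before it.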

However, I received an interesting question: "Which command takes the longest time?"

There is no easy way to tell; we can only make an educated guess. However, a handy little Unix utility called "Pipe Viewer" (pv) lets us monitor and measure the data going through a pipe. Install it with apt:

  $ sudo apt-get install pv

Next, we rebuild the command above with pv inserted. Since pv behaves like cat with respect to input/output, we can place it after each command to measure the throughput at that point:

  $ zcat pureftpd.log.gz | pv -cN zcat | cut -f1 -d' ' | \
  > pv -cN cut | sort | pv -cN sort | uniq | pv -cN uniq | \
  > wc -l


As we see from the pv output, the command with the lowest throughput was "uniq". Both cut and sort had an impressive 6-7MB/s throughput.
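
If you want to experiment without a big log file, the sketch below runs the same kind of measurement on generated numbers. The `tap` helper is my own hypothetical wrapper, not part of pv: it uses pv when it is installed and falls back to plain cat otherwise, which works precisely because pv passes data through unchanged.

```shell
# tap NAME: pass stdin through unchanged.
# Shows a named pv progress line on stderr if pv is installed (assumption:
# pv is on the PATH); otherwise it behaves like plain cat.
tap() {
  if command -v pv >/dev/null 2>&1; then
    pv -cN "$1"
  else
    cat
  fi
}

# Generated input instead of a real log: the numbers 1 to 100000.
lines=$(seq 1 100000 | tap seq | sort -n | tap sort | uniq | tap uniq | wc -l)
echo "$lines"
```

The progress lines go to stderr, so they do not end up in the data that wc counts.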