Pipe

Conference Topic:

"The unnamed pipe adds a level of creativity to the commands issued by UNIX. It is possible to have 5, 15, or 30 commands put together by an unnamed pipe to produce a final output. Is this power really advantageous?"

Pipes are indeed used very often in UNIX scripts, and it is important to understand how they work and be able to write compound commands with multiple pipes between component (simple) commands. Please comment on how you see the role of pipes in UNIX, and give examples in which they prove to be advantageous (or not exactly so!)

Response:

Theresa L. Ford on 04-19-2004

Limits of the Pipe

A pipe merely takes the standard output of one command and feeds it to the standard input of the next. By itself, it's useless. Its power comes from the commands it connects.

# command | command | command

which does (in a flowchart style):

command -> command -> command
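As a concrete instance of the pattern above, here is a minimal sketch with three ordinary commands. The fruit names and the variable name unique_sorted are made up for illustration.

```shell
# Stage 1 emits four lines, stage 2 sorts them,
# stage 3 drops the adjacent duplicate lines that sorting produced.
unique_sorted=$(printf 'pear\napple\npear\nfig\n' | sort | uniq)

# Show the final output of the pipeline, one item per line.
printf '%s\n' "$unique_sorted"
```

No intermediate file is created at any point. Each command reads what the previous one wrote.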

In the shell, a pipe by itself is not capable of sending one command's output to two or more commands that each then continue on independently (branching, which I'll loosely call multithreading):

command --> command -> command
       |\
       | -> command -> command
        \
         -> command -> command

Isn't that what the tee command is for? Splitting up the input into two outputs? Not precisely. The tee command merely copies its input to a file and also passes it on to the next command. The file is a dead end, although later commands in the sequence could read it.

tee --> command -> command
    \
     -> file

It does not work like this:

tee --> command -> command
    \
     -> command -> command

For example, part of our homework:

# sed [command] [file] > t1
# sed [command] t1 > t2

could be accomplished in a single step:

# sed [command] [file] | sed [command] > t2

which skips creating t1 entirely.

Of course, that makes debugging potentially more difficult! Looking at t2, which sed command caused the file to be wrong? A judicious and temporary use of the tee command could help.

# sed [command] [file] | tee t1 | sed [command] > t2

In flowchart style this does:

sed [command] [file] -> sed [command] -> t2
                     \
                      -> t1

It is important to note here that the second sed command does not read t1. t1 still contains the output of the first sed, just as in the original two-step approach, but the output of the first sed goes to both t1 and the second sed command at the same time. After the piped commands are debugged, it would be fairly simple to strip out the tee command.
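To make the comparison concrete, here is a small sketch that runs both the two-step version and the piped version with tee. The input text, the two sed expressions, and the file names other than t1 and t2 are invented stand-ins for the homework's, and mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
printf 'one cat\ntwo cats\n' > "$workdir/input.txt"

# Two-step version: the first sed writes a temporary file, the second reads it.
sed 's/cat/dog/' "$workdir/input.txt" > "$workdir/t1_twostep"
sed 's/two/2/' "$workdir/t1_twostep" > "$workdir/t2_twostep"

# Piped version: tee saves a copy of the intermediate stream in t1
# while passing the same stream on to the second sed.
sed 's/cat/dog/' "$workdir/input.txt" | tee "$workdir/t1" | sed 's/two/2/' > "$workdir/t2"

# Both approaches should leave identical intermediate and final files.
cmp -s "$workdir/t1" "$workdir/t1_twostep" &&
  cmp -s "$workdir/t2" "$workdir/t2_twostep" &&
  echo "identical"
```

If t2 looks wrong, inspecting t1 shows whether the first sed or the second is to blame.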

Using a pipe instead of a temporary file is especially useful when there is not enough disk space to create the temporary file. The lack of multithreading really isn't a problem, because branching can be handled by programs and scripts that use pipes.
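One possible way a script can get the branching effect is to combine tee with a named pipe created by mkfifo. This is only a sketch under my own assumptions, not necessarily what was meant above: the sample data, the file names, and the choice of wc and sort as the two branches are all hypothetical.

```shell
# Scratch directory holding the named pipe and the two result files.
workdir=$(mktemp -d)
mkfifo "$workdir/branch"

# Branch reader: counts the lines arriving on the named pipe.
# It runs in the background and waits until a writer opens the pipe.
wc -l < "$workdir/branch" > "$workdir/count.txt" &

# Main pipeline: tee copies the stream into the named pipe
# and also passes it on to sort.
printf 'b\na\nc\n' | tee "$workdir/branch" | sort > "$workdir/sorted.txt"

# Let the background branch finish before using its result.
wait

cat "$workdir/count.txt" "$workdir/sorted.txt"
```

Here the named pipe takes the place of tee's file, so the second branch runs alongside the first without the data ever being stored on disk.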

A pipe "does one thing and it does it really well" (the goal of most Unix commands). It sends one command's STDOUT to the next command's STDIN. Consequently, a pipe is a great thing. It connects all the other "one thing really well" programs so that complex sequences of steps are easily performed without the computer mucking around with temporary files and extra processing steps.
