Unit 8: Composing Programs with |
Another key innovation of Unix is that led to its success is the invention of the |
operator.
Before Unix, operating systems tend to provide complex, monolithic, programs. The philosophy of Unix, however, is to provide many small, simple, utility programs, that can be composed to complete a more complex task. These small programs each do one thing only (and do it well) and so are easier to write and less prone to bugs.
The composition of these utility programs relies on two things. First, plain text is often used as input and output of the programs. These allow the programs to understand each other easily. Second, they use |
to communicate with each other. The |
operator takes the standard output from one program and redirects it as the standard input of another program.
For instance,
$ cat test.txt | wc
1 11 64
cat
and wc
together. Recall that cat
reads the content of the file and prints it to standard output. Here, we pipe the standard output from cat
to wc
. So now, these printed texts are redirected as the standard input to wc
.
But this is just the same as
$ wc < test.txt
|
?
Now, recall that we have made copies of test.txt
earlier, into foo.txt
and bar.txt
. If you have not done so or have removed them, you can quickly reproduce the files with:
$ cp test.txt foo.txt
$ cp test.txt bar.txt
Let's suppose now I want to count the total number of words for all three files. Instead of calling wc
on each file one by one, and sum them up myself. I can just run:
$ cat test.txt foo.txt bar.txt | wc
3 33 192
Here, cat
read the three files, concatenate their content, and pass the output to wc
for counting.
Useful Utilities
Before we see more interesting examples of using |
, let's move beyond cat
and wc
, and see what other simple tools are there in Unix.
head
and tail
head
and tail
prints out the first k lines and last k lines from a file (or standard input if the file name is not given). By default, k is 10, but you can pass in an argument to specify k.
$ cat test.txt foo.txt bar.txt
This is a test file for learning Unix file management commands.
This is a test file for learning Unix file management commands.
This is a test file for learning Unix file management commands.
$ cat test.txt foo.txt bar.txt | tail -1
This is a test file for learning Unix file management commands.
echo
echo
simply prints out the command-line argument to the standard output.
$ echo "hello world!"
hello world!
sort
sort
rearrange the input lines in alphabetical order.
$ sort
john
jane
peter
mary^D
jane
john
mary
peter
In the example above, I entered john
, jane
, peter
, mary
followed by Ctrl+D to signify the end of input. sort
prints out jane
, john
, mary
, peter
, in that order.
uniq
uniq
remove any two consecutive lines that are the same.
$ uniq
1
2
2
2
1
1^D
1
2
1
For instance, in the above, there are three consecutive lines of 2
, so only one remained. There are also two consecutive lines of 1
, so only one remained.`
grep
grep
returns the lines of text from the given file (or the standard input) that matches the given string. For instance, run
$ grep abc
and start typing in some lines of text, some containing abc
, some do not. grep
will spew out into the standard output all the lines that contain the text abc
somewhere. As usual,
hit Ctrl+D when you are done.
Pipe Example
To give you an example of how useful |
is, here is a real example. When processing the registration of the workshop, I have quite a few registrations that are duplicates -- students registered more than ones. I need a quick way to count how many unique registrants are there.
So I keep the student id of all registrants in a file called ID
. For instance, the file ID
contains (not real data, of course)
A1234567X,CS
A1234559A,CEG
A1239999J,CEG
A1234580K,CEG
A1233210O,CS
A1234567X,CS
A1234581Q,ISC
A1233216T,ISC
A1239999J,CEG
Now, to count how many unique registrants, I just need to run:
$ cat ID | sort | uniq | wc -l
7
To count how many uniq registrants are CEG
students, I just change it to:
$ cat ID | sort | uniq | grep CEG | wc -l
3