Unix command line 101: How much do you know?
Arguments and options on the half-shell
Summary
Arguments and options are those mysterious little nuggets preceded by minus signs, file names, and other Unix arcana that appear on a command line following the command to be executed. This month Mo unveils the power of these commands, but not before he breaks it down with an examination of the Unix command line itself. (2,600 words)
A Unix command line is a sequence of characters in the syntax of the
target shell language. Of the characters found there, some are known as metacharacters, which have a special meaning to the shell. The
metacharacters in the Korn shell are:
; Separates multiple commands on a command line
& Causes the preceding command to execute asynchronously (i.e., in the background, at the same time as the next command on the command line)
() Launches commands enclosed in parentheses in a separate shell
| Pipes the output of the command to the left of the pipe to the input of the command on the right of the pipe
> Redirects output to a file or device
< Redirects input from a file or device
Newline Ends a command or set of commands
Space Separator between command words
Tab Separator between command words
(Note: Some of these metacharacters can be used in
combinations, such as || and && . Consult your manual for a complete
description.)
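To see several of these metacharacters at work, here is a minimal sketch. The sample text and the variable names (piped, color, scratch, word) are made up for illustration, and the script assumes a POSIX-style shell with mktemp available.

```shell
#!/bin/sh
# ; separates commands: both run, one after the other.
echo one; echo two

# | pipes the output of the left command into the right command.
piped=$(printf 'b\na\n' | sort | tr '\n' ' ')
echo "sorted: $piped"

# ( ) runs commands in a separate shell, so the assignment does not leak out.
color=red
(color=blue)
echo "color is still $color"

# > sends output to a file, < reads input from a file.
scratch=$(mktemp)
echo hello > "$scratch"
read -r word < "$scratch"
rm -f "$scratch"
echo "read back: $word"

# & runs a command in the background; wait pauses until it has finished.
echo "in the background" &
wait

# && runs the next command only on success, || only on failure.
true && echo "ran after success"
false || echo "ran after failure"
```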
With these metacharacters in mind, you can define a command line
word -- a sequence of characters separated by one or more
nonquoted metacharacters. In the following example, the passwd file
is piped through the cut program, and fields 1 and 3 are output based on a colon
delimiter.
In this command line,
cat /etc/passwd|cut -d ":" -f 1,3 >usruid.txt
the words are:
cat
/etc/passwd
cut
-d
":"
-f
1,3
usruid.txt
Note that the metacharacters | , > , and the space have been removed, and
that the metacharacters & , | , () , ; , and the newline are used to separate or
terminate multiple commands within a command line.
In our example command line, there are two commands separated by
the pipe (| ) symbol:
cat /etc/passwd
and
cut -d ":" -f 1,3
The final portion of the command line -- >usruid.txt -- could be
thought of as the command "and output the result to usruid.txt ,"
although redirection is not usually considered part of a command.
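One way to watch the shell split a command line into words is to hand the words to a small function that prints each one. The sketch below assumes a POSIX-style shell; show_words is a hypothetical helper, and the two passwd-style lines are made-up sample data standing in for /etc/passwd. Note that the shell removes the quotes around the colon before the command ever sees it.

```shell
#!/bin/sh
# show_words is a hypothetical helper that prints each word it is given.
show_words() {
    for word in "$@"; do
        printf '[%s]\n' "$word"
    done
}

# The same words that follow cut in the example command line.
show_words -d ":" -f 1,3

# Two made-up passwd-style lines stand in for /etc/passwd.
result=$(printf 'root:x:0:0\nmo:x:501:20\n' | cut -d ":" -f 1,3)
echo "$result"
```

The function prints [-d], [:], [-f], and [1,3] on separate lines, and the pipeline prints root:0 and mo:501.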
When a command executes a Unix program, utility, or shell script,
it's usual for the command to include arguments. In the example
above, the argument to cat is /etc/passwd. The arguments to cut are
-d, ":", -f, and 1,3.
In general, arguments are all the words (note the definition of word
above) that follow an executable program name in a command.
Arguments within a command are separated from one another by spaces
or tabs (metacharacters). Most Unix programs were written with
standards for the arrangement of arguments and options.
Options are the letters or numbers that follow a minus sign. A
simple example of arguments and options would be the use of the cat
command:
cat -v -e -t doodah.txt
In the above example, the arguments are -v , -e , -t , and doodah.txt ,
while the options are the entries for -v , -e , and -t . The -v option
asks cat to display all characters, even nonprintable ones; the -e
option specifies that the end of a line will be displayed as $ ; and the
-t option specifies that a tab should be displayed as ^I instead of
expanding the tab into spaces on the screen.
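The effect of these three options is easy to check without a real doodah.txt. This sketch pipes one made-up line containing a tab through cat; the variable name shown is only for illustration.

```shell
#!/bin/sh
# One line containing a tab character, piped through cat.
shown=$(printf 'name\tvalue\n' | cat -v -e -t)
echo "$shown"    # prints name^Ivalue$
```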
Unfortunately, no standard terminology has been developed to
differentiate an option from a nonoption argument, which is all the
more confusing when one considers that an option can itself have an
argument. In the first example using cut , the -d option has an
option argument of ":" , and the -f option has an option argument of
1,3. To clarify these, various manuals have adopted
naming conventions for the parts of a Unix command.
The following examples illustrate the parts:
cat -v -e -t doodah.txt
cat /etc/passwd|cut -d ":" -f 1,3 >usruid.txt
The program name itself, cat in the first example and cat and
cut in the second, is variously called the name, progname, executable, or
program-name. The nonoption arguments to a command, doodah.txt in
the first example and /etc/passwd in the second, are called
operands or cmdargs.
The options -v, -e, and -t in the first example, and -d and -f in
the second, are called options, opts, or switches. The arguments to
options, ":" and 1,3 in the second example, are called
option-arguments, or optargs.
The standards used in creating Unix executables are:
1. Command names must be between two and nine characters long
2. Command names must include only lowercase letters and digits
3. Option names (options above) must be only one character long
4. All options must be preceded by -
5. Options with no arguments may be grouped after a single - (e.g., -v -e -t could also be written as -vet)
6. The first option-argument following an option must be preceded by white space
7. Option-arguments cannot be optional
8. Groups of option-arguments following an option must be separated either by commas, or by white space and quoted (e.g., -f 1,3 or -o "xxx z yy")
9. All options must precede operands on the command line
10. -- may be used to indicate the end of the options
11. The order of the options relative to one another should not matter
12. The relative order of the operands (cmdargs) may affect their significance in ways determined by the command with which they appear
13. - preceded and followed by white space should only be used to mean standard input
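Three of these rules (the fifth, tenth, and thirteenth in the list) can be tried out with cat. This is only a sketch: the sample text, the file named -v, and the variable names are made up, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Rule 5: options without arguments may be grouped after one hyphen.
separate=$(printf 'a\tb\n' | cat -v -e -t)
grouped=$(printf 'a\tb\n' | cat -vet)
[ "$separate" = "$grouped" ] && echo "grouped and separate options agree"

# Rule 10: -- ends the options, so an operand may begin with a hyphen.
workdir=$(mktemp -d)
echo "odd name" > "$workdir/-v"
dashfile=$(cd "$workdir" && cat -- -v)
rm -rf "$workdir"
echo "$dashfile"

# Rule 13: a lone - stands for standard input.
stdin_text=$(echo "from stdin" | cat -)
echo "$stdin_text"
```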
Not all Unix commands follow these rules, although newer
ones generally do. Older executables were written before the standard was
established, but executables dating from these times are in such regular use that it
was decided not to change them. For example, cut will function with
or without rule number six, which requires a space before the
option-argument. Both of the following commands will work on most
systems.
cat /etc/passwd|cut -d ":" -f 1,3 >usruid.txt
cat /etc/passwd|cut -d: -f1,3 >usruid.txt
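A quick comparison shows that both spellings produce the same result. The single passwd-style line here is made-up sample data, and the variable names are only for illustration.

```shell
#!/bin/sh
# With white space before each option-argument, as the rule asks.
with_space=$(printf 'root:x:0:0\n' | cut -d ":" -f 1,3)
# Without white space, which cut also accepts.
no_space=$(printf 'root:x:0:0\n' | cut -d: -f1,3)
echo "$with_space"
echo "$no_space"
```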
The find command is another example of an antiquated program still
in use today. It uses options longer than a single character, which
violates rule number three, and allows options to appear after the
operand, thus violating rule number nine. In the following example,
dot (. ) is the operand, -name and -print are options, and data.txt
is the option-argument for -name .
find . -name data.txt -print
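To try the find command safely, the following sketch builds a small throwaway directory tree first. The directory layout and the variable names are assumptions made for the example, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Build a scratch tree with one data.txt hidden in a subdirectory.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
touch "$workdir/sub/data.txt" "$workdir/other.txt"

# The operand (.) comes first, and the options follow it.
found=$(cd "$workdir" && find . -name data.txt -print)
echo "$found"    # prints ./sub/data.txt

rm -rf "$workdir"
```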
The getopts function
You're probably wondering what all this blather is leading up to.
Well, Unix provides a handy tool for separating option arguments and
operands, and it's known as the getopts function. This function is called
by following getopts with a string (which contains the list of valid
option characters) and a shell variable (which receives the result
of searching the arguments). The function can be called several
times, and each time it steps forward through the list of arguments
and picks up the next option. It can also pick up an
option-argument, and the index of the argument that it has
processed.
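Before looking at a full script, a bare-bones sketch of the calling pattern may help. The function name parse_demo and the option letters x and y are made up for illustration. OPTIND is reset to 1 so that the function starts from the first argument every time it is called.

```shell
#!/bin/sh
# parse_demo is a hypothetical function; -x takes no argument, -y takes one.
parse_demo() {
    OPTIND=1                      # restart parsing on every call
    while getopts "xy:" opt; do
        case $opt in
            x) echo "flag x" ;;
            y) echo "option y with argument $OPTARG" ;;
            *) echo "unknown option" ;;
        esac
    done
}

parse_demo -x -y hello
```

Each pass through the loop picks up one option. The call above prints "flag x" and then "option y with argument hello".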
To illustrate this, imagine a shell script that will archive a file
by copying it to an archive directory. The default directory is
/u/arch , but the path of the archive directory can be changed on the
command line. The archive program will also stop and ask you if it
is about to overwrite an earlier archive, but an option can be set
to overwrite without warning. A sample command line for this archive
program would be:
arch [-r] [-a /new/archive/path] filename
In this example, the -r option will automatically replace an
existing archive file without warning, though the default is to warn. The
-a option is followed by an alternative archive directory to use
instead of /u/arch. Finally, filename is the name of the file to
archive.
The following is a listing for arch that covers the processing of
the option arguments. It does not include the logic for doing the
actual archive operation. Below the complete listing is a
step-by-step analysis of how the program works.
#! /bin/sh
#:--------------------------------------------------------------
usage () {
    echo "Usage:"
    echo "arch - archives a file to /u/arch directory"
    echo "syntax:"
    echo " arch [-r] [-a /new/archive/path] filename"
    echo "where"
    echo " -r will automatically replace an existing archive file"
    echo " (default is to warn)"
    echo " -a specifies an alternative archive directory"
    echo " filename is the name of the file to archive"
    exit
}
#:--------------------------------------------------------------

replace="w"
arch="/u/arch"
filename=""

optstr=":ra:"

while getopts $optstr opt
do

    case $opt in
        r) replace="r";;
        a) arch=$OPTARG;;
        *) usage;;
    esac
done

shift `expr $OPTIND - 1`

filename=$1

echo "Archiving" $filename " to " $arch "with" $replace "replace option"

# rest of the code goes here
The getopts function does not always work correctly with the Korn
shell, so line 1 forces the script to run in the Bourne shell. Lines
2 through 15 open the program with a usage function that doubles as
a comment describing what the script does; the function is called when
the user makes a mistake.
1 #! /bin/sh
2 #:--------------------------------------------------------------
3 usage () {
4 echo "Usage:"
5 echo "arch - archives a file to /u/arch directory"
6 echo "syntax:"
7 echo " arch [-r] [-a /new/archive/path] filename"
8 echo "where"
9 echo " -r will automatically replace an existing archive file"
10 echo " (default is to warn)"
11 echo " -a specifies an alternative archive directory"
12 echo " filename is the name of the file to archive"
13 exit
14 }
15 #:--------------------------------------------------------------
Lines 17 and 18 set up shell variables holding the default values:
$replace is set to w, which makes a warning before replacement the default behavior, and $arch holds the default archive directory. Line 19 sets up a variable for the file to be
archived:
17 replace="w"
18 arch="/u/arch"
19 filename=""
This program has two possible options: -r and -a. The -a
option requires an option-argument that names the directory to use.
An options string should contain the list of single-character
identifiers to be used for options, in this case ra. In addition, if an
option is to be followed by an option-argument, a colon should
immediately follow its letter, giving ra:. Finally, getopts will produce an error
message if an invalid option is placed on the command line. To
suppress the error message, start the option string with a
colon, giving ":ra:". That string is set up at line 21 of the script.
21 optstr=":ra:"
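With the leading colon in place, getopts reports problems through the variables instead of printing a message. The sketch below, assuming a POSIX-style shell, shows what lands in the variables for a valid option, an invalid option, and a missing option-argument. The function name first_option is hypothetical; it looks only at the first option it is given.

```shell
#!/bin/sh
# first_option is a hypothetical helper that reports one getopts result.
first_option() {
    OPTIND=1
    OPTARG=
    if getopts ":ra:" opt; then
        echo "opt=$opt OPTARG=$OPTARG"
    fi
}

first_option -a /tmp/arch   # prints opt=a OPTARG=/tmp/arch
first_option -z             # invalid option: prints opt=? OPTARG=z
first_option -a             # missing argument: prints opt=: OPTARG=a
```

A case statement with a catch-all branch handles both the ? and the : results, so either mistake can lead to the same usage message.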
Whenever getopts is called, it locates the next available option,
retrieves the character, and places it in the passed variable name.
At line 23 this variable, $opt , is passed as the second argument to
getopts after $optstr . The getopts function returns true as long as
it continues to find arguments that start with a leading - . When it
finds -r on the command line, it places r in $opt . When it finds
-a , it places a in $opt . Whenever getopts finds an option
expecting an option-argument, it retrieves the argument and places
it in a variable named $OPTARG . The loop at lines 23 through 31
processes all options and option-arguments by repetitively calling
getopts . Inside a case statement the various results are
processed.
If -r was encountered, it will appear in $opt and $replace will be
set to r . If -a was encountered, a will appear in $opt , and the
value in $OPTARG will be used to set the value of $arch . If anything
else is encountered, the user has entered an invalid option. This
calls the usage function, which displays a usage message and exits
the program.
23 while getopts $optstr opt
24 do
25
26 case $opt in
27 r) replace="r";;
28 a) arch=$OPTARG;;
29 *) usage;;
30 esac
31 done
The getopts function also retains one other variable, $OPTIND , which
contains the index of the next argument to be processed. When the
shell script is first started, $OPTIND is set to 1. If -r is
processed as the first argument, $OPTIND will contain 2. If -a is
processed as the second argument (and the name of an archive
directory as the third argument), $OPTIND will contain 4. On the
next call to getopts , getopts returns false, and the loop at lines
23 through 31 ends.
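The way $OPTIND advances can be traced with a few echo statements. This is a sketch under the same assumptions as the walkthrough (each option given as its own argument); trace_optind is a hypothetical function name and report.txt is a made-up file name.

```shell
#!/bin/sh
# trace_optind is a hypothetical helper that prints OPTIND as parsing proceeds.
trace_optind() {
    OPTIND=1
    echo "start: OPTIND=$OPTIND"
    while getopts ":ra:" opt; do
        echo "after -$opt: OPTIND=$OPTIND"
    done
    echo "end: OPTIND=$OPTIND"
}

trace_optind -r -a /new/archive/path report.txt
```

The trace shows 1 at the start, 2 after -r, 4 after -a and its option-argument, and still 4 once getopts has returned false.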
At this point, $OPTIND still contains the value 4, which can be used
as the index of the next argument -- the first argument not
beginning with a hyphen (- ). This should be the name of the file to
archive. At line 33, the shift command is used to shift all arguments
by $OPTIND - 1 ; this causes the argument that was at position 4 ($4 ) to
be shifted left three positions, making it argument $1 . At line 35
this value is picked up and stored in $filename .
33 shift `expr $OPTIND - 1`
34
35 filename=$1
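The shift step can be tried on its own in a small function. This sketch uses the $(( )) arithmetic of POSIX shells in place of expr, which is an alternative to the listing and may not be available in very old Bourne shells. The function name operands_after_options is hypothetical.

```shell
#!/bin/sh
# operands_after_options is a hypothetical helper: it skips the options
# and reports what is left on the command line.
operands_after_options() {
    OPTIND=1
    while getopts ":ra:" opt; do
        :    # options are ignored here; only OPTIND matters
    done
    shift $((OPTIND - 1))
    echo "$# operand(s) left, first is: $1"
}

operands_after_options -r -a /new/archive/path report.txt
```

With three arguments consumed by the options, the call prints "1 operand(s) left, first is: report.txt".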
At this point, a good script would perform further error checking,
such as making sure the file named in $filename and the archiving
directory in $arch both exist. In this example, the script simply
displays the extracted values:
37 echo "Archiving" $filename " to " $arch "with" $replace "replace option"
38
39 # rest of the code goes here
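The further error checking mentioned above might look something like the following sketch. The function check_inputs and its messages are hypothetical and not part of the original listing; it takes the file name and the archive directory as its two arguments.

```shell
#!/bin/sh
# check_inputs is a hypothetical helper, not part of the original listing.
# $1 is the file to archive, $2 is the archive directory.
check_inputs() {
    if [ -z "$1" ]; then
        echo "no file name given" >&2
        return 1
    fi
    if [ ! -f "$1" ]; then
        echo "$1: no such file" >&2
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo "$2: no such directory" >&2
        return 1
    fi
    return 0
}
```

In the arch script, a call such as check_inputs "$filename" "$arch" || usage could follow the line that sets $filename.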
Using getopts is an excellent way to create scripts that comply with
the Unix command standard. It also makes it fairly easy to add
additional features to your scripts. For example, let's assume you
want to enhance your arch script to put a date and time stamp on an
archive. Simply extend the $optstr variable to allow for a -d
option, add a variable, and extend the case statement. Voilà! You've
just added a -d option to the arch command. Of course, you still have to
add the code that acts when $datestamp is set to Y, but the user interface is easily taken care of.
replace="w"
arch="/u/arch"
filename=""
datestamp="N"
optstr=":ra:d"
while getopts $optstr opt
do
case $opt in
r) replace="r";;
a) arch=$OPTARG;;
d) datestamp="Y";;
*) usage;;
esac
done
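As for the code that acts on $datestamp, one possibility is to build the archive file name in a small function. This is only a sketch: archive_name is a hypothetical helper, and the name format (the base name followed by a dot and a year-to-second timestamp) is an assumption, not something the arch script prescribes.

```shell
#!/bin/sh
# archive_name is a hypothetical helper; the name format is an assumption.
# $1 is the file to archive, $2 is the datestamp flag (Y or N).
archive_name() {
    base=$(basename "$1")
    if [ "$2" = "Y" ]; then
        echo "$base.$(date +%Y%m%d%H%M%S)"
    else
        echo "$base"
    fi
}

archive_name /home/mo/report.txt N    # prints report.txt
archive_name /home/mo/report.txt Y    # prints report.txt plus a timestamp
```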