Homework 1: Shell scripting

Due: 2019-09-24 at 23:59

Preliminaries

First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team. Each team has a maximum of two people, so if you join the wrong one, you’ll prevent the correct person from joining.)

Once you have accepted the assignment and created/joined a team, you can clone the repository on clyde and begin working. But before you do, read the entire assignment and be sure to check out the expected coding style.

Be sure to ask any questions on Piazza.

Coding style

For all of the shell scripts you write, you must follow the Google Shell Style Guide. Pay close attention to the formatting rules. These are, in many ways, arbitrary. Nevertheless, you must follow them.

If you use NeoVim or Vim as your editor, you can include the line (called a mode line)

# vim: set sw=2 sts=2 ts=8 et:

at the bottom of each of your scripts to force Vim to indent by 2 spaces and to ensure that tabs will insert spaces. You can also set these options in your ~/.vimrc file, creating one if necessary. For example, on clyde, I have the following simple ~/.vimrc.

set background=dark
filetype plugin indent on
autocmd FileType sh setlocal shiftwidth=2 softtabstop=2 tabstop=8 expandtab

The first line tells Vim to use colors suitable for a terminal with a dark background. The second line tells Vim to use file-type aware indenting. The third line tells Vim to set those options for shell script files. See the Vim wiki for more details.

After you write the #!/bin/bash line at the top of your file (or a mode line at the bottom), you’ll probably want to reopen the file so that Vim knows it’s a bash file and turns on the appropriate syntax highlighting and indentation.

If you use emacs, you’re kind of on your own. Feel free to ask on Piazza, search StackOverflow, and read the Emacs Wiki.

Same with Nano. This might be useful.

Run time errors and return values

For each of the parts below that ask you to print out a usage message or an error message, this message should be printed to stderr. You can use code like this to print a usage message.

echo "Usage: $0 arguments" >&2

Any errors should cause the script to exit with a nonzero value (1 is a pretty good choice). Scripts that run successfully should exit with value 0. You can use exit "${value}" to exit with a particular value.
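Here is a minimal sketch of how the two rules above fit together. The helper names err and require_readable are hypothetical (they are not part of the assignment); the point is only that the message goes to stderr and the failure is reported with a nonzero status.

```shell
#!/bin/bash

# Print an error message, prefixed with the script name, to stderr.
# err is a hypothetical helper name.
err() {
  echo "$0: $*" >&2
}

# Hypothetical example check: fail unless the path is readable.
require_readable() {
  local path="$1"
  if [[ ! -r "${path}" ]]; then
    err "${path}: not readable"
    return 1
  fi
  return 0
}
```

A script could then write something like require_readable "${file}" || exit 1 so that nothing is printed on stdout and the exit value is 1 when the check fails.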

Script warnings and errors

Make sure your scripts pass shellcheck without errors or warnings. If you disable a particular warning, you must leave a comment giving a very good reason why you did so.

Executable scripts

All of your scripts should start with the line

#!/bin/bash

and must be executable. Running chmod +x on each of the files is sufficient.

Submission

To submit your homework, you must commit and push to GitHub before the deadline.

Your repository should contain the following files

README
diskhog
goodhygiene
testurl
linecount

It may also contain a .gitignore file, which tells Git to ignore files matching patterns in your working directory.

Any additional files you have added to your repository should be removed from the master branch. (You’re free to make other branches, if you desire, but make sure master contains the version of the code you want graded.)

The README should contain

The names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it).
Your answers to Part 5 and the commands you used to find them.
An estimate of the amount of time it took to complete each script.
Any known bugs or incomplete functions.
Any interesting design decisions you’d like to share.

Each of your scripts should contain a comment at the top of the script (below the #!/bin/bash line) that contains usage information plus a description of what the program does.

Example.

#!/bin/bash

# Usage: testurl file
#
# The file parameter should be a list of URLs, one per line. Testurl will...

Part 1. Disk hogger (10 points)

Create a shell script called diskhog that lists the five largest items (files or folders) in the current directory in decreasing order of size. You should output the sizes in a human readable format like so.

$ ./hw1/diskhog
214M	ow
58M	ow.tar.xz
48M	.komodoedit
27M	.cache
2.7M	Desktop

Make sure that your script handles files and directories with spaces in their names or with names that start with a period.

Check out the man pages for du(1), sort(1), and head(1) (or tail(1)). Read the portion of the bash(1) man page that describes the dotglob option to the shopt bash builtin command. You’ll want to use this in your script to make * match files/directories that start with a period.

If the script is run from an empty directory, make sure it prints nothing at all (and not an error message).
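To see what dotglob changes, here is a small sketch that just lists what * matches in a directory. The helper name list_entries is hypothetical, and pairing dotglob with nullglob (so that * expands to nothing in an empty directory) is my assumption about one reasonable way to handle the empty case, not the only way. This is not a diskhog solution; it only shows the globbing behavior.

```shell
#!/bin/bash

# List the names that * matches in a directory, one per line.
# list_entries is a hypothetical helper used only for illustration.
list_entries() {
  local dir="$1"
  (
    # The subshell keeps the cd and the shopt changes from leaking out.
    cd "${dir}" || exit 1
    # dotglob: * also matches names starting with a period.
    # nullglob: a pattern with no matches expands to nothing.
    shopt -s dotglob nullglob
    entries=(*)
    if [[ "${#entries[@]}" -gt 0 ]]; then
      printf '%s\n' "${entries[@]}"
    fi
  )
}
```

Note that even with dotglob set, * never matches the special entries . and .. so you don’t have to filter them out yourself.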

Part 2. Shell script hygiene (15 points)

Write a shell script called goodhygiene which takes zero or one parameters. The parameter, if given, should be a path to a directory. If no parameters are given, it should act on the current directory. If two or more parameters are given, output usage information to stderr and exit with return value 1. If the supplied parameter is not a directory, output an error message (on stderr) and exit with return value 1.
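The argument handling described above might be sketched like this. The function name pick_dir and the exact wording of the error message are hypothetical; only the behavior (default to the current directory, complain on stderr, return 1 on bad input) comes from the description.

```shell
#!/bin/bash

# Decide which directory to act on, given the script's arguments.
# pick_dir is a hypothetical helper: it prints the chosen directory on
# stdout, or prints a message on stderr and returns 1.
pick_dir() {
  if [[ "$#" -gt 1 ]]; then
    echo "Usage: $0 [dir]" >&2
    return 1
  fi
  # Fall back to the current directory when no argument (or an empty
  # argument) is given.
  local dir="${1:-.}"
  if [[ ! -d "${dir}" ]]; then
    echo "$0: ${dir}: not a directory" >&2
    return 1
  fi
  printf '%s\n' "${dir}"
}
```

A script could call it as dir="$(pick_dir "$@")" || exit 1 near the top.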

For each file in the directory, use file and grep to check if the file is a shell script. If it is, run shellcheck on the file. If shellcheck returns nonzero for any file, goodhygiene should exit with return value 1. Otherwise, exit with value 0.

Make sure you handle files with spaces in the name and those whose names start with a period. Make sure the script works correctly when run on an empty directory and one that contains no shell scripts (by printing nothing and returning 0).
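One way to get the “exit 1 if any file failed, but still check every file” behavior is to remember failures in a variable instead of exiting right away. The sketch below is deliberately generic: check_all is a hypothetical helper, and the checker it runs is whatever command you pass in (in goodhygiene that would be shellcheck, applied only to the files you identified as shell scripts).

```shell
#!/bin/bash

# Run a checker command on each remaining argument and remember whether
# any invocation failed. check_all is a hypothetical helper.
check_all() {
  local checker="$1"
  shift
  local status=0
  local f
  for f in "$@"; do
    # Keep going after a failure so every file still gets checked.
    if ! "${checker}" "${f}"; then
      status=1
    fi
  done
  return "${status}"
}
```

Because status starts at 0, calling it with no files at all (the empty-directory case) returns 0 without printing anything.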

Examples.

$ ./goodhygiene too many args
Usage: ./goodhygiene [dir]
$ echo $?
1
$ ./goodhygiene ~/empty
$ echo $?
0
$ ./goodhygiene

In ./usage-example line 8:
  echo Usage: $0 [dir] >&2
              ^-- SC2086: Double quote to prevent globbing and word splitting.

$ echo $?
1

Part 3. URL testing (15 points)

Write a shell script called testurl that accepts a list of URLs in a separate file and tests if each website is up or not. You might find it useful to check out the curl, wget, and tail commands. Also check out this FAQ for how to read a file line by line in Bash. It’s not obvious.
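For reference, a commonly used idiom for reading a file line by line looks roughly like the sketch below. The helper name show_lines is hypothetical, and it only echoes each line back in angle brackets; what you do with each line in testurl is up to you.

```shell
#!/bin/bash

# Print each line of a file wrapped in angle brackets.
# show_lines is a hypothetical helper used only for illustration.
show_lines() {
  local file="$1"
  local line
  # IFS= preserves leading and trailing whitespace on each line.
  # -r stops read from treating backslashes as escape characters.
  # The extra [[ -n ]] test picks up a final line that has no newline.
  while IFS= read -r line || [[ -n "${line}" ]]; do
    printf '<%s>\n' "${line}"
  done < "${file}"
}
```

The redirection on the done line feeds the file to the whole loop, so variables set inside the loop body are still visible after it finishes (unlike piping cat into while, which runs the loop in a subshell).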

If any URL in the file isn’t accessible, testurl should report that it isn’t found as per the example below and should (after testing all URLs), exit with return value 1.

If zero or more than one parameters are passed to testurl, print the usage to stderr and exit with value 1.

Examples.

$ ./testurl
Usage: ./testurl file
$ echo $?
1
$ cat urls
https://cs.oberlin.edu/~ncare/cs241/labs/lab8.html
https://occs.cs.oberlin.edu/~rhoyle/17s-cs241/assignments/hw02.html
https://no.such.url
https://occs.cs.oberlin.edu
$ ./testurl urls
Not found: https://no.such.url
$ echo $?
1
$ cat urls-working
https://example.com
$ ./testurl urls-working
$ echo $?
0

Part 4. Line count (40 points)

Create a shell script called linecount that takes zero or more paths as parameters and reports the total number of lines of all files in all of the paths. If no parameters are given, then linecount should output 0 and exit.

For each path that is a directory, the lines of all of the files in that directory or below it in the file system should be counted. (That is, if foo, foo/bar, foo/qux, and foo/bar/asdf are all directories and foo is passed as a parameter, then the output of linecount should include the lines of all files in those directories as well.)

Write a function that iterates over each item in a directory and adds the line count to a running total for each file and recurses into each directory. You may not use find.

Make sure your implementation correctly handles file names with spaces, files and directories that start with a period, and only considers normal files and directories (e.g., not symbolic links or device files).
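If you haven’t written a recursive shell function before, here is a sketch of the general shape. It counts regular files rather than lines, and it does no error reporting, so it is not a linecount solution. The names file_total and count_files are hypothetical, and using a global running total is just one possible design.

```shell
#!/bin/bash

# Make * match dotfiles, and expand to nothing in an empty directory.
shopt -s dotglob nullglob

# Running total of regular files seen so far. It is global on purpose
# so that every recursive call updates the same counter.
file_total=0

# Recursively count regular files under a directory, skipping symlinks.
# count_files is a hypothetical helper; it counts files, not lines.
count_files() {
  local dir="$1"
  local item
  for item in "${dir}"/*; do
    if [[ -L "${item}" ]]; then
      # Check for symlinks first: -d and -f follow links.
      continue
    elif [[ -d "${item}" ]]; then
      count_files "${item}"
    elif [[ -f "${item}" ]]; then
      file_total=$((file_total + 1))
    fi
  done
}
```

Declaring item with local matters here: without it, a recursive call would overwrite the caller’s loop variable.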

Report any unreadable files or unreadable/unsearchable directories on stderr (see examples below) and continue on. (The order of the error messages doesn’t matter and yours may not match the order in the examples.)

The shell parameter $0 expands to the path to the script (e.g., ./linecount) which is useful in error messages.

Example outputs (the numbers are made up and just for example purposes)

$ ./linecount
0
$ ./linecount .
97
$ ./linecount "$HOME" /etc
./linecount: /etc/cups/ssl: Permission denied
./linecount: /etc/chatscripts: Permission denied
./linecount: /etc/sudoers.d/README: Permission denied
./linecount: /etc/shadow: Permission denied
# A bunch more errors removed from the example
339123

Part 5. Data file analysis (20 points)

I often find myself using shell tools to answer questions about a data file that I’m working on. Here is a data file from a machine learning dataset that I’d like you to download and unzip: adult.data.zip. The fields in the data set are described here.

Answer the following questions in your README file (and give the commands used to find the answer):

(2 points) How many entries are marked “Male” and how many are marked “Female”?
(2 points) The last column is the label that is applied to the entry. How many of each label type are there?
(6 points) Give the counts for each label used for “race” in decreasing order.
(10 points) Give the counts for a combined “race”/“sex” attribute in decreasing order.

Potentially useful commands to look at include cut, sort, and uniq. If you include the commands you used to generate your answers, it might be possible to give you partial credit. Make sure you don’t check adult.data.zip or adult.data into your repository. You might consider adding appropriate lines to a .gitignore (described above) to cause Git to ignore them.
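As a rough illustration of how these commands combine, the sketch below counts how often each value appears in one column of a comma-separated file. The helper name count_column is hypothetical, and it assumes fields are separated by a single comma; check how adult.data actually separates its fields before relying on that.

```shell
#!/bin/bash

# Count how often each value occurs in the given column(s) of a
# comma-separated file, most frequent first.
# count_column is a hypothetical helper; it assumes plain "," separators.
count_column() {
  local file="$1"
  local column="$2"
  # uniq -c only merges adjacent duplicates, so sort first; then sort
  # again numerically, in reverse, on the counts that uniq -c prepends.
  cut -d ',' -f "${column}" "${file}" | sort | uniq -c | sort -rn
}
```

The column argument is passed straight to cut -f, so a list such as 1,2 selects more than one field at once.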