Homework 5: Regular Expressions

Due: TBD at 23:59

Preliminaries

First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team: there’s a maximum of two people per team, and if you join the wrong one, you’ll prevent the correct person from joining.)

Once you have accepted the assignment and created/joined a team, you can clone the repository on clyde and begin working. But before you do, read the entire assignment.

Be sure to ask any questions on Piazza.

Submission

To submit your homework, you must commit and push to GitHub before the deadline.

The README.md should contain

Part 1

You’re going to write some regular expressions for use with egrep(1). Each regex will appear in its own two-line file named 1.1 through 1.7.

The first line of the file will be a shebang line telling the OS to run the file as an argument to egrep(1). (This is provided for you and you shouldn’t change it.) The second line of each file is your regular expression. Replace XXX with the regular expression you want.
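To see how a two-line file like this can work, here is a minimal sketch. The file name 0.0, the shebang path /usr/bin/egrep, the -f flag, and the sample word list are all assumptions made for illustration; the files in your repository are the authority. With a shebang of this shape, running the file on a word list makes the OS pass the file itself to egrep as a pattern file. The sketch spells that invocation out with grep -E, which is equivalent to egrep.

```shell
# Hypothetical sample file; the real files are provided in the repository.
tmpdir=$(mktemp -d)
cat > "$tmpdir/0.0" <<'EOF'
#!/usr/bin/egrep -f
^q[^u]
EOF

# A tiny stand-in for the words file.
printf 'quit\nqat\nqueen\nqintar\n' > "$tmpdir/sample_words"

# Running "./0.0 sample_words" would have the OS execute roughly
#   /usr/bin/egrep -f ./0.0 sample_words
# Every line of the file is read as a pattern, including the shebang
# line, which does not match any of the sample words here.
matches=$(grep -E -f "$tmpdir/0.0" "$tmpdir/sample_words")
printf '%s\n' "$matches"

rm -r "$tmpdir"
```

The example regex (words starting with q not followed by u) is unrelated to the problems below; it only shows where your regular expression goes.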

Write regular expressions that print out the following when run on words.

  1. All words that contain exactly one of a, e, i, o, or u (without duplicates).
  2. All words that contain the lowercase vowels a, e, i, o, and u in that order. (The words may contain repeated vowels as long as aeiou is a subsequence.)
  3. All words that are exactly 22 lowercase letters long.
  4. All words that are made up of only pairs of consonant-vowels like Banana and are at least 6 letters long. You may consider vowels to be just a, e, i, o, and u (and their uppercase versions).
  5. All words that have a 4-letter sequence repeated (without overlapping).
  6. All words that start and end with the same 3-letter sequence (possibly overlapping).
  7. All words that end with their first 3 letters reversed like “detected” (possibly overlapping).

[Hint: For 5–7, you will want to use one or more back references.]
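As a reminder of what a back reference does, here is a small sketch on a pattern that is deliberately not one of the problems above. It assumes a grep that accepts back references in extended regexes (GNU grep does; this is an extension to what POSIX specifies for extended regexes). The sample words are made up for illustration.

```shell
# \1 matches exactly the text that the first parenthesized group matched.
# Here: any single character immediately followed by the same character.
doubled=$(printf 'book\ncat\nletter\ndog\n' | grep -E '(.)\1')
printf '%s\n' "$doubled"
```

Note that \1 repeats the matched text, not the pattern: (.)\1 matches “oo” or “tt” but not “ok”.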

Part 2

You’re going to write some sed(1) scripts to perform the following actions. As with Part 1, these will take the form of two-line text files named 2.1 through 2.9.

The first line is again a shebang line, this time telling it to run sed(1). Where noted below, it also contains the -n flag. The second line is the sed command to run. Initially, it’s the q command which just quits. You should replace this with your own command. All parts can be solved with a single sed command.
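The difference the -n flag makes can be sketched as follows. The input lines and the /^b/ address are made up for illustration and are not part of any problem. Without -n, sed prints each line after running the command on it, so the initial q command prints the first line and then quits. With -n, nothing is printed unless a command such as p asks for it.

```shell
input=$(printf 'alpha\nbeta\ngamma\n')

# Default behavior: the line is printed automatically, then q quits,
# so only the first line appears.
default_out=$(printf '%s\n' "$input" | sed 'q')

# With -n, automatic printing is off; only lines matching /^b/ are
# printed, because of the explicit p command.
quiet_out=$(printf '%s\n' "$input" | sed -n '/^b/p')

printf 'default: %s\nquiet: %s\n' "$default_out" "$quiet_out"
```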

  1. Replace all instances of “snow fall” or “wind chill” in each line with “summertime”. You’ll want to use \< and \> to match the boundaries of the words.
  2. Assuming the input is a dictionary file like words (one word per line, in alphabetical order), print out all words between computer and science. (For this one, the sed flag -n has been added.)
  3. Replace all instances of Teh with The and teh with the, but only in standalone words.
  4. Move the last word on a line to the front.
  5. Find lines where a word has been repeated on the same line and replace that line with the repeated word. Don’t print the other lines. (For this one, the sed flag -n has been added.)
  6. Convert C block comments that are on one line and at the end of the line into line comments. You should preserve everything between the /* and */, including spaces. So “/*  add things up  */” would become “//  add things up  ” (there are two spaces on either side of “add things up”, so the line comment ends with two trailing spaces).
  7. Only print out lines that contain cs 241, but change that to CSCI 241. Make sure cs is the start of a word and 241 is the end of a word. (For this one, the sed flag -n has been added.)
  8. Take the previous command, but modify it to handle CS or CSCI with any type of capitalization, followed by any number of spaces (including 0) and any 3-digit number (e.g., cScI151 becomes CSCI 151).
  9. Truncate all lines after exactly 20 characters.
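The word-boundary anchors mentioned in problem 1 can be sketched on an unrelated substitution. This assumes GNU sed, where \< matches at the start of a word and \> at the end; other sed implementations may not support them. The sample text and the cat/dog replacement are made up for illustration.

```shell
# Only the standalone word "cat" is replaced; "concat" and "cats"
# are left alone because the boundaries do not line up.
bounded=$(echo 'cat concat cats cat' | sed 's/\<cat\>/dog/g')

# Without the anchors, every occurrence of the letters "cat" changes.
unbounded=$(echo 'cat concat cats cat' | sed 's/cat/dog/g')

printf '%s\n%s\n' "$bounded" "$unbounded"
```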