Homework 5: Regular Expressions
Due: 2020-04-26 at 23:59
Preliminaries
First, find a partner. You’re allowed to work by yourself, but I highly recommend working with a partner. Click on the assignment link. One partner should create a new team. The second partner should click the link and choose the appropriate team. (Please don’t choose the wrong team: each team has a maximum of two people, so if you join the wrong one, you’ll prevent the correct person from joining.)
Once you have accepted the assignment and created/joined a team, you can clone the repository on clyde and begin working. But before you do, read the entire assignment.
Be sure to ask any questions on Piazza.
Submission
To submit your homework, you must commit and push to GitHub before the deadline.
The README.md should contain
- The names of both partners (or just your name if you worked alone…but please don’t work alone if you can manage it).
Part 1
You’re going to write some regular expressions for use with egrep(1). Each regex will appear in its own two-line file named 1.1 through 1.7.
The first line of the file will be a shebang line telling the OS to run the file as an argument to egrep(1). (This is provided for you and you shouldn’t change it.) The second line of each file is your regular expression. Replace XXX with the regular expression you want.
Write regular expressions that print out the following when run on the words file.
1. All words that contain exactly one of a, e, i, o, or u (without duplicates).
2. All words that contain the lowercase vowels a, e, i, o, and u in that order. (The words may contain repeated vowels as long as aeiou is a subsequence.)
3. All words that are exactly 22 lowercase letters long.
4. All words that are made up of only pairs of consonant-vowels like Banana and are at least 6 letters long. You may consider vowels to be just a, e, i, o, and u (and their uppercase versions).
5. All words that have a 4-letter sequence repeated (without overlapping).
6. All words that start and end with the same 3-letter sequence (possibly overlapping).
7. All words that end with their first 3 letters reversed like “detected” (possibly overlapping).
[Hint: For 5–7, you will want to use one or more back references.]
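As a minimal illustration of the back references mentioned in the hint (and not a solution to any numbered item), the sketch below runs grep -E, which behaves like egrep, on a small made-up word list. The sample words and the variable name doubled are assumptions for illustration only. Back references in extended regular expressions are a GNU grep extension, so behavior on other systems may differ.

```shell
# Hypothetical example: find words that contain the same letter twice in a row.
# ([a-z]) captures one lowercase letter; \1 must then match that same letter again.
doubled=$(printf 'book\ncat\nletter\ndog\n' | grep -E '([a-z])\1')

# Print the matching words, one per line.
printf '%s\n' "$doubled"
```

Here only book and letter are printed, because they are the only sample words in which the captured letter is immediately repeated.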
Part 2
You’re going to write some sed(1) scripts to perform the following actions. As with Part 1, these will take the form of two-line text files named 2.1 through 2.9.
The first line is again a shebang line, this time telling the OS to run sed(1). Where noted below, it also contains the -n flag. The second line is the sed command to run. Initially, it’s the q command, which just quits. You should replace this with your own command. All parts can be solved with a single sed command.
1. Replace all instances of “snow fall” or “wind chill” in each line with “summertime”. You’ll want to use \< and \> to match the boundaries of the words.
2. Assuming the input is a dictionary file like words (one per line, alpha order), print out all words between computer and science. (For this one, the sed flag -n has been added.)
3. Replace all instances of Teh with The and teh with the, but only in standalone words.
4. Move the last word on a line to the front.
5. Find lines where a word has been repeated on the same line and replace that line with a repeated word. Don’t print the other lines. (For this one, the sed flag -n has been added.)
6. Convert C block comments that are on one line and at the end of the line into a line comment. You should preserve everything between the /* and */, including spaces. So /*  add things up  */ would become //  add things up  (there are two spaces on either side of “add things up”).
7. Only print out lines that contain cs 241, but change that to CSCI 241. Make sure cs is the start of a word and 241 is the end of a word. (For this one, the sed flag -n has been added.)
8. Take the previous command, but modify it to handle CS or CSCI with any number of spaces (including 0) and with any type of capitalization, and any 3-digit number (e.g., cScI151 becomes CSCI 151).
9. Truncate all lines after exactly 20 characters.
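Several of the items above rely on word boundaries and on reusing matched text in the replacement. The sketch below shows both ideas on unrelated, made-up input, so it is not a solution to any item. The sample strings and the variable names bounded and swapped are assumptions, and \< and \> are GNU extensions that may not be available in every sed.

```shell
# \< and \> match the start and end of a word, so the "cat" inside "concat"
# is left alone while the standalone occurrences are replaced.
bounded=$(printf 'cat concat cat\n' | sed 's/\<cat\>/dog/g')

# \( ... \) captures part of the match, and \1 and \2 reuse the captured text
# in the replacement. This swaps the two fields around the first colon.
swapped=$(printf 'key:value\n' | sed 's/^\([^:]*\):\(.*\)$/\2:\1/')

printf '%s\n' "$bounded" "$swapped"
```

The first command prints dog concat dog and the second prints value:key.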