Grep Across Multiple Lines
Philip Wilkinson
Software Engineer, Amazon
Published: 2/1/2024
Quick Reference
Command | Explanation |
---|---|
`$ grep -Pzo '(?s)from.*to' <file_name>` | The simplest way to match across multiple lines with grep: give the start word, the end word and the file name. |
`$ ggrep -Pzo '(?s)from.*to' <file_name>` | Where your grep does not support -P (for example, the BSD grep that ships with macOS), install GNU grep with brew install grep and run it as ggrep. |
`$ pcre2grep -M 'from(\n\|.)*to' <file_name>` | An alternative using pcre2grep, whose -M flag lets a pattern match across multiple lines. |
Matching multiple words on one line with grep
If you want to match from one word to another on the same line, the grep command takes the form:
$ grep 'from.*to' <file_name>
This uses regular expression syntax to match lines in which from is followed, later on the same line, by to: . matches any single character, and * repeats it zero or more times, as many times as possible.
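A minimal sketch of the single-line case, using a hypothetical sample file, single_line.txt, created only for this demo:

```bash
# Create a hypothetical three-line sample file for the demo.
printf 'from here\ngo from A to B\nto there\n' > single_line.txt

# Only the second line contains "from" followed later on the same line by "to",
# so it is the only line printed.
grep 'from.*to' single_line.txt
```

Only go from A to B is printed: the first line has from but no to after it, and the last line has to but no from before it.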
Using grep -P or ggrep -P to grep multiple lines
To match across multiple lines with grep, the command becomes more complicated:
# if your grep supports -P
$ grep -Pzo '(?s)from.*to' <file_name>
# using ggrep instead
$ ggrep -Pzo '(?s)from.*to' <file_name>
If your grep does not support -P (for example, the BSD grep that ships with macOS), you can install GNU grep from homebrew-core using brew:
$ brew install grep
This will then become available as ggrep.
The parameters for this are:
- -P interprets the pattern as a Perl-compatible regular expression (PCRE).
- -z treats the input as a set of lines, each terminated by a zero byte (NUL) instead of a newline. Because ordinary text files contain no zero bytes, grep effectively reads the whole file as a single line instead of many.
- -o prints only the matched text; without it, the entire file would be printed, since it is now one "line". The catch is that -z also ends each printed match with a trailing zero byte instead of a newline, which can cause problems for tools that read the output.
- (?s) activates PCRE_DOTALL, so . matches any character, including a newline.
- .* then matches everything, including newlines, up to the last occurrence of to; it can cross line boundaries only because (?s) is set.
If you only want the names of files that contain a match, replace -o with -l to list the matching file names.
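The flags above can be tried on a small hypothetical file, multi_line.txt. This sketch assumes GNU grep built with PCRE support (on macOS, substitute ggrep); piping through tr is one common way to turn the trailing zero byte added by -z back into a newline:

```bash
# Hypothetical sample file whose match spans three lines.
printf 'boot\nfrom: step one\nstep two\nsent to: step three\ndone\n' > multi_line.txt

# -P: Perl-compatible regex, -z: records end at NUL bytes (so the whole file
# is one record), -o: print only the matched text. (?s) lets "." match "\n".
# -z also ends each printed match with a NUL byte; tr turns it into a newline.
grep -Pzo '(?s)from.*to' multi_line.txt | tr '\0' '\n'

# -l prints the name of each file that contains a match instead of the match.
grep -Pzl '(?s)from.*to' multi_line.txt
```

With GNU grep this prints from: step one, step two and sent to on three lines, followed by multi_line.txt from the -l command.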
Grep from a word on one line to the final word on another line
$ grep -Pzo '(?s)success.*failure' process_output.txt
# or
$ ggrep -Pzo '(?s)success.*failure' process_output.txt
Grep for start of line containing multiple instances of the same word to the end of a line containing multiple instances of the same word
$ grep -Pzo '(?s)scheduled.*complete' process_output.txt
# or
$ ggrep -Pzo '(?s)scheduled.*complete' process_output.txt
Grep for word at the end of one line to the final word in another line
$ grep -Pzo '(?s)failure.*complete' process_output.txt
# or
$ ggrep -Pzo '(?s)failure.*complete' process_output.txt
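The original process_output.txt is not reproduced here, so this sketch runs the three commands above against a hypothetical stand-in, sample_process_output.txt (again assuming GNU grep with PCRE support, or ggrep on macOS):

```bash
# Hypothetical log file standing in for process_output.txt.
printf 'job scheduled\ntask A success\ntask B failure\nretry scheduled\ntask B complete\njob complete\n' > sample_process_output.txt

# From a word on one line to the final word on another line.
echo '--- success .. failure'
grep -Pzo '(?s)success.*failure' sample_process_output.txt | tr '\0' '\n'

# "scheduled" and "complete" each appear twice: the greedy .* runs from the
# first "scheduled" to the last "complete".
echo '--- scheduled .. complete'
grep -Pzo '(?s)scheduled.*complete' sample_process_output.txt | tr '\0' '\n'

# From the word at the end of one line to the final word on a later line.
echo '--- failure .. complete'
grep -Pzo '(?s)failure.*complete' sample_process_output.txt | tr '\0' '\n'
```

Because scheduled and complete each appear twice, the second command's match covers the whole file from the first scheduled to the last complete.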
Using pcre2grep to grep multiple lines
An alternative is pcre2grep, which simplifies the command with its -M flag:
$ pcre2grep -M 'from(\n|.)*to' <file_name>
The -M (--multiline) flag allows a pattern to match more than one line. pcre2grep is part of the PCRE2 library, so it supports Perl-compatible regular expressions natively. It is not always installed alongside grep; if it is missing, install it with your package manager.
Alternatively, you can use the (?s) trick from before to turn on PCRE_DOTALL so that . also matches newlines, which simplifies the command to:
$ pcre2grep -M '(?s)from.*to' <file_name>
Common “gotchas” when using grep across multiple lines
grep will use the first and last instances of the words
When using grep across multiple lines, be aware that the greedy .* makes the match start at the first instance of from and continue up to the last instance of to. This may not be the output you expected, especially when from or to appears several times in the document. By contrast, the awk and sed range patterns start at a line containing from and stop at the first following line containing to.
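If you need each match to stop at the first to, one option is PCRE's lazy quantifier .*?, which matches as few characters as possible. A sketch comparing both on a hypothetical file, repeats.txt (GNU grep with PCRE support assumed):

```bash
# Hypothetical file in which "from" and "to" each appear twice.
printf 'from A\nto B\nfrom C\nto D\n' > repeats.txt

# Greedy: from the first "from" to the last "to", one long match.
grep -Pzo '(?s)from.*to' repeats.txt | tr '\0' '\n'

# Lazy: .*? stops at the first "to" after each "from", giving two matches.
grep -Pzo '(?s)from.*?to' repeats.txt | tr '\0' '\n'
```

The greedy command prints one match running from from A to the final to; the lazy command prints two shorter matches, from A then to, and from C then to.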
grep patterns are regular expressions
The pattern you pass to grep is a regular expression, not a literal word, so typing fail will also match failure. To match only whole words when matching across multiple lines, surround each word with the \b word-boundary anchor. For example:
$ grep -Pzo '(?s)\bfail\b.*\n.*\bsuccess\b' <file_name>
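A sketch showing the difference \b makes, on a hypothetical file, words.txt, where failure appears before the standalone word fail (GNU grep with PCRE support assumed):

```bash
# Hypothetical file: "failure" appears before the standalone word "fail".
printf 'build failure\nretry\ntest fail\nthen success\n' > words.txt

# Without \b, "fail" also matches inside "failure", so the match starts there.
grep -Pzo '(?s)fail.*\n.*success' words.txt | tr '\0' '\n'

# With \b on both sides, only the whole word "fail" can start the match.
grep -Pzo '(?s)\bfail\b.*\n.*\bsuccess\b' words.txt | tr '\0' '\n'
```

The first command's match begins at failure on the first line; the second begins at the whole word fail on the third line.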
grep is case sensitive
grep is also case sensitive by default, but you can add the -i flag (for example, grep -Pzio) to ignore case.
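For example, with a hypothetical file, mixed_case.txt, whose start word is capitalized (GNU grep with PCRE support assumed):

```bash
# Hypothetical file where the start word is capitalized.
printf 'From the start\nup to the end\n' > mixed_case.txt

# Case sensitive: "from" does not match "From", so nothing is printed.
grep -Pzo '(?s)from.*to' mixed_case.txt | tr '\0' '\n'

# -i ignores case, so "From" now starts the match.
grep -Pzio '(?s)from.*to' mixed_case.txt | tr '\0' '\n'
```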
Find out more about grep
As always, if you want to find out more about how to use grep, you can use:
$ man grep
This opens the manual page, which lists all the options with explanations. Or:
$ grep --help
This prints a short summary of the available options.
Alternative tools
Alternatively, awk and sed can make this much simpler. With awk the command is:
$ awk '/from/,/to/' <file_name>
where from is the first word or regular expression you are searching for and to is the final word you are looking for. Unlike grep -o, awk prints whole lines, from each line matching from through the next line matching to.
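A sketch of the awk range on a hypothetical file, ranges.txt, containing two from/to blocks:

```bash
# Hypothetical file: two from/to ranges separated by an unrelated line.
printf 'intro\nfrom one\nmiddle\nto two\noutside\nfrom three\nto four\nend\n' > ranges.txt

# Prints whole lines from each line matching /from/ through the next line
# matching /to/; the range then restarts at the next /from/ line.
awk '/from/,/to/' ranges.txt
```

This prints from one, middle, to two, from three and to four; the line outside is skipped because it falls between the two ranges.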
In sed the command is similar and takes the form:
$ sed -n '/from/,/to/p' <file_name>
As with the prior example, from is the first word or regular expression and to is the final word or regular expression you are looking for.
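The same idea with sed, sketched on a hypothetical file, sed_demo.txt:

```bash
# Hypothetical sample file for the sed range.
printf 'start\nfrom here\nkeep me\nup to here\nignored\n' > sed_demo.txt

# -n suppresses sed's default printing; p prints only lines inside the range.
sed -n '/from/,/to/p' sed_demo.txt
```

This prints from here, keep me and up to here.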