Unix & Linux
Check if all patterns are in file
[+2] [3] iobender
[2020-04-14 15:08:09]
[ grep ]
[ https://unix.stackexchange.com/questions/580017/check-if-all-patterns-are-in-file ]

Say I have a patterns.txt and want to check if every one of those patterns is present in some file.

I could do something like:

missing=0
while IFS= read -r pattern; do
  if ! grep -q -e "$pattern" file.txt; then
    echo "Error: missing pattern $pattern"
    missing=1
  fi
done < patterns.txt
[ "$missing" -eq 0 ] && echo "All patterns found"

but this is inefficient, since it re-scans file.txt once per pattern. It also gets awkward if, instead of a file, the input is a potentially large stream coming from a pipe.

Is there a way to have grep (or some other tool) check if all the patterns are present?

(2) When you say "is present", do you mean whether the actual pattern is present, or if the pattern matches? - Kusalananda
[+1] [2020-04-14 19:27:15] glenn jackman
cat file.txt | awk '
    NR == FNR {seen[$0] = 0; next} 
    {for (p in seen) if ($0 ~ p) seen[p]++} 
    END {
        for (p in seen) 
            if (seen[p] == 0) {
                missing++
                print "missing pattern", p
            } 
        if (missing == 0) print "all found"
        exit missing
    }
' patterns.txt -

Replace the cat command with any pipeline that produces text.
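
If the lines in patterns.txt are meant as literal strings rather than regular expressions (the distinction raised in the comment on the question), the same single-pass idea can be sketched with awk's index() in place of the ~ operator. This is only a sketch: the function name check_fixed_strings and the "missing string" message are made up for illustration, and it assumes one string per line in a non-empty pattern file.

```shell
# Hypothetical helper: check_fixed_strings PATTERN_FILE, with the text to search on stdin.
# Exits 0 if every string was seen at least once, 1 otherwise.
check_fixed_strings() {
    awk '
        # First file: remember each non-empty line as a literal string.
        NR == FNR { if ($0 != "") seen[$0] = 0; next }
        # Remaining input: index() does a plain substring test, no regex involved.
        { for (s in seen) if (index($0, s)) seen[s]++ }
        END {
            for (s in seen)
                if (seen[s] == 0) { missing++; print "missing string", s }
            exit (missing > 0)
        }
    ' "$1" -
}
```

Usage would look like: some_pipeline | check_fixed_strings patterns.txt. With a string such as a.c, the input line abc no longer counts as a hit, which it would with the regex version.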


1
[0] [2020-04-14 15:41:18] pLumo

This might work well:

sort -u patterns.txt > sorted_patterns.txt # only once
diff -sq <(grep -o -f sorted_patterns.txt file.txt | sort -u) sorted_patterns.txt

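If you have fixed strings instead of patterns, use -F. This makes grep a lot faster! Note that grep -o prints the text that matched rather than the pattern itself, so the comparison is only reliable when each pattern matches nothing but its own literal text.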

You could also use cmp [1] instead of diff -s. That might be a bit faster, but won't be able to show what is missing.


Output if not all patterns have been found:

Files /dev/fd/63 and sorted_patterns.txt differ

or if all patterns have been found:

Files /dev/fd/63 and sorted_patterns.txt are identical

Leave out -q to see what is missing:

2a3
> missing_word
[1] https://stackoverflow.com/questions/12900538/fastest-way-to-tell-if-two-files-have-the-same-contents-in-unix-linux
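
As a variation on the diff approach, comm can print just the missing entries. The following is a rough sketch for fixed strings only. The function name missing_fixed_strings is hypothetical, and it uses temporary files instead of process substitution so that it also runs in a plain POSIX shell. One caveat: when one string is a substring of another, grep -o may report only the longer match, so the shorter string can be listed as missing even though it occurs.

```shell
# Hypothetical helper: missing_fixed_strings PATTERN_FILE INPUT_FILE
# Prints every string from PATTERN_FILE that grep -oF never reported in INPUT_FILE.
missing_fixed_strings() {
    mfs_found=$(mktemp)
    mfs_wanted=$(mktemp)
    sort -u "$1" > "$mfs_wanted"
    # -o prints only the matched text, -F treats each line as a literal string.
    grep -oF -f "$mfs_wanted" "$2" | sort -u > "$mfs_found"
    # -13 hides lines unique to the first file and lines common to both,
    # leaving the wanted strings that were never found.
    comm -13 "$mfs_found" "$mfs_wanted"
    rm -f "$mfs_found" "$mfs_wanted"
}
```

Empty output means every string was found, so something like [ -z "$(missing_fixed_strings patterns.txt file.txt)" ] can serve as the yes/no check.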

2
[0] [2020-04-14 16:33:57] αғsнιη

With grep's PCRE support (-P) and the printf command, you could do something like:

<infile grep -qzP "(?s)$(printf "(?=.*?%s)" $(<pattern.txt))" &&
 echo 'all matched' ||
 echo 'one or more patterns not found'

If all patterns are found in the input file infile (in any order), the text "all matched" is printed; otherwise "one or more patterns not found" is printed.

This treats each pattern as a regular expression, so pat[01] will match both pat0 and pat1. To match pat[01] literally, change the printf format specifier from %s to %q, which backslash-escapes shell-special characters such as [ and ]. Note that %q applies shell quoting rather than regex quoting, so a regex metacharacter like . is left unescaped.

<infile grep -qzP "(?s)$(printf "(?=.*?%q)" $(<pattern.txt))" &&
 echo 'all matched' ||
 echo 'one or more patterns not found'
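
Since %q was designed for shell quoting, a possibly more predictable way to get literal matching is PCRE's own \Q...\E quoting. The sketch below builds the same chain of lookaheads that way. The function name all_literals_present is hypothetical; it assumes GNU grep built with -P support, one literal per line, and no literal containing the sequence \E (which would end the quoting early).

```shell
# Hypothetical helper: all_literals_present PATTERN_FILE, with the text to search on stdin.
# Succeeds only if every non-empty line of PATTERN_FILE occurs literally in the input.
all_literals_present() {
    regex='(?s)'
    # Read line by line so spaces inside a literal survive; also pick up a
    # final line that lacks a trailing newline.
    while IFS= read -r literal || [ -n "$literal" ]; do
        [ -n "$literal" ] || continue
        # One lookahead per literal; \Q...\E switches off regex metacharacters.
        regex="${regex}(?=.*?\\Q${literal}\\E)"
    done < "$1"
    # -z makes grep treat the whole (NUL-free) input as a single record.
    grep -qzP -e "$regex"
}
```

It could be used as: <infile all_literals_present pattern.txt && echo 'all matched'. As with the original one-liner, the whole input is held in memory as one record, so very large inputs may run into PCRE limits.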

3