Say I have a patterns.txt and want to check if every one of those patterns is present in some file.
I could do something like:
missing=0
while IFS= read -r pattern; do
    if ! grep -q -e "$pattern" file.txt; then
        echo "Error: missing pattern $pattern"
        missing=1
    fi
done < patterns.txt
[ "$missing" -eq 0 ] && echo "All patterns found"
but this is inefficient because it re-scans file.txt for each pattern, and it gets more complicated if the input is not a file but a potentially large stream coming from a pipe.
Is there a way to have grep (or some other tool) check if all the patterns are present?
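cat file.txt | awk '
    # first file (patterns.txt): record each pattern with a zero match count
    NR == FNR {seen[$0] = 0; next}
    # remaining input (stdin): count matches of every pattern on each line
    {for (p in seen) if ($0 ~ p) seen[p]++}
    END {
        for (p in seen)
            if (seen[p] == 0) {
                missing++
                print "missing pattern", p
            }
        if (missing == 0) print "all found"
        exit missing
    }
' patterns.txt -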
Replace the cat command with any pipeline that produces text.
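If the entries in patterns.txt are fixed strings rather than regular expressions, the same single-pass idea can be sketched with awk's index() in place of the ~ operator. This is only a sketch under that assumption: the function name check_all is hypothetical, and the exit status is reduced to 0 or 1 instead of the number of missing entries.

```shell
# check_all PATTERN_FILE (hypothetical helper, not part of the answer above)
# Reads text on stdin and succeeds only if every line of PATTERN_FILE
# occurs somewhere in that text as a literal substring.
check_all() {
    awk '
        # first file: load the strings to look for
        NR == FNR { seen[$0] = 0; next }
        # stdin: index() is a literal substring test, no regex involved
        { for (p in seen) if (index($0, p)) seen[p]++ }
        END {
            for (p in seen)
                if (seen[p] == 0) { missing++; print "missing pattern", p }
            exit (missing > 0)
        }
    ' "$1" -
}
```

It would be used at the end of any pipeline, for example `some_command | check_all patterns.txt && echo "all found"`, where some_command stands for whatever produces the text.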
This might work well:
sort -u patterns.txt > sorted_patterns.txt # only once
diff -sq <(grep -o -f sorted_patterns.txt file.txt | sort -u) sorted_patterns.txt
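If the entries are fixed strings rather than regular expressions, add -F; grep is then usually considerably faster. Note that grep -o prints the matched text, not the pattern, so the comparison is only meaningful when each pattern matches exactly itself, as fixed strings do.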
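For the fixed-string case, here is a minimal sketch that reads the text from a pipe and prints only the strings that never occur. It swaps diff for comm -13, which is a substitution on my part and not part of the commands above, and the function name missing_strings is hypothetical. It assumes a grep that supports -o (GNU and BSD grep do). Strings that are substrings of one another may be misreported, because grep -o prints only one match at any given position.

```shell
# missing_strings PATTERN_FILE (hypothetical helper)
# Reads text on stdin and prints every fixed string from PATTERN_FILE
# that does not occur in it. Prints nothing when all strings are present.
missing_strings() {
    sorted=$(mktemp) || return 2
    LC_ALL=C sort -u "$1" > "$sorted"
    # grep -oF prints each literal match on its own line;
    # comm -13 keeps only the lines unique to the sorted pattern list
    grep -oF -f "$sorted" | LC_ALL=C sort -u | LC_ALL=C comm -13 - "$sorted"
    rm -f "$sorted"
}
```

For example, `some_command | missing_strings patterns.txt` would list the absent strings, with some_command standing for any producer of text.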
You could also use cmp [1] instead of diff -s. That might be a bit faster, but it won't show what is missing.
Output if not all patterns have been found:
Files /dev/fd/63 and /dev/fd/62 differ
or if all patterns have been found:
Files /dev/fd/63 and /dev/fd/62 are identical
Leave out -q to know what is missing.
2a3
> missing_word
[1] https://stackoverflow.com/questions/12900538/fastest-way-to-tell-if-two-files-have-the-same-contents-in-unix-linux
With grep's PCRE extension (-P) and the printf command, you could do something like:
<infile grep -qzP "(?s)$(printf "(?=.*?%s)" $(<pattern.txt))" &&
echo 'all matched' ||
echo 'one or more pattern(s) does not found'
If all patterns are found in the input file infile, in any order, the command prints all matched; otherwise it prints one or more pattern(s) does not found.
Each entry of pattern.txt is treated as a regular expression, so pat[01] matches both pat0 and pat1. To match pat[01] literally, change the printf format specifier from %s to %q, which escapes special characters (%q applies shell quoting, which covers characters such as [ and ] but is not a full regex escape):
<infile grep -qzP "(?s)$(printf "(?=.*?%q)" $(<pattern.txt))" &&
echo 'all matched' ||
echo 'one or more pattern(s) does not found'
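As an alternative to %q for literal matching, PCRE's \Q...\E quoting can wrap each entry. The sketch below is a variation on the command above, not a tested drop-in: it assumes GNU grep built with -P support, one entry per line in the pattern file, and no entry that itself contains \E. The function name all_literal is hypothetical.

```shell
# all_literal PATTERN_FILE (hypothetical helper; needs GNU grep with -P)
# Reads text on stdin and succeeds only if every non-empty line of
# PATTERN_FILE occurs literally somewhere in that text.
all_literal() {
    regex='(?s)'                      # let . match newlines as well
    while IFS= read -r pat; do
        [ -n "$pat" ] || continue     # skip empty lines
        # one lookahead per entry; \Q...\E makes the entry literal
        regex="$regex(?=.*?\\Q$pat\\E)"
    done < "$1"
    # -z treats the whole input as a single record (if it has no NUL bytes)
    grep -qzP "$regex"
}
```

Reading the file line by line also keeps entries with spaces intact, for example `<infile all_literal pattern.txt && echo 'all matched'`.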