Shell Tricks: list files with most text matches
Here’s a Bash function for searching all text files in the current directory for a pattern, then listing the files containing matches in ascending order by number of matches. It’s mostly a proof of concept, but a useful companion to a basic grep search.
The meat of the script happens in an array declaration. It first uses grep -lIi -E "$patt" * 2> /dev/null to list files containing the provided pattern (case insensitive), ignoring binary files. The error redirect at the end of the command suppresses the errors grep throws for directories. Each listed file is then passed to a second grep command, grep -Hi -c -E "$patt", which outputs the filename and the number of matching lines for that file. The sorted results are saved to the array.
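The two-stage search can be tried on its own before installing the full function. The following is a minimal sketch rather than the function itself: the helper name count_matches and the sample files are made up for the demonstration, and it pipes the results straight through sort instead of storing them in an array.

```shell
# count_matches is a hypothetical helper, not part of the original function.
# Stage 1: grep -l lists text files (-I skips binaries) containing the pattern.
# Stage 2: grep -c prints "file:count" for each of those files.
count_matches() {
  grep -lIi -E -- "$1" * 2> /dev/null |
    while IFS= read -r file; do
      grep -Hi -c -E -- "$1" "$file"
    done |
    sort -t: -n -k 2
}

# Sample data in a throwaway directory (file names are illustrative only)
demo_dir=$(mktemp -d)
printf 'jekyll\nJekyll rocks\nnothing here\n' > "$demo_dir/a.txt"
printf 'one jekyll line\n' > "$demo_dir/b.txt"
printf 'no match\n' > "$demo_dir/c.txt"
mkdir "$demo_dir/subdir"   # directories trigger the errors sent to /dev/null

demo_output=$(cd "$demo_dir" && count_matches jekyll)
echo "$demo_output"
rm -rf "$demo_dir"
```

With this sample data the file with one matching line is listed before the file with two, and c.txt and subdir do not appear at all.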
After including the function in a sourced file (e.g. ~/.bash_profile), running matches -h will show the available flags and switches:
$ matches -h
Find files in the current directory containing the most occurrences
of a pattern
-c Include occurrence counts in output
-r Reverse sort order (default ascending)
-m COUNT Minimum number of matches required
-h Display this help screen
Example:
# search for files containing at least 3 occurrences
# of the word "jekyll", display filenames with counts
$ matches -c -m 3 jekyll
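One caveat about the word "occurrences": grep -c reports the number of matching lines, so a line that contains the pattern twice is counted once. If per-occurrence counts matter, one possible variation (not what the function does) is to pipe grep -o through wc -l. The sample file below is invented for the comparison.

```shell
# Illustration only: compare matching-line counts with occurrence counts.
sample=$(mktemp)
printf 'jekyll and Jekyll\nplain line\njekyll again\n' > "$sample"

# grep -c counts lines that contain at least one match
line_count=$(grep -i -c -E -- 'jekyll' "$sample")

# grep -o prints every match on its own line; wc -l counts them.
# tr strips the padding some wc implementations add.
occurrence_count=$(grep -i -o -E -- 'jekyll' "$sample" | wc -l | tr -d ' ')

echo "lines: $line_count, occurrences: $occurrence_count"
rm -f "$sample"
```

For most files the two numbers are close enough that the simpler grep -c is a reasonable choice.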
Here’s the function for pasting into ~/.bash_profile (or other sourced file):
# Find files in the current directory containing the most occurrences of a pattern
# switch -c: turn on display of occurrence counts
# switch -r: reverse sort order (default ascending)
# flag -m COUNT: minimum number of occurrences required to include file in results
# param 1: (required) search pattern (regex allowed, case insensitive)
#
# Results are output in ascending order by occurrence count
matches () {
  local counts=false minmatches=1 patt width=1 reverse="" opt line mtch count
  local -a results
  local helpstring="Find files in the current directory containing the most occurrences of a pattern\n\t-c Include occurrence counts in output\n\t-r Reverse sort order (default ascending)\n\t-m COUNT Minimum number of matches required\n\t-h Display this help screen\n\n Example:\n\t# search for files containing at least 3 occurrences\n\t# of the word \"jekyll\", display filenames with counts\n\n\t$ matches -c -m 3 jekyll"
  OPTIND=1
  while getopts "crm:h" opt; do
    case $opt in
      c) counts=true ;;
      r) reverse="r" ;;
      m) minmatches=$OPTARG ;;
      h) echo -e "$helpstring"; return ;;
      *) return 1 ;;
    esac
  done
  shift $((OPTIND-1))
  if [ $# -ne 1 ]; then
    echo -e "$helpstring"
    return 1
  fi
  patt=$1; shift
  # Split command output on newlines only so filenames with spaces stay intact;
  # declaring IFS local restores the caller's value when the function returns
  local IFS=$'\n'
  # grep -l lists matching text files, then grep -c emits "file:count" for each
  results=( $(while read -r line; do
      grep -Hi -c -E -- "$patt" "$line"
    done < <(grep -lIi -E -- "$patt" * 2> /dev/null) \
    | sort -t: -${reverse}n -k 2) )
  # Pad counts to the width of the largest one, whatever the sort order
  for mtch in "${results[@]}"; do
    count=${mtch##*:}
    if [ ${#count} -gt $width ]; then width=${#count}; fi
  done
  for mtch in "${results[@]}"; do
    count=${mtch##*:}
    if [ "$count" -ge "$minmatches" ]; then
      if $counts; then
        printf "%${width}d: %s\n" "$count" "${mtch%:*}"
      else
        echo "${mtch%:*}"
      fi
    fi
  done
}
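The final loop of the function depends on two parameter expansions to split each file:count entry: ${mtch##*:} removes everything up to the last colon, leaving the count, and ${mtch%:*} removes the last colon and what follows, leaving the filename. A small sketch of that step in isolation, assuming a hypothetical helper named format_counts and invented input lines:

```shell
# format_counts is a hypothetical helper for illustration.
# $1 = minimum count, $2 = column width; reads "file:count" lines on stdin.
format_counts() {
  local min=$1 width=$2 entry count file
  while IFS= read -r entry; do
    count=${entry##*:}   # text after the last colon: the count
    file=${entry%:*}     # text before the last colon: the filename
    if [ "$count" -ge "$min" ]; then
      printf "%${width}d: %s\n" "$count" "$file"
    fi
  done
}

# Made-up input; in the real function these lines come from grep -c
formatted=$(printf 'notes.md:2\nposts.md:12\nindex.md:3\n' | format_counts 3 2)
echo "$formatted"
```

Because both expansions key on the last colon, a filename that itself contains a colon still splits correctly at this stage, although the sort on field 2 earlier in the pipeline would not handle such a name.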
Ryan Irelan has produced a series of shell trick videos based on BrettTerpstra.com posts. Readers can get 10% off using the coupon code TERPSTRA.