Search for a word in PDF files
The pdfgrep command in Linux searches one or more PDF files for a pattern of characters. It works much like grep: it prints the lines that contain the pattern, which can be a plain string or a regular expression.
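Installation for Ubuntu/Fedora
On Ubuntu and other Debian-based systems:
sudo apt-get update -y
sudo apt-get install -y pdfgrep
Fedora does not use apt-get; there the package is normally installed with dnf:
sudo dnf install -y pdfgrep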
Syntax
pdfgrep [options...] pattern [files]
Common Options:
-c: Count the number of matches per input file.
-h: Suppress the prefixing of the file name on output.
-i: Ignore case for matching.
-H: Print the file name for each match.
-n: Prefix each match with the number of the page where it is found.
-r or -R: Recursively search all files (-R also follows symlinks).
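The options above can be combined. As a minimal sketch, the shell function below wraps -c, -i and -r to report match counts for every PDF under a directory. The name count_matches, the default directory, and the exit codes 2 and 127 are assumptions made for illustration; they are not part of pdfgrep.

```shell
# Hypothetical helper (not part of pdfgrep): match counts for a directory tree.
count_matches() {
  pattern=$1
  dir=${2:-.}   # assumption: default to the current directory

  # Refuse an empty pattern and show how the helper is meant to be called.
  if [ -z "$pattern" ]; then
    echo "usage: count_matches PATTERN [DIR]" >&2
    return 2
  fi

  # Fail with a clear message if pdfgrep is not installed.
  if ! command -v pdfgrep >/dev/null 2>&1; then
    echo "pdfgrep is not installed" >&2
    return 127
  fi

  # -c: count matches per file, -i: ignore case, -r: recurse into subdirectories
  pdfgrep -c -i -r "$pattern" "$dir"
}
```

Calling count_matches "siliconvlsi" /path would then print a count for each PDF found under /path. Swapping -c for -n should show the matching lines with their page numbers instead.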
Example
To search for the pattern “siliconvlsi” (case-insensitively, because of -i) in all PDF files within a directory and its subdirectories:
pdfgrep -HiR "Siliconvlsi" *
If you only want to search the current directory (not subdirectories), drop -R and limit the glob to PDF files:
pdfgrep -i "siliconvlsi" *.pdf
Find Command
Another approach uses the find command together with pdftotext and grep:
find /path -name '*.pdf' -exec sh -c 'pdftotext "$1" - | grep --with-filename --label="$1" --color "siliconvlsi"' sh {} \;
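The find approach above can also be wrapped in a reusable function. The following is a minimal sketch, assuming pdftotext (from poppler-utils) and a grep that supports --label, such as GNU grep, are installed. The name search_pdfs and its argument order are hypothetical. File names are passed to the inner shell as positional arguments rather than pasted into the script text, so names containing spaces or quotes are handled safely.

```shell
# Hypothetical helper: search every PDF under a directory via pdftotext + grep.
search_pdfs() {
  pattern=$1
  dir=${2:-.}   # assumption: default to the current directory

  # find passes the pattern first, then a batch of PDF paths, to the inner shell.
  find "$dir" -type f -name '*.pdf' -exec sh -c '
    pattern=$1
    shift
    for pdf in "$@"; do
      # Convert the PDF to text on stdout and label each match with the file name.
      pdftotext "$pdf" - 2>/dev/null |
        grep -i --with-filename --label="$pdf" -- "$pattern"
    done
  ' sh "$pattern" {} +
}
```

For instance, search_pdfs "siliconvlsi" /path would search every PDF below /path. Using {} + instead of \; lets find hand several files to one shell invocation, which is usually faster on large directory trees.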
These commands help you search for specific patterns in PDF files, providing flexibility and control over the search process.