Mac Search Tool For Multiple Pdf File


Is there a way to search PDF files on Ubuntu using the power of grep, without converting them to text first?

Foxit Reader users can use the shortcut Ctrl-Shift-f or select Tools > Search to open the search form of the program in a sidebar. Adobe Reader opens the advanced search options in a new window. Here it is possible to switch from searching the current document to searching all pdfs in a folder on the hard drive.

  1. To search for multiple words, select Multiple Words Or Phrase, and then click Select Words. Type each word in the New Word Or Phrase text field and click Add. You can also import a text file with the list of words or phrases to search for.
  2. Oddly, the only tool in Acrobat that allows you to search for terms and mark them in a PDF is part of the Search and Redact feature. This will add a mark to the page around the search term. I wrote about using this technique in my previous article Highlighting Multiple Words in a PDF Document.
Dervin Thunk

14 Answers

Install the package pdfgrep, then use the command:
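The command itself is missing from this copy of the answer. A minimal sketch of what it might look like, assuming the Ubuntu package is named pdfgrep as stated above; the helper name `pdf_search` and the file name in the usage line are placeholders, not part of the original answer:

```shell
# Install once on Ubuntu/Debian (needs root privileges):
#   sudo apt-get install pdfgrep

# Hypothetical helper: case-insensitive search that also prints the
# page number of every match.
#   $1 = pattern, $2 = PDF file
pdf_search() {
    pdfgrep -i -n "$1" "$2"
}

# Usage (placeholder names):
#   pdf_search 'search term' document.pdf
```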

——————

The simplest way is:

Community

enzotib

If you have poppler-utils installed (default on Ubuntu Desktop), you could 'convert' it on the fly and pipe it to grep:

This won't create a .txt file.
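A minimal sketch of the on-the-fly conversion described in this answer, assuming pdftotext from poppler-utils is on the PATH. The helper name `pdf_grep_stdout` is made up for illustration:

```shell
# Hypothetical helper: convert one PDF to text on stdout and grep it.
#   $1 = pattern, $2 = PDF file
pdf_grep_stdout() {
    # "-" as the output file name makes pdftotext write to stdout,
    # so nothing is saved to disk.
    pdftotext "$2" - | grep -- "$1"
}

# Usage (placeholder names):
#   pdf_grep_stdout 'search term' document.pdf
```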

wag

pdfgrep was written for exactly this purpose and is available in Ubuntu.

It tries to be mostly compatible with grep and thus provides 'the power of grep', only specialized for PDFs. That includes common grep options, such as --recursive, --ignore-case or --color.

In contrast to pdftotext | grep, pdfgrep can report the page number of a match efficiently and is generally faster when it doesn't have to search the whole document (e.g. with --max-count or --quiet).

The basic usage is:

where PATTERN is your search string and FILE a list of filenames (or wildcards in a shell).
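The usage line did not survive in this copy; going by the description it has the form `pdfgrep PATTERN FILE...`. The sketch below wraps two of the option combinations mentioned above. The helper names are hypothetical, and the long options are the ones pdfgrep shares with grep plus --page-number:

```shell
# Hypothetical helper: search every PDF below a directory, ignoring
# case, and print the page number of each match.
#   $1 = pattern, $2 = directory
pdf_search_tree() {
    pdfgrep --recursive --ignore-case --page-number "$1" "$2"
}

# Hypothetical helper: print nothing and only report through the exit
# status whether the pattern occurs (pdfgrep can stop at the first hit).
#   $1 = pattern, $2 = PDF file
pdf_contains() {
    pdfgrep --quiet "$1" "$2"
}

# Usage (placeholder names):
#   pdf_search_tree 'invoice' ~/Documents
#   pdf_contains 'invoice' scan.pdf && echo found
```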

See the man page for more information.

hpdeifel

No.

A PDF consists of chunks of data: some of them text, some of them pictures, and some of them really magical fancy XYZ (e.g. .u3d files). Those chunks are compressed most of the time (e.g. with Flate; see http://www.verypdf.com/pdfinfoeditor/compression.htm). In order to 'grep' a .pdf you have to reverse the compression, i.e. extract the text.

You can do that either per file, with tools such as pdf2text, and grep the result, or you can run an 'indexer' (look at xapian.org or lucene) which builds a searchable index out of your .pdf files; you can then use that indexer's search tools to get at the content of the PDFs.

But no, you cannot grep PDF files and hope for reliable answers without extracting the text first.
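To illustrate why compression defeats a plain grep, here is a small sketch that uses gzip as a stand-in for a compressed PDF stream (gzip uses the same Deflate algorithm, but this is not an actual PDF). The helper names are made up for this example:

```shell
# Count matching lines in plain text.
#   $1 = text, $2 = pattern
count_plain() {
    printf '%s\n' "$1" | grep -c -- "$2"
}

# Count matching lines after compressing the same text. -a forces grep
# to treat the binary output as text; the word is simply no longer
# present in the compressed bytes.
count_compressed() {
    printf '%s\n' "$1" | gzip -c | grep -a -c -- "$2"
}

# Usage:
#   count_plain 'needle in a haystack' needle
#   count_compressed 'needle in a haystack' needle
```

The first call reports one matching line, the second reports none, which is why the text has to be extracted before it can be searched.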

akira

Recoll can search PDFs. It doesn't support regular expressions, but it has lots of other search options, so it might fit your needs.

Michael Mrozek
user39336
Andy Smith

Take a look at the common resource grep tool crgrep which supports searching within PDF files.

It also allows searching other resources like content nested in archives, database tables, image meta-data, POM file dependencies and web resources - and combinations of these including recursive search.

Craig

Try this:

to print the lines of the PDF in which the pattern occurs.

enzotib
harish.venkat

cd to the folder containing your PDF file and then:

or, if you want to search more than one PDF file (e.g. all PDF files in your folder):

or
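The commands are missing from this copy of the answer. One possible sketch of the multi-file case, assuming pdftotext from poppler-utils; the helper name `pdf_grep_dir` is hypothetical and this is not necessarily what the answer originally showed:

```shell
# Hypothetical helper: grep every *.pdf in the current directory and
# prefix each matching line with the file it came from.
#   $1 = pattern
pdf_grep_dir() {
    for f in ./*.pdf; do
        # If there are no PDFs the glob stays literal; skip it.
        [ -e "$f" ] || continue
        pdftotext "$f" - | grep -- "$1" | while IFS= read -r line; do
            printf '%s: %s\n' "$f" "$line"
        done
    done
}

# Usage, after cd-ing into the folder:
#   pdf_grep_dir 'search term'
```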

Rasmuss Rall

There is a duplicate question on Stack Overflow. The people there suggest a variation of harish.venkat's answer:

The advantage over the similar answer here is the --with-filename flag for grep. This is somewhat superior to pdfgrep as well, because the standard grep has more features.
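The command is missing here as well. A sketch of a find-based variant that uses --with-filename, assuming GNU grep (for --label) and pdftotext; the helper name `pdf_grep_find` is hypothetical and the exact command from the linked question may differ:

```shell
# Hypothetical helper: search all PDFs below a directory and label each
# match with the PDF it came from.
#   $1 = pattern, $2 = start directory (defaults to the current one)
pdf_grep_find() {
    find "${2:-.}" -type f -name '*.pdf' -exec sh -c '
        # Inside sh -c: $1 = pattern, $2 = file found by find.
        # --label names the stdin stream so --with-filename can print it.
        pdftotext "$2" - |
            grep --with-filename --label="$2" --color=auto -- "$1"
    ' sh "$1" {} \;
}

# Usage (placeholder names):
#   pdf_grep_find 'search term' ~/Documents
```

Passing the pattern and the file name as arguments to sh -c, rather than pasting {} into the script text, keeps file names with spaces or quotes from breaking the command.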

user7610

Here is a quick script to search PDFs in the current directory:

Nico

I assume you mean not converting it on disk. You can convert the files to stdout with pdftotext and then grep the output. Grepping a PDF without any sort of conversion is not practical, since PDF is mostly a binary format.

In the directory:

or in the directory and its subdirectories:

Also, because some PDFs are scans, they need to be OCRed first. I wrote a pretty simple way to find all PDFs that cannot be grepped and OCR them.

I noticed that if a PDF file doesn't contain any fonts, it is usually not searchable. Knowing this, we can use pdffonts.

The first two lines of pdffonts output are the table header, so a searchable file produces more than two lines of output. Knowing this, we can create:

then paste this

then make it executable

then list all non-searchable pdfs in the directory:

or in the directory and its subdirectories:
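The script and the commands that call it are missing from this copy. The sketch below follows the heuristic described above (two header lines from pdffonts, then one line per font), written as shell functions instead of a separate script file. All function names are made up, and a file that pdffonts cannot read is also reported, since it produces no font lines:

```shell
# Hypothetical helper: print the path if pdffonts lists no fonts,
# i.e. its output is only the two header lines (or nothing at all).
is_not_searchable() {
    lines=$(pdffonts "$1" 2>/dev/null | wc -l | tr -d ' ')
    if [ "$lines" -le 2 ]; then
        printf '%s\n' "$1"
    fi
}

# List non-searchable PDFs in the current directory only.
list_non_searchable() {
    for f in ./*.pdf; do
        [ -e "$f" ] || continue   # no PDFs: skip the literal glob
        is_not_searchable "$f"
    done
}

# List non-searchable PDFs in a directory and its subdirectories.
# Assumes file names do not contain newlines.
#   $1 = start directory (defaults to the current one)
list_non_searchable_recursive() {
    find "${1:-.}" -type f -name '*.pdf' | while IFS= read -r f; do
        is_not_searchable "$f"
    done
}
```

The resulting list can then be handed to whichever OCR tool you prefer.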

Eduard Florinescu

If you just want to search for PDF names/properties... or for simple strings that are not compressed or encoded, then instead of strings you can use the following:

From grep --help:

and cat --help:
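The commands and the quoted help text are missing from this copy. A sketch of the two approaches the answer appears to refer to, assuming the options meant are grep's -a (--text, treat binary files as text) and cat's -v (--show-nonprinting); the helper names are hypothetical:

```shell
# Hypothetical helper: grep the raw PDF bytes. -a makes grep print the
# matching lines instead of only reporting that a binary file matches.
#   $1 = pattern, $2 = PDF file
pdf_raw_grep() {
    grep -a -- "$1" "$2"
}

# Same idea, but let cat -v turn non-printing bytes into visible
# ^ and M- notation first, so grep sees ordinary text.
pdf_raw_grep_visible() {
    cat -v "$2" | grep -- "$1"
}

# Usage (placeholder names):
#   pdf_raw_grep '/Title' document.pdf
```

This only finds strings stored uncompressed in the file, such as some metadata entries; page text inside compressed streams will not match.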

phuclv

gpdf might be what you need if you're using Gnome! Check this in case you're not using Gnome. It's got a list of CLI pdf viewers. Then you can use grep to find some pattern.

Rui F Ribeiro
Dharmit
