1. Retrieve the Raw Content

Suppose you need to search a large number of files, and you want to find the files that
contain a certain word or phrase. How would you go about writing a program to do
this? A naïve approach would be to sequentially scan each file for the given word or
phrase. Although this approach would work, it has a number of flaws, the most obvious
of which is that it doesn’t scale to large file sets or to very large files.
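A minimal sketch of the naïve approach might look like the following. It assumes the files are already loaded into an in-memory dictionary that maps hypothetical file names to their text (a real program would read them from disk). The point to notice is that every query rereads the full text of every file, so the cost of each search grows with the total size of the collection.

```python
from typing import Dict, List


def sequential_search(files: Dict[str, str], phrase: str) -> List[str]:
    """Return the names of files whose text contains `phrase` (case-insensitive).

    Every file is scanned from start to finish on every call; nothing is
    remembered between queries, so the cost grows with the corpus size.
    """
    needle = phrase.lower()
    matches = []
    for name, text in files.items():
        # Full scan of this file's text for the search phrase.
        if needle in text.lower():
            matches.append(name)
    return matches


# Hypothetical sample files, held in memory for illustration only.
corpus = {
    "a.txt": "Indexing makes search fast.",
    "b.txt": "A naive search scans every file.",
    "c.txt": "Nothing relevant here.",
}

print(sequential_search(corpus, "search"))  # ['a.txt', 'b.txt']
```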

Here’s where indexing comes in:

    To search large amounts of text quickly, you must first index that text and
    convert it into a format that will let you search it rapidly, eliminating the
    slow sequential scanning process. This conversion process is called indexing,
    and its output is called an index.
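One common format for such an index is an inverted index, which maps each term to the set of files that contain it. The sketch below is a minimal, illustrative version under stated assumptions: the hypothetical helpers `tokenize`, `build_index`, and `search` are not from any particular library, tokenization simply lowercases text and splits on non-alphanumeric characters, and files are held in memory. The index is built once, after which a query only looks up its terms instead of rescanning every file.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build an inverted index mapping each term to the files containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, text in files.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return files containing every query term (AND semantics), sorted by name."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical sample files, held in memory for illustration only.
corpus = {
    "a.txt": "Indexing makes search fast.",
    "b.txt": "A naive search scans every file.",
    "c.txt": "Nothing relevant here.",
}

index = build_index(corpus)  # paid once, up front
print(search(index, "search"))       # ['a.txt', 'b.txt']
print(search(index, "search fast"))  # ['a.txt']
```

Note that this sketch matches individual words, not exact phrases: "search fast" finds files containing both terms anywhere. Supporting phrase queries would typically require storing each term's positions in the index as well.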