Every day people rely on search engines to find specific content in the many terabytes of data that exist on the Internet, but have you ever wondered how this search is actually performed? One approach is Apache’s Hadoop, which is a software framework that enables distributed manipulation of vast amounts of data. One application of Hadoop is parallel indexing of Internet Web pages. Hadoop is an Apache project with support from Yahoo!, Google, IBM, and others. This article introduces the Hadoop framework and shows you why it’s one of the most important Linux®-based distributed computing frameworks.