SCIENCE & ENGINEERING NEWS
San Francisco, CA — The Internet has become so large so fast that sophisticated search engines are just scratching the surface of the Web’s vast information reservoir, according to a new study released last Wednesday.
The 41-page research paper, prepared by a South Dakota company that has developed new software to plumb the Internet’s depths, estimates the World Wide Web is 500 times larger than the maps provided by popular search engines like Yahoo!, AltaVista and Google.com.
These hidden information coves, well-known to the Net savvy, have become a tremendous source of frustration for researchers who can’t find the information they need with a few simple keystrokes.
“These days it seems like search engines are a little like the weather: Everyone likes to complain about them,” said Danny Sullivan, editor of SearchEngineWatch.com, which analyzes search engines.
For years, the uncharted territory of the Internet’s World Wide Web sector has been dubbed the “invisible Web.”
BrightPlanet, the Sioux Falls start-up behind Wednesday’s report, describes the terrain as the “deep Web” to distinguish from the surface information captured by Internet search engines.
“It’s not an invisible Web anymore. That’s what’s so cool about what we are doing,” said Thane Paulsen, BrightPlanet’s general manager.
Many researchers suspected that these underutilized outposts of cyberspace represented a substantial chunk of the Internet, but no one seems to have explored the Web’s back roads as extensively as BrightPlanet.
Deploying new software developed over the past six months, BrightPlanet estimates there are now about 550 billion documents stored on the Web.
Combined, Internet search engines index about 1 billion pages. One of the first Web search engines, Lycos, had an index of 54,000 pages in mid-1994.
While search engines obviously have come a long way since 1994, the main reason they aren’t indexing more pages is that an increasing amount of information is stored in huge, ever-changing databases set up by government agencies, universities and corporations.
Search engines rely on technology that generally identifies “static” pages, rather than the “dynamic” information stored in databases.
This means that general-purpose search engines will guide users to the home site that houses a huge database, but finding out what’s inside it requires additional queries.
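The static-versus-dynamic distinction can be illustrated with a toy crawler. The sketch below, in Python, follows ordinary hyperlinks but skips any URL that carries a query string, which is roughly how crawlers of the era avoided database-backed pages; the site, URLs, and link graph are entirely hypothetical, invented for illustration.

```python
from urllib.parse import urlparse

# Hypothetical link graph: each page maps to the URLs it links to.
SITE = {
    "http://example.org/": [
        "http://example.org/about.html",
        "http://example.org/search?q=records",   # database query: dynamic
    ],
    "http://example.org/about.html": [],
    "http://example.org/search?q=records": [],   # invisible to the crawler
}

def is_static(url):
    """Treat a URL as static if it carries no query string."""
    return urlparse(url).query == ""

def crawl(start):
    """Breadth-first crawl that indexes only static pages."""
    seen, queue, indexed = set(), [start], []
    while queue:
        url = queue.pop(0)
        if url in seen or not is_static(url):
            continue  # dynamic, database-backed pages are skipped
        seen.add(url)
        indexed.append(url)
        queue.extend(SITE.get(url, []))
    return indexed

print(crawl("http://example.org/"))
```

Run against this toy site, the crawler indexes the home page and the static “about” page but never sees the search results page, even though a link to it exists: the content behind the query lives in a database, not in a fixed document.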
BrightPlanet believes it has developed a solution with software called “LexiBot.”
With a single search request, the technology not only searches the pages indexed by traditional search engines, but delves into the databases on the Internet and fishes out the information in them.
The LexiBot isn’t for everyone, BrightPlanet executives concede. For one thing, the software costs money – $89.95 after a free 30-day trial. For another, a LexiBot search isn’t fast. Typical searches will take 10 to 25 minutes to complete, but could require up to 90 minutes for the most complex requests.
“This isn’t for grandma when she is looking for chocolate chip recipes on the Internet,” Paulsen said.
The privately held company expects LexiBot to be particularly popular in academic and scientific circles. It also plans to sell its technology and services to businesses.
About 95 percent of the information stored in the deep Web is free, according to BrightPlanet.
Several Internet veterans who reviewed BrightPlanet’s research Wednesday were intrigued, but warned that the company’s software could overwhelm users.
“The World Wide Web is getting to be so humongous that you need specialized engines. A centralized approach like this isn’t going to be successful,” predicted Carl Malamud, co-founder of Petaluma-based Invisible Worlds.
Like BrightPlanet, Invisible Worlds is trying to extract more data hidden from search engines, but is customizing the information.
Malamud calls this process “giving context to the content.”
Sullivan agreed that BrightPlanet’s greatest challenge will be showing businesses and individuals how to effectively deploy the company’s breakthrough.
“No one else has come up with something like this yet, so when they fetch people all this information on the deep Web, they are going to have to show people where to dive in. Otherwise, people will just drown.”