The Web: Search engines still evolving

By GENE J. KOPROWSKI, UPI Technology News

This is the second in a series of UPI articles examining the current state and future prospects of the global communications and data network known as the Internet.



CHICAGO (UPI) -- Sometimes, searching for specific information on the Web can be exasperating. Enter the word "Ford" into a search engine, for example, and it will present 4,524,801 pages of information that may or may not be relevant to your particular query.

The first link is from The Mercury, a Philadelphia-area newspaper. The headline: "Two seats up for grabs in Spring-Ford region," touting a local school board election. The next hyperlink is to the home page of the Ford Motor Co., which displays information concerning the company's third quarter financial results, as well as its new luxury mini-van.

"What's missing here is the context," Barak Pridor, CEO of ClearForest Corp., of New York City, a developer of data management software, told United Press International. "Information is only meaningful when it is in context."


Computer scientists around the globe, funded by private investors and by government agencies such as the National Science Foundation and Science Foundation Ireland, are seeking to solve this vexing dilemma. The search problem is inherent in the Internet -- a technology already 30 years old that has been commercialized only within the last decade.

"The Web was designed for data on one end, and a person on the other end," Steven Cherry, senior associate editor at the IEEE Spectrum Magazine, a leading computer science journal, told UPI. "But that doesn't work so well," he added.

Using a combination of statistical mathematics, heuristics, artificial intelligence and new computer languages, researchers are developing a "Semantic Web," as it is called, which responds to online queries more effectively. The new tools are enabling users -- now on internal corporate networks and, within a year, on the global Internet -- to search using more natural language queries.

To make the Web more people-friendly, scientists are striving to make documents placed online more machine-readable.

One approach taken by ClearForest is to develop devices called tags that can be embedded in information, such as documents or Web pages, to help other computers identify exactly what kind of information it contains.


"There are a lot of initiatives concerning taking unstructured content and structuring it," said Pridor. "Once it is structured, there are a slew of things you can do with it."
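The tagging idea can be illustrated with a minimal Python sketch. It is not ClearForest's software, which this article does not describe in detail; the lookup table, the tag format and the function name tag_text are all assumptions made for illustration. Known names are simply wrapped in inline tags that state what kind of thing each one is.

```python
import re

# Hypothetical lookup table mapping known names to entity types.
# A real system would use far richer rules; this is an assumption.
GAZETTEER = {
    "Ford Motor Co.": "organization",
    "Henry Ford": "person",
    "Spring-Ford": "place",
}

def tag_text(text, gazetteer=GAZETTEER):
    """Wrap each known name in an inline tag naming its entity type."""
    # Try longer names first so a long name wins over a shorter overlap.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    def wrap(match):
        name = match.group(0)
        return '<entity type="{}">{}</entity>'.format(gazetteer[name], name)

    return pattern.sub(wrap, text)

tagged = tag_text("Henry Ford founded the Ford Motor Co. in 1903.")
print(tagged)
```

Once the text carries tags like these, a program can tell the person from the company without guessing from the bare word "Ford."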

Customers such as the Federal Bureau of Investigation and the Department of Homeland Security are using the software tools developed by ClearForest to search through old file systems, computer hard disks and intranets to locate intelligence.

"If there are 10 million intelligence reports, and you tag all the people in there, you can answer some interesting questions," said Pridor. "Who is popping up? Who are the people this person is familiar with? Who did he go to school with? In addition to relationships, you can add organizations, and geographical information. What are all the possible links between Zacarias Moussaoui and al-Qaida?"
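The kind of question Pridor describes can be sketched as a simple count over documents that have already been tagged. The Python below is an illustration only: the reports, the names in them and the helper functions mention_counts and co_mentions are hypothetical, and a production system would presumably work at a very different scale.

```python
from collections import Counter

# Hypothetical, already-tagged reports. Each report is reduced to the
# set of (entity type, name) pairs that a tagger found in it.
reports = [
    {("person", "Alice"), ("person", "Bob"), ("organization", "Acme")},
    {("person", "Alice"), ("person", "Carol"), ("place", "Chicago")},
    {("person", "Alice"), ("person", "Bob"), ("place", "Chicago")},
]

def mention_counts(reports):
    """Count how many reports mention each entity ("Who is popping up?")."""
    return Counter(entity for report in reports for entity in report)

def co_mentions(reports, entity):
    """Count the entities that share at least one report with `entity`."""
    linked = Counter()
    for report in reports:
        if entity in report:
            linked.update(report - {entity})
    return linked

alice = ("person", "Alice")
counts = mention_counts(reports)
links = co_mentions(reports, alice)

print(counts[alice])             # number of reports mentioning Alice
print(links[("person", "Bob")])  # number of reports naming Alice and Bob together
```

Because organizations and places are tagged the same way as people, the same two functions answer questions about any of them.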

The same technology can be used for Web searches, too, enabling users to find a biography of "Henry Ford online, or information about Ford quarter panels," said Pridor.

Investors -- including Greylock -- have given ClearForest $7.5 million in recent weeks to take its technology to the next level.

Computer scientists also are employing artificial intelligence for the Semantic Web, James Lester, chief scientist and chairman of LiveWire Logic in Research Triangle Park, N.C., a linguistic software agent developer, told UPI.


"People use natural language to communicate," Lester said, "but that's not the way it is for computers."

Since the 1950s, computer scientists have been developing ways to represent knowledge so computers can draw inferences from information, he explained. Agreed-upon encoding will enable machines to understand each document.

The World Wide Web Consortium is promoting other search technology approaches as well. W3C is a collaborative research project led by some of the world's leading research universities, including the Massachusetts Institute of Technology in Cambridge, the University of Chicago and Stanford University in California.

Commercial companies are using a technology promoted by the consortium, called the Resource Description Framework, or RDF, to help catalog digital documents just as a librarian would catalog hardback books.
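RDF describes a resource as a series of three-part statements: a subject, a property and a value. The Python sketch below shows that idea without any RDF library. The document address under example.org and the helper names to_ntriples and find are assumptions for illustration; the property names come from the widely used Dublin Core vocabulary, and the output follows the simple N-Triples line format for plain text values.

```python
# Dublin Core element namespace, a common vocabulary for catalog records.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical identifier for the document being cataloged.
doc = "http://example.org/articles/search-engines"

# Each statement is a (subject, property, value) triple.
triples = [
    (doc, DC + "title", "The Web: Search engines still evolving"),
    (doc, DC + "creator", "Gene J. Koprowski"),
    (doc, DC + "subject", "Semantic Web"),
]

def to_ntriples(triples):
    """Serialize triples with plain string values as N-Triples lines."""
    lines = []
    for subject, prop, value in triples:
        # Escape backslashes and double quotes inside the literal.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('<{}> <{}> "{}" .'.format(subject, prop, literal))
    return "\n".join(lines)

def find(triples, prop=None, value=None):
    """Return the subjects whose statements match the given property and value."""
    return [
        s for s, p, v in triples
        if (prop is None or p == prop) and (value is None or v == value)
    ]

print(to_ntriples(triples))
print(find(triples, DC + "creator", "Gene J. Koprowski"))
```

A search for everything by a given author then becomes a lookup on the creator property instead of a keyword match against the full text, much like consulting a card catalog.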

This kind of ability is enabling public health authorities to track epidemiological data from disparate sources, Karen Cummings, vice president of marketing at Metatomix Inc., of Waltham, Mass., an integration technology developer, told UPI.

"They can look at information, from across the arena, and see what the trends are," Cummings said.

Such developments are making the Web more "commercially relevant," Jason Wiener, a technologist who is developing a new search engine company in Chicago, told UPI.


Companies such as IBM Corp. and Sun Microsystems also are involved in the Semantic Web effort. Experts say health and bioinformatics -- the new field that is a combination of biology and computing -- will benefit greatly from the searching capabilities, as will regular consumers.

"People aren't doing this out of the goodness of their hearts. There is significant commercial potential. They will soon introduce new kinds of services," Lester said. The market opportunity here is significant for technology companies, he added.

"Keyword searching is common today," Wiener said. "But the next generation of the Web is making documents more contextually relevant. The relevance of each document to a particular topic, or search, will be related by the semantic tagging language that developers are working on now in fields from artificial intelligence to relational databases to statistics. People have been actively pursuing this for two or three years now to evolve the Web. Several efforts are starting to roll out. I predict that in the next six months to a year, you will begin to see semantic relationship searching on the 'Net."


