New search engine for tables is developed

Published: Aug. 8, 2007 at 4:00 PM

STATE COLLEGE, Pa., Aug. 8 (UPI) -- U.S. computer scientists have created a search engine that can identify and extract tables from PDF documents, as well as index and rank the results.

The search engine, called TableSeer and developed by Pennsylvania State University researchers, uses an innovative ranking algorithm that identifies tables found in frequently cited documents and weighs that factor in the search results, Assistant Professor Prasenjit Mitra said.

Mitra said TableSeer is believed to be the first search engine designed for tables.

Although some software can identify and extract tables from text, existing software cannot search for tables across documents, Mitra said. TableSeer automates that process, capturing data not only within each table but also in its title and footnotes. It also supports column-name-based searches, so a user can search for a particular column in a table.

The development of TableSeer is part of an open-source project funded by the National Science Foundation.

TableSeer can be tested online at http://chemxseer.ist.psu.edu. The source code will be made available near the completion of the project, the researchers said.

© 2007 United Press International, Inc. All Rights Reserved.