Monday, March 19, 2012

Paper # 6
SPARK: A Keyword Search Engine on Relational Databases

Relational databases are one of the most popular ways to store text data. Unfortunately, keyword search systems for these databases are not effective, since they use query templates to "map keyword search to full-text matching within one or more attributes." IMDb uses this technique.

The main purpose of the paper is to demonstrate SPARK, the system the authors developed to address this problem.

The SPARK Server contains three main components:
-Non-free tuple set constructor: This component builds the set of tuples from a relation that contain at least one match to a query keyword.
-Candidate network generator: Uses relational algebra to combine the tuple sets from the previous step into a set of minimal candidate networks whose sizes are within a user-defined threshold.
-Query processor: Uses four algorithms (sparse, global pipeline, skyline sweeping, and block pipeline) to find the top-k results.
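
To make the three stages more concrete, here is a minimal Python sketch of the same pipeline on a toy in-memory database. It is only an illustration and not SPARK's implementation: the ACTORS and MOVIES data, the function names, the requirement that a result cover every keyword, and the "smaller joins rank higher" score are all assumptions of mine. The top-k step is a naive full enumeration, not any of the four algorithms listed above.

```python
import heapq
import re

# Toy database (hypothetical data): two relations linked by a foreign key.
ACTORS = [
    {"id": 1, "name": "Nicolas Cage"},
    {"id": 2, "name": "John Travolta"},
]
MOVIES = [
    {"id": 10, "title": "Face/Off", "actor_id": 1},
    {"id": 11, "title": "Cage of Glory", "actor_id": 2},
    {"id": 12, "title": "Pulp Fiction", "actor_id": 2},
]


def matches(row, keywords):
    """Return the query keywords that occur as words in the row's text attributes."""
    words = set()
    for value in row.values():
        if isinstance(value, str):
            words.update(re.findall(r"\w+", value.lower()))
    return {k for k in keywords if k in words}


def non_free_tuples(relation, keywords):
    """Stage 1: keep only the tuples that match at least one query keyword."""
    kept = []
    for row in relation:
        hit = matches(row, keywords)
        if hit:
            kept.append((row, hit))
    return kept


def candidate_results(keywords, max_size=2):
    """Stage 2 (simplified): join non-free tuples, up to max_size tuples per result."""
    actors = non_free_tuples(ACTORS, keywords)
    movies = non_free_tuples(MOVIES, keywords)
    results = [((row,), hit) for row, hit in actors + movies]  # size-1 networks
    if max_size >= 2:
        for actor, a_hit in actors:  # size-2 network: Actors joined with Movies
            for movie, m_hit in movies:
                if movie["actor_id"] == actor["id"]:
                    results.append(((actor, movie), a_hit | m_hit))
    # Assumption: a result must cover every query keyword.
    return [(rows, hit) for rows, hit in results if hit == set(keywords)]


def top_k(keywords, k=3, max_size=2):
    """Stage 3 (naive): score every candidate result and keep the k best."""
    keywords = [w.lower() for w in keywords]
    # Hypothetical score: results built from fewer tuples rank higher.
    scored = [(1.0 / len(rows), rows) for rows, _ in candidate_results(keywords, max_size)]
    return heapq.nlargest(k, scored, key=lambda item: item[0])


if __name__ == "__main__":
    for score, rows in top_k(["cage", "face"]):
        print(score, rows)
```

Raising or lowering max_size plays the role of the user-defined candidate network threshold: with max_size=1 the query "cage face" has no answer in this toy data, because no single tuple contains both words.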

The third part of the paper is a walkthrough of SPARK's features. The casual user mode is simple and needs little explanation, but the professional search mode allows much more customization of queries. For example, the user can restrict a keyword to a specific attribute of a chosen table. The example they give is a search for the name "cage" in the actor.name attribute of the actors table. A user can also set parameters for the three main SPARK components; one such example would be specifying the candidate network threshold.
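
The attribute-restricted search can be pictured with a small Python sketch using the standard sqlite3 module. This is not SPARK's query syntax or code, which the paper summary does not give: the restricted_search function, the table schema, and the sample rows are hypothetical, and plain substring matching with LIKE stands in for real full-text matching.

```python
import sqlite3


def restricted_search(conn, table, attribute, keyword):
    """Return rows of `table` whose `attribute` contains `keyword`, ignoring case."""
    # Identifiers cannot be bound as SQL parameters, so validate them first.
    if not (table.isidentifier() and attribute.isidentifier()):
        raise ValueError("invalid table or attribute name")
    sql = f"SELECT * FROM {table} WHERE lower({attribute}) LIKE ?"
    return conn.execute(sql, (f"%{keyword.lower()}%",)).fetchall()


# Hypothetical schema and data, chosen to mirror the "cage" example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT, biography TEXT)")
conn.executemany(
    "INSERT INTO actors (name, biography) VALUES (?, ?)",
    [
        ("Nicolas Cage", "American actor."),
        ("John Travolta", "Starred opposite Cage in Face/Off."),
    ],
)

# Restricting the keyword to actors.name skips the match in the biography column.
name_hits = restricted_search(conn, "actors", "name", "cage")
bio_hits = restricted_search(conn, "actors", "biography", "cage")
```

Here name_hits contains only the Nicolas Cage row, while bio_hits contains only the John Travolta row, which shows why restricting the attribute changes the result set.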