University of California, Riverside UCR


Efficient content-based retrieval via data compression

Presented by: Dr. Ertem Tuncel

Abstract:

I will introduce a novel approach to content-based retrieval from very large, high-dimensional (e.g., multimedia) databases. The approach is based on efficient compression of the feature vectors extracted from the objects in the database. The design procedure optimizes query access time by jointly accounting for the database distribution and the query statistics. High compression ratios are achieved using vector quantization (VQ) techniques, namely multi-stage VQ and split-VQ, which are especially suited to limited-memory applications. The data set is first partitioned using the accumulated query history, and each partition of data points is then compressed separately using a vector quantizer tailored to its distribution. The employed VQ techniques inherently provide a spectrum of operating points on the time/accuracy plane. This property is particularly important for large multimedia databases where I/O time is a bottleneck, because it offers the flexibility to trade time for better accuracy. Our experiments demonstrate speedups by factors of 20 to 35 over one of the most popular approximate search techniques.
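
As a rough illustration of why multi-stage VQ yields a range of time/accuracy operating points, the following is a minimal sketch of a generic multi-stage quantizer, not the design presented in the talk. It assumes plain k-means codebooks and greedy stage-by-stage encoding; the query-driven partitioning and split-VQ are omitted, and all names (`kmeans`, `train_msvq`, `encode`, `decode`) are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations (an assumed, simple codebook design)."""
    rng = random.Random(seed)
    codebook = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(codebook, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(c) / len(members) for c in zip(*members)]
    return codebook

def train_msvq(points, k, stages):
    """Train one codebook per stage; each stage quantizes the residual
    left over by the previous stages."""
    codebooks = []
    residuals = [list(p) for p in points]
    for s in range(stages):
        cb = kmeans(residuals, k, seed=s)
        codebooks.append(cb)
        residuals = [
            [x - c for x, c in zip(r, cb[nearest(cb, r)])] for r in residuals
        ]
    return codebooks

def encode(codebooks, v):
    """Greedy encoding: one codeword index per stage."""
    indices, r = [], list(v)
    for cb in codebooks:
        i = nearest(cb, r)
        indices.append(i)
        r = [x - c for x, c in zip(r, cb[i])]
    return indices

def decode(codebooks, indices, stages=None):
    """Sum the chosen codewords; reading fewer stages gives a coarser
    but cheaper approximation."""
    used = indices if stages is None else indices[:stages]
    out = [0.0] * len(codebooks[0][0])
    for cb, i in zip(codebooks, used):
        out = [o + c for o, c in zip(out, cb[i])]
    return out
```

Each vector is stored as a short list of stage indices. Decoding with only the first stage gives a coarse approximation at low cost, and reading further stages refines it, which is one way to picture trading time for accuracy.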