Thursday, March 5 • 15:15 - 17:15
Poster: 'Providing vectors and matrices support to SimSQL, an analytic relational database system,' Shangyu Luo, Rice University


SimSQL is a scalable, parallel, distributed database system for analytics. Its distinguishing feature is support for stochastic analysis over simulated data: uncertain data that are not actually stored in the database but are produced by calls to statistical distributions. Such data are commonplace in practice because of measurement error and the need for prediction. SimSQL is also well suited to large-scale Bayesian machine learning. For my research, I added native vector and matrix support to SimSQL. Originally, each data value in SimSQL was stored as an integer, a double, or a string. Now values can have two new types: vectors and matrices. With these additional types, data can be stored more efficiently and more meaningfully. For example, suppose we want to store a student’s grades on 10 tests. In the old SimSQL, we had to create a table with 10 rows, each holding the student’s name and one grade, so the name was duplicated 10 times. In the new SimSQL, we need only 1 row, containing the student’s name as a string and the 10 grades as a vector, which saves 9 rows. Apart from saving storage space, the new representation can also speed up many large-scale machine learning algorithms, especially those with natural vector/matrix formulations such as Gaussian mixture models (GMM) and Lasso. Previously, reading in a 100x100 matrix could require scanning thousands of rows, and vector-matrix and matrix-matrix computations were tedious to express. Now a single entry stores a 100x100 matrix, and those vector/matrix computations are much more succinct. Preliminary experiments have verified the performance improvement of the new SimSQL.
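The difference between the two layouts described in the abstract can be sketched in a few lines of plain Python. This is only an illustration of the row counts involved: it is not SimSQL syntax, and the function names, the student name, and the grade values are all hypothetical.

```python
# Hypothetical illustration only: plain Python stand-ins, not SimSQL syntax.

def normalized_rows(name, grades):
    """Old layout: one (name, test_number, grade) row per test."""
    return [(name, i + 1, g) for i, g in enumerate(grades)]

def vector_row(name, grades):
    """New layout: a single row holding the name and a grade vector."""
    return [(name, tuple(grades))]

def matrix_as_tuples(matrix):
    """Old layout for a matrix: one (row, col, value) tuple per entry."""
    return [(i, j, v) for i, row in enumerate(matrix) for j, v in enumerate(row)]

# Made-up grades for one student across 10 tests.
grades = [88, 92, 79, 95, 85, 90, 77, 83, 91, 87]
old = normalized_rows("Alice", grades)
new = vector_row("Alice", grades)
print(len(old), len(new))  # 10 1

# A 100x100 matrix needs 10,000 tuples in the per-entry layout,
# versus a single matrix-typed value in the new layout.
matrix = [[float(i * 100 + j) for j in range(100)] for i in range(100)]
print(len(matrix_as_tuples(matrix)))  # 10000
```

In the per-entry layout the student's name is repeated in every row, and rebuilding the matrix means scanning all 10,000 tuples, which is the overhead the vector and matrix types are meant to avoid.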


Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative, 6500 Main Street, Houston, TX 77005