Most approaches to Big Data do not exploit the valuable structural relationships inherent in the underlying data. For example, to apply conventional linear algebra methods, arrays of dimension 3 or higher are flattened into matrices or vectors, and their structure is lost. In addition, existing methods that attempt to overcome these issues are computationally inefficient and unstable. Big Data tasks such as noise removal, data imputation, and classification would benefit from preserving the information contained in higher-order data structures such as tensors. However, the lack of scalable algorithms for tensor-based methods is a major obstacle to using tensor analysis to leverage the structural information within Big Data. States of critical illness and injury such as sepsis, which generate large, high-velocity streams of heterogeneous structured and unstructured data, could greatly benefit from tensor-based methods of analysis.
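To make the flattening problem concrete, here is a minimal Python sketch. It is an illustration only, not part of the funded project: the patient, sensor, and time-step layout and the helper names `make_tensor` and `flatten` are hypothetical. It shows that once a 3-way array is flattened, the resulting vector no longer records which entries belonged to the same patient, sensor, or time step.

```python
def make_tensor(shape, fill):
    """Build a 3-way tensor as nested lists; fill(i, j, k) supplies each entry."""
    n_i, n_j, n_k = shape
    return [[[fill(i, j, k) for k in range(n_k)]
             for j in range(n_j)]
            for i in range(n_i)]


def flatten(tensor):
    """Row-major flattening of a 3-way tensor into a plain vector."""
    return [x for matrix in tensor for row in matrix for x in row]


# Hypothetical data: 2 patients x 3 sensors x 4 time steps.
# Each value encodes its own position as 100*patient + 10*sensor + time.
readings = make_tensor((2, 3, 4), lambda i, j, k: 100 * i + 10 * j + k)
vector = flatten(readings)

# The same sensor at the same time step for the two patients ends up
# 12 positions apart, and nothing in the vector itself records that link.
same_sensor_pair = (vector[6], vector[18])

# A tensor with a different shape (4 x 3 x 2) flattens to the identical
# vector, so the original shape cannot be recovered from the vector alone.
other = make_tensor((4, 3, 2), lambda i, j, k: vector[(i * 3 + j) * 2 + k])
ambiguous = flatten(other) == vector
```

In this sketch `ambiguous` is true even though `readings` and `other` are organized differently, which is the kind of structural information that tensor-based methods aim to keep.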
To solve this problem, MCIRCC members Harm Derksen, PhD (Department of Mathematics), Kayvan Najarian, PhD (Departments of Computational Medicine and Bioinformatics and Emergency Medicine), Jonathan Gryak (Department of Computational Medicine and Bioinformatics), Kevin Ward, MD (Departments of Emergency Medicine and Biomedical Engineering), and Timothy Cornell, MD (Stanford University) were recently awarded a $1,418,872 grant by the National Science Foundation. Drs. Derksen and Najarian will serve as the principal investigators.
The team’s proposed project will design efficient, numerically stable, and computationally feasible algorithms for crucial tensor operations, using sepsis as a prime example while remaining widely applicable to other Big Data applications. A new theoretical framework for tensor decomposition has been developed that uses colored Brauer diagrams to create a novel algebraic graphical calculus. This calculus, in turn, can be used to obtain explicit formulas for tensor operations. Derksen and his team will use this framework to develop new scalable algorithms for tensor decomposition, denoising, and other Big Data tasks that make full and efficient use of high-dimensional tensor data such as that found in sepsis. The graphical calculus will also be used to give accurate estimates of the computational and memory complexity of these algorithms. The resulting methods can be used in Big Data applications across a wide spectrum of disciplines, including engineering, science, finance, and medicine.
In the spirit of interdisciplinary application, this project fosters collaboration among researchers from mathematics, computer science, engineering, and medicine, and will support two graduate students and a postdoctoral researcher. The project will also support three existing courses at U-M, and the team will use the project to develop new interdisciplinary courses on Big Data methods for the Michigan Institute for Data Science (MIDAS).