
Program Information

The Value of Prior Knowledge in Machine Learning of Complex Systems



D Craft1*, D Ferranti2, (1) Massachusetts General Hospital, Cambridge, AA, (2) Massachusetts General Hospital, Boston, Massachusetts

Presentations

MO-F-CAMPUS-JT-1 (Monday, July 31, 2017) 4:30 PM - 5:30 PM Room: Joint Imaging-Therapy ePoster Theater


Purpose: To develop genomics-based machine learning approaches for predicting how a patient will respond to a proposed treatment. Given the complexity of this problem, we begin by analyzing learning methods on data from simulated systems, which give us access to a known ground truth. We examine the benefits of using prior knowledge of the system and investigate how accuracy depends on various system parameters and on the amount of training data available.

Methods: We generate random hierarchical Boolean networks (directed graphs with 0/1 node states and logical node-update rules), the simplest computational systems that can mimic the dynamic behavior of cellular systems. Boolean networks can be generated and simulated at scale and have complex yet cyclical dynamics; they therefore provide a useful framework for developing machine learning algorithms for modular, hierarchical networks such as those underlying cancer. We apply a variety of machine learning algorithms (SVMs, random forests, elastic nets, etc.) to the simulated data for various network sizes and mutation rates, both naively and using prior knowledge of how the networks are wired.
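The network-generation and simulation step can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: it builds a flat Kauffman-style random N-K network (each node reads k randomly chosen inputs through a random truth table) rather than the hierarchical networks described above, updates all nodes synchronously, and models a mutation as a node knocked out to a constant value. The helper names (random_boolean_network, step, find_attractor, knock_out) are hypothetical.

```python
import random
from typing import List, Tuple

State = Tuple[int, ...]
Wiring = List[List[int]]   # wiring[i] = input nodes of node i (assumed flat, not hierarchical)
Tables = List[List[int]]   # tables[i][j] = output of node i for input pattern j


def random_boolean_network(n: int, k: int, seed: int = 0) -> Tuple[Wiring, Tables]:
    """Give each of n nodes k distinct random inputs and a random truth table."""
    rng = random.Random(seed)
    wiring = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return wiring, tables


def step(state: State, wiring: Wiring, tables: Tables) -> State:
    """Synchronously update every node from the current state."""
    new_state = []
    for node, sources in enumerate(wiring):
        pattern = 0
        for src in sources:  # pack the input bits into a truth-table index
            pattern = (pattern << 1) | state[src]
        new_state.append(tables[node][pattern])
    return tuple(new_state)


def find_attractor(state: State, wiring: Wiring, tables: Tables) -> List[State]:
    """Iterate until a state repeats and return the repeating cycle (the attractor)."""
    first_seen = {}
    trajectory: List[State] = []
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, wiring, tables)
    return trajectory[first_seen[state]:]


def knock_out(tables: Tables, node: int, value: int) -> Tables:
    """Hypothetical mutation model: force `node` to output a constant value."""
    mutated = [list(t) for t in tables]  # copy so the wild-type rules are untouched
    mutated[node] = [value] * len(tables[node])
    return mutated


if __name__ == "__main__":
    wiring, tables = random_boolean_network(n=8, k=2, seed=1)
    rng = random.Random(2)
    start = tuple(rng.randint(0, 1) for _ in range(8))
    wild_type = find_attractor(start, wiring, tables)
    mutant = find_attractor(start, wiring, knock_out(tables, node=3, value=0))
    print("wild-type attractor length:", len(wild_type))
    print("mutant attractor length:", len(mutant))
```

Because an n-node network has only 2^n states, a synchronous trajectory must eventually revisit a state, so find_attractor always terminates. A readout of the attractor (for example, whether some output node is on) could then serve as a simulated response label; how the authors actually defined labels is not stated in the abstract.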

Results: Machine learning algorithms that can learn nonlinear relationships (SVMs and random forests) outperform algorithms based on linear relationships, such as logistic regression: for the baseline networks, classification accuracy is 92% vs. 86%. The use of prior knowledge dramatically increases performance, from 75% to 92%. For the baseline networks, achieving the same accuracy without prior knowledge requires on the order of 10 times as much training data.
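Why wiring knowledge saves training data can be illustrated with a toy calculation. This is a hypothetical sketch, not the authors' experiment: the response label is assumed to be the parity (XOR) of three "indicator genes" out of ten, a relationship no linear model can represent exactly because parity is not linearly separable, and the learner is a simple lookup-table classifier (majority label per observed input pattern) standing in for the SVMs and random forests used in the study. With prior knowledge, the table is keyed on the 3 known regulators (2^3 = 8 patterns); without it, on all 10 genes (2^10 = 1024 patterns), so the same training set covers far fewer of the patterns met at test time. The gene indices, sample sizes, and target function are all assumptions.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[int, ...]
N_GENES = 10
INDICATORS = [0, 1, 2]  # hypothetical "known regulators" supplied as prior knowledge


def response(x: Sample) -> int:
    """Hypothetical ground truth: parity (XOR) of the three indicator genes."""
    return x[0] ^ x[1] ^ x[2]


def simulate(rng: random.Random, m: int) -> Tuple[List[Sample], List[int]]:
    """Draw m random binary gene profiles and label them with the ground truth."""
    xs = [tuple(rng.randint(0, 1) for _ in range(N_GENES)) for _ in range(m)]
    return xs, [response(x) for x in xs]


def fit_lookup(xs, ys, features: Sequence[int]) -> Callable[[Sample], int]:
    """Lookup-table classifier: majority label per observed pattern of `features`."""
    votes = defaultdict(Counter)
    for x, y in zip(xs, ys):
        votes[tuple(x[i] for i in features)][y] += 1
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    fallback = Counter(ys).most_common(1)[0][0]  # used for patterns never seen in training
    return lambda x: table.get(tuple(x[i] for i in features), fallback)


def accuracy(model: Callable[[Sample], int], xs, ys) -> float:
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)


rng = random.Random(0)
train_x, train_y = simulate(rng, 200)
test_x, test_y = simulate(rng, 1000)

naive = fit_lookup(train_x, train_y, features=list(range(N_GENES)))
informed = fit_lookup(train_x, train_y, features=INDICATORS)

print(f"without prior knowledge: {accuracy(naive, test_x, test_y):.2f}")
print(f"with prior knowledge:    {accuracy(informed, test_x, test_y):.2f}")
```

The accuracies this prints depend entirely on the assumed target and sample sizes and are not meant to reproduce the 75% vs. 92% figures above; the sketch only shows the mechanism, namely that restricting inputs to known regulators shrinks the space the learner must cover.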

Conclusion: Nonlinear machine learning algorithms combined with prior knowledge are crucial for learning complex systems. While single-gene relationships continue to be sought for personalized health-care decision-making, this work, based on simulated networks, gives clear evidence that learning the complex relationships among key sets of indicator genes will yield stronger classification and prediction methods.

