Can a machine learning model predict T2D?

Although recent studies have linked the gut microbiota to T2D, the new study is the first to evaluate the gut microbiome as a predictor of several T2D-associated parameters in a longitudinal setting. Such studies are increasingly needed as case numbers grow: the prevalence of T2D has doubled since 1980, placing a heavy burden on health systems.

The researchers used prospective data from 608 Finnish men obtained from a national database of men with metabolic syndrome. Their goal was to develop machine learning models that predict glucose and insulin measurements over both the short term (18 months) and the longer term (4 years). Including the identified gut microbiome markers improved the prediction accuracy of models for T2D-associated parameters such as glycated hemoglobin (A1C) and the insulin secretion index.

Machine learning is a branch of artificial intelligence in which computers learn from data rather than being explicitly programmed. Generally, the more data a model is exposed to, the better it becomes at recognizing patterns.

Previous studies have shown that levels of bacteria such as Roseburia and Bifidobacterium are altered in patients with T2D. However, most of this work relied on cross-sectional rather than prospective data, so the microbiome could not be evaluated as a predictive tool.

In the current study, random forest models were trained to predict metabolic outcomes, including fasting glucose and fasting insulin, from the baseline microbiome, baseline metabolic measurements, and additional covariates. Training was repeated 200 times, each time with a different random split of the data into training and test sets. Features were then extracted to identify candidate biomarkers, and a local effect method was used to characterize how each one influenced the prediction of the corresponding metabolic trait.
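The article does not name the local effect method. Accumulated local effects (ALE) is one common choice for tree ensembles such as random forests, and the sketch below assumes first-order ALE. It works with any prediction function: for each quantile bin of one feature, it measures how the prediction changes when only that feature moves from the bin's lower to its upper edge, averages those changes, and accumulates them across bins. The function name `ale_1d`, the five-bin default, and the toy linear model are illustrative assumptions, not the study's code; in practice `predict` would wrap a trained random forest.

```python
from bisect import bisect_left
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def ale_1d(
    predict: Callable[[Row], float],
    X: Sequence[Row],
    j: int,
    n_bins: int = 5,
) -> Tuple[List[float], List[float]]:
    """First-order accumulated local effects (ALE) of feature j (hypothetical helper).

    Returns the bin edges and the centred ALE value at each edge.
    """
    values = sorted(row[j] for row in X)
    n = len(values)
    # Quantile-based bin edges taken from observed values (duplicates dropped).
    edges = sorted({values[round(q * (n - 1) / n_bins)] for q in range(n_bins + 1)})
    if len(edges) < 2:
        raise ValueError("feature j needs at least two distinct values")
    k_bins = len(edges) - 1
    diffs = [0.0] * k_bins
    counts = [0] * k_bins
    for row in X:
        # Bin 0 is [edges[0], edges[1]]; bin k > 0 is (edges[k], edges[k+1]].
        k = bisect_left(edges, row[j], 1) - 1
        lo, hi = list(row), list(row)
        lo[j], hi[j] = edges[k], edges[k + 1]
        # Local effect: change in prediction when only feature j crosses the bin.
        diffs[k] += predict(hi) - predict(lo)
        counts[k] += 1
    # Accumulate the average local effect of each bin.
    ale = [0.0]
    for k in range(k_bins):
        ale.append(ale[-1] + diffs[k] / counts[k])
    # Centre so the count-weighted mean effect (over bin midpoints) is zero.
    mean = sum(c * (ale[k] + ale[k + 1]) / 2 for k, c in enumerate(counts)) / sum(counts)
    return edges, [a - mean for a in ale]


# Toy check: for a linear model, each ALE step equals slope x bin width.
X = [[float(i), float(i % 3)] for i in range(20)]
edges, effects = ale_1d(lambda r: 2.0 * r[0] + r[1], X, j=0)
print(edges)
print(effects)
```

Because ALE only perturbs a feature within narrow bins of its observed values, it avoids evaluating the model on unrealistic feature combinations, which matters when microbial abundances are correlated with clinical covariates.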

The results suggest that, over the 18-month horizon, using the microbiome as a predictor can improve prediction accuracy for the secretion index, A1C, and 2-hour insulin levels. Models with microbial predictors outperformed models without them in 61% of the 200 runs for the secretion index, 70.5% for 2-hour insulin, and 64.5% for A1C.

Over the 4-year horizon, including the microbiome improved accuracy for the secretion index, fasting insulin, and 2-hour insulin. Models with microbial predictors outperformed models without them in 69% of the runs for the secretion index, 68.5% for fasting insulin, and 61% for 2-hour insulin.
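A share such as "outperformed in 61% of the runs" can be computed by repeating a random train/test split, fitting one model with and one without the microbial features on the same split, and counting how often the microbiome model has the lower test root mean square error (RMSE). The minimal sketch below assumes that reading. To stay self-contained it swaps in ordinary least squares for the random forests the study used and runs on synthetic data; the helper names (`fit_ols`, `win_rate`), the 25% test fraction, and the data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fit_ols(X: Matrix, y: Sequence[float]) -> List[float]:
    """Least-squares coefficients (intercept first) via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(a[r] * a[c] for a in A) for c in range(p)]
        + [sum(a[r] * t for a, t in zip(A, y))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[r][p] / M[r][r] for r in range(p)]


def predict(beta: Sequence[float], row: Sequence[float]) -> float:
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def columns(X: Matrix, ids: Sequence[int], cols: Sequence[int]) -> Matrix:
    return [[X[i][c] for c in cols] for i in ids]


def win_rate(
    X: Matrix,
    y: Sequence[float],
    base_cols: Sequence[int],
    full_cols: Sequence[int],
    n_repeats: int = 200,
    test_frac: float = 0.25,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Share of random splits where the model with extra features has lower test RMSE."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    deltas = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        n_test = int(len(idx) * test_frac)
        test, train = idx[:n_test], idx[n_test:]
        y_train = [y[i] for i in train]
        y_test = [y[i] for i in test]
        errs = []
        for cols in (base_cols, full_cols):
            beta = fit_ols(columns(X, train, cols), y_train)
            errs.append(rmse(y_test, [predict(beta, r) for r in columns(X, test, cols)]))
        # Positive delta: the model with the extra features did better on this split.
        deltas.append(errs[0] - errs[1])
    return sum(d > 0 for d in deltas) / n_repeats, deltas


# Hypothetical synthetic data: column 0 = clinical covariate, column 1 = microbial feature.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(300)]
y = [r[0] + 0.5 * r[1] + data_rng.gauss(0, 1) for r in X]
share, deltas = win_rate(X, y, base_cols=[0], full_cols=[0, 1])
print(f"microbiome model won {share:.1%} of {len(deltas)} splits")
```

Comparing both models on the same split pairs the errors, so the per-split differences reflect the added features rather than luck in which participants landed in the test set.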

However, the difference in root mean square error (RMSE) between models with and without microbial predictors varied widely across runs, so how much microbiome data can actually improve prediction accuracy remains unclear.
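Why a majority win rate does not settle the question can be seen by summarizing the per-run RMSE differences (RMSE without minus RMSE with microbial predictors) by their mean and a 95% percentile interval. The sketch below uses a short, hypothetical list of differences, not values from the study, and the helper name `summarize_deltas` is illustrative. Here the microbiome model "wins" 70% of the time, yet the interval spans zero, so the typical gain is small relative to its run-to-run spread.

```python
import statistics
from typing import Sequence, Tuple


def summarize_deltas(deltas: Sequence[float]) -> Tuple[float, float, float, float]:
    """Win share, mean, and 95% percentile interval of per-run RMSE differences.

    Each delta is RMSE(without microbiome) - RMSE(with microbiome),
    so a positive value means the microbiome model did better on that run.
    """
    share = sum(d > 0 for d in deltas) / len(deltas)
    # 39 cut points at 2.5% steps: index 0 is the 2.5th, index -1 the 97.5th percentile.
    cuts = statistics.quantiles(deltas, n=40, method="inclusive")
    return share, statistics.mean(deltas), cuts[0], cuts[-1]


# Hypothetical per-run differences, not values from the study.
deltas = [-0.06, -0.03, -0.01, 0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.08]
share, mean, low, high = summarize_deltas(deltas)
print(f"wins: {share:.0%}, mean delta: {mean:.3f}, 95% interval: [{low:.3f}, {high:.3f}]")
```

With all 200 runs the same summary would be more stable; ten values are used here only to keep the example readable.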

The study also identified novel microbial biomarkers that contributed to prediction accuracy. At the 18-month follow-up, unclassified Muribaculaceae were an important predictor of the secretion index and A1C; in mice, these bacteria have been reported to protect against T2D.

For the 48-month period, the Family XIII AD3011 group and uncultured Rhodospirillales were both important predictors of the secretion index and 2-hour insulin. The order Rhodospirillales includes bacteria known to produce acetic acid, which has been shown to improve insulin sensitivity.

“Our results suggest that bacteria are a means of predicting changes in insulin secretion and the insulin response to glucose uptake,” the authors say.


Aasmets O, Lull K, Lang JM, et al. Machine learning reveals time-varying microbial predictors with complex effects on glucose regulation. mSystems. 2021;6(1). doi:10.1128/mSystems.01191-20
