In 2021, BloodCounts! developed anomaly-detection machine learning models that can identify the Cambridge SARS-CoV-2 outbreak from rich full blood count data in a pathogen-agnostic manner, that is, without relying on any pathogen-specific information. One of the main objectives of the BloodCounts! consortium is to replicate this detection in larger populations and different cohorts. To achieve this, we aim to apply and refine the previously developed methods on the large BloodCounts! database.
The algorithm relies on detecting a spike in the reconstruction error, i.e. the discrepancy between an input full blood count and the model's reconstruction of it. The threshold at which this error counts as a spike has yet to be determined experimentally. By relating abnormal error signals to clusters of similarly affected patients, the algorithm can confirm the onset of a pandemic with greater confidence and suppress spurious error signals.
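The sketch below illustrates the reconstruction-error idea under stated assumptions: principal component analysis (PCA) acts as a simple linear stand-in for the anomaly-detection model, the data are synthetic, and the 99th-percentile threshold is only a placeholder for the experimentally determined one. It fits the model on baseline data, scores each new sample by its squared reconstruction error, and flags days whose mean error exceeds the threshold. All helper names (`simulate`, `fit_pca`, `reconstruction_error`, `flag_days`) are hypothetical, and the cluster-based confirmation step is not modelled.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_latent = 20, 3
W = rng.normal(size=(n_latent, n_features))


def simulate(n, shift=0.0):
    """Hypothetical synthetic FBC-like samples lying near a 3-dimensional subspace."""
    X = rng.normal(size=(n, n_latent)) @ W + 0.1 * rng.normal(size=(n, n_features))
    X[:, 0] += shift  # an assumed disease effect that breaks the learned structure
    return X


def fit_pca(X, k):
    """Fit a k-component PCA, used here as a linear autoencoder."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def reconstruction_error(X, mean, components):
    """Squared error between each sample and its reconstruction."""
    codes = (X - mean) @ components.T  # encode: compress to k dimensions
    X_hat = codes @ components + mean  # decode: map back to feature space
    return ((X - X_hat) ** 2).sum(axis=1)


def flag_days(errors, day_ids, threshold):
    """Days whose mean reconstruction error exceeds the threshold."""
    return [int(d) for d in np.unique(day_ids) if errors[day_ids == d].mean() > threshold]


# Fit on baseline (pre-outbreak) data; the 99th percentile is a placeholder threshold.
X_train = simulate(500)
mean, components = fit_pca(X_train, k=n_latent)
threshold = np.percentile(reconstruction_error(X_train, mean, components), 99)

# Ten monitoring days of 50 samples each; days 7 to 9 carry the simulated shift.
days = np.repeat(np.arange(10), 50)
X_monitor = np.vstack([simulate(50, shift=3.0 if d >= 7 else 0.0) for d in range(10)])
flagged = flag_days(reconstruction_error(X_monitor, mean, components), days, threshold)
print(flagged)
```

Averaging errors per day damps individual outliers, so only the days carrying the simulated shift should cross the threshold; in practice the threshold and aggregation window would be set experimentally.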
In addition, supervised models such as XGBoost can be trained on compressed rich full blood count data to classify whether an individual is infected with SARS-CoV-2. Because it needs only routine full blood counts, this approach applies to tests performed on in-patients, out-patients and other individuals seeking access to NHS care, including specific at-risk groups.
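A minimal sketch of the compress-then-classify pipeline on synthetic data follows. PCA compresses the features, and logistic regression fitted by gradient descent stands in for XGBoost so that the example needs only NumPy. The latent structure, labels, number of components and learning rate are all assumptions chosen for illustration, and `pca_compress` and `fit_logistic` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_features, n_latent = 1000, 20, 3

# Hypothetical synthetic data: 20 correlated features driven by 3 latent factors,
# with the infection label determined by two of those factors.
latent = rng.normal(size=(n, n_latent))
W = rng.normal(size=(n_latent, n_features))
X = latent @ W + 0.1 * rng.normal(size=(n, n_features))
y = (latent[:, 0] - latent[:, 1] > 0).astype(float)
X_train, X_test, y_train, y_test = X[:800], X[800:], y[:800], y[800:]


def pca_compress(X_train, X_test, k):
    """Project both sets onto the top-k principal components of the training set."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    P = Vt[:k].T
    Z_train, Z_test = (X_train - mean) @ P, (X_test - mean) @ P
    # Standardise with training statistics so gradient descent is well conditioned.
    mu, sd = Z_train.mean(axis=0), Z_train.std(axis=0)
    return (Z_train - mu) / sd, (Z_test - mu) / sd


def fit_logistic(Z, y, lr=0.5, steps=2000):
    """Logistic regression by batch gradient descent on the mean log-loss."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * Z.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b


Z_train, Z_test = pca_compress(X_train, X_test, k=n_latent)
w, b = fit_logistic(Z_train, y_train)
accuracy = ((Z_test @ w + b > 0) == (y_test == 1)).mean()
print(f"held-out accuracy: {accuracy:.3f}")
```

With the `xgboost` package installed, `xgboost.XGBClassifier().fit(Z_train, y_train)` could replace `fit_logistic` at the same point; the compression step stays unchanged.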
To gain further insight into COVID-19, the disease caused by SARS-CoV-2, we can apply state-of-the-art explainability techniques, such as Shapley values, to our classifiers. Shapley values attribute each prediction to the individual input features, revealing which changes in the blood of infected patients informed the model and allowing us to examine how these changes are linked to clinical outcomes.
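For a model with only a handful of inputs, Shapley values can be computed exactly by enumerating every coalition of features, with absent features set to baseline values (one common convention, assumed here). The toy `risk` function is a hypothetical stand-in for a trained classifier's score. Because the cost grows exponentially with the number of features, tree ensembles such as XGBoost would in practice be explained with the `shap` package's `TreeExplainer`.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)
    phi = np.zeros(n)

    def value(coalition):
        # Features in the coalition take their observed values; the rest stay at baseline.
        z = np.array(baseline, dtype=float)
        for j in coalition:
            z[j] = x[j]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for each coalition S that excludes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


def risk(z):
    # Hypothetical toy score: additive in feature 0, interaction between features 1 and 2.
    return 2.0 * z[0] + z[1] * z[2]


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = shapley_values(risk, x, baseline)
print(phi)
```

For this toy input the attributions are 2, 3 and 3: they sum to `risk(x) - risk(baseline)` (the efficiency property), and the interaction term's contribution is split equally between features 1 and 2.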