Scientists have created a new machine learning method for analyzing complex scientific data on proteins

Scientists have developed a machine learning method to better interpret data from a powerful scientific tool: nuclear magnetic resonance (NMR). One use of NMR data is to understand proteins and chemical processes in the human body. NMR is closely related to magnetic resonance imaging (MRI), which is used for medical diagnosis.

Scientists can use NMR spectrometers to determine the structure of molecules such as proteins, but it can take highly experienced human specialists a long time to analyze the data. The new machine learning method can analyze the data considerably faster and with the same accuracy.

In a paper recently published in Nature Communications, the scientists described their method, which essentially teaches computers to untangle complicated data on atomic-scale features of proteins and parse them into individual, readable images.

“To be able to use these data, we need to separate them into features from different parts of the molecule and quantify their specific properties,” said Rafael Brüschweiler, senior author of the study, Ohio Research Scholar, and a professor of chemistry and biochemistry at The Ohio State University. “Prior to this, using computers to recognize these separate traits when they overlapped was extremely difficult.”

Dawei Li, the study’s lead author and a research scientist at Ohio State’s Campus Chemical Instrument Center, devised a method for teaching computers to read the images produced by NMR spectrometers. These images, called spectra, are made up of hundreds of thousands of peaks and valleys that can show changes in proteins or complex metabolite mixtures in a biological sample, such as blood or urine, at the atomic level.

The NMR data provide crucial information about a protein’s function as well as important clues about what is happening in a person’s body. However, because the peaks frequently overlap, separating a spectrum into individual, readable peaks can be difficult. The effect is similar to that of a mountain range, where taller, closer peaks hide smaller ones that may also carry vital information.
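The mountain-range effect can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the Lorentzian peak shape, the peak positions, heights and widths, and the helper names are all hypothetical choices made here and are not taken from the study.

```python
def lorentzian(x, center, height, half_width):
    """Assumed peak shape: equals `height` at `center`, half of that at center +/- half_width."""
    return height * half_width**2 / ((x - center) ** 2 + half_width**2)


def spectrum(xs, peaks):
    """Sum the contributions of all (center, height, half_width) peaks at each x."""
    return [sum(lorentzian(x, c, h, w) for c, h, w in peaks) for x in xs]


def count_local_maxima(ys):
    """Count points that are strictly higher than both of their neighbours."""
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])


# Sample positions from -3.00 to 6.00 in steps of 0.01 (arbitrary units).
xs = [i * 0.01 - 3.0 for i in range(901)]

# Two peaks far apart: a tall one at 0.0 and a small one at 3.0.
separated = spectrum(xs, [(0.0, 1.0, 0.5), (3.0, 0.4, 0.5)])

# The same two peaks, but the small one now sits at 0.4, on the tall peak's shoulder.
overlapped = spectrum(xs, [(0.0, 1.0, 0.5), (0.4, 0.4, 0.5)])

print(count_local_maxima(separated), count_local_maxima(overlapped))
```

In the first spectrum both peaks appear as separate maxima. In the second, the small peak no longer produces a maximum of its own, so a simple search for high points finds only one peak even though two are present.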

“Think of the QR code readers on your phone: NMR spectra are like a QR code of a molecule; every protein has its own specific ‘QR code,’” Brüschweiler said. “However, the individual pixels of these ‘QR codes’ can overlap with each other to a significant degree. Your phone would not be able to decipher them. And that is the problem we have had with NMR spectroscopy and that we were able to solve by teaching a computer to accurately read these spectra.”

The process involves creating an artificial deep neural network, a multi-layered network of nodes that the computer uses to separate and analyze data.
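To make the idea of a multi-layered network of nodes concrete, here is a minimal Python sketch of data passing through two layers. It is not the network used in the study: the layer sizes, the hand-picked weights, the ReLU activation, and the function names are assumptions chosen for illustration, whereas a real deep neural network has many more nodes and learns its weights from training data.

```python
def relu(values):
    """Common activation function: negative values become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: each node computes a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(window, layers):
    """Pass a short window of intensities through every layer in turn."""
    activations = window
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:  # no activation after the final layer
            activations = relu(activations)
    return activations


# Hypothetical, hand-picked weights (a trained network would learn these).
layers = [
    # Hidden layer, two nodes: one responds to a bump, the other to a dip.
    ([[-0.5, 1.0, -0.5], [0.5, -1.0, 0.5]], [0.0, 0.0]),
    # Output layer, one node: bump response minus dip response.
    ([[1.0, -1.0]], [0.0]),
]

peak_score = forward([0.2, 1.0, 0.2], layers)[0]    # middle point is highest
valley_score = forward([1.0, 0.2, 1.0], layers)[0]  # middle point is lowest
print(peak_score, valley_score)
```

With these weights the output is positive for the peak-shaped window and negative for the valley-shaped one, which shows how stacked layers of simple nodes can turn raw intensities into a decision about the shape of the data.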

The researchers built the network and then taught it to analyze NMR spectra by feeding it spectra that had already been analyzed by a person and telling it the known correct answer. Training a computer to analyze spectra is similar to teaching a child to read: the researchers began with simple examples.

Once the computer had mastered those, the researchers moved on to more complicated sets. Eventually, they fed the computer highly complex spectra of several proteins as well as of a mouse urine sample.

Using the deep neural network that had been trained to analyze spectra, the computer was able to pick out the peaks in the highly complex sample with the same accuracy as a human expert, the researchers found. Furthermore, the computer completed the task faster and with a high degree of consistency.

According to Brüschweiler, using machine learning to interpret NMR spectra is just one step in the lengthy scientific process of NMR data interpretation. Even so, the advance enhances the capabilities of NMR spectroscopists, particularly those who use Ohio State’s new National Gateway Ultrahigh Field NMR Center, a $17.5 million NSF-funded facility. The facility is set to open in 2022 and will feature North America’s first 1.2 gigahertz NMR spectrometer.

The National Science Foundation and the National Institutes of Health funded this research. Alexandar Hansen, Chunhua Yuan, and Lei Bruschweiler-Li, all of Ohio State’s Campus Chemical Instrument Center, also contributed to the work.
