Increasing faith in machine-learning models

Probabilistic machine learning methods are becoming increasingly powerful tools in data analysis, informing a range of crucial decisions across disciplines and applications, from forecasting election results to predicting the effect of microloans on reducing poverty.

This class of methods uses sophisticated concepts from probability theory to handle uncertainty in decision-making. But the math is only one piece of the puzzle in determining their accuracy and effectiveness. In a typical data analysis, researchers make many subjective choices and may introduce human error; these, too, must be assessed to build users’ confidence in the quality of decisions based on these methods.

To address this problem, Tamara Broderick, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems (LIDS) at MIT, and a group of researchers have created a classification system, a “taxonomy of trust.” It outlines where trust might break down in a data analysis and suggests ways to strengthen trust at each stage. The other researchers on the project are Professor Anna Smith from the University of Kentucky, Professors Tian Zheng and Andrew Gelman from Columbia University, and Professor Rachael Meager from the London School of Economics. The team wants to draw attention both to issues that have already been extensively researched and to those that require additional study.

In their paper, published in February in Science Advances, the researchers start by describing the steps in the data analysis process where trust might break down: Analysts choose which data to collect and which models, or mathematical representations, best reflect the real-life problem or question they are trying to address. They select algorithms to fit the model, and then they write code to run those algorithms. Each of these steps poses its own challenges for building trust. Some components can be checked for accuracy in measurable ways. “Does my code have bugs?”, for example, is a question that can be tested against objective standards. Other issues are more subjective and lack clear-cut answers; analysts must choose among many strategies for gathering data and must judge whether a model accurately reflects the real world.

“What I find nice about this taxonomy is that it really highlights where people are focusing,” Broderick says. “I believe that a lot of research naturally concentrates on this level of ‘Are my algorithms solving a particular mathematical problem?’ in part because it’s so objective, even though it’s a challenging problem.

“In my opinion, it’s difficult to answer the question ‘Is it reasonable to mathematize an important applied problem in a certain way?’ because it is no longer just a mathematical problem.”

A model’s representation of reality
Although it may appear abstract, the researchers’ categorization of where trust can break down is grounded in real-world application.

Meager, a co-author on the paper, examined whether microfinance can benefit a community. The project served as a case study for how to lower the risk of trust breaking down at various points in an analysis.

Assessing the impact of microfinance might appear to be a simple task at first glance. But as with any analysis, researchers face difficulties at each stage that may undermine confidence in the results. Microfinance, in which people or small businesses receive microloans and other financial services instead of traditional banking, can provide a variety of services depending on the program. For the analysis, Meager gathered datasets from microfinance programs in a variety of nations, including Mexico, Mongolia, Bosnia, and the Philippines.

Researchers must determine whether particular case studies can reflect larger trends when combining datasets that are obviously different, in this case from various countries and across different cultures and geographies. Contextualizing the available data is also crucial. Owning goats, for instance, could be considered an investment in rural Mexico.

“It’s challenging to assess a person’s quality of life. People measure things like ‘What is the business profit of a small business?’ or ‘What is the consumption level of a household?’ There is the potential for a mismatch between what you ultimately care about and what you’re measuring,” Broderick says. “What information and assumptions are we relying on before we reach the mathematical level?”

Analysts must specify the questions about the real world they hope to address using the data at hand. Analysts must decide what constitutes a successful outcome when assessing the advantages of microfinance. When a microfinance program is implemented in a community, it is common practice in economics to assess the average financial gain per business. Reporting an average, however, might imply a net beneficial impact even if only a small number of people (or even just one) benefited, as opposed to the community at large.
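
The gap between an average and the share of people who benefit can be seen in a small sketch. The profit changes below are invented for illustration and are not from the study, and the names `profit_changes` and `share_benefiting` are hypothetical.

```python
from statistics import mean, median

# Hypothetical change in profit per business after a microfinance program.
# These numbers are made up for illustration; they are not from the study.
profit_changes = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1000]


def share_benefiting(changes):
    """Fraction of businesses whose profit strictly increased."""
    return sum(1 for c in changes if c > 0) / len(changes)


average_gain = mean(profit_changes)      # the average suggests a net benefit
median_gain = median(profit_changes)     # the typical business saw no change
benefit_share = share_benefiting(profit_changes)  # only one in ten gained

print("average gain:", average_gain)
print("median gain:", median_gain)
print("share benefiting:", benefit_share)
```

In this made-up case the average gain is positive even though nine of the ten businesses saw no change, which is why reporting the median or the share of beneficiaries alongside the mean can bring the reported quantity closer to the question of interest.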

“What you really wanted was for lots of people to benefit,” Broderick says. “It sounds simple: why didn’t we measure the thing that we cared about? But I think it’s really common that practitioners use standard machine learning tools, for a lot of reasons. And the proxy reported by these tools might not always match the quantity of interest.”

Analysts may, consciously or unconsciously, favor models they are familiar with, particularly after investing a lot of time in learning their ins and outs. “Someone might be hesitant to try a nonstandard method because they are less confident they will use it correctly. Or peer review may favor certain well-known techniques, even if a researcher would prefer to use nonstandard methods,” Broderick says. “Socially, there are numerous reasons. However, this can affect trust.”

The final step: checking the code
While distilling a real-life problem into a model can be a big-picture, amorphous problem, checking the code that executes an algorithm can feel “prosaic,” according to Broderick. However, it is another area, easily overlooked, where trust can be strengthened.

In some cases, checking a coding pipeline that carries out an algorithm may be considered outside the purview of an analyst’s job, especially when standard software packages are available.

Checking whether code is reproducible is one way to find bugs. But depending on the field, sharing code along with published work isn’t always required or expected. As models grow more complex over time, it becomes harder to recreate the code from scratch, and reproducing a model becomes very difficult or even impossible.
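
One simple form of reproducibility check is to rerun an analysis with the same inputs and the same random seed and confirm that the output is identical. The sketch below is only an illustration: `run_analysis` is a hypothetical toy function standing in for a real pipeline, `fingerprint` is a hypothetical helper, and the data values are made up.

```python
import hashlib
import json
import random


def run_analysis(data, seed):
    """Toy stand-in for an analysis pipeline: a bootstrap estimate of the mean."""
    rng = random.Random(seed)  # local generator, so the seed controls all randomness
    estimates = []
    for _ in range(200):
        resample = [rng.choice(data) for _ in data]
        estimates.append(sum(resample) / len(resample))
    return {"bootstrap_mean": sum(estimates) / len(estimates)}


def fingerprint(result):
    """Stable SHA-256 hash of a result dictionary, used to compare runs."""
    encoded = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


data = [2.0, 3.5, 1.0, 4.0, 2.5]  # made-up values for illustration
first = fingerprint(run_analysis(data, seed=0))
second = fingerprint(run_analysis(data, seed=0))
reproducible = first == second
print("reproducible:", reproducible)
```

Matching fingerprints only show that the code is deterministic in one environment. They do not show that the analysis is correct, which is one reason releasing code so that others can run it is useful.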

“Let’s begin with every journal requiring you to release your code,” Broderick says. “Maybe it doesn’t get completely double-checked and everything isn’t perfect, but let’s start there.”

Gelman, a co-author of the paper, worked on an analysis that forecast the 2020 U.S. presidential election in real time using state and national polls. The team published daily updates in The Economist and also posted their source code online for anyone to download and run for themselves. Throughout the season, input from outsiders helped strengthen the analysis by pointing out both flaws and conceptual problems in the model.

Although there isn’t a single way to build a perfect model, the researchers say analysts and scientists have the chance to strengthen trust at nearly every step.

“None of these things are expected to be perfect,” Broderick says, “but we can expect them to be as good as they can be.”

More information: Tamara Broderick et al, Toward a taxonomy of trust for probabilistic machine learning, Science Advances (2023). DOI: 10.1126/sciadv.abn3999
