Machine learning & AI

Researchers identify flaws in using source reputation to train misinformation detection algorithms.

Researchers at Rutgers University have found a significant flaw in the way algorithms designed to detect “fake news” assess the credibility of online news stories.

According to the researchers, rather than evaluating the credibility of each individual article, the majority of these algorithms rely on a credibility score for the “source” of the article.

“It is not the case that all news articles published by sources labeled ‘credible’ (e.g., The New York Times) are accurate, nor is it the case that every article published by sources labeled ‘non-credible’ is ‘fake news,’” said Vivek K. Singh, an associate professor at the Rutgers School of Communication and Information and co-author of the study “Misinformation Detection Algorithms and Fairness across Political Ideologies: The Impact of Article Level Labeling,” which was made available on OSF Home.

Lauren Feldman, an associate professor of journalism and media studies at the School of Communication and Information and a co-author of the paper, added, “Our analysis shows that labeling articles for misinformation based on the source is as bad an idea as just flipping a coin and assigning true or false labels to news stories.”


The researchers found that source-level credibility labels agree with article-level labels only 51% of the time, making source-level labeling an unreliable proxy for an article’s credibility. This labeling practice has significant implications for efforts such as building robust fake-news detectors and for audits of fairness across the political spectrum.
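The 51% figure can be understood as a simple agreement rate: for each article, check whether the label inherited from its source matches the article’s own ground-truth label. The sketch below illustrates that calculation with invented toy data (the sources, labels, and function name are hypothetical, not from the study’s dataset):

```python
# Hypothetical sketch: how often do source-level credibility labels agree
# with per-article ground-truth labels? Labels: 1 = credible/true, 0 = not.

def source_label_agreement(articles, source_credibility):
    """Fraction of articles whose source's label matches the article's own label."""
    matches = sum(
        1 for source, article_label in articles
        if source_credibility[source] == article_label
    )
    return matches / len(articles)

# Toy data: (source name, article-level ground-truth label).
articles = [
    ("outlet_a", 1),
    ("outlet_a", 0),  # a false story from a "credible" source
    ("outlet_b", 0),
    ("outlet_b", 1),  # a true story from a "non-credible" source
]
source_credibility = {"outlet_a": 1, "outlet_b": 0}

print(source_label_agreement(articles, source_credibility))  # 0.5: coin-flip territory
```

An agreement rate near 0.5, as the study reports for real news data, means the source label carries essentially no information about whether a given article is true or false.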

To address this issue, the study provides a new dataset of individually labeled articles of journalistic quality, along with a method for misinformation detection and fairness audits. The findings highlight the need for more nuanced and reliable methods of detecting misinformation in online news and provide valuable resources for future research in this area.

The researchers evaluated the credibility and political leaning of 1,000 news articles, then used these article-level labels to build misinformation detection algorithms. They then examined how the choice of labeling methodology, article level versus source level, affects how well those algorithms perform.

Their goal was to investigate how article-level labeling affects the process, whether the bias that appears when machine learning is applied with source-level labels carries over to individual articles, and whether working with individually labeled articles reduces that bias.
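A fairness audit of the kind described here typically compares a detector’s accuracy across groups, such as left- and right-leaning articles. The sketch below shows one way such a per-group comparison might be computed; the function, predictions, and labels are illustrative assumptions, not the study’s actual data or code:

```python
# Hypothetical sketch of a per-group accuracy audit. A large gap between
# groups (e.g., political leanings) would indicate the kind of bias the
# authors audit for. All data below are invented for illustration.

def accuracy_by_group(preds, labels, groups):
    """Accuracy of predictions computed separately for each group."""
    out = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = sum(preds[i] == labels[i] for i in idx) / len(idx)
    return out

preds  = [1, 0, 1, 1, 1, 0]   # detector's credibility predictions
labels = [1, 0, 0, 1, 1, 0]   # article-level ground truth
groups = ["left", "left", "left", "right", "right", "right"]

result = accuracy_by_group(preds, labels, groups)
print(sorted(result.items()))  # left ~0.67, right 1.0: an accuracy gap
```

If source-level labels introduce systematically more noise for articles of one political leaning than the other, a gap like this would appear even when overall accuracy looks acceptable.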

The authors presented the paper at the 15th ACM Web Science Conference (WebSci 2023), which took place in Austin, Texas, from April 30 to May 1.

In addition to Singh and Feldman, the authors include Jinkyung Park, a Ph.D. alumna of the School of Communication and Information; Rahul Dev Ellezhuthil, a master’s student in computer science; Joseph Isaac, a doctoral candidate in the School of Communication and Information; and Christoph Mergerson, an assistant professor of race and media at the University of Maryland and a Ph.D. graduate of the School of Communication and Information.

According to the authors, misinformation detection algorithms work the way they do “primarily because there is a dearth of fine-grained labels defined at the news article level.” They write: “We acknowledge that labeling every news article may not be feasible given the enormous volume of news articles that are published and shared online. At the same time, the validity of datasets labeled at the source level should be questioned for a variety of reasons.”

“Validating online news and preventing the spread of misinformation is critical for ensuring trustworthy online environments and safeguarding democracy,” the authors wrote, adding that their work “aims to increase public trust in misinformation detection activities and subsequent remedies by ensuring the validity and fairness of results,” and that their dataset and reported results “aim to pave the way for more reliable and fair misinformation detection algorithms.”


More information: Jinkyung Park et al, Misinformation Detection Algorithms and Fairness across Political Ideologies: The Impact of Article Level Labeling, DOI: 10.17605/OSF.IO/QWNSF
