What a massive database of retracted papers reveals about science publishing’s ‘death penalty’

Nearly a decade ago, headlines highlighted a disturbing trend in science: The number of articles retracted by journals had increased 10-fold during the previous 10 years. Fraud accounted for some 60% of those retractions; one offender, anesthesiologist Joachim Boldt, had racked up almost 90 retractions after investigators concluded he had fabricated data and committed other ethical violations. Boldt may have even harmed patients by encouraging the adoption of an unproven surgical treatment. Science, it seemed, faced a mushrooming crisis.

The alarming news came with some caveats. Although statistics were sketchy, retractions appeared to be relatively rare, involving only about two of every 10,000 papers. Sometimes the reason for the withdrawal was honest error, not deliberate fraud. And whether suspect papers were becoming more common—or journals were just getting better at recognizing and reporting them—wasn’t clear.

Still, the surge in retractions led many observers to call on publishers, editors, and other gatekeepers to make greater efforts to stamp out bad science. The attention also helped catalyze an effort by two longtime health journalists—Ivan Oransky and Adam Marcus, who founded the blog Retraction Watch, based in New York City—to get more insight into just how many scientific papers were being withdrawn, and why. They began to assemble a list of retractions.

That list, formally released to the public this week as a searchable database, is now the largest and most comprehensive of its kind. It includes more than 18,000 retracted papers and conference abstracts dating back to the 1970s (and even one paper from 1756 involving Benjamin Franklin). It is not a perfect window into the world of retractions. Not all publishers, for instance, publicize or clearly label papers they have retracted, or explain why they did so. And determining which author is responsible for a paper’s fatal flaws can be difficult.

Still, the data trove has enabled Science, working with Retraction Watch, to gain unusual insight into one of scientific publishing’s most consequential but shrouded practices. Our analysis of about 10,500 retracted journal articles shows the number of retractions has continued to grow, but it also challenges some worrying perceptions that continue today. The rise of retractions seems to reflect not so much an epidemic of fraud as a community trying to police itself.

Among the most notable findings:

Although the absolute number of annual retractions has grown, the rate of increase has slowed.

The data confirm that the absolute number of retractions has risen over the past few decades, from fewer than 100 annually before 2000 to nearly 1000 in 2014. But retractions remain relatively rare: Only about four of every 10,000 papers are now retracted. And although the rate roughly doubled from 2003 to 2009, it has remained level since 2012. In part, that trend reflects a rising denominator: The total number of scientific papers published annually more than doubled from 2003 to 2016.
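The per-10,000 figures cited here are simple ratios of retractions to total papers published. As a rough illustration only (the counts below are hypothetical placeholders, not the actual Retraction Watch or Web of Science numbers), the calculation looks like this:

```python
# Illustrative sketch with made-up counts; not the actual dataset.
def retraction_rate_per_10k(retractions: int, papers_published: int) -> float:
    """Return the number of retractions per 10,000 published papers."""
    return retractions / papers_published * 10_000

# Roughly 1,000 retractions in a year against about 2.5 million published papers
# works out to about 4 per 10,000, the ballpark cited above.
print(retraction_rate_per_10k(1_000, 2_500_000))  # -> 4.0
```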

Much of the rise appears to reflect improved oversight at a growing number of journals.

Overall, the number of journals that report retractions has grown. In 1997, just 44 journals reported retracting a paper. By 2016, that number had grown more than 10-fold, to 488. But among journals that have published at least one retraction annually, the average number of retractions per journal has remained largely flat since 1997. Given the simultaneous rise in retractions, that pattern suggests journals are collectively doing more to police papers, says Daniele Fanelli, a lecturer in research methods at the London School of Economics and Political Science who has co-written several studies of retractions. (The number per journal would have increased, he argues, if the growing number of retractions stemmed primarily from a rising proportion of flawed papers.)
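Fanelli's argument rests on a simple decomposition: total retractions equal the number of reporting journals multiplied by the average retractions per reporting journal. A toy calculation (the per-journal average below is an arbitrary stand-in, used purely to illustrate the logic) shows how the total can rise roughly tenfold while no individual journal retracts more:

```python
# Toy numbers to illustrate Fanelli's reasoning; only the journal counts (44 and 488)
# come from the article, and the per-journal average of 1.5 is assumed.
journals_1997, per_journal_1997 = 44, 1.5
journals_2016, per_journal_2016 = 488, 1.5  # average stays flat

total_1997 = journals_1997 * per_journal_1997  # 66 retractions
total_2016 = journals_2016 * per_journal_2016  # 732 retractions

# The total grows about 11-fold even though each journal's count is unchanged,
# consistent with more journals policing papers rather than more papers being flawed.
print(total_1997, total_2016, total_2016 / total_1997)
```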

“Retractions have increased because editorial practices are improving and journals are trying to encourage editors to take retractions seriously,” says Nicholas Steneck, a research ethics expert at the University of Michigan in Ann Arbor. Scientists have kept the pressure on journals by pointing out flaws in papers on public websites such as PubPeer.

In general, journals with high impact factors—a measure of how often papers are cited—have taken the lead in policing their papers after publication. In 2004, just one-fourth of a sampling of high-impact biomedical journals reported having policies on publishing retractions, according to the Journal of the Medical Library Association (JMLA). Then, in 2009, the Committee on Publication Ethics (COPE), a nonprofit group in Eastleigh, U.K., that now advises more than 12,000 journal editors and publishers, released a model policy for how journals should handle retractions. By 2015, two-thirds of 147 high-impact journals, most of them biomedical titles, had adopted such policies, JMLA reported. Proponents of such policies say they can help journal editors handle reports of flawed papers more consistently and effectively—if the policies are followed.

Journals with lower impact factors also appear to be stepping up their standards, Steneck says. Many journals now screen manuscripts with plagiarism-detection software before publication, which can head off the need for retractions later.

But evidence suggests more editors should step up.

A disturbingly large share of papers, about 2%, contain "problematic" scientific images that experts readily identified as deliberately manipulated, according to a 2016 study in mBio by Elisabeth Bik of Stanford University in Palo Alto, California, and colleagues, who examined more than 20,000 papers. What's more, our analysis showed that most of the 12,000 journals recorded in Clarivate's widely used Web of Science database of scientific articles have not reported a single retraction since 2003.

Relatively few authors are responsible for a disproportionate number of retractions.

Just 500 of more than 30,000 authors named in the retraction database (which includes co-authors) account for about one-quarter of the 10,500 retractions we analyzed. One hundred of those authors have 13 or more retractions each. Those withdrawals are usually the result of deliberate misconduct, not errors.

Nations with smaller scientific communities appear to have a bigger problem with retractions.

Retraction rates differ by country, and variations can reflect idiosyncratic factors, such as a particularly active group of whistleblowers publicizing suspect papers. Such confounding factors make comparing retraction rates across countries harder, Fanelli says. But generally, authors working in countries that have developed policies and institutions for handling and enforcing rules against research misconduct tend to have fewer retractions, he and his colleagues reported in PLOS ONE in 2015.

A retraction does not always signal scientific misbehavior.

Many scientists and members of the public tend to assume a retraction means a researcher has committed research misconduct. But the Retraction Watch data suggest that impression can be misleading.

The database includes a detailed taxonomy of reasons for retractions, taken from retraction notices (although a minority of notices don’t specify the reason for withdrawal). Overall, nearly 40% of retraction notices did not mention fraud or other kinds of misconduct. Instead, the papers were retracted because of errors, problems with reproducibility, and other issues.

Retraction rate levels off

Although the number of retractions ballooned after 1997, the percentage of all papers retracted rose more slowly and leveled off after 2012.

[Chart: Retraction rate (per 10,000 papers), by year, 2004–2018. *The rate appears to decline after 2015, but numbers are almost certainly incomplete.]

(GRAPHIC) J. YOU/SCIENCE; (DATA) RETRACTION WATCH AND NSF

About half of all retractions do appear to have involved fabrication, falsification, or plagiarism—behaviors that fall within the U.S. government’s definition of scientific misconduct. Behaviors widely understood within science to be dishonest and unethical, but which fall outside the U.S. misconduct definition, seem to account for another 10%. Those behaviors include forged authorship, fake peer reviews, and failure to obtain approval from institutional review boards for research on human subjects or animals. (Such retractions have increased as a share of all retractions, and some experts argue the United States should expand its definition of scientific misconduct to cover those behaviors.)

Determining exactly why a paper was withdrawn can be challenging. About 2% of retraction notices, for example, give a vague reason that suggests misconduct, such as an “ethical violation by the author.” In some of those cases, authors worried about damage to their reputations—and perhaps even the threat of libel lawsuits—have persuaded editors to keep the language vague. Other notices are fudged: They state a specific reason, such as lack of review board oversight, but Retraction Watch later independently discovered that investigators had actually determined the paper to be fraudulent.

Ironically, the stigma associated with retraction may make the literature harder to clean up.

Because a retraction is often considered an indication of wrongdoing, many researchers are understandably sensitive when one of their papers is questioned. That stigma, however, might be leading to practices that undermine efforts to protect the integrity of the scientific literature.

Journal editors may hesitate to hand down the death penalty—even when it’s justified. For instance, some papers that once might have been retracted for an honest error or problematic practices are now being “corrected” instead, says Hilda Bastian, who formerly consulted on the U.S. National Library of Medicine’s PubMed database and is now pursuing a doctorate in health science at Bond University in Gold Coast, Australia. (The Retraction Watch database lists some corrections but does not comprehensively track them.) The correction notices can often leave readers wondering what to think. “It’s hard to work out—are you retracting the article or not?” Bastian says.

COPE has issued guidelines to clarify when a paper should be corrected, when it should be retracted, and what details the notices should provide. But editors must still make case-by-case judgments, says Chris Graf, the group’s co-chair and director of research integrity and publishing ethics at Wiley, the scientific publisher based in Hoboken, New Jersey.

A concerted effort to reduce the stigma associated with retractions could allow editors to make better decisions. “We need to be pretty clear that a retraction in the published literature is not the equivalent of, or a finding of, research misconduct,” Graf says. “It is to serve a [different] purpose, which is to correct the published record.”

One helpful reform, some commentators say, would be for journals to follow a standardized nomenclature that would give more details in retraction and correction notices. The notices should specify the nature of a paper’s problems and who was responsible—the authors or the journal itself. Reserving the fraught term “retraction” for papers involving intentional misconduct and devising alternatives for other problems might also prompt more authors to step forward and flag their papers that contain errors, some experts posit.

The burden of misconduct

The majority of retractions have involved scientific fraud (fabrication, falsification, and plagiarism) or other kinds of misconduct (such as fake peer review).

[Chart: Percent of all retractions (%), by publication year, 1997–2018, broken down by reason: fraud, other misconduct, possible misconduct, reliability, error, and miscellaneous, with separate trend lines for fake peer review, flawed images, and plagiarism or duplication of text.* Callouts: 1997, 62 retractions (29 for fraud); 2007, 419 retractions (252 for fraud); 2014, 946 retractions (411 for fraud).]

*Changing infractions: The proportion of retractions involving plagiarism of text (stealing someone else's or duplicating one's own) has risen; one cause appears to be the introduction in 2004 of iThenticate, an internet-based plagiarism detection service. Fake peer reviews occur when authors give journals email addresses that they control, allowing them to review their own manuscripts. Flawed images include instances of intentional manipulation and of error.

(GRAPHIC) J. YOU/SCIENCE; (DATA) RETRACTION WATCH

Such discussions underscore how far the dialogue around retractions has advanced since those disturbing headlines from nearly a decade ago. And although the Retraction Watch database has brought new data to the discussions, it also serves as a reminder of how much researchers still don’t understand about the prevalence, causes, and impacts of retractions. Data gaps mean “you have to take the entire literature [on retractions] with a grain of salt,” Bastian says. “Nobody knows what all the retracted articles are. The publishers don’t make that easy.”

Bastian is incredulous that Oransky and Marcus's "passion project" is, so far, the most comprehensive source of information about a key issue in scientific publishing. A database of retractions "is a really serious and necessary piece of infrastructure," she says. But the lack of long-term funding for such efforts means that infrastructure is "fragile, and it shouldn't be."

Ferric Fang, a clinical microbiologist at the University of Washington in Seattle who has studied retractions, says he hopes people will use the new database “to look more closely at how science works, when it doesn’t work right, and how it can work better.” And he believes transparent reporting of retractions can only help make science stronger. “We learn,” he says, “from our mistakes.”