World’s largest linguistics database is getting too expensive for some researchers

Banewel Mark, a literacy facilitator with the Ghana Institute of Linguistics, Literacy and Bible Translation, teaches the Deg language, which is considered “vulnerable,” according to the Endangered Languages Project.

Rodney Ballard

It was 2015 when Gary Simons knew that something had to change. That was the year spare funds started to dry up at the Summer Institute of Linguistics (SIL), a Bible translation group that helped revolutionize the documentation of endangered languages in the mid–20th century. SIL’s budget had long supported Simons’s passion project: Ethnologue—or “the Ethnologue” as many researchers call it—a massive online database considered by many to be the definitive source for information on the world’s languages.

Ethnologue’s users—and there are hundreds of thousands—can track how many people speak each of the world’s tongues, from Hebrew to Hausa to Hakka (9.3 million, 63.4 million, and 48.2 million, respectively). The database indicates, on a scale of one to 10, every language’s risk of extinction. It also gives a surprisingly clear answer to the squishy question, “how many languages exist?” (7111, by the latest count). For linguists, it’s a resource of reference; for students, it’s a window into the diversity of human language.

But for Simons, a computational linguist who has run Ethnologue for almost 20 years, it’s been a growing heartache. To help cover its nearly $1 million in annual operating costs, Ethnologue got its first paywall in late 2015; most nonpaying visitors were turned away after several pages. Since October 2019, the paywall has taken a new form: It lets visitors access every page, but it blots out information on how many speakers a language has and where they live. Subscriptions now start at $480 per person per year.

The online backlash has been harsh. Many linguists have vowed to abandon the site for other resources. “In the last few years, [Ethnologue has] gotten increasingly expensive and locked down,” says Simon Greenhill, an evolutionary linguist at the Max Planck Institute for the Science of Human History. “This is a very sad step.”

He and other scholars are now struggling to find a cheaper or free substitute for the population figures, the data that long made Ethnologue “the only option” for researchers studying linguistic diversity, says Greenhill, who studies the relationships between languages. “I’m not fundamentally opposed to paying for data, but it’s a hard pill to swallow,” Greenhill says. For a recent paper on how geography affects language diversity, his team used data from an older version of Ethnologue that they had previously paid for; access to its most current databases would have cost several thousand dollars. He’ll be doing the same for an upcoming paper on the causes of language extinction.

Simons understands why linguists are upset. The need to impose fees “is heavy on our heart,” he says. “But we can’t really do anything until we change the economic picture. If we keep coasting the way we are, it’s just going to crumble.”

Since 2013—the year the site got its last major overhaul—Simons and SIL Chief Innovation Development Officer Stephen Moitozo have been trying to simultaneously grow Ethnologue and make it self-sustaining. After the first paywall went up, they added interactive maps and customer service chatbots. Ongoing costs include website maintenance, security, and paying researchers to update the databases whenever new information comes in from independent researchers or SIL’s 5000 field linguists.

To pay for all that, SIL is counting on institutions, not individual subscribers. Some 40% of the world’s top 1400 schools already have subscribed, Moitozo says, and sales teams are after the remaining 60%. SIL is also planning to sell tailored access to corporations, including business intelligence firms and Fortune 500 companies.

“We thought the bulk of the people using Ethnologue were academic researchers,” Moitozo says. Instead, weblog traffic suggested that they were “only 26% right.” Other users include high school students, consultants, and people trying to find interpreters for courts, hospitals, and immigration offices. “There are lots of organizations that depend upon Ethnologue for their daily work,” Moitozo says.

Whether those organizations will be willing to pay is, literally, the million-dollar question.

Greenhill is skeptical. His institution does not have a subscription, and he and colleagues have been “routing around” the new paywall by using Glottolog, a site that for years has cataloged many of the same data as Ethnologue, though in a different format and with different citation standards. (The irony is that the source for many of those data is Ethnologue itself.) “People are shifting to Glottolog as a primary source,” Greenhill says.

That’s a mixed blessing for Harald Hammarström, a linguistic typologist at Uppsala University and an editor and co-founder of Glottolog. Losing free access to Ethnologue is “a shame,” he says. But it also means more researchers will be coming to his site. “That’s something I will be happy about,” he says.

Hammarström says Glottolog costs far less to run than Ethnologue: €10,000 to €20,000 a year, the price of a part-time staff of three. However, the site performs no surveys of its own, and it doesn’t collect population data. “We don’t need to make money, and we don’t want to make money,” Hammarström says. He and his fellow editors plan to keep working on Glottolog for another decade at least. After that, he anticipates that some other academic will scrape the site and create a new one. “That’s the only, but sufficient, plan for the future,” he says.

Meanwhile, Simons hopes to come up with a better option for independent researchers and students whose institutions don’t have subscriptions to Ethnologue. “Our thinking is if we can make it so that people who are really depending on it for their work are subscribing, and in essence paying their fair share, then we’ll have the means to think about how to be generous to those people who can’t afford it.”

That day can’t come soon enough for Rikker Dockum, a historical linguist at Swarthmore College who has spent years documenting the Tai languages of Southeast Asia. “We still have an ongoing language documentation crisis … languages are dying out, and we’re working very hard to try to [record] what we can,” he says. “Things like Ethnologue and Glottolog are not just butterfly collections that we want people to be able to browse through; they are important tools for linguists to know what is possible in language.”

source: sciencemag.org