The Pangloss Collection, an open archive containing more than 3600 audio and video recordings in 170 languages from across all continents, is now being revamped with a new website.
Examples from the archive include stories and songs in Xârâgurè (New Caledonia), conversations and tales in Kakabe (Guinea), and cooking recipes in Koyi rai (Nepal) and Na-našu (Italy)—a total of 780 hours of listening.
The archives are the result of more than twenty years’ work by linguists and ethnologists who, each in their own field of study, work to collect and preserve the world’s linguistic heritage. Some of the documents come from the digitization of old magnetic tapes. Nearly half of the recordings are transcribed and annotated, some with contextual elements or translations into other languages. The site is open to contributions from both academic and non-academic experts, who are encouraged to improve the corpus by contributing to transcriptions and translations.
To make the collection more accessible to the general public, who can freely listen to and download these precious documents and thereby get a sense of the world’s linguistic diversity, the redesigned pangloss.cnrs.fr website can now be consulted via two levels of access. As the content is largely under a Creative Commons license, it is available for use in museographic projects or audio creations.
Beyond its heritage aspect, this collection is also part of an open science approach to facilitate the conservation, referencing, and availability of primary data for researchers. Its purpose is to limit the loss of scientific data (a “second death” for extinct languages) whilst also encouraging collaboration with other disciplines: computer scientists interested in automatic language processing can access the files they need and take part in the co-development of tools (e.g. for automatic transcription). The site is fully bilingual (French–English) and also includes partial translations into other languages, including Chinese for records in certain Asian languages.
In addition to contributions from various laboratories associated with the CNRS, the Pangloss Collection is supported by the recently created Institute for Linguistic Heritage and Diversity at the EPHE-PSL, and data are stored in the archive of the large research infrastructure (Très grande infrastructure de recherche, TGIR) Huma-Num. The Pangloss Collection is a member of the international Digital Endangered Languages and Musics Archives Network (DELAMAN). It is hosted by the Cocoon platform (Collection de corpus oraux numériques), which is one of the participating archives of the Open Language Archives Community (OLAC).