
Languages in the Penn Libraries Collections (2010 update)

What languages do the Penn Libraries' books and other materials use?

Franklin, the Penn Library's online catalog, employs language codes, maintained by the Library of Congress, that comply with the ISO 639-2 and ANSI Z39.53 standards. Although Franklin users may limit search results to specific languages, public users cannot search directly by language code.

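This table, using Franklin data extracted on 5 January 2010, counts active titles (individual bibliographic records) in Franklin. The counts have been cleaned: for instance, titles identified as "Miscellaneous languages" have been examined and placed under recognizable (and in some cases, non-standard) language names. Percentages for individual languages are calculated against the 3,681,508 single-language titles; the three summary rows at the end of the table are percentages of the grand total of 3,804,397 titles.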

Percent of Total | Language Name | Titles counted
63.9% | English | 2,350,760
8.9% | German | 327,850
5.6% | French | 206,973
3.2% | Spanish | 119,139
2.4% | Italian | 88,622
2.4% | Chinese | 87,855
2.0% | Arabic | 73,742
1.3% | Latin | 48,544
1.3% | Russian | 48,107
1.2% | Hebrew | 45,897
1.1% | Japanese | 41,494
0.7% | Hindi | 25,313
0.5% | Urdu | 17,470
0.5% | Bengali | 17,242
0.4% | Dutch | 14,726
0.3% | Tamil | 11,809
0.3% | Portuguese | 11,023
0.3% | Persian | 10,381
0.2% | Lithuanian | 9,069
0.2% | Swedish | 8,800
0.2% | Turkish | 8,611
0.2% | Polish | 8,543
0.2% | Sanskrit | 7,311
0.2% | Marathi | 6,628
0.2% | Gujarati | 6,130
0.1% | Telugu | 5,423
0.1% | Yiddish | 4,727
0.1% | Malayalam | 4,638
0.1% | Danish | 4,572
0.1% | Tibetan | 4,072
0.1% | Greek, Modern (1453- ) | 3,405
0.1% | Korean | 3,161
0.1% | Nepali | 2,868
0.1% | Sinhalese | 2,777
0.1% | Greek, Ancient (to 1453) | 2,686
0.1% | Ukrainian | 2,546
0.1% | Czech | 2,533
0.0% | Catalan | 1,751
0.0% | Panjabi | 1,688
0.0% | Armenian | 1,517
0.0% | Norwegian | 1,440
0.0% | Hungarian | 1,206
0.0% | Finnish | 1,097
0.0% | Frisian | 1,059
0.0% | Rajasthani | 1,035
0.0% | Croatian | 994
0.0% | Serbian | 988
0.0% | Welsh | 971
0.0% | Romanian | 924
0.0% | Turkish, Ottoman | 900
0.0% | Romance (Other) | 769
0.0% | Maithili | 754
0.0% | Mongolian | 752
0.0% | Newari | 664
0.0% | Latvian | 661
0.0% | Pushto | 627
0.0% | Bulgarian | 612
0.0% | Sindhi | 600
0.0% | Swahili | 580
0.0% | French, Middle (ca. 1300-1600) | 499
0.0% | Yoruba | 433
0.0% | Icelandic | 425
0.0% | Irish | 411
0.0% | Indic (Other) | 391
0.0% | French, Old (ca. 842-1300) | 368
0.0% | Konkani | 357
0.0% | Slovak | 344
0.0% | English, Middle (1100-1500) | 333
0.0% | Kannada | 326
0.0% | Prakrit languages | 317
0.0% | Pali | 316
0.0% | Braj | 315
0.0% | Bhojpuri | 295
0.0% | Amharic | 289
0.0% | Kurdish | 276
0.0% | Slovenian | 258
0.0% | Syriac, Modern | 248
0.0% | German, Middle High (ca. 1050-1500) | 223
0.0% | Mayan languages | 218
0.0% | Kazakh | 210
0.0% | Afrikaans | 187
0.0% | Judeo-Arabic | 185
0.0% | Kashmiri | 174
0.0% | Galician | 173
0.0% | Estonian | 171
0.0% | Raeto-Romance | 169
0.0% | Baluchi | 168
0.0% | Belarusian | 165
0.0% | Church Slavic | 162
0.0% | Khasi | 157
0.0% | Lahnda | 143
0.0% | Sino-Tibetan (Other) | 133
0.0% | Provencal (to 1500) | 121
0.0% | Assamese | 118
0.0% | Macedonian | 118
0.0% | Azerbaijani | 114
0.0% | English, Old (ca. 450-1100) | 114
0.0% | Malagasy | 112
0.0% | Shona | 110
0.0% | Aramaic | 105
0.0% | Occitan (post 1500) | 101
0.0% | Ladino | 96
0.0% | Tagalog | 96
0.0% | Tigrinya | 96
0.0% | Indonesian | 95
0.0% | Akkadian | 93
0.0% | Georgian | 92
0.0% | Somali | 92
0.0% | Thai | 90
0.0% | Scots | 86
0.0% | Marwari | 85
0.0% | Niger-Kordofanian (Other) | 83
0.0% | Central American Indian (Other) | 82
0.0% | Basque | 80
0.0% | Coptic | 80
0.0% | Awadhi | 75
0.0% | Scottish Gaelic | 74
0.0% | Creoles and Pidgins, French-based (Other) | 68
0.0% | Magahi | 68
0.0% | Dravidian (Other) | 66
0.0% | Pahlavi | 66
0.0% | Kinyarwanda | 65
0.0% | Dogri | 62
0.0% | Algonquian (Other) | 60
0.0% | Uzbek | 60
0.0% | Romani | 57
0.0% | Oriya | 55
0.0% | Sorbian (Other) | 55
0.0% | Nahuatl | 54
0.0% | Egyptian | 51
0.0% | Austronesian (Other) | 49
0.0% | Sotho | 46
0.0% | Samaritan Aramaic | 44
0.0% | Ganda | 43
0.0% | Germanic (Other) | 43
0.0% | Slavic (Other) | 42
0.0% | Albanian | 41
0.0% | Luo (Kenya and Tanzania) | 41
0.0% | Manipuri | 41
0.0% | Zulu | 41
0.0% | Lushai | 40
0.0% | Bantu (Other) | 39
0.0% | Esperanto | 39
0.0% | North American Indian (Other) | 37
0.0% | Quechua | 37
0.0% | Dakota | 36
0.0% | Vietnamese | 36
0.0% | Cree | 35
0.0% | Western Pahari languages | 35
0.0% | Berber (Other) | 33
0.0% | Ethiopic | 33
0.0% | Hausa | 33
0.0% | Malay | 32
0.0% | Avestan | 28
0.0% | Dutch, Middle (ca. 1050-1350) | 28
0.0% | Nyanja | 28
0.0% | Wolof | 28
0.0% | Tajik | 27
0.0% | Fula | 26
0.0% | Yupik languages | 26
0.0% | Athapascan (Other) | 25
0.0% | Breton | 25
0.0% | Mandingo | 25
0.0% | Ndonga | 25
0.0% | Gothic | 24
0.0% | Javanese | 24
0.0% | Ndebele (Zimbabwe) | 24
0.0% | Nilo-Saharan (Other) | 24
0.0% | Ojibwa | 24
0.0% | Oromo | 24
0.0% | South American Indian (Other) | 24
0.0% | Munda (Other) | 22
0.0% | Sumerian | 22
0.0% | Tswana | 21
0.0% | Altaic (Other) | 20
0.0% | Finno-Ugrian (Other) | 19
0.0% | German, Old High (ca. 750-1050) | 19
0.0% | Mohawk | 19
0.0% | Papuan (Other) | 19
0.0% | Turkmen | 18
0.0% | Kikuyu | 17
0.0% | Burmese | 16
0.0% | Creoles and Pidgins (Other) | 16
0.0% | Hawaiian | 16
0.0% | Sami | 16
0.0% | Bambara | 15
0.0% | Samoan | 15
0.0% | Tatar | 15
0.0% | Zapotec | 15
0.0% | Afroasiatic (Other) | 14
0.0% | Lao | 14
0.0% | Creoles and Pidgins, Portuguese-based (Other) | 13
0.0% | Dyula | 13
0.0% | Hiligaynon | 13
0.0% | Iranian (Other) | 13
0.0% | Iroquoian (Other) | 13
0.0% | Moore | 13
0.0% | Navajo | 13
0.0% | Bemba | 12
0.0% | Delaware | 12
0.0% | Uighur | 12
0.0% | Xhosa | 12
0.0% | Kurukh | 11
0.0% | Micmac | 11
0.0% | Serbo-Croatian [script not known] | 11
0.0% | Twi | 11
0.0% | Ugaritic | 11
0.0% | Bihari (Other) | 10
0.0% | Indo-European (Other) | 10
0.0% | Judeo-Persian | 10
0.0% | Khmer | 10
0.0% | Cherokee | 9
0.0% | Manx | 9
0.0% | Sundanese | 9
0.0% | Tigre | 9
0.0% | Chechen | 8
0.0% | Creek | 8
0.0% | Ga | 8
0.0% | Khoisan (Other) | 8
0.0% | Kuanyama | 8
0.0% | Rundi | 8
0.0% | Shan | 8
0.0% | Swazi | 8
0.0% | Aymara | 7
0.0% | Balinese | 7
0.0% | Chagatai | 7
0.0% | Guarani | 7
0.0% | Igbo | 7
0.0% | Kyrgyz | 7
0.0% | Apache languages | 6
0.0% | Bosnian | 6
0.0% | Burushaski | 6
0.0% | Caucasian (Other) | 6
0.0% | Dzongkha | 6
0.0% | Kongo | 6
0.0% | Mapuche | 6
0.0% | Nyankole | 6
0.0% | Otomian languages | 6
0.0% | Tonga (Nyasa) | 6
0.0% | Celtic (Other) | 5
0.0% | Choctaw | 5
0.0% | Chuvash | 5
0.0% | Greek, Ancient or Modern | 5
0.0% | Kamba | 5
0.0% | Kru (Other) | 5
0.0% | Lingala | 5
0.0% | Nubian languages | 5
0.0% | Santali | 5
0.0% | Semitic (Other) | 5
0.0% | Sogdian | 5
0.0% | Songhai | 5
0.0% | Afar | 4
0.0% | Arawak | 4
0.0% | Bashkir | 4
0.0% | Bikol | 4
0.0% | Duala | 4
0.0% | Faroese | 4
0.0% | Gilbertese | 4
0.0% | Iloko | 4
0.0% | Low German | 4
0.0% | Maltese | 4
0.0% | Manchu | 4
0.0% | Niuean | 4
0.0% | Old Persian (ca. 600-400 B.C.) | 4
0.0% | Papiamento | 4
0.0% | Rarotongan | 4
0.0% | Venda | 4
0.0% | Ainu | 3
0.0% | Angika | 3
0.0% | Australian languages | 3
0.0% | Avaric | 3
0.0% | Banda languages | 3
0.0% | Chinook jargon | 3
0.0% | Cushitic (Other) | 3
0.0% | Dinka | 3
0.0% | Ewe | 3
0.0% | Grebo | 3
0.0% | Herero | 3
0.0% | Kabyle | 3
0.0% | Kara-Kalpak | 3
0.0% | Kawi | 3
0.0% | Lozi | 3
0.0% | Maori | 3
0.0% | Mongo-Nkundu | 3
0.0% | Neapolitan Italian | 3
0.0% | Northern Sotho | 3
0.0% | Palauan | 3
0.0% | Ponape | 3
0.0% | Sardinian | 3
0.0% | Siksika | 3
0.0% | Tahitian | 3
0.0% | Tuvinian | 3
0.0% | Artificial (Other) | 2
0.0% | Bable | 2
0.0% | Carib | 2
0.0% | Creoles and Pidgins, English-based (Other) | 2
0.0% | Efik | 2
0.0% | Elamite | 2
0.0% | Fang | 2
0.0% | Fanti | 2
0.0% | Fijian | 2
0.0% | Garhwali | 2
0.0% | Gondi | 2
0.0% | Greek, Modern (1453- ) | 2
0.0% | Hmong | 2
0.0% | Inuktitut | 2
0.0% | Kusaie | 2
0.0% | Luba-Katanga | 2
0.0% | Mari | 2
0.0% | Miscellaneous languages | 2
0.0% | Mon-Khmer (Other) | 2
0.0% | Old Norse | 2
0.0% | Philippine (Other) | 2
0.0% | Salishan languages | 2
0.0% | Sango (Ubangi Creole) | 2
0.0% | Serer | 2
0.0% | Sicilian Italian | 2
0.0% | Siouan (Other) | 2
0.0% | Tsonga | 2
0.0% | Zuni | 2
0.0% | Abkhaz | 1
0.0% | Achinese | 1
0.0% | Acoli | 1
0.0% | Akan | 1
0.0% | Aljamia | 1
0.0% | Arapaho | 1
0.0% | Bamileke languages | 1
0.0% | Basa | 1
0.0% | Batak | 1
0.0% | Bislama | 1
0.0% | Bugis | 1
0.0% | Caddo | 1
0.0% | Cornish | 1
0.0% | Dayak | 1
0.0% | Gayo | 1
0.0% | Haitian French Creole | 1
0.0% | Hiri Motu | 1
0.0% | Hupa | 1
0.0% | Iberian | 1
0.0% | Kabardian | 1
0.0% | Kalatdlisut | 1
0.0% | Karen languages | 1
0.0% | Luba-Lulua | 1
0.0% | Maasai | 1
0.0% | Madurese | 1
0.0% | Mende | 1
0.0% | Minangkabau | 1
0.0% | Mojo | 1
0.0% | Nauru | 1
0.0% | Norwegian (Bokmal) | 1
0.0% | Norwegian (Nynorsk) | 1
0.0% | Nyoro | 1
0.0% | Nzima | 1
0.0% | Ossetic | 1
0.0% | Pampanga | 1
0.0% | Pangasinan | 1
0.0% | Selkup | 1
0.0% | Tamashek | 1
0.0% | Terena | 1
0.0% | Tsimshian | 1
0.0% | Tumbuka | 1
0.0% | Wakashan languages | 1
0.0% | Washoe | 1
0.0% | Wolayta | 1
96.8% | Total: Single-language titles | 3,681,508
0.1% | Multiple languages | 3,692
3.1% | Undetermined language, No linguistic content, or Code missing | 119,197
100.0% | Grand Total | 3,804,397
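The percentages in the table use two different denominators, and a few figures copied from the table are enough to check this. The minimal Python sketch below (the dictionary and function names are illustrative, and only four languages are included) shows that the language rows match percentages of the single-language total, while the three summary rows match percentages of the grand total.

```python
# Figures copied from the 2010 table.
GRAND_TOTAL = 3_804_397        # every counted title
SINGLE_LANGUAGE = 3_681_508    # titles coded for exactly one language

# A few individual language rows (illustrative subset).
SAMPLE_LANGUAGES = {
    "English": 2_350_760,
    "German": 327_850,
    "French": 206_973,
    "Chinese": 87_855,
}

# The three summary rows at the end of the table.
SUMMARY_ROWS = {
    "Total: Single-language titles": SINGLE_LANGUAGE,
    "Multiple languages": 3_692,
    "Undetermined language, No linguistic content, or Code missing": 119_197,
}


def pct(count: int, denominator: int) -> str:
    """Return count as a percentage of denominator, to one decimal place."""
    return f"{100 * count / denominator:.1f}%"


# The summary rows add up exactly to the grand total.
assert sum(SUMMARY_ROWS.values()) == GRAND_TOTAL

for name, titles in SAMPLE_LANGUAGES.items():
    # First figure reproduces the table; second is the grand-total share.
    print(f"{name}: {pct(titles, SINGLE_LANGUAGE)} of single-language titles, "
          f"{pct(titles, GRAND_TOTAL)} of all titles")

for name, titles in SUMMARY_ROWS.items():
    print(f"{name}: {pct(titles, GRAND_TOTAL)} of all titles")
```

For English, for example, 2,350,760 titles are 63.9% of the single-language titles (the figure in the table) but only 61.8% of all 3,804,397 titles.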

The fine print

"Is this everything?"
No. This table reports the languages of books, journals, videos, sound recordings, and electronic resources in Franklin, but you should treat the "Percent of Total" column, rather than the "Titles counted" column, as your guide to the Penn Library collections. Active (unsuppressed) bibliographic records in Franklin may have been missed if they lacked a language code. Items in two or more languages may have been coded for the predominant language or relegated to "Multiple languages", depending on cataloging practice; this explains why our Klingon translation of Hamlet is cataloged as "English". And, of course, a "single-language" journal may have an article or two in another language!

"After six years, is this all there is?"
Six years ago, on 19 February 2004, we produced the first Languages in the Penn Library Collections count, and we updated it three years ago, on 20 February 2007. The counted collection has grown from 3,010,421 titles (2004) to 3,274,516 titles (2007) and now to 3,804,397 titles (2010), and the number of single-language titles has grown, too, from 2,931,066 titles (2004) to 3,274,516 titles (2007) and now to 3,681,508 titles (2010). We've added languages as well: 364 languages or language groups are now represented, compared with 354 in 2007 and only 337 in 2004. But the relative proportions of language representation have remained largely stable: most of the dramatic changes (for instance, in Chinese titles) reflect the hard work of our catalogers on our existing collections rather than massive acquisitions projects.

"Where's my language?"
This table uses MARC 21 language codes, as maintained by the Library of Congress for the bibliographic description of information resources. Sparsely published languages may be grouped into generic categories, such as "Bantu (Other)" or "Yupik languages". An interesting discussion of language coding is provided on Ethnologue's web page, "Three-letter Codes for Identifying Languages". For more information on MARC 21 language codes, see "MARC Code List for Languages" (Library of Congress web).

Mistakes have been made.
This table uses the MARC 21 language codes that appear in field 008/35-37 ("Fixed-Length Data Elements / Language") of the MARC bibliographic record format. However, Franklin has been built through decades of changing cataloging practice, so the catalog still contains many obsolete codes: although Ainu was granted its own code in 2007 (too late for our 2007 count), almost all of Franklin's Ainu books still use the old "Miscellaneous languages" code. Fossil codes and errors have been re-attributed by examining individual Franklin records, and the same examination is used to correct and update those records. Several non-standard language names appear: Burushaski and Iberian, treated by MARC as "Miscellaneous languages", have been added to highlight their presence; Mpongwe has been merged back into "Bantu (Other)"; Serbian and Croatian were identified from coding or bibliographic notes, and Serbo-Croatian was used where neither Roman nor Cyrillic script was indicated.
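The counting procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not Franklin's actual extraction: it assumes each record's 008 field is available as a plain 40-character string keyed by a hypothetical record ID, includes only a handful of codes from the MARC list, and models the hand re-attribution of fossil codes (such as Ainu titles still coded "mis") as a hypothetical `overrides` dictionary. MARC counts 008 character positions from zero, so positions 35-37 are the Python slice `[35:38]`.

```python
from collections import Counter
from typing import Dict, Optional

# Partial, illustrative code-to-name map; the full list is the
# Library of Congress "MARC Code List for Languages".
CODE_NAMES: Dict[str, str] = {
    "eng": "English",
    "ger": "German",
    "chi": "Chinese",
    "ain": "Ainu",
    "bnt": "Bantu (Other)",
    "mis": "Miscellaneous languages",
    "mul": "Multiple languages",
}

# Codes that fall into the table's last summary row.
UNCOUNTED = {"und", "zxx"}  # undetermined; no linguistic content
UNCOUNTED_LABEL = "Undetermined language, No linguistic content, or Code missing"


def language_code(field_008: str) -> Optional[str]:
    """Return 008/35-37 (zero-based positions), or None if the code is missing."""
    code = field_008[35:38]
    # Short fields, blanks, and fill characters ("|") are treated as missing.
    if len(code) < 3 or code.strip(" |") == "":
        return None
    return code


def tally(records: Dict[str, str], overrides: Dict[str, str]) -> Counter:
    """Count titles per language name.

    records maps a hypothetical record ID to its raw 008 string; overrides maps
    a record ID to a language name assigned after examining the record by hand.
    """
    counts: Counter = Counter()
    for rec_id, field_008 in records.items():
        if rec_id in overrides:  # hand-examined record wins over its code
            counts[overrides[rec_id]] += 1
            continue
        code = language_code(field_008)
        if code is None or code in UNCOUNTED:
            counts[UNCOUNTED_LABEL] += 1
        else:
            # Unknown codes are counted under the raw code for later review.
            counts[CODE_NAMES.get(code, code)] += 1
    return counts


def fake_008(lang: str) -> str:
    """Build a hypothetical 40-character 008 with `lang` in positions 35-37."""
    prefix = "100105s2009    pau" + " " * 17  # positions 00-34
    return prefix + lang + " d"               # 35-37, then 38-39


# Hypothetical sample records.
records = {
    "rec1": fake_008("eng"),
    "rec2": fake_008("eng"),
    "rec3": fake_008("mis"),  # fossil code, re-attributed by hand below
    "rec4": fake_008("mul"),
    "rec5": fake_008("und"),
    "rec6": fake_008("   "),  # code missing
}
overrides = {"rec3": "Ainu"}

counts = tally(records, overrides)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

Keeping the overrides separate from the raw codes mirrors the practice described above: the catalog records can be corrected later, while the count still reflects what was learned by examining them.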
