Category talk:Pronunciation
Template
Does any template exist for describing pronunciation files? See, for example, template:English spoken article. -- Andrew Krizhanovsky (talk) 22:24, 23 January 2010 (UTC)
- No, there is no need for something like this; pronunciation files have a very simple description. See File:En-uk-I can't.ogg and File:Pl-trzysta.ogg for how to write the description correctly. Please remember to name the file properly, including the language ISO code. --Derbeth talk 18:53, 25 January 2010 (UTC)
- Ok. Thank you. -- Andrew Krizhanovsky (talk) 09:41, 26 January 2010 (UTC)
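The advice above about including the language code in the file name could be checked mechanically before upload. What follows is only a rough sketch: the function name has_lang_prefix is hypothetical, and the pattern (two or three letters followed by a hyphen, as in the two example files) is an assumption, not the category's official naming rule.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file name starts with a two- or
# three-letter code followed by a hyphen and at least one more character.
# ASSUMPTION: this pattern is inferred from the example files only.
has_lang_prefix() {
    case "$1" in
        [A-Za-z][A-Za-z]-?*) return 0 ;;
        [A-Za-z][A-Za-z][A-Za-z]-?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, has_lang_prefix 'Pl-trzysta.ogg' succeeds, while has_lang_prefix 'trzysta.ogg' fails. The check does not verify that the prefix is a real ISO code.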
Simple how-to
Here's how to record a bunch of words on an Ubuntu Linux platform. Use 'synaptic' or 'apt-get' to install the necessary software packages, such as 'alsa-utils' and 'sox'. Compile a list of the words that you want to record, one per line, and feed it to the following script on standard input.
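#!/bin/sh
# Reads one word per line from standard input.
lang=sv                      # language code, used as the filename prefix
while IFS= read -r word
do
    echo "$word"             # show the word as a prompt
    # record 4 seconds of audio at a 100000 Hz sampling rate
    arecord -r 100000 -d 4 "$lang-$word.wav"
    # normalize, trim silence at both ends (keeping .25 s), write Ogg Vorbis
    sox "$lang-$word.wav" "$lang-$word.ogg" norm vad -p .25 reverse vad -p .25 reverse
done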
Here, "sv" (Swedish) is set as the language, which is used as a filename prefix. "echo" prints the word as a prompt. "arecord" records four seconds of audio at a sampling rate of 100,000 Hz, including any initial and trailing silence, so you don't have to press any key to start and stop the recording. "sox" then normalizes the sound level, trims initial and trailing silence while keeping a 0.25-second margin, and converts the recorded wav file to the free and open Ogg Vorbis format. --LA2 (talk) 01:28, 15 March 2013 (UTC)
- Can it cut a single recording into many words? Infovarius (talk) 13:57, 22 March 2017 (UTC)
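Before starting a recording session with the script above, it may help to preview the file names it would create for a given word list. The following is a minimal sketch, not part of the original how-to: the function name preview_names is hypothetical, and it assumes the same naming scheme (language prefix, hyphen, word) and a list with one word per line on standard input.

```shell
#!/bin/sh
# Hypothetical helper: print the .wav and .ogg names that the recording
# loop would create for each word read from standard input.
preview_names() {
    lang=$1
    while IFS= read -r word
    do
        # Skip empty lines so that no name like "sv-.wav" is produced.
        [ -n "$word" ] || continue
        printf '%s-%s.wav -> %s-%s.ogg\n' "$lang" "$word" "$lang" "$word"
    done
}
```

It could be called as preview_names sv < words.txt, where words.txt is a placeholder for your own word list. Nothing is recorded or written; only the names are printed.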
Statistics
The language categories with the most files (counting only files not in subcategories, and only categories with at least 20 files) are:
On September 16, 2015: Dutch (334705), Polish (22423), Ukrainian (16178), German (15048), French (13535), Belarusian (8637), Tamil (8086), Russian (6345), Chinese (4932), Swedish (4344), Hungarian (3623), Czech (3075), Latvian (1944), Jèrriais (1769), Arabic (1519), Armenian (1259), Italian (1075), Latin (655), Farsi (492), Navajo (454), Malagasy (443), Telugu (352), Norwegian (325), Spanish (319), Adyghe (288), Portuguese (264), Esperanto (260), Finnish (257), Welsh (219), Vietnamese (200), Lithuanian (165), Georgian (155), Icelandic (152), Galician (145), English (143), Danish (129), Tagalog (128), Turkish (127), Hebrew (123), Romanian (100), Greek (86), Odia (82), Kölsch (79), Macedonian (78), Thai (76), Nepali (72), Hindi (63), Croatian (59), Bashkir (56), Slovak (51), Irish (51), Devanagari (51), Korean (47), Mbunda (40), Bulgarian (30), Catalan (26), Sanskrit (23), Twi (22). --LA2 (talk) 20:06, 16 September 2015 (UTC)
- Why not in subcategories? Infovarius (talk) 13:56, 22 March 2017 (UTC)
- Right, subcategories should be included. But just for comparison, here is an updated count of the top-level files, with remarkable improvements in boldface. The table below shows a summary for 5 levels of subcategories. --LA2 (talk) 11:22, 19 May 2017 (UTC)
On May 19, 2017: Dutch (436804), Polish (23515), Russian (17354), Ukrainian (16182), Belarusian (8634), Chinese (4991), Armenian (4546), Swedish (4370), Hungarian (3701), Czech (3078), French (2933), Luxembourgish (2920), Odia (1973), Latvian (1946), Jèrriais (1769), Arabic (1537), Italian (1123), Latin (651), Wolof (586), Hebrew (577), Persian (558), English (543), Esperanto (483), Navajo (455), Malagasy (443), Telugu (386), Spanish (330), Adyghe (327), Norwegian (326), Upper Sorbian (312), Portuguese (294), Finnish (264), Welsh (228), Vietnamese (201), Lithuanian (166), Georgian (157), Galician (155), Icelandic (149), Turkish (134), Danish (132), Tagalog (128), Bengali (116), Romanian (104), Bashkir (90), Thai (89), Greek (81), Kölsch (79), Macedonian (76), Korean (74), Nepali (73), Hindi (67), Croatian (59), Bulgarian (53), Devanagari (51), Slovak (50), Pronunciation of Kannada alphabet (49), Irish (47), Oromo (42), Mbunda (41), Twi (40), Voice spectrograms (31), Catalan (30), Sanskrit (24), Albanian (23), Limburgish (22), Ancient Greek (22).
Date | Dutch | German | English | Polish | Russian | French | Ukrainian | Tamil | Belarusian | Chinese | Armenian | Swedish | Czech | Hungarian | Luxembourgish | Jèrriais | Serbian | Odia | Latvian | Arabic | Italian |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
May 19, 2017 | 439887 | 53605 | 24848 | 23948 | 18032 | 17608 | 16190 | 8719 | 8639 | 5967 | 4634 | 4605 | 3885 | 3722 | 2921 | 2310 | 2147 | 2048 | 2039 | 1695 | 1393 |
August 8, 2017 | 445308 | 62899 | 25102 | 23946 | 19055 | 17616 | 16192 | 8726 | 8639 | 5968 | 4636 | 4585 | 3880 | 3718 | 3483 | 2310 | 2147 | 2273 | 2039 | 1706 | 1634 |
CatScan | nl | de | en | pl | ru | fr | uk | ta | be | zh | hy | sv | cs | hu | lb | nrf | sr | or | lv | ar | it |
- Languages with fewer than 1000 words
Date | Spanish | Alemannic | Romansh | Esperanto | Persian | Latin | Portuguese | Hebrew | Norwegian | Basque | Arpitan | Telugu | Welsh | Adyghe | Greek | Icelandic | Finnish | Japanese | Romanian | Galician | Macedonian | Slovene | Kazakh | Slovak | Albanian |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
May 19, 2017 | 971 | 828 | 823 | 761 | 726 | 719 | 654 | 577 | 517 | - | 420 | 401 | 366 | 347 | 276 | 275 | 269 | 252 | 245 | 214 | 136 | 70 | 54 | 51 | 24 |
August 8, 2017 | 968 | 882 | 823 | 762 | 953 | 719 | 652 | 582 | 564 | 511 | 420 | 401 | 366 | 347 | 276 | 274 | 269 | 253 | 245 | 214 | 138 | 67 | 54 | 52 | 24 |
CatScan | es | gsw | rm | eo | fa | la | pt | he | no | eu | frp | te | cy | ady | el | is | fi | ja | ro | gl | mk | sl | kk | sk | sq |
Let's move the images
Because this category should be scanned frequently by the Wiktionary updater bots, I propose moving its few images into its parent category Category:Phonology, and renaming it Category:Pronunciations according to Commons:Naming_categories#Grammatical_number. JackPotte (talk) 23:40, 24 February 2017 (UTC)
- I'm not sure which problems this will solve. My bot scans these categories and ignores images; that is trivially easy to implement. The problems I deal with most are files that don't follow the naming rules of this category. Besides, all files put directly here (i.e., not in any subcategory) are useless for automatic processing by definition: they are not assigned to any language. So for me it's not important whether there are images among them. --Derbeth talk 08:04, 22 March 2017 (UTC)
- Sorry if I gave the impression that this was about the programming difficulties of a hypothetical bot... I was actually talking about optimizing the crawl time of a frequently run task, so about performance and ecology (several hours per year). JackPotte (talk) 08:41, 22 March 2017 (UTC)
- I still don't see how removing a few images would matter for a category containing tens of thousands of images in its subcategories. Every bot scanning Category:Pronunciation should have a whitelist of extensions (ogg, oga, wav) and ignore other files. --Derbeth talk 06:11, 23 March 2017 (UTC)
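The extension whitelist suggested above can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not the code of any actual bot: the function names is_audio_file and filter_audio_files are hypothetical, the input is assumed to be one file name per line, and matching the extension case-insensitively is an added assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for names ending in a whitelisted
# extension (ogg, oga, wav).
is_audio_file() {
    # ASSUMPTION: lower-case the name so that "Foo.OGG" is accepted too.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *.ogg|*.oga|*.wav) return 0 ;;
        *) return 1 ;;
    esac
}

# Read file names on standard input and print only the whitelisted ones.
filter_audio_files() {
    while IFS= read -r file
    do
        if is_audio_file "$file"; then
            printf '%s\n' "$file"
        fi
    done
}
```

Piping a category listing through filter_audio_files would drop images and any other non-audio files without needing to move them out of the category.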
Pronunciation by language
All the language-specific subcategories are put directly here, mixed with the topic-based cats Mispronunciations, Comparison of pronunciations, ... What about a new category Pronunciation by language? Wanlpz (talk) 14:12, 26 August 2024 (UTC)