Help:Spoken Wikipedia using AI

From Wikimedia Commons, the free media repository

This page is under construction. Please help build this help page. It is very incomplete.

Wikipedia articles can be narrated using modern AI tools. This makes them more accessible to blind people and to people who would like to listen to articles on the go, for example. This page helps you create high-quality audio narrations, either in the article's original language or as translations, and serves as a place to discuss and organize related work.

Tutorials

Arch Mission Foundation
Elephant communication
Heraclitus
2022 in science#August (still some issues with abbreviations and brackets)

SoniTranslate


SoniTranslate is free and open-source software (GitHub repo).

  1. Set it up on your local machine
  2. Open the Wikipedia article and (in Firefox) right-click a citation marker like [1] and select Inspect. In the HTML pane of the Developer Tools window, select the element that has class="reference"; it should be slightly above the currently selected node.
  3. In Filter Styles, enter "refere" so that the "sup.reference" rule is shown. Click below "font-style: normal;" in that rule to add a new declaration: type "display", press Enter, then type "none" into the value field. This hides all citation markers.
  4. If the article has media files such as images, right-click one, select Inspect, then click the element in the HTML that says "mw-default-size". In the rule at the top that reads ".mw-content-ltr figure[typeof~="mw:File/Thumb"]", add display: none as above. This hides all images in the article and removes both their captions and their alt texts.
  5. If there is an infobox, you can either remove its text later or delete the table that contains it: click the table element in the Developer Tools and press the Delete key until the infobox is no longer visible (you can refresh the page to undo your changes). Scroll through the page; if there are more templates, right-click and inspect them, then delete them from the view as well.
  6. If there are any "Main" or "See also" links beneath section headers, inspect one and add display: none to .hatnote to hide all of them at once.
  7. If there are notes like [citation needed], inspect one and hide all sup elements by adding display: none, or use custom CSS that hides the .noprint class.
  8. Now copy the article's contents from start to the beginning of the See also or the Reference section
  9. Paste the text into a text file that you save as an .srt file
  10. At the top above the text enter something like:

1
00:00:00,000 --> 00:10:00,000

  11. At the bottom, below your text, enter something like "This was the English Wikipedia article about ARTICLENAME as of CURRENTDATE, narrated by an AI-generated voice via the open source software SoniTranslate." and save the file.
  12. Now remove things like the International Phonetic Alphabet (IPA) pronunciation that is usually at the very beginning of the article and tags like [citation needed]. Also check whether any other media captions are left in the article that should be removed (this happens with some templates that include multiple media files under one caption).
  13. Also remove any tables (either afterwards, or by clicking on tbody in the HTML after inspecting and then adding display: none to .wikitable) and replace each with something like "The table TABLETITLE is not included in this audio version".
  14. Spell out abbreviations: for example, replace i.e. with "that is", e.g. with "for example", i.a. with "inter alia" or "among other things", M (often) with "million", and so on.
  15. Removing or adding line breaks may be useful, depending on how the copy and paste reformatted the text. Save the .srt file.
  16. In SoniTranslate, specify the source language. "Translate audio to" should be set to the same language if you don't intend to create a translated spoken article. Select Max speakers 1 and a voice that matches the target language.
  17. Under Advanced settings, turn max acceleration down to none/1, enable Acceleration Rate Regulation, select the output type "audio (ogg)", disable Subtitle type, and set Translation process (near the bottom) to disable_translation.
  18. Under "Upload an SRT subtitle file", upload the .srt file, then click Translate at the top. If you get an error, simply try again. If the console shows the 5000-character limit error ("Text length need to be between 0 and 5000 characters"), then either reduce the text size and create one audio file at a time (if you want a translation) or make sure Translation process is set to disable_translation (if you are not translating).
  19. Once it has finished, you can convert the file to Opus and/or trim the end of the file (if the audio is longer than its contents) with this ffmpeg command: ffmpeg -copyts -ss 00:00:00.0 -to 00:08:50.0 -i ./input.mpga -c:a libopus -b:a 192000 ./output.opus (adjust the file paths and the second timestamp). You can replace 192000 with another audio bitrate, or remove -copyts -ss 00:00:00.0 -to 00:08:50.0 if no trimming is needed.
  20. When you upload the file, please name it "Wikipedia - article name" with further info in brackets like "(AI voice)", and put it into the category Category:Spoken Wikipedia articles using English-language speech synthesis.
  21. Theoretically it may be possible to link to the different sections of the audio from the file description via temporal media fragments, but I haven't tried this. Please add information on things like that, which would help improve the quality of these AI-enabled audio files, here or on the talk page.
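
The text-preparation steps above (removing bracketed markers, spelling out abbreviations, adding the cue header and the closing attribution) could be scripted roughly as follows. This is a minimal Python sketch and not part of SoniTranslate: the names clean_text and build_srt, the small abbreviation table, and the choice to join paragraphs with single line breaks are illustrative assumptions. Ambiguous abbreviations such as M are deliberately left for manual review.

```python
import re

# Abbreviations named in the steps above; extend this table as needed.
ABBREVIATIONS = {
    "i.e.": "that is",
    "e.g.": "for example",
    "i.a.": "among other things",
}

# Bracketed markers such as [1], [12] or [citation needed].
BRACKET_MARKER = re.compile(r"\[(?:\d+|citation needed)\]")


def clean_text(text: str) -> str:
    """Remove citation markers and spell out common abbreviations."""
    text = BRACKET_MARKER.sub("", text)
    for short, spoken in ABBREVIATIONS.items():
        text = text.replace(short, spoken)
    # Collapse runs of spaces or tabs left over from copy and paste.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # A blank line ends a cue in the SRT format, so paragraphs are
    # joined with single line breaks to keep everything in one cue.
    text = re.sub(r"\s*\n\s*", "\n", text)
    return text.strip()


def build_srt(text: str, article: str, as_of: str,
              end: str = "00:10:00,000", language: str = "English") -> str:
    """Wrap the cleaned article text in a single SRT cue with attribution."""
    attribution = (
        f"This was the {language} Wikipedia article about {article} "
        f"as of {as_of}, narrated by an AI-generated voice via the "
        "open source software SoniTranslate."
    )
    body = clean_text(text)
    return f"1\n00:00:00,000 --> {end}\n{body}\n{attribution}\n"
```

The returned string can be saved as an .srt file, for example with pathlib.Path("article.srt").write_text(srt, encoding="utf-8"). The result should still be proofread, since IPA pronunciations, leftover captions and tables are not handled here.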

Most of these steps would not be necessary if there were an audio export view similar to a print preview. Until then, a browser extension could spare you many or all of these steps (useful for large articles). What is really needed is a tool that creates these audio files at scale and with many adjustments to improve quality, so it is probably better to build such a tool than to create many files manually. There is also no proper audio player here so far with features like skipping back 10 seconds. There is a request to enable adding buttons on the file description page (and later in the proper audio player) that jump to the timestamps where the article's sections start, so that one can move around the audio.
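
As a rough illustration of what such an export view or extension might do, the following Python sketch extracts text from article HTML while skipping the kinds of elements that are hidden manually in the steps above (sup.reference, .hatnote, .noprint, figures and tables). It is only a sketch under stated assumptions: the class NarrationTextExtractor and the helper extract_narration_text are hypothetical names, all tables and figures are dropped rather than only .wikitable and thumbnails, and the markup is assumed to be well formed, with every non-void element explicitly closed.

```python
from html.parser import HTMLParser

# Elements and classes that should not be narrated (simplified assumption).
SKIP_TAGS = {"figure", "table", "style", "script"}
SKIP_CLASSES = {"reference", "hatnote", "noprint"}
# Void elements have no end tag and must not affect the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# A line break is emitted after these block-level elements.
BLOCK_TAGS = {"p", "h2", "h3", "h4", "li"}


class NarrationTextExtractor(HTMLParser):
    """Collect text while ignoring everything inside skipped elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1  # nested element inside a skipped one
            return
        classes = set((dict(attrs).get("class") or "").split())
        if tag in SKIP_TAGS or classes & SKIP_CLASSES:
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def extract_narration_text(html: str) -> str:
    """Return the narratable text of an HTML fragment."""
    parser = NarrationTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

The depth counter is what lets a single rule such as "skip .hatnote" also drop the links nested inside it, which mirrors the effect of display: none in the browser.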

Problems

  • When translating, input is limited to 5,000 characters, which is cumbersome because one needs to create multiple audio files and combine them with Audacity or an ffmpeg command.
  • It should work when given plain text; providing .srt files can, for example, cause the audio file to be longer than its contents.
  • It should know common abbreviations and either always spell them out or, if possible, infer what each refers to; in some cases this could be done if the tool knew the wikilink set on the text.
  • Some of the same problems that are noted on the video dubbing page
  • Apparently the voice volume can suddenly become louder for unknown reasons.
  • Various issues already described in Help:AI video dubbing#Known problems, such as missing quickconfig buttons (or a way to change the default settings by starting it with a parameter like --config:spokenWP) and no automatic selection of a voice that matches the output language
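
One possible workaround for the 5,000-character limit is to split the text at sentence boundaries, generate one audio file per chunk, and join the files with ffmpeg's concat demuxer. The Python sketch below shows the idea; the function names and file names are hypothetical, the sentence splitting is a simple heuristic, and -c copy assumes that all parts share the same codec and parameters.

```python
import re

MAX_CHARS = 5000  # limit reported in the SoniTranslate error message


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> list:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_list(paths: list) -> str:
    """Contents of the list file read by ffmpeg's concat demuxer."""
    return "".join(f"file '{path}'\n" for path in paths)


def concat_command(list_file: str, output: str) -> list:
    """Argument list for joining the parts without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]
```

The argument list returned by concat_command can be passed to subprocess.run once the list file has been written. Splitting at sentence boundaries keeps each chunk speakable on its own, although pauses at the joins may still need checking by ear.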

Bark


Bark by Suno AI is also open source. You could experiment with it using Wikipedia text as described above, but its voices don't seem as good as SoniTranslate's, and it is currently not well suited to long texts or to texts that should be narrated exactly, without any additions, hallucinations, or slight alterations by the AI. See "Advanced Long-Form Generation" here.

Missing integrated Wikimedia tool & audio-player


There needs to be a proper integrated tool that turns Wikipedia articles into AI-spoken audio, if possible also available as a web UI to active users. It should automatically do things like removing everything that shouldn't be narrated, resolving abbreviations, adding attribution text at the end of the text, adding a standardized description with wikilink and categories, and so on (see above).

Likewise, there needs to be a proper audio player for spoken Wikipedia where it's possible to:

  • add timestamps for each section so one can jump to the section one would like to listen to or skip a section (audio chapters)
  • have a wider player so that when seeking with a click one doesn't miss the intended timestamp by minutes
  • listen to the audio with the phone's screen turned off (without having to download the file)
  • have next and previous track functionality (e.g. for listening to lists or many bookmarked items; this is also critical for the usefulness of music audio files on WMC)
  • jump back by 10 seconds (and maybe a few more jump options like that), as all audiobook and podcast players as well as YouTube offer
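
Until such a player exists, a chapter list for the file description could be generated from known section start times. The following Python sketch assumes the W3C Media Fragments form #t=SECONDS; whether the player on Commons honors such fragments has not been verified here, and the function names, URL and start times are made up for illustration.

```python
def format_timestamp(seconds: int) -> str:
    """Format a number of seconds as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def chapter_links(file_url: str, chapters: list) -> list:
    """Build one line per section: timestamp, title, and a #t= link.

    chapters is a list of (title, start_in_seconds) pairs.
    """
    return [
        f"{format_timestamp(start)} {title}: {file_url}#t={start}"
        for title, start in chapters
    ]


# Hypothetical example data, not taken from a real recording.
for line in chapter_links(
    "https://example.org/Wikipedia_-_Example_(AI_voice).opus",
    [("Introduction", 0), ("History", 95)],
):
    print(line)
```

The start times themselves still have to be found by listening or from the narration tool's output; this sketch only formats them.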

See also
