Commons talk:Video2commons

SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 1 day and sections whose most recent comment is older than 90 days. For the archive overview, see Special:PrefixIndex/Commons talk:Video2commons/Archive.

Before you file a report, please check whether the following list already includes the problem you have.

Known bugs and common errors

  1. Subtitles are not imported https://github.com/toolforge/video2commons/issues/148
    Since circa 2020, subtitles have not been automatically imported even if the option is selected. The cause is unknown; possibly a bug in the code. Previous discussions: Commons_talk:Video2commons/Archive_2#Youtube_subtitle_not_imported Commons_talk:Video2commons/Archive_2#Not_importing_subtitles_from_youtube.--RZuo (talk) 17:04, 1 January 2023 (UTC)[reply]
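For reference, a minimal sketch (an assumption, not video2commons' actual code) of the yt-dlp options that should fetch uploader-provided subtitles; if the importer's options diverge from something like this, that could explain the bug:

```python
# All option names below are real yt-dlp parameters; whether video2commons
# passes them correctly is exactly what this bug report questions.
sub_params = {
    'writesubtitles': True,        # fetch uploader-provided subtitles
    'writeautomaticsub': False,    # skip auto-generated captions
    'subtitleslangs': ['all', '-live_chat'],
    'subtitlesformat': 'srt/ass/vtt/best',
    'postprocessors': [{
        'key': 'FFmpegSubtitlesConvertor',  # convert whatever arrives to srt
        'format': 'srt',
    }],
}
# yt_dlp.YoutubeDL(sub_params).download([url])  # requires yt-dlp installed
```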

Large uploads smoother


https://phabricator.wikimedia.org/T358308

Thanks to User:CGoubert-WMF, User:HNowlan (WMF) and User:Bawolff, large uploads are now probably smoother. Maybe v2c also benefits from this? How has your experience been recently? RZuo (talk) 21:13, 27 April 2024 (UTC)[reply]

I should mention that I'm still aware of at least two issues that might affect big uploads (interruption during deployments, and false positives when scanning for bad files). However, both should happen a lot less than the previous issue that was recently fixed, so hopefully things are working more smoothly now, even if not perfectly. Bawolff (talk) 22:53, 27 April 2024 (UTC)[reply]
File:Discovering Tenggol- Aerial Views and Submerged Delights in 4K.webm, 3.28 GB, successful in one try. RZuo (talk) 11:51, 28 April 2024 (UTC)[reply]
File:Data Center Tour & Technical Deep Dive into the Power, Data and Cooling Infrastructure!.webm, 3.31 GB, successful in one try. RZuo (talk) 14:42, 28 April 2024 (UTC)[reply]
@Bawolff: Hi, I got a message An exception occurred: MaybeEncodingError: b'(\'\\\'PicklingError("Can\\\\\\\'t pickle <class \\\\\\\'video2commons.exceptions.TaskError\\\\\\\'>: import of module \\\\\\\'video2commons.exceptions\\\\\\\' failed")\\\'\', \'"(1, <ExceptionInfo: TaskError(\\\'Sorry, but files larger than 4GB can not be uploaded even with server-side uploading. This task may need manual intervention.\\\')>, None)"\')'. However the MP4 file is only 1.252 GB. So how is it possible to get larger than 4 GB here? Thanks, Yann (talk) 10:45, 14 May 2024 (UTC)[reply]
It's hard to know without looking at the file (and possibly knowing more about MP4 internals than I do), but for some files there can be transcoding options that normally work well yet work terribly for that specific file. As an aside, as far as the 4 GB limit on video2commons goes, I submitted [1], but it's waiting on one of the video2commons maintainers to review it (I think that is @Zhuyifei1999). Bawolff (talk) 20:43, 14 May 2024 (UTC)[reply]
@Bawolff: Thanks for your answer. It is not the only problem. See the screenshot I posted below. Yann (talk) 22:10, 14 May 2024 (UTC)[reply]
The problems below all sound like variations on out-of-memory errors (the "cannot allocate memory" error very obviously looks like OOM, but the errors about processes being killed sound like they could also be OOM-related). I don't know enough about how video2commons works to say more than that. Perhaps the tool in question needs to be modified to do fewer things in parallel. Bawolff (talk) 00:19, 15 May 2024 (UTC)[reply]
@Bawolff: I wonder if I should open bug reports, and with what information? Yann (talk) 08:15, 15 May 2024 (UTC)[reply]
I got the following while trying to upload https://archive.org/details/where-the-north-begins-1923-by-chester-m.-franklin to File:Where the North Begins (1923, 1080p) by Chester M. Franklin.webm: "An exception occurred: MaybeEncodingError: b'(\'\\\'PicklingError("Can\\\\\\\'t pickle <class \\\\\\\'video2commons.exceptions.TaskError\\\\\\\'>: import of module \\\\\\\'video2commons.exceptions\\\\\\\' failed")\\\'\', \'"(1, <ExceptionInfo: TaskError(\\\'Sorry, but files larger than 4GB can not be uploaded even with server-side uploading. This task may need manual intervention.\\\')>, None)"\')'".   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 17:25, 7 July 2024 (UTC)[reply]
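The PicklingError in these reports is about the error reporting itself, not the video: when the worker cannot import video2commons.exceptions, the task queue cannot serialize the TaskError to send it back, so the original message (here the 4 GB one) gets wrapped in a MaybeEncodingError. A minimal illustration of the underlying Python behaviour, using a stand-in TaskError (an assumption meant to mirror, not reproduce, the real class):

```python
import pickle

def make_error():
    # Stand-in for video2commons.exceptions.TaskError: a class that pickle
    # cannot re-import behaves much like one defined locally.
    class TaskError(Exception):
        pass
    return TaskError('files larger than 4GB can not be uploaded')

err = make_error()
try:
    pickle.dumps(err)
    outcome = 'pickled'
except Exception as exc:  # pickle refuses classes it cannot look up again
    outcome = type(exc).__name__
```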

It does not import subtitles!

Example that has subtitles at source and when downloaded

Even with the subtitles option checked, it usually (or at least often) does not import subtitles.

For example, try turning on subtitles for the video on the right, which has subtitles at the source. Here is another example, since I think the uploader had import subtitles checked (note that in this case the subtitles are set to the wrong language at the source, which is rare). Could this please be fixed as soon as possible, so people don't need to upload many videos again to get the subtitles working? It would be best if one could easily add subtitles to the video file itself, so that they are embedded in the video, but for now this is only about the subtitles existing at the source.

There is also an issue that subtitles are not shown for videos where the subtitle was successfully imported using the tool (you can download the video and watch it with subtitles); an issue about that has been created here: phab:T368298. Prototyperspective (talk) 17:22, 24 June 2024 (UTC)[reply]

See the note at the top of this page: "Before you file a report, please check whether the following list already includes the problem you have." RZuo (talk) 19:40, 24 June 2024 (UTC)[reply]
Oops, I missed that part. I was about to delete this section, but seeing how this is still an issue after about four years, the issue was created over two years ago, and there are not many details in the GitHub issue, maybe it should stay. Getting the subtitle downloaded is just a matter of making sure the right yt-dlp command is used, so if that is the problem it should be quick and simple to fix. Also, maybe it has to do with the phab issue about VTT subtitles; should there also be a Phabricator issue about this? From the code issue it seems like it could have to do with language codes; could somebody take a look at it? Prototyperspective (talk) 21:17, 24 June 2024 (UTC)[reply]

Cannot upload age-restricted videos


I get the following error when I try to upload videos that are age-restricted:

An exception occurred: DownloadError: b'ERROR: unable to download video data: HTTP Error 403: Forbidden'

Is there a work-around? FunnyMath (talk) 17:58, 11 July 2024 (UTC)[reply]

@FunnyMath: I don't suppose we could wait til the bot is 18.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 08:00, 13 July 2024 (UTC)[reply]
@Jeff G.: Maybe the bot can be accompanied by another bot that is 18 or older ;)
Anyways, it's not an urgent issue. I'm using yt-dlp to bypass the age-restriction. I'm uploading some public domain Korean War videos that are not child-friendly, so that's why. My issue might be too niche to merit a fix; I would guess not too many public domain or CC-BY videos are age-restricted. FunnyMath (talk) 09:11, 13 July 2024 (UTC)[reply]
What's the option you're using to bypass the restriction? We can simply include the option in video2commons as well. vip (talk) 16:20, 19 August 2024 (UTC)[reply]
I've got the same problem too. – Illegitimate Barrister (talkcontribs), 17:41, 17 July 2024 (UTC)[reply]

I just tried using this code, taken directly from the video2commons download wrapper. It worked for me locally, but on video2commons it failed with: DownloadError: b'ERROR: [youtube] fYH8eSiOf5I: Sign in to confirm your age. This video may be inappropriate for some users.'

import yt_dlp

params = {
    'writedescription': True,
    'writeinfojson': True,
    'writesubtitles': True,
    'writeautomaticsub': False,
    'subtitleslangs': ['all', '-live_chat'],
    'subtitlesformat': 'srt/ass/vtt/best',
    'cachedir': '/tmp/',
    'noplaylist': True,  # not implemented in video2commons
    'postprocessors': [{
        'key': 'FFmpegSubtitlesConvertor',
        'format': 'srt',
    }],
    'max_filesize': 5 * (1 << 30),
    'retries': 10,
    'fragment_retries': 10,
    'prefer_ffmpeg': True,  # avconv does not have an srt encoder
    'prefer_free_formats': True,
}

dl = yt_dlp.YoutubeDL(params)
url = 'https://www.youtube.com/watch?v=fYH8eSiOf5I'
info = dl.extract_info(url, download=True, ie_key=None)
info

-- DaxServer (talk) 09:26, 26 August 2024 (UTC)[reply]

It seems a plugin like https://github.com/pukkandan/yt-dlp-YTAgegateBypass is required, I'll give it a try. vip (talk) 14:34, 26 August 2024 (UTC)[reply]
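Another route yt-dlp itself supports for age-gated videos is reusing a logged-in browser session via cookies. A sketch of the relevant options ('cookiefile' and 'cookiesfrombrowser' are real yt-dlp parameters; whether a shared tool like v2c should hold anyone's account cookies is a separate question):

```python
# Sketch only: these options make yt-dlp present a signed-in session,
# which bypasses the "Sign in to confirm your age" gate.
params = {
    'cookiefile': 'cookies.txt',  # cookies exported from a signed-in browser
    # 'cookiesfrombrowser': ('firefox',),  # alternative: read them directly
    'noplaylist': True,
}
# yt_dlp.YoutubeDL(params).download([url])  # requires yt-dlp installed
```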

AV1 encoding feature request


I think that video2commons should keep up with the times and have the option to convert videos to the AV1 format. Also, v2c should have the option to transcode only a specified section of a source video if, for example, the video contains something copyrightable like an intro animation. prospectprospekt (talk) 04:02, 23 July 2024 (UTC)[reply]

+1. phab:T209437. Yann (talk) 16:34, 23 July 2024 (UTC)[reply]
Agreed. I'll check into that after the ffmpeg/gentoo update (see below). vip (talk) 16:15, 19 August 2024 (UTC)[reply]
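As a sketch of what the requested transcode step might look like, here is an ffmpeg invocation for AV1-in-WebM that also cuts out a section, as asked above. The encoder (SVT-AV1) and all settings are assumptions about a reasonable default, not video2commons' actual pipeline:

```python
# Hypothetical ffmpeg command for an AV1 transcode of part of a video.
# '-ss' skips a (hypothetical) 5-second intro animation; libsvtav1 and
# the CRF/preset values are placeholder choices.
av1_cmd = [
    'ffmpeg',
    '-ss', '00:00:05',            # start after the copyrightable intro
    '-i', 'input.mp4',
    '-c:v', 'libsvtav1',          # SVT-AV1 video encoder
    '-crf', '35', '-preset', '6',
    '-c:a', 'libopus', '-b:a', '128k',
    'output.webm',
]
# subprocess.run(av1_cmd, check=True)  # needs an ffmpeg built with SVT-AV1
```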

API for this tool


Hi. Would it be possible to document the API so that I can send task jobs from my workflow? -- DaxServer (talk) 18:59, 14 August 2024 (UTC)[reply]

Hello @DaxServer, the API is implemented here: https://github.com/toolforge/video2commons/blob/master/video2commons/frontend/api.py
There is no documentation AFAIK but it's quite easy to understand, I managed to do it in my own tool: https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/blob/main/src/main/java/org/wikimedia/commons/donvip/spacemedia/service/wikimedia/Video2CommonsService.java vip (talk) 16:10, 19 August 2024 (UTC)[reply]
Thanks for the info @Don-vip I'll look into it! -- DaxServer (talk) 16:39, 19 August 2024 (UTC)[reply]
I tried to automate it in Python, to no avail; I kept getting errors. In the end, I used Puppeteer to render and authenticate. I wish there were a simpler way! -- DaxServer (talk) 14:44, 26 August 2024 (UTC)[reply]
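For anyone else attempting this, a rough sketch of what a client call might look like. The endpoint path and payload field names below are assumptions for illustration only; the authoritative routes and parameters are in the api.py file linked above, and the part that usually fails is that requests must carry an authenticated (OAuth web login) session:

```python
# Hypothetical client sketch. ENDPOINT and the field names are assumptions;
# check video2commons/frontend/api.py for the real routes and parameters.
ENDPOINT = 'https://video2commons.toolforge.org/api/task/run'  # assumed path

payload = {
    'url': 'https://www.youtube.com/watch?v=EXAMPLE',  # placeholder video
    'filename': 'Example video.webm',
    'format': 'webm (VP9/Opus)',  # assumed option label
    'subtitles': True,
}
# With an authenticated session (the hard part), roughly:
# session.post(ENDPOINT, data=payload)
```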
This could be useful for building a tool to, for example, upload all CC BY videos from a channel, or all free audio files from a musician's channel. Please link such tools in the V2C documentation. Prototyperspective (talk) 12:17, 20 August 2024 (UTC)[reply]
This is what I'm doing in Commons:Spacemedia, and why I stepped up to repair video2commons when it was completely broken. vip (talk) 18:16, 22 August 2024 (UTC)[reply]
Great tool. If you already use V2C this way, maybe a tool could be created from its code, or you could add some documentation on how this was done. There is this tool to see all CC BY videos by a channel, but I think it's dysfunctional, and it can't be used for uploading all of these videos (except for videos already uploaded and videos the user manually unchecks). Prototyperspective (talk) 21:45, 22 August 2024 (UTC)[reply]
Now this page has info on how to download all CC BY videos from a channel, thanks to @Dhx1: Commons:YouTube files/Downloading#Bulk video download. The thing missing is integration with V2C, so one could use the web interface to e.g. select whether subtitles/audio/video should be imported for the files, and alter the descriptions etc. after entering the channel URL. Prototyperspective (talk) 20:28, 30 August 2024 (UTC)[reply]

How long to wait?


I uploaded a 1 min .mov about 4 hours ago. I'm still getting "Your task is pending".

How long to wait before aborting and starting over? Nowa (talk) 19:22, 27 August 2024 (UTC)[reply]

P.S. aborted after waiting 24 hours. Any recommendations for an MOV to MP4 converter app? Nowa (talk) 12:39, 28 August 2024 (UTC)[reply]
I've encountered the same problem. It seems that the tool is once again unusable. - THV | | U | T - 13:16, 28 August 2024 (UTC)[reply]
@Nowa @The Harvett Vault Everything is fine. The queue is just executing my jobs. Your tasks would come next since you created the jobs after me. -- DaxServer (talk) 16:14, 28 August 2024 (UTC)[reply]
@DaxServer
  1. Can you have some consideration for other users and not stuff the tool with your tasks?
  2. If you are batch uploading, shouldn't you apply for a bot account?
RZuo (talk) 17:16, 28 August 2024 (UTC)[reply]
Thanks for the notes @RZuo. I didn't know it would clog up. -- DaxServer (talk) 17:42, 28 August 2024 (UTC)[reply]
OK, so I restarted and it looks like it's making progress. Thanks for the input. Nowa (talk) 21:41, 28 August 2024 (UTC)[reply]
Hello @DaxServer, @Momiji-Penguin, thank you for using this tool, please keep in mind the following facts before requesting a large number of files in order to keep the service available to others:
- this is a small community service, not a powerful infrastructure managed by the WMF
- it can currently handle 20 encoding tasks at the same time
- large videos can take several hours to encode. There's a timeout at 48 hours. If you have no idea how long it will take, please send only a few files to have an estimate before sending dozens or hundreds of files.
- you can monitor the current workload on Grafana, here. The first graph ("Total load") indicates the current CPU usage, from 0 to around 100-120. Please do not send a large number of files if you see that the workload has been higher than 100 for several hours; it means the service is at full capacity (which has now been the case for three days).
Thanks. vip (talk) 21:54, 2 September 2024 (UTC)[reply]
OK, I aborted all my pending and running tasks. I will reduce my tasks to at most two at a time, based on your comment and the resource monitor. Momiji-Penguin (talk) 00:23, 3 September 2024 (UTC)[reply]
Thanks for the Grafana link. I've been looking at the wrong one all this time -- DaxServer (talk) 06:30, 3 September 2024 (UTC)[reply]
@Don-vip: Thanks a lot for that information. I added it to inform people about the limited capacity. Please correct if necessary. Yann (talk) 08:09, 3 September 2024 (UTC)[reply]
@Don-vip: Thanks. Which of mine (if any) have run afoul of the 48 hours timeout?   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 10:30, 5 September 2024 (UTC)[reply]

Age-restricted content


I was hoping to upload My Life as a Porn Star - Sophie Anderson Tells Her Story, but it keeps giving me "Error: An exception occurred: DownloadError: b'ERROR: [youtube] qTyBoU1Xw2w: Sign in to confirm your age. This video may be inappropriate for some users.'" What can I do to get round this?--Launchballer (talk) 00:04, 29 August 2024 (UTC)[reply]

If you have an Android device or emulator, there are NewPipe and SkyTube. They can download age-restricted videos by bypassing the error message. - THV | | U | T - 01:20, 29 August 2024 (UTC); edited: 05:18, 29 August 2024 (UTC)[reply]
Thanks, got anything for desktop?--Launchballer (talk) 08:03, 29 August 2024 (UTC)[reply]
https://github.com/yt-dlp/yt-dlp same as what v2c uses. RZuo (talk) 03:57, 3 September 2024 (UTC)[reply]

Please enable importing lossless flac (+a bandcamp importer?)


In Category:Audio files created by Stellardrone (composer) from Bandcamp and Category:Audio files of music by Raspberrymusic, most files are ogg or opus files even though lossless, higher-quality FLAC files are available. In V2C one can only choose between ogg and opus. Could you please enable FLAC file imports and additionally make that the default? It seems like all files are transcoded to MP3 and OGG anyway (at the bottom of the page), so the default should probably be the lossless file type; it can make a big difference in audio quality.

One can download the FLAC files with yt-dlp as described in Commons:YouTube files/Downloading, by adding --audio-format flac to the command.
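In yt-dlp's Python API the same request looks roughly like this ('FFmpegExtractAudio' and 'preferredcodec' are real yt-dlp names; making FLAC the default is the proposal here, not current V2C behaviour):

```python
# Sketch: ask yt-dlp for the best audio stream and extract it losslessly.
flac_params = {
    'format': 'bestaudio/best',
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',  # real yt-dlp postprocessor
        'preferredcodec': 'flac',     # lossless output, per the request above
    }],
}
# yt_dlp.YoutubeDL(flac_params).download([url])  # requires yt-dlp and ffmpeg
```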

Secondly, and relevant to #API for this tool above and the Bulk video download section, it would be great if there were a Bandcamp importer tool (maybe using V2C via its API).

I think one problem (perhaps the main one) is that yt-dlp currently can't read the license of the audio file, as it can for videos on YouTube, so the part of the bulk video download command (to be used with artist pages or album pages), "license=Creative Commons Attribution license (reuse allowed)", or some equivalent, doesn't work. One can see the metadata yt-dlp can parse with yt-dlp --write-annotations --write-info-json --skip-download url, and the license info is not among it. This leads me to conclude that the next step, alongside the thread about the V2C API above as well as this thread, would be to create an issue in the yt-dlp GitHub repo requesting that the license metadata of tracks on Bandcamp be read. @Dhx1: maybe you know more? Prototyperspective (talk) 18:22, 4 September 2024 (UTC)[reply]

I looked into this @Prototyperspective, and the yt-dlp Bandcamp extractor needs a fair bit of work. There is an ld+json script in each Bandcamp page that contains license information (CC-BY-3.0 vs. all rights reserved), but the yt-dlp Bandcamp extractor currently ignores this ld+json script and only extracts a limited amount of metadata. I can give rewriting the yt-dlp Bandcamp extractor a go, but it'll take me a little while to figure out the project's conventions and code structure. Once that is ready on the yt-dlp side, it should then be easy to duplicate the "bulk audio" section of the YouTube downloading help page for Bandcamp, allowing filtering of only CC-BY-3.0 tracks. Dhx1 (talk) 07:56, 5 September 2024 (UTC)[reply]
@Dhx1: Thanks for replying and being willing to give it a go. Can yt-dlp also download from YouTube Red, Music Premium, YouTube Premium, and Google Play Music? I am looking at https://m.youtube.com/premium/restrictions .   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 10:40, 5 September 2024 (UTC)[reply]
Seemingly yes, as a general answer. See the --cookies-from-browser BROWSER[+KEYRING][:PROFILE][::CONTAINER] command line argument for yt-dlp (and any other articles you can find about using cookies with yt-dlp). The idea is that you log in with a usual web browser session and then yt-dlp reuses the session cookies. Results will probably depend on the "premium" service you're trying to access with yt-dlp, whether DRM is used (it won't work with DRM), etc. Dhx1 (talk) 23:12, 5 September 2024 (UTC)[reply]
Thanks a lot for looking into this, and for these insights! It would be great if you could do that, and if it requires getting familiar with the yt-dlp code, that could be useful for several other very useful tools and/or code changes (for example, bulk importers for some other sites on the resources pages). Please comment here if you get something working or if you have links to the required issues. A separate Bandcamp page is a good idea, and such pages could also be linked from categories like Category:Audio files from Bandcamp/Soundcloud (if you don't implement them, people could learn from your change and implement it in a similar way). Prototyperspective (talk) 23:01, 5 September 2024 (UTC)[reply]
@Prototyperspective yt-dlp's Soundcloud extractor already captures the license and therefore should be ready to use with:
yt-dlp -o "%(title)s-%(id)s.%(ext)s" --match-filters "license=cc-by" --embed-metadata --parse-metadata "CC-BY-3.0:%(meta_license)s" --parse-metadata "%(uploader_url)s:%(meta_artist_url)s" -f bestaudio -x --audio-format opus --continue --retries 4 --download-archive example_user_archive.txt https://soundcloud.com/wearecc/sets/open-minds-from-creative
Obviously change the example URL I've supplied to whatever user account/playlist you're wanting to download CC-BY-3.0 content from. Dhx1 (talk) 23:26, 5 September 2024 (UTC)[reply]

Batch upload


Hi! As explained in the Village pump, there are channels that upload interesting CC content. Is there any way to upload multiple CC-licensed videos from the same YouTube channel quickly?

Thx. TaronjaSatsuma (talk) 15:37, 17 September 2024 (UTC)[reply]
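Until there is a proper batch mode, the usual workaround (per Commons:YouTube files/Downloading#Bulk video download, mentioned above) is a yt-dlp match filter on YouTube's licence field. A sketch of such a command as an argument list (the channel URL is a placeholder; uploading the results to Commons is still a separate step):

```python
# Sketch of a bulk CC BY download from one channel. The match-filter string
# is the licence label YouTube exposes for CC-licensed uploads.
bulk_cmd = [
    'yt-dlp',
    '--match-filters',
    'license=Creative Commons Attribution license (reuse allowed)',
    '-o', '%(title)s-%(id)s.%(ext)s',
    'https://www.youtube.com/@ExampleChannel/videos',  # placeholder channel
]
# subprocess.run(bulk_cmd, check=True)  # requires yt-dlp installed
```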