Incredible Render Performance In Kdenlive With NVENC - But 1 Big Problem

Jean-Baptiste Mardelle jb at kdenlive.org
Fri Jul 27 13:29:50 BST 2018


On Thursday, July 26, 2018 6:17:52 PM CEST, johnar1 wrote:
> I use 16.04 and 18.04 and I have found that using any kernel 
> newer than 4.15 severely breaks the nvidia-driver.
>
> I am currently compiling 4.18 rc3 with the fixed modules to 
> bypass the error caused with the driver but it's a pain.
>
> NVENC is a nice feature, but if you use a lot of transitions, 
> effects and title clips in kdenlive, then it's almost pointless, 
> because gpu utilization is basically halted during these 
> operations.
>
> Tell me if you got it working.
> And definitely try the Shotcut melt binary, it has given me 
> even better nvenc performance than my self-compiled melt 6.11 
> from the latest git.


Hello!

So I got nvenc working and have some interesting results. First, regarding 
the slowdown with the high quality track compositing:

The transition used for high quality track compositing (qtblend) has some 
code to detect whether compositing is necessary. For example, if the video 
on the top track uses a pixel format without an alpha channel, like RGB, we 
don't try to perform the composition and directly return the top frame as 
the result. There are several checks to make sure we don't need to do the 
compositing, and one of them is a check of the aspect ratio. If the frames' 
aspect ratios don't match, we do perform the compositing. This is useful, 
for example, if you put a small image on a track above a video track: you 
usually want to be able to see the video in the area not covered by the 
image.
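
The bypass decision described above can be sketched roughly as follows. 
This is only an illustration in Python, not the actual qtblend code; the 
function name needs_compositing and its parameters are made up for this 
example, and only the two checks mentioned in the text are modelled.

```python
from fractions import Fraction


def needs_compositing(top_has_alpha: bool,
                      top_dar: Fraction,
                      project_dar: Fraction) -> bool:
    """Return True if the top frame has to be blended over the lower track.

    Hypothetical simplification: the real transition performs several
    more checks than the two modelled here.
    """
    if top_dar != project_dar:
        # Aspect ratios differ, so the lower track may be visible in the
        # area not covered by the top frame: composite.
        return True
    # Same aspect ratio: composite only if the pixel format carries alpha.
    return top_has_alpha


# Opaque 16:9 clip in a 16:9 project: the top frame is returned directly.
print(needs_compositing(False, Fraction(16, 9), Fraction(16, 9)))  # False
# Opaque 4:3 image above a 16:9 project: compositing is performed.
print(needs_compositing(False, Fraction(4, 3), Fraction(16, 9)))   # True
```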

This is the reason for the slowdown when rendering with high quality: 
your sample Sintel movie is in fact a 1920x818 video, with a DAR (display 
aspect ratio) of 960:409.

So it does not match the project's 1920x1080, 16:9 DAR, and we do 
perform the compositing, which slows down the process. I am not sure what 
the best way to handle this is, but we can probably find a solution to 
prevent compositing in some cases.
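
The aspect ratio arithmetic can be checked with a few lines of Python. 
This sketch assumes square pixels (a sample aspect ratio of 1:1), which is 
my assumption here; the helper name display_aspect_ratio is made up for 
the example.

```python
from fractions import Fraction


def display_aspect_ratio(width: int, height: int,
                         sar: Fraction = Fraction(1, 1)) -> Fraction:
    """DAR = (width / height) * sample aspect ratio, in lowest terms."""
    return Fraction(width, height) * sar


clip_dar = display_aspect_ratio(1920, 818)      # the Sintel sample
project_dar = display_aspect_ratio(1920, 1080)  # the project profile

print(f"{clip_dar.numerator}:{clip_dar.denominator}")        # 960:409
print(f"{project_dar.numerator}:{project_dar.denominator}")  # 16:9
print(clip_dar == project_dar)                               # False
```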

Now regarding the nvenc performance, I also got some very encouraging 
results. Using a 1920x1080 mp4 sample clip of 46 seconds, I got the 
following results with my GTX 1050 TI card (all tests use the default "high 
quality" track compositing):

Test 1: No effects, one clip in timeline. Render times:
libxvid: 1m30s
nvenc:   7s (!!!)

Test 2: one clip in timeline, with lift/gamma/gain color correction. Render 
times:
libxvid: 1m29s
nvenc:   36s

Test 3: one clip in timeline, with sepia effect. Render times:
libxvid: 1m29s
nvenc:   8s

(sepia filter works in yuv422, while lift/gamma/gain requires an rgb 
colorspace, which explains the performance diff).

Test 4: one extra image clip composited over the video. Render times:
libxvid: 1m37s
nvenc:   1m02s
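
To put the four tests side by side, the speedup factors can be computed 
with a short script. The times are copied from the results above; the 
helper to_seconds and the test labels are only illustrative.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '1m30s' or '7s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds)


# (libxvid, nvenc) render times from the four tests.
results = {
    "Test 1 (no effects)": ("1m30s", "7s"),
    "Test 2 (lift/gamma/gain)": ("1m29s", "36s"),
    "Test 3 (sepia)": ("1m29s", "8s"),
    "Test 4 (image composited)": ("1m37s", "1m02s"),
}

for name, (libxvid, nvenc) in results.items():
    speedup = to_seconds(libxvid) / to_seconds(nvenc)
    print(f"{name}: {speedup:.1f}x faster with nvenc")
```

This gives roughly 13x, 2.5x, 11x and 1.6x for tests 1 to 4.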

So even with effects and transitions we still get a comfortable time gain. 
It will also be interesting to integrate this in Kdenlive's internal uses, 
like timeline preview and creating proxy clips, which will make everything 
faster (I successfully created I-frame-only clips with nvenc, so we can get 
frame-accurate seeking). I also made some tests with the scale_npp 
rescale filter: 

ffmpeg -y -hwaccel cuvid -c:v h264_cuvid -i Sintel.2010.1080p.mkv \
  -filter:v scale_npp=320:200 -vcodec h264_nvenc result.mp4
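
For scripted proxy creation, the same call could be assembled like this. 
It is a sketch only: the function name proxy_command and its defaults are 
hypothetical, and the ffmpeg options are exactly the ones from the command 
above, nothing more.

```python
import shlex


def proxy_command(source: str, target: str,
                  width: int = 320, height: int = 200) -> list[str]:
    """Build the argument list for the ffmpeg proxy call shown above."""
    return [
        "ffmpeg", "-y",
        "-hwaccel", "cuvid", "-c:v", "h264_cuvid",   # decode on the GPU
        "-i", source,
        "-filter:v", f"scale_npp={width}:{height}",  # rescale on the GPU
        "-vcodec", "h264_nvenc",                     # encode on the GPU
        target,
    ]


cmd = proxy_command("Sintel.2010.1080p.mkv", "result.mp4")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file 
names that contain spaces.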

Transcoding the full movie to a 320x200 proxy (using ffmpeg only):
Using nvenc and scale_npp: 43s
Using nvenc and ffmpeg's normal scale filter: 1m18s
Using libxvid and the normal scale filter: 1m31s
Compared to no resize:
Using nvenc without resizing: 1m11s
Using libxvid without resizing: 6m08s


So that means we should be able to create proxy clips about twice as fast, 
and probably timeline previews as well.
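
As a quick check of the "twice as fast" estimate, here are the resized 
transcoding times relative to libxvid. Taking libxvid with the normal 
scale filter as the baseline is my choice for this illustration.

```python
# Full-movie transcoding times to a 320x200 proxy, in seconds.
times = {
    "nvenc + scale_npp": 43,
    "nvenc + scale": 78,    # 1m18s
    "libxvid + scale": 91,  # 1m31s
}

baseline = times["libxvid + scale"]
for method, seconds in times.items():
    print(f"{method}: {seconds}s, {baseline / seconds:.2f}x vs libxvid")
```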

The only sad part is that it requires the NVIDIA proprietary drivers, but 
maybe FFmpeg's other free hwaccel methods will be usable as well in the 
future. If anybody wants to test other hardware, you are welcome.

In any case, I will integrate these nvenc speedups in the big December 
2018 refactoring release.

Best regards

Jean-Baptiste

> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On July 26, 2018 10:47 AM, jb <jb at kdenlive.org> wrote:
>
> On 26.07.18 at 10:02, johnar1 wrote:
>
> Hello Jean,
>
> yes that is correct, switching from High Quality compositing 
> had a major impact on performance and I am not sure why.
>
> I can help you out getting NVENC to work.
> What platform are you on?
>
> Thanks. My main issue is the NVidia driver, since I currently 
> use an Ubuntu 16.04 based distro, and FFmpeg complains about my 
> NVIDIA driver version < 390. I will upgrade my distro and give 
> some feedback tomorrow.
>
> After that I guess a page on setting up nvenc would be nice on our wiki:
> https://community.kde.org/Kdenlive/Development/KF5
>
> Regards
> Jean-Baptiste
>
> There are a couple things to be mindful of.
>
> .)After installing the graphics driver you need to generate an xorg.conf
> .)Install the NVENC Headers from here: 
> https://github.com/lutris/ffmpeg-nvenc/issues/22
> .)Use the correct NVENC parameter in your render profile, as 
> the current one has been deprecated.
> Here's my profile:
> f=mp4 vcodec=h264_nvenc gb=21 vq=21 acodec=aac ab=384k r=60 
> preset=slow g=120 bf=2
>
> .)You need to compile ffmpeg and mlt with the following flags:
> ./configure --enable-nvenc --enable-cuvid --enable-nonfree
>
> Or if you don't want to compile yourself you can simply use the 
> melt binary + libs from the most recent Shotcut build.
> https://github.com/mltframework/shotcut/releases/download/v18.07/shotcut-linux-x86_64-180702.tar.bz2
>
> Instructions here:
> https://www.youtube.com/watch?v=X14GvmBpq08&t=314s
>
> This will get NVENC working 100%.
> When you're done, you need to track the GPU utilization in the 
> driver and make sure it is working. How many frames your GPU can 
> push depends on the power of your CPU. I use the 
> kdenlive_multirender  script to ensure 100% utilization on all 
> cores and subsequently higher GPU utilization.
> https://github.com/unfa/kdenlive-multirender
>
> Let's keep in touch!
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On July 26, 2018 9:30 AM, Jean-Baptiste Mardelle <jb at kdenlive.org> wrote:
>
>
> On 25.07.2018 21:38, johnar1 wrote:
> Dear Mr. Vincent Pinon,
>
> if that is in fact your real name, my first born son shall 
> henceforth be known as Vincent.
>
> Your suggestion was spot on and according to my tests so far I 
> believe it works.
>
> Here are my findings:
>
> Rendering 1st minute of Sintel 1080p/60FPS , 4 Threads, NVENC 
> enabled,  Kdenlive 18.04 AppImage
>
> Hello Johnar,
>
> I am myself trying to set up a working nvenc environment and 
> hope to make some more tests.
>
>
> .)Track Composition - "None" 
> [CPU Utilization: 70% - 67% - 64% - 80%] [GPU Utilization: 70%] 
> [Render Time: 15s]
> I noticed however, that transitions such as for example Slide 
> or Composite are rendered improperly, with certain interference 
> patterns.
>
> After consulting the documentation, I realized that this should 
> be fixed by disabling any surrounding empty tracks, but so far I 
> have not been able to achieve that.
>
> .)Track Composition - "Preview"
> [CPU Utilization: 67% - 65% - 69% - 75%] [GPU Utilization: 65%] 
> [Render Time: 20s]
>
> I conclude that the best of both worlds comes into play with 
> this option enabled.
> Both the GPU and CPU are almost fully utilized, while it 
> appears that transitions are rendered correctly.
>
> So if I understand correctly, rendering the same project with 
> Track compositing set to "High Quality" has a major impact and 
> you get this result:
> [CPU Utilization: 100% - 7% - 10% - 18%] [Video Engine 
> Utilization (NVENC): 8%] [Render Time: 2m54s]
>
> This seems strange to me since Kdenlive's "high quality" track 
> compositing uses the qtblend transition that should 
> automatically be bypassed when there is no transparency in the 
> video. If you can confirm that and that this simple change in 
> track compositing has such an impact this definitely has to be 
> checked... Also, Dan recently fixed many of the "affine" 
> transition issues, so it should give results similar to the 
> "qtblend" transition but may be faster.
>
> Thanks for all your investigations, I hope to come back with 
> more information once I have successfully completed my setup.
>
> Best regards
> Jean-Baptiste
>
>
>
> I will do more thorough testing and read any documentation 
> available, as I absolutely want to understand what exactly these 
> options do.
>
> I think it's a bit counterintuitive to have "High Quality" 
> enabled by default, which so heavily impacts performance, while 
> in my opinion not making enough of an effort to alert the user 
> to the extreme effects it may have on render times.
>
> I literally spent 72 hours straight, compiling every single 
> version of melt and kdenlive, documenting and testing every 
> possible compilation parameter variation, performance reviews 
> with every available version of Ubuntu, corresponding kernels 
> and nvidia drivers, and every remotely related kdenlive option 
> or workarounds.
>
> This has definitely shortened my life span by about 3 - 4 years.
>
> I would like to extend my gratitude to you, good sir.
>
>
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On July 25, 2018 2:23 AM, Vincent Pinon <vpinon at kde.org> wrote:
>
> Hello,
>  
> I don't know how precisely you do the job in kdenlive
> (you could share the .kdenlive, the .sh.mlt, a screenshot),
>  
> one thing I suspect is the track composition (automatic transparency):
> if you keep the default high quality choice,
> kdenlive adds "composite & transform" transitions that are based on Qt.
> So without gpl / qt module, MLT skips these transitions.
>  
> Could you run your test switching to "no transparency"?
> (toolbar just above timeline)
>  
> Thanks for your enthusiastic investigations :)
>  
> Vincent
>  
> On Tuesday, 24 July 2018, 23:26:01 CEST, johnar1 wrote:
> Hey guys, I have some more info.
>
> Hey Eugen, I have some more info.
> For this test I used mlt 6.11, successfully compiled by Dan 
> Dennedy's build-melt.sh
>
> The test file that I am using is the 1080p version of Sintel.
> https://durian.blender.org/download/
> CPU: i5 6600K, GPU:GTX 750 TI nvidia-390 driver , Platform: Kubuntu 18.04
>
> In order to check melt and isolate the problem I simply 
> rendered the first minute of the Sintel short film with the 
> following command:
> (This is not the /bin/melt, but the script which launches it 
> with the correct libs)
>
> /home/frank/melt/20180724/melt -profile atsc_1080p_60 
> sintel.mkv out=3600 -consumer avformat:result-60.mp4 f=mp4 
> vcodec=h264_nvenc preset=slow
> [CPU Utilization: 67% - 70% - 68% - 74%] [Video Engine 
> Utilization (NVENC): 80%] [Render Time: 20s]
>
>
> It obviously works perfectly.
>
>
> Now, selecting this melt in the kdenlive environment, along with 
> ffmpeg, ffplay, ffprobe and the profiles path from Dan's 
> melt folder, yields the following results when rendering the 
> first minute of Sintel.
> [CPU Utilization: 100% - 7% - 10% - 18%] [Video Engine 
> Utilization (NVENC): 8%] [Render Time: 2m54s]
> Something in kdenlive breaks parallel processing, only allowing 
> 1 single core to be fully utilized.
> And I have tested every single version of kdenlive available on this earth.
> Every app image, including the refactoring version and every 
> single ppa version, including stable, dev and master.
>
>
> Also generating and launching the render script from the 
> terminal yields the same result.
>
> RENDERER="/home/frank/kdenlive/bin/kdenlive_render"
> MELT="/home/frank/melt/20180724/melt"
> SOURCE_0="file:///home/frank/Documents/scripts/script001.sh.mlt"
> TARGET_0="file:///home/frank/Documents/untitled.mkv"
> PARAMETERS_0="-pid:2664 in=0 out=3052 $MELT atsc_1080p_60 
> avformat - $SOURCE_0 $TARGET_0 vcodec=nvenc_h264 threads=4 
> real_time=-1"
>
> $RENDERER $PARAMETERS_0
>
> I have also tested different kdenlive_render executables/libs 
> with the same result.
>
> I should note, that using the kdenlive_multirender script in 
> conjunction with the generated render script by kdenlive, while 
> specifying 4 threads, the CPU uses 2 cores at 100%.
> https://github.com/unfa/kdenlive-multirender
>
> Now as I have described before, when I compile melt without 
> enabling gpl, the 1 minute of Sintel renders perfectly again, 
> with full utilization on both the CPU and GPU, but from within 
> Kdenlive this time.
>
> I conclude that this problem is somehow caused by Kdenlive and 
> related to qt, but I do not possess the knowhow to further 
> analyze it.
>
> With the latest 18.08 Beta18 and the most recent QT version, 2 
> cores instead of 1 are now being utilized at 100% with the NVENC 
> profile and 100% on all 4 cores using the MP4 h264 profile.
>
> So this is 100% a QT issue with NVENC, but I need further 
> insight from professionals like yourselves.
>
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On July 16, 2018 7:51 AM, johnar1 <johnar1 at protonmail.com> wrote:
>
>
>
> System: i5 6600K, 1050TI, Ubuntu 18.04, Kernel 4.16
> I have successfully compiled mlt and ffmpeg with nvenc support 
> using the official nvenc headers stripped from the Nvidia SDK.
> Rendering the first minute of the 1080p Sintel version, with 4 
> threads specified and my nvenc profile, finishes in 10 seconds.
> Sintel can be downloaded here: https://durian.blender.org/download/
> Nvenc Profile: (compatible with recent mlt versions which are 
> nvenc-enabled by default)
> f=mp4 vcodec=nvenc_h264 global_quality=21 vq=21 preset=slow bf=2 ab=384k
>
>
> Now here is the problem that I do not understand:
> Using the latest version of kdenlive from the kdenlive-master 
> ppa combined with the newly compiled versions of ffmpeg and mlt 
> works perfectly, but only under very specific circumstances.
>
> I have only been able to get rendering with nvenc to work 
> properly when I open one specific kdenlive save file, which I 
> made of the first minute of the Sintel short film with the 
> AppImage version of Kdenlive. After launching the 
> ppa/installed version of kdenlive and opening this save file, 
> rendering with nvenc works flawlessly.
>
> If I simply start a new project, adding the whole Sintel short 
> film to the project bin, cutting the first minute and render it, 
> nvenc simply does not work and the render time is tripled, 
> despite having changed nothing else, including the nvenc render 
> profile.
>
> If I create a save file of the first minute of Sintel with the 
> installed version and open it on the Appimage version, nvenc 
> does not work again.
>
> Conclusion: There must be something in this save file, maybe a 
> parameter, additional settings or any type of code not present in 
> the default kdenlive project profiles, which enables NVENC.
>
> I would greatly appreciate it if we could find out the source 
> of this problem together.
>
> Kdenlive Appimage Save File with which NVENC works:
> https://pastebin.com/rzjR57DJ
>
> PPA/Installed Version of Kdenlive created Save File which breaks NVENC:
> https://pastebin.com/3uQ8sP0C