ffmpeg

Batch-downloading m3u8 streams with ffmpeg, and figuring out how to strip the ads embedded in them

gvhi

29 Jun 2024 • 4 min read

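Lately I've been watching an anime recommended by US General MacArthur, but streaming it online kept stuttering. So I tried downloading the episodes instead: no lag and no ads. Sweet.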

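Pressing F12 on the site turned up the m3u8 stream the player loads. The URLs follow an obvious pattern, so a short bash loop is enough to download everything.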

```bash
for index in {10..1000}
do
    echo "Processing episode $index"

    # URL construction: %E7%AC%AC${index}%E9%9B%86 is the URL-encoded "第${index}集" ("Episode N")
    url="https://s3.fakevideo.com/video/huoyingrenzhe/%E7%AC%AC${index}%E9%9B%86/index.m3u8"

    # Output file path
    output_file="${index}.mp4"

    # Download and remux without re-encoding; aac_adtstoasc converts the TS-style AAC headers for MP4
    ffmpeg -i "$url" -c copy -bsf:a aac_adtstoasc "$output_file" || echo "Failed to process episode $index"

    # Pause between requests to be kind to the server
    sleep 1
done
```
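The percent-encoded chunk in that URL is simply the UTF-8 encoding of 第N集 ("Episode N"). A minimal Python sketch that rebuilds the same URLs, assuming the same base URL as the bash loop (`episode_url` and `EPISODE_URLS` are hypothetical names, not from the original):

```python
from urllib.parse import quote

# Same base URL as the bash loop
BASE_URL = "https://s3.fakevideo.com/video/huoyingrenzhe/"


def episode_url(index: int) -> str:
    # "第{index}集" means "Episode {index}"; quote() percent-encodes it as UTF-8,
    # which is where %E7%AC%AC (第) and %E9%9B%86 (集) come from
    return BASE_URL + quote(f"第{index}集") + "/index.m3u8"


# Equivalent of the bash range {10..1000} (both ends inclusive)
EPISODE_URLS = [episode_url(i) for i in range(10, 1001)]
```

This makes it easy to check a single episode's URL in the browser before kicking off an overnight run.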

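One night later I had 400 episodes downloaded. Coke in hand, time to binge.

Damn. A few episodes in, an ad popped up anyway.

At first I assumed it was an ad video embedded by the web page.

Then it clicked: the ads were spliced in by whoever hosts the videos. Makes perfect sense, really.

Looking back at ffmpeg's output from the download, see those few adjump entries in the middle? Those damn things are the ad segments.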
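Instead of eyeballing the console, the adjump entries can be pulled out of a saved log, for example by appending `2> "${index}.log"` to the ffmpeg call in the loop. A minimal sketch, assuming the log contains ffmpeg's usual `Opening '…' for reading` lines and that ad segments carry `adjump` in their URL (the function names are hypothetical):

```python
import re

# ffmpeg logs a line like this for each segment it fetches:
#   [hls @ 0x55d0c8] Opening 'https://host/path/segment.ts' for reading
OPENING_RE = re.compile(r"Opening '([^']+)' for reading")


def ad_segment_urls(ffmpeg_log: str, keyword: str = "adjump") -> list[str]:
    """Return the distinct segment URLs in an ffmpeg log that contain `keyword`."""
    urls = (m.group(1) for m in OPENING_RE.finditer(ffmpeg_log))
    # dict.fromkeys drops duplicates (the same URL can be logged twice) but keeps order
    return list(dict.fromkeys(url for url in urls if keyword in url))


def has_ads(log_path: str, keyword: str = "adjump") -> bool:
    """True if the log for one episode mentions at least one ad segment."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        return bool(ad_segment_urls(f.read(), keyword))
```

Running `has_ads` over all the logs gives a list of the episodes that actually need cleaning.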

That points straight to the first approach.

How do you skip the URLs containing adjump when downloading?

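- Patch ffmpeg's source: find the relevant code and change it. I only want to watch some anime, too lazy for that.
- Use an m3u8 parser library to filter out the adjump segments, then download.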
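The filtering idea can be sketched without a library at all. The function below is a hypothetical, hand-rolled stand-in for an m3u8 parser. It assumes `index.m3u8` is a media playlist (if it turns out to be a master playlist, apply it to the variant playlist it points to) and that ad segments can be recognized by `adjump` in their URI. It does not rewrite relative URIs inside tags such as `#EXT-X-KEY`.

```python
from urllib.parse import urljoin

# Tags that describe the single segment URI that follows them
SEGMENT_TAGS = {"#EXTINF", "#EXT-X-BYTERANGE", "#EXT-X-DISCONTINUITY", "#EXT-X-PROGRAM-DATE-TIME"}


def strip_ad_segments(playlist_text: str, base_url: str, keyword: str = "adjump") -> str:
    """Drop every segment whose URI contains `keyword` from an HLS media playlist.

    Kept segment URIs are made absolute with `base_url` (the playlist's own URL),
    so the result can be saved locally and handed to ffmpeg.
    """
    out: list[str] = []
    pending: list[str] = []  # segment tags waiting for their URI line
    for raw in playlist_text.splitlines():
        line = raw.strip()
        tag = line.split(":", 1)[0]
        if tag in SEGMENT_TAGS:
            pending.append(line)
        elif not line or line.startswith("#"):
            out.append(line)  # header and other tags pass through unchanged
        elif keyword in line:
            pending.clear()  # ad segment: drop the URI together with its tags
        else:
            out.extend(pending)
            pending.clear()
            out.append(urljoin(base_url, line))
    out.extend(pending)
    return "\n".join(out) + "\n"
```

To use it, fetch the playlist text (e.g. `urllib.request.urlopen(url).read().decode()`), save the filtered result as, say, `10.m3u8`, and download with `ffmpeg -protocol_whitelist file,http,https,tcp,tls,crypto -i 10.m3u8 -c copy -bsf:a aac_adtstoasc 10.mp4`; the whitelist lets a local playlist pull remote segments.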

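But the episodes I spent a whole night downloading already have the ads spliced in. Thinking about that, a second approach pops up.

How do you remove the ads from the mp4 files themselves?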

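- Cut the ads out one by one in a video editor. I opened a few episodes in an editor: the ads are inserted at somewhat random positions, so this can't be batched, fuck. Maybe that's just my lack of video editing experience and I missed some telltale feature of the ads.
- Grab the ad's first and last frames, write a script (OpenCV) that finds the ad's start and end times by comparing images, then batch-cut the ads with ffmpeg.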

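These four approaches are only rough ideas; I just need to get one of them working.

My episodes are almost all downloaded anyway, so the best bet is an ad-removal script. Let's get to it.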

```python
import glob

import cv2
from skimage.metrics import structural_similarity

mp4_files = glob.glob("*.mp4")

# Frames grabbed from the very start and very end of the ad
ad_starts = ["output-662440.0.jpg"]
ad_ends = ["output-679640.0.jpg"]

ad_starts_img = [cv2.imread(file, cv2.IMREAD_GRAYSCALE) for file in ad_starts]
ad_ends_img = [cv2.imread(file, cv2.IMREAD_GRAYSCALE) for file in ad_ends]
if any(img is None for img in ad_starts_img + ad_ends_img):
    raise SystemExit("Error: could not read one of the ad frame images.")

for mp4_file in mp4_files:
    print(mp4_file)

    ad_started = False
    ad_ended = False

    cap = cv2.VideoCapture(mp4_file)

    if not cap.isOpened():
        print("Error: Could not open video:", mp4_file)
        continue  # skip this file instead of aborting the whole batch

    # The ad shows up around the 11-minute mark, so seek there first
    fps = cap.get(cv2.CAP_PROP_FPS)
    start_time = 11 * 60
    start_frame = int(start_time * fps)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)

    while cap.isOpened() and not ad_ended:
        ret, frame = cap.read()
        if not ret:
            break

        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        timestamp = cap.get(cv2.CAP_PROP_POS_MSEC)
        size = (gray_frame.shape[1], gray_frame.shape[0])  # cv2.resize takes (width, height)

        # Before the ad, look for its first frame; once it has started, look for its last frame
        templates = ad_ends_img if ad_started else ad_starts_img
        for template in templates:
            score = structural_similarity(cv2.resize(template, size), gray_frame)
            if score > 0.99:
                if not ad_started:
                    ad_started = True
                    print("start", score, timestamp)
                else:
                    ad_ended = True
                    print("end", score, timestamp)
                break

        # Live preview; waitKey(25) also waits up to 25 ms per frame. Press q to skip this file
        cv2.imshow('Frame', frame)
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break

    cap.release()

cv2.destroyAllWindows()
```
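For reference, scikit-image's `structural_similarity` computes SSIM over small sliding windows (7×7 by default) and returns their mean. For two patches $x$ and $y$ with means $\mu$, variances $\sigma^2$ and covariance $\sigma_{xy}$, each window scores

```latex
\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad C_1 = (K_1 L)^2,\quad C_2 = (K_2 L)^2
```

where $L$ is the data range (255 for the 8-bit grayscale frames here) and the defaults are $K_1 = 0.01$, $K_2 = 0.03$. Identical patches score exactly 1, so the `> 0.99` threshold only fires on a near-exact match. The function also requires both images to have the same shape, which is why the ad frame is resized to the video frame's size first.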

This sample is incomplete: I only posted the key part that finds the ad's start and end timestamps.
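The missing piece is the cut itself. The fourth idea above suggests ffmpeg for it; one way is sketched here, assuming stream copy is acceptable. `build_cut_commands` and `cut_ad` are hypothetical helpers: they split the episode around the ad with `-c copy` and rejoin the two pieces with ffmpeg's concat demuxer, so nothing is re-encoded. The catch is that the second piece starts on a keyframe, so a few ad frames may survive at the seam.

```python
import os
import subprocess


def build_cut_commands(src: str, ad_start_s: float, ad_end_s: float, out: str):
    """Return (commands, list_path, list_text) that drop [ad_start_s, ad_end_s] from src."""
    stem = os.path.splitext(os.path.basename(src))[0]
    part1, part2 = f"{stem}.part1.mp4", f"{stem}.part2.mp4"
    list_path = f"{stem}.concat.txt"
    commands = [
        # Everything before the ad: -t limits the output duration
        ["ffmpeg", "-y", "-i", src, "-t", f"{ad_start_s:.3f}", "-c", "copy", part1],
        # Everything after the ad: -ss placed before -i seeks in the input
        ["ffmpeg", "-y", "-ss", f"{ad_end_s:.3f}", "-i", src, "-c", "copy", part2],
        # Join both pieces with the concat demuxer
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", out],
    ]
    # Relative paths in the list are resolved against the list file's own folder
    list_text = f"file '{part1}'\nfile '{part2}'\n"
    return commands, list_path, list_text


def cut_ad(src: str, ad_start_s: float, ad_end_s: float, out: str) -> None:
    """Write the concat list, then run the three ffmpeg commands in order."""
    commands, list_path, list_text = build_cut_commands(src, ad_start_s, ad_end_s, out)
    with open(list_path, "w", encoding="utf-8") as f:
        f.write(list_text)
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

The OpenCV timestamps are in milliseconds, so a call would look like `cut_ad("10.mp4", ad_start / 1000, ad_stop / 1000, "output/10.mp4")`, with the `output` folder created beforehand. Re-encoding instead (as moviepy does) cuts frame-accurately, at the cost of a much slower run.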

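Later I noticed that some of the inserted ads have a different resolution from the episode itself, so the image comparison could be skipped altogether...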
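The resolution rule is easy to check on its own, away from OpenCV: the first frame whose size differs from the previous one starts the ad, and the next size change ends it. A minimal sketch with a hypothetical helper, `find_ad_window`, fed with `(timestamp_ms, frame_shape)` pairs, which is what the OpenCV loop gets from `cap.get(cv2.CAP_PROP_POS_MSEC)` and `frame.shape`:

```python
from typing import Iterable, Optional, Tuple

Shape = Tuple[int, ...]


def find_ad_window(
    frames: Iterable[Tuple[float, Shape]],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (ad_start_ms, ad_stop_ms); a boundary that is never found comes back as None."""
    prev_shape: Optional[Shape] = None
    ad_start: Optional[float] = None
    for timestamp, shape in frames:
        if prev_shape is not None and shape != prev_shape:
            if ad_start is None:
                ad_start = timestamp  # switched to the ad's resolution
            else:
                return ad_start, timestamp  # switched back to the episode
        prev_shape = shape
    return ad_start, None
```

Keeping the rule in a pure function makes it testable with made-up shapes, and episodes where no boundary is found show up explicitly as `None` instead of crashing later.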

```python
import glob
import os

import cv2
from moviepy.editor import VideoFileClip, concatenate_videoclips  # moviepy 1.x API

mp4_files = glob.glob("*.mp4")
os.makedirs("output", exist_ok=True)  # make sure the output folder exists

for mp4_file in mp4_files:

    ad_started = False

    cap = cv2.VideoCapture(mp4_file)

    if not cap.isOpened():
        print("Error: Could not open video:", mp4_file)
        continue  # skip this file instead of aborting the whole batch

    # The ad shows up around the 11-minute mark, so seek there first
    fps = cap.get(cv2.CAP_PROP_FPS)
    start_time = 11 * 60
    start_frame = int(start_time * fps)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)

    pre_shape = None

    ad_start = None
    ad_stop = None

    # First resolution change = ad starts, second resolution change = ad ends
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        if pre_shape is None:
            pre_shape = frame.shape
        if frame.shape != pre_shape:
            timestamp = cap.get(cv2.CAP_PROP_POS_MSEC)
            if not ad_started:
                ad_start = timestamp
                ad_started = True
            else:
                ad_stop = timestamp
                break

        pre_shape = frame.shape

    cap.release()

    print(mp4_file, ad_start, ad_stop)
    if ad_start is None or ad_stop is None:
        print("Ad boundaries not found, skipping", mp4_file)
        continue

    video = VideoFileClip(mp4_file)
    part1 = video.subclip(0, ad_start / 1000.0)
    # Skip an extra 300 ms past the detected end as a safety margin
    part2 = video.subclip((ad_stop + 300) / 1000.0, video.duration)

    final_clip = concatenate_videoclips([part1, part2])

    # Re-encodes the whole episode with libx264: slow, but the cuts are frame-accurate
    final_clip.write_videofile("output/" + mp4_file, codec='libx264', fps=video.fps)
    video.close()
```
