Software Degeneracy
Downloading every video off of a YouTube channel with Python using Selenium + PyTube

Carson Szeder
February 04, 2024

The last video I made was on automating clip editing with MoviePy.

I realized it would also be useful to show you how to download videos from YouTube, so you have something to clip up.

We can do this using Selenium + PyTube.

I’ll get into it, but if you want to skip the yapping and go straight to the repo, here it is:

github.com/Carsoncantcode/yt-transcription

The repo has two files: one for initialization and one for the functions.

This is our main code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.action_chains import ActionChains
import time
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from pytube import YouTube
from pytube.exceptions import AgeRestrictedError
import os
from functions import *

directory = input("What will this directory be named? ")

channel = input("What channel are we downloading? ")

parent_dir = "C:\\Open Source\\yt transcription"

path = os.path.join(parent_dir,directory)

driver = create_webdriver_instance()

driver.get(f'https://www.youtube.com/@{channel}/videos')

scroll_to_bottom(driver)

links = driver.find_elements(By.CSS_SELECTOR, 'a.yt-simple-endpoint.style-scope.ytd-rich-grid-media')

hrefs = [link.get_attribute('href') for link in links]

complete_hrefs = [href for href in hrefs if href is not None]

for href in complete_hrefs:
    print(href)
    result = doownloadVideo(href, path)
    print(f"{href}: {result}")


print("Total number of links:", len(complete_hrefs))

time.sleep(3)

driver.quit()

directory = input("What will this directory be named? ")

channel = input("What channel are we downloading? ")

parent_dir = "C:\\Open Source\\yt transcription"

path = os.path.join(parent_dir,directory)

driver = create_webdriver_instance()

driver.get(f'https://www.youtube.com/@{channel}/videos')

The snippet above is how the script starts. We enter the name of the directory we want to save everything to, and the name of the channel we want to download everything from.

parent_dir is your parent directory, and path joins parent_dir with the directory name to give the folder the videos get saved into.
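To see what path ends up holding, here is a small sketch. The build_download_path helper and the folder name are made up for illustration; only os.path.join comes from the script. Keep in mind that os.path.join only builds the string and does not touch the disk, so this sketch creates the folder explicitly with os.makedirs.

```python
import os
import tempfile


def build_download_path(parent_dir, directory):
    """Join the parent directory and the folder name, then make sure the folder exists."""
    path = os.path.join(parent_dir, directory)
    # exist_ok=True keeps a second run from failing when the folder is already there.
    os.makedirs(path, exist_ok=True)
    return path


# A temporary directory stands in for parent_dir so the example runs anywhere.
with tempfile.TemporaryDirectory() as parent_dir:
    path = build_download_path(parent_dir, "my_channel")
    print(path)
    print(os.path.isdir(path))
```

Swapping the hardcoded Windows path for something like this also makes the script easier to run on other machines.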

driver holds the webdriver instance returned by create_webdriver_instance(), and driver.get opens the channel's videos page.
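create_webdriver_instance lives in the functions file and isn't shown in this post, so here is a rough sketch of what a helper like that could look like. Everything in it is an assumption: the chrome_arguments helper, the headless parameter, and the choice of Chrome flags are mine for illustration, and the version in the repo may differ.

```python
def chrome_arguments(headless=False):
    """Return the Chrome command-line flags this sketch passes to the browser."""
    args = ["--mute-audio", "--window-size=1920,1080"]
    if headless:
        # Chrome's newer headless mode; leave it off to watch the browser work.
        args.append("--headless=new")
    return args


def create_webdriver_instance(headless=False):
    """Hypothetical stand-in for the repo's helper: start Chrome through Selenium."""
    # Imported here so chrome_arguments can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


print(chrome_arguments(headless=True))
```

Calling create_webdriver_instance() needs Selenium and a local Chrome install; the flag helper runs on its own.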

scroll_to_bottom(driver)

This function runs a while loop that keeps scrolling down to load more links, until the page height stops changing.

def scroll_to_bottom(driver):
    last_height = driver.execute_script("return document.documentElement.scrollHeight")

    while True:
        driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")

        time.sleep(2)

        new_height = driver.execute_script("return document.documentElement.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

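This is the while loop I mentioned above. It scrolls to the bottom of the page, waits two seconds for more videos to load, and stops once the page height is the same as it was on the previous pass.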
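One thing that loop can't do is give up: on a huge channel, or a page that keeps growing, it runs until the height settles. Below is a sketch of the same loop with a cap on the number of scrolls. The pause and max_rounds parameters and the FakeDriver class are my own additions for illustration; FakeDriver only imitates execute_script so the example runs without a browser.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=500):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    get_height = "return document.documentElement.scrollHeight"
    last_height = driver.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new loaded, so we are at the bottom
        last_height = new_height
    return rounds


class FakeDriver:
    """Stand-in for a Selenium driver: the page grows on each scroll up to max_height."""

    def __init__(self, start_height, step, max_height):
        self.height = start_height
        self.step = step
        self.max_height = max_height

    def execute_script(self, script):
        if script.startswith("return"):
            return self.height
        # Any other script is treated as a scroll that loads more content.
        self.height = min(self.height + self.step, self.max_height)
        return None


fake = FakeDriver(start_height=1000, step=1000, max_height=3000)
print(scroll_to_bottom(fake, pause=0))  # prints 3
```

With a real driver you would keep the default two-second pause, since a shorter wait can end the loop before new videos have loaded.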

This is the code that finds all of the links and downloads everything:

links = driver.find_elements(By.CSS_SELECTOR, 'a.yt-simple-endpoint.style-scope.ytd-rich-grid-media')

hrefs = [link.get_attribute('href') for link in links]

complete_hrefs = [href for href in hrefs if href is not None]

for href in complete_hrefs:
    print(href)
    result = doownloadVideo(href, path)
    print(f"{href}: {result}")

links = driver.find_elements(By.CSS_SELECTOR, 'a.yt-simple-endpoint.style-scope.ytd-rich-grid-media')

links collects every element that matches the CSS selector, which are the anchor tags that carry the video link in their href attribute.

hrefs = [link.get_attribute('href') for link in links]

complete_hrefs = [href for href in hrefs if href is not None]

We need two lists here. hrefs pulls the href attribute out of every element we scraped. But I found that some elements have the same class and NO video link, so we filter them out with complete_hrefs.
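The filtering step can be tried on its own with plain strings. The sketch below also drops repeated links, in case the selector ever matches the same video more than once; that is an assumption on my part, not something the script checks. The clean_hrefs name and the sample URLs are made up for the example.

```python
def clean_hrefs(hrefs):
    """Drop missing hrefs and repeated links, keeping the original order."""
    present = [href for href in hrefs if href is not None]
    # dict.fromkeys keeps the first occurrence of each key in insertion order.
    return list(dict.fromkeys(present))


# Made-up sample data: two links, a None, and a repeat of the first link.
sample = [
    "https://www.youtube.com/watch?v=aaa",
    None,
    "https://www.youtube.com/watch?v=bbb",
    "https://www.youtube.com/watch?v=aaa",
]
print(clean_hrefs(sample))
```

Skipping repeats matters more here than in most scrapers, because each extra link means downloading a whole video again.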

for href in complete_hrefs:
    print(href)
    result = doownloadVideo(href, path)
    print(f"{href}: {result}")

This loop takes that filtered list and runs the doownloadVideo function on each link.

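def doownloadVideo(link, path):
    try:
        video = YouTube(link)
        # first() returns None when the video has no 720p stream.
        stream = video.streams.filter(res='720p').first()
        if stream is None:
            print(f"No 720p stream found for video {link}. Skipping...")
            return 'Skipped due to missing 720p stream'
        stream.download(path)
        return 'Downloaded'
    except AgeRestrictedError:
        print(f"Error: Video {link} is age restricted. Skipping...")
        return 'Skipped due to age restriction'
    except Exception as e:
        # Catch everything else so one bad video doesn't stop the whole run.
        print(f"An error occurred for video {link}: {e}")
        return 'Skipped due to an error'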

This function is very simple, as PyTube isn’t complex.

We try to load the video from the link and download its 720p stream. If anything goes wrong, we return a status message instead of raising, so the loop keeps running whether or not a given video downloads.
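Since every call hands back a status string, it's easy to tally how the whole run went. Here is a minimal sketch of that idea. download_all and fake_download are hypothetical names; fake_download just pretends to be the download function so the example runs without PyTube or a network connection.

```python
from collections import Counter


def download_all(links, path, download_fn):
    """Run download_fn on every link and count how many ended in each status."""
    results = Counter()
    for link in links:
        status = download_fn(link, path)
        print(f"{link}: {status}")
        results[status] += 1
    return results


def fake_download(link, path):
    """Stand-in for the real download function: fails on links containing 'bad'."""
    return "Skipped due to an error" if "bad" in link else "Downloaded"


summary = download_all(["video-1", "bad-video", "video-2"], "downloads", fake_download)
print(dict(summary))
```

In the real script you would pass the actual download function and complete_hrefs instead of the fakes, then print the summary next to the total number of links.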

And that’s pretty much the entirety of the script. Great for downloading YouTube videos at scale.