How to pull data from a website’s Data API Endpoints

Sometimes when you’re trying to pull data from a site, scraping with Selenium/BS4 is overkill and you can accomplish your task simply with the Python Requests library.

The code is VERY simple:

import requests

url = "https://query2.finance.yahoo.com/v1/finance/trending/US?count=50&useQuotes=true&fields=logoUrl%2CregularMarketChangePercent"

headers = {
    "sec-ch-ua": '"Not A(Brand";v="99", "Google Chrome";v="121", "Chromium";v="121"',
    "referer": "https://finance.yahoo.com/rates/",
    "sec-ch-ua-mobile": "?0",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
    "sec-ch-ua-platform": '"Windows"',
}

response = requests.get(url, headers=headers)
if response.status_code == 200:
    print(response.json())
else:
    print(f"Failed to fetch data: {response.status_code}")

You basically want to copy the request as cURL from the Network tab and translate it into the equivalent Requests call.
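As a rough sketch, here's how a copied cURL command maps onto a Requests call. The endpoint and headers below are placeholders for illustration, not from a real capture:

```python
import requests

# Copied from DevTools (right-click the request -> Copy as cURL):
#   curl 'https://example.com/api/data?count=10' \
#     -H 'referer: https://example.com/' \
#     -H 'user-agent: Mozilla/5.0'
#
# The URL after `curl` becomes the url argument, and each -H flag
# becomes a key/value pair in the headers dict.
url = "https://example.com/api/data?count=10"
headers = {
    "referer": "https://example.com/",
    "user-agent": "Mozilla/5.0",
}

# Build the request without sending it, just to show the mapping.
prepared = requests.Request("GET", url, headers=headers).prepare()
print(prepared.url)
print(prepared.headers["referer"])
```

To actually send it, you'd swap the last lines for `requests.get(url, headers=headers)` as in the example above.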

You're probably wondering how to get that cURL format.

These are the steps to find the data endpoints:

Inspect Element

Go to the Network Tab

Filter by Fetch/XHR Requests

Click on requests that look like they might have the data you want, and check the Response tab to see what each one returns.

Then you just set that cURL up in your code with the url and proper headers, do response = requests.get(url, headers=headers), and print response.json().
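Once you have the JSON, you can drill into it like any nested dict. The payload below is a made-up example shaped like many finance APIs, not Yahoo's actual schema:

```python
# A hypothetical payload (illustrative shape only, not Yahoo's real schema).
payload = {
    "finance": {
        "result": [
            {
                "quotes": [
                    {"symbol": "AAPL", "regularMarketChangePercent": 1.2},
                    {"symbol": "TSLA", "regularMarketChangePercent": -0.8},
                ]
            }
        ],
        "error": None,
    }
}

# Use .get() with defaults so a missing key doesn't raise a KeyError.
results = payload.get("finance", {}).get("result", [])
quotes = results[0]["quotes"] if results else []
symbols = [q["symbol"] for q in quotes]
print(symbols)  # ['AAPL', 'TSLA']
```

In real code `payload` would be `response.json()`; print it once first so you know the actual structure before writing the extraction.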

From there you can do whatever you want with the data, as long as the endpoint is hittable. Sometimes sites won't let you hit it directly, and that's when you have to fall back to scraping.
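One quick way to decide is to check the status code before giving up on the API. This is a sketch; the codes listed are common refusal responses (auth, forbidden, rate limit), not specific to any one site:

```python
def is_blocked(status_code: int) -> bool:
    """Return True for status codes that usually mean the site is
    refusing direct API access (time to fall back to scraping)."""
    return status_code in (401, 403, 429)

# Example: pick a strategy based on the response status.
for status in (200, 403, 429):
    strategy = "scrape with Selenium/BS4" if is_blocked(status) else "use the JSON"
    print(status, "->", strategy)
```

In practice you'd pass `response.status_code` in; a 200 with an HTML error page in the body can also mean you're blocked, so it's worth eyeballing the response too.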