
[Python] Using Beautifulsoup To Scrape the data from a worldmap and store this into a csv-file

Discussion in 'Python' started by Stack, October 25, 2024 at 12:32.

  1. Stack

    Stack Participating Member

    I am trying to scrape the data from the site https://www.startupblink.com/startups in order to grab all the startups. I think this is a good chance to do it with Python and Beautiful Soup.

    Technically, we can use Python and Beautiful Soup to scrape the data from the website https://www.startupblink.com/startups

    What is needed: here is an overview of the steps.

    First we send a GET request to the website using the requests library in Python, then we parse the HTML content of the response with Beautiful Soup.

    Next we find the HTML elements that contain the startup data we are interested in, using Beautiful Soup's find or find_all methods.
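    As a quick illustration of find and find_all, here is a self-contained snippet run against a made-up bit of HTML (the tags, class names, and values are hypothetical, just to show the API):

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking a listings page, purely for demonstration
html = """
<div class="startup-list-item"><a class="startup-link" href="https://a.example">Alpha</a></div>
<div class="startup-list-item"><a class="startup-link" href="https://b.example">Beta</a></div>
"""

soup = BeautifulSoup(html, 'html.parser')

# find_all returns every matching element; find returns only the first match
items = soup.find_all('div', {'class': 'startup-list-item'})
print(len(items))                  # 2
print(items[0].find('a').text)     # Alpha
print(items[1].find('a')['href'])  # https://b.example
```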

    Afterwards we extract the relevant information from those elements using Beautiful Soup's string or get methods. Finally we store the data in a format of our choice, such as a CSV file or a database (note: if we used pandas this would be a bit easier, I gather).
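    On the pandas note: assuming the rows have already been extracted into dicts (the values below are placeholders, not real scraped data), the storing step could look like this:

```python
import pandas as pd

# Placeholder rows standing in for what the scraping loop would collect
rows = [
    {'Name': 'Alpha', 'Description': 'AI tools', 'Location': 'Berlin', 'Website': 'https://a.example'},
    {'Name': 'Beta', 'Description': 'Fintech', 'Location': 'Lisbon', 'Website': 'https://b.example'},
]

# DataFrame handles the header row and quoting for us
df = pd.DataFrame(rows)
df.to_csv('startup_data.csv', index=False)
print(df.shape)  # (2, 4)
```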

    Here are some first ideas to get this started:

    import requests
    from bs4 import BeautifulSoup
    import csv

    # Send an HTTP request to the website's URL and retrieve the HTML content
    url = 'https://www.startupblink.com/startups'
    response = requests.get(url)

    # Parse the HTML content using Beautiful Soup
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find all the startup listings on the page
    startup_listings = soup.find_all('div', {'class': 'startup-list-item'})

    # Create a CSV file to store the extracted data
    with open('startup_data.csv', mode='w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['Name', 'Description', 'Location', 'Website'])

        # Loop through each startup listing and extract the relevant information
        for startup in startup_listings:
            name = startup.find('a', {'class': 'startup-link'}).text.strip()
            description = startup.find('div', {'class': 'startup-description'}).text.strip()
            location = startup.find('div', {'class': 'startup-location'}).text.strip()
            website = startup.find('a', {'class': 'startup-link'})['href']

            # Write the extracted data to the CSV file
            writer.writerow([name, description, location, website])

    At this point I think I have to rework the code: I get back only a tiny CSV file of 35 bytes.

    I will have to run more tests to make sure I have the right approach.
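    A 35-byte file is about the size of the header row alone, which suggests find_all matched nothing. A small helper can check how many elements the selector actually finds in the static HTML (same URL and class name as in the script above; whether that class exists on the live page is an assumption):

```python
import requests
from bs4 import BeautifulSoup

def count_listings(html):
    # Count elements matching the selector the script above relies on
    soup = BeautifulSoup(html, 'html.parser')
    return len(soup.find_all('div', {'class': 'startup-list-item'}))

try:
    response = requests.get('https://www.startupblink.com/startups', timeout=30)
    print(response.status_code, count_listings(response.text))
except requests.RequestException as exc:
    print('request failed:', exc)

# If the count is 0, the listings are probably rendered by JavaScript, so the
# data would need to come from the site's JSON/XHR endpoint (visible in the
# browser's network tab) or via a browser-automation tool such as Selenium.
```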

