
[Python] How to minimise the resource usage of multi-threaded Selenium?

Discussion in 'Python' started by Stack, September 12, 2024.


    In short, I have to test the behaviour of a waiting-room product that guards a website. It is injected into the site's HTML as JavaScript, and I am testing it with a Selenium scraper.

    I made the scraper multi-threaded because the waiting-room script seems to use cookies and browser information to identify visitors. My idea was therefore to start several new drivers at once, each accessing the home page. This is a sample version of what I wrote:

    import os
    import threading
    from time import sleep

    from selenium import webdriver


    class ScrappingThread(threading.Thread):
        # ...init function

        def run(self):
            # options, initUrl and isRedirectContinued are defined in code not shown here.
            is_redirected = False
            redirected_times = 0
            # Loop until the first redirect, then optionally until the required count is reached.
            while (not is_redirected) or (isRedirectContinued and
                    redirected_times < int(os.getenv("REQUIRED_REDIRECTED_TIMES") or 3)):
                # Each pass starts a new browser; this sample never calls driver.quit().
                driver = webdriver.Edge(options=options)
                # Note: any non-empty value, even "false", enables this branch.
                if bool(os.getenv("IS_MONITORED_SECOND_WINDOW") or False):
                    driver.set_window_position(0, -600)
                driver.get(os.getenv("WEBSITE_URL"))
                driver.implicitly_wait(2)
                target_url = driver.current_url
                print(target_url)
                sleep(int(os.getenv("IDLE_TIME_FOR_EACH_ACCESS") or 60))
                # A URL different from the initial one counts as a redirect.
                if initUrl != target_url:
                    is_redirected = True
                    redirected_times += 1
                    if redirected_times >= 4:
                        sleep(int(os.getenv("IDLE_TIME_AFTER_EACH_REDIRECTION") or 5))
                    print(
                        f"The redirect process is completed for {redirected_times} time(s) - Thread {self.index} ({redirected_times >= 4})")
                sleep(int(os.getenv("IDLE_TIME_AFTER_EACH_REDIRECTION") or 10))


    threads = []
    for index in range(int(os.getenv("THREAD_NUM") or 2)):
        t = ScrappingThread(os.getenv("WEBSITE_URL"), index)
        t.start()
        threads.append(t)
        # Wait before a new thread starts so browsers are not opened concurrently.
        sleep(int(os.getenv("IDLE_TIME_BEFORE_STARTING_NEW_THREAD") or 2))

    for t in threads:
        t.join()
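
The loop in the sample creates a new Edge instance on every pass, and nothing in the code shown closes it. The following is a minimal sketch, not the original implementation, of one way to make sure each browser is shut down before the next one starts. The names `managed_driver`, `visit_once` and `make_edge_driver` are hypothetical helpers introduced only for illustration; `driver.quit()` is the standard Selenium call that ends the browser session.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Create a browser via make_driver() and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() ends the whole browser session, not just the current window.
        driver.quit()


def visit_once(make_driver, url):
    """Open url in a fresh browser and return the URL the browser ended up on."""
    with managed_driver(make_driver) as driver:
        driver.get(url)
        return driver.current_url


def make_edge_driver():
    """Hypothetical factory; Selenium is imported lazily so the helpers
    above can be exercised without a browser installed."""
    from selenium import webdriver

    options = webdriver.EdgeOptions()
    return webdriver.Edge(options=options)
```

Inside `run()`, a call such as `target_url = visit_once(make_edge_driver, os.getenv("WEBSITE_URL"))` would then replace the create/get/read steps. Whether the waiting room still treats each visit as a new visitor is something to verify against the real site.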


    However, the program slows down considerably once around 20 threads are running concurrently. It would be great to know whether there is any way to speed it up with Selenium, or whether I should switch to another library for this task.

    UPDATE: I am currently using an HP computer with an i7-1165G7 @ 2.80 GHz and 16.0 GB of RAM.
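
One possible direction, sketched under assumptions rather than offered as a confirmed fix, is to cap how many browsers are alive at once instead of running one long-lived thread per browser. The example below uses the standard-library `ThreadPoolExecutor`. The names `was_redirected`, `run_visits` and `make_headless_edge` are hypothetical. The `--headless=new` argument is assumed to be supported by the installed Chromium-based Edge, and the waiting-room script may treat a headless browser differently from a visible one.

```python
from concurrent.futures import ThreadPoolExecutor


def was_redirected(make_driver, url, init_url):
    """Visit url in a fresh browser; report whether it ended up somewhere else."""
    driver = make_driver()
    try:
        driver.get(url)
        return driver.current_url != init_url
    finally:
        # Always release the browser so at most max_browsers exist at a time.
        driver.quit()


def run_visits(make_driver, url, init_url, total_visits, max_browsers):
    """Run total_visits visits with no more than max_browsers in parallel."""
    with ThreadPoolExecutor(max_workers=max_browsers) as pool:
        futures = [
            pool.submit(was_redirected, make_driver, url, init_url)
            for _ in range(total_visits)
        ]
        # Results come back in submission order.
        return [future.result() for future in futures]


def make_headless_edge():
    """Hypothetical factory for a headless Edge; imported lazily."""
    from selenium import webdriver

    options = webdriver.EdgeOptions()
    options.add_argument("--headless=new")  # assumption: supported by this Edge build
    return webdriver.Edge(options=options)
```

With this shape, `run_visits(make_headless_edge, url, url, total_visits=20, max_browsers=4)` would perform 20 visits while keeping at most four browsers open. A suitable value for `max_browsers` on a given machine has to be found by measurement.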
