I am trying to do a simple web scrape for the price, title, and URL of every product on the first page of this website: https://equi.life/pages/search-results-page?q=all%20products&tab=products&sort_by=title&sort_order=asc&page=1

I keep running into this error: "Playwright page not found".

These are my specs:

HP ENVY x360 Convertible 15-es1xxx
Processor: 11th Gen Intel(R) Core(TM) i7-1195G7 @ 2.90GHz 1.80 GHz
Installed RAM: 16.0 GB (15.8 GB usable)
Device ID: 1975E5CA-1426-41A2-94D4-CF532B2C84B8
Product ID: 00342-22041-47520-AAOEM
System type: 64-bit operating system, x64-based processor
Edition: Windows 11 Home
Version: 23H2
Installed on: 7/5/2024
OS build: 22631.4112
Experience: Windows Feature Experience Pack 1000.22700.1034.0

I am using WSL with Ubuntu. These are the packages I have:

appdirs==1.4.4
attrs==24.2.0
Automat==24.8.1
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.3.2
constantly==23.10.4
cryptography==43.0.1
cssselect==1.2.0
defusedxml==0.7.1
filelock==3.16.0
greenlet==3.0.3
hyperlink==21.0.0
idna==3.8
importlib_metadata==8.4.0
incremental==24.7.2
itemadapter==0.9.0
itemloaders==1.3.1
jmespath==1.0.1
lxml==5.3.0
packaging==24.1
parsel==1.9.1
playwright==1.46.0
Protego==0.3.1
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycparser==2.22
PyDispatcher==2.0.7
pyee==11.1.0
pyOpenSSL==24.2.1
queuelib==1.7.0
requests==2.32.3
requests-file==2.1.0
Scrapy==2.11.2
scrapy-playwright==0.0.41
service-identity==24.1.0
setuptools==74.1.2
tldextract==5.1.2
tqdm==4.66.5
Twisted==24.7.0
typing_extensions==4.12.2
urllib3==2.2.2
w3lib==2.2.1
websockets==10.4
zipp==3.20.1
zope.interface==7.0.3

This is the log, including the error, from running my spider:

Loading items.py
2024-09-11 15:28:45 [scrapy.utils.log] INFO: Scrapy 2.11.2 started (bot: webscraper)
2024-09-11 15:28:45 [scrapy.utils.log] INFO: Versions: lxml 5.3.0.0, libxml2 2.12.9, cssselect 1.2.0, parsel 1.9.1, w3lib 2.2.1, Twisted 24.7.0, Python 3.12.3 (main, Jul 31 2024, 17:43:48) [GCC 13.2.0], pyOpenSSL 24.2.1 (OpenSSL 3.3.2 3 Sep 2024), cryptography 43.0.1, Platform Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
2024-09-11 15:28:45 [scrapy.addons] INFO: Enabled addons:
[]
2024-09-11 15:28:45 [asyncio] DEBUG: Using selector: EpollSelector
2024-09-11 15:28:45 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2024-09-11 15:28:45 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.unix_events._UnixSelectorEventLoop
2024-09-11 15:28:45 [scrapy.extensions.telnet] INFO: Telnet Password: 24ad54f81b9ec806
2024-09-11 15:28:46 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2024-09-11 15:28:46 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'webscraper',
 'FEED_EXPORT_ENCODING': 'utf-8',
 'NEWSPIDER_MODULE': 'webscraper.spiders',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'SPIDER_MODULES': ['webscraper.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2024-09-11 15:28:47 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'webscraper.middlewares.WebscraperDownloaderMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-09-11 15:28:47 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'webscraper.middlewares.WebscraperSpiderMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-09-11 15:28:47 [scrapy.middleware] INFO: Enabled item pipelines:
['webscraper.pipelines.WebscraperPipeline']
2024-09-11 15:28:47 [scrapy.core.engine] INFO: Spider opened
2024-09-11 15:28:47 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-09-11 15:28:47 [equilife] INFO: Spider opened: equilife
2024-09-11 15:28:47 [equilife] INFO: Spider opened: equilife
2024-09-11 15:28:47 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-09-11 15:28:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://equi.life/pages/search-results-page?q=all%20products&tab=products&sort_by=title&sort_order=asc&page=1> (referer: None)
2024-09-11 15:28:48 [scrapy.core.spidermw] WARNING: Async iterable passed to WebscraperSpiderMiddleware.process_spider_output was downgraded to a non-async one
2024-09-11 15:28:48 [equilife] ERROR: Playwright page not found
2024-09-11 15:28:48 [scrapy.core.engine] INFO: Closing spider (finished)
2024-09-11 15:28:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 301,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 68610,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 0.877688,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2024, 9, 11, 12, 28, 48, 868664, tzinfo=datetime.timezone.utc),
 'httpcompression/response_bytes': 280665,
 'httpcompression/response_count': 1,
 'log_count/DEBUG': 4,
 'log_count/ERROR': 1,
 'log_count/INFO': 12,
 'log_count/WARNING': 1,
 'memusage/max': 66830336,
 'memusage/startup': 66830336,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2024, 9, 11, 12, 28, 47, 990976, tzinfo=datetime.timezone.utc)}
2024-09-11 15:28:48 [scrapy.core.engine] INFO: Spider closed (finished)

I uninstalled scrapy and scrapy-playwright many times, and whenever I was prompted about missing dependencies, I installed them. I tried force-reinstalling with upgrades. I tried adding this snippet to settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_playwright.middleware.PlaywrightMiddleware': 543,
}

I eventually removed it because I realized it was unnecessary and caused more errors. I looked on YouTube, Google, and ChatGPT. I have tried everything I can think of.
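For reference, as far as I can tell from the scrapy-playwright README, the integration is activated through Playwright download handlers plus the asyncio reactor, not through a downloader middleware. This is the activation block as I understand it from the project's docs (not something currently in my settings.py):

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"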
This is my spider.py:

import scrapy
from scrapy_playwright.page import PageMethod
from webscraper.items import EquiLifeItem


class EquilifeSpider(scrapy.Spider):
    name = "equilife"
    allowed_domains = ["equi.life"]
    start_urls = ["https://equi.life/pages/search-results-page?q=all%20products&tab=products&sort_by=title&sort_order=asc&page=1"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta={
                    'playwright': True,
                    'playwright_include_page': True,
                    'playwright_page_methods': [
                        PageMethod('wait_for_selector', 'div#snize-item clearfix ')
                    ]
                },
                callback=self.parse
            )

    async def parse(self, response):
        page = response.meta.get('playwright_page')
        if not page:
            self.logger.error('Playwright page not found')
            return
        try:
            content = await page.content()
            selector = scrapy.Selector(text=content, type='html')
            products = selector.css('a.snize-view-link')
            for product in products:
                product_data = EquiLifeItem()
                product_data['title'] = product.css('span.snize-title::text').get()
                product_data['price'] = product.css('span.snize-price::text').get()
                product_data['url'] = product.css('a').attrib.get('href')
                yield product_data
        except Exception as e:
            self.logger.error(f'Error processing page: {e}')
        finally:
            await page.close()

This is my settings.py (the Playwright configuration is at the bottom):

BOT_NAME = "webscraper"

SPIDER_MODULES = ["webscraper.spiders"]
NEWSPIDER_MODULE = "webscraper.spiders"

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
#    "Accept-Language": "en",
#}

# Enable or disable spider middlewares
SPIDER_MIDDLEWARES = {
    "webscraper.middlewares.WebscraperSpiderMiddleware": 543,
}

# Enable or disable downloader middlewares
DOWNLOADER_MIDDLEWARES = {
    "webscraper.middlewares.WebscraperDownloaderMiddleware": 543,
}

# Enable or disable extensions
#EXTENSIONS = {
#    "scrapy.extensions.telnet.TelnetConsole": None,
#}

# Configure item pipelines
ITEM_PIPELINES = {
    "webscraper.pipelines.WebscraperPipeline": 300,
}

# Enable and configure the AutoThrottle extension (disabled by default)
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = "httpcache"
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"

# Set settings whose default value is deprecated to a future-proof value
REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
FEED_EXPORT_ENCODING = "utf-8"
PLAYWRIGHT_ENABLED = True
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {
    'executable_path': r'C:\Program Files\BraveSoftware\Brave-Browser\Application\brave.exe',
    'headless': True,
}

items.py, middlewares.py and pipelines.py are untouched.
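One thing that stands out to me now: the spider runs inside WSL Ubuntu, but executable_path points at a Windows path. Inside WSL the C: drive is mounted under /mnt/c/, so the Linux Playwright install presumably cannot see the Brave binary at that Windows-style path. This is a minimal standalone sketch (my own, not part of the project) to check whether Playwright can launch a browser at all inside the WSL virtualenv; it assumes `playwright install chromium` has been run and uses the bundled Chromium instead of Brave:

import asyncio
from playwright.async_api import async_playwright

URL = "https://equi.life/pages/search-results-page?q=all%20products&tab=products&sort_by=title&sort_order=asc&page=1"

async def main():
    async with async_playwright() as p:
        # use the Chromium that `playwright install chromium` downloads,
        # instead of the Windows Brave path from settings.py
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(URL)
        print(await page.title())
        await browser.close()

asyncio.run(main())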
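I am also not certain the wait_for_selector string in my spider does what I intended. As I understand CSS, 'div#snize-item clearfix' asks for a <clearfix> element inside an element with id="snize-item", whereas one element carrying two classes would be 'div.snize-item.clearfix'. A throwaway parsel check (made-up markup, not the real page, just to show how each selector string parses):

from parsel import Selector

# made-up markup, NOT the real page -- only to show how the selector strings parse
sel = Selector(text='<div class="snize-item clearfix"><span class="snize-title">x</span></div>')

print(sel.css('div#snize-item clearfix').get())   # None: wants a <clearfix> tag under id="snize-item"
print(sel.css('div.snize-item.clearfix').get())   # matches the element that has both classes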