pax_global_header00006660000000000000000000000064145324064710014520gustar00rootroot0000000000000052 comment=59f25aeffc717b73ca5fa6bf2fb97683a9d55c5f pypartpicker-1.9.5/000077500000000000000000000000001453240647100142515ustar00rootroot00000000000000pypartpicker-1.9.5/.gitignore000066400000000000000000000001441453240647100162400ustar00rootroot00000000000000*__pycache__ *.gitignore *Scripts *Lib *Include *pyvenv.cfg *.egg-info venv .venv dist build test.pypypartpicker-1.9.5/LICENSE.txt000066400000000000000000000020511453240647100160720ustar00rootroot00000000000000MIT License Copyright (c) 2021 thefakequake Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. pypartpicker-1.9.5/README.md000066400000000000000000000200651453240647100155330ustar00rootroot00000000000000# PyPartPicker #### PyPartPicker is a package that allows you to obtain information from PCPartPicker quickly and easily, with data being returned via objects with numerous attributes. --- ### Features: - Obtain Part objects, estimated wattage and total cost from PCPartPicker lists. - Obtain name, product URL, price, product type, image and more from Part objects. - Obtain everything previously mentioned + specs, reviews and in depth pricing information from PCPartPicker product links. - Async ready: all scraping methods have an asynchronous version # Installation --- Installation via pip: ``` >>> pip install pypartpicker ``` Or clone the repo directly: ``` >>> git clone https://github.com/thefakequake/pypartpicker.git ``` # Example programs --- Here is a program that searches for i7's, prints every result, then gets the first result and prints its specs: ```python from pypartpicker import Scraper # creates the scraper object pcpp = Scraper() # returns a list of Part objects we can iterate through parts = pcpp.part_search("i7") # iterates through every part object for part in parts: # prints the name of the part print(part.name) # gets the first product and fetches its URL first_product_url = parts[0].url # gets the Product object for the item product = pcpp.fetch_product(first_product_url) # prints the product's specs using the specs attribute print(product.specs) ``` Here is another program that finds i3s that are cheaper than or equal to £110, prints their specs and then prints the first review: ```python from pypartpicker import Scraper from time import sleep # returns a list of Part objects we can iterate through # the region is set to "uk" so that we get prices in GBP pcpp = Scraper() parts = pcpp.part_search("i3", region="uk") # iterates through the parts for part in parts: # checks if the price is lower than 110 if float(part.price.strip("£")) <= 110: print(f"I found a valid product: {part.name}") print(f"Here is the link: {part.url}") # gets the product object for the parts product = pcpp.fetch_product(part.url) print(product.specs) # makes sure the product has reviews if product.reviews != None: # gets the first review review = product.reviews[0] print(f"Posted by {review.author}: {review.content}") print(f"They rated this product {review.rating}/5.") else: print("There are no reviews on this product!") # slows down the program so as not to spam PCPartPicker and potentially get IP banned sleep(3) ``` # Creating the Scraper object --- ### `Scraper(headers={...}, response_retriever=...)` ### Parameters - **headers** ( [dict](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict) ) - The browser headers for the requests in a dict. Note: There are headers set by default. I only recommend changing them if you are encountering scraping errors. - **response_retriever** ( [Callable](https://docs.python.org/3/library/typing.html#typing.Callable) ) - A function accepting arguments (`url, **kwargs`) that is called to retrieve the response from PCPartPicker Note: A default retriever is configured that calls pcpartpicker.com directly. I only recommend changing this if you need to configure how the request is made (e.g. via a proxy) # Scraper Methods --- ### `Scraper.part_search(search_term, limit=20, region=None)` #### Returns Part objects using PCPartPicker's search function. ### **Parameters** - **search_term** ( [str](https://docs.python.org/3/library/stdtypes.html#str) ) - The term you want to search for. Example: `"i5"` - **limit** (Optional[ [int](https://docs.python.org/3/library/functions.html#int) ]) - The amount of results that you want to be returned. Example: `limit=5` - **region** (Optional[ [str](https://docs.python.org/3/library/stdtypes.html#str) ]) - The country code of which you want the pricing for the Part objects to be in. Example: `region="uk"` ### Returns A list of Part objects corresponding to the results on PCPartPicker. --- ### `Scraper.fetch_product(product_url)` #### Returns a Product object from a PCPartPicker product URL. ### **Parameters** - **product_url** ( [str](https://docs.python.org/3/library/stdtypes.html#str) ) - The product URL for the product you want to search for. Example: `"https://pcpartpicker.com/product/9nm323/amd-ryzen-5-3600-36-thz-6-core-processor-100-100000031box"` ### Returns A Product object for the part. --- ### `Scraper.fetch_list(list_url)` #### Returns a PCPPLIst object from a PCPartPicker list URL. ### **Parameters** - **list_url** ( [str](https://docs.python.org/3/library/stdtypes.html#str) ) - The URL for the parts list. Example: `"https://pcpartpicker.com/list/Tzx22V"` ### Returns A PCPPList object for the list. # Other methods --- ### `get_list_links(string)` #### Returns a list of PCPartPicker list links from the given string. ### **Parameters** - **string** ( [str](https://docs.python.org/3/library/stdtypes.html#str) ) - The string containing the parts list URL. Example: `"this function can extract the link from this string https://pcpartpicker.com/list/Tzx22V"` ### Returns A list of URLs. --- ### `get_product_links(string)` #### Returns a list of PCPartPicker product links from the given string. ### **Parameters** - **string** ( [str](https://docs.python.org/3/library/stdtypes.html#str) ) - The string containing the product URL. Example: `"this function can extract the link from this string https://pcpartpicker.com/product/9nm323/amd-ryzen-5-3600-36-thz-6-core-processor-100-100000031box"` ### Returns A list of URLs. # Async Methods ___ #### Same syntax as sync functions, but add aio_ to the beginning of the method name and add await before the function call. #### For example: ```python pcpp = Scraper() results = pcpp.part_search("i5") ``` becomes ```python pcpp = Scraper() results = await pcpp.aio_part_search("i5") ``` Remember: you can only call async functions within other async functions. If you are not writing async code, do not use these methods. Use the sync methods, which don't have aio_ before their name. Only the blocking functions (the ones that involve active scraping) have async equivalents. # Objects --- ## Part ### Attributes - `name` - The name of the part - `url` - The product URL for the part - `type` - The type of the part (e.g. CPU, Motherboard, Memory) - `price` - The cheapest price for the part - `image` - The image link for the part --- ## Product ### Attributes - All attributes from Part - `specs` - The specs for the product, arranged in a dictionary with the values being lists - `price_list` - A list of Price objects containing all the merchants, prices and stock values - `rating` - The total user ratings and average score for the product - `reviews` - A list of Review objects for the product - `compatible_parts` - A list of tuples containing the compatible parts links for the product ___ ## Price ### Attributes - `value` - The final price value - `base_value` - The price value before shipping and other discounts - `seller` - The name of the company selling the part - `seller_icon` - The icon URL for the company selling the part - `url` - The URL which links to the seller's product listing - `in_stock` - A boolean of whether the product is in stock --- ## Review ### Attributes - `author` - The name of the person who posted the review - `author_url` - The URL which points to the author's profile - `author_icon` - The icon URL of the person who posted the review - `points` - The amount of upvotes on the review - `created_at` - How long ago the review was sent - `rating` - The star rating out of 5 of the product that the review gives - `content` - The text content of the review --- ## PCPPList ### Attributes - `parts` - List of Part objects corresponding to every part in the list - `wattage` - The estimated wattage for the parts in the list - `total` - The total price of the PCPartPicker list - `url` - The URL of the PCPartPicker list - `compatibility` - A list containing the compatibility notes for the parts listpypartpicker-1.9.5/pypartpicker/000077500000000000000000000000001453240647100167665ustar00rootroot00000000000000pypartpicker-1.9.5/pypartpicker/__init__.py000066400000000000000000000000541453240647100210760ustar00rootroot00000000000000from .scraper import * from .regex import * pypartpicker-1.9.5/pypartpicker/regex.py000066400000000000000000000006621453240647100204560ustar00rootroot00000000000000import re LIST_REGEX = re.compile( "((?:http|https)://(?:[a-z]{2}.)?pcpartpicker.com/(?:(?:list/(?:[a-zA-Z0-9]{6}))|(?:user/(?:[\\w]+)/saved/(?:[a-zA-Z0-9]{6}))))" ) PRODUCT_REGEX = re.compile( "((?:http|https)://(?:[a-z]{2}.)?pcpartpicker.com/product/(?:[a-zA-Z0-9]{6}))" ) def get_list_links(string): return re.findall(LIST_REGEX, string) def get_product_links(string): return re.findall(PRODUCT_REGEX, string) pypartpicker-1.9.5/pypartpicker/scraper.py000066400000000000000000000444341453240647100210100ustar00rootroot00000000000000import asyncio import concurrent.futures import math import re from typing import List import requests from pypartpicker.regex import LIST_REGEX, PRODUCT_REGEX from bs4 import BeautifulSoup from functools import partial from urllib.parse import urlparse class Part: def __init__(self, **kwargs): self.name = kwargs.get("name") self.url = kwargs.get("url") self.type = kwargs.get("type") self.price = kwargs.get("price") self.image = kwargs.get("image") class PCPPList: def __init__(self, **kwargs): self.parts = kwargs.get("parts") self.wattage = kwargs.get("wattage") self.total = kwargs.get("total") self.url = kwargs.get("url") self.compatibility = kwargs.get("compatibility") class Product(Part): def __init__(self, **kwargs): super().__init__(**kwargs) self.specs = kwargs.get("specs") self.price_list = kwargs.get("price_list") self.rating = kwargs.get("rating") self.reviews = kwargs.get("reviews") self.compatible_parts = kwargs.get("compatible_parts") class Price: def __init__(self, **kwargs): self.value = kwargs.get("value") self.seller = kwargs.get("seller") self.seller_icon = kwargs.get("seller_icon") self.url = kwargs.get("url") self.base_value = kwargs.get("base_value") self.in_stock = kwargs.get("in_stock") class Review: def __init__(self, **kwargs): self.author = kwargs.get("author") self.author_url = kwargs.get("author_url") self.author_icon = kwargs.get("author_icon") self.points = kwargs.get("points") self.created_at = kwargs.get("created_at") self.rating = kwargs.get("rating") self.content = kwargs.get("content") class Verification(Exception): pass class Scraper: def __init__(self, **kwargs): headers_dict = kwargs.get( "headers", { "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36 Edg/88.0.705.63" }, ) if not isinstance(headers_dict, dict): raise ValueError("Headers kwarg has to be a dict!") self.headers = headers_dict response_retriever = kwargs.get( "response_retriever", self.__default_response_retriever ) if not callable(response_retriever): raise ValueError("response_retriever kwarg must be callable!") self.response_retriever = response_retriever @staticmethod def __default_response_retriever(url, **kwargs): return requests.get(url, **kwargs) # Private Helper Function def __make_soup(self, url) -> BeautifulSoup: # sends a request to the URL page = self.response_retriever(url, headers=self.headers) # gets the HTML code for the website and parses it using Python's built in HTML parser soup = BeautifulSoup(page.content, "html.parser") if "Verification" in soup.find(class_="pageTitle").get_text(): raise Verification( f"You are being rate limited by PCPartPicker! Slow down your rate of requests, and complete the captcha at this URL: {url}" ) # returns the HTML return soup # Private Helper Function # Uses a RegEx to check if the specified string matches the URL format of a valid PCPP parts list def __check_list_url(self, url_str): return re.search(LIST_REGEX, url_str) # Private Helper Function # Uses a RegEx to check if the specified string matches the URL format of a valid product on PCPP def __check_product_url(self, url_str): return re.search(PRODUCT_REGEX, url_str) def fetch_list(self, list_url) -> PCPPList: # Ensure a valid pcpartpicker parts list was passed to the function if self.__check_list_url(list_url) is None: raise ValueError(f"'{list_url}' is an invalid PCPartPicker list!") # fetches the HTML code for the website try: soup = self.__make_soup(list_url) except requests.exceptions.ConnectionError: raise ValueError("Invalid list URL! Max retries exceeded with URL.") # gets the code with the table containing all the parts table = soup.find_all("table", {"class": "xs-col-12"}, limit=1)[0] # creates an empty list to put the Part objects inside parts = [] # iterates through every part in the table for item in table.find_all("tr", class_="tr__product"): # creates a new part object using values obtained from the tables' rows part_name = ( item.find(class_="td__name").get_text().strip("\n").replace("\n", "") ) if "Note:" in part_name: part_name = part_name.split("Note:")[0] if "From parametric filter:" in part_name: part_name = part_name.split("From parametric filter:")[0] if "From parametric selection:" in part_name: part_name = part_name.split("From parametric selection:")[0] part_object = Part( name=part_name, price=item.find(class_="td__price") .get_text() .strip("\n") .replace("No Prices Available", "None") .replace("Price", "") .strip("\n"), type=item.find(class_="td__component").get_text().strip("\n").strip(), image=("https://" + item.find("img", class_="")["src"]).replace( "https://https://", "https://" ), ) # converts string representation of 'None' to NoneType if part_object.price == "None": part_object.price = None # checks if the product row has a product URL inside if "href" in str(item.find(class_="td__name")): # adds the product URL to the Part object part_object.url = ( "https://" + urlparse(list_url).netloc + item.find(class_="td__name") .find("a")["href"] .replace("/placeholder-", "") ) # adds the part object to the list parts.append(part_object) # gets the estimated wattage for the list wattage = ( soup.find(class_="partlist__keyMetric") .get_text() .replace("Estimated Wattage:", "") .strip("\n") ) # gets the total cost for the list total_cost = ( table.find("tr", class_="tr__total tr__total--final") .find(class_="td__price") .get_text() ) # gets the compatibility notes for the list compatibilitynotes = [ a.get_text().strip("\n").replace("Note:", "").replace("Warning!", "") for a in soup.find_all("li", class_=["info-message", "warning-message"]) ] # returns a PCPPList object containing all the information return PCPPList( parts=parts, wattage=wattage, total=total_cost, url=list_url, compatibility=compatibilitynotes, ) def part_search(self, search_term, **kwargs) -> List[Part]: search_term = search_term.replace(" ", "+") limit = kwargs.get("limit", 20) # makes sure limit is an integer, raises ValueError if it's not if not isinstance(limit, int): raise ValueError("Product limit must be an integer!") # checks if the region given is a string, and checks if it is a country code if ( not isinstance(kwargs.get("region", "us"), str) or len(kwargs.get("region", "us")) != 2 ): raise ValueError("Invalid region!") if limit < 0: raise ValueError("Limit out of range.") # constructs the search URL if kwargs.get("region") in ("us", None): search_link = f"https://pcpartpicker.com/search/?q={search_term}" else: search_link = f"https://{kwargs.get('region', '')}.pcpartpicker.com/search/?q={search_term}" iterations = math.ceil(limit / 20) # creates an empty list for the part objects to be stored in parts = [] for i in range(iterations): try: soup = self.__make_soup(f"{search_link}&page={i + 1}") except requests.exceptions.ConnectionError: raise ValueError("Invalid region! Max retries exceeded with URL.") # checks if the page redirects to a product page if soup.find(class_="pageTitle").get_text() != "Product Search": # creates a part object with the information from the product page part_object = Part( name=soup.find(class_="pageTitle").get_text(), url=search_link, price=None, ) # searches for the pricing table table = soup.find("table", class_="xs-col-12") # loops through every row in the table for row in table.find_all("tr"): # first conditional statement makes sure its not the top row with the table parameters, second checks if the product is out of stock if ( not "td__availability" in str(row) or "Out of stock" in row.find(class_="td__availability").get_text() ): # skips this iteration continue # sets the price of the price object to the price part_object.price = ( row.find(class_="td__finalPrice") .get_text() .strip("\n") .strip("+") ) break # returns the part object return [part_object] # gets the section of the website's code with the search results section = soup.find("section", class_="search-results__pageContent") if "No results" in section.get_text(): break # iterates through all the HTML elements that match the given the criteria for product in section.find_all("ul", class_="list-unstyled"): # extracts the product data from the HTML code and creates a part object with that information part_object = Part( name=product.find("p", class_="search_results--link") .get_text() .strip(), url="https://" + urlparse(search_link).netloc + product.find("p", class_="search_results--link").find( "a", href=True )["href"], image=("https://" + product.find("img")["src"].strip("/")).replace( "https://https://", "https://" ), ) part_object.price = ( product.find(class_="search_results--price").get_text().strip() ) if part_object.price == "": part_object.price = None # adds the part object to the list parts.append(part_object) # returns the part objects return parts[: kwargs.get("limit", 20)] def fetch_product(self, part_url) -> Product: # Ensure a valid product page was passed to the function if self.__check_product_url(part_url) is None: raise ValueError("Invalid product URL!") try: soup = self.__make_soup(part_url) except requests.exceptions.ConnectionError: raise ValueError("Invalid product URL! Max retries exceeded with URL.") specs_block = soup.find(class_="block xs-hide md-block specs") specs = {} prices = [] price = None # finds the table with the pricing information table = soup.find("table", class_="xs-col-12") section = table.find("tbody") for row in section.find_all("tr"): # skip over empty row if "tr--noBorder" in str(row): continue # creates a Price object with all the information price_object = Price( value=row.find(class_="td__finalPrice").get_text().strip("\n"), seller=row.find(class_="td__logo").find("img")["alt"], seller_icon=( "https://" + row.find(class_="td__logo").find("img")["src"][1:] ).replace("https://https://", "https://"), base_value=row.find(class_="td__base priority--2").get_text(), url="https://" + urlparse(part_url).netloc + row.find(class_="td__finalPrice").find("a")["href"], in_stock=True if "In stock" in row.find(class_="td__availability").get_text() else False, ) # chceks if its the cheapest in stock price if ( price is None and "In stock" in row.find(class_="td__availability").get_text() ): price = row.find(class_="td__finalPrice").get_text().strip("\n") prices.append(price_object) # adds spec keys and values to the specs dictionary for spec in specs_block.find_all("div", class_="group group--spec"): specs[spec.find("h3", class_="group__title").get_text()] = ( spec.find("div", class_="group__content") .get_text() .strip() .strip("\n") .replace("\u00b3", "") .replace('"', "") .split("\n") ) reviews = None # gets the HTML code for the box containing reviews review_box = soup.find(class_="block partReviews") # skips over this process if the review box does not exist if review_box is not None: reviews = [] # counts stars in reviews for review in review_box.find_all(class_="partReviews__review"): stars = 0 for _ in review.find(class_="shape-star-full"): stars += 1 # gets the upvotes and timestamp iterations = 0 for info in review.find( class_="userDetails__userData list-unstyled" ).find_all("li"): if iterations == 0: points = ( info.get_text().replace(" points", "").replace(" point", "") ) elif iterations == 1: created_at = info.get_text().replace(" ago", "") else: break iterations += 1 # creates review object with all the information review_object = Review( author=review.find(class_="userDetails__userName").get_text(), author_url="https://" + urlparse(part_url).netloc + review.find(class_="userDetails__userName").find("a")["href"], author_icon="https://" + urlparse(part_url).netloc + review.find(class_="userAvatar userAvatar--entry").find("img")[ "src" ], content=review.find( class_="partReviews__writeup markdown" ).get_text(), rating=stars, points=points, created_at=created_at, ) reviews.append(review_object) compatible_parts = None # fetches section with compatible parts hyperlinks compatible_parts_list = soup.find(class_="compatibleParts__list list-unstyled") if compatible_parts_list is not None: compatible_parts = [] # finds every list item in the section for item in compatible_parts_list.find_all("li"): compatible_parts.append( ( item.find("a").get_text(), "https://" + urlparse(part_url).netloc + item.find("a")["href"], ) ) # creates the product object to return product_object = Product( name=soup.find(class_="pageTitle").get_text(), url=part_url, image=None, specs=specs, price_list=prices, price=price, rating=soup.find(class_="actionBox-2023 actionBox__ratings") .find(class_="product--rating list-unstyled") .get_text() .strip("\n") .strip() .strip("()"), reviews=reviews, compatible_parts=compatible_parts, type=soup.find(class_="breadcrumb") .find(class_="list-unstyled") .find("li") .get_text(), ) image_box = soup.find(class_="single_image_gallery_box") if image_box is not None: # adds image to object if it finds one product_object.image = image_box.find("img")["src"].replace( "https://https://", "https://" ) return product_object async def aio_part_search(self, search_term, **kwargs): with concurrent.futures.ThreadPoolExecutor() as pool: result = await asyncio.get_event_loop().run_in_executor( pool, partial(self.part_search, search_term, **kwargs) ) return result async def aio_fetch_list(self, list_url): with concurrent.futures.ThreadPoolExecutor() as pool: result = await asyncio.get_event_loop().run_in_executor( pool, self.fetch_list, list_url ) return result async def aio_fetch_product(self, part_url): with concurrent.futures.ThreadPoolExecutor() as pool: result = await asyncio.get_event_loop().run_in_executor( pool, self.fetch_product, part_url ) return result pypartpicker-1.9.5/requirements.txt000066400000000000000000000000151453240647100175310ustar00rootroot00000000000000bs4 requests pypartpicker-1.9.5/setup.cfg000066400000000000000000000000471453240647100160730ustar00rootroot00000000000000[metadata] description-file = README.mdpypartpicker-1.9.5/setup.py000066400000000000000000000022211453240647100157600ustar00rootroot00000000000000from setuptools import setup with open("README.md", "r", encoding="utf-8") as readme: long_description = readme.read() setup( name="pypartpicker", version="1.9.4", description="A package that scrapes pcpartpicker.com and returns the results as objects.", packages=["pypartpicker"], url="https://github.com/thefakequake/pypartpicker", keywords=["pcpartpicker", "scraper", "list", "beautifulsoup", "pc", "parts"], install_requires=["bs4", "requests"], zip_safe=False, download_url="https://github.com/thefakequake/pypartpicker/archive/refs/tags/v1.9.4.tar.gz", long_description=long_description, long_description_content_type="text/markdown", classifiers=[ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "Topic :: Software Development :: Build Tools", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", ], )