Bot that Scrapes Product Information from Amazon
Today we are going to build an Amazon bot that collects product names from the Amazon website. This technique is called web scraping, and it is quite popular nowadays. Bots can make our lives easier: some raise an alert when a product comes back in stock, others pull products from another website into your own. You can write a bot for whatever need you have. Today I will show you how to do web scraping in C#. Let's get started.
I am going to use Visual Studio to create the project. To create a new project in Visual Studio:
File => New Project, then choose the Console App template in the "Create a new project" window. After naming the project and the solution, select the .NET version. I chose .NET 6.0.
Before we start coding, we need to install some packages from NuGet. To open the package manager, go to Tools => NuGet Package Manager => Manage NuGet Packages for Solution… and install the latest version of the packages shown in the image. The code in this post uses Selenium to drive the browser and HtmlAgilityPack to parse the HTML.
I like to keep things organized, so I put the classes for this project into folders. The Abstract folder holds abstractions such as interfaces, and the Concrete folder holds the classes that implement those interfaces.
Before scraping, let's visit the target site and decide what to scrape. I want to scrape the names of the products of the company I currently work for. You can also scrape the prices if you want, but I'll take only the product names.
Let's start coding. First, we need a class with a property for the product name. Let's call that class Product.
public class Product
{
    public string ProductName { get; set; }
    public decimal ProductPrice { get; set; }
}
Then create an interface named IProductService. It has a single method that returns a list of products.
public interface IProductService
{
    List<Product> GetAllProducts();
}
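Because callers depend only on IProductService, the scraper can be swapped for a stand-in while you work on other parts, such as the console output. Here is a minimal sketch. The InMemoryProductService class and its sample data are hypothetical and not part of the original project. Product and IProductService are repeated only so the block compiles on its own.

```csharp
using System.Collections.Generic;

public class Product
{
    public string ProductName { get; set; } = string.Empty;
    public decimal ProductPrice { get; set; }
}

public interface IProductService
{
    List<Product> GetAllProducts();
}

// Hypothetical stand-in (not in the original project): returns fixed
// sample data instead of launching a browser.
public class InMemoryProductService : IProductService
{
    public List<Product> GetAllProducts()
    {
        return new List<Product>
        {
            new Product { ProductName = "Sample product A" },
            new Product { ProductName = "Sample product B" },
        };
    }
}
```

Any code written against IProductService works unchanged with either this stand-in or the real scraper.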
If you've made it this far, great. We're just getting started. Next, we create the ProductManager class. This class implements the interface's method and does the actual scraping. We define a variable named url that holds the address of the page we will scrape.
string url = "https://www.amazon.com.tr/s?k=dogo";
Next, I declare a field named _driver that will drive the browser. Its type is the IWebDriver interface.
private readonly IWebDriver _driver;
I add a constructor to ProductManager that receives the IWebDriver. This pattern is called dependency injection (here, constructor injection) in .NET Core.
public ProductManager(IWebDriver webDriver)
{
    _driver = webDriver;
}
Besides GetAllProducts(), the ProductManager class has two more methods. The first is IsPageFullyLoaded(). It opens the URL in the browser, waits until the page has finished loading, and returns the page source.
public static string IsPageFullyLoaded(string url, IWebDriver driver)
{
    // Navigate the browser to the target page
    driver.Url = url;
    // Wait (up to 5 seconds) for the page to complete loading
    WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(5));
    wait.Until(d => ((IJavaScriptExecutor)d).ExecuteScript("return document.readyState").Equals("complete"));
    // Optionally, you can wait for specific elements or conditions to be present/visible
    // Add your custom wait conditions here if needed
    // PageSource is already a string, so no conversion is needed
    return driver.PageSource;
}
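document.readyState reports "complete" once the initial HTML and its resources have loaded, but content rendered later by JavaScript may still be missing. The optional wait for specific elements mentioned in the comments above could look like the sketch below. The WaitHelpers class and its WaitForElement method are hypothetical names introduced for illustration, not part of the original project. The block relies on the Selenium.WebDriver and Selenium.Support packages.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

public static class WaitHelpers
{
    // Hypothetical helper: blocks until at least one element matches the
    // CSS selector, or throws WebDriverTimeoutException after the timeout.
    public static void WaitForElement(IWebDriver driver, string cssSelector, TimeSpan timeout)
    {
        var wait = new WebDriverWait(driver, timeout);
        // FindElements returns an empty collection (rather than throwing)
        // when nothing matches, so the condition simply stays false until
        // a matching element appears.
        wait.Until(d => d.FindElements(By.CssSelector(cssSelector)).Count > 0);
    }
}
```

A call such as WaitHelpers.WaitForElement(driver, "div.s-main-slot", TimeSpan.FromSeconds(10)) placed before reading driver.PageSource would hold back until the results container exists. The selector is taken from the first class name in the XPath used in this post, and the 10-second timeout is an arbitrary choice.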
The other method is ConvertToHtmlDocument(). It converts an HTML string into an HtmlDocument.
public static HtmlDocument ConvertToHtmlDocument(string htmlString)
{
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(htmlString);
    return htmlDoc;
}
Finally, we fill in the GetAllProducts() method. Once we have the page source, we examine the div tags used on the page and select the nodes for the areas we want to reach. In my case, I only want the product names from Amazon. Html Agility Pack is quite capable here. It is well documented and even offers AI-assisted support. If you want to check it out, you can check it out here.
public List<Product> GetAllProducts()
{
    List<Product> allProducts = new List<Product>();
    string source = IsPageFullyLoaded(url, _driver);
    HtmlDocument doc = ConvertToHtmlDocument(source);

    // Container that holds all search results
    HtmlNode parentsInfos = doc.DocumentNode.SelectSingleNode(".//div[@class='s-main-slot s-result-list s-search-results sg-row']");
    if (parentsInfos == null)
    {
        return allProducts;
    }

    // SelectNodes returns null when nothing matches, so check before iterating
    HtmlNodeCollection childrens = parentsInfos.SelectNodes(".//div[@class='sg-col-4-of-24 sg-col-4-of-12 s-result-item s-asin sg-col-4-of-16 sg-col s-widget-spacing-small sg-col-4-of-20']");
    if (childrens == null)
    {
        return allProducts;
    }

    foreach (HtmlNode product in childrens)
    {
        HtmlNode baseproduct = product.SelectSingleNode(".//div[@class='a-section a-spacing-small puis-padding-left-micro puis-padding-right-micro']");
        // Skip result cards that do not have the expected layout
        if (baseproduct == null || baseproduct.ChildNodes.Count < 2)
        {
            continue;
        }

        // Product Name
        HtmlNode productTextSideName = baseproduct.ChildNodes[1];
        Product new_product = new Product();
        new_product.ProductName = productTextSideName.InnerText;
        allProducts.Add(new_product);
    }
    return allProducts;
}
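Matching @class against one exact string, as the XPath expressions above do, stops working as soon as the site adds, removes, or reorders a single class name. A more tolerant variant tests for one class token at a time. The sketch below is meant to be run against a small HTML fragment of your own rather than a live page. The XPathSketch class, the ExtractNames helper, and the choice of the h2 element as the title are assumptions for illustration, not Amazon's confirmed markup. It needs the HtmlAgilityPack package.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathSketch
{
    // Matches any <div> whose class list contains the token "s-result-item",
    // regardless of which other classes are present or in what order.
    private const string ResultItemXPath =
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' s-result-item ')]";

    // Hypothetical helper: returns the trimmed text of the first <h2>
    // inside every matching result item.
    public static List<string> ExtractNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var names = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection items = doc.DocumentNode.SelectNodes(ResultItemXPath);
        if (items == null)
        {
            return names;
        }

        foreach (HtmlNode item in items)
        {
            HtmlNode title = item.SelectSingleNode(".//h2");
            if (title == null)
            {
                continue; // skip items without a title element
            }
            // Decode entities such as &amp; and trim surrounding whitespace.
            names.Add(HtmlEntity.DeEntitize(title.InnerText).Trim());
        }
        return names;
    }
}
```

Padding the class attribute with spaces before calling contains() keeps a token such as "s-result-item" from also matching a longer class name that merely starts or ends with it.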
Finally, we need to set up Program.cs. There we resolve the classes we need, such as ChromeDriver and IProductService. I print the product names scraped from the Amazon website to the console:
-------------------------------------------------------------------------------
Product Name : DOGOHazel Sandalet Kadın
-------------------------------------------------------------------------------
Product Name : DOGOBayan Cord spor ayakkabı
-------------------------------------------------------------------------------
Product Name : DOGOSpor ayakkabı Spor AyakkabıKadın
-------------------------------------------------------------------------------
Product Name : DOGOwb spor ayakkabı Spor Ayakkabı Kadın
-------------------------------------------------------------------------------
Product Name : DOGOZipsy Moda ÇizmeKadın
-------------------------------------------------------------------------------
Product Name : DOGOAce Spor Ayakkabı - Friends Till Eternity Harry Potter Spor Ayakkabı Kadın
-------------------------------------------------------------------------------
Product Name : DOGOHazel SandaletKadın
-------------------------------------------------------------------------------
Product Name : DOGOGisele Orta baldır çizmesiKadın
-------------------------------------------------------------------------------
Product Name : DOGODogo Boots - Family Rocks Bilek Hizasında Bot Kadın
-------------------------------------------------------------------------------
Product Name : DOGODogo Botlar - Meraklı Gözler Moda ÇizmeKadın
-------------------------------------------------------------------------------
Product Name : DOGOSpor ayakkabı Spor AyakkabıKadın
-------------------------------------------------------------------------------
Product Name : DOGOKız Este Düz Sandalet
-------------------------------------------------------------------------------
Product Name : DOGOVegan Kadın Botu Bot - Hello My Hooman
-------------------------------------------------------------------------------
Product Name : DOGOKordon Spor Ayakkabı Kadın
-------------------------------------------------------------------------------
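Program.cs itself is not listed in this post. The sketch below shows one way it could be wired up to print output in the format above. It assumes the Microsoft.Extensions.DependencyInjection package as the container, and it assumes that Product, IProductService, and ProductManager are the types defined earlier in this post. The container choice, the service lifetimes, and the separator width are assumptions, not taken from the original project.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

// Register the browser and the scraper. Assumption: a single ChromeDriver
// instance is shared for the lifetime of the program.
var services = new ServiceCollection();
services.AddSingleton<IWebDriver>(_ => new ChromeDriver());
services.AddTransient<IProductService, ProductManager>();

// Disposing the provider also disposes the singleton ChromeDriver it
// created, which closes the browser.
using ServiceProvider provider = services.BuildServiceProvider();

// The container builds ProductManager and passes the IWebDriver
// into its constructor.
IProductService productService = provider.GetRequiredService<IProductService>();

// Separator width is an assumption chosen to resemble the output above.
string separator = new string('-', 79);
Console.WriteLine(separator);
foreach (Product product in productService.GetAllProducts())
{
    Console.WriteLine($"Product Name : {product.ProductName}");
    Console.WriteLine(separator);
}
```

Without a container, the same wiring can be done by hand with new ProductManager(new ChromeDriver()), as long as the driver is disposed at the end.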
That's how easy web scraping is. You can download all the code for this project from my GitHub link. Don't forget to follow me by entering your email address in the section below on my site.
Project Link : Here