List of HTTP Status Codes: GET Request Success vs Errors
Glean and understand the status code issued by a server in response to a client’s request.
To all web crawlers, web scrapers, and data enthusiasts: we encounter a variety of data types, namely structured, semi-structured, and unstructured. Most often, we work with structured data that resides in a traditional row-column database. At times, though, we dig into unstructured sources such as e-mail messages, word-processing documents, videos, photos, audio files, presentations, web pages, and many other kinds of business documents, on the premise that such information helps organizations make well-informed decisions. Extracting this sort of information usually requires either specialized software designed to search and glean data, or a data scraper we write ourselves in R or Python, which can be fairly complicated for certain data sources.
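If you write your own scraper in Python, the first thing worth checking on every response is its status code. The following is a minimal sketch that uses only Python's standard library (`http.HTTPStatus` and `urllib`). That choice is an assumption, and the exercise later in this article may use different tools. The helper names `describe_status`, `is_success`, and `fetch_status` are hypothetical and exist only for illustration.

```python
from http import HTTPStatus
import urllib.error
import urllib.request

# Status codes are grouped by their first digit (RFC 9110, section 15).
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Hypothetical helper: return 'code phrase (class)' for an HTTP status code."""
    category = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not registered in Python's http.HTTPStatus enum.
        phrase = "Unregistered"
    return f"{code} {phrase} ({category})"


def is_success(code: int) -> bool:
    """Hypothetical helper: True for 2xx codes, the range a scraper can usually parse."""
    return 200 <= code < 300


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Hypothetical helper: send a GET request and return the final status code.

    urlopen follows redirects by default, so 3xx codes are rarely returned here.
    For 4xx/5xx responses urlopen raises HTTPError, which carries the code.
    Network failures (DNS errors, refused connections) raise URLError and are
    deliberately not caught in this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, `describe_status(200)` returns `"200 OK (success)"` and `describe_status(404)` returns `"404 Not Found (client error)"`. Calling `fetch_status` on a page URL and passing the result to `is_success` is one simple way to decide whether a response is worth parsing before handing it to your scraping logic.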
In this article, I highlight an issue in web crawling and scraping that you will surely encounter on certain websites: the status codes servers return in response to your requests. You can check out my simple Wikipedia scraping exercise below: