List of HTTP Status Codes: GET Request Success vs Errors

Glean and understand the status code issued by a server in response to a client’s request.

Korkrid Kyle Akepanidtaworn
Aug 11, 2018

To all web crawlers, web scrapers, and data enthusiasts: we encounter a variety of data types, namely structured, semi-structured, and unstructured. Most often, we work with structured data that resides in a traditional row-column database. At times, though, we dig into unstructured data files, such as e-mail messages, word-processing documents, videos, photos, audio files, presentations, web pages, and many other kinds of business documents, on the premise that such information helps organizations make well-informed decisions. Extracting this sort of information usually requires specialized software designed to search and glean data. Alternatively, we can write our own data scraper in R or Python, which can be fairly complicated for certain data sources.

Structured vs. Unstructured data, Source: http://bigdata.black/infrastructure/storage/unstructured-data/
Digital Data & Digital Information, Source: EMC
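Whichever route we take, a scraper written in Python starts every page fetch with a GET request, and the first thing worth checking is the status code the server sends back. Below is a minimal sketch using only the standard library (`urllib.request` and `http.HTTPStatus`). It is an illustration, not the author's scraper; the helper names `status_class`, `describe`, and `get_status` and the User-Agent string are hypothetical. It sorts codes into the five standard classes: 2xx means the GET succeeded, 3xx means a redirect, and 4xx and 5xx signal client and server errors.

```python
from http import HTTPStatus
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def status_class(code: int) -> str:
    """Map an HTTP status code to its class (the first digit, per RFC 9110)."""
    if 100 <= code < 200:
        return "informational"
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    raise ValueError(f"not a valid HTTP status code: {code}")


def describe(code: int) -> str:
    """Return a label such as '404 Not Found (client error)'.

    Codes that http.HTTPStatus does not know get the phrase 'Unknown'.
    """
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"
    return f"{code} {phrase} ({status_class(code)})"


def get_status(url: str, timeout: float = 10.0) -> int:
    """Send a GET request and return the final status code.

    urlopen follows redirects on its own and raises HTTPError for 4xx/5xx
    responses, so the error's .code is the status we want to report.
    Network failures (e.g. DNS errors) still raise URLError.
    """
    # Hypothetical User-Agent string; replace it with one identifying your scraper.
    req = Request(url, headers={"User-Agent": "status-check-sketch/0.1"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code


if __name__ == "__main__":
    # Offline demo: label a few common codes without touching the network.
    for code in (200, 301, 403, 404, 503):
        print(describe(code))
```

To check a live page, pass its URL to `get_status` and feed the result to `describe`. Catching `HTTPError` is what lets the function report a 403 or 404 as a status code instead of crashing the scraper mid-run.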

In this article, I highlight an issue in web crawling and scraping that you will almost certainly run into on certain websites: making sense of the status code a server returns in response to your request. You can check out my simple Wikipedia scraping exercise below:


Korkrid Kyle Akepanidtaworn

AI Specialized CSA @ Microsoft | Enterprise AI, GenAI, LLM, LlamaIndex, ML | GenAITechLab Fellow, MScFE at WorldQuant, MSDS at CU Boulder