Because of their typically huge size (often several gigabytes or more) and their likely origin in multiple heterogeneous sources, today's real-world databases are highly susceptible to noisy, incomplete, and unreliable data.
This tutorial explains how to extract data from Facebook using Selenium and Python, without the Facebook Graph API. We use Selenium instead of the Graph API because Facebook can modify or disable access to any API endpoint at any time. One reason it might do so is the Cambridge Analytica scandal, in which data gathered through the Facebook platform was misused.
Before analyzing a data set, you will want to answer the following questions. What kinds of attributes or fields make up your data? What values does each attribute take? Which attributes are discrete and which are continuous? What does the data look like? How are the values distributed? Can we visualize the data to get a better sense of it? Can we find outliers? Can we measure how similar some data objects are to others? The discussion that follows will help you gain this kind of insight. Knowing your data is helpful for data pre-processing, the first important task of data analysis.
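As a minimal sketch of how a few of these questions can be answered in code, the example below uses only Python's standard library. The `records` data set, its attribute names, and the helper functions `summarize`, `iqr_outliers`, and `euclidean_distance` are hypothetical and exist only for illustration. The 1.5 × IQR rule used here is one common convention for flagging outliers, not the only one.

```python
import math
import statistics
from collections import Counter

# Hypothetical toy data set: one discrete attribute ("city")
# and two continuous attributes ("age", "income").
records = [
    {"city": "Oslo", "age": 23.0, "income": 30.0},
    {"city": "Oslo", "age": 31.0, "income": 32.0},
    {"city": "Lima", "age": 45.0, "income": 35.0},
    {"city": "Lima", "age": 28.0, "income": 31.0},
    {"city": "Pune", "age": 39.0, "income": 33.0},
    {"city": "Pune", "age": 52.0, "income": 34.0},
    {"city": "Oslo", "age": 36.0, "income": 250.0},
]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "stdev": statistics.stdev(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def euclidean_distance(a, b, keys):
    """Dissimilarity of two records over the given numeric attributes."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


incomes = [r["income"] for r in records]

# How are the values distributed?
print(Counter(r["city"] for r in records))  # discrete attribute: counts
print(summarize(incomes))                   # continuous attribute: summary

# Can we find outliers?
print(iqr_outliers(incomes))

# How similar are two data objects?
print(euclidean_distance(records[0], records[1], ["age", "income"]))
```

In practice the attributes would usually be rescaled before computing distances, since an attribute with a large range (here, `income`) otherwise dominates the result.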
- Why Data Science?
- What is Data Science?
- What is the Data Science process?
- What Kinds of Data Can Be Analyzed?
- What Kinds of Patterns Can Be Analyzed?
- Which Technologies Are Used?
- Which Kinds of Applications Are Targeted?
- Major issues in Data Science
- Data Science and Society
Why Data Science?
We live in a world where vast amounts of data are collected daily. Analyzing such data to discover knowledge from it is an important need.
We live in the information age
It is a popular saying, and it is true: we live in the information age. Every day, terabytes or petabytes of data flow into our computer networks, the World Wide Web (WWW), and various data storage devices from business, society, science and engineering, medicine, and almost every other aspect of everyday life. Powerful and versatile tools are badly needed to automatically uncover valuable information in these enormous quantities of data and to convert it into organized knowledge.
To help others learn Python faster, I decided to create this tutorial. We will take in bite-sized pieces of information on how to use Python for data science, practice each one until we are comfortable with it, and then use it for our own purposes. Why learn Python for data science? Python has recently gained a lot of interest as a language for data science because of its extensive community-supported libraries, its integration features, and improved programmer productivity. However, it has several limitations: difficulty switching to other languages afterward (Python lacks conventions common elsewhere, such as semicolons and explicit type declarations), weak support for mobile computing, slower execution (it is interpreted rather than compiled ahead of time), errors that only surface at run time (a consequence of dynamic typing), and less developed database access layers.