Getting to Know Your Data

Before you analyze a data set, you will want to answer the following questions. What kinds of attributes or fields make up your data? What kinds of values does each attribute have? Which attributes are discrete, and which are continuous-valued? What do the data look like? How are the values distributed? Can we visualize the data to get a better sense of it all? Can we spot any outliers? Can we measure how similar some data objects are to others? The material that follows will help you gain this kind of insight into your data. Knowing your data is useful for data preprocessing, the first major task of data analysis.
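As a rough illustration of the questions above, here is a minimal sketch that uses only the Python standard library. The toy records, the attribute names (city, age, income), and the helper functions are all made up for this example. The 1.5 x IQR rule for outliers and min-max scaling before measuring distance are just one common choice each, not the only way to do it.

```python
import math
import statistics
from collections import Counter

# Hypothetical toy data set: "city" is a discrete attribute,
# "age" and "income" are continuous-valued attributes.
records = [
    {"city": "Pune", "age": 23, "income": 40000.0},
    {"city": "Delhi", "age": 35, "income": 52000.0},
    {"city": "Pune", "age": 31, "income": 48000.0},
    {"city": "Mumbai", "age": 29, "income": 45000.0},
    {"city": "Delhi", "age": 41, "income": 250000.0},  # suspiciously large
]

ages = [r["age"] for r in records]
incomes = [r["income"] for r in records]

# Discrete attribute: count how often each value occurs.
city_counts = Counter(r["city"] for r in records)


def summarize(values):
    """Return basic statistics describing how the values are distributed."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def min_max(values):
    """Rescale values to the range [0, 1] so no attribute dominates."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def euclidean(p, q):
    """Euclidean distance between two equally long numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Each data object becomes a point of scaled (age, income) values.
points = list(zip(min_max(ages), min_max(incomes)))

income_summary = summarize(incomes)
income_outliers = iqr_outliers(incomes)
d_02 = euclidean(points[0], points[2])  # record 0 vs. record 2
d_04 = euclidean(points[0], points[4])  # record 0 vs. record 4

print("City counts:", dict(city_counts))
print("Income summary:", income_summary)
print("Income outliers:", income_outliers)
print("Distance 0-2:", d_02, "Distance 0-4:", d_04)
```

A smaller distance means the two records are more alike on the scaled attributes. With real data you would normally reach for a library such as pandas, but the idea is the same.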


Introduction to Data Science

  1. Why Data Science?
  2. What is Data Science?
  3. What is the Data Science Process?
  4. What Kinds of Data Can Be Analyzed?
  5. What Kinds of Patterns Can Be Analyzed?
  6. Which Technologies are Used?
  7. Which Kinds of Applications Are Targeted?
  8. Major Issues in Data Science
  9. Data Science and Society

Why Data Science?

We live in a world where vast amounts of data are collected every day. Analyzing such data to discover knowledge is an important need.

We live in the information age

It is a popular saying, but it is also a fact: we live in the information age. Every day, terabytes or petabytes of data flow into our computer networks, the World Wide Web (WWW), and various data storage devices from business, society, science and engineering, medicine, and almost every other aspect of everyday life. Powerful and versatile tools are badly needed to automatically uncover valuable information in these enormous quantities of data and turn it into organized knowledge.


Learn Python for Data Science

Why Python?

I created this tutorial to help others learn Python faster. In it, we will take in bite-sized pieces of information on how to use Python for data science, practice each one until we are comfortable, and then apply it to our own purposes. Why learn Python for data science? Python has recently attracted a lot of interest as a language for data science because of its extensive community-supported libraries, its ease of integration with other tools, and the productivity it gives programmers. It does have limitations, however: habits formed in Python can make other languages harder to use (Python has no semicolons or type declarations, for example), it is weak in mobile computing, it tends to run slowly because it is interpreted rather than compiled, it surfaces many errors only at run time because it is dynamically typed, and its database access layers are relatively underdeveloped.
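To make the run-time error limitation a little more concrete, here is a small sketch. It assumes that limitation refers to Python checking types only while the program runs. The function name and the sample lists are made up for illustration. Nothing complains about the bad value until the line that uses it actually executes.

```python
def average(values):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


clean = [12.5, 7.0, 9.5]
dirty = [12.5, "7.0", 9.5]  # one number was read in as text

print("Average of clean data:", average(clean))

caught = False
try:
    # Python accepts this call without complaint; the problem only
    # appears when sum() tries to add a float and a string.
    average(dirty)
except TypeError as exc:
    caught = True
    print("Run-time error:", exc)
```

In data science work this usually shows up when a column that should be numeric contains text, which is one reason checking attribute types early is worth the effort.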