Python and Extensive Libraries

Learn Python for Data Science

Why Python?

I created this tutorial to help others learn Python faster. We will take in bite-sized pieces of information on how to use Python for data science, practice until we are comfortable, and then apply it to our own purposes. Why learn Python for data science? Python has recently gained a lot of interest as a language for data science because of its extensive community-supported libraries, its integration features, and the programmer productivity it enables. However, it has several limitations: its habits carry over poorly to other languages (it shares few of their conventions, such as semicolons or declared types), it is weak in mobile computing, it runs slower than compiled languages because it is interpreted, its dynamic typing lets some errors surface only at run time, and its database access layers are less developed.

Python 2 or 3?

There are two versions of Python presently available, 2 and 3. Which one is better, and how do Python 2 and Python 3 differ? For me, it depends. If you are a new learner, you should learn Python 3: support for Python 2 ends in 2020, so there is no reason to learn it unless you have a particular need, for instance if you work at a company whose libraries only support Python 2. Python 2 is more than 10 years old, and the ecosystem is shifting to Python 3.
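Two of the most visible differences can be seen in a few lines. This is a minimal sketch that shows only Python 3 behavior; the variable names are illustrative.

```python
# In Python 3, print is a function and "/" is true division.
ratio = 7 / 2       # true division keeps the fractional part
floored = 7 // 2    # floor division discards it

print("7 / 2 =", ratio)
print("7 // 2 =", floored)
```

In Python 2, `print` was a statement and `7 / 2` on two integers performed floor division.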

Let’s get started.

You can download and install Python 3 here, https://www.python.org/downloads/. Type this command to check your Python version.



python -V

or start the interpreter and print the version from there:

python3

>>> import sys
>>> print(sys.version)

Figure: Python 3

After that, install Jupyter Notebook using this command.



pip3 install jupyter

Once done, run this command to open Jupyter Notebook. Make sure you are in your workspace directory before running it.



jupyter notebook

Before we write any code in it, let’s first understand the components of Jupyter Notebook.

When you launch Jupyter Notebook, the first page that you encounter is the Notebook Dashboard.

Figure: Notebook Dashboard

Once you’ve selected a Notebook to edit, the Notebook will open in the Notebook Editor.

Figure: Notebook Editor

When a cell is in edit mode, the cell mode indicator changes to reflect that state: a small pencil icon appears at the top right of the interface. No icon appears at that location when the cell is in command mode.

Figure: Edit Mode and Notebook Editor

Now let’s say that, while in the Notebook Dashboard, you chose to open a Markdown file instead of a Notebook file. In that case, the file will open in the File Editor.

Figure: File Editor

Try running some simple Python code here as a test.

Figure: Jupyter Notebook

Before we go deep into solving problems, let’s take a step back and review Python’s basics. As we know, data structures, iteration, and conditional constructs are the crux of any language. Let’s look at some of them.

Python Data Structures

Lists

The list data type has some more methods. Here are all of the methods of list objects.

Figure: Lists
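As a rough illustration, here is a small sketch that exercises several of the list methods. The `fruits` list and the other names are made up for this example.

```python
# A quick tour of common list methods on a hypothetical fruit list.
fruits = ["orange", "apple", "pear", "banana", "kiwi", "apple", "banana"]

apple_count = fruits.count("apple")     # how many times "apple" occurs
first_banana = fruits.index("banana")   # position of the first "banana"

fruits.append("grape")      # add one item at the end
fruits.insert(0, "mango")   # insert an item before position 0
fruits.remove("pear")       # remove the first item equal to "pear"
fruits.reverse()            # reverse the list in place
fruits.sort()               # sort the list in place
last = fruits.pop()         # remove and return the last item

print(apple_count, first_banana, last)
print(fruits)
```

Note that `insert`, `remove`, `reverse`, and `sort` modify the list in place and return `None`, while `pop` returns the removed item.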

Using Lists as Stacks

The list methods make it very easy to use a list as a stack, where the last item added is the first item retrieved (“last-in, first-out”). Use append() to add an item to the top of the stack. Use pop() without an explicit index to retrieve an item from the top of the stack.

Figure: Using Lists as Stacks
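The stack behavior described above can be sketched as follows; the values are arbitrary.

```python
# Use a list as a stack: the end of the list is the top of the stack.
stack = [3, 4, 5]

stack.append(6)   # push 6
stack.append(7)   # push 7

top = stack.pop()      # pops the most recently added item
second = stack.pop()   # pops the item added before it

print(top, second)
print(stack)
```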

Using Lists as Queues

A list can also be used as a queue, where the first element added is the first element retrieved (“first-in, first-out”). However, lists are not efficient for this purpose. While appends and pops at the end of a list are fast, inserts or pops at the start of a list are slow (because all the other elements have to be shifted by one). To implement a queue, use collections.deque, which is designed for fast appends and pops at both ends.

Figure: Lists as Queues
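A minimal sketch of a queue built on `collections.deque`; the names in the queue are placeholders.

```python
from collections import deque

# A deque supports fast appends and pops at both ends.
queue = deque(["Eric", "John", "Michael"])

queue.append("Terry")    # Terry joins the back of the queue
queue.append("Graham")   # Graham joins the back of the queue

first = queue.popleft()    # the first to arrive leaves first
second = queue.popleft()   # then the second to arrive

print(first, second)
print(list(queue))
```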

List Comprehensions: https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

Nested List Comprehensions: https://docs.python.org/3/tutorial/datastructures.html#nested-list-comprehensions
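For a quick taste before following the links, here is a small sketch of plain and nested list comprehensions. The data is invented for illustration.

```python
# A list comprehension builds a list from an expression and a loop.
squares = [x ** 2 for x in range(10)]

# An optional condition filters the items.
evens = [x for x in range(10) if x % 2 == 0]

# A nested comprehension: transpose a 3x4 matrix into 4x3.
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
transposed = [[row[i] for row in matrix] for i in range(4)]

print(squares)
print(evens)
print(transposed)
```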

del

There is a way to remove an item from a list given its index instead of its value: the del statement. This differs from the pop() method, which returns a value. You can also use the del statement to remove slices from a list or clear the whole list.

Figure: del
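The three uses of del mentioned above, plus the contrast with pop(), can be sketched like this. The numbers are arbitrary.

```python
numbers = [-1, 1, 66.25, 333, 333, 1234.5]

del numbers[0]             # remove the item at index 0
after_index = list(numbers)

del numbers[2:4]           # remove the slice covering indices 2 and 3
after_slice = list(numbers)

popped = numbers.pop(0)    # unlike del, pop() returns the removed value

del numbers[:]             # clear the whole list

print(after_index)
print(after_slice)
print(popped, numbers)
```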

Tuple and Sequences: https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences

Sets

Python also includes a data type for sets. A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations such as union, intersection, difference, and symmetric difference.

Figure: Sets
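Here is a short sketch of those uses. The basket contents and the two words are illustrative only, and results are sorted before printing because sets have no defined order.

```python
basket = ["apple", "orange", "apple", "pear", "orange", "banana"]

unique = set(basket)            # duplicates are removed
has_orange = "orange" in unique # fast membership test

a = set("abracadabra")   # unique letters in the first word
b = set("alacazam")      # unique letters in the second word

union = a | b                 # letters in a or b or both
intersection = a & b          # letters in both a and b
difference = a - b            # letters in a but not in b
symmetric_difference = a ^ b  # letters in exactly one of a, b

print(sorted(unique), has_orange)
print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
print(sorted(symmetric_difference))
```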

Set Theory Symbols: https://www.rapidtables.com/math/symbols/Set_Symbols.html

Dictionaries: https://docs.python.org/3/tutorial/datastructures.html#dictionaries

Looping Techniques: https://docs.python.org/3/tutorial/datastructures.html#looping-techniques

More on Conditions: https://docs.python.org/3/tutorial/datastructures.html#more-on-conditions

Comparing Sequences and Other Types: https://docs.python.org/3/tutorial/datastructures.html#comparing-sequences-and-other-types

Top 6 Python libraries you must know in 2019

If you are currently working on a machine learning project in Python, you have probably heard of the popular open source library TensorFlow. It was developed at Google by the Google Brain team, and it is used for machine learning in nearly every Google application. TensorFlow works as a computational library for writing new algorithms that involve a large number of tensor operations: neural networks can be easily expressed as computational graphs, which TensorFlow implements as a series of tensor operations. Tensors are the N-dimensional arrays that represent your data.

Scikit-Learn is a Python library built on NumPy and SciPy. It is considered one of the best libraries for working with complex data. The library is under active development and has seen many changes. One of them is the cross-validation feature, which now supports using more than one metric. Many training methods, such as logistic regression and nearest neighbors, have also received minor improvements.

NumPy is considered one of Python’s most popular machine learning libraries. TensorFlow and other libraries use NumPy internally to perform various tensor operations. NumPy’s best and most significant feature is its array interface.
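To give a feel for the array interface, here is a minimal sketch, assuming NumPy is installed (for example with pip3 install numpy). The array values are arbitrary.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns.
a = np.array([[1, 2, 3], [4, 5, 6]])

shape = a.shape             # dimensions as a tuple
doubled = a * 2             # arithmetic applies element-wise
col_sums = a.sum(axis=0)    # sum down each column

print(shape)
print(doubled)
print(col_sums)
```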

Keras is considered one of Python’s coolest machine learning libraries. It offers an easier way to express neural networks. Keras also offers some of the best tools for compiling models, processing data sets, visualizing graphs, and more. Internally, Keras uses Theano or TensorFlow as its backend. It can also run on other popular backends such as CNTK. Compared with other machine learning libraries, Keras is relatively slow, because it first builds a computational graph using the backend infrastructure and then uses it to perform operations. All models in Keras are portable.

SciPy is a machine learning library for application developers and engineers. However, you need to know the difference between the SciPy library and the SciPy stack. The SciPy library contains modules for optimization, linear algebra, integration, and statistics.

Pandas is a Python machine learning library that provides high-level data structures and a wide range of analysis tools. One of its great features is the ability to express complex data operations in one or two commands. Pandas has many built-in methods for grouping, combining, and filtering data, as well as time series functionality.
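The grouping, combining, and filtering mentioned above can be sketched in a few lines, assuming pandas is installed (for example with pip3 install pandas). The column names and values are made up for this example.

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "city": ["A", "B", "A", "B"],
    "sales": [10, 20, 30, 40],
})

# Grouping: total sales per city.
totals = df.groupby("city")["sales"].sum()

# Filtering: keep only rows with sales above 15.
big = df[df["sales"] > 15]

# Combining: attach a region to each row by matching on "city".
regions = pd.DataFrame({"city": ["A", "B"], "region": ["North", "South"]})
merged = df.merge(regions, on="city")

print(totals)
print(big)
print(merged)
```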

I hope this article helps you get started effectively with Python for data science. In the next article, we will apply basic Python and these libraries to problem-solving. Python is an excellent tool and is becoming an increasingly popular language among data scientists. The reasons are that it is simple to learn and that it integrates well with databases and tools such as Spark and Hadoop. Above all, it copes with computationally intensive work and has powerful libraries for data analytics.
