I created this tutorial to help others learn Python faster. We will take in bite-sized pieces of information on how to use Python for data science, practice until we are comfortable, and then apply it to our own purposes.
Why learn Python for data science?
Recently, Python has gained a lot of interest as a language for data science because of its extensive support libraries from the community, its integration features, and improved programmer productivity. However, it has several limitations: its syntax differs from many other languages (for example, no semicolons and no declared types), which can make it harder to switch to those languages; it is weak in mobile computing; it runs slower than compiled languages because it is interpreted; its dynamic typing is a design restriction that lets some errors surface only at run time; and its database access layers are underdeveloped.
Python 2 or 3?
There are two versions of Python presently available, 2 and 3. Which one is better? How do Python 2 and Python 3 differ? For me, it depends. If you are a new learner, you should learn Python 3, because support for Python 2 ends in 2020 and there is no reason to learn Python 2 unless you have a particular need for it, for instance, if you work with a company whose libraries only support Python 2. Python 3 has already been available for about 10 years, and the community is in the process of shifting to it.
Let’s get started.
You can download and install Python 3 from https://www.python.org/downloads/. Type these commands in the Python interpreter to check your Python version.
>>> import sys
>>> print(sys.version)
After that, install Jupyter Notebook using this command.
pip3 install jupyter
Once done, run this command to open Jupyter Notebook. Make sure you are in your workspace directory before running this command.
jupyter notebook
Let’s understand the Jupyter Notebook components first before we put any code in it.
When you launch Jupyter Notebook, the first page that you encounter is the Notebook Dashboard.
Once you’ve selected a Notebook to edit, the Notebook will open in the Notebook Editor.
When a cell is in edit mode, the cell mode indicator changes to reflect the state of the cell. This state is indicated by a small pencil icon at the top right of the interface. There is no icon at that location when the cell is in command mode.
Now let’s say that while in the Notebook Dashboard, you chose to open a Markdown file instead of a Notebook file. If so, the file will open in the File Editor.
Try running some simple Python code here as a test.
Before we go deep into solving problems, let’s take a step back and cover Python’s basics. As we know, data structures, iteration, and conditional constructs are the crux of any language. Let’s look at some of them.
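As a first test, something like the following minimal sketch can be typed into a notebook cell and run. The variable names and values are made up purely for illustration.

```python
# A hypothetical list of numbers, used only to check that the notebook runs code.
numbers = [1, 2, 3, 4, 5]

# sum() is a built-in function that adds up the items of an iterable.
total = sum(numbers)

print("Hello, Jupyter!")
print("Sum:", total)
```

If the cell prints both lines without an error, the installation is working.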
Python Data Structures
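The list data type has several more methods. Here are the methods of list objects: append(), extend(), insert(), remove(), pop(), clear(), index(), count(), sort(), reverse(), and copy().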
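The short sketch below exercises the common list methods on a small example. The list of fruit names is hypothetical and chosen only for illustration.

```python
# A hypothetical list of fruit names used only for illustration.
fruits = ["orange", "apple", "pear", "banana", "kiwi", "apple", "banana"]

apple_count = fruits.count("apple")      # how many times "apple" appears
first_banana = fruits.index("banana")    # position of the first "banana"
next_banana = fruits.index("banana", 4)  # search again, starting at index 4

fruits.reverse()           # reverse the items in place
fruits.append("grape")     # add one item to the end
fruits.extend(["mango"])   # add every item from another iterable
fruits.insert(0, "plum")   # insert an item before index 0
fruits.remove("kiwi")      # remove the first item equal to "kiwi"
fruits.sort()              # sort the items in place
last = fruits.pop()        # remove and return the last item

snapshot = fruits.copy()   # shallow copy of the list
fruits.clear()             # remove all items

print(apple_count, first_banana, next_banana, last)
print(snapshot)
```

Note that methods such as sort(), reverse(), and insert() modify the list in place and return None.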
Using Lists as Stacks
The list methods make it very easy to use a list as a stack, where the last element added is the first element retrieved (“last-in, first-out”). Use append() to add an item to the top of the stack. Use pop() without an explicit index to retrieve an item from the top of the stack.
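Here is a minimal sketch of the stack pattern described above. The starting values are arbitrary.

```python
# Start with a small stack; the end of the list is the "top".
stack = [3, 4, 5]

stack.append(6)  # push 6 onto the top
stack.append(7)  # push 7 onto the top

top = stack.pop()     # pops the most recently added item
second = stack.pop()  # pops the item added just before it

print(top, second, stack)
```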
Using Lists as Queues
A list can also be used as a queue, where the first element added is the first element retrieved (“first-in, first-out”). However, lists are not efficient for this purpose. While appends and pops from the end of a list are fast, inserts or pops from the beginning of a list are slow (because all of the other elements have to be shifted by one). To implement a queue, use collections.deque, which was designed to have fast appends and pops from both ends.
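The queue pattern can be sketched with collections.deque as follows. The names in the queue are placeholders.

```python
from collections import deque

# A queue of placeholder names; the left end is the front of the queue.
queue = deque(["Eric", "John", "Michael"])

queue.append("Terry")   # Terry joins at the back
queue.append("Graham")  # Graham joins at the back

first = queue.popleft()   # the first to arrive leaves first
second = queue.popleft()  # then the second to arrive

remaining = list(queue)   # who is still waiting, in arrival order
print(first, second, remaining)
```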
List Comprehensions: https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
Nested List Comprehensions: https://docs.python.org/3/tutorial/datastructures.html#nested-list-comprehensions
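As a quick illustration of the two topics linked above, here is a minimal sketch of a list comprehension, a filtered comprehension, and a nested comprehension that transposes a small matrix. The values are arbitrary.

```python
# Build a list of squares in one expression.
squares = [x ** 2 for x in range(10)]

# Add a condition to keep only the squares of even numbers.
even_squares = [x ** 2 for x in range(10) if x % 2 == 0]

# A 3x4 matrix represented as a list of rows.
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# Nested comprehension: swap rows and columns (transpose).
transposed = [[row[i] for row in matrix] for i in range(4)]

print(squares)
print(even_squares)
print(transposed)
```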
There is a way to remove an item from a list given its index instead of its value: the del statement. This differs from the pop() method, which returns a value. You can also use the del statement to remove slices from a list or clear the whole list.
Tuples and Sequences: https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
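The three uses of del mentioned above can be sketched like this. The list contents are arbitrary example values.

```python
a = [-1, 1, 66.25, 333, 333, 1234.5]

del a[0]                # remove the item at index 0
after_index = a.copy()  # snapshot after removing one item

del a[2:4]              # remove the slice covering indexes 2 and 3
after_slice = a.copy()  # snapshot after removing the slice

del a[:]                # clear the whole list (the name `a` still exists)

print(after_index)
print(after_slice)
print(a)
```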
Python also includes a data type for sets. A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations such as union, intersection, difference, and symmetric difference.
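A minimal sketch of these set operations follows. The basket contents and the two words are arbitrary examples.

```python
# Duplicates are dropped automatically when the set is created.
basket = {"apple", "orange", "apple", "pear", "orange", "banana"}
has_orange = "orange" in basket  # fast membership test

# Build two sets from the unique letters of two words.
a = set("abracadabra")
b = set("alacazam")

difference = a - b    # letters in a but not in b
union = a | b         # letters in a or b or both
intersection = a & b  # letters in both a and b
symmetric = a ^ b     # letters in a or b but not both

# Sets are unordered, so sort before printing for a stable display.
print(sorted(basket))
print(sorted(difference), sorted(intersection))
```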
Set Theory Symbols: https://www.rapidtables.com/math/symbols/Set_Symbols.html
Looping Techniques: https://docs.python.org/3/tutorial/datastructures.html#looping-techniques
More on Conditions: https://docs.python.org/3/tutorial/datastructures.html#more-on-conditions
Comparing Sequences and Other Types: https://docs.python.org/3/tutorial/datastructures.html#comparing-sequences-and-other-types
Top 6 Python libraries you must know in 2019
If you are currently working on a machine learning project in Python, you may have heard of the popular open source library TensorFlow. It was developed by Google’s Brain Team, and Google uses TensorFlow for machine learning across many of its applications. TensorFlow works like a computational library for writing new algorithms that involve a large number of tensor operations: neural networks can be easily expressed as computational graphs, which TensorFlow implements as a series of tensor operations. Tensors are N-dimensional arrays that represent your data.
Scikit-Learn is a Python library built on NumPy and SciPy. It is considered one of the best libraries for working with complex data. Many changes are being made to this library. One change is to the cross-validation feature, which now supports more than one metric. Many training methods, such as logistic regression and nearest neighbors, have also received minor improvements.
NumPy is considered one of the most popular Python libraries for machine learning. TensorFlow and other libraries use NumPy internally to perform various operations on tensors. NumPy’s best and most important feature is its array interface.
Keras is considered one of the coolest Python libraries for machine learning. It offers an easier mechanism for expressing neural networks. Keras also offers some of the best tools for compiling models, processing data sets, visualizing graphs, and more. On the backend, Keras uses Theano or TensorFlow internally; other popular neural network frameworks such as CNTK can also be used. Compared with other machine learning libraries, Keras is relatively slow, because it builds a computational graph using the backend infrastructure and then uses it to perform operations. All models are portable in Keras.
SciPy is a machine learning library for application developers and engineers. However, you need to know the difference between the SciPy library and the SciPy stack. The SciPy library includes modules for optimization, linear algebra, integration, and statistics.
Pandas is a Python machine learning library that provides high-level data structures and a wide range of analysis tools. One of its great features is the ability to express complex data operations in one or two commands. Pandas has many built-in methods for grouping, combining, and filtering data, as well as time-series functionality.
I hope this article helps you get the most out of your start in data science with Python. In the next article, we will apply basic Python and additional libraries to problem-solving. Python is an excellent tool and is becoming an increasingly popular language among data scientists. The reasons are that it is simple to learn and it integrates well with other databases and tools such as Spark and Hadoop. Above all, it offers strong computational capability and powerful libraries for data analytics.