Pandas is an open-source Python library that provides high-performance data structures and tools for data manipulation. The name Pandas is derived from "panel data", an econometrics term for multidimensional data sets. It is used for data analysis in Python and was developed by Wes McKinney in 2008.
Python Pandas Data Structures
Pandas provides two data structures for processing data, Series and DataFrame, which are discussed below:
What is a Series?
A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). All of its values share a single data type (it is homogeneous), and its values are mutable, i.e., they can be modified.
Example:
import pandas as pd
series1 = pd.Series([10,20,30]) #create a Series
print(series1) #Display the series
A Pandas Series can be created in several ways, for example from a list, a dictionary or a scalar value. Here are some of them:
Creating an Empty Series:
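import pandas as pd
# specifying the dtype avoids a warning about the default dtype of an empty Series in older pandas versions
ser = pd.Series(dtype=object)
print(ser)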
Creating a Series from a List:
To create a Series from a list, first create the list and then pass it to pd.Series().
import pandas as pd
# a simple list (named letters to avoid shadowing the built-in list)
letters = ['g', 'e', 'e', 'k', 's']
# create a Series from the list
ser = pd.Series(letters)
print(ser)
Creating a Series from a Dictionary:
To create a Series from a dictionary, first create the dictionary and then pass it to pd.Series(). The dictionary keys become the index labels of the Series and the dictionary values become its values.
import pandas as pd
# named data to avoid shadowing the built-in dict
data = {'Pandas': 10,
        'Series': 20,
        'DataFrame': 30}
ser = pd.Series(data)
print(ser)
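Because the dictionary keys become the index, passing an explicit index together with a dictionary selects and orders the keys to use. In this illustrative sketch the label 'Index' is an arbitrary example that is deliberately not a key: a label missing from the dictionary gets NaN, which also turns the integer values into float64.

```python
import pandas as pd

# Same dictionary as above (named data to avoid shadowing the built-in dict)
data = {'Pandas': 10, 'Series': 20, 'DataFrame': 30}

# The explicit index keeps only these labels, in this order;
# 'Index' is not a key of data, so its value is NaN
ser = pd.Series(data, index=['Series', 'Pandas', 'Index'])
print(ser)
```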
Creating a Series from a Scalar Value:
To create a Series with several elements from a scalar value, an index must be provided. The scalar value is repeated to match the length of the index.
import pandas as pd
# a scalar value repeated once for each index label
ser = pd.Series(10, index=[0, 1, 2, 3, 4, 5])
print(ser)
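A minimal sketch of both cases, with illustrative variable names: without an index the scalar gives a one-element Series, and with an index it is repeated once per label.

```python
import pandas as pd

# Without an index, a scalar produces a Series with a single element
single = pd.Series(10)
print(single)

# With an index, the scalar is repeated once for each label
repeated = pd.Series(10, index=['a', 'b', 'c'])
print(repeated)
```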
Creating a Series Using the range() Function:
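A minimal sketch, assuming the aim is simply to pass a range object to pd.Series(); the variable names are illustrative. Each value produced by range() becomes one element, and a range can also supply the index labels.

```python
import pandas as pd

# range(1, 12, 2) yields 1, 3, 5, 7, 9, 11; each value becomes one element
ser = pd.Series(range(1, 12, 2))
print(ser)

# A range can also be used as the index
ser2 = pd.Series(['a', 'b', 'c'], index=range(10, 13))
print(ser2)
```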
Series object attributes
Series attributes give information about a Series object, such as its size and data type. Below are some of the attributes you can use to get information about a Series object:
import pandas as pd
# a Series of the odd numbers 1 to 11 built with range(), with an explicit index
ser = pd.Series(range(1,12,2), index=[0, 1, 2, 3, 4, 5])
print("Series index:", ser.index)
print("Series values:", ser.values)
print("Series size:", ser.size)
print("Series has NaN values:", ser.hasnans)
import pandas as pd
# the same Series as above
ser = pd.Series(range(1,12,2), index=[0, 1, 2, 3, 4, 5])
print("Series ndim:", ser.ndim)
print("Series empty:", ser.empty)
print("Series data type:", ser.dtype)
print("Series nbytes (8 bytes x size for int64):", ser.nbytes)
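The attributes above describe a Series without missing values. As a small illustrative sketch with made-up data, an empty Series and a Series containing NaN show the other side of empty and hasnans. Note that an integer list containing NaN is stored as float64.

```python
import numpy as np
import pandas as pd

# An empty Series: size is 0 and empty is True
empty_ser = pd.Series(dtype='float64')
print(empty_ser.size, empty_ser.empty)

# A Series containing NaN: hasnans is True and the values are stored as float64
with_nan = pd.Series([1, np.nan, 3])
print(with_nan.hasnans, with_nan.dtype)
```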
This section discusses some of the methods available for fetching elements from a Pandas Series:
head()
tail()
count()
Series.head(n) returns the first n elements of a Series; if n is not specified, it returns the first 5. Similarly, Series.tail(n) returns the last n elements, again 5 by default. Passing a number as the parameter changes how many values are returned. Series.count() returns the number of non-NaN values in the Series.
The syntax is:
<series-object>.head(n)
<series-object>.tail(n)
<series-object>.count()
import pandas as pd
# a Series of six odd numbers
ser = pd.Series(range(1,12,2), index=[0, 1, 2, 3, 4, 5])
print("count()\n", ser.count())
print("\nhead()\n", ser.head())
print("\nhead(2)\n", ser.head(2))
print("\nhead(1)\n", ser.head(1))
import pandas as pd
# a Series of six odd numbers
ser = pd.Series(range(1,12,2), index=[0, 1, 2, 3, 4, 5])
print("\ntail()\n", ser.tail())
print("\ntail(2)\n", ser.tail(2))
print("\ntail(1)\n", ser.tail(1))
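Unlike size, count() skips missing values. A minimal sketch with made-up sample data:

```python
import numpy as np
import pandas as pd

ser = pd.Series([5, np.nan, 15, np.nan, 25])
# size counts every element, NaN included
print("size:", ser.size)
# count() counts only the non-NaN values
print("count():", ser.count())
```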
Accessing Elements of a Series
Indexing
Indexing in a Series is similar to indexing NumPy arrays and is used to access elements of the series. There are two types of index: positional and labelled. A positional index is an integer corresponding to the element's position in the series, starting from 0, whereas a labelled index is any user-defined label.
import pandas as pd
seriesNum = pd.Series([10,20,30,50,70,45])
print(seriesNum[4])
When labels are specified, we can use labels as indices while selecting values from a Series, as shown below. Here, the value 3 is displayed for the labelled index Mar.
import pandas as pd
seriesMnths = pd.Series([2,3,4],index=["Feb","Mar","Apr"])
print(seriesMnths["Mar"])
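We can also access an element of the series by its position, using .iloc. (Recent pandas versions deprecate treating a plain integer key such as seriesMnths[0] as a position when the index consists of labels.)
import pandas as pd
seriesMnths = pd.Series([2,3,4],index=["Feb","Mar","Apr"])
print(seriesMnths.iloc[0])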
More than one element of a series can be accessed using a list of positional integers or a list of index labels as shown in the following examples:
import pandas as pd
seriesMnths = pd.Series([2,3,4],index=["Feb","Mar","Apr"])
# a list of positions is passed to .iloc
print(seriesMnths.iloc[[0,2]])
import pandas as pd
seriesMnths = pd.Series([2,3,4],index=["Feb","Mar","Apr"])
print(seriesMnths[["Mar","Feb"]])
The index values associated with the series can be altered by assigning new index values as shown in the following example:
import pandas as pd
seriesMnths = pd.Series([2,3,4],index=["Feb","Mar","Apr"])
print(seriesMnths)
# replace the labels Feb, Mar, Apr with 10, 20, 30
seriesMnths.index=[10,20,30]
print(seriesMnths)
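The new index must have the same number of labels as the Series has elements. In this sketch (same sample data as above), only two labels are given for three elements; since the exact error message may differ between pandas versions, the code just prints whatever ValueError is raised.

```python
import pandas as pd

seriesMnths = pd.Series([2, 3, 4], index=["Feb", "Mar", "Apr"])
try:
    # Two labels for three elements: pandas rejects the assignment
    seriesMnths.index = [10, 20]
except ValueError as err:
    print("ValueError:", err)

# The original index is left unchanged
print(seriesMnths)
```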
Slicing
Sometimes we need to extract part of a series. This can be done through slicing, which is similar to slicing NumPy arrays. We specify which part of the series to slice by giving the start and end parameters, [start:end], after the series name. With positions, the end position is excluded; with labels, the end label is included.
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London','Paris'], index=['India', 'USA', 'UK', 'France'])
# positions 1 and 2 (the end position 3 is excluded)
print(seriesCapCntry.iloc[1:3])
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London','Paris'], index=['India', 'USA', 'UK', 'France'])
# label slice: the end label 'France' is included
print(seriesCapCntry['USA':'France'])
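The two slicing styles treat the end point differently. This sketch uses the explicit accessors .iloc (positions) and .loc (labels), which make the choice unambiguous. Both calls select USA and UK: the positional slice stops before position 3, while the label slice includes 'UK'.

```python
import pandas as pd

seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
                           index=['India', 'USA', 'UK', 'France'])

# Positional slice: position 3 ('France') is excluded
by_position = seriesCapCntry.iloc[1:3]
# Label slice: the end label 'UK' is included
by_label = seriesCapCntry.loc['USA':'UK']
print(by_position)
print(by_label)
```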
We can also get the series in reverse order, for example:
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London','Paris'], index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry)
print("Reverse Order")
print(seriesCapCntry[::-1])
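Slicing can also be used to modify values: assigning a value to a slice replaces every element in that slice.
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London','Paris'], index=['India', 'USA', 'UK', 'France'])
# replace the elements at positions 1 and 2 with 40
seriesCapCntry.iloc[1:3]=40
print(seriesCapCntry)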
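Mathematical Operations on Series
We have learnt in Class XI that basic mathematical operations such as addition, subtraction, multiplication and division on two NumPy arrays are performed on each corresponding pair of elements. Similarly, we can perform mathematical operations on two Series in Pandas; however, elements are paired by index label rather than by position.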
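The examples below use the following two Series. Only the labels 'a', 'c' and 'e' appear in both:
import pandas as pd
seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10,20,-10,-50,100], index = ['a', 'c', 'e','f','h'])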
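Addition of two Series
It can be done in two ways. In the first method, the two Series are simply added together with the + operator, as shown in the following code. Elements are matched by index label, so a label that appears in only one of the Series gives NaN in the result.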
import pandas as pd
seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10,20,-10,-50,100], index = ['a', 'c', 'e','f','h'])
print( seriesA + seriesB )
The second method is used when we do not want NaN values in the output. We can use the Series method add() with the parameter fill_value, which replaces a value missing from one of the Series with the specified value before adding. Calling seriesA.add(seriesB) without fill_value is equivalent to seriesA + seriesB.
import pandas as pd
seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10,20,-10,-50,100], index = ['a', 'c', 'e','f','h'])
print( seriesA + seriesB )
print("Second Method to add Series Object")
print(seriesA.add(seriesB, fill_value=0))
Subtraction of two Series
Again, it can be done in two different ways, as shown in the following examples:
import pandas as pd
seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10,20,-10,-50,100], index = ['a', 'c', 'e','f','h'])
print( "Difference of Two series\n",seriesB-seriesA )
import pandas as pd
seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10,20,-10,-50,100], index = ['a', 'c', 'e','f','h'])
print("Difference of two Series\n(a label missing from one Series is treated as 1000)\n", seriesA.sub(seriesB, fill_value=1000))
Multiplication of two Series
Again, it can be done in two different ways, as shown in the following examples:
import pandas as pd
seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10,20,-10,-50,100], index = ['a', 'c', 'e','f','h'])
print("Product of two Series\n", seriesB*seriesA)
import pandas as pd
seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10,20,-10,-50,100], index = ['a', 'c', 'e','f','h'])
print("Product of two Series (a label missing from one Series is treated as 0)\n", seriesA.mul(seriesB, fill_value=0))
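Division of two Series
Again, it can be done in two different ways: with the / operator, or with the div() method and its fill_value parameter, as with add(), sub() and mul(). The first way is shown in the following example: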
import pandas as pd
seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10,20,-10,-50,100], index = ['a', 'c', 'e','f','h'])
print("Quotient of two Series\n", seriesB/seriesA)
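For the second way, div() accepts fill_value just like add(), sub() and mul(). A minimal sketch using the same two Series; fill_value=1 is an illustrative choice, because a fill value of 0 in the divisor would produce infinite values for labels missing from seriesA.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index=['a', 'c', 'e', 'f', 'h'])

# A label missing from one Series is treated as 1 before dividing,
# so the result has no NaN values
result = seriesB.div(seriesA, fill_value=1)
print(result)
```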
Vector Operations on Series
Series also supports vector operations: an operation applied to a Series is performed on every single element of it.
Adding 3 to the Series
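A minimal sketch with made-up sample data: adding a scalar to a Series adds it to every element and keeps the index unchanged.

```python
import pandas as pd

ser = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
# 3 is added to every element; the labels a, b, c stay the same
plus3 = ser + 3
print(plus3)
```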
Multiplying the Series by 3
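In the same way, multiplying by a scalar multiplies every element. A minimal sketch with made-up sample data:

```python
import pandas as pd

ser = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
# every element is multiplied by 3; the original Series is not modified
times3 = ser * 3
print(times3)
```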
Retrieving Values using Conditions
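A minimal sketch of Boolean indexing with made-up data and illustrative variable names: comparing a Series with a value gives a Boolean Series, and indexing with it keeps only the elements where it is True. Conditions can be combined with & (and) and | (or), each wrapped in parentheses.

```python
import pandas as pd

ser = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# The comparison gives a Boolean Series (True where the condition holds)
mask = ser > 25
# Indexing with that Boolean Series keeps only the True positions
selected = ser[mask]
print(selected)

# Two conditions combined with &, each in parentheses
between = ser[(ser > 15) & (ser < 45)]
print(between)
```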
Deleting Elements from a Series
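A minimal sketch using the Series method drop(), with made-up data. drop() returns a new Series without the given label(s) and leaves the original unchanged unless inplace=True is passed.

```python
import pandas as pd

ser = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# drop() returns a new Series without label 'b'; ser itself is unchanged
smaller = ser.drop('b')
print(smaller)

# Several labels can be dropped at once by passing a list
print(ser.drop(['a', 'd']))
```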