Python Pandas DataFrame std() for standard deviation of rows and columns by using axis, skipna, numeric_only and ddof


DataFrame.std(self, axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

We can get the standard deviation of the rows or columns of a DataFrame by using std().

`self`	The DataFrame whose values are used to get the standard deviation
`axis`	0 or 'index', 1 or 'columns' ( Optional ), default is None, which works as axis=0 and returns the standard deviation of each column. With axis=1 the standard deviation of each row is returned.
`level`	int ( Optional ), default is None, for a MultiIndex axis. Calculate along the given level.
`skipna`	Bool ( Optional ), default is True, exclude NA values.
`numeric_only`	Bool ( Optional ), default is None in the above signature ( False in pandas 2.0 and later ), include only int, float and boolean columns.
`ddof`	Delta Degrees of Freedom ( default is 1 ). The divisor N - ddof is used in computing the standard deviation, where N is the number of elements.

import pandas as pd
my_dict={
  'id':[1,2,3,4,5,4,2],
  'name':['John','Max','Arnold','Krish','John','Krish','Max'],
  'class1':['Four','Three','Three','Four','Four','Four','Three'],
  'mark':[75,85,55,60,60,60,85],
  'sex':['female','male','male','female','female','female','male']
	}
my_data = pd.DataFrame(data=my_dict)
print(my_data.std(numeric_only=True)) # text columns are left out

Output

id       1.414214
mark    12.817399
dtype: float64

Using only mark column ( with output )

print(my_data['mark'].std()) # 12.817398889233116

Using axis

We will use the option axis=1 by changing the above code. Without this option axis=0 ( default ) is used.

( Only the last line is changed )

print(my_data.std(axis=1, numeric_only=True))

Along the horizontal row ( axis=1 ) the standard deviation of the values of the two numeric columns ( id and mark ) is calculated. For example, for the third row [3,55] it is 36.769553.
Output is here.

0    52.325902
1    58.689863
2    36.769553
3    39.597980
4    38.890873
5    39.597980
6    58.689863
dtype: float64
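The value for the third row can be checked by hand. The following is a small sketch ( the variable names row, mean, squared_dev and manual_std are only for illustration ) which applies the sample formula with the divisor N - 1 to [3,55] and compares it with std().

```python
import math
import pandas as pd

row = pd.Series([3, 55])                 # id and mark of the third row
mean = row.mean()                        # (3 + 55) / 2 = 29.0
squared_dev = ((row - mean) ** 2).sum()  # 676 + 676 = 1352
# Sample standard deviation: divide by N - 1 ( ddof=1 )
manual_std = math.sqrt(squared_dev / (len(row) - 1))

print(manual_std)   # value calculated by hand
print(row.std())    # value from pandas, ddof=1 by default
```

Both lines should print the same value, about 36.769553.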

With axis=0 ( default ) the standard deviation of each column is calculated.

print(my_data.std(axis=0, numeric_only=True))

Output

id       1.414214
mark    12.817399
dtype: float64
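The axis can also be given by name in place of a number. Here is a short sketch using only the two numeric columns of the above data ( the DataFrame name numbers is an assumption for this example ) to show that 'index' works as 0 and 'columns' works as 1.

```python
import pandas as pd

# Only the numeric columns of the sample data
numbers = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

per_column = numbers.std(axis='index')   # same as axis=0
per_row = numbers.std(axis='columns')    # same as axis=1

print(per_column.equals(numbers.std(axis=0)))  # True
print(per_row.equals(numbers.std(axis=1)))     # True
```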

ddof

ddof = 0 , this is Population Standard Deviation ( the divisor is N )
ddof = 1 ( default ) , this is Sample Standard Deviation ( the divisor is N - 1 )

print(my_data.std(ddof=0, numeric_only=True))

Output

id       1.309307
mark    11.866606
dtype: float64
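To see how ddof changes the divisor, here is a minimal sketch for the mark column only ( the names n, sum_sq, population_std and sample_std are used just for this illustration ). It also shows that NumPy np.std() uses ddof=0 by default, so its output matches the population value and not the default pandas value.

```python
import numpy as np
import pandas as pd

mark = pd.Series([75, 85, 55, 60, 60, 60, 85])
n = len(mark)
sum_sq = ((mark - mark.mean()) ** 2).sum()  # sum of squared deviations

population_std = np.sqrt(sum_sq / n)        # ddof=0, divisor N
sample_std = np.sqrt(sum_sq / (n - 1))      # ddof=1, divisor N - 1

print(population_std, mark.std(ddof=0))     # both about 11.866606
print(sample_std, mark.std())               # both about 12.817399
print(np.std(mark.to_numpy()))              # NumPy default is ddof=0
```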

Handling NA data using skipna option

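We will use skipna=True ( default ) to ignore the null or NA data. Let us check what happens if it is set to True ( skipna=True ). The NaN in the ENGLISH column is skipped and the standard deviation is calculated from the remaining five values.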

import numpy as np
import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,70,30],
         'ENGLISH':[80,70,np.nan,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
print(my_data.std(skipna=True, numeric_only=True)) # NAME column is left out

Output

ID          1.870829
MATH       20.000000
ENGLISH    19.235384
dtype: float64
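For comparison, here is a sketch with skipna=False on the same data. Only the numeric columns are selected first ( this selection and the name result are assumptions for the example ). The ENGLISH column has one NaN, so its standard deviation should come out as NaN while ID and MATH are not affected.

```python
import numpy as np
import pandas as pd

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', 'Ron', 'King', 'Jack'],
           'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'ENGLISH': [80, 70, np.nan, 50, 60, 30]}
my_data = pd.DataFrame(data=my_dict)

# NA values are not skipped, so a column with NaN returns NaN
result = my_data[['ID', 'MATH', 'ENGLISH']].std(skipna=False)
print(result)
```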

numeric_only

Default value is None in the above signature ( False in pandas 2.0 and later ). We can set it to True ( numeric_only=True ) to include only float, int and boolean columns. We can include all columns by setting it to False ( numeric_only=False ). Let us see the outputs.

print(my_data.std(numeric_only=True))

Output is the same as above, as only the ID, MATH and ENGLISH columns are considered. By changing it to False we will get an error message, because the NAME column holds text.

print(my_data.std(numeric_only=False))

TypeError: could not convert string to float: 'Ravi'
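One more way to avoid this error is to keep only the numeric columns before calling std(). This is a sketch using select_dtypes() ( the names numeric_part and result are assumptions for the example ), not the only way to do it.

```python
import numpy as np
import pandas as pd

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', 'Ron', 'King', 'Jack'],
           'ID': [1, 2, 3, 4, 5, 6],
           'MATH': [80, 40, 70, 70, 70, 30],
           'ENGLISH': [80, 70, np.nan, 50, 60, 30]}
my_data = pd.DataFrame(data=my_dict)

# Keep int and float columns only, NAME is dropped
numeric_part = my_data.select_dtypes(include='number')
result = numeric_part.std()
print(result)
```

The output should list the ID, MATH and ENGLISH columns only, the same as with numeric_only=True.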

