« Pandas
DataFrame.mean(self, axis=None, skipna=None, level=None, numeric_only=None)
self | array, elements to get the mean value |
axis | Int (optional ), or tuple, default is None, mean among all the elements. If axis given then values across the axis is returned. |
level | int ( Optional ),default is None, for multiindex Axis. count along the level. |
skipna | Bool ( Optional ),default is True, Exclude NA values. |
numeric_only | Bool ( Optional ),default is None, include only Int, floot and boolean columns. |
We can get mean value of rows or columns by using mean().
import pandas as pd
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
'ID':[1,2,3,4,5,6],
'MATH':[80,40,70,70,70,30],
'ENGLISH':[80,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
print(my_data.mean())
Output
ID 3.5
MATH 60.0
ENGLISH 55.0
dtype: float64
What is the mean mark in MATH ?
print(my_data['MATH'].mean()) # 60.0
We can get the row or details of the record who got less or equal to the mean mark in MATH
print(my_data[my_data['MATH']<=my_data['MATH'].mean()])
Output is here
NAME ID MATH ENGLISH
1 Raju 2 40 70
5 Jack 6 30 30
Using axis
We will use option axis=1 by adding to above code.
( The last line is only changed )
print(my_data.mean(axis=1))
Output is here
0 53.666667
1 37.333333
2 37.666667
3 41.333333
4 45.000000
5 22.000000
dtype: float64
axis=0
( default ) is same as output shown at starting of this page.
level option
For MultiIndex (hierarchical) axis we can specify the level.
import pandas as pd
my_dict=pd.MultiIndex.from_arrays(
[[1,2,3,4,5,6],
[80,40,70,70,70,30],
[80,70,40,50,60,30]],
names=['id','math','eng'])
my_data = pd.Series([4, 2, 0, 8,3,4], name='marks', index=my_dict)
print(my_data.mean(level='math'))
Output
math
80 4.000000
40 2.000000
70 3.666667
30 4.000000
Name: marks, dtype: float64
Handling NA data using skipna option
We will use skipna=True to ignore the null or NA data. Let us check what happens if it is set to True ( skipna=True )
import numpy as np
import pandas as pd
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
'ID':[1,2,3,4,5,6],
'MATH':[80,40,70,70,70,30],
'ENGLISH':[80,70,np.nan,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
print(my_data.mean(skipna=True))
Output
ID 3.5
MATH 60.0
ENGLISH 58.0
dtype: float64
We will use skipna=False
print(my_data.mean(skipna=False))
Output
ID 3.5
MATH 60.0
ENGLISH NaN
dtype: float64
numeric_only
Default value is None, we can set it to True ( numeric_only=True ) to include only float, int, boolean columns. We can included all by setting it to False ( numeric_only=False ) . Let us see the outputs .
print(my_data.mean(numeric_only=False))
This will generate error as we have string objects.
TypeError: could not convert string to float: 'RaviRajuAlexRonKingJack'
« Pandas
Plotting graphs
max
min
sum
len
std
Filtering of Data
← Subscribe to our YouTube Channel here