Box Plot - Construction of BoxPlot and Basic Stats with Python - Pandas
# # Construction of BoxPlot and Basic Stats with Python
Welcome back!
Below is the code that walks through basic functions in Python (Pandas, NumPy) using a Jupyter Notebook. It covers the mean, mode, median, and interquartile range, as well as plots that matter when exploring data, such as box plots and histograms.
I have also embedded the video of the explanation.
---------------------------------------------------------------------------------------
import pandas as pd
from scipy import stats
# Load the birth data and keep a random sample of 500 rows
datosNB = pd.read_csv('Documents/Canal Yout/Stats with Python/birth2011.csv', sep=',')
datosNB = datosNB.sample(n=500)
datosNB.head()
| | DOB_MM | DOB_WK | OCNTY | MAGER | MRCNTY | FAGECOMB | DWGT | DMETH_REC | APGAR5 | SEX | COMBGEST | DBWT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 7 | 127 | 24 | 999 | 25.0 | 160 | 1 | 9 | M | 38 | 3090 |
| 1 | 1 | 7 | 25 | 19 | 25 | 21.0 | 143 | 1 | 9 | F | 39 | 3062 |
| 2 | 1 | 1 | 999 | 25 | 999 | 45.0 | 172 | 1 | 10 | F | 39 | 3062 |
| 3 | 1 | 3 | 25 | 38 | 25 | 32.0 | 192 | 2 | 9 | F | 39 | 3062 |
| 4 | 1 | 2 | 999 | 23 | 999 | 25.0 | 194 | 1 | 9 | M | 39 | 3941 |
# # Select groups
df1 = datosNB.DBWT[datosNB.SEX=='M']
df2 = datosNB.DBWT[datosNB.SEX=='F']
print("Mean for M", df1.mean() )
print("Std for M", df1.std() )
print("Mean for F", df2.mean() )
print("Std for F", df2.std() )
df1.median()
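The per-group statistics above can also be computed in a single pass with `groupby`. This is a minimal sketch, assuming a small made-up DataFrame (`toy`, with hypothetical values) that only shares the `SEX` and `DBWT` column names with the birth data:

```python
import pandas as pd

# Hypothetical toy data; only the column names match the birth dataset.
toy = pd.DataFrame({
    "SEX": ["M", "F", "M", "F", "M", "F"],
    "DBWT": [3000, 3100, 3400, 2900, 3200, 3300],
})

# One row per sex with the mean, standard deviation and median of birth weight.
summary = toy.groupby("SEX")["DBWT"].agg(["mean", "std", "median"])
print(summary)
```

With the real data, the same call on `datosNB` should give the four numbers printed above plus the medians, without creating `df1` and `df2` first.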
import numpy as np
np.max(df1)
np.min(df1)
df1.quantile() # with no arguments this returns the 0.5 quantile, i.e. the median
q1, q2, q3 = df1.quantile([0.25, 0.5, 0.75])
print("Quartiles: ")
print("Q1 ", q1)
print("Q2 (median) ", q2)
print("Q3 ", q3)
# Interquartile range (RIC, i.e. IQR) for males
RIC = q3 - q1
L1 = q1 - 1.5*RIC
L2 = q3 + 1.5*RIC
# Any value between L1 and L2 is not considered an outlier (extreme value)
print("L1: ", L1, " to L2:", L2)
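To make the role of the two limits concrete, the sketch below applies the same 1.5 × IQR rule to a short series and lists the values that fall outside it. The `weights` values are hypothetical, chosen so that two of them are clearly extreme:

```python
import pandas as pd

# Hypothetical weights in grams; 900 and 5600 are planted extremes.
weights = pd.Series([900, 2950, 3050, 3100, 3200, 3300, 3400, 3500, 5600])

q1, q3 = weights.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside [lower, upper] are the points a box plot draws individually.
outliers = weights[(weights < lower) | (weights > upper)]
print(lower, upper)
print(outliers.tolist())
```

The same boolean mask should work on `df1` with `L1` and `L2` in place of `lower` and `upper`.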
import matplotlib
matplotlib.style.use('ggplot')
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline') # notebook only; replaces the deprecated .magic() call
import seaborn as sb
# Alternative: one box per sex, side by side
#sb.boxplot(x="SEX", y="DBWT", data=datosNB, palette="Set1")
sb.boxplot(y=df1, palette="Set1")
np.random.seed(1)
#plt.hist(df1) # Without bins, matplotlib falls back to its default number of bins
plt.hist(df1, bins=10, edgecolor='black') # optionally add range=(0,6000) to fix the x limits
plt.show()
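To see what `bins=10` does numerically, the counts and bin edges can be inspected with `np.histogram`, which follows the same equal-width binning rule. A small sketch, assuming a hypothetical array of weights:

```python
import numpy as np

# Hypothetical weights in grams.
weights = np.array([2500, 2800, 3000, 3100, 3200, 3300, 3400, 3600, 4000, 4500])

# bins=10 splits the interval [min, max] into 10 equal-width bins,
# so 11 edges are returned along with one count per bin.
counts, edges = np.histogram(weights, bins=10)
print(counts)
print(edges)
```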
# # Number of weeks when newborn was born
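# Gestation weeks of the lightest male newborn(s) in the sample
datosNB.COMBGEST[(datosNB.SEX == 'M') & (datosNB.DBWT == df1.min())]
np.ptp(df1.to_numpy()) # Range (max - min)
moda = stats.mode(df1)
# np.atleast_1d copes with both SciPy return shapes (scalar in newer releases, length-1 array in older ones)
print("The modal value is {} with a count of {}".format(np.atleast_1d(moda.mode)[0], np.atleast_1d(moda.count)[0]))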
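plt.hist(datosNB.COMBGEST[datosNB.SEX=='M'], bins=10, edgecolor='black') # gestation weeks, males only
plt.show()
moda = stats.mode(datosNB.COMBGEST[datosNB.SEX=='M'])
# np.atleast_1d copes with both SciPy return shapes (scalar in newer releases, length-1 array in older ones)
print("The modal value is {} with a count of {}".format(np.atleast_1d(moda.mode)[0], np.atleast_1d(moda.count)[0]))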
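The mode and its count can also be obtained with pandas alone, which sidesteps any dependence on how `stats.mode` shapes its result. A minimal sketch, assuming a hypothetical series of gestation lengths:

```python
import pandas as pd

# Hypothetical gestation lengths in weeks.
weeks = pd.Series([39, 40, 38, 39, 41, 39, 40])

# value_counts sorts by frequency, so the first entry is the most common value.
counts = weeks.value_counts()
modal_value = counts.index[0]
modal_count = counts.iloc[0]
print("The modal value is {} with a count of {}".format(modal_value, modal_count))
```

When several values tie for the highest count, `weeks.mode()` returns all of them, which may be preferable to reporting only the first.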