文章目录
取出部分行或列的数据删除列或行删除有空值的行或列填充值判断是否是空值数据拼接在特定的行对应特定的列填值分组移动列的位置(先删除再插入)统计某一列或某一行的缺失值数目统计非空值数量
取出部分行或列的数据
import pandas
as pd
df
= pd
.DataFrame
(pd
.read_excel
(path
, header
=0, encoding
='utf-8'))
df
= df
.iloc
[:, [0, 3, -1]]
df
= df
.iloc
[0:5, :]
删除列或行
用法:DataFrame.drop(labels=None,axis=0, index=None, columns=None, inplace=False)
参数说明: labels 就是要删除的行列的名字,用列表给定 axis 默认为0,指删除行,因此删除columns时要指定axis=1; index 直接指定要删除的行 columns 直接指定要删除的列 inplace=False,默认该删除操作不改变原数据,而是返回一个执行删除操作后的新dataframe; inplace=True,则会直接在原数据上进行删除操作,删除后无法返回。
因此,删除行列有两种方式: 1)labels=None,axis=0 的组合 2)index或columns直接指定要删除的行或列
>>>df
= pd
.DataFrame
(np
.arange
(12).reshape
(3,4), columns
=['A', 'B', 'C', 'D'])
>>>df
A B C D
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
>>>df
.drop
(['B', 'C'], axis
=1)
A D
0 0 3
1 4 7
2 8 11
>>>df
.drop
(columns
=['B', 'C'])
A D
0 0 3
1 4 7
2 8 11
>>> df
.drop
(['B', 'C'])
ValueError
: labels
['B' 'C'] not contained
in axis
>>>df
.drop
([0, 1])
A B C D
2 8 9 10 11
>>> df
.drop
(index
=[0, 1])
A B C D
2 8 9 10 11
删除有空值的行或列
pd
.DataFrame
.dropna
(df
, axis
=1, how
='all', inplace
=True)
填充值
pd
.DataFrame
.fillna
(df
, value
=-1, axis
=1, inplace
=True)
判断是否是空值
pd
.isna
()
数据拼接
pd
.merge
(df
, df_info
, how
='left', on
='REG_ID')
pd
.concat
([df
, col_icd
], axis
=1, sort
=False)
在特定的行对应特定的列填值
df
[f
'{ICD}'] = df
[f
'{ICD}'].mask
(df
['REG_ID'] == REG_ID
, 1)
df
[f
'{ICD}'].mask
(df
['REG_ID'] == REG_ID
, 1, , inplace
=True)
分组
b
= df
['B'].groupby
(df
['A'], as_index
=False).mean
()
print(b
)
b
= df
.ix
[:,1].groupby
(df
.ix
[:, 0]).mean
()
print(b
)
m
= df
.groupby
('A')
b
= m
['B'].mean
()
print(b
)
dfFee
= dfFee
.groupby
(by
=['REG_ID'], as_index
=False)['FEE'].sum()
移动列的位置(先删除再插入)
import pandas
as pd
import numpy
as np
df
= pd
.DataFrame
(np
.random
.randn
(3,4),columns
=['a','b','c','d'])
c
= df
.pop
('c')
df
.insert
(4,'c_new',c
)
print(df
)
统计某一列或某一行的缺失值数目
使用isnull()
import pandas
as pd
df
= pd
.read_csv
('123.csv' , encoding
='gbk')
rows_null
= df
.isnull
().sum(axis
=1)
col_null
= df
.isnull
().sum(axis
=0)
all_null
= df
.isnull
().sum().sum()
idx_null
= df
['列名'].isnull
().sum(axis
=0)
统计非空值数量
import pandas
as pd
df
= pd
.read_csv
('123.csv' , encoding
='gbk')
rows_not_null
= df
.count
(axis
=1)
cols_not_null
= df
.count
(axis
=0)
cols_null
= df
.shape
[1] - cols_not_null
col_not_null
= df
['列名'].count
(axis
=0)
备注:部分内容摘录自其它网站,若有侵权,请联系作者删除!