Prostate Cancer Data Analysis

2021-04-15 18:01:59  Views: 235  Source: Internet


import pandas as pd

# prostate.data is tab-separated, which is the default delimiter of read_table
df = pd.read_table(r'C:\Users\HP\Downloads\prostate.data', engine="python", encoding="utf-8")
df
    Unnamed: 0     lcavol   lweight  age       lbph  svi        lcp  gleason  pgg45       lpsa  train
 0           1  -0.579818  2.769459   50  -1.386294    0  -1.386294        6      0  -0.430783      T
 1           2  -0.994252  3.319626   58  -1.386294    0  -1.386294        6      0  -0.162519      T
 2           3  -0.510826  2.691243   74  -1.386294    0  -1.386294        7     20  -0.162519      T
 3           4  -1.203973  3.282789   58  -1.386294    0  -1.386294        6      0  -0.162519      T
 4           5   0.751416  3.432373   62  -1.386294    0  -1.386294        6      0   0.371564      T
 5           6  -1.049822  3.228826   50  -1.386294    0  -1.386294        6      0   0.765468      T
 6           7   0.737164  3.473518   64   0.615186    0  -1.386294        6      0   0.765468      F
 7           8   0.693147  3.539509   58   1.536867    0  -1.386294        6      0   0.854415      T
 8           9  -0.776529  3.539509   47  -1.386294    0  -1.386294        6      0   1.047319      F
 9          10   0.223144  3.244544   63  -1.386294    0  -1.386294        6      0   1.047319      F
10          11   0.254642  3.604138   65  -1.386294    0  -1.386294        6      0   1.266948      T
11          12  -1.347074  3.598681   63   1.266948    0  -1.386294        6      0   1.266948      T
12          13   1.613430  3.022861   63  -1.386294    0  -0.597837        7     30   1.266948      T
13          14   1.477049  2.998229   67  -1.386294    0  -1.386294        7      5   1.348073      T
14          15   1.205971  3.442019   57  -1.386294    0  -0.430783        7      5   1.398717      F
15          16   1.541159  3.061052   66  -1.386294    0  -1.386294        6      0   1.446919      T
16          17  -0.415515  3.516013   70   1.244155    0  -0.597837        7     30   1.470176      T
17          18   2.288486  3.649359   66  -1.386294    0   0.371564        6      0   1.492904      T
18          19  -0.562119  3.267666   41  -1.386294    0  -1.386294        6      0   1.558145      T
19          20   0.182322  3.825375   70   1.658228    0  -1.386294        6      0   1.599388      T
20          21   1.147402  3.419365   59  -1.386294    0  -1.386294        6      0   1.638997      T
21          22   2.059239  3.501043   60   1.474763    0   1.348073        7     20   1.658228      F
22          23  -0.544727  3.375880   59  -0.798508    0  -1.386294        6      0   1.695616      T
23          24   1.781709  3.451574   63   0.438255    0   1.178655        7     60   1.713798      T
24          25   0.385262  3.667400   69   1.599388    0  -1.386294        6      0   1.731656      F
25          26   1.446919  3.124565   68   0.300105    0  -1.386294        6      0   1.766442      F
26          27   0.512824  3.719651   65  -1.386294    0  -0.798508        7     70   1.800058      T
27          28  -0.400478  3.865979   67   1.816452    0  -1.386294        7     20   1.816452      F
28          29   1.040277  3.128951   67   0.223144    0   0.048790        7     80   1.848455      T
29          30   2.409644  3.375880   65  -1.386294    0   1.619388        6      0   1.894617      T
..         ...        ...       ...  ...        ...  ...        ...      ...    ...        ...    ...
67          68   2.198335  4.050915   72   2.307573    0  -0.430783        7     10   2.962692      T
68          69  -0.446287  4.408547   69  -1.386294    0  -1.386294        6      0   2.962692      T
69          70   1.193922  4.780383   72   2.326302    0  -0.798508        7      5   2.972975      T
70          71   1.864080  3.593194   60  -1.386294    1   1.321756        7     60   3.013081      T
71          72   1.160021  3.341093   77   1.749200    0  -1.386294        7     25   3.037354      T
72          73   1.214913  3.825375   69  -1.386294    1   0.223144        7     20   3.056357      F
73          74   1.838961  3.236716   60   0.438255    1   1.178655        9     90   3.075006      F
74          75   2.999226  3.849083   69  -1.386294    1   1.909542        7     20   3.275256      T
75          76   3.141130  3.263849   68  -0.051293    1   2.420368        7     50   3.337547      T
76          77   2.010895  4.433789   72   2.122262    0   0.500775        7     60   3.392829      T
77          78   2.537657  4.354784   78   2.326302    0  -1.386294        7     10   3.435599      T
78          79   2.648300  3.582129   69  -1.386294    1   2.583998        7     70   3.457893      T
79          80   2.779440  3.823192   63  -1.386294    0   0.371564        7     50   3.513037      F
80          81   1.467874  3.070376   66   0.559616    0   0.223144        7     40   3.516013      T
81          82   2.513656  3.473518   57   0.438255    0   2.327278        7     60   3.530763      T
82          83   2.613007  3.888754   77  -0.527633    1   0.559616        7     30   3.565298      T
83          84   2.677591  3.838376   65   1.115142    0   1.749200        9     70   3.570940      F
84          85   1.562346  3.709907   60   1.695616    0   0.810930        7     30   3.587677      T
85          86   3.302849  3.518980   64  -1.386294    1   2.327278        7     60   3.630986      T
86          87   2.024193  3.731699   58   1.638997    0  -1.386294        6      0   3.680091      T
87          88   1.731656  3.369018   62  -1.386294    1   0.300105        7     30   3.712352      T
88          89   2.807594  4.718052   65  -1.386294    1   2.463853        7     60   3.984344      T
89          90   1.562346  3.695110   76   0.936093    1   0.810930        7     75   3.993603      T
90          91   3.246491  4.101817   68  -1.386294    0  -1.386294        6      0   4.029806      T
91          92   2.532903  3.677566   61   1.348073    1  -1.386294        7     15   4.129551      T
92          93   2.830268  3.876396   68  -1.386294    1   1.321756        7     60   4.385147      T
93          94   3.821004  3.896909   44  -1.386294    1   2.169054        7     40   4.684443      T
94          95   2.907447  3.396185   52  -1.386294    1   2.463853        7     10   5.143124      F
95          96   2.882564  3.773910   68   1.558145    1   1.558145        7     80   5.477509      T
96          97   3.471966  3.974998   68   0.438255    1   2.904165        7     20   5.582932      F

97 rows × 11 columns

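The DataFrame has 97 rows and 11 columns. The first column, Unnamed: 0, holds the row number (1 to 97) read from the file.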
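The Unnamed: 0 column appears when the header line of a file starts with an empty field, so pandas gives that column a placeholder name. A minimal sketch of one way to avoid it, assuming the raw file is laid out that way: pass index_col=0 so the row number becomes the index. The sample text re-types the first three rows shown above and stands in for the real file; the names sample and sample_df are only for illustration.

```python
import io
import pandas as pd

# First three rows from the output above, re-typed as tab-separated text.
# The leading tab in the header mimics an unnamed ID column (an assumption
# about the layout of the raw file).
sample = (
    "\tlcavol\tlweight\tage\tlbph\tsvi\tlcp\tgleason\tpgg45\tlpsa\ttrain\n"
    "1\t-0.579818\t2.769459\t50\t-1.386294\t0\t-1.386294\t6\t0\t-0.430783\tT\n"
    "2\t-0.994252\t3.319626\t58\t-1.386294\t0\t-1.386294\t6\t0\t-0.162519\tT\n"
    "3\t-0.510826\t2.691243\t74\t-1.386294\t0\t-1.386294\t7\t20\t-0.162519\tT\n"
)

# index_col=0 uses the first column as the row index, so it no longer
# shows up as a data column named "Unnamed: 0".
sample_df = pd.read_csv(io.StringIO(sample), sep="\t", index_col=0)

print(sample_df.shape)
print(list(sample_df.columns))
```

With this option the frame would report 10 data columns instead of 11, which keeps the row number out of later steps such as the correlation matrix.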

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 11 columns):
Unnamed: 0    97 non-null int64
lcavol        97 non-null float64
lweight       97 non-null float64
age           97 non-null int64
lbph          97 non-null float64
svi           97 non-null int64
lcp           97 non-null float64
gleason       97 non-null int64
pgg45         97 non-null int64
lpsa          97 non-null float64
train         97 non-null object
dtypes: float64(5), int64(5), object(1)
memory usage: 8.4+ KB

The dataset has no missing values: every column reports 97 non-null entries.
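df.info() shows this only indirectly, through the non-null counts. A more direct check is to count missing cells per column with isna(). The sketch below runs on a small made-up frame; the NaN is inserted on purpose for the demonstration and does not come from the prostate data.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical) with one deliberately missing entry.
toy = pd.DataFrame({
    "lcavol": [-0.579818, np.nan, -0.510826],
    "age": [50, 58, 74],
})

# Number of missing cells in each column.
missing_per_column = toy.isna().sum()

# True if at least one cell anywhere in the frame is missing.
has_missing = bool(toy.isna().any().any())

print(missing_per_column)
print(has_missing)
```

Applied to the prostate DataFrame, df.isna().sum() should give 0 for every column if the info() output above is accurate.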

# Encode the train flag numerically: T -> 1, F -> 0
bool_mapping = {'T': 1, 'F': 0}
df['trainbool'] = df['train'].map(bool_mapping)
df
    Unnamed: 0     lcavol   lweight  age       lbph  svi        lcp  gleason  pgg45       lpsa  train  trainbool
 0           1  -0.579818  2.769459   50  -1.386294    0  -1.386294        6      0  -0.430783      T          1
 1           2  -0.994252  3.319626   58  -1.386294    0  -1.386294        6      0  -0.162519      T          1
 2           3  -0.510826  2.691243   74  -1.386294    0  -1.386294        7     20  -0.162519      T          1
 3           4  -1.203973  3.282789   58  -1.386294    0  -1.386294        6      0  -0.162519      T          1
 4           5   0.751416  3.432373   62  -1.386294    0  -1.386294        6      0   0.371564      T          1
 5           6  -1.049822  3.228826   50  -1.386294    0  -1.386294        6      0   0.765468      T          1
 6           7   0.737164  3.473518   64   0.615186    0  -1.386294        6      0   0.765468      F          0
 7           8   0.693147  3.539509   58   1.536867    0  -1.386294        6      0   0.854415      T          1
 8           9  -0.776529  3.539509   47  -1.386294    0  -1.386294        6      0   1.047319      F          0
 9          10   0.223144  3.244544   63  -1.386294    0  -1.386294        6      0   1.047319      F          0
10          11   0.254642  3.604138   65  -1.386294    0  -1.386294        6      0   1.266948      T          1
11          12  -1.347074  3.598681   63   1.266948    0  -1.386294        6      0   1.266948      T          1
12          13   1.613430  3.022861   63  -1.386294    0  -0.597837        7     30   1.266948      T          1
13          14   1.477049  2.998229   67  -1.386294    0  -1.386294        7      5   1.348073      T          1
14          15   1.205971  3.442019   57  -1.386294    0  -0.430783        7      5   1.398717      F          0
15          16   1.541159  3.061052   66  -1.386294    0  -1.386294        6      0   1.446919      T          1
16          17  -0.415515  3.516013   70   1.244155    0  -0.597837        7     30   1.470176      T          1
17          18   2.288486  3.649359   66  -1.386294    0   0.371564        6      0   1.492904      T          1
18          19  -0.562119  3.267666   41  -1.386294    0  -1.386294        6      0   1.558145      T          1
19          20   0.182322  3.825375   70   1.658228    0  -1.386294        6      0   1.599388      T          1
20          21   1.147402  3.419365   59  -1.386294    0  -1.386294        6      0   1.638997      T          1
21          22   2.059239  3.501043   60   1.474763    0   1.348073        7     20   1.658228      F          0
22          23  -0.544727  3.375880   59  -0.798508    0  -1.386294        6      0   1.695616      T          1
23          24   1.781709  3.451574   63   0.438255    0   1.178655        7     60   1.713798      T          1
24          25   0.385262  3.667400   69   1.599388    0  -1.386294        6      0   1.731656      F          0
25          26   1.446919  3.124565   68   0.300105    0  -1.386294        6      0   1.766442      F          0
26          27   0.512824  3.719651   65  -1.386294    0  -0.798508        7     70   1.800058      T          1
27          28  -0.400478  3.865979   67   1.816452    0  -1.386294        7     20   1.816452      F          0
28          29   1.040277  3.128951   67   0.223144    0   0.048790        7     80   1.848455      T          1
29          30   2.409644  3.375880   65  -1.386294    0   1.619388        6      0   1.894617      T          1
..         ...        ...       ...  ...        ...  ...        ...      ...    ...        ...    ...        ...
67          68   2.198335  4.050915   72   2.307573    0  -0.430783        7     10   2.962692      T          1
68          69  -0.446287  4.408547   69  -1.386294    0  -1.386294        6      0   2.962692      T          1
69          70   1.193922  4.780383   72   2.326302    0  -0.798508        7      5   2.972975      T          1
70          71   1.864080  3.593194   60  -1.386294    1   1.321756        7     60   3.013081      T          1
71          72   1.160021  3.341093   77   1.749200    0  -1.386294        7     25   3.037354      T          1
72          73   1.214913  3.825375   69  -1.386294    1   0.223144        7     20   3.056357      F          0
73          74   1.838961  3.236716   60   0.438255    1   1.178655        9     90   3.075006      F          0
74          75   2.999226  3.849083   69  -1.386294    1   1.909542        7     20   3.275256      T          1
75          76   3.141130  3.263849   68  -0.051293    1   2.420368        7     50   3.337547      T          1
76          77   2.010895  4.433789   72   2.122262    0   0.500775        7     60   3.392829      T          1
77          78   2.537657  4.354784   78   2.326302    0  -1.386294        7     10   3.435599      T          1
78          79   2.648300  3.582129   69  -1.386294    1   2.583998        7     70   3.457893      T          1
79          80   2.779440  3.823192   63  -1.386294    0   0.371564        7     50   3.513037      F          0
80          81   1.467874  3.070376   66   0.559616    0   0.223144        7     40   3.516013      T          1
81          82   2.513656  3.473518   57   0.438255    0   2.327278        7     60   3.530763      T          1
82          83   2.613007  3.888754   77  -0.527633    1   0.559616        7     30   3.565298      T          1
83          84   2.677591  3.838376   65   1.115142    0   1.749200        9     70   3.570940      F          0
84          85   1.562346  3.709907   60   1.695616    0   0.810930        7     30   3.587677      T          1
85          86   3.302849  3.518980   64  -1.386294    1   2.327278        7     60   3.630986      T          1
86          87   2.024193  3.731699   58   1.638997    0  -1.386294        6      0   3.680091      T          1
87          88   1.731656  3.369018   62  -1.386294    1   0.300105        7     30   3.712352      T          1
88          89   2.807594  4.718052   65  -1.386294    1   2.463853        7     60   3.984344      T          1
89          90   1.562346  3.695110   76   0.936093    1   0.810930        7     75   3.993603      T          1
90          91   3.246491  4.101817   68  -1.386294    0  -1.386294        6      0   4.029806      T          1
91          92   2.532903  3.677566   61   1.348073    1  -1.386294        7     15   4.129551      T          1
92          93   2.830268  3.876396   68  -1.386294    1   1.321756        7     60   4.385147      T          1
93          94   3.821004  3.896909   44  -1.386294    1   2.169054        7     40   4.684443      T          1
94          95   2.907447  3.396185   52  -1.386294    1   2.463853        7     10   5.143124      F          0
95          96   2.882564  3.773910   68   1.558145    1   1.558145        7     80   5.477509      T          1
96          97   3.471966  3.974998   68   0.438255    1   2.904165        7     20   5.582932      F          0

97 rows × 12 columns

The trainbool column encodes the train flag: T becomes 1 and F becomes 0.
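If the flag is meant to select rows, mapping it to real booleans lets it act directly as a mask. The sketch below assumes that train marks the training split, as in the usual distribution of this dataset; the post itself does not say so. It uses rows 4 to 8 from the output above with only three columns kept, and the names demo, is_train, train_part and test_part are illustrative.

```python
import pandas as pd

# Rows 4-8 from the output above, reduced to three columns.
demo = pd.DataFrame({
    "lcavol": [0.751416, -1.049822, 0.737164, 0.693147, -0.776529],
    "lpsa": [0.371564, 0.765468, 0.765468, 0.854415, 1.047319],
    "train": ["T", "T", "F", "T", "F"],
})

# Map the flag to booleans; astype(bool) guarantees a boolean dtype
# so the column can be negated with ~.
demo["is_train"] = demo["train"].map({"T": True, "F": False}).astype(bool)

# Boolean masks split the frame into the two parts.
train_part = demo[demo["is_train"]]
test_part = demo[~demo["is_train"]]

print(len(train_part), len(test_part))
```

A 1/0 integer column works equally well for correlation or modeling; booleans are simply more convenient for row selection.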

df.head()
    Unnamed: 0     lcavol   lweight  age       lbph  svi        lcp  gleason  pgg45       lpsa  train  trainbool
 0           1  -0.579818  2.769459   50  -1.386294    0  -1.386294        6      0  -0.430783      T          1
 1           2  -0.994252  3.319626   58  -1.386294    0  -1.386294        6      0  -0.162519      T          1
 2           3  -0.510826  2.691243   74  -1.386294    0  -1.386294        7     20  -0.162519      T          1
 3           4  -1.203973  3.282789   58  -1.386294    0  -1.386294        6      0  -0.162519      T          1
 4           5   0.751416  3.432373   62  -1.386294    0  -1.386294        6      0   0.371564      T          1
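
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing  # imported in the original post; not used in this section

fig = plt.figure(figsize=(20, 12))
# Correlate numeric columns only; the string column train cannot be correlated
corr = df.select_dtypes(include='number').corr()
sns.heatmap(corr, annot=True)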
<matplotlib.axes._subplots.AxesSubplot at 0x20d4bddc208>

[Figure: annotated correlation heatmap of the DataFrame columns; image not preserved]
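A heatmap of this size can be hard to read at a glance. One possible follow-up, sketched here under the assumption that lpsa is the quantity of interest, is to pull the lpsa column out of the correlation matrix and sort it. The frame below holds only the last five rows shown above (rows 92 to 96) with four numeric columns, so the resulting numbers illustrate the mechanics and say nothing about the full dataset.

```python
import pandas as pd

# Rows 92-96 from the output above, reduced to a few columns.
tail = pd.DataFrame({
    "lcavol": [2.830268, 3.821004, 2.907447, 2.882564, 3.471966],
    "lweight": [3.876396, 3.896909, 3.396185, 3.773910, 3.974998],
    "age": [68, 44, 52, 68, 68],
    "lpsa": [4.385147, 4.684443, 5.143124, 5.477509, 5.582932],
    "train": ["T", "T", "F", "T", "F"],
})

# Keep numeric columns only; the string column train is dropped.
corr_small = tail.select_dtypes(include="number").corr()

# Correlation of every other numeric column with lpsa, strongest first.
ranked = corr_small["lpsa"].drop("lpsa").sort_values(ascending=False)

print(ranked)
```

The same two lines applied to the full corr matrix would list the columns in order of their linear association with lpsa.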


Source: https://blog.csdn.net/showmas/article/details/115732687
