
Tuning sklearn models with Bayesian optimization

Published: 2021-07-02




Bayesian optimization GitHub repository: https://github.com/fmfn/BayesianOptimization

Paper: http://papers.nips.cc/paper/4522-practical-bayesian%20-optimization-of-machine-learning-algorithms.pdf
Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. "Practical Bayesian Optimization of Machine Learning Algorithms." Advances in Neural Information Processing Systems 25 (2012).
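Before running the examples below, the library needs to be installed; the distribution name on PyPI (per the project README) is bayesian-optimization:

```shell
pip install bayesian-optimization
```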

The walkthrough below uses a random forest as the example model.

1. Build a data source

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from bayes_opt import BayesianOptimization
import numpy as np
import pandas as pd

Then construct a binary classification task:

x, y = make_classification(n_samples=1000, n_features=5, n_classes=2)
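A quick sanity check on the generated data (a random_state is added here for reproducibility; the original call omits it):

```python
from sklearn.datasets import make_classification

# 1000 samples, 5 features, binary labels
x, y = make_classification(n_samples=1000, n_features=5, n_classes=2,
                           random_state=0)
print(x.shape)         # (1000, 5)
print(sorted(set(y)))  # [0, 1]
```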

2. Build the black-box objective function

def rf_cv(n_estimators, min_samples_split, max_features, max_depth):
    val = cross_val_score(
        # These are the random forest's hyperparameters; the optimizer proposes
        # floats, so integer-valued ones are cast with int()
        RandomForestClassifier(n_estimators=int(n_estimators),
                               min_samples_split=int(min_samples_split),
                               max_features=min(max_features, 0.999),  # float
                               max_depth=int(max_depth),
                               random_state=2),
        x, y, scoring='f1', cv=5  # cross_val_score accepts a single scorer
    ).mean()
    return val

For more scoring metrics, see: https://scikit-learn.org/stable/modules/model_evaluation.html#the-scoring-parameter-defining-model-evaluation-rules
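Note that cross_val_score accepts only a single scorer; to track several metrics at once, sklearn's cross_validate can be used instead. A minimal sketch (with a smaller dataset and forest than the article's, to keep it fast):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

x, y = make_classification(n_samples=200, n_features=5, n_classes=2,
                           random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=2)

# cross_validate returns a dict with one 'test_<metric>' array per scorer
scores = cross_validate(clf, x, y, scoring=['f1', 'accuracy'], cv=3)
print(scores['test_f1'].mean(), scores['test_accuracy'].mean())
```

A single-number objective is still needed for the optimizer, so one metric (or a combination) must be chosen as the return value.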

3. Define the search space

pbounds = {'n_estimators': (10, 250),  # i.e. values range from 10 to 250
           'min_samples_split': (2, 25),
           'max_features': (0.1, 0.999),
           'max_depth': (5, 15)}

The keys of this dictionary must match the parameter names of the objective function.
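That correspondence can be verified programmatically with the standard-library inspect module — a small check, using a stub objective with the same signature:

```python
import inspect

def rf_cv(n_estimators, min_samples_split, max_features, max_depth):
    return 0.0  # stub: only the signature matters here

pbounds = {'n_estimators': (10, 250),
           'min_samples_split': (2, 25),
           'max_features': (0.1, 0.999),
           'max_depth': (5, 15)}

# BayesianOptimization passes the pbounds keys to f as keyword arguments,
# so the two name sets must match exactly
assert set(pbounds) == set(inspect.signature(rf_cv).parameters)
print("pbounds keys match rf_cv parameters")
```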

4. Build the Bayesian optimizer

optimizer = BayesianOptimization(
    f=rf_cv,  # black-box objective function
    pbounds=pbounds,  # search space
    verbose=2,  # 2: print every step; 1: print only new maxima; 0: print nothing
    random_state=1,
)

5. Run, then export the results and the best parameters

optimizer.maximize(  # run the optimization
    init_points=5,  # number of initial random-exploration steps
    n_iter=25,  # number of Bayesian optimization iterations
)
print(optimizer.res)  # all evaluated results
print(optimizer.max)  # best result and its parameters

Full code

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from bayes_opt import BayesianOptimization
import numpy as np
import pandas as pd

# Generate a random classification dataset: 5 features, 2 classes
x, y = make_classification(n_samples=1000, n_features=5, n_classes=2)


# Step 1: build the black-box objective function
def rf_cv(n_estimators, min_samples_split, max_features, max_depth):
    val = cross_val_score(
        RandomForestClassifier(n_estimators=int(n_estimators),
                               min_samples_split=int(min_samples_split),
                               max_features=min(max_features, 0.999),  # float
                               max_depth=int(max_depth),
                               random_state=2),
        x, y, scoring='f1', cv=5
    ).mean()
    return val


# Step 2: define the search space
pbounds = {'n_estimators': (10, 250),  # i.e. values range from 10 to 250
           'min_samples_split': (2, 25),
           'max_features': (0.1, 0.999),
           'max_depth': (5, 15)}

# Step 3: build the Bayesian optimizer
optimizer = BayesianOptimization(
    f=rf_cv,  # black-box objective function
    pbounds=pbounds,  # search space
    verbose=2,  # 2: print every step; 1: print only new maxima; 0: print nothing
    random_state=1,
)
optimizer.maximize(  # run the optimization
    init_points=5,  # number of initial random-exploration steps
    n_iter=25,  # number of Bayesian optimization iterations
)
print(optimizer.res)  # print all evaluated results
print(optimizer.max)  # best result and its parameters

The results look like this (the first 5 iterations are the random init points):

iter  target  max_depth  max_features  min_samples_split  n_estimators
1     0.9521  9.17       0.7476        2.003              82.56
2     0.9475  6.468      0.183         6.284              92.93
3     0.9502  8.968      0.5844        11.64              174.5
4     0.952   7.045      0.8894        2.631              70.9
5     0.9521  9.173      0.6023        5.229              57.54
6     0.9522  8.304      0.6073        5.086              57.32
7     0.9511  10.59      0.7231        2.47               74.19
8     0.9466  6.611      0.2431        3.667              49.53
9     0.9492  6.182      0.8803        4.411              62.05
10    0.9514  7.735      0.1164        4.576              79.58
11    0.9531  12.72      0.4108        4.389              81.27
12    0.9513  14.28      0.7338        3.12               84.51
13    0.9501  14.8       0.8398        6.767              77.78
14    0.9512  12.65      0.2956        2.376              79.54
15    0.9523  12.04      0.1053        6.513              82.47
16    0.9501  11.79      0.6655        2.211              68.6
17    0.9533  8.374      0.422         9.813              56.87
18    0.9523  11.81      0.8737        11.05              56.84
19    0.9523  8.27       0.6367        13.32              57.61
20    0.9514  8.126      0.4081        11.01              53.97
21    0.9495  9.323      0.1           10.2               60.14
22    0.9546  8.76       0.1512        7.381              55.76
23    0.9505  10.76      0.1433        7.155              55.15
24    0.9555  7.206      0.4456        6.973              55.74
25    0.9543  5.359      0.9809        7.835              55.49
26    0.9554  7.083      0.4153        8.075              55.05
27    0.9554  6.963      0.5163        8.687              56.26
28    0.9543  14.52      0.7094        16.4               56.91
29    0.9515  12.07      0.7272        19.06              56.5
30    0.9512  14.3       0.524         14.43              59.32

The best parameters are:

{'target': 0.9554574460534715,
 'params': {
     'max_depth': 7.2061957920136965,
     'max_features': 0.44564993926538743,
     'min_samples_split': 6.972807143834928,
     'n_estimators': 55.73671041246315
 }}
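Note that the optimizer reports floats even for integer-valued hyperparameters, so they must be mapped back with the same int()/min() casts used inside rf_cv before fitting a final model. A sketch using the values above:

```python
best = {'max_depth': 7.2061957920136965,
        'max_features': 0.44564993926538743,
        'min_samples_split': 6.972807143834928,
        'n_estimators': 55.73671041246315}

# Apply the same casts the objective function used
final_params = {'n_estimators': int(best['n_estimators']),
                'min_samples_split': int(best['min_samples_split']),
                'max_features': min(best['max_features'], 0.999),
                'max_depth': int(best['max_depth'])}
print(final_params)  # n_estimators=55, min_samples_split=6, max_depth=7
```

A final RandomForestClassifier(**final_params, random_state=2) can then be fit on the full training set.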

Source: https://blog.csdn.net/weixin_35757704/article/details/118416689
