1. Structured Data Analysis - Ensemble: Boosting

코딩베어 | 2023. 5. 16. 16:21

2. Boosting

Boosting is a method that combines models with weak predictive power (weak learners) into a single strong predictive model.
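
One common way to write this combination, offered as a sketch rather than the definition used by any particular library: the final model is a weighted sum of M weak learners, each fitted in turn to reduce the errors of the model built so far. The symbols F_M, M, \alpha_m, and h_m are notation assumed here and do not come from the post.

```latex
F_M(x) = \sum_{m=1}^{M} \alpha_m \, h_m(x)
```

Here h_m is the m-th weak learner (for example, a shallow decision tree) and \alpha_m is the weight it receives in the final prediction.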

Algorithms that use boosting include AdaBoost, GBM, LightGBM, XGBoost (eXtreme Gradient Boosting), and CatBoost.

XGBoost : an algorithm implemented to support parallel processing, which makes training and classification fast

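xgb.train(params, data, nrounds, early_stopping_rounds, watchlist) : nrounds - the maximum number of boosting iterations; early_stopping_rounds - training stops if performance has not improved for this many rounds; watchlist - a named list of xgb.DMatrix objects used to evaluate model performance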

* xgb.DMatrix(data, info) : info - a list of information about data to be stored in the xgb.DMatrix object (such as the label)

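library(xgboost)

# Assumes train and test are data frames whose 9th column is the factor diabetes ("neg"/"pos")
# Convert the factor label to 0/1 (first level = 0, second level = 1)
train.label = as.integer(train$diabetes) - 1
# xgboost needs numeric matrices, so drop the label column (column 9)
mat_train.data = as.matrix(train[, -9])
mat_test.data = as.matrix(test[, -9])

# Named dtrain/dtest so the objects do not shadow the xgb.train() function
dtrain = xgb.DMatrix(data = mat_train.data,
                     label = train.label)
dtest = xgb.DMatrix(data = mat_test.data)

param_list = list(booster = "gbtree",
                  eta = 0.001,
                  max_depth = 10,
                  gamma = 5,
                  subsample = 0.8,
                  colsample_bytree = 0.8,
                  objective = "binary:logistic",
                  eval_metric = "auc")

# Build the xgboost model
# Note: the watchlist holds only the training data, so early stopping monitors training AUC
md.xgb = xgb.train(params = param_list,
                   data = dtrain,
                   nrounds = 200,
                   early_stopping_rounds = 10,
                   watchlist = list(val1 = dtrain),
                   verbose = 1)

# Predicted probabilities of the positive class, thresholded at 0.5
xgb.pred = predict(md.xgb,
                   newdata = dtest)
xgb.pred2 = ifelse(xgb.pred >= 0.5, "pos", "neg")
# Use the same factor levels as the reference so confusionMatrix can compare them
xgb.pred2 = factor(xgb.pred2, levels = levels(test$diabetes))

library(caret)
confusionMatrix(xgb.pred2,
                reference = test$diabetes,
                positive = "pos")
# Accuracy is 0.7308, which is moderate
# The p-value is 0.07931, which exceeds 0.05
# The kappa statistic is 0.3658, which indicates fair agreement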
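
To make the accuracy and kappa figures easier to interpret, here is a minimal sketch of how both can be derived from a 2x2 confusion matrix using base R only. The counts and the names cm, p_observed, p_expected, and kappa_stat are illustrative assumptions; they are not the results of the model above.

```r
# Hypothetical 2x2 confusion matrix (rows = predicted, columns = reference).
# The counts are made up for illustration; they are NOT the post's results.
cm = matrix(c(40, 10,
               5, 45),
            nrow = 2, byrow = TRUE,
            dimnames = list(Prediction = c("neg", "pos"),
                            Reference  = c("neg", "pos")))

n = sum(cm)

# Observed agreement: share of cases on the diagonal (this is the accuracy).
p_observed = sum(diag(cm)) / n

# Agreement expected by chance, computed from the row and column totals.
p_expected = sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's kappa: agreement beyond chance, relative to the maximum possible.
kappa_stat = (p_observed - p_expected) / (1 - p_expected)

print(c(accuracy = p_observed, kappa = kappa_stat))
```

With these made-up counts the accuracy is 0.85 and kappa is 0.7. On the commonly used Landis and Koch scale, a kappa between 0.21 and 0.40 is labelled "fair", which is where the 0.3658 reported above falls.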

License: Attribution, NonCommercial, NoDerivatives
