Health Insurance Cross Sell Prediction

Cross selling is the practice of offering an existing customer a product that is similar to or compatible with one they have already purchased. Acquiring a new customer is generally harder than retaining an existing one, which makes the customer relationship an important concern for any business.

Cross selling can strengthen the relationship with the customer while also boosting the revenue of the business. Done well, it can:

  1. improve the customer’s experience with the existing product, or
  2. solve some new problems for them.

Planning cross selling strategy

When planning the cross selling process, it is important to know which customers are likely to be interested or uninterested in a product. This helps to:

  1. avoid any potential negative effects on the customer relationship due to the cross selling advertisement.
  2. make efficient use of the communication and marketing efforts.

We will therefore use our client's data on past health insurance policy holders to build models that classify a customer as “Interested” or “Not interested” in vehicle insurance, so that the company can plan its marketing and communication strategy accordingly.

Overview

View the complete notebook HERE

Objective

The aim is to predict whether a health insurance policy holder will be “Interested” or “Not interested” in the company’s vehicle insurance.

Data Preparation

RESPONSE VARIABLE

Exploratory Data Analysis

AGE VEHICLE AGE

Models

Logistic Regression Model

# train test splitting (70% training, 30% testing)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_new, y_new, test_size = 0.3, random_state = 23)

# standardising the data
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

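# hyperparameter tuning and cross-validation
# note: the default 'lbfgs' solver supports only 'l2' and no penalty, so the
# 'l1' and 'elasticnet' candidates fail to fit and are scored as NaN
parameters = {"penalty":['l1', 'l2', 'elasticnet', 'none'],"max_iter":[100,200,300]}
clf = GridSearchCV(LogisticRegression(), param_grid = parameters, scoring = 'accuracy', cv = 3)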
# fitting the model
clf.fit(X_train, y_train)

# class prediction on training and testing datasets
y_pred_lr = clf.predict(X_test)
y_train_pred_lr = clf.predict(X_train)

# probability prediction on training and testing datasets (only using probabilities of positive class)
y_prob_lr = clf.predict_proba(X_test)[:,1]
y_train_prob_lr = clf.predict_proba(X_train)[:,1]

# performance evaluation (testing report first, then training report)
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred_lr))
print(classification_report(y_train, y_train_pred_lr))
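In the code above the scaler is fitted on the full training set before cross-validation, so each validation fold has already influenced the scaling. One common alternative is to wrap the scaler and the classifier in a Pipeline, so that GridSearchCV refits the scaler on each training fold. The sketch below is only an illustration: it assumes synthetic data from make_classification in place of the insurance data, and the step names "scaler" and "lr" and the values tried for C are illustrative choices, not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# synthetic stand-in data (assumption): not the insurance dataset
X, y = make_classification(n_samples=600, n_features=10, random_state=23)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)

# scaler and model in one estimator, so scaling is refitted inside every CV fold
pipe = Pipeline([("scaler", StandardScaler()), ("lr", LogisticRegression())])

# parameters of a pipeline step are addressed as <step name>__<parameter>
parameters = {"lr__C": [0.1, 1.0, 10.0], "lr__max_iter": [100, 200, 300]}
search = GridSearchCV(pipe, param_grid=parameters, scoring="accuracy", cv=3)
search.fit(X_train, y_train)

# the fitted search applies scaling and prediction in one call
y_pred = search.predict(X_test)
print(search.best_params_)
```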

Performance metrics for Testing dataset
Accuracy : 0.81
Precision: 0.76
Recall: 0.89
F1-Score: 0.82
Area Under the ROC Curve: 0.88
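The five metrics listed above are all available in sklearn.metrics. The following is a minimal sketch of how such a summary could be produced, assuming synthetic data from make_classification as a stand-in for the insurance data. The helper name summarise_performance is hypothetical and does not come from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split


def summarise_performance(y_true, y_pred, y_prob):
    """Hypothetical helper: return the five reported metrics as a dict."""
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "F1-Score": f1_score(y_true, y_pred),
        # AUC is computed from positive-class probabilities, not class labels
        "Area Under the ROC Curve": roc_auc_score(y_true, y_prob),
    }


# synthetic stand-in data (assumption): not the insurance dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=23)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)

model = LogisticRegression(max_iter=200).fit(X_train, y_train)
metrics = summarise_performance(
    y_test, model.predict(X_test), model.predict_proba(X_test)[:, 1]
)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```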

Receiver operating characteristic (ROC) Curve

LOGISTIC REGRESSION ROC
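A ROC curve plots the true positive rate against the false positive rate as the decision threshold on the predicted probability is varied. As a rough sketch, the points of such a curve can be obtained with roc_curve; the data here is synthetic (an assumption) and the plotting call is left as a comment.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# synthetic stand-in data (assumption): not the insurance dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=23)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)

model = LogisticRegression(max_iter=200).fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# one (fpr, tpr) point per threshold, running from (0, 0) to (1, 1)
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)  # area under the curve by the trapezoidal rule
print(f"Area Under the ROC Curve: {roc_auc:.2f}")

# to draw the curve, e.g. with matplotlib: plt.plot(fpr, tpr)
```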

Confusion Matrix

LOGISTIC REGRESSION CM
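The confusion matrix counts how many customers of each true class were assigned to each predicted class, and precision and recall follow directly from its cells. The toy labels below are made up purely for illustration and are not model output from the notebook.

```python
from sklearn.metrics import confusion_matrix

# made-up toy labels (assumption): 1 = "Interested", 0 = "Not interested"
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# rows are true classes, columns are predicted classes, in label order [0, 1]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # share of predicted "Interested" that are correct
recall = tp / (tp + fn)     # share of truly "Interested" that are found
print(cm)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```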

Decision Tree Classifier

from sklearn.tree import DecisionTreeClassifier

dtmodel = DecisionTreeClassifier(criterion = 'entropy', random_state = 32)
# hyperparameter tuning and cross validation
parameters = {'max_depth':[7,9,11], 'splitter':['best','random'], 'min_samples_split':[2,4]}
decisiontree = GridSearchCV(dtmodel, param_grid = parameters, scoring = 'accuracy', cv = 3)
# fitting the model
decisiontree.fit(X_train, y_train)
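After a grid search has been fitted, the selected hyperparameters and the mean cross-validated score can be read from the search object. A minimal sketch, using the same parameter grid as above but synthetic data (an assumption) in place of the insurance data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in data (assumption): not the insurance dataset
X, y = make_classification(n_samples=600, n_features=10, random_state=32)

parameters = {'max_depth': [7, 9, 11], 'splitter': ['best', 'random'], 'min_samples_split': [2, 4]}
search = GridSearchCV(
    DecisionTreeClassifier(criterion='entropy', random_state=32),
    param_grid=parameters, scoring='accuracy', cv=3,
)
search.fit(X, y)

# 3 x 2 x 2 = 12 candidate settings are evaluated, each with 3-fold CV
print(len(search.cv_results_['params']))
print(search.best_params_)           # the combination with the best mean CV accuracy
print(round(search.best_score_, 3))  # that mean CV accuracy

# with the default refit=True, the best setting is refitted on all the data passed to fit
best_tree = search.best_estimator_
```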

Performance metrics for Testing dataset
Accuracy : 0.84
Precision: 0.79
Recall: 0.91
F1-Score: 0.8
Area Under the ROC Curve: 0.91

Receiver operating characteristic (ROC) Curve

DECISION TREE ROC

Confusion Matrix

DECISION TREE CM

Random Forest Classifier

from sklearn.ensemble import RandomForestClassifier

randomforest = RandomForestClassifier(random_state = 71, max_depth = 11, min_samples_split = 4, n_jobs = -1, criterion = 'entropy', n_estimators = 100)
# fitting the model
randomforest.fit(X_train, y_train)

Performance metrics for Testing dataset
Accuracy : 0.83
Precision: 0.78
Recall: 0.92
F1-Score: 0.85
Area Under the ROC Curve: 0.91

Receiver operating characteristic (ROC) Curve

RANDOM FOREST ROC

Confusion Matrix

RANDOM FOREST CM

Feature Importance

RANDOM FOREST FEATURES
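A fitted random forest exposes impurity-based importances through its feature_importances_ attribute, which is one way a ranking like the one above can be obtained. The sketch below assumes synthetic data and placeholder feature names (feature_0, feature_1, ...), since the actual column names are not shown here.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in data (assumption): not the insurance dataset
X, y = make_classification(n_samples=600, n_features=6, n_informative=3, random_state=71)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

forest = RandomForestClassifier(random_state=71, max_depth=11, min_samples_split=4,
                                criterion='entropy', n_estimators=100)
forest.fit(X, y)

# impurity-based importances: one value per feature, summing to 1
importances = pd.Series(forest.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```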

XGBoost Classifier

from xgboost import XGBClassifier
xgb = XGBClassifier(seed = 12, use_label_encoder = False, objective = 'binary:logistic',
                    subsample = 0.9, colsample_bytree = 0.5,
                    max_depth = 7, learning_rate = 0.5, gamma = 0.25, reg_lambda = 1)
# fitting the model
xgb.fit(X_train, y_train)

Performance metrics for Testing dataset
Accuracy : 0.89
Precision: 0.9
Recall: 0.88
F1-Score: 0.89
Area Under the ROC Curve: 0.97

Receiver operating characteristic (ROC) Curve

XGBOOST ROC

Confusion Matrix

XGBOOST CM

Feature Importance

XGBOOST FEATURES

Model comparison

MODEL COMPARISON
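For a side-by-side view, the test-set metrics reported in the sections above can be collected into a single table. This is only a sketch of one way to tabulate them with pandas; the numbers are copied as reported and the column label "AUC" is shorthand for Area Under the ROC Curve.

```python
import pandas as pd

# test-set metrics copied from the model sections above
results = pd.DataFrame(
    {
        "Accuracy": [0.81, 0.84, 0.83, 0.89],
        "Precision": [0.76, 0.79, 0.78, 0.90],
        "Recall": [0.89, 0.91, 0.92, 0.88],
        "F1-Score": [0.82, 0.80, 0.85, 0.89],
        "AUC": [0.88, 0.91, 0.91, 0.97],
    },
    index=["Logistic Regression", "Decision Tree", "Random Forest", "XGBoost"],
)

# rank the models by area under the ROC curve, best first
ranked = results.sort_values("AUC", ascending=False)
print(ranked)

# which model scores highest on each metric
best_per_metric = results.idxmax()
print(best_per_metric)
```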

Conclusions

EDA

Models