Cross selling is the practice of offering an existing customer a product that is similar or complementary to one they have already purchased. Acquiring a new customer is harder than retaining an existing one, which makes the customer relationship a very important aspect of any business.
When done right, cross selling can be an effective way to strengthen the relationship with the customer while also boosting the revenue of the business.
Planning a cross selling strategy
While planning the cross selling process, it is important to know which customers might be interested, and which uninterested, in a product. This helps the company focus its marketing and communication efforts on the customers most likely to respond.
We will therefore use data on our client's past health insurance policy holders to build models that classify a customer as "Interested" or "Not interested" in the vehicle insurance, so that the company can plan its marketing and communication strategy accordingly.
View the complete notebook HERE
The aim is to predict whether a health insurance policy holder will be “Interested” or “Not interested” in the company’s vehicle insurance.
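All the code below works with a feature matrix X_new and target vector y_new. As a minimal sketch of how these might be prepared, assuming the data sits in a file named train.csv with a binary Response column as the target (the file and column names are assumptions, and categorical features are assumed to have been encoded in an earlier step):
import pandas as pd
# load the historical policy holder data (file name is an assumption)
df = pd.read_csv('train.csv')
# split into features and target; 'Response' is the assumed target column,
# with 1 = Interested and 0 = Not interested
X_new = df.drop(columns = ['Response'])
y_new = df['Response']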
# train test splitting
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_new, y_new, test_size = 0.3, random_state = 23)
# standardising the data (the scaler is fitted on the training set only, to avoid leaking test data)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
# hyperparameter tuning and cross validation
# the 'saga' solver supports all four penalty options; l1_ratio applies only to
# 'elasticnet', so the grid is split into two parameter dictionaries
parameters = [{"penalty":['l1', 'l2', 'none'], "max_iter":[100, 200, 300]},
              {"penalty":['elasticnet'], "l1_ratio":[0.5], "max_iter":[100, 200, 300]}]
clf = GridSearchCV(LogisticRegression(solver = 'saga'), param_grid = parameters, scoring = 'accuracy', cv = 3)
# fitting the model
clf.fit(X_train, y_train)
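# inspect which hyperparameter combination the search selected
# (an added illustration; best_params_ is a standard GridSearchCV attribute)
print(clf.best_params_)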
# class prediction on training and testing datasets
y_pred_lr = clf.predict(X_test)
y_train_pred_lr = clf.predict(X_train)
# probability prediction on training and testing datasets (only using probabilities of positive class)
y_prob_lr = clf.predict_proba(X_test)[:,1]
y_train_prob_lr = clf.predict_proba(X_train)[:,1]
# performance evaluation
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred_lr))
print(classification_report(y_train, y_train_pred_lr))
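The classification report covers accuracy, precision, recall and F1-score, but not the ROC AUC quoted below. A short sketch of how it can be computed from the positive class probabilities extracted above:
from sklearn.metrics import roc_auc_score
print('Test AUC :', roc_auc_score(y_test, y_prob_lr))
print('Train AUC:', roc_auc_score(y_train, y_train_prob_lr))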
Performance metrics for the testing dataset
Accuracy: 0.81
Precision: 0.76
Recall: 0.89
F1-Score: 0.82
Area Under the ROC Curve: 0.88
from sklearn.tree import DecisionTreeClassifier
dtmodel = DecisionTreeClassifier(criterion = 'entropy', random_state = 32)
# hyperparameter tuning and cross validation
parameters = {'max_depth':[7,9,11], 'splitter':['best','random'], 'min_samples_split':[2,4]}
decisiontree = GridSearchCV(dtmodel, param_grid = parameters, scoring = 'accuracy', cv = 3)
# fitting the model
decisiontree.fit(X_train, y_train)
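The tuned tree is evaluated the same way as the logistic regression model. A sketch of that step (variable names such as y_pred_dt are illustrative, not from the original notebook):
# class and probability predictions for the tuned decision tree
y_pred_dt = decisiontree.predict(X_test)
y_prob_dt = decisiontree.predict_proba(X_test)[:, 1]
print(classification_report(y_test, y_pred_dt))
print('Test AUC:', roc_auc_score(y_test, y_prob_dt))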
Performance metrics for the testing dataset
Accuracy: 0.84
Precision: 0.79
Recall: 0.91
F1-Score: 0.85
Area Under the ROC Curve: 0.91
from sklearn.ensemble import RandomForestClassifier
randomforest = RandomForestClassifier(random_state = 71, max_depth = 11, min_samples_split = 4,
                                      n_jobs = -1, criterion = 'entropy', n_estimators = 100)
# fitting the model
randomforest.fit(X_train, y_train)
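Besides its headline scores, a random forest also exposes per-feature importances, which hint at which customer attributes drive the "Interested" prediction. A minimal sketch, assuming X_new is a pandas DataFrame whose column names are available:
import pandas as pd
# impurity-based importance of each feature, averaged across the trees
importances = pd.Series(randomforest.feature_importances_, index = X_new.columns)
print(importances.sort_values(ascending = False))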
Performance metrics for the testing dataset
Accuracy: 0.83
Precision: 0.78
Recall: 0.92
F1-Score: 0.85
Area Under the ROC Curve: 0.91
from xgboost import XGBClassifier
xgb = XGBClassifier(seed = 12, use_label_encoder = False, objective = 'binary:logistic',
subsample = 0.9, colsample_bytree = 0.5,
max_depth = 7, learning_rate = 0.5, gamma = 0.25, reg_lambda = 1)
# fitting the model
xgb.fit(X_train, y_train)
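As with the earlier models, the fitted booster is scored on the held-out test set. The sketch below computes the same metrics reported underneath (variable names such as y_pred_xgb are illustrative):
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
y_pred_xgb = xgb.predict(X_test)
y_prob_xgb = xgb.predict_proba(X_test)[:, 1]
print('Accuracy :', accuracy_score(y_test, y_pred_xgb))
print('Precision:', precision_score(y_test, y_pred_xgb))
print('Recall   :', recall_score(y_test, y_pred_xgb))
print('F1-Score :', f1_score(y_test, y_pred_xgb))
print('AUC      :', roc_auc_score(y_test, y_prob_xgb))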
Performance metrics for the testing dataset
Accuracy: 0.89
Precision: 0.9
Recall: 0.88
F1-Score: 0.89
Area Under the ROC Curve: 0.97
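Putting the four models side by side, XGBoost gives the best test-set performance. A small summary sketch using the AUC values reported above makes the comparison explicit:
import pandas as pd
auc_scores = pd.Series({'Logistic Regression': 0.88, 'Decision Tree': 0.91,
                        'Random Forest': 0.91, 'XGBoost': 0.97})
print(auc_scores.sort_values(ascending = False))
With the highest AUC and the most balanced precision and recall, XGBoost is the natural choice for scoring the customer base.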