Rental Bike Demand Prediction

Predicting the demand for rental bikes from historical data with features such as temperature, humidity, and season.

For a bike rental system to function smoothly, it must provide a stable supply of bikes that matches demand at any given time. This requires a good prediction of bike demand at each hour. I am working with a dataset of bike rental counts in Seoul, South Korea, which contains historical date and weather information (temperature, humidity, wind speed, visibility, dew point, solar radiation, snowfall, rainfall).

Overview

View the complete notebook HERE

Data Source

The dataset was obtained from the UCI Machine Learning Repository.

The relevant papers listed on the UCI Machine Learning Repository page are [1] and [2].

Objective

The aim is to predict the demand for rental bikes at any given hour using the weather and date information provided in the dataset.

Data Preparation

Exploratory Data Analysis

[Figures: rented bike count vs. month, vs. hour, vs. snowfall, and vs. rainfall]

[Figure: correlation heatmap]

Model Fitting

Linear Regression Model

import math  # needed for math.sqrt in the RMSE computation below

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

# splitting data into training and testing set
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 4)

# scaling the data
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# fitting the model
linear_regressor = LinearRegression()
linear_regressor.fit(X_train, y_train)

# prediction using the model
y_pred = linear_regressor.predict(X_test)
y_train_pred = linear_regressor.predict(X_train)

# performance metrics for the testing data (RMSE and R2 score)
print('Performance metrics for Testing dataset')
print('RMSE:', math.sqrt(mean_squared_error(y_test, y_pred)))
print('R2 score:', r2_score(y_test, y_pred))

# performance metrics for the training data (RMSE and R2 score)
print('Performance metrics for Training dataset')
print('RMSE:', math.sqrt(mean_squared_error(y_train, y_train_pred)))
print('R2 score:', r2_score(y_train, y_train_pred))

Performance metrics for Testing dataset
RMSE: 454.3735647954152
R2 score: 0.5117558744340127

Performance metrics for Training dataset
RMSE: 436.9096921808084
R2 score: 0.534370487444807
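
For readers unfamiliar with the two metrics reported above, here is a minimal pure-Python sketch of how RMSE and the R2 score are defined. The helper names `rmse` and `r2` and the toy numbers are illustrative assumptions and are not part of the original notebook, which relies on `mean_squared_error` and `r2_score` from scikit-learn.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(actual, predicted):
    """R2 score: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Toy hourly rental counts (made-up values, for illustration only)
actual = [100, 200, 300, 400]
predicted = [110, 190, 320, 380]

print('RMSE:', rmse(actual, predicted))
print('R2 score:', r2(actual, predicted))
```

RMSE is expressed in the units of the target, here rented bikes per hour, while the R2 score is unitless. Read this way, a test R2 score of about 0.51 means the linear model accounts for roughly half of the variance in hourly demand.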

Comparison between actual and predicted values

The first 50 actual and predicted values are compared visually.

[Figure: linear regression error plot]

Decision Tree Regression Model

from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

# GridSearchCV for hyperparameter tuning
decision_tree_reg = DecisionTreeRegressor()
grid_parameters = {"max_depth": [3, 5, 7], "max_leaf_nodes": [None, 50, 60, 70, 80, 90], "min_samples_leaf":[7,8,9,10]}
regressor_model = GridSearchCV(decision_tree_reg, param_grid = grid_parameters, scoring = 'neg_mean_squared_error', cv = 5)

# fitting the model
# (X_train2 and y_train2 are a training split that is not defined in the code shown on this page)
regressor_model.fit(X_train2, y_train2)
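
To give a sense of how much work the grid search above performs, the following sketch enumerates the same parameter grid with the standard library only. It is an illustration of the search space, not a reimplementation of `GridSearchCV`; the variable names `candidates`, `n_candidates`, and `n_cv_fits` are assumptions introduced here.

```python
from itertools import product

# Same grid and fold count as in the GridSearchCV call above
grid_parameters = {
    "max_depth": [3, 5, 7],
    "max_leaf_nodes": [None, 50, 60, 70, 80, 90],
    "min_samples_leaf": [7, 8, 9, 10],
}
cv_folds = 5

# Cartesian product of all parameter values: one dict per candidate model
names = list(grid_parameters)
candidates = [
    dict(zip(names, values))
    for values in product(*grid_parameters.values())
]

n_candidates = len(candidates)        # 3 * 6 * 4 combinations
n_cv_fits = n_candidates * cv_folds   # one fit per candidate per fold

print('candidates:', n_candidates)
print('cross-validation fits:', n_cv_fits)
print('first candidate:', candidates[0])
```

The grid contains 72 candidates, so 5-fold cross-validation trains 360 trees; with the default `refit=True`, `GridSearchCV` then fits the best candidate once more on the full training data. Because the scoring is `'neg_mean_squared_error'`, scores are negated MSE values, so the candidate with the highest score is the one with the lowest mean squared error.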

Performance metrics for Testing dataset
r2 score: 0.7805191405790043
RMSE: 304.0104231418782

Performance metrics for Training dataset
r2 score: 0.8328601459379475
RMSE: 261.2260801680112
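
The reported numbers for the two models can be put side by side with a little arithmetic. The sketch below only reuses the RMSE values printed on this page; it assumes both models were evaluated on comparable test data, which the page does not state explicitly since the tree is fitted on `X_train2` and `y_train2`.

```python
# RMSE values copied from the results reported on this page
linear = {"test_rmse": 454.3735647954152, "train_rmse": 436.9096921808084}
tree = {"test_rmse": 304.0104231418782, "train_rmse": 261.2260801680112}

# Relative reduction in test RMSE when moving from linear regression
# to the tuned decision tree
rmse_reduction = (linear["test_rmse"] - tree["test_rmse"]) / linear["test_rmse"]

# Train/test gap: a larger gap suggests the model fits the training data
# more closely than it generalises
linear_gap = linear["test_rmse"] - linear["train_rmse"]
tree_gap = tree["test_rmse"] - tree["train_rmse"]

print(f"Test RMSE reduction: {rmse_reduction:.1%}")
print(f"Linear regression train/test gap: {linear_gap:.1f}")
print(f"Decision tree train/test gap: {tree_gap:.1f}")
```

Under that assumption, the tuned tree lowers the test RMSE by roughly a third compared with linear regression, while its gap between training and testing RMSE is larger, which hints at somewhat more overfitting.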

Comparison between actual and predicted values

The first 50 actual and predicted values are compared visually.

[Figure: decision tree (grid search) error plot]

# best hyperparameters
regressor_model.best_params_

{'max_depth': 7, 'max_leaf_nodes': None, 'min_samples_leaf': 8}

Decision Tree Visualisation

A single decision tree model is trained using the best hyperparameter combination.

# fitting the model
decision_tree_model = DecisionTreeRegressor(max_depth = 7, max_leaf_nodes = None, min_samples_leaf = 8)
decision_tree_model.fit(X_train2, y_train2)

# visualising decision tree
from sklearn.tree import export_graphviz
import graphviz
from IPython.display import Image

# passing the column names as a plain list; this assumes X_train2 is a pandas DataFrame
dot_data = export_graphviz(decision_tree_model, feature_names=list(X_train2.columns), filled=True, out_file=None)
graph = graphviz.Source(dot_data)
png_img = graph.pipe(format='png')
Image(png_img)

[Figure: decision tree visualisation]

Conclusions

Exploratory Data Analysis

Linear Regression

Decision Tree Regression

References

[1] Sathishkumar V E, Jangwoo Park, and Yongyun Cho (2020). 'Using data mining techniques for bike sharing demand prediction in metropolitan city.' Computer Communications, Vol. 153, pp. 353-366.

[2] Sathishkumar V E and Yongyun Cho (2020). 'A rule-based model for Seoul Bike sharing demand prediction using weather data.' European Journal of Remote Sensing, pp. 1-18.