Predicting the demand for rental bikes using historical data containing features such as temperature, humidity, and season.
For a bike rental system to function smoothly, a stable supply of rental bikes must be available at any given time to match demand. This requires a good prediction of bike demand at each hour. I am working with a dataset of hourly bike rental counts in the city of Seoul, South Korea, which contains historical date and weather information (temperature, humidity, wind speed, visibility, dew point, solar radiation, snowfall, rainfall).
View the complete notebook HERE
The dataset was obtained from the UCI Machine Learning Repository GO TO SOURCE.
Relevant papers are mentioned on the UCI Machine Learning Repository page [1] [2].
The aim is to predict the demand for rental bikes at any given hour using the weather and date information provided in the dataset.
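Before modelling, the data is loaded and split into a feature matrix X and target y, which the code below assumes. A minimal sketch, assuming the CSV file name from the UCI download (SeoulBikeData.csv) and the column names listed on the UCI page:

import pandas as pd
# load the dataset (file name and encoding assumed from the UCI download)
df = pd.read_csv('SeoulBikeData.csv', encoding='latin1')
# target is the hourly rental count ('Rented Bike Count' on the UCI page)
y = df['Rented Bike Count']
# drop the target and raw date string; one-hot encode the categorical
# columns (e.g. Seasons, Holiday) so everything is numeric before scaling
X = pd.get_dummies(df.drop(columns=['Rented Bike Count', 'Date']))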
import math
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
# splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
# scaling the data
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# fitting the model
linear_regressor = LinearRegression()
linear_regressor.fit(X_train, y_train)
# prediction using the model
y_pred = linear_regressor.predict(X_test)
y_train_pred = linear_regressor.predict(X_train)
# Performance metrics for testing data
# root mean squared error
print('RMSE:', math.sqrt(mean_squared_error(y_test, y_pred)))
# r2 score
print('R2 score:', r2_score(y_test, y_pred))
# Performance metrics for training data
# root mean squared error
print('RMSE:', math.sqrt(mean_squared_error(y_train, y_train_pred)))
# r2 score
print('R2 score:', r2_score(y_train, y_train_pred))
Performance metrics for Testing dataset
RMSE: 454.3735647954152
R2 score: 0.5117558744340127
Performance metrics for Training dataset
RMSE: 436.9096921808084
R2 score: 0.534370487444807
Comparing the actual and predicted values visually using a snippet (the first 50 values) of the test set.
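The notebook shows this comparison as a line plot; a minimal matplotlib sketch that reproduces the idea (the helper name plot_actual_vs_predicted is mine, not from the notebook):

import numpy as np
import matplotlib.pyplot as plt

def plot_actual_vs_predicted(y_true, y_pred, n=50, title=''):
    # plot the first n actual and predicted counts side by side
    idx = np.arange(n)
    plt.figure(figsize=(12, 4))
    plt.plot(idx, np.asarray(y_true)[:n], label='Actual')
    plt.plot(idx, np.asarray(y_pred)[:n], label='Predicted')
    plt.xlabel('Sample index')
    plt.ylabel('Rented bike count')
    plt.title(title)
    plt.legend()
    plt.show()

plot_actual_vs_predicted(y_test, y_pred, title='Linear regression: actual vs predicted')

With an R2 score of about 0.51 on the test set, the linear model leaves much of the variance unexplained, so a decision tree regressor, which can capture non-linear relationships, is tried next.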
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV
# GridSearchCV for hyperparameter tuning
decision_tree_reg = DecisionTreeRegressor()
grid_parameters = {"max_depth": [3, 5, 7], "max_leaf_nodes": [None, 50, 60, 70, 80, 90], "min_samples_leaf": [7, 8, 9, 10]}
regressor_model = GridSearchCV(decision_tree_reg, param_grid=grid_parameters, scoring='neg_mean_squared_error', cv=5)
# fitting the model
regressor_model.fit(X_train2, y_train2)
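With refit=True (the default), GridSearchCV retrains the best estimator on the whole training split, so predictions can be taken straight from the fitted search object. A sketch of how the figures below could be computed, assuming X_test2 and y_test2 are the held-out counterparts of X_train2 and y_train2 from the notebook:

# predictions from the best estimator found by the grid search
y_pred2 = regressor_model.predict(X_test2)
y_train_pred2 = regressor_model.predict(X_train2)
# performance metrics for testing data
print('RMSE:', math.sqrt(mean_squared_error(y_test2, y_pred2)))
print('R2 score:', r2_score(y_test2, y_pred2))
# performance metrics for training data
print('RMSE:', math.sqrt(mean_squared_error(y_train2, y_train_pred2)))
print('R2 score:', r2_score(y_train2, y_train_pred2))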
Performance metrics for Testing dataset
RMSE: 304.0104231418782
R2 score: 0.7805191405790043
Performance metrics for Training dataset
RMSE: 261.2260801680112
R2 score: 0.8328601459379475
Comparing the actual and predicted values visually using a snippet (the first 50 values) of the test set.
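Using the same hypothetical plotting helper as above:

plot_actual_vs_predicted(y_test2, y_pred2, title='Decision tree: actual vs predicted')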
# best hyperparameters
regressor_model.best_params_
{'max_depth': 7, 'max_leaf_nodes': None, 'min_samples_leaf': 8}
A single decision tree model is then trained using the best hyperparameter combination.
# fitting the model
decision_tree_model = DecisionTreeRegressor(max_depth=7, max_leaf_nodes=None, min_samples_leaf=8)
decision_tree_model.fit(X_train2, y_train2)
# visualising decision tree
from sklearn.tree import export_graphviz
import graphviz
from IPython.display import Image
dot_data = export_graphviz(decision_tree_model, feature_names=X_train2.columns, filled=True, out_file=None)
graph = graphviz.Source(dot_data)
png_img = graph.pipe(format='png')
Image(png_img)
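If the tree image should also be written to disk, graphviz can render it directly (the output file name is my choice):

# writes decision_tree.png and removes the intermediate dot file
graph.render('decision_tree', format='png', cleanup=True)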
Click HERE to enlarge the image:
[1] Sathishkumar V E, Jangwoo Park, and Yongyun Cho (2020). ‘Using data mining techniques for bike sharing demand prediction in metropolitan city.’ Computer Communications, Vol. 153, pp. 353-366.
[2] Sathishkumar V E and Yongyun Cho (2020). ‘A rule-based model for Seoul Bike sharing demand prediction using weather data.’ European Journal of Remote Sensing, pp. 1-18.