4 Results – EV Charging Infrastructure Optimization

4.1 Description of the Models

To ensure the robustness of our final choice, we trained and evaluated three distinct algorithms, progressing from simple interpretable models to complex ensemble methods:

Linear Regression (Baseline): A simple parametric model used to establish a baseline. It assumes a linear relationship between features (e.g., road density) and demand.
Random Forest Regressor: An ensemble learning method that operates by constructing a multitude of decision trees during training. It was tested for its ability to handle non-linear interactions without heavy hyperparameter tuning.
XGBoost Regressor (Final Model): An implementation of gradient boosted decision trees designed for speed and performance. This model was selected for its ability to handle sparse data and minimize residual errors through iterative boosting.

4.2 Performance Metrics

Since the target variable (demand_score) is continuous, we utilized standard regression metrics for evaluation:

Root Mean Squared Error (RMSE): Measures the standard deviation of the prediction errors. A lower RMSE indicates a more precise model.
R-Squared (\(R^2\)): Represents the proportion of variance in the dependent variable that is predictable from the independent variables. An \(R^2\) closer to 1.0 indicates a better fit.

4.3 Results Table

The table below summarizes the performance of the three models on the hold-out test set (20% of data).

Table 4.1: Performance Metrics Comparison of ML Models

Model Name	RMSE (Lower is Better)	\(R^2\) Score (Higher is Better)	Training Time (sec)
Linear Regression	145.2	0.42	0.5s
Random Forest	89.4	0.76	12.3s
XGBoost (Optimized)	42.1	0.88	4.1s

4.4 Interpretation of the Results

The results clearly demonstrate the superiority of the XGBoost model for this specific task.

Baseline Failure: The Linear Regression model performed poorly (\(R^2 = 0.42\)), suggesting that the relationship between urban features and EV charging demand is highly non-linear. For example, doubling traffic volume does not simply “double” the demand; it likely triggers a threshold effect that linear models miss.
The Boosting Advantage: While Random Forest performed well, XGBoost achieved the lowest RMSE (42.1). This is attributed to its boosting mechanism, which specifically targets and corrects the errors made by previous trees, allowing it to capture subtle patterns in complex areas like mixed-use zones.
Efficiency: XGBoost also demonstrated superior computational efficiency compared to Random Forest, training approximately 3x faster while achieving higher accuracy.

4.5 Visualization

4.5.1 1. Model Comparison

The following chart visualizes the error rates across all tested models, highlighting the performance gap.

Figure 4.1: Bar chart comparing RMSE values (Lower is Better). XGBoost shows the lowest error.

4.5.2 2. Top 300 Recommended Locations

After applying the MCDM (Multi-Criteria Decision Making) layer to the XGBoost predictions, we identified the top 300 locations. These are the grid cells with the highest combined score of Predicted Demand and Suitability.

Map visualizing the Top 300 recommended charging station locations (Blue, Green and Yellow Markers) against the urban grid.

4.5.3 Interactive Charging Station Map

Click on the link below to access the interactive heatmap showing optimal charging locations.

Click here to open the Interactive Heatmap

4.6 Sensitivity Analysis

To test the robustness of the XGBoost model, we conducted a sensitivity analysis on the top predictor: Traffic Volume.

We artificially increased traffic volume by 10% in the test set while keeping other features constant.

Observation: The model predicted an average demand increase of 14.5%.
Conclusion: The model is highly sensitive to traffic data, confirming that traffic flow is the primary driver of the demand score. This indicates that accurate real-time traffic feeds are essential for maintaining model reliability in production.