Bawelile Gule · Data Scientist & AI Strategist

About

The story behind
the data

I'm not a typical data scientist. Before my MSAI at Kennesaw State University, I spent over a decade in the field — first as a statistician at a national railway authority in southern Africa, where I rebuilt the company's pricing model from scratch and earned the CEO's Employee of the Year Award. Then as a General Manager who built demand forecasting systems that drove 15–20% weekly sales growth.

That background shapes everything I build. I don't start with models — I start with the question: what decision does this data need to enable? Every pipeline I write ends with a dollar figure, an operational recommendation, or an executive-ready dashboard.

I believe the most dangerous data scientist is one who can't explain their model to a room full of people who don't care about MAPE. The most valuable one is someone who can do both: build the system and tell the story.

Currently seeking Data Scientist and AI Strategy roles in Georgia (including the Atlanta metro area) or remote.

CEO Employee of the Year Award

Awarded for rebuilding the pricing model of a national railway authority in southern Africa using regression analysis in SAS and R — translating outputs into an executive narrative that informed a full fleet strategy overhaul.

MS Artificial Intelligence (In Progress)
Kennesaw State University ↗

Master of Business Administration
Management College of Southern Africa ↗

BSc Statistics
University of Pretoria ↗

Data Scientist Master's Program
IBM / Simplilearn · Aug 2021 ↗

Projects

Built to prove it,
not just describe it.

Every project starts with a real business problem. Every result is measured in dollars, not just metrics.

01 / Forecasting

Retail Demand Forecasting Pipeline

Reducing inventory waste by quantifying forecast error in dollars

End-to-end ML pipeline — raw data ingestion, feature engineering (19 features including lag, rolling, calendar), Prophet + XGBoost ensemble, MLflow experiment tracking, and a Streamlit dashboard with business dollar-impact summary. MAPE under 10%.

Python · Prophet · XGBoost · MLflow · Plotly · Streamlit

▶ Open in Colab GitHub →
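
A minimal sketch of the ideas named in this project, not the project's actual code: building lag, rolling, and calendar features, scoring a forecast with MAPE, and converting forecast error into dollars. The function names, the sample numbers, and the cost model (excess units priced at `unit_cost`, missed units priced at `unit_margin`) are assumptions made for illustration.

```python
from datetime import date, timedelta


def make_features(dates, sales, lag=7, window=7):
    """Build one feature row per day from lag, rolling, and calendar signals."""
    rows = []
    for i in range(max(lag, window), len(sales)):
        rows.append({
            "lag": sales[i - lag],                              # sales `lag` days earlier
            "rolling_mean": sum(sales[i - window:i]) / window,  # mean of the prior `window` days
            "day_of_week": dates[i].weekday(),                  # 0 = Monday
            "month": dates[i].month,
            "target": sales[i],
        })
    return rows


def mape(actual, forecast):
    """Mean absolute percentage error in percent, skipping zero actuals."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def dollar_impact(actual, forecast, unit_cost, unit_margin):
    """Assumed cost model: excess stock costs unit_cost, missed sales cost unit_margin."""
    over = sum(max(f - a, 0) for a, f in zip(actual, forecast))
    under = sum(max(a - f, 0) for a, f in zip(actual, forecast))
    return over * unit_cost + under * unit_margin


# Hypothetical data, for illustration only.
days = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
units = [20, 22, 19, 25, 30, 28, 21, 23, 24, 26]
features = make_features(days, units, lag=2, window=3)
print(len(features), "feature rows")
print(round(mape([100, 200, 50], [110, 180, 50]), 2), "% MAPE")
print(dollar_impact([100, 200, 50], [110, 180, 50], unit_cost=4.0, unit_margin=2.5), "dollars")
```

Pricing over-forecasts and under-forecasts separately is one way to express error in business terms; a real pipeline would take both rates from finance data rather than constants.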

02 / Explainable AI

Grad-CAM XAI Image Classification

Making black-box CNNs transparent for non-technical stakeholders

3-block CNN with Grad-CAM explainability — surfaces deep feature activations as visual diagnostic heatmaps. Enables non-technical stakeholders to validate model behaviour. Deployed as a containerised REST API via FastAPI and Docker.

TensorFlow · Grad-CAM · FastAPI · Docker · OpenCV

▶ Open in Colab GitHub →

03 / MLOps

Containerised ML API & CI/CD Pipeline

From notebook to production endpoint — zero-downtime deployment

Production-grade ML model served as a REST API — containerised with Docker, automatically deployed via GitHub Actions CI/CD to Google Cloud Run. Load-balanced routing, health checks, and automated rollback on failure.

Docker · FastAPI · GitHub Actions · Cloud Run · GKE

GitHub →

04 / Statistical Modelling

Pricing & Revenue Optimisation Model

The model that earned a CEO award — rebuilt and open-sourced

Regression-based pricing model that translates raw cost and demand data into an executive-facing revenue strategy. Includes an interactive what-if scenario tool for operations teams. Inspired by real work at a national railway authority in southern Africa.

Python · R · Regression · Scenario modelling · Plotly

▶ Open in Colab GitHub →
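
A minimal sketch of a regression-based what-if tool, assuming a single-variable linear demand curve fitted by ordinary least squares. The price and demand figures, the function names, and the unit cost are hypothetical and are not taken from the railway work or the open-sourced model.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def what_if(price, slope, intercept, unit_cost):
    """Project demand, revenue, and profit at a candidate price."""
    demand = max(intercept + slope * price, 0.0)  # demand cannot go negative
    return {
        "price": price,
        "demand": demand,
        "revenue": price * demand,
        "profit": (price - unit_cost) * demand,
    }


# Hypothetical observations: price charged and units sold.
prices = [10, 12, 14, 16]
demand = [100, 90, 80, 70]
slope, intercept = fit_line(prices, demand)

# Compare a few candidate prices side by side.
for candidate in (12, 16, 20):
    print(what_if(candidate, slope, intercept, unit_cost=8))
```

Returning a dictionary per scenario keeps the output easy to tabulate or chart for an operations audience.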

05 / Distributed Computing

Distributed Forecasting Pipeline

Closing the Databricks gap — production-scale ML at volume

Forecasting pipeline scaled to distributed compute using PySpark. Delta Lake data management, MLflow experiment tracking, and scalable feature engineering across large multi-store retail datasets. Runs on Databricks Community Edition.

PySpark · Delta Lake · MLflow · SQL · Databricks

GitHub →

06 / LLM · RAG

RAG Business Intelligence Assistant

Ask your data anything — in plain English

Retrieval-Augmented Generation system that lets business users ask plain English questions about operational data and receive grounded, cited answers — powered by LangChain, ChromaDB, and Sentence Transformers. No SQL required. No dashboard needed. Production deployment guide included for Claude and GPT-4 integration.

LangChain · ChromaDB · RAG · Vector embeddings · Prompt engineering · Claude API

▶ Open in Colab GitHub →
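
A toy sketch of the retrieve-then-prompt pattern behind a RAG assistant. To stay dependency-free it swaps in word-count vectors for Sentence Transformers embeddings and a plain list for ChromaDB, and it stops at building the prompt rather than calling an LLM. The documents, ids, and function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Ground the model in retrieved text and ask for citations by id."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical operational notes.
docs = [
    {"id": "ops-1", "text": "Store 12 weekly sales rose after the promotion."},
    {"id": "ops-2", "text": "Warehouse delays increased freight cost in March."},
    {"id": "ops-3", "text": "Staff turnover was flat across all regions."},
]
question = "Why did freight cost increase in March?"
hits = retrieve(question, docs, k=1)
print(build_prompt(question, hits))
```

Keeping document ids in the prompt is what lets the final answer cite its sources; a production version would replace `embed` and the list scan with a real embedding model and vector store.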

07 / LLM · Explainability

AI-Powered Explainability Agent

From SHAP values to Monday morning briefings

LLM-powered agent that takes ML model outputs and SHAP feature attributions and automatically generates plain English stakeholder briefings — translating every prediction into a narrative that a non-technical decision-maker can read and act on immediately. Includes SHAP visualisation dashboard and production Claude/GPT-4 integration patterns.

LLM agents · SHAP · Prompt engineering · XGBoost · Claude API · Explainable AI

▶ Open in Colab GitHub →
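
A minimal sketch of the attribution-to-briefing step, assuming SHAP-style additive attributions have already been computed. It uses a fixed text template where the project uses an LLM, and the feature names, values, and function name are hypothetical.

```python
def briefing(prediction, baseline, attributions, top_n=3, unit="units"):
    """Turn per-feature attributions into a short plain-English summary."""
    # Rank features by the size of their effect, regardless of direction.
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
    lines = [f"Forecast: {prediction:,.0f} {unit} (typical: {baseline:,.0f})."]
    for name, value in ranked:
        direction = "raised" if value > 0 else "lowered"
        lines.append(f"- {name} {direction} the forecast by {abs(value):,.0f} {unit}.")
    return "\n".join(lines)


# Hypothetical attributions; they sum with the baseline to the prediction,
# as SHAP values do for a single prediction.
shap_like = {"promotion": 150, "rain": -40, "holiday": 90}
print(briefing(prediction=1200, baseline=1000, attributions=shap_like, top_n=2))
```

In an LLM-backed version, the ranked list would be passed to the model as structured context so the narrative stays tied to the numbers.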

Skills

The full toolkit

Built over 10+ years across statistics, operations, and AI engineering. Every skill below is demonstrated in a live project.

Languages

Python · SQL · R · SAS · SPSS

Evidence: P1 ↗ P5 (Spark SQL) ↗ P3 (SQL monitoring) ↗

ML & Forecasting

XGBoost · Prophet · TensorFlow · scikit-learn · ARIMA

Evidence: P1 (Prophet + XGBoost) ↗ P2 (TensorFlow) ↗

MLOps & Deployment

MLflow · Docker · FastAPI · GitHub Actions · GKE · Cloud Run

Evidence: P3 (full CI/CD) ↗ P1 (MLflow) ↗

Distributed Computing

PySpark · Delta Lake · Databricks · Apache Spark

Evidence: P5 (full pipeline) ↗

Visualisation & Reporting

Plotly · Streamlit · Tableau · Power BI · Excel

Evidence: P4 (Excel + Tableau exports) ↗ P1 (Plotly) ↗

LLM & Generative AI

LangChain · RAG · ChromaDB · Prompt Engineering · Claude API · OpenAI API · Vector Embeddings

Evidence: P6 (RAG pipeline) ↗ P7 (LLM agent) ↗

OKR frameworks · Annual planning · Supply chain · Executive comms

Evidence: P4 (exec recommendation) ↗ Article 1 ↗

I turn messy data into
decisions that move
businesses forward.

Thinking out loud

Let's build
something
