Description
What you’ll learn
- Build a complete portfolio project you can publish: an end-to-end SQL data cleaning + KPI pipeline
- Turn a messy e-commerce table into a trusted clean_table that's safe for reporting and dashboards
- Profile data like a professional: row counts, null/completeness checks, category profiling, and "how bad is it?" diagnostics
- Build a typed silver layer in SQL: safe casting, mixed-format date parsing, and text normalisation (without silently corrupting results), as sketched just after this list
- Enforce a real business contract: filter invalid orders (amounts, costs, flags, hour ranges) and quantify exactly what each rule removes
- Detect and remove duplicates using a business key, and understand the real-world risk of defining that key incorrectly
- Implement 10 dashboard-ready KPIs in SQL using CTEs, aggregates, and window functions where needed
- Standardise outputs into a single kpi_results table with one consistent schema that a dashboard (or platform) can read
- Debug KPI mismatches properly: trace issues back to the right layer (source → silver → clean → KPI) instead of guessing
- Package the project professionally: clean SQL files, a strong README, evidence screenshots, and a LinkedIn-ready project summary
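To make the silver-layer idea concrete before you start, here is a minimal sketch in PostgreSQL-flavoured SQL. The table and column names (raw_orders, order_date, order_amount, product_category) and the two date formats handled are illustrative assumptions, not the course dataset; the point is that failed parses become NULLs you can count, rather than rows that silently corrupt results.

```sql
-- Illustrative sketch only: table/column names and date formats are assumptions.
-- Goal: cast text columns to proper types without hiding bad values.
CREATE TABLE silver_orders AS
SELECT
    order_id,
    -- Parse two common date layouts; anything else becomes NULL so it can be counted, not hidden
    CASE
        WHEN order_date ~ '^\d{4}-\d{2}-\d{2}$' THEN to_date(order_date, 'YYYY-MM-DD')
        WHEN order_date ~ '^\d{2}/\d{2}/\d{4}$' THEN to_date(order_date, 'DD/MM/YYYY')
        ELSE NULL
    END                                AS order_date,
    -- Cast amounts only when they look numeric; flag the rest instead of failing the whole load
    CASE
        WHEN order_amount ~ '^-?\d+(\.\d+)?$' THEN order_amount::numeric(12,2)
        ELSE NULL
    END                                AS order_amount,
    -- Trim and lower-case free-text categories so 'Books ', 'books' and 'BOOKS' collapse to one value
    lower(trim(product_category))      AS product_category
FROM raw_orders;
```

The same pattern extends to every typed column in the course: parse what you recognise, expose what you don't, and keep the evidence queryable.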
This course is built to give you a publishable portfolio project as the end product — a complete SQL data-cleaning and KPI pipeline you can put on GitHub, link on LinkedIn, and confidently talk through in interviews.
It’s a real-world simulation built around one messy dataset and a business brief with a clear target: deliver ten KPIs that are trustworthy enough to go on a dashboard.
Most SQL "data cleaning" courses either stay at the level of syntax drills or use clean toy datasets where nothing breaks. That's not what you face in real data teams.
In this course you’ll work through the same workflow you’d use on a real project:
- Read the brief properly so you know what "correct" means
- Explore the raw schema and spot the mess early (mixed date formats, typos in categories, missing values, duplicates)
- Build a typed, safer silver layer where errors surface in a controlled way
- Enforce the business rules and deduplicate into one trusted clean_table (see the sketch just after this list)
- Compute and standardise all KPI outputs into a consistent results table
- Validate results, understand tolerances/rounding, and debug mismatches like a professional
- Finish by turning the whole pipeline into a portfolio-ready GitHub project, with a clean repo structure, a strong README, and proof of results
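Here is a minimal sketch of the deduplication and standardisation steps, again in PostgreSQL-flavoured SQL. The business key (order_id + order_date), the tie-break rule, and the single KPI shown are illustrative assumptions; the course defines its own key, rules, and ten KPIs, but the shape of the outcome is the same: one trusted clean_table and one kpi_results table with a fixed schema.

```sql
-- Illustrative sketch only: the business key, tie-break rule, and KPI are assumptions.
-- 1) Deduplicate: keep exactly one row per business key.
CREATE TABLE clean_table AS
SELECT order_id, order_date, order_amount, product_category
FROM (
    SELECT
        s.*,
        ROW_NUMBER() OVER (
            PARTITION BY order_id, order_date        -- the assumed business key
            ORDER BY order_amount DESC NULLS LAST    -- tie-break is an explicit choice, not an accident
        ) AS rn
    FROM silver_orders s
) ranked
WHERE rn = 1;                                        -- one surviving row per key

-- 2) Standardise: every KPI lands in one results table with one schema.
CREATE TABLE IF NOT EXISTS kpi_results (
    kpi_name  text,
    kpi_value numeric
);

INSERT INTO kpi_results (kpi_name, kpi_value)
SELECT 'total_revenue', SUM(order_amount)            -- one example KPI; the course builds ten
FROM clean_table;
```

Each further KPI would be another INSERT ... SELECT into the same two columns, which is what makes the final output readable by a dashboard without per-metric plumbing.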
Course outline (high level):
- Section 00: Course Introduction
- Section 01: The Verulam Blue Mint Environment
- Section 02: Understanding the Challenge Brief
- Section 03: Exploring Source Data Schema
- Section 04: Data Cleaning I – Sampling & Completeness
- Section 05: Data Cleaning II – Silver Layer & Normalisation
- Section 06: Data Cleaning III – Business Rules & Deduplication
- Section 07: Understanding the KPIs
- Section 08: Computing KPIs
- Section 09: Results
- Section 10: Portfolio Project Deployment (repo + README + LinkedIn-style project story)
By the end, you won’t just know “how to clean data using SQL”. You’ll have an end-to-end portfolio project you can explain clearly: what was wrong with the data, what you changed, what rules you enforced, and why your KPIs can be trusted.
Who this course is for:
- Anyone who wants a portfolio project they can publish: a complete SQL cleaning + KPI pipeline you can put on GitHub and confidently explain in interviews
- Data analysts, BI developers, and aspiring analytics/data engineers who already know basic SQL and want a serious, employer-facing project (not toy examples)
- Learners who can write queries but haven’t yet built a layered workflow end-to-end (raw → silver → clean → KPIs → standardised results)
- Job seekers who want proof-of-skill in the areas employers actually care about: data quality reasoning, business-rule enforcement, deduplication, and metric reliability
- Not ideal if you’re brand new to SQL and need a fundamentals-first course.




