Skip to content

Ritik574-coder/sqlserver-datawarehouse

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

44 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ›οΈ BusinessDW β€” Business Data Warehouse

A production-grade SQL Server Data Warehouse built using the Medallion Architecture (Bronze β†’ Silver β†’ Gold), integrating multi-source enterprise data into analytics-ready dimensional models.


πŸ“Œ Table of Contents


🧭 Project Overview

BusinessDW is a comprehensive Business Data Warehouse built on SQL Server that integrates data from two enterprise source systems β€” CRM and ERP β€” and transforms raw operational data into structured, analytics-ready information.

The project follows the Medallion Architecture β€” a layered data design pattern that ensures traceability, data quality, and business-readiness at each stage of the pipeline.

Key Goals:

  • Integrate multi-source enterprise data (CRM + ERP)
  • Build a reliable, auditable ingestion pipeline
  • Apply data cleansing, standardization, and business rules
  • Deliver dimensional models ready for BI, reporting, and analytics

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Sources │────▢│    BRONZE    │────▢│    SILVER    │────▢│     GOLD     │────▢│    Consumers    β”‚
β”‚ CRM / ERPβ”‚     β”‚   Raw Data   β”‚     β”‚ Clean & Std  β”‚     β”‚Business-Readyβ”‚     β”‚ BI / Analytics  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Visual_architeture

Alt text

The data warehouse is structured as a SQL Server database with three schema layers, each serving a distinct purpose in the data pipeline.


🧩 Entity Relationship Diagram (ERD)

relationship diagram image


πŸ“‚ Data Sources

Data is ingested from two enterprise systems via CSV file extracts (batch ingestion):

πŸ”΅ CRM System

File Description
cust_info.csv Customer master information
prd_info.csv Product details
sales_details.csv Transactional sales / order records

🟒 ERP System

File Description
cust_az12.csv Customer demographics β€” birth date, gender
loc_a101.csv Customer geography β€” country mapping via customer ID
px_cst_g1v2.csv Product classification β€” category, subcategory, maintenance indicator, product line

πŸ“Š Data Layers

🟀 Bronze Layer β€” Raw Ingestion

Property Detail
Definition Raw, unprocessed data β€” as-is from source systems
Objective Traceability & Debugging
Object Type Tables
Load Method Full Load (Truncate & Insert)
Transformations None (as-is)
Data Modeling None (as-is)
Target Audience Data Engineers

The Bronze layer preserves the exact source data to enable full auditability and root-cause analysis for any downstream data issues.


βšͺ Silver Layer β€” Cleansed & Standardized

Property Detail
Definition Cleaned & standardized data
Objective Intermediate layer β€” prepare data for analysis
Object Type Tables
Load Method Full Load (Truncate & Insert)
Target Audience Data Analysts, Data Engineers

Transformations applied:

  • Data Cleansing
  • Data Standardization
  • Data Normalization
  • Derived Columns
  • Data Enrichment

Key business rules:

  • Deduplicate customers
  • Standardize product IDs and names
  • Conform customer IDs across ERP and CRM
  • Map country codes
  • Enrich products with classification attributes
  • Validate date and gender values
  • Cleanse sales transactions

🟑 Gold Layer β€” Business-Ready

Property Detail
Definition Business-Ready data
Objective Provide data for reporting & analytics consumption
Object Type Views
Load Method None (derived from Silver)
Target Audience Data Analysts, Business Users

Transformations applied:

  • Data Integration (joining across domains)
  • Data Aggregation
  • Business Logic & Rules

Data Modeling patterns:

  • Star Schema
  • Aggregated Objects
  • Flat Tables

πŸ”€ Data Flow & Lineage

Alt text

Each Bronze table maps 1:1 to a Silver table. Silver tables are then integrated and aggregated into Gold views using business logic and dimensional modeling.


βš™οΈ ETL Workflow

Each layer follows a consistent 4-phase development workflow:

🟀 Bronze Layer

Analyse               Code                  Validate               Document
─────────────────    ─────────────────    ──────────────────    ──────────────────
Interview source  ──▢ Data Ingestion    ──▢ Completeness &    ──▢ Documenting
system experts        (Stored Proc)          Schema Checks          Versioning in GIT

βšͺ Silver Layer

Analyse               Code                  Validate               Document
─────────────────    ─────────────────    ──────────────────    ──────────────────
Explore &         ──▢ Data Cleansing    ──▢ Data Correctness  ──▢ Documenting
Understand Data       (Stored Proc)          Checks                 Versioning in GIT
                                                                     + Data Flow
                                                                     + Data Integration

🟑 Gold Layer

Analyse               Code                  Validate               Document
─────────────────    ─────────────────    ──────────────────    ──────────────────
Explore &         ──▢ Data             ──▢ Data Integration  ──▢ Documenting
Understand            Integration           Checks                 Versioning in GIT
Business Objects      (Stored Proc)                                + Data Model
                                                                     + Data Catalog
                                                                     + Data Flow

πŸ₯‡ Gold Layer Outputs

The following analytics-ready objects are produced in the Gold layer:

Object Type Description
dim_customers View Unified customer dimension β€” integrates CRM + ERP demographics and geography
dim_products View Enriched product dimension β€” integrates product details with classification data
dim_location View Geographic dimension for location-based analytics
fact_sales View Central sales fact table β€” transactional order records

These objects form a Star Schema optimized for BI tools, ad-hoc SQL queries, and machine learning pipelines.


πŸ› οΈ Tech Stack

Component Technology
Database SQL Server
ETL / Transformation T-SQL Stored Procedures
Load Strategy Batch Processing β€” Full Load (Truncate & Insert)
Source Interface CSV files (folder-based ingestion)
Version Control Git
Consumers BI & Reporting, Ad-Hoc SQL, Machine Learning

πŸ‘€ Author

Ritik Aspiring Data Engineer | Focused on building production-grade data systems


⭐ Conclusion

This project is a complete end-to-end Business Data Warehouse implementation β€” from raw data ingestion to business-ready analytical datasets. It demonstrates real-world data engineering practices including multi-source integration, layered ETL design, data quality management, and dimensional modeling using industry-standard Medallion Architecture principles.


Built with SQL Server Β· Medallion Architecture Β· CRM & ERP Integration

About

No description, website, or topics provided.

Resources

License

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors