
SSIS‑668 – A Comprehensive Implementation & Troubleshooting Guide

Audience – SSIS developers, data engineers, BI architects, and DBA‑type stakeholders who need to design, build, test, deploy, and support the SSIS‑668 solution (a reusable data‑integration package / pattern that many organizations use for “high‑volume master‑data load with change‑tracking”).

Scope – This guide covers everything from prerequisites and design concepts through step‑by‑step development, deployment, monitoring, and troubleshooting. It assumes you are working with SQL Server 2019 or later, Azure Data Studio, and Visual Studio 2022 with the SQL Server Data Tools (SSDT) and the SQL Server Integration Services Projects extension installed.

1. What Is SSIS‑668?

| Item | Description |
|------|-------------|
| Name | SSIS‑668 – a reference implementation for incremental, change‑data‑capture (CDC) loading of master data from an operational source (OLTP) into a dimensional / reporting data warehouse. |
| Core Features | • Detects inserts, updates, and deletes (SCD‑2) using source CDC tables or timestamp columns<br>• Handles high‑volume data (hundreds of millions of rows per run)<br>• Supports parallelism via partitioned data flows<br>• Provides idempotent loads – safe to re‑run without duplication<br>• Emits detailed audit logs (run time, row counts, error files) |
| Typical Use Cases | • Nightly master‑data sync from ERP to DW<br>• Near‑real‑time CDC pipelines (via SSIS + SQL Server Agent)<br>• Data‑warehouse staging area refreshes |
| Why It Has a “668” Tag | The original internal ticket/feature request was #668. Over time the name stuck as the de facto reference pattern. It is now shared across multiple projects, so the tag is used for documentation, versioning, and support tickets. |

Note: If your organization uses a different definition (e.g., a custom package named “SSIS‑668”), map the concepts below to your local naming conventions.

2. Prerequisites

| Category | Requirement | Recommended Version / Setting |
|----------|-------------|-------------------------------|
| SQL Server | Database Engine (source & target) | SQL Server 2019 (CU12 or later) or SQL Server 2022 |
| SSIS Runtime | Integration Services Catalog (SSISDB) | Deployed to a dedicated SSISDB on a dedicated SQL Server instance |
| Development Tools | Visual Studio 2022 + SSDT | Ensure the “SQL Server Integration Services Projects” extension is installed |
| Permissions | • db_datareader on source DB<br>• db_datawriter on staging & target DW tables<br>• EXECUTE on stored procedures used in the package<br>• ssis_admin role on SSISDB for deployment | Use a least‑privilege service account for the SQL Server Agent job that runs the package |
| Hardware | • Dev workstation: minimum 8 GB RAM, 4 vCPU<br>• Production: 16 GB+ RAM, SSD storage, network bandwidth ≥ 1 Gbps between source and DW | Scale out based on row volume (see Section 5) |
| Other | • .NET Framework 4.8 (or later)<br>• PowerShell 7+ (optional, for post‑deploy scripts) | – |

3. High‑Level Architecture

    +----------------+      +--------------------+      +-------------------+
    | Source System  | ---> | SSIS‑668 Package   | ---> | Data Warehouse    |
    | (OLTP DB)      |      | (Data Flow, CDC)   |      | (Dim/Fact Tables) |
    +----------------+      +--------------------+      +-------------------+

Key Components:

1️⃣ CDC Extraction Sub‑Package
2️⃣ Staging Area (temp tables)
3️⃣ SCD‑2 Merge Logic
4️⃣ Auditing & Logging
5️⃣ Error‑Handling & Redirection

3.1. Logical Flow

| Step | Description |
|------|-------------|
| A. Detect Changes | Use either SQL Server CDC (system tables) or a high‑watermark column (e.g., LastModifiedDT) to pull only rows that changed since the previous run. |
| B. Load to Staging | Bulk‑load the delta set into a staging table (dbo.stg_<Entity>) using Fast Load, with Table Lock enabled and Check Constraints disabled for performance. |
| C. Apply Business Rules | Optional Script Component or Derived Column transformations to enforce data cleansing, lookups, or surrogate‑key generation. |
| D. Merge into Target | Use a set‑based MERGE (or an INSERT/UPDATE/DELETE pattern) to implement SCD‑2. This step is wrapped in a transaction and writes to an audit table (dbo.Audit_<Entity>). |
| E. Post‑Load Activities | Refresh materialized views, update row counts, purge old staging rows, and send an email / webhook notification. |
| F. Logging | SSISDB built‑in logging plus a custom execution log table (dbo.SSIS_ExecutionLog) to capture start/end timestamps, rows processed, and any warnings. |
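To make step D concrete, here is a minimal sketch of the SCD‑2 apply logic. It is an illustration under stated assumptions, not the package's actual MERGE: it uses Python's built‑in sqlite3 module as a stand‑in for SQL Server, an UPDATE followed by an INSERT instead of a T‑SQL MERGE, and hypothetical tables (stg_customer, dim_customer, audit_customer) with a hypothetical is_deleted flag on the staging rows.

```python
import sqlite3

def apply_scd2(conn, load_ts):
    """Apply staged rows to the dimension as SCD-2 changes in one transaction.

    All table and column names here are hypothetical stand-ins.
    Returns (rows_expired, rows_inserted).
    """
    with conn:  # commits on success, rolls back if any statement raises
        # 1. Close the current version of rows that changed or were deleted.
        expired = conn.execute(
            """
            UPDATE dim_customer
               SET valid_to = ?, is_current = 0
             WHERE is_current = 1
               AND EXISTS (SELECT 1 FROM stg_customer s
                            WHERE s.customer_id = dim_customer.customer_id
                              AND (s.is_deleted = 1 OR s.name <> dim_customer.name))
            """,
            (load_ts,),
        ).rowcount
        # 2. Open a new version for staged rows that have no current version.
        inserted = conn.execute(
            """
            INSERT INTO dim_customer (customer_id, name, valid_from, valid_to, is_current)
            SELECT s.customer_id, s.name, ?, NULL, 1
              FROM stg_customer s
             WHERE s.is_deleted = 0
               AND NOT EXISTS (SELECT 1 FROM dim_customer d
                                WHERE d.customer_id = s.customer_id
                                  AND d.is_current = 1)
            """,
            (load_ts,),
        ).rowcount
        # 3. Record row counts in the audit table inside the same transaction.
        conn.execute(
            "INSERT INTO audit_customer (load_ts, rows_expired, rows_inserted) VALUES (?, ?, ?)",
            (load_ts, expired, inserted),
        )
    return expired, inserted

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                               is_deleted INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE dim_customer (customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,
                               customer_id INTEGER NOT NULL, name TEXT NOT NULL,
                               valid_from TEXT NOT NULL, valid_to TEXT,
                               is_current INTEGER NOT NULL);
    CREATE TABLE audit_customer (load_ts TEXT, rows_expired INTEGER, rows_inserted INTEGER);
    INSERT INTO dim_customer (customer_id, name, valid_from, valid_to, is_current)
    VALUES (1, 'Acme', '2024-01-01', NULL, 1), (2, 'Globex', '2024-01-01', NULL, 1);
    -- Staged delta: 1 was renamed, 2 was deleted at the source, 3 is new.
    INSERT INTO stg_customer VALUES (1, 'Acme Corp', 0), (2, 'Globex', 1), (3, 'Initech', 0);
    """
)
first = apply_scd2(conn, "2024-02-01")
second = apply_scd2(conn, "2024-02-01")  # re-run with the same staging data
print(first, second)
```

Re‑running the function against the same staging rows changes nothing the second time, which is the idempotency property listed under Core Features. In a real package the same two conditions would typically be expressed in the MERGE (or INSERT/UPDATE/DELETE) statement executed against the DW.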

4. Detailed Development Steps

4.1. Create the Integration Services Project

1. Open Visual Studio → New → Integration Services Project.
2. Name the project SSIS-668_MasterDataLoad.
3. In Solution Explorer → right‑click → Add → New Folder → create Packages, SQL Scripts, and Configs.

4.2. Build the CDC Extraction Sub‑Package

| Object | Configuration |
|--------|---------------|
| Connection Managers | • SourceDB (OLE DB) – points to the source OLTP database.<br>• DWDB (OLE DB) – points to the target DW. |
| Variables | • User::LastHighWaterMark (DateTime) – read from a control table at the start of each run.<br>• User::CurrentHighWaterMark (DateTime) – written back to the control table after each successful run. |
| Control Flow | 1️⃣ Execute SQL Task – read LastHighWaterMark from dbo.ETL_Control.<br>2️⃣ Data Flow Task – CDC Source (or OLE DB Source with a query filtered by WHERE ModifiedDT > ?).<br>3️⃣ Execute SQL Task – update dbo.ETL_Control with CurrentHighWaterMark. |
| CDC Source (if using SQL Server CDC) | • Enable CDC on the database (sys.sp_cdc_enable_db), then on the source table (sys.sp_cdc_enable_table).<br>• Use the cdc.<schema>_<table>_CT change table as the source. |
| Data Flow → Fast Load | Destination = stg_<Entity>. Properties: Maximum Insert Commit Size = 0 (full batch), Table Lock = True, Check Constraints = False. |

Tip: Use parameterized queries (?) for the high‑watermark to keep the package fully dynamic.
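The read / extract / update sequence from the Control Flow row can be sketched outside SSIS as follows. This is only an illustration under assumptions: Python's built‑in sqlite3 stands in for the OLE DB connections, etl_control and src_customer are hypothetical stand‑ins for dbo.ETL_Control and the source table, and timestamps are ISO‑8601 strings so that string comparison matches chronological order.

```python
import sqlite3

def extract_delta(conn, entity):
    """Return (rows changed since the stored watermark, new watermark candidate)."""
    # Step 1: read the last persisted high-water mark from the control table.
    (last_hwm,) = conn.execute(
        "SELECT last_high_water_mark FROM etl_control WHERE entity = ?",
        (entity,),
    ).fetchone()
    # Step 2: parameterized delta query, the equivalent of WHERE ModifiedDT > ?
    rows = conn.execute(
        "SELECT customer_id, name, modified_dt FROM src_customer "
        "WHERE modified_dt > ? ORDER BY modified_dt",
        (last_hwm,),
    ).fetchall()
    # The new watermark is the newest timestamp actually extracted.
    new_hwm = rows[-1][2] if rows else last_hwm
    return rows, new_hwm

def commit_watermark(conn, entity, new_hwm):
    """Step 3: persist the watermark only after the load has succeeded."""
    with conn:
        conn.execute(
            "UPDATE etl_control SET last_high_water_mark = ? WHERE entity = ?",
            (new_hwm, entity),
        )

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE etl_control (entity TEXT PRIMARY KEY, last_high_water_mark TEXT NOT NULL);
    CREATE TABLE src_customer (customer_id INTEGER PRIMARY KEY, name TEXT, modified_dt TEXT NOT NULL);
    INSERT INTO etl_control VALUES ('customer', '2024-01-01T00:00:00');
    INSERT INTO src_customer VALUES
        (1, 'Acme',    '2023-12-31T23:00:00'),
        (2, 'Globex',  '2024-01-02T08:30:00'),
        (3, 'Initech', '2024-01-03T09:15:00');
    """
)
delta, hwm = extract_delta(conn, "customer")
commit_watermark(conn, "customer", hwm)
delta_again, hwm_again = extract_delta(conn, "customer")
print(len(delta), hwm, len(delta_again))
```

In this sketch the new watermark comes from the extracted rows rather than from the wall clock, so rows written to the source while the extract is running are picked up by the next run instead of being skipped. Whether the actual package derives CurrentHighWaterMark this way is an assumption.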