How to Keep Only the Most Recent Record per Group in SQL (Oracle and SQL Server)

In many database-driven applications, tables often store multiple records for the same business entity. Whether it’s logs, transactions, status snapshots, or historical data, it’s common to encounter duplicate keys with different update timestamps.

To improve performance or prepare clean datasets for reporting, it’s often necessary to retain only the most recent record (based on a timestamp or date column) for each unique combination of identifiers.

This article walks you through how to:

  • Select only the most up-to-date row per group.
  • Delete older records while preserving the latest entry.
  • Use SQL techniques compatible with Oracle and SQL Server (MSSQL).

Use Case

Imagine a table with these columns:

entity_idcategory_idstatus_codeupdated_atvalue
1001A1ACTIVE2024-12-01 08:00:001.23
1001A1ACTIVE2024-12-02 10:30:001.25
1001A1INACTIVE2024-12-03 11:00:001.20

Your goal is to:

  • Keep only one row per unique combination of entity_id, category_id, and status_code.
  • Keep the row with the most recent updated_at.

Solution Overview

We use the ROW_NUMBER() window function to rank records within each group and order them by date descending. Then we:

  1. Filter for the top-ranked (latest) rows.
  2. Optionally, delete the lower-ranked (older) entries.

This works in both Oracle and SQL Server.


SQL Example: Oracle

Filter: Get the Most Recent Row per Group

WITH RankedRecords AS (
    SELECT 
        entity_id, 
        category_id, 
        status_code, 
        updated_at,
        value,
        ROW_NUMBER() OVER (
            PARTITION BY entity_id, category_id, status_code
            ORDER BY updated_at DESC
        ) AS rn
    FROM records
)
SELECT *
FROM RankedRecords
WHERE rn = 1;

🗑️ Delete: Remove All But the Most Recent Record per Group

DELETE FROM records
WHERE (entity_id, category_id, status_code, updated_at) NOT IN (
    WITH RankedRecords AS (
        SELECT 
            entity_id, 
            category_id, 
            status_code, 
            updated_at,
            ROW_NUMBER() OVER (
                PARTITION BY entity_id, category_id, status_code
                ORDER BY updated_at DESC
            ) AS rn
        FROM records
    )
    SELECT 
        entity_id, 
        category_id, 
        status_code, 
        updated_at
    FROM RankedRecords
    WHERE rn = 1
);

🧱 SQL Example: Microsoft SQL Server

🔍 Filter: Select Most Updated Row per Group

WITH RankedRecords AS (
    SELECT 
        entity_id, 
        category_id, 
        status_code, 
        updated_at,
        value,
        ROW_NUMBER() OVER (
            PARTITION BY entity_id, category_id, status_code
            ORDER BY updated_at DESC
        ) AS rn
    FROM records
)
SELECT *
FROM RankedRecords
WHERE rn = 1;

🗑️ Delete: Remove Older Entries per Group

WITH CTE AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (
            PARTITION BY entity_id, category_id, status_code
            ORDER BY updated_at DESC
        ) AS RowNum
    FROM records
)
DELETE FROM CTE
WHERE RowNum > 1;

✅ Why This Works

  • ROW_NUMBER() creates a unique sequence per group.
  • PARTITION BY ensures separate rankings for each group.
  • Ordering by updated_at DESC ensures the most recent row is ranked first.
  • Filtering or deleting based on row number lets you efficiently de-duplicate by recency.

⚠️ Best Practices

  • Ensure updated_at is non-null and properly indexed.
  • Always backup your table or test the query in a sandbox before deleting.
  • Use composite indexes on grouping and date columns for better performance.
  • When dealing with large datasets, consider batch deletes to avoid locks:
-- MSSQL batch delete example
WITH CTE AS (
    SELECT TOP (1000)
        *,
        ROW_NUMBER() OVER (
            PARTITION BY entity_id, category_id, status_code
            ORDER BY updated_at DESC
        ) AS RowNum
    FROM records
)
DELETE FROM CTE
WHERE RowNum > 1;

Conclusion

When you’re dealing with datasets that contain multiple versions of the same logical entity, it’s often critical to retain only the latest record per group. Using ROW_NUMBER() with PARTITION BY and ORDER BY, you can elegantly solve this problem in Oracle and SQL Server.

This technique is widely applicable in data deduplication, reporting, data warehousing, and archiving scenarios — and it’s both performant and easy to maintain.

This article is inspired by real-world challenges we tackle in our projects. If you're looking for expert solutions or need a team to bring your idea to life,

Let's talk!

    Please fill your details, and we will contact you back

      Please fill your details, and we will contact you back