Extract only the first one from the record unit where the value of the specified column is duplicated

Page update date : Wednesday, February 26, 2025

Page creation date : Saturday, January 4, 2025

Normally, when extracting only one duplicate record, the first thing that comes to mind is distinct I think, distinct does not allow you to specify a duplicate column and another column that you want to retrieve at the same time. In addition,distinct does not match this requirement because it includes non-duplicate records in the extraction.

I don't think you can understand what you want to do with the title alone, so let's use a sample record as an example. What I want to do is the following extraction result.

ID	Name
1	Name 1
2	Name 2
3	Name 3
3	Name 3@example.com
4	Name 4 Name 4@example.com
4	Name 4@example.com
4	Name 4@example.com

Get it as follows:

ID	Name
3	Name 3@example.com
4	Name 4@example.com

IDs 1 and 2 are not included because they are not duplicates. IDs 3 and 4 are duplicates, so one of them is retrieved. However, this time, we will not get the first record in the DB storage order. In addition, suppose that you want the first line of the name in order of priority under the conditions that "Name contains an @ mark" and "Name has a small number of characters".

The above condition can be obtained with the following SQL. However, we are using subqueries and have not checked the performance.

select
  *
from
  (select *, row_number() over(order by ID asc) as RowNum from SampleTable) tableMain
where tableMain.RowNum =
(
  select
    top(1) RowNum
  from
    (select *, row_number() over(order by ID asc) as RowNum from SampleTable) tableSub
  where
        tableMain.ID = tableSub.ID
    and exists ( 
      select
        *
      from
        SampleTable
      where
        tableSub.ID = ID
      having
        count(*) >= 2
    )
  order by
    case when charindex(N'@', Name) >= 1 then 0 else 1 end, len(Name)
)

Duplicate rows are having checked with and count . group by The key column to be checked from the outside, here ID the column is where narrowed down by .

The order of precedence for retrieving the first row is order by specified by . charindex Here, and len check whether it contains the specified character and the length of the string. top All you need to do is get the first row of each duplicate record in .

Originally, this is fine, but if there are two or more records with exactly the same value, such as "ID: 4, Name: Name 4@example.com", you will inevitably get two records. row_number Therefore, it is appended so that it can be identified for each row with , so that only one record with exactly the same value can be retrieved. If it is known that the exact same record does not exist, I row_number think it would be better to check the values of the existing records without appending .

If there are multiple keys to be checked for duplicates, you can increase the number of key columns for each. For example, consider the following record: The columns to be checked for duplicates are "ID1" and "ID2".

ID1	ID2	Name
1	1	Name 1
2	1	Name 2
3	1	Name 31
3	2	Name 32
4	1	Name41
4	2	Name42
4	2	Name 4@example.com
5	1	Name 555@example.com
5	1	Name 55@example.com
5	2	Name 5@example.com
6	1	Name 6
6	1	Name 66@example.com
6	2	Name 6
6	2	Name 6

The acquisition result is as follows.

ID1	ID2	Name
4	2	Name 4@example.com
5	1	Name 55@example.com
6	1	Name 66@example.com
6	2	Name 6

The SQL looks like this: You can simply increase the part where the key column was specified.

select
  *
from
  (select *, row_number() over(order by ID1 asc, ID2 asc) as RowNum from SampleTable) tableMain
where tableMain.RowNum =
(
  select
    top(1) RowNum
  from
    (select *, row_number() over(order by ID1 asc, ID2 asc) as RowNum from SampleTable) tableSub
  where
        tableMain.ID1 = tableSub.ID1
    and tableMain.ID2 = tableSub.ID2
    and exists ( 
      select
        *
      from
        SampleTable
      where
            tableSub.ID1 = ID1
        and tableSub.ID2 = ID2
      having
        count(*) >= 2
    )
  order by
    case when charindex(N'@', Name) >= 1 then 0 else 1 end, len(Name)
)

Performing cumulative calculations in SQL

SQL to determine whether a single piece of data is executed by key in an Update or Insert