Extract only the first one from the record unit where the value of the specified column is duplicated
Normally, when extracting only one duplicate record, the first thing that comes to mind is distinct
I think,
distinct
does not allow you to specify a duplicate column and another column that you want to retrieve at the same time.
In addition,distinct
does not match this requirement because it includes non-duplicate records in the extraction.
I don't think you can understand what you want to do with the title alone, so let's use a sample record as an example. What I want to do is the following extraction result.
ID | Name |
---|---|
1 | Name 1 |
2 | Name 2 |
3 | Name 3 |
3 | Name 3@example.com |
4 | Name 4 Name 4@example.com |
4 | Name 4@example.com |
4 | Name 4@example.com |
Get it as follows:
ID | Name |
---|---|
3 | Name 3@example.com |
4 | Name 4@example.com |
IDs 1 and 2 are not included because they are not duplicates. IDs 3 and 4 are duplicates, so one of them is retrieved. However, this time, we will not get the first record in the DB storage order. In addition, suppose that you want the first line of the name in order of priority under the conditions that "Name contains an @ mark" and "Name has a small number of characters".
The above condition can be obtained with the following SQL. However, we are using subqueries and have not checked the performance.
select
*
from
(select *, row_number() over(order by ID asc) as RowNum from SampleTable) tableMain
where tableMain.RowNum =
(
select
top(1) RowNum
from
(select *, row_number() over(order by ID asc) as RowNum from SampleTable) tableSub
where
tableMain.ID = tableSub.ID
and exists (
select
*
from
SampleTable
where
tableSub.ID = ID
having
count(*) >= 2
)
order by
case when charindex(N'@', Name) >= 1 then 0 else 1 end, len(Name)
)
Duplicate rows are having
checked with and count
. group by
The key column to be checked from the outside, here ID
the column is where
narrowed down by .
The order of precedence for retrieving the first row is order by
specified by . charindex
Here, and len
check whether it contains the specified character and the length of the string.
top
All you need to do is get the first row of each duplicate record in .
Originally, this is fine, but if there are two or more records with exactly the same value, such as "ID: 4, Name: Name 4@example.com", you will inevitably get two records.
row_number
Therefore, it is appended so that it can be identified for each row with , so that only one record with exactly the same value can be retrieved.
If it is known that the exact same record does not exist, I row_number
think it would be better to check the values of the existing records without appending .
If there are multiple keys to be checked for duplicates, you can increase the number of key columns for each. For example, consider the following record: The columns to be checked for duplicates are "ID1" and "ID2".
ID1 | ID2 | Name |
---|---|---|
1 | 1 | Name 1 |
2 | 1 | Name 2 |
3 | 1 | Name 31 |
3 | 2 | Name 32 |
4 | 1 | Name41 |
4 | 2 | Name42 |
4 | 2 | Name 4@example.com |
5 | 1 | Name 555@example.com |
5 | 1 | Name 55@example.com |
5 | 2 | Name 5@example.com |
6 | 1 | Name 6 |
6 | 1 | Name 66@example.com |
6 | 2 | Name 6 |
6 | 2 | Name 6 |
The acquisition result is as follows.
ID1 | ID2 | Name |
---|---|---|
4 | 2 | Name 4@example.com |
5 | 1 | Name 55@example.com |
6 | 1 | Name 66@example.com |
6 | 2 | Name 6 |
The SQL looks like this: You can simply increase the part where the key column was specified.
select
*
from
(select *, row_number() over(order by ID1 asc, ID2 asc) as RowNum from SampleTable) tableMain
where tableMain.RowNum =
(
select
top(1) RowNum
from
(select *, row_number() over(order by ID1 asc, ID2 asc) as RowNum from SampleTable) tableSub
where
tableMain.ID1 = tableSub.ID1
and tableMain.ID2 = tableSub.ID2
and exists (
select
*
from
SampleTable
where
tableSub.ID1 = ID1
and tableSub.ID2 = ID2
having
count(*) >= 2
)
order by
case when charindex(N'@', Name) >= 1 then 0 else 1 end, len(Name)
)