20130701

Filled Under:

DISTINCT and How NOT To Use It

The DISTICT keyword placed next to SELECT restricts the result to unique rows from a query.
DISTINCT is also a much abused keyword, often used as a quick fix to bad queries. Take this example from Oracle Applications:



        
-- Problem
SELECT p.party_id
     , p.party_name
FROM hz_parties p
   , hz_cust_accounts c
WHERE c.party_id=p.party_id
AND upper(p.party_name) LIKE 'BUSINESS WOR%’
  PARTY_ID PARTY_NAME
---------- ------------------
      4429 Business World
      4429 Business World
You see that the query returns two rows. Since you’re only interested in one unique answer, how would you correct the query?

The lazy developer approach:

        
-- Wrong Approach
SELECT DISTINCT p.party_id
     , p.party_name
FROM hz_parties p
   , hz_cust_accounts c
WHERE c.party_id=p.party_id
AND upper(p.party_name) LIKE 'BUSINESS WOR%’
PARTY_ID   PARTY_NAME
---------- ------------------
      4429 Business World
Some developers simply stick a DISTINCT in, to suppress duplicate rows from an incorrect query. But is this a wise fix?

 

DISTINCT Abuse and its Consequences

You may have got the right answer with the lazy developer approach BUT:
  • Your query will not perform optimally – DISTINCT causes an additional sort of data.
  • It is possible that one of the joins in the query needs to be an EXISTS subquery, or the join condition is incomplete, or the data model is faulty. This needs to be investigated and the root cause fixed. By using DISTINCT, you are ignoring the problem, not resolving it.

How to Correct The Query Above

That query happened to need an EXISTS subquery in place of a join. The following modified query would work correctly, returning only unique rows:

        
-- Correct Approach
SELECT p.party_id
     , p.party_name
FROM hz_parties p
WHERE upper(p.party_name) LIKE 'BUSINESS WOR%'
AND EXISTS ( SELECT null
             FROM hz_cust_accounts c
             WHERE  c.party_id=p.party_id)
  PARTY_ID PARTY_NAME
---------- ------------------
      4429 Business World

Summary

DISTINCT makes a query return unique rows only, and is often used injudiciously to suppress duplicate rows being returned by a bad query. Don’t use DISTINCT to cover up errors. Fix the problem at the source: run the query without DISTINCT, review it to find why the duplicates occur and correct it.





0 comments:

Post a Comment