Summary: in this tutorial, you will learn how to use the PostgreSQL SELECT DISTINCT clause to remove duplicate rows from a result set returned by a query.
Introduction to PostgreSQL SELECT DISTINCT clause
The SELECT DISTINCT removes duplicate rows from a result set. The SELECT DISTINCT clause retains one row for each group of duplicates.
The SELECT DISTINCT clause can be applied to one or more columns in the select list of the SELECT statement.
The following illustrates the syntax of the DISTINCT clause:
SELECT
DISTINCT column1
FROM
table_name;In this syntax, the SELECT DISTINCT uses the values in the column1 column to evaluate the duplicate.
If you specify multiple columns, the SELECT DISTINCT clause will evaluate the duplicate based on the combination of values in these columns. For example:
SELECT
DISTINCT column1, column2
FROM
table_name;In this syntax, the SELECT DISTINCT uses the combination of values in both column1 and column2 columns for evaluating the duplicate.
Note that PostgreSQL also offers the DISTINCT ON clause that retains the first unique entry of a column or combination of columns in the result set.
If you want to find distinct values of all columns in a table, you can use SELECT DISTINCT *:
SELECT DISTINCT *
FROM table_name;The star or asterisk (*) means all columns of the table_name.
PostgreSQL SELECT DISTINCT examples
Let’s create a new table to practice the SELECT DISTINCT clause.
Note that you will learn how to create a table and insert data into it in the subsequent tutorial. In this tutorial, you need to execute the statement in psql or use pgAdmin to execute the statements.
First, create the colors table that has three columns: id, bcolor and fcolor using the following CREATE TABLE statement:
CREATE TABLE colors(
id SERIAL PRIMARY KEY,
bcolor VARCHAR,
fcolor VARCHAR
);Second, insert some rows into the colors table:
INSERT INTO
colors (bcolor, fcolor)
VALUES
('red', 'red'),
('red', 'red'),
('red', NULL),
(NULL, 'red'),
(NULL, NULL),
('green', 'green'),
('blue', 'blue'),
('blue', 'blue');Third, retrieve the data from the colors table using the SELECT statement:
SELECT
id,
bcolor,
fcolor
FROM
colors;Output:
id | bcolor | fcolor
----+--------+--------
1 | red | red
2 | red | red
3 | red | null
4 | null | red
5 | null | null
6 | green | green
7 | blue | blue
8 | blue | blue
(8 rows)1) PostgreSQL SELECT DISTINCT one column example
The following statement selects unique values from the bcolor column of the colors table and sorts the result set in alphabetical order by using the ORDER BY clause.
SELECT
DISTINCT bcolor
FROM
colors
ORDER BY
bcolor;Output:
bcolor
--------
blue
green
red
null
(4 rows)The bcolor column has three 'red' entries, two NULL, one 'green', and two 'blue'. SELECT DISTINCT removes two 'red' values, one NULL, and one 'blue'.
Note that PostgreSQL treats NULLs as duplicates so that it keeps one NULL for all NULLs when you apply the SELECT DISTINCT clause.
2) SELECT DISTINCT on multiple columns
The following statement applies the SELECT DISTINCT clause to both bcolor and fcolor columns:
SELECT
DISTINCT bcolor, fcolor
FROM
colors
ORDER BY
bcolor,
fcolor;Output:
bcolor | fcolor
--------+--------
blue | blue
green | green
red | red
red | null
null | red
null | null
(6 rows)In this example, the query uses the values from both bcolor and fcolor columns to evaluate the uniqueness of rows.
3) Using the SELECT DISTINCT clause in practice
In practice, you often use the SELECT DISTINCT clause to analyze the uniqueness of values in a column.
For example, you may want to know how many rental rates for films from the film table:
To achieve this, you can specify the rental_rate column in the SELECT DISTINCT clause as follows:
SELECT DISTINCT
rental_rate
FROM
film
ORDER BY
rental_rate;Output:
rental_rate
-------------
0.99
2.99
4.99
(3 rows)The output indicates that there are only three distinct rental rates 0.99, 2.99, and 4.99.
Note that for executing the query above your current database should be dvdrental. To switch the current database to dvdrental run:
\c dvdrentalSummary
- Use the
SELECT DISTINCTto remove duplicate rows from a result set of a query.