--- title: 'PostgreSQL Multicolumn Indexes' page_title: 'PostgreSQL Multicolumn Indexes' page_description: 'In this tutorial, you will learn how to create multicolumn indexes which are indexes defined on more than one column of a table.' prev_url: 'https://www.postgresqltutorial.com/postgresql-indexes/postgresql-multicolumn-indexes/' ogImage: '' updatedOn: '2024-02-28T09:28:50+00:00' enableTableOfContents: true previousLink: title: 'PostgreSQL Partial Index' slug: 'postgresql-indexes/postgresql-partial-index' nextLink: title: 'PostgreSQL REINDEX' slug: 'postgresql-indexes/postgresql-reindex' --- **Summary**: in this tutorial, you will learn how to create PostgreSQL multicolumn indexes, which are indexes defined on two or more columns of a table. ## Introduction to PostgreSQL multicolumn indexes When you [create an index](postgresql-create-index) on two or more columns within a table, this type of index is called a multicolumn index. A multicolumn index is often referred to as a composite index, a combined index, or a concatenated index. A multicolumn index can have a maximum of 32 columns. The limit can be adjusted by modifying the `pg_config_manual.h` file when building PostgreSQL source code. Additionally, only B\-tree, GIST, GIN, and BRIN index types support multicolumn indexes. The following shows the syntax for creating a multicolumn index: ```sql CREATE INDEX [IF NOT EXISTS] index_name ON table_name(column1, column2, ...); ``` In this syntax: - First, specify the index name in the `CREATE INDEX` clause. Use the `IF NOT EXISTS` option to prevent an error from creating an index whose name already exists. - Second, provide the table name along with the index columns in the parenthesis. When defining a multicolumn index, you should place the columns that are frequently used in the [`WHERE`](../postgresql-tutorial/postgresql-where) clause at the beginning of the column list, followed by the columns that are less frequently used in the `WHERE` clause. In general, the query optimizer can use the index when the query’s conditions involve the index’s leading (leftmost) column. For example, if you have an index on `(column1, column2, column3)`, it will be considered for queries such as: ```sql WHERE column1 = v1 AND column2 = v2 AND column3 = v3; ``` Or ```sql WHERE column1 = v1 AND column2 = v2; ``` Or ```sql WHERE column1 = v1; ``` In these cases, the condition on `column1` (and optionally on `column2`) allows PostgreSQL to efficiently narrow down the portion of the index that needs to be scanned. However, if a query does _not_ constrain the first column of the index, PostgreSQL must evaluate whether a full index scan on this index is more efficient than alternative indexes or a table scan. For instance, consider a query with only later columns in the `WHERE` clause: ```sql WHERE column3 = v3; ``` or ```sql WHERE column2 = v2 and column3 = v3; ``` Note that you can also use the `WHERE` clause to define a partially multicolumn index. In such scenarios, PostgreSQL will still consider using the index, but scanning the whole index may have a higher cost than other options. The planner evaluates different execution paths, and if an index scan is not the most efficient, it may choose a sequential scan instead. You can also use a `WHERE` clause to define a partial multicolumn index (an index on multiple columns that only includes rows satisfying a given condition). ## PostgreSQL Multicolumn Index example First, [create a new table](../postgresql-tutorial/postgresql-create-table) called `people` using the following `CREATE TABLE` statement: ```sql CREATE TABLE people ( id INT GENERATED BY DEFAULT AS IDENTITY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL ); ``` The `people` table consists of three columns: id, first name, and last name. Second, execute the `INSERT` statement in the following file to load `10,000` rows into the `people` table: [Script to load 10000 names](/postgresqltutorial/Script-to-load-10000-names.txt) Third, show the query plan that finds the person whose last name is `Adams`: ```sql EXPLAIN SELECT id, first_name, last_name FROM people WHERE last_name = 'Adams'; ``` Here is the output: ```text QUERY PLAN --------------------------------------------------------- Seq Scan on people (cost=0.00..83.88 rows=9 width=240) Filter: ((last_name)::text = 'Adams'::text) (2 rows) ``` The output indicates that PostgreSQL performs a sequential scan on the `people` table to find the matching rows because there is no index defined for the `last_name` column. Fourth, create a multicolumn index that includes both the `last_name` and `first_name` columns. Assuming that searching for people by their last name is more common than by their first name, we define the index with the following column order: ```sql CREATE INDEX idx_people_names ON people (last_name, first_name); ``` Fifth, show the plan of the query that searches for the person whose last name is `Adams` (using the new index): ```sql EXPLAIN SELECT id, first_name, last_name FROM people WHERE last_name = 'Adams'; ``` Output: ```text QUERY PLAN -------------------------------------------------------------------------------- Bitmap Heap Scan on people (cost=4.42..44.07 rows=18 width=17) Recheck Cond: ((last_name)::text = 'Adams'::text) -> Bitmap Index Scan on idx_people_names (cost=0.00..4.42 rows=18 width=0) Index Cond: ((last_name)::text = 'Adams'::text) (4 rows) ``` The output indicates that the query optimizer uses the `idx_people_names` index for the `last_name = 'Adams'` query. Sixth, find the person whose first name is `Lou` and last name is `Adams`: ```sql EXPLAIN SELECT id, first_name, last_name FROM people WHERE last_name = 'Adams' AND first_name = 'Lou'; ``` Output: ```text QUERY PLAN -------------------------------------------------------------------------------------------- Index Scan using idx_people_names on people (cost=0.29..8.30 rows=1 width=17) Index Cond: (((last_name)::text = 'Adams'::text) AND ((first_name)::text = 'Lou'::text)) (2 rows) ``` The output indicates that the query optimizer will use the index because both columns in the `WHERE` clause (`first_name` and `last_name`) are included in the index. Seventh, search for the person whose first name is `Lou`: ```sql EXPLAIN SELECT id, first_name, last_name FROM people WHERE first_name = 'Lou'; ``` Output: ``` QUERY PLAN ---------------------------------------------------------- Seq Scan on people (cost=0.00..186.00 rows=32 width=17) Filter: ((first_name)::text = 'Lou'::text) (2 rows) ``` The output indicates that PostgreSQL performs a sequential scan instead of using the index, because the first column of the index (`last_name`) is not constrained. Since using the index would require scanning all its entries, the planner determined that a sequential scan on the table was more efficient. ## Summary - Use a PostgreSQL multicolumn index to define an index involving two or more columns from a table. - Place the columns that are frequently used in the `WHERE` clause at the beginning of the column list of the multicolumn index.