Check out our new migration tool - Move your Postgres data to Neon by pasting your connection URL, no manual steps

The tablefunc extension

new

Reshape data with pivot tables and navigate hierarchical structures in Postgres

The tablefunc extension for Postgres provides a powerful set of functions for transforming data directly within your database. Its primary capabilities include creating pivot tables (also known as cross-tabulations) to reshape data, generating sets of normally distributed random numbers, and querying hierarchical or tree-like data structures.

For instance, you can use tablefunc to transform a list of quarterly product sales into a summary table where each product is a row and each quarter is a column. Or, you could explore an employee reporting structure to visualize an organization chart.

Try it on Neon!

Neon is Serverless Postgres built for the cloud. Explore Postgres features and functions in our user-friendly SQL editor. Sign up for a free account to get started.

Sign Up

Enable the tablefunc extension

You can enable the extension by running the following CREATE EXTENSION statement in the Neon SQL Editor or from a client such as psql that is connected to your Neon database.

CREATE EXTENSION IF NOT EXISTS tablefunc;

Version availability:

Please refer to the list of all extensions available in Neon for up-to-date extension version information.

Key functions and usage

The tablefunc extension provides the following key functions:

  1. normal_rand(): Generates a series of random numbers following a normal (Gaussian) distribution.
  2. crosstab(): Transforms data from a "long" format to a "wide" format, creating pivot tables.
  3. connectby(): Traverses hierarchical data, such as organizational charts or bill-of-materials structures.

Let's explore each function in detail.

normal_rand()

The normal_rand() function is useful for creating sample datasets that mimic real-world measurements often clustering around an average value (mean) with a certain spread (standard deviation).

Function signature:

normal_rand(count INTEGER, mean FLOAT8, stddev FLOAT8) RETURNS SETOF FLOAT8
  • count: The number of random values to generate.
  • mean: The central value (average) of the distribution.
  • stddev: The standard deviation, indicating the spread of the numbers.

Example:

To generate 5 random numbers with a mean of 10.0 and a standard deviation of 2.0:

SELECT * FROM normal_rand(5, 10.0, 2.0);

Example output:

normal_rand
-------------
  9.32020692360359
  11.495399206878934
  7.738467056884886
  9.672348520651616
  7.734973342540705
(5 rows)

Ouput will vary each time you run the function due to the random nature of the data.

Use case: Populating tables with realistic-looking sample data for testing or analysis.

crosstab()

The crosstab() function is used for reshaping data, particularly for creating pivot tables. It allows you to summarize and reorganize data by transforming rows into columns, making it easier to analyze and visualize.

Basic crosstab() (single SQL argument)

This version of crosstab takes a single SQL query string as input. This query must produce exactly three columns: row identifier, category, and value.

Consider a product_sales_long table:

productquartersales
AppleQ1100
AppleQ2120
BananaQ180
AppleQ3110
BananaQ295

We want to transform it into:

productQ1_salesQ2_salesQ3_sales
Apple100120110
Banana8095(null)

Query:

CREATE TABLE product_sales_long (
  product TEXT,
  quarter TEXT,
  sales INT
);

INSERT INTO product_sales_long (product, quarter, sales) VALUES
  ('Apple', 'Q1', 100),
  ('Apple', 'Q2', 120),
  ('Banana', 'Q1', 80),
  ('Apple', 'Q3', 110),
  ('Banana', 'Q2', 95);

-- Using crosstab to pivot the product_sales_long table
SELECT *
FROM crosstab(
  'SELECT product, quarter, sales FROM product_sales_long ORDER BY 1, 2'
) AS ct(product TEXT, Q1_sales INT, Q2_sales INT, Q3_sales INT);

Breaking down the query:

  1. crosstab('source_sql_query_as_string'):

    The source_sql_query_as_string must return three columns:

    • Row identifier: Values in this column become distinct rows in the output (e.g., product).
    • Category: Values in this column become new column headers in the output (e.g., quarter).
    • Value: Values in this column populate the cells of the new pivot table (e.g., sales).

    Crucially, this source query must be sorted by the first column, then the second (ORDER BY 1, 2). This ensures crosstab processes data correctly (e.g., Q1 comes before Q2).

  2. AS ct(column_definitions):

    • Because crosstab returns a generic SETOF record, you must explicitly define the structure of the output table.
    • ct: An alias for the resulting table.
    • product TEXT: Corresponds to the first column of the source_sql_query. Its data type should match.
    • Q1_sales INT, Q2_sales INT, Q3_sales INT: These are the new columns derived from the unique values in the 'category' (second) column of your source_sql_query. Their data types must match the 'value' (third) column of the source_sql_query.
    • If a row identifier/category combination doesn't exist in the source data (e.g., Banana for Q3), the corresponding cell in the pivot table will be NULL.
    • If the source data contains categories not defined in the AS ct(...) clause, those categories will be ignored.

crosstab() with fixed columns (using two SQL queries)

This version of crosstab is used when you know exactly which categories you want as your new columns, and you want them to appear in a specific order. It's perfect for reports where the column layout is fixed, even if some rows don't have data for every column.

Imagine you have a table of student_test_scores:

student_namesubjectscore
AliceMath90
AliceScience85
BobMath78
AliceEnglish92
BobScience88
CarolMath95
CarolEnglish89

We want to transform this into a table where each student is a row, and their scores for 'Math', 'Science', 'English', and 'History' are in separate columns.

Desired Output:

student_namemath_scorescience_scoreenglish_scorehistory_score
Alice908592(null)
Bob7888(null)(null)
Carol95(null)89(null)

Notice we want a 'History' column even if no one has a score for it yet – it will just show (null).

Here's how we do it with crosstab():

-- Create the student_test_scores table
CREATE TABLE student_test_scores (
    student_name TEXT,
    subject TEXT,
    score INT
);

INSERT INTO student_test_scores (student_name, subject, score) VALUES
    ('Alice', 'Math', 90),
    ('Alice', 'Science', 85),
    ('Bob', 'Math', 78),
    ('Alice', 'English', 92),
    ('Bob', 'Science', 88),
    ('Carol', 'Math', 95),
    ('Carol', 'English', 89);

-- Now, the crosstab query
SELECT *
FROM crosstab(
  -- Query 1: This is our source data.
  -- It needs: row_identifier, category_for_new_columns, value_for_cells
  'SELECT student_name, subject, score
   FROM student_test_scores
   ORDER BY 1', -- IMPORTANT: Order by the row_identifier (student_name)

  -- Query 2: This query defines our new column headers, in the order we want them.
  -- It must return one column with the list of categories.
  $$SELECT s FROM unnest(ARRAY['Math', 'Science', 'English', 'History']) AS s$$
) AS ct(
    student TEXT,          -- This matches 'student_name' from Query 1
    math_score INT,        -- This matches 'Math' from Query 2
    science_score INT,     -- This matches 'Science' from Query 2
    english_score INT,     -- This matches 'English' from Query 2
    history_score INT      -- This matches 'History' from Query 2
);

unnest() is a Postgres function that expands an array into a set of rows. In this case, it generates the list of subjects to be used as column headers in the pivot table. The ARRAY[...] syntax creates an array of the specified values, and unnest() converts it into a set of rows. This allows you to dynamically define the categories for the pivot table based on the contents of the array. Learn more about the unnest() function here: Expanding an array into rows

How the crosstab(source_sql, category_sql) works:

  1. source_sql (the first query string):

    • This query fetches your raw data.
    • It must provide:
      1. The column(s) that will identify each row in your final table (here, student_name).
      2. The column whose values will become your new column headers (here, subject).
      3. The column whose values will fill the cells of your new table (here, score).
    • It's very important to ORDER BY the row identifier column(s) (e.g., ORDER BY student_name or ORDER BY 1).
  2. category_sql (the second query string):

    • This query's job is to produce a single column containing the exact list of categories you want as your new column headers.
    • The order of categories returned by this query determines the order of your new columns in the final pivot table.
    • In our example, $$SELECT s FROM unnest(ARRAY['Math', 'Science', 'English', 'History']) AS s$$ provides the list: 'Math', then 'Science', then 'English', then 'History'.
  3. AS ct(student TEXT, math_score INT, ...):

    • This part defines the structure of your final output table.
    • The first column(s) here (student TEXT) must match the type and number of your row identifier columns from source_sql.
    • The following columns (math_score INT, science_score INT, etc.) must match, in order, the categories produced by category_sql. Their data type should match the value column from source_sql (the score column, which is INT).

This two-argument version of crosstab is powerful because it guarantees your output table will always have the columns 'Math', 'Science', 'English', and 'History' in that order, filling in (null) where a student doesn't have a score for a particular subject.

crosstabN() functions

For common scenarios where the row identifier is text and you need a fixed number of text value columns (2, 3, or 4), tablefunc offers crosstab2(), crosstab3(), and crosstab4(). These are simplified wrappers around the main crosstab function, providing predefined output structures for common text-based pivot tables, saving you from writing the full AS (...) definition.

These functions are most useful when your source query provides a text row identifier, text categories, and text values (or values castable to text). The crosstabN function then produces an output table with a row_name TEXT column and N additional category_X TEXT columns.

For instance, if you use crosstab3(), the output table structure will implicitly be: (row_name TEXT, category_1 TEXT, category_2 TEXT, category_3 TEXT)

No explicit AS (...) clause is needed. Remember that the source SQL query provided to crosstabN must still:

  1. Return three columns: row_identifier, category, value.
  2. Be sorted using ORDER BY 1, 2.
  3. The value column (third column of the source query) should be TEXT or cast to TEXT, as it populates the category_X TEXT output columns. The row_identifier (first column) also populates the row_name TEXT output column.

Example using crosstab3():

Let's use our product_sales_long table again:

productquartersales
AppleQ1100
AppleQ2120
BananaQ180
AppleQ3110
BananaQ295

To pivot this using crosstab3(), ensuring sales are treated as text for the output:

SELECT *
FROM crosstab3(
  $$SELECT product, quarter, sales::TEXT  -- Cast sales to TEXT
    FROM product_sales_long
    ORDER BY 1, 2$$  -- Important: ORDER BY row_id, category
);

Expected Output:

The output columns will be row_name, category_1, category_2, and category_3. The values from the quarter column (Q1, Q2, Q3 in sorted order) will determine which category_X column receives the sales data.

row_namecategory_1category_2category_3
Apple100120110
Banana8095(null)

Explanation:

  • crosstab3 automatically defines the output columns as row_name TEXT, category_1 TEXT, category_2 TEXT, and category_3 TEXT.
  • The product column from the source query populates the row_name output column. The sorted quarter values (Q1, Q2, Q3) correspond to category_1, category_2, and category_3 respectively.
  • The ORDER BY 1, 2 clause in the source query is essential for correct processing and mapping of quarter data to category columns.
  • Banana has sales data only for Q1 and Q2, so its third value column (category_3, corresponding to Q3 for Banana if it existed) is NULL.

connectby()

The connectby() function is designed to traverse tree-like or hierarchical data structures, such as product category trees, organizational charts, or bill-of-materials.

Consider a product_categories table that defines a hierarchy of product categories:

category_idcategory_nameparent_category_id
1ElectronicsNULL
2Computers1
3Laptops2
4Desktops2
5Phones1
6Smartphones5
7BooksNULL
8Fiction7

We want to display the hierarchy starting from the 'Electronics' category (ID 1).

Query:

CREATE TABLE product_categories (
  category_id INT PRIMARY KEY,
  category_name TEXT,
  parent_category_id INT
);

INSERT INTO product_categories (category_id, category_name, parent_category_id) VALUES
    (1, 'Electronics', NULL),
    (2, 'Computers', 1),
    (3, 'Laptops', 2),
    (4, 'Desktops', 2),
    (5, 'Phones', 1),
    (6, 'Smartphones', 5),
    (7, 'Books', NULL),
    (8, 'Fiction', 7);

-- Using connectby to traverse the product category hierarchy
SELECT *
FROM connectby(
  'product_categories',   -- 1. Table name
  'category_id',          -- 2. Key field column name
  'parent_category_id',   -- 3. Parent key field column name
  '1',                    -- 4. Start row's key value (e.g., 'Electronics' category_id)
  0,                      -- 5. Maximum depth (0 for all levels)
  '>'                     -- 6. Branch delimiter string for the branch_path
) AS t(
    current_category_id INT, -- Output: Current item's key field
    parent_id INT,           -- Output: Parent item's key field
    level INT,               -- Output: Depth in the hierarchy (0 for start_with row)
    branch_path TEXT         -- Output: Text path from root to current item
);

How connectby() works:

  • Parameters:

    1. table_name TEXT: Name of the table containing the hierarchy.
    2. key_field TEXT: Name of the column storing the unique ID for each item.
    3. parent_key_field TEXT: Name of the column storing the ID of the parent item.
    4. start_with_value TEXT: The key_field value of the item from which to start the traversal (must be provided as text).
    5. max_depth INTEGER: Maximum number of levels to traverse (0 means no limit).
    6. branch_delimiter TEXT: A string used to construct the branch_path output column.
  • Output Definition AS t(...): You must define the structure of the output table:

    • key_field_alias <type>: The key of the current item. Its data type should match the key_field in the source table (e.g., current_category_id INT).
    • parent_key_field_alias <type>: The key of the parent item. Its data type should match the parent_key_field (or key_field) in the source table (e.g., parent_id INT).
    • level <INTEGER>: The depth of the current item in the hierarchy (0 for the starting item, 1 for its direct children, and so on).
    • branch_path <TEXT>: If the branch_delimiter argument is provided to connectby, this column will contain a text representation of the path from the starting item to the current item, using the specified delimiter.

Example output:

current_category_idparent_idlevelbranch_path
1(null)01
2111>2
3221>2>3
4221>2>4
5111>5
6521>5>6
  • This output shows Electronics (ID 1) at level 0 with a branch_path of 1.
  • Computers (ID 2) is a sub-category of Electronics, at level 1, with branch_path 1>2.
  • Laptops (ID 3) is a sub-category of Computers, at level 2, with branch_path 1>2>3, and so on.

Important considerations

  • crosstab() output definition: You must always define the output columns and their types using the AS (...) clause when calling crosstab(). The number and types of these columns must match what your pivoted data will look like.
  • crosstab() category ordering: The order of columns generated by the single-argument crosstab depends on the ORDER BY clause of your source query and the natural sort order of the category values. For explicit column ordering and to ensure all desired categories appear, use the two-argument version of crosstab.
  • Data types: Pay close attention to data types. The types defined in the AS (...) clause for crosstab must match the 'value' column of the source query (for the pivoted value columns) and the row identifier column(s). For connectby, the key and parent key alias types in the AS t(...) clause must match the source table's corresponding column types.

Conclusion

The tablefunc extension in Postgres is a powerful tool for reshaping and analyzing data. It provides essential functions like normal_rand() for generating random numbers, crosstab() for creating pivot tables, and connectby() for traversing hierarchical data structures.

Resources

Need help?

Join our Discord Server to ask questions or see what others are doing with Neon. Users on paid plans can open a support ticket from the console. For more details, see Getting Support.

Last updated on

Was this page helpful?