> This page location: Extensions > pgrag
> Full Neon documentation index: https://neon.com/docs/llms.txt

# The pgrag extension

Create end-to-end Retrieval-Augmented Generation (RAG) pipelines

**What you will learn:**

- What is RAG?
- What's included in a RAG pipeline?
- `pgrag` functions
- How to use `pgrag`

**Related resources**

- [The pgvector extension](https://neon.com/docs/extensions/pgvector)
- [YouTube: pgrag video demonstration](https://www.youtube.com/watch?v=QDNsxw_3ris\&t=1356s)

**Source code**

- [pgrag GitHub repository](https://github.com/neondatabase-labs/pgrag)

The `pgrag` extension and its accompanying model extensions are designed for creating end-to-end Retrieval-Augmented Generation (RAG) pipelines without leaving your SQL client. No additional programming languages or libraries are required. With functions provided by `pgrag` and a Postgres database with `pgvector`, you can build a complete RAG pipeline via SQL.

**Info: Experimental Feature**

The `pgrag` extension is experimental and actively being developed. Use it with caution as functionality may change.

## What is RAG?

**RAG stands for Retrieval-Augmented Generation**. It's the search for information relevant to a question that includes information alongside the question in a prompt to an AI chat model. For example, "_ChatGPT, please answer questions x using information Y_".

---

## What's included in a RAG pipeline?

A RAG pipeline includes a number of steps, as illustrated in the following diagram.

![The steps in a RAG pipeline](https://neon.com/docs/extensions/rag_pipeline.jpg)

The steps outlined above can be organized into two main stages:

1. **Preparing and indexing the information**:
   1. Load documents and extract text
   2. Split documents into chunks
   3. Generate embeddings for chunks
   4. Store the embeddings alongside chunks
2. **Handling incoming questions**: 5. Vectorize question 6. Use question embedding to find relevant document chunks 7. Retrieve document chunks from database 8. Rerank and take only best-match chunks to answer question 9. Prompt with question + relevant document chunks to answer question 10. Generated answer

---

## What does pgrag support?

With the exception of (4) storing embeddings in the database and (7) Retrieve document chunks from database, which is supported by Postgres with `pgvector`, `pgrag` supports all of the steps listed above. Specifically, `pgrag` supports:

- **Text extraction and conversion**
  - Simple text extraction from PDF documents (using [pdf-extract](https://github.com/jrmuizel/pdf-extract)). Currently, there is no Optical Character Recognition (OCR) or support for complex layout and formatting.
  - Simple text extraction from `.docx` documents (using [docx-rs](https://github.com/cstkingkey/docx-rs)).
  - HTML conversion to Markdown (using [htmd](https://github.com/letmutex/htmd)).

- **Text chunking**
  - Text chunking by character count (using [text-splitter](https://github.com/benbrandt/text-splitter)).
  - Text chunking by token count (also using [text-splitter](https://github.com/benbrandt/text-splitter)).

- **Local embedding and reranking models**

  - Local tokenising + embedding generation with 33M parameter model [bge-small-en-v1.5](https://huggingface.co/Xenova/bge-small-en-v1.5) (using [ort](https://github.com/pykeio/ort) via [fastembed](https://github.com/Anush008/fastembed-rs)).
  - Local tokenising + reranking with 33M parameter model [jina-reranker-v1-tiny-en](https://huggingface.co/jinaai/jina-reranker-v1-tiny-en) (also using [ort](https://github.com/pykeio/ort) via [fastembed](https://github.com/Anush008/fastembed-rs)).

  **Note:** These models run locally on your Postgres server. They are packaged as separate extensions that accompany `pgrag`, because they are large (>100MB), and because we may want to add support for more models in future in the form of additional `pgrag` model extensions.

- **Remote embedding and chat models**

---

## Installation

**Warning:**

As an experimental extension, `pgrag` may be unstable or introduce backward-incompatible changes. We recommend using it only in a separate, dedicated Neon project. To proceed with the installation, you will need to run the following command first:

```sql
SET neon.allow_unstable_extensions='true';
```

To install `pgrag` to a Neon Postgres database, run the following commands:

```sql
create extension if not exists rag cascade;
create extension if not exists rag_bge_small_en_v15 cascade;
create extension if not exists rag_jina_reranker_v1_tiny_en cascade;
```

The first extension is the `pgrag` extension. The other two extensions are the model extensions for local tokenising, embedding generation, and reranking. The three extensions have no dependencies on each other, but all depend on `pgvector`. Specifying `cascade` ensures that `pgvector` is installed.

---

## pgrag functions

This section lists the functions provided by `pgrag`. For function usage examples, refer to the [end-to-end RAG example](https://neon.com/docs/extensions/pgrag#end-to-end-rag-example) below or the [pgrag GitHub repository](https://github.com/neondatabase-labs/pgrag).

- **Text extraction**

  These functions extract text from PDFs, Word files, and HTML.

  - `rag.text_from_pdf(bytea) -> text`
  - `rag.text_from_docx(bytea) -> text`
  - `rag.markdown_from_html(text) -> text`

- **Splitting text into chunks**

  These functions split the extracted text into chunks by character count or token count.

  - `rag.chunks_by_character_count(text, max_chars, overlap) -> text[]`
  - `rag_bge_small_en_v15.chunks_by_token_count(text, max_tokens, overlap) -> text[]`

- **Generating embeddings for chunks**

  These functions generate embeddings for chunks either directly in the extension using a small but best-in-class model on the database server or by calling out to a 3rd-party API such as OpenAI.

  - `rag_bge_small_en_v15.embedding_for_passage(text) -> vector(384)`
  - `rag.openai_text_embedding_3_small(text) -> vector(1536)`

- **Generating embeddings for questions**

  These functions generate embeddings for the questions.

  - `rag_bge_small_en_v15.embedding_for_query(text) -> vector(384)`
  - `rag.openai_text_embedding_3_small(text) -> vector(1536)`

- **Reranking**

  This function reranks chunks against the question using a small but best-in-class model that runs locally on your Postgres server.

  - `rag_jina_reranker_v1_tiny_en.rerank_distance(text, text) -> real`

- **Calling out to chat models**

  This function makes API calls to AI chat models such as ChatGPT to generate an answer using the question and the chunks together.

  - `rag.openai_chat_completion(json) -> json`

---

## End-to-end RAG example

**1. Create a `docs` table and ingest some PDF documents as text**

```sql
drop table docs cascade;
create table docs
( id int primary key generated always as identity
, name text not null
, fulltext text not null
);

\set contents `base64 < /path/to/first.pdf`
insert into docs (name, fulltext)
values ('first.pdf', rag.text_from_pdf(decode(:'contents','base64')));

\set contents `base64 < /path/to/second.pdf`
insert into docs (name, fulltext)
values ('second.pdf', rag.text_from_pdf(decode(:'contents','base64')));

\set contents `base64 < /path/to/third.pdf`
insert into docs (name, fulltext)
values ('third.pdf', rag.text_from_pdf(decode(:'contents','base64'))));
```

**2. Create an `embeddings` table, chunk the text, and generate embeddings for the chunks (performed locally)**

```sql
drop table embeddings;
create table embeddings
( id int primary key generated always as identity
, doc_id int not null references docs(id)
, chunk text not null
, embedding vector(384) not null
);

create index on embeddings using hnsw (embedding vector_cosine_ops);

with chunks as (
  select id, unnest(rag_bge_small_en_v15.chunks_by_token_count(fulltext, 192, 8)) as chunk
  from docs
)
insert into embeddings (doc_id, chunk, embedding) (
  select id, chunk, rag_bge_small_en_v15.embedding_for_passage(chunk) from chunks
);
```

**3. Query the embeddings and rerank the results (performed locally)**

```sql
\set query 'what is [...]? how does it work?'

with ranked as (
  select
    id, doc_id, chunk, embedding <=> rag_bge_small_en_v15.embedding_for_query(:'query') as cosine_distance
  from embeddings
  order by cosine_distance
  limit 10
)
select *, rag_jina_reranker_v1_tiny_en.rerank_distance(:'query', chunk)
from ranked
order by rerank_distance;
```

**4. Feed the query and top chunks to a remote AI chat model such as ChatGPT to complete the RAG pipeline**

````sql
\set query 'what is [...]? how does it work?'

with ranked as (
  select
    id, doc_id, chunk, embedding <=> rag_bge_small_en_v15.embedding_for_query(:'query') as cosine_distance
  from embeddings
  order by cosine_distance
  limit 10
),
reranked as (
  select *, rag_jina_reranker_v1_tiny_en.rerank_distance(:'query', chunk)
  from ranked
  order by rerank_distance limit 5
)
select rag.openai_chat_completion(json_object(
  'model': 'gpt-4o-mini',
  'messages': json_array(
    json_object(
      'role': 'system',
      'content': E'The user is [...].\n\nTry to answer the user''s QUESTION using only the provided CONTEXT.\n\nThe CONTEXT represents extracts from [...] which have been selected as most relevant to this question.\n\nIf the context is not relevant or complete enough to confidently answer the question, your best response is: "I''m afraid I don''t have the information to answer that question".'
    ),
    json_object(
      'role': 'user',
      'content': E'# CONTEXT\n\n```\n' || string_agg(chunk, E'\n\n') || E'\n```\n\n# QUESTION\n\n```\n' || :'query' || E'```'
    )
  )
)) -> 'choices' -> 0 -> 'message' -> 'content' as answer
from reranked;
````

---

## Related docs (Extensions)

- [Extension explorer](https://neon.com/docs/extensions/extension-explorer)
- [anon](https://neon.com/docs/extensions/postgresql-anonymizer)
- [btree_gin](https://neon.com/docs/extensions/btree_gin)
- [btree_gist](https://neon.com/docs/extensions/btree_gist)
- [citext](https://neon.com/docs/extensions/citext)
- [cube](https://neon.com/docs/extensions/cube)
- [dblink](https://neon.com/docs/extensions/dblink)
- [dict_int](https://neon.com/docs/extensions/dict_int)
- [earthdistance](https://neon.com/docs/extensions/earthdistance)
- [fuzzystrmatch](https://neon.com/docs/extensions/fuzzystrmatch)
- [hstore](https://neon.com/docs/extensions/hstore)
- [intarray](https://neon.com/docs/extensions/intarray)
- [ltree](https://neon.com/docs/extensions/ltree)
- [neon](https://neon.com/docs/extensions/neon)
- [neon_utils](https://neon.com/docs/extensions/neon-utils)
- [online_advisor](https://neon.com/docs/extensions/online_advisor)
- [pgcrypto](https://neon.com/docs/extensions/pgcrypto)
- [pgvector](https://neon.com/docs/extensions/pgvector)
- [pg_cron](https://neon.com/docs/extensions/pg_cron)
- [pg_graphql](https://neon.com/docs/extensions/pg_graphql)
- [pg_mooncake](https://neon.com/docs/extensions/pg_mooncake)
- [pg_partman](https://neon.com/docs/extensions/pg_partman)
- [pg_prewarm](https://neon.com/docs/extensions/pg_prewarm)
- [pg_session_jwt](https://neon.com/docs/extensions/pg_session_jwt)
- [pg_stat_statements](https://neon.com/docs/extensions/pg_stat_statements)
- [pg_repack](https://neon.com/docs/extensions/pg_repack)
- [pg_search](https://neon.com/docs/extensions/pg_search)
- [pg_tiktoken](https://neon.com/docs/extensions/pg_tiktoken)
- [pg_trgm](https://neon.com/docs/extensions/pg_trgm)
- [pg_uuidv7](https://neon.com/docs/extensions/pg_uuidv7)
- [pgrowlocks](https://neon.com/docs/extensions/pgrowlocks)
- [pgstattuple](https://neon.com/docs/extensions/pgstattuple)
- [postgis](https://neon.com/docs/extensions/postgis)
- [postgis-related](https://neon.com/docs/extensions/postgis-related-extensions)
- [postgres_fdw](https://neon.com/docs/extensions/postgres_fdw)
- [tablefunc](https://neon.com/docs/extensions/tablefunc)
- [timescaledb](https://neon.com/docs/extensions/timescaledb)
- [unaccent](https://neon.com/docs/extensions/unaccent)
- [uuid-ossp](https://neon.com/docs/extensions/uuid-ossp)
- [wal2json](https://neon.com/docs/extensions/wal2json)
- [xml2](https://neon.com/docs/extensions/xml2)