> This page location: Extensions > lakebase_text
> Full Neon documentation index: https://neon.com/docs/llms.txt

> Summary: The lakebase_text extension adds the lakebase_bm25 index type to Neon Postgres for BM25 full-text search. It requires no migration from PostgreSQL's built-in full-text search — standard tsvector types and tsquery operators work unchanged. Use this page to enable the extension, create a lakebase_bm25 index, query with the <@> operator and to_bm25query function, configure the default_limit and prefilter GUCs, set fallback parameters at the index level, and reference all types, operators, functions, and index parameters.

# The lakebase_text extension

BM25 full-text search for Neon Postgres

**Coming Soon: Early access**

**lakebase_text** is currently in early access.

The `lakebase_text` extension adds a `lakebase_bm25` index type to Postgres for [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) full-text search. It is a native upgrade to PostgreSQL's built-in full-text search: standard `tsvector` type and query operators work unchanged; only the index type changes.

Lakebase Search is in private preview. To request access or learn about its architecture advantages, see [Lakebase Search](https://neon.com/docs/ai/lakebase-search).

## Why lakebase_text?

PostgreSQL's built-in full-text search uses GIN indexes with `tsvector`. GIN works well for boolean filtering, but it has two limitations for search relevance:

- **No BM25 ranking.** GIN uses `ts_rank`, which does not use global corpus statistics, so scores degrade as data grows. BM25 is more accurate, accounting for term frequency, document length, and corpus-wide statistics together.
- **No top-K pushdown.** GIN must score all matching documents even when you only need the top 10. For large tables, this means significant unnecessary work on every query.

`lakebase_bm25` adds a first-class BM25 index with Block-Max WAND top-K pushdown: the index returns only the K most relevant results directly, without scoring the entire match set. It fully preserves standard `tsvector` types and existing query operators. No application logic changes are required.

## Enable the lakebase_text extension

[Lakebase Search](https://neon.com/docs/ai/lakebase-search) must be enabled on your Neon project before you can install this extension. Once it's enabled, run the following statement in the [Neon SQL Editor](https://neon.com/docs/get-started/query-with-neon-sql-editor) or from a client such as [psql](https://neon.com/docs/connect/query-with-psql-editor):

```sql
CREATE EXTENSION IF NOT EXISTS lakebase_text;
```

`lakebase_text` requires Postgres 16 or later. It has no extension dependencies; unlike `lakebase_vector`, it does not require `pgvector`.

## Quick start

Create a table with a `tsvector` column and insert data:

```sql
CREATE TABLE documents (
    id serial PRIMARY KEY,
    passage text,
    vector tsvector
);

INSERT INTO documents (passage, vector) VALUES
('PostgreSQL is a powerful, open-source object-relational database system.', to_tsvector('english', 'PostgreSQL is a powerful, open-source object-relational database system.')),
('Full-text search is a technique for searching in plain-text documents.', to_tsvector('english', 'Full-text search is a technique for searching in plain-text documents.')),
('BM25 is a ranking function used by search engines to estimate document relevance.', to_tsvector('english', 'BM25 is a ranking function used by search engines to estimate document relevance.')),
('PostgreSQL provides advanced features like full-text search and window functions.', to_tsvector('english', 'PostgreSQL provides advanced features like full-text search and window functions.')),
('Effective ranking algorithms like BM25 improve information retrieval results.', to_tsvector('english', 'Effective ranking algorithms like BM25 improve information retrieval results.'));
```

Create a `lakebase_bm25` index on the `tsvector` column:

```sql
CREATE INDEX documents_passage_bm25 ON documents USING lakebase_bm25 (vector);
```

**Important: Create the index after inserting data**

`lakebase_bm25` computes corpus-wide statistics (document count, term frequencies) at index build time and updates them at VACUUM time. Create the index after your initial data load. After bulk-loading a large amount of new data, run `VACUUM` manually to keep BM25 scores accurate.

Set how many results the index returns and run a BM25 search:

```sql
SET lakebase_bm25.default_limit TO 5;

SELECT
  id,
  vector <@> to_bm25query(to_tsvector('english', 'PostgreSQL'), 'documents_passage_bm25') AS score
FROM documents
ORDER BY score
LIMIT 5;
```

The `<@>` operator calculates the negative BM25 score of a document against a query. Ordering by score ascending returns the most relevant documents first (lower negative score = higher relevance).

`to_bm25query` constructs a `bm25query_tsvector` value by combining the query `tsvector` with the object identifier of the BM25 index. The index identifier is required because BM25 scoring depends on corpus-wide statistics stored in the index.

## Configure default_limit

The `lakebase_bm25.default_limit` GUC controls how many results the index returns before PostgreSQL applies any `LIMIT` clause from your query. The default is `1000`.

```sql
-- Return at most 10 results from the index
SET lakebase_bm25.default_limit TO 10;
```

Setting this value to match your query's `LIMIT` avoids unnecessary work when you only need a small top-K result set.

## Fallback parameters

You can store search parameters directly in an index as storage parameters, rather than setting them per session or transaction with GUCs. This is useful when you have multiple indexes, prefer not to set GUCs, or want to configure search behavior offline.

Set `default_limit` at index creation:

```sql
CREATE INDEX documents_passage_bm25 ON documents USING lakebase_bm25 (vector)
WITH (default_limit = 5);
```

Queries against this index use `default_limit = 5` without requiring a `SET` command:

```sql
SELECT
  id,
  vector <@> to_bm25query(to_tsvector('english', 'PostgreSQL'), 'documents_passage_bm25') AS score
FROM documents
ORDER BY score
LIMIT 5;
```

Update a storage parameter on an existing index:

```sql
ALTER INDEX documents_passage_bm25 SET (default_limit = 10);
```

GUCs take precedence over index storage parameters when both are set. To avoid hard-to-diagnose behavior, use only one method at a time.

## Prefilter

In a filtered query, PostgreSQL applies `WHERE` conditions after the index scan returns results. If your filter eliminates many rows, the index may return far more results than your `LIMIT` requires.

The `lakebase_bm25.prefilter` GUC enables the index to evaluate filter conditions before computing BM25 scores, pruning the search space early:

```sql
SET lakebase_bm25.default_limit TO 5;
SET lakebase_bm25.prefilter = on;

SELECT
  id,
  vector <@> to_bm25query(to_tsvector('english', 'PostgreSQL'), 'documents_passage_bm25') AS score
FROM documents
WHERE id % 1000 = 0
ORDER BY score
LIMIT 5;
```

Prefilter is recommended when the filter is **strict** (eliminates many rows) or **unpredictable** (eliminates an unknown number of rows), and **cheap to evaluate** (much cheaper than computing BM25 scores). Enabling it for a loose or expensive filter may be slower than the default.

## Reference

### Types

| Type                 | Description                                                                                                   |
| :------------------- | :------------------------------------------------------------------------------------------------------------ |
| `bm25query_tsvector` | Combines a query `tsvector` with the object identifier of a BM25 index. Passed as the right operand to `<@>`. |

### Operators

| Operator | Arguments                        | Result   | Description                                                                                                                               |
| :------- | :------------------------------- | :------- | :---------------------------------------------------------------------------------------------------------------------------------------- |
| `<@>`    | `tsvector`, `bm25query_tsvector` | `double` | Returns the negative BM25 score of a document against a query, in the context of the BM25 index. Order ascending for most-relevant-first. |

### Operator classes

| Operator class      | Default | Operator                            |
| :------------------ | :------ | :---------------------------------- |
| `tsvector_bm25_ops` | Yes     | `<@>(tsvector, bm25query_tsvector)` |

### Functions

| Function                                       | Returns              | Description                                                                                          |
| :--------------------------------------------- | :------------------- | :--------------------------------------------------------------------------------------------------- |
| `to_bm25query(query tsvector, index regclass)` | `bm25query_tsvector` | Constructs a `bm25query_tsvector` from a query `tsvector` and the object identifier of a BM25 index. |

### Index storage parameters

| Parameter       | Type    | Default | Domain       | Description                                                                      |
| :-------------- | :------ | :------ | :----------- | :------------------------------------------------------------------------------- |
| `k1`            | real    | `1.2`   | `[1.2, 2.0]` | BM25 `k1` parameter. Controls term frequency saturation.                         |
| `b`             | real    | `0.75`  | `[0.0, 1.0]` | BM25 `b` parameter. Controls document length normalization.                      |
| `default_limit` | integer | `1000`  | `[1, 65535]` | Fallback value for `lakebase_bm25.default_limit`. GUCs take precedence when set. |
| `prefilter`     | boolean | `false` | —            | Fallback value for `lakebase_bm25.prefilter`. GUCs take precedence when set.     |

### Search parameters (GUCs)

| GUC                           | Type    | Default | Description                                                                                           |
| :---------------------------- | :------ | :------ | :---------------------------------------------------------------------------------------------------- |
| `lakebase_bm25.default_limit` | integer | `1000`  | Controls how many results the index returns. Set to match your query's `LIMIT` for best performance.  |
| `lakebase_bm25.prefilter`     | boolean | `false` | Enables filter evaluation before BM25 score computation. Recommended for strict, cheap filters.       |
| `lakebase_bm25.enable_scan`   | boolean | `on`    | Enables or disables `lakebase_bm25` index scans. Set to `off` for testing to force a sequential scan. |

---

## Related docs (Extensions)

- [Extension explorer](https://neon.com/docs/extensions/extension-explorer)
- [anon](https://neon.com/docs/extensions/postgresql-anonymizer)
- [btree_gin](https://neon.com/docs/extensions/btree_gin)
- [btree_gist](https://neon.com/docs/extensions/btree_gist)
- [citext](https://neon.com/docs/extensions/citext)
- [cube](https://neon.com/docs/extensions/cube)
- [dblink](https://neon.com/docs/extensions/dblink)
- [dict_int](https://neon.com/docs/extensions/dict_int)
- [earthdistance](https://neon.com/docs/extensions/earthdistance)
- [fuzzystrmatch](https://neon.com/docs/extensions/fuzzystrmatch)
- [hstore](https://neon.com/docs/extensions/hstore)
- [intarray](https://neon.com/docs/extensions/intarray)
- [lakebase_vector](https://neon.com/docs/extensions/lakebase-vector)
- [ltree](https://neon.com/docs/extensions/ltree)
- [neon](https://neon.com/docs/extensions/neon)
- [neon_utils](https://neon.com/docs/extensions/neon-utils)
- [online_advisor](https://neon.com/docs/extensions/online_advisor)
- [pgcrypto](https://neon.com/docs/extensions/pgcrypto)
- [pgvector](https://neon.com/docs/extensions/pgvector)
- [pgrag](https://neon.com/docs/extensions/pgrag)
- [pg_cron](https://neon.com/docs/extensions/pg_cron)
- [pg_graphql](https://neon.com/docs/extensions/pg_graphql)
- [pg_mooncake](https://neon.com/docs/extensions/pg_mooncake)
- [pg_partman](https://neon.com/docs/extensions/pg_partman)
- [pg_prewarm](https://neon.com/docs/extensions/pg_prewarm)
- [pg_session_jwt](https://neon.com/docs/extensions/pg_session_jwt)
- [pg_stat_statements](https://neon.com/docs/extensions/pg_stat_statements)
- [pg_repack](https://neon.com/docs/extensions/pg_repack)
- [pg_search](https://neon.com/docs/extensions/pg_search)
- [pg_tiktoken](https://neon.com/docs/extensions/pg_tiktoken)
- [pg_trgm](https://neon.com/docs/extensions/pg_trgm)
- [pg_uuidv7](https://neon.com/docs/extensions/pg_uuidv7)
- [pgrowlocks](https://neon.com/docs/extensions/pgrowlocks)
- [pgstattuple](https://neon.com/docs/extensions/pgstattuple)
- [plv8](https://neon.com/docs/extensions/plv8)
- [postgis](https://neon.com/docs/extensions/postgis)
- [postgis-related](https://neon.com/docs/extensions/postgis-related-extensions)
- [postgres_fdw](https://neon.com/docs/extensions/postgres_fdw)
- [tablefunc](https://neon.com/docs/extensions/tablefunc)
- [timescaledb](https://neon.com/docs/extensions/timescaledb)
- [unaccent](https://neon.com/docs/extensions/unaccent)
- [uuid-ossp](https://neon.com/docs/extensions/uuid-ossp)
- [wal2json](https://neon.com/docs/extensions/wal2json)
- [xml2](https://neon.com/docs/extensions/xml2)
