Sab's

Postgres Partitions

Postgres partitions

A better resources is https://www.postgresql.org/docs/10/ddl-partitioning.html

Postgres let's us split a large table in to smaller entities, done for managebility, performance.

Typical usecase includes:

  • Improve query performance of frequently accessed data from fewer partitions.
  • Bulk delete or upload. Eg: Delete all records more than 3 months old.
  • Seperate stale data from actively used data.

Partitioning

Declarative

Declarative partitioning lets us specify the desired outcome without much details on how the partition happens behind the scenes.

For instance we have a large table with orders partitioned by country

Create the table to be partitioned
CREATE TABLE orders
(
    order_id   INT     NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    item_id    INT,
    country    VARCHAR NOT NULL,
    PRIMARY KEY (order_id, country)
) PARTITION BY LIST (country);

Please note that primary key is a combination of table's primary key (order_id) and the partition key (country). This is a limitation of partitioned table, partition key must be part of the primary key.

Create the partitions
CREATE TABLE orders_de PARTITION OF orders FOR VALUES IN ('de');
CREATE TABLE orders_in PARTITION OF orders FOR VALUES IN ('in');

Data is stored in the respective partition without caring about which partition the data is saved. In the following, data inserted into the orders are queriable from orders and also from the partition table orders_de.

INSERT INTO orders (order_id, item_id, country)  VALUES (12345, 4352, 'de');
SELECT * FROM orders;
SELECT * FROM orders_de;

Output

order_iddateitem_idcountry
123452021-01-05 16:17:20.3777784352de

Joins on partitioned tables

Lets introduce billing table which is also partitioned

CREATE TABLE billing
(
    billing_id INT,
    order_id   INT,
    price      INT,
    country    VARCHAR,

    PRIMARY KEY (billing_id, country),
    FOREIGN KEY (order_id, country) REFERENCES orders (order_id, country)
) PARTITION BY LIST (country);

CREATE TABLE billing_de PARTITION OF billing FOR VALUES IN ('de');
CREATE TABLE billing_in PARTITION OF billing FOR VALUES IN ('in');

For partition-wise joins to work (as intended), they should be equi-partitioned. For example, for the below query - the join will find matching partitions based on bounds. In this case, the join pairs are orders_de, billing_de and orders_in, billing_in. But if billing table had another partition billing_us, it will not be paired with anything.

SELECT * FROM orders
JOIN billing b ON orders.order_id = b.order_id AND orders.country = b.country;

Adding and removing partitions

Add or attach partition

If we started accepting orders from US, then this would be as easy as create a partition and attached the partition.

create table orders_us partition of orders for values in ('us');
create table billing_us partition of billing for values in ('us');

Other options is to create a table which is not a partition and attach it as a partition.

CREATE TABLE billing_us
(LIKE billing INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

ALTER TABLE billing ATTACH PARTITION billing_us
FOR VALUES IN ('us');
Remove partitions

An to remove a partition, dropping the partition works. However this requires getting an exclusive lock on the parent table. But detaching partitions is another options which would retain the table.

ALTER TABLE orders DETACH PARTITION orders_us;

Storage costs

Partitions are meant to make large (say > 100 GB) as manageble entities. There are instances where we don't need to access some tables actively. The less active data can be stored in slower storage devices to reduce cost. This is where tablespaces are useful, each partition can be created under different tablespaces.

#study notes #postgresql