YouTube-like Short IDs as Postgres Primary Keys

I recently released pg-shortkey, a simple Postgres type library for generating YouTube-like IDs based on an approach previously presented by andyet.com. It’s a trivial algorithm at its core. This is how key candidates are generated:

-- 8 bytes gives a collision p = .5 after 5.1 x 10^9 values
gkey := encode(gen_random_bytes(8), 'base64');
gkey := replace(gkey, '/', '_');  -- url safe replacement
gkey := replace(gkey, '+', '-');  -- url safe replacement
key := rtrim(gkey, '=');          -- cut off padding

The comment gives the birthday problem collision probability after 109 keys. pg-shortkey does handle collisions and keeps trying until it finds an available key. At 109 keys, there’s a 50% chance this loop needs to run once more.

This library is designed for Postgres and uses primitives from pgcrypto. See the repo for install instructions.

Example Usage

CREATE TABLE test (id SHORTKEY PRIMARY KEY, name TEXT);

CREATE TRIGGER trigger_test_genid BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE shortkey_generate();

-- generate
INSERT INTO test(name) VALUES ('bob'), ('joe');

-- user-supplied ID
INSERT INTO test(id, name) VALUES ('1TNhBqYo-6Q', 'lisa');

SELECT * FROM test;

-- id           name
-- 4E_z0mHJvrk  bob
-- wiz_j0HIBuQ  joe
-- 1TNhBqYo-6Q  lisa

Caveats

SHORTKEYs are random, they fragment the index space like v4 UUIDs, decreasing performance though not to a noticeable degree. But unlike UUID (which provides an instance-local pseudo-sequential type via UUID v1 MC that is more index-friendly), that behavior can’t be changed. This is intentional. SHORTKEYs are supposed to be used for external facing IDs, like in APIs. They prevent enumeration and cardinality evaluation (e.g. how many things are there and what’s the next/previous thing - like YouTube). It doesn’t mean the internal table shouldn’t have another more efficient ID type with an index. Use where appropriate, the library is intentionally kept simple to allow easy customization.