I recently released pg-shortkey, a simply Postgres type library for generating YouTube-like IDs based on an approach previously presented by andyet.com. It’s a fairly trivial algorithm at its core. This is how key candidates are generated:
-- 8 bytes gives a collision p = .5 after 5.1 x 10^9 values gkey := encode(gen_random_bytes(8), 'base64'); gkey := replace(gkey, '/', '_'); -- url safe replacement gkey := replace(gkey, '+', '-'); -- url safe replacement key := rtrim(gkey, '='); -- cut off padding
The comment gives the birthday problem collision probability after 109 keys. pg-shortkey does handle collisions automatically and keeps trying until it finds a free key. At 109 keys, there’s a 50% chance this loop needs to run once more (etc.).
This library is designed for Postgres and uses primitives from
pgcrypto. See the repo for install instructions.
CREATE TABLE test (id SHORTKEY PRIMARY KEY, name TEXT); CREATE TRIGGER trigger_test_genid BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE shortkey_generate(); -- generate INSERT INTO test(name) VALUES ('bob'), ('joe'); -- user-supplied ID INSERT INTO test(id, name) VALUES ('1TNhBqYo-6Q', 'lisa'); SELECT * FROM test; -- id name -- 4E_z0mHJvrk bob -- wiz_j0HIBuQ joe -- 1TNhBqYo-6Q lisa
SHORTKEYs are truly random, so they fragment the index space just like v4 UUIDs, decreasing performance slightly. But unlike UUID (which provides an instance-local pseudo-sequential type via UUID v1 MC that is more index-friendly), that behavior can’t be changed. This is intentional. SHORTKEYs are supposed to be used for external facing IDs, like in APIs. There, they prevent enumeration and cardinality evaluation (e.g. how many things are there and what’s the next/previous thing - just like YouTube). It doesn’t mean the internal table shouldn’t have another more efficient ID type with an index. Use where appropriate, the library is intentionally kept simple to allow easy customization.