I recently released pg-shortkey, a simple Postgres type library for generating YouTube-like IDs based on an approach previously presented by andyet.com. It’s a trivial algorithm at its core. This is how key candidates are generated:
-- 8 bytes gives a collision p = .5 after 5.1 x 10^9 values gkey := encode(gen_random_bytes(8), 'base64'); gkey := replace(gkey, '/', '_'); -- url safe replacement gkey := replace(gkey, '+', '-'); -- url safe replacement key := rtrim(gkey, '='); -- cut off padding
The comment gives the birthday problem collision probability after 109 keys. pg-shortkey does handle collisions and keeps trying until it finds an available key. At 109 keys, there’s a 50% chance this loop needs to run once more.
This library is designed for Postgres and uses primitives from
pgcrypto. See the repo for install instructions.
CREATE TABLE test (id SHORTKEY PRIMARY KEY, name TEXT); CREATE TRIGGER trigger_test_genid BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE shortkey_generate(); -- generate INSERT INTO test(name) VALUES ('bob'), ('joe'); -- user-supplied ID INSERT INTO test(id, name) VALUES ('1TNhBqYo-6Q', 'lisa'); SELECT * FROM test; -- id name -- 4E_z0mHJvrk bob -- wiz_j0HIBuQ joe -- 1TNhBqYo-6Q lisa
SHORTKEYs are random, they fragment the index space like v4 UUIDs, decreasing performance though not to a noticeable degree. But unlike UUID (which provides an instance-local pseudo-sequential type via UUID v1 MC that is more index-friendly), that behavior can’t be changed. This is intentional.
SHORTKEYs are supposed to be used for external facing IDs, like in APIs. They prevent enumeration and cardinality evaluation (e.g. how many things are there and what’s the next/previous thing - like YouTube). It doesn’t mean the internal table shouldn’t have another more efficient ID type with an index. Use where appropriate, the library is intentionally kept simple to allow easy customization.