How to code a URL shortener?
I want to create a URL shortener service where you can enter a long URL into an input field and the service shortens it to "http://www.example.org/abcdef". Instead of "abcdef" there can be any other six-character string made up of a-z, A-Z and 0-9. That makes about 56.8 billion possible strings (62^6 = 56,800,235,584).
My approach:
I have a database table with three columns:
- id, integer, auto-increment
- long, string, the long URL the user entered
- short, string, the shortened URL (or just the six characters)
I would then insert the long URL into the table. Then I would select the auto-increment value for "id" and build a hash of it. This hash should then be inserted as "short". But what sort of hash should I build? Hash algorithms like MD5 create strings that are too long, so I don't think I should use them. A self-built algorithm would work, too.
My idea:
For "http://www.google.de/" I get the auto-increment id 239472. Then I do the following steps:
    short = ''
    if divisible by 2, add "a" + the result to short
    if divisible by 3, add "b" + the result to short
    ... until I have divisors for a-z and A-Z.
That could be repeated until the number isn’t divisible any more. Do you think this is a good approach? Do you have a better idea?
Edit: Due to the ongoing interest in this topic, I’ve uploaded the code that I used to GitHub, with implementations for Java, PHP and JavaScript. Add your solutions if you like :)
I would continue your "convert number to string" approach. However, you will realize that your proposed algorithm fails if your ID is a prime greater than 52.
Theoretical background
You need a bijective function f. This is necessary so that you can find an inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:
- There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
- and for every y you must be able to find an x so that f(x) = y.
How to convert the ID to a shortened URL
- Think of an alphabet we want to use. In your case that's [a-zA-Z0-9]. It contains 62 letters.
- Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table, for example). For this example I will use 125₁₀ (125 with a base of 10).
- Now you have to convert 125₁₀ to X₆₂ (base 62).
  125₁₀ = 2×62¹ + 1×62⁰ = [2,1]
  This requires use of integer division and modulo. A pseudo-code example:

    digits = []
    while num > 0
      remainder = modulo(num, 62)
      digits.push(remainder)
      num = divide(num, 62)
    digits = digits.reverse

Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array, for example) could look:

    0 → a
    1 → b
    ...
    25 → z
    ...
    52 → 0
    61 → 9

With 2 → c and 1 → b you will receive cb₆₂ as the shortened URL.
http://shor.ty/cb
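The conversion steps above can be sketched in Python as follows. This is a minimal illustration, not the code from the linked repository: the names `ALPHABET`, `BASE` and `encode` are my own, and the special case for ID 0 is an addition, since the pseudo-code loop would otherwise yield an empty string.

```python
import string

# Alphabet order follows the mapping above: a-z, then A-Z, then 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62


def encode(num: int) -> str:
    """Convert a non-negative integer ID to its base-62 string."""
    if num < 0:
        raise ValueError("ID must be non-negative")
    if num == 0:
        # Assumption: map 0 to the first letter instead of an empty string.
        return ALPHABET[0]
    digits = []
    while num > 0:
        num, remainder = divmod(num, BASE)
        digits.append(remainder)
    # Remainders come out least-significant first, so reverse them.
    return "".join(ALPHABET[d] for d in reversed(digits))


print(encode(125))    # cb
print(encode(19158))  # e9a
```

With an auto-increment ID, short strings grow in length only as the ID grows, so six characters are reached once the ID passes 62^5.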
How to resolve a shortened URL to the initial ID
The reverse is even easier. You just do a reverse lookup in your alphabet.
- e9a₆₂ will be resolved to "4th, 61st, and 0th letter in alphabet".
  e9a₆₂ = [4,61,0] = 4×62² + 61×62¹ + 0×62⁰ = 19158₁₀
- Now find your database record with WHERE id = 19158 and do the redirect.
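A possible sketch of the reverse direction in Python, including the database lookup. Everything named here is an assumption for illustration: the `decode` and `resolve` helpers, the `urls` table, its `long_url` column, and the in-memory SQLite database standing in for whatever database you actually use.

```python
import sqlite3
import string

# Same alphabet order as in the mapping: a-z, then A-Z, then 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}  # reverse lookup table


def decode(short: str) -> int:
    """Convert a base-62 string back to the integer ID."""
    if not short:
        raise ValueError("short string must not be empty")
    num = 0
    for ch in short:
        if ch not in INDEX:
            raise ValueError(f"invalid character: {ch!r}")
        num = num * BASE + INDEX[ch]
    return num


def resolve(conn: sqlite3.Connection, short: str):
    """Return the long URL for a short string, or None if unknown."""
    row = conn.execute(
        "SELECT long_url FROM urls WHERE id = ?", (decode(short),)
    ).fetchone()
    return row[0] if row else None


# Hypothetical table and row, only to make the example runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, long_url TEXT NOT NULL)")
conn.execute("INSERT INTO urls (id, long_url) VALUES (?, ?)",
             (19158, "http://www.google.de/"))

print(decode("e9a"))         # 19158
print(resolve(conn, "e9a"))  # http://www.google.de/
```

Note that a leading "a" behaves like a leading zero: "acb" decodes to the same ID as "cb", so the canonical short string for an ID is the one produced by the encoding step.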
Some implementations (provided by commenters)
- Ruby
- Python
- CoffeeScript
- Haskell
- C#