
How to code a URL shortener?

I want to create a URL shortener service where you can enter a long URL into an input field and the service shortens it to "http://www.example.org/abcdef". Instead of "abcdef" there can be any other six-character string containing a-z, A-Z and 0-9. That makes 62^6, or about 56.8 billion, possible strings.
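As a quick sanity check on that count, the arithmetic can be sketched in Python. The constant names here are only illustrative and are not part of any existing code:

```python
# Size of the alphabet: a-z (26) + A-Z (26) + 0-9 (10).
ALPHABET_SIZE = 26 + 26 + 10
# Length of the short code after the slash.
CODE_LENGTH = 6

# Every position can hold any of the 62 characters independently.
total_codes = ALPHABET_SIZE ** CODE_LENGTH
print(total_codes)  # 56800235584
```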

My approach:

I have a database table with three columns:

  1. id, integer, auto-increment
  2. long, string, the long URL the user entered
  3. short, string, the shortened URL (or just the six characters)

I would then insert the long URL into the table. Then I would select the auto-increment value for "id" and build a hash of it. This hash should then be inserted as "short". But what sort of hash should I build? Hash algorithms like MD5 produce strings that are too long, so I don't think I can use them. A self-built algorithm would work, too.

My idea:

For “http://www.google.de/” I get the auto-increment id 239472. Then I do the following steps:

    short = '';
    if divisible by 2, add "a"+the result to short
    if divisible by 3, add "b"+the result to short
    ... until I have divisors for a-z and A-Z.

That could be repeated until the number isn’t divisible any more. Do you think this is a good approach? Do you have a better idea?

Edit: Due to the ongoing interest in this topic, I’ve uploaded the code that I used to GitHub, with implementations for Java, PHP and JavaScript. Add your solutions if you like :)

 

 

I would continue with your "convert number to string" approach. However, you will find that your proposed algorithm fails if your ID is a prime greater than 52.

Theoretical background

You need a bijective function f. This is necessary so that you can find an inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:

  • There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
  • and for every y you must be able to find an x so that f(x) = y.

How to convert the ID to a shortened URL

  1. Think of an alphabet we want to use. In your case that's [a-zA-Z0-9]. It contains 62 characters.
  2. Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table for example).

    For this example I will use 125₁₀ (125 with a base of 10).

  3. Now you have to convert 125₁₀ to X₆₂ (base 62).

    125₁₀ = 2×62¹ + 1×62⁰ = [2,1]

    This requires use of integer division and modulo. A pseudo-code example:

    digits = []
    while num > 0
      remainder = modulo(num, 62)
      digits.push(remainder)
      num = divide(num, 62)
    digits = digits.reverse

    Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array, for example) could look:

    0  → a
    1  → b
    ...
    25 → z
    ...
    52 → 0
    61 → 9

    With 2 → c and 1 → b you get cb₆₂ as the shortened URL.

    http://shor.ty/cb
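The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `ALPHABET`, `BASE` and `encode` are made up for this example, and the alphabet order (a-z, then A-Z, then 0-9) is assumed from the mapping shown above.

```python
import string

# Assumed alphabet order, matching the mapping above: a-z, A-Z, 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62


def encode(num: int) -> str:
    """Convert a non-negative integer ID to its base-62 string."""
    if num < 0:
        raise ValueError("id must be non-negative")
    if num == 0:
        # The loop below would produce an empty string for 0.
        return ALPHABET[0]
    digits = []
    while num > 0:
        # Integer division and modulo in one step.
        num, remainder = divmod(num, BASE)
        digits.append(ALPHABET[remainder])
    # Remainders come out least significant first, so reverse them.
    return "".join(reversed(digits))


print(encode(125))  # cb
```

Note that the plain pseudo-code loop yields no digits at all for an ID of 0. The sketch handles that case explicitly, although auto-increment IDs usually start at 1.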

How to resolve a shortened URL to the initial ID

The reverse is even easier. You just do a reverse lookup in your alphabet.

  1. e9a₆₂ will be resolved to "the characters at indices 4, 61, and 0 in the alphabet".

    e9a₆₂ = [4,61,0] = 4×62² + 61×62¹ + 0×62⁰ = 19158₁₀

  2. Now find your database record with WHERE id = 19158 and do the redirect.
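A matching sketch for the reverse direction, again in Python and under the same assumptions (alphabet order a-z, A-Z, 0-9; the names `ALPHABET`, `INDEX` and `decode` are hypothetical):

```python
import string

# Assumed alphabet order, matching the mapping above: a-z, A-Z, 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62
# Reverse lookup table: character -> index in the alphabet.
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def decode(short: str) -> int:
    """Convert a base-62 string back to the integer ID."""
    num = 0
    for ch in short:
        if ch not in INDEX:
            raise ValueError(f"invalid character in short code: {ch!r}")
        # Shift the accumulated value by one base-62 digit, then add the new one.
        num = num * BASE + INDEX[ch]
    return num


print(decode("e9a"))  # 19158
```

The result is the ID to use in the WHERE clause. A leading "a" behaves like a leading zero, so "a" and "aa" both decode to 0; the mapping is one-to-one only for codes without such padding.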

Some implementations (provided by commenters)

  • Ruby
  • Python
  • CoffeeScript
  • Haskell
  • C#
