How to decode the gclid parameter in Google Adwords

Kevin Jones, 30th November 2013

We’ve recently found out how to decode the gclid parameter (in Google Adwords).
It turns out that this parameter is not actually encrypted—anyone can decode it quite easily.

What is the gclid parameter?

www.example.com/?gclid=CMrlnPq42q8CFQdb3wodOkkGBg

It’s a code that Google uses to track users after they’ve clicked on an Adwords advert.

If you advertise with Adwords and you’ve got auto-tagging turned on, Google adds the gclid parameter to your destination URLs, so you can get information about the clicked advert in your Analytics reports.

But Google have never talked about what’s stored inside the gclid parameter itself.

How is the gclid parameter encoded?

It's encoded in Protocol Buffers, and then in a URL-safe version of Base64 (one that avoids the “+” and “/” characters).
And that's it!—there is no encryption layer.
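To see the two layers for yourself, a minimal sketch like the one below undoes only the Base64 layer and prints the raw Protocol Buffers bytes as hex. The helper name gclid_to_hex is hypothetical, and the sketch assumes the sample gclid from the example URL above, which contains no “-” or “_” characters and so can be passed to base64_decode() unchanged.

```php
<?php
// Hypothetical helper (not part of any library): undo only the Base64 layer
// of a gclid and return the underlying bytes as a hex string.
// Assumption: the gclid contains no "-" or "_" characters, so no character
// substitution is needed before decoding.
function gclid_to_hex($gclid)
{
    // PHP's base64_decode() accepts input without trailing "=" padding
    return bin2hex(base64_decode($gclid));
}

$hex = gclid_to_hex('CMrlnPq42q8CFQdb3wodOkkGBg');
echo implode(' ', str_split($hex, 2)), "\n";
// Prints:
// 08 ca e5 9c fa b8 da af 02 15 07 5b df 0a 1d 3a 49 06 06
```

The first byte, 08, is the Protocol Buffers key for field 1. The bytes 15 and 1d further along are the keys for fields 2 and 3.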

What is “Protocol Buffers”?

Protocol Buffers (or “protobuf”) is a serialisation format developed by (and used by) Google.

It’s similar to JSON, but—

  • it’s a binary format, so it’s not “human-readable”
  • it’s much more compact
  • it doesn’t store the names of fields (just a numeric ID)
  • it doesn’t store the data types used—just a “wire type” (which is the minimum amount of information needed to separate the different fields)
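The last two points can be made concrete. In the Protocol Buffers wire format, each field starts with a key whose lowest three bits hold the wire type and whose remaining bits hold the field's numeric ID. The sketch below splits a single-byte key into those two parts. The function name protobuf_split_key is hypothetical, and keys longer than one byte (field IDs above 15) are deliberately not handled.

```php
<?php
// Hypothetical helper: split a one-byte Protocol Buffers key into its
// field number and wire type. Multi-byte keys are not handled here.
function protobuf_split_key($byte)
{
    $names = array(
        0 => 'varint',
        1 => '64-bit',
        2 => 'length-delimited',
        5 => '32-bit',
    );
    $wireType = $byte & 0x07;   // lowest three bits
    $fieldNumber = $byte >> 3;  // remaining bits
    return array(
        'field' => $fieldNumber,
        'wire_type' => $wireType,
        'wire_name' => isset($names[$wireType]) ? $names[$wireType] : 'other',
    );
}

foreach (array(0x08, 0x15, 0x1d) as $key) {
    $k = protobuf_split_key($key);
    printf("0x%02x: field %d, %s\n", $key, $k['field'], $k['wire_name']);
}
// Prints:
// 0x08: field 1, varint
// 0x15: field 2, 32-bit
// 0x1d: field 3, 32-bit
```

Nothing in the key says whether a varint is a signed integer, an unsigned integer, a boolean or an enum. That information lives only in the schema, which is why a decoder without the schema has to guess.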

Should we be publishing this?

We wondered about this—but then we thought that (surely) if Google really wanted to keep the gclid encoding secret, they would have made an effort to actually encrypt it.  Otherwise they may as well have “encrypted” it using JSON, or—I don't know—UTF-8.

(We were tempted to compare it to ROT13, which programmers sometimes use as a “joke” encryption technique because of its weakness.  But ROT13, after all, is a bona fide encryption technique (just not a very strong one).  Google have made no attempt to encrypt the gclid parameter: Protocol Buffers is an encoding technique, not an encryption technique.)

Bear in mind as well that Google:

So we think Google probably understand that:

  • someone would have probably noticed it at some point
  • they can't, therefore, be relying solely on Protocol Buffers to stop click fraud and other crime

So what's actually in the gclid value?

Well, there are 3 numbers.
For example, the gclid CKSDxc_qhLkCFQyk4AodO24Arg contains:

  Array
  (
    [1] => 1376737438024100
    [2] => 182494220
    [3] => 2919263803
  )

Parameter 1

The first parameter is clearly a Unix timestamp (with microseconds, we presume)—and corresponds to the time when the advert was clicked on.

So this particular gclid value corresponds to an advert that was clicked on 17th August 2013 at 12:03:58.024100 UK time (11:03:58.024100 UTC).
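As a rough check of that date, the sketch below turns the seconds and microseconds of parameter 1 into a readable time. The function name gclid_click_time is hypothetical, and the choice of the Europe/London time zone is an assumption made to match the time quoted above.

```php
<?php
// Hypothetical helper: format the click time stored in parameter 1.
// $seconds and $microseconds are the two halves of the timestamp.
function gclid_click_time($seconds, $microseconds, $timezone = 'UTC')
{
    $dt = new DateTime('@' . $seconds); // "@" marks a Unix timestamp (UTC)
    $dt->setTimezone(new DateTimeZone($timezone));
    return $dt->format('Y-m-d H:i:s') . sprintf('.%06d', $microseconds);
}

echo gclid_click_time(1376737438, 24100), "\n";
// Prints: 2013-08-17 11:03:58.024100
echo gclid_click_time(1376737438, 24100, 'Europe/London'), "\n";
// Prints: 2013-08-17 12:03:58.024100
```

Keeping the seconds and microseconds apart means the sketch also works on 32-bit PHP, where the full microsecond value would not fit in an integer.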

Parameter 2

We don’t know what the second parameter stands for, but we’ve only seen values between 172000000 and 184000000, so it doesn’t seem to be random.  It’s easy to find old gclids on the internet, and we’ve seen that the value doesn’t seem to be related to the time when the gclid was generated (which we know from parameter 1).

Parameter 3

The third parameter seems to be random, and we don't know what it stands for either.  But we’ve noticed that sometimes two different gclids for the same advert can have the same values for parameters 2 and 3 (meaning only parameter 1 is different), so the two parameters definitely seem to be linked.

It stands to reason that parameter 2 or 3 (or both) will uniquely identify the advert (and search results page) that the user just clicked on.  However we weren’t able to match them to any ID in our Adwords control panel.

So for now, the last 2 parameters remain a mystery.  Of course if you find out more information, or you have any ideas, please let us know in the comments below.

Is this information useful to advertisers?

Without knowing what the other two parameters in the gclid mean, we can only think of one practical use: by looking at the timestamp, you could distinguish between website visitors who genuinely clicked on one of your Adwords adverts, and visitors who used a bookmark (or a link indexed by Google) with the gclid parameter still in the address.
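That check might look something like the sketch below. The function name gclid_is_recent and the 30-minute threshold are assumptions chosen for illustration, not values taken from Google; pick whatever window suits your own traffic.

```php
<?php
// Hypothetical helper: decide whether a gclid's click time is recent enough
// to count as a genuine, fresh click on an advert.
// $clickSeconds: the seconds part of parameter 1.
// $now:          the current Unix time (passed in so the check is testable).
// $maxAge:       assumed threshold in seconds (30 minutes by default).
function gclid_is_recent($clickSeconds, $now, $maxAge = 1800)
{
    $age = $now - $clickSeconds;
    // A negative age means the timestamp is in the future (clock skew or a
    // made-up value), so treat it as not recent.
    return $age >= 0 && $age <= $maxAge;
}

var_dump(gclid_is_recent(1376737438, 1376737438 + 60));    // bool(true)
var_dump(gclid_is_recent(1376737438, 1376737438 + 86400)); // bool(false)
```

Because the gclid is not signed or encrypted, a visitor could forge a recent timestamp, so a check like this only filters out stale links and should not be treated as fraud protection.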

What we would like to know is more information about which advert our visitors click on, their position in the results, and what the visitors searched for.  This is already available in your Adwords / Analytics control panel, but Google just shows the averages, which hides some of the information.

How to decode a gclid

We've made a useful PHP function so you can decode your own gclids.

By default, we split the timestamp into seconds and microseconds, because:

  • we think it’s more useful
  • if you’ve got a 32-bit computer, and you haven’t got the bcmath extension available, then the timestamp will be returned as a float

—but you can turn off this behaviour by setting $splitTimestamp to false.

function gclid_decode($gclid, $splitTimestamp = true)
{
    // Copyright 2013 Deed Poll Office Ltd, UK <https://deedpolloffice.com>
    // Licensed under Apache Licence v2.0 <http://apache.org/licenses/LICENSE-2.0>

    // Undo the Base64 layer, then split the bytes into (key, value) pairs.
    // The character class is kept on one line because the "x" modifier does
    // not ignore whitespace inside a character class.
    preg_match_all('/
        (?=[\x5\xd\x15\x1d%\-5=EMU\]emu}\x85\x8d\x95\x9d\xa5\xad\xb5\xbd\xc5\xcd\xd5\xdd\xe5\xed\xf5\xfd]) # 32-bit wire type
        ([\x80-\xff]*[\0-\x7f])(.{4}) |
        ([\x80-\xff]*[\0-\x7f])([\x80-\xff]*[\0-\x7f]) # default to varint wire type
        /sx',
        base64_decode(str_replace(array('_', '-'), array('+', '/'), $gclid)),
        $matches,
        PREG_SET_ORDER);

    $ret = array();
    foreach ($matches as $m) {
        $key = $val = 0;
        // The key is a varint: 7 data bits per byte, least significant group first
        foreach (str_split($m[1] ? $m[1] : $m[3]) as $i => $c) {
            $key += (ord($c) & 0x7f) << $i * 7;
        }
        if ($m[1]) { // 32-bit (probably) unsigned int (not supported by PHP)
            // Four little-endian bytes; use bcmath where native ints are too small
            foreach (str_split($m[2]) as $i => $c) {
                $val = PHP_INT_SIZE < 5 && function_exists('bcadd') ?
                    bcadd($val, bcmul(ord($c), bcpow(2, $i * 8))) :
                    $val + (ord($c) * pow(2, $i * 8));
            }
        } else {
            // Varint value: 7 data bits per byte, least significant group first
            foreach (str_split($m[4]) as $i => $c) {
                $val = PHP_INT_SIZE < 8 && function_exists('bcadd') ?
                    bcadd($val, bcmul(ord($c) & 0x7f, bcpow(2, $i * 7))) :
                    $val + ((ord($c) & 0x7f) * pow(2, $i * 7));
            }
        }
        $ret[$key >> 3] = $val; // Drop the 3 wire-type bits to get the field number
    }
    if ($splitTimestamp && isset($ret[1])) {
        $ret[1] = array( // Split into seconds / microseconds
            (int) floor($ret[1] / 1000000),
            is_int($ret[1]) ? $ret[1] % 1000000 :
                (is_string($ret[1]) ? bcmod($ret[1], 1000000) : null),
        );
    }
    return $ret;
}

Example of use:

print_r(gclid_decode('CKSDxc_qhLkCFQyk4AodO24Arg'));

// Prints:
//
// Array
// (
//     [1] => Array
//         (
//             [0] => 1376737438   <---- timestamp (seconds)
//             [1] => 24100                        (microseconds)
//         )
//
//     [2] => 182494220
//     [3] => 2919263803
// )
//
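To show what the varint branch of the function is doing, here is a minimal standalone sketch that decodes a single Protocol Buffers varint. The function name protobuf_decode_varint is hypothetical, and the sketch assumes 64-bit PHP so that the result fits in a native integer. The eight input bytes are the ones that hold parameter 1 when the function above decodes the example gclid.

```php
<?php
// Hypothetical helper: decode one Protocol Buffers varint from a binary string.
// Assumes 64-bit PHP, where integers are wide enough for the result.
function protobuf_decode_varint($bytes)
{
    $value = 0;
    foreach (str_split($bytes) as $i => $char) {
        $byte = ord($char);
        $value |= ($byte & 0x7f) << (7 * $i); // low 7 bits carry the data
        if (($byte & 0x80) === 0) {           // high bit clear: this is the last byte
            break;
        }
    }
    return $value;
}

echo protobuf_decode_varint(hex2bin('a483c5cfaa84b902')), "\n";
// Prints: 1376737438024100
```

Each byte contributes seven bits, least significant group first, and the top bit of a byte says whether another byte follows. That is why a 51-bit timestamp takes eight bytes here.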
