The information hidden inside Google’s ved parameter

Kevin Jones, 13th December 2013

Have you ever noticed the ved parameter in Google’s search results (or in your server logs), with cryptic values—such as 0CFYQFjAG?

Curious bloggers have noticed patterns in these ved codes, but it wasn’t until very recently that Benjamin Schulz found out how the ved parameter is encoded—and how to decode it.  The information stored inside the ved code turns out to be quite interesting and useful.

We’ve taken it further, and written a guide to what the data inside a ved code means.
We’ve also written simple (open-source) functions in PHP and JavaScript that you can use to decode ved codes yourself.

What is the ved parameter?

Example of a link from a Google search engine results page, showing the ved parameter

When you click on any of the links in Google’s search results, the URL (address) of the link contains a “ved” parameter.

This “ved” code contains information about the link that you clicked on, so Google can get a better picture of how you use their site.

When a user comes to your website (through Google’s search results), the ved code is (usually) also passed to you—in the referer HTTP header.  By decoding it, you can get extra insights into how visitors come to your site.
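
As a rough sketch, this is one way you might pull the ved code out of a referer header in JavaScript.  The function name vedFromReferer and the example URL are made up for illustration, and it assumes the standard URL class is available (as it is in modern browsers and Node.js).

```javascript
// Hypothetical helper: returns the ved code from a referer URL,
// or null if there isn't one (or the URL can't be parsed).
function vedFromReferer(referer) {
    try {
        return new URL(referer).searchParams.get('ved');
    } catch (e) {
        return null; // not a valid absolute URL
    }
}

// Made-up referer, carrying the example ved code used in this article
var exampleReferer = 'https://www.google.com/url?sa=t&source=web&ved=0CFYQFjAG';
console.log(vedFromReferer(exampleReferer));             // the ved code
console.log(vedFromReferer('https://www.google.com/'));  // null (no ved passed)
```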

How is the ved parameter encoded?

ved=0CFYQFjAG
Encoded ved parameter (prefixed with 0)

The ved parameter is encoded in Protocol Buffers, and then in (a URL-safe variant of) Base64.
It isn’t actually encrypted, so it’s quite easy to decode it.
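
To illustrate that there’s no encryption involved, here’s a small sketch that turns an encoded ved into its raw bytes.  The name vedToBytes is ours, not Google’s.  It assumes that atob is available, and that the ved uses the usual URL-safe Base64 alphabet (where - stands for + and _ stands for /).

```javascript
// Hypothetical helper: Base64-decode an encoded ved into an array of bytes
function vedToBytes(ved) {
    var base64 = ved.slice(1)      // drop the leading "0"
        .replace(/-/g, '+')        // undo the URL-safe alphabet
        .replace(/_/g, '/');
    var binary = atob(base64), bytes = [], i;
    for (i = 0; i < binary.length; ++i) bytes.push(binary.charCodeAt(i));
    return bytes;
}

// Show the bytes in hexadecimal
console.log(vedToBytes('0CFYQFjAG').map(function (b) {
    return ('0' + b.toString(16)).slice(-2);
}).join(' '));

// Prints:
// 08 56 10 16 30 06
```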

ved=1t:3588,r:1,s:0,i:83
Plain-text ved parameter (prefixed with 1)

Very rarely, though, the ved parameter uses a different, plain-text encoding.  We don’t really know why, but it’s certainly made it easier to find out how it’s encoded the rest of the time.  (Maybe, in fact, it was just a “clue” from Google.)

Encoded ved values are prefixed with the character 0.
Plain-text ved values are prefixed with the character 1.
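
A minimal sketch of how you might tell the two encodings apart, based only on the prefixes described above.  The function name and the returned labels are our own.

```javascript
// Hypothetical helper: work out which encoding a ved value uses
function vedEncoding(ved) {
    switch (ved.charAt(0)) {
        case '0': return 'protobuf';   // Protocol Buffers + Base64
        case '1': return 'plain-text';
        default:  return 'unknown';    // a prefix this article doesn't cover
    }
}

console.log(vedEncoding('0CFYQFjAG'));            // protobuf
console.log(vedEncoding('1t:3588,r:1,s:0,i:83')); // plain-text
```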

What is “Protocol Buffers”?

Protocol Buffers (or “protobuf”) is a serialisation format developed by (and widely used by) Google.  It’s a way of storing (and exchanging) key–value data.

It’s similar in a way to JSON, but—

  • it’s a binary format, so it’s not “human-readable” (Google also encode ved codes in Base64, which makes them printable but still not readable)
  • it’s much more compact
  • it doesn’t store the names of fields (just a numeric ID)
  • it doesn’t store the data types used—just a “wire type” (which contains only the minimum amount of information needed to separate the different fields)

The “wire type” for each field tells you some information about the true data type, like whether it’s a number or a string of text.  But it doesn’t (for example) say whether numbers are integers or floating-point numbers, or whether they’re signed or unsigned; this is something you need to know in advance.  (In fact this would normally be defined in a separate .proto file.)
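
To make this a bit more concrete, here’s a sketch of how a field is laid out on the wire.  Each field starts with a key, stored as a “varint” (7 bits per byte, least significant group first, with the top bit set on every byte except the last).  The lowest 3 bits of the key are the wire type, and the remaining bits are the field’s numeric ID.  The function names are ours, and the loop assumes that every field has wire type 0 (a varint), which is all we’ve seen in ved codes.

```javascript
// Read one varint from an array of bytes, starting at offset `pos`.
// Returns the decoded value and the position of the next unread byte.
function readVarint(bytes, pos) {
    var value = 0, shift = 0, b;
    do {
        b = bytes[pos++];
        value += (b & 0x7f) * Math.pow(2, shift); // avoids 32-bit overflow of <<
        shift += 7;
    } while (b & 0x80);
    return { value: value, next: pos };
}

// Split a field key into its numeric ID and its wire type
function splitKey(key) {
    return { field: Math.floor(key / 8), wireType: key % 8 };
}

// The bytes of the example ved code 0CFYQFjAG, after Base64 decoding
var bytes = [0x08, 0x56, 0x10, 0x16, 0x30, 0x06], pos = 0, fields = [];
while (pos < bytes.length) {
    var key = readVarint(bytes, pos);
    var val = readVarint(bytes, key.next); // assumes wire type 0 (varint)
    var k = splitKey(key.value);
    fields.push(k.field + ' (wire type ' + k.wireType + ') = ' + val.value);
    pos = val.next;
}
console.log(fields.join('\n'));

// Prints:
// 1 (wire type 0) = 86
// 2 (wire type 0) = 22
// 6 (wire type 0) = 6
```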

How do you decode the ved parameter?

You can decode Protocol Buffers by following the instructions in the manual, or downloading a (free) compiler from Google.

We’ve also written simple functions to decode the ved, which you’re welcome to use.
They can handle both encoded and plain-text ved values.

Using PHP

function ved_decode($ved)
{
    // Copyright 2013 Deed Poll Office Ltd, UK <https://deedpolloffice.com>
    // Licensed under Apache Licence v2.0 <http://apache.org/licenses/LICENSE-2.0>
    // Map the plain-text labels onto the equivalent protobuf field numbers
    $keys = array('t' => 2, 'r' => 6, 's' => 7, 'i' => 1);
    $ret  = array();
    // Plain-text ved (prefixed with 1), e.g. 1t:3588,r:1,s:0,i:83
    if (substr($ved, 0, 1) == '1') {
        preg_match_all('/([a-z]+):([0-9]+)/i', $ved, $matches, PREG_SET_ORDER);
        foreach ($matches as $m)
            $ret[isset($keys[$m[1]]) ? $keys[$m[1]] : $m[1]] = (int) $m[2];
        return $ret;
    }
    // Encoded ved (prefixed with 0): undo the URL-safe Base64 alphabet
    // ('-' stands for '+', '_' stands for '/'), then split the bytes into
    // pairs of varints (key, value)
    preg_match_all('/([\x80-\xff]*[\0-\x7f])([\x80-\xff]*[\0-\x7f])/',
        base64_decode(str_replace(array('-','_'), array('+','/'), substr($ved, 1))),
        $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $key = $val = 0;
        foreach (str_split($m[1]) as $i => $c) $key += (ord($c) & 0x7f) << $i * 7;
        foreach (str_split($m[2]) as $i => $c) $val += (ord($c) & 0x7f) << $i * 7;
        $ret[$key >> 3] = $val; // the low 3 bits of the key are the wire type
    }
    return $ret;
}

// Example of use:
print_r(ved_decode('0CFYQFjAG'));

// Prints:
// Array
// (
//     [1] => 86
//     [2] => 22
//     [6] => 6
// )

Using JavaScript

// For Internet Explorer 9 and below, you’ll need to include the base64 library
if (!window.atob) window.atob = base64.decode;

function ved_decode(ved) {
    // Copyright 2013 Deed Poll Office Ltd, UK <https://deedpolloffice.com>
    // Licensed under Apache Licence v2.0 <http://apache.org/licenses/LICENSE-2.0>
    // Map the plain-text labels onto the equivalent protobuf field numbers
    var keys = { t: 2, r: 6, s: 7, i: 1 }, ret = {}, re, match;
    // Plain-text ved (prefixed with 1), e.g. 1t:3588,r:1,s:0,i:83
    if (ved.match(/^1/)) {
        re = /([a-z]+):([0-9]+)/ig;
        while ((match = re.exec(ved)) !== null)
            ret[keys[match[1]] || match[1]] = parseInt(match[2], 10);
        return ret;
    }
    // Encoded ved (prefixed with 0): undo the URL-safe Base64 alphabet
    // ('-' stands for '+', '_' stands for '/'), replacing every occurrence
    var vedBinary = atob(ved.replace(/-/g, '+').replace(/_/g, '/').substr(1));
    re = /([\x80-\xff]*[\x00-\x7f])([\x80-\xff]*[\x00-\x7f])/g;
    while ((match = re.exec(vedBinary)) !== null)
        ret[varint_decode(match[1]) >> 3] = varint_decode(match[2]);
    return ret;
    function varint_decode(vint) {
        var ret = 0, i = 0;
        for (; i < vint.length; ++i) ret += (vint.charCodeAt(i) & 0x7f) << (i * 7);
        return ret;
    }
}

// Minified version:
function ved_decode(c){/* Copyright 2013 Deed Poll Office Ltd, UK <http://deedpolloffice.com>, and licensed under Apache Licence v2.0 <http://apache.org/licenses/LICENSE-2.0> */var d={t:2,r:6,s:7,i:1},r={},g,m;if(c.match(/^1/)){g=/([a-z]+):([0-9]+)/ig;while((m=g.exec(c))!==null)r[d[m[1]]||m[1]]=parseInt(m[2],10);return r}var e=atob(c.replace(/-/g,'+').replace(/_/g,'/').substr(1));g=/([\x80-\xff]*[\x00-\x7f])([\x80-\xff]*[\x00-\x7f])/g;while((m=g.exec(e))!==null)r[f(m[1])>>3]=f(m[2]);return r;function f(a){var b=0,i=0;for(;i<a.length;++i)b+=(a.charCodeAt(i)&0x7f)<<(i*7);return b}}

// Example of use:
console.log(ved_decode('0CFYQFjAG'));

// Prints:
// Object {1: 86, 2: 22, 6: 6}

Using regular expressions

People have noticed patterns in ved codes—and even interpreted them correctly (to some extent).  But parsing them with regular expressions is not a good solution.  It’s more difficult, and it’s likely to go wrong.  We don’t recommend it.

What’s inside the ved code?

There are several parameters, which tell you about the link that was clicked on.
For example, the ved code 0CFYQFjAG contains:

  Array
  (
    [1] => 86
    [2] => 22
    [6] => 6
  )

Parameters 1 and 2 are always present.
The other parameters (5, 6, and 7) may or may not be present, depending on the type of link that was clicked on.

(It looks like there should be parameters numbered 3 and 4, but—although we’ve looked really hard—we’ve never seen them in any ved codes.)

Parameter 1 (always present)—link index

Examples of ved values for parameter 1 on a typical Google search engine results page
Example values for parameter 1.  Can you spot an anomaly?—the value 6 is repeated.

Parameter 1 (labelled i in plain-text veds) is a unique index for each link on the page.  The value is incremented as you go down the page.  (There are a few exceptions, but this generally holds true.)

The parameter gives you a rough idea of where the link is in the page (the bigger it is, the lower down it is).

It’s not terribly useful information though—when analysing referer headers—because you don’t know how many links were originally shown on the page.  That could depend on lots of factors, such as:

  • how many adwords were displayed
  • how many results there were with sitelinks, breadcrumbs, “jump to” links, etc.
  • how many “special” search results there were (e.g. local results, news results, etc.)
  • whether the user was logged in or not

Bear in mind that it’s not just the search results themselves that get a value—practically all the links on the page (and even some non-links) have a ved code.  (Remember that the ved was meant for Google—not you!)

(Another complicating factor is that an image search result preview panel is treated as a page of results on its own.  Within the preview panel, parameter 1 is counted afresh from 0.)

For ved codes that don’t specify parameter 6 (e.g. adwords), though, parameter 1 is the next best thing.  In fact, although you can’t tell for sure the exact position of the adword that was clicked on, parameter 1 will tell you whether it was in the main column of results (a value in the range 45–65) or in the right-hand column (a value of 170 or more).  For AdWords customers, this information will no doubt be very valuable.
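
As a sketch of how you might apply this rule of thumb: the function below takes the output of ved_decode() and guesses which column an adword was in.  The function name is hypothetical, the ranges are just the rough ones given above (they’re our observations, not official figures), and the link types it checks for (1617, 706, and 5158) are the adword values listed under parameter 2.

```javascript
// Hypothetical helper. `params` is an object as returned by ved_decode().
function adwordColumn(params) {
    var adwordTypes = [1617, 706, 5158], index = params[1];
    if (adwordTypes.indexOf(params[2]) === -1) return null; // not an adword
    if (index >= 45 && index <= 65) return 'main column';
    if (index >= 170) return 'right-hand column';
    return 'unknown'; // outside the ranges we've observed
}

console.log(adwordColumn({ 1: 50, 2: 1617 }));  // main column
console.log(adwordColumn({ 1: 180, 2: 1617 })); // right-hand column
```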

Parameter 2 (always present)—link type

Parameter 2 (labelled t in plain-text veds) tells you what type of link was clicked on.

An ordinary (universal) search result has the value 22 (this of course is the most common).
But there are lots of other possible values:

| Type of link | Value |
| --- | --- |
| normal (universal) search result | 22 |
| normal result thumbnail (e.g. for an application, recipe, etc.) | 1146 or 1150 |
| sitelink | 2060 |
| one-line sitelink | 338 |
| breadcrumb | 745 |
| “Jump to” link | 586 |
| more results link (listed mainly for Q&A websites) | 300 |
| local search result | 288 |
| local search result marker pin icon | 1455 |
| dictionary definition link | 5497 |
| blog search result | 152 |
| book search result | 232 |
| book search result thumbnail | 235 |
| book search result author link | 1140 |
| image search result in basic (non-javascript) image search, or image result in universal search | 245 |
| image search result [probably not in use any more] | 429 |
| image search result (thumbnail) | 3588 |
| image search result preview title link | 3598 |
| image search result preview grey website link underneath title | 3724 |
| image search result preview thumbnail | 3597 |
| image search result preview “View image” link | 3596 |
| image search result preview “Visit page” link | 3599 |
| in-depth article result | 5077 |
| in-depth article result thumbnail | 5078 |
| map search result | 1701 |
| map search result website link | 612 |
| map search result thumbnail | 646 |
| news result | 297 |
| news result thumbnail | 295 |
| news result video thumbnail | 2237 |
| news sub-result (i.e. the same story from a different site) | 1532 |
| patent result | 232 |
| patent result thumbnail | 235 |
| patent result “Overview” / “Related” / “Discuss” link | 1107 |
| shopping search result | 371 |
| video result | 311 |
| video result thumbnail | 312 |
| authorship thumbnail link | 2937 |
| authorship “by [author]” link | 2847 |
| knowledge graph link | 2459 |
| knowledge graph main image | 3836 |
| knowledge graph repeated sub-link (e.g. football team squad players, album track listings) | 1732 |
| adword (i.e. sponsored search result) | 1617 |
| adword sitelink | 706 |
| adword one-line sitelink | 5158 |
| sponsored shopping result (in main column of universal search results) | 1987 |
| sponsored shopping result thumbnail (in main column of universal search results) | 1986 |
| sponsored shopping result (in right-hand column of universal search results) | 1908 |
| sponsored shopping result thumbnail (in right-hand column of universal search results) | 1907 |

This isn’t even the complete list—almost every link in the search results has a ved code.  But a lot of them just link to other Google pages (e.g. to another page of results), so you won’t see them in a referer header.

If you find any values in your referer headers that aren’t on this list, then please let us know (e.g. in the comments below).  We’ll keep this list as up-to-date as we can.
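
If you want to label the link types in your own reports, a simple lookup table is probably all you need.  Here’s a sketch that uses a handful of the values from the table above (it’s deliberately not the complete list).  The names LINK_TYPES and linkTypeName are our own, and the function expects the output of ved_decode().

```javascript
// A few of the values from the table above (not the complete list)
var LINK_TYPES = {
    22:   'normal (universal) search result',
    2060: 'sitelink',
    338:  'one-line sitelink',
    745:  'breadcrumb',
    288:  'local search result',
    297:  'news result',
    311:  'video result',
    3588: 'image search result (thumbnail)',
    1617: 'adword (i.e. sponsored search result)'
};

// Hypothetical helper. `params` is an object as returned by ved_decode().
function linkTypeName(params) {
    var type = params[2];
    return Object.prototype.hasOwnProperty.call(LINK_TYPES, type)
        ? LINK_TYPES[type]
        : 'unrecognised (' + type + ')';
}

console.log(linkTypeName({ 1: 86, 2: 22, 6: 6 })); // normal (universal) search result
console.log(linkTypeName({ 1: 12, 2: 9999 }));     // unrecognised (9999)
```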

Parameter 6—result position

Examples of ved values for parameter 6 on a typical Google search engine results page
Example values for parameter 6.
Note that the first result has the value 0.  The thumbnail doesn’t have a value.

Parameter 6 (labelled r in plain-text veds) tells you the position of the result in the page.

The parameter starts from 0, and counts upwards.
On page 2 of the results, the value is reset to 0.

Image, news, and video results, as well as sitelinks, are all counted as “full” results—so it’s quite common for there to be more than 10 results on each page.

One-line sitelinks and breadcrumbs (and anything else counted as a sub-result) will inherit the parameter 6 value of their parent, and they’ll also have a value for parameter 5.

Adwords, thumbnails, and other types of links aren’t counted as “proper” results, so they don’t have a value for parameter 6—but they do have a value for parameter 1 (the link index).

Parameter 7—start result position

Parameter 7 (labelled s in plain-text veds) tells you the starting position of the first result on the page.

On page 2, it will be 10; and on page 3 it will be 20, and so on.
On page 1, the value isn’t present (but implicitly, this means a value of 0).

Therefore you can get the page number by working out: (parameter 7) ÷ 10 + 1

In theory, you could work out (parameter 7) + (parameter 6) to get the “true” result position.  However, a page could well have more than 10 results (when you consider news, image, video results, and so on)—so it’s best to keep these two values separate.

Parameter 7 isn’t always present in ved codes—but when it isn’t present, it means the link was on page 1.  So you can always work out what page the link was on.  All types of link (including adwords, thumbnails, etc.) get a value for parameter 7—even if they don’t have a value for parameter 6.
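
The arithmetic above can be sketched as follows.  Both function names are hypothetical, and both expect the output of ved_decode().  Note that overallPosition() is only the “in theory” figure described above, so treat it as approximate.

```javascript
// Page number: (parameter 7) / 10 + 1, treating a missing value as 0
function pageNumber(params) {
    var start = params[7] || 0; // parameter 7 is absent on page 1
    return Math.floor(start / 10) + 1;
}

// Approximate overall position: (parameter 7) + (parameter 6).
// Returns null for links that have no parameter 6 (adwords, thumbnails, etc.)
function overallPosition(params) {
    if (params[6] === undefined) return null;
    return (params[7] || 0) + params[6];
}

console.log(pageNumber({ 1: 86, 2: 22, 6: 6 }));             // 1
console.log(pageNumber({ 1: 40, 2: 22, 6: 3, 7: 10 }));      // 2
console.log(overallPosition({ 1: 40, 2: 22, 6: 3, 7: 10 })); // 13
```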

Parameter 5—sub-result position

Examples of ved values for parameter 5 on a typical Google search engine results page
Example values for parameter 5 (in a group of one-line sitelinks).
All sub-results inherit parameter 6 from their parent.

Parameter 5 is only present in groups of sub-results, such as one-line sitelinks and breadcrumbs.

Parameter 5 tells you the position of the link in the group of sub-results.  Like parameter 6, parameter 5 counts upwards, starting from 0.

Bear in mind that sitelinks (for non-sponsored results) are counted as full results, whereas one-line sitelinks are treated as sub-results.  For adwords, though, they’re all treated as sub-results.
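
Putting the parameters together, here’s a sketch of how you might turn the numbered output of ved_decode() into something more readable.  The property names (linkIndex, linkType, and so on) are our own labels for the parameters described in this guide, not anything defined by Google.

```javascript
// Hypothetical helper. `params` is an object as returned by ved_decode().
function describeVed(params) {
    var out = { linkIndex: params[1], linkType: params[2] };
    // Parameters 6 and 5 are only present for some types of link
    if (params[6] !== undefined) out.resultPosition = params[6];
    if (params[5] !== undefined) out.subResultPosition = params[5];
    // Parameter 7 is absent on page 1, which implies a value of 0
    out.startPosition = params[7] || 0;
    return out;
}

console.log(JSON.stringify(describeVed({ 1: 86, 2: 22, 6: 6 })));

// Prints:
// {"linkIndex":86,"linkType":22,"resultPosition":6,"startPosition":0}
```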

But is the ved parameter always passed?

No, not always.  Mobile devices (e.g. iPads) will generally just pass https://www.google.com/ (for example) as the referer header, and no ved parameter.  Desktop browsers, though, (generally) do pass the ved parameter—and of course desktop users are still much more common.

For the last couple of years, Google have—for more and more users—been withholding the query terms (i.e. what a user actually searched for) from appearing in the referer header.  When this happens, it shows up in Analytics reports as a (not provided) keyword.  This has all been done in the name of privacy.  Nevertheless, it’s been a big disappointment to marketers and SEO analysts, who have relied on the information.  So the information in the ved parameter certainly goes some way in offsetting this loss.

For mobile devices—we think the loss of the ved parameter has been an incidental casualty of the search terms being withheld—only for different reasons.  We don’t think that Google consider the ved parameter to hold any sensitive data as such.  However, the technical details, and the motivations behind it (from Google’s point of view) are a complex topic, and it’s beyond the scope of this article.

What else is encoded in Protocol Buffers?

Yes—that’s what we thought too!

We did find one other parameter encoded in Protocol Buffers—the gclid parameter.  This is only passed when you click on an adword though.  If you advertise with adwords, you may be interested to read about how you can decode the gclid parameter yourself.

None of the other parameters passed in the referer header (from Google’s search results) seem to be encoded in Protocol Buffers.  Some of them are almost certainly encrypted, anyway.  For some reason, the ved parameter wasn’t.
