HTTPS & Referer

The HTTP “Referer” header has two main uses. One is to trace the origin of a visit, which can be used to analyse a visitor’s browsing path. The other is to identify and block unauthorized requests for privately owned web resources, such as images and files.

“Referer” behaves in some tricky ways when HTTPS is involved. Generally speaking, for safety reasons, the browser drops the “Referer” header when a request goes from an HTTPS page to an HTTP page. The server therefore never learns where the request really came from. Some websites use an HTTPS-to-HTTP redirect to bypass the restrictions on private web resources. For example, suppose a webpage embeds an image from sinaimg.cn. Sinaimg.cn rejects a request unless its Referer is empty or belongs to one of the Sina-owned domains. To make the browser drop the Referer header, the <img> tag uses a src of https://example.com/redirect?url=sinaimg.cn?xxxxx instead of the original URL. The only function of https://example.com/redirect is to redirect the request to the URL specified by the “url” parameter. As a result, the “Referer” header is dropped and the image can be loaded from pages on other domains.
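The server-side rule described for sinaimg.cn (accept an empty Referer, or one from an owned domain) can be sketched roughly as follows. This is only an illustration: the function name and the allow-list are hypothetical, and the real list of Sina-owned domains is not given here.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list, for illustration only.
ALLOWED_DOMAINS = ("sina.com.cn", "sinaimg.cn", "weibo.com")


def is_request_allowed(referer, allowed_domains=ALLOWED_DOMAINS):
    """Return True if the Referer is missing/empty or belongs to an allowed domain."""
    if not referer:
        # No Referer at all: this is the gap the HTTPS-to-HTTP redirect exploits.
        return True
    host = urlsplit(referer).hostname
    if host is None:
        # A Referer value that is not a parseable absolute URL is rejected.
        return False
    host = host.lower()
    # Accept the domain itself or any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in allowed_domains)
```

Because a missing Referer is accepted, any trick that makes the browser omit the header gets through this check.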

Google is known to serve all of its search results over HTTPS. Most other websites, however, still use HTTP. As a result, the referrer information is dropped when a visitor goes from a Google search result to the target webpage. If an automated tracking tool is installed on the server, it may never learn that the request came from a Google search result.
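The downgrade rule behind this can be modelled in a few lines. This is a simplified sketch, not browser source code: the function name is made up, and real browsers also strip fragments and credentials from the URL before sending it.

```python
from urllib.parse import urlsplit


def referer_on_navigation(source_url, target_url):
    """Return the Referer a browser would send under the downgrade rule, or None."""
    source_scheme = urlsplit(source_url).scheme.lower()
    target_scheme = urlsplit(target_url).scheme.lower()
    if source_scheme == "https" and target_scheme == "http":
        # HTTPS page to HTTP page: the header is omitted entirely.
        return None
    # Simplification: otherwise the full source URL is sent.
    return source_url
```

With an HTTPS search results page as the source and a plain HTTP site as the target, the function returns None, which is exactly the case where the tracking tool sees no referrer.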

Google has done two things to solve the problem. The first is the introduction of a “referrer policy” to WebKit. Details can be found here: https://wiki.whatwg.org/wiki/Meta_referrer. Basically, a <meta> tag tells the browser how to handle referrers. The policy Google currently uses is “origin”, which sends “https://www.google.com” as the referrer with no extra information. This is only slightly better than no referrer at all, since the tracking tool still cannot collect keyword information from it. The second is a redirect page between the Google search result and the real target page. Google-owned tracking tools, such as Google Analytics, can use this redirect page to collect the data they need.
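To make the effect of the “origin” policy concrete, here is a small sketch of what the browser keeps from the page URL. The function name is hypothetical; the <meta> tag string follows the format described on the WHATWG wiki page. Browsers serialize the origin with a trailing slash.

```python
from urllib.parse import urlsplit

# The tag a page would include to opt in to the "origin" policy.
META_REFERRER_TAG = '<meta name="referrer" content="origin">'


def origin_referrer(page_url):
    """Reduce a page URL to scheme, host and optional port, as the "origin" policy does."""
    parts = urlsplit(page_url)
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port else ""
    # Path and query string (including any search keywords) are discarded.
    return f"{parts.scheme}://{host}{port}/"
```

The query string, where the search keywords live, never reaches the target site, which is why a tracking tool can tell the visit came from Google but not what was searched for.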

What about non-WebKit browsers? Google uses an HTTP redirect page instead of an HTTPS one. This page redirects with JavaScript rather than with a 301 or 302 response. That way, the target page can collect the correct “Referer” information.
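A JavaScript-based redirect page of this kind might look roughly like the output of the helper below. This is an assumption-laden sketch, not Google's actual page: the function name and the markup are invented for illustration. The point is that the intermediate page is a real document served over HTTP, so the browser treats the next hop as an ordinary HTTP-to-HTTP navigation.

```python
import json


def build_js_redirect_page(target_url):
    """Build a minimal HTML page that redirects to target_url with JavaScript."""
    # json.dumps yields a valid JavaScript string literal; escaping "<" keeps
    # a stray "</script>" inside the URL from closing the script element early.
    js_url = json.dumps(target_url).replace("<", "\\u003c")
    return (
        "<!DOCTYPE html>\n"
        "<html><head><script>\n"
        f"window.location.replace({js_url});\n"
        "</script></head><body></body></html>\n"
    )
```

A server would return this page with a normal 200 response instead of a 301 or 302, leaving the actual navigation to the script.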
