Cookies, small chunks of data placed on your computer by websites so that they can be returned to the website, have in the past 10+ years become a mostly useful part of online life. They are used "everywhere", from the beneficial login credentials used in these forums and various shopping carts, to the more ambivalent uses for advertisement targeting and other user tracking.
It is the latter point that makes many people a bit wary of cookies, and makes some people almost "paranoid" about them.
Let's get one thing settled right away: A cookie can only be sent back to the server that set it, or to the group of servers to which that server belongs, as determined by the server's own name and the domain the server specifies when setting the cookie. A cookie cannot be sent to a server outside that group.
Well, some bright readers are probably already asking: How do you define "group"? And how do you keep the groups from growing too big? Good questions, and that is what the rest of this article will be about.
Let's start with a simple example:
Example 1: www.example.com belongs to two groups, ".example.com" and ".com". The first, ".example.com", is perfectly acceptable, but what about the second one, ".com"? Do we really want sites to be able to send cookies to all .com domains? No, we don't (add capital letters and a couple of exclamation marks, if you want). So the target group must contain at least two components (or an internal dot, as the specification says).
Fortunately, the specification already says that for generic domains (like .com) the target domain must contain at least two components. So we are safe there.
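In code, the "at least two components" requirement is a trivial check. The sketch below is illustrative only (the function name is mine, not from any specification):

```python
def has_internal_dot(domain: str) -> bool:
    """Return True if the target domain has at least two components,
    i.e. at least one dot between them."""
    return len(domain.strip(".").split(".")) >= 2

print(has_internal_dot("example.com"))  # True:  acceptable target
print(has_internal_dot("com"))          # False: a bare TLD is rejected
```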
Now, (example 2) what about www.example.no? Obviously, here we can use the same rules as for the generic domains (example 1).
And what about (example 3) www.example.co.uk? Obviously, since co.uk is a Top Level-like Domain, we cannot permit cookies to be set for co.uk. That would permit a server to send a cookie to all servers in the co.uk area. But how do we reconcile this with example 2?
Or (example 4) what about www.example.suburb.city.state.us? It is equally unacceptable that cookies be set for the domain suburb.city.state.us.
This is starting to look really complicated, to put it mildly.
What we are trying to avoid is accidental, intentional, or even malicious interference with another service (e.g. Bank1 and Bank2 should not necessarily know about each other's customers, and if they want/need to share that information there are better means available to achieve such sharing). Such interference can also have security implications, e.g. a specially crafted cookie might be used to block access to a site, or interfere with its operation.
Further, we want to limit cross-site information gathering. In many cases such information can be connected to a physical person (a few years ago DoubleClick got into serious trouble when they wanted to do just that). When such gathering is performed, the business connections should be out in the open, not hidden, as would be the case if any site could join the network clandestinely, by starting to look for a widely targeted cookie without informing the user.
Netscape, which wrote the initial specification for cookies, specified that the target domains for cookies had to contain at least one internal dot (example.com) in the generic domains like .com, while in all other top level domains (TLDs) the targets had to contain at least two internal dots (example.co.uk). That makes it impossible to set example.no in example 2 as the target domain, while it will permit cookies to be set for city.state.us. Oops! We've got double trouble: A rule that is too restrictive and too relaxed at the same time.
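A minimal sketch of Netscape's two-tier rule makes the double failure easy to see. The generic-TLD set and the function name below are my own illustration, not Netscape's actual code:

```python
# Toy illustration of the original Netscape rule: targets in "generic"
# TLDs need one internal dot, targets in all other TLDs need two.
GENERIC_TLDS = {"com", "net", "org", "edu", "gov", "mil", "int"}

def netscape_rule_allows(target: str) -> bool:
    labels = target.strip(".").split(".")
    internal_dots = len(labels) - 1
    required = 1 if labels[-1] in GENERIC_TLDS else 2
    return internal_dots >= required

# The double trouble described above:
print(netscape_rule_allows("example.no"))     # False - too restrictive
print(netscape_rule_allows("city.state.us"))  # True  - too relaxed
```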
The second part of the rule, about non-generic domains, was never properly implemented by anyone (including Netscape and Opera), possibly because of example 2 (which is completely valid, from a practical point of view), and the rule was also easy to miss: It was a single sentence inside a larger paragraph. In late 1998 this missing check became notorious as the "Cookie Monster Bug". More about the consequences of that below.
The next versions of cookies (RFC 2109, RFC 2965) tried a different track: Cookies can only be set for the parent domain of the server, meaning that www.example.no can set a cookie for example.no, www.example.co.uk for example.co.uk (but not co.uk), and www.example.suburb.city.state.us for example.suburb.city.state.us (but not suburb.city.state.us).
Now, this looks much better, but … hmmmm … what if somebody put the server at example.co.uk or example.suburb.city.state.us? Then they could set a cookie for, respectively, co.uk or suburb.city.state.us. Ooops, again! Not quite as watertight as we would prefer.
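The one-level-up rule, and the loophole just described, can be sketched like this (a toy illustration, not the RFCs' formal matching algorithm):

```python
# Toy sketch of the RFC 2109/2965 idea: a server may set a cookie for
# itself or for its immediate parent domain, nothing higher.
def parent_domain(host: str) -> str:
    """Strip the leftmost label: www.example.co.uk -> example.co.uk."""
    return host.split(".", 1)[1]

def one_level_up_allows(host: str, target: str) -> bool:
    return target in (host, parent_domain(host))

print(one_level_up_allows("www.example.co.uk", "example.co.uk"))  # True
print(one_level_up_allows("www.example.co.uk", "co.uk"))          # False
# The loophole: put the server one level higher, and the same rule
# happily hands out a co.uk-wide cookie.
print(one_level_up_allows("example.co.uk", "co.uk"))              # True - oops
```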
Browser vendors tried several ways to fix the Cookie Monster Bug. One was a blacklist of some second-level domains, like "co", "com", "ac", etc., but they did not (and cannot) get every such TLD-like subdomain.
After the "Cookie Monster Bug" was discovered, Opera initially tried the double-dot-rule, and then the one-level-up approach, but neither worked very well.
The primary problem is: We want to let servers on a site be able to share information with the other servers on the site. But: How do you tell a valid site domain like example.no from the co.uk and suburb.city.state.us-type Top Level-like (non-site) domains?
It is not realistic to use short blacklists: there are too many ways each nation wants to do things, and the names cannot be put into nice, convenient patterns (this is type 1, that is type 2), making it impossible to use a heuristic method.
There is only one realistic option: We have to have some means of finding out which type of domain we are dealing with. This option has two alternatives: Either a big, expensive(!) database, or some kind of rule-of-thumb method.
Here at Opera we went for the rule-of-thumb method: When Opera is checking a cookie whose target domain matches certain criteria (e.g. it is not a .com domain), we do a DNS name lookup for the target domain, to see if there is an IP address for that domain. If there is an IP address for the domain (e.g. example.no) we assume that the domain is a normal company domain, not a co.uk-like domain, and therefore safe. If there is no IP address we assume that the domain is co.uk-like and therefore unsafe, and only allow the cookie to be set for the server that sent the cookie.
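Roughly, the heuristic amounts to a single DNS lookup. The sketch below is a simplified illustration of the idea, not Opera's actual implementation, and it ignores the criteria that decide whether the check is applied at all:

```python
import socket

def domain_looks_like_a_site(target: str) -> bool:
    """Heuristic: a target domain that resolves to an IP address is
    assumed to be a normal site domain; one that does not resolve is
    assumed to be a co.uk-like subTLD, so the cookie is restricted to
    the sending server."""
    try:
        socket.gethostbyname(target)
        return True   # resolves (like example.no) -> treat as a site
    except socket.gaierror:
        return False  # no address -> assume co.uk-like, restrict cookie
```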
Unfortunately, this can break sites that do not have an IP address for their domain, but that is quite easy for the webmaster to fix. It is also quite common to allow surfers to access a website without having to type the "www." part, and many sites use this shorter name in their advertising, so this is not an insurmountable problem. Some sites even put their main site on the domain name, rather than at the www name.
Additionally, it is possible to get past this problem by using Opera's cookie filters to add an accept filter for the domain.
A bit more serious is the fact that some co.uk-like domains actually define an IP address for their domain, for example to provide a directory service. Alright, back to the drawing board.
After investigating many of the alternatives to securing the domain specification, only one alternative appears to do the job as well as it is possible to do it: A complete database of all co.uk-like domains, or subTLDs as I call them.
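With such a database in hand, the check becomes a simple lookup. The tiny hard-coded set below is a toy stand-in for the real, registry-maintained database:

```python
# A made-up, tiny database of subTLDs (registry-controlled suffixes)
# for illustration only; the real database would cover every TLD.
SUBTLD_DATABASE = {
    "com", "uk", "co.uk", "us", "state.us",
    "city.state.us", "suburb.city.state.us",
}

def is_subtld(target: str) -> bool:
    """Refuse cookies whose target domain is itself a known subTLD."""
    return target.strip(".").lower() in SUBTLD_DATABASE

print(is_subtld("co.uk"))          # True  - cookie refused
print(is_subtld("example.co.uk"))  # False - cookie allowed
```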
The problem is still the same one that shot down the idea the first time: It will be prohibitively expensive for a single company to examine all Top Level Domains, and their policies, in order to generate the list, and to maintain it.
The best solution appears to be one often used in Computer Science: "Divide and Conquer"1. In this case, spread the workload of building the database as widely as is efficient and possible, and find the people best suited to create and maintain the list. In my opinion those people are the registries that maintain each Top Level Domain.
The task can be performed by others, but they would have to spend much more effort gathering, organizing, and assuring the quality of the information than the organizations in charge of the policies and the systems.
About three months ago I released an Internet Draft proposing a way of implementing such a database, and describing how clients like Opera can use it to secure their cookie support. It will also be possible to use the specification for other purposes.
At the same time, to document the system, I released another draft describing Opera's implementation of the DNS-based cookie-domain validation system.
The drafts are available from the IETF's servers: DNS validate and subTLD
There are other issues related to control of cookies, in particular within shared hosting services, but those problems require different solutions. I am working to solve them as well, but there is still some way to go before that work is ready to be published.
Please note: Some changes are planned, e.g. using another file format, probably XML. It may also turn out that other protocols can be used instead.
Feel free to send me comments and suggestions, either direct or on the IETF's HTTP Work Group mailing list.
1 "Divide and Conquer" is a procedure by which a task is broken into smaller portions that can be handled independently, and then put back together, in a more efficient manner than if one tried to do the whole job in one go. It is often possible to repeat the procedure on the smaller portion. It should not be confused with the more nefarious political and military meaning of the expression.
Update September 21st:
As the drafts have now expired I have uploaded archive copies. The links above have also been updated.
draft-pettersen-dns-cookie-validate-00.txt
draft-pettersen-subtld-structure-00.txt
Wow … that was extensive. Thanks for sharing this information. I always wondered how those things worked. Could you please explain the following paragraph again? I try hard to understand it, but I seem to fail: "Now, this looks much better, but … hmmmm … what if somebody put the server at example.co.uk or example.suburb.city.state.us? Then they could set a cookie for, respectively, co.uk or suburb.city.state.use. Ooops, again! Not quite as watertight as we would prefer."
_Grey_, what I mean is that even if you limit cookies to the parent domain (the “reach”), e.g. that http://www.somearea.at.example.com can only set cookies for all servers in the somearea.at.example.com-domain, not every server in the example.com-domain, it is (barring other rules preventing it) possible to bypass the intention of the rule with respect to TLD-like domains like co.uk by putting a server at example.co.uk (not http://www.example.co.uk) and then proceed to set cookies that will be sent to ALL servers in every co.uk-domain, e.g. to http://www.mybank-example.co.uk, which is, as stated in my post, of course not remotely desirable. What it means is that, with only that rule, the problem is still just as bad, even if the range of hostnames that can behave maliciously has been restricted. (BTW: Fixed a typo in the example)
Ahh, now I get it. Thanks :) Almost all sites are put at the root domain, though … In this case you could check against the first part (of the URI) being an indication of a server or a domain. In the latter case you can act accordingly (e.g. restricting access to example.co.uk instead of .co.uk). On a sidenote: A rule based entirely on such physical criteria as “dots” doesn’t make any sense. The rule should be formulated by other means … actually the meaning of it, not the visual markings of it. Just to state that, a “dot-rule” sounds like nonsense, like rubbish in my ears. Unfortunately I can’t come up with a nice browser analogy as I had planned, but nevertheless.
I think that it would be better to include in the cookie specification some HTTP headers which list all domains from the same group. For example, we have somesite.co.uk and it sends something like that: cookie-group: somesite.co.uk, http://www.somesite.co.uk, http://www.somesite.com. Then if we visit http://www.somesite.com it should also send such a string so we can check if this group is correct. If the group matches then everything is fine. If not – then hastalavista cookie! I think this would do the trick without any heuristics and buggy methods.
Aux: First, example.co.uk and example.com cannot share cookies anyway, since neither can be a postfix of the other. I am also afraid that your suggestion may have the same problem as the current cookie specifications: If you have a group specification “example.co.uk, http://www.example.co.uk“, how do you know that example.co.uk is a valid domain, and not a subTLD? We can’t trust the information sent to us from the server unless it is signed by a trusted third party. Even if the group specification had to match or overlap before sending cookies, using that method would slow down the process as much as using a HEAD request to a server would. A single OPTIONS request might work, though, but that may still require serious modifications to the servers. And it would also not stop malicious multisite cooperation, or abuse of opportunities due to bad configuration of the specification.
Originally posted by _Grey_: In this case you could check against the first part (of the URI) being an indication of a server or a domain. In the latter case you can act accordingly (e.g. restricting access to example.co.uk instead of .co.uk). Sorry, I am not quite sure I understand what you mean, _Grey_. If you mean that we should try prepending “www.” on the domain to see if it resolves, please try this URL: http://www.co.uk/ If OTOH you mean we should check for an IP address on the target domain, that is what Opera does at the moment, and is what is described in the DNS-validate draft. A rule based entirely on such physical criteria as “dots” doesn’t make any sense. If you mean a rule based on X number of dots in a name, I agree, it does not work. Unfortunately, the original cookie specification was written at a time when the structure of the internet namespace was still in flux, and the people writing it may also have been more familiar with the American-based domain space (.com etc.), but at least they did try to put a policy in place, it just did not work very well.
@yngve: I meant checking if ‘www’ is the prefix of the URL. Imho all servers must be www, www2 etc. If there is such a prefix, one could apply the “rule with the dots” and everything should work fine. In the case that there isn’t, only allow to set cookies for the current domain, no “parent” or “reach” or what. Although this might be too restrictive to some … http://www.co.uk doesn’t resolve to anything. Can’t seem to understand you. (anyone else seeing the irony? 😉) But now I think you’re right. What you suggested might be the only option for not being too restrictive and at the same time being secure. That involves further engagement from quite a few people, though. Maybe you/Opera/someone else should start a petition that can be handed over to Committees, Working Groups, Domain Registries, and so on. There needs to be a simple explanation of the problem, a presentation of the solution and a FAQ for people having questions like the ones I asked or that are likely to be asked. Maybe this just goes too far, though. *g*
http://www.co.uk is the WoolWorths Group website, and it does have an IP address according to my test. The server does not answer, at present, though, but it has on previous tests.
I understand the issue with HEAD/OPTIONS, but what about TLDs? It does not matter which domain is in the group – the user cannot visit co.uk anyway, and even if he could, co.uk would not send a cookie group. So TLDs can be easily ignored.
Aux, it is not possible to differentiate co.uk from any other domain, unless you have extra information. And that means that, at the very least, somesite.co.uk and someothersite.co.uk can both send a domain specification “somesite.co.uk, someothersite.co.uk”, possibly including “co.uk”, meaning that they can share cookies by setting for co.uk. And in order to stop the sharing in case of a mismatch, the client would have to ask the site first “what is your specification?”. OTOH, it could be that you are seeing possibilities I don’t see. Maybe you should flesh it out and submit it to the IETF as an Internet Draft?
I’m not a cookie specialist, so I don’t think I can write a good draft 🙂 As a web-developer I’m satisfied with cookie-for-one-domain policy.
Originally posted by yngve: The next versions of cookies (RFC 2109, RFC 2965) tried a different track: Cookies can only be set for the parent domain of the server, meaning that http://www.example.no can set a cookie for example.no, http://www.example.co.uk for example.co.uk, but not co.uk, and http://www.example.suburb.city.state.us for example.suburb.city.state.us, but not suburb.city.state.us. This approach sounds not too bad to me, just a little bit short-sighted, as you pointed out the problem with the missing prefix www (which is usual for subdomains). www is not really a prefix, but just treat it like one and define that it is superfluous. Instead of the ‘parent’ domain, the domain itself would be the valid target. There has to be a tweak for www[nn] (i.e. www2), which should be easily catchable with regular expressions (to be in line with RFC 2109 and RFC 2965 you can remove www[nn], if any, and virtually add www again to use the parent domain – it is just harmonising/normalising in a formal sense). With this first rule http://www.co.uk would be invalid (maybe the rules for co.uk changed meanwhile??). It should not be too difficult to have an additional white list of all possible www[nn] in front of a TLD. Speaking of a TLD in these cases doesn’t mean the entry behind the last dot, but what is logically the TLD: co.uk is a TLD in this sense. Still, the number is very limited and not subject to frequent changes. But I suppose it is not that easy. Maybe RFC 2109 and RFC 2965 are not part of the real world and subdomain cookies should be valid for the domain itself? Therefore subdomain1.example.com, subdomain2.example.com and example.com should belong to the same group. But of course example.suburb.city.state.us should be treated differently, as a single group. In this case it is really bad and I wish you and all the other responsible people success and luck with a working solution.
Hi Yngve. You wrote that Opera will not set a cross-domain cookie for websites with domains that do not have an IP address (such as .example.local). But what if I am a web developer and I simply want to test my cross-site auth (which works great in other browsers)? E.g., when I log in at test1.example.local I set the cookie for the group example.local, but my other site with the domain test2.example.local can’t access it. I’ve populated the /etc/hosts file on my local machine with the needed records, so now I can resolve test1.example.com and test2.example.com, but it seems that Opera still doesn’t want to share the cookie. Maybe my local dev domains should be resolved to the 127.0.0.1 address? Any advice?
As mentioned above, it is possible to override the block using a Cookie Accept filter. BTW, do you mean cross-domain, or domain-wide? “Cross-domain”, to me, means that example.org tries to set (in a response from example.org) a cookie for example.com, which is not allowed. Domain-wide cookies are allowed if the cookie is set for a second-level domain in a generic TLD, one level up in other TLDs (except on the second level), and anything else has to pass the IP address test, or be assigned a filter.
Yes, I meant domain-wide, not cross-domain, sorry. Thanks for the answer.
egcrosser: AFAIK DNS information is not easily available to a client that does not also retrieve raw DNS information, which might mean that it would need to discover the system’s detailed network configuration, rather than allowing the OS to do all of that. Also, DNS is not necessarily available if you are behind a firewall that only allows access to external networks through a proxy. The only thing you can reasonably rely on being able to use in such a case is HTTP and HTTPS; DNS is out as an option. There are even platforms where the client cannot do DNS lookups for establishing sockets at all: you have to tell the OS the name of the server you want to connect to, and you won’t know until the connection fails whether there really is a server with that name.
subTLD proposal looks more taxing and less beautiful than it could be. Just persuading the quasi-TLD owners to include a specific RR into their zone indicating “this zone is generic and therefore unsuitable as a cookie target” seems much easier and scales perfectly down to “webhotels”.
What if there were something that behaved like the Flash crossdomain.xml file? For example, this is one of YouTube’s: http://v1.lscache1.c.youtube.com/crossdomain.xml What this says is any of these URLs are allowed to access this site. All others will be denied access. (Note, this is for access through the Flash player.) Imagine if something like this could be applied to cookies. If I own bob.co.uk and I want to set a cookie for bob.co.uk from hi.my.name.is.bob.co.uk, then I simply have to add some sort of an XML file at bob.co.uk that says this URL is allowed to access it. This file would be hosted at http://bob.co.uk/cookies.xml, so there is no way we could grant ourselves access to read or write cookies for “.co.uk”. This could even allow the flexibility to share cookies across totally separate domains, even with different TLDs: http://bob.co.uk/cookies.xml I understand this would never happen, but hey.. it’s nice to dream 🙂
John Jardine: I have been considering something like that, but being applied at each step upwards, allowing a site owner to apply policies locally or within a domain. The basic problem is to decide when to stop looking, which basically means we need a list of public suffixes before we can start.
What I don’t get is why cross-domain policy is useful at all? I can’t think of an area where I, as a developer, would need this or would consider its benefits in any way worth the labyrinth of complexity (read: SECURITY HOLES) in order to “enable” this otherwise useless feature. I run a fairly extensive cluster of application servers all secured by wildcard SSL. Whenever I need to “jump” from one subdomain to another, I simply “pre-set” the session on the target server via a secured connection from the source server and then hand a header redirect to the target server to the end-user’s browser. Since the target server is expecting the hit, it sets the cookie as soon as it recognizes the browser and the hand-off is complete. The process is nearly instantaneous and invisible to the end user, and it only requires a post on a private backplane network to enable it. I never even bothered to *try* cross-domain cookies, and I think the idea is fundamentally flawed, as this post articulates.
Benjamin, this is not about cross-domain as in the case of server1.example.com talking to server2.example.net (for which setting a cookie directly from one domain to the other is not allowed, unless one uses third-party redirects), but of server1.example.tld setting a cookie for example.tld, when you don’t know if server1.example.tld has the authority to set that cookie, because you don’t know if example.tld is a domain like opera.com (which would be OK) or a domain like co.uk or city.state.us (which would NOT be OK). Just in the Norwegian TLD there are 700 domains of the latter type, and in the dot-us TLD there are likely tens of thousands, or more.
Mozilla already maintains and publishes a database for this purpose; see http://publicsuffix.org.
gojomohr: So are we, based on Mozilla’s list. http://my.opera.com/rootstore/blog/2009/06/17/swisssign-ev-enabled-and-a-public-suffix-list and http://my.opera.com/yngve/blog/2009/06/17/refreshed-subtld-public-suffix-drafts
OK, it just seemed odd not to have mentioned it as relevant prior work in this post or the I-D. In particular, to the extent the publicsuffix.org list is widely used, and doesn’t seem to have any fatal flaws, it seems a strong counterexample to the suggestion, “It will be prohibitively expensive for a single company to examine all Top Level Domains, and their policies, in order to generate the list, and to maintain it.”
Originally posted by gojomohr: OK, it just seemed odd not to have mentioned it as relevant prior work in this post or the I-D. Work on the Mozilla list did not start until after the above article was written. The Mozilla list is crowd-sourced, they used several years to produce it, and they have AFAIK been contacting every TLD registrar to ask them to help assure the quality of the list. And the list will soon have to be massively extended, since we have started to get IDN TLDs (TLDs using non-Latin scripts, such as Cyrillic and Indic scripts), and ICANN is about to open the Generic TLD floodgates. My point in the article and the I-Ds is that for a third party to stay on top of such changes, as well as changes in the existing domain structure, will be costly, whether you count money or person-hours spent, and time-consuming, and it will likely lag behind actual events. It would be far more efficient if the registrars themselves maintained the information and made it available through some common repository.