April 22, 2021
We invited guest blog author, Mingkui Wei, to submit a summary of their research to the blog this week. This blog post is based on the upcoming Usenix Security paper (full version here). Note that the domain shadowing ideas presented herein are intended to be a building block for a future system that doesn’t exist for end-users yet. We hope this post will help system designers to think in new ways, and use those ideas to build new censorship circumvention tools.
What is Domain Shadowing?
Domain shadowing is a new censorship circumvention technique that, like domain fronting, leverages Content Distribution Networks (CDNs) to reach blocked content. However, domain shadowing works completely differently from domain fronting and is stronger in terms of blocking-resistance. Compared to domain fronting, one big difference among many is that the user in domain shadowing is in charge of the whole procedure. In other words, the complete system can be configured solely by the user, without assistance from either the censored website or an anti-censorship organization.
How Domain Shadowing Works
We start this section by explaining how domain names are resolved and translated by a CDN.
CDNs act like a reverse proxy that hides the back-end domain and presents only the front-end domain to the public. CDNs typically take one of two approaches to accomplish the name translation, as shown in the following two figures. To facilitate the illustration, we assume that the publisher’s (i.e. the person who wants to use a CDN to distribute the content of their website) origin server is hosted on Amazon Web Services (AWS) and assigned the canonical name abc.aws.com, and that the publisher wants to advertise the website using the domain example.com, whose DNS records are hosted on GoDaddy’s name server.
Figure 1 shows the name translation procedure used by most CDNs, and we use Fastly as an example. To use Fastly’s service, the publisher first logs into their Fastly account and sets example.com as the frontend and abc.aws.com as the backend. Then, the publisher creates a new CNAME record on GoDaddy’s name server, which resolves the domain example.com to a fixed domain, global.ssl.fastly.net. The remaining steps in Figure 1 are intuitive.
There are also some CDNs, such as Cloudflare, that host their own authoritative name servers. In this case, step 2 and step 3 in Figure 1 can be skipped (as shown in Figure 2).
Note that the last four steps in both figures show how name resolution by a CDN differs from regular DNS resolution. Specifically, a regular DNS server only responds to a DNS query with the location of the origin server, and the client must fetch the document itself, while the CDN actually fetches the web document on behalf of the client.
Figure 1: Name resolution by Fastly
Figure 2: Name resolution by Cloudflare
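To make the Fastly-style flow concrete, here is a minimal Python sketch that models the two lookups as plain dictionaries. It is only a toy model, not a real resolver: the record table, the IP address 192.0.2.10 (taken from the documentation address range), and the helper names resolve and edge_lookup are all assumptions made for illustration.

```python
# Toy DNS zone data. The IP address is a placeholder, not a real Fastly address.
RECORDS = {
    "example.com": ("CNAME", "global.ssl.fastly.net"),
    "global.ssl.fastly.net": ("A", "192.0.2.10"),
}

# Frontend-to-backend binding stored in the publisher's CDN account.
BINDINGS = {"example.com": "abc.aws.com"}


def resolve(name, records, max_hops=8):
    """Follow CNAME records until an A record is reached; return (chain, ip)."""
    chain = [name]
    for _ in range(max_hops):
        rtype, value = records[chain[-1]]
        if rtype == "A":
            return chain, value
        chain.append(value)
    raise RuntimeError("CNAME chain too long")


def edge_lookup(host_header, bindings):
    """Return the backend an edge server would fetch from for this Host header."""
    return bindings[host_header]


if __name__ == "__main__":
    chain, ip = resolve("example.com", RECORDS)
    print(" -> ".join(chain), "->", ip)
    print("edge fetches from:", edge_lookup("example.com", BINDINGS))
```

The point of the sketch is that DNS only ever leads the client to the CDN. The mapping from example.com to abc.aws.com lives inside the CDN account and never appears in a DNS answer.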
Based on the above introduction, we can now present how domain shadowing works. Domain shadowing takes advantage of the fact that when the domain binding (i.e. the connection between the frontend and the backend domains) is created, the CDN allows arbitrary domains to be set at the backend. As a result, a user can freely bind a frontend domain to any backend domain. To access a blocked domain (e.g. censored.com) within a censored area, a censored user only needs to take the following steps:
- The user registers a random domain as the “shadow” domain, for example: shadow.com. We assume the censor won’t block this newly registered domain.
- The user subscribes to a CDN service that is accessible within the censored area and is not itself censored. A practical example would be a CDN that deploys all its edge servers outside the censored area.
- The user binds the shadow domain to the censored domain in the CDN service by setting the shadow domain as the frontend and the censored domain as the backend.
- The user creates a rule in their CDN account to rewrite the Host header of incoming requests from Host:shadow.com to Host:censored.com. This is an essential step since otherwise, the origin server of censored.com will receive an unrecognized Host header and be unable to serve the request.
- Finally, to access the censored domain, the user sends a request to https://shadow.com within the censored area. The request will be sent to the CDN, which will rewrite the Host header and forward the request to censored.com. After the response is received from censored.com, the CDN will return the response to the user “in the name” of https://shadow.com.
During this process, the censor will only see the user connect to the CDN using HTTPS and request resources from shadow.com, and thus will not block the traffic.
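The steps above can be sketched as a small Python simulation. This is a simplified model under stated assumptions, not any CDN's actual implementation: the Request class and the censor_view and edge_forward helpers are hypothetical names, and the "censor" is assumed to see only the server name in the TLS handshake.

```python
from dataclasses import dataclass


@dataclass
class Request:
    sni: str         # server name visible in the TLS handshake
    host: str        # Host header, carried inside the encrypted HTTPS request
    path: str = "/"


# Binding the user configured in the CDN account: frontend -> backend.
BINDINGS = {"shadow.com": "censored.com"}


def censor_view(req):
    """Hostname an on-path observer can read from the HTTPS connection."""
    return req.sni


def edge_forward(req, bindings):
    """Model the Host-rewrite rule: forward to the bound backend with its own Host."""
    backend = bindings[req.host]
    return {"connect_to": backend, "headers": {"Host": backend}, "path": req.path}


if __name__ == "__main__":
    req = Request(sni="shadow.com", host="shadow.com", path="/news")
    print("censor sees:", censor_view(req))
    print("origin receives:", edge_forward(req, BINDINGS))
```

In this model the censor only ever observes shadow.com, while the origin server receives a request whose Host header is censored.com, which is why the rewrite rule is needed.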
On a CDN that still supports domain fronting, we can apply domain fronting techniques to make domain shadowing stealthier. To do this, we still set shadow.com as the frontend and censored.com as the backend, but when an HTTPS request is issued from within the censored area, the user will request the front domain front.com and set the Host header to be shadow.com. This way, the censor only sees the user is communicating with the front domain and will not even suspect the user’s behavior.
What’s the Benefit of Domain Shadowing?
Compared to its sibling, domain fronting, the obvious benefit of domain shadowing is that it can use any CDN (as long as the CDN supports domain shadowing, and based on our experiments most CDNs do) to access any domain. Domain fronting can only access domains on the same CDN as the front domain, which is a big limitation. With domain shadowing, the censored domain does not need to be on the same CDN. In fact, it does not need to use a CDN at all.
Another shortcoming of domain fronting is that it can be (and is being) painlessly disabled by CDNs, which simply mandate that the Host header of an HTTPS request match the SNI of the TLS handshake. Domain shadowing, on the other hand, is harder to disable since allowing a user to configure the backend domains is a legitimate feature of CDNs.
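The Host-versus-SNI check can be illustrated with a short Python sketch. It is a toy model of the policy described above, not the logic of any particular CDN. The function names and the enforce_match flag are assumptions for illustration.

```python
# Binding configured by the user: frontend (shadow) -> backend (censored).
BINDINGS = {"shadow.com": "censored.com"}


def host_matches_sni(sni, host):
    """Policy some CDNs enforce: the Host header must equal the TLS SNI."""
    return sni.lower() == host.lower()


def edge_accepts(sni, host, bindings, enforce_match):
    """Return the backend to fetch from, or None if the request is refused."""
    if enforce_match and not host_matches_sni(sni, host):
        return None
    return bindings.get(host.lower())


if __name__ == "__main__":
    # Fronting combined with shadowing: SNI is front.com, Host is shadow.com.
    print(edge_accepts("front.com", "shadow.com", BINDINGS, enforce_match=False))
    print(edge_accepts("front.com", "shadow.com", BINDINGS, enforce_match=True))
    # Plain domain shadowing: SNI and Host are both shadow.com.
    print(edge_accepts("shadow.com", "shadow.com", BINDINGS, enforce_match=True))
```

In this sketch, turning on the match requirement refuses the fronted request, but the plain shadowing request still goes through, because its SNI and Host header agree and the backend binding is ordinary account configuration.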
Compared to VPS-based schemes, domain shadowing is (possibly) faster and does not need dedicated third-party support. A proxy-on-VPS scheme relays traffic through a single self-deployed proxy, whereas domain shadowing’s relay is the full set of the CDN’s edge servers, which operate on the CDN’s high-speed backbone network, and the whole infrastructure is optimized specifically to distribute content fast and reliably. The following figure compares the delay of fetching a web document directly from the origin server, using Psiphon, using proxy-over-EC2 (with 2 instances based on different hardware configurations), and using domain shadowing based on 5 different CDN providers (Fastly, Azure CDN, Google CDN, AWS CloudFront, and StackPath). From the figure, we can see that domain shadowing beats the other schemes most of the time.
Challenges of Domain Shadowing
At this moment, domain shadowing faces the following main challenges:
Complexity: The user must configure the frontend and backend domains in their CDN account for every censored domain they want to visit. Although such configuration can be automated using the CDN’s API, the user still needs sufficient knowledge of relatively complex operations, such as how to register with a CDN, how to enable API configuration and obtain API credentials, and how to register a domain.
Cost: Based on our survey, for 500 GB monthly data usage, the cost of using domain shadowing with a reputable CDN is about $40, which increases or decreases linearly with the data usage in general. If the user chooses to use an inexpensive CDN, the cost could be brought to under $10 per month. However, this still can’t beat free tools such as Psiphon and Tor.
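As a rough illustration of the linear cost model, the following Python sketch derives a per-GB rate from the survey figure (about $40 for 500 GB) and scales it. The rate is only what that single data point implies. Real CDN pricing is often tiered or region-dependent, so treat the numbers as an approximation, and the function name monthly_cost as hypothetical.

```python
# Per-GB rate implied by "about $40 for 500 GB" on a reputable CDN (approximation).
REPUTABLE_RATE = 40 / 500  # USD per GB


def monthly_cost(data_gb, rate_per_gb):
    """Estimate the monthly cost in USD, assuming cost scales linearly with usage."""
    return data_gb * rate_per_gb


if __name__ == "__main__":
    for gb in (100, 250, 500, 1000):
        print(f"{gb} GB: ${monthly_cost(gb, REPUTABLE_RATE):.2f}")
```

Under the same assumption, a CDN that brings 500 GB under $10 per month would correspond to a rate below $0.02 per GB.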
Security: By using domain shadowing, the browser “thinks” it is only communicating with the shadow domain, while the web documents actually come from all the different censored domains (see the following figure, where we visit Facebook using Forbes.com as the shadow domain). Such domain transformation makes the same-origin policy no longer enforceable. While a browser extension can help with this issue to some extent, the user must be aware of the risk and cautious about which websites to visit.
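To see why the same-origin policy breaks down, recall that a browser identifies an origin by the tuple (scheme, host, port). The Python sketch below computes that tuple for a few URLs. The URLs and paths are made up for illustration, and the origin helper is a simplified stand-in for what a browser does, not a browser API.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) tuple a browser would treat as the origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


if __name__ == "__main__":
    # Accessed directly, two different sites have different origins.
    print(origin("https://facebook.com/") == origin("https://forbes.com/"))
    # Accessed through one shadow domain, content from different censored
    # sites appears to the browser under a single origin.
    print(origin("https://shadow.com/page-from-site-a")
          == origin("https://shadow.com/page-from-site-b"))
```

Because everything served through the shadow domain shares one origin, cookies and scripts from unrelated censored sites are no longer isolated from each other by the browser.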
Privacy: CDNs intercept all HTTP and HTTPS traffic. That is, when a CDN is involved, the HTTPS connection is no longer between the client and the origin server but between the client and the CDN edge server. Thus, the CDN is able to view and modify any and all traffic between the user and the target server. While such abuse is very unlikely, especially for large and reputable CDNs, users should be aware of the possibility.
In this post, we explained domain shadowing, a new technique that achieves censorship circumvention using CDNs as leverage. It differs from domain fronting, but can work hand-in-hand with domain fronting to achieve better blocking-resistance. While significant work is still needed to address all the challenges and make it deployable, we see domain shadowing as a promising technique to achieve better censorship circumvention.