Tuesday, December 13, 2011

Trusted Execution In Untrusted Cloud


Wouldn't it be nice if we could actually own our data and programs in the cloud? By “owning” here I mean having control over their confidentiality and integrity. When it comes to confidentiality and integrity of the data, it's not exactly rocket science, as classic crypto (and secure client systems) is all that we need. I have already written about it in an earlier post.
But it would also be nice if we could somehow get the same confidentiality and integrity assurances for the programs that we upload for execution in the cloud...

For example, a company might want to take their database application, one that deals with all sorts of critical and sensitive corporate data, and upload and safely run it on e.g. Amazon's EC2, or maybe even on some China-based EC2 clone. Currently there is really nothing that could stop the provider, who has full control over the kernel or the hypervisor under which our application (or our VM) executes, from reading the contents of our process' memory and stealing the secrets from there. This is all easy to do from a technical point of view, and it is also not just my own paranoia...


Plus, there are the usual concerns, such as: is the infrastructure of the cloud provider really as safe and secure as advertised? How do we know that nobody found an exploitable bug in the hypervisor and was able to compromise other customers' VMs from within an attacker-rented VM? The same questions apply, of course, even if we decided not to outsource our apps to a 3rd-party cloud, but in the case of a 3rd-party cloud we really don't know what measures have been applied. E.g. is the physical server on which my VMs are hosted also used to host some foreign customers? From China maybe? You get the point.

Sometimes all we really need is just integrity, e.g. if we wanted to host an open source code revision system, such as a git repository, or a file server. Remember the kernel.org incident? On a side note, I find Jonathan Corbet's self-comforting remarks on how there was really nothing to worry about strikingly naive... I could easily think of a few examples of how the attacker(s) could have exploited this incident so that Linus & co. would never (or not soon) find out. But that's another story...

But, how can one protect a running process, or a VM, from a potentially compromised OS, or a hypervisor/VMM?

To some extent, at least theoretically, Intel Trusted Execution Technology (TXT) could be used to implement such protection. Intel TXT can attest to a remote entity, which in this case would be the cloud customer, to the hash of the hypervisor (or kernel) that has been loaded on the platform. This means it should be possible for the user to know that the cloud provider runs the unmodified Xen 4.1.1 binary as the hypervisor, and not some modified version with a built-in FBI backdoor for memory inspection. OK, it's a poor example, because the Xen architecture (like that of any other commercially used VMM) allows the administrator who controls Dom0 (or its equivalent) to essentially inspect and modify all the memory in the system, including that belonging to other VMs, and no special backdoors in the hypervisor are needed for this.
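To make the customer's side of this concrete, here is a minimal sketch of the verifier's role. Everything here is a simplification: the quote retrieval and its signature verification are assumed to happen elsewhere, the filename is hypothetical, and real TXT measures more than a single file hash.

```python
# A minimal sketch of the verifier's side of remote attestation: compare the
# measurement reported by the platform against the hash of the hypervisor
# binary we expect the provider to run. The signature on the TPM quote is
# assumed to have been checked already (see the EK discussion below).

import hashlib

def expected_measurement(path="xen-4.1.1.gz"):   # hypothetical filename
    # TPM 1.2 PCRs are SHA-1 based, hence sha1 here.
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

def hypervisor_is_trusted(reported_pcr_value):
    # reported_pcr_value comes from a (signature-verified) TPM quote.
    return reported_pcr_value == expected_measurement()
```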

But let's assume hypothetically that Xen 5.0 changed that architecture, so that Dom0 would no longer be able to access any other VM's memory. Additionally, if we also assumed that the Xen hypervisor was secure, so that it was not possible to exploit any flaw in the hypervisor, then we should be fine. Of course, assuming also that there were no flaws in the TXT implementation, and that the SMM was properly sandboxed, or that we trusted (some parts of) the BIOS (these are really complex problems to solve in practice, but I know there is some work going on in this area, so there is some hope).

Such a TXT-based solution, although a step forward, still requires us to trust the cloud provider a bit... First, TXT doesn't protect against bus-level physical attacks – think of an attacker who replaces the DRAM dies with some kind of DRAM emulator – a device that looks like DRAM to the host, but on the other end allows full inspection/modification of its contents (well, ok, this is still a bit tricky, because of the lack of synchronization, but doable).

Additionally, for Remote Attestation to make any sense, we must somehow know that we “talk to” a real TPM, and not to some software-emulated TPM. The idea here is that only a “real” TPM has access to a private key, called the Endorsement Key, used for signing during the Remote Attestation procedure (or used during the generation of the AIK key, which can be used alternatively for Remote Attestation). But then again, who generates (and so: owns) the private Endorsement Keys? Well, the TPM manufacturer, which might be... some Asian company that we don't necessarily want to trust that much...
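As a rough model of the trust decision this boils down to on the verifier's side: a quote is only worth anything if its signing AIK is vouched for by a key we recognise as belonging to a genuine TPM vendor. The sketch below is a deliberate simplification (the real TPM 1.2 protocol routes AIK certification through a Privacy CA, and the EK is used for decryption rather than direct signing) and assumes the python cryptography package.

```python
# Simplified model: accept an AIK (and hence the quotes it signs) only if
# some manufacturer key we already trust has signed the AIK's public part.
# A software-emulated TPM, lacking a genuine EK, cannot obtain such a
# certification.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding

def aik_is_genuine(aik_public_bytes, certification_sig, trusted_vendor_keys):
    for vendor_pub in trusted_vendor_keys:   # public keys obtained out of band
        try:
            vendor_pub.verify(certification_sig, aik_public_bytes,
                              padding.PKCS1v15(), hashes.SHA256())
            return True
        except InvalidSignature:
            continue
    return False
```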

Now we see that it would really be advantageous for customers if Intel decided to return to the practice of implementing the TPM inside the chipset, as they did in the past for their Series 4 chipsets (e.g. Q45). This would also protect against LPC bus-level attacks against the TPM (although somebody told me recently that the TPM in current systems cannot be so easily attacked from the LPC bus, because of some authentication protocol being used there – I really don't know, as physical attacks have never been an area we looked at extensively; any comments on that?).

But then again, the problem of DRAM content sniffing still remains, although I would consider this to be a complex and expensive attack. So, it seems to me that most governments would be able to bypass such TXT-ensured guarantees in order to “tap” the user's programs executing at cloud providers that operate within their jurisdictions. But at least this could stop malicious companies from starting up fake cloud services with the intent to easily harvest sensitive data from unsuspecting users.

It seems that the only way to solve the above problem of DRAM sniffing attacks is to add some protection at the processor level. We can imagine two solutions that processor vendors could implement:

First, they could opt for adding an in-processor hardware mechanism for encrypting all the data that leave the processor, ensuring that everything kept in DRAM is encrypted (and, of course, also integrity-protected) with some private key that never leaves the processor. This could be seen as an extension to Intel TXT.
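A conceptual sketch of this idea follows. It is nothing like real silicon: the per-line nonce handling is grossly simplified, and it assumes the python cryptography package. The point is only that whatever leaves the CPU package is encrypted and integrity-protected under a key the memory bus never sees.

```python
# Toy model of "encrypt everything that leaves the processor": data written
# to DRAM is authenticated-encrypted with a key that never leaves the CPU,
# so a DRAM emulator or bus sniffer sees only ciphertext and cannot tamper
# with it undetected.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

cpu_internal_key = AESGCM.generate_key(bit_length=128)   # never leaves the CPU
aead = AESGCM(cpu_internal_key)

def write_to_dram(address, cache_line):
    nonce = os.urandom(12)
    # Binding the physical address in as associated data prevents ciphertext
    # from being replayed at a different location without detection.
    ct = aead.encrypt(nonce, cache_line, address.to_bytes(8, "little"))
    return nonce, ct              # this is what actually hits the memory bus

def read_from_dram(address, nonce, ct):
    # Raises InvalidTag if the DRAM contents were modified or misplaced.
    return aead.decrypt(nonce, ct, address.to_bytes(8, "little"))

nonce, ct = write_to_dram(0x1000, b"secret database page".ljust(64, b"\0"))
assert read_from_dram(0x1000, nonce, ct).startswith(b"secret database page")
```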

Such a scheme would still mean, however, that we need to rely on: 1) the hypervisor not containing bugs, 2) the whole VMM architecture properly protecting VMs' memory, specifically against Dom0, 3) Intel TXT not being buggy either, 4) the SMM being properly sandboxed, or alternatively trusting (some parts of) the BIOS and the SMI handler, 5) the TPM's EK key being non-compromised and verifiable as genuine, 6) TPM bus attacks being made impossible (those last two could be achieved by moving the TPM back onto the chipset, as mentioned above), and finally, 7) the encryption key used by the processor for data encryption being safely kept inside the processor.

That's still quite a lot of things to trust, and it would require quite a lot of work to make it really secure in practice...

The other option is a bit more crazy, but also more powerful. The idea is that the processor might allow the creation of untrusted supervisors (or hypervisors). Bringing this down to x86 nomenclature, it would mean that kernel-mode (or VT-x root) code could not sniff or inject code into the (crypto-protected) memory of usermode processes (or VT-x guests). This idea is not as crazy as you might think, and there has even been some academic work done in this area. Of course, there are many catches here, as this would require specifically written and designed applications. And if we ever considered using this technology also for client systems (how nice it would be if we could just remove some 200-300 kLOC of the Xen hypervisor from the TCB in Qubes OS!), the challenges are even bigger, mostly relating to safe and secure trusted output (screen) and, especially, input (keyboard, mouse).

If this worked out, then we would need to trust just one element: the processor. But we need to trust it anyway. Of course, we also need to trust some software stack, e.g. the compilers we use at home to build our application, and the libraries it uses, but that's a somewhat unrelated issue. What is important is that we would now be able to choose that (important) software stack ourselves, and not care about all the other software used by the cloud provider.

As I wrote above, the processor is the final element we always need to trust. In practice this comes down to also trusting the US government :) But we might imagine users consciously choosing e.g. China-based or Russia-based cloud providers and requiring (cryptographically) that their hosted programs run on US-made processors. I guess this could provide reasonable politically-based safety. And there is also ARM, with its licensable processor cores, where, I can imagine, the licensee (e.g. an EU state) would be able to put in their own private key, not known to any other government (here I assume the licensee also audits the processor RTL for any signs of backdoors). I'm not sure whether it would be possible to hide such a private key from a foundry in Hong Kong, or somewhere, but luckily there are also some foundries within the EU.

In any case, it seems like we could make our cloud computing orders of magnitude safer and more secure than it is now. Let's see whether the industry will follow this path...

17 comments:

c43ssmas73r said...

No because

...think of an attacker who replaces the processor dies with some kind of processor emulator – a device that looks like a processor to the host, but on the other end allows full inspection/modification of its contents (well, ok, this is still a bit tricky, but doable)...

Joanna Rutkowska said...

@c43ssmas73r: the point of the whole exercise is that it would not be possible for the adversary to build such a processor emulator, because she would not know the private key that is only in a legitimate processor.

Shayan Pooya said...

Are you considering git's signatures when talking about the possible hard-to-discover exploits on kernel.org? (considering the only important thing on that server is the kernel itself). Would you please elaborate on these possible exploits?

Also, this post is not showing well in google reader (in firefox and in android app). Some spaces between words are missing.

Joanna Rutkowska said...

@Shayan:

No, of course I didn't assume properly applied and checked digital signatures -- those should provide ultimate safety, naturally. However, Corbet's explanation does _not_ resort to them! And we know why: there are lots of e.g. kernel branches on kernel.org that are never (or very rarely) signed! Yet, this doesn't stop the community there from self-comforting that all is fine. Considering this specific example of git, I think it should really be enforcing digital signatures (signed tags) on _every_ commit automatically.
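For instance, something along the lines of the hook below could reject any pushed tag that is not validly signed. This is just a sketch: the hook path and the policy of only checking tags are assumptions, and a real deployment would also have to say something about branch tips.

```python
#!/usr/bin/env python
# Sketch of a server-side "update" hook (hooks/update in the bare repo)
# that refuses any pushed tag which is not a validly GPG-signed tag.
import subprocess
import sys

refname, old_sha, new_sha = sys.argv[1:4]

if refname.startswith("refs/tags/") and new_sha != "0" * 40:
    # 'git verify-tag' exits non-zero for lightweight/unsigned tags and for
    # signatures that do not verify against the server's GPG keyring.
    if subprocess.call(["git", "verify-tag", new_sha]) != 0:
        sys.stderr.write("rejecting %s: not a validly signed tag\n" % refname)
        sys.exit(1)

sys.exit(0)
```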

Regarding the wrong formatting: hmmm, I don't see that. Sounds strange though, because I used Google Blogger to create the blog... Anybody else see this problem?

Anonymous said...

Fully homomorphic encryption might one day solve the cloud computing issues.
With fully homomorphic encryption you wouldn't even need to trust the CPU of the cloud provider.
But it's still a matter of research...

Joanna Rutkowska said...

@Anonymous: can you elaborate more on how we could avoid trusting the processor?

mzlee said...

There is a strain of more recent research (Overshadow, CloudVisor) that tries to tackle this untrusted-intermediary problem. Overshadow takes on the idea of a trusted hypervisor that protects user applications from an untrusted OS in the middle. A more recent, and perhaps more promising, variant of this idea is CloudVisor: it takes the form of a small security VMM that uses nested virtualization to prevent the real VMM from inspecting a particular VM.

In many ways, this is just a game of controlling the lowest layer, but they (the CloudVisor people) make a compelling argument that their VMM only needs a small interface to enforce isolation between an untrusted VMM and the different VMs.

Alexey said...

The anonymous poster is talking about fully homomorphic encryption, which, theoretically, would allow arbitrary computations to be performed by manipulating ciphertext, without any knowledge of the plaintext. Some signature produced as a part of the computation would certify that the computation sequence itself was not tampered with. However, this field is not very advanced yet (as far as I know). Possible vulnerabilities are not well understood, and it's totally impractical with current technology.

Anonymous said...

Well, the basic idea of fully homomorphic encryption (FHE) is to do the operations you want to do on encrypted data. Then, when you decrypt the result, you get the result as if you had worked on the plaintext data.
The algorithms themselves which work on the encrypted data are public, though.
Back to the cloud provider example: I could decide to run some fancy well-known data mining algorithm on my private data, encrypt the data using FHE, and send it to the cloud provider, who executes the data mining algorithm on it. Afterwards I receive the encrypted result and decrypt it on my machine to see the real result. Hence I don't need to trust the cloud provider's CPU. OK, I agree that I must trust it to execute the data mining algorithm correctly, but usually the algorithms are not that important to be kept secret.
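Just to make that workflow concrete, here is a toy illustration. It is NOT fully homomorphic and NOT secure; it only uses the well-known multiplicative homomorphism of unpadded RSA, with textbook-sized numbers, to show the shape of "compute on ciphertext, decrypt at home".

```python
# Toy "compute on ciphertext" demo: unpadded RSA is multiplicatively
# homomorphic, i.e. E(x) * E(y) mod n == E(x * y mod n).

p, q, e = 61, 53, 17                 # textbook-sized parameters
n = p * q                            # 3233, the public modulus
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)

def E(m): return pow(m, e, n)        # "encrypt" on the client
def D(c): return pow(c, d, n)        # "decrypt" on the client

x, y = 7, 6
cx, cy = E(x), E(y)                  # only ciphertexts are sent out

c_result = (cx * cy) % n             # the provider computes on ciphertext
assert D(c_result) == x * y          # the client recovers 42
```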

Anyway this is probably far away in the future. However, current research indicates that it might one day be possible. Actually there already exist FHE cryptosystems, but they are highly inefficient so far (Craig Gentry et al).

Ahmed Masud said...

There are a lot of man-in-the-middle attack scenarios that leave a bitter taste in the mouth, unless the entire communications path and the intermediate processors are also verified. Which still leaves the problem of ensuring that the clear text is not tampered with. Here you would say FHE, so let's just have a trivial look at that:

As for FHE: let's take the situation where you have a scheme where E(x•y) = E(x) • E(y) for some binary operation, and you want the CPU to perform the '•' on ciphertext.

Trivially, a rogue CPU in the cloud can easily take the E(x•y) and "add" a delta to it: E(x•y) • E(z), where E(z) is the attacker's operand.
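Using the same toy unpadded-RSA numbers as in the comment above (n = 3233, e = 17, d = 2753; again, purely illustrative and not secure), the mauling looks like this:

```python
# Same toy unpadded-RSA parameters as above; purely illustrative.
n, e, d = 3233, 17, 2753
E = lambda m: pow(m, e, n)
D = lambda c: pow(c, d, n)

c_result = E(7 * 6)                  # honest ciphertext of x*y = 42
c_mauled = (c_result * E(2)) % n     # rogue host multiplies in E(z), z = 2
assert D(c_mauled) == 84             # the client silently decrypts 84, not 42
```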

Now comes the question of whether or not it is possible to verify (x•y) without calculating (x•y) – well, that's equivalent to the P=NP problem.

Now, if you could encode the operation • in some way (which is by no means trivial), the only thing I can think of is some form of quantum solution, where any knowledge of the state of the machine that performs E(x•y) while it is calculating it would render it chaotic... (Hmm, possible new research direction :))

Anyhow... some thoughts.

Anonymous said...

I also see the wrong formatting in Firefox 8 and Android app, but only some spaces are missing, not all of them. The text is still readable.

Anonymous said...

@Ahmed Masud:
I don't really think that tampering with the ciphertext, and thus the plaintext, would be such a big problem if we had FHE.
If I wanted a cloud provider to use my algorithm and the provider computed something else - as you said - I cannot really check whether he really computed what I wanted.
Due to FHE, he cannot see the result.
Now, if a cloud provider had, say, 10,000 customers and 1 customer actually checked the result and found out that it's not correct, it could become public and ruin the cloud provider. I just don't see the motivation why a cloud provider would want to change something he doesn't know much about, and where the intervention could ruin him.

Anyway this is all way too hypothetical. ;)

Anonymous said...

i think the cloud is your only ally here, ironically. in the end, you are having to trust *some* big company and big government behind them ... and we all know they are quite permeable to always sway to some interest or other ... in a global context there is always a large population whose interests are denied by such entities ... so the problem will always remain, on both realpolitik and truly ethical levels: many want access to computing resources but can't trust them.

no matter how many tpms, crypto dongles, etc, there is always the chance that the guy who made the device sold you out, or built it using backdoored technology...

but if you can split your problem up and send to *many* CPUs, then aggregating the result becomes difficult. *YOU* at last have the advantage.

the challenge is the same as with TOR, to ensure that you are sending to enough real disparate CPUs, and not to a large number of captured/emulated nodes.

in practice, to trust computations coming to you from unknown (anonymous?) nodes, you need to use redundancy, having the same compute block processed by several nodes, and cross-check their results.

so having divided your problem into small tasks which individually contain little of interest, and having sent each tasklet to several nodes from a very large pool, you may achieve reasonable security, i would guess similar to the way most people use their email ... many interceptors could scrutinise it, most don't, a few probably do, but most people don't care much, not enough to use PGP which has been around for decades and still looks like it has been as good as can be done to solve that problem.

certainly, although the disperse nature of the cloud works in favour of security, you can't expect greater security from it than at your own node, since if that is the weakest point, then you expect it to be attacked, and such attacks by anyone who would have the ability to capture/emulate many cloud nodes or subvert CPU hardware, would easily also capture your specific node, no matter where or who you are, if they knew a good reason.

at some point, social engineering usually triumphs before such monumental engineering tasks, but the threat of a technical subversion always weighs heavily enough on one side to push us all subtly, and were it to be ignored, how hard do you think it is for a giant hardware manufacturer to say 'oh dear, another FPU bug!' they push out one generation of flawed CPUs, should be enough to remind everyone not to trust the pie-in-the-sky liberty dream software, then can go back to 'rebuilding trust' with their market.

so not much point aiming for perfect security at the expense of alarmingly complex engineering, better to accept that politics (and resource limitation perhaps) will win out before we can achieve that, and rather, aim for the most reliable, and often simplest and most generic answers ... easy for the CPU manufacturer to make a 'flaw' in some hard-to-test, little used part of the security apparatus, but hard for them to push out CPUs that cannot do a mv instruction in reasonable time (i'm not an assembly programmer, but i hope you understand what i'm trying to say!) and harder for them to conceal or explain the defect. :)

or have i gone too far the other way now? we probably want a middle path. i found your thought train interesting, and will try to read your work thoroughly.

Brill said...

Hey, Joanna
just want to say thank you for Qubes. It's a good and secure OS.
Keep walking!

Alex said...

You may not have to go so far as to build a DRAM emulator or other advanced (and expensive) stuff. Have you ever considered JTAG as a possible way to tap DRAM? At least some modern Intel CPUs support it, and JTAG seems to be pretty powerful:

http://www.arium.com/products/3/Intel-JTAG-Debuggers.html

http://www.windriver.com/products/JTAG-debugging/Probe-emulator/

CrisisMaven said...

All this begs the question: is there, and if not, why not, a task force of delegates from people like Qubes-development, Intel, AMD and a host of others (e.g. security/antivirus companies, renowned "hackers" or their associations) that get together - like in the IEEE groups of former renown - and draft specs for all this rather than Intel meandering from one concept to another?

Joanna Rutkowska said...

@Crisis: TCG perhaps? (Although we, the Qubes people, have not been invited there.)