Monday, May 03, 2010

On Formally Verified Microkernels (and on attacking them)

Update May 14th, 2010: Gerwin Klein, a project lead for L4.verified, has posted some insightful comments. Also it's worth reading their website here that clearly explains what assumptions they make, and what they really prove, and what they don't.

You must have heard about it before: formally verified microkernels that offer 100% security... Why don't we use such a microkernel in Qubes then? (The difference between a micro-kernel and a type I hypervisor is blurry. Especially in case of a type I hypervisor used for running para-virtualized VMs, such as Xen used in Qubes. So I would call Xen a micro-kernel in this case, although it can also run fully-virtualized VMs, in which case it should be called a hypervisor I think.)

In order to formally prove some property of any piece of code, you need to first assume certain things. One such thing is the correctness of a compiler, so that you can be sure that all the properties you proved for the source code, still hold true for the binary generated from this source code. But let's say it's a feasible assumption -- we do have mature compilers indeed.

Another important assumption you need, and this is especially important in proving kernels/microkernels/hypervisors, is the model of the hardware your kernel interacts with. Not necessarily all the hardware, but at least the CPU (e.g. MMU, mode transitions, etc) and the Chipset.

While the CPUs are rather well understood today, and their architecture (we're talking IA32 here) doesn't change so dramatically from season to season. The chipsets, however, are a whole different story. If you take a spec for any modern chipset, let's say only the MCH part, the one closer to the processor (on Core i5/i7 even integrated on the same die), there are virtually hundreds of configuration registers there. Those registers are used for all sorts of different purposes -- they configure DRAM parameters, PCIe bridges, various system memory map characteristics (e.g. the memory reclaiming feature), access to the infamous SMM memory, and finally VT-d and TXT configuration.

So, how are all those details modeled in microkernels formal verification process? Well, as far as I'm aware, they are not! They are simply ignored. The nice way of saying this in academic papers is to say that "we trust the hardware". This, however, might be incorrectly understood by readers to mean "we don't consider physical attacks". But this is not equal! And I will give a practical example in a moment.

I can bet that even the chipset manufactures (think e.g. Intel) do not have formal models for their chipsets (again, I will give a good example to support this thesis below).

But why are the chipsets so important? Perhaps they are configured "safe by default" on power on, so even if we don't model all the configuration registers, and their effects on the system, and if we won't be playing with them, maybe it's safe to assume all will be fine then?

Well, it might be that way, if we could have secure microkernels without IOMMU/VT-d and without some trusted boot mechanism.

But we need IOMMU. Without IOMMU there is no security benefit of having a microkernel vs. having a good-old monolithic kernel. Let me repeat this statement again: there is no point in building a microkernel-based system, if we don't correctly use IOMMU to sandbox all the drivers.

Now, setting up IOMMU/VT-d permissions require programming the chipset's registers, and is by no means a trivial task (see the the Intel VT-d spec to get an impression, if you don't believe me). Correctly setting up IOMMU is one of the most security-critical tasks to be done by a hypervisor/microkernel, and so it would be logical to expect that they also formally prove that this part is done flawlessly...

The next thing is the trusted boot. I will argue that without proper trusted boot implementation, the system cannot be made secure. And I'm not talking about physical attacks, like Evil Maid. I'm talking about true, remote, software attacks. If you haven't read it already, please go back and read my very recent post on "Remotely Attacking Network Cards". Building on Loic's and Yves-Alexis' recent research, I describe there a scenario how we could take their attack further to compromise even such a securely designed system as Qubes. And this could be possible, because of a flaw in TXT implementation. And, indeed, we demonstrated an attack on Intel Trusted Execution Technology that exploits one such flaw before.

Let's quickly sketch the whole attack in points:

  1. The attacker attacks a flaw in the network card processing code (Loic and Yves-Alexis)

  2. The attacker replaces the NIC's firmware in EEPROM to survive the reboot (Loic and Yves-Alexis)

  3. The new firmware attacks the system trusted boot via a flaw in Intel TXT (ITL)

    • If the system uses SRTM instead, it's even easier -- see the previous post (ITL)

    • If you have new SINIT module that patched our attack, there is still an avenue to attack TXT via SMM (ITL)

  4. The microkernel/hypervisor gets compromised with a rootkit and the attacker gets full control over the system:o

And this is the practical example I mentioned above. I'm sure readers understand that this is just one example, of what could go wrong on the hardware level (and be reachable to a software-only attacker). Don't ignore hardware security! Even for software attacks!

A good question to ask is: would a system with a formally verified microkernel also be vulnerable to such an attack? And the answer is yes! Yes, unless we could model and prove correctness of the whole chipset and the CPU. But nobody can do that today, because it is impossible to build such a model. If it was, I'm pretty sure Intel would already have such a model and they would not release an SINIT module with this stupid implementation bug we found and exploited in our attack.

So, we see an example of a practical attack that could be used to fully compromise a well designed system, even if it had a formally verified microkernel/hypervisor. Compromise it remotely, over the network!

So, are all those whole microkernel/hypervisor formal verification attempts just a waste of time? Are they only good for academics so that they could write more papers for conferences? Or for some companies to use them in marketing?

Perhaps the formal verification of system software will never be able to catch up with the pace of hardware development... By the time people will learn how to build models (and how to solve them) for hardware used today, the hardware manufactures, in the meantime, will present a few new generations of the hardware. For which the academics will need another 5 years to catch up, and so on.

Perhaps the industry will take a different approach. Perhaps in the coming years we will get hardware that would allow us to create untrusted hypervisors/kernels that would not be able to read/write usermode pages (Hey Howard;)? This is currently not possible with the hardware we have, but, hey, why would a hypervisor need access to the Firefox pages?

And how this all will affect Qubes? Well, the Qubes project is not about building a hypervisor or a microkernel. Qubes is about how to take a secure hypervisor/microkernel, and how to build the rest of the system in a secure, and easy to use, way, using the isolation properties that this hypervisor/microkernel is expected to provide. So, whatever kernels we will have in the future (better formally verified, e.g. including the hardware in the model), or based on some exciting new hardware features, still Qubes architecture would make perfect sense, I think.

21 comments:

Nico said...

As always, a very interesting post. Do you belief we will ever arrive at a point of at least relative good security? I think I understand at least some of the stuff you talk about, but for the average person all these issues is just to complicated to comprehend, and that is one of the biggest problems: average people ignore security because it's far beyond what they can understand.

Joanna Rutkowska said...

@Nico,

We're not talking here about security of a grandma's computer... And I do know that there are people who can understand what I write here, and that they are worried about such attacks, and that they try to do something about it. They mostly work for governments, directly or indirectly.

[edited the comment to change "your grandma" -> "a grandma" -- could be taken as an offensive sarcasm, which was not my intention here]

HdP said...

Very interesting post indeed. Still, verifying the microkernel sounds better than nothing to me, just being equivalent to reducing the TCB. The main problem is, as I would suspect, more the (in)feasibility of verifying software. How much work/money would you estimate would it take to verify the Qubes/Xen codebase; or at which point in the future, do you think, would this be a realistic problem to solve? I am interested in computer security, but not very well read on formal proofing and verification etc.
Best, HdP
P.S. Security holes as big as a mom's..? scnr ;-)

Christian Belin said...

I don't share your point of view regarding formal verification. Indeed, formal verification is nowadays unpractical for basic economical reasons (cf. the time needed to prove seL4). On this point I can't agree more. But the actual economic model makes the creation of code unsafe. From the past 40 years, only a few pieces of code are totally flawless. This is clearly a huge issue.
But the work of academics is to create what will be the future of the development in, say, 20 years. That means that, for the moment, no tool is powerful enough to handle such complete and complicated models, such as a full operating system, including hardware and such. However, these tools, even slow make the one and only proof that the system has no flaw. No other mean can achieve such an impressive result. I repeat: proving code and hardware is the mathematical reason that explains no attack can succeed on a verified system.
I am not saying actual research like you do is worthless, certainly not, I admire the work you do, you and other teams to harden our environments agains all sorts of threats. I think that these steps are good way to temporarily mitigate threats. The development of formal verification is as its first steps (Coq is a good example to cite). When the tools and models will be mature, they will be part of software/hardware development cycle, as fuzzing has integrated itself in such cycles.

My .02

Christian Belin

P.S: once more, thanks for the job you do.

Joanna Rutkowska said...

@Christian:

Actually I think I have not expressed my point of view on formal verification anywhere in the post. So, I would argue that you cannot share, or not share for that matter, my point of view on this! (Cheap trick, I know;)

No doubt it would be nice to have formal verification methods that would be capable of assuring us about some practical properties. From what I see in various papers, we're still ages away from this point...

I believe that one good way of progressing, is to minimize number of things you need to rely on (and prove). The idea with non-security-critical kernels (not my idea, BTW) is one example of such approach.

But, I think, it's important to tell people, that whoever tries to sell them "formally proved secure systems" today is most likely a charlatan ("most likely", because I cannot formally prove it ;).

kuza55 said...

I think this is a bit of a straw man.

You've presented one legitimate issue (firmware runs before the OS gets to setup the IOMMU), which applies to all systems, and decided that this means that formally verified microkernels are useless (posing things as rhetorical questions doesn't let you get around this).

Not to take anything away from Qubes (though including the X server in your TCB doesn't seem like a great idea, but I guess there's no real alternative without writing a lot of your own code), which I think is awesome, but I have more trust in seL4 to not get owned than Xen.

I'm not really sure if the IOMMU setup code was verified in seL4, but I think it was (either that or they used the ghetto hack where they had the microkernel figure out if all the properties they required held at run-time).

Having said that, I think formally verified microkernels provide for the ability to create interesting architectures, one of the ones that is pretty often used as an example is a firewall system where you have two untrusted, isolated TCP stacks on either side of the firewall, separated by a firewall module and you only have to trust that your firewall module is correct for the firewall to correctly block packets (of course, people can sniff packets, but you should be encrypting everything anyway)

I'm not sure of any real figures, but I also have a feeling that performance would be better on a microkernel than on a virtualised OS. And that's on top of the fact that I'd trust seL4 more than Xen.

Regarding Intel having formal models, they have then, they're probably Verilog models that they use for synthesis, but those models are formal, they're hard to really do anything with, sure, but still formal models. (Ignoring the fact that synthesis tools are worse than compilers at producing correct output)

In any case, I think trusting hardware is a fair assumption, because hardware dictates your security model, maybe Intel will fix the fact that the IOMMU gets setup *after* other hardware gets to execute, in which case we won't have to worry about compromised network cards, etc, but, again, this seems like a hardware problem rather than a problem that can be solved in software.

Disclosure: I worked at NICTA briefly in the same group working on seL4, though on a different project, during which time I drank the kool-aid ;)

Joanna Rutkowska said...

@kuza55:

You've presented one legitimate issue (...) and decided that this means that formally verified microkernels are useless

No. I showed the problem. I showed that "trusting hardware" doesn't only mean to "exclude physical attacks". That it also means to miss some important software-only attacks. I think this is a pretty damn important conclusion, and a non-trivial one.

I'm not really sure if the IOMMU setup code was verified in seL4, but I think it was

Point me to the proof, please.

I'm not sure of any real figures, but I also have a feeling that performance would be better on a microkernel than on a virtualised OS. And that's on top of the fact that I'd trust seL4 more than Xen.

Oh really? And what about the fact that in seL4 they don't even use SMP? And the reason for this is that if they used it, it would be to difficult to formally prove its correctness -- see the paper.

Another related question: how power management is supposed to be implemented in a seL4-based system? Who should do ACPI tables parsing and ACPI function execution? (Surely the microkernel would need to do at least ACPI DMAR parsing, to be able to setup IOMMU/VT-d correctly, right?)

I never heard how this is solved in any academic microkernel project, such as seL4. And my bet is that they simply ignore power management (if anybody gets a link to how this is handled, I would love to read about it, really). This fact alone makes it unfair to compare any of those microkernels to e.g. Xen.

In any case, I think trusting hardware is a fair assumption

It is not about whether we trust the hardware or not (like if it is malicious or not) -- it is whether we know how to include it in the formal model or not. If we aspire to prove correctness of something so low-level as a u-kernel, that heavily interacts with the hardware, I don't see how we can not include it in the model.

kuza55 said...

No. I showed the problem. I showed that "trusting hardware" doesn't only mean to "exclude physical attacks". That it also means to miss some important software-only attacks. I think this is a pretty damn important conclusion, and a non-trivial one.

Ok, fair enough, but I do not know anyone who has taken "trusting hardware" to mean "exclude physical attacks", but maybe I'm not talking to enough people.

I still claim it is a straw man since it applies to Xen as much as to a formally verified microkernel.

Point me to the proof, please.

As far as I know the proofs are not public, and I do not know of any plans to make them public at this stage.

Having said that, I think I was wrong about the IOMMU code being proven, because afaik only the ARM version has been proven, and I'm not sure if the ARM (probably v6) architecture even has an IOMMU, do you know if it does?

Oh really? And what about the fact that in seL4 they don't even use SMP? And the reason for this is that if they used it, it would be to difficult to formally prove its correctness -- see the paper.

The paper just says it is out of scope, nothing about it being too difficult.

When I worked there, there was someone working on a verified SMP version, though I have no idea how that is going.

Another related question: how power management is supposed to be implemented in a seL4-based system? Who should do ACPI tables parsing and ACPI function execution? (Surely the microkernel would need to do at least ACPI DMAR parsing, to be able to setup IOMMU/VT-d correctly, right?)

I have no real idea, sorry, though the people working on it have a "mechanism, not policy" philosophy, so a microkernel API would probably be added to do power management and a userland process would be responsible for deciding how to manage it - probably the scheduler (I'm going on my experience with OKL4 to assume there is a userland scheduler, I haven't worked with seL4 myself).

Sadly, I know nothing about ACPI though, so I can't really say anything about that specifically.

Again, there are people working on power management there, it's only a matter of time untill someone decides it should be in seL4.

If we aspire to prove correctness of something so low-level as a u-kernel, that heavily interacts with the hardware, I don't see how we can not include it in the model.

Fair point, there is also ongoing work to construct a formal model of the ARM architecture (with an MMU this time!) so that it can be verified down to the assembly level.

And while this is a valid point, is having an informal understanding applied any worse than what Xen or any other kernel does? It seems to be the way we do things now anyway...

Joanna Rutkowska said...

@kuza55 (and others):

Why do you assume I advocate Xen-approach over a verified microkernel approach? Did I say that anywhere?

If I had a verified microkernel that would also have support for such elementary things such as IOMMU, SMP, and power management, it would be a no-brainer for me to chose it for Qubes.

The only problem is... such u-kernels do not exist (AFAIK at least).

Note that in the blog post I didn't advocate to use Xen instead of a formally verified kernel. Instead, I said that maybe another approach (to be taken in the future) is to use e.g. new (hypothetical) hardware that maybe would allow to write untrusted kernels... This is because I have some doubts that maybe formal verification will never catch up with new hardware developments (if I knew it would, it would be the best solution probably, but I don't know if it will).

kuza55 said...

I assume you are advocating Xen because the second sentence in your post is "Why don't we use such a microkernel in Qubes then?" hence setting the stage for you to explain why you think Xen is a better solution.

I'm curious, do you know of any papers or have any ideas on what an architecture which allows untrusted kernels would look like?

It seems almost as if you would need to implement the basic features of a microkernel in the CPU, which doesn't quite seem like the best idea...

It still seems as if you are being overly critical of formal verification, maybe the proof isn't perfect, but it seems better than our current approach where the assumptions and reasoning are mostly informal and not recorded.

Andreas Bogk said...

I'd hold that the fact that there is little formally verified hardware (it exists, just not in the commodity PC market) is no excuse not to use formally verified software. Being able to prove that there are no buffer overflows in your trusted code base is a big step up in system security.

Joanna Rutkowska said...

@Andreas: without including hardware in the model you cannot be sure there are no overflows... Again, take a look at our recent TXT SINIT attack -- it exploits integer overflow in the SINIT code (use your imagination now, and substitute SINIT for some other system software, e.g. microkernel). In order to catch this overflow, you would need to include chipset in the formal model.

Anyway, can you at least point me to a formally proved microkernel with the following features implemented:
1) IOMMU/VT-d
2) Some power management support
3) SMP support

Without the above features, even if we accepted to "trust the hardware", it still would be nothing else than an academic toy... And I would argue that less secure than Xen.

masadrasheed said...

awesome post BTW what you think about KVM ?

Joanna Rutkowska said...

@masadrasheed:

See chapter 3 in:

http://qubes-os.org/files/doc/arch-spec-0.3.pdf

Will said...

LynxSecure (not an open source product...) has been designed with formal methods AFAIK.
It includes VT-d and SMP management (not sure about power mgt though). But I agree, proofs are not easy to get...
Regards.

Joanna Rutkowska said...

@Will:

Design != Implementation

Unknown said...

Interesting post. I'd like to clarify some of the confusion about the current state of seL4 that has come up in the comments (I'm the project leader of L4.verified). There are several versions: seL4/ARMv6 (verified) and seL4/x86. seL4/x86 has optional support for IOMMU and optional support for SMP. None of the x86 versions are currently formally verified.

For attacking the hardware assumptions, you have to keep in mind that seL4/ARMv6 is targeted at embedded systems, not at a PC setup with arbitrary devices plugged in. In a fully high-assurance setup for the kinds of embedded devices that we target, you will have trusted boot and you have full control over the hardware that is plugged into the machine. That doesn't mean that there couldn't be behaviour in the hardware/boot code that we missed and that might indeed invalidate the proof as you say, but at least they would be bugs and not attacks. If you wanted DMA, you would have to formally verify drivers (we do research on that as well), and if you want a high-assurance system, all parts below the main protection mechanism need to be high-assurance. If a device has direct memory access (like the network card attack), then it has to be as highly assured as the CPU, MMU, kernel, etc.

We also don't claim to have proven security. In fact, if you look at the slides of any of my presentations I usually make a point that we do *not* prove security, we just prove that the kernel does what is specified. If that is secure for your purposes is another story. We are currently working more towards the security direction, but that is a new project.

On a PC setup, esp on an architecture that is as full of holes as x86 is, the story is different, of course. As you say, there are many ways to subvert a microkernel there, even if it was verified. I don't think that formalising the IOMMU will be that much harder than the MMU, but with newer hardware the risk that there are undocumented or just less well known features that may lead to gaping security holes is without doubt much greater. I also completely agree that trusted boot will remain an issue.

Actually, I'm not sure why I'm writing so much, because I pretty much agree with most of what you say. The verified version of seL4 just has a different target application domain.

You may be glad to hear that IOMMU, power management, and SMP are exactly the things our group is currently doing research on.

Cheers,
Gerwin

Joanna Rutkowska said...

@Gerwin:
We also don't claim to have proven security.

Actually you do (not you personally, but the project's homepage) -- how else could you interpret the following statement:

Ultimately, strong security guarantees are assured by mathematical proof, and the mathematical proof covers not only the design at differing levels of depth, but also the implementation of the design down to the programming language level.

source: http://ertos.nicta.com.au/research/sel4/

And that's the whole problem -- I met dozens of people who are convinced that formal verification provides 100% provable security...

Would you be willing to modify you project's website?

Unknown said...

I can see how people would take this too strongly. We've changed the text.

It's really hard to explain to people what proof gives you. It gives you pretty much 100% what you proved, but what you proved always has assumptions, and for security this is where you would attack (as you point out).

That doesn't mean the proof didn't buy you anything, it just means you have closed many, but not all attack vectors.

Joanna Rutkowska said...

@Gerwin:

That's really cool you changed the text on the website, thanks!

cmlh said...

@Gerwin

I would be having a word to NICTA PR as they are promoting this as "Sel4 will ensure operating systems are foolproof" i.e. http://www.theaustralian.com.au/australian-it/nicta-could-make-billions-with-new-tool/story-e6frganf-1225760721546