Saturday, August 12, 2006

Blue Pill Detection!

So, after I presented the idea behind Blue Pill at SyScan and Black Hat, some people started talking about how *easy* it should be to detect Blue Pill using timing analysis. Interestingly, they must have missed the fact that I already discussed this way of detection during my presentations, and I also gave the reasons why I don't think it could be used in practice...

But anyway, let's look at the problem again...

Obviously, Blue Pill, like any other hardware-based VMM, needs to intercept some events and instructions. One intercept we need to take care of (in the case of SVM technology) is the RDMSR EFER instruction, simply because bit 12 of the EFER register (SVME) signals whether SVM is enabled on the processor. So we need to cheat the guest about it.
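To make the EFER check concrete, here is a minimal sketch of the bit arithmetic involved, assuming the AMD64 layout in which EFER is MSR 0xC0000080 and SVME is bit 12. The function names are hypothetical, and a real intercept handler would of course live inside the hypervisor, not in Python.

```python
# EFER is MSR 0xC0000080 on AMD64; bit 12 is SVME (SVM enable).
EFER_MSR = 0xC0000080
EFER_SVME = 1 << 12


def svme_enabled(efer: int) -> bool:
    """Return True if the SVME bit is set in the given EFER value."""
    return bool(efer & EFER_SVME)


def efer_for_guest(real_efer: int) -> int:
    """Hypothetical RDMSR EFER intercept: report the real value with SVME
    cleared, so the guest believes SVM is not in use."""
    return real_efer & ~EFER_SVME


# Example: SCE, LME, LMA and NXE set (0xD01), plus SVME set by the hypervisor.
real = 0xD01 | EFER_SVME
print(hex(real), svme_enabled(real))                                  # 0x1d01 True
print(hex(efer_for_guest(real)), svme_enabled(efer_for_guest(real)))  # 0xd01 False
```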

Now, we can measure how many processor ticks a given instruction takes to execute: all we need is the RDTSC instruction, which returns the processor's time-stamp counter. So I did the measurement, and it turned out that normally it takes around 90 ticks to execute RDMSR, while on a 'bluepilled' system it takes about 2100 ticks. What a big difference, you will say!
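The measurement itself is simple: read the time-stamp counter, execute the instruction, read the counter again, and subtract the cost of the counter reads themselves. Since RDMSR is privileged, a real test would run in a kernel driver using RDTSC; the sketch below instead takes the operation and the counter as parameters, so it can be exercised with a simulated counter. `SimulatedTSC` and `mean_ticks` are hypothetical names, and the 90/2100-tick figures are simply the measurements quoted above.

```python
from typing import Callable


class SimulatedTSC:
    """Toy time-stamp counter that advances only when told to (illustration only)."""

    def __init__(self) -> None:
        self.now = 0

    def read(self) -> int:
        return self.now

    def advance(self, ticks: int) -> None:
        self.now += ticks


def mean_ticks(op: Callable[[], None], read_tsc: Callable[[], int], n: int = 1000) -> float:
    """Average ticks per call of op, minus the cost of two back-to-back counter reads."""
    overhead = 0
    for _ in range(n):
        t0 = read_tsc()
        t1 = read_tsc()
        overhead += t1 - t0
    total = 0
    for _ in range(n):
        t0 = read_tsc()
        op()
        t1 = read_tsc()
        total += t1 - t0
    return (total - overhead) / n


tsc = SimulatedTSC()
native = mean_ticks(lambda: tsc.advance(90), tsc.read)
bluepilled = mean_ticks(lambda: tsc.advance(2100), tsc.read)
print(native, bluepilled, round(bluepilled / native, 1))  # 90.0 2100.0 23.3
```

On real hardware `read_tsc` would wrap RDTSC (for example the `__rdtsc()` intrinsic offered by common C compilers), and a serializing instruction would be needed to keep the CPU from reordering the counter reads around the measured instruction.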

But SVM technology (and Intel VT-x too) offers a nice way to cheat the guest about those extra ticks, by adjusting a special field in the VMCB called TSC_OFFSET (in this case we would set it to around -2010, just before returning to the guest). As a result, the guest cannot tell, using the RDTSC instruction, that the RDMSR instruction took extra ticks.
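A toy model shows why RDTSC alone can't reveal the intercept: with TSC offsetting enabled, the guest reads the host TSC plus TSC_OFFSET, so the hypervisor only has to subtract the excess cost of each intercept. `ToyVMM` is a hypothetical simulation, not Blue Pill's code, and it assumes (as a simplification) that the offset is decreased by the same fixed amount on every intercept.

```python
class ToyVMM:
    """Hypothetical model of a hypervisor hiding intercept cost with TSC_OFFSET."""

    NATIVE_RDMSR = 90          # ticks, native cost quoted above
    INTERCEPTED_RDMSR = 2100   # ticks, cost with the VMM intercept

    def __init__(self) -> None:
        self.host_tsc = 0      # the real counter
        self.tsc_offset = 0    # stands in for the VMCB TSC_OFFSET field

    def guest_rdtsc(self) -> int:
        # With TSC offsetting enabled, the guest sees host TSC + TSC_OFFSET.
        return self.host_tsc + self.tsc_offset

    def guest_rdmsr_efer(self) -> None:
        # The intercepted RDMSR really consumes ~2100 ticks of host time...
        self.host_tsc += self.INTERCEPTED_RDMSR
        # ...so before resuming the guest, subtract the excess over the native cost.
        self.tsc_offset -= self.INTERCEPTED_RDMSR - self.NATIVE_RDMSR


vmm = ToyVMM()
t0 = vmm.guest_rdtsc()
vmm.guest_rdmsr_efer()
print(vmm.guest_rdtsc() - t0, vmm.tsc_offset)  # 90 -2010
```

Note that `host_tsc` still advanced by the full 2100 ticks: the real elapsed time is unchanged, and only a clock the hypervisor does not control would show it.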

So here's what we need to do: prepare a test piece of code that calls, e.g., the RDMSR instruction a few million times, and observe the timing using an *external* clock (yes, the VMM can also cheat about the internal real-time clock). This external clock can even be a human being (i.e. the user). For example, the detector could display a message to the user:

"Dear user, I'm going to run a test now; if this test takes more than 1 minute, that means your computer is probably compromised with VMM-based malware. Press OK to continue..."
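The arithmetic behind such a test is straightforward once the clock speed and the native per-instruction cost are known. The sketch below assumes a hypothetical 3 GHz processor and reuses the 90/2100-tick figures quoted above; `expected_seconds`, `looks_bluepilled` and the 2x tolerance are illustrative choices, not a tested detector.

```python
def expected_seconds(iterations: int, ticks_per_iter: float, clock_hz: float) -> float:
    """Wall-clock duration of a loop, assuming every iteration costs ticks_per_iter."""
    return iterations * ticks_per_iter / clock_hz


def looks_bluepilled(external_elapsed_s: float, expected_native_s: float,
                     tolerance: float = 2.0) -> bool:
    """Flag the machine if the externally timed run took far longer than expected."""
    return external_elapsed_s > tolerance * expected_native_s


CLOCK_HZ = 3e9               # hypothetical 3 GHz processor
ITERATIONS = 2_000_000_000   # chosen so the native run lasts about a minute

native_s = expected_seconds(ITERATIONS, 90, CLOCK_HZ)      # 60.0
infected_s = expected_seconds(ITERATIONS, 2100, CLOCK_HZ)  # 1400.0, about 23 minutes
print(native_s, infected_s, looks_bluepilled(infected_s, native_s))  # 60.0 1400.0 True
```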

The first problem with this approach is how to generate a piece of code that would execute for exactly 1 minute (or any other given amount of time) on a native machine, taking into account that we may have many different processor models running at many different speeds, etc. One could say that we can probe the processor speed using some test instructions which we know for sure are not intercepted (most instructions aren't). But this is problematic, since the hypervisor can cheat about how many ticks those instructions took to execute (as RDTSC can itself be intercepted). Of course, this is trivial if we assume that we can run our detector both before and after the infection, but this is not the case in most practical scenarios. So I don't know how to solve this problem (which doesn't mean it can't be solved, though)... Any suggestions are welcome.
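The dependency on calibration can be made explicit: the iteration count is derived from a clock-speed estimate that was itself obtained inside the (possibly virtualized) guest. In this hypothetical sketch, a hypervisor that skews that estimate by some factor skews the native run time by the same factor. The 3 GHz figure and the 5x skew are assumptions chosen for illustration.

```python
def iterations_for(target_s: float, calibrated_hz: float, native_ticks: float = 90) -> int:
    """Iterations that should last target_s natively, *if* calibrated_hz is honest."""
    return round(target_s * calibrated_hz / native_ticks)


def native_runtime_s(iterations: int, true_hz: float, native_ticks: float = 90) -> float:
    """How long those iterations really take on an uninfected machine."""
    return iterations * native_ticks / true_hz


TRUE_HZ = 3e9  # hypothetical real clock speed

honest = iterations_for(60, TRUE_HZ)       # calibration not tampered with
skewed = iterations_for(60, 5 * TRUE_HZ)   # calibration inflated 5x by a hypervisor
print(native_runtime_s(honest, TRUE_HZ), native_runtime_s(skewed, TRUE_HZ))  # 60.0 300.0
```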

Without solving the above problem, we face false positives and false negatives. Suppose the test above took 5 minutes (instead of one). Does that mean we used too much test code (because the guest was cheated during calibration) and this particular processor simply needed 5 minutes to execute it? Or was it a sign of infection, just on a newer processor model where the RDMSR interception slowdown is a factor of 5 rather than the roughly 20 I see on the processor I have right now? And what if it took 15 minutes?
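Two very different explanations can produce the same 5-minute observation, which is exactly why a single timing result is hard to interpret. The numbers below are hypothetical: a 3 GHz processor, a 5x calibration skew in one scenario, and in the other a 5x intercept slowdown (450 instead of 90 ticks) on an imagined newer processor.

```python
def runtime_minutes(iterations: int, ticks_per_iter: float, clock_hz: float) -> float:
    """Wall-clock minutes for a loop whose iterations each cost ticks_per_iter."""
    return iterations * ticks_per_iter / clock_hz / 60


HZ = 3e9  # hypothetical clock speed

# Scenario A: clean machine, but calibration was skewed, so the loop is 5x too long.
clean_but_miscalibrated = runtime_minutes(10_000_000_000, 90, HZ)
# Scenario B: correct loop length, infected machine whose intercept costs 5x native.
infected_newer_cpu = runtime_minutes(2_000_000_000, 450, HZ)
# Scenario C: the processor described above, where the intercept costs ~2100 ticks.
infected_current_cpu = runtime_minutes(2_000_000_000, 2100, HZ)

print(clean_but_miscalibrated, infected_newer_cpu)  # 5.0 5.0
print(round(infected_current_cpu, 1))               # 23.3
```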

Currently this is not a big problem, just because there are only two models of AMD processors supporting SVM on the market, and each is available in a few different clock speeds. So we can probably hardcode the testing code into our detector (because the slowdown is so big). But how will the situation change over the next two years, when there will be many more processors supporting hardware virtualization on the market? We would need a database of processor models and how much test code to use on each of them. (Oh, BTW, and how would the detector find out which model it's running on? You bet: using the CPUID instruction, which can be intercepted...)
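Such a database could be as simple as a table keyed by the processor identification the detector reads at start-up. The sketch below is hypothetical throughout: the model keys and iteration counts are placeholders, not measured values, and `read_cpuid` stands in for executing CPUID, which a hypervisor can intercept and answer however it likes.

```python
from typing import Callable, Optional

# Placeholder entries: (vendor, model) -> iterations for a ~1-minute native run.
# These numbers are illustrative only, not measurements.
CALIBRATION_DB = {
    ("AuthenticAMD", "model-A"): 2_000_000_000,
    ("AuthenticAMD", "model-B"): 2_600_000_000,
}


def pick_iterations(read_cpuid: Callable[[], tuple]) -> Optional[int]:
    """Look up the test size for the model reported by CPUID (None if unknown)."""
    return CALIBRATION_DB.get(read_cpuid())


def honest_cpuid() -> tuple:
    return ("AuthenticAMD", "model-B")


def spoofed_cpuid() -> tuple:
    # A hypervisor intercepting CPUID can report a different model...
    return ("AuthenticAMD", "model-A")


print(pick_iterations(honest_cpuid), pick_iterations(spoofed_cpuid))  # 2600000000 2000000000
```

With the spoofed answer, the detector silently sizes its test for the wrong processor.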

And even if we solved this problem, this kind of detection would still be annoying to users (imagine a user being forced to do this kind of "1-minute test", or even a 10-second test, every 15 minutes or so), unless we used some kind of infrastructure providing external time measurement (it can't be just public NTP, because NTP packets could easily be intercepted by the malware). So we would need to set up encrypted NTP servers in each company... Ah, great!
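One way to make an external time source tamper-evident is to have the server authenticate each timestamp together with a fresh nonce chosen by the client, so malware sitting in between can neither forge nor replay answers. The sketch below swaps in a shared-key HMAC (Python's standard `hmac` module) for whatever "encrypted NTP" would actually use; `TimeServer`, `trusted_now` and `timed_externally` are hypothetical, the network transport is omitted (the server is called in-process), and key distribution is left out entirely.

```python
import hashlib
import hmac
import os
import struct
import time
from typing import Callable


def tag_for(key: bytes, nonce: bytes, t_ns: int) -> bytes:
    """HMAC-SHA256 over the client's nonce and the server's timestamp."""
    return hmac.new(key, nonce + struct.pack(">Q", t_ns), hashlib.sha256).digest()


class TimeServer:
    """Hypothetical trusted time server living outside the monitored machine."""

    def __init__(self, key: bytes, clock_ns: Callable[[], int] = time.time_ns) -> None:
        self._key = key
        self._clock_ns = clock_ns

    def stamp(self, nonce: bytes) -> tuple:
        t_ns = self._clock_ns()
        return t_ns, tag_for(self._key, nonce, t_ns)


def trusted_now(server: TimeServer, key: bytes) -> int:
    """Fetch one authenticated timestamp, rejecting forged or replayed answers."""
    nonce = os.urandom(16)
    t_ns, tag = server.stamp(nonce)
    if not hmac.compare_digest(tag, tag_for(key, nonce, t_ns)):
        raise ValueError("time reply failed authentication")
    return t_ns


def timed_externally(server: TimeServer, key: bytes, test: Callable[[], None]) -> float:
    """Run test() between two authenticated timestamps; return elapsed seconds."""
    start = trusted_now(server, key)
    test()
    return (trusted_now(server, key) - start) / 1e9


key = os.urandom(32)
print(timed_externally(TimeServer(key), key, lambda: sum(range(1_000_000))))
```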

So I find it quite surprising that some people downplay the threat introduced by hardware-virtualization-based malware. I would like to point out that it's a somewhat ridiculous situation when malware can be reliably written using perfectly documented features of the processor, while we need to resort to timing-based tricks to detect it :) Are we switching roles with malware writers?

What we need is a reliable detector, something that would return 0 or 1 depending on whether we're inside a VM or not. And I really don't see how we can create such a program (i.e. a standalone generic detector).

For completeness, I should also mention, just as I did during my talks, that we're aware of another attack against Blue Pill which should be very reliable and can be implemented as a standalone program, but unfortunately it seems to allow only for crashing the system when it's 'bluepilled'. This nice attack was independently proposed by Alex Tereshkin and Oded Horowitz, BTW.

Some people talked about prevention... Can we disable virtualization in BIOS? I can't do it on my AMD machine - but I heard that vendors are going to release updates to allow for that. But, come on, this is not a good way to address this threat! It's better not to buy the processors supporting hardware virtualization!

One more thing, as I keep being asked about this: yes, it is possible to create malware similar to Blue Pill using Intel VT-x, as demonstrated by Dino Dai Zovi at Black Hat a week ago.

15 comments:

Anonymous said...

Don't get disheartened; a lot of people interpret things, and this often leads to misinterpretation. Some people do seem to like knocking things/people, new ideas, etc. If they spent more time coming up with ideas/solutions/answers themselves, that would be a lot more productive!

Now please don't shoot the messenger lol, but .....

Could some way be engineered to bring a highly accurate clock (an RTC, Real Time Clock, e.g. an atomic one) into the equation somewhere for timing purposes? This could consist of not one but, say, two RTCs, one internal and one external to the PC. Or just one RTC, say external to the PC, with its timing pulses fed into the PC as well. Then a comparison could be made to spot any timing anomalies.

Some ideas might not be feasible for all sorts of reasons, but at least they get discussed and thought about. That's often how progress is made: by working through ideas that maybe won't work, but that either have some merit or potential, or give other people ideas and spur them on to produce something that will work.

Regards,

Spanner

SpannerITWks

Joanna Rutkowska said...

Ok, so one thing I forgot about is a nice method proposed by Peter Teoh – namely, he suggested choosing a set of instructions (based on the latency tables published in the AMD optimization guide) which should have a similar total latency to the RDMSR instruction (around 90 ticks on my processor model). This way we get rid of the problem of calculating the required number of test loops – we can just use an arbitrary (high enough) number of iterations and compare the results of the two tests – one with RDMSR and the other with the test instructions. This implies that if we rely on a user to do the timing, then we need to ask him or her to sit through two tests. Now imagine that you need to spend, let’s say, 10 seconds on each test, every, let’s say, one hour (we do want to constantly monitor our system, right?). I guess not too many people would be happy with such a detector… So the practical solution would have to rely on some external clock, and again we face the problem of creating a custom/encrypted NTP infrastructure…
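The appeal of this approach is that the ratio between the two runs depends neither on the clock speed nor on the iteration count, so no per-model calibration is needed. A sketch under hypothetical assumptions: both loops run the same number of iterations, the reference instruction mix costs about the same 90 ticks as a native RDMSR, the durations come from some external clock, and the threshold of 3 is arbitrary.

```python
def loop_seconds(iterations: int, ticks_per_iter: float, clock_hz: float) -> float:
    """Duration of a loop whose iterations each cost ticks_per_iter."""
    return iterations * ticks_per_iter / clock_hz


def rdmsr_ratio_suspicious(rdmsr_s: float, reference_s: float, threshold: float = 3.0) -> bool:
    """Flag the machine when the RDMSR loop is much slower than the reference loop."""
    return rdmsr_s / reference_s > threshold


# Whatever iteration count and clock speed we pick, the ratio stays the same.
for n, hz in [(10**8, 2e9), (10**9, 3e9)]:
    reference = loop_seconds(n, 90, hz)    # reference instruction mix, never intercepted
    native = loop_seconds(n, 90, hz)       # RDMSR on a clean machine
    infected = loop_seconds(n, 2100, hz)   # RDMSR under the intercept
    print(rdmsr_ratio_suspicious(native, reference),
          rdmsr_ratio_suspicious(infected, reference),
          round(infected / reference, 1))  # False True 23.3
```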

BTW, I don’t think we could just create two threads executing concurrently (one running a test loop of RDMSR and the other a loop of non-intercepted instructions), because the thread scheduler could be fooled (the quantum for the first thread can be made 20 times longer), which would result in both threads finishing at the same time.

Anonymous said...

So, what about the claims that the Vista paging stuff is not novel?

Joanna Rutkowska said...

Well, if they claim they came up with a similar idea for an attack against BSD four years ago, they probably did. Not sure what they want to achieve by releasing the exploit code to the public, though… BTW, one thing they got wrong – I haven’t proposed “disabling paging altogether” as they wrote – I only proposed to disable *kernel* code and data paging, which makes a big difference.

Anonymous said...

Further to anon's post about the claims that "Vista's paging stuff is not novel", via the link to elad's Invisible Research thread.

These days, and for quite some time now, in PCs with large amounts of RAM installed (more than 1 GB), the paging/swap file can in most cases be dispensed with completely, thereby eliminating that vector from the equation, with the bonus of speeding things up too!

We do know what happens if and when the system runs out of memory, but not using the paging/swap file would of course be a personal choice. It wouldn't be difficult to run some monitoring app to find out how much memory you had, or needed.

Spanner

SpannerITWks

Anonymous said...

You mentioned that you'll publish the slides of your talk here. It'd be great to read more about your work...

Anonymous said...

Hello,

Say, I can do ring 0 operations -
What about 'hooking' the SMM handler and reading EFER from the state-save area? (Maybe you could prevent the hooking somehow, but what if it has been done before?)

Will the CPU store the ORIGINAL EFER.SVME (with the bit SET), so that Blue Pill is detected?

Anonymous said...

What I would like to hear about is what if the new machine came with a hypervisor with some defensive capabilities. Running at ring 0 it could know or somehow watermark its own hooks in the kernel and detect any changes. Obviously the rootkit would be difficult to defend against, but if the hypervisor was there first could it not have an advantage?

Thanks!

adin said...

First off, you're actually following an old computer science question--how can I (a von Neumann computing machine) tell if I've locked up?

You've hit on the same approach that is commonly used for detecting a "hung" process--timing, or some variation of a timing test against an "uncompromised" time source. (Or hardware-based timing interrupts, though I'm not sure how compromisable most hardware interrupts are.)

I guess you could do something similar to a popup test, so that you're not measuring once but many times, and looking at the pattern, so that drastic divergence is the giveaway. But it's certainly not a mathematically provable solution, and I doubt that any computer scientist worth his salt will say that he can come up with a "complete"/"provable" method for Blue Pill detection: if you can defeat most methods of timing detection, you've turned this into an AI-style problem.
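A minimal sketch of the "measure many times and watch the pattern" idea, assuming a history of durations collected by earlier runs of some timing test: flag a new measurement that diverges drastically from the median of that history. The function name, the factor of 3 and the sample durations are hypothetical. It also shows the limitation discussed in the post: if the machine was already infected before the first measurement, the history itself is skewed and nothing diverges.

```python
from statistics import median
from typing import Sequence


def diverges(history: Sequence[float], latest: float,
             factor: float = 3.0, min_samples: int = 3) -> bool:
    """True if latest is more than `factor` times the median of earlier samples."""
    if len(history) < min_samples:
        return False  # not enough baseline yet
    return latest > factor * median(history)


baseline = [10.2, 9.8, 10.1, 10.0, 9.9]   # hypothetical durations in seconds
print(diverges(baseline, 10.4), diverges(baseline, 233.0))  # False True
```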

Anonymous said...

More ideas how to detect bluepill:

CACHE ---
fill the cache with a known pattern
execute WRMSR, VM* or another emulated instruction
On a normal system it doesn't trash the cache (access time to the data is as if they were cached)
On a bluepilled system some of the data will be pushed out of the cache --- you can measure it with RDTSC, but it will be hard to fake (you don't take any exceptions when accessing cached/uncached data, so you can't do time dilation for that)
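The cache idea can be illustrated with a toy model: a fully associative LRU cache (real CPU caches are set-associative, so this is a simplification), a probe buffer that exactly fills it, and the assumption that a native RDMSR touches no memory while a VM exit drags some hypervisor lines (VMCB, handler code) into the cache. `LRUCache` and `probe_misses` are hypothetical names; the probe walks the buffer newest-first so that re-fetching one evicted line doesn't cascade into evicting the rest.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that tracks line tags only."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines: OrderedDict = OrderedDict()

    def access(self, tag) -> bool:
        """Touch a line; return True on a hit, False on a miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        self.lines[tag] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used
        return False


def probe_misses(capacity: int, hypervisor_lines: int) -> int:
    """Fill the cache, simulate one intercepted instruction, count probe misses."""
    cache = LRUCache(capacity)
    pattern = [("probe", i) for i in range(capacity)]
    for tag in pattern:
        cache.access(tag)
    # Native: the instruction touches no memory. Bluepilled: the VM exit and
    # handler pull hypervisor_lines of the VMM's own code/data into the cache.
    for i in range(hypervisor_lines):
        cache.access(("vmm", i))
    # Probe newest-first so re-fetching an evicted line doesn't evict a probe line.
    return sum(not cache.access(tag) for tag in reversed(pattern))


print(probe_misses(512, 0), probe_misses(512, 40))  # 0 40
```

In a real detector the miss count would be inferred from access times; the commenter's point is that plain memory accesses cause no exits, so the hypervisor has no hook at which to adjust TSC_OFFSET for them.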

DMA ---
the hypervisor is in memory that is inaccessible to the guest OS, but still accessible to DMA devices --- a detector can read memory with DMA to disk, send it to the network, play it through the sound card, etc.
implementing a workaround is very hard, nearly impossible --- you have to emulate all devices' DMA engines
--- your bluepill will have to know about every possible disk controller, network card, sound card, USB controller, etc., so that it can make sure it won't let these devices fetch its code

Anonymous said...

What about external timing tests? Given that the presence of a running service can be detected on Tor, all you need to do is run RDMSR a few times on the local system while checking the clock skew using TCP timestamping. Yes, this requires the remote system to know when RDMSR is going to be run, but without altering any of the software (and removing the undetectable aspect) you can't stop this. Hmm, except by altering the clock, which you're already doing... but you can check whether a machine is losing time by running an NTP client on it and doing ICMP and TCP timestamping (see the above paper).
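Clock skew is commonly estimated by fitting a line to pairs of (local time, remote timestamp) and comparing the slope with the remote clock's nominal rate. Below is a minimal least-squares sketch under hypothetical assumptions: the observer's own clock is trusted, the nominal rate of the remote timestamp clock is known, and the sample values are synthetic.

```python
from typing import Sequence, Tuple


def fitted_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of remote timestamp vs. local time (remote ticks per second)."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


def skew_ppm(samples: Sequence[Tuple[float, float]], nominal_hz: float) -> float:
    """Skew in parts per million relative to the remote clock's nominal rate."""
    return (fitted_rate(samples) / nominal_hz - 1.0) * 1e6


# Synthetic observations of a remote 1000 Hz timestamp clock, one per local second.
steady = [(t, 1000.0 * t) for t in range(60)]
slowed = [(t, 999.0 * t) for t in range(60)]   # hypothetically losing 1000 ppm
print(round(skew_ppm(steady, 1000.0)), round(skew_ppm(slowed, 1000.0)))  # 0 -1000
```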

Anonymous said...

What if the CPU manufacturers provided a special "Am I Virtualized?" instruction that was guaranteed to return whether the current execution stream is in a VM where this instruction was itself exempt from interrupt or tampering by the hypervisor?

Also, the previous suggestion of shipping the OS with its own defensive hypervisor seems a viable, OS-independent approach to the problem.

Anonymous said...

Yep, Intel is building LaGrande Technology; maybe that's a way.
I talked with an Intel guy during MTS 2006 in Warsaw, and he said that Intel has, and is developing, software with a lightweight hypervisor which fully supports the VT architecture. Software like this would allow the creation of a corporate environment where a hypervisor is started along with two virtual machines: one for security (firewall, A/V, etc.) and remote command-line management for the corporate support team, and a second one for the user's system. Then I think Blue Pill can't run as a second hypervisor.

Another way,
Even present VT hardware isn’t perfect, so it will probably be possible to build a detector which identifies a virtual environment without using the VM control bit (which can be faked), relying instead on some prepared functions or processes which behave differently in a virtual environment.

P.S.
I apologize for my poor English.
Greetings to Joanna after the inspiring SecureCON 2006 in Poland.

Anonymous said...

Some people talked about prevention... Can we disable virtualization in BIOS? I can't do it on my AMD machine - but I heard that vendors are going to release updates to allow for that. But, come on, this is not a good way to address this threat! It's better not to buy the processors supporting hardware virtualization!

Hardware virtualization will revolutionise the way we use computers.

I think the advantages outweigh the disadvantages. What is the disadvantage? A rootkit?

These problems will be solved, in time, just like with every other piece of malware.

Sebastian said...

Hi Joanna,

indeed, you're right that it is nearly impossible to find out "if I am virtualized" as long as hardware counters or flags can be cheated.
IMHO the only reliable method for detecting compromised hardware is trusted hardware. As you mentioned in your malware talk, as long as the "malware" has at least the same privileges as we do, we can't find it.
As long as those registers/flags can be cheated, software doesn't seem to be a solution for finding blue pills, and a hardware-based solution would be somewhat annoying but possible.
So, I'm thinking about a USB dongle sending encrypted (private/public key) timing events.
For large companies this shouldn't be a problem, because most of the staff have one of those RSA login key generators. So now let's combine this timer with a read-only USB device... ;-)

kind regards,
Sebastian