Arguably one of the biggest challenges
for desktop security is how to handle those overly complex PDFs,
DOCs, and similar files, that are so often exchanged by people, or
downloaded from the Web, and that often provide a way for the
attacker to
compromise the user's desktop
system.
Today I would like to discuss a recent
innovation we created for Qubes OS that lets us securely convert
those pesky PDFs (as well as essentially any graphics files) into
trusted PDFs. Here, by a
“trusted PDF” I mean a file that should be harmless
to the user's system, that is, a non-malicious PDF.
A few
years ago we introduced a mechanism in Qubes OS called
Disposable VMs,
that can be used to safely
open any file, including PDFs, DOCs, etc. The file is being opened in
a... well a dedicated disposable VM that is created within seconds
(typically below 5 seconds) and all the file processing and rendering
happens inside this VM. Once the document is closed, the disposable
VM is automatically destroyed, and any changes to the file (e.g. if
it was an editable DOC file) are automatically propagated back into
the original file. This mechanism is very powerful, and I often use
it for my daily work. However, it surely is a bit cumbersome – who
wants to wait 5 seconds for the PDF to open, especially when there are a
dozen invoices to look through! So, today I present an alternative
approach...
Approaches to converting PDFs
The
problem of converting a potentially untrusted PDF into a harmless one
is certainly not a new one. Some tools have already been created for
this task.
The typical approach is to parse the original PDF, look
for “potentially dangerous things” there, and remove them. As
simple as that! This is, of course, a typical AntiVirus approach to
the problem. And, typically as it is for most AV approaches, it's
completely useless against any more skilled and determined attacker
(and these are the ones we fear the most, don't we?).
A
somewhat better approach is to parse the original PDF, disassemble it
into pieces, and then reassemble them into a new PDF using only the
“trusted” pieces – this, I think, could be called a
whitelisting approach.
Anyway,
the fundamental problem with the approaches mentioned above is that
all of them require parsing
of the original PDF file. And parsing is where the “big bang”
usually happens. Parsing is where our normally pretty decent code
comes into close, intimate contact with some unknown, complex input
data, which often leads to successful abuse or exploitation.
Parsing PDFs safely
So,
how do we perform parsing safely? Of course, that's simple! Let's run
the parser in an isolated container – in the case of Qubes we already
have an ideal container of this kind: the Disposable VM.
But,
before we get too excited, let's think more about it – say we run
the parser safely isolated in a Disposable VM, meaning it couldn't
harm any of the rest of the system, except for the Disposable VM
itself, which we however won't worry much about, because it is
disposable... But then what?
We want our PDF back in our original VM,
to actually use it, right? But we cannot just copy the result from
the Disposable VM, because if it got compromised as a result of
parsing the malicious PDF, then we would likely get... a compromised
converted PDF. So, this approach gives us nothing!
(Even though our “solution” incorporates all the obligatory
buzzwords: “Disposable VMs” (“Micro Disposable VMs”?), “VMs
isolated using hardware Intel vPro technology” and, of course, the
“hypervisor”! Sometimes the mere fact that we use “hardware
virtualization” buys us nothing... People seem to forget about this
sometimes.)
So,
the trick to making this approach meaningful is to introduce what I
will call a “Simple Representation” of the input file. More on
this straightforward concept below. The idea is that our parser
(which runs in a Disposable VM) will be expected
to return the Simple Representation of the original PDF. Of course,
it might very well go wild (as a result of exploitation by the PDF it
parses), disobey our expectations, and instead return
something totally different and potentially malicious. But that
doesn't matter! The whole point of the Simple Representation is that
it should be, well... simple to parse safely, and simple to discard when
what we're getting doesn't look like the Simple Representation.
Ok, so
what's the simplest possible representation of an arbitrary PDF file?
Yes, it's the RGB format, which is essentially just a raw array of
RGB values for each pixel. In fact, I'm not sure there could be
anything simpler in the Known Universe to represent a PDF file...
Now
this is all becoming simple: we would expect our parser to send us
just two things: the dimensions (W x H) of the bitmap representation of
each page of the PDF in question, and each PDF page
itself converted into a raw RGB format. If the parser didn't obey, we
would still interpret whatever stream of bytes we get as an RGB bitmap
– in the worst case the PDF we create would look like an un-tuned
analog TV screen.
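To make the "whatever bytes we get are just pixels" point concrete, here is a minimal Python sketch. It is not taken from the converter itself: the function name `rgb_to_rows` and the choice of 3 bytes per pixel (8-bit depth) are assumptions made for illustration.

```python
BYTES_PER_PIXEL = 3  # assumption: 8 bits each for R, G and B


def rgb_to_rows(data, width, height):
    """Interpret raw bytes as a width x height grid of (R, G, B) tuples.

    Nothing in `data` is ever parsed as structure: every byte is a colour
    sample, so there is no header, tag or length field to get wrong.
    """
    expected = width * height * BYTES_PER_PIXEL
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    rows = []
    for y in range(height):
        start = y * width * BYTES_PER_PIXEL
        row = [
            tuple(data[start + BYTES_PER_PIXEL * x:
                       start + BYTES_PER_PIXEL * (x + 1)])
            for x in range(width)
        ]
        rows.append(row)
    return rows


# Bytes that look like the start of a PDF are still only four noisy pixels.
hostile = b"%PDF-1.4\n<<>"          # 12 bytes = 2 x 2 pixels
pixels = rgb_to_rows(hostile, 2, 2)
print(pixels[0][0])                  # first pixel: the bytes '%', 'P', 'D'
```

Whatever the sender puts on the wire, the worst outcome in this model is an ugly picture or a rejected page.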
The
diagram below summarizes this idea:
Implementing this all on Qubes
Now I
would like to show how easy it is to implement such a PDF converter
service using the advanced Qubes infrastructure that we call qrexec, which has been part of
the Qubes core for quite some time now.
First,
let's choose the PDF and image conversion tools. The choice of PDF
converter is not security critical, because it will run in an
isolated Disposable VM. Here I decided to use the pdftocairo converter,
which is part of the poppler-utils package on Fedora. We will also
use ImageMagick's “convert” command to convert the PNG files
(produced by pdftocairo, one for each PDF page) to the raw RGB
format. Conveniently, ImageMagick supports the RGB format natively.
As mentioned above, in addition to sending the raw RGB file, we would
also need to send the width and height of the pixmap – those can
easily be obtained using ImageMagick's “identify” command. Again,
all those programs discussed so far are not security critical –
they might get exploited during the processing of the untrusted input
PDF file, and we don't worry about that at all.
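The untrusted side of the pipeline described above might be sketched as follows. This is not the actual server script: the helper names are hypothetical, and the exact flags (`pdftocairo -png`, `identify -format "%w %h"`, `-depth 8`) are my assumption of a reasonable invocation.

```python
import subprocess


def pdf_to_png_cmd(pdf_path, out_prefix):
    """pdftocairo writes one PNG per page, named after out_prefix."""
    return ["pdftocairo", "-png", pdf_path, out_prefix]


def dimensions_cmd(png_path):
    """identify prints the page's width and height in pixels."""
    return ["identify", "-format", "%w %h", png_path]


def png_to_rgb_cmd(png_path, rgb_path):
    """convert re-encodes the PNG as headerless RGB samples (8-bit assumed)."""
    return ["convert", png_path, "-depth", "8", f"rgb:{rgb_path}"]


def run(cmd):
    """Run one step and return its stdout; only ever called in the DispVM."""
    return subprocess.run(cmd, check=True, capture_output=True).stdout


print(pdf_to_png_cmd("untrusted.pdf", "page"))
print(dimensions_cmd("page-1.png"))
print(png_to_rgb_cmd("page-1.png", "page-1.rgb"))
```

None of these steps needs hardening, since a compromise here stays inside the Disposable VM.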
On the
receiving side, however, we need to use a foolproof parser for the
RGB format. Again, this is what we gain in this whole process –
instead of requiring a
foolproof-and-also-being-able-to-produce-non-malicious-PDFs parser,
we only require a foolproof RGB parser, and that's quite a gain!
ImageMagick's convert comes to mind again here, and one might want to
use it like this:
convert page.rgb page.pdf
Unfortunately
this would be wrong, because the convert program would still try to
detect the “real” format of the page.rgb file, and, if it looked
more like, say, JPEG or PDF, it would parse it accordingly,
ruining our whole careful plan! What we really need is to tell the
convert program to always treat the input as a raw RGB file, instead of
trying to be (too) smart and guessing the format by itself.
This can be achieved by adding the “rgb:” prefix in front of the
input argument, which provides an explicit input format specification:
convert -size ${IMG_WIDTH}x${IMG_HEIGHT} -depth ${IMG_DEPTH} rgb:$RGB_FILE pdf:$PDF_FILE
Note that we
also needed to add the size and depth explicitly, because the raw RGB
format doesn't convey such information (well, it has no header of any
sort at all!). Of course we need to obtain the width and height from
the parser, but we can validate such input rather easily. In addition
we make sure that the received RGB file has exactly the size
indicated by the width and height. With those precautions in place, there
would have to be a really gaping
hole in ImageMagick's RGB parsing code for the attacker to exploit
this. Perhaps instead of using ImageMagick's convert I should
have written a small script in Python that would parse the received
RGB file (and save it into a... RGB file, for later processing by
ImageMagick), but I sincerely think this would be overkill here.
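The precautions just described can be sketched in Python. The wire format (a single `W H` header line), the `MAX_DIM` limit and all function names are assumptions for illustration, not the converter's actual protocol; only the final `convert` argument list mirrors the command shown above.

```python
import os
import re

MAX_DIM = 10000  # assumption: an arbitrary upper bound on page width/height
DEPTH = 8        # assumption: 8 bits per colour sample, 3 samples per pixel

# Exactly two decimal numbers, one space, one newline; nothing else allowed.
_HEADER = re.compile(rb"([1-9][0-9]{0,4}) ([1-9][0-9]{0,4})\n")


def parse_dimensions(header):
    """Validate the untrusted 'W H' line and return (width, height)."""
    match = _HEADER.fullmatch(header)
    if match is None:
        raise ValueError("malformed dimensions header")
    width, height = int(match.group(1)), int(match.group(2))
    if width > MAX_DIM or height > MAX_DIM:
        raise ValueError("page dimensions too large")
    return width, height


def check_rgb_size(rgb_path, width, height):
    """Reject the file unless it holds exactly width * height * 3 bytes."""
    expected = width * height * 3
    actual = os.path.getsize(rgb_path)
    if actual != expected:
        raise ValueError(f"expected {expected} bytes, got {actual}")


def convert_cmd(width, height, rgb_path, pdf_path):
    """Build the convert call with explicit input and output formats."""
    return ["convert", "-size", f"{width}x{height}", "-depth", str(DEPTH),
            f"rgb:{rgb_path}", f"pdf:{pdf_path}"]


w, h = parse_dimensions(b"640 480\n")
print(convert_cmd(w, h, "page.rgb", "page.pdf"))
```

The strict pattern means the width and height reach the command line only as short decimal strings, so the untrusted side cannot smuggle extra options into the `-size` argument.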
Additionally we need to create a
policy file in Dom0 in
/etc/qubes_rpc/policy/
to allow the use of this service. The policy file content for this service should
look like this:
$anyvm $dispvm allow
... which is pretty self explanatory.
When I do development I also add another line to the policy file like this:
$anyvm devel-vm ask
... to allow me to run the server inside
my 'devel-vm' VM, instead of running it in Disposable VM every time,
which would be very inconvenient for development, as it would require
me to update the Disposable VM template each time I wanted to test a
new version of qpdf-convert-server.
The policy file should be placed in
Dom0 as /etc/qubes_rpc/policy/qubes.PdfConvert
– the name of the file must be the same as the name of
the service, as invoked via the qrexec_client_vm
command, discussed below.
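As a way of reading the two policy lines above, here is a toy Python model of how such a file could be evaluated. It is only an illustration and not how qrexec is implemented: the first-match-wins order and the default "deny" are assumptions of this sketch, and the function names are hypothetical.

```python
def parse_policy(text):
    """Turn 'source target action' lines into a list of rule tuples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, target, action = line.split()
        rules.append((source, target, action))
    return rules


def evaluate(rules, source, target):
    """Return the action of the first rule matching the request.

    Assumption of this sketch: '$anyvm' matches any VM name, the first
    matching rule wins, and a request with no matching rule is denied.
    """
    for rule_source, rule_target, action in rules:
        if rule_source in ("$anyvm", source) and rule_target in ("$anyvm", target):
            return action
    return "deny"


policy = parse_policy("""
$anyvm $dispvm allow
$anyvm devel-vm ask
""")
print(evaluate(policy, "work", "$dispvm"))   # allow
print(evaluate(policy, "work", "devel-vm"))  # ask
print(evaluate(policy, "work", "personal"))  # deny
```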
And, one last thing, in the destination
VM we must also create a file that will map the service name (so, the
qubes.PdfConvert in our example)
to the actual binary that should be called in the VM when the service
is invoked. So, the file should be named:
/etc/qubes_rpc/qubes.PdfConvert
(again, this is now in a VM, not in Dom0, also note the lack of
policy/ subdir), and it is
another one-liner with the following content:
/usr/lib/qubes/qpdf-convert-server
The full source code of qpdf-converter can be seen and downloaded from
this git repo.
We're ready now to test our
qubes.PdfConvert service: in the
requesting VM, i.e. the one from which we want to initiate the
conversion process we do:
[user@work Downloads]$ /usr/lib/qubes/qrexec_client_vm '$dispvm' qubes.PdfConvert /usr/lib/qubes/qpdf-convert-client ITLquote.pdf
-> Sending file to remote VM...
-> Waiting for converted samples...
-> Receving page 2 out of 2...
-> Merging pages into a single PDF document...
-> Converted PDF saved as: ITLquote.trusted.pdf
-> Original file saved as .ITLquote.pdf
Again, during development I would
replace '$dispvm' with something like 'devel-vm'.
The
qrexec_client_vm
command, used above, is not actually intended to be used by the user
directly (that's why it's installed in /usr/lib/qubes instead of
/usr/bin/), and so when one creates a Qubes qrexec service, it's
customary to also create a small wrapper around qrexec, like
this one, that makes
using the service simple.
The presented converter saves the
original file as .${original_pdf},
making it a hidden file to help the user avoid opening it accidentally.
The new, converted file gets the .trusted.pdf
suffix appended to the base name of the original file. I discuss more
issues regarding the human factor and avoiding accidental opening in
one of the paragraphs below. The converter can also be used to convert essentially any image file, such as JPEG, PNG, etc., into a
PDF, using the same method.
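The naming convention just described can be expressed as a small Python sketch. The function name `output_names` is hypothetical; the rule itself follows the text and the session output shown earlier.

```python
import os


def output_names(original_path):
    """Return (trusted_path, hidden_original_path) for an input file.

    The converted file gets '.trusted.pdf' appended to the base name and
    the original is kept under a dot-prefixed (hidden) name.
    """
    directory, name = os.path.split(original_path)
    base, _extension = os.path.splitext(name)
    trusted = os.path.join(directory, base + ".trusted.pdf")
    hidden = os.path.join(directory, "." + name)
    return trusted, hidden


print(output_names("ITLquote.pdf"))
# ('ITLquote.trusted.pdf', '.ITLquote.pdf')
```

For an image input such as `scan.png`, the same rule gives `scan.trusted.pdf` and `.scan.png`.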
As you can see, creating client-server
services in Qubes is very simple – in fact it took me just one afternoon to get the initial working version of the converter (with subsequent "polishing" over the next 2 days).
The qrexec infrastructure takes care
of all the under-the-hood tasks, such as starting the necessary
VMs (e.g. creating a Disposable VM to handle the service
request), establishing communication channels between VMs (which are
ultimately implemented on top of Xen's shared memory), redirecting
the client's and server's stdin and stdout to each other, so that writing
services is very simple, even in shell, and, of course, obeying
policies defined centrally in Dom0.
Most “inter-VM” features in Qubes,
such as secure file copy between domains, opening files in Disposable
VMs, time synchronization, appmenus synchronization, etc, are all
implemented on top of qrexec. A notable exception is clipboard
exchange, which is implemented as part of the GUI protocol, but still
uses the same common qrexec code for policy processing (e.g. I use
this policy to block clipboard and file exchanges between my work and
personal domains).
Limitations, other Simple Representations
The obvious disadvantage of converting
a PDF to an RGB representation is that one loses text search, as
well as copy and edit capabilities (e.g. in the case of PDF forms). So,
converting Intel's IA32 Software Developer's Manual this way would
certainly not be a good idea... But, hey, such large PDFs can always
be opened in a Disposable VM – they would be fully functional then,
only you would need to wait a few seconds for the PDF window to
pop up. Or, better yet, why not keep all such PDFs in a dedicated
domain? E.g. I have a VM called “work-pub” where I keep tons of
various publicly available PDFs, such as the mentioned Intel SDM,
as well as various chipset docs, conference papers and slides, and
generally lots of stuff. The key point is that everything in this VM is
public material (and also all
of it is related to my work), so I don't really care if any of those
PDFs compromises my work-pub domain. In the worst case, I will revert
the VM from backup and download any missing PDFs again from the web.
They are public after all.
But
the PDF conversion described above comes in extremely handy for
all the various invoices, Purchase Orders, NDAs, contracts, and
god-knows-what-else PDF documents, which I'm forced to deal with in
my “work” domain (where my email client runs). Most of those are
one-pagers, or documents at most a few pages long, so the fact that
they got converted to a bitmap causes me very little
discomfort. At the same time I gain incredible freedom in opening all
those documents natively in my work VM, without fearing that one of
those invoices will compromise my work domain (which would be a
rather sad thing for me, although the really sensitive stuff is still
in some other domains ;)
An
interesting question is, however, whether we can come up with another form of
Simple Representation that would, for example, preserve the text
searching ability of the converted PDFs (and DOCs, PPTs). Probably...
yes. The choice of the Simple Representation should be thought of as
a trade-off between security and preservation of the document's features.
I'm not an expert on PDF and DOC formats (and I'm not sure I want to
be) but it seems plausible that one could disassemble a PDF into simple
pieces, select the really simple ones, send those pieces as a Simple
Representation back to the client, and have them reassembled into an
almost-fully-functional PDF. Here, again, the point is that the PDF
parsing is done in an isolated Disposable VM, while the reassembly is done in
the trusted VM. Anyway, let me leave it as an exercise for the reader
:)
Preventing user mistakes
Being able to
right-click on a PDF file and have it converted into a trusted PDF is
one thing. Having this mechanism adopted by users and actually making
their daily computing safer, is another story.
Users will likely
have hundreds of PDFs spread over their home directories, and the real
challenge is how to make sure that the user never accidentally opens
an unconverted, untrusted PDF. We can think of several approaches to
this problem:
We modify
Thunderbird, Firefox, etc., e.g. by providing specific plugins, to
always perform PDF conversion on each file that we get via email or
download from the Web. Additionally we convert all the PDFs already
present in the user's home directory (file system?). And,
additionally, we modify the Qubes file copy operation to also always do
automatic PDF conversion whenever one transfers files from other
domains (if the Qubes qrexec policy allows for such a transfer in the
first place, of course).
This approach
would not be optimal, because some PDFs, as we discussed above, might
not be well suited for conversion-through-bitmap process – they
might be large PDFs where text search is crucial, some conference
papers for review, where text copy is crucial, or some editable
forms. That's why it seems better to take a slightly different
approach:
We
modify mime handlers for PDF files (as well as any other files that
our converter supports) and then upon every opening of the file
(e.g. via mouse click in a file manager) our program gets to run and
its job is to determine whether the file should be opened natively,
converted to a trusted PDF, or perhaps opened in a Disposable VM. Of
course, upon “first” opening we should probably ask the user
about the decision, if this cannot be determined automatically. E.g.
if we can reliably determine the file is already converted, we can
safely open it without prompting the user, but if it's not, we
should ask – perhaps the user would like to open it in a
Disposable VM instead of converting, or perhaps the file should be
considered trusted anyway, because it was created by the user
herself.
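The decision such a handler has to make might be sketched in Python like this. It is purely illustrative of the proposal above and not an existing Qubes component: the names are hypothetical, and how the handler "reliably determines" that a file is already converted is left outside the sketch as a boolean input.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    OPEN_NATIVELY = "open natively"
    CONVERT = "convert to a trusted PDF"
    OPEN_IN_DISPVM = "open in a Disposable VM"
    ASK_USER = "ask the user"


def choose_action(already_converted: bool,
                  remembered_choice: Optional[Action] = None) -> Action:
    """Pick what to do when the user opens a file.

    already_converted: result of some reliable check (not modelled here).
    remembered_choice: the user's earlier decision for this file, if any.
    """
    if already_converted:
        return Action.OPEN_NATIVELY   # safe, no prompt needed
    if remembered_choice is not None:
        return remembered_choice      # honour the earlier decision
    return Action.ASK_USER            # first opening: let the user decide


print(choose_action(True))
print(choose_action(False))
print(choose_action(False, Action.OPEN_IN_DISPVM))
```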
This second
approach seems like the way to go, and we will likely implement it
sooner or later (probably sooner, but after the upcoming R2 Beta2).
It should also be noted that a user would typically need such a
mechanism only in some domains – e.g. I really feel the need for
such protection in my “work” domain, but not in any other. But
that, of course, depends on how one partitions their digital life
into security domains.
One important
detail worth mentioning here, is that we should unconditionally
disable “Thumbnail View” in whatever file manager we use (which
itself is really a stupid feature – can people not read filenames
anymore or something?).
Qubes: from containers isolation down to apps protection
The mechanism introduced today, in
addition to the Disposable VMs mechanism introduced earlier,
represents a trend in Qubes development of “stepping down” into
AppVMs in order to also make the VMs themselves somewhat more secure
(in addition to the isolation between the VMs).
Originally Qubes aimed at container
isolation only. This included protecting the system TCB, where
techniques such as deprivileged networking stacks (and optionally
also deprivileged USB stacks) have been deployed, as well as custom
GUI virtualization, and a generally somewhat “hardened” Xen
configuration. This also included protecting the VMs from each other,
where techniques such as secure clipboard, secure file copy and
a generally secure qrexec infrastructure have been introduced, as well
as a trusted GUI subsystem with explicit domain decorations.
But now, Qubes is stepping down into
the AppVMs in order to make the VMs themselves also less prone to
compromise. We surely will be working on more such mechanisms in the
future. We still are only at the beginning of the quest to create a
Reasonably Secure Desktop OS!
PS. The presented converter will be part of Qubes R2 Beta 2, which is expected to be released... in the coming days. Experienced users of Qubes R1 and R2 Beta 1 can install the converter immediately by building the rpms from the git repo.
PS. WTF is happening with the Blogger web interface? Seriously, I don't remember being as frustrated with any software in recent years as I am right now, editing this post (as well as the last several ones). It sometimes honours the line breaks, sometimes does not, sometimes inserts a couple of new lines, sometimes removes them, sometimes mysterious spaces appear at the end of lines, sometimes those cannot be removed... It doesn't allow pasting a pre-formatted code listing (at least I couldn't figure out how to make it honour tabs). And yes, I'm using the "Compose mode", because when I try to switch to the HTML mode, not only am I overwhelmed with tons of HTML markup, nobody knows what for, but also when I switch back to the Compose mode, my article tends to get even more fucked up! Really, a shame. I wish I could go away to some other blogging service, but I'm afraid that converting all my posts would be an even bigger PITA... Sigh.