SC4: Secure Communications
in a Very Small Code Base

Ron Garret

Spark Innovations, Inc.

October 2015

Abstract

The code bases for current state-of-the-art secure communications systems like PGP or OpenWhisper are quite large, typically ~100,000 LOC or so. This makes them very difficult to audit, and leaves a lot of places for vulnerabilities to be introduced, whether by honest oversight, incompetence, or malice. Even an expert would have a hard time answering the question: how do I really know I can trust this system?

SC4 is a secure communications system specifically designed for (relatively) easy auditability by way of a ruthless commitment to simplicity. SC4 provides the functional equivalent of PGP from the end-user's point of view, but implements it in two orders of magnitude less code. The cryptographic core of SC4 is Daniel J. Bernstein's TweetNaCl library (<800 LOC). On top of this we have a variety of UI implementations ranging from 1000 to 5000 LOC. One of these has completed a formal audit to date.

Introduction: A security parable

Suppose you want to send a confidential electronic document to a correspondent who is not a technical expert (your lawyer, your accountant, a journalist...). So you call them on the phone and the following conversation ensues:

You: Hey Alice, I want to send you this confidential document, do you use PGP?

Alice: I've heard of it, but I've never used it.

You: It's really easy. You're on a Mac, right?

Alice: That's right.

You: OK, go to https://www.gpgtools.org. Down at the bottom you'll see a "Download GPG Suite" button. Click on that.

Alice: OK, got it. I'm running the installer. It's asking for my administrator password. Hold on a second, how do I know that I can really trust this thing?

You: Just take my word for it.

Alice: OK, I do generally trust you, but have you actually vetted this thing?

You: Well, no, not me personally. But GPG is the de facto standard. Everybody uses it. If there was a problem with it someone would have noticed.

Alice: No, that's not what I mean. What I mean is, how can I be sure that this copy of GPG is the standard one that has all the eyes on it, and not one that has been secretly backdoored by the NSA? After all, if I were the NSA, the first thing I would do is put up a nice user-friendly web site serving a version of the de facto standard that everyone uses (and hence no one questions) but which has a secret back door built into it that only I knew about.

You: Well, you can verify the SHA-1 hash or the signature.

Alice: Yes, I see those down at the bottom of the page. But how does that help? All of these things are being served from the same web site that I got the installer from. If this web site is actually controlled by the NSA then they can put up a legitimate SHA-1 hash and signature just as easily as they can serve a version of GPG that they have tampered with. I still need to trust this web site, right?

You: Well, yeah, you do because you're not a hacker. But I could download the source code and verify that it builds into the same version that you are installing. And because I can do it, others can do it as well. And because anyone with the right technical skills can do it, the NSA wouldn't dare put up a tampered version because they'd be caught.

Alice: I see. Do you mind if we actually do this experiment?

You: Well, we could try. [A few minutes later.] Uh-oh.

Alice: What's the problem?

You: Well, the installer contains five programs but the source code is only available for four of them. There's a Mac-specific control panel whose source code they haven't published. So even if I went to the trouble of building the four that are available, I couldn't verify the installer.

Alice: And neither could anyone else.

You: Not without the code for the fifth program.

Alice: So that rather undermines your argument that we should trust this thing because it has a lot of eyes on it, right? It seems that no one could possibly be verifying it. The only reason anyone trusts it is because everyone assumes that other people are vetting it. But in fact no one is vetting it because it's not possible.

The problem

Verifying that a piece of security software is reliable is extremely difficult even for experts. Even under the best of circumstances, a user wishing to validate security software has only two options:

  1. Build the system herself from source (after auditing the source code, of course) or
  2. Trust the vendor

Those are the only possibilities. The non-computability of the halting problem makes it impossible, in general, to validate compiled object code.

Even option 1 is not really feasible for anyone but the most elite of experts. TextSecure, for example, is over 85,000 lines of Java code. GnuPG is nearly 250,000 lines of C code. Auditing such code bases would be a daunting task, and indeed neither of these systems has been publicly audited despite being in widespread use. Every user of either of these systems has effectively elected option 2.

SC4 is an attempt to produce a secure communications system for which one could feasibly convince oneself that it is in fact secure. Note that "feasible convincing" is distinguished from "actual convincing" or "proof". What we mean is that if one were to wish to conduct an audit of the code, one could feasibly do so. And because it is feasible, it is more likely to actually be done, and so it is less likely for weaknesses to exist.

Another component of our strategy is to make the underlying architecture (core crypto, data interchange format, key protocols, etc.) simple enough that independent implementations of the design could easily be generated. To date we have built three different implementations of SC4, one of which has actually undergone a formal audit.

Design

The SC4 design is the result of a ruthless commitment to simplicity. SC4 is specifically not designed to be all things to all people. It is not backwards compatible with PGP or other existing cryptographic standards. It does not support RSA, for example. It includes only the bare minimum needed to accomplish the goal of providing state-of-the-art secure end-to-end encrypted communications.

Cryptographic primitives

In accordance with our guiding principle of keeping things as simple as possible, we choose as our cryptographic primitives Daniel Bernstein's TweetNaCl library (TNaCl). TNaCl is less than 800 lines of C code, but it includes state-of-the-art implementations of all the cryptographic primitives needed to build a secure communications application, including symmetric and asymmetric ciphers (Salsa20 and Curve25519), and secure digital signatures (Ed25519). We have added a single function to the original TweetNaCl library to convert an Ed25519 public key to a Curve25519 public key so the same key can be used for both encryption and signing. This function consists of seven lines of code adapted from libsodium, a full-featured (and much larger) port of the original (non-Tweet) NaCl library.
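To make the key-conversion step concrete, here is a minimal Python sketch of the mathematics behind it. It is not the seven lines of C used in SC4 or libsodium; it only illustrates the standard birational map from an Ed25519 point with coordinate y to the Curve25519 coordinate u = (1 + y) / (1 - y) mod p, where p = 2^255 - 19. The function name is hypothetical, and the point-validity checks that a production implementation would perform are deliberately omitted.

```python
# Illustrative sketch only: converts a 32-byte Ed25519 public key to the
# corresponding 32-byte Curve25519 public key. No on-curve or small-order
# checks are performed, so this is not suitable for production use.

P = 2**255 - 19  # the field prime shared by Ed25519 and Curve25519


def ed25519_pk_to_curve25519(ed_pk: bytes) -> bytes:
    """Hypothetical helper name; maps Edwards y to Montgomery u."""
    if len(ed_pk) != 32:
        raise ValueError("an Ed25519 public key is exactly 32 bytes")
    # The key is y in little-endian order; the top bit stores the sign of x
    # and is not part of y, so mask it off.
    y = int.from_bytes(ed_pk, "little") & ((1 << 255) - 1)
    if y >= P:
        raise ValueError("non-canonical y coordinate")
    # u = (1 + y) / (1 - y) mod p, with the division done by a modular
    # inverse computed via Fermat's little theorem.
    inverse = pow((1 - y) % P, P - 2, P)
    u = ((1 + y) * inverse) % P
    return u.to_bytes(32, "little")
```

A quick sanity check of the map is that the Ed25519 base point (y = 4/5) is sent to the Curve25519 base point (u = 9).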

Can this library be trusted? On the one hand, its author is one of the most respected names in the field. On the other hand, the code is terse to the point of being obfuscated, and contains not a single comment. The available documentation is also very sparse. Most of the functionality has to be inferred from the documentation for the NaCl library from which TweetNaCl is derived. The code implements standard algorithms which have received significant scrutiny. And with only 800 LOC there just aren't a whole lot of places for shenanigans to hide.

Nonetheless, TweetNaCl could probably stand a rewrite with an eye towards making the code more readable.

Data formats

SC4's data formats are defined in a separate document distributed with the source code. SC4 defines four kinds of file formats:

  1. Encrypted files
  2. Bundles
  3. Signatures
  4. Keys

There are actually two different formats for encrypted files, one for a single recipient and another for multiple recipients. These are obviously redundant, as the multi-recipient format subsumes the single-recipient format. The reason for having both is that the single-recipient format is slightly more efficient, and it is an important enough special case that having a separate format is warranted.

Most SC4 file formats are fixed-field binary formats. They all start with a three-byte "magic number" which unambiguously identifies them. All of the binary formats can be "ascii armored" by converting them to base64. The magic numbers are chosen so that the base64 representation always starts with the string "SC4" with a fourth letter identifying the file type.
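The magic-number trick works because base64 maps each group of three bytes to exactly four characters, so the first three bytes of a file fully determine the first four characters of its armored form. The following Python sketch shows the idea. The function names and the type letter "E" are illustrative assumptions; the actual letters SC4 assigns to each file type are defined in its data format document, not here.

```python
import base64


def magic_for(type_letter: str) -> bytes:
    """Return the three bytes whose base64 encoding is 'SC4' + type_letter.

    Hypothetical helper: the real SC4 type letters are not assumed here.
    """
    if len(type_letter) != 1:
        raise ValueError("type_letter must be a single base64 character")
    # Four base64 characters decode to exactly three bytes.
    return base64.b64decode("SC4" + type_letter, validate=True)


def armor(magic: bytes, payload: bytes) -> str:
    """ASCII-armor a binary file by base64-encoding magic + payload."""
    return base64.b64encode(magic + payload).decode("ascii")


if __name__ == "__main__":
    magic = magic_for("E")  # "E" is an arbitrary example letter
    print(armor(magic, b"example payload"))
```

Whatever payload follows, the armored text begins with "SC4" and the chosen letter, so a file's type can be recognized at a glance in either its binary or its base64 form.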

Bundles

A bundle is a standard format for a file and associated meta-data. SC4 defines two such formats, a binary format and an ascii format. Bundles exist because digital signatures are more useful if they can be attached to data with associated meta-data rather than just raw data in order to help prevent chimera attacks. The reason for having both a binary and an ascii format is to allow documents that are signed but not encrypted to be signed without obfuscating their content. Files that are both signed and encrypted use binary bundles.

Signatures

SC4 uses Ed25519 signatures. These are short enough (512 bits, with a 256-bit public key) that a signature and its associated public key can be attached in-line in a text document without being overly obtrusive. For example, here is a signed version of an ascii document with no file name containing the text "Hello world":

X-SC4-bundle: 0 11 raw
X-SC4-filename: 
X-SC4-mimetype: text/plain
X-SC4-signed: v0.1 7h6AJvkdNTxsGFEbtyRUkmEce6ZKg91pF67yTCG6mZNw
221C25C3441DE94B3C5F75E66858AFC6A37F94255FCAFD006B4ED515267D9700
33F31755F9A567B36B6C0F9FA7F9C31CC369F685778B681E61825658E7A3D94F
5FkBgzgmUKnU1k5Bg6hzSG6fnRcc4o8NBhkh2DqNciSS
Ei3Vr145xLo6hyS9RrQHN82zKZ691THF1x6FECyzyi8X

Hello world

SC4 applies the Ed25519 algorithm not to the document directly but rather to the following string:

[sha512sum-hex]  [file-name]
[mime-type]

where:

[sha512sum-hex] is the sha512 sum of the actual content being signed, represented as a hexadecimal string using lower-case letters

[file-name] and [mime-type] are the obvious things.

Note that the [sha512sum-hex] is separated from [file-name] by two spaces, and [file-name] is separated from [mime-type] by a newline. The rationale for this somewhat eccentric design is to allow the hash of the signed content to be independently computed using the following simple bash script:

{ shasum -a 512 [file-name]; echo [mime-type]; } | shasum -a 512
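For readers who prefer to check this without a shell, the same computation can be sketched in Python. This mirrors the pipeline above byte for byte, including the trailing newline that echo emits; whether SC4 itself includes that trailing newline in the string it signs is an assumption taken from the script, not something confirmed by the format description. The function names are hypothetical.

```python
import hashlib


def string_to_sign(content: bytes, file_name: str, mime_type: str) -> bytes:
    """Build the string that the shell pipeline feeds to the second shasum.

    Layout: lower-case hex SHA-512 of the content, two spaces, the file
    name, a newline, the mime type, and a final newline (from echo).
    """
    digest = hashlib.sha512(content).hexdigest()  # hexdigest() is lower-case
    return f"{digest}  {file_name}\n{mime_type}\n".encode("utf-8")


def hash_of_string_to_sign(content: bytes, file_name: str, mime_type: str) -> str:
    """Equivalent of piping that string through `shasum -a 512`."""
    return hashlib.sha512(string_to_sign(content, file_name, mime_type)).hexdigest()


if __name__ == "__main__":
    # "hello.txt" is an illustrative file name, not one used by SC4.
    print(hash_of_string_to_sign(b"Hello world", "hello.txt", "text/plain"))
```

Because both the shell and the Python versions rely only on SHA-512 and simple string concatenation, the value being signed can be reproduced with widely available tools, independent of any SC4 code.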

Deployment

There are currently three implementations of SC4: a command-line version written in Python (~800 LOC plus the TweetNaCl library and associated glue code), and two versions written in Javascript that run in a browser. One of these is a standalone version, and the other uses a key server to create a fully turnkey user experience (and hence is dubbed SC4TK). The standalone browser-based version uses a Javascript port of TweetNaCl by Dmitry Chestnykh called TweetNaCl-JS. It also uses two standard libraries: jquery and purify. Beyond that it consists of 1062 lines of Javascript and 142 lines of HTML.

Deploying crypto code in browsers is controversial. There is a school of thought that crypto code should never be deployed this way. The argument is that the security perimeter of a browser necessarily extends to the server from which it fetches its code, and so browser-based crypto cannot possibly be any more secure than the browser's connection to the server. If the browser is connecting using HTTPS then the connection is already encrypted and any additional encryption is superfluous.

This argument does not apply to SC4 because it assumes a traditional browser-server interaction where encrypted data does not leave the security perimeter. But SC4 is a secure communications application. Encrypting data for export outside the security perimeter is the whole point.

This is not to say that there are no valid objections to this strategy. There are. Browser-based crypto is fraught with all manner of peril. However, it is possible to deploy SC4 so that its security is comparable to that used in on-line banking. Granted, that is a low bar, but it's still better than nothing, and the fact that this deployment strategy allows SC4 to be used on any machine without having to install any software is a big win.

We do intend to produce native GUI-based versions of SC4 eventually. An app for iOS is currently in development.

Key management

Deploying crypto code in a browser presents some unique challenges. Key storage is particularly thorny.

SC4 keys are stored in localStorage. This places the security perimeter around the browser and the server. This perimeter is protected by an SSL certificate and the browser's HTTPS implementation. It requires a user to do one of two things:

  1. Trust our server, or
  2. Run a server of their own (which could be their local machine)

Option 2 is much too burdensome for non-technical users (indeed, the requirement to deploy on HTTPS makes it too burdensome even for many technically savvy users!). So SC4 offers a third option: it can be run directly from a FILE: URL.

Intuitively one might think that running from a FILE: URL would be more secure than running from a server, but alas this is not the case. The problem is that most browsers are too promiscuous in drawing the security perimeter for localStorage when it is accessed from such URLs. Safari, for example, has a single localStorage instance for all FILE: URLs. This makes keys stored in localStorage from FILE: URLs trivial to phish.

SC4 solves this problem by detecting when it is being run from a FILE: URL and rewriting itself so that the keys are embedded directly inline in the SC4 javascript code. This is only a marginal improvement, of course, and we don't really expect anyone to use SC4 in this mode for really sensitive applications. We did it mainly as a proof-of-concept that it is possible to deploy a web-based application whose security perimeter is the user's account. A user who really wants to use SC4 in this manner is better off using the command-line version, which also includes a secure key store called s-cache, a description of which is beyond the scope of this paper.

Current status

The standalone web version of SC4 has undergone an independent audit conducted by the well-known auditing and pen-testing firm Cure53. The report is publicly available on the Cure53 web site as well as the SC4 git repository. All issues identified by Cure53 have been addressed.

Summary and Conclusion

SC4 is a proof-of-concept demonstration that a secure communication system can be built and deployed in a much smaller code base than has been achieved to date. It is nearly two orders of magnitude smaller than most of its competitors. This makes it more feasible to audit, and therefore more worthy of trust. The easier it is to audit, the more likely it will be that someone will actually have done it.

Development of SC4 is ongoing, and we are actively seeking strategic partners.