Cloud file storage, Client Side Encryption, and privacy

Discussion in 'privacy technology' started by DisgruntledRanter9, May 4, 2012.

Thread Status:
Not open for further replies.
  1. DisgruntledRanter9

    DisgruntledRanter9 Registered Member

    May 4, 2012
    United States
    Client Side Encryption
    What is client side encryption?
    Client side encryption is a security and privacy measure that involves encrypting data on the user's (client's) computer, rather than on the server or not at all. When applied to “cloud” based online services like Dropbox, Google Drive, Microsoft SkyDrive and SugarSync, the term means that the user's content, including photos, videos, music, documents, and other files, is encrypted with a cryptographic algorithm before it leaves the user's device (computer, mobile phone, tablet, etc.). Just as importantly, the cryptographic key is never revealed to the service provider. An online storage service that uses client side encryption with a key known only to the client is sometimes called a zero knowledge service.

    Wait, what is encryption?

    When data (the plaintext) is encrypted with a sufficiently strong algorithm, the resulting ciphertext is useless without the correct cryptographic key. For many algorithms, the only known way to recover the data without the correct key is a brute force search (trying every possible key). Since modern ciphers commonly have 2^128 or 2^256 possible keys, such attacks are considered infeasible: even a machine testing 10^18 keys per second would need roughly 10^11 years to try just 1% of the 2^128 possible keys, several times the age of the universe. A “lucky guess” of the key is possible in principle, but so unlikely that it is not a concern. Experts have broken some older ciphers by cryptanalysis, but widely used modern algorithms such as AES and Twofish have not been broken this way.
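    The brute-force claim above is easy to check with a few lines of arithmetic. The sketch below assumes a hypothetical attacker who can test 10^18 keys per second (an assumed figure, far beyond any single computer) and computes how many years it would take to try 1% of all keys for a few key sizes.

```python
# Rough brute-force estimate. KEYS_PER_SECOND is an assumed, hypothetical
# attacker speed, not a measured figure.
KEYS_PER_SECOND = 10**18
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(key_bits: int, fraction: float = 0.01) -> float:
    """Years needed to try `fraction` of all 2**key_bits possible keys."""
    keys_to_try = fraction * 2**key_bits
    return keys_to_try / KEYS_PER_SECOND / SECONDS_PER_YEAR


for bits in (56, 128, 256):
    print(f"{bits}-bit key: {years_to_search(bits):.3g} years for a 1% chance")
```

    At that assumed rate, a 56-bit key (the size used by the old DES cipher) falls in under a millisecond, while the 128-bit and 256-bit figures are astronomically large.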

    In short: Encrypted data can only be accessed with the correct key.

    What are the benefits of using client side encryption?

    The main benefits are increased security and privacy. Since the service provider does not have access to the key, not even an employee of the company can read the data. Client side encryption can also limit the damage from security breaches like the one that put Dropbox in the news in June 2011. In that incident, a bad code update caused the Dropbox servers to accept any password for any account. An attacker needed only the username of a target and the knowledge that the flaw existed to gain full access to the data in the victim's Dropbox account. In other words, anyone with an Internet connection could, during the several-hour window between the bug's introduction and its fix, have logged into any account and downloaded, deleted, edited, or shared that user's files. This is unacceptable. Client side encryption would not have fixed the login bug itself, and an intruder could still have deleted or overwritten files, but everything downloaded or shared would have been unreadable ciphertext.

    What are the downsides of client side encryption?
    Because the provider never holds the key, a storage service in which the provider (e.g. Dropbox or SugarSync) can restore access to the files of a user who has forgotten his/her password(s) is NOT a zero knowledge system. Account recovery would therefore be limited: the provider could reset the login, but it could not recover files encrypted with a key the user has lost. Client side encryption also requires extra encryption and decryption steps to be performed on the client side (hence the name). The performance cost of these steps, however, is usually small, since modern symmetric ciphers such as AES are fast and many CPUs accelerate AES in hardware.

    How might a client side encryption system work?
    *Important note: I am by no means an expert, and this description should only be seen as a sort of rough idea for those who would like to know how client side encryption might be implemented.

    Firstly, users should be given three options: no client side encryption (default), client side encryption with a hash of the user password as a key, or client side encryption with a user specified key.

    Option #1:
    -The service works like Dropbox does now. The storage company and its employees have full access to the user's files, and the files may or may not be encrypted on the server side with a key controlled by the service provider.

    Option #2:
    -Notation: (#1,x) indicates hashing the data x with the unbroken, secure hash algorithm #1. ($1,k,x) indicates encrypting the data x with the key k using the secure cipher $1.
    -Authentication: the client sends (over an SSL/TLS-secured connection) the output of (#1,password) to the server, where password is the user's password. If the sent hash matches the one the server has on file, access is allowed; otherwise, access is refused.
    -Encryption key: the client software encrypts all data with the key (#2,password), where #2 is a different secure hash algorithm from #1, BEFORE sending it to the server. It uses the same key to decrypt all downloaded encrypted data. (If #1 and #2 were the same, the hash sent for authentication would itself be the encryption key.)
    Thus, the only data known by the service provider and its employees is ($1,(#2,password),data). Because the server only ever receives hash #1 of the user's password, it does NOT hold the key to the encrypted user data (documents, photos, videos, etc.). Because the hashes are one-way, the server cannot simply “reverse” the hashing process to recover the original password. Note, however, that this system is only as strong as the user's password. If the user chooses an easy-to-guess password (e.g. “password” or “1234”), anyone holding hash #1 (including the provider) can try likely passwords until one matches, and the matching password also yields the encryption key. In that case there is no real security at all.
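    A minimal sketch of option #2 in Python, using only the standard library. It is a rough illustration under several assumptions that are not part of the scheme above: PBKDF2-HMAC-SHA256 with two different labels stands in for hash algorithms #1 and #2 (a deliberately slow, salted derivation is the usual practice for password-derived secrets), the server hashes the received token once more before storing it, and the names derive, client_secrets, Server and offline_guess are hypothetical. The cipher $1 itself is not shown, because Python's standard library has no AES; a real client would use a vetted library.

```python
import hashlib
import hmac
import os
from typing import Optional

# Assumption: PBKDF2-HMAC-SHA256 with two different labels stands in for the
# post's hash #1 (authentication) and hash #2 (encryption key).
# The iteration count is illustrative only.
ITERATIONS = 100_000


def derive(password: str, salt: bytes, label: bytes) -> bytes:
    """Derive 32 bytes from the password; different labels give unrelated outputs."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt + label, ITERATIONS)


def client_secrets(password: str, salt: bytes) -> tuple[bytes, bytes]:
    auth_token = derive(password, salt, b"auth")  # like (#1,password): sent to the server
    enc_key = derive(password, salt, b"enc")      # like (#2,password): never leaves the client
    return auth_token, enc_key


class Server:
    """Hypothetical server that only ever sees the authentication token."""

    def __init__(self) -> None:
        self.accounts: dict[str, tuple[bytes, bytes]] = {}  # user -> (salt, token hash)

    def register(self, user: str, salt: bytes, auth_token: bytes) -> None:
        self.accounts[user] = (salt, hashlib.sha256(auth_token).digest())

    def login(self, user: str, auth_token: bytes) -> bool:
        _salt, stored = self.accounts[user]
        return hmac.compare_digest(stored, hashlib.sha256(auth_token).digest())


def offline_guess(salt: bytes, stored: bytes, wordlist: list[str]) -> Optional[str]:
    """What anyone holding the server's records can do: try likely passwords."""
    for guess in wordlist:
        token = derive(guess, salt, b"auth")
        if hmac.compare_digest(stored, hashlib.sha256(token).digest()):
            return guess  # the password also reproduces the encryption key
    return None


salt = os.urandom(16)
server = Server()
token, key = client_secrets("1234", salt)  # a weak password, as in the warning above
server.register("alice", salt, token)
print(server.login("alice", token))  # True
print(offline_guess(salt, server.accounts["alice"][1], ["password", "1234"]))  # 1234
```

    The last line illustrates the warning about weak passwords: anyone holding the server's stored record can test candidate passwords offline, and each guess that passes the login check also reproduces the encryption key.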

    Option #3:
    -Authentication: authentication is done however it is already done (e.g. through hashes or a challenge-response system).
    -Encryption key: the user either specifies the encryption key himself or herself or lets the client software generate a random key. The key is NEVER sent to the server. The user manages the key independently of his/her account password.
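    For option #3, the client could generate the key itself and show it to the user in a form that is practical to write down. A sketch, assuming a 256-bit random key from Python's secrets module and Base32 text split into groups of four characters (both arbitrary choices; new_key, key_to_text and text_to_key are hypothetical names):

```python
import base64
import secrets


def new_key() -> bytes:
    """Generate a random 256-bit key on the client; it is never sent to the server."""
    return secrets.token_bytes(32)


def key_to_text(key: bytes) -> str:
    """Format the key as Base32 groups of four characters, easier to write down."""
    text = base64.b32encode(key).decode("ascii").rstrip("=")
    return "-".join(text[i:i + 4] for i in range(0, len(text), 4))


def text_to_key(text: str) -> bytes:
    """Reverse key_to_text, tolerating lowercase input."""
    raw = text.replace("-", "").upper()
    raw += "=" * (-len(raw) % 8)  # restore the padding removed by key_to_text
    return base64.b32decode(raw)


key = new_key()
written_down = key_to_text(key)
print(written_down)  # 13 groups of 4 Base32 characters, different on every run
```

    The user would store the printed text somewhere safe; typing it back in on another device restores the same key.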

    Option #1 is the easiest for both the end user and the service provider. It is also, consequently, the least secure.
    Option #2 is the same as option #1 from the “normal” user's perspective, but it is more difficult and expensive for the service provider. The service provider must pay someone to write the client software (e.g. software for Windows, OS X, Linux, mobile devices, etc.) and must also pay more for storage, because deduplication would become largely useless.
    Option #3 is more difficult for the user and approximately the same as #2 from the provider's perspective. The user must manage his/her key independently (e.g. by writing it down and storing it somewhere safe or, for those with VERY sharp memories, memorizing it).
    Option #2 seems to be a good tradeoff.

    What is deduplication? Why is it “largely useless” when client side encryption is used?
    Deduplication is a space- and money-saving technique used by storage providers. It works something like this:
    -The checksum (a cryptographic hash) of each file to be uploaded is calculated and sent to the server before the file itself is uploaded. The server stores the checksum of each file alongside the file itself.
    -If the checksum matches one already in the service provider's database (a “hit”), then the file is not uploaded. Instead, the server adds a reference to the already existing file to the user's list of files. All requests for that file will be redirected to the original uploaded file.
    -If there is no checksum match (a “miss”), then the file in question is uploaded and the checksum stored alongside the file.
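    The three deduplication steps above can be sketched as a toy in-memory store. Assumptions: SHA-256 as the checksum, and the hypothetical DedupStore class and client_upload function standing in for the provider's server and client software.

```python
import hashlib


class DedupStore:
    """Hypothetical server-side store that keeps one copy per unique checksum."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}                # checksum -> stored file
        self.user_files: dict[str, dict[str, str]] = {}  # user -> {filename: checksum}

    def has(self, checksum: str) -> bool:
        # The client sends the checksum before uploading anything.
        return checksum in self.blobs

    def add_reference(self, user: str, name: str, checksum: str) -> None:
        # A "hit": point the user's file at the existing copy.
        self.user_files.setdefault(user, {})[name] = checksum

    def upload(self, user: str, name: str, data: bytes) -> None:
        # A "miss": store the data alongside its checksum.
        checksum = hashlib.sha256(data).hexdigest()
        self.blobs[checksum] = data
        self.add_reference(user, name, checksum)


def client_upload(store: DedupStore, user: str, name: str, data: bytes) -> bool:
    """Return True if the upload was skipped because of a deduplication hit."""
    checksum = hashlib.sha256(data).hexdigest()
    if store.has(checksum):
        store.add_reference(user, name, checksum)
        return True
    store.upload(user, name, data)
    return False


store = DedupStore()
first = client_upload(store, "alice", "song.mp3", b"identical bytes")
second = client_upload(store, "bob", "track.mp3", b"identical bytes")
print(first, second, len(store.blobs))  # False True 1
```

    Bob's copy costs the provider nothing extra: both users' file lists point at the single stored blob.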
    The output of a good cipher is, for practical purposes, indistinguishable from random data. Moreover, if the same file is encrypted with the same cipher but different keys, the resulting ciphertexts look completely different. Because each user having his/her own key is central to a “zero knowledge” client side encryption (CSE) system, deduplication of the encrypted uploads would almost never find a match. Deduplicating on plaintext checksums before encryption would work, but it would violate the very principles of zero knowledge and CSE: the server would keep only one encrypted copy, so everyone who uploaded the file after the first person would need the first uploader's key to read it. At that point, the CSE is already defeated.
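    To see why encrypted uploads defeat deduplication, the sketch below encrypts the same file under two different keys and compares the checksums a server would see. It uses a TOY stream cipher (XOR with a SHAKE-256 keystream and a random nonce) purely because Python's standard library has no AES; it is not a vetted cipher and provides no integrity protection. Note that with a random nonce, even two encryptions under the same key would differ.

```python
import hashlib
import secrets

NONCE_SIZE = 16


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """TOY stream cipher, for illustration only: XOR with a SHAKE-256 keystream.
    Real client software would use a vetted authenticated cipher such as AES-GCM."""
    nonce = secrets.token_bytes(NONCE_SIZE)
    keystream = hashlib.shake_256(key + nonce).digest(len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, keystream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce, regenerate the keystream, and XOR it back out."""
    nonce, body = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    keystream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(a ^ b for a, b in zip(body, keystream))


photo = b"the same holiday photo" * 100
alice_key, bob_key = secrets.token_bytes(32), secrets.token_bytes(32)
alice_blob = toy_encrypt(alice_key, photo)
bob_blob = toy_encrypt(bob_key, photo)

# The server only sees ciphertext, so the checksums it would compare differ.
same = hashlib.sha256(alice_blob).digest() == hashlib.sha256(bob_blob).digest()
print(same)  # False
print(toy_decrypt(bob_key, bob_blob) == photo)  # True
```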

    Deceptive Marketing:

    “What?” you ask. The service providers are lying about user data privacy?! Sort of. Dropbox once came under fire (including an FTC complaint) because its website said that “Dropbox employees cannot access user files.” This, of course, was simply wrong. To give Dropbox some credit, it revised statements like these to clarify that company policy, not technical means (i.e. encryption), prevents Dropbox employees from mishandling user data. It also made clear that some employees do have access to user data (including file contents). Still, I think this is not quite enough. Perhaps Dropbox should have a checkbox in addition to the normal TOS box: “I understand that Dropbox and its employees can technically access my personal data (including file contents) without my knowledge or permission. We do, however, have strict company policy forbidding this.” I think this should be done not because it is legally required (the TOS already covers Dropbox here), but because it is the right thing to do. I seriously doubt Dropbox (or anyone else) will do this.

    User confusion:

    Alongside these deceptive marketing practices, Dropbox has also confused nontechnical users, who often do not understand much about cryptography, security, or data privacy. Such users might (wrongly) think that data stored online is safe just because a password is needed to access it. Other users might be confused by providers' claims that data is encrypted with AES-256 in transit. They might not understand that this says nothing about what is (or is not) done to protect that information once it is stored on the provider's servers, or about who holds the key. To these users, any mention of encryption automatically means security. This is wrong. Still other users might be confused in a different way. This third group might be slightly uneasy about uploading files to “the cloud” but feel reassured because Dropbox (and other services) frequently mention that only shared files are accessible to other users of the service. Surely, these users think, no one can access data that is not shared? This, too, is incorrect: the provider itself still can.


    Client side encryption and a zero-knowledge system help protect user security and privacy. Depending on how these features are implemented, the impact on user convenience can be minimal or nonexistent. On the provider end, there are some associated costs, but these can be outweighed by the cost of lawsuits and bad PR that follow preventable security breaches.

    Some footnotes:
    +I wrote some parts of this post (particularly the earlier ones) with a less-technical user in mind. This is not intended to offend anyone here. Rather, I wrote those portions before I decided where to post this.
    +This is my first post here, so I am not exactly sure if this is where it belongs.
    +All corrections and comments are welcome as long as they are at least somewhat constructive and on-subject.