Deduplication is a process SpiderOak One and Groups use to save our users space and, therefore, money. If you're saving multiple copies of the same file, only the original copy takes up the full amount of space; every other copy is much smaller because the application saves only the data that differs from your original file.
For example, if you add more text or a graphic to a document, the application will only save the new data (the "diff" or difference), instead of saving the entire file again. Also, if you back up a file on one computer that has already been backed up on another computer, this file will occupy no additional space in your account.
When you upload a copy of a file that is already saved to our servers, the application performs deduplication before the upload begins, comparing the file to the information you have already saved. It then uploads only the information that differs between the two files, such as their locations, in the form of journal entries. Although the application may initially appear to be uploading the entire file again, the upload goes much faster and takes up very little space, because only these journal entries are actually being uploaded.
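To make the idea concrete, here is a minimal sketch of deduplication in Python. It is an illustration only, not SpiderOak's actual implementation: the fixed block size, the use of SHA-256, and the names BlockStore, upload, and journal are all assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical and unrealistically small, for illustration only


class BlockStore:
    """Toy content-addressed store: each unique block of data is kept once."""

    def __init__(self):
        self.blocks = {}   # digest -> block bytes (the stored data)
        self.journal = []  # (path, [digests]) entries describing each file

    def upload(self, path, data):
        """Record data under path and return how many new bytes were stored."""
        new_bytes = 0
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # Only blocks that have never been seen take up new space.
                self.blocks[digest] = block
                new_bytes += len(block)
            digests.append(digest)
        # The journal entry records where the file lives and which blocks it uses.
        self.journal.append((path, digests))
        return new_bytes


store = BlockStore()
first = store.upload("laptop/report.txt", b"aaaabbbbcccc")
copy = store.upload("desktop/report.txt", b"aaaabbbbcccc")
edited = store.upload("laptop/report-v2.txt", b"aaaabbbbccccdddd")
print(first, copy, edited)
```

In this sketch the first upload stores all of its data, the identical copy from a second computer stores no new data at all, and the edited version stores only the block that was appended. Each upload still adds a journal entry, which is why even a fully deduplicated file involves a small amount of traffic.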
If you're trying to upload files that will deduplicate but you are being warned that you will exceed your space limit, open the Preferences window. Under the General tab, check the box labeled "Disable disk space calculations during backup selection" and press Save. This will suppress the warning. You can now select your data for upload, and it will deduplicate before being sent to our servers. Once those files have been deduplicated and the immediate problem is resolved, clear the "Disable disk space calculations during backup selection" checkbox again so that space warnings resume.
There are some files, however, which are saved or compressed in such a way that they cannot be easily deduplicated by our software, even if they are essentially the same file. For example, historical versions of .docx files will still take up a great deal of space because each time these files are saved, their entire structure changes in a way that makes it hard for the application to detect any similarities. Outlook files are another common example.
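The effect of compression can be seen with a small experiment. The sketch below, again in Python, changes one character in a document and then measures how many fixed-size blocks the new version shares with the old one, first on the raw bytes and then after compressing each version with zlib (the same family of compression used inside .docx files). The block size and the helper names block_digests and shared_fraction are assumptions chosen for this example, and the block matching is a simplification rather than a description of how our software compares files.

```python
import hashlib
import zlib

BLOCK_SIZE = 64  # hypothetical block size, chosen only for this demo


def block_digests(data):
    """Return the set of SHA-256 digests of the fixed-size blocks in data."""
    return {
        hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    }


def shared_fraction(old, new):
    """Fraction of the new version's distinct blocks already present in old."""
    new_blocks = block_digests(new)
    return len(new_blocks & block_digests(old)) / len(new_blocks)


# Two versions of a document that differ by one character near the start.
lines = [
    f"Paragraph {n}: quarterly figures for region {n * 7 % 13}.\n"
    for n in range(400)
]
original = "".join(lines).encode()
edited = original.replace(b"Paragraph 3:", b"Paragraph X:", 1)

raw_shared = shared_fraction(original, edited)
packed_shared = shared_fraction(zlib.compress(original), zlib.compress(edited))

print(f"uncompressed blocks reused: {raw_shared:.0%}")
print(f"compressed blocks reused:   {packed_shared:.0%}")
```

On the uncompressed data, nearly every block of the edited version already exists in the original. After compression, a small edit typically changes the encoded output from that point onward, so far fewer blocks match and much more of the file has to be stored again.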
If you're trying to deduplicate compressed files, you can read our FAQ about how to get the best compression on compressed files.
SpiderOak One and Groups deduplicate only the files stored in your own account, never across users. We explain this in more detail in our blog post "Why SpiderOak doesn't deduplicate data across users and why it should worry you if we did."