New FTP Command: HASH for Requesting Hash of a File

Have you ever uploaded or downloaded a file and wanted to verify that there were no errors during the transfer? To make sure that the copy on your local system and the copy on the remote system are identical? In some cases we can live with errors in transfers, while in mission-critical situations there can be absolutely none. Error detection usually involves extra overhead, so it's not desirable in all cases.

Previously, there were two ways to verify the integrity of a file. Both compare hashes of the file, where a hash is essentially a fingerprint computed from the file's contents. If the hashes match, then the files are, for practical purposes, identical. (Unfortunately, you can't rely on file size or modification time, because two different files can easily share both.)

One way to verify files involved the publisher of a file also providing a text file listing these hashes, which the user would then check manually. This manual method was entirely separate from FTP. You might have seen some of these files on servers, with names like MD5SUMS or filename.sha1. The contents of these files look like this, with the hash and filename listed:

a8d8e24bf8b82b4302d074fcac380d65 *ubuntu-10.10-alternate-amd64.iso
419ad8ee1bb76a49490f4a08b5be43f0 *ubuntu-10.10-alternate-i386.iso
1b9df87e588451d2ca4643a036020410 *ubuntu-10.10-desktop-amd64.iso
59d15a16ce90c8ee97fa7c211b7673a8 *ubuntu-10.10-desktop-i386.iso
6877bf8d673b87ba9500b0ff879091d0 *ubuntu-10.10-netbook-i386.iso
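
The manual check described above can be automated with a few lines of Python. This is only a minimal sketch, assuming an MD5SUMS-style file with one "hash *filename" entry per line, as in the listing; the function names (parse_checksum_line, file_md5, verify) are made up for illustration.

```python
import hashlib
from pathlib import Path


def parse_checksum_line(line):
    """Split one 'hash *filename' line into (hex_digest, filename)."""
    digest, name = line.strip().split(None, 1)
    # A leading '*' marks binary mode in md5sum-style output; it is not part of the name.
    if name.startswith("*"):
        name = name[1:]
    return digest.lower(), name


def file_md5(path, chunk_size=1 << 20):
    """Hash the file in chunks so a large ISO image is never held in memory at once."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify(checksum_file, directory="."):
    """Return {filename: True, False, or None}; None means the file is missing locally."""
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = parse_checksum_line(line)
        target = Path(directory) / name
        results[name] = (file_md5(target) == expected) if target.exists() else None
    return results
```

Calling verify("MD5SUMS", "downloads") would report, per listed file, whether the local copy matches the published hash. The user still has to fetch the checksum file separately, which is the drawback of this method.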

The other method of verifying the integrity of a file involved a number of non-standard FTP hash commands, where the client would request the hash of the file from the server. One problem is that there were more than 10 such commands (XCRC, XSHA256, CKSM, and others), with multiple commands for requesting the same hash (MD5/XMD5, XSHA/XSHA1), all doing nearly the same thing. Non-standard additions to a protocol can be problematic because they may not have been examined and tested as thoroughly as the standard parts of the protocol. They are usually not well specified, as was the case for most of these, and software authors may be reluctant to implement commands whose behavior is underspecified. Some software supported one or two of these commands, but there was little interoperability between software from different vendors.

In the new FTP Working Group, we're rolling up the functionality of all these non-standard FTP hash commands into one standard command: HASH.

The client uses OPTS to select the hash algorithm, instead of using a separate command for each algorithm. This makes the command more extensible: you don't need a new command every few years.

C> OPTS HASH SHA-1

S> 200 SHA-1

The client then requests the hash of a file with:

C> HASH filename.ext

S> 213 SHA-1 0-255 80bc95fd391772fa61c91ed68567f09... filename.ext

The server replies with the algorithm, the byte range of the file (selected with the RANG command, which is also under development), the file's hash, and the filename. The byte range is included because you could request the hash of the complete file, or of just a specific portion of it.
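
To give a feel for how a client might automate the exchange above, here is a rough Python sketch. It assumes the reply has exactly the layout shown in this post (code, algorithm, byte range, hash, filename, separated by single spaces), which may change while the command is still being specified. The names HashReply, parse_hash_reply, and request_hash are hypothetical, and `ftp` stands for any object with a sendcmd() method like the one on Python's ftplib.FTP.

```python
from typing import NamedTuple


class HashReply(NamedTuple):
    algorithm: str
    start: int      # first byte of the hashed range, as reported by the server
    end: int        # last value of the reported range
    digest: str
    filename: str


def parse_hash_reply(reply):
    """Parse '213 <algorithm> <start>-<end> <hash> <filename>' (layout assumed from the post)."""
    # Limit the split to 4 so that a filename containing spaces stays in one piece.
    code, algorithm, byte_range, digest, filename = reply.strip().split(" ", 4)
    if code != "213":
        raise ValueError(f"unexpected reply code: {code}")
    start, end = byte_range.split("-", 1)
    return HashReply(algorithm, int(start), int(end), digest.lower(), filename)


def request_hash(ftp, filename, algorithm="SHA-1"):
    """Select an algorithm with OPTS, then request the hash of one file with HASH."""
    ftp.sendcmd(f"OPTS HASH {algorithm}")
    return parse_hash_reply(ftp.sendcmd(f"HASH {filename}"))
```

A client could then hash its local copy with the same algorithm and compare the result against the digest field. Keeping the parsing in its own function means it can be adjusted in one place if the reply format changes.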

What do you think? Will you find this feature useful? Once finished, it would likely be integrated into software so that this whole process is automated. Users won't need to know the details; they will just know they can count on finding out whether their files transferred without errors.