Learning to Love the split and cat Commands

While FTP is a good way to send medium-sized files, it is hard to reliably send files larger than 1 GB. The longer a transfer runs, the higher the chance that a disconnect or transmission error will kill it, forcing you to re-upload the entire file.

Now, before you begin, it would probably help to be familiar with FTP.

Splitting Files

Wouldn't it be nice if you could split the file up into smaller chunks, upload each of them, and then have the person on the other side download each chunk and reassemble the file? Not only does this make uploading and downloading easier, but the person on the receiving end can begin downloading the first chunks without waiting for the entire group of files to upload.

Enter the Unix command "split". It works like this: Let's say we have a file (file.zip) that's 300 MB and we want to split it up into 100 MB chunks. Here's how:

split -b 100m file.zip file_

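So what does this do? It takes the prefix we gave it ("file_"), appends a two-letter suffix for each chunk, and creates three files:

file_aa
file_ab
file_ac

Each is 100 MB.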
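If you'd like to see this happen before committing to a multi-hour upload, here is a small-scale rehearsal. It is only a sketch: the scratch directory, the workdir variable, and the 300 KB stand-in file are made up for the demo, and the chunk size is scaled down from 100m to 100k so it runs instantly.

```shell
# Scratch directory so the demo doesn't touch your real files (hypothetical name).
workdir=$(mktemp -d)

# Create a 300 KB stand-in for the 300 MB file.zip.
dd if=/dev/zero of="$workdir/file.zip" bs=1024 count=300 2>/dev/null

# Split it into 100 KB chunks; split appends aa, ab, ac... to the prefix.
split -b 100k "$workdir/file.zip" "$workdir/file_"

# Show each chunk and its size in bytes.
wc -c "$workdir"/file_*
```

Once you're happy with the result, swap in your real file and the real chunk size.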

Saving Lots of Time

Since this can be applied to huge files that can take days to upload and download, it would be quite frustrating if the files were reassembled on the other side and only then did you realize that one chunk had an error in it. A way around this is the widely used md5 command (on Linux, the equivalent is md5sum, whose output is formatted slightly differently). It creates a short, 32-character hexadecimal fingerprint of the file. The downloader on the other side can then re-run the md5 command on each chunk, and if the fingerprints match, the chunk has been transferred successfully. If one doesn't match, only that chunk needs to be sent again. Try this:

md5 file_* > md5s.txt

The ">" character sends the output of the md5 command to a text file called md5s.txt, which will look something like this:

MD5 (file_aa) = d2c833abcd294b1f20b34214e9667ace
MD5 (file_ab) = 0ed067b4f028706fb3effa91cb191ad5
MD5 (file_ac) = d49b997f8c33e930c1ab90c2cb4d2362

Upload the split files and the text file to an FTP server of your choice, and send the link over to the downloader.
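On the downloader's side, the check amounts to recomputing the checksums and comparing them with md5s.txt. The following is one possible sketch, not the only way to do it: the scratch directory, the tiny stand-in chunks, and the demo, sum_cmd, and verify_result variables are all invented for the example, and the fallback from md5 to md5sum is an assumption about which tool your system has.

```shell
# Scratch directory with three tiny stand-in chunks (hypothetical data).
demo=$(mktemp -d)
printf 'first'  > "$demo/file_aa"
printf 'second' > "$demo/file_ab"
printf 'third'  > "$demo/file_ac"

# Use md5 where it exists (macOS/BSD), otherwise md5sum (GNU/Linux).
if command -v md5 >/dev/null 2>&1; then sum_cmd=md5; else sum_cmd=md5sum; fi

# Uploader: record one checksum line per chunk.
(cd "$demo" && "$sum_cmd" file_* > md5s.txt)

# Downloader: recompute the checksums and compare them to the uploaded list.
if (cd "$demo" && "$sum_cmd" file_* | diff - md5s.txt >/dev/null); then
    verify_result="match"
else
    verify_result="mismatch"
fi
echo "checksums: $verify_result"
```

This comparison only works when both sides use the same tool, since md5 and md5sum format their output differently. If they differ, compare the 32-character values by eye instead.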

Reassembling the Files

Once you've downloaded all necessary files, it's time to put them back together. Run this command:

cat file_* > file.zip

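Again, the ">" character says "send output of this command to this file". Cat will read in each file beginning with "file_" and combine them all together. The shell hands the files to cat in alphabetical order (file_aa, file_ab, file_ac), which is the same order split created them in, so the pieces land back in the right place.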
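To convince yourself that the round trip really is lossless, you can rehearse the whole thing locally. This is a sketch under assumed names: the scratch directory, the original.zip stand-in, and its 250 KB size are all made up for the demo. It uses cmp, which compares two files byte by byte.

```shell
# Scratch directory for the rehearsal (hypothetical name).
scratch=$(mktemp -d)

# 250 KB of random data stands in for the real archive.
dd if=/dev/urandom of="$scratch/original.zip" bs=1024 count=250 2>/dev/null

# Split into 100 KB chunks; the last chunk holds the remaining 50 KB.
split -b 100k "$scratch/original.zip" "$scratch/file_"

# Reassemble: the glob expands in alphabetical order, matching split's order.
cat "$scratch"/file_* > "$scratch/file.zip"

# cmp -s is silent and succeeds only if the two files are byte-for-byte equal.
if cmp -s "$scratch/original.zip" "$scratch/file.zip"; then
    echo "reassembled file is identical to the original"
else
    echo "reassembled file differs from the original"
fi
```

Note that the last chunk is smaller than the others whenever the file size isn't an exact multiple of the chunk size. That's expected, and cat doesn't care.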