Friday, November 08, 2013

Backing Up Can Take a While

I have a problem in common with a lot of IT Pros. I have a lot of data that I value and I want to keep it safe. ALL forms of storage are flawed: disks die, SSDs die, tapes die, tape drives die, CDs die, floppy disks die. So what to do?

In my case, my data is my collection of Grateful Dead, Jerry Garcia and a bunch of other music. The Dead stash takes up around 1.6 TB, with the Jerry and other music weighing in at around 1 TB. That’s a lot of music (and with around 1900 Dead shows, one of the larger around!). I do not want to lose that. The music has some interesting characteristics that may or may not be shared with other data storage scenarios. With music, each show or album is made up of multiple songs. Most of what I collect is lossless, in one of two formats: SHN (or Shorten) and FLAC (Free Lossless Audio Codec). Typical GD shows range from 750MB to 1GB or so. Typical ‘song’ sizes are anywhere from 30/40MB to 120MB and sometimes more (and sometimes less). Once in my hands, these files are very much read-only!

This data profile is different from many simple file servers with lots of very small documents, most of which don’t change very often.

The profile is also different from database servers where there are a few VERY big files (containing lots of bits of occasionally changing data). In some applications, like Lync, you have several databases that need to be backed up (and restored in case of failure).

Given the varied nature of data usage, there are inevitably different approaches for backup.

For my music collection, I have 4 big-capacity USB disks. One pair holds the master copy and is tethered to my main workstation, where I update it as I can. The other, backup, pair is on the other end of my network. I have two scripts that use Robocopy to sync primary to backup. Over the years, I’ve been bitten badly by disks dying. So what I do is wait till one of the disks begins to fail, and then replace it with a new drive. And even if it looks OK, I rotate every couple of years.
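The Robocopy scripts themselves are not reproduced in this post. Purely as an illustration of the idea behind a one-way sync of mostly read-only files, here is a minimal Python sketch. It is not Robocopy and not the actual script; the function name `sync_new_files` and the decision rule (copy when the file is missing on the backup, differs in size, or is newer on the primary) are assumptions made for the example.

```python
import shutil
from pathlib import Path


def sync_new_files(primary, backup):
    """One-way sync: copy files from primary that backup lacks or has stale.

    Hypothetical helper for illustration. Returns the relative paths copied.
    Nothing is ever deleted from the backup side.
    """
    primary, backup = Path(primary), Path(backup)
    copied = []
    for src in sorted(primary.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(primary)
        dst = backup / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Assumed rule: same size and a backup timestamp at least as
            # recent as the primary means the file is already synced.
            if s.st_size == d.st_size and d.st_mtime >= s.st_mtime:
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the timestamps
        copied.append(rel)
    return copied
```

Because the files never change once collected, a second run over the same folders should find nothing to copy, which is why the regular sync is quick compared with a full copy.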

Recently, the 2TB disks I used for the GD stuff were filling up, so it was time to move up, and I’ve now got my collection on new 4TB disks. As per the title, it took a while to do this copy. Here’s the Robocopy log:

[Image: Robocopy log]

So, to copy my entire GD collection from one USB drive to another took a mere 30 hours. Not bad when you consider I’ve been collecting for 30 years.
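For a rough sense of what 30 hours means as a transfer rate, here is a back-of-the-envelope calculation. It assumes the copy was about 1.6 TB (the size quoted earlier for the Dead stash, taken as decimal terabytes); the exact byte count from the log is not reproduced here, so treat the result as an estimate only. The function name is made up for the example.

```python
def average_throughput_mb_s(total_bytes, hours):
    """Average transfer rate in MB/s (1 MB = 1,000,000 bytes)."""
    return total_bytes / (hours * 3600) / 1e6


# Assumption: roughly 1.6 TB copied in 30 hours.
rate = average_throughput_mb_s(1.6e12, 30)
print(f"{rate:.1f} MB/s")  # prints "14.8 MB/s"
```

Under that assumption the average works out to roughly 15 MB/s, well below the 60 MB/s that USB 2.0's 480 Mbit/s signalling rate allows in theory.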


3 comments:

Scott R. said...

Great post on backups, with an interesting source data set content and size!

Can you share whether you have yet used USB 3 external disks (especially for the recent copy and backup benchmark shown) or just USB 2? I have heard that using USB 3 (at the disk level and PC USB port / cable / hub (if used) level) can improve backup performance (shorter run times, higher throughput, etc.) versus USB 2 - sometimes dramatically.

I also wonder if the multi-threading capabilities in recent RoboCopy versions (/MT option - default of 8 threads) could further improve backup performance.


Thanks,


Scott R.

Thomas Lee said...

Scott,
The disks were Seagate USB3 4TB disks, but I am using them on systems with only USB2. When I upgrade systems, they will have USB3. But for the data I have, the difference only matters when I have a full copy to make. Every week or so, I use Robocopy to copy any new files from primary to backup system.

No - I have not tried the multithreading switch. Now that all the data is copied onto the master and backup, I only need to copy the deltas. But I will try that when I do the next 'normal' backup.