OT Network Storage

Rick Stevens rstevens at vitalstream.com
Thu May 19 17:07:44 UTC 2005


Vincent Jordan wrote:
> This is a bit off topic however someone here probably had run into this 
> before. Where I work we receive tests in data format via ftp and flash 
> memory chips that are mailed.
> 
> The raw data is zipped, when we get it our techs interpret the data and 
> create a report. Now we have the original zipped file +-50mb, The 
> processed file +-85mb and a report at about 5 meg. Until recently we 
> have just moved the finished files compressed on a “tank” machine in the 
> office. We have filled 4 80gig drives. More and more tests are coming in 
> now and I'm running out of room to put stuff. I was thinking of a Network 
> Attached Storage device but then I got to thinking we are eventually going 
> to fill it up too and backing up all this may prove to be quite a task. 
> So now I'm guessing I should look towards a library / archival setup.
> 
>  
> 
> Here is where things may get complicated. I can’t just on a whim archive 
> stuff. The data is for cardiac testing patients, we are required to keep 
> all the files for at least 7-10 years. My technicians will need to go 
> back 4-5 months at any time to get the information if requested by 
> insurance or the ordering doctor. We were quite small, performing maybe 
> 4-5 tests a week, my boss bought out another company and we are now 
> processing 75+ tests a week. Anyone have any idea of a ready-made 
> solution? I’ve googled for NAS and it does not seem to be a viable long 
> term solution.

We are a big NAS user (we have over 30TB of NAS storage on-line spread
across about 10 Network Appliance F880s for speed reasons).

NAS is nothing more than a large, network-attached disk drive.  The one
nice thing about NAS is that you can usually expand the storage on it
easily WITHOUT having to shut your systems down to do it.  Most use a
Fibre Channel daisy chain to connect storage shelves (which contain the
various drives) to the NAS head.  The drives are typically hot-swappable,
which means that you can add drives to a given shelf and grow the
filesystem onto those drives on the fly.  This is certainly true of
NetApp and EMC (we have experience with both).  I'm sure you can do the
same for the various IBM, DiskStor and other NAS systems.

NAS is, of course, nowhere near as fast as locally attached drives, but
it doesn't sound as if speed is an issue for you (it is for us, hence our
plans to replace the NAS system with a block storage system).

Note, however, that you will have exactly the same issues regarding
backup and archiving as you have with a directly-attached disk.  All
you've done is move the disks from inside the computer to a box somewhere
else and reduce the number of archive/backup cycles (or lengthen the
time between them) because you can dynamically grow the disk.

Most NAS devices include a SCSI port that you can attach a tape drive or
library to directly and have the NAS device do the backup (most have
some piece of software that does that) or you can back up the data over
the network using something like Veritas or Amanda.

Historical archiving is always an issue.  There are a number of
solutions for that (I mentioned Veritas and Amanda).  And, yes, you'll
need to set up a database system that tracks the tapes, CDs or DVDs.
Most commercial backup solutions include that, but you'd need to build
one for Amanda. That's not horrible.  You can easily write a script that
parses the log from Amanda to inject data into the database.  You can
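also pre-label the tapes and do chronological backups based on the file's
mtime (last-modification time).  Note that ctime on Linux is the inode
change time, not a creation time, so it is less useful for this.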
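
As a rough illustration of the tracking database, here is a minimal
sketch in Python using SQLite.  It does not parse real Amanda logs (the
log format depends on your Amanda version and configuration); it assumes
your own script has already produced a list of (path, mtime) pairs for
each tape.  The table layout, function names, tape labels and paths are
all hypothetical.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (and create if needed) a tiny catalog: one row per archived file."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS archived ("
        " tape_label TEXT NOT NULL,"
        " file_path  TEXT NOT NULL,"
        " mtime      INTEGER NOT NULL,"
        " PRIMARY KEY (tape_label, file_path))"
    )
    return db


def record_run(db, tape_label, files):
    """Record that each (path, mtime) pair in files was written to tape_label."""
    db.executemany(
        "INSERT OR REPLACE INTO archived VALUES (?, ?, ?)",
        [(tape_label, path, int(mtime)) for path, mtime in files],
    )
    db.commit()


def tapes_for(db, pattern):
    """Return the labels of tapes holding files whose path matches a LIKE pattern."""
    rows = db.execute(
        "SELECT DISTINCT tape_label FROM archived"
        " WHERE file_path LIKE ? ORDER BY tape_label",
        (pattern,),
    )
    return [row[0] for row in rows]
```

A technician looking for one patient's test would then call something
like tapes_for(db, "%test42%") to find out which tape to pull.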

One of the problems you run into is the backup time itself.  If you're
trying to back up once a day, you obviously need the backup job itself
to take less than 24 hours.  Depending on how much has to be backed up,
you may need something like ADIC's PathLight, which actually backs up to
disk at first (very fast), then the disks are despooled to tape.
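
The backup-window check is simple arithmetic, sketched below in Python.
The 10 MB/s throughput used in the example is purely an assumed figure
for illustration; measure what your own tape drive or network actually
sustains.  Sizes use decimal units (1 GB = 1000 MB).

```python
def backup_hours(data_gb, throughput_mb_per_s):
    """Hours needed to move data_gb gigabytes at a sustained rate in MB/s."""
    seconds = (data_gb * 1000) / throughput_mb_per_s
    return seconds / 3600


def fits_window(data_gb, throughput_mb_per_s, window_hours=24):
    """True if a full backup finishes inside the backup window."""
    return backup_hours(data_gb, throughput_mb_per_s) < window_hours


# Example: the four full 80 GB drives (320 GB) at an assumed 10 MB/s.
print(round(backup_hours(320, 10), 1))  # about 8.9 hours
```

At that assumed rate a full pass over 320 GB fits in a night, but a
couple of terabytes would not, which is where disk staging or
incremental backups come in.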

Remember that there are lots and lots of companies offering various
backup/archive programs and solutions.  Obviously, the problem is
complex or they wouldn't be in business.  You know your needs best.
You need to draw up a requirements document describing at least:

1.  What time periods of data need to be available immediately (the
     last 30 days, the last two months, the last six months, whatever).
2.  How fast the data is growing (this determines the on-line disk
     requirement).
3.  How available the archived data must be (e.g. you can get to any
     archived data within four hours).
4.  Any other requirements or constraints you may have.
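
For items 1 and 2, a back-of-the-envelope estimate from the numbers in
the original post (about 50 MB zipped + 85 MB processed + 5 MB report
per test, 75 tests a week) might look like the Python sketch below.  It
assumes the test volume stays constant and that nothing is compressed
further, neither of which is likely to hold exactly; the function names
are just illustrative.

```python
MB_PER_TEST = 50 + 85 + 5  # zipped raw data + processed file + report


def yearly_growth_gb(tests_per_week, mb_per_test=MB_PER_TEST, weeks=52):
    """Approximate new data per year in GB (decimal units)."""
    return tests_per_week * mb_per_test * weeks / 1000


def retention_total_tb(tests_per_week, years, mb_per_test=MB_PER_TEST):
    """Approximate total data held after `years` of retention, in TB."""
    return yearly_growth_gb(tests_per_week, mb_per_test) * years / 1000


print(yearly_growth_gb(75))        # 546.0 GB per year
print(retention_total_tb(75, 7))   # roughly 3.8 TB
print(retention_total_tb(75, 10))  # roughly 5.5 TB
```

So the 7-10 year retention requirement works out to somewhere around
4 to 5.5 TB at the current rate, while six months of immediately
available data is only about 270 GB.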

Once you have that, you can then start looking at commercial or open-
source solutions.  We may be able to help more at that point, since the
problem would be a bit more codified.
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-  Any sufficiently advanced technology is indistinguishable from a  -
-                              rigged demo.                          -
----------------------------------------------------------------------




More information about the Redhat-install-list mailing list