A common issue with Business Intelligence systems is what to do with old data. Some organizations have data going back many years, and while it isn’t used on a daily basis anymore it still has value.
Retaining that information on a typical storage array can prove to be expensive (heck, storing anything on those can prove to be expensive).
Fortunately there is an alternative where Hadoop’s file system (HDFS) proves very useful. It is redundant and runs on commodity hardware. Take a handful of x86 servers, load them up with cheap 4TB SATA drives spinning at 7,200 rpm, and you have a solution that most storage arrays can’t touch price-wise.
Will it have the same performance as an EMC or NetApp running 15K SAS drives and flash? Of course not, but you don’t need it to in this case. And if one of the servers in the cluster bites the dust, just replace it. HDFS will re-replicate the missing blocks and rebalance the data across the cluster once the new server is added.
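As a rough sketch of what the archive workflow looks like, the standard HDFS shell covers the whole lifecycle: loading the old data, dialing down redundancy for rarely-read files, and verifying the cluster's health after a node swap. The paths and replication factor below are illustrative, not a recommendation.

```shell
# Copy a year of archived extracts into HDFS (paths are illustrative).
hdfs dfs -mkdir -p /archive/sales/2009
hdfs dfs -put /staging/sales_2009/*.csv /archive/sales/2009/

# Old, rarely-read data can tolerate a lower replication factor to save
# space; two copies instead of the default three is a judgment call.
hdfs dfs -setrep -w 2 /archive/sales/2009

# After replacing a failed node, confirm the cluster has healed.
hdfs fsck /archive -files -blocks
hdfs dfsadmin -report
```

Note that HDFS starts re-replicating under-replicated blocks on its own as soon as it notices a dead node; the `fsck` and `dfsadmin -report` checks are just how you confirm it finished.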
An added benefit is the ability to conduct your ETL (or ELT, depending on your point of view) process in the Hadoop cluster. If you’ve changed databases over the years or simply changed the schema, this will come in handy.
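To make the schema-migration point concrete, here is a minimal sketch of the kind of transform such a job would run. The legacy column names (`CUST_NM`, `ORDER_DT`, `ORDER_AMT`) and target fields are hypothetical; on a real cluster this function would be the map step of a MapReduce or Spark job rather than plain Python over a string buffer.

```python
import csv
import io
from datetime import datetime

def migrate_row(old_row):
    """Translate one record from a hypothetical legacy schema
    (CUST_NM, ORDER_DT as MM/DD/YYYY, ORDER_AMT) into the
    current schema (customer_name, ISO-8601 order_date, amount)."""
    return {
        "customer_name": old_row["CUST_NM"].strip().title(),
        "order_date": datetime.strptime(
            old_row["ORDER_DT"], "%m/%d/%Y").date().isoformat(),
        "amount": float(old_row["ORDER_AMT"]),
    }

# Stand-in for a legacy CSV extract pulled out of the HDFS archive.
legacy_export = io.StringIO(
    "CUST_NM,ORDER_DT,ORDER_AMT\n"
    "ACME CORP,03/15/2009,1999.50\n"
)

migrated = [migrate_row(r) for r in csv.DictReader(legacy_export)]
print(migrated[0])
# {'customer_name': 'Acme Corp', 'order_date': '2009-03-15', 'amount': 1999.5}
```

Because the raw extracts stay in HDFS untouched, you can re-run a transform like this whenever the target schema changes again, rather than migrating the archive itself.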
Using HDFS to store recent data, or data that needs to be accessed often and quickly, is usually not a good choice, since speed is not a hallmark of Hadoop in general. But for long-term storage of old data it could be a very good option.