Apache HDFS stores 3 copies of your data to provide high-availability. So 1 petabyte of data actually requires 3 petabytes of storgae. For many organizations, this results in onorous storage costs.
Hops-HDFS also supports erasure-coding to reduce the storage required by by 44% compared to HDFS, while still providing high-availability for your data.
Currently, the only alternative for HDFS is HDFS-RAID, which was developed by Facebook for an old version of HDFS, V0.19, and is no longer supported in the Apache HDFS distribution. In comparison to HDFS-RAID, our erasure-coding implementation, Hops-HDFS, is integrated with the NameNode, removing the need to periodically scan directories for broken files and reducing time to discover and repair from failures.
|Flexible API / Support for custom strategies||✓||✗|
|Detection of failures as early as possible||✓||✗|
|Low overhead to maintain the state||✓||✗|
|Enforced block placement||✓||✗|
|Support for Hadoop version 2||✓||✗|
|Configurable and extensible codecs||✓||✓|
|Support for append||✗||✓|
|HAR support for parity files||n/a||✓|