
Apache HDFS stores 3 copies of your data by default to provide high availability, so 1 petabyte of data actually requires 3 petabytes of storage. For many organizations, this results in onerous storage costs.

Hops-HDFS also supports erasure coding, which reduces the storage required by 44% compared to HDFS while still providing high availability for your data.
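To make the numbers concrete, here is a minimal arithmetic sketch. It assumes the 44% reduction applies to the total raw storage used under 3-way replication; the function and constant names are hypothetical and are not part of any Hops-HDFS API.

```python
# Back-of-the-envelope storage estimate (illustrative only).
# Assumption: the 44% saving is measured against the raw storage
# consumed by 3-way replicated HDFS.

REPLICATION_FACTOR = 3      # copies stored by replicated HDFS
CLAIMED_REDUCTION = 0.44    # storage reduction quoted for Hops-HDFS erasure coding


def raw_storage_replicated(logical_pb, replication=REPLICATION_FACTOR):
    """Raw storage (PB) needed to hold logical_pb of data with replication."""
    return logical_pb * replication


def raw_storage_erasure_coded(logical_pb,
                              reduction=CLAIMED_REDUCTION,
                              replication=REPLICATION_FACTOR):
    """Raw storage (PB) after applying the quoted reduction to the replicated total."""
    return raw_storage_replicated(logical_pb, replication) * (1.0 - reduction)


replicated_pb = raw_storage_replicated(1.0)
erasure_coded_pb = raw_storage_erasure_coded(1.0)
print(f"Replicated:     {replicated_pb:.2f} PB")
print(f"Erasure-coded:  {erasure_coded_pb:.2f} PB")
```

Under that assumption, 1 petabyte of data would need roughly 1.68 petabytes of raw storage instead of 3 petabytes.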

Currently, the only erasure-coding alternative for HDFS is HDFS-RAID, which Facebook developed for an old version of HDFS (v0.19) and which is no longer supported in the Apache HDFS distribution. Unlike HDFS-RAID, the erasure-coding implementation in Hops-HDFS is integrated with the NameNode, which removes the need to periodically scan directories for broken files and reduces the time needed to discover and repair failures.

Feature Comparison of Hops-HDFS and HDFS-RAID
Feature Hops-HDFS HDFS-RAID
Flexible API / Support for custom strategies
Detection of failures as early as possible
Low overhead to maintain the state
Prioritised repairs partially
Enforced block placement
Support for Hadoop version 2
Transparent repairs
Configurable and extensible codecs
Grouped encoding
Support for append
HAR support for parity files  n/a

  
