Deep Learning, Streaming, Multi-Tenancy. All in a single secure platform.
Multi-Tenancy with Hopsworks
Hopsworks is both a UI and a REST API platform for privacy-by-design data science on Hops Hadoop. Uniquely among Hadoop platforms, it allows even sensitive data to be processed and stored in the data lake.
Batch, Streaming, SQL
Apache Spark support for batch analytics, Spark SQL and Parquet, Spark Streaming, and GraphX.
Train and deploy your models on clusters of GPUs with TensorFlow, Keras, or PyTorch, and debug them with TensorBoard. Models can be deployed to TensorFlow Serving with one click.
The only Hadoop stack with full Conda and Pip support. Each Hopsworks Project has its own Conda environment in the data lake, so Data Scientists can choose their own libraries.
Jupyter and Zeppelin notebooks. Jupyter supports Python, Hive, and Sparkmagic kernels, covering TensorFlow, Python, PySpark, Scala, and Hive.
Apache Hive LLAP
Petabyte-scale data warehousing with Apache Hive LLAP, with Zeppelin interpreter support for interactive analytics and visualizations. LLAP clusters can be started and stopped from the UI.
The ELK stack (Elasticsearch, Logstash, Kibana) is integrated with Spark and TensorFlow applications for real-time logging, visualization, and search.
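A minimal sketch of what application-side logging for an ELK pipeline can look like, assuming logs are emitted as one JSON object per line (a format Logstash and Elasticsearch ingest readily). How Hopsworks actually collects Spark and TensorFlow logs is not described here; `JsonFormatter`, `make_logger`, and the `app_id` field are hypothetical names chosen for illustration.

```python
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as a single line of JSON.

    Line-delimited JSON is easy for a log shipper to parse into
    Elasticsearch fields that Kibana can then filter and search.
    """

    def __init__(self, app_id: str) -> None:
        super().__init__()
        # Hypothetical: tag every record with the application that produced it,
        # so logs from many concurrent jobs can be told apart.
        self.app_id = app_id

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # UTC timestamp in ISO-8601 form, second resolution.
            "@timestamp": time.strftime(
                "%Y-%m-%dT%H:%M:%S", time.gmtime(record.created)
            ) + "Z",
            "level": record.levelname,
            "logger": record.name,
            "app_id": self.app_id,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(app_id: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(f"app.{app_id}")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(app_id))
    logger.handlers = [handler]
    logger.propagate = False  # avoid duplicate, non-JSON output via the root logger
    return logger
```

For example, `make_logger("application_0001").info("epoch %d finished", 3)` writes one JSON line whose `message` field is `"epoch 3 finished"` to stderr.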
Spark applications and Hops services are monitored, and their monitoring data is stored in the time-series database InfluxDB and graphed with Grafana.
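To make the monitoring path more concrete, the sketch below formats a single metric as an InfluxDB line-protocol point (`measurement,tags fields timestamp`). Whether Hops services write to InfluxDB in exactly this way, or through another collector, is an assumption; `to_line_protocol`, `format_field`, and the example measurement and tag names are hypothetical.

```python
from typing import Mapping, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    # Backslash-escape each special character listed in `chars`.
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value: FieldValue) -> str:
    """Encode a field value using the line-protocol type rules."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an `i` suffix
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        # String fields are double-quoted; escape backslashes, then quotes.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement: str,
                     tags: Mapping[str, str],
                     fields: Mapping[str, FieldValue],
                     timestamp_ns: Optional[int] = None) -> str:
    """Build one point: measurement[,tag=v...] field=v[,...] [timestamp]."""
    if not fields:
        raise ValueError("a point needs at least one field")
    key = _escape(measurement, ", ")
    for k in sorted(tags):
        key += "," + _escape(k, ",= ") + "=" + _escape(str(tags[k]), ",= ")
    field_str = ",".join(
        _escape(k, ",= ") + "=" + format_field(v) for k, v in fields.items()
    )
    line = f"{key} {field_str}"
    if timestamp_ns is not None:
        line += f" {int(timestamp_ns)}"  # nanoseconds since the epoch
    return line
```

Tags are sorted by key here because the InfluxDB documentation recommends it for write performance; the protocol itself does not require it. Grafana would then query the resulting series by measurement and tag.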
Hops is the only Hadoop distribution with a TLS certificate-based security model. Certificate management scales better than a Kerberos KDC, makes it easier to integrate external systems and devices, and enables the multi-tenancy features in Hopsworks.
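A minimal sketch, using only Python's standard `ssl` module, of how certificate-based identity can stand in for Kerberos tickets: the server demands a CA-signed client certificate, and the verified certificate's common name is mapped to a tenant. The `<project>__<user>` naming convention and the helper names `make_mtls_server_context` and `tenant_from_peercert` are hypothetical and not taken from Hops.

```python
import ssl
from typing import Optional, Tuple


def make_mtls_server_context(ca_file: Optional[str] = None,
                             cert_file: Optional[str] = None,
                             key_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that rejects clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # A client certificate signed by a trusted CA is the proof of identity,
    # playing the role a Kerberos ticket would otherwise play.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def tenant_from_peercert(cert: dict) -> Tuple[str, str]:
    """Map a verified client certificate to a (project, user) pair.

    `cert` has the shape returned by SSLSocket.getpeercert(); the
    "<project>__<user>" common-name convention is hypothetical.
    """
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                project, sep, user = value.partition("__")
                if not sep or not project or not user:
                    raise ValueError(f"unexpected common name: {value!r}")
                return project, user
    raise ValueError("certificate has no commonName")
```

In a real server, `tenant_from_peercert(conn.getpeercert())` would be called on a connection accepted through `ctx.wrap_socket(sock, server_side=True)`, after the handshake has verified the client certificate against the CA.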
Hops is the result of research at the Distributed Systems Group, jointly run by KTH Royal Institute of Technology and SICS Swedish ICT, and is managed as the EIT Digital Innovation Activity HopsWorks. The research has been financed by the EU Framework 7 programme (BiobankCloud, 317871), SSF (End-to-End Clouds), SeRC, and ICT TNG.
Hops is jointly developed by KTH Stockholm, RISE SICS AB, and Logical Clocks AB.