







Enterprise Spark: Big Data Solutions at Scale

Hortonworks delivers Spark for enterprise deployments

Hortonworks is a leader in the cloud. Read the Forrester Wave.



Apache™ Spark Overview

Hortonworks is bringing the power of the Apache Spark big data processing framework to enterprise scale. It unifies open enterprise Apache Hadoop® with the in-memory analytic capabilities of Apache Spark to maximize organizational value.

Spark is Better as Part of the Platform
Spark is certified as YARN-ready and is part of Hortonworks Data Platform. Memory- and CPU-intensive enterprise Spark applications can coexist with other workloads deployed in a YARN-enabled cluster. Spark has first-class support for external data sources. It can also run directly on the cluster under YARN, which is where enterprises want to perform their data analysis. This approach avoids the need to create and manage dedicated enterprise Spark clusters and allows for more efficient resource use within a single cluster.
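The sketch below makes the shared-cluster model concrete. It assembles a `spark-submit` command that asks YARN for executor containers instead of targeting a dedicated Spark cluster. The flags (`--master yarn`, `--deploy-mode`, `--num-executors`, `--executor-memory`, `--executor-cores`) follow the Spark documentation for recent releases; Spark 1.x also accepted master values such as `yarn-cluster`. The application name `my_job.py` and the resource sizes are hypothetical placeholders. The code only builds the argument list and does not launch anything.

```python
import shlex
from typing import List


def build_spark_submit(
    app: str,
    num_executors: int = 4,
    executor_memory: str = "4g",
    executor_cores: int = 2,
    deploy_mode: str = "cluster",
) -> List[str]:
    """Build (but do not run) a spark-submit argument list targeting YARN.

    The default resource sizes are hypothetical placeholders; YARN grants
    executor containers of this size from the shared cluster.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", "yarn",            # let YARN schedule the executors
        "--deploy-mode", deploy_mode,  # driver inside the cluster, or on the submitting host
        "--num-executors", str(num_executors),
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        app,                           # hypothetical application file
    ]


if __name__ == "__main__":
    print(shlex.join(build_spark_submit("my_job.py")))
```

On a machine with a Spark client configured for the cluster, the resulting list could be passed to `subprocess.run`.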

Spark Requires Enterprise-Grade Security and Governance
As part of the HDP platform, Spark has access to the same governance, security and management policies as other components of the HDP stack. The Spark big data processing framework is one of the fastest-moving projects in the Big Data ecosystem, and its libraries remain at different levels of maturity. Hortonworks investigates, validates, certifies and then supports each of the components in the Spark project. This approach is key to the way we add value for our customers.

Notebooks Make Spark and Data Science Easier to Consume & Share
Web-based notebooks bring data ingestion, exploration, visualization, sharing and collaboration capabilities to Hadoop and Spark. Hortonworks is making a substantial investment in Apache Zeppelin; we plan to make Zeppelin ready for production use by making it easier to use, while adding security, stability and R support.

By delivering a unified Apache Spark and Hadoop platform, we combine Spark-driven agile analytic workflows with the vast data sets and economics of Hadoop. With Hortonworks, enterprises can deploy the Apache Spark big data processing framework with the industry’s best security, governance, and operations capabilities.

How Is Hortonworks Investing in Spark?

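With the release of Spark 1.6, Hortonworks is committed to helping customers accelerate data science, maintain seamless data access, and drive core innovation.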

Delivering Spark as part of open enterprise Hadoop lets organizations scale Spark to create enterprise value.



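Improve data science productivity by enhancing Apache Zeppelin and by contributing additional Spark algorithms and packages that simplify the deployment of key solutions.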

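For example, Magellan (Geospatial Analytics in Apache Spark) is an open-source library for geospatial analytics that makes geospatial queries easy. Because it is built on Spark, it can tackle the hard problem of processing geospatial data at scale.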
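Many geospatial queries depend on a point-in-polygon test that asks which points, such as GPS readings, fall inside a region. The sketch below shows the classic ray-casting version of that test in plain Python. It is not Magellan's API, and it ignores coordinate systems, spatial indexing, and distribution. A library built on Spark would apply a predicate like this across partitions of a large dataset. The names `point_in_polygon` and `points_within` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: cast a horizontal ray to the right of pt and count
    edge crossings; an odd count means pt lies inside the polygon.
    Points exactly on an edge may be classified either way."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the ray's y-coordinate can cross it;
        # this also guarantees y1 != y2, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_within(points: Sequence[Point], polygon: Sequence[Point]) -> List[Point]:
    """Filter step of a hypothetical 'points within region' query."""
    return [p for p in points if point_in_polygon(p, polygon)]


if __name__ == "__main__":
    square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
    print(points_within([(2.0, 2.0), (5.0, 2.0), (1.0, 3.5)], square))
```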



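Spark SQL provides SQL and DataFrame APIs for accessing structured data. Spark Streaming makes it easy for developers to build scalable, high-throughput, fault-tolerant stream processing of live data streams.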
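Spark Streaming processes a live stream as a sequence of small batches cut at a fixed batch interval. The plain-Python sketch below mimics that micro-batch idea by bucketing timestamped events and counting keys per batch. It is not Spark's API, and it leaves out distribution and fault tolerance. The event format and the function name `micro_batch_counts` are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event format: (timestamp in seconds, key)
Event = Tuple[float, str]


def micro_batch_counts(events: Iterable[Event], batch_interval: float) -> Dict[int, Counter]:
    """Assign each event to the fixed-length batch its timestamp falls in,
    then count keys per batch. This mimics the micro-batch idea only;
    there is no distribution or fault tolerance here."""
    if batch_interval <= 0:
        raise ValueError("batch_interval must be positive")
    batches: Dict[int, Counter] = defaultdict(Counter)
    for ts, key in events:
        # Batch k covers the time range [k * interval, (k + 1) * interval)
        batch_id = int(ts // batch_interval)
        batches[batch_id][key] += 1
    return dict(batches)


if __name__ == "__main__":
    stream = [(0.1, "click"), (0.5, "view"), (0.9, "click"), (1.2, "click"), (2.7, "view")]
    for batch_id, counts in sorted(micro_batch_counts(stream, 1.0).items()):
        print(batch_id, dict(counts))
```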

Hortonworks has been improving Spark's integration with YARN, HDFS, Hive, HBase, and ORC. In particular, we believe we can further optimize data access through the new Data Source API.
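Predicate pushdown is one way a data source can optimize access. Spark's Data Source API can hand filters to the source, for example through the `PrunedFilteredScan` interface. A columnar format such as ORC records min/max statistics per stripe, so a reader can skip stripes that cannot match. The sketch below models that skipping in plain Python. The `Stripe` class and the `make_stripe` and `scan_greater_than` functions are hypothetical stand-ins, not ORC or Spark code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Hypothetical stand-in for an ORC stripe: one column's values plus
    the min/max statistics recorded when the stripe was written."""
    values: List[int]
    min_val: int
    max_val: int


def make_stripe(values: List[int]) -> Stripe:
    # Record statistics at write time, as a columnar writer would.
    return Stripe(values, min(values), max(values))


def scan_greater_than(stripes: List[Stripe], threshold: int) -> Tuple[List[int], int]:
    """Evaluate the pushed-down filter `value > threshold`.

    Returns the matching values and how many stripes were actually read.
    """
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        if stripe.max_val <= threshold:
            continue  # statistics prove no row can match; skip the whole stripe
        stripes_read += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, stripes_read


if __name__ == "__main__":
    data = [make_stripe([1, 2, 3]), make_stripe([10, 20]), make_stripe([5, 15])]
    print(scan_greater_than(data, 12))
```

The filter is checked against cheap statistics before any values are touched. That is why pushing filters down to the source can reduce I/O rather than just filtering rows after they are loaded.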



Use the HDFS in-memory tier to share RDDs


Enhance enterprise Spark’s security, governance, operations, and readiness


To learn more about all the exciting Spark innovations, visit our Apache Spark page.


How Do I Get Started with Apache Spark at Scale?

Listen to our latest webinar: Spark at Scale with Hadoop