
Fast, Easy, and Secure Big Data Ingestion

Cut data ingestion time from months to minutes

Learn how you can make data ingestion faster, easier, and more secure

What is data ingestion?

Big data ingestion is about moving data, especially unstructured data, from where it originates into a system where it can be stored and analyzed, such as Hadoop.

Data ingestion may be continuous or asynchronous, real-time or batched, or both (lambda architecture), depending on the characteristics of the source and the destination. In many scenarios, the source and the destination do not share the same data timing, format, or protocol, so some type of transformation or conversion is required before the destination system can use the data.
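As a rough illustration of the conversion step described above, the following minimal Python sketch reads CSV-formatted source records in small batches and converts each record to JSON for a destination system. The field names, the batch size, and the `ingest` helper are hypothetical and chosen only for illustration; they are not part of HDF or Hadoop.

```python
import csv
import io
import json
from itertools import islice


def ingest(source, batch_size=2):
    """Yield batches of JSON strings converted from CSV rows.

    `source` is any iterable of CSV lines that starts with a header row
    (a hypothetical source format used only for this sketch).
    """
    reader = csv.DictReader(source)
    while True:
        # Pull at most `batch_size` rows from the source at a time
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        # Convert the source format (CSV) into the destination format (JSON)
        yield [json.dumps(row, sort_keys=True) for row in batch]


# Hypothetical sample data standing in for a real source
sample = io.StringIO("sensor,temp_c\na1,21.5\na2,19.0\na3,22.1\n")
batches = list(ingest(sample))
```

Setting `batch_size=1` approximates record-at-a-time handling, while a larger value behaves more like a batch load; the conversion step itself does not change.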

As the number of IoT devices grows, both the volume and the variety of data sources are expanding rapidly, and these sources now need to be accommodated, often in real time. Yet extracting the data so that the destination system can use it is a significant challenge in terms of time and resources. Making data ingestion as efficient as possible helps focus resources on big data streaming and analysis rather than on the mundane work of data preparation and transformation.

HDF makes big data ingestion easy

Before

Complex, messy, and takes weeks to move the right data into Hadoop

After

Simple, efficient, easy

Typical problems of data ingestion

Complex, slow, and expensive

* Purpose-built and over-engineered tools make big data ingestion complex, time-consuming, and expensive
* Writing customized scripts and combining multiple products to acquire and ingest data, as current big data ingest solutions require, takes too long and prevents the on-time decision making that today's business environment demands
* Command-line interfaces for existing streaming data processing tools create dependencies on developers and fetter access to data and decision making

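Security and trust of data

* The need to share discrete bits of data is incompatible with current transport-layer data security capabilities, which limit access to the group or role level
* Adhering to compliance and data security regulations is difficult, complex, and costly
* Verifying data access and usage is difficult and time-consuming, and often involves a manual process of piecing together different systems and reports to verify where the data came from, how it is used, who has used it, and how often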

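Problems of data ingestion for IoT

* It is difficult to balance limited resources of power, computing, and bandwidth against the volume of data signals generated by big data streaming sources
* Unreliable connectivity disrupts communication and causes data loss
* Most of the world's deployed sensors lack security, which puts businesses and safety at risk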

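Optimizing data ingestion with Hortonworks DataFlow

Fast, easy, secure

* The fastest way to solve many of today's big data ingestion problems
* Interactive, real-time, point-and-click control of data flows
* Accelerated data collection and movement for increased big data ROI
* Real-time operational visibility, feedback, and control
* Business agility and responsiveness
* Real-time decision making from big data streaming sources
* Greater operational efficiency by eliminating the dependence and delays inherent in coding and custom-scripting approaches
* Off-the-shelf, flow-based programming for big data infrastructure
* Secure, reliable, and prioritized data collection across geographically dispersed, variable-bandwidth environments
* End-to-end data provenance that enables a chain of custody for data compliance, data "valuation", data flow optimization, and troubleshooting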

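Single, flexible, adaptive, bi-directional real-time system

* Integrated, data-source-agnostic collection from dynamic, disparate, and distributed sources
* Adaptive to fluctuating conditions of remote, distributed data sources over geographically dispersed communication links in varying bandwidth and latency environments
* Dynamic, real-time data prioritization at the edge to send, drop, or locally store data
* Bi-directional movement of data, commands, and contextual data
* Equally well designed to run on the small-scale data sources that make up the Internet of Things and on the large clusters in today's enterprise data centers
* Visual chain of custody for data (provenance) provides real-time, event-level data lineage for verification and trust of data from the Internet of Things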
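The idea of prioritizing data at the edge so that each item is sent, dropped, or stored locally can be sketched in a few lines of Python. This is only an illustrative model under assumed rules; it does not show how HDF implements prioritization. The `decide` function, the priority scale from 0 to 1, and all threshold values are hypothetical.

```python
def decide(priority, bandwidth_kbps,
           min_priority=0.2, urgent_priority=0.8, low_bandwidth_kbps=64):
    """Return 'send', 'store', or 'drop' for one reading at the edge.

    All thresholds are hypothetical values chosen for this sketch.
    """
    if priority < min_priority:
        # Not worth transmitting or keeping
        return "drop"
    if bandwidth_kbps <= 0:
        # Link is down: keep the reading locally until it returns
        return "store"
    if bandwidth_kbps < low_bandwidth_kbps and priority < urgent_priority:
        # Constrained link: only urgent readings go out now
        return "store"
    return "send"


# Hypothetical readings as (priority, currently available bandwidth in kbps)
readings = [(0.1, 1000), (0.5, 0), (0.5, 32), (0.9, 32), (0.5, 1000)]
decisions = [decide(p, bw) for p, bw in readings]
```

Readings marked `store` would be buffered on the device and re-evaluated when the link recovers, which is one simple way to limit data loss over an unreliable connection.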

 
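How real-time data flow accelerates big data ROI
Securing data flows from the Internet of Things
Real-time, visual data lineage
Secure data access and control
Dynamic prioritization of data in motion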

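Use cases for data ingestion with Hortonworks DataFlow

Use case 1

Ingest into Hadoop

With a real-time, drag-and-drop interface, shorten the time typically required to move data into Hadoop from months to minutes. Read about a real-world use case and see how to move data into HDFS in 30 seconds.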

 

Prescient: video | blog
Watch a 30-second live demo

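Use case 2

Log collection / Splunk optimization

Log data can be complex to capture, is typically collected in limited amounts, and is difficult to operationalize at scale. HDF helps efficiently collect, aggregate, and access expanding volumes of log data, and it integrates easily with log analytics systems such as Splunk, SumoLogic, Graylog, and LogStash for easy, secure, and comprehensive ingestion of log files.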

 

Log analytics optimization white paper

Use case 3

IoT ingestion

Realizing the promise of real-time decision making enabled by real-time IoT big data streaming is a challenge because IoT data is distributed and disparate. HDF simplifies data collection and helps push intelligence to the very edge of highly distributed networks.

 

A. Edge intelligence for the IoT
B. Retail and the IoT
C. Open Energi and the IoT

Use case 4

Feeding stream processing engines

Big data ingestion leads to processing that delivers business intelligence. HDF enables streaming data processing for your organization and supports real-time enterprise use cases with two of the most popular open-source solutions, Apache Storm and Spark Streaming.

NiFi, Kafka, and Storm: blog, slides, webinar
Comcast keynote at Hadoop Summit on NiFi to Spark: video