Enterprise Data Warehouse Optimization

Reduce costs by moving data and processing workloads to Hadoop®

Learn how you can modernize your data warehouse with Hadoop


What is an EDW?

An enterprise data warehouse (EDW) is an organization’s central data repository, built to support business decisions. It contains data on the subject areas the company wants to analyze; for a manufacturer, these might be customers, products, or bills of material. An EDW is built by extracting data from a number of operational systems. As the data is fed into the EDW, it is converted, reformatted, and summarized to present a single corporate view. Data is added to the warehouse over time in the form of snapshots, and an EDW normally contains data spanning 5 to 10 years.

EDW Optimization

Problems with a typical EDW

EDW is Expensive

Built on commercial, proprietary technology that is expensive to acquire (licensing cost)

Runs on expensive converged appliances

Cost continues to rise as new users and data are added to the EDW

Operationally expensive: it takes 18 to 24 months to find data sources, agree on business questions, and model the data to answer them

EDW is Rigid

A data model must be in place before a single business question can be answered using the data in the EDW (schema-on-write)

Designed to answer predetermined questions

Data modeling is a lengthy and labor-intensive process

Any change in the organization’s business model requires a change in the EDW’s data model

EDW is Inefficient

50-70% of the data in the EDW is unused or cold

45-65% of CPU capacity is used for ETL/ELT

25-35% of the CPU consumed by ETL goes to loading unused data

30-40% of CPU is consumed by only 5% of ETL workloads
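To make these figures concrete, the ETL share of CPU can be combined with the share of ETL effort spent on unused data. The sketch below assumes the 25-35% figure is a fraction of ETL CPU rather than of total CPU, and that the low and high ends of each range pair up; the function name is hypothetical and only illustrates the arithmetic.

```python
def cpu_share_loading_unused(etl_share_of_cpu, unused_share_of_etl):
    """Fraction of total EDW CPU spent loading data nobody queries.

    Assumption: unused_share_of_etl is a fraction of ETL CPU, not of total CPU.
    """
    return etl_share_of_cpu * unused_share_of_etl


# Low and high ends of the ranges quoted above.
low = cpu_share_loading_unused(0.45, 0.25)
high = cpu_share_loading_unused(0.65, 0.35)

print(f"Total CPU spent loading unused data: {low:.2%} to {high:.2%}")
```

Under these assumptions, roughly 11% to 23% of total EDW CPU capacity goes to loading data that is never used.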

Optimizing EDW with Apache Hadoop®

Cost Effective

HDP (Hortonworks Data Platform) is 100% open: there is no licensing fee for the software

HDP runs on commodity hardware

New data can be landed in HDP and used in days or even hours

Flexible

Data can be loaded into HDP without having a data model in place

A data model can be applied based on the questions being asked of the data (schema-on-read)

HDP is designed to answer questions as they occur to the user
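The schema-on-read idea can be illustrated in a few lines of plain Python. This is a minimal sketch and not HDP code: the sample records, field names, and the `read_with_schema` helper are all hypothetical. Raw records are stored as they arrive, and a schema is chosen only when a question is asked.

```python
import json

# Raw records landed as-is; fields vary from record to record (hypothetical data).
RAW_EVENTS = [
    '{"customer": "acme", "product": "widget", "qty": 3, "channel": "web"}',
    '{"customer": "globex", "product": "gadget", "qty": 1}',
    '{"customer": "acme", "product": "gadget", "qty": 2, "coupon": "SPRING"}',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto `schema` at read time.

    `schema` maps field name -> (cast function, default for missing fields).
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else default
            for field, (cast, default) in schema.items()
        }


# Question 1: how many units were sold per product?
sales_schema = {"product": (str, None), "qty": (int, 0)}
qty_by_product = {}
for row in read_with_schema(RAW_EVENTS, sales_schema):
    qty_by_product[row["product"]] = qty_by_product.get(row["product"], 0) + row["qty"]

# Question 2, asked later: which channel did each order come from?
channel_schema = {"customer": (str, None), "channel": (str, "unknown")}
channels = [row["channel"] for row in read_with_schema(RAW_EVENTS, channel_schema)]

print(qty_by_product)
print(channels)
```

The second question needs no reload or remodeling of the stored data; only the schema applied at read time changes.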

Efficient

100% of the data is available at a granular level for analysis

HDP can store and analyze both structured and unstructured data

Data can be analyzed in different ways to support diverse use cases

Use Cases for EDW Optimization

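Use Case 1

Archive

By design, Hadoop runs on low-cost commodity servers with direct-attached storage, which keeps the overall cost very low. Compared with high-end storage area networks, scale-out commodity compute and storage with Hadoop is a compelling alternative: users expand their hardware only as their data grows. This cost dynamic lets users store, process, access, and analyze more data.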


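Use Case 2

Offload

ETL is a relatively low-value computing workload that can be performed at lower cost in Hadoop. When ETL is offloaded to Hadoop, data is extracted and transformed there, and the results are then loaded into the data warehouse. The result: critical CPU cycles and storage space in the warehouse are freed for the truly high-value functions, analytics and operations, which make the best use of its advanced capabilities within the data architecture.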
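The offload pattern described above can be sketched in plain Python. This is an illustrative stand-in and not Hadoop code: the CSV sample, the function names, and the use of an in-memory SQLite database in place of the data warehouse are all assumptions. The point is that cleansing and aggregation happen outside the warehouse, which receives only the finished summary.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical raw extract from an operational system, including one bad row.
RAW_ORDERS_CSV = """order_id,customer,amount
1,acme,120.50
2,globex,80.00
3,acme,not_a_number
4,acme,29.50
"""


def extract(raw_csv):
    """Read raw rows exactly as they were landed."""
    return csv.DictReader(io.StringIO(raw_csv))


def transform(rows):
    """Cleanse and aggregate outside the warehouse (the offloaded work)."""
    totals = defaultdict(float)
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        totals[row["customer"]] += amount
    return sorted(totals.items())


def load(warehouse, summary):
    """Load only the small, finished summary into the warehouse."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", summary
    )
    warehouse.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for the EDW
load(warehouse, transform(extract(RAW_ORDERS_CSV)))
loaded = warehouse.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(loaded)
```

In this sketch the warehouse only ever runs a small insert and the final query; the parsing, filtering, and summing that would otherwise consume its CPU happen before the load step.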


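Use Case 3

Enrich

A remarkable range of new data types opens up possibilities for analysis in a high-performance EDW environment. However, these new data types come in a wide variety of structures, which is a challenge for an EDW that cannot ingest and analyze those formats. Many organizations rely on Hadoop’s flexibility to capture, store, and refine these new data types for use in the EDW. They use Hadoop’s schema-on-read capability to collect and store data in any format, then create schemas as needed to support analysis in the EDW.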
