The concept of a modern data architecture has evolved dramatically over the past 10-plus years. Turn the clock back and recall the days of legacy data architectures, which had many constraints. Storage was expensive and had associated hardware costs. Compute often involved appliances and more hardware investments. Networks were expensive, deployments were on-premises only, and proprietary software and hardware locked enterprises in at every turn.
This was (and for many organizations still is) a world of transactional silos where the architecture only allowed for post-transactional analytics of highly structured data. The weaknesses in these legacy architectures were exposed with the advent of new data types such as mobile and sensors, and new analytics such as machine learning and data science. Couple that with the advent of cloud computing and you have a perfect storm.
A multitude of interconnected factors disrupted that legacy data architecture era. Storage became cheaper and new software such as Apache Hadoop took center stage. Compute also went the software route and we saw the start of edge computing. Networks became ubiquitous and provided the planet with 3G/4G/LTE connectivity. Deployments started to become hybrid, and enterprises embraced open-source software. This led to a rush of innovation as customer requirements changed, influencing the direction that vendors had to take to modernize the data architecture.
The emergence of cloud created the need to evolve again to take advantage of its unique characteristics, such as decoupled storage and compute. This led to connected data architectures, with the Hadoop ecosystem evolving for IaaS and PaaS models and innovations such as Hortonworks DataPlane Service (DPS) for connecting deployments in the data center and the public cloud.
Given that data has “mass” and is responsible for the rapid rise of cloud adoption, the data architecture must evolve again to meet the needs of today’s enterprises and take advantage of the unique benefits of cloud. So much more is required in a data architecture today to achieve our dreams of digital transformation, real-time analytics and artificial intelligence – just to name a few. This paves the way for pre-transaction analysis and drives use cases such as a 360-degree view of the customer. Organizations need a unified hybrid architecture for on-premises, multi-cloud and edge environments. The time has come to once again reimagine the data architecture, with hybrid as a key requirement.
What does it take to be hybrid? We’ve been innovating to answer this question for some time. Hybrid requires:
The last point on consistent architectures is critical – not just from a technology standpoint, but more because the differences manifest themselves in a fundamental manner in the interaction model for the user vis-a-vis the technology. As an example, when it comes to the Hadoop ecosystem today, users walk up to a shared, multi-tenant cluster and just submit their SQL queries, Spark applications, etc. In the cloud, however, users have to provision their workloads such as query instances, Spark clusters, etc., before they can run analytics.
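The difference between the two interaction models can be made concrete with a small sketch. The Python below is a toy model only: the class names `SharedCluster` and `CloudEnvironment`, their methods, and the resource names are hypothetical illustrations, not part of any Hortonworks or Hadoop API. It simply shows that the shared cluster accepts a workload directly, while the cloud environment refuses one until the matching resource has been provisioned.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCluster:
    """Hypothetical on-premises model: a long-running, multi-tenant cluster."""
    name: str
    submissions: list = field(default_factory=list)

    def submit(self, user: str, workload: str) -> str:
        # No provisioning step: the user submits straight to the shared cluster.
        self.submissions.append((user, workload))
        return f"{workload} from {user} running on shared cluster {self.name}"


@dataclass
class CloudEnvironment:
    """Hypothetical cloud model: compute is provisioned per workload first."""
    name: str
    provisioned: set = field(default_factory=set)

    def provision(self, resource: str) -> None:
        # For example, a query instance or a Spark cluster.
        self.provisioned.add(resource)

    def submit(self, user: str, workload: str, resource: str) -> str:
        # Submitting before provisioning is an error in this model.
        if resource not in self.provisioned:
            raise RuntimeError(
                f"{resource} must be provisioned before submitting {workload}"
            )
        return f"{workload} from {user} running on {resource} in {self.name}"


# On-premises: walk up and submit.
on_prem = SharedCluster("prod")
print(on_prem.submit("alice", "sql-query"))

# Cloud: provision first, then submit.
cloud = CloudEnvironment("public-cloud")
cloud.provision("spark-cluster-1")
print(cloud.submit("alice", "spark-app", "spark-cluster-1"))
```

The extra `provision` call is the whole point of the sketch: the same user and the same workload go through a different sequence of steps depending on where they run, which is the inconsistency a unified hybrid architecture aims to remove.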
Today, we are excited to announce the Open Hybrid Architecture initiative – the last mile of our endeavor to deliver on the promise of hybrid. This initiative is a broad effort across the open-source communities, the partner ecosystem and Hortonworks platforms to enable a consistent experience by bringing the cloud architecture on-premises for the enterprise.
Another key benefit is helping customers settle on a consistent architecture and interaction model which allows them to seamlessly move data and workloads across on-premises and multiple clouds using platforms such as DPS.
Through the initiative, we deliver an architecture where it will not matter where your data is: in any cloud, on-premises or at the edge, enterprises can leverage open-source analytics in a secure and governed manner. The benefits of a consistent interaction model cannot be overstated; it provides the key to unlocking a seamless experience.
The Open Hybrid Architecture initiative will make this possible by:
After careful consideration, we’ve determined the best path forward is a phased approach, similar to how Hortonworks delivered enterprise-grade SQL queries-on-Hadoop via the Stinger and Stinger.Next initiatives.
The Open Hybrid Architecture initiative will include the following development phases:
Just as we enabled the modern data architecture with HDP and YARN back in the day, we’re at it again – but this time we are bringing the innovation we’ve done in the cloud down to our products in the data center.
Hortonworks has been on a multi-year journey toward cloud-first and cloud-native architectures. The Open Hybrid Architecture initiative is the final piece of the puzzle. Not only will this initiative bring cloud-native to the data center, but it will also help our customers embrace and master the unified hybrid architectural model that is required to get the full benefits of on-premises, cloud and edge computing. We, along with our partner ecosystem and the open-source community, are excited to tackle this next redesign of the modern data architecture.