The rate of change in data management is astonishing. In just a few years, big data has evolved into the data lake, and we have pushed these concepts out to the edge, where the data is captured. Along the way, the traditional paradigm of building a monolithic store to drive analytics has disappeared.
In fact, a veritable revolution has occurred in the world of business analytics. What was once a reactive function has become an active practice, providing insight into the business both pre- and post-transaction. Business insight is now derived wherever the data resides, throughout its entire lifecycle. Real-time decisions no longer mean simply higher-performance analytics; they now include decisions made at the point of interaction: on a mobile device, at the point of purchase, and in the connected car.
With these advances, however, comes new complexity. Data sprawl has widened as organizations now run multiple data lakes and clusters alongside their traditional databases, EDWs and even newer sources, such as graph and NoSQL databases. Today, lineage and insight into where our data comes from, how it got there, who has touched it, and how it has been used are exponentially more difficult to obtain than in a traditional centralized EDW. Most importantly, while data security and privacy are top-of-mind issues, a central, reliable security policy is nearly impossible to maintain. To make matters worse, our data now lives both within our four walls and beyond them, in the cloud and on edge devices.
Our customers, and successful businesses around the world, need to master this new world. If we had a homogeneous environment across all our customers, our lives would certainly be easier, but those days are long past ... or are they?
When we set out to help our customers thrive in this new paradigm of data management, we had three key characteristics in mind. First and foremost, we must leverage open source technologies. Second, we need to architect it as a platform, so that services are delivered on top of a set of shared capabilities. And finally, this layer must “fit” into any environment without requiring an organization to change the way it architects its data. With these goals in mind, our team got moving ... and today, I’m proud to announce delivery of Hortonworks DataPlane Service (DPS).
Hortonworks DPS is a next-generation service to manage, govern and secure data and workloads across multiple sources (databases, EDWs, clusters, data lakes), types of data (at rest, in motion) and tiers (on-premises, multiple clouds, hybrid). It allows enterprises to focus on getting more value from data, more quickly, by providing an intuitive experience for managing all of it.
DPS comprises a shared set of core capabilities and extensible services that use the platform. The core capabilities allow you to control clusters, provide central security and governance, and integrate DPS with all of your existing sources, no matter where they live.
Extensibility is a key characteristic of the platform as well, and DPS allows services to be delivered that take advantage of the core. The first service we will launch is Data Lifecycle Manager, which simplifies the somewhat complicated tasks of moving data, replication and backups, and lets you define storage tiers so you can optimize cost based on data usage. We can automate the segmentation of data assets so that rarely used data resides on cheap storage while frequently used data remains readily available. Soon, we will also introduce the Data Steward Service, which will allow you to curate, govern and understand data assets and ease the application of consistent policy across tiers.
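To make the tiering idea concrete, here is a minimal sketch of usage-based segmentation: assets are mapped to storage tiers by how recently they were accessed. The asset model, tier names and thresholds are illustrative assumptions for this post, not Data Lifecycle Manager's actual API or defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical model and thresholds for illustration only;
# not the actual DPS / Data Lifecycle Manager interface.

@dataclass
class DataAsset:
    name: str
    last_accessed: datetime
    size_gb: float

def assign_tier(asset: DataAsset, now: datetime) -> str:
    """Map an asset to a storage tier based on how recently it was used."""
    age = now - asset.last_accessed
    if age <= timedelta(days=7):
        return "hot"    # frequently used: keep readily available
    if age <= timedelta(days=90):
        return "warm"
    return "cold"       # rarely used: candidate for cheap storage

if __name__ == "__main__":
    now = datetime(2017, 10, 1)
    assets = [
        DataAsset("clickstream", now - timedelta(days=2), 500.0),
        DataAsset("2014-archive", now - timedelta(days=400), 1200.0),
    ]
    for asset in assets:
        print(asset.name, assign_tier(asset, now))
```

In a real deployment, a policy engine would evaluate rules like these periodically and trigger replication or movement between tiers automatically.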
We are also working with our partners to help them create extensions. This is an absolutely critical characteristic of DPS because it delivers a pragmatic platform designed to work in the real world. Every data center is different and DPS can work within your architecture.
To provide value across such a wide range of environments, these capabilities must be delivered as a service. This is the only way to ensure DPS works in any environment: our data is on-premises, in the cloud, at the edge, in hybrid deployments and even spread across multiple clouds. Delivering DPS as a service lets us bring all of this together into a common layer.
A few years ago, we introduced the concept of a modern data architecture, and we have been busy helping our customers realize this vision. With the community, we helped make Hadoop an enterprise-viable data platform. We then brought these capabilities to the edge with connected data platforms.
All the while, our philosophy has not wavered. We remain absolutely committed to open source and always take an open and collaborative approach that encourages an ecosystem to grow. With DPS this is more important than ever. No single vendor can solve this problem; it has to be addressed by a coalition, with a vision of a shared platform and extensible services so that all can take part in extending this new global data management platform.
Ultimately, categories aren’t simply created; they evolve over time, emerging out of enterprise requirements. With DPS, we have listened carefully to the practitioners and leaders at our customers to craft a platform that is pragmatic, open and extensible. These are some of the key requirements that define the data plane.
While these concepts are not new, our ability as an industry to deliver on this promise is. In fact, we are at the dawn of a movement. This modern data architecture is finally coming into existence because we can abstract key components of the data center into a plane, a fabric or a mesh. Noel Yuhanna of Forrester published a paper outlining his vision for this new world, calling it a “Data Fabric.” And my old friend Mark Beyer at Gartner published a similar report, referring to these concepts as a “Data Mesh.” It seems we are coalescing around the same ideas, and DPS is helping define these requirements.
Over time, will we see a Forrester Wave or a Gartner Magic Quadrant for planes, meshes and fabrics? I presume we will call them Global Data Management Platforms. We believe this is the future of successful analytics deployment, and we at Hortonworks are happy to be at the launch of what I believe is an important new category of technology.