What is fuelling IBM’s commitment to Apache Hadoop and Spark?
The pressures of day-to-day business are keeping companies from doing more with their data. IBM’s commitment is to initiate, simplify and integrate Hadoop technologies into organisations, helping mature, expansive and experiential projects move forward without succumbing to heavier workloads. Businesses need to pursue new revenue opportunities, beat their competition and delight their customers with better, faster analytics and new data-driven applications. To cope with the exponential data growth all companies are currently facing, Hadoop can store, manage and analyse vast amounts of structured and unstructured data quickly.
But just like servers, not all Apache Hadoop distributions are created equal.
IBM is working with Hortonworks, a leading innovator of open and connected data platforms, to enable customers with new data-driven capabilities. Whether that’s a 360-degree customer view or an AI-enabled customer service chatbot, Hortonworks is committed to driving innovation across all industries. I am very excited about what our partnership means for the future of industry, from retail to banking and beyond.
Why is our partnership, which brings Hortonworks Data Platform to IBM Power Systems, significant?
For [IBM] customers already running IBM Power Systems, Hortonworks Data Platform (HDP) on Power provides a cost-effective, high performance solution for supporting their new big data and analytics workloads, all on their preferred hardware platform. For any other companies that are looking to get started or are already implementing a big data strategy, this combination of IBM Power Systems with HDP delivers what I see to be the perfect combination: a highly efficient, cost-effective and high performance data platform for their Hadoop and Spark workloads. It is perfect in every sense of the word.
In today’s world of ever increasing data volumes, companies have to deal with data intensive workload performance and scalability issues. The IBM POWER8 processor provides leading I/O and memory capabilities inherently needed for fast data access and movement across a wide range of big data, analytics and cognitive applications.
Hortonworks’ secure, enterprise-ready open source Apache™ Hadoop® distribution provides clients with a highly scalable storage platform designed to process massive datasets across thousands of computing nodes. This Hadoop and Spark distribution complements IBM Power Systems by allowing clients to gain business insights from their structured and unstructured data with differentiated speed.
I am also very excited about IBM Power Systems’ range of Linux servers, including the industry’s fastest server built for AI, as well as the PowerAI deep learning platform. Together we are driving new customer value and redefining advanced analytics in the era of AI.
Hortonworks and IBM are founding members of the Open Data Platform Initiative (ODPi). Why is this important for our customers?
This initiative is another key example of how important the IBM + Hortonworks partnership is. The ODPi, launched two years ago, represents industry leaders working collaboratively to define and promote a set of standards for open source technologies and increase compatibility among big data platforms.
The ODPi represents our pledge to collaboration, openness and innovation. With HDP on Power, clients can benefit from running a top tier distribution for Hadoop and Spark on a platform with the performance, scalability and acceleration capabilities of IBM Power Systems, thanks to POWER8 and differentiated acceleration technology.
Have you seen a shift in how organisations are considering the combination of structured and unstructured data as a basis for better business outcomes over the past 12 months?
Absolutely. As the amount of data doubles every six months, more and more organisations are looking to utilise both structured and unstructured data to become the industry experts in their respective fields. Hortonworks has just simplified the process by introducing a new way to hold, manipulate and analyse the data.
We are seeing that some organisations, using Hortonworks with POWER8, have started to apply machine learning to that data. This can be done in real time as the data is generated and fed into the Hortonworks engine using Spark, enabling all of this data to be collated and rapidly loaded into the engine where it can be enhanced, transformed and analysed to provide insights quicker on a much wider scale than we have ever seen before. Analytics becomes ever more agile and flexible by having a multipurpose Hadoop Engine which uses Spark to move and rapidly analyse data. This means you can also subset the data and send it off to other open source data manipulation tools easily to gain further downstream insights.
Having such analytical capabilities in real time is remarkable; I am very excited to see what the open source community and our customers can do with such a forward looking marriage of technology.
Can you share any metrics, either from a TCO or performance perspective, for Hadoop running on Power Systems?
We are seeing HDP on Power Systems deliver far better throughput for typical Hadoop workloads versus x86-based solutions. In addition, clients can realise up to a threefold reduction in compute and storage infrastructure versus x86-based solutions when integrating IBM Elastic Storage Server (ESS) with Spectrum Scale storage.
POWER8 and Hortonworks deliver 1.70X the throughput compared to Hortonworks running on x86. With 70% more queries per hour, based on the average response time, you can complete the same amount of work with fewer system resources. And with a 41% reduction on average in query response time, you can see business decisions faster.
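The throughput and response-time figures quoted above are two views of the same measurement, and a quick check shows how they relate. The sketch below is only an illustration, not the benchmark methodology: it assumes queries run back to back at a fixed level of concurrency, so that queries per hour scale inversely with average response time. The function name `throughput_gain` is hypothetical.

```python
def throughput_gain(response_time_reduction: float) -> float:
    """Relative throughput when each query takes (1 - reduction) of the baseline time.

    Assumption (not from the interview): throughput is inversely
    proportional to average query response time.
    """
    if not 0.0 <= response_time_reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - response_time_reduction)


if __name__ == "__main__":
    # 41% is the average response-time reduction quoted in the interview.
    gain = throughput_gain(0.41)
    print(f"{gain:.2f}x throughput, about {100 * (gain - 1):.0f}% more queries per hour")
```

Under that assumption, a 41% reduction in average response time works out to roughly 1.69 times the throughput, which is in line with the 1.70X and 70% figures quoted, allowing for rounding.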
Where do you see the next positive disruptor coming from in relation to big data?
GPU-accelerated data analytics and artificial intelligence-augmented analytics are becoming an affordable reality, and are showing some exceptional performance, with results being generated in microseconds. In combination with big data platforms like Hortonworks Data Platform that can really funnel down and refine the huge amount of structured and unstructured data to a size that can be analysed by GPU-accelerated systems, we’re seeing the next generation of real-time analytics and immediate business benefit.
What advice would you have for organisations starting out on their journey with big data in 2017?
Innovations in technology are required to keep up with the ever-evolving business needs of today, and that’s just to stay alive. To thrive, grow and enchant their customers, businesses must match that with improving client satisfaction, reducing costs and staying true to the company’s core beliefs. This is especially true for organisations that are embarking on their big data journey for the first time; these businesses should look at the most appropriate and performant technology to help maximise the company’s and team’s potential, not by assuming x86, but by using IBM-designed systems built with the big data challenge in mind.
IBM offers an incredible range of hardware for every organisation’s and team’s specific big data requirements. Combine that with Hortonworks, and you have a perfect match in the era of AI. We are here to take that journey with you.