Many companies understand that their competitive position in the market will depend on their ability to become a data-driven organization. Gathering data from different sources into a data lake enables a new generation of analytics that cuts across legacy boundaries and silos. However, this simple and straightforward concept brings a range of new challenges. Customers are looking for a scalable, secure, and cost-effective storage platform that also offers the traditional enterprise-class storage features they are used to, such as storage tiering, simplified administration, and disaster recovery.
What is IBM Spectrum Scale?
IBM Spectrum Scale is flexible, scalable, software-defined file storage for analytics workloads. Enterprises around the globe have deployed IBM Spectrum Scale for compute clusters (HPC), big data and analytics, high-performance backup and restore, and content repositories. It scales to more than a billion petabytes of data and hundreds of GB/s of throughput.
IBM Spectrum Scale supports a collection of different storage device types, such as flash, disk, tape, and cloud. It can be consumed as software only, deployed directly on commodity hardware running the HDP stack. Alternatively, clients can purchase IBM Spectrum Scale as part of a pre-integrated system, the IBM Elastic Storage Server (ESS), and connect it as shared storage for an HDP environment. The IBM Elastic Storage Server is a modern implementation of software-defined storage, combining IBM Spectrum Scale software with IBM POWER8® processor-based servers and storage solutions.
IBM Spectrum Scale uses a parallel file system architecture, so there is no practical limit on the size of a file system and no metadata-node bottleneck. The architectural limit for a single file system is more than a yottabyte. Some Spectrum Scale customers run single file systems of up to 18 PB, while others use file systems containing billions of files. IBM Spectrum Scale local caching can use inexpensive solid-state drives (SSDs) or flash placed directly in IBM Spectrum Scale client nodes to accelerate input/output (I/O) performance by up to six times, reducing the time CPUs spend waiting for data and lowering the overall load on network and storage resources.
Launching Hortonworks Data Platform (HDP) on Spectrum Scale
IBM Spectrum Scale will be certified with HDP on Power by the end of June, and with HDP on the x86 platform by the end of July. This certification is for the Spectrum Scale software, and hence applies to all deployment models of Spectrum Scale, including the Elastic Storage Server (ESS). What sets Spectrum Scale apart from other products in the market is its parallel file system architecture, which is ideal for the massive scaling of performance and capacity needed in today’s cognitive and big data analytics workloads.
Flexible Storage Layer for In-Place Analytics
IBM Spectrum Scale enables the unification of virtualization, analytics, file, and object use cases into a single scale-out storage solution, and provides a single namespace for all data with a single point of management. With support for a wide range of protocols, including POSIX, NFS v4.0, SMB v3.0, HDFS, OpenStack Cinder (block), OpenStack Swift (object), and S3 (object), customers can run their analytics workloads in place without the need to duplicate datasets.
Because all nodes see all file data, any node in the cluster can concurrently read or update a common set of files, enabling applications to scale out easily. Spectrum Scale maintains the coherency and consistency of the file system using sophisticated byte-range locking, token (distributed lock) management, and journaling. This means applications that use standard POSIX locking semantics do not need to be modified to run on Spectrum Scale.
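The point about unchanged POSIX locking semantics can be illustrated with a small Python sketch using the standard fcntl module. The file path below is hypothetical; on a Spectrum Scale mount the same code would simply operate on a path inside the cluster file system.

```python
# Minimal sketch of POSIX byte-range locking: the same fcntl semantics
# applications rely on, unmodified, on a Spectrum Scale mount.
# The path is hypothetical; any POSIX file system behaves the same here.
import fcntl
import os

path = "/tmp/demo-lock.dat"  # on Spectrum Scale: a path under the cluster mount
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
os.write(fd, b"0" * 1024)

# Take an exclusive lock on bytes 0..511 only. Another process (or node)
# could concurrently lock and update bytes 512..1023, which is what lets
# many writers work on disjoint regions of one large file in parallel.
fcntl.lockf(fd, fcntl.LOCK_EX, 512, 0, os.SEEK_SET)
os.pwrite(fd, b"X" * 512, 0)   # update only the locked region
fcntl.lockf(fd, fcntl.LOCK_UN, 512, 0, os.SEEK_SET)
os.close(fd)
```

Because the locking is byte-range rather than whole-file, the file system can grant non-overlapping locks to different nodes at once, which is the mechanism behind the concurrent-update behavior described above.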
Using storage policies that are transparent to end users, data can be compressed or tiered to help cut costs. Data can also be moved to high-performance media, including server-side cache, based on a heat map of the data, to lower latency and improve performance.
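Tiering of this kind is driven by SQL-like information lifecycle management (ILM) policy rules that the file system evaluates (for example, via the mmapplypolicy command). The sketch below is illustrative only: the pool names, thresholds, and 30-day cutoff are assumptions, not values from this document.

```
/* Illustrative Spectrum Scale ILM policy sketch. Pool names
   ('system', 'nearline'), the occupancy thresholds, and the
   30-day access cutoff are all assumptions for this example. */
RULE 'place-hot' SET POOL 'system'
RULE 'tier-cold' MIGRATE FROM POOL 'system'
     THRESHOLD(80,60) TO POOL 'nearline'
     WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) > INTERVAL '30' DAYS
```

Because the rules run inside the file system, applications keep using the same paths while cold data quietly moves to cheaper media.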
Using HDP Tag Based Security Policies with Apache Atlas and Apache Ranger
Because Hadoop is schema-on-read, data can be brought into the data lake without much up-front worry about cleansing, enrichment, and reconciliation. Users have broad permission to store virtually any type of data, while data management and governance are delegated to application layers operating on top of the platform.
Apache Atlas provides data governance capabilities and serves as a common metadata store designed to exchange metadata both within and outside the Hadoop stack. Apache Ranger provides a centralized user interface that can be used to define, administer, and manage security policies consistently across all components of the Hadoop stack. The Atlas-Ranger integration unites the data classification and metadata store capabilities of Atlas with security enforcement in Ranger. You can use Atlas and Ranger to implement dynamic, classification-based security policies in addition to role-based security policies. Ranger’s centralized platform empowers data administrators to define security policies based on Atlas metadata tags or attributes and apply them in real time to an entire hierarchy of assets, including databases, tables, and columns, thereby helping prevent security violations.
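Tag-based policies are typically created in the Ranger web UI, but Ranger also represents them as JSON through its REST API. The fragment below is a rough sketch only: the service name, tag value, group name, and exact field layout are assumptions about the payload shape, not a verified schema, so treat it as an illustration of the idea rather than a copy-paste policy.

```
{
  "service": "tags",
  "name": "restrict-PII",
  "resources": {
    "tag": { "values": ["PII"] }
  },
  "policyItems": [
    {
      "groups": ["compliance"],
      "accesses": [
        { "type": "hive:select", "isAllowed": true }
      ]
    }
  ]
}
```

The key point the sketch illustrates: the policy names an Atlas tag ("PII") rather than a specific table or column, so any asset Atlas later classifies with that tag inherits the restriction automatically.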
With Hortonworks Data Platform certified on both IBM Power Systems and IBM Spectrum Scale, customers can enjoy the power of a complete open source Hadoop platform, a high-performance GPU-enabled compute platform, and a scalable, flexible data storage system for the most demanding workloads.
For more information, please go to: