Big Data Made Easy — Right Cloud, Right Workload, Right Time

Author: Carl Kinson

Over the past five to six years, the platform and infrastructure ecosystem has gone through major changes. A key change was the introduction of virtualization into infrastructure architecture. Virtualization not only consolidates hardware, and therefore saves cost, but also enables orchestration and management of the server, network and storage ecosystem.

Before virtualization, standing up, configuring and scaling an environment dynamically was time-consuming, manually intensive and inflexible. Now virtualization is embraced by many organizations, and its use to support business growth is commonplace. If we wrap some commercial utility frameworks around this, pre-package servers, storage and network architecture, and add a support framework, we have the makings of an agile, scalable solution.

This all sounds perfect, so where can I buy one?

One what? Well, let’s call this new world order “the cloud.” But which cloud? There are many different cloud solutions: public, private, on-premises, off-premises and everything in between.

Your choice will depend on your workload, regulatory compliance needs, confidence level and finances, so it’s unlikely that just one cloud solution will meet all your needs. That is not uncommon: the reality is that today’s businesses have a complex set of requirements, and no one cloud will solve them all. Not yet, anyway!

The next step is to determine how to align your business needs with the right cloud and how to provision the cloud to deliver your business applications — gaining benefits such as reduced effort and complexity, a standard process, and an app store type of front end.

This is the role ServiceMesh Agility Platform is designed to fill. ServiceMesh, CSC’s recent acquisition, is able to orchestrate multiple clouds with predefined application blueprints that can be rolled out and deployed on a range of public and private cloud solutions.

Great, but how does it help my big data projects?

Since big data is enabled by a collection of applications, both open source and commercial, we can now create application blueprints for deploying big data solutions rapidly on the cloud. Let’s concentrate on big data running on a cloud infrastructure. (Quick refresher: Hadoop drives the infrastructure toward commodity x64-based servers with internal dedicated storage, configured in a grid-based architecture with a high-performance back-end network.)
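
To make the blueprint idea concrete, here is a minimal sketch of what such a declarative description might contain, written as a Python structure. The field names and values are hypothetical illustrations of the concept, not the actual ServiceMesh blueprint schema.

    # Purely illustrative blueprint for a small Hadoop cluster; every field
    # name here is a hypothetical stand-in, not the ServiceMesh format.
    HADOOP_BLUEPRINT = {
        "name": "hadoop-analytics-cluster",
        "nodes": {
            "master": {"count": 1, "cpu": 8, "ram_gb": 32,
                       "services": ["namenode", "jobtracker"]},
            "worker": {"count": 10, "cpu": 16, "ram_gb": 64,
                       "local_disk_tb": 4,  # internal dedicated storage
                       "services": ["datanode", "tasktracker"]},
        },
        "network": {"backend": "10GbE"},  # high-performance back-end network
        "scaling": {"min_workers": 4, "max_workers": 40},
    }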

Today, some companies are running Hadoop clusters in the public cloud on providers such as Amazon and Google. For the longer term, those companies will discover that scaling issues and regulatory compliance will prevent this from being a single-answer solution. At the other end of the spectrum, Yahoo, Facebook and LinkedIn environments are built on dedicated petabyte-scale clusters running on commodity-based architecture. Although we see some very large clients with this kind of need, the typical big data deployment will sit somewhere between these two bookends.

With controlled virtualization technology underpinning dedicated Hadoop clusters at scale, configured so that performance is not compromised, ServiceMesh can be used effectively to provision and manage big data environments. This can include delivery on public clouds such as Amazon. Through the use of big data blueprints, the same solution can also be deployed on on- and off-premises clouds, letting you choose, through an intuitive interface, the right hosting platform for each workload.
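
As a rough illustration of that workload-to-platform alignment, the sketch below encodes a simple placement rule of thumb in Python. The attribute names and thresholds are assumptions made up for the example, not ServiceMesh logic.

    # Hypothetical placement rule: regulated data stays on-premises, very
    # large datasets go to a dedicated off-premises cloud, and everything
    # else can run on a public provider. All thresholds are assumed.
    def choose_platform(workload: dict) -> str:
        if workload.get("regulated", False):
            return "private-onpremises"
        if workload.get("data_tb", 0) > 100:  # assumed scaling cutoff
            return "private-offpremises"
        return "public"  # e.g., Amazon

    print(choose_platform({"regulated": True}))  # private-onpremises
    print(choose_platform({"data_tb": 5}))       # public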

Does the combination make sense?

Absolutely. The intent of big data is to focus on driving business value through insightful analytics, not on provisioning and deploying Hadoop clusters. If we can simplify and speed up the provisioning process, we can align workloads with the most appropriate hosting platform. Deploying and configuring big data solutions is complex and requires key skills, and attempting it across multiple environments can become time-consuming and very difficult to manage. That’s why an advanced orchestration tool can reduce your resource overhead, costs and errors, while also letting you operate more quickly. Creating this kind of environment is a specialty task. CSC Big Data Platform as a Service can manage multiple clouds and scalable workloads faster, with limited upfront investment, so you can derive the right insights and be the best at your business.

Disruptive Technology in Big Data: Not Just Hadoop

2013-12-02

By Carl Kinson

You’ve heard the names: Pig, Flume, Splunk, MongoDB and Sqoop, to name a few. And Hadoop, of course. They carry funny names that make us smile, but they represent disruptive technologies in big data that have proven their value to business. And that means they merit serious consideration for what they can do for your company.

To get business value out of the data you are not currently mining, you should consider how to introduce big data technologies into your business intelligence and analytics environment. Some of the best-known big data implementations are centered on Hadoop, which handles some truly massive amounts of data. For instance, eBay uses a Hadoop cluster to analyze more than 40 petabytes of data to power its customer-recommendations feature.

Hadoop is part of the solution in many cases, but today it is hardly the only one. To begin with, Hadoop is a batch-oriented big data solution that is well suited to handling large data volumes. There are some applications where a company can justify running an independent Hadoop cluster, as eBay does, but those instances will be the exception. More often, companies will get more value from offloading data into Hadoop-style environments that act as data stores, running map-reduce jobs there and feeding the outputs into traditional data warehouses to add more data for analysis.
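
As a minimal sketch of this offload pattern, the following Hadoop Streaming-style job counts purchase events per product from raw log lines; its output could then be bulk-loaded into the warehouse. The log layout and field positions are assumptions made for the example.

    #!/usr/bin/env python
    # Minimal Hadoop Streaming-style job. Assumed input layout per line:
    # timestamp,product_id,event_type (CSV). Pass "map" or "reduce" as mode.
    import sys

    def mapper(lines):
        for line in lines:
            fields = line.rstrip("\n").split(",")
            if len(fields) >= 3 and fields[2] == "purchase":
                print(f"{fields[1]}\t1")  # emit product_id<TAB>1

    def reducer(lines):
        # Hadoop delivers mapper output sorted by key, so equal keys
        # arrive together and can be summed in a single pass.
        current, total = None, 0
        for line in lines:
            key, value = line.rstrip("\n").split("\t")
            if key != current:
                if current is not None:
                    print(f"{current}\t{total}")
                current, total = key, 0
            total += int(value)
        if current is not None:
            print(f"{current}\t{total}")

    if __name__ == "__main__":
        mode = sys.argv[1] if len(sys.argv) > 1 else "map"
        if mode == "map":
            mapper(sys.stdin)
        else:
            reducer(sys.stdin)

You can dry-run the same script locally with cat events.csv | python job.py map | sort | python job.py reduce before submitting it to the cluster via the Hadoop Streaming jar.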

Well-established commercial vendors in the ERP/structured data space, such as IBM, SAP and Oracle, have all quickly embraced the Hadoop wave. Examples include SAP HANA + Hortonworks, IBM PureData + IBM BigInsights and Oracle + Cloudera, to name but a few. (Hortonworks, BigInsights and Cloudera are all based on Hadoop, an open source product.)

Many companies, however, will derive more value from a hybrid solution that combines the batch-processing power of Hadoop with “stream-based” technologies that can analyze and return results in real time, using some of the disruptive products I mentioned at the start.

Consider a courier company that geotags its drivers. By combining real-time information about the driver’s location, route plan, traffic information and the weather, the company could reroute a driver if delays are detected on his or her intended route. This is something a batch-oriented system such as Hadoop isn’t designed to address. But using a “streaming” product allows this to happen in near real time.
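
The sketch below shows the shape of that logic in plain Python, independent of any particular streaming product. Event fields, segment identifiers and the delay threshold are all assumptions made for illustration.

    # Illustrative stream processor: consume a mixed feed of route plans,
    # traffic reports and GPS pings, and flag drivers whose remaining route
    # crosses a segment with a reported delay. All fields are hypothetical.
    from typing import Dict, Iterator

    DELAY_THRESHOLD_MIN = 15  # assumed rerouting trigger, in minutes

    def process(events: Iterator[Dict]) -> Iterator[str]:
        delays = {}  # segment_id -> reported delay in minutes
        routes = {}  # driver_id -> planned list of segment_ids
        for event in events:
            if event["type"] == "traffic":
                delays[event["segment"]] = event["delay_min"]
            elif event["type"] == "route_plan":
                routes[event["driver"]] = event["segments"]
            elif event["type"] == "gps":
                # On each position update, re-check the planned route.
                for seg in routes.get(event["driver"], []):
                    if delays.get(seg, 0) >= DELAY_THRESHOLD_MIN:
                        yield f"reroute {event['driver']}: {seg} is delayed"
                        break

    # Feed a few synthetic events through the processor.
    feed = [
        {"type": "route_plan", "driver": "d1", "segments": ["s1", "s2"]},
        {"type": "traffic", "segment": "s2", "delay_min": 20},
        {"type": "gps", "driver": "d1", "lat": 51.5, "lon": -0.1},
    ]
    for alert in process(iter(feed)):
        print(alert)  # -> reroute d1: s2 is delayed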

Each technology on its own is already creating significant disruption in the marketplace. As more companies combine the power of batch- and stream-based big data products and analytics, the disruptive waves will likely grow considerably larger.

Now is a good time to consider how these big data products could be added to your environment, adding functionality and features to your business and helping you make your own waves.