Connecting the Boxes


As we develop IT solutions, it is easy to focus on the core elements: the infrastructure, platform and application layers, and the big components such as storage and compute, ERP and middleware technologies. However, as we think about architectures and systems integration, focusing on the connectivity of data and applications is critical to a successful deployment and to satisfying both operational and regulatory requirements.

This focus on connectivity is particularly important as we move to modern, cloud-based applications. In today’s architecture we worry less about the basic interoperability of big components because the vendors typically have this well covered. Unless you’re trying to put the proverbial square peg in a round hole, your risk is low. As we look to make our applications more agile and consider moving workloads from public cloud to private cloud or hosted solutions, and as we think about moving from testing to production, what we need to worry about more is the connection between data and applications. Is the line that connects these boxes well designed for today and tomorrow?

Consider the plumbing in your house. Would one type of pipe and fittings handle high- and low-pressure water, gas and oil-based systems? Fittings and pipe structure need to be designed specifically to ensure they integrate and operate with the appliances they connect. Now consider an IT architecture. Don’t confuse the lines that connect the boxes with the network cables or network connection protocols. The OSI model handles these connections up to layer 4, typically in the infrastructure layer. The layers I want to focus on are those that deal with data transport between applications (layers 5-7), where the lines between the boxes are the protocols and APIs that connect the applications together.

These connections need not only to function as interconnections between applications but also to take on the attributes of the overall solution. For example, if you are operating in a secure, regulated environment, you must use secure protocols (e.g., TLS/SSL, SFTP, HTTPS, SSH) so that data is encrypted as it moves between applications. If you are writing APIs in Java, the Java Cryptography Extension (JCE) can be used to encrypt data on these connections.
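As a minimal illustration of the secure-protocols point (a Python sketch, rather than the Java/JCE route mentioned above), a client can insist on TLS with certificate and hostname verification before any application data crosses the wire:

```python
import ssl

# Build a client-side TLS context with sane defaults: certificate
# verification and hostname checking are both enforced, so data is
# encrypted in transit and the peer's identity is validated.
context = ssl.create_default_context()

# A real connection would be wrapped before any application data flows,
# e.g. context.wrap_socket(sock, server_hostname="api.example.com").
# "api.example.com" is a placeholder host, not a real endpoint.

print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate checks on
print(context.check_hostname)                    # hostname checks on
```

The design point is that the security attribute lives in the connection itself, not in the applications at either end.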

As part of the design, when considering APIs and protocols, strive to future-proof yourself. As we have seen in the Web space, RESTful APIs have become the protocol of choice. This reduces risk around application integration, the availability of skilled resources and support from application vendors, while providing flexibility and adaptability for future developments.
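To make the RESTful idea concrete, here is a small, self-contained sketch using only Python’s standard library: a resource exposed at a URL, retrieved with a plain HTTP GET and returned as JSON. The `/orders/<id>` resource and its payload are invented for illustration:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class OrderHandler(BaseHTTPRequestHandler):
    """Minimal REST-style endpoint: GET /orders/<id> returns JSON."""
    ORDERS = {"42": {"id": "42", "status": "shipped"}}  # hypothetical data

    def do_GET(self):
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "orders" and parts[1] in self.ORDERS:
            body = json.dumps(self.ORDERS[parts[1]]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep demo output quiet
        pass

def fetch_order(port, order_id):
    """Any HTTP-capable client can consume the resource the same way."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/orders/{order_id}") as resp:
        return json.loads(resp.read())

server = HTTPServer(("127.0.0.1", 0), OrderHandler)  # ephemeral local port
threading.Thread(target=server.serve_forever, daemon=True).start()
order = fetch_order(server.server_address[1], "42")
server.shutdown()
print(order)  # {'id': '42', 'status': 'shipped'}
```

The flexibility the article describes comes from exactly this shape: any consumer that speaks HTTP and JSON can use the line between the boxes, regardless of what sits at either end.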

Consider a client looking to migrate from their legacy applications to new modern apps, migrating both platform and hosting to cloud-enabled solutions. A critical aspect is ensuring that connectivity for the migration itself, and then for the migrated operational components, is enabled. Part of the success of most application modernisation projects is based on the ability to move and reconnect new applications and data sources into the legacy estates.

As we look forward we are already seeing products, both commercial and open source, that help solution designers interconnect their applications through common data connectors and APIs. One to draw your attention to in the open source space is EzBake, developed by 42Six Solutions (a CSC company). EzBake is in the final stage of being launched. This open source project aims to simplify the data connectivity, federated query and security elements within the big data space. There are already public cloud-based platforms that enable you to buy a service that connects your data source to a target through a common set of APIs and protocols. EzBake will likely sit in the private cloud space, focused on connecting big data applications and data stores, but the ability to make these application and data connections easily will be valuable across the IT landscape.

It all comes down to the line connecting the boxes. Ensuring that this is given as much thought and consideration as the data and applications when designing a solution will pay dividends, enabling your architecture to integrate and operate successfully. And with correctly chosen protocols, your solution will be future-proofed for the next integration or migration project.

Big Data Made Easy — Right Cloud, Right Workload, Right Time

Author: Carl Kinson

Over the past five to six years, the platform and infrastructure ecosystem has gone through some major changes. A key change was the introduction of virtualization into infrastructure architecture. Not only does this provide a consolidation solution, and therefore a cost-saving answer, but it also enables orchestration and management of the server, network and storage ecosystem.

Before virtualization, the ability to dynamically stand up, configure and scale an environment was time-consuming, manually intensive and inflexible. Now, virtualization is embraced by many organizations, and its use to support business growth is commonplace. If we wrap some utility commercial frameworks around this, pre-package some servers, storage and network architecture, and a support framework, we have the makings of an agile, scalable solution.

This all sounds perfect, so where can I buy one?

One what? Well, let’s call this new world order “the cloud.” But which cloud? There are many different cloud solutions: public, private, on-premises, off-premises and everything in between.

Your choice will depend on your workload, need for regulatory compliance, confidence level and finances, so it’s unlikely that just one cloud solution will solve all your needs. This is not uncommon. The reality is that today’s businesses have a complex set of requirements, and no one cloud will solve them all. Not yet, anyway!

The next step is to determine how to align your business needs with the right cloud and how to provision the cloud to deliver your business applications — gaining benefits such as reduced effort and complexity, a standard process, and an app store type of front end.

This is the role ServiceMesh Agility Platform is designed to fill. ServiceMesh, CSC’s recent acquisition, is able to orchestrate multiple clouds with predefined application blueprints that can be rolled out and deployed on a range of public and private cloud solutions.

Great, but how does it help my Big Data Projects?

Since big data is enabled by a collection of applications — open source and commercial — we can now create application blueprints for deploying big data solutions rapidly on the cloud. Let’s concentrate on big data running on a cloud infrastructure. (Quick refresh: Hadoop drives the infrastructure to commodity x64-based architecture with internal dedicated storage, configured in a grid-based architecture with a high-performance back-end network.)

Today, some companies are running Hadoop clusters in the public cloud on providers such as Amazon and Google. For the longer term, those companies will discover that scaling issues and regulatory compliance will prevent this from being a single-answer solution. At the other end of the spectrum, Yahoo, Facebook and LinkedIn environments are built on dedicated petabyte-scale clusters running on commodity-based architecture. Although we see some very large clients with this kind of need, the typical big data deployment will sit somewhere between these two bookends.

With controlled virtualization technology underpinning dedicated Hadoop clusters at scale, configured so that performance is not impacted, ServiceMesh can be used effectively to provision and manage big data environments. This can include delivery on public clouds such as Amazon. Also, through the use of big data blueprints, the same solution can be deployed on on- and off-premises cloud solutions, enabling you to choose, through an intuitive interface, the right hosting platform for each workload.

Does the combination make sense?

Absolutely. The intent of big data is to focus on driving business value through insightful analytics, not on provisioning and deploying Hadoop clusters. If we can simplify and speed up the provisioning process, we can align workloads with the most appropriate hosting platform. Deploying and configuring big data solutions is complex and requires key skills; doing so across multiple environments can become time-consuming and very difficult to manage. That’s why an advanced orchestration tool can reduce your resource overhead, costs and errors, while also letting you operate more quickly. Creating this kind of environment is a specialty task. CSC Big Data Platform as a Service can manage multiple clouds and scale workloads faster, with limited upfront investment, so you can derive the right insights and be the best at your business.

Disruptive Technology in Big Data: Not Just Hadoop


By Carl Kinson

You’ve heard the names: Pig, Flume, Splunk, MongoDB and Sqoop, to name a few. And Hadoop, of course. They carry funny names that make us smile, but they represent disruptive technologies in big data that have proven their value to business. And that means they merit serious consideration for what they can do for your company.

To get business value out of the data you are not currently mining, you should consider how to introduce big data technologies into your business intelligence/analytics environment. Some of the best-known big data implementations are centered on Hadoop, which handles some truly massive amounts of data. For instance, eBay uses a Hadoop cluster to analyze more than 40 petabytes of data to power its customer-recommendations feature.

Hadoop is part of the solution in many cases, but today it is hardly the only one. To begin with, Hadoop is a batch-oriented big data solution that is well suited to handling large data volumes. There are some applications where a company can justify running an independent Hadoop cluster, as eBay does, but those instances will be the exception. More often, companies will get more value from offloading data into Hadoop-type environments that act as data stores, running map-reduce jobs there and feeding the outputs into traditional data warehouses as additional data for analysis.
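The map-reduce pattern those offload jobs rely on can be sketched in a few lines of plain Python: a local word-count stand-in for what a real Hadoop job would distribute across a cluster (the log lines are invented sample data):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    """Map step: emit (word, 1) pairs, as a Hadoop mapper would."""
    for line in records:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle/sort by key, then reduce: sum the counts per word."""
    grouped = groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))
    return {key: sum(count for _, count in group) for key, group in grouped}

# A stand-in for log lines offloaded from an operational system.
logs = ["error timeout", "ok", "error disk"]
counts = reduce_phase(map_phase(logs))
print(counts)  # {'disk': 1, 'error': 2, 'ok': 1, 'timeout': 1}
```

In a real deployment the map and reduce functions run in parallel across the cluster’s nodes, and the resulting aggregates are the “outputs” that get seeded into the warehouse.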

Well-established commercial vendors in the ERP/structured data space, such as IBM, SAP and Oracle, have all quickly embraced the Hadoop wave. Examples include SAP HANA + Hortonworks, IBM PureData + IBM BigInsights and Oracle + Cloudera, to name but a few. (Hortonworks, BigInsights and Cloudera are all based on Hadoop, an open source product.)

Many companies, however, will derive more value from a hybrid solution that combines the batch-processing power of Hadoop with “stream-based” technologies that can analyze and return results in real time, using some of the disruptive products I mentioned at the start.

Consider a courier company that geotags its drivers. By combining real-time information about the driver’s location, route plan, traffic information and the weather, the company could reroute a driver if delays are detected on his or her intended route. This is something a batch-oriented system such as Hadoop isn’t designed to address. But using a “streaming” product allows this to happen in near real time.
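The courier scenario boils down to a per-event decision, and the essential difference from batch is that each event is evaluated the moment it arrives rather than after a scheduled job runs. A toy sketch, where the field names and the 15-minute threshold are illustrative assumptions:

```python
def should_reroute(event, delay_threshold_min=15):
    """Streaming-style decision: judge each event as it arrives,
    with no batch window. Threshold is an illustrative assumption."""
    return event["route_delay_min"] > delay_threshold_min

# A toy feed of geotag events; a real stream would arrive continuously
# from a message bus rather than sit in a list.
events = [
    {"driver": "d1", "route_delay_min": 5},   # on schedule
    {"driver": "d2", "route_delay_min": 25},  # stuck in traffic
]
rerouted = [e["driver"] for e in events if should_reroute(e)]
print(rerouted)  # ['d2']
```

A batch system would only notice d2’s delay on its next scheduled run, which is exactly why the streaming products fill this gap.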

Each technology on its own is already creating significant disruption in the marketplace. As more companies combine the power of batch- and stream-based big data products and analytics, the disruptive waves will likely grow considerably larger.

Now is a good time to consider how these big data products could be added to your environment, adding functionality and features to your business and helping you make your own waves.