Disruptive Technology in Big Data: Not Just Hadoop


By Carl Kinson

You’ve heard the names: Pig, Flume, Splunk, MongoDB and Sqoop, to name a few. And Hadoop, of course. They carry funny names that make us smile but they represent disruptive technologies in big data that have proven their value to business. And that means they merit serious consideration for what they can do for your company.

To get the business the value out of the data you are not currently mining, you should consider how to introduce big data technologies into your business intelligence / analytics environment. Some of the best-known big data implementations are centered on Hadoop, which handles some truly massive amounts of data. For instance, eBay uses a Hadoop cluster to analyze more than 40 petabytes of data to power its customer-recommendations feature.

Hadoop is part of the solution in many cases, but today it is hardly the only one. To begin with, Hadoop is a batch-oriented big data solution that is well suited for handling large data volume and velocity. There are some applications where a company can justify running an independent Hadoop cluster, like eBay, but those instances will be the exception. More often, companies will get more value from offloading data into Hadoop-type environments, acting as data stores, running map-reduce jobs and seeding these outputs into the traditional data warehouses, to add additional data for analysis.

Well-established commercial vendors in the ERP/structured data space, such as IBM, SAP and Oracle, have all quickly embraced the Hadoop wave. Examples include SAP HANA + Hortonworks, IBM PureData + IBM BigInsights and Oracle + Cloudera, to name but a few. (Hortonworks, BigInsights and Cloudera are all based on Hadoop, an open source product.)

Many companies, however, will derive more value from a hybrid solution that combines the batch-processing power of Hadoop with “stream-based” technologies that can analyze and return results in real time, using some of the disruptive products I mentioned at the start.

Consider a courier company that geotags its drivers. By combining real-time information about the driver’s location, route plan, traffic information and the weather, the company could reroute a driver if delays are detected on his or her intended route. This is something a batch-oriented system such as Hadoop isn’t designed to address. But using a “streaming” product allows this to happen in near real time.

Each technology on its own is already creating significant disruption in the marketplace. As more companies combine the power of batch- and stream-based big data products and analytics, the disruptive waves will likely grow considerably larger.

Now is a good time to consider how these big data products could be added to your environment, adding functionality and features to your business and helping you make your own waves.

Leave a Reply