This week, social data analytics company DataSift announced a partnership with Hadoop data management software provider Cloudera. The San Francisco and UK-based analytics startup plans to go well beyond helping companies make sense of the social web, and Cloudera’s Hadoop expertise can help it get there faster.
“What we’ve been working on for the last year and a half, and getting Cloudera involved is part of the whole story, is the historic access systems we’ve got,” said DataSift co-founder and CTO Nick Halstead in an interview. “And obviously even though people know we’ve got two years of historic Twitter data in our platform, the actual bigger big data story is that we are a generic data processing platform using Hadoop, and going forward we’re going to be bringing on board a lot of other data sets, not just in social but from a whole range of sources, using Hadoop and MapReduce to be able to give our customers access to large amounts of data on demand.”
Basically, DataSift is aiming “to be able to do the same kinds of things that we do for public data sets to private data sets,” giving large enterprises the ability to, for instance, perform extensive data analytics on their cumulative Yammer interactions. Another prime use case for DataSift’s use of Hadoop and MapReduce (a programming model and framework for processing huge amounts of data in parallel across multiple server nodes) comes from the financial markets, where Halstead pointed out that the current way of delivering historical data is for entities like Nasdaq to send out physical hard drives storing years’ worth of ticker data.
“What we want to do is turn that on its head using the same platform where we can both deal with real-time data and the petabytes of historic data and be able to deliver that on demand,” Halstead explained. “Without things like Hadoop, that’s not really been cost-effective to be able to deliver that as a data solution for customers, and now we have those systems to make it possible.”
Halstead noted that while DataSift began as a straightforward Cloudera customer, it quickly became apparent to both companies that a closer partnership would have benefits for all parties. “One of the things that Cloudera likes about what we’re doing is very, very high-volume data ingestion,” he said. “Every day we’re ingesting four hundred million data items that need to be stored to disk, and that’s no small problem to solve. Having them on board helps us scale, but it also teaches them a lot about this very special use case when they’re then going back to their own customers.”
There are other benefits, such as the ability for the two companies to work together on HBase and to contribute their combined expertise to Hadoop open-source efforts. The partnership brings together two of the biggest rising stars in the data space, combining DataSift’s social experience with Cloudera’s big data knowledge. Both companies are growing quickly on their own, and should be able to help each other reach new heights in the future.