This blog post refers to the definition of Big Data commonly in use today. I do not include mainframe-based solutions, which some people might argue tackle Big Data challenges.
Both IBM and Oracle are going after the Big Data market. However, they are taking different approaches. I’m going to take a few moments to have a very brief look at what both companies are doing.
First of all, Oracle have introduced an “appliance” for Big Data. IBM have not. I put the word appliance in quotes because I consider this Oracle appliance to be closer in nature to an integrated collection of hardware and software components, rather than a true appliance that is designed for ease of operation. But the more important consideration is whether an appliance even makes sense for Big Data. There is a decent examination of this topic in the following blog post from Curt Monash and the accompanying comment stream: Why you would want an appliance — and when you wouldn’t. But, regardless of your position on this subject, the fact remains that Oracle currently propose an appliance-based approach, while IBM does not.
The other area I will briefly look at is the scope of the respective vendor approaches. In the press release announcing the Oracle Big Data Appliance, Oracle claim that:
Oracle Big Data Appliance is an engineered system optimized for acquiring, organizing, and loading unstructured data into Oracle Database 11g.
IBM takes a very different approach. IBM does not see its Big Data platform as primarily being a feeder for its relational database products. Instead, IBM sees this as being one possible use case. However, the ways that customers want to use Big Data technologies extend well beyond that use case. IBM is designing its Big Data platform to cater for a wide variety of solutions, some of which involve relational solutions and some of which do not. For instance, the IBM Big Data platform includes:
- BigInsights for Hadoop-based data processing (regardless of the destination of the data)
- Streams for analyzing data in motion (where you don’t necessarily store the data)
- TimeSeries for smart meter and sensor data management
- and more
Here’s a short video that was recorded at the IDUG conference, where I talk about the characteristics of Big Data solutions, discuss some of the technologies involved, and describe some real-world Big Data solutions that IBM has implemented. It’s a high-level introduction, but if you’re not sure what this “Big Data” term refers to, you may find it useful.
In the video, I try to quantify what “big” means today, and I describe some lessons we have learned while implementing Big Data solutions. Technologies introduced include Map/Reduce systems, systems for analyzing streaming data, Massively Parallel Processing data warehouse systems, and in-memory database systems.
Those of you who know me in person will see that I was a little under the weather when the video was recorded. You can hear it in my voice, see it in my demeanor, and notice it in my cadence. I hope you can get past this and find the video useful.
As many of you know, IBM has been making big investments in Big Data. This includes InfoSphere BigInsights (which is based on Apache Hadoop), InfoSphere Streams, IBM Netezza, and more than $14B in analytics-based acquisitions. IBM is now announcing a set of hands-on workshops that will be held around the world to help you get to grips with Big Data. There will be 1,200 of these free workshops held in more than 150 cities in 60 countries in 2011. For more information, see IBM Launches Global Bootcamps to Help Companies Tackle Big Data Challenges.