This blog series provides an overview of the Big Data technologies around Hadoop that are supported in SAP Predictive Analytics. It also covers how SAP Predictive Analytics can be used to apply predictive techniques to Big Data on Hadoop.
Introduction
Big Data is more than just a buzzword nowadays; it is changing the way customers run their businesses. To uncover previously unknown information and bring actionable insights into the business, all the information the business generates needs to be stored. Hadoop, being a more scalable storage platform than traditional databases, is becoming very popular among customers. The objective of this blog is to acquaint you with Big Data technologies and briefly describe how you can use them with SAP Predictive Analytics.
What is Big Data?
Gartner analyst Doug Laney came up with the famous three Vs back in 2001. These three Vs (Volume, Velocity, and Variety) are the patterns commonly observed in Big Data.
Source: Forbes
Big Data means a dataset so large that it cannot be processed using traditional computing techniques. Big Data is not merely a type of data; rather, it has become a complete subject that involves various tools, techniques, and frameworks.
- Volume refers to the vast amount of data generated every second. Think of all the emails, Twitter messages, photos, video clips, and sensor data that we produce and share every second. We are not talking terabytes, but yottabytes or brontobytes of data.
- Velocity refers to the speed at which new data is generated and the speed at which data moves around. For example, think of social media messages going viral in minutes, or the speed at which credit card transactions are checked for fraudulent activity.
- Variety refers to the different types of data we can use. Data generated by the "Internet of Things" is growing exponentially (for example, sensors on planes generate tons of data).
Source: Blog.sap.com, HP
Big Data involves the data produced by different devices and applications. Below are the types of data that fall under the umbrella of Big Data.
- Structured data: Relational data, etc.
- Semi-structured data: XML data, etc.
- Unstructured data: Word, PDF, Text, Media Logs, weblogs, etc.
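To make the three categories above concrete, here is a minimal Python sketch that reads one tiny sample of each kind using only the standard library. All sample values, field names, and the log format are invented for illustration; the weblog line is treated as unstructured because that is how the list above classifies it.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a relational table (hypothetical sample rows).
structured = "customer_id,age,purchased\n101,23,1\n102,41,0\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: XML carries its own tags, but fields may vary per record.
semi_structured = "<orders><order id='1'><item>book</item></order><order id='2'/></orders>"
orders = ET.fromstring(semi_structured)
order_ids = [order.get("id") for order in orders.findall("order")]

# Unstructured: a raw weblog line; structure has to be imposed by a pattern.
weblog_line = '10.0.0.1 - - [12/Mar/2016:01:15:02 +0000] "GET /product/42 HTTP/1.1" 200 512'
match = re.search(r'"(?P<method>\w+) (?P<path>\S+) [^"]*" (?P<status>\d{3})', weblog_line)
request = match.groupdict() if match else {}

print(rows[0]["age"], order_ids, request)
```

The further down the list you go, the more work the reader of the data has to do: the CSV columns are known up front, the XML has to be navigated, and the log line only yields fields once a pattern is applied to it.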
SAP Predictive Analytics on Big Data
Example Scenario: Increasing online sales by analyzing weblogs
In this section, let us review a typical predictive scenario for an online retail store and understand how the SAP Predictive Analytics solution and Big Data technology (Hadoop) can work together. Nowadays, online retailers collect massive amounts of data by having access to clickstream data, user profiles, advertising data, and social network data, just to name a few. This huge amount of data can be stored in a Hadoop cluster. The Hadoop system can be scaled very easily to store and manage this continuously growing data, which is generated from all kinds of sources. The Hadoop ecosystem includes in-memory processing engines such as Spark and prepackaged machine learning libraries such as MLlib, which can be used to build predictive models efficiently.
A typical SAP Predictive Analytics project in a Big Data scenario would be to connect to the Hadoop system, prepare a meaningful dataset in-database, train the predictive model, and then deploy it in the Hadoop system.
SAP Predictive Analytics enables the analyst to create predictive models that can identify the key influencers of customers going through with an online purchase.
For example, we may find out that customers under 25 are more likely to purchase products after 1 am on weekends, when certain types of advertising are shown and when they are redirected from YouTube.
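As a rough illustration of what a "key influencer" means, the following Python sketch compares the purchase rate of each segment with the overall purchase rate. This is not how SAP Predictive Analytics computes influencers internally; it is a simplified stand-in, and the session records, field names, and the `purchase_rates` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical clickstream sessions; field names and values are invented.
sessions = [
    {"age_group": "under_25", "referrer": "youtube", "purchased": 1},
    {"age_group": "under_25", "referrer": "youtube", "purchased": 1},
    {"age_group": "under_25", "referrer": "search", "purchased": 0},
    {"age_group": "25_plus", "referrer": "youtube", "purchased": 0},
    {"age_group": "25_plus", "referrer": "search", "purchased": 1},
    {"age_group": "25_plus", "referrer": "search", "purchased": 0},
]

def purchase_rates(records, feature):
    """Return the purchase rate for each value of one feature."""
    totals = defaultdict(int)
    buys = defaultdict(int)
    for record in records:
        totals[record[feature]] += 1
        buys[record[feature]] += record["purchased"]
    return {value: buys[value] / totals[value] for value in totals}

overall = sum(s["purchased"] for s in sessions) / len(sessions)
for feature in ("age_group", "referrer"):
    for value, rate in purchase_rates(sessions, feature).items():
        # Lift above 1 means this segment buys more often than average.
        print(feature, value, round(rate / overall, 2))
```

A segment whose lift is well above 1 is a candidate influencer worth a closer look. A real model would also account for interactions between variables and for sample size, which this sketch ignores.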
Using the clustering module in SAP Predictive Analytics, a marketing manager can identify customer groups that have similar characteristics. These clusters can then be used in future targeted marketing campaigns.
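The idea of grouping similar customers can be sketched with a plain k-means loop in Python. This is only an assumption about the general technique: the text does not say which algorithm the SAP Predictive Analytics clustering module uses, and the `kmeans` function, the customer values, and the choice of two clusters are all made up for illustration.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means on a list of numeric tuples; first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical customers: (age, average order value).
customers = [(22, 30.0), (45, 210.0), (24, 35.0), (23, 28.0), (47, 190.0), (50, 220.0)]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

Each label identifies the group a customer falls into, and each centroid describes the "average" member of that group. In practice the features would be scaled first so that one column (here, order value) does not dominate the distance.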
In the next blog I will discuss how SAP Predictive Analytics can be used to build predictive models for this Big Data scenario.