As part of the SAP HANA Academy's examination of SAP HANA Vora, Tahir Hussain Babar (Bob) details how to load and use hierarchies in a series of three videos.
For this series you should already have SAP HANA Vora installed. For details on how to install SAP HANA Vora please watch these videos and read this blog. Through out the three videos Bob will be working in the Apache Spark Shell.
Loading Hierarchies
In the first video in the hierarchy series Bob profiles a typical hierarchy, examines the format and columns of a particular type of hierarchy. Then Bob demonstrates how to load it into HDFS. Loading into HDFS is a precursor to loading the hierarchy into SAP HANA Vora.
Example of a Hierarchy: At the top of a hierarchy could be a manager (Diane in Bob's example). Diane is the root of the hierarchy. Diane manages two people, Joe and Austin. Austin doesn't manage anyone so he is a root of that hierarchy. Joe manages five people. Those five are the children of the node, Joe, and are themselves roots of the hierarchy.
Bob loads some new data to reflect the hierarchy that he outlined above. He lists an ID number, the person's name, a predecessor rank, a successor rank, and an order rank. Bob then loads another dimension table of data with addresses and joins it to the hierarchy.
To conclude the video Bob, executes a put command to load his recently created hierarchy.csv and addresses.csv files into HDFS.
Creating a Hierarchy View
In this tutorial video Bob shows you how to load data from HDFS into SAP HANA Vora. Once the data is in Vora, Bob walks through how to create a specific view in Vora which will enable you to preform hierarchical analysis.
To move the data from HDFS to SAP HANA Vora, first Bob in Spark Shell imports org.apahce.spark.sql and puts it into a variable called sqlc. Bob next pastes in the below syntax to create the table. The syntax creates a temporary table called hierarchy_table with five columns and specifies the main node, the zookeeper host and the Vora host, while pointing to the HDFS path.
Next Bob enters a similar syntax to create a temporary table in Vora for the addresses table. Bob runs a select * statement to confirm the data is in Vora.
To create a view for the hierarchy, Bob runs the syntax shown below. Bob creates a view called HV that selects data from the hierarchy table using a pair of options called join parent and set node. Join parent defines how the adjacency list is constructed. In this example any two rows in the hierarchy table have an edge between them in the hierarchy if the child's row's predecessor column is equal to the parent's row's successor column. This is indicated by ON v.pred = u.succ.
START WHERE pred = 0 defines the root of the hierarchy as any row that matches the specified condition is considered a root of the hierarchy forrest. For optimal performance it's best to set the predecessor as the root node.
SET NODE just identifies a column to act as the node and will be reflected as a new column in the view. This column will contain information on how to specify the location within the hierarchy.
SEARCH By ord ASC enables the hierarchy to list in ascending order based on when someone joined.
After running the script Bob does a select * to view the newly created hierarchy.
Using Hierarchy Functions
In the final video of the series Bob details how to use user defined functions in SAP HANA Vora on top of hierarchically defined data.
Bob's first example on how to use hierarchies in SAP HANA Vora shows how to discover who is the top of the hierarchy. The script below selects the name for the person of is the root from the node in the hierarchy view.
This displays the value for all eight people in the view and if Bob modifies the SAP sql to return only where IS_ROOT(NODE) = true then only Diane is returned as she is the top of the hierarchy.
Another user defined function that Bob profiles (pictured above) is how to discover who are the children of the hierarchy. This sql splits the view into a HV A list and HV B list and then joins on the two Node columns. This returns all of the different parent-child combinations. Bob can simply modify this code to find out who the parents are and who the siblings are for each individual in the view. Bob also show how to simply manipulate the function to see each person's descendants and ancestors.
The functions that deal will the hierarchy view's order column are IS FOLLOWING and IS PRECEDING. Similar to the earlier functions these show who joined the organization after or before respectively for each person.
Then Bob shows how to join two tables together in SAP HANA Vora. Bob joins the two tables using the ID column that they both share. This returns each person's address.
Next Bob shows a more complex function that will return the address of only the people who are a root in the hierarchical forrest. The IS ROOT command finds out the root. Then the person who is a root from the A list of the view is joined to the child of that root to return the ID of the people who are below the top level.
Bob then modifies the syntax to join it to the addresses table to return the address of each of the people who are one level below the top of the hierarchy.
The next syntax Bob runs returns the addresses for the people that are two levels below the top of the hierarchy. Bob then modifies it to the third level to return return the addresses for those people.
In this series of three videos Bob showed how to load data from HDFS into SAP HANA Vora. Bob then detailed how to create a hierarchical view in SAP HANA Vora. Finally Bob detailed a few of SAP HANA Vora's hierarchical user functions.
For more SAP HANA Vora tutorial videos please check out this playlist
SAP HANA Academy - Over 1,200 free tutorial videos on SAP HANA, SAP Analytics and the SAP HANA Cloud Platform.
Follow us on Twitter @saphanaacademy and connect with us on LinkedIn.