Quantcast
Channel: SCN : Blog List - All Communities
Viewing all articles
Browse latest Browse all 2548

HANA based Transformation (deep dive)

$
0
0

2      HANA based Transformation (deep dive)


This blog is part of the blog series  HANA based BW Transformation.


Now I will look a little bit behind the curtain and provide some technical background details about SAP HANA Transformations. The information provided here serves only for a better understanding of BW transformations which are pushed down.

 

As part of the analysis of HANA executed BW transformations we need to distinguish between simple (non-stacked) and stacked data flows. A simple, non-stacked data flow connects two persistent objects with no InfoSource in between, only one BW transformation is involved. We use the term stacked data flow for a data flow with more than one BW transformation and at least one InfoSource in between.

 

Stability of the generated runtime objects

All information I provide here are background information to get a better understanding on SAP HANA executed BW transformation.

It is important to keep in mind that all object definition could be changed!

Do not implement any stuff based on the generated objects!

The

  • structure of a CalculationScenario (view names, number of views, …) or
  • generated SQL statements or
  • PLACEHOLDER
could be changed by the next release, support package or a  SAP note.

 

2.1      Simple data flow (Non-Stacked Data Flow)

 

A simple data flow is a data flow which connects two persistent BW objects with no InfoSource in between. The corresponding Data Transfer Process (DTP) processes only one BW transformation.

 

In case of a non-stacked data flow the DTP reuses the SAP HANA Transformation (SAP HANA Analysis Process) of the BW transformation, see Figure 2.1.

 

Figure_2_1.png

Figure 2.1: Non-Stacked Transformation


2.2      Stacked Data Flow

 

A stacked data flow connects two persistent data objects with at least one InfoSource in between. Therefore a stacked data flow contains at least two BW transformations. The corresponding Data Transfer Process (DTP) processes all involved BW transformations.

 

In case of a stacked data flow, the DTP cannot use the SAP HANA Transformation (SAP HANA Analysis Process) of the BW transformations. Strictly speaking, it is not possible to create a CalcScenario for a BW transformation with an InfoSource as source object. An InfoSource cannot be used as a data source in a CalculationScenario.


Figure_2_2.png

Figure 2.2: Stacked Transformation


Figure 2.2 shows a stacked data flow with two BW transformations (1) and (2) and the corresponding SAP HANA Transformations(3) and (4). There is no tab for the CalculationScenario in the SAP HANA Transformation (3) for the BW transformation (1) with an InfoSource as source object.

 

Therefore the DTP generated its own SAP HANA Transformations(6) and (7) for each BW transformation. The SAP HANA Transformations for the DTP are largely equivalent to the SAP HANA Transformation for the BW transformations (3) and (4).

 

In the sample data flow above, the SAP HANA Transformation (5) for the DTP get its own technical ID TR_5I3Y6060H25LXFS0O67VSCIF8. The technical ID is based on the technical DTP ID and the prefix DTP_ would be replaced by the prefix TR_.

 

The SAP HANA Transformation (5) and (6)is only a single object, in fact. I included it twice in the picture to illustrate that the SAP HANA Transformation (6) is based on the definition of the BW transformation (1) and will be used(5) from the DTP.

 

Figure 2.3 provides a more detailed view of the generated SAP HANA Transformation. The SAP HANA Transformation (6) is based on the definition of the BW transformation (1) and is therefore largely identical to the SAP HANA Transformation (3).(3) and (6)differs only in the source object. The SAP HANA transformation (6)uses the SAP HANA Transformation / SAP HANA Analysis Process (7) as the source object instead of the InfoSource as shown in (3). The SAP HANA Transformations (4) and (7)are also quite similar. They only differ with respect to the target object. In the SAP HANA Transformation (3), the InfoSource is used as the target object. The SAP HANA transformation (7) does not have an explicit target object. Its target object is only marked as Embedded in a Data Transfer Process. That means the SAP HANA Transformation (7) is used as data source in a SAP HANA Transformation, in our case in the SAP HANA Transformation (6).


Figure_2_3.png

Figure 2.3: Stacked Transformation (detailed)


The technical ID of the embedded SAP HANA Transformation (7) is based on the technical ID of the SAP HANA Transformation (5)and (6)of the DTP. Only the last digit 1 will be added as a counter for the level. This means that in case of a stacked data flow with more than two BW transformations, the next SAP HANA Transformation would get the additional digit 2 instead of 1 and so on.

 

Later on we will need the technical IDs to analyze a HANA based BW transformation, therefore it is helpful to understand how they are being created.


2.3      CalculationScenario


To analyze a SAP HANA based BW transformation, it is necessary to understand the primary SAP HANA runtime object, the CalculationScenario (CalcScenario). The BW Workbench shows the CalcScenario in an XML representation, see (2) in Figure 2.4. The CalculationScenario is part of the corresponding SAP HANA Transformation (Extra è Generated Program HANA Transformation) of the DTP. The CalculationScenario tab is only visible if the Expert Mode (Extras è Expert Mode on/off) is switched on. Keep in mind, if the source object of the BW transformation is an InfoSource the CalculationScenario could only been reached by using the DTP Meta data, see »Dataflow with more than one BW transformation« and »Stacked Data Flow«.


The naming convention for the CalculationScenario is:


     /1BCAMDP/0BW:DAP:<Technical ID – SAP HANA Transformation>


The CalculationScenario shown in the CalculationScenario tab, see (2) and (1)in Figure 2.4, is only a local temporary version, therefore the additional postfix .TMP.


The CalculationScenario processes the transformation logic by using different views (CalculationViews) to split the complex logic into more simplified single steps. One CalculationView, the default CalculationView, represents the CalculationScenario itself. The default CalculationView uses one or more other CalculationViews as source and so on, see Figure 2.5. A CalculationView cannot be used in SQL statements, therefore for each CalculationView a ColumnView will be generated.

 

Inside the SAP HANA database, the ColumnViews are created in the SAP<SID>schema, see (1). Each ColumnView represents a CalculationView, see paragraph 2.3.2.1 »CalculationScenario - calculationViews«, within a CalculationScenario. The create statement of each ColumnView provides two objects. First the CalculationScenario and as second object the ColumnView based on the CalculationScenario. All ColumnViews that belong to a SAP HANA transformation are based on the same CalculationScenario.


Figure_2_4.png

Figure 2.4: CalculationScenario and ColumnViews


SAP HANA internally uses the JSON notation to represent a CalculationScenario. Figure 2.5 shows the CalculationScenario depicted in Figure 2.4 (2) in a JSON Analyzer. The tree representation provides a good overview how the different CalculationViews are consumed.  The JSON based CalcScenario definition can be found in the Create Statement tab in the column view definition in the SAP HANA studio. The definition can be found in the USING clause of the CREATE CALCULATION SCENARIO statement, the definition starts with ‘[‘ and ends with ‘]’.


Figure_2_5.png

Figure 2.5: CalculationScenario in a JSON Analyzer


The SAP HANA studio also provides a good tool to visualize a CalculationScenario, see Figure 2.6. To open the visualization tool click on Visualize View in the context menu of a ColumnView based on a CalculationScenario. The visualization view is divided in three parts. The first part (1) provides a list of the CalculationViews which are used inside the CalculationScenario. Depending on each view definition there are more information about variables, filter or attributes are below the view node available.  The second part (2) provides an overview about the view dependencies. Which view consumes which view? The third part (3) provides context sensitive information for a view form the second part.


Figure_2_6.png

Figure 2.6: CalculationScenario visualization in the SAP HANA Studio


Now we will have a deeper look into the following sub nodes of the calculationScenario node (see (2) in Figure 2.4):

  • dataSources
  • variables
  • calculationViews


2.3.1 CalculationScenario - dataSources


The node dataSources lists all available data source objects of the CalculationScenario. The following data sources are used within a CalculationScenario in the context of a BW transformation:

  • tableDataSource
  • olapDataSource
  • calcScenarioDataSource

In the first sample transformation, we only use a database table (tableDataSource) as the source object, see Figure 2.7. The sample data flow reads from a DataSource (RSDS), therefore the corresponding PSA table is defined as tableDataSource. To resolve the request ID, the SID table /BI0/SREQUID is also added to the list of data sources.


Figure_2_7.png

Figure 2.7: CalculationScenario – Node: TableDataSource


In the second sample a transformation rule Master data read is being used in the BW transformation. In this case an olapDataSource is added to the list of data sources. The olapDataSourceuses the logical index (0BW:BIA:0MATERIAL_F4) from the InfoObject to read the required master data from the real source tables, see Figure 2.8.


Figure_2_8.png

Figure 2.8: CalculationScenario – Node: OLAPDataSource


To read the master data in the requested language, object version and time the PLACEHOLDERS

  • keydate,
  • objvers
  • langu

are added.

 

The third sample is a stacked data flow. In a stacked data flow the CalculationScenario from the DTP uses another CalculationScenario as data source. In these cases, the calcScenarioDataSource is used. The variables defined in the upper CalculationScenario are passed to the underlying CalculationScenario to be able to push down these variables (filters) where possible to the source objects, see Figure 2.9.


Figure_2_9.png

Figure 2.9: CalculationScenario – Node: CalcScenarioDataSource

 

The values for the PLACEHOLDERS are passed in the SQL statement by using the variables, see paragraph 2.3.2 »CalculationScenario - variables«.

 

The variables for the PLACEHOLDER for the variables keydate, objvers and langu are always set in the INSERT AS SELECT statement, whether they are used or not.


2.3.2       CalculationScenario - variables


In the node variables, all parameters are defined which are used in the CalculationScenario and can be used in the SQL statement to filter the result, see Figure 2.10.


Figure_2_10.png

Figure 2.10: CalculationScenario – Node: variables


Placeholder usage by customer

All variables and placeholders defined in a CalculationScenario in the context of a BW transformation are only intended for SAP internal usage. Variables and placeholder names are not stable, that means they can be changed, replaced or removed.


Figure 2.11 provides a further sample, based on a BW transformation, with several dataSource definitions. A variable is been used to control which dataSource and at the end from which table the data are read by the SQL statement.

 

The sample data flow for this CalculationScenario reads from an advanced DSO (ADSO) (based on a Data Propagation Layer - Template) with three possible source tables (Inbound Table, Change Log and Active Data). For each source table (dataSource), at least one view is generated into the CalculationScenario and all three views are combined by a union operation, see (2).

 

The input nodes are used to enhance all three structures by a new constant field named BW_HAP__________ADSO_TABLE_TYPE. The constant values

  • Inbound Table (AQ),
  • Change Log (CL) and
  • Active Data (AT)

can later be used as values for the filter $$ADSO_TABLE_TYPE$$, see (3). The filter value is handed over by the SELECT statement and depends, for example, on the DTP settings (Read from Active Table or Read from Change Log). To read data only from the active data (AT) table the following placeholder setting is used:

 

     'PLACEHOLDER'=('$$adso_table_type$$',   '( ("BW_HAP__________ADSO_TABLE_TYPE"=''AT'' ) )'),

 

For further information see 2.4 »SQL Statement«.

 

Figure_2_11.png

Figure 2.11: CalculationScenario – DataSource and Variable collaboration


2.3.2.1 CalculationScenario - calculationViews


The next relevant node type is the node calculationView. A CalculationScenario uses several layered views (calculationView) to transfer the logic given by the BW transformation. For the CalculationScenario and for each calculationView, a ColumnView is created, see (1) in Figure 2.4. The CalculationScenario related to a column view can be found in the definition of each column view.

 

There are several view types which can be used as sub node of a calculationView:

  • projection
  • union
  • join
  • aggregation
  • unitConversion
  • verticalUnion
  • rownum
  • functionCall
  • datatypeConversion


The view types as well as the number of views used in a CalculationScenario depend on the logic defined in the BW transformation and the BW / SAP HANA release.


A SELECT on a CalculationScenario (ColumnView) always reads from the default view. There is only one default view allowed. The default view can been identified by checking whether the attribute defaultViewFlag is set to “true”. In the JSON representation in Figure 2.5, the default view is always shown as the top node.


The processing logic of each view is described in further sub nodes. The most important sub nodes are:

  • viewAttributesattributes
  • inputs
  • filter

 

The used sub nodes of a CalculationView depend on the view type, on the logic defined in the BW transformation, and the BW / SAP HANA release.


CalculationScenario - calculationViews – view - viewAttributes / attributes


The nodes viewAttributes and attributesare used to define the target structure of a calculation view. The node attributes is used for more complex field definitions like data type mapping and calculated attributes (calculatedAttributes).

 

The InfoObject TK_MAT is defined as CHAR(18) with a conversion routine ALPHA. To ensure that all values comply with the APLHA conversion rules, the CalculationScenario creates an additional field TK_MAT$TMP as a calculatedAttribute. Figure 2.12 shows the definition of the new field. The APLHAconversion rule logic is implemented as a formula based on the original field TK_MAT.


Figure_2_12.png

Figure 2.12: CalculationScenario - CalculationAttributes


Calculated attributes are memory-intensive and we try to avoid them where possible. But there are some scenarios where calculated attributes must be used. For example, in case of target fields based on InfoObjects with conversion routines (see Figure 2.12) and in case of “non-trustworthy” sources. A “non-trustworthy” data source is a field-based source object (that is not based on InfoObjects), for example a DataSource (RSDS) or a field based advanced DataStore-Object (ADSO). In case of “non-trustworthy” data sources, the CalculationScenario must ensure that NULL values are converted to the correct type-related initial values.


CalculationScenario - calculationViews– view inputs


A calculation view can read data from one or more sources (inputs). CalculationViews and/or data sources could be used as sources. They are listed under the node input. A projection or an aggregation, for example, typically have one input node, and a union or a join typically have more than one input node.

 

The input node in combination with the mapping and viewAttribute nodes can be used to define new columns. In Figure 2.13 the attribute #SOURCE#.1.0REQTSN is defined as a new field based on the source field 0REQTSN.


Figure_2_13.png

Figure 2.13: CalculationScenario - input

 

CalculationScenario - calculationViews– view filter


The filter node, see (2) in Figure 2.14, is used in combination with the variable, see (1)in Figure 2.14, to provide the option to filter the result set of the union-view (TK_SOI). The filter value is set as placeholder in the SQL statement see (3) in Figure 2.14


Figure_2_14.png

Figure 2.14: CalculationScenario - input

 

I will come back to the different views later on in the analyzing and debugging section.


2.4 SQL Statement

 

Running a DTP triggers an INSERT AS SELECT statement that reads data from the source and directly inserts the transformed data into the target object. There are two kinds of filter options available to reduce the processed data volume: the WHERE condition in the SQL statement and the Calculation Engine PLACEHOLDER. Placeholders are used to set values for variables / filters which are defined in the CalculationScenario. Which placeholders are used depends on the logic defined in the BW transformation and the BW / SAP HANA release.


PLACEHOLDER


The following table describes the most important PLACEHOLDERS which can be embedded in a CalculationScenario and used in the SQL statements. Which PLACEHOLDERis been used in the CalculationScenario depends on the implemented dataflow logic.


Important note about the PLACEHOLDER

PLACEHOLDER are not stable and the definition can be changed by a new release, support package or note!

It is not supported to use the here listed PLACEHOLDER inside a SQL Script or any other embedded database object in a context of a BW transformation. This also applies to all PLACEHOLDER used in the generated SQL statement.

 

 

Placeholder name

Description

$$client$$

Client value to read client dependent data. The placeholder is always set, whether it is used or not.

$$change_log_filter$$

This placeholder is only used to filter the data which is read from the change log. The filter values are equivalent to the values of the placeholder $$filter$$.

The placeholder is only used for advanced DataStore-Objects (ADSO). See also $$inbound_filter$$and $$nls_filter$$.

$$change_log_extraction$$

Wenn die Extraktion aus dem Changelog erfolgt ist der Wert ‘X’ gesetzt. Wird im Rahmen des Errorhandlings genutzt.

$$datasource_src_type$$

The placeholder is set to ‚X‘ in case the data would be read from the remote source object and not from the PSA.

$$dso_table_type$$

This placeholder is used to control which table of a standard DataStore-Object (ODSO) is used as data source. Therefore the field BW_HAP__________DSO_TABLE_TYPE can be set to:

  • Active Table (0)
  • Change Log (3)

$$filter$$

This placeholder is used to filter the source data where possible. That means the placeholder is typically used in the next view above the union for all available source tables.

This placeholder contains the filters defined in the DTP extraction tab plus some technical filters based on the REQUEST or the DATAPAKID.

In case of using an ABAP routine or a BEx-Variable in the DTP filter, the result of both are used to create the filter condition.

The placeholder $$filter$$ is used for all BW objects except advanced DataStore-Objects (ADSO). To filter an ADSO see $$inbound_filter$$, $$change_log_filter$$and $$nls_filter$$.

$$inbound_filter$$

This placeholder is only used to filter the data which is read from the inbound queue[KT1]. The filter values are equivalent to the values of the placeholder $$filter$$.

This placeholder is only used for advanced DataStore-Objects (ADSO). See also $$change_log_filter$$and $$nls_filter$$.

$$changelog_filter$$

This placeholder is only used to filter the data which is read from the change log. The filter values are equivalent to the values of the placeholder $$filter$$.

This placeholder is only used for advanced DataStore-Objects (ADSO). See also $$change_log_filter$$and $$nls_filter$$.

$$keydate$$

Date to read time dependent master data. The variable value is applied to the logical index of an InfoObject in an olapDataSource. The variable is always set, whether it is used or not.

$$langu$$

Language to read master data. The variable value is applied to the logical index of an InfoObject in an olapDataSource. The placeholder is always set, whether it is used or not.

$$navigational_attribute_filter$$

This placeholder lists the filters based on navigation attributes, which are used in the DTP filter.

$$objvers$$

Object version to read master data. The variable value is applied to the logical index of an InfoObject in an olapDataSource. The variable is always set, whether it is used or not.

$$runid$$

For some features, it is necessary to store metadata in a temporary table. In a prepare phase, the metadata is inserted into this temporary table identified by a unique id (runid). During runtime, the values are then read by using the runid of this placeholder.

This placeholder is primarily used in a CalcScenario based on an explicitly created SAP HANA Analysis Process (see transaction RSDHAAP) and not a SAP HANA Transformation.

$$target_filter$$

The target filter is used in a SQL statement to ensure that only those records of the result set which match this filter condition are inserted into a target object. This placeholder is used if the filter condition is given by the target object, for example by a semantically partitioned Object (SPO). A target filter is applied to the result set of the transformation.

$$datasource_psa_version$$

Relevant version number of the PSA where the current request is located.

$$DATAPAKID$$.DTP

This value is set by the package size parameter maintained in the DTP. More information, especially for dependencies, can be found in the value help (F1) for the DTP package size parameter.

$$REQUEST$$.DTP

This placeholder contains the request ID for the target request.

In the simulation mode, the value is always set to the SID value of the request ID DTPR_SIMULATION in the request table /BI0/SREQUID.

 

 

 

 


The following SQL SELECT statement, see Figure 2.15, belongs to the CalculationScenario as shown in Figure 2.11. The SQL statement passes the value for the variable adso_table_type. The placeholder

 

     'PLACEHOLDER'=('$$adso_table_type$$',   '( ("BW_HAP__________ADSO_TABLE_TYPE"=''AT'' ) )'),

 

sets the field BW_HAP__________ADSO_TABLE_TYPE to ’AT’. In the union definition in Figure 2.11, see (2), the field BW_HAP__________ADSO_TABLE_TYPE is set to the constant value ’AT’for all rows provided by the active data table. Like this, the placeholder ensures that only data from the active data table is selected.

 

Figure_2_15.png

Figure 2.15: DPT - SELECT statement

 

Some PLACEHOLDERS get the transformation-ID as a prefix to ensure that these PLACEHOLDERidentifiers are unique, see placeholder $$runid$$ in Figure 2.15 and the calcScenarioDataSource description in the blog HANA based BW Transformation.

 

 


Viewing all articles
Browse latest Browse all 2548

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>