Overview:
During a data migration, one question comes up almost every time: how can we ensure that the data in the source and target systems is consistent after the migration? Cross Database Comparison (CDC), one of the Data Consistency Management tools, can be used to compare data sources with a complex structure or hierarchy. The data sources being compared can reside in different systems and be of different types (ABAP, non-ABAP, XML and OData). The CDC result shows the percentage of matched records, as well as the percentage and details of the failed records. CDC can therefore also serve as a tool to validate the correctness of the migration program.
Cross Database Comparison - Solution Manager - SCN Wiki
For simple cases, where the source and target have similar data structures and no complex data conversion logic is involved, the CDC Data Model can be built with the source type “Remote Database (Using ADBC)” or “RFC to Generic Extractor”. The standard documentation already covers this simple usage, so it is not repeated here. The sections below show an extended usage of CDC based on an enhanced Data Model and an ABAP function module.
Case Study:
Source Table – (ZLEGY_ADDRESS)
Target Tables – (BUT0ID, BUT021_FS, ADRC, ADRCITYPRT, ADR2)
(NOTE: Only the BUT0ID and BUT021_FS definitions are shown below.)
According to the business requirement, the key fields (CLIENTID, ADDRESSTYPE) of the legacy table ZLEGY_ADDRESS are mapped to the field IDNUMBER of CRM table BUT0ID and the field ADR_KIND of CRM table BUT021_FS respectively. A suitable CDC Data Model can therefore be constructed as below:
In addition, there are certain business rules, some of which lead to problems:
1) Target side (right) - Only a specific “TYPE” of BUT0ID and a specific “R3_USER” of ADR2 are relevant. This can be handled by an Object Filter, as shown in the diagram.
2) Target side (right) - Not every record in the source table has a PHONE value, so there is no corresponding row in target table ADR2. <Problem> Because the CDC data model links tables with an inner join on the keys by default, a missing row in ADR2 causes the whole linked row to be filtered out.
3) Source side (left) - CLIENTID of ZLEGY_ADDRESS is INT4 (integer), but the mapped field IDNUMBER of BUT0ID is CHAR(60). <Problem> A sort order problem may occur when the CDC comparison is executed, and the comparison is aborted midway, because CDC detects that the keys of the current row are lower than the keys of the previous row on either the source or the target side.
4) Source side (left) - The same CLIENTID may have multiple addresses, which are distinguished by the key ADDRESSTYPE: A, B and C. Under the same CLIENTID, these ADDRESSTYPE values are converted in the target table with the logic below:
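Under same CLIENTID | Source - ADDRESSTYPE | Target - ADR_KIND |
---|---|---|
Has ADDRESSTYPE = ‘C’ | ‘C’ | ‘DEFAULT’ |
Has ADDRESSTYPE = ‘C’ | ‘B’ | ‘SHIP_TO’ |
Has ADDRESSTYPE = ‘C’ | ‘A’ | ‘HOME’ |
Has ADDRESSTYPE = ‘B’ (but no ‘C’) | ‘B’ | ‘DEFAULT’ |
Has ADDRESSTYPE = ‘B’ (but no ‘C’) | ‘A’ | ‘HOME’ |
Has neither ADDRESSTYPE = ‘C’ nor ‘B’ | ANY | ‘DEFAULT’ |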
Clearly, this ADDRESSTYPE logic is somewhat complicated and cannot be handled by a CDC Conversion ID.
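To make the mapping rule above easier to follow, here is a minimal sketch of it in Python. This is only an illustration of the business rule, not the CDC or ABAP implementation. The function name `map_address_kinds` is hypothetical, and the sketch assumes that ‘A’, ‘B’ and ‘C’ are the only ADDRESSTYPE values that occur.

```python
def map_address_kinds(address_types):
    """Map the source ADDRESSTYPE values of ONE client to target ADR_KIND values.

    Assumption: only 'A', 'B' and 'C' occur as source address types.
    """
    types = set(address_types)
    if "C" in types:
        # The client has a 'C' address: 'C' becomes the default address.
        rule = {"C": "DEFAULT", "B": "SHIP_TO", "A": "HOME"}
    elif "B" in types:
        # No 'C' but a 'B' address: 'B' becomes the default address.
        rule = {"B": "DEFAULT", "A": "HOME"}
    else:
        # Neither 'C' nor 'B': whatever exists becomes the default address.
        rule = {t: "DEFAULT" for t in types}
    return {t: rule[t] for t in sorted(types)}


print(map_address_kinds(["C", "B", "A"]))
print(map_address_kinds(["B", "A"]))
print(map_address_kinds(["A"]))
```

The point of the sketch is that the target value of one row depends on which other rows exist for the same CLIENTID, which is why a per-field conversion is not enough.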
Problem 2) Solution:
This problem can be resolved by changing the default inner join into an outer join. To do so, Source Type 2 of the Data Model (right side) has to be changed to “RFC to Generated Extractor”, which allows an ABAP function module to be generated. After the function module has been generated successfully, use the ABAP editor in SE80 to apply the change. There is NO NEED to change anything except the SQL SELECT statements (three similar SELECT statements in total). The modified SQL SELECT is shown below for reference:
Modified SQL with Outer Join:

    SELECT BUT021_FS~ADR_KIND,
           ADRCITYPRT~CITY_CODE,
           ADRC~STREET,
           ADRC~POST_CODE1,
           ADR2~TEL_NUMBER,
           ADRCITYPRT~POST_CODE,
           BUT0id~IDNUMBER
      INTO TABLE @lt_source_data
      FROM BUT0id AS BUT0id
      JOIN BUT021_FS AS BUT021_FS
        ON BUT021_FS~PARTNER = BUT0id~PARTNER
      JOIN ADRC AS ADRC
        ON ADRC~ADDRNUMBER = BUT021_FS~ADDRNUMBER
      JOIN ADRCITYPRT AS ADRCITYPRT
        ON ADRCITYPRT~CITY_CODE = ADRC~CITY_CODE
       AND ADRCITYPRT~CITY_PART = ADRC~CITY2
      LEFT JOIN ADR2 AS ADR2
        ON ADR2~ADDRNUMBER = ADRC~ADDRNUMBER
       AND ( ADR2~R3_USER <> '3' )
      FOR ALL ENTRIES IN @lt_key
      WHERE BUT0id~IDNUMBER = @lt_key-CLIENTID_C
        AND BUT021_FS~ADR_KIND = @lt_key-ADDRESSTYPE
        AND ( BUT0id~TYPE = 'CRM001' )
        AND ( BUT021_FS~ADR_KIND = 'DEFAULT'
           or BUT021_FS~ADR_KIND = 'SHIP_TO'
           or BUT021_FS~ADR_KIND = 'HOME' ).
The ONLY change is from “JOIN ADR2” to “LEFT JOIN ADR2” (equivalent to LEFT OUTER JOIN). The same LEFT JOIN has to be applied to all SQL SELECT statements in the function module.
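The effect of this change can be reproduced outside SAP with a small Python sketch that uses the standard `sqlite3` module. The tables `adrc` and `adr2` here are heavily reduced stand-ins with made-up sample rows, so this only illustrates the join behaviour and is not the generated extractor.

```python
import sqlite3

# In-memory database with two reduced stand-in tables (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adrc (addrnumber TEXT PRIMARY KEY, street TEXT);
    CREATE TABLE adr2 (addrnumber TEXT, tel_number TEXT);
    INSERT INTO adrc VALUES ('0001', 'Main Street'), ('0002', 'Side Street');
    INSERT INTO adr2 VALUES ('0001', '555-0100');  -- address 0002 has no phone row
""")

# Inner join: the address without a phone row disappears from the result.
inner_rows = conn.execute("""
    SELECT adrc.addrnumber, adr2.tel_number
      FROM adrc
      JOIN adr2 ON adr2.addrnumber = adrc.addrnumber
     ORDER BY adrc.addrnumber
""").fetchall()

# Left outer join: the address is kept and the phone column is NULL (None).
left_rows = conn.execute("""
    SELECT adrc.addrnumber, adr2.tel_number
      FROM adrc
      LEFT JOIN adr2 ON adr2.addrnumber = adrc.addrnumber
     ORDER BY adrc.addrnumber
""").fetchall()

print(inner_rows)
print(left_rows)
```

With the inner join only address 0001 is returned, while the left join returns both addresses, which is the behaviour needed so that records without a PHONE value still take part in the comparison.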
Problem 3) Solution:
Whenever the sort order issue is caused by a data type mismatch, it is almost impossible to resolve with a CDC Conversion ID. A reliable resolution is to add another CLIENTID field of type CHAR (e.g. CLIENTID_C) to the table and to make sure that leading and trailing spaces are trimmed. In that way, the source and target tables are guaranteed to have the same sort order.
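The following Python sketch illustrates why an integer key and a character key can disagree on sort order. The sample IDs and the helper `first_order_violation` are hypothetical; the helper only mimics the kind of check described above (a key that is lower than its predecessor) and is not CDC's actual code.

```python
def first_order_violation(keys):
    """Return the index of the first key that is lower than its predecessor,
    or None if the keys are in ascending order."""
    for i in range(1, len(keys)):
        if keys[i] < keys[i - 1]:
            return i
    return None


client_ids = [9, 10, 100, 2]           # made-up sample IDs

# Source sorted as INT4, then compared as characters like the target key.
int_sorted_as_text = [str(c) for c in sorted(client_ids)]
print(int_sorted_as_text, first_order_violation(int_sorted_as_text))

# Source sorted on a CHAR copy of the key (the CLIENTID_C idea).
char_sorted = sorted(str(c).strip() for c in client_ids)
print(char_sorted, first_order_violation(char_sorted))
```

In the first case the character comparison sees '10' after '9' and reports a violation. In the second case both sides are ordered by the same character rules, so no violation is found.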
See the enhanced Table definition:
Problem 4) Solution:
The ADDRESSTYPE logic described above requires a proper algorithm. Source Type 1 of the Data Model (left side) has to be changed to “RFC to Generated Extractor” in order to generate the ABAP function module. With ABAP code alone, however, the conversion logic would take many lines of code. To keep the ABAP logic as simple as possible, an enhanced Data Model can help.
Create two table views on the source table ZLEGY_ADDRESS. Each view only needs three fields from ZLEGY_ADDRESS: CLIENTID, ADDRESSTYPE and CLIENTID_C. One view (e.g. ZLEGY_ADDR_C) has the “Selection Condition” ADDRESSTYPE = ‘C’, while the other (e.g. ZLEGY_ADDR_B) has the selection condition ADDRESSTYPE = ‘B’.
The purpose is to form an SQL dataset in which ZLEGY_ADDRESS, ZLEGY_ADDR_C and ZLEGY_ADDR_B are joined together, so that each row of ZLEGY_ADDRESS has the corresponding ADDRESSTYPE = ‘C’ and ADDRESSTYPE = ‘B’ entries listed on its right side. If such an ADDRESSTYPE does not exist for the client, the columns on the right side are simply left empty. In other words, the table and the views are linked by SQL OUTER JOIN.
Imaginary joined tables with data (only key fields shown).
CLIENTID_C | ADDRESSTYPE | ADDRESSTYPE in ZLEGY_ADDR_C | ADDRESSTYPE in ZLEGY_ADDR_B |
---|---|---|---|
1111111111 | C | C | B |
1111111111 | B | C | B |
1111111112 | B | | B |
1111111112 | A | | B |
1111111113 | A | | |
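The joined dataset above can be reproduced with a short Python sketch based on the standard `sqlite3` module. It assumes a reduced `zlegy_address` table that holds only the two key fields, and it models the two table views as SQLite views. This is an illustration of the join, not the ABAP Dictionary definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced stand-in for ZLEGY_ADDRESS (key fields only, sample data).
    CREATE TABLE zlegy_address (clientid_c TEXT, addresstype TEXT);
    INSERT INTO zlegy_address VALUES
        ('1111111111', 'C'), ('1111111111', 'B'),
        ('1111111112', 'B'), ('1111111112', 'A'),
        ('1111111113', 'A');

    -- The two views with their selection conditions.
    CREATE VIEW zlegy_addr_c AS
        SELECT clientid_c, addresstype FROM zlegy_address WHERE addresstype = 'C';
    CREATE VIEW zlegy_addr_b AS
        SELECT clientid_c, addresstype FROM zlegy_address WHERE addresstype = 'B';
""")

# Every source row gets the client's 'C' and 'B' entries (or NULL) on its right.
rows = conn.execute("""
    SELECT a.clientid_c, a.addresstype, c.addresstype, b.addresstype
      FROM zlegy_address AS a
      LEFT OUTER JOIN zlegy_addr_c AS c ON c.clientid_c = a.clientid_c
      LEFT OUTER JOIN zlegy_addr_b AS b ON b.clientid_c = a.clientid_c
     ORDER BY a.clientid_c, a.addresstype DESC
""").fetchall()

for row in rows:
    print(row)
```

Each result row now carries enough information to decide the target ADR_KIND on its own, which keeps the later conversion loop short.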
With the solutions to Problems 3) and 4), the enhanced Data Model should look similar to this:
Part of Function Module code:
Portion of Enhanced Function Module - ZC_ADDR_SRC:

    FUNCTION ZC_ADDR_SRC.
    *"----------------------------------------------------------------------
      TYPES: BEGIN OF ts_source_data,
               ADDRESSTYPE(10) TYPE C,
               CITY            TYPE ZLEGY_ADDRESS-CITY,
               ADDRESS         TYPE ZLEGY_ADDRESS-ADDRESS,
               ZIP             TYPE ZLEGY_ADDRESS-ZIP,
               PHONE           TYPE ZLEGY_ADDRESS-PHONE,
               TOWN            TYPE ZLEGY_ADDRESS-TOWN,
               CLIENTID_C      TYPE ZLEGY_ADDRESS-CLIENTID_C,
             END OF ts_source_data,

             BEGIN OF ts_source_data1,
               ADDRESSTYPE     TYPE ZLEGY_ADDRESS-ADDRESSTYPE,
               CITY            TYPE ZLEGY_ADDRESS-CITY,
               ADDRESS         TYPE ZLEGY_ADDRESS-ADDRESS,
               ZIP             TYPE ZLEGY_ADDRESS-ZIP,
               PHONE           TYPE ZLEGY_ADDRESS-PHONE,
               TOWN            TYPE ZLEGY_ADDRESS-TOWN,
               CLIENTID_C      TYPE ZLEGY_ADDRESS-CLIENTID_C,
               ADDRESSTYPE_C   TYPE ZLEGY_ADDRESS-ADDRESSTYPE,
               ADDRESSTYPE_B   TYPE ZLEGY_ADDRESS-ADDRESSTYPE,
             END OF ts_source_data1,
      :
      DATA: lt_source_data          TYPE STANDARD TABLE OF ts_source_data,
            lt_source_data1         TYPE STANDARD TABLE OF ts_source_data1,
            lt_temp_source_data     TYPE STANDARD TABLE OF ts_source_data, "#EC NEEDED
            lv_remaining_block_size TYPE i, "#EC NEEDED
            lt_temp_source_data1    TYPE STANDARD TABLE OF ts_source_data1,
            ls_source_data          TYPE ts_source_data,
            ls_source_data1         TYPE ts_source_data1,
      :
      :
    *** Part 3: Source Data Extraction ***
      IF iv_block_size < 0.
    * Count expected number of rows only
        SELECT COUNT(*)
          INTO @ev_total
          FROM ZLEGY_ADDRESS AS ZLEGY_ADDRESS
          LEFT OUTER JOIN ZLEGY_ADDR_C AS ZLEGY_ADDR_C
            ON ZLEGY_ADDR_C~ADDRESSTYPE = 'C'
           AND ZLEGY_ADDR_C~CLIENTID_C = ZLEGY_ADDRESS~CLIENTID_C
          LEFT OUTER JOIN ZLEGY_ADDR_B AS ZLEGY_ADDR_B
            ON ZLEGY_ADDR_B~ADDRESSTYPE = 'B'
           AND ZLEGY_ADDR_B~CLIENTID_C = ZLEGY_ADDRESS~CLIENTID_C .
        RETURN.
      ENDIF.
      :
      :
      :
      SELECT ZLEGY_ADDRESS~ADDRESSTYPE,
             ZLEGY_ADDRESS~CITY,
             ZLEGY_ADDRESS~ADDRESS,
             ZLEGY_ADDRESS~ZIP,
             ZLEGY_ADDRESS~PHONE,
             ZLEGY_ADDRESS~TOWN,
             ZLEGY_ADDRESS~CLIENTID_C,
             ZLEGY_ADDR_C~ADDRESSTYPE,
             ZLEGY_ADDR_B~ADDRESSTYPE
        INTO TABLE @lt_source_data1
        FROM ZLEGY_ADDRESS AS ZLEGY_ADDRESS
        LEFT OUTER JOIN ZLEGY_ADDR_C AS ZLEGY_ADDR_C
          ON ZLEGY_ADDR_C~ADDRESSTYPE = 'C'
         AND ZLEGY_ADDR_C~CLIENTID_C = ZLEGY_ADDRESS~CLIENTID_C
        LEFT OUTER JOIN ZLEGY_ADDR_B AS ZLEGY_ADDR_B
          ON ZLEGY_ADDR_B~ADDRESSTYPE = 'B'
         AND ZLEGY_ADDR_B~CLIENTID_C = ZLEGY_ADDRESS~CLIENTID_C
        FOR ALL ENTRIES IN @lt_key
        WHERE ZLEGY_ADDRESS~CLIENTID_C = @lt_key-CLIENTID_C
          AND ZLEGY_ADDRESS~ADDRESSTYPE = @lt_key-ADDRESSTYPE .

      SORT lt_source_data1 BY CLIENTID_C ADDRESSTYPE.

      CLEAR: ev_act_key, ev_block.
      ev_lines = sy-dbcnt.

      LOOP AT lt_source_data1 INTO ls_source_data1.
        MOVE-CORRESPONDING ls_source_data1 to ls_source_data.
        IF ls_source_data1-ADDRESSTYPE_C = 'C'.
          CASE ls_source_data1-ADDRESSTYPE.
            WHEN 'C'.
              ls_source_data-ADDRESSTYPE = 'DEFAULT'.
            WHEN 'B'.
              ls_source_data-ADDRESSTYPE = 'SHIP_TO'.
            WHEN 'A'.
              ls_source_data-ADDRESSTYPE = 'HOME'.
          ENDCASE.
        ELSEIF ls_source_data1-ADDRESSTYPE_C = '' AND ls_source_data1-ADDRESSTYPE_B = 'B'.
          CASE ls_source_data1-ADDRESSTYPE.
            WHEN 'B'.
              ls_source_data-ADDRESSTYPE = 'DEFAULT'.
            WHEN 'A'.
              ls_source_data-ADDRESSTYPE = 'HOME'.
          ENDCASE.
        ELSEIF ls_source_data1-ADDRESSTYPE_C = '' AND ls_source_data1-ADDRESSTYPE_B = ''.
          ls_source_data-ADDRESSTYPE = 'DEFAULT'.
        ENDIF.
        APPEND ls_source_data to lt_source_data.
      ENDLOOP.

      SORT lt_source_data BY CLIENTID_C ADDRESSTYPE.
      :
      :
Key changes in the function module:
- Change ADDRESSTYPE in ts_source_data to CHAR(10), so that it can hold the converted values such as 'DEFAULT' and 'SHIP_TO'.
- Create a separate type ts_source_data1 that includes ADDRESSTYPE_C and ADDRESSTYPE_B from the two views.
- Use LEFT OUTER JOIN to join the table and the views.
- Collect the whole result set of the SELECT into table "lt_source_data1" instead of the CDC default table "lt_source_data".
- Perform the conversion on "lt_source_data1" and append the converted rows to the CDC default table "lt_source_data".
For details about the function module, please refer to the attachment "ZC_ADDR_SRC.txt".