When it comes to comparing SDI with other solutions, one major difference is the support of transactional consistency. Some examples where this might come in handy:
1. The source has an order table and a reserved goods table. Whenever something is ordered, the goods are reserved and taken out of the stock and then both changes are committed together. So a user in the source system will always see the orders and reserved goods being in sync.
If both tables are copied independently, a user querying the sum of orders and the sum of reserved goods will constantly see those two numbers to be different as there are new orders all the time.
Transactional consistency across tables.
2. In the financial booking table every debit booking has a matching credit booking and both are committed together. Hence the sum of bookings is zero always, money is moved from one account to another. It would be outright wrong if the data is replicate and committed in the target differently.
Transactional consistency within tables.
3. A new customer called and ordered something. So first the customer master data was entered and a minute later a sales order placed. If both tables are replicated independently, then it might happen that in the target table there is a sales order for a customer did does not exist yet.
Transactional order.
4. In the source system a new material was entered, hence inserted into the database but not saved/committed. The user went to lunch and saved it 30 minutes later. The record in the database has a create data, which is 30 minutes before but whoever queries the table does not find the record until it is committed. This is a major problem if for example a delta process relies on this timestamp as it might read all changes with a recent create date but does not see this record for 30 minutes and then it is not recent anymore.
Long running transactions.
All of the above examples are huge problems for any ETL tool in the market. But because they ignore above problems they can be very fast during the initial load and partly for the delta loads.
With SDI the initial load does happen by reading in parallel for maximum speed and depending on the settings, it works just like all other ETL tools. But during the initial load all changes in the source tables are recorded already and later merged into the target to get a consistent state. The subsequent realtime push of the changes does happen in a transactional fashion honoring the above cases 1-3.
This is possible because the adapter does send all change rows of all tables in the correct order and a commit is a change row of its own, hence the Hana Data Provisioning server can replay the changes with the same transactional bracket and in the same order.
This can be seen in the following example, where an insert loader semantic is used:
The execution order in the source had been
(change 1 via initial load)
insert into address_input
values ('1', 'Innsbrucker Straße 34', 'Salzburg', '5070', 'AT');
commit;
(change 1)
insert into address_input
values ('2', 'North 1st Street', 'San Jose', '????', 'US');
commit;
(change 2)
insert into address_input
values ('3', 'Hill Rd Drive 532', 'LA', '90003', 'US');
commit;
(change 3 modifying the old record to change 4, the new values)
update address_input
set "postalcode" = '95121' where "customer_id" = '2';
commit;
(change 5)
delete from address_input
where "customer_id" = '1';
commit;
After all these changes the various records look like in the target
customer_id = 1: Had been inserted into Hana at 03:26:55 and deleted at 03:28:56
customer_id = 2: Inserted at 03:28:02 and updated 03:28:44
customer_id = 3: Inserted at 03:28:22
Preserving the order is key here, e.g. customer_id=2 has to be inserted before being updated, else it does not make sense.
Looking at the CHANGE_TIME more closely reveals more details.
First, all rows within the same source transaction all have the same value. In above example all changes have a truly ascending timestamp except for the update, where a before and after image row was generated. This does not have the same value because these rows followed so quickly after each other, they have the exact same value, even at the nano second.
This is more obvious when executing a few more changes in the source database.
(change 6)
insert into address_input
values ('4', 'Shoreline Bld 1', 'Los Angeles', '90002', 'US');
(change 7)
insert into address_input
values ('5', 'Page Mill Dr', 'Mountain View', '95111', 'US');
(change 8 & 9)
update address_input
set "street" = 'Page Mill Dr 435' where "customer_id" = '5';
(change 10, as this is executed on the Hana system against the virtual table, the insert statement is using the Hana(!!) current timestamp)
insert into address_input
values ('control', now(), now(), null, null);
commit;
All these changes have the same CHANGE_TIME although they have been entered over a period of 5 minutes.
And the last row proves the values does not reflect the change in the source as it was inserted at 03:36:49 in the source but in the target at 03:37:42.
Hence the CHANGE_TIME is not only strictly ascending, it also has no issues with system times being slightly off between different databases, different timezones or whatever.
Simply because this value does neither come from the source nor from the adapter but from the Data Provisioning server process of Hana. The DP Server does receive all change rows from the adapter and queues them up until a commit row is received. This is the point in time the CHANGE_TIME variable gets a value, the current timestamp (down to the nanosecond) of the Hana(!) instance, and the Applier process starts merging the data into the target tables.
Hence it is the perfect column for a timestamp based delta of subsequent processes.
Note: Above works identical if multiple tables are used. The examples show the behavior with a single table for better readability only.