Introduction
HCI provides functionality to convert between XML to CSV and vice versa. Compared to PI, its functionality is relatively rudimentary and can only cater for very simple structures.
The online documentation (Defining Converter) only covers the functionality briefly, and there is no other article on SCN covering it.
Therefore I tried experimenting with the functionality and this blog covers my experience doing so.
Component Details
As HCI is a cloud solution with automatic rolling updates, my testing is based on the following component versions of the tenant and Eclipse plugins.
HCI Tenant Version: 2.8.5
Eclipse Plugin Versions: Adapter 2.11.1, Designer 2.11.1, Operations 2.10.0
Example Scenarios
The following is mentioned in the online documentation, therefore I could only test out the following two scenarios.
You cannot use XML to CSV converter to convert complex XML files to CSV format.
For simplicity sake, the iFlows are designed with a timer to trigger the iFlow upon deployment and a Content Modifier to provide static input data to the converter. The output is then sent to a HTTP logging server.
Scenario 1 - Structure with single record type
In this scenario, the input payload is defined with a Records root node and an unbounded Line nodes.
Input Payload |
---|
<?xml version='1.0' encoding='UTF-8'?> <Records> <Line> <Field1>ABC</Field1> <Field2>123</Field2> <Field3>XXX</Field3> <Field4>567890</Field4> </Line> <Line> <Field1>XYZ</Field1> <Field2>456</Field2> <Field3>YYYY</Field3> <Field4>98765</Field4> </Line> </Records> |
The data is contained in the repeating Line nodes. So, configuration of the converter is as simple as entering the XPath to the Line node, i.e. /Records/Line. The other options are specifying the field separator as well as the column names as header.
With this configuration, the conversion's output payload is as follows.
Output Payload |
---|
Field1,Field2,Field3,Field4 ABC,123,XXX,567890 XYZ,456,YYYY,98765 |
Scenario 2 - Structure with header record type and repeating details record type
In this scenario, we have additionally a Header node.
Input Payload |
---|
<?xml version='1.0' encoding='UTF-8'?> <Records> <Header> <FieldA>H_ABC</FieldA> <FieldB>H_123</FieldB> <FieldC>H_XXX</FieldC> <FieldD>H_567890</FieldD> </Header> <Line> <Field1>ABC</Field1> <Field2>123</Field2> <Field3>XXX</Field3> <Field4>567890</Field4> </Line> <Line> <Field1>XYZ</Field1> <Field2>456</Field2> <Field3>YYYY</Field3> <Field4>98765</Field4> </Line> </Records> |
In addition to the configuration above, we can configure the conversion of the "parent" element in the Advanced tab. The configuration is as simple as selecting Include Parent Element and specifying the XPath to the Header node.
With this additional configuration, the conversion's output payload is as follows.
Output Payload |
---|
FieldA,FieldB,FieldC,FieldD H_ABC,H_123,H_XXX,H_567890
Field1,Field2,Field3,Field4 ABC,123,XXX,567890 XYZ,456,YYYY,98765 |
A particular point of interest is that the converter automatically includes an additional blank line in between the header line and the detail lines.
Additional Findings/Issues
Besides this simple conversions, during my testing of the function, I've come across the following issues.
i) Missing enclosure of fields that contain separator
According to RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files:-
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
However, the converter does not handle this properly. As shown below, Field1 contains a comma. However in the output, this field is no enclosed with double quotes. As such, it will potentially cause issue for applications that try to process the CSV content.
Input | Output |
---|---|
<?xml version='1.0' encoding='UTF-8'?> <Records> <Line> <Field1>AB,C</Field1> <Field2>123</Field2> <Field3>XXX</Field3> <Field4>567890</Field4> </Line> <Line> <Field1>XYZ</Field1> <Field2>456</Field2> <Field3>YYYY</Field3> <Field4>98765</Field4> </Line> </Records> | Field1,Field2,Field3,Field4 AB,C,123,XXX,567890 XYZ,456,YYYY,98765 |
ii) Include Parent Element setting still valid even after being unchecked
If Include Parent Element is checked and Path to Parent Element is populated (as shown in Scenario 2's screenshot), even if the setting was unchecked later, the converter still performs conversion for the parent element. The workaround for this is to ensure that the Path to Parent Element is cleared off prior to unchecking Include Parent Element.
Further Points
Although the scope of this blog is on the XML to CSV converter, I also tried out the CSV to XML converter functionality. However, I was unable to get it to work successfully. Again, the example on the online documentation is quite vague and there were no other materials on SCN to assist.
Following is a configuration of the CSV to XML converter that was being tested.
However, during runtime, the following error is triggered. I've tried various values for Path to Target Element in XSD but none was successful.
java.lang.IllegalStateException: Element name [DT_HCI_Conversion\Line] not found in provided XML schema file
This is similar to the error in the following thread which also remains unanswered.
iflow in HCI having error at csv to xml converter step
Conclusion
Although HCI comes with built-in functionality for XML to CSV conversion (and vice versa), the functionality is still very limited and buggy. The use case for the converter is restricted to just simple scenarios. As such, until this functionality is enhanced in future updates, more complex conversions would most likely require custom development in the form of custom Groovy scripts.