Creating database table patches based on XML
Example is based on the project: credit-forms-unification-sample. The example demonstrates the ability to update fields of a table in an existing database with data extracted from XML documents.
Open the project credit-forms-patching-sample
The table xml_files, which stores references to the original XML files, has the following schema (you can read about the minimal necessary database schema for operation here):
CREATE TABLE xml_files (
"id" Integer PRIMARY KEY AUTOINCREMENT,
"xml_path" Text,
"parsing_status" Text,
"processed_date" Numeric,
"is_patch" Text,
CONSTRAINT "unique_tax_number" UNIQUE (tax_number)
);
Attention:
- In this example, the filling_percent_stat field is missing. Make sure that the corresponding setting Write to DB columns filling stat is turned off.
-
When generating the
UPDATE
expression, theWHERE
clause uses the column names on which uniqueness constraints are set. For example:UPDATE credits_forms SET "phone_number" = '+31 20 83115136', "email" = 'lisa.newmail@example.com', "is_married" = 'Married' WHERE "tax_number" ='456-78-9012';
UPDATE
statements is not possible. Uniqueness constraints are required both at the database level and in the application settings, as will be shown below.
Overall view of the filled table:
id | xml_path | parsing_status | processed_date | is_patch |
---|---|---|---|---|
1 | data-samples/credit-forms-patching-sample/barclays.xml | NULL | NULL | NULL |
2 | data-samples/credit-forms-patching-sample/barclays_patch.xml | NULL | NULL | true |
3 | data-samples/credit-forms-patching-sample/deutschebank.xml | NULL | NULL | NULL |
4 | data-samples/credit-forms-patching-sample/deutschebank_patch.xml | NULL | NULL | true |
5 | data-samples/credit-forms-patching-sample/rabobank.xml | NULL | NULL | NULL |
6 | data-samples/credit-forms-patching-sample/rabobank_patch.xml | NULL | NULL | true |
The table includes the field is_patch
. Since it is assumed that patches should be applied
to data already loaded into the database, we need to somehow separate the documents into groups. At first, the parsing and loading of the
original XML should be performed, and only then apply the patches to them. This will require the creation of two
Job
. The first Job
will parse and load the main documents into the database. The second will apply patches to them. The field name can be anything.
More information on how to create a Job
can be found here.
Examples of documents that will be used as patches for existing tables:
DeutscheBank:
<?xml version="1.0" encoding="UTF-8"?>
<UpdateInfo>
<ApplicantInfo>
<FullName>
<FirstName>Sofia</FirstName>
<LastName>Lorenzo</LastName>
</FullName>
<DateOfBirth>1992-03-24</DateOfBirth>
<MaritalStatus>Single</MaritalStatus>
<TaxIdentificationNumber>545-56-7190</TaxIdentificationNumber>
<ContactInfo>
<EmailAddress>sofia.newmail@example.com</EmailAddress>
<PhoneNumber>+39 02 51821810</PhoneNumber>
</ContactInfo>
</ApplicantInfo>
</UpdateInfo>
Barclays:
<?xml version="1.0" encoding="UTF-8"?>
<ContactInfoUpdate>
<Applicant>
<Name>
<First>Lisa</First>
<Last>Martin</Last>
</Name>
<DateOfBirth>1987-12-12</DateOfBirth>
<MaritalStatus>Married</MaritalStatus>
<TaxID>456-78-9012</TaxID>
<ContactInfo>
<Email>lisa.newmail@example.com</Email>
<Phone>+31 20 83115136</Phone>
</ContactInfo>
</Applicant>
</ContactInfoUpdate>
Rabobank:
<?xml version="1.0" encoding="UTF-8"?>
<LoanApplicationUpdate>
<Applicant>
<FullName>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
</FullName>
<isMarried>TRUE</isMarried>
<DateOfBirth>1980-03-25</DateOfBirth>
<SocialSecurityNumber>987-65-4321</SocialSecurityNumber>
<ContactInformation>
<Email>john.smith.newmail@example.com</Email>
<Phone>+44 20 12345678</Phone>
</ContactInformation>
</Applicant>
</LoanApplicationUpdate>
As an example, we will be updating the fields:
phone_number
,
email
,
is_married
Overall Project View:
The structure describing the data for updating must have a field tag
with a value like:
"update_tableNameForUpdating"
. In our case: tag: "update_credits_forms"
.
Uniqueness settings:
Rules ⮞ Database Constraints need to specify that
in the tables credits_forms
and applicant_finances
, the column tax_number
is unique
(in case of uniqueness for a group of columns, you should specify their names separated by a space).
Also, it is necessary to specify the name of the node in the intermediate representation: applicant_finance
, as well as its variations in XML:
FinancialInfo EmploymentInformation
:
As seen from the intermediate representation, tax_number
is located at the root. Specify that it
needs to be passed to each child of the intermediate representation from which the table will be generated:
Table structures for extracting data from credit_forms and applicant_finances:
CREATE TABLE credits_forms (
"id" INTEGER PRIMARY KEY,
"first_name" Text,
"last_name" Text,
"tax_number" Text,
"phone_number" Text,
"email" Text,
"is_married" Text,
CONSTRAINT "unique_tax_number" UNIQUE (tax_number)
);
CREATE TABLE applicant_finances (
"id" INTEGER PRIMARY KEY,
"tax_number" Text,
"credit_rating" Text,
"loan_amount" Text,
"salary" Text,
CONSTRAINT "unique_tax_number" UNIQUE (tax_number)
);
Pay attention that in this example, all data is loaded into the database as text. Type conversion rules are covered in a separate article.
Configure the database connection on the Settings ⮞ Database tab.
- On the Settings ⮞ General tab, specify the missing part of the path to the directory where the test XML files are located in the XML Folder field.
- Check the Connect to DB at project opening option (Requires restart).
-
Make sure that Source Database and Target Database are set to
SQLite
. - Save the changes.
After restarting, on the Database tab, you can request the number of files awaiting processing by clicking Get XML Count.
As mentioned earlier, it is necessary to process files containing complete data first and only then apply patches to them. For this, we will create two Job
.
In the first Job
, specify that you need to select data from the xml_files table where the is_patch
column contains NULL
. In the second one, NOT NULL
Be sure to save the changes for both the Job
and the entire project.
Now, when you click on Get XML Count, it will show that the first Job
has only 3 files to process.
Clicking on Start Processing will initiate the processing.
Table structure after processing completion.
xml_files:
id | xml_path | parsing_status | processed_date | is_patch |
---|---|---|---|---|
1 | data-samples/credit-forms-patching-sample/barclays.xml | success | 2024-03-21 16:57 | NULL |
2 | data-samples/credit-forms-patching-sample/barclays_patch.xml | NULL | NULL | true |
3 | data-samples/credit-forms-patching-sample/deutschebank.xml | success | 2024-03-21 16:57 | NULL |
4 | data-samples/credit-forms-patching-sample/deutschebank_patch.xml | NULL | NULL | true |
5 | data-samples/credit-forms-patching-sample/rabobank.xml | success | 2024-03-21 16:57 | NULL |
6 | data-samples/credit-forms-patching-sample/rabobank_patch.xml | NULL | NULL | true |
credits_forms:
id | first_name | last_name | tax_number | phone_number | is_married | |
---|---|---|---|---|---|---|
1 | John | Smith | 987-65-4321 | +44 20 12345678 | john.smith@example.com | TRUE |
2 | Sofia | Lorenzo | 545-56-7190 | +39 02 98765432 | sofia.l@example.com | Married |
3 | Lisa | Martin | 456-78-9012 | +31 20 98765431 | lisa.m@example.com | Single |
applicant_finances:
id | tax_number | credit_rating | loan_amount | salary |
---|---|---|---|---|
1 | 987-65-4321 | 800 | 50000.00 | 75000.50 |
2 | 545-56-7190 | 710 | 42000.00 | 50000.75 |
3 | 456-78-9012 | 750 | 48000.00 | 71000 |
Switch to Job
2 and start processing XML files containing patches for the data:
The final view of the xml_files table. All files have been processed:
id | xml_path | parsing_status | processed_date | is_patch |
---|---|---|---|---|
1 | data-samples/credit-forms-patching-sample/barclays.xml | success | 2024-03-10 17:52 | NULL |
2 | data-samples/credit-forms-patching-sample/barclays_patch.xml | success | 2024-03-10 17:52 | true |
3 | data-samples/credit-forms-patching-sample/deutschebank.xml | success | 2024-03-10 17:52 | NULL |
4 | data-samples/credit-forms-patching-sample/deutschebank_patch.xml | success | 2024-03-10 17:52 | true |
5 | data-samples/credit-forms-patching-sample/rabobank.xml | success | 2024-03-10 17:52 | NULL |
6 | data-samples/credit-forms-patching-sample/rabobank_patch.xml | success | 2024-03-10 17:52 | true |
Updated data in the table:
id | first_name | last_name | tax_number | phone_number | is_married | |
---|---|---|---|---|---|---|
1 | John | Smith | 987-65-4321 | +44 20 12345678 | john.smith.newmail@example.com | TRUE |
2 | Sofia | Lorenzo | 545-56-7190 | +39 02 51821810 | sofia.newmail@example.com | Single |
3 | Lisa | Martin | 456-78-9012 | +31 20 83115136 | lisa.newmail@example.com | Married |
Old data for comparison:
id | first_name | last_name | tax_number | phone_number | is_married | |
---|---|---|---|---|---|---|
1 | John | Smith | 987-65-4321 | +44 20 12345678 | john.smith@example.com | TRUE |
2 | Sofia | Lorenzo | 545-56-7190 | +39 02 98765432 | sofia.l@example.com | Single |
3 | Lisa | Martin | 456-78-9012 | +31 20 98765431 | lisa.m@example.com | Single |
Queries generated to update data in the credits_forms table:
UPDATE credits_forms
SET "phone_number" = '+39 02 51821810', "email" = 'sofia.newmail@example.com', "is_married" = 'Single'
WHERE "tax_number" ='545-56-7190';
UPDATE credits_forms
SET "phone_number" = '+31 20 83115136', "email" = 'lisa.newmail@example.com', "is_married" = 'Married'
WHERE "tax_number" = '456-78-9012';
UPDATE credits_forms
SET "phone_number" = '+44 20 12345678', "email" = 'john.smith.newmail@example.com'
WHERE "tax_number" = '987-65-4321';