In this example, we will look at the peculiarities of intermediate representation, handling complex cases, and solving problems with incorrect nesting of data in XML.

XML:

      
      <books>
            <publisher>Elsevier</publisher>
            <book id="1">
               <title>The Origins of Totalitarianism</title>
               <author>Hannah Arendt</author>
            </book>
            <book id="2">
               <title>Alone in the Ocean</title>
               <author>Stanislav Kurilov</author>
            </book>
            <customer>
               <name>City Library</name>
            </customer>
      </books>
      
      

Make sure that in config.txt the option: ignore-tag-attributes: false is set to false. This is necessary to handle tag attributes (book id="1").

Note that the very nodes with attributes from which values will be extracted should be added to ignores/sample.txt (["books" "book"]). The attribute values themselves will be extracted if ignore-tag-attributes: false.

It is often impossible to tell from XML which node is a single node and which is an array. However, we often need to treat some nodes strictly as arrays (or as tables).

In the example books has two descendants and it is obvious that it is an array.

However, it is not clear from the example whether we can have more than one customer. Let's assume that it can.

Also, in the example customer has an incorrect nesting - it is next to the multiple node book. Then we must create some container that will play the role of a table for customer and specify that the first time a tag named customer is encountered, a parent container must be created that will play the role of a table (or array name). On each new encounter of the tag customer, the parent container will be refilled.

Note. Not every valid intermediate representation can be translated into valid SQL and JSON. To generate correct SQL and JSON **you must take into account the peculiarities of these two formats, as well as the rules for describing and transforming an intermediate representation!**.

Therefore, an attempt to describe the intermediate representation by duplicating the XML structure as much as possible will lead to incorrect generation of the result.

Example of incorrect description:

   
      #[
         sample: #[
               sample_subcategory: #[   
                  root_table: [        
                     books_table: [     
                           my_guid: #guid  ; #guid - magic constant just for illustrative purposes
                           publisher: none  

                           book_node_name: [ 
                              book_id: none
                              title: none
                              author: none
                           ]
                     ]
                     customer_table: [  
                           customer_node_name: [ 
                              customer_name: none
                           ]
                     ]
                  ]
               ]
         ]
      ]
   
   

Such a structure is equivalent to trying to generate invalid JSON with the following structure:

      
         {
            "root_table": [
               "my_guid": "F182697C-U28E-CA6D-K301-P075U79T7I8N",
               "publisher": "Elsevier",
               [
                  ...
               ]
            ]
         }
      
   

However, in doing so, an intermediate view would allow valid SQL to be created:

      
INSERT INTO sample_subcategory ("my_guid", "publisher")
  VALUES ('F182697C-U28E-CA6D-K301-P075U79T7I8N', 'Elsevier');

INSERT INTO books_table ("my_guid", "book_id", "title", "author")
  VALUES ('F182697C-U28E-CA6D-K301-P075U79T7I8N', '1', 'The Origins of Totalitarianism', 'Hannah Arendt'),
         ('F182697C-U28E-CA6D-K301-P075U79T7I8N', '2', 'Alone in the Ocean', 'Stanislav Kurilov');

INSERT INTO customer_table ("my_guid", "customer_name")
  VALUES ('F182697C-U28E-CA6D-K301-P075U79T7I8N', 'City Library');
      
   

Example of a valid intermediate intermediate intermediate representation generating correct but poorly usable JSON:


   #[
      sample: #[
         sample_subcategory: #[   
               my_guid: #guid   
               publisher: none          
               root_table: [        
                  books_table: [        
                     book_node_name: [ 
                           book_id: none
                           title: none
                           author: none
                     ]
                  ]

                  customer_table: [         
                     customer_node_name: [ 
                           customer_name: none
                     ]
                  ]
               ]
         ]
      ]
   ]

JSON:


   {
      "my_guid": "F182697C-U28E-CA6D-K301-P075U79T7I8N",
      "publisher": "Elsevier",
      "root_table": [
            [
               {
                  "my_guid": "F182697C-U28E-CA6D-K301-P075U79T7I8N",
                  "book_id": "1",
                  "title": "The Origins of Totalitarianism",
                  "author": "Hannah Arendt"
               },
               {
                  "my_guid": "F182697C-U28E-CA6D-K301-P075U79T7I8N",
                  "book_id": "2",
                  "title": "Alone in the Ocean",
                  "author": "Stanislav Kurilov"
               }
            ],
            [
               {
                  "my_guid": "F182697C-U28E-CA6D-K301-P075U79T7I8N",
                  "customer_name": "City Library"
               }
            ]
      ]
   }

SQL:

   
INSERT INTO sample_subcategory ("my_guid", "publisher")
  VALUES ('F182697C-U28E-CA6D-K301-P075U79T7I8N', 'Elsevier');

INSERT INTO books_table ("my_guid", "book_id", "title", "author")
  VALUES ('F182697C-U28E-CA6D-K301-P075U79T7I8N', '1', 'The Origins of Totalitarianism', 'Hannah Arendt'),
         ('F182697C-U28E-CA6D-K301-P075U79T7I8N', '2', 'Alone in the Ocean', 'Stanislav Kurilov');

INSERT INTO customer_table ("my_guid", "customer_name")
  VALUES ('F182697C-U28E-CA6D-K301-P075U79T7I8N', 'City Library');
   

However, despite the fact that the JSON above is valid, it is clearly not good to work with since we got two arrays in root_table: [[[] []].

However, the intermediate representation ignores the original XML nesting level. Let's describe the data in a similar way:

JSON:

   
#[
    sample: #[
        sample_subcategory: #[   
            my_guid: #guid   
            publisher: none          
            root_table: [ ; container for books         
                book_node_name: [ 
                    book_id: none
                    title: none
                    author: none
                ]
            ]
            customer_table: [ ; container for customers        
                customer_node_name: [ 
                    customer_name: none
                ]
            ]
        ]
    ]
]

JSON:


{
    "my_guid": "F182697C-U28E-CA6D-K301-P075U79T7I8N",
    "publisher": "Elsevier",
    "root_table": [
        {
            "my_guid": "F182697C-U28E-CA6D-K301-P075U79T7I8N",
            "book_id": "1",
            "title": "The Origins of Totalitarianism",
            "author": "Hannah Arendt"
        },
        {
            "my_guid": "F182697C-U28E-CA6D-K301-P075U79T7I8N",
            "book_id": "2",
            "title": "Alone in the Ocean",
            "author": "Stanislav Kurilov"
        }
    ],
    "customer_table": [
        {
            "my_guid": "F182697C-U28E-CA6D-K301-P075U79T7I8N",
            "customer_name": "City Library"
        }
    ]
}

The books array is now built separately from the customers array.

SQL:


INSERT INTO sample_subcategory ("my_guid", "publisher")
 VALUES ('F182697C-U28E-CA6D-K301-P075U79T7I8N', 'Elsevier');

INSERT INTO root_table ("my_guid", "book_id", "title", "author")
  VALUES ('F182697C-U28E-CA6D-K301-P075U79T7I8N', '1', 'The Origins of Totalitarianism', 'Hannah Arendt'),
         ('F182697C-U28E-CA6D-K301-P075U79T7I8N', '2', 'Alone in the Ocean', 'Stanislav Kurilov');

INSERT INTO customer_table ("my_guid", "customer_name")
  VALUES ('F182697C-U28E-CA6D-K301-P075U79T7I8N', 'City Library');

grow-rules.red:


sample: [
    root_table: ["books"] ; We'll create the table when we encounter the tag `books` (it's root tag)
    book_node_name: ["book"] ; we need to indicate which names of multiple nodes
    customer_node_name: ["customer"] ; When we meet `customer` then `customer_node_name` will start filling the parent of `customer_table`
]

tags-matching-rules.red:


sample: #[
    publisher: [
        "books publisher"
    ]
    book_id: [
        "books book book_id"
    ]
    title: [
        "books book title"
    ]
    author: [
        "books book author"
    ]
    customer_name: [
        "books customer name"
    ]
]

injection-rules.red:


sample: [
    inject-tag-to-every-children: [my_guid] ; pass to all child nodes
    enumerate-nodes: [] 
    injection-tag-and-recipients: []
]