---json
{
"description": "This database howto will show you how to load Xml file and document into a database.",
"low_quality_page": "false",
"page_id": "k9z63dibggq5kin4ubajf"
}
---
====== HowTo - How to load XML files into a database with Tabulify ======
===== Howto =====
This [[database|database howto]] will show you how to load [[docs:resource:xml|Xml file and document]] into a [[docs:system:database|database]].
===== Steps =====
tabli data drop --not-strict wikipedia@postgres
==== Checking that Postgres is up ====
Normally our [[database|standard database howto]] are created with [[docs:system:sqlite:sqlite|Sqlite]] but because ''Sqlite'' does not support any SQL XML function, nor native XML [[docs:common:data_type|data type]], we have created this ''howto'' with the [[docs:system:postgres:postgres|Postgres database]].
The next steps will be using the [[docs:system:postgres:howto|postgres howto connection]].
* You need to [[docs:system:postgres:howto#docker|start the postgres docker image]].
* [[docs:tabli:connection:ping|Ping]] your connection to check that the database is up.
tabli connection ping postgres
The connection (postgres) has been pinged successfully
==== The XML file to load ====
We will load the [[docs:resource:xml|XML file]] ''relation.xml''
* located in the ''XML'' subdirectory of the [[docs:connection:howto|howto directory connection]]
* that contains information over the [[wp>Relation_(database)|relation wikipedia page]]
* that have been created by the [[https://www.mediawiki.org/w/api.php|mediawiki API]]
==== The structure of an XML file ====
=== Default ===
An [[docs:resource:xml|XML resource]] is seen by ''Tabulify'' by default as a [[docs:resource:tabular|tabular resource]] that have:
* one column with the name ''xml'' and the ''xml'' [[docs:common:data_type|data type]]
* one row.
tabli data struct xml/relation.xml@howto
Structure of the resource xml\relation.xml@howto
position name data_type precision scale primary_key nullable autoincrement comment
-------- ---- --------- --------- ----- ----------- -------- ------------- -------
1 xml sqlxml x false
=== Changing the XML column name ===
You can change the name of the ''xml'' column with the [[docs:resource:xml#attributes|columnName attribute]]
For instance, to name the column ''data'', you would set the ''%%--attribute%%'' [[docs:tabli:option|option]] with the value ''column_name=data''
tabli data struct --attribute column_name=data xml/relation.xml@howto
Structure of the resource xml\relation.xml@howto
position name data_type precision scale primary_key nullable autoincrement comment
-------- ---- --------- --------- ----- ----------- -------- ------------- -------
1 data sqlxml x false
=== Adding context with virtual columns ===
You can add extra information about the [[docs:resource:xml|XML resource]] loaded via the [[docs:resource:virtual_column|virtual column]].
For instance, if you want to add the [[docs:resource:logicalname|logical name]].
tabli data struct --virtual-column filename=logical_name xml/relation.xml@howto
Structure of the resource xml\relation.xml@howto
position name data_type precision scale primary_key nullable autoincrement comment
-------- -------- --------- --------- ----- ----------- -------- ------------- -------
1 xml sqlxml x false
2 filename varchar x false
==== Loading the XML file into the Postgres database ====
To load an XML file, you can use any [[docs:tabli:data:transfer|transfer command]].
We will be using the [[docs:tabli:data:insert|insert command]] that will:
* insert the [[docs:resource:xml|XML file]]
* in a ''wikipedia'' target [[docs:resource:table|table]]
tabli data insert --virtual-column name=logicalName xml/relation.xml@howto wikipedia@postgres
Source Target Latency (ms) Row Count Error Message
---------------------- ------------------ ------------ --------- ----- -------
xml\relation.xml@howto wikipedia@postgres 68 1
We can see that there is one row loaded in the ''wikipedia'' table with the [[docs:tabli:data:list|list command]]
tabli data list -a count wikipedia@postgres
count
-----
1
==== Loading from a Web API ====
''Tabulify'' supports also to load XML data (even [[docs:resource:json|JSON]]) from a [[docs:resource:uri|URL]].
Let's add the wikipedia XML data for the page [[wp>SQL|SQL wikipedia page]]
This command line will work only with a [[docs:common:bash|bash based terminal]]. More ... see [[docs:common:bash#why_do_we_recommend_bash|Why Bash]]
tabli data insert "https://en.wikipedia.org/w/api.php?action=query&titles=SQL&format=xml&prop=description|categories" wikipedia@postgres
Source Target Latency (ms) Row Count Error Message
------------------------------------------------------------------------------------------------------------------------------ ------------------ ------------ --------- ----- -------
https://en.wikipedia.org/w/api.php?action=query&titles=SQL&format=xml&prop=description%7Ccategories@https:--en.wikipedia.org-w wikipedia@postgres 690 1
And we have now 2 rows
tabli data list -a count wikipedia@postgres
count
-----
2
==== Analyzing the XML with the SQL XML functions ====
Now that our XML have been loaded, we can analyze them with SQL XML function. [[docs:system:postgres:postgres|Postgres]] documents all this XML function on this [[https://www.postgresql.org/docs/9.1/functions-xml.html|page]].
For instance, to extract the title and description, we can use the [[https://www.postgresql.org/docs/9.1/functions-xml.html#FUNCTIONS-XML-PROCESSING|xpath function]]
SELECT
CAST((xpath('/api/query/pages/page/@title', xml))[1] AS text) AS title,
CAST((xpath('/api/query/pages/page/@description', xml))[1] AS text) AS description
FROM wikipedia;
Let's run this [[docs:resource:select|Query SQL select]]
tabli data print (postgres/xml_wikipedia.sql@howto)@postgres
title description
------------------- -------------------------------------------------------
Relation (database) The base data structure of relational databases
SQL Language for management and use of relational databases
==== Next ====
You can explore further, the [[https://www.postgresql.org/docs/9.1/functions-xml.html|XML Postgres functions]]
select
xml, -- Because XML is an XML object, you will normally not see the content in a database IDE
cast(xml as text) -- By casting it as text, you will see the content
xml is document, -- You can also check that this is a XML document
xml_is_well_formed_document(cast(xml as text)), -- And that it's well formed
from
wikipedia;
or you can also try to [[howto:database:load_json|load JSON document]].