---json { "description": "This database howto will show you how to load Xml file and document into a database.", "low_quality_page": "false", "page_id": "k9z63dibggq5kin4ubajf" } --- ====== HowTo - How to load XML files into a database with Tabulify ====== ===== Howto ===== This [[database|database howto]] will show you how to load [[docs:resource:xml|Xml file and document]] into a [[docs:system:database|database]]. ===== Steps ===== tabli data drop --not-strict wikipedia@postgres ==== Checking that Postgres is up ==== Normally our [[database|standard database howto]] are created with [[docs:system:sqlite:sqlite|Sqlite]] but because ''Sqlite'' does not support any SQL XML function, nor native XML [[docs:common:data_type|data type]], we have created this ''howto'' with the [[docs:system:postgres:postgres|Postgres database]]. The next steps will be using the [[docs:system:postgres:howto|postgres howto connection]]. * You need to [[docs:system:postgres:howto#docker|start the postgres docker image]]. * [[docs:tabli:connection:ping|Ping]] your connection to check that the database is up. tabli connection ping postgres The connection (postgres) has been pinged successfully ==== The XML file to load ==== We will load the [[docs:resource:xml|XML file]] ''relation.xml'' * located in the ''XML'' subdirectory of the [[docs:connection:howto|howto directory connection]] * that contains information over the [[wp>Relation_(database)|relation wikipedia page]] * that have been created by the [[https://www.mediawiki.org/w/api.php|mediawiki API]] ==== The structure of an XML file ==== === Default === An [[docs:resource:xml|XML resource]] is seen by ''Tabulify'' by default as a [[docs:resource:tabular|tabular resource]] that have: * one column with the name ''xml'' and the ''xml'' [[docs:common:data_type|data type]] * one row. tabli data struct xml/relation.xml@howto Structure of the resource xml\relation.xml@howto position name data_type precision scale primary_key nullable autoincrement comment -------- ---- --------- --------- ----- ----------- -------- ------------- ------- 1 xml sqlxml x false === Changing the XML column name === You can change the name of the ''xml'' column with the [[docs:resource:xml#attributes|columnName attribute]] For instance, to name the column ''data'', you would set the ''%%--attribute%%'' [[docs:tabli:option|option]] with the value ''column_name=data'' tabli data struct --attribute column_name=data xml/relation.xml@howto Structure of the resource xml\relation.xml@howto position name data_type precision scale primary_key nullable autoincrement comment -------- ---- --------- --------- ----- ----------- -------- ------------- ------- 1 data sqlxml x false === Adding context with virtual columns === You can add extra information about the [[docs:resource:xml|XML resource]] loaded via the [[docs:resource:virtual_column|virtual column]]. For instance, if you want to add the [[docs:resource:logicalname|logical name]]. tabli data struct --virtual-column filename=logical_name xml/relation.xml@howto Structure of the resource xml\relation.xml@howto position name data_type precision scale primary_key nullable autoincrement comment -------- -------- --------- --------- ----- ----------- -------- ------------- ------- 1 xml sqlxml x false 2 filename varchar x false ==== Loading the XML file into the Postgres database ==== To load an XML file, you can use any [[docs:tabli:data:transfer|transfer command]]. We will be using the [[docs:tabli:data:insert|insert command]] that will: * insert the [[docs:resource:xml|XML file]] * in a ''wikipedia'' target [[docs:resource:table|table]] tabli data insert --virtual-column name=logicalName xml/relation.xml@howto wikipedia@postgres Source Target Latency (ms) Row Count Error Message ---------------------- ------------------ ------------ --------- ----- ------- xml\relation.xml@howto wikipedia@postgres 68 1 We can see that there is one row loaded in the ''wikipedia'' table with the [[docs:tabli:data:list|list command]] tabli data list -a count wikipedia@postgres count ----- 1 ==== Loading from a Web API ==== ''Tabulify'' supports also to load XML data (even [[docs:resource:json|JSON]]) from a [[docs:resource:uri|URL]]. Let's add the wikipedia XML data for the page [[wp>SQL|SQL wikipedia page]] This command line will work only with a [[docs:common:bash|bash based terminal]]. More ... see [[docs:common:bash#why_do_we_recommend_bash|Why Bash]] tabli data insert "https://en.wikipedia.org/w/api.php?action=query&titles=SQL&format=xml&prop=description|categories" wikipedia@postgres Source Target Latency (ms) Row Count Error Message ------------------------------------------------------------------------------------------------------------------------------ ------------------ ------------ --------- ----- ------- https://en.wikipedia.org/w/api.php?action=query&titles=SQL&format=xml&prop=description%7Ccategories@https:--en.wikipedia.org-w wikipedia@postgres 690 1 And we have now 2 rows tabli data list -a count wikipedia@postgres count ----- 2 ==== Analyzing the XML with the SQL XML functions ==== Now that our XML have been loaded, we can analyze them with SQL XML function. [[docs:system:postgres:postgres|Postgres]] documents all this XML function on this [[https://www.postgresql.org/docs/9.1/functions-xml.html|page]]. For instance, to extract the title and description, we can use the [[https://www.postgresql.org/docs/9.1/functions-xml.html#FUNCTIONS-XML-PROCESSING|xpath function]] SELECT CAST((xpath('/api/query/pages/page/@title', xml))[1] AS text) AS title, CAST((xpath('/api/query/pages/page/@description', xml))[1] AS text) AS description FROM wikipedia; Let's run this [[docs:resource:select|Query SQL select]] tabli data print (postgres/xml_wikipedia.sql@howto)@postgres title description ------------------- ------------------------------------------------------- Relation (database) The base data structure of relational databases SQL Language for management and use of relational databases ==== Next ==== You can explore further, the [[https://www.postgresql.org/docs/9.1/functions-xml.html|XML Postgres functions]] select xml, -- Because XML is an XML object, you will normally not see the content in a database IDE cast(xml as text) -- By casting it as text, you will see the content xml is document, -- You can also check that this is a XML document xml_is_well_formed_document(cast(xml as text)), -- And that it's well formed from wikipedia; or you can also try to [[howto:database:load_json|load JSON document]].