07-24-2023, 09:56 AM
My (simplified) input XML file contains the following:
<?xml version="1.0" encoding="UTF-8"?>
<main>
<DATA_RECORD>
<MESSAGE>&#60;pd&#62;&#10; &#60;cdhead version&#61;&#34;13&#34;/&#62;&#10;&#60;/pd&#62;</MESSAGE>
</DATA_RECORD>
</main>
The MESSAGE element value is a character-escaped XML instance. It represents the following XML:
<pd>
<cdhead version="13"/>
</pd>
I would like to apply an XSL transformation to the input XML, parse the MESSAGE contents into a variable, and use XPath expressions to access its details.
I tried adding a JScript function, as shown below, but the object returned by the script is apparently of an incorrect DOM subclass (see the result underneath). For completeness, I added an extra function that returns the DOM contents as a string.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ms="urn:schemas-microsoft-com:xslt"
xmlns:my="http://example.com/my"
exclude-result-prefixes="ms my">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<ms:script language="JScript" implements-prefix="my">
<![CDATA[
function parseToDOM (input) {
var doc = new ActiveXObject('Msxml2.DOMDocument.6.0');
doc.loadXML (input);
return doc.documentElement;
};
function parseToXMLString (input) {
var doc = new ActiveXObject('Msxml2.DOMDocument.6.0');
doc.loadXML (input);
return doc.documentElement.xml;
};
]]>
</ms:script>
<xsl:template match="/">
<root>
<xsl:apply-templates/>
</root>
</xsl:template>
<xsl:template match="DATA_RECORD">
<xsl:variable name="DOM"><xsl:copy-of select="my:parseToDOM (MESSAGE)"/></xsl:variable>
<xsl:variable name="XML"><xsl:copy-of select="my:parseToXMLString (MESSAGE)"/></xsl:variable>
<msg1><xsl:value-of select="$XML"/></msg1>
<msg2><xsl:value-of select="$XML" disable-output-escaping="yes"/></msg2>
<dom><xsl:copy-of select="$DOM"/></dom>
<version><xsl:value-of select="$DOM/pd/cdhead/@version"/></version>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
Result:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<msg1>&lt;pd&gt;
&lt;cdhead version="13"/&gt;
&lt;/pd&gt;</msg1>
<msg2><pd>
<cdhead version="13"/>
</pd></msg2>
<dom/>
<version></version>
</root>
How can I make the JScript function return a result that allows the use of XPath?
By the way, is there an XSLT 1.0 function that parses the escaped XML string into a result that allows the use of XPath?
**ADDITION**
I have been trying some variations and got closer to a solution. First, Altova XMLSpy lets you choose the XSLT processor, and the result above came from the built-in one. I actually need MSXML 6.0, and with that processor selected errors occurred, because I had to parse `input.text` instead of `input`. Even then, I could only use XPath expressions on the result after doing extra work in the script. It turned out that while `&#60;` and the like are parsed into `&lt;` etcetera, this is not enough to arrive at a proper DOM result, so I resorted to unescaping the input string first.
But I hit another snag: the stylesheet below works fine with the hard-coded literal, but not when I use `input.text` instead of that literal.
The XSLT is as follows.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ms="urn:schemas-microsoft-com:xslt"
xmlns:my="http://example.com/my"
exclude-result-prefixes="ms my">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<ms:script language="JScript" implements-prefix="my">
<![CDATA[
function parseToDOM (input) {
var doc = new ActiveXObject('Msxml2.DOMDocument.6.0');
doc.loadXML (unescapeXML ('&#60;pd&#62;&#10; &#60;cdhead version&#61;&#34;13&#34;/&#62;&#10;&#60;/pd&#62;'));
//doc.loadXML (unescapeXML (input.text));
return doc;
};
function unescapeXML (str) {
var ostr = str;
ostr = ostr.replace (/&#34;/g, '"');
ostr = ostr.replace (/&#60;/g, '<');
ostr = ostr.replace (/&#61;/g, '=');
ostr = ostr.replace (/&#62;/g, '>');
return ostr;
};
]]>
</ms:script>
<xsl:template match="/">
<root>
<xsl:apply-templates/>
</root>
</xsl:template>
<xsl:template match="DATA_RECORD">
<xsl:variable name="msg" select="my:parseToDOM (MESSAGE)"/>
<tst><xsl:value-of select="$msg/pd/cdhead/@version"/></tst>
</xsl:template>
</xsl:stylesheet>
This now results in:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<tst>13</tst>
</root>
Which is exactly what I want.
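The `unescapeXML` function above only handles the four character references that occur in this sample. A minimal sketch of a more general version is shown here, assuming that only numeric references (decimal or hexadecimal) need decoding and that all code points fall below 0x10000. The name `unescapeNumericRefs` is hypothetical, and the sketch has only been reasoned through as plain JavaScript, not verified inside an `ms:script` block.

```javascript
// Hypothetical generalisation of unescapeXML: decodes every numeric character
// reference (&#60; or &#x3C;) instead of a fixed list of four.
// Assumption: code points are below 0x10000, since String.fromCharCode
// works on single UTF-16 code units.
function unescapeNumericRefs(str) {
  return String(str).replace(/&#(x[0-9a-fA-F]+|[0-9]+);/g, function (match, ref) {
    var code = ref.charAt(0) === 'x'
      ? parseInt(ref.substring(1), 16)   // hexadecimal form: &#x3C;
      : parseInt(ref, 10);               // decimal form: &#60;
    return String.fromCharCode(code);
  });
}

// Same literal as in the stylesheet above.
var escaped = '&#60;pd&#62;&#10; &#60;cdhead version&#61;&#34;13&#34;/&#62;&#10;&#60;/pd&#62;';
var unescaped = unescapeNumericRefs(escaped);
```

With the literal from the stylesheet, `unescaped` holds the `<pd>` document text, including the line feeds that the four-replacement version leaves as `&#10;`.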
But as remarked above, when I comment out the parsing of the literal and use the input instead, like so:
//doc.loadXML (unescapeXML ('&#60;pd&#62;&#10; &#60;cdhead version&#61;&#34;13&#34;/&#62;&#10;&#60;/pd&#62;'));
doc.loadXML (unescapeXML (input.text));
I get the following error (in Altova XMLSpy with MSXML 6.0 as the XSLT processor):
XSL transformation failed due to following error:
Microsoft JScript runtime error
'undefined' is null or not an object
line = 10, col = 3 (line is offset from the start of the script block).
Error returned from property or method call.
This points at the first `replace` statement in the script.
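The error is consistent with `input.text` evaluating to `undefined`, so that `unescapeXML` ends up calling `replace` on `undefined`. One possible cause, stated here as an assumption rather than a confirmed fact, is that the processor hands the `MESSAGE` node-set to the script as a node list rather than as a single node. The sketch below shows a defensive way to get a string in either case. The helper name `toText` is hypothetical, and it has only been reasoned through with plain JavaScript stand-in objects, not with real MSXML COM objects.

```javascript
// Hypothetical helper: turn whatever is passed to the script function into a string.
// Assumption: the argument is a string, a single node exposing .text,
// or a node list exposing .length and .item(i).
function toText(input) {
  if (typeof input === 'string') {
    return input;                       // already a string
  }
  if (input && typeof input.text === 'string') {
    return input.text;                  // a single node
  }
  if (input && typeof input.length === 'number' && input.length > 0) {
    return toText(input.item(0));       // a node list: use its first node
  }
  return '';                            // nothing usable was passed
}

// Stand-ins for a node and a one-item node list (not real DOM objects).
var fakeNode = { text: '<pd/>' };
var fakeList = { length: 1, item: function (i) { return fakeNode; } };
var fromNode = toText(fakeNode);
var fromList = toText(fakeList);
```

An alternative that avoids the question altogether would be to convert on the XSLT side with the standard XPath `string()` function, for example `my:parseToDOM(string(MESSAGE))`, so that the script receives a plain string.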
And also, IE9 cannot process the following properly:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test.xslt"?>
<main>
<DATA_RECORD>
<MESSAGE>&#60;pd&#62;&#10; &#60;cdhead version&#61;&#34;13&#34;/&#62;&#10;&#60;/pd&#62;</MESSAGE>
</DATA_RECORD>
</main>
When I open this file in IE9 (where test.xslt is the version of the transformation that ignores the input and processes a literal instead, i.e. the one that works in XMLSpy), I get a processing error:
XML5001: Applying Integrated XSLT Handling.
XSLT8690: XSLT processing failed.
Why does all this happen, and how can I correct it?