Re: XML extraction by ekkehard
ekkehard
Thu May 11 05:31:48 CDT 2006
bcarvinm@yahoo.com wrote:
> My question is probably fairly easy to answer,
let's see
> but I've searched the
> groups and haven't found anything that is close to what I'm looking
> for. It's probably because I don't know what to search on which
> isn't unusual. Here is an extract of data from an XML file that I'm
> working with.
>
> <vocabulary>
> <term id="K442490" parentid="K352208">
> <synonyms language="EN-GB">
> <synonym string="St Marks Church-in-the-Bowery" preferred="YES"/>
> <synonym string="church-in-the-bowery"/>
> <synonym string="church-on-the-bowery"/>
> <synonym string="saint mark;s church in the bowery"/>
> <synonym string="saint mark;s on the bowery"/>
> <synonym string="st marks"/>
> <synonym string="st marks church on the bowery"/>
> <synonym string="st marks-on-the-bowery"/>
> <synonym string="st. marks-on-the-bowery"/>
> </synonyms>
> <synonyms language="EN-US">
> <synonym string="St Marks Church-in-the-Bowery" preferred="YES"/>
> <synonym string="church-in-the-bowery"/>
> <synonym string="church-on-the-bowery"/>
> <synonym string="saint marks church in the bowery"/>
> <synonym string="saint marks on the bowery"/>
> <synonym string="st marks"/>
> <synonym string="st marks church on the bowery"/>
> <synonym string="st marks-on-the-bowery"/>
> <synonym string="st. marks-on-the-bowery"/>
> </synonyms>
> </term>
> </vocabulary>
I used
<vocabulary>
<term id="K442490" parentid="K352208">
<synonyms language="EN-GB">
<synonym string="St Marks Church-in-the-Bowery" preferred="YES"/>
<synonym string="church-in-the-bowery"/>
</synonyms>
<synonyms language="EN-US">
<synonym string="St Marks Church-in-the-Bowery" preferred="YES"/>
<synonym string="church-in-the-bowery"/>
</synonyms>
</term>
<term id="K442491" parentid="K352208">
<synonyms language="EN-GB">
<synonym string="St adelheid Church-in-the-flowery" preferred="YES"/>
<synonym string="church-in-the-flowery"/>
</synonyms>
<synonyms language="EN-US">
<synonym string="St adelheid Church-in-the-flowery" preferred="YES"/>
<synonym string="church-in-the-flowery"/>
</synonyms>
</term>
</vocabulary>
(it takes at least two terms to tango, and fewer synonyms means less noise)
> What I'm trying to do is search another recordset for the K#### and
> replace it with the "preferred" EN-US string. I can get the first
> recordset broken out the way I need it, but what I can't figure out is
> how to pull the two pieces of info from this XML file and have them
> searchable. There are about 80k terms that will need to be broken out.
Now let's get to a solution in three steps:
(1) traverse the nodes and look at their properties
(2) select nodes according to your criteria
(3) an attempt at a function you could [modify and] use
(please undo any word wrap)
Dim objDoc, sKey, objNodeList, objNode
'Create an XML Document Object and specify XPath (to be on the safe side)
Set objDoc = CreateObject( "Msxml2.DOMDocument.4.0" )
objDoc.setProperty "SelectionLanguage", "XPath"
'Load XML
objDoc.async = False
objDoc.Load "vocabulary.xml"
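'Optional sanity check (a sketch, not part of the original script): Load does
'not raise an error on a missing or malformed file, so look at the parseError
'object before going on. The message text is only an example.
If objDoc.parseError.errorCode <> 0 Then
  WScript.Echo "Could not load vocabulary.xml: " & objDoc.parseError.reason
  WScript.Quit 1
End If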
'First pass to gain some confidence
WScript.Echo " ---- First pass --------"
For Each sKey In Array( "K442490" )
'Return a nodelist with all synonym Elements (full path)
'Set objNodeList = objDoc.selectNodes( "/vocabulary/term/synonyms/synonym" )
'Return a nodelist with all synonym Elements (just name)
Set objNodeList = objDoc.selectNodes( "//synonym" )
'Display
For Each objNode In objNodeList
'It's easy/safe to get the tagName
WScript.Echo "found " + objNode.tagName
'Now look up getAttribute() in the XML SDK Docs and implement
WScript.Echo " s " + objNode.getAttribute( "string" )
'Now that's easy too (except: use '&' to handle Null for a missing attribute)
WScript.Echo " p " & objNode.getAttribute( "preferred" )
WScript.Echo " l " & objNode.parentNode.getAttribute( "language" )
WScript.Echo " k " & objNode.parentNode.parentNode.getAttribute( "id" )
Next
Next
'Second pass to learn about filters
WScript.Echo " ---- Second pass --------"
For Each sKey In Array( "K442490" )
'Return a nodelist with all preferred synonym Elements (EN-GB only)
'Set objNodeList = objDoc.selectNodes( "//synonyms[ @language=""EN-GB"" ]/synonym[ @preferred=""YES"" ]" )
'Return a nodelist with all preferred synonym Elements (any language)
Set objNodeList = objDoc.selectNodes( "//synonyms/synonym[ @preferred=""YES"" ]" )
For Each objNode In objNodeList
WScript.Echo "found " + objNode.getAttribute( "string" ) _
+ " (" & objNode.parentNode.getAttribute( "language" ) & ")" _
+ " [" & objNode.parentNode.parentNode.getAttribute( "id" ) & "]"
Next
Next
'Third step: wrap it up in a reusable function
WScript.Echo " ---- Third Step ---------"
For Each sKey In Array( "K442490", "K442491", "nix" )
WScript.Echo "Preferred synonym for " + sKey + " is '" _
+ getPreferredSynonym( objDoc, sKey, "EN-GB" ) + "'"
Next
Function getPreferredSynonym( objDoc, sKey, sLanguage )
getPreferredSynonym = "*** not found ***"
On Error Resume Next ' dirty trick - if all is well - no problem; if anything
' fails, I don't care about the details
getPreferredSynonym = objDoc.selectNodes( "//term[ @id=""" + sKey + """ ]" _
+ "/synonyms[ @language=""" + sLanguage + """ ]" _
+ "/synonym[ @preferred=""YES"" ]" _
)( 0 ).getAttribute( "string" )
End Function
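'A possible follow-up (a sketch only, not tried against the real 80k-term file):
'getPreferredSynonym() runs a "//term" search over the whole document on every
'call. If that turns out to be too slow for your other recordset, you could
'collect all preferred strings for one language in a single pass and look them
'up by key afterwards. buildSynonymMap and dicEnUs are made-up names;
'Scripting.Dictionary is the standard scripting runtime object.
Function buildSynonymMap( objDoc, sLanguage )
  Dim dicMap, objNode, sId
  Set dicMap = CreateObject( "Scripting.Dictionary" )
  For Each objNode In objDoc.selectNodes( "//synonyms[ @language=""" + sLanguage + """ ]" _
                                        + "/synonym[ @preferred=""YES"" ]" )
    'Prefix with "" so a missing id attribute (Null) becomes an empty string
    sId = "" & objNode.parentNode.parentNode.getAttribute( "id" )
    'Keep the first preferred string found for each term id
    If Not dicMap.Exists( sId ) Then dicMap.Add sId, "" & objNode.getAttribute( "string" )
  Next
  Set buildSynonymMap = dicMap
End Function
'Usage: build the map once, then test each K#### key from the other recordset
Dim dicEnUs
Set dicEnUs = buildSynonymMap( objDoc, "EN-US" )
For Each sKey In Array( "K442490", "K442491", "nix" )
  If dicEnUs.Exists( sKey ) Then
    WScript.Echo sKey & " -> " & dicEnUs( sKey )
  Else
    WScript.Echo sKey & " -> *** not found ***"
  End If
Next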
Tell me if you still find it easy after working through the (relevant parts
of the) XML SDK Docs.