My question is probably fairly easy to answer, but I've searched the
groups and haven't found anything that is close to what I'm looking
for. It's probably because I don't know what to search on which
isn't unusual. Here is an extract of data from an XML file that I'm
working with.

<vocabulary>
<term id="K442490" parentid="K352208">
<synonyms language="EN-GB">
<synonym string="St Marks Church-in-the-Bowery" preferred="YES"/>
<synonym string="church-in-the-bowery"/>
<synonym string="church-on-the-bowery"/>
<synonym string="saint mark;s church in the bowery"/>
<synonym string="saint mark;s on the bowery"/>
<synonym string="st marks"/>
<synonym string="st marks church on the bowery"/>
<synonym string="st marks-on-the-bowery"/>
<synonym string="st. marks-on-the-bowery"/>
</synonyms>
<synonyms language="EN-US">
<synonym string="St Marks Church-in-the-Bowery" preferred="YES"/>
<synonym string="church-in-the-bowery"/>
<synonym string="church-on-the-bowery"/>
<synonym string="saint marks church in the bowery"/>
<synonym string="saint marks on the bowery"/>
<synonym string="st marks"/>
<synonym string="st marks church on the bowery"/>
<synonym string="st marks-on-the-bowery"/>
<synonym string="st. marks-on-the-bowery"/>
</synonyms>
</term>
</vocabulary>

What I'm trying to do is search another recordset for the K#### and
replace it with the "preferred" EN-US string. I can get the first
recordset broken out the way I need it, but what I can't figure out is
how to pull the two pieces of info from this XML file and have them
searchable. There are about 80k terms that will need to be broken out.

Any help would be greatly appreciated.

Re: XML extraction by ekkehard

ekkehard
Thu May 11 05:31:48 CDT 2006

bcarvinm@yahoo.com wrote:
> My question is probably fairly easy to answer,
let's see
> but I've searched the
> groups and haven't found anything that is close to what I'm looking
> for. It's probably because I don't know what to search on which
> isn't unusual. Here is an extract of data from an XML file that I'm
> working with.
>
> <vocabulary>
> <term id="K442490" parentid="K352208">
> <synonyms language="EN-GB">
> <synonym string="St Marks Church-in-the-Bowery" preferred="YES"/>
> <synonym string="church-in-the-bowery"/>
> <synonym string="church-on-the-bowery"/>
> <synonym string="saint mark;s church in the bowery"/>
> <synonym string="saint mark;s on the bowery"/>
> <synonym string="st marks"/>
> <synonym string="st marks church on the bowery"/>
> <synonym string="st marks-on-the-bowery"/>
> <synonym string="st. marks-on-the-bowery"/>
> </synonyms>
> <synonyms language="EN-US">
> <synonym string="St Marks Church-in-the-Bowery" preferred="YES"/>
> <synonym string="church-in-the-bowery"/>
> <synonym string="church-on-the-bowery"/>
> <synonym string="saint marks church in the bowery"/>
> <synonym string="saint marks on the bowery"/>
> <synonym string="st marks"/>
> <synonym string="st marks church on the bowery"/>
> <synonym string="st marks-on-the-bowery"/>
> <synonym string="st. marks-on-the-bowery"/>
> </synonyms>
> </term>
> </vocabulary>
I used

<vocabulary>
<term id="K442490" parentid="K352208">
<synonyms language="EN-GB">
<synonym string="St Marks Church-in-the-Bowery" preferred="YES"/>
<synonym string="church-in-the-bowery"/>
</synonyms>
<synonyms language="EN-US">
<synonym string="St Marks Church-in-the-Bowery" preferred="YES"/>
<synonym string="church-in-the-bowery"/>
</synonyms>
</term>
<term id="K442491" parentid="K352208">
<synonyms language="EN-GB">
<synonym string="St adelheid Church-in-the-flowery" preferred="YES"/>
<synonym string="church-in-the-flowery"/>
</synonyms>
<synonyms language="EN-US">
<synonym string="St adelheid Church-in-the-flowery" preferred="YES"/>
<synonym string="church-in-the-flowery"/>
</synonyms>
</term>
</vocabulary>

(it takes at least two terms to tango, and this has less noise)

> What I'm trying to do is search another recordset for the K#### and
> replace it with the "preferred" EN-US string. I can get the first
> recordset broken out the way I need it, but what I can't figure out is
> how to pull the two pieces of info from this XML file and have them
> searchable. There are about 80k terms that will need to be broken out.

Now let's get to a solution in three steps:

(1) traverse the nodes and look at their properties

(2) select nodes according to your criteria

(3) attempt a function that you could [modify and] use

(undo any word wrap, please)

Dim objDoc, sKey, objNodeList, objNode

'Create an XML Document Object and specify XPath (to be on the safe side)
Set objDoc = CreateObject( "Msxml2.DOMDocument.4.0" )
objDoc.setProperty "SelectionLanguage", "XPath"
'Load XML
objDoc.async = False
objDoc.Load "vocabulary.xml"

'First pass to gain some confidence
WScript.Echo " ---- First pass --------"
For Each sKey In Array( "K442490" )
    'Return a nodelist with all synonym elements (full path)
    'Set objNodeList = objDoc.selectNodes( "/vocabulary/term/synonyms/synonym" )
    'Return a nodelist with all synonym elements (just name)
    Set objNodeList = objDoc.selectNodes( "//synonym" )
    'Display
    For Each objNode In objNodeList
        'It's easy/safe to get the tagName
        WScript.Echo "found " + objNode.tagName
        'Now look up getAttribute() in the XML SDK Docs and implement
        WScript.Echo " s " + objNode.getAttribute( "string" )
        'Now that's easy too (except: use '&' to handle Null)
        WScript.Echo " p " & objNode.getAttribute( "preferred" )
        WScript.Echo " l " & objNode.parentNode.getAttribute( "language" )
        WScript.Echo " k " & objNode.parentNode.parentNode.getAttribute( "id" )
    Next
Next

'Second pass to learn about filters
WScript.Echo " ---- Second pass --------"
For Each sKey In Array( "K442490" )
    'Return a nodelist with the preferred synonym elements of one language
    'Set objNodeList = objDoc.selectNodes( "//synonyms[ @language=""EN-GB"" ]/synonym[ @preferred=""YES"" ]" )
    'Return a nodelist with all preferred synonym elements
    Set objNodeList = objDoc.selectNodes( "//synonyms/synonym[ @preferred=""YES"" ]" )
    For Each objNode In objNodeList
        WScript.Echo "found " + objNode.getAttribute( "string" ) _
            + " (" & objNode.parentNode.getAttribute( "language" ) & ")" _
            + " [" & objNode.parentNode.parentNode.getAttribute( "id" ) & "]"
    Next
Next

'Third step: turn it into a function
WScript.Echo " ---- Third Step ---------"
For Each sKey In Array( "K442490", "K442491", "nix" )
    WScript.Echo "Preferred synonym for " + sKey + " is '" _
        + getPreferredSynonym( objDoc, sKey, "EN-GB" ) + "'"
Next

Function getPreferredSynonym( objDoc, sKey, sLanguage )
    getPreferredSynonym = "*** not found ***"

    On Error Resume Next ' dirty trick - if all is well - no problem; if anything
                         ' fails, I don't care about the details
    getPreferredSynonym = objDoc.selectNodes( "//term[ @id=""" + sKey + """ ]" _
        + "/synonyms[ @language=""" + sLanguage + """ ]" _
        + "/synonym[ @preferred=""YES"" ]" _
        )( 0 ).getAttribute( "string" )

End Function

Tell me if you still find it easy after working through (the relevant parts
of) the XML SDK Docs.
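
With about 80k terms, running a "//term[...]" query for every key may get slow, because each call searches the whole document again. One possible alternative, offered only as a rough sketch: walk the preferred EN-US synonyms once and keep them in a Scripting.Dictionary keyed by term id. The names dicPreferred and replaceTermId are made up for this illustration, the XML is inlined so the sketch runs on its own, and the "Msxml2.DOMDocument.6.0" ProgID is an assumption about what is installed (the 4.0 ProgID used above should behave the same way here).

```vbscript
Dim objDoc, dicPreferred, objNode, sId, sLine

' Assumption: MSXML 6.0 is installed. Sample data is inlined here;
' for the real file use objDoc.Load "vocabulary.xml" instead of loadXML.
Set objDoc = CreateObject( "Msxml2.DOMDocument.6.0" )
objDoc.async = False
objDoc.setProperty "SelectionLanguage", "XPath"
If Not objDoc.loadXML( _
      "<vocabulary>" _
    & "<term id=""K442490"" parentid=""K352208"">" _
    & "<synonyms language=""EN-US"">" _
    & "<synonym string=""St Marks Church-in-the-Bowery"" preferred=""YES""/>" _
    & "<synonym string=""church-in-the-bowery""/>" _
    & "</synonyms>" _
    & "</term>" _
    & "</vocabulary>" ) Then
    WScript.Echo "XML error: " & objDoc.parseError.reason
    WScript.Quit 1
End If

' One pass over the document: term id -> preferred EN-US string
Set dicPreferred = CreateObject( "Scripting.Dictionary" )
For Each objNode In objDoc.selectNodes( _
        "/vocabulary/term/synonyms[@language='EN-US']/synonym[@preferred='YES']" )
    sId = objNode.parentNode.parentNode.getAttribute( "id" )
    ' Keep the first preferred string if a term happens to have several
    If Not dicPreferred.Exists( sId ) Then
        dicPreferred.Add sId, objNode.getAttribute( "string" )
    End If
Next

' Hypothetical helper: replace one term id in a piece of text,
' leaving the text untouched when the id is unknown
Function replaceTermId( dic, sText, sKey )
    If dic.Exists( sKey ) Then
        replaceTermId = Replace( sText, sKey, dic( sKey ) )
    Else
        replaceTermId = sText
    End If
End Function

sLine = "Record 17 refers to K442490"
WScript.Echo replaceTermId( dicPreferred, sLine, "K442490" )
```

Run with cscript, this should echo "Record 17 refers to St Marks Church-in-the-Bowery". After the single pass, every lookup is a dictionary access, so the loop over your other recordset never has to touch the XML again.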