Hello,

this is my first post here, so please excuse me if this is the wrong group or
topic...

I'm trying to get the source of an HTML file so I can read it line by line.
I found a small example and adapted it to my needs; the class definition
and the program are below.

My problem is the routine "title".
As you can see, I can access the title of the page
with "objDocument.title" (type is MSHTML.HTMLDocument), and it is
displayed properly.
If I access "body" I get an object, but I don't know how to handle it.

How do I get the source code of the page?
I'd like to read the lines into a string array.

Any suggestions?

Regards
Werner

------ prog -----------------------
Sub main()
Dim objIEController As clsIEControll
Dim url As String

url = "http://www.Kinderbahn.de"

Set objIEController = New clsIEControll

With objIEController
.LoadDocument url
Do While Not .DocumentComplete
DoEvents
Loop
.title
End With
End Sub
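The `Do While Not .DocumentComplete` loop in main spins forever if the event never arrives. A minimal VBScript sketch of a bounded wait instead, polling the `Busy` and `ReadyState` properties of the InternetExplorer object; the helper name `WaitForIE`, the 50 ms poll interval and the timeout parameter are my own assumptions, not part of the original code. In VB6 the same idea works with `DoEvents` in place of `WScript.Sleep`.

```vbscript
' Hypothetical helper: wait until IE has finished loading, or give up.
'   ie        - an InternetExplorer.Application object (anything with
'               Busy and ReadyState properties will do)
'   timeoutMs - maximum time to wait, in milliseconds
' Returns True once ReadyState is 4 (READYSTATE_COMPLETE) and IE is not busy,
' False if the timeout was reached first.
Function WaitForIE(ie, timeoutMs)
    Dim waited
    waited = 0
    Do While ie.Busy Or ie.ReadyState <> 4
        If waited >= timeoutMs Then
            WaitForIE = False
            Exit Function
        End If
        WScript.Sleep 50          ' VB6 code would call DoEvents here instead
        waited = waited + 50
    Loop
    WaitForIE = True
End Function
```

Usage, assuming `ie` was created with `CreateObject("InternetExplorer.Application")` and has just been sent to a page with `Navigate2`: `If Not WaitForIE(ie, 30000) Then WScript.Echo "Page did not finish loading"`.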


------ class definition -------------
Option Explicit

Dim WithEvents objIE As InternetExplorer
Dim mDocumentComplete As Boolean

Public Property Get DocumentComplete() As Boolean
DocumentComplete = mDocumentComplete
End Property

Public Sub LoadDocument(strDocumentname As String)
mDocumentComplete = False
objIE.Navigate2 strDocumentname
End Sub

Private Sub Class_Initialize()
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.Silent = True
End Sub

Private Sub Class_Terminate()
Set objIE = Nothing
End Sub

Private Sub objIE_BeforeNavigate2(ByVal pDisp As Object, url As Variant, _
    Flags As Variant, TargetFrameName As Variant, PostData As Variant, _
    Headers As Variant, Cancel As Boolean)
mDocumentComplete = False
End Sub

Private Sub objIE_DocumentComplete(ByVal pDisp As Object, url As Variant)
mDocumentComplete = True
End Sub

Public Sub title()
Dim objDocument As MSHTML.HTMLDocument
Set objDocument = objIE.document
MsgBox objDocument.title
Set objDocument = Nothing
End Sub
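For the string-array part of the question, one option is to read the outerHTML of `document.documentElement` (the HTML element), which, unlike `body.innerHTML`, also covers the HEAD section, and split it on line breaks. A minimal VBScript sketch; both function names are hypothetical. Note that this is IE's re-serialization of the parsed page, not the bytes the server sent, so tag case, attribute quoting and line breaks may differ from the original file. From the VB6 class, the same property is reachable as `objIE.document.documentElement.outerHTML`.

```vbscript
' Hypothetical helper: split a block of HTML text into an array of lines.
' Handles CRLF (Windows), LF (Unix) and lone CR line endings.
Function HtmlToLines(html)
    Dim s
    s = Replace(html, vbCrLf, vbLf)   ' CRLF -> LF first, so no empty lines appear
    s = Replace(s, vbCr, vbLf)        ' any remaining lone CR -> LF
    HtmlToLines = Split(s, vbLf)      ' zero-based array of strings
End Function

' Hypothetical helper: the page currently shown in Internet Explorer, as lines.
' Assumes ie is an InternetExplorer.Application that has finished loading.
' documentElement is the <HTML> element, so its outerHTML includes HEAD and BODY.
Function PageSourceLines(ie)
    PageSourceLines = HtmlToLines(ie.document.documentElement.outerHTML)
End Function
```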





--
MODEL RAILWAYS FOR CHILDREN ==> http://www.kinderbahn.de
TOPIC: NARROW-GAUGE RAILWAYS ==> http://www.thema-schmalspurbahn.de

mailto:wf.usenet.nospam.35@werner-falkenbach.de (mail there does get through)

Re: Getting Source of an HTML File by Anthony

Anthony
Fri May 12 07:18:39 CDT 2006


"Werner Falkenbach" <wf.usenet.nospam.35@werner-falkenbach.de> wrote in
message news:4cj5nrF1562g2U1@individual.net...
> How do I get the source-code of the page?
> I'd like to read the lines into a string-array.

If you want the original HTML code then you should use the MSXML2.XMLHTTP30
object.

Add a reference to "Microsoft XML, v3.0" in your project, then:


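Dim asHTMLSourceLines() As String
Dim oXMLHTTP As MSXML2.XMLHTTP30
Dim i As Long

Set oXMLHTTP = New MSXML2.XMLHTTP30

' Synchronous request (third argument False); url holds the page address
oXMLHTTP.Open "GET", url, False
oXMLHTTP.Send
If oXMLHTTP.Status = 200 Then
    ' Turn CRLF into LF, then split into one array element per line
    asHTMLSourceLines = Split(Replace(oXMLHTTP.ResponseText, vbCrLf, vbLf), vbLf)

    ' Loop only when the array was filled; UBound fails on an unallocated array
    For i = 0 To UBound(asHTMLSourceLines)
        'Do stuff with each line
    Next
End If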








Re: Getting Source of an HTML File by Csaba

Csaba
Fri May 12 08:29:31 CDT 2006

Werner Falkenbach wrote:
> I'm trying to get the source of an HTML file so I can read it line by line.

Here is a possible approach.
It doesn't give you the _exact_ source, but it does give you
the outerHTML of the document.body, which may be what
you are after. Just plunk the self-contained example in a .vbs
file and run it.

Set ie=WScript.CreateObject("InternetExplorer.Application","IE_")
ie.Visible = true
ie.Navigate2 "www.google.com"
While ie.ReadyState<>4: Wscript.Sleep 10: Wend

Sub IE_DownloadComplete()
Dim html, ieDisplay
If (ie.ReadyState<2) Then Exit Sub 'DOM not ready
html = ie.document.body.outerHTML
'MsgBox "Doc has loaded:" & vbcrlf & ie.document.body.outerHTML

'This section is strictly for display, since MsgBox is too restrictive
html = Replace (html, "&", "&amp;")
html = Replace (html, "<", "&lt;")
Set ieDisplay = CreateObject ("InternetExplorer.Application")
ieDisplay.Visible = true
ieDisplay.Navigate2 "about:blank"
'Wait for about:blank to load before touching its document
While ieDisplay.ReadyState<>4: WScript.Sleep 10: Wend
ieDisplay.document.body.innerHTML = "<PRE>" & html & "</PRE>"
ieDisplay.document.title = "HTML Display"
ieDisplay.document.body.contentEditable = true
ieDisplay.document.parentWindow.focus
End Sub


Modify per desire,
Csaba Gabor from Vienna


Re: Getting Source of an HTML File by Werner

Werner
Sun May 14 03:33:14 CDT 2006

Hi,

thanks for the hints.
The solution that reads the file via GET requires a URL,
which is a bit difficult, because the HTML I have to read
can only be reached by clicking various SUBMIT buttons on previous pages,
and the pages are generated by PHP.
So I follow links and input forms until I reach the desired page.
That all works fine: the program just "reads" the pages and clicks the
buttons.

But with
ieDisplay.document.body.innerHTML
I get what I want to see: the source without the HEAD section, but that
doesn't matter for my purpose.
That's what I wanted. Fine.

What's the difference between innerHTML and outerHTML?
Just the absence of the "body" tag?

(innerText and outerText give only the visible text, which isn't useful,
because I need the relationship between the text and the names of the buttons I click on)

Regards
Werner


Re: Getting Source of an HTML File by Csaba

Csaba
Sun May 14 07:12:44 CDT 2006

Werner Falkenbach wrote:
> thanks for the hints.
> The solution that reads the file via GET requires a URL,
> which is a bit difficult, because the HTML I have to read
> can only be reached by clicking various SUBMIT buttons on previous pages,
> and the pages are generated by PHP.

Irrelevant. What happens server side is its own business (as far as I
can tell from the context of this problem).

> So I follow links and input forms until I reach the desired page.
> That all works fine: the program just "reads" the pages and clicks the buttons.

And of course you are using methods like .click (illustrated below) or
.submit to achieve this.

> But with
> ieDisplay.document.body.innerHTML
> I get what I want to see: the source without the HEAD section, but that
> doesn't matter for my purpose.
> That's what I wanted. Fine.
>
> What's the difference between innerHTML and outerHTML?
> Just the absence of the "body" tag?

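Yes: outerHTML returns the HTML for the entire element, including its own
tag, while innerHTML returns only what lies between the start and end tags.
In the example below, I've set the resulting page to a garish pink. That
bgColor attribute appears in body.outerHTML but not in body.innerHTML. With
input elements, the distinction between .outerHTML and .innerHTML can be
especially important: an INPUT has no content, so its type, name and value
show up only in .outerHTML.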
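To see the innerHTML/outerHTML difference without a browser window or network access, a small VBScript sketch can build a document in memory with the `htmlfile` (MSHTML) object; the demo markup, including the pink BODY and the INPUT named "go", is made up for illustration.

```vbscript
' Build a small document in memory; no browser window is needed.
Dim doc
Set doc = CreateObject("htmlfile")
doc.write "<html><head><title>Demo</title></head>" & _
          "<body bgcolor=""pink""><input type=""submit"" name=""go"" value=""Search""></body></html>"
doc.close

' innerHTML: only the content between <BODY ...> and </BODY>
WScript.Echo "body.innerHTML:  " & doc.body.innerHTML
' outerHTML: the BODY element itself, including its start tag and bgcolor attribute
WScript.Echo "body.outerHTML:  " & doc.body.outerHTML
' An INPUT has no content; its type, name and value live in its own tag
WScript.Echo "input.outerHTML: " & doc.getElementsByTagName("INPUT").item(0).outerHTML
```

IE typically normalizes the markup when serializing (for example upper-case tag names), so compare the output by content rather than character for character.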

> (innerText and outerText give only the visible text, which isn't useful,
> because I need the relationship between the text and the names of the buttons I click on)

Csaba
Expanded example follows:

Set ie=WScript.CreateObject("InternetExplorer.Application","IE_")
ie.Visible = true
ie.Navigate2 "www.google.com"
While ie.ReadyState<>4: Wscript.Sleep 10: Wend

Dim eIn, oInput
Set oInput = ie.document.getElementsByTagName("INPUT")
For Each eIn In oInput 'Find a text box
If eIn.type="text" Then Exit For
Next
eIn.value="Model trains" 'This is what we'll search for
For Each eIn In oInput 'Find a (search) button
If eIn.type="submit" Then Exit For
Next
eIn.click 'Do it (search).

'Keep script alive so Sub can process event, else it's unloaded
While True: Wscript.Sleep 10: Wend

Sub IE_DownloadComplete()
Dim html, ieDisplay
If (ie.ReadyState<2) Then Exit Sub 'DOM not ready
'Next line exits upon initial page, since we're not interested in it
If (ie.document.parentWindow.location.pathname="/") Then Exit Sub

'Color search results; alters .outerHTML
Ie.document.body.bgColor = "pink"
html = ie.document.body.outerHTML
'MsgBox "Doc has loaded:" & vbcrlf & ie.document.body.outerHTML

'This section is strictly for display, since MsgBox is too restrictive
html = Replace (html, "&", "&amp;")
html = Replace (html, "<", "&lt;")
Set ieDisplay = CreateObject ("InternetExplorer.Application")
ieDisplay.Visible = true
ieDisplay.Navigate2 "about:blank"
'Wait for about:blank to load before touching its document
While ieDisplay.ReadyState<>4: WScript.Sleep 10: Wend
ieDisplay.document.body.innerHTML = "<PRE>" & html & "</PRE>"
ieDisplay.document.title = "HTML Display"
ieDisplay.document.body.contentEditable = true
ieDisplay.document.parentWindow.focus

WScript.Quit
End Sub
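Since the goal is the relationship between visible text and the names of the buttons being clicked, it may be simpler to list every INPUT element's type, name and value directly rather than searching the raw HTML. A sketch; `DescribeInputs` is a hypothetical name, and the demo form is made up and built with the `htmlfile` object so the script runs offline. In the example above, the same function could be called with `ie.document`.

```vbscript
' Hypothetical helper: one line per INPUT element, "type<TAB>name<TAB>value".
' doc can be ie.document or any other MSHTML document object.
Function DescribeInputs(doc)
    Dim el, result
    result = ""
    For Each el In doc.getElementsByTagName("INPUT")
        result = result & el.type & vbTab & el.name & vbTab & el.value & vbCrLf
    Next
    DescribeInputs = result
End Function

' Offline demo document with a made-up form
Dim demo
Set demo = CreateObject("htmlfile")
demo.write "<html><body><form>" & _
           "<input type=""text"" name=""q"" value="""">" & _
           "<input type=""submit"" name=""go"" value=""Search"">" & _
           "</form></body></html>"
demo.close
WScript.Echo DescribeInputs(demo)
```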


Re: Getting Source of an HTML File by Werner

Werner
Sun May 14 08:39:19 CDT 2006

Hi,

Csaba Gabor wrote:

> And of course you are using methods like .click (illustrated below) or
> .submit to achieve this.

Yep.

> With input elements,
> the distinction between .outerHTML and .innerHTML can be especially
> important.

Yep, that's now clear to me.
Thanks a lot for the examples and for using a really nice
search string ;-))

Regards
Werner