My boss has been after me to create a link checker. Is there a sample
script somewhere to get me started? We have a ton of links and want to
check that they are valid, as in not hardcoded. I'm not sure I want to
check whether each link actually works, because that might not be easy
to pull off. I am using VBS because we cannot install anything on our
server, and running something client-side is not dependable. With a
.vbs file I can schedule it, produce a report, and email it to myself
or anyone else who needs to see whether there are broken links. Any
leads to good example scripts?

Re: Creating a Link Checker by McKirahan

McKirahan
Mon Aug 14 10:17:04 CDT 2006

Can you maintain a list of links in a text file, or must it scan your
pages for links to check?

Also, what does "... valid as in not hardcoded" mean?



Re: Creating a Link Checker by wreed

wreed
Mon Aug 14 12:39:52 CDT 2006

I'd prefer that it check each link. We have multiple apps that generate
Excel files that are linked from the site, so if one of those programs
bombs, we want to know that the link will no longer work because the
Excel file was not generated.

We also have some pages with hardcoded (absolute) links, and I would
like to find these so I know which to change to relative links.



Re: Creating a Link Checker by McKirahan

McKirahan
Mon Aug 14 13:18:43 CDT 2006

I'm not sure I understand the approach you'd like to use.
Could you clarify -- perhaps with examples?

Here is a script that will read a file and, for each line that contains
"http", fetch the URL and write the result to a log file
(a "+" prefix indicates success; a "-" prefix indicates failure).

Option Explicit
'*
'* Declare Constants
'*
Const cVBS = "links.vbs"
Const cTXT = "links.txt"
Const cLOG = "links.log"
'*
'* Declare Variables
'*
Dim intINS
Dim arrTXT
Dim intTXT
Dim strTXT
Dim intURL
intURL = 0
Dim strURL
Dim booXML
'*
'* Declare Objects
'*
Dim objFSO
Set objFSO = CreateObject("Scripting.FileSystemObject")
Dim objLOG
Set objLOG = objFSO.OpenTextFile(cLOG,2,true)
Dim objTXT
Set objTXT = objFSO.OpenTextFile(cTXT,1)
Dim objXML
Set objXML = CreateObject("Microsoft.XMLHTTP")
'*
'* Check Links
'*
strTXT = objTXT.ReadAll
arrTXT = Split(strTXT,vbCrLf)
For intTXT = 0 To UBound(arrTXT)
    strTXT = arrTXT(intTXT)
    intINS = InStr(strTXT,"http")
    If intINS > 0 Then
        strURL = Trim(Mid(strTXT,intINS))
        intURL = intURL + 1
        '*
        On Error Resume Next
        Err.Clear
        objXML.Open "GET",strURL,False
        objXML.Send
        If Err.Number <> 0 Or objXML.Status <> 200 Then
            booXML = "-"
        Else
            booXML = "+"
        End If
        On Error GoTo 0
        '*
        objLOG.WriteLine(booXML & " " & strTXT)
    End If
Next
'*
'* Destroy Objects
'*
Set objXML = Nothing
Set objTXT = Nothing
Set objLOG = Nothing
Set objFSO = Nothing
'*
'* Finish
'*
MsgBox intURL & " links checked",vbInformation,cVBS


The "links.txt" file can either just list the URLs:
    http://www.google.com
    http://www.msn.com
or it may identify the page that each URL is on; for example:
    Google Home Page http://www.google.com
    MSN Home Page http://www.msn.com
It just expects the URL ("http" prefix) to be at the end of the line.


Note that in some environments "Microsoft.XMLHTTP" may
not work; if it doesn't, then try one of the following:
"MSXML2.XMLHTTP.5.0"
"MSXML2.XMLHTTP.4.0"
"MSXML2.XMLHTTP.3.0"
"MSXML2.XMLHTTP"

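The fallback list above could be tried automatically instead of editing
the script by hand. A minimal sketch, assuming the ProgIDs are tried in
the order given; the function name CreateXmlHttp is made up for
illustration and is not part of the original script:

```vbscript
Option Explicit

' Hypothetical helper: returns the first XMLHTTP object that can be
' created, or Nothing if none of the ProgIDs is registered.
Function CreateXmlHttp()
    Dim arrIDs, strID
    arrIDs = Array("Microsoft.XMLHTTP", _
                   "MSXML2.XMLHTTP.5.0", _
                   "MSXML2.XMLHTTP.4.0", _
                   "MSXML2.XMLHTTP.3.0", _
                   "MSXML2.XMLHTTP")
    Set CreateXmlHttp = Nothing
    On Error Resume Next
    For Each strID In arrIDs
        Err.Clear
        ' If CreateObject fails, the assignment is skipped and the
        ' return value stays Nothing.
        Set CreateXmlHttp = CreateObject(strID)
        If Err.Number = 0 Then Exit For
    Next
    Err.Clear
    On Error GoTo 0
End Function

Dim objXML
Set objXML = CreateXmlHttp()
If objXML Is Nothing Then
    WScript.Echo "No XMLHTTP component could be created"
    WScript.Quit 1
End If
```

The rest of the script can then use objXML exactly as before.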



Re: Creating a Link Checker by wreed

wreed
Mon Aug 14 15:11:28 CDT 2006

What I am trying to do is have a crawler, so to speak, that checks
whether links work. They are all internal links and there are tons of
them, so I'd prefer not to keep a text file listing them. I have used
programs to check the links (Xenu's Link Sleuth is great), but I need a
script that runs on the server and produces a report of broken links.
I am just not sure how to approach it. I assume I would open a
FileSystemObject and loop through folders and whatnot, but I cannot
actually check for 404s that way.


Re: Creating a Link Checker by McKirahan

McKirahan
Mon Aug 14 15:59:44 CDT 2006

A script could be written that would crawl through all the pages on your
site to identify all *hardcoded* links easily enough, though I'd probably
have it scan an offline (local) version.

I gather that you have hyperlinks other than *hardcoded* URLs.

Are they constructed client-side (e.g. JavaScript) and/or server-side
(e.g. ASP)? Could you provide examples?



Re: Creating a Link Checker by ken

ken
Mon Aug 14 16:10:20 CDT 2006

Have you considered using a commercial link checker? LinkScan does all
that you require and you don't need to maintain it or update it. There
is a free 15-day trial of the full-blown product at

http://www.elsop.com/

Ken


Re: Creating a Link Checker by Tim

Tim
Mon Aug 14 16:57:06 CDT 2006

Might consider updating this to just do a HEAD request (you only need to see if the resource is available, not download it)

http://www.jibbering.com/2002/4/httprequest.html

Tim

--
Tim Williams
Palo Alto, CA
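
A minimal sketch of the HEAD idea, assuming the same XMLHTTP object used
by the script earlier in the thread; only the method string passed to
Open differs. The function name HeadStatus and the return value 0 for
"no response at all" are assumptions for illustration:

```vbscript
Option Explicit

' Returns the HTTP status code of a HEAD request, or 0 if the request
' could not be sent at all (bad host name, no network, and so on).
Function HeadStatus(strURL)
    Dim objXML
    Set objXML = CreateObject("Microsoft.XMLHTTP")
    HeadStatus = 0
    On Error Resume Next
    objXML.Open "HEAD", strURL, False   ' False = synchronous
    objXML.Send
    If Err.Number = 0 Then HeadStatus = objXML.Status
    On Error GoTo 0
    Set objXML = Nothing
End Function

WScript.Echo HeadStatus("http://www.google.com")
```

Because HEAD asks only for the headers, the body of a large file such as
a generated spreadsheet is not downloaded.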




Re: Creating a Link Checker by Alexander

Alexander
Tue Aug 15 05:38:09 CDT 2006

Tim Williams wrote:
> Might consider updating this to just do a HEAD request (you only need to see if the resource is available, not download it)
>
> http://www.jibbering.com/2002/4/httprequest.html

Btw, do you know whether sending the request asynchronously and
checking readyState when the change event fires is more *performant*
than sending a synchronous request and simply waiting for it to return?

Regards,
Alex

Re: Creating a Link Checker by wreed

wreed
Tue Aug 15 08:16:44 CDT 2006

LinkScan is nice, but I cannot use a client-side app that someone has
to run each day. Basically we have DTS jobs running that generate
Excel spreadsheets that are linked from the site, and if one of them
bombs I want to know whether the link still works. It sounds simple,
but the site is a big piece of junk (which needs to be house-cleaned)
and it's all done in FrontPage, yuck....inherited.

Anyway, the boss wants an automated script that crawls the site, so to
speak, looks for these broken links, and generates a report. I like
the idea of checking just the headers, but honestly this whole idea
sounds dumb to me and I hate it; to get him off my back I need to
create something. So I was looking for ideas, but I am not 100% sure
of the exact way to do it. I don't know whether I should crawl the
site from the client side through HTTP or go through a
FileSystemObject and scan each HTML file, which is a nightmare. I
would also like to see whether links are hardcoded. Sorry for
explaining myself so many times, but I don't think I explained it
right in the beginning....I am really hating this project, because I
am a web programmer and this coding with file objects is beyond me :)



Re: Creating a Link Checker by Tim

Tim
Wed Aug 16 01:29:57 CDT 2006

Don't know why it would be different, sync vs. async: the response time
of the server is unlikely to be influenced.
Unless you're going to do something else while waiting for the response,
you may as well just wait.

Tim



Re: Creating a Link Checker by Tim

Tim
Wed Aug 16 11:50:44 CDT 2006

Maybe you could explain what you mean by
"see if links are hardcoded"

If you want to check for functioning links then it makes more sense to check the links and not the filesystem.

--
Tim Williams
Palo Alto, CA




Re: Creating a Link Checker by wreed

wreed
Wed Aug 16 11:59:53 CDT 2006

Hardcoded means the server name appears in the href attribute. But you
make a good point: checking for hardcoded links would be a file system
check, while checking for functioning links would go through HTTP....I
wish I could just use a client-based link checker.
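
The file system half of this could look roughly like the sketch below.
It walks a folder tree with FileSystemObject and logs every href that
starts with http:// or https://. The root path, log file name, file
extensions, and the regular expression are all assumptions for
illustration; a regex will miss links built by script, and it reports
every absolute link, so links to other sites show up too.

```vbscript
Option Explicit

Const cROOT = "C:\Inetpub\wwwroot"   ' hypothetical site root
Const cLOG  = "hardcoded.log"        ' hypothetical report file

Dim objFSO, objLOG, objRE
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objLOG = objFSO.OpenTextFile(cLOG, 2, True)

' Capture the URL of any href that begins with http:// or https://
Set objRE = New RegExp
objRE.Pattern = "href\s*=\s*[""']?(https?://[^""'\s>]+)"
objRE.IgnoreCase = True
objRE.Global = True

ScanFolder objFSO.GetFolder(cROOT)
objLOG.Close

' Scan the pages in one folder, then recurse into its subfolders.
Sub ScanFolder(objFolder)
    Dim objFile, objSub
    For Each objFile In objFolder.Files
        Select Case LCase(objFSO.GetExtensionName(objFile.Name))
            Case "htm", "html", "asp"
                ScanFile objFile.Path
        End Select
    Next
    For Each objSub In objFolder.SubFolders
        ScanFolder objSub
    Next
End Sub

' Write one "file <tab> URL" line per absolute link found.
Sub ScanFile(strPath)
    Dim objTS, strHTML, objMatch
    ' ReadAll raises an error on an empty file, so skip those.
    If objFSO.GetFile(strPath).Size = 0 Then Exit Sub
    Set objTS = objFSO.OpenTextFile(strPath, 1)
    strHTML = objTS.ReadAll
    objTS.Close
    For Each objMatch In objRE.Execute(strHTML)
        objLOG.WriteLine strPath & vbTab & objMatch.SubMatches(0)
    Next
End Sub
```

To flag only links that name your own server, the pattern could be
narrowed to that host name.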


Re: Creating a Link Checker by ken

ken
Sun Aug 20 11:23:07 CDT 2006

LinkScan can be installed and executed on the server. It can
be scheduled to run automatically, produce reports, email
them to selected addressees and more.

And although LinkScan does not normally differentiate between
relative and absolute links (your "hardcoded") there are
some techniques that can be applied depending on exactly
what you want to accomplish. Contact the vendor on that.

As an aside, the HTTP HEAD method is a nice idea, but sadly
it is not supported (or not properly supported) by *many*
modern servers/applications. So, unless you *know* it will
work accurately in your specific case, don't waste your time
with the inevitable false positives and false negatives.
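
One way a script could hedge against servers that mishandle HEAD is to
treat a failed HEAD as inconclusive and confirm it with a GET. A sketch
under that assumption; the function names are hypothetical, and it only
guards against false negatives (a server that wrongly answers 200 to
HEAD would still slip through):

```vbscript
Option Explicit

' Returns the HTTP status for the given method, or 0 if the request
' could not be sent at all.
Function RequestStatus(strMethod, strURL)
    Dim objXML
    Set objXML = CreateObject("Microsoft.XMLHTTP")
    RequestStatus = 0
    On Error Resume Next
    objXML.Open strMethod, strURL, False
    objXML.Send
    If Err.Number = 0 Then RequestStatus = objXML.Status
    On Error GoTo 0
    Set objXML = Nothing
End Function

' Try the cheap HEAD first; fall back to a full GET before
' reporting the link as broken.
Function LinkWorks(strURL)
    LinkWorks = (RequestStatus("HEAD", strURL) = 200)
    If Not LinkWorks Then
        LinkWorks = (RequestStatus("GET", strURL) = 200)
    End If
End Function

If LinkWorks("http://www.google.com") Then
    WScript.Echo "+ http://www.google.com"
Else
    WScript.Echo "- http://www.google.com"
End If
```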


Re: Creating a Link Checker by Tim

Tim
Sun Aug 20 13:29:52 CDT 2006

Ken,

I wasn't aware that the HEAD method is not supported by many servers.
Do you have any examples of specific platforms which do not respond to such
requests?

Thanks
Tim