Hi all,

While this is not specific to ASP.NET, it does seem like the most
appropriate forum (that I could find) to ask the following:

I'm trying to write a screen scraper that needs to submit a POST request,
then scrape the response. I've tried to determine the contents of the POST
via Firebug but can't figure it out. I'm wondering if any of you could
recommend an application that will show exactly what is submitted when I
click Search on the following page:
https://nppes.cms.hhs.gov/NPPES/NPIRegistrySearch.do?subAction=reset&searchType=ind

I've never written a "scraper" before, so I'm not sure how it's done but I
imagine I would do something like this:
Once I manage to POST the form, I will read the response into some kind of
DOM object, then locate the elements I'm interested in and read their
innerHtml properties.
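
A rough sketch of that "read the response, locate the elements, read their
contents" step, assuming Python and its standard-library html.parser (the
thread itself names no language). The CellCollector class, the extract_cells
helper, and the sample markup are all hypothetical; the real results page may
be structured quite differently.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of every <td> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_cell = False
        self._buf = []
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # Start buffering text when a table cell opens.
        if tag == "td":
            self._in_cell = True
            self._buf = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is still part of the cell.
        if self._in_cell:
            self._buf.append(data)

    def handle_endtag(self, tag):
        # Store the accumulated text when the cell closes.
        if tag == "td" and self._in_cell:
            self.cells.append("".join(self._buf).strip())
            self._in_cell = False


def extract_cells(html):
    """Return the stripped text of each <td> in the given HTML string."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


# Hypothetical markup; the NPI value is a placeholder, not real data.
SAMPLE_HTML = (
    "<table><tr><td>1234567890</td>"
    "<td> David <b>Paskil</b> </td></tr></table>"
)
```

Calling extract_cells(SAMPLE_HTML) gives the text of each cell in document
order, roughly what reading innerHtml stripped of markup would produce.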

Anyway, that's how I imagine it will work. First hurdle: POSTing the form.

Thanks for any help,
Steve

RE: viewing the contents of a POST request by AnthonyJones

AnthonyJones
Fri Mar 14 12:32:03 CDT 2008


"sklett" wrote:

> Hi all,
>
> While this is not specific to ASP.NET, it does seem like the most
> appropriate forum (that I could find) to ask the following:
>
> I'm trying to write a screen scraper that needs to submit a POST request,
> then scrape the response. I've tried to determine the contents of the POST
> via Firebug but can't figure it out. I'm wondering if any of you could
> recommend an application that will show exactly what is submitted when I
> click Search on the following page:
> https://nppes.cms.hhs.gov/NPPES/NPIRegistrySearch.do?subAction=reset&searchType=ind
>
> I've never written a "scraper" before, so I'm not sure how it's done but I
> imagine I would do something like this:
> Once I manage to POST the form, I will read the response into some kind of
> DOM object, then locate the elements I'm interested in and read their
> innerHtml properties.
>
> Anyway, that's how I imagine it will work. First hurdle: POSTing the form.
>

http://www.fiddlertool.com/fiddler

This is an excellent debugging proxy. Once it is running, you can use the site
and then review in detail the conversation the browser has with the site.

--
Anthony Jones - MVP ASP/ASP.NET


Re: viewing the contents of a POST request by sklett

sklett
Fri Mar 14 12:54:37 CDT 2008


"Anthony Jones" <AnthonyJones@discussions.microsoft.com> wrote in message
news:E2BA9AEC-B7A2-4AC8-8638-67D561555589@microsoft.com...

> http://www.fiddlertool.com/fiddler
>
> This is an excellent debugging proxy. Once it is running, you can use the
> site and then review in detail the conversation the browser has with the
> site.
>
> --
> Anthony Jones - MVP ASP/ASP.NET
>

Thanks for the tip Anthony!

I've already downloaded it and captured the info I need (I think). It looks
like this is what I need to send in my request:

<POST data>

POST /NPPES/NPIRegistrySearch.do HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/xaml+xml, application/vnd.ms-xpsdocument, application/x-ms-xbap, application/x-ms-application, */*
Referer: https://nppes.cms.hhs.gov/NPPES/NPIRegistrySearch.do?subAction=reset&searchType=ind
Accept-Language: en-us
Content-Type: application/x-www-form-urlencoded
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648)
Host: nppes.cms.hhs.gov
Content-Length: 163
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: JSESSIONID=0000b48FKemf-yxDhvGDlgxB4Nf:12l6retk3

org.apache.struts.taglib.html.TOKEN=643755f453b294990412442d6e4fb304&searchType=ind&searchNpi=&firstName=David&lastName=Paskil&city=&state=CA&zip=&subAction=Search

</POST data>

If I'm reading this correctly, the ONLY data included in the body of the
request is the form field/value pairs.
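
For illustration, here is a minimal sketch of rebuilding that body, assuming
Python's standard urllib (not something used in this thread). The
build_search_request helper is hypothetical; the field names and the URL come
from the capture above. The TOKEN value and the JSESSIONID cookie look
session-specific, so a real scraper would presumably need to fetch the form
page first and pull fresh values from it; that part is an assumption and is
not shown. The request is only constructed here, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# URL taken from the captured request line and Host header.
FORM_URL = "https://nppes.cms.hhs.gov/NPPES/NPIRegistrySearch.do"


def build_search_request(token, first_name, last_name, state):
    """Build (but do not send) a POST mirroring the captured form fields."""
    # Field order follows the capture; empty fields are still sent.
    fields = {
        "org.apache.struts.taglib.html.TOKEN": token,
        "searchType": "ind",
        "searchNpi": "",
        "firstName": first_name,
        "lastName": last_name,
        "city": "",
        "state": state,
        "zip": "",
        "subAction": "Search",
    }
    # urlencode produces the name=value&name=value body format.
    body = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(FORM_URL, data=body, headers=headers, method="POST")
```

With the token from the capture, "David", "Paskil" and "CA", the resulting
body is 163 bytes, which matches the Content-Length header above.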

I will use Fiddler to sniff my own traffic when trying to POST - once my
request looks exactly the same as the browser's (minus the session ID) I
should be in good shape.
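
That comparison could also be done in code rather than by eye. A small
sketch, again assuming Python: body_differences is a hypothetical helper that
parses two captured bodies and reports the fields that differ. Ignoring the
Struts TOKEN field is my assumption, on the guess that it changes per session
much like the session ID does.

```python
from urllib.parse import parse_qsl

# Fields assumed to change between sessions and therefore skipped.
VOLATILE = {"org.apache.struts.taglib.html.TOKEN"}


def body_differences(expected, actual, ignore=VOLATILE):
    """Map each differing field name to its (expected, actual) values."""
    # keep_blank_values=True so empty fields such as "city=" are compared.
    exp = dict(parse_qsl(expected, keep_blank_values=True))
    act = dict(parse_qsl(actual, keep_blank_values=True))
    diffs = {}
    for key in sorted(set(exp) | set(act)):
        if key in ignore:
            continue
        # A field missing on one side shows up as None on that side.
        if exp.get(key) != act.get(key):
            diffs[key] = (exp.get(key), act.get(key))
    return diffs
```

An empty result means the two bodies carry the same fields and values apart
from the ignored ones; headers would still need a separate check.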

Thanks again,
Steve



Re: viewing the contents of a POST request by Ian

Ian
Fri Mar 14 14:00:39 CDT 2008

I think the Microsoft Network Monitor will do what you want.

"sklett" <s@s.com> wrote in message
news:OyWJOQfhIHA.3788@TK2MSFTNGP03.phx.gbl:

> Hi all,
>
> While this is not specific to ASP.NET, it does seem like the most
> appropriate forum (that I could find) to ask the following:
>
> I'm trying to write a screen scraper that needs to submit a POST request,
> then scrape the response. I've tried to determine the contents of the POST
> via Firebug but can't figure it out. I'm wondering if any of you could
> recommend an application that will show exactly what is submitted when I
> click Search on the following page:
> https://nppes.cms.hhs.gov/NPPES/NPIRegistrySearch.do?subAction=reset&searchType=ind
>
> I've never written a "scraper" before, so I'm not sure how it's done but I
> imagine I would do something like this:
> Once I manage to POST the form, I will read the response into some kind of
> DOM object, then locate the elements I'm interested in and read their
> innerHtml properties.
>
> Anyway, that's how I imagine it will work. First hurdle: POSTing the form.
>
>
> Thanks for any help,
> Steve