I need a routine to parse _cliptext for name, email, web URL, phone,
cellphone, street, city, zip, etc., i.e. an address grabber that will
slice and dice _cliptext so I can REPLACE the matching fields in the
current record when I right-click (On Key Label RightMouse Do Parse.prg).

Usually, but not always, each of the fields I copy to _cliptext is
separated from the next by a carriage return, chr(13).

Parsing is getting pretty hairy because I'm grabbing more than one
field at a time to the clipboard from a VARIETY of websites, each
laid out differently.

There is no telling whether I will be grabbing a person's name, his
email, his cellphone, his home address, his web site, or a combination
of two or three things.

One website will show phone preceding email, while another will
display the same information with email preceding the phone number.

The next person's information will be missing the phone, but show his
address or something like that.

Wondering if anyone has some code already written that would help me.

Up until now I have stripped out carriage returns along with other
funny characters, Chr(1) through Chr(31), and run tests on a single-line
_cliptext looking for clues like "@" (email), "www" (website), "Phone"
(phone).

Now I'm thinking it is better to leave the carriage returns in, count
the lines using memlines(_cliptext), categorize mline(_cliptext,1),
mline(_cliptext,2), mline(_cliptext,3) and so on, and store each line
to memory variables m.phone, m.cellphone, m.street, m.email, m.website.
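
A minimal sketch of that line-by-line idea, assuming VFP 6. It uses
ALINES() rather than MEMLINES()/MLINE() because ALINES() splits only on
the actual line breaks, while MLINE() also wraps long lines at SET
MEMOWIDTH. ClassifyLine is a hypothetical helper, and its clue words are
guesses taken from the examples in this post, so they would need tuning
for each web site.

```foxpro
* Split the clipboard into lines and label each one.
LOCAL ARRAY laLines[1]
LOCAL lnCount, lnI
lnCount = ALINES(laLines, _CLIPTEXT, .T.)   && .T. trims each line
FOR lnI = 1 TO lnCount
   ? lnI, ClassifyLine(laLines[lnI]), laLines[lnI]
ENDFOR
RETURN

* ClassifyLine: hypothetical helper, not a built-in function.
* Returns a category name based on simple clue words.
FUNCTION ClassifyLine
LPARAMETERS tcLine
LOCAL lcLow
lcLow = LOWER(ALLTRIM(tcLine))
DO CASE
CASE EMPTY(lcLow)
   RETURN "blank"
CASE "@" $ lcLow
   RETURN "email"
CASE "www." $ lcLow OR "http://" $ lcLow
   RETURN "website"
CASE "cell" $ lcLow OR "mobile" $ lcLow
   * tested before "phone" because "cellphone" contains "phone"
   RETURN "cellphone"
CASE "phone" $ lcLow OR "phn" $ lcLow
   RETURN "phone"
OTHERWISE
   RETURN "unknown"
ENDCASE
ENDFUNC
```

Lines that come back as "unknown" (names, street lines) still need the
per-site logic; the sketch only handles the fields that carry an obvious
clue.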

Any help appreciated.
John "J.J." Jackson

Re: parse website _cliptext for name, email, web URL, phone, cellphone, street, city, zip by swdev2

swdev2
Wed Feb 16 01:29:15 CST 2005

I've written something similar, but most of it is under NDA.
It encapsulates the browser, finds the IE page objects and their values,
and sucks out the values.
Try some Google searches on VFP Screen Scraping, VFP Web Page Scraping. As
I remember from 18 months ago, those results got me on the right track.
hth - mondo regards [Bill]
--
William Sanders / Electronic Filing Group Remove the DOT BOB to reply via
email.
Re: parse website _cliptext for name, email, web URL, phone, cellphone, street, city, zip by Andrew

Andrew
Wed Feb 16 09:46:53 CST 2005

JJ wrote:
> There is no telling whether I will be grabbing a persons name, his
> email, his cellphone, his home address, his web site. or a combination
> of 2 or three things.

Sorry, don't have anything to share but I think the way forward is to split
the text up with the CHR(13) delimiters and then use regex to match the
fields that aren't empty.

I think you'll really struggle to see the difference between lines of an
address and a name though. If you give me a foreign name and address all
mixed up I couldn't do it myself [ie brainpower] so programming something is
going to be even harder!

--
HTH
Andrew Howell



Re: parse website _cliptext for name, email, web URL, phone, cellphone, street, city, zip by JJ

JJ
Wed Feb 16 11:25:36 CST 2005

>Sorry, don't have anything to share but I think the way forward is to split
>the text up with the CHR(13) delimiters and then use regex to match the
>fields that aren't empty.

What is regex?

I'm running VFP 6

>I think you'll really struggle to see the difference between lines of an
>address and a name though. If you give me a foreign name and address all
>mixed up I couldn't do it myself [ie brainpower] so programming something is
>going to be even harder!

Conditional logic will go a long way for me.

Say a particular web site yields memlines=4 and I test that
mline(_cliptext,3) contains the word "Phone:". Then it's a good bet that
_cliptext came from this particular web site, and that would determine
the contents of mline(_cliptext,1), mline(_cliptext,2),
mline(_cliptext,4), etc.

I think I can get it to work, I'm just struggling with the best
approach.

John "J.J." Jackson

Re: parse website _cliptext for name, email, web URL, phone, cellphone, street, city, zip by Andrew

Andrew
Thu Feb 17 02:52:31 CST 2005

JJ wrote:
>> Sorry, don't have anything to share but I think the way forward is
>> to split the text up with the CHR(13) delimiters and then use regex
>> to match the fields that aren't empty.
>
> What is regex?

A regular expression: a powerful (and slow ;)) form of pattern matching. I'm
sure I've seen other people using some sort of library or similar to do
regex matching in VFP. There is a website here with a list of contributed
expressions to match email addresses, postal addresses and URLs.

http://www.regexlib.com/DisplayPatterns.aspx

--
HTH
Andrew Howell




Re: parse website _cliptext for name, email, web URL, phone, cellphone, street, city, zip by Dan

Dan
Thu Feb 17 13:34:40 CST 2005

Andrew Howell wrote:
> matching. I'm sure I've seen other people using some sort of library
> or similar to be able to do regex matching in VFP

It's a class in the FFC, which is really just a wrapper around the Windows
Scripting Host.
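
For anyone without that class, a rough sketch of calling the scripting
engine's regular expression object directly. It assumes the Windows
Script components (VBScript 5.x) are installed so that the
VBScript.RegExp COM class can be created. Both patterns are deliberately
loose illustrations, and the e-mail address is made up.

```foxpro
LOCAL loRE, loMatches, lcLine
loRE = CREATEOBJECT("VBScript.RegExp")
loRE.IgnoreCase = .T.

* hypothetical sample line
lcLine = "Email: jdoe@example.com"

* loose e-mail pattern, for illustration only
loRE.Pattern = "[a-z0-9._%-]+@[a-z0-9.-]+\.[a-z]{2,}"
IF loRE.Test(lcLine)
   loMatches = loRE.Execute(lcLine)
   ? "looks like an e-mail:", loMatches.Item(0).Value
ENDIF

* US-style phone number such as 561-832-4663 or (561) 832-4663
loRE.Pattern = "\(?\d{3}\)?[ .-]?\d{3}[ .-]\d{4}"
? loRE.Test("Phn: 561-832-4663")
```

Test() only answers yes or no; Execute() returns the matched text, which
is what you would store in the field.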

Dan



Re: parse website _cliptext for name, email, web URL, phone, cellphone, street, city, zip by darrell

darrell
Fri Feb 18 17:33:39 CST 2005

Could you post a few websites that are of interest to you?

As SWDev2 alluded to, it'd be better, and easier, to use IE's object model
to scrape the pages in question.

If you post some sites, I can post a little code showing a methodology to
scrape the page using IE.
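
As a starting point, a minimal sketch of driving IE from VFP through COM
automation. It assumes Internet Explorer is installed, and the URL is a
placeholder rather than one of the sites under discussion.

```foxpro
LOCAL loIE, lcText
loIE = CREATEOBJECT("InternetExplorer.Application")
loIE.Visible = .F.
* placeholder URL, substitute the page you want to read
loIE.Navigate("http://www.example.com/roster.html")

* wait until the page has finished loading (READYSTATE_COMPLETE = 4)
DO WHILE loIE.Busy OR loIE.ReadyState # 4
   = INKEY(0.2)
ENDDO

* visible text of the page, without the HTML tags
lcText = loIE.Document.body.innerText
loIE.Quit()

? LEFT(lcText, 200)
```

From there the text can be parsed like _cliptext, or the Document object
can be walked element by element when the page has a regular structure.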

BTW, what's the program's ultimate purpose?

Darrell

Re: parse website _cliptext for name, email, web URL, phone, cellphone, street, city, zip by JJ

JJ
Mon Feb 21 13:21:45 CST 2005

On Fri, 18 Feb 2005 23:33:39 GMT, "darrell" <someone@somewhere.com>
wrote:

>If you post some sites, I can post a little code showing a methodology to
>scrape the page using IE.

1)
http://www.jupiterarea.com/memship_roster.html

2)
In this example you must search on a category to get to the address
page:

http://www.npbchamber.com/index.php?category=chamber&section=membership&sub=directory

3)
http://www.mmsi-ecom.com/rmls/roster/roster.pl?Param=rmls-roster&pass=firmSearch&FIRM=Coldwell&CITY=&ZIP=&SORT=FIRMNAME&MAX=40&WEBPAGEONLY=

4)
http://www.switchboard.com/
(Find a person, business, or do a reverse lookup on a phone number to
get address, zip, phone)

>BTW, what's the program's ultimate purpose?

Data collection and cleanup.

Eventually this data will be imported into something like ACT!,
Goldmine, MSCRM for purposes of marketing mortgages to realtors.

I'm not much of a FoxPro programmer, but I grew up on dBase and still
find it easier to manipulate this data, eliminate duplicate records,
merge records from several sources, and add details like cellphone,
email, website, and home address from various web sites into Visual
FoxPro 6 Browse windows, using _cliptext, my mouse, the function keys,
etc.

I had it pretty automated but am rewriting it in a procedure file.

I have a program that opens the database several times in browse
windows and sets relations since I bounce back and forth in the way I
look at the data.

I use the function keys like this:

***********************
*keylabset.prg
on key label CTRL+W ? chr(7)
on key label CTRL+Q do appgatcomp
on key label rightmouse do cleanup && was sbphone.prg
on key label f1 do cleanup && sbphone
on key label f2 do scatter
on key label f3 do gather
on key label f4 do appgat
on key label f5 do scatcomp
on key label f6 do gatcomp
on key label f7 do appgatcomp
on key label f8 browse last nowait
*on key label f8 do browslave
on key label f9 do scatrealhome
on key label f10 do gatrealhome
*on key label f11 do realtolast
on key label f11 do lasttoreal
on key label f12 do dedup
return
**********************

Then I'm building this procedure file, bookproc.prg, which will be used
when I right-click on a particular record in one of the browse windows
after copying data from a web site to the clipboard:
**************************
* bookproc.prg procedure file called by setup
*********
PROCEDURE CleanUp
clear
*set memowidth to 100
public dirty, split, CutSource, VarName, mname, mphone, mcell, ;
memail, mweb, mfax, mcompany, mstreet, mcity, mzip
store "" to dirty, split, CutSource, mname, mphone, mcell, memail, ;
mweb, mfax, mcompany, mstreet, mcity, mzip
dirty=alltrim(_cliptext)
x = ""
FOR Ctr = 1 TO 31
If not Ctr=13
&& ignore/keep carriage returns
x = x + CHR(Ctr)
&& x keeps growing to include all offending characters
Endif
NEXT
&& next must be same as EndFor
dirty = CHRTRAN(dirty,x,"")
&& actual replacement takes place HERE
do ShowVar with "dirty"
For Ctr = 1 to Memlines(dirty)
test=mline(dirty,ctr)
Do SplitLine
Endfor
do ShowVar with "split"
For Ctr = 1 to Memlines(split)
test=mline(split,ctr)
Do Paste
Endfor
RETURN
*********
PROCEDURE LastToReal
parameters plast
&& experimental code
If rectype="R" and rlast=" " and real =" " && *
clipx=alltrim(clipx)
repl rlast with plast
&& with clipx
keyboard '{tab}' && if rlast is the highlighted field, the replacement
* doesn't finish till you tab out of the field
keyboard '{backtab}'
repl real with trim(substr(rlast,at(", ",rlast)+2))+" "+ ;
substr(rlast,1,at(",",rlast)-1)
If not "Jr." $real and not "Sr." $real
repl real with strtran(real,".","")
Endif
Endif
? CHR(7)
return
*********
PROCEDURE Email
parameters pmail
&& experimental code
pmail=strtran(lower(strtran(strtran(pmail," ",""),":","")),"email","")
*pmail=strtran(pmail,"mailto","")
Do Case
Case empty(remail)
repl remail with pmail
&& clipx
Case alltrim(remail) $pmail
&& clipx
repl remail with pmail
&& clipx
Case pmail $remail
&& clipx
set bell to uhoh
? chr(7)
Wait
Wait Window pmail+" already in remail" timeout 5 && clipx
set bell to chimes
Case len(pmail+" "+trim(remail2nd))<=Fsize('remail2nd')
&& clipx
repl remail2nd with pmail+" "+trim(remail2nd)
&& clipx
Case len(pmail+" "+trim(remail))<=Fsize('remail')
&& clipx
repl remail with pmail+" "+trim(remail)
&& clipx
Otherwise
set bell to uhoh
? chr(7)
set bell to chimes
_cliptext=pmail
wait
wait window " pmail =>_cliptext too big, CTRL+V to paste: "+_cliptext
EndCase
*********
Procedure Fax
parameters pfax
pfax=strtran(strtran(strtran(strtran(strtran(lower(pfax),"fax",""), ;
":",""),")","-"),"(")," ","")
do NotCoded with "Fax, memlines(split)="+ltrim(str(memlines(split)))
*********
PROCEDURE NotCoded
parameter pfrom
do UhOh
Wait Window "Not Coded, Procedure="+pfrom
Return
*********
PROCEDURE Paste
Do Case
Case memlines(split)=1

Do Case
Case "@" $split or "email" $lower(split)
do Email with split
Case ", " $split
do LastToReal with split
Case "fax"$lower(split)
do Fax with split
Case "cell" $lower(split) or "mobile"$lower(split)
do PhoneCell with split
Case "Other: " $split && Other rphone from M=MMSI
do PhoneOther with split
Case ("Sorry" $split or "people" $split) and not empty(rzip) && S: no people rzip or people wrong street
If empty(rphone)
repl rphone with "NoS@add"
Else
repl remail with trim(remail) +" NoS@add"
Endif
? chr(7)
Case "match" $split and not empty(rphone)
&& S: nobody found @ rphone
repl remail with trim(remail)+" NoSadd4Ph"
? chr(7)
Case "person" $split and empty(rphone) and empty(rzip) && #rphone #radd First Last Failed
repl rphone with "NoS4Nm"
? chr(7)
Case "Inquiry" $split or "Quick" $split
repl remail with trim(remail)+" NoM", ;
recsource with strtran(recsource,"M","") && get rid of M cause not on file at MMSI
? chr(7)
Otherwise
do NotCoded with "Paste, memlines(split)=1"
Endcase

Case memlines(split)=2

Do Case
Case ", "$mline(split,1) and "@"$mline(split,2)
do LastToReal with mline(split,1)
do email with mline(split,2)
Otherwise
do NotCoded with "Paste, memlines(split)=2, Otherwise"
Endcase

Case memlines(split)=3
do NotCoded with "Paste, memlines(split)=3"
Case memlines(split)=4
do NotCoded with "Paste, memlines(split)=4"
Case memlines(split)=5
do NotCoded with "Paste, memlines(split)=5"
Case memlines(split)=6
do NotCoded with "Paste, memlines(split)=6"
Case memlines(split)=7
do NotCoded with "Paste, memlines(split)=7"
Case memlines(split)=8
do NotCoded with "Paste, memlines(split)=8"
Otherwise
do NotCoded with "Paste, Otherwise3"
EndCase
*********
PROCEDURE PhoneCell
parameters Pcell
pcell=strtran(strtran(strtran(strtran(strtran(lower(pcell)," ",""),".","-"),":",""),"(",""),")","-")
* lower() above lets "Cell", "Mobile" and "Cellular" labels match here
pcell=strtran(strtran(strtran(pcell,"cellular",""),"mobile",""),"cell","")
repl Rcellphone with pcell
If not "W" $recsource
repl recsource with trim(recsource)+"W"
Endif
? chr(7)
*********

PROCEDURE PhoneOther
parameter pother
If empty(rphone) and empty(rphsource)
repl rphsource with "M", rphone with substr(pother,at("Other: ",pother)+7,12)
Else
repl remail2nd with ltrim(trim(remail2nd)+" M"+substr(pother,at("Other: ",pother)+7,12))
Endif
If not "M"$recsource
repl recsource with "M"+recsource
Endif
? chr(7)
*********
PROCEDURE ShowVar
&& do ShowVar with "dirty" (name in quotes) so you can display pshow
parameters pshow
?
? "CutSource: ", Cutsource
? pshow
local pval
pval = evaluate(pshow)
For Ctr = 1 to Memlines(pval)
? ctr, mline(pval,ctr)
EndFor
ENDPROC
*********
PROCEDURE ShowVarjj
&& do ShowVarjj with dirty (no quotes around dirty)
parameters pshow
?
? "CutSource: ", Cutsource
For Ctr = 1 to Memlines(pshow)
? ctr, mline(pshow,ctr)
EndFor
?
*********
PROCEDURE SplitLine
&& to further parse lines containing more than one field
Do Case
Case ", FL " $test and "Phn: "$test && West Palm Bch, FL 33401-7918 Phn: 561-832-4663
test=strtran(test,", FL ",chr(13)+"FL"+chr(13))
test =strtran(test," Phn: ",chr(13)+"Phn: ")
CutSource="MFirm"
Case ", FL " $test
test=strtran(test,", FL ",chr(13)+"FL"+chr(13))
CutSource="Address"
Case "@"$test and "Office: "$test
test=strtran(test," Office: ",chr(13)+"Office: ")
CutSource="Mmember"
EndCase
If not ''=split
split=split+chr(13)+test
Else
split=test
Endif
*********
PROCEDURE UhOh
set bell to UhOh
? chr(7)
set bell to chimes
*********



*************************
John "J.J." Jackson

Re: parse website _cliptext for name, email, web URL, phone, cellphone, street, city, zip by gerry

gerry
Mon Feb 21 13:55:33 CST 2005

> 1)
> http://www.jupiterarea.com/memship_roster.html

wow - now that is one absolute piece of crap web page.
every member address section is wrapped in 200+ ( just a guess ) FONT tags !!!!
no wonder it takes this page so long to load.
and there are maybe 6 or more ( unintentionally ? ) different formatting
styles used.




anyway, I didn't look through all that code , but I have done some html
parsing ( lottery results ), if all you are doing is parsing data from the
html text, look into using CHRTRAN and STRTRAN to get rid of any unnecessary
noise from the data and STREXTRACT to 'chunk up' the data and then to zero
in on the parts that you want.

here is an actual code snippet that parses lottery results from a retrieved
html page, maybe i got lucky with having to parse much cleaner data, but
this code is much much simpler and shorter than what you seem to be using :

* data is in 1st table following FORM
m.htm=STREXTRACT(m.htm,'<TABLE WIDTH=100% BORDER=1 BGCOLOR="#F5F5F5">', ;
"</table>",1,3)
IF empty(m.htm) THEN
WAIT WINDOW m.fname+CHR(10)+"Result table not found !!"
SET STEP ON
ENDIF
m.htm=STRTRAN(m.htm,"<B>","",1,999,3)
m.htm=STRTRAN(m.htm,"</B>","",1,999,3)

m.ohtm=m.htm
m.htm=CHRTRAN(m.htm,CHR(9)+CHR(10)+CHR(13)," ")
DO WHILE len(m.ohtm)#LEN(m.htm)
m.ohtm=m.htm
m.htm=STRTRAN(m.htm,"  "," ") && collapse double spaces into single spaces
ENDDO
m.htm=STRTRAN(m.htm,"> <","><")

* extract data one table row at a time
LOCAL rcnt , rw , datafound
LOCAL dt,n1,n2,n3,n4,n5,n6,b
m.datafound=.f.
FOR m.rcnt=1 TO 999999

* get next row in table
m.rw=STREXTRACT(m.htm,"<tr>","</tr>",m.rcnt,1)
IF len(m.rw)=0 then
EXIT
ENDIF

* are we interested in this row ?
IF LEFT(m.rw,41) == '<TD VALIGN=MIDDLE ROWSPAN=5 CLASS="body">' THEN
* parse out fields
m.dt=STREXTRACT(m.rw,">","<",1,1)
m.n1=STREXTRACT(m.rw,">","<",3,1)
m.n2=STREXTRACT(m.rw,">","<",5,1)
m.n3=STREXTRACT(m.rw,">","<",7,1)
m.n4=STREXTRACT(m.rw,">","<",9,1)
m.n5=STREXTRACT(m.rw,">","<",11,1)
m.n6=STREXTRACT(m.rw,">","<",13,1)
m.b=STREXTRACT(m.rw,">","<",15,1)

IF !SEEK( m.dt , "lotto649" , "dt" ) THEN
APPEND BLANK
ENDIF
GATHER memvar
ENDIF

NEXT

Re: parse website _cliptext for name, email, web URL, phone, cellphone, street, city, zip by JJ

JJ
Wed Feb 23 09:27:10 CST 2005

On Mon, 21 Feb 2005 14:55:33 -0500, "gerry" <germ@hotmail.com> wrote:

>if all you are doing is parsing data from the
>html text, look into using CHRTRAN and STRTRAN to get rid of any unnecessary
>noise from the data and STREXTRACT to 'chunk up' the data and then to zero
>in on the parts that you want.

I understand chrtran and strtran and am using

PROCEDURE CleanUp
parameters pclean
x = ""
FOR Ctr = 1 TO 31
If not Ctr=13
&& ignore/keep carriage returns
x = x + CHR(Ctr)
&& x keeps growing to include all offending characters
Endif
EndFor
pclean = CHRTRAN(pclean,x,"")
&& actual replacement takes place HERE

to get rid of funny characters.

STREXTRACT is not listed in my VFP 6.0 help file, perhaps a new
command in later versions...?

If so, out of curiosity, what does it do?

What VFP 6 commands could I use in its place?

I will have to study your code sample really hard to understand what
it is doing, but I gather you are scraping "table oriented" web pages
to APPEND new records to a database.

I'm not doing any mass appends right now, but it might be useful for
me on the following two pages

1) Firm Search
http://www.mmsi-ecom.com/rmls/roster/roster.pl?Param=-roster&pass=firmLookup

and

2) Member Search
http://www.mmsi-ecom.com/rmls/roster/roster.pl?Param=-roster&pass=memberLookup

After you key in a Firm Name you will get (search 1) all the offices
with that name and (search 2) all the employees of that firm (have to
narrow it down with zip code or something for really common office
names)

For real common office names where multiple locations are
indistinguishable by office name search 2 helps me determine WHICH of
these offices an employee works at when that data is not known.

This data is the icing on my cake. I have MOST of the firm names and
most of the employees in my .dbf already. What I don't have is their
emails.

I don't think APPENDING new records is the way to go here. In Search 2
I'm copying a single employees email and "Other: Phone" to the
clipboard for insertion into the existing record.

Switchboard.com and a mishmosh of other company and employee websites
with no predictable format are being used to gather home phones, more
emails (when not in Search 2), and cellphones etc.

As far as Search 2 is concerned, I suppose I could do some sort of
mass APPEND followed by a compare and MERGE to my existing data, but
people like "John Smith" would get tricky. I'd have to compare
employee name, company name, office phone from the Member Search page
to my existing records.

If I could grab the entire database off that site (Search 2) that
would be one thing, but if I have to wade through it a page at a time,
which is what I'm doing, then my current approach seems better.

However there are some individual firms I don't have, especially
outside the geography of my other data source.

In this case there is a good chance that if I don't have the firm I
may not have any of its employees either (unless they changed jobs) or
maybe I have a small number of employees at this out of area geography
who chose to join an organization in my area.

Here, parsing the entire Search 2 page(s) for that Firm name might be
quicker than Append Blank followed by pasting particulars from the
clipboard using my macros.

!!!!!!!!!!!!!!!!!!!!!

If you know how to grab the entire database, using FoxPro or some
other method I'd be interested to know how.

!!!!!!!!!!!!!!!!!!!!!

*************************
John "J.J." Jackson

Re: parse website _cliptext for name, email, web URL, phone, cellphone, street, city, zip by gerry

gerry
Thu Feb 24 01:17:30 CST 2005



"JJ" <jjyg@adelphia.net> wrote in message
news:9k6p119m4ohcfeuh7pdmd6r2r24usgf2uf@4ax.com...
> On Mon, 21 Feb 2005 14:55:33 -0500, "gerry" <germ@hotmail.com> wrote:
>
...
>
> STREXTRACT is not listed in my VFP 6.0 help file, perhaps a new
> command in later versions...?
>
> If so out of curiousity what does it do?
>
> What VFP 6 commands could I use in its place?

STREXTRACT was introduced in v7.
it parses out strings using starting and ending string delimiters.
the equivalent in 6 would be to locate the start string with AT(), do a
second AT() for the end string from the end of the start string, and then
do the math to pull out the substring with SUBSTR().
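
The steps above can be sketched for VFP 6 roughly like this. StrBetween
is a hypothetical name, not a built-in, and it only covers the basic
case: case-sensitive delimiters and an optional occurrence number for
the start string. It does not reproduce STREXTRACT's flag parameter.

```foxpro
* StrBetween: hypothetical stand-in for STREXTRACT on VFP 6.
* Returns the text between the tnOccur-th tcBegin and the next tcEnd.
FUNCTION StrBetween
LPARAMETERS tcText, tcBegin, tcEnd, tnOccur
LOCAL lnStart, lnLen
IF TYPE("tnOccur") # "N"
   tnOccur = 1
ENDIF
* locate the requested occurrence of the start string
lnStart = AT(tcBegin, tcText, tnOccur)
IF lnStart = 0
   RETURN ""
ENDIF
* first character after the start string
lnStart = lnStart + LEN(tcBegin)
* second locate: end string, searched from the end of the start string
lnLen = AT(tcEnd, SUBSTR(tcText, lnStart)) - 1
IF lnLen < 0
   RETURN ""
ENDIF
* do the math to pull out the substring
RETURN SUBSTR(tcText, lnStart, lnLen)
ENDFUNC
```

For example, StrBetween("Phone: 561-832-4663 Fax:", "Phone: ", " Fax")
returns "561-832-4663". Swapping AT() for ATC() would make the search
case-insensitive.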