I'm trying to write a script that checks each line in a text
file for non-alphabetic characters. I know it can be done
using regular expressions.

Can anyone provide some examples of how this can be done? I have
looked through some, but they all seem complicated, and I haven't done
much scripting in this area.


Thanks in advance.

Re: reg expression by JHP

JHP
Thu Jan 26 14:30:38 CST 2006

Try this:

Option Explicit

Dim objRegEx, strLine, objMatches, objMatch

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True             ' report every match, not just the first
objRegEx.Pattern = "[^a-zA-Z]"     ' any single character that is not a letter
strLine = "1%3a4*D8fg9"
Set objMatches = objRegEx.Execute(strLine)

If objMatches.Count > 0 Then
    For Each objMatch In objMatches
        WScript.Echo objMatch.Value
    Next
End If
Set objMatches = Nothing
Set objRegEx = Nothing
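Each item in the Matches collection is a Match object that also carries FirstIndex (the zero-based position of the match) and Length, so the same loop can report where each non-alphabetic character sits. A minimal sketch using the same sample string; it uses New RegExp, which creates the same object as CreateObject("VBScript.RegExp"):

```vbscript
Option Explicit

Dim objRegEx, objMatch, strLine

Set objRegEx = New RegExp
objRegEx.Global = True            ' find every match, not just the first one
objRegEx.Pattern = "[^a-zA-Z]"    ' any single character that is not a letter

strLine = "1%3a4*D8fg9"
For Each objMatch In objRegEx.Execute(strLine)
    ' FirstIndex is zero-based; add 1 to get the position Mid() and InStr() use
    WScript.Echo "'" & objMatch.Value & "' at position " & (objMatch.FirstIndex + 1)
Next
```

Run it with cscript so each WScript.Echo prints to the console instead of opening a dialog.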

Here are the special characters ('Character Usage') that VBScript regular expressions recognize:

* Matches the preceding character or group zero or more times
+ Matches the preceding character or group one or more times
? Matches the preceding character or group zero or one time
. Matches any single character except a newline
^ Matches the start of the input
$ Matches the end of the input
x|y Matches either x or y; each alternative can be a whole expression (e.g. cat|dog)
(pattern) Matches pattern and remembers the match as a submatch
{number} Matches exactly number times
{number,} Matches number or more times (note the comma)
{num1,num2} Matches at least num1 and at most num2 times (no space after the comma)
[abc] Matches any one of the characters listed between the [ ]
[^abc] Matches any one character except those listed between the [ ]
[a-e] Matches any one character in the specified range (a, b, c, d, e)
[^K-Q] Matches any one character outside the specified range
\ Marks the next character as special, or as a literal (e.g. \. matches a period, \\ a backslash)
\b Matches a word boundary
\B Matches a position that is not a word boundary
\d Matches a digit (same as [0-9])
\D Matches a non-digit
\f Matches a form feed character
\n Matches a newline (line feed)
\r Matches a carriage return
\s Matches any whitespace character (space, tab, form feed, line feed, carriage return, vertical tab)
\S Matches any non-whitespace character
\t Matches a tab
\v Matches a vertical tab
\w Matches A to Z, a to z, 0 to 9, and _
\W Matches any character other than A to Z, a to z, 0 to 9, and _
\number Matches the text captured by group number (a backreference, e.g. \1)
\octal Matches the character with the given octal code (1 to 3 octal digits)
\xhex Matches the character with the given two-digit hexadecimal code (x is required), e.g. \x41 is "A"
\uhex Matches the Unicode character with the given four-digit hexadecimal code, e.g. \u00E9 is "é"
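Two details from this list trip people up in practice. Inside [ ], ^ negates only when it is the first character, and every other character (a comma, a second ^) is just another member of the class; and {n,m} must be written without a space. A small sketch that prints which characters a pattern matches; the helper name MatchedChars and the sample strings are made up for illustration:

```vbscript
Option Explicit

' Returns the substrings of strText matched by strPattern, separated by spaces
Function MatchedChars(strPattern, strText)
    Dim re, m, strResult
    Set re = New RegExp
    re.Global = True
    re.Pattern = strPattern
    For Each m In re.Execute(strText)
        strResult = strResult & m.Value & " "
    Next
    MatchedChars = Trim(strResult)
End Function

' The comma and the second ^ are class members, so ',' and '^' are NOT reported
WScript.Echo MatchedChars("[^a-z,^A-Z]", "ab,c^D1")
' Without them, every non-letter is reported
WScript.Echo MatchedChars("[^a-zA-Z]", "ab,c^D1")
' Two or three digits in a row; a lone digit does not qualify
WScript.Echo MatchedChars("\d{2,3}", "7 42 1234")
```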

"Star" <momo2804@gmail.com> wrote in message
news:1138293786.951167.150060@g49g2000cwa.googlegroups.com...
> I'm trying to write a script that checks each line in a text
> file for non-alphabetic characters. I know it can be done
> using regular expressions.
>
> Can anyone provide some examples of how this can be done? I have
> looked through some, but they all seem complicated, and I haven't done
> much scripting in this area.
>
>
> Thanks in advance.
>



Re: reg expression by James

James
Thu Jan 26 17:48:07 CST 2006

Do you simply want to identify whether a line contains any non-alphabetic
characters, return only the non-alphabetic characters, or discard the
non-alphabetic characters?

Below are some small regular expression functions that accomplish
each of these.

Just identify whether a line contains non-alphabetic characters:

'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dim sLine
sLine = "a5bc%(#5gcv=_+"
MsgBox IsAlpha(sLine)

sLine = "Alphabetic"
MsgBox IsAlpha(sLine)

Function IsAlpha(sLine)
'Returns True if all chars are alphabetic, otherwise False
Dim oRegEx
Set oRegEx = CreateObject("VBScript.RegExp")
oRegEx.Global = True
oRegEx.Pattern = "[^a-zA-Z]"
IsAlpha = Not oRegEx.Test(sLine)
End Function
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
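The original question was about checking every line of a text file, which the function above does not do on its own. A minimal sketch that feeds each line to an IsAlpha-style test through FileSystemObject; the input path C:\input.txt is hypothetical, and the file is assumed to be plain ANSI text:

```vbscript
Option Explicit

Const ForReading = 1
Const sInputFile = "C:\input.txt"   ' hypothetical path; point this at your own file

' True if every character in sLine is A-Z or a-z (an empty line counts as True)
Function IsAlpha(sLine)
    Dim oRegEx
    Set oRegEx = New RegExp
    oRegEx.Pattern = "[^a-zA-Z]"
    IsAlpha = Not oRegEx.Test(sLine)
End Function

' Echoes the number and text of every line that contains a non-alphabetic character
Sub ReportNonAlphaLines(sPath)
    Dim oFSO, oFile, sLine, nLine
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFile = oFSO.OpenTextFile(sPath, ForReading)
    Do While Not oFile.AtEndOfStream
        nLine = oFile.Line            ' number of the line about to be read
        sLine = oFile.ReadLine
        If Not IsAlpha(sLine) Then WScript.Echo nLine & ": " & sLine
    Loop
    oFile.Close
End Sub

If CreateObject("Scripting.FileSystemObject").FileExists(sInputFile) Then
    ReportNonAlphaLines sInputFile
End If
```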

Strip all non-alphabetic characters from the string:

'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dim sLine
sLine = "a5bc%(#5gcv=_+"

MsgBox Only_Alpha(sLine)

Function Only_Alpha(sLine)
'Strips all non-alphabetic characters
Dim oRegEx
Set oRegEx = CreateObject("VBScript.RegExp")
oRegEx.Global = True
oRegEx.Pattern = "[^a-zA-Z]"
Only_Alpha = oRegEx.Replace(sLine, Empty)
End Function
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Strip all alphabetic characters from the string:

'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dim sLine
sLine = "a5bc%(#5gcv=_+"

MsgBox DiscardAlpha(sLine)

Function DiscardAlpha(sLine)
'Strips all alphabetic characters, leaving only the non-alphabetic ones
Dim oRegEx
Set oRegEx = CreateObject("VBScript.RegExp")
oRegEx.Global = True
oRegEx.Pattern = "[a-zA-Z]"
DiscardAlpha = oRegEx.Replace(sLine, Empty)
End Function
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I hope one of these answers your question.



Re: reg expression by Star

Star
Thu Jan 26 21:08:12 CST 2006

James,

You're spot on. What I actually want is to identify the non-alphanumeric
characters in the lines, because we're expecting the lines to contain
some foreign-language words (e.g. Japanese/Korean), and we want to list
them in a text file.

Thanks


Re: reg expression by Michael

Michael
Thu Jan 26 21:13:46 CST 2006

> Set oRegEx = CreateObject("VBScript.RegExp")

FYI: the VBScript engine has "insider knowledge" of the RegExp component,
so this can be simplified to:

Set oRegEx = New RegExp
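With New RegExp the functions get slightly shorter. The IgnoreCase property offers another small simplification: with it set, [^a-z] also treats A to Z as letters. A sketch of IsAlpha written that way, intended to behave like the version with "[^a-zA-Z]" for ASCII letters:

```vbscript
Option Explicit

' Returns True if every character is a letter, otherwise False
Function IsAlpha(sLine)
    Dim oRegEx
    Set oRegEx = New RegExp        ' same object as CreateObject("VBScript.RegExp")
    oRegEx.IgnoreCase = True       ' [a-z] now covers A-Z as well
    oRegEx.Pattern = "[^a-z]"      ' any character that is not a letter
    IsAlpha = Not oRegEx.Test(sLine)
End Function

WScript.Echo CStr(IsAlpha("Alphabetic"))
WScript.Echo CStr(IsAlpha("a5bc%(#5gcv=_+"))
```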


--
Michael Harris
Microsoft MVP Scripting




Re: reg expression by James

James
Thu Jan 26 21:54:47 CST 2006

In your original message, you mentioned "all non-alphabetic characters".
In your last message, you wrote "non-alphanumeric characters". Note that
the code I posted for identifying and removing non-alphabetic characters
does exactly that: if a line contains numbers, spaces, punctuation, etc.,
IsAlpha returns False. Do you want to expand this to allow numbers,
spaces and normal punctuation?
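If the answer is yes, only the set of allowed characters in the negated class needs to change. A sketch, assuming the allowed set is letters, digits, space, backslash (for paths) and a few common punctuation marks; that list and the name IsPlainText are illustrative choices, not a standard:

```vbscript
Option Explicit

' True if sLine contains only letters, digits, spaces, backslashes and the
' punctuation listed in the class (an assumed set; edit it to taste).
' \\ is a literal backslash; the final - before ] is a literal hyphen.
Function IsPlainText(sLine)
    Dim oRegEx
    Set oRegEx = New RegExp
    oRegEx.Pattern = "[^a-zA-Z0-9 .,;:!?'()_&\\-]"
    IsPlainText = Not oRegEx.Test(sLine)
End Function

WScript.Echo CStr(IsPlainText("data\pathII\BUDGET\CAPBUD02\PM&S"))
WScript.Echo CStr(IsPlainText("data\pathII\BUDGET\CAPBUD02\MACAU OFFICE"))
```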

I do not have any Unicode text files with Japanese or Korean text on hand.
If you would like additional help with this, please attach a sample
text file to a reply. I have no real experience with foreign
characters, but I would assume they are stored as Unicode and have an
AscW value above 255.
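That assumption is easy to check without a regular expression by looking at each character's code. One catch: AscW returns a signed 16-bit Integer, so characters from U+8000 upward (including many CJK ideographs and all Hangul syllables) come back as negative numbers. Masking with the Long constant &HFFFF& maps them back into the 0 to 65535 range. A minimal sketch; the function name FirstWideCharCode is hypothetical:

```vbscript
Option Explicit

' Returns the code (0-65535) of the first character in s above 255, or -1 if none
Function FirstWideCharCode(s)
    Dim i, nCode
    FirstWideCharCode = -1
    For i = 1 To Len(s)
        ' AscW is negative for U+8000..U+FFFF; And-ing with a Long fixes the sign
        nCode = AscW(Mid(s, i, 1)) And &HFFFF&
        If nCode > 255 Then
            FirstWideCharCode = nCode
            Exit Function
        End If
    Next
End Function

WScript.Echo Hex(FirstWideCharCode("data\FY02\" & ChrW(&H76F8)))
```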



Re: reg expression by Star

Star
Fri Jan 27 01:41:32 CST 2006

Actually, what I need is for the script to read a text file of directory
paths and list the paths that contain non-alphabetic characters, since I
believe the test will detect foreign characters (Chinese/Japanese) as
non-alphabetic. Here is a sample from the text file:

data\
data\
data\testpath
data\testpath
data\testpath
data\pathII\BUDGET
data\pathII\BUDGET\CAPBUD00\PURCHASE\相機fuction問題
data\pathII\BUDGET\CAPBUD02\AREAMGT\仲爭個 DK-21M D200 功
data\pathII\BUDGET\CAPBUD02\MACAU OFFICE
data\pathII\BUDGET\CAPBUD02\仲爭個 DK-21M D200 功
data\pathII\BUDGET\CAPBUD02\PM&S
data\pathII\BUDGET\FY02\蔡屋圍
data\pathII\BUDGET\FY02\MACAU
data\pathII\BUDGET\FY02\PM&S辣椒小提示光區定義重申


I really appreciate your generous help.


Re: reg expression by Alexander

Alexander
Fri Jan 27 05:54:27 CST 2006



SayHUC "data\"
SayHUC "data\testpath"
SayHUC "data\pathII\BUDGET"
SayHUC "data\pathII\BUDGET\CAPBUD00\PURCHASE\相機fuction問題"
SayHUC "data\pathII\BUDGET\CAPBUD02\AREAMGT\仲爭個 DK-21M D200 功"
SayHUC "data\pathII\BUDGET\CAPBUD02\MACAU OFFICE"


Function HasUnicode(str)
    Dim re
    Set re = New RegExp
    ' matches any character whose code is above 255,
    ' i.e. outside the ANSI/Latin-1 range
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function

Sub SayHUC(str)
    MsgBox "String: " & str & vbCr _
        & "HasUnicode: " & CStr(HasUnicode(str))
End Sub
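When a path is flagged, it helps to see which characters triggered it. A sketch that lists each character above U+00FF together with its code point. It uses the range \u0100-\uFFFF, since an upper bound of \u9999 would miss CJK ideographs from U+9A00 to U+9FFF and all Hangul (Korean) syllables, U+AC00 to U+D7A3. The function name ListWideChars is hypothetical:

```vbscript
Option Explicit

' Lists every character above U+00FF in str as "char (U+XXXX)"
Function ListWideChars(str)
    Dim re, m, strOut
    Set re = New RegExp
    re.Global = True
    re.Pattern = "[\u0100-\uFFFF]"   ' anything outside the 0-255 range
    For Each m In re.Execute(str)
        ' mask AscW so codes from U+8000 up are not negative, then pad to 4 hex digits
        strOut = strOut & m.Value & " (U+" _
            & Right("000" & Hex(AscW(m.Value) And &HFFFF&), 4) & ") "
    Next
    ListWideChars = Trim(strOut)
End Function

WScript.Echo ListWideChars("PURCHASE\" & ChrW(&H76F8) & ChrW(&H6A5F) & "fuction")
```

The console may not be able to display the characters themselves, but the U+ codes always print.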



Regards,
Alex

Re: reg expression by Star

Star
Fri Jan 27 09:06:46 CST 2006

Thanks a lot, Alex. I'll give it a try and see.

Cheers.


Re: reg expression by Star

Star
Thu Feb 02 10:51:55 CST 2006

Alex,

I tried the above script by creating a MsgBox and typing in some values
to test, such as 仲爭個.

However, it doesn't seem to work. Am I doing something wrong?

Function HasUnicode(str)
    Dim re
    Set re = New RegExp
    ' matches any character whose code is above 255
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function

Sub SayHUC(str)
    MsgBox "String: " & str & vbCr _
        & "HasUnicode: " & CStr(HasUnicode(str))
End Sub


Re: reg expression by Alexander

Alexander
Tue Feb 07 06:21:29 CST 2006


If you type such characters directly into the script, be sure to save it
as Unicode, not ANSI (or ASCII or UTF-8). That requires an editor that
lets you choose the encoding; Notepad on Windows XP has such an option.
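Another way to test, which keeps the .vbs file in plain ANSI, is to build the non-ANSI characters from their code points with ChrW instead of typing them. A sketch; U+76F8 and U+6A5F are 相 and 機, the first two characters of the PURCHASE folder name in the sample list:

```vbscript
Option Explicit

' True if str contains any character whose code is above 255
Function HasUnicode(str)
    Dim re
    Set re = New RegExp
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function

Dim strTest
' ChrW builds the two characters, so nothing outside ANSI appears in the source
strTest = "data\pathII\BUDGET\CAPBUD00\PURCHASE\" & ChrW(&H76F8) & ChrW(&H6A5F) & "fuction"
WScript.Echo "HasUnicode: " & CStr(HasUnicode(strTest))
WScript.Echo "HasUnicode: " & CStr(HasUnicode("data\pathII\BUDGET\FY02\MACAU"))
```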

Anyway, this is only a test script. If you want to check whether the file
paths in your text file contain Unicode characters (whose character code
is greater than 255), simply read the file list line by line.
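FileSystemObject can read ANSI or Unicode (UTF-16) files, but not UTF-8. If the file list turns out to be UTF-8 (the sample, as posted to the newsgroup, was UTF-8 encoded, although that may only reflect the newsreader), the ADODB.Stream object can decode it. A minimal sketch, assuming ADO is available on the machine; the path C:\filelist-utf8.txt and the name ReadUtf8Lines are hypothetical:

```vbscript
Option Explicit

Const adTypeText = 2     ' ADO StreamTypeEnum: text stream
Const adLF = 10          ' ADO LineSeparatorsEnum: line feed
Const adReadLine = -2    ' ADO StreamReadEnum: read one line
Const strUtf8List = "C:\filelist-utf8.txt"   ' hypothetical input path

' Reads a UTF-8 text file and returns its lines as an array
Function ReadUtf8Lines(strPath)
    Dim stm, dict, strLine
    Set dict = CreateObject("Scripting.Dictionary")
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeText
    stm.Charset = "utf-8"
    stm.Open
    stm.LoadFromFile strPath
    stm.LineSeparator = adLF             ' split on LF; a trailing CR is removed below
    Do Until stm.EOS
        strLine = stm.ReadText(adReadLine)
        If Right(strLine, 1) = vbCr Then strLine = Left(strLine, Len(strLine) - 1)
        ' drop a byte order mark (U+FEFF) if it shows up at the start of the first line
        If dict.Count = 0 And Len(strLine) > 0 Then
            If (AscW(strLine) And &HFFFF&) = &HFEFF& Then strLine = Mid(strLine, 2)
        End If
        dict.Add dict.Count, strLine
    Loop
    stm.Close
    ReadUtf8Lines = dict.Items           ' Items keeps insertion order
End Function

Dim fso, strPathLine
Set fso = CreateObject("Scripting.FileSystemObject")
If fso.FileExists(strUtf8List) Then
    For Each strPathLine In ReadUtf8Lines(strUtf8List)
        WScript.Echo strPathLine
    Next
End If
```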


Regards,
Alex

' This works for me if the file list is saved as Unicode
' and contains characters with codes above 255

Option Explicit

Const ForReading = 1
Const TristateTrue = -1          ' open the file as Unicode (UTF-16)
Const strTextFile = "C:\filelist.txt"

Dim fs, reader

Set fs = CreateObject("Scripting.FileSystemObject")
Set reader = fs.OpenTextFile(strTextFile, ForReading, False, TristateTrue)

Do While Not reader.AtEndOfStream
    sayHasUniCode reader.ReadLine
Loop

reader.Close

Function HasUnicode(str)
    Dim re
    Set re = New RegExp
    ' matches any character whose code is above 255,
    ' i.e. bigger than the ANSI range (0-255)
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function

Sub sayHasUniCode(str)
    MsgBox "String: " & str & vbCr & "HasUnicode: " _
        & CStr(HasUnicode(str))
End Sub
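Star's goal was to write the matching paths to a text file rather than show them one by one in message boxes. A sketch along the same lines, assuming the list is saved as Unicode; the output path C:\unicode-paths.txt and the function name WriteUnicodePaths are hypothetical. The output file is created as Unicode too, so the characters survive the round trip:

```vbscript
Option Explicit

Const ForReading = 1
Const TristateTrue = -1                      ' read the list as Unicode (UTF-16)
Const strListFile = "C:\filelist.txt"        ' input list, saved as Unicode
Const strOutFile = "C:\unicode-paths.txt"    ' hypothetical output file

' True if str contains any character whose code is above 255
Function HasUnicode(str)
    Dim re
    Set re = New RegExp
    re.Pattern = "[\u0100-\uFFFF]"
    HasUnicode = re.Test(str)
End Function

' Copies every line of strIn that HasUnicode flags into strOut; returns the count
Function WriteUnicodePaths(strIn, strOut)
    Dim fso, reader, writer, strLine, nFound
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set reader = fso.OpenTextFile(strIn, ForReading, False, TristateTrue)
    Set writer = fso.CreateTextFile(strOut, True, True)   ' overwrite, Unicode
    nFound = 0
    Do While Not reader.AtEndOfStream
        strLine = reader.ReadLine
        If HasUnicode(strLine) Then
            writer.WriteLine strLine
            nFound = nFound + 1
        End If
    Loop
    reader.Close
    writer.Close
    WriteUnicodePaths = nFound
End Function

If CreateObject("Scripting.FileSystemObject").FileExists(strListFile) Then
    WScript.Echo WriteUnicodePaths(strListFile, strOutFile) & " path(s) written to " & strOutFile
End If
```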