Hi,

As I couldn't find any newsgroup specializing in regex, I hope
somebody here can help me.
I have the following kind of strings:
file/start.asp#en:test>top
file/start.asp?load=http:/www.test.com>top
file/test.asp?load=http://www.test.com#en:test;de:test>top

With the following regex I can extract the things I need (everything
between 'en:' and the last ';', or 'top', or the URIs):
((.*)[(/|\\)])(.*\.asp)(\?load=(.*)#)?((.*;en:(.*));)?(>(.*)$)?

But additionally the whole thing can start with http:// or ftp://, e.g.
http://www.start.com?load=http:/www.test.com>top
http://www.start.com?load=http:/www.test.com#en:test;>top
...
But when I modify it like this:
(http://.*|ftp://.*)|((.*)[(/|\\)])(.*\.asp)(\?load=(.*)#)?((.*;en:(.*));)?(>(.*)$)?
it only matches the whole string and doesn't split it.

Maybe somebody has a clue; I'm a bit stuck on this at the moment.

Best regards,

Andi
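
The behaviour described above, where the modified pattern returns the whole string without splitting it, can be reproduced with a short script. This is only a sketch, assuming the VBScript RegExp object that is used later in this thread; the variable names are made up for illustration. Because the first `|` sits at the top level, `(http://.*|ftp://.*)` is a complete alternative on its own, and its greedy `.*` runs to the end of the line, so the remaining groups never take part in the match.

```vbscript
Option Explicit
Dim oRegEx, oMatches, oMatch, i

Set oRegEx = New RegExp
' The modified pattern from the post, unchanged (split only for line length)
oRegEx.Pattern = "(http://.*|ftp://.*)|((.*)[(/|\\)])(.*\.asp)" _
    & "(\?load=(.*)#)?((.*;en:(.*));)?(>(.*)$)?"

Set oMatches = oRegEx.Execute( _
    "http://www.start.com?load=http:/www.test.com#en:test;>top")

For Each oMatch In oMatches
    ' List every group that actually captured something
    For i = 0 To oMatch.SubMatches.Count - 1
        If Len(oMatch.SubMatches(i)) > 0 Then
            WScript.Echo "Group " & (i + 1) & ": " & oMatch.SubMatches(i)
        End If
    Next
Next
```

Run with cscript, this should list only the first group, holding the entire input string.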

Re: regex problem by James

James
Wed Aug 17 16:04:12 CDT 2005

"Andreas Bauer" <buki@gmx.net> wrote in message
news:ddvlj0$vel$1@online.de...

I am not sure if I fully understand what you are trying to do, but I will
make an assumption & you can correct me if I am wrong. In the example strings
you provided:

1: file/start.asp#en:test>top
2: file/start.asp?load=http:/www.test.com>top
3: file/test.asp?load=http://www.test.com#en:test;de:test>top
4: http://www.start.com?load=http:/www.test.com>top
5: http://www.start.com?load=http:/www.test.com#en:test;>top

I assume that the string you are trying to match only exists in examples
1, 3 & 5. If this is correct, I think this pattern should work:
'[^#]*en:([^>;:]*)' Here is what I used for testing:

'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Set RegEx = New RegExp
x = "http://www.start.com?load=http:/www.test.com#en:test;>top"
' Capture whatever follows "en:" up to the next ">", ";" or ":"
RegEx.Pattern = "[^#]*en:([^>;:]*)"

Set Matches = RegEx.Execute(x)

For Each Match in Matches
    WScript.Echo Match.SubMatches(0)
Next
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If my assumptions are incorrect, please clarify.



Re: regex problem by Roland

Roland
Thu Aug 18 00:39:03 CDT 2005

"James Whitlow" wrote in message
news:O6Ua582oFHA.2472@tk2msftngp13.phx.gbl...

I don't think you need the colon.

[^#]*en:([^;>]*)

--
Roland Hall
/* This information is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of merchantability
or fitness for a particular purpose. */
Technet Script Center - http://www.microsoft.com/technet/scriptcenter/
WSH 5.6 Documentation - http://msdn.microsoft.com/downloads/list/webdev.asp
MSDN Library - http://msdn.microsoft.com/library/default.asp
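
To see how the shortened pattern behaves on the five sample strings listed earlier in the thread, a loop like the following could be used. It is a minimal sketch, not part of the original posts; the array and variable names are assumptions chosen for illustration.

```vbscript
Option Explicit
Dim oRegEx, oMatches, aSamples, sSample

' The five example strings numbered 1 to 5 earlier in the thread
aSamples = Array( _
    "file/start.asp#en:test>top", _
    "file/start.asp?load=http:/www.test.com>top", _
    "file/test.asp?load=http://www.test.com#en:test;de:test>top", _
    "http://www.start.com?load=http:/www.test.com>top", _
    "http://www.start.com?load=http:/www.test.com#en:test;>top")

Set oRegEx = New RegExp
' Variant without the colon in the negated character class
oRegEx.Pattern = "[^#]*en:([^;>]*)"

For Each sSample In aSamples
    Set oMatches = oRegEx.Execute(sSample)
    If oMatches.Count > 0 Then
        WScript.Echo sSample & " -> " & oMatches(0).SubMatches(0)
    Else
        WScript.Echo sSample & " -> (no match)"
    End If
Next
```

Strings 1, 3 and 5 should report a captured value, while 2 and 4 contain no 'en:' and should not match. The two character classes would only behave differently if the text after 'en:' itself contained a colon.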



Re: regex problem by Andreas

Andreas
Thu Aug 18 12:24:40 CDT 2005


> I assume that the string your are trying to match only exists in examples
> 1, 3 & 5. If this is correct, I think this pattern should work:
> '[^#]*en:([^>;:]*)' Here is what I used for testing:

Thanks, but what I need is rather the individual parts of the string, e.g.

test_word.pdf#en:Test Word>top (I need test_word.pdf, Test Word, top)

/file\file.asp>top (/file, file.asp, top)
/file/file.asp (/file, file.asp)

file\start.asp?load=http://www.test.de>top (file, \start.asp,
http:www.test.de, top)

file/start.asp?load=http://www.test.de (file, /start.asp,
http:/ww.test.de)

/file/start.asp#de:Beispiel;en:example;>top (/file, /start.asp, example)
http://www.start.de (http://www.start.de)

ftp://ftp.test.tt#en:example (ftp://ftp.test.tt, example)

With this regex I only get the ones starting with 'file';
I can't get the one with the PDF or the URIs:
((.*)[(/|\\)])(.*\.asp)(\?load=(.*)#)?((.*;en:(.*));)?(>(.*)$)?

Regards,

Andi

Re: regex problem by Roland

Roland
Thu Aug 18 15:14:18 CDT 2005

"Andreas Bauer" wrote in message news:de2g8q$nqb$1@online.de...

Andi...

You can't do all that in one regexp because your data is not consistent
and it contradicts itself.

: /file/file.asp (/file, file.asp)

: file/start.asp?load=http://www.test.de (file, /start.asp,
: http:/ww.test.de)

There is no way to determine whether you want the / on the filename or not.
A better comparison to show the flaw would be:

/file/file.asp (/file, file.asp)
/file/file.asp (/file, /file.asp)

Which is it?




Re: regex problem by Andreas

Andreas
Fri Aug 19 00:50:04 CDT 2005


> : test_word.pdf#en:Test Word>top (I need test_word.pdf, Test Word, top)

> : file\start.asp?load=http://www.test.de>top (file, \start.asp,
> : http:www.test.de, top)
> :
> : file/start.asp?load=http://www.test.de (file, /start.asp,
> : http:/ww.test.de)
> :
> : /file/start.asp#de:Beispiel;en:example;>top (/file, /start.asp, example)
> : http://www.start.de (http://www.start.de)
> :
> : ftp://ftp.test.tt#en:example (ftp://ftp.test.tt, example)
> :
> : and with this regex I only get the ones starting with files,
> : but I can't get the one with pdf and the uris.
> : ((.*)[(/|\\)])(.*\.asp)(\?load=(.*)#)?((.*;en:(.*));)?(>(.*)$)?


> /file/file.asp (/file, file.asp)
(/file,/file.asp)
> file/file.asp (/file, /file.asp)
(file,/file.asp)

But the problem is more about matching the test_word.pdf#... string
and the URIs, and expanding my regex to match them (the examples with
'file' weren't very exact, sorry, but the matches with the regex I've
already provided work).

Best regards

Re: regex problem by James

James
Fri Aug 19 08:22:07 CDT 2005

"Andreas Bauer" <buki@gmx.net> wrote in message
news:de3rud$drb$1@online.de...
>
> > /file/file.asp (/file, file.asp)
> (/file,/file.asp)
> > file/file.asp (/file, /file.asp)
> (file,/file.asp)
>
> But the problem is more to match the test_word.pdf#...
> and the uris and to expand my regex to match them ( the examples with
> 'file' weren't very excat, sorry, but the matches with the regex I've
> already provided works)

Ok. Second try. Uncomment the strings one at a time & run the script below
for each string. See if it gives you what you are looking for.

'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dim oRegEx, sSubMatch, sMatch, oMatches, x

Set oRegEx = New RegExp

x = "test_word.pdf#en:Test Word>top"
'x = "/file\file.asp>top"
'x = "/file/file.asp"
'x = "file\start.asp?load=http://www.test.de>top"
'x = "/file/start.asp#de:Beispiel;en:example;>top"
'x = "http://www.start.de"
'x = "ftp://ftp.test.tt#en:example"

oRegEx.Pattern = "(\w*:\/\/[^#]*|.[^#\\\/]*)/?(?:#en:|\\)" _
& "?([^\?>;#]*)(?:[\?#].+[=:](?!\/)([^>;]*)>?|>?)(top)?"

Set oMatches = oRegEx.Execute(x)

For Each sMatch in oMatches
    For Each sSubMatch in sMatch.SubMatches
        If Len(sSubMatch) Then WScript.Echo sSubMatch
    Next
Next
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



Re: regex problem by Andreas

Andreas
Thu Sep 01 14:50:12 CDT 2005


> oRegEx.Pattern = "(\w*:\/\/[^#]*|.[^#\\\/]*)/?(?:#en:|\\)" _
> & "?([^\?>;#]*)(?:[\?#].+[=:](?!\/)([^>;]*)>?|>?)(top)?"
>
Thanks a lot. As far as I can see, it fulfills all my needs.

Re: regex problem (not the whole problem was solved:)) by Andreas

Andreas
Tue Sep 06 16:20:54 CDT 2005

Andreas Bauer schrieb:
>
>> oRegEx.Pattern = "(\w*:\/\/[^#]*|.[^#\\\/]*)/?(?:#en:|\\)" _
>> & "?([^\?>;#]*)(?:[\?#].+[=:](?!\/)([^>;]*)>?|>?)(top)?"
>>
I can't get 'en:France' in:

https://test.org.de/app1/folder1/folder2/folder3/folder4/start.asp#de:Frankreich;en:France;>top

I'm working on it, but any hint is appreciated as well :)

Regards,

Andi


Re: regex problem (not the whole problem was solved:)) by James

James
Tue Sep 06 20:42:42 CDT 2005

"Andreas Bauer" <buki@gmx.net> wrote in message
news:dfl17f$lla$1@online.de...
> Andreas Bauer schrieb:
> >
> >> oRegEx.Pattern = "(\w*:\/\/[^#]*|.[^#\\\/]*)/?(?:#en:|\\)" _
> >> & "?([^\?>;#]*)(?:[\?#].+[=:](?!\/)([^>;]*)>?|>?)(top)?"
> >>
> I can't get 'en:France" in:
>
>
> https://test.org.de/app1/folder1/folder2/folder3/folder4/start.asp#de:Frankreich;en:France;>top
>
> I'm working on it, but any hint is appreciated as well :)

I'm a little confused. In one of your previous examples:

/file/start.asp#de:Beispiel;en:example;>top

...you wanted 'example' & not 'en:example'. Based on this, I would assume
you would want 'France' instead of 'en:France'. When I run
this string through the expression, I get 'France'. If I am
misunderstanding, please clarify & I will attempt to adjust the regular
expression.
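
For anyone who wants to repeat that check, the following sketch feeds the long https string to the same expression and labels each non-empty submatch with its group number. The labelling loop and variable names are illustrative assumptions; the pattern itself is copied unchanged from the earlier script.

```vbscript
Option Explicit
Dim oRegEx, oMatches, oMatch, i, x

' The string from the follow-up question (split only for line length)
x = "https://test.org.de/app1/folder1/folder2/folder3/folder4/start.asp" _
    & "#de:Frankreich;en:France;>top"

Set oRegEx = New RegExp
oRegEx.Pattern = "(\w*:\/\/[^#]*|.[^#\\\/]*)/?(?:#en:|\\)" _
    & "?([^\?>;#]*)(?:[\?#].+[=:](?!\/)([^>;]*)>?|>?)(top)?"

Set oMatches = oRegEx.Execute(x)

For Each oMatch In oMatches
    ' Show which capture group produced which value
    For i = 0 To oMatch.SubMatches.Count - 1
        If Len(oMatch.SubMatches(i)) > 0 Then
            WScript.Echo "Group " & (i + 1) & ": " & oMatch.SubMatches(i)
        End If
    Next
Next
```

The third group is the one expected to hold 'France', with the first group holding the part of the URL before the '#'.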



Re: regex problem (not the whole problem was solved:)) by Andreas

Andreas
Wed Sep 07 10:08:37 CDT 2005

James Whitlow schrieb:

> I'm a little confused. In one of your previous examples:
>
> /file/start.asp#de:Beispiel;en:example;>top
>
> ...you wanted 'example' & not 'en:example'. Based on this, I would assume
> you would want you would want 'France' instead of 'en:France'.
Sorry, my fault. I want France, of course :)