I would like to use some of Wikipedia's data in one of my (.NET!)
programs.
I'm still at the stage of trying to figure out how to download the data.

Any tips on:
- how to download Wikipedia data?
- how to use the data once downloaded?

Re: download Wikipedia.... by Alex

Alex
Sat May 13 23:32:17 CDT 2006

Do you mean scraping the Wikipedia web pages? If that is the case,
then you want to take a look at the System.Net namespace, in particular
the WebClient class or the HttpWebRequest class to download the
content; then use a parser to extract the data from the page content.
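For the scraping route, a minimal sketch might look like the following. It assumes the `WebClient` class from System.Net (newer .NET versions mark it obsolete in favour of `HttpClient`, but it still compiles), and it uses a regular expression purely as a stand-in for a real HTML parser. The `WikiPageScraper` class and its method names are hypothetical.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not part of any Wikipedia API.
public static class WikiPageScraper
{
    // Downloads the raw HTML of one page with WebClient (System.Net).
    public static string DownloadHtml(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Crude stand-in for a proper HTML parser: pulls out the <title> text,
    // trims it and decodes entities such as &amp;. Returns null if there is no title.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title>(.*?)</title>",
                              RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value.Trim()) : null;
    }
}
```

Usage would be `WikiPageScraper.ExtractTitle(WikiPageScraper.DownloadHtml(url))` for some page URL of your choice.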

Does that help?
Alex


Re: download Wikipedia.... by Lloyd

Lloyd
Sun May 14 06:06:34 CDT 2006

No, no...
I found the URL; you can download Wikipedia's data dumps at:
http://download.wikimedia.org/

Now I am at the stage of trying to figure out what to do with this 136MB
XML file.
Obviously, basic XML tools that simply load it into memory are
inappropriate....

"Alex Li" <likwoka@gmail.com> wrote in message
news:1147581137.948357.286610@y43g2000cwc.googlegroups.com...



Re: download Wikipedia.... by Gaurav

Gaurav
Sun May 14 17:50:02 CDT 2006

Hi Lloyd,

> Now I am at the stage of trying to figure out what to do with this 136MB
> XML file.
> Obviously, basic XML tools that simply load it into memory are
> inappropriate....

What you are trying to achieve will determine a lot.
Maybe you need to upgrade to a 2GB machine to load all the XML into memory.
Maybe you need serial access (XmlTextReader) and can make do with only
512MB of RAM.
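The serial-access approach can be sketched with `XmlReader.Create`, which Microsoft has recommended over constructing `XmlTextReader` directly since .NET 2.0. The sketch assumes the dump follows the MediaWiki export layout, where each article is a `<page>` element with a `<title>` child; `DumpReader` and `ReadTitles` are hypothetical names. Only the current node is held in memory, so the file size should mostly affect run time rather than RAM.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DumpReader
{
    // Streams the text of every <title> element, one at a time.
    public static IEnumerable<string> ReadTitles(TextReader input)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader xml = XmlReader.Create(input, settings))
        {
            while (!xml.EOF)
            {
                // Match on LocalName so the export-schema namespace (which may vary) is ignored.
                if (xml.NodeType == XmlNodeType.Element && xml.LocalName == "title")
                {
                    // Reads the text and moves past </title> in one step,
                    // so no extra Read() is needed in this branch.
                    yield return xml.ReadElementContentAsString();
                }
                else
                {
                    xml.Read();
                }
            }
        }
    }
}
```

To run it over the real file, wrap a `new StreamReader("theBigFile.xml")` in a `using` block and `foreach` over `DumpReader.ReadTitles(file)`.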


--
Cheers,
Gaurav Vaish
http://www.mastergaurav.org
http://www.edujini.in
-------------------



Re: download Wikipedia.... by Lloyd

Lloyd
Sun May 14 19:23:36 CDT 2006

>> Now I am at the stage of trying to figure out what to do with this 136MB
>> XML file.
>> Obviously, basic XML tools that simply load it into memory are
>> inappropriate....
>
> What you are trying to achieve will determine a lot.
> Maybe you need to upgrade to a 2GB machine to load all the XML into memory.
> Maybe you need serial access (XmlTextReader) and can make do with only
> 512MB of RAM.
>
I tried something very simple with XmlTextReader:
XmlTextReader xml = new XmlTextReader("theBigFile.xml");
while (!xml.EOF)
    xml.Skip();

It took ages.....
so I am rather doubtful I could use it for anything useful....

But that's kind of surprising, as I found some other Wikipedia tool which
didn't seem to have any trouble... mhh....
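One possible explanation, offered as a hypothesis rather than a diagnosis: in the .NET Framework reference source, `Skip()` appears to return without doing anything while `ReadState` is not `Interactive`. A freshly constructed reader is in `ReadState.Initial` with `EOF` still false, so the loop above may spin without ever advancing instead of merely being slow. The sketch below (the `SkipLoop` and `CountSkips` names are hypothetical) calls `MoveToContent()` first and refuses to loop on a reader that is not positioned on a node.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SkipLoop
{
    // Runs the Skip() loop from the post, but only after the reader has left
    // ReadState.Initial. Returns how many Skip() calls it took to reach EOF.
    public static int CountSkips(TextReader input)
    {
        using (var xml = new XmlTextReader(input))
        {
            // Positions the reader on the root element (this performs the first Read()).
            xml.MoveToContent();

            int skips = 0;
            while (!xml.EOF)
            {
                // Guard against the assumed no-progress case described above.
                if (xml.ReadState != ReadState.Interactive)
                    throw new InvalidOperationException("Reader is not positioned on a node.");
                xml.Skip();
                skips++;
            }
            return skips;
        }
    }
}
```

For a document with nothing after the root element this returns 1: once the reader sits on the root, a single `Skip()` has to parse the entire file before it returns.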



Re: download Wikipedia.... by Gaurav

Gaurav
Tue May 16 22:47:50 CDT 2006

> I tried something very simple with XmlTextReader:
> XmlTextReader xml = new XmlTextReader("theBigFile.xml");
> while (!xml.EOF)
>     xml.Skip();
>
> It took ages.....

Probably that demonstrates the difference between efficient and inefficient
parsing logic...?






Re: download Wikipedia.... by Lloyd

Lloyd
Tue May 16 23:37:59 CDT 2006

Huh... that doesn't demonstrate much to me.
Anyway, interestingly, this:
===
XmlTextReader xml = new XmlTextReader("theBigFile.xml");
xml.ReadStartElement(); // <== new
while (!xml.EOF)
    xml.Skip();
===
works much better!...
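The `ReadStartElement()` variant steps inside the root element, so each `Skip()` afterwards jumps over one direct child (in a dump, one `<siteinfo>` or `<page>`) rather than the whole document. A sketch of that pattern that records the name of each child it skips; `ChildSkipper` and `ChildNames` are hypothetical, and unlike the loop above it stops at the first non-element node (the root's end tag) instead of testing `EOF`.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ChildSkipper
{
    // Lists the local names of the root element's direct children,
    // skipping over each child's subtree without building it in memory.
    public static List<string> ChildNames(TextReader input)
    {
        var names = new List<string>();
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader xml = XmlReader.Create(input, settings))
        {
            // Moves to the root, checks it is an element, and steps past its start tag.
            xml.ReadStartElement();

            // Each iteration sits on one child element; Skip() moves to its next sibling.
            while (xml.NodeType == XmlNodeType.Element)
            {
                names.Add(xml.LocalName);
                xml.Skip();
            }
        }
        return names;
    }
}
```

Replacing `names.Add(...)` with `xml.ReadSubtree()` processing would let each page be handled on its own while the rest of the file is still streamed.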


"Gaurav Vaish (EduJini.IN)" <gaurav.vaish.nospam@nospam.gmail.com> wrote in
message news:%23ulewSWeGHA.2188@TK2MSFTNGP04.phx.gbl...



Re: download Wikipedia.... by Gaurav

Gaurav
Wed May 17 08:26:04 CDT 2006

Ha ha ha ha.
That tells me that we should be given access to the source code of the
application to check and report the code that results in these issues.

Let me try it out too... it should be interesting to work with :D

--
Happy Hacking,
Gaurav Vaish


"Lloyd Dupont" <net.galador@ld> wrote in message
news:%23T59ouWeGHA.3556@TK2MSFTNGP02.phx.gbl...