Note this all applies to Portal search only, SPS not WSS.
I'm looking for ways to optimize the crawl of documents in a web content
source.
I've set the resource usage to dedicated already.
For example, I only require search on metadata.
I don't require full text indexing but I can't see a way to stop this on the
portal.
What happens if I set maxdownloadsize to zero?
Any other tweaks to achieve this or other optimization?
Much appreciated.

RE: Crawl does just that! by NikanderMargrietBruggeman

NikanderMargrietBruggeman
Tue Sep 20 08:55:07 CDT 2005

Hi,

This one might be obvious, but here it goes anyway... You can limit the
hop count so that the number of followed links that get indexed is limited.
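
To illustrate the idea in general terms, here is a minimal sketch of a breadth-first crawl that stops following links past a hop limit. It is not how the SPS crawler is implemented; the `crawl` function, the `max_hops` parameter, and the `site_links` graph are all hypothetical names made up for this example.

```python
from collections import deque

def crawl(links, start, max_hops):
    """Return {page: hops} for pages reachable from start within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        hops = seen[page]
        if hops == max_hops:
            continue  # at the limit: index this page but follow no further links
        for target in links.get(page, []):
            if target not in seen:
                seen[target] = hops + 1
                queue.append(target)
    return seen

# Hypothetical link graph: page -> pages it links to.
site_links = {
    "home": ["a", "b"],
    "a": ["c"],
    "c": ["d"],
}

if __name__ == "__main__":
    print(sorted(crawl(site_links, "home", 1)))
    print(sorted(crawl(site_links, "home", 3)))
```

With a limit of 1 only the start page and the pages it links to directly are visited; raising the limit pulls in deeper pages, which is why a lower hop count means less to index.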

Kind regards,
Nikander & Margriet Bruggeman

RE: Crawl does just that! by SharpeySP

SharpeySP
Wed Sep 21 03:26:01 CDT 2005

Thanks for the speedy response.
Unfortunately the site I'm crawling is "self contained", i.e. there are no
links to any other site and hence no hops.
Valid answer for other circumstances and I'll remember it for those hopefully!

Re: Crawl does just that! by Shane

Shane
Wed Sep 21 16:53:18 CDT 2005

So you are just having problems with slow crawls? How many docs do you
have? What is your server setup? What type of hardware?

--
Shane Young
SharePoint911 - "SharePoint Help...Now!"
http://www.SharePoint911.com

Re: Crawl does just that! by SharpeySP

SharpeySP
Wed Sep 28 03:36:01 CDT 2005

Hi Shane, apologies for the delay.
Our config is as follows:
Single server, 2 x 3.4GHz processors, 3GB RAM, Server 2003 Enterprise SP1,
34GB system disk, 474GB data disk. SQL 2000 Standard SP3. SPS 2003 SP1.
Now for the hard part!
We have 10 virtual servers extended using WSS & host headers on this server.
The default web site is used for our SharePoint Portal.
Each of the ten WSS sites is set up as a content source in the Portal.
So I have ten content indexes in addition to Portal content and non-portal
content.
We are basically using SharePoint as a document repository.
Each of the WSS webs has loads of sites within it, and each site has up
to 3 document libraries containing TIFs and PDFs.
Our total number of documents is 2 million, spread pretty evenly across all
the sites.
I've not managed to do an initial full update yet on all the indexes; it's
getting slower, and I'm now talking about weeks, not days, to do the indexing.
Ouch!
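
For a rough sense of scale, the figures in the post above can be turned into a back-of-envelope throughput estimate. This is only a sketch: the 2 million document count comes from the post, while the 14-day and 10 docs/second figures are assumptions picked for illustration, and both helper functions are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def docs_per_second(total_docs, days):
    """Average crawl rate needed to index total_docs in the given number of days."""
    return total_docs / (days * SECONDS_PER_DAY)

def days_to_crawl(total_docs, rate):
    """Days needed to index total_docs at a steady rate in documents per second."""
    return total_docs / rate / SECONDS_PER_DAY

if __name__ == "__main__":
    total = 2_000_000  # from the post
    # Assumed: "weeks" taken as 14 days.
    print(f"{docs_per_second(total, 14):.2f} docs/sec over 14 days")
    # Assumed: a steady 10 docs/sec, chosen only as a comparison point.
    print(f"{days_to_crawl(total, 10):.2f} days at 10 docs/sec")
```

A full crawl that takes two weeks works out to fewer than two documents per second on average, which gives a baseline to compare against after any tuning.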

Re: Crawl does just that! by Tony

Tony
Wed Sep 28 08:21:30 CDT 2005

You might want to look into setting up a server farm instead of running
everything off 1 server. The next "supported" upgrade from your setup,
one that will actually split the web and search/index/job roles, is one
that will have 3 servers:

1 - Your web front-end.
2 - SQL Server
3 - Your search, indexing, and job roles.

If your organisation has 2 million documents, ones that I would imagine
they depend on, it would benefit from upgrading to a small server
farm instead of a single server.


Re: Crawl does just that! by SharpeySP

SharpeySP
Tue Oct 04 03:44:03 CDT 2005

Hi Tony,

I agree we are asking a lot of our server so my next step is to go to the
server farm.

For now I've disabled the IFilter for TIFs, which has sped things up no end.
I'm not posting my method here; I have seen no ill effects, but I don't want to be
responsible for anyone copying it!