We've heard reports that the Dataset object in the .NET 1.1 frameworks tends
to slow down dramatically after it reaches a certain size, around 70MB or so,
regardless of the amount of RAM installed on the machine. We haven't
experienced this ourselves yet, but can see the day when our Dataset will
comprise of hundreds of megabytes of data.



Questions:



1 Is this a well-known issue?

2 Is this a general performance problem, or just a problem with certain
methods? The Merge method comes to mind, which we use quite a bit in our
code indirectly.

3 Is there any way to work around this limitation?

4 Does the NET 2.0 framework address this issue?

Re: dataset Performence Issue by Sahil

Sahil
Tue Aug 02 21:59:36 CDT 2005

If your DataSet will occupy 100's of MB, seriously that is ... ugghh .. it
makes me feel so pukish .. sorry but just don't do that. Such a big dataset
is like blasphemy, it's so awful, I am just so disgusted to even hear that.
Why do you need such a big dataset? That is an AWFUL way to use a DataSet.

In .NET 1.1, given the size of the dataset of course everything will croak.
Yes Especially GetChanges and Merge, both will be awful - especially if you
have lots of relations and a lot of tables.

The workaround is - "A DATASET IS NOT A DATABASE, DON'T ABUSE IT AS ONE"

.NET 2.0 has many enhancements under the scenes in both the various
underlying collections and the GetChanges algorithm and otherwise that make
using a DataSet a lot better, but 70MB? 100MB in a Dataset (In-Memory
Disconnected Cache?) - That is SUPER DUPER BUMPER ULTRA AWFUL.

BTW, especially YOU need to read Chapters 9 and 10 of my book to understand
clearly and practically why I am advocating so strongly against such a
misuse/ABUSE of a DataSet.

- Sahil Malik [MVP]
ADO.NET 2.0 book -
http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
----------------------------------------------------------------------------




"Ashishthaps" <Ashishthaps@discussions.microsoft.com> wrote in message
news:5841F739-7619-42C1-AFD7-7DEA0BC3BC37@microsoft.com...
> We've heard reports that the Dataset object in the .NET 1.1 frameworks
> tends
> to slow down dramatically after it reaches a certain size, around 70MB or
> so,
> regardless of the amount of RAM installed on the machine. We haven't
> experienced this ourselves yet, but can see the day when our Dataset will
> comprise of hundreds of megabytes of data.
>
>
>
> Questions:
>
>
>
> 1 Is this a well-known issue?
>
> 2 Is this a general performance problem, or just a problem with certain
> methods? The Merge method comes to mind, which we use quite a bit in our
> code indirectly.
>
> 3 Is there any way to work around this limitation?
>
> 4 Does the NET 2.0 framework address this issue?
>
>



Re: dataset Performence Issue by Adrian

Adrian
Tue Aug 02 22:58:05 CDT 2005

Sahil,

There's nothing wrong with having a large, in-memory database. There are
lots of reasons in a real-time environment where a database can fit in
memory and performance rules. The SCADA system I work with has a 400 MB
in-memory database. Data is flushed to disk, once an hour or on-demand.

In-memory databases are also typically used in the embedded market since I/O
to flash memory is usually not great.

I see Datasets as a good solution to some problems.

I see other open-source databases like Firebird, SharpSQL and SQLLite as
good solutions when a small footprint RDBMS is needed. They are simple to
install and setup.

MSDE / SQL-Server Express is still overkill for many database needs, but
does provide a great solution to problems that match it capabilities.

I look foward to reading you book when its available.
Ad.


"Sahil Malik [MVP]" <contactmethrumyblog@nospam.com> wrote in message
news:ebYAid9lFHA.3552@TK2MSFTNGP10.phx.gbl...
> If your DataSet will occupy 100's of MB, seriously that is ... ugghh .. it
> makes me feel so pukish .. sorry but just don't do that. Such a big
> dataset is like blasphemy, it's so awful, I am just so disgusted to even
> hear that. Why do you need such a big dataset? That is an AWFUL way to use
> a DataSet.
>
> In .NET 1.1, given the size of the dataset of course everything will
> croak. Yes Especially GetChanges and Merge, both will be awful -
> especially if you have lots of relations and a lot of tables.
>
> The workaround is - "A DATASET IS NOT A DATABASE, DON'T ABUSE IT AS ONE"
>
> .NET 2.0 has many enhancements under the scenes in both the various
> underlying collections and the GetChanges algorithm and otherwise that
> make using a DataSet a lot better, but 70MB? 100MB in a Dataset (In-Memory
> Disconnected Cache?) - That is SUPER DUPER BUMPER ULTRA AWFUL.
>
> BTW, especially YOU need to read Chapters 9 and 10 of my book to
> understand clearly and practically why I am advocating so strongly against
> such a misuse/ABUSE of a DataSet.
>
> - Sahil Malik [MVP]
> ADO.NET 2.0 book -
> http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
> ----------------------------------------------------------------------------



Re: dataset Performence Issue by Sahil

Sahil
Tue Aug 02 23:03:46 CDT 2005

Adrian,

Call it personal opinion, but I am not a big fan of Prevayler*.* concepts. I
feel standard RDBMS's will evolve to provide that functionality.
But what I am certainly not a fan of is storing 400 MB in a DataSet - that
object is just not designed for such a heavy amount of in memory data.

- Sahil Malik [MVP]
ADO.NET 2.0 book -
http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
----------------------------------------------------------------------------

"Adrian Moore" <queryadataset@hotmail.com> wrote in message
news:uSvzK%239lFHA.4000@TK2MSFTNGP12.phx.gbl...
> Sahil,
>
> There's nothing wrong with having a large, in-memory database. There are
> lots of reasons in a real-time environment where a database can fit in
> memory and performance rules. The SCADA system I work with has a 400 MB
> in-memory database. Data is flushed to disk, once an hour or on-demand.
>
> In-memory databases are also typically used in the embedded market since
> I/O to flash memory is usually not great.
>
> I see Datasets as a good solution to some problems.
>
> I see other open-source databases like Firebird, SharpSQL and SQLLite as
> good solutions when a small footprint RDBMS is needed. They are simple to
> install and setup.
>
> MSDE / SQL-Server Express is still overkill for many database needs, but
> does provide a great solution to problems that match it capabilities.
>
> I look foward to reading you book when its available.
> Ad.
>
>
> "Sahil Malik [MVP]" <contactmethrumyblog@nospam.com> wrote in message
> news:ebYAid9lFHA.3552@TK2MSFTNGP10.phx.gbl...
>> If your DataSet will occupy 100's of MB, seriously that is ... ugghh ..
>> it makes me feel so pukish .. sorry but just don't do that. Such a big
>> dataset is like blasphemy, it's so awful, I am just so disgusted to even
>> hear that. Why do you need such a big dataset? That is an AWFUL way to
>> use a DataSet.
>>
>> In .NET 1.1, given the size of the dataset of course everything will
>> croak. Yes Especially GetChanges and Merge, both will be awful -
>> especially if you have lots of relations and a lot of tables.
>>
>> The workaround is - "A DATASET IS NOT A DATABASE, DON'T ABUSE IT AS ONE"
>>
>> .NET 2.0 has many enhancements under the scenes in both the various
>> underlying collections and the GetChanges algorithm and otherwise that
>> make using a DataSet a lot better, but 70MB? 100MB in a Dataset
>> (In-Memory Disconnected Cache?) - That is SUPER DUPER BUMPER ULTRA AWFUL.
>>
>> BTW, especially YOU need to read Chapters 9 and 10 of my book to
>> understand clearly and practically why I am advocating so strongly
>> against such a misuse/ABUSE of a DataSet.
>>
>> - Sahil Malik [MVP]
>> ADO.NET 2.0 book -
>> http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
>> ----------------------------------------------------------------------------
>
>



Re: dataset Performence Issue by Cor

Cor
Wed Aug 03 01:39:53 CDT 2005

Adrian,

I agree with you that there can be many reasons to have a large in memory
database.
By instance with a single user used database in situations where speed is
very important.

However that is not the reason a dataset is build for. A dataset is build
for multipurpose use in a multi-user disconnected situation and has
therefore all kinds of features.

By instance in a single used database it is not of any importance if a row
is changed. After copying the previous one in process time, do you dump the
whole database over the offline stored data

Therefore let us in this performance issue not mix up apples with computers.

Cor



Re: dataset Performence Issue by Nigel

Nigel
Wed Aug 03 04:50:33 CDT 2005


"Sahil Malik [MVP]" <contactmethrumyblog@nospam.com> wrote in message
news:Of7iYB%23lFHA.576@TK2MSFTNGP15.phx.gbl...

> But what I am certainly not a fan of is storing 400 MB in a DataSet - that
> object is just not designed for such a heavy amount of in memory data.
>
Sahil,

You make that assertion - can you elaborate as to why you believe that? I
don't see anything in the documentation, or the services that DataSets
provide, that would make me believe that they are not designed to handle
large amounts of data (given the inherent constraints of memory, garbage
collecting very large heaps, etc).

Now it appears to be the case that the V1.1 implementation has some
performance problems in some areas with large tables, but at least some of
these are fixed in V2.0. So Microsoft are making efforts to ensure that
large volumns of data are supported. See the following article for some
information:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnadonet/html/datasetenhance.asp

While I would agree that the main use and focus for DataSets is *selective*
caching of data, I can certainly envisage situations where I might want to
hold quite a large extract of a table in a dataset (where connectivity is
not always available, for instance).

---------
Nigel Norris




Re: dataset Performence Issue by Adrian

Adrian
Wed Aug 03 07:29:35 CDT 2005

Sahil,

I expect Datasets will also continue to evolve in order to meet the feature
and performance needs that developers are demanding.

Ad.

"Sahil Malik [MVP]" <contactmethrumyblog@nospam.com> wrote in message
news:Of7iYB%23lFHA.576@TK2MSFTNGP15.phx.gbl...
> Adrian,
>
> Call it personal opinion, but I am not a big fan of Prevayler*.* concepts.
> I feel standard RDBMS's will evolve to provide that functionality.
> But what I am certainly not a fan of is storing 400 MB in a DataSet - that
> object is just not designed for such a heavy amount of in memory data.
>
> - Sahil Malik [MVP]
> ADO.NET 2.0 book -
> http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
> ----------------------------------------------------------------------------



RE: dataset Performence Issue by Ashishthaps

Ashishthaps
Wed Aug 03 07:34:58 CDT 2005

I have found answer for my queries i would like to share this with you


Question
=======
â??We've heard reports that the Dataset object in the .NET 1.1 frameworks tend
to slow down dramatically after it reaches a certain size, around 70MB or so,
regardless of the amount of RAM installed on the machine. We haven't
experienced this ourselves yet, but can see the day when our Dataset will
comprise of hundreds of megabytes of data. â??



Please find answers to your queries below
============================

1 Is this a well-known issue?


Yes for v1.x, itâ??s a well known issue.



2 Is this a general performance problem, or just a problem with certain
methods? The Merge method comes to mind, which we use quite a bit in our
code indirectly.



For v1.x, this is a general performance problem â?? it cuts across all write
actions i.e. all methods that modify DataRows â?? including Merge.

The performance problem is mitigated to some extent for bulk load operations
which include Merge, Adapter.Fill and Table.AcceptChanges(). The impact is
most visible for singleton random inserts, deletes and modifications.



3 Is there any way to work around this limitation?



Ways to work around this limitation?

1. Try to minimize constraints on columns. Constrains include
column.AllowDbNull, ForeignKey, PrimaryKey, UniqueKey, etc.

2. Try to minimize the use of relations as it involved building indexes on
the participating columns of the tables.



4 Does the NET 2.0 framework address this issue?

Yes, out of the box â?? no changes required in user code. The benefits cut all
across Dataset.



To give you some perspective on the change, 70 MB is at most 500K rows, and
in v2.0 you can fill 1 million rows, inserted randomly with a primary key in
around 30 seconds. To insert the same 1 million rows in v1.x, it takes 30
minutes.


Thanks for your support

"Ashishthaps" wrote:

> We've heard reports that the Dataset object in the .NET 1.1 frameworks tends
> to slow down dramatically after it reaches a certain size, around 70MB or so,
> regardless of the amount of RAM installed on the machine. We haven't
> experienced this ourselves yet, but can see the day when our Dataset will
> comprise of hundreds of megabytes of data.
>
>
>
> Questions:
>
>
>
> 1 Is this a well-known issue?
>
> 2 Is this a general performance problem, or just a problem with certain
> methods? The Merge method comes to mind, which we use quite a bit in our
> code indirectly.
>
> 3 Is there any way to work around this limitation?
>
> 4 Does the NET 2.0 framework address this issue?
>
>

Re: dataset Performence Issue by Sahil

Sahil
Wed Aug 03 12:06:09 CDT 2005

Managed code atleast in the near future will never be able to outperform
native unmanaged code. I seriously doubt you will ever substitute a dataset
for SQL Server EVER.

--

- Sahil Malik [MVP]
ADO.NET 2.0 book -
http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
-------------------------------------------------------------------------------------------

"Adrian Moore" <queryadataset@hotmail.com> wrote in message
news:etA$%23bCmFHA.2904@TK2MSFTNGP14.phx.gbl...
> Sahil,
>
> I expect Datasets will also continue to evolve in order to meet the
> feature and performance needs that developers are demanding.
>
> Ad.
>
> "Sahil Malik [MVP]" <contactmethrumyblog@nospam.com> wrote in message
> news:Of7iYB%23lFHA.576@TK2MSFTNGP15.phx.gbl...
>> Adrian,
>>
>> Call it personal opinion, but I am not a big fan of Prevayler*.*
>> concepts. I feel standard RDBMS's will evolve to provide that
>> functionality.
>> But what I am certainly not a fan of is storing 400 MB in a DataSet -
>> that object is just not designed for such a heavy amount of in memory
>> data.
>>
>> - Sahil Malik [MVP]
>> ADO.NET 2.0 book -
>> http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
>> ----------------------------------------------------------------------------
>
>



Re: dataset Performence Issue by Sahil

Sahil
Wed Aug 03 12:27:41 CDT 2005

Nigel,

It is true that .NET 2.0's dataset has been really improved from within to
handle greater quantities of data, but the justification of that is the
desire to abuse it as a database. It shouldn't be interpreted as a nod from
Microsoft that a DataSet is okay to abuse as a DataBase. (I am not Microsoft
mind you).

There are a number of reasons for this.

1. First of all, Dataset is guess what - Managed Code, and to send all the
.NET lovers in tailspin, Managed Code can never be as fast and as optimized
as native code. That is reasonable to expect and I am certainly not saying
that unmanaged is better and drop managed code for good - NO WAY, there are
a lot of benefits of using Managed code, but when it comes to raw
performance, Managed code sucks.

2. Secondly, the Garbage collector is that animal that makes things very
very good for 90% of the situations i.e. normal memory usage, but when you
start storing many megabytes or close to a gigabyte of information
completely in RAM - it will actually hurt your application performance. In
those scenarios, you don't want an external policeman who doesn't understand
the specific needs of your app. In that situation, you want fine control on
the memory where you specifiy when it gets cleaned, or serialized to the
disk etc. You need paging mechanisms etc. which are possible to write for
the dataset but are a real royal pain to write and even then they don't work
quite as well as - guess what - native code (i.e. most of what SQL Server).

3. SQL Server and any database comes with a "Query Engine". The number of
optimizations built into that is the work of many Phds (or dudes with
similar smarts and specialization), they have written up SQL Server's query
engine to take advantage of automatic paging, locking algorithms, spilling
over to the disk when needed, "query plans", caching those query plans -
when you compare the object model of a Dataset (or any biz object for that
matter), the comparison is like comparing a candle with the sun.

4. The algorithms in a DataSet is rudimentary, they rely on simple
techniques such as string matching, string manipulation - that level of
simplicity. They work on an "Object structure", every value they access goes
over a dereferenced segment calculation. SQL Server and any standard
database is written with that level of optimization (Not like I have looked
at their code, but it has to be written that way). Maybe in certain
instances DataSet might be smarter than I try to bring out here, but a
DataSet is an in memory object - that is how simple it is. It's just an
"object". It's worse than MS-Access when it comes to managing data for you.
(Don't get me wrong, I still love datasets, I just don't think they replace
databases). Even MS-Access will manage 4 GB, a DataSet doesn't even have a
clear upper limit definied. But of course MS Access isn't an in memory, xml
convertible, serializable in memory cache that lets you extract and merge
changes - so comparing dataset with access is like comparing apples and
oranges (but hey the argument is all about a DataSet is an Orange (in memory
cache), not an Apple (Database)).

5. Lets not forget transactional locks and many other such points, I blogged
about it earlier over here -
http://codebetter.com/blogs/sahil.malik/archive/2005/01/23/47547.aspx

6. Datasets are or any such object - AN IN MEMORY disconnected cache of
data. Being completely in memory lends them to the disadvantage of a 32 bit
OS's 2 GB memory allocation limit, there are ways around that but I
personally see those ways as bandaids rather than a true solution. Secondly
being disconnected leaves you with a WORLD of problems to solve when trying
to persist relational data back into the database. If you think
DataAdapter.Update(Dataset) will save your entire dataset into the database,
and also take care of concurrency issues, and transactional deadlocks - you
are sorely mistaken. A simple 3 table hierarchy will require you to write
pages and pages of code to save properly into the database in every
scenario - it is NOT a trivial task. And then you have to worry about not
sending too much over the wire in web service like environments, so
Merge/GetChanges - and oh lets not forget keeping your disconnected cache
fresh. How do you resolve Deleted Rows? .NET 1.1 leaves you with very few
choices, .NET 2.0 has a new IndexOf method which doesn't work in every
circumstance.

Now of course you could argue that #6 proves that simply don't use a
database, only use a disconnected cache i.e. prevayler etc. For that
argument, re-read items #1 thru 5.

In short, while DataSets will continue to improve, or lets say, they will
continue to try and be as good as a full fledged database is, they will
NEVER reach that point. While even though datasets will improve, who knows
you may even be able to run a SQL Query against them directly, but hey you
could always store a little tiny such database on a RAMDisk, or a memory
mapped file in SQL Server, and pretty much get what you need including the
heavy duty research that has already gone in making "a database".

Again, I strongly and vehemently disagree with an architecture that puts 1
GB data into a DataSet. That is complete stupidity in both .NET 1.1 and 2.0.

Whew .. this was a long reply .. gotta go !!

- Sahil Malik [MVP]
ADO.NET 2.0 book -
http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
-------------------------------------------------------------------------------------------




"Nigel Norris" <nospam@nospam.com> wrote in message
news:ugx7MDBmFHA.2860@TK2MSFTNGP15.phx.gbl...
>
> "Sahil Malik [MVP]" <contactmethrumyblog@nospam.com> wrote in message
> news:Of7iYB%23lFHA.576@TK2MSFTNGP15.phx.gbl...
>
>> But what I am certainly not a fan of is storing 400 MB in a DataSet -
>> that object is just not designed for such a heavy amount of in memory
>> data.
>>
> Sahil,
>
> You make that assertion - can you elaborate as to why you believe that? I
> don't see anything in the documentation, or the services that DataSets
> provide, that would make me believe that they are not designed to handle
> large amounts of data (given the inherent constraints of memory, garbage
> collecting very large heaps, etc).
>
> Now it appears to be the case that the V1.1 implementation has some
> performance problems in some areas with large tables, but at least some of
> these are fixed in V2.0. So Microsoft are making efforts to ensure that
> large volumns of data are supported. See the following article for some
> information:
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnadonet/html/datasetenhance.asp
>
> While I would agree that the main use and focus for DataSets is
> *selective* caching of data, I can certainly envisage situations where I
> might want to hold quite a large extract of a table in a dataset (where
> connectivity is not always available, for instance).
>
> ---------
> Nigel Norris
>
>
>



Re: dataset Performence Issue by Sahil

Sahil
Wed Aug 03 23:14:36 CDT 2005

V1.x DataSet not storing megabytes of data, IS NOT AN ISSUE, it is WHAT
YOU'D EXPECT out of a completely disconnected object that stores an
in-memory representation of tabular data.

Your bicycle doesn't fly, is that an issue with the bicycle?? Don't you
think the problem is between the keyboard and the chair? Don't you think you
should have instead bought an airplane instead rather than expecting your
bicycle to fly?

Sorry but you are SORELY mistaken.

But do share the source of your information below. I am curious how did they
pass a judgement on the exact numbers you mention - 500K rows and 1 million
rows.

- Sahil Malik [MVP]
ADO.NET 2.0 book -
http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
----------------------------------------------------------------------------




"Ashishthaps" <Ashishthaps@discussions.microsoft.com> wrote in message
news:38464877-0FBE-4105-BD8F-AF3750A139EA@microsoft.com...
>I have found answer for my queries i would like to share this with you
>
>
> Question
> =======
> "We've heard reports that the Dataset object in the .NET 1.1 frameworks
> tend
> to slow down dramatically after it reaches a certain size, around 70MB or
> so,
> regardless of the amount of RAM installed on the machine. We haven't
> experienced this ourselves yet, but can see the day when our Dataset will
> comprise of hundreds of megabytes of data. "
>
>
>
> Please find answers to your queries below
> ============================
>
> 1 Is this a well-known issue?
>
>
> Yes for v1.x, it's a well known issue.
>
>
>
> 2 Is this a general performance problem, or just a problem with certain
> methods? The Merge method comes to mind, which we use quite a bit in our
> code indirectly.
>
>
>
> For v1.x, this is a general performance problem - it cuts across all write
> actions i.e. all methods that modify DataRows - including Merge.
>
> The performance problem is mitigated to some extent for bulk load
> operations
> which include Merge, Adapter.Fill and Table.AcceptChanges(). The impact is
> most visible for singleton random inserts, deletes and modifications.
>
>
>
> 3 Is there any way to work around this limitation?
>
>
>
> Ways to work around this limitation?
>
> 1. Try to minimize constraints on columns. Constrains include
> column.AllowDbNull, ForeignKey, PrimaryKey, UniqueKey, etc.
>
> 2. Try to minimize the use of relations as it involved building indexes on
> the participating columns of the tables.
>
>
>
> 4 Does the NET 2.0 framework address this issue?
>
> Yes, out of the box - no changes required in user code. The benefits cut
> all
> across Dataset.
>
>
>
> To give you some perspective on the change, 70 MB is at most 500K rows,
> and
> in v2.0 you can fill 1 million rows, inserted randomly with a primary key
> in
> around 30 seconds. To insert the same 1 million rows in v1.x, it takes 30
> minutes.
>
>
> Thanks for your support
>
> "Ashishthaps" wrote:
>
>> We've heard reports that the Dataset object in the .NET 1.1 frameworks
>> tends
>> to slow down dramatically after it reaches a certain size, around 70MB or
>> so,
>> regardless of the amount of RAM installed on the machine. We haven't
>> experienced this ourselves yet, but can see the day when our Dataset will
>> comprise of hundreds of megabytes of data.
>>
>>
>>
>> Questions:
>>
>>
>>
>> 1 Is this a well-known issue?
>>
>> 2 Is this a general performance problem, or just a problem with certain
>> methods? The Merge method comes to mind, which we use quite a bit in our
>> code indirectly.
>>
>> 3 Is there any way to work around this limitation?
>>
>> 4 Does the NET 2.0 framework address this issue?
>>
>>



Re: dataset Performence Issue by Nigel

Nigel
Thu Aug 04 03:01:06 CDT 2005

Sahil,

An interesting and well argued post. However I fear you are setting up a
Straw Man here. I never suggested that a DataSet is an alternative to a
DBMS. All I'm suggesting is that the OP'S use of a 70Mb DataSet is not
necessarily 'awful'. Indeed there may be cases where that's a very good
thing to do.

Clearly an in-memory approach has inherent scaling limits - the virtual and
physical memory on the machine. But 70Mb is a perfectly reasonable amount
of data to hold in memory in some applications. A suitable structured
in-memory solution tailored to specific queries can outperform a DBMS.
Network and disk costs see to that.

More comments in-line.

Nigel

"Sahil Malik [MVP]" <contactmethrumyblog@nospam.com> wrote in message
news:u$hUoCFmFHA.3300@TK2MSFTNGP15.phx.gbl...

> 1. First of all, Dataset is guess what - Managed Code, and to send all the
> .NET lovers in tailspin, Managed Code can never be as fast and as
> optimized as native code.
Ok, Managed Code has a cost. But DBMS's require network and disk operations,
which have a way bigger cost. You have to look at all the costs before you
can conclude that one approach is faster than the other.

> 2. Secondly, the Garbage collector is that animal that makes things very
> very good for 90% of the situations i.e. normal memory usage, but when you
> start storing many megabytes or close to a gigabyte of information
> completely in RAM - it will actually hurt your application performance. In
> those scenarios, you don't want an external policeman who doesn't
> understand the specific needs of your app. In that situation, you want
> fine control on the memory where you specifiy when it gets cleaned, or
> serialized to the disk etc. You need paging mechanisms etc. which are
> possible to write for the dataset but are a real royal pain to write and
> even then they don't work quite as well as - guess what - native code
> (i.e. most of what SQL Server).
>
My data is going to fit in VM - I would never suggest writing any form of
paging mechansism. Then you really are re-inventing a DBMS. The only paging
mechanism I'm relying on is the OS paging, which is - guess what - native
code that has been optimized to the max. In any case I'd expect to have
enough physical memory not to need paging.

> 3. SQL Server and any database comes with a "Query Engine". The number of
> optimizations built into that is the work of many Phds (or dudes with
> similar smarts and specialization), they have written up SQL Server's
> query engine to take advantage of automatic paging, locking algorithms,
> spilling over to the disk when needed, "query plans", caching those query
> plans - when you compare the object model of a Dataset (or any biz object
> for that matter), the comparison is like comparing a candle with the sun.
>
And it needs all that complexity because it's trying to minimize disk IO,
and because it's got to service every kind of request thrown at it. An
in-memory DataSet used for a specific purpose doesn't need any of that to
achieve the same performance.

> 4. The algorithms in a DataSet is rudimentary, they rely on simple
> techniques such as string matching, string manipulation - that level of
> simplicity. They work on an "Object structure", every value they access
> goes over a dereferenced segment calculation.

DataSets can maintain indexes, as far as I understand. I presume an index
will use a standard Hashtable - which is a reasonably optimized lookup
algorithm. So for a single indexed column lookup performance should be very
good.

>
> 5. Lets not forget transactional locks and many other such points, I
> blogged about it earlier over here -
> http://codebetter.com/blogs/sahil.malik/archive/2005/01/23/47547.aspx
>

Don't need any of that stuff. In-memory is fast - if you really need writes
then just serialize everything with a reader/writer lock.

> 6. Datasets are or any such object - AN IN MEMORY disconnected cache of
> data. Being completely in memory lends them to the disadvantage of a 32
> bit OS's 2 GB memory allocation limit,

Yep - if you have, or may need, 2 Gb of data, use a DBMS. Or use a 64-bit
machine.

> Again, I strongly and vehemently disagree with an architecture that puts 1
> GB data into a DataSet. That is complete stupidity in both .NET 1.1 and
> 2.0.
>
As part of your straw man argument, you've escalated the OP's 70 Mb a bit!



Re: dataset Performence Issue by Cor

Cor
Thu Aug 04 04:17:24 CDT 2005

> Managed Code can never be as fast and as optimized as native code.

Why not?

I have the idea about the opposite.

I make it short otherwise you probably don't understand the question.

Cor



Re: dataset Performence Issue by Miha

Miha
Thu Aug 04 06:37:25 CDT 2005

Actually, managed code can be faster than native code due to CPU specific
optimisations.
Perhaps now there aren't many optiomizations compiler does, however in
future we should se more of them.

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
www.rthand.com
Blog: http://cs.rthand.com/blogs/blog_with_righthand/
SLODUG - Slovene Developer Users Group www.codezone-si.info

"Cor Ligthert [MVP]" <notmyfirstname@planet.nl> wrote in message
news:Ohyp%23VNmFHA.572@TK2MSFTNGP15.phx.gbl...
>> Managed Code can never be as fast and as optimized as native code.
>
> Why not?
>
> I have the idea about the opposite.
>
> I make it short otherwise you probably don't understand the question.
>
> Cor
>



Re: dataset Performence Issue by Cor

Cor
Thu Aug 04 07:11:48 CDT 2005

Miha,

Beside that gives managed code you automatic features that native does not.

It is for me the same discussion as that machine code would be faster than
program language code.

This is true for very small programs, however as soon as it becomes large,
than the amount of code that could have been reused as done by a program
language (and is almost impossible to do by hand), is so much that the
program language wins it almost forever.

Just my thought,

Cor



Re: dataset Performence Issue by Sahil

Sahil
Thu Aug 04 07:29:43 CDT 2005

Maybe in very specific situations it can be. Maybe in future the balance
will tip, but in general native code is faster, don't you think?

- Sahil Malik [MVP]
ADO.NET 2.0 book -
http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
----------------------------------------------------------------------------


"Miha Markic [MVP C#]" <miha at rthand com> wrote in message
news:OW82ljOmFHA.3316@TK2MSFTNGP14.phx.gbl...
> Actually, managed code can be faster than native code due to CPU specific
> optimisations.
> Perhaps now there aren't many optiomizations compiler does, however in
> future we should se more of them.
>
> --
> Miha Markic [MVP C#] - RightHand .NET consulting & development
> www.rthand.com
> Blog: http://cs.rthand.com/blogs/blog_with_righthand/
> SLODUG - Slovene Developer Users Group www.codezone-si.info
>
> "Cor Ligthert [MVP]" <notmyfirstname@planet.nl> wrote in message
> news:Ohyp%23VNmFHA.572@TK2MSFTNGP15.phx.gbl...
>>> Managed Code can never be as fast and as optimized as native code.
>>
>> Why not?
>>
>> I have the idea about the opposite.
>>
>> I make it short otherwise you probably don't understand the question.
>>
>> Cor
>>
>
>



Re: dataset Performence Issue by Miha

Miha
Thu Aug 04 07:44:05 CDT 2005

Hi Sahil,

I am not sure which is faster and when. It depends on so many factors (even
on the CPU type) and it is really hard to compare avoiding "comparing apples
and oranges". Generally speaking I don't think that managed code is slower
than native or at least, significantly slower afterall managed code is
executed as native code...

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
www.rthand.com
Blog: http://cs.rthand.com/blogs/blog_with_righthand/
SLODUG - Slovene Developer Users Group www.codezone-si.info

"Sahil Malik [MVP]" <contactmethrumyblog@nospam.com> wrote in message
news:OWBbACPmFHA.2156@TK2MSFTNGP14.phx.gbl...
> Maybe in very specific situations it can be. Maybe in future the balance
> will tip, but in general native code is faster, don't you think?
>
> - Sahil Malik [MVP]
> ADO.NET 2.0 book -
> http://codebetter.com/blogs/sahil.malik/archive/2005/05/13/63199.aspx
> ----------------------------------------------------------------------------
>
>
> "Miha Markic [MVP C#]" <miha at rthand com> wrote in message
> news:OW82ljOmFHA.3316@TK2MSFTNGP14.phx.gbl...
>> Actually, managed code can be faster than native code due to CPU specific
>> optimisations.
>> Perhaps now there aren't many optiomizations compiler does, however in
>> future we should se more of them.
>>
>> --
>> Miha Markic [MVP C#] - RightHand .NET consulting & development
>> www.rthand.com
>> Blog: http://cs.rthand.com/blogs/blog_with_righthand/
>> SLODUG - Slovene Developer Users Group www.codezone-si.info
>>
>> "Cor Ligthert [MVP]" <notmyfirstname@planet.nl> wrote in message
>> news:Ohyp%23VNmFHA.572@TK2MSFTNGP15.phx.gbl...
>>>> Managed Code can never be as fast and as optimized as native code.
>>>
>>> Why not?
>>>
>>> I have the idea about the opposite.
>>>
>>> I make it short otherwise you probably don't understand the question.
>>>
>>> Cor
>>>
>>
>>
>
>



Re: dataset Performence Issue by Nigel

Nigel
Thu Aug 04 10:44:43 CDT 2005

Just to get some facts instead of opinions, I ran some tests. The test
populated a DataTable with varying numbers of records up to 1 million.

The table was simple - 5 string columns, with one of those columns as the
primary key. Around about 100 bytes of data per row. The kind of application
I was thinking of was an address lookup table.

The two numbers I was interested in were:
- the load time per row
- the time to retrieve a row by primary key

Here's a few sample numbers (times in milliseconds):

Count Load Retrieve
1,000 0.02 0.009
10,000 0.03 0.013
100,000 0.04 0.018
1,000,000 0.06 0.028

CPU is an Athlon 2200. Framework version is V1.1

Basically the load cost per row does rise, but not too bad. However this is
a very simple table. I will interested to try this with the V2.0
improvements.

Retrieval time also rises, though quite slowly. Still, 30 microseconds per
lookup is probably