Hi to all.

I'm developing a network server application which will be constantly
receiving data from mobile GPRS clients and storing them in a centralized
database (probably SQL Server or Oracle); from the server's point of view,
the clients will just be standard TCP/IP clients, sending data using a
custom protocol I'll design. The data will be sent to the server in
messages, which will contain very small amounts of data (a bunch of floating
point numbers), but will be sent quite often, every second or even faster.
The number of client is not pre-determined, but the application needs to be
able to scale up to hundreds of clients. The server won't do any processing
on the incoming data, just add a timestamp and the client's ID and INSERT a
new row in a table.

I'm going to use async sockets to handle the connections from the clients,
so the data storing actions will be triggered by incoming socket data and
will be handled by the socket callback methods; I think this is much better
than using worker threads to wait on the sockets, since the number of
simultaneous connections can potentially become quite high (am I right on
this?). Now, my question is: which is the best way to do an INSERT on the DB
every time a data message arrives from a connected socket?

I (obviously) don't want to manually open and close a DB connection every
time; I could open one when the server starts and do all of my data access
using that one, but I don't know how this method works when using a
multithreaded application; if I have to use some form of mutual exclusion
here, then the single DB connection will become a bottleneck.
I've heard about using connection pooling, but I don't actually know how it
works and if this is the ideal technique for this scenario.

Can someone please help? :-)

Thanks


Massimo

Re: Storing data from multiple clients by Massimo

Massimo
Sun Oct 01 11:05:41 CDT 2006

"Massimo" <barone@mclink.it> ha scritto nel messaggio
news:eVqE5KX5GHA.2536@TK2MSFTNGP06.phx.gbl...

> Can someone please help? :-)

I forgot to tell: .NET 2.0, C#.
But if you post code samples using VB, I think I'll still be able to read
them ;-)


Massimo


Re: Storing data from multiple clients by Thomas

Thomas
Sun Oct 01 11:19:44 CDT 2006

Massimo

I cannot answer for the async sockets. I've never used them; but for the database:

The .NET Connection classes can not be used by multiple trheads. You have to use one Connection instance for each thread.

Opening a connection will reuse the connection on the SQL Server side if possible, so the SQL Server will create it only once (Connection Pooling). Opening and closing the connection will be done fast.

We have seen, that the most time will be used by inserting the data into the table of SQL Server. Depending on the servers load and the available hardware, you could assume 20..50 ms for every insert. To get a more accurate time, try it out with the planned or similar hardware.
Use stored procedures for the insert; they have better performance than using sql statements.

Try to reduce roundtrips to the database server. You could collect lets say 10 messages from the clients and send them as a bunch to a stored procedure on the database server.

Try to decouple the insert from the TCP sockets, because you never know, how much time it will take. You could use Microsoft Message Queue for that.

Furthermore try to design your application for scale out. That means that you can do the inserts on multiple SQL/Oracle servers.


I hope that helps
Greetings from Switzerland
Thomas


On Sun, 1 Oct 2006 18:00:30 +0200
"Massimo" <barone@mclink.it> wrote:

> Hi to all.
>
> I'm developing a network server application which will be constantly
> receiving data from mobile GPRS clients and storing them in a centralized
> database (probably SQL Server or Oracle); from the server's point of view,
> the clients will just be standard TCP/IP clients, sending data using a
> custom protocol I'll design. The data will be sent to the server in
> messages, which will contain very small amounts of data (a bunch of floating
> point numbers), but will be sent quite often, every second or even faster.
> The number of client is not pre-determined, but the application needs to be
> able to scale up to hundreds of clients. The server won't do any processing
> on the incoming data, just add a timestamp and the client's ID and INSERT a
> new row in a table.
>
> I'm going to use async sockets to handle the connections from the clients,
> so the data storing actions will be triggered by incoming socket data and
> will be handled by the socket callback methods; I think this is much better
> than using worker threads to wait on the sockets, since the number of
> simultaneous connections can potentially become quite high (am I right on
> this?). Now, my question is: which is the best way to do an INSERT on the DB
> every time a data message arrives from a connected socket?
>
> I (obviously) don't want to manually open and close a DB connection every
> time; I could open one when the server starts and do all of my data access
> using that one, but I don't know how this method works when using a
> multithreaded application; if I have to use some form of mutual exclusion
> here, then the single DB connection will become a bottleneck.
> I've heard about using connection pooling, but I don't actually know how it
> works and if this is the ideal technique for this scenario.
>
> Can someone please help? :-)
>
> Thanks
>
>
> Massimo
>

Re: Storing data from multiple clients by Massimo

Massimo
Sun Oct 01 12:07:04 CDT 2006

"Thomas Weingartner" <thomas.weingartner@gmx.nospam.ch> ha scritto nel
messaggio news:20061001181944.b945c881.thomas.weingartner@gmx.nospam.ch...


You gave me some good points, thanks.

> The .NET Connection classes can not be used by multiple trheads.
> You have to use one Connection instance for each thread.

That's the problem: when the framework uses its own worker threads to
process async socket callbacks, you don't have any control on how many
threads there are and which one of them is used. To do as you suggest, I'd
have to manually spawn worker threads and have each one of them manage one
or more connected sockets.
I could also have each callback create a connection object, open it, use it
and then close it, but I think this would be *really* a bottleneck.
Another option would be to have only one connection object and use a mutex
to have only one thread at a time access it, but I think this would create a
big bottleneck too.

> Opening a connection will reuse the connection on the SQL Server side if
> possible, so the SQL Server will create it only once (Connection Pooling).
> Opening and closing the connection will be done fast.

Of course, but only when using the same connection object(s).

> We have seen, that the most time will be used by inserting the data
> into the table of SQL Server. Depending on the servers load and
> he available hardware, you could assume 20..50 ms for every insert.
> To get a more accurate time, try it out with the planned or similar
> hardware.

I know this; the actual INSERT query is not my main concern here... I'm
concerned about how to handle connections from the application side.

> Use stored procedures for the insert; they have better performance
> than using sql statements.

But they only make sense when using one specific DBMS; The application will
need to support different ones.

> Try to reduce roundtrips to the database server. You could collect
> lets say 10 messages from the clients and send them as a bunch
> to a stored procedure on the database server.
>
> Try to decouple the insert from the TCP sockets, because you never know,
> how much time it will take. You could use Microsoft Message Queue for
> that.

That's a really good point; I think maybe MSMQ is too much for this, but an
internal (synchronized) FIFO buffer which would be filled by the socket
callbacks and periodically emptied by a worker thread which moves the data
to the DB could be a good design choice. This could also let me see if a
long queue of unstored data builds up (which would mean the DB can't handle
all those queries), or temporarily store data if the DB is down/unreachable
for a while. I should think more about this, thanks :-)

> Furthermore try to design your application for scale out. That means
> that you can do the inserts on multiple SQL/Oracle servers.

It's not planned to use many DB server (there are license costs here...).
Maybe my server will be deployed in a load balancing configuration, but even
in this case every instance will store data on the same DB.


Massimo


Re: Storing data from multiple clients by Thomas

Thomas
Sun Oct 01 14:38:55 CDT 2006

Massimo

I have to clear some of my proposals:


> That's the problem: when the framework uses its own worker threads to
> process async socket callbacks, you don't have any control on how many
> threads there are and which one of them is used. To do as you suggest, I'd
> have to manually spawn worker threads and have each one of them manage one
> or more connected sockets.
> I could also have each callback create a connection object, open it, use it
> and then close it, but I think this would be *really* a bottleneck.
> Another option would be to have only one connection object and use a mutex
> to have only one thread at a time access it, but I think this would create a
> big bottleneck too.

We have written a customized DataAdapter, that handles the connections internally on a per thread basis. That means the connection will be created for a specific thread. We've got a big application (>1 million loc) written in C# and there are running about 700 queries (average) every second against a SQL Server 2000. For every query we will create a connection object, open it, run the query and close it. The only bottleneck is the database server itself!

I understand your distrust; try it out with a little sample project and measure the time spent to create, open and close the connection object.

> > Opening a connection will reuse the connection on the SQL Server side if
> > possible, so the SQL Server will create it only once (Connection Pooling).
> > Opening and closing the connection will be done fast.
>
> Of course, but only when using the same connection object(s).

Thats not true. Connection pooling will be done with the same "configuration". That means with the same data in the connection string and Windows identity when integrated security is used.
You should read the chapter "Using Connection Pooling" in the VS.NET 2005 help.

> > Try to reduce roundtrips to the database server. You could collect
> > lets say 10 messages from the clients and send them as a bunch
> > to a stored procedure on the database server.
> > Try to decouple the insert from the TCP sockets, because you never know,
> > how much time it will take. You could use Microsoft Message Queue for
> > that.
> That's a really good point; I think maybe MSMQ is too much for this, but an
> internal (synchronized) FIFO buffer which would be filled by the socket
> callbacks and periodically emptied by a worker thread which moves the data
> to the DB could be a good design choice. This could also let me see if a
> long queue of unstored data builds up (which would mean the DB can't handle
> all those queries), or temporarily store data if the DB is down/unreachable
> for a while. I should think more about this, thanks :-)

It's very easy to use MSMQ. With your own FIFO buffer, you have to handle the synchronisation by yourself. Of course you even need a worker thread with MSMQ.

I post some sample code here (I left exception handling away). To use this code, you have to install MSMQ from your Windows Components (even unter Windows 2000/XP):

using System;
using System.Messaging;

public class AutoCreateMessageQueue
{
private string m_queueName;
private MessageQueue m_msgQ;

public AutoCreateMessageQueue(string queueName)
{
m_queueName = queueName;
}

public void Initialize()
{
// Create the queue if it does not already exist
if (!MessageQueue.Exists(m_queueName))
{
m_msgQ = MessageQueue.Create(m_queueName);
}
else
{
m_msgQ = new MessageQueue(m_queueName);
}
}

/// <summary>
/// Deletes all the messages contained in the queue.
/// </summary>
public void Purge()
{
m_msgQ.Purge();
}

/// <summary>
/// Adds an object to the nontransactional queue and specifies a label for the message.
/// </summary>
public void Add(string message, string label)
{
m_msgQ.Send(message, label);
}

/// <summary>
/// Receives the first message available in the queue. This call is synchronous, and
/// blocks the current thread of execution until a message is available.
/// </summary>
/// <returns>Returns the content of the message. The caller is responsible to interpret the contents.</returns>
public object GetBlocking()
{
Message myMessage = m_msgQ.Receive();
return myMessage.Body;
}
}


...
// socket side initializing
AutoCreateMessageQueue queue = new AutoCreateMessageQueue(@".\private$\DbInsertData");
queue.Initialize();
queue.Purge(); // cleanup old messages in queue

...
// socket side adding a message
string data = GetData(your parameter to assemble a string);
queue.Add(data, data.Id); // you could serialize your data class directly

...
// Worker thread initializing
AutoCreateMessageQueue queue = new AutoCreateMessageQueue(@".\private$\DbInsertData");
queue.Initialize();

...
// Worker thread reading the message and inserting in db
string data = (string)queue.GetBlocking()
InsertDataIntoDb(data);
...

That way, you could dynamically setup more worker threads to insert data into the database. The synchronization is done completely by MSMQ.


Good luck
Thomas

Re: Storing data from multiple clients by Massimo

Massimo
Sun Oct 01 15:01:56 CDT 2006

"Thomas Weingartner" <thomas.weingartner@gmx.nospam.ch> ha scritto nel
messaggio news:20061001213855.5119dff9.thomas.weingartner@gmx.nospam.ch...


> For every query we will create a connection object, open it, run
> the query and close it. The only bottleneck is the database server itself!
>
> I understand your distrust; try it out with a little sample project and
> measure the time spent to create, open and close the connection object.

Ok, I'll give it a try.
It just doesn't seem so obvious that having to create and open connections
on the fly is almost as efficient as using permanent connections :-)

> It's very easy to use MSMQ. With your own FIFO buffer, you have
> to handle the synchronisation by yourself. Of course you even need
> a worker thread with MSMQ.
>
> I post some sample code here (I left exception handling away).

[CUT]

> That way, you could dynamically setup more worker threads to
> insert data into the database. The synchronization is done completely
> by MSMQ.

That's quite nice. Thanks for the tip.


Massimo


Re: Storing data from multiple clients by Massimo

Massimo
Sun Oct 01 15:35:41 CDT 2006

"Thomas Weingartner" <thomas.weingartner@gmx.nospam.ch> ha scritto nel
messaggio news:20061001213855.5119dff9.thomas.weingartner@gmx.nospam.ch...

> That way, you could dynamically setup more worker threads
> to insert data into the database.

A question about this: does it make any sense to use N worker threads
instead of one, if they're sending the data to the same database? Maybe if
the database is running on a SMP machine, so it *can* actually process more
queries simultaneously; but what if it's running on a single CPU? Can it
better optimize the queries if they're coming from two or more client
connections instead of one?


Massimo


Re: Storing data from multiple clients by Thomas

Thomas
Mon Oct 02 07:14:11 CDT 2006

Massimo

In my opinion it makes sense, because scalable database servers work
with multiple threads too. The inserts could be processed in parallel
either using multiple hard disks to write the data or by buffering the
disk writes in the disk caches.

The queries from one client can only be processed serial.

Thomas

Massimo wrote:
> "Thomas Weingartner" <thomas.weingartner@gmx.nospam.ch> ha scritto nel
> messaggio news:20061001213855.5119dff9.thomas.weingartner@gmx.nospam.ch...
>
>> That way, you could dynamically setup more worker threads
>> to insert data into the database.
>
> A question about this: does it make any sense to use N worker threads
> instead of one, if they're sending the data to the same database? Maybe
> if the database is running on a SMP machine, so it *can* actually
> process more queries simultaneously; but what if it's running on a
> single CPU? Can it better optimize the queries if they're coming from
> two or more client connections instead of one?
>
>
> Massimo
>

Re: Storing data from multiple clients by Massimo

Massimo
Mon Oct 02 08:26:36 CDT 2006

"Thomas Weingartner" <thomas.weingartner@nospam.gmx.ch> ha scritto nel
messaggio news:%23UE%23Xwh5GHA.1200@TK2MSFTNGP02.phx.gbl...

> In my opinion it makes sense, because scalable database servers work
> with multiple threads too. The inserts could be processed in parallel
> either using multiple hard disks to write the data or by buffering the
> disk writes in the disk caches.
>
> The queries from one client can only be processed serial.

I was thinking the same.

About MSMQ: it's actually fairly easy to use, but I don't think it's the
best choice for an in-memory buffer used by a single application; it stores
its data on disk, and also its general-purpose architecture surely adds some
overhead. I think going with a System.Collections.Generic.Queue<> and some
lock objects could be a lot better.


Massimo


Re: Storing data from multiple clients by Thomas

Thomas
Fri Oct 06 02:01:09 CDT 2006

Massimo

> About MSMQ: it's actually fairly easy to use, but I don't think it's the
> best choice for an in-memory buffer used by a single application; it
> stores its data on disk, and also its general-purpose architecture
> surely adds some overhead. I think going with a
> System.Collections.Generic.Queue<> and some lock objects could be a lot
> better.

You are right.

MSMQ is more flexible about scaling out on another machine,
transactional handling etc. If you don't use these features, your choice
will be better.

Thomas