Bruce
Sat Mar 15 17:42:43 CDT 2008
On Mar 15, 1:46 pm, Jon Skeet [C# MVP] <sk...@pobox.com> wrote:
> Bruce <bruce.james....@gmail.com> wrote:
> > I am designing an app to do log search efficiently. I have gigabytes
> > of server logs that contain all kind of information - typically I
> > query about a user name in a certain time span to find out what the
> > user actually did during that time, what errors he got etc.
> > I previously just used findstr across these files to do it but I am
> > finding it slow and inaccurate.
> > So, I am planning to write a platform that parses all the logs
> > realtime, stores the words in a database.
> > For example a line in the log reading "User connected to server"
> > would result in 4 rows in the database for each of the words, with
> > information about the file, time, relative location in the log among
> > other things.
>
> > This way, if I query for "bruce connected', I would be able to convert
> > it into a database query and fetch the results fairly quickly.
>
> > I have a couple of questions:
> > 1. I am not using any standard search engine since I don't think they
> > index and provide the level of detail I would need. So, does my design
> > of using a database in the above manner sound good?
>
> It sounds like you're ignoring the fact that the database itself is
> probably good at doing full text indexing. Why not have one row per log
> entry, and let the database handle the complicated stuff?
>
> Alternatively, if you're really just building an index, consider using
> a product built specifically for indexing, such as Lucene.
>
> > 2. On top of this platform, I plan to build layers that do intelligent
> > search - say using business logic, it queries and finds out all users
> > who got errors and displays them in a UI.
>
> > I am curious to know whether there is a better approach to this.
>
> That part sounds perfectly reasonable - although again, it may well be
> easier if you have one entry per log entry, with separate columns for
> the nature of the entry (error, info etc), the affected user, etc.
>
> --
> Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet Blog:
http://www.msmvps.com/jon.skeet
> World class .NET training in the UK:
http://iterativetraining.co.uk
Thanks for the reply Jon.
>Why not have one row per log entry, and let the database handle the complicated stuff?
Won't the search get less efficient if I do this?
If the row reads "Bruce sent a message" and I search for Bruce, my
query would be like "Select xx from xxx where log like "% Bruce %"".
Isn't doing the "like" inefficient since the database would have to do
a full table scan?
(whereas if I just had the word, the index can directly get me the
row. Multiple words would require joins though)
Also, I would not know from a log entry what/where the user name is to
have a separate column for it.
Thanks
Bruce