Google Desktop, is it everything it’s cracked up to be, or just on crack?

That’s right. I said ’server’. Although Google Desktop is really intended for a single user, I use it in a more robust environment. At any given time, at work, we need to have 20,000+ PDF documents accessible and searchable, both by file name and content, by all of our staff and from two different offices. Of course, the ideal way to do it is on the cheap… don’t spend the money on hardware or expensive search appliances (i.e. a Google Mini), instead hack something together using existing products and force it to work even if you waste tons of valuable time on it. OK, that’s not my ideal way, it’s my employer’s.

Up until last week, this didn’t seem to be a problem. I had set up a Dell server with Windows XP Pro, Google Desktop, and the DNKA plugin for GDS, which allows the Google Desktop engine to act as a web server and accept traffic from IPs other than localhost. Perfect. We had around 17,000 documents indexed… but all of a sudden it started missing/skipping files during its regular indexing (hence the difference between 20k+ and 17k).

I should point out now that I was using older versions of GDS and DNKA, from late 2005 (which is when I set this up).

I’ve heard/read of other people having problems with GDS’s index cache, where it becomes corrupted and GDS just stops indexing new files. I followed some of the tips and deleted everything in “Documents and Settings\**username**\Local Settings\Application Data\Google\Google Desktop Search”, then restarted GDS to let it completely re-index everything; however, this did NOT work. I repeated this step numerous times only to find that each time GDS produced different, and chaotic, search and indexing results. No rhyme or reason, no pattern, but each time it would somehow ‘choose’ different files to index, or not. Google Desktop was on crack. Maybe it couldn’t handle that many files in one directory structure (with sub-folders), or maybe it couldn’t cope with the speed at which the files were being copied onto the server. Who knows.
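If you’d rather script that cache wipe than click through it by hand, something like this rough Python sketch would do it. The path and the username are placeholders for whatever account GDS runs under on your box, and GDS should be completely exited before you run it:

    import os
    import shutil

    # Placeholder path; swap 'username' for the account Google Desktop runs under.
    index_dir = (r"C:\Documents and Settings\username"
                 r"\Local Settings\Application Data\Google\Google Desktop Search")

    # Wipe everything in the cache folder so GDS rebuilds its index from
    # scratch the next time it starts. Make sure Google Desktop is exited first.
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)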

In the older version it appeared to me that it would try to process as much as it could at once, and then would choke, causing the unfortunate result of not indexing all of the files properly. I gave up on trying to get it to work and updated DNKA and GDS to their current versions, and so far it appears that this is going to yield positive results. The current version of GDS seems to be indexing much more steadily than it did previously, spreading its workload out over a greater amount of time and using MUCH fewer system resources at any given moment.

Also, it’s important to mention that the files are being copied over from one machine to another using Microsoft’s SyncToy. If this works as planned, I’ll then try the same test using ‘robocopy’ from the Windows Server 2003 Resource Kit Tools. (If the MS SyncToy developers happen to read this: try emulating the functions of robocopy in SyncToy, as most people using SyncToy are likely power users anyway and can deal with the more robust set of options that robocopy offers.)
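For the curious, the robocopy test would just be a straight mirror of the PDF folder onto the GDS box. Here’s a rough Python sketch of the kind of call I have in mind; the share and folder names are made up, and the switches are standard robocopy options:

    import subprocess

    # Made-up source/destination paths; /MIR, /R, /W and /LOG are standard
    # robocopy switches (mirror the tree, retry twice, wait 5 seconds
    # between retries, write a log file).
    src = r"\\fileserver\pdfs"   # hypothetical share on the source machine
    dst = r"D:\pdfs"             # hypothetical local folder on the GDS box
    result = subprocess.run(
        ["robocopy", src, dst, "/MIR", "/R:2", "/W:5", r"/LOG:C:\robocopy.log"]
    )

    # robocopy doesn't use the usual 0-for-success convention;
    # any exit code below 8 means the copy went fine.
    if result.returncode < 8:
        print("copy finished OK")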
