Linux – Desktop Search

A while ago I published a post on Desktop Search on Linux (specifically Ubuntu).  I was far from happy with my conclusions and I felt I needed to re-evaluate all the options to see which would really perform the most accurate search against my information.

Primarily my information consists of Microsoft Office documents, Open Office documents, pictures (JPEG, as well as Canon RAW and Nikon RAW), web pages, archives, and email (stored as RFC822/RFC2822 compliant files with an eml extension).

My test metric was to take a handful of search terms that I knew existed in various types of documents and check the results. (I actually used Microsoft Windows Search 4.0 to prepare a complete list of the documents that matched each query, since I knew it worked as expected.)
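If you want to put a number on each engine, the comparison boils down to recall against the reference list. Here's a minimal Python sketch, assuming the reference hits and each engine's hits have been exported as sets of file paths (the engine and file names below are placeholders, not my actual results):

```python
# Score each search engine against the reference list prepared with
# Windows Search. Engine names and file names are placeholders.


def recall(reference: set[str], hits: set[str]) -> float:
    """Fraction of the reference documents the engine actually returned."""
    if not reference:
        return 1.0
    return len(reference & hits) / len(reference)


def missing(reference: set[str], hits: set[str]) -> set[str]:
    """Reference documents the engine failed to return."""
    return reference - hits


if __name__ == "__main__":
    reference = {"budget.doc", "budget.odt", "budget.eml", "budget.txt"}
    results = {
        "engine_a": {"budget.doc", "budget.odt"},
        "engine_b": {"budget.doc", "budget.odt", "budget.eml", "budget.txt"},
    }
    for name, hits in results.items():
        print(f"{name}: recall={recall(reference, hits):.2f}, "
              f"missing={sorted(missing(reference, hits))}")
```

The `missing` set is the more useful of the two in practice: it tells you which file types (email, plain text, and so on) an engine is silently skipping.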

The search engines I tested were Beagle, Google Desktop, Tracker, Pinot, Recoll, and Strigi.

I was able to install, configure, and launch each of the applications.  Actually none of them were really that difficult to install and configure; but all of them required searching through documentation and third party sites — I’d say poor documentation is just something you have to get used to.

Beagle, Google, Tracker, Pinot, and Recoll all failed to find all the documents of interest… none of them properly indexed the email files, and most of them failed to handle plain text files; that didn’t leave a very high bar for picking a winner.

Queries on Strigi actually returned every hit that the same query returned on Windows Search… though I have to say Windows Search was easier to set up and use.

I tried the Nepomuk (KDE) interface for Strigi, though it just didn’t seem to work as well as strigiclient did… and strigiclient was certainly near the top of the list of the most butt-ugly, user-hostile, unintuitive applications I’d ever seen.

After all the time I’ve spent on desktop search for Linux, I’ve decided all of the search solutions are jokes.  None of them are well thought out, none of them are well executed, and most of them outright don’t work.

As with most Linux projects, more energy needs to go into working out a common framework for search rather than into everyone going off half-cocked and creating a new search paradigm.

The right model is…

A single multi-threaded indexer running in the background indexing files according to a system wide policy aggregated with user policies (settable by each user on directories they own) along with the access privileges.
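The policy aggregation in that first point could be modeled roughly like this. It's a minimal Python sketch under my own assumptions, not an existing API: a policy is just a set of included and excluded directory prefixes, system-wide exclusions always win, and a user’s policy is only consulted for files that user owns (`Policy` and `should_index` are hypothetical names).

```python
# Hypothetical policy model: each policy is a set of included and a set of
# excluded directory prefixes. System exclusions always win, and a user's
# policy is only consulted for paths that user owns.
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Policy:
    include: set[str] = field(default_factory=set)
    exclude: set[str] = field(default_factory=set)


def _under(path: PurePosixPath, prefixes: set[str]) -> bool:
    """True if path is one of the prefixes or lies beneath one of them."""
    return any(path.is_relative_to(PurePosixPath(p)) for p in prefixes)


def should_index(path: str, owner: str, system: Policy,
                 user_policies: dict[str, Policy]) -> bool:
    p = PurePosixPath(path)
    if _under(p, system.exclude):
        return False                    # system-wide exclusions always win
    user = user_policies.get(owner)     # only the owner's policy applies
    if user is not None:
        if _under(p, user.exclude):
            return False                # owner opted this directory out
        if _under(p, user.include):
            return True                 # owner opted this directory in
    return _under(p, system.include)    # otherwise use the system default
```

Looking the policy up by the file’s owner is what enforces the “directories they own” rule: a user can’t opt somebody else’s files in or out.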

A search API that takes the user/group and query to provide results for items that the user has (read) access to.
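Here’s a minimal sketch of that search API in Python, assuming (hypothetically) that each index entry stores the owner’s uid, the group’s gid, and the POSIX mode bits captured at index time. The plain substring match stands in for a real query engine, and root and ACLs are ignored.

```python
# Hypothetical index entries carry ownership and mode bits so that results
# can be filtered by the caller's read access at query time.
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    path: str
    text: str
    uid: int
    gid: int
    mode: int


def can_read(entry: Entry, uid: int, gids: set[int]) -> bool:
    """Classic owner/group/other read check (root and ACLs ignored)."""
    if uid == entry.uid:
        return bool(entry.mode & stat.S_IRUSR)
    if entry.gid in gids:
        return bool(entry.mode & stat.S_IRGRP)
    return bool(entry.mode & stat.S_IROTH)


def search(index: list[Entry], query: str, uid: int, gids: set[int]) -> list[str]:
    """Return paths that match the query AND that the caller may read."""
    q = query.lower()
    return [e.path for e in index
            if q in e.text.lower() and can_read(e, uid, gids)]
```

Filtering at query time (rather than keeping a separate index per user) means one shared index can serve every user on the machine.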

The indexer should be designed to use plug-in modules to handle particular file types (mapped both by file extension, and by file content).
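A rough sketch of how such a registry might look, checking file content (magic numbers) before falling back on the extension. `Registry` and its methods are hypothetical names of my own; the `%PDF` signature is the real start of a PDF file, but the handlers are stubs.

```python
# Hypothetical plug-in registry: handlers turn raw bytes into indexable
# text. Content signatures are checked before the file extension.
import os
from typing import Callable, Optional

Handler = Callable[[bytes], str]


class Registry:
    def __init__(self) -> None:
        self._by_ext: dict[str, Handler] = {}
        self._by_magic: list[tuple[bytes, Handler]] = []

    def register_ext(self, ext: str, handler: Handler) -> None:
        self._by_ext[ext.lower().lstrip(".")] = handler

    def register_magic(self, prefix: bytes, handler: Handler) -> None:
        self._by_magic.append((prefix, handler))

    def find(self, filename: str, head: bytes) -> Optional[Handler]:
        """Pick a handler by content first, then fall back to the extension."""
        for prefix, handler in self._by_magic:
            if head.startswith(prefix):
                return handler
        ext = os.path.splitext(filename)[1].lower().lstrip(".")
        return self._by_ext.get(ext)


if __name__ == "__main__":
    registry = Registry()
    registry.register_ext("txt", lambda data: data.decode("utf-8", "replace"))
    registry.register_magic(b"%PDF", lambda data: "<text extracted from PDF>")
    # A PDF saved with a .txt extension is still routed to the PDF handler.
    handler = registry.find("report.txt", b"%PDF-1.4 ...")
    print(handler(b"%PDF-1.4 ..."))
```

Checking content first is the design choice that matters: a mislabeled file still gets the right handler, and the extension map only covers formats without a reliable signature (like plain text).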

The indexer should also be designed to use plug-in modules for walking a file system and receiving file system change events (which allows the framework to adapt as the Linux kernel changes, and would support remote indexing as well).
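A sketch of what that walker/change-event plug-in interface could look like. `FileSource` and `PollingSource` are hypothetical names; the polling implementation just compares modification times so it runs anywhere, whereas a real Linux plug-in would wrap inotify, and a remote plug-in could walk a network share, without the indexer itself changing. Note that the first call to `changes()` reports every existing file as created.

```python
# Hypothetical file-system plug-in interface plus a portable polling
# implementation (a stand-in for an inotify-based plug-in on Linux).
import os
from abc import ABC, abstractmethod
from typing import Iterator


class FileSource(ABC):
    """Interface a file-system plug-in would implement."""

    @abstractmethod
    def walk(self) -> Iterator[str]:
        """Yield every file path under this source (the initial crawl)."""

    @abstractmethod
    def changes(self) -> list[tuple[str, str]]:
        """Return (event, path) pairs observed since the previous call."""


class PollingSource(FileSource):
    """Portable fallback that detects changes by comparing mtimes."""

    def __init__(self, root: str) -> None:
        self.root = root
        self._seen: dict[str, float] = {}

    def walk(self) -> Iterator[str]:
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                yield os.path.join(dirpath, name)

    def changes(self) -> list[tuple[str, str]]:
        current: dict[str, float] = {}
        for path in self.walk():
            try:
                current[path] = os.stat(path).st_mtime
            except FileNotFoundError:
                continue  # removed between listing and stat
        events = [("created", p) for p in current if p not in self._seen]
        events += [("modified", p) for p, m in current.items()
                   if p in self._seen and m != self._seen[p]]
        events += [("deleted", p) for p in self._seen if p not in current]
        self._seen = current
        return events
```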

Additionally, the index/search should be designed with distributed queries in mind (often you want to search many servers, desktops, and web locations simultaneously).
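The distributed query piece might look something like this in Python. It’s a sketch under assumptions of my own: each back end (desktop, server, web) is a hypothetical callable returning (score, location) pairs, the scores are comparable across back ends, and a back end that fails or doesn’t answer within the timeout is simply left out of the results.

```python
# Hypothetical federated search: fan the query out to every back end in
# parallel, then merge whatever came back in time, best score first.
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable

Backend = Callable[[str], list[tuple[float, str]]]


def federated_search(query: str, backends: dict[str, Backend],
                     timeout: float = 5.0) -> list[tuple[float, str, str]]:
    pool = ThreadPoolExecutor(max_workers=max(1, len(backends)))
    futures = {pool.submit(fn, query): name for name, fn in backends.items()}
    done, _pending = wait(futures, timeout=timeout)
    # Don't block on slow back ends; cancel anything that hasn't started.
    pool.shutdown(wait=False, cancel_futures=True)
    merged: list[tuple[float, str, str]] = []
    for fut in done:
        if fut.exception() is None:          # skip back ends that failed
            name = futures[fut]
            merged.extend((score, name, loc) for score, loc in fut.result())
    merged.sort(key=lambda r: r[0], reverse=True)
    return merged
```

Tolerating a dead or slow back end is the important part; otherwise one unreachable server would stall every search.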

Then it becomes a simple matter for developers to write new/better indexer plug-ins; and better search interfaces.

I’ve pointed out in a number of recent posts that you can effectively use Linux as a server platform in your business; however, if search is a requirement, you might want to consider ponying up the money for Microsoft Windows Server 2008 and enjoying seamless search (that works) between your Windows Vista / Windows 7 desktops and Windows Server.


Bootrec.exe

Bootrec.exe, available from the command line in the Windows repair environment, can resolve a number of startup issues on Windows.  It comes in quite handy for replacing the master boot record (MBR) and boot loader (a good way to remove a multi-boot manager like GRUB).

Be sure you understand what you’re doing if you choose to use it.

Use the Bootrec.exe tool in the Windows Recovery Environment to troubleshoot and repair startup issues in Windows

Windows 6 Service Pack 2

It’s out… it’s been in BETA for quite some time.

Just so you’re clear: Windows 6 covers the whole Vista family and the Server 2008 family. There’s an installer for 32-bit and one for 64-bit, and there’s also a DVD image that you can install either one from.

You can find a number of articles on the web telling you all about what was originally supposed to be in SP2 and what ended up in it… other than Bluetooth 2.1 and Blu-ray support, there isn’t much in the way of “features” that caught my eye.

The big thing you will notice is that this makes Vista noticeably faster… and it includes the compcln.exe tool, which lets you remove previous component versions (saving disk space; of course once you do so, you cannot go back to previous versions… but if your machine is stable after SP2 you probably wouldn’t want to).

You must have SP1 installed first (Server 2008 comes with SP1 “pre-installed”).

You can access the Microsoft TechNet article via the link below and download the file(s) you desire.  At the moment SP2 is not included in automatic updates, but it will likely be pushed out soon.


Computer Tidbits

I haven’t sent out one of these tidbit emails in a long, long time; this is just a collection of little points that you might find handy.

Server 2008 is indeed out and available. I think I’m going to wait a few months (and I’m just about out of funds for MSFT store purchases, so it’s doubtful I can get a copy for anyone else; I’ll probably do the MSDN OS subscription again). Hyper-V has not shipped yet.

Service Pack 1 for Vista can be downloaded, or you’ll get it from Windows Update. If you’re updating more than a single machine, download the whole thing (Windows Update will swamp your connection). There are separate packs for 32-bit and 64-bit (you may need both if you have both kinds of machines). Also, copy the update file to the local disk (it will need elevated privileges to install).

Virtual Server 2005 R2 can be installed on XP, XP-64, Vista-32, or Vista-64. The management interface requires IIS, so setup is a little different on non-server platforms, which only offer a limited, PWS-style version of IIS. If you have VS installed on a server, you should be able to manage _all_ of your installations from one management interface (though Vista doesn’t make that easy).

Google GMail lets you host email for your own domains there for free… you basically get GMail accounts in your own domain. I’ve moved my mail services over there for the time being (I still archive all my email on my own server at home, but the active send/receive is done via GMail).

Parallels is coming out with a new server (64 & 32 bit) to compete with Hyper-V. I looked at the beta (definitely a beta, but usable); they may be able to win some market share, but my guess is they’ll take it from VMware (I didn’t care for the Mac-ish look of the product on Windows).

2.5″ SATA disk drives continue to fall in price; Seagate 250GB drives were $104 @ Fry’s, and they still had some on the shelf on Monday!!!

Intel hasn’t released most of the 45nm processor family yet; the older Core 2 Duo and Quad processors continue to be a great buy. Remember that none of the current Intel chipsets really take advantage of the higher transfer rates the newer processors are capable of (well, the X38 does, but it’s supposed to have major issues), so you might want to wait for the next generation of Intel chips and motherboards to hit the market. FYI: Intel delayed the release because AMD missed its ship dates… AMD’s new cores had some rather serious flaws.

Notebook and desktop memory are now nearly on par with each other in price. You can easily purchase 2 x 2GB for under $100 (even the really fast memory). $60 is actually the low price, and $80 gets you high quality with heat spreaders (notebook memory doesn’t have heat spreaders; there’s no room). 2 x 1GB can be purchased for $40!!!
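To put those memory prices side by side, here’s the per-gigabyte arithmetic in a few lines of Python (prices as quoted above; capacity is the total for each kit):

```python
# Price per GB of the memory kits quoted above (prices as quoted in this
# post; capacity is the total for the kit).
kits = {
    "2 x 2GB, low price": (60.00, 4),
    "2 x 2GB, with heat spreaders": (80.00, 4),
    "2 x 1GB": (40.00, 2),
}


def price_per_gb(price: float, gigabytes: int) -> float:
    return price / gigabytes


for name, (price, gb) in kits.items():
    print(f"{name}: ${price_per_gb(price, gb):.2f}/GB")
```

That works out to $15/GB for the cheapest 4GB kit and $20/GB for both the premium 4GB kit and the 2GB kit, so the bigger kit costs no more per gigabyte even with heat spreaders.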

Microsoft WebsiteSpark

A program that offers visibility, support and software for professional Web Developers and Designers

If your company has ten or fewer employees, has been around for less than three years, and provides services, support, and hosting to businesses that develop web sites and applications, you might qualify for deeply discounted (as in free or nearly free) Windows Web Server and SQL Server Web Edition.

You can get more information at the Microsoft® WebsiteSpark page…

Microsoft® WebsiteSpark

Hyper-V Server

With the release of Windows Server 2008, Microsoft took a huge step forward by releasing a thin, high-performance hypervisor for machine virtualization: Hyper-V.

Microsoft has also baited the market by offering a free version of Windows Server 2008 specifically designed to be a virtualization host: Hyper-V Server.

I decided to play with Windows Server 2008 with Hyper-V and Hyper-V Server to get a feel for what they could do.

Installation is a snap; much the same as Vista.

With Windows Server 2008 with Hyper-V, everything goes very smoothly and just works.  You can use the Hyper-V Manager to set up virtual machines, run them, stop them, etc.  But one thing you want to do while you have Windows Server 2008 up and running is figure out everything you need to remotely manage Hyper-V and Server 2008 from your workstation, because Hyper-V Server isn’t going to let you do much from the console.

To say it’s a little complicated to get remote Hyper-V management working is an understatement; after I figured it out, I found a tool that can help automate the setup, which makes life much easier.

The one thing I never got working from Vista x64 was remote management of Windows Server 2008 – and you really need that as well (remember you don’t get much capability from the console).  I’ll probably play with that a little more; and certainly I’ll get it working before I deploy any Hyper-V servers (it’s not a huge problem if you have a Windows Server 2008 machine already, remote management of other Windows Server 2008 boxes just works).

Now, after the headache of getting everything configured properly, it was time to put Hyper-V through its paces.

First task: migrate a machine over from Virtual Server 2005 R2 SP2… piece of cake. Copy over the VHD files, create a machine, and hook up the disks (you may have to backtrack, since Hyper-V seems to have a fairly set directory layout for machines and disks; if you create a new machine on Hyper-V first, you’ll see the layout).  Boot the machine, connect, remove the virtual machine additions, reboot, and install the new virtual machine files; it asks to update the HAL (say yes), reboot, finally install the new virtual machine files, reboot, then regenerate the SID and rename the machine (I still have the old one, and I don’t want confusion)… and everything works great.  Shut down the machine, add a second processor, start it up… and a dual-processor virtual machine is born.

I migrated a 32-bit XP Professional machine over and did a test install of 64-bit Server 2003… and everything worked just fine.

Don’t get carried away just yet.

There are a couple of gotchas with this.

  • To effectively use the free Hyper-V Server you either need a Windows Server 2008 (full install) or you need to get the remote tools working from your workstation; that’s non-trivial.
  • To run Hyper-V Server or Windows Server 2008 with Hyper-V you need a machine with hardware virtualization and execute disable (which really isn’t that uncommon these days, just make sure your BIOS has those features enabled).
  • Once you migrate a machine to Hyper-V there’s no automated way to go back to Virtual Server 2005 R2 SP2 (sure you can probably do it — but it’s going to be a pain).
  • To get performance out of Hyper-V you really need to use SCSI virtual disks; right now Microsoft doesn’t support booting from SCSI disks in Hyper-V, since they only support the para-virtualized SCSI interface.  So to get performance you have to have an IDE boot disk and run off SCSI disks (not exactly a common installation, so you probably won’t be converting any physical machines like that, and it seems like a nightmare just waiting to unfold).

Fortunately I’m not in a huge hurry to move to Hyper-V; since it’s a cornerstone of Microsoft’s push to own the virtual infrastructure market, I suspect the issues that keep it from being all that it can be will be resolved quickly.

And I’ll close with an up-note… WOW — the performance was very impressive… I really wish I had a test machine with lots of spindles to see what kind of load I could realistically put on it.

Windows Component Clean Utility

When you install Windows V6 SP2 you will also get the Component Clean Utility (compcln.exe).

This utility will remove previous component versions from your computer, saving disk space and reducing the size of the installation catalog.

The caveat is that once you remove previous components you will not be able to go back to them.

Before running this utility, it’s prudent to ensure that your computer is stable after the last update and to create a backup (using something like Acronis or the backup tool included with Vista).

Performing simple maintenance tasks and reducing the amount of “fluff” on your disk will help keep your computer running well and running fast[er]. (Remember, the Disk Cleanup tool is a good thing to run occasionally as well, and even the included disk defragmenter will help after a great deal of use, though not as much as something like O&O Defrag.)

Virtualization, Virtualization, Virtualization

For a decade now I’ve been a fan of virtualization (of course, that’s partially predicated on understanding what virtualization is, how it works, and its limitations).

For software developers it offers a large number of practical uses… but more and more the average computer user is discovering the benefits of using virtual machines.

In Windows 7, Microsoft has built the “Windows XP” compatibility feature on top of virtualization, which means that to use it you’ll need a processor that supports hardware virtualization; many low-end computers and notebooks won’t be able to use the XP compatibility feature.

While Windows 7 might make running older programs seamless, you can (of course) install another virtualization package and still run older software.

Which virtualization package to choose???

Well, for me it’s an easy choice…

  • Windows Server 2008 on machines that have hardware virtualization – Hyper-V
  • Windows 7 on machines that have hardware virtualization – Virtual PC
  • All others (Windows, OS-X, Linux) – VirtualBox

Now, the disclaimers… if I were running a commercial enterprise and didn’t want to spend the money on Windows Server 2008, Microsoft does offer Hyper-V Server 2008 at no cost (you really need one Windows Server 2008 machine to manage it effectively, but you can install the tools on Vista if you really don’t have the budget for a single license).

And no, I wouldn’t choose Linux OR OS-X as the platform to run a commercial virtualization infrastructure on… simply because Windows device support for modern hardware (and modern hardware is what you’re going to base a commercial virtualization infrastructure on if you’re serious) is unparalleled, PERIOD.

If you’re running Vista or Vista 64, you may decide to use Virtual PC (a better choice would be Virtual Server 2005 R2), but VirtualBox is being actively developed, and its hardware reference for virtualization is much more modern (and I feel it’s a better choice).

To make it simple… the choice comes down to Microsoft Hyper-V derived technology or VirtualBox.  Perhaps if I were a *nix bigot I’d put Xen in the loop, but as with so many Linux-centric projects, there are TOO MANY distributions and too many splinter efforts.

One last note; keep in mind that you need a license for any operating system that you run in a virtual environment.

Online Capacity Expansion


  • Call me old-fashioned…
  • Call me conservative…
  • Call me a doubting “Thomas”…
  • Call me tickled pink…
  • Call me surprised…

I just finished adding four additional spindles to one of my virtual hosts; when I originally built it out I only had four spindles available, and didn’t want to buy more since I knew I would be freeing up smaller spindles for it soon.

The first task was to have the RAID software add the new spindles to the array, then to “expand” the array container… the first step took only a few moments, the second step took about 20 hours for the array controller to rebuild / expand the array.

The second task was to get Windows to actually use the added space by expanding the volume; that was a simple matter of using diskpart.exe (you can search Microsoft’s Knowledge Base for the details) and only took a few moments.
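The diskpart step boils down to two commands, `select volume` and `extend` (with no size given, `extend` uses all of the contiguous unallocated space after the volume). Here’s a minimal Python sketch that writes those commands to a script file and runs `diskpart /s`; the volume number is a placeholder (check yours with `list volume` first), it has to run from an elevated prompt on the server itself, and the helper names are my own.

```python
# Build (and optionally run) a diskpart script that grows a volume into
# the space added by the array expansion. Volume 3 is a placeholder.
import os
import subprocess
import tempfile


def extend_script(volume: int) -> str:
    # With no size= argument, extend uses all contiguous free space.
    return f"select volume {volume}\nextend\n"


def run_diskpart(script: str) -> int:
    """Write the script to a temp file and run 'diskpart /s' (elevated)."""
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(script)
        return subprocess.run(["diskpart", "/s", path]).returncode
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(extend_script(3))            # review the commands first
    # run_diskpart(extend_script(3))   # then run on the server itself
```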

The incredible thing about this was that my virtual host and virtual machines were online for the entire 20 hours, with absolutely no service interruption.

This particular machine used a Dell / LSI controller, but Promise controllers also support dynamic capacity expansion, as do 3Ware controllers.  I believe the Intel Matrix pseudo-RAID controller also supports dynamic capacity expansion, but as with other RAID and pseudo-RAID controllers, you should check its specific documentation and consult the manufacturer’s web site for errata and updates before proceeding.

The bottom line is Windows and RAID arrays have come a long way, and it’s quite possible that you will be able to expand the capacity of your array without taking your server down; however, if the data on the server is irreplaceable, I recommend you consider backing it up (at least the irreplaceable data).

Web Servers

For several years I’ve used a combination of Microsoft IIS and Apache, which fits in with my belief that you choose the best tool for the job (and rarely does one tool work best across the board).

About a month ago I “needed” to do some maintenance on my personal web server, and I started to notice the number of things that had been installed on it… like two versions of Microsoft SQL Server (why a Microsoft product felt the need to install the compact edition when I already had the full blown edition is beyond me).

As I started to peel away layer upon layer of unnecessary software, I realized that my dependency on IIS came down to one very simple ASP dot Net script I’d written for a client of mine and adapted for my own use (you could also say I’d written it for my use and adapted it for them).

I started thinking, and realized it would take me about ten minutes to re-write that script in PHP and in doing that I could totally eliminate my personal dependency on IIS and somewhat simplify my life.

In about half an hour (I had to test the script and there was more to uninstall) I had a very clean machine with about 8GB more of disk space, and no IIS… and the exact same functionality (well — I would argue increased functionality since there was far less software that I would have to update and maintain on the machine).

Sure, there are cases where ASP dot Net is a good solution (though honestly I absolutely cannot stand it or the development environment, it seems to me like an environment targeted at mediocre programmers who have no understanding of what they’re doing and an incredible opportunity for security flaws and bugs)… but many times PHP works far better, and for very complex solutions a JSP (Java Servlet / JavaServer Pages) solution would likely work better.

My advice: think through your (technical) requirements and consider the options before locking into proprietary solutions.

