Entries Tagged as 'Ubuntu'

Linux – Desktop Search

A while ago I published a post on Desktop Search on Linux (specifically Ubuntu).  I was far from happy with my conclusions and I felt I needed to re-evaluate all the options to see which would really perform the most accurate search against my information.

Primarily my information consists of Microsoft Office documents, Open Office documents, pictures (JPEG, as well as Canon RAW and Nikon RAW), web pages, archives, and email (stored as RFC822/RFC2822 compliant files with an eml extension).

My test metrics would be to take a handful of search terms which I knew existed in various types of documents, and check the results (I actually used Microsoft Windows Search 4.0 to prepare a complete list of documents that matched the query, since I knew it worked as expected).

The search engines I tested were Beagle, Google Desktop Search, Pinot, Recoll, Strigi, and Tracker.

I was able to install, configure, and launch each of the applications.  Actually none of them were really that difficult to install and configure; but all of them required searching through documentation and third party sites — I’d say poor documentation is just something you have to get used to.

Beagle, Google, Tracker, Pinot, and Recoll all failed to find all the documents of interest… none of them properly indexed the email files, and most of them failed to handle plain text files; that didn’t leave a very high bar to pick a winner.

Queries on Strigi actually provided every hit that the same query provided on Windows Search… though I have to say Windows Search was easier to set up and use.

I tried the Nepomuk (KDE) interface for Strigi, though it just didn’t seem to work as well as strigiclient did… and certainly strigiclient was pretty much at the top of the list for butt-ugly, user-hostile, unintuitive applications I’d ever seen.

After all of the time I’ve spent on desktop search for Linux I’ve decided all of the search solutions are jokes.  None of them are well thought out, none of them are well executed, and most of them outright don’t work.

Like most Linux projects, more energy needs to be focused on working out a framework for search than everyone going off half-cocked and creating a new search paradigm.

The right model is…

A single multi-threaded indexer running in the background indexing files according to a system wide policy aggregated with user policies (settable by each user on directories they own) along with the access privileges.

A search API that takes the user/group and query to provide results for items that the user has (read) access to.

The indexer should be designed to use plug-in modules to handle particular file types (mapped both by file extension, and by file content).

The index should also be designed to use plug-in modules for walking a file system and receiving file system change events (that allows the framework to adapt as the Linux kernel changes — and would support remote indexing as well).

Additionally, the index/search should be designed with distributed queries in mind (often you want to search many servers, desktops, and web locations simultaneously).

Then it becomes a simple matter for developers to write new/better indexer plug-ins; and better search interfaces.
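As a rough illustration of two pieces of that model (choosing an indexer plug-in by file type, and only returning results the caller can read), here is a minimal shell sketch. The function names and the plug-in names are hypothetical and do not come from any existing search tool; a real framework would also map by file content and consult group membership, not just the calling user's access.

```shell
#!/bin/sh
# Hypothetical sketch: pick_handler and filter_readable are illustrative names.

# Map a file name to the name of an (imaginary) indexer plug-in by extension.
# Anything unrecognized falls through to a content-based check.
pick_handler() {
    case "$1" in
        *.eml)             echo "email" ;;
        *.txt)             echo "text" ;;
        *.odt|*.ods|*.odp) echo "opendocument" ;;
        *.doc|*.xls|*.ppt) echo "msoffice" ;;
        *.jpg|*.jpeg)      echo "image" ;;
        *)                 echo "content-sniff" ;;
    esac
}

# Read candidate result paths on stdin (one per line) and print only the
# ones the calling user has read access to.
filter_readable() {
    while IFS= read -r path; do
        if [ -r "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
    return 0
}
```

A search front end built this way would pipe raw index hits through something like filter_readable before showing them, so the index itself can be shared system wide.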

I’ve pointed out in a number of recent posts that you can effectively use Linux as a server platform in your business; however, it seems that if search is a requirement you might want to consider ponying up the money for Microsoft Windows Server 2008 and enjoy seamless search (that works) between your Windows Vista / Windows 7 Desktops and Windows Server.

REFERENCES:

Ubuntu – Desktop Search

Originally posted 2010-07-16 02:00:19.

Libre Office on Ubuntu

If you want Libre Office on Ubuntu and you just can’t wait until 28-April-2011 to upgrade to Ubuntu 11.04 (which should contain Libre Office), then here’s the quick way to make it happen…

 

First, remove Open Office

sudo apt-get remove 'openoffice*.*'

Then set up the PPA

sudo add-apt-repository ppa:libreoffice/ppa
sudo apt-get update

Then do one of the following (based on your desktop manager)

sudo apt-get install libreoffice-gnome

sudo apt-get install libreoffice-kde

sudo apt-get install libreoffice
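If you'd rather script the choice, the three options above can be sketched as a small helper. This is only a sketch: the function name is made up, and reading the desktop name from $XDG_CURRENT_DESKTOP is an assumption (that variable may not be set on every release, in which case the generic package is chosen).

```shell
#!/bin/sh
# Hypothetical helper: print the Libre Office package for a desktop name.
libreoffice_package() {
    case "$1" in
        *GNOME*|*gnome*) echo "libreoffice-gnome" ;;
        *KDE*|*kde*)     echo "libreoffice-kde" ;;
        *)               echo "libreoffice" ;;
    esac
}

# Print the command rather than running it, so nothing gets installed by accident.
echo "sudo apt-get install $(libreoffice_package "${XDG_CURRENT_DESKTOP:-}")"
```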

My recommendation is that you just wait and update your Ubuntu to 11.04 on Thursday — then remove Open Office and install Libre Office… but you are the master of your own computer.

Originally posted 2011-04-26 02:00:51.

Macbuntu

Macbuntu isn’t a sanctioned distribution of Ubuntu like Kubuntu, Xubuntu, etc; rather it’s a set of scripts that turns an Ubuntu desktop into something that resembles a Mac running OS-X… but it’s still very much Ubuntu running gdm (GNOME).

I don’t recommend installing Macbuntu on a production machine, or even a real machine until you’ve taken it for a spin around the block.  For the most part it’s eye candy; but that said, it does make a Mac user feel a little more comfortable at an Ubuntu workstation, and there’s certainly nothing wrong with the desktop paradigm (remember, the way GNOME, KDE, XFCE, Enlightenment, Windows, OS-X, etc work is largely arbitrary; it’s just a development effort intended to make routine user operations intuitive and simple, but no two people are the same, and not everyone finds the “solution” to a particular use case optimal).

What I recommend you do is create a virtual machine with your favorite virtualization software; if you don’t have virtualization software, consider VirtualBox: it’s still free (until Larry Ellison decides to pull the plug on it), and it’s very straightforward for even novices to use.

Install Ubuntu 10.10 Desktop (32-bit is fine for the test) in it, and just take all the defaults — it’s easy, and no reason to fine tune a virtual machine that’s really just a proof-of-concept.

After that, install the virtual guest additions and do a complete update…

Once you’re done with all that, just open a command prompt and type each of the following (without elevated privileges).

  • wget https://downloads.sourceforge.net/project/macbuntu/macbuntu-10.10/v2.3/Macbuntu-10.10.tar.gz -O /tmp/Macbuntu-10.10.tar.gz
  • tar xzvf /tmp/Macbuntu-10.10.tar.gz -C /tmp
  • cd /tmp/Macbuntu-10.10/
  • ./install.sh

Once you’ve followed the on-screen instructions and answered everything to install all the themes, icons, wallpapers, widgets, and tools (you’ll have to modify Firefox and Thunderbird a little more manually — browser windows are opened for you, but you have to install the plug-ins yourself), you reboot and you’re presented with what looks very much like OS-X (you actually get to see some of the eye candy as it’s installed).

Log in… and you see even more Mac-isms… play play play and you begin to get a feel of how Apple created the slick, unified OS-X experience on top of BSD.

Now if you’re a purist you’re going to push your lower lip out and say this isn’t anything like OS-X… well, maybe it doesn’t carry Steve Jobs’s DNA fingerprint, but for many users I think you’ll hear them exclaim that this is a significant step forward for making Linux more Mac-ish.

There are a couple of different efforts to create a Mac-like experience under Linux; Macbuntu is centric on making Ubuntu more like OS-X, and as far as I can see it’s probably one of the cleanest and simplest ways to play with an OS-X theme on top of Linux…

If you find you like it, then go ahead and install on a real machine (the eye candy will be much more pleasing with a manly video card and gpu accelerated effects), and you can uninstall it if you like — but with something this invasive I’d strongly encourage you to follow my advice and try before you buy (so to speak — it’s free, but time and effort count for a great deal).

I’ll make a post on installing Macbuntu for tomorrow so that it’s a better reference.

Macbuntu on SourceForge.net

Macbuntu

Originally posted 2010-11-14 02:00:36.

Compression

There are two distinct features that Windows Server 2008 outshines Linux on; and both are centric on compression.

For a very long time Microsoft has supported transparent compression as a part of NTFS; you can designate on a file-by-file or directory level what parts of the file system are compressed by the operating system (applications need do nothing to use compressed files).  This feature was probably originally intended to reduce the disk footprint of seldom used files; however, with the explosive growth in computing power what’s happened is that compressed files can often be read and decompressed much faster from a disk than an uncompressed file can.  Of course, if you’re modifying say a byte or two in the middle of a compressed file over and over, it might not be a good idea to mark it as compressed, but if you’re basically reading the file sequentially then compression may dramatically increase the overall performance of the system.

The reason for this increase is easy to understand; many files can be compressed ten to one (or better), which means each disk read is effectively reading ten times the information, and for a modern, multi-core, single-instruction/multiple-data capable processor decompressing this stream of data puts no appreciable burden on the processing unit(s).
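To put rough numbers on that reasoning, here is a back-of-the-envelope sketch. Every figure in it is hypothetical (a disk that delivers 100 MB/s, a ten to one ratio, and two made-up decompression limits), and the function name is just for illustration; the point is only that the rate the application sees is the disk rate multiplied by the compression ratio, capped by how fast the processor can decompress.

```shell
#!/bin/sh
# All figures are hypothetical; plug in measurements from your own hardware.
# effective_read_rate RAW_MB_S RATIO DECOMPRESS_MB_S
# Prints the uncompressed data rate (MB/s) the application sees: the disk
# delivers RAW_MB_S of compressed data, which expands by RATIO, but never
# faster than the CPU can produce decompressed output (DECOMPRESS_MB_S).
effective_read_rate() {
    raw=$1
    ratio=$2
    cpu_limit=$3
    rate=$((raw * ratio))
    if [ "$rate" -gt "$cpu_limit" ]; then
        rate=$cpu_limit
    fi
    echo "$rate"
}

effective_read_rate 100 10 2000   # disk-bound case: prints 1000
effective_read_rate 100 10 600    # CPU-bound case: prints 600
```

Even in the CPU-bound case the application still sees six times the raw disk rate, which is why the trade usually favors compression for sequential reads.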

Recently, with SMBv2, Microsoft has expanded the file sharing protocol to be able to transport a compressed data stream, or even a differential data stream (Remote Differential Compression – RDC) rather than necessarily having to send every byte of the file.  This also often has the effect of greatly enhancing the effective data rate, since once again a modern, multi-core, single-instruction/multiple-data capable processor can compress (and decompress) a data stream at a much higher rate than most any network fabric can transmit the data (the exception would be 10G).  In cases of highly constrained networks, or networks with extremely high error rates, the increase in effective throughput could be staggering.

Unfortunately, Linux lags behind in both areas.

Ext4 does not include transparent compression; and currently no implementation of SMBv2 is available for Linux servers (or clients).

While there’s no question, what-so-ever, that the initial cost of a high performance server is less if Linux is chosen as the operating system, the “hidden” costs of lacking compression may make the total cost of ownership harder to determine.

Supporting transparent compression in a file system is merely a design criterion for a new file system (say Ext5 or Ext4.1); however, supporting SMBv2 will be much more difficult since (unlike SMBv1) it is a closed/proprietary file sharing protocol.

Originally posted 2010-07-11 02:00:49.

Ubuntu – Desktop Search

Microsoft has really shown the power of desktop search in Vista and Windows 7; their newest Desktop Search Engine works, and works well… so in my quest to migrate over to Linux I wanted to have the ability to have both a server style as well as a desktop style search.

So the quest began… and it was as short a quest as marching on the top of a butte.

I started by reviewing what I could find on the major contenders (just do an Internet search, and you’ll only find about half a dozen reasonable articles comparing the various desktop search solutions for Linux)… which were few enough it didn’t take very long (alphabetical):

My metrics to evaluate a desktop search solution would focus on the following points:

  • ease of installation, configuration, maintenance
  • search speed
  • search accuracy
  • ease of access to search (applet, web, participation in Windows search)
  • resource utilization (cpu and memory on indexing and searching)

I immediately passed on Google Desktop Search; I have no desire for Google to have more access to information about me; and I’ve tried it before in virtual machines and didn’t think very much of it.

Beagle

I first tried Beagle; it sounded like the most promising of all the search engines, and Novell was one of the developers behind it so I figured it would be a stable baseline.

It was easy to install and configure (the package manager did most of the work); and I could use the search application or the web search, though I had to enable the latter using beagle-config:

beagle-config Networking WebInterface true

And then I could just go to port 4000 (either locally or remotely).

I immediately did a test search; nothing came back.  Wow, how disappointing — several hundred documents in my home folder should have matched.  I waited and tried again — still nothing.

While I liked what I saw, a search engine that couldn’t return reasonable results to a simple query (at all) was just not going to work for me… and since Beagle isn’t actively developed any longer, I’m not going to hold out for them to fix a “minor” issue like this.

Tracker

My next choice to experiment with was Tracker; you couldn’t ask for an easier desktop search to experiment with on Ubuntu — it seems to be the “default”.

One thing that’s important to mention — you’ll have to enable the indexer (per-user), it’s disabled by default.  Just use the configuration tool (you might need to install an additional package):

tracker-preferences

Same test, but instantly I got about a dozen documents returned, and additional documents started to appear every few seconds.  I could live with this; after all I figured it would take a little while to totally index my home directory (I had rsync’d a copy of all my documents, emails, pictures, etc from my Windows 2008 server to test with, so there was a great deal of information for the indexer to handle).

The big problem with Tracker was there was no web interface that I could find (yes, I’m sure I could write my own web interface; but then again, I could just write my own search engine).

Strigi

On to Strigi: straightforward to install, and easy to use… but it didn’t seem to give me the results I’d gotten quickly with Tracker (though better than Beagle), and it seemed to be limited to only ten results (WTF?).

I honestly didn’t even look for a web interface for Strigi; it was way too much of a disappointment (in fact, I think I’d rather have put more time into Beagle to figure out why I wasn’t getting search results than work with Strigi).

Recoll

My last test was with Recoll; it looked promising from all that I read, but everyone seemed to indicate it was difficult to install and that you needed to build it from source.

Well, there’s an Ubuntu package for Recoll — so it’s just as easy to install; it just was a waste of effort to install.

I launched the recoll application, and typed a query in — no results came back, but numerous errors were printed in my terminal window.  I checked the preferences, and made a couple minor changes — ran the search query again — got a segmentation fault, and called it a done deal.

It looked to me from the size of the database files that Recoll had indexed quite a bit of my folder; why it wouldn’t give me any search results (and seg faulted) was beyond me — but it certainly was something I’d seen before with Linux based desktop search.

Conclusions

My biggest conclusion was that Desktop Search on Linux just isn’t really something that’s ready for prime time.  It’s a joke — a horrible joke.

Of the search engines I tried, only Tracker worked reasonably well, and it has no web interface, nor does it participate in a Windows search query (SMB2 feature which directs the server to perform the search when querying against a remote file share).

I’ve been vocal in my past that Linux fails as a Desktop because of the lack of a cohesive experience; but it appears that Desktop Search (or search in general) is a failing of Linux as both a Desktop and a Server — and clearly a reason why choosing Windows Server 2008 is the only reasonable choice for businesses.

The only upside to this evaluation was that it took less time to do than to read about or write up!

Originally posted 2010-07-06 02:00:58.

Dynamic IP Filtering (Black Lists)

There are a number of reasons why you might want to use a dynamic black list of IP addresses to prevent your computer from connecting to or being connected to by users on the Internet who might not have your best interests at heart…

Below are three different dynamic IP filtering solutions for various operating systems; each of them is open source, has an easy to use GUI, and uses the same filter list formats (and will download those lists from a URL or load them from a file).

You can read a great deal more about each program and the concepts of IP blocking on the web pages associated with each.

Originally posted 2010-08-17 02:00:55.

Ubuntu – Creating A Disk Mirror

A disk mirror, or RAID1 is a fault tolerant disk configuration where every block of one drive is mirrored on a second drive; this provides the ability to lose one drive (or have damaged sectors on one drive) and still retain data integrity.

RAID1 will have lower write performance than a single drive; but will likely have slightly better read performance than a single drive.  Other types of RAID configurations will have different characteristics; but RAID1 is simple to configure and maintain (and conceptually it’s easy for most anyone to understand the mechanics) and the topic of this article.

Remember, all these commands will need to be executed with elevated privileges (as super-user), so they’ll have to be prefixed with ‘sudo’.

First step, select two disks, preferably identical (or at least as close to the same size as possible), that don’t have any data on them (or at least don’t have any important data on them).  You can use Disk Utility (GUI) or gparted (GUI) or cfdisk (CLI) or fdisk (CLI) to confirm that each disk has no data and change (or create) the partition type to “Linux raid autodetect” (type “fd”); also note the devices that correspond to the drives, as they will be needed when building the array.

Check to make sure that mdadm is installed; if not you can use the GUI package manager to download and install it; or simply type:

  • apt-get install mdadm

For this example, we’re going to say the drives were /dev/sde and /dev/sdf.

Create the mirror by executing:

  • mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sde1 missing
  • mdadm --manage --add /dev/md0 /dev/sdf1

Now you have a mirrored drive, /dev/md0.

At this point you could set up an LVM volume, but we’re going to keep it simple (and for most users, there’s no real advantage to using LVM).

Now you can use Disk Utility to create a partition (I’d recommend a GPT style partition) and format a file system (I’d recommend ext4).

You will want to decide on the mount point

You will probably have to add an entry to /etc/fstab and /etc/mdadm/mdadm.conf if you want the volume mounted automatically at boot (I’d recommend using the UUID rather than the device names).

Here’s an example mdadm.conf entry

  • ARRAY /dev/md0 level=raid1 num-devices=2 UUID=d84d477f:c3bcc681:679ecf21:59e6241a

And here’s an example fstab entry

  • UUID=00586af4-c0e8-479a-9398-3c2fdd2628c4 /mirror ext4 defaults 0 2

You can use mdadm to get the UUID of the mirror (RAID) container

  • mdadm --examine --scan

And you can use blkid to get the UUID of the file system

  • blkid

You should probably make sure that you have SMART monitoring installed on your system so that you can monitor the status (and predictive failure) of drives.  To get information on the mirror you can use the Disk Utility (GUI) or just type

  • cat /proc/mdstat
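If you want something you can run from a script rather than reading /proc/mdstat by eye, here is a minimal sketch that lists arrays with a missing member. It relies on the status field in /proc/mdstat showing one “U” per working member and “_” for a missing one (for example [UU] versus [U_]); the function name is hypothetical. Note that a mirror created with “missing” as above will show up as degraded until the second drive has been added.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# name of every md array whose member status (e.g. [U_]) shows a missing drive.
degraded_arrays() {
    awk '
        /^md/                { name = $1 }      # remember the current array
        /\[[U_]*_[U_]*\]/    { print name }     # status contains an underscore
    '
}

# Typical use on a live system (reading the file does not need sudo):
#   degraded_arrays < /proc/mdstat
```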

There are many resources on setting mirrors on Linux; for starters you can simply look at the man pages on the mdadm command.

NOTE: This procedure was developed and tested using Ubuntu 10.04 LTS x64 Desktop.

Originally posted 2010-06-28 02:00:37.

Ubuntu – Creating A RAID5 Array

A RAID5 array is a fault tolerant disk configuration which uses a distributed parity block; this provides the ability to lose one drive (or have damaged sectors on one drive) and still retain data integrity.

RAID5 will likely have slightly lower write performance than a single drive; but will likely have significantly better read performance than a single drive. Other types of RAID configurations will have different characteristics.  RAID5 requires a minimum of three drives, and may have as many drives as desired; however, at some point RAID6 with multiple parity blocks should be considered because of the potential of additional drive failure during a rebuild.

The following instructions will illustrate the creation of a RAID5 array with four SATA drives.

Remember, all these commands will need to be executed with elevated privileges (as super-user), so they’ll have to be prefixed with ‘sudo’.

First step, select the disks (four in this example), preferably identical (or at least as close to the same size as possible), that don’t have any data on them (or at least don’t have any important data on them). You can use Disk Utility (GUI) or gparted (GUI) or cfdisk (CLI) or fdisk (CLI) to confirm that each disk has no data and change (or create) the partition type to “Linux raid autodetect” (type “fd”); also note the devices that correspond to the drives, as they will be needed when building the array.

Check to make sure that mdadm is installed; if not you can use the GUI package manager to download and install it; or simply type:

  • apt-get install mdadm

For this example, we’re going to say the drives were /dev/sde /dev/sdf /dev/sdg and /dev/sdh.

Create the RAID5 by executing:

  • mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sd{e,f,g,h}1

Now you have a RAID5 fault tolerant drive sub-system, /dev/md1 (the defaults for chunk size, etc are reasonable for general use).

At this point you could set up an LVM volume, but we’re going to keep it simple (and for most users, there’s no real advantage to using LVM).

Now you can use Disk Utility to create a partition (I’d recommend a GPT style partition) and format a file system (I’d recommend ext4).

You will want to decide on the mount point

You will probably have to add an entry to /etc/fstab and /etc/mdadm/mdadm.conf if you want the volume mounted automatically at boot (I’d recommend using the UUID rather than the device names).

Here’s an example mdadm.conf entry

  • ARRAY /dev/md1 level=raid5 num-devices=4 UUID=d84d477f:c3bcc681:679ecf21:59e6241a

And here’s an example fstab entry

  • UUID=00586af4-c0e8-479a-9398-3c2fdd2628c4 /mirror ext4 defaults 0 2

You can use mdadm to get the UUID of the RAID5 container

  • mdadm --examine --scan

And you can use blkid to get the UUID of the file system

  • blkid
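To avoid hand-typing the fstab line, a small helper can assemble it from the file system UUID. This is only a sketch: the function name is hypothetical, and the options (“defaults 0 2”) and the ext4 default are simply the ones from the example entry above. It prints the line rather than editing /etc/fstab, so you can review it before appending it yourself.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab line in the same shape as the example.
# fstab_entry UUID MOUNT_POINT [FSTYPE]   (FSTYPE defaults to ext4)
fstab_entry() {
    uuid=$1
    mount_point=$2
    fstype=${3:-ext4}
    printf 'UUID=%s %s %s defaults 0 2\n' "$uuid" "$mount_point" "$fstype"
}

# Typical use on a live system, asking blkid for just the UUID value
# (/dev/md1 is the example array from this article):
#   fstab_entry "$(sudo blkid -s UUID -o value /dev/md1)" /mirror
```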

You should probably make sure that you have SMART monitoring installed on your system so that you can monitor the status (and predictive failure) of drives. To get information on the RAID5 container you can use the Disk Utility (GUI) or just type

  • cat /proc/mdstat

There are many resources on setting RAID5 sub-systems on Linux; for starters you can simply look at the man pages on the mdadm command.

NOTE: This procedure was developed and tested using Ubuntu 10.04 LTS x64 Desktop.

Originally posted 2010-06-29 02:00:15.

Ubuntu – RAID Creation

I think learning how to use mdadm (/sbin/mdadm) is a good idea, but in Ubuntu Desktop you can use Disk Utility (/usr/bin/palimpsest) to create most any of your RAID (“multiple disk”) configurations.

In Disk Utility, just access “File->Create->Raid Array…” on the menu and choose the options.  Before doing that, you might want to clear off the drives you’re going to use (I generally create a fresh GPT partition to ensure the drive is ready to be used as a component of the RAID array).

Once you’ve created the container with Disk Utility; you can even format it with a file system; however, you will still need to manually add the entries to /etc/mdadm/mdadm.conf and /etc/fstab.

One other minor issue I noticed.

I gave my multiple disk containers names (mirror00, mirror01, …) and Disk Utility will show them mounted on device /dev/md/mirror00 — in point of fact, you want to use device names like /dev/md0, /dev/md1, … in the /etc/mdadm/mdadm.conf file.  Also, once again, I highly recommend that you use the UUID for the array configuration (in mdadm.conf) and for the file system (in fstab).

Originally posted 2010-07-12 02:00:33.

Ubuntu – Disk Utility

When you install Ubuntu 10.04 Desktop, the default menu item for Disk Utility isn’t extremely useful; after all, it’s on the System->Administration menu, so you would assume that it’s meant to administer the machine, not just view the disk configuration.

What I’m alluding to is that by default Disk Utility (/usr/bin/palimpsest) is not run with elevated privileges (as super-user), but rather as the current user, which (if you’re doing as you should be) means you won’t be able to effect any changes, and Disk Utility will probably end up being a waste of time and effort.

To correct this problem all you need do is modify the menu item which launches Disk Utility to elevate your privileges before launching (using gksu) — that, of course, assumes that you’re permitted to elevate your privileges.

To add privilege elevation to Disk Utility:

  1. Right click your mouse on the menu bar along the top (right on system is good) and select ‘edit menu items’
  2. Navigate down to ‘administration’ and select it in the left pane
  3. Select ‘disk utility’ in the right pane
  4. Select ‘properties’ in the buttons on the right
  5. Under ‘command’ prefix it with ‘gksu’ or substitute ‘gksu /usr/bin/palimpsest’ (putting the entire path there)
  6. Then click ‘close’ and ‘close’ again…
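The menu editor is the simplest route, but the same change can be sketched as a text transformation, assuming the menu item is backed by a .desktop launcher whose Exec= line starts Disk Utility. That is an assumption on my part, as are the function name and where such a file might live on your system; the sketch only rewrites text passed on stdin and never touches a real file.

```shell
#!/bin/sh
# Hypothetical helper: read a .desktop launcher on stdin and prefix its
# Exec= command with gksu, leaving lines that already use gksu untouched.
add_gksu() {
    sed '/^Exec=gksu /!s|^Exec=|Exec=gksu |'
}

# Example with inline text instead of a real launcher file:
printf 'Name=Disk Utility\nExec=palimpsest\n' | add_gksu
```

Running it twice is harmless, since a line that already begins with “Exec=gksu ” is skipped.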

Originally posted 2010-06-27 02:00:33.