Entries Tagged as 'Ubuntu'

Compression

There are two distinct areas in which Windows Server 2008 outshines Linux, and both center on compression.

For a very long time Microsoft has supported transparent compression as part of NTFS; you can designate, file by file or directory by directory, which parts of the file system the operating system compresses (applications need do nothing to use compressed files).  This feature was probably originally intended to reduce the disk footprint of seldom-used files; however, with the explosive growth in computing power, a compressed file can now often be read from disk and decompressed much faster than an uncompressed file can simply be read.  Of course, if you’re repeatedly modifying a byte or two in the middle of a compressed file, it might not be a good idea to mark it as compressed; but if you’re basically reading the file sequentially, compression may dramatically increase the overall performance of the system.

The reason for this increase is easy to understand: many files can be compressed ten to one (or better), which means each disk read effectively delivers ten times the information, and decompressing that stream places no appreciable burden on a modern, multi-core, single-instruction/multiple-data (SIMD) capable processor.
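
As a rough illustration of that arithmetic, here is a small Python sketch. It uses zlib purely as a stand-in (NTFS actually uses its own LZNT1 algorithm), and the 60 MB/s disk rate is a hypothetical figure, not a measurement. The effective rate is just the raw disk rate multiplied by the compression ratio, which only holds while the CPU can decompress faster than the disk delivers data.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, level))


def effective_read_rate(disk_mb_s: float, ratio: float) -> float:
    """Logical MB/s delivered when each physical read returns compressed data.

    Assumes decompression is never the bottleneck (the premise in the text).
    """
    return disk_mb_s * ratio


if __name__ == "__main__":
    repetitive = b"Quarterly report: revenue, expenses, notes.\n" * 2000
    random_bytes = os.urandom(len(repetitive))  # stands in for already-compressed data
    for name, sample in (("repetitive text", repetitive), ("random bytes", random_bytes)):
        ratio = compression_ratio(sample)
        rate = effective_read_rate(60.0, ratio)  # 60 MB/s is a hypothetical disk rate
        print(f"{name}: {ratio:.1f}:1, about {rate:.0f} MB/s effective")
```

The random sample shows the flip side: data that is already compressed (JPEG images, archives) gains nothing, which is one reason to mark files for compression selectively.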

Recently, with SMBv2, Microsoft has expanded the file sharing protocol to be able to transport a compressed data stream, or even a differential data stream (Remote Differential Compression, RDC), rather than having to send every byte of the file.  This often greatly increases the effective data rate as well, since once again a modern, multi-core, SIMD-capable processor can compress (and decompress) a data stream at a much higher rate than almost any network fabric can transmit it (the exception would be 10G).  On highly constrained networks, or networks with extremely high error rates, the increase in effective throughput could be staggering.
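
To make the "differential" idea concrete, here is a deliberately simplified Python sketch: it hashes fixed-size blocks of the old and new versions of a file and reports which blocks would have to be sent. This is not the actual RDC algorithm (RDC chooses chunk boundaries from the content itself, so an insertion near the start of a file does not shift every later block); the 4 KiB block size is an arbitrary assumption.

```python
import hashlib

BLOCK_SIZE = 4096  # arbitrary illustrative block size


def block_hashes(data: bytes, size: int = BLOCK_SIZE) -> list[str]:
    """SHA-1 digest of each fixed-size block (the last block may be short)."""
    return [hashlib.sha1(data[i:i + size]).hexdigest() for i in range(0, len(data), size)]


def changed_blocks(old: bytes, new: bytes, size: int = BLOCK_SIZE) -> list[int]:
    """Indexes of blocks in `new` that differ from the block at the same position in `old`.

    In a real protocol the receiver would send only its hashes, not the old file.
    """
    old_hashes = block_hashes(old, size)
    return [i for i, h in enumerate(block_hashes(new, size))
            if i >= len(old_hashes) or old_hashes[i] != h]


if __name__ == "__main__":
    old = bytes(range(256)) * 4096            # 1 MiB sample file
    new = bytearray(old)
    new[500_000] ^= 0xFF                      # flip one byte in the middle
    blocks = changed_blocks(old, bytes(new))
    print(f"blocks to send: {blocks} ({len(blocks) * BLOCK_SIZE} of {len(new)} bytes)")
```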

Unfortunately, Linux lags behind in both areas.

Ext4 does not include transparent compression, and currently no implementation of SMBv2 is available for Linux servers (or clients).

While there’s no question whatsoever that the initial cost of a high-performance server is lower if Linux is chosen as the operating system, the “hidden” costs of lacking compression may make the total cost of ownership harder to determine.

Supporting transparent compression in a file system is merely a design criterion for a new file system (say Ext5 or Ext4.1); supporting SMBv2, however, will be much more difficult since (unlike SMBv1) it is a closed, proprietary file sharing protocol.

Originally posted 2010-07-11 02:00:49.

Ubuntu – Desktop Search

Microsoft has really shown the power of desktop search in Vista and Windows 7; their newest Desktop Search Engine works, and works well… so in my quest to migrate over to Linux, I wanted both a server-style and a desktop-style search.

So the quest began… and it was about as short as a march across the top of a butte.

I started by reviewing what I could find on the major contenders (do an Internet search and you’ll find only about half a dozen reasonable articles comparing the various desktop search solutions for Linux)… there were few enough that it didn’t take very long.  The contenders covered below, in alphabetical order: Beagle, Google Desktop Search, Recoll, Strigi, and Tracker.

My metrics for evaluating a desktop search solution focused on the following points:

  • ease of installation, configuration, maintenance
  • search speed
  • search accuracy
  • ease of access to search (applet, web, participation in Windows search)
  • resource utilization (cpu and memory on indexing and searching)

I immediately passed on Google Desktop Search: I have no desire for Google to have more access to information about me, and I’ve tried it before in virtual machines and didn’t think very much of it.

Beagle

I first tried Beagle; it sounded like the most promising of all the search engines, and Novell was one of the developers behind it, so I figured it would be a stable baseline.

It was easy to install and configure (the package manager did most of the work), and I could use either the search application or the web interface; the web interface has to be enabled using beagle-config:

beagle-config Networking WebInterface true

Then I could just point a browser at port 4000 (either locally or remotely).

I immediately did a test search; nothing came back.  Wow, how disappointing — several hundred documents in my home folder should have matched.  I waited and tried again — still nothing.

While I liked what I saw, a search engine that couldn’t return reasonable results to a simple query (at all) was just not going to work for me… and since Beagle isn’t actively developed any longer, I’m not going to hold out for them to fix a “minor” issue like this.

Tracker

My next choice to experiment with was Tracker; you couldn’t ask for an easier desktop search to experiment with on Ubuntu — it seems to be the “default”.

One thing that’s important to mention: you’ll have to enable the indexer (per user), because it’s disabled by default.  Just use the configuration tool (you might need to install an additional package):

tracker-preferences

Same test, but instantly I got about a dozen documents returned, and additional documents started to appear every few seconds.  I could live with this; after all I figured it would take a little while to totally index my home directory (I had rsync’d a copy of all my documents, emails, pictures, etc from my Windows 2008 server to test with, so there was a great deal of information for the indexer to handle).

The big problem with Tracker was there was no web interface that I could find (yes, I’m sure I could write my own web interface; but then again, I could just write my own search engine).

Strigi

On to Strigi: straightforward to install, and easy to use… but it didn’t seem to give me the results I’d gotten quickly with Tracker (though better than Beagle), and it seemed to be limited to only ten results (WTF?).

I honestly didn’t even look for a web interface for Strigi; it was far too much of a disappointment (in fact, I think I’d rather put more time into Beagle to figure out why I wasn’t getting search results than work with Strigi).

Recoll

My last test was with Recoll; it looked promising from all that I read, but everyone seemed to indicate it was difficult to install and that you needed to build it from source.

Well, there’s an Ubuntu package for Recoll, so it’s just as easy to install; it was just a waste of effort to install it.

I launched the recoll application, and typed a query in — no results came back, but numerous errors were printed in my terminal window.  I checked the preferences, and made a couple minor changes — ran the search query again — got a segmentation fault, and called it a done deal.

It looked to me from the size of the database files that Recoll had indexed quite a bit of my folder; why it wouldn’t give me any search results (and seg faulted) was beyond me — but it certainly was something I’d seen before with Linux based desktop search.

Conclusions

My biggest conclusion was that Desktop Search on Linux just isn’t really something that’s ready for prime time.  It’s a joke — a horrible joke.

Of the search engines I tried, only Tracker worked reasonably well, and it has no web interface, nor does it participate in a Windows search query (SMB2 feature which directs the server to perform the search when querying against a remote file share).

I’ve been vocal in the past that Linux fails as a Desktop because of the lack of a cohesive experience; but it appears that Desktop Search (or search in general) is a failing of Linux as both a Desktop and a Server, and clearly a reason why choosing Windows Server 2008 is the only reasonable choice for businesses.

The only upside to this evaluation was that it took less time to do than to read about or write up!

Originally posted 2010-07-06 02:00:58.

Dynamic IP Filtering (Black Lists)

There are a number of reasons why you might want to use a dynamic black list of IP addresses to prevent your computer from connecting to, or being connected to by, users on the Internet who might not have your best interests at heart…

Below are three different dynamic IP filtering solutions for various operating systems; each of them is open source, has an easy-to-use GUI, and uses the same filter list formats (and will download those lists from a URL or load them from a file).

You can read a great deal more about each program and the concepts of IP blocking on the web pages associated with each.

Originally posted 2010-08-17 02:00:55.

Ubuntu – Creating A Disk Mirror

A disk mirror, or RAID1 is a fault tolerant disk configuration where every block of one drive is mirrored on a second drive; this provides the ability to lose one drive (or have damaged sectors on one drive) and still retain data integrity.

RAID1 will have lower write performance than a single drive, but will likely have slightly better read performance than a single drive.  Other types of RAID configurations have different characteristics; but RAID1 is simple to configure and maintain (and conceptually most anyone can understand the mechanics), and it is the topic of this article.

Remember, all these commands will need to be executed with elevated privileges (as super-user), so they’ll have to be prefixed with ‘sudo’.

First step: select two disks, preferably identical (or at least as close to the same size as possible), that don’t have any data on them (or at least don’t have any important data on them).  You can use Disk Utility (GUI), gparted (GUI), cfdisk (CLI), or fdisk (CLI) to confirm that the disks have no data and to change (or create) the partition type to “Linux raid autodetect” (type “fd”).  Also note the devices that correspond to the drives; they will be needed when building the array.

Check to make sure that mdadm is installed; if not you can use the GUI package manager to download and install it; or simply type:

  • apt-get install mdadm

For this example, we’re going to say the drives were /dev/sde and /dev/sdf, each with a single RAID partition (/dev/sde1 and /dev/sdf1).

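Create the mirror by executing the commands below (the keyword missing creates the mirror in a degraded state with only its first member; adding the second partition then starts the initial resync):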

  • mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sde1 missing
  • mdadm --manage /dev/md0 --add /dev/sdf1
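
If you prefer to script the two steps, here is a minimal Python sketch that builds the same mdadm command lines (note the plain ASCII double hyphens, which matter if you copy commands from a web page) and prints them as a dry run by default. The helper names are hypothetical; running for real requires root, and mdadm may stop to ask for confirmation if it finds old metadata or a file system on a partition.

```python
import shlex
import subprocess


def mirror_commands(md_device: str, first: str, second: str) -> list[list[str]]:
    """argv lists for: create a degraded RAID1 on `first`, then add `second`."""
    return [
        ["mdadm", "--create", md_device, "--level=1", "--raid-devices=2", first, "missing"],
        ["mdadm", "--manage", md_device, "--add", second],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (requires root)."""
    for argv in commands:
        print(("DRY RUN: " if dry_run else "") + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run(mirror_commands("/dev/md0", "/dev/sde1", "/dev/sdf1"))
```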

Now you have a mirrored drive, /dev/md0.

At this point you could setup a LVM volume, but we’re going to keep it simple (and for most users, there’s no real advantage to using LVM).

Now you can use Disk Utility to create a partition (I’d recommend a GPT style partition) and format a file system (I’d recommend ext4).

You will want to decide on a mount point (and create that directory).

You will probably have to add an entry to /etc/fstab and /etc/mdadm/mdadm.conf if you want the volume mounted automatically at boot (I’d recommend using the UUID rather than the device names).

Here’s an example mdadm.conf entry:

  • ARRAY /dev/md0 level=raid1 num-devices=2 UUID=d84d477f:c3bcc681:679ecf21:59e6241a
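
A quick way to sanity-check an entry like this (or the ARRAY lines printed by mdadm --examine --scan, which use the same layout) is to split it into its device and key=value fields. A minimal Python sketch; the function name is hypothetical, and the UUID check assumes the four colon-separated groups of eight hex digits shown above.

```python
import re

_MD_UUID = re.compile(r"^[0-9a-f]{8}(:[0-9a-f]{8}){3}$", re.IGNORECASE)


def parse_array_line(line: str) -> dict[str, str]:
    """Split an mdadm.conf ARRAY line into {'device': ..., key: value, ...}."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "ARRAY":
        raise ValueError(f"not an ARRAY line: {line!r}")
    entry = {"device": fields[1]}
    for token in fields[2:]:
        key, sep, value = token.partition("=")
        if sep:                          # ignore tokens without '='
            entry[key] = value
    uuid = entry.get("UUID")
    if uuid is not None and not _MD_UUID.match(uuid):
        raise ValueError(f"unexpected md UUID format: {uuid}")
    return entry


if __name__ == "__main__":
    line = "ARRAY /dev/md0 level=raid1 num-devices=2 UUID=d84d477f:c3bcc681:679ecf21:59e6241a"
    print(parse_array_line(line))
```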

And here’s an example fstab entry:

  • UUID=00586af4-c0e8-479a-9398-3c2fdd2628c4 /mirror ext4 defaults 0 2
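
The six fstab fields are, in order, the device (here identified by file system UUID), the mount point, the file system type, the mount options, the dump flag, and the fsck pass number (2 is the usual value for a non-root file system). A small Python sketch that builds and splits such a line; the function names are hypothetical, and the field names follow the fstab(5) man page.

```python
FSTAB_FIELDS = ("fs_spec", "fs_file", "fs_vfstype", "fs_mntops", "fs_freq", "fs_passno")


def fstab_line(uuid: str, mount_point: str, fstype: str = "ext4",
               options: str = "defaults", dump: int = 0, passno: int = 2) -> str:
    """Build an fstab entry that identifies the file system by UUID."""
    return f"UUID={uuid} {mount_point} {fstype} {options} {dump} {passno}"


def parse_fstab_line(line: str) -> dict[str, str]:
    """Split an fstab entry into named fields; the last two default to "0" if omitted."""
    parts = line.split()
    if not 4 <= len(parts) <= 6:
        raise ValueError(f"expected 4-6 fields, got {len(parts)}: {line!r}")
    parts += ["0"] * (6 - len(parts))
    return dict(zip(FSTAB_FIELDS, parts))


if __name__ == "__main__":
    entry = fstab_line("00586af4-c0e8-479a-9398-3c2fdd2628c4", "/mirror")
    print(entry)
    print(parse_fstab_line(entry)["fs_spec"])
```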

You can use mdadm to get the UUID of the mirror (RAID) container:

  • mdadm --examine --scan

And you can use blkid to get the UUID of the file system:

  • blkid
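
blkid prints one line per device, such as /dev/md0: UUID="00586af4-c0e8-479a-9398-3c2fdd2628c4" TYPE="ext4"; the Python sketch below (hypothetical helper name, made-up sample output) turns that into a dictionary so you can pick out the file system UUID for fstab. If you only need one value, blkid -o value -s UUID /dev/md0 prints just the UUID.

```python
import re

_TAG = re.compile(r'([A-Z_]+)="([^"]*)"')


def parse_blkid(output: str) -> dict[str, dict[str, str]]:
    """Map each device in blkid output to its TAG="value" pairs."""
    devices = {}
    for line in output.splitlines():
        device, sep, rest = line.partition(":")
        if sep:
            devices[device.strip()] = dict(_TAG.findall(rest))
    return devices


if __name__ == "__main__":
    sample = (  # made-up sample output
        '/dev/md0: UUID="00586af4-c0e8-479a-9398-3c2fdd2628c4" TYPE="ext4"\n'
        '/dev/sde1: UUID="d84d477f-c3bc-c681-679e-cf2159e6241a" TYPE="linux_raid_member"\n'
    )
    print(parse_blkid(sample)["/dev/md0"]["UUID"])
```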

You should probably make sure that you have SMART monitoring installed on your system so that you can monitor the status (and predicted failure) of the drives.  To get information on the mirror, you can use Disk Utility (GUI) or just type:

  • cat /proc/mdstat
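
In /proc/mdstat each array’s status line ends with something like [2/2] [UU]: the number of member devices expected, the number active, and one letter per member (U for up, _ for missing or failed). A minimal Python sketch that flags degraded arrays from that text; the function name and sample are hypothetical, and it only understands arrays that print this status (RAID0, for example, does not).

```python
import re

_HEADER = re.compile(r"^(md\w+)\s*:\s*(\w+)")
_STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

SAMPLE = """Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdf1[1] sde1[0]
      117220736 blocks [2/2] [UU]

md1 : active raid5 sdi1[2] sdh1[1] sdg1[0]
      1465151808 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>
"""


def mdstat_health(text: str) -> dict[str, dict]:
    """Per-array state and degraded flag parsed from /proc/mdstat text."""
    arrays: dict[str, dict] = {}
    current = None
    for line in text.splitlines():
        header = _HEADER.match(line)
        if header:
            current = header.group(1)
            arrays[current] = {"state": header.group(2), "degraded": None}
            continue
        status = _STATUS.search(line)
        if current is not None and status:
            expected, active = int(status.group(1)), int(status.group(2))
            flags = status.group(3)
            arrays[current].update(expected=expected, active=active, flags=flags,
                                   degraded=active < expected or "_" in flags)
    return arrays


if __name__ == "__main__":
    for name, info in mdstat_health(SAMPLE).items():
        print(name, info)
```

On a live system you would pass it the contents of /proc/mdstat instead of the sample, for example from a cron job that emails you when any array reports degraded.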

There are many resources on setting up mirrors on Linux; for starters, you can simply look at the man page for the mdadm command.

NOTE: This procedure was developed and tested using Ubuntu 10.04 LTS x64 Desktop.

Originally posted 2010-06-28 02:00:37.

Ubuntu – Creating A RAID5 Array

A RAID5 array is a fault tolerant disk configuration which uses a distributed parity block; this provides the ability to lose one drive (or have damaged sectors on one drive) and still retain data integrity.

RAID5 will likely have slightly lower write performance than a single drive, but will likely have significantly better read performance than a single drive.  Other types of RAID configurations have different characteristics.  RAID5 requires a minimum of three drives and may have as many drives as desired; however, at some point RAID6, with two parity blocks per stripe, should be considered because of the risk of a second drive failing during a rebuild.
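
The trade-off in usable space is simple arithmetic: a mirror gives you one drive’s worth, RAID5 gives up one drive’s worth to parity, and RAID6 gives up two. A small Python sketch (hypothetical function name, ignoring metadata overhead) that uses the smallest member’s size, since md only uses each member up to the size of the smallest one.

```python
MIN_DRIVES = {1: 2, 5: 3, 6: 4}


def usable_capacity(level: int, sizes_gb: list[float]) -> float:
    """Usable space of an md RAID1/5/6 array, in the same units as sizes_gb."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if len(sizes_gb) < MIN_DRIVES[level]:
        raise ValueError(f"RAID{level} needs at least {MIN_DRIVES[level]} drives")
    smallest = min(sizes_gb)          # every member is used only up to this size
    if level == 1:
        return smallest               # all members hold the same data
    parity = 1 if level == 5 else 2   # RAID5: one drive's worth of parity, RAID6: two
    return (len(sizes_gb) - parity) * smallest


if __name__ == "__main__":
    print(usable_capacity(5, [500, 500, 500, 500]))  # four 500 GB drives -> 1500
```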

The following instructions will illustrate the creation of a RAID5 array with four SATA drives.

Remember, all these commands will need to be executed with elevated privileges (as super-user), so they’ll have to be prefixed with ‘sudo’.

First step: select four disks, preferably identical (or at least as close to the same size as possible), that don’t have any data on them (or at least don’t have any important data on them).  You can use Disk Utility (GUI), gparted (GUI), cfdisk (CLI), or fdisk (CLI) to confirm that the disks have no data and to change (or create) the partition type to “Linux raid autodetect” (type “fd”).  Also note the devices that correspond to the drives; they will be needed when building the array.

Check to make sure that mdadm is installed; if not you can use the GUI package manager to download and install it; or simply type:

  • apt-get install mdadm

For this example, we’re going to say the drives were /dev/sde, /dev/sdf, /dev/sdg, and /dev/sdh, each with a single RAID partition (/dev/sde1 through /dev/sdh1).

Create the RAID5 by executing:

  • mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sd{e,f,g,h}1

Now you have a RAID5 fault tolerant drive sub-system, /dev/md1 (the defaults for chunk size, etc. are reasonable for general use).

At this point you could setup a LVM volume, but we’re going to keep it simple (and for most users, there’s no real advantage to using LVM).

Now you can use Disk Utility to create a partition (I’d recommend a GPT style partition) and format a file system (I’d recommend ext4).
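
If you format from the command line instead of Disk Utility, mkfs.ext4 can be told the RAID geometry through its -E stride=…,stripe_width=… extended options: the stride is the chunk size divided by the file system block size, and the stripe width is the stride times the number of data (non-parity) drives. A Python sketch of that arithmetic; the function name is hypothetical, and the 64 KiB chunk and 4 KiB block sizes in the example are assumptions, so check your array’s actual chunk size with mdadm --detail /dev/md1 (or in /proc/mdstat).

```python
def ext4_raid_options(chunk_kib: int, block_kib: int, drives: int, parity_drives: int = 1) -> str:
    """mkfs.ext4 -E value for an md array (parity_drives=1 for RAID5, 2 for RAID6)."""
    if chunk_kib % block_kib:
        raise ValueError("chunk size must be a multiple of the file system block size")
    if drives <= parity_drives:
        raise ValueError("need at least one data drive")
    stride = chunk_kib // block_kib            # file system blocks per chunk
    return f"stride={stride},stripe_width={stride * (drives - parity_drives)}"


if __name__ == "__main__":
    # Assumed geometry: 64 KiB chunks, 4 KiB blocks, four-drive RAID5.
    opts = ext4_raid_options(chunk_kib=64, block_kib=4, drives=4)
    print(f"mkfs.ext4 -E {opts} /dev/md1")
```

If you created a partition on the array as suggested above, substitute that partition for /dev/md1.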

You will want to decide on a mount point (and create that directory).

You will probably have to add an entry to /etc/fstab and /etc/mdadm/mdadm.conf if you want the volume mounted automatically at boot (I’d recommend using the UUID rather than the device names).

Here’s an example mdadm.conf entry:

  • ARRAY /dev/md1 level=raid5 num-devices=4 UUID=d84d477f:c3bcc681:679ecf21:59e6241a

And here’s an example fstab entry:

  • UUID=00586af4-c0e8-479a-9398-3c2fdd2628c4 /mirror ext4 defaults 0 2

You can use mdadm to get the UUID of the RAID5 container:

  • mdadm --examine --scan

And you can use blkid to get the UUID of the file system:

  • blkid

You should probably make sure that you have SMART monitoring installed on your system so that you can monitor the status (and predicted failure) of the drives.  To get information on the RAID5 container, you can use Disk Utility (GUI) or just type:

  • cat /proc/mdstat

There are many resources on setting up RAID5 sub-systems on Linux; for starters, you can simply look at the man page for the mdadm command.

NOTE: This procedure was developed and tested using Ubuntu 10.04 LTS x64 Desktop.

Originally posted 2010-06-29 02:00:15.

Ubuntu – RAID Creation

I think learning how to use mdadm (/sbin/mdadm) is a good idea, but in Ubuntu Desktop you can use Disk Utility (/usr/bin/palimpsest) to create most any of your RAID (“multiple disk”) configurations.

In Disk Utility, just access “File->Create->Raid Array…” on the menu and choose the options.  Before doing that, you might want to clear off the drives you’re going to use (I generally create a fresh GPT partition table to ensure the drive is ready to be used as a component of the RAID array).

Once you’ve created the container with Disk Utility, you can even format it with a file system; however, you will still need to manually add the entries to /etc/mdadm/mdadm.conf and /etc/fstab.

One other minor issue I noticed.

I gave my multiple disk containers names (mirror00, mirror01, …) and Disk Utility will show them mounted on device /dev/md/mirror00 — in point of fact, you want to use device names like /dev/md0, /dev/md1, … in the /etc/mdadm/mdadm.conf file.  Also, once again, I highly recommend that you use the UUID for the array configuration (in mdadm.conf) and for the file system (in fstab).

Originally posted 2010-07-12 02:00:33.

Ubuntu – Disk Utility

When you install Ubuntu 10.04 Desktop, the default menu item for Disk Utility isn’t extremely useful; after all, it’s on the System->Administration menu, so you would assume that it’s meant to administer the machine, not just view the disk configuration.

What I’m alluding to is that by default Disk Utility (/usr/bin/palimpsest) is not run with elevated privileges (as super-user), but rather as the current user; if you’re doing as you should be and working as an ordinary user, that means you won’t be able to effect any changes, and Disk Utility will probably end up being a waste of time and effort.

To correct this problem all you need do is modify the menu item which launches Disk Utility to elevate your privileges before launching (using gksu) — that, of course, assumes that you’re permitted to elevate your privileges.

To add privilege elevation to Disk Utility:

  1. Right click your mouse on the menu bar along the top (right on system is good) and select ‘edit menu items’
  2. Navigate down to ‘administration’ and select it in the left pane
  3. Select ‘disk utility’ in the right pane
  4. Select ‘properties’ in the buttons on the right
  5. Under ‘command’ prefix it with ‘gksu’ or substitute ‘gksu /usr/bin/palimpsest’ (putting the entire path there)
  6. Then click ‘close’ and ‘close’ again…

Originally posted 2010-06-27 02:00:33.

Disk Bench

I’ve been playing with Ubuntu here of late, and looking at the characteristics of RAID arrays.

What got me started on this was formatting an ext4 file system on a four-drive RAID5 array created with an LSI 150-4 [hardware RAID] controller: it took longer than I thought it should.  While most readers probably won’t be interested in whether to use the LSI 150 controller they have in their spare parts bin to create a RAID array on Linux, the numbers below are useful simply for deciding what type of array to create.

These numbers are obtained from the disk benchmark in Disk Utility; this is only a read test (write performance is going to be quite a bit different, but unfortunately the write test in Disk Utility is destructive, and I’m not willing to lose my file system contents at this moment; but I am looking for other good benchmarking tools).

Controller   Configuration    Drives   Avg access time   Min read rate   Max read rate   Avg read rate
ICH8         Single           1        17.4 ms           14.2 MB/s       23.4 MB/s       20.7 MB/s
ICH8         RAID1 (mirror)   2        16.2 ms           20.8 MB/s       42.9 MB/s       33.4 MB/s
ICH8         RAID5            4        18.3 ms           17.9 MB/s       221.2 MB/s      119.1 MB/s
SiI3132      RAID5            4        18.4 ms           17.8 MB/s       223.6 MB/s      118.8 MB/s
LSI 150-4    RAID5            4        25.2 ms           12.5 MB/s       36.6 MB/s       23.3 MB/s
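
To put the averages in perspective, this Python snippet divides each average read rate from the table by the single-drive figure (the dictionary keys and function name are just labels chosen for illustration). Keep in mind, as noted below, that the single-drive and RAID1 runs used 120GB drives while the RAID5 runs used 500GB drives, so the ratios are indicative rather than a controlled comparison.

```python
# Average read rates (MB/s) copied from the table above.
AVG_READ_MB_S = {
    "ICH8 single": 20.7,
    "ICH8 RAID1": 33.4,
    "ICH8 RAID5": 119.1,
    "SiI3132 RAID5": 118.8,
    "LSI150-4 RAID5": 23.3,
}


def relative_rates(rates: dict[str, float], baseline: str) -> dict[str, float]:
    """Each rate divided by the baseline configuration's rate."""
    base = rates[baseline]
    return {name: rate / base for name, rate in rates.items()}


if __name__ == "__main__":
    for name, factor in relative_rates(AVG_READ_MB_S, "ICH8 single").items():
        print(f"{name:15} {AVG_READ_MB_S[name]:6.1f} MB/s  {factor:4.1f}x single drive")
```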

All the drives used are similar-class drives: Seagate Momentus 120GB 5400.6 (ST9120315AS) for the single drive and RAID1 (mirror) tests, and Seagate Momentus 500GB 5400.6 (ST9500325AS) for all the RAID5 tests.  Additionally, all drives show that they are performing well within acceptable operating parameters.

Originally posted 2010-06-30 02:00:09.

Linux – Desktop Search

A while ago I published a post on Desktop Search on Linux (specifically Ubuntu).  I was far from happy with my conclusions and I felt I needed to re-evaluate all the options to see which would really perform the most accurate search against my information.

Primarily my information consists of Microsoft Office documents, Open Office documents, pictures (JPEG, as well as Canon RAW and Nikon RAW), web pages, archives, and email (stored as RFC822/RFC2822 compliant files with an eml extension).

My test metric would be to take a handful of search terms that I knew existed in various types of documents and check the results (I actually used Microsoft Windows Search 4.0 to prepare a complete list of the documents that matched each query, since I knew it worked as expected).
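
Scoring each engine against that baseline amounts to comparing two sets of file paths. A minimal Python sketch (hypothetical function and file names): recall is the fraction of baseline documents an engine found, and “extra” lists hits the baseline lacked, which may be false positives or documents Windows Search missed.

```python
def compare_to_baseline(expected: set[str], found: set[str]) -> dict:
    """Recall against a trusted result set, plus what was missed and what was extra."""
    hits = expected & found
    return {
        "recall": len(hits) / len(expected) if expected else 1.0,
        "missed": sorted(expected - found),
        "extra": sorted(found - expected),
    }


if __name__ == "__main__":
    baseline = {"budget.xls", "minutes.odt", "quote.eml", "notes.txt"}  # hypothetical
    engine = {"budget.xls", "minutes.odt", "holiday.jpg"}
    print(compare_to_baseline(baseline, engine))
```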

The search engines I tested were Beagle, Google Desktop, Tracker, Pinot, Recoll, and Strigi.

I was able to install, configure, and launch each of the applications.  Actually none of them were really that difficult to install and configure; but all of them required searching through documentation and third party sites — I’d say poor documentation is just something you have to get used to.

Beagle, Google, Tracker, Pinot, and Recoll all failed to find all the documents of interest… none of them properly indexed the email files, and most of them failed to handle plain text files; that didn’t leave a very high bar to pick a winner.

Queries on Strigi actually provided every hit that the same query provided on Windows Search… though I have to say Windows Search was easier to setup and use.

I tried the Nepomuk (KDE) interface for Strigi, though it just didn’t seem to work as well as strigiclient did… and strigiclient was certainly pretty much at the top of the list of butt-ugly, user-hostile, unintuitive applications I’d ever seen.

After all of the time I’ve spent on desktop search for Linux, I’ve decided all of the search solutions are jokes.  None of them are well thought out, none of them are well executed, and most of them outright don’t work.

Like most Linux projects, more energy needs to be focused on working out a framework for search than everyone going off half-cocked and creating a new search paradigm.

The right model is…

A single multi-threaded indexer running in the background indexing files according to a system wide policy aggregated with user policies (settable by each user on directories they own) along with the access privileges.

A search API that takes the user/group and query to provide results for items that the user has (read) access to.

The indexer should be designed to use plug-in modules to handle particular file types (mapped both by file extension, and by file content).

The indexer should also be designed to use plug-in modules for walking a file system and receiving file system change events (that allows the framework to adapt as the Linux kernel changes, and would support remote indexing as well).

Additionally, the index/search should be designed with distributed queries in mind (often you want to search many servers, desktops, and web locations simultaneously).

Then it becomes a simple matter for developers to write new/better indexer plug-ins; and better search interfaces.
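
To make the proposed model a bit more concrete, here is a toy, in-memory Python sketch of two of its pieces: extractor plug-ins registered per file extension, and a search call that only returns documents the requesting user or group can read. Everything here is hypothetical (class and method names, the simple set-of-readers access model); it ignores content sniffing, change notification, persistence, and distributed queries, and the .eml plug-in just uses Python’s standard email package.

```python
import email
import re
from collections import defaultdict
from email import policy
from typing import Callable, Iterable

Extractor = Callable[[bytes], str]   # plug-in: raw file bytes -> plain text


def eml_extractor(raw: bytes) -> str:
    """Plug-in for RFC 822/2822 .eml files: subject plus plain-text body."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return f"{msg['subject'] or ''} {text}"


def text_extractor(raw: bytes) -> str:
    return raw.decode("utf-8", errors="replace")


class Indexer:
    """Toy inverted index with per-extension plug-ins and per-document readers."""

    def __init__(self) -> None:
        self.extractors: dict[str, Extractor] = {}
        self.postings: defaultdict[str, set[str]] = defaultdict(set)  # term -> paths
        self.readers: dict[str, set[str]] = {}                         # path -> users/groups

    def register(self, extension: str, extractor: Extractor) -> None:
        self.extractors[extension.lower()] = extractor

    def index(self, path: str, content: bytes, readers: Iterable[str]) -> bool:
        """Index one file; returns False when no plug-in handles its extension.

        Re-indexing a path does not remove its old terms (omitted for brevity).
        """
        extension = path.rsplit(".", 1)[-1].lower() if "." in path else ""
        extractor = self.extractors.get(extension)
        if extractor is None:
            return False
        self.readers[path] = set(readers)
        for term in re.findall(r"\w+", extractor(content).lower()):
            self.postings[term].add(path)
        return True

    def search(self, principals: Iterable[str], query: str) -> list[str]:
        """Paths containing every query term that one of `principals` may read."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        allowed = set(principals)
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(p for p in hits if self.readers[p] & allowed)


if __name__ == "__main__":
    ix = Indexer()
    ix.register("txt", text_extractor)
    ix.register("eml", eml_extractor)
    ix.index("/home/alice/notes.txt", b"RAID5 rebuild notes", readers={"alice"})
    ix.index("/home/bob/quote.eml", b"Subject: RAID5 quote\n\nPrice for four drives.\n",
             readers={"bob", "staff"})
    print(ix.search(["alice"], "raid5"))
    print(ix.search(["alice", "staff"], "raid5"))
```

Keeping extraction behind a one-function plug-in interface is what would let someone add a better .eml or RAW-image handler without touching the indexer or the search front end.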

I’ve pointed out in a number of recent posts that you can effectively use Linux as a server platform in your business; however, it seems that if search is a requirement you might want to consider ponying up the money for Microsoft Windows Server 2008 and enjoy seamless search (that works) between your Windows Vista / Windows 7 Desktops and Windows Server.

REFERENCES:

Ubuntu – Desktop Search

Originally posted 2010-07-16 02:00:19.

Linux Server

I’ve been experimenting with a Linux server solution for the past couple months — I was prompted to look at this when my system disk failed in a Windows Server 2008 machine.

First, I’m amazed that after all these years Microsoft doesn’t have a standard module for monitoring the health of a system, or at least the SMART data from disk drives.

I do have an Acronis image of the server from when I first installed it, but it would be a pain to reconfigure everything on that image to be as it was — and I guess I just haven’t been that happy with Windows Server 2008.

I personally find Windows Server 2008 needlessly complicated.

I’m not even going to start ranting on Hyper-V (I’ve done that enough, comparing it head-to-head with other technology… all I will say is it’s a good thing their big competitor is VMware, or else Microsoft would really have to worry about having such a pathetic virtualization offering).

With a Linux distribution it’s a very simple thing to install a basic server.  I actually tried Ubuntu, CentOS, and Fedora.  I also looked at the Xen distribution, but that wasn’t really of interest for a general purpose server.

Personally, I found CentOS (think Red Hat) to be a little too conservative on their releases/features, and Fedora to be a little too bleeding edge on their releases/features (plus there’s no long term support commitment); so I was really just left with Ubuntu.

I didn’t really see any reason to look exhaustively at every Debian based distribution — Ubuntu was, in my mind, the best choice of that family; and I didn’t want to look at any distribution that wasn’t available at no cost, nor any distribution that didn’t have a good, stable track record.

With Ubuntu 10.04 LTS (10.04 is a Long Term Support release, which makes it a very good choice to build a server on) you can choose the Desktop or the Server edition; the main difference is that the Server edition does not install the X server and graphical desktop components (you can add them).

The machine I was installing on had plenty of memory and processor to support a GUI, and I saw no reason not to install the Desktop version (I did try out the Server version on a couple of installs; perhaps if you have an older machine, a machine with very limited memory, a machine that will be taxed to its limits, or a machine where you want the absolute smallest attack surface, you’d want the Server edition, though almost all those requirements would probably make me shift to CentOS rather than Ubuntu).

My requirements were fairly simple — I wanted to replace the failed Windows 2008 Server with a machine that could perform my DNS, DHCP, web server, file store (home directories — served via CIFS/Samba), and active P2P downloads.

Additionally, the server would have to have fault-tolerant file systems (as did the Windows server).

Originally my testing focused on just making sure all the basic components worked, and worked reasonably well.

Then I moved on to getting all the tools I had written working (I converted all the C# code to PHP).

My final phase involved evaluating fault tolerant options.  Initially I’d just used the LSI 150-4 RAID controller I had in the Windows Server 2008 machine (Linux supported it with no real issues, except that Linux was not able to monitor the health of the drives or the array).

I didn’t really see much need to use RAID5 as I had done with Windows Server 2008; so I concentrated on just doing RAID1 (mirroring) — I tried basic mirrors just using md, as well as using lvm (over md).

My feeling was that lvm added an unnecessary level of complexity on a standalone server (that isn’t to say that lvm doesn’t have features that some individuals might want or need).  So my tests focused primarily on just simple mirrors using md.

I tested performance of my LSI 150-4 RAID5 SATA1 PCI controller (with four SATA2 drives) against RAID1 SATA2 using Intel ICH9 and SiI3132 controllers (with pairs of SATA1 or SATA2 drives). I’d expected that the LSI 150-4 would outperform the md mirror with SATA1 drives on both read and write, but that with SATA2 drives I’d see better reads on the md mirror.

I was wrong.

The md mirrors actually performed better across the board (though negligibly better with SATA1 drives attached) — and the amazing thing was that CPU utilization was extremely low.

Now, let me underscore here that the LSI 150-4 controller is a PCI-X (64-bit) controller that I’m running as PCI (32-bit); and the LSI 150-4 represents technology that’s about six years old… and the LSI 150-4 controller is limited to SATA1 with no command set enhancements.

So this comparison wouldn’t hold true if I were testing md mirrors against a modern hardware RAID controller — plus the other RAID controllers I have are SAS/SATA2 PCIe and have eight and sixteen channels (more spindles means more performance).

Also, I haven’t tested md RAID5 performance at all.

My findings at present are that you can build a fairly high performance Linux based server for a small investment. You don’t need really high end hardware, you don’t need to invest in hardware RAID controllers, and you don’t need to buy software licenses — you can effectively run a small business or home office environment with confidence.

Originally posted 2010-06-24 02:00:09.