Friday, June 15, 2012

Migrating POPFile

For years now, I've been using POPFile (an open source, cross-platform Bayesian classifier) to filter spam. Once trained, it seems to be more accurate than what my former employer used (or at least what they were using when I installed POPFile).

When I switched my university mail account from POP3 to IMAP, I also switched to the IMAP module in POPFile. It's pretty sweet. With POP3, POPFile (like most filters) inserts itself between the mail client and the mail server: the mail client polls POPFile, which in turn polls the server. With IMAP, POPFile works in parallel with the mail client: POPFile polls the server and moves spam-o-grams into selected folders, while the mail client polls the server directly. This has several advantages:
  • it simplifies configuration a bit, especially when you use different mail clients on different platforms;
  • you can train it just by dragging messages from the folder POPFile put them in to the folder you want them in;
  • if you have POPFile running nonstop (say, on a server), you get spam filtering even when you are using a mail client on a platform where POPFile is not installed (in my case, an Android tablet); and
  • if POPFile should go down or fail to start, your mail client is not disconnected from the server.
The only real issue I see with the IMAP module is that it can only filter one server. I can live with that (particularly as I have no choice).

POPFile can do more than filter spam, though.  You can teach it to distinguish personal messages from business messages, or sales offers from sports updates, and route messages to specific folders. I highly recommend it.

I had POPFile running on my office PC (Linux Mint), but with my retirement that machine is no longer mine to abuse. So I installed POPFile on my home PC (also Linux Mint). A slightly less than current version of POPFile can be installed effortlessly through Synaptic (the package manager).

Having used it for so long, I have POPFile highly trained, and I was loath to start from scratch. There are pretty good instructions online for copying the corpus (database) to another machine, but I also copied the configuration file, and somehow ran into a problem getting POPFile to start. If I started it manually using sudo /usr/share/popfile/start_popfile.sh, the daemon would run. The autostart script, however, failed silently, even if I ran it manually (sudo /etc/init.d/popfile start).

A bit of digging indicated that it was complaining about listening on port 110. On the one hand, this makes sense. When you install POPFile, it creates a special user and group named popfile, which own the program files. Even if you run the init.d start script as root, it actually starts the server as the popfile user and group. On Linux systems, binding a server to a privileged port (any port below 1024, which includes 110) normally requires root privileges, so the popfile account would not be allowed to hang a server on that port. What did not make sense is that POPFile would be trying to do so, since it was configured (both by default and in the configuration file I was copying over) to listen on nonprivileged port 7070. Even turning off POP3 monitoring completely did not fix the problem.

Also, instructions that include copying the configuration file to the new machine do not warn you that doing so copies the administrative password (encrypted) and the host name (which will now be incorrect, unless the new machine happens to have the same name as the old one).
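The privileged-port restriction behind that startup failure is easy to check for yourself. Here is a minimal Python sketch, assuming a default Linux setup in which ports below 1024 are privileged; the function names are my own inventions for illustration and have nothing to do with POPFile itself.

```python
import socket

# On a default Linux configuration, ports below 1024 are privileged.
PRIVILEGED_PORT_LIMIT = 1024


def is_privileged_port(port: int) -> bool:
    """Return True if binding to this port normally requires elevated privileges."""
    return 0 < port < PRIVILEGED_PORT_LIMIT


def can_bind(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind a TCP socket to host:port as the current user."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Both "permission denied" and "address already in use" land here.
            return False
    return True


if __name__ == "__main__":
    for port in (110, 7070):
        kind = "privileged" if is_privileged_port(port) else "unprivileged"
        print(f"port {port}: {kind}, bindable by this user: {can_bind(port)}")
```

Run as an ordinary user, the bind attempt on port 110 should normally be refused, while port 7070 should succeed unless something (such as a running POPFile) is already listening there.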

So, to jog my memory the next time around, here is the step-by-step process that worked. On Windows or Mac OS, the file names should be the same, but paths will be different.
  1. Install POPFile on the new machine (via the package manager).
  2. Access the control panel on the new machine (point a browser at 127.0.0.1:7070) and shut POPFile down from there. This creates a configuration file the first time you do it. Try to reload 127.0.0.1:7070 just to make sure the server is down.
  3. Copy popfile.cfg (configuration) and popfile.db (corpus) from /var/lib/popfile on the old machine to a temporary directory on the new one. (This will require root access on the old machine.)
  4. As root, move the copy of popfile.db to /var/lib/popfile on the new machine, overwriting the file installed by Synaptic.
  5. On the new machine, run sudo chmod o+rw /var/lib/popfile/popfile.cfg to make the configuration file writable.
  6. Using a file merging program (I like meld), compare the configuration files from the old and new machines, and merge the IMAP settings (lines starting imap_...) from the old configuration into the new one. Save the new configuration file. (Don't worry about resetting the permissions on it; POPFile will do that automatically.)
  7. On the new machine, run sudo /etc/init.d/popfile start, then browse to 127.0.0.1:7070 and verify that POPFile successfully started.
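
If you would rather script the merge in step 6 than do it by hand, something like the following Python sketch might work. It assumes that popfile.cfg stores one setting per line as a name followed by whitespace and a value, which is how my copy looks; check yours before trusting it. The function name and the sample setting names in the demo are made up for illustration.

```python
def merge_imap_settings(old_cfg: str, new_cfg: str, prefix: str = "imap_") -> str:
    """Return new_cfg with every setting whose name starts with prefix taken from old_cfg."""

    def key_of(line: str) -> str:
        # Assumption: each line is "<name> <value>"; the name is the first token.
        parts = line.split(None, 1)
        return parts[0] if parts else ""

    old_imap = {
        key_of(line): line
        for line in old_cfg.splitlines()
        if key_of(line).startswith(prefix)
    }

    merged = []
    seen = set()
    for line in new_cfg.splitlines():
        key = key_of(line)
        if key in old_imap:
            merged.append(old_imap[key])  # replace with the old machine's value
            seen.add(key)
        else:
            merged.append(line)  # keep everything else from the new machine

    # Carry over IMAP settings that exist only in the old configuration.
    for key, line in old_imap.items():
        if key not in seen:
            merged.append(line)

    return "\n".join(merged) + "\n"


if __name__ == "__main__":
    # Hypothetical setting names, for demonstration only.
    old = "some_port 1234\nimap_example_host mail.example.edu\n"
    new = "some_port 7070\nimap_example_host \n"
    print(merge_imap_settings(old, new), end="")
```

Only the imap_ lines are taken from the old file, so the password and host name from the old machine stay behind. I would still compare the result against both originals (meld again) before starting POPFile.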
