Sunday, December 8, 2013

Mint Petra Odds and Ends

As I complete (hopefully) the upgrade of my home PC to Linux Mint 16 Petra (which appears to be commensurable with Ubuntu 13.10 Saucy Salamander), I'm making notes on glitches small and large.
  • I initially had some display problems booting and resuming from hibernation. By "display problems" I mean totally corrupted displays, or what appeared to be hang-ups during the boot process. This is not the first time I've experienced display problems of this nature. So I reinstalled the sgfxi script (this time to my home partition, where it should survive future system upgrades, and ran it. Since installing the latest NVIDIA proprietary drivers, I've had no boot failures. The display is still sometimes a bit slow redrawing itself, something I'll need to investigate when I have time.
  • As noted in my previous post, Petra has the annoying habit of opening the Disk Usage Analyzer (a.k.a. "Baobab") when it ought to be displaying a folder in Nemo, the replacement for Nautilus. The fix (in the second update of the previous post) is to add
    inode/directory=nemo.desktop;baobab.desktop;
    to  ~/.local/share/applications/mimeapps.list
  • Back in June, I posted about using the µswsusp package to accelerate hibernation and return from hibernation. Unfortunately, while the package seems to work as far as saving a compressed image to disk, Petra is unable to return from hibernation. I get a message that it is "Resuming from /dev/disk/by-uuid/you-don't-want-to-know", and I get to stare at that message for as long as I want before rebooting the machine. So I uninstalled µswsusp. Fortunately, Petra seems to be faster than Nadia was at both hibernating and (more importantly, at least to me) resuming. If I need to bring back µswsusp at a later date, the answer may be here (change the configuration file for µswsusp so that "resume device" is set to the UUID of the drive rather than its name). [Update: No joy. I reinstalled µswsusp, confirmed that it hung on resume, edited its config file to use /dev/disk/by-uuid/you-don't-want-to-know as the resume device, ran update-initramfs -u (as root), and it still hung on resume. Apparently I need to give up on µswsusp.]
  • While messing around with the hibernation stuff, I periodically encountered a message "No support for locale en_US.utf8". This seemed harmless enough, but I decided to do a quick search, and turned up clear instructions (albeit for an older version, Mint 13) on how to fix it. Running
sudo locale-gen --purge --no-archive
hopefully fixes it.

Saturday, December 7, 2013

Updating Linux Mint: Nemo v. Nautilus v. Baobab

I'm in the process of updating to Linux Mint 16 (Petra, Cinnamon-flavor), and I've already run into one bit of minor insanity.  Mint 16 switched its file manager from Nautilus to Nemo, a fork of Nautilus. That's fine with me. There are minor adjustments to make, but no big deal.

Except ... after installing Petra and then downloading a file using Firefox, I right-clicked the file's entry in Firefox's download list and clicked "Open Containing Folder", expecting to, well, open the Downloads folder. Instead, Firefox launched the Disk Usage Analyzer (?!).

A web search found a small number of reports of this with no fixes posted. It took a lot of digging before I finally put together the facts that (a) Firefox uses MIME types to decide what applications to launch and (b) the name of the disk usage analyzer is "Baobab", not "Disk Usage Analyzer". Armed with that, I found a line in /usr/share/applications/mimeinfo.cache (line 368 in my copy; your mileage may vary) that read:
inode/directory=baobab.desktop;nemo.desktop;
So I ran sudo gedit  /usr/share/applications/mimeinfo.cache, changed that line to
inode/directory=nemo.desktop;baobab.desktop;
and saved the file. Immediately (without having to restart it), Firefox sobered up and starting opening the Downloads folder in Nemo. I guess Firefox uses the first available program in a list of applications for a given MIME type, and for whatever reason someone thought Baobab should have precedence over Nemo.

Update: As soon as I posted this, I went back to fiddling with my installation, and installed the nemo-dropbox package from Synaptic, which integrates Dropbox with Nemo. It also downloaded Dropbox, regardless of the fact that Dropbox was already installed. :-( Not only that, it apparently modified mimeinfo.cache, because mimeinfo.cache reverted to its original self. I had to edit it again. Looks like this may be the gift that keeps on giving. :-(

One other note to self: the annotations on the Dropbox folder (up-to-date, being synched, ...) that are the whole point of the nemo-dropbox package only show up after you stop Nemo (nemo -q) and then restart it.

Update #2: I think it's solved now. After noting that /usr/share/applications/mimeinfo.cache had reverted yet again, I added the line
inode/directory=nemo.desktop;baobab.desktop;
to  ~/.local/share/applications/mimeapps.list. That should override the entry in /usr/share/applications/mimeinfo.cache.

Friday, December 6, 2013

Controlling 3G/4G in Jelly Bean 4.3

My phone (a Samsung Galaxy S3) updated itself to Android 4.3 this morning. I'm sure it has all sorts of wizardous new features (most of which I'll never use). One feature that is somewhat handy is that you now have some control over which "quick setting buttons" you have in the notifications panel (top of the screen), and in what order they appear. If you swipe down from the top to expose the notifications stuff, you'll see the usual rows of buttons plus a new (I think) button in the upper right, as in the following screenshot.


Tap the button in the upper right, and you get an expanded display of the quick setting buttons.


Now tap the edit button (looks like a pencil), and you get a screen that allows you to drag and drop buttons within and between two groups, those displayed by default and those displayed only in the expanded view.


Unfortunately, there is some bad news: the quick setting button to turn mobile data (3G/4G service) on or off has been removed. I went online and found various people complaining about this on phone company support forums. I can't blame them for complaining -- this is a feature I use fairly frequently when traveling -- but I think the complaints may be misplaced. I doubt that the service providers (Verizon, Sprint, etc.) had anything to do with eliminating the button. It's possible it was a decision by the hardware vendors (Samsung, did you do this to me?), but may also be a "feature" determined by Google.

The good news is in two parts. First, you can still turn mobile data on and off by going through the device settings menu, although it's buried fairly deeply. Second, there are free third party apps that provide alternative ways to toggle data service. One very nice one is Notification Toggle (henceforth "NT"). Once installed from Google Play, it will drop a launcher icon somewhere on one of your desktops, as seen in the next shot.


You'll also see an icon in the upper left corner of the notifications bar, telling you that NT is running. Tap the launcher to get to the NT home screen.


There are lots of things you can set in NT, but for our purposes the next step is to tap the icon to the left of the program name.


Now tap the Toggles menu item to get a list of built in buttons.


The number available is truly impressive. Scroll down a bit and you'll find one for "Mobile data". Put a check next to it. (You can see in the shot above that I also added a button to turn on flashlight mode from the notifications screen.) There's also a button labeled "4G", which I assume controls whether you use 4G when available or 3G only. Now go to any screen and swipe down to expand the notifications bar.


In addition to the usual stuff, you get some new buttons (circled in the screenshot above). The one on the left toggles flashlight mode; the one on the right toggles data service on (lit) or off (unlit, as in the shot above).

Problem solved.

Sunday, December 1, 2013

Testing Regression Significance in R

I've come to like R quite a bit for statistical computing, even though as a language it can be rather quirky. (Case in point: The anova() function compares two or more models using analysis of variance; if you want to fit an ANOVA model, you need to use the aov() function.) I don't use it that often, though, which is a mixed blessing. The bad new is that infrequent use makes it hard for me to remember everything I've learned about the language. The good news is that infrequent use means I'm not having to do statistical analysis very often.

I don't think I'm alone in believing that consistent coding patterns (paradigms, idioms, whatever you want to call them) are very helpful when using a language infrequently. That motivates today's post, on testing significance of a regression model. By model significance, I mean (in somewhat loose terms) testing
H0: the null model (no predictors other than a constant term) fits the data at least as well as our model
versus
H1: our model fits the data better than the null model.
When performing a standard linear regression, the usual test of model significance is an F-test. As with most (all?) statistics packages, R helpfully prints out the p-value for this test in the summary output of the regression, so you can see whether your model is (literally) better than nothing without any extra work. To test whether a second model (call it model2) improves on model 1 significantly, you use the anova() command:
anova(model1, model2)

which is easy enough to remember.

When performing a generalized linear regression, however, R does not automatically give you a model significance test. I'll focus here on a binary logit model (dependent variable binary), but I'm pretty sure the various approaches apply to other uses of the GLM, perhaps with some tweaks.

Let's say that model1 is a binary logistic regression model I've fitted in R. The most common test for significance of a binary logistic model is a chi-square test, based on the change in deviance when you add your predictors to the null model. R will automatically calculate the deviance for both your model and the null model when you run the glm() command to fit the model. The approach to testing significance that I've seen on a number of web pages, including this one, involves calculating the p-value manually, using some variation of the following syntax:
with(model1, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))

That's fine (nice and compact), but prospects of my remembering it are slender at best. Fortunately, we can use the aforementioned anova() command by manually fitting the null model. First rerun the logistic regression using just a constant term. Call the resulting fit null.model. Now compare null.model to model1 using the anova() command, adding an argument to tell R that you want a chi-square test rather than the default F test:
anova(null.model, model1, test = "Chisquare")
You can also use the same syntax to compare two fitted logistic models for the same data, say where model2 adds some predictors to model1. For me, that's a lot easier to remember than the manual approach.

Here's some heavily annotated code (or you can download it), if you want to see an example:

#
# Linear and logit regression examples.
#
# (c) 2013 Paul A. Rubin
# Released under the <a href="http://creativecommons.org/licenses/by/3.0/deed.en_US">Creative Commons Attribution 3.0 Unported License</a>.
#
library(datasets);
#
# To demonstrate linear regression, we use the ChickWeight data set.
# Dependent variable:
#   weight = the weight (grams) of a chick.
# Predictors:
#   Time = age of the chick (days)
#   Diet = factor indicating which of four diets was fed to the chick
# Not used:
#   Chick = subject identifier
#
# First model: regress weight on just age.
#
model1 <- lm(weight ~ Time, data = ChickWeight);
summary(model1);
#
# Shocking discovery: weight increases with age!
#
# Second model: regress weight on both age and diet.
#
model2 <- lm(weight ~ Time + Diet, data = ChickWeight);
summary(model2);
#
# Diet is also significant, and diets 2-4 all apparently have
# different effect on weight gain from diet 1.
#
# Is model 2 better than the "null" model (constant term only)?
# The summary output includes the approximate p-value (< 2.2e-16, 
# so essentially zero) for an F-test comparing model 2 to the null
# model. We can get the same information as follows:
#
null.model <- lm(weight ~ 1, data = ChickWeight); # actual null model
summary(null.model);
anova(null.model, model2); # compare model 2 to null model
#
# Is model 2 significantly better than model 1?
#
anova(model1, model2); # yes (p < 2.2e-16 again)
#
# We now switch to logit regression. To demonstrate it, we create
# a new 0-1 variable indicating whether a chick is heavier than
# 170 grams (1) or not (0), and append it to the data set.
#
ChickWeight <- cbind(ChickWeight, chubby = ChickWeight$weight > 170);
#
# Next, we run a logit model to see if age and diet predict whether
# a chick is chubby.
#
model3 <- glm(chubby ~ Time + Diet, data = ChickWeight, family = "binomial");
summary(model3);
# All terms except Diet2 seem significant (suggest that diets 1 and
# 2 may have the same tendency to create chubby chicks, while diets
# 3 and 4 are more inclined to do so, since their coefficients are
# positive).
#
# Now we add interactions.
#
model4 <- glm(chubby ~ Time*Diet, data = ChickWeight, family = "binomial");
summary(model4);
#
# Main effects of diet are not longer significant, but somewhat oddly
# the interaction of time with diet 4 is.
#
# We use a chi-square test to test for overall model significance,
# analogous to the F test for a linear regression. The catch is
# that R does not provide a significance value in the summary output
# for the glm method. We can compute a p-value manually, as follows.
#
with(model3, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE));
#
# The p-value is essentially zero, so we reject the null and conclude
# that model 3 is better than nothing.
#
# Manual computation may be faster on large data sets (the deviances
# have already been calculated), but it is arguably easier (at least
# on the user's memory) to generate the null model (as before) and then
# run an ANOVA to compare the two models.
#
null.model <- glm(chubby ~ 1, data = ChickWeight, family = "binomial");
anova(null.model, model3, test = "Chisq");
#
# The chi-square test has a bit less precision (p < 2.2e-16 rather than
# p = 7e-69), but that precision is probably spurious (the weight data
# is not accruate to 69 decimal places). We still reject the null
# hypothesis that the null model is at least as good as model3 at
# predicting chubbiness. To test whether the model with interaction terms is
# better than the model without them, we can again run an ANOVA.
#
anova(model3, model4, test = "Chisq");
#
# The p-value (0.03568) suggests that we should again reject the
# null hypothesis (that first model is at least as good as the
# second model) and conclude that inclusion of the interaction terms
# improves prediction (although the significant interaction with
# no significant main effect for diet is rather suspicious).
#
# One other way to test significance of a logit model is to run
# an ANOVA with the model as the sole argument.
#
anova(model3, test = "Chisq");
#
# R adds terms to the model sequentially and shows the significance
# of each change (using a chi square test). If any of the terms
# are significant, the model is better than the null model. In
# the output of the previous command, the Time variable is very
# significant (p < 2.2e-16), so the null hypothesis that our model
# is no better than the null model is rejected even before we see
# the results for adding Diet (again significant).
Created by Pretty R at inside-R.org

Saturday, November 9, 2013

LP Cutting Planes in CPLEX

Cut generation, as used in what follows, refers to generating constraints for a mathematical program "on the fly" (based on intermediate solutions), rather than adding all relevant constraints at the outset of the problem. It is typically used when  there is an astronomical number of possible constraints, most of which will turn out not to be very useful.

Someone recently asked me which is faster in CPLEX when solving a linear program with cut generation:
  • solving the LP, adding cuts, solving the LP again ...; or
  • using a callback to add the cuts on the fly.
My initial guess (which I think proved correct) was that the first method is likely to be faster, or at least not notably slower, then the second method, but I suppose it might be possible for the second method to win if problems are large. (I'm not at all confident about that last point.)

What makes the "solve - cut - repeat" loop competitive is that CPLEX can, and with default settings does, "warm start" the solution of a linear program based on the most recent previous solution of the model. The model can be changed between calls to the solver, so long as CPLEX recognizes that it is the same problem. Generally, warm starting after adding (or removing) constraints expedites the solution process (relative to starting from scratch) if the number of changes is fairly modest.

The second method involves attaching a lazy constraint callback to the model. Each time a potential new optimum for the LP is found, the solution will be passed to the callback, which can either reject the solution by adding one or more constraints (cuts), at least one of which renders the candidate solution infeasible, or accept the solution by refraining from adding any new cuts. The original LP now becomes the root node "relaxation" of the problem. CPLEX will never attempt to branch, since there are no integer variables in the LP (and thus none that will take on fractional values), so once the root node is optimized, we're done.

A bit of trickery is required for the second method, since the lazy constraint callback is available only for integer and mixed integer programs. We can spoof CPLEX by adding a single integer or binary variable to the model, but not including it in any constraints or in the objective function. This works in the various programming APIs, where there is a method to add an object (in this case our dummy variable) directly to the model. In the interactive optimizer, it may be necessary to add a vacuous constraint as well. For example, if we add a dummy binary variable $z$, we can use the constraint $z \le 1$ to force $z$ into the model (and convince CPLEX the model is an integer program) without affecting the optimal solution.

I tested this with a couple of small convex optimization problems, using a rather old school cutting plane method. The problems take the form\[ \begin{array}{cccc} \mathrm{opt} & c'x\\ \mathrm{s.t.} & Ax & \le & b\\ & g(x) & \le & h\\ & x & \in & \Re^n_+ \end{array} \]where $g:\Re^n_+\rightarrow\Re^m$ is convex and differentiable and "opt" denotes either maximization or minimization. The initial linear program omits the constraint(s) $g(x)\le h$. When a candidate solution $\hat{x}$ is obtained, we plug it into each $g_i()$. If $g_i(\hat{x})\gt h_i$ by more than some rounding tolerance, we add the constraint\[\nabla g_i(\hat{x})x \le h_i-g_i(\hat{x}) + \nabla g_i(\hat{x})\hat{x}.\]If multiple constraints are violated, we can add just one cut or (as I did in my tests) simultaneously add a cut for each violated constraint.

I wrote a test program in Java (for which the source code is available), using two test problems:\[ \begin{array}{cccc} \mathrm{max} & 3x_{1}+x_{2}\\ \mathrm{s.t.} & x_{1}+x_{2} & \le & 3\\ & x_{1}^{2}+x_{2} & \le & 1\\ & x & \ge & 0 \end{array} \]and\[ \begin{array}{cccc} \mathrm{max} & 3x_{1}^{\phantom{2}}+\phantom{2}x_{2}+4x_{3}^{\phantom{4}}\\ \mathrm{s.t.} & \phantom{{3}}x_{1}^{\phantom{2}}\phantom{-2x_{2}+4x_{3}^{4}} & \le & 5\\ & \phantom{{3x_{1}^{2}-2}}x_{2}\phantom{+4x_{3}^{4}} & \le & 5\\ & \phantom{3x_{1}^{2}-2x_{2}+4}x_{3}^{\phantom{4}} & \le & 5\\ & \phantom{3}x_{1}^{2}-\phantom{1}x_{2}\phantom{+4x_{3}^{4}} & \le & 3\\ & \phantom{3x_{1}^{2}}-2x_{2}+\phantom{4}x_{3}^{4} & \le & 0\\ & x & \ge & 0. \end{array} \]My code recorded execution times for the CPLEX 12.5.1 solver using both methods, excluding time spent setting up the models, time spent reporting the results, and time spent generating the cuts. (Time spent adding the cuts to the models was included.)

The problems are both very small, so their execution times (which varied significantly during repeated runs of the same models) may not be indicative of what happens on more difficult problems. On the first problem, the first method usually beat the second method by a small margin (typically about 6 milliseconds v. 7 milliseconds), but I did see an occasional win for the second method (by about 1 ms.). On the second problem, the first method consistently beat the second method, usually by one or two milliseconds, occasionally by a bit more. The number of cuts generated (five in the first problem, eleven in the second) did not vary between methods, nor did the optimal solution reported.

Monday, November 4, 2013

The Triangle Inequality in Transportation Networks

I just noticed that I managed to go the entire month of October without a post. This was a combination of several factors: it's conference season for me (INFORMS in October, Decision Sciences Institute coming up soon); I was a guest blogger at the INFORMS conference (sample here); deciduous trees + autumn + Michigan = yard work; college and pro football games to watch; laziness; and a somewhat silent muse. Today's entry was prompted partly by a vague sense of guilt and partly by a bug I encountered.

The bug occurred in the context of some ongoing research, involving optimization of vehicle assignments and routing. Computational results on artificial data sets occasionally produced "optimal" solutions that were demonstrably suboptimal. The source of the problem turned out to be instances where the data did not satisfy the triangle inequality, one of the assumptions upon which our model is based. Fixing the data cured the problem.

This in turn reminded me of a dissertation proposal defense I attended a couple of years ago. The candidate (who, in fairness, had not received any training in operations research) was proposing a set of research questions in logistics that only made sense in situations where the underlying transportation network failed to satisfy the triangle inequality, either when the metric was distance traveled or when it was travel time. This prompts some observations about the triangle inequality in the context of transport networks. To save myself some typing, I'll henceforth use the phrase "literal network" to mean a network in which nodes are locations on a map and arcs or edges are literally segments of a road network. Some but not all of what follows may also apply to rail, sea or air transport.
  1. When the metric is travel time, literal networks will frequently not satisfy the triangle inequality. The most direct route from my former employer's campus to my house involved a fairly linear stretch on streets with comparatively low speed limits and frequent traffic lights. By going a little out of my way, I could turn the bulk of that drive into a sprint up an interstate highway. So A (campus exit) to B (home) had a substantially longer driving time, particularly at rush hour, than A to C (highway entrance) to D (highway exit) to B.
  2. When the metric is distance, literal networks may still not satisfy the triangle inequality. The "direct" route from A to B may loop around some obstacle (hill, swamp, US Capitol ... but I repeat myself) or have a significant vertical component. If A to C and C to B are fairly flat and linear segments, their combined length may be less than the looping A to B.
  3. If your key metric is distance, an undirected graph may be sufficient, unless roads are one-way. If you are concerned with driving time, you probably need a directed network, even if there are no one-way roads. Anyone who has driven during rush hour in the opposite direction of the heavy traffic has experienced (and enjoyed) the asymmetry.
  4. Practically speaking, the network actually traveled by (knowledgeable) drivers will satisfy the triangle inequality in either time or distance, whichever is more important, regardless of whether the literal network does. If I can get from A to B faster by way of C than going directly, I will, and so my A->B "arc" is actually A->C->B.
Points 2 and 4 underly the bug in that research project I mentioned. The arcs were generated without the assumption that distances would satisfy the triangle inequality. The non-triangularity (to coin a term) is plausible given that a rural road network, where road segments are not particularly straight, was being simulated. Travel times were then made proportional to distances, which is reasonable from a literal perspective but neglects the fact that drivers will use an indirect route if it is faster.

Point 4 also relates to what was tripping up the doctoral student I mentioned. His research required a scenario where commercial delivery vehicles would be routed using traveling salesman problems (TSPs), with travel time as the metric, on a network where travel times did not satisfy the triangle inequality. When I pointed out the gist of the fourth point (that drivers will take an indirect route if faster, assuming that their mileage is not being tightly monitored), he was properly scandalized, because it is well known that TSP solutions are Hamiltonian cycles that do not revisit intermediate nodes. The indirect routes would in some cases violate that restriction.

My response was that one has to apply the TSP model to a network in which an arc A->B represents not a literal segment from A directly to B, passing through no other nodes, but a "virtual" arc originating at A, terminating at B and stopping at no intermediate nodes (but possibly passing through them, if that is more efficient). He was quite skeptical, until I pointed out that if your warehouse is at the end of a blind alley and you are not allowed to repeat any streets, you can never return. That convinced him. There are a few situations in which the no repetition rule needs to be enforced rigorously, typically when you are leaving burned bridges, land mines or irate traffic cops in your wake. In most other cases, I think you can be flexible.

Replacing the arcs in my joint research problem with "virtual arcs" amounted to computing shortest (time) paths between all pairs of nodes, and replacing the original travel time on each arc with the time of the shortest path. I used something equivalent to the Floyd-Warshall algorithm for that. In the other incident, once I had convinced the doctoral candidate that travel times typically satisfy the triangle inequality (and that the no repetition rule of the Hamiltonian cycle is to be taken a bit loosely), he contacted two vendors of commercial routing software. As he reported the conversations to me, neither had taken this into account in their software, which I find a bit surprising.

One final note: if your literal network satisfies the triangle inequality with respect to distances (road segments are fairly flat and linear) but not time (due to traffic signal, asymmetric traffic volumes, etc.), and you "virtualize" the network as I described to get times that do satisfy the triangle inequality, your distances may no longer satisfy it. You can't have everything.

-----

Muses: Greek mythology seems to be a bit vague about the number of Muses -- nine seems to be a common estimate -- and most sources ignore my personal muse: Erratic, the unreliable twin of Erato.

Monday, September 23, 2013

INFORMS 2013: Just Around the Corner

We're within about two weeks of the 2013 INFORMS Annual Meeting in beautiful Minneapolis, Minnesota. (Thank heavens for Google Maps. The atlas I had as a kid growing up in New York just said "Here Be Dragons" somewhere in the blank space west of Detroit.) I'll be a guest blogger during the meeting (October 6 through October 9), so I may be even less productive than usual on this blog. If you are curious, my first (pre-)conference post is already up (unless someone gets embarrassed and takes it down).

Thursday, September 5, 2013

A Minimum Variance Convex Combination

Someone asked a question on Quora that reduces to the following: how to solve the quadratic program (QP)\begin{gather*}\mathrm{minimize}\,\,f(x)=\frac{1}{2}x'Qx\\\mathrm{s.t.}\,\,\sum_{i=1}^n x_i=1\\x\in \Re^n_+.\end{gather*}The constraints make $x$ the weights in a convex combination of something. My impression of the original question was that $Q$ would be the covariance or correlation matrix of a set of $n$ random variables, in which case (a) $Q$ would be symmetric and at least positive semidefinite (positive definite if the variables are not multicollinear) and (b) the objective function would represent the variance (scaled in the case of that $Q$ is a correlation matrix) of the convex combination of those variables using weights $x$.

Assuming (as I will below) that $Q$ is symmetric and at least positive semidefinite, this problem is easily solved by a variety of QP solvers, and my first recommendation was to link to a suitable library. (The questioner intends to solve the problem within a C++ program.) The questioner, however, prefers to "roll his own" code: he will be solving a large number of small problems, and apparently believes the overhead getting in and out of libraries might be excessive. Personally, I'm a big fan of code libraries. I make every attempt not to reinvent the wheel, partly to save time and partly because my wheels somehow never come out completely round. To each his own, however. I suggested a few standard algorithms, including Rosen's gradient project method (for which, curiously, I cannot find a suitable link) and the generalized reduced gradient (GRG) algorithm.

While toddling over to a local coffee shop, though, a very simple (naive?) algorithm occurred to me. I do not have a formal proof of convergence, but it is easy to show that it cannot terminate at a suboptimal solution, so the only questions are (a) how slow it is compared to more sophisticated algorithms and (b) whether it is susceptible to jamming (zigzagging) (another term for which I cannot find a suitable link). The algorithm is predicated on the following observations:
  • if $x$ is feasible and I shift weight from one component of $x$ to another, moving from $x$ to $\hat{x}=x+\lambda d$ where\[d_k=\begin{cases}+1 & k=i\\-1 & k=j\\0 & i\neq k\neq j\end{cases},\]then$\sum_k \hat{x}_k=1$ and feasibility boils down to $\hat{x}_j\ge 0$, or equivalently $\lambda \le x_j$.
  • For $d$ defined as above, if we let $h(\lambda)=f(x+\lambda d)$, then \begin{gather*}h'(\lambda)=\nabla f(x+\lambda d)d \\ = (x+\lambda d)'Qd \\ = x'Qd + \left(d'Qd\right)\lambda \\ = \nabla f(x)d + \left(d'Qd\right)\lambda \\ = (g_i - g_j) + \left(Q_{ii} + Q_{jj} - 2Q_{ij}\right)\lambda,\end{gather*}where $g=Qx$ is the (transposed) gradient of $f$ at $x$.
Armed with that, the naive algorithm can be expressed in words as follows. Start with an interior feasible point (such as the center of the unit simplex). At each point, compute the gradient ($g$) and locate the smallest component among all variables (index $i$) and largest component among all variables that are not zero (index $j$). Clearly $g_i \le g_j$; if $g_i = g_j$, stop (you are unable to find an improving direction). Assuming $g_i < g_j$, set up the direction vector $d$ to shift weight from $x_j$ to $x_i$ in equal measure. The largest feasible step length is $x_j$; the step length at which $h$ turns from decreasing to increasing is $$\frac{(g_j - g_i)}{Q_{ii} + Q_{jj} - 2Q_{ij}}.$$The denominator of that fraction is strictly positive if $Q$ is positive definite. (If $Q$ is only positive semidefinite, there is a danger that it might be 0.) Pick whichever step length is smaller, take that size step in direction $d$, and iterate. Written in a mathematics/pseudocode mishmash:

$t\leftarrow 0$, $x^{(t)}\leftarrow\left(\frac{1}{n},\dots,\frac{1}{n}\right)$ // initialization
while (not converged) {
  $g^{(t)}\leftarrow Qx^{(t)}$ // gradient at $x^{(t)}$
  $i\leftarrow \mathrm{arg\,min}_k\left\{ g^{(t)}_k\right\}$ // index of smallest (most negative) gradient component
  $j\leftarrow \mathrm{arg\,max}_k\left\{ g^{(t)}_k | x^{(t)}_k > 0\right\}$ // index of largest gradient component with positive $x$
  $d^{(t)}_k\leftarrow \begin{cases} +1 & k=i\\ -1 & k=j\\ 0 & i\neq k\neq j \end{cases}$ // search direction
  $a^{(t)}\leftarrow g^{(t)}_i - g^{(t)}_j$
  $b^{(t)}\leftarrow Q_{ii} + Q_{jj} - 2Q_{ij}$ // derivative along search path is a + b*step
  if ($a^{(t)} = 0$) {
    STOP // solution is optimal
  } else if ($a^{(t)} + b^{(t)}x^{(t)}_j \le 0$) {
    $\lambda^{(t)}\leftarrow x^{(t)}_j$ // the objective decays along this direction all the way to the boundary
  } else {
    $\lambda^{(t)}\leftarrow -a^{(t)}/b^{(t)}$ // move until the directional derivative becomes zero
  }
  $t\leftarrow t+1$
  $x^{(t)}\leftarrow x^{(t)} + \lambda d^{(t)}$ // take a step
 }

I coded a couple of small tests in Java, using both CPLEX and the algorithm above. The CPLEX solver executed considerably faster than my scraggly, non-optimized Java code. No shock there. Interestingly, though, overall execution time (including printing and overhead for function calls) was very close, with my code about 1 ms. slower as measured by the system clock. Note that, at each iteration, I only have to do one matrix-vector multiplication (to get the gradient $g$, after which the remaining computations for that iteration are pretty trivial. Still, I'm surprised my algorithm came that close in terms of time. As far as accuracy goes, the solutions on test problems matched to within convergence limits.


Update: A couple of additional remarks are in order.
  • If $Q$ is only positive semidefinite and, at some iteration $t$, $b^{(t)}=0$, then one of two things happens. Either $a^{(t)}=0$ and we stop, or else $h'<0$ all the way to the boundary, and we take $\lambda^{(t)}=x^{(t)}_j$.
  • Other than at the outset, a full matrix multiplication is not required to update the gradient vector. It is easy to verify that $g^{(t+1)} = g^{(t)} + \lambda^{(t)}\left(Q_i - Q_j\right)$ where $i$ and $j$ are the indices selected at iteration $t$. (Sorry, I absolutely refuse to put a $(t)$ superscript on a subscript -- this is math, not particle physics.) So gradient updates amount to something like $n$ multiplications and $2n$ additions or subtractions. (If the iteration count gets high enough, you might want to do an occasional full $g=Qx$ multiplication just to clear out some of the rounding error, akin to occasionally refactoring an LP basis.)
I got curious enough to tweak my Java code and run a set of simulations. I generated covariance matrices randomly. In particular, the matrices were dense and scaled so that variances of the underlying random variables were typically around 2.5. My results might not apply if variances are larger or more spread out, or if there are lots of zero covariances. (Then again, those things might make no difference.) At any rate, I ran 100 replications with $n=10$, contrasting my results with those of CPLEX 12.5.1. I terminated iteration of my method when the step size fell below $10^{-9}$. What I got was the following:
  • mean execution times (including setup and extracting solution) of 0.13 ms. for my method v. 14.78 ms. for CPLEX;
  • mean objective ratio (my result/CPLEX result) of 0.9999999971843175 (so we're pretty much tied for optimality);
  • mean $L_2$ norm of the difference in weight vectors = $2.8\times 10^{-7}$.

Wednesday, September 4, 2013

"New" OR Educational Resource

Okay, it's not really new -- the author, Francisco Yuraszeck (professor at the Universidad Santa Maria in Chile) tells me he started it more than two years ago -- but it's new to me. Gestión de Operaciones  ("Operations Management") describes itself as a "blog on Management and Operations Research with tutorials and solved exercises". (At least that is what it says if Google Translate can be trusted; I had one Spanish class, in the summer before ninth grade, which barely allows me to order an occasional cerveza ... in a Castilian "lisp".) Professor Yuraszeck tells me he started it as an aid to OR students in all Spanish-speaking countries.

I have not actually poked around the examples and exercises, but they look pretty good on cursory inspection. The site is tied to a YouTube channel with OR videos (in Spanish), and there is at least one "sketch book" available for download. Almost all the content is free, the one current exception being an introductory ebook on linear programming available for a dollar or two (the minimal royalties for which help offset site costs).

Overall, the site seems to be very well done: organized well, easy to navigate, visually appealing, with lots of pertinent content. It would be nice to see the blog gain visibility.

Tuesday, September 3, 2013

There Was a World (and Music!) Before Punk Rock

This is in response to Laura McLay's latest blog post, "famous punk songs about climate change and optimization". Like so many young whippersnappers (and the former Soviet Union), she seems to think everything was invented by her generation. I'm here to refute that. (You'll probably want to read her post before this one, for context.)

Long before "punk rock" meant anything beyond the Jets and the Sharks:
I've probably just scratched the surface here. Heck, maybe it's true that "Everything Old Is New Again" (Peter Allen, cowritten with Carole Bayer Sager, 1974).

Update: Since Marc-André Carle brought up geometry, I feel obligated to point out that geometry begins with basic shapes:

STEM Crisis: Fact or Fiction?

After reading assorted articles about a looming crisis in the supply of qualified STEM (Science, Technology, Engineering, Mathematics) graduates, today LinkedIn pointed me to an article on the IEEE Spectrum site titled "The STEM Crisis Is a Myth". It seems to be cogent and fairly well researched, but I'm not sure I entirely buy the author's analysis. I will certainly stipulate this: determining whether there is a looming STEM crisis, its probable extent (if it does occur), and what to do about it are complex questions, involving fuzzy definitions, data that can be parsed in a variety of ways, and at times some rather heated opinions that can get in the way of careful analysis.

I don't have a position to defend in this debate, and I certainly don't have any answers. I do have some questions/concerns about some of the definitions, how some of the data is interpreted, and how some realities are perhaps ignored (or abstracted away) during the analysis ... and that's without getting into the questions of what constitutes a STEM degree or a "STEM job". In short, I'm happy to muddy the waters further. Here, in no particular order, are some thoughts:

Not all STEM graduates get "STEM jobs" ... nor need they


Do the debaters consider jobs in banking and finance to be "STEM jobs"? My guess is that the answer in most cases is no; and yet the banking and financial services industries employ significant numbers of mathematics and statistics graduates. Actuarial positions may be classified as "STEM jobs", but what about people who work designing portfolios or analyzing market trends? Wall Street is a leading employer of Ph. D. graduates in particle physics (see, for instance, this New York Times article), largely because the physicists (claim to) understand the Ito calculus, used to describe physical processes such as Brownian motion but also used in derivative pricing.

My personal opinion, formed well before the 2008 market crash, can be summed up as follows: Handing my retirement portfolio to people who think they can measure particles (tachyons) that exist only at speeds greater than the speed of light ... what could possibly go wrong with that? I cringed when I learned that my alma mater, Princeton University, has a department titled Operations Research and Financial Engineering -- and trust me, it's not the OR part at which I cringe. Personal prejudices aside, though, it seems reasonable to me that a portion of STEM graduates will legitimately be desired for (and hired into) jobs that are probably not considered "STEM jobs", siphoning a portion of our university output away from the jobs specifically targeting holders of STEM degrees.

People have changes of heart.


Years ago, I had a student in an MBA course (now a lifelong friend) who had a bachelors degree in a scientific field (chemistry or something related). She had worked for the Michigan Department of Natural Resources as a lab chemist before deciding that it was not her passion, and that she would rather work in the business sector. Years later, I had another student with a science degree (chemistry? biochemistry? chemical engineering?) who had worked in the pharmaceutical industry before joining the MBA program. After graduating, he worked in management jobs that leveraged his scientific background but would not be classified as STEM jobs. Another MBA student had a degree in nuclear engineering and had served as a propulsion engineer aboard a US Navy submarine.

In fact, a fairly sizable portion of our MBA program consisted of people with undergraduate STEM degrees (and, often related work experience) who had decided to go over to the Dark Side and join the ranks of management. In comparing supply of and demand for STEM degrees, we need to allow for the fact that some STEM degree holders will naturally choose, for reasons other than a lack of job opportunities, to pursue non-STEM careers.

There is more to employability than having the right degree.


An administrator once made the mistake of inviting me to participate in an orientation session for new MBA students, designed to foster camaraderie and esprit de corps. During the session, I remember saying the following: "Look at the person on your left. Now look at the person on your right. Statistically, one in three people is an asshole ... so if it's not one of them, it's you." (I made the statistic up, based on my experience growing up in New York.) The point I was making then was that candidates would be required to work in teams, just as in industry, and it was naive to assume that it would always be easy to get along with all your teammates.

Sadly, though, it is also true that some people just cannot coexist at all with other workers. Some are chronically absent or late. Some need to be nagged, wheedled or hand-held just to get things done. Some are larcenous. I see no inherent reason why a STEM degree would preclude any of those possibilities (and in fact I've met walking, talking counterexamples to that conjecture). Those people will often "fail" a job interview, no matter how solid their technical credentials, or they will be involuntarily separated from an employer in a way that follows them through subsequent interviews. Thus we have to allow for some slice, hopefully small, of the STEM degree holders to be unsuitable for employment in STEM jobs (unless the recruiters are desperate).

Educational standards in the US ain't what they used to be.


During my graduate studies in mathematics at Michigan State University, I was a teaching assistant. Like most TAs, I began teaching recitation sections of a large service course, whose title was (approximately) "Introduction to College Algebra". The course was taught in a large lecture by a faculty member. The primary roles of the TAs handling the recitation sections were to answer questions and grade homework and exams.

A year or so after I arrived, we were joined by a new graduate student with a bachelor's degree in mathematics from a "prestigious" university. (I shall not name the university, so as to avoid embarrassing the fine folks in Ann Arbor.) He too had a teaching assistantship. Fall quarter of his first year, he was assigned to teach a couple of sections of the introductory algebra course. Winter quarter he was pulled out of the classroom, and his sole duty as TA was to work out the answers to the problems in the textbook ... because the Powers That Be had discovered that he could not answer simple algebra questions in class (and not, apparently, due to stage fright). Had he chosen to work in industry, rather than going straight to graduate school, and had the recruiters done their jobs well, he might have contributed to the statistics representing STEM degree holders not working in STEM jobs.

Employers frequently complain (particularly when lobbying for more H1B visas) that they cannot find a sufficient number of STEM degree holders with the "right skill set". We can argue about who bears responsibility for a genuine lack of the right skills: universities with outmoded curricula; employers unwilling to pay for training; or degree holders unwilling to upgrade their skills on their own time. We can also speculate (and many people do -- see the comments on the IEEE Spectrum post) on how often the "right skill set" translates to "willing to work cheap". We also need to accept that in some cases the "right skill set" translates to "actually knows the subject matter for their awarded degree", and that this is not a given.

Saturday, August 31, 2013

Mythbuntu Remote Access Commands

My Mythbuntu box is accessible within the house via WiFi (using a nonrouteable address), and I sometimes like to program recordings remotely. For the most part, this is very straightforward using the MythWeb web interface. I just point a browser at the server ... usually. The MythWeb interface is both straightforward and rather comprehensive.

The only catch is that, while the browser can always find the MythWeb service, sometimes MythWeb cannot talk to the MythTV back end. I have no idea why the connection works some times and fails other times. Sorting that out is for another day. Fortunately, the problem is easily fixed using shell access, since SSH is enabled (by default, I believe).

For future reference (since I will forget them, probably within the next hour), here are the only commands I seem to need in the SSH shell. These may be somewhat specific to Mythbuntu, so anyone using a different version of MythTV be warned: Your Mileage May Vary.
  • sudo service mythtv-backend [start | stop | restart] to start/stop/restart the back end (when it gives MythWeb the cold shoulder);
  • sudo mythshutdown --shutdown to shut down the server remotely after I'm done playing with it.

Wednesday, August 28, 2013

Mything Channel Issue Resolved (?)

I just resolved an issue with my Mythbuntu/MythTV installation. At least I think I resolved it ... I'll post the details here in case it helps anyone else.

Symptoms


MythTV lets you watch TV directly, with the ability to pause shows (say, while you are either preparing a beverage for consumption or, ahem, recycling one already consumed). You can tune up and down with arrow keys on your remote and select a channel. MythTV superimposes on the screen information about each channel (id/call sign, current program, signal strength, ...) as you tune around. Tuning displays only channels known to the electronic program guide that you configured, in effect skipping channels you do not have ... usually ...

In my case, MythTV would let me scroll to channel 16 (which it labeled TBS) or 15 (WGNAMER), but when I selected either one the tuner would crash, taking me back to the MythTV front-end interface.

Setup


I get my TV via cable from Comcast, although I'm not sure that's entirely pertinent here. I subscribe to Schedules Direct (henceforth "SD") to get EPG data, which I feed both to MythTV and to Freeguide, running on a different PC, so that I can plan my week's viewing. SD lets me setup and edit multiple lineups, and it knows the Comcast lineup for my area. To reduce the amount of data downloaded each time I grab listings, I edited out (disabled) most of the available channels, since I pretty much never watch them, and certainly never record them. (If I do want to watch one, the Comcast cable box will still list it.)

Root Problem


I'm pretty sure that, at one time, WGN and TBS really were on channels 15 and 16 respectively, and I had them enabled (on my active channel list) at SD. Somewhere along the line I disabled them at SD, and somewhere along the line Comcast moved them to channels 95 and 66 respectively. Apparently, though, the old numbers somehow got "stuck" in the channel lists of both Freeguide and the built in channel "grabber" in MythTV. So when I logged into the SD web site to inspect my channel lineup, the correct channel numbers were listed (and marked as disabled), but when I looked in Freeguide or (gulp) tuned manually in MythTV, the old channel numbers were there.

Solution (?)


Reloading the EPG data did not good on either MythTV or Freeguide until I edited my SD lineup and re-enabled both WGNAMER and TBS. After doing that, I reloaded the listings in Freeguide (which now showed the correct channel numbers) and then reloaded the listings in MythTV. I'm not sure it was necessary, but just to be safe I opened a terminal on the Mythbuntu box and did the reload manually, running 'mythfilldatabase --do-channel-updates' to ensure that channel numbers, as well as program content, were updated. Then I went into the MythTV front end, selected 'Watch TV', and verified that the correct channel numbers were displayed.

Apparently it required an edit of the channel lineup at SD to get things unstuck.

Wednesday, August 14, 2013

Different Solvers, Different Solutions

I've seen multiple variations of the following general question online lately: "I ran my model (linear or integer linear program) through two different solvers (or two different versions of the same solver) (or the same version of the same solver during two different phases of the moon) and got two different answers. What went wrong?"

Well, let's break this down according to what is meant by "answers" and just what is different.

Both solutions "optimal", same objective value, different variable values.


Barring an error by one solver or the other, you've just discovered that your problem has multiple optimal solutions.
  • If you used two different solvers, there is no reason to expect them to find the same solution, so this is an unsurprising outcome.
  • If you used two versions of the same solver, it is quite possible that changes to the "plumbing" of the solver would occasionally lead the newer version to find a different one of the multiple optimal solutions.
  • If you used the same version of the same solver on two different computers, you may be the "victim" of subtle differences in how they do floating point arithmetic (leading to slightly different round/truncation errors, in turn steering the algorithm slightly differently each time).
  • If you used two different versions of the model file, for instance one in CPLEX's LP format and the other in their SAV format, differences in the precision of the input values or the order in which variables and/or constraints are listed could account for variations in which optimal solution is found.
  • If the solver uses multiple parallel threads, I believe it is possible that differences in the synchronization of those threads could result in different solutions being found. I suspect this is more likely with integer programs, but this is really a bit outside my expertise. The operating system will likely interrupt threads occasionally to use the processors/cores running them for other tasks. It could be that different threads were interrupted at different times during the two runs, so that for example a node that was pruned due to bound in the first run was separated in the second because the new incumbent (found in a different thread during the first run) was not found in time to prune it during the second run.
  • Computers are not as deterministic as one might hope. (Buy me a beer at a conference and I'll tell you about the computer that -- briefly -- had true and false confused. No, I'm not making that up just to score free beers.)
 

Both solutions "optimal" but with different objective values.


Solvers have various convergence criteria, typically including absolute and/or relative tolerances for the objective value. For integer programs, these are usually expressed in terms of the "gap" between the objective value and the best known bound (lower bound for minimization, upper bound for maximization). If either the absolute or relative (percentage) gap falls below a specified nonzero threshold, the solver declares victory and announces an "optimal" solution.

This means that the announced objective value for an optimal solution is only approximately equal to the true optimal value (charitably assuming that the solver has correctly identified an optimum). Therefore, unless the difference between the announced optimal values of the two solvers exceeds both the larger of the two absolute gap limits and the larger of the two relative gap limits, this is probably just a combination of rounding error and convergence tolerances. In short, you can accept both answers as plausibly correct.

One "optimal" solution, one suboptimal solution.


Perhaps the solvers had different convergence criteria specified. Very likely they had different values for an assortment of parameters that affect performance. Quite possibly, either one solver is better than the other or one just was luckier than the other (particularly plausible with integer programs, perverse beasties that they are). Move along, folks: nothing to see here. Just make sure that the suboptimal solution does not have a better objective value than the optimal solution (again, by more than the tolerance convergence of the first solver -- if the values are close, it could be that both have the optimal solution but the second solver has not yet proved it optimal).

One "optimal" solution, one "infeasible" diagnosis.


Obviously someone is pulling your leg here. In addition to the convergence tolerances mentioned above, solvers generally have non-zero tolerances for how close a solution has to come to satisfying a constraint (and, for integer problems, how close the value of an integer variable has to be to the nearest actual integer). This is necessary in order to avoid having darned near every problem declared "infeasible" due to rounding errors.

Different tolerances could possibly account for the discrepancy. If your problem is "numerically unstable" (predisposed to causing ugly rounding error problems), it could be that one solver is better at handling numerical instability than the other one, or that parameter settings relating to numerical stability account for the different in results. I suggest that you take the alleged optimal solution, plug it into the model fed to the other solver (for instance, by setting the upper and lower bound of each variable to its "optimal" value), then ask the second solver to diagnose the source of infeasibility (which constraints are violated). You can then decide for yourself whether the constraint violations are small enough for you to tolerate.

Both feasible, neither optimal, different results.


Nothing remarkable here. Anything goes in this case, so long as neither incumbent solution spells out "Harvard" or "Michigan" in ASCII. (No need for obscenity just because the problem was not solved to optimality.)

Thursday, August 1, 2013

Numerical Analysis: Size Matters

A pure mathematician, an applied mathematician and an accountant all apply for the same job on Wall Street, and during their interviews they are all asked the same question: "What is the value of $(1-1/N)\times N + 1 - N$?" The pure mathematician replies "Zero, of course!" The applied mathematician asks "How large is $N$?" The accountant asks "What would you like it to be?" The accountant gets the job ...

... but the applied mathematician is on the right track, especially in an era where much research and virtually all application of mathematics is done on digital computers. I've been meaning to do some posts on numerical analysis for a while now, motivated in part by a large number of questions appearing on optimization support forums that indicate the topic is foreign to a sizable number of people using mathematical programming software. (The poster child for questions of this type: "Why did solver XXX say that the optimal value of my variable is 0.999999823 when it is a 0-1 variable??")

I was a math major end to end. In my first quarter of graduate school, I took a course on numerical analysis -- I can't recall now why, as I was a "pure" mathematician in those days -- and it was a real eye-opener. Identities that hold in symbolic mathematics (including "chalkboard arithmetic") don't necessarily survive the conversion to a digital medium. I have no intention of trying to compress that first course in numerical analysis into a blog post, or even a series of posts, but perhaps an example or two will at least get the attention of some people who previously had not heard of it. The numerical analysis course turned out to be one of the most valuable math courses I took (at any level), and I invariably recommend it whenever someone asks "What courses should I take to prepare for a career in Operations Research?"

I tested the formula in the above joke in a small Java program, using double-precision arithmetic, and the reported answer was zero no matter how large a value of $N$ I tried. This might or might not be a consequence of compiler optimizations -- the compiler doing the arithmetic in a different order than I specified it. So I'll fall back on an example from that graduate course: computing a partial sum of the (divergent) harmonic series $\sum_{n=1}^\infty 1/n$.

One of the first things anyone learns about math, at an early age, is that addition is commutative associative ... except, as it turns out, when a computer is involved. Here is a small Java program that computes $\sum_{n=1}^N 1/n$ twice, in one case summing in ascending order of $n$ ($1 + 1/2 + 1/3 + \dots + 1/n$) and in the other case summing in descending order of $n$ ($1/N + 1/(N-1) + \dots + 1$). I start with $N=1000$ and double $N$ each time through the main loop. To save myself repetition, let me stipulate that henceforth "large" and "small" refer to the magnitudes of numbers, without regard for their sign. If addition is commutative on the computer, the difference between summing in ascending and descending order should be zero.

for (int limit = 1000; limit <= 1024000; limit *=2) {
  double up = 0;
  double down = 0;
  for (int n = 1; n <= limit; n++) {
    up += 1.0/n;
  }
  for (int n = limit; n >= 1; n--) {
    down += 1.0/n;
  }
  double diff = up - down;
  System.out.println("Limit = " + limit + ", up = " + up + ", down = " 
                     + down + ", diff = " + diff);
}

Here is the output it generated:

N1 + ... + 1/N1/N + ... + 1Difference
10007.4854708605503437.4854708605503412.6645352591003757E-15
20008.1783681036102848.1783681036102785.329070518200751E-15
40008.8713902997951988.871390299795223-2.4868995751603507E-14
80009.5644749842613919.564474984261425-3.375077994860476E-14
1600010.25759091579793710.2575909157979142.3092638912203256E-14
3200010.9507224716020310.950722471602038-8.881784197001252E-15
6400011.6438618397230111.6438618397230028.881784197001252E-15
12800012.33700511404807212.337005114048194-1.2256862191861728E-13
25600013.03015034148708713.0301503414869521.3500311979441904E-13
51200013.7232965454854613.7232965454853561.0480505352461478E-13
102400014.4164432377636214.416443237764337-7.176481631177012E-13

The first takeaway is that we're not seeing a difference of zero, ever. (Actually, for $N$ up to about 40, the reported difference is zero.) The differences are small, but not zero. There is no obvious pattern to their sign, nor is their size changing monotonically, but they do seem generally to be growing with $N$, which among other things means that if we have a really large value of $N$, we could get a fairly seriously inaccurate answer (at least in absolute, rather than relative, terms).

Why does this happen? The short answer is that computers store non-integral numbers only to a specific finite precision. That creates "truncation errors" in the representations of the numbers (0.1 as stored in a computer is not exactly 1/10) and "rounding errors" when arithmetic is performed. In particular, adding or subtracting a large number and a small number can produce rounding errors. In the extreme, the small number is smaller than the precision at which the large number is stored, and so the sum or difference of the two is identical to the large number as far as the computer is concerned. That sort of error is more likely to occur when $n$ is ascending (we are adding progressively smaller terms $1/n$ to progressively larger partial sums) than when $n$ is descending (at the outset, we add small numbers to small partial sums, and eventually larger numbers to larger sums); so I am inclined to trust the second set of partial sums more than I do the first. Neither, though, is exactly correct.

You might not be too excited about the differences in the table -- even at $N=1,024,000$ the difference is in the thirteenth decimal place -- but consider the following coding error probably made by every rookie programmer to tackle a mathematical computation: testing whether the difference of two floating point (non-integer) numbers equals zero when they should be testing whether it is close enough to zero. Many a program has entered an indefinite loop thanks to a test for equality and a smidgen of rounding error.

Update: John D. Cook has a related post in a different context (probability and statistics).

Wednesday, July 31, 2013

Benders Decomposition with Integer Subproblems

I'm not sure why, but occasionally people post questions related to an attempt to apply Benders decomposition in a situation where the subproblem contains integer variables. A key question is how you generate cuts from the subproblem, since discrete problems do not enjoy the same duality theory that continuous problems do.

A typical application of Benders decomposition to integer programming starts with a problem of the form\[ \begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & c_{2}'y\\ \textrm{subject to} & Ax & & & \ge & a\\ & & & By & \ge & b\\ & Dx & + & Ey & \ge & d\\ & x & \in & \mathbb{Z}^{m}_+\\ & y & \in & \mathbb{R}^{n}_+ \end{array} \]This decomposes into a master problem\[ \begin{array}{lrclcc} \textrm{minimize} & c_{1}'x & + & z\\ \textrm{subject to} & Ax & & & \ge & a\\ & h'x & & & \ge & h_0 & \forall (h,h_0)\in \mathcal{F}\\ & h'x & + & z & \ge & h_0 & \forall (h, h_0)\in \mathcal{O}\\ & x & \in & \mathbb{Z}^{m}_+ \\ & z & \ge & 0 \end{array} \]and a subproblem\[ \begin{array}{lrclcc} \textrm{minimize} &  c_{2}'y\\ \textrm{subject to} & By & \ge & b\\ & Ey & \ge & d - Dx\\ & y & \in & \mathbb{R}^{n}_+ \end{array} \]where $\mathcal{F}$ and $\mathcal{O}$ are  sets of coefficient vectors for "feasibility" cuts (pushing $x$ in directions that make the solution $(x,y)$ feasible) and "optimality" cuts (pushing $z$ upward so as not to underestimate $c_2'y$) respectively. The subproblem is a linear program, and its dual solution supplies the coefficient vectors $(h,h_0)$ for both types of master cuts.

So what happens if $y$ is integer-valued ($y\in\mathbb{Z}^n_+$) rather than continuous ($y\in\mathbb{R}^n_+$)? I don't have a definitive answer, but there are a few things that can be tried. The following suggestions should also work equally well (or poorly) when $y$ is a mix of integer and continuous variables.

Proceed as usual


The subproblem is now an integer program, but you can always relax it to a linear program and obtain the dual solution to the relaxation. If the current master solution $x = \hat{x}$, $z = \hat{z}$ makes the linear relaxation of the subproblem infeasible, you can be sure it also makes the actual subproblem infeasible, and thus you will get a legitimate feasibility cut. If the subproblem is feasible, the dual to the relaxation will produce a cut that forces $z$ to be at least as great as the objective value of the relaxation, which is a legitimate lower bound for the actual subproblem objective value.

The news here is not all good, though. It is possible that $\hat{x}$ makes the subproblem integer-infeasible but with a feasible relaxation, in which case you will not get the feasibility cut you need. If the subproblem is feasible (let's say with optimal solution $\hat{y}$) but $\hat{z}$ underestimates its objective value $c_2'\hat{y}$, you want an optimality cut that forces $z\ge c_2'\hat{y}$ when $x=\hat{x}$; but the cut you get forces $z\ge w$ where $w$ is a lower bound for $c_2'\hat{y}$, and so there is the possibility that $c_2'\hat{y} > \hat{z} \ge w$ and the optimality cut accomplishes nothing.

"No good" constraints for infeasibility


Suppose that $x$ consists exclusively of binary variables. (General integer variables can always be converted to binary variables, although it's not clear that the conversion is in general desirable.) We can exclude a particular solution $x=\hat{x}$ with a "no good" constraint that forces at least one of the variables to change value:\[\sum_{i : \hat{x}_i=0} x_i + \sum_{i : \hat{x}_i = 1} (1-x_i)\ge 1.\]This gives us another option for feasibility cuts. Solve the subproblem as an IP (without relaxation); if the subproblem is infeasible, add a "no good" cut to the master problem. Note that "no good" cuts are generally not as deep as regular Benders feasibility cuts -- the latter may cut off multiple integer solutions to the master problem, whereas a "no good" cut only cuts off a single solution.

If a "no good" cut eliminates just one solution, is it worth the bother? After all, the node that produced $x=\hat{x}$ will be pruned once we realize the subproblem is infeasible. The answer depends on a combination of factors (and essentially reduces to "try it and see"). First, if $x=\hat{x}$ was produced by a heuristic, rather than as the integer-feasible solution to the node LP problem, then you likely cannot prune the current node (and, in fact, the node you would want to prune may be elsewhere in the tree). Adding the "no good" cut may prevent your ever visiting that node, and at minimum will result in the node being pruned as soon as you visit it, without having to solve the subproblem there. Second, if your master problem suffers from symmetry, the same solution $x=\hat{x}$ may lurk in more than one part of the search tree. The "no good" cut prevents your tripping over it multiple times.

It may be possible to strengthen the cut a bit. Suppose that $\hat{x}$ renders the subproblem infeasible (as an IP). There are various ways to identify a subset of the subproblem (excluding the objective function) that causes infeasibility. CPLEX can do this with its conflict refiner; other solvers may have similar functionality. Let $N$ be the set of indices of the $x$ variables and $N_0$ the set of indices of all $x$ variables that appear in the right hand side of at least one subproblem constraint identified as part of the conflict. If we are lucky, $N_0$ is a proper subset of $N$. We can form a "no good" cut for the master problem using just the variables $x_i, i\in N_0$, rather than all the $x$ variables, and obtain a somewhat deeper cut (one that potentially cuts off multiple master problem solutions). The caveat here is that running something like the CPLEX conflict refiner, after determining that the subproblem is infeasible, may eat up a fair bit of CPU time for questionable reward.

"No good" constraints for optimality


It may be possible to exploit the technique I just described to create ersatz optimality constraints as well. Suppose that the current incumbent solution is $(\tilde{x}, \tilde{y})$, and that some node gives an integer-feasible solution $(\hat{x},\hat{z})$ for the master problem. It must be the case that\[c_1'\hat{x}+\hat{z}<c_1'\tilde{x}+c_2'\tilde{y},\]else the master problem node would be pruned based on bound. Now suppose we pass $\hat{x}$ to the IP subproblem and obtain an optimal solution $y=\hat{y}$. If $c_1'\hat{x}+c_2'\hat{y}<c_1'\tilde{x}+c_2'\tilde{y}$, we have a new incumbent solution. If not, then $x=\hat{x}$ cannot lead to an improved solution, and we can add a "no good" cut to eliminate it (again recognizing that this is a weak constraint).

More?


That pretty much exhausts my quiver. If any readers have other ideas for generating Benders cuts from integer subproblems, I invite you to post them in comments.