OR in an OB World: August 2013

Saturday, August 31, 2013

Mythbuntu Remote Access Commands

My Mythbuntu box is accessible within the house via WiFi (using a nonrouteable address), and I sometimes like to program recordings remotely. For the most part, this is very straightforward using the MythWeb web interface. I just point a browser at the server ... usually. The MythWeb interface is both straightforward and rather comprehensive.

The only catch is that, while the browser can always find the MythWeb service, sometimes MythWeb cannot talk to the MythTV back end. I have no idea why the connection works some times and fails other times. Sorting that out is for another day. Fortunately, the problem is easily fixed using shell access, since SSH is enabled (by default, I believe).

For future reference (since I will forget them, probably within the next hour), here are the only commands I seem to need in the SSH shell. These may be somewhat specific to Mythbuntu, so anyone using a different version of MythTV be warned: Your Mileage May Vary.

sudo service mythtv-backend [start | stop | restart] to start/stop/restart the back end (when it gives MythWeb the cold shoulder);
sudo mythshutdown --shutdown to shut down the server remotely after I'm done playing with it.

Wednesday, August 28, 2013

Mything Channel Issue Resolved (?)

I just resolved an issue with my Mythbuntu/MythTV installation. At least I think I resolved it ... I'll post the details here in case it helps anyone else.

Symptoms

MythTV lets you watch TV directly, with the ability to pause shows (say, while you are either preparing a beverage for consumption or, ahem, recycling one already consumed). You can tune up and down with arrow keys on your remote and select a channel. MythTV superimposes on the screen information about each channel (id/call sign, current program, signal strength, ...) as you tune around. Tuning displays only channels known to the electronic program guide that you configured, in effect skipping channels you do not have ... usually ...

In my case, MythTV would let me scroll to channel 16 (which it labeled TBS) or 15 (WGNAMER), but when I selected either one the tuner would crash, taking me back to the MythTV front-end interface.

Setup

I get my TV via cable from Comcast, although I'm not sure that's entirely pertinent here. I subscribe to Schedules Direct (henceforth "SD") to get EPG data, which I feed both to MythTV and to Freeguide, running on a different PC, so that I can plan my week's viewing. SD lets me setup and edit multiple lineups, and it knows the Comcast lineup for my area. To reduce the amount of data downloaded each time I grab listings, I edited out (disabled) most of the available channels, since I pretty much never watch them, and certainly never record them. (If I do want to watch one, the Comcast cable box will still list it.)

Root Problem

I'm pretty sure that, at one time, WGN and TBS really were on channels 15 and 16 respectively, and I had them enabled (on my active channel list) at SD. Somewhere along the line I disabled them at SD, and somewhere along the line Comcast moved them to channels 95 and 66 respectively. Apparently, though, the old numbers somehow got "stuck" in the channel lists of both Freeguide and the built in channel "grabber" in MythTV. So when I logged into the SD web site to inspect my channel lineup, the correct channel numbers were listed (and marked as disabled), but when I looked in Freeguide or (gulp) tuned manually in MythTV, the old channel numbers were there.

Solution (?)

Reloading the EPG data did not good on either MythTV or Freeguide until I edited my SD lineup and re-enabled both WGNAMER and TBS. After doing that, I reloaded the listings in Freeguide (which now showed the correct channel numbers) and then reloaded the listings in MythTV. I'm not sure it was necessary, but just to be safe I opened a terminal on the Mythbuntu box and did the reload manually, running 'mythfilldatabase --do-channel-updates' to ensure that channel numbers, as well as program content, were updated. Then I went into the MythTV front end, selected 'Watch TV', and verified that the correct channel numbers were displayed.

Apparently it required an edit of the channel lineup at SD to get things unstuck.

Wednesday, August 14, 2013

Different Solvers, Different Solutions

I've seen multiple variations of the following general question online lately: "I ran my model (linear or integer linear program) through two different solvers (or two different versions of the same solver) (or the same version of the same solver during two different phases of the moon) and got two different answers. What went wrong?"

Well, let's break this down according to what is meant by "answers" and just what is different.

Both solutions "optimal", same objective value, different variable values.

Barring an error by one solver or the other, you've just discovered that your problem has multiple optimal solutions.

If you used two different solvers, there is no reason to expect them to find the same solution, so this is an unsurprising outcome.
If you used two versions of the same solver, it is quite possible that changes to the "plumbing" of the solver would occasionally lead the newer version to find a different one of the multiple optimal solutions.
If you used the same version of the same solver on two different computers, you may be the "victim" of subtle differences in how they do floating point arithmetic (leading to slightly different round/truncation errors, in turn steering the algorithm slightly differently each time).
If you used two different versions of the model file, for instance one in CPLEX's LP format and the other in their SAV format, differences in the precision of the input values or the order in which variables and/or constraints are listed could account for variations in which optimal solution is found.
If the solver uses multiple parallel threads, I believe it is possible that differences in the synchronization of those threads could result in different solutions being found. I suspect this is more likely with integer programs, but this is really a bit outside my expertise. The operating system will likely interrupt threads occasionally to use the processors/cores running them for other tasks. It could be that different threads were interrupted at different times during the two runs, so that for example a node that was pruned due to bound in the first run was separated in the second because the new incumbent (found in a different thread during the first run) was not found in time to prune it during the second run.
Computers are not as deterministic as one might hope. (Buy me a beer at a conference and I'll tell you about the computer that -- briefly -- had true and false confused. No, I'm not making that up just to score free beers.)

Both solutions "optimal" but with different objective values.

Solvers have various convergence criteria, typically including absolute and/or relative tolerances for the objective value. For integer programs, these are usually expressed in terms of the "gap" between the objective value and the best known bound (lower bound for minimization, upper bound for maximization). If either the absolute or relative (percentage) gap falls below a specified nonzero threshold, the solver declares victory and announces an "optimal" solution.

This means that the announced objective value for an optimal solution is only approximately equal to the true optimal value (charitably assuming that the solver has correctly identified an optimum). Therefore, unless the difference between the announced optimal values of the two solvers exceeds both the larger of the two absolute gap limits and the larger of the two relative gap limits, this is probably just a combination of rounding error and convergence tolerances. In short, you can accept both answers as plausibly correct.

One "optimal" solution, one suboptimal solution.

Perhaps the solvers had different convergence criteria specified. Very likely they had different values for an assortment of parameters that affect performance. Quite possibly, either one solver is better than the other or one just was luckier than the other (particularly plausible with integer programs, perverse beasties that they are). Move along, folks: nothing to see here. Just make sure that the suboptimal solution does not have a better objective value than the optimal solution (again, by more than the tolerance convergence of the first solver -- if the values are close, it could be that both have the optimal solution but the second solver has not yet proved it optimal).

One "optimal" solution, one "infeasible" diagnosis.

Obviously someone is pulling your leg here. In addition to the convergence tolerances mentioned above, solvers generally have non-zero tolerances for how close a solution has to come to satisfying a constraint (and, for integer problems, how close the value of an integer variable has to be to the nearest actual integer). This is necessary in order to avoid having darned near every problem declared "infeasible" due to rounding errors.

Different tolerances could possibly account for the discrepancy. If your problem is "numerically unstable" (predisposed to causing ugly rounding error problems), it could be that one solver is better at handling numerical instability than the other one, or that parameter settings relating to numerical stability account for the different in results. I suggest that you take the alleged optimal solution, plug it into the model fed to the other solver (for instance, by setting the upper and lower bound of each variable to its "optimal" value), then ask the second solver to diagnose the source of infeasibility (which constraints are violated). You can then decide for yourself whether the constraint violations are small enough for you to tolerate.

Both feasible, neither optimal, different results.

Nothing remarkable here. Anything goes in this case, so long as neither incumbent solution spells out "Harvard" or "Michigan" in ASCII. (No need for obscenity just because the problem was not solved to optimality.)

Thursday, August 1, 2013

Numerical Analysis: Size Matters

A pure mathematician, an applied mathematician and an accountant all apply for the same job on Wall Street, and during their interviews they are all asked the same question: "What is the value of $(1-1/N)\times N + 1 - N$?" The pure mathematician replies "Zero, of course!" The applied mathematician asks "How large is $N$?" The accountant asks "What would you like it to be?" The accountant gets the job ...

... but the applied mathematician is on the right track, especially in an era where much research and virtually all application of mathematics is done on digital computers. I've been meaning to do some posts on numerical analysis for a while now, motivated in part by a large number of questions appearing on optimization support forums that indicate the topic is foreign to a sizable number of people using mathematical programming software. (The poster child for questions of this type: "Why did solver XXX say that the optimal value of my variable is 0.999999823 when it is a 0-1 variable??")

I was a math major end to end. In my first quarter of graduate school, I took a course on numerical analysis -- I can't recall now why, as I was a "pure" mathematician in those days -- and it was a real eye-opener. Identities that hold in symbolic mathematics (including "chalkboard arithmetic") don't necessarily survive the conversion to a digital medium. I have no intention of trying to compress that first course in numerical analysis into a blog post, or even a series of posts, but perhaps an example or two will at least get the attention of some people who previously had not heard of it. The numerical analysis course turned out to be one of the most valuable math courses I took (at any level), and I invariably recommend it whenever someone asks "What courses should I take to prepare for a career in Operations Research?"

I tested the formula in the above joke in a small Java program, using double-precision arithmetic, and the reported answer was zero no matter how large a value of $N$ I tried. This might or might not be a consequence of compiler optimizations -- the compiler doing the arithmetic in a different order than I specified it. So I'll fall back on an example from that graduate course: computing a partial sum of the (divergent) harmonic series $\sum_{n=1}^\infty 1/n$.

One of the first things anyone learns about math, at an early age, is that addition is ~~commutative~~ associative ... except, as it turns out, when a computer is involved. Here is a small Java program that computes $\sum_{n=1}^N 1/n$ twice, in one case summing in ascending order of $n$ ($1 + 1/2 + 1/3 + \dots + 1/n$) and in the other case summing in descending order of $n$ ($1/N + 1/(N-1) + \dots + 1$). I start with $N=1000$ and double $N$ each time through the main loop. To save myself repetition, let me stipulate that henceforth "large" and "small" refer to the magnitudes of numbers, without regard for their sign. If addition is commutative on the computer, the difference between summing in ascending and descending order should be zero.

for (int limit = 1000; limit <= 1024000; limit *=2) {
  double up = 0;
  double down = 0;
  for (int n = 1; n <= limit; n++) {
    up += 1.0/n;
  }
  for (int n = limit; n >= 1; n--) {
    down += 1.0/n;
  }
  double diff = up - down;
  System.out.println("Limit = " + limit + ", up = " + up + ", down = " 
                     + down + ", diff = " + diff);
}

Here is the output it generated:

N	1 + ... + 1/N	1/N + ... + 1	Difference
1000	7.485470860550343	7.485470860550341	2.6645352591003757E-15
2000	8.178368103610284	8.178368103610278	5.329070518200751E-15
4000	8.871390299795198	8.871390299795223	-2.4868995751603507E-14
8000	9.564474984261391	9.564474984261425	-3.375077994860476E-14
16000	10.257590915797937	10.257590915797914	2.3092638912203256E-14
32000	10.95072247160203	10.950722471602038	-8.881784197001252E-15
64000	11.64386183972301	11.643861839723002	8.881784197001252E-15
128000	12.337005114048072	12.337005114048194	-1.2256862191861728E-13
256000	13.030150341487087	13.030150341486952	1.3500311979441904E-13
512000	13.72329654548546	13.723296545485356	1.0480505352461478E-13
1024000	14.41644323776362	14.416443237764337	-7.176481631177012E-13

The first takeaway is that we're not seeing a difference of zero, ever. (Actually, for $N$ up to about 40, the reported difference is zero.) The differences are small, but not zero. There is no obvious pattern to their sign, nor is their size changing monotonically, but they do seem generally to be growing with $N$, which among other things means that if we have a really large value of $N$, we could get a fairly seriously inaccurate answer (at least in absolute, rather than relative, terms).

Why does this happen? The short answer is that computers store non-integral numbers only to a specific finite precision. That creates "truncation errors" in the representations of the numbers (0.1 as stored in a computer is not exactly 1/10) and "rounding errors" when arithmetic is performed. In particular, adding or subtracting a large number and a small number can produce rounding errors. In the extreme, the small number is smaller than the precision at which the large number is stored, and so the sum or difference of the two is identical to the large number as far as the computer is concerned. That sort of error is more likely to occur when $n$ is ascending (we are adding progressively smaller terms $1/n$ to progressively larger partial sums) than when $n$ is descending (at the outset, we add small numbers to small partial sums, and eventually larger numbers to larger sums); so I am inclined to trust the second set of partial sums more than I do the first. Neither, though, is exactly correct.

You might not be too excited about the differences in the table -- even at $N=1,024,000$ the difference is in the thirteenth decimal place -- but consider the following coding error probably made by every rookie programmer to tackle a mathematical computation: testing whether the difference of two floating point (non-integer) numbers equals zero when they should be testing whether it is close enough to zero. Many a program has entered an indefinite loop thanks to a test for equality and a smidgen of rounding error.

Update: John D. Cook has a related post in a different context (probability and statistics).