Sunday, March 11, 2018

Piecewise Linear Approximations in MIP Models

In the past, I've written about piecewise linear approximations of functions of a single variable. (There are too many posts to list here. Just type "piecewise linear" in the blog search box if you want to find them.) Handling piecewise linear approximations of multivariable functions is a bit more intimidating. I'll illustrate one approach here.

To set the stage, assume an optimization problem involving variables $x\in\Re^{n}$, and possibly some other variables $y\in\Re^{m}$. Some or all of the $x$ and $y$ variables may be restricted to integer values. Also assume that the objective function and constraints are all linear, with the exception of one nonlinear function of the $x$ variables that may appear in one or more constraints and/or the objective function. I'll denote that function $f:\Re^{n}\rightarrow\Re$. With that, I'll write our hypothetical problem as
\begin{align*}
\text{min } & g(x,y,z)\\
\text{s.t. } & z=f(x)\\
 & (x,y,z)\in D
\end{align*}where $z$ is a new variable that captures the value of $f$, the domain $D$ incorporates all other functional constraints, bounds and integrality restrictions, and the objective function $g$ is linear. So if it were not for the constraint $z=f(x)$, this would be a mixed integer linear program.

I'm also going to assume that we know a priori some hyperrectangle $R\subset\Re^{n}$ containing all feasible values of $x$, say $R=\prod_{i=1}^{n}[L_{i},U_{i}]$. Brace yourself, because the notation is about to get a bit messy. We will create a mesh of discrete points at which $f$ will be evaluated. First, we specify a sequence of what I will call "grid values" for each individual component of $x$. I'll denote the $j$-th grid value for $x_{i}$ by $a_{i}^{(j)}$, where $$L_{i}=a_{i}^{(1)}<a_{i}^{(2)}<\cdots<a_{i}^{(N_{i})}=U_{i}.$$The number of grid values need not be the same for each variable (hence the subscript on $N_{i}$), and the spacing need not be equal. It probably makes sense to sample $f()$ more frequently in regions where its curvature is greater, and less frequently in regions where it is fairly flat.

I'll use the term "mesh points" for the points$$\prod_{i=1}^{n}\left\{ a_{i}^{(1)},\dots,a_{i}^{(N_{i})}\right\}$$formed by combinations of grid values, and $a^{(j)}$ will denote a generic mesh point $(a_{1}^{(j_{1})},\dots,a_{n}^{(j_{n})})$. The superscript $j$ for $w$ is a vector $(j_{1},\dots,j_{n})$ whose domain I will denote $J$. Now we can get down to the piecewise linear approximation. We will write $x$ as a convex combination of the mesh points, and approximate $f(x)$ with the corresponding convex combination of the function values at the mesh points. Just to be clear, in this formulation $z$ approximates, but no longer equals, $f(x)$. To do this, we will introduce new (continuous) variables $w^{(j)}\in[0,1]$ for each $j\in J$. There are $N_{1}\times\cdots\times N_{n}$ mesh points, and so an identical number of $w$ variables. The $w$ variables are weights assigned to the mesh points. This leads to the following additional constraints:
\begin{align*}
\sum_{j\in J} & w^{(j)}=1\\
\sum_{j\in J}a^{(j)}w^{(j)} & =x\\
\sum_{j\in J} & f(a^{(j)})w^{(j)}=z.
\end{align*}
There's still one more wrinkle with which to contend. Other than possibly at extreme points, there will be more than one convex combination of mesh points producing the same $x$ vector, and they will not all produce the same approximation $z$ of $f(x)$. Consider an example in which $n=2$, $R=[2,5]\times[1,3]$, and $f(x)=x^{\prime}x=\left\Vert x\right\Vert ^{2}$. I'll use integer-valued mesh points. Assume that the optimal solution requires that $x=(3.5,2.2)$, in which case $f(x)=17.09$. Figure 1 illustrates the situation, with the values of $f()$ at the mesh points shown in red. (Click any figure to get a better resolution version in a separate browser tab.)
Figure 1: Rectangle $R$ and solution $x$

Assume first that the solver prefers smaller values of $f(x)$. Figure 2 shows the weights $w$ that minimize $z$, the estimate of $f(x)$, for our given choice of $x$. The interpolated value of $z$ is $17.5$, which is moderately close to the correct value $17.09$. The three mesh points closest to $x$ receive weights 0.3, 0.5 and 0.2. Figure 3 shows an alternative solution with the same value of $z$, using a different combination of adjacent corners.
 

Figure 2: Weights that minimize $z$
Figure 3: Alternative weights that minimize $z$

Now assume instead that the solver prefers larger values of $f(x)$. The solution that maximizes $z$ is shown in Figure 4. It uses three corners of R, none of which are adjacent to $x,$ with weights 0.5, 0.4 and 0.1 Although this produces the correct value of $x$, the interpolated value of $z$ is $20.3$, which grossly overstates the correct value $17.09$.
Figure 4: Weights that maximize $z$
To keep the approximation as accurate as possible, we should force the solver to use mesh points adjacent to the actual solution $x$. We can do that by introducing a new binary variable $v_{i}^{(j)}\in\left\{ 0,1\right\} $ for each grid value $a_{i}^{(j)}$. Variable $v_{i}^{(j)}$ will signal whether $a_{i}^{(j)}$ is the $i$-th coordinate of the "lower left" (more generally, closest to the origin) corner of the mesh hyperrectangle containing $x$. Since we want to select a single hyperrectangle to contain $x$, we add constraints requiring exactly one choice for each coordinate of the lower left corner:
\[
\sum_{j=1}^{N_{i}}v_{i}^{(j)}=1\quad i=1,\dots,n.
\]Now we just need to add constraints forcing $w^{(j)}=0$ unless mesh point $j$ is one of the corners of the chosen hyperrectangle. Observe that, along any dimension $i$, the $i$-th component of any corner of the correct hyperrectangle will either be the chosen grid value or the next consecutive grid value. For instance, the rectangle containing $x$ in Figure 1 has lower left corner $(a_{1}^{(2)},a_{2}^{(2)})=(3,2)$. That means $v_{1}^{(2)}=1=v_{2}^{(2)}$. The corners of the rectangle have either $3=a_{1}^{(2)}$ or $4=a_{1}^{(3)}$ for their $x_{1}$ coordinate and either $2=a_{2}^{(2)}$ or $3=a_{2}^{(3)}$ for their $x_{2}$ coordinate.

So for $w^{(j)}$ to be nonzero, we need either $v_{i}^{(j_{i})}=1$ or $v_{i}^{(j_{i}-1)}=1$. This leads us to the constraints \[
w^{(j)}\le v_{i}^{(j_{i})}+v_{i}^{(j_{i-1})}
\]for all indices $j$ of mesh points and for all $i=1,\dots,n$, with the understanding that $v_{i}^{(0)}=0$ (since there is no grid point prior to the first one in any direction).

With those extra constraints, the solutions to our little example when we want $z$ small are unchanged, since they already obey the additional constraints. When we want $z$ large, however, the solution in Figure 4 is now infeasible. All three positive weights violate the new constraints. For instance, $w^{(1,3)}=0.5$ (the weight applied to the mesh point formed by the first grid value of $x_{1}$ and the third grid value of $x_{2}$), but $v_{1}^{(1)}+v_{1}^{(0)}=0$. The solutions that maximize $z$ end up being the same ones that minimize $z$ (those shown in Figures 2 and 3).

What remains is to take stock of how much the model has inflated. Let $$P=N_{1}\times\cdots\times N_{n}$$and$$S=N_{1}+\cdots+N_{n}.$$ We first added $P$ continuous variables ($w$) and $n+2$ constraints involving them. Then we added $S$ binary variables ($v$), $n$ constraints involving just them, and $nP$ constraints tying them to the $w$ variables. That's a grand total of $P$ continuous variables, $S$ binary variables and $2n+2+nP$ constraints. Note that $n\ll S\ll P<nP$. Also note that, in general, continuous variables are computationally cheaper than integer variables. As for the gaggle of extra constraints, modern solvers have ways to deal with some constraints "lazily", which mitigates the load to some extent.

An alternative approach would be to partion $R$ into nonoverlapping polygons (not necessarily hyperrectangles), assign weight variables $w$ to the corners of those polygons as above, and assign a binary variable to each polygon indicating whether it was selected. That would increase the number of binary variables from $S$ to $P$ (where $P$ would be the number of polygons) and decrease the number of added constraints from $2n+2+nP$ to $n+3+P.$ (The $n$ constraints requiring sums of binary variables to be 1 becomes a single constraint summing all the new binary variables. The $nP$ constraints tying the $w$ and $v$ variables together become $P$ constraints, one for each $w$ variable.) This approach is more flexible in terms of concentrating polygons where the curvature of $f$ is greatest, and it significantly reduces the number of constraints; but it significantly increases the number of binary variables, albeit with all of them tied into a single type 1 special ordered set. So it's hard for me to say which is better in general.

Thursday, January 18, 2018

More on "Core Points"

A few additions to yesterday's post occurred to me belatedly.

First, it may be a good idea to check whether your alleged core point $y^0$ is actually in the relative interior of the integer hull $\mathrm{conv}(Y)$. A sufficient condition is that, when you substitute $y^0$ into the constraints, all inequality constraints including variable bounds have positive slack. (Equality constraints obviously will not have slack.) In particular, do not forget that nonnegativity restrictions count as bounds. If you specify $0\le y_i \le u_i$, then you are looking for $\epsilon \le y^0_i \le u_i - \epsilon$ for some $\epsilon > 0$ (and $\epsilon$ greater than your tolerance for rounding error).

Second, while the presence of equality constraints will definitely make the feasible region less than full dimension, it can occur even in a problem with only inequality constraints. Consider a problem with nonnegative general integer variables and the following constraints (as well as others, and other variables): \begin{align*} y_{1}+y_{2} +y_{3} & \le 2\\ y_{2}+y_{4} + y_{5} & \le 4\\ y_{1}+y_{3} + y_{4} + y_{5}& \ge 6 \end{align*} Although all the constraints are inequalities, the feasible region will live in the hyperplane $y_2 = 0$, and thus be less than full dimension. This points to why I said the condition in the previous paragraph is sufficient rather than necessary and sufficient for $y^0$ to be in the relative interior of the integer hull. In this example, there is no way to get positive slack in all three of the constraints (or, in fact, in any one of them) without violating the nonnegativity restriction on $y_2$.

Yesterday, I listed a few things one could try in the hope of getting a core point $y^0$ in the relative interior of the integer hull. Here are a few others that occurred to me. (Warning: I'm going to use the term "slack variable" for both slacks and surpluses.)
  • Tighten all inequality constraints (including variable bounds) and solve the LP relaxation of the tightened problem. (Feel free to change the objective function if you wish.) If you find a feasible solution $y^0$, it will be in the relative interior of the LP hull, and quite possibly in the integer hull. Good news: It's easy to do. Bad news: Even a modest tightening might make the problem infeasible (see example above).
  • Insert slack variables in all constraints, including variable bounds. That means $0 \le y_i \le u_i$ would become \begin{align*} y_{i}-s_{i} & \ge0\\ y_{i}+t_{i} & \le u_{i} \end{align*} where $s_i \ge 0$ and $t_i \ge 0$ are slack variables. Maximize the minimum value of the slacks over the LP relaxation of the modified problem. Good news: If the solution has positive objective value (meaning all slacks are positive), you have a relative interior point of at least the LP hull. Bad news: An unfortunate combination of inequalities, like the example above, may prevent you from getting all slacks positive.

Wednesday, January 17, 2018

Finding a "Core Point"

In a famous (or at least relatively famous) paper [1], Magnanti and Wong suggest a method to accelerate the progress of Benders decomposition for certain mixed-integer programs by sharpening "optimality" cuts. Their approach requires the determination of what they call a core point. I'll try to follow their notation as much as possible. Let $Y$ be the set of integer-valued vectors satisfying all the constraints that involve solely the integer variables. Basically, $Y$ would be the feasible region for the problem if the continuous variables did not exist. A core point is a point $y^0$ in the relative interior of the convex hull of $Y$.

I'll assume that any reader who made it this far knows what a convex hull is. That said, I should point out that the convex hull of $Y$ is not the feasible region of the LP relaxation of $Y$. The feasible region of the LP relaxation is frequently termed the "LP hull", whereas the convex hull of $Y$ is typically referred to as the "integer hull".  The integer hull is a subset (typically a proper subset) of the LP hull. Knowing what the integer hull is basically makes a MIP model pretty easy: solving a linear program over the integer hull yields the integer optimum. Since it's pretty well known that most MIP models are not easy, one can infer that in most cases the integer hull is not known (and not easy to figure out).

Mathematicians will know the term "relative interior", but it may not be familiar to OR/IE/MS people trying to solve MIP models. In Euclidean spaces, a point belongs to the interior of a set if there's a ball centered around that point and contained entirely in the set. In lay terms, you're at an interior point if you move slightly in any direction and not leave the set. In our context, that set is the integer hull, a convex polyhedron. The catch is that if the set is not full dimensional, there is no interior (or, more properly, the interior is empty). Picture the unit cube in $\mathbb{R}^3$ as the integer hull. The point $y^0 = (0.5, 0.5, 0.5)$ is in the interior; you can move a bit in any direction and stay in the cube. Now add the constraint $y_3 = 0.5$, which flattens the integer hull to a square (still containing $y^0$). You can move a bit in the $y_1$ and $y_2$ directions and stay in the square, but any vertical movement (parallel to the $y_3$ axis) and you're no longer feasible. That brings us to the relative interior. A point is in the relative interior of a set if it can be surrounded by a ball in the largest subspace for which the set is full dimensional, with the ball staying in the set. Returning to our example, before adding the equality constraint I could surround $y^0$ with a sphere contained in the unit cube. After adding the equality constraint, making the feasible region a square, I can no longer surround $y^0$ with a feasible sphere, but I can surround it with a feasible circle in the $y_3 = 0.5$ plane. So $y^0$ is at least in the relative interior of the integer hull after adding the equation constraint.

Back to Magnanti-Wong. To use their technique, you need to know a "core point" up front. In their paper, they mention that in some cases a core point will be obvious ... but in many cases it will not be. So I'll mention some techniques that probably yield a core point (but no general guarantee).
  • If you happen to know two or more integer-feasible points, average them. The average will be feasible, and unless all those points live on the same facet of the integer hull, you should get a relative interior point.
  • Pick an objective function and optimize it (maximize or minimize, it doesn't matter) over $Y$, ignoring the continuous variables and any constraints involving them from the original problem. Do this a few times: minimize and maximize the same objective, switch the objective, iterate; or just minimize (or maximize) a different objective each time. I might use randomly generated objective functions, but an easy alternative is to minimize $y_1$, then maximize $y_1$, then minimize $y_2$, etc. Average the solutions you get. This is guaranteed to belong to integer hull, and almost surely (at least with random objectives and multiple iterations) to the relative interior of the integer hull. Bad news: you just solved a bunch of integer programs, which might be a trifle time-consuming.
  • Use the previous method, but optimize over the LP hull (relaxing the integrality constraints on $y$). Again, average the solutions you get. Good news: Solving a handful of LPs is typically much less painful than solving a handful of IPs. Bad news: Since the LP hull is a superset of the integer hull, and since your LP solutions are all vertices of the LP hull, there's at least a chance that the average of them lives inside the LP hull but outside the integer hull. That said, I've used this method once or twice and not lost any sleep. If your core point happens to fall outside the integer hull, I don't think it will cause any problems; it just probably won't make your Benders decomposition solve any faster.
  • Generate a bunch of random integer vectors of the correct dimension, and filter out those that do not satisfy the constraints defining $Y$. Average the survivors. Good news: Each of the surviving integer vectors will belong to the integer hull, so your averaged core point $y^0$ definitely will as well. Also, generating random integer vectors is pretty easy. Bad news: If the integer hull is less than full dimension, the probability of a random vector falling in it is zero, so the likelihood of any "survivors" is negligible. Mitigating news: In some cases you can randomly generate an integer vector and then tweak it to ensure it satisfies the constraints that are keeping the integer hull from being full dimension. For instance, suppose that you have a constraint of the form $\sum_i y_i = B$ where $B$ is an integer constant. Generate a random integer vector $\tilde{y}$, replace $\tilde{y}_1$ with $B-\sum_{i>1}\tilde{y}_i$, and you have a vector satisfying that constraint. If you can pull off similar tweaks for other equation constraints (staying integer-valued and not violating bounds on the variables), maybe you can get lucky and find a few integer-feasible points. In fact, even if you can only find one survivor (before exhausting your patience), you might get lucky and have it be in the relative interior of the integer hull. (You won't know this, but you can try using it as your core point and hope for the best.)


[1] Magnanti, T. L. & Wong, R. T., Accelerating Benders Decomposition: Algorithmic Enhancement and Model Selection Criteria. Operations Research, 1981, 29, 464-484

Monday, January 1, 2018

Ordering Index Vector with Java Streams

I bumped up against the following problem while doing some coding in Java 8 (and using streams where possible). Given a vector of objects $x_1, \dots, x_N$ that come from some domain having an ordering $\preccurlyeq$, find the vector of indices $i_1, \dots, i_N$ that sorts the original values into ascending order, i.e., such that $x_{i_1} \preccurlyeq x_{i_2} \preccurlyeq \cdots \preccurlyeq x_{i_N}$ I'm not sure there's an "official" name for that vector of indices, but I've seen it referred to more than once as the "ordering index vector" (hence the name of this post).

One of the nice features of the Java stream package is that it is really easy to sort streams, either using their natural ordering (where applicable) or using a specified ordering. As best I can tell, though, there is (at least currently) no built-in way to find out the original indices or positions of the sorted results. Fortunately, it didn't take to long to hack something that works. It may not be the most elegant way to get the ordering vector, but I'll share it anyway in case someone finds it useful.

My example will sort a vector of doubles, since that was what I was trying to do when I was forced to come up with this code. With fairly obvious modifications, it should work for sorting vectors of other types. Here is the code. Please try not to laugh.

// Create a vector of values whose order is desired.
double[] vals = ...
// Get the sort order for the values.
int[] order =
  IntStream.range(0, vals.length)
           .boxed()
           .sorted((i, j) -> Double.compare(vals[i], vals[j]))
           .mapToInt(x -> x)
           .toArray();
// The sorted list is vals[order[0]], vals[order[1]], ...

The stream has to take a winding trip through the Ugly Forest to get this to work. We start out with an IntStream, because that is the easiest way to get a stream of indices. Unfortunately, sorting using a specified comparator is not supported by IntStream, so we have to "box" it get a stream of integers (Stream<Integer>). (Yes, fans, IntStream and stream of integer are two separate things.) The stream of integers is sorted by comparing the values they index in the original vector of double precision reals. The we use the identity function to map the stream of integers back to an IntStream (!) so that we can easily convert it to a vector of integers (meaning int[], not Integer[]).

When I said this might not be the "most elegant" approach, what I meant was that it looks clunky (at least to me). I look forward to being schooled in the comments section.