Thursday, April 20, 2017

Sharing "Implicit" Test Problems

The topic of reproducible research is garnering a lot of attention these days. I'm not sure there is a 100% agreed upon, specific, detailed definition of it, and I do think it's likely to be somewhat dependent on the type of research, but for purposes of this post the Wikipedia definition (previous link) works for me. In particular, I want to call out one part of the Wikipedia entry:
The term reproducible research refers to the idea that the ultimate product of academic research is the paper along with the laboratory notebooks and full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to reproduce the results and create new work based on the research.
I've seen a lot of press related to reproducibility in the biological sciences, although I assume it's an issue in other sciences as well. (Good luck with that, physicists!) It's also increasingly an issue in the social sciences, where one study asserted that over half of the published results they tested failed to replicate. (That study itself was criticized as allegedly failing to replicate.) All of this has been termed by some a "replication crisis". In short, reproducibility is a big deal. There's at least one blog devoted to it, and a considerable amount of work in the statistics community is going into tools to facilitate reproducibility (such as R and Python "notebooks"). R users in particular should have a look at the rOpenSci Reproducibility Guide.

My research tends almost exclusively to fall into the category of "stupid math programming tricks". Either I'm trying to find some clever formulation of a (deterministic) problem, I'm trying to find an efficient way to generate an exact solution, I'm trying to find a decent heuristic for getting "good" solutions "quickly" (preferably without stretching analogies to nature too far: there's no way I'm beating slime mold optimization in the naming contest), or some combination of the above. Since I mostly avoid statistics (other than the occasional comparison of run times with some selected benchmark alternative), I've been largely unaffected by the debate on reproducibility ... until now.

Recently, the journals published by INFORMS have moved in the direction of reproducible research, and I suspect others are doing so (or will do so in the near future) as well. As an example relevant to my interests, the INFORMS Journal on Computing (JOC) has introduced policies on the sharing of both data and code. Personally, I think this is a good idea. I've shared data and code for one particular paper (machine scheduling) on my web site since the paper came out (20+ years ago), and I share data for a more recent paper as well (having released the code as an open source project).

At the same time, I recognize that there are various difficulties (licensing/copyright, for instance) in doing so, and also there are costs for the researcher. I'd be willing to share the code for one or two other projects, but it would take a major effort on my part to untangle the spaghetti and figure out which libraries are/are not required, and which code should be included or should be excluded as irrelevant to the ultimate experiments. I'm reluctant to commit that time until someone actually requests it.

There's another twist that I would not have anticipated until quite recently. I'm working on a project that involves "implicit hitting set" (IHS) problems. In an IHS problem, you have a master problem that formulates as a pure integer (in fact, 0-1) program. Candidate solutions to that model are fed to one or more "oracles", which either bless the solutions as feasible or generate additional constraints for the master problem that the candidates violate. Note that I have not said anything about the nature of the oracles. Oracles might be solving linear or integer programming models, which would be relatively easy to share as "data", but they might also be doing something decidedly funky that is encapsulated in their coding. In the latter case, the test "data" is actually code: I would have to provided the code for the oracles in order for someone to reproduce my results.

Well, okay, if sharing code is on the table now, isn't that just more code? Not quite. Let's say that some unfortunate doctoral student has been tasked by her advisor to code their shiny new IHS algorithm and test it against my published problems. The doctoral student used Python or Julia (or some other trendy language), whereas I used stodgy old Java, so there's a good chance the luckless doctoral student will have to recode my stuff (which, among other things, requires making sense of it). Moreover, I will have created an API to my oracles that may or may not fit with what they are doing ... and that's if I used an API at all. If I directly integrated program structures external to the oracle functions into the oracle (passed in variables, constraints or other elements of my master problems, for instance), our doctoral student will need to figure out how to isolate those elements and replace them with corresponding elements from her code ... if there is even a correspondence.

There's at least one implication in this for me. If I actually subscribe to the push for reproducibility (or if I take pity on other people's doctoral students), I need to code my oracles not as an integral part of my master code but as a separate software module or library with a well-documented and reasonably flexible interface to the outside world (API). <sigh>More work for me.</sigh>

Tuesday, March 28, 2017

Adventures with TeX Live

Since I use the Linux Mint operating system, the obvious (if not only) choice for a LaTeX distribution is TeX Live. (If you are not familiar with, or are not interested in, the LaTeX typesetting system, you have already read too far in this post.) On Mint, Ubuntu and other Debian-type operating systems, you typically install TeX Live by downloading and installing a number of fairly large files from a repository (in my case, mainly repositories for Ubuntu). Sometimes, though, you just want to install one or two small LaTeX packages without bothering with one of the large TeX Live files (which will contain a zillion other LaTeX packages you do not want or need). For this purpose, TeX Live includes a package manager (tlmgr).

Today, I found myself trying unsuccessfully to install something using tlmgr. The first problem was a missing compression/decompression program that was easy to install. After that, I continued to get cryptic error messages. Eventually, I decided to update tlmgr to see if that would help.

The Canonical (Ubuntu) repositories are fairly far behind the curve when it comes to TeX Live; they still provide the 2015 version, which is incompatible with newer versions (including the newer version of tlmgr). So updating tlmgr alone was not an option. (I had what appeared to be the most recent, and presumably final, incarnation of the 2015 version of tlmgr.) Following instructions I found on the Tips on Ubuntu site, I added a PPA containing the latest (I think) version of TeX Live, running the following in a shell:

sudo add-apt-repository ppa:jonathonf/texlive
sudo apt update 

After that, I used the software updater to update my TeX Live packages. (The "Tips on Ubuntu" instructions have you install TeX Live from the PPA, which presumes you do not already have a version installed, as I did.)

Before using tlmgr to do much of anything, you need to initialize the user tree by running

tlmgr init-usertree

in a shell. I had already done that, but I repeated it, confirming that the user tree was still intact. I then ran

tlmgr --gui

to open a graphical user interface, told it to load the default repository (there's a button in the "Repository" panel of the display), and tried -- unsuccessfully -- to install a LaTeX package.

So what was wrong now?  The error message contained the following snippet:

Tk::Error: /usr/bin/tlmgr: open(/usr/share/texlive/tlpkg/texlive.tlpdb)
failed: No such file or directory ...

I checked /usr/share/texlive/tlpkg, and sure enough there was no file texlive.tlpdb ... but there was a file named texlive.tlpdb.9ed73e8174b03a21aa0d40ebbcaac97f. It's a text file (around 12 MB) with configuration information and what looks like a directory structure. Why it had the rather funky hexadecimal extension, I have no idea. I symlinked it by running

sudo ln -s texlive.tlpdb.9ed73e8174b03a21aa0d40ebbcaac97f texlive.tlpdb

and tlmgr started behaving itself.

I later deleted the file with the scary looking hex extension, reran tlmgr and had it load the default repository again. It reproduced the file, but with a different (equally scary) hex extension. I again symlinked the file to texlive.tlpdb.

I have no idea why loading a repository produces what appears to be the correct file with the wrong extension and does not symlink it, but at least this fix seems to work.

Monday, February 27, 2017

MSU Students Do Microfinance

Several years ago, I found out about, an online "microfinance" site where individuals can make small loans ($25 is the standard increment at Kiva) to entrepreneurs in low income settings. The entrepreneurs actually apply for larger amounts, which they typically receive from third-party "field partners". The entrepreneurs repay the loans with interest to the field partners, who in turn repay the principal (no interest) to the original donors. A modest deposit by a donor can be turned over repeatedly into loans, enjoying a version of what in my long-ago undergrad econ class was termed the "multiplier effect". I've made a fair number of loans over the years, nearly all of which have been fully repaid. It's a bit of fun as well as personally satisfying.

So I was geeked to learn from one of my colleagues at the Broad College of Business, Professor Paulette Stenzel, that we have a student organization on campus dedicated to microfinance. The Spartan Global Development Fund is a 501(c)(3) nonprofit organization, created and run by students, that makes microloans both through their own field partners and through Kiva. It's a registered student organization, meaning that any Michigan State University student can join. Students and nonstudents alike can donate through PayPal.

Next time I'm on the Kiva site, I plan to join their Kiva "team", and I encourage anyone interested in microfinance to check out their site.

Thursday, February 23, 2017

Another Absolute Value Question

Someone asked whether it is possible to eliminate the absolute value from the constraint$$Lx\le |y| \le Ux$$(where $L\ge 0$ and $U>0$ are constants, $x$ is a binary variable, and $y$ is a continuous variable). The answer is yes, but what it takes depends on whether $L=0$ or $L>0$.

The easy case is when $L=0$. There are two possibilities for the domain of $y$: either $y\in [0,0]$ (if $x=0$) or $y\in [-U,U]$ (if $x=1$). One binary variable is enough to capture two choices, so we don't need any new variables. We just need to rewrite the constraint as$$-Ux\le y \le Ux.$$If $L>0$, then there are three possibilities for the domain of $y$: $[-U, -L] \cup [0,0] \cup [L, U]$. That means we'll need at least two binary variables to cover all choices. Rather than change the interpretation of $x$ (which may be used elsewhere in the questioner's model), I'll introduce two new binary variables ($z_1$ for the left interval and $z_2$ for the right interval) and link them to $x$ via $x=z_1+z_2$. That leads to the following constraints:$$-Uz_1 \le y \le -Lz_1 + U(1-z_1)$$and$$Lz_2 - U(1-z_2) \le y \le Uz_2.$$ If $x=0$, both $z_1$ and $z_2$ must be 0, and the new constraints force $y=0$. If $x=1$, then either $z_1=1$ or $z_2=1$ (but not both). Left to the reader as an exercise: verify that $z_1=1\implies -U\le y \le -L$ while $z_2=1 \implies L\le y\le U$.

Saturday, January 21, 2017

Fischetti on Benders Decomposition

I just came across slides for a presentation that Matteo Fischetti (University of Padova) gave at the Lunteren Conference on the Mathematics of Operations Research a few days ago. Matteo is both expert at and dare I say an advocate of Benders decomposition. I use Benders decomposition (or variants of it) rather extensively in my research, so it ends up being a frequent theme in my posts. Those posts tend to generate more comments than posts on other topics. Apparently Matteo and I are not the only BD users out there.

I don't know that I would recommend Matteo's presentation as the starting point for someone who has heard of BD but never used it, but I certainly recommend having a look at the slides for anyone who has any familiarity with BD. Matteo provides several interesting perspectives as well as a tip or two for potentially improving performance. I learned a few new things.

In a sad coincidence, Professor Jacques Benders, the originator of Benders decomposition, passed away at age 92 just eight days before Matteo's presentation.

Tuesday, January 10, 2017

Pro Bono Analytics Is Growing Social

Pro Bono Analytics is a program by INFORMS (the Institute for Operations Research and the Management Sciences, for the acronym-averse), "the largest society in the world for professionals in the field of operations research (O.R.), management science, and analytics". PBA "connects our members and other analytics professionals with nonprofit organizations working in underserved and developing communities". In other words, we hook up charitable organizations needing analytics or OR help with volunteers willing to provide it without compensation. Our volunteers are a mix of industry practitioners, academics, students and the occasional geezer retiree. I think the majority of the volunteer pool comes from the U. S., and a majority are INFORMS members, but we have volunteers from as far away as Australia, and a significant portion are non-members.

I'd encourage anyone with OR/MS/analytics skills to consider volunteering, and anyone who knows a charitable organization (particularly those of limited financial means) to let them know we're out there willing to extend a helping hand. Both potential clients and potential volunteers can find out more, and signify their interest, at the PBA home page (repeating the link above).

Our new (and apparently energetic) staff liaison, Tia Carrai, has expanded PBA's social media footprint. You can find us now on the following:
I'd also like to give a shout-out to Pro Bono O.R., a like-minded initiative by our sister institution in the U. K., the Operational Research Society.

Friday, January 6, 2017

Mapping Trackball Buttons

For years, I've used a Logitech M570 wireless trackball with my Linux Mint PC. I generally prefer trackballs to mice -- no need to lift and reposition after a bunch of movement -- and I find that using my thumb, rather than my index finger (or, if I'm in a bad mood, my middle finger) to move the cursor is less fatiguing for my hand and wrist. The M570 works fine with Linux (at least Mint and Ubuntu) with no need for additional drivers.

In addition to the usual (at least for non-Mac users) three main buttons (left, write and combination scroll wheel/third button), the M570 has a couple of secondary buttons, which Logitech describes as "large, easy-to-reach Back/Forward buttons". I'm not sure I'd agree about "large", but I agree that they are easy to reach, and up until my most recent operating system upgrade I would have agreed they acted as back and forward buttons. Prior to the upgrade (and with no special configuration on my part, at least that I can remember), the extra buttons acted like page up and page down in every web browser or document reader that I used. After the upgrade (and switch from the Cinnamon desktop to the Mate desktop), their behavior changed. In the Firefox web browser, they would switch among tabs rather than vertically scrolling the current tab. In Xreader and Acrobat Reader, they also did not move forward backward among pages. (I don't recall trying to read multiple documents at once to see if they would switch among open documents.)

I find the forward/backward action rather handy with multipage documents and long web pages, so I wanted the previous behavior back. It turned out (after some research) not to be hard to do.

The first step is to install the xbindkeys and xautomation packages, both available from the Canonical repositories. This can be done via Synaptic or by running
     sudo apt-get install xbindkeys
     sudo apt-get install xautomation
in a terminal. (I also installed xbindkeys-config, which xbindkeys "suggests", but I don't think it's really necessary.) The xautomation package provides a command xte that can fake a key press.

The second step is to determine what button numbers are assigned to the forward and backward buttons. Run
     xev | grep -i button
in a terminal. This will open a small window with a target. Position the cursor over that window and click each of the buttons. You will see two messages in the terminal for each button, one for the press and one for the release. Look for the button numbers. For me, they were button 8 for forward and button 9 for backward, but your mileage may vary.

The third step is to configure xbindkeys to translate the extra trackball buttons appropriately. The configuration for xbindkeys is kept in a plain text file named .xbindkeysrc in your home directory (~/.xbindkeysrc). You can either create a new one (if you don't have one yet, or if there is nothing in it you want to keep) containing the lines below, or append those lines to the existing file, using your choice of text editor.
# Key bindings for Logitech M570 trackball
"xte 'key Page_Up'"
"xte 'key Page_Down'"
If your button numbers differ from mine (8 and 9), edit the lines accordingly. Note carefully the single and double quotation marks; xte seems a bit finicky about them.

Finally, you'll want to test this. You need to restart xbindkeys to get it to read the modified configuration file. Theoretically you can do this by running
     killall -HUP xbindkeys
in a terminal, but I found it necessary to restart the X server (after closing any applications using it, such as my web browser and email client). Typically Control+Alt+Backspace will do the trick. After that, try the trackball keys and hopefully they'll behave as expected.