Wednesday, September 12, 2012

Boot Bug: Anniversary Edition

A year ago July, I ran into a problem in which my AMD-64 Mint Katya system started to hang on boots. As noted in that post, the problem stemmed from messing with the video settings for the boot loader. Reinstalling the nVidia display driver might or might not have helped eliminate the problem, but eventually it sorted itself out.

A month later, I ran into a second problem linked to the nVidia driver, in which the X server decided to spontaneously reboot on occasion. I got rid of that problem by using the sgfxi script to update to the latest version of the nVidia driver.

Do gremlins celebrate anniversaries? Slightly over one year after fixing the second problem, the first one came back. It started when I tried to run a program that crashed because it could not find a way to do XGL hardware graphics acceleration. That led me to the Additional Drivers dialog in the Control Center, where I discovered that the proprietary nVidia driver was not selected. As it turns out, that's probably because the Control Center sees the one the operating system installed but not the newer one that the sgfxi script installed. I'm not sure that either version was running, though, since both should provide XGL.

Anyway, I decided to turn on the proprietary driver in the Additional Drivers dialog. When I rebooted, the machine hung at the battery test stage. (Note to self: if this happens again, do not turn on the proprietary driver. Get out of X and run the sgfxi script.) I could boot into safe mode, but no way could I boot regularly (despite repeated attempts). Safe mode worked, but it used too low a resolution, with the desktop off-center (left edge cut off, blank screen on the right), and I could not change the resolution. In Control Center > Monitors I discovered that the system could not detect what kind of monitor I had, so it gave me a wimpy default choice that I could not edit.

Eventually I booted into safe mode, went back to Additional Drivers and discovered that I now had a new option: an "experimental" driver. This was the nouveau driver, which I selected. With the nouveau driver, I could boot normally and get to the desktop, but the resolution was still wrong, and the desktop was still off-center. So I switched back to the nVidia driver, and the next boot predictably hung.

I booted into safe mode again, but this time I selected the option to repair packages, thinking perhaps one of the X packages was damaged. That was apparently not the case, but it did remind me that I had seven updates I had not installed. These were updates to the Linux core, including the X system. Normally I don't install those updates because I like to upgrade to a new version of the entire system at once. This was not a normal time, though (particularly as it was getting on 1:30 in the morning), so I let the system install the updates and rebooted again ... and everything worked. Mint automatically detected my monitor correctly, the resolution was set correctly, everything was back the way it had been. As an added bonus, the nVidia driver was being used, and the program that triggered this whole cluster worked correctly. Woo-hoo!

I subsequently ran the sgfxi script, and sure enough I was on a one-iteration-old nVidia driver. I let the script upgrade me to the latest version, and everything still seems to work properly.

There won't always be a core update waiting to bail me out, so I guess the solution process next time (a year from now?) will be as follows:
  1. Try the sgfxi script.
  2. If that fails, see if there's an update to any of the X packages that can be installed. If not, try reinstalling the X system.
  3. If that also fails, swear mightily and see if there's a new version of Mint I can download.
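
For future reference, here is a rough sketch of the first two steps as a small Python helper. It is a memory aid, not a tested recovery tool. The apt-get commands and the xserver-xorg package name are my assumptions about what "update" and "reinstall the X system" would mean on an Ubuntu-based Mint; only the sgfxi script itself comes from the notes above. The step names ("sgfxi", "update", "reinstall-x") are made up for illustration. By default it only prints the commands.

```python
import subprocess

# Steps 1 and 2 of the checklist as command lists.
# Assumption: the apt-get commands and the xserver-xorg package name are
# guesses for an Ubuntu-based Mint; they are not taken from the post.
PLAN = {
    "sgfxi": [["sgfxi"]],
    "update": [["apt-get", "update"], ["apt-get", "dist-upgrade"]],
    "reinstall-x": [["apt-get", "install", "--reinstall", "xserver-xorg"]],
}


def run_step(name, dry_run=True):
    """Print the commands for one step; execute them only if dry_run is False."""
    shown = []
    for cmd in PLAN[name]:
        line = " ".join(cmd)
        print(line)
        shown.append(line)
        if not dry_run:
            # check=True raises CalledProcessError at the first failing command
            subprocess.run(cmd, check=True)
    return shown
```

Calling run_step("sgfxi") just prints the command. Actually running a step (dry_run=False) would need root and, for sgfxi, a text console rather than a running X session, with a reboot after each step to see whether the hang is gone. Step 3 remains a manual operation.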
