Tuesday, May 20, 2008

OpenSolaris power management

One aspect of OpenSolaris that I've taken an interest in is power management. In my research, I've come across several useful things.

One of the first things I tried to find was the current CPU speed to see if power management was doing anything. I finally stumbled across this post by Mark Haywood. In that post he shows these commands:
$ kstat -m cpu_info -s supported_frequencies_Hz
module: cpu_info instance: 0
name: cpu_info0 class: misc
supported_frequencies_Hz 2800000000:3200000000
$ kstat -m cpu_info -s current_clock_Hz
module: cpu_info instance: 0
name: cpu_info0 class: misc
current_clock_Hz 2800000000
These report the supported frequencies and the current frequency of the CPU. If the supported frequencies field lists more than one value, like the colon-separated pair above, OpenSolaris supports power management for your CPU.
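To make that check scriptable, here is a minimal sketch. It assumes the kstat output looks like the listing above, and the function name count_supported_freqs is my own, not part of any Solaris tool. It prints how many frequencies each CPU instance advertises; anything above 1 suggests the CPU can change speed.

```shell
# Hypothetical helper: reads the output of
#   kstat -m cpu_info -s supported_frequencies_Hz
# on stdin and prints, for each CPU instance, the number of
# colon-separated frequencies it reports.
count_supported_freqs() {
    awk '$1 == "supported_frequencies_Hz" {
        n = split($2, freqs, ":")   # one array element per frequency
        print n
    }'
}
```

You would use it as: kstat -m cpu_info -s supported_frequencies_Hz | count_supported_freqs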

Another very handy tool is PowerTop. It captures statistics about how much CPU time is idle, how much time it spends running at different speeds, and what is causing the CPU to wake up from idle most frequently.

OpenSolaris 2008.05 installation and the fault manager

Last week I had to install an OpenSolaris 2008.05 system at work. The first system I tried to use wouldn't boot the kernel. I did some searching and found that adding -kv to the kernel command line in GRUB gives more verbose output during boot (-v turns on the verbose messages, and -k loads the kernel debugger, kmdb). Adding -d as well makes the kernel stop in the debugger early in boot. The system was hanging so early in the boot process that I wasn't able to diagnose very much. BIOS and firmware updates didn't help either.

After that, I moved to another system. That was where I had my first encounter with the Solaris Fault Manager. During a reboot after an image update, it showed an error on a PCI device but continued working fine. After the next reboot, there was a message on the console about a device being retired, and the system did not come up on the network. A look at dmesg showed that the e1000g0 device had been unregistered. I did some searching and came across this Sun documentation page that describes the faulty device retirement feature. Basically, you can use the fmadm faulty command to list devices that have shown failures. In my case, I issued the fmadm repair command with the fault ID. After another reboot, the system came back up with the e1000g0 device working properly.
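The two steps can be wrapped up roughly like this. It is only a sketch: the function name repair_fault and the FMADM variable are my own inventions. The variable lets you point it at a stand-in command for a dry run. The fault ID is whatever fmadm faulty reports on your system, and fmadm needs root privileges.

```shell
# Hypothetical wrapper around the two fmadm subcommands mentioned above.
# FMADM defaults to the real tool; override it (e.g. FMADM=echo) to dry-run.
FMADM=${FMADM:-fmadm}

repair_fault() {
    if [ -z "$1" ]; then
        echo "usage: repair_fault FAULT-ID" >&2
        return 2
    fi
    # Show what the fault manager currently considers faulty...
    "$FMADM" faulty || return 1
    # ...then mark the given fault as repaired.
    "$FMADM" repair "$1"
}
```

Run it as repair_fault followed by the ID from the fmadm faulty listing, then reboot as described above. Only do this once you are satisfied the hardware is actually fine, since it tells the fault manager to start trusting the device again.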