Main Page

From Optiportals


Welcome to

This site contains recipes and FAQs for building optiportals.


What is an OptIPortal

An optiportal is essentially a display wall. The term optiportal was coined during the Optiputer Project, whose goal was to create interconnected computing resources over high-bandwidth networks. Our goal is to encourage collaboration through the use of optiportals. All of the operating systems and middleware we use are open source or, at the very least, freely available to public institutions.


Here are some middleware options that are used on many optiportals; all are either free for institutional use or completely open source.



  • General Information: SAGE is one of the more mature display middleware options and has been used on many Optiportals. This site includes detailed installation instructions for SAGE. SAGE is well designed for collaboration: users can easily share their content and their laptop sessions to the local display wall and to other remote walls running SAGE.
  • SAGE Site: The SAGE site includes downloads of the latest distributions, detailed documentation, and an active forum for troubleshooting. SAGE is being actively developed at EVL, UIC.
  • Build a SAGE Wall: see the instructions on this site under "How to build an Optiportal".


  • General Information: CGLX-Media-Commons is active on many Optiportals as well. Its strength traditionally has been in rendering very large images, while also supporting display of standard images and videos.
  • CGLX-Media-Commons Site: The CGLX-Media-Commons site includes general information and legacy downloads. The latest rebuild will be released soon; for the latest build, please send a note. CGLX-Media-Commons is being actively developed at Calit2, UCSD.
  • Build a CGLX Wall: see the instructions on this site under "How to build an Optiportal".


  • General Information: CalVR is a virtual reality framework that has been used in both 2D and 3D environments. Its strength has been in 3D immersive environments - see the Example section.
  • CalVR Site: The CalVR site contains detailed information and documentation, including information on downloading the latest source from github. CalVR is being actively developed at Calit2, UCSD.
  • Build a CalVR Wall: see the instructions on this site under "How to build an Optiportal".

Commercial Solutions


Note: the following is from the site: "The educational package is free and available to universities and non-commercial research clients."

  • HIPerWorks Site: The HIPerWorks site has links for software downloads and documentation.


How to build an Optiportal

Depending on the size of the display wall, an optiportal can be built relatively quickly. Below are some examples, some for a standalone, single-node optiportal and others for a more traditional many-node optiportal.

Single-Node Optiportable (Ubuntu-12.04.4-LTS/SAGE)

Single-Node Optiportable (OpenSUSE/SAGE)

Clustered Optiportal (OpenSUSE/SAGE)

Clustered Optiportal (RHEL6/SAGE, CalVR, MediaCommon(CGLX))

Hardware Capture Boxes

Interacting with SAGE, Mediacommons and CalVR


The easiest way to interact with SAGE sessions is to use sagePointer. This application is available for Windows and Mac environments. It provides a pointer for the SAGE session, a media drop-box (for copying media content to the SAGE session), and a widget for passing your VNC IP and authentication to the VNC client on the SAGE session.

sagePointer details

  • Download sagePointer
    • if running Windows, be sure to follow the instructions on the sagePointer-site for setting up TightVNC.
    • Mac users can use the Screen Sharing toggle in System Preferences under Sharing. Note: you can use your Mac username:login when initiating Desktop Sharing via sagePointer.


Currently, the most robust method available is to use a gyro mouse attached to the head node.


Most of our CalVR setups have head and wand tracking available. If tracking is not available, a gyro mouse attached to the head-node offers reasonable interaction.



I started sage and nothing happens

  • verify your environment vars, that your user's shell is /bin/tcsh, and that you are using the recommended tcshrc and xinitrc files.
  • make sure you have set up your authorized_keys and ssh'd to all possible IP addresses associated with your setup: any of your interface IPs (for example, your public IP address), the IPs of your nodes, and any hostnames associated with your setup. Essentially, you don't want any "yes" queries to be waiting for responses.
  • make certain your hostname and IP are in /etc/hosts
  • if the above is OK, verify your X11 setup via nvidia-settings (make sure the display labels match what you use in your stdtile.conf)
  • if you are running a cluster, make sure your home dir is shared to the display nodes.
    • For Ubuntu and RHEL-Opensource variants, just modify the nfs sharing as normal, make sure the server and client NFS dependencies have been met, and that /etc/exports on the server is correct and corresponding entries are in /etc/fstab on the cluster nodes.
    • For OpenSUSE use yast on the head-node to set up the nfs server aspects (/etc/exports permissions, etc.) and use yast on the display nodes to set up the nfs client.
    • Make sure the shell for the demo user on the display node is set to /bin/tcsh and that the demo user can write to the mounted home dir
  • if you are running a cluster, be sure that the cluster interface on all nodes is trusted (that is, as wide open as possible)
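
One way to find hosts that would still stop at a "yes" prompt is to try each one with ssh in batch mode, which fails instead of asking. The sketch below is a minimal example and not part of the SAGE distribution: the HOSTS list and the DRY_RUN switch are hypothetical names, and by default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical host list: replace with your head node, display nodes,
# and every IP address or hostname they answer on.
HOSTS="${HOSTS:-localhost 127.0.0.1}"
# DRY_RUN=1 (default) only prints the ssh command; set DRY_RUN=0 to run it.
DRY_RUN="${DRY_RUN:-1}"

check_host() {
    # BatchMode=yes makes ssh fail rather than prompt, so a pending
    # host-key "yes" query or a missing key shows up as an error.
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh -o BatchMode=yes -o ConnectTimeout=5 $1 true"
    elif ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true; then
        echo "$1: ok"
    else
        echo "$1: needs attention (host key or authorized_keys)"
    fi
}

for h in $HOSTS; do
    check_host "$h"
done
```

Run it as the same user that starts sage, from the head-node, with DRY_RUN=0 once the host list looks right.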

I started sage and my screens are black, but no icons appear

  • start sage via the sage gui, click on Advanced Startup, and check the logs for missing libs, etc.
  • verify that your firewall is not blocking essential ports - consider disabling iptables for a brief testing period to make sure you are not blocking essential traffic
  • make certain that you have generated your rsa or dsa key and have cp'd (or cat'ed) the respective key to .ssh/authorized_keys, then populate known_hosts by ssh'ing to each address (ssh hostIP, etc.)
  • keep things simple: start with a new .sageConfig dir (mv .sageConfig .sageConfig.bak) and then start the Sage gui, which will generate a fresh setup. Try to get sage to come up on the primary display first.
  • verify your X configs per node - SAGE can be very sensitive to X11 inconsistencies
  • (legacy - not true for releases after 12/2011) the nvidia driver may need to be downgraded, although this is less necessary with the newer distributions of sage. If you just want to get going, the quick and dirty method is to download a prior nvidia-driver release, such as NV*275*. You can also chat with the sage team at EVL. We don't see this error as often currently.

VNC sessions cannot be shared to the wall

  • make certain your SAGE-side VNC aspects are up to date - visit
  • make certain that you are using the latest sagepointer - visit
  • check the password specifications with respect to your laptop VNC server. TightVNC, for example, won't tolerate certain characters in the password - verify the requirements.
  • turn off the firewall while debugging and then fine-tune.
  • check sage log on head-node.

Should I use Separate X or Twinview/Xinerama

  • we find the best performance using Separate X, but Twinview/Xinerama is workable - just check your stdtile*.conf carefully.

Mac: nodename, servname gaierror when opening sagepointer or ui

Be sure to add your hostname to /etc/hosts. An entry needs an address as well as the name, for example: echo "127.0.0.1 `hostname`" | sudo tee -a /etc/hosts

SAM Audio

  • upgrade jack to latest version (jackdmp)
  • install latest qt from source
  • modify mplayer configs for SAM:
    • .sageConfig/fileServer/fileServer.conf and $SAGE_DIRECTORY/sageConfig/fileServer/fileServer.conf (comment out and modify as follows):
      • app:video = mplayer -sws 4 -softvol -softvol-max 300 -loop 0 -quiet -vo sage -framedrop -title %f
      • #app:audio = mplayer -ao alsa -softvol -softvol-max 300 -loop 0 -quiet
    • ~/.mplayer.conf and ~/.mplayer/config (add the following)
      • ao=",alsa,pulse,oss"

Collaboration with SAGE

Below is an example of an Optiportal to Optiportal collaboration setup. In this example we are using SAGE, you can use CalVR and Mediacommons/CGLX in a similar manner, but SAGE works very well with respect to collaboration.

Video Conferencing: for low-cost, HD-quality video conferencing we have been using LifeSize. They have a series of solutions; the lowest cost, Passport, is a very good starting point and has a current market price of ~$2500.00. This includes an echo-canceling mic, HD camera, and codec. The Passport camera is a fixed version; a pan-tilt-zoom camera can be purchased separately at additional cost. The LifeSize can do HD VTC at 5Mbps and has been very robust. You can either connect it to a large-screen LCD or connect it to a capture box (a Linux box with a Blackmagic Decklink card - Capture Build Instructions).

SAGE Walls: it is best to update to the latest distribution - instructions are available at Optiportal Build Instructions. Once your SAGE Wall is completed, simply share the public IP of your head-node with your remote partner. Enter your partner's IP address in the dim.pyc invocation - see the example below, which is a sage startup script on the head-node. The example will create a button at the top of your sage desktop with the label "RemoteU" - it directs pushes to your partner's IP address.

cd $HOME
xhost +local:
xset dpms force on
xset -dpms
jackd -d alsa -p 256 &
cd $SAGE_DIRECTORY/bin
sleep 1
./fsManager &
sleep 3
python appLauncher/ -v &
sleep 2
python ../dim/dim.pyc --shared_host RemoteU: &
sleep 2
cd fileServer && python &
cd $SAGE_DIRECTORY/ui && python -a nameofsite &


Multicast Setup

Here are the usual concerns with setting up CGLX nodes:

  • Make certain the switch that carries the cglx traffic is not blocking multicast traffic.
  • Be sure to set a multicast route on the interface on each node that will be carrying the cglx traffic, for example, suppose you are using eth2, then on each node (including the head-node) setup a static route for multicast, for Centos (RHEL, Fedora), you would add a file labeled /etc/sysconfig/network-scripts/route-eth2 containing the following:
    • NETMASK0=
    • ADDRESS0=
  • When setting up your tile config using csconfig, add the IP address in Preferences for the head-node cglx facing IP.
  • if the above fails, your switch may not be allowing Multicast Packets across ports
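
The ADDRESS0 and NETMASK0 values for the route file did not survive on this page. As a sketch only, assuming you want to route the entire IPv4 multicast block (224.0.0.0/4, netmask 240.0.0.0) over eth2, the file could be generated as follows. The output path is a hypothetical staging location and the range is an assumption, so check it against your own CGLX configuration before copying the file into /etc/sysconfig/network-scripts/.

```shell
#!/bin/sh
# Assumption: eth2 carries the cglx traffic, as in the example above.
IFACE="${IFACE:-eth2}"
# Hypothetical staging path; the live file is
# /etc/sysconfig/network-scripts/route-$IFACE and needs root to write.
OUT="${OUT:-${TMPDIR:-/tmp}/route-$IFACE}"

write_multicast_route() {
    # $1 = output file. 224.0.0.0 with netmask 240.0.0.0 covers the whole
    # IPv4 multicast block (224.0.0.0/4); this range is an assumption.
    cat > "$1" <<'EOF'
ADDRESS0=224.0.0.0
NETMASK0=240.0.0.0
EOF
}

write_multicast_route "$OUT"
cat "$OUT"
# Equivalent one-off route for testing before a network restart (as root):
echo "ip route add 224.0.0.0/4 dev $IFACE"
```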

EL6 (Centos 6.x, SL 6.x)

kernel panics

At installation, with known good hardware, this is almost always due to nouveau. Use failsafe options when installing from media.

  • add rdblacklist=nouveau to the end of your kernel line in /boot/grub/menu.lst

blacklist nouveau

  • add rdblacklist=nouveau to the end of your kernel line in /boot/grub/menu.lst
  • create: /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
    • add:
      • blacklist nouveau
      • options nouveau modeset=0

Note: the nvidia driver installation will often create the nvidia-installer-disable-nouveau.conf for you
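
After a reboot it is worth confirming that the blacklist took hold. The following is a minimal check, assuming the file name used above; the function names are made up for this example.

```shell
#!/bin/sh
CONF="${CONF:-/etc/modprobe.d/nvidia-installer-disable-nouveau.conf}"

# Succeeds if the given file contains a "blacklist nouveau" line.
nouveau_blacklisted() {
    [ -f "$1" ] && grep -Eq '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$1"
}

# Succeeds if the nouveau module is currently loaded.
nouveau_loaded() {
    [ -r /proc/modules ] && grep -q '^nouveau ' /proc/modules
}

if nouveau_blacklisted "$CONF"; then
    echo "blacklist entry found in $CONF"
else
    echo "no blacklist entry in $CONF"
fi

if nouveau_loaded; then
    echo "nouveau is still loaded: check the rdblacklist=nouveau kernel option"
else
    echo "nouveau is not loaded"
fi
```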

Disable selinux:

vi /etc/selinux/config
set line: SELINUX=disabled

Disable NetworkManager:

Network:
chkconfig NetworkManager off
Make the appropriate changes to the /etc/sys*/net*s/ifc*eth* files, specifically NM_CONTROLLED=no
Note: if working remotely, you will have to be careful to time the mods accordingly
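
The NM_CONTROLLED change can be scripted. The helper below is a sketch (the function name set_nm_controlled_no is made up for this example); it edits whichever ifcfg file you pass it, and the demonstration runs on a scratch file rather than the live config.

```shell
#!/bin/sh
# Set NM_CONTROLLED=no in one ifcfg file, replacing an existing
# NM_CONTROLLED line or appending one if none is present.
set_nm_controlled_no() {
    f="$1"
    if grep -q '^NM_CONTROLLED=' "$f"; then
        # Rewrite through a temp file (portable, no sed -i), then copy the
        # content back so the original file keeps its permissions.
        sed 's/^NM_CONTROLLED=.*/NM_CONTROLLED=no/' "$f" > "$f.tmp" \
            && cat "$f.tmp" > "$f"
        rm -f "$f.tmp"
    else
        echo 'NM_CONTROLLED=no' >> "$f"
    fi
}

# Demonstration on a scratch file.
demo=$(mktemp)
printf 'DEVICE=eth0\nNM_CONTROLLED=yes\nONBOOT=yes\n' > "$demo"
set_nm_controlled_no "$demo"
cat "$demo"
rm -f "$demo"
```

On a real node you would call the function as root for each ifcfg-eth* file under /etc/sysconfig/network-scripts/.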

Changing interface names

Post installation, if you want to change interface labels, for example, eth1 to eth2:
1. edit /etc/udev/rules.d/70-persistent-net.rules: change the mac address entries to match your new interface labels
2. edit /etc/sysconfig/network-scripts/ifcfg*eth1(2) and change labels accordingly
3. reboot

Remove autoupdate:

yum remove yum-autoupdate

Repos Additions: Add EPEL and RPMForge Repos:

EPEL:
wget <URL of the epel-release rpm>
yum -y --nogpgcheck localinstall epel*

RPMForge:
wget <URL of the rpmforge-release rpm>
yum -y --nogpgcheck localinstall rpmfo*
rpm --import <URL of the GPG key>

Disable Gnome Notifications: gconftool-2 -s /apps/panel/global/tooltips_enabled --type bool false

Disable Multi-user Login Screen:

gconftool-2 --direct --config-source xml:readwrite:/etc/gconf/gconf.xml.mandatory --type Boolean --set /apps/gdm/simple-greeter/disable_user_list True


edit /etc/gdm/custom.conf add the following to daemon section:

TimedLoginEnable=True
TimedLogin=bstoops
TimedLoginDelay=5


NOTE: (if you are seeing nobody user/group when local uid/gid exists check the following):

  • more robustly: modify /etc/idmapd.conf and force domain to local
  • alternate (and in some cases necessary) method
    • be sure to set “domain local” in /etc/resolv.conf, then run “chattr +i /etc/resolv.conf” to disallow changes
    • example:
      • domain local
      • search local
      • nameserver #.#.#.#
      • nameserver #.#.#.#

setup NFS server

yum -y install nfs-utils rpcbind
chkconfig nfs on
chkconfig rpcbind on
chkconfig nfslock on
service rpcbind start
service nfs start
service nfslock start
edit /etc/exports as needed

nfs mount catchup on client

add the following to /etc/init.d/netfs on the client to wait for mounts to complete

[ ! -f /var/lock/subsys/rpcbind ] && service rpcbind start
action $"Sleeping for 30 secs: " sleep 30
action $"Mounting NFS filesystems: " mount -a -t nfs,nfs4
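
A fixed 30 second sleep works but always costs the full delay. As an alternative sketch (this is not what the netfs edit above does), a small polling helper can retry a command until it succeeds or a timeout expires. wait_for is a hypothetical helper name.

```shell
#!/bin/sh
# wait_for TIMEOUT_SECS COMMAND [ARGS...]
# Retries COMMAND once per second until it succeeds or the timeout expires.
# Returns 0 on success, 1 on timeout.
wait_for() {
    timeout="$1"; shift
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example: wait up to 30 seconds for the NFS mounts to succeed (run as root):
#   wait_for 30 mount -a -t nfs,nfs4
wait_for 2 true && echo "command succeeded"
```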

misc ssh

disable dns check

modify /etc/ssh/sshd_config: UseDNS no

generate authorized_keys

generate keys:
ssh-keygen -t dsa
cd ~/.ssh
cp <public key> authorized_keys (for a dsa key the public key is id_dsa.pub by default)

10Gb Tuning

1. Enable Jumbo Frames on all interfaces (if supported) to 9000 bytes.
ifconfig ethX mtu 9000
2. Set the Transmit Queue Length (txqueuelen) on the interface to 10,000:
ifconfig ethX txqueuelen 10000
3. Enable Large TCP windows:
/etc/sysctl.conf settings for faster TCP:
# Set maximum TCP window sizes to 140 megabytes
net.core.rmem_max = 139810133
net.core.wmem_max = 139810133
# Set minimum, default, and maximum TCP buffer limits
net.ipv4.tcp_rmem = 4096 524288 139810133
net.ipv4.tcp_wmem = 4096 524288 139810133
# Set maximum network input buffer queue length
net.core.netdev_max_backlog = 30000
# Disable caching of TCP congestion state (2.6 only) *Fixes a bug in some Linux stacks.
net.ipv4.tcp_no_metrics_save = 1
# Ignore ARP requests for local IP received on wrong interface
net.ipv4.conf.all.arp_ignore = 1
# Use the BIC TCP congestion control algorithm instead of TCP Reno, 2.6.8 to 2.6.18
net.ipv4.tcp_congestion_control = bic
# Use the CUBIC TCP congestion control algorithm instead of TCP Reno, 2.6.18+
# net.ipv4.tcp_congestion_control = cubic
Changes to /etc/sysctl.conf take effect at the next reboot; to apply them immediately, run sysctl -p as root.
4. Set Loadable Kernel Module Interface specific settings:
The following is applicable to Linux network interface cards, specifically for 10 gigabit/s cards. For optimal performance, most of these parameters will need to be set, but various vendors may implement similar options with different syntax or variables.
See driver instructions for vendor-specific parameters. These values may need to be adjusted for optimal performance, but the values listed below are known to increase the performance of an Intel-based 10Gig PCI-X card. Other PCI bus parameters may need to be adjusted.
TCP Offload:
Linux 2.6.11 and under has a serious problem with certain Gigabit and 10 Gig ethernet drivers and NICs that support "tcp segmentation offload", such as the Intel e1000 and ixgb drivers, the Broadcom tg3, and the s2io 10 GigE drivers. This problem was fixed in version 2.6.12. A workaround for this problem is to use ethtool to disable segmentation offload:
ethtool -K eth0 tso off
Other EthTool settings:
RX Offload IP CRC Checking=on
TX Offload IP CRC Checking=on
RX Offload TCP CRC Checking=OFF
TX Offload TCP CRC Checking=OFF
Flow Control=OFF
Module options (set at module load):
Flow Control=OFF
Allocated Receive Descriptors=2048
Allocated Transmit Descriptors=2048
Receive Interrupt Delay=0
Transmit Interrupt Delay=0
PCI Parameters (set with setpci command):
MMRBC to 4k reads
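
To confirm that the sysctl settings from step 3 are actually in effect, the running values can be read back from /proc/sys. This is a small sketch that needs no root access; sysctl_path and running_value are names invented for this example, and only a few of the keys are checked.

```shell
#!/bin/sh
# Map a sysctl key such as net.core.rmem_max to its /proc/sys path.
sysctl_path() {
    echo "/proc/sys/$(echo "$1" | tr '.' '/')"
}

# Print the running value of a key, with whitespace squeezed to single spaces.
# Fails if the key is not readable on this system.
running_value() {
    p=$(sysctl_path "$1")
    [ -r "$p" ] && tr -s '[:space:]' ' ' < "$p" | sed 's/ $//'
}

# Show a few of the keys from the settings above as the kernel sees them.
for key in net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem; do
    printf '%s = %s\n' "$key" "$(running_value "$key" || echo 'unavailable')"
done
```

Compare the printed values with the ones in /etc/sysctl.conf; a mismatch usually means the file has not been reloaded yet.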

40Gb Tuning

We have used Mellanox and Chelsio 40Gb adapters. With either adapter, set your CPU and memory to a Performance setting; that is, do not limit CPU or memory energy consumption. Not green, unfortunately, but the 40Gb NICs seem to be more consistent with Performance settings. We have seen more reliable performance with the Chelsio NICs using a 3.x kernel.

Use the latest driver and firmware for your NIC - this is critical.

/boot/grub/menu.lst - kernel
add:     intel_idle.max_cstate=0 processor.max_cstate=1

chkconfig irqbalance off
chkconfig cpuspeed off
Add the following to /etc/sysctl.conf
# tuning added for 40G Mellanox or Chelsio
# increase Linux autotuning TCP buffer limits
# min, default, and max number of bytes to use
# increase this for 40G NICS
# don't cache ssthresh from previous connection
# Explicitly set htcp as the congestion control: cubic buggy in 2.6.x kernels
# set for Jumbo Frames
# 40G NIC up on paths up to 50ms RTT:
# increase TCP max buffer size setable using setsockopt()
# increase Linux autotuning TCP buffer limit
net.ipv4.tcp_rmem = 4096 65536 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
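
Maximum buffer sizes like these are usually judged against the bandwidth-delay product (BDP) of the path: link rate times round-trip time. The helper below is a small sketch for working out that number; bdp_bytes is a name invented for this example, and the 50 ms figure is taken from the comment above.

```shell
#!/bin/sh
# bdp_bytes RATE_BITS_PER_SEC RTT_MS
# Bandwidth-delay product in bytes: rate (bits/s) * RTT (s) / 8.
bdp_bytes() {
    echo $(( $1 * $2 / 1000 / 8 ))
}

# 40 Gb/s over a 50 ms round trip -> 250000000 bytes
bdp_bytes 40000000000 50
# 10 Gb/s over a 50 ms round trip -> 62500000 bytes
bdp_bytes 10000000000 50
```

The result is only a starting point for tuning; the 134217728 byte maximum shown above is the value used on this page.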

Network Bandwidth Tools

iperf and nuttcp can be used effectively, we generally prefer nuttcp:
download nuttcp: wget
Use the makefile from the 6.1.2 version and make the 7.2.1 version

server: nuttcp -S
client: ./nuttcp -i5 -T20 -v -t <host>
At 40Gb you will have to experiment with the affinity settings, for example:
client: nuttcp -vv -i1 -xc 2/2 <host>
Try different cores to check performance.

With iperf, use multithreaded invocations:
server: iperf -s
client: iperf -c <host> -P 10

Upgrade Centos 6.x kernel to 3.x

3.x kernel for centos 6.x

1. Using the elrepo distros

Note: kernel-lt is the long-term 3.x kernel; kernel-ml is the latest mainline kernel

setup yum for elrepo:
rpm -Uvh
vi /etc/yum.repos.d/elrepo.repo
enable extras and kernel
yum -y --enablerepo=elrepo-kernel install kernel-ml (just in case you want to test at later date)
yum -y --enablerepo=elrepo-kernel install kernel-lt
yum -y install kernel-lt-3.10.33-1.el6.elrepo.x86_64
yum install -y kernel-lt-devel
yum install -y kernel-lt-doc
# remove prior headers
yum remove kernel-headers-2.6.32-431.5.1.el6.x86_64
yum install -y kernel-lt-headers
# remove prior firmware
yum remove kernel-firmware-2.6.32-431.5.1.el6.noarch
yum install -y kernel-lt-firmware
# Reinstall dependencies that may have been removed above:
yum -y groupinstall "Development Tools"
yum -y groupinstall "Development Libraries"
yum -y install yum-utils
yum -y install kernel-devel
yum -y install curl-devel.x86_64
yum -y install mysql.x86_64
yum -y install mysql-devel.x86_64
yum -y install libXp.x86_64
yum -y install emacs
yum -y install screen
yum -y install java-1.6.0-openjdk*.x86_64
yum -y install libgfortran
yum -y install lapack
yum -y install lapack-devel
yum -y install git
yum -y install gcc*
# for zfs,spl support
yum -y install zlib-devel libuuid-devel libblkid-devel libselinux-devel parted lsscsi
yum install redhat-lsb-graphics
yum update
shutdown -r now

Reinstall Chelsio, Mellanox, Myri drivers
Reinstall nVidia driver
Reinstall spl/zfs - be sure to remove spl and zfs packages first
rpm -qa | grep spl
remove spl packages
rpm -qa | grep zfs
remove zfs packages
reinstall spl and zfs from source distro
chkconfig zfs on
shutdown -r now
If zpool status shows pool unavailable
zpool export data
zpool import -d /dev/disk/by-id (or by-path) to check potential import
zpool import -f -d /dev/disk/by-id data
verify via zpool status

2. Using the latest from
NOTE: as of 3/2014 zfs won't compile with 3.13.x kernels
yum install gcc ncurses ncurses-devel
yum update
tar -jxvf linux-3.#.#.tar.bz2 -C /usr/src/
cd /usr/src/linux-3.#.#/
make menuconfig (try defaults)
make
make modules_install install

remove zfs and spl rpms and remake and reinstall (NOTE: as of 3/2014 zfs won't compile with 3.13.x kernels)
remove Myri, Mellanox, Chelsio drivers and reinstall
update nVidia driver and reinstall


hold your nose and use the gui

opensuse gets persnickety easily. you might want to use yast for as many system tasks as possible

  • kernel tuning
  • grub edits
  • network config
  • nfs setup
  • ntp setup
  • adding users and groups

kernel panics

This most typically arises due to the nouveau driver. See nouveau blacklist.

blacklist nouveau driver

  • at installation use the Failsafe Mode and add "nomodeset" to your kernel options
  • post installation add nomodeset to the kernel line in /boot/grub/menu.lst
  • set NO_KMS_IN_INITRD="yes" in /etc/sysconfig/kernel ... use your favourite text editor or YaST -> System -> Sysconfig Editor
  • Add blacklist nouveau on separate line to /etc/modprobe.d/50-blacklist.conf ... use your favourite text editor or YaST -> System -> Sysconfig Editor

install nVidia driver

  • follow the procedure for blacklisting nouveau first, then:
  • download the latest driver from
  • init 3
  • install nvidia driver
  • init 5

rpm installation error

  • this can arise from a failed refresh at installation time; use: "rpm --rebuilddb" until all errors clear
  • you should then be able to run zypper normally

Setup NFS

  • On the NFS server we run:
  • yast2 -i nfs-kernel-server
  • Then we create the system startup links for the NFS server and start it:
  • chkconfig --add nfsserver
  • /etc/init.d/nfsserver start
  • client:
  • On the client we can install NFS as follows:
  • yast2 -i nfs-client
  • start nfs at boot: chkconfig nfs on

Note: opensuse can be rather finicky. Although it hurts to use gui tools, use yast to add nfs exports on the server and to add mounts on the client.

ZFS on Solaris 10+ and Centos 6+

Solaris 10 Net Config


Modify /etc/ipf/ipf.conf
Restart ipfilter:
svcadm restart svc:/network/ipfilter:default


In /etc/nodename, you must specify your name of the server/host.


The interface names in solaris include ce, hme, bge, e1000g, myri10ge, etc. So, if you have an interface called e1000g0, there should be a file named /etc/hostname.e1000g0. In this file, you must specify network configuration information such as IP address, netmask, etc. Ex.
>cat /etc/hostname.e1000g0 netmask
>cat /etc/hostname.qfe0 netmask


standard hosts file - IP hostname

The /etc/inet/hosts file serves as the local name resolver. It resolves hostnames, log hosts, etc. You can specify any number of hosts associated with IP addresses in it. You must specify the hostname of your system in it. Ex.

>cat /etc/inet/hosts localhost loghost solarisbox1 solarisbox1 solarisbox1 solarisbox2


ip hostname


FQDN of host


default gateway

Solaris 11.1 Net Config

dladm show-linkprop
ipadm create-ip net0
ipadm show-if
ipadm create-addr -T static -a net0/v4
route -p add default -ifp net0
edit /etc/resolv.conf (add “nameserver”)
svcadm restart network


How to deploy zfs on Centos

This is a simple server - you can make the server more maintenance friendly by using a case with as many hot-swappable, easy-access bays as possible.

Simple ZFS box running Centos 6.5

Low-end disk suite - 1 - 1TB system disks and 3 - 1.5TB disks for raidz:

  • This is just a simple config, you will see improved performance by adding log and cache devices (especially ssd cache).
  • base desktop install of Centos 6.3 x86_64
  • yum -y install kernel-devel zlib-devel libuuid-devel libblkid-devel libselinux-devel parted lsscsi
  • Note: the latest releases keep up-to-date with kernel updates - you will most likely see compile errors with out-of-order releases
  • spl: get latest release from
  • tar xfvz spl*z && cd spl*
  • ./configure && make rpm
  • rpm -Uvh *.x86_64.rpm
  • zfs: get latest release from
  • tar xfvz zfs-*z && cd zfs*
  • ./configure && make rpm
  • rpm -Uvh *.x86_64.rpm
  • in this simple configuration I will create a raidz with the three 1.5TB disks
[root@zfstest]# zpool create -f data raidz /dev/sdb /dev/sdc /dev/sdd
[root@zfstest]# zpool status -v
pool: data
state: ONLINE
scan: none requested
data ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
errors: No known data errors
[root@zfstest]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
51606140 5056536 43928164 11% /
tmpfs 6093648 112 6093536 1% /dev/shm
/dev/sda1 495844 60175 410069 13% /boot
903255184 205796 857166612 1% /home
data 2859816832 0 2859816832 0% /data

On many systems it may not be appropriate to use /dev/sd# labeling, since these labels can change between boots, depending on the internal connection configuration. To avoid problems, use /dev/disk/by-id or /dev/disk/by-path.
Here is an example using /dev/disk/by-id:

[testbox]# zpool create -f data raidz /dev/disk/by-id/scsi-SATA_WDC_WD15EADS-00_WD-WCAVY0576419 /dev/disk/by-id/scsi-SATA_WDC_WD15EADS-00_WD-WCAVY0780757 /dev/disk/by-id/scsi-SATA_WDC_WD15EADS-00_WD-WMAVU0062155
[testbox]# zpool status
pool: data
state: ONLINE
scan: none requested

data ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
scsi-SATA_WDC_WD15EADS-00_WD-WCAVY0576419 ONLINE 0 0 0
scsi-SATA_WDC_WD15EADS-00_WD-WCAVY0780757 ONLINE 0 0 0
scsi-SATA_WDC_WD15EADS-00_WD-WMAVU0062155 ONLINE 0 0 0
Replacing Disks in ZFS Pools with spare

This relates primarily to Oracle/SUN Thumper-Thors:

  • zpool status (will detect which disks are faulting in the pool) - if zpool status hangs, be sure to run zpool status as a non-root user
    • after the faulted disk is determined from zpool status, run /opt/SUNWhd/hd/bin/hd as root - this will show the physical layout of the disks for replacement. This will be needed after the disk is unconfigured below. Note: the LEDs may not indicate properly (in fact, the LEDs are almost never indicative).
    • for example: If c0t5d0 is the faulted disk, then as root:
      • if there is a good hot-spare - replace the faulted disk (for example, c0t5d0) with the hot spare (c0t2d0): as root (here the pool is named "data"):
        zpool replace data c0t5d0 c0t2d0
      • detach disk from pool:
        zpool detach data c0t5d0
      • unconfigure disk from the system:
        cfgadm -c unconfigure c0::dsk/c0t5d0
      • replace disk (pull from slot, and add new disk)
      • configure the new disk for the system:
        cfgadm -c configure c0::dsk/c0t5d0
      • add new disk as hot spare:
        zpool add data spare c0t5d0
      • check zpool status to see if the pool is fully online (note: it will take a day or more to completely resilver, but the pool should show up as online):
        zpool status
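
The hot-spare procedure above can be collected into one ordered checklist. The sketch below only prints the commands and executes nothing; print_spare_replacement is a hypothetical helper, and the disk and pool names are the ones from the example.

```shell
#!/bin/sh
# print_spare_replacement POOL FAULTED_DISK SPARE_DISK CONTROLLER
# Prints, in order, the commands from the procedure above. Nothing is executed.
print_spare_replacement() {
    pool="$1"; bad="$2"; spare="$3"; ctrl="$4"
    echo "zpool replace $pool $bad $spare"
    echo "zpool detach $pool $bad"
    echo "cfgadm -c unconfigure ${ctrl}::dsk/$bad"
    echo "# physically swap the disk now"
    echo "cfgadm -c configure ${ctrl}::dsk/$bad"
    echo "zpool add $pool spare $bad"
    echo "zpool status"
}

# Values from the example above: pool "data", faulted c0t5d0, hot spare c0t2d0.
print_spare_replacement data c0t5d0 c0t2d0 c0
```

Printing first and pasting the commands one at a time keeps a person in the loop for the physical swap.
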
Replacing Disks in ZFS Pools without spare

This will work when disks are faulted or unavailable - Note: when disks are unavailable - you may have to offline, unconfigure, and reboot to free disk.

  • zpool offline data c1t3d0
  • cfgadm | grep c1t3d0

sata1/3::dsk/c1t3d0 disk connected configured ok

  • cfgadm -c unconfigure c1::dsk/c1t3d0

Unconfigure the device at: /devices/pci@0,0/pci1022,7458@2/pci11ab,11ab@1:3
This operation will suspend activity on the SATA device
Continue (yes/no)? yes

  • cfgadm | grep sata1/3

sata1/3 disk connected unconfigured ok
Physically replace the failed disk c1t3d0

  • cfgadm -c configure c1::dsk/c1t3d0
  • cfgadm | grep sata1/3

sata1/3::dsk/c1t3d0 disk connected configured ok

  • zpool online data c1t3d0
  • zpool replace data c1t3d0
  • zpool clear data
  • zpool status data
Pool Unavailable

If your pool becomes unavailable, you may be able to recover it by exporting and then importing the pool. I have seen pools become unavailable for a variety of reasons; two examples are after installing new hardware on the zfs host and after a failed disk replacement.

  • export pool - example zpool export data
    • if your pool became unavailable due to a failed disk replacement, then unconfigure failed disk(s) and import pool - zpool import data
      • if you don't unconfigure, you are likely to get errors and not be able to import pool
      • replace disks one at a time following procedure from last section
  • import pool. I typically use the directory (path search method)
    • zpool import -d /dev/disk/by-id
      • this will show available pools for import
    • zpool import -f -d /dev/disk/by-id data (the -f switch (force) is often necessary)
zfs won't compile with a 3.13.x kernel

Until the zfs compilation bugs are patched for the 3.13+ kernel, use a 3.11.x kernel or earlier.

Legacy FAQ (Rocks based information)
