This weekend a Veritas disaster was dumped on me. This is how I deleted and recreated the entries for the library robot and its drives. To be honest, I don’t even know what the original issue was, but for some reason Sun was called out to replace a card in our StorageTek L40 tape library. The card was replaced, and from that point forward all of the drives in the library went down and we began receiving SCSI errors in our system log.

The errors looked something like this:

Aug 4 10:15:09 gcand715 tldcd[303]: [ID 359089 daemon.error] TLD(0) mode_sense ioctl() failed: No such file or directory

Aug 4 10:15:11 gcand715 tldcd[303]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 1

Aug 4 10:15:11 gcand715 tldcd[303]: [ID 359089 daemon.error] TLD(0) mode_sense ioctl() failed: No such file or directory

Aug 4 10:15:11 gcand715 tldcd[303]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 1

Aug 4 10:15:11 gcand715 tldcd[303]: [ID 769352 daemon.error] TLD(0) Mode_sense error, SCSI INTERFACE ERROR

Aug 4 10:15:12 gcand715 tldcd[303]: [ID 958280 daemon.notice] TLD(0) opening robotic path /dev/sg/c4t1l0

Aug 4 10:15:12 gcand715 tldcd[303]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 1

Aug 4 10:15:13 gcand715 last message repeated 1 time


We called Sun back out to look at the issue.
The card the Sun contractor had replaced in the tape library was configured incorrectly. I guess it is a similar sort of mistake to putting a battery into a hearing aid the wrong way round: easily done, but also easily corrected. A jumper on the board selects the SCSI mode, and it had been set to LVD rather than HVD. We also noted that the previous engineer had neglected to configure the SCSI IDs for the drives; they were all set to unassigned.

So we assigned SCSI ID 0 to the tape library, and SCSI IDs 1-4 to the drives inside.

After assigning the IDs, we did a reconfigure reboot:
#touch /reconfigure
#reboot

When the box came up, the errors in /var/adm/messages changed a little.

Aug 5 03:10:41 gcand715 tldcd[324]: [ID 958280 daemon.notice] TLD(0) opening robotic path /dev/sg/c4t1l0
Aug 5 03:10:42 gcand715 tldcd[324]: [ID 985024 daemon.error] TLD(0) key = 0x5, asc = 0x24, ascq = 0x0, INVALID FIELD IN CDB
Aug 5 03:10:42 gcand715 tldcd[324]: [ID 321662 daemon.error] TLD(0) Mode_sense error, CHECK CONDITION
Aug 5 03:10:42 gcand715 tldd[289]: [ID 641686 daemon.notice] DecodeQuery() Actual status: Unable to sense robotic device
Aug 5 03:10:42 gcand715 tldd[289]: [ID 320639 daemon.error] TLD(0) unavailable: initialization failed: Unable to sense robotic device
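
The sense data in that second line is worth pulling apart. In the standard SCSI tables, sense key 0x5 is ILLEGAL REQUEST, and asc 0x24 with ascq 0x0 is the INVALID FIELD IN CDB text that tldcd prints. That suggests something was now answering on the bus and rejecting the command, rather than the command timing out as before. Here is a rough sketch of extracting those values from a log line; the decode_sense_key helper is my own hypothetical function, not part of NetBackup, and it only covers a handful of keys.

```shell
#!/bin/sh
# Sample line copied from the /var/adm/messages excerpt above.
line='Aug 5 03:10:42 gcand715 tldcd[324]: [ID 985024 daemon.error] TLD(0) key = 0x5, asc = 0x24, ascq = 0x0, INVALID FIELD IN CDB'

# Hypothetical helper: map a few common SCSI sense keys to their names.
decode_sense_key() {
    case "$1" in
        0x0) echo "NO SENSE" ;;
        0x2) echo "NOT READY" ;;
        0x3) echo "MEDIUM ERROR" ;;
        0x4) echo "HARDWARE ERROR" ;;
        0x5) echo "ILLEGAL REQUEST" ;;
        0x6) echo "UNIT ATTENTION" ;;
        *)   echo "UNKNOWN" ;;
    esac
}

# Each sed expression captures the hex value that follows the field name.
key=$(printf '%s\n' "$line" | sed -n 's/.*key = \(0x[0-9a-fA-F]*\),.*/\1/p')
asc=$(printf '%s\n' "$line" | sed -n 's/.* asc = \(0x[0-9a-fA-F]*\),.*/\1/p')
ascq=$(printf '%s\n' "$line" | sed -n 's/.*ascq = \(0x[0-9a-fA-F]*\),.*/\1/p')

echo "sense key $key ($(decode_sense_key "$key")), asc $asc, ascq $ascq"
```

On a live box the same sed expressions could be fed from grep 'key = ' /var/adm/messages instead of the hard-coded sample.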

Since the error had changed, we assumed that the new SCSI IDs had taken effect and that the remaining issue was with our drive mappings. The engineer left and I called up Symantec (Veritas).

Veritas ran me through some tests.
First, we tried to run the sgscan command to scan the SCSI bus.
It just hung. I left it running for a good 15 minutes with no output.

So Veritas had me rebuild the sg driver by following their technote 266501.
After rebuilding the sg driver they asked me to run sgscan again, but I got the same result.
Veritas then decided that this was still a hardware issue, and that I should call StorageTek back out to take another look.

Well, StorageTek returned today. We went and looked at the library, and the engineer brought some tools to test the SCSI bus. He hooked up his laptop to the library and was able to move the robot, move tapes, etc. Unfortunately, I was still receiving "unable to control robot" error messages.

When scanning the bus with iostat -En, I noted that the OS saw 2 drives: a Quantum DLT 8000 and a Quantum DLT 7000.

We dug a little further and found that the library held 3 Quantum DLT 8000 drives and 1 Quantum DLT 7000. I checked on the box using the tpconfig command to see what Veritas was expecting to find, and it was looking for 4 DLT 8000s. We came to the conclusion that at some point a drive had been replaced with a 7000 instead of an 8000.

We removed the 7000 from the SCSI bus and put the 3 remaining drives on the same bus. We verified that iostat -En showed 3 DLT 8000s, and tried to test the robot.

Well, the robot didn’t work. In a random attempt, we changed the SCSI ID for the L40 library from 0 to 5, did a reconfigure reboot, and then, just because, I tried an sgscan.

This time sgscan worked.
It dumped out the scsi devices.

#sgscan
/dev/sg/c4t1l0: Tape (/dev/rmt/0): "QUANTUM DLT8000"
/dev/sg/c4t2l0: Tape (/dev/rmt/6): "QUANTUM DLT8000"
/dev/sg/c4t3l0: Tape (/dev/rmt/7): "QUANTUM DLT8000"
/dev/sg/c4t5l0: Changer: "STK L40"

There are the 3 tape drives and the library itself.

Next I ran robtest, and it spat out an error about being unable to find the robot at /dev/sg/c4t4l0.
Well, my sgscan results told me that the robot was at /dev/sg/c4t5l0.
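
That comparison is easy to do by eye with four devices, but it can also be scripted. The following is only a sketch: it works on the sgscan output pasted above rather than calling sgscan, and the variable names (configured_robot, actual_robot, tape_count) are my own. The configured path is simply the one robtest complained about.

```shell
#!/bin/sh
# sgscan output copied from this post; on a live system you would capture it
# from sgscan itself instead of hard-coding it.
sgscan_output='/dev/sg/c4t1l0: Tape (/dev/rmt/0): "QUANTUM DLT8000"
/dev/sg/c4t2l0: Tape (/dev/rmt/6): "QUANTUM DLT8000"
/dev/sg/c4t3l0: Tape (/dev/rmt/7): "QUANTUM DLT8000"
/dev/sg/c4t5l0: Changer: "STK L40"'

# The robot path that robtest reported it could not find.
configured_robot='/dev/sg/c4t4l0'

# Take the first field of the "Changer:" line and strip its trailing colon.
actual_robot=$(printf '%s\n' "$sgscan_output" |
    awk '$2 == "Changer:" { print $1 }' | sed 's/:$//')

# Count the lines that describe tape drives.
tape_count=$(printf '%s\n' "$sgscan_output" | grep -c ' Tape ')

echo "tape drives seen: $tape_count"
if [ "$actual_robot" != "$configured_robot" ]; then
    echo "robot path mismatch: configured $configured_robot, sgscan reports $actual_robot"
fi
```

The awk here deliberately avoids sub() and friends so that it should also behave under the older /usr/bin/awk on Solaris.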

Just as a test, I tried adding a new robot to the NetBackup config using the bpadm command.
I added a new robot at /dev/sg/c4t5l0, saved the configuration, and ran robtest again.

Success.

So at the end of the day, I used bpadm to delete the robot and the drives, and then I recreated them in bpadm with the correct addresses.

I followed the instructions given by tpconfig:

The Media Manager device daemon is active on this machine.
If any device changes are made, the daemon must be
stopped and restarted for the changes to take effect.
To do this enter:
/usr/openv/volmgr/bin/stopltid
/usr/openv/volmgr/bin/ltid

Press any key to continue or CTRL-C to terminate tpconfig

Then I ran vmoprcmd to check the status of the tape drives:

PENDING REQUESTS

DRIVE STATUS

Drv Type Control User Label RecMID ExtMID Ready Wr.Enbl. ReqId
0 dlt TLD root No GU3229 Yes Yes 0
1 dlt TLD – No – –
2 dlt TLD – No – –

ADDITIONAL DRIVE STATUS

Drv DriveName Shared Assigned Comment
0 QUANTUMDLT80000 No gcand715
1 QUANTUMDLT80001 No –
2 QUANTUMDLT80002 No –

Voilà.

The tape drives are no longer under AVR control; they’re controlled by TLD.
I ran a test backup, and you can see that tape drive 0 grabbed tape number GU3229.
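
If you want to check the Control column without reading the table by eye, something along these lines should do it. Treat it as a sketch: it parses the DRIVE STATUS rows pasted above (with the dashes typed as plain hyphens), it assumes drive rows can be recognised by "dlt" in the second column, and the variable names are my own.

```shell
#!/bin/sh
# DRIVE STATUS rows copied from the vmoprcmd output above.
drive_status='Drv Type Control User Label RecMID ExtMID Ready Wr.Enbl. ReqId
0 dlt TLD root No GU3229 Yes Yes 0
1 dlt TLD - No - -
2 dlt TLD - No - -'

# Drive numbers whose Control column (third field) is anything other than TLD.
non_tld=$(printf '%s\n' "$drive_status" |
    awk '$2 == "dlt" && $3 != "TLD" { print $1 }')

# Number of drives that are under TLD control.
tld_count=$(printf '%s\n' "$drive_status" |
    awk '$2 == "dlt" && $3 == "TLD" { n++ } END { print n + 0 }')

echo "drives under TLD control: $tld_count"
if [ -n "$non_tld" ]; then
    echo "drives NOT under TLD control: $non_tld"
fi
```

A drive that had fallen back to AVR would show up in the second message.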

Hope this helps someone.