Skip Schaller
Steward Observatory, University of Arizona, Tucson, AZ 85721
As part of my duties, I manage the Steward Observatory Computer Support Group. Our small group provides programming services to several observatory projects, as well as system administration services for a large number of hosts on our network. We provide full support for approximately 100 Sun workstations, 70 X terminals, 40 network printers, 40 VxWorks real-time computers, and 20 networking boxes, such as routers, switches, and terminal servers. In addition, we provide limited support for some 400 other hosts, which are mostly PCs. Most of these 670 hosts are located in our main building on campus, which has over 200 offices, but others are located in a lab on the other side of campus, and at four different mountain-top observatory sites. This is all done with essentially one system manager.
Despite such a thin staff for such a large responsibility, I am told by those who have had the opportunity to compare with other places that the quality of our system management is second to none. How does this quality manifest itself? Reliability is one of the best indicators: our users enjoy 99.9 percent uptime on our systems. Interdependence is minimized; the system is designed so that a user doesn't depend on four or more hosts (such as a software server, CPU server, data file server, and font server) to get work done. Another good indicator is that the systems work as advertised. The mail always goes through. Software is not installed on the system in a non-functional or semi-functional form. Users are provided with a default environment that hides the gory computer system and network details, so that they can get right to their job or science application. This environment appears the same everywhere on the network. Finally, users get a prompt response to their questions and problems.
So, how do we accomplish all this? The first step is to hire the right people. In our university environment, we are often pressured to hire students. However, our experience has generally shown that even though one can hire several students for the same money as one highly experienced computer professional, the professional will produce more useful work than all the students combined. A person who is less than 25 years old just doesn't compare with a person who has more than 25 years of experience. Quality outweighs quantity.
Certainly, another way to maximize the useful work produced by a system manager is to make everything, hardware and software, as homogeneous as possible. Several more instances of identical workstations require much less extra work to maintain than several merely similar workstations that each need to be individually custom-tailored. Reducing the number of degrees of freedom, or the number of variables to deal with, reduces the overall workload. The secret is in choosing exactly which degrees of freedom to eliminate so that the load on the system manager is lightened without seriously inconveniencing the users. The choice of which variables to eliminate is made by determining which ones are the most burdensome to administer and the least useful to the user. That determination is based on experience.
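To make the payoff of homogeneity concrete, the check for it can be mechanized. Below is a minimal sketch (in Python, with hypothetical host names and package data; it is not our actual tooling) of an audit that flags any workstation whose installed software drifts from the configuration the majority of hosts share:

```python
from collections import Counter

# Hypothetical inventory: host -> set of installed (package, version) pairs.
# In practice this would be gathered from each host by the usual OS tools.
inventory = {
    "sun01": frozenset({("gcc", "2.8.1"), ("perl", "5.004"), ("emacs", "20.2")}),
    "sun02": frozenset({("gcc", "2.8.1"), ("perl", "5.004"), ("emacs", "20.2")}),
    "sun03": frozenset({("gcc", "2.7.2"), ("perl", "5.004"), ("emacs", "20.2")}),
}

# The baseline is whatever configuration most hosts share; every identical
# host costs almost nothing extra, and every deviation costs attention.
baseline, _count = Counter(inventory.values()).most_common(1)[0]

for host, packages in sorted(inventory.items()):
    drift = packages.symmetric_difference(baseline)
    if drift:
        print(f"{host}: deviates from baseline: {sorted(drift)}")
```

Every host that prints nothing is one more instance of something we already have: a degree of freedom eliminated.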
How do we achieve this homogeneity? A large part of my job is to just say no. When a user asks me to support a certain hardware or software item, I must determine its cost to our group in money or manpower. If the item is just another instance of something we already have or do, then the incremental cost is usually very low, and I can say yes. If the item is new or different, the cost is usually much greater, and if it doesn't fit into my money or manpower budget, I must say no. With a fixed amount of money or manpower, supporting a more widely varied list of hardware and software items reduces the quality of support for the items already on the list. Why have more alternatives to choose from, if that means that fewer or none of them will work very well? Why go to a restaurant with a very long menu, if none of the meals taste very good? In the long term, most users will understand that quality outweighs quantity.
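The policy itself is almost mechanical. As a toy illustration (the cost figure and item names below are hypothetical, chosen only to show the shape of the decision):

```python
def support_decision(item, supported_items, budget_slack):
    """Toy model of the 'just say no' policy: another instance of something
    already supported is cheap to accept; anything new must fit in whatever
    money/manpower slack remains, or the answer is no."""
    if item in supported_items:
        return "yes"            # incremental cost of one more instance is low
    new_item_cost = 1.0         # hypothetical units of manpower
    return "yes" if new_item_cost <= budget_slack else "no"

supported = {"Sun workstation", "X terminal", "network printer"}
print(support_decision("Sun workstation", supported, 0.0))   # yes
print(support_decision("one-off desktop", supported, 0.0))   # no
```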
Finally, keeping up with trends in technology is another way to reduce the cost of system management. One important current trend is the rapid decrease in the price of Ethernet switches. Steward Observatory has recently changed over from a routed net to a switched net. This has given us more effective bandwidth, reduced the cost of spares (switches are cheaper than routers), and lightened the administrative burden, both because switches are easier to manage than routers and because there is now only one subnet to deal with instead of many.
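The bandwidth gain is easy to see with a back-of-the-envelope calculation. The figures below are illustrative only (treating each old subnet as a shared 10 Mb/s segment), not Steward's measured numbers:

```python
# Illustrative figures only, not measured values.
hosts = 100
link_mbps = 10                      # e.g., 10BASE-T per port

# On a shared segment, every host contends for the bandwidth of one link.
shared_aggregate = link_mbps

# On a switched net, each port is its own collision domain, so in the best
# case up to hosts/2 conversations can run at full link speed at once.
switched_aggregate = (hosts // 2) * link_mbps

print(f"shared segment aggregate:     ~{shared_aggregate} Mb/s")
print(f"switched net aggregate: up to ~{switched_aggregate} Mb/s")
```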
The other important trend is the rapid decrease in the price/performance ratio of CPUs, memory, disks, and tape drives, together with the fact that networking, while improving, is not keeping up with the improvements in those other areas. This is leading us to a server-less model of distributing computer resources. A server that exports software to many clients is no longer necessary when each client can hold all the software it needs on 200 dollars' worth of disk space. A tape server is no longer needed when tape drives are so inexpensive that every client can afford one. Not only are the purchase and administrative costs of the server saved, but so is the cost of making the underlying network carry that load. The office desktop workstation thus adopts the model of a home computer: an autonomous machine with all the resources it needs attached locally.
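The arithmetic behind the server-less model is equally simple. The sketch below uses deliberately hypothetical prices (only the 200 dollars of disk per client comes from the text) to show the shape of the comparison:

```python
# All figures except the per-client disk price are hypothetical.
clients = 100
local_disk = 200        # dollars of disk per client, enough for its software
server_hw = 20_000      # hypothetical central software/file server
server_admin = 5_000    # hypothetical yearly manpower to run the server
net_capacity = 10_000   # hypothetical network capacity to carry its load

serverless = clients * local_disk
served = server_hw + server_admin + net_capacity

print(f"autonomous clients: ${serverless:,} once")
print(f"central server:     ${served:,} in year one, plus recurring admin")
```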