Today's class was "Beyond Shell Scripts: 21st Century Automation Tools and Techniques", taught by Aeleen Frisch.
I chose this class primarily for its introduction to cfengine, which we may use at work one of these days.
I've taken classes from Aeleen before at other conferences and enjoy her light presentation style, although she can be a bit disorganized at times. As usual, we fell behind during the day and had to speed through the final topics.
Topics:
- System Cloning - Coverage of various vendor utilities (jumpstart, kickstart) and other tools like ghost. Did a warmup exercise to build a simple kickstart config file.
- cfengine - This was the meat of the class. We covered the theory behind cfengine and discussed the use of the various cfengine terms, "configuration", "policy", etc. A few people found the built-in classes a little hard to understand. We built a simple cfagent.conf file to perform some basic functions.
The review of the basic actions (tidy, disable, processes, copy, files, shellcommands) was excellent and I came away with a decent understanding of what you can do with cfengine. The devil is, of course, in the details, and you'll need a fairly solid understanding of your environment and your goals to make cfengine useful and non-vexing. I see it replacing numerous 'standard' scripts that we use to remove setuid bits or disable services on our hosts.
- expect - Simple review of an expect conversation.
- amanda - Not many users of amanda in our class, covered lightly.
- RRDTool - In-depth coverage of how the details get shoved into rrdtool and used with graphs, but nothing really on the theory of how the round-robin database actually works or why it is such a neat idea.
- Nagios - Exhaustive/exhausting coverage of nagios which descended into a combined nagios-lovefest and Q&A session.
- stem - Covered at the very end of class. Aeleen uses stem to manage jobs that run on various machines and have interdependencies. It's some sort of message-passing tool, and useful in her batch job/computational environment. I can't fathom how it would be useful to me on a webserver.
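Since the class skipped the theory behind the round-robin database mentioned under RRDTool, here is a minimal sketch of the general idea as I understand it: storage is a fixed-size ring that overwrites its oldest slot, and a coarser archive keeps averages of several raw samples so a longer history fits in the same small space. The class names, slot counts, and sample values are made up for illustration; this is not rrdtool's actual file format or API.

```python
class RoundRobinArchive:
    """Fixed-size circular store: new samples overwrite the oldest ones."""

    def __init__(self, slots):
        self.slots = [None] * slots  # allocated once, never grows
        self.next_index = 0          # where the next sample is written
        self.count = 0               # total samples ever written

    def update(self, value):
        self.slots[self.next_index] = value
        self.next_index = (self.next_index + 1) % len(self.slots)
        self.count += 1

    def values(self):
        """Return the stored samples, oldest first."""
        if self.count < len(self.slots):
            return self.slots[:self.count]
        return self.slots[self.next_index:] + self.slots[:self.next_index]


class ConsolidatingArchive(RoundRobinArchive):
    """Keeps the average of every `steps` raw samples (coarser, longer history)."""

    def __init__(self, slots, steps):
        super().__init__(slots)
        self.steps = steps
        self.pending = []

    def feed(self, value):
        self.pending.append(value)
        if len(self.pending) == self.steps:
            self.update(sum(self.pending) / self.steps)
            self.pending = []


# Hypothetical samples: the fine archive keeps the last 4 readings,
# the coarse one keeps averages of every 2 readings.
recent = RoundRobinArchive(slots=4)
coarse = ConsolidatingArchive(slots=3, steps=2)
for sample in [10, 20, 30, 40, 50, 60]:
    recent.update(sample)
    coarse.feed(sample)
print(recent.values())  # [30, 40, 50, 60]
print(coarse.values())  # [15.0, 35.0, 55.0]
```

The neat part is that the storage never grows: old detail simply ages out of the fine archive while the averaged archive keeps the long-term shape.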
Tags: blog on technorati, delicious, flickr, northerncrown
BOSTON - The conference started today, at a bright 9am. My first full-day course was "Advanced Solaris System Administration Topics", taught by Peter Baer Galvin.
It was a useful and enlightening class, mostly for the historical perspective and for the discussion on Solaris 9/10 features.
Some nuggets:
- Crash Dumps - Mike Shapiro, a Sun kernel guy, suggests that crash dumps not be written where they can be lost (temp space). He also suggests the use of dumpadm, which will let you direct dumps to use raw disk.
- Autoclient - Autoclient is dead.
- Swap - Galvin explains that swap usage is allocated in a round-robin fashion, in 1MB increments. If you choose to have more than one swap partition, put them on separate disks or even controllers. But, if you're using swap, you might have bigger problems... Always mirror your swap space. Swapping is 10^4 times slower than RAM.
- swap -l vs. swap -s - swap -l reports on true swap; swap -s also includes physical memory and thus reports on virtual memory.
- Shutting Down - halt and reboot skip the "K" files, which can be bad for database servers, since things aren't shut down cleanly.
- Recovery - Ever damage your /etc/system so badly that you can't boot? Use boot -a to specify the path to a backup copy of the file or even point it to /dev/null to get your host back on its feet.
- Exiting Faster - pgrep and pkill make your scripts shut the system down faster by accessing the /proc filesystem directly rather than relying on the kludgy grepping of ps output.
- "short read" errors - Too many files in /.
- Privileges - Solaris 10 will be adding substantial new features, notably expanded privileges to allow finer-grained control than setuid can provide.
- Inodes - When creating a filesystem, tune the number of inodes created to fit your application. Mail and news use a lot, databases comparatively few.
- Intent Logging - File creation rates increase from 5/second to 200/second using intent logging. This is the default in Sol10, optional in Sol9, and should be feared and avoided in Sol8.
- UFS - Sun claims that UFS performance is on par with vxfs in Solaris 9.
- VXFS - Due to its extent-based approach, vxfs actually suffers from fragmentation and needs to be de-fragged. The utility to do this must be manually enabled.
- New commands - Lots of new "p" commands for working with processes: pcred, pflags, pmap, psig, pstop, ptree, pfiles.
- ZFS - "The next great thing". Sun is working on the zettabyte file system, which is "infinitely" expansible and uses an allocation model similar to malloc() for memory. Let's hope that it is better behaved... ZFS breaks any backwards compatibility with UFS.
- Tuning and benchmarking - Change only one thing at a time. Reboot. Measure. Repeat. More spindles are always better. Each of the many tools (vmstat, iostat, mpstat, sar, lockstat, netstat) provides a small piece of the puzzle when tuning or troubleshooting. Learn what each one is telling you.
- CPUs - Sun possesses "M-values" calculations for all CPUs, which can help you get a solid idea of relative performance. Oracle counts the Sparc4 architecture as two CPUs. Beware of licensing charges.
- http://blogs.sun.com/bmc
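To picture the round-robin swap allocation from the Swap nugget above, here is a small simulation, assuming only what Galvin described: 1MB chunks handed to each swap device in turn. The device names are hypothetical, and the sketch ignores device sizes and everything else the real kernel takes into account.

```python
def allocate_swap(request_mb, devices, start=0):
    """Spread `request_mb` of swap across `devices` in 1 MB round-robin chunks.

    Returns a dict mapping each device name to the MB placed on it.
    Simplification: device capacity is not modelled.
    """
    placed = {name: 0 for name in devices}
    for chunk in range(request_mb):
        name = devices[(start + chunk) % len(devices)]
        placed[name] += 1  # one 1 MB chunk lands on this device
    return placed


# Hypothetical swap slices on two different controllers.
two_disks = ["c0t0d0s1", "c1t0d0s1"]
print(allocate_swap(10, two_disks))  # {'c0t0d0s1': 5, 'c1t0d0s1': 5}
```

Because consecutive chunks land on alternating devices, swap I/O gets spread across every device in the list, which is why putting them on separate disks or controllers pays off.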
Excellent class, nearly 400 slides of information.
BOSTON - Solaris Internals class, by the guys who wrote the book.
Tons and tons of material on how paging, swapping, scanning, vm, filesystems all work. We covered dtrace in depth, even though it is a Solaris 10 feature only. It looks to be a useful tool, and provides good summaries of information so you aren't swamped with the output.
dtrace has its own scripting language, "D".
There appears to be quite a burr in Sun's blanket, and it is called Linux. Apparently, any system feature or process that takes longer on Linux (on comparable gear, of course) is treated and prioritized internally as a bug. Substantial work has gone into lightening and streamlining system calls to speed up operation. Sun's also done work on speeding up the boot and shutdown processes. Perhaps they can shed the name "Slowlaris"?
Sun apparently has new religion (yes, again... sigh) about Solaris on x86. We'll see if it lasts. Here is their marketing. I'd say that this emphasis is also somehow based on the impact of Linux in their markets. Given their past, seemingly random commitments to Linux and their longtime lack of a real x86 strategy, it appears they have hunkered down with their workhorse, Solaris, and decided to make it beat Linux on Intel. This may be a solid strategy, given the emergence of 64-bit processors for the Intel world.
I'll probably add some more details to this later, since it seems I've left my course book up in the room.
Attended a BoF held by the CACert guys. I didn't even stay for my free cert. How terribly unprepared they are for the real world.
It may seem like a good idea to issue certs to avoid giving money to VeriSign, but the CACert guys have totally missed the point on the trust model. They had many snide comments about browser vendors and flip answers about their own ability to get their certs included in the major web browsers. They won't last. You read it here first.
Killer dinner downtown, although they were sold out of lobster!
BOSTON - Today was a fairly light day, with no training session scheduled. I attended a couple of the general technical sessions and a set of WIP (works-in-progress) presentations. Another Happy Hour with free (and good) beer. Unfortunately, the name of the benevolent sponsor escapes me...
The session on improving web server performance turned out to be useless to me. The paper detailed research into modifying the behavior of accept() queuing for several small-name webservers: Knot, uServer, and Tux (the in-kernel webserver). There were pretty graphs of connections per second and many allusions made to how CNN supposedly melted on 9/11 under the load. If I remember correctly, and I can check my archives when I return home, I think that the robots.cnn.com server(s) held up throughout the morning and into the early afternoon of 9/11.
The thing these guys seem to miss is that you really can't run a production site, especially a news site, on crude applications like Knot or Tux. While they may be able to scale massively to serve tons of one-packet-sized hits (their test load), they are just not feasible to use with modern dynamic sites and publishing methods. Something like AOLServer, which has proven to be massively scalable, can strike the balance between serving a flexible and useful daily news site and standing up to a 9/11-style onslaught. Even an optimized Apache could carry a lot of the static content load, yet still be useful beyond a 1500-byte html page.
I tread lightly and don't mean to insinuate that these guys don't understand the real world, as I am sure much of the computer science work like this may eventually have real-world application. Tons of intriguing and useful open-source applications come out of the world's universities, but being in the business has left me a bit cynical about theories, graphs, and half-baked niche applications. I guess this may be the same way that some people feel about the practicality of the US space program.
The lightning 5-minute WIPs were entertaining, at least, although a couple of the topics were even less practical than Tux and Knot. One topic that caught my notice was the idea of eliminating unpredictability in web services.
How do we do that?
- software fuses (to control input)
- resource cops (to control runaway resource usage)
- output guards (to control the correctness of server output).
Since it was only a 5-minute talk, I was simply teased by the sub-topics and still need to dig more. The idea that we can build our services to fail gracefully and predictably intrigues me.
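To make the three sub-topics a little more concrete, here is a rough sketch of how I read the names alone: a fuse that refuses oversized input, a cop that enforces a time budget, and a guard that validates output before it leaves the server. None of this comes from the talk itself; the function names, limits, and the "503" fallback string are all my own assumptions.

```python
import time


class FuseBlown(Exception):
    """Raised when one of the three checks rejects or aborts a request."""


def software_fuse(request, max_bytes=1024):
    # Input control: refuse oversized input before doing any work.
    if len(request) > max_bytes:
        raise FuseBlown("input too large")
    return request


def resource_cop(work_items, handler, budget_seconds=0.5):
    # Resource control: stop once the time budget for this request is spent.
    deadline = time.monotonic() + budget_seconds
    results = []
    for item in work_items:
        if time.monotonic() > deadline:
            raise FuseBlown("time budget exceeded")
        results.append(handler(item))
    return results


def output_guard(response, validator):
    # Output control: never send a response that fails validation.
    if not validator(response):
        raise FuseBlown("invalid output")
    return response


def serve(request, handler, validator):
    """Run one request through all three checks and fail predictably."""
    try:
        checked = software_fuse(request)
        parts = resource_cop(checked.split(), handler)
        return output_guard(" ".join(parts), validator)
    except FuseBlown as reason:
        return "503 " + str(reason)


print(serve("hello world", str.upper, str.isupper))  # HELLO WORLD
print(serve("x" * 2000, str.upper, str.isupper))     # 503 input too large
```

Whatever goes wrong, the caller gets either a validated answer or one well-defined error, which is the graceful, predictable failure the talk was hinting at.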
The website. Typing this off-line now, will have to verify the link and contents later.
And the winner of the "Most Surprising Idea" award goes to the fellow who proposed, as a way to save heat and electricity in the data center, that server disk drives be directly replaced with laptop disk drives, which are smaller and produce less heat. I'll leave counting the reasons why this would be bad as an exercise for you, dear reader.
We do need to cut down on heat and power usage in large servers, I agree, but aren't laptop drives mostly 4200-5400rpm (roughly 1/3 of the speed of server-class disks), IDE, and meant for shorter and gentler duty cycles than a 36GB 15Krpm SCSI drive? I'd think that increasing densities, which would allow us to just use fewer drives in general, would yield greater benefit, with less impact.
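A quick back-of-the-envelope check on the spindle speeds, using only the standard rule of thumb that the average rotational latency is half of one revolution. It says nothing about seek time, interface, or duty cycle, which matter just as much.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for a sector to come around: half a revolution, in ms."""
    return 0.5 * 60_000 / rpm


for rpm in (4200, 5400, 15000):
    latency = avg_rotational_latency_ms(rpm)
    ratio = rpm / 15000  # spindle speed relative to a 15Krpm drive
    print(f"{rpm:>5} rpm: {latency:.2f} ms average latency, {ratio:.2f}x of 15K")
```

That works out to roughly 7.1 ms and 5.6 ms for the laptop drives against 2 ms for the 15Krpm drive, so every random read waits about three times as long on rotation alone.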
To be fair, 5 minutes is not much time and English is not the first language for some of these guys doing the WIPs. I'll be re-reading the proceedings on the train on Monday to dig into the technical session content.