r4 - 29 Mar 2006 - 16:37:43 - PriscillaChung

System Administrator Interview 3/16/06 - 4:30PM in San Francisco

This is an effort to understand how a target user group, "System Administrators", would use Cosmo. This was not a structured interview, so there were no specific questions. The information collected will help prioritize target-user features for future releases of Cosmo.

Information about the user

  • Senior level System Administrator
  • Uses Cosmo mostly from the terminal window, rarely touches the web console
  • Works closely with developers, i.e. when errors occur, there is a strong relationship for handing off issues from admin to developers

Highlights

Items of importance:

  • Server status, really useful page.
  • Having the server be able to repeat things by itself (instead of writing separate scripts).
  • Running roll over counter, log in the last minute.

Top priorities:

  • Backups: "can't back up w/ out stopping and restart" (about 20 mins to back up, big binary files, not incremental backup)
  • Understanding who is using the disk space without clicking through each user: a sorted list of the top users by name, then use the HTTP-based browser (web console) to go to a user's home directory
  • Need a better post production validation
  • Repeat installation. Not having to write scripts, such as for installing Cosmo on each server, would improve productivity for senior admins
  • "I can do a lot w/ web access logs"
  • Don't need/want a long html list of all the users--> really long wait time.
  • Things Sys Admins always need to check: disk space, memory, network and power.

Nice to have:

  • Greater sophistication to view memory use in a graph, i.e. "50MB of number is tight, I'd like to know if it's dropped before 50MB, number of free space". Perhaps a script to fetch this page and put it in a database.
  • Ongoing monitoring
  • Fetch a URL --> to get a list of the 6000 users, then extract their user names from the XML.
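
The last item could be scripted roughly as follows. This is only a sketch: the notes do not say what the user list looks like, so the XML layout (a `username` element per user), the function names, and any URL passed in are assumptions for illustration, not Cosmo's actual interface.

```python
import urllib.request
import xml.etree.ElementTree as ET


def extract_usernames(xml_text):
    """Return the text of every <username> element (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("username") if el.text]


def fetch_usernames(url, timeout=30):
    """Fetch the user list from a URL supplied by the admin and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_usernames(response.read())
```

Keeping the parsing separate from the fetch means the same function can be run against a saved copy of the page, which avoids the long wait on a live server while experimenting.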

Interview

Installation of Cosmo

  • Look at the README; after untar, try to follow instructions --> wiki for further detail (admins like what's in the bundle); went through it step by step
  • Repeating the installation is "where I had difficulties"; don't want to install on all the machines
  • Would it be valuable to have a public communication --> to talk about how one person would install it on a machine?

Writing Scripts

  • Wrote a script to install Cosmo on many servers --> senior admins would know about scripting, but not all admins, especially if they have less experience.
  • If I want to run a mail server, I ask my OS to get me a known stable copy.
  • Run many instances, support dev.
  • Production every time --> don't want to install on all the machines
  • Multiple project instances: old ones still available, here's the URL to get new data --> data wipe. 0.2 regularly, won't release a wipe w/ out migration --> script to go to the old instance. 0.2 --> 0.3
  • Send upgrade to the Cosmo instance from the old one to 0.3.1, use scripts to create a new instance.
  • Move/copy files, find me the 10 largest files. How do I find the bad seed if someone is abusing the server? Can't now browser simulator.
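
For the "10 largest files" request, a minimal sketch is shown here. It assumes the data of interest is visible as ordinary files under one directory on the server; the notes do not confirm that this holds for Cosmo's own storage, and the function name is made up for illustration.

```python
import heapq
import os


def largest_files(root, count=10):
    """Return up to `count` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return heapq.nlargest(count, sizes)
```

Grouping the results by owner or by top-level directory would be one way to look for the "bad seed", though which grouping makes sense depends on how the server lays out user data.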

Log files

  • Log files are very important --> on a production instance, log files matter the most. When the site isn't working, admins look into log files --> not the best way to check errors
  • On a production instance, log files matter the most --> start a new instance, unzip the tarball, 30 sec; sometimes the first instance takes 200 sec.
  • Better visual cue of what is the real vs. what is in production. PriscillaChung: For example, everything in the left pane is production vs. right is real.
  • Hard to get a unified view in a consistent, correct order; entries for one event have to be correlated between 2 log files.
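
One way to get a single ordered view of two logs is to merge them by timestamp. The sketch below assumes every line in both files starts with a timestamp like `2006-03-16 16:30:00` and that each file is already in time order; the real Cosmo and web server log formats are not described in these notes, so the format constants and function names are assumptions.

```python
import heapq
from datetime import datetime

# Assumed timestamp layout at the start of every log line.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LENGTH = 19


def timestamp(line):
    """Parse the leading timestamp of a log line."""
    return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)


def merge_logs(lines_a, label_a, lines_b, label_b):
    """Yield lines from both logs in time order, tagged with their source."""
    tagged_a = ((timestamp(l), label_a, l.rstrip("\n")) for l in lines_a)
    tagged_b = ((timestamp(l), label_b, l.rstrip("\n")) for l in lines_b)
    # heapq.merge streams the two already-sorted inputs without loading
    # either file fully into memory.
    for _ts, label, line in heapq.merge(tagged_a, tagged_b):
        yield f"[{label}] {line}"
```

Open file objects can be passed directly as `lines_a` and `lines_b`. Multi-line entries such as stack traces would need extra handling, since their continuation lines carry no timestamp.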

Errors

  • Very common to look for errors such as 'file not found', 'I have a crash'
  • A lot of visual analysis, looking for patterns.
  • This is different, something is different, something rpt a problem.
  • Send to the dev --> when errors occur, strong hand-off from admin to dev.

Building Cosmo

  • Provide switches, change password from the built in one, copy data from an old one, where to download one, pass all the right switches
  • Put description, basic switch--> to rebuild all the instances. To rebuild everyone every time would be a special case
  • The idea of revert, testing version, i.e. people are still working on the existing wiki, and at the same time people are coming up and recreating a new one

What Sys Admins check on and ask themselves:

  • Check to see if it's not running out of disk space, serving a lot of bandwidth, that it's not running slow.
  • Are things going fine? Is the application healthy?
  • Not crashing; people are hosting things besides Chandler, which is okay, though it may not be best practice.
  • Server status, really useful page. Having the server be able to repeat things by itself.
  • Running roll over counter, log in the last minute.

Web Console: Server Status

  • Uses garbage collection for performance issues
  • Java system app pauses for garbage collection, then pause and wait (not Sun)
  • Production instance to work right is to lower behavior of the application

Other notes:

  • Use Apache for the front end --> general web server for management control
  • Directory index --> can't look through files, like grep (Unix, find files w/ a string in them)
  • Lock on the calendar, and there is a lock, no tool to break the lock tickets and fix
  • Can I create a tool to read, update, and delete things --> collections, files, etc.?
  • Cosmo is layered-->Derby, cosmo http, log, standard out log...random what goes to where...
 
Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under a Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.