Sunday, April 4, 2021

Discussions on Staffing I.T. Departments

Note: In addition to this article, you may wish to read what I wrote on this topic back in 2014.

Several times each year, I find someone asking how large of an I.T. department they should have. Typically it is someone in the I.T. department trying to navigate this question so they can advise decision makers about their budget and/or organizational structure. This is a complicated question and sometimes the answers aren’t accepted because the intuition of the various people in these conversations can be very different.

What I’m going to do here is try to provide a neutral perspective that helps the involved parties have a constructive conversation. I’ll avoid error-prone simplifications such as a devices-per-tech ratio, my personal intuition, and comparisons to similar organizations. My process takes a while, but if you stick with it I think it will help. It is inspired by the work of other neutral parties, and I’ve included two of those sources in the notes at the end of this article. I encourage you to look at those worksheets to get an idea of how the process can look. Be aware that they may make assumptions that don’t hold in your particular environment. By contrast, I propose a process that you can adapt to your individual situation.

Step 1: Conversations and Goal Setting

Much like a good Disaster Recovery Plan or Business Continuity Plan, the institution needs to start with its objectives. Here are some questions to get the conversation started:

  • What services are to be provided?
  • Which of those services are custom made and which are commodities?
  • What is the scale of each service? Is it only used by some secretarial staff, all employees, or the public?
  • How long of an outage is the institution willing to allow for each service? How frequently?
  • Are all matters of routine upkeep expected to happen during off-hours? If so, when are off-hours?
  • How many end-point devices will be in active service at any one time?
  • How long is an acceptable waiting period between asking for technical assistance and receiving it?
  • During what hours is technical support expected?
  • Will you provide support to guests, such as people connecting to your wifi or projectors?
  • How many templates of devices will you have? For example, perhaps you have high school student chromebooks and elementary school classroom iPads and cafeteria point-of-sale computers and office secretary computers and so on.
  • How many “bespoke” computers will you have which require custom attention? For example, does the PC running athletics events live streaming or the HVAC system require unique software setups that are not automated and centrally controlled?
  • Do you force all end-users to store data on backed-up servers or is valuable data stored directly on their end-point devices? If the data is on the end-point devices, do you expect technical support to recover the data if the device is upgraded, replaced, or damaged?

Taking a closer look at those questions, you’ll find that the answers aren’t always obvious. Consider the question “How many end-point devices will be in active service at any one time?” This could include desktop computers in offices, chromebooks and tablets assigned to students, labs, and even the computers that run your test scanning software, athletics event livestreaming system, and HVAC controls. Do teachers have a mobile device assigned to them as well as a desktop device in their classroom? Do you count the “spare” devices that you give to users when their current device breaks?

Let’s look at the questions about when outages can occur and when technical support is expected. If you’re answering for a school, this may seem obvious. Technical support is only needed when there are students around and outages can happen when class isn’t in session, right? Do you mind an outage while teachers are writing substitute teacher plans and using the copier 30 minutes after school dismisses? Will you expect technical support for the parent trying to connect their phone to your wifi during a basketball game at 6pm? How does the institution feel about an upgrade starting at 5pm which also happens to cause the livestream of a basketball game to “drop” for 15 minutes? Will the superintendent want technical support at Board of Education meetings at 7:30pm? Should system upgrades happen on Sundays in order to avoid impacting classes? If so, will you have any athletics events which are livestreamed and offer wifi to visiting parents?

I hope I’ve shown that there are a lot of situations that we take for granted and might not consider at first. This is why the goals should be defined up front. Otherwise, everyone will be unhappy with the results: management, I.T. staff, and the people that they serve alike.

Step 2: Making Lists and Numbers

Start simple. Make a list of every service you can remember. Give yourself a week or two to think of them all. Ask others to add to it. Look at the calendar and ask what services you need to worry about in each month, quarter, or season.

Do the same for every hardware category. Start with the obvious: desktops, laptops, tablets, printers, network switches, wifi access points, etc. You'll eventually remember things you don’t think about often. Phones, PA systems, fire alarms, cafeteria point-of-sale computers, copiers, fax machines, etc. are all easy to overlook at first but remember later, after you've walked through that office for unrelated reasons. Look through your asset management system (a.k.a. inventory database) and see what you might have missed. Make it a constant thought for several weeks.

As you find items, fill in a quantity where relevant (e.g. copiers but not software), the duration that your institution would be willing to have it unavailable (a.k.a. "return to service" time or RTS), and how much time it takes to maintain each day, week, month, and/or year.

For example, I might say that we have 10 copiers, we can't go without them for a full day (i.e. RTS tolerance is 8 hours), at least one needs service every month, it takes about 0.5 - 3 hours each time, and so on. I might also say that we have uniFLOW for managing those copiers, that it takes about 4 hours per month to manage the application, and another hour or so per month to manage its OS (Windows updates, etc.). So now I can see that the service of "copiers" takes about 5.5 to 8 hours per month of personnel time. I'll take the high number, otherwise I'm not planning appropriately for the target RTS. That 8 hours/month is roughly 0.06 FTE for regular operations. To come to that conclusion, I assumed 4 weeks per month. I also assumed 35 working hours per employee per week after removing lunch breaks. That makes:

8 hours / (4 weeks x (35 hours/week)) = 0.057

Replacements, which happen every 5 years or so, are obviously going to be a larger drain on personnel time. So I make a note of that in this section of my data.

Do this for each service in the list that you’ve made. In this way, you quantify each service's required personnel. Right now, 0.06 FTE seems like a rounding error, but it will add up. If we decide to hire more staff, it will also help us decide on the division of job duties throughout the department’s positions.
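As a rough sketch of how those per-service figures could be totalled, the following bash function reads service names with their monthly hours and converts them to FTE, using the same assumptions as the copier example (4 weeks per month, 35 working hours per week). Only the copier row comes from the example above. The other rows, and the function name fte_report, are hypothetical placeholders.

```shell
#!/bin/bash
# Assumption carried over from the copier example:
# 4 weeks/month x 35 hours/week = 140 working hours per month.
HOURS_PER_MONTH=140

# fte_report
# Reads "service,hours_per_month" lines on stdin. Prints one line per
# service with its FTE, followed by a TOTAL line.
fte_report() {
    awk -F, -v month="$HOURS_PER_MONTH" '
        { fte = $2 / month; total += fte
          printf "%s\t%.1f h/month\t%.3f FTE\n", $1, $2, fte }
        END { printf "TOTAL\t%.3f FTE\n", total }'
}

# Only the copier figure is from the article; "wifi" and "helpdesk"
# are made-up rows to show how small numbers add up.
printf '%s\n' "copiers,8" "wifi,20" "helpdesk,60" | fte_report
```

In practice the input would more likely come from the spreadsheet described in this step, exported as CSV, than from a hard-coded list.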

Next, calculate the impact of sick days and vacation time. For example, maybe you give 4 weeks of vacation time annually and assume each employee takes 1 to 2 weeks of sick and personal time annually. Taking the high end, that is 6 weeks away, which leaves 46 out of 52 weeks, or about 88% of the year that any employee is present. This is very rough, since I'm working in weeks and not actual work days on the calendar. It means you'll need to increase any employee requirements by roughly 13% (52/46 is about 1.13) in order to continue to maintain expected levels of service during vacations, etc. The amount of vacation time and number of holidays your institution gives will influence the math, so the above is only a demonstration.
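The same arithmetic can be sketched in a few lines of bash. The function name staffing_multiplier is my own invention for illustration. It takes the weeks in a year and the weeks an employee is expected to be away, then prints the fraction of the year they are present and the factor by which a raw FTE figure would need to be scaled. The exact factor is 52/46, or about 1.13.

```shell
#!/bin/bash
# staffing_multiplier WEEKS_PER_YEAR WEEKS_ABSENT
# Prints the fraction of the year an employee is present and the
# factor to multiply raw FTE by so that absences are covered.
staffing_multiplier() {
    awk -v total="$1" -v absent="$2" 'BEGIN {
        present = (total - absent) / total
        printf "present=%.3f multiplier=%.2f\n", present, 1 / present
    }'
}

# 4 weeks of vacation plus 2 weeks of sick and personal time.
staffing_multiplier 52 6
```

Changing the second argument is a quick way to see how a different vacation or holiday policy shifts the result.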

Step 3: Reviewing and Revising

When you reach this point, your team of stakeholders will have a spreadsheet full of data, justifications, and the tools for transparent conversations with Human Resources and the budget making leadership. Now instead of opposing opinions, people working in good faith can have informed conversations. You can have conversations such as:

  • What is the value of increasing the staffing budget vs. relaxing the RTS goal (i.e. tolerating longer outages)?
  • Should you consider changing the guidelines for when scheduled outages may occur?
  • What services should be outsourced to keep your limited staff focused on the core mission?

In essence, you have built the formula and the data that goes into it. You can now “turn the knobs” to change the outputs and see what you might want to achieve and what you’re willing to pay (in money or time) to get there.

This process can also be used in future conversations about adding services. Want to change from unmanaged copiers to a system with accountability, printing limits, automation, and more? Do the math to figure out the impact on your FTE for different levels of expected service, different RTS targets, handling it in-house vs. outsourcing the service, etc. This doesn’t just address one conversation. It equips you to have better conversations internally and with vendors about any projects you may consider in the future.

Footnote: Outsourcing

It is worth noting that outsourcing a service reduces the staff necessary, but it doesn’t remove all of the staff time related to it. Continuing with the example above, if staff can’t log in to the copiers, the I.T. department will spend time receiving the trouble report, confirming it, testing whether the problem is caused by their equipment or the vendor’s, and then finally calling the company which has the service contract. If an issue happens every month, that could be 1 - 8 hours per month, depending on the system’s design. The internal I.T. department saves the time of performing the hardware repairs, but it still handles the other steps. Also, outsourcing can have a negative effect on the RTS. If the person who will make the repair has to drive for two hours to get to your office, that is lost productivity. So the question of outsourcing can cut both ways. I recommend considering it for narrow and specialized services, such as copiers, HVAC, computer controlled lighting, phone services, etc. I recommend staying away from it for more flexible tasks, such as general technical support, systems administration, programming, etc., and for issues that are core to your institution’s mission.

Footnote: Inspirations

Here are some of the documents that formed my thinking. If you review them carefully, you can “see” the logic I describe above woven through the math of the worksheets. However, these worksheets contain a lot of invisible assumptions. I offer the method above as a way to adapt the philosophy of these worksheets to your particular environment.

Tuesday, March 30, 2021

Clearing User Files on Macs

In some environments, it is desirable to clear all the user-created and downloaded content from a Mac when the user logs out. Perhaps there is only one generic account or you're trying to strongly encourage users to only store things on servers or online services like Google Drive. To create this effect in my environment, I wrote a LaunchAgent and a configurable shell script. I've tested this up to macOS 10.12, a.k.a. Sierra, but it will probably work on newer versions as well.

To start, the "engine" of this system is the following shell script. Place the code in the file /usr/local/bin/clear_local_files.sh and make it executable. You might need to make this directory manually. You can do that with mkdir -p /usr/local/bin && chmod 755 /usr/local/bin. When you finish pasting the following code into your preferred text editor (BBEdit is a great option), you can save the file clear_local_files.sh to that location. Then use chmod +x /usr/local/bin/clear_local_files.sh to make it executable.


#!/bin/bash
#
# This script will clear away a lot of the files that users are likely
# to leave behind on the local disk.  This is meant as a way to encourage
# users to store files on the server, so that they aren't accidentally
# lost when a computer breaks down, is replaced, is upgraded, etc.
#

# The following is a list of directories at the root of the user's home
#   which will be cleared.
# Note:  The lack of Library allows account customizations to stay on the
#   local disk.
# Note:  Some sub-items will be moved back in a following setting.
# Warning:  Never put " in " in this list, as it will cause a syntax
#   error with loops.
clearDirs=( "Desktop" "Documents" "Downloads" "Movies" "Music" "Pictures" "Public" "Sites" )

# The following is a list of items to preserve in the user's home.
# Warning:  Never put " in " in this list, as it will cause a syntax
#   error with loops.
# Warning:  Be careful with spaces, colons, and slashes in file names.
keepDirs=( "Documents/Microsoft User Data" "Movies/iMovie data folders" "Movies/iMovie Events.localized" "Movies/iMovie Projects" "Movies/iMovie Library.imovielibrary" "Movies/iMovie Theater.theater" "Music/iTunes" "Pictures/iPhoto Library.photolibrary" "Pictures/Photos Library.photoslibrary" "Public/Drop Box" "Sites/images" "Sites/index.html" "Sites/Streaming" )

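# This should be executed in the home directory of the current user.
# Stop if the home directory can't be entered, so nothing is moved
# from the wrong place.
cd ~ || exit 1

# Make a place to hide things.
mkdir -p ~/.backup0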

# Move things into that hidden location
for item in "${clearDirs[@]}"
do
        if [ -e "${item}" ];
        then
                mkdir -p .backup0/"${item}"
                mv "${item}"/* .backup0/"${item}"/
        fi
done
        
# Move the things we're preserving out of the hidden location and back where they're supposed to be.
for item in "${keepDirs[@]}"
do
        if [ -e ".backup0/${item}" ];
        then
                mv ".backup0/${item}" "${item}"
        fi
done

# Get rid of anything that has been around too long.
if [ -e ~/.backup9 ];
then
        rm -rf ~/.backup9
fi

# "Age" each hidden backup by one "notch"
for index in {8..0}
do
        # Make sure it exists before moving it, to avoid errors.
        if [ -e ~/.backup${index} ];
        then
                index2=$((index + 1))
                mv ~/.backup${index} ~/.backup${index2}
        fi
done


exit
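
If you would like to watch the "aging" step work before trusting it with a real home directory, here is a minimal sketch that applies the same rotation to a throwaway directory. The function name rotate_backups and the scratch directory are my own additions for illustration. The loop lists the numbers explicitly rather than using {8..0}, but the behavior is meant to match the script above.

```shell
#!/bin/bash
# rotate_backups DIR
# Same idea as the aging loop in clear_local_files.sh, but pointed at
# DIR instead of the home directory: delete .backup9, then rename
# .backup8 down to .backup0 so that each number goes up by one.
rotate_backups() {
    local dir="$1" index
    rm -rf "$dir/.backup9"
    for index in 8 7 6 5 4 3 2 1 0
    do
        if [ -e "$dir/.backup$index" ]
        then
            mv "$dir/.backup$index" "$dir/.backup$((index + 1))"
        fi
    done
}

# Try it in a scratch directory holding two generations.
scratch=$(mktemp -d)
mkdir "$scratch/.backup0" "$scratch/.backup1"
touch "$scratch/.backup0/newest.txt" "$scratch/.backup1/older.txt"
rotate_backups "$scratch"
ls -A "$scratch"
```

Working from the highest number down matters. Renaming .backup0 first would collide with the .backup1 that is still in place.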

The next step is to make this run whenever a user logs out. However, it is easier to make this run at login than at logout. The difference is small and mostly unnoticeable to the end user, so this is what I went with. To do this, I made a LaunchAgent by putting the following code into a file named com.reviewmynotes.clearLocalFiles.plist located at /Library/LaunchAgents.


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>KeepAlive</key>
        <false/>

        <key>Label</key>
        <string>org.cairodurham.clearLocalFiles</string>

        <key>LowPriorityIO</key>
        <true/>

        <key>ProgramArguments</key>
        <array>
                <string>/usr/local/bin/clear_local_files.sh</string>
        </array>

        <key>RunAtLoad</key>
        <true/>

        <key>LimitLoadToSessionType</key>
        <array>
                <string>Aqua</string>
        </array>

</dict>
</plist>

Now log out and log back in. Anything in the locations listed in clearDirs and not listed in keepDirs should be moved into a hidden folder called .backup1. At each login, that folder will be renamed so the number goes up by one. The folder .backup9 will be deleted each time. This gives you a chance to save people from their own mistakes.
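
When someone does need a file back, it can be moved out of the hidden folder by hand. The following is a minimal sketch of that recovery step, not part of the deployed system. The function name restore_item, its arguments, and the example file name are hypothetical. It refuses to overwrite anything that already exists at the destination.

```shell
#!/bin/bash
# restore_item GENERATION RELATIVE_PATH [HOME_DIR]
# Moves one item from HOME_DIR/.backupGENERATION back to its original
# location under HOME_DIR. HOME_DIR defaults to the current user's home.
restore_item() {
    local gen="$1" rel="$2" home="${3:-$HOME}"
    local src="$home/.backup$gen/$rel" dest="$home/$rel"
    if [ ! -e "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "already exists: $dest" >&2
        return 1
    fi
    # Recreate the parent folder if needed, then move the item back.
    mkdir -p "$(dirname "$dest")" && mv "$src" "$dest"
}

# Example (hypothetical file name): bring back a document that was
# cleared at the most recent login.
# restore_item 1 "Documents/report.docx"
```

Because the clearing script runs at every login, a restored file in one of the clearDirs locations will be moved away again at the next login unless it is copied to a server or listed in keepDirs.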

This system can be easily deployed via tools like Munki, Jamf, and FileWave.