Why does reporting get forgotten in ITSM projects?

ITSM initiatives often focus heavily on operational requirements, without paying enough up-front attention to reporting and analytics. This can lead to increased difficulty after go-live, and lost opportunity for optimisation. Big data is a huge and important trend, but don’t forget that a proactive approach to ordinary reporting can be very valuable.

“…users must spend months fighting for a desired report, or hours jockeying Excel spreadsheets to get the data they need. I can only imagine the millions of hours of productive time spent each month by people doing the Excel “hokey pokey” each month to generate a management report that IT has deemed not worthwhile”

Don’t Forget About “Small Data” – Patrick Gray in TechRepublic

In a previous role, aligning toolsets to processes in support of our organisation’s ITSM transformation, my teammates and I used to offer each other one piece of jokey advice: “Never tell anyone you’re good with Crystal Reports”.

The reason? Our well established helpdesk, problem and change management tool had become a powerful source of management reports. Process owners and team managers wanted to arrive at meetings armed with knowledge and statistics, and they had learned that my team was a valuable data source.

Unfortunately, we probably made it look easier than it actually was. These reports became a real burden to our team, consuming too much time, at inconvenient times. “I need this report in two hours” often meant two hours of near-panic, delving into data which hadn’t been designed to support the desired end result. We quickly needed to reset expectations. It was an important lesson about reporting.

Years later, I still frequently see this situation occurring in the ITSM community. When ITSM initiatives are established, processes implemented, and toolsets rolled out, it is still uncommon for reporting to be considered in-depth at the requirements gathering stage. Perhaps this is because reporting is not a critical-path item in the implementation: instead, it can be pushed to the post-rollout phase, and worried about later.

One obvious reason why this is a mistake is that many of the things that we might need to report on will require specific data tracking. If, for example, we wish to track average assignment durations as a ticket moves between different teams, then we have to capture the start and end times of each assignment. If we need to report in terms of each team’s actual business hours (perhaps one team works 24/7, while another is 9 to 5), then each team’s working calendar needs to be captured too. If this data is not explicitly captured in the history of each record, then retrospectively analysing it can be surprisingly difficult, or even impossible.
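As a minimal sketch of why this matters (the record structure and field names here are invented for illustration, not any particular tool’s schema), average time per assignment can only be computed if each assignment’s start and end were captured in the ticket’s history in the first place:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical assignment history for one ticket: each entry records which
# team held the ticket, and when that assignment started and ended.
assignments = [
    {"team": "Service Desk", "start": "2013-02-04 09:00", "end": "2013-02-04 09:20"},
    {"team": "App Support",  "start": "2013-02-04 09:20", "end": "2013-02-04 14:50"},
    {"team": "Service Desk", "start": "2013-02-04 14:50", "end": "2013-02-04 15:10"},
]

def duration_minutes(start, end):
    fmt = "%Y-%m-%d %H:%M"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 60

totals, counts = defaultdict(float), defaultdict(int)
for a in assignments:
    totals[a["team"]] += duration_minutes(a["start"], a["end"])
    counts[a["team"]] += 1

for team in totals:
    # Reporting against each team's business hours would also need that team's
    # working calendar to be recorded somewhere; elapsed time is used here.
    print(f"{team}: average assignment of {totals[team] / counts[team]:.0f} minutes")
```

If the tool only stores the current assignee, none of this can be reconstructed after the event.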

Consider the lifecycle of a typical ITSM data instance, such as an incident ticket:

Simple representation of an incident ticket in three phases: live, post-live, and archived

Our record effectively moves through three stages:

  • 1: The live stage
    This is the key part of an incident record’s life, in which it is highly important as a piece of data in its own right. At this point, there is an active situation being managed. The attributes of the record define where it is in the process, who owns it, what priority it should take over other work, and what still needs to be done. This phase could be weeks long, near-instantaneous, or anything in between.
  • 2: The post-live stage
    At this point, the ticket is closed, and becomes just one of the many (perhaps hundreds of thousands of) incidents which are no longer actively being worked. Barring a follow-up enquiry, it is unlikely that the incident will ever be opened and inspected by an individual again. However, this does not mean that it has no value. Incidents (and other data) in this lifecycle phase have little significance in their own individual right (they are simply anecdotal records of a single scenario), but together they make up a body of statistical data that is, arguably, one of the IT department’s most valuable proactive assets.
  • 3: The archived stage
    We probably don’t want to keep all of our data forever. At some stage, the usefulness of the data for active reporting diminishes, and we move it to a location where it will no longer slow down our queries or take up valuable production storage.

It’s important to remember that our ITSM investment is not just about fighting fires. Consider two statements about parts of the ITIL framework (these happen to be taken from Wikipedia, but they each seem to be very reasonable statements):

Firstly, for Incident Management:

“The objective of incident management is to restore normal operations as quickly as possible”

And, for Problem Management:

“The problem-management process is intended to reduce the number and severity of incidents and problems on the business”

In each case, the value of our “phase 2” data is considerable. Statistical analysis of the way incidents are managed – the assignment patterns, response times and reassignment counts, first-time closure rates, etc. – helps us to identify the strong and weak links of our incident process in a way that no individual record can. Delving into the details of those incidents in a similar way helps us to identify what is actually causing our issues, reinforcing Problem Management.
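To make that concrete, here is a small sketch of the kind of aggregate “phase 2” measures involved (again, the record layout is invented purely for illustration):

```python
# Hypothetical extract of closed incidents: how often each was reassigned,
# and whether the first team to own it also resolved it.
closed_incidents = [
    {"id": "INC0001", "reassignments": 0, "resolved_by_first_team": True},
    {"id": "INC0002", "reassignments": 3, "resolved_by_first_team": False},
    {"id": "INC0003", "reassignments": 1, "resolved_by_first_team": False},
    {"id": "INC0004", "reassignments": 0, "resolved_by_first_team": True},
]

total = len(closed_incidents)
avg_reassignments = sum(i["reassignments"] for i in closed_incidents) / total
first_time_rate = sum(i["resolved_by_first_team"] for i in closed_incidents) / total

print(f"Average reassignments per incident: {avg_reassignments:.1f}")  # 1.0
print(f"First-time closure rate: {first_time_rate:.0%}")               # 50%
```

None of this is sophisticated analysis; the hard part is making sure the underlying events were recorded at all.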

It’s important to remember that this is one of the major objectives of our ITSM systems, and a key basis of the return on our investment. We can avoid missing out on this opportunity by following some core principles:

  • Give output requirements as much prominence as operational requirements, in any project’s scope.
  • Ensure each stakeholder’s individual reporting and analytics needs are understood and accounted for.
  • Identify the data that actually needs to be recorded, and ensure that it gets gathered.
  • Quantify the benefits that we need to get from our analytics, and monitor progress against them after go-live.
  • Ensure that archiving strategies support reporting requirements.

Graphs icon courtesy of RambergMediaImages on Flickr, used under Creative Commons licensing.

Congestion charging… in IT?

Congestion Charge sign in London

Does your organization understand the real costs of the congestion suffered by your IT services? Effective management and avoidance of congestion can deliver better service and reduced costs, but some solutions can be tough to sell to customers.

The Externalities of Congestion

In 2009, transport analyst and activist Charles Komanoff published, in an astonishingly detailed spreadsheet, his Balanced Transportation Analysis for New York City.  His aim was to explore the negative external costs caused by the vehicular traffic trying to squeeze into the most congested parts of the city each day.

His conclusion? In the busiest time periods, each car entering the business district generates congestion costs of over $150.

Graph showing Congestion Costs outlined in Komanoff's Balanced Transportation Analysis
Congestion Costs outlined in Komanoff’s Balanced Transportation Analysis

Komanoff’s spreadsheet can be downloaded directly here. Please be warned: it’s a beast – over three megabytes of extremely complex and intricate analysis. Reuters writer Felix Salmon succinctly stated that “you really need Komanoff himself to walk you through it”.

Komanoff’s work drills into the effect of each vehicle moving into the Manhattan business district at different times of day, analyzing the cascading impact of each vehicle on the other occupants of the city.  The specific delay caused by any given car to any other given vehicle is probably tiny, but the cumulative effect is huge.

The Externalities of Congested IT Services

Komanoff’s city analysis models the financial impact of a delay to each vehicle, such as a commercial vehicle carrying several paid professionals travelling to fulfil charged-for business services.  With uncontrolled access to the city, there is no consideration of the “value” of each journey, and thus high-value traffic (perhaps a delivery of expensive retail goods to an out-of-stock outlet) gets no prioritization over any lower-value journey.

Congested access to IT resources, such as the Service Desk, has equivalent effects.  Imagine a retail unit losing its point-of-sale systems on the Monday morning that HQ staff return from their Christmas vacation.  The shop manager’s frantic call may find itself queued behind dozens of forgotten passwords.  Ten minutes of lost shop business will probably cost far more than a ten-minute delay in unlocking user accounts.

That’s not to say that each password reset isn’t important.  But in a congested situation, each caller is impacted by, and impacts, the other people calling in at the same time.

The dynamics and theory of demand management in call centers have been extensively studied and can be extremely complex (a Google search reveals plentiful studies, often with complex and deep mathematical analysis; this is by no means the most complex example!).

Fortunately, we can illustrate the effects of congestion with a relatively simple model.

Our example has the following components:

  • Four incoming call lines, each manned by an agent
  • A group of fifteen customers, dialling in at or after 9am, with incidents which each take 4 minutes for the agent to resolve.
  • Calls arriving at discrete 2-minute intervals (this is the main simplification, but for the purposes of this model, it suffices!)
  • A call queuing system which can line up unanswered calls.

When three of our customers call in each 2-minute period, we quickly start to build up a backlog of calls:

Congestion at the Service Desk - a table models the impact of too many customers arriving in each unit of time.
With three calls arriving at the start of each two-minute interval, a queue quickly builds.

We’ve got through the callers in a relatively short time (everything is resolved by 09:16). However, that has come at a price: 30 customer-minutes of waiting time.

If, however, we spread out the demand slightly, and assume that only two customers call in at the start of each two-minute period, the difference is impressive:

Table showing a moderated arrival rate at the service desk, resulting in no queueing
If the arrival rate is slowed to two callers in each time period, no queue develops

Although a few users (customers 3, 7, 11 and 15) get their issues resolved a couple of minutes later in absolute terms, there is no hold time, for anyone.  Assuming there are more productive things a user can be doing than waiting on hold (notwithstanding their outstanding incident), the gains are clear.  In the congestion scenario, the company has lost half an hour of labour, to no significant positive end.
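The arithmetic behind these figures is easy to reproduce. Here is a minimal discrete-time sketch of the model above (the simulate_desk helper is my own construction, assuming four agents, four-minute calls and a first-come-first-served queue):

```python
def simulate_desk(arrivals, agents=4, call_minutes=4):
    """Return total customer-minutes spent waiting in the queue.

    arrivals maps each arrival time (minutes after 09:00) to the number of
    customers calling at that moment. Calls are answered first-come,
    first-served, and each occupies an agent for call_minutes.
    """
    arrivals = dict(arrivals)   # work on a copy
    queue, busy_until, waited, t = [], [], 0, 0
    while queue or arrivals or busy_until:
        busy_until = [f for f in busy_until if f > t]   # agents whose calls have finished become free
        queue += [t] * arrivals.pop(t, 0)               # new callers join the queue
        while queue and len(busy_until) < agents:       # free agents answer in arrival order
            waited += t - queue.pop(0)
            busy_until.append(t + call_minutes)
        t += 1
    return waited

# Three callers at each two-minute mark (congested) vs. two at each mark (smoothed).
print(simulate_desk({0: 3, 2: 3, 4: 3, 6: 3, 8: 3}))                        # 30 customer-minutes
print(simulate_desk({0: 2, 2: 2, 4: 2, 6: 2, 8: 2, 10: 2, 12: 2, 14: 1}))   # 0 customer-minutes
```

Thirty customer-minutes of queueing in the first case; none in the second.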

Of course, while Komanoff’s analysis is comprehensive, it is one single model and can’t be assumed completely definitive. But it is undeniable that congestion imposes externalities.

Komanoff’s proposed solution involves a number of factors, including:

  • A congestion charge, applying at all times of day with varying rates, to anyone wishing to bring a car into the central area of the city.
  • Variable pricing on some alternative transportation methods such as trains, with very low fares at off-peak times.
  • Completely free bus transport at ALL times.

Congestion management of this kind is nothing new, of course.  London, having failed to capitalize on its one big chance to remodel its ancient street layout, introduced a flat-fare central congestion charge in 2003.  Other cities have followed suit (although proposals in New York have not come to fruition). Peak time rail fares and bridge tolls are familiar concepts in many parts of the world. Telecoms, the holiday industry, and numerous other sectors vary their pricing according to periodic demand.

Congestion Charging in IT?

Presumably, then, we can apply the principles of congestion charging to contested IT resources, implementing a variable cost model to smooth demand? In the case of the Service Desk, this may not always be straightforward, simply because in many cases charging is not based on a simple “per call” model. And in any case, how will the customer see such a proposal?

Nobel Laureate William S. Vickrey is often described as “the father of congestion charging”, having originally proposed it for New York in 1952. Addressing the objections to his idea, he said:

“People see it as a tax increase, which I think is a gut reaction. When motorists’ time is considered, it’s really a savings.”

If the customer agrees, then demand-based pricing could indeed be a solution. A higher price at peak times could discourage lower priority calls, while still representing sufficient value to those needing more urgent attention. This model will increasingly be seen for other IT services such as cloud-based infrastructure.

There are still some big challenges, though. Vickrey’s principles included the need to vary prices smoothly over time. If prices suddenly fall at the end of a peak period, this generates spikes in demand which themselves may cause congestion. In fact, as our model shows, the impact can be worse than with no control at all:

Table showing the negative impact of a failed off peak/peak pricing model
If we implement a peak/off-peak pricing system, this can cause spikes. In this case, all but four of the customers wait until a hypothetical cheaper price band starting at 09:08, at which point they all call. There is even more lost time (40 minutes) in the queue than before.
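Running the same simulate_desk sketch from earlier against this spike pattern reproduces the figure:

```python
# All but four customers hold off until the hypothetical cheaper band at 09:08.
print(simulate_desk({0: 4, 8: 11}))   # 40 customer-minutes of queueing
```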

This effect is familiar to many train commuters (the 09:32 train from Reading to London, here in the UK, is the first off-peak service of the morning, and hence one of the most crowded).  However, implementing smooth pricing transitions can be complex and confusing compared to more easily understood fixed price brackets.

Amazon’s spot pricing of its EC2 service is an interesting alternative.  In effect, it’s still congestion pricing, but it’s set by the customer, who is able to bid their own price for spare capacity on the Amazon cloud.

Alternatives?

Even if the service is not priced in a manner that can be restructured in this way, or if the proposition is not acceptable to the customer, there are still other options.

Just as Komanoff proposes a range of positive and negative inducements to draw people away from the congested peak-time roads, an IT department might consider a range of options, such as:

  • Implementation of a service credits system, where customers are given a positive inducement to access the service at lower demand periods, could enable the provider to enhance the overall service provided, with the savings from congestion reduction passed directly to the consumer.
  • Prioritization of access, whereby critical tasks are fast-tracked ahead of more routine activities (a simple illustration follows this list).
  • Variable Service Level Agreements, offering faster turnarounds of routine requests at off-peak times. Again, if we can realise Vickrey’s net overall saving, it may be possible to show enhanced overall service without increased overall costs.
  • Customer-driven work scheduling. Apple’s Genius Bar encourages customers to book timeslots in advance. This may result in a longer time to resolution than a first-come-first-served queue, but it also gives the customer the opportunity to choose a specific time that may be more convenient to them anyway. Spare capacity still allows “walk up” service to be provided, but this may involve a wait.
  • Customer self-service solutions such as BMC’s Service Request Management. Frankly, this should be a no-brainer for many organizations. If we have an effective solution which allows customers to both log and fulfil their own requests, we can probably cut a significant number of our 15 customer calls altogether. Self-service systems offer much more parallel management of requests, so if all 15 customers hit our system at once, we’d not expect that to cause any issue.
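For the prioritization option above, here is a toy illustration reusing the same four-agent model, but with a priority-ordered queue (the helper, names and priority values are all invented for this sketch):

```python
import heapq

def simulate_priority_desk(arrivals, agents=4, call_minutes=4):
    """Waiting time per caller when the queue is answered in priority order.

    arrivals is a list of (arrival_minute, priority, name) tuples; a lower
    priority number is answered first. Same four-agent, four-minute model.
    """
    pending = sorted(arrivals)                     # ordered by arrival time
    queue, busy_until, waits, t = [], [], {}, 0
    while pending or queue or busy_until:
        busy_until = [f for f in busy_until if f > t]
        while pending and pending[0][0] == t:      # this minute's callers join the queue
            arrived, priority, name = pending.pop(0)
            heapq.heappush(queue, (priority, arrived, name))
        while queue and len(busy_until) < agents:  # free agents take the highest priority first
            priority, arrived, name = heapq.heappop(queue)
            waits[name] = t - arrived
            busy_until.append(t + call_minutes)
        t += 1
    return waits

# Eight password resets arrive at 09:00; the shop manager's critical
# point-of-sale call arrives at 09:02.
calls = [(0, 2, f"password reset {i}") for i in range(8)] + [(2, 1, "POS outage")]
print(simulate_priority_desk(calls))
```

In this run the point-of-sale call is answered after a two-minute wait rather than the six minutes it would wait in a first-come-first-served queue, at the cost of one routine reset waiting a little longer.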

Of course, there remains the option of spending more to provide a broader capacity, whether this is the expansion of a helpdesk or the widening of roads in a city.  However, when effective congestion management can be shown to provide positive outcomes from unexpanded infrastructure, shouldn’t this be the last resort?

(congestion charge sign photo courtesy of mariodoro on Flickr, used under Creative Commons licensing)

When critical IT suppliers fail, the impact can be severe

The collapse of 2e2 is a warning to all IT organizations.

As IT evolves into a multi-sourced, supplier driven model, how many companies understand the risks?

One of the big stories in corporate IT this week has been the troubles of the IT service provider 2e2.  2e2 are a supplier of a range of outsourcing, resourcing and support services. As administrators moved in, staff were cut and services ceased. It’s horrible for the staff, of course, and I wish everybody all the best. For customers, particularly datacenter customers, the situation is uncertain.

Increasingly, the role of IT within a large organization is to be a broker of services, driven by a range of internal functions and external suppliers. This trend continues to drive growth in the outsourcing market, which Gartner now estimates to be in excess of a quarter of a trillion US dollars.

This hybrid model means that a typical IT service, as understood by its customers and stakeholders, will be dependent on both internal capabilities, and the performance and viability of multiple external suppliers. The collapse of 2e2 is a reminder that suppliers sometimes fail. When this happens, not all organizations are prepared for the consequences.

A failed service can kill a business

A harsh truth:  The failure of a critical business service can kill a profitable multi-billion-dollar company in a matter of months.  It has happened before, and will happen again.

One of the biggest examples of this is the billing system failure that caused Independent Energy to collapse.  A British darling of the dot-com stock market boom, Independent Energy was a new energy supplier, operating a multi-sourced supply chain model to compete with large post-privatization incumbents.

The model was initially a big success.  Floated in 1996 at a value of 15 million pounds, it had risen to over 1 billion pounds (approximately US$1.6bn at current rates) by 2000.

However, in February of that year, the company was forced to admit that it was facing serious problems billing its customers, due to failures in its systems and processes. Complex dependencies on external companies and previous suppliers were compounded by internal IT issues, and the effect was devastating.

The company simply couldn’t invoice the consumers of its products. The deficit was significant: the company itself stated that it was unable to bill some 30% of the hundreds of millions of pounds outstanding from its customers.

The company itself seemed otherwise to be healthy, and even reported profits in May, but the billing problems continued.  Months later, it was all over:  in September 2000, Independent Energy collapsed with at least 119 million pounds of uncollected debt. The remains of the company were purchased by a competitor for just ten million pounds.

The impact on customers

For 2e2’s customers, the immediate problem is a demand for funding to keep the lights on in the datacenter. The biggest customers are reported to have been asked for GB£40,000 (US$63,000) immediately, with smaller customers receiving letters demanding GB£4,000 (US$6,300). Non-payment means disconnection of service. Worse still, there is the additional threat that if overall funding from all customers is insufficient, operations might shut down regardless:

Administrator's letter to 2e2 customers
The Administrator’s letter to 2e2 customers warns that any customers unable to pay the demanded charge will lose all services immediately, and that ALL services may cease if the total required amount is not raised.

But the complications don’t end there. The infrastructure in the 2e2 datacenters is reportedly leased from another supplier, according to an article in the UK IT journal The Register. Customers, the article claims, may face additional payments to cover the outstanding leasing costs for the equipment hosting their data and services.

A key lesson: It’s vital to understand the services you are providing

The events we’ve discussed reinforce the importance of understanding, in detail, your critical IT services. The Service Model is key to this, even in simple examples such as this one:

Sketch of a simple service, including an external datacenter, several data stores, and a client server application.
A sketch model of a simple service-driving application

Even for this simplistic example, we can see a number of critical questions for the organization. Here are just a few (a minimal sketch of how a service model might record the answers follows the list):

  • How do I know which equipment in the datacenter belongs to us, which belongs to the customer, and which is leased? Recently, a number of companies experienced devastating flooding as a result of the storm which hit the USA’s Eastern Seaboard. Many are now struggling to identify their losses for insurance purposes. This can cause a serious cashflow hit, as equipment has to be replaced regardless of the fact that payouts are delayed.
  • What happens if our cloud-based archiving provider gets into difficulties? In this situation, the immediate impact on live service may be limited, but in the medium and longer term, how will billing and vital financial record keeping be affected?
  • Our client tool is dependent on a 3rd party platform. What risks arise from that? A few days ago, Oracle released a critical fix which patched 50 major security holes.  Updates like this are nothing unusual, of course.  But there are many examples of major security breaches caused by unpatched platforms (the Information Commissioner’s Office recently cited this error in its assessment of the Sony Playstation Network failure, adding a £250,000 fine to the huge costs already borne by Sony as a result of the collapse). Of course, there are other risks to consider too: How long will the supplier continue to support and maintain the platform, and what might happen if they stop?
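As promised above, one way to make the first of those questions answerable on demand is to record ownership and supplier details against every component in the service model. A minimal sketch, with invented component names and fields rather than any particular CMDB schema:

```python
# Hypothetical fragment of a service model: each configuration item carries
# its ownership status and the external supplier (if any) it depends on.
service_model = {
    "Retail POS Service": [
        {"name": "pos-app-server-01", "ownership": "owned",  "supplier": None},
        {"name": "dc-rack-07",        "ownership": "leased", "supplier": "HostingCo"},
        {"name": "san-array-02",      "ownership": "leased", "supplier": "LeaseCo"},
        {"name": "archive-store",     "ownership": "cloud",  "supplier": "ArchiveCloud Ltd"},
    ],
}

def exposure(model, supplier):
    """List the components of each service that depend on a given supplier."""
    return {service: [ci["name"] for ci in cis if ci["supplier"] == supplier]
            for service, cis in model.items()}

print(exposure(service_model, "HostingCo"))   # {'Retail POS Service': ['dc-rack-07']}
```

The same structure answers the insurance question too: filter on ownership rather than supplier.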

The required understanding of a service can only be achieved with effective planning, management and control of the components that make it up. Is this the most critical role of IT Service Management in today’s organization?

ITAM 2015: The evolving role of the IT Asset Manager

In a previous post, we discussed the fact that IT Asset Management is underappreciated by the organizations which depend on it.

That article discussed a framework through which we can measure our performance within ITAM, and build a structured and well-argued case for more investment into the function.  I’ve been lucky enough to meet some of the best IT Asset Management professionals in the business, and have always been inspired by their stories of opportunities found, disasters averted, and millions saved.  ITAM, done properly, is never just a cataloging exercise.

As the evolution of corporate IT continues at a rapid pace, there is a huge opportunity (and a need) for Asset Management to become a critical part of the management of that change.  The role of IT is changing fundamentally: Historically, most IT departments were the primary (or sole) provider of IT to their organizations. Recent years have seen a seismic shift, leaving IT as the broker of a range of services underpinned both by internal resources and external suppliers. As the role of the public cloud expands, this trend will only accelerate.

Here are four ways in which the IT Asset Manager can ensure that their function is right at the heart of the next few years’ evolution and transition in IT:


1: Ensure that ITIL v3’s “Service Asset and Configuration Management” concept becomes a reality

IT Asset Management and IT Service Management have often, if not always, existed with a degree of separation. In  Martin Thompson’s survey for the ITAM Review, in late 2011, over half of the respondents reported that ITSM and ITAM existed as completely separate entities.

Despite its huge adoption in IT, previous incarnations of the IT Infrastructure Library (ITIL) framework did not significantly detail IT Asset Management as many practitioners understand it. Indeed, the ITIL version 2 definition of an Asset was somewhat unhelpful:

“Asset”, according to ITIL v2:
“Literally a valuable person or thing that is ‘owned’, assets will often appear on a balance sheet as items to be set against an organization’s liabilities. In IT Service Continuity and in Security Audit and Management, an asset is thought of as an item against which threats and vulnerabilities are identified and calculated in order to carry out a risk assessment. In this sense, it is the asset’s importance in underpinning services that matters rather than its cost”

This narrow definition needs to be read in the context of ITIL v2’s wider focus on the CMDB and Configuration Items, of course, but it still arguably didn’t capture what Asset Managers all over the world were doing for their employers: managing the IT infrastructure supply chain and lifecycle, and understanding the costs, liabilities and risks associated with its ownership.

ITIL version 3 completely rewrites this definition, and goes broad. Very broad:

“Service Asset”, according to ITIL v3:
“Any Capability or Resource of a Service Provider.”

“Resource”, according to ITIL v3:
“[Service Strategy] A generic term that includes IT Infrastructure, people, money or anything else that might help to deliver an IT Service. Resources are considered to be Assets of an Organization.”

“Capability”, according to ITIL v3:
“[Service Strategy] The ability of an Organization, person, Process, Application, Configuration Item or IT Service to carry out an Activity. Capabilities are intangible Assets of an Organization.”

This is really important. IT Asset Management has a huge role to play in enabling the organization to understand the key components of the services it is providing. The building blocks of those services will not just be traditional physical infrastructure, but will be a combination of physical, logical and virtual nodes, some owned internally, some leased, some supplied by external providers, and so forth.

In many cases, it will be possible to choose from a range of such options, and a range of suppliers, to fulfill any given task. Each option will still bear costs, whether up-front, ongoing, or both. There may be a financial contract management context, and potentially software licenses to manage. Support and maintenance details, both internal and external, need to be captured.

In short, it’s all still Asset management, but the IT Asset Manager needs to show the organization that the concept of IT Assets wraps up much more than just pieces of tin.
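A sketch of what such a broadened “service asset” record might carry is shown below (the field names are illustrative only); the point is that the same structure covers owned tin, leased kit and subscribed services alike:

```python
# Illustrative "service asset" records (field names invented for this sketch).
assets = [
    {"name": "db-host-01", "kind": "physical server", "ownership": "owned",
     "acquisition_cost": 12000, "annual_cost": 1500,
     "supplier": "VendorCo", "contract_end": "2016-03", "licences": ["Oracle DB EE"]},
    {"name": "crm-platform", "kind": "SaaS subscription", "ownership": "subscribed",
     "acquisition_cost": 0, "annual_cost": 30000,
     "supplier": "CloudCRM Inc", "contract_end": "2015-12", "licences": []},
]

# The same questions can then be asked of every asset, however it is sourced:
print(sum(a["annual_cost"] for a in assets))                          # total ongoing cost
print([a["name"] for a in assets if a["contract_end"] < "2016-01"])   # renewals due before 2016
```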


2: Learn about the core technologies in use in the organization, and the way they are evolving

A good IT Asset Manager needs to have a working understanding of the IT infrastructure on which their organization depends, and, importantly, the key trends changing it. It is useful to monitor information sources such as Gartner’s Top 10 Strategic Technology Trends, and to consider how each major technology shift will impact the IT resources being managed by the Asset Manager.  For example:

Big Data will change the nature of storage hardware and services.  Estimates of the annual growth rate of stored data in the corporate datacenter typically range from 40% to over 70%. With this level of rapid data expansion, technologies will evolve rapidly to cope.  Large monolithic data warehouses are likely to be replaced by multiple systems, linked together with smart control systems and metadata.

Servers are evolving rapidly in a number of different ways. Dedicated appliance servers, often installed in a complete unit by application service providers, are simple to deploy but may bring new operating systems, software and hardware types into the corporate environment for the first time. With an increasing focus on energy costs, many tasks will be fulfilled by much smaller server technology, using lower powered processors such as ARM cores to deliver perhaps hundreds of servers on a single blade.

Image of Boston Viridis U2 server
An example of a new-generation server device: Boston’s Viridis U2 packs 192 server cores into a single, low-power unit

Software Controlled Networks will do for network infrastructure changes what virtualization has done for servers: they’ll be faster, simpler, and propagated across multiple tiers of infrastructure in single operations. Simply: the network assets underpinning your key services might not be doing the same thing in an hour’s time.

“The Internet of Things” refers to the rapid growth in IP enabled smart devices.
Gartner now state that over 50% of internet connections are “things” rather than traditional computers. Their analysis continues by predicting that in more than 70% of organizations, a single executive will have management oversight over all internet connected devices. That executive, of course, will usually be the CIO. Those devices? They could be almost anything. From an Asset Management point of view, this could mean anything from managing the support contracts on IP-enabled parking meters to monitoring the Oracle licensing implications of forklift trucks (this is a real example, found in their increasingly labyrinthine Software Investment Guide). IT Asset Management’s scope will go well beyond what many might consider to be IT.

SF parking meter - an example of an IP enabled "thing"
A “thing” on the streets of San Francisco, and on the internet.

3: Be highly cross functional to find opportunities where others haven’t spotted them

The Asset Manager can’t expect to be an expert in every developing line of data center technology, and every new cloud storage offering. However, by working with each expert team to understand their objectives, strategies, and roadmaps, they can be at the center of an internal network that enables them to find great opportunities.

A real life example is a British medical research charity, working at the frontline of research into disease prevention. The core scientific work they do is right at the cutting edge of big data, and their particular requirements in this regard lead them to some of the largest, fastest and most innovative on-premise data storage and retrieval technologies (Cloud storage is not currently viable for this: “The problem we’d have for active data is the access speed – a single genome is 100Gb – imagine downloading that from Google”).

These core systems are scalable to a point, but they still inevitably reach an end-of-life state. In the case of this research organization, periodic renewals are a standard part of the supply agreement. As their data centre manager told me:

“What they do is sell you a bit of kit that’ll fit your needs, front-loaded with three years support costs. After the three years, they re-look at your data needs and suggest a bigger system. Three years on, you’re invariably needing bigger, better, faster.”

With the last major refresh of the equipment, a clever decision was made: instead of simply disposing of, or selling, the now redundant storage equipment, the charity has been able to re-use it internally:

“We use the old one for second tier data: desktop backups, old data, etc. We got third-party hardware-only support for our old equipment”.

This is a great example of joined-up IT Asset Management. The equipment has already largely been depreciated. The expensive three year, up-front (and hence capital-cost) support has expired, but the equipment can be stood up for less critical applications using much cheaper third party support. It’s more than enough of a solution for the next few years’ requirements for another team in the organization, so an additional purchase for backup storage has been avoided.


4: Become the trusted advisor to IT’s Financial Controller

The IT Asset Manager is uniquely positioned to be able to observe, oversee, manage and influence the make-up of the next-generation, hybrid IT service environment. This should place them right at the heart of the decision support process. The story above is just one example of the way this cross-functional, educated view of the IT environment enables the Asset Manager to help the organization to optimize its assets and reduce unnecessary spend.

This unique oversight is a huge potential asset to the CFO. The Asset Manager should get closely acquainted with the organization’s financial objectives and strategy. Is there an increased drive away from capital spend, and towards subscription based services? How much is it costing to buy, lease, support, and dispose of IT equipment? What is the organization’s spend on software licensing, and how much would it cost to use the same licensing paradigms if the base infrastructure changes to a newer technology, or to a cloud solution?
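As a deliberately simple illustration of that kind of decision support (the numbers are made up, and a real analysis would account for discounting, utilization and exit costs), the comparison only works if the cost elements are actually being captured:

```python
def ownership_cost(purchase, annual_support, disposal, years):
    """Total cost of owning an asset over its life (illustrative; no discounting)."""
    return purchase + annual_support * years + disposal

owned = ownership_cost(purchase=12000, annual_support=1500, disposal=500, years=4)
subscribed = 4000 * 4   # hypothetical annual subscription charge over the same four years

print(f"Own: {owned}, subscribe: {subscribed}")   # Own: 18500, subscribe: 16000
```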

A key role for the Asset Manager in this shifting environment is that of decision support.  A broad and informed oversight over the structure of IT services and the financial frameworks in place around them, together with proactive analysis of the impact of planned, anticipated or proposed changes, should enable the Asset Manager to become one of the key sources of information to executive management as they steer the IT organization forwards.

Parking meter photo courtesy of SJSharkTank on Flickr, used under Creative Commons license

Ticket Tennis

The game starts when something breaks.

A service is running slowly, and the sounds of a room full of frustration echo down a phone line. Somewhere, business has expensively stopped, amid a mess of lagging screens and pounded keyboards.

The helpdesk technician provides sympathetic reassurance, gathers some detail, thinks for a moment, and passes the issue on. A nice serve, smooth and clean, nothing to trouble the line judges here.

THUD!

And it’s over to the application team.  For a while.

“It’s not us. There’s nothing in the error logs. They’re as clean as a whistle”.

Plant feet, watch the ball…

WHACK!

Linux Server Support. Sure footed and alert, almost nimble (it’s all that dashing around those tight right-angle corners in the data center).  But no, it seems this one’s not for them.

“CPU usage is normal, and there’s plenty of space on the system partition”.

SLICE!

The networks team alertly receive it.  “It can’t be down to us.  Everything’s flowing smoothly, and anyway we degaussed the sockets earlier”. (Bear with me. I was never very good at networking).

“Anyway, it’s slow for us, too. It must be an application problem”.

BIFF!

Back to the application team it goes.   But they’re waiting right at the net.  “Last time this was a RAID problem”, someone offers.

CLOUT!

…and it’s a swift volley to the storage team.

I love describing this situation in a presentation, partly because it’s fun to embellish it with a bit of bouncy time-and-motion.  Mostly, though, it’s because most people in the room (at the very least, those whose glasses of water I’ve not just knocked over) seem to laugh and nod at the familiarity of it all.

Often, something dramatic has to happen to get things fixed. Calls are made, managers are shouted at, and things escalate.  Eventually people are made to sit round the same table, the issue is thrashed out, and finally a bit of co-operation brings a swift resolution.

You see, it turns out that the servers are missing a patch, which is causing new application updates to fail, but they can’t write to the log files because the network isn’t correctly routing around the SAN fabric that was taken down for maintenance which has overrun. It took a group of people, working together, armed with proper information on the interdependent parts of the service, to join the dots.

Would this series of mistakes seem normal in other lines of work?  Okay, it still happens sometimes, but in general most people are very capable of actually getting together to fix problems and make decisions.   Round table meetings, group emails and conference calls are nothing new. When we want to chat about something, it’s easy.  If we want to know who’s able to talk right now, it’s right there in our office communicator tools and on our mobile phones.

It’s hard to explain why so many service management tools remain stuck in a clumsy world of single assignments, opaque availability, and uncoordinated actions.  Big problems don’t get fixed quickly if the normal pattern is to whack them over the net in the hope that they don’t come back.

Fixing stuff needs collaboration, not ticket tennis. I’ve really been enjoying demonstrating the collaboration tools in our latest Service Desk product.  Chat simply makes sense.  Common views of the services we’re providing customers simply make sense.  It demos great, works great, and quite frankly, it all seems rather obvious.

Photo courtesy of MeddyGarnet, licensed under Creative Commons.