Intro | Support Driven Development
Are you practising Support Driven Development? In this post I introduce the topic, explain what it is, and outline the future posts in the series.
Are you Practising Support Driven Development?
If you are reading this then the answer is probably no. Or, you may think you are, but certain decisions that have been made (or not made) in the development of your product have made it much more difficult for you and your staff to support. These problems may be small and unnoticeable at the start but as your product scales and your user base grows, they will begin to create a black hole of lost hours and days for your developers who are supporting problems that should have been solved long before they reach their desks.
Throughout the course of this series of posts, I’ll go through what I find to be the main themes behind issues that cause grief for anyone supporting software and for customers raising them. I’ll speak to how they can be avoided with some development choices that can be made very early in the project development cycle.
As an introduction to this series, I’ll run through a sample scenario to give you a taste of what this content is going to look like. I’ll try to cover a few of the themes and they’ll be shown in bold below. For the sake of simplicity, I’ll also be making a few general assumptions about the structure of this company as it pertains to the issue at hand.
Keep in mind that this is a perfect storm of happenings for a small issue, but it is these kinds of small issues that frustrate support staff the most: solving them won’t have much of an impact on the entire user base, yet you lose a lot of time in the process.
Example
A customer using the software has recently forgotten their password. They clicked the “I forgot my password” link and they are told the reset link will be sent to their email. The customer never receives the email and as a result is forced to contact support.
This company has a fairly decent support structure in place, is using a well-known CRM ticketing system and has a custom-built cloud-based UI for supporting the software itself. The company has an outsourced call center as level 1 support and a couple of designated internal support engineers as level 2. Level 3 is the main development team.
Day 1
The call center answers the phone, gets the customer’s info and finds their account on the system. They find the user’s record but the only info they have on the screen is the name, email, phone number and if the account is locked or not. Sure enough, the account is locked, but since the customer can’t remember the password, unlocking the account here isn’t going to do much good. This one will have to go to L2. The call center worker tells the customer that they need to escalate the issue and that they’ll get back to them. The ticket gets escalated to L2 and dropped into the queue.
One thing to note here in this example — there are no automated responses set up on this ticketing system — so the customer is none the wiser as to the status of this ticket and has no case number or email thread to use to follow up with support.
Day 2
Since the L2 team is in a different time zone to the call center, a full day is lost before the ticket is picked up, and the customer has been given no update. L2 do have access to the logs and know that the system logs an entry whenever an email gets sent. They know roughly when the customer tried the password reset, and they know from experience that the log entry for the password reset action is simply PasswordReset. Unfortunately, here’s what they see in the log file when they search for that operation in the time period L1 has provided:
2021-03-23T23:14:22+00:00 PasswordReset userId: 23423334 email: **********@gmail.com
2021-03-23T23:14:32+00:00 PasswordReset userId: 24323453 email: **********@gmail.com
2021-03-23T23:14:44+00:00 PasswordReset userId: 56456456 email: **********@gmail.com
2021-03-23T23:14:45+00:00 PasswordReset userId: 34534543 email: **********@gmail.com
+100…….
As it turns out, the company hired a security expert to assess their system a few months back, who told them that since they had European customers, they had to mask email addresses. Under the new GDPR rules, email addresses are personally identifiable information (PII: any data that can be used to identify a specific individual), and to avoid any issues with EU regulations they shouldn’t be printed in the logs.
Now the L2 team has hundreds of log lines and no easy way of linking the user to the log. They need to connect to the DB and use SQL to find the user’s ID based off the email address raised in the ticket. Unfortunately, the documentation hasn’t been updated and L2 didn’t get any training on how the DB is structured and can’t find the table where this user’s information is saved.
This now needs to get escalated to the developers. Since the developers use a different ticketing system to the support team, a new ticket has to be created there and it gets dropped into the development team queue.
Day 3
The ticket doesn’t get picked up by developers.
Remember when we said there was no automated response? Well, while this is going through all of the support and development teams, the customer has been ringing up trying to find out information. L1 finally sends them an email with a ticket reference so that they can chase up again, but can’t provide them with any further updates. Customer is understandably upset at this stage.
Day 4
Customer has begun to get very annoyed and the ticket is escalated to the support team lead. The development team doesn’t have an on-call/support schedule, so usually nobody takes ownership of support tickets until they are told to do so. Support lead contacts the lead developer to let them know about the escalation and it gets assigned to someone from the dev team.
They spend some time connecting to the DB, find the user ID, trace the log from when the issue is occurring and can see that there is a 500 internal server error happening in the log files a few entries after the log claiming the user has clicked the reset button.
After a few hours of digging they find out that the email entry field on the reset password form doesn’t support special characters, and the user’s email address contains an apostrophe (') as part of their surname.
A fix needs to be implemented for this, but can’t be released as it’s now a Friday and the company has a policy that unless there is a major outage or platform breaking issue, new code can’t be pushed to production on Fridays. The developer lets the support team know that this will be fixed the following Monday and the customer should try the button again once it is.
L2 notifies the customer that this issue will be fixed the following Monday.
Day 7
Support gets the all clear from the development team that the fix is out, and L2 contacts the customer directly to let them know. The customer can finally reset their password, log in and view their account, but the entire process leaves a bad taste in their mouth.
Looking at this from a customer lens, this should have been a simple fix and having to wait for a week to get logged into the software could very well mean that this customer won’t continue to use the software.
OK, now comes the fun part. Let’s break down this example, find the places where key aspects of this support experience could be improved, and point out the gaps where supporting the software was either not thought of or simply ignored for one reason or another. Where possible, I’ll also add the amount of time that would have been shaved off the customer’s 7-day turnaround time.
Issue Breakdown
Any account administration action that a customer might need to take in the system should also be available in the support UI. Support should be able to send a password reset on behalf of the customer. Another example would be the ability to resend the email confirmation when a user creates an account. Time Saved (TS): 7 days (issue resolved on the initial phone call)
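As a rough illustration of this first point, a support-triggered password reset could look something like the sketch below. Everything in it is hypothetical: ResetRequest, request_reset_on_behalf and the one-hour expiry are assumptions made for the example, not part of any real product. The idea is simply that the support UI can start the same reset flow the customer would, while recording which agent triggered it.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class ResetRequest:
    """A pending password reset (hypothetical record for this sketch)."""
    user_id: str
    token: str
    expires_at: datetime
    requested_by: str  # support agent who triggered the reset, kept for auditing


def request_reset_on_behalf(
    user_id: str,
    agent_id: str,
    now: Optional[datetime] = None,
    ttl: timedelta = timedelta(hours=1),  # assumed expiry window
) -> ResetRequest:
    """Create a reset request for a customer, triggered by a support agent."""
    issued_at = now or datetime.now(timezone.utc)
    return ResetRequest(
        user_id=user_id,
        token=secrets.token_urlsafe(32),  # unguessable, URL-safe token
        expires_at=issued_at + ttl,
        requested_by=agent_id,
    )


# Example: an L1 agent triggers the reset during the first phone call.
issued = datetime(2021, 3, 23, 23, 30, tzinfo=timezone.utc)
reset = request_reset_on_behalf("56456456", agent_id="l1-agent-07", now=issued)
```

In a real system the token would be stored and emailed by the existing reset flow; the point here is only that the action is available to support and leaves a trace of who used it.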
Showing IDs for key records in the Support UI. Having these IDs shown to support helps when searching the logs. In this case, if the User ID had been shown on the user’s account page, support would have been able to search the logs for that ID, which is not PII and does not have to be hidden. TS: 2–3 days
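To make this concrete, here is a minimal sketch of what searching by user ID could look like once support has that ID. It assumes the log lines follow the format from the example earlier in this post; the names LOG_LINE, LogEntry and find_entries are made up for illustration.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple

# Matches the sample format from the example:
# <ISO-8601 timestamp> <operation> userId: <digits> email: <masked email>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<operation>\w+)\s+"
    r"userId:\s*(?P<user_id>\d+)\s+email:\s*(?P<email>\S+)$"
)


class LogEntry(NamedTuple):
    timestamp: datetime
    operation: str
    user_id: str
    email: str  # still masked; the user ID is what links the entry to the customer


def find_entries(lines: Iterable[str], user_id: str, operation: str) -> Iterator[LogEntry]:
    """Yield the entries for one user and one operation, skipping unparseable lines."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue
        if match["user_id"] == user_id and match["operation"] == operation:
            yield LogEntry(
                datetime.fromisoformat(match["timestamp"]),
                match["operation"],
                match["user_id"],
                match["email"],
            )


SAMPLE_LOG = """\
2021-03-23T23:14:22+00:00 PasswordReset userId: 23423334 email: **********@gmail.com
2021-03-23T23:14:32+00:00 PasswordReset userId: 24323453 email: **********@gmail.com
2021-03-23T23:14:44+00:00 PasswordReset userId: 56456456 email: **********@gmail.com
2021-03-23T23:14:45+00:00 PasswordReset userId: 34534543 email: **********@gmail.com
"""

matches = list(find_entries(SAMPLE_LOG.splitlines(), "56456456", "PasswordReset"))
```

With the ID in hand, hundreds of masked lines collapse to the one entry that matters, and nobody has to touch the DB to get there.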
Audit Logging — If L1 don’t have direct access to the logs, then the support UI should have some sort of a logging UI built in, where support can directly see the main logs that are being generated from that user’s account. TS: 1–2 days
Automated Responses: Automatically sending the customer an email on ticket creation, one that they can reply to, will not only reduce the number of tickets created for the same issue; it can also be used for many other purposes that improve the support experience. TS: 1 day (the ticket may have been escalated earlier)
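A minimal sketch of such an acknowledgement, built with Python's standard email library, might look like this. The function name, the support address and the wording of the message are all assumptions for the example; most ticketing systems offer this as a built-in setting, so treat it as an illustration of what the customer should receive rather than something you would necessarily write yourself.

```python
from email.message import EmailMessage


def build_acknowledgement(
    ticket_id: str,
    customer_email: str,
    summary: str,
    support_address: str = "support@example.com",  # hypothetical address
) -> EmailMessage:
    """Build the automatic reply sent as soon as a ticket is created."""
    message = EmailMessage()
    message["From"] = support_address
    message["To"] = customer_email
    # Replies go back to support, so the customer has a thread to follow up on.
    message["Reply-To"] = support_address
    # The ticket reference in the subject gives the customer a case number to quote.
    message["Subject"] = f"[Ticket {ticket_id}] We received your request: {summary}"
    message.set_content(
        f"Thanks for contacting support. Your ticket reference is {ticket_id}.\n"
        "You can reply to this email at any time to add information "
        "or ask for an update."
    )
    return message


ack = build_acknowledgement("T-1001", "customer@example.com", "Password reset email not received")
```

Actually sending the message (for example with smtplib) is left out here, since that part depends entirely on your own mail setup.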
Improved Logging: Many parts of this logging could be changed. Correlation IDs could have been used along with a JSON logging format, making it much easier to link the 500 error to the user ID and vice versa. TS: 1–2 days
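Here is one way this could be sketched with Python's standard logging module. It is only an illustration under a few assumptions: the field names, the JsonFormatter class and the handle_password_reset function are invented for the example, and a real service would set the correlation ID once per incoming request, typically in middleware.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the correlation ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
            # Only present when the caller passes extra={"user_id": ...}.
            "user_id": getattr(record, "user_id", None),
        })


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("support_demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def handle_password_reset(logger: logging.Logger, user_id: str) -> None:
    """Simulate the failing request from the example scenario."""
    correlation_id.set(uuid.uuid4().hex)  # one ID for the whole request
    logger.info("PasswordReset", extra={"user_id": user_id})
    # The error is logged elsewhere without the user ID, but shares the correlation ID.
    logger.error("500 Internal Server Error")


stream = io.StringIO()
handle_password_reset(build_logger(stream), "56456456")
records = [json.loads(line) for line in stream.getvalue().splitlines()]
```

Both entries carry the same correlation_id, so finding the PasswordReset line for a user leads straight to the 500 error from the same request, and starting from the error leads back to the user.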
Development Impact Analysis — creating an impact analysis checklist that is filled out for each new feature helps the development team dig deeper into their proposed solution and see possible problems before they begin development. A section for support will help avoid a lot of issues where support may not be considered during the development of a feature. “Will the logging be changed?”, “Will Support have to re-create logging regex for saved searches or alerts?”, “What are our error messages?” are some examples of questions that should be asked.
A number of the above solutions would have completely negated the need for the support member to connect to the DB and query the user data. Requiring anyone to connect to the DB simply creates a barrier to support and means that a. support members will have to have DB access and b. support members will need to know SQL. If you have a call center in a different country or on a different network, this will likely mean that all issues requiring DB access have to immediately bypass L1 (and possibly L2). TS: 2–3 days if the data was available in the UI and not only in the DB
Are you still with me? These were the easiest examples of solutions to some of the more common themes. In the following posts, I’ll go into individual themes in depth, so you can see where they could help your team implement some support driven development.
You can subscribe through any of the page’s social media links to be notified when my next posts go live.
The next topic in this series is the importance and uses of automated responses.