Building AI products - behind the scenes
We talk to Lawrence Jones, founding engineer at incident.io, about AI engineering: hiring, shipping and treating AI like a human counterpart.
I’m running a hands-on Hiring Sprint with Zeshaan (ex-Onfido Director of Talent) on June 6th. If you’ve just raised money and need to make your first hires, we’ll help you build your hiring playbook—job specs, scorecards, interview flows, and onboarding plans—in a day.
AI is redefining what we expect from humans.
“Did you create that with AI?” might once have been considered an insult. These days, if you're not thinking about how AI can scale what your product's users can do, you're falling behind.
AI engineers are at the forefront of this type of applied AI. Essentially, they use code to augment (and even replace) humans.
“
There are levels to AI engineering, but often you're trying to replace a human expert's work. It’s not just an LLM call, it’s a bunch of them feeding into one another. That system's complexity means you can no longer test it as you could with non-AI products. This is what creates the role of AI engineering.
Lawrence is a founding engineer at incident.io and has spent most of the last year building agentic-AI features into their incident-response product. We’re excited to speak to him because of his experience transitioning into AI engineering.
We discuss:
Hiring AI engineers.
How to build AI products that consistently deliver.
The practicalities of building agentic products in 2025.
Hiring AI engineers
Finding candidates who are a good fit is an interesting challenge when a new role, like AI engineer, emerges. It’s not simply keyword matching on prior jobs and experience. You have to look for signals in the application and interview process that you wouldn’t usually need to.
“
I’ve been quite impressed when candidates have built a UI or a new way of communicating with an LLM as a side project. Sure, there’s the risk that it’s been vibe-coded into existence, but that’s relatively easy to detect in an interview process. When they’re excited about AI to the point they’ve been tinkering with it, that’s the signal I’m usually looking for.
Incident deeply cares about product thinking. Their engineers are expected to speak to customers and determine their needs. Their AI engineers are no different:
“
It’s things like ensuring that when an AI proposes specific actions in response to an incident, there’s a human in the loop when it matters. That’s critical for a product like ours, so we check for this understanding in interviews.
If you think about an engineer dealing with an incident (website downtime, critical bug, etc.), the presence and timing of the “human in the loop” are crucial.
Equally important is having a way of quantifying how well a large language model is performing.
This is typically done through an eval dataset containing input/output pairs. The model's outputs are compared against the expected results to validate correctness. For example, inputs could be questions, outputs could be the model's answers, and the eval would check whether each answer matches the expected one:
An example eval
(Q: Capital of Germany → Expected answer: Berlin → Answered: Berlin → Correct).
The key to generating these eval datasets is setting up foundational AI infrastructure, which Incident has invested heavily in. They look for engineers with the mindset to thoroughly inspect the data and determine whether or not the AI is improving.
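The eval described above can be sketched in a few lines of Python. This is a minimal illustration, not Incident's actual infrastructure; the `run_model` function is a hypothetical stand-in for whatever LLM call the product makes:

```python
# Minimal eval harness: compare model outputs against expected answers.
# `run_model` is a hypothetical placeholder for the real LLM call.
def run_model(question: str) -> str:
    # Placeholder: a real implementation would call an LLM here.
    canned = {"Capital of Germany": "Berlin"}
    return canned.get(question, "unknown")

def run_evals(dataset: list[dict]) -> float:
    """Return the fraction of cases where the model matched the expected answer."""
    correct = 0
    for case in dataset:
        answer = run_model(case["input"])
        if answer.strip().lower() == case["expected"].strip().lower():
            correct += 1
    return correct / len(dataset)

dataset = [
    {"input": "Capital of Germany", "expected": "Berlin"},
]
print(run_evals(dataset))  # 1.0
```

Real eval suites are rarely this simple (exact string matching breaks down for free-form answers, so graders or LLM-as-judge scoring are common), but the shape is the same: inputs, expected outputs, and a score you can track over time.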
How to build AI products that consistently deliver
There are a lot of AI products that seem impressive in a demo but fail to meet expectations in practice. Incident’s product relies on delivering value during incidents, with little room for error. Lawrence explains what it takes to provide this experience:
“
We have a scorecard, which we apply to different parts of the product. It contains several key metrics and a load of diagnostic information. As an AI engineer, someone would take a particular set of requests on which the AI performs poorly and iterate on the solution until the evals start improving. This type of work is very different from product engineering.
This is the crucial work required for an AI product to deliver value consistently.
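A scorecard like the one Lawrence describes can be sketched as an aggregation over eval results: a few key metrics, plus the worst-performing requests surfaced for an engineer to iterate on. The field names and scoring here are illustrative assumptions, not Incident's real schema:

```python
# Sketch of a scorecard: aggregate eval results into key metrics and
# surface the worst-performing requests for an engineer to iterate on.
# Field names and scoring are illustrative, not Incident's real schema.
from dataclasses import dataclass

@dataclass
class EvalResult:
    request_id: str
    score: float      # 0.0 (wrong) to 1.0 (perfect), e.g. from a grader
    latency_ms: int

def scorecard(results: list[EvalResult], worst_n: int = 3) -> dict:
    avg_score = sum(r.score for r in results) / len(results)
    avg_latency = sum(r.latency_ms for r in results) / len(results)
    worst = sorted(results, key=lambda r: r.score)[:worst_n]
    return {
        "avg_score": avg_score,
        "avg_latency_ms": avg_latency,
        "worst_cases": [r.request_id for r in worst],
    }

results = [
    EvalResult("req-1", 1.0, 820),
    EvalResult("req-2", 0.2, 1430),
    EvalResult("req-3", 0.9, 760),
]
card = scorecard(results, worst_n=1)
print(card["worst_cases"])  # ['req-2']
```

The point of the `worst_cases` list is the workflow Lawrence describes: pick the requests where the AI performs poorly, change the prompt or pipeline, and re-run the evals until the metrics move.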
In Incident’s case, this has been a key strategic priority for the product. They keep tight feedback loops with customers and diligently maintain their eval datasets.
It's not easy, but their ambition isn’t either: they aim to use agentic systems to 10x the productivity of an incident responder, actively aiming to replace some of the human work required in an incident. It’s less about eliminating the role and more about reimagining where someone spends time dealing with an incident effectively.
The practicalities of building agentic products in 2025
In addition to the “human in the loop” and “treating AI as a human” points mentioned above, Lawrence discusses the need for citations to provide trust.
“
When an AI product makes a suggestion, users need to be able to unpack why the product came to that conclusion. They need to see the raw evidence, i.e. the inputs. We’re facing this challenge right now: how to package this information in a way the user understands. So it’s more than just getting the product to make the right recommendations; it’s also about ensuring it leaves a trail the user can follow.
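One way to think about "leaving a trail" is to make citations a first-class part of every suggestion's data model. The shapes below are a hypothetical sketch, not Incident's actual schema:

```python
# Sketch of attaching citations to an AI suggestion so users can trace
# the conclusion back to its raw evidence. Shapes are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Citation:
    source: str   # e.g. a deploy record or metrics dashboard reference
    excerpt: str  # the raw evidence shown to the user

@dataclass
class Suggestion:
    text: str
    citations: list[Citation] = field(default_factory=list)

suggestion = Suggestion(
    text="Roll back deploy 4213; the error rate spiked right after it shipped.",
    citations=[
        Citation("deploys/4213", "Deployed at 14:02 UTC"),
        Citation("metrics/error-rate", "5xx rate rose from 0.1% to 4% at 14:03 UTC"),
    ],
)

# Render the suggestion with its evidence trail.
print(suggestion.text)
for c in suggestion.citations:
    print(f"- {c.source}: {c.excerpt}")
```

Keeping citations in the data model rather than bolting them onto the UI means every suggestion can be audited, which is exactly the trust property the quote above is after.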
I’m curious to understand the role of a PM in an organisation with product engineers and AI engineers. We’ve previously written about whether AI will kill product management, and ultimately concluded that the future would have more, not fewer, PMs.
“
In our case, our PMs are thinking further into the future. If our product engineers are focused on the here and now, our PMs typically think about where we might get to next year. How will we package up what we’re building now to customers in a year? The building is primarily on the AI engineers, but how to position and market the product is more of a PM’s remit.
Even in a “product engineering first” company, PMs are safe for now.
Wrap up
Building agentic products is not easy. You need to hire engineers with the right skillsets, set up the relevant infrastructure, and be prepared to put in the work to achieve reliability. There is no easy fix, but the uphill struggle is worthwhile.
Agentic products allow us to rethink what humans can focus their time on. For an incident responder, this means spending less time on menial and administrative tasks and more time dealing with a critical incident.
This is a good analogy for the wider workforce and a future we should be running towards, not away from. “Disrupt or be disrupted” is no longer just a cliché for tech products; it’s how we should all think about our work.