AI Show & Tell

Arjun, CEO of ResiDesk

Hacking legacy systems with AI, saving 20-30 man-hours a week with clever automations, and why overthinking AI can be your biggest downfall

Arjun, CEO of ResiDesk, shares his insights on implementing AI workflows in a business context. ResiDesk powers over 1,000,000 AI-driven chats per month, with AI integrated seamlessly into various aspects of their product.

Insights

Key Takeaways

  • Model Strategy: ResiDesk uses a best-of-breed approach, leveraging different models for specific tasks. They favor Gemini 1.5 Flash for its cost-effectiveness and latency advantages. OpenAI models excel at analysis, Anthropic models are preferred for writing, and Gemini models are best for information retrieval ("detectives"). They use Claude to generate code for easy model switching and testing.
  • Prompt Strategy: ResiDesk relies on Anthropic's prompt generator (Claude) and structures prompts using XML for cross-model compatibility. They prioritize stuffing context into prompts, anticipating future improvements in prompt caching and batching. Prompt caching yields approximately 50% savings, with OpenAI's implementation proving more reliable than Anthropic's.
  • Human-in-the-Loop Approach: ResiDesk views AI as a tool to enhance workflows rather than fully automate them. They aim to shorten the time it takes to complete tasks, not eliminate human involvement entirely. This allows for faster iteration and a higher margin for error.
  • Retool + Command Bar: ResiDesk utilizes Command Bar (now Command AI) for in-app shortcuts and automations, Retool for embedded applications and integrations, Front as their email client, and Make (formerly Integromat) for workflow automation and connecting AI systems. Make is highlighted as an underrated tool.
  • Behind-the-Scenes AI: AI is used in various behind-the-scenes processes, including lease document processing, payment reconciliation, and generating summaries of resident conversations for escalation to site teams.
  • Multiple Options Approach: Presenting multiple AI-generated options to human agents, particularly in nuanced resident conversations, proves more effective than relying on the AI to select the single best response.
  • Risk Level Assessment: AI is used to determine the risk level of conversations, using a five-tier framework developed through iterative refinement with an AI "sparring partner" (Claude). They prioritize avoiding false negatives over false positives in risk assessment.
  • Call Transcript Analysis: Claude is used to analyze call transcripts and generate concise summaries and emails in a desired writing style. The "sparring partner" approach helps refine and polish these outputs.
  • Mermaid Diagrams for Workflows: Claude is used to generate Mermaid diagrams from textual descriptions of workflows. These diagrams are then used for documentation, process analysis, and QA purposes, ensuring adherence to established procedures.
  • Unstructured to Structured Data: A Retool application allows the team to upload unstructured data like lease documents and PDFs and convert them into structured FAQs, improving information retrieval and response generation. This has significantly increased their first-day response rate to resident inquiries.
  • Payment Processing Automation: Make is used to automate the complex process of reconciling payments made through one-time-use credit cards sent by a third-party clearing house.

Short Lessons Learned

  • Focus on specific use cases: Don't try to "throw AI" at your business; identify concrete problems where AI can provide value.
  • Embrace a human-in-the-loop mindset: AI is a tool to enhance human capabilities, not replace them entirely.
  • Iterate and refine: AI workflows require continuous improvement and adjustment. Use feedback and data to optimize your prompts and processes.
  • Start small and scale up: Pilot AI projects with a few key individuals and then expand based on their success.
  • Empower your team: Give your employees the tools and resources they need to experiment with and utilize AI effectively.

Frameworks

Best-of-Breed Model Strategy

  • Choose horses for courses: Select different AI models optimized for specific tasks (analysis, writing, information retrieval, etc.).
  • Evaluate and monitor: Continuously assess model performance and adjust your strategy accordingly.
  • Flexibility and optimization: Gain advantages over a one-size-fits-all model approach.

Human-in-the-Loop Workflow Enhancement

  • Enhance, don't fully automate: Use AI to improve speed and efficiency, but retain human oversight.
  • Shorten the commute: Focus on reducing the time it takes to complete tasks, not eliminating human involvement.
  • Faster iteration: Allows for more rapid experimentation and improvement of AI workflows.

Bottom-Up AI Adoption Playbook

  • Empower individuals: Give employees access to AI tools and encourage them to find relevant use cases.
  • Start small, solve concrete problems: Focus on practical applications that resonate with individual employees.
  • Build internal champions: Showcase early successes to inspire wider adoption and generate organic enthusiasm.

Transcript

00:00:00 Arjun: We favor Gemini 1.5 Flash over everything else. We found that the OpenAI models tend to be the best analysts. The Anthropic models tend to be the best writers, and Gemini models broadly tend to be the best detectives. We trust the Anthropic prompt generator by a lot, and we're making the bet that it's gonna get easier and easier with prompt caching, batching inside all of this. So everything below escalate is actually a Retool app that is embedded within my actual code. So this is happening hundreds and hundreds of times a day for every one of the customers that we support. When you wanna ask an LLM to do something, you better have a pretty damn good idea what good looks like for yourself.

00:00:40 Arjun: Talking about how to put AI in my business is about as broad and silly as talking about how do I use Excel in my business.

00:00:58 Greg: That's Arjun, CEO of ResiDesk. They power over 1,000,000 AI chats per month. Their clients interact with AI in almost every single part of their products, but they don't know it. In this interview, he shows me how to create business-grade prompts using Claude's Workbench, how he saves hundreds of hours on development using Retool and Command Bar, and why he favors Gemini 1.5 Flash over any other model. Let's get into show and tell. So, Arjun, I sent out this Twitter thread asking who are the scrappiest LLM operators I knew, and you were mentioned more than anybody else in the entire thing. So I was super excited to get on the phone with you.

00:01:34 Greg: You told me about a wild stat about ResiDesk, about how many chats you're processing with AI per month. And so I'd love to hear about that as well.

00:01:42 Arjun: Yeah. For sure. So I'm Arjun. I'm the cofounder of ResiDesk. On average today, we process about a couple of million conversations per month with residents over text across the US and Canada. And so each of those is being processed by our systems, LLMs that are looking at everything we know about the property to create the appropriate responses and then helping our human team in the loop to have better conversations with residents. If you've ever rented before, you know it kinda sucks because your landlord doesn't care about you. That's what we're trying to change.

00:02:16 Greg: That is so wild. So when I think of other businesses doing a couple million conversations per month, I'm thinking of the huge people that literally do chatbots as a service, like the intercoms. What's your model strategy? Like, who are you paying a ton of money to, or are you running your own open source, or how do you think about that?

00:02:33 Arjun: We're not paying a ton of money to anybody, but our model strategy is sort of just best of breed. So we use a mix of models across all the closed-source providers. We have some stuff that we're testing with Llama on the side. But broadly, it's a mix of, you know, OpenAI, Anthropic, and Google are the big ones. And what we have learned right now, I would say we favor Gemini 1.5 Flash over everything else, which is how we also manage our costs and latency by a pretty good margin. But, like, what we found is that we don't think of it as, like, one model for everything. We think about it as there's a ton of different workflows.

00:03:17 Arjun: We'll get into some of those as we talk through. But, like, we found that the OpenAI models tend to be the best analysts. Mhmm. The Anthropic models tend to be the best writers, and the Gemini models broadly tend to be the best detectives, if you wanna think about it that way. Right? Like, they're great at finding needles in haystacks when looking at large contexts. And so that's our general framework. But then from a tactical standpoint, if you ask me how I did this, it is literally I had Claude spit out a bunch of code that lets me switch models on the fly by just tweaking settings in my database. And so we're always testing new models on some subset of our conversations Mhmm.

00:03:58 Arjun: And then trying to figure out what's best for what without trying to dive too much into the rabbit hole of constantly tweaking and testing what is ultimately a nondeterministic thing.
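
A minimal sketch of the switch-models-by-settings pattern Arjun describes, assuming a hypothetical model_routes table; ResiDesk's actual Claude-generated code isn't shown in the interview:

```python
import sqlite3

# Hypothetical settings table, editable without a deploy:
#   workflow        provider     model
#   "summarize"     "google"     "gemini-1.5-flash"
#   "analyze"       "openai"     "gpt-4o"
#   "draft_reply"   "anthropic"  "claude-3-5-sonnet-latest"

def get_route(db: sqlite3.Connection, workflow: str) -> tuple[str, str]:
    """Look up which provider/model this workflow should use right now."""
    row = db.execute(
        "SELECT provider, model FROM model_routes WHERE workflow = ?",
        (workflow,),
    ).fetchone()
    if row is None:
        return ("google", "gemini-1.5-flash")  # cheap, low-latency default
    return (row[0], row[1])

def complete(db: sqlite3.Connection, workflow: str, prompt: str) -> str:
    # Dispatch to the matching SDK; assumes provider API keys are set in the
    # environment. Each branch is the provider's standard chat call.
    provider, model = get_route(db, workflow)
    if provider == "openai":
        from openai import OpenAI
        resp = OpenAI().chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content
    if provider == "anthropic":
        import anthropic
        resp = anthropic.Anthropic().messages.create(
            model=model, max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text
    import google.generativeai as genai
    return genai.GenerativeModel(model).generate_content(prompt).text
```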

00:04:08 Greg: Sure. Well okay. And so that makes sense on the model side. The other part that's gonna get complicated very quickly is the prompt strategy. So, yes, there's the actual prompts that you do, but how do you do prompt versioning or, like, prompt testing or orchestration of your prompts? What do you all do there?

00:04:24 Arjun: The simple answer is that we just trust the Anthropic prompt generator by a lot. Right? So, basically, our prompt strategy is dictated by whatever Claude thinks is a good prompt. Sure. We have tons of structure in our prompts too. And there's a couple of maybe deeper learnings that are fairly hard earned. I think what we found is that if you can manage your token budget decently well, the Anthropic strategy of everything is XML works quite well across any model. Mhmm. And then it's gotten much easier now because everybody offers some form of prompt caching as well.

00:05:08 Arjun: But even before that, we were all in on XML and structured inputs and outputs. And then more broadly, we stuff a lot of context into the prompts is, I think, the easiest way to think about it. And we're making the bet that it's gonna get easier and easier with prompt caching, batching, etcetera, to manage stuffing the prompt on the input side than to try and break things into multiple workflows. Right? Like, I always like the Sam Altman thing of just keep betting that the models and the interfaces are gonna get better.
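
A minimal sketch of the everything-is-XML structure he's describing, with illustrative tag names (not ResiDesk's actual schema); the same string can be sent to OpenAI, Anthropic, or Gemini models, with the static sections up front so prompt caching can reuse them:

```python
def build_prompt(property_context: str, history: str, message: str) -> str:
    # XML-tagged sections per Anthropic's prompting conventions; the same
    # structure parses fine across providers.
    return f"""
<property_context>
{property_context}
</property_context>

<conversation_history>
{history}
</conversation_history>

<new_message>
{message}
</new_message>

<instructions>
Draft a reply to the resident's new message. Use only facts found in
<property_context>. Answer inside <reply> tags.
</instructions>
""".strip()
```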

00:05:42 Greg: Sure. Sure. That makes sense. Out of curiosity, have you all done the math on how much you're actually saving on prompt caching? I'm always curious to hear because, like, Anthropic, they market it's up to 90%, but then in actuality, it's not actually 90%. It's more like in the 30 or 40 range I've heard from other folks. What are you all at?

00:05:58 Arjun: About half.

00:06:00 Greg: I mean, still, 50% is, like, pretty insane.

00:06:03 Arjun: Yeah. I would say we see more savings, funny enough, on GPT than we do on Claude. I think whatever OpenAI launched in the last few weeks, whatever they're doing out of the box on their back end, is working much better. Yeah. Claude is, we find, a little bit finicky with the prompt caching, and we've actually seen sort of mixed results. So it's 50 on a good day, but it can be 30 on a bad day.

00:06:31 Greg: Yeah. Oh, 30. Okay. Yeah. That makes sense. Maybe it's because OpenAI, you don't have to do any extra work. You just literally send them the same prompt and they take care of it for you. But on Claude, you have to do the breakpoint, which is kind of not annoying per se. Well, it's extra overhead, so it's annoying, but it's just more management you have to think about. Yeah. Yeah.

00:06:47 Arjun: That's right. So I think it is variable, but it's certainly not 90%. Not by a long shot.
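
For reference, a sketch of the breakpoint difference they're describing, using the Anthropic SDK's cache_control marker on a large static system block (assuming a recent SDK version where prompt caching is available); on OpenAI's side, resending the same long prompt prefix is enough:

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY in the environment

# A large, stable block (property context, guidelines, examples) that every
# request shares; only the user turn below it changes.
STATIC_CONTEXT = "<property_context>...many thousands of tokens...</property_context>"

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": STATIC_CONTEXT,
            # The explicit breakpoint Greg mentions: everything up to and
            # including this block is cached, then re-read at a discount.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize this property's pet policy."}],
)
# usage reports cache_creation_input_tokens on the first call and
# cache_read_input_tokens on later calls that share the same prefix.
print(response.usage)

# OpenAI needs no markup: resend an identical prompt prefix (roughly 1,024+
# tokens) and caching is applied automatically on the server side.
```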

00:06:53 Greg: Sure. Sure. Sure. Yeah. Totally. Okay.

00:06:55 Arjun: At least not yet.

00:06:56 Greg: Yeah. For sure. And so I was gonna say, one of the other things from our previous conversations that attracted me to your background a ton was that you have a unique mindset as a business owner to kind of go Sherlock Holmes, AI-style, into your business, find different workflows that need automating, and, like, actually execute against them. So before we dive into specifics, and I know we'll get into that in a second, but, like, what's the mindset that you have as you're looking at your team and you're looking at AI, and you're like, okay. I know that there's something here. How do you go find out what they need?

00:07:28 Arjun: Maybe one thing that I've noticed about me and my cofounders and just broadly at ResiDesk that I think is slightly different from other founders working with AI I've seen is we don't care about AI doing the whole thing. Right? Like, our business is built with the model of a human in the loop. Mhmm. And so we think of AI almost as shortening the commute between starting a task and finishing a task, not so much as automating something. So I am okay with, you know, if I have a 5-step workflow, I'm okay with AI doing 80%, you know, going from 10% to 90% on each of the 5 steps

00:08:06 Greg: Sure.

00:08:06 Arjun: As opposed to trying to figure out how can I automate the whole thing? And I think that frees you up. It makes you look at AI in a whole different way. It's just about saving me time and not so much about automating the task. And I think that's maybe a little bit different than people looking to just automate things with AI.

00:08:28 Greg: Totally. And to take the self-driving car metaphor, it's like you're cool if the FSD wants to drive on the freeways, but you'll still drive it in the cities, in certain parts.

00:08:37 Arjun: That's exactly right. Yeah.

00:08:39 Greg: Yeah. Yeah. Okay. Well, I love that. So let's get a little tactical here. So before we dive into some demos, because I know you have some really cool demos to show us, I would love to hear about what do your customers not see that's AI enabled?

00:08:54 Arjun: So everything from when we sell into a property. Right? Like, how they set us up, sending us their lease documents, and turning that into a questionnaire. What somebody else would do as a big, like, training step on the AI, they just email us, and then we take care of the rest in terms of turning it into structured data. When they pay us, we live in an industry that still pays with paper checks. Right? Like, I scan the receipts in the scanner, and then I have AI that reconciles those payments. Because if I sign a contract with Greg's property management company, I'm actually getting paid by all 40 of Greg's properties.

00:09:30 Arjun: And so I have a diffusion problem that I'm then solving on the other end. When we have conversations that we are escalating back to our customers about something that needs to get fixed in somebody's apartment, or a ticket that we create in their system. Right? They get a diagnosis, but all of that stuff is happening through AI-powered workflows in the background. There's always a human component to it, which is why we have so much margin for error. That's the other thing that I think is unique about us, is I can kinda go ham with AI stuff, because I know there is always a human in the loop to backstop anything I'm not doing well.

00:10:08 Arjun: And because of that, I can cycle through a bunch of iterations much faster than anybody else can.

00:10:12 Greg: Sure. Sure. And what sort of tool stack do you have that makes it so easy to do human-in-the-loop stuff? Because that's not easy to, like, orchestrate and integrate into existing tools. And so what do y'all do?

00:10:22 Arjun: We make heavy use of the drop-in workflows from tools like Command Bar, now Command AI, that was recently bought by Amplitude, and a bunch of Retool embeds. And then we have a lot of back-end workflows that are specifically enabled by Front, which is our email client, which is then super pluggable into Make, which then plugs into all of the AI systems. So, again, for our internal teams, they're never really flipping back and forth and going to ChatGPT to use something. I mean, they have the tools. They're encouraged to if they need it. But broadly speaking, we're baking AI into those workflows as a kind of automatic thing.

00:11:02 Greg: Yeah. Yeah. Yeah. So of those tools that you did mention, between Front, Make Mhmm. And the other 2, I'm forgetting what they were. But which one impresses you?

00:11:09 Arjun: Retool and Command Bar.

00:11:11 Greg: That's right. Retool and Command Bar. Which ones do you look at and are just like, holy cow. I'm pretty impressed with the product that they're doing, and I think that it's underrated. More people should be looking at them.

00:11:21 Arjun: So it's very hard to say that Retool is underrated even though that would be my natural answer. Like, Retool is the one that I think I just wouldn't be able to ship code without. Mhmm. And Command Bar is not underrated. Front is not underrated, so I'm gonna go with Make. Right? Like, I think Make lives way too far in Zapier's shadow. I think they're actually a far better product. It's insanely good in the ways that it plugs into things, and you can actually debug workflows that would otherwise be a nightmare to build with any other tool.

00:11:56 Greg: That's interesting. Cool.

00:11:58 Arjun: And it breaches the low-code/no-code barrier enough that I think a person with skills on the engineering side can use it phenomenally well, as can people who are maybe systems-minded but don't have the engineering skills.

00:12:12 Greg: Sure.

00:12:12 Arjun: And I think that's that's kind of a special thing for a tool to achieve.

00:12:16 Greg: Yeah. Yeah. Yeah. Absolutely. Well, I tell you what. You mentioned some of those tools. I would love to dive into some of those tools. And so prior to the interview here, you shared a couple of really cool examples. What do you think about starting with the Command Bar + Retool create-summary-emails one? I'd love to see how you do that.

00:12:31 Arjun: Yeah. Absolutely. So I'm gonna go ahead and share my screen here.

00:12:34 Greg: Beautiful.

00:12:35 Arjun: In a prior world, before we had these integrations, what would happen is we would basically copy this conversation, put it in a transcript Mhmm. Write a summary email, open our email client, and then send it to the team that actually works on-site. Right? And the idea is that, as I said before, the site team interacts with us as if we were a virtual extension of their team. That means we don't give them new tools. They just live in their inbox, and we send things to their inbox when they need to look at it. Right? As you can probably imagine, bunch of context switching, bunch of lookups, lots of room for manual error.

00:13:12 Greg: Sure.

00:13:12 Arjun: And so we wanted to bring all of that into the context of ResiDesk. So the first investment that I'm gonna talk about here is Command Bar. So Command Bar, for anybody who's unfamiliar, is effectively a way to replicate the power-user command-K functionality inside your app. So think of it as opening up a menu of functions that you can actually trigger. Right? And so I press command-K here. It shows me all of the different shortcuts in my application. It can even be context dependent. And the cool thing about it is that you can create these commands both via a JavaScript API as well as a GUI. So, I basically just dropped, you can see all of my actions here because I'm in the admin view.

00:14:01 Arjun: But I just dropped a couple lines of code into my application, and I was already able to set up Command Bar. You can do a lot of cool things with it that we're not gonna get into right now, but you can even create, you know, automations based on, hey. If I hit the shortcut, click this button, then this button, then this button. Super powerful. Right? Cool. And so we have a workflow here that just tells me, here's how I escalate a conversation. Again, our team would actually use the shortcut. I use it much less. And one of the cool things that it does is we can then have our team make judgment calls on a couple of steps that they need to fill in before we can generate the summary.

00:14:45 Arjun: Right? So I now see this is actually a pretty high risk conversation because the resident is really upset. And if we don't fix this, they're not gonna pay their rent on time, which is then a big issue. Right? So I'm gonna say charges on rent. And I'm gonna highlight a couple messages that they talked about. And

00:15:07 Greg: Oh, cool. So it's actually going from the chat and it's actually grabbing all that information so you can pick what's relevant and what's

00:15:15 Arjun: Correct. So because Command Bar is super flexible, right, I had a multistep workflow. My team is picking sort of what they wanna talk about. And then, again, because I can embed Retool apps anywhere in my code, Command Bar ran a bunch of steps, hit an API on my back end that reads through all of this, reads the things that I wanted to highlight, and then gives me a summary of the conversation. This opens a modal. And now what you're looking at inside all of this, so everything below escalate, is actually a Retool app that is embedded within my actual code. Mhmm. Right? And the cool thing about this is Retool Embed, I have now passed in through React a couple of parameters that give it context about what account this is, what customer this is, and what property this belongs to.

00:16:09 Arjun: Retool is then interfacing directly with Front, which has our contact lists and our user lists, which is how we sort of keep them in sync. Picking out the right people that it should go to, and in fact, for some of our customers, we even customize this based on what the topic of the message is. So, you know, payment conversations go to payment teams. Electricity conversations go to the utility teams, whatever that might be. It is auto-generating a subject line. It's auto-generating a conversation summary. And then all my team needs to do, there's a picture of a cat because we're including the entire thing.

00:16:47 Arjun: Nice. But it's looking at the images. It's gonna figure out what I am trying to say and tell us, okay. Incorrect charges on the resident account blah blah blah blah blah. All of this is being generated automatically with AI. And then if I hit escalate, which I'm not gonna do now Uh-huh. This message goes straight to the site team. Again, this is a workflow that before sort of shipping all of this stuff would be copy the conversation, summarize it, put it into a new email, look up who it should go to, put that in, fill out the subject line, send it, and then log in the conversation itself that this happened.

00:17:24 Arjun: So this is happening hundreds and hundreds of times a day for every one of the customers that we support. Right?

00:17:31 Greg: Uh-huh.

00:17:32 Arjun: And by keeping our team in the context of the conversation at all times, I am saving, I think we estimated, on average something like 20 to 30 man-hours a week.

00:17:45 Greg: Wow. Yeah. That's wild. That's absolutely wonderful. Yeah. What are the other top used Command Bar shortcuts on there?

00:17:57 Arjun: So we have a bunch of things. I would say the most important thing is just translating messages. We use that quite a bit, as you can probably imagine. There's also quite a few other things as well. So we can do things like, I'd say, escalating conversations, and looking at common themes is a big one. And so you can sort of navigate through the inbox. So we've basically built this to be a power-user app. Okay. But, again, because we have Command Bar and Retool, you can use this with just your mouse and be okay. But if you're a power user, you can use it well beyond that. The other thing that we have, just like any other prompt engine, is a bunch of AI use cases.

00:18:47 Arjun: Again, this is all just set up with Command Bar. I've set up a bunch of different templates. So you can see it can do all the things that Grammarly can do for you. So, again, I have Grammarly on my machine, but our team can use it without needing to set up a new tool on their machines.

00:19:02 Greg: That's very cool. Do you have stats on what percentage of your company is using, like, AI features within Command Bar?

00:19:09 Arjun: Yeah. I mean, it's about, well, actually, it's a 100%. And I'll explain why, which is all of our team has to use the ResiDesk app to actually talk to residents. And so we don't think about it so much as AI features, more just that it's baked into it. Sure. And I'll explain a couple more things in a second, but, like, actually, no. I'll do that right now because this is also a fun thing. Right? Is if you go back, and we'll get into this, I think, as we talk about, like, thinking about unstructured data and turning it into structured. Sure. I would say our team doesn't need to think too much about AI because what's happening is, k.

00:19:53 Arjun: Another conversation. Here's somebody who's asking if they can make, you know, partial payments. So they don't wanna pay their rent up front. They wanna pay it in a couple installments. Right? We had, at some point, a guideline that we scraped from the property or from one of the responses from the site team about how you can pay your rent through alternative methods. That then is auto-populated, as I'm sure you can see here Mhmm. As a response. So just, again, we draft responses to every message as it comes in based on property context and conversation history. And, like, the way that our team thinks about it is we don't really even talk about things like saying AI, but it's like, okay.

00:20:40 Arjun: Let me think about a couple of different ways to respond to this conversation. I click view all responses. It's like, okay. Do you wanna be a bit more supportive? Do you wanna be a little bit more factual? Different residents require sort of different touches.

00:20:53 Greg: Sure.

00:20:53 Arjun: We try to make it basically a native part of how they think about it more so than trying to get our team to adopt new actions and responses, if that makes sense.
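
A minimal sketch of the view-all-responses idea, assuming a hypothetical tone list: one candidate reply per tone, with the human agent picking (or editing) the last mile.

```python
import anthropic

client = anthropic.Anthropic()

TONES = ["supportive", "factual", "brief and friendly"]  # illustrative set

def draft_options(conversation: str, guideline: str) -> list[str]:
    """Return one candidate reply per tone; a human picks or edits one."""
    options = []
    for tone in TONES:
        resp = client.messages.create(
            model="claude-3-5-sonnet-latest",
            max_tokens=300,
            messages=[{
                "role": "user",
                "content": (
                    f"<conversation>{conversation}</conversation>\n"
                    f"<guideline>{guideline}</guideline>\n"
                    f"Draft a {tone} reply to the resident's last message, "
                    "grounded only in the guideline."
                ),
            }],
        )
        options.append(resp.content[0].text)
    return options
```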

00:21:04 Greg: Yeah. It does make sense. And what's interesting about this is the theme of presenting a few options to the user and then having the user pick which one they want keeps on coming up, like, just around me here. So, like, instead of making the AI go and guess what is the exact response that the person wants, just let the human take it the last mile and figure out which one. So, like, how does that make its way into other parts of your flow as well with regards to multiple options presented?

00:21:31 Arjun: Not quite. I think you saw the 2 main ones, which is, like, having a conversation with the resident is actually the place where you need the most number of judgment calls by a human. So that's where, to keep going with your FSD analogy, I think that's where we would let the driver take the wheel Sure. Just because there's so much nuance to be applied here. And while we trust our models to come up with good answers, they don't always come up with great answers unless you force them to bring in options. Yeah. So it makes the system a bit more fault tolerant in both ways. There are other places where this is not quite true. So you can see here.

00:22:13 Arjun: Right? Like, you can see that we assign a risk level, for example, to every conversation.

00:22:19 Greg: Mhmm.

00:22:19 Arjun: This is fully determined by AI Mhmm. Where it's like, okay. This is a high risk response because it's a rent payment financial concern that requires a timely response. This is all fully generated. So I think you picked up on the theme because I'm also showing you the 2 most external facing workflows that need a human to sign off

00:22:39 Greg: Yeah.

00:22:39 Arjun: Before they go out into the world.

00:22:42 Greg: Totally.

00:22:42 Arjun: But outside of that, we do have it generate options and reasons.

00:22:46 Greg: Mhmm.

00:22:47 Arjun: And we find that that actually works quite well.

00:22:49 Greg: Yeah. Yeah. Totally. Well, on that one specifically, using LLM-as-a-judge more or less to grade the risk level of these things, sometimes that's not as easy as it looks because you need to, like, encode different criteria. And if it meets criteria, then give it a certain level. What was that process like to implement? Was it straightforward, or did it give you a hard time?

00:23:09 Arjun: It took us, I would say, a few months to actually figure it out.

00:23:14 Greg: Mhmm.

00:23:15 Arjun: I think there are a few things that we were struggling with. I think one, it's funny. Right? Like, the one universal lesson is when you wanna ask an LLM to do something, you better have a pretty damn good idea of what good looks like for yourself.

00:23:36 Greg: Uh-huh.

00:23:37 Arjun: We're all like, okay. What is the risk level of the conversation? And you can go really deep down that rabbit hole and realize that there's a different answer for everybody. And so we have a custom GPT called sparring partner, which is basically just built to challenge every idea you throw at it.

00:23:59 Greg: Could we take a look at that one?

00:24:01 Arjun: Yeah.

00:24:01 Greg: That'd be fun to see. You're like

00:24:03 Arjun: Actually, I think we have the latest iteration on Claude now. So let's just go

00:24:07 Greg: Well, so that's interesting. So why move over to Claude instead of the GPT on ChatGPT?

00:24:15 Arjun: Right now, it's just because we have chosen Claude as the blessed tool for the team, but we move around quite a bit.

00:24:25 Greg: I'm curious. How did you tactically make the sparring partner? If it's a project, is it just, like, the custom instructions that go into it?

00:24:31 Arjun: Yeah. Pretty much. So you can see here I'm trying to make a risk framework for conversations. I want it to be binary. Tell me how you think about this. And I'm oversimplifying this. Like, ours were pages and pages. Right? And it's a lot of, like, okay. Clarifying questions. What do you actually mean when you say binary? Blah blah blah. And so we went back and forth through this a bunch of times and effectively came up with a 5-tier framework. Right? Which was, okay. Like, broadly speaking, and it's funny because you can always intuit your way back into this, but it's like, okay. There is a really urgent fire, flood, blood category, and then there's a really not urgent okay-thank-you category.

00:25:24 Greg: Mhmm.

00:25:25 Arjun: And so even if you just went by sort of standard survey design, you would come up with 5 tiers. Then we effectively went back and forth with the Anthropic prompt generator to say, here's what a tier 1, 2, 3, 4, 5 conversation looks like.

00:25:39 Greg: Mhmm.

00:25:39 Arjun: And then it was a bunch of iterations on different models. We looked at a sample of about 1,000 conversations to look at the distribution of scores and why. Mhmm. And then this answer, I don't think anybody's gonna like, but we eyeballed it to see if it intuitively made sense to us. And then you just gotta pick where you're willing to be wrong. Right? Like, so we were okay with overfitting the risk. We were okay with tons of false positives. We just didn't want any false negatives. Mhmm. Because, again, if you're complaining about a leak in your kitchen sink, I wanna know about it.

00:26:17 Arjun: And I'd rather be a little bit more alert

00:26:20 Greg: Mhmm.

00:26:21 Arjun: To problems that aren't actually problems.

00:26:23 Greg: Sure.

00:26:24 Arjun: So once we had that, that gave us sort of the first iteration of our risk framework.
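
A minimal sketch of what such a five-tier classifier could look like, with condensed stand-in tier definitions (the real ones ran pages, per the interview) and the higher-tier tiebreak that trades false positives for no false negatives:

```python
import anthropic

client = anthropic.Anthropic()

RISK_TIERS = """
<tier level="5">fire, flood, blood: immediate danger to people or property</tier>
<tier level="4">urgent habitability or payment issues needing same-day action</tier>
<tier level="3">unresolved maintenance or billing complaints, rising frustration</tier>
<tier level="2">routine requests and questions</tier>
<tier level="1">pleasantries, "okay, thank you"</tier>
"""  # condensed stand-ins, not ResiDesk's actual definitions

def risk_level(conversation: str) -> int:
    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": (
                f"{RISK_TIERS}\n<conversation>{conversation}</conversation>\n"
                "Assign the conversation a risk tier from 1 to 5. If torn "
                "between two tiers, choose the HIGHER one: false positives "
                "are acceptable, false negatives are not. "
                "Reply with the digit only."
            ),
        }],
    )
    return int(resp.content[0].text.strip())
```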

00:26:30 Greg: Cool.

00:26:31 Arjun: And then after that, the process is pretty straightforward. Right? Like, so we look at a sample of maybe a few 1,000 conversations every week and have, well, previously 4o, now o1, go through it and highlight anything that doesn't feel right. And then we also have a human on the engagement team who's responsible for finding conversations that are an N/A on the risk scale and then giving us an opinion on what it should be.

00:27:05 Greg: Sure.

00:27:05 Arjun: And that's how we're going through it. Eventually, we see this as, and we're partly in the process of building this, a much more automated eval framework that will sort of self correct as it goes.

00:27:16 Greg: Yeah. Yeah. Yeah. Yeah. Totally.

00:27:18 Arjun: And look at the last 5 correctly and badly marked conversations.

00:27:22 Greg: Totally. Tell me about the shift to o1 and why you chose it for that one. Because I think a lot of people are still asking, like, I know o1's good. I know that it thinks a lot before it gives me an answer, but, like, what types of tasks do you think it's better at versus what 4o would be good at?

00:27:38 Arjun: The rule of thumb for me is, is this worth overthinking? Right? Like, it's a lot of, like, unstructured stuff like, hey. I just had this really rough conversation with a customer today. They went on about a lot of things they don't like about our process. Here's the transcript of the conversation. Here is the PDF of my website so you have context on what I do. Here's what I am worried they're actually saying. Talk to me about this. That's the kind of stuff that it's fantastic at. Right? Like, it's trying to find a little meaning in fuzziness. But we don't use it for anything in production right now.

00:28:27 Arjun: We find it to be a little bit too much of an overthinker for the types of process we have. And the bias here is that we try to build processes, as you've seen even with that little workflow, that are many small steps that you can compose together. And so we get a lot of bang for the buck out of just breaking down prompts to the simplest possible thing and then giving it to the dumbest possible model.

00:28:52 Greg: Sure. Yeah. Yeah. Yeah. I love that.

00:28:54 Arjun: So it doesn't meander.

00:28:55 Greg: Yeah. That's a workflow I think a lot of people could do. Well, I tell you what. I wanna jump on to at least a few more of these cool suggestions you had. So let's stay on the Claude Projects one. So you're uploading call transcripts and voice notes and, yeah, tell me more about that.

00:29:10 Arjun: We have a project that's basically set up to our writing style. It's not that fancy. It's literally just a Harvard Business Review writing style of bottom line up front, and here's what you say. Mhmm. And then we upload our calls, transcripts, and then go back and forth with it. I will say mostly a lot of it is just for me to say candidly what I think after a call Sure.

00:29:32 Arjun: And then have it sort of polish the thing. Right? Like, everything from, hey. You need to pay for the product or we're gonna shut off service, which I promise you I did not word as nicely as that Sure. To, hey. Like, you might have a bunch of in house things that you think could replace ResiDesk, but here's why our solution is actually better.

00:29:55 Greg: That's interesting. Do you have that project? Like, could you just show us, like, going into the project? Let's see that. Like, even a preview of the system prompt would be really cool to see.

00:30:06 Arjun: So this is something that we did as a one time exercise and then set up as a system prompt. Awesome. So it's pretty straightforward. Right? So this is a conversation that we had with a tough prospect who was trying to build their own solution in house, which often happens. Right? Like, sometimes people are evaluating buy versus build.

00:30:27 Greg: Mhmm.

00:30:27 Arjun: And so the prompt is literally, like, write me an email to somebody who was skeptical about why our tech would beat something they did in house. Here's my draft so far. Ask me clarifying questions. I'm aiming to get a short, sweet draft that encapsulates our technical depth and data moat. This is literally the system prompt.

00:30:45 Greg: Uh-huh.

00:30:46 Arjun: And then at the end, I also have this, which is, I'm attaching the transcript of our call so you get a deeper handle on the issue. And then the only thing that I add to everything is ask me clarifying questions. By doing this and then throwing what I want in there, it's like, okay. Can you provide more details on 1, 2, 3, 4, 5, 6? And then I'm just going back and forth on it conversationally. I'm just throwing everything I have at it, which then gives me a rough email that I can use. Right? And then I'm gonna, like, how can I make these ideas hit harder? And then eventually going back and forth and, like, I don't have enough data to hit harder.

00:31:24 Arjun: Okay? Ask me clarifying questions on each point. I'll provide these insights. Go back and forth. Go back and forth. Go back and forth. And then eventually ended up with, if I scroll down, a fairly quick email. And then I was like, also use my tone, and then I can say something like this. Right? Pretty straightforward thing. I wouldn't do this for everything, but, like, the fact that you can just drop this in there and then think through the LLM interface is insanely good. Because then I can take the same thinking and share it with my team and have them debate it, and so you just get better off for every turn.

00:32:06 Greg: Yeah. Yeah. Yeah. That makes sense. I mean and even just having, like, a brainstorming partner and buddy with that to give you, like, a different perspective is super valuable as well.

00:32:14 Arjun: Yeah. So here's another fun piece, which is something I was working on, and I'm just gonna show you the output of this. But one of the things that we need to put together for some of our due diligence workflows is trying to help people understand how we integrate and when we interact with residents. Right? And so I actually worked with Claude to basically say, here's our, like, general workflows, and we have that outlined as a piece of text, and then had it turn each of them into a Mermaid diagram that I could then, and let's find out. So this is like, okay. Here's what happens. When somebody opens a ticket, we start with, is the issue reported, get the details, does it already exist, create it, update it.

00:33:06 Arjun: Then after that, we start processing it. If it happens, escalate blah blah blah, and then come back. And then when it's closed, we close it. Right? These are all fairly straightforward things that I could describe as code by actually writing the code. Mhmm. But the ability to have a conversational interface that generates these diagrams for me that I can edit in real time was insanely helpful. Right? Like, because we went from having a due diligence request on Monday to having a full presentation on everything on Tuesday, which is not how things usually go.

00:33:43 Greg: No. That's not usually how it goes. With those flows and diagrams, are you training your team on how to do those and so there's, like, no ambiguity? Or what do you use those for?

00:33:51 Arjun: We can feed the Mermaid diagram back into prompts to check after the fact on a conversation if we actually followed the process. Right? And that's actually really fun because I can be like, here's the transcript, and we do this less often than we really should, but it's like, here's the process we should have followed. Here's the process as it played out in the conversation between Greg and us from day 1 to day 30. How close were we? And it's actually something that we're starting to bake into our QA process. So something we've learned, and this comes back to one of those little things that you learn about token efficiency, is if you can describe the process as Mermaid code, then you get a lot of bang for the buck inside the prompt, especially when you're building, like, a QA workflow of how closely did we stick to what we were supposed to do.
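
A minimal sketch of that QA check, feeding an illustrative Mermaid flowchart (not ResiDesk's actual process) and a conversation transcript into one prompt:

```python
import anthropic

client = anthropic.Anthropic()

TICKET_FLOW = """
flowchart TD
    A[Issue reported?] --> B[Get details]
    B --> C{Ticket exists?}
    C -->|yes| D[Update ticket]
    C -->|no| E[Create ticket]
    D --> F[Escalate if unresolved]
    E --> F
    F --> G[Close and confirm with resident]
"""  # illustrative diagram; Mermaid text is cheap in tokens

def adherence_check(transcript: str) -> str:
    resp = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": (
                f"<process>{TICKET_FLOW}</process>\n"
                f"<transcript>{transcript}</transcript>\n"
                "Compare the conversation against the process diagram. "
                "Which steps were followed, which were skipped, and roughly "
                "what percent on-track were we?"
            ),
        }],
    )
    return resp.content[0].text
```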

00:34:41 Greg: Sure. Yeah. That's actually fascinating because it's traditionally very hard and very fuzzy to, like, look at a conversation of unstructured text and say, did this agent or did this person or did this employee actually follow the flow they're supposed to? So to have 4o-mini or 4o look at it and be like, yes, they did. Or no, they didn't. Like, that seems like a really cool tool to make sure your team is staying on track.

00:35:01 Arjun: Yeah. So something we're still building out, but the fact that it was actually fairly trivial to get a prompt to tell us if we were, you know, 60, 80, 90% on track.

00:35:13 Greg: Yeah.

00:35:13 Arjun: And we're still iterating through again, much like the risk level calibration, I think we'll end up with a 1 to 5 scale on this too.

00:35:21 Greg: Yeah.

00:35:21 Arjun: But, like, being able to look at that for a million conversations at scale is super interesting.

00:35:26 Greg: That's absolutely wild. What about, like, in terms of observability tools? What are you using to, like, track your calls and costs and everything?

00:35:37 Arjun: So we do a couple of things. So, obviously, everything that we do with AI goes through a light middleware that I have, which is logging every request and result back into our database. I think it's kind of absolutely mandatory that the first thing that you build is just observability infrastructure with all of the AI stuff. And so I literally just have a Postgres table that logs every request I've made to every provider through every pipeline. The other thing that we use, that we are just testing out, is, I don't know if you've heard of Gentrace.

00:36:18 Greg: Oh, yeah. Big time.

00:36:19 Arjun: So we use Gentrace quite a bit, and that's who we use to monitor costs broadly. Mhmm. So we have all of our calls going through them too. These guys are great. Know the founders, love them. Nice. They also offer testing, which we don't quite use. But for costs, we use Gentrace. And then for internal monitoring, just because so much of our workflow is just conversation QA Mhmm. We just use our database.

00:36:47 Greg: Yeah.

00:36:48 Arjun: So the idea behind it is that once you have, you know, basically an entire history of every AI call that you've ever made

00:36:55 Greg: Uh-huh.

00:36:56 Arjun: Then you can just look at a subset of that data to feed your QA workflows or at least test them.
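
A minimal sketch of that kind of light logging middleware, assuming a hypothetical ai_requests Postgres table; every provider call gets wrapped and persisted:

```python
import json
import time
import psycopg2

conn = psycopg2.connect("dbname=residesk")  # connection string is illustrative

# Hypothetical schema:
# CREATE TABLE ai_requests (
#   id SERIAL PRIMARY KEY, created_at TIMESTAMPTZ DEFAULT now(),
#   pipeline TEXT, provider TEXT, model TEXT,
#   request JSONB, response JSONB, latency_ms INT);

def logged_call(pipeline: str, provider: str, model: str, call, **kwargs):
    """Wrap any provider SDK call and persist the full request/response."""
    start = time.monotonic()
    response = call(**kwargs)
    latency_ms = int((time.monotonic() - start) * 1000)
    payload = response.model_dump() if hasattr(response, "model_dump") else str(response)
    with conn, conn.cursor() as cur:  # `with conn` commits the transaction
        cur.execute(
            "INSERT INTO ai_requests "
            "(pipeline, provider, model, request, response, latency_ms) "
            "VALUES (%s, %s, %s, %s, %s, %s)",
            (pipeline, provider, model,
             json.dumps(kwargs, default=str),
             json.dumps(payload, default=str),
             latency_ms),
        )
    return response

# usage (hypothetical):
# resp = logged_call("draft_reply", "anthropic", "claude-3-5-sonnet-latest",
#                    client.messages.create, model="...", max_tokens=300,
#                    messages=[...])
```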

00:37:02 Greg: Yeah. That's wild. I love that. That's super cool.

00:37:04 Arjun: Let's maybe start with thinking about how we turn unstructured data into structured data that we can actually use. Mhmm. Right? So one of the other cool things that we do with Retool is, the hard part about understanding and answering resident questions at any property is just knowing what the property is about. Right? Do they have a pool? When is it open? What is their pet policy? What is their parking policy, etcetera? If you've ever lived at any rental property before, you know that these things don't live in, like, neat structured online documents. They live in random PDFs that nobody's ever seen before.

00:37:42 Arjun: And so it is really hard, save for actually, like, going in and writing things down, to get the institutional knowledge that you have on a property. So one of the fun things that we built was a Retool tool that our teams can use internally. So the interface for this is we will typically ask our customers when they set up a property on ResiDesk, hey. Like, send us any documents that you have our way. Right? Like, it's a lease document. It's a flyer for an event, whatever it might be. And so by doing that, we take on the sort of work of turning that unstructured data. It could be a PDF. It could be a painting.

00:38:25 Arjun: Who the hell knows? And turning it into guidelines that we can actually use. So our team has this Retool app that you can browse. I'm just gonna take, here's, like, a pet policy that I have on hand, and I can take that and say, okay. Here's the guideline. And by saving it, what it's doing is taking a document, and let me just quickly show you what it looks like. This is what that document will typically look like. Right? Like, and this is actually more structured than a typical document would be. But here's all the responsibilities. Here's all the agreements. Here's the liability, etcetera, etcetera.

00:39:12 Arjun: We have a tool that is then taking that and actually turning it into an FAQ. So it creates a blurb. It uploads a document, and then it actually pulls out, and this is using Gemini behind the scenes because it's so good at finding needles in haystacks, the most common questions that a resident could have about these things. Right? But then what happens on the back end is even more interesting, which is all of this gets vectorized. The raw text of the document, the blurb, and all of the individual questions that we picked up from it. And so when that happens, what that lets us do is do RAG in a really efficient way, where we look at conversations that people are asking about.

00:39:57 Arjun: So, like, this resident asking about partial payments, and then bring up the related guideline and tell you what the answer is. This then feeds into the answers that we suggest, which then makes our RAG workflow a lot easier. By doing this, we were able to see an uptick in the number of questions we answer on day 1. So when we started ResiDesk, I think we were answering something like 60 to 65% of resident questions on day 1. After we started just taking in any sort of unstructured data, we're now answering closer to, like, 85% of resident questions on day 1. And it only happens because we are able to abstract away this really dumb workflow of, send me whatever the hell you have that looks like a property document and we will make it useful for ourselves, without having people go through, like, a big questionnaire that they need to fill out.

00:40:45 Arjun: So, yeah. Anybody building an AI tool out there that needs to be trained on data, this is an easy hack.

00:40:53 Greg: And what's crazy is that this easy hack, you just did it yourself with some Retool and some, like, slight prompts, and then turning it into structured data.
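
A minimal sketch of that unstructured-to-structured pipeline, using Gemini for the blurb/FAQ extraction as described; the embedding model here is an assumption, since the interview doesn't name one:

```python
import google.generativeai as genai  # assumes GOOGLE_API_KEY is configured

def document_to_faq(doc_text: str) -> str:
    # Gemini reads the raw document (lease, flyer, policy PDF text) and
    # pulls out the questions residents are most likely to ask.
    model = genai.GenerativeModel("gemini-1.5-flash")
    prompt = (
        "<document>\n" + doc_text + "\n</document>\n"
        "Write (1) a two-sentence blurb of this property document and "
        "(2) the 10 questions a resident is most likely to ask that it "
        "answers, each with a one-sentence answer drawn from the document."
    )
    return model.generate_content(prompt).text

def vectorize(texts: list[str]) -> list[list[float]]:
    # Embed the raw text, the blurb, AND every extracted question, so
    # retrieval can match however the resident happens to phrase things.
    return [
        genai.embed_content(model="models/text-embedding-004", content=t)["embedding"]
        for t in texts
    ]
```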

00:40:59 Arjun: Yeah. So, here's another internal workflow if you're ready for some insanity of just dealing with payments in property management. So, again, context is we sell ResiDesk to Greg. Greg has a set of, let's call it, 15 properties. Right? If I bill Greg for 15 bucks, and I'm making the math simple here, but, basically, like, it's a dollar per property. Let's just call it that. The way that real estate has worked for a long time, as I said, was first, each of your properties would just pay me a dollar. So I'm left stuck with a $15 invoice that I then have to itemize and then reconcile. Customers pay us in checks.

00:41:43 Arjun: We have a workflow that scans and then reconciles for us in Stripe. But that's not the crazy end of it. In an effort to digitize, what's happened in real estate is now those 15 properties will typically push that check to a third-party clearing house that then turns that check into a one-time-use credit card that they send us along with the invoice number. So Wow. What I end up with, I'm gonna share my screen because this is literally the nuttiest thing that we have seen in our time here. What I end up with is something like this. So you can see this email here. It comes from a third-party clearing house.

00:42:25 Arjun: It's, you know, a payment for $5 for this and a payment for $208 for these invoices. Right? Behind the scenes on this, you can see here this little thing called click here to view card information. You open that. Our customers actually send us a one-time-use credit card that is only valid for 30 days, and we have to pick up the card number, figure out what invoice this belongs to, and then pay it within those 30 days or that payment expires. Right? And then we have to go through the whole process all over again. As you can probably imagine, this is a complete pain in the neck for us to do manually.

00:43:07 Arjun: Because as I said, if Greg has 15 properties, we get 15 of these every month. Right? And so this is where Make has been our best friend, and Front has been our best friend. So what we've been able to do is say, okay. You can automate things on tags in Front. So when I assign a tag that just says, you know, billing payment reconciliation

00:43:32 Greg: Mhmm.

00:43:32 Arjun: You can create a webhook that goes to Make. And then what happens inside of Make is it finds the content of the message. It opens the link, takes a screenshot of it after a small delay, and then OCRs it to get the card details, finds the links of the associated invoices in Stripe, and then adds them back as comments into the conversation. Which we then, if the payment amounts actually fully line up, we will automatically pay, and then the whole thing is automated. If the payment amounts don't line up, then it alerts our billing team to manually pay the invoice in Stripe.
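
The decision at the end of that Make scenario, sketched as a small Python function for illustration only (ResiDesk builds this visually in Make; the Stripe-style invoice shape with amounts in cents is an assumption):

```python
def reconcile(card_amount_cents: int, invoices: list[dict]) -> dict:
    """Decide whether a one-time-use card payment can be applied automatically.

    invoices: [{"id": "in_123", "amount_due": 20800}, ...]  (cents)
    """
    total_due = sum(inv["amount_due"] for inv in invoices)
    if card_amount_cents == total_due:
        # Amounts fully line up: pay every invoice automatically.
        return {"action": "autopay", "invoice_ids": [inv["id"] for inv in invoices]}
    # Mismatch: surface it for a human on the billing team to pay manually.
    return {
        "action": "alert_billing",
        "reason": f"card total {card_amount_cents} != invoices total {total_due}",
    }

# usage, echoing the $5 and $208 payments from the email in the demo:
print(reconcile(21300, [{"id": "in_A", "amount_due": 500},
                        {"id": "in_B", "amount_due": 20800}]))
# -> {'action': 'autopay', 'invoice_ids': ['in_A', 'in_B']}
```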

00:44:21 Arjun: So, again, by doing this, we are processing, as I said, we serve about a million residents across the US. So we are now processing something on the order of probably like 600 to 800 one-time credit card payments a month. And, like, nobody has the time to do that. The other side of the automation, which I'm incredibly bullish on, will happen with things like Claude computer use but isn't quite there yet Mhmm. Is we're interacting with software that was built in the mid-to-late nineties. Right? We live in an industry where nobody actually employs software engineers. They just employ contractors to build a thing, and then that's the thing that stays forever.

00:45:07 Arjun: So we're dealing with a lot of, kind of like, you know, what Plaid used to deal with back in the day. We're dealing with a lot of legacy software that is actually fairly RPA-resistant too, because that's the one thing that people actually will invest time and money into combating. Mhmm. And so there are systems that our customers use that have APIs, but by and large, they don't have APIs. And so we're slowly starting to automate some of the data intake, but then I'm excited to get to a future where, you know, Claude or GPT can just have access to a VM where it logs in and gets the data that we need automatically.

00:45:49 Greg: Sure.

00:45:49 Arjun: So the TLDR is I think we wanna automate all of our data intake problems. Mhmm. But from a conversation output to the residents piece, I think that will always have a human in the loop, and that's mostly driven by the product that we're building. We're in the trust business, and residents trust us because we have a human team.

00:46:10 Greg: Sure. Yeah. That makes total sense. Alright. Last question I wanna end with you. So there's a lot of viewers out there who may be thinking, hey. I wanna implement AI in my business, but I have no idea what to do. So let's say that you're CEO of a new business. A legacy business that has already been going for a long time, profitable, it's going well, but you're the new CEO. You drop in on day 1, and they say, we need you to evaluate our workflows and what we're doing, and we need you to make us AI enabled or get us on AI. What's the playbook that you're gonna run to look at the whole business and figure out what to do?

00:46:47 Arjun: It's a great question. We think about this a lot because, again, our industry is really focused on talking about things like what is AI in real estate as an example. Right? To me, the funny part is I think talking about AI in real estate or talking about how to put AI in my business is about as broad and silly as talking about how do I use Excel in my business. Right? I think what people fail to understand is that this is really about specific use cases, not about, you know, throwing Excel or AI magic into your business. So what I would do on day 1 would be, much like what we did with ResiDesk, to look at the most manual workflows that people have and what you need to do.

00:47:31 Arjun: And so what we do internally is, first of all, give people broad access to LLM tools in house, you know, whatever. Like, on-prem, think about data security. All that stuff is great, but, like, give everybody broad access. I think the playbook for getting people to use AI to help themselves is not a centralized top-down instruction to do certain workflows the AI way, but instead to give people the tools. And then the second part, which is much more important, although the first part never seems to get past IT teams. Mhmm. The second part that's really important is go find a couple folks whose lives would be a lot better.

00:48:09 Arjun: Ideally, somebody who's doing a really repetitive, tedious task, something super manual, and then work with them to build a workflow tailored to them using AI. Right? So an example for us is we actually scan paper checks. Our customers pay us with checks. Right? And so reconciling invoices becomes a huge deal. I literally sat down with our billing person and said, okay. How do we make this easier? Built a Make workflow that then talks to GPT to look at all the receipts that we get and then reconciles it and pays it on Stripe so they don't have to spend time on it. There's probably a 100 examples of this in any company that you go to.

00:48:48 Arjun: Find a couple of people who would be great to sit down with and build a really small, cheap, fast process with them. Like, none of this stuff should take you more than a day to ship.

00:48:57 Greg: Mhmm.

00:48:57 Arjun: And then make case studies out of that and then have them spread the word to other people in the company about how this was useful to them and how you can be creatively thinking about AI for your own workflow. I can be told as an employee to use AI as much as the executives want.

00:49:18 Greg: Uh-huh.

00:49:18 Arjun: But, ultimately, I'm only gonna care when it helps me solve something I don't like to do. And it ends up being a very personal workflow to each person. And so what you really need are just these sort of champions of AI inside your organization. If you can go find 2, 3, 5 people across different departments that you can spread the word to and then put them in charge of sharing with everybody else how they do things better, that's when you really start getting the magic. Right? It has to be almost like a viral B2C adoption curve. You have to make it useful to a few super users and then have them spread it out.

00:50:00 Greg: That's so cool. I love that playbook, and that makes sense to me. Well, I tell you what, Arjun. This has been absolutely fabulous. Thank you very much for just screen sharing and showing the workflows that you have going on. It's really cool to see what, like, you're actually doing. Yeah. So looking at the prompts, looking at the projects. So I really appreciate that, and thanks for chatting today.

00:50:19 Arjun: Absolutely. Thanks so much for having me.
