Build an AI that answers questions based on user research data
date: Apr 17, 2023
slug: build_ai_answder_question_based_research
status: Published
tags: AI
summary: A guide to building an AI with a custom knowledge base using OpenAI API.
type: Post
Modern products often have a large amount of user research data from different sources: user research interviews, Intercom conversations, customer emails, surveys, customer reviews on various platforms, etc.
Making sense of all that data is a challenging task. A traditional way to do that is to maintain a neatly organized database with various corresponding tags.
But what if we could have our own personal AI chatbot that can answer any question about our user research data?
By querying a large amount of historic user research data, the chatbot can provide insights and recommendations for a new project, product, or marketing campaign.
Well, now it’s possible with just a few lines of code. You can do that even without a technical background. In this article, I’m going to explain how to do that step-by-step.
This technique was first described by Dan Shipper.

OpenAI API
You are probably familiar with ChatGPT and hopefully already use it in your workflow. If not, I recommend reading my article about AI's impact on design first.
OpenAI also provides an API for sending requests. We need it to be able to send the relevant context to the model and to keep the information private.
Before we start with the API, you can try interacting with the GPT-3 model through a user interface in the GPT-3 Playground.

OpenAI API Playground
Privacy concerns
There are many privacy concerns when we deal with user data. By default, OpenAI will not use data submitted by customers via the API to train OpenAI models or improve OpenAI’s service offering. But of course, there may be additional security limitations. Check the OpenAI documentation for more information and consult with your legal team.
Custom knowledge base
We want the chatbot to use data from our research and not just general knowledge from the internet. How can we do it?
Can fine-tuning work?
When first approaching this issue, I thought it would be possible to fine-tune the model with our dataset. It turned out that fine-tuning is used to train the model to answer in a certain way by providing prompt-response examples.
Fine-tuning can be helpful to train the model to recognize sentiment, for example. To do that, you need to provide sentence-sentiment value pairs in the training data, like in this example:
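(In the legacy OpenAI fine-tuning format, the training data is a JSONL file of prompt-completion pairs; the reviews below are invented purely for illustration.)

```
{"prompt": "The blender broke after two weeks of light use.", "completion": "negative"}
{"prompt": "Crispy results in ten minutes, I use it every day!", "completion": "positive"}
```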
But in our case, we don’t have prompt-response examples. We just have data that we want to use to find an answer. So fine-tuning will not work in that situation.
Sending context into the prompt
Instead, we need to make the model aware of the context. And we can do it by simply providing the context in the prompt itself.
Like this:
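For example, a prompt could look roughly like this (the interview snippet is invented for illustration):

```
Answer the question using only the context below.

Context:
"Participant 3: I rarely cook on weekdays because preheating
the oven takes too long after work."

Question: What stops people from cooking at home on weekdays?
```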
There is a catch, though. We cannot just send all our research data in one prompt: it is computationally unreasonable, and the GPT-3 model has a hard limit of 2,049 “tokens” per request and response combined, which is approximately 8,000 characters.
Instead of sending all the data in the request, we need to find a way to send only relevant information that would help our chatbot to answer the question.
There is a library for that
The GPT Index library does exactly that. Here is how it works:
- Create an index of text chunks
- Find the most relevant chunks
- Ask the question to GPT-3 using the most relevant chunk
The library does all the heavy lifting for us; we just need to write a few lines of code. Let’s do it!
Getting our hands dirty
The code has only two functions: the first one constructs an index from our data, and the second one sends the request to GPT-3. Here is the pseudo-code for that:
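(A rough structural sketch; the function names are placeholders, and the real implementations follow in the next two sections.)

```python
def construct_index(data_folder):
    # Read every file in the folder with our research data,
    # break each file into small chunks, embed the chunks,
    # and save the searchable index to disk as index.json.
    ...

def ask_ai(question):
    # Load the index, find the chunks most relevant to the question,
    # send them together with the question to GPT-3,
    # and print the response.
    ...
```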
Constructing an index
First, we need to construct an index. An index is like a database that stores pieces of text in a way that makes them easy to find.
To do that, we have to collect all our data into a folder. Then we ask GPT Index to take all of the files in the folder and break each file into small, sequential pieces. Then we store those pieces in a searchable format.
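Here is a minimal sketch of that function, assuming the GPT Index library (since renamed to LlamaIndex) around version 0.5; the class names below come from that release and have changed in newer versions:

```python
import os
from llama_index import SimpleDirectoryReader, GPTSimpleVectorIndex

# Placeholder; use your own key from the OpenAI dashboard.
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"

def construct_index(directory_path):
    # Read every file from the folder with our research data
    documents = SimpleDirectoryReader(directory_path).load_data()
    # Break the documents into chunks, embed them, and build a searchable index
    index = GPTSimpleVectorIndex.from_documents(documents)
    # Save the index to disk so we only pay for the embeddings once
    index.save_to_disk("index.json")
    return index

# "research_data" is an example folder name containing interview
# transcripts, survey exports, and other research files.
construct_index("research_data")
```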
When we run this function, it will create a file called index.json that contains chunks of our data converted into a format that makes them easy to search.
Be careful: running this code will cost you credits on your OpenAI account ($0.02 for every 1,000 tokens; you get $18 in free credits when you set up your account).
Asking the question
Now, let’s ask the question. To search the index that we made, we just need to enter a question into GPT Index.
- GPT Index will find the parts of our index that are most related to the question
- It will combine them with the question, and send it to GPT-3.
- Then, it will print out the response.
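A matching sketch of the query function, under the same assumption about the library version (GPT Index / LlamaIndex 0.5.x; the query interface looks different in newer releases):

```python
from llama_index import GPTSimpleVectorIndex

def ask_ai(question):
    # Load the index we built earlier
    index = GPTSimpleVectorIndex.load_from_disk("index.json")
    # GPT Index finds the most relevant chunks, combines them with the
    # question, and sends the resulting prompt to the OpenAI model
    response = index.query(question)
    # Print the model's answer
    print(response.response)
```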
Testing it out
I cannot share user research data with you as it is confidential. So to test the code out, I will use automatically generated interviews as my knowledge base for the example.
I asked ChatGPT to generate interview questions about cooking at home and the use of domestic appliances. After that, I asked it to generate interview scripts based on these questions. The interviews turned out to be quite bland and not very insightful, but they are enough to test our AI.
After uploading the files and constructing an index, I can try to ask a question about those interviews.
For example, I can ask my chatbot to “brainstorm marketing campaign ideas for an air fryer that would appeal to people that cook at home”. It will generate ideas based on the interviews that I’ve provided and not based on general knowledge from the Internet.
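With the hypothetical ask_ai function from the sketch above, that query is a one-liner:

```python
ask_ai(
    "Brainstorm marketing campaign ideas for an air fryer "
    "that would appeal to people that cook at home"
)
```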
If you are a visual learner, you can check the video where I explain everything step by step.
Possible applications
We’ve created an AI with a custom knowledge base with just a few lines of code. Just think for a moment how accessible custom AI has become.
In just a few minutes, we’ve managed to build a custom solution for searching for insights in our research database. The same technique can be used in dozens of other use cases.
Imagine a healthcare chatbot that provides medical advice based on a user’s health history and symptoms. Or an artificial legal advisor that is able to dig through legal documents to provide you with meaningful information. Or a chatbot with a knowledge base of financial regulations and best practices that can assist with financial planning and help you make informed decisions.
And this is just one small technique that we explored today. With advancements in Large Language Models, it is now easier than ever to create custom AI that can serve your specific needs.
What do you want to create next?