Use ChatGPT to intelligently extract information from various documents

Tram Ho

Use ChatGPT to intelligently extract information from various documents

Combine the power of Azure Search with ChatGPT for intelligent analysis of your textual data. Example: https://github.com/Drop-Database-Cascade/chatGPT-azurefuncs.git

All credit has been contributed by the following individuals in the design and development of this solution:

Background

Not wanting to be left behind in the ChatGPT race, a group of colleagues and I decided to design and build a custom solution that excels by harnessing the capabilities of ChatGPT.

We decided to create a “Document Search Chatbot” that can be used as an additional tool to answer common questions (FAQs) based on the content of several documents.

One financial services team I worked with had a problem where they regularly produced informational materials for the public (reports, fact sheets, propaganda texts) and then received a large number of inquiries. There is a broad view that 70-80% of these inquiries can be answered with the information contained in a particular document. However, it is not always clear to members of the public which document will answer their questions and where to look in it. In theory, a chatbot smart enough to link inquiries to source documents could significantly improve the experience for financial services teams and members of the public.

In addition to exploring an interesting use case, I’m also particularly interested in how a language model like ChatGPT can enhance existing cloud platforms.

How it works

To summarize, the Document Search Chatbot uses Azure Search to extract and rank key highlights from a set of text documents based on a user query. User queries and Azure Search results are passed to OpenAI to be interpreted and formatted into a chat-based response.

Here are the high-level processes performed to generate a response from the Document Search Chatbot:

  1. Documents are uploaded to Azure Blob Storage (used to “feed” the Document Search Chatbot).
  2. Azure Search is used to create an index on uploaded documents.
  3. Configure Semantics created for the index. This specifies how semantic search ranks fields in the document index when queried.
  4. The user submits a query through the Web Portal, for example, “What services are provided by provider A?”
  5. The user’s question triggers an Azure Function request that runs an API call to Azure Search and OpenAI.
  6. The query is passed to Azure Search as a semantic query and returns the highlights and extracts of matching documents ranked by criteria in the Semantic Profile (e.g. relevance) . Refer to the Azure Search SDK documentation for more details.
  7. A ChatGPT query is generated using the appropriate Search extracts and the original user query. ChatGPT Query directs ChatGPT to answer user queries using Search extracts if they are relevant to the user query.
  8. A ChatGPT query is sent and a response is received from OpenAI. Refer to the OpenAI API documentation for more details.
  9. User queries, Azure Search responses, and ChatGPT responses are logged with Application Insights.
  10. This response is passed to the web portal and served to the user.

This solution helps to solve the following problems:

From an Azure Search point of view – Azure Search can only return text exactly as it appears in your document, our solution uses ChatGPT to interpret responses from Azure Search and customize the response for a user question while looking at whether there is a suitable answer to that question in your documents.

From ChatGPT’s point of view – our solution provides a mechanism to utilize the power of ChatGPT on a large set of custom text documents without being constrained by token limitations.

The limitations of this solution include:

  • Accuracy of responses: While the Document Search Chatbot can analyze a variety of documents and provide quick answers to simple questions, there is no way to guarantee accuracy. 100%.
  • Data security and privacy: The Document Search Chatbot should be used to analyze publicly available documents, but it is important to ensure there is a process in place to avoid unintentional sharing of documents considered top secret.
  • User Experience: The Docs Search Chatbot should be designed to provide a positive experience and satisfy the user’s needs. This may require ongoing testing and updates to ensure the Document Search Chatbot is continuously processing a wide range of queries.
  • Human Involvement: While the Document Search Chatbot is capable of providing quick and accurate responses to questions about the content in a set of documents, it will not replace the ability human understanding of more complex or important questions.

The performance of the Document Search Chatbot can be improved by the following methods:

  • Tooling – Adds open source Chain of Thought (COT) tools such as LangChain to allow continued questioning from users when the user’s query is not sufficient to extract extracts from source documents.
  • Pre-processing – Process user queries before sending to Azure Search, e.g. use Chat GPT to exchange user queries with some embedded background context to improve results Azure Search results.
  • Document Chunking – Adjusts the size of the logical segments into which source documents are split to improve the relevancy of the text fragments returned by Azure Search. Depending on the source documents, you may want smaller pieces of text to increase the number of different pieces of text sent to ChatGPT, or you may want longer paragraphs to ensure the context of the text. text is not lost.
  • Summarisation – Using ChatGPT to summarize the text snippets returned by Azure Search before responding to a user query can allow more extracts to be reviewed from the source documents.

overall

Overall, this example shows how Large Language Models (LLMs) can be integrated with managed cloud services to provide enhanced capabilities. As organizations continue to evolve their cloud platforms, they can leverage ML Ops processes to integrate pre-trained models like ChatGPT and Azure Semantic Search. In my opinion, organizations that do this well will benefit from highly customized and efficient solutions. Just before this article was released, Microsoft released a view with source code examples for integrating Azure Search with ChatGPT, which can be read more here. Furthermore, Azure OpenAI service now allows you to use ChatGPT (preview) in a suitable way for enterprise applications. Given the relationship between Microsoft and OpenAI, I wouldn’t be surprised if they release a direct solution to easily integrate Search and OpenAI in the not too distant future.

Please refer to the following links to read more and better understand the topic:

Source: Max Fifield, Medium

Share the news now

Source : Viblo