Amazon Q Business is a generative AI-powered assistant that answers questions, provides summaries, generates content, and extracts insights directly from digital content and scanned PDF files in your enterprise sources, without requiring you to extract the text first.
Customers in industries such as finance, insurance, healthcare, and life sciences need insights from a variety of document types, such as receipts, healthcare plans, or tax statements, which are often in scanned PDF format. These document types usually have a semi-structured or unstructured format and have required processing to extract text before indexing with Amazon Q Business.
Amazon Q Business introduces support for scanned PDF files to help you seamlessly process a variety of multi-modal document types in all supported Amazon Q Business AWS Regions through the AWS Management Console and APIs. You can use supported connectors to ingest files (including scanned PDFs) from your sources, index them, and then use those files to answer questions, provide summaries, and generate content securely and accurately from your enterprise systems. This feature eliminates the development work required to extract text from scanned PDF files outside of Amazon Q Business and improves document processing pipelines that use Amazon Q Business to build generative AI assistants.
In this post, we show how to use Amazon Q Business to asynchronously index scanned PDF files and run real-time queries on them.
Solution overview
You can use Amazon Q Business with scanned PDF files through the console, the AWS SDKs, or the AWS Command Line Interface (AWS CLI).
Amazon Q Business provides a set of versatile data connectors that integrate with a variety of enterprise data sources, letting you develop generative AI solutions with minimal setup and configuration. To learn more, see Amazon Q Business, now generally available, helps boost workforce productivity with generative AI.
After your Amazon Q Business application is ready, you can upload scanned PDFs directly into an Amazon Q Business index using the console or the API. Amazon Q Business provides multiple data source connectors to consolidate and synchronize data from multiple data repositories into a single index. In this post, we demonstrate two scenarios for ingesting files: one using the direct file upload option and one using the Amazon Simple Storage Service (Amazon S3) connector. If you need to ingest documents from other data sources, see Supported connectors for details on connecting them.
Index the documents
In this post, we use three scanned PDF files as examples (an invoice, a health plan summary, and an employment verification form), along with some text files.
The first step is to index these files. Complete the following steps to index files using the direct upload feature of Amazon Q Business. In this example, we upload the scanned PDFs.
- On the Amazon Q Business console, choose Applications in the navigation pane and open your application.
- Choose Add data source.
- Choose Upload files.
- Upload the scanned PDF files.
You can monitor the uploaded files on the Data sources tab. When the Upload status changes from Received to Processing to Indexed or Updated, the file has been successfully indexed into the Amazon Q Business data store. The following screenshot shows a successfully indexed PDF.
The following steps demonstrate how to use the Amazon S3 connector to integrate and synchronize files with Amazon Q Business. For this example, we index the text documents.
- On the Amazon Q Business console, choose Applications in the navigation pane and open your application.
- Choose Add data source.
- Choose Amazon S3 for the connector.
- Enter the information for Name, VPC and security group settings, IAM role, and Sync mode.
- To finish connecting your data source to Amazon Q Business, choose Add data source.
- In the Data source details section of the connector details page, choose Sync now to let Amazon Q Business start syncing (crawling and ingesting) data from your data source.
When the sync job is complete, your data source is ready to use. The following screenshot shows that all five documents (scanned and digital PDFs, and text files) were successfully indexed.
The following screenshot shows a combined view of the two data sources: the files uploaded directly and the files ingested through the Amazon S3 connector.
Now let’s use Amazon Q Business to run some queries on our data sources.
Queries on dense, unstructured, scanned PDF documents
Your documents may be dense, unstructured, scanned PDFs. Amazon Q Business can identify and extract the most salient information-dense text from them. In this example, we use the multi-page health plan summary PDF we indexed earlier. The following screenshot shows a sample page.
In the Amazon Q Business web UI, we ask “What is the annual out-of-pocket maximum mentioned in the health plan summary?”
Amazon Q Business searches the indexed documents, retrieves the relevant information, and generates an answer while citing its sources. The following screenshot shows the sample output.
Queries on structured, tabular, scanned PDF documents
Documents may also contain structured data elements in tabular format. Amazon Q Business automatically identifies, extracts, and linearizes structured data from scanned PDFs to accurately answer user queries. In the following example, we use the invoice PDF we indexed earlier. The following screenshot shows an example.
In the Amazon Q Business web UI, we ask “How much were the headphones charged in the invoice?”
Amazon Q Business searches the indexed documents and retrieves the answer with a reference to the source document. The following screenshot shows that Amazon Q Business can extract billing information from the invoice.
Queries on semi-structured forms
Your documents may also contain semi-structured data elements in forms, such as key-value pairs. Amazon Q Business can accurately answer queries about these data elements by extracting the specific fields or attributes that are meaningful to the query. In this example, we use the employment verification PDF. The following screenshot shows an example.
In the Amazon Q Business web UI, we ask “What is the applicant’s date of employment in the employment verification form?” Amazon Q Business searches the indexed employment verification document and retrieves the answer with a reference to the source document.
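The questions in these three sections were asked in the web UI. As a rough sketch that is not part of the original walkthrough, a similar question could be sent with the ChatSync API through the AWS SDK for Python (boto3). The helper names format_answer and ask are hypothetical, the application ID is a placeholder you would supply, and depending on how your application handles user identity the call may need additional parameters or identity-aware credentials.

```python
from typing import Any, Dict


def format_answer(response: Dict[str, Any]) -> str:
    """Render a ChatSync-style response as the answer followed by its cited sources."""
    lines = [response.get("systemMessage", "")]
    for source in response.get("sourceAttributions", []):
        number = source.get("citationNumber", "?")
        title = source.get("title", "untitled")
        lines.append(f"[{number}] {title}")
    return "\n".join(lines)


def ask(application_id: str, question: str) -> str:
    """Send one question to an Amazon Q Business application.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so format_answer works without the SDK

    client = boto3.client("qbusiness")
    response = client.chat_sync(applicationId=application_id, userMessage=question)
    return format_answer(response)
```

Keeping the formatting separate from the API call makes it possible to inspect or log the cited sources without sending a new request.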
Index documents with the AWS CLI
In this section, we show how to use the AWS CLI to ingest structured and unstructured documents stored in an S3 bucket into an Amazon Q Business index. You can quickly retrieve detailed information about your documents, including their statuses and any errors that occurred during indexing. If you’re an existing Amazon Q Business user who has indexed documents in various formats, such as scanned PDFs and other supported types, and you now want to reindex the scanned documents, complete the following steps:
- Check the status of each document and filter for failed documents based on the status "DOCUMENT_FAILED_TO_INDEX". You can filter the documents based on this error message:
"errorMessage": "Document cannot be indexed since it contains no text to index and search on. Document must contain some text."
If you’re a new user and haven’t indexed any documents yet, you can skip this step.
The following is an example of using the ListDocuments API to filter documents with a specific status and their error messages:
The following screenshot shows the AWS CLI output, which contains a list of failed documents with their error messages.
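The filtering step can also be sketched with the AWS SDK for Python (boto3) instead of the AWS CLI. This is a minimal illustration, not the command used in this post: the function names are hypothetical, the application and index IDs are placeholders, and the response fields read here (documentDetailList, status, error.errorMessage, nextToken) should be checked against the current ListDocuments API reference.

```python
from typing import Any, Dict, Iterable, Iterator, List

FAILED_STATUS = "DOCUMENT_FAILED_TO_INDEX"


def iter_document_details(
    client: Any, application_id: str, index_id: str
) -> Iterator[Dict[str, Any]]:
    """Yield every documentDetailList entry, following nextToken across pages."""
    kwargs: Dict[str, Any] = {"applicationId": application_id, "indexId": index_id}
    while True:
        page = client.list_documents(**kwargs)
        yield from page.get("documentDetailList", [])
        token = page.get("nextToken")
        if not token:
            return
        kwargs["nextToken"] = token


def failed_documents(details: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep only documents that failed to index, paired with their error message."""
    return [
        {
            "documentId": detail["documentId"],
            "errorMessage": detail.get("error", {}).get("errorMessage", ""),
        }
        for detail in details
        if detail.get("status") == FAILED_STATUS
    ]


def list_failed(application_id: str, index_id: str) -> List[Dict[str, str]]:
    """Convenience wrapper; assumes boto3 is installed and AWS credentials are configured."""
    import boto3  # imported lazily so the helpers above work without the SDK

    client = boto3.client("qbusiness")
    return failed_documents(iter_document_details(client, application_id, index_id))
```

The document IDs returned by list_failed are the ones to submit again in the batch step.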
Now you can batch-process the documents. Amazon Q Business supports adding multiple documents to an Amazon Q Business index.
- Use the BatchPutDocument API to ingest multiple scanned documents stored in an S3 bucket into the index:
The following screenshot shows the AWS CLI output. You should see an empty list of failed documents.
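A comparable batch ingestion can be sketched in Python with boto3. Everything specific here is an assumption for illustration: the function names, the choice of the S3 URI as the document ID, the PDF content type, and the batch size of 10, which should be confirmed against the BatchPutDocument limits in the API reference. The IAM role ARN must grant Amazon Q Business read access to the bucket.

```python
from typing import Any, Dict, Iterable, List


def build_documents(bucket: str, keys: Iterable[str]) -> List[Dict[str, Any]]:
    """Build BatchPutDocument entries that point at PDF objects in S3."""
    return [
        {
            "id": f"s3://{bucket}/{key}",  # hypothetical ID scheme; any unique string works
            "contentType": "PDF",
            "content": {"s3": {"bucket": bucket, "key": key}},
        }
        for key in keys
    ]


def put_documents(
    client: Any,
    application_id: str,
    index_id: str,
    role_arn: str,
    documents: List[Dict[str, Any]],
    batch_size: int = 10,  # assumed per-request limit
) -> List[Dict[str, Any]]:
    """Submit documents in batches and collect every failedDocuments entry."""
    failed: List[Dict[str, Any]] = []
    for start in range(0, len(documents), batch_size):
        response = client.batch_put_document(
            applicationId=application_id,
            indexId=index_id,
            roleArn=role_arn,
            documents=documents[start:start + batch_size],
        )
        failed.extend(response.get("failedDocuments", []))
    return failed
```

With a real client created by boto3.client("qbusiness"), an empty return value corresponds to the empty list of failed documents described above. Indexing itself is asynchronous, so an accepted request does not yet mean the documents are searchable.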
- Finally, use the ListDocuments API again to check that all documents were indexed correctly:
The following screenshot shows that the documents are indexed in the data source.
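The final check can be sketched as a small Python helper that works on the documentDetailList entries returned by ListDocuments. It treats Indexed and Updated as success, following the status progression described earlier in this post. The function names and the uppercase status strings are assumptions to verify against the API reference.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

SUCCESS_STATUSES = {"INDEXED", "UPDATED"}


def status_summary(details: Iterable[Dict[str, Any]]) -> Counter:
    """Count documents per status, for a quick overview of the index."""
    return Counter(detail.get("status", "UNKNOWN") for detail in details)


def not_yet_indexed(
    details: Iterable[Dict[str, Any]], expected_ids: Iterable[str]
) -> List[str]:
    """Return the expected document IDs that are missing or not in a success status."""
    status_by_id = {detail["documentId"]: detail.get("status") for detail in details}
    return sorted(
        doc_id
        for doc_id in expected_ids
        if status_by_id.get(doc_id) not in SUCCESS_STATUSES
    )
```

Because indexing is asynchronous, a document may still be in an intermediate status right after submission, so this check may need to be repeated until not_yet_indexed returns an empty list.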
Clean up
If you created a new Amazon Q Business application and don’t plan to use it further, unsubscribe, remove the assigned users from the application, and delete the application so that your AWS account doesn’t accrue charges. Additionally, if you no longer need the indexed data source, see Managing Amazon Q Business data sources for instructions on deleting it.
Conclusion
This post demonstrated the support for scanned PDF documents in Amazon Q Business. We highlighted the steps to sync, index, and query supported document types (now including scanned PDFs) using generative AI with Amazon Q Business. We also showed examples of queries on structured, unstructured, and semi-structured multi-modal scanned documents using the Amazon Q Business web UI and the AWS CLI.
To learn more about this feature, see Supported document formats in Amazon Q Business. Try it now on the Amazon Q Business console! For more information, see Amazon Q Business and the Amazon Q Business User Guide. You can send feedback to AWS re:Post for Amazon Q or through your usual AWS Support contacts.
About the authors
Sonali Sahu leads the solutions architecture team of generative AI specialists at AWS. She is an author, thought leader, and passionate technologist. Her core areas of focus are artificial intelligence and machine learning, and she frequently speaks at AI and ML conferences and meetups around the world. She has broad and deep experience in technology and the technology industry, with industry expertise in healthcare, the financial sector, and insurance.
Chinmayi Rayne is a Generative AI Specialist Solutions Architect at AWS. She is passionate about applied mathematics and machine learning. She focuses on designing intelligent document processing and generative AI solutions for AWS customers. Outside of work, she enjoys salsa and bachata dancing.
Himesh Kumar is a seasoned senior software engineer currently working on Amazon Q Business at AWS. He is passionate about building distributed systems in the generative AI/ML space. His expertise extends to developing scalable and efficient systems that ensure high availability, performance, and reliability. Beyond his technical skills, he is dedicated to continuous learning and to staying at the forefront of technological advancements in AI and machine learning.
Qingwei is a senior software developer on the Amazon Q Business team at AWS, passionate about building modern applications using AWS technologies. He enjoys community-driven learning and sharing of technology, especially topics related to machine learning hosting and inference. His current main focus is building serverless and event-driven architectures for RAG data ingestion.